Name________________________________
Stat 1
November 16, 2007
Exam #2 – SOLUTIONS (Nov. 19)

You may use a calculator. You may not use other references or consult with anyone except the instructor. You may answer on the test paper itself or use your own paper. The exam ends at 11:20am by the front wall clock.

Explain your answers! Sometimes an explanation is necessary for credit on a question; more often, it is necessary for partial credit.

This test contains 10 numbered problems on pages 2-7.

        n    sqrt(n)   1/sqrt(n)
       16      4.00      0.25
       32      5.66      0.18
       48      6.93      0.14
       64      8.00      0.13
       80      8.94      0.11
      100     10.00      0.10
      150     12.25      0.08
      200     14.14      0.07
      300     17.32      0.06
      400     20.00      0.05
      500     22.36      0.04
      600     24.49      0.04
      700     26.46      0.04
      800     28.28      0.04
      900     30.00      0.033
     1000     31.62      0.032
     1400     37.42      0.027
     2000     44.72      0.022
     3000     54.77      0.018
     5000     70.71      0.014
    10000    100.00      0.010

It’s always fun to show Pascal’s triangle.

         0    1    2    3    4    5    6
    0    1
    1    1    1
    2    1    2    1
    3    1    3    3    1
    4    1    4    6    4    1
    5    1    5   10   10    5    1
    6    1    6   15   20   15    6    1

       z   Percentile        z   Percentile        z   Percentile
    -2.8     0.0026       -0.9     0.1841        1.0     0.8413
    -2.7     0.0035       -0.8     0.2119        1.1     0.8643
    -2.6     0.0047       -0.7     0.2420        1.2     0.8849
    -2.5     0.0062       -0.6     0.2743        1.3     0.9032
    -2.4     0.0082       -0.5     0.3085        1.4     0.9192
    -2.3     0.0107       -0.4     0.3446        1.5     0.9332
    -2.2     0.0139       -0.3     0.3821        1.6     0.9452
    -2.1     0.0179       -0.2     0.4207        1.7     0.9554
    -2.0     0.0228       -0.1     0.4602        1.8     0.9641
    -1.9     0.0287        0.0     0.5000        1.9     0.9713
    -1.8     0.0359        0.1     0.5398        2.0     0.9772
    -1.7     0.0446        0.2     0.5793        2.1     0.9821
    -1.6     0.0548        0.3     0.6179        2.2     0.9861
    -1.5     0.0668        0.4     0.6554        2.3     0.9893
    -1.4     0.0808        0.5     0.6915        2.4     0.9918
    -1.3     0.0968        0.6     0.7257        2.5     0.9938
    -1.2     0.1151        0.7     0.7580        2.6     0.9953
    -1.1     0.1357        0.8     0.7881        2.7     0.9965
    -1.0     0.1587        0.9     0.8159        2.8     0.9974

1. Here is a data table from a (very small) survey of rural children.

    Interviewee   Height (inches)   Number of siblings   Distance to school (miles)
    Annie               66                  0                       3.3
    Bob                 65                  4                       0.7
    Connie              55                  1                       1.5
    Donald              58                  2                       4.5
    Elsie               59                  2                       4.4
    Frank               59                  0                       3.0
    Gigi                62                  2                       1.0
    Harold              56                  0                      82.0

The mean of the “daily commuting distance” variable is about 12.5, and the standard deviation is about 28.
Some people would consider these statistics to be poor choices for describing the distribution.

a. Why would they think that?
   Answer: Outliers (here, Harold's 82.0 miles) have too much influence on the mean and standard deviation, so the values given do not describe the bulk of the data.

b. Suggest alternative statistics.
   Answer: Try the median and IQR, or the five-number summary, or compute the mean and SD without the outlier (but in that case, mention the outlier).

2. The weights of melons shipped from a certain farm are normally distributed with these statistics:

    mean               = 60 ounces
    standard deviation = 5 ounces

What fraction of the weights are below 62 ounces? (Note table on front page.)

Answer: The standard score for a weight of 62 ounces is (62 - 60) / 5 = 0.40. According to the table, 0.6554 of all normally distributed values fall below a standard score of 0.40. So, the answer is 65.54%.

3. A certain data table includes 100 values for a variable called “score.” Here is a histogram for this variable. (Note: Most of the values are not whole numbers, so there aren’t many values at the boundaries of these intervals.)

[Histogram of “score”; image not reproduced.]

The mean of this variable is 85.0. Choose one:

    the median is less than 85.0
    the median is more than 85.0
    the median is equal to 85.0 (within rounding)
    can’t tell from the information given

Answer: The median is more than 85.0. There are two ways to see this.

First: There are 100 observations, so the median should be between the 50th and 51st observations, counting from the top. From the histogram it appears that there are 40 observations in the top two boxes, and 20 in the “85-90” box, so the median should be somewhere in the middle of the 85-90 box. (Might it be exactly 85, or within rounding? Unlikely, but possible, from the information given.)

Second: There is a substantial skew to the left (that is, a large left tail). These extreme values have more of an effect on the mean than on the median, pulling the mean to the left of the median. So if the mean is 85, the median must be more than that.

4. In roulette, if you bet on “BLACK”, the probability of winning on each bet is 18/38 (or 9/19, if you like).
Each bet is independent.

a. If you bet twice, what is the probability that you win both bets?
   Answer: 9/19 times 9/19 equals 81/361.

b. If you bet twice, what is the probability that you will lose at least once?
   Answer: (answer to b) = 1 - (answer to a) = 1 - 81/361 = 280/361.

5. Here is a probability model for your profit after two bets on “1 to 12”:

    value of profit (dollars)   probability
               +4                  0.10
               +1                  0.43
               -2                  0.47

What is the expected value of the profit?

Answer:
    (+4) times (0.10) =  0.40
    (+1) times (0.43) =  0.43
    (-2) times (0.47) = -0.94
    total             = -0.11

(This is consistent with what we learned about roulette: you lose an average of 5+ cents for each dollar you bet. Bet twice, lose an average of 10+ cents, which rounds up to 11 cents.)

6. Here are two sequences of wins and losses with exactly 3 wins and 3 losses:

    WWWLLL
    LWLWWL

In all, how many sequences of wins and losses are there with exactly 3 wins and 3 losses?

Answer: There are 20 of them. You can list them systematically, or look at row 6, column 3 of Pascal’s triangle, which is 20.

7. Exactly 50% of Swarthmore students expect to leave campus by December 18. But the 70 students in STAT 11 didn’t know that. So each of them did a poll of Swarthmore students, each using sample size 100, to see what fraction would leave by December 18. That meant that among them, they had 70 different estimates of the fraction. Their estimates ranged from 34% to 64%.

a. What is the “margin of error” of a poll of size 100?
   Answer: 1/sqrt(100) = 1/10 = 0.10 = ten percentage points. (That’s why Gallup doesn’t stop with 100 interviews!)

b. (Roughly) what is the average of their estimates?
   Answer: The average of the poll results would be exactly 50% in the long run. For 70 of them the average would be close to 50%, although you couldn’t predict the average exactly.

c. If the STAT 11 students drew a histogram of their estimates, what would be its shape?
   Answer: Normal.

8. Here are some scatterplots. Estimate the correlation coefficient ( r ) for each plot.
[Four scatterplots; images not reproduced. Answers are listed in the order the plots appeared.]

    Plot 1: r = -1 (below -.9 is ok)
    Plot 2: r = 0 (-.30 to +.30 is ok)
    Plot 3: r = 0.80 (0.50 to 0.90 ok)
    Plot 4: r = 0 (by symmetry, since there is no linear association)

9. Consider this scatterplot:

[Scatterplot; image not reproduced.]

a. Draw a reasonable guess for the regression line. (Think about the slope. What are you trying to accomplish with the line?)

b. Do you like this regression model? Why or why not?
   Answer: No, it’s awful. The data are in two groups; any reasonable explanation or prediction method would take this into account. The negative linear relationship is entirely between groups, and misses the positive association within each group.

10. Here are some data. Well, actually I spilled a drink on the data, but you can still see the summary statistics. What is the regression line? Draw it and write the equation. Show scales on the axes.

                 CARS (x)   BOATS (y)
    mean            6           4
    std. dev.       2           3

    correlation coefficient: r = +0.50

[Plot not reproduced; axes labeled cars (x) and boats (y).]

Answer: The line has to go through the “point of means,” which is x = 6, y = 4. Its slope is r = +0.50 when measured in standardized units, that is, “y standard deviations per x standard deviation.” In natural units, the slope is r times ( y std dev / x std dev ) = 0.50 times 3/2 = 0.75. So the line must have equation y = 0.75 x plus something, and since this is true when x = 6, y = 4, the “something” must be 4 - 0.75 times 6 = -0.50. So the complete equation is y = 0.75 x - 0.50.

(end of exam)
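
Note on Problem 10: the two steps used in the solution (slope = r times the ratio of standard deviations, then an intercept chosen so the line passes through the point of means) can be checked with a short Python sketch. The function name regression_line is made up for this illustration and is not part of the exam.

```python
def regression_line(mean_x, sd_x, mean_y, sd_y, r):
    """Return (slope, intercept) of the least-squares line from summary statistics.

    Hypothetical helper for illustration only.
    """
    # Slope in natural units: r times (y std dev / x std dev).
    slope = r * sd_y / sd_x
    # The line passes through the point of means, which fixes the intercept.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Summary statistics from Problem 10: CARS (x) and BOATS (y).
slope, intercept = regression_line(mean_x=6, sd_x=2, mean_y=4, sd_y=3, r=0.50)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

With the Problem 10 values this gives a slope of 0.75 and an intercept of -0.50, matching the equation y = 0.75 x - 0.50.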