DS280 – INTRODUCTION TO STATISTICS
FALL SEMESTER 2003
“Big Quiz” #3

Answer the following questions in the space provided. Show your work as appropriate. Relative problem weights are given in brackets; these total 100 points. The word “Pledged” in front of your signature on this “big quiz” indicates your ongoing commitment to the Stetson Honor System.

Question 1 [8 points; 2 points apiece]: Indicate whether each of the following statements is TRUE or FALSE.
a) Income data are typically skewed right.   TRUE   FALSE
b) If two events are mutually exclusive, they are also independent.   TRUE   FALSE
c) It is theoretically possible for the covariance to be negative.   TRUE   FALSE
d) When the Florida Lottery Commission says your odds against winning the lottery are 23 million to one, they are giving a classical probability.   TRUE   FALSE

Question 2 [3 points]: For a normal distribution, about ________ of the data are within one standard deviation of the mean, about ________ are within two standard deviations of the mean, and about ________ are within three standard deviations of the mean.

Question 3 [3 points]: Evaluate $C^{3456}_{2}$.

Question 4 [12 points]: Find the sample mean, median, variance, and standard deviation for the following data:
6   2   9   0   3

Question 5 [8 points]: The table below gives the percentage of forested land in each of the twenty-one provinces of the Kingdom of Boravia.

Province      % forest     Province       % forest     Province     % forest
Athabasca       3.9%       Paranoia         9.1%       Jambalaya     24.7%
Snohomish       5.4%       BamaAla          9.8%       BoraBora      28.6%
Murgatroyd      5.7%       Hoozit          10.5%       Kali          32.9%
Mugwump         6.4%       Prandagloid     12.2%       Lucretia      35.9%
Thalia          7.5%       Mortimer        15.3%       Prunella      39.5%
Ondibrox        7.9%       Nermal          18.5%       Otho          42.0%
Morquitize      8.2%       Lollapalooza    19.7%       Percival      48.9%

Sketch an appropriate graph to illustrate these data. Label it appropriately. State, in a sentence or two, what the graph tells about the data.

Question 6 [18 points, 6 each part]: Daily returns on the Boravian stock market are approximately normally distributed, with a mean of 0.05% and a standard deviation of 1.12%.
a) What percentage of the time is the return on the Boravian stock market positive (i.e., greater than 0%)?
b) What percentage of the time does the Boravian stock market gain more than 10% in a single day?
c) On the worst 1% of the days on the Boravian stock market, what returns occur?

Question 7 [16 points, divided as indicated]: Bettors in the Boravian State Lottery pick three numbers out of ten. They win if all three numbers are correct.
a) [8] If you buy one ticket in the Boravian State Lottery, what is your probability of winning?
b) [8] Lottery tickets cost one Boravian dollar. The lottery pays “100-to-1.” That is, if you win, you receive $100 (your $1 back, plus $99 more). Otherwise, you lose your dollar. What is the expected value of your net winnings? The variance?

Question 8 [16 points]: The data below give the number of hours of sleep the night before a “big quiz”, and the score on the big quiz, for four students in Recreational Statistics class. Find the slope and intercept of the regression model for these data. Interpret these numbers, in context.

Sleep:   Score:
  0        34
  8        98
  6        98
  2        66

Question 9 [4 points]: Several research studies on the extent to which genius is inherited have shown that children of Nobel Prize winners, while generally of above average intelligence, tend not to be as smart as their parents. This is an example of …
______ Simpson’s paradox
______ Chevalier de Mere’s fallacy
______ regression to the mean
______ false positive rate

Question 10 [4 points]: Clorinda Cragdingle owns 42 shares of stock in the Sirius Cybernetics Corporation (SCC). She wants to invest more money in the stock market. Statistically speaking, which of the following investments will reduce her risk (variance) the most?
______ buying more stock in the Sirius Cybernetics Corporation
______ buying stock in the Antares Cybernetics Corporation, a company in the same industry whose stock returns have correlation with SCC of .9.
______ buying stock in Amalgamated Fratostat, a heavy manufacturing company whose stock returns have correlation with SCC of .2.
______ buying stock in Repos-R-Us, an automobile repossession company whose stock returns have correlation with SCC of -.5.

Question 11 [4 points]: Does increased television violence cause increased crime? Researchers at the University of Southern North Dakota have obtained data, for the past twenty years, on number of violent acts per hour of television programming. They have also obtained data on the per capita crime rate, for the same period. The correlation between the two is 0.68. Does this indicate that television violence causes increased crime? Explain.

Question 12 [4 points]: What is a saturation model in regression? Give an example to illustrate.

DS280 – FALL 2003 – “BIG QUIZ” #3 – SOLUTIONS

1a) True
1b) False
1c) True
1d) True

2) two-thirds, 95%, virtually all

3) $C^{3456}_{2} = \frac{3456!}{2!\,3454!} = \frac{3456 \cdot 3455 \cdot 3454 \cdots 2 \cdot 1}{(2 \cdot 1)(3454 \cdot 3453 \cdots 2 \cdot 1)} = \frac{3456 \cdot 3455}{2} = 5{,}970{,}240$

4) mean = (6 + 2 + 9 + 0 + 3)/5 = 20/5 = 4

median: 0 2 3 6 9 (put the data in order first), so the median is the middle value, 3

variance: either compute $\frac{\sum (X - \bar{X})^2}{n - 1}$ or $\frac{\sum X^2 - \frac{1}{n}\left(\sum X\right)^2}{n - 1}$

By the first method: $s^2 = \frac{(6-4)^2 + (2-4)^2 + \cdots + (3-4)^2}{4} = \frac{4 + 4 + 25 + 16 + 1}{4} = \frac{50}{4} = 12.5$

By the second method:

  X     X²
  6     36
  2      4
  9     81
  0      0
  3      9
Totals:
 20    130

So variance = $\frac{130 - \frac{1}{5}(20)^2}{4} = \frac{130 - 80}{4} = 12.5$

standard deviation is the square root of the variance: $\sqrt{12.5} = 3.54$

5) Do a histogram of the data. Any reasonable horizontal axis is OK. One possible graph is given below. (It was done in Excel, which does wretched histograms.) You should also comment on interpretation of the graph (most concentrated in 0-20%; skewed right; outliers in the 40s, etc.)

[Figure: histogram titled “Forested Land Percentages in Boravia”; horizontal axis: Percentage of Forested Land; vertical axis: Frequency]

6a) z = (0 - .05)/1.12 = -.04. Looking up z = .04 in the table gives .0160, the area between the mean and that z-score. So the overall probability is .5 + .0160 = .5160.

6b) z = (10 - .05)/1.12 = 8.88. That is, nearly 9 standard deviations from the mean. So the probability is approximately 0.

6c) We want the bottom 1% of the curve. We look up .5 - .01 = .49 in the table (in the center of the table, with the probabilities), and read off a z-score of 2.33. In other words, the cutoff point is 2.33 standard deviations below the mean. So: .05 – (2.33)*(1.12) = -2.56%. On the worst 1% of days, returns are about -2.56% or lower.

7a) $P(\text{all numbers correct}) = P(\text{1st correct AND 2nd AND 3rd}) = \frac{3}{10} \cdot \frac{2}{9} \cdot \frac{1}{8} = \frac{1}{120}$
OR: There are $C^{10}_{3} = \frac{10!}{3!\,7!} = 120$ possible tickets, so the probability of winning is 1/120.

7b)
Net winnings |   99   |   -1
Probability  | 1/120  | 119/120

E(X) = (99)(1/120) + (-1)(119/120) = -.167
V(X) = [(99)^2 (1/120) + (-1)^2 (119/120)] – (-.167)^2 = 82.64

8) First compute the covariance.

First method:

  X     Y    X-Xbar   Y-Ybar   product
  0    34      -4      -40       160
  8    98       4       24        96
  6    98       2       24        48
  2    66      -2       -8        16
Total:                           320

Covariance = $\frac{320}{3}$ = 106.67

Second method:

  X     Y     X*Y
  0    34       0
  8    98     784
  6    98     588
  2    66     132
Total:
 16   296    1504

Covariance = $\frac{1504 - \frac{1}{4}(16)(296)}{3} = \frac{320}{3}$ = 106.67

Then: slope = rise/run = Covar/Var(X) = 106.67/13.33 = 8, where Var(X) = (16 + 16 + 4 + 4)/3 = 13.33. For each additional hour of sleep, grade increases 8 points, on average.

To get the intercept, plug in the average X and Y values: Y = mX + b → 74 = (8)*(4) + b → b = 42. Folk who get 0 sleep score a 42, on average.

9) regression to the mean

10) buying stock in Repos-R-Us … correlation with SCC of -.5.
NOTE: Since Var(X+Y) = Var(X) + Var(Y) + 2*Cov(X,Y), we can reduce the overall variance the most by having the second stock have a negative covariance with the first.

11) No. Correlation does not imply causality. All we’ve shown is that the two have tended to happen in tandem.

12) A “filling up” process.
After a point, diminishing returns set in and the rate of change declines. The product life cycle is the most common business example – after a while, everyone who plans to buy a DVD player has done so and sales level off.
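
The arithmetic in solutions 3, 4, 6, 7, and 8 can be double-checked by machine. The sketch below is not part of the original answer key; it assumes Python 3.8 or later and uses only the standard library, and every variable name in it is made up for illustration. It takes the normal-distribution values from the exact curve rather than a printed z-table, so the results for Question 6 can differ from the table-based answers in the second or third decimal place.

```python
import math
import statistics
from fractions import Fraction
from statistics import NormalDist

# Solution 3: number of ways to choose 2 items out of 3456.
pairs = math.comb(3456, 2)

# Solution 4: sample statistics (variance and stdev divide by n - 1).
data = [6, 2, 9, 0, 3]
mean = statistics.mean(data)
median = statistics.median(data)
variance = statistics.variance(data)
std_dev = statistics.stdev(data)

# Solution 6: daily returns (in percent) modeled as Normal(0.05, 1.12).
returns = NormalDist(mu=0.05, sigma=1.12)
p_positive = 1 - returns.cdf(0)         # part a: P(return > 0%)
p_gain_over_10 = 1 - returns.cdf(10)    # part b: P(return > 10%)
worst_1pct_cutoff = returns.inv_cdf(0.01)  # part c: 1st percentile

# Solution 7: one ticket out of C(10, 3) equally likely tickets.
p_win = Fraction(1, math.comb(10, 3))
outcomes = {99: p_win, -1: 1 - p_win}   # net winnings -> probability
expected = sum(x * p for x, p in outcomes.items())
var_winnings = sum(x**2 * p for x, p in outcomes.items()) - expected**2

# Solution 8: least-squares slope and intercept from sums of deviations.
sleep = [0, 8, 6, 2]
score = [34, 98, 98, 66]
n = len(sleep)
x_bar = statistics.mean(sleep)
y_bar = statistics.mean(score)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sleep, score))
sxx = sum((x - x_bar) ** 2 for x in sleep)
covariance = sxy / (n - 1)
slope = sxy / sxx                  # same as Cov/Var(X); the n - 1 cancels
intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)

print(pairs, mean, median, variance, round(std_dev, 2))
print(round(p_positive, 4), p_gain_over_10, round(worst_1pct_cutoff, 2))
print(expected, float(var_winnings))
print(round(covariance, 2), slope, intercept)
```

Using `Fraction` for the lottery keeps the expected value and variance exact (-1/6 and 2975/36) until they are printed. For Question 6, the exact curve gives about 51.8% for part a, against .5160 from the table, because the table lookup first rounds z to -.04; the part c cutoff still rounds to -2.56%.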