Chapter 14: Inference for Regression

Use the following to answer questions 1-7:

An old saying in golf is “you drive for show and you putt for dough.” The point is that good putting is more important than long driving for shooting low scores and hence winning money. To see if this is the case, data on the top 69 money-winners on the PGA tour in 1993 are examined. The average number of putts per hole for each player is used to predict their total winnings using the simple linear regression model

1993 winnings = β0 + β1 (average number of putts per hole).

This model was fit to the data using the method of least squares. The following results were obtained from statistical software.

R-squared = 0.081    s = 281,777

Variable       Coefficient    s.e. of Coef
Constant         7,897,179       3,023,782
Avg. Putts      –4,139,198       1,698,371

___ 1. The explanatory variable in this study is
a) 1993 winnings.
b) average number of putts per hole.
c) the slope, β1.
d) –4,139,198.

___ 2. The quantity s = 281,777 is an estimate of the standard deviation σ of the deviations in the simple linear regression model. The degrees of freedom for s are
a) 69.
b) 68.
c) 67.
d) 281,777.

___ 3. The intercept of the least-squares regression line is
a) 7,897,179.
b) –4,139,198.
c) 3,023,782.
d) 1,698,371.

___ 4. Suppose the researchers test the hypotheses H0: β1 = 0, Ha: β1 < 0. The value of the t statistic for this test is
a) 2.61.
b) 2.44.
c) 0.081.
d) –2.44.

___ 5. A 95% confidence interval for the slope β1 in the simple linear regression model is (approximately)
a) 7,897,179 ± 3,023,782.
b) 7,897,179 ± 6,047,564.
c) –4,139,198 ± 1,698,371.
d) –4,139,198 ± 3,396,742.

___ 6. The correlation between 1993 winnings and average number of putts per hole is
a) 0.081.
b) –0.081.
c) 0.285.
d) –0.285.

___ 7. Below is a scatterplot of the 1993 winnings versus the average number of putts per round; below it is a plot of the residuals versus the average number of putts per round.
Which of the following statements is supported by these plots?
a) There is no striking evidence in these plots that the assumptions for regression are violated.
b) The abundance of outliers and influential observations in the plots means that the assumptions for regression are clearly violated.
c) These plots contain dramatic evidence that the standard deviation of the response about the true regression line increases as the average number of putts per round increases.
d) These plots contain many more points than were used to fit the least-squares regression line in the previous problems. Obviously there is a major error present.

Use the following to answer questions 8-13:

Salary data for 1992–1993 for a sample of 15 universities was obtained. We are curious about the relation between mean salaries for assistant professors (junior faculty) and full professors (senior faculty) at a given university. In particular, do universities pay (relatively) high salaries to both assistant and full professors, or are full professors treated much better than assistant professors? In other words, do senior faculty receive high salaries compared to their junior faculty counterparts? Suppose we fit the following simple linear regression model

Full prof. salary = β0 + β1 (Asst. prof. salary).

The variables Full Prof. Salary and Asst. Prof. Salary are the mean salaries for full and assistant professors at a given university. This model was fit to the data using the method of least squares. The following results were obtained from statistical software. Note that salaries were in thousands of dollars. Mean assistant professor salaries were treated as the explanatory variable and mean full professor salaries as the response variable.

R-squared = 0.596    s = 5.503

Variable              Coef       SE of Coef
Constant              15.0658    14.36
Asst. Prof. Salary    1.40827    0.3217

___ 8. The intercept of the least-squares regression line is (approximately)
a) 15.07.
b) 14.36.
c) 1.41.
d) 0.32.

___ 9.
A 90% confidence interval for the slope β1 in the simple linear regression model is (approximately)
a) 1.41 ± 0.57.
b) 1.41 ± 0.32.
c) –1.41 ± 0.57.
d) –1.41 ± 0.32.

___ 10. Suppose the researchers test the hypotheses H0: β1 = 0, Ha: β1 ≠ 0. The value of the t statistic for this test is
a) 0.32.
b) 1.05.
c) 1.41.
d) 4.38.

___ 11. The correlation between mean assistant and full professor salaries is
a) 0.055.
b) 0.355.
c) 0.596.
d) 0.772.

___ 12. Is there strong evidence (and if so, why) that the relationship between mean assistant and full professor salaries is adequately described by a straight line?
a) Yes, because the slope of the least-squares line is positive.
b) Yes, because the P-value for testing if the slope is 0 is quite small.
c) No, because the value of the square of the correlation is relatively small.
d) It is impossible to say, because we are not given the actual value of the correlation.

___ 13. Below is a scatterplot of mean full versus assistant professor salaries (in thousands of dollars). Which of the following statements is supported by the plot?
a) There is no striking evidence in the plot that the assumptions for regression are violated.
b) There appears to be an outlier and/or influential observations in the plot suggesting that our results must be interpreted with caution.
c) The plot contains dramatic evidence that the standard deviation of the response about the true regression line is not even approximately the same everywhere.
d) The plot contains many fewer points than were used to fit the least-squares regression line in the previous problems. Obviously there is a major error present.

___ 14. After a snowstorm in a large metropolitan area, meteorologists took a random sample of several locations and measured depth of the snow along with the water content.
The results were summarized in a computer printout:

LINEAR REGRESSION ANALYSIS
The regression equation is
Water = -0.03039 + 0.10341*Inches
R-squared = 0.95020
DF = 4
T = 8.610
P =

Unfortunately, the printer failed just as the p-value was being displayed. What is the p-value for the hypothesis test H0: β = 0 vs. Ha: β ≠ 0?
a) P < 0.001
b) P = 0.001
c) P = 0.0304
d) P = 0.103
e) P = 0.950

___ 15. Suppose we are given the following data:

Femur    Humerus
38       41
56       63
59       70
64       72
74       84

We wish to construct a 96% confidence interval for the true slope of the regression line relating humerus (dependent) to femur (independent) lengths. What is the value of SEb, the standard error of the slope? HINT: Do NOT use the formula from the book. Instead, use your calculator in a clever way. Even though you are not asked to conduct a test, use your calculator to perform a LinRegTTest to acquire values for t and b. Then use the relationship you know between t, b, and SEb to find SEb.
a) 13.1985
b) 15.8902
c) 1.9820
d) 0.0751
e) 11.7853

Answer Key

 1. b    2. c    3. a    4. d    5. d    6. d    7. a    8. a
 9. a   10. d   11. d   12. b   13. b   14. b   15. d
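
The numerical answers in the key can be checked from the printed software output. The following is a minimal sketch in Python (standard library only), not part of the original test. The function names are illustrative. The critical values t* are hard-coded from a standard t table (about 2 for 67 degrees of freedom at 95%, 1.771 for 13 degrees of freedom at 90%) rather than computed. The femur and humerus values are assumed to be paired row by row as listed in question 15.

```python
import math


def t_statistic(coef, se):
    """t statistic for H0: beta = 0 is the estimate divided by its standard error."""
    return coef / se


def correlation_from_r2(r_squared, slope):
    """In simple linear regression, r = sqrt(R^2) with the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


def slope_and_se(x, y):
    """Least-squares slope b and its standard error SEb = s / sqrt(Sxx)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # regression standard error, n - 2 df
    return b, s / math.sqrt(sxx)


# Questions 2, 4, 5, 6: golf data, n = 69 players
golf_df = 69 - 2
golf_t = t_statistic(-4_139_198, 1_698_371)
golf_margin = 2 * 1_698_371  # t* is approximately 2 for 67 df (table value)
golf_r = correlation_from_r2(0.081, -4_139_198)

# Questions 9, 10, 11: salary data, n = 15 universities
salary_margin = 1.771 * 0.3217  # t* = 1.771 for 13 df at 90% (table value)
salary_t = t_statistic(1.40827, 0.3217)
salary_r = correlation_from_r2(0.596, 1.40827)

# Question 15: SEb from the raw data (pairing assumed as listed)
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]
b, se_b = slope_and_se(femur, humerus)

print(f"golf: df={golf_df}, t={golf_t:.2f}, margin={golf_margin:,}, r={golf_r:.3f}")
print(f"salary: margin={salary_margin:.2f}, t={salary_t:.2f}, r={salary_r:.3f}")
print(f"femur/humerus: b={b:.4f}, SEb={se_b:.4f}, t={b / se_b:.2f}")
```

The last line also illustrates the hint in question 15: since t = b / SEb, the standard error can be recovered as SEb = b / t from the LinRegTTest output.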