Stat 301 HW 7
Due: 30 Oct. / 2 Nov. 2015

Midterm II will cover course material from correlations through extrapolation (lecture on 23 Oct. with a little bit on 26 Oct.). This is the content on Homeworks 5, 6, and 7. Text chapters to be covered are: 3.7, Chapter 8 (but not 8.6), Chapter 4 (but not 4.12), Chapter 7.

1. Text problem 7.10 (p. 378-379), with new questions. The data are in steer.txt. Again, there is a typo in one value that I have corrected in the data set on the class web page. Remember, the context for this analysis is to determine whether the consumer bringing in a 300 lb steer (live weight) was shorted, i.e. received less meat (lower dressed weight) than they should have.

   (a) Fit the linear model E Y = β0 + β1 X, where Y is the dressed weight and X is the live weight. Report the estimated regression coefficients.

   (b) Use the linear model to predict the dressed weight for two steers, one with 300 lb live weight and one with 400 lb live weight.

   (c) What is the 95% prediction interval (i.e., for one observation) for a steer with 300 lb live weight?

   (d) Based on the results of the last few questions, does it seem that the customer receiving only 150 lbs was shorted? Briefly explain why or why not.

   (e) Fit a quadratic model E Y = β0 + β1 X + β2 X^2. Report the estimated regression coefficients.

   (f) Based on results from fitting the quadratic model (not just the estimated coefficients), what can you say about lack of fit of the linear model in question 1a?

   (g) Use the quadratic model to predict the dressed weight for two steers, one with 300 lb live weight and one with 400 lb live weight.

   (h) Does the choice of model (linear or quadratic) make a substantial difference to the prediction for the 400 lb live weight steer? For the 300 lb live weight steer?

   (i) Briefly explain why what you found in question 1h should be expected.

   (j) What is the 95% prediction interval for a steer with 300 lb live weight when you use the quadratic model?

   (k) Based on everything you have done, does it seem that the customer receiving only 150 lbs was shorted? Briefly explain why or why not.

2. The data in store.txt were collected as part of a management review of a large metropolitan department store. The variable of interest, hours worked, is the total number of hours worked by the clerical staff. So, if 3 people worked 8 hours and one person worked 6 hours, the total number of hours worked for that day is 3*8 + 6 = 30. The other variables are counts of the numbers of various types of documents processed by the clerical department that day: number of pieces of mail, gift certificates, charge account inquiries, change orders, checks cashed, miscellaneous mail items, and bus tickets.

   Fit a multiple regression model that predicts the total number of hours worked from numbers of each type of document (i.e., the other seven variables). Use results and other information from that fit to answer the following questions:

   (a) Even though you fit a regression model with seven X variables, you are most interested in two of them: the number of checks or the number of miscellaneous items. Which of those two variables is more important at predicting the number of hours worked? Explain your choice. Some additional information that may be useful:

       Variable     Checks   Misc. items
       minimum      334      30
       maximum      1081     86
       std. dev.    184      13.8

       Note: We talked about this a while ago.

   (b) Which of the seven X variables has the highest standardized beta? Which of the seven X variables has the lowest standardized beta?

       Notes: Lowest means "closest to zero". Again, we talked about this a while ago. The book discusses standardized betas = standardized regression coefficients on p. 362.

   (c) I am surprised by the negative coefficient for change orders. You wonder whether there is large multicollinearity in these X variables. Is there an issue with multicollinearity for any of the seven variables? If so, which variables are you concerned about? Briefly explain your answer.

   (d) The dept. store plans to use this model to predict workload (i.e., hours that will need to be worked) when the clerical department has to process various combinations of types of documents. Do you have any concerns about extrapolating outside the range of the data if you have to use this model to predict at three new observations, labeled X1, X2, and X3 in the store.txt data set? For your convenience, I have also included an observation labeled mean that has the mean value of each explanatory variable. State which observations (X1, X2, or X3) you have concerns about and briefly explain why you have a concern.
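
For questions 1(b) and 1(c), the arithmetic behind a prediction and a 95% prediction interval from a simple linear regression can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software or method required for the course: the function names fit_line and prediction_interval are hypothetical, the numbers in the example are made-up placeholder values (not the contents of steer.txt), and the t critical value is supplied by hand from a t table with n - 2 degrees of freedom.

```python
import math


def fit_line(x, y):
    """Least-squares fit of E Y = b0 + b1 * X (hypothetical helper)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation, n - 2 df
    return {"b0": b0, "b1": b1, "s": s, "xbar": xbar, "sxx": sxx, "n": n}


def prediction_interval(fit, x0, t_crit):
    """Interval for ONE new observation at X = x0 (hypothetical helper).

    t_crit is the two-sided critical value, e.g. the 0.975 quantile of
    the t distribution with n - 2 df for a 95% interval.
    """
    yhat = fit["b0"] + fit["b1"] * x0
    # The leading "1 +" is what separates a prediction interval for a
    # single observation from a confidence interval for the mean.
    se_pred = fit["s"] * math.sqrt(
        1 + 1 / fit["n"] + (x0 - fit["xbar"]) ** 2 / fit["sxx"]
    )
    return yhat - t_crit * se_pred, yhat, yhat + t_crit * se_pred


if __name__ == "__main__":
    # Placeholder data for illustration only; NOT the steer data.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    fit = fit_line(x, y)
    # n = 5, so df = 3; the 0.975 t quantile with 3 df is about 3.182.
    lower, yhat, upper = prediction_interval(fit, 3.0, 3.182)
    print(fit["b0"], fit["b1"])
    print(lower, yhat, upper)
```

The (x0 - xbar)^2 / Sxx term makes the interval wider as x0 moves away from the mean of X, which is worth keeping in mind when comparing predictions at 300 lb and 400 lb. The same formula does not carry over directly to the quadratic model in question 1(j), where the standard error depends on both X and X^2.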