Student’s Solutions Manual and Study Guide: Chapter 14
Chapter 14
Building Multiple Regression Models
LEARNING OBJECTIVES
This chapter presents several advanced topics in multiple regression analysis
enabling you to:
1. Generalize linear regression models as polynomial regression models using
model transformation and Tukey’s ladder of transformations, accounting for
possible interaction among the independent variables.
2. Examine the role of indicator, or dummy, variables as predictors or
independent variables in multiple regression analysis.
3. Use all possible regressions, stepwise regression, forward selection, and
backward elimination search
procedures to develop regression models that account for the most variation in
the dependent variable and are parsimonious.
4. Recognize when multicollinearity is present, understanding general techniques
for preventing and controlling it.
5. Explain when to use logistic regression, and interpret its results.
CHAPTER OUTLINE
14.1
Nonlinear Models: Mathematical Transformation
Polynomial Regression
Tukey’s Ladder of Transformations
Regression Models with Interaction
Model Transformation
14.2
Indicator (Dummy) Variables
14.3
Model-Building: Search Procedures
Search Procedures
All Possible Regressions
Stepwise Regression
Forward Selection
Backward Elimination
14.4
Multicollinearity
14.5
Logistic Regression
KEY TERMS
All Possible Regressions
Backward Elimination
Dummy Variable
Forward Selection
Indicator Variable
Multicollinearity
Quadratic Regression Model
Qualitative Variable
Search Procedures
Stepwise Regression
Tukey’s Four-quadrant Approach
Tukey’s Ladder of Transformations
Variance Inflation Factor (VIF)
STUDY QUESTIONS
1. Another name for an indicator variable is a ________________ variable. These
variables are _____________________ as opposed to quantitative variables.
2. Indicator variables are coded using __________ and _________.
3. Suppose an indicator variable has four categories. In coding this into variables
for multiple regression analysis, there should be _______________ variables.
4. Regression models in which the highest power of any predictor variable is one
and in which there are no interaction terms are referred to as
________________________ models.
5. The interaction of two variables can be studied in multiple regression using the
_______________ terms.
6. Suppose a researcher wants to analyze a set of data using the model ŷ = b0b1^x.
The model would be transformed by taking the _______________________ of
both sides of the equation.
7. Perhaps the most widely known and used of the multiple regression search
procedures is _______________________ regression.
8. One multiple regression search procedure is Forward Selection. Forward
selection is essentially the same as stepwise regression except that
_________________________.
9. Backward elimination is a step-by-step process that begins with the
_________________________ model.
10. A search procedure that computes all the possible linear multiple regression
models from the data using all variables is called
___________________________________.
11. When two or more of the independent variables of a multiple regression model
are highly correlated it is referred to as
___________________________________________. This condition causes
several other problems to occur including
(1) difficulty in interpreting
__________________________________________.
(2) Inordinately small ______________________ for the regression coefficients
may result.
(3) The standard deviations of regression coefficients are
________________________.
(4) The ____________________________ of estimated regression coefficients
may be the opposite of what would be expected for a particular predictor
variable.
ANSWERS TO STUDY QUESTIONS
1. Dummy, Qualitative
2. 0, 1
3. 3
4. First-Order
5. x1 · x2 or Cross Product
6. Logarithm
7. Stepwise
8. Once a variable is entered into the process, it is never removed
9. Full
10. All Possible Regressions
11. Multicollinearity, the Estimates of the Regression Coefficients,
t Values, Overestimated, Algebraic Sign
SOLUTIONS TO PROBLEMS IN CHAPTER 14
14.1
Simple regression model:
ŷ = –147.27 + 27.128 x
F = 229.67 with p = .000, se = 27.27, R² = .97, adjusted R² = .966, and
t = 15.15 (for x) with p = .000. This is a very strong simple regression
model.
Quadratic model (using both x and x²):
ŷ = –22.01 + 3.385 x + 0.9373 x²
F = 578.76 with p = .000, se = 12.3, R² = .995, adjusted R² = .993; for x:
t = 0.75 with p = .483, and for x²: t = 5.33 with p = .002. The quadratic
model is also very strong, with an even higher R² value. However, in this
model only the x² term is a significant predictor.
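As a rough illustration of the comparison above, the following Python sketch fits a simple and a quadratic model and compares them. The data arrays are hypothetical placeholders, since the Problem 14.1 data are not reproduced in this guide.

import numpy as np
import statsmodels.api as sm

# Hypothetical data standing in for the Problem 14.1 values
x = np.array([2, 4, 6, 8, 10, 12, 14, 16], dtype=float)
y = np.array([5, 60, 160, 300, 480, 700, 970, 1280], dtype=float)

# Simple model: y = b0 + b1*x
simple = sm.OLS(y, sm.add_constant(x)).fit()

# Quadratic model: y = b0 + b1*x + b2*x^2
quad = sm.OLS(y, sm.add_constant(np.column_stack([x, x**2]))).fit()

print(simple.rsquared_adj, quad.rsquared_adj)  # compare adjusted R-squared
print(quad.tvalues, quad.pvalues)              # which terms are significant?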
14.3
Simple regression model:
ŷ = –1,456.6 + 71.017 x
R² = .928 and adjusted R² = .910; t = 7.17 (for x) with p = .002.
Quadratic regression model:
ŷ = 1,012 – 14.06 x + 0.6115 x²
R² = .947 but adjusted R² = .911. The t statistic for the x term is t = –0.17
with p = .876; the t statistic for the x² term is t = 1.03 with p = .377.
Neither predictor is significant in the quadratic model. Also, the adjusted
R² for this model is virtually identical to that of the simple regression
model. The quadratic model adds virtually no predictability that the simple
regression model does not already have. The scatter plot of the data follows:
[Scatter plot: Ad Exp (vertical axis, 1,000 to 7,000) versus Eq & Sup Exp (horizontal axis, 30 to 110).]
14.5
The regression model is:
ŷ = –28.61 – 2.68 x1 + 18.25 x2 – 0.2135 x1² – 1.533 x2² + 1.226 x1x2
F = 63.43 with p = .000, significant at α = .001;
se = 4.669, R² = .958, and adjusted R² = .943.
None of the t statistics for this model are significant: t(x1) = 0.25 with
p = .805, t(x2) = 0.91 with p = .378, t(x1²) = –0.33 with p = .745,
t(x2²) = –0.68 with p = .506, and t(x1x2) = 0.52 with p = .613. This model has
a high R², yet none of the predictors are individually significant.
The same thing occurs when the interaction term is not in the model:
none of the t statistics are significant, and R² remains high at .957,
indicating that the loss of the interaction term is inconsequential.
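For readers who want to reproduce this kind of fit, here is a minimal Python sketch of a full second-order model with an interaction term. The simulated data are placeholders, not the Problem 14.5 data; the point is only the construction of the x1², x2², and x1x2 columns.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x1 = rng.uniform(0, 10, 30)                     # hypothetical predictor 1
x2 = rng.uniform(0, 10, 30)                     # hypothetical predictor 2
y = 5 + 2*x1 + 3*x2 + 0.5*x1*x2 + rng.normal(scale=4, size=30)

# Columns: x1, x2, x1^2, x2^2, and the cross-product (interaction) term
X = sm.add_constant(np.column_stack([x1, x2, x1**2, x2**2, x1*x2]))
fit = sm.OLS(y, X).fit()
print(fit.rsquared, fit.tvalues)   # a high R-squared can coexist with weak individual t's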
14.7
The regression equation is:
ŷ = 13.619 - 0.01201 x1 + 2.998 x2
The overall F = 8.43 is significant at α = .01 (p = .009);
se = 1.245, R² = .652, adjusted R² = .575.
The t statistic for the x1 variable is only t = –0.14 with p = .893. However,
the t statistic for the dummy variable, x2, is t = 3.88 with p = .004. The
indicator variable is the significant predictor in this regression model,
which has some predictability (adjusted R² = .575).
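The role the dummy variable plays here can be seen in a small Python sketch: the 0/1 column simply shifts the intercept for one category. The numbers below are illustrative placeholders, not the Problem 14.7 data.

import numpy as np
import statsmodels.api as sm

# Hypothetical data: x1 quantitative, x2 a 0/1 indicator (dummy)
x1 = np.array([28, 43, 45, 62, 60, 58, 20, 35, 51, 39], dtype=float)
x2 = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0], dtype=float)
y  = np.array([14, 19, 20, 15, 21, 14, 13, 18, 19, 16], dtype=float)

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()
print(model.params)    # b0; b1 for x1; b2 = intercept shift when the dummy is 1
print(model.tvalues, model.pvalues)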
14.9
This regression model has relatively strong predictability, as indicated by
R² = .795. Of the three predictor variables, only x1 and x2 have significant
t statistics (using α = .05). x3 (a non-indicator variable) is not a significant
predictor. x1, the indicator variable, plays a significant role in this model
along with x2.
14.11 The regression equation is:
Price = 3.4394 - 0.0195 Hours + 9.113 ProbSeat + 10.528 Downtown
The overall F = 6.58 with p = .0099, which is significant at α = .01. se =
3.94, R² = .664, and adjusted R² = .563. The difference between R² and
adjusted R² indicates that there are some non-significant predictors in the
model. The t statistics, t = –0.13 with p = .901 and t = 1.34 with p = .209,
for Hours and Probability of Being Seated are non-significant at α = .05.
The only significant predictor is the dummy variable, Downtown location
or not, which has a t statistic of 3.95 with p = .003, significant at
α = .01. The positive coefficient on this variable indicates that a downtown
location adds to the price of a meal.
14.13 Stepwise Regression:
Step 1:
After developing a simple regression model for each
independent variable, we select the model with x2, for which
t = –7.35 and R² = .794. The model is ŷ = 36.15 – 0.146 x2.
Step 2:
x3 enters the model and x2 remains in the model:
t for x2 is –4.60, t for x3 is 2.93, and R² = .876.
The model is ŷ = 26.40 – 0.101 x2 + 0.116 x3.
Step 3:
The regression model containing x1 in
addition to x2 and x3 is explored. Adding x1 does not
produce a significant result, so no new variable is added
to the model produced in Step 2. Note that at every step
of the procedure, x1 appears to be non-significant.
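The search logic used in these stepwise solutions can be sketched in code. The Python fragment below implements plain forward selection (a simplified cousin of stepwise regression that never removes an entered variable) on simulated placeholder data; the .05 entry threshold is an assumption, not the textbook's setting.

import numpy as np
import statsmodels.api as sm

# Simulated placeholder data: y depends on columns 1 and 2, not column 0
rng = np.random.default_rng(1)
n = 25
X = rng.normal(size=(n, 3))
y = 36 - 0.15 * X[:, 1] + 0.10 * X[:, 2] + rng.normal(scale=0.05, size=n)

selected, remaining = [], [0, 1, 2]
while remaining:
    best, best_t, best_p = None, 0.0, 1.0
    for j in remaining:
        fit = sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit()
        if abs(fit.tvalues[-1]) > best_t:       # candidate j is the last column
            best, best_t, best_p = j, abs(fit.tvalues[-1]), fit.pvalues[-1]
    if best is None or best_p > 0.05:           # stop: no significant addition
        break
    selected.append(best)
    remaining.remove(best)

print("entered, in order:", selected)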
14.15 The output shows that the final model has four predictor variables: x3, x1,
x2, and x6. The variables x4 and x5 did not enter the stepwise analysis.
The procedure took four steps. The final model is:
ŷ = 5.96 – 5.00 x3 + 3.22 x1 + 1.78 x2 + 1.56 x6
The R² for this model is .5929, and se is 3.36. The t ratios are:
x3: t = 3.07; x1: t = 2.05; x2: t = 2.02; and x6: t = 1.98.
14.17 Stepwise Regression:
Step 1:
After developing a simple regression model for each
independent variable, we select the model for Durability
with t = 3.32. For this model, R² = .379 and se = 15.48.
The regression equation is:
Amount Spent = 17.093 + 7.135 Durability
Step 2:
Regression models containing Value
or Service in addition to Durability are explored. The t value
of the regression coefficient is significant for neither
Value nor Service, so no new variable is added to the
model produced in Step 1.
14.19 The correlation matrix is:

             y        x1       x2       x3
    y       1       –.653    –.891     .821
    x1     –.653     1        .650    –.615
    x2     –.891     .650     1       –.688
    x3      .821    –.615    –.688     1
There appears to be some correlation between all pairs of the predictor
variables, x1, x2, and x3. All pairwise correlations between independent
variables are in the .600 to .700 range.
14.21 The predictor intercorrelations are:

                 Value   Durability   Service
    Value         1         .559        .533
    Durability   .559        1          .364
    Service      .533       .364         1
An examination of the predictor intercorrelations reveals that Service and
Durability have very little correlation, but Value and Durability have a
correlation of .559 and Value and Service a correlation of .533. These
correlations might suggest multicollinearity.
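A common follow-up to an intercorrelation table like this is to compute variance inflation factors. The Python sketch below does so on simulated data built to roughly mimic the correlations above; the data and the rule-of-thumb VIF cutoff of 10 are assumptions for illustration, not the textbook data.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated placeholders built to roughly mimic the table above
rng = np.random.default_rng(0)
value = rng.normal(size=30)
durability = 0.56 * value + rng.normal(scale=0.8, size=30)   # r with Value ~ .56
service = 0.53 * value + rng.normal(scale=0.85, size=30)     # r with Value ~ .53

X = sm.add_constant(np.column_stack([value, durability, service]))
for i, name in enumerate(["Value", "Durability", "Service"], start=1):
    print(name, variance_inflation_factor(X, i))   # VIF well above 10 is a warning sign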
14.23 The log of the odds ratio, or logit, equation is:
ln(S) = –0.932546 – 0.0000323 × Payroll Expenditures
The G statistic is 11.175, which with one degree of freedom has a p-value
of 0.001. Thus, there is overall significance in this model.
The predictor, Payroll Expenditures, is significant at α = .01 because the
associated p-value of 0.008 is less than α = .01.
If the payroll expenditures are $80,000, then
ln(S) = –0.932546 – 0.0000323 (80,000) = –3.516546
S = e^(–3.516546) = 0.0297
From this, the probability that the hospital with the $80,000 payroll
expenditure is a psychiatric hospital can be determined by
p = S / (S + 1) = 0.0297 / (0.0297 + 1) = 0.0288, or 2.88%.
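The arithmetic above is easy to verify in a few lines of Python; this reproduces the logit-to-probability steps exactly as shown.

import math

b0, b1 = -0.932546, -0.0000323   # fitted logit coefficients from above
payroll = 80000

log_odds = b0 + b1 * payroll     # ln(S) = -3.516546
odds = math.exp(log_odds)        # S = 0.0297
prob = odds / (odds + 1)         # about 0.0288, i.e. 2.88%
print(log_odds, odds, prob)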
14.25 The log of the odds ratio, or logit, equation is:
ln(S) = –3.07942 + 0.0544532 × Number of Production Workers
The G statistic is 97.492, which with one degree of freedom has a p-value
of 0.000. Thus, there is overall significance in this model.
The p-value associated with the predictor variable, Number of Production
Workers, is 0.000. This indicates that Number of Production Workers is a
significant predictor in the model at α = .001.
If the number of production workers is 30, then
ln(S) = –3.07942 + 0.0544532 (30) = –1.445824
S = e^(–1.445824) = 0.23555
From this, the probability that a company with 30 production
workers has a large value of industrial shipments can be determined by
p = S / (S + 1) = 0.23555 / (0.23555 + 1) = 0.1906, or 19.06%.
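If one wanted to refit a model of this form from raw data, statsmodels provides a logistic (logit) routine. The sketch below simulates placeholder data from the fitted equation above and recovers coefficients close to it; the simulated sample is an assumption, not the textbook data.

import numpy as np
import statsmodels.api as sm

# Simulate placeholder data from the fitted equation above
rng = np.random.default_rng(3)
workers = rng.integers(5, 120, size=200).astype(float)
p_true = 1 / (1 + np.exp(-(-3.07942 + 0.0544532 * workers)))
large = rng.binomial(1, p_true)           # 1 = large value of shipments

fit = sm.Logit(large, sm.add_constant(workers)).fit(disp=0)
print(fit.params)                         # should land near (-3.08, 0.054)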
14.27 The regression model is:
ŷ = 564.2 - 27.99 x1 - 6.155 x2 - 15.90 x3
F = 11.32 with p = .003, se = 42.88, R² = .809, adjusted R² = .738. Thus,
overall there is statistical significance at α = .01. For x1, t = –0.92 with p =
.384; for x2, t = –4.34 with p = .002; for x3, t = –0.71 with p = .497. Thus,
only one of the three predictors, x2, is a significant predictor in this model.
This model has very good predictability (R² = .809). The gap between R²
and adjusted R² underscores the fact that there are two non-significant
predictors in the model. x1 is a non-significant indicator variable.
14.29 Stepwise Regression:
Step 1:
After developing a simple regression model for each
independent variable (x1, Log x1), we select the
model for Log x1 because it has the larger absolute value
of t = 17.36 (p-value of 0.000). For this model, R² =
.9617. The model appears in the form:
ŷ = - 13.20 + 11.64 Log x1.
Step 2:
The two-predictor regression model containing x1
in addition to Log x1 is explored. At this step, the t ratio
for x1 is 0.90 with a p-value of 0.386, indicating that the
predictor x1 is non-significant. No new variable is added to
the model produced in Step 1.
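A minimal Python sketch of the winning Step 1 fit, regressing y on Log x1 (one rung of Tukey's ladder of transformations), follows. The data are hypothetical placeholders, and base-10 logs are assumed here, since the base used in the solution is not stated.

import numpy as np
import statsmodels.api as sm

# Hypothetical data in which y is roughly linear in log10(x1)
x1 = np.array([10, 30, 80, 200, 500, 1200, 3000, 8000], dtype=float)
y  = np.array([-1.5, 4.0, 9.0, 13.5, 18.0, 22.5, 27.0, 32.0])

fit = sm.OLS(y, sm.add_constant(np.log10(x1))).fit()
print(fit.params, fit.rsquared)    # intercept and slope on log10(x1)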
14.31 Stepwise Regression:
Step 1:
After developing a simple regression model for each
independent variable (Copper, Silver, Aluminum), we
select the model with Silver because it has the largest
absolute t statistic: tSilver = 3.32 (p-value of 0.007).
The predictor Silver is significant at α = .01. For this
model, R² = 0.5244. The regression equation is
Gold = 233.4 + 17.74 Silver.
Step 2:
The two-predictor regression models containing
Copper (or Aluminum) in addition to Silver are explored. At
this step, analysis of the t statistics shows the best model:
Gold = –50.07 + 18.86 Silver + 3.587 Aluminum.
The R² at this step is .8204, the t ratio for Silver is 5.43
with p = .0004, and the t ratio for Aluminum is 3.85 with
p = .004.
Step 3:
A search is made to determine whether the variable Copper,
in conjunction with Silver and Aluminum, produces a
significant absolute t value in the model. The
model does not produce a significant result, so no new
variable is added to the model produced in Step 2.
14.33 Let Beef = x1, Chicken = x2, Eggs = x3, Bread = x4, Coffee = x5, and Price
Index = y.
Stepwise Regression:
Step 1:
Using graphs and Tukey’s ladder of transformations, we
develop a simple regression model for each independent
variable (x1, Log x2, x3, x4, x5). We select the model for
x1 because it has the largest absolute value of
t = 13.67. For this model, R² = .8696. The model appears
in the form ŷ = 93.62 + 0.2080 x1.
Step 2:
The two-predictor regression models containing
Log x2 (or x3, x4, x5) in addition to x1 are explored. At this
step, analysis of the t statistics shows the best model:
ŷ = 86.96 + 0.1427 x1 + 0.08561 x4.
The R² at this step is .9033, the t ratio for x1 is 5.67
with p = .000, and the t ratio for x4 is 3.06 with
p = .005 (significant at α = .01).
Step 3:
A search is made to determine which of the remaining
independent variables in conjunction with x1 and x4
produces the largest significant absolute t value in the
model. None of the models produce significant results. No
new variables are added to the model produced in Step 2.
14.35 Stepwise Regression:
Step 1:
After developing a simple regression model for each
independent variable (Familiarity, Satisfaction, Proximity),
we select the model with Familiarity because it has the
largest absolute t statistic: tFamiliarity = 6.71 (p-value
of 0.000). The predictor Familiarity is significant at α = .001.
For this model, R² = 0.6167. The regression equation
is
Number of Visits = 0.05488 + 1.0915 Familiarity.
Step 2:
A search is made to determine whether the variable
Satisfaction or Proximity in conjunction with Familiarity
produces a significant absolute t value in the model. None
of the models produce significant results. No new
variables are added to the model produced in Step 1.
14.37 The output shows that the stepwise regression procedure stopped at
Step 3. At Step 1, the model with x3 is selected: R² = .8124, and the t
statistic for x3 is t = 6.90. The regression equation is ŷ = 74.81 + 0.099 x3.
At Step 2, x2 is entered into the model along with x3. The regression
equation is ŷ = 82.18 + 0.067 x3 – 2.26 x2. The t statistics are
t = 3.65 for x3 and t = –2.32 for x2. The R² for this model is .8782.
At Step 3, x1 is entered into the model along with x3 and x2. The procedure
stops here with a final model of ŷ = 87.89 + 0.071 x3 – 2.71 x2 – 0.256 x1.
The t statistics are t = 5.22 for x3, t = –3.71 for x2, and t = –3.08 for x1.
The R² for this model is .9407, indicating very strong predictability.
14.39 The log of the odds ratio, or logit, equation is:
ln(S) = –3.94828 + 1.36988 × Number of kilometres
The G statistic is 100.537 with a p-value of 0.000 (one degree of
freedom). Thus, the model is significant overall. The p-value
associated with the predictor variable, Number of kilometres, is 0.000.
This indicates that Number of kilometres is a significant predictor in the
model at α = .001.
If a shopper drives 5 kilometres to get to the store, then
ln(S) = –3.94828 + 1.36988 (5) = 2.90112
S = e^(2.90112) = 18.1945
From this, the probability that the person would purchase something
can be determined by
p = S / (S + 1) = 18.1945 / (18.1945 + 1) = 0.948, or about 95%.
This indicates that there is a very high probability that a person who
drives 5 kilometres would purchase something.
For 4 kilometres, the probability drops to .822; for 3 kilometres, to
.540 (almost a coin toss); for 2 kilometres, to .230; and for 1 kilometre,
to .071.
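The distance-by-distance probabilities quoted above can be reproduced directly from the fitted logit, as in this short Python sketch.

import math

def purchase_prob(km):
    log_odds = -3.94828 + 1.36988 * km
    odds = math.exp(log_odds)
    return odds / (odds + 1)

for km in range(1, 6):
    print(km, round(purchase_prob(km), 3))
# prints roughly .071, .230, .540, .822, .948, matching the values above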
Legal Notice
Copyright
Copyright © 2014 by John Wiley & Sons Canada, Ltd. or related companies. All
rights reserved.
The data contained in these files are protected by copyright. This manual is
furnished under licence and may be used only in accordance with the terms of
such licence.
The material provided herein may not be downloaded, reproduced, stored in a
retrieval system, modified, made available on a network, used to create
derivative works, or transmitted in any form or by any means, electronic,
mechanical, photocopying, recording, scanning, or otherwise without the prior
written permission of John Wiley & Sons Canada, Ltd.
(MMXIII xii FI)