Chapter 14: Multiple Regression
McGraw-Hill/Irwin. Copyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved.

Multiple Regression
14.1 The Multiple Regression Model and the Least Squares Point Estimate
14.2 Model Assumptions and the Standard Error
14.3 R² and Adjusted R²
14.4 The Overall F Test
14.5 Testing the Significance of an Independent Variable
14.6 Confidence and Prediction Intervals
14.7 Using Dummy Variables to Model Qualitative Independent Variables
14.8 The Partial F Test: Testing the Significance of a Portion of a Regression Model
14.9 Residual Analysis in Multiple Regression

The Multiple Regression Model
• Simple linear regression uses one independent variable to explain the dependent variable
– Some relationships are too complex to be described using a single independent variable
• Multiple regression uses two or more independent variables to describe the dependent variable
– This allows multiple regression models to handle more complex situations
– There is no limit to the number of independent variables a model can use
• Multiple regression still has only one dependent variable

The Multiple Regression Model
• The linear regression model relating y to x1, x2, …, xk is
y = β0 + β1x1 + β2x2 + … + βkxk + ε
• μy = β0 + β1x1 + β2x2 + … + βkxk is the mean value of the dependent variable y when the values of the independent variables are x1, x2, …, xk
• β0, β1, β2, …, βk are unknown regression parameters relating the mean value of y to x1, x2, …, xk
• ε is an error term that describes the effects on y of all factors other than the independent variables x1, x2, …, xk

Example 14.1: The Fuel Consumption Case

Week  Average Hourly Temperature, x1 (°F)  Chill Index, x2  Fuel Consumption, y (MMcf)
1     28.0   18   12.4
2     28.0   14   11.7
3     32.5   24   12.4
4     39.0   22   10.8
5     45.9    8    9.4
6     57.8   16    9.5
7     58.1    1    8.0
8     62.5    0    7.5

[Figure: Example 14.1 — weekly fuel consumption versus average hourly temperature]
[Figure: Example 14.1 — weekly fuel consumption versus the chill index]
[Figure: Example 14.1 — a geometrical interpretation of the regression model]

The Least Squares Point Estimates
• The estimation/prediction equation
ŷ = b0 + b1x01 + b2x02 + … + bkx0k
is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02, …, x0k
• It is also the point prediction of an individual value of the dependent variable for those values of the independent variables
• b0, b1, b2, …, bk are the least squares point estimates of the parameters β0, β1, β2, …, βk
• x01, x02, …, x0k are specified values of the independent (predictor) variables x1, x2, …, xk

Calculating the Model
• A formula exists for computing the least squares estimates in multiple regression
• The formula is written using matrix algebra and is presented in Appendix G of the CD-ROM
• In practice, the model can be computed easily using Excel, MINITAB, MegaStat or many other computer packages

[Figure 14.4a: Fuel Consumption Case MINITAB output]
[Figure 14.4b: Fuel Consumption Case Excel output]
[Table 14.2: Point predictions and residuals using the least squares estimates]

The Multiple Regression Model
y = β0 + β1x1 + β2x2 + … + βkxk + ε
1. μy = β0 + β1x1 + β2x2 + … + βkxk is the mean value of the dependent variable
2. β0, β1, β2, …, βk are unknown regression parameters relating the mean value of y to x1, x2, …, xk
3. ε is an error term
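For this small data set, the least squares point estimates can be reproduced without a statistics package by solving the normal equations (X′X)b = X′y directly. The sketch below is illustrative (the helper name solve_normal_equations is not from the text); pure-Python Gaussian elimination stands in for the Appendix G matrix formula.

```python
# Least squares fit of y = b0 + b1*x1 + b2*x2 for the fuel consumption data
# (Example 14.1) by solving the normal equations (X'X)b = X'y.
x1 = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]   # average hourly temperature (F)
x2 = [18, 14, 24, 22, 8, 16, 1, 0]                       # chill index
y  = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]        # fuel consumption (MMcf)

X = [[1.0, a, c] for a, c in zip(x1, x2)]                # design matrix with intercept column

def solve_normal_equations(X, y):
    p = len(X[0])
    # Build the augmented system [X'X | X'y]
    M = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):                                 # elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    b = [0.0] * p
    for i in range(p - 1, -1, -1):                       # back substitution
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b

b0, b1, b2 = solve_normal_equations(X, y)
print(round(b0, 4), round(b1, 5), round(b2, 5))          # close to 13.1087, -0.09001, 0.08249
```

The printed estimates match the fitted equation ŷ = 13.1087 − 0.09001x1 + 0.08249x2 reported in the MINITAB and Excel output for this case.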
Model Assumptions and the Standard Error
• The model is y = β0 + β1x1 + β2x2 + … + βkxk + ε
• The assumptions for multiple regression are stated about the model error terms, the ε's

The Regression Model Assumptions
• Mean of Zero Assumption: The mean of the error terms is equal to 0
• Constant Variance Assumption: The variance of the error terms, σ², is the same for every combination of values of x1, x2, …, xk
• Normality Assumption: The error terms follow a normal distribution for every combination of values of x1, x2, …, xk
• Independence Assumption: The values of the error terms are statistically independent of each other

Sum of Squared Errors
SSE = Σ ei² = Σ (yi − ŷi)²

Mean Square Error
• The point estimate of the residual variance σ² is
s² = MSE = SSE / (n − (k + 1))
• SSE is the sum of squared errors from the previous slide
• This formula differs slightly from simple regression (the denominator is n − (k + 1) rather than n − 2)

Standard Error
• The point estimate of the residual standard deviation σ is
s = √MSE = √(SSE / (n − (k + 1)))
• MSE is the mean square error from the previous slide
• This formula too differs slightly from simple regression

[Figure: Fuel Consumption Case MINITAB output]

R² and Adjusted R²
1. Total variation is given by the formula Σ(yi − ȳ)²
2. Explained variation is given by the formula Σ(ŷi − ȳ)²
3. Unexplained variation is given by the formula Σ(yi − ŷi)²
4. Total variation is the sum of explained and unexplained variation
5. The multiple coefficient of determination R² is the ratio of explained variation to total variation
6. R² is the proportion of the total variation that is explained by the overall regression model
7. The multiple correlation coefficient R is the square root of R²

What Does R² Mean?
The multiple coefficient of determination, R², is the proportion of the total variation in the n observed values of the dependent variable that is explained by the multiple regression model
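The quantities above can be verified numerically for the fuel consumption case. The sketch below assumes only the data of Example 14.1 and the least squares estimates reported in the output; everything else is plain arithmetic.

```python
# SSE, standard error, R-squared and adjusted R-squared for the fuel
# consumption model, using the least squares estimates from the output.
x1 = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
x2 = [18, 14, 24, 22, 8, 16, 1, 0]
y  = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
b0, b1, b2 = 13.1087, -0.09001, 0.08249                  # reported point estimates
n, k = len(y), 2

y_hat = [b0 + b1 * a + b2 * c for a, c in zip(x1, x2)]
y_bar = sum(y) / n

sse       = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # unexplained variation
total     = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
explained = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variation

s      = (sse / (n - (k + 1))) ** 0.5                          # standard error
r2     = explained / total                                     # coefficient of determination
adj_r2 = (r2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))        # adjusted R-squared

print(round(sse, 3), round(s, 4), round(r2, 3), round(adj_r2, 3))  # close to 0.674, 0.3671, 0.974, 0.963
```

Note that total ≈ explained + unexplained, as stated above; the tiny discrepancy comes only from the rounding of the published coefficients.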
Multiple Correlation Coefficient R
• The multiple correlation coefficient R is simply the square root of R²
• In simple linear regression, r takes on the sign of b1
• There are multiple bi's in multiple regression, so R is always taken to be positive
• To interpret the direction of the relationship between the x's and y, look at the sign of the appropriate bi coefficient

The Adjusted R²
• Adding an independent variable to a multiple regression will raise R²
• R² will rise slightly even if the new variable has no real relationship to y
• The adjusted R² corrects this tendency in R²
• As a result, it gives a better estimate of the importance of the independent variables

Calculating the Adjusted R²
• The adjusted multiple coefficient of determination is
R̄² = (R² − k/(n − 1)) · ((n − 1)/(n − (k + 1)))

[Figure: Fuel Consumption Case MINITAB output]

A Problem with Adjusted R²
• In the rare case where R² is less than k/(n − 1), adjusted R² is negative
• For R² to be less than k/(n − 1), there must be little or no relationship between the independent variables and y
• When this happens, many statistical software systems set adjusted R² to zero rather than displaying a negative value
• Excel, however, shows the negative value for adjusted R²

The Overall F Test
• To test H0: β1 = β2 = … = βk = 0 versus Ha: at least one of β1, β2, …, βk ≠ 0
• The test statistic is
F(model) = (Explained variation / k) / (Unexplained variation / (n − (k + 1)))
• Reject H0 in favor of Ha if F(model) > Fα or p-value < α
• Fα is based on k numerator and n − (k + 1) denominator degrees of freedom

Example 14.3: The Fuel Consumption Case
• Test statistic:
F(model) = (Explained variation / k) / (Unexplained variation / (n − (k + 1))) = (24.875/2) / (0.674/(8 − 3)) = 92.30
• Reject H0 at the α = 0.05 level of significance, since
F(model) = 92.30 > 5.79 = F.05 and p-value = 0.000 < 0.05 = α
• F.05 is based on 2 numerator and 5 denominator degrees of freedom

What Next?
• The F test tells us that at least one independent variable is significant
• The natural question is: which ones?
• That question is addressed in the next section

Testing the Significance of an Independent Variable
• A variable in a multiple regression model is not likely to be useful unless there is a significant relationship between it and y
• To test significance, we use the null hypothesis H0: βj = 0
• Versus the alternative hypothesis Ha: βj ≠ 0

Testing Significance of an Independent Variable #2
If the regression assumptions hold, we can reject H0: βj = 0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α

Testing Significance of an Independent Variable #3

Alternative     Reject H0 if    p-Value
Ha: βj > 0      t > tα          Area under the t distribution to the right of t
Ha: βj < 0      t < −tα         Area under the t distribution to the left of t
Ha: βj ≠ 0      |t| > tα/2      Twice the area under the t distribution to the right of |t|

(|t| > tα/2 means t > tα/2 or t < −tα/2)

Testing Significance of an Independent Variable #4
• Test statistic: t = bj / sbj
• 100(1 − α)% confidence interval for βj: [bj ± tα/2 sbj]
• tα, tα/2 and p-values are based on n − (k + 1) degrees of freedom

Testing Significance of an Independent Variable #5
• It is customary to test the significance of every independent variable in a regression model
• If we can reject H0: βj = 0 at the 0.05 level of significance, we have strong evidence that the independent variable xj is significantly related to y
• If we can reject H0: βj = 0 at the 0.01 level of significance, we have very strong evidence that xj is significantly related to y
• The smaller the significance level α at which H0 can be rejected, the stronger the evidence that xj is significantly related to y
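Both tests can be checked with a few lines of arithmetic, using the variation figures quoted in Example 14.3 and the estimates b1 = −0.09001, sb1 = 0.01408 reported for this case in Example 14.6. A minimal sketch:

```python
# Overall F test for H0: beta1 = beta2 = 0 (Example 14.3) and the t test for
# H0: beta1 = 0, using values quoted in the slides.
explained, unexplained = 24.875, 0.674        # explained and unexplained variation
n, k = 8, 2

f_model = (explained / k) / (unexplained / (n - (k + 1)))
print(round(f_model, 1))                      # about 92.3, versus F.05 = 5.79

b1, s_b1 = -0.09001, 0.01408                  # estimate and its standard error
t = b1 / s_b1
print(round(t, 2))                            # about -6.39; |t| exceeds t.025 = 2.571
```

Since F(model) far exceeds 5.79 and |t| far exceeds 2.571 (both with the degrees of freedom given above), both the overall model and the temperature coefficient are significant at the 0.05 level, in agreement with the slide.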
A Note on Significance Testing
• Whether the independent variable xj is significantly related to y in a particular regression model depends on which other independent variables are included in the model
• That is, changing the independent variables can cause a significant variable to become insignificant, or an insignificant variable to become significant
• This issue is addressed in a later section on multicollinearity

Fuel Consumption Case: Calculation of the t Statistics
• Chill index is significant at the α = 0.05 level, but not at α = 0.01
• tα, tα/2 and p-values are based on 5 degrees of freedom

[Figure: Fuel Consumption Case MINITAB and Excel output]

A Confidence Interval for the Regression Parameter βj
• If the regression assumptions hold, a 100(1 − α)% confidence interval for βj is [bj ± tα/2 sbj]
• tα/2 is based on n − (k + 1) degrees of freedom

Example 14.6: The Fuel Consumption Case
• We know b1 = −0.09001 and sb1 = 0.01408
• The t point for n − (k + 1) = 5 degrees of freedom and a 95 percent confidence interval is t.025 = 2.571
• This gives us the information we need to compute a confidence interval for β1:
[b1 ± tα/2 sb1] = [−0.09001 ± 2.571(0.01408)] = [−0.1262, −0.0538]
• Thus, we can be 95 percent confident that β1 is between −0.1262 and −0.0538

Confidence and Prediction Intervals
• The point on the regression surface corresponding to particular values x01, x02, …, x0k of the independent variables is ŷ = b0 + b1x01 + b2x02 + … + bkx0k
• It is unlikely that this value will equal the mean value of y for these x values
• Therefore, we need to place bounds on how far the predicted value might be from the actual value
• We do this by calculating a confidence interval for the mean value of y and a prediction interval for an individual value of y

Distance Value
• Both the confidence interval for the mean value of y and the prediction interval for an individual value of y employ a quantity called the distance value
• With simple regression, we were able to calculate the distance value fairly easily
• However, for multiple regression, calculating the distance value requires matrix algebra
• See Appendix G on the CD-ROM for more detail
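For a two-predictor model the matrix algebra is small enough to carry out in a few lines: the distance value at a point x0 = (1, x01, x02) is x0′(X′X)⁻¹x0. The sketch below (the invert helper is illustrative, not from the text) computes it for the fuel consumption data at the point used in Example 14.7, x1 = 40 and x2 = 10, and reproduces that example's 95% interval half-widths of about 0.438 and 1.041.

```python
# Distance value x0'(X'X)^{-1}x0 for the fuel consumption data at x1 = 40,
# x2 = 10, and the resulting 95% interval half-widths (t.025 = 2.571, 5 df).
x1 = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
x2 = [18, 14, 24, 22, 8, 16, 1, 0]
y  = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
b0, b1, b2 = 13.1087, -0.09001, 0.08249          # least squares estimates from the output
X = [[1.0, a, c] for a, c in zip(x1, x2)]

def invert(A):
    """Gauss-Jordan inverse of a small square matrix."""
    p = len(A)
    M = [row[:] + [float(i == j) for j in range(p)] for i, row in enumerate(A)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for r in range(p):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[p:] for row in M]

XtX_inv = invert([[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)])
x0 = [1.0, 40.0, 10.0]
dist = sum(x0[i] * XtX_inv[i][j] * x0[j] for i in range(3) for j in range(3))

sse = sum((yi - (b0 + b1 * a + b2 * c)) ** 2 for yi, a, c in zip(y, x1, x2))
s = (sse / (len(y) - 3)) ** 0.5                  # standard error, n - (k + 1) = 5 df
ci_half = 2.571 * s * dist ** 0.5                # half-width of the CI for the mean
pi_half = 2.571 * s * (1 + dist) ** 0.5          # half-width of the prediction interval
print(round(ci_half, 3), round(pi_half, 3))      # close to 0.438 and 1.041
```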
A Confidence Interval for a Mean Value of y
• Assume the regression assumptions hold
• The formula for a 100(1 − α)% confidence interval for the mean value of y is
[ŷ ± tα/2 s√(Distance value)]
• This is based on n − (k + 1) degrees of freedom

A Prediction Interval for an Individual Value of y
• Assume the regression assumptions hold
• The formula for a 100(1 − α)% prediction interval for an individual value of y is
[ŷ ± tα/2 s√(1 + Distance value)]
• This is based on n − (k + 1) degrees of freedom

Example 14.7: The Fuel Consumption Case
• Recall from Example 14.1 that ŷ = 13.1087 − 0.09001x1 + 0.08249x2
• For x1 = 40 and x2 = 10, ŷ = 10.333
• 95 percent confidence interval for the mean value of y:
[ŷ ± tα/2 s√(Distance value)] = [10.333 ± (2.571)(0.3671)√0.2144515] = [10.333 ± 0.438] = [9.895, 10.771]
• 95 percent prediction interval for an individual value of y:
[ŷ ± tα/2 s√(1 + Distance value)] = [10.333 ± (2.571)(0.3671)√1.2144515] = [10.333 ± 1.041] = [9.292, 11.374]

Using Dummy Variables to Model Qualitative Independent Variables
• So far, we have only included quantitative variables in a regression model
• However, we may wish to include descriptive qualitative variables as well
– For example, we might want to include the gender of respondents
• We can model the effects of different levels of a qualitative variable by using what are called dummy variables
– Also known as indicator variables

How to Construct Dummy Variables
• A dummy variable always has a value of either 0 or 1
• For example, to model sales at two locations, we would code the first location as 0 and the second as 1
– Operationally, it does not matter which location is coded 0 and which is coded 1

Example 14.9: The Electronics World Case

Store  Number of Households, x  Location  Location Dummy, DM  Sales Volume, y
1      161   Street   0   157.27
2       99   Street   0    93.28
3      135   Street   0   136.81
4      120   Street   0   123.79
5      164   Street   0   153.51
6      221   Mall     1   241.74
7      179   Mall     1   201.54
8      204   Mall     1   206.71
9      214   Mall     1   229.78
10     101   Mall     1   135.22

The location dummy variable is defined as DM = 1 if a store is in a mall location, and 0 otherwise.

[Figure: Example 14.9 — plot of sales volume and geometrical interpretation of the model]
[Figure: Example 14.9 — Excel output of the regression analysis]

What If We Have More Than Two Categories?
• Consider having three categories, say A, B and C
• We cannot code this using one dummy variable
– A = 0, B = 1 and C = 2 would be invalid, because it assumes the difference between A and B is the same as the difference between B and C
• We must use multiple dummy variables
– Specifically, k categories require k − 1 dummy variables
• For A, B, and C, we would need two dummy variables
– x1 is 1 for A, zero otherwise
– x2 is 1 for B, zero otherwise
– If x1 and x2 are both zero, the category must be C
– This is why a third dummy variable is not needed

Interaction Models
• So far, we have only considered dummy variables as stand-alone variables
– That model is y = β0 + β1x + β2D + ε, where D is the dummy variable
• However, we can also look at the interaction between a dummy variable and another variable
– That model takes the form y = β0 + β1x + β2D + β3xD + ε
• With an interaction term, both the intercept and the slope are shifted

Other Uses
• So far, we have seen dummy variables used to code categorical variables
• They can also be used to flag unusual events with an impact on the dependent variable
• These can be one-time events
– The impact of a strike on sales
– The impact of a major sporting event coming to town
• Or they can be recurring events
– Hot temperatures on soft drink sales
– Cold temperatures on coat sales
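The dummy-variable model y = β0 + β1x + β2DM of Example 14.9 can be fit to the ten stores with the same normal-equations approach used for the fuel consumption data; the solver below is an illustrative pure-Python sketch, not the book's method. The coefficient on the mall dummy estimates how much higher mean sales volume is in a mall location for any fixed number of households.

```python
# Fit y = b0 + b1*x + b2*DM to the Electronics World data (Example 14.9)
# by solving the normal equations (X'X)b = X'y.
households = [161, 99, 135, 120, 164, 221, 179, 204, 214, 101]
mall_dummy = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]          # DM = 1 for a mall location
sales      = [157.27, 93.28, 136.81, 123.79, 153.51,
              241.74, 201.54, 206.71, 229.78, 135.22]
X = [[1.0, h, d] for h, d in zip(households, mall_dummy)]

def solve_normal_equations(X, y):
    p = len(X[0])
    M = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):                             # elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    b = [0.0] * p
    for i in range(p - 1, -1, -1):                   # back substitution
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b

b0, b1, b2 = solve_normal_equations(X, sales)
print(round(b0, 2), round(b1, 3), round(b2, 2))      # roughly 17.36, 0.851, 29.22
```

The positive estimate b2 (roughly 29) is the estimated upward shift of the regression line for mall stores relative to street stores.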
The Partial F Test: Testing the Significance of a Portion of a Regression Model
• So far, we have looked at testing single slope coefficients using the t test
• We have also looked at testing all the coefficients at once using the overall F test
• The partial F test allows us to test the significance of any set of independent variables in a regression model

The Partial F Test Model
• Complete model: y = β0 + β1x1 + … + βgxg + βg+1xg+1 + … + βkxk + ε
• Reduced model: y = β0 + β1x1 + … + βgxg + ε
• To test H0: βg+1 = βg+2 = … = βk = 0 versus Ha: at least one of βg+1, βg+2, …, βk ≠ 0
• The test statistic is
F = ((SSER − SSEC)/(k − g)) / (SSEC/(n − (k + 1)))
• Reject H0 in favor of Ha if F > Fα or p-value < α
• Fα is based on k − g numerator and n − (k + 1) denominator degrees of freedom

Example 14.10: Electronics World
• The model from Example 14.9 is y = β0 + β1x + β2DM + β3DD + ε
– DM and DD are dummy variables
– This is called the complete model
• We now look at the reduced model y = β0 + β1x + ε
• This gives us the hypotheses H0: β2 = β3 = 0 versus Ha: at least one of β2 and β3 ≠ 0
• The SSE for the complete model is SSEC = 443.4650
• The SSE for the reduced model is SSER = 2,467.8067
• Test statistic:
F = ((SSER − SSEC)/(k − g)) / (SSEC/(n − (k + 1))) = ((2,467.8067 − 443.4650)/2) / (443.4650/(15 − 4)) = 25.1066
• We compare F with F.01 = 7.21
– Based on k − g = 2 numerator degrees of freedom and n − (k + 1) = 11 denominator degrees of freedom
– Note that k − g is the number of regression parameters set equal to 0 by H0
• Since F = 25.1066 > 7.21, we reject the null hypothesis
• We conclude that at least two of the locations have different effects on mean sales volume

Residual Analysis in Multiple Regression
• For an observed value yi, the residual is
ei = yi − ŷi = yi − (b0 + b1xi1 + … + bkxik)
• If the regression assumptions hold, the residuals should look like a random sample from a normal distribution with mean 0 and variance σ²

[Figure: Example — MegaStat residual plots for the sales territory performance model]

Residual Plots
• Residuals versus each independent variable
• Residuals versus the predicted values of y
• Residuals in time order (if the response is a time series)
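The partial F statistic of Example 14.10 is simple arithmetic once the two SSE values are in hand; a quick check:

```python
# Partial F test for H0: beta2 = beta3 = 0 in the Electronics World complete
# model (Example 14.10), using the SSE values reported in the slides.
sse_reduced, sse_complete = 2467.8067, 443.4650
n, k, g = 15, 3, 1                                   # k - g = 2 parameters are tested

f = ((sse_reduced - sse_complete) / (k - g)) / (sse_complete / (n - (k + 1)))
print(round(f, 4))                                   # about 25.1066, versus F.01 = 7.21
```

Since F exceeds the F.01 point of 7.21 (with 2 and 11 degrees of freedom), the location dummies are jointly significant, matching the conclusion above.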