Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Data assimilation wikipedia , lookup
Instrumental variables estimation wikipedia , lookup
Time series wikipedia , lookup
Choice modelling wikipedia , lookup
Interaction (statistics) wikipedia , lookup
Regression toward the mean wikipedia , lookup
Regression analysis wikipedia , lookup
Statistics for Managers Using Microsoft® Excel 4th Edition Chapter 13 Introduction to Multiple Regression Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-1 Chapter Goals After completing this chapter, you should be able to: apply multiple regression analysis to business decision-making situations analyze and interpret the computer output for a multiple regression model perform residual analysis for the multiple regression model test the significance of the independent variables in a multiple regression model Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-2 Chapter Goals (continued) After completing this chapter, you should be able to: use a coefficient of partial determination to test portions of the multiple regression model incorporate qualitative variables into the regression model by using dummy variables use interaction terms in regression models Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-3 The Multiple Regression Model The coefficients of the multiple regression model are estimated using sample data Estimated multiple regression model: Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ b0 b1X1 b2 X2 bk Xk In this chapter we will always use Excel to obtain the regression slope coefficients and other regression summary measures. Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-4 Multiple Regression Model Two variable model Y Ŷ b0 b1X1 b2 X2 X2 X1 Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-5 Example: 2 Independent Variables A distributor of frozen desert pies wants to evaluate factors thought to influence demand Dependent variable: Pie sales (units per week) Independent variables: Price (in $) Advertising ($100’s) Data is collected for 15 weeks Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-6 Pie Sales Model Week Pie Sales Price ($) Advertising ($100s) 1 350 5.50 3.3 2 460 7.50 3.3 3 350 8.00 3.0 4 430 8.00 4.5 5 350 6.80 3.0 6 380 7.50 4.0 7 430 4.50 3.0 8 470 6.40 3.7 9 450 7.00 3.5 10 490 5.00 4.0 11 340 7.20 3.5 12 300 7.90 3.2 13 440 5.90 4.0 14 450 5.00 3.5 15 300 7.00 2.7 Multiple regression model: Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Sales = b0 + b1 (Price) + b2 (Advertising) Chap 13-7 Estimating a Multiple Linear Regression Equation Excel will be used to generate the coefficients and measures of goodness of fit for multiple regression Excel: Tools / Data Analysis... / Regression PHStat: PHStat / Regression / Multiple Regression… Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-8 Multiple Regression Output Regression Statistics Multiple R 0.72213 R Square 0.52148 Adjusted R Square 0.44172 Standard Error 47.46341 Observations ANOVA Regression 15 df Sales 306.5262 - 24.9751(Price) 74.1310(Advertising) SS MS F 2 29460.027 14730.013 Residual 12 27033.306 2252.776 Total 14 56493.333 Coefficients Standard Error Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404 Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888 Advertising Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. t Stat 6.53861 Significance F P-value 0.01201 Lower 95% Upper 95% Chap 13-9 The Multiple Regression Equation Sales 306.5262 - 24.9751(Price) 74.1310(Advertising) where Sales is in number of pies per week Price is in $ Advertising is in $100’s. b1 = -24.975: sales will decrease, on average, by 24.9751 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. b2 = 74.131: sales will increase, on average, by 74.1310 pies per week for each $100 increase in advertising, net of the effects of changes due to price Chap 13-10 Using The Model to Make Predictions Predict sales for a week in which the selling price is $5.50 and advertising is $350: Sales 306.526 - 24.975(Pri ce) 74.131(Adv ertising) 306.526 - 24.975 (5.50) 74.131 (3.5) 428.62 Predicted sales is 428.62 pies Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Note that Advertising is in $100’s, so $350 means that X2 = 3.5 Chap 13-11 Predictions in PHStat PHStat | regression | multiple regression … Check the “confidence and prediction interval estimates” box Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-12 Predictions in PHStat (continued) Input values < Predicted Y value < Confidence interval for the mean Y value, given these X’s < Prediction interval for an individual Y value, given these X’s Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-13 Coefficient of Multiple Determination Reports the proportion of total variation in Y explained by all X variables taken together 2 Y.12..k r SSR regression sum of squares SST total sum of squares Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-14 Multiple Coefficient of Determination (continued) Regression Statistics Multiple R 0.72213 R Square 0.52148 Adjusted R Square 0.44172 Standard Error Regression r 15 df SSR 29460.0 .5215 SST 56493.3 52.1% of the variation in pie sales is explained by the variation in price and advertising 47.46341 Observations ANOVA 2 Y.12 SS MS F 2 29460.027 14730.013 Residual 12 27033.306 2252.776 Total 14 56493.333 Coefficients Standard Error Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404 Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888 Advertising Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. t Stat 6.53861 Significance F P-value 0.01201 Lower 95% Upper 95% Chap 13-15 Adjusted r2 r2 never decreases when a new X variable is added to the model This can be a disadvantage when comparing models What is the net effect of adding a new variable? We lose a degree of freedom when a new X variable is added Did the new X variable add enough explanatory power to offset the loss of one degree of freedom? Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-16 Adjusted r2 (continued) Shows the proportion of variation in Y explained by all X variables adjusted for the number of X variables used 2 adj r n 1 2 1 (1 rY.12..k ) n k 1 (where n = sample size, k = number of independent variables) Penalize excessive use of unimportant independent variables Smaller than r2 Useful in comparing among models Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-17 Multiple Coefficient of Determination (continued) Regression Statistics Multiple R 0.72213 R Square 0.52148 Adjusted R Square 0.44172 Standard Error 47.46341 Observations ANOVA Regression 15 df r .4417 2 adj 44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables SS MS F 2 29460.027 14730.013 Residual 12 27033.306 2252.776 Total 14 56493.333 Coefficients Standard Error Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404 Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888 Advertising Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. t Stat 6.53861 Significance F P-value 0.01201 Lower 95% Upper 95% Chap 13-18 Residual Plots Used in Multiple Regression These residual plots are used in multiple regression: < Residuals vs. Yi Residuals vs. X1i Residuals vs. X2i etc. Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-19 Residual Plots Pie Example Use Tools | Data Analysis | Regression Residuals Price Residual Plot 200 100 0 -100 4 5 6 7 8 9 Price Residuals Advertising Residual Plot 200 100 0 -100 2 2.5 3 3.5 4 4.5 5 Advertising Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-20 Residual Plots Pie Example Use Tools | Data Analysis | Regression Pie Sales Normal Probability Plot 550 500 450 400 350 300 250 200 0 20 40 60 80 100 Sample Percentile Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-21 Is the Model Significant? F-Test for Overall Significance of the Model Shows if there is a linear relationship between all of the X variables considered together and Y Use F test statistic Hypotheses: H0: β1 = β2 = … = βk = 0 (no linear relationship) H1: at least one βi ≠ 0 (at least one independent variable affects Y) Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-22 F-Test for Overall Significance (continued) Regression Statistics Multiple R 0.72213 R Square 0.52148 Adjusted R Square 0.44172 Standard Error 47.46341 Observations ANOVA Regression 15 df MSR 14730.0 F 6.5386 MSE 2252.8 With 2 and 12 degrees of freedom SS MS P-value for the F-Test F 2 29460.027 14730.013 Residual 12 27033.306 2252.776 Total 14 56493.333 Coefficients Standard Error Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404 Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888 Advertising Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. t Stat 6.53861 Significance F P-value 0.01201 Lower 95% Upper 95% Chap 13-23 F-Test for Overall Significance (continued) Test Statistic: H0: β1 = β2 = 0 H1: β1 and β2 not both zero = .05 df1= 2 df2 = 12 MSR F 6.5386 MSE Decision: p-value=.01201 Reject H0 at = 0.05 Conclusion: = .05 0 Do not reject H0 Reject H0 F There is evidence that at least one independent variable affects Y Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-24 Are Individual Variables Significant? Use t-tests of individual variable slopes Shows if there is a linear relationship between the variable Xi and Y Hypotheses: H0: βi = 0 (no linear relationship) H1: βi ≠ 0 (linear relationship does exist between Xi and Y) Test Statistic bi 0 t S bi Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. (df = n – k – 1) Chap 13-25 Are Individual Variables Significant? (continued) Regression Statistics Multiple R 0.72213 R Square 0.52148 Adjusted R Square 0.44172 Standard Error 47.46341 Observations ANOVA Regression 15 df t-value for Price is t = -2.306, with p-value .0398 t-value for Advertising is t = 2.855, with p-value .0145 SS MS F 2 29460.027 14730.013 Residual 12 27033.306 2252.776 Total 14 56493.333 Coefficients Standard Error Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404 Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888 Advertising Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. t Stat 6.53861 Significance F P-value 0.01201 Lower 95% Upper 95% Chap 13-26 Inferences about the Slope: t Test Example H0: β1 = 0 H1: β1 0 From Excel output: Price Advertising H0: β2 = 0 H1: β2 0 Coefficients Standard Error t Stat P-value -24.97509 10.83213 -2.30565 0.03979 74.13096 25.96732 2.85478 0.01449 The test statistic for each variable falls in the rejection region (p-values < .05) Decision: d.f. = 15-2-1 = 12 = .05 Reject H0 for each variable Conclusion: There is evidence that both Price and Advertising affect pie sales at = .05 Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-27 Confidence Interval Estimate for the Slope (continued) Confidence interval for the population slope βi bi t nk 1Sbi where t has (n – k – 1) d.f. (15 – 2 – 1) = 12 d.f. Coefficients Standard Error … Intercept 306.52619 114.25389 … 57.58835 555.46404 Price -24.97509 10.83213 … -48.57626 -1.37392 74.13096 25.96732 … 17.55303 130.70888 Advertising Lower 95% Upper 95% Example: Excel output also reports these interval endpoints: Weekly sales are estimated to be reduced by between 1.37 to 48.58 pies for each increase of $1 in the selling price Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-28 Coefficient of Partial Determination for k variable model 2 rYj.(all v ariables except j) SSR (X j | all variables except j) SST SSR(all variables) SSR(X j | all variables except j) Measures the proportion of variation in the dependent variable that is explained by Xj while controlling for (holding constant) the other explanatory variables Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-29 Coefficient of Partial Determination in Excel Coefficients of Partial Determination can be found using Excel: PHStat | regression | multiple regression … Check the “coefficient of partial determination” box Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-30 Using Dummy Variables A dummy variable is a categorical explanatory variable with two levels: yes or no, on or off, male or female coded as 0 or 1 Regression intercepts are different if the variable is significant Assumes equal slopes for other variables If more than two levels, the number of dummy variables needed is (number of levels - 1) Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-31 Dummy-Variable Model Example (with 2 Levels) Ŷ b0 b1X1 b2 X2 Let: Y = pie sales X1 = price X2 = holiday (X2 = 1 if a holiday occurred during the week) (X2 = 0 if there was no holiday that week) Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-32 Dummy-Variable Model Example (with 2 Levels) (continued) Ŷ b0 b1X1 b2 (1) (b0 b2 ) b1X1 Holiday Ŷ b0 b1X1 b2 (0) No Holiday Y (sales) b0 + b2 b0 Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. b1X1 b0 Different intercept Same slope If H0: β2 = 0 is rejected, then “Holiday” has a significant effect on pie sales X1 (Price) Chap 13-33 Interpreting the Dummy Variable Coefficient (with 2 Levels) Example: Sales 300 - 30(Price) 15(Holiday) Sales: number of pies sold per week Price: pie price in $ 1 If a holiday occurred during the week Holiday: 0 If no holiday occurred b2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-34 Dummy-Variable Models (more than 2 Levels) The number of dummy variables is one less than the number of levels Example: Y = house price ; X1 = square feet If style of the house is also thought to matter: Style = ranch, split level, condo Three levels, so two dummy variables are needed Advanced concept not included in homework or on the exam. Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-35 Interaction Between Explanatory Variables Hypothesizes interaction between pairs of X variables Response to one X variable may vary at different levels of another X variable Contains two-way cross product terms Ŷ b0 b1X1 b2 X2 b3 X3 b0 b1X1 b2 X2 b3 (X1X2 ) Hypotheses: H0: 3 = 0 (no interaction between X1 and X2) H1: 3 0 (X1 interacts with X2) Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-36 Interaction Regression Model Worksheet Item, i Yi X1i X2i X1i*X2i 1 1 1 3 3 2 4 8 5 40 3 1 3 2 6 4 3 5 6 30 … … … … … Multiply X1 by X2 to get X1*X2. Run regression with Y, X1, X2 , X1X2 Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-37 Chapter Summary Developed the multiple regression model Tested the significance of the multiple regression model Discussed adjusted r2 Used residual plots to check model Tested individual regression coefficients Tested portions of the regression model Used dummy variables Evaluated interaction effects Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-38