Chapter 14
Multiple Regression

McGraw-Hill/Irwin Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved.

14.1 The Multiple Regression Model and the Least Squares Point Estimate
14.2 Model Assumptions and the Standard Error
14.3 $R^2$ and Adjusted $R^2$
14.4 The Overall F Test
14.5 Testing the Significance of an Independent Variable
14.6 Confidence and Prediction Intervals
14.7 The Sales Territory Performance Case
14.8 Using Dummy Variables to Model Qualitative Independent Variables
14.9 The Partial F Test: Testing the Significance of a Portion of a Regression Model
14.10 Residual Analysis in Multiple Regression

LO 1: Explain the multiple regression model and the related least squares point estimates.

14.1 The Multiple Regression Model and the Least Squares Point Estimate

Simple linear regression used one independent variable to explain the dependent variable.
Some relationships are too complex to be described using a single independent variable.
Multiple regression uses two or more independent variables to describe the dependent variable.
This allows multiple regression models to handle more complex situations.
There is no limit to the number of independent variables a model can use.
Multiple regression has only one dependent variable.

The Multiple Regression Model

The linear regression model relating $y$ to $x_1, x_2, \dots, x_k$ is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$$

$\mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$ is the mean value of the dependent variable $y$ when the values of the independent variables are $x_1, x_2, \dots, x_k$.
$\beta_0, \beta_1, \beta_2, \dots, \beta_k$ are the unknown regression parameters relating the mean value of $y$ to $x_1, x_2, \dots, x_k$.
$\varepsilon$ is an error term that describes the effects on $y$ of all factors other than the independent variables $x_1, x_2, \dots, x_k$.

LO 2: Explain the assumptions behind multiple regression and calculate the standard error.
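To make the least squares point estimates of Section 14.1 and the standard error named in this learning objective concrete, here is a minimal Python sketch. The slides do not give the standard error formula, so the code assumes the usual definition $s = \sqrt{SSE / (n - (k + 1))}$, where $SSE$ is the sum of squared residuals. The function name `fit_least_squares` and the data are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_least_squares(x, y):
    """Return (b, s): least squares point estimates and the standard error.

    x : (n, k) array of independent-variable values
    y : (n,) array of observed values of the dependent variable
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = x.shape
    # Prepend a column of ones so that b[0] estimates the intercept beta_0.
    design = np.column_stack([np.ones(n), x])
    # lstsq chooses b to minimize the sum of squared residuals.
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ b
    sse = float(residuals @ residuals)
    # Assumed definition: s = sqrt(SSE / (n - (k + 1))).
    s = np.sqrt(sse / (n - (k + 1)))
    return b, s

# Hypothetical data: n = 8 observations, k = 2 independent variables.
x = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0],
              [5.0, 6.0], [6.0, 5.0], [7.0, 8.0], [8.0, 7.0]])
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1])
y = 1.0 + 2.0 * x[:, 0] + 3.0 * x[:, 1] + noise

b, s = fit_least_squares(x, y)
print("point estimates b0, b1, b2:", b)
print("standard error s:", s)
```

Because the illustrative data were generated from $\beta_0 = 1$, $\beta_1 = 2$, $\beta_2 = 3$ plus small errors, the printed estimates should land close to those values.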
14.2 Model Assumptions and the Standard Error

The model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$.
Assumptions for multiple regression are stated about the model error terms, the $\varepsilon$'s.

The Regression Model Assumptions

1. Mean of Zero Assumption: the mean of the error terms is equal to 0.
2. Constant Variance Assumption: the variance of the error terms, $\sigma^2$, is the same for every combination of values of $x_1, x_2, \dots, x_k$.
3. Normality Assumption: the error terms follow a normal distribution for every combination of values of $x_1, x_2, \dots, x_k$.
4. Independence Assumption: the values of the error terms are statistically independent of each other.

LO 3: Calculate and interpret the multiple and adjusted multiple coefficients of determination.

14.3 $R^2$ and Adjusted $R^2$

1. Total variation is given by the formula $\sum (y_i - \bar{y})^2$.
2. Explained variation is given by the formula $\sum (\hat{y}_i - \bar{y})^2$.
3. Unexplained variation is given by the formula $\sum (y_i - \hat{y}_i)^2$.
4. Total variation is the sum of explained and unexplained variation.

This section can be read anytime after reading Section 14.1.

LO 4: Test the significance of a multiple regression model by using an F test.

14.4 The Overall F Test

To test $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ versus $H_a$: at least one of $\beta_1, \beta_2, \dots, \beta_k \neq 0$, the test statistic is

$$F(\text{model}) = \frac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n - (k + 1)]}$$

Reject $H_0$ in favor of $H_a$ if $F(\text{model}) > F_\alpha$ or p-value $< \alpha$.
$F_\alpha$ is based on $k$ numerator and $n - (k + 1)$ denominator degrees of freedom.

LO 5: Test the significance of a single independent variable.

14.5 Testing the Significance of an Independent Variable

A variable in a multiple regression model is not likely to be useful unless there is a significant relationship between it and $y$.
To test significance, we use the null hypothesis $H_0: \beta_j = 0$ versus the alternative hypothesis $H_a: \beta_j \neq 0$.

LO 6: Find and interpret a confidence interval for a mean value and a prediction interval for an individual value.
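Before turning to intervals, the variation formulas of Section 14.3 and the F(model) statistic of Section 14.4 can be sketched in a few lines of Python. The slides list the three variations but do not spell out $R^2$ or adjusted $R^2$, so the code assumes the usual definitions: $R^2$ = explained variation / total variation, and adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - (k + 1))$. The function name `variation_summary` and the data are hypothetical.

```python
import numpy as np

def variation_summary(x, y):
    """Return the variations, R^2, adjusted R^2 and F(model) for a fitted model."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = x.shape
    design = np.column_stack([np.ones(n), x])       # intercept column plus x's
    b, *_ = np.linalg.lstsq(design, y, rcond=None)  # least squares estimates
    y_hat = design @ b
    y_bar = y.mean()

    total = float(np.sum((y - y_bar) ** 2))          # sum (y_i - ybar)^2
    explained = float(np.sum((y_hat - y_bar) ** 2))  # sum (yhat_i - ybar)^2
    unexplained = float(np.sum((y - y_hat) ** 2))    # sum (y_i - yhat_i)^2

    r2 = explained / total                           # assumed definition
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - (k + 1))  # assumed definition
    f_model = (explained / k) / (unexplained / (n - (k + 1)))
    return {"total": total, "explained": explained, "unexplained": unexplained,
            "r2": r2, "adj_r2": adj_r2, "f_model": f_model,
            "df": (k, n - (k + 1))}

# Hypothetical data: n = 8 observations, k = 2 independent variables.
x = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0],
              [5.0, 6.0], [6.0, 5.0], [7.0, 8.0], [8.0, 7.0]])
y = np.array([9.3, 7.8, 19.1, 17.6, 29.2, 28.0, 38.9, 38.1])

summary = variation_summary(x, y)
for name, value in summary.items():
    print(name, "=", value)
```

The returned `f_model` would then be compared with the $F_\alpha$ point for the degrees of freedom reported in `df`, taken from an F table or statistical software.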
14.6 Confidence and Prediction Intervals

The point on the regression line corresponding to particular values $x_{01}, x_{02}, \dots, x_{0k}$ of the independent variables is

$$\hat{y} = b_0 + b_1 x_{01} + b_2 x_{02} + \dots + b_k x_{0k}$$

It is unlikely that this value will equal the mean value of $y$ for these $x$ values.
Therefore, we need to place bounds on how far the predicted value might be from the actual value.
We can do this by calculating a confidence interval for the mean value of $y$ and a prediction interval for an individual value of $y$.

LO 7: Use dummy variables to model qualitative independent variables.

14.8 Using Dummy Variables to Model Qualitative Independent Variables

So far, we have only looked at including quantitative data in a regression model.
However, we may wish to include descriptive qualitative data as well.
For example, we might want to include the gender of respondents.
We can model the effects of different levels of a qualitative variable by using what are called dummy variables, also known as indicator variables.

LO 8: Test the significance of a portion of a regression model by using an F test.

14.9 The Partial F Test: Testing the Significance of a Portion of a Regression Model

So far, we have looked at testing single slope coefficients using t tests.
We have also looked at testing all the coefficients at once using the F test.
The partial F test allows us to test the significance of any set of independent variables in a regression model.

LO 9: Use residual analysis to check the assumptions of multiple regression.

14.10 Residual Analysis in Multiple Regression

For an observed value $y_i$, the residual is

$$e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_{i1} + \dots + b_k x_{ik})$$

If the regression assumptions hold, the residuals should look like a random sample from a normal distribution with mean 0 and variance $\sigma^2$.
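As an illustration of Sections 14.8 through 14.10, the sketch below codes a two-level qualitative variable as a 0/1 dummy variable, computes the residuals $e_i$, and forms a partial F statistic for the dummy variable. The slides do not state the partial F formula, so the code assumes the common textbook form $F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$, where $SSE_C$ belongs to the complete model with $k$ independent variables and $SSE_R$ to the reduced model with $g$ of them. The helper `sse_and_residuals`, the level names "A" and "B", and the data are all hypothetical.

```python
import numpy as np

def sse_and_residuals(design, y):
    """Fit by least squares; return (SSE, residuals) for the given design matrix."""
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ b          # e_i = y_i - yhat_i
    return float(residuals @ residuals), residuals

# Hypothetical data: one quantitative variable and one qualitative variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
group = np.array(["A", "B", "A", "B", "A", "B", "A", "B"])
y = np.array([3.1, 6.9, 7.2, 11.1, 10.8, 15.2, 15.1, 18.8])

# Dummy (indicator) variable: 1 if the observation is at level "B", else 0.
dummy = (group == "B").astype(float)

n = len(y)
ones = np.ones(n)
complete = np.column_stack([ones, x, dummy])  # complete model, k = 2
reduced = np.column_stack([ones, x])          # reduced model, g = 1
k, g = 2, 1

sse_c, residuals = sse_and_residuals(complete, y)
sse_r, _ = sse_and_residuals(reduced, y)

# Assumed partial F formula (see the lead-in above).
partial_f = ((sse_r - sse_c) / (k - g)) / (sse_c / (n - (k + 1)))

print("residuals of the complete model:", residuals)
print("mean residual:", residuals.mean())
print("partial F for the dummy variable:", partial_f)
```

The partial F value would be compared with an F point based on $k - g$ numerator and $n - (k + 1)$ denominator degrees of freedom. A plot of `residuals` against the fitted values or a normal plot is the usual next step for checking the assumptions, and is left out here to keep the sketch short.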