Stat 112 Notes 6

• Today:
  – Chapter 4.1 (Introduction to Multiple Regression)

Multiple Regression
• In multiple regression analysis, we consider more than one explanatory variable, X1, …, XK. We are interested in the conditional mean of Y given X1, …, XK, that is, E(Y | X1, …, XK).
• Two motivations for multiple regression:
  – We can obtain better predictions of Y by using information on X1, …, XK rather than just X1.
  – We can control for lurking variables.

Automobile Example
• A team charged with designing a new automobile is concerned about the gas mileage (gallons per 1000 miles on a highway) that can be achieved. The design team is interested in two things:
  (1) Which characteristics of the design are likely to affect mileage?
  (2) A new car is planned to have the following characteristics: weight – 4000 lbs, horsepower – 200, length – 200 inches, seating – 5 adults. Predict the new car's gas mileage.
• The team has available information about gallons per 1000 miles and four design characteristics (weight, horsepower, length, seating) for a sample of cars made in 2004. Data is in car04.JMP.

Multivariate Correlations
              GP1000M_Hwy  Weight(lb)  Horsepower  Length  Seating
GP1000M_Hwy        1.0000      0.8575      0.6120  0.3912   0.3993
Weight(lb)         0.8575      1.0000      0.6434  0.7023   0.5858
Horsepower         0.6120      0.6434      1.0000  0.4910   0.0642
Length             0.3912      0.7023      0.4910  1.0000   0.6010
Seating            0.3993      0.5858      0.0642  0.6010   1.0000
20 rows not used due to missing or excluded values or frequency or weight variables missing, negative or less than one.

Scatterplot Matrix
[Scatterplot matrix of GP1000M_Hwy, Weight(lb), Horsepower, Length and Seating]

Best Single Predictor
• To obtain the correlation matrix and pairwise scatterplots, click Analyze, Multivariate Methods, Multivariate.
• If we use simple linear regression with each of the four explanatory variables, which provides the best predictions?
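Because R² in a simple linear regression equals the squared correlation between Y and X, the correlation matrix above already answers the "best single predictor" question. The following sketch (the correlation values are copied from the Multivariate Correlations table; everything else is illustrative) makes that comparison explicit:

```python
# For a one-predictor regression, R^2 = r^2, so the pairwise correlations
# with GP1000M_Hwy identify the best single predictor directly.
# Correlation values are taken from the Multivariate Correlations table.
correlations = {
    "Weight(lb)": 0.8575,
    "Horsepower": 0.6120,
    "Length": 0.3912,
    "Seating": 0.3993,
}

# R^2 for each simple linear regression of GP1000M_Hwy on one X
r_squared = {x: r ** 2 for x, r in correlations.items()}

best = max(r_squared, key=r_squared.get)
print(best, round(r_squared[best], 4))  # Weight(lb) 0.7353
```

Weight has the largest squared correlation with GP1000M_Hwy (about 0.735), which is why weight gives the best single-predictor regression.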
Best Single Predictor
• Answer: The simple linear regression with the highest R² gives the best predictions, because recall that
  R² = 1 − SSE/SST
• Weight gives the best predictions of GP1000M_Hwy based on simple linear regression.
• But we can obtain better predictions by using more than one of the independent variables.

Multiple Linear Regression Model
  E(Y | X1, …, XK) = β0 + β1X1 + … + βKXK
  Yi = β0 + β1Xi1 + … + βKXiK + ei
For each possible value of (X1, …, XK), there is a subpopulation. Assumptions of the multiple linear regression model:
(1) Linearity: the means of the subpopulations are a linear function of (X1, …, XK), i.e., E(Y | X1, …, XK) = β0 + β1X1 + … + βKXK for some (β0, …, βK).
(2) Constant variance: the subpopulation standard deviations are all equal (to σe).
(3) Normality: The subpopulations are normally distributed.
(4) Independence: The observations are independent.

Point Estimates for Multiple Linear Regression Model
• We use the same least squares procedure as for simple linear regression.
• Our estimates b0, …, bK of β0, …, βK are the coefficients that minimize the sum of squared prediction errors:
  (b0, …, bK) = argmin over (b0*, …, bK*) of Σi=1..n (yi − b0* − b1*xi1 − … − bK*xiK)²
  ŷ = b0 + b1x1 + … + bKxK
• Least Squares in JMP: Click Analyze, Fit Model, put the dependent variable into Y and add the independent variables to the Construct Model Effects box.

Response GP1000M_Hwy
Summary of Fit
  RSquare                      0.834148
  RSquare Adj                  0.831091
  Root Mean Square Error       3.082396
  Mean of Response            39.75907
  Observations (or Sum Wgts)   222

Parameter Estimates
Term        Estimate    Std Error  t Ratio  Prob>|t|
Intercept   42.198338   3.300533    12.79   <.0001
Weight(lb)   0.0102748  0.00052     19.77   <.0001
Seating      0.2748828  0.254288     1.08   0.2809
Horsepower   0.0189373  0.00524      3.61   0.0004
Length      -0.244818   0.02358    -10.38   <.0001

Root Mean Square Error
• Estimate of σe:
  se = sqrt( Σi=1..n (yi − ŷi)² / (n − K − 1) )
• se = Root Mean Square Error in JMP.
• For simple linear regression of GP1000M_Hwy on Weight, RMSE = 3.87.
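The least squares procedure and the RMSE formula above can be sketched in a few lines. This is a minimal illustration on a small made-up data set (car04.JMP is not reproduced here), assuming two predictors so that K = 2 and the divisor in se is n − K − 1:

```python
import numpy as np

# Hypothetical data standing in for car04.JMP: y is GP1000M,
# columns of X are weight (lb) and horsepower.
y = np.array([34.0, 40.0, 45.0, 52.0, 38.0, 48.0])
X = np.array([[2800, 130], [3200, 160], [3600, 190],
              [4200, 240], [3000, 150], [3900, 210]], float)

n, K = X.shape
A = np.column_stack([np.ones(n), X])   # prepend an intercept column

# Least squares: choose (b0, ..., bK) minimizing the sum of squared
# prediction errors, as in the argmin formula above.
b, *_ = np.linalg.lstsq(A, y, rcond=None)

y_hat = A @ b                          # fitted values
sse = np.sum((y - y_hat) ** 2)         # sum of squared residuals
se = np.sqrt(sse / (n - K - 1))        # root mean square error
print(b, se)
```

With an intercept in the model, the residuals sum to zero, and se estimates the subpopulation standard deviation σe.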
• For multiple linear regression of GP1000M_Hwy on weight, horsepower, length, seating, RMSE = 3.08.
• The multiple regression improves the predictions.

Residuals and Root Mean Square Errors
• Ê(Y | X1 = x1, …, XK = xK) = b0 + b1x1 + … + bKxK
• Residual for observation i = prediction error for observation i
  = Yi − Ê(Y | X1 = xi1, …, XK = xiK) = Yi − (b0 + b1xi1 + … + bKxiK)
• Root mean square error = typical size of the absolute value of the prediction error.
• As with the simple linear regression model, if the multiple linear regression model holds:
  – About 95% of the observations will be within two RMSEs of their predicted value.
• For the car data, about 95% of the time, the actual GP1000M will be within 2 × 3.08 = 6.16 GP1000M of the predicted GP1000M of the car based on the car's weight, horsepower, length and seating.

Residual Example: BMW 745i
Weight = 4376, Seating = 5, Horsepower = 325, Length = 198
  Ê(Y | X1, …, X4) = 42.19 + .01027×4376 + .2749×5 + .0189×325 − .2448×198 ≈ 46.22
Actual Y (GP1000M) for BMW 745i = 38.46
Residual = 38.46 − 46.22 = −7.76
The BMW is more fuel efficient (lower GP1000M) than we would expect based on its weight, seating, horsepower and length.
The residuals and predicted values can be saved by clicking the red triangle next to Response after Fit Model, then clicking Save Columns and clicking Predicted Values and Residuals.

Interpretation of Regression Coefficients
• Gas mileage regression from car04.JMP

Parameter Estimates
Term        Estimate    Std Error  t Ratio  Prob>|t|
Intercept   42.198338   3.300533    12.79   <.0001
Weight(lb)   0.0102748  0.00052     19.77   <.0001
Seating      0.2748828  0.254288     1.08   0.2809
Horsepower   0.0189373  0.00524      3.61   0.0004
Length      -0.244818   0.02358    -10.38   <.0001

Interpretation of coefficient bweight = 0.0103: The mean of GP1000M_Hwy is estimated to increase by 0.0103 for a one-pound increase in weight, holding fixed seating, horsepower and length.
  E(Y | X1 = x1 + 1, X2 = x2, …, XK = xK) − E(Y | X1 = x1, X2 = x2, …, XK = xK)
    = (β0 + β1(x1 + 1) + β2x2 + … + βKxK) − (β0 + β1x1 + β2x2 + … + βKxK)
    = β1
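The BMW 745i residual calculation above can be checked directly. This sketch uses the coefficients from the Parameter Estimates table and the BMW's characteristics as given in the notes:

```python
# Coefficients from the Parameter Estimates table for the regression of
# GP1000M_Hwy on weight, seating, horsepower and length.
b = {"intercept": 42.198338, "weight": 0.0102748, "seating": 0.2748828,
     "horsepower": 0.0189373, "length": -0.244818}

# BMW 745i characteristics from the Residual Example.
bmw = {"weight": 4376, "seating": 5, "horsepower": 325, "length": 198}

predicted = b["intercept"] + sum(b[k] * bmw[k] for k in bmw)
actual = 38.46                  # observed GP1000M for the BMW 745i
residual = actual - predicted   # negative: more efficient than predicted

print(round(predicted, 2), round(residual, 2))  # 46.22 -7.76
```

The negative residual confirms the interpretation in the notes: the BMW uses fewer gallons per 1000 miles than its weight, seating, horsepower and length would predict.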