Multiple Linear Regression

Learning Objectives
• Extend simple linear regression concepts to regression with multiple explanatory variables
• Apply the Matlab regression tools and interpret their output
• Choose the variables to use in a multiple regression
• Quantify the uncertainty of MLR predictions

Readings
• Kottegoda and Rosso, Chapter 6 (Section 6.2)
• Helsel and Hirsch, Chapters 9 and 11
• Hastie, Tibshirani and Friedman, Chapter 3
• Matlab Statistics Toolbox User's Guide, Chapter 6

Multiple Linear Regression Model

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_{p-1} x_{p-1} + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$

$\hat{Y} = E(Y \mid \mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_{p-1} x_{p-1}$

$\mathrm{Var}(Y) = \mathrm{Var}(\varepsilon) = \sigma^2$

Data for Multiple Linear Regression

In matrix form the model is $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{y} = (y_1, y_2, \dots, y_n)^T$ is the output vector, $\boldsymbol{\varepsilon}$ is the vector of residuals, and the inputs form the carrier matrix

$\mathbf{X} = \begin{bmatrix} 1 & x_{1,1} & x_{1,2} & \cdots & x_{1,p-1} \\ 1 & x_{2,1} & x_{2,2} & \cdots & x_{2,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n,1} & x_{n,2} & \cdots & x_{n,p-1} \end{bmatrix}$

Solving Multiple Linear Regression

Minimizing $SSE = \boldsymbol{\varepsilon}^T \boldsymbol{\varepsilon} = \sum \varepsilon_i^2$ results in

$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$   (KR 6.2.7)

The vector of estimated mean values at each observation is

$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y} = \mathbf{H}\mathbf{y}$   (KR 6.2.9)

and the vector of residuals is $\boldsymbol{\varepsilon} = \mathbf{y} - \hat{\mathbf{y}}$.

Error Variance

$\hat{\sigma}^2 = \dfrac{SSE}{n-p}$, where $SSE = \boldsymbol{\varepsilon}^T \boldsymbol{\varepsilon} = \sum \varepsilon_i^2$

$SS_y = \sum (y_i - \bar{y})^2$ is the sum of squares of the observations' deviations from the mean, and $SSR = \sum (\hat{y}_i - \bar{y})^2$ is the sum of squares of the regression estimates' deviations from the mean. These partition as

$SS_y = SSR + SSE$   (KR 6.2.13)

$R^2 = \dfrac{SSR}{SS_y} = 1 - \dfrac{SSE}{SS_y}$

Significance Tests on the Regression

Overall significance, $H_0$: $\beta_i = 0$ for all $i$:

$F = \dfrac{SSR/(p-1)}{SSE/(n-p)} \sim F_{p-1,\,n-p}$   (KR 6.2.16)

Nested/partial F test (significance of 'new' parameters), comparing a complicated model with $p_1$ parameters to a simpler model with $p_0$ parameters (HH p297), $H_0$: $\beta_i = 0$ for the new parameters:

$F = \dfrac{(SSE_{p_0} - SSE_{p_1})/(p_1 - p_0)}{SSE_{p_1}/(n - p_1)} \sim F_{p_1 - p_0,\,n - p_1}$   (KR 6.2.19)

Significance and Confidence Limits on Regression Parameters

With $\mathbf{C} = (\mathbf{X}^T \mathbf{X})^{-1}$ and $c_{ii}$ its diagonal elements,

$\dfrac{\hat{\beta}_i - \beta_i}{\sqrt{\hat{\sigma}^2 c_{ii}}} \sim T_{n-p}$   (KR 6.2.17)

$\Pr\!\left[\hat{\beta}_i - t_{n-p,\alpha/2}\sqrt{\hat{\sigma}^2 c_{ii}} \le \beta_i \le \hat{\beta}_i + t_{n-p,\alpha/2}\sqrt{\hat{\sigma}^2 c_{ii}}\right] = 1 - \alpha$   (KR 6.2.18)

In Matlab:

  [b,bint,r,rint,stats] = regress(Y,X);
  b =
      0.0057
      0.2187
     -0.0074
  bint =
     -0.1128    0.1242
      0.1319    0.3054
     -0.0248    0.0100

Confidence Limits on the Mean Response

$\hat{y}_0 = \mathbf{x}_0 \hat{\boldsymbol{\beta}}$, with $\mathrm{Var}[\hat{y}_0 \mid \mathbf{x}_0] = \mathbf{x}_0 \mathbf{C} \mathbf{x}_0^T \sigma^2$   (KR 6.2.32)

$\Pr\!\left[\mathbf{x}_0\hat{\boldsymbol{\beta}} - t_{n-p,\alpha/2}\,\hat{\sigma}\sqrt{\mathbf{x}_0 \mathbf{C} \mathbf{x}_0^T} \le E[Y \mid \mathbf{x}_0] \le \mathbf{x}_0\hat{\boldsymbol{\beta}} + t_{n-p,\alpha/2}\,\hat{\sigma}\sqrt{\mathbf{x}_0 \mathbf{C} \mathbf{x}_0^T}\right] = 1 - \alpha$   (KR 6.2.33)

Confidence Limits on an Individual Future Value

$Y_0 = \mathbf{x}_0\boldsymbol{\beta} + \varepsilon$, with $\mathrm{Var}[Y_0 \mid \mathbf{x}_0] = \mathrm{Var}[\hat{y}_0 \mid \mathbf{x}_0] + \sigma^2 = (\mathbf{x}_0 \mathbf{C} \mathbf{x}_0^T + 1)\,\sigma^2$

$\Pr\!\left[\mathbf{x}_0\hat{\boldsymbol{\beta}} - t_{n-p,\alpha/2}\,\hat{\sigma}\sqrt{1 + \mathbf{x}_0 \mathbf{C} \mathbf{x}_0^T} \le Y_0 \le \mathbf{x}_0\hat{\boldsymbol{\beta}} + t_{n-p,\alpha/2}\,\hat{\sigma}\sqrt{1 + \mathbf{x}_0 \mathbf{C} \mathbf{x}_0^T}\right] = 1 - \alpha$   (KR 6.2.34)

Regression Diagnostics

Do not rely only on the R², F, SSE, and T statistics (read Helsel and Hirsch pages 244 and 300). Use graphical tools to diagnose MLR deficiencies.

Partial Residual Plot

[Figure: added variable plot for X1 adjusted for X3 — Y residuals versus X1 residuals, showing the adjusted data, the fit y = 0.0352 x, and 95% confidence bounds.]

Residual Diagnostic Plots

[Figures: residuals versus precipitation, residuals versus predicted values, and a normal probability plot of the residuals. Helsel and Hirsch page 245.]

The Hat Matrix

$\hat{\mathbf{y}} = \mathbf{X}(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y} = \mathbf{H}\mathbf{y}$

H is independent of the observed outputs (y); it depends only on the carriers.
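The formulas above translate almost line for line into Matlab. The following is a minimal sketch, not taken from the lecture, using synthetic data so it runs as-is; all variable names are illustrative.

  % Synthetic example data (illustrative only)
  rng(0);
  n = 30;
  X = [ones(n,1), rand(n,2)];           % carrier matrix: constant + 2 carriers
  y = X*[1; 2; -0.5] + 0.1*randn(n,1);  % response with known coefficients

  p    = size(X,2);
  C    = inv(X'*X);                     % (X'X)^-1, reused for uncertainty below
  bhat = C*(X'*y);                      % beta-hat = (X'X)^-1 X'y   (KR 6.2.7)
  yhat = X*bhat;                        % estimated mean values     (KR 6.2.9)
  e    = y - yhat;                      % residuals
  SSE  = e'*e;                          % error sum of squares
  s2   = SSE/(n-p);                     % sigma-hat^2 = SSE/(n-p)
  SSy  = sum((y - mean(y)).^2);         % total sum of squares
  SSR  = SSy - SSE;                     % regression sum of squares (KR 6.2.13)
  R2   = SSR/SSy;                       % coefficient of determination
  F    = (SSR/(p-1))/(SSE/(n-p));       % overall F statistic       (KR 6.2.16)
  pF   = 1 - fcdf(F, p-1, n-p);         % small pF rejects H0: all beta_i = 0

In numerical work one would compute bhat = X\y rather than forming the explicit inverse; the inverse is kept here only because C reappears in the interval formulas.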
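Continuing the same sketch, the interval formulas (KR 6.2.18, 6.2.32–6.2.34) follow directly; x0 is a hypothetical row of carrier values (leading 1 included) at which to predict.

  alpha = 0.05;
  tc  = tinv(1-alpha/2, n-p);            % t_{n-p, alpha/2}
  seb = sqrt(s2*diag(C));                % standard errors of the coefficients
  bci = [bhat - tc*seb, bhat + tc*seb];  % cf. bint from regress    (KR 6.2.18)

  x0  = [1, 0.5, 0.5];                   % hypothetical prediction point
  y0  = x0*bhat;                         % point estimate at x0
  sem = sqrt(s2*(x0*C*x0'));             % std. error of the mean response
  sep = sqrt(s2*(1 + x0*C*x0'));         % std. error of a single future value
  ciMean = [y0 - tc*sem, y0 + tc*sem];   % limits on E[Y|x0]        (KR 6.2.33)
  ciPred = [y0 - tc*sep, y0 + tc*sep];   % wider limits on Y0       (KR 6.2.34)

The interval on an individual future value is always wider than the interval on the mean response, because it adds the variance of a single new observation to the variance of the estimated mean.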
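Because H depends only on X, its weights can be inspected before any y is observed; a short continuation of the sketch:

  H  = X*C*X';     % hat matrix: yhat = H*y
  w1 = H(1,:);     % weights that combine all y-values into yhat(1)
  h  = diag(H);    % leverage of each observation; sum(h) equals p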
Linear regression predictions are a weighted average of the original y-values. For a seven-point example, the hat matrix is

         [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]
  [1,]   0.46  0.36  0.25  0.14  0.04 -0.07 -0.18
  [2,]   0.36  0.29  0.21  0.14  0.07  0.00 -0.07
  [3,]   0.25  0.21  0.18  0.14  0.11  0.07  0.04
  [4,]   0.14  0.14  0.14  0.14  0.14  0.14  0.14
  [5,]   0.04  0.07  0.11  0.14  0.18  0.21  0.25
  [6,]  -0.07  0.00  0.07  0.14  0.21  0.29  0.36
  [7,]  -0.18 -0.07  0.04  0.14  0.25  0.36  0.46

[Figure: weights from the hat matrix. Each line in the plot represents the weights used to determine the fitted y-value at the indicated point.]

Diagonals of the Hat Matrix

The diagonal elements $h_i$ quantify the leverage that a point has on the regression.

[Figure: diagonals of the hat matrix plotted against the carrier value; leverage is largest at the extremes of the data.]

[Figures, Helsel and Hirsch page 246: in MLR, an outlier can have high leverage but low influence; in SLR, an outlier with high leverage can also have high influence.]

Outliers are harder to detect in MLR, so several residual measures are used.

Standardized residual (compare to a Normal or t distribution):

$r_i = \dfrac{\varepsilon_i}{\sqrt{\hat{\sigma}^2 (1 - h_i)}}$   (KR 6.2.26)

Prediction residual (leave-one-out estimate):

$\varepsilon_{(i)} = y_i - \hat{y}_{(i)} = \dfrac{\varepsilon_i}{1 - h_i}$   (HH p247)

Prediction error sum of squares:

$PRESS = \sum \varepsilon_{(i)}^2$   (HH p247)

Studentized residual (compare to a t distribution):

$TRESID_i = \dfrac{\varepsilon_i}{\sqrt{\hat{\sigma}_{(i)}^2 (1 - h_i)}}$   (HH p247)

Cook's Distance: Leverage vs. Actual Influence
• The diagonal of the hat matrix ($h_{ii}$) indicates the leverage of point i.
• Leverage is not the same as actual influence.
• Actual influence is realized only if the predicted value is very different from the observed point: compare $\hat{\boldsymbol{\beta}}_{(i)}$ with $\hat{\boldsymbol{\beta}}$, or $\hat{y}_{(i)}$ with $y_i$.
• Cook's distance (a point is a suspected outlier if $C_i > 1$):

$C_i = \dfrac{(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}})^T \mathbf{X}^T \mathbf{X} (\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}})}{p\,\hat{\sigma}^2}$   (KR 6.2.27)

Choosing Variables in MLR (Helsel and Hirsch p309)
• Stepwise regression, forward or backward, based on F or t statistics; the best model is not guaranteed.
• Include variables for which there is a plausible theory of why they should influence the response.
• Evaluate all possibilities using an overall measure of quality (HH p313).

Overall Measures of Quality
• Mallows' Cp (HH p313)
• Prediction error sum of squares, $PRESS = \sum \varepsilon_{(i)}^2$ (HH p247)
• Adjusted R²: $R_a^2 = 1 - \dfrac{(n-1)}{(n-p)}\dfrac{SSE}{SS_y}$ (HH p313)
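As a sketch of the residual diagnostics above, continuing from the earlier Matlab sketches (it reuses e, h, s2, n, and p). The Cook's distance line uses the standard leverage form, which is algebraically equivalent to KR 6.2.27.

  r   = e ./ sqrt(s2*(1-h));               % standardized residuals (KR 6.2.26)
  ei  = e ./ (1-h);                        % prediction (leave-one-out) residuals
  PRESS = sum(ei.^2);                      % prediction error sum of squares
  s2i = ((n-p)*s2 - e.^2./(1-h))/(n-p-1);  % sigma-hat^2 with point i left out
  tr  = e ./ sqrt(s2i.*(1-h));             % studentized residuals (compare to t)
  D   = (r.^2/p).*(h./(1-h));              % Cook's distance; investigate if > 1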
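Finally, a sketch of the overall quality measures for comparing candidate models. SSE and p refer to the candidate being scored; s2full is an assumed variable name for $\hat{\sigma}^2$ from the model containing all candidate carriers, which Mallows' Cp requires.

  R2adj = 1 - ((n-1)/(n-p))*(SSE/SSy);  % adjusted R^2; penalizes extra carriers
  Cp    = SSE/s2full - n + 2*p;         % Mallows' Cp; compare with p
  % PRESS (computed above) also ranks candidates by leave-one-out
  % prediction error; smaller is better.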