Multiple Linear Regression
Learning Objectives
• Extend Simple Linear Regression concepts to regression with multiple explanatory variables
• Apply the Matlab regression tools and interpret their output
• Choose the variables to use in a multiple regression
• Quantify the uncertainty of MLR predictions
Readings
• Kottegoda and Rosso, Chapter 6 (6.2)
• Helsel and Hirsch, Chapters 9 and 11
• Hastie, Tibshirani and Friedman, Chapter 3
• Matlab Statistics Toolbox User's Guide, Chapter 6
Multiple Linear Regression Model
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_{p-1} x_{p-1} + \epsilon, \qquad \epsilon \sim N(0, \sigma^2)$$
$$\hat{Y} = E(Y \mid \mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_{p-1} x_{p-1}$$
$$\mathrm{Var}(Y) = \mathrm{Var}(\epsilon) = \sigma^2$$
Data for Multiple Linear Regression
Output vector and input (carrier) matrix:
$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad X = \begin{bmatrix} 1 & x_{1,1} & x_{1,2} & \cdots & x_{1,p-1} \\ 1 & x_{2,1} & x_{2,2} & \cdots & x_{2,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n,1} & x_{n,2} & \cdots & x_{n,p-1} \end{bmatrix}$$
X is the carrier matrix. In matrix form the model is
$$\mathbf{y} = X \boldsymbol{\beta} + \boldsymbol{\epsilon}$$
where $\boldsymbol{\epsilon}$ is the vector of residuals.
Solving Multiple Linear Regression
$$\mathbf{y} = X \boldsymbol{\beta} + \boldsymbol{\epsilon}$$
$$SSE = \boldsymbol{\epsilon}^T \boldsymbol{\epsilon} = \sum_i \epsilon_i^2$$
Minimizing SSE results in
$$\hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T \mathbf{y} \qquad \text{(KR 6.2.7)}$$
Vector of estimated mean values at each observation:
$$\hat{\mathbf{y}} = X \hat{\boldsymbol{\beta}} = X (X^T X)^{-1} X^T \mathbf{y} = H \mathbf{y}$$
Vector of residuals:
$$\boldsymbol{\epsilon} = \mathbf{y} - \hat{\mathbf{y}} \qquad \text{(KR 6.2.9)}$$
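A minimal Matlab sketch of this solution, using synthetic data (the variable names and the data-generating step are illustrative assumptions, not from the lecture):

    n = 50;                                  % number of observations
    rng(1);                                  % reproducible random numbers
    X = [ones(n,1) randn(n,2)];              % carrier matrix: intercept plus 2 variables
    beta_true = [1; 2; -0.5];                % "true" coefficients for the synthetic data
    y = X*beta_true + 0.3*randn(n,1);        % y = X*beta + epsilon

    beta_hat = (X'*X)\(X'*y);                % least squares estimate (KR 6.2.7)
    y_hat    = X*beta_hat;                   % fitted mean values, y_hat = H*y
    e        = y - y_hat;                    % residuals (KR 6.2.9)

In practice X\y gives the same estimate with better numerical behavior than forming (X'*X) explicitly.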
Error Variance
$$SSE = \boldsymbol{\epsilon}^T \boldsymbol{\epsilon} = \sum_i \epsilon_i^2, \qquad \hat{\sigma}^2 = \frac{SSE}{n - p}$$
$$SS_y = \sum_i (y_i - \bar{y})^2 \qquad \text{sum of squares of observation deviations from the mean}$$
$$SSR = \sum_i (\hat{y}_i - \bar{y})^2 \qquad \text{sum of squares of regression estimate deviations from the mean}$$
$$SS_y = SSR + SSE \qquad \text{(KR 6.2.13)}$$
$$R^2 = \frac{SSR}{SS_y} = 1 - \frac{SSE}{SS_y}$$
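A sketch of this variance decomposition, continuing the previous block (assumes y, X, y_hat and e from above):

    [n, p] = size(X);                        % p = number of fitted parameters
    SSE = e'*e;                              % error sum of squares
    s2  = SSE/(n - p);                       % estimate of sigma^2
    SSy = sum((y - mean(y)).^2);             % total sum of squares about the mean
    SSR = sum((y_hat - mean(y)).^2);         % regression sum of squares
    R2  = SSR/SSy;                           % coefficient of determination, = 1 - SSE/SSy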
Significance Tests on the Regression
Overall significance:
$$H_0: \beta_i = 0 \text{ for all } i$$
$$\frac{SSR/(p-1)}{SSE/(n-p)} \sim F_{p-1,\,n-p} \qquad \text{(KR 6.2.16)}$$
Nested/Partial F test (significance of "new" parameters):
$$H_0: \beta_i = 0 \text{ for new parameters}$$
$$\frac{(SSE_{p_0} - SSE_{p_1})/(p_1 - p_0)}{SSE_{p_1}/(n - p_1)} \sim F_{p_1 - p_0,\,n - p_1} \qquad \text{(KR 6.2.19; HH p297)}$$
This compares a complicated model with $p_1$ parameters against a simpler model with $p_0$ parameters.
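Both tests can be evaluated with the Statistics Toolbox F distribution functions; a sketch continuing the previous blocks, where the variable dropped in the partial test is an arbitrary illustrative choice:

    F    = (SSR/(p-1)) / (SSE/(n-p));        % overall F statistic (KR 6.2.16)
    pval = 1 - fcdf(F, p-1, n-p);            % reject H0 for small pval

    X0   = X(:, 1:2);                        % simpler model: drop the last variable
    e0   = y - X0*((X0'*X0)\(X0'*y));        % residuals of the simpler model
    SSE0 = e0'*e0;
    p0 = 2; p1 = p;
    Fpart = ((SSE0 - SSE)/(p1-p0)) / (SSE/(n-p1));   % partial F (KR 6.2.19)
    ppart = 1 - fcdf(Fpart, p1-p0, n-p1);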
Significance and confidence limits on regression parameters
$$C = (X^T X)^{-1}$$
$$\frac{\hat{\beta}_i - \beta_i}{\hat{\sigma} \sqrt{c_{ii}}} \sim T_{n-p} \qquad \text{(KR 6.2.17)}$$
$$\Pr[\hat{\beta}_i - t_{n-p,\alpha/2}\,\hat{\sigma}\sqrt{c_{ii}} \le \beta_i \le \hat{\beta}_i + t_{n-p,\alpha/2}\,\hat{\sigma}\sqrt{c_{ii}}] = 1 - \alpha \qquad \text{(KR 6.2.18)}$$
    [b,bint,r,rint,stats] = regress(Y,X);
    b, bint

    b =
        0.0057
        0.2187
       -0.0074

    bint =
       -0.1128    0.1242
        0.1319    0.3054
       -0.0248    0.0100
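The bint limits reported by regress can be reproduced directly from KR 6.2.17 and 6.2.18; a sketch continuing the earlier blocks (alpha = 0.05 matches the regress default):

    C     = inv(X'*X);                       % C = (X'X)^-1
    se    = sqrt(s2 * diag(C));              % standard errors, sigma-hat*sqrt(c_ii)
    tcrit = tinv(1 - 0.05/2, n - p);         % t_{n-p, alpha/2}
    bint_manual = [beta_hat - tcrit*se, beta_hat + tcrit*se];   % KR 6.2.18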
Confidence limits on mean response
$$\hat{y}_0 = \mathbf{x}_0 \hat{\boldsymbol{\beta}}$$
$$\mathrm{Var}[\hat{y}_0 \mid \mathbf{x}_0] = \mathbf{x}_0 C \mathbf{x}_0^T \sigma^2, \qquad C = (X^T X)^{-1} \qquad \text{(KR 6.2.32)}$$
$$\Pr[\mathbf{x}_0 \hat{\boldsymbol{\beta}} - t_{n-p,\alpha/2}\,\hat{\sigma}\sqrt{\mathbf{x}_0 C \mathbf{x}_0^T} \le E[Y \mid \mathbf{x}_0] \le \mathbf{x}_0 \hat{\boldsymbol{\beta}} + t_{n-p,\alpha/2}\,\hat{\sigma}\sqrt{\mathbf{x}_0 C \mathbf{x}_0^T}] = 1 - \alpha \qquad \text{(KR 6.2.33)}$$
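A sketch of the mean-response interval at a new point, continuing the earlier blocks (the point x0 is an illustrative assumption):

    x0      = [1 0.5 -0.2];                  % carrier row: intercept, x1, x2
    y0_hat  = x0*beta_hat;                   % estimated mean response
    se_mean = sqrt(s2 * (x0*C*x0'));         % KR 6.2.32
    ci_mean = y0_hat + tcrit*se_mean*[-1 1]; % KR 6.2.33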
Confidence limits on individual future value
$$Y_0 = \mathbf{x}_0 \boldsymbol{\beta} + \epsilon$$
$$\mathrm{Var}[Y_0 \mid \mathbf{x}_0] = \mathrm{Var}[\hat{y}_0 \mid \mathbf{x}_0] + \sigma^2 = \sigma^2(\mathbf{x}_0 C \mathbf{x}_0^T + 1) \qquad \text{(KR 6.2.34)}$$
$$\Pr[\mathbf{x}_0 \hat{\boldsymbol{\beta}} - t_{n-p,\alpha/2}\,\hat{\sigma}\sqrt{1 + \mathbf{x}_0 C \mathbf{x}_0^T} \le Y_0 \le \mathbf{x}_0 \hat{\boldsymbol{\beta}} + t_{n-p,\alpha/2}\,\hat{\sigma}\sqrt{1 + \mathbf{x}_0 C \mathbf{x}_0^T}] = 1 - \alpha$$
Regression Diagnostics
Do not rely only on R², F, SSE and T statistics. (Read Helsel and Hirsch pages 244 and 300.) Use graphical tools to diagnose MLR deficiencies.
Partial Residual Plot
[Figure: Added variable plot for X1, adjusted for X3. Y residuals vs. X1 residuals, showing the adjusted data, the fit y = 0.0352132*x, and 95% confidence bounds.]
Diagnostic Plots
[Figure: Residuals vs. precipitation, residuals vs. predicted values, and a normal probability plot of the residuals. Helsel and Hirsch page 245.]
The Hat Matrix
$$\hat{\mathbf{y}} = X \hat{\boldsymbol{\beta}} = X (X^T X)^{-1} X^T \mathbf{y} = H \mathbf{y}$$
H is independent of the observed outputs (y). Linear regression predictions are a weighted average of the original y-values:
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 0.46 0.36 0.25 0.14 0.04 -0.07 -0.18
[2,] 0.36 0.29 0.21 0.14 0.07 0.00 -0.07
[3,] 0.25 0.21 0.18 0.14 0.11 0.07 0.04
[4,] 0.14 0.14 0.14 0.14 0.14 0.14 0.14
[5,] 0.04 0.07 0.11 0.14 0.18 0.21 0.25
[6,] -0.07 0.00 0.07 0.14 0.21 0.29 0.36
[7,] -0.18 -0.07 0.04 0.14 0.25 0.36 0.46
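The 7x7 table above is consistent with a simple linear regression on x = 1, ..., 7; a sketch reconstructing it under that assumption:

    x  = (1:7)';                             % assumed design points
    X7 = [ones(7,1) x];                      % SLR carrier matrix
    H  = X7*((X7'*X7)\X7');                  % H = X (X'X)^-1 X'
    disp(round(H, 2))                        % reproduces the table above
    leverage = diag(H);                      % h_ii, the leverage of each point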
[Figure: Weights from the Hat matrix (row hat[1,] etc.) plotted against x-value. Each line in the plot represents the weights used to determine the fitted y-value at the indicated point.]
Diagonals of the Hat Matrix Quantify the Leverage that a point has on the regression
[Figure: Diagonals of the Hat matrix, the leverage h_ii, plotted for each observation.]
[Figure: MLR hat matrix vs. SLR illustrations: an outlier with high leverage but low influence, and an outlier with high leverage and high influence. Helsel and Hirsch page 246.]
Outliers are harder to detect in MLR
Standardized residual (compare to Normal or t distribution):
$$r_i = \frac{\epsilon_i}{\sqrt{\hat{\sigma}^2 (1 - h_i)}} \qquad \text{(KR 6.2.26)}$$
Prediction residual (leave-one-out estimate):
$$\epsilon_{(i)} = y_i - \hat{y}_{(i)} = \frac{\epsilon_i}{1 - h_i} \qquad \text{(HH p247)}$$
Prediction Error Sum of Squares:
$$PRESS = \sum_i \epsilon_{(i)}^2 \qquad \text{(HH p247)}$$
Studentized residual (compare to t distribution):
$$TRESID_i = \frac{\epsilon_i}{\sqrt{\hat{\sigma}_{(i)}^2 (1 - h_i)}} \qquad \text{(HH p247)}$$
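A sketch of these diagnostics, continuing the earlier synthetic example; the leave-one-out variance identity used for the studentized residual is a standard algebraic shortcut rather than a formula from the slides:

    h      = diag(X*((X'*X)\X'));            % leverages h_i
    r_std  = e ./ sqrt(s2*(1 - h));          % standardized residuals (KR 6.2.26)
    e_loo  = e ./ (1 - h);                   % prediction residuals (HH p247)
    PRESS  = sum(e_loo.^2);                  % prediction error sum of squares
    s2_loo = ((n-p)*s2 - e.^2./(1-h)) / (n-p-1);   % sigma-hat^2 with point i left out
    tresid = e ./ sqrt(s2_loo.*(1 - h));     % studentized residuals (HH p247)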
Cook's Distance: Leverage vs. Actual Influence
• The Hat matrix diagonal (h_ii) indicates the leverage of point i.
• The leverage is not the same as the actual influence.
• Actual influence is only realized if the predicted value is very different from the observed point: compare $\hat{\boldsymbol{\beta}}_{(i)}$ vs. $\hat{\boldsymbol{\beta}}$, and $\hat{y}_{(i)}$ vs. $y_i$.
• Cook's Distance (outlier if > 1):
$$C_i = \frac{(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}})^T X^T X (\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}})}{p\,\hat{\sigma}^2} \qquad \text{(KR 6.2.27)}$$
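Cook's distance can be computed without refitting the model n times, using the leverage form (a standard identity, algebraically equivalent to KR 6.2.27); a sketch continuing the block above:

    D = (r_std.^2 .* h) ./ (p*(1 - h));      % Cook's distance for each point
    influential = find(D > 1)                % flag points with D > 1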
Choosing Variables in MLR
(Helsel and Hirsch p309)
• Stepwise regression (forward or backward, based on F or t statistics); the best model is not guaranteed. See the sketch after this list.
• A plausible theory for why the variable should influence the response.
• Evaluate all possibilities using an overall measure of quality (HH p313).
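A hedged sketch of stepwise selection with the Statistics Toolbox stepwisefit function (stepwisefit adds the intercept itself, so the column of ones is dropped; the setup continues the earlier synthetic example):

    Xpred = X(:, 2:end);                     % predictors without the intercept column
    [b_sw, se_sw, pval_sw, inmodel] = stepwisefit(Xpred, y);
    find(inmodel)                            % indices of the variables retained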
Overall Measures of Quality
• Mallows' Cp (HH p313)
• Prediction Error Sum of Squares: $PRESS = \sum_i \epsilon_{(i)}^2$ (HH p247)
• Adjusted R² (see the sketch below):
$$R_a^2 = 1 - \frac{(n-1)}{(n-p)} \frac{SSE}{SS_y} \qquad \text{(HH p313)}$$
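A sketch of the adjusted R² computation, continuing the earlier blocks; candidate models can then be ranked by R2_adj (larger is better) or PRESS (smaller is better):

    R2_adj = 1 - ((n-1)/(n-p)) * (SSE/SSy);  % adjusted R^2 (HH p313)
    % Unlike R^2, R2_adj can decrease when a variable that adds little is included.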