Datamining and statistical learning, Lecture 9 (IDA.LiU.se)
Generalized additive models (GAMs)
• Some examples of linear models
• Proc GAM in SAS
• Model selection in GAM
Linear regression models
$$E(Y \mid X_1, \ldots, X_p) = \beta_0 + \sum_{j=1}^{p} \beta_j X_j$$

The inputs can be:
• quantitative inputs
• functions of quantitative inputs
• basis expansions of quantitative inputs
• dummy variables
• interaction terms
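For instance (an illustrative specification, not one from the lecture), a quadratic basis expansion of $X_1$ together with a dummy variable $D$ and their interaction still fits this linear form:

$$E(Y \mid X_1, D) = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 D + \beta_4 X_1 D$$

The model is linear in the coefficients $\beta_j$ even though it is nonlinear in $X_1$.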
Justification of linear regression models

• Many response variables are linearly or almost linearly related to a set of inputs
• Linear models are easy to comprehend and to fit to observed data
• Linear regression models are particularly useful when:
  • the number of cases is moderate
  • data are sparse
  • the signal-to-noise ratio is low
Performance of predictors based on (i) a simple linear regression model and (ii) a quadratic regression model, when the true expected response is a second-order polynomial in the input

[Figure: two panels comparing the true expected response E(y) with fitted values for inputs 0 to 4: predictions based on the linear model (yhat) and predictions based on the quadratic model (yhat2).]
Logistic regression of multiple purchases
vs first amount spent
[Figure: observed binary response and estimated event probability plotted against the first amount spent (0 to 7000).]
Logistic regression for a binary response variable Y
$$E(Y \mid X = x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}$$

$$\log\frac{E(Y \mid X = x)}{1 - E(Y \mid X = x)} = \log\frac{P(Y = 1 \mid X = x)}{P(Y = 0 \mid X = x)} = \beta_0 + \beta_1 x$$

[Figure: E(Y | X = x) plotted against x (0 to 4), increasing from 0 towards 1 along a logistic curve.]

The logit of the expectation of Y given x is a linear function of x.
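As a numeric illustration (with hypothetical coefficients, not estimates from the purchase data): if $\beta_0 = -3$ and $\beta_1 = 0.001$, a customer whose first amount spent is $x = 3000$ has

$$E(Y \mid X = 3000) = \frac{\exp(-3 + 0.001 \cdot 3000)}{1 + \exp(-3 + 0.001 \cdot 3000)} = \frac{\exp(0)}{1 + \exp(0)} = 0.5.$$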
Generalized additive models: some examples
A nonlinear, additive model
$$E(Y \mid X_1, \ldots, X_p) = \alpha + s_1(X_1) + \ldots + s_p(X_p)$$

A mixed linear and nonlinear, additive model

$$E(Y \mid X_1, \ldots, X_{p+q}) = \alpha + s_1(X_1) + \ldots + s_p(X_p) + \sum_{j=1}^{q} \beta_j X_{p+j}$$

A mixed linear and nonlinear, additive model with a class variable

$$E(Y \mid X_1, \ldots, X_{p+q}, \mathrm{Class} = k) = \alpha_k + s_1(X_1) + \ldots + s_p(X_p) + \sum_{j=1}^{q} \beta_j X_{p+j}$$
Generalized additive models:
Modelling the concentration of total nitrogen at Lobith on the Rhine
[Figure: monthly total-N concentration (mg N/l) at Lobith, 1989 to 2001: observed data and fitted model. Output: total-N concentration. Inputs: a monthly pattern and a trend function.]
Modelling the concentration of total nitrogen at Lobith on the Rhine:
Extracted additive components
[Figure: extracted additive components: linear and smooth components of Year (1988 to 2004), and linear and smooth components of Month.]
Weekly mortality and confirmed cases of influenza in Sweden
[Figure: weekly mortality (0 to 3000) and confirmed cases of influenza (0 to 450) in Sweden, 1994 to 2004.]

Response: weekly mortality
Inputs: confirmed cases of influenza, seasonal dummies, long-term trend
SYNTAX for common GAM models
Type of Model        Syntax
Parametric           model y = param(x);
Nonparametric        model y = spline(x);
Nonparametric        model y = loess(x);
Semiparametric       model y = param(x1) spline(x2);
Additive             model y = spline(x1) spline(x2);
Thin-plate spline    model y = spline2(x1,x2);
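As an illustration of the semiparametric form, a call combining one linear term and one spline term could look as follows (a sketch with made-up data set and variable names, not taken from the lecture):

proc gam data=mydata;
/* x1 enters linearly (parametric term); x2 enters as a smooth spline term */
model y = param(x1) spline(x2, df=4);
output out = gamout pred resid;
run;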
Generalized additive models:
Modelling the concentration of total nitrogen at Lobith on the Rhine
Model 1
proc gam data=Mining.Rhine;
model Nconc = spline(Year) spline(Month);
output out = addmodel1;
run;
Model 2
proc gam data=Mining.Rhine;
model Nconc = spline2(Year, Month);
output out = addmodel2;
run;
Proc GAM – degrees of freedom of the spline components
The degrees of freedom of the spline components are either selected by the user or determined automatically by specifying method=GCV.
proc gam data=Mining.Rhine;
model Nconc = spline(Year, df=3) spline(Month, df=3);
output out = addmodel1;
run;
• df=3 implies that the same cubic polynomial is valid in the entire range of the input
• Increasing the df value implies that knots are introduced
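The smoothing can also be selected automatically. A sketch (not taken from the lecture) of the same model with generalized cross-validation; the method=GCV option goes after a slash in the MODEL statement:

proc gam data=Mining.Rhine;
/* let generalized cross-validation select the degrees of freedom */
model Nconc = spline(Year) spline(Month) / method=GCV;
output out = addmodel1gcv;
run;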
Generalized additive models:
Modelling the concentration of total nitrogen at Lobith on the Rhine
proc gam data=Mining.Rhine;
model Nconc = spline(Year) spline(Month);
output out = addmodel1;
run;
[Figure: partial prediction components P_Year and P_Month from the fitted model, plotted against observation number (1 to 157).]
Generalized additive models:
Modelling the concentration of total nitrogen at Lobith on the Rhine
Surface Plot of P_Nconc vs Month, Year (Model 1)

[Figure: fitted surface over Year (1990 to 2000) and Month (0 to 12).]
Generalized additive models:
Modelling the concentration of total nitrogen at Lobith on the Rhine
Surface Plot of P_Nconc_2 vs Month, Year (Model 2, df=4)

[Figure: fitted surface over Year (1990 to 2000) and Month (0 to 12).]
Generalized additive models:
Modelling the concentration of total nitrogen at Lobith on the Rhine
Surface Plot of P_Nconc_3 vs Month, Year (Model 3, df=20)

[Figure: fitted surface over Year (1990 to 2000) and Month (0 to 12).]
Generalized additive models:
Modelling the concentration of total nitrogen at Lobith on the Rhine
Model 1
The GAM Procedure
Dependent Variable: Nconc
Smoothing Model Component(s): spline(Year) spline(Month)

Summary of Input Data Set
  Number of Observations                       168
  Number of Missing Observations                 0
  Distribution                            Gaussian
  Link Function                           Identity

Iteration Summary and Fit Statistics
  Final Number of Backfitting Iterations             2
  Final Backfitting Criterion              1.987193E-30
  The Deviance of the Final Estimate        42.92519322

The local score algorithm converged.
Generalized additive models:
Modelling the concentration of total nitrogen at Lobith on the Rhine
Model 1
Regression Model Analysis
Parameter Estimates

  Parameter        Parameter Estimate   Standard Error   t Value   Pr > |t|
  Intercept                 420.69388         19.84413     21.20     <.0001
  Linear(Year)               -0.20824          0.00994    -20.94     <.0001
  Linear(Month)              -0.10461          0.01161     -9.01     <.0001

Smoothing Model Analysis
Analysis of Deviance

  Source           DF        Sum of Squares   Chi-Square   Pr > ChiSq
  Spline(Year)     3.00000         2.527155       9.3609       0.0249
  Spline(Month)    3.00000        51.143931     189.4432       <.0001
Generalized additive models:
Modelling the concentration of total nitrogen at Lobith on the Rhine
Model 2
Iteration Summary and Fit Statistics
  Final Number of Backfitting Iterations             2
  Final Backfitting Criterion                        0
  The Deviance of the Final Estimate       74.22284569

Regression Model Analysis
Parameter Estimates

  Parameter    Parameter Estimate   Standard Error   t Value   Pr > |t|
  Intercept               4.46475          0.05206     85.76     <.0001

Smoothing Model Analysis
Analysis of Deviance

  Source                 DF        Sum of Squares   Chi-Square   Pr > ChiSq
  Spline2(Year Month)    4.00000       162.668070     357.2336       <.0001
Generalized additive models:
Modelling the concentration of total nitrogen at Lobith on the Rhine
Model 2 (20 df)

Iteration Summary and Fit Statistics
  Final Number of Backfitting Iterations             2
  Final Backfitting Criterion                        0
  The Deviance of the Final Estimate      36.577160798

Regression Model Analysis
Parameter Estimates

  Parameter    Parameter Estimate   Standard Error   t Value   Pr > |t|
  Intercept               4.46475          0.03849    116.01     <.0001

Smoothing Model Analysis
Analysis of Deviance

  Source                 DF         Sum of Squares   Chi-Square   Pr > ChiSq
  Spline2(Year Month)    20.00000       200.313755     805.0412       <.0001
Estimation of additive models
- the backfitting algorithm
$$E(Y \mid X_1 = x_1, \ldots, X_p = x_p) = \alpha + f_1(x_1) + \ldots + f_p(x_p)$$

1. Initialize: $\hat{\alpha} = \frac{1}{N}\sum_{i=1}^{N} y_i$, $\hat{f}_j \equiv 0$, $j = 1, \ldots, p$

2. Cycle: $j = 1, \ldots, p, 1, \ldots, p, \ldots$

$$\hat{f}_j \leftarrow S_j\!\left[\left\{y_i - \hat{\alpha} - \sum_{k \neq j} \hat{f}_k(x_{ik})\right\}_{i=1}^{N}\right]$$

$$\hat{f}_j \leftarrow \hat{f}_j - \frac{1}{N}\sum_{i=1}^{N} \hat{f}_j(x_{ij})$$
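As a concrete illustration (not part of the lecture notes), with $p = 2$ smooth terms one cycle of the algorithm is

$$\hat{f}_1 \leftarrow S_1\!\left[\left\{y_i - \hat{\alpha} - \hat{f}_2(x_{i2})\right\}_{i=1}^{N}\right], \qquad \hat{f}_1 \leftarrow \hat{f}_1 - \frac{1}{N}\sum_{i=1}^{N}\hat{f}_1(x_{i1}),$$

$$\hat{f}_2 \leftarrow S_2\!\left[\left\{y_i - \hat{\alpha} - \hat{f}_1(x_{i1})\right\}_{i=1}^{N}\right], \qquad \hat{f}_2 \leftarrow \hat{f}_2 - \frac{1}{N}\sum_{i=1}^{N}\hat{f}_2(x_{i2}),$$

and the cycle is repeated until the estimated functions stop changing.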
Modelling ln daily electricity consumption as a spline function
of the population-weighted mean temperature in Sweden
proc gam data=sasuser.smhi;
model lnDaily_consumption = spline(Meantemp, df=20);
ID Time;
output out=smhiouttemp pred resid;
run;
[Figure: observed and fitted ln daily electricity consumption (MWh), about 12.2 to 13.4, plotted against the population-weighted temperature (-30 to 30).]
Modelling ln daily electricity consumption as a spline function of the population-weighted mean temperature in Sweden: residual analysis

[Figure: residuals (about -0.25 to 0.20) plotted against Julian day (0 to 400).]
Modelling ln daily electricity consumption in Sweden
- residual analysis
[Figure: residuals (about -0.25 to 0.20) against Julian day (0 to 400) for two models: (i) spline of temperature only, and (ii) spline of temperature, spline of Julian day, and weekday dummies.]
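The extended model above adds a spline of Julian day and weekday dummies to the temperature spline. A sketch of how such a model might be requested in Proc GAM; the variable names Julianday and Weekday, and the df values, are assumptions rather than details taken from the lecture:

proc gam data=sasuser.smhi;
class Weekday;
/* weekday dummies enter parametrically; temperature and Julian day enter as splines */
model lnDaily_consumption = param(Weekday) spline(Meantemp, df=20) spline(Julianday, df=10);
output out=smhiout2 pred resid;
run;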
Modelling ln daily electricity consumption in Sweden
- residual analysis
[Figure: residuals (about -0.25 to 0.20) against Julian day (0 to 400) for two models: (i) spline of temperature, spline of Julian day, and weekday dummies; (ii) splines of contemporaneous and time-lagged weather data, splines of Julian day and time, and weekday and holiday dummies.]
Deviance analysis of the investigated models of
ln daily electricity consumption in Sweden
Model                           Deviance
Temp only                         10.233
Temp, Julian day, weekday          3.822
Final model                        0.742

The residual deviance of a fitted model is minus twice its log-likelihood. If the error terms are normally distributed, the deviance is equal to the sum of squared residuals.
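To make the last statement concrete (a standard identity, not spelled out on the slide): for normally distributed errors with variance $\sigma^2$,

$$-2 \log L = N \log(2\pi\sigma^2) + \frac{1}{\sigma^2}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2,$$

so, apart from a constant and a scale factor that are the same for all models, comparing deviances amounts to comparing residual sums of squares.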
Modelling ln daily electricity consumption in Sweden:
time series plot of residuals
[Figure: time series plot of residuals (about -0.15 to 0.15) against time (0 to 2000).]