Datamining and statistical learning, lecture 9

Generalized additive models (GAMs)
• Some examples of linear models
• Proc GAM in SAS
• Model selection in GAM

Linear regression models

E(Y | X_1, ..., X_p) = β_0 + Σ_{j=1}^p β_j X_j

The inputs can be:
• quantitative inputs
• functions of quantitative inputs
• basis expansions of quantitative inputs
• dummy variables
• interaction terms

Justification of linear regression models
• Many response variables are linearly, or almost linearly, related to a set of inputs.
• Linear models are easy to comprehend and to fit to observed data.
• Linear regression models are particularly useful when:
  – the number of cases is moderate
  – data are sparse
  – the signal-to-noise ratio is low

Performance of predictors based on (i) a simple linear regression model and (ii) a quadratic regression model, when the true expected response is a second-order polynomial in the input

[Figure: predictions based on a linear model and on a quadratic model; E(y) and the fitted values (yhat, yhat2) plotted against the input.]

Logistic regression of multiple purchases vs first amount spent

[Figure: observed binary responses and estimated event probabilities plotted against the first amount spent (0 to 7000).]

Logistic regression for a binary response variable Y

E(Y | X = x) = exp(β_0 + β_1 x) / (1 + exp(β_0 + β_1 x))

log[ E(Y | X = x) / (1 − E(Y | X = x)) ] = log[ P(Y = 1 | X = x) / P(Y = 0 | X = x) ] = β_0 + β_1 x

The logit of the expectation of Y given x is a linear function of x.

[Figure: E(Y | X = x) as an S-shaped function of x.]

Generalized additive models: some examples

A nonlinear, additive model:
E(Y | X_1, ..., X_p) = s_1(X_1) + ... + s_p(X_p)

A mixed linear and nonlinear, additive model:
E(Y | X_1, ..., X_{p+q}) = s_1(X_1) + ... + s_p(X_p) + Σ_{j=1}^q β_j X_{p+j}

A mixed linear and nonlinear, additive model with a class variable:
E(Y | X_1, ..., X_{p+q}, Class = k) = α_k + s_1(X_1) + ... + s_p(X_p) + Σ_{j=1}^q β_j X_{p+j}
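A minimal Proc GAM sketch of the last model form, assuming a hypothetical data set work.example with response y, smooth inputs x1 and x2, a linear input x3 and a class variable group; whether class variables enter through param() in exactly this way should be checked against the SAS documentation:

proc gam data=work.example;          /* hypothetical data set */
   class group;                      /* class variable giving the level-specific intercepts alpha_k */
   model y = param(group x3)         /* linear part: class effect and a linear term in x3 */
             spline(x1) spline(x2);  /* smooth components s_1(x1) and s_2(x2) */
   output out=fitted pred;           /* predicted values written to a hypothetical data set */
run;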
Generalized additive models: modelling the concentration of total nitrogen at Lobith on the Rhine

Output: Total-N concentration (mg N/l)
Inputs: monthly pattern, trend function

[Figure: observed data and fitted model, total-N concentration (mg N/l) at Lobith, monthly values 1989 to 2004.]

Extracted additive components

[Figure: year components (linear and smooth) plotted against year 1988 to 2004, and month components (linear and smooth) plotted against month 1 to 12.]

Weekly mortality and confirmed cases of influenza in Sweden

Response: weekly mortality
Inputs: confirmed cases of influenza, seasonal dummies, long-term trend

[Figure: weekly mortality and confirmed cases of influenza in Sweden, 1994 to 2004.]

SYNTAX for common GAM models

Type of model             Syntax                        Mathematical form
Parametric model          y = param(x);                 E(y) = β_0 + β_1 x
Nonparametric model       y = spline(x);                E(y) = β_0 + s(x)
Nonparametric model       y = loess(x);                 E(y) = β_0 + lo(x)
Semiparametric model      y = param(x1) spline(x2);     E(y) = β_0 + β_1 x1 + s(x2)
Additive model            y = spline(x1) spline(x2);    E(y) = β_0 + s_1(x1) + s_2(x2)
Thin-plate spline model   y = spline2(x1, x2);          E(y) = β_0 + s(x1, x2)

Modelling the concentration of total nitrogen at Lobith on the Rhine

Model 1
proc gam data=Mining.Rhine;
   model Nconc = spline(Year) spline(Month);
   output out=addmodel1;
run;

Model 2
proc gam data=Mining.Rhine;
   model Nconc = spline2(Year, Month);
   output out=addmodel2;
run;

Proc GAM – degrees of freedom of the spline components

The degrees of freedom of the spline components are selected by the user, or determined automatically by specifying method=GCV.

proc gam data=Mining.Rhine;
   model Nconc = spline(Year, df=3) spline(Month, df=3);
   output out=addmodel1;
run;

• df=3 implies that the same cubic polynomial is valid over the entire range of the input.
• Increasing the df value implies that knots are introduced.

Partial predictions from Model 1

[Figure: partial predictions P_Year and P_Month from Model 1 plotted against observation number.]

[Figure: surface plot of P_Nconc vs month and year, Model 1.]

[Figure: surface plot of P_Nconc_2 vs month and year, Model 2 (df=4).]

[Figure: surface plot of P_Nconc_3 vs month and year, Model 3 (df=20).]
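The surface plot labelled Model 3 (df=20) appears to correspond to the spline2 fit reported below as Model 2 (20 df); its code is not shown in the slides. A plausible call, assuming the df= option of spline2 works like the df= option of spline:

proc gam data=Mining.Rhine;
   model Nconc = spline2(Year, Month, df=20);  /* df=20 taken from the output listing below */
   output out=addmodel3;                       /* output data set name is hypothetical */
run;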
Model 1: summary output from the GAM procedure

Dependent variable: Nconc
Smoothing model component(s): spline(Year) spline(Month)

Summary of input data set
   Number of observations               168
   Number of missing observations         0
   Distribution                    Gaussian
   Link function                   Identity

Iteration summary and fit statistics
   Final number of backfitting iterations               2
   Final backfitting criterion                1.987193E-30
   The deviance of the final estimate          42.92519322
The local score algorithm converged.

Model 1: regression model analysis

Parameter estimates
   Parameter        Estimate   Standard error   t value   Pr > |t|
   Intercept       420.69388         19.84413     21.20     <.0001
   Linear(Year)     -0.20824          0.00994    -20.94     <.0001
   Linear(Month)    -0.10461          0.01161     -9.01     <.0001

Smoothing model analysis: analysis of deviance
   Source           DF        Sum of squares   Chi-square   Pr > ChiSq
   Spline(Year)     3.00000         2.527155       9.3609       0.0249
   Spline(Month)    3.00000        51.143931     189.4432       <.0001

Model 2: iteration summary and fit statistics
   Final number of backfitting iterations               2
   Final backfitting criterion                          0
   The deviance of the final estimate          74.22284569

   Parameter    Estimate   Standard error   t value   Pr > |t|
   Intercept     4.46475          0.05206     85.76     <.0001

   Source                 DF        Sum of squares   Chi-square   Pr > ChiSq
   Spline2(Year Month)    4.00000       162.668070     357.2336       <.0001

Model 2 (20 df): iteration summary and fit statistics
   Final number of backfitting iterations               2
   Final backfitting criterion                          0
   The deviance of the final estimate         36.577160798

   Parameter    Estimate   Standard error   t value   Pr > |t|
   Intercept     4.46475          0.03849    116.01     <.0001

   Source                 DF         Sum of squares   Chi-square   Pr > ChiSq
   Spline2(Year Month)    20.00000       200.313755     805.0412       <.0001
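The partial-prediction plots shown earlier (P_Year and P_Month) can presumably be drawn directly from the output data set addmodel1. A minimal sketch, assuming that data set carries the input variables Year and Month together with partial predictions named P_Year and P_Month, as the plot labels suggest:

proc sgplot data=addmodel1;
   scatter x=Year y=P_Year;    /* partial prediction for the Year spline */
run;

proc sgplot data=addmodel1;
   scatter x=Month y=P_Month;  /* partial prediction for the Month spline */
run;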
Estimation of additive models - the backfitting algorithm

E(Y | X_1 = x_1, ..., X_p = x_p) = α + f_1(x_1) + ... + f_p(x_p)

1. Initialize: α̂ = (1/N) Σ_{i=1}^N y_i, and f̂_j ≡ 0 for j = 1, ..., p.
2. Cycle over j = 1, ..., p, 1, ..., p, ...:
   f̂_j ← S_j[ { y_i − α̂ − Σ_{k≠j} f̂_k(x_ik) }_{i=1}^N ]
   f̂_j ← f̂_j − (1/N) Σ_{i=1}^N f̂_j(x_ij)
   where S_j denotes the smoother applied to the j-th input; repeat until the functions f̂_j stabilize.

Modelling ln daily electricity consumption as a spline function of the population-weighted mean temperature in Sweden

proc gam data=sasuser.smhi;
   model lnDaily_consumption = spline(Meantemp, df=20);
   ID Time;
   output out=smhiouttemp pred resid;
run;

[Figure: observed and fitted ln daily electricity consumption (MWh) plotted against population-weighted temperature, −30 to 30 °C.]

Residual analysis

[Figure: residuals of the temperature-only spline model plotted against Julian day.]

[Figure: residuals against Julian day for (i) the spline of temperature only and (ii) a model with a spline of temperature, a spline of Julian day, and weekday dummies.]

[Figure: residuals against Julian day for (i) the model with splines of temperature and Julian day plus weekday dummies and (ii) the final model with splines of contemporaneous and time-lagged weather data, splines of Julian day and time, and weekday and holiday dummies.]

Deviance analysis of the investigated models of ln daily electricity consumption in Sweden

Model                              Deviance
Temperature only                     10.233
Temperature, Julian day, weekday      3.822
Final model                           0.742

• The residual deviance of a fitted model is minus twice its log-likelihood.
• If the error terms are normally distributed, the deviance is equal to the sum of squared residuals.

Time series plot of residuals

[Figure: time series plot of the residuals of the final model over the whole observation period.]
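The intermediate model in the deviance comparison (splines of temperature and Julian day plus weekday dummies) is not listed in the slides. A sketch of how it could be specified with the syntax used above; the variable name Julian_day and the weekday dummies Mon through Sat are hypothetical, and the df values are only illustrative:

proc gam data=sasuser.smhi;
   model lnDaily_consumption = spline(Meantemp, df=20)
                               spline(Julian_day, df=10)       /* hypothetical variable name */
                               param(Mon Tue Wed Thu Fri Sat); /* hypothetical weekday dummies */
   output out=smhiout_ext pred resid;
run;

The final model would add further spline() terms for the time-lagged weather variables and time, and further param() terms for the holiday dummies, in the same way.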