Datamining and statistical learning, lecture 9

Generalized additive models (GAMs)
• Some examples of linear models
• Proc GAM in SAS
• Model selection in GAM

Linear regression models

E(Y | X_1, ..., X_p) = β_0 + Σ_{j=1}^p β_j X_j

The inputs can be:
• quantitative inputs
• functions of quantitative inputs
• basis expansions of quantitative inputs
• dummy variables
• interaction terms

Justification of linear regression models
• Many response variables are linearly, or almost linearly, related to a set of inputs.
• Linear models are easy to comprehend and to fit to observed data.
• Linear regression models are particularly useful when:
  – the number of cases is moderate
  – data are sparse
  – the signal-to-noise ratio is low

Performance of predictors based on (i) a simple linear regression model and (ii) a quadratic regression model, when the true expected response is a second-order polynomial in the input

[Figure: predictions based on a linear model and on a quadratic model; E(y) and the fitted values (yhat, yhat2) plotted against the input.]

Logistic regression of multiple purchases vs first amount spent

[Figure: observed binary responses and estimated event probabilities plotted against the first amount spent (0 to 7000).]

Logistic regression for a binary response variable Y

E(Y | X = x) = exp(β_0 + β_1 x) / (1 + exp(β_0 + β_1 x))

log[ E(Y | X = x) / (1 − E(Y | X = x)) ] = log[ P(Y = 1 | X = x) / P(Y = 0 | X = x) ] = β_0 + β_1 x

The logit of the expectation of Y given x is a linear function of x.

[Figure: E(Y | X = x) as an S-shaped function of x.]

Generalized additive models: some examples

A nonlinear, additive model:
E(Y | X_1, ..., X_p) = s_1(X_1) + ... + s_p(X_p)

A mixed linear and nonlinear, additive model:
E(Y | X_1, ..., X_{p+q}) = s_1(X_1) + ... + s_p(X_p) + Σ_{j=1}^q β_j X_{p+j}

A mixed linear and nonlinear, additive model with a class variable:
E(Y | X_1, ..., X_{p+q}, Class = k) = α_k + s_1(X_1) + ... + s_p(X_p) + Σ_{j=1}^q β_j X_{p+j}
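A minimal Proc GAM sketch of the last model form, assuming a hypothetical data set work.example with response y, smooth inputs x1 and x2, a linear input x3 and a class variable group; whether class variables enter through param() in exactly this way should be checked against the SAS documentation:

proc gam data=work.example;          /* hypothetical data set */
   class group;                      /* class variable giving the level-specific intercepts alpha_k */
   model y = param(group x3)         /* linear part: class effect and a linear term in x3 */
             spline(x1) spline(x2);  /* smooth components s_1(x1) and s_2(x2) */
   output out=fitted pred;           /* predicted values written to a hypothetical data set */
run;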
Generalized additive models: modelling the concentration of total nitrogen at Lobith on the Rhine

Output: Total-N concentration (mg N/l)
Inputs: monthly pattern, trend function

[Figure: observed data and fitted model, total-N concentration (mg N/l) at Lobith, monthly values 1989 to 2004.]

Extracted additive components

[Figure: year components (linear and smooth) plotted against year 1988 to 2004, and month components (linear and smooth) plotted against month 1 to 12.]

Weekly mortality and confirmed cases of influenza in Sweden

Response: weekly mortality
Inputs: confirmed cases of influenza, seasonal dummies, long-term trend

[Figure: weekly mortality and confirmed cases of influenza in Sweden, 1994 to 2004.]

SYNTAX for common GAM models

Type of model             Syntax                        Mathematical form
Parametric model          y = param(x);                 E(y) = β_0 + β_1 x
Nonparametric model       y = spline(x);                E(y) = β_0 + s(x)
Nonparametric model       y = loess(x);                 E(y) = β_0 + lo(x)
Semiparametric model      y = param(x1) spline(x2);     E(y) = β_0 + β_1 x1 + s(x2)
Additive model            y = spline(x1) spline(x2);    E(y) = β_0 + s_1(x1) + s_2(x2)
Thin-plate spline model   y = spline2(x1, x2);          E(y) = β_0 + s(x1, x2)

Modelling the concentration of total nitrogen at Lobith on the Rhine

Model 1
proc gam data=Mining.Rhine;
   model Nconc = spline(Year) spline(Month);
   output out=addmodel1;
run;

Model 2
proc gam data=Mining.Rhine;
   model Nconc = spline2(Year, Month);
   output out=addmodel2;
run;

Proc GAM – degrees of freedom of the spline components

The degrees of freedom of the spline components are selected by the user, or determined automatically by specifying method=GCV.

proc gam data=Mining.Rhine;
   model Nconc = spline(Year, df=3) spline(Month, df=3);
   output out=addmodel1;
run;

• df=3 implies that the same cubic polynomial is valid over the entire range of the input.
• Increasing the df value implies that knots are introduced.

Partial predictions from Model 1

[Figure: partial predictions P_Year and P_Month from Model 1 plotted against observation number.]

[Figure: surface plot of P_Nconc vs month and year, Model 1.]

[Figure: surface plot of P_Nconc_2 vs month and year, Model 2 (df=4).]

[Figure: surface plot of P_Nconc_3 vs month and year, Model 3 (df=20).]
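The surface plot labelled Model 3 (df=20) appears to correspond to the spline2 fit reported below as Model 2 (20 df); its code is not shown in the slides. A plausible call, assuming the df= option of spline2 works like the df= option of spline:

proc gam data=Mining.Rhine;
   model Nconc = spline2(Year, Month, df=20);  /* df=20 taken from the output listing below */
   output out=addmodel3;                       /* output data set name is hypothetical */
run;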
Model 1: summary output from the GAM procedure

Dependent variable: Nconc
Smoothing model component(s): spline(Year) spline(Month)

Summary of input data set
   Number of observations               168
   Number of missing observations         0
   Distribution                    Gaussian
   Link function                   Identity

Iteration summary and fit statistics
   Final number of backfitting iterations               2
   Final backfitting criterion                1.987193E-30
   The deviance of the final estimate          42.92519322
The local score algorithm converged.

Model 1: regression model analysis

Parameter estimates
   Parameter        Estimate   Standard error   t value   Pr > |t|
   Intercept       420.69388         19.84413     21.20     <.0001
   Linear(Year)     -0.20824          0.00994    -20.94     <.0001
   Linear(Month)    -0.10461          0.01161     -9.01     <.0001

Smoothing model analysis: analysis of deviance
   Source           DF        Sum of squares   Chi-square   Pr > ChiSq
   Spline(Year)     3.00000         2.527155       9.3609       0.0249
   Spline(Month)    3.00000        51.143931     189.4432       <.0001

Model 2: iteration summary and fit statistics
   Final number of backfitting iterations               2
   Final backfitting criterion                          0
   The deviance of the final estimate          74.22284569

   Parameter    Estimate   Standard error   t value   Pr > |t|
   Intercept     4.46475          0.05206     85.76     <.0001

   Source                 DF        Sum of squares   Chi-square   Pr > ChiSq
   Spline2(Year Month)    4.00000       162.668070     357.2336       <.0001

Model 2 (20 df): iteration summary and fit statistics
   Final number of backfitting iterations               2
   Final backfitting criterion                          0
   The deviance of the final estimate         36.577160798

   Parameter    Estimate   Standard error   t value   Pr > |t|
   Intercept     4.46475          0.03849    116.01     <.0001

   Source                 DF         Sum of squares   Chi-square   Pr > ChiSq
   Spline2(Year Month)    20.00000       200.313755     805.0412       <.0001
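The partial-prediction plots shown earlier (P_Year and P_Month) can presumably be drawn directly from the output data set addmodel1. A minimal sketch, assuming that data set carries the input variables Year and Month together with partial predictions named P_Year and P_Month, as the plot labels suggest:

proc sgplot data=addmodel1;
   scatter x=Year y=P_Year;    /* partial prediction for the Year spline */
run;

proc sgplot data=addmodel1;
   scatter x=Month y=P_Month;  /* partial prediction for the Month spline */
run;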
Estimation of additive models - the backfitting algorithm

E(Y | X_1 = x_1, ..., X_p = x_p) = α + f_1(x_1) + ... + f_p(x_p)

1. Initialize: α̂ = (1/N) Σ_{i=1}^N y_i, and f̂_j ≡ 0 for j = 1, ..., p.
2. Cycle over j = 1, ..., p, 1, ..., p, ...:
   f̂_j ← S_j[ { y_i − α̂ − Σ_{k≠j} f̂_k(x_ik) }_{i=1}^N ]
   f̂_j ← f̂_j − (1/N) Σ_{i=1}^N f̂_j(x_ij)
   where S_j denotes the smoother applied to the j-th input; repeat until the functions f̂_j stabilize.

Modelling ln daily electricity consumption as a spline function of the population-weighted mean temperature in Sweden

proc gam data=sasuser.smhi;
   model lnDaily_consumption = spline(Meantemp, df=20);
   ID Time;
   output out=smhiouttemp pred resid;
run;

[Figure: observed and fitted ln daily electricity consumption (MWh) plotted against population-weighted temperature, −30 to 30 °C.]

Residual analysis

[Figure: residuals of the temperature-only spline model plotted against Julian day.]

[Figure: residuals against Julian day for (i) the spline of temperature only and (ii) a model with a spline of temperature, a spline of Julian day, and weekday dummies.]

[Figure: residuals against Julian day for (i) the model with splines of temperature and Julian day plus weekday dummies and (ii) the final model with splines of contemporaneous and time-lagged weather data, splines of Julian day and time, and weekday and holiday dummies.]

Deviance analysis of the investigated models of ln daily electricity consumption in Sweden

Model                              Deviance
Temperature only                     10.233
Temperature, Julian day, weekday      3.822
Final model                           0.742

• The residual deviance of a fitted model is minus twice its log-likelihood.
• If the error terms are normally distributed, the deviance is equal to the sum of squared residuals.

Time series plot of residuals

[Figure: time series plot of the residuals of the final model over the whole observation period.]
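The intermediate model in the deviance comparison (splines of temperature and Julian day plus weekday dummies) is not listed in the slides. A sketch of how it could be specified with the syntax used above; the variable name Julian_day and the weekday dummies Mon through Sat are hypothetical, and the df values are only illustrative:

proc gam data=sasuser.smhi;
   model lnDaily_consumption = spline(Meantemp, df=20)
                               spline(Julian_day, df=10)       /* hypothetical variable name */
                               param(Mon Tue Wed Thu Fri Sat); /* hypothetical weekday dummies */
   output out=smhiout_ext pred resid;
run;

The final model would add further spline() terms for the time-lagged weather variables and time, and further param() terms for the holiday dummies, in the same way.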