732A20 Data Mining and Statistical Learning
Department of Computer and Information Science
Computer lab 7: Generalized additive models
Learning objectives
The main objective of this computer lab is to make the student familiar with the use of
generalized additive models for prediction and hypothesis testing.
After completing the lab the student shall be able to:
(i) Use proc GAM in SAS to fit different types of generalized additive models to
a given dataset.
(ii) Interpret the output from proc GAM, and test hypotheses regarding the model
components.
(iii) Visually inspect the residuals of a generalized additive model and, based on
the patterns found in the residuals, suggest how the model can be improved.
Recommended reading
Chapter 9.1 in Hastie et al.
Assignment 1: Using additive models to examine how mortality
is related to the number of influenza cases
The Excel document influenza.xls contains weekly data on the mortality and the number
of laboratory-confirmed cases of influenza in Sweden. In addition, there is information
about population-weighted temperature anomalies (temperature deficits). The last
columns contain time-lagged variables. Your task is to employ generalized additive
models to examine how the mortality is influenced by the number of influenza cases.
a) Use time series plots and scatter-charts to visually inspect how the mortality varies
with year, week, and the number of laboratory-confirmed cases of influenza.
b) Use Proc GAM to investigate how the mortality can be described as a function of year
(or time) and week. The set of models fitted to data shall include:
(i) ordinary regression models with independent normally distributed error terms;
(ii) additive semiparametric models with independent normally distributed error
terms.
Use various combinations of param() and spline() in the model statement, and examine
how the degrees of freedom of the spline function(s) influence the deviance of the model.
Also, plot predicted and observed mortality against time for the fitted models.
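A minimal sketch of such a comparison could look as follows. It assumes that the workbook imports into a data set called influenza and that the columns are named Mortality, Year, Week and Time; these names are guesses and must be replaced by the actual column names in influenza.xls. The macro name fit_week and the output data set names are likewise only illustrative.

```sas
/* Import the workbook. dbms=xls assumes that SAS/ACCESS Interface to
   PC Files is installed; the data set name is an assumption. */
proc import datafile="influenza.xls" out=work.influenza dbms=xls replace;
   getnames=yes;
run;

/* (i) Ordinary regression: every term enters parametrically. */
proc gam data=work.influenza;
   model Mortality = param(Year Week) / dist=gaussian;
run;

/* (ii) Semiparametric model: linear in Year, smooth in Week.
   The macro refits the model for a given spline df and stores
   the fitted values in work.fit_df<df>. */
%macro fit_week(df);
   proc gam data=work.influenza;
      model Mortality = param(Year) spline(Week, df=&df) / dist=gaussian;
      id Time;                        /* copy Time to the output data set */
      output out=work.fit_df&df all;  /* predictions, residuals, limits   */
   run;
%mend fit_week;

%fit_week(2)
%fit_week(4)
%fit_week(8)

/* Observed and predicted mortality against time for the df=4 fit.
   P_Mortality is the name the overall prediction is expected to get
   (prefix P_ plus the response name); verify it with proc contents. */
proc sgplot data=work.fit_df4;
   scatter x=Time y=Mortality;
   series  x=Time y=P_Mortality;
run;
```

The deviance of each fit is printed in the proc GAM listing, so the three macro calls give three values that can be compared directly. With normally distributed errors the deviance is the residual sum of squares, so it can only decrease as df grows; the interesting question is how quickly the decrease levels off.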
c) Choose one of the models you have already fitted to data and examine the residuals. Is
the temporal pattern in the residuals correlated with the outbreaks of influenza?
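One possible way to make this inspection is sketched below. It assumes a data set work.influenza with columns Mortality, Year, Week, Time and Influenza (hypothetical names, to be replaced by those in influenza.xls), and the choice of df=4 is arbitrary.

```sas
/* Refit one candidate model and keep its residuals. The id statement
   copies columns that are not in the model to the output data set. */
proc gam data=work.influenza;
   model Mortality = param(Year) spline(Week, df=4) / dist=gaussian;
   id Time Influenza;
   output out=work.resid_check all;
run;

/* Residuals over time, with the influenza counts on a secondary axis.
   R_Mortality is the name the residual is expected to get (prefix R_
   plus the response name); verify it with proc contents. */
proc sgplot data=work.resid_check;
   series x=Time y=R_Mortality;
   series x=Time y=Influenza / y2axis;
run;
```

If peaks in the residual series tend to coincide with peaks in the influenza series, this suggests that the number of influenza cases carries information not yet captured by the model.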
d) Use Proc GAM to investigate how the mortality can be described as a function of year
(or time), week, and the number of confirmed cases of influenza.
Summarize your findings in a table of deviances for the tested models.
Choose the model having the smallest deviance and make suitable plots of:
(i) the spline components in the model;
(ii) observed and predicted mortality rates.
Test whether or not the mortality is influenced by the outbreaks of influenza.
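As a hedged illustration, one of the candidate models could be specified as below. The data set work.influenza and the column names Mortality, Year, Week, Time and Influenza are assumptions, and df=4 is only a starting value.

```sas
/* Candidate model with a smooth effect of the influenza count. */
proc gam data=work.influenza;
   model Mortality = param(Year) spline(Week, df=4) spline(Influenza, df=4)
         / dist=gaussian;
   id Time;
   output out=work.fit_flu all;
run;

/* Estimated spline component for influenza. P_Influenza is the name
   the component is expected to get (prefix P_ plus the variable name);
   verify it with proc contents. */
proc sgplot data=work.fit_flu;
   scatter x=Influenza y=P_Influenza;
run;
```

In the proc GAM listing, each spline term is split into a linear part, reported among the parameter estimates, and a nonparametric part, reported in the analysis of deviance table. Both parts are relevant when judging whether mortality depends on the number of influenza cases.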
Assignment 2: Using additive models to examine how mortality
is related to the number of influenza cases and extremely low
temperatures
The Excel file influenza.xls contains observations of population-weighted temperature
deficits in Sweden. (A high deficit means that it is unusually cold.) Your task is to employ
generalized additive models to investigate how the mortality is influenced by influenza
outbreaks and temperature deficits.
a) Take the best model in assignment 1 and examine by visual inspection whether or not
the residuals in that model are correlated with the temperature deficit.
b) Use Proc GAM to investigate how the mortality can be described as a function of year
(or time), week, the number of confirmed cases of influenza, and the temperature deficit.
Summarize your findings in a table of deviances for the tested models.
Choose the model having the smallest deviance and make suitable plots of:
(i) the spline components in the model;
(ii) observed and predicted mortality rates.
Test whether or not the mortality is influenced by the outbreaks of influenza and the
temperature deficit.
c) Use Proc GAM to investigate whether your so far best model can be further improved
by introducing time-lagged information about influenza cases and temperature deficits.
Summarize your findings in a table of deviances for the tested models.
Choose the model having the smallest deviance and make suitable plots of:
(i) the spline components in the model;
(ii) observed and predicted mortality rates.
Test whether or not the mortality is influenced by the outbreaks of influenza and the
temperature deficit.
To hand in
Highlighted items.