Lecture 17: Poisson GLMs with a Rate Parameter
(Text Section 9.2)
So far, we have considered Poisson GLMs where exposure is constant, so that it is sensible
to model the mean of each count, Yi .
In this lecture, we extend the Poisson regression model by allowing for the case where (nonnegative) exposure ti is associated with observation i. Then, Yi ∼ Poisson(µi) where µi = θi ti and
$$ g(\theta_i) = \sum_{j=1}^{p} x_{ij} \beta_j . $$
Here g is used to transform θi rather than µi . However, for fitting these models in S-PLUS,
we need to specify g(µi ), not g(θi ).
NOTE: We wouldn’t want to simply compute Yi′ = Yi/ti and use the Yi′’s as the responses
because Yi′ is not necessarily a count! Therefore, it would likely be hard to find an appropriate
distribution to describe the transformed responses.
Specifying the Form of the Rate Parameter in S-PLUS
If g is the log link,
$$ \log \mu_i = \log t_i + \log \theta_i = \log t_i + \sum_{j=1}^{p} x_{ij} \beta_j . $$
Therefore, we can specify this model in S-PLUS using the usual Poisson GLM with log link
by including the term log ti as an offset.
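To make the role of the offset concrete, here is a minimal sketch in Python (not S-PLUS) that fits the log-link rate model with a single predictor by Fisher scoring. The function name, the starting values, and the noise-free synthetic data are assumptions made purely for illustration; in practice the GLM routine does this work once log ti is supplied as an offset.

```python
import math

def fit_poisson_rate_loglink(y, x, t, n_iter=50, tol=1e-12):
    """Fit log(mu_i) = log(t_i) + b0 + b1 * x_i by Fisher scoring.

    Hypothetical helper for illustration only (one predictor plus intercept).
    """
    # Start from the intercept-only fit: total count / total exposure.
    b0 = math.log(sum(y) / sum(t))
    b1 = 0.0
    for _ in range(n_iter):
        # log(t_i) is the offset: it enters the linear predictor with its
        # coefficient fixed at 1, so mu_i = t_i * exp(b0 + b1 * x_i).
        mu = [ti * math.exp(b0 + b1 * xi) for xi, ti in zip(x, t)]
        # Score vector U = X^T (y - mu).
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information I = X^T diag(mu) X, a 2 x 2 matrix here.
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Solve I * step = U directly for the 2 x 2 case.
        s0 = (i11 * u0 - i01 * u1) / det
        s1 = (i00 * u1 - i01 * u0) / det
        b0 += s0
        b1 += s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1

# Synthetic, noise-free data (an assumption, not data from the lecture):
# responses are set equal to their means so the fit recovers the inputs.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
t = [10.0, 20.0, 10.0, 30.0, 60.0]
true_b0, true_b1 = 2.0, -0.5
y = [ti * math.exp(true_b0 + true_b1 * xi) for xi, ti in zip(x, t)]

b0_hat, b1_hat = fit_poisson_rate_loglink(y, x, t)
print(b0_hat, b1_hat)
```

Because the synthetic responses equal their means, the estimates should agree with the generating coefficients up to numerical error. With real counts the same code would return the maximum likelihood estimates, provided the iteration converges.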
If g is the square-root link,
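$$ \sqrt{\mu_i} = \sqrt{t_i} \sqrt{\theta_i} = \sqrt{t_i} \sum_{j=1}^{p} x_{ij} \beta_j = \sum_{j=1}^{p} \left( \sqrt{t_i}\, x_{ij} \right) \beta_j \equiv \sum_{j=1}^{p} x^*_{ij} \beta_j . $$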
In our usual models, xi1 ≡ 1 so that β1 is the intercept (included in the model by default in
S-PLUS). However, in this case, x∗i1 = √ti, i.e. there is no intercept. Therefore, we can specify
this model in S-PLUS using the usual Poisson GLM with square-root link by including the
covariates x∗i rather than xi and by excluding the intercept term.
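The rescaling of the covariates can be sketched as follows. This is a small Python illustration, not the S-PLUS specification; the helper name and the numbers are hypothetical. It builds the rows x∗i = √ti xi from a design that still carries its leading column of ones, and checks that the no-intercept linear predictor on x∗i reproduces √µi.

```python
import math

def sqrt_link_design(x_rows, t):
    """Return rows x*_i = sqrt(t_i) * x_i (hypothetical helper).

    x_rows must include the leading 1 for the usual intercept, so the
    first transformed column is sqrt(t_i) rather than a constant.
    """
    return [[math.sqrt(ti) * xij for xij in row] for row, ti in zip(x_rows, t)]

# Made-up design, exposures, and coefficients for illustration only.
x_rows = [[1.0, 0.5], [1.0, 1.0], [1.0, 2.0]]
t = [4.0, 9.0, 16.0]
beta = [3.0, -0.5]

x_star = sqrt_link_design(x_rows, t)

# sqrt(theta_i) = sum_j x_ij * beta_j under the square-root link on the rate.
sqrt_theta = [sum(xij * bj for xij, bj in zip(row, beta)) for row in x_rows]
# sqrt(mu_i) = sqrt(t_i) * sqrt(theta_i).
sqrt_mu_direct = [math.sqrt(ti) * s for ti, s in zip(t, sqrt_theta)]
# The same quantity from the transformed covariates, with no intercept term.
sqrt_mu_design = [sum(xij * bj for xij, bj in zip(row, beta)) for row in x_star]

print(x_star)
print(sqrt_mu_direct, sqrt_mu_design)
```

The first column of `x_star` varies with ti, which is why the fitted model must exclude the default intercept.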
We can also work out how to specify the model properly in S-PLUS when g is the identity
link (see Assignment 3).
Example: Geiger counter experiment
Let Yi be the ith count observed at distance xi metres from the source of radioactivity over
a period of ti seconds. Goal: Estimate the effect of distance on the number of counts.
A reasonable preliminary model is Yi ∼Poisson(µi ), where µi = ti θi , and
log θi = β0 + β1 xi
so that
log µi = log ti + β0 + β1 xi .
Let θ∗ be the rate when xi = c, and let θ1∗ be the rate when xi = c + 1, i.e.
log θ∗ = β0 + β1 c
log θ1∗ = β0 + β1 (c + 1).
Then, β1 = log θ1∗ − log θ∗ is the difference in log geiger counter rate when distance is increased
by 1 metre. Equivalently, using the fact that
$$ e^{\beta_1} = \frac{\theta_1^*}{\theta^*} , $$
the geiger counter rate changes by a factor of exp(β1) when distance is increased by 1 metre.
S-PLUS estimates β̂1 = −0.684 with a standard error of 0.019.
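As a quick arithmetic illustration of this interpretation, the reported estimate can be converted to a multiplicative effect on the rate. The sketch below is in Python; the approximate 95% Wald interval (estimate ± 1.96 standard errors, exponentiated) is an assumption added here and is not part of the lecture.

```python
import math

beta1_hat = -0.684  # estimate reported in the lecture (log link)
se_beta1 = 0.019    # its reported standard error

# Multiplicative change in the rate per extra metre of distance.
rate_ratio = math.exp(beta1_hat)

# Assumed approximate 95% Wald interval, computed on the log scale
# and then exponentiated (not taken from the lecture).
z = 1.96
lower = math.exp(beta1_hat - z * se_beta1)
upper = math.exp(beta1_hat + z * se_beta1)

print(round(rate_ratio, 3), round(lower, 3), round(upper, 3))
```

The rate ratio comes out at about 0.50, i.e. under this model each additional metre roughly halves the geiger counter rate.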
Alternatively, we can fit the model Yi ∼Poisson(µi ), where
$$ \sqrt{\mu_i} = \beta_0 \sqrt{t_i} + \beta_1 x_i^* , $$
where x∗i = √ti xi. Then, β1 is the difference in the square-root of the geiger counter rate
when distance is increased by 1 metre.
S-PLUS estimates β̂1 = −1.14 with a standard error of 0.031.
Choosing the Link Function
Counts are easier to model than binary data in the sense that we usually have a variety of
observed values (i.e., 0, 1, 2, . . . ) instead of just 0’s and 1’s. For this reason, plots of the
raw data are often more informative than in the binary case.
If exposure is constant so that we are modelling a mean rather than a rate, we can plot
the observed counts vs. a continuous predictor variable. The relationship between these
quantities can give insight into the link function relating the mean (µi ) and the predictor.
If exposure varies by a variable ti, then we can plot the “adjusted” observed counts
Yi′ = Yi/ti
vs. a continuous predictor variable. Since E[Yi′] = θi, this plot can give insight into the link
function relating the rate (θi) and the predictor. IMPORTANT: We are modelling θi, not
Yi′. And, we use Yi as the response, not Yi′. We are simply using the Yi′’s in this plot to
learn about the behaviour of θi in relation to the predictor.
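One way to prepare such a plot is sketched below in Python. The counts and exposures are made-up numbers and the helper name is hypothetical. The idea is to compute the adjusted counts and then look at them on the candidate link scales: whichever transformation looks closest to linear in the predictor suggests the link. Note that the log scale is undefined for zero counts, so those points would need separate handling.

```python
import math

def adjusted_rates(y, t):
    """Adjusted counts Y_i / t_i, used for plotting only (hypothetical helper)."""
    return [yi / ti for yi, ti in zip(y, t)]

# Made-up counts, exposures, and distances for illustration only.
y = [45, 54, 16, 30, 36]
t = [10, 20, 10, 30, 60]
x = [1, 2, 3, 4, 5]

rates = adjusted_rates(y, t)
log_rates = [math.log(r) for r in rates]    # candidate: log link
sqrt_rates = [math.sqrt(r) for r in rates]  # candidate: square-root link

# Print the points that would be plotted against the predictor x.
for xi, r, lr, sr in zip(x, rates, log_rates, sqrt_rates):
    print(xi, round(r, 3), round(lr, 3), round(sr, 3))
```

The adjusted counts are only a diagnostic here; the model itself is still fitted with the raw counts as the response.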
Goodness-of-Fit Assessment in Poisson GLMs
In Lecture 11, we computed the deviance as
$$ D = 2 \left[ \sum_{i=1}^{n} y_i \left( \log y_i - \log \hat{\mu}_i \right) - \sum_{i=1}^{n} \left( y_i - \hat{\mu}_i \right) \right] , $$
which does not contain unknown parameters, so can be calculated from the data. This
is labelled as the “residual deviance” in the S-PLUS summary output. NOTE: If we’ve
modelled the rate rather than the mean, then µ̂i = ti θ̂i .
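The deviance formula translates directly into code. The following is a minimal Python sketch (the function name and the example values are hypothetical) that uses the usual convention that yi log yi is taken to be 0 when yi = 0. For a rate model, the fitted means passed in would be µ̂i = ti θ̂i.

```python
import math

def poisson_deviance(y, mu_hat):
    """Residual deviance D = 2 * sum[y log(y / mu_hat) - (y - mu_hat)].

    Hypothetical helper; terms with y = 0 contribute 0 to the log part.
    """
    total = 0.0
    for yi, mi in zip(y, mu_hat):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

# Made-up counts and fitted means for illustration only.
y = [2, 0, 5]
mu_hat = [1.5, 0.5, 6.0]
D = poisson_deviance(y, mu_hat)
print(D)
```

A perfect fit (every fitted mean equal to its count) gives D = 0, which is the saturated-model benchmark.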
Likewise, if there are no replicates, the Pearson chi-squared statistic is defined as
$$ X^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i} , $$
which has the alternative interpretation of
$$ X^2 = \sum_{i=1}^{n} \frac{(\text{observed}_i - \text{expected}_i)^2}{\text{expected}_i} . $$
This statistic is not defined in the case where there are replicates.
If D and/or X² are large relative to the χ² distribution with n − p degrees of freedom, then we
have evidence against the null hypothesis that our model fits the data well (relative to the
saturated model). This χ² approximation is more accurate when the µi’s are relatively large.
The deviance residuals are defined as
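$$ d_i = \operatorname{sign}(y_i - \hat{\mu}_i) \sqrt{2 \left[ y_i \left( \log y_i - \log \hat{\mu}_i \right) - \left( y_i - \hat{\mu}_i \right) \right]} , $$
i = 1, . . . , n. The Pearson residuals are defined as
$$ X_i = \frac{y_i - \hat{\mu}_i}{\sqrt{\hat{\mu}_i}} \equiv \frac{\text{observed}_i - \text{expected}_i}{\sqrt{\text{expected}_i}} , $$
i = 1, . . . , n.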
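Both kinds of residual are easy to compute from the counts and fitted means. The Python sketch below uses hypothetical function names and made-up values, and again treats yi log yi as 0 when yi = 0. By construction, the squared deviance residuals sum to the deviance D and the squared Pearson residuals sum to X².

```python
import math

def deviance_residuals(y, mu_hat):
    """d_i = sign(y_i - mu_i) * sqrt(2 [y_i log(y_i / mu_i) - (y_i - mu_i)])."""
    res = []
    for yi, mi in zip(y, mu_hat):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d2 = 2.0 * (log_term - (yi - mi))
        # d2 is nonnegative in exact arithmetic; max() guards against rounding.
        res.append(math.copysign(math.sqrt(max(d2, 0.0)), yi - mi))
    return res

def pearson_residuals(y, mu_hat):
    """X_i = (y_i - mu_i) / sqrt(mu_i)."""
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu_hat)]

# Made-up counts and fitted means for illustration only.
y = [2, 0, 5]
mu_hat = [1.5, 0.5, 6.0]

dev_res = deviance_residuals(y, mu_hat)
pear_res = pearson_residuals(y, mu_hat)

deviance = sum(d * d for d in dev_res)      # equals D
pearson_x2 = sum(r * r for r in pear_res)   # equals X^2
print(dev_res, pear_res)
print(deviance, pearson_x2)
```

These are the quantities one would plot against the fitted values or against each predictor.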
We can plot these against the fitted values or against the predictor variables. Patterns in
these plots might suggest that a different form of a predictor variable is required (e.g., x² in
addition to x). In addition, they might indicate outliers, i.e. counts which depart significantly
from the fitted model.
As in the binary case, these residuals are not normally distributed (though the normal
distribution provides a reasonable approximation when the µi ’s are large). Therefore, we
would not expect residual plots to look like those in the linear setting.
To assess the predictive abilities of the model, we can plot the observed counts (Yi ) vs. the
fitted values (µ̂i ). Note that, if we’re modelling the rate rather than the mean, the fitted
values are already adjusted by ti , so it doesn’t make sense to plot them vs. the adjusted
counts (Yi /ti ).
Example: Geiger counter experiment (cont.)
How do we check whether our chosen model is appropriate for these data? For example, how
do we choose between the log and square-root link functions?
1. Deviance/Pearson chi-squared tests. The tests when applied to both the log and
square-root link models suggest that neither model fits well. The deviances of these
models are nearly identical. If we had to choose between the two, we might choose the
model with log link for simplicity.
2. Plot the observed vs. fitted values. Even though the above tests suggest that the model
fits poorly, this plot does not show any major deviation of the observed values from
the fitted values!
3. Plot the observed and fitted values (simultaneously) vs. distance in order to detect
discrepancies for specific values of distance. This plot also suggests that the model fits
reasonably well.
4. Plot the deviance/Pearson residuals. These plots do not show any obvious problems.
Summary: Although the formal GOF tests provide evidence that the model with log link
doesn’t fit well, our graphical tools suggest otherwise.
Likely explanation: Both the predictor and response variables are observed over a large
range in this example. We therefore have a lot of information with which to estimate the
relationship between these variables. The formal tests are likely picking up small deviations
from the proposed model which (hopefully) aren’t important in practice.
Conclusion: We would likely accept the model with log link as providing a reasonable description of the data. Remember that “All models are wrong, but some are useful” (George
Box).