PSCI6000 Maximum Likelihood Estimation
Event Count Model
Tetsuya Matsubayashi
University of North Texas
November 16, 2010

Goals
A Poisson model
A negative binomial model
A zero-inflated model

Event Count Models
Count models are useful for dependent variables that count the number of times an event occurs over time:
Number of wars
Number of pieces of legislation passed by Congress
Number of veto overrides by presidents
Number of doctor visits
Number of articles published
Number of police arrests

Use a Linear Model?
Not appropriate except for a special situation
The use of a linear regression model can result in inefficient and inconsistent estimates.
Consider carefully the data generating process of your dependent variable.

The Poisson Distribution
If the process generates events independently and at a fixed rate within time periods, then the result is a Poisson process.
“Independently” means that an event has no impact on occurrence of another event within the same time interval.
Two examples
The number of phone calls received → independent
The number of veto overrides by president → not independent
In the Poisson process, the probability of the event occurring is very small, but the number of trials is very large.
It is the limit of a binomial process in which π → 0 and n → ∞.
The distribution follows
\[ P(y_i \mid \lambda) = \frac{e^{-\lambda} \lambda^{y_i}}{y_i!} \]
and
\[ E(y_i) = \lambda \]
and
\[ \mathrm{Var}(y_i) = \lambda \]
See some examples of the Poisson distribution in R.
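
As a small numerical check of the Poisson probabilities, mean and variance, and of the binomial limit, the sketch below may help. It is written in Python rather than R, and the function names and the values λ = 2.5 and n = 100000 are illustrative assumptions, not part of the slides.

```python
import math

def poisson_pmf(y, lam):
    """P(y | lambda) = exp(-lambda) * lambda**y / y!"""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def binomial_pmf(y, n, pi):
    """Binomial probability of y successes in n trials."""
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)

lam = 2.5               # illustrative rate (assumption)
support = range(0, 60)  # the tail beyond 60 is negligible for this rate

mean = sum(y * poisson_pmf(y, lam) for y in support)
var = sum((y - mean) ** 2 * poisson_pmf(y, lam) for y in support)

# Binomial with many trials and a small success probability, with n * pi = lam
n = 100_000
max_gap = max(abs(poisson_pmf(y, lam) - binomial_pmf(y, n, lam / n))
              for y in range(0, 15))

print(round(mean, 6), round(var, 6), max_gap)
```

The mean and the variance both come out at λ, and the largest gap between the Poisson and binomial probabilities is small when n is large and π = λ/n is small.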

Some Characteristics of the Poisson Distribution
As λ increases, the mass of the distribution shifts to the right.
Var(y) = E(y) = λ. This is called equidispersion.
As λ increases, the probability of 0 decreases.
As λ increases, the Poisson process approximates a normal distribution.
See an example.

Reparameterize λ
We reparameterize λ to be a non-negative, continuous variable:
\[ \lambda_i = e^{x_i\beta} \]
This gives us the following probability model:
\[ P(y_i) = \frac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!} = \frac{e^{-e^{x_i\beta}} (e^{x_i\beta})^{y_i}}{y_i!} \]
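
To make the reparameterization concrete, here is a minimal Python sketch with one covariate and an intercept. The coefficient values are hypothetical and chosen only for illustration. It shows that exp(xβ) stays positive for any x and that the probability of a zero count falls as the rate rises.

```python
import math

def poisson_pmf(y, lam):
    return math.exp(-lam) * lam ** y / math.factorial(y)

# Hypothetical coefficients (intercept, slope), for illustration only
beta = (0.2, 0.5)

def rate(x):
    """lambda_i = exp(x_i * beta) for a single covariate x plus an intercept."""
    return math.exp(beta[0] + beta[1] * x)

for x in (-2.0, 0.0, 2.0):
    lam = rate(x)
    # columns: x, lambda, P(y = 0)
    print(x, round(lam, 3), round(poisson_pmf(0, lam), 3))
```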
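
Likelihood
The likelihood function for the whole sample is:
\[ L = \prod_{i=1}^{N} \frac{e^{-e^{x_i\beta}} (e^{x_i\beta})^{y_i}}{y_i!} \]
The log-likelihood is:
\[ \ln L = \sum_{i=1}^{N} \left[ -e^{x_i\beta} + y_i x_i\beta - \ln(y_i!) \right] \]

Gradient Vector and Hessian Matrix
The gradient vector is obtained by:
\[ G = \frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{N} \left[ -e^{x_i\beta} x_i + y_i x_i \right] = \sum_{i=1}^{N} (y_i - e^{x_i\beta}) x_i = 0 \]
The Hessian matrix is obtained by:
\[ H = \frac{\partial^2 \ln L}{\partial \beta \, \partial \beta'} = \sum_{i=1}^{N} -x_i' e^{x_i\beta} x_i \]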
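
The gradient and Hessian can be put to work in a Newton-Raphson iteration. The following is a minimal sketch, assuming a model with an intercept and a single covariate so that the 2x2 Hessian can be inverted by hand. The function name `fit_poisson` and the data are made up for illustration. This is not the routine used by zelig or Stata.

```python
import math

def fit_poisson(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for ln L = sum(-exp(b0 + b1*x) + y*(b0 + b1*x) - ln y!)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(max_iter):
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient: sum (y_i - lambda_i) x_i, with x_i = (1, x)
        g0 = sum(yi - li for yi, li in zip(y, lam))
        g1 = sum((yi - li) * xi for yi, li, xi in zip(y, lam, x))
        # Hessian: -sum lambda_i x_i' x_i (a 2x2 matrix here)
        h00 = -sum(lam)
        h01 = -sum(li * xi for li, xi in zip(lam, x))
        h11 = -sum(li * xi * xi for li, xi in zip(lam, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta_new = beta - H^{-1} G
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 - step0, b1 - step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Made-up data for illustration only
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 1, 3, 4, 8, 12]
b0, b1 = fit_poisson(x, y)

# At the maximum the gradient (score) should be numerically zero
lam = [math.exp(b0 + b1 * xi) for xi in x]
score = (sum(yi - li for yi, li in zip(y, lam)),
         sum((yi - li) * xi for yi, li, xi in zip(y, lam, x)))
print(round(b0, 4), round(b1, 4), score)
```

Because the Poisson log-likelihood is globally concave in β, the iteration usually settles within a handful of steps from a reasonable starting value.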

Estimation and Interpretation
Use zelig(function, model="poisson") in R.
Use poisson in Stata.
You can interpret the sign of the coefficients as usual.
You can also find statistical significance as usual.
See an example using R.

The Expected Count
The expected count can be computed by
\[ \hat{\lambda} = E(y) = e^{x\hat{\beta}} \]
This tells you the expected number of y when xs are set at some values.
The factor change in the expected count can be computed by:
\[ \frac{E(y \mid x, x_k + \delta)}{E(y \mid x, x_k)} = e^{\hat{\beta}_k \delta} \]
For a change of δ in $x_k$, the expected count changes by a factor of $e^{\hat{\beta}_k \delta}$, holding other variables constant.
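
A quick way to see the factor change is to compute the ratio of two expected counts directly. The sketch below uses hypothetical coefficient estimates and covariate values, which are assumptions for illustration only.

```python
import math

# Hypothetical estimates: intercept and two slopes (illustration only)
beta_hat = [0.3, 0.25, -0.4]

def expected_count(x):
    """E(y | x) = exp(x * beta_hat); x includes a leading 1 for the intercept."""
    return math.exp(sum(b * xi for b, xi in zip(beta_hat, x)))

x_base = [1.0, 2.0, 1.0]
delta = 1.0
x_new = [1.0, 2.0 + delta, 1.0]   # raise the first covariate by delta

ratio = expected_count(x_new) / expected_count(x_base)
factor = math.exp(beta_hat[1] * delta)
print(round(ratio, 6), round(factor, 6))
```

The two printed numbers agree whatever values the other covariates are held at, which is why the factor change is a convenient summary.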
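
The Expected Count
The discrete change in the expected count can be computed by
\[ \frac{\Delta E(y \mid x)}{\Delta x_k} = E(y \mid x, x_k = a) - E(y \mid x, x_k = b) \]
For a change in $x_k$ from b to a, the expected count changes by $\Delta E(y \mid x) / \Delta x_k$, holding all other variables constant.

Marginal Effect
Marginal effects in the Poisson model are
\[ \frac{\partial E(y \mid x)}{\partial x} = \frac{\partial \hat{\lambda}_i}{\partial x} = \frac{\partial e^{x\hat{\beta}}}{\partial x} = \frac{\partial e^{x\hat{\beta}}}{\partial x\hat{\beta}} \frac{\partial x\hat{\beta}}{\partial x} = e^{x\hat{\beta}} \hat{\beta} = \hat{\lambda}_i \hat{\beta} \]
When x increases by a small amount, the expected count increases by β̂ times the expected number of events.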
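
The discrete change and the marginal effect can both be computed in a few lines. This is a sketch with a hypothetical intercept and slope. The finite-difference check is added here only to confirm the derivative formula and is not part of the slides.

```python
import math

beta_hat = [0.3, 0.25]  # hypothetical intercept and slope (assumption)

def expected_count(x):
    return math.exp(beta_hat[0] + beta_hat[1] * x)

# Discrete change: E(y | x_k = a) - E(y | x_k = b)
a, b = 3.0, 1.0
discrete_change = expected_count(a) - expected_count(b)

# Marginal effect at x0: lambda_hat * beta_hat, checked by a central difference
x0, h = 2.0, 1e-6
marginal = expected_count(x0) * beta_hat[1]
numeric = (expected_count(x0 + h) - expected_count(x0 - h)) / (2 * h)

print(round(discrete_change, 4), round(marginal, 4), round(numeric, 4))
```

Unlike the factor change, both quantities depend on where the covariates are evaluated, because the expected count itself enters the result.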

Predicted Probabilities
The results can also be used to compute the probability distribution of counts for a given level of independent variables.
\[ \hat{P}(Y = m \mid x\hat{\beta}) = \frac{e^{-e^{x\hat{\beta}}} (e^{x\hat{\beta}})^{m}}{m!} \]
The mean predicted probability for each count m can be used to summarize the predictions of the model:
\[ \bar{P}(Y = m) = \frac{1}{N} \sum_{i=1}^{N} \hat{P}(Y = m \mid x_i\hat{\beta}) \]
The mean predicted probabilities can be compared to the observed proportions of the sample at each count.

Unequal Observation Intervals
We assumed that each observation comes from intervals of time that have the same length.
The Poisson is easily modified to incorporate unequal observation intervals.
Let $t_i$ be the time interval for observation of case i.
The probability distribution is rewritten as
\[ P(y_i \mid \lambda_i) = \frac{e^{-\lambda_i t_i} (\lambda_i t_i)^{y_i}}{y_i!} \]
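
The comparison of mean predicted probabilities with observed proportions might be sketched as follows. The fitted rates and observed counts are invented for illustration. In practice the rates would be exp(x_i β̂) from an estimated model.

```python
import math

def poisson_pmf(m, lam):
    return math.exp(-lam) * lam ** m / math.factorial(m)

# Hypothetical fitted rates lambda_hat_i = exp(x_i * beta_hat) and observed counts
lam_hat = [0.5, 1.0, 1.5, 2.0, 4.0]
y_obs = [0, 1, 1, 3, 4]

def mean_predicted(m):
    """Average over observations of P_hat(Y = m | x_i)."""
    return sum(poisson_pmf(m, l) for l in lam_hat) / len(lam_hat)

for m in range(0, 5):
    observed = sum(1 for y in y_obs if y == m) / len(y_obs)
    # columns: count, mean predicted probability, observed proportion
    print(m, round(mean_predicted(m), 3), observed)
```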

Unequal Observation Intervals
Reparameterize $\lambda_i = e^{x_i\beta}$ and the log-likelihood becomes
\[ \ln L = \sum \left[ \ln\left(e^{-e^{x_i\beta} t_i}\right) + \ln\left((e^{x_i\beta} t_i)^{y_i}\right) - \ln(y_i!) \right] \]
\[ = \sum \left[ -e^{x_i\beta} t_i + y_i \ln(e^{x_i\beta} t_i) - \ln(y_i!) \right] \]
\[ = \sum \left[ y_i (x_i\beta + \ln t_i) - e^{x_i\beta} t_i - \ln(y_i!) \right] \]
Keeping only the terms that involve β, this is
\[ \sum \left[ y_i x_i\beta - e^{x_i\beta} t_i \right] \]
where $y_i \ln t_i$ and $\ln(y_i!)$ do not involve β and so can be dropped from the function.
King (1989) shows that you can estimate the above function by simply including $\ln t_i$ in the model.

Assumption behind the Poisson Model
The Poisson process rests on the assumption that the probability of an event occurring is constant within a particular period and is independent of other events during the same period.
This assumption is restrictive because there might be heterogeneity and social contagion in DGP.
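
For the unequal-interval log-likelihood, the claim that the dropped terms do not involve β can be checked numerically: the full log-likelihood and the reduced one should differ by the same constant at any β. The sketch below uses made-up data with one covariate. `math.lgamma(y + 1)` is used for ln(y!).

```python
import math

# Made-up data: covariate x, exposure time t, and count y
x = [0.0, 1.0, 2.0, 3.0]
t = [1.0, 2.0, 0.5, 4.0]
y = [1, 3, 1, 9]

def loglik_full(b0, b1):
    """sum[-lambda_i t_i + y_i ln(lambda_i t_i) - ln(y_i!)], lambda_i = exp(b0 + b1 x_i)."""
    total = 0.0
    for xi, ti, yi in zip(x, t, y):
        lam_t = math.exp(b0 + b1 * xi) * ti
        total += -lam_t + yi * math.log(lam_t) - math.lgamma(yi + 1)
    return total

def loglik_kernel(b0, b1):
    """sum[y_i x_i beta - exp(x_i beta) t_i]: only the terms that involve beta."""
    return sum(yi * (b0 + b1 * xi) - math.exp(b0 + b1 * xi) * ti
               for xi, ti, yi in zip(x, t, y))

# The two functions differ by a constant, whatever beta is
gap_1 = loglik_full(0.1, 0.2) - loglik_kernel(0.1, 0.2)
gap_2 = loglik_full(-0.5, 0.7) - loglik_kernel(-0.5, 0.7)
print(round(gap_1, 10), round(gap_2, 10))
```

Since the gap is constant in β, both functions are maximized at the same coefficient values.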

Heterogeneity
What if events are more likely to occur in some time periods than in others?
The Poisson model $\lambda_i = e^{x_i\beta}$ captures variation in $\lambda_i$.
But the model might fail to adequately account for this type of heterogeneity.
The assumption of a constant within-period rate λ is often violated.
Heterogeneity leads to overdispersion, E(Y) < Var(Y), because the non-constant rate induces greater random variability in Y than would a constant λ.

Contagion
What if one event increases or decreases the likelihood of another?
Individuals with a given set of xs initially have the same probability of an event occurring, but this probability changes as events occur.
This is called contagion.
If we have contagion, we are likely to see a greater number of higher and lower counts.
Positive contagion increases the variance of the observed counts.
This also leads to overdispersion.
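
A short simulation can illustrate how a non-constant rate produces overdispersion. The sketch below assumes, purely for illustration, that each unit's rate is a fixed λ multiplied by a gamma-distributed factor with mean one. The sampler `draw_poisson`, the seed, and all parameter values are choices made here, not taken from the slides.

```python
import math
import random

def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def mean_var(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(12345)   # fixed seed so the run is reproducible
lam, nu, n = 3.0, 2.0, 20000

# Constant rate: variance should be close to the mean
fixed = [draw_poisson(lam, rng) for _ in range(n)]

# Heterogeneous rate: lam times a gamma factor with mean 1 and variance 1/nu
mixed = [draw_poisson(lam * rng.gammavariate(nu, 1.0 / nu), rng) for _ in range(n)]

print(mean_var(fixed))
print(mean_var(mixed))
```

Both samples have roughly the same mean, but the variance of the heterogeneous sample is well above its mean.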

Overdispersion
The Poisson regression model rarely fits in practice because in most applications E(Y) < Var(Y).
If we fit the Poisson model to overdispersed data, we will underestimate standard errors and overestimate the precision of coefficients.
We develop a Negative Binomial model to deal with overdispersion.

Negative Binomial
We replace $\lambda_i = e^{x_i\beta}$ with $\tilde{\lambda}_i$ and suppose that
\[ \tilde{\lambda}_i = e^{x_i\beta + \varepsilon_i} \]
where $\varepsilon_i$ is a random error that is assumed to be uncorrelated with x.
$\varepsilon_i$ represents unobserved heterogeneity.
$\tilde{\lambda}_i$ is rewritten as
\[ \tilde{\lambda}_i = e^{x_i\beta + \varepsilon_i} = e^{x_i\beta} e^{\varepsilon_i} = \lambda_i e^{\varepsilon_i} = \lambda_i \delta_i \]
where $\delta_i = e^{\varepsilon_i}$.

Negative Binomial
The negative binomial model is not identified without an assumption about the mean of the error term $\delta_i$.
We assume that $E(\delta_i) = 1$ because this gives $E(\tilde{\lambda}_i) = \lambda_i$.
Further, the distribution of $y_i$ conditional on x and δ remains a Poisson.
\[ P(y_i \mid x_i, \delta_i) = \frac{e^{-\tilde{\lambda}_i} \tilde{\lambda}_i^{y_i}}{y_i!} = \frac{e^{-\lambda_i \delta_i} (\lambda_i \delta_i)^{y_i}}{y_i!} \]
Since $\delta_i$ is unknown, we cannot compute $P(y_i \mid x_i, \delta_i)$ and instead need to compute the distribution of y given only $x_i$.
To compute $P(y_i \mid x_i)$ without conditioning on $\delta_i$, we average $P(y_i \mid x_i, \delta_i)$ by the probability of each value of $\delta_i$.
If g is the pdf for $\delta_i$, then
\[ P(y_i \mid x_i) = \int_0^{\infty} \left[ P(y_i \mid x_i, \delta_i) \times g(\delta_i) \right] d\delta_i \]

Negative Binomial
To clarify what this important equation is doing, assume that δ has only two values, $\delta_1$ and $\delta_2$.
The previous equation is now
\[ P(y_i \mid x_i) = \left[ P(y_i \mid x_i, \delta_i = \delta_1) \times P(\delta_i = \delta_1) \right] + \left[ P(y_i \mid x_i, \delta_i = \delta_2) \times P(\delta_i = \delta_2) \right] \]
This equation weights P(y|x, δ) by P(δ), and adds over all values of δ.
This equation computes the probability of y as a mixture of two probability distributions.
We suppose that $\delta_i$ has a gamma distribution with parameter $\nu_i$.
\[ g(\delta_i) = \frac{\nu_i^{\nu_i}}{\Gamma(\nu_i)} \delta_i^{\nu_i - 1} e^{-\delta_i \nu_i} \]
where the gamma function is defined as $\Gamma(\nu) = \int_0^{\infty} t^{\nu - 1} e^{-t} \, dt$.
If $\delta_i$ has a gamma distribution, $E(\delta_i) = 1$ and $\mathrm{Var}(\delta_i) = 1/\nu_i$.
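
The stated mean and variance of the gamma density can be verified by numerical integration. This is a rough sketch using a midpoint rule. The value ν = 2, the upper limit of 40, and the helper names are assumptions made for the illustration.

```python
import math

def gamma_pdf(delta, nu):
    """g(delta) = nu**nu / Gamma(nu) * delta**(nu - 1) * exp(-delta * nu)."""
    return math.exp(nu * math.log(nu) - math.lgamma(nu)
                    + (nu - 1) * math.log(delta) - delta * nu)

def integrate(f, lo, hi, steps=100_000):
    """Midpoint rule; the grid avoids delta = 0, where log(delta) is undefined."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

nu = 2.0  # illustrative value
total = integrate(lambda d: gamma_pdf(d, nu), 0.0, 40.0)
mean = integrate(lambda d: d * gamma_pdf(d, nu), 0.0, 40.0)
var = integrate(lambda d: (d - mean) ** 2 * gamma_pdf(d, nu), 0.0, 40.0)
print(round(total, 6), round(mean, 6), round(var, 6))
```

The density integrates to one, its mean is one, and its variance is 1/ν, so a larger ν means less unobserved heterogeneity.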

Negative Binomial
The negative binomial probability function is
\[ P(y_i \mid x_i, \nu_i) = \frac{\Gamma(y_i + \nu_i)}{y_i!\,\Gamma(\nu_i)} \left( \frac{\nu_i}{\nu_i + \lambda_i} \right)^{\nu_i} \left( \frac{\lambda_i}{\nu_i + \lambda_i} \right)^{y_i} \]
The expected value is the same as for the Poisson distribution:
\[ E(y_i \mid x_i) = \lambda_i \]
The variance differs:
\[ \mathrm{Var}(y_i \mid x_i) = \lambda_i \left( 1 + \frac{\lambda_i}{\nu_i} \right) = e^{x_i\beta} \left( 1 + \frac{e^{x_i\beta}}{\nu_i} \right) \]
Since λ and ν are both positive, the variance will exceed the conditional mean.

Negative Binomial
The variance remains unidentified since $\nu_i$ varies by observation and we have more parameters than observations.
Then we assume that $\nu_i$ is the same for all observations:
\[ \nu_i = \alpha^{-1} \]
for α > 0.
This α is known as the dispersion parameter since increasing α increases the conditional variance of y.
The variance equation is rewritten as
\[ \mathrm{Var}(y_i \mid x_i) = \lambda_i \left( 1 + \frac{\lambda_i}{\alpha^{-1}} \right) = \lambda_i (1 + \alpha \lambda_i) = \lambda_i + \alpha \lambda_i^2 \]
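
As a check on the negative binomial probability function, the sketch below sums it over a truncated support to recover the mean and variance, and compares one probability with a direct numerical integration of the Poisson-gamma mixture. The values λ = 3 and α = 0.5, the truncation points, and the function names are assumptions for illustration.

```python
import math

def negbin_pmf(y, lam, nu):
    """Gamma(y+nu) / (y! Gamma(nu)) * (nu/(nu+lam))**nu * (lam/(nu+lam))**y."""
    log_p = (math.lgamma(y + nu) - math.lgamma(y + 1) - math.lgamma(nu)
             + nu * math.log(nu / (nu + lam)) + y * math.log(lam / (nu + lam)))
    return math.exp(log_p)

def poisson_gamma_mixture(y, lam, nu, steps=100_000, upper=40.0):
    """Midpoint-rule approximation of the integral of P(y | lam*delta) g(delta)."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        d = (i + 0.5) * width
        log_pois = -lam * d + y * math.log(lam * d) - math.lgamma(y + 1)
        log_gam = (nu * math.log(nu) - math.lgamma(nu)
                   + (nu - 1) * math.log(d) - nu * d)
        total += math.exp(log_pois + log_gam)
    return total * width

lam, alpha = 3.0, 0.5      # illustrative values
nu = 1.0 / alpha           # nu = alpha**(-1)
support = range(0, 400)    # truncation point; the omitted tail is negligible here

total = sum(negbin_pmf(y, lam, nu) for y in support)
mean = sum(y * negbin_pmf(y, lam, nu) for y in support)
var = sum((y - mean) ** 2 * negbin_pmf(y, lam, nu) for y in support)
mixture_at_2 = poisson_gamma_mixture(2, lam, nu)

print(round(total, 6), round(mean, 6), round(var, 6), lam + alpha * lam ** 2)
print(round(negbin_pmf(2, lam, nu), 6), round(mixture_at_2, 6))
```

The mean equals λ, the variance equals λ + αλ², and the closed-form probability matches the integrated mixture to numerical accuracy.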
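
Estimation
The likelihood function is
\[ L = \prod_{i=1}^{N} P(y_i \mid x_i) = \prod_{i=1}^{N} \frac{\Gamma(y_i + \alpha^{-1})}{y_i!\,\Gamma(\alpha^{-1})} \left( \frac{\alpha^{-1}}{\alpha^{-1} + \lambda_i} \right)^{\alpha^{-1}} \left( \frac{\lambda_i}{\alpha^{-1} + \lambda_i} \right)^{y_i} \]
\[ = \prod_{i=1}^{N} \frac{\Gamma(y_i + \alpha^{-1})}{y_i!\,\Gamma(\alpha^{-1})} \left( \frac{\alpha^{-1}}{\alpha^{-1} + e^{x_i\beta}} \right)^{\alpha^{-1}} \left( \frac{e^{x_i\beta}}{\alpha^{-1} + e^{x_i\beta}} \right)^{y_i} \]
After taking the log of the likelihood function, we would maximize it with respect to β and α.

Testing for Overdispersion
A one-tailed test of H0: α = 0 can be used to test for overdispersion.
You can also use a LR test for H0: α = 0:
\[ 2(\ln L_{NB} - \ln L_{P}) \sim \chi^2 \]
where the degrees of freedom is one, because the Poisson model imposes the single restriction α = 0.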
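
To show how the negative binomial log-likelihood and the LR statistic fit together, here is a deliberately simplified sketch. It assumes an intercept-only model, in which the fitted mean equals the sample mean under both models, and it finds α by a crude grid search rather than a proper optimizer. The data are made up. In practice zelig or nbreg would do the maximization over β and α jointly.

```python
import math

def loglik_poisson(y, lam):
    return sum(-lam + yi * math.log(lam) - math.lgamma(yi + 1) for yi in y)

def loglik_negbin(y, lam, alpha):
    """Log of the negative binomial likelihood with nu = 1 / alpha."""
    nu = 1.0 / alpha
    return sum(math.lgamma(yi + nu) - math.lgamma(yi + 1) - math.lgamma(nu)
               + nu * math.log(nu / (nu + lam)) + yi * math.log(lam / (nu + lam))
               for yi in y)

# Made-up overdispersed counts (illustration only)
y = [0, 0, 0, 1, 1, 2, 2, 3, 5, 8, 12, 14]
lam_hat = sum(y) / len(y)   # with an intercept only, the fitted mean is the sample mean

# Crude grid search over alpha; a real application would use a numerical optimizer
grid = [0.01 * k for k in range(1, 501)]
alpha_hat = max(grid, key=lambda a: loglik_negbin(y, lam_hat, a))

# LR statistic: 2 (ln L_NB - ln L_P)
lr = 2 * (loglik_negbin(y, lam_hat, alpha_hat) - loglik_poisson(y, lam_hat))
print(round(lam_hat, 3), round(alpha_hat, 2), round(lr, 3))
```

As α approaches zero the negative binomial log-likelihood approaches the Poisson one, which is why the Poisson model is the restricted model in this test.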

Estimation and Interpretation
For estimation, use zelig(function, model="negbin", data=data) in R.
Or use nbreg in Stata.
Methods of interpretation are identical to those for the Poisson model.

Zero Inflated Model
A zero inflated model changes the mean structure to explicitly model the production of zero counts.
This is done by assuming that 0’s can be generated by a different process from positive counts.
Suppose there are two (unobservable) groups of people.
The first group (A) consists of people who always have zero counts.
The second group (B) consists of people who may have positive counts but may have a zero count.
We assume that there are two different processes that might produce a zero.

Zero Inflated Model
Three steps to derive a zero inflated model
1 Model membership into the latent groups A and B.
2 Model counts for group B.
3 Compute observed probabilities as a mixture of the probabilities for the two groups.

Zero Inflated Model
Let $A_i = 1$ if an individual is in group A, 0 otherwise.
Let the probability that an individual belongs to A be ψ and a function of characteristics of individuals z.
ψ is determined by a logit or probit model.
\[ P(A_i = 1) = \psi_i = F(z_i\gamma) \]
where F is either the normal or the logistic cdf.

Zero Inflated Model
Among people in group B, the probability of each count (including zeros) is determined by the Poisson or Negative binomial models.
The probability function is conditional on the xs and on membership in group B ($A_i = 0$).
\[ P(y_i \mid x_i, A_i = 0) = \frac{e^{-\mu_i} \mu_i^{y_i}}{y_i!} \]

Zero Inflated Model
We mix the two groups according to their proportions in the sample to determine the overall rate.
The proportion of each group is
\[ P(A_i = 1) = \psi_i \]
\[ P(A_i = 0) = 1 - \psi_i \]
The probabilities of a zero count within each group are
\[ P(y_i = 0 \mid A_i = 1, x_i, z_i) = 1 \text{ by definition} \]
\[ P(y_i = 0 \mid A_i = 0, x_i, z_i) = \text{outcome of the model} \]

Zero Inflated Model
The overall probability of zero count for the Poisson case is
\[ P(y_i = 0 \mid x_i, z_i) = [\psi_i \times 1] + [(1 - \psi_i) \times P(y_i = 0 \mid A_i = 0, x_i)] \]
\[ = \psi_i + [(1 - \psi_i) \times P(y_i = 0 \mid A_i = 0, x_i)] \]
\[ = \psi_i + [(1 - \psi_i) \times e^{-\mu_i}] \]
If you use a negative binomial model, you replace the Poisson distribution in the equation with the negative binomial distribution.

Zero Inflated Model
The probability of other counts for the Poisson case is
\[ P(y_i = k \mid x_i, z_i) = [\psi_i \times 0] + [(1 - \psi_i) \times P(y_i = k \mid A_i = 0, x_i)] \]
\[ = (1 - \psi_i) \times P(y_i = k \mid A_i = 0, x_i) \]
\[ = (1 - \psi_i) \frac{e^{-\mu_i} \mu_i^{k}}{k!} \]
If you use a negative binomial model, you replace the Poisson distribution in the equation with the negative binomial distribution.
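
The two pieces of the zero-inflated Poisson probability function can be combined in one small function. The sketch below assumes a logit specification for ψ and uses hypothetical values for the linear predictors zγ and xβ. The function names are illustrative.

```python
import math

def logistic_cdf(v):
    return 1.0 / (1.0 + math.exp(-v))

def zip_pmf(k, mu, psi):
    """Zero-inflated Poisson: psi + (1-psi) e^{-mu} at zero, (1-psi) Poisson elsewhere."""
    poisson = math.exp(-mu) * mu ** k / math.factorial(k)
    return (psi if k == 0 else 0.0) + (1.0 - psi) * poisson

# Hypothetical linear predictors for one individual (illustration only)
z_gamma, x_beta = -0.4, 0.9
psi = logistic_cdf(z_gamma)   # P(A_i = 1) under a logit specification
mu = math.exp(x_beta)         # Poisson mean for group B

probs = [zip_pmf(k, mu, psi) for k in range(0, 80)]
print(round(probs[0], 4), round(psi + (1 - psi) * math.exp(-mu), 4), round(sum(probs), 6))
```

The probability of a zero exceeds the plain Poisson value e^{-μ}, and the probabilities still sum to one because the positive counts are scaled down by 1 − ψ.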

Zero Inflated Model
The expected counts are computed by
\[ E(y_i \mid x_i, z_i) = [0 \times \psi_i] + [\mu_i \times (1 - \psi_i)] = \mu_i - \mu_i \psi_i \]
The variance for the Poisson model is
\[ \mathrm{Var}(y_i \mid x_i, z_i) = \mu_i (1 - \psi_i)(1 + \mu_i \psi_i) \]
The variance for the negative binomial model is
\[ \mathrm{Var}(y_i \mid x_i, z_i) = \mu_i (1 - \psi_i)(1 + \mu_i (\psi_i + \alpha)) \]

Estimation and Interpretation
Use zeroinfl in pscl library in R.
Use zip or zinb in Stata.
Coefficients associated with $x_i$ can be interpreted as in the other count models, while coefficients associated with $z_i$ can be interpreted as in the logit or probit model.
The binary process predicts membership in the group that must have a zero count.
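
The expected count and variance of the zero-inflated Poisson can be confirmed by summing over its probability function. This is a minimal sketch with illustrative values μ = 2.5 and ψ = 0.3. It covers the Poisson case only.

```python
import math

def zip_pmf(k, mu, psi):
    poisson = math.exp(-mu) * mu ** k / math.factorial(k)
    return (psi if k == 0 else 0.0) + (1.0 - psi) * poisson

mu, psi = 2.5, 0.3       # illustrative values (assumption)
support = range(0, 80)   # the tail beyond 80 is negligible for this mean

mean = sum(k * zip_pmf(k, mu, psi) for k in support)
var = sum((k - mean) ** 2 * zip_pmf(k, mu, psi) for k in support)

# Compare with mu - mu*psi and mu (1 - psi)(1 + mu*psi)
print(round(mean, 6), mu - mu * psi)
print(round(var, 6), mu * (1 - psi) * (1 + mu * psi))
```

The variance exceeds the mean whenever ψ > 0, so zero inflation is itself a source of overdispersion.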