PSCI6000 Maximum Likelihood Estimation
Event Count Models
Tetsuya Matsubayashi
University of North Texas
November 16, 2010

Goals
- A Poisson model
- A negative binomial model
- A zero-inflated model

Event Count Models

Count models are useful for dependent variables that count the number of times an event occurs over time:
- Number of wars
- Number of pieces of legislation passed by Congress
- Number of veto overrides by presidents
- Number of doctor visits
- Number of articles published
- Number of police arrests

Use a Linear Model?
- Not appropriate except in special situations.
- Using a linear regression model can result in inefficient and inconsistent estimates.
- Consider carefully the data-generating process of your dependent variable.

The Poisson Distribution
If the process generates events independently and at a fixed rate within time periods, the result is a Poisson process. The distribution follows

P(y_i \mid \lambda) = \frac{e^{-\lambda} \lambda^{y_i}}{y_i!}

with

E(y_i) = \lambda \quad \text{and} \quad Var(y_i) = \lambda

"Independently" means that an event has no impact on the occurrence of another event within the same time interval. Two examples:
- The number of phone calls received → independent
- The number of veto overrides by the president → not independent

In the Poisson process, the probability of the event occurring is very small, but the number of trials is very large. It is the limit of a binomial process in which π → 0 and n → ∞. See some examples of the Poisson distribution in R.

Some Characteristics of the Poisson Distribution
- As λ increases, the mass of the distribution shifts to the right.
- Var(y) = E(y) = λ. This is called equidispersion.
- As λ increases, the probability of 0 decreases.
- As λ increases, the Poisson distribution approximates a normal distribution. See an example.

Reparameterize λ
We reparameterize λ as a non-negative, continuous function of the covariates:

\lambda_i = e^{x_i \beta}

This gives us the following probability model:

P(y_i) = \frac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!} = \frac{e^{-e^{x_i \beta}} (e^{x_i \beta})^{y_i}}{y_i!}

Likelihood
The likelihood function for the whole sample is:

L = \prod_{i=1}^{N} \frac{e^{-e^{x_i \beta}} (e^{x_i \beta})^{y_i}}{y_i!}

The log-likelihood is:

\ln L = \sum_{i=1}^{N} \left[ -e^{x_i \beta} + y_i x_i \beta - \ln(y_i!) \right]

Gradient Vector and Hessian Matrix
The gradient vector is obtained by:

G = \frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{N} \left( -e^{x_i \beta} x_i + y_i x_i \right) = \sum_{i=1}^{N} (y_i - e^{x_i \beta}) x_i = 0

The Hessian matrix is obtained by:

H = \frac{\partial^2 \ln L}{\partial \beta \, \partial \beta'} = \sum_{i=1}^{N} -x_i' e^{x_i \beta} x_i

Estimation and Interpretation
- Use zelig(function, model = "poisson") in R, or poisson in Stata. See an example using R (a sketch follows at the end of this subsection).
- You can interpret the sign of the coefficients as usual.
- You can also assess statistical significance as usual.

The Expected Count
The expected count can be computed by

\hat{\lambda} = E(y) = e^{x \hat{\beta}}

This tells you the expected number of y when the xs are set at some values.

The factor change in the expected count can be computed by:

\frac{E(y \mid x, x_k + \delta)}{E(y \mid x, x_k)} = e^{\hat{\beta}_k \delta}

For a change of δ in x_k, the expected count changes by a factor of e^{\hat{\beta}_k \delta}, holding other variables constant.

The discrete change in the expected count can be computed by

\frac{\Delta E(y \mid x)}{\Delta x_k} = E(y \mid x, x_k = a) - E(y \mid x, x_k = b)

For a change in x_k from b to a, the expected count changes by \Delta E(y \mid x) / \Delta x_k, holding all other variables constant.

Marginal Effect
Marginal effects in the Poisson model are

\frac{\partial E(y \mid x)}{\partial x} = \frac{\partial \hat{\lambda}_i}{\partial x} = \frac{\partial e^{x \hat{\beta}}}{\partial x} = e^{x \hat{\beta}} \hat{\beta} = \hat{\lambda}_i \hat{\beta}

When x increases by a small amount, the expected count increases by \hat{\beta} times the expected number of events.
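The slides fit the Poisson model with Zelig in R (or poisson in Stata). The minimal sketch below instead uses base R's glm() with family = poisson, which maximizes the same log-likelihood; the data are simulated and all variable names (x1, x2) are illustrative, not taken from the lecture.

```r
## Sketch: Poisson regression, expected counts, factor change, marginal effect
set.seed(123)
n  <- 500
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
lambda <- exp(0.5 + 0.8 * x1 - 0.4 * x2)   # lambda_i = exp(x_i * beta)
y  <- rpois(n, lambda)

m_pois <- glm(y ~ x1 + x2, family = poisson(link = "log"))
summary(m_pois)                             # signs and significance read as usual

## Expected count at chosen covariate values: lambda-hat = exp(x * beta-hat)
newx <- data.frame(x1 = 1, x2 = 0)
predict(m_pois, newdata = newx, type = "response")

## Factor change in the expected count for a one-unit increase in x1
exp(coef(m_pois)["x1"])

## Marginal effect at newx: lambda-hat * beta-hat
predict(m_pois, newdata = newx, type = "response") * coef(m_pois)["x1"]
```

The last two lines mirror the formulas above: exp(beta-hat) is the factor change for delta = 1, and the marginal effect is the fitted rate times the coefficient.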
Predicted Probabilities
The results can also be used to compute the probability distribution of counts for a given level of the independent variables:

\hat{P}(Y = m \mid x\hat{\beta}) = \frac{e^{-e^{x\hat{\beta}}} (e^{x\hat{\beta}})^{m}}{m!}

The mean predicted probability for each count m can be used to summarize the predictions of the model:

\bar{P}(Y = m) = \frac{1}{N} \sum_{i=1}^{N} \hat{P}(Y = m \mid x_i \hat{\beta})

The mean predicted probabilities can be compared to the observed proportions of the sample at each count (see the sketch at the end of this subsection).

Unequal Observation Intervals
We have assumed that each observation comes from a time interval of the same length. The Poisson model is easily modified to incorporate unequal observation intervals. Let t_i be the length of the observation interval for case i. The probability distribution is rewritten as

P(y_i \mid \lambda_i) = \frac{e^{-\lambda_i t_i} (\lambda_i t_i)^{y_i}}{y_i!}

Reparameterizing \lambda_i = e^{x_i \beta}, the log-likelihood becomes

\ln L = \sum_i \left[ \ln\!\left(e^{-e^{x_i \beta} t_i}\right) + y_i \ln\!\left(e^{x_i \beta} t_i\right) - \ln(y_i!) \right]
      = \sum_i \left[ -e^{x_i \beta} t_i + y_i \ln\!\left(e^{x_i \beta} t_i\right) \right]
      = \sum_i \left[ y_i (x_i \beta + \ln t_i) - e^{x_i \beta} t_i \right]
      = \sum_i \left[ y_i x_i \beta - e^{x_i \beta} t_i \right] + \text{const}

where \ln(y_i!) and y_i \ln t_i do not involve β and so can be dropped from the function. King (1989) shows that you can estimate this function by simply including ln t_i in the model (i.e., as an offset; see the sketch below).

Assumption behind the Poisson Model
The Poisson process rests on the assumption that the probability of an event occurring is constant within a particular period and is independent of all other events during the same period. This assumption is restrictive because there may be heterogeneity and social contagion in the data-generating process.

Heterogeneity
What if events are more likely to occur in some time periods than in others?
- The Poisson model λ_i = e^{x_i β} captures variation in λ_i across observations, but it might fail to adequately account for this type of heterogeneity.
- The assumption of a constant within-period rate λ is often violated.
- Heterogeneity leads to overdispersion, E(Y) < Var(Y), because the non-constant rate induces greater random variability in Y than a constant λ would.

Contagion
What if one event increases or decreases the likelihood of another?
- Individuals with a given set of xs initially have the same probability of an event occurring, but this probability changes as events occur. This is called contagion.
- With contagion, we are likely to see a greater number of both higher and lower counts.
- Positive contagion increases the variance of the observed counts. This also leads to overdispersion.

Overdispersion
- The Poisson regression model rarely fits in practice because in most applications E(Y) < Var(Y).
- If we fit the Poisson model to overdispersed data, we will underestimate the standard errors and overestimate the precision of the coefficients.
- We develop a negative binomial model to deal with overdispersion.
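Before turning to the negative binomial model, a short R sketch ties together two points above: comparing mean predicted probabilities with the observed proportions at each count, and handling unequal observation intervals by including log(t_i) as an offset. It continues the simulated example and fitted model m_pois from the earlier sketch; the exposure times t_i are hypothetical.

```r
## Continues the simulated data (y, x1, x2) and Poisson fit m_pois from above.

## (1) Mean predicted probability for each count m vs. the observed
##     proportion of the sample at that count.
lam_hat <- predict(m_pois, type = "response")            # lambda-hat_i
m_vals  <- 0:10
p_bar   <- sapply(m_vals, function(m) mean(dpois(m, lam_hat)))
p_obs   <- sapply(m_vals, function(m) mean(y == m))
round(cbind(m = m_vals, mean_predicted = p_bar, observed = p_obs), 3)

## (2) Unequal observation intervals: include log(t_i) as an offset, so the
##     linear predictor is x_i*beta + ln(t_i) with a coefficient fixed at 1.
t_i <- runif(length(y), 0.5, 2)                          # hypothetical exposure times
m_offset <- glm(y ~ x1 + x2 + offset(log(t_i)), family = poisson)
summary(m_offset)
```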
Negative Binomial
We replace \lambda_i = e^{x_i \beta} with \tilde{\lambda}_i and suppose that

\tilde{\lambda}_i = e^{x_i \beta + \varepsilon_i}

where ε_i is a random error assumed to be uncorrelated with x; ε_i represents unobserved heterogeneity. \tilde{\lambda}_i can be rewritten as

\tilde{\lambda}_i = e^{x_i \beta + \varepsilon_i} = e^{x_i \beta} e^{\varepsilon_i} = \lambda_i e^{\varepsilon_i} = \lambda_i \delta_i

where \delta_i = e^{\varepsilon_i}.

Negative Binomial
The negative binomial model is not identified without an assumption about the mean of the error term δ_i. We assume that E(δ_i) = 1 because this gives E(\tilde{\lambda}_i) = \lambda_i. Further, the distribution of y_i conditional on x and δ remains Poisson:

P(y_i \mid x_i, \delta_i) = \frac{e^{-\tilde{\lambda}_i} \tilde{\lambda}_i^{y_i}}{y_i!} = \frac{e^{-\lambda_i \delta_i} (\lambda_i \delta_i)^{y_i}}{y_i!}

Negative Binomial
Since δ_i is unknown, we cannot compute P(y_i | x_i, δ_i); instead we need the distribution of y given only x_i. To compute P(y_i | x_i) without conditioning on δ_i, we average P(y_i | x_i, δ_i) over the probability of each value of δ_i. If g is the pdf of δ_i, then

P(y_i \mid x_i) = \int_0^{\infty} \left[ P(y_i \mid x_i, \delta_i) \times g(\delta_i) \right] d\delta_i

Negative Binomial
To clarify what this important equation is doing, assume that δ takes only two values, δ_1 and δ_2. The previous equation becomes

P(y_i \mid x_i) = \left[ P(y_i \mid x_i, \delta_i = \delta_1) \times P(\delta_i = \delta_1) \right] + \left[ P(y_i \mid x_i, \delta_i = \delta_2) \times P(\delta_i = \delta_2) \right]

This equation weights P(y | x, δ) by P(δ) and adds over all values of δ; it computes the probability of y as a mixture of two probability distributions.

Negative Binomial
We suppose that δ_i has a gamma distribution with parameter ν_i:

g(\delta_i) = \frac{\nu_i^{\nu_i}}{\Gamma(\nu_i)} \delta_i^{\nu_i - 1} e^{-\delta_i \nu_i}

where the gamma function is defined as \Gamma(\nu) = \int_0^{\infty} t^{\nu - 1} e^{-t} dt. If δ_i has this gamma distribution, then

E(\delta_i) = 1 \quad \text{and} \quad Var(\delta_i) = \frac{1}{\nu_i}

Negative Binomial
The negative binomial probability function is

P(y_i \mid x_i, \nu_i) = \frac{\Gamma(y_i + \nu_i)}{y_i! \, \Gamma(\nu_i)} \left( \frac{\nu_i}{\nu_i + \lambda_i} \right)^{\nu_i} \left( \frac{\lambda_i}{\nu_i + \lambda_i} \right)^{y_i}

The expected value is the same as for the Poisson distribution:

E(y_i \mid x_i) = \lambda_i

The variance differs:

Var(y_i \mid x_i) = \lambda_i \left( 1 + \frac{\lambda_i}{\nu_i} \right) = e^{x_i \beta} \left( 1 + \frac{e^{x_i \beta}}{\nu_i} \right)

Since λ and ν are both positive, the variance will exceed the conditional mean.

Negative Binomial
The variance remains unidentified if ν_i varies by observation, since we would have more parameters than observations. We therefore assume that ν_i is the same for all observations:

\nu_i = \alpha^{-1} \quad \text{for } \alpha > 0

This α is known as the dispersion parameter, since increasing α increases the conditional variance of y. The variance equation is rewritten as

Var(y_i \mid x_i) = \lambda_i \left( 1 + \frac{\lambda_i}{\alpha^{-1}} \right) = \lambda_i (1 + \alpha \lambda_i) = \lambda_i + \alpha \lambda_i^2

Testing for Overdispersion
A one-tailed test of H_0: α = 0 can be used to test for overdispersion. You can also use a LR test of H_0: α = 0:

G^2 = 2(\ln L_{NB} - \ln L_{Poisson}) \sim \chi^2

with one degree of freedom (the single restriction α = 0).

Estimation
The likelihood function is

L = \prod_{i=1}^{N} P(y_i \mid x_i)
  = \prod_{i=1}^{N} \frac{\Gamma(y_i + \alpha^{-1})}{y_i! \, \Gamma(\alpha^{-1})} \left( \frac{\alpha^{-1}}{\alpha^{-1} + \lambda_i} \right)^{\alpha^{-1}} \left( \frac{\lambda_i}{\alpha^{-1} + \lambda_i} \right)^{y_i}
  = \prod_{i=1}^{N} \frac{\Gamma(y_i + \alpha^{-1})}{y_i! \, \Gamma(\alpha^{-1})} \left( \frac{\alpha^{-1}}{\alpha^{-1} + e^{x_i \beta}} \right)^{\alpha^{-1}} \left( \frac{e^{x_i \beta}}{\alpha^{-1} + e^{x_i \beta}} \right)^{y_i}

After taking the log of the likelihood function, we maximize it with respect to β and α.

Estimation and Interpretation
- For estimation, use zelig(function, model = "negbin", data = data) in R, or nbreg in Stata (a sketch follows below).
- Methods of interpretation are identical to those for the Poisson model.
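A minimal sketch of the negative binomial model and the overdispersion test, using MASS::glm.nb rather than Zelig (Stata's nbreg is analogous). Note that glm.nb parameterizes the variance as μ + μ²/θ, so its θ corresponds to ν = α⁻¹ above. The simulated data and variable names are illustrative.

```r
## Negative binomial fit and LR test of H0: alpha = 0
library(MASS)

set.seed(456)
n  <- 500
x1 <- rnorm(n)
mu <- exp(0.5 + 0.8 * x1)
y  <- rnbinom(n, mu = mu, size = 2)      # overdispersed counts; true alpha = 1/size = 0.5

m_pois <- glm(y ~ x1, family = poisson)  # restricted (Poisson) model on the same data
m_nb   <- glm.nb(y ~ x1)
summary(m_nb)                            # coefficients interpreted as in the Poisson model
1 / m_nb$theta                           # estimate of the dispersion parameter alpha

## LR test statistic G^2 = 2(lnL_NB - lnL_Poisson), df = 1
lr <- as.numeric(2 * (logLik(m_nb) - logLik(m_pois)))
pchisq(lr, df = 1, lower.tail = FALSE)   # alpha = 0 sits on the boundary, so this
                                         # p-value is conservative (halving it is common)
```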
Zero-Inflated Model
A zero-inflated model changes the mean structure to explicitly model the production of zero counts. This is done by assuming that zeros can be generated by a different process from positive counts. Suppose there are two (unobservable) groups of people:
- The first group (A) consists of people who always have zero counts.
- The second group (B) consists of people who may have positive counts but may also have a zero count.
We assume that there are two different processes that might produce a zero.

Zero-Inflated Model
Three steps to derive a zero-inflated model:
1. Model membership in the latent groups A and B.
2. Model counts for group B.
3. Compute observed probabilities as a mixture of the probabilities for the two groups.

Zero-Inflated Model
Let A_i = 1 if an individual is in group A, and 0 otherwise. Let the probability that an individual belongs to A be ψ, a function of individual characteristics z. ψ is determined by a logit or probit model:

P(A_i = 1) = \psi_i = F(z_i \gamma)

where F is either the normal or the logistic cdf.

Zero-Inflated Model
Among people in group B, the probability of each count (including zeros) is determined by a Poisson or negative binomial model. The probability function is conditional on the xs and ψ:

P(y_i \mid x_i, A_i = 0) = \frac{e^{-\mu_i} \mu_i^{y_i}}{y_i!}

Zero-Inflated Model
We mix the two groups according to their proportions in the sample to determine the overall rate. The proportion of each group is

P(A_i = 1) = \psi_i \quad \text{and} \quad P(A_i = 0) = 1 - \psi_i

The probabilities of a zero count within each group are

P(y_i = 0 \mid A_i = 1, x_i, z_i) = 1 \quad \text{(by definition)}
P(y_i = 0 \mid A_i = 0, x_i, z_i) = \text{outcome of the count model}

Zero-Inflated Model
The overall probability of a zero count for the Poisson case is

P(y_i = 0 \mid x_i, z_i) = [\psi_i \times 1] + [(1 - \psi_i) \times P(y_i = 0 \mid A_i = 0, x_i)]
                        = \psi_i + [(1 - \psi_i) \times e^{-\mu_i}]

The probability of the other counts for the Poisson case is

P(y_i = k \mid x_i, z_i) = [\psi_i \times 0] + [(1 - \psi_i) \times P(y_i = k \mid A_i = 0, x_i)]
                        = (1 - \psi_i) \frac{e^{-\mu_i} \mu_i^{y_i}}{y_i!}

If you use a negative binomial model, you replace the Poisson distribution in these equations with the negative binomial distribution.

Zero-Inflated Model
The expected counts are computed by

E(y_i \mid x_i, z_i) = [0 \times \psi_i] + [\mu_i \times (1 - \psi_i)] = \mu_i - \mu_i \psi_i

The variance for the Poisson model is

Var(y_i \mid x_i, z_i) = \mu_i (1 - \psi_i)(1 + \mu_i \psi_i)

The variance for the negative binomial model is

Var(y_i \mid x_i, z_i) = \mu_i (1 - \psi_i)(1 + \mu_i (\psi_i + \alpha))

Estimation and Interpretation
- Use zeroinfl in the pscl library in R, or zip and zinb in Stata (a sketch follows below).
- Coefficients associated with x_i can be interpreted as in the other count models, while coefficients associated with z_i can be interpreted as in a logit or probit model: the binary process predicts membership in the group that must have a zero count.
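A minimal sketch with pscl::zeroinfl, as the slides suggest (the Stata equivalents are zip and zinb). The data and variable names are illustrative: x1 enters the count equation and z1 the zero-inflation (logit) equation.

```r
## Zero-inflated Poisson and negative binomial models
library(pscl)

set.seed(789)
n   <- 500
x1  <- rnorm(n)
z1  <- rnorm(n)
psi <- plogis(-0.5 + 1.0 * z1)            # P(always-zero group) = F(z * gamma)
mu  <- exp(0.7 + 0.6 * x1)                # count rate for group B
y   <- ifelse(rbinom(n, 1, psi) == 1, 0, rpois(n, mu))

m_zip <- zeroinfl(y ~ x1 | z1, dist = "poisson")   # count model | zero model
summary(m_zip)                            # count-part coefficients read as in Poisson,
                                          # zero-part coefficients as in a logit

## Negative binomial counts instead of Poisson:
m_zinb <- zeroinfl(y ~ x1 | z1, dist = "negbin")

## Expected counts E(y | x, z) = mu_i * (1 - psi_i):
head(predict(m_zip, type = "response"))
```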