
Bayesian Inference for Heterogeneous Event Counts

Andrew D. Martin∗
Department of Political Science
Washington University
Campus Box 1063
One Brookings Drive
St. Louis, MO 63130

April 19, 2000

Abstract

This paper presents a handful of Bayesian tools one can use to model heterogeneous event counts. In many political science applications we are interested in modeling the number of times a particular event takes place. While models for event count cross-sections are now widely used in political science (King, 1988, 1989b), little has been written about how to model counts when contextual factors introduce heterogeneity. I begin with a discussion of Bayesian cross-sectional count models and introduce an alternative model for counts with overdispersion. To illustrate the Bayesian framework, I model event counts of the number of discharge petitions from the 61st to the 105th House, and the number of women’s rights bills cosponsored by each member in the 92nd House. I then generalize the model to allow for contextual heterogeneity and posit a hierarchical Poisson regression model, fitting this model to the number of women’s rights cosponsorships for each member of the 83rd to 102nd House. I demonstrate the advantages of this approach over pooled and independent Poisson regressions. The hierarchical model allows one to explicitly model contextual factors and test alternative contextual explanations. Additionally, I discuss software one can use to easily implement these models with little start-up cost.

∗ Paper prepared for presentation at the Midwest Political Science Association Annual Meeting, Chicago, Illinois, April 27-30, 2000. The author is an Assistant Professor of Political Science at Washington University. I would like to thank Kyle Saunders for his comments, and Kevin Quinn for many helpful conversations.
I am particularly indebted to Christina Wolbrecht for sharing her valuable dataset with me, for directing me substantively to the study of pre-floor behavior in the House, and for introducing me to my wife. All models presented here were estimated using Classic BUGS 0.6 on a Linux workstation (Spiegelhalter et al., 1997). Soon I will make BUGS code (and sample datasets) to estimate these models available at my homepage: http://artsci.wustl.edu/~admartin/. This is an extremely preliminary draft and I would appreciate any comments.

1 Introduction

Heterogeneity is what makes politics, and for that matter statistics, interesting. However, until recently, little attention has been paid to modeling political science data observed in heterogeneous clusters (some notable exceptions include Beck and Katz, 1995; Western, 1998). Social explanations throughout the discipline predict that identical individuals will behave quite differently in different contexts. For example, rational choice accounts of political behavior posit that individuals will act in accordance with an equilibrium which is in part a function of the other actors in the game. In short, individuals with the same preferences and the same characteristics are predicted to behave differently in different contexts. In this paper, I review and introduce a type of statistical model for counts of observed political behavior that allows the researcher to model and test different explanations of heterogeneity.

For well over a decade, political scientists have been interested in modeling the number of times a particular event occurs. King (1988) introduced the discipline to the Poisson regression model, and later developed the generalized event count (GEC) model (King, 1989b). These models have been widely used throughout the discipline and applied to problems in international relations, American politics, comparative politics, and political economy.
Yet, these applications have almost universally been used for homogeneous event counts; that is, individual cross-sections in which all observations are assumed to be exchangeable. In many applications, this assumption is not tenable. For example, suppose I am interested in modeling the number of times local school boards issued bonds for infrastructure projects in the 1990s, and have collected data from every school board in every state. If I were to pool every observation into a single sample, I would be implicitly assuming that upon knowing the relevant characteristics of each school district, contextual state-by-state factors are unimportant. This assumption would be inappropriate in this example because state laws and funding mechanisms vary widely. The hierarchical model I introduce here relaxes this assumption; not only would it allow me to control for important contextual factors that might impact bond issues, it also allows me to test competing contextual explanations.

The purpose of this paper is to develop a general modeling strategy one can use to draw inferences from heterogeneous event counts. For practical reasons detailed below, I adopt a Bayesian approach and use Markov chain Monte Carlo (MCMC) techniques for estimation. I begin by discussing Bayesian inference for cross-sectional counts in Section 2. This section includes the standard Poisson regression model, as well as a model that allows for overdispersion in observed event counts. The section also includes an introduction to the software used throughout this paper. In Section 3, I present results from two case studies: the number of discharge petitions filed from the 61st to the 105th Congress and the number of women’s rights bills cosponsored by every member of the 92nd House. I then turn my attention to heterogeneous counts and offer a hierarchical Poisson regression model suitable for this type of data.
In Section 5, I turn my attention to my final case study: the number of women’s rights cosponsorships for every member of the 83rd to the 102nd House. In this section, I compare alternative modeling strategies, and demonstrate the usefulness of hierarchical models. The final section concludes with a discussion of other types of models one can use for heterogeneous event counts.

2 Bayesian Inference for Homogeneous Event Counts

Before I turn my attention to heterogeneous event counts, it is important to cover Bayesian inference for homogeneous counts. These models are most commonly applied to single cross-sections of data, and require the statistical assumption that each observation is conditionally independent, and is thus exchangeable. These models have been used extensively in political science for a variety of applications. For a good introduction to event count models, I refer the reader to Cameron and Trivedi (1998). King (1988, 1989b) also provides introductions to cross-sectional models geared toward political scientists. In the remainder of this section, I discuss cross-sectional event count models that allow for equidispersion and overdispersion in observed counts.

Poisson Regression and Bayesian Inference

Event count models were first introduced in political science by King (1988), who demonstrates many advantages of what he terms the exponential Poisson regression model over multiple regression. The Poisson distribution is a discrete distribution defined on the non-negative integers and can be derived from distributions of waiting times (Cameron and Trivedi, 1998, pp. 6-8). The Poisson regression model is used to model a set of counts $y_i \in \{0, 1, 2, \ldots\} = \mathbb{Z}^*$ on the non-negative integers for a set of observations $i = 1, \ldots, n$. We further observe a row vector of explanatory variables $x_i$ of dimensionality $(1 \times j)$.
The Poisson regression model is thus:

$$y_i \sim \text{Poisson}(\lambda_i), \qquad \lambda_i = \exp(x_i \beta) \tag{1}$$

We are interested in estimating the parameter vector $\beta$, which has dimensionality $(j \times 1)$. King (1988) estimates this model using maximum likelihood. Indeed, Equation 1 defines the likelihood, which is simply the product of Poisson densities across all observations $i$. The approach I take here is to estimate this model from a Bayesian perspective. Fundamentally, Bayesians assume that the data is fixed and that parameters vary. Inference, then, is conducted by computing the probability density of the parameters conditioned on the data observed. The alternative frequentist (or classical) approach, on the other hand, assumes that parameters are fixed and unknown. One thus finds the parameters most likely to have generated the data by maximizing the likelihood. These approaches are inextricably related through Bayes’ theorem. For the Poisson regression model, I am interested in:

$$f(\beta \mid y) \propto f(y \mid \beta)\, f(\beta)$$

The distribution $f(\beta \mid y)$ is called the posterior density, and is the probability density of $\beta$ conditioned on observing the data. The posterior summarizes our a posteriori beliefs about the parameter vector $\beta$ after having observed the data. From it, one can derive many probability statements, including those about the probability a particular $\beta_j$ differs from zero. The posterior is proportional to the likelihood $f(y \mid \beta)$ times the prior $f(\beta)$.

In most applications, the posterior density $f(\beta \mid y)$ does not take a closed form. However, there is a set of algorithms one can use to simulate from the posterior density. In so doing, one can make posterior inferences to any level of precision, depending only on the computation time (which, thankfully, has gotten very inexpensive). The algorithms employed here are Markov chain Monte Carlo (MCMC) techniques, introduced by Geman and Geman (1984), Tanner and Wong (1987), and Gelfand and Smith (1990).
For general introductions to MCMC methods I recommend Carlin and Louis (1996) and Gelman et al. (1995). Jackman (2000) has also published an excellent introduction to MCMC geared specifically to political scientists.

To date, MCMC estimation has been sporadically employed in political science. Gelman and King (1990) are credited with the first application of MCMC in political science, and use a Gibbs sampling algorithm to assess the impact of legislative redistricting. Other examples include: Western (1998), who uses MCMC to estimate hierarchical models of political determinants of economic growth; Schofield et al. (1998) and Quinn et al. (1999), who estimate multinomial probit models of voter choice using MCMC; Smith (1999), who estimates an ordered probit model with strategic censoring to model crisis escalation; and King et al. (1999), who use MCMC to estimate a hierarchical ecological inference model.

As with any Bayesian model, it is necessary for the practitioner to formally posit her prior beliefs. To complete the Poisson regression model, I assume normal independent priors for each element $\beta_j$ of the $\beta$ vector: $\beta_j \sim \text{Normal}(\mu_0, \sigma_0^2)$.¹ In most applications, one uses noninformative priors; that is, priors that contribute little information about the parameters of interest [although there are cases when using informative priors is quite valuable (Western and Jackman, 1994)]. In the models presented here, I assume noninformative priors. Thus, for each $\beta_j$ in Poisson regression I assume $\beta_j \sim \text{Normal}(0, 10^4)$, which is an extremely flat distribution that contributes little information.²

Estimating statistical models using MCMC algorithms requires two steps that are oftentimes quite difficult. Most common MCMC techniques, including the Gibbs sampler and the Metropolis-Hastings algorithm, require the practitioner to derive the full conditional distributions. In most cases, this algebra is far from trivial, and beyond the interest of most political scientists.
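To make the idea of simulating from the posterior concrete, here is a minimal sketch of a random-walk Metropolis sampler for the Poisson regression of Equation 1 with independent Normal(0, 10^4) priors. It is an illustration only and is not the algorithm BUGS selects or the code used for the results in this paper; the simulated data, the coefficient values in `true_beta`, the step size, and the function names are all assumptions made for the example.

```python
import numpy as np

def log_posterior(beta, X, y, prior_var=1e4):
    """Unnormalized log posterior for Poisson regression.

    Poisson log-likelihood (dropping log(y_i!), which is constant in beta)
    plus independent Normal(0, prior_var) log-priors on each coefficient.
    """
    eta = X @ beta                                  # linear predictor x_i * beta
    log_lik = np.sum(y * eta - np.exp(eta))
    log_prior = -0.5 * np.sum(beta ** 2) / prior_var
    return log_lik + log_prior

def metropolis(X, y, n_iter=6000, burn_in=1000, step=0.05, seed=0):
    """Random-walk Metropolis; returns the draws kept after burn-in."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    current = log_posterior(beta, X, y)
    draws = np.empty((n_iter, X.shape[1]))
    for t in range(n_iter):
        proposal = beta + step * rng.standard_normal(beta.size)
        candidate = log_posterior(proposal, X, y)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < candidate - current:
            beta, current = proposal, candidate
        draws[t] = beta
    return draws[burn_in:]

# Simulated data (hypothetical values, not from the paper).
data_rng = np.random.default_rng(42)
n = 500
X = np.column_stack([np.ones(n), data_rng.standard_normal(n)])
true_beta = np.array([1.0, 0.5])
y = data_rng.poisson(np.exp(X @ true_beta))

draws = metropolis(X, y)
posterior_mean = draws.mean(axis=0)
print("posterior mean:", posterior_mean)
```

A random-walk proposal is used here because it needs no full conditional distributions; the price is that the step size must be tuned by hand, which is exactly the sort of start-up cost that dedicated software is meant to remove.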
The other difficulty is programming a computer to sample from the posterior density. To date, most commonly used statistical packages lack facility for MCMC estimation. Thus, programming in a high level language like Gauss, Ox, or S-Plus is oftentimes required to estimate these models. There is one freely available software package whose sole purpose is the estimation of Bayesian models. This software is called BUGS, which stands for Bayesian inference Using Gibbs Sampling (Spiegelhalter et al., 1997).³ BUGS eliminates many of the start-up costs of performing Bayesian inference, and requires the practitioner to only write down the probability model and specify the priors. BUGS then chooses an appropriate updating algorithm to simulate from the posterior distribution. I refer the reader to Appendix A for a brief description of BUGS, and code for the models discussed in this paper.

¹ Other priors could just as easily be applied. One could assume a multivariate prior $\beta \sim N_j(\beta_0, B_0^{-1})$, or an improper uniform prior on $\mathbb{R}$. Unless using informative priors, this choice usually makes very little difference in practice. If it does make a difference in a particular application, the prior, not the likelihood, is driving the inference.

² Additionally, for all of the models presented below, I tested for robustness of the results to various prior specifications. For all models, the substantive conclusions are robust to other prior specifications, including informative priors. I have also tested for convergence using various techniques implemented in CODA, and am confident that each chain has reached a steady state.

³ This software is available on the web for Windows, Unix, and Linux from the MRC Biostatistics Unit and the Imperial College School of Medicine at St Mary’s, London at http://www.mrc-bsu.cam.ac.uk/bugs/. There are many helpful examples at this website. Additionally, Jackman (2000) has posted a handful of useful BUGS applications geared toward a political science audience at http://tamarama.stanford.edu/mcmc/. Gilks et al. (1996) also includes some examples of models estimated with BUGS. Beginning in Summer 2000, I will begin posting useful BUGS and MCMC code, including all code used in this paper, at http://artsci.wustl.edu/~admartin/.

Poisson Regression with Overdispersion

The Poisson regression model has been used extensively in political science, but there is a well-known deficiency of the model: the expected value (mean) of the distribution equals its variance (Cameron and Trivedi, 1998, pp. 9-10). There are many observed counts, however, where positive or negative contagion may occur, and cause observed counts to be overdispersed (having variance greater than the mean), or underdispersed (having variance less than the mean). It is important to note that heterogeneity at the individual level will cause counts to be under- or overdispersed (see King, 1989b). Thus, models that allow for under- or overdispersion allow the researcher to model unobserved heterogeneity within a single cross-section. If one neglects overdispersion and fits a model that requires equidispersion, standard errors will be inconsistent and inefficient, thus undermining inference.

When faced with overdispersion, the traditional approach is to mix the Poisson model with a gamma distribution with mean one, thus yielding a model with overdispersion. By the Greenwood and Yule (1920) compounding rule, this results in the negative binomial regression model, which can be estimated using maximum likelihood. For underdispersion, one must truncate the counts, which yields what King terms the continuous parameter binomial model (1989b).
In the case when the existence of overdispersion or underdispersion is not known with certainty, King (1989b) offers a generalized event count (GEC) model that allows the researcher to estimate the amount of dispersion without assuming a priori what type exists. In practice, most political science event counts tend to exhibit equidispersion or overdispersion (but see King, 1989b, for some examples of underdispersion).

In this paper, I will focus only on models that allow for overdispersion. However, instead of mixing the Poisson with the gamma distribution, I offer an alternative model where I mix the Poisson with the lognormal distribution. This model was offered by Breslow (1984). Here we have the same dependent variable $y_i$, the same number of observations $n$, and the same row vector of covariates $x_i$. The only difference is that now I include a random effect, which has zero mean (thus not contributing to the count), but adds to the variance of $y_i$. I will call these effects $\eta_i$. The Poisson regression model with overdispersion can now be written:

$$y_i \sim \text{Poisson}(\lambda_i), \qquad \lambda_i = \exp(x_i \beta + \eta_i), \qquad \eta_i \sim \text{Normal}(0, \tau^{-1}) \tag{2}$$

Here $\tau$ is a scalar that captures the precision (inverse variance) of the random effect, and thus indicates how much (or little) overdispersion there is. Again, to estimate this model in the Bayesian framework I assume independent priors for each $\beta_j$: $\beta_j \sim \text{Normal}(\mu_0, \sigma_0^2)$. I also assume a gamma prior for the precision parameter $\tau$: $\tau \sim \text{Gamma}(\nu_0, \delta_0)$. For the models presented here, I assume a noninformative prior for $\tau$: $\tau \sim \text{Gamma}(0.001, 0.001)$, which has its mass immediately to the right of zero and is decreasing on $\mathbb{R}^+$. This model is easily estimable in BUGS; see the appendix for the code employed. Note that this is one of many models one could use when facing overdispersion or extra-Poisson variation (see Cameron and Trivedi, 1998, pp. 103-104 or Albert and Pepple, 1989).
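A small simulation may help show how the random effect in Equation 2 produces overdispersion. The sketch below draws counts for a single fixed covariate profile, once from the equidispersion model and once with a Normal(0, τ⁻¹) effect added to the linear predictor, and compares the variance-to-mean ratios. The values of `linear_predictor` and `tau` are hypothetical choices for illustration, not estimates from this paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

linear_predictor = 1.0   # hypothetical value of x_i * beta for one fixed profile
tau = 2.0                # hypothetical precision, so the variance of eta is 0.5

# Equidispersion model (Equation 1): no random effect.
y_equi = rng.poisson(np.exp(linear_predictor), size=n)

# Overdispersion model (Equation 2): eta_i ~ Normal(0, 1 / tau).
eta = rng.normal(0.0, np.sqrt(1.0 / tau), size=n)
y_over = rng.poisson(np.exp(linear_predictor + eta))

ratio_equi = y_equi.var() / y_equi.mean()   # close to 1 under equidispersion
ratio_over = y_over.var() / y_over.mean()   # above 1 when tau is finite

print("variance/mean, equidispersion:", ratio_equi)
print("variance/mean, overdispersion:", ratio_over)
```

In this simulation the effect is centered at zero on the log scale, but it also raises the average count by a factor of exp(1/(2τ)); in a fitted regression that shift would be absorbed by the constant.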
3 Case Studies: Pre-Floor Behavior in the House

To illustrate Bayesian inference for cross-sectional event count models, I use two datasets relating to pre-floor behavior in the House of Representatives. In the first, I model the number of discharge petitions filed in every Congress from the 61st to the 105th. The second case study is cosponsorship on women’s rights legislation in the 92nd House, the Congress that passed the Equal Rights Amendment.

The Use of the Discharge Procedure, 1910-1998

After the revolt against Speaker Joseph Cannon in 1910, the 61st House of Representatives instituted a discharge procedure. This provision, which has remained a part of House rules to date, is the only mechanism by which any majority can force the floor to consider legislation without approval from the committee of jurisdiction, the party leadership, or the Rules Committee. It thus plays a very important institutional role, as it allows groups of members to audit the committee system as well as their party leadership. Yet, as noted by Beth (1990), the discharge petition is used infrequently. Not only is there the collective action problem of obtaining the requisite number of signatures on the petition, but there are costs borne by the member in auditing committees, and perhaps in breaking log-rolling arrangements.

While theoretically the discharge provision plays a very important role, little has been written explicitly about the procedure. Beth (1990) provides a history of the procedure, and tabulates important data about the frequency of its use. Krehbiel (1995) models bargaining over a discharge petition in the 104th House and finds that policy preferences alone explain who signed the “A-to-Z Spending Bill” petition, although Binder et al. (1999) find a party effect when employing a different measurement strategy. Martin and Wolbrecht (2000) find party and preference impacts on discharge petition bargaining on two petitions in the pre-Reform House.
Yet, to date, little has been done to explain the frequency of discharge petition use. To gauge the importance of the discharge procedure as an auditing mechanism, it is important to understand the frequency of its use.⁴ To wit, I have collected data from Beth (1990) containing the number of discharge petitions filed in each Congress from the 61st to the 105th (n = 45). This is event count data, and thus the two models presented above are appropriate. This dataset has an extremely small sample size. One advantage of Bayesian inference over classical inference is that one does not need to rely on asymptotic theory to make claims about statistical significance. Indeed, if one were to rely on maximum likelihood estimation for this dataset, the standard errors would be of dubious value. By adopting a Bayesian approach, however, I am able to simulate directly from the posterior density without relying on an asymptotic justification.

⁴ Although, the procedure might play an equally important institutional role when not used by keeping outlier committees from “misbehaving.” Assessing the importance of the procedure thus requires a rich theoretical understanding, supplemented with data analysis.

The dependent variable $y_i$ is simply the number of discharge petitions filed with the Clerk of the House in each Congress since the inception of the rule. I have chosen a set of theoretically important explanatory variables. First is chamber preference heterogeneity, which I operationalize using the standard deviation of the D-Nominate scores for each Congress (Poole and Rosenthal, 1997). The more heterogeneous the chamber, the more likely it is for members to be dissatisfied with committee and/or party leadership consideration of legislation, and thus to file more discharge petitions, holding all else constant.

Divided government is also likely to impact the number of discharge petitions filed. Since the purpose of the discharge procedure is to ultimately effect policy change, petitions are less likely to be filed when the president and the majority party in the House are of different parties. I thus include a dummy variable that designates divided government.

Parameter                       Equidispersion Model                 Overdispersion Model
                                Post.    Post.    90% BCI            Post.    Post.    90% BCI
                                Mean     SD       Lower    Upper     Mean     SD       Lower    Upper
β1 - Constant                    2.822   0.038     2.745    2.896     2.639   0.114     2.446    2.825
β2 - Preference Heterogeneity    0.689   0.033     0.624    0.754     0.646   0.205     0.322    1.002
β3 - Divided Government         -0.125   0.041    -0.205   -0.044    -0.082   0.138    -0.310    0.141
β4 - Number of Committees        0.303   0.063     0.185    0.427     0.458   0.213     0.105    0.810
β5 - Majority Party Size         0.165   0.035     0.095    0.232     0.150   0.126    -0.059    0.356
β6 - Rule Regime                -0.454   0.054    -0.559   -0.346    -0.731   0.210    -1.076   -0.384
τ (Precision)                    n.a.                                 2.053   0.556     1.244    3.059
τ⁻¹ (Variance)                   n.a.                                 0.525   0.149     0.327    0.803
Burn-In Iterations               10000                                10000
MCMC Iterations                  25000                                25000
n                                45                                   45

Table 1: Summary of the posterior densities from Poisson regressions of discharge petition filings, 1910-1998. The equidispersion model is written in Equation 1, and the overdispersion model is written in Equation 2. The abbreviation Post. Mean denotes the posterior mean, Post. SD the posterior standard deviation, and 90% BCI the 90% Bayesian Credible Interval. This interval summarizes the central 90% of the posterior density. Noninformative priors for the $\beta_j$ and precision $\tau$ are assumed: $\beta_j \sim \text{Normal}(0, 10^4)$ and $\tau \sim \text{Gamma}(0.001, 0.001)$.

The number of standing committees is also likely to impact the number of discharge petitions filed; the more committees, the more opportunity for members to be dissatisfied with committee behavior. I include the number of standing committees as an explanatory variable. Note that this variable captures the Legislative Reorganization Act of 1946, where the number of committees drops substantially from the forties to the twenties.
The number of committees should positively impact the number of discharge petitions filed.

Partisanship is also likely to impact the frequency of discharge petition filing. The larger the majority party, the more likely majority members will be dissatisfied with their leadership, and thus will be more likely to file petitions to audit their leadership. I include the majority party size, which equals the seat differential between the majority and minority party, as an explanatory variable. The final variable is a dummy variable which takes a value of one between 1910 and 1931, and zero thereafter. This takes into account the major rule change that occurred in 1931 which made the procedure much easier to use (Beth, 1990). The expectation is that in this rule regime, the number of discharge petitions filed will be lower, all things being equal.

I have fit a Poisson regression and a Poisson regression with overdispersion to this data.⁵ The results are presented in Table 1. In this table, I report the posterior mean and standard deviation for each coefficient. Each of these can be interpreted like the familiar parameter estimates and standard errors in likelihood inference. I also report the 90% Bayesian Credible Intervals (BCI), which summarize the central 90% of the posterior densities. I have chosen 90% as my threshold for significance because of the small sample size. For the Poisson regression model, I note that all of the coefficients are in the hypothesized direction, and all are significantly different from zero. For the overdispersion model, however, this is no longer the case. First, it is important to note that overdispersion is likely a problem in this data. Indeed, the variance of the random effect ($\tau^{-1}$) has a 90% BCI of [0.327, 0.803]. For the overdispersion model, preference heterogeneity, the number of committees, and the rule regime variable remain significant. 72.5% of the posterior sample for the divided government coefficient is negative, and 88.1% of the posterior sample for the majority party size coefficient is positive. One nice aspect of interpreting results from Bayesian models is that one can state the precise probability that a parameter is greater than or less than zero.⁶ This is weak support for these hypotheses, but not conclusive. The purpose of this analysis is to demonstrate how to fit simple cross-sectional models in a Bayesian framework. The appendix contains sample BUGS code used for these models.

⁵ Because these are time series data, there may be an autoregressive error structure. However, upon examining the data and results from linear regression models, this does not seem to be a problem. Even if it were, the small sample size makes alternative models computationally infeasible. Brandt et al. (2000) provide a survey of time series models that can be used for event count data.

⁶ In the classical paradigm, all parameters are fixed but unobservable. Thus, the probability that a parameter is positive equals zero or one; it just cannot be observed. This requires mental gymnastics to interpret confidence intervals. 90% Bayesian Credible Intervals, on the other hand, can simply be interpreted as intervals within which the parameter falls with 90% posterior probability.

Women’s Rights Cosponsorships in the 92nd House

In the second case study I turn my attention to cosponsorship on women’s rights legislation in the 92nd House of Representatives. Because it was the Congress that passed the Equal Rights Amendment, the 92nd is particularly important. In this example I employ data collected and previously analyzed by Wolbrecht (2000a,b). See Wolbrecht (2000b) for a detailed description of the data collection and coding scheme. The dependent variable $y_i$ is the number of pro-women’s rights bills cosponsored by each member of the 92nd House. Thus, the member of Congress is the unit of analysis. I am interested in seeing what explains these counts of congressional behavior.

The legislative studies literature suggests that cosponsorship is primarily a preference based activity geared toward position taking (Campbell, 1982; Regens, 1989) and intra-legislative politics (Schiller, 1995; Kessler and Krehbiel, 1996). While the implications of these explanations for the timing of cosponsorships differ, the implication after the intra-legislative game has played out is clear: members tend to cosponsor legislation close to their preferred policy position. Thus, I include a measure of general left-right ideology as an explanatory variable: liberal members should cosponsor more women’s rights legislation than conservative members. I include each member’s D-Nominate first dimension score as a measure of left-right tendency (Poole and Rosenthal, 1997).⁷

⁷ Clearly there are measures that better capture women’s rights preferences (see Martin and Wolbrecht, 2000). However, I employ these scores here because they are available for a long period of time. This becomes particularly important for the hierarchical model, discussed below.

It is also likely that partisanship may impact women’s rights cosponsorships. Indeed, majority party leaders may compel their members to cosponsor legislation (Cox and McCubbins, 1993) as a tool for highlighting bill importance to constituents. I thus include a party dummy variable as an explanatory variable (that takes a value of one for Democrats, and zero for Republicans). Wolbrecht (2000b) argues that prior to the passage of the ERA, the Republican party was more supportive of women’s rights than the Democratic party. Thus, I expect Republicans to be more supportive of the ERA after controlling for policy preferences. Finally, I hypothesize that gender may impact the number of women’s rights cosponsorships. Simply put, because of their greater awareness of and interest in women’s rights issues, I expect women to be more likely to cosponsor women’s rights legislation than men. I code the gender dummy variable one for women and zero for men.

Parameter             Equidispersion Model                 Overdispersion Model
                      Post.    Post.    95% BCI            Post.    Post.    95% BCI
                      Mean     SD       Lower    Upper     Mean     SD       Lower    Upper
β1 - Constant         -0.672   0.130    -0.937   -0.426    -0.847   0.151    -1.150   -0.560
β2 - Preferences      -2.712   0.256    -3.217   -2.201    -2.581   0.308    -3.177   -1.979
β3 - Party            -0.113   0.180    -0.458    0.245    -0.083   0.203    -0.476    0.315
β4 - Gender            1.194   0.165     0.862    1.513     1.194   0.262     0.676    1.711
τ (Precision)          n.a.                                 3.104   1.140     1.618    6.058
τ⁻¹ (Variance)         n.a.                                 0.360   0.115     0.165    0.617
Burn-In Iterations     10000                                10000
MCMC Iterations        25000                                25000
n                      440                                  440

Table 2: Summary of the posterior densities from Poisson regressions of cosponsorships on women’s rights legislation in the 92nd House. The equidispersion model is written in Equation 1, and the overdispersion model is written in Equation 2. The abbreviation Post. Mean denotes the posterior mean, Post. SD the posterior standard deviation, and 95% BCI the 95% Bayesian Credible Interval. This interval summarizes the central 95% of the posterior density. Noninformative priors for the $\beta_j$ and precision $\tau$ are assumed: $\beta_j \sim \text{Normal}(0, 10^4)$ and $\tau \sim \text{Gamma}(0.001, 0.001)$.

In Table 2, I present results from the Poisson regressions of women’s rights cosponsorships on the explanatory variables noted above. Here I report 95% Bayesian Credible Intervals because of the larger sample size. For both the equidispersion and overdispersion models preferences are statistically significant; the more liberal the member of Congress, the more women’s rights bills they are likely to cosponsor. Gender is also statistically significant in both models: women tend to cosponsor at a higher rate than men. In both models, party is insignificant after controlling for preferences. This suggests that this position-taking behavior is primarily preference driven.
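The quantities reported in Tables 1 and 2 (posterior mean, posterior standard deviation, central credible interval, and the share of the posterior on either side of zero) are all simple functions of the MCMC draws. The sketch below shows one way to compute them. The draws used here are stand-ins generated from a normal distribution whose mean and standard deviation match the divided government coefficient in the overdispersion column of Table 1; they are an assumption for illustration and are not the paper's MCMC output. The function name `summarize` is likewise hypothetical.

```python
import numpy as np

def summarize(draws, level=0.90):
    """Posterior mean, SD, central credible interval, and sign probabilities."""
    draws = np.asarray(draws, dtype=float)
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail])
    return {
        "mean": draws.mean(),
        "sd": draws.std(ddof=1),
        "lower": lower,                        # lower bound of the central interval
        "upper": upper,                        # upper bound of the central interval
        "p_negative": np.mean(draws < 0.0),    # share of draws below zero
        "p_positive": np.mean(draws > 0.0),    # share of draws above zero
    }

# Stand-in draws (a normal approximation, NOT the actual posterior sample).
rng = np.random.default_rng(0)
stand_in_draws = rng.normal(loc=-0.082, scale=0.138, size=25_000)

summary = summarize(stand_in_draws, level=0.90)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Passing `level=0.95` gives the interval convention used for the larger cosponsorship sample.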
There seems to be slight overdispersion in this model, although its magnitude is not as large as that in the discharge petition model.

Posterior Predictive Densities

When performing any statistical analysis it is always useful to take coefficient estimates and transform them in a manner that is substantively meaningful. Perhaps the most effective way to do this is to use the parameter estimates to predict out-of-sample behavior. In the Bayesian framework, one is interested in the posterior predictive density. For some out-of-sample observations $\tilde{y}$, the posterior predictive distribution for a Poisson regression model is simply:

$$f(\tilde{y} \mid y) = \int f(\tilde{y}, \beta \mid y)\, d\beta = \int f(\tilde{y} \mid \beta)\, f(\beta \mid y)\, d\beta$$

This average of the conditional predictions $f(\tilde{y} \mid \beta)$ over the posterior distribution $f(\beta \mid y)$ can be evaluated using MCMC techniques. In fact, the distribution of any function of $\beta$ can be estimated using MCMC methods. Interestingly, this is the precise approach used by King et al. (2000) in the context of frequentist inference.

[Figure 1 appears here. Vertical axis: Cosponsorships (0 to 20). Horizontal axis: Hypothetical Member (ConDM, ConDW, ConRM, ConRW, LibDM, LibDW, LibRM, LibRW, ModDM, ModDW, ModRM, ModRW).]

Figure 1: Boxplots of posterior predictive densities for hypothetical members of the 92nd House. These densities are for the equidispersion model. Con denotes conservative, Mod denotes moderate, and Lib denotes liberal. Party membership is represented D for Democrats and R for Republicans. Gender is denoted M for men and W for women.

To wit, I have computed the posterior predictive densities for hypothetical members of Congress. I am specifically interested in liberal, moderate, and conservative members of both parties and both genders. I define a typical liberal as someone two standard deviations below the sample mean on the D-Nominate scale (-0.627), a moderate as someone at the sample mean (-0.027), and a conservative as someone two standard deviations above the sample mean (0.573). I plot these densities in Figure 1.
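The averaging in the posterior predictive integral can be approximated by simulation: for each posterior draw of β, draw one count from the Poisson distribution implied by that draw and the hypothetical member's covariates. The sketch below illustrates this. The posterior draws are stand-ins generated as independent normals with the means and standard deviations of the equidispersion column of Table 2, which ignores any posterior correlation among coefficients; they are an assumption for illustration, not the actual MCMC output, and the function and variable names are hypothetical.

```python
import numpy as np

def posterior_predictive(beta_draws, x_new, rng):
    """Draw one predictive count per posterior draw of beta.

    For each row of beta_draws, lambda = exp(x_new . beta) and the
    predictive count is Poisson(lambda).
    """
    lam = np.exp(beta_draws @ x_new)
    return rng.poisson(lam)

rng = np.random.default_rng(3)

# Stand-in posterior draws (independent normals; NOT the paper's sample).
# Column order: constant, preferences (D-Nominate), party, gender.
post_means = np.array([-0.672, -2.712, -0.113, 1.194])
post_sds = np.array([0.130, 0.256, 0.180, 0.165])
beta_draws = rng.normal(post_means, post_sds, size=(25_000, 4))

# Hypothetical members: party = 1 for Democrats, gender = 1 for women.
liberal_dem_woman = np.array([1.0, -0.627, 1.0, 1.0])
conservative_rep_man = np.array([1.0, 0.573, 0.0, 0.0])

pred_liberal = posterior_predictive(beta_draws, liberal_dem_woman, rng)
pred_conservative = posterior_predictive(beta_draws, conservative_rep_man, rng)

print("predictive mean, liberal Democratic woman:", pred_liberal.mean())
print("predictive mean, conservative Republican man:", pred_conservative.mean())
```

Boxplots of arrays such as `pred_liberal` for each of the twelve hypothetical members would give a display of the same kind as Figure 1.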
Many important patterns can be discerned from this figure. First, liberal women, whether Democrats or Republicans, tend to cosponsor women’s rights legislation most. The posterior predictive mean for both is approximately nine cosponsorships. After liberal women, liberal men cosponsor women’s rights legislation next most often, although there is a definite drop-off to a little over two cosponsorships. Moderate women of both parties are next most likely to cosponsor, followed by moderate men. As the statistical results demonstrate, there is little difference between members of each party. Preferences and gender seem to be driving cosponsorship behavior in the 92nd House.

Modeling Heterogeneous Event Counts: Hierarchical Poisson Regression

So far I have presented results from commonly used models that can be estimated in standard software packages using maximum likelihood estimation. Besides philosophical claims about the importance of subjective probability, what is the advantage of Bayesian inference? In this section, I argue that the advantage of Bayesian approaches generally, and hierarchical (or multilevel) models specifically, is that they allow extremely flexible modeling of heterogeneous data. Many of these models are computationally impossible to estimate (or very nearly so) in classical settings. Moreover, to estimate these using frequentist methods both the number of observations and the number of clusters must approach infinity to trust the standard errors. In most settings, neither of these conditions is met. Bayesian inference using MCMC, on the other hand, provides a computationally efficient way to estimate hierarchical models for heterogeneous data.

Assume now that we observe data from $k$ different contexts (or clusters). Within each cluster we have observations $i = 1, \ldots, n_k$. Note that there may be a different number of observations $n_k$ in each cluster. The dependent variable $y_{i,k} \in \{0, 1, 2, \ldots, \infty\} = Z^*$ is an event count variable as discussed above. What sort of model should be applied to this data? One approach would be to pool all of the observations into one sample. This would yield

$$
\begin{aligned}
y_{i,k} &\sim \text{Poisson}(\lambda_{i,k}) \\
\lambda_{i,k} &= \exp(x_{i,k}\beta)
\end{aligned}
$$

which is the Poisson regression model discussed above. For this model, however, there is an implicit assumption that an individual $i$ with the same vector of characteristics $x_{i,k}$ would behave the same in each decision context $k$. That is, pooling the data requires one to assume behavioral homogeneity. If individuals are hypothesized to behave differently in different contexts, pooling data in this manner is inappropriate. In some applications this assumption is justifiable, yet what happens if contextual factors impact behavior?

An alternative would be to estimate independent models for each context $k$. So, one would estimate

$$
\begin{aligned}
y_{i,k} &\sim \text{Poisson}(\lambda_{i,k}) \\
\lambda_{i,k} &= \exp(x_{i,k}\beta_k)
\end{aligned}
$$

which is simply $k$ separate Poisson regressions. This is a viable strategy, but it throws the baby out with the bathwater. If contextual factors are important, one would ideally model those contextual effects as opposed to ignoring them by estimating individual models. By dividing the data in this fashion, drawing inferences about contextual factors is impossible.

The common ground between these approaches is the Bayesian hierarchical model. This modeling strategy assumes that there are some commonalities between each context $k$, but that there may exist profound differences. The approach allows the practitioner to explicitly model these differences. In so doing, one is able to draw inferences that dominate fully pooled and completely separated models on a mean square error basis (Efron and Morris, 1973). While common in the statistics literature, hierarchical models have only been sporadically used in the political science literature.
Two notable applications are Western (1998), who uses a hierarchical model to model political determinants of economic growth, and King et al. (1999), who use hierarchical models to perform ecological inference. Hierarchical models are similar to the idea of fractional pooling (Bartels, 1996), where each cluster can “borrow strength” from other clusters to yield highly efficient parameter estimates. Hierarchical models are not unique to the Bayesian approach. Jones and Steenbergen (1997) provide a comprehensive introduction to multilevel modeling from a classical standpoint. The major difficulty with hierarchical models is that frequentist inference is computationally difficult, and is based on assumptions that are untenable in most applied settings (namely, that $n_k \to \infty$ and $k \to \infty$).

The hierarchical model looks strikingly like the previous equation; however, now we explicitly model the distribution of $\beta_k$, the column vector of parameters in each decision context. The first level of the model is within each cluster, and the second level includes contextual variables. The hierarchical Poisson regression model is thus:

$$
\begin{aligned}
y_{i,k} &\sim \text{Poisson}(\lambda_{i,k}) \\
\lambda_{i,k} &= \exp(x_{i,k}\beta_k) \\
\beta_{k,1} &\sim \text{Normal}(z_k\alpha_1, \xi_1^{-1}) \\
\beta_{k,2} &\sim \text{Normal}(z_k\alpha_2, \xi_2^{-1}) \\
&\;\;\vdots \\
\beta_{k,j} &\sim \text{Normal}(z_k\alpha_j, \xi_j^{-1})
\end{aligned}
\tag{3}
$$

In this formulation, I have assumed that each individual element of $\beta_k$ can be modeled by independent regressions with a row vector of cluster-level covariates $z_k$, of dimensionality $(1 \times p_j)$. Note that each of these regressions need not include the same number of explanatory variables.8 Notice that the second-level specification serves as the priors for the first-level parameters. Thus, one only needs to specify the priors on the $\alpha_{j,1}$ to $\alpha_{j,p_j}$ for all $j$ and the $\xi_j$ parameters. In this application, I assume standard priors:

$$
\alpha_{*,*} \sim \text{Normal}(\mu_{0,*,*}, \sigma^2_{0,*,*})
$$

and for the precisions:

$$
\xi_j \sim \text{Gamma}(\nu_{0,j}, \delta_{0,j})
$$

The advantage of this model is clear.
One can look at the posterior density of the parameters $\beta_k$ to see how the characteristics $x_{i,k}$ impact behavior within each cluster $k$. And, one can look at the parameters $\alpha_{*,*}$ to see how contextual factors impact behavior at the first level. In short, this modeling strategy allows one to model and test for various causes of behavioral heterogeneity. The hierarchical Poisson regression model is thus useful when modeling heterogeneous event counts.

Others have applied hierarchical event count models to various sorts of data. Albert (1992) fits a random effects Poisson model to home run data to detect contextual factors that impact various seasons of twelve baseball players. Christiansen and Morris (1997) develop a hierarchical Poisson model with a different parameterization than that above, and provide software one can use in S-Plus to simulate from the posterior density. Chib et al. (1998) posit an MCMC algorithm one can use to estimate a random effects panel event count model using a unique parameterization, and apply the model to epilepsy and patent data. Berry et al. (1999) develop an interesting set of hierarchical models that can be used to compare players in different eras of sports. For their hockey data, they offer a hierarchical Poisson model that includes an elaborate clustering scheme, and a random curve function to allow for aging. Scollnik (N.D.) discusses actuarial models, and provides BUGS code for some hierarchical event count models.

8 A more general model would be to model the distribution of the $\beta_k$ vectors with a multivariate regression model: $\beta_k \sim N_j(A_k\alpha, \Omega)$. However, when there are many first-level parameters and few clusters, it becomes difficult, if not impossible, to estimate $\Omega$ with precision. In the application below, with four first-level covariates and twenty contexts, $\Omega$ has ten free elements, which would be impossible to estimate.
While these models are well traveled in applied statistics, their potential has been untapped in political science applications.

Heterogeneity in Women’s Rights Cosponsorships in the House, 1953-1992

Earlier I fit a cross-sectional model to all women’s rights cosponsorships in the 92nd House. Now I turn my attention to similar data for the 83rd to the 102nd Congress.9 This dataset contains the number of women’s rights bills cosponsored by every member in each of these Congresses. Clearly contextual factors, such as the national political agenda, congressional rules, and chamber characteristics, would induce profoundly different behavior from a member of Congress with the identical policy preference of the same party and gender. In this section, I fit a hierarchical Poisson regression to model behavioral heterogeneity in this women’s rights sponsorship data. Not only am I interested in explaining behavior within a particular Congress, I am also interested in how the macropolitical factors discussed below impact congressional behavior.

Returning to the notation used in Equation 3, here we observe behavior for $k = 20$ Congresses (or clusters). Within each House, there are $i = 1, \ldots, n_k$ members. Without member replacement within each Congress (and without the brief change in the size of the House in the late 1950s), $n_k = 435 \;\forall\; k$. In practice, each cluster contains approximately 440 members, although the number differs across Congresses. We thus observe counts $y_{i,k} \in Z^*$, one for each member $i$ in each Congress $k$. I use the same explanatory variables $x_{i,k}$: policy preferences measured with first dimension D-Nominate scores, a party dummy, and a gender dummy.

The Baselines: Pooled and Independent Poisson Regressions

Perhaps congressional cosponsorship behavior is not influenced by contextual factors. If that is the case we could pool all observations into a single sample and estimate one vector of parameters $\beta$.
I have done just that, and present results in Table 3. Looking at the BCIs, the results are quite straightforward: liberals tend to cosponsor at a higher rate than conservatives; when controlling for preferences Republicans are more likely to cosponsor than Democrats; and women are more likely to cosponsor than men. However, the assumption of homogeneity is theoretically untenable.

9 This data was generously provided by Christina Wolbrecht. She discusses the manner in which it is collected and presents much of the data in her book (Wolbrecht, 2000b).

Parameter            Post. Mean   Post. SD   95% BCI Lower   95% BCI Upper
β1 -Constant              0.367      0.020           0.329           0.406
β2 -Preferences          -3.128      0.041          -3.210          -3.049
β3 -Party                -0.890      0.031          -0.953          -0.831
β4 -Gender                1.193      0.025           1.145           1.243

Burn In Iterations        10000
MCMC Iterations           25000
N                           440

Table 3: Summary of the posterior density of a Poisson regression of cosponsorships on women’s rights legislation for the 83rd to the 102nd House pooled data. The model is written in Equation 1. The abbreviation Post. Mean denotes the posterior mean, Post. SD the posterior standard deviation, and 95% BCI the 95% Bayesian Credible Interval. This interval summarizes the central 95% of the posterior density. Noninformative priors for the $\beta_j$ are assumed: $\beta_j \sim \text{Normal}(0, 10^4)$.

So, to serve as a second baseline, I have estimated a set of independent Poisson regressions, one for each Congress $k$. I present the results in Table 4. Two things are to be gleaned from this table. First, when examining the constants across each decision context, it is clear that they vary: the posterior mean takes its lowest value of −2.971 in the 83rd House, and its maximum value of 1.738 in the 99th House. Substantively, exponentiating these constants gives us the expected number of bills cosponsored for a moderate Republican man. For the 83rd, we would expect this individual to cosponsor about 0.05 bills, but in the 99th to cosponsor 5.7 bills. Clearly, the assumption of homogeneity is not met. Further, when examining the importance of preferences across Congresses, we note that preferences significantly impact behavior in all but the 86th House. Yet, the magnitude of the relationship varies throughout the time period, again implying behavioral heterogeneity. The same story holds for the party and gender dummies: the $\beta_{k,3}$ and $\beta_{k,4}$ coefficients vary across clusters.

The weakness of this model is that while heterogeneity is apparent, we can neither model it nor test for contextual effects. If the goal of social science is to evaluate competing social explanations, it is necessary to employ statistical models that allow one to test contextual explanations. I thus turn my attention to a hierarchical Poisson regression model, which does just this.

Results: Hierarchical Poisson Regression

The hierarchical model presented earlier allows one to model heterogeneity with suitable contextual variables. Clearly members of Congress have behaved quite differently in different Congresses. What explains this heterogeneity? I hypothesize that four contextual factors are likely to impact cosponsorship behavior on women’s rights. First, I hypothesize that the national agenda will impact cosponsorship; once women’s rights generally, and the Equal Rights Amendment specifically, are firmly a part of the national agenda, the number of cosponsorships should increase. This is derived from electoral considerations. Once women’s rights became supported by a majority, more members, regardless of preferences, party, and gender, would tend to cosponsor legislation. As argued by Wolbrecht (2000b), the women’s rights agenda was not firmly entrenched on the national political agenda until the 92nd Congress, the Congress that passed the Equal Rights Amendment.

Second, congressional rules are also likely to impact congressional behavior. Throughout this time period, cosponsorship rules were liberalized.
Until 1967, cosponsorship was not allowed in the House. Thus, members introduced identical legislation with their name as primary sponsor. In 1967 the rule was changed to allow for multiple cosponsorship, allowing up to 25 members to cosponsor each bill. This did not catch on for many years, and most members continued to introduce individual bills. This changed in 1978, when the rules were again changed to allow for unlimited cosponsorship. The second rule change was widely accepted by the membership, and began to be used immediately. Thus, after the rules change in the 95th House, I expect cosponsorship behavior for all members to increase.

These first two contextual factors are hypothesized to explain the baseline level of cosponsorship for all members. In terms of the hierarchical model, they should thus impact the constant. For the 92nd Congress and thereafter, the constant $\beta_{k,1}$ should be bigger than before due to the change in the women’s rights agenda. Similarly, after the rules change in the 95th House, we would expect the baseline level of cosponsorships to be higher, all things being equal. I thus include two dummy variables at the second level of the hierarchy, one that takes the value one for the 92nd to 95th House, and one that takes the value one from the 96th to 102nd House. I use these variables to model the distribution of the constant as:

$$
\beta_{k,1} = \alpha_{1,1} + \alpha_{1,2}\,\text{Post 91st Dummy}_k + \alpha_{1,3}\,\text{Post 95th Dummy}_k + \varepsilon_k, \qquad \varepsilon_k \sim \text{Normal}(0, \xi_1^{-1})
$$

The hypothesis is that $\alpha_{1,2}$ should be positive, indicating an increased level of women’s rights cosponsorships for all members. $\alpha_{1,3}$ should also be positive, and should have a magnitude higher than $\alpha_{1,2}$, since it captures the agenda effect and the effect of the rules change.

I hypothesize that two additional contextual factors impact women’s rights cosponsorships. First, I expect that chamber heterogeneity should impact the importance of preferences.
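As a brief aside on the second-level regression for the constant given above, the coding of the two dummies and the implied baseline can be sketched in Python. The α values used here are hypothetical round numbers chosen only to show the mechanics, and the function names `era_dummies` and `constant_mean` are introduced for this illustration; neither comes from the paper's BUGS code.

```python
import math


def era_dummies(congress):
    """Return (post_91st, post_95th) as coded in the text.

    post_91st is one for the 92nd to 95th House;
    post_95th is one for the 96th to 102nd House.
    """
    post_91st = 1 if 92 <= congress <= 95 else 0
    post_95th = 1 if 96 <= congress <= 102 else 0
    return post_91st, post_95th


def constant_mean(congress, alpha):
    """Second-level mean of the constant: alpha[0] + alpha[1]*D1 + alpha[2]*D2."""
    d1, d2 = era_dummies(congress)
    return alpha[0] + alpha[1] * d1 + alpha[2] * d2


# HYPOTHETICAL hyperparameter values for illustration only.
alpha1 = (-1.0, 0.8, 1.9)

for congress in (83, 92, 96):
    level_two_mean = constant_mean(congress, alpha1)
    # Exponentiating gives the implied expected count when every
    # first-level covariate equals zero.
    print(congress, round(level_two_mean, 3), round(math.exp(level_two_mean), 3))
```

Because the two dummies never equal one at the same time, the third coefficient is compared against the pre-92nd baseline rather than against the agenda era, which is why it is expected to exceed the second in magnitude.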
I measure chamber heterogeneity by computing the standard deviation of the D-Nominate first dimension scores for the chamber. The more heterogeneous the chamber, the smaller the magnitude of the preference coefficient $\beta_{k,2}$ should be, and vice versa. As preference heterogeneity increases in the chamber, the agenda for all members grows larger. Additionally, effecting policy change on any issue is more difficult in a heterogeneous chamber. It is

Parameter            Post. Mean   Post. SD   95% BCI Lower   95% BCI Upper
β1,1 -Constant           -2.971      0.502          -4.026          -2.063
β2,1 -Constant           -1.494      0.307          -2.116          -0.906
β3,1 -Constant           -0.179      0.125          -0.424           0.064
β4,1 -Constant           -1.810      0.251          -2.336          -1.344
β5,1 -Constant           -0.432      0.149          -0.731          -0.150
β6,1 -Constant           -0.724      0.166          -1.055          -0.404
β7,1 -Constant           -0.755      0.166          -1.099          -0.438
β8,1 -Constant           -0.620      0.129          -0.880          -0.375
β9,1 -Constant           -0.266      0.103          -0.468          -0.066
β10,1 -Constant          -0.662      0.125          -0.915          -0.421
β11,1 -Constant           0.061      0.089          -0.121           0.229
β12,1 -Constant          -0.767      0.156          -1.089          -0.477
β13,1 -Constant           0.372      0.091           0.183           0.544
β14,1 -Constant          -0.905      0.180          -1.267          -0.566
β15,1 -Constant           0.391      0.089           0.211           0.562
β16,1 -Constant           1.738      0.060           1.620           1.857
β17,1 -Constant           1.230      0.077           1.074           1.384
β18,1 -Constant           1.104      0.074           0.957           1.247
β19,1 -Constant           0.869      0.059           0.750           0.981
β20,1 -Constant           1.654      0.059           1.538           1.769
β1,2 -DNominate          -2.578      1.083          -4.737          -0.502
β2,2 -DNominate          -3.271      0.766          -4.798          -1.800
β3,2 -DNominate          -0.764      0.328          -1.424          -0.117
β4,2 -DNominate          -0.083      0.565          -1.187           1.028
β5,2 -DNominate          -1.655      0.416          -2.452          -0.839
β6,2 -DNominate          -2.048      0.460          -2.938          -1.119
β7,2 -DNominate          -1.283      0.440          -2.128          -0.416
β8,2 -DNominate          -1.415      0.329          -2.066          -0.780
β9,2 -DNominate          -0.841      0.281          -1.398          -0.300
β10,2 -DNominate         -2.718      0.258          -3.216          -2.215
β11,2 -DNominate         -4.336      0.231          -4.784          -3.884
β12,2 -DNominate         -4.725      0.334          -5.363          -4.075
β13,2 -DNominate         -3.078      0.216          -3.502          -2.653
β14,2 -DNominate         -4.083      0.433          -4.909          -3.250
β15,2 -DNominate         -3.592      0.229          -4.026          -3.136
β16,2 -DNominate         -3.446      0.163          -3.765          -3.126
β17,2 -DNominate         -3.156      0.187          -3.523          -2.781
β18,2 -DNominate         -2.894      0.142          -3.166          -2.612
β19,2 -DNominate         -2.881      0.122          -3.123          -2.639
β20,2 -DNominate         -2.112      0.091          -2.288          -1.934
β1,3 -Party              -0.737      0.841          -2.380           0.896
β2,3 -Party              -1.551      0.573          -2.676          -0.445
β3,3 -Party              -0.776      0.215          -1.196          -0.353
β4,3 -Party              -0.248      0.376          -0.985           0.483
β5,3 -Party              -1.287      0.267          -1.825          -0.775
β6,3 -Party              -1.173      0.302          -1.757          -0.582
β7,3 -Party              -0.975      0.279          -1.520          -0.413
β8,3 -Party              -0.404      0.211          -0.821           0.009
β9,3 -Party              -0.446      0.170          -0.780          -0.120
β10,3 -Party             -0.126      0.177          -0.467           0.221
β11,3 -Party             -1.114      0.142          -1.384          -0.824
β12,3 -Party             -0.797      0.214          -1.207          -0.369
β13,3 -Party             -0.574      0.133          -0.828          -0.310
β14,3 -Party             -0.796      0.269          -1.318          -0.258
β15,3 -Party             -0.679      0.140          -0.948          -0.398
β16,3 -Party             -1.317      0.101          -1.516          -1.115
β17,3 -Party             -1.056      0.126          -1.301          -0.805
β18,3 -Party             -0.597      0.109          -0.808          -0.385
β19,3 -Party             -1.034      0.109          -1.246          -0.822
β20,3 -Party             -0.666      0.095          -0.853          -0.482
β1,4 -Gender              2.188      0.517           1.082           3.121
β2,4 -Gender              0.768      0.494          -0.304           1.657
β3,4 -Gender              0.482      0.271          -0.093           0.984
β4,4 -Gender              1.741      0.306           1.117           2.315
β5,4 -Gender              0.964      0.261           0.436           1.459
β6,4 -Gender              1.155      0.309           0.515           1.732
β7,4 -Gender              0.643      0.400          -0.185           1.367
β8,4 -Gender             -0.049      0.433          -0.973           0.711
β9,4 -Gender              0.321      0.336          -0.393           0.929
β10,4 -Gender             1.198      0.164           0.869           1.514
β11,4 -Gender             1.150      0.117           0.916           1.373
β12,4 -Gender             1.303      0.131           1.049           1.550
β13,4 -Gender             1.157      0.102           0.952           1.352
β14,4 -Gender             1.259      0.196           0.866           1.632
β15,4 -Gender             0.597      0.131           0.337           0.847
β16,4 -Gender             0.850      0.075           0.704           1.000
β17,4 -Gender             0.849      0.092           0.666           1.026
β18,4 -Gender             0.809      0.084           0.646           0.976
β19,4 -Gender             1.265      0.059           1.147           1.380
β20,4 -Gender             0.953      0.050           0.852           1.052

Burn In Iterations        10000
MCMC Iterations           25000
Clusters                     20
N                          8808

Table 4: Summary of the posterior density of independent Poisson regressions of cosponsorships on women’s rights legislation for the 83rd to the 102nd House. The abbreviation Post. Mean denotes the posterior mean, Post. SD the posterior standard deviation, and 95% BCI the 95% Bayesian Credible Interval. This interval summarizes the central 95% of the posterior density. Noninformative priors for the $\beta_j$ are assumed: $\beta_j \sim \text{Normal}(0, 10^4)$.

Parameter            Post. Mean   Post. SD   95% BCI Lower   95% BCI Upper
β1,1 -Constant           -2.558      0.249          -3.087          -2.106
β2,1 -Constant           -1.733      0.193          -2.119          -1.362
β3,1 -Constant           -0.183      0.103          -0.385           0.013
β4,1 -Constant           -1.484      0.162          -1.798          -1.173
β5,1 -Constant           -0.549      0.122          -0.792          -0.310
β6,1 -Constant           -0.811      0.130          -1.065          -0.558
β7,1 -Constant           -0.809      0.135          -1.079          -0.545
β8,1 -Constant           -0.569      0.109          -0.792          -0.359
β9,1 -Constant           -0.222      0.094          -0.408          -0.037
β10,1 -Constant          -0.532      0.115          -0.768          -0.313
β11,1 -Constant           0.030      0.087          -0.139           0.201
β12,1 -Constant          -0.725      0.131          -0.992          -0.476
β13,1 -Constant           0.394      0.083           0.227           0.556
β14,1 -Constant          -0.823      0.142          -1.106          -0.551
β15,1 -Constant           0.402      0.082           0.241           0.555
β16,1 -Constant           1.694      0.059           1.580           1.813
β17,1 -Constant           1.201      0.070           1.063           1.336
β18,1 -Constant           1.122      0.071           0.978           1.258
β19,1 -Constant           0.861      0.056           0.750           0.971
β20,1 -Constant           1.656      0.056           1.546           1.765
β1,2 -DNominate          -2.600      0.524          -3.657          -1.606
β2,2 -DNominate          -2.523      0.428          -3.373          -1.709
β3,2 -DNominate          -0.839      0.274          -1.371          -0.315
β4,2 -DNominate          -0.695      0.405          -1.494           0.099
β5,2 -DNominate          -1.369      0.334          -2.049          -0.729
β6,2 -DNominate          -1.782      0.348          -2.459          -1.104
β7,2 -DNominate          -1.178      0.346          -1.884          -0.514
β8,2 -DNominate          -1.622      0.288          -2.186          -1.055
β9,2 -DNominate          -1.064      0.261          -1.565          -0.551
β10,2 -DNominate         -2.948      0.247          -3.443          -2.474
β11,2 -DNominate         -4.193      0.216          -4.623          -3.764
β12,2 -DNominate         -4.536      0.292          -5.131          -3.963
β13,2 -DNominate         -3.108      0.200          -3.504          -2.719
β14,2 -DNominate         -3.968      0.344          -4.651          -3.284
β15,2 -DNominate         -3.638      0.210          -4.055          -3.213
β16,2 -DNominate         -3.333      0.156          -3.649          -3.030
β17,2 -DNominate         -3.102      0.168          -3.438          -2.769
β18,2 -DNominate         -2.929      0.139          -3.199          -2.659
β19,2 -DNominate         -2.849      0.115          -3.080          -2.626
β20,2 -DNominate         -2.123      0.087          -2.293          -1.952
β1,3 -Party              -0.876      0.257          -1.412          -0.379
β2,3 -Party              -0.966      0.251          -1.495          -0.495
β3,3 -Party              -0.809      0.168          -1.143          -0.479
β4,3 -Party              -0.703      0.215          -1.116          -0.277
β5,3 -Party              -1.048      0.190          -1.439          -0.683
β6,3 -Party              -0.960      0.201          -1.377          -0.577
β7,3 -Party              -0.885      0.193          -1.276          -0.522
β8,3 -Party              -0.564      0.168          -0.888          -0.231
β9,3 -Party              -0.585      0.150          -0.881          -0.296
β10,3 -Party             -0.353      0.158          -0.653          -0.032
β11,3 -Party             -1.011      0.127          -1.261          -0.762
β12,3 -Party             -0.753      0.163          -1.074          -0.429
β13,3 -Party             -0.605      0.119          -0.836          -0.369
β14,3 -Party             -0.821      0.184          -1.182          -0.459
β15,3 -Party             -0.714      0.123          -0.958          -0.470
β16,3 -Party             -1.234      0.098          -1.429          -1.044
β17,3 -Party             -1.008      0.111          -1.225          -0.790
β18,3 -Party             -0.630      0.104          -0.831          -0.427
β19,3 -Party             -1.000      0.102          -1.201          -0.802
β20,3 -Party             -0.676      0.089          -0.846          -0.496
β1,4 -Gender              1.201      0.272           0.718           1.803
β2,4 -Gender              0.951      0.240           0.467           1.403
β3,4 -Gender              0.745      0.190           0.336           1.090
β4,4 -Gender              1.262      0.231           0.847           1.762
β5,4 -Gender              0.981      0.186           0.612           1.343
β6,4 -Gender              1.063      0.207           0.655           1.465
β7,4 -Gender              0.882      0.225           0.409           1.304
β8,4 -Gender              0.641      0.236           0.129           1.047
β9,4 -Gender              0.705      0.212           0.248           1.077
β10,4 -Gender             1.130      0.142           0.849           1.413
β11,4 -Gender             1.122      0.107           0.911           1.331
β12,4 -Gender             1.249      0.119           1.016           1.482
β13,4 -Gender             1.133      0.095           0.944           1.320
β14,4 -Gender             1.150      0.162           0.839           1.477
β15,4 -Gender             0.678      0.120           0.435           0.903
β16,4 -Gender             0.872      0.071           0.734           1.010
β17,4 -Gender             0.870      0.085           0.697           1.035
β18,4 -Gender             0.825      0.081           0.664           0.983
β19,4 -Gender             1.251      0.059           1.135           1.366
β20,4 -Gender             0.955      0.050           0.859           1.053
α1,1 -Constant           -0.986      0.282          -1.546          -0.448
α1,2 -Post-91st           0.776      0.513          -0.239           1.794
α1,3 -Post-95th           1.853      0.427           0.986           2.683
α2,1 -Constant           20.860      8.773           8.055          42.340
α2,2 -Hetero.          -122.800     53.550        -256.800         -46.370
α2,3 -PartyDiff          21.300     14.630          -3.775          57.420
α3,1 -Constant           -0.811      0.077          -0.967          -0.664
α4,1 -Constant            0.984      0.075           0.835           1.134
ξ1 (Precision)            1.631      0.583           0.679           2.936
ξ2 (Precision)            1.339      0.535           0.543           2.631
ξ3 (Precision)           15.920      8.324           5.474          36.910
ξ4 (Precision)           18.600     12.540           4.952          51.430

Burn In Iterations        10000
MCMC Iterations           25000
Clusters                     20
N                          8808

Table 5: Summary of the posterior density of a hierarchical Poisson regression of cosponsorships on women’s rights legislation for the 83rd to the 102nd House. The model is written in Equation 3. The abbreviation Post. Mean denotes the posterior mean, Post. SD the posterior standard deviation, and 95% BCI the 95% Bayesian Credible Interval. This interval summarizes the central 95% of the posterior density. Noninformative priors for the $\alpha_{*,*}$ and $\xi_j$ are assumed: $\alpha_{*,*} \sim \text{Normal}(0, 10^4)$ and $\xi_j \sim \text{Gamma}(0.001, 0.001)$.

I present the results from the hierarchical Poisson regression in Table 5. The results for the individual clusters are strikingly similar to those for the independent Poisson regressions. The interesting findings are in the hyperparameters $\alpha_{*,*}$ that capture the contextual effects. While the BCI for $\alpha_{1,2}$ contains zero, 93.8% of the posterior density sample falls above zero.
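A posterior tail probability of this kind is simply the share of retained MCMC draws that exceed zero. The Python sketch below shows the calculation. It does not use the actual posterior sample: the draws are a normal stand-in built from the posterior mean (0.776) and standard deviation (0.513) reported for this coefficient in Table 5, so the result will only approximate the figure quoted above. The function names `prob_positive` and `central_interval` are introduced for this illustration.

```python
import math
import random


def prob_positive(draws):
    """Share of posterior draws strictly above zero."""
    return sum(1 for d in draws if d > 0) / len(draws)


def central_interval(draws, mass=0.95):
    """Central credible interval from sorted draws (simple order-statistic rule)."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - mass) / 2.0
    lower = ordered[int(math.floor(tail * (n - 1)))]
    upper = ordered[int(math.ceil((1.0 - tail) * (n - 1)))]
    return lower, upper


rng = random.Random(1)

# ASSUMPTION: a normal approximation stands in for the real MCMC sample.
# 25000 matches the number of MCMC iterations reported in Table 5.
stand_in_draws = [rng.gauss(0.776, 0.513) for _ in range(25000)]

share_above_zero = prob_positive(stand_in_draws)
interval = central_interval(stand_in_draws)
print("share of draws above zero:", round(share_above_zero, 3))
print("central 95% interval:", tuple(round(v, 3) for v in interval))
```

The sketch makes the logic of the statement explicit: an interval can straddle zero while the large majority of the posterior mass still lies on one side of it.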
Thus, I can state with 93.8% certainty that this coefficient is positive, which substantively implies that the macropolitical change in the women’s rights agenda did in fact contribute to a greater number of cosponsorships for all members, consistent with the argument of Wolbrecht (2000b). $\alpha_{1,3}$ is also positive as hypothesized, and significantly different from zero. Its magnitude is greater than $\alpha_{1,2}$, indicating that the rules change contributed above and beyond the agenda change in impacting women’s rights cosponsorships.10 Preference heterogeneity explains variance in the $\beta_{k,2}$ parameters. As hypothesized, as preference heterogeneity increases, the marginal impact of preferences increases. There is no finding for the party difference variable, however. The $\alpha_{3,1}$ and $\alpha_{4,1}$ parameters summarize the grand mean of the $\beta_{k,3}$ and $\beta_{k,4}$ parameters across the models. The precisions of the hyperparameter regressions are interesting. Indeed, given only $k = 20$ clusters, precise estimation of the hyperstructure is unlikely. However, these rather large precisions (which can be interpreted as the reciprocal of the estimated error variance in a regression model) indicate a good fit.

Posterior Predictive Densities

While the table of coefficients is important in judging the significance and magnitude of the explanatory variables at the first and second level of the hierarchies, the question remains as to what substantive impact these variables have on women’s rights cosponsorships. If it were the case that the number of cosponsorships varied between 4.9 and 5.1 for a typical member across decision contexts, we could conclude that the contextual factors had little substantive impact. To assess the substantive fit of the model, I have examined the posterior predictive density of the model in two ways. First, I have estimated the posterior predictive distributions for each observation in the dataset, and then scattered the mean of this distribution against the sample values $y_{i,k}$.
I did this for the pooled model, and for the hierarchical model. The pooled model clearly does an unsatisfactory job of explaining $y_{i,k}$ for the simple reason that it does not allow contextual variables to impact behavior. While far from a perfect fit along the 45-degree line, the hierarchical model does a much better job of explaining the variance in $y_{i,k}$. When fitting any statistical model, it is important to make sure the model could have generated the observed data (see Gelman et al., 1995). For the sake of space I have omitted these figures from this paper, instead focusing on the more interesting typical members in each Congress.

To illustrate the heterogeneity across Congresses, I have estimated the posterior predictive densities for two typical members of Congress: a liberal female Republican, and a moderate male Democrat (using the same operationalizations as Figure 1). For each Congress, I plot a boxplot of the logarithm base-10 of the number of predicted cosponsorships.11 Figure 2 contains the summaries for the liberal Republican woman, the individual most likely to cosponsor women’s rights legislation. For the 83rd to the 91st Congress this hypothetical member is predicted to cosponsor between 1.5 and 4 pieces of legislation per Congress.

10 I have re-specified the model with the Post-91st dummy taking a value of one until the 102nd, thus making the Post-95th dummy the marginal contribution of the rules change above and beyond the agenda change. Given this specification, it remains positive.

11 I plot the logarithm of the counts so the variance in each plot is apparent. One can exponentiate these to return to the original scale.

Figure 2: Boxplots of posterior predictive densities for a liberal Republican woman in the 83rd to 102nd House. This figure presents the logarithm base-10 of predicted cosponsorships for each Congress. [Vertical axis: log(Cosponsorships); horizontal axis: Congress, 83 to 102.]
However, the agenda change in the 92nd is quite apparent; in the 92nd, this member is expected to cosponsor over 11 bills, and in the 93rd and thereafter, over 40. The jump for the rules change in the 97th Congress is also apparent. For the hypothetical moderate Democratic male in Figure 3, similar patterns are discernible. While the agenda change does not seem to have as dramatic an effect, the rules change in the 97th seems to impact behavior dramatically. Indeed, the expectation for this hypothetical member remains substantially less than one cosponsorship per Congress until the 97th, when it jumps to 1.6, and then jumps further to 6.0 in the 98th. Holding all else constant, the rules change dramatically impacted cosponsorship behavior on women’s rights bills, as did the change in the macropolitical agenda. Clearly, contextual factors have a strong substantive impact on congressional cosponsorship behavior. The point to be taken from this analysis and these results is that hierarchical models are extremely useful and flexible tools that can be used to model all sorts of heterogeneous data.

Figure 3: Boxplots of posterior predictive densities for a moderate Democratic man in the 83rd to 102nd House. This figure presents the logarithm base-10 of predicted cosponsorships for each Congress. [Vertical axis: log(Cosponsorships); horizontal axis: Congress, 83 to 102.]

Conclusion

The analysis presented above demonstrates the usefulness of Bayesian estimation using MCMC for the analysis of clustered, heterogeneous data. The model above, however, is just the tip of the iceberg. Indeed, there are many more sophisticated models one can use to model all sorts of contextual effects. These models can be generalized with more levels of analysis (for example, looking at roll call voting by members of every state legislature for a fifty year time period). Here one could model contextual differences across states and across time using three levels of analysis. One could also develop more elaborate cross-cutting models, mindful of the problem of persistence. Brandt et al. (2000) offer a time series regression model called the PEWMA model which is particularly useful when dealing with many counts observed through time.

Appendix: BUGS Code

In Summer 2000 I will make all BUGS code necessary to estimate these models available on my homepage. Since BUGS code is somewhat dependent on the platform used, here I present the key model definition statements for the models presented in the paper. The data input statements and the specification of starting values differ slightly across platforms. All of the models presented here were estimated in Classic BUGS 0.6 on a Linux workstation (note that WinBUGS would use a slightly different specification).

BUGS eliminates many of the start-up costs associated with performing Bayesian inference. Not only does it alleviate the need to program in a high level language, but it chooses appropriate MCMC algorithms for the problem at hand. Specifically, BUGS uses Gibbs sampling when the full conditional takes a known form, adaptive rejection sampling when the full conditional is not of a closed form but is log-concave (Gilks and Wild, 1992), and the Metropolis-Hastings algorithm to sample from non-log-concave distributions (Spiegelhalter et al., 1997). BUGS also comes with a suite of routines called CODA that one can use with R and S-Plus to test for convergence (Best et al., 1997). For a comprehensive treatment of how to use BUGS, compile and estimate models, and test for convergence, consult Spiegelhalter et al. (1997) and Best et al. (1997).

Poisson Regression with Posterior Prediction

The first model is the Poisson regression model with the estimation of posterior predictive distributions. The results from this model appear in Table 2 and Figure 1. Note that I use the same model definition notation used in the paper.
Since the posterior predictive distributions are just Monte Carlo averages over the posterior density of β, they can be simulated from within the sampler.

    # Poisson Regression Model
    {
      for (i in 1:N) {
        log(mu[i]) <- beta[1] * cons[i] + beta[2] * dnom[i] +
                      beta[3] * party[i] + beta[4] * gender[i];
        cospon[i] ~ dpois(mu[i]);
      }
      # Priors
      for (j in 1:J) {
        beta[j] ~ dnorm(0.0, 0.0001);
      }
      # Posterior Prediction
      libRW <- exp(beta[1] + -0.627 * beta[2] + 0 * beta[3] + 1 * beta[4]);
      modRW <- exp(beta[1] + -0.027 * beta[2] + 0 * beta[3] + 1 * beta[4]);
      conRW <- exp(beta[1] +  0.573 * beta[2] + 0 * beta[3] + 1 * beta[4]);
      libDW <- exp(beta[1] + -0.627 * beta[2] + 1 * beta[3] + 1 * beta[4]);
      modDW <- exp(beta[1] + -0.027 * beta[2] + 1 * beta[3] + 1 * beta[4]);
      conDW <- exp(beta[1] +  0.573 * beta[2] + 1 * beta[3] + 1 * beta[4]);
      libRM <- exp(beta[1] + -0.627 * beta[2] + 0 * beta[3] + 0 * beta[4]);
      modRM <- exp(beta[1] + -0.027 * beta[2] + 0 * beta[3] + 0 * beta[4]);
      conRM <- exp(beta[1] +  0.573 * beta[2] + 0 * beta[3] + 0 * beta[4]);
      libDM <- exp(beta[1] + -0.627 * beta[2] + 1 * beta[3] + 0 * beta[4]);
      modDM <- exp(beta[1] + -0.027 * beta[2] + 1 * beta[3] + 0 * beta[4]);
      conDM <- exp(beta[1] +  0.573 * beta[2] + 1 * beta[3] + 0 * beta[4]);
    }

Poisson Regression with Overdispersion

The model definition code for the Poisson regression with overdispersion is based on the salm example in the BUGS manual. Below is the code for the discharge petition model presented in Table 1. To facilitate convergence, in this model all variables are standardized. BUGS allows the practitioner to perform all manner of recodes within the software. The only difference between this model and the previous one is that now a random effect η_i is included for each observation, and a prior probability is placed on the precision parameter τ.
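To see why an observation-level random effect produces overdispersion, the following Python sketch simulates counts with and without a normal effect η_i added to the log mean. All numeric values (the log mean of 1.0, the precision τ = 4, the sample size) are arbitrary assumptions chosen for illustration and are unrelated to the discharge petition data. As in BUGS, the normal distribution is parameterized here by its precision, so the standard deviation is 1/√τ.

```python
import numpy as np

def simulate_counts(n, log_mean, tau=None, seed=0):
    """Draw n Poisson counts; if tau is given, add eta_i ~ Normal(0, precision=tau)."""
    rng = np.random.default_rng(seed)
    eta = np.zeros(n)
    if tau is not None:
        # BUGS-style precision parameterization: sd = 1 / sqrt(tau).
        eta = rng.normal(0.0, 1.0 / np.sqrt(tau), size=n)
    mu = np.exp(log_mean + eta)
    return rng.poisson(mu)

plain = simulate_counts(100_000, log_mean=1.0)
mixed = simulate_counts(100_000, log_mean=1.0, tau=4.0)

# Variance-to-mean ratio: close to 1 for a pure Poisson,
# noticeably above 1 once the random effect is included.
ratio_plain = plain.var() / plain.mean()
ratio_mixed = mixed.var() / mixed.mean()
print(round(ratio_plain, 2), round(ratio_mixed, 2))
```

In the BUGS model the precision τ is not fixed as it is in this sketch; it receives a gamma prior and is estimated along with the regression coefficients.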
    # Poisson Regression Model with Overdispersion
    {
      for (i in 1:N) {
        log(mu[i]) <- beta[1] * cons[i] + beta[2] * hsdnom.std[i] +
                      beta[3] * divgov.std[i] + beta[4] * totcomm.std[i] +
                      beta[5] * majsize.std[i] + beta[6] * rul1.std[i] + eta[i];
        petfiled[i] ~ dpois(mu[i]);
        eta[i] ~ dnorm(0.0, tau);
      }
      # Priors
      for (j in 1:K) {
        beta[j] ~ dnorm(0.0, 0.0001);
      }
      tau ~ dgamma(0.001, 0.001);
    }

Hierarchical Poisson Regression

The final code I present is from the hierarchical Poisson regression model. Here the data are loaded from three files: one that contains the level-one covariates sorted by Congress, one that contains the number of observations n_k within each cluster, and one that contains the contextual (second-level) covariates. This code begins with the first level, assuming a Poisson regression within each cluster. Then, across clusters k, I assume independent regressions, as noted in the text. The code concludes by placing priors on all of the hyperparameters.

    # Hierarchical Poisson Regression Model -- Level One
    {
      for (k in 1:K) {
        for (i in lower[k]:upper[k]) {
          log(mu[i]) <- (beta[1,k] * cons[i] + beta[2,k] * dnom[i] +
                         beta[3,k] * party[i] + beta[4,k] * gender[i]);
          cospon[i] ~ dpois(mu[i]);
        }
      }
      # Hierarchical Poisson Regression Model -- Level Two
      for (k in 1:K) {
        nu[1,k] <- alpha1[1] + alpha1[2] * era1[k] + alpha1[3] * era2[k];
        nu[2,k] <- alpha2[1] + alpha2[2] * sdnom[k] + alpha2[3] * pdist[k];
        nu[3,k] <- alpha3[1];
        nu[4,k] <- alpha4[1];
        for (j in 1:J) {
          beta[j,k] ~ dnorm(nu[j,k], xi[j]);
        }
      }
      # Priors
      for (m1 in 1:M1) {
        alpha1[m1] ~ dnorm(0.0, 0.0001);
      }
      for (m2 in 1:M2) {
        alpha2[m2] ~ dnorm(0.0, 0.0001);
      }
      for (m3 in 1:M3) {
        alpha3[m3] ~ dnorm(0.0, 0.0001);
      }
      for (m4 in 1:M4) {
        alpha4[m4] ~ dnorm(0.0, 0.0001);
      }
      for (j in 1:J) {
        xi[j] ~ dgamma(0.001, 0.001);
      }
    }

References

Albert, James. 1992. “A Bayesian Analysis of a Poisson Random Effects Model for Home Run Hitters.” The American Statistician 46(November):246–253.

Albert, James H., and Patricia A. Pepple. 1989. “A Bayesian Approach to Some Overdispersion Models.” The Canadian Journal of Statistics 17(September):333–344.

Bartels, Larry M. 1996. “Pooling Disparate Observations.” American Journal of Political Science 40(August):905–942.

Beck, Neal, and Jonathan N. Katz. 1995. “What To Do (And Not To Do) with Time-Series Cross-Section Data.” American Political Science Review 89(September):634–647.

Berry, Scott M., C. Shane Reese, and Patrick D. Larkey. 1999. “Bridging Different Eras in Sports.” Journal of the American Statistical Association 94(September):661–686.

Best, Nicky, Mary Kathryn Cowles, and Karen Vines. 1997. Convergence Diagnostics and Output Analysis Software for Gibbs Sampling Output, Version 0.4. Cambridge: MRC Biostatistics Unit.

Beth, Richard S. 1990. The Discharge Rule in the House of Representatives: Procedure, History, and Statistics. Washington, DC: Congressional Research Service.

Binder, Sarah A., Eric D. Lawrence, and Forrest Maltzman. 1999. “Uncovering the Hidden Effect of Party.” Journal of Politics 61(August):815–831.

Brandt, Patrick T., John T. Williams, Benjamin O. Fordham, and Brian Pollins. 2000. “Dynamic Modeling for Persistent Event Count Time Series.” Workshop in Political Theory and Policy Working Paper, Indiana University.

Breslow, N. E. 1984. “Extra-Poisson Variation in Log-Linear Models.” Applied Statistics 33(1):38–44.

Cameron, A. Colin, and Pravin K. Trivedi. 1998. Regression Analysis of Count Data. Cambridge: Cambridge University Press.

Campbell, James E. 1982. “Cosponsoring Legislation in the U.S. Congress.” Legislative Studies Quarterly 7(August):415–422.

Carlin, Bradley P., and Thomas A. Louis. 1996. Bayes and Empirical Bayes Methods for Data Analysis. London: Chapman & Hall.

Chib, Siddhartha, Edward Greenberg, and Rainer Winkelmann. 1998. “Posterior Simulation and Bayes Factors in Panel Count Data Models.” Journal of Econometrics 86(September):33–54.

Chib, Siddhartha, and Rainer Winkelmann. 1999. “Markov Chain Monte Carlo Analysis of Correlated Count Data.” Olin School of Business Working Paper, Washington University.

Christiansen, Cindy L., and Carl N. Morris. 1997. “Hierarchical Poisson Regression Modeling.” Journal of the American Statistical Association 92(June):618–632.

Cox, Gary W., and Mathew D. McCubbins. 1993. Legislative Leviathan: Party Government in the House. Berkeley: University of California Press.

Efron, B., and C. Morris. 1973. “Combining Possibly Related Estimation Problems.” Journal of the Royal Statistical Society B 35(3):379–421.

Gelfand, Alan E., and Adrian F. M. Smith. 1990. “Sampling-Based Approaches to Calculating Marginal Densities.” Journal of the American Statistical Association 85(December):398–409.

Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin. 1995. Bayesian Data Analysis. London: Chapman & Hall.

Gelman, Andrew, and Gary King. 1990. “Estimating the Consequences of Legislative Redistricting.” Journal of the American Statistical Association 85(June):274–282.

Geman, S., and D. Geman. 1984. “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images.” IEEE Transactions on Pattern Analysis and Machine Intelligence 6(November):721–741.

Gilks, W. R., S. Richardson, and D. J. Spiegelhalter. 1996. Markov Chain Monte Carlo in Practice. London: Chapman & Hall.

Gilks, W. R., and P. Wild. 1992. “Adaptive Rejection Sampling for Gibbs Sampling.” Applied Statistics 41(2):337–348.

Greenwood, Major, and G. Udny Yule. 1920. “An Inquiry into the Nature of Frequency Distributions Representative of Multiple Happenings with Particular Reference to the Occurrence of Multiple Attacks of Disease or of Repeated Accidents.” Journal of the Royal Statistical Society 83(March):255–279.

Jackman, Simon. 2000. “Estimation and Inference via Bayesian Simulation: An Introduction to Markov Chain Monte Carlo.” American Journal of Political Science 44(April):369–398.

Jones, Bradford S., and Marco R. Steenbergen. 1997. “Modeling Multilevel Data Structures.” Paper presented at the Annual Meeting of the Political Methodology Section of the American Political Science Association.

Kessler, Daniel, and Keith Krehbiel. 1996. “Dynamics of Cosponsorship.” American Political Science Review 90(September):555–566.

King, Gary. 1988. “Statistical Models for Political Science Event Counts: Bias in Conventional Procedures and Evidence for the Exponential Poisson Regression Model.” American Journal of Political Science 32(August):838–863.

King, Gary. 1989a. “A Seemingly Unrelated Poisson Regression Model.” Sociological Methods and Research 17(February):235–255.

King, Gary. 1989b. “Variance Specification in Event Count Models: From Restrictive Assumptions to a Generalized Estimator.” American Journal of Political Science 33(August):762–784.

King, Gary, Ori Rosen, and Martin A. Tanner. 1999. “Binomial-beta Hierarchical Models for Ecological Inference.” Sociological Methods and Research 28(August):61–90.

King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44(April):347–361.

Krehbiel, Keith. 1995. “Cosponsors and Wafflers from A to Z.” American Journal of Political Science 39(November):906–923.

Martin, Andrew D., and Christina Wolbrecht. 2000. “Partisanship and Pre-Floor Behavior: The Equal Rights and School Prayer Amendments.” SUNY Stony Brook Department of Political Science Working Paper.

Poole, Keith T., and Howard Rosenthal. 1997. Congress: A Political-Economic History of Roll-Call Voting. Oxford: Oxford University Press.

Quinn, Kevin M., Andrew D. Martin, and Andrew B. Whitford. 1999. “Voter Choice in Multi-Party Democracies: A Test of Competing Theories and Models.” American Journal of Political Science 43(October):1231–1247.

Regens, James L. 1989. “Congressional Cosponsorship of Acid Rain Controls.” Social Science Quarterly 70(June):505–512.

Schiller, Wendy J. 1995. “Senators as Political Entrepreneurs: Bill Sponsorship to Shape Legislative Agendas.” American Journal of Political Science 39(February):186–203.

Schofield, Norman J., Andrew D. Martin, Kevin M. Quinn, and Andrew B. Whitford. 1998. “Multiparty Electoral Competition in the Netherlands and Germany: A Model Based on Multinomial Probit.” Public Choice 97(December):257–293.

Scollnik, David P. M. N.D. “Actuarial Modeling with MCMC and BUGS.” Actuarial Research Clearing House, forthcoming.

Smith, Alastair. 1999. “Testing Theories of Strategic Choice: The Example of Crisis Escalation.” American Journal of Political Science 43(October):1254–1283.

Spiegelhalter, David J., Andrew Thomas, Nicky Best, and Wally R. Gilks. 1997. BUGS 0.6: Bayesian Inference Using Gibbs Sampling. Cambridge: MRC Biostatistics Unit.

Tanner, M. A., and W. Wong. 1987. “The Calculation of Posterior Distributions by Data Augmentation.” Journal of the American Statistical Association 82(June):528–550.

Western, Bruce. 1998. “Causal Heterogeneity in Comparative Research: A Bayesian Hierarchical Modelling Approach.” American Journal of Political Science 42(October):1233–1259.

Western, Bruce, and Simon Jackman. 1994. “Bayesian Inference for Comparative Research.” American Political Science Review 88(June):412–423.

Wolbrecht, Christina. 2000a. “Female Legislators and the Women’s Rights Agenda.” Paper presented at the Women Transforming Congress Conference, Carl Albert Center, April 13-15, 2000.

Wolbrecht, Christina. 2000b. The Politics of Women’s Rights: Parties, Positions, and Change. Princeton: Princeton University Press.

Young, Cheryl D., and Rick K. Wilson. 1997. “Cosponsorship in the United States Congress.” Legislative Studies Quarterly 22(February):25–43.

Zorn, Christopher J. W. 1998. “An Analytic and Empirical Examination of Zero-Inflated and Hurdle Poisson Specifications.” Sociological Methods and Research 26(February):368–400.
