Bayesian Inference for Heterogeneous Event Counts
Andrew D. Martin∗
Department of Political Science
Washington University
Campus Box 1063
One Brookings Drive
St. Louis, MO 63130
[email protected]
April 19, 2000
Abstract
This paper presents a handful of Bayesian tools one can use to model heterogeneous event counts.
In many political science applications we are interested in modeling the number of times a particular
event takes place. While models for event count cross-sections are now widely used in political science
(King, 1988, 1989b), little has been written about how to model counts when contextual factors introduce
heterogeneity. I begin with a discussion of Bayesian cross-sectional count models and introduce an
alternative model for counts with overdispersion. To illustrate the Bayesian framework, I model event
counts of the number of discharge petitions from the 61st to the 105th House, and the number of women’s
rights bills cosponsored by each member in the 92nd House. I then generalize the model to allow for
contextual heterogeneity and posit a hierarchical Poisson regression model, fitting this model to the
number of women’s rights cosponsorships for each member of the 83rd to 102nd House. I demonstrate
the advantages of this approach over pooled and independent Poisson regressions. The hierarchical
model allows one to explicitly model contextual factors and test alternative contextual explanations.
Additionally, I discuss software one can use to easily implement these models with little start-up cost.
∗
Paper prepared for presentation at the Midwest Political Science Association Annual Meeting, Chicago, Illinois, April 27-30,
2000. The author is an Assistant Professor of Political Science at Washington University. I would like to thank Kyle Saunders
for his comments, and Kevin Quinn for many helpful conversations. I am particularly indebted to Christina Wolbrecht for sharing
her valuable dataset with me, for directing me substantively to the study of pre-floor behavior in the House, and for introducing me to my wife. All models presented here were estimated using Classic BUGS 0.6 on a Linux workstation (Spiegelhalter
et al., 1997). Soon I will make BUGS code (and sample datasets) to estimate these models available at my homepage:
http://artsci.wustl.edu/~admartin/. This is an extremely preliminary draft and I would appreciate any comments.
Introduction
Heterogeneity is what makes politics, and for that matter statistics, interesting. However, until recently, little
attention has been paid to modeling political science data observed in heterogeneous clusters (some notable
exceptions include Beck and Katz, 1995; Western, 1998). Social explanations throughout the discipline
predict that identical individuals will behave quite differently in different contexts. For example, rational
choice accounts of political behavior posit that individuals will act in accordance with an equilibrium which
is in part a function of the other actors in the game. In short, individuals with the same preferences and
the same characteristics are predicted to behave differently in different contexts. In this paper, I review and
introduce a type of statistical model one can use to model counts of observed political behavior that allows
the researcher to model and test different explanations of heterogeneity.
For well over a decade, political scientists have been interested in modeling the number of times a particular
event occurs. King (1988) introduced the discipline to the Poisson regression model, and later developed
the generalized event count (GEC) model (King, 1989b). These models have been widely used throughout
the discipline and applied to problems in international relations, American politics, comparative politics,
and political economy. Yet, these applications have almost universally been used for homogeneous event
counts; that is, individual cross-sections when all observations are assumed to be exchangeable. In many
applications, this assumption is not tenable. For example, suppose I am interested in modeling the
number of times local school boards issued bonds for infrastructure projects in the 1990s, and have collected
data from every school board in every state. If I were to pool every observation into a single sample, I would
be implicitly assuming that upon knowing the relevant characteristics of each school district, contextual
state-by-state factors are unimportant. This assumption would be inappropriate in this example because state
laws and funding mechanisms vary widely. The hierarchical model I introduce here relaxes this assumption;
not only would it allow me to control for important contextual factors that might impact bond issues, it also
allows me to test competing contextual explanations.
The purpose of this paper is to develop a general modeling strategy one can use to draw inferences from
heterogeneous event counts. For practical reasons detailed below, I adopt a Bayesian approach and use
Markov chain Monte Carlo (MCMC) techniques for estimation. I begin by discussing Bayesian inference
for cross-sectional counts in Section 2. This section includes the standard Poisson regression model, as well
as a model that allows for overdispersion in observed event counts. The section also includes an introduction
to the software used throughout this paper. In Section 3, I present results from two case studies: the number
of discharge petitions filed from the 61st to the 105th Congress and the number of women’s rights bills
cosponsored by every member of the 92nd House. I then turn my attention to heterogeneous counts and
offer a hierarchical Poisson regression model suitable for this type of data. In Section 5, I turn my attention
to my final case study: the number of women’s rights cosponsorships for every member of the 83rd to the
102nd House. In this section, I compare alternative modeling strategies, and demonstrate the usefulness of
hierarchical models. The final section concludes with a discussion of other types of models one can use for
heterogeneous event counts.
Bayesian Inference for Homogeneous Event Counts
Before I turn my attention to heterogeneous event counts, it is important to cover Bayesian inference for
homogeneous counts. These models are most commonly applied to single cross-sections of data, and require
the statistical assumption that each observation is conditionally independent, and is thus exchangeable.
These models have been used extensively in political science for a variety of applications. For a good
introduction to event count models, I refer the reader to Cameron and Trivedi (1998). King (1988, 1989b)
also provides introductions to cross-sectional models geared toward political scientists. In the remainder of
this section, I discuss cross-sectional event count models that allow for equidispersion and overdispersion in
observed counts.
Poisson Regression and Bayesian Inference
Event count models were first introduced in political science by King (1988), who demonstrates many advantages of what he terms the exponential Poisson regression model over multiple regression. The Poisson
distribution is a discrete distribution defined on the non-negative integers and can be derived from distributions of waiting times (Cameron and Trivedi, 1998, pp. 6-8). The Poisson regression model is used to
model a set of counts $y_i \in \{0, 1, 2, \ldots\}$ on the non-negative integers for a set of observations
$i = 1, \ldots, n$. We further observe a row vector of explanatory variables $x_i$ of dimensionality $(1 \times j)$. The
Poisson regression model is thus:

$$y_i \sim \text{Poisson}(\lambda_i), \qquad \lambda_i = \exp(x_i \beta) \tag{1}$$

We are interested in estimating the parameter vector β which has dimensionality (j × 1). King (1988)
estimates this model using maximum likelihood. Indeed, Equation 1 defines the likelihood, which is simply
the product of Poisson densities across all observations i.
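
As a rough illustration of the likelihood defined by Equation 1, the sketch below evaluates the Poisson log-likelihood for a given β. The function name `poisson_loglik` and the toy data are hypothetical and chosen only for illustration; this is not the code used to produce the results in this paper.

```python
import math

def poisson_loglik(beta, X, y):
    """Log-likelihood of Equation 1: y_i ~ Poisson(exp(x_i beta))."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_i))  # linear predictor x_i beta
        lam = math.exp(eta)                          # lambda_i
        # log Poisson density: y*log(lambda) - lambda - log(y!)
        total += y_i * eta - lam - math.lgamma(y_i + 1)
    return total

# Hypothetical toy data: a constant and one covariate.
X = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
y = [1, 3, 7]

print(poisson_loglik((0.0, 0.0), X, y))
print(poisson_loglik((0.0, 1.0), X, y))
```

Maximum likelihood searches for the β that maximizes this function; the Bayesian approach described next instead combines it with a prior.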
The approach I take here is to estimate this model from a Bayesian perspective. Fundamentally, Bayesians
assume that the data is fixed and that parameters vary. Inference, then, is conducted by computing the
probability density of the parameters conditioned on the data observed. The alternative frequentist (or
classical) approach, on the other hand, assumes that parameters are fixed and unknown. One thus finds
the parameters most likely to have generated the data by maximizing the likelihood. These approaches are
inextricably related through Bayes’ theorem. For the Poisson regression model, I am interested in:

$$f(\beta \mid y) \propto f(y \mid \beta)\, f(\beta)$$

The distribution $f(\beta \mid y)$ is called the posterior density, and is the probability density of $\beta$ conditioned on
observing the data. The posterior summarizes our a posteriori beliefs about the parameter vector $\beta$ after
having observed the data. From it, one can derive many probability statements, including those about the
probability that a particular $\beta_j$ is greater than zero. The posterior is proportional to the likelihood $f(y \mid \beta)$ times the
prior $f(\beta)$.
In most applications, the posterior density $f(\beta \mid y)$ does not take a closed form. However, there is a set of
algorithms one can use to simulate from the posterior density. In so doing, one can make posterior inferences to any level of precision, depending only on the computation time (which, thankfully, has gotten very
inexpensive). The algorithms employed here are Markov chain Monte Carlo (MCMC) techniques, introduced
by Geman and Geman (1984), Tanner and Wong (1987), and Gelfand and Smith (1990). For general
introductions to MCMC methods I recommend Carlin and Louis (1996) and Gelman et al. (1995). Jackman
(2000) has also published an excellent introduction to MCMC geared specifically to political scientists. To
date, MCMC estimation has been sporadically employed in political science. Gelman and King (1990) are
credited with the first application of MCMC in political science, and use a Gibbs sampling algorithm to
assess the impact of legislative redistricting. Other examples include: Western (1998) who uses MCMC
to estimate hierarchical models of political determinants of economic growth; Schofield et al. (1998) and
Quinn et al. (1999) who estimate multinomial probit models of voter choice using MCMC; Smith (1999)
who estimates an ordered probit model with strategic censoring to model crisis escalation; and King et al.
(1999) who use MCMC to estimate a hierarchical ecological inference model.
As with any Bayesian model, it is necessary for the practitioner to formally posit her prior beliefs. To
complete the Poisson regression model, I assume independent normal priors for each element $\beta_j$ of the $\beta$
vector: $\beta_j \sim \text{Normal}(\mu_0, \sigma_0^2)$.¹ In most applications, one uses noninformative priors, that is, priors
that contribute little information about the parameters of interest (although there are cases when using
informative priors is quite valuable; see Western and Jackman, 1994). In the models presented here, I assume
noninformative priors. Thus, for each $\beta_j$ in Poisson regression I assume $\beta_j \sim \text{Normal}(0, 10^4)$, which is
an extremely flat distribution that contributes little information.²
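
To make the pieces concrete, the likelihood in Equation 1 and the independent normal priors can be combined on the log scale. The expression below is a worked restatement of the quantities already defined, where the constant collects every term that does not depend on $\beta$ (including $\log y_i!$):

$$\log f(\beta \mid y) = \sum_{i=1}^{n} \left[ y_i\, x_i \beta - \exp(x_i \beta) \right] - \frac{1}{2\sigma_0^2} \sum_{j} (\beta_j - \mu_0)^2 + \text{const}$$

With $\mu_0 = 0$ and $\sigma_0^2 = 10^4$ the second term is nearly flat over any plausible range of $\beta_j$, which is the sense in which the prior is noninformative.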
Estimating statistical models using MCMC algorithms requires two steps that are oftentimes quite difficult.
Most common MCMC techniques, including the Gibbs sampler and the Metropolis-Hastings algorithm,
require the practitioner to derive the full conditional distributions. In most cases, this algebra is far from
trivial, and beyond the interest of most political scientists. The other difficulty is programming a computer
to sample from the posterior density. To date, most commonly used statistical packages lack facility for
MCMC estimation. Thus, programming in a high level language like Gauss, Ox, or S-Plus is oftentimes
required to estimate these models. There is one freely available software package whose sole purpose is the
estimation of Bayesian models. This software is called BUGS which stands for Bayesian inference Using
Gibbs Sampling (Spiegelhalter et al., 1997).³ BUGS eliminates many of the start-up costs of performing
Bayesian inference, and requires the practitioner to only write down the probability model and specify the
priors. BUGS then chooses an appropriate updating algorithm to simulate from the posterior distribution. I
refer the reader to Appendix A for a brief description of BUGS, and code for the models discussed in this
paper.
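
To give a sense of what programming a sampler by hand involves, here is a minimal random-walk Metropolis sketch for the Poisson regression model with Normal(0, 10^4) priors. It is an assumption-laden illustration, not the updating scheme BUGS uses: the proposal step size, chain length, starting values, and toy data are all hypothetical, and no convergence diagnostics are run.

```python
import math
import random

def log_posterior(beta, X, y, prior_var=1e4):
    """Unnormalized log posterior: Poisson likelihood times Normal(0, prior_var) priors."""
    lp = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_i))  # linear predictor x_i beta
        lp += y_i * eta - math.exp(eta)              # log(y_i!) is constant in beta
    lp -= sum(b * b for b in beta) / (2.0 * prior_var)
    return lp

def metropolis(X, y, n_iter=20000, burn_in=2000, step=0.1, seed=1):
    """Random-walk Metropolis; returns the draws of beta kept after burn-in."""
    rng = random.Random(seed)
    beta = [0.0] * len(X[0])
    current = log_posterior(beta, X, y)
    draws = []
    for it in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, X, y)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        if it >= burn_in:
            draws.append(list(beta))
    return draws

# Hypothetical toy data: a constant and one centered covariate.
xs = [i / 10 - 0.95 for i in range(20)]
X = [(1.0, x) for x in xs]
y = [2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7, 7, 8, 9, 10, 11]

draws = metropolis(X, y)
post_mean = [sum(d[k] for d in draws) / len(draws) for k in range(2)]
print(post_mean)
```

Even in this simple case the practitioner must choose a proposal scale and check convergence by hand, which is the start-up cost that BUGS is designed to remove.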
1
Other priors could just as easily be applied. One could assume a multivariate prior $\beta \sim N_j(\beta_0, B_0^{-1})$, or an improper uniform
prior on $\mathbb{R}$. Unless using informative priors, this choice usually makes very little difference in practice. If it does make a difference
in a particular application, the prior, not the likelihood, is driving the inference.
2
Additionally, for all of the models presented below, I tested for robustness of the results to various prior specifications. For
all models, the substantive conclusions are robust to other prior specifications, including informative priors. I have also tested for
convergence using various techniques implemented in CODA, and am confident that each chain has reached a steady state.
3
This software is available on the web for Windows, Unix, and Linux from the MRC Biostatistics Unit and the Imperial College
School of Medicine at St Mary’s, London at http://www.mrc-bsu.cam.ac.uk/bugs/. There are many helpful examples
at this website. Additionally, Jackman (2000) has posted a handful of useful BUGS applications geared toward a political science
audience at http://tamarama.stanford.edu/mcmc/. Gilks et al. (1996) also includes some examples of models estimated with BUGS. Beginning in Summer 2000, I will begin posting useful BUGS and MCMC code, including all code used in this
paper, at http://artsci.wustl.edu/~admartin/.
Poisson Regression with Overdispersion
The Poisson regression model has been used extensively in political science, but there is a well-known deficiency of the model: the expected value (mean) of the distribution equals its variance (Cameron and Trivedi,
1998, pp. 9-10). There are many observed counts, however, where positive or negative contagion may occur,
and cause observed counts to be overdispersed (having variance greater than the mean), or underdispersed
(having variance less than the mean). It is important to note that heterogeneity at the individual level will
cause counts to be under- or overdispersed (see King, 1989b). Thus, models that allow for under- or overdispersion allow the researcher to model unobserved heterogeneity within a single cross-section. By neglecting
overdispersion and fitting a model that requires equidispersion, standard errors will be inconsistent and inefficient, thus undermining inference.
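
The link between unobserved heterogeneity and overdispersion can be checked by simulation. The sketch below compares the variance-to-mean ratio of counts drawn with a common rate against counts whose rates vary across individuals. The log-normal form of the heterogeneity, the rate of 3, and the helper names `rpois` and `dispersion_ratio` are assumptions made purely for illustration.

```python
import math
import random

def rpois(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 under equidispersion."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean

rng = random.Random(42)
n = 20000
# Every individual shares the same rate.
homogeneous = [rpois(3.0, rng) for _ in range(n)]
# Each individual's rate is scaled by an unobserved multiplicative disturbance.
heterogeneous = [rpois(3.0 * math.exp(rng.gauss(0.0, 0.5)), rng) for _ in range(n)]

ratio_homogeneous = dispersion_ratio(homogeneous)
ratio_heterogeneous = dispersion_ratio(heterogeneous)
print(ratio_homogeneous, ratio_heterogeneous)
```

The first ratio should sit near one and the second well above it, which is the pattern an equidispersed model cannot reproduce.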
When faced with overdispersion, the traditional approach is to mix a gamma distribution with mean one
with the Poisson model, thus yielding a model with overdispersion. By the Greenwood and Yule (1920)
compounding rule, this results in the negative binomial regression model, which can be estimated using
maximum likelihood. For underdispersion, one must truncate the counts, which yields what King terms
the continuous parameter binomial model (1989b). In the case when the existence of overdispersion or
underdispersion is not known with certainty, King (1989b) offers a generalized event count (GEC) model
that allows the researcher to estimate the amount of dispersion without assuming a priori what type exists.
In practice, most political science event counts tend to exhibit equidispersion or overdispersion (but see
King, 1989b, for some examples of underdispersion).
In this paper, I will focus only on models that allow for overdispersion. However, instead of mixing the
Poisson with the gamma distribution, I offer an alternative model where I mix the Poisson with the lognormal distribution. This model was offered by Breslow (1984). Here we have the same dependent variable
$y_i$, the same number of observations $n$, and the same row vector of covariates $x_i$. The only difference is
that now I include a random effect, which has zero mean (thus not contributing to the count), but adds to the
variance of $y_i$. I will call these effects $\eta_i$. The Poisson regression model with overdispersion can now be
written:

$$y_i \sim \text{Poisson}(\lambda_i), \qquad \lambda_i = \exp(x_i \beta + \eta_i), \qquad \eta_i \sim \text{Normal}(0, \tau^{-1}) \tag{2}$$

$\tau$ is a scalar that captures the precision (inverse variance) of the random effect, and thus indicates how much (or
little) overdispersion there is. Again, to estimate this model in the Bayesian framework I assume independent
priors for each $\beta_j$: $\beta_j \sim \text{Normal}(\mu_0, \sigma_0^2)$. I also assume a gamma prior for the precision parameter
$\tau$: $\tau \sim \text{Gamma}(\nu_0, \delta_0)$. For the models presented here, I assume a noninformative prior for $\tau$: $\tau \sim
\text{Gamma}(0.001, 0.001)$, which has most of its mass immediately to the right of zero and is decreasing on $\mathbb{R}^+$.
This model is easily estimable in BUGS; see the appendix for the code employed. Note that this is one
of many models one could use when facing overdispersion or extra-Poisson variation (see Cameron and
Trivedi, 1998, pp. 103-104 or Albert and Pepple, 1989).
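
A short derivation shows how the random effect in Equation 2 produces overdispersion. Write $\mu_i = \exp(x_i \beta)$ and $\sigma^2 = \tau^{-1}$ (shorthand introduced here for convenience). Using the standard log-normal moments $E[e^{\eta_i}] = e^{\sigma^2/2}$ and $\text{Var}[e^{\eta_i}] = (e^{\sigma^2} - 1)\, e^{\sigma^2}$, together with the law of total variance:

$$E[y_i] = \mu_i\, e^{\sigma^2/2} \equiv m_i$$

$$\text{Var}[y_i] = E\big[\text{Var}(y_i \mid \eta_i)\big] + \text{Var}\big(E[y_i \mid \eta_i]\big) = m_i + m_i^2 \left(e^{\sigma^2} - 1\right)$$

The variance exceeds the mean whenever $\sigma^2 > 0$, and the model collapses to Equation 1 as $\tau \to \infty$. Note that $\eta_i$ has zero mean on the log scale; on the count scale the expected count is multiplied by $e^{\sigma^2/2}$, a factor that does not vary across observations and so affects only the interpretation of the constant.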
Case Studies: Pre-Floor Behavior in the House
To illustrate Bayesian inference for cross-sectional event count models, I use two datasets relating to pre-floor behavior in the House of Representatives. In the first, I model the number of discharge petitions filed
in every Congress from the 61st to the 105th. The second case study is cosponsorship on women’s rights
legislation in the 92nd House, the Congress that passed the Equal Rights Amendment.
The Use of the Discharge Procedure, 1910-1998
After the revolt against Speaker Joseph Cannon in 1910, the 61st House of Representatives instituted a discharge procedure. This provision, which has remained a part of House rules to date, is the only mechanism
by which any majority can force the floor to consider legislation without approval from the committee of
jurisdiction, the party leadership, or the Rules Committee. It thus plays a very important institutional role,
as it allows groups of members to audit the committee system as well as their party leadership. Yet, as noted
by Beth (1990), the discharge petition is used infrequently. Not only is there the collective action problem
of obtaining the requisite number of signatures on the petition, but there are costs borne by the member
by auditing committees, and perhaps breaking log-rolling arrangements. While theoretically the discharge
provision plays a very important role, little has been written explicitly about the procedure. Beth (1990)
provides a history of the procedure, and tabulates important data about the frequency of its use. Krehbiel
(1995) models bargaining over a discharge petition in the 104th House and finds that policy preferences alone explain who signed the “A-to-Z Spending Bill” petition, although Binder et al. (1999)
find a party effect when employing a different measurement strategy. Martin and Wolbrecht (2000) find
party and preference impacts on discharge petition bargaining on two petitions in the pre-Reform House.
Yet, to date, little has been done to explain the frequency of discharge petition use. To gauge the importance
of the discharge procedure as an auditing mechanism, it is important to understand the frequency of its use.⁴
To wit, I have collected data from Beth (1990) which contains the number of discharge petitions filed in
each Congress from the 61st to the 105th (n = 45). This is event count data, and thus the two models presented above are appropriate. This dataset has an extremely small sample size. One advantage of Bayesian
inference over classical inference is that one does not need to rely on asymptotic theory to make claims
about statistical significance. Indeed, if one were to rely on maximum likelihood estimation for this dataset,
the standard errors would be of dubious value. By adopting a Bayesian approach, however, I am able to
simulate directly from the posterior density without relying on an asymptotic justification.
The dependent variable $y_i$ is simply the number of discharge petitions filed with the Clerk of the House in
each Congress since the inception of the rule. I have chosen a set of theoretically important explanatory variables. First is chamber preference heterogeneity, which I operationalize using the standard deviation of the
D-Nominate scores for each Congress (Poole and Rosenthal, 1997). The more heterogeneous the chamber,
the more likely members are to be dissatisfied with committee and/or party leadership consideration
of legislation, and thus to file more discharge petitions, holding all else constant. Divided government is also
likely to impact the number of discharge petitions filed. Since the purpose of the discharge procedure is to
ultimately effect policy change, petitions are less likely to be filed when the president and the majority party
in the House are of different parties. I thus include a dummy variable that designates divided government.
The number of standing committees is also likely to impact the number of discharge petitions filed; the
more committees, the more opportunity for members to be dissatisfied with committee behavior. I include
the number of standing committees as an explanatory variable. Note that this variable captures the Legislative Reorganization Act of 1946, after which the number of committees drops substantially, from the forties to
the twenties. The number of committees should positively impact the number of discharge petitions filed.
Partisanship is also likely to impact the frequency of discharge petition filing. The larger the majority party,
the more likely majority members will be dissatisfied with their leadership, and thus will be more likely to
file petitions to audit their leadership. I include the majority party size, which equals the seat differential
between the majority and minority party, as an explanatory variable. The final variable is a dummy variable which takes a value of one between 1910 and 1931, and zero thereafter. This takes into account the
major rule change that occurred in 1931, which made the procedure much easier to use (Beth, 1990). The
expectation is that in this rule regime, the number of discharge petitions filed will be lower, all things being
equal.
4
The procedure might, however, play an equally important institutional role when it is not used, by keeping outlier committees from
“misbehaving.” Assessing the importance of the procedure thus requires a rich theoretical understanding, supplemented with data
analysis.

                              Equidispersion Model              Overdispersion Model
                              Post.   Post.   90% BCI           Post.   Post.   90% BCI
Parameter                     Mean    SD      Lower   Upper     Mean    SD      Lower   Upper
β1 - Constant                 2.822   0.038   2.745   2.896     2.639   0.114   2.446   2.825
β2 - Preference Heterogeneity 0.689   0.033   0.624   0.754     0.646   0.205   0.322   1.002
β3 - Divided Government      -0.125   0.041  -0.205  -0.044    -0.082   0.138  -0.310   0.141
β4 - Number of Committees     0.303   0.063   0.185   0.427     0.458   0.213   0.105   0.810
β5 - Majority Party Size      0.165   0.035   0.095   0.232     0.150   0.126  -0.059   0.356
β6 - Rule Regime             -0.454   0.054  -0.559  -0.346    -0.731   0.210  -1.076  -0.384
τ (Precision)                 n.a.                              2.053   0.556   1.244   3.059
τ⁻¹ (Variance)                n.a.                              0.525   0.149   0.327   0.803
Burn In Iterations            10000                             10000
MCMC Iterations               25000                             25000
n                             45                                45

Table 1: Summary of the posterior densities from Poisson regressions of discharge petition filings, 1910-1998.
The equidispersion model is written in Equation 1, and the overdispersion model is written in Equation
2. The abbreviation Post. Mean denotes the posterior mean, Post. SD the posterior standard deviation, and
90% BCI the 90% Bayesian Credible Interval. This interval summarizes the central 90% of the posterior
density. Noninformative priors for the $\beta_j$ and precision $\tau$ are assumed: $\beta_j \sim \text{Normal}(0, 10^4)$ and $\tau \sim
\text{Gamma}(0.001, 0.001)$.

I have fit a Poisson regression and a Poisson regression with overdispersion to these data.⁵ The results are
presented in Table 1. In this table, I report the posterior mean and standard deviation for each coefficient.
Each of these can be interpreted like the familiar parameter estimates and standard errors in likelihood
inference. I also report the 90% Bayesian Credible Intervals (BCI), which summarize the central 90% of
the posterior densities. I have chosen 90% as my threshold for significance because of the small sample
size. For the Poisson regression model, I note that all of the coefficients are in the hypothesized direction,
and all are significantly different from zero. For the overdispersion model, however, this is no longer the
case. First, it is important to note that overdispersion is likely a problem in these data. Indeed, the variance of
the random effect ($\tau^{-1}$) has a 90% BCI of [0.327, 0.803]. For the overdispersion model, preference heterogeneity,
the number of committees, and the rule regime variable remain significant. 72.5% of the posterior
sample for the divided government coefficient is negative, and 88.1% of the posterior sample for the majority
party size coefficient is positive. This is weak support for these hypotheses, but not conclusive. One nice
aspect of interpreting results from Bayesian models is that one
can state the precise probability that a parameter is greater than or less than zero.⁶ The purpose of this analysis is to demonstrate how to fit simple
cross-sectional models in a Bayesian framework. The appendix contains sample BUGS code used for these
models.
5
Because these are time series data, there may be an autoregressive error structure. However, upon examining the data and
results from linear regression models, this does not seem to be a problem. Even if it were, the small sample size makes alternative
models computationally infeasible. Brandt et al. (2000) provide a survey of time series models that can be used for event count
data.
6
In the classical paradigm, all parameters are fixed but unobservable. Thus, the probability that a parameter is positive equals
zero or one; it just cannot be observed. This requires mental gymnastics to interpret confidence intervals. 90% Bayesian Credible
Intervals, on the other hand, can simply be interpreted as intervals that contain the parameter with 90% posterior probability.
Women’s Rights Cosponsorships in the 92nd House
In the second case study I turn my attention to cosponsorship on women’s rights legislation in the 92nd
House of Representatives. Because it was the Congress that passed the Equal Rights Amendment, the 92nd
is particularly important. In this example I employ data collected and previously analyzed by Wolbrecht
(2000a,b). See Wolbrecht (2000b) for a detailed description of the data collection and coding scheme. The
dependent variable $y_i$ is the number of pro-women’s rights bills cosponsored by each member of the 92nd
House. Thus, the member of Congress is the unit of analysis. I am interested in seeing what explains these
counts of congressional behavior.
The legislative studies literature suggests that cosponsorship is primarily a preference based activity geared
toward position taking (Campbell, 1982; Regens, 1989) and intra-legislative politics (Schiller, 1995; Kessler
and Krehbiel, 1996). While the implications of these explanations for the timing of cosponsorships differ,
the implication after the intra-legislative game has played out is clear: members tend to cosponsor legislation close to their preferred policy position. Thus, I include a measure of general left-right ideology as an
explanatory variable: liberal members should cosponsor more women’s rights legislation than conservative
members. I include each member’s D-Nominate first dimension score as a measure of left-right tendency
(Poole and Rosenthal, 1997).⁷ It is also likely that partisanship may impact women’s rights cosponsorships.
Indeed, majority party leaders may compel their members to cosponsor legislation (Cox and McCubbins,
1993) as a tool for highlighting bill importance to constituents. I thus include a party dummy variable as an
explanatory variable (that takes a value one for Democrats, and zero for Republicans). Wolbrecht (2000b)
argues that prior to the passage of the ERA, the Republican party was more supportive of women’s rights
than the Democratic party. Thus, I expect Republicans to be more supportive of the ERA after controlling
for policy preferences. Finally, I hypothesize that gender may impact the number of women’s rights cosponsorships. Simply put, because of their greater awareness of and interest in women’s rights issues, I expect
women to be more likely to cosponsor women’s rights legislation than men. I code the gender dummy
variable one for women and zero for men.
7
Clearly there are measures that better capture women’s rights preferences (see Martin and Wolbrecht, 2000). However, I employ
these scores here because they are available for a long period of time. This becomes particularly important for the hierarchical
model, discussed below.

                              Equidispersion Model              Overdispersion Model
                              Post.   Post.   95% BCI           Post.   Post.   95% BCI
Parameter                     Mean    SD      Lower   Upper     Mean    SD      Lower   Upper
β1 - Constant                -0.672   0.130  -0.937  -0.426    -0.847   0.151  -1.150  -0.560
β2 - Preferences             -2.712   0.256  -3.217  -2.201    -2.581   0.308  -3.177  -1.979
β3 - Party                   -0.113   0.180  -0.458   0.245    -0.083   0.203  -0.476   0.315
β4 - Gender                   1.194   0.165   0.862   1.513     1.194   0.262   0.676   1.711
τ (Precision)                 n.a.                              3.104   1.140   1.618   6.058
τ⁻¹ (Variance)                n.a.                              0.360   0.115   0.165   0.617
Burn In Iterations            10000                             10000
MCMC Iterations               25000                             25000
n                             440                               440

Table 2: Summary of the posterior densities from Poisson regressions of cosponsorships on women’s rights
legislation in the 92nd House. The equidispersion model is written in Equation 1, and the overdispersion
model is written in Equation 2. The abbreviation Post. Mean denotes the posterior mean, Post. SD the
posterior standard deviation, and 95% BCI the 95% Bayesian Credible Interval. This interval summarizes
the central 95% of the posterior density. Noninformative priors for the $\beta_j$ and precision $\tau$ are assumed:
$\beta_j \sim \text{Normal}(0, 10^4)$ and $\tau \sim \text{Gamma}(0.001, 0.001)$.

In Table 2, I present results from the Poisson regressions of women’s rights cosponsorships on the explanatory variables noted above. Here I report 95% Bayesian Credible Intervals because of the larger sample size.
For both the equidispersion and overdispersion models preferences are statistically significant; the more
liberal the member of Congress, the more women’s rights bills they are likely to cosponsor. Gender is also
statistically significant in both models: women tend to cosponsor at a higher rate than men. In both models, party is insignificant after controlling for preferences. This suggests that this position-taking behavior
seems to be primarily preference driven. There seems to be slight overdispersion in this model, although its
magnitude is not as large as that in the discharge petition model.
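
The summaries reported in Tables 1 and 2 (posterior mean, posterior standard deviation, central credible interval, and the share of the posterior on one side of zero) are all simple functions of the MCMC draws. The sketch below shows one way to compute them. Because the actual chains are not reproduced here, it uses stand-in draws from a normal distribution matched to the posterior mean and SD of the divided government coefficient in the overdispersion model of Table 1; that normal approximation, and the function name `summarize`, are assumptions for illustration only.

```python
import random
import statistics

def summarize(draws, level=0.90):
    """Posterior mean, SD, central credible interval, and Pr(parameter > 0)."""
    s = sorted(draws)
    n = len(s)
    tail = (1.0 - level) / 2.0
    return {
        "mean": statistics.mean(s),
        "sd": statistics.stdev(s),
        "lower": s[int(tail * n)],                        # empirical lower quantile
        "upper": s[min(n - 1, int((1.0 - tail) * n))],    # empirical upper quantile
        "prob_positive": sum(d > 0 for d in s) / n,
    }

# Stand-in draws (assumption): Normal(-0.082, 0.138), matching the Table 1 summary.
rng = random.Random(0)
draws = [rng.gauss(-0.082, 0.138) for _ in range(25000)]

out = summarize(draws)
print(out)
```

With real output, `draws` would simply be the retained post-burn-in iterations for one parameter.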
Posterior Predictive Densities
[Figure 1: Boxplots of posterior predictive densities for hypothetical members of the 92nd House. These densities are for the equidispersion model. Con denotes conservative, Mod denotes moderate, and Lib denotes liberal. Party membership is represented D for Democrats and R for Republicans. Gender is denoted M for men and W for women. Vertical axis: Cosponsorships (0 to 20); horizontal axis: Hypothetical Member (ConDM, ConDW, ConRM, ConRW, LibDM, LibDW, LibRM, LibRW, ModDM, ModDW, ModRM, ModRW).]
When performing any statistical analysis it is always useful to take coefficient estimates and transform them in a manner that is substantively meaningful. Perhaps the most effective way to do this is to use the parameter estimates to predict out-of-sample behavior. In the Bayesian framework, one is interested in the posterior predictive density. For some out-of-sample observations ỹ, the posterior predictive distribution for a Poisson regression model is simply:
$$f(\tilde{y} \mid y) = \int f(\tilde{y}, \beta \mid y)\, d\beta = \int f(\tilde{y} \mid \beta)\, f(\beta \mid y)\, d\beta$$
This average of the conditional predictions f (ỹ|β) over the posterior distribution f (β|y) can be evaluated
using MCMC techniques. In fact, the distribution of any function of β can be estimated using MCMC
methods. Interestingly, this is the precise approach used by King et al. (2000) in the context of frequentist
inference.
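The Monte Carlo average described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the posterior draws of β are stand-in values generated from normal distributions (they are not the estimates reported in this paper), and the helper names `draw_poisson` and `posterior_predictive` are hypothetical. For each draw of β, one count is simulated from f(ỹ|β), so the collection of simulated counts approximates f(ỹ|y).

```python
import math
import random


def draw_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def posterior_predictive(x_new, beta_draws, rng):
    """Simulate one out-of-sample count per posterior draw of beta."""
    simulated = []
    for beta in beta_draws:
        linear_predictor = sum(x * b for x, b in zip(x_new, beta))
        simulated.append(draw_poisson(math.exp(linear_predictor), rng))
    return simulated


rng = random.Random(2000)

# Stand-in posterior draws (hypothetical values, NOT the paper's estimates).
# Columns: constant, preferences, party, gender.
beta_draws = [
    [rng.gauss(0.5, 0.05), rng.gauss(-2.0, 0.10),
     rng.gauss(0.0, 0.05), rng.gauss(1.0, 0.10)]
    for _ in range(5000)
]

# Hypothetical member: constant = 1, D-Nominate = -0.627, party = 0, gender = 1.
x_new = [1.0, -0.627, 0.0, 1.0]
y_tilde = posterior_predictive(x_new, beta_draws, rng)
predictive_mean = sum(y_tilde) / len(y_tilde)
```

In practice the rows of `beta_draws` would be the retained MCMC iterations, and any summary of `y_tilde` (mean, quantiles, a boxplot) summarizes the posterior predictive density.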
To wit, I have computed the posterior predictive densities for hypothetical members of Congress. I am specifically interested in liberal, moderate, and conservative members of both parties and both genders. I define a typical liberal as someone two standard deviations below the sample mean on the D-Nominate scale (-0.627), a moderate equal to the sample mean (-0.027), and a conservative as someone two standard deviations above the sample mean (0.573). I plot these densities in Figure 1. Many important patterns can be discerned from this figure. First, liberal women, whether Democrats or Republicans, tend to cosponsor women’s rights legislation most. The posterior predictive mean for both is approximately nine cosponsorships. After liberal women, liberal men cosponsor women’s rights legislation next most often, although there is a definite drop-off to a little over two cosponsorships. Moderate women of both parties are next most likely to cosponsor, followed by moderate men. As the statistical results demonstrate, there is little difference between members of each party. Preferences and gender seem to be driving cosponsorship behavior in the 92nd House.
Modeling Heterogeneous Event Counts: Hierarchical Poisson Regression
So far I have presented results from commonly used models that can be estimated in standard software packages using maximum likelihood estimation. Besides philosophical claims about the importance of subjective probability, what is the advantage of Bayesian inference? In this section, I argue that the advantage of Bayesian approaches generally, and hierarchical (or multilevel) models specifically, is that they allow extremely flexible modeling of heterogeneous data. Many of these models are computationally impossible to estimate (or very nearly so) in classical settings. Moreover, to estimate these using frequentist methods both the number of observations and the number of clusters must approach infinity to trust the standard errors. In most settings, neither of these conditions is met. Bayesian inference using MCMC, on the other hand, provides a computationally efficient way to estimate hierarchical models for heterogeneous data.
Assume now that we observe data from k different contexts (or clusters). Within each cluster we have observations $i = 1, \ldots, n_k$. Note that there may be a different number of observations $n_k$ in each cluster. The dependent variable $y_{i,k} \in \{0, 1, 2, \ldots\} = Z^*$ is an event count variable as discussed above. What sort of model should we apply to these data? One approach would be to pool all of the observations into one sample. This would yield
$$y_{i,k} \sim \mathrm{Poisson}(\lambda_{i,k})$$
$$\lambda_{i,k} = \exp(x_{i,k}\beta)$$
which is the Poisson regression model discussed above. For this model, however, there is an implicit assumption that an individual i with the same vector of characteristics $x_{i,k}$ would behave the same in each decision context k. That is, pooling the data requires one to assume behavioral homogeneity. If individuals are hypothesized to behave differently in different contexts, pooling data in this manner is inappropriate. In some applications this assumption is justifiable, yet what happens if contextual factors impact behavior?
An alternative would be to estimate independent models for each context k. So, one would estimate
$$y_{i,k} \sim \mathrm{Poisson}(\lambda_{i,k})$$
$$\lambda_{i,k} = \exp(x_{i,k}\beta_k)$$
which is simply k separate Poisson regressions. This is a viable strategy, but it throws the baby out with the bathwater. If contextual factors are important, one would ideally model those contextual effects as opposed to ignoring them by estimating individual models. By dividing the data in this fashion, drawing inferences about contextual factors is impossible.
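The contrast between the two baselines can be seen in a deliberately simplified sketch. It assumes an intercept-only model, for which the maximum likelihood estimate of β is just the log of the sample mean, and it uses made-up counts for three hypothetical contexts (not data from this paper). The pooled fit forces one rate on every context; the separate fits match each context but say nothing about why the contexts differ.

```python
import math

# Hypothetical counts for three contexts (illustrative only).
counts_by_context = {
    "A": [0, 0, 1, 0, 1],
    "B": [2, 3, 1, 4, 2],
    "C": [9, 7, 12, 8, 10],
}


def intercept_mle(counts):
    """MLE of beta in y ~ Poisson(exp(beta)): the log of the sample mean."""
    return math.log(sum(counts) / len(counts))


def poisson_loglik(counts, beta):
    """Poisson log-likelihood of the counts at intercept beta."""
    lam = math.exp(beta)
    return sum(y * beta - lam - math.lgamma(y + 1) for y in counts)


# Pooled model: one beta for all contexts.
all_counts = [y for counts in counts_by_context.values() for y in counts]
pooled_beta = intercept_mle(all_counts)
pooled_ll = poisson_loglik(all_counts, pooled_beta)

# Independent models: one beta_k per context.
separate_betas = {k: intercept_mle(v) for k, v in counts_by_context.items()}
separate_ll = sum(
    poisson_loglik(v, separate_betas[k]) for k, v in counts_by_context.items()
)
```

The separate fits always attain a log-likelihood at least as high as the pooled fit, which is why fit alone cannot adjudicate between them; the substantive question is whether the differences across contexts can themselves be modeled.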
The common ground between these approaches is the Bayesian hierarchical model. This modeling strategy assumes that there are some commonalities between each context k, but that there may exist profound differences. The approach allows the practitioner to explicitly model these differences. In so doing, one is able to draw inferences that dominate fully pooled and completely separated models on a mean square error basis (Efron and Morris, 1973). While common in the statistics literature, hierarchical models have only been sporadically used in the political science literature. Two notable applications are Western (1998), who uses a hierarchical model to model political determinants of economic growth, and King et al. (1999), who use hierarchical models to perform ecological inference. Hierarchical models are similar to the idea of fractional pooling (Bartels, 1996), where each cluster can “borrow strength” from other clusters to yield highly efficient parameter estimates. Hierarchical models are not unique to the Bayesian approach. Jones and Steenbergen (1997) provide a comprehensive introduction to multilevel modeling from a classical standpoint. The major difficulty with hierarchical models is that frequentist inference is computationally difficult, and is based on assumptions that are untenable in most applied settings (namely, that $n_k \to \infty$ and $k \to \infty$).
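The idea of "borrowing strength" can be illustrated with a small sketch. It assumes a normal approximation, which is not the Poisson model fit in this paper: a cluster-specific estimate with known sampling variance is combined with a common prior mean by precision weighting. The function name `partial_pooling` and all numeric inputs are hypothetical.

```python
def partial_pooling(estimate, sampling_var, prior_mean, prior_precision):
    """Precision-weighted compromise between a cluster estimate and a prior mean.

    Assumes estimate ~ Normal(theta, sampling_var) and
    theta ~ Normal(prior_mean, 1 / prior_precision).
    """
    data_precision = 1.0 / sampling_var
    weight = data_precision / (data_precision + prior_precision)
    return weight * estimate + (1.0 - weight) * prior_mean


# A noisily estimated cluster is pulled noticeably toward the common mean ...
noisy = partial_pooling(estimate=-3.0, sampling_var=0.25,
                        prior_mean=0.0, prior_precision=1.0)

# ... while a precisely estimated cluster barely moves.
precise = partial_pooling(estimate=1.5, sampling_var=0.004,
                          prior_mean=0.0, prior_precision=1.0)
```

Clusters with little information lean on the other clusters through the common mean, while well-measured clusters are left nearly untouched.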
The hierarchical model looks strikingly like the previous equation; however, now we explicitly model the distribution of $\beta_k$, the column vector of parameters in each decision context. The first level of the model is within each cluster, and the second level includes contextual variables. The hierarchical Poisson regression model is thus:
$$
\begin{aligned}
y_{i,k} &\sim \mathrm{Poisson}(\lambda_{i,k}) \\
\lambda_{i,k} &= \exp(x_{i,k}\beta_k) \\
\beta_{k,1} &\sim \mathrm{Normal}(z_k\alpha_1, \xi_1^{-1}) \\
\beta_{k,2} &\sim \mathrm{Normal}(z_k\alpha_2, \xi_2^{-1}) \\
&\;\;\vdots \\
\beta_{k,j} &\sim \mathrm{Normal}(z_k\alpha_j, \xi_j^{-1})
\end{aligned}
\tag{3}
$$
In this formulation, I have assumed that each individual element of $\beta_k$ can be modeled by independent regressions with a row vector of cluster-level covariates $z_k$, of dimensionality $(1 \times p_j)$. Note that each of these regressions need not include the same number of explanatory variables.8 Notice that the second-level specification serves as the priors for the first-level parameters. Thus, one only needs to specify the priors on the $\alpha_{j,1}$ to $\alpha_{j,p_j}$ for all j and the $\xi_j$ parameters. In this application, I assume standard priors: $\alpha_{*,*} \sim \mathrm{Normal}(\mu_{0,*,*}, \sigma^2_{0,*,*})$ and for the precisions: $\xi_j \sim \mathrm{Gamma}(\nu_{0,j}, \delta_{0,j})$.
The advantage of this model is clear. One can look at the posterior density of the parameters $\beta_k$ to see how the characteristics $x_{i,k}$ impact behavior within each cluster k. And, one can look at the parameters $\alpha_{*,*}$ to see how contextual factors impact behavior at the first level. In short, this modeling strategy allows one to model and test for various causes of behavioral heterogeneity. The hierarchical Poisson regression model is thus useful when modeling heterogeneous event counts.
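One way to make the two levels of Equation 3 concrete is to simulate from them. The sketch below is a generative illustration only, not an estimation routine: it assumes every element of β_k shares the same cluster-level covariates z_k (the paper allows them to differ), and every numeric value, along with the names `draw_poisson` and `simulate`, is hypothetical.

```python
import math
import random


def draw_poisson(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method; fine for moderate lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate(z, alpha, xi, x, rng):
    """Simulate (beta_k, y_k) for every cluster k from the two-level model."""
    clusters = []
    for z_k, x_k in zip(z, x):
        # Second level: beta_{k,j} ~ Normal(z_k alpha_j, 1 / xi_j)
        beta_k = [
            rng.gauss(sum(zv * a for zv, a in zip(z_k, alpha_j)), xi_j ** -0.5)
            for alpha_j, xi_j in zip(alpha, xi)
        ]
        # First level: y_{i,k} ~ Poisson(exp(x_{i,k} beta_k))
        y_k = [
            draw_poisson(math.exp(sum(xv * b for xv, b in zip(x_i, beta_k))), rng)
            for x_i in x_k
        ]
        clusters.append((beta_k, y_k))
    return clusters


rng = random.Random(83)
z = [[1, 0], [1, 0], [1, 1], [1, 1]]       # constant plus one context dummy
alpha = [[-1.0, 2.0], [-2.0, 0.0]]         # hypothetical second-level coefficients
xi = [4.0, 4.0]                            # hypothetical precisions
x = [[[1.0, rng.uniform(-0.6, 0.6)] for _ in range(50)] for _ in z]
simulated = simulate(z, alpha, xi, x, rng)
```

Estimation runs in the opposite direction: given the observed counts, MCMC recovers the posterior of the β_k, the α, and the ξ_j jointly.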
Others have applied hierarchical event count models to various sorts of data. Albert (1992) fits a random effects Poisson model to home run data to detect contextual factors that impact various seasons of twelve baseball players. Christiansen and Morris (1997) develop a hierarchical Poisson model with a different parameterization than that above, and provide software one can use in S-Plus to simulate from the posterior density. Chib et al. (1998) posit an MCMC algorithm one can use to estimate a random effects panel event count model using a unique parameterization, and apply the model to epilepsy and patent data. Berry et al. (1999) develop an interesting set of hierarchical models that can be used to compare players in different eras of sports. For their hockey data, they offer a hierarchical Poisson model that includes an elaborate clustering scheme, and a random curve function to allow for aging. Scollnik (N.D.) discusses actuarial models, and provides BUGS code for some hierarchical event count models. While these models are well traveled in applied statistics, their potential has been untapped in political science applications.
8 A more general model would be to model the distribution of the $\beta_k$ vectors with a multivariate regression model: $\beta_k \sim N_j(A_k\alpha, \Omega)$. However, when there are many first-level parameters and few clusters, it becomes difficult, if not impossible, to estimate $\Omega$ with precision. In the application below, with four first-level covariates and twenty contexts, $\Omega$ has ten free elements, which would be impossible to estimate.
Heterogeneity in Women’s Rights Cosponsorships in the House, 1953-1992
Earlier I fit a cross-sectional model to all women’s rights cosponsorships in the 92nd House. Now I turn my attention to similar data for the 83rd to the 102nd Congress.9 This dataset contains the number of women’s rights bills cosponsored by every member in each of these Congresses. Clearly contextual factors, such as the national political agenda, congressional rules, and chamber characteristics, would induce profoundly different behavior from a member of Congress with the identical policy preference of the same party and gender. In this section, I fit a hierarchical Poisson regression to model behavioral heterogeneity in these women’s rights cosponsorship data. Not only am I interested in explaining behavior within a particular Congress, I am also interested in how the macropolitical factors discussed below impact congressional behavior.
Returning to the notation used in Equation 3, here we observe behavior for k = 20 Congresses (or clusters). Within each House, there are $i = 1, \ldots, n_k$ members. Without member replacement within each Congress (and without the brief change in the size of the House in the late 1950s), $n_k = 435$ for all k. In practice, there are approximately 440 members in each cluster, although the number differs across Congresses. We thus observe counts $y_{i,k} \in Z^*$, one for each member i in each Congress k. I use the same explanatory variables $x_{i,k}$: policy preferences measured with first dimension D-Nominate scores, a party dummy, and a gender dummy.
The Baselines: Pooled and Independent Poisson Regressions
Perhaps congressional cosponsorship behavior is not influenced by contextual factors. If that is the case we
could pool all observations into a single sample and estimate one vector of parameters β. I have done just
that, and present results in Table 3. Looking at the BCIs, the results are quite straightforward: liberals tend
to cosponsor at a higher rate than conservatives; when controlling for preferences Republicans are more
likely to cosponsor than Democrats; and women are more likely to cosponsor than men.
However, the assumption of homogeneity is theoretically untenable. So, to serve as a second baseline, I have estimated a set of independent Poisson regressions, one for each Congress k. I present the results in Table 4. Two things are to be gleaned from this table. First, when examining the constants across each decision context, it is clear that they vary: the posterior mean takes its lowest value of −2.971 in the 83rd House, and its maximum value of 1.738 in the 99th House. Substantively, exponentiating these constants gives us the expected number of bills cosponsored for a moderate Republican man. For the 83rd, we would expect this individual to cosponsor 0.05 bills, but in the 99th to cosponsor 5.7 bills. Clearly, the assumption
of homogeneity is not met. Further, when examining the importance of preferences across Congresses, we note that preferences significantly impact behavior in all but the 86th House. Yet, the magnitude of the relationship varies throughout the time period, again implying behavioral heterogeneity. The same story holds for the party and gender dummies: the $\beta_{k,3}$ and $\beta_{k,4}$ coefficients vary across clusters. The weakness of this model is that while heterogeneity is apparent, we can neither model it nor test for contextual effects. If the goal of social science is to evaluate competing social explanations, it is necessary to employ statistical models that allow one to test contextual explanations. I thus turn my attention to a hierarchical Poisson regression model, which does just this.
9 This data was generously provided by Christina Wolbrecht. She discusses the manner in which it is collected and presents much of the data in her book (Wolbrecht, 2000b).

Parameter            Post. Mean   Post. SD   95% BCI Lower   95% BCI Upper
β1 Constant               0.367      0.020           0.329           0.406
β2 Preferences           -3.128      0.041          -3.210          -3.049
β3 Party                 -0.890      0.031          -0.953          -0.831
β4 Gender                 1.193      0.025           1.145           1.243

Burn In Iterations        10000
MCMC Iterations           25000
N                           440

Table 3: Summary of the posterior density of a Poisson regression of cosponsorships on women’s rights legislation for the 83rd to the 102nd House pooled data. The model is written in Equation 1. The abbreviation Post. Mean denotes the posterior mean, Post. SD the posterior standard deviation, and 95% BCI the 95% Bayesian Credible Interval. This interval summarizes the central 95% of the posterior density. Noninformative priors for the $\beta_j$ are assumed: $\beta_j \sim \mathrm{Normal}(0, 10^4)$.
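The step from a posterior mean of a constant to an expected count is a single exponentiation, because the Poisson regression uses a log link. The short check below uses the two constants quoted from Table 4 in the text; the variable names are illustrative, and plugging in posterior means is only an approximation to the full posterior of the expected count.

```python
import math

# Posterior means of the constant quoted in the text from Table 4.
constant_lowest = -2.971    # reported for the 83rd House
constant_highest = 1.738    # largest reported constant

# With all covariates at zero, the expected count is exp(constant).
expected_lowest = math.exp(constant_lowest)     # roughly 0.05 bills
expected_highest = math.exp(constant_highest)   # roughly 5.7 bills

# The implied baseline differs by about two orders of magnitude.
ratio = expected_highest / expected_lowest
```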
Results: Hierarchical Poisson Regression
The hierarchical model presented earlier allows one to model heterogeneity with suitable contextual variables. Clearly members of Congress have behaved quite differently in different Congresses. What explains this heterogeneity? I hypothesize that four contextual factors are likely to impact cosponsorship behavior on women’s rights. First, I hypothesize that the national agenda will impact cosponsorship; once women’s rights generally, and the Equal Rights Amendment specifically, are firmly a part of the national agenda, the number of cosponsorships should increase. This is derived from electoral considerations. Once women’s rights became supported by a majority, more members, regardless of preferences, party, and gender, would tend to cosponsor legislation. As argued by Wolbrecht (2000b), the women’s rights agenda was not firmly entrenched on the national political agenda until the 92nd Congress, the Congress that passed the Equal Rights Amendment. Second, congressional rules are also likely to impact congressional behavior. Throughout this time period, cosponsorship rules were liberalized. Until 1967, cosponsorship was not allowed in the House. Thus, members introduced identical legislation with their name as primary sponsor. In 1967 the rule was changed to allow for multiple cosponsorship, allowing up to 25 members to cosponsor each bill. This did not catch on for many years, and most members continued to introduce individual bills. This changed in 1978, when the rules were again changed to allow for unlimited cosponsorship. The second rule change was widely accepted by the membership, and began to be used immediately. Thus, after the rules change in the 95th House, I expect cosponsorship behavior for all members to increase.
These first two contextual factors are hypothesized to explain the baseline level of cosponsorship for all members. In terms of the hierarchical model, they should thus impact the constant. For the 92nd Congress and thereafter, the constant $\beta_{k,1}$ should be bigger than before due to the change in the women’s rights agenda. Similarly, after the rules change in the 95th House, we would expect the baseline level of cosponsorships to be higher, all things being equal. I thus include two dummy variables at the second level of the hierarchy, one that takes the value one for the 92nd to 95th House, and one that takes the value one from the 96th to 102nd House. I use these variables to model the distribution of the constant as:
$$\beta_{k,1} = \alpha_{1,1} + \alpha_{1,2}\,\text{Post 91st Dummy}_k + \alpha_{1,3}\,\text{Post 95th Dummy}_k + \varepsilon_k, \qquad \varepsilon_k \sim \mathrm{Normal}(0, \xi_1^{-1})$$
The hypothesis is that $\alpha_{1,2}$ should be positive, indicating an increased level of women’s rights cosponsorships for all members. $\alpha_{1,3}$ should also be positive, and should have a magnitude higher than $\alpha_{1,2}$, since it captures the agenda effect and the effect of the rules change.
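The coding of the two second-level dummies can be written out explicitly. The sketch below follows the coding described above (one dummy for the 92nd to 95th House, one for the 96th to 102nd); the function names and the α values passed to `expected_constant` are hypothetical placeholders, not estimates.

```python
def second_level_row(congress):
    """Return z_k = (1, post-91st dummy, post-95th dummy) for a Congress number."""
    post_91st = 1 if 92 <= congress <= 95 else 0
    post_95th = 1 if 96 <= congress <= 102 else 0
    return (1, post_91st, post_95th)


def expected_constant(congress, alpha):
    """Second-level mean of beta_{k,1}: z_k multiplied by alpha_1."""
    return sum(z * a for z, a in zip(second_level_row(congress), alpha))


# Design rows for the twenty Congresses in the data (83rd to 102nd).
design = {congress: second_level_row(congress) for congress in range(83, 103)}

# Hypothetical alpha_1 = (alpha_{1,1}, alpha_{1,2}, alpha_{1,3}) for illustration.
alpha_1 = (-1.0, 0.5, 1.5)
means = {c: expected_constant(c, alpha_1) for c in (91, 92, 96)}
```

Because the two dummies never equal one in the same Congress, α1,3 measures the total shift relative to the pre-92nd baseline, which is why it is expected to exceed α1,2.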
I hypothesize that two additional contextual factors impact women’s rights cosponsorships. First, I expect that chamber heterogeneity should impact the importance of preferences. I measure chamber heterogeneity by computing the standard deviation of the D-Nominate first dimension scores for the chamber. The more heterogeneous the chamber, the smaller the magnitude of the preference coefficient $\beta_{k,2}$ should be, and vice versa. As preference heterogeneity increases in the chamber, the agenda for all members grows larger. Additionally, effecting policy change on any issue is more difficult in a heterogeneous chamber. It is
Parameter            Post. Mean   Post. SD   95% BCI Lower   95% BCI Upper
β1,1 Constant            -2.971      0.502          -4.026          -2.063
β2,1 Constant            -1.494      0.307          -2.116          -0.906
β3,1 Constant            -0.179      0.125          -0.424           0.064
β4,1 Constant            -1.810      0.251          -2.336          -1.344
β5,1 Constant            -0.432      0.149          -0.731          -0.150
β6,1 Constant            -0.724      0.166          -1.055          -0.404
β7,1 Constant            -0.755      0.166          -1.099          -0.438
β8,1 Constant            -0.620      0.129          -0.880          -0.375
β9,1 Constant            -0.266      0.103          -0.468          -0.066
β10,1 Constant           -0.662      0.125          -0.915          -0.421
β11,1 Constant            0.061      0.089          -0.121           0.229
β12,1 Constant           -0.767      0.156          -1.089          -0.477
β13,1 Constant            0.372      0.091           0.183           0.544
β14,1 Constant           -0.905      0.180          -1.267          -0.566
β15,1 Constant            0.391      0.089           0.211           0.562
β16,1 Constant            1.738      0.060           1.620           1.857
β17,1 Constant            1.230      0.077           1.074           1.384
β18,1 Constant            1.104      0.074           0.957           1.247
β19,1 Constant            0.869      0.059           0.750           0.981
β20,1 Constant            1.654      0.059           1.538           1.769
β1,2 DNominate           -2.578      1.083          -4.737          -0.502
β2,2 DNominate           -3.271      0.766          -4.798          -1.800
β3,2 DNominate           -0.764      0.328          -1.424          -0.117
β4,2 DNominate           -0.083      0.565          -1.187           1.028
β5,2 DNominate           -1.655      0.416          -2.452          -0.839
β6,2 DNominate           -2.048      0.460          -2.938          -1.119
β7,2 DNominate           -1.283      0.440          -2.128          -0.416
β8,2 DNominate           -1.415      0.329          -2.066          -0.780
β9,2 DNominate           -0.841      0.281          -1.398          -0.300
β10,2 DNominate          -2.718      0.258          -3.216          -2.215
β11,2 DNominate          -4.336      0.231          -4.784          -3.884
β12,2 DNominate          -4.725      0.334          -5.363          -4.075
β13,2 DNominate          -3.078      0.216          -3.502          -2.653
β14,2 DNominate          -4.083      0.433          -4.909          -3.250
β15,2 DNominate          -3.592      0.229          -4.026          -3.136
β16,2 DNominate          -3.446      0.163          -3.765          -3.126
β17,2 DNominate          -3.156      0.187          -3.523          -2.781
β18,2 DNominate          -2.894      0.142          -3.166          -2.612
β19,2 DNominate          -2.881      0.122          -3.123          -2.639
β20,2 DNominate          -2.112      0.091          -2.288          -1.934
β1,3 Party               -0.737      0.841          -2.380           0.896
β2,3 Party               -1.551      0.573          -2.676          -0.445
β3,3 Party               -0.776      0.215          -1.196          -0.353
β4,3 Party               -0.248      0.376          -0.985           0.483
β5,3 Party               -1.287      0.267          -1.825          -0.775
β6,3 Party               -1.173      0.302          -1.757          -0.582
β7,3 Party               -0.975      0.279          -1.520          -0.413
β8,3 Party               -0.404      0.211          -0.821           0.009
β9,3 Party               -0.446      0.170          -0.780          -0.120
β10,3 Party              -0.126      0.177          -0.467           0.221
β11,3 Party              -1.114      0.142          -1.384          -0.824
β12,3 Party              -0.797      0.214          -1.207          -0.369
β13,3 Party              -0.574      0.133          -0.828          -0.310
β14,3 Party              -0.796      0.269          -1.318          -0.258
β15,3 Party              -0.679      0.140          -0.948          -0.398
β16,3 Party              -1.317      0.101          -1.516          -1.115
β17,3 Party              -1.056      0.126          -1.301          -0.805
β18,3 Party              -0.597      0.109          -0.808          -0.385
β19,3 Party              -1.034      0.109          -1.246          -0.822
β20,3 Party              -0.666      0.095          -0.853          -0.482
β1,4 Gender               2.188      0.517           1.082           3.121
β2,4 Gender               0.768      0.494          -0.304           1.657
β3,4 Gender               0.482      0.271          -0.093           0.984
β4,4 Gender               1.741      0.306           1.117           2.315
β5,4 Gender               0.964      0.261           0.436           1.459
β6,4 Gender               1.155      0.309           0.515           1.732
β7,4 Gender               0.643      0.400          -0.185           1.367
β8,4 Gender              -0.049      0.433          -0.973           0.711
β9,4 Gender               0.321      0.336          -0.393           0.929
β10,4 Gender              1.198      0.164           0.869           1.514
β11,4 Gender              1.150      0.117           0.916           1.373
β12,4 Gender              1.303      0.131           1.049           1.550
β13,4 Gender              1.157      0.102           0.952           1.352
β14,4 Gender              1.259      0.196           0.866           1.632
β15,4 Gender              0.597      0.131           0.337           0.847
β16,4 Gender              0.850      0.075           0.704           1.000
β17,4 Gender              0.849      0.092           0.666           1.026
β18,4 Gender              0.809      0.084           0.646           0.976
β19,4 Gender              1.265      0.059           1.147           1.380
β20,4 Gender              0.953      0.050           0.852           1.052

Burn In Iterations        10000
MCMC Iterations           25000
Clusters                     20
N                          8808

Table 4: Summary of the posterior density of independent Poisson regressions of cosponsorships on women’s rights legislation for the 83rd to the 102nd House. The abbreviation Post. Mean denotes the posterior mean, Post. SD the posterior standard deviation, and 95% BCI the 95% Bayesian Credible Interval. This interval summarizes the central 95% of the posterior density. Noninformative priors for the $\beta_j$ are assumed: $\beta_j \sim \mathrm{Normal}(0, 10^4)$.
Parameter            Post. Mean   Post. SD   95% BCI Lower   95% BCI Upper
β1,1 Constant            -2.558      0.249          -3.087          -2.106
β2,1 Constant            -1.733      0.193          -2.119          -1.362
β3,1 Constant            -0.183      0.103          -0.385           0.013
β4,1 Constant            -1.484      0.162          -1.798          -1.173
β5,1 Constant            -0.549      0.122          -0.792          -0.310
β6,1 Constant            -0.811      0.130          -1.065          -0.558
β7,1 Constant            -0.809      0.135          -1.079          -0.545
β8,1 Constant            -0.569      0.109          -0.792          -0.359
β9,1 Constant            -0.222      0.094          -0.408          -0.037
β10,1 Constant           -0.532      0.115          -0.768          -0.313
β11,1 Constant            0.030      0.087          -0.139           0.201
β12,1 Constant           -0.725      0.131          -0.992          -0.476
β13,1 Constant            0.394      0.083           0.227           0.556
β14,1 Constant           -0.823      0.142          -1.106          -0.551
β15,1 Constant            0.402      0.082           0.241           0.555
β16,1 Constant            1.694      0.059           1.580           1.813
β17,1 Constant            1.201      0.070           1.063           1.336
β18,1 Constant            1.122      0.071           0.978           1.258
β19,1 Constant            0.861      0.056           0.750           0.971
β20,1 Constant            1.656      0.056           1.546           1.765
β1,2 DNominate           -2.600      0.524          -3.657          -1.606
β2,2 DNominate           -2.523      0.428          -3.373          -1.709
β3,2 DNominate           -0.839      0.274          -1.371          -0.315
β4,2 DNominate           -0.695      0.405          -1.494           0.099
β5,2 DNominate           -1.369      0.334          -2.049          -0.729
β6,2 DNominate           -1.782      0.348          -2.459          -1.104
β7,2 DNominate           -1.178      0.346          -1.884          -0.514
β8,2 DNominate           -1.622      0.288          -2.186          -1.055
β9,2 DNominate           -1.064      0.261          -1.565          -0.551
β10,2 DNominate          -2.948      0.247          -3.443          -2.474
β11,2 DNominate          -4.193      0.216          -4.623          -3.764
β12,2 DNominate          -4.536      0.292          -5.131          -3.963
β13,2 DNominate          -3.108      0.200          -3.504          -2.719
β14,2 DNominate          -3.968      0.344          -4.651          -3.284
β15,2 DNominate          -3.638      0.210          -4.055          -3.213
β16,2 DNominate          -3.333      0.156          -3.649          -3.030
β17,2 DNominate          -3.102      0.168          -3.438          -2.769
β18,2 DNominate          -2.929      0.139          -3.199          -2.659
β19,2 DNominate          -2.849      0.115          -3.080          -2.626
β20,2 DNominate          -2.123      0.087          -2.293          -1.952
β1,3 Party               -0.876      0.257          -1.412          -0.379
β2,3 Party               -0.966      0.251          -1.495          -0.495
β3,3 Party               -0.809      0.168          -1.143          -0.479
β4,3 Party               -0.703      0.215          -1.116          -0.277
β5,3 Party               -1.048      0.190          -1.439          -0.683
β6,3 Party               -0.960      0.201          -1.377          -0.577
β7,3 Party               -0.885      0.193          -1.276          -0.522
β8,3 Party               -0.564      0.168          -0.888          -0.231
β9,3 Party               -0.585      0.150          -0.881          -0.296
β10,3 Party              -0.353      0.158          -0.653          -0.032
β11,3 Party              -1.011      0.127          -1.261          -0.762
β12,3 Party              -0.753      0.163          -1.074          -0.429
β13,3 Party              -0.605      0.119          -0.836          -0.369
β14,3 Party              -0.821      0.184          -1.182          -0.459
β15,3 Party              -0.714      0.123          -0.958          -0.470
β16,3 Party              -1.234      0.098          -1.429          -1.044
β17,3 Party              -1.008      0.111          -1.225          -0.790
β18,3 Party              -0.630      0.104          -0.831          -0.427
β19,3 Party              -1.000      0.102          -1.201          -0.802
β20,3 Party              -0.676      0.089          -0.846          -0.496
β1,4 Gender               1.201      0.272           0.718           1.803
β2,4 Gender               0.951      0.240           0.467           1.403
β3,4 Gender               0.745      0.190           0.336           1.090
β4,4 Gender               1.262      0.231           0.847           1.762
β5,4 Gender               0.981      0.186           0.612           1.343
β6,4 Gender               1.063      0.207           0.655           1.465
β7,4 Gender               0.882      0.225           0.409           1.304
β8,4 Gender               0.641      0.236           0.129           1.047
β9,4 Gender               0.705      0.212           0.248           1.077
β10,4 Gender              1.130      0.142           0.849           1.413
β11,4 Gender              1.122      0.107           0.911           1.331
β12,4 Gender              1.249      0.119           1.016           1.482
β13,4 Gender              1.133      0.095           0.944           1.320
β14,4 Gender              1.150      0.162           0.839           1.477
β15,4 Gender              0.678      0.120           0.435           0.903
β16,4 Gender              0.872      0.071           0.734           1.010
β17,4 Gender              0.870      0.085           0.697           1.035
β18,4 Gender              0.825      0.081           0.664           0.983
β19,4 Gender              1.251      0.059           1.135           1.366
β20,4 Gender              0.955      0.050           0.859           1.053
α1,1 Constant            -0.986      0.282          -1.546          -0.448
α1,2 Post-91st            0.776      0.513          -0.239           1.794
α1,3 Post-95th            1.853      0.427           0.986           2.683
α2,1 Constant            20.860      8.773           8.055          42.340
α2,2 Hetero.           -122.800     53.550        -256.800         -46.370
α2,3 PartyDiff           21.300     14.630          -3.775          57.420
α3,1 Constant            -0.811      0.077          -0.967          -0.664
α4,1 Constant             0.984      0.075           0.835           1.134
ξ1 (Precision)            1.631      0.583           0.679           2.936
ξ2 (Precision)            1.339      0.535           0.543           2.631
ξ3 (Precision)           15.920      8.324           5.474          36.910
ξ4 (Precision)           18.600     12.540           4.952          51.430

Burn In Iterations        10000
MCMC Iterations           25000
Clusters                     20
n                          8808

Table 5: Summary of the posterior density of a hierarchical Poisson regression of cosponsorships on women’s rights legislation for the 83rd to the 102nd House. The model is written in Equation 3. The abbreviation Post. Mean denotes the posterior mean, Post. SD the posterior standard deviation, and 95% BCI the 95% Bayesian Credible Interval. This interval summarizes the central 95% of the posterior density. Noninformative priors for the $\alpha_{*,*}$ and $\xi_j$ are assumed: $\alpha_{*,*} \sim \mathrm{Normal}(0, 10^4)$ and $\xi_j \sim \mathrm{Gamma}(0.001, 0.001)$.
I present the results from the hierarchical Poisson regression in Table 5. The results for the individual clusters are strikingly similar to those for the independent Poisson regressions. The interesting findings are in the hyperparameters $\alpha_{*,*}$ that capture the contextual effects. While the BCI for $\alpha_{1,2}$ contains zero, 93.8% of the posterior density sample falls above zero. Thus, I can state with 93.8% certainty that this coefficient is positive, which substantively implies that the macropolitical change in the women’s rights agenda did in fact contribute to a greater number of cosponsorships for all members, consistent with the argument of Wolbrecht (2000b). $\alpha_{1,3}$ is also positive as hypothesized, and significantly different from zero. Its magnitude is greater than that of $\alpha_{1,2}$, indicating that the rules change contributed above and beyond the agenda change in impacting women’s rights cosponsorships.10 Preference heterogeneity explains variance in the $\beta_{k,2}$ parameters. As hypothesized, as preference heterogeneity increases, the marginal impact of preferences increases. There is no finding for the party difference variable, however. The $\alpha_{3,1}$ and $\alpha_{4,1}$ parameters summarize the grand mean of the $\beta_{k,3}$ and $\beta_{k,4}$ parameters across the models. The precisions of the hyperparameter regressions are interesting. Indeed, given only k = 20 clusters, precise estimation of the hyperstructure is unlikely. However, these rather large precisions (which can be interpreted as the reciprocal of the estimated error variance in a regression model) indicate a good fit.
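Two of the quantities discussed above can be roughly checked from the summaries in Table 5. The sketch below is an approximation under stated assumptions: it treats the posterior of α1,2 as normal with the reported mean and standard deviation (the 93.8% figure in the text comes from the actual MCMC sample, so the two numbers need not agree exactly), and it plugs posterior means into the second-level equation rather than averaging over the posterior. The function name `prob_positive` is illustrative.

```python
import math


def prob_positive(mean, sd):
    """P(theta > 0) when theta ~ Normal(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf(mean / (sd * math.sqrt(2.0))))


# alpha_{1,2}: posterior mean 0.776 and posterior SD 0.513 (Table 5).
p_agenda = prob_positive(0.776, 0.513)   # close to the reported 93.8%

# Implied baseline expected counts (covariates at zero) by period,
# using the posterior means of alpha_{1,1}, alpha_{1,2}, alpha_{1,3}.
a11, a12, a13 = -0.986, 0.776, 1.853
baseline = {
    "83rd-91st": math.exp(a11),
    "92nd-95th": math.exp(a11 + a12),
    "96th-102nd": math.exp(a11 + a13),
}
```

Under these approximations the baseline expected count roughly doubles after the agenda change and is more than six times the original level after the rules change.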
Posterior Predictive Densities
While the table of coefficients is important in judging the significance and magnitude of the explanatory variables at the first and second levels of the hierarchy, the question remains as to what substantive impact these variables have on women’s rights cosponsorships. If it were the case that the number of cosponsorships varied between 4.9 and 5.1 for a typical member across decision contexts, we could conclude that the contextual factors had little substantive impact. To assess the substantive fit of the model, I have examined the posterior predictive density of the model in two ways. First, I have estimated the posterior predictive distributions for each observation in the dataset, and then plotted the mean of this distribution against the sample values $y_{i,k}$. I did this for the pooled model and for the hierarchical model. The pooled model clearly does an unsatisfactory job of explaining $y_{i,k}$ for the simple reason that it does not allow contextual variables to impact behavior. While far from a perfect fit along the 45-degree line, the hierarchical model does a much better job of explaining the variance in $y_{i,k}$. When fitting any statistical model, it is important to make sure the model could have generated the observed data (see Gelman et al., 1995). For the sake of space I have omitted these figures from this paper, instead focusing on the more interesting typical members in each Congress.
To illustrate the heterogeneity across Congresses, I have estimated the posterior predictive densities for two typical members of Congress: a liberal female Republican, and a moderate male Democrat (using the same operationalizations as Figure 1). For each Congress, I plot a boxplot of the base-10 logarithm of the number of predicted cosponsorships.11 Figure 2 contains the summaries for the liberal Republican woman, the individual most likely to cosponsor women’s rights legislation. For the 83rd to the 91st Congress this hypothetical member is predicted to cosponsor between 1.5 and 4 pieces of legislation per Congress. However, the agenda change in the 92nd is quite apparent; in the 92nd, this member is expected to cosponsor over 11 bills, and in the 93rd and thereafter, over 40. The jump for the rules change in the 97th Congress is also apparent.
10 I have re-specified the model with the Post-91st dummy taking a value of one until the 102nd, thus making the Post-95th dummy the marginal contribution of the rules change above and beyond the agenda change. Given this specification, it remains positive.
11 I plot the logarithm of the counts so the variance in each plot is apparent. One can exponentiate these to return to the original scale.
[Figure 2: Boxplots of posterior predictive densities for a liberal Republican woman in the 83rd to 102nd House. This figure presents the logarithm base-10 of predicted cosponsorships for each Congress. Vertical axis: log(Cosponsorships); horizontal axis: Congress (83 to 102).]
For the hypothetical moderate Democratic male in Figure 3, similar patterns are discernible. While the agenda change does not seem to have as dramatic an effect, the rules change in the 97th seems to impact behavior dramatically. Indeed, the expectation for this hypothetical member remains substantially less than one cosponsorship per Congress until the 97th, when the number jumps to 1.6, and then jumps further to 6.0 in the 98th. Holding all else constant, the rules change dramatically impacted cosponsorship behavior on women’s rights bills, as did the change in the macropolitical agenda. Clearly, contextual factors have a strong substantive impact on congressional cosponsorship behavior. The point to be taken from this analysis and these results is that hierarchical models are extremely useful and flexible models that can be used to model all sorts of heterogeneous data.
[Figure 3: Boxplots of posterior predictive densities for a moderate Democratic man in the 83rd to 102nd House. This figure presents the logarithm base-10 of predicted cosponsorships for each Congress. Vertical axis: log(Cosponsorships); horizontal axis: Congress (83 to 102).]
Conclusion
The analysis presented above demonstrates the usefulness of Bayesian estimation using MCMC for the analysis of clustered, heterogeneous data. The model above, however, is just the tip of the iceberg. Indeed, there are many more sophisticated models one can use to model all sorts of contextual effects. These models can be generalized with more levels of analysis (for example, looking at roll call voting by members of every state legislature for a fifty-year time period); here one could model contextual differences across states and across time using three levels of analysis. One could also develop more elaborate cross-cutting
mindful of dealing with the problem of persistence. Brandt et al. (2000) offer a time series regression model called the PEWMA model which is particularly useful when dealing with many counts observed through time.
Appendix: BUGS Code
In Summer 2000 I will make all BUGS code necessary to estimate these models available on my
homepage. Since BUGS code is somewhat dependent on the platform used, here I present the key model
definition statements for the models presented in the paper. The data input statements and the specification
of starting values differ slightly across platforms. All of the models presented here were estimated in Classic
BUGS 0.6 on a Linux workstation (note that WinBUGS would use a slightly different specification). BUGS
eliminates many of the start-up costs associated with performing Bayesian inference. Not only does it alleviate
the need to program in a high level language, but it also chooses appropriate MCMC algorithms for the problem
at hand. Specifically, BUGS uses Gibbs sampling when the full conditional takes a known form, adaptive
rejection sampling when the full conditional is not of a closed form but is log-concave (Gilks and Wild,
1992), and the Metropolis-Hastings algorithm to sample from non-log-concave distributions (Spiegelhalter
et al., 1997). BUGS also comes with CODA, a suite of routines one can use with R and S-Plus to test for
convergence (Best et al., 1997). For a comprehensive treatment of how to use BUGS, compile and estimate
models, and test for convergence, consult Spiegelhalter et al. (1997) and Best et al. (1997).
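To make the sampler choices described above more concrete, here is a minimal sketch of a random-walk Metropolis sampler (a special case of Metropolis-Hastings) for an intercept-only Poisson model, written in Python rather than BUGS. It is not the algorithm BUGS would select for this model. The data vector, step size, iteration counts, and function names are all illustrative assumptions. The prior mirrors the dnorm(0.0, 0.0001) priors used in the model code, where the second argument is a precision rather than a variance.

```python
import math
import random


def log_posterior(theta, counts, prior_precision=0.0001):
    """Unnormalized log posterior of theta = log(lambda) for Poisson counts.

    Prior: theta ~ Normal(0, variance = 1 / prior_precision), matching
    the precision parameterization of dnorm in BUGS.
    """
    lam = math.exp(theta)
    # Poisson log likelihood up to an additive constant (the log y! terms).
    log_lik = sum(y * theta - lam for y in counts)
    log_prior = -0.5 * prior_precision * theta ** 2
    return log_lik + log_prior


def metropolis(counts, n_iter=20000, burn_in=2000, step=0.5, seed=0):
    """Random-walk Metropolis draws of theta, discarding the burn-in."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, counts)
    draws = []
    for it in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, counts)
        # Accept with probability min(1, posterior ratio); 1 - random()
        # lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        if it >= burn_in:
            draws.append(theta)
    return draws


# Hypothetical counts, used only for illustration.
counts = [2, 3, 1, 4, 2, 3, 2, 3]
draws = metropolis(counts)
posterior_mean_rate = sum(math.exp(t) for t in draws) / len(draws)
```

With a prior this diffuse, the posterior mean of the rate should sit close to the sample mean of the counts, which is a quick sanity check on any sampler of this kind.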
Poisson Regression with Posterior Prediction
The first model is the Poisson regression model with the estimation of posterior predictive distributions.
The results from this model appear in Table 2 and Figure 1. Note that I use the same model definition
notation used in the paper. Since the posterior predictive distributions are just Monte Carlo averages over
the posterior density of β, they can be simulated from within the sampler.
# Poisson Regression Model
{
  for (i in 1:N)
  {
    log(mu[i]) <- beta[1] * cons[i] +
                  beta[2] * dnom[i] +
                  beta[3] * party[i] +
                  beta[4] * gender[i];
    cospon[i] ~ dpois(mu[i]);
  }
  # Priors
  for (j in 1:J)
  {
    beta[j] ~ dnorm(0.0, 0.0001);
  }
  # Posterior Prediction
  libRW <- exp(beta[1] + -0.627 * beta[2] + 0 * beta[3] + 1 * beta[4]);
  modRW <- exp(beta[1] + -0.027 * beta[2] + 0 * beta[3] + 1 * beta[4]);
  conRW <- exp(beta[1] +  0.573 * beta[2] + 0 * beta[3] + 1 * beta[4]);
  libDW <- exp(beta[1] + -0.627 * beta[2] + 1 * beta[3] + 1 * beta[4]);
  modDW <- exp(beta[1] + -0.027 * beta[2] + 1 * beta[3] + 1 * beta[4]);
  conDW <- exp(beta[1] +  0.573 * beta[2] + 1 * beta[3] + 1 * beta[4]);
  libRM <- exp(beta[1] + -0.627 * beta[2] + 0 * beta[3] + 0 * beta[4]);
  modRM <- exp(beta[1] + -0.027 * beta[2] + 0 * beta[3] + 0 * beta[4]);
  conRM <- exp(beta[1] +  0.573 * beta[2] + 0 * beta[3] + 0 * beta[4]);
  libDM <- exp(beta[1] + -0.627 * beta[2] + 1 * beta[3] + 0 * beta[4]);
  modDM <- exp(beta[1] + -0.027 * beta[2] + 1 * beta[3] + 0 * beta[4]);
  conDM <- exp(beta[1] +  0.573 * beta[2] + 1 * beta[3] + 0 * beta[4]);
}
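The posterior prediction statements in the model above evaluate exp(x'β) at every iteration of the sampler for a fixed covariate profile. As a sketch of the same computation done outside BUGS, the Python functions below take a list of saved β draws and a profile and return one expected count per draw, plus a simple summary. The three β draws are made-up placeholder values, not estimates from the paper. The profile copies the coding used for modDW in the BUGS code (constant 1, dnom -0.027, party 1, gender 1).

```python
import math


def expected_counts(beta_draws, profile):
    """Return exp(x'beta) for each posterior draw of beta.

    beta_draws: list of coefficient vectors, one per saved MCMC iteration.
    profile: covariate vector x in the same order as the coefficients.
    """
    return [
        math.exp(sum(b * x for b, x in zip(beta, profile)))
        for beta in beta_draws
    ]


def summarize(values):
    """Mean and median of a list of simulated quantities."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2:
        median = ordered[mid]
    else:
        median = 0.5 * (ordered[mid - 1] + ordered[mid])
    return sum(ordered) / n, median


# Placeholder draws of (beta[1], ..., beta[4]); NOT results from the paper.
beta_draws = [
    [0.10, -1.00, 0.50, 0.80],
    [0.20, -1.20, 0.40, 0.90],
    [0.00, -0.90, 0.60, 0.70],
]
# Profile matching modDW in the BUGS code: cons, dnom, party, gender.
mod_dw = [1.0, -0.027, 1.0, 1.0]
mod_dw_counts = expected_counts(beta_draws, mod_dw)
mean_count, median_count = summarize(mod_dw_counts)
```

Computing the quantity inside the sampler, as the BUGS code does, avoids storing the β draws, while computing it afterwards lets one add new profiles without rerunning the chain.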
Poisson Regression with Overdispersion
The model definition code for the Poisson regression with overdispersion is based on the salm example in
the BUGS manual. Below is the code for the discharge petition model presented in Table 1. To facilitate
convergence, in this model all variables are standardized. BUGS allows the practitioner to perform all
manner of recodes within the software. The only difference between this model and the previous one is that
now a random effect η_i is included for each observation, and a prior distribution is placed on the precision
parameter τ.
# Poisson Regression Model with Overdispersion
{
  for (i in 1:N)
  {
    log(mu[i]) <- beta[1] * cons[i] +
                  beta[2] * hsdnom.std[i] +
                  beta[3] * divgov.std[i] +
                  beta[4] * totcomm.std[i] +
                  beta[5] * majsize.std[i] +
                  beta[6] * rul1.std[i] +
                  eta[i];
    petfiled[i] ~ dpois(mu[i]);
    # Observation-level random effect; tau is a precision
    eta[i] ~ dnorm(0.0, tau);
  }
  # Priors
  for (j in 1:K)
  {
    beta[j] ~ dnorm(0.0, 0.0001);
  }
  tau ~ dgamma(0.001, 0.001);
}
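To see why the random effect in this model produces overdispersion, note that if η_i is normal with mean zero and precision τ, then exp(η_i) is log-normal, and the marginal mean and variance of the count follow in closed form from the law of total variance. The Python sketch below computes them for a given linear predictor. The function name and the example values of the linear predictor and τ are illustrative assumptions, not quantities from the paper.

```python
import math


def marginal_moments(linear_predictor, tau):
    """Marginal mean and variance of y when
    y | eta ~ Poisson(exp(linear_predictor + eta)) and
    eta ~ Normal(0, variance = 1 / tau), with tau a precision as in BUGS.
    """
    sigma2 = 1.0 / tau
    mean = math.exp(linear_predictor + 0.5 * sigma2)
    # Law of total variance: the Poisson part contributes the mean,
    # the log-normal part contributes mean^2 * (exp(sigma2) - 1).
    variance = mean + mean ** 2 * (math.exp(sigma2) - 1.0)
    return mean, variance


# Illustrative values only.
mean, variance = marginal_moments(linear_predictor=1.0, tau=4.0)
dispersion = variance / mean  # exceeds 1 for any finite tau
```

As τ grows, the random effect collapses toward zero and the variance-to-mean ratio approaches one, which recovers the ordinary Poisson regression.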
Hierarchical Poisson Regression
The final code I present is for the hierarchical Poisson regression model. Here the data are loaded from
three files: one that contains the level-one covariates sorted by Congress, one that contains the number of
observations n_k within each cluster, and one that contains the contextual (second-level) covariates. This
code begins with the first level, assuming a Poisson regression within each cluster. Then, across clusters
k, I assume independent regressions as noted in the text. The code concludes by placing priors on all of the
hyperparameters.
# Hierarchical Poisson Regression Model -- Level One
{
  for (k in 1:K)
  {
    for (i in lower[k]:upper[k])
    {
      log(mu[i]) <- (beta[1,k] * cons[i] +
                     beta[2,k] * dnom[i] +
                     beta[3,k] * party[i] +
                     beta[4,k] * gender[i]);
      cospon[i] ~ dpois(mu[i]);
    }
  }
  # Hierarchical Poisson Regression Model -- Level Two
  for (k in 1:K)
  {
    nu[1,k] <- alpha1[1] + alpha1[2] * era1[k] + alpha1[3] * era2[k];
    nu[2,k] <- alpha2[1] + alpha2[2] * sdnom[k] + alpha2[3] * pdist[k];
    nu[3,k] <- alpha3[1];
    nu[4,k] <- alpha4[1];
    for (j in 1:J)
    {
      # xi[j] is the precision of beta[j,k] around nu[j,k]
      beta[j,k] ~ dnorm(nu[j,k], xi[j]);
    }
  }
  # Priors
  for (m1 in 1:M1) { alpha1[m1] ~ dnorm(0.0, 0.0001); }
  for (m2 in 1:M2) { alpha2[m2] ~ dnorm(0.0, 0.0001); }
  for (m3 in 1:M3) { alpha3[m3] ~ dnorm(0.0, 0.0001); }
  for (m4 in 1:M4) { alpha4[m4] ~ dnorm(0.0, 0.0001); }
  for (j in 1:J) { xi[j] ~ dgamma(0.001, 0.001); }
}
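The level-one loop relies on two index vectors, lower[k] and upper[k], that mark where each Congress begins and ends in the stacked data. The paper does not show how these are built. One plausible construction from the cluster sizes n_k, assuming the observations are sorted by cluster and indexed from 1 as in BUGS, is sketched below in Python. The function name and the cluster sizes are hypothetical.

```python
def cluster_bounds(cluster_sizes):
    """Return 1-based inclusive (lower, upper) index lists for stacked data.

    cluster_sizes holds the number of observations n_k in each cluster,
    and the observations are assumed to be sorted by cluster.
    """
    lower, upper = [], []
    start = 1
    for n_k in cluster_sizes:
        if n_k < 1:
            raise ValueError("each cluster needs at least one observation")
        lower.append(start)
        upper.append(start + n_k - 1)
        start += n_k
    return lower, upper


# Hypothetical cluster sizes for three Congresses.
lower, upper = cluster_bounds([435, 434, 436])
```

Storing the data as one stacked vector with bounds, rather than as a ragged array, lets clusters have different numbers of members while the model code still uses simple loops.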
References
Albert, James. 1992. “A Bayesian Analysis of a Poisson Random Effects Model for Home Run Hitters.”
The American Statistician 46(November):246–253.
Albert, James H., and Patricia A. Pepple. 1989. “A Bayesian Approach to Some Overdispersion Models.”
The Canadian Journal of Statistics 17(September):333–344.
Bartels, Larry M. 1996. “Pooling Disparate Observations.” American Journal of Political Science
40(August):905–942.
Beck, Neal, and Jonathan N. Katz. 1995. “What To Do (And Not To Do) with Time-Series Cross-Section
Data.” American Political Science Review 89(September):634–647.
Berry, Scott M., C. Shane Reese, and Patrick D. Larkey. 1999. “Bridging Different Eras in Sports.” Journal
of the American Statistical Association 94(September):661–686.
Best, Nicky, Mary Kathryn Cowles, and Karen Vines. 1997. Convergence Diagnostics and Output Analysis
Software for Gibbs Sampling Output, Version 0.4. Cambridge: MRC Biostatistics Unit.
Beth, Richard S. 1990. The Discharge Rule in the House of Representatives: Procedure, History, and
Statistics. Washington, DC: Congressional Research Service.
Binder, Sarah A., Eric D. Lawrence, and Forrest Maltzman. 1999. “Uncovering the Hidden Effect of Party.”
Journal of Politics 61(August):815–831.
Brandt, Patrick T., John T. Williams, Benjamin O. Fordham, and Brian Pollins. 2000. “Dynamic Modeling
for Persistent Event Count Time Series.” Workshop in Political Theory and Policy Working Paper, Indiana
University.
Breslow, N. E. 1984. “Extra-Poisson Variation in Log-Linear Models.” Applied Statistics 33(1):38–44.
Cameron, A. Colin, and Pravin K. Trivedi. 1998. Regression Analysis of Count Data. Cambridge: Cambridge University Press.
Campbell, James E. 1982. “Cosponsoring Legislation in the U.S. Congress.” Legislative Studies Quarterly
7(August):415–422.
Carlin, Bradley P., and Thomas A. Louis. 1996. Bayes and Empirical Bayes Methods for Data Analysis.
London: Chapman & Hall.
Chib, Siddhartha, Edward Greenberg, and Rainer Winkelmann. 1998. “Posterior Simulation and Bayes
Factors in Panel Count Data Models.” Journal of Econometrics 86(September):33–54.
Chib, Siddhartha, and Rainer Winkelmann. 1999. “Markov Chain Monte Carlo Analysis of Correlated
Count Data.” Olin School of Business Working Paper, Washington University.
Christiansen, Cindy L., and Carl N. Morris. 1997. “Hierarchical Poisson Regression Modeling.” Journal of
the American Statistical Association 92(June):618–632.
Cox, Gary W., and Mathew D. McCubbins. 1993. Legislative Leviathan: Party Government in the House.
Berkeley: University of California Press.
Efron, B., and C. Morris. 1973. “Combining Possibly Related Estimation Problems.” Journal of the Royal
Statistical Society B 35(3):379–421.
Gelfand, Alan E., and Adrian F. M. Smith. 1990. “Sampling-Based Approaches to Calculating Marginal
Densities.” Journal of the American Statistical Association 85(December):398–409.
Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin. 1995. Bayesian Data Analysis.
London: Chapman & Hall.
Gelman, Andrew, and Gary King. 1990. “Estimating the Consequences of Legislative Redistricting.” Journal of the American Statistical Association 85(June):274–282.
Geman, S., and D. Geman. 1984. “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration
of Images.” IEEE Transactions on Pattern Analysis and Machine Intelligence 6(November):721–741.
Gilks, W. R., S. Richardson, and D. J. Spiegelhalter. 1996. Markov Chain Monte Carlo in Practice. London:
Chapman & Hall.
Gilks, W. R., and P. Wild. 1992. “Adaptive Rejection Sampling for Gibbs Sampling.” Applied Statistics
41(2):337–348.
Greenwood, Major, and G. Udny Yule. 1920. “An Inquiry into the Nature of Frequency Distributions
Representative of Multiple Happenings with Particular Reference to the Occurrence of Multiple Attacks
of Disease or of Repeated Accidents.” Journal of the Royal Statistical Society 83(March):255–279.
Jackman, Simon. 2000. “Estimation and Inference via Bayesian Simulation: An Introduction to Markov
Chain Monte Carlo.” American Journal of Political Science 44(April):369–398.
Jones, Bradford S., and Marco R. Steenbergen. 1997. “Modeling Multilevel Data Structures.” Paper presented at the Annual Meeting of the Political Methodology Section of the American Political Science
Association.
Kessler, Daniel, and Keith Krehbiel. 1996. “Dynamics of Cosponsorship.” American Political Science
Review 90(September):555–566.
King, Gary. 1988. “Statistical Models for Political Science Event Counts: Bias in Conventional Procedures
and Evidence for the Exponential Poisson Regression Model.” American Journal of Political Science
32(August):838–863.
King, Gary. 1989a. “A Seemingly Unrelated Poisson Regression Model.” Sociological Methods and Research 17(February):235–255.
King, Gary. 1989b. “Variance Specification in Event Count Models: From Restrictive Assumptions to a
Generalized Estimator.” American Journal of Political Science 33(August):762–784.
King, Gary, Ori Rosen, and Martin A. Tanner. 1999. “Binomial-beta Hierarchical Models for Ecological
Inference.” Sociological Methods and Research 28(August):61–90.
King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44(April):347–361.
Krehbiel, Keith. 1995. “Cosponsors and Wafflers from A to Z.” American Journal of Political Science
39(November):906–923.
Martin, Andrew D., and Christina Wolbrecht. 2000. “Partisanship and Pre-Floor Behavior: The Equal Rights
and School Prayer Amendments.” SUNY Stony Brook Department of Political Science Working Paper.
Poole, Keith T., and Howard Rosenthal. 1997. Congress: A Political-Economic History of Roll-Call Voting.
Oxford: Oxford University Press.
Quinn, Kevin M., Andrew D. Martin, and Andrew B. Whitford. 1999. “Voter Choice in Multi-Party
Democracies: A Test of Competing Theories and Models.” American Journal of Political Science
43(October):1231–1247.
Regens, James L. 1989. “Congressional Cosponsorship of Acid Rain Controls.” Social Science Quarterly
70(June):505–512.
Schiller, Wendy J. 1995. “Senators as Political Entrepreneurs: Bill Sponsorship to Shape Legislative Agendas.” American Journal of Political Science 39(February):186–203.
Schofield, Norman J., Andrew D. Martin, Kevin M. Quinn, and Andrew B. Whitford. 1998. “Multiparty
Electoral Competition in the Netherlands and Germany: A Model Based on Multinomial Probit.” Public
Choice 97(December):257–293.
Scollnik, David P. M. N.D. “Actuarial Modeling with MCMC and BUGS.” Actuarial Research Clearing
House, forthcoming.
Smith, Alastair. 1999. “Testing Theories of Strategic Choice: The Example of Crisis Escalation.” American
Journal of Political Science 43(October):1254–1283.
Spiegelhalter, David J., Andrew Thomas, Nicky Best, and Wally R. Gilks. 1997. BUGS 0.6: Bayesian
Inference Using Gibbs Sampling. Cambridge: MRC Biostatistics Unit.
Tanner, M. A., and W. Wong. 1987. “The Calculation of Posterior Distributions by Data Augmentation.”
Journal of the American Statistical Association 82(June):528–550.
Western, Bruce. 1998. “Causal Heterogeneity in Comparative Research: A Bayesian Hierarchical Modelling
Approach.” American Journal of Political Science 42(October):1233–1259.
Western, Bruce, and Simon Jackman. 1994. “Bayesian Inference for Comparative Research.” American
Political Science Review 88(June):412–423.
Wolbrecht, Christina. 2000a. “Female Legislators and the Women’s Rights Agenda.” Paper presented at the
Women Transforming Congress Conference, Carl Albert Center, April 13-15 2000.
Wolbrecht, Christina. 2000b. The Politics of Women’s Rights: Parties, Positions, and Change. Princeton:
Princeton University Press.
Young, Cheryl D., and Rick K. Wilson. 1997. “Cosponsorship in the United States Congress.” Legislative
Studies Quarterly 22(February):25–43.
Zorn, Christopher J. W. 1998. “An Analytic and Empirical Examination of Zero-Inflated and Hurdle Poisson
Specifications.” Sociological Methods and Research 26(February):368–400.