Diffusion or Confusion?
Modeling Policy Diffusion with Discrete Event History Data
Jack Buckley
Department of Political Science
State University of New York at Stony Brook
Stony Brook, NY 11794-4392
[email protected]
Paper prepared for the 19th Annual Summer Political Methodology Meetings, Seattle,
Washington, 2002. The research reported in this paper was supported by the National
Science Foundation’s Graduate Research Fellowship program.
Abstract
In a review of the literature on policy diffusion in the states, Berry and Berry
(1999) conclude that the “gold standard” methodological approach to the empirical
modeling of these processes is discrete event history analysis. I concur, but find that the
literature largely ignores several important issues in the proper specification of these
models, including choice of functional form, modeling different mechanisms of diffusion,
modeling duration dependence, and spatial autocorrelation in the cross-sections. I use
data from Berry and Berry’s (1990) classic study of the diffusion of state lotteries to
suggest possible improvements to this research.
Introduction
Since Walker’s (1969) seminal work, students of the policy process have
recognized that diffusion, or the spreading of policy innovations from state to state, has
important consequences both for theories of public policy and real policy outcomes. As
Berry and Berry (1999) point out in their review of the diffusion literature, the “gold
standard” approach to the empirical testing of diffusion models is widely regarded to be
event history analysis. I concur, but find that the literature largely ignores several
important issues in the proper specification of these models, including modeling duration
dependence, modeling different mechanisms of diffusion, choice of functional form, and
spatial autocorrelation or dependency in the cross-sections. In this paper, I use data from
Berry and Berry’s (1990) classic study of the diffusion of state lotteries to suggest
possible improvements to this research.
The best recent empirical research on policy diffusion and innovation in the
states, such as Berry and Berry's (1990;1992) work on state lotteries and tax policy,
Mooney and Mei-Hsien's (1995) study of pre-Roe abortion policy, or Mintrom's (1997;
Mintrom and Vergari 1998) analysis of education reform, shares a common
methodological approach: discrete event history analysis. This method has the flexibility
to allow modeling of both what the literature calls the internal determinants of policy
change (e.g., per capita income in a state, the percent of religious fundamentalists, etc.) and
specialized measures that test different theoretical approaches to diffusion (discussed in
greater detail below). In the standard approach a starting date is chosen, such as the first
year a policy is introduced, and data are collected on all the states that are in the risk set—
have a positive probability of adopting the policy—for a series of discrete time periods.
As states adopt the policy and are thus removed from the risk set, data are no longer
collected on them. The data set is then analyzed typically using a dichotomous dependent
variable indicating whether the policy was adopted in a given year by a given state.
Coefficients and their standard errors are estimated using familiar maximum likelihood
estimators such as logit or probit (Allison 1984).
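The data layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_risk_set` and the toy adoption years are hypothetical, not taken from the lottery data.

```python
def build_risk_set(adoption_year, first_year, last_year):
    """Expand {state: adoption year or None} into state-year rows.

    A state contributes one row per year while it is still at risk. The row
    for its adoption year has adopt = 1, and later years are dropped.
    """
    rows = []
    for state, year_adopted in sorted(adoption_year.items()):
        for year in range(first_year, last_year + 1):
            if year_adopted is not None and year > year_adopted:
                break  # the state has left the risk set
            adopt = int(year_adopted == year)
            rows.append({"state": state, "year": year, "adopt": adopt})
    return rows


# Hypothetical adoption years, for illustration only (not the lottery data).
toy = {"A": 1964, "B": 1966, "C": None}
panel = build_risk_set(toy, first_year=1964, last_year=1967)
for row in panel:
    print(row)
```

Each row of `panel` is one state-year observation; the `adopt` column is the dichotomous dependent variable that would then be passed to a logit or probit estimator.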
This approach to testing theory with discrete event history data is straightforward
and computationally economical, but has several shortcomings. First, as Berry and Berry
note, this simple analysis “assumes that the probability that a state will adopt in one year
is unrelated to its probability of adoption in prior years,” (1999:192). In duration
modeling or event history analysis parlance, the hazard rate is flat—the model is
constrained to allow no duration dependence (see Box-Steffensmeier and Jones 2002,
chapter 6). Second, researchers have not paid enough attention to including theoretically
sound measures of diffusion that prevent confounding this important concept with
duration dependence. Third, several alternatives to the traditional logit or probit
estimators exist that can lead to different results or to the modeling of additional
processes beyond the scope of the original model. Finally, the literature to date has
ignored issues of spatial dependency in the data.
As I address each of these issues in turn below, the central point I wish to convey
is that a host of modeling choices (and sometimes tradeoffs) must be made to properly
model policy diffusion. I also wish to stress that this paper is not a primer on duration
modeling generally—in fact I refer frequently to just such a work, the excellent and
accessible book by Box-Steffensmeier and Jones (2002). Rather the focus here is more
narrowly on issues specifically important to the modeling of policy diffusion.
Duration Dependence
The unique attribute of duration or survival analysis is the ability to consider
functions of time conditional on the data in hand or time itself. Practitioners of event
history analysis frequently refer to one of these functions as the hazard rate, h(t ) , which
can be defined as the instantaneous rate at which units “fail” (i.e. states adopt a policy)
given that they have survived to time t. As Berry and Berry point out in the quotation
cited above, their event history analysis of state lottery adoptions (1990) and tax policy
diffusion (1992) implicitly assumes that this rate is unchanging over time—a flat hazard
rate. In other words, they are assuming that, conditional on the covariates in their model,
the probability of policy adoption does not change as a function of time
(Box-Steffensmeier and Jones 2002: 134-5).
This assumption of flat hazard or no duration dependence is equivalent to the
assumption that their models are completely and correctly specified—a rather strong
assumption for a social science model. Moreover, since the purpose of their research is
the study of policy diffusion over time (and space), it would seem to be more practical to
relax this assumption and account for the possibility of unmodeled duration dependence
in the data (Beck, Katz, and Tucker 1998). Fortunately, there are several relatively simple
means of doing this.
As Box-Steffensmeier and Jones point out (2002: 135), the most general means of
doing this in the context of discrete duration modeling is the inclusion of a dichotomous
(or “dummy”) variable for the t-1 time periods under observation. The advantage of this
approach is that it is very flexible—no a priori functional form need be chosen to model
the effect of time on the hazard. However, adding t-1 variables can make it difficult to
estimate other parameters if the initial number of degrees of freedom is not large.1
One alternative approach is the inclusion of time itself as a regressor in the model.
This can be done simply by including a “counter” variable representing the time of
observation2, or by including one or more transformations of time (such as the natural log
of time, or the square and cube of time). This method is certainly more parsimonious than
the temporal dummies approach, but requires the researcher to make an assumption about
the effects of time conditional on the other covariates in the model. If, for example, the
researcher includes a linear time counter, then the implicit assumption is that conditional
on the various quantities in the model and their effects, the hazard rate is increasing at a
constant rate with time.
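The alternative time specifications discussed in this section can be generated from a single counter. The sketch below is illustrative only; the function name `duration_terms` and the choice of the first period as the omitted dummy category are assumptions.

```python
import math


def duration_terms(counter, n_periods):
    """Return alternative time covariates for one observation.

    counter   -- time counter coded 1..n_periods
    n_periods -- number of discrete periods under observation
    """
    # t-1 dummies: the first period is the omitted reference category.
    dummies = [int(counter == p) for p in range(2, n_periods + 1)]
    return {
        "dummies": dummies,          # flexible, but costs t-1 parameters
        "linear": counter,           # hazard shifts by a constant amount per period
        "log": math.log(counter),    # effect of time levels off
        "square": counter ** 2,      # polynomial terms
        "cube": counter ** 3,
    }


terms = duration_terms(counter=3, n_periods=23)
print(len(terms["dummies"]), terms["linear"], round(terms["log"], 3))
```

The dummy specification adds `n_periods - 1` columns to the design matrix, while each of the other specifications adds only one, which is the parsimony tradeoff described above.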
To illustrate the use of this method and its effect on the estimates obtained, I turn to a
familiar example: the Berry and Berry (1990) study of state lottery adoptions.3 Berry and
Berry estimate a discrete event history model of the form:
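$$
\begin{aligned}
\text{Adopt}_{i,t} = \Phi(&\beta_0 + \beta_1\,\text{Fiscal Health}_{i,t-1} + \beta_2\,\text{Per Capita Income}_{i,t-1} + \beta_3\,\text{Single Party Control}_{i,t} \\
&+ \beta_4\,\text{Gubernatorial Election Year}_{i,t} + \beta_5\,\text{Neither Election Year nor Year After}_{i,t} \\
&+ \beta_6\,\text{Percentage Fundamentalists}_{i,t} + \beta_7\,\text{Number of Neighboring States with Lottery}_{i,t})
\end{aligned}
$$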
1 Mintrom and Vergari include dichotomous variables for most of their time points (the first two are
collapsed into one). However, they do this not in recognition of the issue of modeling duration dependence
but as an alternative specification of diffusion. I discuss this further below.
2 Generally coded so that $c \in \{1, 2, \ldots, t\}$ or $c \in \{0, 1, \ldots, t\}$.
3 Note that I do not include any of the interactive terms discussed by Frant (1991) and Berry and Berry
(1991) in the controversy over whether Berry and Berry’s theory is correctly tested.
where policy adoption by a state in a given year is a probit function of the linear
combination of lagged fiscal health and per capita income, dichotomous variables for
single party control, gubernatorial election year, and whether t is neither an election year
nor the year after an election, the percentage of religious fundamentalists out of the whole
state population, and the number of neighboring states with a lottery.
The first column of Table 1 presents my results of the estimation of this model
using the original dataset. Unsurprisingly, I exactly replicate the results of Berry and
Berry.4 In the second column of the table, however, I add a simple linear term using a
time counter variable (coded 1 for the first year of observation to 23 for the last), yielding
the model (with the other terms abbreviated in a slight abuse of notation):
$$\text{Adopt}_{i,t} = \Phi(\beta_0 + x'_{i,t}\beta + \beta_8\,\text{Time Counter}_t)$$
This addition to the model has some interesting consequences. First, and most
importantly, the ratio of the estimate of the marginal effect of the number of neighboring
states to its standard error is smaller than in the Berry and Berry model, small enough in
fact to lead to a failure to reject the null hypothesis that the coefficient is equal to
0 at conventional two-tailed significance levels.
Also, note that the estimated effect of lagged fiscal health is now statistically
significant, as is the time counter. This latter result suggests (at least when the linear
functional form is chosen) that there is an increasing risk of states adopting a lottery over
time, even though the neighboring states hypothesis is not supported. I caution the reader
not to interpret this result immediately, however; in the next section I examine
alternative specifications of diffusion estimated simultaneously with a duration
dependence term.
4 More precisely, the coefficient estimates are identical but my standard errors are slightly different because
I also cluster on the state variable, since the observations over time for a single state are almost certainly
not independent.
Table 1 About Here
Before moving on, however, I first need to explore alternative specifications of
duration in the model. Following the suggestions of Box-Steffensmeier and Jones (2002),
I estimate first a model including a term of the natural logarithm of the time counter, and
second a model using a natural cubic spline of time that attempts to recover some of the
flexibility of the temporal dummy approach. The first model is simply:
$$\text{Adopt}_{i,t} = \Phi(\beta_0 + x'_{i,t}\beta + \beta_8 \ln[\text{Time Counter}_t])$$
The results, presented in the third column of Table 1, are similar to those of the linear
time counter model, although they are somewhat more supportive of the state competition
(number of neighbors) theory of diffusion.
The cubic spline approach, suggested by Box-Steffensmeier and Jones (2002:
137) follows the recent work of Beck, Katz, and Tucker (1998) and Beck and Jackman
(1998) on the use of spline functions or locally weighted regression to estimate smooth
functions of variables. In this case, I estimate a five knot cubic spline of the probability of
adoption of a state lottery on the time counter variable and include the linear prediction as
a covariate to model duration dependency.5 This new variable allows me to capture
considerable non-linearity without the loss of degrees of freedom associated with dummy
variables or the need to specify a restrictive functional form. The model is thus:
$$\text{Adopt}_{i,t} = \Phi(\beta_0 + x'_{i,t}\beta + \beta_8\hat{\tau}_{i,t})$$
where $\hat{\tau}_{i,t}$ are the residuals from the cubic spline regression. The results of this approach
are in the last column of Table 1, and Figure 1 provides a visual comparison of the linear,
natural log, and cubic spline specifications of duration dependence. Once again, estimates
are similar to those of the original model. Duration dependence is positive and
statistically significant in this model as well.
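For readers unfamiliar with cubic splines of time, the following sketch builds a restricted (natural) cubic spline basis for the time counter using the standard truncated-power construction. It is not the Stata "spline" command used here, and it does not reproduce the two-step procedure of fitting the spline first and then entering the fitted quantity as a single covariate. The function name `rcs_basis` and the knot locations are hypothetical.

```python
def rcs_basis(x, knots):
    """Restricted (natural) cubic spline basis evaluated at x.

    Returns k-1 columns for k knots: x itself plus k-2 nonlinear terms that
    are constrained to be linear beyond the outermost knots.
    """
    def pos3(u):
        return max(u, 0.0) ** 3  # truncated cubic (u)_+^3

    k = len(knots)
    t_last, t_prev = knots[-1], knots[-2]
    gap = t_last - t_prev
    basis = [float(x)]
    for t_j in knots[: k - 2]:
        term = (
            pos3(x - t_j)
            - pos3(x - t_prev) * (t_last - t_j) / gap
            + pos3(x - t_last) * (t_prev - t_j) / gap
        )
        basis.append(term)
    return basis


knots = [2, 7, 12, 17, 22]  # hypothetical placement of five knots
design = [rcs_basis(t, knots) for t in range(1, 24)]  # time counter 1..23
print(len(design), len(design[0]))
```

With five knots the basis has four columns, far fewer than the 22 dummies needed for 23 time periods, which is the degrees-of-freedom saving noted above.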
Figure 1 About Here
Table 2 provides a comparison of the four models on two criteria: the estimated
coefficient of the number of neighboring states (the diffusion measure in the original
paper) and the chi-square statistics and p-values for likelihood ratio tests of the original
model with no duration dependence against the three alternative models. All three of the
models have statistically-significant likelihood ratio statistics, indicating superior fit to
the original model. The cubic spline model, as should be expected given its greater
flexibility, appears to be the best fit. Accordingly, I will continue to include the time
spline variable in subsequent models as I go on to discuss other aspects of the event
history analysis of policy diffusion.
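The likelihood ratio comparison can be sketched as follows for the case in which the alternative model adds a single duration term to the original model, so that the statistic is referred to a chi-square distribution with one degree of freedom. The log-likelihood values below are hypothetical and are not the values reported in Table 2.

```python
import math


def lr_test_1df(loglik_restricted, loglik_full):
    """Likelihood ratio test for one added parameter.

    Returns the chi-square statistic and its p-value. For 1 degree of
    freedom the chi-square survival function equals erfc(sqrt(stat / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_restricted)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, NOT values from Table 2.
stat, p = lr_test_1df(loglik_restricted=-110.0, loglik_full=-105.0)
print(round(stat, 2), round(p, 4))
```

A small p-value indicates that the model with the duration term fits better than the model that constrains the hazard to be flat.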
5 This is easily done using the downloadable command “spline” for the software package Stata 7. Here I
rescale the result by a factor of 100 for ease of comparison with the other duration approaches.
Modeling Diffusion
I now turn to the most theoretically interesting choice that the researcher must
make: how to properly model the diffusion mechanism itself. Berry and Berry (1999)
identify two relevant patterns of diffusion in state policy adoption: regional diffusion and
national interaction models.6 I take these categories as starting points for our discussion.
Within the category of regional diffusion models, Berry and Berry further specify
two subsets: neighbor models and fixed-region models. In the former, policies are
thought to spread to contiguous states due to their shared borders. This is the
hypothesized diffusion mechanism in the state lottery example, and Berry and Berry
support this approach by arguing that states directly compete for revenue which can be
lost to neighboring states as consumers drive across state lines to purchase lottery tickets.
The neighbor model might also be appropriate, Berry and Berry argue, for other types of
economic competition among states (1992).
Estimates from a model using the neighboring states measure of diffusion are thus
identical to those in the models presented above; for the sake of comparison with later
models, I present again the results of the cubic spline duration dependence model with
number of neighbors as diffusion measure, using the Berry and Berry (1990) data on state
lottery adoptions, in the first column of Table 3.
Table 3 About Here
6 They also identify a third and fourth: leader-laggard models and vertical influence. The former, as they
point out, is either virtually impossible to specify or else is equivalent to one of the other mechanisms. The
latter is only relevant to policies with incentives or direction from the federal government to the states.
In contrast to the neighbor model, the fixed-region model does not constrain
diffusion to occur only among contiguous states. Rather, states are assumed to have a
positive probability of adopting policies that other states in the same geographical region
have adopted. Mooney and Mei-Hsien (1995), for example, adopt this approach in their
analysis of pre-Roe abortion policy. Regions can either be the familiar divisions in the
United States (i.e. Northeast, Southeast, etc.) or they may be constructed empirically by
factor analysis (Berry 1994; Walker 1969). In the second column of Table 3, I replicate
the lottery analysis replacing the covariate for the number of neighbors adopting with a
new measure: the percentage of other states in state i’s region at time t that have adopted
the lottery. I specify four approximately equal regions: Northeast, Southeast, Midwest,
and West (which includes the Southwest). As the table shows, results are similar to the original
model.
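The two regional diffusion measures can be computed from the same adoption records. The sketch below is illustrative: the function names, the toy states, and the convention that only adoptions before year t count toward the measure at t are all assumptions.

```python
def neighbor_count(state, year, adoption_year, neighbors):
    """Number of bordering states that adopted before `year`."""
    return sum(
        1 for n in neighbors[state]
        if adoption_year.get(n) is not None and adoption_year[n] < year
    )


def regional_percentage(state, year, adoption_year, region):
    """Percentage of OTHER states in the same region that adopted before `year`."""
    others = [s for s in region if s != state and region[s] == region[state]]
    if not others:
        return 0.0
    adopted = sum(
        1 for s in others
        if adoption_year.get(s) is not None and adoption_year[s] < year
    )
    return 100.0 * adopted / len(others)


# Hypothetical states, borders, and regions (not the lottery data).
toy_adoption = {"A": None, "B": 1964, "C": 1970, "D": 1965}
toy_neighbors = {"A": ["B", "D"], "B": ["A"], "C": [], "D": ["A"]}
toy_region = {"A": "East", "B": "East", "C": "East", "D": "West"}

print(neighbor_count("A", 1966, toy_adoption, toy_neighbors))
print(regional_percentage("A", 1966, toy_adoption, toy_region))
```

Setting every state's region to the same label turns `regional_percentage` into a national measure, that is, the same calculation with the entire nation treated as the neighborhood.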
The national interaction model is a sort of epidemiological approach to diffusion
at the national level. Pioneered by Gray (1973), this approach assumes that leaders of the
states (governors, legislators, policy professionals) constitute a single social system
through which ideas propagate in accordance with simple dynamics borrowed from
communications theory (see Berry and Berry 1999: 172-174 for a more thorough
explanation). An alternative (but perhaps equivalent given certain assumptions at the
micro-level) formulation is the “maturation effects” approach of Mintrom and Vergari
(1998). In this model unspecified forces cause state leaderships to conclude that the “time
has come” for certain policy ideas.
Figure 2 Here
If a national interaction process is occurring, policy diffusion will follow the
familiar S-shaped or learning curve as states are slow to adopt at first, more rapidly adopt
in the middle, then slow down again as the system becomes saturated (as almost all states
have adopted). Figure 2 shows the empirical curve for the cumulative proportion of states
adopting a lottery over time. The figure suggests that the saturation point has not been
reached: there is no “flattening out” at the end of the observation period.7
How can one include a national interaction or maturation effect in an event
history analysis of policy diffusion? Mintrom and Vergari choose to include dummy
variables for years observed, but as I mention above, this is equivalent to allowing for
unspecified duration dependence (which they do not model separately). Another
possibility is including a count of the number of states in the entire nation (or the
proportion of states) that have adopted the policy at time t. This is equivalent to the
neighborhood model with the entire nation as neighborhood. As in the case of the time
counter, this total counter can be transformed (e.g., using logged or polynomial functions of
the variable) or locally-weighted or spline regression can be used to account for
nonlinearities.
To illustrate this approach, I estimate a model identical to those discussed in the
previous section with the residuals from a cubic spline regression of whether a state has
adopted a lottery on the total number of states with a lottery at time t (denoted $\hat{c}_{i,t}$).
Duration dependence ($\hat{\tau}_{i,t}$) is also included separately in the model using the cubic spline
approach. The model is thus:
$$\text{Adopt}_{i,t} = \Phi(\beta_0 + x'_{i,t}\beta + \beta_8\hat{\tau}_{i,t} + \beta_9\hat{c}_{i,t})$$
7 This is supported by the fact that since 1986, the last year of the lottery study, 10 more states have
adopted a lottery.
Results of this model are presented in the third column of Table 3. The coefficient
on the total spline is not statistically significant, suggesting that this model of diffusion is
not appropriate for the state lottery case.8
I now turn to a brief discussion of the estimation of diffusion models with
different functional forms than the probit used up until now.
Functional Forms
Yet another modeling choice that the researcher confronts when conducting
discrete event history analysis is the selection of the appropriate functional form for the
empirical model. In an ideal world, this choice would not matter. Unfortunately, as I
demonstrate below, the choice of functional form can have a substantial effect on results when
real data are analyzed.
Most political scientists are familiar with both the probit function and the logit
function; indeed, most of the time they are used interchangeably in the analysis of
dichotomous random variables. If $\pi_{i,t}$ is the probability that state i adopts a given policy
at time t, then the probit function is:
$$\pi_{i,t} = \Phi(x'_{i,t}\beta)$$
where $\Phi$ is the cumulative distribution function of the standard normal, and the logit
function is:
$$\log\left(\frac{\pi_{i,t}}{1-\pi_{i,t}}\right) = x'_{i,t}\beta$$
8 Note also that the correlation between the total spline and the time spline is 0.92.
A simple graph of the two functions, however, reveals that they are not identical. As
Figure 3 illustrates (the third curve in the figure is discussed below), the logit function
has “fatter tails” than the probit—it approaches 0 and 1 more slowly. In many
applications this is unimportant; however the difference can become a problem with a
large number of observations or when many of the predicted probabilities are very small
or very large (or, equivalently, if the unobserved continuous random variable has very
large or small values). This latter case is precisely the problem in many discrete event
history analyses: as units fail (or states adopt a policy) they are removed from the data
set. Adoption or failure is thus essentially a “rare event”: the number of observed 1’s in the
dependent variable vector is small relative to the number of 0’s.
Figure 3 Here
Table 4 Here
As an example of the consequences of this choice, I present two identical models
of state lottery diffusion (with duration dependence modeled by time spline and diffusion
by regional percentage) differing only in functional form. I do not calculate first
derivatives or predicted probabilities to compare results—I only consider here the
differences in the ratios of estimated coefficients to their standard errors. The results are
in the first two columns of Table 4. As is evident, changing from a probit functional form
to a logit has a substantial effect on the results of conventional hypothesis tests on the
estimated values of the coefficients versus the null that they are equal to zero. Most
prominently, both diffusion and single party control go from being significant at the 0.05
and 0.10 levels (respectively) to not statistically significant. From this narrow
perspective, at least, choice of functional form matters.
Despite this issue, logit and probit are still considered appropriate for discrete
event history analysis (Allison 1984; Box-Steffensmeier and Jones 2002). Nevertheless,
there are several alternative functional forms worth considering. The first of these is the
complementary log-log (or “cloglog”) function:
$$\pi_{i,t} = 1 - \exp\left[-\exp\left(x'_{i,t}\beta\right)\right]$$
As Figure 3 illustrates, this function is asymmetrical: it has a “fat tail” as it approaches 0
but it approaches 1 more quickly than either the logit or probit functions. This suggests
that it may be more appropriate for the “rare event” discrete event history case. As the
third column in Table 4 shows, estimates of the lottery diffusion model using the cloglog
function are similar (in terms of what is conventionally significant) to those obtained
using the logit estimator.
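The tail behavior of the three functional forms can be checked numerically. The sketch below evaluates each function at the same value of the linear predictor; it ignores the fact that estimated coefficients are on different scales under each form, so it illustrates the shapes of the curves rather than reproducing Figure 3. The function names are illustrative.

```python
import math


def inv_probit(eta):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-eta / math.sqrt(2.0))


def inv_logit(eta):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_cloglog(eta):
    """Inverse of the complementary log-log function."""
    return 1.0 - math.exp(-math.exp(eta))


for eta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    print(
        f"{eta:5.1f}  probit={inv_probit(eta):.4f}  "
        f"logit={inv_logit(eta):.4f}  cloglog={inv_cloglog(eta):.4f}"
    )
```

At negative values of the linear predictor the cloglog and logit probabilities stay well above the probit probability, while at positive values the cloglog curve reaches 1 faster than either of the others, which is the asymmetry described above.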
Another justification for using the complementary log-log function that
Box-Steffensmeier and Jones (2002: 133) point out is that it is mathematically the
discrete-time analog of the continuous-time Cox proportional hazards model that is their preferred
model for continuous duration analysis. Box-Steffensmeier and Jones also discuss the use
of the conditional logit form of the Cox model (in which “ties” are resolved via the exact
discrete method) as yet another alternative for discrete event history analysis.
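The link between the two models can be made explicit with a short derivation. It is a sketch that assumes covariates are constant within each discrete interval; the symbols $h_0$, $H_0$, $S$, and $\alpha_t$ are introduced here for illustration and do not appear elsewhere in the paper.

```latex
% Continuous-time proportional hazards with baseline hazard h_0(t):
h(t \mid x_{i,t}) = h_0(t)\,\exp(x'_{i,t}\beta),
\qquad
S(t \mid x_{i,t}) = \exp\!\left[-H_0(t)\,\exp(x'_{i,t}\beta)\right],
\qquad
H_0(t) = \int_0^t h_0(u)\,du .

% Probability of adoption in the interval (t-1, t], given survival to t-1:
\pi_{i,t}
  = 1 - \frac{S(t \mid x_{i,t})}{S(t-1 \mid x_{i,t})}
  = 1 - \exp\!\left[-\bigl(H_0(t) - H_0(t-1)\bigr)\exp(x'_{i,t}\beta)\right]
  = 1 - \exp\!\left[-\exp\!\left(\alpha_t + x'_{i,t}\beta\right)\right],

% where the interval-specific intercept absorbs the baseline hazard:
\alpha_t = \log\!\left[H_0(t) - H_0(t-1)\right].
```

Under these assumptions the coefficients $\beta$ in the discrete-time cloglog model are the same as those in the underlying continuous-time proportional hazards model, and the interval-specific intercepts $\alpha_t$ play the role of the temporal dummies discussed earlier.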
For completeness, I also include one additional estimator for discrete event
history data. As Western and Jackman (1994) and more recently Gill (2001) have
discussed, the logic of frequentist statistical inference is on less than firm ground in
analyses where no random sampling is used. In all of the diffusion literature (and
in all of the models estimated in this paper to this point), every attempt is made to gather
data on all 48 contiguous states9 for as long a time period as possible (usually from the
first adoption of a policy until the date of the research). Given this fact, Gill proposes two
solutions: present additional measures of variance and treat the estimates obtained as
population parameters, or use a full Bayesian analysis and present results not as point
estimates with standard errors but as marginal posterior distributions.
I illustrate a simple version of the latter, again using the state lottery data with the
time spline and regional percentage variables, and a logit link function. The model10 is:
$$\text{adopt}_{i,t} \sim \text{Bernoulli}(\pi_{i,t})$$
$$\log\left(\frac{\pi_{i,t}}{1-\pi_{i,t}}\right) = x'_{i,t}\beta$$
with uninformative priors on the covariates and missing values (which are estimated
jointly with the other parameters). Results of a Markov chain Monte Carlo estimate of the
model are presented in the last column of Table 4 (based on 2500 iterations with 500
discarded as burn-in). The table gives the posterior empirical means and the 2.5 and 97.5
percentiles. Results are similar to the other estimators (although interpretation is, of
course, quite different). Of particular interest is the posterior for the diffusion
parameter: although the [2.5%, 97.5%] interval includes 0, most of the probability mass
is positive (see Figure 4).
9 Actually 46 in the lottery data due to listwise deletion of missing values for the party control variable.
10 I include the code for estimating this model using the free package WinBUGS (Gilks, Thomas, and
Spiegelhalter 1994; Spiegelhalter, Thomas, and Best 1999) in the Appendix.
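The general logic of this kind of estimate can be sketched with a small random-walk Metropolis sampler for a Bernoulli-logit model. This is not the WinBUGS code from the Appendix: it uses simulated toy data with a single covariate, vague normal priors with a standard deviation of 10 (an assumption standing in for the uninformative priors), and no model for missing values. Only the iteration and burn-in counts follow the text.

```python
import math
import random


def log_posterior(beta, xs, ys, prior_sd=10.0):
    """Bernoulli-logit log likelihood plus independent normal log priors."""
    lp = -sum(b * b for b in beta) / (2.0 * prior_sd ** 2)
    for x, y in zip(xs, ys):
        eta = beta[0] + beta[1] * x
        # log p(y | eta) = y * eta - log(1 + exp(eta)), computed stably
        lp += y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return lp


def metropolis(xs, ys, n_iter=2500, burn_in=500, step=0.3, seed=1):
    """Random-walk Metropolis; returns post-burn-in draws of the slope."""
    rng = random.Random(seed)
    beta = [0.0, 0.0]
    current = log_posterior(beta, xs, ys)
    draws = []
    for i in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, xs, ys)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        if i >= burn_in:
            draws.append(beta[1])
    return draws


# Simulated toy data (hypothetical; not the lottery data).
data_rng = random.Random(0)
xs = [i / 10.0 for i in range(-20, 21)]
ys = [int(data_rng.random() < 1.0 / (1.0 + math.exp(0.5 - x))) for x in xs]

draws = sorted(metropolis(xs, ys))
post_mean = sum(draws) / len(draws)
lower = draws[int(0.025 * len(draws))]   # 2.5th percentile
upper = draws[int(0.975 * len(draws))]   # 97.5th percentile
share_positive = sum(d > 0 for d in draws) / len(draws)
print(round(post_mean, 2), round(lower, 2), round(upper, 2), round(share_positive, 2))
```

The last quantity, the share of posterior draws above zero, is the kind of summary that supports a statement such as "most of the probability mass is positive" even when the central 95% interval includes 0.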
Figure 4 Here
This simple Bayesian model barely begins to demonstrate the power of this
approach for estimating complex models of diffusion, and for incorporating prior
information to condition our posterior estimates. Unfortunately I do not have the space
here to discuss this area in greater detail. I now turn to our final issue: spatial dependence
and autocorrelation.
Considering Space
Tobler's (1979) First Law of Geography holds that “everything is related to
everything else, but near things are more related than distant things.” Unfortunately,
geography can also hamper the proper estimation of empirical models of diffusion. Why?
In most of the examples presented above, I estimate robust standard errors clustered at
the state level to account for the non-independence of observations. This, however, does
not account for two additional problems.11
First, just as in the case of duration dependence discussed above, unmeasured
factors that vary with respect to geography may confound estimates of diffusion. As I
demonstrate below, the solutions for this problem are quite similar to those introduced for
modeling duration dependence.
At the regional level, or in the immediate vicinity of each additional state, there is
a different problem. Several of the “internal determinants” variables in the lottery
model(s), such as lagged fiscal health, lagged per capita income, and even the percent of
the population that adheres to a fundamentalist religion, are likely to be highly correlated
within neighborhoods during certain years. The business cycle may affect the states
differently, but nearby states often have similar economies and cultural histories. There is
a large literature on accounting for or modeling this spatial autocorrelation in econometrics
(see, for example, Anselin 1988). Unfortunately, there is little (if any) work on discrete event
history analysis with spatial dependence. I conclude this paper with some thoughts on this
issue.
11 Yet a third potential problem, spatial heteroscedasticity, is not considered.
Figure 5 Here
First, how do we know that either problem is a concern? A simple graphical
analysis is revealing: Figure 5 shows the same cumulative proportion of states adopting a
lottery versus time that is presented in Figure 2—but this time broken down by the four
regions discussed above. The Northeast, in the lower left quadrant, exhibits the classic
“S” or learning curve discussed above—much better than the national aggregate, in fact.
The other three regions, however, each exhibit different diffusion patterns. Most striking
is that of the Southeast—no state lotteries adopted at all until the very end of the series.
I begin with the problem of unmodeled spatial dependence. As in the case of
temporal (or duration) dependence, a possible strategy for accounting for this is the
introduction of dichotomous variables, here for geographic region. The results of this are
demonstrated in the first column of Table 5, as part of a model using the familiar state
lottery covariates, the time spline, and the complementary log-log functional form. As the
table illustrates, the inclusion of the geographic dummies leads to a failure to reject the
null hypothesis (at conventional significance levels) that no policy diffusion through the
neighboring states mechanism is occurring.
The problem with geographic dummy variables is not the loss of degrees of
freedom (as in the case of duration dependence) but the arbitrariness of the regional
categories. I now turn to two methods of overcoming this weakness.
Table 5 Here
One alternative method of accounting for unmodeled differences in space, that
avoids the arbitrary nature of dichotomous region variables, is a generalized geographical
approach. Rather than constrain a model to the a priori definitions of regions, I instead
model the diffusion of policy considering covariates for the physical location of the state
capitals. Figure 6 shows the spatial data—x and y grid coordinates12 for each city
(referred to sometimes as “eastings” and “northings”). Once again, I introduce flexibility
into our model by estimating cubic spline regressions on x and y and including the
smoothed values in the model instead of the untransformed covariates. Figure 7a and b
present the results of the spline regressions graphically. They suggest that the extreme
east and west cities are more likely to adopt lotteries, as are cities that are more northern.
Figures 6a,b and 7 here
The second column of Table 5 includes these splines in the same cloglog model
discussed above. Conditional on the other covariates, neither the smoothed spline values
nor the neighboring states variable is found to be significant.
12 I use arbitrary grid references instead of the more familiar latitude and longitude measures to avoid
introducing nonlinearities due to spherical coordinates.
A final way of introducing geography to the model is the use of a single spline for
both the x and y dimensions, such as a thin-plate spline (Hastie, Tibshirani, and Friedman
2001), which is the multidimensional analog of the cubic smoothing spline. Two thin-plate
spline fits of x and y position to the lottery adoption variable are shown as contour
maps in Figures 8a and b. The first spline fit was constructed by placing five knots at five
centroids of the data, the second uses 10 knots. As a comparison of the figures shows, the
use of more knots allows for a more complex contour surface.
Figures 8a, b Here
The results of including the smoothed values of the two thin-plate spline
regressions in the same cloglog model are presented in the last two columns of Table 5.
In both cases, once again, the coefficient on the number of neighboring states adopting a
lottery is not significant.
The final issue to discuss is the potential problem of spatial autocorrelation. In the
context of the lottery data, the variables that appear to be suspect from a theoretical
standpoint are lagged fiscal health, lagged income, and the percentage of the state
population adhering to a fundamentalist religion. In Table 6, I present the results of a
standard diagnostic, Moran’s I statistic (Anselin 1988), evaluated for each state capital
using the 1964 cross-section (the only year to have a complete set of observations). The
analysis is repeated for neighbors of each state in three concentric neighborhoods: 0-1000,
1000-2000, and 2000-3000 units (recall that the entire country is about 5000 by
7000 units).
Table 6 Here
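The diagnostic described above can be sketched in a few lines of Python. This is a minimal version that assumes binary weights (a pair of capitals counts as neighbors when the distance between them falls inside the band); the paper does not state its weighting scheme, so that choice, the function names, and the toy data are assumptions. Under the null of no spatial autocorrelation the expected value of I is -1/(n - 1), which for 48 states is about -0.021.

```python
import math

def morans_i(values, coords, lower, upper):
    """Moran's I with binary weights: w_ij = 1 when lower < distance(i, j) <= upper."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum over neighbor pairs of dev_i * dev_j
    w_sum = 0.0   # total weight (number of ordered neighbor pairs)
    for i in range(n):
        for j in range(n):
            if i != j and lower < math.dist(coords[i], coords[j]) <= upper:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    total = sum(d * d for d in dev)
    if w_sum == 0.0 or total == 0.0:
        raise ValueError("no neighbor pairs in this band, or no variation in values")
    return (n / w_sum) * (cross / total)

def expected_i(n):
    """Expected value of Moran's I under the null of no spatial autocorrelation."""
    return -1.0 / (n - 1)

# Toy data (hypothetical): two tight clusters with similar values inside each cluster.
toy_coords = [(0.0, 0.0), (500.0, 0.0), (3000.0, 0.0), (3500.0, 0.0)]
toy_values = [1.0, 1.0, 5.0, 5.0]
near = morans_i(toy_values, toy_coords, 0.0, 1000.0)     # close neighbors are alike
far = morans_i(toy_values, toy_coords, 2000.0, 3000.0)   # distant pairs are unlike
print(near, far, round(expected_i(48), 3))
```

In the toy data the nearest band gives a positive statistic and the farthest band a negative one, the same qualitative pattern reported for the religion variable.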
As the results show, spatial autocorrelation is a problem for all three variables in
the smallest neighborhood, in which states are positively correlated with their neighbors
(unsurprisingly). Interestingly, the religion variable is also negatively correlated within
the largest neighborhood.13
Accounting for this spatial autocorrelation in the context of a discrete duration
model, however, is not as simple as diagnosing it. One possible approach is the use of a
geographically weighted regression (Brunsdon, Fotheringham, and Charlton 1996),
perhaps in conjunction with a generalized linear model with an appropriate (i.e. logit,
probit or cloglog) link function or a relatively new and more general Bayesian approach
(LeSage Forthcoming). The problem here is that states that adopt the policy late or not at
all in the period of observation remain in the risk set longer, so they contribute a
disproportionate number of observations at their locations and may bias the results. To
illustrate this approach (mindful of its shortcomings), I conclude this paper with a
comparison of a linear probability duration model of the adoption of a state lottery (i.e.,
one estimated by least squares rather than properly accounting for the dichotomous nature
of the dependent variable) with a geographically weighted regression model in which local
linear regressions are estimated (in a 1000-iteration Monte Carlo process) for each data
point in space and the bandwidth parameter is estimated jointly with the coefficients. The
results are presented in Table 7.
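The core idea of geographically weighted regression, a separate weighted least squares fit at each location, can be sketched as follows. This is a deliberately simplified illustration and not the estimator behind Table 7: it assumes a single covariate, a Gaussian distance kernel, and a fixed bandwidth, whereas the model described above estimates the bandwidth jointly by simulation. All names and the toy data are hypothetical.

```python
import math

def gwr_local_fits(coords, x, y, bandwidth):
    """Fit a weighted least squares line y = a + b*x at every location.

    Gaussian kernel weights decline with distance from the focal point.
    The bandwidth is fixed here (an assumption), not estimated.
    """
    fits = []
    for focal in coords:
        w = [math.exp(-0.5 * (math.dist(focal, c) / bandwidth) ** 2) for c in coords]
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        # Closed-form weighted least squares for one covariate plus intercept.
        slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
        intercept = (swy - slope * swx) / sw
        fits.append((intercept, slope))
    return fits

# Toy data (hypothetical): the effect of x is +2 in the west and -2 in the east.
coords = [(0, 0), (100, 0), (200, 0), (5000, 0), (5100, 0), (5200, 0)]
x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
y = [1.0, 3.0, 5.0, 1.0, -1.0, -3.0]

local = gwr_local_fits(coords, x, y, bandwidth=300.0)   # small bandwidth: local fits
pooled = gwr_local_fits(coords, x, y, bandwidth=1e9)    # huge bandwidth: one global fit
print(local[0], local[5], pooled[0])
```

With a small bandwidth the local slopes recover the two regional effects, while a very large bandwidth collapses every local fit to the pooled least squares line, whose slope here is essentially zero. That contrast is what a test of spatial nonstationarity is meant to detect.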
13. I believe that this is an artifact of the size of the nation relative to the neighborhood size and the historical
concentration of fundamentalists in the South.
Table 1: Comparing Models of Duration Dependence Using State Lottery Adoption
Data (Berry and Berry 1990)—Probit Function, Standard Errors in
Parentheses
Variable                      Berry and Berry  Linear           Log              Cubic Spline
                              Model* (No       Duration         Duration         Duration
                              Duration         Dependence       Dependence       Dependence
                              Dependence)
Lagged Fiscal Health          -1.69 (1.22)     -2.52 (1.28)     -2.56 (1.25)     -2.40 (1.45)
Lagged Per Capita Income      0.023 (0.027)    0.008 (0.008)    0.013 (0.007)    0.013 (0.007)
Single Party Control          -0.40 (0.22)     -0.41 (0.21)     -0.41 (0.21)     -0.43 (0.21)
Gubernatorial Election Year   0.82 (0.34)      0.81 (0.36)      0.84 (0.35)      0.72 (0.37)
Neither Election Year nor     0.59 (0.34)      0.60 (0.36)      0.57 (0.35)      0.57 (0.38)
  Year After
Percentage Fundamentalists    -0.034 (0.019)   -0.07 (0.03)     -0.05 (0.02)     -0.07 (0.03)
Number of Neighboring States  0.27 (0.086)     0.15 (0.10)      0.18 (0.10)      0.15 (0.09)
  with Lottery
Constant                      -4.51 (0.94)     -8.68 (1.50)     -4.57 (0.98)     -3.80 (0.89)
Time Counter                                   0.078 (0.022)
Natural Log of Time Counter                                     0.49 (0.25)
Time Spline                                                                      0.15 (0.03)
Log-Likelihood                -89.56           -84.86           -87.20           -77.78
PCP                           97%              97%              97%              97%
PRE                           0%               3.7%             3.7%             0%
N = 857
* all models use robust standard errors clustered at the state level
Figure 1: Comparing Methods of Accounting for Duration Dependency
[Line plot. Horizontal axis: Year (65 to 85). Vertical axis: -1 to 24. Series: Time Counter, ln(Time Counter), Cubic Spline.]
Table 2: Comparing the Fit of Duration Dependency Models
Model                     Coefficient for Number of     Likelihood Ratio
                          Neighboring States            Statistic (d.f.)
                          (Standard Error)
No Duration Dependence    0.27 (0.09)***                N/A
Linear Function           0.15 (0.10)                   9.4 (1)***
Natural Logarithm         0.18 (0.10)**                 4.7 (1)**
Cubic Spline              0.15 (0.09)*                  23.6 (1)***
*   p < .10 (two-tailed for first column)
**  p < .05
*** p < .01
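The likelihood ratio statistics in Table 2 follow directly from the log-likelihoods reported in Table 1, since each statistic is twice the gain in log-likelihood over the model with no duration dependence. The short Python check below reproduces them; the function names are illustrative, and the p-values use the one degree of freedom that Table 2 reports for every comparison.

```python
import math

def lr_statistic(loglik_restricted, loglik_full):
    """Likelihood ratio statistic: twice the improvement in log-likelihood."""
    return 2.0 * (loglik_full - loglik_restricted)

def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variate with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

baseline = -89.56  # log-likelihood of the model with no duration dependence (Table 1)
alternatives = {"linear": -84.86, "log": -87.20, "cubic spline": -77.78}

lr = {name: lr_statistic(baseline, ll) for name, ll in alternatives.items()}
p_values = {name: chi2_sf_1df(stat) for name, stat in lr.items()}
for name in lr:
    print(name, round(lr[name], 1), p_values[name])
```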
Figure 2: Cumulative Proportion of All 48 Contiguous States Adopting a Lottery
versus Time
[Line plot. Horizontal axis: Year (65 to 85). Vertical axis: Cumulative Proportion Adopting Lottery (0.0 to 0.5).]
Table 3: Comparing Models of Diffusion Using State Lottery Adoption Data (Berry
and Berry 1990)—Probit Function, Standard Errors in Parentheses
Variable                        Number of Neighbors   Regional Percentage   Nationwide Total
                                                                            (Cubic Spline)
Lagged Fiscal Health            -2.40 (1.45)          -2.78 (1.46)          -2.34 (1.40)
Lagged Per Capita Income        0.013 (0.007)         0.011 (0.007)         0.015 (0.007)
Single Party Control            -0.43 (0.21)          -0.42 (0.21)          -0.40 (0.21)
Gubernatorial Election Year     0.72 (0.37)           0.70 (0.37)           0.78 (0.38)
Neither Election Year nor       0.57 (0.38)           0.57 (0.38)           0.59 (0.38)
  Year After
Percentage Fundamentalists      -0.07 (0.03)          -0.07 (0.02)          -0.08 (0.03)
Time Spline                     0.15 (0.03)           0.14 (0.03)           0.19 (0.08)
Number of Neighboring States    0.15 (0.09)
  with Lottery
Regional Percentage                                   0.98 (.052)
Nationwide Total Spline                                                     -2.85 (6.89)
East Spline
North Spline
Constant                        -3.80 (0.89)          -3.75 (0.89)          -3.85 (0.84)
Wald χ² (d.f.)                  61.18 (8)             59.99 (8)             47.90 (8)
PCP                             97%                   97%                   97%
PRE                             0%                    0%                    -3.7%
N = 857
All models estimated with robust standard errors clustered on state
Figure 3: Logit, Probit and Complementary Log-Log Compared
[Plot of three curves: Probit, Cloglog, Logit.]
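For readers who wish to redraw the comparison in Figure 3, the three functional forms can be written as inverse link functions that map the linear predictor to a probability. The sketch below uses only standard formulas; the function names are my own. The logit and probit curves are symmetric about a probability of one half, while the complementary log-log curve is not.

```python
import math

def inv_logit(eta):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Inverse complementary log-log link: 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

for eta in (-2.0, 0.0, 2.0):
    print(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta))

# Symmetry check: F(-eta) + F(eta) equals 1 for logit and probit, but not for cloglog.
asym = inv_cloglog(-1.0) + inv_cloglog(1.0)
```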
Table 4: Choice of Functional Form Matters for the Lottery Data
Variable                      Probit            Logit             Complementary     Bayesian Linear Model
                                                                  Log-Log           with Logit Link
Lagged Fiscal Health          -2.78* (1.46)     -5.80* (3.14)     -5.01* (2.67)     -6.25 [-11.15, -4.69]
Lagged Per Capita Income      0.011 (0.007)     0.02 (0.16)       0.016 (0.14)      0.02 [-0.003, 0.05]
Single Party Control          -0.42** (0.21)    -0.68 (0.48)      -0.52 (0.45)      -0.76 [-1.77, 0.07]
Gubernatorial Election Year   0.70* (0.37)      1.56* (0.83)      1.44* (0.79)      1.947 [0.19, 4.45]
Neither Election Year nor     0.57 (0.38)       1.24 (0.87)       1.19 (0.81)       1.64 [-0.05, 4.19]
  Year After
Percentage Fundamentalists    -0.07*** (0.02)   -0.14** (0.07)    -0.13** (0.07)    -0.16 [-0.29, -0.06]
Time Spline                   14.46*** (3.26)   28.86*** (6.33)   25.50*** (5.29)   29.40 [17.32, 41.16]
Regional Percentage           0.98* (.052)      1.82 (1.19)       1.65 (1.10)       1.69 [-0.85, 4.19]
Constant                      -3.75*** (0.89)   -7.14*** (1.80)   -6.52*** (1.50)   -7.93 [-11.15, -4.70]
Wald χ² (d.f.)                59.99 (8)         68.39 (8)         88.30 (8)
PCP                           97%               97%               97%
PRE                           0%                3.7%              0%
N = 857,
N = 901 for Bayesian model (missing data estimated jointly with parameters)
All classical models estimated with robust standard errors clustered on state
Bayesian model results are mean and [2.5%, 97.5%] of posterior distributions
*   p < .10, two-tailed
**  p < .05, two-tailed
*** p < .01, two-tailed
Figure 4: Empirical Posterior Distribution of Regional Percentage Coefficient
[Density plot labeled "b8 sample: 2001". Horizontal axis: -5.0 to 5.0. Vertical axis: 0.0 to 0.4.]
Figure 5: Lottery “Learning Curves” by Region
[Four panels: Midwest, West, Southeast, Northeast. Horizontal axis: Year (66 to 84). Vertical axis: Cumulative Proportion Adopting Lottery (0.0 to 1.0).]
Figure 6: Grid Coordinate Map of State Capitals, by Region
[Scatter plot. Horizontal axis: Eastings (1000 to 7000). Vertical axis: Northings (0 to 5000). Legend: Northeast, Southeast, Midwest, West.]
Figure 7a and b: Splines for x and y Positions of Capital Cities
[Two panels. First panel: xspline (0.00 to 0.08) against x (1000 to 7000). Second panel: yspline (0.00 to 0.12) against y (0 to 5000).]
Figures 8a and b:
Thin-plate Spline Fit to Probability of Adopting State Lottery, Knots at Five Centroids
[Contour map over x and y; contour levels labeled 0.0 and 0.1.]
Thin-plate Spline Fit to Probability of Adopting State Lottery, Knots at Ten Centroids
[Contour map over x and y; contour levels labeled 0.0 and 0.1.]
Table 5: Accounting for Geography, Complementary Log-Log Function, Standard
Errors in Parentheses
Variable                      Region Dummies    Geographic        Thin-Plate        Thin-Plate
                                                Splines (x and y) Spline, Knots     Spline, Knots
                                                                  at 5 Centroids    at 10 Centroids
Lagged Fiscal Health          -3.62 (2.37)      -3.93 (2.41)      -3.94 (2.50)      -3.83 (2.50)
Lagged Per Capita Income      0.02 (0.01)       0.02 (0.015)      0.02 (0.01)       0.02 (0.01)
Single Party Control          -0.60 (0.43)      -0.56 (0.45)      -0.51 (0.45)      -0.50 (0.45)
Gubernatorial Election Year   1.41 (0.79)       1.42 (0.80)       1.42 (0.79)       1.41 (0.79)
Neither Election Year nor     1.24 (0.81)       1.25 (0.82)       1.25 (0.81)       1.25 (0.81)
  Year After
Percentage Fundamentalists    -0.12 (0.08)      -0.13 (0.07)      -0.13 (0.07)      -0.13 (0.07)
Number of Neighboring States  0.13 (0.16)       0.16 (0.17)       0.16 (0.16)       0.13 (0.17)
  Adopting Policy
Time Spline                   0.29 (0.06)       0.28 (0.06)       0.27 (0.05)       0.28 (0.05)
Southeast                     -0.86 (1.32)
Midwest                       -1.05 (0.71)
West                          -0.87 (0.76)
East Spline                                     6.78 (13.52)
North Spline                                    5.37 (10.62)
Thin-Plate Spline                                                 7.58 (8.32)       9.24 (7.11)
Constant                      -6.27 (1.42)      -7.28 (1.70)      -7.37 (1.84)      -7.55 (1.79)
Log-likelihood                -78.4             -78.9             -79.1             -78.9
PCP                           97%               97%               97%               97%
PRE                           9%                6%                6%                6%
N = 857
All models estimated with robust standard errors clustered on state
Table 6: Spatial Autocorrelation Diagnostics
Variable                    Distance       Moran’s I     Standard Deviation
Lagged Fiscal Health        (0-1000]       0.182*        0.118
                            (1000-2000]    -0.029        0.062
                            (2000-3000]    -0.044        0.065
Lagged Income               (0-1000]       0.473**       0.116
                            (1000-2000]    -0.098        0.060
                            (2000-3000]    -0.038        0.064
Percentage Fundamentalist   (0-1000]       0.516**       0.112
                            (1000-2000]    0.060         0.058
                            (2000-3000]    -0.280**      0.062
Expected value of I = -0.021 for all tests
*  p < .05
** p < .01
Table 7: Comparing Results of Geographically Weighted Regression to Linear
Probability Model Estimates
Variable                      Linear Probability Model    Geographically Weighted Regression
                              Estimate (Standard Error)   Simulation Mean (Standard Deviation)
Lagged Fiscal Health          -0.16 (0.07)                -0.14 (0.06)
Lagged Income                 0.0004 (0.0004)             0.0004 (0.06)
Single Party Control          -0.02 (0.01)                -0.02 (0.009)
Gubernatorial Election Year   0.03 (0.015)                0.03 (0.02)
Neither Election Year nor     0.02 (0.01)                 0.01 (0.01)
  Year After
Percentage Fundamentalists†   -0.001 (0.0004)             -0.001 (0.0008)
Number of Neighboring States  0.02 (0.007)                0.01 (0.008)
  with Lottery
Time Spline                   0.93 (0.19)                 1.0 (0.37)
Eastings                                                  4852.4 (2324.6)
Northings                                                 2323.2 (1160.5)
Constant                      -0.04 (0.04)                -0.04 (0.02)
N = 857
† Null hypothesis of spatial nonstationarity rejected at less than the .01 level
References
Allison, Paul D. 1984. Event History Analysis: Regression for Longitudinal Data.
Newbury Park, Calif.: Sage.
Anselin, Luc. 1988. Spatial Econometrics: Methods and Models. Dordrecht: Kluwer.
Beck, Nathaniel, and Simon Jackman. 1998. Beyond Linearity by Default: Generalized
Additive Models. American Journal of Political Science 42:596-627.
Beck, Nathaniel, Jonathan N. Katz, and Richard Tucker. 1998. Taking Time Seriously:
Time-Series-Cross-Section Analysis with a Binary Dependent Variable. American
Journal of Political Science 42 (October):1260-88.
Berry, Frances Stokes. 1994. Sizing Up State Policy Innovation Research. Policy Studies
Journal 22 (3):442-456.
Berry, Frances Stokes, and William D. Berry. 1990. State Lottery Adoptions as Policy
Innovations: An Event History Analysis. American Political Science Review 84
(2):395-415.
Berry, Frances Stokes, and William D. Berry. 1991. Specifying a Model of State Policy
Innovation (Response). American Political Science Review 85 (2):573-9.
Berry, Frances Stokes, and William D. Berry. 1992. Tax Innovation in the States:
Capitalizing on Political Opportunity. American Journal of Political Science 36
(3):715-742.
Berry, Frances Stokes, and William D. Berry. 1999. Innovation and Diffusion Models in
Policy Research. In Theories of the Policy Process, edited by P. A. Sabatier.
Boulder, Colorado: Westview.
Box-Steffensmeier, Janet M., and Bradford S. Jones. 2002. Timing and Political Change:
Event History Modeling in Political Science. Ann Arbor: University of Michigan
Press.
Brunsdon, C., A. S. Fotheringham, and A. Charlton. 1996. Geographically Weighted
Regression: A Method for Exploring Spatial Non-Stationarity. Geographical
Analysis 28:281-298.
Frant, Howard. 1991. Specifying a Model of State Policy Innovation. American Political
Science Review 85 (2):571-3.
Gilks, W. R., A. Thomas, and David J. Spiegelhalter. 1994. A Language and Program for
Complex Bayesian Modeling. The Statistician 43:169-178.
Gill, Jeff. 2001. Whose Variance is it Anyway? Interpreting Empirical Models with
State-Level Data. State Politics and Policy Quarterly 1 (Fall):318-338.
Gray, Virginia. 1973. Innovation in the States: A Diffusion Study. American Political
Science Review 67 (4):1174-1185.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2001. The Elements of
Statistical Learning, Springer Series in Statistics. New York: Springer.
LeSage, James P. Forthcoming. A Family of Geographically Weighted Regression
Models. In Advances in Spatial Econometrics, edited by L. Anselin and J. G. M.
Florax. New York: Springer-Verlag.
Mintrom, Michael. 1997. Policy Entrepreneurs and the Diffusion of Innovation.
American Journal of Political Science 42:738-770.
Mintrom, Michael, and Sandra Vergari. 1998. Policy Networks and Innovation Diffusion:
The Case of Education Reform. Journal of Politics 60 (1):126-148.
Mooney, Christopher Z., and Mei-Hsien Lee. 1995. Legislative Morality in the American
States: The Case of Pre-Roe Abortion Regulation Reform. American Journal of
Political Science 39 (3):599-627.
Spiegelhalter, David J., Andrew Thomas, and N. G. Best. 1999. WinBUGS Version 1.2
User Manual. Cambridge, U.K.: MRC Biostatistics Unit.
Tobler, W. 1979. Cellular Geography. In Philosophy in Geography, edited by S. Gale and
G. Olsson. Dordrecht: Reidel.
Walker, Jack L. 1969. The Diffusion of Innovations among the States. American Political
Science Review 63 (3):880-899.
Western, Bruce, and Simon Jackman. 1994. Bayesian Inference for Comparative
Research. American Political Science Review 88:412-23.
Appendix: WinBUGS Code for Simple Bayesian Discrete Event History Model
(Linear with Logit Link)
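model {
   for (i in 1:n) {
      # Likelihood: adoption indicator for each state-year observation
      adopt[i] ~ dbern(pi[i])
      logit(pi[i]) <- b0 + b1*lagfiscal[i] + b2*lagincome[i] + b3*party[i]
                         + b4*elect1[i] + b5*elect2[i] + b6*religion[i]
                         + b7*timespline[i] + b8*regionpercent[i]
      # Distribution for party control, so that missing values can be estimated
      party[i] ~ dbern(.5)
   }
   # Diffuse normal priors on all coefficients (mean 0, precision 0.001)
   b0 ~ dnorm(0,0.001)
   b1 ~ dnorm(0,0.001)
   b2 ~ dnorm(0,0.001)
   b3 ~ dnorm(0,0.001)
   b4 ~ dnorm(0,0.001)
   b5 ~ dnorm(0,0.001)
   b6 ~ dnorm(0,0.001)
   b7 ~ dnorm(0,0.001)
   b8 ~ dnorm(0,0.001)
}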
_______________________[Data and starting values omitted]__________________