Causes and Statistics
Kevin A. Clarke † and Bear Braumoeller ‡
Do not cite without permission.
March 30, 2011
† Associate Professor, Department of Political Science, University of Rochester, Rochester, NY 14627-0146. Tel.: (585) 275-5217; Fax: (585) 271-1616; Email: [email protected].
‡ Associate Professor, Department of Political Science, The Ohio State University, Columbus, OH 43210. Tel.: (614) 292-9499; Fax: (614) 292-1146; Email: [email protected].
No causal claim can be established by purely statistical methods,
be it propensity scores, regression, stratification, or any other
distribution-based design. —Judea Pearl
1 Introduction
The Neyman-Rubin model has made major inroads in political science because it combines two powerful ideas: randomized experiments and the counterfactual account of causation. Criticisms of the
Neyman-Rubin model are often met with the claim that there are no other
options for those who seek a statistical technique that corresponds to a coherent account of causation. The goal of this paper is to show that there is
at least one additional account of causation that can be successfully matched
with an existing statistical methodology, in the sense that the causal mechanism comprising the former corresponds very closely to the data-generating
process assumed by the latter. We argue that probabilistic causality and
discrete choice models also fit this description.
In Section 2, we argue that methodology must be aligned with ontology.
In Section 3, we lay out two traditional accounts of causation, along with
a counterfactual account and a probabilistic account, and in Section 4, we
describe four desiderata that political scientists should require in a causal
account. We show that the traditional accounts, as well as the counterfactual account, fall short of satisfying these conditions. Section 5 chronicles
the development of a probabilistic account of causation that meets the four
desiderata described in the previous section. This account feels familiar to
political scientists, as it parallels the regression-type analyses common to the
discipline.
2 Causation to statistics
Progress in the field of political methodology in recent years has primarily
consisted of finding solutions to data problems—that is, addressing the inability of standard statistical techniques to extract information accurately
from diverse types of data. Developments in this area have been undeniably
impressive: the accomplishments of political methodologists are numerous
and diverse and have attracted considerable attention.1
This emphasis on the alignment of methodology with data has, however,
come at the expense of a focus on aligning methodology with ontology—at
least to a degree. Such a failure can be pernicious due to its invisibility.
Statistical models can only describe reality in their own terms, and the proliferation of such descriptions promotes a view of causation derived not from
theoretical logic but from statistical convention. Such implicit views of causation, once accepted, obscure the possibility of richer theoretical processes
that would require more complex methodology.
1 For especially prominent examples see King (1997), Beck, Katz, & Tucker (1998), King & Zeng (2001), and Martin & Quinn (2002).
Granted, some methodologists have been alert to the need to maintain a tight connection between ontology and method. After defining ontology as
“the fundamental assumptions scholars make about the nature of the social
and political world and especially about the nature of causal relationships
within that world,” for example, Peter Hall notes that
[o]ntology is ultimately crucial to methodology because the appropriateness of a particular set of methods for a given problem
turns on assumptions about the nature of the causal relations
they are meant to discover. . . . To be valid, the methodologies
used in a field must be congruent with its prevailing ontologies.
(Hall 2003, 374)
Nevertheless, with rare exception, connections between methodology and ontology generally remain both implicit and tenuous.
The Neyman-Rubin model, described in Holland (1986) (also see the Appendix), is one refreshing exception to this generalization. Based as it is on
the counterfactual logic of causation described by Lewis (1973), the model
is a methodology tailored to a specific understanding of causation.2 That
fact explains its advocates’ tendency to refer to the results of such analyses as “causal inference,” in contradistinction to the “descriptive” inference
provided by other methods—a distinction which, we fear, may obscure more
than it reveals. As Holland (1986, 947) pointed out, the problem of causal
inference is fundamental. The fact that causation cannot be observed implies
that inference will by its nature be inexact—a problem that can, and should, be minimized, but never completely eliminated. The distinction between “causal” and “descriptive” inference therefore does not refer to a distinction between direct observation of and inference about causal effects, since the former is impossible. Moreover, basing the distinction on the difference in the precision with which inferences are made is especially arbitrary, since the precision of those inferences may vary more across subject areas than across methodologies.
2 Lewis (1973) is cited prominently by King, Keohane, & Verba (1994), Morgan & Winship (2007), Pearl (2009), and Sekhon (2009).
The only viable justification for a distinction between “causal” and “descriptive” inference that places matching methods in one category and other
correlational methodologies in another lies in the close connection between
the potential outcomes framework and the counterfactual account of causation that underpins it. Basing the distinction between the two on that
foundation, however, immediately raises the question of whether the counterfactual account of causation is the only one that could be plausibly connected
to a methodological counterpart.
The answer, in short, is “far from it.” Although political scientists tend
to focus on understandings of causation that bear a striking resemblance to
the uncomplicated ontologies implied by the simplest of their methodologies
(Achen 1992, 197; Beck & Jackman 1998, 597), a collection of essays (Sosa
& Tooley 1993) on the varieties of causation contains no fewer than fifteen
chapters. There is no single, uncontested account of causation; nor, as we
intend to demonstrate, is there only one that can plausibly be connected to an
existing statistical method.
3 A few accounts of causation
What follows are a few (necessarily brief) existing accounts of causation.
Our goal is not to be comprehensive, but rather, to demonstrate that many
understandings of causation exist side-by-side in the philosophy of science
literature by discussing a few of the most prominent extant accounts. In the
subsequent section we assess each by measuring it against four desiderata to
determine its utility for social scientists.
3.1 Necessity and sufficiency
Traditional analyses of causation specify causal relations in terms of necessary and sufficient conditions. Let c be a cause of e, an event. A general
specification of causation in terms of these conditions might be as follows
(see Sosa & Tooley 1993):
1. Sufficiency
c is a cause of e if and only if c and e are actual and c is ceteris paribus
sufficient for e.
2. Necessity
c is a cause of e if and only if c and e are actual and c is ceteris paribus
necessary for e.
3.2 Counterfactuals and manipulation
Lewis’s (1973) paper on counterfactual causation is philosophically technical,
but the main part of his account can be summarized succinctly:
An event e causally depends on an event c just in case if c had
not occurred e would not have occurred.
The same condition written in terms of smoking and lung cancer is
∃x(∼ Sx →∼ LCx).
That is, there exists a person who, if he had not smoked, would not have
lung cancer.
3.3 Probabilistic causality
The basic idea of probabilistic causality is that the presence of a cause should
raise the probability of its effect. A general specification of causation in terms
of probability might be
c is a cause of e if and only if e is more likely in the presence of
c than in the absence of c, ceteris paribus.
This account of causation was developed by Reichenbach (1956), Good (1961),
Good (1962), Suppes (1970), Cartwright (1979), Salmon (1998), Humphreys
(1989), and Eells (1991).
4 Desiderata and problems for the traditional analyses
Political scientists, and social scientists in general, should demand four things
of an account of causation: an ability to handle general causal statements,
an ability to handle multiple causes (overdetermination), an acceptance of
non-manipulable causes, and a compatibility with indeterminism. Justifying these desiderata is not difficult. Social scientists generally wish to make
general causal statements of the type “education causes liberalism” versus
singular causal statements of the type “Jane’s education caused Jane’s liberalism.” Theories positing multiple causes are common throughout the social
sciences (Braumoeller 2003, Humphreys 1989). Scholars often wish to speak
of the effects of attributes such as race or gender, which are non-manipulable,
and social phenomena are quite likely to be indeterminate. We discuss each
of these issues below and demonstrate the problems they present for traditional analyses.
4.1 General Causation
Political scientists generally want to make claims of the form “c causes e.”
To be specific, they wish to make claims that are analogous to
Smoking causes lung cancer,
as opposed to claims such as
Bob’s smoking caused his lung cancer.
Taking an example from the literature of international relations, political
scientists wish to make claims similar to
System uncertainty causes an increase in dispute initiation (Huth,
Gelpi, & Bennett 1993),
and not
System uncertainty caused a dispute initiation between Israel and
Egypt.
A clue to the distinction between the first and third statements, on the one hand, and the second and fourth, on the other, can be found in the tense of the
verb “cause.” Singular causal statements such as the second and fourth
statements address past events and answer the question, “Did c cause e?”
General causal claims, on the other hand, answer the question, “Does c cause
e?” The difference is between a statement that makes a claim that is contingent upon a specific time and place, and a claim that is generalized. Political
scientists want to make generalized claims about political phenomena and
therefore need a theory of causation that can accommodate such claims.
The unique characteristic of general causal statements is that they may be true despite numerous examples to the contrary (Cartwright 1979, Carroll
1991). Smoking may cause lung cancer even if it did not cause lung cancer
in Bob’s case or Jane’s case or Fred’s case.
The problems these statements present to a necessary or sufficient analysis
are clear. A sufficient condition takes the form
S → LC.
That is, smoking is sufficient for lung cancer, which means that everyone
who smokes contracts lung cancer. That statement is clearly untrue. Many
smokers never contract lung cancer, and yet, we still believe that smoking
causes lung cancer.3
Necessary condition accounts suffer from similar flaws. A necessary condition takes the form
LC → S.
That is, smoking is necessary for lung cancer, which means that everyone
who contracts lung cancer smoked. This statement is also untrue. There
exist numerous people with lung cancer who never touched a cigarette, and
still we believe that smoking causes lung cancer.4
3 Even if the biological link had not been found.
4 While the above makes clear that traditional analyses cannot accommodate general causal statements, it is less clear whether separate theories of singular and general causation are necessary. Good (1961), Sober (1984) and Eells (1991) are among those who believe that separate analyses are necessary. Carroll (1991) and Hitchcock (1995) are among those who believe that one analysis should work for both levels. We assume that singular and general causal claims need not be explained by the same theory.
As for counterfactuals, recall that social scientists wish to make statements of the kind “smoking causes lung cancer,” or more appropriately, “dyadic
democracy causes peace.” Lewis (1973) is quite explicit that his account
does not apply to such cases, “My analysis is meant to apply to causation in
particular cases. It is not an analysis of causal generalizations.” The problem he points to concerns the interpretability of the general counterfactual.
The statement “If Bob had not smoked, he would not have contracted lung
cancer” is perfectly interpretable. The statement “If people do not smoke,
they would not get lung cancer” is demonstrably false, and its meaning is
unclear. It could mean, for example, that for some cases of smoking and some cases of lung cancer, the smoking caused the lung cancer. It
could also mean that for every case of lung cancer, there exist some cases
of smoking where smoking caused the lung cancer. It could even mean that
for every case of smoking, there exist some cases of lung cancer such that
smoking caused them.
4.2 Multiple Causes (Overdetermination)
The presence of multiple causes also creates serious problems for the traditional analyses of causation. If these theories cannot accommodate multiple
causes, they are useless to the social sciences, where very few social theories
can be adequately represented by single variable models. Multivariate analysis has been the mainstay of empirical social science, and its use implies
that multiple causal factors are at work. An example from Mackie (1974, 44)
illustrates the problems of overdetermination:
Lightning strikes a barn in which straw is stored, and a tramp
throws a burning cigarette butt into the straw at the same place
and at the same time: the straw catches fire.
The necessity analysis of causation requires that a cause be a necessary
condition of its effect. Neither the lightning strike nor the cigarette butt in
this example is necessary for the fire to start. If lightning had not struck, the
cigarette butt would have caused the fire. If the tramp had not thrown the
cigarette, the lightning strike would have caused the fire. The lightning strike
and the cigarette butt are certainly not jointly necessary. Either way then,
by adhering to the necessity account of causation, we would have to deny
that either the lightning strike or the cigarette butt caused the fire (Sosa &
Tooley 1993).
A similar problem plagues the sufficiency analysis. Let us assume, following Humphreys (1989), that the basic minimum definition of cause is that c
is a cause of e only if c’s existence contributes to e’s existence. Let us further
assume that either the lightning strike or the cigarette butt is sufficient to
start the fire. Given the lightning strike, the cigarette butt contributes nothing to the onset of the fire. Similarly, given the cigarette butt, the lightning
strike contributes nothing to the onset of the fire. Neither the lightning strike
nor the cigarette butt then contributed to the existence of the fire and therefore neither meets Humphreys’s minimum standard. Neither the lightning
strike nor the cigarette butt, then, is the cause of the fire.
The counterfactual account fares no better in the face of multiple causes or overdetermination. Consider two people simultaneously shooting a third person through the heart. The counterfactual account requires that if bullet A had not been fired, the third party would not have died. That is not true: bullet B would have killed him, and yet no one would have a problem stating that bullet A was a cause of the untimely death. Consider also Schelling’s (1966) claim that either a strong deterrent or a strong defense could prevent war. The absence of a strong deterrent alone would not be sufficient to generate war, nor would the absence of a strong defense alone, yet such counterfactual dependence is exactly what the counterfactual account requires.5
Braumoeller (2003) provides some examples of overdetermination from
the international relations literature. These include Schelling’s (1966) claim
that either a strong deterrent or a strong defense could prevent war; Mueller’s
(1988) claim that either nuclear weapons or the lessons of World War II would
have produced Cold War stability; Morrow’s (1991) claim that pursuit of
either autonomy goals or security goals can lead to the decision to seek an
alliance.
4.3 Non-manipulable causes
“No causation without manipulation” is a principle espoused by a number of statistical accounts of causation, as well as a number of philosophical accounts (Pearl 2009, Woodward 2003, Holland 1986, Cook & Campbell 1979). Under such a principle, an attribute of a unit, such as race or gender, cannot be a cause because the attribute could not be used as a treatment in an experiment. Also ruled out would be examples such as the causal relationship between the gravitational attraction of the moon and the motion of the tides (Woodward 2008). In general, necessity and sufficiency accounts of causation have no trouble accommodating non-manipulable causes. As we discuss below, the same is not true of counterfactual accounts.
5 It might be argued that these predicates are not events, but they could easily be turned into events.
While Lewis’s (1973) counterfactual account is not strictly a manipulationist account, it is closely related. A counterfactual analysis must explain
what is to remain fixed and what is changed in the closest or most-similar
world (Woodward 2008). Correspondingly, Holland (1986, 954) limits the
notion of a cause in the Neyman-Rubin model to events that could serve as
treatments in an experiment. In doing so, he leaves no role either for the
attributes of units or for the actions of units (Goldthorpe 2001, 6). Political scientists, however, often have compelling theoretical reasons to speak of
attributes, such as dyadic democracy or race or gender, as causes.
4.4 Indeterminism
Determinism is simply the philosophical position that if an event has happened or will happen, then it could not have failed to happen. That is, all actions and reactions in the universe have been predetermined since the dawn
of time. The relationship between necessary and sufficient conditions and
determinism is easily seen if we rewrite the conditions in general terms:
c → e (sufficient condition)
e → c (necessary condition)
The sufficient condition reads, “If c (the cause) occurs, then e (the effect)
occurs.” The sufficient condition states that if c occurs, e must follow. If
the occurrence of e does not always follow c, as would be the case in a
nondeterministic world, then there can be no sufficient conditions.
A similar argument may be made for necessary conditions. The necessary
condition reads, “e (the effect) occurs only if c (the cause) occurs.” The
condition states that if e occurred, then c must have occurred. If e can occur
by chance, as would be the case in a nondeterministic world, then there exist
no necessary conditions. Necessary and sufficient conditions and the analyses
that depend upon them (such as Mackie’s INUS conditions) are incompatible
with indeterminism.
The counterfactual account is just as rooted in determinism as the other
traditional accounts are. Lewis (1973) explicitly restricts his attention to
determinism, stating that he is “content” to offer an account that works
under determinism.
Whether or not the world is actually indeterministic, we can certainly
accept the premise as a working hypothesis. The fact that determinism
is under attack in the physical sciences makes determinism in the social
sphere even less likely. Humphreys (1989, 17) makes the following persuasive
argument,
Consider a man who, on a whim, takes an afternoon’s motorcycle
ride. Descending a hill, a fly strikes him in the eye, causing him
to lose control. He skids on a patch of loose gravel, is thrown
from the machine, and is killed. This sad event, according to the
universal determinist, was millions of years beforehand destined
to occur at the exact time and place that it did. . . . This claim, when
considered in an open-minded way, is incredible.
Humphreys goes on to argue that whatever may actually be true, determinism should not be accorded “high initial probability.”
Finally, there are a host of other problems that plague counterfactuals.
Lewis has to add special requirements that rule out certain kinds of counterfactuals. Events, for example, must be distinct so as to rule out examples
such as “If I had not typed the letter B, I would not have typed the word
Bear.” A non-backtracking condition must be added to rule out cases where
a counterfactual dependence between c and e does not mean that c causes
e (Horwich 1987). An example is “If the war had happened, it would have
been over oil.” Non-causal action is also a problem (Kim 1973). Consider
the counterfactual “If I had not pulled the lever, I would not have voted for
Obama.” While a perfectly reasonable counterfactual, pulling the lever does
not cause the vote for Obama, although it did cause the vote to be cast.
Those two events, however, are quite different.
In the next section, we show that probabilistic causality does not run into
many of these problems and can be coherently linked with discrete choice
statistical models.
5 Probabilistic causality
As a reminder, the basic idea of probabilistic causality is that the presence of
a cause should raise the probability of its effect, and a general specification
of causation in terms of probability might be
c is a cause of e if and only if e is more likely in the presence of
c than in the absence of c, ceteris paribus.
Accounts of this type were developed by Reichenbach (1956), Good (1961),
Good (1962), Suppes (1970), Cartwright (1979), Salmon (1998), Humphreys
(1989), and Eells (1991).
Note that general causal statements present no problems for probabilistic
analyses of causation. Proponents of these accounts claim only that causes
make their effects more likely, not that the causes must always be present
when the effects are present (necessary conditions) or that effects must always
be present when their causes are present (sufficient conditions). The fact that
general causal claims describe relations that do not hold in every instance is
easily accommodated by probabilistic accounts because of the very nature of
probability. To say that the probability of “heads” when flipping a coin is
50% does not mean that half of every sequence of coin flips should result in
“heads.” The statement could very well be true even if no sequence (except
the infinite sequence) resulted in 50% “heads.”
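The point about finite sequences can be illustrated with a short simulation (our sketch, not part of the original argument; the sequence length of 101 is chosen so that exactly 50% heads is arithmetically impossible):

```python
import random

random.seed(7)

def heads_fraction(n_flips):
    """Fraction of heads in one simulated sequence of fair-coin flips."""
    return sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips

# Simulate many sequences of 101 flips: with an odd length, no sequence
# can come out at exactly 50% heads, yet the coin is perfectly fair.
fractions = [heads_fraction(101) for _ in range(1000)]
exactly_half = sum(abs(f - 0.5) < 1e-12 for f in fractions)
print(exactly_half)  # 0: no sequence hits exactly 50% heads
print(abs(sum(fractions) / len(fractions) - 0.5) < 0.02)  # True: mean near 50%
```

The probability statement is true of the long-run behavior, not of any particular finite sequence.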
Probabilistic accounts do not restrict the notion of cause to manipulable
events, nor do they require “all or nothing” determinations. The lightning
strike and the cigarette butt may both raise the probability of the fire and
therefore both may be classified as a cause. Thinking about causation in
probabilistic fashion allows for talk of “contributing” causes or factors in a
way that the other analyses do not. It is then possible to talk of how much
one factor or another contributes to the effect.
Probabilistic accounts of causation are, by their very nature, compatible
with indeterminism. What must be understood is the relation between causation and chance. Under the probabilistic account, causes raise the probabilities of their effects. Whether those effects happen, though, is up to chance;
the effect or event either occurs or does not occur. The point to remember
here is that chance is not a causal power (Humphreys 1989).
As stated above, the general idea behind a probabilistic analysis of causation is quite simple: a cause raises the background probability that an
event or effect occurs. The condition is generally formalized in the notation
of conditional probability. c is a cause of an effect, e, if and only if:
Pr(e| c) > Pr(e| ∼ c).
That is, the probability that e occurs is greater in the presence of c
than in the absence of c. The canonical example from the philosophical
literature is again the relationship between lung cancer and smoking. While it
is certainly possible to contract lung cancer without smoking, the probability
of contracting lung cancer is greater if one smokes. Numerous similar examples
may be found throughout political science: higher education increases the
probability of voting Democratic, dyadic democracy increases the probability
of peace between dyads, and bipolarity increases the probability of stability.
Unfortunately, the above does not suffice as a statement of probabilistic
causation. To take another canonical example, a storm is more probable
when a barometer is falling than when a barometer is rising. That is, storms
and falling barometers are positively correlated. Under our condition, we would
have to assign the falling barometer the status of cause, although falling
barometers are clearly not the cause of storms. The simple inequality above,
therefore, is too vague and must be refined.
The problem with the barometer example is, of course, the problem of
spurious correlation, an idea that is familiar to social scientists of all persuasions. The stumbling block is that our intuitions regarding causation in this
case do not match the physical correlations that we observe. The goal is to
find a sufficient condition that always provides correlations that match our
causal intuitions. We know very well that falling barometers do not cause
storms, and we need a method of calculating the conditional probability that
reflects that fact. If we can find such a condition, we can apply it to situations
where our intuitions are not as good.
The history of probabilistic causation in philosophy is really the history
of attempts to match causal intuitions to correlations. In the discussion that
follows, we draw explicitly on the work of Cartwright (1979) and Eells (1991),
who together have crafted one of the more successful attempts to pin down
a theory of probabilistic causation. Their work in turn owes quite a bit to
Reichenbach (1956) and Suppes (1970).
In the simplest case of spurious correlation, c raises the probability of e,
but c is not a cause of e. Instead, a third factor, z, is correlated with both
c and e and therefore accounts for the noncausal correlation between c and
e. Looking at the barometer/storm example, it is an approaching cold front
that causes both the falling barometer and the storm and thereby generates
the spurious correlation. Following Reichenbach (1956), Eells defines the
following probabilistic structure which Salmon (1998) labels a “conjunctive
fork”:
Pr(c| z) > Pr(c| ∼ z),
Pr(e| z) > Pr(e| ∼ z),
Pr(e| c & z) = Pr(e| ∼ c & z),
Pr(e| c & ∼ z) = Pr(e| ∼ c & ∼ z),
all of which, when taken together, imply
Pr(e| c) > Pr(e| ∼ c).
The first inequality states that z is correlated with c (cold fronts and
falling barometers are correlated). The second inequality states that z is
also correlated with e (cold fronts and storms are correlated). The third
and fourth conditions state that z “screens off” c from e. In more familiar terms, the third and fourth conditions state that by controlling for z (i.e.,
holding z “fixed”), the spurious correlation between c and e disappears (when
we control for approaching cold fronts, we see that there is no correlation
between storms and falling barometers).
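The conjunctive fork can be checked numerically. In the sketch below (our illustration; the probability values are invented), a cold front z raises the probability of both a falling barometer c and a storm e, while c has no effect on e once z is held fixed:

```python
import random

random.seed(0)

def simulate(n=200_000):
    """Cold front (z) causes both a falling barometer (c) and a storm (e);
    the barometer is a mere symptom, with no direct effect on the storm."""
    data = []
    for _ in range(n):
        z = random.random() < 0.3                   # approaching cold front
        c = random.random() < (0.9 if z else 0.1)   # barometer falls
        e = random.random() < (0.8 if z else 0.1)   # storm occurs
        data.append((z, c, e))
    return data

def pr(data, event, given):
    """Conditional relative frequency Pr(event | given)."""
    sub = [row for row in data if given(row)]
    return sum(event(row) for row in sub) / len(sub)

data = simulate()
# Marginally, falling barometers and storms are correlated...
p_e_c = pr(data, lambda r: r[2], lambda r: r[1])
p_e_notc = pr(data, lambda r: r[2], lambda r: not r[1])
print(p_e_c > p_e_notc)  # True: the spurious correlation
# ...but holding the cold front fixed, the correlation disappears.
gap_z = abs(pr(data, lambda r: r[2], lambda r: r[0] and r[1])
            - pr(data, lambda r: r[2], lambda r: r[0] and not r[1]))
gap_notz = abs(pr(data, lambda r: r[2], lambda r: not r[0] and r[1])
               - pr(data, lambda r: r[2], lambda r: not r[0] and not r[1]))
print(gap_z < 0.02 and gap_notz < 0.02)  # True: z screens off c from e
```

The simulated frequencies reproduce the fork: c and e are correlated unconditionally, and conditioning on z removes the association.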
Based on the above, we can strengthen our original statement of probabilistic causality with an additional condition:
There exists no z, temporally prior to c and e, that “screens off” c from e.
The issue, though, is not so simple. So far we have addressed only positive correlation and positive causal relevance (i.e., c raises the probability
of e). The correlations we observe, however, may also be negative or zero.
The causal relations between the factors may also be negative (c lowers the
probability of e) or neutral (c has no effect on the probability of e). Eells
points out that each of these possible correlational states may be combined
with each of the states of causal relevance (see Eells, 1991, chapter 2 for
examples). Table 1 lists the possible combinations.
Table 1:
                         Causal Relevance
  Correlation      Positive     Neutral      Negative
  Positive         Match        Mismatch     Mismatch
  Zero             Mismatch     Match        Mismatch
  Negative         Mismatch     Mismatch     Match
Along Table 1’s main diagonal, the observed correlations match our causal
intuitions, and hence, there is no problem. The interesting cases are in the
off-diagonal cells. Here, the observed correlations do not match our causal
intuitions, and we observe spurious correlations.
The lesson, Eells concludes, is that positive correlation is neither necessary nor sufficient for positive causal relevance. What can be said about these
cases of spurious correlation is that the explanation involves a third factor, z,
that is causally relevant to e “independently of c’s causal role, if any, for e”
(Eells 1991). Cartwright (1979) points out that Simpson’s paradox explains
all these examples and that any association between two variables in a given
population, such as,
Pr(e| c) > Pr(e| ∼ c),
Pr(e| c) = Pr(e| ∼ c),
Pr(e| c) < Pr(e| ∼ c),
can be reversed in the subpopulations by finding a z that is correlated with
both c and e. She argues that the condition holds, but only in situations
where all other causal factors are held fixed—a requirement she refers to as
“causal homogeneity.” Cartwright’s account is then:
c causes e if and only if c increases the probability of e in every
situation which is otherwise causally homogeneous with respect
to e.
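Cartwright’s reversal claim is easy to verify with concrete numbers. The following sketch (our illustration; the counts are invented for the example) exhibits an association between c and e that is positive within every level of z yet reverses once the subpopulations are pooled:

```python
from fractions import Fraction

# Hypothetical counts, invented for illustration. Within each subpopulation
# z, c is associated with a higher frequency of e; pooled, the association flips.
#         (e-and-c, total c, e-and-not-c, total not-c)
counts = {
    "z":     (81, 87, 234, 270),
    "not z": (192, 263, 55, 80),
}

# Within every level of z, Pr(e | c) > Pr(e | ~c) ...
for z, (ec, nc, enc, nnc) in counts.items():
    assert Fraction(ec, nc) > Fraction(enc, nnc), z

# ... but pooling the subpopulations reverses the inequality.
pooled_ec = sum(v[0] for v in counts.values())
pooled_nc = sum(v[1] for v in counts.values())
pooled_enc = sum(v[2] for v in counts.values())
pooled_nnc = sum(v[3] for v in counts.values())
print(Fraction(pooled_ec, pooled_nc) < Fraction(pooled_enc, pooled_nnc))  # True
```

The reversal arises because z is correlated with both c and e, exactly as Cartwright describes.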
The above is the starting point for Eells’s account of causation. Eells
demonstrates that controlling for “causal background contexts” is a sufficient condition for matching our intuitions to the observed correlations. The
“causal background context” requirement is precisely the same as Cartwright’s
requirement of causal homogeneity. Both requirements simply state that all
factors that are causally relevant to e, but are causally independent of c, must
be held fixed. Employing Hitchcock’s (1993) notation, Eells’s necessary and
sufficient condition for positive causal relevance is
Pr(e| c ∩ zi ) > Pr(e| ∼ c ∩ zi ), ∀zi ,
where zi is a partition of the causal background context. A partition is a
mutually exclusive and exhaustive set of factors that are causally relevant to
e. Using the same notation, we can define negative causal relevance
Pr(e| c ∩ zi ) < Pr(e| ∼ c ∩ zi ), ∀zi ,
and causal neutrality
Pr(e| c ∩ zi ) = Pr(e| ∼ c ∩ zi ), ∀zi .
The three conditions simply state that in order to affirm the causal relevance
of factor c, whether positive, negative or neutral, we must control for all other
causal factors.
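Eells’s three conditions amount to a simple check over the partition of background contexts. The sketch below (our illustration; the helper name and probability tables are hypothetical) classifies c’s causal relevance from the conditional probabilities in each context:

```python
def causal_relevance(table):
    """Classify c's causal relevance to e under Eells's context-unanimity
    condition. `table` maps each background context z_i to the pair
    (Pr(e | c & z_i), Pr(e | ~c & z_i)). (Hypothetical helper.)"""
    signs = set()
    for p_given_c, p_given_notc in table.values():
        if p_given_c > p_given_notc:
            signs.add("+")
        elif p_given_c < p_given_notc:
            signs.add("-")
        else:
            signs.add("0")
    if signs == {"+"}:
        return "positive"   # c raises Pr(e) in every context
    if signs == {"-"}:
        return "negative"   # c lowers Pr(e) in every context
    if signs == {"0"}:
        return "neutral"    # c makes no difference in any context
    return "mixed"          # unanimity fails: no verdict on causal relevance

# c raises Pr(e) in every background context: positively causally relevant.
print(causal_relevance({"z1": (0.6, 0.3), "z2": (0.2, 0.1)}))  # positive
# A Dupre-style case: c lowers Pr(e) in one rare context, so unanimity fails.
print(causal_relevance({"z1": (0.6, 0.3), "z2": (0.1, 0.4)}))  # mixed
```

The second call anticipates the Dupre counterexample discussed below: a single discordant context blocks any verdict under context-unanimity.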
To be causally relevant then, according to Eells, c must have an effect
on e “beyond that which is explainable by other, independent, causes of e
that may be correlated with c” (Eells 1991, 84). The three conditions above explicitly state that the equality or inequality must hold for all zi. The
requirement is known as “context-unanimity” and means that a cause must
raise or lower, or neither raise nor lower, the probability of its effect across
all elements of the partition z (Hitchcock 1995). The “context-unanimity”
requirement has not been universally accepted. The counterexample comes
from Dupre (1984, 172):
Suppose that scientists employed by the tobacco industry were
to discover some rare physiological condition the beneficiaries of
which were less likely to get lung cancer if they smoked than if
they didn’t.
The problem is that were we to hold the “rare physiological condition”
fixed, as context-unanimity requires, we would find a situation where smoking
lowers the probability of lung cancer, thereby violating Eells’s necessary
and sufficient condition. We would therefore conclude that smoking is not a
cause of lung cancer.
Eells is, of course, aware of this argument and answers it by arguing that
a probabilistic causal claim may only be made in reference to a particular
population. The example Eells uses is that the conditional probability of
having a heart attack given 15 years of heavy smoking is likely to be different
for 30-year-olds and for 50-year-olds. In order to assess accurately the causal
relevance of c on e, then, we must identify both the causal background context
and the relevant population.
The account of probabilistic causation provided above should prove to
be somewhat familiar to quantitative political scientists as the account is
analogous to discrete choice models in statistics. Cartwright (1979, 435), in
fact, points out that Eells’s condition is known to statisticians as the partial
conditional probability of e on c, holding zi fixed, and as such it forms the
basis for regression-type analyses. Note the match to, for example, binary
choice models, where the probability of the dependent variable taking a value
of 1 is the partial conditional probability holding the independent variables
x fixed,
Pr(Y = 1 | x) = F(x, β).
Letting F(x, β) = Φ(Xβ), the result is the probit model,

Pr(Y = 1 | x) = Φ(Xβ) = ∫_{−∞}^{Xβ} φ(t) dt.

Letting F(x, β) = Λ(Xβ), the result is the logit model,

Pr(Y = 1 | x) = Λ(Xβ) = e^{Xβ} / (1 + e^{Xβ}).
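The two link functions are easy to compute by hand. The sketch below evaluates Φ(Xβ) and Λ(Xβ) for a single observation; the coefficient vector β and covariate vector x are invented for illustration, since in an actual analysis β would be estimated, typically by maximum likelihood:

```python
# Sketch: probit and logit links computed by hand. The coefficients and
# covariates are invented; in practice beta would be estimated (e.g. by MLE).
import math

def probit(xb):
    """Standard normal CDF, Phi(x'beta), via the error function."""
    return 0.5 * (1.0 + math.erf(xb / math.sqrt(2.0)))

def logit(xb):
    """Logistic CDF, Lambda(x'beta) = e^(x'beta) / (1 + e^(x'beta))."""
    return 1.0 / (1.0 + math.exp(-xb))

beta = [0.5, -1.2]                        # hypothetical coefficients
x = [1.0, 0.3]                            # covariates (intercept first)
xb = sum(b * v for b, v in zip(beta, x))  # the linear index x'beta

print(probit(xb), logit(xb))              # both probabilities lie in (0, 1)
```

The two models differ only in the assumed distribution of the latent error; both map the linear index into a probability, which is what allows either to represent the partial conditional probability of e on c holding the zi fixed.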
The match between these statistical models and probabilistic causality
is so close that working with these techniques in practice requires fulfilling
the conditions that Eells and others have laid down for causal relevance.
Context-unanimity, for example, must hold. That is, to demonstrate the
causal relevance of variable x1 , it is necessary that x1 have an effect on the
dependent variable beyond what is explainable by the other independent
variables. Meeting the condition in practice requires “proper specification”
before the results of the estimation may be believed (Freedman 1987). This
assumption is no more or less difficult to meet in practice than the assumption
of strong ignorability in the Neyman-Rubin model. The difference in the two
approaches lies in the underlying causal accounts that impart causal meaning
to their results.
6 Conclusion
Our goal in this paper is a simple existence proof: We seek to demonstrate
that more than one coherent account of causality corresponds to a statistical
methodology widely used by political scientists. To that end, we explore the
connection between the probabilistic account of causation and discrete choice
models in statistics. We conclude that the two are indeed congruent, and
that the use of the latter requires only demonstrating that the assumption of
context-unanimity is reasonable—a challenge no greater than demonstrating
strong ignorability in the Neyman-Rubin model. Moreover, we lay out four
desiderata that political scientists should require in an account of causality. These include an ability to handle general causal statements, an ability
to deal with cases of overdetermination, an acceptance of non-manipulable
causes, and a compatibility with indeterminism. We argue that a probabilistic account of causality meets these desiderata more successfully than
a counterfactual account. Thus, the Neyman-Rubin model, which relies on
a counterfactual understanding of causality, is likely not the best model for
assessing causality in quantitative political science.
A The Neyman-Rubin Model
Following the notation of Holland (1986), let Yt(u) be the response of
population unit u under cause t, and let Yc(u) be the response of the same
unit u under cause c, where t stands for “treated” and c stands for “not
treated” or “control.” The effect of cause t on u, relative to cause c, is

Yt(u) − Yc(u).
The difference above cannot be observed because unit u cannot simultaneously
be in both the treatment group and the control group. In a real sense,
causal inference is a missing data problem, because only one of the two potential
outcomes can be observed for unit u (Sekhon 2008).
The solution to this problem is to turn attention to the average treatment
effect, or

T = E(Yt − Yc) = E(Yt) − E(Yc).
The equations above tell us that if the treatment and control groups
meet certain criteria, then we can learn about the average treatment effect
by averaging over those in the treatment group, averaging over those in the
control group, and taking the difference.
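The averaging-and-differencing just described can be sketched on simulated data. The sketch below assigns treatment at random, which satisfies the “certain criteria” discussed next; the baseline outcome distribution and the effect size are hypothetical:

```python
# Sketch: the average treatment effect as a difference of group means under
# randomized assignment. Baseline distribution and effect size are invented.
import random

random.seed(1)
n = 50_000
treated, control = [], []
for _ in range(n):
    y_c = random.gauss(10.0, 2.0)   # potential outcome Yc(u)
    y_t = y_c + 3.0                 # potential outcome Yt(u); true T = 3
    if random.random() < 0.5:       # randomization: S independent of (Yt, Yc)
        treated.append(y_t)         # we observe Ys = Yt
    else:
        control.append(y_c)         # we observe Ys = Yc

# T-hat = mean(Ys | S = t) - mean(Ys | S = c)
t_hat = sum(treated) / len(treated) - sum(control) / len(control)
print(t_hat)                        # close to the true effect of 3.0
```

Note that each simulated unit has both potential outcomes, but only one ever enters the estimator; the missing-data character of the problem is visible in the code itself.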
The key to estimating T accurately lies in the “certain criteria” just
mentioned. Let S be the indicator of which potential outcome is actually
observed, and Ys be the actual observed data. The treatment and control
groups must be selected in such a way that assignment to the treatment
group is independent of the potential outcomes Yt and Yc. That is,

{Yt(u), Yc(u)} ⊥⊥ S,

where ⊥⊥ stands for independence.
When the independence condition holds, E(Yt) = E(Yt | S = t), and similarly
for E(Yc).6 The average treatment effect can then be written

T = E(Ys | S = t) − E(Ys | S = c).
In experiments, the independence of treatment assignment and the potential outcomes is achieved through randomization. International relations
scholars, however, only have observational data with which to work. Without
the benefits of randomization, the independence of treatment assignment and
the potential outcomes is unlikely to hold. The solution is to condition on a
set of observed covariates X such that the potential outcomes and the treatment
are independent conditional on X:

{Yt(u), Yc(u)} ⊥⊥ S | X.
6 Holland (1986, 948) points out that E(Yt) and E(Yt | S = t) are in general different, as
the former is the average over all u and the latter is the average over just those in the
treatment group.
The average treatment effect conditional on X is then written
T = E(Ys | X, S = t) − E(Ys | X, S = c).

When the above assumption holds, treatment assignment is said to be strongly
ignorable.7 The idea is to mimic the conditions of a randomized experiment.

7 We must also assume that treated and untreated cases exist for every value of X.
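The conditioning strategy can be sketched by stratifying on X and weighting the within-stratum differences of means by Pr(X = x). In the simulation below (all numbers hypothetical), assignment depends only on the observed covariate, so strong ignorability holds by construction; a naive pooled comparison, by contrast, is biased:

```python
# Sketch: recovering the treatment effect by conditioning on an observed
# covariate X when assignment is non-random but depends only on X.
# All distributions and numbers are hypothetical.
import random

random.seed(2)
n = 60_000
groups = {(x, s): [] for x in (0, 1) for s in (True, False)}
for _ in range(n):
    x = random.randrange(2)               # observed covariate
    p_treat = 0.3 if x == 0 else 0.7      # assignment depends on X only
    s = random.random() < p_treat
    y = random.gauss(5.0 + 4.0 * x, 1.0) + (2.0 if s else 0.0)  # true effect = 2
    groups[(x, s)].append(y)

def mean(v):
    return sum(v) / len(v)

# T-hat = sum over x of Pr(X = x) * [mean(Ys | X=x, S=t) - mean(Ys | X=x, S=c)]
t_hat = 0.0
for x in (0, 1):
    n_x = len(groups[(x, True)]) + len(groups[(x, False)])
    t_hat += (n_x / n) * (mean(groups[(x, True)]) - mean(groups[(x, False)]))

# The naive pooled difference is biased because X affects both S and Y.
naive = (mean(groups[(0, True)] + groups[(1, True)])
         - mean(groups[(0, False)] + groups[(1, False)]))
print(t_hat, naive)                       # t_hat near 2.0; naive is biased upward
```

The footnoted overlap requirement is also visible here: the stratified estimator would break down if some stratum of X contained only treated or only untreated cases.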
References
Achen, Christopher H. 1992. “Social Psychology, Demographic Variables,
and Linear Regression: Breaking the Iron Triangle in Voting Research.”
Political Behavior 14 (3): 195-211.
Beck, Nathaniel, Jonathan N. Katz, & Richard Tucker. 1998. “Taking Time
Seriously: Time-Series–Cross Section Analysis with a Binary Dependent
Variable.” American Journal of Political Science 42 (October): 825-844.
Beck, Nathaniel, & Simon Jackman. 1998. “Beyond Linearity by Default:
Generalized Additive Models.” American Journal of Political Science
42 (2): 596-627.
Braumoeller, Bear F. 2003. “Causal Complexity and the Study of Politics.”
Political Analysis 11 (3): 209-233.
Carroll, John W. 1991. “Property-Level Causation.” Philosophical Studies
63: 245-270.
Cartwright, Nancy. 1979. “Causal Laws and Effective Strategies.” Nous 13:
419-437.
Cook, T., & D. Campbell. 1979. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Boston, MA: Houghton Mifflin Company.
Dupre, John. 1984. “Probabilistic Causality Emancipated.” Midwest Studies
in Philosophy 9: 169-175.
Eells, Ellery. 1991. Probabilistic Causality. New York: Cambridge University Press.
Freedman, David A. 1987. “As Others See Us: A Case Study in Path Analysis.” Journal of Educational Statistics 12 (Summer): 101-128.
Goldthorpe, John H. 2001. “Causation, Statistics, and Sociology.” European
Sociological Review 17 (March): 1-20.
Good, I.J. 1961. “A Causal Calculus I.” British Journal for the Philosophy
of Science 11: 305-318.
Good, I.J. 1962. “A Causal Calculus II.” British Journal for the Philosophy
of Science 12: 43-51.
Hall, Peter A. 2003. “Aligning Ontology and Methodology in Comparative
Research.” In Comparative Historical Analysis in the Social Sciences,
ed. James Mahoney & Dietrich Rueschemeyer. New York: Cambridge University Press.
Hitchcock, Christopher Read. 1993. “A Generalized Probabilistic Theory of
Causal Relevance.” Synthese 97: 335-364.
Hitchcock, Christopher Read. 1995. “The Mishap at Reichenbach Fall: Singular vs. General Causation.” Philosophical Studies 78: 257-291.
Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the
American Statistical Association 81 (December): 945-960.
Horwich, Paul. 1987. Asymmetries in Time: Problems in the Philosophy of
Science. Cambridge, MA: The MIT Press.
Humphreys, Paul. 1989. The Chances of Explanation: Causal Explanation in
the Social, Medical, and Physical Sciences. New York: Oxford University
Press.
Huth, Paul, Christopher Gelpi, & D. Scott Bennett. 1993. “The Escalation of Great Power Militarized Disputes: Testing Rational Deterrence
Theory and Structural Realism.” American Political Science Review 87
(September): 609-623.
Kim, Jaegwon. 1973. “Causes and Counterfactuals.” Journal of Philosophy
70 (October): 570-572.
King, Gary. 1997. A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton: Princeton University Press.
King, Gary, & Langche Zeng. 2001. “Logistic Regression in Rare Events
Data.” Political Analysis 9 (2): 137-163.
King, Gary, Robert O. Keohane, & Sidney Verba. 1994. Designing Social
Inquiry. Princeton: Princeton University Press.
Lewis, David. 1973. “Causation.” Journal of Philosophy 70 (October): 556-567.
Mackie, J.L. 1974. The Cement of the Universe. New York: Oxford University Press.
Martin, Andrew D., & Kevin M. Quinn. 2002. “Dynamic Ideal Point Estimation via Markov Chain Monte Carlo for the U.S. Supreme Court,
1953–1999.” Political Analysis 10 (2): 134-153.
Morgan, Stephen L., & Christopher Winship. 2007. Counterfactuals and
Causal Inference. New York: Cambridge University Press.
Morrow, James D. 1991. “Alliances and Asymmetry: An Alternative to
the Capability Aggregation Model of Alliances.” American Journal of
Political Science 35 (November): 904-933.
Mueller, John. 1988. “The Essential Irrelevance of Nuclear Weapons: Stability in the Postwar World.” International Security 13 (Fall): 45-69.
Pearl, Judea. 2009. Causality: Models, Reasoning, and Inference. New York:
Cambridge University Press.
Reichenbach, Hans. 1956. The Direction of Time. Berkeley, CA: University
of California Press.
Salmon, Wesley C. 1998. Causality and Explanation. New York: Oxford
University Press.
Schelling, Thomas C. 1966. Arms and Influence. New Haven, CT: Yale
University Press.
Sekhon, Jasjeet S. 2008. “The Neyman-Rubin Model of Causal Inference
and Estimation via Matching Methods.” In The Oxford Handbook of
Political Methodology, ed. Janet M. Box-Steffensmeier, Henry E. Brady,
& David Collier. New York: Oxford University Press.
Sekhon, Jasjeet S. 2009. “Opiates for the Matches: Matching Methods for
Causal Inference.” Annual Review of Political Science 12: 487-508.
Sober, Elliott. 1984. “Two Concepts of Cause.” In PSA 1984, ed. Peter
Asquith & Philip Kitcher. Vol. 2 East Lansing, MI: Philosophy of Science Association.
Sosa, Ernest, & Michael Tooley. 1993. Causation. New York: Oxford University Press.
Suppes, Patrick. 1970. A Probabilistic Theory of Causality. Amsterdam:
North-Holland.
Woodward, James. 2003. Making Things Happen: A Theory of Causal Explanation. New York: Oxford University Press.
Woodward, James. 2008. “Causation and Manipulability.” Stanford Encyclopedia of Philosophy.