7.
Cheng Models
1. Introduction
The most interesting and novel recent psychological account of adult judgements of causation
has been developed by Patricia Cheng (1997) and her collaborators (Cheng and Novick, 1999).
Cheng argues that the account uniquely captures many of the phenomena of adult judgement, but
even if it did not, or does not, it is a brilliant piece of mathematical metaphysics.
Nancy Cartwright (1987) proposed that there are in the world various fundamental capacities of
kinds of events or circumstances. The capacity of a kind of circumstance C to bring about
another kind of circumstance E is the probability of E conditional on C and on the absence of all
other potential causes of E. Ordinary objects in our everyday world are amalgams of components
with fundamental capacities. Cheng’s psychological theory of our tacit causal theories is a
generalization of that idea: we judge instances of kinds to have causal powers to produce or to
prevent kinds of effects; the powers can act separately or, in some cases, they may interact. We
make minimal assumptions about our world that enable us often to form judgements of causal
powers, which we in turn can use in prediction. That is the psychological theory for which Cheng
has provided evidence. I am concerned here with using Bayes net methods to unravel
implications of the theory that have not yet been tested. Cheng’s models of our models of
causation turn out to be Bayes nets under a particular parameterization, which means we can use
what is known about search and estimation for Bayes nets to extend Cheng’s theoretical results.
That is the aim of this chapter.
2. Cheng's Model of Human Judgement of Generative Causal Power.
The metaphysics of Cheng’s theory can be viewed as an anatomy of kinds of causal relationships.
Cheng considers only causal factors that have two values, present or absent, and only the
presence of a factor can have a causal role. She divides causal relations into two sorts: generative
and preventive. Generative causal factors increase the probability of an effect, preventive causal
factors decrease it, both subject to appropriate conditions. Causal powers are further divided into
the simple and the compound, or interactive. Instances of two or more simple causal powers for
the same kind of effect produce an instance of that effect independently of one another. That is, if
A and B have simple, non-interactive, generative causal powers to produce E, then when A and B
are both present, A may cause E or B may cause E, or both may separately cause E, and the
probability that A, if A occurs, causes E (which is not the probability of E given A) is
independent (in probability) of the probability that B, if B occurs, causes E. When A and B
generatively interact, the effect may be produced by A alone, or by B alone, or by both acting
separately, or by A and B acting conjointly. Similar relationships apply when one or both causes
are preventive, or when their interaction tends to prevent E.
That is the metaphysics, and it may seem to many philosophers and statisticians an unpromising
basis for a normative, let alone descriptive, theory of causal judgement that is both a real guide in
life and has real empirical content. As we will see, there is a good case to be made that it is a
more promising basis than conventional statistical analysis provides.
Given data on the joint frequency of candidate causes (of effect E) and of E, when unobserved
causes of E may also be acting, how do people judge the efficacy or causal power of any
particular observed candidate cause? Suppose they know, or believe, that all unobserved causes
of E are generative, and one or more generative candidate causes of E are observed along with E.
Consider the simplest case in which there is one observed generative causal factor, C, and one
unobserved generative causal factor, U. In that case, E occurs if and only if either C occurs and C
causes E or if U occurs and U causes E.
We let the parameter qce represent the proposition that C causes E, given that C occurs. And
analogously for que. The q parameters have two possible values; 1 represents that the causal
factor, if it occurs, acts to bring E about; 0 represents that the causal factor, even if it occurs, does
not bring E about. We let C, U and E be binary variables; C = 1 if C occurs, and C = 0
otherwise, and analogously with U and E.
So E = 1 if and only if qce C = 1 or queU = 1. Taking the probability of both sides:
(1) pr(E = 1) = pr(qce C = 1 or queU = 1)
For any propositions A, B, the probability of the proposition that A or B is the probability of A
plus the probability of B minus the probability of the proposition that A and B. Hence:
(2) pr(E = 1) = pr(qce C = 1) + pr(que U = 1) - pr(qce que C U = 1)
Now assume that qce and que are independent of each other and of C and U. Then (2) becomes [1]
(3) pr(E = 1) = pr(qce = 1) · pr(C = 1) + pr(que = 1) · pr(U = 1) - pr(qce = 1) · pr(que = 1) · pr(C = 1, U = 1)
Hence the probability that E = 1 conditional on C = 1 and U = 0 is
(4) pr(E = 1 | C = 1, U = 0) = pr(qce = 1),
which justifies describing pr(qce = 1) as the "causal power" of C to produce E.
[1] Equation 3 has a long history. The first occurrence I know of is in a paper in the 1850s by the great 19th century
mathematician, Arthur Cayley, responding to a problem about causal inference posed by George Boole. Cayley
assumes U and C are independent. Boole objected to Cayley’s solution to his problem, but the solution, and equation
(3), were defended by Richard Dedekind. Cayley’s argument for (3) was quite different from Cheng’s.
It still remains mysterious how anyone could know—or reasonably estimate—the causal power
of C to produce E. But assume that it is known, or believed, that C and U are independent. From
(3), and using the independence of C and U:
(5) pr(E = 1 | C = 1) = pr(qce = 1) + pr(que = 1) · pr(U = 1) - pr(qce = 1) · pr(que = 1) · pr(U = 1)
(6) pr(E = 1 | C = 0) = pr(que = 1) · pr(U = 1)
Noting that the difference of (5) and (6) is ΔPc = pr(E = 1 | C = 1) - pr(E = 1 | C = 0), we have
(7) ΔPc = pr(qce = 1) + pr(que = 1) · pr(U = 1) - pr(qce = 1) · pr(que = 1) · pr(U = 1) - pr(que = 1) · pr(U = 1)
= pr(qce = 1) [1 - pr(que = 1) · pr(U = 1)].
Hence,
(8) pr(qce = 1) = ΔPc / [1 - pr(que = 1) · pr(U = 1)].
We note that pr(que = 1) · pr(U = 1) is just the probability that E = 1 given that C = 0. And
so, finally,
(9) pr(qce = 1) = ΔPc / [1 - pr(E = 1 | C = 0)]
Equation (9) implies that under the specified assumptions, the causal power of C to generate E
can be estimated from ΔPc and from the probability that E occurs given that C does not occur,
both of which can be estimated from observations of C and E alone. Moreover, under otherwise
similar assumptions we obtain the same result no matter how many unobserved causes there are,
so long as they are all generative and independent of C. We note for later use that a derivation
resulting in an equivalent equation for the causal power of C similar to (9) is possible if there is
another (or several) observed causal factor D, independent of C, and we condition on the absence
of D.
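As a numerical check on the derivation, the sketch below builds pr(E = 1 | C = 1) and pr(E = 1 | C = 0) from equations (5) and (6) and then recovers the causal power with equation (9). The function name and the parameter values are illustrative assumptions, not part of Cheng's presentation.

```python
def generative_power(p_e_given_c, p_e_given_not_c):
    """Equation (9): delta-P divided by 1 - pr(E=1 | C=0)."""
    if p_e_given_not_c >= 1.0:
        raise ValueError("power is undefined when E always occurs without C")
    delta_p = p_e_given_c - p_e_given_not_c
    return delta_p / (1.0 - p_e_given_not_c)


# Hypothetical parameter values, chosen only for illustration.
q_ce = 0.6   # pr(qce = 1), the causal power we hope to recover
q_ue = 0.5   # pr(que = 1)
p_u = 0.4    # pr(U = 1)

base = q_ue * p_u                     # equation (6): pr(E=1 | C=0)
with_c = q_ce + base - q_ce * base    # equation (5): pr(E=1 | C=1)

recovered = generative_power(with_c, base)
print(round(recovered, 10))
```

The observer never sees q_ue or p_u separately; only their product enters, and it is read off from the frequency of E when C is absent.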
This surprising transformation of metaphysics into testable mathematics predicts the following
for appropriate contexts:
(i) there should be pairs of cases in which people judge causal powers to be unequal
but judge the ΔPs to be equal.
(ii) when an effect always occurs in the absence of a causal factor, rather than judging
the factor to have no influence, people should be unwilling to judge the power of
the factor to produce the effect.
(iii) when the effect never occurs in the absence of a causal factor, people should judge
the efficacy of the factor by ΔP.
Cheng provides experimental evidence that all three are true for contexts—causal factors that
have but two values, present or absent, are all generative, and independent—to which her theory
applies.
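The three predictions can be read directly off equation (9). The following sketch uses made-up conditional frequencies (all values are hypothetical) to show one instance of each.

```python
def delta_p(p_e_c, p_e_not_c):
    """Difference between pr(E=1 | C=1) and pr(E=1 | C=0)."""
    return p_e_c - p_e_not_c


def generative_power(p_e_c, p_e_not_c):
    """Equation (9) estimate, or None when the denominator is zero."""
    if p_e_not_c >= 1.0:
        return None
    return delta_p(p_e_c, p_e_not_c) / (1.0 - p_e_not_c)


# (i) Equal delta-P, unequal causal power: (pr(E|C), pr(E|not C)).
case_a = (0.5, 0.25)
case_b = (1.0, 0.75)
print(delta_p(*case_a), delta_p(*case_b))
print(generative_power(*case_a), generative_power(*case_b))

# (ii) Ceiling: E always occurs without C, so the power is undefined.
print(generative_power(1.0, 1.0))

# (iii) Floor: E never occurs without C, so the power equals delta-P.
print(generative_power(0.7, 0.0), delta_p(0.7, 0.0))
```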
3. Preventive Causes
Now suppose that all unobserved causes U of E are generating, and there is an observed
candidate preventing cause, F, of E. In this case, E will occur if U occurs, U acts to bring E
about, and F does not prevent E from occurring.
E = que U · (1 - qfe F).
Cheng's equation is:
(10) pr(E = 1) = pr(que U · (1 - qfe F) = 1).
By using (10), we compute pr(que U = 1) = pr(E = 1 | F = 0), and pr(E = 1 | F = 1) = pr(que U =
1) · pr(qfe = 0) = pr(que U = 1) · (1 - pr(qfe = 1)). We can therefore substitute pr(E = 1 | F = 0)
for pr(que U = 1) in the equation for pr(E = 1 | F = 1) and solve for pr(qfe = 1). Writing
ΔPf = pr(E = 1 | F = 1) - pr(E = 1 | F = 0), the result is
(11) pr(qfe = 1) = - ΔPf / pr(E = 1 | F = 0)
Cheng’s account of preventive power predicts that in appropriate contexts if an effect never
occurs even when a potential preventive cause is absent, people will be uncertain as to the
preventive power, because it is undefined. She reports experiments confirming that prediction.
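A minimal sketch of equation (11), including the undefined case just described. The function name and the numbers are hypothetical.

```python
def preventive_power(p_e_given_f, p_e_given_not_f):
    """Equation (11): -delta-P / pr(E=1 | F=0); None when E never occurs without F."""
    if p_e_given_not_f == 0.0:
        return None
    return -(p_e_given_f - p_e_given_not_f) / p_e_given_not_f


# Hypothetical values: pr(que U = 1) = 0.4 and pr(qfe = 1) = 0.75.
base = 0.4                      # pr(E=1 | F=0)
q_fe = 0.75
p_e_f = base * (1.0 - q_fe)     # pr(E=1 | F=1), from equation (10)

print(preventive_power(p_e_f, base))
print(preventive_power(0.0, 0.0))   # effect never occurs: None
```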
As Cheng notes, the ceiling effects that follow from her model are standard pieces of
experimental practice. If you set out to test a new antibiotic and you apply it to a culture, and do
not apply it to a control culture, and all of the cells in both cultures die, you don’t—or
shouldn’t—conclude that your antibiotic has no effect. Instead you conclude that the experiment
is no good because, in all probability, some unknown factor independently killed the cultures.
4. Generative Interaction
Many, perhaps most, everyday causal relations provide apparent counterexamples to Cheng’s
theory. Consider the house current circuit breaker, a lamp switch, and the light on the lamp.
Suppose the state of the circuit breaker and the state of the lamp switch are independent, and each is on half the time. The
light is on if and only if both the circuit breaker and the lamp switch are on.
If we apply Cheng’s model of simple causal powers, the causal power of the circuit breaker is
[pr(L = on | C = on) - pr(L = on | C = off)] / [1 - pr(L = on | C = off)] = ½,
and the causal power of the lamp switch is also ½. Allen’s measure, ΔP, gives the same values.
The Rescorla-Wagner equilibrium depends on the salience of the lamp switch and of the circuit
breaker, and on the relative frequencies of their states. If both the lamp switch and the circuit
breaker are on half the time and the saliences are equal, then the equilibrium associative strengths
are both 1/3. Spellman’s measures, ΔP conditional on values of other potential causes, make both
causal powers 1 if we condition on the presence of the other variable, and 0 if we condition on
the absence of the other variable. And that presents a difficulty for Cheng’s theory as well as
Spellman’s.
On Cartwright’s view, and Cheng’s, causal power is supposed to be a fundamental feature of the
relation between a potential cause and an effect, insensitive to background conditions. But if the
circuit breaker is always on, then Cheng’s measure of the causal power of the lamp switch is no
longer ½, but 1.
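The sensitivity to background conditions can be checked by enumerating the joint distribution of breaker and switch and applying the simple-power formula of section 2 to the switch. The function and variable names below are illustrative assumptions.

```python
from itertools import product


def switch_power(p_breaker_on, p_switch_on=0.5):
    """Simple generative power of the lamp switch; the light is on iff both are on."""
    joint = {}
    for c, s in product((0, 1), repeat=2):
        p_c = p_breaker_on if c else 1.0 - p_breaker_on
        p_s = p_switch_on if s else 1.0 - p_switch_on
        joint[(c, s)] = p_c * p_s   # breaker and switch are independent

    def p_light_given_switch(s_value):
        total = sum(p for (c, s), p in joint.items() if s == s_value)
        lit = sum(p for (c, s), p in joint.items()
                  if s == s_value and c == 1 and s == 1)
        return lit / total

    p1 = p_light_given_switch(1)
    p0 = p_light_given_switch(0)
    return (p1 - p0) / (1.0 - p0)


print(switch_power(0.5))   # breaker on half the time
print(switch_power(1.0))   # breaker always on
```

The two calls reproduce the values ½ and 1 from the text: the same switch gets a different "power" depending on how often the breaker is on.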
Some account of interaction is required, and in collaboration with Laura Novick, Cheng (1999)
has provided one. It is based on a simple and compelling intuition: If causes A and B of effect E
do not interact, then the set of cases that would exhibit E if exposed to both A and B is the union
of the set of cases that would exhibit E if exposed to A alone and the set of cases that would
exhibit E if exposed to B alone. If we find otherwise, as in the light and the circuit breaker, then
there is an interaction. The explicit mathematical model when A and B are generative and they
interact generatively is:
(12) E = que U ∨ qae A ∨ qbe B ∨ qab A B
where ∨ is Boolean addition. The probability of E is found by taking the probability of the right
hand side of (12). As before, we assume that A, B, U and all of the parameters are independent in
probability. The problem is how to use (12) and the independence assumptions to compute the
causal power of the interaction, that is, pr(qab = 1). Results equivalent to all of those in this
section are in Cheng and Novick (1999), but what is intuitive to one mind may not be so to
another, and my derivations, which mimic the derivations of the estimation formulas (9) and (11)
for simple causal powers, sometimes differ from theirs.
When B is absent, the interaction term vanishes and (12) reduces to
(13) E = que U ∨ qae A
and analogously for B when A is absent
(14) E = que U ∨ qbe B
So the simple causal powers of A and of B, that is pr(qae = 1) and pr(qbe = 1) can be estimated
as described in section 2, but conditioning on the absence of B to estimate the causal power of A,
and conditioning on the absence of A to estimate the causal power of B.
Further, when A and B are both absent
(15) E = que U
and so the probability that E is produced by unobserved causes, pr(que U = 1) can be estimated.
Because of the independence assumptions, equations 13, 14 and 15 give us all of the terms that
occur when the probability of the right hand side of (12) is taken, except for the causal power of
the interaction, pr(qab = 1). Substituting in the results of 13, 14 and 15 in the expression for the
probability of right hand side of equation 12, we can then solve for pr(qab = 1) from the
probability of E when A and B are both present. The result has a simple form if we first define
the (counterfactual) probability E would have given A and B if there were no interaction, that is
(16) pNI(E = 1 | A = 1, B = 1) = pr(que U = 1) + pr(qae = 1) pr(A = 1) + pr(qbe = 1) pr(B = 1)
- pr(que U = 1) pr(qae = 1) pr(A = 1) - pr(que U = 1) pr(qbe = 1) pr(B = 1)
- pr(qae = 1) pr(A = 1) pr(qbe = 1) pr(B = 1)
+ pr(que U = 1) pr(qae = 1) pr(A = 1) pr(qbe = 1) pr(B = 1)
Because we are conditioning on A = 1 and B = 1, the factors pr(A = 1) and pr(B = 1) are each equal to 1 here.
We have already shown in equations 13, 14 and 15 how to estimate all quantities on the right
hand side of equation 16. The causal power of the interaction then takes the form
(17) pr(qab = 1) = [pr(E = 1 | A = 1, B = 1) - pNI(E = 1 | A = 1, B = 1)] / [1 - pNI(E = 1 | A = 1, B = 1)]
analogous to Cheng’s formula (9) for estimating simple generative causal power.
The Cheng and Novick interaction formula gives a completely principled account of generative
interaction and how to estimate it in the simple case we have considered of two direct,
independent, generative causes. The theory gives an intuitive result for the example with which I
began, the light, lamp switch and circuit breaker. In that case the effect is the product of the
causes, understood as (0, 1) valued variables, and while the simple causal powers are zero, the
interactive causal power of the lamp switch and circuit breaker to turn the light on has the value
1. Further, the theory gives different results from those of any of the variety of ad hoc measures
of interaction proposed in epidemiology, or the measures of interaction used in standard
statistical categorical data analysis.
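The estimation recipe of this section can be sketched in a few lines. The code takes the four conditional probabilities pr(E = 1 | A, B) as input, estimates the simple powers by conditioning on the absence of the other cause, forms the no-interaction probability of (16) in its equivalent product form 1 - (1 - pr(que U = 1))(1 - pr(qae = 1))(1 - pr(qbe = 1)), which is what the expansion gives once A and B are both taken to be present, and then applies (17). All function names and test values are illustrative assumptions.

```python
def noisy_or(*probs):
    """Probability that at least one of several independent events occurs."""
    none = 1.0
    for p in probs:
        none *= 1.0 - p
    return 1.0 - none


def generative_power(p_with, p_without):
    """Simple generative power, as in equation (9)."""
    return (p_with - p_without) / (1.0 - p_without)


def interaction_power(p00, p10, p01, p11):
    """pXY stands for pr(E=1 | A=X, B=Y). Returns (q_a, q_b, q_ab)."""
    q_a = generative_power(p10, p00)      # equation (13): B absent
    q_b = generative_power(p01, p00)      # equation (14): A absent
    p_ni = noisy_or(p00, q_a, q_b)        # equation (16), A and B present
    q_ab = (p11 - p_ni) / (1.0 - p_ni)    # equation (17)
    return q_a, q_b, q_ab


# Lamp example: the light is on only when breaker and switch are both on.
print(interaction_power(0.0, 0.0, 0.0, 1.0))
```

For the lamp the simple powers come out as zero and the interactive power as 1, as stated above.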
5. Other Forms of Interaction
Cheng and Novick consider five other combinations of generative and preventing simple and
interactive causes. The generative interactions are all marked by the fact that the actual
probability of E given A and B is greater than the counterfactual probability, which can be
calculated, of E given A and B and assuming no interaction. In preventive interactions the
inequality is reversed. I will briefly review their cases.
If one of the observed causes, A, is generative and the other, B, is preventive and the interaction
is generative then there are alternative models
(18) E = [que U ∨ qae A] (1 - qbe B) ∨ qab A B
(19) E = [que U ∨ qae A ∨ qab A B] (1 - qbe B)
As in the previous section, pr(que U = 1) can be estimated in both equations from the frequency
of E when A and B are absent, pr( qae = 1) can be estimated from the frequency of E when B is
absent, pr(qbe = 1) can be estimated from the frequency of E when A is absent, and the estimates
of each of these quantities have the same values for (18) as for (19).
A similar analysis holds if both observed causes are preventive and the interaction is generative.
If both A and B are generative and their interaction is preventive, then there is a single natural
equation
(20) E = [que U ∨ qae A ∨ qbe B] (1 - qab A B)
As before, pr(que U = 1) can be estimated from pr(E = 1 | A = 0, B = 0), and similarly, pr(qae = 1)
and pr(qbe = 1) can be estimated. These pieces can be put together to estimate the counterfactual
probability pNI(E = 1 | A = 1, B = 1), and the expression for pr(E = 1 | A=1, B = 1) formed by
taking the probability of the right side of (20) can then be solved for pr(qab = 1) in terms of pNI(E
= 1 | A = 1, B = 1). The result is
pr(qab = 1) = - [pr(E = 1 | A = 1, B = 1) - pNI(E = 1 | A = 1, B = 1)] / pNI(E = 1 | A = 1, B = 1)
in perfect analogy with Cheng’s formula for estimating simple preventive causal powers.
Analogous formulas, obtained analogously, hold if one of the observed causes is preventive and
the interaction is preventive or if both of the observed causes are preventive and the interaction is
preventive.
6. Cheng Models as Bayes Nets
Cheng, and Cheng and Novick, are concerned both with how people conceive causal relations
and with how they do, or could, discover and use causal relations according to that conception. They
give us an answer for a family of cases: those in which the causal graph is partially known (which
variables are potential causes of others is known, some causal connections are known not to
obtain, and there is no confounding, that is, no association between the effect and potential causes due to
unobserved causes) but the values of its parameters, the causal powers, are not known.
The aim in these cases is to estimate the causal power of a direct
(adjacent) cause of an effect. We can summarize the estimation theory for these circumstances as
follows. I assume that the probabilities of any unobserved causes and of their causal powers are not
zero.
i. Assume that E has a single observed, generating cause A, and an independent (in
probability) unobserved preventing cause U. Then the causal power of A to generate E
cannot be estimated.
ii. Assume that E has any number greater than one of observed, generating causes A, B,
etc., and any number (zero or more) of observed preventing causes, and an independent
unobserved preventing cause U. Then the ratios to one another of the causal powers of
each of the generating causes can be estimated.
iii. Assume that E has one or more observed, generating causes, A, B, etc., and any number
(zero or more) of observed, preventing causes, C, and an independent unobserved
preventing cause. Then the causal power of each observed preventing cause can be
estimated.
iv. Assume that E has any number of observed, generating causes, A, B, etc. and any
number of observed preventing causes, C, and an independent, unobserved generating
cause. Then the causal powers of each of the observed causes can be estimated.
v. Assume that E has any number of observed, generating causes, A, B, etc. and any
number of observed preventing causes, C, and an unobserved generating cause U. If U
is not a cause of A, and no other observed cause D of E is both an effect of U and either
an effect of A or an effect of another common unobserved cause of A and D, then the
causal power of A can be estimated.
Proofs are given in the appendix. Cheng (2000) has studied the properties of under or over
estimates obtained if the methods of section 2 are applied in cases where they do not give the
correct result. For example, in case i underestimates of the direct causal powers are obtained.
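Result ii, and the underestimate just mentioned, can be illustrated numerically. The sketch assumes, by analogy with equation (10), that two observed generative causes A and B and an unobserved preventer U combine as E = (qae A ∨ qbe B)(1 - que U). That functional form, the function name and all parameter values are illustrative assumptions, not taken from the appendix.

```python
def p_effect(a, b, q_ae, q_be, p_blocked):
    """pr(E=1 | A=a, B=b) under the assumed form E = (qae A or qbe B)(1 - que U).

    p_blocked stands for pr(que U = 1), which the observer does not know.
    """
    p_generated = 1.0 - (1.0 - q_ae * a) * (1.0 - q_be * b)
    return p_generated * (1.0 - p_blocked)


# Hypothetical true values, hidden from the observer.
q_ae, q_be, p_blocked = 0.8, 0.4, 0.5

p_a_only = p_effect(1, 0, q_ae, q_be, p_blocked)    # pr(E=1 | A=1, B=0)
p_b_only = p_effect(0, 1, q_ae, q_be, p_blocked)    # pr(E=1 | A=0, B=1)
p_neither = p_effect(0, 0, q_ae, q_be, p_blocked)   # pr(E=1 | A=0, B=0)

# The unknown factor (1 - p_blocked) cancels in the ratio of the two powers.
ratio_estimate = p_a_only / p_b_only

# Applying equation (9) as if U were generative underestimates q_ae.
naive_power_a = (p_a_only - p_neither) / (1.0 - p_neither)

print(ratio_estimate, q_ae / q_be)
print(naive_power_a, q_ae)
```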
6.1 Estimating the simple total causal power given the true causal graph
Consider the structure
[Figure 1: directed graph over A, B, E, W and U, with edges A -> B, A -> E, B -> E, W -> B and U -> E]
where W, U are unobserved and independent. Consider the case where there is no interaction,
and all causes are generative. Then the graph above corresponds to the equations
E = que U ∨ qbe B ∨ qae A
B = qwb W ∨ qab A
Cheng’s (1997) methods, essentially those of sections 2, 3 and 6.1, apply to this case. In this
case in order to estimate qbe it is essential, not optional, to condition on the absence of A, and
similarly to estimate qae. But the “simple causal power” of A is now ambiguous: it can mean the
causal power of A associated with the A -> E edge alone, which is the probability of E given A
and the absence of all other causes (B and W and U in this case) of E, or it can mean the causal
power of A associated with the A -> E edge and the A -> B -> E path, which is the probability of
E given A and the absence of all other causes of E that are not effects of A (W and U in this
case). I will call the former quantity the direct causal power of A, and when the probability is
greater than zero that E occurs given that A occurs and no other causes of E, other than effects of
A, occur, I will call the latter quantity the total causal power of A. Given a directed graph, the
set of all of the direct causal powers somehow determines the total causal powers. How?
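For the small graph of Figure 1, the two quantities just defined can be computed by brute force from the structural equations, enumerating every assignment of the q parameters with W and U held absent. The names and the numerical powers below are hypothetical, and the sketch assumes pr(qab = 1) < 1 so that conditioning on the absence of B is possible.

```python
from itertools import product

NAMES = ("q_ab", "q_ae", "q_be")


def causal_powers_of_a(p):
    """Return (direct, total) causal power of A; p maps each name to pr(q = 1)."""
    total = 0.0         # pr(E=1 | A=1, W=0, U=0)
    e_and_not_b = 0.0   # pr(E=1, B=0 | A=1, W=0, U=0)
    not_b = 0.0         # pr(B=0 | A=1, W=0, U=0)
    for values in product((0, 1), repeat=len(NAMES)):
        q = dict(zip(NAMES, values))
        weight = 1.0
        for name in NAMES:
            weight *= p[name] if q[name] else 1.0 - p[name]
        b = q["q_ab"]                        # B = qab A, with A = 1 and W = 0
        e = max(q["q_be"] * b, q["q_ae"])    # E = qbe B or qae A, with U = 0
        total += weight * e
        if b == 0:
            not_b += weight
            e_and_not_b += weight * e
    direct = e_and_not_b / not_b             # additionally conditions on B = 0
    return direct, total


direct, total = causal_powers_of_a({"q_ab": 0.5, "q_ae": 0.4, "q_be": 0.6})
print(direct, total)
```

Here the direct power is just pr(qae = 1), while the total power also picks up the contribution of the path through B.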
Consider a more complicated example:
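[Figure 2: directed graph over A, B, C, D, E, F, G and the unobserved variables R, S, T, U, V, W; its edges are those given by the equations that follow]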
Suppose D is a preventive cause of E, and A is a preventive cause of G and all other causes are
generative, and suppose all of the q parameters are known, except for those associated with R, S,
T, W, V and U, which are unobserved variables.
E = (que U ∨ qce C ∨ qfe F ∨ qge G)(1 - qde D)
C = qbc B ∨ qwc W
D = qbd B ∨ qvd V
F = qbf B ∨ qsf S
G = qtg T (1 - qag A)
B = qab A ∨ qrb R
Substituting,
(21) E = (que U ∨ qge qtg T (1 - qag A) ∨ qce (qbc (qab A ∨ qrb R) ∨ qwc W) ∨ qfe (qbf (qab A ∨ qrb R) ∨ qsf S)) (1 - qde (qbd (qab A ∨ qrb R) ∨ qvd V))
Hence the total causal power of A to generate E is
(22) pr(E = 1 | A = 1, U = 0, R = 0, W = 0, V = 0, S = 0) =
pr(qab = 1) · (pr(qbc = 1) · pr(qce = 1) + pr(qbf = 1) · pr(qfe = 1)) · (1 - pr(qab = 1) · pr(qbd = 1) · pr(qde = 1))
The general procedure works like this. Consider the set of all paths from A to E. From that set,
eliminate any paths that contain a variable X that directly prevents a variable Y if Y does not
have a generating cause that is A or an effect of A (because otherwise, given that no cause of E
occurs that is not an effect of A, Y cannot occur—notice that the terms with qg in (21) do not
appear in (22) ). The result is a subgraph of the original graph. Figure 2, for example, reduces to:
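[Figure 3: the subgraph of Figure 2 over A, B, C, D, E and F, with edges A -> B, B -> C, B -> F, B -> D, C -> E, F -> E and D -> E]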
Write the sum of the q terms complements for the generative parents of E, and multiply the sum
by the product, over each preventive parent of E, of 1 minus the corresponding q term. Each q
term in the resulting expression corresponds to a parent of E. Now for each such parent, repeat
the procedure (as if it were E), and multiply the q term for the parent in the previous expression
for E by the result. Iterate, at each stage multiplying each q term wherever it occurs by the
corresponding combination of the q terms of its parents, until there are no more parents. Replace
each q term in the final expression by the probability that it equals 1.
When all causes are generative, the procedure and the resulting formula are isomorphic to a
procedure and formula for computing the total correlation between two variables, A and E, in a
standardized linear model from the correlations of the directly connected variables.
6.3 Estimating causal powers when there are unobserved confounders
All of the procedures so far assume that there is no unobserved common cause influencing both
the cause of an effect and the effect itself. But if the causal graph is known, direct and total causal
powers can sometimes be estimated even when there is such confounding.
Consider the simple case
[Figure 4: directed graph over A, B, E and U, with edges A -> B, B -> E, U -> B and U -> E]
where U is unobserved and generative. The total causal power of A to generate E can be
estimated by the method of the previous subsection (and if A prevents B, or B prevents E, the
total causal power of A is not defined). The causal power of B cannot be estimated by any of the
methods so far described. But it can be estimated. If all causes are generative:
E = qbe B ∨ que U
B = qab A ∨ qub U
Substituting and factoring:
E = qbe qab A ∨ (qbe qub ∨ que) U
Now pr(qbe = 1) · pr(qab = 1) can be estimated by the methods of section 2. But A and B are
unconfounded, and so pr(qab = 1) can be estimated analogously. The ratio gives pr(qbe = 1).
The technique, called the method of instrumental variables, has a long history in econometrics. It
works here because of the isomorphism noted in the previous subsection of generative Cheng
models and linear models. When B is preventing and all other causes are generative, the
preventive power of B can be estimated, but the derivation is less straightforward: the probability
that qab = 1 can be estimated, of course, and pr(qubU = 1) can be estimated from pr(B = 1 | A =
0). This results in two linearly independent equations, one for pr(E =1 | A = 1) and one for pr(E =
1 | A = 0), in two unknowns, pr(queU = 1) and pr(qbe = 1), which can be solved for pr(qbe = 1).
If figure 4 is altered only by adding a direct influence of A on E, as in:
[Figure 5: the graph of Figure 4 with an added edge A -> E]
then the causal power of B, whether generative or preventive, cannot be estimated; nor can the
total causal power of A to generate E be estimated. There is an important moral here: when
unobserved confounders may be present, estimating causal powers accurately depends on
knowing the causal structure, the structure I have represented by a directed graph.
Consider next the case, where all causes are generative and U and W are not observed.
[Figure 6: causal graph over A, B, E and the unobserved variables U and W]
The direct causal power of A to generate B, and of B to generate E can be estimated, and, more
surprisingly, so can the total causal power of A to generate E. To estimate pr(qbe = 1), condition
on the absence of A and apply the method of section 2. To estimate pr(qab = 1) apply the method
of section 2 directly, since there is no confounding. Now by an obvious variant of the results in
section 6.1, the total causal power of A to generate E is pr(qab = 1) · pr(qbe = 1).
Finally, consider a circumstance that sometimes arises in science, and presumably in everyday
life as well, in which the effect itself influences what is observed. Let “S” represent the property
that a system is observed, and suppose the causal structure is
[Figure 7: causal graph over A, E, S and the unobserved variables U and W, in which E influences S]
All of the observations are conditioned on S = 1, and U and W are unobserved. In this case the
probability that qae = 1 cannot be estimated. Recall from the Monty Hall problem, described in
Chapter 6, that conditional on their common effect two otherwise independent variables, in this
case A and U, are dependent. So A and U are dependent conditional on E. But the same is true if
the conditioning variable is any descendant of a common effect (Pearl, 1988). So A and U are
dependent conditional on S.
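The dependence induced by conditioning on a common effect, or on a descendant of it, is easy to confirm by enumeration. The sketch below uses a deliberately simplified stand-in for Figure 7: A and U are independent, E occurs exactly when A or U does, and S is a noisy report of E. W is omitted, and all probabilities are hypothetical.

```python
from itertools import product

VARS = ("A", "U", "E", "S")


def joint(p_a=0.5, p_u=0.5, s_if_e=0.9, s_if_not_e=0.2):
    """Joint distribution over (A, U, E, S) for the simplified model."""
    table = {}
    for a, u, s in product((0, 1), repeat=3):
        e = 1 if (a or u) else 0
        p = (p_a if a else 1.0 - p_a) * (p_u if u else 1.0 - p_u)
        p_s = s_if_e if e else s_if_not_e
        p *= p_s if s else 1.0 - p_s
        table[(a, u, e, s)] = p
    return table


def prob(table, **fixed):
    """Probability that the named variables take the given values."""
    return sum(p for key, p in table.items()
               if all(key[VARS.index(n)] == v for n, v in fixed.items()))


def cond(table, target, given):
    """pr(target | given), with both arguments as dicts of variable values."""
    return prob(table, **target, **given) / prob(table, **given)


t = joint()
# Marginally, A carries no information about U.
print(cond(t, {"A": 1}, {"U": 1}), cond(t, {"A": 1}, {"U": 0}))
# Conditional on E = 1, or on its descendant S = 1, A and U are dependent.
print(cond(t, {"A": 1}, {"U": 1, "E": 1}), cond(t, {"A": 1}, {"U": 0, "E": 1}))
print(cond(t, {"A": 1}, {"U": 1, "S": 1}), cond(t, {"A": 1}, {"U": 0, "S": 1}))
```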
I leave it to ambitious readers to develop the theory of estimation for interactive causes for
general Cheng models with arbitrary causal graphs.
7. Discovering the Causal Graph
The theory of estimation for Cheng models so far developed assumes that the causal graph is
completely known save that, if the associated direct causal powers are zero, some represented
edges may be phantoms. The separation between causal graphs and estimates of causal powers
may seem artificial and unmotivated. Since the occurrences of features we encounter in life are
usually ordered by their known time of occurrence, why not, given a set of features whose causal
relations are to be investigated, apply Cheng’s method, the method of sections 2 and 3, to
determine the influence of each feature on subsequent features? The method is formally a
sequence of regressions: in judging the influence, or causal power of a candidate cause, all other
observed candidate causes are conditioned on. Then the causal graph would appear to emerge as
a result of, not a precondition for, the estimation of causal powers.
There are several reasons. One is that a feature occurring at time 1 may influence a feature
occurring at time 3 directly, not through any feature observed at time 2. So to estimate the direct
causal power of A to generate E, we must condition on all features occurring simultaneous with
or prior to A—and as the number of such variables grows over time, the number of cases in
which they are all absent and A is present or absent shrinks. Both memory and statistics are
challenged.
But the preceding section supplies a more obvious and perhaps more important reason why the
method will not be reliable: unobserved common causes. We have seen that the estimation
methods of sections 2, 3 and 6.1 are generally insufficient when there are unobserved common
causes at work, and often we have no idea before we begin inquiry whether such factors are
operating.
In the last decade there has been extensive research into the causal information that can and
cannot be obtained under the Markov and faithfulness assumptions, or similar conditions, and it
continues. I will not survey it here (see Spirtes, 2000, especially Chapter 12, for a review), but I
will give some examples.
Suppose, to take almost the worst case, that time order is not known and nothing is known about
the true causal structure, except that there is one, and the Markov and faithfulness assumptions
hold. The aim is to estimate the causal power of A to influence E. Suppose the true unknown
structure is:
[Figure 8: causal graph over C, D, A, E and the unobserved variables U, T, S and W]
and only C, D, A and E are observed. Figure 8 implies that C and D are independent, and that each is
independent of E conditional on A, and no other independencies hold. We can begin the inquiry by
supposing that, for all we know, any of C, D, A, E may be directly dependent on one another:
[Figure 9: C, D, A and E, with every pair joined by an undirected edge]
Examining figure 8, C and D are independent, and so there can be no direct connection between
them:
[Figure 10: the graph of Figure 9 with the C - D edge removed]
But C is independent of E conditional on A, and D is also independent of E conditional on A.
Hence there can be no direct connection between C and E or between D and E:
[Figure 11: the remaining undirected edges are C - A, D - A and A - E]
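The edge removals walked through so far, and the test for arrows into A that follows, can be mimicked in a few lines. This is a simplified sketch that takes the independence facts stated in the text as given input; it is not the general algorithm, and the function names are assumptions made for illustration.

```python
from itertools import combinations

observed = ["C", "D", "A", "E"]

# Independence facts stated in the text: (X, Y, conditioning set).
independencies = [
    ("C", "D", frozenset()),
    ("C", "E", frozenset({"A"})),
    ("D", "E", frozenset({"A"})),
]


def prune_skeleton(variables, facts):
    """Start from the complete undirected graph and delete separated pairs."""
    edges = {frozenset(pair) for pair in combinations(variables, 2)}
    separating_sets = {}
    for x, y, given in facts:
        edge = frozenset((x, y))
        if edge in edges:
            edges.remove(edge)
            separating_sets[edge] = given
    return edges, separating_sets


def find_colliders(variables, edges, separating_sets):
    """Triples x -> z <- y: x, y non-adjacent, both adjacent to z, z not in their separating set."""
    found = set()
    for pair, given in separating_sets.items():
        x, y = sorted(pair)
        for z in variables:
            if z in pair or z in given:
                continue
            if frozenset((x, z)) in edges and frozenset((y, z)) in edges:
                found.add((x, z, y))
    return found


edges, sepsets = prune_skeleton(observed, independencies)
colliders = find_colliders(observed, edges, sepsets)
print(sorted(tuple(sorted(e)) for e in edges))
print(colliders)
```

Only the triple C, A, D qualifies, because A is not in the (empty) set that separates C from D; A is in the set separating C from E, so C, A, E is not oriented this way.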
Since C and D are independent, but not independent conditional on A, it follows from the
faithfulness assumption that they must have arrows directed into A, although one cannot tell
whether they cause A, or have a common cause with A, or both:
[Figure 12: C o-> A <-o D, with the A - E edge still undirected]
where the circles indicate that we cannot tell whether there is a direct cause, an unobserved common
cause, or both. Now C and D are jointly independent of E conditional on A, but neither is
independent of E. If E caused A, then C and D would be independent of E, and they are not. If
there were in addition an unobserved common cause of A and E, then C and D would not be
independent of E conditional on A (Monty Hall again). So we conclude
[Figure 13: C o-> A <-o D, and A -> E]
and the causal power of A to generate (or prevent) E can therefore be estimated by the methods
of section 2. For the general algorithm, and proofs that it gives the correct result under the
Markov and faithfulness conditions, as well as other procedures for learning Bayes nets from
data, see Spirtes, 2000.
Suppose instead the true structure were:
[Figure 14: the graph of Figure 8, altered so that the unobserved S is a common cause of A and E]
so that unobserved S is now a common cause of A and E. Then by similar procedures to those
just illustrated, we could not determine whether the true structure is:
[Figure 15, Figure 16, Figure 17: three alternative graphs over C, D, A and E, some of which include an unobserved U]
Moreover, knowledge of the time order (that C and D precede A which precedes E) would not
help a whit in distinguishing these structures. If one knew figure 15 were correct, the causal
power of A could be estimated by the method of instrumental variables. If one knew only that
figure 16 or figure 17 were correct, however, the causal power of A could not be estimated at all.
(If one knew that the two circles at C were not both arrowheads into C, and analogously for the
two circles at D, then in figure 16, but not in figure 17, the causal power of A could be
estimated.)
Finally, consider a case in which the time order is known, and a causal power can be estimated,
but cannot be estimated by a regression procedure. Suppose the true structure is figure 18, with U
and W unobserved.
[Figure 18: graph over C, A, B, and E, with unobserved U and W]
Estimating the causal power of A by conditioning on the absence of B will result in the wrong
answer (Monty Hall, yet again). But the procedure illustrated previously results in the following
structure:
[Figure 19: graph over C, A, B, and E, including a double-headed arrow]
where the double-headed arrow indicates the presence of an unobserved common cause. The
causal power of A can then be estimated by the method of section 2, but without conditioning on B.
7. Conclusion
Most of the results of sections 5 and 6 suggest experiments on human subjects, whether adults or
children, that have not been done, and some of which surely should be. The normative theory,
Cheng’s theory embedded in causal Bayes nets, may of course not describe human judgement
precisely. It may be, for example, that people typically ignore the possibility of unobserved
common causes, and repair their erroneous judgements only as it proves necessary, and, of
course, there are memory and processing limitations. For that reason Cheng’s (2000) recent study
of the under- and overestimates that result from incorrect assumptions is an especially valuable
step. We need, besides, an understanding of how incorrect causal Bayes nets—networks that
postulate connections that don’t exist, networks that omit common causes, networks that leave
out connections that do exist—can be remedied without starting over from scratch. Most of the
data from which an erroneous network has been learned will have long since been forgotten
when new phenomena are discovered that require its modification. Neural net models, for
example, typically must be entirely retrained when a new property is considered, and that is a
feature very much to be avoided in a psychological model. No repair theory for Bayes nets exists
which is compatible with severe memory and computational limitations. In the next chapter,
however, I consider some interactions between causal Bayes net representations and memory and
computational limitations.
Appendix
i.
Assume that E has a single observed, direct generating cause A, and an independent
unobserved preventing cause U. Then the causal power of A to generate E cannot be
estimated.
Proof: The independent, observable probabilities and conditional probabilities that contain
pr(qa = 1) are
1. pr(E = 1) = pr(qa = 1) pr(A = 1) (1 – pr(qu = 1)pr(U = 1))
2. pr(E = 1| A = 1) = pr(qa = 1)(1 – pr(qu = 1)pr(U = 1))
which cannot be solved for pr(qa = 1) since the r.h.s. of equation 2. is proportional to the r.h.s. of
equation 1.
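A small numerical illustration of result i may help. The Python sketch below evaluates the two observable quantities in the proof for two hypothetical parameter settings that differ in pr(qa = 1) but yield the same observables. The function names and the numbers are assumptions chosen for the example.

```python
import math


def observables(q_a, p_a, q_u, p_u):
    """Return (pr(E = 1), pr(E = 1 | A = 1)) for a generating cause A and an
    independent, unobserved preventing cause U.

    q_a = pr(qa = 1), p_a = pr(A = 1), q_u = pr(qu = 1), p_u = pr(U = 1).
    """
    not_prevented = 1.0 - q_u * p_u
    return (q_a * p_a * not_prevented, q_a * not_prevented)


def indistinguishable(obs_1, obs_2):
    """True if two tuples of observable probabilities agree numerically."""
    return all(math.isclose(x, y) for x, y in zip(obs_1, obs_2))


# Two hypothetical worlds with the same pr(A = 1) but different powers of A.
strong_a_with_preventer = observables(q_a=0.8, p_a=0.5, q_u=0.5, p_u=0.5)
weak_a_no_preventer = observables(q_a=0.6, p_a=0.5, q_u=0.0, p_u=0.5)

print(strong_a_with_preventer, weak_a_no_preventer)
print(indistinguishable(strong_a_with_preventer, weak_a_no_preventer))
```

Since both settings produce the same observable probabilities, no procedure using only those probabilities can recover pr(qa = 1).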
ii.
Assume that E has any number greater than one of observed, direct generating causes A,
B, etc., and any number (zero or more) of direct observed preventing causes, and an
independent unobserved preventing cause U. Then, when defined, the ratios to one
another of the direct causal powers of each of the generating causes can be estimated.
Proof: Assume there are two observed generating causes, A, B. Then after conditioning on the
absence of any observed preventing causes we have
1. pr(E = 1 | A = 1, B = 0) = pr(qa = 1) (1 - pr(qu = 1)pr(U = 1))
2. pr(E = 1 | A = 0, B = 1) = pr(qb = 1) (1 - pr(qu = 1)pr(U = 1))
The ratio of the l.h.s. of equation 1 to the l.h.s. of equation 2 is the ratio of the causal powers. The
argument generalizes to any number of observed generating causes.
iii.
Assume that E has one or more observed, direct generating causes, A, B, etc., and any
number (zero or more) of observed, direct preventing causes, C, D, etc., and an
independent unobserved preventing cause. Then the direct causal power of each observed
preventing cause can be estimated if pr(E = 1 | A = 1, C = 0) ≠ 0.
Proof: Let A be a generating cause and C be a preventing cause. Then, conditional on the absence
of all other observed direct causes, we have
1. pr(E = 1 | A = 1, C = 1) = (1 - pr(qc = 1)) pr(qa = 1) (1 - pr(qu = 1)pr(U = 1))
2. pr(E = 1 | A = 1, C = 0) = pr(qa = 1) (1 - pr(qu = 1)pr(U = 1))
Equation 1 divided by equation 2 yields the complement of the preventive causal power of C.
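The ratio argument of result iii can be checked numerically. In the Python sketch below, the forward model and its parameter values are hypothetical; preventive_power is an illustrative helper that applies the ratio to the two conditional probabilities, assuming they have already been estimated.

```python
import math


def p_effect(q_a, q_c, c_present, q_u, p_u):
    """pr(E = 1 | A = 1, C = c_present) with an independent, unobserved
    preventing cause U; q_c is the preventive power of C."""
    c_blocks = q_c if c_present else 0.0
    return q_a * (1.0 - c_blocks) * (1.0 - q_u * p_u)


def preventive_power(p_e_given_a1_c1, p_e_given_a1_c0):
    """Estimate pr(qc = 1) as one minus the ratio of the two probabilities."""
    if p_e_given_a1_c0 == 0.0:
        raise ValueError("pr(E = 1 | A = 1, C = 0) must be nonzero")
    return 1.0 - p_e_given_a1_c1 / p_e_given_a1_c0


# Hypothetical true parameters, used only to generate the two probabilities.
with_c = p_effect(q_a=0.8, q_c=0.3, c_present=True, q_u=0.5, p_u=0.4)
without_c = p_effect(q_a=0.8, q_c=0.3, c_present=False, q_u=0.5, p_u=0.4)

estimate = preventive_power(with_c, without_c)
print(estimate)
```

The estimate does not depend on the unknown quantities pr(qa = 1), pr(qu = 1), or pr(U = 1), since they cancel in the ratio.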
iv.
Assume that E has any number of observed, direct generating causes, A, B, etc. and any
number of observed direct preventing causes, C, and an independent, unobserved
generating cause, whose probability and causal power are not zero. Then the direct causal
power of each of the observed causes can be estimated.
Proof:
For each generating direct cause A, compute the probability that E = 1 given A = 1, conditional
on the absence of all other observed direct causes. The result gives an estimate of pr(qa = 1) +
pr(qu = 1)pr(U = 1) - pr(qa = 1) pr(qu = 1)pr(U = 1). Now compute the probability that E = 1
given A = 0, conditional on the absence of all other observed causes. The result gives an estimate
of pr(qu = 1)pr(U = 1). Solve for pr(qa = 1).
For each preventing direct cause, C, compute the probability that E = 1 given C = 1, conditional
on the absence of all other observed direct preventive causes, and likewise compute the
probability that E = 1 given C = 0, conditional on the absence of all other observed direct
preventive causes and the presence of at least one generative cause. The ratio of the first to the
second gives an estimate of (1 - pr(qc = 1)) if pr(qu = 1)pr(U = 1) ≠ 0, unless pr(C)pr(qc = 1) = 1.
The latter case holds if and only if pr(E =1 | C = 1) = 0.
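The "solve for pr(qa = 1)" step in the first part of this proof amounts to a one-line formula: writing p1 for the probability of E given A = 1 and p0 for the probability of E given A = 0 (both conditional on the absence of all other observed causes), pr(qa = 1) = (p1 - p0) / (1 - p0). The Python sketch below checks this on hypothetical parameter values; the function names and numbers are assumptions for illustration.

```python
import math


def generative_power(p1, p0):
    """Solve p1 = q_a + p0 - q_a * p0 for q_a, where p0 = pr(qu = 1)pr(U = 1)."""
    if p0 >= 1.0:
        raise ValueError("p0 must be less than 1 for the power to be defined")
    return (p1 - p0) / (1.0 - p0)


def forward(q_a, q_u, p_u):
    """Return (p1, p0) for a generating cause A and an independent,
    unobserved generating cause U combined by noisy-or."""
    background = q_u * p_u
    p1 = q_a + background - q_a * background
    return p1, background


# Hypothetical true values, used only to produce p1 and p0.
p1, p0 = forward(q_a=0.5, q_u=0.4, p_u=0.5)
recovered = generative_power(p1, p0)
print(p1, p0, recovered)
```

With these assumed values p1 is about 0.6 and p0 about 0.2, and the recovered power agrees with the value 0.5 used to generate them.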
v.
Assume that E has any number of observed, direct generating causes, A, B, etc. and any
number of observed direct preventing causes, C, and an unobserved direct generating
cause U, whose probability and causal power are not zero. If U is not a cause of A, and no
other observed cause D of E is both an effect of U and either an effect of A or an effect of
another common unobserved cause of A and D, then the direct causal power of A can be
estimated.
Proof. Conditioning on the absence of any other direct causes of A eliminates any associations of
A and E other than the direct dependence. Hence the methods of iv can be applied.