
7. Cheng Models

1. Introduction

The most interesting and novel recent psychological account of adult judgements of causation has been developed by Patricia Cheng (1997) and her collaborators (Cheng and Novick, 1999). Cheng argues that the account uniquely captures many of the phenomena of adult judgement, but even if it did not, or does not, it is a brilliant piece of mathematical metaphysics. Nancy Cartwright (1987) proposed that there are in the world various fundamental capacities of kinds of events or circumstances. The capacity of a kind of circumstance C to bring about another kind of circumstance E is the probability of E conditional on C and on the absence of all other potential causes of E. Ordinary objects in our everyday world are amalgams of components with fundamental capacities. Cheng’s psychological theory of our tacit causal theories is a generalization of that idea: we judge instances of kinds to have causal powers to produce or to prevent kinds of effects; the powers can act separately or, in some cases, they may interact. We make minimal assumptions about our world that enable us often to form judgements of causal powers, which we in turn can use in prediction. That is the psychological theory for which Cheng has provided evidence.

I am concerned here with using Bayes net methods to unravel implications of the theory that have not yet been tested. Cheng’s models of our models of causation turn out to be Bayes nets under a particular parameterization, which means we can use what is known about search and estimation for Bayes nets to extend Cheng’s theoretical results. That is the aim of this chapter.

2. Cheng's Model of Human Judgement of Generative Causal Power

The metaphysics of Cheng’s theory can be viewed as an anatomy of kinds of causal relationships. Cheng considers only causal factors that have two values, present or absent, and only the presence of a factor can have a causal role.
She divides causal relations into two sorts: generative and preventive. Generative causal factors increase the probability of an effect, preventive causal factors decrease it, both subject to appropriate conditions. Causal powers are further divided into the simple and the compound, or interactive. Instances of two or more simple causal powers for the same kind of effect produce an instance of that effect independently of one another. That is, if A and B have simple, non-interactive, generative causal powers to produce E, then when A and B are both present, A may cause E or B may cause E, or both may separately cause E, and the probability that A, if A occurs, causes E (which is not the probability of E given A) is independent (in probability) of the probability that B, if B occurs, causes E. When A and B generatively interact, the effect may be produced by A alone, or by B alone, or by both acting separately, or by A and B acting conjointly. Similar relationships apply when one or both causes are preventive, or when their interaction tends to prevent E.

That is the metaphysics, and it may seem to many philosophers and statisticians an unpromising basis for a normative, let alone descriptive, theory of causal judgement that is both a real guide in life and has real empirical content. As we will see, there is a good case to be made that it is a more promising basis than conventional statistical analysis provides.

Given data on the joint frequency of candidate causes (of effect E) and of E, when unobserved causes of E may also be acting, how do people judge the efficacy or causal power of any particular observed candidate cause? Suppose they know, or believe, that all unobserved causes of E are generative, and one or more generative candidate causes of E are observed along with E. Consider the simplest case in which there is one observed generative causal factor, C, and one unobserved generative causal factor, U.
In that case, E occurs if and only if either C occurs and C causes E, or U occurs and U causes E. We let the parameter qce represent the proposition that C causes E, given that C occurs, and analogously for que. The q parameters have two possible values: 1 represents that the causal factor, if it occurs, acts to bring E about; 0 represents that the causal factor, even if it occurs, does not bring E about. We let C, U and E be binary variables; C = 1 if C occurs, and C = 0 otherwise, and analogously with U and E. So E = 1 if and only if qce·C = 1 or que·U = 1. Taking the probability of both sides:

(1) pr(E = 1) = pr(qce·C = 1 or que·U = 1)

For any propositions A, B, the probability of the proposition that A or B is the probability of A plus the probability of B minus the probability of the proposition that A and B. Hence:

(2) pr(E = 1) = pr(qce·C = 1) + pr(que·U = 1) - pr(qce·que·C·U = 1)

Now assume that qce and que are independent of one another and of C and U. Then (2) becomes (see Note 1):

(3) pr(E = 1) = pr(qce = 1) · pr(C = 1) + pr(que = 1) · pr(U = 1) - pr(qce = 1) · pr(que = 1) · pr(C = 1, U = 1)

Hence the probability that E = 1 conditional on C = 1 and U = 0 is

(4) pr(E = 1 | C = 1, U = 0) = pr(qce = 1)

which justifies describing pr(qce = 1) as the "causal power" of C to produce E.

Note 1. Equation 3 has a long history. The first occurrence I know of is in a paper in the 1850s by the great 19th century mathematician, Arthur Cayley, responding to a problem about causal inference posed by George Boole. Cayley assumes U and C are independent. Boole objected to Cayley’s solution to his problem, but the solution, and equation (3), were defended by Richard Dedekind. Cayley’s argument for (3) was quite different from Cheng’s.

It still remains mysterious how anyone could know, or reasonably estimate, the causal power of C to produce E. But assume that it is known, or believed, that C and U are independent.
From (3), and using the independence of C and U:

(5) pr(E = 1 | C = 1) = pr(qce = 1) + pr(que = 1) · pr(U = 1) - pr(qce = 1) · pr(que = 1) · pr(U = 1)

(6) pr(E = 1 | C = 0) = pr(que = 1) · pr(U = 1)

Noting that the difference of (5) and (6) is ΔPC = pr(E = 1 | C = 1) - pr(E = 1 | C = 0), we have

(7) ΔPC = pr(qce = 1) + pr(que = 1) · pr(U = 1) - pr(qce = 1) · pr(que = 1) · pr(U = 1) - pr(que = 1) · pr(U = 1) = pr(qce = 1) [1 - pr(que = 1) · pr(U = 1)].

Hence,

(8) pr(qce = 1) = ΔPC / [1 - pr(que = 1) · pr(U = 1)].

Finally, we note that pr(que = 1) · pr(U = 1) is just the probability that E = 1 given that C = 0. And so,

(9) ΔPC = pr(qce = 1) [1 - pr(E = 1 | C = 0)]

Equation (9) implies that under the specified assumptions, the causal power of C to generate E can be estimated from ΔPC and from the probability that E occurs given that C does not occur, both of which can be estimated from observations of C and E alone. Moreover, under otherwise similar assumptions we obtain the same result no matter how many unobserved causes there are, so long as they are all generative and independent of C. We note for later use that a derivation resulting in an equation for the causal power of C analogous to (9) is possible if there is another observed causal factor D (or several), independent of C, and we condition on the absence of D.

This surprising transformation of metaphysics into testable mathematics predicts the following for appropriate contexts:

(i) there should be pairs of cases in which people judge causal powers to be unequal but judge the ΔPs to be equal.
(ii) when an effect always occurs in the absence of a causal factor, rather than judging the factor to have no influence, people should be unwilling to judge the power of the factor to produce the effect.
(iii) when the effect never occurs in the absence of a causal factor, people should judge the efficacy of the factor by ΔP.
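The estimate implied by equation (9), and the three predictions that follow from it, can be illustrated with a short Python sketch. The function names `forward_model` and `generative_power` and all numerical values are hypothetical choices made for this illustration; the sketch assumes, as in the text, a single observed generative cause C and independent unobserved generative causes.

```python
from typing import Optional, Tuple


def forward_model(q_ce: float, q_ue: float, p_u: float) -> Tuple[float, float]:
    """Return (pr(E=1 | C=1), pr(E=1 | C=0)) as given by equations (5) and (6)."""
    base = q_ue * p_u                        # equation (6)
    with_c = q_ce + base - q_ce * base       # equation (5)
    return with_c, base


def generative_power(p_e_given_c: float, p_e_given_not_c: float) -> Optional[float]:
    """Solve equation (9) for pr(qce = 1).

    Returns None when E always occurs without C, because the
    denominator 1 - pr(E=1 | C=0) is then zero and the power is undefined.
    """
    delta_p = p_e_given_c - p_e_given_not_c
    headroom = 1.0 - p_e_given_not_c
    if headroom <= 0.0:
        return None
    return delta_p / headroom


# Illustrative (made-up) parameters: the estimate recovers the power 0.6.
p1, p0 = forward_model(q_ce=0.6, q_ue=0.5, p_u=0.4)
recovered = generative_power(p1, p0)

# Prediction (i): equal delta-P (0.48 in both cases), unequal causal powers.
power_with_background = generative_power(0.68, 0.20)
power_without_background = generative_power(0.48, 0.00)

# Prediction (ii): the effect always occurs without C, so the power is undefined.
undefined_power = generative_power(1.0, 1.0)

print(recovered, power_with_background, power_without_background, undefined_power)
```

In the second call of prediction (i) the effect never occurs without C, so the estimated power coincides with ΔP, which is prediction (iii).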
Cheng provides experimental evidence that all three are true for the contexts to which her theory applies: causal factors that have but two values, present or absent, and that are all generative and independent.

3. Preventive Causes

Now suppose that all unobserved causes U of E are generating, and there is an observed candidate preventing cause, F, of E. In this case, E will occur if U occurs, U acts to bring E about, and F does not prevent E from occurring: E = que U · (1 - qfe F). Cheng's equation is:

(10) pr(E = 1) = pr(que U · (1 - qfe F) = 1).

By using (10), we compute pr(que U = 1) = pr(E = 1 | F = 0), and pr(E = 1 | F = 1) = pr(que U = 1) · pr(qfe = 0) = pr(que U = 1) · (1 - pr(qfe = 1)). We can therefore substitute pr(E = 1 | F = 0) for pr(que U = 1) in the equation for pr(E = 1 | F = 1) and solve for pr(qfe = 1). The result is

(11) pr(qfe = 1) = -ΔPF / pr(E = 1 | F = 0)

where ΔPF = pr(E = 1 | F = 1) - pr(E = 1 | F = 0).

Cheng’s account of preventive power predicts that in appropriate contexts, if an effect never occurs even when a potential preventive cause is absent, people will be uncertain as to the preventive power, because it is undefined. She reports experiments confirming that prediction.

As Cheng notes, the ceiling effects that follow from her model are standard pieces of experimental practice. If you set out to test a new antibiotic and you apply it to a culture, and do not apply it to a control culture, and all of the cells in both cultures die, you don’t, or shouldn’t, conclude that your antibiotic has no effect. Instead you conclude that the experiment is no good because, in all probability, some unknown factor independently killed the cultures.

4. Generative Interaction

Many, perhaps most, everyday causal relations provide apparent counterexamples to Cheng’s theory. Consider the house current circuit breaker, a lamp switch, and the light on the lamp.
The light is on if and only if both the circuit breaker and the lamp switch are on. Suppose the state of the circuit breaker and the state of the lamp switch are independent, and each is on half the time. If we apply Cheng’s model of simple causal powers, the causal power of the circuit breaker is

[pr(L = on | C = on) - pr(L = on | C = off)] / [1 - pr(L = on | C = off)] = ½

and the causal power of the lamp switch is also ½. Allen’s measure, ΔP, gives the same values. The Rescorla-Wagner equilibrium depends on the salience of the lamp switch and of the circuit breaker, and on the relative frequencies of their states. If both the lamp switch and the circuit breaker are on half the time and the saliences are equal, then the equilibrium associative strengths are both 1/3. Spellman’s measures, ΔP conditional on values of other potential causes, make both causal powers 1 if we condition on the presence of the other variable, and 0 if we condition on the absence of the other variable. And that presents a difficulty for Cheng’s theory as well as Spellman’s. On Cartwright’s view, and Cheng’s, causal power is supposed to be a fundamental feature of the relation between a potential cause and an effect, insensitive to background conditions. But if the circuit breaker is always on, then Cheng’s measure of the causal power of the lamp switch is no longer ½, but 1.

Some account of interaction is required, and in collaboration with Laura Novick, Cheng (1999) has provided one. It is based on a simple and compelling intuition: if causes A and B of effect E do not interact, then the set of cases that would exhibit E if exposed to both A and B is the union of the set of cases that would exhibit E if exposed to A alone and the set of cases that would exhibit E if exposed to B alone. If we find otherwise, as in the light and the circuit breaker, then there is an interaction.
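The circuit breaker arithmetic can be checked by enumerating the four joint states of breaker and switch. The following Python sketch is only an illustration: the helper names `p_light_given` and `simple_power` are hypothetical, and it assumes the light is on exactly when both breaker and switch are on and that their states are independent, as in the example.

```python
from itertools import product
from typing import Optional


def p_light_given(p_breaker: float, p_switch: float, var: str, value: int) -> Optional[float]:
    """pr(light on | var = value), by enumerating breaker and switch states."""
    numerator = 0.0
    denominator = 0.0
    for breaker, switch in product((0, 1), repeat=2):
        # Independence: the joint weight is the product of the marginals.
        weight = (p_breaker if breaker else 1.0 - p_breaker) * \
                 (p_switch if switch else 1.0 - p_switch)
        state = {"breaker": breaker, "switch": switch}
        if state[var] != value:
            continue
        denominator += weight
        numerator += weight * (breaker * switch)  # light = breaker AND switch
    return numerator / denominator if denominator > 0 else None


def simple_power(p_breaker: float, p_switch: float, var: str) -> Optional[float]:
    """Simple generative power of `var`, computed as delta-P over 1 - pr(light | var off)."""
    p1 = p_light_given(p_breaker, p_switch, var, 1)
    p0 = p_light_given(p_breaker, p_switch, var, 0)
    if p1 is None or p0 is None or p0 >= 1.0:
        return None
    return (p1 - p0) / (1.0 - p0)


# Each device is on half the time: both simple powers come out as one half.
breaker_power = simple_power(0.5, 0.5, "breaker")
switch_power = simple_power(0.5, 0.5, "switch")

# Breaker always on: the measured power of the switch changes to 1.
switch_power_breaker_always_on = simple_power(1.0, 0.5, "switch")

print(breaker_power, switch_power, switch_power_breaker_always_on)
```

The shift from ½ to 1 when only the background frequency of the breaker changes is the sensitivity to background conditions that motivates a separate treatment of interaction.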
The explicit mathematical model when A and B are generative and they interact generatively is:

(12) E = que U ⊕ qae A ⊕ qbe B ⊕ qab AB

where ⊕ is Boolean addition. The probability of E is found by taking the probability of the right hand side of (12). As before, we assume that A, B, U and all of the parameters are independent in probability. The problem is how to use (12) and the independence assumptions to compute the causal power of the interaction, that is, pr(qab = 1). Results equivalent to all of those in this section are in Cheng and Novick (1999), but what is intuitive to one mind may not be so to another, and my derivations, which mimic the derivations of the estimation formulas (9) and (11) for simple causal powers, sometimes differ from theirs.

When B is absent, the interaction term vanishes and (12) reduces to

(13) E = que U ⊕ qae A

and analogously for B when A is absent

(14) E = que U ⊕ qbe B

So the simple causal powers of A and of B, that is pr(qae = 1) and pr(qbe = 1), can be estimated as described in section 2, but conditioning on the absence of B to estimate the causal power of A, and conditioning on the absence of A to estimate the causal power of B. Further, when A and B are both absent

(15) E = que U

and so the probability that E is produced by unobserved causes, pr(que U = 1), can be estimated. Because of the independence assumptions, equations 13, 14 and 15 give us all of the terms that occur when the probability of the right hand side of (12) is taken, except for the causal power of the interaction, pr(qab = 1). Substituting the results of 13, 14 and 15 in the expression for the probability of the right hand side of equation 12, we can then solve for pr(qab = 1) from the probability of E when A and B are both present.
The result has a simple form if we first define the (counterfactual) probability that E would have given A and B if there were no interaction, that is

(16) pNI(E = 1 | A = 1, B = 1) =
  pr(que U = 1) + pr(qae = 1) pr(A = 1) + pr(qbe = 1) pr(B = 1)
  - pr(que U = 1) pr(qae = 1) pr(A = 1)
  - pr(que U = 1) pr(qbe = 1) pr(B = 1)
  - pr(qae = 1) pr(A = 1) pr(qbe = 1) pr(B = 1)
  + pr(que U = 1) pr(qae = 1) pr(A = 1) pr(qbe = 1) pr(B = 1)

We have already shown in equations 13, 14 and 15 how to estimate all quantities on the right hand side of equation 16. The causal power of the interaction then takes the form

(17) pr(qab = 1) = [pr(E = 1 | A = 1, B = 1) - pNI(E = 1 | A = 1, B = 1)] / [1 - pNI(E = 1 | A = 1, B = 1)]

analogous to Cheng’s formula (9) for estimating simple generative causal power.

The Cheng and Novick interaction formula gives a completely principled account of generative interaction and how to estimate it in the simple case we have considered of two direct, independent, generative causes. The theory gives an intuitive result for the example with which I began, the light, lamp switch and circuit breaker. In that case the effect is the product of the causes, understood as (0, 1) valued variables, and while the simple causal powers are zero, the interactive causal power of the lamp switch and circuit breaker to turn the light on has the value 1. Further, the theory gives different results from those of any of the variety of ad hoc measures of interaction proposed in epidemiology, or the measures of interaction used in standard statistical categorical data analysis.

5. Other Forms of Interaction

Cheng and Novick consider five other combinations of generative and preventing simple and interactive causes. The generative interactions are all marked by the fact that the actual probability of E given A and B is greater than the counterfactual probability, which can be calculated, of E given A and B and assuming no interaction. In preventive interactions the inequality is reversed.
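The estimation steps of section 4 (equations 13 through 17) can be traced in a small Python sketch. The function names and the numerical values are hypothetical. The sketch makes one assumption worth flagging: it evaluates the no-interaction probability (16) conditional on A = 1 and B = 1, so pr(A = 1) and pr(B = 1) are taken to be 1 there, in which case the right hand side of (16) equals 1 - (1 - u)(1 - a)(1 - b) for the three component probabilities u, a, b.

```python
from typing import Optional


def noisy_or(*probs: float) -> float:
    """Probability that at least one of several independent events occurs."""
    none_occur = 1.0
    for p in probs:
        none_occur *= 1.0 - p
    return 1.0 - none_occur


def interaction_power(p00: float, p10: float, p01: float, p11: float) -> Optional[float]:
    """Estimate pr(qab = 1) from the four conditional probabilities of E.

    pXY stands for pr(E = 1 | A = X, B = Y). Returns None when a
    denominator vanishes and the power is undefined.
    """
    base = p00                                # pr(que U = 1), from (15)
    if base >= 1.0:
        return None
    power_a = (p10 - base) / (1.0 - base)     # from (13), B absent
    power_b = (p01 - base) / (1.0 - base)     # from (14), A absent
    p_ni = noisy_or(base, power_a, power_b)   # (16), assuming pr(A)=pr(B)=1
    if p_ni >= 1.0:
        return None
    return (p11 - p_ni) / (1.0 - p_ni)        # (17)


# Light, lamp switch and circuit breaker: E occurs only when both are on.
light_example = interaction_power(p00=0.0, p10=0.0, p01=0.0, p11=1.0)

# Made-up parameters u=0.2, a=0.3, b=0.4, interaction power 0.5,
# pushed through model (12) and then recovered.
u, a, b, ab = 0.2, 0.3, 0.4, 0.5
recovered = interaction_power(
    p00=u,
    p10=noisy_or(u, a),
    p01=noisy_or(u, b),
    p11=noisy_or(u, a, b, ab),
)

print(light_example, recovered)
```

For the light example the simple powers are estimated as zero and the interactive power as 1, matching the result stated in the text.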
I will briefly review their cases. If one of the observed causes, A, is generative and the other, B, is preventive and the interaction is generative, then there are alternative models

(18) E = [que U ⊕ qae A](1 - qbe B) ⊕ qab AB

(19) E = [que U ⊕ qae A ⊕ qab AB](1 - qbe B)

As in the previous section, pr(que U = 1) can be estimated in both equations from the frequency of E when A and B are absent, pr(qae = 1) can be estimated from the frequency of E when B is absent, pr(qbe = 1) can be estimated from the frequency of E when A is absent, and the estimates of each of these quantities have the same values for (18) as for (19). A similar analysis holds if both observed causes are preventive and the interaction is generative.

If both A and B are generative and their interaction is preventive, then there is a single natural equation

(20) E = [que U ⊕ qae A ⊕ qbe B](1 - qab AB)

As before, pr(que U = 1) can be estimated from pr(E = 1 | A = 0, B = 0), and similarly, pr(qae = 1) and pr(qbe = 1) can be estimated. These pieces can be put together to estimate the counterfactual probability pNI(E = 1 | A = 1, B = 1), and the expression for pr(E = 1 | A = 1, B = 1) formed by taking the probability of the right side of (20), which is pNI(E = 1 | A = 1, B = 1) · (1 - pr(qab = 1)), can then be solved for pr(qab = 1) in terms of pNI(E = 1 | A = 1, B = 1). The result is

pr(qab = 1) = -[pr(E = 1 | A = 1, B = 1) - pNI(E = 1 | A = 1, B = 1)] / pNI(E = 1 | A = 1, B = 1)

in perfect analogy with Cheng’s formula for estimating simple preventive causal powers. Analogous formulas, obtained analogously, hold if one of the observed causes is preventive and the interaction is preventive, or if both of the observed causes are preventive and the interaction is preventive.

6. Cheng Models as Bayes Nets

Cheng, and Cheng and Novick, are concerned both about how people conceive causal relations and how they do, or could, discover and use causal relations according to that conception.
They give us an answer for a family of cases, those in which the causal graph is partially known (which variables are potential causes of others is known, some causal connections are known not to obtain, and there is no confounding, that is, no association between the effect and potential causes due to unobserved causes) but the values of its parameters, the causal powers, are not known. The aim in these cases is to estimate the causal power of a direct (adjacent) cause of an effect. We can summarize the estimation theory for these circumstances as follows. I assume that the probabilities of any unobserved causes, and their causal powers, are not zero.

i. Assume that E has a single observed, generating cause A, and an independent (in probability) unobserved preventing cause U. Then the causal power of A to generate E cannot be estimated.

ii. Assume that E has any number greater than one of observed, generating causes A, B, etc., and any number (zero or more) of observed preventing causes, and an independent unobserved preventing cause U. Then the ratios to one another of the causal powers of each of the generating causes can be estimated.

iii. Assume that E has one or more observed, generating causes, A, B, etc., and any number (zero or more) of observed, preventing causes, C, and an independent unobserved preventing cause. Then the causal power of each observed preventing cause can be estimated.

iv. Assume that E has any number of observed, generating causes, A, B, etc. and any number of observed preventing causes, C, and an independent, unobserved generating cause. Then the causal powers of each of the observed causes can be estimated.

v. Assume that E has any number of observed, generating causes, A, B, etc. and any number of observed preventing causes, C, and an unobserved generating cause U.
If U is not a cause of A, and no other observed cause D of E is both an effect of U and either an effect of A or an effect of another common unobserved cause of A and D, then the causal power of A can be estimated.

Proofs are given in the appendix. Cheng (2000) has studied the properties of under or over estimates obtained if the methods of section 2 are applied in cases where they do not give the correct result. For example, in case i underestimates of the direct causal powers are obtained.

6.1 Estimating the simple total causal power given the true causal graph

Consider the structure

[Figure 1: causal graph over A, B, W, U, E]

where W, U are unobserved and independent. Consider the case where there is no interaction, and all causes are generative. Then the graph above corresponds to the equations

E = que U ⊕ qbe B ⊕ qae A
B = qwb W ⊕ qab A

Cheng’s methods (1997), essentially those of sections 2, 3 and 6.1, apply to this case. In this case, in order to estimate qbe it is essential, not optional, to condition on the absence of A, and similarly to estimate qae. But the “simple causal power” of A is now ambiguous: it can mean the causal power of A associated with the A -> E edge alone, which is the probability of E given A and the absence of all other causes (B and W and U in this case) of E, or it can mean the causal power of A associated with the A -> E edge and the A -> B -> E path, which is the probability of E given A and the absence of all other causes of E that are not effects of A (W and U in this case). I will call the former quantity the direct causal power of A, and when the probability is greater than zero that E occurs given that A occurs and no other causes of E, other than effects of A, occur, I will call the latter quantity the total causal power of A. Given a directed graph, the set of all of the direct causal powers somehow determines the total causal powers. How?
Consider a more complicated example:

[Figure 2: causal graph over A, B, S, F, W, C, R, D, V, E, U, G, T]

Suppose D is a preventive cause of E, and A is a preventive cause of G and all other causes are generative, and suppose all of the q parameters are known, except for those associated with R, S, T, W, V and U, which are unobserved variables.

E = (que U ⊕ qce C ⊕ qfe F ⊕ qge G)(1 - qde D)
C = qbc B ⊕ qwc W
D = qbd B ⊕ qvc V
F = qbf B ⊕ qsf S
G = qtg T(1 - qag A)
B = qab A ⊕ qrb R

Substituting,

(21) E = (que U ⊕ qge qtg T(1 - qag A) ⊕ qce(qbc(qab A ⊕ qrb R) ⊕ qwc W) ⊕ qfe(qbf(qab A ⊕ qrb R) ⊕ qsf S))(1 - qde(qbd(qab A ⊕ qrb R) ⊕ qvc V))

Hence the total causal power of A to generate E is

(22) pr(E = 1 | A = 1, U = 0, R = 0, W = 0, V = 0, S = 0) = pr(qab = 1) · (pr(qbc = 1) · pr(qce = 1) + pr(qbf = 1) · pr(qfe = 1)) · (1 - pr(qab = 1) · pr(qbd = 1) · pr(qde = 1))

The general procedure works like this. Consider the set of all paths from A to E. From that set, eliminate any paths that contain a variable X that directly prevents a variable Y if Y does not have a generating cause that is A or an effect of A (because otherwise, given that no cause of E occurs that is not an effect of A, Y cannot occur; notice that the terms with qg in (21) do not appear in (22)). The result is a subgraph of the original graph. Figure 2, for example, reduces to:

[Figure 3: subgraph over A, B, F, C, D, E]

Write the sum of the q terms for the generative parents of E, and multiply the sum by the product, over each preventive parent of E, of 1 minus the corresponding q term. Each q term in the resulting expression corresponds to a parent of E. Now for each such parent, repeat the procedure (as if it were E), and multiply the q term for the parent in the previous expression for E by the result. Iterate, at each stage multiplying each q term wherever it occurs by the corresponding combination of the q terms of its parents, until there are no more parents. Replace each q term in the final expression by the probability that it equals 1.
When all causes are generative, the procedure and the resulting formula are isomorphic to a procedure and formula for computing the total correlation between two variables, A and E, in a standardized linear model from the correlations of the directly connected variables.

6.3 Estimating causal powers when there are unobserved confounders

All of the procedures so far assume that there is no unobserved common cause influencing both the cause of an effect and the effect itself. But if the causal graph is known, direct and total causal powers can sometimes be estimated even when there is such confounding. Consider the simple case

[Figure 4: causal graph over A, B, E, U]

where U is unobserved and generative. The total causal power of A to generate E can be estimated by the method of the previous subsection (and if A prevents B, or B prevents E, the total causal power of A is not defined). The causal power of B cannot be estimated by any of the methods so far described. But it can be estimated. If all causes are generative:

E = qbe B ⊕ que U
B = qab A ⊕ qub U

Substituting and factoring:

E = qbe qab A ⊕ (qbe qub ⊕ que)U

Now pr(qbe = 1) · pr(qab = 1) can be estimated by the methods of section 2. But A and B are unconfounded, and so pr(qab = 1) can be estimated analogously. The ratio gives pr(qbe = 1). The technique, called the method of instrumental variables, has a long history in econometrics. It works here because of the isomorphism, noted in the previous subsection, of generative Cheng models and linear models.

When B is preventing and all other causes are generative, the preventive power of B can be estimated, but the derivation is less straightforward: the probability that qab = 1 can be estimated, of course, and pr(qub U = 1) can be estimated from pr(B = 1 | A = 0). This results in two linearly independent equations, one for pr(E = 1 | A = 1) and one for pr(E = 1 | A = 0), in two unknowns, pr(que U = 1) and pr(qbe = 1), which can be solved for pr(qbe = 1).
If figure 4 is altered only by adding a direct influence of A on E, as in:

[Figure 5: causal graph over A, B, E, U, with a direct influence of A on E]

then the causal power of B, whether generative or preventive, cannot be estimated; nor can the total causal power of A to generate E be estimated. There is an important moral here: when unobserved confounders may be present, estimating causal powers accurately depends on knowing the causal structure, the structure I have represented by a directed graph.

Consider next the case where all causes are generative and U and W are not observed.

[Figure 6: causal graph over U, A, B, E, W]

The direct causal power of A to generate B, and of B to generate E, can be estimated, and, more surprisingly, so can the total causal power of A to generate E. To estimate pr(qbe = 1), condition on the absence of A and apply the method of section 2. To estimate pr(qab = 1), apply the method of section 2 directly, since there is no confounding. Now by an obvious variant of the results in section 6.1, the total causal power of A to generate E is pr(qab = 1) pr(qbe = 1).

Finally, consider a circumstance that sometimes arises in science, and presumably in everyday life as well, in which the effect itself influences what is observed. Let “S” represent the property that a system is observed, and suppose the causal structure is

[Figure 7: causal graph over A, E, S, U, W]

All of the observations are conditioned on S = 1, and U and W are unobserved. In this case the probability that qae = 1 cannot be estimated. Recall from the Monty Hall problem, described in Chapter 6, that conditional on their common effect two otherwise independent variables, in this case A and U, are dependent. So A and U are dependent conditional on E. But the same is true if the conditioning variable is any descendant of a common effect (Pearl, 1988). So A and U are dependent conditional on S.

I leave it to ambitious readers to develop the theory of estimation for interactive causes for general Cheng models with arbitrary causal graphs.

7. Discovering the Causal Graph

The theory of estimation for Cheng models so far developed assumes that the causal graph is completely known save that, if the associated direct causal powers are zero, some represented edges may be phantoms. The separation between causal graphs and estimates of causal powers may seem artificial and unmotivated. Since the occurrences of features we encounter in life are usually ordered by their known time of occurrence, why not, given a set of features whose causal relations are to be investigated, apply Cheng’s method, the method of sections 2 and 3, to determine the influence of each feature on subsequent features? The method is formally a sequence of regressions: in judging the influence, or causal power, of a candidate cause, all other observed candidate causes are conditioned on. Then the causal graph would appear to emerge as a result of, not a precondition for, the estimation of causal powers.

There are several reasons. One is that a feature occurring at time 1 may influence a feature occurring at time 3 directly, not through any feature observed at time 2. So to estimate the direct causal power of A to generate E, we must condition on all features occurring simultaneously with or prior to A, and as the number of such variables grows over time, the number of cases in which they are all absent and A is present or absent shrinks. Both memory and statistics are challenged. But the preceding section supplies a more obvious and perhaps more important reason why the method will not be reliable: unobserved common causes. We have seen that the estimation methods of sections 2, 3 and 6.1 are generally insufficient when there are unobserved common causes at work, and often we have no idea before we begin inquiry whether such factors are operating.
In the last decade there has been extensive research into the causal information that can and cannot be obtained under the Markov and faithfulness assumptions, or similar conditions, and it continues. I will not survey it here (see Spirtes, 2000, especially Chapter 12, for a review), but I will give some examples.

Suppose, to take almost the worst case, that time order is not known and nothing is known about the true causal structure, except that there is one, and the Markov and faithfulness assumptions hold. The aim is to estimate the causal power of A to influence E. Suppose the true unknown structure is:

[Figure 8: true causal graph over C, U, T, A, E, S, D, W]

and only C, D, A and E are observed. Figure 8 implies that C and D are independent of one another and that each is independent of E conditional on A, and no other independencies hold. We can begin the inquiry by supposing that, for all we know, any of C, D, A, E may be directly dependent on one another:

[Figure 9: graph over C, D, A, E]

Examining figure 8, C and D are independent, and so there can be no direct connection between them:

[Figure 10: graph over C, D, A, E]

But C is independent of E conditional on A, and D is also independent of E conditional on A. Hence there can be no direct connection between C and E or between D and E:

[Figure 11: graph over C, D, A, E]

Since C and D are independent, but not independent conditional on A, it follows from the faithfulness assumption that they must have arrows directed into A, although one cannot tell whether they cause A, or have a common cause with A, or both:

[Figure 12: graph over C, D, A, E, with circle marks on the edges into A]

where the circles note that we cannot tell whether there is a direct cause, an unobserved common cause, or both. Now C and D are jointly independent of E conditional on A, but neither is independent of E. If E caused A, then C and D would be independent of E, and they are not. If there were in addition an unobserved common cause of A and E then C and D would not be independent of E conditional on A (Monty Hall again).
So we conclude

[Figure 13: graph over C, D, A, E]

and the causal power of A to generate (or prevent) E can therefore be estimated by the methods of section 2. For the general algorithm, and proofs that it gives the correct result under the Markov and faithfulness conditions, as well as other procedures for learning Bayes nets from data, see Spirtes, 2000.

Suppose instead the true structure were:

[Figure 14: graph over C, U, T, A, E, S, D, W]

so that unobserved S is now a common cause of A and E. Then by procedures similar to those just illustrated, we could not determine whether the true structure is:

[Figure 15] or [Figure 16] or [Figure 17]

Moreover, knowledge of the time order (that C and D precede A, which precedes E) would not help a whit in distinguishing these structures. If one knew figure 15 were correct, the causal power of A could be estimated by the method of instrumental variables. If one knew only that figure 16 or figure 17 were correct, however, the causal power of A could not be estimated at all. (If one knew that the two circles at C were not both arrowheads into C, and analogously for the two circles at D, then in figure 16, but not in figure 17, the causal power of A could be estimated.)

Finally, consider a case in which the time order is known, and a causal power can be estimated, but cannot be estimated by a regression procedure. Suppose the true structure is figure 18, with U and W unobserved.

[Figure 18: graph over W, C, A, B, U, E]

Estimating the causal power of A by conditioning on the absence of B will result in the wrong answer (Monty Hall, yet again). But the procedure illustrated previously results in the following structure:

[Figure 19: graph over C, A, B, E]

where the double-headed arrow indicates the presence of an unobserved common cause. The causal power of A can then be estimated by the method of section 2, but without conditioning on B.

7. Conclusion

Most of the results of sections 5 and 6 suggest experiments on human subjects, whether adults or children, that have not been done, and some of which surely should be. The normative theory, Cheng’s theory embedded in causal Bayes nets, may of course not describe human judgement precisely. It may be, for example, that people typically ignore the possibility of unobserved common causes, and repair their erroneous judgements only as it proves necessary, and, of course, there are memory and processing limitations. For that reason Cheng’s (2000) recent study of the underestimates and overestimates that result from incorrect assumptions is an especially valuable step.

We need, besides, an understanding of how incorrect causal Bayes nets (networks that postulate connections that don’t exist, networks that omit common causes, networks that leave out connections that do exist) can be remedied without starting over from scratch. Most of the data from which an erroneous network has been learned will have long since been forgotten when new phenomena are discovered that require its modification. Neural net models, for example, typically must be entirely retrained when a new property is considered, and that is a feature very much to be avoided in a psychological model. No repair theory for Bayes nets exists that is compatible with severe memory and computational limitations. In the next chapter, however, I consider some interactions between causal Bayes net representations and memory and computational limitations.

Appendix

i. Assume that E has a single observed, direct generating cause A, and an independent unobserved preventing cause U. Then the causal power of A to generate E cannot be estimated.

Proof: The independent, observable probabilities and conditional probabilities that contain pr(qa = 1) are

1. pr(E = 1) = pr(qa = 1) pr(A = 1) (1 - pr(qu = 1)pr(U = 1))
2. pr(E = 1 | A = 1) = pr(qa = 1) (1 - pr(qu = 1)pr(U = 1))

which cannot be solved for pr(qa = 1), since the r.h.s. of equation 2 is proportional to the r.h.s. of equation 1.

ii. Assume that E has any number greater than one of observed, direct generating causes A, B, etc., and any number (zero or more) of direct observed preventing causes, and an independent unobserved preventing cause U. Then, when defined, the ratios to one another of the direct causal powers of each of the generating causes can be estimated.

Proof: Assume there are two observed generating causes, A and B. Then after conditioning on the absence of any observed preventing causes we have

1. pr(E = 1 | A = 1, B = 0) = pr(qa = 1) (1 - pr(qu = 1)pr(U = 1))
2. pr(E = 1 | A = 0, B = 1) = pr(qb = 1) (1 - pr(qu = 1)pr(U = 1))

The ratio of the l.h.s. of equation 1 to the l.h.s. of equation 2 is the ratio of the causal powers. The argument generalizes to any number of observed generating causes.

iii. Assume that E has one or more observed, direct generating causes, A, B, etc., and any number (zero or more) of observed, direct preventing causes, C, D, etc., and an independent unobserved preventing cause. Then the direct causal power of each observed preventing cause can be estimated if pr(E = 1 | A = 1, C = 0) ≠ 0.

Proof: Let A be a generating cause and C be a preventing cause. Then, conditional on the absence of all other direct causes, we have

1. pr(E = 1 | A = 1, C = 1) = (1 - pr(qc = 1)) pr(qa = 1) (1 - pr(qu = 1)pr(U = 1))
2. pr(E = 1 | A = 1, C = 0) = pr(qa = 1) (1 - pr(qu = 1)pr(U = 1))

Equation 1 divided by equation 2 yields the complement of the preventive causal power of C.

iv. Assume that E has any number of observed, direct generating causes, A, B, etc., and any number of observed direct preventing causes, C, and an independent, unobserved generating cause, whose probability and causal power are not zero. Then the direct causal power of each of the observed causes can be estimated.
Proof: For each generating direct cause A, compute the probability that E = 1 given A = 1, conditional on the absence of all other observed direct causes. The result gives an estimate of

pr(qa = 1) + pr(qu = 1)pr(U = 1) - pr(qa = 1) pr(qu = 1)pr(U = 1).

Now compute the probability that E = 1 given A = 0, conditional on the absence of all other observed causes. The result gives an estimate of pr(qu = 1)pr(U = 1). Solve for pr(qa = 1).

For each preventing direct cause, C, compute the probability that E = 1 given C = 1, conditional on the absence of all other observed direct preventive causes, and likewise compute the probability that E = 1 given C = 0, conditional on the absence of all other observed direct preventive causes and the presence of at least one generative cause. The ratio of the first to the second gives an estimate of (1 - pr(qc = 1)) if pr(qu = 1)pr(U = 1) ≠ 0, unless pr(C)pr(qc = 1) = 1. The latter case holds if and only if pr(E = 1 | C = 1) = 0.

v. Assume that E has any number of observed, direct generating causes, A, B, etc., and any number of observed direct preventing causes, C, and an unobserved direct generating cause U, whose probability and causal power are not zero. If U is not a cause of A, and no other observed cause D of E is both an effect of U and either an effect of A or an effect of another common unobserved cause of A and D, then the direct causal power of A can be estimated.

Proof: Conditioning on the absence of any other direct causes of A eliminates any associations of A and E other than the direct dependence. Hence the methods of iv can be applied.
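
The estimation steps in the proof of iv can be illustrated numerically. The following is a minimal sketch, not part of the original argument: the function names and the parameter values (0.6, 0.5, 0.4 and 0.5) are hypothetical, and the conditional probabilities are computed exactly from the assumed model rather than estimated from data. Writing b for pr(qu = 1)pr(U = 1), solving p1 = pr(qa = 1) + b - pr(qa = 1)b gives pr(qa = 1) = (p1 - b) / (1 - b), and the preventive power is one minus the ratio described in the proof.

```python
def generative_power(p_e_given_a, p_e_given_not_a):
    """Solve p1 = qa + b - qa*b for qa, where b = pr(E = 1 | A = 0).

    Both inputs are assumed to be conditional on the absence of all
    other observed direct causes of E.
    """
    if p_e_given_not_a >= 1.0:
        raise ValueError("undefined when pr(E = 1 | A = 0) = 1")
    return (p_e_given_a - p_e_given_not_a) / (1.0 - p_e_given_not_a)


def preventive_power(p_e_given_c, p_e_given_not_c):
    """Return 1 - pr(E = 1 | C = 1) / pr(E = 1 | C = 0).

    Both inputs are assumed to be conditional on the absence of other
    observed preventive causes and the presence of a generative cause.
    """
    if p_e_given_not_c <= 0.0:
        raise ValueError("undefined when pr(E = 1 | C = 0) = 0")
    return 1.0 - p_e_given_c / p_e_given_not_c


# Hypothetical true parameters, chosen only for illustration.
Q_A = 0.6               # pr(qa = 1), generative power of A
Q_C = 0.5               # pr(qc = 1), preventive power of C
BACKGROUND = 0.4 * 0.5  # pr(qu = 1) * pr(U = 1)

# Exact conditional probabilities implied by the model in iv.
p_a1_c0 = Q_A + BACKGROUND - Q_A * BACKGROUND  # pr(E = 1 | A = 1, C = 0)
p_a0_c0 = BACKGROUND                           # pr(E = 1 | A = 0, C = 0)
p_a1_c1 = (1.0 - Q_C) * p_a1_c0                # pr(E = 1 | A = 1, C = 1)

est_qa = generative_power(p_a1_c0, p_a0_c0)
est_qc = preventive_power(p_a1_c1, p_a1_c0)
print(round(est_qa, 6), round(est_qc, 6))
```

In this sketch the preventive estimate conditions on A = 1 in both terms of the ratio, which is one simple way to satisfy the requirement that at least one generative cause be present. With frequencies from real data in place of the exact probabilities, the same two functions would give sample estimates of the two powers.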
