Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Causal Discovery Richard Scheines Peter Spirtes, Clark Glymour, and many others Dept. of Philosophy & CALD Carnegie Mellon Nov. 13th, 2003 1 Outline 1. Motivation 2. Representation 3. Connecting Causation to Probability (Independence) 4. Searching for Causal Models 5. Improving on Regression for Causal Inference Nov. 13th, 2003 2 1. Motivation Non-experimental Evidence Day Care John A lot Mary None Aggressivenes s A lot A little Typical Predictive Questions • Can we predict aggressiveness from the amount of violent TV watched • Can we predict crime rates from abortion rates 20 years ago Causal Questions: • Does watching violent TV cause Aggression? • I.e., if we change TV watching, will the level of Aggression change? Nov. 13th, 2003 3 Causal Estimation When and how can we use non-experimental data to tell us about the effect of an intervention? Manipulated Probability P(Y | X set= x, Z=z) from Unmanipulated Probability P(Y | X = x, Z=z) Nov. 13th, 2003 4 2. Representation 1. Association & causal structure - qualitatively 2. Interventions 3. Statistical Causal Models 1. Bayes Networks 2. Structural Equation Models Nov. 13th, 2003 5 Causation & Association X and Y are associated (X _||_ Y) iff x1 x2 P(Y | X = x1) P(Y | X = x2) Association is symmetric: X _||_ Y Y _||_ X X is a cause of Y iff x1 x2 P(Y | X set= x1) P(Y | X set= x2) Causation is asymmetric: X Y Nov. 13th, 2003 Y X 6 Direct Causation X is a direct cause of Y relative to S, iff z,x1 x2 P(Y | X set= x1 , Z set= z) P(Y | X set= x2 , Z set= z) where Z = S - {X,Y} X Nov. 13th, 2003 Y 7 Causal Graphs Causal Graph G = {V,E} Each edge X Y represents a direct causal claim: X is a direct cause of Y relative to V Exposure Chicken Pox Exposure Rash Infection Nov. 13th, 2003 Rash 8 Causal Graphs Not Cause Complete Exposure Omitted Causes Infection Symptoms Common Cause Complete Omitted Common Causes Exposure Infection Nov. 13th, 2003 Symptoms 9 Modeling Ideal Interventions Ideal Interventions (on a variable X): • Completely determine the value or distribution of a variable X • Directly Target only X (no “fat hand”) E.g., Variables: Confidence, Athletic Performance Intervention 1: hypnosis for confidence Intervention 2: anti-anxiety drug (also muscle relaxer) Nov. 13th, 2003 10 Modeling Ideal Interventions Interventions on the Effect Post Pre-experimental System Sweaters On Nov. 13th, 2003 Room Temperature 11 Modeling Ideal Interventions Interventions on the Cause Post Pre-experimental System Sweaters On Nov. 13th, 2003 Room Temperature 12 Interventions & Causal Graphs • Model an ideal intervention by adding an “intervention” variable outside the original system • Erase all arrows pointing into the variable intervened upon Intervene to change Inf Pre-intervention graph Post-intervention graph? Exp Exp Inf Inf Rash Rash I Nov. 13th, 2003 13 Conditioning vs. Intervening P(Y | X = x1) vs. P(Y | X set= x1) Teeth Slides Nov. 13th, 2003 14 Causal Bayes Networks The Joint Distribution Factors Smoking [0,1] According to the Causal Graph, Yellow Fingers [0,1] Lung Cancer [0,1] P(S = 0) = .7 P(S = 1) = .3 P(YF = 0 | S = 0) = .99 P(YF = 1 | S = 0) = .01 P(YF = 0 | S = 1) = .20 P(YF = 1 | S = 1) = .80 i.e., for all X in V P(V) = P(X|Immediate Causes of(X)) P(S,YF, L) = P(S) P(YF | S) P(LC | S) P(LC = 0 | S = 0) = .95 P(LC = 1 | S = 0) = .05 P(LC = 0 | S = 1) = .80 P(LC = 1 | S = 1) = .20 Nov. 13th, 2003 15 Structural Equation Models Education Causal Graph Income Statistical Model Longevity 1. Structural Equations 2. Statistical Constraints Nov. 13th, 2003 16 Structural Equation Models Education Causal Graph Income Longevity z Structural Equations: One Equation for each variable V in the graph: V = f(parents(V), errorV) for SEM (linear regression) f is a linear function z Statistical Constraints: Joint Distribution over the Error terms Nov. 13th, 2003 17 Structural Equation Models Causal Graph Education Equations: Education = ed Longevity Income = Educationincome Longevity = EducationLongevity Income SEM Graph Education (path diagram) 1 Income Income 2 Longevity Statistical Constraints: (ed, Income,Income ) ~N(0,2) 2 diagonal - no variance is zero Longevity Nov. 13th, 2003 18 3. Connecting Causation to Probability Nov. 13th, 2003 19 The Markov Condition Causal Structure Statistical Predictions Causal Markov Axiom Independence Causal Graphs X Y X _||_ Z | Y Z i.e., P(X | Y) = P(X | Y, Z) Nov. 13th, 2003 20 Causal Markov Axiom If G is a causal graph, and P a probability distribution over the variables in G, then in P: every variable V is independent of its non-effects, conditional on its immediate causes. Nov. 13th, 2003 21 Causal Markov Condition Two Intuitions: 1) Immediate causes make effects independent of remote causes (Markov). 2) Common causes make their effects independent (Salmon). Nov. 13th, 2003 22 Causal Markov Condition 1) Immediate causes make effects independent of remote causes (Markov). E = Exposure to Chicken Pox I = Infected S = Symptoms Markov Cond. E I E || S | I S Nov. 13th, 2003 23 Causal Markov Condition 2) Effects are independent conditional on their common causes. Smoking (S) YF || LC | S Markov Cond. Yellow Fingers (YF) Lung Cancer (LC) Nov. 13th, 2003 24 Causal Structure Statistical Data Acyclic Causal Graph X1 X2 X3 Causal Markov Axiom (D-separation) Independence Relations X1 Nov. 13th, 2003 X3 | X2 25 Causal Markov Axiom In SEMs, d-separation follows from assuming independence among error terms that have no connection in the path diagram i.e., assuming that the model is common cause complete. Nov. 13th, 2003 26 Causal Markov and D-Separation • In acyclic graphs: equivalent • Cyclic Linear SEMs with uncorrelated errors: • D-separation correct • Markov condition incorrect • Cyclic Discrete Variable Bayes Nets: • If equilibrium --> d-separation correct • Markov incorrect Nov. 13th, 2003 27 D-separation: Conditioning vs. Intervening T X1 P(X3 | X2) P(X3 | X2, X1) X3 X2 X3 _||_ X1 | X2 T X1 X2 X3 P(X3 | X2 set= ) = P(X3 | X2 set=, X1) X3 _||_ X1 | X2 set= I Nov. 13th, 2003 28 4. Search From Statistical Data to Probability to Causation Nov. 13th, 2003 29 Causal Discovery Statistical Data Causal Structure Data Equivalence Class of Causal Graphs X1 X1 X1 X2 X2 X2 X3 Causal Markov Axiom (D-separation) X3 X3 Statistical Inference Discovery Algorithm Independence Relations X1 X3 | X2 Background Knowledge - X2 before X3 - no unmeasured common causes Nov. 13th, 2003 30 Representations of D-separation Equivalence Classes We want the representations to: • Characterize the Independence Relations Entailed by the Equivalence Class • Represent causal features that are shared by every member of the equivalence class Nov. 13th, 2003 31 Patterns & PAGs • Patterns (Verma and Pearl, 1990): graphical representation of an acyclic d-separation equivalence - no latent variables. • PAGs: (Richardson 1994) graphical representation of an equivalence class including latent variable models and sample selection bias that are d-separation equivalent over a set of measured variables X Nov. 13th, 2003 32 Patterns Example Possible Edges X1 X2 X1 X2 X1 X2 Nov. 13th, 2003 X1 X2 X3 X4 33 Patterns: What the Edges Mean X2 X1 and X2 are not adjacent in any member of the equivalence class X1 X2 X1 X2 (X1 is a cause of X2) in every member of the equivalence class. X1 X2 X1 X2 in some members of the equivalence class, and X2 X1 in others. X1 Nov. 13th, 2003 34 Patterns X1 X2 Pattern X4 X3 Represents X1 X1 X2 X3 X4 Nov. 13th, 2003 X2 X3 X4 35 PAGs: Partial Ancestral Graphs What PAG edges mean. X1 X2 X1 and X2 are not adjacent X1 X2 X2 is not an ancestor of X1 X1 X2 No set d-separates X2 and X1 X1 X2 X1 is a cause of X2 X1 X2 There is a latent common cause of X1 and X2 Nov. 13th, 2003 36 PAGs: Partial Ancestral Graph X1 X2 PAG X3 Represents X1 X2 X1 X3 X2 T1 X3 etc. X1 X2 X1 T1 X3 T1 X2 X3 Nov. 13th, 2003 T2 37 Tetrad 4 Demo www.phil.cmu.edu/projects/tetrad_download/ Nov. 13th, 2003 38 Overview of Search Methods • Constraint Based Searches • TETRAD • Scoring Searches • Scores: BIC, AIC, etc. • Search: Hill Climb, Genetic Alg., Simulated Annealing • Very difficult to extend to latent variable models Heckerman, Meek and Cooper (1999). “A Bayesian Approach to Causal Discovery” chp. 4 in Computation, Causation, and Discovery, ed. by Glymour and Cooper, MIT Press, pp. 141-166 Nov. 13th, 2003 39 5. Regession and Causal Inference Nov. 13th, 2003 40 Regression to estimate Causal Influence • Let V = {X,Y,T}, where - Y : measured outcome - measured regressors: X = {X1, X2, …, Xn} - latent common causes of pairs in X U Y: T = {T1, …, Tk} • Let the true causal model over V be a Structural Equation Model in which each V V is a linear combination of its direct causes and independent, Gaussian noise. Nov. 13th, 2003 41 Regression to estimate Causal Influence • Consider the regression equation: Y = b0 + b1X1 + b2X2 + ..…bnXn • Let the OLS regression estimate bi be the estimated causal influence of Xi on Y. • That is, holding X/Xi experimentally constant, bi is an estimate of the change in E(Y) that results from an intervention that changes Xi by 1 unit. • Let the real Causal Influence Xi Y = i • When is the OLS estimate bi an unbiased estimate of i Nov. 13th, 2003 ? 42 Linear Regression Let the other regressors O = {X1, X2,....,Xi-1, Xi+1,...,Xn} bi = 0 if and only if Xi,Y.O = 0 In a multivariate normal distribuion, Xi,Y.O = 0 if and only if Xi _||_ Y | O Nov. 13th, 2003 43 Linear Regression So in regression: bi = 0 Xi _||_ Y | O But provably : i = 0 S O, Xi _||_ Y | S So S O, Xi _||_ Y | S i = 0 ~ S O, Xi _||_ Y | S don’t know Nov. 13th, 2003 (unless we’re lucky) 44 Regression Example X1 X2 X1 _||_ Y | X2 b1 0 X2 _||_ Y | X1 b2 = 0 ~S {X2} X1 _||_ Y | S Don’t know S {X1} X2 _||_ Y | {X1} 2 = 0 Y True Model Nov. 13th, 2003 45 Regression Example T1 X1 X2 T2 X3 Y True Model X1 _||_ Y | {X2,X3} b1 0 X2 _||_ Y | {X1,X3} b2 0 X3 _||_ Y | {X1,X2} b3 0 ~S {X2,X3}, X1 _||_ Y | S DK S {X1,X3}, X2 _||_ Y | {X1} 2 = 0 ~S {X1,X2}, X3 _||_ Y | S DK Nov. 13th, 2003 46 Regression Example T1 X1 X2 T2 X3 X1 X2 X3 Y Y True Model PAG Nov. 13th, 2003 47 Regression Bias If • Xi is d-separated from Y conditional on X/Xi in the true graph after removing Xi Y, and • X contains no descendant of Y, then: bi is an unbiased estimate of i See “Using Path Diagrams ….” Nov. 13th, 2003 48 Applications • Parenting among Single, • • • • • Black Mothers Pneumonia Photosynthesis Lead - IQ College Retention Corn Exports • • • • • • Rock Classification Spartina Grass College Plans Political Exclusion Satellite Calibration Naval Readiness Nov. 13th, 2003 49 References • Causation, Prediction, and Search, 2nd Edition, (2000), by P. Spirtes, C. Glymour, and R. Scheines ( MIT Press) • Causality: Models, Reasoning, and Inference, (2000), Judea Pearl, Cambridge Univ. Press • Computation, Causation, & Discovery (1999), edited by C. Glymour and G. Cooper, MIT Press • Causality in Crisis?, (1997) V. McKim and S. Turner (eds.), Univ. of Notre Dame Press. • TETRAD IV: www.phil.cmu.edu/projects/tetrad • Web Course on Causal and Statistical Reasoning : www.phil.cmu.edu/projects/csr/ Nov. 13th, 2003 50