“Cogito ergo sum” …or do I?: When can Causality be inferred from DPGM
Felipe Orihuela-Espina
Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE)
DyNaMo Research Meeting, 3-4th June 2011

Cogito ergo sum
• A familiar (for the audience) graphical representation: Cogito (cause) → Sum (effect)

                        Sum present    Sum not present
  Cogito present            1.0              0
  Cogito not present        0.5              0.5

Why is causality so problematic?
• Cannot be computed from the data alone
• Systematic temporal precedence is not sufficient
• Co-occurrence is not sufficient
• It is not always a direct relation (indirect relations, transitivity/mediation, etc. may be present), let alone linear…
• It may occur across frequency bands
• YOU NAME IT HERE…
A very silly example: Which process causes which?
Causality is so difficult that “it would be very healthy if more researchers abandoned thinking of and using terms such as cause and effect” [Muthen1987 in PearlJ2011]

A real example
• An ECG
• [KaturaT2006] only claim that there are interrelations (quantified using MI) [OrihuelaEspinaF2010]

THE CONTRIBUTION OF PHILOSOPHY

Causality in Philosophy
• Aristotle’s four “causes” of a thing
– the material cause (that out of which the thing is made),
– the formal cause (that into which the thing is made),
– the efficient cause (that which makes the thing), and
– the final cause (that for which the thing is made).
In [HollandPW1986]
Aristotle (384 BC-322 BC)

Causality in Philosophy
• Hume’s legacy
– Sharp distinction between analytical (thoughts) and empirical (facts) claims
– Causal claims are empirical
– All empirical claims originate from experience (sensory input)
• Hume’s three basic criteria for causation
– (a) spatial/temporal contiguity,
– (b) temporal succession, and
– (c) constant conjunction
• It is not empirically verifiable that the cause produces the effect, but only that the cause is invariably followed by the effect.
[HollandPW1986, PearlJ1999_IJCAITalk]
David Hume (1711-1776)

Causality in Philosophy
• Mill’s general methods of experimental enquiry
– Method of concomitant variation (i.e. correlation…)
– Method of difference (i.e. causation)
– Method of residues (i.e. induction)
– Method of agreement (i.e. null effect; can only rule out possible causes)
• Mill “only” codified these methods; they had been put forth by Sir Francis Bacon 250 years earlier (The Advancement of Learning and Novum Organum Scientiarum)
In [HollandPW1986]
John Stuart Mill (1806-1873), Sir Francis Bacon (1561-1626)

Causality in Philosophy
• Suppes’s probabilistic theory of causality
– “… one event is the cause of another if the appearance of the first is followed with a high probability by the appearance of the second, and there is no third event that we can use to factor out the probability relationship between the first and second events”
– C is a genuine cause of E if:
  • P(E|C) > P(E) (prima facie), and
  • not (P(E|C,D) = P(E|D) and P(E|C,D) >= P(E|C)) (spurious cause)
[SuppesP1970, HollandPW1986]
Patrick Colonel Suppes (1922-), Lucie Stern Emeritus Professor of Philosophy at Stanford

CAUSALITY: DIFFERENT VIEWS, SAME CONCEPT

Causality requires time!
• “…there is little use in the practice of attempting to discuss causality without introducing time” [Granger,1969]
– …whether philosophical, statistical, econometrical, topological, etc…

Causality requires directionality!
• Algebraic equations, e.g. regression, “do not properly express causal relationships […] because algebraic equations are symmetrical objects […] To express the directionality of the underlying process, Wright augmented the equation with a diagram, later called path diagram in which arrows are drawn from causes to effects” [PearlJ2009]
– Feedback and instantaneous causality are in any case a double causation.

From association to causation
• Barriers between classical statistics and causal analysis [PearlJ2009]
1. Coping with untested assumptions and changing conditions
2. Inappropriate mathematical notation

Causality (slide arrow labels: “Stronger”, “Weaker”)
• Zero-level causality: a statistical association, i.e. non-independence, which cannot be removed by conditioning on allowable alternative features.
– i.e. Granger’s, Topological
• First-level causality: Use of one treatment over another causes a change in outcome
– i.e. Rubin’s, Pearl’s
• Second-level causality: Explanation via a generating process, provisional and hardly lending itself to formal characterization, either merely hypothesized or solidly based on evidence
– i.e. Suppes’s, Wright’s path analysis
– e.g. 
Smoking causes lung cancer
Inspired from [CoxDR2004]
It is debatable whether second-level causality is indeed causality

Variable types and their joint probability distribution
• Variable types:
– Background variables (B): specify what is fixed
– Potential causal variables (C)
– Intermediate variables (I): surrogates, monitoring, pathways, etc.
– Response variables (R): observed effects
• Joint probability distribution of the variables:
  P(RICB) = P(R|ICB) P(I|CB) P(C|B) P(B)
  …but it is possible to integrate over I (i.e. marginalize it out):
  P(RCB) = P(R|CB) P(C|B) P(B)
In [CoxDR2004]

Granger’s Causality
• Granger’s causality:
– Y is causing X (Y→X) if we are better able to predict X using all available information (Z) than if the information apart from Y had been used.
• The groundbreaking paper:
– Granger “Investigating causal relations by econometric models and cross-spectral methods” Econometrica 37(3): 424-438
• Granger’s causality is only a statement about one thing happening before another!
• Rejects instantaneous causality: it is considered as slowness in the recording of information
Sir Clive William John Granger (1934-2009)
– University of Nottingham
– Nobel Prize Winner

Granger’s Causality
• “The future cannot cause the past” [Granger 1969]
– “the direction of the flow of time [is] a central feature”
– Feedback is a double causation; X→Y and Y→X, denoted X↔Y
• “causality…is based entirely on the predictability of some series…” [Granger 1969]
– Causal relationships may be investigated in terms of coherence and phase diagrams

Topological causality
• “A causal manifold is one with an assignment to each of its points of a convex cone in the tangent space, representing physically the future directions at the point. 
The usual causality in M0 extends to a causal structure in M’.” [SegalIE1981]
• Causality is seen as embedded in the geometry/topology of manifolds
– Causality is a curve function defined over the manifold
• The groundbreaking book:
– Segal IE “Mathematical Cosmology and Extragalactic Astronomy” (1976)
• I am not sure whether Segal is the father of causal manifolds, but his contribution to the field is simply overwhelming…
Irving Ezra Segal (1918-1998), Professor of Mathematics at MIT

Causal (homogeneous Lorentzian) Manifolds: The topological view of causality
• The cone of causality [SegalIE1981, RainerM1999, MosleySN1990, KrymVR2002]
[Figure: cone of causality with regions labelled Future, Instant present and Past]

Rubin Causal Model
• Rubin Causal Model:
– “Intuitively, the causal effect of one treatment relative to another for a particular experimental unit is the difference between the result if the unit had been exposed to the first treatment and the result if, instead, the unit had been exposed to the second treatment”
• The groundbreaking paper:
– Rubin “Bayesian inference for causal effects: The role of randomization” The Annals of Statistics 6(1): 34-58
• The term Rubin causal model was coined by Paul Holland
Donald B Rubin (1943-), John L. Loeb Professor of Stats at Harvard

Rubin Causal Model
• Causality is an algebraic difference: treatment causes the effect Ytreatment(u) - Ycontrol(u)
  …or in other words, the effect of a cause is always relative to another cause [HollandPW1986]
• The Rubin causal model establishes the conditions under which associational (e.g. Bayesian) inference may infer causality (it makes the assumptions for causality explicit).

Fundamental Problem of Causal Inference
• Only Ytreatment(u) or Ycontrol(u) can be observed on a given unit u, but not both.
– Causal inference is impossible without making untested assumptions
– …yet causal inference is still possible under uncertainty [HollandPW1986] (two otherwise identical populations u must be prepared and all appropriate background variables must be considered in B).
• Again! (see the slide “Statistical dependence vs Causality”): Causal questions cannot be computed from the data alone, nor from the distributions that govern the data [PearlJ2009]

Relation between Granger, Rubin and Suppes causalities

                                      Granger    Rubin’s model
  Cause (Treatment)                   Y          t
  Effect                              X          Ytreatment(u)
  All other available information     Z          Z (pre-exposure variables)

• Granger’s noncausality: X is not a Granger cause of Y (relative to the information in Z) when X and Y are conditionally independent given Z (i.e. P(Y|X,Z)=P(Y|Z))
• Granger’s noncausality is equal to Suppes’s spurious case
Modified from [HollandPW1986]

Pearl’s statistical causality (a.k.a. structural theory)
• “Causation is encoding behaviour under intervention […] Causality tells us which mechanism [stable functional relationship] is to be modified [i.e. 
broken] by a given action” [PearlJ1999_IJCAI]
• Causality, intervention and mechanisms can be encapsulated in a causal model
• The groundbreaking book:
– Pearl J “Causality: Models, Reasoning and Inference” (2000)*
• Pearl’s results do establish conditions under which first-level causal conclusions are possible [CoxDR2004]
* With permission of his 1995 Biometrika paper masterpiece
Judea Pearl (1936-), Professor of computer science and statistics at UCLA
Sewall Green Wright (1889-1988), father of path analysis (graphical rules)

Statistical causality [PearlJ2000, Lauritzen2000, DawidAP2002]
• Conditioning vs Intervening [PearlJ2000]
– Conditioning: P(R|C) = Σ_B P(R|CB) P(B|C); useful but inappropriate for causality, as changes in the past (B) occur before the intervention (C)
– Intervention: P(R║C) = Σ_B P(R|CB) P(B); Pearl’s definition of causality
• Underlying assumption: The distribution of R (and I) remains unaffected by the intervention.
– Watch out! This is not trivial: serious interventions may distort all relations [CoxDR2004]
• βCB = 0 ⇒ C ╨ B ⇒ P(R|C) = P(R║C), i.e. 
there is no difference between conditioning and intervention
(On the slide, βCB is labelled “Structural coefficient” and C ╨ B is labelled “Conditional independence”.)

LOOKING FOR CAUSALITY: DYNAMIC PROBABILISTIC CAUSAL MODELS AND SOME OTHER ANALYTICAL TOOLS

Some tools for looking at causality… beyond the interest of this research meeting
• Structural Causal Models* and Path Analysis [WrightS1921,1932, PearlJ2009]
– Structural Equation Modelling [WrightS1921, PearlJ2011]
• Directed Transfer Function [Kaminski 1991, 2001 and 2005]
• Dynamic Causal Modelling [FristonKJ2003]
• Partial Directed Coherence [BaccaláLA2001]
* Well… this one is of interest… as it is the father of probabilistic dynamic models

Bayesian Networks
• Bayesian networks are structures (often in the form of a graph) describing probabilistic relationships between variables [PearlJ2000, KaminskiM2005]
– Conditional independencies are represented by missing edges
– Arrows suggest causal directionality but merely indicate the possibility of a causal relation (i.e. they are only a notational clue); the implication of causality must be discarded as inadequate [PearlJ2009]
• Conditional distributions, e.g. P(X|Y), determine associational distributions [HollandPW1986]

Causal Bayesian Networks
• The problem of Identification:
– Can the controlled (post-intervention) distribution P(R║C) be estimated from data governed by the pre-intervention distribution P(RCB)?
– The answer is a “yes, but…”
  • i.e. as long as we account for general control of confounding and counterfactuals, admissibility, Markovian graphs (i.e. acyclic graphs), ignorability, and a few other criteria beyond my humble human limitation… seasoned with a good dose of inscrutable maths.
• Some “recommended” reading if you are up to the challenge: [PearlJ2000, 2009, Lauritzen2000, DawidAP2002]

Dynamic Graphical Models
• Tian’s theorem:
– “A sufficient condition for identifying a causal effect P(R║C) is that every path between C and any of its children traces at least one arrow emanating from a measured variable I”
– Translation to plain English: You ought to account for confounders (which are also part of your graph) and causal relations must cross through those confounders (i.e. they have been taken into account)
• Note that Tian’s theorem is sufficient but not necessary, i.e. direct links C→R may still encode direct causality
– More translation to plain English: P(R║C) cannot encode questions of attribution (e.g. how many deaths are due to a specific exposure?) or of susceptibility (e.g. how many would have become diseased if exposed?)
• Note the important implication that a thoroughly/carefully designed randomized controlled trial may not suffice!

Dynamic Graphical Models: A common error when using them…
• Correct methodology of the structural approach to causation [PearlJ2009]:
1. Define the target quantity
2. Assume: Formulate causal assumptions
3. Identify: Determine if the target is identifiable
4. Estimate: i.e. approximate
• Common application of the methodology of the structural approach to causation:
1. Estimate: i.e. approximate
2. Assume: Formulate causal assumptions
3. Sometimes Define the target quantity

Conclusions
Cogito → Sum ?
• Well… only if you can prove that no other factor intervenes…

Questions? THANKS!
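
A worked illustration of the conditioning and intervention expressions from the “Statistical causality” slide may help here. The sketch below is not from the talk: it assumes binary variables B, C and R, makes the sum over B explicit, and uses made-up probability tables in which C has no effect on R at all. All names (p_b, p_c_given_b, p_r_given_cb, conditioning, intervening) are hypothetical.

```python
# Toy model (hypothetical numbers): B is a background variable that drives
# both the potential cause C and the response R. C itself does nothing to R.
p_b = {0: 0.5, 1: 0.5}            # P(B=b)
p_c_given_b = {0: 0.2, 1: 0.8}    # P(C=1 | B=b)
p_r_given_cb = {                  # P(R=1 | C=c, B=b), keyed by (c, b)
    (0, 0): 0.1, (0, 1): 0.5,
    (1, 0): 0.1, (1, 1): 0.5,
}


def p_c_b(c, b):
    """P(C=c | B=b) for binary C."""
    return p_c_given_b[b] if c == 1 else 1.0 - p_c_given_b[b]


def conditioning(c):
    """P(R=1 | C=c) = sum_b P(R=1 | c, b) P(b | c)  (observing C)."""
    p_c = sum(p_c_b(c, b) * p_b[b] for b in p_b)
    return sum(p_r_given_cb[(c, b)] * p_c_b(c, b) * p_b[b] / p_c for b in p_b)


def intervening(c):
    """P(R=1 || C=c) = sum_b P(R=1 | c, b) P(b)  (setting C)."""
    return sum(p_r_given_cb[(c, b)] * p_b[b] for b in p_b)


assoc_diff = conditioning(1) - conditioning(0)   # difference seen in observational data
causal_diff = intervening(1) - intervening(0)    # difference produced by setting C
print(f"conditioning: {conditioning(1):.2f} vs {conditioning(0):.2f}")
print(f"intervening:  {intervening(1):.2f} vs {intervening(0):.2f}")
```

In this toy model the observational contrast is non-zero although setting C changes nothing, because B is weighted by P(B|C) when conditioning and by P(B) when intervening. Making P(C=1|B) the same for both values of B removes the gap, which is the C ╨ B case on the slide.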
BACK UP SLIDES

Structural Causal Models and Path Analysis
• [WrightS1921, 1932, GoldbergerA1972, 1973, DuncanO1975, PearlJ2009]

Structural Equation Modelling
• “a huge logical gap exists between “establishing causation,” which requires careful manipulative experiments, and “interpreting parameters as causal effects”” [PearlJ2011]

Directed Transfer Function [KaminskiM 1991, 2001 and 2005]
• Uses coherence and phase
• Can be interpreted in terms of Granger’s causality [KaminskiM2001]
[Figure from [KaminskiM2001]: Coherence and Phase panels]

Dynamic Causal Modelling
• A bilinear model by which the neural model (not observed) is inferred from the haemodynamic model (observed) [FristonKJ2003]
• Embodies requisite constraints using a Bayesian framework
Fig. 1. This is a schematic illustrating the concepts underlying dynamic causal modelling. In particular it highlights the two distinct ways in which inputs or perturbations can elicit responses in the regions or nodes that compose the model. In this example there are five nodes, including visual areas V1 and V4 in the fusiform gyrus, areas 39 and 37, and the superior temporal gyrus STG. Stimulus-bound perturbations designated u1 act as extrinsic inputs to the primary visual area V1. Stimulus-free or contextual inputs u2 mediate their effects by modulating the coupling between V4 and BA39 and between BA37 and V4. For example, the responses in the angular gyrus (BA39) are caused by inputs to V1 that are transformed by V4, where the influences exerted by V4 are sensitive to the second input. 
The dark square boxes represent the components of the DCM that transform the state variables zi in each region (neuronal activity) into a measured (hemodynamic) response yi [FristonKJ2003]

Partial Directed Coherence [BaccaláLA2001]
• Based on Granger’s causality

WHAT CAUSALITY IS NOT, AND OTHER COMMON MISCONCEPTIONS

Statistical dependence
• Statistical dependence is a type of relation between any two variables [WermuthN1998]: if we find one, we can expect to find the other
[Diagram: Statistical independence / Association (symmetric or asymmetric) / Deterministic dependence]
• The limits of statistical dependence
– Statistical independence: The distribution of one variable is the same no matter at which level changes occur in the other variable. X and Y are independent ⇔ P(X∩Y)=P(X)P(Y)
– Deterministic dependence: Levels of one variable occur in an exactly determined way with changing levels of the other.
– Association: Intermediate forms of statistical dependency
  • Symmetric
  • Asymmetric (a.k.a. response) or directed association

Associational Inference ≡ Descriptive Statistics!!!
• The most detailed information linking two variables is given by the joint distribution: P(X=x,Y=y)
• The conditional distribution describes how the values of X change as Y varies: P(X=x|Y=y)=P(X=x,Y=y)/P(Y=y)
• Associational statistics is simply descriptive (estimates, regressions, posterior distributions, etc…) [HollandPW1986]
– Example: Regression of X on Y is the conditional expectation E(X|Y=y)

Regression and Correlation; two common forms of associational inference
• Regression Analysis: “the study of the dependence of one or more response variables on explanatory variables” [CoxDR2004]
– Strong regression ≠ causality [Box1966]
– Prediction systems ≠ Causal systems [CoxDR2004]
• Correlation is a relation over mean values; two variables correlate as they move over/under their mean together (correlation is a “normalization” of the covariance)
• Correlation ≠ Statistical dependence
– If X and Y are statistically independent then r=0 (i.e. absence of correlation), but the converse is not true [MarrelecG2005].
• Correlation ≠ Causation [YuleU1900 in CoxDR2004, WrightS1921]
– Yet, causal conclusions from a carefully designed (often a synonym of randomized) experiment are often (not always) valid [HollandPW1986, FisherRA1926 in CoxDR2004]

Coherence: yet another common form of associational inference
• Often understood as “correlation in the frequency domain”: Cxy = |Gxy|^2/(Gxx Gyy)
– where Gxy is the cross-spectral density and Gxx, Gyy are the auto-spectral densities of the two signals,
– i.e. coherence is the squared magnitude of the cross-spectrum normalized by the two auto-spectra, a frequency-by-frequency analogue of the squared correlation coefficient.
• Coherence measures the degree to which two series are related
– Coherence alone does not imply causality! The temporal lag of the phase difference between the signals must also be considered.
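
To make the coherence formula concrete, here is a minimal sketch that estimates Cxy = |Gxy|^2/(Gxx Gyy) and the phase of Gxy at a single frequency bin. It is not the method of any of the cited papers: it assumes a simulated white-noise signal x, a noisy copy y delayed by two samples, and plain averaging over non-overlapping, unwindowed segments. The function names, the segment length, the bin and the noise level are all illustrative choices.

```python
import cmath
import math
import random


def dft_bin(segment, k):
    """Return the k-th DFT coefficient of a real-valued segment."""
    n = len(segment)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(segment))


def cross_spectra(x, y, seg_len, k):
    """Average auto- and cross-spectra at bin k over non-overlapping segments."""
    gxx = gyy = 0.0
    gxy = 0j
    n_seg = len(x) // seg_len
    for s in range(n_seg):
        fx = dft_bin(x[s * seg_len:(s + 1) * seg_len], k)
        fy = dft_bin(y[s * seg_len:(s + 1) * seg_len], k)
        gxx += abs(fx) ** 2
        gyy += abs(fy) ** 2
        gxy += fx.conjugate() * fy
    return gxx / n_seg, gyy / n_seg, gxy / n_seg


def coherence_and_phase(x, y, seg_len=32, k=1):
    """Return (Cxy, phase of Gxy) at frequency bin k."""
    gxx, gyy, gxy = cross_spectra(x, y, seg_len, k)
    return abs(gxy) ** 2 / (gxx * gyy), cmath.phase(gxy)


random.seed(0)
n, delay = 2048, 2
x = [random.gauss(0, 1) for _ in range(n)]
# y is x delayed by `delay` samples plus independent noise (hypothetical setup).
y = [(x[i - delay] if i >= delay else 0.0) + 0.5 * random.gauss(0, 1) for i in range(n)]

coh_xy, phase_xy = coherence_and_phase(x, y)
coh_yx, phase_yx = coherence_and_phase(y, x)
print(f"coherence x,y = {coh_xy:.3f}   coherence y,x = {coh_yx:.3f}")
print(f"phase x,y = {phase_xy:.3f} rad   phase y,x = {phase_yx:.3f} rad")
```

Swapping the two signals leaves the coherence unchanged and only flips the sign of the phase, so the coherence value by itself carries no directional information. Even the phase sign only reflects a lead or lag, which is a statement about timing rather than about cause.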

Statistical dependence vs Causality
• Statistical dependence provides associational relations and can be expressed in terms of a joint distribution alone
– Causal relations CANNOT be expressed in terms of statistical association alone [PearlJ2009]
• Associational inference ≠ Causal Inference [HollandPW1986, PearlJ2009]
– …ergo, Statistical dependence ≠ Causal Inference
– In associational inference, time is merely operational

Causation defies (1st level) logic…
• Input:
– “If the floor is wet, then it rained”
– “If we break this bottle, the floor will get wet”
• Logic output:
– “If we break this bottle, then it rained”
Example taken from [PearlJ1999]
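
The claim on the “Statistical dependence vs Causality” slide, that causal relations cannot be expressed in terms of the joint distribution alone, can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not taken from the talk: two hypothetical linear Gaussian models, one with X→Y and one with Y→X, are tuned (coefficient 0.5, noise variance 0.75) so that both yield unit variances and correlation 0.5, and X is then set to 2 by intervention in each.

```python
import math
import random

NOISE_SD = math.sqrt(0.75)  # chosen so that both variables have unit variance


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(vx * vy)


def sample_x_causes_y(rng, n, do_x=None):
    """Model A: X -> Y. If do_x is given, X is set by intervention."""
    xs = [do_x if do_x is not None else rng.gauss(0, 1) for _ in range(n)]
    ys = [0.5 * x + rng.gauss(0, NOISE_SD) for x in xs]
    return xs, ys


def sample_y_causes_x(rng, n, do_x=None):
    """Model B: Y -> X. Setting X by intervention leaves Y untouched."""
    ys = [rng.gauss(0, 1) for _ in range(n)]
    xs = [do_x if do_x is not None else 0.5 * y + rng.gauss(0, NOISE_SD) for y in ys]
    return xs, ys


rng = random.Random(1)
n = 20000

# Observational regime: the two models are indistinguishable by association.
corr_a = pearson(*sample_x_causes_y(rng, n))
corr_b = pearson(*sample_y_causes_x(rng, n))

# Interventional regime: set X = 2 and look at the mean of Y.
_, ys_a = sample_x_causes_y(rng, n, do_x=2.0)
_, ys_b = sample_y_causes_x(rng, n, do_x=2.0)
mean_y_do_a = sum(ys_a) / n
mean_y_do_b = sum(ys_b) / n

print(f"observed correlation: model A {corr_a:.2f}, model B {corr_b:.2f}")
print(f"mean of Y after setting X=2: model A {mean_y_do_a:.2f}, model B {mean_y_do_b:.2f}")
```

Both models produce essentially the same observed association, yet they disagree about what happens to Y when X is set by hand. Which answer is right depends on the causal assumptions, not on the observed data.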