Causality in Econometrics (3)
Alessio Moneta
Max Planck Institute of Economics, Jena
26 April 2011
GSBC Lecture, Friedrich-Schiller-Universität Jena

Outline: Graphical Causal Models (Introduction, (In)dependence, Probabilistic Inference); References

Graphical Causal Models: Terminology and Representation of Statistical Dependence

Sources and Motivations
▸ The graphical-models approach to causal inference was mainly developed by:
  • Spirtes, Glymour, Scheines (2000), Causation, Prediction, and Search, 2nd edition.
  • Pearl (2000), Causality: Models, Reasoning, and Inference.
▸ Forerunners:
  • J.S. Mill
  • C. Spearman
  • T. Haavelmo, H. Wold, H. Simon
  • H. Reichenbach, P. Suppes
▸ Ideas:
  • Use of probability + diagrams to represent associations in the data
  • Use of graph theory to represent and analyze causal relations
  • This permits, in particular:
    • addressing the symmetry problem, typical of probabilistic approaches
    • representation of structures where interventions are possible
  • Formalization of the relationship between probabilistic and causal representation
  • Emphasis on inference, agnosticism about causal ontology. But: many points of contact with
    • probabilistic approach (Reichenbach)
    • manipulability theory (Woodward).

Formal preliminaries
▸ Graph: $\langle V, M, E \rangle$
  • set $V$ of vertices (or nodes) to represent variables.
  • set $M$ of marks such as '>', '−' (or EM ≡ empty mark), 'o', to represent directions of causal influences.
  • set $E$ of edges, which are pairs of the form $\{[V_1, M_1], [V_2, M_2]\}$, to represent causal relationships.
[Figure: graph G with edges $V_1 \to V_2$, $V_3 \to V_2$ and $V_1 - V_3$]
$G = \langle \{V_1, V_2, V_3\}, \{EM, >\}, \{\{[V_1, EM], [V_2, >]\}, \{[V_1, EM], [V_3, EM]\}, \{[V_3, EM], [V_2, >]\}\} \rangle$

▸ Undirected graph:
  • graph in which the set of marks $M = \{EM\}$
▸ Directed graph:
  • graph in which the set of marks $M = \{EM, >\}$ and, for each edge in $E$, the marks are always EM, >
▸ Directed edges: $A \to B$ (≡ $\{[A, EM], [B, >]\}$)
  • $A$: parent, $B$: child (descendant).
▸ Path:
  • undirected path: a sequence of vertices $A, \dots, B$ such that for every pair of vertices $X, Y$ adjacent in the sequence there is a connecting edge $\{[X, M_1], [Y, M_2]\}$.
  • directed path: a sequence of vertices $A, \dots, B$ such that for every pair of vertices $X, Y$ adjacent in the sequence there is a connecting edge $\{[X, EM], [Y, >]\}$.
  • acyclic path: a path that contains no vertex more than once; otherwise the path is cyclic.

Example
[Figure: graph with directed edges $V_1 \to V_2$, $V_3 \to V_2$, $V_2 \to V_4$, $V_4 \to V_5$, and an edge joining $V_1$ and $V_3$]
  • Directed paths: $\langle V_1, V_2, V_4, V_5 \rangle$; $\langle V_3, V_2, V_4, V_5 \rangle$; $\langle V_2, V_4, V_5 \rangle$, etc.
  • Undirected paths: $\langle V_1, V_3, V_2, V_4, V_5 \rangle$; $\langle V_1, V_2, V_3 \rangle$, etc.
  • Undirected cyclic path: $\langle V_1, V_2, V_3, V_1 \rangle$
  • No directed cyclic paths.
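The path definitions above can be checked mechanically. The following is a minimal Python sketch, not part of the original lecture: it stores each edge as a pair of (vertex, mark) endpoints, mirroring $\{[V_1, M_1], [V_2, M_2]\}$, and enumerates directed paths by depth-first search. The names `EDGES`, `children`, `directed_paths` and `has_directed_cycle` are illustrative, and the edge between V1 and V3 is assumed to be undirected, as in the graph G defined earlier.

```python
# Marks: EM is the empty mark, ">" is an arrowhead.
EM, HEAD = "", ">"

# Each edge is a pair of (vertex, mark) endpoints, as in {[V1, M1], [V2, M2]}.
# Assumption: V1 and V3 are joined by an undirected edge, as in the graph G.
EDGES = [
    (("V1", EM), ("V2", HEAD)),  # V1 -> V2
    (("V3", EM), ("V2", HEAD)),  # V3 -> V2
    (("V1", EM), ("V3", EM)),    # V1 -- V3
    (("V2", EM), ("V4", HEAD)),  # V2 -> V4
    (("V4", EM), ("V5", HEAD)),  # V4 -> V5
]


def children(vertex):
    """Vertices B such that {[vertex, EM], [B, >]} is an edge."""
    out = []
    for (a, mark_a), (b, mark_b) in EDGES:
        if a == vertex and mark_a == EM and mark_b == HEAD:
            out.append(b)
        elif b == vertex and mark_b == EM and mark_a == HEAD:
            out.append(a)
    return out


def directed_paths(start, end):
    """All acyclic directed paths from start to end (depth-first search)."""
    paths = []

    def walk(path):
        last = path[-1]
        if last == end and len(path) > 1:
            paths.append(path)
            return
        for nxt in children(last):
            if nxt not in path:  # acyclic: no vertex more than once
                walk(path + [nxt])

    walk([start])
    return paths


def has_directed_cycle():
    """True if some vertex reaches itself along directed edges."""
    vertices = {v for edge in EDGES for v, _ in edge}

    def reachable(src):
        seen, stack = set(), list(children(src))
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(children(v))
        return seen

    return any(v in reachable(v) for v in vertices)


print(directed_paths("V1", "V5"))
print(has_directed_cycle())
```

Under the stated assumption, the search recovers the directed path from V1 to V5 listed in the example and finds no directed cyclic path.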
More terminology
▸ Collider: vertex $V$ such that $A \to V \leftarrow B$.
▸ Unshielded collider: vertex $V$ such that $A \to V \leftarrow B$ and $A$ and $B$ are not adjacent (i.e. not connected by an edge) in the graph.
▸ Complete graph: graph in which every pair of vertices is adjacent.
▸ Directed Acyclic Graph (DAG): directed graph that contains no directed cyclic paths.
▸ Directed Cyclic Graph (DCG): directed graph that contains directed cyclic paths.

Graphs and probabilistic dependence
▸ First use of graphs: representation of probabilistic dependence and independence.
▸ Nodes: random variables (discrete or continuous).
▸ Edges: probabilistic dependence.
▸ Bayesian networks (Pearl 1985).

Conditional independence
▸ If $X, Y, Z$ are random variables, we say that $X$ is conditionally independent of $Y$ given $Z$, and write $X \perp\!\!\!\perp Y \mid Z$, if
  • for discrete variables: $P(X = x, Y = y \mid Z = z) = P(X = x \mid Z = z)\, P(Y = y \mid Z = z)$
  • for continuous variables: $f_{XY|Z}(x, y \mid z) = f_{X|Z}(x \mid z)\, f_{Y|Z}(y \mid z)$
  • We can also write (simplifying the notation): $X \perp\!\!\!\perp Y \mid Z \iff f(x, y, z)\, f(z) = f(x, z)\, f(y, z)$
▸ Some equalities:
  • $X \perp\!\!\!\perp Y \mid Z \iff f(x, y \mid z) = f(x \mid z)\, f(y \mid z)$
  • $X \perp\!\!\!\perp Y \mid Z \iff f(x, y, z)\, f(z) = f(x, z)\, f(y, z)$
  • $X \perp\!\!\!\perp Y \mid Z \iff f(x \mid y, z) = f(x \mid z)$
  • $X \perp\!\!\!\perp Y \mid Z \iff f(x, z \mid y) = f(x \mid z)\, f(z \mid y)$
  • $X \perp\!\!\!\perp Y \mid Z \iff f(x, y, z) = f(x \mid z)\, f(y, z)$
  Note: $f(x, y \mid z) = f(x, y, z) / f(z)$
▸ It also holds that:
  • $X \perp\!\!\!\perp Y \mid Z \iff Y \perp\!\!\!\perp X \mid Z$ (symmetry)
  • If $Z$ is empty (trivial), we write $X \perp\!\!\!\perp Y$: $X$ is independent of $Y$.
▸ Other properties:
  • $X \perp\!\!\!\perp YW \mid Z \implies X \perp\!\!\!\perp Y \mid Z$ (decomposition)
  • $X \perp\!\!\!\perp YW \mid Z \implies X \perp\!\!\!\perp Y \mid ZW$ (weak union)
  See Pearl 2000: 11.

Interpretations of C.I.
▸ Useful interpretations of the conditional independence (C.I.) $X \perp\!\!\!\perp Y \mid Z$:
  • once we know $Z$, learning the value of $Y$ does not provide additional information about $X$.
  • once we know $Z$, reading $X$ is irrelevant for reading $Y$.
  • once we observe realizations of $Z$, observing realizations of $Y$ is irrelevant for predicting the frequency of the realizations of $X$.

Independence and uncorrelatedness
▸ It is important to distinguish between (conditional) independence and (conditional or partial) correlation.
▸ Recall:
  • Variance of $X$: $\sigma_X^2 := E[(X - E(X))^2]$
  • Covariance between $X$ and $Y$: $\sigma_{XY} := E[(X - E(X))(Y - E(Y))]$
  • Correlation coefficient (Pearson): $\rho_{XY} := \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$
  • Linear regression coefficient: $r_{XY} := \frac{\sigma_{XY}}{\sigma_Y^2} = \rho_{XY} \frac{\sigma_X}{\sigma_Y}$
  • Partial correlation between $X$ and $Y$ given $Z$: $\rho_{XY.Z} = \frac{\rho_{XY} - \rho_{XZ}\,\rho_{YZ}}{\sqrt{1 - \rho_{XZ}^2}\,\sqrt{1 - \rho_{YZ}^2}}$
  • Conditional independence $X \perp\!\!\!\perp Y \mid Z$: $f_{XY|Z}(x, y \mid z) = f_{X|Z}(x \mid z)\, f_{Y|Z}(y \mid z)$
▸ This suggests that correlation is a measure of linear dependence.
▸ Notice: $\sigma_{XY} = \sigma_{YX}$ and $\rho_{XY} = \rho_{YX}$, but $r_{XY} \neq r_{YX}$.
▸ It holds:
  • $X \perp\!\!\!\perp Y \implies \rho_{XY} = 0$
  • $X \perp\!\!\!\perp Y \mid Z \implies \rho_{XY.Z} = 0$
▸ and (of course):
  • $\rho_{XY} \neq 0 \implies X \not\perp\!\!\!\perp Y$
  • $\rho_{XY.Z} \neq 0 \implies X \not\perp\!\!\!\perp Y \mid Z$
▸ In general:
  • $\rho_{XY} = 0 \;\not\!\!\implies X \perp\!\!\!\perp Y$
  • $\rho_{XY.Z} = 0 \;\not\!\!\implies X \perp\!\!\!\perp Y \mid Z$
▸ However, if the joint distribution $F(X, Y, Z)$ is normal:
  • $\rho_{XY} = 0 \implies X \perp\!\!\!\perp Y$
  • $\rho_{XY.Z} = 0 \implies X \perp\!\!\!\perp Y \mid Z$

Population and sample
▸ Notice also the difference between population parameters and sample statistics:
  • Population: $\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$, $\quad r_{YX} = \frac{\sigma_{XY}}{\sigma_X^2}$
  • Sample: $\hat{\rho}_{XY} = \frac{\sum_{k=1}^{n} (X_k - \bar{X})(Y_k - \bar{Y})}{\sqrt{\sum_{k=1}^{n} (X_k - \bar{X})^2 \, \sum_{k=1}^{n} (Y_k - \bar{Y})^2}}$, $\quad \hat{r}_{YX} = \frac{\sum_{k=1}^{n} (X_k - \bar{X})(Y_k - \bar{Y})}{\sum_{k=1}^{n} (X_k - \bar{X})^2}$
  • $\hat{\beta}^{OLS} = (X'X)^{-1} X'Y$, for vectors of data $X \equiv (X_1, \dots, X_n)'$, $Y \equiv (Y_1, \dots, Y_n)'$, and where $\bar{X} = n^{-1} \sum_i X_i$.
  Notice that when $\bar{X} = 0$ and $\bar{Y} = 0$, $\hat{r}_{YX} = \hat{\beta}^{OLS}$.

Other concepts related to independence
▸ If, given the r.v. $X$ and $Y$, the moments $E(X^k) < \infty$ and $E(Y^m) < \infty$, it turns out that $X \perp\!\!\!\perp Y$ iff $E(X^k Y^m) = E(X^k)\, E(Y^m)$, for all $k, m = 1, 2, \dots$
▸ $X$ and $Y$ are $(k, m)$-order dependent iff $E(X^k Y^m) \neq E(X^k)\, E(Y^m)$, for any $k, m = 1, 2, \dots$
▸ (1-1)-order linear dependence: $E(XY) \neq E(X)\, E(Y)$
▸ (1-1)-order independence: $E(XY) = E(X)\, E(Y) \iff E\{[X - E(X)][Y - E(Y)]\} = 0 \iff \sigma_{XY} = 0 \iff \rho_{XY} = 0$
▸ Orthogonality: $E(XY) = 0$
▸ Note:
  1. if $X$ and $Y$ are uncorrelated ($\rho_{XY} = 0$), this is equivalent to saying that their mean deviations are orthogonal (if $X$ and $Y$ are "centered" by subtracting their means, they become orthogonal).
  2. if $X$ and $Y$ are orthogonal, $\rho_{XY} = 0$ only if $E(X) = 0$ or $E(Y) = 0$.
▸ $r$-th order independence: $E(Y^r \mid X = x) = E(Y^r)$ for all $x \in R_X$
▸ In summary:
  • independence $\implies$ 1st-order independence $\implies$ non-correlation $\iff$ orthogonality of the mean-subtracted variables
  • non-correlation $\;\not\!\!\implies$ independence (there could be non-linear dependencies!)
  (cf. Spanos 1999: 272-279)

Statistical model
▸ Importance of defining a statistical model.
▸ Typical statistical model for a set of $n$ continuous random variables $X$:
  • Probability model: defines a family of density functions $f(x; \theta)$ over the range of values of $X$;
  • Sampling model: $X$ (a $(T \times n)$ matrix of data) is a random sample.
  (cf. Spanos 1999: 33)

The Markov Condition
▸ The Markov condition permits the representation of probabilistic dependence through a DAG. In particular, it imposes a relationship between the Bayesian network (a DAG in which nodes are random variables) and the probabilistic structure.
  • A directed acyclic graph $G$ over $V$ (set of vertices) and a probability distribution $P(V)$ satisfy the Markov condition iff for every $W \in V$, $W \perp\!\!\!\perp V \setminus (\mathrm{Descendants}(W) \cup \mathrm{Parents}(W))$ given $\mathrm{Parents}(W)$. (Spirtes et al. 2000: 11)
  • or, in other words: any vertex (node) is conditionally independent of its nondescendants (except parents), given its parents.

Markov Condition (example)
[Figure: DAG with edges $V_3 \to V_1$, $V_1 \to V_2$, $V_3 \to V_2$, $V_2 \to V_4$, $V_4 \to V_5$]
  • The DAG above and the probability distribution $P(V_1, V_2, V_3, V_4, V_5)$ satisfy the M.C. iff:
    (1) $V_4 \perp\!\!\!\perp \{V_1, V_3\} \mid V_2$
    (2) $V_5 \perp\!\!\!\perp \{V_1, V_2, V_3\} \mid V_4$
  • Notice that many other C.I. relations follow from (1) and (2) by applying symmetry, decomposition, and weak union (see the properties of conditional independence above). For example:
    • $\{V_1, V_3\} \perp\!\!\!\perp V_4 \mid V_2$; $V_1 \perp\!\!\!\perp V_4 \mid V_2$; $V_3 \perp\!\!\!\perp V_4 \mid V_2$; $V_1 \perp\!\!\!\perp V_4 \mid \{V_2, V_3\}$; etc.
    • $\{V_1, V_2, V_3\} \perp\!\!\!\perp V_5 \mid V_4$; $V_5 \perp\!\!\!\perp \{V_1, V_2\} \mid V_4$; etc.

Markov condition (factorization)
▸ The M.C. permits the following factorization:
  • discrete case: $P(V_1, \dots, V_n) = \prod_{i=1}^{n} P(V_i \mid \mathrm{Parents}(V_i))$, where if $\mathrm{Parents}(V_i) = \emptyset$, $P(V_i \mid \mathrm{Parents}(V_i)) = P(V_i)$
  • continuous case: $f(V_1, \dots, V_n) = \prod_{i=1}^{n} f(V_i \mid \mathrm{Parents}(V_i))$, where if $\mathrm{Parents}(V_i) = \emptyset$, $f(V_i \mid \mathrm{Parents}(V_i)) = f(V_i)$
  • For the DAG of the example (edges $V_3 \to V_1$, $V_1 \to V_2$, $V_3 \to V_2$, $V_2 \to V_4$, $V_4 \to V_5$) we have:
    $P(V_1, V_2, V_3, V_4, V_5) = P(V_1 \mid V_3)\, P(V_2 \mid V_1, V_3)\, P(V_3)\, P(V_4 \mid V_2)\, P(V_5 \mid V_4)$
  Recall the chain rule: in general, $P(V_1, \dots, V_n) = P(V_n \mid V_{n-1}, \dots, V_2, V_1) \cdots P(V_2 \mid V_1)\, P(V_1)$

The d-separation criterion
▸ d-separation: a graphical criterion which captures exactly all the C.I. relationships that are implied by the M.C.*
▸ Consider a graph $G$, with distinct nodes $X$, $Y$ and a set of nodes $W$, where neither $X$ nor $Y$ belongs to $W$. We say that $X$ and $Y$ are d-separated given $W$ in $G$ iff there exists no undirected path $U$ between $X$ and $Y$ such that:
  1. every collider $C$ ($\to C \leftarrow$) on $U$ is in $W$ or has a descendant in $W$, and
  2. no other vertex on $U$ is in $W$.
  • if there is such a path, then $X$ and $Y$ are d-connected.
  (cf. Spirtes et al. 2000: 14)
  * Including those derived from the M.C. through symmetry, decomposition and weak union.

The d-separation criterion (Pearl's definition)
▸ d-separation:
▸ Consider a graph $G$, with distinct nodes $X$, $Y$ and a set of nodes $W$, where neither $X$ nor $Y$ belongs to $W$. A path $U$ is said to be d-separated by a set of nodes $W$ iff
  1. $U$ contains a chain ($\to C \to$ or $\leftarrow C \leftarrow$) or a fork ($\leftarrow C \to$) such that the middle node $C \in W$, or
  2. $U$ contains a collider $C$ ($\to C \leftarrow$) such that $C \notin W$ and no descendant of $C$ is in $W$.
  • A set $W$ is said to d-separate $X$ from $Y$ iff every path from $X$ to $Y$ is d-separated by $W$.
  • Otherwise $X$ and $Y$ are d-connected by $W$.
  (cf. Pearl 2000: 16-17)
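To make the criterion concrete, here is a minimal Python sketch (an illustration, not taken from the lecture) that applies the definition literally: it enumerates every acyclic undirected path between two nodes and checks the two conditions on colliders and non-colliders. The DAG is the one implied by the factorization $P(V_1 \mid V_3)\, P(V_2 \mid V_1, V_3)\, P(V_3)\, P(V_4 \mid V_2)\, P(V_5 \mid V_4)$; the names `PARENTS`, `is_active` and `d_separated` are hypothetical, and path enumeration is chosen for readability rather than efficiency.

```python
# Parents of each vertex, read off the factorization
# P(V1|V3) P(V2|V1,V3) P(V3) P(V4|V2) P(V5|V4).
PARENTS = {
    "V1": {"V3"},
    "V2": {"V1", "V3"},
    "V3": set(),
    "V4": {"V2"},
    "V5": {"V4"},
}


def children(v):
    """Vertices that have v as a parent."""
    return {c for c, ps in PARENTS.items() if v in ps}


def descendants(v):
    """Proper descendants of v (v itself is excluded)."""
    seen, stack = set(), list(children(v))
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(children(u))
    return seen


def undirected_paths(x, y):
    """All acyclic paths from x to y, ignoring edge direction."""
    def walk(path):
        if path[-1] == y:
            yield path
            return
        neighbours = PARENTS[path[-1]] | children(path[-1])
        for n in sorted(neighbours):
            if n not in path:
                yield from walk(path + [n])

    yield from walk([x])


def is_active(path, w):
    """True if the path d-connects its endpoints given the set w."""
    for a, c, b in zip(path, path[1:], path[2:]):
        collider = a in PARENTS[c] and b in PARENTS[c]  # a -> c <- b
        if collider:
            # condition 1: collider must be in w or have a descendant in w
            if c not in w and not (descendants(c) & w):
                return False
        elif c in w:
            # condition 2: no non-collider on the path may be in w
            return False
    return True


def d_separated(x, y, w):
    """True iff no undirected path between x and y is active given w.

    Assumes, as in the definition, that neither x nor y belongs to w.
    """
    w = set(w)
    return not any(is_active(p, w) for p in undirected_paths(x, y))


print(d_separated("V4", "V1", {"V2"}))  # relation implied by (1)
print(d_separated("V5", "V2", {"V4"}))  # relation implied by (2)
print(d_separated("V1", "V4", set()))   # V1 -> V2 -> V4 is open
```

The first two calls correspond to independence relations that follow from (1) and (2) in the Markov condition example; the third shows that V1 and V4 are d-connected when nothing is conditioned on.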
Reading List
  • Spirtes, Glymour, Scheines (2000), Causation, Prediction, and Search, MIT Press, 2nd edition:
    • Chapters 1 and 2
  • Pearl (2000), Causality: Models, Reasoning, and Inference, CUP:
    • Sections 1.1 and 1.2
  • Spanos, A. (1999), Probability Theory and Statistical Inference, CUP:
    • Sections 2.2 and 6.4
Further reading:
  • Cooper, G.F. (1999), An Overview of the Representation and Discovery of Causal Relationships Using Bayesian Networks, in C. Glymour, G.F. Cooper, Computation, Causation, and Discovery, MIT Press.
  • Scheines, R. (1997), An Introduction to Causal Inference. www