Causality in Econometrics (3)
Alessio Moneta
Max Planck Institute of Economics
Jena
[email protected]
26 April 2011
GSBC Lecture
Friedrich-Schiller-Universität Jena
Graphical Causal Models: Terminology and Representation of Statistical Dependence
Sources and Motivations
B The graphical-models approach to causal inference was mainly
developed by:
• Spirtes, Glymour, Scheines (2000), Causation, Prediction, and Search,
2nd edition.
• Pearl (2000), Causality: Models, Reasoning, and Inference.
B Forerunners:
• J.S. Mill
• C. Spearman
• T. Haavelmo, H. Wold, H. Simon
• H. Reichenbach, P. Suppes
Sources and Motivations
B Ideas:
• Use of probability + diagrams to represent associations in the data
• Use of graph-theory to represent and analyze causal relations
• This permits, in particular:
• addressing the symmetry problem, typical of probabilistic approaches
• representation of structures where interventions are possible
• Formalization of the relationship between probabilistic and causal
representation
• Emphasis on inference and agnosticism about causal ontology; but many points of contact with
• probabilistic approach (Reichenbach)
• manipulability theory (Woodward).
Formal preliminaries
B Graph: < V, M, E >
• set V of vertices (or nodes) to represent variables.
• set M of marks, such as ‘>’, ‘−’ (≡ EM, the empty mark), ‘o’, to represent
directions of causal influences.
• set E of edges, which are pairs of the form {[V1 , M1 ], [V2 , M2 ]}, to
represent causal relationships.
[Figure: graph G with edges V1 −→ V2, V1 — V3 (undirected), and V3 −→ V2]
G: < {V1, V2, V3}, {EM, >}, {{[V1, EM], [V2, >]}, {[V1, EM], [V3, EM]}, {[V3, EM], [V2, >]}} >
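To make the < V, M, E > triple concrete, here is a minimal sketch (not from the slides; the data layout and names are illustrative) encoding the graph G above in Python, with each edge stored as a pair of (vertex, mark) endpoints.

# Sketch: the graph G above as a <V, M, E> triple in plain Python.
# Each edge is a pair of (vertex, mark) endpoints; "EM" is the empty mark.
V = {"V1", "V2", "V3"}
M = {"EM", ">"}
E = {
    (("V1", "EM"), ("V2", ">")),   # V1 --> V2
    (("V1", "EM"), ("V3", "EM")),  # V1 --- V3 (undirected edge)
    (("V3", "EM"), ("V2", ">")),   # V3 --> V2
}

# e.g. list the parents of V2 (tails of directed edges pointing into V2)
parents_of_v2 = [a for (a, ma), (b, mb) in E if b == "V2" and mb == ">" and ma == "EM"]
print(sorted(parents_of_v2))  # ['V1', 'V3']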
Formal preliminaries
B Undirected graph:
• graph in which the set of marks M = {EM}
B Directed graph:
• graph in which the set of marks M = {EM, >} and for each edge in
E the marks are always EM and >
B Directed edges: A −→ B (≡ {[A, EM], [B, >]})
• A : parent, B : child (descendant).
Formal preliminaries
B Path:
• undirected path: a sequence of vertices A, . . . , B such that for every
pair of vertices X, Y adjacent (in the sequence) there is a connecting
edge {[X, M1], [Y, M2]}.
• directed path: a sequence of vertices A, . . . , B such that for every
pair of vertices X, Y adjacent (in the sequence) there is a connecting
edge {[X, EM], [Y, >]}.
• acyclic path: path that contains no vertex more than once, otherwise
it is cyclic.
Example
[Figure: graph with edges V1 −→ V2, V3 −→ V2, V2 −→ V4, V4 −→ V5, and an edge between V1 and V3]
• Directed paths: < V1 , V2 , V4 , V5 >; < V3 , V2 , V4 , V5 >;
< V2 , V4 , V5 >, etc.
• Undirected paths: < V1 , V3 , V2 , V4 , V5 >; < V1 , V2 , V3 >, etc.
• Undirected cyclic path: < V1 , V2 , V3 , V1 >
• No directed cyclic paths.
More terminology
B Collider: vertex V such that A −→ V ←− B
B Unshielded collider: vertex V such that A −→ V ←− B and A
and B are not adjacent (≡ connected by edge) in the graph
B Complete graph: graph in which every pair of vertices is
adjacent
B Directed Acyclic Graph (DAG): directed graph that contains no
directed cyclic paths
B Directed Cyclic Graph (DCG): directed graph that contains
directed cyclic paths
Graphs and probabilistic dependence
B First use of graphs: representation of probabilistic dependence
and independence
B Nodes: random variables (discrete or continuous).
B Edges: probabilistic dependence.
B Bayesian networks (Pearl 1985).
Conditional Independence
B If X, Y, Z are random variables, we say that X is conditionally
independent of Y given Z, and write
X ⊥⊥ Y | Z     (1)
if
• for discrete variables: P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)
• for continuous variables: fXY|Z(x, y | z) = fX|Z(x | z) fY|Z(y | z)
• We can also write (simplifying the notation): X ⊥⊥ Y | Z ⇐⇒ f(x, y, z) f(z) = f(x, z) f(y, z)
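As a numerical illustration of the discrete condition above, the following sketch (not part of the lecture; the simulated chain X → Z → Y and all parameters are made up, and numpy is assumed available) checks empirically that P(x, y | z) ≈ P(x | z) P(y | z) when X ⊥⊥ Y | Z holds by construction.

# Sketch: empirical check of P(x,y|z) = P(x|z) P(y|z) on a simulated chain
# X -> Z -> Y, in which X is conditionally independent of Y given Z.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.integers(0, 2, n)                      # X ~ Bernoulli(0.5)
z = (x + (rng.random(n) < 0.3)) % 2            # Z = X flipped with prob. 0.3
y = (z + (rng.random(n) < 0.2)) % 2            # Y = Z flipped with prob. 0.2

max_gap = 0.0
for zv in (0, 1):
    sel = z == zv
    for xv in (0, 1):
        for yv in (0, 1):
            p_xy = np.mean((x[sel] == xv) & (y[sel] == yv))  # P(x, y | z)
            p_x, p_y = np.mean(x[sel] == xv), np.mean(y[sel] == yv)
            max_gap = max(max_gap, abs(p_xy - p_x * p_y))

print(max_gap)   # close to 0: the factorization holds (up to sampling error)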
Conditional independence
B Some equalities:
• X ⊥⊥ Y | Z ⇐⇒ f(x, y | z) = f(x | z) f(y | z)
• X ⊥⊥ Y | Z ⇐⇒ f(x, y, z) f(z) = f(x, z) f(y, z)
• X ⊥⊥ Y | Z ⇐⇒ f(x | y, z) = f(x | z)
• X ⊥⊥ Y | Z ⇐⇒ f(x, z | y) = f(x | z) f(z | y)
• X ⊥⊥ Y | Z ⇐⇒ f(x, y, z) = f(x | z) f(y, z)
Note: f(x, y | z) = f(x, y, z)/f(z)
Conditional independence
B It also holds:
• X ⊥⊥ Y | Z ⇐⇒ Y ⊥⊥ X | Z (symmetry)
• If Z is empty (trivial), X ⊥⊥ Y: X is independent of Y.
B Other properties:
• X ⊥⊥ YW | Z =⇒ X ⊥⊥ Y | Z (decomposition)
• X ⊥⊥ YW | Z =⇒ X ⊥⊥ Y | ZW (weak union)
See Pearl 2000: 11
Interpretations of C.I.
B Useful interpretations of the C.I. relation X ⊥⊥ Y | Z:
• once we know Z, learning the value of Y provides no additional information about X;
• once we know Z, reading X is irrelevant for reading Y;
• once we observe realizations of Z, observing realizations of Y is
irrelevant for predicting the relative frequencies of the realizations of X.
Independence and uncorrelatedness
B Important to distinguish between (conditional) independence
and (conditional or partial) correlation.
• Recall:
B Variance of X: σX² := E[(X − E(X))²]
B Covariance between X and Y: σXY := E[(X − E(X))(Y − E(Y))]
B Correlation coefficient (Pearson): ρXY := σXY / (σX σY)
B Linear regression coefficient: rXY := σXY / σY² = ρXY σX / σY
B This suggests that correlation is a measure of linear dependence
B Notice: σXY = σYX and ρXY = ρYX, but rXY ≠ rYX (see the sketch below)
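A small sketch (not from the slides; the simulated data, seed, and numpy usage are my own) illustrating the last point: the sample correlation is symmetric in X and Y, while the two regression coefficients differ whenever σX ≠ σY.

# Sketch: correlation is symmetric, the regression coefficient is not.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, 100_000)               # sd(X) = 2
y = 0.5 * x + rng.normal(0.0, 1.0, 100_000)     # linear dependence plus noise

cov_xy = np.cov(x, y, ddof=0)[0, 1]
print(np.corrcoef(x, y)[0, 1], np.corrcoef(y, x)[0, 1])  # equal: rho_XY = rho_YX
print(cov_xy / np.var(y), cov_xy / np.var(x))            # r_XY != r_YX in general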
Independence and uncorrelatedness
• Recall:
B Partial correlation between X and Y given Z:
ρXY.Z = (ρXY − ρXZ ρYZ) / √[(1 − ρXZ²)(1 − ρYZ²)]
B Conditional independence X ⊥⊥ Y | Z:
fXY|Z(x, y | z) = fX|Z(x | z) fY|Z(y | z)
B It holds:
• X ⊥⊥ Y =⇒ ρXY = 0
• X ⊥⊥ Y | Z =⇒ ρXY.Z = 0
B and (of course):
• ρXY ≠ 0 =⇒ X and Y are not independent
• ρXY.Z ≠ 0 =⇒ X and Y are not conditionally independent given Z
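The partial-correlation formula above is easy to check numerically. The sketch below (not from the slides; the Gaussian common-cause model Z → X, Z → Y and all coefficients are illustrative) computes ρ̂XY.Z from the sample correlations and finds it close to zero, as expected when X ⊥⊥ Y | Z.

# Sketch: partial correlation of X and Y given Z for a common-cause model
# Z -> X, Z -> Y, in which X _||_ Y | Z, so rho_XY.Z should be near zero.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = -0.5 * z + rng.normal(size=n)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

r_xy, r_xz, r_yz = corr(x, y), corr(x, z), corr(y, z)
r_xy_z = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

print(r_xy)     # clearly nonzero: X and Y are marginally correlated
print(r_xy_z)   # close to zero: conditioning on Z removes the association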
Independence and uncorrelatedness
B In general:
• ρXY = 0 does not imply X ⊥⊥ Y
• ρXY.Z = 0 does not imply X ⊥⊥ Y | Z
B However, if the joint distribution F(X, Y, Z) is normal:
• ρXY = 0 =⇒ X ⊥⊥ Y
• ρXY.Z = 0 =⇒ X ⊥⊥ Y | Z
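A standard counterexample for the non-Gaussian case (not from the slides; the construction Y = X² is a textbook device): X and Y are uncorrelated yet clearly dependent.

# Sketch: zero correlation without independence (Y is a function of X).
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)
y = x**2                                  # deterministic function of X

print(np.corrcoef(x, y)[0, 1])            # near 0: E(XY) = E(X^3) = 0
print(np.corrcoef(np.abs(x), y)[0, 1])    # strongly positive: dependence shows up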
Population and sample
B Notice also the difference between population parameters and
sample statistics:
ρXY = σXY / (σX σY)          ρ̂XY = Σ_{k=1}^n (Xk − X̄)(Yk − Ȳ) / √[ Σ_{k=1}^n (Xk − X̄)² · Σ_{k=1}^n (Yk − Ȳ)² ]
rYX = σXY / σX²              r̂YX = Σ_{k=1}^n (Xk − X̄)(Yk − Ȳ) / Σ_{k=1}^n (Xk − X̄)²
β̂OLS = (X′X)⁻¹ X′Y,
for vectors of data X ≡ (X1, . . . , Xn)′, Y ≡ (Y1, . . . , Yn)′ and where X̄ = n⁻¹ Σ Xi.
Notice that when X̄ = 0 and Ȳ = 0, r̂YX = β̂OLS (see the sketch below).
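A quick sketch of the last remark (not from the slides; the data and seed are arbitrary): after subtracting the sample means, the sample regression coefficient r̂YX coincides with the interceptless OLS estimate (X′X)⁻¹X′Y.

# Sketch: with centered data the sample regression coefficient equals the OLS slope.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(2.0, 1.5, 1_000)
y = 1.2 * x + rng.normal(0.0, 1.0, 1_000)

xc, yc = x - x.mean(), y - y.mean()             # centered data
r_hat_yx = np.sum(xc * yc) / np.sum(xc**2)      # sample regression coefficient
beta_ols = (xc @ yc) / (xc @ xc)                # (X'X)^{-1} X'Y for centered vectors

print(r_hat_yx, beta_ols)                       # identical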
Other concepts related to independence
B If, given the r.v. X and Y, the moments E(X^k) < ∞ and E(Y^m) < ∞, it turns out
that X ⊥⊥ Y iff
E(X^k Y^m) = E(X^k) E(Y^m), for all k, m = 1, 2, . . .
B X and Y are (k, m)-order dependent (for given orders k, m = 1, 2, . . .) iff
E(X^k Y^m) ≠ E(X^k) E(Y^m)
B (1,1)-order linear dependence:
E(XY) ≠ E(X) E(Y)
B (1,1)-order independence:
E(XY) = E(X) E(Y) ⇔ E{[X − E(X)][Y − E(Y)]} = 0 ⇔ σXY = 0 ⇔ ρXY = 0
B Orthogonality:
E(XY) = 0
B Note:
1. if X and Y are uncorrelated (ρXY = 0), this is equivalent to saying that their
mean deviations are orthogonal (if X and Y are “centered”, by subtracting
their means, they become orthogonal).
2. if X and Y are orthogonal, ρXY = 0 only if E(X) = 0 or E(Y) = 0
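Continuing the uncorrelated-but-dependent example from above (my own illustration, not from the slides): the pair X, Y = X² passes the (1,1)-order check but fails a higher-order one, so the dependence shows up in higher moments.

# Sketch: X and Y = X^2 are (1,1)-order independent but (2,1)-order dependent.
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=200_000)
y = x**2

print(np.mean(x * y), np.mean(x) * np.mean(y))         # both near 0: (1,1)-order
print(np.mean(x**2 * y), np.mean(x**2) * np.mean(y))   # ~3 vs ~1: (2,1)-order dependence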
Other concepts related to independence
B r-th order independence:
E(Y^r | X = x) = E(Y^r) for all x ∈ R_X
B In summary:
independence =⇒ 1st-order independence =⇒ non-correlation ⇐⇒ orthogonality of the mean-subtracted variables
non-correlation does not imply independence (there could be non-linear
dependencies!)
(cfr. Spanos 1999: 272-279)
Statistical model
B Importance of defining a statistical model.
B Typical statistical model for a set of n continuous random variables X:
• Probability model: a family of density functions f(x; θ) defined
over the range of values of X;
• Sampling model: X (a (T × n) matrix of data) is a random sample.
(cfr. Spanos 1999: 33)
The Markov Condition
B The Markov condition permits the representation of probabilistic
dependence through a DAG. In particular, it imposes a
relationship between the Bayesian network (DAG in which
nodes are random variables) and the probabilistic structure.
• A directed acyclic graph G over V (set of vertices) and a probability
distribution P(V) satisfy the Markov condition iff for every W ∈ V,
W ⊥⊥ V \ (Descendants(W) ∪ Parents(W)) given Parents(W).
(Spirtes et al. 2000: 11)
• or, in other words:
Any vertex (node) is conditionally independent of its non-descendants (except
its parents), given its parents.
Markov Condition (example)
[Figure: DAG with edges V3 −→ V1, V1 −→ V2, V3 −→ V2, V2 −→ V4, V4 −→ V5]
• The DAG above and the probability distribution
P(V1, V2, V3, V4, V5) satisfy the MC iff:
(1) V4 ⊥⊥ {V1, V3} | V2
(2) V5 ⊥⊥ {V1, V2, V3} | V4
• Notice that many other c.i. relations follow from (1) and (2) by
applying symmetry, decomposition, and weak union (see the properties listed above).
For example:
• {V1, V3} ⊥⊥ V4 | V2; V1 ⊥⊥ V4 | V2; V3 ⊥⊥ V4 | V2;
V1 ⊥⊥ V4 | {V2, V3}; etc.
• {V1, V2, V3} ⊥⊥ V5 | V4; V5 ⊥⊥ {V1, V2} | V4; etc.
Markov condition (factorization)
B The M.C. permits the following factorization:
• discrete case: P(V1, . . . , Vn) = ∏_{i=1}^n P(Vi | Parents(Vi)), where P(Vi | Parents(Vi)) = P(Vi) if Parents(Vi) = ∅
• continuous case: f(V1, . . . , Vn) = ∏_{i=1}^n f(Vi | Parents(Vi)), where f(Vi | Parents(Vi)) = f(Vi) if Parents(Vi) = ∅
[Figure: the same DAG as above, with edges V3 −→ V1, V1 −→ V2, V3 −→ V2, V2 −→ V4, V4 −→ V5]
• We have: P(V1, V2, V3, V4, V5) = P(V1 | V3) P(V2 | V1, V3) P(V3) P(V4 | V2) P(V5 | V4)
Recall the chain rule: in general, P(V1, . . . , Vn) = P(Vn | Vn−1, . . . , V2, V1) · · · P(V2 | V1) P(V1)
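As a small illustration of the factorization (not from the slides; the binary conditional probabilities are invented for the example), the sketch below builds the joint distribution of the five variables in the DAG above as a product of node-given-parents terms and verifies that it sums to one.

# Sketch: Markov factorization for the example DAG
# V3 -> V1, V3 -> V2, V1 -> V2, V2 -> V4, V4 -> V5, with made-up binary CPTs.
# The joint P(v1,...,v5) is the product of each node's probability given its parents.
import itertools

def bern(p0, v):                 # P(V = v) when P(V = 0) = p0
    return p0 if v == 0 else 1.0 - p0

def joint(v1, v2, v3, v4, v5):
    return (bern(0.6, v3)                          # P(V3)
            * bern(0.8 - 0.3 * v3, v1)             # P(V1 | V3)
            * bern(0.9 - 0.2 * v1 - 0.3 * v3, v2)  # P(V2 | V1, V3)
            * bern(0.7 - 0.4 * v2, v4)             # P(V4 | V2)
            * bern(0.5 + 0.3 * v4, v5))            # P(V5 | V4)

# The factorized joint is a proper distribution: it sums to 1 over all 2^5 states.
print(sum(joint(*v) for v in itertools.product((0, 1), repeat=5)))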
The d-separation criterion
B d-separation: a graphical criterion which captures exactly all the
C.I. relationships that are implied by the M.C.∗
B Consider a graph G, with distinct nodes X, Y and a set of nodes W,
where neither X nor Y belongs to W. We say that X and Y are
d-separated given W in G iff there exists no undirected path U
between X and Y, such that:
1. every collider C (−→ C ←−) on U is in W or has a descendant in W, and
2. no other vertex on U is in W.
• if there is such a path, then X and Y are d-connected.
(cfr. Spirtes et al. 2000: 14).
∗ Including those derived from the M.C. through symmetry, decomposition, and weak union.
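A possible implementation sketch (not from the slides): instead of enumerating paths as in the definition above, it uses the standard equivalent test based on the moralized ancestral graph, and assumes the networkx library is available; the example DAG is the one from the Markov-condition slides.

# Sketch: d-separation check via the moralized ancestral graph.
import itertools
import networkx as nx

def d_separated(dag, x, y, w):
    """True iff x and y are d-separated given the set w in the DAG."""
    keep = set(w) | {x, y}
    for node in (x, y, *w):                      # ancestral set of {x, y} and w
        keep |= nx.ancestors(dag, node)
    sub = dag.subgraph(keep)
    moral = nx.Graph(sub.edges())                # drop edge directions
    moral.add_nodes_from(sub.nodes())
    for node in sub.nodes():                     # moralize: connect co-parents
        for p1, p2 in itertools.combinations(sub.predecessors(node), 2):
            moral.add_edge(p1, p2)
    moral.remove_nodes_from(w)                   # condition on w
    return not nx.has_path(moral, x, y)

g = nx.DiGraph([("V3", "V1"), ("V3", "V2"), ("V1", "V2"),
                ("V2", "V4"), ("V4", "V5")])
print(d_separated(g, "V4", "V1", {"V2"}))        # True:  V4 _||_ V1 | V2
print(d_separated(g, "V1", "V5", set()))         # False: d-connected via V2, V4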
The d-separation criterion (Pearl’s definition)
B d-separation:
B Consider a graph G, with distinct nodes X, Y and a set of nodes W,
where neither X nor Y belongs to W. A path U is said to be
d-separated by a set of nodes W iff
1. U contains a chain (−→ C −→ or ←− C ←−) or a fork (←− C −→)
such that the middle node C ∈ W, or
2. U contains a collider C (−→ C ←−) such that C ∉ W and no descendant
of C is in W.
• A set W is said to d-separate X from Y iff every path from X to Y
is d-separated by W.
• Otherwise, X and Y are d-connected by W.
(cfr. Pearl 2000: 16-17).
Reading List
• Spirtes, Glymour, Scheines (2000), Causation, Prediction, and Search, MIT Press, 2nd
edition:
• Chapters 1 and 2
• Pearl (2000), Causality: Models, Reasoning, and Inference, CUP:
• Sections 1.1 and 1.2
• Spanos, A. (1999), Probability Theory and Statistical Inference, CUP:
• Sections 2.2 and 6.4
Further reading:
• Cooper, G.F. (1999), An Overview of the Representation and Discovery of Causal
Relationships Using Bayesian Networks, in C. Glymour and G.F. Cooper (eds.),
Computation, Causation, and Discovery, MIT Press.
• Scheines, R. (1997), An Introduction to Causal Inference. www