Chain Event Graphs and their Geometry
Jim Smith with Eva Riccomagno & Christiane Gorgen
University of Warwick
Jan 2014
Jim Smith (Warwick)
Chain Event Graphs
Jan 2014
1 / 37
This Long Morning
1. Graphical models: BNs, probability trees and the genesis of CEGs.
2. How CEGs are used: representation, learning & causation.
3. Algebra & geometry of CEGs and their use to date for exploring potential causes & identifiability.
4. Some recent speculations & current work (with Christiane Gorgen).
Advantages of Discrete Bayesian Networks
Bayesian networks well known to:
Graphically represent underlying structure of models elegantly,
expressively & formally.
Provide framework for embellishing this qualitative structure to full
probability model.
Guide fast propagation algorithms, conjugate learning & model
selection.
Extend to models expressing causal hypotheses.
Admit elegant algebraic representation (e.g. links to Segre varieties).
Limitations of Bayesian Networks now Identified
However!
Models specified using dependence relations between a preferred set
of measurement variables.
BNs don’t fully represent how things might happen.
BNs suppress info on sample space: often critical to learning,
identifiability & selection.
Express graphically only a restrictive type of probabilistic symmetry.
Extensions to causal hypotheses fragile and rather restrictive.
Advantages of an Event Tree
Express how things happen even when evolution asymmetric.
No set of preferred variables needed a priori.
Give an explicit representation of the sample space.
Admit an (algebraic) probabilistic embellishment, for learning &
selection.
More diverse set of causal hypotheses expressible than in their BN
analogues.
Example of an Event Tree
[Figure: event tree on vertices v0, ..., v14. From the root v0 the environment is hostile (v1) or benign (v2); the gene is then up or down regulated; in a hostile environment the unit dies or survives, and a survivor makes a full or partial recovery.]
Limitations of Event Trees
However!
ETs typically big!
Conditional independences not expressed topologically.
No non-trivial graphical interrogation algorithms about
dependence available.
No efficient framework for propagation and learning.
Chain Event Graphs
Typically topologically simpler than ETs but still describe how
things happen.
Their root to leaf paths = atoms of the sample space, each with
an associated monomial probability.
Express rich variety of dependence structures within topology to be
graphically queried.
Vehicle for the map: asymmetric probability model → algebraic rep.
Like BNs provide framework for fast propagation and conjugate
learning.
Almost as expressive of causal hypotheses as ETs.
Constructing a CEG
A CEG is a new coloured graph, a function of:
1. an elicited ET, and
2. a partition of its interior vertices (= "situations") into sets of situations called stages. The probabilities associated with the edges emanating from each situation within a stage are the same.
ET → Staged tree → CEG
Situations of the ET in the same stage are given the same colour. Edges out of situations in the
same stage are coloured so that edges with the same probabilities are
identified, giving a (coloured) staged tree.
If 2 situations have isomorphic subtrees rooted from them, they
are represented by a common vertex w in the CEG called a position.
All leaves of the ET are combined into one sink vertex w∞ in the
CEG. All edges & colours are then mapped from the ET in the obvious way.
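The construction can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' algorithm: the tree is transcribed from the gene example in these slides, the stage names `u0`, ..., `u3` and the helper `signature` are labels chosen here, and two situations are assumed to share a position when they are in the same stage and their like-labelled edges lead to subtrees with equal signatures.

```python
# Event tree of the gene example: situation -> {edge label: child}.
# Vertices that never appear as keys are leaves.
tree = {
    "v0": {"hostile": "v1", "benign": "v2"},
    "v1": {"+": "v3", "-": "v4"},
    "v2": {"+": "v5", "-": "v6"},
    "v3": {"die": "v7", "survive": "v8"},
    "v4": {"die": "v10", "survive": "v9"},
    "v8": {"full": "v11", "partial": "v12"},
    "v9": {"full": "v14", "partial": "v13"},
}
# Elicited stages (the names u0..u3 are labels assumed for this sketch).
stage = {"v0": "u0", "v1": "u1", "v2": "u1", "v3": "u2",
         "v4": "u2", "v8": "u3", "v9": "u3"}


def signature(v):
    """Coloured shape of the subtree rooted at v (assumed position test)."""
    if v not in tree:  # every leaf is merged into the sink w_inf
        return "leaf"
    edges = sorted(tree[v].items(), key=lambda kv: kv[0])
    return (stage[v], tuple((label, signature(c)) for label, c in edges))


def positions():
    """Group situations whose coloured subtrees coincide."""
    groups = {}
    for v in tree:
        groups.setdefault(signature(v), []).append(v)
    return sorted(groups.values())


print(positions())
```

In this sketch v1 and v2 share a stage but not a position, because the subtrees below them differ, while v3, v4 and v8, v9 each merge into a single position.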
Example of an Event Tree
[Figure: the event tree of the example with its stages coloured; the edges out of v1 and v2 are labelled + and −, and situations in the same stage share a colour.]
Stages are the sets of situations $\{v_0\}$, $\{v_1, v_2\}$, $\{v_3, v_4\}$, $\{v_8, v_9\}$.
Example of a CEG
Elicit stages:
$u_0 = \{v_0\}$, $u_1 = \{v_1, v_2\}$, $u_2 = \{v_3, v_4\}$, $u_3 = \{v_8, v_9\}$
Deduce from the ET and stages the positions, i.e. the vertices of the CEG:
$w_0 = \{v_0\}$, $w_1 = \{v_1\}$, $w_2 = \{v_2\}$, $w_3 = \{v_3, v_4\}$,
$w_4 = \{v_8, v_9\}$, $w_\infty = \{\text{leaves of ET}\}$
[Figure: CEG with vertices w0 (0), w1 (1), w2 (2), w3 (3, 4), w4 (8, 9) and sink w∞; labelled edges include w0 → w1 (hostile) and w3 → w4 (survive).]
Snake Bite S&A 08: meaning from construction of CEG
X1 ~ bitten by snake, X2 ~ carry and apply perfect antidote, X3 ~ die
tomorrow.
[Figure: event tree with edge labels bite / no bite, antid. / no antid. and live / die, alongside the resulting CEG from w0 to w∞ with labels endangered and safe.]
X ~ not bitten / bitten but apply antidote, Y ~ (= X3) live/die, Z ~
safe/endangered.
So the rvs whose values are sufficient to describe the future can be deduced
from the topology of the CEG.
CEGs - Smith & Anderson (08), Thwaites & Smith (11)
CEGs contain BNs as a very special case
Theorem
If the joint distribution of X1, X2, ..., Xn (with their known sample spaces) is
fully expressed as a (context specific) BN G, then from its CEG, C, the rvs
X1, X2, ..., Xn, all their conditional independence structure and their
sample spaces can be retrieved from the topology and colouring of C.
Conditional independence relationships between constructed rvs can
be directly read from the (qualitative) topology of the CEG, e.g.
Theorem
$\text{Downstream} \perp\!\!\!\perp \text{Upstream} \mid w\text{-cut}$
Thwaites and Smith (11) give a comprehensive list for regular
uncoloured CEGs. Others are being investigated by Gorgen and Smith.
A CEG which extends a BN
[Figure: a BN on X, Y and Z, and the CEG from w0 to w∞ that represents it.]
But a context specific BN+ fits much better
[Figure: the context specific BN+ on X, Y and Z, and its CEG from w0 to w∞.]
(the distribution of Z is the same whether X takes a medium or
large value). Can search for such refinements: see below!
Example of a CEG with Cuts
[Figure: CEG with cuts; the position z separates an upstream subgraph X(z) from a downstream subgraph Y(z).]
Downstream Y(z) is independent of upstream X(z) given the cut Z = z. Cuts
need not be orthogonal, so dependence can be constructed through functional
relationships.
Embellishments and Algebra for CEGs
By adding conditional probability tables we can embellish the securely
elicited structure to give full probability tables
Treating probabilities in these tables as indeterminates makes a link
to algebraic geometry
Issues associated with identifiability & causality have recently
successfully exploited this correspondence for BNs.
Question: Is it possible to do the same for CEGs?
Answer: Yes!
Probabilities on the Gene Example CEG
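Stages $u_0 = \{v_0\}$, $u_1 = \{v_1, v_2\}$, $u_2 = \{v_3, v_4\}$, $u_3 = \{v_8, v_9\}$. Edge
probs $u_{0h}, u_{0b}, u_{1+}, u_{1-}, u_{2d}, u_{2s}, u_{3f}, u_{3p}$:
$$
\begin{aligned}
P(u_{0h}) &= \pi_{01}, & P(u_{1-}) &= \pi_{11}, & P(u_{2d}) &= \pi_{21}, & P(u_{3f}) &= \pi_{31}, \\
P(u_{0b}) &= \pi_{02}, & P(u_{1+}) &= \pi_{12}, & P(u_{2s}) &= \pi_{22}, & P(u_{3p}) &= \pi_{32}
\end{aligned}
$$
fully specify the joint distribution. Note
$$ \pi_{i1} + \pi_{i2} = 1 \quad \text{and} \quad \pi_{i1}, \pi_{i2} \ge 0, \quad i = 0, \ldots, 3. $$
In general there is an associated simplex of probabilities $\{\pi_{i,u} : 1 \le i \le l_u\}$, one
probability for each of the $l_u$ edges emanating from $u$.
[Figure: the CEG on w0 (0), w1 (1), w2 (2), w3 (3, 4), w4 (8, 9) and w∞ with each edge labelled by its probability: $\pi_{01}, \pi_{02}$ out of w0; $\pi_{11}, \pi_{12}$ out of w1 and w2; $\pi_{21}, \pi_{22}$ out of w3; $\pi_{31}, \pi_{32}$ out of w4.]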
Geometry of CEGs (Riccomagno and Smith (09))
Once probabilities are added, the structure begins to look much more (semi)
algebraic!!
P(atom) = product of edge probs on its associated root to leaf path.
In the example, the root to sink (= atomic) probabilities are
$$
\begin{aligned}
p(v_5) &= \pi_{02}\pi_{12}, & p(v_6) &= \pi_{02}\pi_{11}, \\
p(v_7) &= \pi_{01}\pi_{12}\pi_{21}, & p(v_{10}) &= \pi_{01}\pi_{11}\pi_{21}, \\
p(v_{11}) &= \pi_{01}\pi_{12}\pi_{22}\pi_{31}, & p(v_{14}) &= \pi_{01}\pi_{11}\pi_{22}\pi_{31}, \\
p(v_{12}) &= \pi_{01}\pi_{12}\pi_{22}\pi_{32}, & p(v_{13}) &= \pi_{01}\pi_{11}\pi_{22}\pi_{32}
\end{aligned}
$$
The probability of the event that any measurable random variable on
this space takes a particular value is a sum of these monomials: in
particular a polynomial.
Note that the 8-vector of atomic probabilities is constrained to lie in
a 4 (rather than 7) dimensional space.
Unlike for a BN, the atomic monomials need not be multilinear or
homogeneous: above they range from degree 2 to 4.
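A quick numerical check of these monomials, as a sketch only: the parameter values are arbitrary numbers chosen for illustration (they are not from the slides), and the names `theta`, `atoms` and `prob` are ad hoc.

```python
from fractions import Fraction as F
from math import prod

# Arbitrary illustrative values with pi_i1 + pi_i2 = 1 in every stage.
theta = {"pi01": F(3, 10), "pi02": F(7, 10), "pi11": F(2, 5), "pi12": F(3, 5),
         "pi21": F(1, 4), "pi22": F(3, 4), "pi31": F(2, 3), "pi32": F(1, 3)}

# Each atom is the list of edge probabilities on its root-to-leaf path.
atoms = {
    "v5": ["pi02", "pi12"],
    "v6": ["pi02", "pi11"],
    "v7": ["pi01", "pi12", "pi21"],
    "v10": ["pi01", "pi11", "pi21"],
    "v11": ["pi01", "pi12", "pi22", "pi31"],
    "v14": ["pi01", "pi11", "pi22", "pi31"],
    "v12": ["pi01", "pi12", "pi22", "pi32"],
    "v13": ["pi01", "pi11", "pi22", "pi32"],
}


def prob(atom):
    """Monomial probability of one atom."""
    return prod(theta[name] for name in atoms[atom])


total = sum(prob(a) for a in atoms)                      # should equal 1
degrees = sorted({len(path) for path in atoms.values()})  # monomial degrees
print(total, degrees)
```

The eight atomic probabilities sum to one for any admissible parameter values, and the path lengths reproduce the degrees 2 to 4 noted above.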
Random Variables and conditional independence
Note leaves denote sequences of edges:
$v_5 = (b+)$, $v_7 = (h+d)$, $v_{11} = (h+sf)$, $v_{12} = (h+sp)$,
$v_6 = (b-)$, $v_{10} = (h-d)$, $v_{14} = (h-sf)$, $v_{13} = (h-sp)$.
Note the indicator $X(\cdot)$ of whether the gene is +/− regulated is measurable:
$P(X(\cdot) = +) = \pi_{12}$, $P(X(\cdot) = -) = \pi_{11}$.
Measurable rv $Y$ takes value 0 if environ. benign, 1 if hostile & gene
dies, 2 if hostile & fully recovers and 3 if hostile & partially recovers.
$$
\begin{aligned}
p(y \mid x = +) &= \pi_{12}^{-1}\,\bigl(p(v_5), p(v_7), p(v_{11}), p(v_{12})\bigr) \\
&= \pi_{11}^{-1}\,\bigl(p(v_6), p(v_{10}), p(v_{14}), p(v_{13})\bigr) = p(y \mid x = -) \\
&= (\pi_{02},\ \pi_{01}\pi_{21},\ \pi_{01}\pi_{22}\pi_{31},\ \pi_{01}\pi_{22}\pi_{32})
\iff Y \perp\!\!\!\perp X(\cdot)
\end{aligned}
$$
Question: How in general can we discover all such deducible ci’s from the
algebra of tree + stages?
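The independence of Y and X in the example can be confirmed numerically. The sketch below uses arbitrary illustrative edge probabilities (not values from the slides); `joint` and `conditional` are helper names introduced here.

```python
from fractions import Fraction as F

# Arbitrary illustrative edge probabilities (assumed, not from the slides).
pi01, pi11, pi21, pi31 = F(3, 10), F(2, 5), F(1, 4), F(2, 3)
pi02, pi12, pi22, pi32 = 1 - pi01, 1 - pi11, 1 - pi21, 1 - pi31


def joint(px):
    """Joint probabilities of Y = 0, 1, 2, 3 with the level of X whose edge has probability px."""
    return [pi02 * px,
            pi01 * px * pi21,
            pi01 * px * pi22 * pi31,
            pi01 * px * pi22 * pi32]


def conditional(px):
    """p(y | x): the joint row divided by the marginal probability of that level of X."""
    row = joint(px)
    return [p / sum(row) for p in row]


given_plus = conditional(pi12)    # atoms v5, v7, v11, v12
given_minus = conditional(pi11)   # atoms v6, v10, v14, v13
print(given_plus == given_minus)
```

Both conditional distributions reduce to the same vector because the factor for the level of X cancels from every monomial.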
Graphs Algebra and Conditioning
Subgraphs define intrinsic events (T&S, 11), e.g. the levels of $X(\cdot)$ are
represented by subgraphs:
[Figure: two copies of the CEG, one for the event $\{X(\cdot) = -\}$ with $\pi_{11} = 1$ and one for $\{X(\cdot) = +\}$ with $\pi_{12} = 1$; the remaining edges keep their probabilities $\pi_{01}, \pi_{02}, \pi_{21}, \pi_{22}, \pi_{31}, \pi_{32}$.]
But the algebra is clearer and more general: a simple projection of the probabilities of
conditioned events.
Conjugate Bayesian Inference on CEGs using full data
Algebra is well used to help understand learning on BNs.
Bayesian learning for CEGs exploits likelihood separation. Explicitly, the
likelihood under complete random sampling is given by
$$ l(\pi) = \prod_{u \in U} l_u(\pi_u), \qquad l_u(\pi_u) = \prod_{i \in u} \pi_{i,u}^{x(i,u)} $$
where $x(i,u)$ is the number of units entering stage $u$ and proceeding
along the edge labelled $(i,u)$, and $\sum_i \pi_{i,u} = 1$. Note the likelihood has a nice
monomial form (albeit with a simplex constraint).
Independent Dirichlet priors $D(\alpha(u))$ on the vectors $\pi_u$ lead to
independent Dirichlet $D(\alpha^*(u))$ posteriors, where
$$ \alpha^*(i,u) = \alpha(i,u) + x(i,u) $$
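The stage-wise update is simple enough to write out directly. In this sketch the prior hyperparameters and the edge counts are hypothetical numbers for the four stages of the gene example, and `dirichlet_update` is a name introduced here.

```python
def dirichlet_update(alpha, counts):
    """Stage-wise conjugate update: alpha*(i, u) = alpha(i, u) + x(i, u)."""
    return {u: [a + x for a, x in zip(alpha[u], counts[u])] for u in alpha}


# Hypothetical uniform priors and edge counts for a sample of 100 units.
alpha = {"u0": [1, 1], "u1": [1, 1], "u2": [1, 1], "u3": [1, 1]}
counts = {"u0": [30, 70],   # hostile, benign
          "u1": [40, 60],   # the two regulation levels
          "u2": [10, 20],   # die, survive (hostile units only)
          "u3": [15, 5]}    # full, partial recovery (survivors only)

posterior = dirichlet_update(alpha, counts)
print(posterior["u3"])
```

Because the likelihood separates over stages, each stage is updated from its own counts without reference to the others.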
Learning the topology of a CEG
Under appropriate priors, because of the monomial structure, the log
marginal likelihood score of a CEG is linear in stage components.
Explicitly, for $\alpha = (\alpha_1, \ldots, \alpha_k)$ let $s(\alpha) = \log\Gamma\bigl(\sum_{i=1}^{k} \alpha_i\bigr)$ and
$t(\alpha) = \sum_{i=1}^{k} \log\Gamma(\alpha_i)$. Then
$$ \Psi(C) = \log p(C) = \sum_{u \in C} \Psi_u(C), \qquad \Psi_u(C) = s(\alpha(u)) - s(\alpha^*(u)) + t(\alpha^*(u)) - t(\alpha(u)) $$
Conjugacy & linearity imply e.g. that MAP model selection using AHC or
Integer Programming is simple & fast over the (albeit vast) class
of CEGs (see Freeman & Smith, 11, 11a; Cowell and Smith, 13).
These methods appear to provide consistent model estimation as they
do for BNs (uncompleted work!)
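The additive score can be computed with the standard library alone. This is a sketch under the assumption that the stage score has the usual Dirichlet-multinomial form written in terms of s and t; `stage_score` and `ceg_score` are names chosen here, and the inputs in the example are hypothetical.

```python
from math import lgamma, log


def s(alpha):
    """log Gamma of the sum of the hyperparameters."""
    return lgamma(sum(alpha))


def t(alpha):
    """Sum of log Gamma over the hyperparameters."""
    return sum(lgamma(a) for a in alpha)


def stage_score(alpha, counts):
    """Contribution of one stage: s(prior) - s(post) + t(post) - t(prior)."""
    post = [a + x for a, x in zip(alpha, counts)]
    return s(alpha) - s(post) + t(post) - t(alpha)


def ceg_score(alpha, counts):
    """Score of a CEG as the sum of its stage scores."""
    return sum(stage_score(alpha[u], counts[u]) for u in alpha)


# One observation under a uniform prior on a binary stage has marginal probability 1/2.
print(stage_score([1, 1], [1, 0]), log(0.5))
```

Since the total is a sum over stages, merging or splitting stages during a search only requires recomputing the terms of the stages that change.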
Generating functions to represent events
The naive polynomial associated with a CEG $C$ is
$$ G(c, C) = \sum_{a \in A} c_a\, p_a(\pi) $$
where $c_a$ is an indicator on atom/leaf $a$ and $p_a(\pi)$ is its monomial
probability.
The probabilities of events $B$ are then projections $G_B(C)$ of this
polynomial, where
$$ G_B(C) \triangleq \sum_{\{a \in A \,:\, c_a = 1 \text{ iff } a \in B\}} c_a\, p_a(\pi) = \sum_{a \in B} p_a(\pi) $$
Likelihoods from a random sample lead to consistent estimates and
conjugate analyses whenever each of the observed events $G_B(C)$
within the sample has a monomial form. If not, then at least
conjugacy breaks.
Complete and ancestral sampling give this. But it is easiest to check this
condition algebraically.
CEG generating functions to represent intrinsic events
Can explore ci relationships, T & S (11), and develop fast graphical
propagation algorithms, T, C, S (08), wrt subgraphs of regular $C$ for
certain conditioning sets.
These have a simple algebraic expression.
Definition
A regular CEG $C$ is one where no root to leaf path contains two situations
in the same stage.
Definition
An event $B$ in a regular CEG $C$ is called intrinsic iff it is the union of all
root to leaf paths whose edges all lie in a subset $E_B \subseteq E(C)$ of the edge set
of $C$.
An intrinsic event $B$ can be represented
$$ c_a = \prod_{e \in E(C) \cap a} c_e(B) $$
Manifest polynomials, identifiability & non-ancestrality in BNs
Geometry of BNs & CEGs is interesting under incomplete sampling, where
part of the story is left systematically hidden
[Figure: BNs with systematically hidden parts, as studied in M,S,vS (03), K,R&S (13), Z&S (11,12), A&R (13).]
Here the algebraic + semi-algebraic structure of the observed distribution is
non-trivial and worthy of study.
Potential unidentifiability, loss of consistency, lack of conjugacy.
Geometrical insights help to understand & address difficulties in BNs.
Question: Can the same methods be used to better understand partially
learned CEGs?
Answer: Preliminary work by Riccomagno and Smith (09) suggests this is
so!
Probabilities of events in our example
A gene in a benign environment will survive unharmed $\Rightarrow$
$P\{\text{organism survives unharmed}\} \triangleq \phi = \pi_{02} + \pi_{01}\pi_{22}\pi_{31}$.
Take a random sample of size $n$ from the CEG population, observing exactly $x$
that survive unharmed. Then the likelihood of $\phi$ is
$$ l(\phi) = \phi^{x} (1 - \phi)^{n - x} $$
$\phi$ is consistently estimable & the posterior variance of $\phi \to 0$.
So we can learn from data about $(\pi_{01}, \pi_{11}, \pi_{21}, \pi_{31})$ through the inverse
image of the equation
$$ \pi_{02} + \pi_{01}\pi_{22}\pi_{31} = \phi \iff \pi_{01}\,[1 - (1 - \pi_{21})\,\pi_{31}] = 1 - \phi $$
but nothing else about population probs.
Note the induced algebraic constraint (cubic) + inequalities because
probs $\ge 0$ (e.g. $\pi_{01} \ge 1 - \phi$).
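A small check makes the inverse image concrete: different parameter settings can give the same $\phi$ and so cannot be told apart from these data. The two settings below are hypothetical values constructed for this sketch, and `phi` is a helper name introduced here.

```python
from fractions import Fraction as F


def phi(pi01, pi21, pi31):
    """P(survive unharmed) = pi02 + pi01 * pi22 * pi31."""
    return (1 - pi01) + pi01 * (1 - pi21) * pi31


# Two hypothetical parameter settings chosen to lie in the same inverse image.
a = dict(pi01=F(1, 2), pi21=F(1, 2), pi31=F(1, 2))
b = dict(pi01=F(3, 4), pi21=F(0), pi31=F(1, 2))

for p in (a, b):
    value = phi(**p)
    # Rearranged constraint: pi01 * [1 - (1 - pi21) * pi31] = 1 - phi.
    assert p["pi01"] * (1 - (1 - p["pi21"]) * p["pi31"]) == 1 - value
    # Induced inequality: pi01 >= 1 - phi.
    assert p["pi01"] >= 1 - value
    print(value)
```

Both settings satisfy the rearranged constraint and the inequality, yet they differ in every coordinate except $\pi_{31}$.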
Identifying Causes in a CEG
Pearl defines a causal BN as one where rvs not downstream of a
manipulated node X are unaffected by the manipulation.
X is set to its manipulated value x̂ with probability 1.
Effect on downstream variables is identical to ordinary conditioning.
Many controls don’t fit this scenario, e.g. more usually: “Whenever a unit is
in a set A of positions, take it to an adjacent position B”.
The same definition can be used on CEGs (see Shafer 96) but much more
generally. Algebra can then be used to check whether the potential
impact of a manipulation in a CEG is theoretically estimable.
Causal CEGs: An Example
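Plan to force the environment to hostile ($\pi_{01} = 1$ & $\pi_{02} = 0$) & up
regulate the gene artificially ($\pi_{11} = 0$ & $\pi_{12} = 1$). The controlled CEG then
becomes
[Figure: controlled CEG in which w0 (0) → w1 (1) → w3 (3, 4) each have probability 1, followed by w3 → w4 (8, 9) with probability $\pi_{22}$ and edges into w∞ with probabilities $\pi_{21}, \pi_{31}, \pi_{32}$.]
Interest: $P\{\text{full recovery after control}\} = \gamma$. CEG causal
$\Rightarrow \gamma \triangleq \pi_{22}\pi_{31}$.
Observe whether sampled units are fully healthy (or not) in the natural system. This has prob
$$ \phi = \pi_{02} + \pi_{01}\pi_{22}\pi_{31} = (1 - \pi_{01}) + \pi_{01}\gamma \iff \gamma = \frac{\phi + \pi_{01} - 1}{\pi_{01}} $$
Without prior information on $\pi_{01}$ all we can say is that $\gamma \le \phi$.
Thwaites et al (10), Thwaites (13) give many sufficient conditions to
read identifiability of a cause.
Remark: Inverse image might be enough! Algebra gives us this!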
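The bound on $\gamma$ can be seen by sweeping $\pi_{01}$ over its admissible range $[1 - \phi, 1]$. This is only an illustrative sketch: the value of `phi` is hypothetical and the grid is an arbitrary choice.

```python
from fractions import Fraction as F


def gamma(phi, pi01):
    """Causal effect implied by phi for a given pi01: (phi + pi01 - 1) / pi01."""
    return (phi + pi01 - 1) / pi01


phi = F(5, 8)  # hypothetical value learned from the natural system
lo = 1 - phi   # pi01 must lie in [1 - phi, 1]
grid = [lo + (1 - lo) * F(k, 10) for k in range(11)]
values = [gamma(phi, p) for p in grid]
print(min(values), max(values))
```

Across the grid $\gamma$ increases from 0 at $\pi_{01} = 1 - \phi$ to $\phi$ at $\pi_{01} = 1$, so the data alone bound $\gamma$ above by $\phi$ without identifying it.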
Unanswered questions
Despite being a very important class, are CEGs themselves just a
subclass of models which also share all the advantages of BNs? e.g.
which can support interrogation, fast propagation, conjugate analysis
& model selection and inferential properties, estimability, causal
expressiveness? In Riccomagno and Smith (04, 09) tentative steps.
Can these regular classes of models be expressed simply as an
algebraic entity? For then BNs and CEGs could be relegated to a
depiction of subclasses of this class. I believe that this is so!
Gorgen and Smith are beginning to investigate algebraically model
elaboration normally expressed graphically or implicitly.
Can computer algebra be used here? I think this is so: subtrees and
subCEGs ~ elimination
Are there maps of well known algebraic varieties into this domain that
can be exploited?
THANK YOU FOR YOUR ATTENTION!!
Selected References of team
Barclay, LM, Hutton, JL & Smith, JQ (2014) "Chain Event Graphs for
Informed Missingness" Bayesian Analysis (to appear)
Cowell, RG & Smith, JQ (2013) "Causal discovery through MAP selection of
stratified chain event graphs" CRiSM Res. Rep. (submitted to Electronic J. of Stats)
Barclay, LM, Smith, JQ, Thwaites, PA & Nicholson, A (2013) "Dynamic
Chain Event Graphs" CRiSM Res. Rep. and Math Archive
Barclay, R, Hutton, JL & Smith, JQ (2013) "Refining a Bayesian Network using
a Chain Event Graph" IJAR, 54, 1300-1308
Zwiernik, P & Smith, JQ (2012) "Tree-cumulants and the identifiability of
Bayesian tree models" Bernoulli, 18, 1, 290-321
Zwiernik, P & Smith, JQ (2011) "Implicit inequality constraints in a binary
tree model" Electronic J. of Statistics, 5, 1276-1312
Thwaites, PA & Smith, JQ (2011) "Separation Theorems for Chain Event
Graphs" CRiSM Res. Rep., Math Archive
More selected References of team
Freeman, G & Smith, JQ (2011) "Dynamic Staged Trees for Discrete
Multivariate Time Series" Bayesian Analysis, 6, 2, 279-306
Freeman, G & Smith, JQ (2011) "Bayesian MAP Selection of Chain Event
Graphs" J. Multivariate Analysis, 102, 1152-1165
Thwaites, P, Riccomagno, EM & Smith, JQ (2010) "Causal Analysis with
Chain Event Graphs" Artificial Intelligence, 174, 889-909
Riccomagno, E & Smith, J (2009) "The Geometry of Causal Probability Trees
that are Algebraically Constrained" in "Optimal Design & Related Areas in
Optimization & Statistics" Springer, 131-152
Thwaites, P, Smith, JQ & Cowell, R (2008) "Propagation using Chain Event
Graphs" Proceedings of 24th UAI, 546-553
Smith, JQ & Anderson, PE (2008) "Conditional independence and Chain
Event Graphs" Artificial Intelligence, 172, 1, 42-68
Riccomagno, EM & Smith, JQ (2004) "Identifying a cause in models
which are not simple Bayesian networks" Proceedings of IMPU, Perugia, July 04,
1315-22
Propagation on BNs
Usual algorithms are based on a triangulation step, adding undirected edges
to the BN until it is decomposable.
Use fast local propagation on learning values of a subset of variables
on this subset of BNs.
Update clique probability tables of the simple BN in a backward and
forward sweep.
[Figure: a BN on X1, ..., X6 and the decomposable graph obtained from it by triangulation.]
Note: the algorithm works only on observations = subsets of values of rvs.
Propagation on CEGs
Algorithms (Cowell et al, 2008) are based on deleting undirected edges in
the CEG.
Use fast local propagation on learning values of a subset of edges on
this simple CEG.
Update conditional probability tables of the simple CEG.
[Figure: the CEG on w0, w1, w2, w3, w4 and w∞ (left) and the simple CEG obtained from it (right).]
Note: the algorithm works only on observations = subsets of edges. Very much
faster than BN algorithms when there are many asymmetries.
Formal definitions of stages and positions
Two nodes $v, v'$ are in the same stage $u$ exactly when $X(v), X(v')$
have the same distribution under a bijection $\psi_u(v, v')$, where
$$ \psi_u(v, v') : \mathbb{X}(v) = E(F(v, T)) \to \mathbb{X}(v') = E(F(v', T)) $$
In other words, the two nodes have identical probability distributions
on their edges.
Two nodes $v, v'$ are in the same position $w$ exactly when there exists
a bijection $\phi_w(v, v')$ from $\Lambda(v, T)$, the set of paths in the tree from
$v$ to a leaf node, to $\Lambda(v', T)$, the set of paths from $v'$ to a leaf node,
such that all edges in all the paths are coloured, and that the
sequence of colours in any path is the same as that in the path under
the bijection.
Formal definition of a staged tree
A staged tree is a tree with stage set $L(T)$ and edges coloured as
follows:
When $v \in u \in L(T)$, but $u$ contains only one node, all edges
emanating from $v$ are left uncoloured.
When $u$ contains more than one node, all edges emanating from $v$ are
coloured, such that an edge $e$ out of $v$ and an edge $e'$ out of $v' \in u$ have the same
colour if and only if $\psi_u(e) = e'$.
Formal Definition of a Probability Graph
The probability graph of a staged tree is a directed graph, possibly
with some coloured edges. Each node represents a set of nodes from
the probability tree in the same position in the staged tree.
Its edges are constructed as follows:
For each position $w$, choose a representative node $v(w)$. For each
edge from $v(w)$ to $v'(w')$, construct a single edge $e(w, w')$, where
$w' = w_\infty$ if $v'$ is a leaf node in the tree; otherwise $w'$ is the position of
$v'$.
The colour of the edge is the colour of the edge between $v$ and $v'$.
So the number of edges in the probability tree is the same as in the
staged tree.
A formal definition of the CEG
The chain event graph is the mixed graph with
the same nodes as the probability graph;
the same directed edges as the probability graph; and
undirected edges drawn between different positions that are in the
same stage.
The colours of the edges are also inherited from the probability graph.