Chain Event Graphs and their Geometry
Jim Smith with Eva Riccomagno & Christiane Gorgen
University of Warwick, Jan 2014
Jim Smith (Warwick) Chain Event Graphs Jan 2014 1 / 37

This Long Morning
  1 Graphical models: BNs, probability trees and the genesis of CEGs.
  2 How CEGs are used: representation, learning & causation.
  3 Algebra & geometry of CEGs and their use to date for exploring potential causes & identifiability.
  4 Some recent speculations & current work (with Christiane Gorgen).

Advantages of Discrete Bayesian Networks
  Bayesian networks are well known to:
  Graphically represent the underlying structure of models elegantly, expressively & formally.
  Provide a framework for embellishing this qualitative structure to a full probability model.
  Guide fast propagation algorithms, conjugate learning & model selection.
  Extend to models expressing causal hypotheses.
  Admit an elegant algebraic representation (e.g. links to Segre varieties).

Limitations of Bayesian Networks now Identified
  However!
  Models are specified using dependence relations between a preferred set of measurement variables.
  BNs don't fully represent how things might happen.
  BNs suppress information on the sample space, which is often critical to learning, identifiability & selection.
  They express graphically only a restrictive type of probabilistic symmetry.
  Extensions to causal hypotheses are fragile and rather restrictive.

Advantages of an Event Tree
  Expresses how things happen even when the evolution is asymmetric.
  No set of preferred variables is needed a priori.
  Gives an explicit representation of the sample space.
  Admits an (algebraic) probabilistic embellishment, for learning & selection.
  A more diverse set of causal hypotheses is expressible than in their BN analogues.

Example of an Event Tree
  [Figure: event tree with root v0, interior vertices v1, v2, v3, v4, v8, v9 and leaves v5, v6, v7, v10, v11, v12, v13, v14. Edge labels: hostile / benign, up / down, die / survive, full recovery / partial.]

Limitations of Event Trees
  However!
  ETs are typically big!
  Conditional independences are not expressed topologically.
  No non-trivial graphical interrogation algorithms about dependence are available.
  No efficient framework for propagation and learning.

Chain Event Graphs
  Typically topologically simpler than ETs but still describe how things happen.
  Their root-to-leaf paths = atoms of the sample space, each with an associated monomial probability.
  Express a rich variety of dependence structures within the topology, which can be graphically queried.
  A vehicle for mapping an asymmetric probability model to an algebraic representation.
  Like BNs, they provide a framework for fast propagation and conjugate learning.
  Almost as expressive of causal hypotheses as ETs.

Constructing a CEG
  A CEG is a new coloured graph, a function of:
  1 an elicited ET, and
  2 a partition of its interior vertices (= "situations") into sets of situations called stages.
  The probabilities associated with the emanating edges from each situation within a stage are the same.
  ET → Staged tree → CEG
  Situations of the ET in the same stage are given the same colour.
  Edges out of situations in the same stage are coloured so that edges with the same probabilities are identified. This gives a (coloured) staged tree.
  If 2 situations have isomorphic (coloured) subtrees rooted at them, they are represented by a common vertex w in the CEG, called a position.
  All leaves of the ET are combined into one sink vertex $w_\infty$ in the CEG.
  All edges & colours are then mapped from the ET in the obvious way.

Example of an Event Tree: STAGES "COLOURED"
  [Figure: the same event tree on v0, ..., v14 with the situations in each stage coloured alike. Edge labels: hostile / benign, + / -, die / survive, full recov / part recov.]
  Stages are the sets of situations ({v0}, {v1, v2}, {v3, v4}, {v8, v9}).

Example of a CEG
  Elicit stages: u0 = {v0}, u1 = {v1, v2}, u2 = {v3, v4}, u3 = {v8, v9}.
  Deduce from the ET and the stages the positions, i.e. the vertices of the CEG:
  w0 = {v0}, w1 = {v1}, w2 = {v2}, w3 = {v3, v4}, w4 = {v8, v9}, $w_\infty$ = {leaves of ET}.
  [Figure: CEG with vertices w0 (0), w1 (1), w2 (2), w3 (3, 4), w4 (8, 9) and sink $w_\infty$. Edge labels include hostile, + and survive.]

Snake Bite (S&A 08): meaning from construction of CEG
  X1: bitten by snake, X2: carry and apply perfect antidote, X3: die tomorrow.
  [Figure: event tree with edges bite / no bite, antid. / no antid., live / die, and the corresponding CEG from w0 to $w_\infty$ with edges safe / endangered.]
  X: not bitten / bitten but apply antidote, Y (= X3): live / die, Z: safe / endangered.
  So the rvs whose values are sufficient to describe the future can be deduced from the topology of the CEG.

CEGs: Smith & Anderson (08), Thwaites & Smith (11)
  CEGs contain BNs as a very special case.
  Theorem: If the joint distribution of $X_1, X_2, \ldots, X_n$ (with their known sample spaces) is fully expressed as a (context-specific) BN $G$, then from its CEG, $C$, the rvs $X_1, X_2, \ldots, X_n$, all their conditional independence structure and their sample spaces can be retrieved from the topology and colouring of $C$.
  Conditional independence relationships between constructed rvs can be read directly from the (qualitative) topology of the CEG, e.g.
  Theorem: Downstream $\perp\!\!\!\perp$ Upstream $\mid$ w Cut.
  Thwaites and Smith (11) give a comprehensive list for regular uncoloured CEGs. Others are being investigated by Gorgen and Smith.

A CEG which extends a BN
  [Figure: a BN on X, Y, Z and the corresponding CEG from w0 to $w_\infty$ with levels X, Y, Z.]
  But a context-specific BN+ fits much better:
  [Figure: the BN+ on X, Y, Z and the corresponding CEG from w0 to $w_\infty$ with levels X, Y, Z.]
  (The distribution of Z is the same whether X takes a medium or a large value.)
  Can search for such refinements: see below!

Example of a CEG with Cuts
  [Figure: a CEG with a cut.]
  Downstream Y(z) is independent of upstream X(z) given the cut Z = z. Cuts need not be orthogonal, so dependence can be constructed through functional relationships.
  [Figure: the same CEG with X(z) upstream of the cut z and Y(z) downstream of it.]

Embellishments and Algebra for CEGs
  By adding conditional probability tables we can embellish the securely elicited structure to give full probability tables.
  Treating the probabilities in these tables as indeterminates makes a link to algebraic geometry.
  Work on identifiability & causality has recently successfully exploited this correspondence for BNs.
  Question: Is it possible to do the same for CEGs?
  Answer: Yes!

Probabilities on the Gene Example CEG
  Stages: u0 = {v0}, u1 = {v1, v2}, u2 = {v3, v4}, u3 = {v8, v9}.
  Edge probabilities: $u_{0h}, u_{0b}, u_{1+}, u_{1-}, u_{2d}, u_{2s}, u_{3f}, u_{3p}$.
  $P(u_{0h}) = \pi_{01}$, $P(u_{1-}) = \pi_{11}$, $P(u_{2d}) = \pi_{21}$, $P(u_{3f}) = \pi_{31}$,
  $P(u_{0b}) = \pi_{02}$, $P(u_{1+}) = \pi_{12}$, $P(u_{2s}) = \pi_{22}$, $P(u_{3p}) = \pi_{32}$
  fully specify the joint distribution.
  Note $\pi_{i1} + \pi_{i2} = 1$ and $\pi_{i1}, \pi_{i2} \ge 0$, $i = 0, \ldots, 3$.
  In general, a simplex of probabilities $\{\pi_{i,u} : 1 \le i \le l_u\}$ is associated with the $l_u$ emanating edges of each stage $u$.
  [Figure: the CEG on w0 (0), w1 (1), w2 (2), w3 (3, 4), w4 (8, 9), $w_\infty$ with its edges labelled by the probabilities $\pi_{01}, \pi_{02}, \pi_{11}, \pi_{12}, \pi_{21}, \pi_{22}, \pi_{31}, \pi_{32}$.]

Geometry of CEGs (Riccomagno and Smith (09))
  Once probabilities are added, the structure begins to look much more (semi-)algebraic!!
  P(atom) = product of the edge probabilities on its associated root-to-leaf path.
  In the example, the root-to-sink (= atomic) probabilities are
  $p(v_5) = \pi_{02}\pi_{12}$, $p(v_6) = \pi_{02}\pi_{11}$,
  $p(v_7) = \pi_{01}\pi_{12}\pi_{21}$, $p(v_{10}) = \pi_{01}\pi_{11}\pi_{21}$,
  $p(v_{11}) = \pi_{01}\pi_{12}\pi_{22}\pi_{31}$, $p(v_{14}) = \pi_{01}\pi_{11}\pi_{22}\pi_{31}$,
  $p(v_{12}) = \pi_{01}\pi_{12}\pi_{22}\pi_{32}$, $p(v_{13}) = \pi_{01}\pi_{11}\pi_{22}\pi_{32}$.
  The probability that any measurable random variable on this space takes a particular value is a sum of these monomials, and so in particular a polynomial.
  Note that the 8-vector of atomic probabilities is constrained to lie in a 4 (rather than 7) dimensional space.
  Unlike in a BN, the atomic monomials need not be multilinear or homogeneous: above they range from degree 2 to 4.

Random Variables and conditional independence
  Note that leaves denote sequences of edges:
  $v_5 = (b+)$, $v_7 = (h+d)$, $v_{11} = (h+sf)$, $v_{12} = (h+sp)$,
  $v_6 = (b-)$, $v_{10} = (h-d)$, $v_{14} = (h-sf)$, $v_{13} = (h-sp)$.
  Note that the indicator $X(\pm)$ of whether the gene is +/- regulated is measurable:
  $P(X(\pm) = +) = \pi_{12}$, $P(X(\pm) = -) = \pi_{11}$.
  The measurable rv Y takes value 0 if the environment is benign, 1 if hostile & the gene dies, 2 if hostile & it fully recovers and 3 if hostile & it partially recovers.
  $p(y \mid x = +) = \pi_{12}^{-1}\,(p(v_5), p(v_7), p(v_{11}), p(v_{12}))$
  $= p(y \mid x = -) = \pi_{11}^{-1}\,(p(v_6), p(v_{10}), p(v_{14}), p(v_{13}))$
  $= (\pi_{02},\ \pi_{01}\pi_{21},\ \pi_{01}\pi_{22}\pi_{31},\ \pi_{01}\pi_{22}\pi_{32}) \iff Y \perp\!\!\!\perp X(\pm)$
  Question: How, in general, can we discover all such deducible ci's from the algebra of tree + stages?

Graphs, Algebra and Conditioning
  Subgraphs define intrinsic events (T&S, 11), e.g. the levels of $X(\pm)$ are represented by subgraphs.
  [Figure: two copies of the CEG on w0 (0), w1 (1), w2 (2), w3 (3, 4), w4 (8, 9), $w_\infty$. The event $\{X(\pm) = -\}$ corresponds to the subgraph with $\pi_{11} = 1$; the event $\{X(\pm) = +\}$ corresponds to the subgraph with $\pi_{12} = 1$.]
  But the algebra is clearer and more general: a simple projection of the probabilities of conditioned events.
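The atomic probabilities and the independence of Y and the +/- indicator can be checked numerically. The following is a minimal sketch, not part of the original slides: the numerical values of the free parameters are arbitrary assumptions, and the dictionary encoding of paths and the function name `atom_probabilities` are hypothetical.

```python
from math import prod, isclose

# Arbitrary (assumed) values for the four free parameters; pi_i2 = 1 - pi_i1.
pi = {"01": 0.3, "11": 0.6, "21": 0.2, "31": 0.7}
for i in "0123":
    pi[i + "2"] = 1.0 - pi[i + "1"]

# Each atom (leaf) is listed with the edge probabilities met on its root-to-sink path.
paths = {
    "v5": ["02", "12"],
    "v6": ["02", "11"],
    "v7": ["01", "12", "21"],
    "v10": ["01", "11", "21"],
    "v11": ["01", "12", "22", "31"],
    "v14": ["01", "11", "22", "31"],
    "v12": ["01", "12", "22", "32"],
    "v13": ["01", "11", "22", "32"],
}

def atom_probabilities(pi, paths):
    """P(atom) = product of the edge probabilities along its path."""
    return {leaf: prod(pi[e] for e in edges) for leaf, edges in paths.items()}

p = atom_probabilities(pi, paths)
total = sum(p.values())  # the atoms partition the sample space

# Conditional distribution of Y given the gene is + or - regulated.
y_given_plus = [p[v] / pi["12"] for v in ("v5", "v7", "v11", "v12")]
y_given_minus = [p[v] / pi["11"] for v in ("v6", "v10", "v14", "v13")]
independent = all(isclose(a, b) for a, b in zip(y_given_plus, y_given_minus))

print("sum of atomic probabilities:", total)
print("p(y | +) equals p(y | -):", independent)
```

Changing the assumed parameter values leaves both checks unchanged, since they follow from the monomial structure rather than from the numbers chosen.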

Conjugate Bayesian Inference on CEGs using full data
  Algebra is widely used to help understand learning on BNs.
  Bayesian learning for CEGs exploits likelihood separation.
  Explicitly, the likelihood under complete random sampling is given by
  $l(\pi) = \prod_{u \in U} l_u(\pi_u)$, $\quad l_u(\pi_u) = \prod_{i \in u} \pi_{i,u}^{x(i,u)}$,
  where $x(i,u)$ is the number of units entering stage $u$ and proceeding along the edge labelled $(i,u)$, and $\sum_i \pi_{i,u} = 1$.
  Note that the likelihood has a nice monomial form (albeit with a simplex constraint).
  Independent Dirichlet priors $D(\alpha(u))$ on the vectors $\pi_u$ lead to independent Dirichlet $D(\alpha^*(u))$ posteriors, where
  $\alpha^*(i,u) = \alpha(i,u) + x(i,u)$.

Learning the topology of a CEG
  Under appropriate priors, and because of the monomial structure, the log marginal likelihood score of a CEG is linear in its stage components.
  Explicitly, for $\alpha = (\alpha_1, \ldots, \alpha_k)$, let
  $s(\alpha) = \log \Gamma\big(\sum_{i=1}^{k} \alpha_i\big)$ and $t(\alpha) = \sum_{i=1}^{k} \log \Gamma(\alpha_i)$.
  $\Psi(C) = \log p(C) = \sum_{u \in C} \Psi_u(C)$,
  $\Psi_u(C) = s(\alpha(u)) - s(\alpha^*(u)) + t(\alpha^*(u)) - t(\alpha(u))$,
  where $\alpha(u)$ and $\alpha^*(u)$ are the vectors with components $\alpha(i,u)$ and $\alpha^*(i,u)$.
  Conjugacy & linearity imply that e.g. MAP model selection using AHC or Integer Programming is simple & fast over the (albeit vast) class of CEGs (see Freeman & Smith, 11, 11a; Cowell and Smith, 13).
  These methods appear to provide consistent model estimation, as they do for BNs (uncompleted work!).

Generating functions to represent events
  The naive polynomial associated with a CEG $C$ is
  $G(c, C) = \sum_{a \in A} c_a\, p_a(\pi)$,
  where $c_a$ is an indicator on atom/leaf $a$ and $p_a(\pi)$ is its monomial probability.
  The probabilities of events $B$ are then projections $G_B(C)$ of this polynomial, where
  $G_B(C) \triangleq \sum_{\{a \in A \,:\, c_a = 1 \text{ iff } a \in B\}} c_a\, p_a(\pi) = \sum_{a \in B} p_a(\pi)$.
  Likelihoods from a random sample lead to consistent estimates and conjugate analyses whenever each of the observed events $G_B(C)$ within the sample has a monomial form.
  If not, then at least conjugacy breaks.
  Complete and ancestral sampling give this monomial form, but it is easiest to check the condition algebraically.

CEG generating functions to represent intrinsic events
  We can explore ci relationships (T&S (11)) and develop fast graphical propagation algorithms (T,C,S (08)) with respect to subgraphs of a regular $C$ for certain conditioning sets. These have a simple algebraic expression.
  Definition: A regular CEG $C$ is one where no root-to-leaf path contains two situations in the same stage.
  Definition: An event $B$ in a regular CEG $C$ is called intrinsic iff it is the union of all root-to-leaf paths whose edges all lie in a subset $E_B \subseteq E(C)$, the edge set of $C$.
  An intrinsic event $B$ can be represented by
  $c_a = \prod_{e \in E(C) \cap a} c_e(B)$.

Manifest polynomials, identifiability & non-ancestrality in BNs
  The geometry of BNs & CEGs is interesting under incomplete sampling, where part of the story is left systematically hidden.
  [Figure: BN diagrams, labelled with the citations M,S,vS (03); K,R&S (13); Z&S (11, 12); A&R (13).]
  Here the algebraic + semi-algebraic structure of the observed distribution is non-trivial and worthy of study.
  Potential unidentifiability: we can lose consistency and lack conjugacy.
  Geometrical insights help to understand & address these difficulties in BNs.
  Question: Can the same methods be used to better understand partially learned CEGs?
  Answer: Preliminary work by Riccomagno and Smith (09) suggests this is so!

Probabilities of events in our example
  A gene in a benign environment will survive unharmed, so
  P{organism survives unharmed} $\triangleq \phi = \pi_{02} + \pi_{01}\pi_{22}\pi_{31}$.
  Take a random sample of size $n$ from the CEG population, observing exactly $x$ that survive unharmed. Then the likelihood of $\phi$ is
  $l(\phi) = \phi^{x}(1 - \phi)^{n - x}$,
  so $\phi$ is consistently estimable and the posterior variance of $\phi \to 0$.
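The shrinking posterior variance can be illustrated with a short sketch. The slides do not specify a prior on the survival probability, so the Beta(1, 1) prior below is an assumption made only because it is conjugate to the binomial-type likelihood above; the sample sizes, the observed proportion and the function names are likewise hypothetical.

```python
def beta_posterior(x, n, a=1.0, b=1.0):
    """Posterior Beta parameters after x 'survive unharmed' out of n units.

    The Beta(a, b) prior is an illustrative assumption, not taken from the slides.
    """
    return a + x, b + (n - x)

def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var

# Hypothetical samples with the same observed proportion (0.8) and growing n.
variances = []
for n in (10, 100, 1000, 10000):
    x = 8 * n // 10
    a_post, b_post = beta_posterior(x, n)
    mean, var = beta_mean_var(a_post, b_post)
    variances.append(var)
    print(n, mean, var)
```

The variance falls roughly like 1/n, which is all the data can deliver here: the sample speaks only about the single polynomial in the edge probabilities, not about each edge probability separately.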
  So we can learn from data about $(\pi_{01}, \pi_{11}, \pi_{21}, \pi_{31})$ through the inverse image of the equation
  $\pi_{02} + \pi_{01}\pi_{22}\pi_{31} = \phi \iff \pi_{01}\,[1 - (1 - \pi_{21})\,\pi_{31}] = 1 - \phi$,
  but nothing else about the population probabilities.
  Note the induced algebraic constraint (cubic) + inequalities, because probabilities are $\ge 0$ (e.g. $\pi_{01} \ge 1 - \phi$).

Identifying Causes in a CEG
  Pearl defines a causal BN as one where rvs not downstream of a manipulated node X are unaffected by the manipulation.
  X is set to its manipulated value $\hat{x}$ with probability 1.
  The effect on downstream variables is identical to ordinary conditioning.
  Many controls don't fit this scenario. More usually, e.g.: "Whenever a unit is in a set A of positions, take it to an adjacent position B".
  The same definition can be used on CEGs (see Shafer 96), but much more generally.
  Algebra can then be used to check whether the potential impact of a manipulation in a CEG is theoretically estimable.

Causal CEGs: An Example
  Plan to force the environment to be hostile ($\pi_{01} = 1$ & $\pi_{02} = 0$) & to up-regulate the gene artificially ($\pi_{11} = 0$ & $\pi_{12} = 1$).
  The controlled CEG then becomes
  [Figure: controlled CEG in which the edges w0 (0) → w1 (1) → w3 (3, 4) have probability 1, followed by the edges labelled $\pi_{21}$, $\pi_{22}$ out of w3 and $\pi_{31}$, $\pi_{32}$ out of w4 (8, 9) into $w_\infty$.]
  Interest: P{full recovery after control} $= \gamma$. CEG causal $\Rightarrow \gamma \triangleq \pi_{22}\pi_{31}$.
  Observe whether sample units are fully healthy (or not) in the natural system. This has probability
  $\phi = \pi_{02} + \pi_{01}\pi_{22}\pi_{31} = (1 - \pi_{01}) + \pi_{01}\gamma \iff \gamma = \dfrac{\phi + \pi_{01} - 1}{\pi_{01}}$.
  Without prior information on $\pi_{01}$, all we can say is that $\gamma \le \phi$.
  Thwaites et al (10) and Thwaites (13) give many sufficient conditions to read identifiability of a cause.
  Remark: The inverse image might be enough! Algebra gives us this!

Unanswered questions
  Despite being a very important class, are CEGs themselves just a subclass of models which also share all the advantages of BNs, e.g. which can support interrogation, fast propagation, conjugate analysis & model selection, and share their inferential properties, estimability and causal expressiveness? Riccomagno and Smith (04, 09) take tentative steps.
  Can these regular classes of models be expressed simply as an algebraic entity? For then BNs and CEGs could be relegated to a depiction of subclasses of this class. I believe that this is so!
  Gorgen and Smith are beginning to investigate algebraically the model elaboration normally expressed graphically or implicitly.
  Can computer algebra be used here? I think this is so: subtrees and subCEGs ~ elimination.
  Are there maps of well known algebraic varieties into this domain that can be exploited?
  THANK YOU FOR YOUR ATTENTION!!

Selected References of team
  Barclay, LM, Hutton, JL & Smith, JQ (2014) "Chain Event Graphs for Informed Missingness", Bayesian Analysis (to appear).
  Cowell, RG & Smith, JQ (2013) "Causal discovery through MAP selection of stratified chain event graphs", CRiSM Res. Rep. (submitted, Electronic J. of Stats).
  Barclay, LM, Smith, JQ, Thwaites, PA & Nicholson, A (2013) "Dynamic Chain Event Graphs", CRiSM Res. Rep. and Math Archive.
  Barclay, R, Hutton, JL & Smith, JQ (2013) "Refining a Bayesian Network using a Chain Event Graph", IJAR, 54, 1300-1308.
  Zwiernik, P & Smith, JQ (2012) "Tree-cumulants and the identifiability of Bayesian tree models", Bernoulli, Vol. 18, No. 1, 290-321.
  Zwiernik, P & Smith, JQ (2011) "Implicit inequality constraints in a binary tree model", Electronic J. of Statistics, 5, 1276-1312.
  Thwaites, PA & Smith, JQ (2011) "Separation Theorems for Chain Event Graphs", CRiSM Res. Rep., Math Archive.

More selected References of team
  Freeman, G & Smith, JQ (2011) "Dynamic Staged Trees for Discrete Multivariate Time Series", Bayesian Analysis, Vol. 06, No. 02, 279-306.
  Freeman, G & Smith, JQ (2011) "Bayesian MAP Selection of Chain Event Graphs", J. Multivariate Analysis, 102, 1152-1165.
  Thwaites, P, Riccomagno, EM & Smith, JQ (2010) "Causal Analysis with Chain Event Graphs", Artificial Intelligence, 174, 889-909.
  Riccomagno, E & Smith, J (2009) "The Geometry of Causal Probability Trees that are Algebraically Constrained", in "Optimal Design & Related Areas in Optimization & Statistics", Springer, 131-152.
  Thwaites, P, Smith, JQ & Cowell, R (2008) "Propagation using Chain Event Graphs", Proceedings of 24th UAI, 546-553.
  Smith, JQ & Anderson, PE (2008) "Conditional independence and Chain Event Graphs", Artificial Intelligence, 172, 1, 42-68.
  Riccomagno, EM & Smith, JQ (2004) "Identifying a cause in models which are not simple Bayesian networks", Proceedings of IMPU, Perugia, July 04, 1315-22.

Propagation on BNs
  The usual algorithms are based on a triangulation step that adds undirected edges to the BN until it is decomposable.
  Use fast local propagation on learning the values of a subset of variables on this subset of BNs.
  Update the clique probability tables of the simple BN in a backward and forward sweep.
  [Figure: a BN on X1, ..., X6 and the structure derived from it for propagation.]
  Note: the algorithm works only on observations = subsets of values of rvs.

Propagation on CEGs
  Algorithms (Cowell et al, 2008) are based on deleting undirected edges in the CEG.
  Use fast local propagation on learning the values of a subset of edges on this simple CEG.
  Update the conditional probability tables of the simple CEG.
  [Figure: the CEG on w0, w1, w2, w3, w4, $w_\infty$ and the simple CEG obtained from it.]
  Note: the algorithm works only on observations = subsets of edges.
  Very much faster than BN algorithms when there are many asymmetries.

Formal definitions of stages and positions
  Two nodes $v, v'$ are in the same stage $u$ exactly when $X(v), X(v')$ have the same distribution under a bijection $\psi_u(v, v')$, where
  $\psi_u(v, v') : \mathbb{X}(v) = E(F(v, T)) \to \mathbb{X}(v') = E(F(v', T))$.
  In other words, the two nodes have identical probability distributions on their edges.
  Two nodes $v, v'$ are in the same position $w$ exactly when there exists a bijection $\phi_w(v, v')$ from $\Lambda(v, T)$, the set of paths in the tree from $v$ to a leaf node, to $\Lambda(v', T)$, the set of paths from $v'$ to a leaf node, such that all edges in all the paths are coloured, and the sequence of colours in any path is the same as that in its image under the bijection.

Formal definition of a staged tree
  A staged tree is a tree with stage set $L(T)$ and edges coloured as follows:
  When $v \in u \in L(T)$ but $u$ contains only one node, all edges emanating from $v$ are left uncoloured.
  When $u$ contains more than one node, all edges emanating from $v$ are coloured, such that two edges $e(v, v^*)$, $e(v', v'^*)$ have the same colour if and only if $\psi_u(e(v, v^*)) = e(v', v'^*)$.

Formal Definition of a Probability Graph
  The probability graph of a staged tree is a directed graph, possibly with some coloured edges.
  Each node represents a set of nodes from the probability tree in the same position in the staged tree.
  Its edges are constructed as follows:
  For each position $w$, choose a representative node $v(w)$.
  For each edge from $v(w)$ to $v'(w')$, construct a single edge $e(w, w')$, where $w' = w_\infty$ if $v'$ is a leaf node in the tree; otherwise $w'$ is the position of $v'$.
  The colour of the edge is the colour of the edge between $v$ and $v'$.
  So the number of edges in the probability tree is the same as in the staged tree.
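The construction of positions and of the probability graph can be sketched in code for the gene example. This is an illustration under stated assumptions rather than the authors' algorithm: the tree is encoded as a dictionary whose shape is read off the leaf labels of the example, the stage bijection is assumed to match edges by their labels, and a recursive "signature" of the coloured subtree stands in for the path bijection in the definition of a position. All function and variable names are hypothetical.

```python
# Situations map to their outgoing (edge label, child) pairs; leaves have no entry.
tree = {
    "v0": [("hostile", "v1"), ("benign", "v2")],
    "v1": [("+", "v3"), ("-", "v4")],
    "v2": [("+", "v5"), ("-", "v6")],
    "v3": [("die", "v7"), ("survive", "v8")],
    "v4": [("die", "v10"), ("survive", "v9")],
    "v8": [("full", "v11"), ("partial", "v12")],
    "v9": [("full", "v14"), ("partial", "v13")],
}
stage_of = {"v0": "u0", "v1": "u1", "v2": "u1", "v3": "u2",
            "v4": "u2", "v8": "u3", "v9": "u3"}

def signature(tree, stage_of, v):
    """Canonical description of the coloured subtree rooted at v.

    Two situations get equal signatures exactly when their subtrees agree
    in stage and edge label at every depth (assumed proxy for 'same position').
    """
    if v not in tree:
        return "leaf"
    children = sorted(tree[v], key=lambda edge: edge[0])
    return (stage_of[v],
            tuple((label, signature(tree, stage_of, c)) for label, c in children))

def probability_graph(tree, stage_of):
    """Group situations into positions and build the directed edges e(w, w')."""
    sig_to_pos, members = {}, {}
    for v in tree:
        s = signature(tree, stage_of, v)
        w = sig_to_pos.setdefault(s, f"w{len(sig_to_pos)}")
        members.setdefault(w, []).append(v)
    pos_of = {v: w for w, vs in members.items() for v in vs}
    edges = []
    for w, vs in members.items():
        representative = vs[0]  # any member of the position would do
        for label, child in tree[representative]:
            # Leaves are not in pos_of, so they all map to the sink.
            edges.append((w, label, pos_of.get(child, "w_inf")))
    return members, edges

members, edges = probability_graph(tree, stage_of)
for w, vs in members.items():
    print(w, vs)
for edge in edges:
    print(edge)
```

On this input the grouping reproduces the positions listed for the example: v1 and v2 share a stage but not a position, because their subtrees differ, while v3, v4 and v8, v9 are merged.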

A formal definition of the CEG
  The chain event graph is the mixed graph with:
  the same nodes as the probability graph;
  the same directed edges as the probability graph; and
  undirected edges drawn between different positions that are at the same stage.
  The colours of the edges are also inherited from the probability graph.