Conditional Probability Distributions
Eran Segal, Weizmann Institute

Last Time
- Local Markov assumptions – basic BN independencies
- d-separation – all independencies via graph structure
- G is an I-Map of P if and only if P factorizes over G
- I-equivalence – graphs with identical independencies
- Minimal I-Map
  - All distributions have I-Maps (sometimes more than one)
  - A minimal I-Map does not capture all independencies in P
- Perfect Map – not every distribution P has one
- PDAGs
  - Compact representation of classes of I-equivalent graphs
  - Algorithm for finding PDAGs

CPDs
- Thus far we ignored the representation of CPDs
- Today we cover the range of CPD representations:
  discrete, continuous, sparse, deterministic, linear

Table CPDs
- One entry for each joint assignment of X and Pa(X)
- For each pa_x: Σ_{x ∈ Val(X)} P(x | pa_x) = 1
- Most general representation: represents every discrete CPD
- Limitations
  - Cannot model continuous RVs
  - Number of parameters exponential in |Pa(X)|
  - Cannot model large in-degree dependencies
  - Ignores structure within the CPD
- Example (network I → S):

  P(I):       i0: 0.7    i1: 0.3

  P(S | I):   I    s0     s1
              i0   0.95   0.05
              i1   0.2    0.8

Structured CPDs
- Key idea: reduce parameters by modeling P(X | Pa_X) without explicitly
  listing all entries of the joint
- Lose expressive power (cannot represent every CPD)

Deterministic CPDs
- There is a function f: Val(Pa_X) → Val(X) such that
  P(x | pa_x) = 1 if x = f(pa_x), and 0 otherwise
- Examples
  - OR, AND, NAND functions
  - Z = Y + X (continuous variables)

Deterministic CPDs
- Replace spurious dependencies with deterministic CPDs
- Need to make sure that the deterministic CPD is compactly stored
- Example: replace P(S | T1, T2) with a deterministic T = OR(T1, T2)
  followed by P(S | T):

  P(S | T1, T2):   T1   T2   s0     s1
                   t0   t0   0.95   0.05
                   t0   t1   0.2    0.8
                   t1   t0   0.2    0.8
                   t1   t1   0.2    0.8

  P(T | T1, T2)    T1   T2   t0   t1
  (deterministic   t0   t0   1    0
   OR):            t0   t1   0    1
                   t1   t0   0    1
                   t1   t1   0    1

  P(S | T):        T    s0     s1
                   t0   0.95   0.05
                   t1   0.2    0.8

Deterministic CPDs
- Induce additional conditional independencies
- Example: T is any deterministic function of T1, T2
  (network T1, T2 → T → S1, S2): Ind(S1; S2 | T1, T2)
- Example: C is an XOR deterministic function of A, B:
  Ind(D; E | B, C), since observing B and C determines A
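As a sanity check of the table-replacement idea above, here is a minimal Python sketch (the variable names are mine, not the lecture's) verifying that the deterministic T = OR(T1, T2) followed by P(S | T) reproduces the original table P(S | T1, T2) exactly:

```python
# Original table CPD P(S | T1, T2): keys (t1, t2) -> (P(s0), P(s1))
p_s_given_t1t2 = {
    (0, 0): (0.95, 0.05),
    (0, 1): (0.2, 0.8),
    (1, 0): (0.2, 0.8),
    (1, 1): (0.2, 0.8),
}

# Decomposed form: deterministic T = OR(T1, T2), then P(S | T)
p_s_given_t = {0: (0.95, 0.05), 1: (0.2, 0.8)}

def decomposed(t1, t2):
    t = t1 | t2          # deterministic OR CPD
    return p_s_given_t[t]

# Every CPD row must sum to 1, and the decomposition must agree everywhere
for (t1, t2), row in p_s_given_t1t2.items():
    assert abs(sum(row) - 1.0) < 1e-9
    assert decomposed(t1, t2) == row
print("decomposition matches on all parent assignments")
```

The decomposition stores 2 + 4 numbers instead of 8, and the saving grows with the number of parents.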
Deterministic CPDs
- Induce additional conditional independencies
- Example: T is an OR deterministic function of T1, T2
  (network T1, T2 → T → S1, S2): Ind(S1; S2 | T1 = t1)
- These are context-specific independencies

Context-Specific Independencies
- Let X, Y, Z be pairwise disjoint sets of RVs
- Let C be a set of variables and c ∈ Val(C)
- X and Y are contextually independent given Z and c, denoted
  Ind_c(X; Y | Z, c), if:
  P(X | Y, Z, c) = P(X | Z, c) whenever P(Y, Z, c) > 0

Tree CPDs
- Example: CPD P(D | A, B, C)

  Full table (8 parameters):
  A    B    C    d0    d1
  a0   b0   c0   0.2   0.8
  a0   b0   c1   0.2   0.8
  a0   b1   c0   0.2   0.8
  a0   b1   c1   0.2   0.8
  a1   b0   c0   0.9   0.1
  a1   b0   c1   0.7   0.3
  a1   b1   c0   0.4   0.6
  a1   b1   c1   0.4   0.6

  Tree (4 parameters):
  A = a0:                  (0.2, 0.8)
  A = a1, B = b1:          (0.4, 0.6)
  A = a1, B = b0, C = c0:  (0.9, 0.1)
  A = a1, B = b0, C = c1:  (0.7, 0.3)

Gene Regulation: Simple Example
- Regulators (an activator and a repressor) determine the expression state of
  a regulated gene: State 1, State 2, or State 3 (measured by DNA microarray)

Regulation Tree (Segal et al., Nature Genetics '03)
- Regulation program as a tree: split on "Activator?" (activator expression),
  then on "Repressor?" (repressor expression), assigning module genes to
  State 1, State 2, or State 3

Respiration Module (Segal et al., Nature Genetics '03)
- Are module genes known targets of the predicted regulators?
- Hap4 + Msn4 are known to regulate the module genes

Rule CPDs
- A rule r is a pair (c; p) where c is an assignment to a subset of
  variables C and p ∈ [0,1]. Let Scope[r] = C
- A rule-based CPD P(X | Pa(X)) is a set of rules R s.t.
  - For each rule r ∈ R: Scope[r] ⊆ {X} ∪ Pa(X)
  - For each assignment (x, u) to {X} ∪ Pa(X) there is exactly one rule
    (c; p) ∈ R such that c is compatible with (x, u)
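The 4-leaf tree above can be sketched directly as a small function (my own encoding, with a0/a1 etc. as 0/1); expanding it recovers all 8 table rows:

```python
def tree_cpd(a, b, c):
    """Return (P(d0), P(d1)) by walking the tree: split on A, then B, then C."""
    if a == 0:
        return (0.2, 0.8)          # leaf: A = a0 (B, C irrelevant)
    if b == 1:
        return (0.4, 0.6)          # leaf: A = a1, B = b1 (C irrelevant)
    return (0.9, 0.1) if c == 0 else (0.7, 0.3)   # A = a1, B = b0

# Expand the 4-parameter tree back into the full 8-row table
table = {(a, b, c): tree_cpd(a, b, c)
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}
assert table[(0, 1, 1)] == (0.2, 0.8)   # all A = a0 rows share one leaf
assert table[(1, 0, 1)] == (0.7, 0.3)
```

The expanded table has 8 rows but only 4 distinct distributions, which is exactly the parameter saving the slide counts.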
- Then P(X = x | Pa(X) = u) = p

Rule CPDs: Example
- Let X be a variable with Pa(X) = {A, B, C}
  r1: (a1, b1, x0; 0.1)
  r2: (a0, c1, x0; 0.2)
  r3: (b0, c0, x0; 0.3)
  r4: (a1, b0, c1, x0; 0.4)
  r5: (a0, b1, c0; 0.5)
  r6: (a1, b1, x1; 0.9)
  r7: (a0, c1, x1; 0.8)
  r8: (b0, c0, x1; 0.7)
  r9: (a1, b0, c1, x1; 0.6)
- Note: each assignment maps to exactly one rule
- Rules cannot always be represented compactly within tree CPDs

Tree CPDs and Rule CPDs
- Can represent every discrete function
- Can be easily learned and dealt with in inference
- But some functions are not represented compactly
  - XOR in tree CPDs: cannot split in one step on (a0, b1) and (a1, b0)
- Alternative representations exist, e.g. complex logical rules

Context-Specific Independencies
- A tree CPD for D induces context-specific independencies:
  - In context A = a1 the tree does not split on C: Ind(D; C | A = a1)
  - In context A = a0 the tree does not split on B: Ind(D; B | A = a0)
- Reasoning by cases over A implies that Ind(B; C | A, D)

Independence of Causal Influence
- Causes X1,...,Xn with effect Y (network X1,...,Xn → Y)
- General case: Y has a complex dependency on X1,...,Xn
- Common case
  - Each Xi influences Y separately
  - The influences of X1,...,Xn are combined into an overall influence on Y

Example 1: Noisy OR
- Two independent effects X1, X2
- Y = y1 cannot happen unless at least one of X1, X2 occurs
- Failure probabilities multiply:
  P(Y = y0 | x1^1, x2^1) = P(Y = y0 | x1^1, x2^0) · P(Y = y0 | x1^0, x2^1)

  Y given X1, X2:   X1     X2     y0     y1
                    x1^0   x2^0   1      0
                    x1^0   x2^1   0.2    0.8
                    x1^1   x2^0   0.1    0.9
                    x1^1   x2^1   0.02   0.98

Noisy OR: Elaborate Representation
- Decompose via noisy intermediate variables X'1, X'2 and a deterministic OR:
  X1 → X'1, X2 → X'2, Y = OR(X'1, X'2)
- Noise parameters: 0.9 for X1, 0.8 for X2

  Noisy CPD 1:  P(X'1 = x1'^1 | x1^0) = 0,  P(X'1 = x1'^1 | x1^1) = 0.9
  Noisy CPD 2:  P(X'2 = x2'^1 | x2^0) = 0,  P(X'2 = x2'^1 | x2^1) = 0.8
  Deterministic OR:  Y = y1 iff X'1 = x1'^1 or X'2 = x2'^1

Noisy OR: Elaborate Representation
- The decomposition results in the same distribution:
  P(Y = y0 | x1^1, x2^1)
    = Σ_{X'1,X'2} P(y0, X'1, X'2 | x1^1, x2^1)
    = Σ_{X'1,X'2} P(X'1 | x1^1) · P(X'2 | x2^1) · P(y0 | X'1, X'2)
    = 0.1·0.2·1 + 0.1·0.8·0 + 0.9·0.2·0 + 0.9·0.8·0
    = 0.02
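The sum over X'1, X'2 in the decomposition can be checked mechanically; this sketch (names and structure are mine) enumerates the four joint assignments and recovers the table entry 0.02:

```python
from itertools import product

# Noise parameters from the slide: P(X'i = 1 | Xi = 1)
noise = {1: 0.9, 2: 0.8}

def p_y0_given_both_on():
    """P(Y = y0 | X1 = 1, X2 = 1), summing out the noisy intermediates."""
    total = 0.0
    for x1p, x2p in product((0, 1), repeat=2):
        p1 = noise[1] if x1p else 1 - noise[1]   # P(X'1 = x1p | X1 = 1)
        p2 = noise[2] if x2p else 1 - noise[2]   # P(X'2 = x2p | X2 = 1)
        y0 = 0.0 if (x1p or x2p) else 1.0        # deterministic OR: y0 iff both off
        total += p1 * p2 * y0
    return total

assert abs(p_y0_given_both_on() - 0.02) < 1e-9   # matches (1-0.9)*(1-0.8)
```

Only the assignment X'1 = X'2 = 0 contributes, which is why the answer is just the product of the two failure probabilities.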
Noisy OR: General Case
- Y is a binary variable with k binary parents X1,...,Xk
- The CPD P(Y | X1,...,Xk) is a noisy OR if there are k+1 noise parameters
  λ0, λ1,...,λk such that
  P(Y = y0 | X1,...,Xk) = (1 − λ0) · Π_{i: Xi = xi^1} (1 − λi)
  P(Y = y1 | X1,...,Xk) = 1 − (1 − λ0) · Π_{i: Xi = xi^1} (1 − λi)

Noisy OR Independencies
- In the decomposed network X1,...,Xn → X'1,...,X'n → Y:
  ∀ i ≠ j: Ind(Xi; Xj | Y = y0)

Generalized Linear Models
- Model is a soft version of a linear threshold function
- Example: logistic function over binary variables X1,...,Xk, Y:
  P(Y = y1 | X1,...,Xk) = 1 / (1 + exp(−(w0 + Σ_{i=1}^k wi·1{Xi = 1})))

General Formulation
- Let Y be a random variable with parents X1,...,Xn
- The CPD P(Y | X1,...,Xn) exhibits independence of causal influence (ICI)
  if it can be described via intermediate variables Z1,...,Zn and Z, where
  each Zi depends only on Xi, the CPD P(Z | Z1,...,Zn) is deterministic,
  and Y depends only on Z
  - Noisy OR: each Zi has a noise model, Z = OR(Z1,...,Zn), and Y = Z
    (identity CPD)
  - Logistic: Zi = wi·1{Xi = 1}, Z = Σi Zi, and Y is a sigmoid of Z
- Key advantage: O(n) parameters
- As stated, not all that useful: any complex CPD can be represented through
  a complex deterministic CPD

Continuous Variables
- One solution: discretize
  - Often requires too many value states
  - Loses domain structure
- Other solution: use a continuous function for P(X | Pa(X))
  - Can combine continuous and discrete variables, resulting in hybrid
    networks
  - Inference and learning may become more difficult

Gaussian Density Functions
- Among the most common continuous representations
- Univariate case: P(X) ~ N(μ; σ²) if
  p(x) = (1 / (√(2π)·σ)) · exp(−(x − μ)² / (2σ²))

Gaussian Density Functions
- A multivariate Gaussian distribution over X1,...,Xn has
  - A mean vector μ
  - An n×n positive definite covariance matrix Σ
    (positive definite: ∀ x ∈ R^n, x ≠ 0: xᵀΣx > 0)
- Joint density function:
  p(x) = (1 / ((2π)^{n/2} |Σ|^{1/2})) · exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))
- μi = E[Xi], Σii = Var[Xi], Σij = Cov[Xi, Xj] = E[XiXj] − E[Xi]E[Xj] (i ≠ j)

Gaussian Density Functions
- Marginal distributions are easy to compute: if
  P(X, Y) = N( (μX, μY); [ΣXX  ΣXY; ΣYX  ΣYY] )
  then P(X) = N(μX; ΣXX)
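The general noisy-OR formula is short enough to sketch directly; with leak λ0 = 0 and λ = (0.9, 0.8) it reproduces the two-parent table from the earlier example (the function name is mine):

```python
def noisy_or_y1(lams, leak, x):
    """P(Y = y1 | x) = 1 - (1 - leak) * prod over active parents of (1 - lam_i)."""
    p_y0 = 1 - leak
    for lam_i, x_i in zip(lams, x):
        if x_i == 1:                 # only active parents contribute
            p_y0 *= 1 - lam_i
    return 1 - p_y0

# Leak 0 and lams (0.9, 0.8) recover the two-parent noisy-OR table
assert abs(noisy_or_y1((0.9, 0.8), 0.0, (1, 1)) - 0.98) < 1e-9
assert abs(noisy_or_y1((0.9, 0.8), 0.0, (0, 1)) - 0.8) < 1e-9
```

A nonzero leak λ0 lets Y fire even when every parent is off, which the table form cannot express without an extra parent.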
- Independencies can be determined from the parameters: if X = X1,...,Xn
  have a joint normal distribution N(μ; Σ), then Ind(Xi; Xj) iff Σij = 0
- This does not hold in general for non-Gaussian distributions

Linear Gaussian CPDs
- Y is a continuous variable with parents X1,...,Xn
- Y has a linear Gaussian model if it can be described using parameters
  β0,...,βn and σ² such that
  P(Y | x1,...,xn) = N(β0 + β1·x1 + ... + βn·xn; σ²)
- Vector notation: P(Y | x) = N(β0 + βᵀx; σ²)
- Pros
  - Simple
  - Captures many interesting dependencies
- Cons
  - Fixed variance (the variance cannot depend on the parents' values)

Linear Gaussian Bayesian Networks
- A linear Gaussian Bayesian network is a Bayesian network where
  - All variables are continuous
  - All of the CPDs are linear Gaussians
- Key result: linear Gaussian models are equivalent to multivariate Gaussian
  density functions

Equivalence Theorem
- Y is a linear Gaussian of its parents X1,...,Xn:
  P(Y | x) = N(β0 + βᵀx; σ²)
- Assume that X1,...,Xn are jointly Gaussian with N(μ; Σ)
- Then
  - The marginal distribution of Y is Gaussian with N(μY; σY²), where
    μY = β0 + βᵀμ and σY² = σ² + βᵀΣβ
  - The joint distribution over {X, Y} is Gaussian, where
    Cov[Xi; Y] = Σ_{j=1}^n βj·Σij
- Hence linear Gaussian BNs define a joint Gaussian distribution

Converse Equivalence Theorem
- If {X, Y} have a joint Gaussian distribution, then
  P(Y | X) = N(β0 + βᵀX; σ²) for some β0, β, σ²
- Implications of equivalence
  - The joint distribution has a compact representation: O(n²)
  - We can easily transform back and forth between Gaussian distributions
    and linear Gaussian Bayesian networks
  - Representations may differ in parameters
  - Example: for the chain X1 → X2 → ... → Xn, the linear Gaussian BN has
    O(n) parameters while the equivalent Gaussian distribution has a full
    covariance matrix
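For a single parent the Equivalence Theorem formulas are easy to evaluate by hand; this sketch (the numbers are my own illustrative choices) computes μY, σY², and Cov[X; Y] from the theorem and cross-checks the mean by seeded Monte Carlo sampling:

```python
import random

# X ~ N(mu_x, var_x); Y | X ~ N(b0 + b1*X, sigma2)
mu_x, var_x = 1.0, 4.0
b0, b1, sigma2 = 2.0, 3.0, 1.0

mu_y = b0 + b1 * mu_x                # beta0 + beta^T mu        -> 5.0
var_y = sigma2 + b1 * var_x * b1     # sigma^2 + beta^T Sigma beta -> 37.0
cov_xy = b1 * var_x                  # sum_j beta_j * Sigma_ij  -> 12.0
assert (mu_y, var_y, cov_xy) == (5.0, 37.0, 12.0)

# Monte Carlo check of the marginal mean of Y (deterministic via the seed)
random.seed(0)
ys = [b0 + b1 * random.gauss(mu_x, var_x ** 0.5) + random.gauss(0.0, sigma2 ** 0.5)
      for _ in range(200_000)]
est = sum(ys) / len(ys)
assert abs(est - mu_y) < 0.1
```

The sampled mean lands near 5.0, as the theorem predicts; the same seeded check could be repeated for the variance and covariance.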
Linear Gaussian Hybrid Models
- Models with both continuous and discrete variables
  - Continuous variables with discrete parents: conditional linear
    Gaussians (CLG)
  - Discrete variables with continuous parents
- CLG CPD: Y a continuous variable, X = {X1,...,Xn} continuous parents,
  U = {U1,...,Um} discrete parents:
  ∀ u ∈ Val(U): P(Y | u, x) = N( a_{u,0} + Σ_{i=1}^n a_{u,i}·xi ; σu² )
- A conditional linear Gaussian Bayesian network is one where
  - Discrete variables have only discrete parents
  - Continuous variables have only CLG CPDs

Hybrid Models
- Continuous parents for discrete children
- Threshold model, e.g.
  P(Y = y1 | x) = 0.9 if x < 10, and 0.05 otherwise
- Linear sigmoid:
  P(Y = y1 | x1,...,xk) = 1 / (1 + exp(−(w0 + Σ_{i=1}^k wi·xi)))

Summary: CPD Models
- Deterministic functions
- Context-specific dependencies
- Independence of causal influence
  - Noisy OR
  - Logistic function
- CPDs capture additional domain structure
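A CLG CPD is just a table of linear Gaussian parameters indexed by the discrete parent assignment; this sketch (the parameter values u0/u1 are hypothetical, not from the lecture) evaluates the conditional density:

```python
import math

# One (a_{u,0}, (a_{u,1},...), sigma_u^2) row per discrete parent assignment u
params = {
    "u0": (0.0, (1.0,), 1.0),
    "u1": (5.0, (-2.0,), 0.5),
}

def clg_density(y, u, xs):
    """p(Y = y | U = u, X = xs) for a CLG CPD: N(a_{u,0} + sum a_{u,i} x_i; sigma_u^2)."""
    a0, a, var = params[u]
    mean = a0 + sum(ai * xi for ai, xi in zip(a, xs))
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# The density peaks at the conditional mean selected by the discrete parent
assert clg_density(1.0, "u0", (1.0,)) > clg_density(2.0, "u0", (1.0,))
```

Switching u swaps in a different intercept, slope, and variance, which is exactly the "one linear Gaussian per discrete context" structure of the CLG definition.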