Machine Learning Methods for
Conditional Independence Inference
Tokyo satellite meeting of WSC2013
4. 9. 2013
Kentaro TANAKA
(Milan STUDENY, Akimichi TAKEMURA, Tomonari SEI)
Abstract
Conditional independence is a fundamental concept in statistics and is applied to
simplify the structure of a model. In this paper, we deal with the implication problem of
conditional independence statements, that is, testing whether a conditional
independence statement is derived from a set of other conditional independence
statements. To solve this problem, we propose a new machine learning method. The
method is based on the idea that the implication problem can be transformed into an
easier problem by adding extra conditional independence statements to a given set of
conditional independence statements (..we skip this part). Furthermore, we also give
another method for this problem. The latter method is based on the idea that we can
remove unnecessary information about conditional independence statements to solve
the implication problem. We also discuss some computational results on our method.
1/20
Conditional independence implication problem
[Example]
Is the following relation true?
X ⊥⊥ Y | Z and X ⊥⊥ Z  ⟹  X ⊥⊥ (Y, Z)
The conditional independence implication problem
has been considered at least since the 1980s.
(Pearl & Paz (1987), Geiger et al. (1991), Studeny (1992), Matus (1994), ...)
In this talk, we only deal with discrete random variables.
We assume that the sample space is finite
and each point has positive probability.
(Positive probability functions for contingency tables)
2/20
Conditional Probabilities
X, Y, Z : random variables
p(X,Y,Z); p(X,Y), p(X,Z), p(Y,Z); p(X), p(Y), p(Z); p(∅) = 1 : probability functions (p > 0)
(p(∅) corresponds to no variable.)
[Example]
The conditional probability function of X given Z is defined as
  p(X | Z) = p(X,Z) / p(Z).
[Example]
The conditional probability function of X and Y given Z is defined as
  p(X,Y | Z) = p(X,Y,Z) / p(Z).
3/20
Conditional Independence
Definition
X is independent of Y:
  X ⊥⊥ Y  ⟺(def)  p(X,Y) = p(X)p(Y)
          ⟺  p(X,Y)p(∅) / (p(X)p(Y)) = 1
Definition
X is conditionally independent of Y given Z:
  X ⊥⊥ Y | Z  ⟺(def)  p(X,Y | Z) = p(X | Z) p(Y | Z)
              ⟺  p(X,Y,Z) / p(Z) = (p(X,Z) / p(Z)) · (p(Y,Z) / p(Z))
              ⟺  p(X,Y,Z)p(Z) / (p(X,Z)p(Y,Z)) = 1
4/20
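The definition above can be checked numerically for a concrete joint table. A minimal sketch (not part of the talk; the name `is_ci` is illustrative), using NumPy:

```python
import numpy as np

def is_ci(p, tol=1e-9):
    """Check X ⊥⊥ Y | Z for a joint table p[x, y, z]:
    p(X,Y,Z) p(Z) must equal p(X,Z) p(Y,Z) in every cell."""
    p_xz = p.sum(axis=1)                  # marginal p(X, Z)
    p_yz = p.sum(axis=0)                  # marginal p(Y, Z)
    p_z = p.sum(axis=(0, 1))              # marginal p(Z)
    lhs = p * p_z[None, None, :]
    rhs = p_xz[:, None, :] * p_yz[None, :, :]
    return bool(np.allclose(lhs, rhs, atol=tol))

# A table satisfying X ⊥⊥ Y | Z by construction: p = p(x|z) p(y|z) p(z)
q = np.array([[0.2, 0.6], [0.8, 0.4]])    # p(x|z), columns sum to 1
r = np.array([[0.5, 0.3], [0.5, 0.7]])    # p(y|z), columns sum to 1
pz = np.array([0.4, 0.6])
p = q[:, None, :] * r[None, :, :] * pz[None, None, :]
print(is_ci(p))   # True
```

Perturbing a single cell (and renormalizing) generally destroys the factorization, in which case `is_ci` returns False.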
Conditional Independence Relations
[Example]
Is the following relation true?
X ⊥⊥ Y | Z and X ⊥⊥ Z  ⟹  X ⊥⊥ (Y, Z)
(Proof)
• From X ⊥⊥ Y | Z and X ⊥⊥ Z we have
  p(X,Y,Z)p(Z) / (p(X,Z)p(Y,Z)) = 1  and  p(X,Z)p(∅) / (p(X)p(Z)) = 1.
• We obtain the following relation:
  1 = [p(X,Y,Z)p(Z) / (p(X,Z)p(Y,Z))] · [p(X,Z)p(∅) / (p(X)p(Z))] = p(X,Y,Z)p(∅) / (p(X)p(Y,Z)).
• This means X ⊥⊥ (Y, Z).
5/20
Imset (Integer-Valued Multiset)
We interpret the above discussion in terms of the exponents (powers) of
p(X,Y,Z), p(X,Y), p(X,Z), p(Y,Z), p(X), p(Y), p(Z), p(∅).
Imset
The method of "imsets" by Studeny provides a powerful algebraic
method for describing conditional independence statements.
For a given conditional independence statement, an imset is defined
as an integer-valued vector whose elements correspond to the powers
of the probability functions.
[Example]
X ⊥⊥ Y | Z  ⟺  p(X,Y,Z)p(Z) / (p(X,Z)p(Y,Z)) = 1
            ⟺  p(X,Y,Z)^1 p(X,Z)^(−1) p(Y,Z)^(−1) p(Z)^1 = 1
Imset of X ⊥⊥ Y | Z :
u⟨X,Y | Z⟩ = (1, 0, −1, −1, 0, 0, 1, 0)^T
over the coordinates (p(X,Y,Z), p(X,Y), p(X,Z), p(Y,Z), p(X), p(Y), p(Z), p(∅)).
6/20
Imset (Integer-Valued Multiset)
By using imsets and linear algebra,
we can derive conditional independence relations.
[Example]  X ⊥⊥ Y | Z and X ⊥⊥ Z  ⟹  X ⊥⊥ (Y, Z)

              X ⊥⊥ Y | Z     X ⊥⊥ Z     X ⊥⊥ (Y, Z)
p(X,Y,Z)          1     +       0     =       1
p(X,Y)            0             0             0
p(X,Z)           −1             1             0
p(Y,Z)           −1             0            −1
p(X)              0            −1            −1
p(Y)              0             0             0
p(Z)              1            −1             0
p(∅)              0             1             1
7/20
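The imset bookkeeping above is easy to mechanize. A small sketch (illustrative, not the authors' code) that builds semi-elementary imsets over the eight coordinates and verifies u⟨X,Y|Z⟩ + u⟨X,Z⟩ = u⟨X,(Y,Z)⟩:

```python
# Coordinate order as on the slides; "" stands for the empty set
SETS = ["XYZ", "XY", "XZ", "YZ", "X", "Y", "Z", ""]

def canon(s):
    """Canonical name of a subset given as a string of variable letters."""
    return "".join(c for c in "XYZ" if c in s)

def imset(A, B, C=""):
    """Semi-elementary imset u<A,B|C> = d_ABC + d_C - d_AC - d_BC."""
    u = [0] * len(SETS)
    for s, coef in [(A + B + C, 1), (C, 1), (A + C, -1), (B + C, -1)]:
        u[SETS.index(canon(s))] += coef
    return u

u1 = imset("X", "Y", "Z")          # X ⊥⊥ Y | Z
u2 = imset("X", "Z")               # X ⊥⊥ Z
u3 = imset("X", "YZ")              # X ⊥⊥ (Y, Z)
print(u1)                                       # [1, 0, -1, -1, 0, 0, 1, 0]
print([a + b for a, b in zip(u1, u2)] == u3)    # True
```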
Formulation as a Linear Programming Problem
Is the following relation true?
{Ai ⊥⊥ Bi | Ci}, i = 1, ..., I  ⟹  A ⊥⊥ B | C
The implication problem of conditional independence statements
can be formulated as the following linear programming problem.
Theorem 1 (Studeny (2005), Bouckaert et al. (2010))
Let u = Σ_{i=1}^{I} u⟨Ai,Bi | Ci⟩. If there exist non-negative rational numbers
k and {λ_v} such that
  k · u = u⟨A,B | C⟩ + Σ_{v ∈ E(N)} λ_v · v,
then A ⊥⊥ B | C holds.
• E(N) : the set of elementary imsets for n variables
• {Ai}, {Bi}, {Ci}, A, B, C ⊂ N
8/20
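Theorem 1 is directly machine-checkable as a linear-programming feasibility problem. A sketch for the three-variable case (illustrative names, and SciPy's floating-point solver stands in for the exact rational arithmetic a rigorous certificate would need):

```python
from itertools import combinations

import numpy as np
from scipy.optimize import linprog

VARS = "XYZ"
# All subsets of VARS, largest first; "" is the empty set
SETS = ["".join(s) for k in range(len(VARS), -1, -1) for s in combinations(VARS, k)]

def canon(s):
    return "".join(c for c in VARS if c in s)

def imset(A, B, C=""):
    """Semi-elementary imset u<A,B|C> as a vector over SETS."""
    u = np.zeros(len(SETS))
    for s, coef in [(A + B + C, 1), (C, 1), (A + C, -1), (B + C, -1)]:
        u[SETS.index(canon(s))] += coef
    return u

# E(N): elementary imsets u<a,b|C>, a, b singletons, C a subset of N \ {a,b}
elem = [imset(a, b, "".join(C))
        for a, b in combinations(VARS, 2)
        for k in range(len(VARS) - 1)
        for C in combinations([c for c in VARS if c not in (a, b)], k)]

def implied_by_theorem1(antecedents, target):
    """Feasibility of k*u = u<A,B|C> + sum(lambda_v * v), with k, lambda_v >= 0."""
    u = sum(imset(*t) for t in antecedents)
    A_eq = np.column_stack([u] + [-v for v in elem])
    res = linprog(np.zeros(1 + len(elem)), A_eq=A_eq, b_eq=imset(*target),
                  bounds=[(0, None)] * (1 + len(elem)))
    return res.status == 0

# The opening example: X ⊥⊥ Y | Z and X ⊥⊥ Z imply X ⊥⊥ (Y, Z)
print(implied_by_theorem1([("X", "Y", "Z"), ("X", "Z")], ("X", "YZ")))   # True
# X ⊥⊥ Y alone does not yield X ⊥⊥ Y | Z
print(implied_by_theorem1([("X", "Y")], ("X", "Y", "Z")))                # False
```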
Example 4.1 of Studeny (2005)
The method of Theorem 1 based on imsets is very powerful.
However, Theorem 1 gives a sufficient condition for the implication problem
of conditional independence statements.
Thus, even if we fail to show A ⊥⊥ B | C by the method of
Theorem 1, A ⊥⊥ B | C may be true.
[Example]
A, B, C, D : random variables
Is the following relation true?
A ⊥⊥ B | (C,D),  C ⊥⊥ D | A,  C ⊥⊥ D | B,  C ⊥⊥ D  ⟹  C ⊥⊥ D | (A,B)
The relation is true.
However, we cannot prove the relation
just by using the method of Theorem 1 (Studeny (2005)).
9/20
Conditional Independence
A ⊥⊥ B | C  ⟺(def)  p(A,B,C)p(C) / (p(A,C)p(B,C)) = 1
            ⟺  ∃ q, r  s.t.  p(A,B,C) = q(A,C) r(B,C)
(Proof)
• (⟹) is trivial.
• (⟸) p(A,B,C) = q(A,C) r(B,C) implies
  p(A,C) = q(A,C) Σ_B r(B,C),   p(B,C) = r(B,C) Σ_A q(A,C),
  p(C) = (Σ_A q(A,C)) (Σ_B r(B,C)).
  Therefore we obtain p(A,B,C)p(C) / (p(A,C)p(B,C)) = 1.
10/20
Conditional Independence
[Example]
C ⊥⊥ D | (A,B)  ⟺(def)  ∃ q, r  s.t.  p(A,B,C,D) = q(A,B,C) r(A,B,D)
If p(A,B,C,D) is decomposed into q(A,B,C) and r(A,B,D),
then C ⊥⊥ D | (A,B).
If p(A,B,C,D) has a finer decomposition, e.g.
q(A,B,C) = q′(A,C) r′(B,C) and r(A,B,D) = q″(A,D) r″(B,D),
then it also fits the definition of C ⊥⊥ D | (A,B).
11/20
Example 4.1 of Studeny (2005)
Is the following relation true?
A ⊥⊥ B | (C,D),  C ⊥⊥ D | A,  C ⊥⊥ D | B,  C ⊥⊥ D  ⟹  C ⊥⊥ D | (A,B)
(Proof)
Multiply the fractions for A ⊥⊥ B | (C,D), C ⊥⊥ D | A, C ⊥⊥ D | B
and the reciprocal of the fraction for C ⊥⊥ D:
1 = [p(A,B,C,D)p(C,D) / (p(A,C,D)p(B,C,D))] · [p(A,C,D)p(A) / (p(A,C)p(A,D))]
    · [p(B,C,D)p(B) / (p(B,C)p(B,D))] · [p(C)p(D) / (p(C,D)p(∅))]
  = p(A,B,C,D)p(A)p(B)p(C)p(D) / (p(A,C)p(B,C)p(A,D)p(B,D)).
Hence
p(A,B,C,D) = [p(A,C)p(B,C) / p(C)] · [p(A,D)p(B,D) / p(D)] · [1 / (p(A)p(B))].
The product of the first and third factors is a function of A, B, C,
and the second factor is a function of A, B, D. Therefore C ⊥⊥ D | (A,B).
12/20
Example 4.1 of Studeny (2005)
C ⊥⊥ D | (A,B)  ⟺  p(A,B,C,D)p(A,B) / (p(A,B,C)p(A,B,D)) = 1
                ⟺  ∃ q, r  s.t.  p(A,B,C,D) = q(A,B,C) r(A,B,D)
When we prove C ⊥⊥ D | (A,B), we can ignore the differences in the
representations of p(A,B,C), p(A,B,D), p(A,B),
i.e. the further decomposition of the above three functions can be ignored.
Thus, we can ignore the following elements in the imsets:
p(A,B,C), p(A,B,D), p(A,B), p(C),
p(A,C), p(A,D), p(A), p(D),
p(B,C), p(B,D), p(B), p(∅)
13/20
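The twelve ignorable coordinates are exactly the subsets that do not contain both C and D, which is easy to confirm programmatically (an illustrative sketch, not from the talk):

```python
from itertools import combinations

# All 16 subsets of {A, B, C, D}, largest first; "" is the empty set
subsets = ["".join(s) for k in range(4, -1, -1)
           for s in combinations("ABCD", k)]

# Only subsets containing both C and D matter for proving C ⊥⊥ D | (A,B)
relevant = [s for s in subsets if "C" in s and "D" in s]
ignorable = [s for s in subsets if not ("C" in s and "D" in s)]
print(relevant)         # ['ABCD', 'ACD', 'BCD', 'CD']
print(len(ignorable))   # 12
```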
Example 4.1 of Studeny (2005)
Can some combination of the imsets for A ⊥⊥ B | (C,D), C ⊥⊥ D | A,
C ⊥⊥ D | B and C ⊥⊥ D yield the imset for C ⊥⊥ D | (A,B)?

              A⊥⊥B|(C,D)   C⊥⊥D|A   C⊥⊥D|B   C⊥⊥D   ?⟹ C⊥⊥D|(A,B)
p(A,B,C,D)        1           0        0       0          1
p(B,C,D)         −1           0        1       0          0
p(A,C,D)         −1           1        0       0          0
p(A,B,D)          0           0        0       0         −1
p(A,B,C)          0           0        0       0         −1
p(C,D)            1           0        0       1          0
p(B,D)            0           0       −1       0          0
p(B,C)            0           0       −1       0          0
p(A,D)            0          −1        0       0          0
p(A,C)            0          −1        0       0          0
p(A,B)            0           0        0       0          1
p(D)              0           0        0      −1          0
p(C)              0           0        0      −1          0
p(B)              0           0        1       0          0
p(A)              0           1        0       0          0
p(∅)              0           0        0       1          0
14/20
Example 4.1 of Studeny (2005)
Restrict to the coordinates that contain both C and D
(the other twelve coordinates can be ignored):

              A⊥⊥B|(C,D) + C⊥⊥D|A + C⊥⊥D|B − C⊥⊥D = C⊥⊥D|(A,B)
p(A,B,C,D)        1           0         0       0        1
p(B,C,D)         −1           0         1       0        0
p(A,C,D)         −1           1         0       0        0
p(C,D)            1           0         0       1        0

u⟨A,B | (C,D)⟩ + u⟨C,D | A⟩ + u⟨C,D | B⟩ − u⟨C,D⟩ agrees with
u⟨C,D | (A,B)⟩ on these coordinates. True!!
16/20
Main result
We can generalize the above discussion as follows.
Is the following relation true?
{Ai ⊥⊥ Bi | Ci}, i = 1, ..., I  ⟹  A ⊥⊥ B | C
Definition
S (a subset of random variables) bridges A ⊥⊥ B | C
if S ∩ A ≠ ∅ and S ∩ B ≠ ∅.
Theorem 2
For a vector u, let u|_{ABC} be the restriction of u to
the elements which bridge A ⊥⊥ B | C.
If there exist rational numbers λ_1, ..., λ_I such that
  Σ_{i=1}^{I} λ_i · u⟨Ai,Bi | Ci⟩|_{ABC} = u⟨A,B | C⟩|_{ABC},
then A ⊥⊥ B | C holds.
17/20
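Theorem 2 reduces Example 4.1 to a small linear system over the bridging coordinates. A sketch (illustrative names; NumPy's least-squares solver stands in for exact rational arithmetic):

```python
from itertools import combinations

import numpy as np

VARS = "ABCD"
SETS = ["".join(s) for k in range(len(VARS), -1, -1) for s in combinations(VARS, k)]

def canon(s):
    return "".join(c for c in VARS if c in s)

def imset(A, B, C=""):
    """Semi-elementary imset u<A,B|C> as a vector over SETS."""
    u = np.zeros(len(SETS))
    for s, coef in [(A + B + C, 1), (C, 1), (A + C, -1), (B + C, -1)]:
        u[SETS.index(canon(s))] += coef
    return u

def implied_by_theorem2(antecedents, target):
    A, B, _ = target
    # restrict to the subsets that bridge A ⊥⊥ B | C
    rows = [i for i, S in enumerate(SETS)
            if any(c in S for c in A) and any(c in S for c in B)]
    M = np.column_stack([imset(*t)[rows] for t in antecedents])
    t = imset(*target)[rows]
    lam = np.linalg.lstsq(M, t, rcond=None)[0]
    return bool(np.allclose(M @ lam, t)), lam

# Example 4.1 of Studeny (2005): not provable by Theorem 1, provable here
ante = [("A", "B", "CD"), ("C", "D", "A"), ("C", "D", "B"), ("C", "D")]
ok, lam = implied_by_theorem2(ante, ("C", "D", "AB"))
print(ok)    # True; the coefficients lam are (1, 1, 1, -1)
```

Note that the coefficient of u⟨C,D⟩ is negative, which Theorem 2 allows but Theorem 1 does not.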
Other Examples
A ⊥⊥ B | C,  A ⊥⊥ C | B  ⟹  A ⊥⊥ B

A ⊥⊥ B | C,  A ⊥⊥ B | D,  A ⊥⊥ C | E,  D ⊥⊥ E | A,
A ⊥⊥ C | B,  A ⊥⊥ D | B,  A ⊥⊥ E | C,  D ⊥⊥ E  ⟹  A ⊥⊥ E | D

A ⊥⊥ B | (D,E),  A ⊥⊥ B | D,  A ⊥⊥ E | C,  C ⊥⊥ E | A,
D ⊥⊥ E | A,  D ⊥⊥ E  ⟹  A ⊥⊥ E | (B,D)
...
18/20
Conclusions
We introduced the method of imsets by Studeny.
We can derive conditional independence relations
automatically by using imsets and linear algebra.
We gave a new machine learning method
for the implication problem of
conditional independence statements.
Our method broadens the applicability of techniques
based on imsets for the conditional independence
implication problem.
19/20
References
[1] Dan Geiger, Azaria Paz, Judea Pearl,
Axioms and algorithms for inferences involving probabilistic independence,
Information and Computation, Volume 91, Issue 1, Pages 128–141, 1991.
[2] Dan Geiger, Judea Pearl,
Logical and algorithmic properties of conditional independence and graphical models,
The Annals of Statistics, Volume 21, Issue 4, Pages 2001–2021, 1993.
[3] Francesco M. Malvestuto, A unique formal system for binary decomposition of database relations,
probability distributions and graphs, Information Sciences, Volume 59, Pages 21–52, 1992;
and Francesco M. Malvestuto, Milan Studeny,
Comment on "A unique formal ... graphs", Information Sciences, Volume 63, Pages 1–2, 1992.
[4] Frantisek Matus, Stochastic independence, algebraic independence and abstract connectedness,
Theoretical Computer Science, Volume 134, Issue 2, Pages 455–471, 1994.
[5] Judea Pearl, Azaria Paz, Graphoids: graph-based logic for reasoning about relevance relations,
Advances in Artificial Intelligence, Volume II (B. Du Boulay, D. Hogg, L. Steels, eds.),
North-Holland, Elsevier, Pages 357–363, 1987.
[6] Mathias Niepert, Dirk Van Gucht, Marc Gyssens,
Logical and Algorithmic Properties of Stable Conditional Independence,
International Journal of Approximate Reasoning, Volume 51, Issue 5, Pages 531–543, 2010.
[7] Milan Studeny, Conditional independence relations have no finite complete characterization,
Information Theory, Statistical Decision Functions and Random Processes,
Transactions of the 11th Prague Conference, Volume B
(S. Kubík, J. Á. Víšek, eds.), Kluwer, Pages 377–396, 1992.
[8] Milan Studeny, Probabilistic Conditional Independence Structures, Springer-Verlag, London, 2005.
[9] Remco Bouckaert, Raymond Hemmecke, Silvia Lindner, Milan Studeny,
Efficient Algorithms for Conditional Independence Inference,
Journal of Machine Learning Research, Volume 11, Pages 3453–3479, 2010.
20/20
Conditional Independence Relations
We can also show the converse relation!
X ⊥⊥ (Y, Z)  ⟹  X ⊥⊥ Y | Z and X ⊥⊥ Z
(Proof)
• 1 = [p(X,Y,Z)p(Z) / (p(X,Z)p(Y,Z))] · [p(X,Z)p(∅) / (p(X)p(Z))]
    = p(X,Y,Z)p(∅) / (p(X)p(Y,Z))
• 0 = E[log p(X,Y,Z)p(Z) / (p(X,Z)p(Y,Z))] + E[log p(X,Z)p(∅) / (p(X)p(Z))]
    = E[log p(X,Y,Z)p(∅) / (p(X)p(Y,Z))],
  where each expectation is a KL-divergence.
• KL-divergence is non-negative and the equality holds iff the fraction is 1.
• Then we have p(X,Y,Z)p(Z) / (p(X,Z)p(Y,Z)) = 1, i.e. X ⊥⊥ Y | Z,
  and p(X,Z)p(∅) / (p(X)p(Z)) = 1, i.e. X ⊥⊥ Z.
21/20
Conditional Independence Structures
[Example]
X, Y, Z ∈ {0, 1} : random variables

X  Y  Z  p(X,Y,Z)
0  0  0  0.25
0  0  1  0
0  1  0  0
0  1  1  0.25
1  0  0  0
1  0  1  0.25
1  1  0  0.25
1  1  1  0

The (conditional) independence structure for p(X,Y,Z):
X ⊥⊥ Y,  X ⊥⊥ Z,  Y ⊥⊥ Z hold, while
X ⊥⊥ Y | Z,  X ⊥⊥ Z | Y,  Y ⊥⊥ Z | X and
X ⊥⊥ (Y,Z),  Y ⊥⊥ (X,Z),  Z ⊥⊥ (X,Y) do not.
22/20
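The structure above can be verified directly (an illustrative sketch, not from the talk): the table is the parity distribution, uniform on the four points with Z = X ⊕ Y.

```python
import numpy as np

# p = 1/4 on the four points with z == x XOR y (the table above)
p = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        p[x, y, x ^ y] = 0.25

def indep(m):
    """Check independence for a 2-D marginal table m."""
    return bool(np.allclose(m, np.outer(m.sum(axis=1), m.sum(axis=0))))

print(indep(p.sum(axis=2)))   # X ⊥⊥ Y : True
print(indep(p.sum(axis=1)))   # X ⊥⊥ Z : True
print(indep(p.sum(axis=0)))   # Y ⊥⊥ Z : True

# X ⊥⊥ (Y, Z) fails: p(x, y, z) != p(x) p(y, z)
px = p.sum(axis=(1, 2))
pyz = p.sum(axis=0)
print(bool(np.allclose(p, px[:, None, None] * pyz[None, :, :])))   # False
```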
Notation for n Random Variables
Let us consider an n-way contingency table.
• X1, ..., Xn : n random variables
• N = {1, 2, ..., n}
• p(N) = p(X_N) = p(X1, ..., Xn) : joint probability function
• p(A) = p(X_A) : marginal probability function for A ⊆ N
We consider the case where p(X_N) > 0.
We abbreviate A ∪ B as AB.
For disjoint subsets A, B, C ⊆ N,
we abbreviate X_A ⊥⊥ X_B | X_C as A ⊥⊥ B | C.
23/20
Conditional Independence
A, B, C ⊂ N = {1, ..., n} : disjoint subsets
Definition
A ⊥⊥ B  ⟺(def)  p(AB) = p(A)p(B)
        ⟺  p(AB)p(∅) / (p(A)p(B)) = 1
        ⟺  p(AB)^1 p(∅)^1 p(A)^(−1) p(B)^(−1) = 1
Definition
A ⊥⊥ B | C  ⟺(def)  p(AB | C) = p(A | C) p(B | C)
            ⟺  p(ABC)p(C) / (p(AC)p(BC)) = 1
            ⟺  p(ABC)^1 p(C)^1 p(AC)^(−1) p(BC)^(−1) = 1
24/20
Imsets
For E, F ⊂ N, define δ_E : 2^N → ℝ as δ_E(F) = 1 (if E = F), 0 (otherwise).
A, B, C ⊂ N : disjoint subsets
Definition: semi-elementary imset
u⟨A,B | C⟩ = δ_{ABC} + δ_C − δ_{AC} − δ_{BC}
We can regard u⟨A,B | C⟩ as a 2^n-dimensional integer vector:
( 0, ..., 0, 1, 0, ..., 0, −1, 0, ..., 0, −1, 0, ..., 0, 1, 0, ..., 0 )
with the entries 1, −1, −1, 1 at the coordinates ABC, AC, BC, C
among the subsets of N (including ∅).
Definition: elementary imset
If A and B are singletons (say a and b),
the imset u⟨a,b | C⟩ is called elementary.
25/20
The Elementary Imsets : 3 vars
The six elementary imsets, over the coordinates
p(X,Y,Z), p(X,Y), p(X,Z), p(Y,Z), p(X), p(Y), p(Z), p(∅):

            u⟨Y,Z|X⟩  u⟨X,Z|Y⟩  u⟨X,Y|Z⟩  u⟨X,Y⟩  u⟨X,Z⟩  u⟨Y,Z⟩
p(X,Y,Z)        1         1         1        0       0       0
p(X,Y)         −1        −1         0        1       0       0
p(X,Z)         −1         0        −1       0        1       0
p(Y,Z)          0        −1        −1       0        0       1
p(X)            1         0         0      −1      −1        0
p(Y)            0         1         0      −1       0       −1
p(Z)            0         0         1       0      −1       −1
p(∅)            0         0         0       1       1        1
26/20
The Elementary Imsets : 4 vars
(Figure omitted: the elementary imsets for four variables a, b, c, d.)
27/20