Machine Learning Methods for
Conditional Independence Inference
Tokyo satellite meeting of WSC2013
4. 9. 2013
Kentaro TANAKA
(Milan STUDENY, Akimichi TAKEMURA, Tomonari SEI)
Abstract
Conditional independence is a fundamental concept in statistics and is applied to
simplify the structure of a model. In this paper, we deal with the implication problem of
conditional independence statements, that is, testing whether a conditional
independence statement is derived from a set of other conditional independence
statements. To solve this problem, we propose a new machine learning method. The
method is based on the idea that the implication problem can be transformed into an
easier problem by adding extra conditional independence statements to a given set of
conditional independence statements (..we skip this part). Furthermore, we also give
another method for this problem. The latter method is based on the idea that we can
remove unnecessary information about conditional independence statements to solve
the implication problem. We also discuss some computational results on our method.
1/20
Conditional independence implication problem
[Example]
Is the following relation true?
X ⊥⊥ Y | Z and X ⊥⊥ Z  ⟹  X ⊥⊥ (Y, Z)
The conditional independence implication problem
has been considered at least since the 1980s.
(Pearl & Paz (1987), Geiger et al. (1991), Studeny (1992), Matus (1994), ...)
In this talk, we only deal with discrete random variables.
We assume that the sample space is finite
and each point has positive probability.
(Positive probability functions for contingency tables)
2/20
Conditional Probabilities
X, Y, Z : random variables
p(X,Y,Z); p(X,Y), p(X,Z), p(Y,Z); p(X), p(Y), p(Z); p(∅) = 1 : probability functions (p > 0)
(p(∅) corresponds to no variable.)
[Example]
The conditional probability function of X given Z is defined as
  p(X | Z) = p(X,Z) / p(Z).
[Example]
The conditional probability function of X and Y given Z is defined as
  p(X,Y | Z) = p(X,Y,Z) / p(Z).
3/20
Conditional Independence
Definition
X is independent of Y:
  X ⊥⊥ Y  ⟺(def)  p(X,Y) = p(X)p(Y)
          ⟺  p(X,Y)p(∅) / (p(X)p(Y)) = 1
Definition
X is conditionally independent of Y given Z:
  X ⊥⊥ Y | Z  ⟺(def)  p(X,Y | Z) = p(X | Z) p(Y | Z)
              ⟺  p(X,Y,Z) / p(Z) = (p(X,Z) / p(Z)) · (p(Y,Z) / p(Z))
              ⟺  p(X,Y,Z)p(Z) / (p(X,Z)p(Y,Z)) = 1
4/20
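The definition above can be checked numerically for a concrete joint table. A minimal sketch (not part of the talk; the name `is_ci` is illustrative), using NumPy:

```python
import numpy as np

def is_ci(p, tol=1e-9):
    """Check X ⊥⊥ Y | Z for a joint table p[x, y, z]:
    p(X,Y,Z) p(Z) must equal p(X,Z) p(Y,Z) in every cell."""
    p_xz = p.sum(axis=1)                  # marginal p(X, Z)
    p_yz = p.sum(axis=0)                  # marginal p(Y, Z)
    p_z = p.sum(axis=(0, 1))              # marginal p(Z)
    lhs = p * p_z[None, None, :]
    rhs = p_xz[:, None, :] * p_yz[None, :, :]
    return bool(np.allclose(lhs, rhs, atol=tol))

# A table satisfying X ⊥⊥ Y | Z by construction: p = p(x|z) p(y|z) p(z)
q = np.array([[0.2, 0.6], [0.8, 0.4]])    # p(x|z), columns sum to 1
r = np.array([[0.5, 0.3], [0.5, 0.7]])    # p(y|z), columns sum to 1
pz = np.array([0.4, 0.6])
p = q[:, None, :] * r[None, :, :] * pz[None, None, :]
print(is_ci(p))   # True
```

Perturbing a single cell (and renormalizing) generally destroys the factorization, in which case `is_ci` returns False.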
Conditional Independence Relations
[Example]
Is the following relation true?
X ⊥⊥ Y | Z and X ⊥⊥ Z  ⟹  X ⊥⊥ (Y, Z)
(Proof)
• From X ⊥⊥ Y | Z and X ⊥⊥ Z we have
  p(X,Y,Z)p(Z) / (p(X,Z)p(Y,Z)) = 1  and  p(X,Z)p(∅) / (p(X)p(Z)) = 1.
• We obtain the following relation:
  1 = [p(X,Y,Z)p(Z) / (p(X,Z)p(Y,Z))] · [p(X,Z)p(∅) / (p(X)p(Z))] = p(X,Y,Z)p(∅) / (p(X)p(Y,Z)).
• This means X ⊥⊥ (Y, Z).
5/20
Imset (Integer-Valued Multiset)
We interpret the above discussion in terms of the exponents (powers) of
p(X,Y,Z), p(X,Y), p(X,Z), p(Y,Z), p(X), p(Y), p(Z), p(∅).
Imset
The method of "imsets" by Studeny provides a powerful algebraic
method for describing conditional independence statements.
For a given conditional independence statement, an imset is defined
as an integer-valued vector whose elements correspond to the powers
of the probability functions.
[Example]
X ⊥⊥ Y | Z  ⟺  p(X,Y,Z)p(Z) / (p(X,Z)p(Y,Z)) = 1
            ⟺  p(X,Y,Z)^1 p(X,Z)^(−1) p(Y,Z)^(−1) p(Z)^1 = 1
Imset of X ⊥⊥ Y | Z :
u⟨X,Y | Z⟩ = (1, 0, −1, −1, 0, 0, 1, 0)^T
over the coordinates (p(X,Y,Z), p(X,Y), p(X,Z), p(Y,Z), p(X), p(Y), p(Z), p(∅)).
6/20
Imset (Integer-Valued Multiset)
By using imsets and linear algebra,
we can derive conditional independence relations.
[Example]  X ⊥⊥ Y | Z and X ⊥⊥ Z  ⟹  X ⊥⊥ (Y, Z)

              X ⊥⊥ Y | Z     X ⊥⊥ Z     X ⊥⊥ (Y, Z)
p(X,Y,Z)          1     +       0     =       1
p(X,Y)            0             0             0
p(X,Z)           −1             1             0
p(Y,Z)           −1             0            −1
p(X)              0            −1            −1
p(Y)              0             0             0
p(Z)              1            −1             0
p(∅)              0             1             1
7/20
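The imset bookkeeping above is easy to mechanize. A small sketch (illustrative, not the authors' code) that builds semi-elementary imsets over the eight coordinates and verifies u⟨X,Y|Z⟩ + u⟨X,Z⟩ = u⟨X,(Y,Z)⟩:

```python
# Coordinate order as on the slides; "" stands for the empty set
SETS = ["XYZ", "XY", "XZ", "YZ", "X", "Y", "Z", ""]

def canon(s):
    """Canonical name of a subset given as a string of variable letters."""
    return "".join(c for c in "XYZ" if c in s)

def imset(A, B, C=""):
    """Semi-elementary imset u<A,B|C> = d_ABC + d_C - d_AC - d_BC."""
    u = [0] * len(SETS)
    for s, coef in [(A + B + C, 1), (C, 1), (A + C, -1), (B + C, -1)]:
        u[SETS.index(canon(s))] += coef
    return u

u1 = imset("X", "Y", "Z")          # X ⊥⊥ Y | Z
u2 = imset("X", "Z")               # X ⊥⊥ Z
u3 = imset("X", "YZ")              # X ⊥⊥ (Y, Z)
print(u1)                                       # [1, 0, -1, -1, 0, 0, 1, 0]
print([a + b for a, b in zip(u1, u2)] == u3)    # True
```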
Formulation as a Linear Programming Problem
Is the following relation true?
{Ai ⊥⊥ Bi | Ci}, i = 1, ..., I  ⟹  A ⊥⊥ B | C
The implication problem of conditional independence statements
can be formulated as the following linear programming problem.
Theorem 1 (Studeny (2005), Bouckaert et al. (2010))
Let u = Σ_{i=1}^{I} u⟨Ai,Bi | Ci⟩. If there exist non-negative rational numbers
k and {λ_v} such that
  k · u = u⟨A,B | C⟩ + Σ_{v ∈ E(N)} λ_v · v,
then A ⊥⊥ B | C holds.
• E(N) : the set of elementary imsets for n variables
• {Ai}, {Bi}, {Ci}, A, B, C ⊂ N
8/20
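Theorem 1 is directly machine-checkable as a linear-programming feasibility problem. A sketch for the three-variable case (illustrative names, and SciPy's floating-point solver stands in for the exact rational arithmetic a rigorous certificate would need):

```python
from itertools import combinations

import numpy as np
from scipy.optimize import linprog

VARS = "XYZ"
# All subsets of VARS, largest first; "" is the empty set
SETS = ["".join(s) for k in range(len(VARS), -1, -1) for s in combinations(VARS, k)]

def canon(s):
    return "".join(c for c in VARS if c in s)

def imset(A, B, C=""):
    """Semi-elementary imset u<A,B|C> as a vector over SETS."""
    u = np.zeros(len(SETS))
    for s, coef in [(A + B + C, 1), (C, 1), (A + C, -1), (B + C, -1)]:
        u[SETS.index(canon(s))] += coef
    return u

# E(N): elementary imsets u<a,b|C>, a, b singletons, C a subset of N \ {a,b}
elem = [imset(a, b, "".join(C))
        for a, b in combinations(VARS, 2)
        for k in range(len(VARS) - 1)
        for C in combinations([c for c in VARS if c not in (a, b)], k)]

def implied_by_theorem1(antecedents, target):
    """Feasibility of k*u = u<A,B|C> + sum(lambda_v * v), with k, lambda_v >= 0."""
    u = sum(imset(*t) for t in antecedents)
    A_eq = np.column_stack([u] + [-v for v in elem])
    res = linprog(np.zeros(1 + len(elem)), A_eq=A_eq, b_eq=imset(*target),
                  bounds=[(0, None)] * (1 + len(elem)))
    return res.status == 0

# The opening example: X ⊥⊥ Y | Z and X ⊥⊥ Z imply X ⊥⊥ (Y, Z)
print(implied_by_theorem1([("X", "Y", "Z"), ("X", "Z")], ("X", "YZ")))   # True
# X ⊥⊥ Y alone does not yield X ⊥⊥ Y | Z
print(implied_by_theorem1([("X", "Y")], ("X", "Y", "Z")))                # False
```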
Example 4.1 of Studeny (2005)
The method of Theorem 1 based on imsets is very powerful.
However, Theorem 1 gives a sufficient condition for the implication problem
of conditional independence statements.
Thus, even if we fail to show A ⊥⊥ B | C by the method of
Theorem 1, A ⊥⊥ B | C may be true.
[Example]
A, B, C, D : random variables
Is the following relation true?
A ⊥⊥ B | (C,D),  C ⊥⊥ D | A,  C ⊥⊥ D | B,  C ⊥⊥ D  ⟹  C ⊥⊥ D | (A,B)
The relation is true.
However, we cannot prove the relation
just by using the method of Theorem 1 (Studeny (2005)).
9/20
Conditional Independence
A ⊥⊥ B | C  ⟺(def)  p(A,B,C)p(C) / (p(A,C)p(B,C)) = 1
            ⟺  ∃ q, r  s.t.  p(A,B,C) = q(A,C) r(B,C)
(Proof)
• (⟹) is trivial.
• (⟸) p(A,B,C) = q(A,C) r(B,C) implies
  p(A,C) = q(A,C) Σ_B r(B,C),   p(B,C) = r(B,C) Σ_A q(A,C),
  p(C) = (Σ_A q(A,C)) (Σ_B r(B,C)).
  Therefore we obtain p(A,B,C)p(C) / (p(A,C)p(B,C)) = 1.
10/20
Conditional Independence
[Example]
C ⊥⊥ D | (A,B)  ⟺(def)  ∃ q, r  s.t.  p(A,B,C,D) = q(A,B,C) r(A,B,D)
If p(A,B,C,D) is decomposed into q(A,B,C) and r(A,B,D),
then C ⊥⊥ D | (A,B).
If p(A,B,C,D) has a finer decomposition, e.g.
q(A,B,C) = q′(A,C) r′(B,C) and r(A,B,D) = q″(A,D) r″(B,D),
then it also fits the definition of C ⊥⊥ D | (A,B).
11/20
Example 4.1 of Studeny (2005)
Is the following relation true?
A ⊥⊥ B | (C,D),  C ⊥⊥ D | A,  C ⊥⊥ D | B,  C ⊥⊥ D  ⟹  C ⊥⊥ D | (A,B)
(Proof)
Multiply the fractions for A ⊥⊥ B | (C,D), C ⊥⊥ D | A, C ⊥⊥ D | B
and the reciprocal of the fraction for C ⊥⊥ D:
1 = [p(A,B,C,D)p(C,D) / (p(A,C,D)p(B,C,D))] · [p(A,C,D)p(A) / (p(A,C)p(A,D))]
    · [p(B,C,D)p(B) / (p(B,C)p(B,D))] · [p(C)p(D) / (p(C,D)p(∅))]
  = p(A,B,C,D)p(A)p(B)p(C)p(D) / (p(A,C)p(B,C)p(A,D)p(B,D)).
Hence
p(A,B,C,D) = [p(A,C)p(B,C) / p(C)] · [p(A,D)p(B,D) / p(D)] · [1 / (p(A)p(B))].
The product of the first and third factors is a function of A, B, C,
and the second factor is a function of A, B, D. Therefore C ⊥⊥ D | (A,B).
12/20
Example 4.1 of Studeny (2005)
C ⊥⊥ D | (A,B)  ⟺  p(A,B,C,D)p(A,B) / (p(A,B,C)p(A,B,D)) = 1
                ⟺  ∃ q, r  s.t.  p(A,B,C,D) = q(A,B,C) r(A,B,D)
When we prove C ⊥⊥ D | (A,B), we can ignore the differences in the
representations of p(A,B,C), p(A,B,D), p(A,B),
i.e. the further decomposition of the above three functions can be ignored.
Thus, we can ignore the following elements in the imsets:
p(A,B,C), p(A,B,D), p(A,B), p(C),
p(A,C), p(A,D), p(A), p(D),
p(B,C), p(B,D), p(B), p(∅)
13/20
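The twelve ignorable coordinates are exactly the subsets that do not contain both C and D, which is easy to confirm programmatically (an illustrative sketch, not from the talk):

```python
from itertools import combinations

# All 16 subsets of {A, B, C, D}, largest first; "" is the empty set
subsets = ["".join(s) for k in range(4, -1, -1)
           for s in combinations("ABCD", k)]

# Only subsets containing both C and D matter for proving C ⊥⊥ D | (A,B)
relevant = [s for s in subsets if "C" in s and "D" in s]
ignorable = [s for s in subsets if not ("C" in s and "D" in s)]
print(relevant)         # ['ABCD', 'ACD', 'BCD', 'CD']
print(len(ignorable))   # 12
```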
Example 4.1 of Studeny (2005)
Can some combination of the imsets for A ⊥⊥ B | (C,D), C ⊥⊥ D | A,
C ⊥⊥ D | B and C ⊥⊥ D yield the imset for C ⊥⊥ D | (A,B)?

              A⊥⊥B|(C,D)   C⊥⊥D|A   C⊥⊥D|B   C⊥⊥D   ?⟹ C⊥⊥D|(A,B)
p(A,B,C,D)        1           0        0       0          1
p(B,C,D)         −1           0        1       0          0
p(A,C,D)         −1           1        0       0          0
p(A,B,D)          0           0        0       0         −1
p(A,B,C)          0           0        0       0         −1
p(C,D)            1           0        0       1          0
p(B,D)            0           0       −1       0          0
p(B,C)            0           0       −1       0          0
p(A,D)            0          −1        0       0          0
p(A,C)            0          −1        0       0          0
p(A,B)            0           0        0       0          1
p(D)              0           0        0      −1          0
p(C)              0           0        0      −1          0
p(B)              0           0        1       0          0
p(A)              0           1        0       0          0
p(∅)              0           0        0       1          0
14/20
Example 4.1 of Studeny (2005)
Restrict to the coordinates that contain both C and D
(the other twelve coordinates can be ignored):

              A⊥⊥B|(C,D) + C⊥⊥D|A + C⊥⊥D|B − C⊥⊥D = C⊥⊥D|(A,B)
p(A,B,C,D)        1           0         0       0        1
p(B,C,D)         −1           0         1       0        0
p(A,C,D)         −1           1         0       0        0
p(C,D)            1           0         0       1        0

u⟨A,B | (C,D)⟩ + u⟨C,D | A⟩ + u⟨C,D | B⟩ − u⟨C,D⟩ agrees with
u⟨C,D | (A,B)⟩ on these coordinates. True!!
16/20
Main result
We can generalize the above discussion as follows.
Is the following relation true?
{Ai ⊥⊥ Bi | Ci}, i = 1, ..., I  ⟹  A ⊥⊥ B | C
Definition
S (a subset of random variables) bridges A ⊥⊥ B | C
if S ∩ A ≠ ∅ and S ∩ B ≠ ∅.
Theorem 2
For a vector u, let u|_{ABC} be the restriction of u to
the elements which bridge A ⊥⊥ B | C.
If there exist rational numbers λ_1, ..., λ_I such that
  Σ_{i=1}^{I} λ_i · u⟨Ai,Bi | Ci⟩|_{ABC} = u⟨A,B | C⟩|_{ABC},
then A ⊥⊥ B | C holds.
17/20
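Theorem 2 reduces Example 4.1 to a small linear system over the bridging coordinates. A sketch (illustrative names; NumPy's least-squares solver stands in for exact rational arithmetic):

```python
from itertools import combinations

import numpy as np

VARS = "ABCD"
SETS = ["".join(s) for k in range(len(VARS), -1, -1) for s in combinations(VARS, k)]

def canon(s):
    return "".join(c for c in VARS if c in s)

def imset(A, B, C=""):
    """Semi-elementary imset u<A,B|C> as a vector over SETS."""
    u = np.zeros(len(SETS))
    for s, coef in [(A + B + C, 1), (C, 1), (A + C, -1), (B + C, -1)]:
        u[SETS.index(canon(s))] += coef
    return u

def implied_by_theorem2(antecedents, target):
    A, B, _ = target
    # restrict to the subsets that bridge A ⊥⊥ B | C
    rows = [i for i, S in enumerate(SETS)
            if any(c in S for c in A) and any(c in S for c in B)]
    M = np.column_stack([imset(*t)[rows] for t in antecedents])
    t = imset(*target)[rows]
    lam = np.linalg.lstsq(M, t, rcond=None)[0]
    return bool(np.allclose(M @ lam, t)), lam

# Example 4.1 of Studeny (2005): not provable by Theorem 1, provable here
ante = [("A", "B", "CD"), ("C", "D", "A"), ("C", "D", "B"), ("C", "D")]
ok, lam = implied_by_theorem2(ante, ("C", "D", "AB"))
print(ok)    # True; the coefficients lam are (1, 1, 1, -1)
```

Note that the coefficient of u⟨C,D⟩ is negative, which Theorem 2 allows but Theorem 1 does not.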
Other Examples
A ⊥⊥ B | C,  A ⊥⊥ C | B  ⟹  A ⊥⊥ B

A ⊥⊥ B | C,  A ⊥⊥ B | D,  A ⊥⊥ C | E,  D ⊥⊥ E | A,
A ⊥⊥ C | B,  A ⊥⊥ D | B,  A ⊥⊥ E | C,  D ⊥⊥ E  ⟹  A ⊥⊥ E | D

A ⊥⊥ B | (D,E),  A ⊥⊥ B | D,  A ⊥⊥ E | C,  C ⊥⊥ E | A,
D ⊥⊥ E | A,  D ⊥⊥ E  ⟹  A ⊥⊥ E | (B,D)
...
18/20
Conclusions
We introduced the method of imsets by Studeny.
We can derive conditional independence relations
automatically by using imsets and linear algebra.
We gave a new machine learning method
for the implication problem of
conditional independence statements.
Our method broadens the applicability of techniques
based on imsets for the conditional independence
implication problem.
19/20
References
[1] Dan Geiger, Azaria Paz, Judea Pearl,
Axioms and algorithms for inferences involving probabilistic independence,
Information and Computation, Volume 91, Issue 1, Pages 128–141, 1991.
[2] Dan Geiger, Judea Pearl,
Logical and algorithmic properties of conditional independence and graphical models,
The Annals of Statistics, Volume 21, Issue 4, Pages 2001–2021, 1993.
[3] Francesco M. Malvestuto, A unique formal system for binary decomposition of database relations,
probability distributions and graphs, Information Sciences, Volume 59, Pages 21–52, 1992;
and Francesco M. Malvestuto, Milan Studeny,
Comment on "A unique formal ... graphs", Information Sciences, Volume 63, Pages 1–2, 1992.
[4] Frantisek Matus, Stochastic independence, algebraic independence and abstract connectedness,
Theoretical Computer Science, Volume 134, Issue 2, Pages 455–471, 1994.
[5] Judea Pearl, Azaria Paz, Graphoids: graph-based logic for reasoning about relevance relations,
Advances in Artificial Intelligence, Volume II (B. Du Boulay, D. Hogg, L. Steels, eds.),
North-Holland, Elsevier, Pages 357–363, 1987.
[6] Mathias Niepert, Dirk Van Gucht, Marc Gyssens,
Logical and Algorithmic Properties of Stable Conditional Independence,
International Journal of Approximate Reasoning, Volume 51, Issue 5, Pages 531–543, 2010.
[7] Milan Studeny, Conditional independence relations have no finite complete characterization,
Information Theory, Statistical Decision Functions and Random Processes,
Transactions of the 11th Prague Conference, Volume B
(S. Kubík, J. Á. Víšek, eds.), Kluwer, Pages 377–396, 1992.
[8] Milan Studeny, Probabilistic Conditional Independence Structures, Springer-Verlag, London, 2005.
[9] Remco Bouckaert, Raymond Hemmecke, Silvia Lindner, Milan Studeny,
Efficient Algorithms for Conditional Independence Inference,
Journal of Machine Learning Research, Volume 11, Pages 3453–3479, 2010.
20/20
Conditional Independence Relations
We can also show the converse relation!
X ⊥⊥ (Y, Z)  ⟹  X ⊥⊥ Y | Z and X ⊥⊥ Z
(Proof)
• 1 = [p(X,Y,Z)p(Z) / (p(X,Z)p(Y,Z))] · [p(X,Z)p(∅) / (p(X)p(Z))]
    = p(X,Y,Z)p(∅) / (p(X)p(Y,Z))
• 0 = E[log p(X,Y,Z)p(Z) / (p(X,Z)p(Y,Z))] + E[log p(X,Z)p(∅) / (p(X)p(Z))]
    = E[log p(X,Y,Z)p(∅) / (p(X)p(Y,Z))],
  where each expectation is a KL-divergence.
• KL-divergence is non-negative and the equality holds iff the fraction is 1.
• Then we have p(X,Y,Z)p(Z) / (p(X,Z)p(Y,Z)) = 1, i.e. X ⊥⊥ Y | Z,
  and p(X,Z)p(∅) / (p(X)p(Z)) = 1, i.e. X ⊥⊥ Z.
21/20
Conditional Independence Structures
[Example]
X, Y, Z ∈ {0, 1} : random variables

X  Y  Z  p(X,Y,Z)
0  0  0  0.25
0  0  1  0
0  1  0  0
0  1  1  0.25
1  0  0  0
1  0  1  0.25
1  1  0  0.25
1  1  1  0

The (conditional) independence structure for p(X,Y,Z):
X ⊥⊥ Y,  X ⊥⊥ Z,  Y ⊥⊥ Z hold, while
X ⊥⊥ Y | Z,  X ⊥⊥ Z | Y,  Y ⊥⊥ Z | X and
X ⊥⊥ (Y,Z),  Y ⊥⊥ (X,Z),  Z ⊥⊥ (X,Y) do not.
22/20
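The structure above can be verified directly (an illustrative sketch, not from the talk): the table is the parity distribution, uniform on the four points with Z = X ⊕ Y.

```python
import numpy as np

# p = 1/4 on the four points with z == x XOR y (the table above)
p = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        p[x, y, x ^ y] = 0.25

def indep(m):
    """Check independence for a 2-D marginal table m."""
    return bool(np.allclose(m, np.outer(m.sum(axis=1), m.sum(axis=0))))

print(indep(p.sum(axis=2)))   # X ⊥⊥ Y : True
print(indep(p.sum(axis=1)))   # X ⊥⊥ Z : True
print(indep(p.sum(axis=0)))   # Y ⊥⊥ Z : True

# X ⊥⊥ (Y, Z) fails: p(x, y, z) != p(x) p(y, z)
px = p.sum(axis=(1, 2))
pyz = p.sum(axis=0)
print(bool(np.allclose(p, px[:, None, None] * pyz[None, :, :])))   # False
```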
Notation for n Random Variables
Let us consider an n-way contingency table.
• X1, ..., Xn : n random variables
• N = {1, 2, ..., n}
• p(N) = p(X_N) = p(X1, ..., Xn) : joint probability function
• p(A) = p(X_A) : marginal probability function for A ⊆ N
We consider the case where p(X_N) > 0.
We abbreviate A ∪ B as AB.
For disjoint subsets A, B, C ⊆ N,
we abbreviate X_A ⊥⊥ X_B | X_C as A ⊥⊥ B | C.
23/20
Conditional Independence
A, B, C ⊂ N = {1, ..., n} : disjoint subsets
Definition
A ⊥⊥ B  ⟺(def)  p(AB) = p(A)p(B)
        ⟺  p(AB)p(∅) / (p(A)p(B)) = 1
        ⟺  p(AB)^1 p(∅)^1 p(A)^(−1) p(B)^(−1) = 1
Definition
A ⊥⊥ B | C  ⟺(def)  p(AB | C) = p(A | C) p(B | C)
            ⟺  p(ABC)p(C) / (p(AC)p(BC)) = 1
            ⟺  p(ABC)^1 p(C)^1 p(AC)^(−1) p(BC)^(−1) = 1
24/20
Imsets
For E, F ⊂ N, define δ_E : 2^N → ℝ as δ_E(F) = 1 (if E = F), 0 (otherwise).
A, B, C ⊂ N : disjoint subsets
Definition: semi-elementary imset
u⟨A,B | C⟩ = δ_{ABC} + δ_C − δ_{AC} − δ_{BC}
We can regard u⟨A,B | C⟩ as a 2^n-dimensional integer vector:
( 0, ..., 0, 1, 0, ..., 0, −1, 0, ..., 0, −1, 0, ..., 0, 1, 0, ..., 0 )
with the entries 1, −1, −1, 1 at the coordinates ABC, AC, BC, C
among the subsets of N (including ∅).
Definition: elementary imset
If A and B are singletons (say a and b),
the imset u⟨a,b | C⟩ is called elementary.
25/20
The Elementary Imsets : 3 vars
The six elementary imsets, over the coordinates
p(X,Y,Z), p(X,Y), p(X,Z), p(Y,Z), p(X), p(Y), p(Z), p(∅):

            u⟨Y,Z|X⟩  u⟨X,Z|Y⟩  u⟨X,Y|Z⟩  u⟨X,Y⟩  u⟨X,Z⟩  u⟨Y,Z⟩
p(X,Y,Z)        1         1         1        0       0       0
p(X,Y)         −1        −1         0        1       0       0
p(X,Z)         −1         0        −1       0        1       0
p(Y,Z)          0        −1        −1       0        0       1
p(X)            1         0         0      −1      −1        0
p(Y)            0         1         0      −1       0       −1
p(Z)            0         0         1       0      −1       −1
p(∅)            0         0         0       1       1        1
26/20
The Elementary Imsets : 4 vars
(Figure omitted: the elementary imsets for four variables a, b, c, d.)
27/20