Lecture 26 of 42
Inference and Software Tools 2
Discussion: MP5, Hugin, BNJ; Term Projects
Friday, 31 October 2008
William H. Hsu
Department of Computing and Information Sciences, KSU
KSOL course page: http://snipurl.com/v9v3
Course web site: http://www.kddresearch.org/Courses/Fall-2008/CIS730
Instructor home page: http://www.cis.ksu.edu/~bhsu
Reading for Next Class:
Chapter 15, Russell & Norvig 2nd edition
Lecture Outline
Wednesday’s Reading: Sections 14.6 – 14.8, R&N 2e, Chapter 15
Friday: Sections 18.1 – 18.2, R&N 2e
Today: Inference in Graphical Models
Bayesian networks and causality
Inference and learning
BNJ interface (http://bnj.sourceforge.net)
Causality
Bayes Net Application:
Car Diagnosis
© 2003 D. Lin, University of Alberta
CS 366, Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/
Bayes Net Application:
Interpreting Mammograms
© 2003 D. Lin, University of Alberta
CS 366, Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/
Bayes Net Application:
Forecasting Oil Prices (ARCO1)
© 2003 D. Lin, University of Alberta
CS 366, Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/
Bayes Net Application:
Forecasting Oil Prices – Real Network
© 2003 D. Lin, University of Alberta
CS 366, Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
Bayesian Network Inference
Inference: calculating P(X | Y) for some variables or sets of variables X and Y.
Inference in Bayesian networks is #P-hard!
Inputs: prior probabilities of .5
[Figure: inputs I1, I2, I3, I4, I5 reduce to a single output node O.]
How many satisfying assignments? P(O) must be (# satisfying assignments) × (0.5 ^ # inputs).
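To make this concrete, here is a minimal sketch with a made-up Boolean formula phi standing in for the network between the inputs and O (the formula is not from the slide); it checks that brute-force P(O = true) equals the counting expression above:

```python
from itertools import product

def phi(i1, i2, i3, i4, i5):
    """Hypothetical Boolean formula; the deterministic output node O computes O = phi(inputs)."""
    return (i1 or i2) and (i3 or not i4) and i5

n = 5
assignments = list(product([False, True], repeat=n))

# Brute-force inference: P(O = true) = sum over input assignments of P(assignment) when O is true
p_o = sum(0.5 ** n for a in assignments if phi(*a))

# Counting view from the slide: the same quantity is (# satisfying assignments) * (0.5 ^ # inputs)
num_sat = sum(1 for a in assignments if phi(*a))
assert abs(p_o - num_sat * 0.5 ** n) < 1e-12
print(num_sat, p_o)
```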
Bayesian Network Inference
But…inference is still tractable in some cases.
Let’s look at a special class of networks: trees / forests in which each node has at most one parent.
Decomposing the probabilities
Suppose we want P(Xi | E ) where E is some set of evidence variables.
Let’s split E into two parts:
E_i^-: the part consisting of assignments to variables in the subtree rooted at X_i
E_i^+: the rest of it
[Figure: tree with the subtree rooted at X_i highlighted.]
Decomposing the probabilities
$$P(X_i \mid E) = P(X_i \mid E_i^-, E_i^+) = \frac{P(E_i^- \mid X_i, E_i^+)\, P(X_i \mid E_i^+)}{P(E_i^- \mid E_i^+)} = \frac{P(E_i^- \mid X_i)\, P(X_i \mid E_i^+)}{P(E_i^- \mid E_i^+)} = \alpha\, \pi(X_i)\, \lambda(X_i)$$
(The second step is Bayes' rule; the third uses the fact that X_i d-separates E_i^- from E_i^+.)
Where:
• α is a constant independent of X_i
• π(X_i) = P(X_i | E_i^+)
• λ(X_i) = P(E_i^- | X_i)
Using the decomposition for inference
We can use this decomposition to do inference as follows. First,
compute l(Xi) = P(Ei-| Xi) for all Xi recursively, using the leaves of the tree
as the base case.
If Xi is a leaf:
If Xi is in E : l(Xi) = 0 if Xi matches E, 1 otherwise
If Xi is not in E : Ei- is the null set, so
P(Ei-| Xi) = 1 (constant)
Quick aside: “Virtual evidence”
For theoretical simplicity, but without loss of generality, let’s assume that all
variables in E (the evidence set) are leaves in the tree.
Why can we do this WLOG:
Observing X_i directly is equivalent to adding a new leaf child X_i' of X_i, with P(X_i' | X_i) = 1 if X_i' = X_i and 0 otherwise, and observing X_i' instead.
[Figure: observe X_i vs. the equivalent network with dummy leaf child X_i' observed.]
Calculating λ(X_i) for non-leaves
Suppose X_i has one child, X_j. Then:
$$\lambda(X_i) = P(E_i^- \mid X_i) = \sum_{X_j} P(E_i^-, X_j \mid X_i) = \sum_{X_j} P(X_j \mid X_i)\, P(E_i^- \mid X_i, X_j) = \sum_{X_j} P(X_j \mid X_i)\, P(E_i^- \mid X_j) = \sum_{X_j} P(X_j \mid X_i)\, \lambda(X_j)$$
Calculating λ(X_i) for non-leaves
Now, suppose X_i has a set of children, C. Since X_i d-separates each of its subtrees, the contribution of each subtree to λ(X_i) is independent:
$$\lambda(X_i) = P(E_i^- \mid X_i) = \prod_{X_j \in C} \lambda_j(X_i), \qquad \lambda_j(X_i) = \sum_{X_j} P(X_j \mid X_i)\, \lambda(X_j)$$
where λ_j(X_i) is the contribution to P(E_i^- | X_i) of the part of the evidence lying in the subtree rooted at X_i's child X_j.
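A minimal sketch of this λ recursion on a rooted tree; the data structure (tree, values, children, cpt) and the evidence format are illustrative, not from BNJ or the source slides:

```python
def lam(node, tree, evidence):
    """Compute lambda(X_i) = P(E_i^- | X_i) as a dict {value: probability}.

    tree[node] = {"values": [...], "children": [...],
                  "cpt": {child: {(child_value, node_value): P(child_value | node_value)}}}
    evidence maps observed leaf nodes to their observed value.
    """
    values = tree[node]["values"]
    children = tree[node]["children"]
    if not children:                                   # leaf: base case
        if node in evidence:
            return {v: (1.0 if v == evidence[node] else 0.0) for v in values}
        return {v: 1.0 for v in values}                # E_i^- is empty
    result = {v: 1.0 for v in values}
    for child in children:                             # subtrees contribute independently
        lam_child = lam(child, tree, evidence)
        cpt = tree[node]["cpt"][child]
        for v in values:
            result[v] *= sum(cpt[(cv, v)] * lam_child[cv] for cv in tree[child]["values"])
    return result

# Tiny illustrative tree: root R with a single observed child C.
tree = {
    "R": {"values": [True, False], "children": ["C"],
          "cpt": {"C": {(True, True): 0.9, (False, True): 0.1,
                        (True, False): 0.2, (False, False): 0.8}}},
    "C": {"values": [True, False], "children": [], "cpt": {}},
}
print(lam("R", tree, evidence={"C": True}))            # {True: 0.9, False: 0.2}
```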
We are now λ-happy
So now we have a way to recursively compute all the λ(X_i)'s, starting from the root and using the leaves as the base case.
If we want, we can think of each node in the network as an autonomous processor that passes a little “λ message” to its parent.
[Figure: λ messages flowing from children to parents up the tree.]
The other half of the problem
Remember, P(X_i | E) = α π(X_i) λ(X_i). Now that we have all the λ(X_i)'s, what about the π(X_i)'s?
π(X_i) = P(X_i | E_i^+).
What about the root of the tree, X_r? In that case, E_r^+ is the null set, so π(X_r) = P(X_r). No sweat. Since we also know λ(X_r), we can compute the final P(X_r | E).
So for an arbitrary X_i with parent X_p, let's inductively assume we know π(X_p) and/or P(X_p | E). How do we get π(X_i)?
Computing π(X_i)
$$\pi(X_i) = P(X_i \mid E_i^+) = \sum_{X_p} P(X_i, X_p \mid E_i^+) = \sum_{X_p} P(X_i \mid X_p, E_i^+)\, P(X_p \mid E_i^+) = \sum_{X_p} P(X_i \mid X_p)\, P(X_p \mid E_i^+) = \sum_{X_p} P(X_i \mid X_p)\, \frac{P(X_p \mid E)}{\lambda_i(X_p)} = \sum_{X_p} P(X_i \mid X_p)\, \pi_i(X_p)$$
where π_i(X_p) is defined as P(X_p | E) / λ_i(X_p).
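A matching sketch of the π step, assuming the parent's posterior P(X_p | E) and the λ message that X_i previously sent to X_p are already available (all names are illustrative):

```python
def pi_message(xi_values, xp_values, cpt_xi_given_xp, posterior_xp, lam_from_xi):
    """pi(X_i) = sum over x_p of P(X_i | x_p) * P(x_p | E) / lambda_i(x_p).

    cpt_xi_given_xp[(xi, xp)] = P(X_i = xi | X_p = xp)
    posterior_xp[xp]          = P(X_p = xp | E), already computed for the parent
    lam_from_xi[xp]           = the lambda message X_i previously sent to its parent X_p
    """
    pi_i_of_xp = {}
    for xp in xp_values:
        # If the lambda message is 0, the parent's posterior is 0 for that value as well,
        # so the term contributes nothing.
        pi_i_of_xp[xp] = posterior_xp[xp] / lam_from_xi[xp] if lam_from_xi[xp] > 0 else 0.0
    return {xi: sum(cpt_xi_given_xp[(xi, xp)] * pi_i_of_xp[xp] for xp in xp_values)
            for xi in xi_values}
```

Combining π(X_i) with λ(X_i) and normalizing over the values of X_i then gives P(X_i | E) = α π(X_i) λ(X_i), as above.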
We’re done. Yay!
Thus we can compute all the π(X_i)'s, and, in turn, all the P(X_i | E)'s.
Can think of nodes as autonomous processors passing λ and π messages to their neighbors.
[Figure: λ messages passed up the tree and π messages passed down it.]
Conjunctive queries
What if we want, e.g., P(A, B | C) instead of just marginal
distributions P(A | C) and P(B | C)?
Just use chain rule:
P(A, B | C) = P(A | C) P(B | A, C)
Each of the latter probabilities can be computed using the
technique just discussed.
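For example, a three-variable conjunctive query factors the same way, and each factor is again a single-variable query with an enlarged evidence set:

$$P(A, B, C \mid D) = P(A \mid D)\, P(B \mid A, D)\, P(C \mid A, B, D)$$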
Polytrees
Technique can be generalized to polytrees: undirected versions of the graphs are
still trees, but nodes can have more than one parent
Dealing with cycles
Can deal with undirected cycles in the graph by clustering variables together (e.g., merging B and C into a single compound node BC) or by conditioning (instantiating a variable such as A to each of its values in turn: set to 0, set to 1).
[Figure: a network with an undirected cycle through A, B, C, D, shown clustered (B and C merged into BC) and conditioned (A set to 0 and to 1).]
Join trees
Arbitrary Bayesian network can be transformed via some evil graph-theoretic
magic into a join tree in which a similar method can be employed.
[Figure: an example network over nodes A through G transformed into a join tree with clique nodes such as ABC, BCD, and DF.]
In the worst case the join-tree nodes must take on exponentially many combinations of values, but the method often works well in practice.
BNJ Visualization:
Pseudo-Code Annotation (Code Page)
© 2004 KSU BNJ Development Team
Graphical Models Overview [1]:
Bayesian Networks
Conditional Independence
X is conditionally independent (CI) from Y given Z (sometimes written X ⊥ Y | Z) iff P(X | Y, Z) = P(X | Z) for all values of X, Y, and Z
Example: P(Thunder | Rain, Lightning) = P(Thunder | Lightning), i.e., T ⊥ R | L
Bayesian (Belief) Network
Acyclic directed graph model B = (V, E, Θ) representing CI assertions over the variables
Vertices (nodes) V: denote events (each a random variable)
Edges (arcs, links) E: denote conditional dependencies
Markov Condition for BBNs (Chain Rule):
$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{parents}(X_i)\big)$$
Example BBN:
[Figure: example network over X1 = Age, X2 = Gender, X3 = Exposure-To-Toxins, X4 = Smoking, X5 = Cancer, X6 = Serum Calcium, X7 = Lung Tumor, with the parents, descendants, and non-descendants of the Cancer node marked.]
P(20s, Female, Low, Non-Smoker, No-Cancer, Negative, Negative)
= P(T) · P(F) · P(L | T) · P(N | T, F) · P(N | L, N) · P(N | N) · P(N | N)
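A minimal sketch of the Markov condition in use; the two-node network and its CPT values below are invented for illustration and are not the slide's seven-node example:

```python
# Chain rule for a BBN: P(x_1, ..., x_n) = product over i of P(x_i | parents(x_i)).
# Tiny illustrative network, Smoking -> Cancer; the CPT numbers are made up.
parents = {"Smoking": (), "Cancer": ("Smoking",)}
cpts = {
    "Smoking": {(): {"yes": 0.3, "no": 0.7}},
    "Cancer":  {("yes",): {"yes": 0.05, "no": 0.95},
                ("no",):  {"yes": 0.01, "no": 0.99}},
}

def joint(assignment):
    """P(assignment) for a full assignment of all variables, via the Markov condition."""
    p = 1.0
    for var, val in assignment.items():
        parent_vals = tuple(assignment[q] for q in parents[var])
        p *= cpts[var][parent_vals][val]
    return p

print(joint({"Smoking": "no", "Cancer": "no"}))   # 0.7 * 0.99 = 0.693
```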
Graphical Models Overview [3]:
Inference Problem
Multiply-connected case: exact, approximate inference are #P-complete
Adapted from slides by S. Russell, UC Berkeley
http://aima.cs.berkeley.edu/
Tree Dependent Distributions
Polytrees
aka singly-connected Bayesian networks
Definition: a Bayesian network with no undirected loops
Idea: restrict distributions (CPTs) to single nodes
Theorem: inference in singly-connected BBN requires linear time
• Linear in network size, including CPT sizes
• Much better than for unrestricted (multiply-connected) BBNs
Tree Dependent Distributions
Further restriction of polytrees: every node has at most one parent
Now only need to keep 1 prior, P(root), and n - 1 CPTs (1 per node)
All CPTs are 2-dimensional: P(child | parent)
Independence Assumptions
As for a general BBN: x is independent of its non-descendants given its (single) parent z
Very strong assumption (applies in some domains but not most)
[Figure: tree rooted at a designated node; node x has single parent z.]
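A quick sketch of the parameter count this restriction buys; the node count and arity below are made-up illustration values:

```python
# Parameters needed for a tree-dependent distribution over n nodes of arity k:
# one root prior (k - 1 free parameters) plus n - 1 two-dimensional CPTs
# P(child | parent), each with k * (k - 1) free parameters.
def tree_param_count(n, k):
    return (k - 1) + (n - 1) * k * (k - 1)

n, k = 20, 2
print(tree_param_count(n, k))   # 39 parameters for the tree-structured model
print(k ** n - 1)               # 1048575 parameters for the unrestricted joint
```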
Propagation Algorithm in Singly-Connected
Bayesian Networks – Pearl (1983)
Upward (child-to-parent) λ messages: λ'(C_i') is modified during the λ message-passing phase
Downward π messages: P'(C_i') is computed during the π message-passing phase
[Figure: singly-connected network of clusters C1 … C6 exchanging λ and π messages.]
Multiply-connected case: exact, approximate inference are #P-complete
(counting problem is #P-complete iff decision problem is NP-complete)
Adapted from Neapolitan (1990), Guo (2000)
Inference by Clustering [1]: Graph Operations
(Moralization, Triangulation, Maximal Cliques)
[Figure: an example Bayesian network (acyclic digraph) over nodes A1, B2, E3, C4, G5, F6, H7, D8 is first moralized, then triangulated, and its maximal cliques Clq1 … Clq6 are identified.]
Adapted from Neapolitan (1990), Guo (2000)
Inference by Clustering [2]:
Junction Tree – Lauritzen & Spiegelhalter (1988)
Input: list of cliques of triangulated, moralized graph G_u
Output: tree of cliques, with separator nodes S_i, residual nodes R_i, and potential probability Ψ(Clq_i) for all cliques
Algorithm:
1. S_i = Clq_i ∩ (Clq_1 ∪ Clq_2 ∪ … ∪ Clq_{i-1})
2. R_i = Clq_i − S_i
3. If i > 1 then identify a j < i such that Clq_j is a parent of Clq_i
4. Assign each node v to a unique clique Clq_i such that {v} ∪ c(v) ⊆ Clq_i
5. Compute Ψ(Clq_i) = ∏_{f(v) = Clq_i} P(v | c(v)), or 1 if no v is assigned to Clq_i
6. Store Clq_i, R_i, S_i, and Ψ(Clq_i) at each vertex in the tree of cliques
Adapted from Neapolitan (1990), Guo (2000)
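A minimal sketch of steps 1-3 on an ordered clique list; the clique sets below are the ones from the next slide's example, while the function and field names are illustrative:

```python
def build_clique_tree(cliques):
    """Separators S_i, residuals R_i, and a parent clique for an ordered clique list.

    Relies on the running-intersection property of cliques from a triangulated graph,
    which guarantees that S_i is contained in some earlier clique.
    """
    seen = set()
    tree = []
    for i, clq in enumerate(cliques):
        s = clq & seen                 # step 1: S_i = Clq_i intersect (Clq_1 U ... U Clq_{i-1})
        r = clq - s                    # step 2: R_i = Clq_i - S_i
        parent = next((j for j in range(i) if s <= cliques[j]), None)   # step 3
        tree.append({"clique": clq, "S": s, "R": r, "parent": parent})
        seen |= clq
    return tree

cliques = [{"A", "B"}, {"B", "E", "C"}, {"E", "C", "G"},
           {"E", "G", "F"}, {"C", "G", "H"}, {"C", "D"}]
for entry in build_clique_tree(cliques):
    print(entry)
```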
Inference by Clustering [3]:
Clique-Tree Operations
R_i: residual nodes
S_i: separator nodes
Ψ(Clq_i): potential probability of clique i

Clq1 = {A, B}: R1 = {A, B}, S1 = {}, Ψ(Clq1) = P(B | A) P(A)
Clq2 = {B, E, C}: R2 = {C, E}, S2 = {B}, Ψ(Clq2) = P(C | B, E)
Clq3 = {E, C, G}: R3 = {G}, S3 = {E, C}, Ψ(Clq3) = 1
Clq4 = {E, G, F}: R4 = {F}, S4 = {E, G}, Ψ(Clq4) = P(E | F) P(G | F) P(F)
Clq5 = {C, G, H}: R5 = {H}, S5 = {C, G}, Ψ(Clq5) = P(H | C, G)
Clq6 = {C, D}: R6 = {D}, S6 = {C}, Ψ(Clq6) = P(D | C)

[Figure: the corresponding tree of cliques, with separator sets labeling the edges between cliques.]

Adapted from Neapolitan (1990), Guo (2000)
Inference by Loop Cutset Conditioning
Split vertex in undirected cycle; condition upon each of its state values
[Figure: the Age node X1 of the example cancer network split into instances X1,1 (Age = [0, 10)), X1,2 (Age = [10, 20)), …, X1,10 (Age = [100, ∞)); the other nodes (X2 Gender, X3 Exposure-To-Toxins, X4 Smoking, X5 Cancer, X6 Serum Calcium, X7 Lung Tumor) are unchanged.]
Number of network instantiations: product of arity of nodes in minimal loop cutset
Posterior: marginal conditioned upon cutset variable values
Deciding optimal cutset: NP-hard
Current open problems:
Bounded cutset conditioning: ordering heuristics
Finding randomized algorithms for loop cutset optimization
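A minimal sketch of the conditioning loop itself, assuming a hypothetical polytree_query(network, evidence, query_var) routine that returns both P(query_var | evidence) and the evidence likelihood P(evidence) for a singly-connected instantiation; all names here are illustrative:

```python
from itertools import product

def cutset_conditioning(network, cutset, query_var, evidence, polytree_query):
    """P(query_var | evidence) = sum over cutset instantiations c of
    P(query_var | evidence, c) * P(c | evidence).

    cutset: dict {variable: list of values}; fixing these variables breaks every undirected
    loop, so each instantiation can be handled by singly-connected (polytree) inference.
    """
    weighted = {}
    total = 0.0
    for combo in product(*cutset.values()):            # one polytree run per instantiation
        c = dict(zip(cutset.keys(), combo))
        conditional, likelihood = polytree_query(network, {**evidence, **c}, query_var)
        for value, p in conditional.items():           # weight by P(evidence, c)
            weighted[value] = weighted.get(value, 0.0) + p * likelihood
        total += likelihood                            # accumulates P(evidence)
    return {value: p / total for value, p in weighted.items()}
```

The number of iterations is the product of the arities of the cutset nodes, as noted above.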
BNJ Visualization [2]
Pseudo-Code Annotation (Code Page)
ALARM Network
© 2004 KSU BNJ Development Team
BNJ Visualization [3]:
Poker Network
© 2004 KSU BNJ Development Team
Inference by Variable Elimination [1]:
Intuition
Adapted from slides by S. Russell, UC Berkeley
http://aima.cs.berkeley.edu/
Inference by Variable Elimination [2]:
Factoring Operations
Adapted from slides by S. Russell, UC Berkeley
http://aima.cs.berkeley.edu/
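The slide's figures (from the AIMA slides) are not reproduced here; as a hedged sketch, the two factoring operations that variable elimination is built on, pointwise product and summing out, might look like this (the factor representation and names are illustrative, not AIMA's or BNJ's code):

```python
from itertools import product

# A factor is a pair (variables, table), where table maps a tuple of values
# (one per variable, in order) to a number.

def pointwise_product(f, g, domains):
    """Multiply two factors into a factor over the union of their variables."""
    f_vars, f_tab = f
    g_vars, g_tab = g
    out_vars = list(dict.fromkeys(f_vars + g_vars))    # union, preserving order
    out_tab = {}
    for combo in product(*(domains[v] for v in out_vars)):
        row = dict(zip(out_vars, combo))
        out_tab[combo] = (f_tab[tuple(row[v] for v in f_vars)]
                          * g_tab[tuple(row[v] for v in g_vars)])
    return out_vars, out_tab

def sum_out(var, f):
    """Eliminate a variable from a factor by summing over its values."""
    f_vars, f_tab = f
    out_vars = [v for v in f_vars if v != var]
    out_tab = {}
    for combo, value in f_tab.items():
        key = tuple(c for v, c in zip(f_vars, combo) if v != var)
        out_tab[key] = out_tab.get(key, 0.0) + value
    return out_vars, out_tab
```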
Genetic Algorithms for Parameter Tuning in
Bayesian Network Structure Learning
[Figure: genetic wrapper for change of representation and inductive bias control. D (training data; Dtrain for inductive learning, Dval for inference) and Ie (inference specification) feed [2], a representation evaluator for learning problems, which returns representation fitness f(α) for each candidate representation α produced by [1], a genetic algorithm; the output is an optimized representation α̂.]
Tools for Building Graphical Models
Commercial Tools: Ergo, Netica, TETRAD, Hugin
Bayes Net Toolbox (BNT) – Murphy (1997-present)
Distribution page
http://http.cs.berkeley.edu/~murphyk/Bayes/bnt.html
Development group
http://groups.yahoo.com/group/BayesNetToolbox
Bayesian Network tools in Java (BNJ) – Hsu et al. (1999-present)
Distribution page
http://bnj.sourceforge.net
Development group
http://groups.yahoo.com/group/bndev
Current (re)implementation projects for KSU KDD Lab
• Continuous state: Minka (2002) – Hsu, Guo, Li
• Formats: XML BNIF (MSBN), Netica – Barber, Guo
• Space-efficient DBN inference – Meyer
• Bounded cutset conditioning – Chandak
References [1]:
Graphical Models and Inference Algorithms
Graphical Models
Bayesian (Belief) Networks tutorial – Murphy (2001)
http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html
Learning Bayesian Networks – Heckerman (1996, 1999)
http://research.microsoft.com/~heckerman
Inference Algorithms
Junction Tree (Join Tree, L-S, Hugin): Lauritzen & Spiegelhalter (1988)
http://citeseer.nj.nec.com/huang94inference.html
(Bounded) Loop Cutset Conditioning: Horvitz & Cooper (1989)
http://citeseer.nj.nec.com/shachter94global.html
Variable Elimination (Bucket Elimination, ElimBel): Dechter (1996)
http://citeseer.nj.nec.com/dechter96bucket.html
Recommended Books
• Neapolitan (1990) – out of print; see Pearl (1988), Jensen (2001)
• Castillo, Gutierrez, Hadi (1997)
• Cowell, Dawid, Lauritzen, Spiegelhalter (1999)
Stochastic Approximation
http://citeseer.nj.nec.com/cheng00aisbn.html
References [2]:
Machine Learning, KDD, and Bioinformatics
Machine Learning, Data Mining, and Knowledge Discovery
K-State KDD Lab: literature survey and resource catalog (1999-present)
http://www.kddresearch.org/Resources
Bayesian Network tools in Java (BNJ): Hsu, Barber, King, Meyer,
Thornton (2002-present)
http://bnj.sourceforge.net
Machine Learning in Java (MLJ): Hsu, Louis, Plummer (2002)
http://mldev.sourceforge.net
Bioinformatics
European Bioinformatics Institute Tutorial: Brazma et al. (2001)
http://www.ebi.ac.uk/microarray/biology_intro.htm
Hebrew University: Friedman, Pe’er, et al. (1999, 2000, 2002)
http://www.cs.huji.ac.il/labs/compbio/
K-State BMI Group: literature survey and resource catalog (2002-2005)
http://www.kddresearch.org/Groups/Bioinformatics
Terminology
Introduction to Reasoning under Uncertainty
Probability foundations
Definitions: subjectivist, frequentist, logicist
(3) Kolmogorov axioms
Bayes’s Theorem
Prior probability of an event
Joint probability of an event
Conditional (posterior) probability of an event
Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses
MAP hypothesis: highest conditional probability given observations (data)
ML: highest likelihood of generating the observed data
ML estimation (MLE): estimating parameters to find ML hypothesis
Bayesian Inference: Computing Conditional Probabilities (CPs) in A Model
Bayesian Learning: Searching Model (Hypothesis) Space using CPs
Summary Points
Introduction to Probabilistic Reasoning
Framework: using probabilistic criteria to search H
Probability foundations
Definitions: subjectivist, objectivist; Bayesian, frequentist, logicist
Kolmogorov axioms
Bayes’s Theorem
Definition of conditional (posterior) probability
Product rule
Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses
Bayes’s Rule and MAP
Uniform priors: allow use of MLE to generate MAP hypotheses
Relation to version spaces, candidate elimination
Next Week: Chapter 14, Russell and Norvig
Later: Bayesian learning: MDL, BOC, Gibbs, Simple (Naïve) Bayes
Categorizing text and documents, other applications