Lecture 26 of 42
Inference and Software Tools 2
Discussion: MP5, Hugin, BNJ; Term Projects
Friday, 31 October 2008
William H. Hsu
Department of Computing and Information Sciences, KSU
KSOL course page: http://snipurl.com/v9v3
Course web site: http://www.kddresearch.org/Courses/Fall-2008/CIS730
Instructor home page: http://www.cis.ksu.edu/~bhsu
Reading for Next Class:
Chapter 15, Russell & Norvig 2nd edition
Lecture Outline
• Wednesday's Reading: Sections 14.6 – 14.8, R&N 2e; Chapter 15
• Friday: Sections 18.1 – 18.2, R&N 2e
• Today: Inference in Graphical Models
  – Bayesian networks and causality
  – Inference and learning
  – BNJ interface (http://bnj.sourceforge.net)
  – Causality
Bayes Net Application:
Car Diagnosis
© 2003 D. Lin, University of Alberta
CS 366, Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/
Bayes Net Application:
Interpreting Mammograms
© 2003 D. Lin, University of Alberta
CS 366, Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/
Bayes Net Application:
Forecasting Oil Prices (ARCO1)
© 2003 D. Lin, University of Alberta
CS 366, Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/
Bayes Net Application:
Forecasting Oil Prices – Real Network
© 2003 D. Lin, University of Alberta
CS 366, Introduction to Artificial Intelligence, http://www.cs.ualberta.ca/~lindek/366/
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
Bayesian Network Inference
• Inference: calculating P(X | Y) for some variables or sets of variables X and Y.
• Inference in Bayesian networks is #P-hard!
[Figure: a network encoding a Boolean formula, with inputs I1 … I5 (each given prior probability 0.5) feeding an output node O; inference reduces to counting satisfying assignments]
• How many satisfying assignments? P(O) must be (#satisfying assignments) × (0.5 ^ #inputs), so computing P(O) exactly answers the counting question (see the sketch below).
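A small sketch of this counting identity on a toy Boolean formula over five inputs; the formula itself is an arbitrary illustrative choice, not from the lecture.

```python
# Check that P(O = true), with each input independently true with prior 0.5,
# equals (#satisfying assignments) * 0.5**(#inputs).
from itertools import product

def formula(i1, i2, i3, i4, i5):
    # Arbitrary illustrative Boolean formula; any Boolean function works here.
    return (i1 or i2) and ((not i3) or i4) and (i4 or i5)

assignments = list(product([False, True], repeat=5))
num_sat = sum(1 for a in assignments if formula(*a))

# P(O = true) = sum of the prior mass of the satisfying assignments.
p_O = sum(0.5 ** 5 for a in assignments if formula(*a))
assert abs(p_O - num_sat * 0.5 ** 5) < 1e-12
print(num_sat, p_O)
```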
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
Bayesian Network Inference
• But… inference is still tractable in some cases.
• Let's look at a special class of networks: trees / forests in which each node has at most one parent.
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
Decomposing the probabilities
• Suppose we want P(Xi | E), where E is some set of evidence variables.
• Let's split E into two parts:
  – Ei-: the part consisting of assignments to variables in the subtree rooted at Xi
  – Ei+: the rest of it
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
Decomposing the probabilities
    P(Xi | E) = P(Xi | Ei+, Ei-)
              = P(Ei- | Xi, Ei+) P(Xi | Ei+) / P(Ei- | Ei+)
              = P(Ei- | Xi) P(Xi | Ei+) / P(Ei- | Ei+)
              = α π(Xi) λ(Xi)
Where:
• α is a constant independent of Xi
• π(Xi) = P(Xi | Ei+)
• λ(Xi) = P(Ei- | Xi)
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
Using the decomposition for inference
• We can use this decomposition to do inference as follows. First, compute λ(Xi) = P(Ei- | Xi) for all Xi recursively, using the leaves of the tree as the base case.
• If Xi is a leaf:
  – If Xi is in E: λ(Xi) = 1 if Xi matches the observed evidence value, 0 otherwise (sketched in code below)
  – If Xi is not in E: Ei- is the empty set, so P(Ei- | Xi) = 1 (constant)
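A minimal sketch of this leaf base case; the function name and the binary domain are illustrative assumptions, not from the lecture.

```python
# lambda at a leaf: an indicator on the observed value if the leaf is in E,
# and identically 1 if it is not.
def lambda_leaf(values, observed=None):
    if observed is None:                     # Xi not in E: Ei- is empty
        return {v: 1.0 for v in values}
    return {v: 1.0 if v == observed else 0.0 for v in values}

print(lambda_leaf((0, 1)))                   # {0: 1.0, 1: 1.0}
print(lambda_leaf((0, 1), observed=1))       # {0: 0.0, 1: 1.0}
```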
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
Quick aside: “Virtual evidence”
• For theoretical simplicity, but without loss of generality, let's assume that all variables in E (the evidence set) are leaves in the tree.
• Why we can do this WLOG: observing an internal node Xi is equivalent to adding a new leaf child Xi' of Xi, observing Xi' instead, and setting P(Xi' | Xi) = 1 if Xi' = Xi, 0 otherwise.
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
Calculating λ(Xi) for non-leaves
• Suppose Xi has one child, Xj. Then:
    λ(Xi) = P(Ei- | Xi) = Σ_Xj P(Ei-, Xj | Xi)
          = Σ_Xj P(Xj | Xi) P(Ei- | Xi, Xj)
          = Σ_Xj P(Xj | Xi) P(Ei- | Xj)
          = Σ_Xj P(Xj | Xi) λ(Xj)
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
Calculating λ(Xi) for non-leaves
• Now, suppose Xi has a set of children, C.
• Since Xi d-separates each of its subtrees, the contribution of each subtree to λ(Xi) is independent:
    λ(Xi) = P(Ei- | Xi) = Π_{Xj ∈ C} λj(Xi)
          = Π_{Xj ∈ C} [ Σ_Xj P(Xj | Xi) λ(Xj) ]
  where λj(Xi) is the contribution to P(Ei- | Xi) of the part of the evidence lying in the subtree rooted at Xi's child Xj. (A recursive sketch follows below.)
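A recursive sketch of this rule: λ(Xi) is the product over children of Σ_Xj P(Xj | Xi) λ(Xj), with the leaf cases as the base case. The tiny tree, CPT numbers, and evidence below are illustrative assumptions, not the lecture's example.

```python
# children[x] lists x's children; cpt[child][parent_value][child_value] = P(child | parent)
children = {"A": ["B", "C"], "B": [], "C": []}
cpt = {
    "B": {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},   # P(B | A)
    "C": {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}},   # P(C | A)
}
evidence = {"C": 1}                                    # observed leaf value

def lam(x):
    """Return {value: lambda(x)(value)} = P(Ex- | x = value) for binary x."""
    if not children[x]:                                # leaf base case
        if x in evidence:
            return {v: 1.0 if v == evidence[x] else 0.0 for v in (0, 1)}
        return {v: 1.0 for v in (0, 1)}                # no evidence below
    result = {v: 1.0 for v in (0, 1)}
    for child in children[x]:                          # product over subtrees
        lc = lam(child)
        for v in (0, 1):                               # lambda message from child
            result[v] *= sum(cpt[child][v][cv] * lc[cv] for cv in (0, 1))
    return result

print(lam("A"))   # {0: 0.3, 1: 0.6} with these CPTs: P(C=1 | A=0), P(C=1 | A=1)
```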
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
We are now λ-happy
• So now we have a way to recursively compute all the λ(Xi)'s, starting from the root and using the leaves as the base case.
• If we want, we can think of each node in the network as an autonomous processor that passes a little "λ message" to its parent.
[Figure: a tree in which each node sends a λ message upward to its parent]
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
The other half of the problem
• Remember, P(Xi | E) = α π(Xi) λ(Xi). Now that we have all the λ(Xi)'s, what about the π(Xi)'s?
• π(Xi) = P(Xi | Ei+).
• What about the root of the tree, Xr? In that case, Er+ is the empty set, so π(Xr) = P(Xr). No sweat. Since we also know λ(Xr), we can compute the final P(Xr | E).
• So for an arbitrary Xi with parent Xp, let's inductively assume we know π(Xp) and/or P(Xp | E). How do we get π(Xi)?
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
Computing π(Xi)
    π(Xi) = P(Xi | Ei+) = Σ_Xp P(Xi, Xp | Ei+)
          = Σ_Xp P(Xi | Xp, Ei+) P(Xp | Ei+)
          = Σ_Xp P(Xi | Xp) P(Xp | Ei+)
          = Σ_Xp P(Xi | Xp) [ P(Xp | E) / λi(Xp) ]
          = Σ_Xp P(Xi | Xp) πi(Xp)
where πi(Xp) is defined as P(Xp | E) / λi(Xp).
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
We’re done. Yay!
• Thus we can compute all the π(Xi)'s and, in turn, all the P(Xi | E)'s.
• We can think of the nodes as autonomous processors passing λ and π messages to their neighbors (see the end-to-end sketch below).
[Figure: a tree in which λ messages flow upward and π messages flow downward between neighboring nodes]
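A compact end-to-end sketch of the λ/π scheme on a three-node chain A → B → C with evidence C = 1, checked against brute-force enumeration. The structure, CPT values, and names are assumptions made for illustration; this is not BNJ or Hugin code.

```python
prior_A = {0: 0.6, 1: 0.4}
cpt_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}    # P(B | A)
cpt_C = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}  # P(C | B)
vals = (0, 1)

def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

# Upward (lambda) pass: lambda(X) = P(Ex- | X)
lam_C = {c: 1.0 if c == 1 else 0.0 for c in vals}                       # evidence leaf
lam_B = {b: sum(cpt_C[b][c] * lam_C[c] for c in vals) for b in vals}
lam_A = {a: sum(cpt_B[a][b] * lam_B[b] for b in vals) for a in vals}

# Downward (pi) pass: pi(root) = prior; pi_i(Xp) = P(Xp | E) / lambda_i(Xp)
pi_A = dict(prior_A)
post_A = normalize({a: pi_A[a] * lam_A[a] for a in vals})               # P(A | E)

pi_msg_B = {a: post_A[a] / lam_A[a] for a in vals}                      # pi_B(A)
pi_B = {b: sum(cpt_B[a][b] * pi_msg_B[a] for a in vals) for b in vals}
post_B = normalize({b: pi_B[b] * lam_B[b] for b in vals})               # P(B | E)

pi_msg_C = {b: post_B[b] / lam_B[b] for b in vals}                      # pi_C(B)
pi_C = {c: sum(cpt_C[b][c] * pi_msg_C[b] for b in vals) for c in vals}
post_C = normalize({c: pi_C[c] * lam_C[c] for c in vals})               # P(C | E)

# Sanity check of P(A | E) against brute-force enumeration of the joint
joint = {(a, b, c): prior_A[a] * cpt_B[a][b] * cpt_C[b][c]
         for a in vals for b in vals for c in vals}
brute_A = normalize({a: sum(p for (x, y, c), p in joint.items() if x == a and c == 1)
                     for a in vals})
assert all(abs(post_A[a] - brute_A[a]) < 1e-9 for a in vals)
print(post_A, post_B, post_C)
```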
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
Conjunctive queries
• What if we want, e.g., P(A, B | C) instead of just the marginal distributions P(A | C) and P(B | C)?
• Just use the chain rule:
  – P(A, B | C) = P(A | C) P(B | A, C)
  – Each of the latter probabilities can be computed using the technique just discussed (see the numeric check below).
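A quick numeric check of this chain-rule identity on an arbitrary joint distribution over three binary variables; the weights are made-up values used only for illustration.

```python
from itertools import product

weights = [3, 1, 4, 1, 5, 9, 2, 6]            # arbitrary positive weights
total = sum(weights)
joint = {assn: w / total
         for assn, w in zip(product((0, 1), repeat=3), weights)}   # (A, B, C)

def p(query, given):
    """P(query | given), each a dict of variable index -> value (0=A, 1=B, 2=C)."""
    match = lambda assn, cond: all(assn[i] == v for i, v in cond.items())
    num = sum(pr for assn, pr in joint.items() if match(assn, {**query, **given}))
    den = sum(pr for assn, pr in joint.items() if match(assn, given))
    return num / den

lhs = p({0: 1, 1: 1}, {2: 0})                       # P(A=1, B=1 | C=0)
rhs = p({0: 1}, {2: 0}) * p({1: 1}, {0: 1, 2: 0})   # P(A=1|C=0) * P(B=1|A=1,C=0)
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)
```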
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
Polytrees
• The technique can be generalized to polytrees: the undirected versions of the graphs are still trees, but nodes can have more than one parent.
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
Dealing with cycles
• Can deal with undirected cycles in the graph by:
  – Clustering variables together, e.g. merging B and C in the diamond A → {B, C} → D into a single node BC
  – Conditioning, e.g. instantiating A to each of its values (set to 0, set to 1) and solving the resulting singly-connected problems
www.cs.cmu.edu/~awm/381/lec/bayesinfer/bayesinf.ppt
Join trees
• An arbitrary Bayesian network can be transformed, via some evil graph-theoretic magic, into a join tree in which a similar method can be employed.
[Figure: a network over nodes A–G transformed into a join tree with clique nodes such as ABC, BCD, and DF]
• In the worst case the join-tree nodes must take on exponentially many combinations of values, but the method often works well in practice.
BNJ Visualization:
Pseudo-Code Annotation (Code Page)
© 2004 KSU BNJ Development Team
Graphical Models Overview [1]:
Bayesian Networks
• Conditional Independence
  – X is conditionally independent (CI) of Y given Z (sometimes written X ⊥ Y | Z) iff P(X | Y, Z) = P(X | Z) for all values of X, Y, and Z
  – Example: P(Thunder | Rain, Lightning) = P(Thunder | Lightning), i.e. T ⊥ R | L
• Bayesian (Belief) Network
  – Acyclic directed graph model B = (V, E, Θ) representing CI assertions over Ω
  – Vertices (nodes) V: denote events (each a random variable)
  – Edges (arcs, links) E: denote conditional dependencies
• Markov Condition for BBNs (Chain Rule):
    P(X1, X2, …, Xn) = Π_{i=1..n} P(Xi | parents(Xi))   (sketched in code below)
• Example BBN
  [Figure: Age (X1) → Exposure-To-Toxins (X3); Age (X1), Gender (X2) → Smoking (X4); X3, X4 → Cancer (X5); X5 → Serum Calcium (X6), Lung Tumor (X7); the figure also highlights each node's parents, descendants, and non-descendants]
  P(20s, Female, Low, Non-Smoker, No-Cancer, Negative, Negative)
    = P(T) · P(F) · P(L | T) · P(N | T, F) · P(N | L, N) · P(N | N) · P(N | N)
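A minimal sketch of the Markov-condition factorization for the example network above. The CPT entries are invented placeholders; only the structure follows the factorization on the slide.

```python
parents = {
    "Age": [], "Gender": [],
    "Exposure": ["Age"],
    "Smoking": ["Age", "Gender"],
    "Cancer": ["Exposure", "Smoking"],
    "SerumCalcium": ["Cancer"],
    "LungTumor": ["Cancer"],
}

# cpt[X] maps (value_of_X, parent values...) -> P(X = value | parents); only the
# entries needed for this one query are given, with placeholder numbers.
cpt = {
    "Age":          {("20s",): 0.3},
    "Gender":       {("Female",): 0.5},
    "Exposure":     {("Low", "20s"): 0.8},
    "Smoking":      {("Non-Smoker", "20s", "Female"): 0.7},
    "Cancer":       {("No-Cancer", "Low", "Non-Smoker"): 0.95},
    "SerumCalcium": {("Negative", "No-Cancer"): 0.9},
    "LungTumor":    {("Negative", "No-Cancer"): 0.9},
}

def joint(assignment):
    """P(X1, ..., Xn) = product over i of P(Xi | parents(Xi))."""
    p = 1.0
    for var, val in assignment.items():
        key = (val,) + tuple(assignment[pa] for pa in parents[var])
        p *= cpt[var][key]
    return p

print(joint({"Age": "20s", "Gender": "Female", "Exposure": "Low",
             "Smoking": "Non-Smoker", "Cancer": "No-Cancer",
             "SerumCalcium": "Negative", "LungTumor": "Negative"}))
```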
Graphical Models Overview [3]:
Inference Problem
Multiply-connected case: exact and approximate inference are #P-complete
Adapted from slides by S. Russell, UC Berkeley: http://aima.cs.berkeley.edu/
Tree Dependent Distributions
• Polytrees
  – aka singly-connected Bayesian networks
  – Definition: a Bayesian network with no undirected loops
  – Idea: restrict distributions (CPTs) to single nodes
  – Theorem: inference in a singly-connected BBN requires linear time
    • Linear in network size, including CPT sizes
    • Much better than for unrestricted (multiply-connected) BBNs
• Tree Dependent Distributions
  – Further restriction of polytrees: every node has at most one parent
  – Now we only need to keep 1 prior, P(root), and n - 1 CPTs (1 per node)
  – All CPTs are 2-dimensional: P(child | parent)
• Independence Assumptions
  – As for a general BBN: x is independent of its non-descendants given its (single) parent z
  – Very strong assumption (applies in some domains but not most)
Propagation Algorithm in Singly-Connected
Bayesian Networks – Pearl (1983)
[Figure: a chain of nodes C1 … C6; upward (child-to-parent) λ messages and downward (parent-to-child) π messages; λ'(Ci') is modified during the λ message-passing phase, and P'(Ci') is computed during the π message-passing phase]
Multiply-connected case: exact, approximate inference are #P-complete
(counting problem is #P-complete iff decision problem is NP-complete)
Adapted from Neapolitan (1990), Guo (2000)
Inference by Clustering [1]: Graph Operations
(Moralization, Triangulation, Maximal Cliques)
[Figure: a Bayesian network (acyclic digraph) over nodes A1, B2, E3, C4, G5, F6, H7, D8 is moralized, then triangulated, and its maximal cliques Clq1 … Clq6 are identified]
Adapted from Neapolitan (1990), Guo (2000)
Inference by Clustering [2]:
Junction Tree – Lauritzen & Spiegelhalter (1988)
Input: list of cliques of the triangulated, moralized graph Gu
Output: a tree of cliques, with separator nodes Si, residual nodes Ri, and potential probability Ψ(Clqi) for each clique Clqi
Algorithm:
1. Si = Clqi ∩ (Clq1 ∪ Clq2 ∪ … ∪ Clqi-1)
2. Ri = Clqi - Si
3. If i > 1 then identify a j < i such that Clqj is a parent of Clqi
4. Assign each node v to a unique clique Clqi such that {v} ∪ c(v) ⊆ Clqi (c(v) denotes the parents of v)
5. Compute Ψ(Clqi) = Π_{v assigned to Clqi} P(v | c(v)), or 1 if no v is assigned to Clqi
6. Store Clqi, Ri, Si, and Ψ(Clqi) at each vertex in the tree of cliques
(Steps 1–2 are sketched in code below.)
Adapted from Neapolitan (1990), Guo (2000)
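A small sketch of steps 1–2 (separator and residual sets) for an ordered clique list; the cliques below are the ones from the worked example on the next slide.

```python
cliques = [
    {"A", "B"},            # Clq1
    {"B", "E", "C"},       # Clq2
    {"E", "C", "G"},       # Clq3
    {"E", "G", "F"},       # Clq4
    {"C", "G", "H"},       # Clq5
    {"C", "D"},            # Clq6
]

seen = set()
for i, clq in enumerate(cliques, start=1):
    s_i = clq & seen          # Si = Clqi ∩ (Clq1 ∪ … ∪ Clqi-1)
    r_i = clq - s_i           # Ri = Clqi - Si
    print(f"Clq{i}: S={sorted(s_i)}, R={sorted(r_i)}")
    seen |= clq
```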
Inference by Clustering [3]:
Clique-Tree Operations
Ri: residual nodes; Si: separator nodes; Ψ(Clqi): potential probability of clique i
  Clq1 = {A, B}:    R1 = {A, B}, S1 = {},     Ψ(Clq1) = P(B|A) P(A)
  Clq2 = {B, E, C}: R2 = {C, E}, S2 = {B},    Ψ(Clq2) = P(C|B,E)
  Clq3 = {E, C, G}: R3 = {G},    S3 = {E, C}, Ψ(Clq3) = 1
  Clq4 = {E, G, F}: R4 = {F},    S4 = {E, G}, Ψ(Clq4) = P(E|F) P(G|F) P(F)
  Clq5 = {C, G, H}: R5 = {H},    S5 = {C, G}, Ψ(Clq5) = P(H|C,G)
  Clq6 = {C, D}:    R6 = {D},    S6 = {C},    Ψ(Clq6) = P(D|C)
[Figure: the corresponding tree of cliques, with each edge labeled by its separator set]
Adapted from Neapolitan (1990), Guo (2000)
Inference by Loop Cutset Conditioning
[Figure: the cancer network with the loop cutset vertex Age (X1) split into one instance per state value: Age = [0, 10) (X1,1), Age = [10, 20) (X1,2), …, Age = [100, ∞) (X1,10); split the vertex in the undirected cycle and condition upon each of its state values]
• Number of network instantiations: product of the arities of the nodes in the minimal loop cutset
• Posterior: marginal conditioned upon cutset variable values (a small sketch follows below)
• Deciding the optimal cutset: NP-hard
• Current open problems
  – Bounded cutset conditioning: ordering heuristics
  – Finding randomized algorithms for loop cutset optimization
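A small sketch of cutset conditioning on a toy diamond network A → {B, C} → D with cutset {A}. The structure and CPT numbers are illustrative assumptions, and each conditioned subproblem is solved here by enumeration rather than by the tree algorithm, to keep the sketch short.

```python
p_A = {0: 0.4, 1: 0.6}
p_B = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}                 # P(B | A)
p_C = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}                 # P(C | A)
p_D = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
       (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.05, 1: 0.95}}     # P(D | B, C)

def joint(a, b, c, d):
    return p_A[a] * p_B[a][b] * p_C[a][c] * p_D[(b, c)][d]

evidence_d = 1                      # observe D = 1
# For each cutset instantiation A = a, the remaining network is singly connected;
# here we just compute the posterior of B within that instantiation and mix.
mix = {0: 0.0, 1: 0.0}
for a in (0, 1):
    weight = sum(joint(a, b, c, evidence_d) for b in (0, 1) for c in (0, 1))
    cond_b = {b: sum(joint(a, b, c, evidence_d) for c in (0, 1)) / weight
              for b in (0, 1)}      # P(B | A=a, D=1)
    for b in (0, 1):
        mix[b] += cond_b[b] * weight            # weight ∝ P(A=a, D=1)
z = sum(mix.values())
posterior_B = {b: v / z for b, v in mix.items()}    # P(B | D=1)
print(posterior_B)
```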
BNJ Visualization [2]
Pseudo-Code Annotation (Code Page)
ALARM Network
© 2004 KSU BNJ Development Team
BNJ Visualization [3]
Poker Network
© 2004 KSU BNJ Development Team
Inference by Variable Elimination [1]:
Intuition
Adapted from slides by S. Russell, UC Berkeley: http://aima.cs.berkeley.edu/
Inference by Variable Elimination [2]:
Factoring Operations
Adapted from slides by S. Russell, UC Berkeley: http://aima.cs.berkeley.edu/
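Since the two variable-elimination slides above are figures, here is a minimal factor-based sketch of the elimination idea on a three-node chain A → B → C, computing P(C). The factor representation and the numbers are illustrative assumptions, not the lecture's example.

```python
from itertools import product

def make_factor(variables, table):
    return {"vars": variables, "table": table}

# P(A), P(B|A), P(C|B) as factors over binary variables
fA = make_factor(("A",), {(0,): 0.6, (1,): 0.4})
fB = make_factor(("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.7})
fC = make_factor(("B", "C"), {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.25, (1, 1): 0.75})

def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    variables = tuple(dict.fromkeys(f["vars"] + g["vars"]))
    table = {}
    for assn in product((0, 1), repeat=len(variables)):
        row = dict(zip(variables, assn))
        table[assn] = (f["table"][tuple(row[v] for v in f["vars"])]
                       * g["table"][tuple(row[v] for v in g["vars"])])
    return make_factor(variables, table)

def sum_out(f, var):
    """Marginalize var out of factor f."""
    keep = tuple(v for v in f["vars"] if v != var)
    table = {}
    for assn, p in f["table"].items():
        row = dict(zip(f["vars"], assn))
        key = tuple(row[v] for v in keep)
        table[key] = table.get(key, 0.0) + p
    return make_factor(keep, table)

# Eliminate A, then B: P(C) = sum_B P(C|B) sum_A P(A) P(B|A)
fB_marg = sum_out(multiply(fA, fB), "A")       # factor over B
fC_marg = sum_out(multiply(fB_marg, fC), "B")  # factor over C
print(fC_marg["table"])                        # {(0,): P(C=0), (1,): P(C=1)}
```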
Genetic Algorithms for Parameter Tuning in
Bayesian Network Structure Learning
[Figure: genetic wrapper for change of representation and inductive bias control – a genetic algorithm [1] proposes a candidate representation α; a representation evaluator for learning problems [2] scores it using the training data D, split into Dtrain (inductive learning) and Dval (inference), together with an inference specification Ie, and returns fitness f(α); the output is an optimized representation α̂]
Tools for Building Graphical Models
• Commercial Tools: Ergo, Netica, TETRAD, Hugin
• Bayes Net Toolbox (BNT) – Murphy (1997-present)
  – Distribution page: http://http.cs.berkeley.edu/~murphyk/Bayes/bnt.html
  – Development group: http://groups.yahoo.com/group/BayesNetToolbox
• Bayesian Network tools in Java (BNJ) – Hsu et al. (1999-present)
  – Distribution page: http://bnj.sourceforge.net
  – Development group: http://groups.yahoo.com/group/bndev
• Current (re)implementation projects for KSU KDD Lab
  • Continuous state: Minka (2002) – Hsu, Guo, Li
  • Formats: XML BNIF (MSBN), Netica – Barber, Guo
  • Space-efficient DBN inference – Meyer
  • Bounded cutset conditioning – Chandak
References [1]:
Graphical Models and Inference Algorithms
• Graphical Models
  – Bayesian (Belief) Networks tutorial – Murphy (2001): http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html
  – Learning Bayesian Networks – Heckerman (1996, 1999): http://research.microsoft.com/~heckerman
• Inference Algorithms
  – Junction Tree (Join Tree, L-S, Hugin): Lauritzen & Spiegelhalter (1988): http://citeseer.nj.nec.com/huang94inference.html
  – (Bounded) Loop Cutset Conditioning: Horvitz & Cooper (1989): http://citeseer.nj.nec.com/shachter94global.html
  – Variable Elimination (Bucket Elimination, ElimBel): Dechter (1996): http://citeseer.nj.nec.com/dechter96bucket.html
  – Recommended Books
    • Neapolitan (1990) – out of print; see Pearl (1988), Jensen (2001)
    • Castillo, Gutierrez & Hadi (1997)
    • Cowell, Dawid, Lauritzen & Spiegelhalter (1999)
  – Stochastic Approximation: http://citeseer.nj.nec.com/cheng00aisbn.html
References [2]:
Machine Learning, KDD, and Bioinformatics
• Machine Learning, Data Mining, and Knowledge Discovery
  – K-State KDD Lab: literature survey and resource catalog (1999-present): http://www.kddresearch.org/Resources
  – Bayesian Network tools in Java (BNJ): Hsu, Barber, King, Meyer, Thornton (2002-present): http://bnj.sourceforge.net
  – Machine Learning in Java (BNJ): Hsu, Louis, Plummer (2002): http://mldev.sourceforge.net
• Bioinformatics
  – European Bioinformatics Institute Tutorial: Brazma et al. (2001): http://www.ebi.ac.uk/microarray/biology_intro.htm
  – Hebrew University: Friedman, Pe'er, et al. (1999, 2000, 2002): http://www.cs.huji.ac.il/labs/compbio/
  – K-State BMI Group: literature survey and resource catalog (2002-2005): http://www.kddresearch.org/Groups/Bioinformatics
Terminology
• Introduction to Reasoning under Uncertainty
  – Probability foundations
    • Definitions: subjectivist, frequentist, logicist
    • (3) Kolmogorov axioms
  – Bayes's Theorem
    • Prior probability of an event
    • Joint probability of an event
    • Conditional (posterior) probability of an event
  – Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses
    • MAP hypothesis: highest conditional probability given observations (data)
    • ML: highest likelihood of generating the observed data
    • ML estimation (MLE): estimating parameters to find the ML hypothesis (a tiny numeric example follows below)
• Bayesian Inference: Computing Conditional Probabilities (CPs) in a Model
• Bayesian Learning: Searching Model (Hypothesis) Space using CPs
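A tiny numeric illustration (not from the lecture) of the MAP vs. ML distinction, using two candidate coin-bias hypotheses and an assumed prior.

```python
from math import comb

hypotheses = {"fair": 0.5, "biased": 0.8}        # P(heads | h)
prior      = {"fair": 0.9, "biased": 0.1}        # assumed prior over hypotheses
heads, flips = 8, 10                             # observed data: 8 heads in 10 flips

likelihood = {h: comb(flips, heads) * p**heads * (1 - p)**(flips - heads)
              for h, p in hypotheses.items()}
posterior_unnorm = {h: likelihood[h] * prior[h] for h in hypotheses}

ml_hypothesis = max(likelihood, key=likelihood.get)               # highest P(D | h)
map_hypothesis = max(posterior_unnorm, key=posterior_unnorm.get)  # highest P(h | D)
print(ml_hypothesis, map_hypothesis)   # ML picks "biased"; this strong prior makes MAP pick "fair"
```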
Summary Points
• Introduction to Probabilistic Reasoning
  – Framework: using probabilistic criteria to search H
  – Probability foundations
    • Definitions: subjectivist, objectivist; Bayesian, frequentist, logicist
    • Kolmogorov axioms
  – Bayes's Theorem
    • Definition of conditional (posterior) probability
    • Product rule
  – Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses
    • Bayes's Rule and MAP
    • Uniform priors: allow use of MLE to generate MAP hypotheses
    • Relation to version spaces, candidate elimination
• Next Week: Chapter 14, Russell & Norvig
  – Later: Bayesian learning – MDL, BOC, Gibbs, Simple (Naïve) Bayes
  – Categorizing text and documents, other applications