EECS 800 Research Seminar
Mining Biological Data
Instructor: Luke Huan
Fall, 2006
The UNIVERSITY of Kansas
Overview
Bayesian networks and other probabilistic graph models
Bayesian networks (informal)
A simple, graphical notation for conditional independence
assertions and hence for compact specification of full joint
distributions
Syntax:
a set of nodes, one per variable
a directed, acyclic graph (link ≈ "directly influences")
a conditional distribution for each node given its parents:
P (Xi | Parents (Xi))
In the simplest case, the conditional distribution is represented
as a conditional probability table (CPT) giving the
distribution over Xi for each combination of parent values
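A minimal sketch (not from the lecture) of this syntax in Python: each node stores its parents and a CPT mapping every combination of parent values to a distribution over the node's own values. The variable names and probabilities below are illustrative only.

```python
# Sketch: nodes, parents, and CPTs for a small Bayesian network.
# Names and numbers are made up for illustration.
bayes_net = {
    "Cavity": {
        "parents": [],
        # With no parents, the CPT has a single row: P(Cavity).
        "cpt": {(): {True: 0.2, False: 0.8}},
    },
    "Toothache": {
        "parents": ["Cavity"],
        # One row per combination of parent values: P(Toothache | Cavity).
        "cpt": {
            (True,):  {True: 0.6, False: 0.4},
            (False,): {True: 0.1, False: 0.9},
        },
    },
    "Catch": {
        "parents": ["Cavity"],
        "cpt": {
            (True,):  {True: 0.9, False: 0.1},
            (False,): {True: 0.2, False: 0.8},
        },
    },
}

def p_given_parents(net, node, value, parent_values):
    """Look up P(node = value | Parents(node) = parent_values)."""
    return net[node]["cpt"][tuple(parent_values)][value]

print(p_given_parents(bayes_net, "Toothache", True, [True]))   # 0.6
```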
Example
Topology of network encodes conditional independence
assertions:
Weather is independent of the other variables
Toothache and Catch are conditionally independent given
Cavity
Example
I'm at work, neighbor John calls to say my alarm is
ringing, but neighbor Mary doesn't call. Sometimes it's set
off by minor earthquakes. Is there a burglar?
Variables: Burglary, Earthquake, Alarm, JohnCalls,
MaryCalls
Network topology reflects "causal" knowledge:
A burglar can set the alarm off
An earthquake can set the alarm off
The alarm can cause Mary to call
The alarm can cause John to call
Example contd.
Semantics
The full joint distribution is defined as the product of the
local conditional distributions:
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))
e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
= P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
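A small sketch of how this product is evaluated for the burglary network. The CPT values below are the usual textbook numbers for this example; the lecture's figure is not reproduced here, so treat them as assumed.

```python
# Sketch: evaluate P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) as the product of local CPT entries.
# CPT numbers are assumed (standard textbook values for this example).
P_B = 0.001                     # P(Burglary = true)
P_E = 0.002                     # P(Earthquake = true)
P_A = {(True, True): 0.95,      # P(Alarm = true | Burglary, Earthquake)
       (True, False): 0.94,
       (False, True): 0.29,
       (False, False): 0.001}
P_J = {True: 0.90, False: 0.05} # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01} # P(MaryCalls = true | Alarm)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) = product of the local conditionals."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = 0.90 · 0.70 · 0.001 · 0.999 · 0.998 ≈ 0.00063
print(joint(b=False, e=False, a=True, j=True, m=True))
```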
Inference
Given the data that “neighbor John calls to say my alarm
is ringing, but neighbor Mary doesn't call”, how do we
make a decision about the following four possible
explanations:
Nothing at all
Burglary but not Earthquake
Earthquake but not Burglary
Burglary and Earthquake
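One way to weigh these explanations is inference by enumeration over the unobserved Alarm variable, sketched below with the same assumed textbook CPT values as in the previous sketch.

```python
from itertools import product

# Sketch: compare the four explanations given the evidence J = true, M = false
# by enumerating over the hidden Alarm variable. CPT values are assumed
# (standard textbook numbers), not taken from the slides.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Unnormalized P(B = b, E = e | j, ¬m): sum the joint over the hidden Alarm.
scores = {(b, e): sum(joint(b, e, a, j=True, m=False) for a in (True, False))
          for b, e in product((False, True), repeat=2)}
total = sum(scores.values())
for (b, e), s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"Burglary={b!s:5} Earthquake={e!s:5}  P = {s / total:.4f}")
```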
Learning
Suppose that we only have a joint distribution; how do
we “learn” the topology of a BN?
Application: Clustering Users
Input: TV shows that each user watches
Output: TV show “clusters”
Assumption: shows watched by the same users are similar
Class 1
• Power Rangers
• Animaniacs
• X-Men
• Tazmania
• Spider-Man
Class 2
• Young and the Restless
• Bold and the Beautiful
• As the World Turns
• Price is Right
• CBS eve news
Class 3
• Tonight Show
• Conan O’Brien
• NBC nightly news
• Later with Kinnear
• Seinfeld
Class 4
• 60 Minutes
• NBC nightly news
• CBS eve news
• Murder She Wrote
• Matlock
Class 5
• Seinfeld
• Friends
• Mad About You
• ER
• Frasier
Application: Finding Regulatory Networks
P(Level | Module, Regulators)
[Figure: a Bayesian network with nodes Module, Regulator1, Regulator2, Regulator3, Gene, Experiment, Level, and Expression; example regulators shown include HAP4, BMH1, and GIC2, and the annotations ask which module gene “g” belongs to and point to the expression level of CMK1 in each experiment.]
Expression level in each module is a function of the expression of its regulators
Application: Finding Regulatory Networks
[Figure: an inferred regulatory network of numbered modules (e.g., 1–5, 8–18, 25, 26, 30, 31, 33, 36, 39, 41, 42, 47, 48) grouped by function (energy and cAMP signaling; DNA and RNA processing, nuclear; amino acid metabolism) and connected to regulators and motifs including Gat1, Hap4, CBF1_B, GCN4, GATA, Msn4, Xbp1, HAP234, Sip2, STRE, Kin82, Tpk1, MIG1, CAT8, ADR1, MCM1, Yer184c, Cmk1, Ppt1, Lsg1, Pph3, Gis1, Tpk2, Gac1, GCR1, HSF, XBP1, HAC1, Yap6, Ypl230w, ABF_C, Ime4, Bmh1, Gcn20, and Not3.
Legend: inferred regulation; regulation supported in literature; regulator (signaling molecule); regulator (transcription factor); enriched cis-regulatory motif; experimentally tested regulator; module (number).]
Constructing Bayesian networks
Given:
We know the joint distribution of X = X1, …, Xn
We know the “topology” of X
∀ Xi ∈ X, we know the parents of Xi
Goal: we want to create a Bayesian network that captures
the joint distribution according to the topology
Theorem: such a BN exists
Prove by Construction
A leaf in X is an Xi ∈ X such that Xi has no child.
For each Xi:
add Xi to the network
select parents from X1, …, Xi-1 such that
P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi-1)
X = X – {Xi}
This choice of parents guarantees:
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi-1)   (chain rule)
             = ∏_{i=1}^{n} P(Xi | Parents(Xi))   (by construction)
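A sketch of this construction on a tiny, assumed joint distribution: for each variable, taken in a fixed order, we search for a smallest subset of its predecessors whose conditional matches the full conditional. The three-variable joint below is illustrative, not from the lecture.

```python
from itertools import product, combinations

# Assumed illustrative joint over three binary variables A, B, C (order A, B, C),
# built so that B depends on A and C depends only on B.
p_a = {0: 0.4, 1: 0.6}
p_b_given_a = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}
joint = {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
         for a, b, c in product((0, 1), repeat=3)}

def marginal(idxs):
    """Marginal over the variables at positions idxs, keyed by value tuples."""
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in idxs)
        out[key] = out.get(key, 0.0) + p
    return out

def conditional_matches(i, parents, predecessors, tol=1e-9):
    """True iff P(X_i | parents) equals P(X_i | predecessors) for every assignment."""
    num_pred, den_pred = marginal(predecessors + [i]), marginal(predecessors)
    num_par, den_par = marginal(parents + [i]), marginal(parents)
    for assignment in joint:
        pred_key = tuple(assignment[j] for j in predecessors)
        par_key = tuple(assignment[j] for j in parents)
        lhs = num_pred[pred_key + (assignment[i],)] / den_pred[pred_key]
        rhs = num_par[par_key + (assignment[i],)] / den_par[par_key]
        if abs(lhs - rhs) > tol:
            return False
    return True

parents_of = {}
for k, i in enumerate([0, 1, 2]):               # process variables in order
    predecessors = list(range(k))
    for size in range(len(predecessors) + 1):    # try smallest parent sets first
        candidates = (list(s) for s in combinations(predecessors, size))
        chosen = next((s for s in candidates
                       if conditional_matches(i, s, predecessors)), None)
        if chosen is not None:
            parents_of[i] = chosen
            break

print(parents_of)   # expected: {0: [], 1: [0], 2: [1]}
```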
Compactness
A CPT for Boolean Xi with k Boolean parents has 2^k rows for the
combinations of parent values
Each row requires one number p for Xi = true
(the number for Xi = false is just 1 - p)
If each variable has no more than k parents, the complete network
requires O(n · 2^k) numbers
I.e., grows linearly with n, vs. O(2^n) for the full joint distribution
For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 - 1 = 31)
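A quick sketch of this count for the burglary network; only the parent counts are needed, not the CPT values.

```python
# Sketch: count CPT parameters from parent counts alone. A Boolean node with
# k Boolean parents needs 2**k numbers (one per combination of parent values).
parents = {"Burglary": 0, "Earthquake": 0, "Alarm": 2, "JohnCalls": 1, "MaryCalls": 1}

bn_numbers = sum(2 ** k for k in parents.values())
full_joint_numbers = 2 ** len(parents) - 1

print(bn_numbers)           # 1 + 1 + 4 + 2 + 2 = 10
print(full_joint_numbers)   # 2**5 - 1 = 31
```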
Reasoning: Probability Theory
Well understood framework for modeling uncertainty
Partial knowledge of the state of the world
Noisy observations
Phenomenon not covered by our model
Inherent stochasticity
Clear semantics
Can be learned from data
Probability Theory
A (discrete) probability P over (Ω, S = 2^Ω) is a mapping from
elements in S to real values such that:
Ω is the set of all possible outcomes (the sample space) of a probabilistic
experiment, and S is the set of “events”
P(α) ≥ 0 for all α ∈ S
P(Ω) = 1
If α, β ∈ S and α ∩ β = ∅, then P(α ∪ β) = P(α) + P(β)
Conditional probability: P(α | β) = P(α ∩ β) / P(β)
Chain rule: P(α ∩ β) = P(α | β) P(β)
Bayes rule: P(α | β) = P(β | α) P(α) / P(β)
Conditional independence: P(α | β ∩ γ) = P(α | γ), written (α ⊥ β | γ)
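These rules can be checked directly on a small sample space. The sketch below (illustrative, not from the lecture) implements conditional probability, the chain rule, and Bayes rule on events represented as sets of outcomes; the outcome probabilities are made up.

```python
# Sketch: the rules above, with events as sets of outcomes over a small Ω.
omega = {"s1": 0.1, "s2": 0.2, "s3": 0.3, "s4": 0.4}   # P over outcomes, sums to 1

def prob(event):
    """P(event) for an event given as a set of outcomes."""
    return sum(omega[w] for w in event)

def cond(alpha, beta):
    """Conditional probability P(alpha | beta) = P(alpha ∩ beta) / P(beta)."""
    return prob(alpha & beta) / prob(beta)

alpha, beta = {"s1", "s2"}, {"s2", "s3"}

# Chain rule: P(alpha ∩ beta) = P(alpha | beta) P(beta)
assert abs(prob(alpha & beta) - cond(alpha, beta) * prob(beta)) < 1e-12

# Bayes rule: P(alpha | beta) = P(beta | alpha) P(alpha) / P(beta)
assert abs(cond(alpha, beta) - cond(beta, alpha) * prob(alpha) / prob(beta)) < 1e-12
print(cond(alpha, beta))   # P({s1,s2} | {s2,s3}) = 0.2 / 0.5 = 0.4
```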
Random Variables & Notation
Random variable: a function from Ω to non-negative real
values such that the summation of all the values is 1.
Val(X) – the set of possible values of RV X
Upper-case letters denote RVs (e.g., X, Y, Z)
Upper-case bold letters denote sets of RVs (e.g., X, Y)
Lower-case letters denote RV values (e.g., x, y, z)
Lower-case bold letters denote RV set values (e.g., x)
E.g., P(X = x); P(X) = {P(X = x) | x ∈ Val(X)}
Joint Probability Distribution
Given a group of random variables X = X1, …, Xn, where each
Xi takes values from a set Val(Xi), the joint probability distribution is a
function that maps each element of Ω = ∏_i Val(Xi) to a non-negative value
such that the summation of all the values is 1.
For example, RV Weather takes four values “sunny, rainy, cloudy,
snow”, and RV Cavity takes 2 values “true, false”
P(Weather, Cavity) = a 4 × 2 matrix of values:

Weather =        sunny   rainy   cloudy   snow
Cavity = true    0.144   0.02    0.016    0.02
Cavity = false   0.576   0.08    0.064    0.08
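As a sketch, the same P(Weather, Cavity) table can be stored as a dictionary so the later marginalization and independence checks can be run against it; the numbers are the slide's.

```python
# Sketch: the P(Weather, Cavity) table above, keyed by (weather, cavity).
joint = {
    ("sunny", True): 0.144,  ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}
assert abs(sum(joint.values()) - 1.0) < 1e-12   # a valid joint distribution
```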
Marginal Probability
Given a set of RVs X and its joint probabilities, the marginal
probability distribution over X′ ⊆ X is:
P(X′) = Σ_{X \ X′} P(X)
Weather =        sunny   rainy   cloudy   snow
Cavity = true    0.144   0.02    0.016    0.02
Cavity = false   0.576   0.08    0.064    0.08
P(weather=sunny) = 0.144 + 0.576 = 0.72
P(Cavity=true) = 0.144+0.02+0.016 + 0.02 = 0.2
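A sketch of the summation above on the same table (numbers from the slide).

```python
# Sketch: marginalize P(Weather, Cavity) by summing out the other variable.
joint = {("sunny", True): 0.144, ("rainy", True): 0.02,
         ("cloudy", True): 0.016, ("snow", True): 0.02,
         ("sunny", False): 0.576, ("rainy", False): 0.08,
         ("cloudy", False): 0.064, ("snow", False): 0.08}

def marginal_weather(w):
    """P(Weather = w) = sum over Cavity of P(Weather = w, Cavity)."""
    return sum(p for (weather, _), p in joint.items() if weather == w)

def marginal_cavity(c):
    """P(Cavity = c) = sum over Weather of P(Weather, Cavity = c)."""
    return sum(p for (_, cavity), p in joint.items() if cavity == c)

print(marginal_weather("sunny"))   # 0.144 + 0.576 = 0.72
print(marginal_cavity(True))       # 0.144 + 0.02 + 0.016 + 0.02 = 0.2
```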
Independence
Two RVs X, Y are independent, denoted X ⊥ Y, if
∀x ∈ Val(X), y ∈ Val(Y): P(X = x, Y = y) = P(X = x) P(Y = y)
Conditional independence: X is independent of Y given Z
if:
∀x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z): (X = x ⊥ Y = y | Z = z)
Weather =        sunny   rainy   cloudy   snow
Cavity = true    0.144   0.02    0.016    0.02
Cavity = false   0.576   0.08    0.064    0.08
P(Weather=sunny) = 0.144 + 0.576 = 0.72
P(Cavity=true) = 0.144 + 0.02 + 0.016 + 0.02 = 0.2
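For this particular table, Weather and Cavity are in fact independent; a short check over the slide's numbers:

```python
# Sketch: check whether Weather ⊥ Cavity by comparing P(w, c) with P(w) · P(c).
joint = {("sunny", True): 0.144, ("rainy", True): 0.02,
         ("cloudy", True): 0.016, ("snow", True): 0.02,
         ("sunny", False): 0.576, ("rainy", False): 0.08,
         ("cloudy", False): 0.064, ("snow", False): 0.08}

p_weather = {w: sum(p for (w2, _), p in joint.items() if w2 == w)
             for w in {w for w, _ in joint}}
p_cavity = {c: sum(p for (_, c2), p in joint.items() if c2 == c)
            for c in {c for _, c in joint}}

independent = all(abs(p - p_weather[w] * p_cavity[c]) < 1e-12
                  for (w, c), p in joint.items())
print(independent)   # True: e.g., 0.72 * 0.2 = 0.144 = P(sunny, Cavity=true)
```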
Representing Joint Distributions
Random variables: X1,…,Xn
P is a joint distribution over X1,…,Xn
If X1, …, Xn are binary, need 2^n parameters to describe P
Can we represent P more compactly?
Key: Exploit independence properties
Independent Random Variables
If X and Y are independent then:
P(X, Y) = P(X|Y)P(Y) = P(X)P(Y)
If X1,…,Xn are independent then:
P(X1,…,Xn) = P(X1)…P(Xn)
O(n) parameters
All 2^n probabilities are implicitly defined
Cannot represent many types of distributions
We may need to consider conditional independence
Conditional Parameterization
S = Score on test, Val(S) = {s0,s1}
I = Intelligence, Val(I) = {i0,i1}
G = Grade, Val(G) = {g0,g1,g2}
Assume that G and S are independent given I
Joint parameterization:
2 · 2 · 3 = 12 entries, i.e., 12 - 1 = 11 independent parameters
Conditional parameterization:
P(I,S,G) = P(I)P(S|I)P(G|I,S) = P(I)P(S|I)P(G|I)
P(I) – 1 independent parameter
P(S|I) – 2 · 1 = 2 independent parameters
P(G|I) – 2 · 2 = 4 independent parameters
7 independent parameters in total
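A small arithmetic sketch of the two counts:

```python
# Sketch: parameter counts for the joint vs. the conditional parameterization.
vals = {"I": 2, "S": 2, "G": 3}

joint_params = vals["I"] * vals["S"] * vals["G"] - 1   # 12 - 1 = 11

# With G ⊥ S | I: P(I,S,G) = P(I) P(S|I) P(G|I)
cond_params = ((vals["I"] - 1)
               + vals["I"] * (vals["S"] - 1)
               + vals["I"] * (vals["G"] - 1))          # 1 + 2 + 4 = 7

print(joint_params, cond_params)   # 11 7
```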
Naïve Bayes Model
Class variable C, Val(C) = {c1,…,ck}
Evidence variables X1,…,Xn
Naïve Bayes assumption: evidence variables are conditionally independent
given C
P(C, X1, …, Xn) = P(C) ∏_{i=1}^{n} P(Xi | C)
Applications in medical diagnosis, text classification
Used as a classifier:
P(C = c1 | x1, …, xn) / P(C = c2 | x1, …, xn)
= [P(C = c1) / P(C = c2)] · ∏_{i=1}^{n} P(xi | C = c1) / P(xi | C = c2)
Problem: Double counting correlated evidence
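A minimal sketch of this two-class posterior ratio; the priors and likelihoods below are made up for illustration.

```python
# Sketch: the two-class naive Bayes posterior ratio, on made-up probabilities.
prior = {"c1": 0.6, "c2": 0.4}
# P(x_i = true | C) for three conditionally independent evidence variables.
likelihood = {"c1": [0.8, 0.3, 0.9], "c2": [0.4, 0.5, 0.2]}

def posterior_ratio(evidence):
    """P(C=c1 | x) / P(C=c2 | x), with each x_i in {True, False}."""
    ratio = prior["c1"] / prior["c2"]
    for i, x in enumerate(evidence):
        p1 = likelihood["c1"][i] if x else 1 - likelihood["c1"][i]
        p2 = likelihood["c2"][i] if x else 1 - likelihood["c2"][i]
        ratio *= p1 / p2
    return ratio

r = posterior_ratio([True, False, True])
print("predict c1" if r > 1 else "predict c2", r)
```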
Bayesian Networks: A Formal Study
A Bayesian network on a group of random variables X =
X1, …, Xn is a tuple (T, P) such that
the topology T ⊆ X × X is a directed acyclic graph, and
P is a joint distribution such that
for all i ∈ [1, n] and all possible values xi and xs,
P(Xi = xi | XS = xs) = P(Xi = xi | parents(Xi) = xs)
where S is the set of non-descendants of Xi in X
In other words, Xi is conditionally independent of any of its non-descendant
variables, given its parents(Xi)
Factorization Theorem
If G is an Independence-Map (I-map) of P, then
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Pa(Xi))
Proof:
Let X1, …, Xn be an ordering consistent with G
By the chain rule: P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi-1)
From the assumption: Pa(Xi) ⊆ {X1, …, Xi-1}
and {X1, …, Xi-1} ⊆ Pa(Xi) ∪ NonDesc(Xi)
Since G is an I-Map: (Xi ⊥ NonDesc(Xi) | Pa(Xi)) ∈ I(P)
Hence P(Xi | X1, …, Xi-1) = P(Xi | Pa(Xi))
Factorization Implies I-Map
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Pa(Xi))  ⇒  G is an I-Map of P
Proof:
Need to show that P(Xi | ND(Xi)) = P(Xi | Pa(Xi)),
where D is the set of descendants of node i and ND is the set of all nodes except i and D
P(Xi | ND(Xi)) = P(Xi, ND(Xi)) / P(ND(Xi))
= [ Σ_d P(Xi | Pa(Xi)) ∏_{j∈ND} P(Xj | Pa(Xj)) ∏_{j∈D} P(Xj | Pa(Xj)) ]
  / [ Σ_{d,i} ∏_{j∈ND} P(Xj | Pa(Xj)) ∏_{j∈D∪{i}} P(Xj | Pa(Xj)) ]
= P(Xi | Pa(Xi))
(the sum over the descendants in the numerator, and over the descendants and Xi in the denominator, each evaluate to 1)
Probabilistic Graphical Models
Tool for representing complex systems and performing
sophisticated reasoning tasks
Fundamental notion: Modularity
Complex systems are built by combining simpler parts
Why have a model?
Compact and modular representation of complex systems
Ability to execute complex reasoning patterns
Make predictions
Generalize from particular problem
Probabilistic Graphical Models
Increasingly important in Machine Learning
Many classical probabilistic problems in statistics,
information theory, pattern recognition, and statistical
mechanics are special cases of the formalism
Graphical models provide a common framework
Advantage: specialized techniques developed in one field can be
transferred between research communities
Representation: Graphs
Intuitive data structure for modeling highly-interacting
sets of variables
Explicit model for modularity
Data structure that allows for the design of efficient general-purpose algorithms
Reference
“Bayesian Networks and Beyond”,
Daphne Koller (Stanford) & Nir Friedman (Hebrew U.)