Download Sobel_Graphical Models

Graphical Models for Machine
Learning and Computer Vision
Statistical Models
• Statistical models describe observed ‘DATA’ via an
assumed likelihood:
L(DATA | Θ)
• With Θ denoting the ‘parameters’ needed to describe the
data.
• Likelihoods measure how probable the observed data are
under the model. They implicitly assume an error mechanism
(in the translation between what was observed and what was
‘supposed’ to be observed).
• Parameters may describe model features or even specify
different models.
An Example of a Statistical
Model
• A burglar alarm is affected by both earthquakes
and burglaries. It has a mechanism to
communicate with the homeowner if activated. It
went off at Judah Pearle's house one day. Should
he:
a) immediately call the police under suspicion
that a burglary took place, or
b) go home and immediately transfer his
valuables elsewhere?
A Statistical Analysis
• Observation: The burglar alarm went off
(i.e., a=1);
• Parameter 1: The presence or absence of an
earthquake (i.e., e=1,0);
• Parameter 2: The presence or absence of a
burglary at Judah’s house
(i.e., b=1,0).
LIKELIHOODS/PRIORS IN
THIS CASE
• The likelihood associated with the
observation is:
L(DATA | Θ) = P(a = 1 | b, e)
• With b,e = 0,1 (depending on whether a
burglary or earthquake has taken place).
• The priors specify the probabilities of a
burglary or earthquake happening:
P(b = 1) = ?; P(e = 1) = ?;
Example Probabilities
• Here are some probabilities indicating
something about the likelihood and prior:
P(b = 0) = .9; P(b = 1) = .1;
P(a=1|e=b=0)=.001; P(a=1|b=1,e=0)=.368;
P(a=1|e=1,b=0)=.135; P(a=1|b=e=1)=.607;
LIKELIHOOD/PRIOR
INTERPRETATION
• Burglaries are as likely (a priori) as earthquakes.
• It is unlikely that the alarm just went off by itself.
• The alarm goes off more often when a burglary
happens but an earthquake does not than in the
reverse case, i.e., when an earthquake happens but a
burglary does not.
• If both a burglary and an earthquake happen, the
alarm is considerably more likely to go off (.607)
than with a burglary alone (.368).
Probability Propagation Graph
[Figure: probability propagation graph]
PROBABILITY PROPAGATION
• There are two kinds of probability
propagation (see Frey 1998):
a) marginalization, i.e., $P(B \to b)$
b) multiplication, i.e., $P(b \to B)$
• Marginalization sums over terms leading
into the node;
• Multiplication multiplies over terms leading
into the node.
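The two operations can be sketched on the alarm example as follows. This is a minimal illustration, not the slides' own code: the dictionary names are made up for the example, and the earthquake prior P(e = 1) = .1 is an assumption taken from the statement that burglaries and earthquakes are equally likely a priori.

```python
# Sum-product messages on the burglar-alarm graph, with a = 1 observed.
# Assumption: P(e = 1) = .1, mirroring the burglary prior.

P_B = {0: 0.9, 1: 0.1}   # prior on burglary
P_E = {0: 0.9, 1: 0.1}   # prior on earthquake (assumed)
P_A1 = {                 # P(a = 1 | b, e), keyed by (b, e)
    (0, 0): 0.001,
    (1, 0): 0.368,
    (0, 1): 0.135,
    (1, 1): 0.607,
}


def message_A_to_b(b):
    """Marginalization: sum the alarm term over e, weighted by the prior on e."""
    return sum(P_A1[(b, e)] * P_E[e] for e in (0, 1))


def belief_b(b):
    """Multiplication: product of the terms arriving at b (prior and alarm message)."""
    return P_B[b] * message_A_to_b(b)


unnormalized = {b: belief_b(b) for b in (0, 1)}
total = sum(unnormalized.values())
posterior_b = {b: unnormalized[b] / total for b in (0, 1)}

print("message A -> b:", {b: round(message_A_to_b(b), 4) for b in (0, 1)})
print("P(b = 1 | a = 1):", round(posterior_b[1], 3))
```

Normalizing at the end is what turns the product of incoming terms into a probability.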
CAUSAL ANALYSIS
• To analyze the causes of the alarm going
off, we calculate the probability that it was a
burglary (in this case) and compare it with
the probability that it was an earthquake:
$$P(b = 1 \mid a = 1) \propto P(B \to b)\,P(A \to b)$$
$$= (.1) \cdot \sum_{e} P(a = 1 \mid e, b = 1)\,P(e \to A)$$
$$= .1 \cdot (.368 \cdot .9 + .607 \cdot .1) = .1 \cdot .3919$$
CAUSAL ANALYSIS II
• So, after normalization:
P(b = 1 | a = 1) = .751
• Similarly, P(e = 1 | a = 1) = .349
• So, if we had to choose between burglary
and earthquake as a cause of making the
alarm go off, we should choose burglary.
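The normalization step can be written out as a worked calculation. It assumes P(e = 1) = .1 and P(e = 0) = .9, the same weights that appear in the burglary term.

```latex
\begin{align*}
P(b = 1, a = 1) &= .1 \cdot (.368 \cdot .9 + .607 \cdot .1) = .03919 \\
P(b = 0, a = 1) &= .9 \cdot (.001 \cdot .9 + .135 \cdot .1) = .9 \cdot .0144 = .01296 \\
P(a = 1) &= .03919 + .01296 = .05215 \\
P(b = 1 \mid a = 1) &= .03919 / .05215 \approx .751 \\
P(e = 1, a = 1) &= .1 \cdot (.135 \cdot .9 + .607 \cdot .1) = .1 \cdot .1822 = .01822 \\
P(e = 1 \mid a = 1) &= .01822 / .05215 \approx .349
\end{align*}
```

The two posteriors need not sum to one, since a burglary and an earthquake can occur together.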
Markov Chain Monte Carlo for
the Burglar Problem
• For the current value e = e*, calculate
$P(b = 0 \mid a = 1, e = e^*),\ P(b = 1 \mid a = 1, e = e^*)$
• or
$P(A \to b \mid e = e^*) \cdot P(B \to b \mid e = e^*)$
• Simulate b from this distribution. Call the result
b*. Now calculate:
$P(e = 0 \mid b = b^*, a = 1),\ P(e = 1 \mid b = b^*, a = 1)$
• Or
$P(A \to e \mid b^*) \cdot P(E \to e \mid b^*)$
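The alternating scheme above is a Gibbs sampler, and a minimal sketch of it follows. The function names, burn-in length, and seed are illustrative choices, and P(e = 1) = .1 is again an assumption rather than a value stated on the slides.

```python
import random

P_B = {0: 0.9, 1: 0.1}   # prior on burglary
P_E = {0: 0.9, 1: 0.1}   # prior on earthquake (assumed)
P_A1 = {                 # P(a = 1 | b, e), keyed by (b, e)
    (0, 0): 0.001,
    (1, 0): 0.368,
    (0, 1): 0.135,
    (1, 1): 0.607,
}


def sample_b(e_star, rng):
    """Draw b from P(b | a = 1, e = e*), proportional to P(a=1 | b, e*) P(b)."""
    w0 = P_A1[(0, e_star)] * P_B[0]
    w1 = P_A1[(1, e_star)] * P_B[1]
    return 1 if rng.random() < w1 / (w0 + w1) else 0


def sample_e(b_star, rng):
    """Draw e from P(e | a = 1, b = b*), proportional to P(a=1 | b*, e) P(e)."""
    w0 = P_A1[(b_star, 0)] * P_E[0]
    w1 = P_A1[(b_star, 1)] * P_E[1]
    return 1 if rng.random() < w1 / (w0 + w1) else 0


def gibbs(n_samples, burn_in=1000, seed=0):
    """Alternate the two conditional draws and average the retained samples."""
    rng = random.Random(seed)
    b, e = 0, 0
    b_count = e_count = 0
    for t in range(burn_in + n_samples):
        b = sample_b(e, rng)
        e = sample_e(b, rng)
        if t >= burn_in:
            b_count += b
            e_count += e
    return b_count / n_samples, e_count / n_samples


p_b, p_e = gibbs(50000)
print("estimated P(b = 1 | a = 1):", p_b)
print("estimated P(e = 1 | a = 1):", p_e)
```

With enough samples the estimates should settle near the exact posteriors from the causal analysis, up to Monte Carlo error.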
Independent Hidden Variables: A
Factorial Model
• In statistical modeling it is often advantageous to
treat variables which are not observed as ‘hidden’.
This means that they themselves have
distributions. In our case suppose b and e are
independent hidden variables:
P(b = 1) = β; P(b = 0) = 1 - β;
P(e = 1) = ε; P(e = 0) = 1 - ε;
Then optimally:
P(b = 1 | a = 1) = .951
P(e = 1 | a = 1) = .186
Nonfactorial Hidden Variable
Models
• Suppose b and e are dependent hidden
variables:
$P(b = 1, e = 1) = p_{1,1}$; $P(b = 1, e = 0) = p_{1,0}$;
$P(b = 0, e = 1) = p_{0,1}$;
$P(b = 0, e = 0) = 1 - p_{1,1} - p_{1,0} - p_{0,1}$
• Then a similar analysis yields a related
result
INFORMATION
• The difference in information available from
parameters after observing the alarm versus
before the alarm was observed is:
$$I(\beta,\varepsilon) = \sum_{b,e} L(b,e \mid \beta,\varepsilon)\,\log\left[\frac{L(b,e \mid \beta,\varepsilon)}{L(b,e,a = 1)}\right]$$
• This is the Kullback-Leibler ‘distance’ $D(Q \,\|\, P)$
between the prior and posterior distributions.
• Parameters β,ε are chosen to optimize this
distance.
INFORMATION IN THIS
EXAMPLE
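• The information available in this example is
$$I(\beta,\varepsilon) = -H(\beta) - H(\varepsilon) + \sum_{b,e} L(b,e \mid \beta,\varepsilon)\left[-\log P(a = 1 \mid b,e) - (b + e)\log(.1) - (2 - b - e)\log(.9)\right]$$
• Calculated using:
$$L(b,e \mid \beta,\varepsilon) = \beta^{b}(1 - \beta)^{1-b}\,\varepsilon^{e}(1 - \varepsilon)^{1-e}$$
$$L(b,e,a = 1) = P(a = 1 \mid b,e)\; .9^{1-b} \cdot .1^{b} \cdot .1^{e} \cdot .9^{1-e}$$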
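One way to carry out the optimization numerically is sketched below. It makes three assumptions: that I(β,ε) is the sum over b and e of L(b,e | β,ε) log[L(b,e | β,ε) / L(b,e,a=1)], that "optimize" means minimize, and that alternating closed-form coordinate updates are an acceptable solver. The slides do not say how the optimum was found, and all function names here are illustrative.

```python
import math

P_A1 = {(0, 0): 0.001, (1, 0): 0.368, (0, 1): 0.135, (1, 1): 0.607}
PRIOR_B1 = 0.1   # P(b = 1)
PRIOR_E1 = 0.1   # P(e = 1), assumed equal to the burglary prior


def joint(b, e):
    """L(b, e, a = 1) = P(a = 1 | b, e) P(b) P(e)."""
    pb = PRIOR_B1 if b else 1 - PRIOR_B1
    pe = PRIOR_E1 if e else 1 - PRIOR_E1
    return P_A1[(b, e)] * pb * pe


def factorial(b, e, beta, eps):
    """L(b, e | beta, eps) for independent hidden b and e."""
    return (beta if b else 1 - beta) * (eps if e else 1 - eps)


def information(beta, eps):
    """I(beta, eps) as defined in the lead-in (terms with zero weight are skipped)."""
    total = 0.0
    for b in (0, 1):
        for e in (0, 1):
            w = factorial(b, e, beta, eps)
            if w > 0:
                total += w * math.log(w / joint(b, e))
    return total


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def minimize(iterations=100):
    """Alternate exact minimization over beta (eps fixed) and eps (beta fixed)."""
    beta, eps = 0.5, 0.5
    for _ in range(iterations):
        beta = sigmoid((1 - eps) * math.log(joint(1, 0) / joint(0, 0))
                       + eps * math.log(joint(1, 1) / joint(0, 1)))
        eps = sigmoid((1 - beta) * math.log(joint(0, 1) / joint(0, 0))
                      + beta * math.log(joint(1, 1) / joint(1, 0)))
    return beta, eps


beta_hat, eps_hat = minimize()
print("beta:", round(beta_hat, 3), "eps:", round(eps_hat, 3))
print("I at optimum:", information(beta_hat, eps_hat))
```

Under these assumptions the minimizer lands at roughly β ≈ .95 and ε ≈ .19, close to, though not identical with, the values quoted for the factorial model.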

Markov Random Fields
• Markov Random Fields are simply Graphical
Models set in a 2 or higher dimensional field.
Their fundamental criterion is that the distribution
of a point x conditional on all of those that remain
(i.e., -x) is identical to its distribution given a
neighborhood ‘N’ of it, i.e.,
$L(x \mid -x) = L(x \mid N_x)$
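The neighborhood criterion can be checked by brute force on a tiny lattice. The sketch below uses an Ising-style field with ±1 values and a nearest-neighbor coupling; the model, the lattice size, and the `COUPLING` value are illustrative assumptions, not something taken from the slides.

```python
import itertools
import math

ROWS, COLS = 3, 3
COUPLING = 0.8   # hypothetical interaction strength


def neighbors(r, c):
    """Four-connected neighbors of (r, c) that lie inside the lattice."""
    out = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < ROWS and 0 <= cc < COLS:
            out.append((rr, cc))
    return out


def unnormalized(field):
    """exp(COUPLING * sum of x_i * x_j over neighboring pairs)."""
    s = 0
    for (r, c), v in field.items():
        for n in neighbors(r, c):
            s += v * field[n]
    return math.exp(COUPLING * s / 2)   # each pair was counted twice


def conditional_given_rest(field, site):
    """P(x_site = +1 | every other site), computed from the full joint."""
    up = dict(field)
    up[site] = 1
    down = dict(field)
    down[site] = -1
    pu, pd = unnormalized(up), unnormalized(down)
    return pu / (pu + pd)


def conditional_given_neighbors(field, site):
    """P(x_site = +1 | neighbors only), using just the local sum."""
    local = sum(field[n] for n in neighbors(*site))
    return 1.0 / (1.0 + math.exp(-2 * COUPLING * local))


sites = [(r, c) for r in range(ROWS) for c in range(COLS)]
max_gap = 0.0
for values in itertools.product((-1, 1), repeat=len(sites)):
    field = dict(zip(sites, values))
    gap = abs(conditional_given_rest(field, (1, 1))
              - conditional_given_neighbors(field, (1, 1)))
    max_gap = max(max_gap, gap)

print("largest discrepancy over all configurations:", max_gap)
```

The discrepancy stays at floating-point noise because every term not involving the chosen site cancels in the ratio.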
EXAMPLE OF A RANDOM
FIELD
• Modeling a video frame is typically done
via a random field. Parameters identify our
expectations of what the frame looks like.
• We can ‘clean up’ video frames or related
media using a methodology which
distinguishes between what we expect and
what was observed.
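As one concrete instance of such a clean-up, the sketch below denoises a small binary image with iterated conditional modes (ICM) under an Ising-style smoothness prior. ICM, the ±1 pixel coding, and the `coupling` and `fidelity` weights are choices made for this illustration; the slides do not name a specific method.

```python
def make_clean(rows=8, cols=8):
    """Toy frame: left half -1, right half +1."""
    return [[-1 if c < cols // 2 else 1 for c in range(cols)] for _ in range(rows)]


def icm_denoise(observed, coupling=1.0, fidelity=1.0, sweeps=5):
    """Set each pixel to the value favored by its observation and its neighbors."""
    rows, cols = len(observed), len(observed[0])
    x = [row[:] for row in observed]
    for _ in range(sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                nb = 0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        nb += x[rr][cc]
                # "What was observed" pulls toward the data,
                # "what we expect" pulls toward agreement with neighbors.
                score = fidelity * observed[r][c] + coupling * nb
                new = x[r][c] if score == 0 else (1 if score > 0 else -1)
                if new != x[r][c]:
                    x[r][c] = new
                    changed = True
        if not changed:
            break
    return x


clean = make_clean()
noisy = [row[:] for row in clean]
for r, c in ((2, 1), (5, 6)):      # flip two isolated pixels
    noisy[r][c] = -noisy[r][c]

restored = icm_denoise(noisy)
print("restored equals clean:", restored == clean)
```

Raising `coupling` relative to `fidelity` trusts the expectation more than the observation, which smooths harder but can erase genuine detail.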
GENERALIZATION
• This can be generalized to non-discrete
likelihoods with non-discrete parameters.
• More generally (sans data) assume that a
movie (consisting of many frames, each of
which consists of grey level pixel values
over a lattice) is observed. We would like
to ‘detect’ ‘unnatural’ events.
GENERALIZATION II
• Assume a model for frame i (given frame i-1)
taking the form,
L (Frame[i] | Θ,Frame[i - 1])
• The parameters Θ typically denote invariant
features for pictures of cars, houses, etc.
• The presence or absence of unnatural events can
be described by hidden variables.
• The (frame) likelihood describes the natural
evolution of the movie over time.
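A toy version of this idea is sketched below. It assumes a Gaussian frame-to-frame likelihood in which Θ is a single standard deviation `sigma`, and it replaces the hidden-variable treatment with a plain threshold on the per-pixel log-likelihood. Both simplifications, the function names, and the sample frames are hypothetical.

```python
import math


def frame_log_likelihood(frame, previous, sigma):
    """Gaussian log L(Frame[i] | sigma, Frame[i-1]) over flattened pixel lists."""
    n = len(frame)
    squared = sum((x - p) ** 2 for x, p in zip(frame, previous))
    return -0.5 * squared / sigma ** 2 - n * math.log(sigma * math.sqrt(2 * math.pi))


def flag_unnatural(frames, sigma, threshold):
    """Flag frame i when its per-pixel log-likelihood given frame i-1 is too low."""
    flags = []
    for i in range(1, len(frames)):
        ll = frame_log_likelihood(frames[i], frames[i - 1], sigma)
        flags.append(ll / len(frames[i]) < threshold)
    return flags


frames = [
    [10, 10, 12, 12],
    [10, 11, 12, 12],
    [11, 11, 12, 13],
    [90, 95, 12, 13],   # abrupt change in two pixels
    [90, 95, 13, 13],
]

flags = flag_unnatural(frames, sigma=2.0, threshold=-5.0)
print(flags)
```

Only the transition into the fourth frame is flagged; once the change has happened, the following frame is again consistent with its predecessor.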
GENERALIZATION III
• Parameters are estimated by optimizing the
information they provide. This is
accomplished by ‘summing or integrating
over’ the hidden variables.
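Written out for a generic hidden variable h, and assuming the same form as the information measure used in the alarm example, the quantity being optimized would read as follows. The symbol h and the integral form are notational assumptions.

```latex
\begin{align*}
I(\Theta) &= \sum_{h} L(h \mid \Theta)\,
  \log\left[\frac{L(h \mid \Theta)}{L(h, \mathrm{DATA})}\right]
  && \text{(discrete hidden variables)} \\
I(\Theta) &= \int L(h \mid \Theta)\,
  \log\left[\frac{L(h \mid \Theta)}{L(h, \mathrm{DATA})}\right] dh
  && \text{(continuous hidden variables)}
\end{align*}
```

In the alarm example h = (b, e), Θ = (β, ε), and DATA is the event a = 1.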