Graphical Models for Machine Learning and Computer Vision

Statistical Models
• Statistical models describe observed 'DATA' via an assumed likelihood: L(DATA | Θ),
• with Θ denoting the 'parameters' needed to describe the data.
• Likelihoods measure how likely what was observed was. They implicitly assume an error mechanism (in the translation between what was observed and what was 'supposed' to be observed).
• Parameters may describe model features or even specify different models.

An Example of a Statistical Model
• A burglar alarm is affected by both earthquakes and burglaries. It has a mechanism to communicate with the homeowner if activated. It went off at Judea Pearl's house one day. Should he:
• a) immediately call the police under suspicion that a burglary took place, or
• b) go home and immediately transfer his valuables elsewhere?

A Statistical Analysis
• Observation: the burglar alarm went off (i.e., a = 1);
• Parameter 1: the presence or absence of an earthquake (i.e., e = 1, 0);
• Parameter 2: the presence or absence of a burglary at Judea's house (i.e., b = 1, 0).

LIKELIHOODS/PRIORS IN THIS CASE
• The likelihood associated with the observation is: L(DATA | Θ) = P(a = 1 | b, e),
• with b, e = 0, 1 (depending on whether a burglary/earthquake has taken place).
• The priors specify the probabilities of a burglary or earthquake happening: P(b = 1) = ?; P(e = 1) = ?

Example Probabilities
• Here are some probabilities indicating something about the likelihood and prior:
P(b = 0) = .9; P(b = 1) = .1;
P(a = 1 | b = e = 0) = .001; P(a = 1 | b = 1, e = 0) = .368;
P(a = 1 | b = 0, e = 1) = .135; P(a = 1 | b = e = 1) = .607.

LIKELIHOOD/PRIOR INTERPRETATION
• Burglaries are as likely (a priori) as earthquakes.
• It is unlikely that the alarm just went off by itself.
• The alarm goes off more often when a burglary happens but an earthquake does not than the reverse, i.e., when an earthquake happens but a burglary does not.
• If both a burglary and an earthquake happen, then it is (virtually) twice as likely that the alarm will go off.
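These tables can be written down as a small probability model. The sketch below (Python) is a minimal encoding of the slides' numbers; the prior P(e = 1) = .1 is not listed explicitly above but follows from the statement that earthquakes are a priori as likely as burglaries. It also computes the marginal probability that the alarm sounds, by summing the two hidden causes out of the joint:

```python
# Burglar-alarm model from the slides: priors on burglary (b) and
# earthquake (e), and the likelihood P(a = 1 | b, e) of the alarm sounding.

P_B = {0: 0.9, 1: 0.1}             # prior on burglary
P_E = {0: 0.9, 1: 0.1}             # prior on earthquake (a priori as likely as burglary)
P_A1 = {(0, 0): 0.001,             # alarm almost never goes off by itself
        (1, 0): 0.368,             # burglary only
        (0, 1): 0.135,             # earthquake only
        (1, 1): 0.607}             # both burglary and earthquake

# Marginal probability that the alarm sounds: sum over the hidden causes.
p_alarm = sum(P_B[b] * P_E[e] * P_A1[(b, e)]
              for b in (0, 1) for e in (0, 1))
print(p_alarm)  # about 0.05215
```

The marginal p_alarm is the normalizing constant used in the causal analysis that follows.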
Probability Propagation Graph

PROBABILITY PROPAGATION
• There are two kinds of probability propagation (see Frey 1998):
• a) marginalization, i.e., forming P(b) by summing over the terms leading into the node;
• b) multiplication, i.e., multiplying together the terms leading into the node.
• Marginalization sums over terms leading into the node;
• multiplication multiplies over terms leading into the node.

CAUSAL ANALYSIS
• To analyze the causes of the alarm going off, we calculate the probability that it was a burglary and compare it with the probability that it was an earthquake:
P(b = 1 | a = 1) ∝ P(b = 1) Σ_e P(a = 1 | e, b = 1) P(e) = .1 * (.368 * .9 + .607 * .1) = .1 * .3919

CAUSAL ANALYSIS II
• So, after normalization: P(b = 1 | a = 1) = .751.
• Similarly, P(e = 1 | a = 1) = .349.
• So, if we had to choose between burglary and earthquake as a cause of making the alarm go off, we should choose burglary.

Markov Chain Monte Carlo for the Burglar Problem
• For the current value e = e*, calculate P(b = 0 | a = 1, e = e*) and P(b = 1 | a = 1, e = e*), i.e., proportional to P(a = 1 | b, e = e*) P(b).
• Simulate b from this distribution. Call the result b*. Now calculate P(e = 0 | b = b*, a = 1) and P(e = 1 | b = b*, a = 1), i.e., proportional to P(a = 1 | b*, e) P(e).

Independent Hidden Variables: A Factorial Model
• In statistical modeling it is often advantageous to treat variables which are not observed as 'hidden'. This means that they themselves have distributions.
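The causal analysis and the alternating sampling scheme above can be sketched as follows (Python; the probability tables are restated from the slides, and the sampler is a plain two-variable Gibbs sampler, which is one way of implementing the scheme; the iteration count and seed are arbitrary choices):

```python
import random

# Tables from the slides.
P_B = {0: 0.9, 1: 0.1}
P_E = {0: 0.9, 1: 0.1}
P_A1 = {(0, 0): 0.001, (1, 0): 0.368, (0, 1): 0.135, (1, 1): 0.607}

# Exact causal analysis: P(b | a = 1) and P(e | a = 1) by enumeration.
joint = {(b, e): P_B[b] * P_E[e] * P_A1[(b, e)]
         for b in (0, 1) for e in (0, 1)}
z = sum(joint.values())
p_b1 = (joint[(1, 0)] + joint[(1, 1)]) / z   # posterior prob. of burglary, about .751
p_e1 = (joint[(0, 1)] + joint[(1, 1)]) / z   # posterior prob. of earthquake, about .349

# Gibbs sampler: alternately resample b given (a = 1, e) and e given (a = 1, b).
def gibbs(n_iter, seed=0):
    rng = random.Random(seed)
    b, e = 0, 0
    hits_b = hits_e = 0
    for _ in range(n_iter):
        # P(b = 1 | a = 1, e) is proportional to P(b = 1) P(a = 1 | b = 1, e)
        w1 = P_B[1] * P_A1[(1, e)]
        w0 = P_B[0] * P_A1[(0, e)]
        b = 1 if rng.random() < w1 / (w0 + w1) else 0
        # P(e = 1 | a = 1, b) is proportional to P(e = 1) P(a = 1 | b, e = 1)
        v1 = P_E[1] * P_A1[(b, 1)]
        v0 = P_E[0] * P_A1[(b, 0)]
        e = 1 if rng.random() < v1 / (v0 + v1) else 0
        hits_b += b
        hits_e += e
    return hits_b / n_iter, hits_e / n_iter

est_b, est_e = gibbs(50_000)
print(round(p_b1, 3), round(p_e1, 3))  # 0.751 0.349
print(est_b, est_e)                    # Monte Carlo estimates of the same posteriors
```

The enumeration reproduces the normalized values quoted in the slides, and the Gibbs estimates approach them as the number of iterations grows.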
• In our case, suppose b and e are independent hidden variables:
P(b = 1) = β; P(b = 0) = 1 − β; P(e = 1) = ε; P(e = 0) = 1 − ε.
• Then, optimally: P(b = 1 | a = 1) = .951; P(e = 1 | a = 1) = .186.

Nonfactorial Hidden Variable Models
• Suppose b and e are dependent hidden variables:
P(b = 1, e = 1) = p_{1,1}; P(b = 1, e = 0) = p_{1,0};
P(b = 0, e = 1) = p_{0,1}; P(b = 0, e = 0) = 1 − p_{1,1} − p_{1,0} − p_{0,1}.
• Then a similar analysis yields a related result.

INFORMATION
• The difference in information available from the parameters after observing the alarm versus before the alarm was observed is:
I(β, ε) = Σ_{b,e} L(b, e | β, ε) log [ L(b, e | β, ε) / L(b, e, a = 1) ]
• This is the Kullback–Leibler 'distance' D(Q‖P) between the prior and posterior distributions.
• The parameters β, ε are chosen to optimize this distance.

INFORMATION IN THIS EXAMPLE
• The information available in this example is calculated using:
L(b, e | β, ε) = β^b (1 − β)^{1−b} ε^e (1 − ε)^{1−e}
L(b, e, a = 1) = P(a = 1 | b, e) (.1)^b (.9)^{1−b} (.1)^e (.9)^{1−e}
so that
I(β, ε) = −H(β) − H(ε) + E[ −log P(a = 1 | b, e) − (b + e) log(.1) − (2 − b − e) log(.9) ]

Markov Random Fields
• Markov random fields are simply graphical models set in a 2- or higher-dimensional field. Their fundamental criterion is that the distribution of a point x conditional on all of those that remain (i.e., −x) is identical to its distribution given a neighborhood N of it, i.e., L(x | −x) = L(x | N_x).

EXAMPLE OF A RANDOM FIELD
• Modeling a video frame is typically done via a random field. Parameters identify our expectations of what the frame looks like.
• We can 'clean up' video frames or related media using a methodology which distinguishes between what we expect and what was observed.

GENERALIZATION
• This can be generalized to non-discrete likelihoods with non-discrete parameters.
• More generally (sans data), assume that a movie (consisting of many frames, each of which consists of grey-level pixel values over a lattice) is observed. We would like to 'detect' 'unnatural' events.
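One way to carry out the optimization of I(β, ε) for the factorial model is coordinate-wise: hold ε fixed and set β to the value minimizing I, then the reverse, and repeat; for a factorized Q this is the standard mean-field update. The sketch below (Python; the log-odds updates are derived from the I(β, ε) formula above, and the iteration count is an arbitrary choice) converges to β ≈ .95 and ε ≈ .19, close to the quoted values .951 and .186:

```python
import math

# Alarm likelihood P(a = 1 | b, e) and log of the original priors (.1 / .9).
P_A1 = {(0, 0): 0.001, (1, 0): 0.368, (0, 1): 0.135, (1, 1): 0.607}
LOG_PRIOR = {1: math.log(0.1), 0: math.log(0.9)}

def optimize_information(n_iter=50):
    """Coordinate-wise minimization of I(beta, eps): each step sets one
    parameter to the value that minimizes I with the other held fixed."""
    beta = eps = 0.5
    for _ in range(n_iter):
        # Best beta given eps: log-odds are expectations over e under eps.
        u1 = LOG_PRIOR[1] + eps * math.log(P_A1[(1, 1)]) + (1 - eps) * math.log(P_A1[(1, 0)])
        u0 = LOG_PRIOR[0] + eps * math.log(P_A1[(0, 1)]) + (1 - eps) * math.log(P_A1[(0, 0)])
        beta = 1.0 / (1.0 + math.exp(u0 - u1))
        # Best eps given beta, by symmetry of the roles of b and e.
        v1 = LOG_PRIOR[1] + beta * math.log(P_A1[(1, 1)]) + (1 - beta) * math.log(P_A1[(0, 1)])
        v0 = LOG_PRIOR[0] + beta * math.log(P_A1[(1, 0)]) + (1 - beta) * math.log(P_A1[(0, 0)])
        eps = 1.0 / (1.0 + math.exp(v0 - v1))
    return beta, eps

beta, eps = optimize_information()
print(round(beta, 2), round(eps, 2))
```

Note how strongly the optimized β exceeds the exact posterior marginal P(b = 1 | a = 1) = .751: a factorial Q cannot represent the negative (explaining-away) correlation between b and e, so it concentrates on the dominant cause.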
GENERALIZATION II
• Assume a model for frame i (given frame i − 1) taking the form L(Frame[i] | Θ, Frame[i − 1]).
• The parameters Θ typically denote invariant features for pictures of cars, houses, etc.
• The presence or absence of unnatural events can be described by hidden variables.
• The (frame) likelihood describes the natural evolution of the movie over time.

GENERALIZATION III
• Parameters are estimated by optimizing the information they provide. This is accomplished by 'summing or integrating over' the hidden variables.