CS 541: Artificial Intelligence
Lecture VII: Inference in Bayesian Networks
Schedule

Week | Date       | Topic                                                           | Reading       | Homework
1    | 08/30/2011 | Introduction & Intelligent Agent                                | Ch 1 & 2      | N/A
2    | 09/06/2011 | Search: search strategy and heuristic search                    | Ch 3 & 4s     | HW1 (Search)
3    | 09/13/2011 | Search: Constraint Satisfaction & Adversarial Search            | Ch 4s & 5 & 6 | Teaming Due
4    | 09/20/2011 | Logic: Logic Agent & First Order Logic                          | Ch 7 & 8s     | HW1 due, Midterm Project (Game)
5    | 09/27/2011 | Logic: Inference on First Order Logic                           | Ch 8s & 9     | HW2 (Logic)
6    | 10/04/2011 | Uncertainty and Bayesian Network                                | Ch 13 & 14s   |
     | 10/11/2011 | No class (functions as a Monday)                                |               |
7    | 10/18/2011 | Midterm Presentation                                            |               | Midterm Project Due
8    | 10/25/2011 | Inference in Bayesian Network                                   | Ch 14s        | HW2 Due, HW3 (Probabilistic Reasoning)
9    | 11/01/2011 | Probabilistic Reasoning over Time                               | Ch 15         |
10   | 11/08/2011 | TA Session: Review of HW1 & 2                                   |               | HW3 due, HW4 (MDP)
11   | 11/15/2011 | Learning: Decision Tree & SVM                                   | Ch 18         |
12   | 11/22/2011 | Statistical Learning: Introduction: Naive Bayesian & Perceptron | Ch 20         | HW4 due
13   | 11/29/2011 | Markov Decision Process & Reinforcement Learning                | Ch 16s & 21   |
14   | 12/06/2011 | Final Project Presentation                                      |               | Final Project Due

TA: Vishakha Sharma
Email Address: [email protected]
Re-cap of Lecture VI – Part I

Probability is a rigorous formalism for uncertain knowledge
Joint probability distribution specifies the probability of every atomic event
Queries can be answered by summing over atomic events
For nontrivial domains, we must find a way to reduce the joint size
Independence and conditional independence provide the tools
Re-cap of Lecture VI – Part II

Bayes nets provide a natural representation for (causally induced) conditional independence
Topology + CPTs = compact representation of joint distribution
Generally easy for (non)experts to construct
Canonical distributions (e.g., noisy-OR) = compact representation of CPTs
Continuous variables ⇒ parameterized distributions (e.g., linear Gaussian)
Outline

Exact inference by enumeration
Exact inference by variable elimination
Approximate inference by stochastic simulation
Approximate inference by Markov chain Monte Carlo
Exact inference by enumeration
Inference tasks

Simple queries: compute posterior marginal P( Xi | E=e )
e.g., P( NoGas | Gauge=empty, Lights=on, Starts=false )
Conjunctive queries: P( Xi, Xj | E=e ) = P( Xi | E=e ) P( Xj | Xi, E=e )
Optimal decisions: decision networks include utility information;
probabilistic inference required for P( outcome | action, evidence )
Value of information: which evidence to seek next?
Sensitivity analysis: which probability values are most critical?
Explanation: why do I need a new starter motor?
Inference by enumeration

Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation

Simple query on the burglary network:
P( B | j,m ) = P( B,j,m ) / P( j,m ) = α P( B,j,m ) = α Σe Σa P( B,e,a,j,m )

Rewrite full joint entries using product of CPT entries:
P( B | j,m ) = α Σe Σa P(B) P(e) P(a|B,e) P(j|a) P(m|a)
             = α P(B) Σe P(e) Σa P(a|B,e) P(j|a) P(m|a)

Recursive depth-first enumeration: O(n) space, O(d^n) time
Enumeration algorithm
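As a concrete illustration, here is a minimal Python sketch of enumeration-based inference, assuming the standard textbook burglary network (Burglary, Earthquake, Alarm, JohnCalls, MaryCalls) and its usual CPT values; the function names and the dictionary encoding of the network are illustrative, not taken from the slides.

```python
# Minimal sketch of inference by enumeration (assumes the standard
# textbook burglary network and CPT values; encoding is illustrative).

# Each CPT entry maps a tuple of parent values to P(node=True | parents).
parents = {"Burglary": [], "Earthquake": [],
           "Alarm": ["Burglary", "Earthquake"],
           "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"]}
cpt = {"Burglary":   {(): 0.001},
       "Earthquake": {(): 0.002},
       "Alarm":      {(True, True): 0.95, (True, False): 0.94,
                      (False, True): 0.29, (False, False): 0.001},
       "JohnCalls":  {(True,): 0.90, (False,): 0.05},
       "MaryCalls":  {(True,): 0.70, (False,): 0.01}}
order = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]  # topological

def prob(var, value, assignment):
    """P(var=value | parents(var)) under the current (partial) assignment."""
    p_true = cpt[var][tuple(assignment[p] for p in parents[var])]
    return p_true if value else 1.0 - p_true

def enumerate_all(variables, assignment):
    """Recursive depth-first sum of the full joint over unassigned variables."""
    if not variables:
        return 1.0
    first, rest = variables[0], variables[1:]
    if first in assignment:
        return prob(first, assignment[first], assignment) * enumerate_all(rest, assignment)
    return sum(prob(first, value, dict(assignment, **{first: value})) *
               enumerate_all(rest, dict(assignment, **{first: value}))
               for value in (True, False))

def enumeration_ask(query_var, evidence):
    """Posterior distribution P(query_var | evidence) by enumeration."""
    dist = {value: enumerate_all(order, dict(evidence, **{query_var: value}))
            for value in (True, False)}
    norm = sum(dist.values())
    return {value: p / norm for value, p in dist.items()}

# Simple query on the burglary network: P(Burglary | JohnCalls=true, MaryCalls=true)
print(enumeration_ask("Burglary", {"JohnCalls": True, "MaryCalls": True}))
# -> roughly {True: 0.284, False: 0.716} with these CPT values
```

The O(d^n) blow-up shows up here as repeated recursive calls over the hidden variables; variable elimination below avoids recomputing them.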
Evaluation tree

Enumeration is inefficient: repeated computation
E.g., computes P(j|a) P(m|a) for every value of e
Inference by variable elimination
Inference by variable elimination

Variable elimination: carry out summations right-to-left, storing intermediate results (factors) to avoid recomputation
Variable elimination: basic operations

Summing out a variable from a product of factors:
Σx f1 × ... × fn = f1 × ... × fi Σx f(i+1) × ... × fn = f1 × ... × fi × fX̄
Move any constant factors outside the summation
Add up submatrices in pointwise product of remaining factors
(assuming f1, ..., fi do not depend on X)

Pointwise product of factors f1 and f2:
f1(x1,...,xj, y1,...,yk) × f2(y1,...,yk, z1,...,zl) = f(x1,...,xj, y1,...,yk, z1,...,zl)
E.g., f1(a,b) × f2(b,c) = f(a,b,c)
Variable elimination algorithm
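Below is a minimal Python sketch of variable elimination, including the pointwise product and sum-out operations described above, again assuming the standard burglary network and its usual textbook CPT values; the factor representation (a list of variables plus a table keyed by value tuples) and all names are illustrative.

```python
from itertools import product

# Burglary network (assumed standard textbook CPT values; encoding illustrative).
parents = {"Burglary": [], "Earthquake": [],
           "Alarm": ["Burglary", "Earthquake"],
           "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"]}
cpt = {"Burglary":   {(): 0.001},
       "Earthquake": {(): 0.002},
       "Alarm":      {(True, True): 0.95, (True, False): 0.94,
                      (False, True): 0.29, (False, False): 0.001},
       "JohnCalls":  {(True,): 0.90, (False,): 0.05},
       "MaryCalls":  {(True,): 0.70, (False,): 0.01}}
order = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]

def prob(var, value, assignment):
    p_true = cpt[var][tuple(assignment[p] for p in parents[var])]
    return p_true if value else 1.0 - p_true

def make_factor(var, evidence):
    """Factor for var's CPT with evidence values plugged in.
    A factor is (variables, table) with value tuples as table keys."""
    free = [v for v in [var] + parents[var] if v not in evidence]
    table = {}
    for values in product((True, False), repeat=len(free)):
        row = dict(evidence, **dict(zip(free, values)))
        table[values] = prob(var, row[var], row)
    return (free, table)

def pointwise_product(f1, f2):
    """E.g. f1(A,B) * f2(B,C) = f(A,B,C)."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for values in product((True, False), repeat=len(out_vars)):
        row = dict(zip(out_vars, values))
        table[values] = (t1[tuple(row[v] for v in vars1)] *
                         t2[tuple(row[v] for v in vars2)])
    return (out_vars, table)

def sum_out(var, factors):
    """Multiply the factors that mention var, then sum var out of the result."""
    relevant = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    combined = relevant[0]
    for f in relevant[1:]:
        combined = pointwise_product(combined, f)
    variables, table = combined
    idx = variables.index(var)
    summed = {}
    for values, p in table.items():
        key = values[:idx] + values[idx + 1:]
        summed[key] = summed.get(key, 0.0) + p
    return rest + [([v for v in variables if v != var], summed)]

def elimination_ask(query_var, evidence):
    """Process variables right-to-left, summing out each hidden variable."""
    factors = []
    for var in reversed(order):
        factors.append(make_factor(var, evidence))
        if var != query_var and var not in evidence:
            factors = sum_out(var, factors)
    result = factors[0]
    for f in factors[1:]:
        result = pointwise_product(result, f)
    dist = {key[0]: p for key, p in result[1].items()}  # single variable left: query_var
    norm = sum(dist.values())
    return {value: p / norm for value, p in dist.items()}

print(elimination_ask("Burglary", {"JohnCalls": True, "MaryCalls": True}))
# -> same answer as enumeration, roughly {True: 0.284, False: 0.716}
```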
Irrelevant variables

Consider the query P( JohnCalls | Burglary=true ):
P( J | b ) = α P(b) Σe P(e) Σa P(a|b,e) P(J|a) Σm P(m|a)
The sum over m is identically 1, so M is irrelevant to the query

Theorem 1: Y is irrelevant unless Y ∈ Ancestors( {X} ∪ E )

Here X = JohnCalls, E = {Burglary}, and Ancestors( {X} ∪ E ) = {Alarm, Earthquake}, so MaryCalls is irrelevant
(Compare this to backward chaining from the query in Horn clause KBs)
Irrelevant variables

Definition: the moral graph of a Bayesian network: marry all parents and drop arrows
Definition: A is m-separated from B by C iff separated by C in the moral graph
Theorem 2: Y is irrelevant if m-separated from X by E

For P( JohnCalls | Alarm=true ), both Burglary and Earthquake are irrelevant
Complexity of exact inference

Singly connected networks (or polytrees):
any two nodes are connected by at most one (undirected) path
time and space cost of variable elimination are O(d^k n)

Multiply connected networks:
can reduce 3SAT to exact inference ⇒ NP-hard
equivalent to counting 3SAT models ⇒ #P-complete
Inference by stochastic simulation
Inference by stochastic simulation

Basic idea:
Draw N samples from a sampling distribution S
Compute an approximate posterior probability P̂
Show this converges to the true probability P

Outline:
Direct sampling from a Bayesian network
Rejection sampling: reject samples disagreeing with evidence
Likelihood weighting: use evidence to weight samples
Markov chain Monte Carlo (MCMC): sample from a stochastic process whose stationary distribution is the true posterior
Direct sampling
Example
Direct sampling

Probability that PRIORSAMPLE generates a particular event:
S_PS( x1,...,xn ) = Π_{i=1..n} P( xi | parents(Xi) ) = P( x1,...,xn )
i.e., the true prior probability
E.g., S_PS( t,f,t,t ) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P( t,f,t,t )

Let N_PS( x1,...,xn ) be the number of samples generated for the event x1,...,xn
Then we have
lim_{N→∞} P̂( x1,...,xn ) = lim_{N→∞} N_PS( x1,...,xn ) / N = S_PS( x1,...,xn ) = P( x1,...,xn )

That is, estimates from PRIORSAMPLE are consistent
Shorthand: P̂( x1,...,xn ) ≈ P( x1,...,xn )
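A minimal Python sketch of PRIORSAMPLE, assuming the standard Cloudy/Sprinkler/Rain/WetGrass network and its usual textbook CPT values; the names and the dictionary encoding are illustrative.

```python
import random

# Sprinkler network (assumed standard textbook CPT values).
# Each CPT entry maps a tuple of parent values to P(node=True | parents).
parents = {"Cloudy": [], "Sprinkler": ["Cloudy"],
           "Rain": ["Cloudy"], "WetGrass": ["Sprinkler", "Rain"]}
cpt = {"Cloudy":    {(): 0.5},
       "Sprinkler": {(True,): 0.1, (False,): 0.5},
       "Rain":      {(True,): 0.8, (False,): 0.2},
       "WetGrass":  {(True, True): 0.99, (True, False): 0.90,
                     (False, True): 0.90, (False, False): 0.0}}
order = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]  # topological

def prior_sample():
    """Sample one complete event top-down, each variable given its parents."""
    event = {}
    for var in order:
        p_true = cpt[var][tuple(event[pa] for pa in parents[var])]
        event[var] = random.random() < p_true
    return event

# Consistency check: the fraction of samples with Rain=true approaches
# P(Rain=true) = 0.5*0.8 + 0.5*0.2 = 0.5 with these CPT values.
N = 10_000
print(sum(prior_sample()["Rain"] for _ in range(N)) / N)
```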
Rejection sampling
Rejection sampling
P̂( X | e ) is estimated from the samples agreeing with e

E.g., estimate P( Rain | Sprinkler=true ) using 100 samples
27 samples have Sprinkler=true
Of these, 8 have Rain=true and 19 have Rain=false
P̂( Rain | Sprinkler=true ) = Normalize( ⟨8, 19⟩ ) = ⟨0.296, 0.704⟩

Similar to a basic real-world empirical estimation procedure
Analysis of rejection sampling

P̂( X | e ) = α N_PS( X, e ) = N_PS( X, e ) / N_PS( e ) ≈ P( X, e ) / P( e ) = P( X | e )
Hence rejection sampling returns consistent posterior estimates
Problem: hopelessly expensive if P(e) is small
P(e) drops off exponentially with the number of evidence variables!
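A minimal Python sketch of rejection sampling for the same sprinkler network (standard textbook CPT values assumed; names illustrative): it simply draws prior samples and discards those that disagree with the evidence.

```python
import random

# Sprinkler network (assumed standard textbook CPT values), as in the
# direct-sampling sketch above.
parents = {"Cloudy": [], "Sprinkler": ["Cloudy"],
           "Rain": ["Cloudy"], "WetGrass": ["Sprinkler", "Rain"]}
cpt = {"Cloudy":    {(): 0.5},
       "Sprinkler": {(True,): 0.1, (False,): 0.5},
       "Rain":      {(True,): 0.8, (False,): 0.2},
       "WetGrass":  {(True, True): 0.99, (True, False): 0.90,
                     (False, True): 0.90, (False, False): 0.0}}
order = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]

def prior_sample():
    event = {}
    for var in order:
        p_true = cpt[var][tuple(event[pa] for pa in parents[var])]
        event[var] = random.random() < p_true
    return event

def rejection_sampling(query_var, evidence, n):
    """Estimate P(query_var | evidence): keep only samples agreeing with e."""
    counts = {True: 0, False: 0}
    for _ in range(n):
        sample = prior_sample()
        if all(sample[var] == value for var, value in evidence.items()):
            counts[sample[query_var]] += 1
    total = counts[True] + counts[False]
    return {value: c / total for value, c in counts.items()} if total else counts

# E.g. estimate P(Rain | Sprinkler=true); the exact answer is 0.3 with these CPTs.
print(rejection_sampling("Rain", {"Sprinkler": True}, 10_000))
```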
Likelihood weighting
Likelihood weighting
Idea: fix evidence variables, sample only nonevidence variables, and weight each sample by the likelihood it accords the evidence
Likelihood weighting example
Likelihood weighting analysis

Sampling probability for WEIGHTEDSAMPLE is
S_WS( z, e ) = Π_{i=1..l} P( zi | parents(Zi) )
Note: pays attention to evidence in ancestors only
⇒ somewhere "between" prior and posterior distribution

Weight for a given sample z, e is
w( z, e ) = Π_{i=1..m} P( ei | parents(Ei) )

Weighted sampling probability is
S_WS( z, e ) w( z, e ) = Π_{i=1..l} P( zi | parents(Zi) ) Π_{i=1..m} P( ei | parents(Ei) ) = P( z, e )

Hence likelihood weighting returns consistent estimates,
but performance still degrades with many evidence variables because a few samples have nearly all the total weight
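A minimal Python sketch of likelihood weighting on the same sprinkler network (standard textbook CPT values assumed; names illustrative): evidence variables are fixed, only nonevidence variables are sampled, and each sample is weighted by the likelihood it accords the evidence.

```python
import random

# Sprinkler network (assumed standard textbook CPT values), as above.
parents = {"Cloudy": [], "Sprinkler": ["Cloudy"],
           "Rain": ["Cloudy"], "WetGrass": ["Sprinkler", "Rain"]}
cpt = {"Cloudy":    {(): 0.5},
       "Sprinkler": {(True,): 0.1, (False,): 0.5},
       "Rain":      {(True,): 0.8, (False,): 0.2},
       "WetGrass":  {(True, True): 0.99, (True, False): 0.90,
                     (False, True): 0.90, (False, False): 0.0}}
order = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]

def weighted_sample(evidence):
    """Fix evidence variables, sample the rest top-down, and return the
    event together with the weight Π P(e_i | parents(E_i))."""
    event, weight = {}, 1.0
    for var in order:
        p_true = cpt[var][tuple(event[pa] for pa in parents[var])]
        if var in evidence:
            event[var] = evidence[var]
            weight *= p_true if evidence[var] else 1.0 - p_true
        else:
            event[var] = random.random() < p_true
    return event, weight

def likelihood_weighting(query_var, evidence, n):
    """Estimate P(query_var | evidence) from weighted sample counts."""
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        event, weight = weighted_sample(evidence)
        totals[event[query_var]] += weight
    norm = totals[True] + totals[False]
    return {value: w / norm for value, w in totals.items()}

# E.g. estimate P(Rain | Sprinkler=true, WetGrass=true); roughly 0.32 with these CPTs.
print(likelihood_weighting("Rain", {"Sprinkler": True, "WetGrass": True}, 10_000))
```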
Markov chain Monte Carlo
Approximate inference using MCMC

"State" of network = current assignment to all variables.

Generate next state by sampling one variable given its Markov blanket
Sample each variable in turn, keeping evidence fixed
Can also choose a variable to sample at random each time

The Markov chain

With Sprinkler=true, WetGrass=true, there are four states (the four joint assignments to Cloudy and Rain):
Wander about for a while, average what you see
MCMC example

Estimate P( Rain | Sprinkler=true, WetGrass=true )
Sample Cloudy or Rain given its Markov blanket, repeat
Count the number of times Rain is true and false in the samples
E.g., visit 100 states: 31 have Rain=true, 69 have Rain=false
P̂( Rain | Sprinkler=true, WetGrass=true ) = Normalize( ⟨31, 69⟩ ) = ⟨0.31, 0.69⟩

Theorem: the chain approaches its stationary distribution, i.e., the long-run fraction of time spent in each state is exactly proportional to its posterior probability
Markov blanket sampling

Markov blanket of Cloudy is Sprinkler and Rain
Markov blanket of Rain is Cloudy, Sprinkler, and WetGrass

Probability given the Markov blanket is calculated as follows:
P( x'_i | mb(Xi) ) = P( x'_i | parents(Xi) ) Π_{Zj ∈ Children(Xi)} P( zj | parents(Zj) )

Easily implemented in message-passing parallel systems, brains

Main computational problems:
Difficult to tell if convergence has been achieved
Can be wasteful if the Markov blanket is large:
P( Xi | mb(Xi) ) won't change much (law of large numbers)
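A minimal Python sketch of MCMC with Markov blanket (Gibbs) sampling on the same sprinkler network (standard textbook CPT values assumed; names illustrative): each nonevidence variable is resampled in turn from P( Xi | mb(Xi) ), and the visited states are counted.

```python
import random

# Sprinkler network (assumed standard textbook CPT values), as above.
parents = {"Cloudy": [], "Sprinkler": ["Cloudy"],
           "Rain": ["Cloudy"], "WetGrass": ["Sprinkler", "Rain"]}
cpt = {"Cloudy":    {(): 0.5},
       "Sprinkler": {(True,): 0.1, (False,): 0.5},
       "Rain":      {(True,): 0.8, (False,): 0.2},
       "WetGrass":  {(True, True): 0.99, (True, False): 0.90,
                     (False, True): 0.90, (False, False): 0.0}}
children = {"Cloudy": ["Sprinkler", "Rain"], "Sprinkler": ["WetGrass"],
            "Rain": ["WetGrass"], "WetGrass": []}

def prob(var, value, state):
    p_true = cpt[var][tuple(state[pa] for pa in parents[var])]
    return p_true if value else 1.0 - p_true

def p_given_markov_blanket(var, state):
    """P(var=true | mb(var)) ∝ P(var | parents) * Π_children P(child | its parents)."""
    scores = {}
    for value in (True, False):
        candidate = dict(state, **{var: value})
        score = prob(var, value, candidate)
        for child in children[var]:
            score *= prob(child, candidate[child], candidate)
        scores[value] = score
    return scores[True] / (scores[True] + scores[False])

def gibbs_ask(query_var, evidence, n):
    """Estimate P(query_var | evidence) by counting visited states."""
    nonevidence = [v for v in cpt if v not in evidence]
    state = dict(evidence)
    for var in nonevidence:                      # arbitrary initial state
        state[var] = random.choice((True, False))
    counts = {True: 0, False: 0}
    for _ in range(n):                           # no burn-in, for simplicity
        for var in nonevidence:                  # resample each nonevidence variable in turn
            state[var] = random.random() < p_given_markov_blanket(var, state)
        counts[state[query_var]] += 1
    total = counts[True] + counts[False]
    return {value: c / total for value, c in counts.items()}

# E.g. estimate P(Rain | Sprinkler=true, WetGrass=true); roughly 0.32 with these CPTs.
print(gibbs_ask("Rain", {"Sprinkler": True, "WetGrass": True}, 10_000))
```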
Summary

Exact inference by variable elimination:
polytime on polytrees, NP-hard on general graphs
space = time, very sensitive to topology

Approximate inference by LW, MCMC:
LW does poorly when there is lots of (downstream) evidence
LW, MCMC generally insensitive to topology
Convergence can be very slow with probabilities close to 1 or 0
Can handle arbitrary combinations of discrete and continuous variables
HW#3 Probabilistic reasoning

Mandatory
Problem 14.1 (1 point)
Problem 14.4 (2 points)
Problem 14.5 (2 points)
Problem 14.21 (5 points)
Programming is needed for question e)
You need to submit your executable along with the source code and a detailed readme file on how we can run it.
You lose 3 points if no program is submitted.

Bonus
Problem 14.19 (2 points)