Bayesian Networks
Chapter 2 (Duda et al.) – Section 2.11
CS479/679 Pattern Recognition
Dr. George Bebis
Statistical Dependence Between Variables
• Representing a high-dimensional density $p(x_1, x_2, \ldots, x_n)$ is very challenging, since we need to estimate a very large number of parameters (e.g., on the order of $k^n$ for $n$ variables with $k$ states each).
• Many times, the only knowledge we have about a distribution is which variables are (or are not) dependent.
• Such dependencies can be represented efficiently using Bayesian Networks (or Belief Networks).
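To make the parameter-count argument concrete, here is a small back-of-the-envelope comparison; the numbers (n = 10 variables, k = 3 states, at most m = 2 parents per node) are illustrative assumptions, not values from the slides:

```python
# Parameter count of a full joint table vs. a Bayesian-network factorization,
# assuming n discrete variables with k states each (illustrative numbers).
n, k = 10, 3
full_joint = k**n - 1              # one free parameter per joint configuration
m = 2                              # assume each node has at most m parents
bayes_net = n * (k - 1) * k**m     # at most (k-1)*k^m free parameters per node
print(full_joint, bayes_net)       # 59048 vs. 180 for this illustrative case
```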
Example of Dependencies
• Represent the state of an automobile:
– Engine temperature
– Brake fluid pressure
– Tire air pressure
– Wire voltages
• Causally related variables
– Engine temperature
– Coolant temperature
• NOT causally related variables
– Engine oil pressure
– Tire air pressure
Bayesian Net Applications
• Microsoft: Answer Wizard, Print Troubleshooter
• US Army: SAIP (Battalion Detection from SAR, IR, etc.)
• NASA: Vista (DSS for Space Shuttle)
• GE: Gems (real-time monitoring of utility generators)
Definitions and Notation
• A Bayesian net is usually a Directed Acyclic Graph (DAG).
• Each node represents a variable.
• Each variable assumes certain states (i.e., values).
Relationships Between Nodes
• A link joining two nodes is directional and represents a causal
influence (e.g., A influences X or X depends on A)
• Influences could be direct or indirect (e.g., A influences X
directly and A influences C indirectly through X).
Prior / Conditional Probabilities
• Each variable is associated with prior or conditional
probabilities (discrete or continuous).
Markov Property
“Each node is conditionally independent of its ancestors
given its parents”
Example:
$p(x_1 \mid x_2, \ldots, x_n) = p(x_1 \mid \pi_1)$, where $\pi_1$ denotes the parents of $x_1$.
Computing Joint Probabilities
Using the Markov property
• Using the chain rule, the joint probability of a set of
variables x1, x2, …, xn is given as:
$p(x_1, x_2, \ldots, x_n) = p(x_1 \mid x_2, \ldots, x_n)\, p(x_2 \mid x_3, \ldots, x_n) \cdots p(x_{n-1} \mid x_n)\, p(x_n)$
• Using the Markov property (i.e., node $x_i$ is conditionally independent of its ancestors given its parents $\pi_i$), we have:
$p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid \pi_i)$
much simpler!
Example
• We can compute the probability of any configuration of
states in the joint density, e.g.:
$P(a_3, b_1, x_2, c_3, d_2) = P(a_3)\, P(b_1)\, P(x_2 \mid a_3, b_1)\, P(c_3 \mid x_2)\, P(d_2 \mid x_2) = 0.25 \times 0.6 \times 0.4 \times 0.5 \times 0.4 = 0.012$
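As a quick sanity check, the same product can be evaluated directly; only the probabilities quoted in the example are used, and the Python variable names are made up for readability:

```python
# Evaluating the configuration (a3, b1, x2, c3, d2) from the factorized joint
# p = P(a3) P(b1) P(x2 | a3, b1) P(c3 | x2) P(d2 | x2); the numbers are the
# ones quoted in the slide example.
P_a3 = 0.25              # P(a3)
P_b1 = 0.60              # P(b1)
P_x2_given_a3_b1 = 0.40  # P(x2 | a3, b1)
P_c3_given_x2 = 0.50     # P(c3 | x2)
P_d2_given_x2 = 0.40     # P(d2 | x2)

joint = P_a3 * P_b1 * P_x2_given_a3_b1 * P_c3_given_x2 * P_d2_given_x2
print(joint)             # 0.012
```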
Fundamental Problems in
Bayesian Nets
• Evaluation (inference): Given the values of
the observed variables (evidence), estimate
the values of the non-observed variables.
• Learning: Given training data and prior
information (e.g., expert knowledge, causal
relationships), estimate the network
structure, or the parameters (probabilities),
or both.
Inference Example: Medical Diagnosis
• Uppermost nodes (causes): biological agents (bacteria, viruses)
• Intermediate nodes: diseases
• Lowermost nodes (effects): symptoms
• Goal: given some evidence (biological agents, symptoms), find the most likely disease.
Evaluation (Inference) Problem
• In general, if X denotes the query variables and e denotes
the evidence, then
$P(X \mid e) = \dfrac{P(X, e)}{P(e)} = \alpha\, P(X, e)$
where $\alpha = 1/P(e)$ is a constant of proportionality.
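Computing $P(X, e)$ amounts to summing the factored joint over all unobserved (hidden) variables. Below is a minimal sketch of this inference-by-enumeration idea, assuming a hypothetical dictionary-based representation of the network; none of these names come from the slides:

```python
from itertools import product

def joint(assignment, cpts, parents):
    """Joint probability of a complete assignment, using the factorization
    p(x1, ..., xn) = prod_i p(xi | parents(xi)).
    cpts[v] maps (value_of_v, tuple_of_parent_values) -> probability."""
    p = 1.0
    for v, val in assignment.items():
        parent_vals = tuple(assignment[u] for u in parents[v])
        p *= cpts[v][(val, parent_vals)]
    return p

def posterior(query_var, evidence, variables, domains, cpts, parents):
    """P(query_var | evidence): sum the joint over every assignment of the
    hidden variables, then normalize (alpha = 1 / P(evidence))."""
    hidden = [v for v in variables if v != query_var and v not in evidence]
    scores = {}
    for q in domains[query_var]:
        total = 0.0
        for combo in product(*(domains[h] for h in hidden)):
            assignment = dict(evidence, **{query_var: q}, **dict(zip(hidden, combo)))
            total += joint(assignment, cpts, parents)
        scores[q] = total                       # unnormalized P(query = q, evidence)
    alpha = 1.0 / sum(scores.values())          # alpha = 1 / P(evidence)
    return {q: alpha * s for q, s in scores.items()}
```

Note that enumeration like this is exponential in the number of hidden variables, which is exactly the difficulty discussed below.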
Example
• Classify a fish given that the fish is light ($c_1$) and was caught in the South Atlantic ($b_2$); there is no evidence about what time of year the fish was caught or about its thickness.
Example (cont’d)
$P(X \mid e) = \dfrac{P(X, e)}{P(e)} = \alpha\, P(X, e)$
Example (cont’d)
• Summing the factored joint over the unobserved variables (time of year $a$ and thickness $d$) gives $P(x_1 \mid c_1, b_2) = \alpha\, 0.18$.
• Similarly, $P(x_2 \mid c_1, b_2) = \alpha\, 0.066$.
• Normalize the probabilities (not strictly necessary):
$P(x_1 \mid c_1, b_2) + P(x_2 \mid c_1, b_2) = 1 \;\Rightarrow\; \alpha = 1/(0.18 + 0.066) = 1/0.246$
$P(x_1 \mid c_1, b_2) = 0.73$
$P(x_2 \mid c_1, b_2) = 0.27$
• Decision: salmon ($x_1$).
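A one-line check of the normalization step; 0.066 is the value quoted above for $x_2$, and 0.18 is the unnormalized score for $x_1$ implied by the quoted posteriors:

```python
# Normalizing the two unnormalized scores from the fish example
scores = {"x1": 0.18, "x2": 0.066}   # x1 = salmon in this example
alpha = 1.0 / sum(scores.values())   # alpha = 1 / P(c1, b2) = 1 / 0.246
print({k: round(alpha * v, 2) for k, v in scores.items()})  # {'x1': 0.73, 'x2': 0.27}
```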
Evaluation (Inference) Problem
(cont’d)
• Exact inference is an NP-hard problem
because the number of terms in the
summations (or integrals) for discrete (or
continuous) variables grows exponentially
with the number of variables.
• For some restricted classes of networks (e.g., singly connected networks, where there is at most one path between any two nodes), exact inference can be solved in time linear in the number of nodes.
Evaluation (Inference) Problem
(cont’d)
• For singly connected Bayesian networks:
$P(X \mid e) = P(X \mid e_C, e_P) = \alpha\, P(X \mid e_P)\, P(e_C \mid X)$
where $e_C$ denotes evidence from children nodes and $e_P$ evidence from parent nodes.
• In practice, approximate inference methods are typically used:
– Sampling (Monte Carlo) methods
– Variational methods
– Loopy belief propagation
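As one illustration of the sampling approach, here is a minimal sketch of rejection sampling, reusing the same hypothetical dictionary representation of the network as in the enumeration sketch above (CPTs, parent lists, domains, and a parents-before-children ordering):

```python
import random

def prior_sample(order, domains, cpts, parents):
    """Draw one sample from the joint by sampling each node given its
    already-sampled parents (ancestral sampling). `order` must list the
    variables parents-before-children."""
    sample = {}
    for v in order:
        parent_vals = tuple(sample[u] for u in parents[v])
        r, cum = random.random(), 0.0
        for val in domains[v]:
            cum += cpts[v][(val, parent_vals)]
            if r <= cum:
                sample[v] = val
                break
        else:
            sample[v] = domains[v][-1]  # guard against floating-point round-off
    return sample

def estimate(query_var, evidence, n, order, domains, cpts, parents):
    """Rejection sampling: keep only the samples consistent with the evidence
    and estimate P(query_var | evidence) from their empirical frequencies."""
    counts = {val: 0 for val in domains[query_var]}
    kept = 0
    for _ in range(n):
        s = prior_sample(order, domains, cpts, parents)
        if all(s[k] == v for k, v in evidence.items()):
            counts[s[query_var]] += 1
            kept += 1
    return {val: c / kept for val, c in counts.items()} if kept else None
```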
Another Example
• You have a new burglar alarm installed at home.
• It is fairly reliable at detecting burglary, but also
sometimes responds to minor earthquakes.
• You have two neighbors, Ali and Veli, who promised to
call you at work when they hear the alarm.
Another Example (cont’d)
• Ali always calls when he hears the alarm, but sometimes he confuses the telephone ringing with the alarm and calls then, too.
• Veli likes loud music and sometimes misses the alarm.
• Design a Bayesian network to estimate the probability
of a burglary given some evidence.
Another Example (cont’d)
• What are the system variables?
– Alarm
– Causes
• Burglary, Earthquake
– Effects
• Ali calls, Veli calls
Another Example (cont’d)
• What are the conditional dependencies among
them?
– Burglary (B) and earthquake (E) directly affect the
probability of the alarm (A) going off
– Whether or not Ali calls (AC) or Veli calls (VC)
depends on the alarm.
Another Example (cont’d)
• What is the probability that the alarm has
sounded but neither a burglary nor an
earthquake has occurred, and both Ali and Veli
call?
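A minimal sketch of this first query as a single product over the factored joint. The conditional probability tables for this network are not included in the transcript; the numbers below are the commonly used textbook values for this alarm example and should be read as illustrative assumptions only:

```python
# Sketch: P(Alarm, not Burglary, not Earthquake, AliCalls, VeliCalls)
# CPT values below are assumed (standard textbook numbers), not from the slides.
P_B, P_E = 0.001, 0.002                              # assumed priors P(B), P(E)
P_A  = {(True, True): 0.95, (True, False): 0.94,
        (False, True): 0.29, (False, False): 0.001}  # assumed P(A=true | B, E)
P_AC = {True: 0.90, False: 0.05}                     # assumed P(AliCalls=true  | A)
P_VC = {True: 0.70, False: 0.01}                     # assumed P(VeliCalls=true | A)

# P(A, not B, not E, AC, VC) = P(not B) P(not E) P(A | not B, not E) P(AC | A) P(VC | A)
p = (1 - P_B) * (1 - P_E) * P_A[(False, False)] * P_AC[True] * P_VC[True]
print(p)  # ~0.00063 with these illustrative numbers
```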
Another Example (cont’d)
• What is the probability that there is a burglary
given that Ali calls?
• What about if both Veli and Ali call?
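For the second query, a sketch of $P(\text{Burglary} \mid \text{AliCalls}, \text{VeliCalls})$, obtained by enumerating the hidden variables Earthquake and Alarm and normalizing; it reuses the same assumed, illustrative CPT values as the previous sketch:

```python
# Sketch: P(Burglary | AliCalls, VeliCalls) by enumeration over E and A.
# CPT values below are assumed (standard textbook numbers), not from the slides.
P_B, P_E = 0.001, 0.002
P_A  = {(True, True): 0.95, (True, False): 0.94,
        (False, True): 0.29, (False, False): 0.001}
P_AC = {True: 0.90, False: 0.05}
P_VC = {True: 0.70, False: 0.01}

unnorm = {}
for b in (True, False):                    # query variable B
    total = 0.0
    for e in (True, False):                # hidden variable E
        for a in (True, False):            # hidden variable A
            p_a = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
            total += ((P_B if b else 1.0 - P_B) *
                      (P_E if e else 1.0 - P_E) *
                      p_a * P_AC[a] * P_VC[a])
    unnorm[b] = total                      # unnormalized P(B=b, AC, VC)

alpha = 1.0 / sum(unnorm.values())         # 1 / P(AC, VC)
print(alpha * unnorm[True])                # ~0.28 with these illustrative numbers
```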
Naïve Bayesian Network
• Assuming that the features are conditionally independent given the class, the conditional class density simplifies as follows:
$p(x_1, x_2, \ldots, x_n \mid \omega_j) = \prod_{i=1}^{n} p(x_i \mid \omega_j)$
• Sometimes works well in practice despite the strong independence assumption behind it.
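A minimal sketch of a naïve Bayes classifier built on this assumption; the class priors, feature names, and likelihood tables below are hypothetical placeholders, not values from the slides:

```python
def naive_bayes_posterior(x, priors, likelihoods):
    """P(class | x) proportional to P(class) * prod_i P(x_i | class),
    i.e. the features are treated as conditionally independent given the class.
    priors[c] = P(c); likelihoods[c][i][xi] = P(feature i = xi | c)."""
    scores = {}
    for c, prior in priors.items():
        p = prior
        for i, xi in enumerate(x):
            p *= likelihoods[c][i][xi]
        scores[c] = p
    alpha = 1.0 / sum(scores.values())        # normalize over the classes
    return {c: alpha * s for c, s in scores.items()}

# Hypothetical two-class, two-feature illustration (numbers are made up):
priors = {"salmon": 0.4, "sea_bass": 0.6}
likelihoods = {
    "salmon":   [{"light": 0.7, "dark": 0.3}, {"thin": 0.6, "wide": 0.4}],
    "sea_bass": [{"light": 0.2, "dark": 0.8}, {"thin": 0.3, "wide": 0.7}],
}
print(naive_bayes_posterior(["light", "thin"], priors, likelihoods))
```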