Probabilistic Reasoning
Chapter 14 (14.1, 14.2, 14.3, 14.4)
• Capturing uncertain knowledge
• Probabilistic inference
CSE 471/598 by H. Liu
Knowledge representation
Joint probability distribution
• can answer any question about the domain
• can become intractably large as the number of random variables grows
• can be difficult to specify probabilities for atomic events
Conditional independence can simplify the probabilistic assignment.
A belief network or Bayesian network is a data structure that represents the dependencies between variables and gives a concise specification of the joint distribution.
A Bayesian network is a graph:
• A set of random variables makes up the nodes
• A set of directed links connects pairs of nodes
• Each node has a conditional probability table (CPT) that quantifies the effects that the parents have on the node
• The graph has no directed cycles (it is a DAG)
It is usually much easier for an expert to decide the conditional dependence relationships than to specify the probabilities themselves.
• Sometimes, experts can have very different opinions
Once the network is specified, we need only specify
conditional probabilities for the nodes that participate
in direct dependencies, and use those to compute
any other probabilities.
A simple Bayesian network (Fig 14.1)
An example of burglary-alarm-call (Fig 14.2)
The topology of the network can be thought of as
the general structure of the causal process.
Many details (Mary listening to loud music, or phone
ringing and confusing John) are summarized in the
uncertainty associated with the links from Alarm to
JohnCalls and MaryCalls.
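As a concrete illustration of these components, the burglary network maps naturally onto a simple data structure. Here is a minimal Python sketch; the dict layout is our own choice, while the CPT numbers are the ones given in Fig 14.2:

```python
# A minimal sketch of the burglary network from Fig 14.2.
# Each node stores its parent list and a CPT mapping a tuple of
# parent values to P(node = True | those parent values).
network = {
    "Burglary":   {"parents": [], "cpt": {(): 0.001}},
    "Earthquake": {"parents": [], "cpt": {(): 0.002}},
    "Alarm":      {"parents": ["Burglary", "Earthquake"],
                   "cpt": {(True, True): 0.95, (True, False): 0.94,
                           (False, True): 0.29, (False, False): 0.001}},
    "JohnCalls":  {"parents": ["Alarm"],
                   "cpt": {(True,): 0.90, (False,): 0.05}},
    "MaryCalls":  {"parents": ["Alarm"],
                   "cpt": {(True,): 0.70, (False,): 0.01}},
}
```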
The probabilities actually summarize a potentially infinite set of possible circumstances
• Overcoming both laziness and ignorance
• The degree of approximation can be improved if we have additional relevant information
Specifying the CPT for each node (Fig 14.2)
• A conditioning case - a possible combination of values for the parent nodes (a node with k Boolean parents has 2^k of them)
• Each row in a CPT must sum to 1
• A node with no parents has only one row (the prior probabilities)
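For example, the Alarm node has two Boolean parents, so its CPT has 2^2 = 4 conditioning cases (values from Fig 14.2; in each row, P(a) and P(!a) sum to 1):

B E | P(a)
T T | .95
T F | .94
F T | .29
F F | .001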
The semantics of Bayesian networks
Two equivalent views of a Bayesian network:
• Representing the JPD - helpful in understanding how to construct networks
• Representing conditional independence relations - helpful in designing inference procedures
Representing JPD - constructing a BN
A Bayesian network provides a complete
description of the domain. Every entry in the
JPD can be calculated from the info in the
network.
A generic entry in the joint is the probability of
a conjunction of particular assignments to each
variable.
P(x1,…,xn)=P(xi|Parents(xi))
(14.1)
What’s the probability of the event J^M^A^!B^!E?
• P(j^m^a^!b^!e) = P(j|a) P(m|a) P(a|!b^!e) P(!b) P(!e)
• Look up the values in Figure 14.2 and multiply
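Using the Fig 14.2 numbers, that is 0.90 × 0.70 × 0.001 × 0.999 × 0.998 ≈ 0.000628.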
A method for constructing
Bayesian networks
Eq 14.1 defines what a given BN means, and it implies certain conditional independence relationships that can be used to guide construction of the network.
P(x1,…,xn) = P(xn|xn-1,…,x1) P(xn-1,…,x1)
• Continuing the expansion for P(xn-1,…,x1) forms the chain rule: P(x1,…,xn) = ∏i P(xi|xi-1,…,x1)
• Comparing this with Eq 14.1, we get (14.2) below
P(Xi|Xi-1,…,X1) = P(Xi|Parents(Xi))    (14.2)
• Parents(Xi) is contained in {Xi-1,…,X1}
The BN is a correct representation of the domain only if each node is conditionally independent of its predecessors in the node ordering, given its parents.
• E.g., P(M|J,A,E,B) = P(M|A)
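Worked out for the burglary network with the ordering B, E, A, J, M:
P(b,e,a,j,m) = P(m|j,a,e,b) P(j|a,e,b) P(a|e,b) P(e|b) P(b)
             = P(m|a) P(j|a) P(a|b,e) P(e) P(b)
Each conditional collapses to conditioning on the node's parents alone, which is exactly Eq 14.2 and recovers the factored form of Eq 14.1.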
Incremental network construction
• Choose relevant variables describing the domain
• Choose an ordering for the variables
• While there are variables left (the loop is sketched below):
   • Pick a variable and add a node for it to the network
   • Set its parents to some minimal set of nodes already in the net that satisfies Eq 14.2
   • Define the CPT for the variable
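A schematic Python sketch of this loop, assuming the dict representation used earlier; choose_minimal_parents and elicit_cpt are hypothetical helpers standing in for the knowledge engineer's judgment:

```python
# Schematic sketch of incremental network construction.
# choose_minimal_parents and elicit_cpt are hypothetical stand-ins
# for the knowledge engineer's judgment.
def build_network(ordered_vars, choose_minimal_parents, elicit_cpt):
    network = {}
    for var in ordered_vars:                            # pick the next variable
        parents = choose_minimal_parents(var, network)  # minimal set satisfying Eq 14.2
        network[var] = {"parents": parents,
                        "cpt": elicit_cpt(var, parents)}  # P(var | parents)
    return network
```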
Compactness
A Bayesian network can often be far more
compact than the full joint.
In a locally structured system, each subcomponent interacts directly with only a bounded
number of other components.
A local structure is usually associated with linear
rather than exponential growth in complexity.
With n = 30 nodes, if each node is directly influenced by at most k = 5 others, what's the difference between the BN and the full joint?
• n·2^k vs. 2^n: 30 × 2^5 = 960 numbers versus 2^30 ≈ 1.07 billion
Node ordering
The correct order to add nodes is to add the “root
causes” first, then the variables they influence,
and so on until we reach the leaves that have no
direct causal influence on the other variables.
• Domain knowledge helps!
What if we happen to choose the wrong order?
Fig 14.3 shows an example.
If we stick to a true causal model, we end up
having to specify fewer numbers, and the
numbers will often be easier to come up with.
Conditional independence relations
To design inference algorithms, we need to know whether more general conditional independence relations hold.
Given a network, can we decide whether a set of nodes X is independent of another set Y, given a set of evidence nodes E? One key property concerns non-descendants: a node is conditionally independent of its non-descendants, given its parents.
• As in Fig 14.2, JohnCalls is independent of Burglary and Earthquake, given Alarm.
A node is conditionally independent of all other nodes in the network, given its parents, children, and children's parents (its Markov blanket).
• Burglary is independent of JohnCalls and MaryCalls, given Alarm and Earthquake
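The blanket can be read directly off the graph structure. A short sketch using the dict representation from earlier (the function is ours, not from the text):

```python
# Markov blanket of a node: its parents, its children, and its
# children's other parents.
def markov_blanket(network, node):
    parents = set(network[node]["parents"])
    children = {v for v, d in network.items() if node in d["parents"]}
    spouses = {p for c in children for p in network[c]["parents"]} - {node}
    return parents | children | spouses

# markov_blanket(network, "Burglary") -> {"Alarm", "Earthquake"}
```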
Representation of CPTs
Given canonical distributions, the complete table can be specified by naming the distribution and supplying a few parameters.
A deterministic node has its values specified exactly
by the values of its parents.
Uncertain relationships can often be characterized by
“noisy” logical relationships.
• Noisy-OR (page 500)
An example (page 501) of determining the conditional probabilities, starting from P(!fever), given the individual inhibition probabilities for cold, flu, and malaria:
• P(!fever|c,!f,!m) = 0.6, P(!fever|!c,f,!m) = 0.2, and P(!fever|!c,!f,m) = 0.1
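Under the noisy-OR assumption, the whole CPT follows from these three numbers: the probability that fever is absent is the product of the inhibition probabilities of the causes that are present. A small sketch (names are illustrative):

```python
# Noisy-OR: P(!fever | causes) is the product of the inhibition
# probabilities of the causes that are actually present.
INHIBIT = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}

def p_no_fever(present_causes):
    p = 1.0
    for cause in present_causes:
        p *= INHIBIT[cause]
    return p

# e.g. p_no_fever({"cold", "flu"}) -> 0.6 * 0.2 = 0.12, and
# p_no_fever({"cold", "flu", "malaria"}) -> 0.012,
# so P(fever | c, f, m) = 1 - 0.012 = 0.988.
```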
Inference in Bayesian networks
Exact inference
• Inference by enumeration (a sketch follows below)
• The variable elimination algorithm
• The complexity of exact inference
• Clustering algorithms
Approximate inference
• Direct sampling methods
   • Rejection sampling
   • Likelihood weighting
• Inference by Markov chain simulation
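As an illustration of the first item, here is a minimal sketch of inference by enumeration over the dict-based network from earlier; it is in the spirit of the chapter's ENUMERATION-ASK, not a verbatim transcription:

```python
# Inference by enumeration: sum the joint of Eq 14.1 over the
# hidden variables. `order` must list the variables parents-first.

def prob(network, var, value, assignment):
    """P(var = value | parents(var)) under the current assignment."""
    key = tuple(assignment[p] for p in network[var]["parents"])
    p_true = network[var]["cpt"][key]
    return p_true if value else 1.0 - p_true

def enumerate_all(network, order, assignment):
    if not order:
        return 1.0
    first, rest = order[0], order[1:]
    if first in assignment:                      # evidence or query variable
        return (prob(network, first, assignment[first], assignment)
                * enumerate_all(network, rest, assignment))
    total = 0.0
    for value in (True, False):                  # hidden variable: sum it out
        assignment[first] = value
        total += (prob(network, first, value, assignment)
                  * enumerate_all(network, rest, assignment))
    del assignment[first]
    return total

def enumeration_ask(network, query, evidence, order):
    """Return P(query = True | evidence) for a Boolean query variable."""
    scores = {}
    for value in (True, False):
        extended = dict(evidence)
        extended[query] = value
        scores[value] = enumerate_all(network, order, extended)
    return scores[True] / (scores[True] + scores[False])

# enumeration_ask(network, "Burglary",
#                 {"JohnCalls": True, "MaryCalls": True},
#                 ["Burglary", "Earthquake", "Alarm",
#                  "JohnCalls", "MaryCalls"])   # ~0.284
```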
Knowledge engineering for
uncertain reasoning
Decide what to talk about
Decide on a vocabulary of random variables
Encode general knowledge about the dependencies among the variables
Encode a description of the specific problem
instance
Pose queries to the inference procedure and get
answers
Other approaches to uncertain
reasoning
Different generations of expert systems:
• Strict logical reasoning (ignore uncertainty)
• Probabilistic techniques using the full joint
• Default reasoning - a conclusion is believed until a better reason is found to believe something else
• Rules with certainty factors
• Handling ignorance - Dempster-Shafer theory
• Vagueness - something is sort of true (fuzzy logic)
Probability makes the same ontological
commitment as logic: the event is true or
false
Default reasoning
The conclusion that a car has four wheels is reached by default.
New evidence can cause the conclusion to be retracted, whereas FOL is strictly monotonic.
Representative formalisms are default logic, nonmonotonic logic, and circumscription.
There are problematic issues
• Details in Chapter 10
Rule-based methods
Logical reasoning systems have properties like:
• Monotonicity: adding new facts never invalidates existing conclusions
• Locality: each rule can be applied on its own, independently of the others
• Detachment: once derived, a conclusion can be detached from its justification
• Truth-functionality: the truth of complex sentences can be computed from the truth of the components
These properties bring obvious computational advantages, but they are inappropriate for uncertain reasoning.
Summary
Reasoning properly
• In FOL, it means conclusions follow from premises
• In probability, it means having beliefs that allow an agent to act rationally
Conditional independence information is vital.
A Bayesian network is a complete representation of the JPD, but often exponentially smaller in size.
Bayesian networks can reason causally, diagnostically, intercausally, or by combining two or more of these modes.
For polytrees (singly connected networks), the
computational time is linear in network size.