Artificial Intelligence
Chapter 19
Reasoning with Uncertain
Information
Biointelligence Lab
School of Computer Sci. & Eng.
Seoul National University
Outline
Review of Probability Theory
Probabilistic Inference
Bayes Networks
Patterns of Inference in Bayes Networks
Uncertain Evidence
D-Separation
Probabilistic Inference in Polytrees
19.1 Review of Probability Theory (1/4)
Random variables
Joint probability
Ex. Four Boolean variables: B (BAT_OK), M (MOVES), L (LIFTABLE), G (GAUGE)

(B, M, L, G)                    Joint Probability
(True,  True,  True,  True )    0.5686
(True,  True,  True,  False)    0.0299
(True,  True,  False, True )    0.0135
(True,  True,  False, False)    0.0007
…                               …
19.1 Review of Probability Theory (2/4)
Marginal probability
Ex. p(B) = Σ_{M,L,G} p(B, M, L, G)
Conditional probability: p(Vi | Vj) = p(Vi, Vj) / p(Vj)
Ex. The probability that the battery is charged given that the arm does not move:
p(B | ¬M) = p(B, ¬M) / p(¬M)
19.1 Review of Probability Theory (3/4)
Figure 19.1 A Venn Diagram
19.1 Review of Probability Theory (4/4)
Chain rule
  p(V1, V2, …, Vk) = Π_{i=1}^{k} p(Vi | Vi−1, …, V1)
Bayes' rule
  p(Vi | Vj) = p(Vj | Vi) p(Vi) / p(Vj)
Set notation
  p(𝒱) abbreviates p(V1, V2, …, Vk), where 𝒱 = {V1, V2, …, Vk}
19.2 Probabilistic Inference
We want to calculate the probability that some variable Vi has value vi, given the evidence E = e.

Example joint distribution over P, Q, R:

p(P, Q, R)     = 0.3
p(P, Q, ¬R)    = 0.2
p(P, ¬Q, R)    = 0.2
p(P, ¬Q, ¬R)   = 0.1
p(¬P, Q, R)    = 0.05
p(¬P, Q, ¬R)   = 0.1
p(¬P, ¬Q, R)   = 0.05
p(¬P, ¬Q, ¬R)  = 0.0

p(Q | ¬R) = p(Q, ¬R) / p(¬R)
          = [p(P, Q, ¬R) + p(¬P, Q, ¬R)] / p(¬R)
          = (0.2 + 0.1) / p(¬R) = 0.3 / p(¬R)

p(¬Q | ¬R) = p(¬Q, ¬R) / p(¬R)
           = [p(P, ¬Q, ¬R) + p(¬P, ¬Q, ¬R)] / p(¬R)
           = (0.1 + 0.0) / p(¬R) = 0.1 / p(¬R)

Since p(Q | ¬R) + p(¬Q | ¬R) = 1, we get p(¬R) = 0.4, and therefore p(Q | ¬R) = 0.75.
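The same numbers can be checked mechanically. Below is a minimal Python sketch that transcribes the joint table and recovers p(Q | ¬R) = 0.75 by enumeration:

```python
# A minimal sketch: inference by enumeration over the joint table above.
joint = {
    # (P, Q, R): probability
    (True,  True,  True):  0.30,
    (True,  True,  False): 0.20,
    (True,  False, True):  0.20,
    (True,  False, False): 0.10,
    (False, True,  True):  0.05,
    (False, True,  False): 0.10,
    (False, False, True):  0.05,
    (False, False, False): 0.00,
}

def prob(pred):
    """Sum the joint over all worlds satisfying the predicate."""
    return sum(p for world, p in joint.items() if pred(*world))

p_not_r = prob(lambda P, Q, R: not R)          # p(¬R) = 0.4
p_q_not_r = prob(lambda P, Q, R: Q and not R)  # p(Q, ¬R) = 0.3
print(round(p_q_not_r / p_not_r, 4))           # p(Q | ¬R) = 0.75
```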
Statistical Independence
Conditional independence
  p(Vi | Vj, 𝒱) = p(Vi | 𝒱), where 𝒱 is a set of variables
  Intuition: Vj tells us nothing more about Vi than we already knew by knowing the variables in 𝒱.
Mutual conditional independence
  p(V1, …, Vk | 𝒱) = Π_{i=1}^{k} p(Vi | 𝒱)
Unconditional independence
  p(Vi | Vj) = p(Vi)  (when 𝒱 is empty)
19.3 Bayes Networks (1/2)
Directed, acyclic graph (DAG) whose nodes are
labeled by random variables
Characteristics of Bayesian networks
Node Vi is conditionally independent of any subset of nodes that are not descendants of Vi, given its parents Pa(Vi):

  p(V1, V2, …, Vk) = Π_{i=1}^{k} p(Vi | Pa(Vi))

Prior probability (at each root node)
Conditional probability table (CPT) (at each non-root node)
19.3 Bayes Networks (2/2)
Bayes network for the block-lifting example: arcs B → G, B → M, and L → M, with priors p(B) = 0.95 and p(L) = 0.7, and CPTs for p(G | B) and p(M | B, L).
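A minimal sketch of this network in Python, computing one joint entry via the factorization. The CPT entries shown reproduce the joint-table values in Section 19.1; p(G | ¬B) = 0.1 is an assumed placeholder, since the slides do not give it:

```python
# A sketch of the block-lifting network using the factored form
# p(B, L, M, G) = p(B) p(L) p(M | B, L) p(G | B).
p_B = 0.95
p_L = 0.70
p_M = {(True, True): 0.9, (True, False): 0.05,    # p(M = true | B, L)
       (False, True): 0.0, (False, False): 0.0}   # p(M | ¬B, ·) = 0, per 19.4
p_G = {True: 0.95, False: 0.10}                   # p(G = true | B); 0.10 is assumed

def bern(p, value):
    """Probability that a Boolean variable with p(true) = p takes `value`."""
    return p if value else 1.0 - p

def joint(b, l, m, g):
    """p(B = b, L = l, M = m, G = g) via the network factorization."""
    return (bern(p_B, b) * bern(p_L, l)
            * bern(p_M[(b, l)], m) * bern(p_G[b], g))

print(round(joint(True, True, True, True), 4))    # 0.5686, matching the table
```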
19.4 Patterns of Inference in Bayes Networks (1/3)
Causal or top-down inference
Ex. The probability that the arm moves given that the block is liftable:

p(M | L) = p(M, B | L) + p(M, ¬B | L)
         = p(M | B, L) p(B | L) + p(M | ¬B, L) p(¬B | L)   (chain rule)
         = p(M | B, L) p(B) + p(M | ¬B, L) p(¬B)           (from the structure)
         = 0.9 × 0.95 + 0 × 0.05 = 0.855                   (since p(M | ¬B, L) = 0)
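A short sketch checking this sum numerically, with the CPT values used above:

```python
# Causal (top-down) inference: p(M | L) by summing out B, as in the derivation
# above. CPT values: p(B) = 0.95, p(M | B, L) = 0.9, p(M | ¬B, L) = 0.
p_B = 0.95
p_M_given_BL = {True: 0.9, False: 0.0}   # p(M = true | B = b, L = true)

p_M_given_L = sum(p_M_given_BL[b] * (p_B if b else 1 - p_B)
                  for b in (True, False))
print(round(p_M_given_L, 3))             # 0.855
```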
19.4 Patterns of Inference in Bayes Networks (2/3)
Diagnostic or bottom-up inference: using an effect (or symptom) to infer a cause.

Ex. The probability that the block is not liftable given that the arm does not move.

p(¬M | ¬L) = 0.9525   (using causal reasoning)

p(¬L | ¬M) = p(¬M | ¬L) p(¬L) / p(¬M)                (Bayes' rule)
           = 0.9525 × 0.3 / p(¬M) = 0.28575 / p(¬M)
p(L | ¬M)  = p(¬M | L) p(L) / p(¬M)                  (Bayes' rule)
           = 0.145 × 0.7 / p(¬M) = 0.1015 / p(¬M)

Since these two probabilities sum to 1, p(¬L | ¬M) = 0.28575 / (0.28575 + 0.1015) = 0.7379.
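The normalization step can be checked with a short sketch:

```python
# Diagnostic (bottom-up) inference: p(¬L | ¬M) via Bayes' rule, obtaining
# p(¬M) by normalization instead of computing it directly.
p_L = 0.70
p_notM_given_notL = 0.9525   # from causal reasoning, as on the slide
p_notM_given_L = 0.145       # 1 - p(M | L) = 1 - 0.855

unnorm_notL = p_notM_given_notL * (1 - p_L)   # 0.28575
unnorm_L = p_notM_given_L * p_L               # 0.1015
print(round(unnorm_notL / (unnorm_notL + unnorm_L), 4))   # 0.7379
```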
19.4 Patterns of Inference in Bayes Networks (3/3)
Explaining away

One piece of evidence: ¬M (the arm does not move)
Additional evidence: ¬B (the battery is not charged)

p(¬L | ¬B, ¬M) = p(¬M, ¬B | ¬L) p(¬L) / p(¬B, ¬M)             (Bayes' rule)
               = p(¬M | ¬B, ¬L) p(¬B | ¬L) p(¬L) / p(¬B, ¬M)  (def. of conditional prob.)
               = p(¬M | ¬B, ¬L) p(¬B) p(¬L) / p(¬B, ¬M)       (structure of the Bayes network)
               = 0.30

¬B explains ¬M, making ¬L less certain: 0.30 < 0.7379.
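A sketch of the same calculation by enumeration over the joint p(B, L, M) = p(B) p(L) p(M | B, L), with the chapter's CPT values:

```python
# Explaining away: once ¬B is also observed, belief in ¬L drops from
# 0.7379 to p(¬L | ¬B, ¬M) = 0.30.
p_B, p_L = 0.95, 0.70
p_M = {(True, True): 0.9, (True, False): 0.05,
       (False, True): 0.0, (False, False): 0.0}   # p(M = true | B, L)

def joint(b, l, m):
    pb = p_B if b else 1 - p_B
    pl = p_L if l else 1 - p_L
    pm = p_M[(b, l)] if m else 1 - p_M[(b, l)]
    return pb * pl * pm

num = joint(False, False, False)                          # p(¬B, ¬L, ¬M)
den = sum(joint(False, l, False) for l in (True, False))  # p(¬B, ¬M)
print(round(num / den, 2))                                # 0.3
```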
19.5 Uncertain Evidence
Evidence nodes must be certain: we have to know the truth or falsity of the propositions they represent.
Each uncertain evidence node should therefore have a child node about which we can be certain.

Ex. Suppose the robot is not certain that its arm did not move.
  Introduce M′: "The arm sensor says that the arm moved."
  We can be certain that this proposition is either true or false.
  Use p(¬L | ¬B, ¬M′) instead of p(¬L | ¬B, ¬M).

Ex. Suppose we are uncertain about whether or not the battery is charged.
  Introduce the battery gauge node G, and use p(¬L | ¬G, ¬M′) instead of p(¬L | ¬B, ¬M′).
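As a small illustration, the sketch below conditions on an unreliable sensor M′ (a child of M) rather than asserting ¬M outright. The prior p(M) follows from the chapter's CPTs; the sensor reliabilities p(M′ | M) are assumed values, not from the slides:

```python
# Sketch: condition on a sensor node M' instead of asserting ¬M directly.
# p(M) = 0.61275 follows from the chapter's CPTs:
# sum over B, L of p(M | B, L) p(B) p(L) = 0.9*0.95*0.7 + 0.05*0.95*0.3.
p_M = 0.61275
p_sensor = {True: 0.95, False: 0.05}  # assumed p(M' = true | M); illustrative

# Belief in ¬M after observing ¬M', by Bayes' rule over the two cases of M.
unnorm_not_m = (1 - p_sensor[False]) * (1 - p_M)  # p(¬M' | ¬M) p(¬M)
unnorm_m = (1 - p_sensor[True]) * p_M             # p(¬M' | M) p(M)
print(unnorm_not_m / (unnorm_not_m + unnorm_m))   # ≈ 0.92: strong, not certain
```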
19.6 D-Separation (1/3)
D-separation: direction-dependent separation

Two nodes Vi and Vj are conditionally independent given a set of nodes E if, for every undirected path in the Bayes network between Vi and Vj, there is some node Vb on the path having one of the following three properties:
1. Vb is in E, and both arcs on the path lead out of Vb.
2. Vb is in E, and one arc on the path leads into Vb and one arc leads out of it.
3. Neither Vb nor any descendant of Vb is in E, and both arcs on the path lead into Vb.

Vb blocks the path given E when any one of these conditions holds for it.
If all paths between Vi and Vj are blocked, we say that E d-separates Vi and Vj.
19.6 D-Separation (2/3)
Figure 19.3 Conditional Independence via Blocking Nodes
19.6 D-Separation (3/3)
Ex. (the block-lifting network: B → G, B → M, L → M)

I(G, L | B), by rules 1 and 3:
  By rule 1, B blocks the (only) path between G and L, given B.
  By rule 3, M also blocks this path, given B.
I(G, L): by rule 3, M blocks the path between G and L.
I(B, L): by rule 3, M blocks the path between B and L.

Even using d-separation, probabilistic inference in Bayes networks is, in general, NP-hard.
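These checks can be automated. Below is a minimal sketch of a path-blocking d-separation test for this network, implementing the three rules from the definition; the helper names are ours, for illustration:

```python
# A sketch of a d-separation test for the block-lifting network
# (arcs B -> G, B -> M, L -> M), implementing the three blocking rules.
parents = {"B": set(), "L": set(), "G": {"B"}, "M": {"B", "L"}}

def descendants(v):
    """All nodes reachable from v along directed arcs."""
    kids = {c for c, ps in parents.items() if v in ps}
    out = set(kids)
    for c in kids:
        out |= descendants(c)
    return out

def undirected_paths(src, dst, visited=()):
    """All simple paths between src and dst, ignoring arc direction."""
    if src == dst:
        yield (*visited, dst)
        return
    nbrs = parents[src] | {c for c, ps in parents.items() if src in ps}
    for n in nbrs - set(visited) - {src}:
        yield from undirected_paths(n, dst, (*visited, src))

def blocked(path, E):
    """True if some intermediate node Vb on the path blocks it given E."""
    for prev, vb, nxt in zip(path, path[1:], path[2:]):
        arc_in_prev = prev in parents[vb]   # arc prev-vb leads into vb
        arc_in_nxt = nxt in parents[vb]     # arc vb-nxt leads into vb
        if vb in E and not (arc_in_prev and arc_in_nxt):   # rules 1 and 2
            return True
        if (arc_in_prev and arc_in_nxt                     # rule 3
                and not ({vb} | descendants(vb)) & E):
            return True
    return False

def d_separated(vi, vj, E=frozenset()):
    return all(blocked(p, set(E)) for p in undirected_paths(vi, vj))

print(d_separated("G", "L", {"B"}))  # True: I(G, L | B)
print(d_separated("G", "L"))         # True: I(G, L)
print(d_separated("B", "L"))         # True: I(B, L)
```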
19.7 Probabilistic Inference in Polytrees (1/2)
Polytree
A DAG for which there is just one path, along arcs in
either direction, between any two nodes in the DAG.
19.7 Probabilistic Inference in Polytrees (2/2)
A node is above Q if it is connected to Q only through Q's parents.
A node is below Q if it is connected to Q only through Q's immediate successors.

Three types of evidence:
  All evidence nodes are above Q.
  All evidence nodes are below Q.
  There are evidence nodes both above and below Q.
Evidence Above (1/2)
Bottom-up recursive algorithm
Ex. p(Q | P5, P4)

p(Q | P5, P4) = Σ_{P6,P7} p(Q, P6, P7 | P5, P4)
              = Σ_{P6,P7} p(Q | P6, P7, P5, P4) p(P6, P7 | P5, P4)
              = Σ_{P6,P7} p(Q | P6, P7) p(P6, P7 | P5, P4)              (structure of the Bayes network)
              = Σ_{P6,P7} p(Q | P6, P7) p(P6 | P5, P4) p(P7 | P5, P4)   (d-separation)
              = Σ_{P6,P7} p(Q | P6, P7) p(P6 | P5) p(P7 | P4)           (d-separation)
Evidence Above (2/2)
Calculating p(P7 | P4) and p(P6 | P5):

p(P7 | P4) = Σ_{P3} p(P7 | P3, P4) p(P3 | P4) = Σ_{P3} p(P7 | P3, P4) p(P3)
p(P6 | P5) = Σ_{P1,P2} p(P6 | P1, P2) p(P1 | P5) p(P2)

Calculating p(P1 | P5): the evidence P5 is "below" P1, so here we use Bayes' rule:

p(P1 | P5) = p(P5 | P1) p(P1) / p(P5)
Evidence Below (1/2)
p(Q | P12, P13, P14, P11) = p(P12, P13, P14, P11 | Q) p(Q) / p(P12, P13, P14, P11)
                          = k p(P12, P13, P14, P11 | Q) p(Q)
                          = k p(P12, P13 | Q) p(P14, P11 | Q) p(Q)

Using a top-down recursive algorithm:

p(P12, P13 | Q) = Σ_{P9} p(P12, P13 | P9, Q) p(P9 | Q)
                = Σ_{P9} p(P12, P13 | P9) p(P9 | Q)    (d-separation)
p(P9 | Q) = Σ_{P8} p(P9 | P8, Q) p(P8)
p(P12, P13 | P9) = p(P12 | P9) p(P13 | P9)
Evidence Below (2/2)
p(P14, P11 | Q) = Σ_{P10} p(P14, P11 | P10) p(P10 | Q)
                = Σ_{P10} p(P14 | P10) p(P11 | P10) p(P10 | Q)

p(P11 | P10) = Σ_{P15} p(P11 | P15, P10) p(P15 | P10)
p(P15 | P10) = Σ_{P11} p(P15 | P10, P11) p(P11 | P10)

These two expressions are circular, so Bayes' rule is applied:

p(P11 | P15, P10) = p(P15, P10 | P11) p(P11) / p(P15, P10) = k1 p(P15, P10 | P11) p(P11)
p(P15, P10 | P11) = p(P15 | P10, P11) p(P10 | P11)
Evidence Above and Below
With evidence above E⁺ = {P5, P4} and evidence below E⁻ = {P12, P13, P14, P11}:

p(Q | E⁺, E⁻) = p(E⁻ | Q, E⁺) p(Q | E⁺) / p(E⁻ | E⁺)
              = k2 p(E⁻ | Q, E⁺) p(Q | E⁺)
              = k2 p(E⁻ | Q) p(Q | E⁺)     (d-separation)

We have already calculated both of the remaining probabilities.
A Numerical Example (1/2)
We want to calculate p(Q | U).

p(Q | U) = k p(U | Q) p(Q)    (Bayes' rule)
p(U | Q) = Σ_P p(U | P) p(P | Q)
p(P | Q) = Σ_R p(P | R, Q) p(R)
         = p(P | R, Q) p(R) + p(P | ¬R, Q) p(¬R)
         = 0.95 × 0.01 + 0.8 × 0.99 ≈ 0.80
p(¬P | Q) = 0.20
p(U | Q) = p(U | P) × 0.8 + p(U | ¬P) × 0.2 = 0.7 × 0.8 + 0.2 × 0.2 = 0.60
p(Q | U) = k × 0.6 × 0.05 = k × 0.03

To determine k, we need to calculate p(¬Q | U).
A Numerical Example (2/2)
p(¬Q | U) = k p(U | ¬Q) p(¬Q)    (Bayes' rule)
p(U | ¬Q) = Σ_P p(U | P) p(P | ¬Q)
p(P | ¬Q) = Σ_R p(P | R, ¬Q) p(R)
          = p(P | R, ¬Q) p(R) + p(P | ¬R, ¬Q) p(¬R)
          = 0.90 × 0.01 + 0.01 × 0.99 ≈ 0.019
p(¬P | ¬Q) = 0.98
p(U | ¬Q) = p(U | P) × 0.019 + p(U | ¬P) × 0.98
          = 0.7 × 0.019 + 0.2 × 0.98 ≈ 0.21

Finally:
p(Q | U) = k × 0.6 × 0.05 = k × 0.03
p(¬Q | U) = k × 0.21 × 0.95 = k × 0.20
k ≈ 4.35, so p(Q | U) ≈ 4.35 × 0.03 ≈ 0.13
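A short sketch verifying this arithmetic for the small polytree implied by the equations (R and Q are parents of P; P is the parent of U):

```python
# Verify p(Q | U) ≈ 0.13 with the CPT values above: p(R) = 0.01, p(Q) = 0.05.
p_R, p_Q = 0.01, 0.05
p_P = {(True, True): 0.95, (False, True): 0.80,    # p(P = true | R, Q)
       (True, False): 0.90, (False, False): 0.01}
p_U = {True: 0.70, False: 0.20}                    # p(U = true | P)

def p_U_given_Q(q):
    """p(U | Q = q) = sum_P p(U | P) sum_R p(P | R, Q = q) p(R)."""
    total = 0.0
    for P in (True, False):
        p_P_given_q = sum((p_P[(r, q)] if P else 1 - p_P[(r, q)])
                          * (p_R if r else 1 - p_R)
                          for r in (True, False))
        total += p_U[P] * p_P_given_q
    return total

unnorm_Q = p_U_given_Q(True) * p_Q              # ≈ 0.03
unnorm_notQ = p_U_given_Q(False) * (1 - p_Q)    # ≈ 0.20
print(round(unnorm_Q / (unnorm_Q + unnorm_notQ), 2))  # ≈ 0.13
```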
Other Methods for Probabilistic Inference in Bayes Networks
Bucket elimination
Monte Carlo methods (when the network is not a
polytree)
Clustering
Additional Readings (1/5)
[Feller 1968]
Probability Theory
[Goldszmidt, Morris & Pearl 1990]
Nonmonotonic inference through probabilistic methods
[Pearl 1982a, Kim & Pearl 1983]
Message-passing algorithm
[Russell & Norvig 1995, pp.447ff]
Polytree methods
Additional Readings (2/5)
[Shachter & Kenley 1989]
Bayesian networks for continuous random variables
[Wellman 1990]
Qualitative networks
[Neapolitan 1990]
Probabilistic methods in expert systems
[Henrion 1990]
Probabilistic inference in Bayesian networks
Additional Readings (3/5)
[Jensen 1996]
Bayesian networks: HUGIN system
[Neal 1991]
Relationships between Bayesian networks and neural
networks
[Heckerman 1991, Heckerman & Nathwani 1992]
PATHFINDER
[Pradhan, et al. 1994]
CPCS-BN
Additional Readings (4/5)
[Shortliffe 1976, Buchanan & Shortliffe 1984]
MYCIN: uses certainty factors
[Duda, Hart & Nilsson 1987]
PROSPECTOR: uses sufficiency index and necessity index
[Zadeh 1975, Zadeh 1978, Elkan 1993]
Fuzzy logic and possibility theory
[Dempster 1968, Shafer 1979]
Dempster-Shafer combination rules
Additional Readings (5/5)
[Nilsson 1986]
Probabilistic logic
[Tversky & Kahneman 1982]
Humans generally lose consistency when facing uncertainty
[Shafer & Pearl 1990]
A collection of papers on uncertain inference
Proceedings & Journals
Uncertainty in Artificial Intelligence (UAI)
International Journal of Approximate Reasoning