Ch.6 Uncertain Knowledge Representation
Hantao Zhang
http://www.cs.uiowa.edu/~hzhang/c145
The University of Iowa
Department of Computer Science
Logic and Uncertainty
One problem with logical approaches:
Agents almost never have access to the whole truth
about their environments.
Very often, even in simple worlds, there are important
questions for which there is no boolean answer.
In that case, an agent must reason under uncertainty.
Uncertainty also arises because of an agent’s
incomplete or incorrect understanding of its
environment.
Uncertainty
Let action At = “leave for airport t minutes before flight”.
Will At get me there on time?
Problems
partial observability (road state, other drivers’ plans, etc.)
noisy sensors (unreliable traffic reports)
uncertainty in action outcomes (flat tire, etc.)
immense complexity of modelling and predicting traffic
Hence a purely logical approach either
1. risks falsehood (“A25 will get me there on time”), or
2. leads to conclusions that are too weak for decision making (“A25 will get
me there on time if there’s no accident on the way, it doesn’t rain, my tires remain
intact, . . . ”)
Handling Uncertain Knowledge
FOL-based approaches fail to cope with domains like medical diagnosis:
Laziness: too much work to write complete axioms, or too hard to work with the enormous sentences that result.
Theoretical Ignorance: the available knowledge of the domain is incomplete.
Practical Ignorance: the theoretical knowledge of the domain is complete, but some evidential facts are missing.
Degrees of Belief
In several real-world domains the agent’s knowledge
can only provide a degree of belief in the relevant
sentences.
The agent cannot say whether a sentence is true, but
only that it is true x% of the time.
The main tool for handling degrees of belief is Probability
Theory.
The use of probability summarizes the uncertainty that
stems from our laziness or ignorance about the domain.
Probability Theory
Probability Theory makes the same ontological
commitments as FOL:
Every sentence ϕ is either true or false.
The degree of belief that ϕ is true is a number P between 0
and 1.
P(ϕ) = 1    −→  ϕ is certainly true.
P(ϕ) = 0    −→  ϕ is certainly not true.
P(ϕ) = 0.65 −→  ϕ is true with a 65% chance.
Probability of Facts
Let A be a propositional variable, a symbol denoting a proposition
that is either true or false.
P (A) denotes the probability that A is true in the absence of any
other information.
Similarly,

P(¬A)    = probability that A is false
P(A ∧ B) = probability that both A and B are true
P(A ∨ B) = probability that either A or B (or both) is true

Examples: P(¬Blonde), P(Blonde ∧ BlueEyed), P(Blonde ∨ BlueEyed)
Joint Probability Distribution
If X1 , . . . , Xn are random variables,
P(X1 = x1 , . . . , Xn = xn ), or simply P(X1 , . . . , Xn )
denotes their joint probability distribution (JPD), an
n-dimensional matrix specifying the probability of every
possible combination of values for X1 , . . . , Xn .
Example: Sky : {sunny, cloudy, rain, snow}, Wind : {true, false}

P(Wind, Sky) =

                 sunny   cloudy   rain   snow
Wind = true      0.30    0.15     0.17   0.01
Wind = false     0.30    0.05     0.01   0.01
Joint Probability Distribution
All relevant probabilities about a vector ⟨X1, . . . , Xn⟩ of
random variables can be computed from P(X1, . . . , Xn).

             W      ¬W     P(S)
S = sunny    0.30   0.30   0.60
S = cloudy   0.15   0.05   0.20
S = rain     0.17   0.01   0.18
S = snow     0.01   0.01   0.02
P(W)         0.63   0.37   1.00

P(S = rain ∧ W) = 0.17
P(W)            = 0.30 + 0.15 + 0.17 + 0.01 = 0.63
P(S = rain)     = 0.17 + 0.01 = 0.18
P(S = rain|W)   = P(S = rain ∧ W)/P(W) = 0.17/0.63 ≈ 0.27
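To make these lookups concrete, here is a minimal Python sketch (an illustration, not part of the original slides): the Wind/Sky JPD stored as a dictionary, with the same marginals and conditional recovered by summation.

```python
# Illustration only: the Wind/Sky JPD from the table above, as a dict.
joint = {
    ("sunny", True): 0.30, ("cloudy", True): 0.15,
    ("rain",  True): 0.17, ("snow",   True): 0.01,
    ("sunny", False): 0.30, ("cloudy", False): 0.05,
    ("rain",  False): 0.01, ("snow",   False): 0.01,
}

def p_wind(w):
    """Marginal P(Wind = w): sum the joint over all Sky values."""
    return sum(p for (sky, wind), p in joint.items() if wind == w)

def p_sky(s):
    """Marginal P(Sky = s): sum the joint over both Wind values."""
    return sum(p for (sky, wind), p in joint.items() if sky == s)

def p_sky_given_wind(s, w):
    """Conditional P(Sky = s | Wind = w) = P(s ∧ w) / P(w)."""
    return joint[(s, w)] / p_wind(w)

print(p_wind(True))                              # 0.63 (up to float rounding)
print(p_sky("rain"))                             # 0.18
print(round(p_sky_given_wind("rain", True), 2))  # 0.27
```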
Joint Probability Distribution
A joint probability distribution P(X1 , . . . , Xn ) provides
complete information about the probabilities of its
random variables.
However, JPDs are often hard to create (again because
of incomplete knowledge of the domain).
Even when available, JPD tables are very expensive, or
impossible, to store because of their size.
A JPD table for n random variables, each ranging over k
distinct values, has k^n entries!
A better approach is to come up with conditional
probabilities as needed and compute the others from
them.
An alternative to JPD: The Bayes Rule
Recall that for any facts A and B,
P (A ∧ B) = P (A|B)P (B) = P (B|A)P (A)
From this we obtain the Bayes Rule:
P(B|A) = P(A|B)P(B) / P(A)
The rule is useful in practice because it is often easier to
figure out P (A|B), P (B), P (A) than to figure out P (B|A)
directly.
Applying the Bayes Rule
What is the probability that Sam is infected by HIV
(human immunodeficiency virus) if he tests positive in
a routine blood test using ELISA?
ELISA (Enzyme-Linked Immunosorbent Assay) is a highly
accurate and relatively cheap test for HIV.
Let H = HIV present, E = positive in ELISA
P(H|E) = P(E|H)P(H) / P(E)
Applying the Bayes Rule
P(H|E) = P(E|H)P(H) / P(E)
P(E|H) is easier to estimate than P(H|E) because it
refers to causal knowledge: people affected by HIV have
a high rate of testing positive in ELISA.
P (E|H) can be estimated from past medical cases: say
9990 of 10000 HIV patients tested positive in ELISA.
Similarly, P(H) can be estimated from past medical
cases. If 1 in 100,000 men in Sam's state is affected
by HIV, then P(H) = 0.00001.
How to estimate P(E), the probability of testing positive
in ELISA?
Applying the Bayes Rule
P(H|E) = P(E|H)P(H) / P(E)

How to estimate P(E), the probability of testing positive
in ELISA?

If we know the false positive rate (say 0.002) for those
who are not affected by HIV, then

P(E) = P(E|H)P(H) + P(E|¬H)P(¬H)
     = 0.999 × 0.00001 + 0.002 × 0.99999 = 0.00200997

P(H|E) = P(E|H)P(H) / P(E)
       = 0.999 × 0.00001 / 0.00200997 ≈ 0.004970
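The whole calculation fits in a few lines of Python. This is an illustration only (the function and argument names are ours); the numbers are the ones from the slides.

```python
# Illustration only: Bayes rule with P(E) expanded by total probability,
#   P(H|E) = P(E|H)P(H) / (P(E|H)P(H) + P(E|¬H)P(¬H)).
def posterior(p_e_given_h, p_h, p_e_given_not_h):
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e

# P(E|H) = 0.999 (9990 of 10000), P(H) = 0.00001, false-positive rate 0.002
print(posterior(0.999, 0.00001, 0.002))  # ≈ 0.00497: a single positive ELISA
                                         # still leaves HIV quite unlikely
```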
Applying the Bayes Rule
The Bayes rule is helpful even in the absence of
(immediate) causal relationships.
What is the probability that a blonde (B) is Swedish (S)?
P(S|B) = P(B|S)P(S) / P(B)

All of P(B|S), P(S), P(B) are easily estimated from statistical information:

P(B|S) ≈ (# of blonde Swedish) / (Swedish population)  = 9/10
P(S)   ≈ (Swedish population) / (world population)     = . . .
P(B)   ≈ (# of blondes) / (world population)           = . . .
Conditional Probability
A JPD is a collection of probabilities:

P(X1, . . . , Xn) = {P(X1 = x1 ∧ · · · ∧ Xn = xn) | xi ∈ Domain(Xi)}

Conditional probability:

P(X1 = x1 | X2 = x2) = P(X1 = x1 ∧ X2 = x2) / P(X2 = x2)

or, equivalently,

P(X1 = x1 ∧ X2 = x2) = P(X1 = x1 | X2 = x2) P(X2 = x2)

More generally,

P(X1 = x1 | X2 = x2, . . . , Xn = xn) = P(X1 = x1 ∧ X2 = x2 ∧ · · · ∧ Xn = xn) / P(X2 = x2 ∧ · · · ∧ Xn = xn)

or, applying this repeatedly (the chain rule),

P(X1 = x1 ∧ X2 = x2 ∧ · · · ∧ Xn = xn)
  = P(X1 = x1 | X2 = x2, . . . , Xn = xn) P(X2 = x2 ∧ · · · ∧ Xn = xn)
  = ∏_{i=1}^{n} P(Xi = xi | Xi+1 = xi+1, . . . , Xn = xn)
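A quick numerical sanity check of the chain rule (a sketch of ours, not from the slides): build an arbitrary JPD over three Boolean variables and confirm that the product of conditionals telescopes back to the joint entry.

```python
# Illustration only: check P(A, B, C) = P(A|B,C) P(B|C) P(C) on a
# randomly generated, normalized JPD over three Boolean variables.
import itertools, random

worlds = list(itertools.product([True, False], repeat=3))
weights = [random.random() for _ in worlds]
total = sum(weights)
joint = {w: x / total for w, x in zip(worlds, weights)}  # normalized JPD

def prob(pred):
    """P(pred): sum the joint over all worlds (a, b, c) satisfying pred."""
    return sum(p for w, p in joint.items() if pred(*w))

a, b, c = True, False, True
lhs = joint[(a, b, c)]
rhs = (prob(lambda x, y, z: (x, y, z) == (a, b, c))   # P(A ∧ B ∧ C)
       / prob(lambda x, y, z: (y, z) == (b, c))       # ... / P(B ∧ C) = P(A|B,C)
       * prob(lambda x, y, z: (y, z) == (b, c))       # P(B ∧ C)
       / prob(lambda x, y, z: z == c)                 # ... / P(C)     = P(B|C)
       * prob(lambda x, y, z: z == c))                # P(C)
print(abs(lhs - rhs) < 1e-12)  # True: the conditionals telescope
```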
Conditional Independence
Conditional independence: if

P(X1 = x1 | X2 = x2, . . . , Xn−1 = xn−1, Xn = xn) = P(X1 = x1 | X2 = x2, . . . , Xn−1 = xn−1)

then we say X1 = x1 is conditionally independent of Xn = xn
given the evidence X2 = x2, . . . , Xn−1 = xn−1.
Bayesian Networks
A joint probability distribution (JPD) contains all the
relevant information to reason about the various kinds of
probabilities of a set {X1 , . . . , Xn } of random variables.
Unfortunately, JPD tables are difficult to create and also
very expensive to store.
One possibility is to work with conditional probabilities
and exploit the fact that many random variables are
conditionally independent.
Belief networks are a successful example of probabilistic
systems that exploit conditional independence to
reason effectively under uncertainty.
Belief Networks
Let X1 , . . . , Xn be random variables.
A belief network (or Bayesian network) for X1, . . . , Xn is a graph
with n nodes such that
there is a node for each Xi ,
all the edges between two nodes are directed,
there are no cycles,
each node has a conditional probability table (CPT),
given in terms of the node’s parents.
The intuitive meaning of an edge from a node Xi to a node
Xj is that Xi has a direct influence on Xj .
A Belief Network
A belief network is a directed acyclic graph G = (V, E), where
V is a set of random variables and E expresses
conditional dependencies among them.
Detecting Credit Card Fraud
We consider the following random variables:
Fraud (F): Whether the current purchase is fraudulent
Gas(G): Whether gas has been purchased in the past 24
hours
Jewelry (J): Whether jewelry has been purchased in the
past 24 hours
Age (A): Age of the card holder
Sex (S): Gender of the card holder
These variables are all causally related.
Conditional Probability Tables
Each node Xi in a belief network has a CPT expressing the
probability of Xi , given its parents as evidence.
Some Notations
Given a joint probability distribution P, IP(X, A|B) means
X is conditionally independent of A given B, i.e.,
P(X|A, B) = P(X|B).
A and B could be sets of random variables.

Example: L means the letter is A; S means the shape is a
square; and C means the color is black. Then IP(L, S|C),
i.e., P(L|S, C) = P(L|C) (and P(S|L, C) = P(S|C)).
Some Notations
Given a DAG G = (V, E), if (X, Y) is an edge in E, then X is a
parent of Y and Y is a child of X. If there exists a path
from X to Y, then X is an ancestor of Y and Y is a
descendant of X.
Markov Condition
A DAG G = (V, E) can be a Bayesian network if G
satisfies the Markov condition: for any node X, let PA_X
and ND_X be the sets of parents and nondescendants of
X, respectively; then IP(X, ND_X | PA_X).
Conditional Probability Tables (CPT)
Each node Xi in a belief network has a CPT expressing the
probability of Xi , given its parents as evidence.
P(S, L, C) = P(S|L, C) P(L, C)
           = P(S|C) P(L|C) P(C)

Note: P(S, L, C) abbreviates the whole family of entries
P(s1, l1, c1), . . . , P(s2, l2, c2), following the textbook's notation.
The Semantics of Belief Networks
There are two equivalent ways to interpret a belief network
for the variables X1 , . . . , Xn :
1. The network is a representation of the JPD
P(X1 , . . . , Xn ).
2. The network is a collection of conditional independence
statements about X1, . . . , Xn.
Interpretation 1 is helpful in understanding how to construct
belief networks.
Interpretation 2 is helpful in designing inference procedures
on top of them.
Belief Network as JPD
The whole joint P(X1, . . . , Xn) can be computed from a belief
network for X1, . . . , Xn and its CPTs.

For each tuple ⟨x1, . . . , xn⟩ of possible values for ⟨X1, . . . , Xn⟩,

P(X1 = x1 ∧ · · · ∧ Xn = xn) = ∏_{i=1}^{n} P(Xi = xi | PA_Xi)

where PA_Xi denotes the parents of Xi in the DAG.
Belief Network as JPD
For the chain network X → Y → Z → W, with CPTs P(X) = 0.4,
P(Y|X) = 0.9, P(Y|¬X) = 0.8, P(Z|Y) = 0.7, P(Z|¬Y) = 0.4,
P(W|¬Z) = 0.6:

P(¬X, Y, ¬Z, W) = P(¬X) P(Y|¬X) P(¬Z|Y) P(W|¬Z)
                = (1 − 0.4)(0.8)(1 − 0.7)(0.6) = 0.0864

P(Y) = P(Y|X)P(X) + P(Y|¬X)P(¬X) = 0.9(0.4) + 0.8(0.6) = 0.84
Belief Network as JPD
P(Z|X) = P(Z|Y, X) P(Y|X) + P(Z|¬Y, X) P(¬Y|X)
       = P(Z|Y) P(Y|X) + P(Z|¬Y) P(¬Y|X)
       = 0.7(0.9) + 0.4(0.1) = 0.67

P(X|Z) = P(Z|X) P(X) / P(Z)
       = 0.67(0.4) / 0.652 ≈ 0.411
Another Example
How to compute P (Y |Z)?
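One way to answer this (a sketch of ours, not the slides' own solution): enumerate the joint of the chain network above. P(W|Z) was not given on the previous slide, so the value below is an arbitrary assumption; since W is summed away, it does not affect P(Y|Z).

```python
# Illustration only: compute P(Y|Z) for the chain X -> Y -> Z -> W by
# enumerating the joint P(X) P(Y|X) P(Z|Y) P(W|Z).
import itertools

P_X = 0.4                        # P(X = true)
P_Y = {True: 0.9, False: 0.8}    # P(Y = true | X = x)
P_Z = {True: 0.7, False: 0.4}    # P(Z = true | Y = y)
P_W = {True: 0.5, False: 0.6}    # P(W = true | Z = z); P(W|Z=true) was not
                                 # given in the slides -- 0.5 is assumed

def joint(x, y, z, w):
    """P(X=x ∧ Y=y ∧ Z=z ∧ W=w) via the network's factorization."""
    px = P_X if x else 1 - P_X
    py = P_Y[x] if y else 1 - P_Y[x]
    pz = P_Z[y] if z else 1 - P_Z[y]
    pw = P_W[z] if w else 1 - P_W[z]
    return px * py * pz * pw

def prob(pred):
    """Sum the joint over all worlds satisfying pred."""
    return sum(joint(*v) for v in itertools.product([True, False], repeat=4)
               if pred(*v))

p_z  = prob(lambda x, y, z, w: z)        # P(Z) = 0.652, as computed above
p_yz = prob(lambda x, y, z, w: y and z)  # P(Y ∧ Z) = 0.7 * 0.84 = 0.588
print(round(p_yz / p_z, 3))              # P(Y|Z) ≈ 0.902
```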
Constructing BN: General Procedure
1. Choose a set of random variables Xi that describe the
domain.
2. Order the variables into a list L.
3. Start with an empty network.
4. While there are variables left in L:
(a) Remove the first variable Xi of L and add a node for
it to the network.
(b) Choose a minimal set of nodes already in the
network which, taken as Xi's parents, make Xi
conditionally independent of the other nodes in the network.
(c) Make these nodes the parents of Xi .
(d) Fill in the CPT for Xi .
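A minimal Python sketch of this procedure (our own illustration): the conditional-independence test is left as a caller-supplied oracle named `independent`, a hypothetical hook standing in for domain knowledge or a statistical test, not a real library call.

```python
# Illustration only: the construction loop above.
# independent(x, parents, rest) answers: "is x conditionally
# independent of the variables in `rest`, given `parents`?"
from itertools import combinations

def build_network(ordered_vars, independent):
    """Return {variable: parent list}, trying smaller parent sets first."""
    parents, added = {}, []
    for x in ordered_vars:                    # step (a): take next variable
        chosen = list(added)                  # worst case: all predecessors
        for k in range(len(added) + 1):       # minimal set: smallest k wins
            found = False
            for cand in combinations(added, k):
                rest = [v for v in added if v not in cand]
                if independent(x, set(cand), rest):
                    chosen, found = list(cand), True
                    break
            if found:
                break
        parents[x] = chosen                   # step (c): make them parents
        added.append(x)                       # step (d) would fill x's CPT
    return parents
```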
Ordering the Variables
The order in which we add the variables to the network is
important.
"Wrong" orderings produce more complex networks.
Ordering the Variables Right
A general, effective heuristic for constructing simpler belief
networks is to exploit causal links between random variables
whenever possible.
That is, add the variables to the network so that causes get
added before effects.
Locally Structured Systems
In addition to being a complete and non-redundant
representation of the domain, a belief network is often
more compact than a full joint.
The reason is that probabilistic domains are often
representable as a locally structured system.
In a locally structured system, each subcomponent
interacts with only a bounded number of other
components, regardless of the size of the system.
The complexity of local structures generally grows
linearly, instead of exponentially.
BN and Locally Structured Systems
In many real-world domains, each random variable is
influenced by at most k others, for some fixed constant k.
Assuming that we have n variables and, for simplicity, that they
are all Boolean, a joint table will have 2^n entries.
In a well constructed belief network, each node will have at
most k parents.
This means that each node will have a CPT with at most 2^k
entries, for a total of at most 2^k · n entries.

Example: n = 20, k = 5.
entries in network CPTs ≤ 2^5 × 20 = 640
entries in joint table   = 2^20 = 1,048,576
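The arithmetic is easy to check (illustration only):

```python
# Illustration only: the n = 20, k = 5 count from above.
n, k = 20, 5
print(n * 2**k)  # 640: at most 2^k entries per CPT, across n nodes
print(2**n)      # 1048576: entries in the full joint table
```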
Inference in Belief Networks
Main task of a belief network:
Compute the conditional probability of a set of query
variables, given exact values for some evidence variables.
P (Query|Evidence)
Belief networks are flexible enough so that any node
can serve as either a query or an evidence variable.
In general, to decide what actions to take, an agent
1. first gets values for some variables from its percepts,
or from its own reasoning,
2. then asks the network about the possible values of
the other variables.
Probabilistic Inference with BN
Belief networks are a very flexible tool for probabilistic
inference because they allow several kinds of inference:
Diagnostic inference (from effects to causes)
Example: Infer P(Burglary | JohnCalls)

Causal inference (from causes to effects)
Example: Infer P(JohnCalls | Burglary)

Intercausal inference (between causes of a common effect)
Example: Infer P(Burglary | Alarm ∧ Earthquake)

Mixed inference (combination of the above)
Example: Infer P(Alarm | JohnCalls ∧ ¬Earthquake)
Types of Inference in Belief Networks
Q = query variable
E = evidence variable