Uncertainty
Chapter 13
Uncertainty
Evolution of an intelligent agent: problem solving, planning, uncertainty
Dealing with uncertainty is an unavoidable problem in reality.
An agent must act under uncertainty.
To make decisions under uncertainty, we need:
• Probability theory
• Utility theory
• Decision theory
Sources of uncertainty
No access to the whole truth
No categorical answer
Incompleteness
• The qualification problem - it is impossible to explicitly enumerate all conditions
Incorrectness of information about conditions
The rational decision depends on both the relative importance of various goals and the likelihood that they will be achieved.
Handling uncertain knowledge
Difficulties in using FOL to cope with uncertain knowledge (UK)
• A dental diagnosis system using FOL:
  Symptom(p, Toothache) => Disease(p, Cavity)
  Disease(p, Cavity) => Symptom(p, Toothache)
  Are they correct?
Reasons for handling uncertain knowledge
• Laziness - too much work to write down all the rules! How to avoid it?
• Theoretical ignorance - we don’t know everything about the domain
• Practical ignorance - even knowing all the rules, we may not have all the facts for a particular case
Represent UK with a degree of belief
The tool for handling UK is probability theory
Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance - how wonderful it is!
Probability is a degree of belief in the truth of a sentence
• 1 - true, 0 - false, 0 < P < 1 - intermediate degrees of belief in the truth of the sentence
Degree of truth (fuzzy logic) vs. degree of belief (probability)
Alternatives to probability theory?
• Yes, to be discussed in later chapters.
All probability statements must indicate the evidence with respect to which the probability is being assessed.
• Prior or unconditional probability - before evidence is obtained
• Posterior or conditional probability - after new evidence is obtained
Uncertainty & rational decisions
Without uncertainty, decision making is simple - a plan either achieves the goal or it does not
With uncertainty, it becomes harder - e.g., choosing among the three plans A90, A120, and A1440 (leaving 90, 120, or 1440 minutes before the flight)
We first need to have preferences between the different possible outcomes of the plans
Utility theory is used to represent and reason with preferences.
Rationality
Decision theory = probability theory + utility theory
The Maximum Expected Utility (MEU) principle defines rationality
• An agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all possible outcomes of the action
A decision-theoretic agent (Fig 13.1, p 466)
• Is it any different from the other agents we have learned about?
Basic probability notation
Prior probability
• Proposition - P(Sunny)
• Random variable - P(Weather=Sunny)
  Boolean, discrete, and continuous random variables
• Each random variable has a domain, e.g., (sunny, rain, cloudy, snow)
• Probability distribution P(Weather) = <0.7, 0.2, 0.08, 0.02>
Joint probability P(A^B)
• probabilities of all combinations of the values of a set of random variables
• more later
Conditional probability
• P(A|B) = P(A^B)/P(B)
• Product rule - P(A^B) = P(A|B)P(B) (see the sketch after this list)
Probabilistic inference does not work like logical inference
• “P(A|B)=0.6” != “whenever B is true, P(A) is 0.6”
  P(A) is always a prior.
  For P(A|B), B is the only available evidence.
  When more evidence C becomes available, P(A|B,C) may have little relation to P(A|B).
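A minimal Python sketch of these definitions, using a toy joint distribution over two Boolean variables (the numbers are made up for illustration):

    # Toy joint distribution over Boolean variables A and B (illustrative numbers).
    joint_ab = {(True, True): 0.3, (True, False): 0.1,
                (False, True): 0.2, (False, False): 0.4}

    # P(B): sum out A.
    p_b = sum(p for (a, b), p in joint_ab.items() if b)    # 0.5

    # P(A|B) = P(A^B)/P(B); equivalently P(A^B) = P(A|B)P(B) (product rule).
    p_a_given_b = joint_ab[(True, True)] / p_b             # 0.3/0.5 = 0.6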
The axioms of probability
All probabilities are between 0 and 1
Necessarily true (valid) propositions have probability 1, necessarily false (unsatisfiable) propositions probability 0
The probability of a disjunction:
P(AvB) = P(A) + P(B) - P(A^B)
• A Venn diagram illustration
• Ex: deriving the rule of negation from P(a v !a) = 1:
  P(a v !a) = P(a) + P(!a) - P(a ^ !a) = P(a) + P(!a) = 1, hence P(!a) = 1 - P(a)
The joint probability distribution
The joint completely specifies an agent’s probability assignments to all propositions in the domain
A probabilistic model consists of a set of random variables (X1, …, Xn).
An atomic event is an assignment of particular values to all the variables
• Given Boolean random variables A and B, what are the atomic events?
Joint probabilities
An example with two Boolean variables:

               Toothache   !Toothache
    Cavity       0.04         0.06
    !Cavity      0.01         0.89

• Observations: the atomic events are mutually exclusive and collectively exhaustive
• What are P(Cavity), P(Cavity v Toothache), and P(Cavity|Toothache)? (see the sketch below)
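A minimal Python sketch that reads the three requested quantities off the table above:

    # Joint distribution over (Cavity, Toothache), from the table above.
    joint2 = {(True, True): 0.04, (True, False): 0.06,
              (False, True): 0.01, (False, False): 0.89}

    p_cavity = sum(p for (c, t), p in joint2.items() if c)                # 0.10
    p_cavity_or_tooth = sum(p for (c, t), p in joint2.items() if c or t)  # 0.11
    p_tooth = sum(p for (c, t), p in joint2.items() if t)                 # 0.05
    p_cavity_given_tooth = joint2[(True, True)] / p_tooth                 # 0.04/0.05 = 0.8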
Joint (2)
If we have the joint distribution, we can read off any probability we need. Is it true? How?
• Discussed next
It is impractical to specify all the entries for the joint over n Boolean variables (2^n entries).
Sidestep the joint and work directly with conditional probability
Inference using full joint distributions
Marginal probability (Fig 13.3)
• P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
• Marginalization - summing out all the variables other than cavity:
  P(Y) = Σ_z P(Y,z)
• Conditioning - a variant of marginalization using the product rule:
  P(Y) = Σ_z P(Y|z)P(z) (see the sketch below)
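A minimal sketch of marginalization in Python. The eight entries below are the full-joint values of Fig 13.3 as given in the textbook; four of them (the toothache column) also appear on the Normalization slide that follows:

    # Full joint over (Cavity, Toothache, Catch), Fig 13.3.
    joint = {
        (True,  True,  True):  0.108, (True,  True,  False): 0.012,
        (True,  False, True):  0.072, (True,  False, False): 0.008,
        (False, True,  True):  0.016, (False, True,  False): 0.064,
        (False, False, True):  0.144, (False, False, False): 0.576,
    }

    # Marginalization: P(cavity) = sum over t, ch of P(cavity, t, ch).
    p_cavity = sum(p for (cy, t, ch), p in joint.items() if cy)   # 0.2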
Normalization
• Method 1 - using P(t) to normalize:
  P(cavity|toothache) = P(c^t)/P(t)
  P(!cavity|toothache) = P(!c^t)/P(t)
• Method 2 - using α and leaving P(t) out (P(t) is the same for both, so it cancels):
  P(Cavity|toothache) = αP(Cavity,toothache)
  = α[P(Cavity,toothache,catch) + P(Cavity,toothache,!catch)]
  = α[<0.108,0.016> + <0.012,0.064>] = α<0.12,0.08>
  What is α? (see the sketch below)
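A minimal sketch of Method 2 (answering “what is α?”), continuing from the joint defined above:

    # Unnormalized distribution over Cavity given toothache = true.
    unnorm = [0.108 + 0.012,   # cavity:  P(c,t,catch) + P(c,t,!catch)   = 0.12
              0.016 + 0.064]   # !cavity: P(!c,t,catch) + P(!c,t,!catch) = 0.08
    alpha = 1.0 / sum(unnorm)                # 1/P(toothache) = 1/0.2 = 5
    posterior = [alpha * p for p in unnorm]  # [0.6, 0.4]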
Independence
P(Toothache, Catch, Cavity, Weather)
• A total of 32 entries, given that Weather has 4 values (2 x 2 x 2 x 4)
• How is one’s tooth problem related to the weather?
  P(T,Ch,Cy,W=cloudy) = P(W=cloudy|T,Ch,Cy)P(T,Ch,Cy)?
  Whose tooth problem can influence our weather?
• P(W=cloudy|T,Ch,Cy) = P(W=cloudy)
• Hence, P(T,Ch,Cy,W=cloudy) = P(W=cloudy)P(T,Ch,Cy)
• How many joint distribution tables? Two - one with 4 entries (Weather), one with 8 (the dental variables)
Independence between X and Y means
• P(X|Y) = P(X) or P(Y|X) = P(Y) or P(X,Y) = P(X)P(Y) (see the sketch below)
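A minimal sketch of the factorization, rebuilding the 32-entry joint from the 8-entry dental joint defined earlier and the 4-entry weather distribution from the notation slide:

    # P(Weather), from the earlier slide: <0.7, 0.2, 0.08, 0.02>.
    weather = {'sunny': 0.7, 'rain': 0.2, 'cloudy': 0.08, 'snow': 0.02}

    # Independence: P(T,Ch,Cy,W) = P(W) * P(T,Ch,Cy), so 4 + 8 numbers
    # reconstruct all 2*2*2*4 = 32 entries of the full joint.
    full = {(cy, t, ch, w): p_dental * p_w
            for (cy, t, ch), p_dental in joint.items()   # dental joint above
            for w, p_w in weather.items()}
    assert len(full) == 32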
Bayes’ rule
Deriving the rule via the product rule
• P(B|A) = P(A|B)P(B)/P(A)
• A more general case is P(X|Y) = P(Y|X)P(X)/P(Y)
Bayes’ rule conditionalized on evidence E:
P(X|Y,E) = P(Y|X,E)P(X|E)/P(Y|E)
Applying the rule to medical diagnosis
• meningitis (P(M) = 1/50,000), stiff neck (P(S) = 1/20), P(S|M) = 0.5; what is P(M|S)? (worked below)
• Why is this kind of inference useful?
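A minimal worked answer in Python:

    # Bayes' rule: P(M|S) = P(S|M) P(M) / P(S).
    p_m, p_s, p_s_given_m = 1 / 50_000, 1 / 20, 0.5
    p_m_given_s = p_s_given_m * p_m / p_s   # 0.0002, i.e., 1 stiff neck in 5000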
Applying Bayes’ rule
Relative likelihood
• Comparing the relative likelihood of meningitis and whiplash, given a stiff neck: which is more likely?
  P(M|S)/P(W|S) = [P(S|M)P(M)] / [P(S|W)P(W)]
Avoiding direct assessment of the prior
• P(M|S) = ? P(!M|S) = ? We know P(M|S) + P(!M|S) = 1
• P(S) = ? P(S|!M) = ?
• P(S) = Σ_m P(S,m) = Σ_m P(S|m)P(m) = P(S|M)P(M) + P(S|!M)P(!M) (see the sketch below)
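A minimal sketch of avoiding the prior P(S) via total probability. P(S|!M) is not given on the slide, so the value below is a hypothetical placeholder:

    p_m, p_s_given_m = 1 / 50_000, 0.5
    p_s_given_not_m = 0.05   # hypothetical placeholder, not from the slide

    # Total probability: P(S) = P(S|M)P(M) + P(S|!M)P(!M).
    p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)
    p_m_given_s = p_s_given_m * p_m / p_s   # ~0.0002 with these numbers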
Using Bayes’ rule
Combining evidence
• from P(Cavity|Toothache) and P(Cavity|Catch) to P(Cavity|Toothache,Catch)
Bayesian updating (see the sketch below)
• from P(Cavity|T) = P(Cavity)P(T|Cavity)/P(T)
  [i.e., P(A|B) = P(B|A)P(A)/P(B)]
• to P(Cavity|T,Catch) = P(Cavity|T)P(Catch|T,Cavity)/P(Catch|T)
  [i.e., P(A|B,C) = P(B|A,C)P(A|C)/P(B|C)]
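A minimal sketch of the two-step update, continuing with the Fig 13.3 joint defined earlier:

    def cond(query, evidence):
        # P(query-event | evidence-event); both are predicates on (Cy, T, Ch) keys.
        num = sum(p for k, p in joint.items() if query(k) and evidence(k))
        return num / sum(p for k, p in joint.items() if evidence(k))

    p_cy_t   = cond(lambda k: k[0], lambda k: k[1])           # P(Cavity|T)       = 0.6
    p_ch_tcy = cond(lambda k: k[2], lambda k: k[1] and k[0])  # P(Catch|T,Cavity) = 0.9
    p_ch_t   = cond(lambda k: k[2], lambda k: k[1])           # P(Catch|T)        = 0.62
    # Bayesian updating: P(Cavity|T,Catch) = P(Cavity|T) P(Catch|T,Cavity) / P(Catch|T)
    p_cy_t_ch = p_cy_t * p_ch_tcy / p_ch_t                    # ~0.871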
Recall that for independent events A, B:
P(B|A) = P(B), P(A|B) = P(A), P(A,B) = P(A)P(B)
Conditional independence (X and Y are independent given Z)
• P(X|Y,Z) = P(X|Z) and P(Y|X,Z) = P(Y|Z)
• P(X,Y|Z) = P(X|Z)P(Y|Z), which follows via the product rule: P(X,Y|Z) = P(X|Y,Z)P(Y|Z) = P(X|Z)P(Y|Z)
Given Cavity, Toothache and Catch are conditionally independent
• P(T,Ch,Cy) = P(T,Ch|Cy)P(Cy) = P(T|Cy)P(Ch|Cy)P(Cy)
• One large table is decomposed into 3 smaller tables (Cy, T|Cy, Ch|Cy): 2^3 - 1 = 7 independent entries vs. 5 (= 2*(2^1 - 1) + 2*(2^1 - 1) + (2^1 - 1)) (see the sketch below)
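A minimal sketch of the decomposition; the five conditional-probability values below are chosen to be consistent with the Fig 13.3 joint used earlier:

    # Five numbers instead of seven: P(Cy), P(T|Cy), P(T|!Cy), P(Ch|Cy), P(Ch|!Cy).
    p_cy = 0.2
    p_t_given  = {True: 0.6, False: 0.1}    # P(T|Cy),  P(T|!Cy)
    p_ch_given = {True: 0.9, False: 0.2}    # P(Ch|Cy), P(Ch|!Cy)

    def p(t, ch, cy):
        # P(T,Ch,Cy) = P(T|Cy) P(Ch|Cy) P(Cy) under conditional independence.
        pt  = p_t_given[cy]  if t  else 1 - p_t_given[cy]
        pch = p_ch_given[cy] if ch else 1 - p_ch_given[cy]
        return pt * pch * (p_cy if cy else 1 - p_cy)

    assert abs(p(True, True, True) - 0.108) < 1e-12   # matches the joint entry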
Independence, decomposition, Naïve Bayes
If all n symptoms are conditionally independent given Cavity, the size of the representation grows as O(n) instead of O(2^n)
The decomposition of large probabilistic domains into weakly connected subsets via conditional independence is one important development in modern AI
Naïve Bayes model (one Cause, many Effects)
• P(Cause,E1,…,En) = P(Cause) Π_i P(Ei|Cause)
• An amazingly successful classifier can be built from it (see the sketch below)
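A minimal naïve Bayes classifier sketch; the parameter values are hypothetical illustrations:

    # P(Cause) and P(effect = true | Cause) for each effect; hypothetical numbers.
    prior = {'cavity': 0.2, 'no_cavity': 0.8}
    likelihood = {'cavity':    {'toothache': 0.6, 'catch': 0.9},
                  'no_cavity': {'toothache': 0.1, 'catch': 0.2}}

    def classify(observed):
        # argmax over causes of P(Cause) * prod_i P(E_i|Cause).
        scores = {}
        for cause, p in prior.items():
            for effect, value in observed.items():
                pe = likelihood[cause][effect]
                p *= pe if value else 1 - pe
            scores[cause] = p
        return max(scores, key=scores.get)

    classify({'toothache': True, 'catch': True})   # -> 'cavity' (0.108 vs 0.016)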
Where do probabilities come from?
There are three positions:
• The frequentist - numbers can come only from experiments
• The objectivist - probabilities are real aspects of the universe
• The subjectivist - probabilities characterize an agent’s beliefs
The reference class problem - what constitutes the reference class? - an intrusion of subjectivity
• A frequentist doctor wants to consider similar patients
  How similar must two patients be?
Laplace’s principle of indifference
• Propositions that are syntactically “symmetric” w.r.t. the evidence should be accorded equal probability
Summary
Uncertainty exists in the real world.
It is good (as it allows for laziness) and bad (since we need new tools).
Priors, posteriors, and the joint distribution
Bayes’ rule - the basis of Bayesian inference
Conditional independence allows Bayesian updating to work effectively with many pieces of evidence.
But ...