CSC 450
AIMA 3e Chapter 13:
Quantifying Uncertainty
OUTLINE

Overview
1. rationale for a new representational language
   what logical representations can't do
2. utilities & decision theory
3. possible worlds & propositions
4. unconditional & conditional probabilities
5. random variables
6. probability distributions
7. using the Joint Probability Distribution
   for inference by enumeration
   for unconditional & conditional probabilities
Quantifying Uncertainty
consider our approach so far
we've handled limited observability &/or non-determinism
using belief states that capture all possible world states
but the representation can become large, as can corresponding
contingent plans, and it's possible that no plan can be
guaranteed to reach the goal, yet the agent must act
agents should behave rationally
this rationality depends both on the importance of goals and
on the chances of & degree to which they'll be reached
A Visit to the Dentist
we'll use medical/dental diagnosis examples
extensively
our new prototype problem relates to whether a dental
patient has a cavity or not
the process of diagnosis always involves uncertainty & this
leads to difficulty with logical representations (propositional
logic examples)
(1) toothache ⇒ cavity
(2) toothache ⇒ cavity ∨ gumDisease ∨ ...
(3) cavity ⇒ toothache
(1) is just wrong since other things cause toothaches
(2) will need to list all possible causes
(3) tries a causal rule but it's not always the case that cavities
cause toothaches & fixing the rule requires making it logically
exhaustive
Representations for Diagnosis
logic is not sufficient for medical diagnosis, due to
our Laziness: it's too hard to list all possible antecedents or
consequents to make the rule have no exceptions
our Theoretical Ignorance: generally, there is no complete
theory of the domain, no complete model
our Practical Ignorance: even if the rules were complete, in
any particular case it's impractical or impossible to do all the
necessary tests, to have all relevant evidence
the example relationship between toothache & cavities is not
a logical consequence in either direction
instead, knowledge of the domain provides a degree of belief
in diagnostic sentences & the way to represent this is with
probability theory
next slide: recall our discussion of ontological & epistemological
commitments from 352
Epistemological Commitment
ontological commitment
what a representational language assumes about the nature
of reality - logic & probability theory agree in this, that facts
do or do not hold
epistemological commitment
the possible states of knowledge
for logic, sentences are true/false/unknown
for probability theory, there's a numerical degree of belief in
sentences, between 0 (certainly false) and 1 (certainly true)
The Qualification Problem
for a logical representation
the success of a plan can't be inferred because of all the
conditions that could interfere but can't be deduced not to
happen (this is the qualification problem)
probability is a way of dealing with the qualification problem by
numerically summarizing the uncertainty that derives from
laziness &/or ignorance
returning to the toothache & cavity problem
in the real world, the patient either does or does not have a cavity
a probabilistic agent makes statements with respect to the
knowledge state, & these may change as the state of knowledge
changes
for example, an agent initially may believe there's an 80% chance
(probability 0.8) that the patient with the toothache has a cavity,
but subsequently revises that as additional evidence is available
Rational Decisions
making choices among plans/actions when the
probabilities of their success differ
this requires additional knowledge of preferences among
outcomes
this is the domain of utility theory: every state has a degree
of utility/usefulness to the agent & the agent will prefer those
with higher utility
utilities are specific to an agent, to the extent that they can even
encompass perverse or altruistic preferences
Rational Decisions
making choices among plans/actions when the
probabilities of their success differ
we can combine preferences (utilities) + probabilities to get a
general theory of rational decisions: Decision Theory
a rational agent chooses actions to yield the highest expected
utility averaged over all possible outcomes of the action
this is the Maximum Expected Utility (MEU) principle
expected = average of the possible outcomes of an action weighted
by their probabilities
choice of action = the one with highest expected utility
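a minimal Python sketch of the MEU choice; the actions, outcome
probabilities, and utilities below are invented purely for illustration:

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

# hypothetical choice for a dental patient: treat the tooth now vs. wait and see
actions = {
    "treat_now":    [(0.8, 10), (0.2, -5)],   # EU = 0.8*10 + 0.2*(-5) = 7.0
    "wait_and_see": [(0.3, 10), (0.7, -20)],  # EU = 0.3*10 + 0.7*(-20) = -11.0
}

best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)   # treat_now, the action with the highest expected utility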
Revising Belief States
belief states
in addition to the possible world states that we included before,
belief states now include probabilities
the agent incorporates probabilistic predictions of action
outcomes, selecting the one with the highest expected utility
AIMA3e chapters 13 through 17 address various aspects of using
probabilistic representations
an algorithmic description of the Decision Theoretic Agent
function DT-AGENT(percept) returns an action
  persistent: belief-state, probabilistic beliefs about the current state of the world
              action, the agent's action

  update belief-state based on action and percept
  calculate outcome probabilities for actions,
    given action descriptions and current belief state
  select action with the highest expected utility,
    given probabilities of outcomes and utility information
  return action
Notation & Basics
we should interpret probabilities as describing
possible worlds and their likelihoods
the sample space is the set of all possible worlds
note that possible worlds are mutually exclusive & exhaustive
for example, a roll of a pair of dice has 36 possible worlds
we use the Greek letter omega to refer to possible worlds
Ω refers to the sample space, ω to its elements (particular
possible worlds)
a basic axiom for probability theory
(13.1)  0 ≤ P(ω) ≤ 1 for every ω,  and  Σω∈Ω P(ω) = 1
as an example, for the dice rolls, each possible world is a pair
(1, 1), (1, 2), ..., (6, 6)
each with a probability of 1/36, all summing to 1
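a minimal Python sketch of this sample space, checking axiom (13.1):

from fractions import Fraction

# the sample space: 36 mutually exclusive & exhaustive possible worlds
omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}       # P(w) for each possible world

assert all(0 <= p <= 1 for p in P.values())   # 0 <= P(w) <= 1 for every world
assert sum(P.values()) == 1                   # the probabilities sum to 1
print(len(omega))                             # 36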
Notation & Basics
assertions & queries in probabilistic reasoning
these are usually about sets of possible worlds
these are termed events in probability theory
for AI, the sets of possible worlds are described by
propositions in a formal language
the set of possible worlds corresponding to a proposition
contains those in which the proposition holds
the probability of the proposition is the sum over those
possible worlds
Propositions
propositions
another axiom of probability theory, using the Greek letter phi
(φ) for a proposition
(13.2)  P(φ) = Σω∈φ P(ω)
so for a fair pair of dice
P(Total = 7) = P((1,6)) + P((6,1)) + P((2,5)) + P((5,2)) + P((3,4)) + P((4,3))
             = 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 1/6
asserting the probability of a proposition constrains the
underlying probability model without fully determining it
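a minimal Python sketch of (13.2) for the dice example:

from fractions import Fraction

# the 36 possible worlds, each with probability 1/36
omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}

def prob(proposition):
    """P(phi): sum P(w) over the worlds w in which the proposition holds."""
    return sum(P[w] for w in omega if proposition(w))

print(prob(lambda w: w[0] + w[1] == 7))   # Fraction(1, 6), i.e. P(Total = 7) = 1/6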
Propositions
propositions: unconditional & conditional probabilities
P(total = 7) from the previous slide & similar probabilities are
called unconditional or prior probabilities, sometimes
abbreviated as priors
they indicate the degree of belief in propositions without any other
information, though in most cases, we do have other information,
or evidence
when we have evidence, the probabilities are conditional or
posterior, given the evidence
Conditional Probability
P(A | B) is the probability of A given B
Assumes that B is the only info known.
Defined by:
P(A | B) = P(A ∧ B) / P(B)
(Venn diagram: the space of all worlds, with overlapping regions A, A ∧ B, B)
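a minimal Python sketch of this definition, using a tiny joint
distribution over two Boolean variables; the numbers are invented for
illustration:

# joint[(a, b)] = P(A = a, B = b); illustrative values that sum to 1
joint = {
    (True,  True):  0.20,
    (True,  False): 0.30,
    (False, True):  0.10,
    (False, False): 0.40,
}

p_a_and_b = joint[(True, True)]                         # P(A ∧ B)
p_b       = joint[(True, True)] + joint[(False, True)]  # P(B), summing out A
print(p_a_and_b / p_b)                                  # P(A | B) ≈ 0.667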
Random Variables & Values
propositions in our probability notation
by convention, the values for random variables use lower case
letters, for example Weather = rain
each random variable has a domain, its set of possible values
for a Boolean random variable the domain is {true, false}
also by convention, A = true is written as simply a, A = false as
¬a
domains also may be arbitrary sets of tokens, like the {red, green,
blue} of the map coloring CSP or {juvenile, teen, adult} for Age
when it's unambiguous, a value by itself may represent the
proposition that a variable has that value
for example, using just sunny for Weather = sunny
Distribution Notation
bold is used as a notational convention
for the probabilities of all possible values of a random variable
we may list the propositions or we may abbreviate, given an
ordering on the domain
as in the ordering (sunny, rain, cloudy, snow) for Weather
then P(Weather) = <0.6, 0.1, 0.29, 0.01>, where bold indicates
there's a vector of values
this defines a probability distribution for the random variable
Weather
we can use a similar shorthand for conditional distributions, for
example:
P(X|Y) lists the values for P(X=xi | Y=yj) for all i,j pairs
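a minimal Python sketch of this notation; the Weather numbers come from
the slide above, while the P(X | Y) entries are placeholders:

# bold P(Weather) is just the vector of probabilities over the ordered domain
weather_domain = ("sunny", "rain", "cloudy", "snow")
P_weather = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}
print([P_weather[v] for v in weather_domain])   # [0.6, 0.1, 0.29, 0.01]

# a conditional table P(X | Y) holds one number per (x, y) pair; placeholder values
P_x_given_y = {("x1", "y1"): 0.7, ("x2", "y1"): 0.3,
               ("x1", "y2"): 0.4, ("x2", "y2"): 0.6}
# for each fixed y, the entries form a distribution over X and sum to 1
assert abs(sum(v for (x, y), v in P_x_given_y.items() if y == "y1") - 1.0) < 1e-9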
Continuous Variables
distributions are
the probabilities of all possible values of a random variable
there's alternative notation for continuous variables where there
cannot be an explicit list: instead, express the distribution as a
parameterized function of value
for example, P(NoonTemp = x) = Uniform[18C,26C](x) specifies a
probability density function (pdf); the density gives probability per
unit of value, so probabilities are obtained by integrating it over
intervals of NoonTemp values
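a minimal Python sketch of this uniform density; the function names are
just for illustration, not from the text:

# the Uniform[18C, 26C] density is constant at 1/(26 - 18) per degree on the
# interval; a probability comes from integrating the density over an interval
def uniform_pdf(x, lo=18.0, hi=26.0):
    """Density of Uniform[lo, hi] at x (probability per unit of x)."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

def prob_interval(a, b, lo=18.0, hi=26.0):
    """P(a <= NoonTemp <= b): width of the overlap divided by the full width."""
    overlap = max(0.0, min(b, hi) - max(a, lo))
    return overlap / (hi - lo)

print(uniform_pdf(20.5))      # 0.125 per degree C
print(prob_interval(20, 22))  # 0.25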
Distribution Notation
for distributions on multiple variables
we use commas between the variables: so P(Weather, Cavity)
denotes the probabilities of all combinations of values of the 2
variables
for discrete random variables we can use a tabular
representation, in this case yielding a 4x2 table of probabilities
this gives the joint probability distribution of Weather & Cavity
tabulates the probabilities for all combinations
Full Joint Distribution
semantics of a proposition
the probability model is determined by the joint distribution for
all the random variables: the full joint probability distribution
for the Cavity, Toothache, Weather domain, the notation is:
P(Cavity, Toothache, Weather)
this can be represented as a 2x2x4 table
given the definition of the probability of a proposition as a sum
over possible worlds, the full joint distribution allows calculating
the probability of any proposition over its variables by summing
entries in the FJD
Inference With Probability
using the full joint distributions for inference
here's the FJD for the Toothache, Cavity, Catch domain of 3
Boolean variables
as required by the axioms, the probabilities sum to 1.0
when available, the FJD gives a direct means of calculating
the probability of any proposition
just sum the probabilities for all the possible worlds in which the
proposition is true
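the table itself did not survive in this text version; the Python sketch
below fills it in with the standard dental-domain entries from AIMA 3e
(Figure 13.3), which are consistent with the sums worked on the next slides:

# full joint distribution over (cavity, toothache, catch); entries from the
# textbook's dental example, summing to 1 as the axioms require
fjd = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}
assert abs(sum(fjd.values()) - 1.0) < 1e-9

def prob(proposition):
    """Sum the FJD entries for all possible worlds where the proposition is true."""
    return sum(p for (cavity, toothache, catch), p in fjd.items()
               if proposition(cavity, toothache, catch))

print(round(prob(lambda c, t, k: t), 3))         # P(toothache) = 0.2
print(round(prob(lambda c, t, k: t or c), 3))    # P(toothache ∨ cavity) = 0.28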
Inference With Probability
P(toothache)=.108+.012+.016+.064
= .20 or 20%
Inference With Probability
P(toothache ∨ cavity) =
.20 + .072 + .008 = .28
Inference for Probability
given the full joint distribution & equation (13.9)
we can answer all probability queries for discrete variables
are we left with any unresolved issues?
well, given n variables, with d as an upper bound on the number
of values per variable, the full joint distribution table size &
the corresponding processing of it are O(d^n), exponential in n
since n might be 100 or more for real problems, this is often
simply not practical
as a result, the FJD is not the implementation of choice for
real systems, but functions more as a theoretical reference
point (analogous to the role of truth tables for propositional logic)
the next sections we look at are foundational for developing
practical systems
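as a sketch of how such queries go, here is a conditional query answered
from the same dental FJD by enumeration and normalization, the approach
that equation (13.9) formalizes:

fjd = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(proposition):
    return sum(p for (cavity, toothache, catch), p in fjd.items()
               if proposition(cavity, toothache, catch))

# P(Cavity | toothache): enumerate the FJD (summing out Catch), then normalize
p_c_t  = prob(lambda c, t, k: c and t)        # 0.108 + 0.012 = 0.12
p_nc_t = prob(lambda c, t, k: (not c) and t)  # 0.016 + 0.064 = 0.08
alpha = 1.0 / (p_c_t + p_nc_t)
print(round(alpha * p_c_t, 3), round(alpha * p_nc_t, 3))   # 0.6 0.4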
END OF LECTURE