Uncertainty
Fall 2013
Comp3710 Artificial Intelligence
Computing Science
Thompson Rivers University
Course Outline

- Part I – Introduction to Artificial Intelligence
- Part II – Classical Artificial Intelligence
- Part III – Machine Learning
  - Introduction to Machine Learning
  - Neural Networks
  - Probabilistic Reasoning and Bayesian Belief Networks
  - Artificial Life: Learning through Emergent Behavior
- Part IV – Advanced Topics
Learning Outcomes

- Relate a proposition to probability, i.e., to random variables.
- Give the sample space for a set of random variables.
- Use one of Kolmogorov's 3 axioms, e.g.,
  P(A ∨ B) = P(A) + P(B) – P(A ∧ B)
- Give the joint probability distribution for a set of random variables.
- Compute probabilities given a joint probability distribution.
- Give the joint probability distribution for a set of random variables that
  have conditional independence relations.
- Compute a diagnostic probability from causal probabilities.
Outline

1. How to deal with uncertainty
2. Probability
3. Syntax and Semantics
4. Inference
5. Independence and Bayes' Rule
1. Uncertainty

An example of a decision-making problem:

  Let action At = leave for airport t minutes before the flight.
  Will At get me there on time?
  [Q] How to decide?

Problems: Application environments are nondeterministic and unpredictable,
with many hidden factors:
1. partial observability (road state, other drivers' plans, etc.)
2. noisy sensors (traffic reports)
3. uncertainty in action outcomes (flat tire, etc.)
4. immense complexity of modeling and predicting traffic

Hence a purely logical approach either
1. risks falsehood: "A25 will get me there on time", or
2. leads to conclusions that are too weak for decision making:
   "A25 will get me there on time if there's no accident on the bridge and it
   doesn't rain and my tires remain intact, etc."
   (A1440 might reasonably be said to get me there on time, but I'd have to
   stay overnight in the airport …)
Methods for handling uncertainty

[Q] How to handle uncertainty?

[Q] Default logic?
- Default assumptions: My car does not have a flat tire. ...
- A25 works unless contradicted by evidence.
- But issues: What assumptions are reasonable? How to handle contradiction?

[Q] Can we use fuzzy logic?

Probability
- Model of degree of belief
- Given the available evidence or knowledge,
  A25 will get me there on time with probability 0.04.

Fuzzy logic
- Degree of truth
- E.g., my age
2. Probability

Probability is a way of expressing knowledge or belief that an event
will occur or has occurred. E.g., …

Probabilistic assertions summarize the effects of hidden factors, i.e.,
- laziness: failure to enumerate exceptions, qualifications, etc.
- ignorance: lack of relevant facts, initial conditions, etc.

Subjective or Bayesian probability:
- Probabilities can relate propositions to the agent's own state of
  knowledge, e.g., P(A25 | no reported accidents) = 0.06
- Probabilities of propositions change with new evidence,
  e.g., P(A25 | no reported accidents, 5 a.m.) = 0.15
Making decisions under uncertainty

Suppose I believe the following:
  P(A25 gets me there on time | …) = 0.04
  P(A90 gets me there on time | …) = 0.70
  P(A120 gets me there on time | …) = 0.95
  P(A1440 gets me there on time | …) = 0.9999

[Q] Which action to choose?
- Depends on my preferences for missing the flight vs. time spent waiting,
  etc.
- Utility (the quality of being useful) theory is used to represent and
  infer preferences.
- Decision theory = probability theory + utility theory (see the sketch
  below)
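To make the tradeoff concrete, here is a minimal Python sketch of choosing the action with the highest expected utility over the four beliefs above; the utility and waiting-cost numbers are hypothetical illustrations, not course data.

```python
# Decision theory = probability theory + utility theory (sketch).
# The utility values below are hypothetical, chosen only for illustration.
p_on_time = {"A25": 0.04, "A90": 0.70, "A120": 0.95, "A1440": 0.9999}

U_MAKE_FLIGHT = 1000      # hypothetical utility of catching the flight
U_MISS_FLIGHT = 0         # hypothetical utility of missing it
WAIT_COST_PER_MIN = 0.5   # hypothetical cost of time spent waiting

def expected_utility(action, minutes_early):
    p = p_on_time[action]
    return (p * U_MAKE_FLIGHT + (1 - p) * U_MISS_FLIGHT
            - WAIT_COST_PER_MIN * minutes_early)

actions = {"A25": 25, "A90": 90, "A120": 120, "A1440": 1440}
for a, t in actions.items():
    print(a, round(expected_utility(a, t), 1))
print("best:", max(actions, key=lambda a: expected_utility(a, actions[a])))
```

With these particular utilities the best action is A120; different preferences (e.g., a cheaper wait) would pick a different action, which is exactly the point of the slide.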
3. Syntax and Semantics

- Sample space, event, probability, probability distribution
- Random variable and proposition
- Prior probability (aka unconditional probability)
- Posterior probability (aka conditional probability)
Sample Space and Probability

Definition. Begin with a set Ω, the sample space, i.e., the set of possible
outcomes of the same type of events.
- E.g., 6 possible rolls of a die -> Ω = {???}
- ω in Ω is a sample point / possible world / atomic event.

Definition. A probability space or probability distribution or probability
model is a sample space with an assignment P(ω) for every ω in Ω s.t.
1. 0 ≤ P(ω) ≤ 1
2. Σω P(ω) = 1
- E.g., P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6.

Definition. An event A can be considered as a subset of Ω.
- P(A) = Σω in A P(ω)
- E.g., P(die roll < 4) = P(1) + P(2) + P(3) = 1/6 + 1/6 + 1/6 = 1/2
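These definitions translate directly into code. A minimal Python sketch using the die example above, with an event given as a predicate over sample points:

```python
from fractions import Fraction

# Sample space for one die roll; each sample point has probability 1/6.
omega = {w: Fraction(1, 6) for w in range(1, 7)}

def prob(event):
    """P(A) = sum of P(w) over the sample points w in the event A."""
    return sum(p for w, p in omega.items() if event(w))

print(prob(lambda w: w < 4))  # P(die roll < 4) = 1/2
```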
Axioms of probability

Kolmogorov's axioms: For any propositions A, B
1. 0 ≤ P(A) ≤ 1
2. P(true) = 1 and P(false) = 0
3. P(A ∨ B) = P(A) + P(B) – P(A ∧ B)
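A small sketch checking the third axiom (inclusion-exclusion) on two die-roll events; the event sets A and B are arbitrary examples:

```python
from fractions import Fraction

omega = {w: Fraction(1, 6) for w in range(1, 7)}

def prob(event):
    return sum(p for w, p in omega.items() if w in event)

A = {1, 2, 3}   # roll < 4
B = {2, 4, 6}   # roll is even

# Kolmogorov's third axiom: P(A or B) = P(A) + P(B) - P(A and B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
print(prob(A | B))  # 5/6
```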
Random Variable and Proposition

Definition. Basic element: a random variable, representing a sample space.
- A function from sample points to some range, e.g., the reals or Booleans
- E.g., Odd(1) = true

- Boolean random variables
  - E.g., Cavity (do I have a cavity?); [Q] What is the sample space?
- Discrete random variables (finite or infinite)
  - E.g., Weather is one of <sunny, rainy, cloudy, snow>; [Q] What is the
    sample space?
- Continuous random variables (bounded or unbounded)
  - E.g., Temp < 0.22; [Q] What is the sample space?
Proposition (how to relate to probability?)

[Q] How to relate a proposition to probability, i.e., to random variables?
- Think of a proposition as the event where the proposition is true.
  - E.g., I have a toothache; the weather is sunny and there is no cavity.
- Similar to propositional logic: possible worlds are defined by the
  assignment of values to random variables.
  - A sample point is an assignment of values to a set of random variables.
  - A sample space is the Cartesian product of the ranges of the random
    variables.
  - [Q] What is the sample space for Cavity and Weather?
- Elementary proposition: constructed by the assignment of a value to a
  random variable, e.g., Weather = sunny; Cavity = false (can be
  abbreviated as ¬cavity)
- Complex propositions: formed from elementary propositions and standard
  logical connectives, e.g., Weather = sunny ∨ Cavity = false
- Proposition = conjunction/disjunction of atomic events
Prior and Posterior Probability

- Prior (aka unconditional) probability
- Posterior (aka conditional) probability
Prior probability

Prior or unconditional probabilities of propositions, e.g.,
P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72, correspond to belief
prior to the arrival of any (new) evidence.

A probability distribution gives values for all possible assignments:
  P(Weather) = <0.72, 0.1, 0.08, 0.1> (normalized, i.e., sums to 1)

A joint probability distribution for a set of random variables gives the
probability of every atomic event on those random variables.
P(Weather, Cavity) = a 4 × 2 matrix of values:

  Weather =       sunny   rainy   cloudy   snow
  Cavity = true   0.144   0.02    0.016    0.02
  Cavity = false  0.576   0.08    0.064    0.08

[Q] What is the sum of all the values above?

Every question about a domain can be answered by the joint probability
distribution.
P(Weather, Cavity) = a 4 × 2 matrix of values:

  Weather =       sunny   rainy   cloudy   snow
  Cavity = true   0.144   0.02    0.016    0.02
  Cavity = false  0.576   0.08    0.064    0.08

[Q] What is the probability of cloudy?
[Q] What is the probability of snow and no cavity?
[Q] What is the probability of sunny or cavity?
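A small Python sketch of how such queries can be answered from the joint table above, by summing the entries where the proposition holds:

```python
# Joint distribution P(Weather, Cavity) from the table above.
joint = {
    ("sunny", True): 0.144, ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

def prob(pred):
    """Sum the joint entries where the predicate holds."""
    return sum(p for (w, c), p in joint.items() if pred(w, c))

print(prob(lambda w, c: w == "cloudy"))           # P(cloudy)
print(prob(lambda w, c: w == "snow" and not c))   # P(snow and no cavity)
print(prob(lambda w, c: w == "sunny" or c))       # P(sunny or cavity)
```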
Prior probability – continuous variables

In general, a probability distribution over a continuous variable is
expressed as a parameterized function of the value, called a probability
density function (pdf).

E.g., P(X = x) = U[18,26](x) = 0.125 for 18 ≤ x ≤ 26.

[Figure: the uniform density, a rectangle of height 0.125 over [18, 26].]

Here P is a density; it integrates to 1:
  ∫ P(x) dx = (26 – 18) × 0.125 = 1
and, e.g.,
  lim dx→0 P(20.5 ≤ X ≤ 20.5 + dx) / dx = 0.125
Example of the Normal distribution, also called the Gaussian distribution:

  P(x) = (1 / √(2πσ²)) e^(–(x – μ)² / (2σ²));  μ: mean, σ²: variance

[Figure: Gaussian density curves (probabilities vs. events), from
Wikipedia.org.]
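A short sketch evaluating both densities and checking, by a crude Riemann sum, that each integrates to 1; the Gaussian parameters here are arbitrary examples:

```python
import math

# Uniform density U[18,26](x) = 0.125 on the interval [18, 26].
def uniform_pdf(x, a=18.0, b=26.0):
    return 1.0 / (b - a) if a <= x <= b else 0.0

# Gaussian density with mean mu and variance sigma2.
def gaussian_pdf(x, mu=0.0, sigma2=1.0):
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Densities integrate to 1 (approximated by a Riemann sum).
dx = 0.001
print(sum(uniform_pdf(i * dx) * dx for i in range(0, 40000)))        # ~1.0
print(sum(gaussian_pdf(i * dx - 10) * dx for i in range(0, 20000)))  # ~1.0
```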
Conditional probability

Posterior or conditional probabilities, e.g., P(cavity | toothache) = 0.6,
i.e., given that a toothache is all I know, there is a 60% chance of a
cavity. (Note this is not the implication "if toothache then cavity".)
Another example: P(toothache | cavity) = 0.8.

New evidence may be irrelevant, allowing simplification, e.g.,
  P(cavity | toothache, sunny) = P(cavity | toothache) = 0.6
This kind of inference, sanctioned by domain knowledge, is crucial.

Other very interesting examples: A stiff neck is one symptom of meningitis;
what is the probability of having meningitis given a stiff neck?
A car won't start. What is the probability of a bad starter? What is the
probability of a bad battery?
Definition of conditional probability:
  P(a | b) = P(a ∧ b) / P(b) if P(b) > 0

[Figure: Venn diagram of the overlapping events a and b.]

The product rule gives an alternative formulation:
  P(a ∧ b) = P(a) P(b | a) = P(b) P(a | b)

The chain rule is derived by successive application of the product rule:
  P(a ∧ b ∧ c) = ???
               = P(a ∧ b) P(c | a ∧ b)
               = P(a) P(b | a) P(c | a ∧ b)

  P(X1,…, Xn) = P(X1,…, Xn-1) P(Xn | X1,…, Xn-1)
              = P(X1,…, Xn-2) P(Xn-1 | X1,…, Xn-2) P(Xn | X1,…, Xn-1)
              = …
              = P(X1) P(X2 | X1) P(X3 | X1, X2) P(X4 | X1, X2, X3) …
              = Π(i=1..n) P(Xi | X1,…, Xi-1)

This will be used in Inference later!
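A small sketch verifying the product and chain rules numerically; the joint distribution here is a uniform toy distribution over three Boolean variables, chosen only for illustration:

```python
import itertools

# Toy joint distribution over three Boolean variables (uniform, for illustration).
joint = {w: 1 / 8 for w in itertools.product([True, False], repeat=3)}

def P(pred):
    return sum(p for w, p in joint.items() if pred(*w))

# Product rule: P(a and b) = P(a) * P(b | a)
p_a = P(lambda a, b, c: a)
p_ab = P(lambda a, b, c: a and b)
p_b_given_a = p_ab / p_a
assert abs(p_ab - p_a * p_b_given_a) < 1e-12

# Chain rule: P(a, b, c) = P(a) P(b | a) P(c | a, b)
p_abc = P(lambda a, b, c: a and b and c)
p_c_given_ab = p_abc / p_ab
assert abs(p_abc - p_a * p_b_given_a * p_c_given_ab) < 1e-12
print("product and chain rules hold")
```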
4. Inference by Enumeration

Start with the joint probability distribution over Toothache (I have a
toothache), Cavity (meaning?), and Catch (meaning?):

              toothache           ¬toothache
              catch    ¬catch     catch    ¬catch
  cavity      0.108    0.012      0.072    0.008
  ¬cavity     0.016    0.064      0.144    0.576

[Q] What are the random variables here?
[Q] What is the sum of the probabilities of all the atomic events?
[Q] How can you obtain the above probability distribution?

For any proposition φ, the probability is the sum of the atomic events
where it is true:
  P(φ) = Σω:ω╞φ P(ω)
Start with the joint probability distribution: Toothache, Cavity, Catch.

For any proposition φ, sum the atomic events where it is true:
  P(φ) = Σω:ω╞φ P(ω)

[Q] P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
[Q] P(cavity ∨ toothache) = ???
[Q] P(cavity ∧ toothache) = ???
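Inference by enumeration translates directly into code. A minimal Python sketch over the joint table above; each query is answered by summing atomic events:

```python
# Joint distribution P(Toothache, Catch, Cavity) from the table above.
# Keys are (toothache, catch, cavity).
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def P(pred):
    """P(phi) = sum over the atomic events where phi is true."""
    return sum(p for w, p in joint.items() if pred(*w))

print(P(lambda t, ca, cv: t))         # P(toothache) = 0.2
print(P(lambda t, ca, cv: cv or t))   # P(cavity or toothache)
print(P(lambda t, ca, cv: cv and t))  # P(cavity and toothache)
```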
Start with the joint probability distribution: Toothache, Cavity, Catch.

Can also compute conditional probabilities (very important!):

  P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
                         = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                         = 0.4

[Q] P(cavity | toothache) = ??? P(cavity | ¬toothache) = ???
[Q] P(cavity | catch) = ???
The denominator can be viewed as a normalization constant α:

  P(Cavity | toothache)
  = P(Cavity, toothache) / P(toothache)
  = α P(Cavity, toothache)
  = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
  = α [<0.108, 0.016> + <0.012, 0.064>]
  = α <0.12, 0.08> = <0.6, 0.4>

[Q] What is α?

General idea: compute the distribution on the query variable by fixing the
evidence variables and summing over the hidden variables.
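A one-screen sketch of the normalization step, using the unnormalized values <0.12, 0.08> computed above (Catch is the hidden variable summed out):

```python
# P(Cavity | toothache) = alpha * P(Cavity, toothache), with Catch summed out.
unnormalized = {True: 0.108 + 0.012, False: 0.016 + 0.064}  # <0.12, 0.08>

alpha = 1 / sum(unnormalized.values())  # alpha = 1 / P(toothache)
posterior = {v: alpha * p for v, p in unnormalized.items()}
print(alpha, posterior)                 # 5.0 {True: 0.6, False: 0.4}
```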
Obvious problems:
1. Worst-case time complexity O(d^n), where d is the largest arity and n is
   the number of variables
2. Space complexity O(d^n) to store the joint distribution
3. How to find the numbers for O(d^n) entries?

In other words, too much. [Q] How to reduce?
5. Independence & Bayes’ Rule

1. Independence
2. Conditional independence
3. Bayes’ rule
Independence

A and B are independent iff P(A|B) = P(A) or P(B|A) = P(B)
or P(A, B) = P(A) P(B).

E.g., P(Toothache, Catch, Cavity, Weather)
      = P(Toothache, Catch, Cavity) × P(Weather)

- In the above example, 32 (2*2*2*4) entries are reduced to 12 (2*2*2 + 4).
- Absolute independence is powerful but rare.
- Dentistry is a large field with hundreds of variables, none of which are
  independent. [Q] What to do?
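A small sketch of how absolute independence lets 12 stored numbers reconstruct all 32 joint entries; the dental-joint values here are uniform placeholders, and only the Weather distribution comes from the earlier slide:

```python
import itertools

# 8 entries for P(Toothache, Catch, Cavity); placeholder uniform values.
dental = {w: 1 / 8 for w in itertools.product([True, False], repeat=3)}

# 4 entries for P(Weather), from the earlier slide.
weather = {"sunny": 0.72, "rainy": 0.1, "cloudy": 0.08, "snow": 0.1}

# Any of the 32 full-joint entries is recovered by one multiplication:
def full_joint(t, ca, cv, w):
    return dental[(t, ca, cv)] * weather[w]

print(full_joint(True, True, True, "sunny"))  # 0.125 * 0.72 = 0.09
```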
Conditional Independence

If I have a cavity, the probability that the probe catches in it doesn't
depend on whether I have a toothache:
  (1) P(catch | toothache, cavity) = P(catch | cavity)

The same independence holds if I haven't got a cavity:
  (2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity)

Catch is conditionally independent of Toothache given Cavity:
  P(Catch | Toothache, Cavity) = P(Catch | Cavity)

When Catch is conditionally independent of Toothache given Cavity,
equivalent statements are:
  P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
  P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
  (similarly, P(A, B) = P(A) P(B) when A and B are independent)
[Figure: Bayes net with Cavity as the parent of Toothache and Catch.]

Write out the full joint distribution using the chain rule:
  P(Toothache, Catch, Cavity)
  = P(Toothache | Catch, Cavity) P(Catch, Cavity)             by the product rule
  = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)  by ???
  = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)         by ???

So 2 * 2 * 2 = 8 joint entries -> 2 + 2 + 1 numbers. How ???
Each factored number comes from the joint table, e.g.:
  P(toothache | cavity)   from (.108 + .012) / P(cavity)
  P(toothache | ¬cavity)  from (.016 + .064) / P(¬cavity)
  P(catch | cavity)       from (.108 + .072) / P(cavity)
  P(catch | ¬cavity)      from (.016 + .144) / P(¬cavity)
  P(cavity) = .108 + .012 + .072 + .008

[Q] P(toothache) from the tables above = ???
  = P(t ∧ cavity) + P(t ∧ ¬cavity)
  = P(t|c) P(c) + P(t|¬c) P(¬c) = ???
[Q] P(¬t | cavity) = ??? = 1 – P(t|c) = ???
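A short Python sketch rebuilding the full joint from the 2 + 2 + 1 factored numbers; the conditional values used here (0.6, 0.1, 0.9, 0.2) are the ones worked out from the joint sums above:

```python
# Factored representation: 2 + 2 + 1 numbers instead of 8 joint entries.
p_cavity = 0.2
p_t_given = {True: 0.6, False: 0.1}      # P(toothache | Cavity)
p_catch_given = {True: 0.9, False: 0.2}  # P(catch | Cavity)

def joint(t, catch, cavity):
    """P(t, catch, cavity) = P(t|cavity) P(catch|cavity) P(cavity)."""
    pc = p_cavity if cavity else 1 - p_cavity
    pt = p_t_given[cavity] if t else 1 - p_t_given[cavity]
    pca = p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
    return pt * pca * pc

print(joint(True, True, True))  # 0.6 * 0.9 * 0.2 = 0.108, matching the table

# P(toothache) = P(t|c)P(c) + P(t|~c)P(~c)
print(sum(joint(True, ca, cv) for ca in (True, False) for cv in (True, False)))
```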
Write out the full joint distribution using the chain rule:
  P(Toothache, Catch, Cavity)
  = P(Toothache | Catch, Cavity) P(Catch, Cavity)
  = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
  = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)

- In most cases, the use of conditional independence reduces the size of
  the representation of the joint distribution from exponential in n to
  linear in n.
- Conditional independence is our most basic and robust form of knowledge
  about uncertain environments.
Bayes' Rule

Product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
=> Bayes' rule, also called Bayes' Theorem:
  P(a | b) = P(b | a) P(a) / P(b)
or in distribution form:
  P(Y|X) = P(X|Y) P(Y) / P(X) = α P(X|Y) P(Y)

Useful for assessing a diagnostic probability from causal probabilities:
  P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect)
- A stiff neck is one symptom of meningitis; what is the probability of
  having meningitis given a stiff neck?
- A car won't start. What is the probability of a bad starter? What is the
  probability of a bad battery?
Useful for assessing a diagnostic probability from causal probabilities:
  P(Cause|Effect) = P(Effect|Cause) P(Cause) / P(Effect)

[Q] P(cavity | toothache)?

[Q] E.g., let M be meningitis and S be a stiff neck; a stiff neck is one
symptom of meningitis. What is the probability of having meningitis given a
stiff neck?
  P(s) = 0.1; P(m) = 0.0001; P(s|m) = 0.8
  P(m|s) = ???
  How to find P(s|m), P(m), and P(s)???
Note: the posterior probability of meningitis is still very small!
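The computation itself is a single application of Bayes' rule; a minimal sketch with the numbers given above:

```python
# Diagnostic probability from causal probability via Bayes' rule.
p_s = 0.1          # P(s): prior probability of a stiff neck
p_m = 0.0001       # P(m): prior probability of meningitis
p_s_given_m = 0.8  # P(s|m): causal probability

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)  # 0.0008 -- still very small
```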
Bayes' rule and conditional independence

  P(Cavity | toothache ∧ catch)
  = α P(toothache ∧ catch | Cavity) P(Cavity)
  = α P(toothache | Cavity) P(catch | Cavity) P(Cavity)

This is an example of a naïve Bayes model:
  P(Cause, Effect1, …, Effectn) = P(Cause) Πi P(Effecti|Cause)

We will use this later again!
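A minimal sketch of this naïve Bayes computation, reusing the conditional tables derived from the dental joint distribution earlier:

```python
# Naive Bayes: P(Cavity | toothache, catch)
#   = alpha * P(toothache|Cavity) * P(catch|Cavity) * P(Cavity)
p_cavity = {True: 0.2, False: 0.8}
p_t_given = {True: 0.6, False: 0.1}      # P(toothache | Cavity)
p_catch_given = {True: 0.9, False: 0.2}  # P(catch | Cavity)

unnorm = {cv: p_t_given[cv] * p_catch_given[cv] * p_cavity[cv]
          for cv in (True, False)}
alpha = 1 / sum(unnorm.values())
posterior = {cv: alpha * p for cv, p in unnorm.items()}
print(posterior)  # P(cavity | toothache, catch) ~ 0.871
```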
6. Summary

- Probability is a rigorous formalism for uncertain knowledge.
- A joint probability distribution for discrete random variables specifies
  the probability of every atomic event.
- Queries can be answered by summing over atomic events.
- For nontrivial domains, we must find a way to reduce the joint size.
- Independence and conditional independence provide the tools.