Uncertainty
Fall 2013, Comp3710 Artificial Intelligence
Computing Science, Thompson Rivers University

Course Outline
Part I – Introduction to Artificial Intelligence
Part II – Classical Artificial Intelligence
Part III – Machine Learning
  - Introduction to Machine Learning
  - Neural Networks
  - Probabilistic Reasoning and Bayesian Belief Networks
  - Artificial Life: Learning through Emergent Behavior
Part IV – Advanced Topics

Learning Outcomes
- Relate a proposition to probability, i.e., to random variables.
- Give the sample space for a set of random variables.
- Use Kolmogorov's three axioms, e.g., P(A ∨ B) = P(A) + P(B) - P(A ∧ B).
- Give the joint probability distribution for a set of random variables.
- Compute probabilities given a joint probability distribution.
- Give the joint probability distribution for a set of random variables that have conditional independence relations.
- Compute a diagnostic probability from causal probabilities.

Outline
1. How to deal with uncertainty
2. Probability
3. Syntax and semantics
4. Inference
5. Independence and Bayes' rule

1. Uncertainty
An example of a decision-making problem:
Let action At = leave for the airport t minutes before the flight. Will At get me there on time? [Q] How to decide?
Problems: application environments are non-deterministic and unpredictable, with many hidden factors:
1. partial observability (road state, other drivers' plans, etc.)
2. noisy sensors (traffic reports)
3. uncertainty in action outcomes (flat tire, etc.)
4. immense complexity of modeling and predicting traffic
Hence a purely logical approach either
- risks falsehood: "A25 will get me there on time", or
- leads to conclusions that are too weak for decision making: "A25 will get me there on time if there's no accident on the bridge and it doesn't rain and my tires remain intact, etc."
(A1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport.)

Methods for Handling Uncertainty
[Q] How to handle uncertainty?
[Q] Default logic? Default assumptions:
- My car does not have a flat tire. ...
- A25 works unless contradicted by evidence.
But issues remain: What assumptions are reasonable? How to handle contradiction?
[Q] Can we use fuzzy logic?
- Probability: a model of degree of belief. E.g., given the available evidence or knowledge, A25 will get me there on time with probability 0.04.
- Fuzzy logic: a model of degree of truth. E.g., my age.

2. Probability
Probability is a way of expressing knowledge or belief that an event will occur or has occurred.
Probabilistic assertions summarize the effects of hidden factors:
- laziness: failure to enumerate exceptions, qualifications, etc.
- ignorance: lack of relevant facts, initial conditions, etc.
Subjective or Bayesian probability: probabilities relate propositions to the agent's own state of knowledge.
E.g., P(A25 | no reported accidents) = 0.06
Probabilities of propositions change with new evidence:
E.g., P(A25 | no reported accidents, 5 a.m.) = 0.15

Making Decisions under Uncertainty
Suppose I believe the following:
P(A25 gets me there on time | ...) = 0.04
P(A90 gets me there on time | ...) = 0.70
P(A120 gets me there on time | ...) = 0.95
P(A1440 gets me there on time | ...) = 0.9999
[Q] Which action to choose?
It depends on my preferences for missing the flight vs. time spent waiting, etc.
Utility theory (utility: the quality of being useful) is used to represent and infer preferences.
Decision theory = probability theory + utility theory
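The decision rule above can be sketched in a few lines: choose the action with the highest expected utility. The on-time probabilities are the ones on the slide; the utility numbers (value of catching the flight, cost per minute of waiting) are hypothetical values chosen only for illustration.

```python
# Decision theory = probability theory + utility theory: pick the action
# A_t that maximizes expected utility. Probabilities are from the slide;
# the utility values below are assumed, for illustration only.
actions = {25: 0.04, 90: 0.70, 120: 0.95, 1440: 0.9999}

U_ON_TIME = 1000.0   # assumed utility of catching the flight
U_MISS = 0.0         # assumed utility of missing it
WAIT_COST = 0.5      # assumed cost per minute spent waiting at the airport

def expected_utility(t, p_on_time):
    # EU(A_t) = P(on time) * U(on time) + P(miss) * U(miss) - waiting cost
    return p_on_time * U_ON_TIME + (1 - p_on_time) * U_MISS - WAIT_COST * t

best = max(actions, key=lambda t: expected_utility(t, actions[t]))
print(best)  # under these assumed utilities, A_120 wins
```

Changing the assumed utilities changes the best action, which is exactly the point: the probabilities alone do not determine the decision.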
3. Syntax and Semantics
- Sample space, event, probability, probability distribution
- Random variable and proposition
- Prior probability (aka unconditional probability)
- Posterior probability (aka conditional probability)

Sample Space and Probability
Definition. Begin with a set Ω, the sample space, whose elements are events of the same type.
E.g., the 6 possible rolls of a die -> Ω = {???}
An element ω in Ω is a sample point / possible world / atomic event.
Definition. A probability space (or probability distribution, or probability model) is a sample space with an assignment P(ω) for every ω in Ω s.t.
1. 0 ≤ P(ω) ≤ 1
2. Σω P(ω) = 1
E.g., P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6.
Definition. An event A can be considered as a subset of Ω, with
P(A) = Σω in A P(ω)
E.g., P(die roll < 4) = P(1) + P(2) + P(3) = 1/6 + 1/6 + 1/6 = 1/2

Axioms of Probability
Kolmogorov's axioms: for any propositions A, B
1. 0 ≤ P(A) ≤ 1
2. P(true) = 1 and P(false) = 0
3. P(A ∨ B) = P(A) + P(B) - P(A ∧ B)

Random Variable and Proposition
Definition. The basic element is a random variable, representing a sample space: a function from sample points to some range, e.g., the reals or Booleans.
- Boolean random variables. E.g., Cavity (do I have a cavity?); [Q] What is the sample space?
- Discrete random variables (finite or infinite). E.g., Odd(1) = true. E.g., Weather is one of <sunny, rainy, cloudy, snow>; [Q] What is the sample space?
- Continuous random variables (bounded or unbounded). E.g., Temp < 0.22; [Q] What is the sample space?

Proposition (how to relate to probability?)
Think of a proposition as the event where the proposition is true.
[Q] How to relate a proposition to probability, i.e., to random variables?
As in propositional logic, possible worlds are defined by the assignment of values to random variables.
E.g., I have a toothache; the weather is sunny and there is no cavity.
A sample point is an assignment of values to a set of random variables.
A sample space is the Cartesian product of the ranges of the random variables.
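The die example above can be written out directly as a small probability model. This is a minimal sketch: the dictionary is the assignment P(ω), the two asserts are the two conditions on it, and an event is just a subset of Ω.

```python
from fractions import Fraction

# A probability model for a fair die, following the slide's definition:
# a sample space Omega with P(omega) for each omega in Omega.
P = {omega: Fraction(1, 6) for omega in range(1, 7)}

assert all(0 <= p <= 1 for p in P.values())  # 0 <= P(omega) <= 1
assert sum(P.values()) == 1                  # sum over Omega is 1

def prob(event):
    # An event A is a subset of Omega; P(A) = sum of P(omega) for omega in A.
    return sum(P[omega] for omega in event)

print(prob({omega for omega in P if omega < 4}))  # P(die roll < 4) = 1/2
```

Using `Fraction` keeps the arithmetic exact, so the axioms can be checked with equality rather than floating-point tolerances.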
[Q] What is the sample space for Cavity and Weather?
An elementary proposition is constructed by the assignment of a value to a random variable:
E.g., Weather = sunny; Cavity = false (abbreviated as ¬cavity)
Complex propositions are formed from elementary propositions and the standard logical connectives.
E.g., Weather = sunny ∨ Cavity = false
A proposition is a conjunction/disjunction of atomic events.

Prior and Posterior Probability
- Prior (aka unconditional) probability
- Posterior (aka conditional) probability

Prior Probability
Prior or unconditional probabilities of propositions, e.g.,
P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72,
correspond to belief prior to the arrival of any (new) evidence.
A probability distribution gives values for all possible assignments:
P(Weather) = <0.72, 0.1, 0.08, 0.1> (normalized, i.e., sums to 1)
A joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables.
P(Weather, Cavity) = a 4 × 2 matrix of values:

                 Weather = sunny   rainy   cloudy   snow
Cavity = true    0.144             0.02    0.016    0.02
Cavity = false   0.576             0.08    0.064    0.08

[Q] What is the sum of all the values above?
Every question about a domain can be answered by the joint probability distribution.
[Q] What is the probability of cloudy?
[Q] What is the probability of snow and no cavity?
[Q] What is the probability of sunny or cavity?

Prior Probability – Continuous Variables
In general, a distribution over a continuous variable is expressed as a parameterized function of the value, called a probability density function (pdf).
E.g., the uniform density on [18, 26]:
P(X = x) = U[18,26](x) = 0.125
P is a density; it integrates to 1:
∫ from 18 to 26 of 0.125 dx = (26 - 18) × 0.125 = 1
It gives the probability of a small interval, e.g.,
P(20.5 ≤ X ≤ 20.5 + dx) = 0.125 dx
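The three [Q]s under the table can be answered mechanically by summing atomic events. A minimal sketch, with the joint table entered as a dictionary:

```python
# Answering the slide's questions from the joint distribution
# P(Weather, Cavity) in the 4 x 2 table. Keys are (weather, cavity).
joint = {
    ('sunny', True): 0.144, ('rainy', True): 0.02,
    ('cloudy', True): 0.016, ('snow', True): 0.02,
    ('sunny', False): 0.576, ('rainy', False): 0.08,
    ('cloudy', False): 0.064, ('snow', False): 0.08,
}

assert abs(sum(joint.values()) - 1.0) < 1e-9  # the eight entries sum to 1

def prob(pred):
    # P(phi) = sum of the atomic events where phi is true
    return sum(p for (w, c), p in joint.items() if pred(w, c))

print(prob(lambda w, c: w == 'cloudy'))           # P(cloudy) = 0.016 + 0.064 = 0.08
print(prob(lambda w, c: w == 'snow' and not c))   # P(snow and no cavity) = 0.08
print(prob(lambda w, c: w == 'sunny' or c))       # P(sunny or cavity) = 0.776
```

Note that the disjunction is computed by summing atomic events, not by P(sunny) + P(cavity), which would double-count the sunny-and-cavity worlds, exactly as the third Kolmogorov axiom warns.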
Example: the Normal (or Gaussian) Distribution
P(x) = (1 / √(2πσ²)) e^(-(x - μ)² / (2σ²)), where μ is the mean and σ² is the variance.
[Figure: Gaussian density curves (probability vs. events), from Wikipedia.org]

Conditional Probability
Posterior or conditional probabilities, e.g.,
P(cavity | toothache) = 0.6
i.e., given that toothache is all I know (it is not "if toothache, then ..."), there is a 60% chance of cavity.
Another example: P(toothache | cavity) = 0.8
New evidence may be irrelevant, allowing simplification, e.g.,
P(cavity | toothache, sunny) = P(cavity | toothache) = 0.6
This kind of inference, sanctioned by domain knowledge, is crucial.
Other very interesting examples:
- A stiff neck is one symptom of meningitis; what is the probability of having meningitis given a stiff neck?
- A car won't start. What is the probability of having a bad starter? What is the probability of having a bad battery?

Definition of conditional probability:
P(a | b) = P(a ∧ b) / P(b), if P(b) > 0
The product rule gives an alternative formulation:
P(a ∧ b) = P(a) P(b | a) = P(b) P(a | b)
The chain rule is derived by successive application of the product rule:
P(a ∧ b ∧ c) = P(a ∧ b) P(c | a ∧ b) = P(a) P(b | a) P(c | a ∧ b)
P(X1, ..., Xn) = ???
  = P(X1, ..., Xn-1) P(Xn | X1, ..., Xn-1)
  = P(X1, ..., Xn-2) P(Xn-1 | X1, ..., Xn-2) P(Xn | X1, ..., Xn-1)
  = ...
  = P(X1) P(X2 | X1) P(X3 | X1, X2) P(X4 | X1, X2, X3) ...
  = Π from i=1 to n of P(Xi | X1, ..., Xi-1)
This will be used in Inference later!

4. Inference by Enumeration
Start with the joint probability distribution over Toothache (I have a toothache), Cavity (meaning?), and Catch (meaning?).
[Q] What are the random variables here?
[Q] What is the sum of the probabilities of all the atomic events?
[Q] How can you obtain the above probability distribution?
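The product rule can be checked numerically against any joint table. A small sketch, reusing the P(Weather, Cavity) table from the earlier slide:

```python
# Checking the product rule P(a ∧ b) = P(b) P(a | b) on the
# P(Weather, Cavity) table from the earlier slide.
joint = {
    ('sunny', True): 0.144, ('rainy', True): 0.02,
    ('cloudy', True): 0.016, ('snow', True): 0.02,
    ('sunny', False): 0.576, ('rainy', False): 0.08,
    ('cloudy', False): 0.064, ('snow', False): 0.08,
}

p_sunny = sum(p for (w, c), p in joint.items() if w == 'sunny')  # P(sunny)
p_cavity_and_sunny = joint[('sunny', True)]                      # P(cavity ∧ sunny)

# Definition of conditional probability: P(a | b) = P(a ∧ b) / P(b)
p_cavity_given_sunny = p_cavity_and_sunny / p_sunny              # 0.144 / 0.72 = 0.2

# Product rule: multiplying back recovers the joint entry
assert abs(p_sunny * p_cavity_given_sunny - p_cavity_and_sunny) < 1e-12
```

The chain rule is just this rewrite applied n - 1 times, peeling one variable off the joint at each step.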
For any proposition φ, the probability is the sum of the probabilities of the atomic events where it is true:
P(φ) = Σω:ω╞φ P(ω)
E.g., P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
[Q] P(cavity ∨ toothache) = ???
[Q] P(¬cavity ∧ toothache) = ???

Conditional probabilities can also be computed from the joint distribution. Very important!
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
  = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
[Q] P(cavity | toothache) = ???
[Q] P(cavity | catch) = ???

The denominator can be viewed as a normalization constant α:
P(Cavity | toothache) = P(Cavity, toothache) / P(toothache)
  = α P(Cavity, toothache)
  = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
  = α [<0.108, 0.016> + <0.012, 0.064>]
  = α <0.12, 0.08>
  = <0.6, 0.4>
[Q] What is α?
General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables.

Obvious problems:
1. Worst-case time complexity O(d^n), where d is the largest arity and n the number of variables
2. Space complexity O(d^n) to store the joint distribution
3. How to find the numbers for the O(d^n) entries?
In other words, too much. [Q] How to reduce?

5. Independence & Bayes' Rule
1. Independence
2. Conditional independence
3. Bayes' rule

Independence
A and B are independent iff
P(A | B) = P(A), or P(B | A) = P(B), or P(A, B) = P(A) P(B)
E.g., P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) × P(Weather)
In this example, 32 (2 × 2 × 2 × 4) entries are reduced to 12 (2 × 2 × 2 + 4).
Absolute independence is powerful but rare.
Dentistry is a large field with hundreds of variables, none of which are independent. [Q] What to do?
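Inference by enumeration can be sketched in a few lines. The eight-entry joint table below is reconstructed from the probabilities used on these slides (the slide itself shows it as a table figure); keys are (toothache, catch, cavity) truth assignments.

```python
# Inference by enumeration over the full joint distribution
# P(Toothache, Catch, Cavity), reconstructed from the slide values.
joint = {
    (True,  True,  True): 0.108, (True,  True,  False): 0.016,
    (True,  False, True): 0.012, (True,  False, False): 0.064,
    (False, True,  True): 0.072, (False, True,  False): 0.144,
    (False, False, True): 0.008, (False, False, False): 0.576,
}

def prob(pred):
    # P(phi) = sum over the atomic events where phi holds
    return sum(p for omega, p in joint.items() if pred(*omega))

p_toothache = prob(lambda t, ca, cv: t)                  # 0.2, as on the slide
p_cavity_or_toothache = prob(lambda t, ca, cv: cv or t)  # one of the slide's [Q]s
p_not_cavity_given_toothache = prob(lambda t, ca, cv: t and not cv) / p_toothache

alpha = 1 / p_toothache  # the normalization constant for P(Cavity | toothache)
print(p_toothache, p_cavity_or_toothache, p_not_cavity_given_toothache, alpha)
```

The `prob` helper is the whole algorithm: every query, marginal or conditional, reduces to summing selected entries of the table, which is exactly why the O(d^n) cost is unavoidable without further structure.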
Conditional Independence
If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
(1) P(catch | toothache, cavity) = P(catch | cavity)
The same independence holds if I haven't got a cavity:
(2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
Catch is conditionally independent of Toothache given Cavity:
P(Catch | Toothache, Cavity) = P(Catch | Cavity)
When Catch is conditionally independent of Toothache given Cavity, equivalent statements are:
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
(similarly P(A, B) = P(A) P(B) when A and B are independent)

Write out the full joint distribution using the chain rule:
P(Toothache, Catch, Cavity)
  = P(Toothache | Catch, Cavity) P(Catch, Cavity)            by the product rule
  = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity) by ???
  = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)        by ???
So 2 × 2 × 2 = 8 entries are reduced to 2 + 2 + 1 independent numbers. How?
[Q] P(toothache) from the tables below?
P(toothache) = P(t ∧ cavity) + P(t ∧ ¬cavity) = P(t | c) P(c) + P(t | ¬c) P(¬c) = ???
[Q] P(¬t | cavity) = ??? = 1 - P(t | c) = ???
[Figure: conditional probability tables for P(Cavity), P(Toothache | Cavity), and P(Catch | Cavity), with entries computed from the joint, e.g. P(cavity) = 0.108 + 0.012 + 0.072 + 0.008.]

In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n.
Conditional independence is our most basic and robust form of knowledge about uncertain environments.
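The 8 -> 2 + 2 + 1 reduction can be verified numerically: compute P(Cavity), P(Toothache | Cavity), and P(Catch | Cavity) from the reconstructed joint table used above, then check that their product rebuilds every one of the eight entries.

```python
# Verifying the factorization
#   P(Toothache, Catch, Cavity) = P(T | Cv) P(Ca | Cv) P(Cv)
# against the joint table reconstructed from the slide values.
joint = {
    (True,  True,  True): 0.108, (True,  True,  False): 0.016,
    (True,  False, True): 0.012, (True,  False, False): 0.064,
    (False, True,  True): 0.072, (False, True,  False): 0.144,
    (False, False, True): 0.008, (False, False, False): 0.576,
}

def prob(pred):
    return sum(p for omega, p in joint.items() if pred(*omega))

for cv in (True, False):
    p_cv = prob(lambda t, ca, c: c == cv)                  # P(Cavity = cv)
    p_t = prob(lambda t, ca, c: c == cv and t) / p_cv      # P(toothache | cv)
    p_ca = prob(lambda t, ca, c: c == cv and ca) / p_cv    # P(catch | cv)
    for t in (True, False):
        for ca in (True, False):
            pt = p_t if t else 1 - p_t
            pca = p_ca if ca else 1 - p_ca
            # the 2 + 2 + 1 numbers reproduce the stored joint entry
            assert abs(pt * pca * p_cv - joint[(t, ca, cv)]) < 1e-9
print("factorization reproduces all 8 joint entries")
```

For these numbers, P(cavity) = 0.2, P(toothache | cavity) = 0.6, and P(catch | cavity) = 0.9, so for instance 0.6 × 0.9 × 0.2 = 0.108, the first table entry.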
Bayes' Rule
The product rule P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a) gives Bayes' rule (also called Bayes' theorem):
P(a | b) = P(b | a) P(a) / P(b)
or in distribution form:
P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)
This is useful for assessing a diagnostic probability from causal probabilities:
P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
- A stiff neck is one symptom of meningitis; what is the probability of having meningitis given a stiff neck?
- A car won't start. What is the probability of having a bad starter? What is the probability of having a bad battery?
[Q] P(cavity | toothache)?
[Q] E.g., let M be meningitis and S be stiff neck, with
P(s) = 0.1, P(m) = 0.0001, P(s | m) = 0.8
P(m | s) = ???
How to find P(s | m), P(m), and P(s)?
Note: the posterior probability of meningitis is still very small!

Bayes' Rule and Conditional Independence
P(Cavity | toothache ∧ catch)
  = α P(toothache ∧ catch | Cavity) P(Cavity)
  = α P(toothache | Cavity) P(catch | Cavity) P(Cavity)
This is an example of a naive Bayes model:
P(Cause, Effect1, ..., Effectn) = P(Cause) Πi P(Effecti | Cause)
We will use this later again!

6. Summary
- Probability is a rigorous formalism for uncertain knowledge.
- The joint probability distribution for a set of discrete random variables specifies the probability of every atomic event.
- Queries can be answered by summing over atomic events.
- For nontrivial domains, we must find a way to reduce the size of the joint distribution.
- Independence and conditional independence provide the tools.
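The meningitis [Q] above is a one-line application of Bayes' rule, using the three numbers given on the slide:

```python
# Bayes' rule for the meningitis example:
#   P(m | s) = P(s | m) P(m) / P(s)
p_s = 0.1          # P(stiff neck)
p_m = 0.0001       # P(meningitis)
p_s_given_m = 0.8  # causal probability: P(stiff neck | meningitis)

p_m_given_s = p_s_given_m * p_m / p_s  # diagnostic probability
print(p_m_given_s)  # 0.8 * 0.0001 / 0.1 = 0.0008: still very small
```

The causal direction P(s | m) is what medical knowledge supplies and is stable; the diagnostic direction P(m | s) shifts whenever the prevalence P(m) or the base rate of the symptom P(s) changes, which is why Bayes' rule, rather than a memorized diagnostic number, is the right tool.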