Uncertainty
Chapter 13
Copyright, 1996 © Dale Carnegie & Associates, Inc.

Uncertainty
Evolution of an intelligent agent: problem solving, planning, uncertainty
Dealing with uncertainty is an unavoidable problem in reality.
An agent must act under uncertainty.
To make decisions under uncertainty, we need
  Probability theory
  Utility theory
  Decision theory
CSE 471/598, CBS 598 by H. Liu

Sources of uncertainty
No access to the whole truth
No categorical answer
Incompleteness
  The qualification problem - it is impossible to explicitly enumerate all conditions
Incorrectness of information about conditions
The rational decision depends on both the relative importance of the various goals and the likelihood that they will be achieved.

Handling uncertain knowledge
Difficulties in using first-order logic (FOL) to cope with uncertain knowledge (UK)
  A dental diagnosis system using FOL
    Symptom(p, Toothache) => Disease(p, Cavity)
    Disease(p, Cavity) => Symptom(p, Toothache)
  Are they correct?
Reasons for handling uncertain knowledge
  Laziness - too much work! How to avoid it?
  Theoretical ignorance - we don’t know everything
  Practical ignorance - we don’t want to include everything
Represent UK with a degree of belief
The tool for handling UK is probability theory

Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance - how wonderful it is!
Probability: degree of belief in the truth of a sentence
  1 - true, 0 - false, 0 < P < 1 - intermediate degrees of belief in the truth of the sentence
Degree of truth (fuzzy logic) vs. degree of belief
Alternatives to probability theory?
  Yes, to be discussed in later chapters.

All probability statements must indicate the evidence with respect to which the probability is being assessed.
  Prior or unconditional probability: before evidence is obtained
  Posterior or conditional probability: after new evidence is obtained

Uncertainty & rational decisions
Without uncertainty, decision making is simple: a plan either achieves the goal or it does not
With uncertainty, the outcome of a plan itself becomes uncertain - three plans: A90, A120, and A1440
We first need to have preferences between the different possible outcomes of the plans
Utility theory is used to represent and reason with preferences.

Rationality
Decision theory = probability theory + utility theory
The Maximum Expected Utility principle defines rationality
  An agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all possible outcomes of the action
A decision-theoretic agent (Fig 13.1, p 466)
  Is it any different from the other agents we have learned about?

Basic probability notation
Prior probability
  Proposition - P(Sunny)
  Random variable - P(Weather=Sunny)
  Boolean, discrete, and continuous random variables
  Each random variable (RV) has a domain, e.g. (sunny, rain, cloudy, snow)
Probability distribution: P(Weather) = <0.7, 0.2, 0.08, 0.02>
Joint probability P(A^B)
  probabilities of all combinations of the values of a set of random variables
  more later

Conditional probability
Conditional probability
  P(A|B) = P(A^B)/P(B)
  Product rule - P(A^B) = P(A|B)P(B)
Probabilistic inference does not work like logical inference
  “P(A|B)=0.6” != “when B is true, P(A) is 0.6”
  P(A) is always a prior.
  P(A|B) applies when B is the only available evidence.
  When C is also available, P(A|B,C) may have little relation to P(A|B).

The axioms of probability
All probabilities are between 0 and 1
Necessarily true (valid) propositions have probability 1; necessarily false (unsatisfiable) propositions have probability 0
The probability of a disjunction: P(AvB) = P(A) + P(B) - P(A^B)
  A Venn diagram illustration
Ex: Deriving the rule of negation from P(a v !a)
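
The negation exercise on the axioms slide can be worked out in a few lines. The following is a sketch of one standard derivation, using only the three axioms listed there; it is not necessarily the derivation presented in the lecture.

```latex
\begin{align*}
P(a \lor \lnot a) &= P(a) + P(\lnot a) - P(a \land \lnot a)
  && \text{(disjunction axiom)} \\
1 &= P(a) + P(\lnot a) - 0
  && \text{($a \lor \lnot a$ is valid, $a \land \lnot a$ is unsatisfiable)} \\
P(\lnot a) &= 1 - P(a)
\end{align*}
```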

The joint probability distribution
The joint completely specifies an agent’s probability assignments to all propositions in the domain
A probabilistic model consists of a set of random variables (X1, …, Xn).
An atomic event is an assignment of particular values to all the variables
  Given Boolean random variables A and B, what are the atomic events?

Joint probabilities
An example of two Boolean variables

               Cavity    !Cavity
  Toothache     0.04      0.01
  !Toothache    0.06      0.89

• Observations: the atomic events are mutually exclusive and collectively exhaustive
• What are P(Cavity), P(Cavity v Toothache), P(Cavity|Toothache)?

Joint (2)
If there is a joint distribution, we can read off any probability we need.
  Is it true? How? Discussed next
It is impractical to specify all the entries of the joint over n Boolean variables.
  Sidestep the joint and work directly with conditional probability

Inference using full joint distributions
Marginal probability (Fig 13.3)
  P(cavity) = ?
Marginalization - summing out all the variables other than cavity
  P(Y) = Σ_z P(Y, z)
Conditioning - a variant of marginalization using the product rule
  P(Y) = Σ_z P(Y|z)P(z)

Normalization
Method 1: use P(t) to normalize
  P(cavity|toothache) = P(c^t)/P(t)
  P(!cavity|toothache) = P(!c^t)/P(t)
Method 2: use a constant α and leave P(t) out, since P(t) is the same in both expressions
  P(Cavity|toothache) = αP(Cavity, toothache)
    = α[P(Cavity, toothache, catch) + P(Cavity, toothache, !catch)]
    = α[<0.108, 0.016> + <0.012, 0.064>] = α<0.12, 0.08>
  What is α?

Independence
P(toothache, catch, cavity, weather)
  A total of 32 entries, given that W has 4 values
  How is one’s tooth problem related to the weather?
  P(T,Ch,Cy,W=cloudy) = P(W=Clo|T...)P(T…)?
  Whose tooth problem can influence our weather?
  P(W=Clo|T…) = P(W=Clo)
  Hence, P(T,Ch,Cy,W=clo) = P(W=Clo)P(T…)
How many joint distribution tables?
  Two - of sizes 4 and 8
Independence between X and Y means
  P(X|Y) = P(X) or P(Y|X) = P(Y) or P(X,Y) = P(X)P(Y)

Bayes’ rule
Deriving the rule via the product rule
  P(B|A) = P(A|B)P(B)/P(A)
  A more general case is P(X|Y) = P(Y|X)P(X)/P(Y)
  Bayes’ rule conditionalized on evidence E
    P(X|Y,E) = P(Y|X,E)P(X|E)/P(Y|E)
Applying the rule to medical diagnosis
  meningitis (P(M) = 1/50,000), stiff neck (P(S) = 1/20), P(S|M) = 0.5; what is P(M|S)?
  Why is this kind of inference useful?

Applying Bayes’ rule
Relative likelihood
  Comparing the relative likelihood of meningitis and whiplash, given a stiff neck: which is more likely?
  P(M|S)/P(W|S) = [P(S|M)P(M)] / [P(S|W)P(W)]
Avoiding direct assessment of the prior
  P(M|S) = ? P(!M|S) = ? And P(M|S) + P(!M|S) = 1
  P(S) = ? P(S|!M) = ?
  P(S) = Σ_m P(S, m) = Σ_m P(S|m)P(m) = P(S|M)P(M) + P(S|!M)P(!M)

Using Bayes’ rule
Combining evidence
  from P(Cavity|Toothache) and P(Cavity|Catch) to P(Cavity|Toothache,Catch)
Bayesian updating
  from P(Cavity|T) = P(Cavity)P(T|Cavity)/P(T)
    which has the form P(A|B) = P(B|A)P(A)/P(B)
  to P(Cavity|T,Catch) = P(Cavity|T)P(Catch|T,Cavity)/P(Catch|T)
    which has the form P(A|B,C) = P(B|A,C)P(A|C)/P(B|C)

Recall that for independent events A, B
  P(B|A) = P(B), P(A|B) = P(A), P(A,B) = P(A)P(B)
Conditional independence (X and Y are independent given Z)
  P(X|Y,Z) = P(X|Z) and P(Y|X,Z) = P(Y|Z)
  P(X,Y|Z) = P(X|Z)P(Y|Z), the same form as absolute independence, conditioned on Z
Given Cavity, Toothache and Catch are independent
  P(T,Ch,Cy) = P(T,Ch|Cy)P(Cy) = P(T|Cy)P(Ch|Cy)P(Cy)
One large table is decomposed into 3 smaller tables:
  2^3 - 1 = 7 entries vs. 5 (= 2*(2^1 - 1) + 2*(2^1 - 1) + (2^1 - 1))
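
The questions posed on the preceding slides (reading probabilities off the two-variable joint table, finding α, and computing P(M|S)) can be checked with a few lines of Python. This is a minimal sketch, not the lecture's own code: the helper names `prob`, `conditional`, `normalize`, and `bayes` are illustrative assumptions, and the numbers are the ones given on the slides.

```python
# Joint distribution from the "Joint probabilities" slide.
# Keys are (toothache, cavity) truth values; the four entries sum to 1.
JOINT = {
    (True, True): 0.04,
    (True, False): 0.01,
    (False, True): 0.06,
    (False, False): 0.89,
}


def prob(event):
    """Marginalize: sum the atomic events for which event(t, c) holds."""
    return sum(p for (t, c), p in JOINT.items() if event(t, c))


def conditional(event, given):
    """P(event | given) = P(event ^ given) / P(given)."""
    return prob(lambda t, c: event(t, c) and given(t, c)) / prob(given)


def normalize(values):
    """Scale non-negative values by alpha = 1 / sum so they sum to 1."""
    alpha = 1.0 / sum(values)
    return [alpha * v for v in values]


def bayes(p_e_given_h, p_h, p_e):
    """Bayes' rule: P(h | e) = P(e | h) P(h) / P(e)."""
    return p_e_given_h * p_h / p_e


p_cavity = prob(lambda t, c: c)
p_cavity_or_toothache = prob(lambda t, c: c or t)
p_cavity_given_toothache = conditional(lambda t, c: c, lambda t, c: t)

# Normalization slide: alpha * <0.12, 0.08>.
cavity_given_toothache_dist = normalize([0.12, 0.08])

# Meningitis slide: P(S|M) = 0.5, P(M) = 1/50,000, P(S) = 1/20.
p_m_given_s = bayes(0.5, 1 / 50000, 1 / 20)
```

Up to floating-point rounding, this gives P(Cavity) = 0.10, P(Cavity v Toothache) = 0.11, and P(Cavity|Toothache) = 0.8 for the two-variable table. On the normalization slide, α = 1/0.2 = 5, so the distribution is <0.6, 0.4>. For the diagnosis example, P(M|S) = 0.0002: even with a stiff neck, meningitis remains very unlikely because its prior is so small.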

Independence, decomposition, Naïve Bayes
If all n symptoms are conditionally independent given Cavity, the size of the representation grows as O(n) instead of O(2^n)
The decomposition of large probabilistic domains into weakly connected subsets via conditional independence is one important development in modern AI
Naïve Bayes model (cause and effects)
  P(Cause, E1, …, En) = P(Cause) Π_i P(Ei|Cause)
  An amazingly successful classifier can be built

Where do probabilities come from?
There are three positions:
  The frequentist - numbers can come only from experiments
  The objectivist - probabilities are real aspects of the universe
  The subjectivist - probabilities characterize an agent’s belief
The reference class problem - what constitutes the reference class? - intrusion of subjectivity
  A frequentist doctor wants to consider similar patients
    How similar are two patients?
Laplace’s principle of indifference
  Propositions that are syntactically “symmetric” with respect to the evidence should be accorded equal probability

Summary
Uncertainty exists in the real world.
It is good (as it allows for laziness) and bad (since we need new tools)
Priors, posteriors, and the joint
Bayes’ rule - the basis of Bayesian inference
Conditional independence allows Bayesian updating to work effectively with many pieces of evidence.
But ...
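
The Naïve Bayes factorization P(Cause, E1, …, En) = P(Cause) Π_i P(Ei|Cause) can be turned into a small classifier. The sketch below is illustrative only: the function name `naive_bayes_posterior` is hypothetical, and the conditional probabilities are assumed values, chosen so that the unnormalized products reproduce the 0.108 and 0.016 entries on the normalization slide.

```python
from math import prod

# ASSUMED illustrative parameters (not stated on the slides).
# PRIOR[c] = P(Cavity = c); LIKELIHOOD[e][c] = P(e = true | Cavity = c).
PRIOR = {True: 0.2, False: 0.8}
LIKELIHOOD = {
    "toothache": {True: 0.6, False: 0.1},
    "catch": {True: 0.9, False: 0.2},
}


def naive_bayes_posterior(prior, likelihood, evidence):
    """Return P(Cause | evidence), assuming the effects are
    conditionally independent given the cause.

    evidence maps each effect name to its observed truth value.
    """
    unnormalized = {}
    for cause, p_cause in prior.items():
        factors = []
        for name, observed in evidence.items():
            p_true = likelihood[name][cause]
            factors.append(p_true if observed else 1.0 - p_true)
        # P(Cause) * product over i of P(E_i | Cause)
        unnormalized[cause] = p_cause * prod(factors)
    alpha = 1.0 / sum(unnormalized.values())  # normalization constant
    return {cause: alpha * p for cause, p in unnormalized.items()}


posterior = naive_bayes_posterior(
    PRIOR, LIKELIHOOD, {"toothache": True, "catch": True}
)
```

With these assumed numbers the model stores 1 + 2 + 2 = 5 independent parameters, matching the count on the conditional independence slide, and the posterior probability of a cavity given both symptoms is 0.108/0.124, roughly 0.87. Adding another symptom adds only two more numbers, which is the O(n) growth the slide refers to.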