Bayes Classification
H.Malekinezhad — 5/3/2017

Uncertainty
• Our main tool is probability theory, which assigns to each sentence a numerical degree of belief between 0 and 1.
• It provides a way of summarizing uncertainty.

Variables
• Boolean random variables: cavity might be true or false.
• Discrete random variables: weather might be sunny, rainy, cloudy, or snow.
  • P(Weather=sunny), P(Weather=rainy), P(Weather=cloudy), P(Weather=snow)
• Continuous random variables: the temperature takes continuous values.

Where do probabilities come from?
• Frequentist: from experiments — from any finite sample we can estimate the true fraction, and also calculate how accurate our estimate is likely to be.
• Subjectivist: the agent's degree of belief.
• Objectivist: part of the true nature of the universe — e.g., that a coin lands heads with probability 0.5 is a property of the coin itself.

Prior and posterior probability
• Before the evidence is obtained: prior probability.
  • P(a) is the prior probability that proposition a is true.
  • P(cavity) = 0.1
• After the evidence is obtained: posterior probability.
  • P(a|b) is the probability of a given that all we know is b.
  • P(cavity|toothache) = 0.8

Axioms of Probability (Kolmogorov's axioms, first published in German in 1933)
• All probabilities are between 0 and 1: for any proposition a, 0 ≤ P(a) ≤ 1.
• P(true) = 1, P(false) = 0.
• The probability of a disjunction is given by
  P(a ∨ b) = P(a) + P(b) − P(a ∧ b)

Product rule
  P(a ∧ b) = P(a|b) P(b)
  P(a ∧ b) = P(b|a) P(a)

Theorem of total probability
• If events A1, ..., An are mutually exclusive with Σi P(Ai) = 1, then
  P(B) = Σi P(B|Ai) P(Ai)

Bayes' rule
• (Reverend Thomas Bayes, 1702–1761)
• He set down his findings on probability in "Essay Towards Solving a Problem in the Doctrine of Chances" (1763), published posthumously in the Philosophical Transactions of the Royal Society of London.
  P(b|a) = P(a|b) P(b) / P(a)

Diagnosis
• What is the probability of meningitis in a patient with a stiff neck?
• A doctor knows that meningitis causes a stiff neck 50% of the time: P(s|m) = 0.5.
• Prior probabilities:
  • The patient has meningitis: P(m) = 1/50,000 = 0.00002
  • The patient has a stiff neck: P(s) = 1/20 = 0.05
  P(m|s) = P(s|m) P(m) / P(s) = (0.5 × 0.00002) / 0.05 = 0.0002

Bayes Theorem
• P(h) = prior probability of hypothesis h
• P(D) = prior probability of training data D
• P(h|D) = probability of h given D
• P(D|h) = probability of D given h
  P(h|D) = P(D|h) P(h) / P(D)

Choosing Hypotheses
• Generally we want the most probable hypothesis given the training data — the maximum a posteriori (MAP) hypothesis:
  hMAP = argmax_{h ∈ H} P(h|D) = argmax_{h ∈ H} P(D|h) P(h) / P(D) = argmax_{h ∈ H} P(D|h) P(h)
• If we assume P(hi) = P(hj) for all hi and hj, we can simplify further and choose the maximum likelihood (ML) hypothesis:
  hML = argmax_{h ∈ H} P(D|h)

Example
• Does the patient have cancer or not? A patient takes a lab test and the result comes back positive. The test returns a correct positive result (+) in only 98% of the cases in which the disease is actually present, and a correct negative result (−) in only 97% of the cases in which the disease is not present. Furthermore, 0.008 of the entire population has this cancer.
  P(cancer) = 0.008, P(¬cancer) = 0.992
  P(+|cancer) = 0.98, P(−|cancer) = 0.02
  P(+|¬cancer) = 0.03, P(−|¬cancer) = 0.97
Suppose a positive result (+) is returned...
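As a quick check of the arithmetic, this short Python sketch applies Bayes' rule to the cancer-test example (all values are taken from the example; note the slides round the intermediate products to 0.0078 and 0.0298, which shifts the final posterior slightly):

```python
# Cancer-test example: compare P(+|h) * P(h) for both hypotheses.
p_cancer = 0.008            # prior P(cancer)
p_pos_given_cancer = 0.98   # sensitivity P(+|cancer)
p_pos_given_healthy = 0.03  # false-positive rate P(+|not cancer)

# Unnormalized posteriors after observing a positive result
score_cancer = p_pos_given_cancer * p_cancer          # 0.98 * 0.008 = 0.00784
score_healthy = p_pos_given_healthy * (1 - p_cancer)  # 0.03 * 0.992 = 0.02976

# Normalize so the two posteriors sum to 1
total = score_cancer + score_healthy
p_cancer_given_pos = score_cancer / total    # ~ 0.21
p_healthy_given_pos = score_healthy / total  # ~ 0.79

print(f"P(cancer|+) = {p_cancer_given_pos:.4f}")
print(f"P(not cancer|+) = {p_healthy_given_pos:.4f}")
```

Even after a positive test, the MAP hypothesis is still "no cancer", because the tiny prior P(cancer) = 0.008 dominates the likelihood ratio.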
Normalization
  P(cancer|+) = 0.0078 / (0.0078 + 0.0298) = 0.20745
  P(¬cancer|+) = 0.0298 / (0.0078 + 0.0298) = 0.79255
• The result of Bayesian inference depends strongly on the prior probabilities, which must be available in order to apply the method.

Brute-Force Bayes Concept Learning
• For each hypothesis h in H, calculate the posterior probability
  P(h|D) = P(D|h) P(h) / P(D)
• Output the hypothesis hMAP with the highest posterior probability.

Bayes Optimal Classifier
• A weighted majority classifier.
• What is the most probable classification of the new instance given the training data?
• The most probable classification of the new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities.
• If the classification of the new example can take any value vj from some set V, then the probability P(vj|D) that the correct classification for the new instance is vj is just:
  P(vj|D) = Σ_{hi ∈ H} P(vj|hi) P(hi|D)
• Bayes optimal classification:
  argmax_{vj ∈ V} Σ_{hi ∈ H} P(vj|hi) P(hi|D)

Gibbs Algorithm
• The Bayes optimal classifier provides the best result, but can be expensive if there are many hypotheses.
• Gibbs algorithm:
  • Choose one hypothesis at random, according to P(h|D).
  • Use it to classify the new instance.
• Surprising fact: assuming a correct, uniform prior distribution over H, picking any hypothesis at random in this way has expected error no worse than twice that of the Bayes optimal classifier.

Naive Bayes model
• A single cause directly influences a number of effects, all of which are conditionally independent given the cause:
  P(cause, effect1, effect2, ..., effectn) = P(cause) Π_{i=1}^{n} P(effecti | cause)

Naive Bayes Classifier
• Along with decision trees, neural networks, and nearest neighbor, one of the most practical learning methods.
• When to use:
  • A moderate or large training set is available.
  • The attributes that describe instances are conditionally independent given the classification.
• Successful applications:
  • Diagnosis
  • Classifying text documents

• Assume a target function f: X → V, where each instance x is described by attributes a1, a2, ..., an. The most probable value of f(x) is:
  vMAP = argmax_{vj ∈ V} P(vj | a1, a2, ..., an) = argmax_{vj ∈ V} P(a1, a2, ..., an | vj) P(vj)
• Naive Bayes assumption:
  P(a1, a2, ..., an | vj) = Πi P(ai|vj)
• which gives the naive Bayes classifier:
  vNB = argmax_{vj ∈ V} P(vj) Πi P(ai|vj)

Naive Bayes Algorithm
• For each target value vj, estimate P(vj).
• For each attribute value ai of each attribute a, estimate P(ai|vj).

Example: Play Tennis
• Learning phase (probabilities estimated from the 14 training examples):

  Outlook    Play=Yes  Play=No      Temperature  Play=Yes  Play=No
  Sunny      2/9       3/5          Hot          2/9       2/5
  Overcast   4/9       0/5          Mild         4/9       2/5
  Rain       3/9       2/5          Cool         3/9       1/5

  Humidity   Play=Yes  Play=No      Wind         Play=Yes  Play=No
  High       3/9       4/5          Strong       3/9       3/5
  Normal     6/9       1/5          Weak         6/9       2/5

  P(Play=Yes) = 9/14, P(Play=No) = 5/14

• Test phase. Given a new instance x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong), look up the tables:
  P(Outlook=Sunny|Play=Yes) = 2/9      P(Outlook=Sunny|Play=No) = 3/5
  P(Temperature=Cool|Play=Yes) = 3/9   P(Temperature=Cool|Play=No) = 1/5
  P(Humidity=High|Play=Yes) = 3/9      P(Humidity=High|Play=No) = 4/5
  P(Wind=Strong|Play=Yes) = 3/9        P(Wind=Strong|Play=No) = 3/5
  P(Play=Yes) = 9/14                   P(Play=No) = 5/14
• MAP rule:
  P(Yes|x') ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
  P(No|x') ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206
• Since P(Yes|x') < P(No|x'), we label x' as "No".

Naive Bayesian Classifier: Comments
• Advantages:
  • Easy to implement.
  • Good results obtained in most cases.
• Disadvantages:
  • The assumption of class-conditional independence causes a loss of accuracy, because in practice dependencies exist among variables.
  • E.g., in hospitals, a patient has a profile (age, family history, etc.), symptoms (fever, cough, etc.), and diseases (lung cancer, diabetes, etc.); dependencies among these cannot be modeled by a naive Bayesian classifier.
• How to deal with these dependencies? Bayesian belief networks.
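The learning and test phases of the Play Tennis example can be sketched in a few lines of Python. The conditional probability tables are hard-coded from the 14-example training set as on the slides (maximum-likelihood estimates, no smoothing); the table layout and function names here are illustrative choices, not part of the original:

```python
from math import prod  # Python 3.8+

# Conditional probabilities P(attribute=value | class) from the learning phase,
# keyed by (attribute, value); no smoothing, matching the slides.
cond = {
    ("Outlook", "Sunny"):    {"Yes": 2/9, "No": 3/5},
    ("Outlook", "Overcast"): {"Yes": 4/9, "No": 0/5},
    ("Outlook", "Rain"):     {"Yes": 3/9, "No": 2/5},
    ("Temperature", "Hot"):  {"Yes": 2/9, "No": 2/5},
    ("Temperature", "Mild"): {"Yes": 4/9, "No": 2/5},
    ("Temperature", "Cool"): {"Yes": 3/9, "No": 1/5},
    ("Humidity", "High"):    {"Yes": 3/9, "No": 4/5},
    ("Humidity", "Normal"):  {"Yes": 6/9, "No": 1/5},
    ("Wind", "Strong"):      {"Yes": 3/9, "No": 3/5},
    ("Wind", "Weak"):        {"Yes": 6/9, "No": 2/5},
}
prior = {"Yes": 9/14, "No": 5/14}  # class priors P(Play=Yes), P(Play=No)

def classify(instance):
    """Return (v_NB, scores) where v_NB = argmax_v P(v) * prod_i P(a_i|v)."""
    scores = {
        v: prior[v] * prod(cond[attr, val][v] for attr, val in instance.items())
        for v in prior
    }
    return max(scores, key=scores.get), scores

x = {"Outlook": "Sunny", "Temperature": "Cool",
     "Humidity": "High", "Wind": "Strong"}
label, scores = classify(x)
print(label)   # "No"
print(scores)  # {'Yes': ~0.0053, 'No': ~0.0206}, as computed on the slide
```

Note that the scores are unnormalized posteriors; dividing each by their sum would give proper probabilities, but the argmax (and hence the label) is unchanged.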