Course: T0273 Expert Systems (2012)
Session 4: Reasoning Under Uncertainty
Bina Nusantara University

Lecture Outline
• Introduction to frames
• Learn the meaning of uncertainty and explore some theories designed to deal with it
• Find out what types of errors can be attributed to uncertainty and induction
• Learn about classical, experimental, subjective, and conditional probability
• Explore hypothetical reasoning and backward induction

Frames
• One type of schema that has been used in many AI applications is the frame.
• Another type of schema is the script, which is essentially a time-ordered sequence of frames.
• The basic characteristic of a frame is that it represents related knowledge about a narrow subject that has much default knowledge.
• The frame contrasts with the semantic net, which is generally used for broad knowledge representation.
• A frame is analogous to a record structure in a high-level language such as C, or to an atom with its property list in LISP.

Frames
• A frame is basically a group of slots and fillers that defines a stereotypical object.
• Example: a car frame

  Slots          Fillers
  manufacturer   General Motors
  model          Chevrolet Caprice
  year           1979
  transmission   automatic
  engine         gasoline
  tires          4
  color          blue

Frames
• Frames are generally designed to represent either generic or specific knowledge.
• Example: a generic frame for property

  Slots              Fillers
  name               property
  specialization_of  a_kind_of_object
  types              (car, boat, house)
                     if-added: Procedure ADD_PROPERTY
  owner              default: government
                     if-needed: Procedure FIND_OWNER
  location           (home, work, mobile)
  status             (missing, poor, good)
  under_warranty     (yes, no)

How Do Expert Systems Deal with Uncertainty?
• Expert systems provide an advantage when dealing with uncertainty as compared to decision trees.
• With decision trees, all the facts must be known to arrive at an outcome.
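The slot-and-filler structure above can be sketched in code. The following is a minimal Python illustration (all variable names are hypothetical; the lecture itself only draws the analogy to a C record or a LISP property list):

```python
# A frame is a group of slots and fillers describing a stereotypical object.
# Hypothetical sketch: the car frame from the slide as a Python dict.
car_frame = {
    "manufacturer": "General Motors",
    "model": "Chevrolet Caprice",
    "year": 1979,
    "transmission": "automatic",
    "engine": "gasoline",
    "tires": 4,
    "color": "blue",
}

# A generic frame can carry default fillers (much default knowledge);
# a specific frame overrides them on instantiation.
generic_property = {"owner": "government"}             # default filler
specific_house = {**generic_property, "owner": "Ann"}  # "Ann" is a made-up filler

print(car_frame["model"])       # Chevrolet Caprice
print(specific_house["owner"])  # Ann
```

The dict-of-slots view also makes the default-knowledge idea concrete: a slot missing from the specific frame falls back to the generic frame's filler.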
• Probability theory is devoted to dealing with theories of uncertainty.
• There are many theories of probability, each with advantages and disadvantages.

What is Uncertainty?
• Uncertainty is essentially a lack of information to formulate a decision.
• Uncertainty may result in making poor or bad decisions.
• As living creatures, we are accustomed to dealing with uncertainty – that's how we survive.
• Dealing with uncertainty requires reasoning under uncertainty along with possessing a lot of common sense.

Theories for Uncertainty
• Bayesian Probability
• Hartley Theory
• Shannon Theory
• Dempster-Shafer Theory
• Markov Models
• Zadeh's Fuzzy Theory

Dealing with Uncertainty
• Deductive reasoning deals with exact facts and exact conclusions.
• Deduction proceeds from the general to the specific:
  All men are mortal.
  Socrates is a man.
  -> Socrates is mortal.
• Induction goes from the specific to the general:
  My disk has never crashed.
  -> My disk will never crash.

Types of Error
Many different types of error can contribute to uncertainty:
1. Data might be missing or unavailable.
2. Data might be ambiguous or unreliable due to measurement errors.
3. The representation of the data may be imprecise or inconsistent.
4. The data may just be the user's best guess (random).
5. The data may be based on defaults, and defaults may have exceptions.

Reasoning Under Uncertainty
Given these sources of error, most knowledge-based systems incorporate some form of uncertainty management. There are three issues to be considered:
1. How to represent uncertain data.
2.
How to combine two or more pieces of uncertain data.
3. How to draw inferences using uncertain data.

Errors and Induction
• Deduction goes from the general to the specific:
  All men are mortal.
  Socrates is a man.
  Therefore, Socrates is mortal.
• Induction tries to generalize from the specific:
  My disk has never crashed.
  Therefore, my disk will never crash.
• Inductive arguments can never be proven correct (except in mathematical induction); they can only provide some degree of confidence that the conclusion is correct.
• Deductive errors, or fallacies, may also occur:
  If p then q.
  q is true.
  Therefore, p.
  Example:
  If the valve is in good condition, then the output is normal.
  The output is normal.
  Therefore, the valve is in good condition.
• Uncertainty is a major problem in knowledge elicitation, especially when the expert's knowledge must be quantified in rules.

Approaches in Dealing with Uncertainty
Quantitative (numerically oriented) methods:
• Bayes' rule
• Certainty factors
• Dempster-Shafer theory
• Fuzzy sets
Symbolic approaches:
• Non-monotonic reasoning
• Cohen's theory of endorsements
• Fox's semantic systems

Classical Probability
• This is also called a priori probability. It is assumed that all possible events are known and that each event is equally likely to happen (e.g., rolling a die).
• A prior or unconditional probability is the probability before any evidence is received.
• A posterior or conditional probability is the probability after the evidence is received.

Theory of Probability: Experimental and Subjective Probabilities
• In contrast to the a priori approach, experimental probability defines the probability of an event, P(E), as the limit of a frequency distribution:

  P(E) = lim (N -> infinity) f(E)/N

  This type of probability is called a posterior probability.
• A subjective probability is a belief or opinion expressed as a probability, rather than a probability based on axioms or empirical measurements.
It is applied to decisions about non-repeatable events.

Theory of Probability: Compound Probabilities
• What is the probability of rolling a die with an outcome that is even and divisible by 3?
  Event A = {2, 4, 6}, Event B = {3, 6}, so A ∩ B = {6}.

  P(A ∩ B) = n(A ∩ B) / n(S) = 1/6
  P(A ∩ B) = P(A) P(B)

• Two events are called stochastically independent events if and only if the formula above holds. Stochastic is a Greek word meaning "guess"; it is commonly used as a synonym for probability, randomness, or chance.
• The probability of rolling a die with an outcome that is even or divisible by 3:

  P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = 3/6 + 2/6 - 1/6 = 4/6

Theory of Probability: Conditional Probabilities
• The probability of an event A, given that event B has occurred, is called a conditional probability and is denoted P(A|B):

  P(A|B) = P(A ∩ B) / P(B),  for P(B) ≠ 0
  P(A|B) P(B) = P(A ∩ B)

Bayes' Theorem
• Bayes' theorem, in terms of evidence E and hypotheses H_i:

  P(H_i|E) = P(E ∩ H_i) / Σ_j P(E ∩ H_j)
           = P(E|H_i) P(H_i) / Σ_j P(E|H_j) P(H_j)
           = P(E|H_i) P(H_i) / P(E)

• The conditional probability P(A|B) states the probability of event A given that event B occurred. The inverse problem is to find the inverse probability, which states the probability of an earlier event given that a later one has occurred.
• Example: the probability of having chosen brand X given that the drive has crashed. This is an inverse, or posterior, probability.

Example
The table below shows hypothetical disk crashes using a brand X drive within one year.

                    X     X'    Total of rows
  Crash C           0.6   0.1   0.7
  No crash C'       0.2   0.1   0.3
  Total of columns  0.8   0.2   1.0

  P(C|X)  = P(C ∩ X) / P(X)    = 0.6 / 0.8 = 0.75
  P(C|X') = P(C ∩ X') / P(X')  = 0.1 / 0.2 = 0.50
  P(X|C)  = P(C ∩ X) / P(C)    = 0.6 / 0.7 = 6/7
  P(X|C)  = P(C|X) P(X) / P(C) = 0.75 * 0.8 / 0.7 = 0.6 / 0.7

Example
Suppose statistics show that a brand X drive crashes within one year with probability 75%, while a non-brand-X drive crashes within one year with probability 50%.
The inverse question is: what is the probability that a crashed drive is brand X (or non-brand X)?

Reasoning with Certainty Factors
During the development of MYCIN, researchers developed the certainty factor (CF) formalism for the following reasons:
• The medical domain lacks the large quantities of data and/or the numerous approximations required by Bayes' theorem.
• There is a need to represent medical knowledge and heuristics explicitly, which cannot be done using probabilities alone.
• Physicians reason by weighing evidence that supports or denies a particular hypothesis.

Certainty Factor (CF) Formalism
Example of a MYCIN rule:
  IF   the stain of the organism is gram positive
  AND  the morphology of the organism is coccus
  AND  the growth of the organism is chains
  THEN there is evidence that the organism is streptococcus (CF = 0.7)
Given the evidence, a doctor only partially believes the conclusion.
• General form:
  IF E1 AND E2 ... THEN H, CF = CFi
  where the Ei are pieces of evidence and H is the hypothesis (conclusion).

Certainty Factor (CF) Formalism
• A measure of belief, MB(h, e), indicates the degree to which our belief in hypothesis h is increased by the presence of evidence e.
• A measure of disbelief, MD(h, e), indicates the degree to which our disbelief in hypothesis h is increased by the presence of evidence e.
  When P(h|e) = 0:  MB(h, e) = 0 and MD(h, e) = 1
  When P(h|e) = 1:  MB(h, e) = 1 and MD(h, e) = 0
• CF interpretation: a CF near +1 indicates strong belief in the hypothesis, a CF near -1 indicates strong disbelief, and a CF near 0 indicates little evidence either way.

Propagation of Certainty Factors
When two or more rules support the same conclusion, CFs are propagated as follows:
  CFrevised = CFold + CFnew (1 - CFold)                      if CFold > 0 and CFnew > 0
            = CFold + CFnew (1 + CFold)                      if CFold < 0 and CFnew < 0
            = (CFold + CFnew) / (1 - min(|CFold|, |CFnew|))  otherwise

Certainty Factor Example
In a murder trial, the defendant is accused of first-degree murder (the hypothesis). The jury must weigh the evidence presented by the prosecutor and the defense attorney to decide whether the suspect is guilty.
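The three-case propagation formula can be sketched as a short function. This is a minimal Python illustration (the function name combine_cf is hypothetical; the combination cases and the rule/evidence values are those used in the murder-trial example):

```python
def combine_cf(cf_old, cf_new):
    """Combine two certainty factors supporting the same conclusion."""
    if cf_old > 0 and cf_new > 0:
        return cf_old + cf_new * (1 - cf_old)
    if cf_old < 0 and cf_new < 0:
        return cf_old + cf_new * (1 + cf_old)
    # Opposite signs (or a zero): divide by 1 - min of the magnitudes.
    return (cf_old + cf_new) / (1 - min(abs(cf_old), abs(cf_new)))

# Single-premise rules: CF(conclusion) = CF(rule) * CF(evidence).
cf1 = 0.75 * 0.90    # fingerprints rule -> 0.675
cf2 = 0.60 * 0.50    # motive rule       -> 0.30
cf3 = -0.80 * 0.95   # alibi rule        -> -0.76

guilty = 0.0                        # start with no belief either way
guilty = combine_cf(guilty, cf1)    # 0.675
guilty = combine_cf(guilty, cf2)    # 0.7725
guilty = combine_cf(guilty, cf3)    # (0.7725 - 0.76) / (1 - 0.76) ~ 0.052
print(round(guilty, 3))
```

Note how the third piece of evidence, having the opposite sign, nearly cancels the accumulated belief rather than merely subtracting from it.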
RULE 1: IF the defendant's fingerprints are on the weapon, THEN the defendant is guilty. (CF = 0.75)
RULE 2: IF the defendant has a motive, THEN the defendant is guilty. (CF = 0.60)
RULE 3: IF the defendant has an alibi, THEN the defendant is not guilty. (CF = -0.80)

Certainty Factor Example
We start with CF = 0.0 for the defendant being guilty.
• Evidence 1: fingerprints on the weapon, with CFevid1 = 0.90.
  CFcon1 = CFrule1 * CFevid1 = 0.75 * 0.90 = 0.675   (single-premise rule)
  CFrevised = CFold + CFnew (1 - CFold) = 0.0 + 0.675 (1 - 0.0) = 0.675
• Evidence 2: the defendant's mother-in-law says he had a motive for the slaying, with CFevid2 = 0.50.
  CFcon2 = CFrule2 * CFevid2 = 0.60 * 0.50 = 0.30   (single-premise rule)
  CFrevised = CFold + CFnew (1 - CFold) = 0.675 + 0.30 (1 - 0.675) = 0.7725
• Evidence 3: a respected judge testifies to the alibi, so CFevid3 = 0.95 is assigned.
  CFcon3 = CFrule3 * CFevid3 = (-0.80) * 0.95 = -0.76
  Since CFold and CFnew now have opposite signs:
  CFrevised = (CFold + CFnew) / (1 - min(|CFold|, |CFnew|))
            = (0.7725 - 0.76) / (1 - 0.76) = 0.052

Certainty Factor Example
The confidence in the guilty verdict after the introduction of all the evidence is therefore 0.052.

Advantages of Certainty Factors
• It is a simple computational model that permits experts to estimate their confidence in the conclusions being drawn.
• It permits the expression of both belief and disbelief in each hypothesis, allowing the effect of multiple sources of evidence to be expressed.
• It allows knowledge to be captured in a rule representation while allowing the quantification of uncertainty.
• Gathering CF values is significantly easier than gathering the values required by the other methods. No statistical base is required – you merely have to ask the expert for the values.

Difficulties: Deep Inference Chains
If we have a chain of inference such as:
  IF A THEN B   CF = 0.8
  IF B THEN C   CF = 0.9
then, because of the multiplication of CFs, the resulting CF decreases. For example, if CF(A) = 0.8, then CF(C) = 0.8 * 0.8 * 0.9 ≈ 0.58. With a long chain of inferences, the final CF may become very small.

Difficulties: Many Rules with the Same Conclusion
The more rules there are with the same conclusion, the higher the combined CF value becomes. If there are many such rules, the CF can become artificially high.

Difficulties: Conjunctive Rules
If a rule has a number of conjunctive premises, the overall CF may be reduced too much.
  IF the sky is dark AND the temperature is dropping THEN it will rain   CF = 0.9
If CF(sky dark) = 1 and CF(temperature dropping) = 0.1, then
  CF(will rain) = min(1, 0.1) * 0.9 = 0.09
whereas if we had instead:
  IF the sky is dark THEN it will rain              CF = 0.7
  IF the temperature is dropping THEN it will rain  CF = 0.5
  CF1 = 1 * 0.7 = 0.7,  CF2 = 0.1 * 0.5 = 0.05
  CF(will rain) = 0.7 + 0.05 (1 - 0.7) = 0.7 + 0.015 = 0.715

Fuzzy Logic
In everyday speech we use vague or imprecise terms to describe properties. Fuzzy logic was developed by Zadeh to deal with these imprecise values in a mathematical way.
Fuzzy Logic
• It allows us to deal with fuzzy rules such as:
  IF the temperature is cold THEN the motor speed stops
  IF the speed is slow THEN make the acceleration high

Fuzzy Sets
• In ordinary set theory, an element from the domain is either in a set or not in the set.
• In fuzzy sets, a number in the range [0, 1] is attached to each element – the degree to which the element belongs to the set.
• A value of 1 means the element is definitely in the set.
• A value of 0 means the element is definitely not in the set.
• Other values are grades of membership.
• Formally, a fuzzy set A over a domain X is given by its membership function:
  μA : X -> [0, 1]

Fuzzy Sets
(Figures: the fuzzy set of "small" men, and a simpler membership curve for "small" men.)
• Three fuzzy sets can represent small, medium, and tall men. A man of height 4.8 feet is considered both small and medium to some degree.

Boolean Operations
The Boolean operations of union, intersection, and complement can be defined in a straightforward manner.
• Complement:
  μ¬A(x) = 1 - μA(x)
• Intersection:
  μA∩B(x) = min(μA(x), μB(x))
• Union:
  μA∪B(x) = max(μA(x), μB(x))

Fuzzy Reasoning
In this section, fuzzy rules and how inference is performed on them are presented, illustrated by a fuzzy system used to control an air conditioner. The variables to be used (with fuzzy values) are temperature (of the room) and speed (of the fan motor).
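The membership function and the Boolean operations above can be sketched directly. A minimal Python illustration, assuming simple triangular membership curves (the curves here are hypothetical stand-ins for the lecture's figures, so the exact membership values differ from those figures):

```python
# Hypothetical triangular membership functions for two temperature sets.
def cool(t):
    """Degree to which temperature t (deg C) belongs to 'cool'."""
    return max(0.0, min((t - 10) / 5, (20 - t) / 5, 1.0))

def just_right(t):
    """Degree to which t belongs to 'just right'."""
    return max(0.0, min((t - 15) / 5, (25 - t) / 5, 1.0))

# The Boolean operations on fuzzy membership grades.
def f_not(mu):          # complement: 1 - muA(x)
    return 1.0 - mu

def f_and(mu_a, mu_b):  # intersection: min
    return min(mu_a, mu_b)

def f_or(mu_a, mu_b):   # union: max
    return max(mu_a, mu_b)

t = 17.0
print(cool(t))                        # 0.6  -> 17 C is partly 'cool'
print(just_right(t))                  # 0.4  -> and partly 'just right'
print(f_and(cool(t), just_right(t)))  # 0.4  (cool AND just right)
print(f_or(cool(t), just_right(t)))   # 0.6  (cool OR just right)
```

The key point the example makes concrete: a single crisp temperature belongs to several fuzzy sets at once, each to a different degree.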
Fuzzy Reasoning
The rules are given as follows:
• IF the temperature is cold THEN motor speed is stop
• IF the temperature is cool THEN motor speed is slow
• IF the temperature is just right THEN motor speed is medium
• IF the temperature is warm THEN motor speed is fast
• IF the temperature is hot THEN motor speed is blast
(Figures: the temperature fuzzy sets and the speed fuzzy sets.)

Fuzzy Reasoning
• In a fuzzy system all the rules fire in parallel, although in the end many will not contribute to the output.
• What we need to determine in the above system is: given a particular value of the temperature, how do we calculate the motor speed?
• The temperature can be measured fairly accurately, but it will lie in several fuzzy sets. For example, if the temperature were 17°C, then from the figure it is about 25% cool and 80% just right.
• This means that rules 2 and 3 will contribute to the output speed of the motor.
• The fuzzy set for the output can be calculated by multiplying the "slow" graph by 0.25 and the "medium" graph by 0.80, assuming the contribution is proportional to the fuzzy values of the input temperature.

References
Textbooks:
• Joseph Giarratano, Gary Riley. 2005. Expert Systems: Principles and Programming. Thomson Course Technology, Australia. ISBN 0-534-38447-1.
• Stuart Russell, Peter Norvig. 2010. Artificial Intelligence: A Modern Approach. Pearson Education, New Jersey. ISBN 978-0-13-207148-2.
Web:
• www.widodo.com
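The parallel firing and output-combination step described in the Fuzzy Reasoning slides can be sketched as follows. This is a minimal Python illustration: the 25% cool / 80% just right memberships at 17°C are taken from the slides, while the output speed sets and the final centroid-defuzzification step are hypothetical assumptions (the slides only say the scaled output graphs are combined):

```python
# Hypothetical triangular output sets for the fan speed (0..100 scale).
def slow(v):      # peak at 20
    return max(0.0, min(v / 20, (40 - v) / 20, 1.0))

def medium(v):    # peak at 50
    return max(0.0, min((v - 30) / 20, (70 - v) / 20, 1.0))

# Memberships of 17 C in the input sets, as read from the slide's figure.
cool_deg, just_right_deg = 0.25, 0.80

# Each firing rule scales its output graph; the scaled graphs are summed.
def output(v):
    return cool_deg * slow(v) + just_right_deg * medium(v)

# Hypothetical defuzzification: centroid of the combined output set.
vs = [v / 10 for v in range(0, 1001)]   # speed grid 0.0 .. 100.0
num = sum(v * output(v) for v in vs)
den = sum(output(v) for v in vs)
speed = num / den
print(round(speed, 1))  # a crisp speed pulled toward 'medium', the stronger rule
```

Since "just right" fires far more strongly than "cool", the crisp result lands much closer to the "medium" peak than to the "slow" peak.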