Artificial Intelligence Winter 2004
Transcript
Section 6: Impreciseness
CPSC 533 - Artificial Intelligence
Michael M. Richter

Forms of Impreciseness

Impreciseness occurs frequently:
• A number is only approximately correct (tolerance).
• A frequency says nothing about a specific case, or is not even exactly known (probability).
• The exact number is not of interest (abstraction).
• The expression itself has no exact meaning (informal semantics, e.g. "this is useful").
• Some forms have an objective and others a subjective character ("the tolerance is 0.2" versus "the weather is nice").

Forms of Uncertainty and Vagueness (2)

We distinguish vagueness that has an objective origin from vagueness that has a mainly subjective character.

In an objective situation there is an agreement of a formal character and a model to which one can refer. The informal notion then has a formal original. E.g. "high fever" has as its original a certain precise temperature (which may not be known or may not be of interest).

In a subjective situation there is no exact original (as in "this is stupid"); only a subjective impression is reflected.

To put subjective vagueness on solid foundations, reasoning in the model is replaced by experiments with the individuals who express their subjective opinions.

From the expression itself it may not be evident whether it has an objective or a subjective meaning.

Subjective Uncertainty and a Turing Test (1)

Suppose a partial ordering "<" is associated with the concept C. The partial ordering then again has two versions, a formal one and a subjective one. The Turing test refers to these two versions of "<":
• the formal version of C
• the subjective human version of C
Their agreement is to be verified by experiments. In the tests, variations of the arguments of "<" are presented.

Goal: The human says "up" if and only if the formal system says "up".

Subjective Uncertainty and a Turing Test (2)

Example: Fuzzy values for "typical lion"
• Concept to grasp: typical lion
• Human: subjective judgement of "better"
• Formal version: uses the ordering given by the quotient length/height

The partial ordering approximates the concept C in the sense that the semantics of y < z is: z is more typical for C than y is.

A Warning Example (1)

Reasoning with vague concepts can easily lead to inconsistent consequences.

Example: We consider two medications A and B which are applied to male and female patients in a hospital. The successes are recorded. Three predicates are introduced with the following semantics:
• BetterM(X,Y): X has a higher success rate than Y when applied to men
• BetterF(X,Y): X has a higher success rate than Y when applied to women
• Better(X,Y): X has a higher success rate than Y in total

Question: Is $\mathrm{BetterM}(X,Y) \wedge \mathrm{BetterF}(X,Y) \rightarrow \mathrm{Better}(X,Y)$ true?
This sounds plausible, but there is a counterexample.

A Warning Example (2)

Group   Medication   +    -     Success rate
Men     A            20   180   10%
Men     B            4    96    4%
Women   A            20   20    50%
Women   B            37   63    37%
Total   A            40   200   16.7%
Total   B            41   159   20.5%

Hence BetterM(A,B) and BetterF(A,B) hold, but in total Better(B,A) holds.

Rough Sets (1): A Basic Method

We consider a universe U; the uncertainty is given by a binary relation $\sim$ over U, called the indiscernibility relation.
Assumption: $\sim$ is reflexive and symmetric.
Idea: We cannot distinguish two objects a and b with $a \sim b$.
We consider some predicate P(x) over U (represented as a subset of U). The relation motivates the following definition:
Def.:
(i) The lower approximation of P is $P_u = \{ x \in U \mid \text{for all } y \text{ with } x \sim y : y \in P \}$
(ii) The upper approximation of P is $P_o = \{ x \in U \mid \text{there is } y \in P \text{ with } x \sim y \}$
Elements of $P_u$ are surely in P, and elements of $P_o$ are possibly in P.
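As an illustration, the two approximations can be computed directly from their definitions. The following Python sketch assumes a finite universe; the function name rough_approximations, the relation close and the example set P are hypothetical choices made for this illustration and are not part of the lecture.

```python
def rough_approximations(universe, indiscernible, P):
    """Return (lower, upper) approximations of the subset P of universe.

    indiscernible(a, b) is assumed to be reflexive and symmetric.
    """
    lower, upper = set(), set()
    for x in universe:
        # All objects that cannot be distinguished from x.
        neighbours = [y for y in universe if indiscernible(x, y)]
        if all(y in P for y in neighbours):
            lower.add(x)   # x is surely in P
        if any(y in P for y in neighbours):
            upper.add(x)   # x is possibly in P
    return lower, upper


# Hypothetical example: integer measurements that cannot be told apart
# when they differ by at most 1 (reflexive and symmetric, not transitive).
universe = set(range(10))
close = lambda a, b: abs(a - b) <= 1
P = {3, 4, 5, 6}

lower, upper = rough_approximations(universe, close, P)
uncertainty_area = upper - lower
```

In this example only the elements whose whole neighbourhood lies in P end up in the lower approximation, while every element with at least one neighbour in P ends up in the upper approximation.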

Rough Sets (2)

[Figure: a set P with its lower approximation $P_u$ and its upper approximation $P_o$, and marked points $x$, $y$, $x_1$, $y_1$.]

Here we have $x \sim y$ and $x_1 \sim y_1$. The set $P_o \setminus P_u$ is called the uncertainty area.
Because decisions about elements in $P_u$ and about elements not in $P_o$ are certain, the rough set method can be regarded as being "on the safe side".

Rough Sets (3)

There are basically two types of indiscernibility relations:
• $\sim$ is transitive: Then $\sim$ is an equivalence relation. A typical example occurs in the attribute-value representation when two objects cannot be distinguished because the values of some attributes are missing.
• $\sim$ is not transitive: Then $\sim$ is not an equivalence relation. A typical example is when the domains of some attributes are real numbers that cannot be distinguished if their difference is smaller than some threshold (e.g. due to measurement errors).
Although the two types differ in many ways, the rough set method applies to both of them.

Fuzzy Sets (1)

Fuzzy sets are generalizations of ordinary ("crisp") sets.
Suppose U is some (ordinary) set. A fuzzy subset X of U is defined by a function $\mu_X : U \rightarrow [0,1]$.
Notation: $X \subseteq_f U$
For y in U, $\mu_X(y)$ is called the degree of membership of y in X, and $\mu_X$ is the membership function of X.
Example: X = Young_Customer, $\mu_X(\mathrm{Bill}) = 0.5$ if Bill is of age 32.
This is easily generalized to n-ary relations.

Fuzzy Sets (2)

In fuzzy logic and fuzzy set theory many classical notions are generalized.
A fuzzy partition of U into n fuzzy subsets is given by membership functions $\mu_1(x), \dots, \mu_n(x)$ such that $\sum_{i=1}^{n} \mu_i(x) = 1$ for all x in U.
In particular, disjoint now means disjoint to some degree.
A fuzzy classifier for such a partition is a mapping $c_f : U \rightarrow [0,1]^n$ such that for $c_f(x) = (y_1(x), \dots, y_n(x))$ we have $\sum_{i=1}^{n} y_i(x) = 1$.
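A small sketch may help to make the notions of membership function, fuzzy partition and fuzzy classifier concrete. The piecewise linear shapes and the breakpoints 25, 39, 50 and 65 below are assumptions chosen only so that a customer of age 32 is "young" to degree 0.5, as in the example above; the class names middle_aged and old are likewise hypothetical.

```python
def ramp_down(x, a, b):
    """Membership 1 for x <= a, 0 for x >= b, linear in between."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def ramp_up(x, a, b):
    """Membership 0 for x <= a, 1 for x >= b, linear in between."""
    return 1.0 - ramp_down(x, a, b)


def young(age):
    return ramp_down(age, 25, 39)


def middle_aged(age):
    return min(ramp_up(age, 25, 39), ramp_down(age, 50, 65))


def old(age):
    return ramp_up(age, 50, 65)


def classify(age):
    """Fuzzy classifier c_f: membership degrees for the three classes."""
    return (young(age), middle_aged(age), old(age))


bill = classify(32)  # degrees for (young, middle_aged, old)

# The three functions form a fuzzy partition: the degrees sum to 1.
is_partition = all(abs(sum(classify(a)) - 1.0) < 1e-9 for a in range(0, 101))
```

Between the ages 25 and 39 the classes "young" and "middle_aged" overlap, which is what "disjoint to some degree" means here.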

Fuzzy Sets (3)

Fuzzy equality E is a special fuzzy equivalence relation satisfying
(1) $E(x,x) = 1$
(2) $E(x,y) = E(y,x)$
(3) $E(x,y) + E(y,z) - 1 \le E(x,z)$
A weakening of fuzzy equivalence is a similarity measure sim:
$\mathrm{SIM} \subseteq_f V := U \times U$, $\mu_{\mathrm{SIM}}(x_1, x_2) := \mathrm{sim}(x_1, x_2)$
Similarity measures are generalized to measures on pairs from two sets:
$\mathrm{SIM} \subseteq_f V := A \times B$, $\mu_{\mathrm{SIM}}(a, b) := \mathrm{sim}(a, b)$
We call the pairs (a,b) partners and sim(a,b) the degree of partnership.

Example

[Figure: membership functions "low fever" and "high fever" with values between 0 and 1 over temperatures from 35 to 42.]

This is a fuzzy partition: There is an area where high and low fever overlap.
In the intersection area: What is the decision, high or low?
Rationality principle: Choose the one with the highest degree of membership.
There is one point where both decisions can be made.

Regular Membership Functions

It is not useful to consider arbitrary membership functions.
Def.: For $T \subseteq \mathbb{R}$ a function $\mu : T \rightarrow \mathbb{R}$ is called regular if
(i) $\mu$ is piecewise continuous
(ii) if $x, y \in T$ and $\lambda \in [0,1]$ then $\mu(\lambda x + (1-\lambda) y) \ge \min(\mu(x), \mu(y))$
(iii) there is some x in T with $\mu(x) = 1$
(i) is often specialized to "piecewise linear", which makes it computationally more feasible. Certain exceptions should be allowed in order to include the jump from 0 to 1 in classical logic. (ii) excludes several maxima, and (iii) postulates the existence of an ideal argument.

Norms and Co-Norms (1)

t-norms f(x, y) (intended to compute $\mu_{A \cap B}$):
Axioms:
(T1) $f(x, y) = f(y, x)$
(T2) $f(x, f(y, z)) = f(f(x, y), z)$
(T3) $x \le x',\ y \le y' \Rightarrow f(x, y) \le f(x', y')$
(T4) $f(x, 1) = x$
Typical t-norms are $f(x,y) = \min(x,y)$ or $f(x,y) = x \cdot y$.

co-t-norms f(x, y) (intended to compute $\mu_{A \cup B}$):
Axioms: (T1), (T2), (T3) and
(T4*) $f(x, 0) = x$
Typical co-t-norms are $f(x,y) = \max(x,y)$ or $f(x,y) = x + y - x \cdot y$.
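The axioms can be checked numerically for the typical examples. The sketch below tests (T1) to (T3) and the neutral-element axiom on a small grid of membership values; the grid, the tolerance and the helper name satisfies_axioms are assumptions of this illustration, and passing such a check is of course no proof that the axioms hold on all of [0,1].

```python
from itertools import product

GRID = [0.0, 0.25, 0.5, 0.75, 1.0]


def satisfies_axioms(f, neutral, tol=1e-12):
    """Check (T1)-(T3) and f(x, neutral) = x on GRID.

    Use neutral=1 for t-norms (T4) and neutral=0 for co-t-norms (T4*).
    """
    for x, y, z in product(GRID, repeat=3):
        if abs(f(x, y) - f(y, x)) > tol:                # (T1) commutativity
            return False
        if abs(f(x, f(y, z)) - f(f(x, y), z)) > tol:    # (T2) associativity
            return False
    for x, y, x2, y2 in product(GRID, repeat=4):
        if x <= x2 and y <= y2 and f(x, y) > f(x2, y2) + tol:  # (T3) monotonicity
            return False
    return all(abs(f(x, neutral) - x) <= tol for x in GRID)


t_min = min
t_prod = lambda x, y: x * y
s_max = max
s_sum = lambda x, y: x + y - x * y
mean = lambda x, y: (x + y) / 2   # neither a t-norm nor a co-t-norm

results = {
    "min": satisfies_axioms(t_min, 1.0),
    "product": satisfies_axioms(t_prod, 1.0),
    "max": satisfies_axioms(s_max, 0.0),
    "x+y-xy": satisfies_axioms(s_sum, 0.0),
    "mean as t-norm": satisfies_axioms(mean, 1.0),
}
```

The arithmetic mean is included as a counterexample: it is commutative and monotone, but it has no neutral element and is not associative.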

Norms and Co-Norms (2)

Consequences: $f(x, 0) = 0$ for t-norms, $f(x, 1) = 1$ for co-t-norms.
Norms and co-norms cannot reflect relative importance (e.g. using weights).
There are other fuzzy combination rules available, which are fuzzy versions of general Boolean operators such as the different types of implication.

Negation and Complement

A negation is a function $\mathrm{neg} : [0,1] \rightarrow [0,1]$ with the axioms
(N1) $\mathrm{neg}(0) = 1$, $\mathrm{neg}(1) = 0$
(N2) $x \le y \Rightarrow \mathrm{neg}(y) \le \mathrm{neg}(x)$
The negation function is intended to compute the membership function of the complement C(A) of A.
A typical example is $\mathrm{neg}(x) = 1 - x$. For this function the de Morgan laws hold:
$\mu_{C(A \cap B)}(x) = \mu_{C(A) \cup C(B)}(x)$, $\mu_{C(A \cup B)}(x) = \mu_{C(A) \cap C(B)}(x)$
if the t-norm min and the co-t-norm max are used.

The Mamdani Implication

We consider implications between literals, i.e. of the form $F_1 \wedge F_2 \wedge \dots \rightarrow F_n$.
Because the literals often have names from natural language, such implications are called linguistic rules.
Predicate logic interpretation: Predicates, interpreted in a model.
Fuzzy logic interpretation: Membership functions over the universe.
The membership function of the conjunction is computed using a t-norm.
The implication is considered as a conjunction $F_1 \wedge F_2 \wedge \dots \wedge F_n$.

Example

Linguistic rule: If temperature low and road narrow then speed slow.

[Figure: fuzzy representation of the rule, showing the actual temperature, the actual narrowness, and the inferred membership function for "slow".]

The result of the implication is a membership function, not a single value. To obtain such a value, an operation called defuzzification is needed.

Defuzzification

There are three major defuzzification methods, which assign to a membership function $\mu$ with domain Y an element $y \in Y$.
We put $\mathrm{Max} = \{ y \in Y \mid \forall y' \in Y : \mu(y') \le \mu(y) \}$.
1) Maximum method: Choose an arbitrary $y \in \mathrm{Max}$.
2) Mean-of-Maximum method: Choose y as the mean of Max: $y = |\mathrm{Max}|^{-1} \cdot \sum_{y' \in \mathrm{Max}} y'$
3) Center-of-Area method: Define y as the element below the center of gravity of the area bounded by the curve $\mu$ and the y-axis: $y = \int \mu(y')\, y'\, dy' \cdot \left( \int \mu(y')\, dy' \right)^{-1}$

Linguistic Expressions

Linguistic expressions are taken from natural language:
Something is large, expensive, ..., nice, pale, ...
IF the water is very dirty THEN add much soap

Ways of formalization:
• Classical logic: Logical predicates and rules with binary interpretation. A specific object satisfies a predicate or it does not.
• Fuzzy logic: Fuzzy set with membership function. An object satisfies a predicate to some degree (fuzzification).

Rules where the conclusion is an action or a decision:
• Classical logic: The action can be performed if the preconditions are satisfied.
• Fuzzy logic: The conclusion is a membership function, which cannot be performed directly. Defuzzification transforms it into some action, depending on the degrees of truth of the conditions.

Abstract and Linguistic Predicates

[Figure: two levels, the abstract level and the concrete level (real data). Arrows: abstraction to abstract classical predicates, abstraction to linguistic predicates with fuzzy interpretations, instantiation, defuzzification.]

Example (1)

Linguistic variables:
Variable: $X_{temp}$; Values: {low, med, high}
[Figure: membership functions (MF) low, med, high over Temp; marked membership values 0.7 and 0.49.]
Variable: $X_{pressure}$; Values: {no, low, med, high}
[Figure: membership functions (MF) no, low, med, high over Pressure; marked membership values 0.3, 0.21 and 0.09.]
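The three defuzzification methods defined earlier in this section can be sketched as follows. The sketch assumes that the domain Y is given as a finite list of sample points, so that the integrals of the Center-of-Area method are approximated by sums; the function names and the trapezoidal membership function mu are hypothetical examples, not taken from the lecture.

```python
def max_set(ys, mu):
    """All sample points at which mu attains its maximum over ys."""
    top = max(mu(y) for y in ys)
    return [y for y in ys if mu(y) == top]


def maximum_method(ys, mu):
    # Any element of Max is allowed; this sketch takes the first one.
    return max_set(ys, mu)[0]


def mean_of_maximum(ys, mu):
    m = max_set(ys, mu)
    return sum(m) / len(m)


def center_of_area(ys, mu):
    # Discrete approximation of  (integral of mu(y)*y) / (integral of mu(y)).
    weights = [mu(y) for y in ys]
    return sum(y * w for y, w in zip(ys, weights)) / sum(weights)


def mu(y):
    """Hypothetical trapezoid: rises on [2, 4], equals 1 on [4, 6], falls on [6, 10]."""
    if y <= 2 or y >= 10:
        return 0.0
    if y < 4:
        return (y - 2) / 2
    if y <= 6:
        return 1.0
    return (10 - y) / 4


ys = list(range(0, 11))
y_max = maximum_method(ys, mu)
y_mom = mean_of_maximum(ys, mu)
y_coa = center_of_area(ys, mu)
```

Because the example trapezoid falls more slowly than it rises, the Center-of-Area value lies to the right of the Mean-of-Maximum value, which only looks at the plateau.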

Example (2)

New fuzzy variable for representing the decision class:
Variable: $X_{class}$; Values: {K1, K2}
[Figure: membership function (MF) over Class with the values K1 and K2, each of height 1.]

Rules:
R1: IF $X_{Temp}$ is high OR $X_{Pressure}$ is no THEN $X_{Class}$ is K1
R2: IF $X_{Temp}$ is low OR $X_{Pressure}$ is high THEN $X_{Class}$ is K1
R3: IF $X_{Temp}$ is high OR $X_{Pressure}$ is high THEN $X_{Class}$ is K2

Actual query: $X^*_{Temp}$ is med, $X^*_{Pressure}$ is low

Example (3)

Application of rule R1:
- $\max_x \min \{ \mu_{Temp,high}(x), \mu_{Temp,med}(x) \} = 0.49$
- $\max_x \min \{ \mu_{Pressure,no}(x), \mu_{Pressure,low}(x) \} = 0.21$
- Fuzzy OR for the precondition of rule R1: $\min\{1, 0.49 + 0.21\} = 0.7$
- The resulting membership function for the conclusion is a K1 singleton with value 0.7.
Application of R2: The membership function for the conclusion is a K1 singleton with value 0.58.
Application of R3: The membership function for the conclusion is a K2 singleton with value 0.58.

Example (4)

Aggregation of all rules leads to the following fuzzy set:
[Figure: membership function (MF) over Class with K1 at 0.7 and K2 at 0.58.]
The application of the "Mean-of-Max" defuzzification operator (selecting the value with the maximum membership value) leads to the crisp value K1.

Fuzzy Sets and Rough Sets

We assume a fuzzy set, represented by a membership function $\mu$.
A crisp interval property P is defined by a real number $\alpha$, $0 \le \alpha \le 1$, such that $P(x) \Leftrightarrow \mu(x) < \alpha$ (or $\mu(x) \ge \alpha$), or it is a complement of such a set.
If the number $\alpha$ is not exactly known, this gives rise to a rough set by introducing two numbers $\beta, \gamma$ with $0 \le \beta \le \alpha \le \gamma \le 1$, which function as thresholds:
$P_u(x) \Leftrightarrow \mu(x) \le \beta$, $P_o(x) \Leftrightarrow \gamma \ge \mu(x)$
The smaller the difference between the thresholds, the smaller the uncertainty area.
[Figure: axis of membership values with the thresholds $\beta \le \alpha \le \gamma$: "P(x) for sure" below $\beta$, "uncertainty" between $\beta$ and $\gamma$, "not P(x) for sure" above $\gamma$.]

Similarities

We consider a fixed set U (the universe).
Similarity has a relational and a functional aspect.
Relational aspect: R(x,y,u,v), intended as "y is more similar to x than v is to u".
Special case: R(x,y,x,u), "y is more similar to x than u is to x".
Functional aspect: A similarity measure on a set U is a mapping $\mathrm{sim} : U \times U \rightarrow [0,1]$ (real interval).
A dual notion is a distance measure.
A similarity measure quantifies the degree of similarity.
Fuzzy equalities are special cases of similarity measures.

From Fuzzy Sets to Similarities

In order to define a similarity measure one does not need to start from pairs of objects.
If we simply have a fuzzy set $K \subseteq_f U$, then we need in addition a reference object in order to define a measure. Such a reference object x has to satisfy $\mu_K(x) = 1$. In this case we can define a similarity measure by $\mathrm{sim}(x, y) = \mu_K(y)$ and get a measure satisfying $\mathrm{sim}(x,x) = 1$.
If there is a subset $P \subseteq U$ such that for each $x \in P$ we have some fuzzy subset $K_x \subseteq_f U$ with membership function $\mu_x(y)$ for which $\mu_x(x) = 1$ holds, then we can again define a measure on $U \times P$ by $\mathrm{sim}(y, x) = \mu_x(y)$, $y \in U$, $x \in P$.

From Similarities to Fuzzy Sets

One can also start with a similarity measure and associate fuzzy subsets with it:
$\mathrm{SIM} \subseteq_f V := U \times U$, $\mu_{\mathrm{SIM}}(x_1, x_2) := \mathrm{sim}(x_1, x_2)$
We associate with each $x \in U$ a fuzzy subset $F_x \subseteq_f U$ by $\mu_x(y) = \mu_{F_x}(y) = \mathrm{sim}(x,y)$.
Reflexivity of sim means $\mathrm{sim}(x,x) = 1$. If sim is symmetric then $\mu_x(y) = \mu_y(x)$ holds.
So we obtain from sim, for each x, a fuzzy subset which describes how U is structured by sim from the viewpoint of x: x regards itself as the central element, and the degree of membership of the other elements depends on their similarity to x.

Generalization: Partnerships

We consider two arbitrary sets A and B.
A partner relation is a fuzzy subset $\mathrm{PART} \subseteq_f V := A \times B$.
Functional view: $\mathrm{part} : A \times B \rightarrow [0,1]$ (real interval)
Best partners are those with the highest degree of partnership.
Examples:
• A = set of women, B = set of men, partnership: marriage
• A = customer demands, B = products, partnership: customer satisfaction
• A = diseases, B = therapies, partnership: best for health

Objects, Fuzzy Sets and Similarities

We assume a universe U and a fuzzy subset X of U with a membership function $\mu_X$.
Up to now we have only considered similarities and distances between objects of U, i.e. sim(object, object).
There are three more possibilities:
1) sim(membership function, membership function)
2) sim(object, membership function)
3) The third is 2) with permuted arguments (this plays a role if the first argument is considered as a query and the second as an answer).
Because membership functions interpret linguistic expressions, this can also be interpreted as a similarity between linguistic predicates and between objects and linguistic predicates.

Distances between Membership Functions (1)

a) Crisp method: Select $a_i$ for which $\mu_i(a_i)$ is maximal, i = 1, 2; $d(\mu_1, \mu_2) = |a_1 - a_2|$
b) Integral method: $F_i$ = area between $\mu_i$ and the x-axis; $d(\mu_1, \mu_2) = \mathrm{Size}(F_1 \,\Delta\, F_2)$ (symmetric difference)
[Figure: two membership functions $\mu_1$ and $\mu_2$.]
Distances now compare fuzzy membership functions!
A corresponding similarity is e.g. $\mathrm{sim}(\mu_1, \mu_2) = 1 - d(\mu_1, \mu_2)$.

Distances between Membership Functions (2)

The disadvantage of the integral method is that two membership functions with disjoint areas always have the same distance, however far apart they lie; the crisp method avoids this.
The disadvantage of the crisp method is that the shape of the curves does not play a role; the integral method avoids this.
A combined method is as follows:
• If the areas are not disjoint, apply the integral method.
• If the areas are disjoint, add to the distance obtained by the integral method the distance between the two points where both curves reach zero.
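The three distances can be compared on simple examples. The following sketch assumes triangular membership functions given as tuples (left, peak, right), computes the size of the symmetric difference as the integral of $|\mu_1 - \mu_2|$ with a midpoint rule, and reads "the distance between the two points where both curves reach zero" as the gap between the two supports. All function names and example triangles are hypothetical.

```python
def triangle(left, peak, right):
    """Triangular membership function with support (left, right) and mu(peak) = 1."""
    def mu(x):
        if x <= left or x >= right:
            return 0.0
        if x <= peak:
            return (x - left) / (peak - left)
        return (right - x) / (right - peak)
    return mu


def crisp_distance(t1, t2):
    """Distance between the arguments at which the two functions are maximal."""
    return abs(t1[1] - t2[1])


def integral_distance(t1, t2, lo, hi, steps=10000):
    """Size of the symmetric difference of the two areas (midpoint rule)."""
    m1, m2 = triangle(*t1), triangle(*t2)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += abs(m1(x) - m2(x))
    return total * h


def combined_distance(t1, t2, lo, hi):
    """Integral distance plus the gap between the supports if they are disjoint."""
    gap = max(t1[0], t2[0]) - min(t1[2], t2[2])  # positive only for disjoint supports
    return integral_distance(t1, t2, lo, hi) + max(gap, 0.0)


a, b, c = (0, 1, 2), (3, 4, 5), (7, 8, 9)   # pairwise disjoint areas
a_shifted = (1, 2, 3)                        # overlaps with a

d_int_ab = integral_distance(a, b, 0, 10)
d_int_ac = integral_distance(a, c, 0, 10)
d_comb_ab = combined_distance(a, b, 0, 10)
d_comb_ac = combined_distance(a, c, 0, 10)
d_comb_overlap = combined_distance(a, a_shifted, 0, 10)
```

For the disjoint triangles the integral distance is the same for both pairs, while the combined distance grows with the gap, which is the behaviour the combined method is meant to achieve.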
A generalization is obtained if the Euclidean distance $|a_1 - a_2|$ is replaced by an arbitrary distance measure.

Similarities between Objects and Fuzzy Functions

Distinction: For $y \in U$, $\mathrm{sim}(y, \mu_X)$ should not be the same as $\mu_X(y)$!
E.g. for the fuzzy set "high fever", 37°C and 36°C both have membership value 0, but we expect sim(36, high fever) < sim(37, high fever).
In order to combine fuzzy sets and similarity measures we need some simplifying assumptions:
• The universe U is an interval of real numbers (the approach generalizes to the situation where U is partially ordered).
• We consider n linguistic predicates $A_1, \dots, A_n$, where each predicate $A_i$ has a regular membership function which attains the maximum value 1 at exactly one argument, denoted by $m(A_i)$.
• There is a similarity measure sim given on U.

Application: Case-Based Reasoning

Case-based reasoning (CBR) is a method which has its origin in analogical reasoning.
The main intention is to reuse previous experiences for current problems.
The difficulty arises when the current situation is not identical to the previous one: there is an inexactness involved.
Its main aspect is that CBR techniques allow inexact (approximate) reasoning in a controlled manner.
CBR has many applications in knowledge management.

Case-Based Reasoning (CBR)

Cases = Experience = Problem/Solution pair.
Cases are stored in a case base for further reuse.
To solve a new problem:
• retrieve a case with a similar problem from the case base
• adapt the solution from the most similar case to the new problem
[Figure: the new problem is compared by similarity with the problem of case i in the case base; the solution of case i is adapted into the new solution; the new case is stored in the case base.]
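The retrieve, adapt and store steps can be sketched in a few lines. The sketch assumes that problems are tuples of numeric attributes and uses a hypothetical similarity measure (1 minus the mean range-normalised attribute distance); the case data, the attribute ranges and the default "reuse the solution unchanged" adaptation are illustrative assumptions only.

```python
def similarity(p, q, ranges):
    """Hypothetical similarity in [0, 1] for attribute values within the given ranges."""
    dists = [abs(a - b) / r for a, b, r in zip(p, q, ranges)]
    return 1.0 - sum(dists) / len(dists)


def retrieve(case_base, query, ranges):
    """Return the stored (problem, solution) pair most similar to the query."""
    return max(case_base, key=lambda case: similarity(case[0], query, ranges))


def solve(case_base, query, ranges, adapt=lambda solution, old, new: solution):
    """Retrieve the most similar case, adapt its solution, store the new case."""
    old_problem, old_solution = retrieve(case_base, query, ranges)
    new_solution = adapt(old_solution, old_problem, query)
    case_base.append((query, new_solution))   # store for further reuse
    return new_solution


# Hypothetical case base: problems are (attribute 1, attribute 2) pairs.
case_base = [
    ((20.0, 1.0), "solution A"),
    ((35.0, 2.0), "solution B"),
    ((50.0, 4.0), "solution C"),
]
ranges = (40.0, 4.0)   # assumed value ranges of the two attributes

answer = solve(case_base, (33.0, 2.5), ranges)
```

A real system would replace the default adaptation by domain knowledge; the point of the sketch is only that the inexactness is controlled by the similarity measure used for retrieval.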

Probabilities and Diagnosis

In a diagnostic situation there are
• certain observations, which are regarded as an event E which has happened
• a number of hypothetical diagnoses $H_1, \dots, H_n$
Each hypothesis has a known a priori probability $P(H_i)$, and for each $H_i$ the conditional probability $P(E \mid H_i)$ is assumed to be known.
If the a posteriori probabilities $P(H_i \mid E)$ (the conditional probabilities after the event E has happened) are known, the maximum likelihood principle selects a hypothesis: Take the $H_i$ for which $P(H_i \mid E)$ is maximal.
It remains to compute the a posteriori probabilities from the known ones.

Bayes Formula

Bayes' formula allows us to compute the desired probabilities:
$P(H_i \mid E) = \frac{P(H_i)\, P(E \mid H_i)}{\sum_{j=1}^{n} P(H_j)\, P(E \mid H_j)}$
For the proof we consider
$P(H_i \mid E) = \frac{P(H_i \cap E)}{P(E)}$, $P(E \mid H_i) = \frac{P(E \cap H_i)}{P(H_i)}$
and, since the hypotheses are mutually exclusive and exhaustive,
$P(E) = P\left(\bigcup_i (E \cap H_i)\right) = \sum_i P(E \mid H_i)\, P(H_i)$,
from which the formula follows directly.

Bayesian Nets

A Bayesian net is a semantic net with labels:
• Nodes: random variables
• Directed edges: conditional probabilities
An edge from A to B exists if node A is a causal reason for node B, i.e. the conditional probability is non-trivial.
If the probability of some node is known, then the probability of all linked nodes can be calculated.
If certain nodes are initialized (i.e. the event happened), this gives rise to probabilities of their child nodes. In this way the probabilities are propagated over the net.

Subjective Probabilities

They express the opinions of people and are therefore of an informal nature.
They are represented qualitatively, with the basic notion $A \prec B$ ("B is more likely than A").
The derived notion is $A \sim B \Leftrightarrow$ neither $A \prec B$ nor $B \prec A$ (see the chapter on partial orderings).
The partial ordering allows us to establish Turing tests.
There are various attempts to formulate axioms for the partial ordering, with different aims:
• to derive a probability measure which induces the partial ordering
• to capture people's behavior when dealing with probabilities
In probability theory expectation is a derived concept; for subjective probabilities this may be questioned.

Intervals for Probabilities (1)

Often it is not reasonable to assume that we know the probabilities exactly:
• What is the probability that your left knee will be OK by next week? ("at least 50%")
• What is the probability that the sun will be shining next Christmas? ("no idea, any probability")
• What is the probability that you are available tomorrow evening? ("at least 98%")
Consequence: We need to introduce uncertainty on the level of probabilities too!

Intervals for Probabilities (2)

Suppose $\Omega = \{E_1, \dots, E_n\}$ is a set of events with an unknown probability distribution $P(E_1), \dots, P(E_n)$, $\sum_i P(E_i) = 1$.
Def.: A set of real intervals $[L_i, U_i]$, $0 \le L_i \le U_i \le 1$, is called an n-dimensional probability interval for $\Omega$. Its purpose is to give estimates for the unknown probabilities.
Def.: An n-dimensional probability interval is called reasonable if there are numbers $p_i$, $0 \le p_i \le 1$, $\sum_i p_i = 1$, such that $L_i \le p_i \le U_i$ for all i.
Reasonable means that it estimates at least one probability distribution.

Intervals for Probabilities (3)

Proposition: An n-dimensional probability interval is reasonable if and only if $\sum_i L_i \le 1 \le \sum_i U_i$.
Proof: If the condition holds, then there is some number p, $0 \le p \le 1$, with $p \cdot \sum_i L_i + (1-p) \cdot \sum_i U_i = 1$; the desired probabilities are $P(E_i) = p \cdot L_i + (1-p) \cdot U_i$. On the other hand, the condition is necessary because otherwise the probabilities could not be normalized to 1.
A probability interval for an unknown distribution is a knowledge unit.
If several such units are given, then they can be combined, resulting in a sharper interval estimate.
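The proposition and the construction used in its proof translate directly into code. The sketch below represents an n-dimensional probability interval as a list of (L_i, U_i) pairs; the function names, the numerical tolerance and the example intervals are assumptions of this illustration.

```python
def is_reasonable(intervals, tol=1e-12):
    """Check the condition  sum(L_i) <= 1 <= sum(U_i)  of the proposition."""
    lower_sum = sum(l for l, _ in intervals)
    upper_sum = sum(u for _, u in intervals)
    return lower_sum <= 1.0 + tol and upper_sum >= 1.0 - tol


def witness_distribution(intervals):
    """Return probabilities p_i with L_i <= p_i <= U_i and sum 1, or None.

    Follows the proof: choose p with p*sum(L) + (1-p)*sum(U) = 1 and
    set P(E_i) = p*L_i + (1-p)*U_i.
    """
    if not is_reasonable(intervals):
        return None
    lower_sum = sum(l for l, _ in intervals)
    upper_sum = sum(u for _, u in intervals)
    if upper_sum == lower_sum:
        p = 1.0   # all intervals are points, so the choice of p does not matter
    else:
        p = (upper_sum - 1.0) / (upper_sum - lower_sum)
        p = min(1.0, max(0.0, p))   # guard against rounding just outside [0, 1]
    return [p * l + (1.0 - p) * u for l, u in intervals]


# Hypothetical estimates for three events.
intervals = [(0.2, 0.5), (0.1, 0.4), (0.3, 0.6)]
dist = witness_distribution(intervals)

too_high = witness_distribution([(0.6, 0.7), (0.5, 0.9)])   # lower bounds sum to 1.1
too_low = witness_distribution([(0.1, 0.2), (0.1, 0.3)])    # upper bounds sum to 0.5
```

The returned distribution is only one witness among possibly many; the interval estimate itself does not single out a particular distribution.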