Section 6: Impreciseness
CPSC 533 - Artificial Intelligence
Michael M. Richter
Forms of Impreciseness
Impreciseness occurs frequently:
• A number is only approximately correct (tolerance)
• A frequency says nothing about a specific case or is
not even exactly known (probability)
• The exact number is not of interest (abstraction)
• The expression itself has no exact meaning (informal
semantics, e.g. "this is useful")
• Some forms are of objective and others of
subjective character ("the tolerance is 0.2" or "the
weather is nice")
Forms of Uncertainty and Vagueness (2)
• We distinguish vagueness which has an objective origin from
vagueness which has a mainly subjective character.
• In an objective situation there is an agreement which has a formal
character and a model to which one can refer. The informal
notion then has a formal original. E.g. "high fever" has as original a
certain precise temperature (which may not be known or may not be
of interest).
• In a subjective situation there is no exact original (like "this is stupid");
only a subjective impression is reflected.
• In order to put subjective vagueness on solid foundations, the reasoning
in the model is replaced by experiments with the individuals who
express their subjective opinions.
• From the expression itself it may not be evident whether it has an
objective or a subjective meaning.
Subjective Uncertainty and a Turing Test (1)
Suppose there is a partial ordering "<"
associated with the concept C. The partial ordering
then again has two versions: formal and subjective.
The Turing test refers to these two versions of "<":
[Diagram: the formal version of C and the subjective human
version of C, to be verified by experiments. The tests present
variations of the arguments of <; the goal is that the human
says "up" if and only if the formal system says "up".]
Subjective Uncertainty and a Turing Test (2)
Example: fuzzy values for "typical lion".
Concept to grasp: the typical lion.
[Diagram: the human gives a subjective judgement; the formal
version uses an ordering based on the quotient length/height.
The partial ordering approximates the concept C in the sense
that the semantics of y < z is: z is more typical for C than y is.]
A Warning Example (1)
• Reasoning with vague concepts may easily lead to consequences
which are inconsistent.
• Example: We consider two medications A and B which are applied
to male and female patients in a hospital. The successes are
recorded.
• Three predicates are introduced with the following semantics:
• BetterM(X,Y): more successes with medication X applied to men
• BetterF(X,Y): more successes with X applied to women
• Better(X,Y): more successes with X in total.
• Question: Is BetterM(X,Y) ∧ BetterF(X,Y) → Better(X,Y) true? This
sounds plausible, but there is a counterexample.
A Warning Example (2)
Men:
        +     -
  A    20   180    10% success
  B     4    96     4% success     BetterM(A,B)

Women:
        +     -
  A    20    20    50% success
  B    37    63    37% success     BetterF(A,B)

Total:
        +     -
  A    40   200    16.6% success
  B    41   159    20.5% success   Better(B,A)

Although A is better for men and better for women, B is better
in total (an instance of Simpson's paradox).
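The numbers can be checked in a few lines of Python. A minimal sketch; only the counts come from the tables above, the layout is ours:

```python
# Success counts from the tables above: (successes, failures).
men   = {"A": (20, 180), "B": (4, 96)}
women = {"A": (20, 20),  "B": (37, 63)}

def rate(successes, failures):
    return successes / (successes + failures)

for name, group in (("men", men), ("women", women)):
    a, b = rate(*group["A"]), rate(*group["B"])
    print(f"{name}: A {a:.1%}, B {b:.1%} -> Better(A,B): {a > b}")

# Totals: A wins in each subgroup, yet loses overall (Simpson's paradox).
tot = {m: (men[m][0] + women[m][0], men[m][1] + women[m][1]) for m in "AB"}
a, b = rate(*tot["A"]), rate(*tot["B"])
print(f"total: A {a:.1%}, B {b:.1%} -> Better(B,A): {b > a}")
```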
Rough Sets (1): A Basic Method
• We consider a universe U; the uncertainty is given by a
binary relation "≈" over U called the indiscernibility relation.
• Assumption: ≈ is reflexive and symmetric.
• Idea: We cannot distinguish two objects a and b with a ≈ b.
• We consider some predicate P(x) over U (represented as a subset of U). The
relation ≈ motivates the following definition:
• Def.:
(i) The lower approximation of P is
Pu = { x ∈ U | for all y with x ≈ y : y ∈ P }
(ii) The upper approximation of P is
Po = { x ∈ U | there is y ∈ P with x ≈ y }
• Elements of Pu are surely in P and elements of Po are possibly in P.
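A minimal sketch of the two approximations, assuming a hypothetical tolerance relation on real numbers (values closer than a threshold count as indiscernible; this is the non-transitive type discussed two slides below):

```python
# Lower/upper approximations for a rough set, as a sketch.
# indiscernible(a, b) is a made-up tolerance relation: reflexive and
# symmetric, but not transitive in general.
EPS = 0.5

def indiscernible(a, b):
    return abs(a - b) < EPS

U = [0.0, 0.3, 0.6, 1.0, 1.4, 2.0]   # the universe
P = {x for x in U if x <= 1.0}       # the predicate P as a subset of U

lower = {x for x in U if all(y in P for y in U if indiscernible(x, y))}
upper = {x for x in U if any(indiscernible(x, y) for y in P)}

print(sorted(lower))          # surely in P
print(sorted(upper))          # possibly in P
print(sorted(upper - lower))  # the uncertainty area Po \ Pu
```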
Rough Sets (2)
[Diagram: the set P together with its upper approximation Po and
its lower approximation Pu; here we have x ≈ y and x1 ≈ y1.]
The set Po \ Pu is called the uncertainty area. Because decisions about
elements in Pu and about elements not in Po are certain, the rough set
method can be regarded as "staying on the safe side".
Rough Sets (3)
• There are basically two types of indiscernibility relations:
• ≈ is transitive: Then ≈ is an equivalence relation. A
typical example occurs in the attribute-value
representation when two objects cannot be
distinguished because the values of some attributes are
missing.
• ≈ is not transitive: Then ≈ is not an equivalence relation.
A typical example is when the domains of some attributes
are real numbers which cannot be distinguished if the
difference is smaller than some threshold (e.g. due to
measurement errors).
• Although the two types differ in many respects, the rough
set method applies to both of them.
Fuzzy Sets (1)
• Fuzzy sets are generalizations of ordinary ("crisp") sets.
• Suppose U is some (ordinary) set.
• A fuzzy subset X of U is defined by a function
µX : U → [0,1]
• Notation: X ⊆f U
• For y in U, µX(y) is called the degree of membership of
y in X, and µX is the membership function of X.
• Example: X = Young_Customer, µX(Bill) = 0.5 if Bill is of
age 32.
• This is easily generalized to n-ary relations.
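A membership function is simply a function into [0,1]. The sketch below gives a hypothetical shape for Young_Customer; only the value 0.5 at age 32 is given on the slide, the linear descent between ages 20 and 44 is an illustrative assumption:

```python
# Hypothetical membership function for the fuzzy set Young_Customer.
def mu_young(age: float) -> float:
    if age <= 20:
        return 1.0
    if age >= 44:
        return 0.0
    return (44 - age) / 24   # linear descent; equals 0.5 at age 32

print(mu_young(32))  # 0.5
```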
Fuzzy Sets (2)
In fuzzy logic and set theory many classical notions are
generalized.
A fuzzy partition of U into n fuzzy subsets
is given by membership functions
µ1(x), ..., µn(x) such that Σ(µi(x) | 1 ≤ i ≤ n) = 1.
In particular, disjoint now means disjoint to some degree.
A fuzzy classifier for such a partition is a mapping
cf : U → [0,1]^n
such that for
cf(x) = (y1(x), ..., yn(x)) we have Σ(yi(x) | 1 ≤ i ≤ n) = 1.
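A minimal sketch of a fuzzy partition and the induced fuzzy classifier; the two membership functions are made up, only the defining property that the values sum to 1 comes from the slide:

```python
# A fuzzy partition of U = [0, 10] into two fuzzy subsets whose
# membership values sum to 1 everywhere, and the induced classifier.
def mu_low(x):  return max(0.0, min(1.0, (6 - x) / 4))  # 1 below 2, 0 above 6
def mu_high(x): return 1.0 - mu_low(x)                  # complement keeps the sum at 1

def classify(x):
    return (mu_low(x), mu_high(x))   # c_f(x), components sum to 1

for x in (1, 3, 5, 8):
    y = classify(x)
    assert abs(sum(y) - 1.0) < 1e-9
    print(x, y)
```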
Fuzzy Sets (3)
• Fuzzy equality E is a special fuzzy equivalence relation satisfying (1)
E(x,x) = 1, (2) E(x,y) = E(y,x), (3) E(x,y) + E(y,z) - 1 ≤ E(x,z).
• A weakening of fuzzy equivalence is a similarity measure sim:
SIM ⊆f V := U × U, µSIM(x1, x2) := sim(x1, x2)
• Similarity measures generalize to two arbitrary sets:
SIM ⊆f V := A × B, µSIM(a, b) := sim(a, b)
We call the pairs (a,b) partners and sim(a,b) the degree of
partnership.
Example
This is a fuzzy partition: there is an area where high and low
fever overlap.
[Plot: membership functions for "low fever" and "high fever" over
the temperature axis from 35 to 42; the two curves overlap.]
In the intersection area: what is the decision, high or low?
Rationality principle: the one with the highest degree of membership.
There is one point where both decisions can be made.
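The rationality principle can be sketched directly; the crossover point at 38.5 °C is an assumption for illustration:

```python
# Decide for the class with the highest degree of membership.
def mu_low_fever(t):
    return max(0.0, min(1.0, 39.0 - t))

def mu_high_fever(t):
    return 1.0 - mu_low_fever(t)     # fuzzy partition: values sum to 1

def decide(t):
    low, high = mu_low_fever(t), mu_high_fever(t)
    if low > high:  return "low fever"
    if high > low:  return "high fever"
    return "tie: both decisions possible"   # the single crossover point

for t in (37.8, 38.5, 39.2):
    print(t, decide(t))
```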
Regular Membership Functions
It is not useful to consider arbitrary membership functions.
Def.: For T ⊆ IR a function µ : T → IR is called regular if
(i) µ is piecewise continuous
(ii) If x, y ∈ T and λ ∈ [0,1] then µ(λx + (1-λ)y) ≥ min(µ(x), µ(y))
(iii) There is some x in T with µ(x) = 1
(i) is often specialized to "piecewise linear", which makes it
computationally more feasible. Certain exceptions should be
allowed in order to include the jump from 0 to 1 in classical
logic.
(ii) excludes several separated maxima and (iii) postulates the
existence of an ideal argument.
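Triangular (piecewise-linear) functions are the standard instance of a regular membership function: piecewise continuous, no separated maxima, and reaching 1 at an ideal argument. A sketch with hypothetical parameters:

```python
# Triangular membership function with mu(m) = 1, rising on [a, m],
# falling on [m, b]; an example of a regular membership function.
def triangular(a, m, b):
    def mu(x):
        if x <= a or x >= b:
            return 0.0
        if x <= m:
            return (x - a) / (m - a)
        return (b - x) / (b - m)
    return mu

mu = triangular(36.0, 38.0, 40.0)
print(mu(37.0), mu(38.0), mu(39.5))   # 0.5 1.0 0.25
```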
Norms and Co-Norms (1)
t-norms f(x, y) (intended to compute µA∩B):
Axioms:
(T1) f(x, y) = f(y, x)
(T2) f(x, f(y, z)) = f(f(x, y), z)
(T3) x ≤ x', y ≤ y' → f(x, y) ≤ f(x', y')
(T4) f(x, 1) = x
Typical t-norms are f(x,y) = min(x,y) or f(x,y) = xy.
co-t-norms f(x, y) (intended to compute µA∪B):
Axioms:
(T1), (T2), (T3) and
(T4*) f(x, 0) = x
Typical co-t-norms are f(x,y) = max(x,y) or
f(x,y) = x + y - xy.
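The typical norms can be checked against axioms (T4) and (T4*); a minimal sketch:

```python
# The two typical t-norms and co-t-norms from the slide, with a quick
# check of (T4) f(x, 1) = x and (T4*) f(x, 0) = x on sample values.
t_min  = lambda x, y: min(x, y)
t_prod = lambda x, y: x * y
s_max  = lambda x, y: max(x, y)
s_sum  = lambda x, y: x + y - x * y   # probabilistic sum

for x in (0.0, 0.3, 0.8, 1.0):
    assert t_min(x, 1) == x and t_prod(x, 1) == x    # (T4)
    assert s_max(x, 0) == x and s_sum(x, 0) == x     # (T4*)
print(t_min(0.3, 0.8), t_prod(0.3, 0.8), s_max(0.3, 0.8), s_sum(0.3, 0.8))
```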
Norms and Co-Norms (2)
Consequences:
f(x, 0) = 0 for t-norms
f(x, 1) = 1 for co-t-norms.
Norms and co-norms cannot reflect relative
importance (e.g. using weights)
There are other fuzzy combination rules available
which are fuzzy versions of general Boolean operators
like different types of implications.
Negation and Complement
• A negation is a function neg: [0,1] → [0,1] with the
axioms
• (N1) neg(0) = 1, neg(1) = 0
• (N2) x ≤ y → neg(y) ≤ neg(x)
• The negation function is intended to compute
the membership function of the complement
C(A) of A.
• A typical example is neg(x) = 1 - x. For this
function the De Morgan laws hold:
µC(A∩B)(x) = µC(A)∪C(B)(x), µC(A∪B)(x) = µC(A)∩C(B)(x)
if the t-norm min and the co-t-norm max are used.
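The De Morgan laws for neg(x) = 1 - x with min and max can be verified numerically; a minimal sketch over a grid of membership values:

```python
# Checking both De Morgan laws for neg(x) = 1 - x with t-norm min and
# co-t-norm max, at all pairs of grid membership values.
neg = lambda x: 1.0 - x

grid = [i / 10 for i in range(11)]
for a in grid:            # a = mu_A(x), b = mu_B(x) at some point x
    for b in grid:
        assert abs(neg(min(a, b)) - max(neg(a), neg(b))) < 1e-9
        assert abs(neg(max(a, b)) - min(neg(a), neg(b))) < 1e-9
print("De Morgan laws hold for neg(x) = 1 - x with min/max")
```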
The Mamdani Implication
• We consider implications between literals, i.e. of the form
F1 ∧ F2 ∧ ... ∧ Fn → F. Because the literals often have
names from natural language, such implications are called
linguistic rules.
• Predicate logic interpretation: predicates, interpreted in a
model.
• Fuzzy logic interpretation: membership functions over the
universe.
• The membership function of the conjunction is
computed using a t-norm.
• The Mamdani implication treats the rule as a conjunction
F1 ∧ F2 ∧ ... ∧ Fn ∧ F.
Example
Linguistic rule: If temperature low and road narrow then
speed slow.
[Fuzzy representation: plots of the actual temperature and the
actual narrowness determine the inferred membership function
for "slow".]
The result of the implication is a membership function, not a single value.
To obtain such a value an operation called defuzzification is needed.
Defuzzification
• There are three major defuzzification methods which assign to a
membership function µ with domain Y an element y ∈ Y. We put
Max = {y ∈ Y | for all y' ∈ Y: µ(y') ≤ µ(y)}
• 1) Maximum method:
• Choose an arbitrary y ∈ Max
• 2) Mean-of-Maximum method:
• Choose y as the mean of Max:
• y = |Max|⁻¹ · Σ(y' | y' ∈ Max)
• 3) Center-of-Area method:
• Define y as the element below the center of gravity of the area
bounded by the curve µ and the y-axis:
• y = ∫ y'·µ(y') dy' / ∫ µ(y') dy'
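A sketch of the three methods for a membership function sampled on a grid (a numerical stand-in for the integrals above); the triangular µ is made up:

```python
# The three defuzzification methods on a grid-sampled membership function.
ys = [i / 100 for i in range(0, 101)]              # domain Y = [0, 1]
mu = lambda y: max(0.0, 1 - abs(y - 0.6) / 0.3)    # hypothetical triangular mu

mx  = max(mu(y) for y in ys)
Max = [y for y in ys if mu(y) == mx]

y_maximum  = Max[0]                                # 1) any element of Max
y_mean_max = sum(Max) / len(Max)                   # 2) mean of Max
y_coa = sum(y * mu(y) for y in ys) / sum(mu(y) for y in ys)  # 3) center of area

print(y_maximum, y_mean_max, round(y_coa, 3))
```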
Linguistic Expressions
• Linguistic expressions are taken from natural language:
• Something is large, expensive, ..., nice, pale, ...
• IF the water is very dirty THEN add much soap
• Ways of formalization:
• Classical logic: logical predicates and rules with binary
interpretation. A specific object satisfies a predicate or it does not.
• Fuzzy logic: fuzzy set with membership function. An object
satisfies a predicate to some degree (fuzzification).
• Rules where the conclusion is an action or a decision:
• Classical logic: the action can be performed if the preconditions are
satisfied.
• Fuzzy logic: the conclusion is a membership function, which
cannot be acted on directly. Defuzzification transforms it
into some action, depending on the degrees of truth of the
conditions.
Abstract and Linguistic Predicates
[Diagram: from the concrete level (real data) there are two
abstraction paths: abstraction to abstract classical predicates
and abstraction to linguistic predicates with fuzzy
interpretations, both leading to the abstract level;
instantiation and defuzzification lead back down to the
concrete level.]
Example (1)
Linguistic variables:
Variable: Xtemp; values: {low, med, high}
[Plot: membership functions for low, med, high over the
temperature axis, with marked values 0.7 and 0.49.]
Variable: Xpressure; values: {no, low, med, high}
[Plot: membership functions for no, low, med, high over the
pressure axis, with marked values 0.3, 0.21 and 0.09.]
Example (2)
New fuzzy variable for representing the decision class:
Variable: Xclass; values: {K1, K2}
[Plot: membership functions for K1 and K2 over the class axis,
both reaching 1.]
Rules:
R1: IF XTemp is high OR XPressure is no THEN XClass is K1
R2: IF XTemp is low OR XPressure is high THEN XClass is K1
R3: IF XTemp is high OR XPressure is high THEN XClass is K2
Actual query: X*Temp is med, X*Pressure is low
Example (3)
- Application of rule R1:
max_x min { µTemp,high(x), µTemp,med(x) } = 0.49
max_x min { µPressure,no(x), µPressure,low(x) } = 0.21
- Fuzzy value for the precondition of rule R1 (bounded sum for OR):
min{1, 0.49 + 0.21} = 0.7
- The resulting membership function for the conclusion
is a K1 singleton with value 0.7.
- Application of R2: the membership function for the conclusion
is a K1 singleton with value 0.58.
- Application of R3: the membership function for the conclusion
is a K2 singleton with value 0.58.
Example (4)
Aggregation of all rules leads to the following fuzzy set:
[Plot: K1 with membership value 0.7, K2 with membership value 0.58.]
The application of the "Mean-of-Max" defuzzification
operator (selecting the value with the maximum
membership value) leads to the crisp value K1.
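The whole inference of Examples (1) to (4) can be replayed in a few lines. A minimal sketch: the degrees 0.49 and 0.21 are stated in Example (3); the remaining two (temp low: 0.49, pressure high: 0.09) are back-solved from the stated rule results 0.58; the bounded sum min{1, x+y} models the OR, and rules with the same conclusion are aggregated by max, as the slides' numbers suggest:

```python
# Compatibility degrees at the query (X*Temp = med, X*Pressure = low);
# 0.49 and 0.21 are given in Example (3), 0.49 and 0.09 are back-solved
# from the stated results of R2 and R3.
temp     = {"high": 0.49, "low": 0.49}
pressure = {"no": 0.21, "high": 0.09}

bounded_sum = lambda x, y: min(1.0, x + y)   # co-t-norm used for OR

r1_k1 = bounded_sum(temp["high"], pressure["no"])    # rule R1 -> K1
r2_k1 = bounded_sum(temp["low"], pressure["high"])   # rule R2 -> K1
r3_k2 = bounded_sum(temp["high"], pressure["high"])  # rule R3 -> K2

k1 = max(r1_k1, r2_k1)   # aggregation over rules with the same conclusion
k2 = r3_k2
print(k1, k2, "->", "K1" if k1 > k2 else "K2")   # 0.7 0.58 -> K1
```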
Fuzzy Sets and Rough Sets
• We assume a fuzzy set, represented by a membership
function µ.
• A crisp interval property P is defined by a real number a,
0 ≤ a ≤ 1, s.t. P(x) ⇔ µ(x) < a (or µ(x) ≥ a), or it is a
complement of such a set.
• If the number a is not exactly known, this gives rise to a
rough set by introducing two numbers β, γ, 0 ≤ β ≤ a ≤ γ ≤ 1,
which function as thresholds:
Pu(x) ⇔ µ(x) ≤ β,  Po(x) ⇔ µ(x) < γ
• The smaller the difference between the thresholds, the
smaller the uncertainty area.
[Scale of µ-values: up to β, P(x) for sure; between β and γ
(around a), uncertainty; above γ, not P(x) for sure.]
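A sketch of the threshold construction; the concrete values of a, β and γ are made up:

```python
# P(x) <=> mu(x) < a with an imprecisely known a, replaced by the
# thresholds beta <= a <= gamma from the slide.
a, beta, gamma = 0.5, 0.4, 0.65

def region(mu_x):
    if mu_x <= beta:
        return "P(x) for sure"        # lower approximation Pu
    if mu_x < gamma:
        return "uncertainty area"     # Po \ Pu
    return "not P(x) for sure"        # outside Po

for mu_x in (0.2, 0.5, 0.9):
    print(mu_x, region(mu_x))
```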
Similarities
• We consider a fixed set U (the universe).
• Similarity has a relational and a functional aspect.
• Relational aspect: R(x,y,u,v), intended as "y is more similar
to x than v is to u".
• Special case: R(x,y,x,v), "y is more similar to x than v is
to x".
• Functional aspect: a similarity measure on a set U is a
mapping
sim: U × U → [0,1] (real interval)
• A dual notion is a distance measure.
• A similarity measure quantifies the degree of similarity.
• Fuzzy equalities are special cases of similarity measures.
From Fuzzy Sets to Similarities
In order to define a similarity measure one need not
start from pairs of objects. If we simply have a fuzzy set
K ⊆f U then we would need in addition a reference object
in order to define a measure. Such a reference object x has to
satisfy
µK(x) = 1.
In this case we can define a similarity measure by
sim(x, y) = µK(y)
and get a measure satisfying sim(x,x) = 1. If there is a
subset P ⊆ U such that for each x ∈ P we have some fuzzy
subset Kx ⊆f U with membership function µx(y), for which
µx(x) = 1 holds, then we can again define a measure on
U × P by
sim(y, x) = µx(y), y ∈ U, x ∈ P.
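A minimal sketch of the first construction: a fuzzy set K with a reference object x0 (µK(x0) = 1) induces sim(x0, y) = µK(y). The shape of µK is a made-up example:

```python
# A fuzzy set K with reference object X0 induces a similarity measure.
X0 = 5.0
mu_K = lambda y: max(0.0, 1 - abs(y - X0) / 5.0)   # peak value 1 at X0

def sim(x, y):
    assert x == X0   # the measure is only defined from the reference object
    return mu_K(y)

print(sim(X0, X0), sim(X0, 7.5))   # 1.0 0.5
```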
From Similarities to Fuzzy Sets
One can also start with a similarity measure and
associate some fuzzy subsets with it:
SIM ⊆f V := U × U, µSIM(x1, x2) := sim(x1, x2)
We associate to each x ∈ U a fuzzy subset
Fx ⊆f U
by µx(y) = µFx(y) = sim(x, y).
Reflexivity of sim means sim(x,x) = 1.
If sim is symmetric then µx(y) = µy(x) holds.
So from sim we obtain for each x some fuzzy subset
which can be regarded as describing how U is structured
by sim from the viewpoint of x: x regards itself as the
central element, and the degree of membership of the other
elements depends on their similarity to x.
Generalization: Partnerships
• We consider two arbitrary sets A and B.
• A partner relation is a fuzzy subset PART ⊆f V := A × B.
• Functional view:
part: A × B → [0,1] (real interval)
• Best partners are those with the highest degree of
partnership.
• Examples:
• A = set of women, B = set of men, partnership:
marriage
• A = customer demands, B = products, partnership:
customer satisfaction
• A = diseases, B = therapies, partnership: best for
health
Objects, Fuzzy Sets and Similarities
• We assume a universe U and a fuzzy set X of U with a
membership function µX.
• Up to now we have only considered similarities and
distances between objects of U, i.e. sim(object, object).
• There are three more possibilities:
• 1) sim(membership function, membership function)
• 2) sim(object, membership function)
• 3) The third is 2) with permuted arguments (this
plays a role if the first argument is considered as a
query and the second as an answer).
• Because membership functions interpret linguistic
expressions, this can also be interpreted as a similarity
between linguistic predicates and between objects and
linguistic predicates.
Distances between Membership Functions (1)
a) Crisp method:
Select ai for which µi(ai) is maximal, i = 1,2; d(µ1, µ2) = |a1 - a2|
b) Integral method:
Fi = area between µi and the x-axis
d(µ1, µ2) = Size(F1 Δ F2) (symmetric difference)
[Plot: two overlapping membership functions µ1 and µ2.]
Distances now compare fuzzy membership functions!
A corresponding similarity is e.g. sim(µ1, µ2) = 1 - d(µ1, µ2)
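Both methods are easy to sketch for membership functions sampled on a grid; the two triangular functions are made up, and the symmetric-difference area is approximated by a Riemann sum:

```python
# Crisp and integral distances between two sampled membership functions.
xs = [i / 100 for i in range(0, 1001)]      # domain [0, 10], step 0.01
m1 = lambda x: max(0.0, 1 - abs(x - 3.0))   # triangle around 3
m2 = lambda x: max(0.0, 1 - abs(x - 4.0))   # triangle around 4

# a) crisp method: distance between the arguments of the maxima
a1 = max(xs, key=m1)
a2 = max(xs, key=m2)
d_crisp = abs(a1 - a2)

# b) integral method: area of the symmetric difference of F1 and F2,
# which at each x contributes |m1(x) - m2(x)|
d_int = sum(abs(m1(x) - m2(x)) for x in xs) * 0.01

print(d_crisp, round(d_int, 2))   # 1.0 1.5
```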
Distances between Membership Functions (2)
• The disadvantage of the integral method is that two fuzzy functions
with disjoint areas always have the same distance; the crisp
method avoids this.
• The disadvantage of the crisp method is that the shape of the
curves does not play a role. The integral method avoids this.
• A combined method is as follows:
• If the areas are not disjoint, apply the integral method.
• If the areas are disjoint, add to the distance obtained by the
integral method the distance between the two points where both
curves reach zero.
• A generalization is obtained if the Euclidean distance |a1 - a2| is
replaced by an arbitrary distance measure.
Similarities between Objects and Fuzzy Functions
• Distinction: for y ∈ U, sim(y, µX) should not be the same as µX(y)!
E.g. for the fuzzy set high fever, 37°C and 36°C both have
membership value 0, but we expect sim(36, high fever) < sim(37,
high fever).
• In order to combine fuzzy sets and similarity measures we need
some simplifying assumptions:
• The universe U is an interval of real numbers (the approach
generalizes to the situation where U is partially ordered).
• We consider n linguistic predicates A1, ..., An where each
predicate Ai has a regular membership function which attains
the maximum value 1 at exactly one argument, denoted by m(Ai).
• There is a similarity measure sim given on U.
Application: Case-Based Reasoning
• Case-based reasoning (CBR) is a method which has its
origin in analogical reasoning.
• The main intention is to reuse previous experiences for
actual problems.
• The difficulty arises when the actual situation is not
identical to the previous one: there is an inexactness
involved.
• Its main aspect is that CBR techniques allow inexact
(approximate) reasoning in a controlled manner.
• CBR has many applications in knowledge management.
Case-Based Reasoning (CBR)
• Cases = experience = problem/solution pair.
• Cases are stored in a case base for further reuse.
• To solve a new problem:
• retrieve a case with a similar problem from the case
base
• adapt the solution from the most similar case to the
new problem
[Diagram: the new problem is matched by similarity (~) to the
problem of a stored case case-i in the case base; its solution
is adapted to give the new solution, which is then stored back
in the case base.]
Probabilities and Diagnosis
• In a diagnostic situation there are
• certain observations which are regarded as an event
E which has happened
• a number of hypothetical diagnoses H1, ..., Hn.
• Each hypothesis has a known a priori probability
P(Hi), and for each Hi the conditional probability
P(E | Hi) is assumed to be known.
• Given the a posteriori probabilities P(Hi | E) (the conditional
probabilities after the event E has happened), the maximum
likelihood principle selects a hypothesis: take the Hi for which
P(Hi | E) is maximal.
• It remains to compute the a posteriori probabilities from the known
ones.
Bayes Formula
Bayes' formula allows one to compute the desired probabilities:
P(Hi | E) = P(Hi) · P(E | Hi) / Σj=1..n P(Hj) · P(E | Hj)
For the proof we consider
P(Hi | E) = P(Hi ∩ E) / P(E) and P(E | Hi) = P(E ∩ Hi) / P(Hi)
and
P(E) = P(∪i (E ∩ Hi)) = Σi P(E | Hi) · P(Hi),
from which the formula follows directly.
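Bayes' formula translates directly into code; a minimal sketch with a made-up three-hypothesis example:

```python
# Priors P(Hi) and likelihoods P(E|Hi) in, posteriors P(Hi|E) out;
# the denominator is the total probability P(E).
def posteriors(priors, likelihoods):
    p_e = sum(p * l for p, l in zip(priors, likelihoods))   # P(E)
    return [p * l / p_e for p, l in zip(priors, likelihoods)]

priors      = [0.6, 0.3, 0.1]    # P(H1), P(H2), P(H3), hypothetical
likelihoods = [0.1, 0.5, 0.9]    # P(E|H1), P(E|H2), P(E|H3), hypothetical

post = posteriors(priors, likelihoods)
print([round(p, 3) for p in post], "-> choose H", post.index(max(post)) + 1)
```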
Bayesian Nets
• A Bayesian net is a semantic net with labels:
• Nodes: random variables
• Directed edges: conditional probabilities
• An edge from A to B exists if node A is a causal reason for node
B, i.e. the conditional probability is non-trivial.
• If the probability of some node is known, then the probability of
all linked nodes can be calculated.
• If certain nodes are initialized (i.e. the event happened), this
gives rise to probabilities of their child nodes: in this way the
probabilities are propagated over the net.
Subjective Probabilities
• They express the opinion of people and are therefore of an informal
nature.
• They are qualitatively represented with the basic notion
• A ≺ B ("B is more likely than A").
• The derived notion is A ~ B ⇔ neither A ≺ B nor B ≺ A.
• (see the chapter on partial orderings)
• The partial ordering allows one to establish Turing tests.
• There are various attempts to formulate axioms for the partial
ordering, with different aims:
• to derive a probability measure which induces the partial
ordering
• to grasp people's behavior when dealing with probabilities.
• In probability theory expectation is a derived concept; for
subjective probabilities this may be questioned.
Intervals for Probabilities (1)
• Often it is not reasonable to assume that we know
the probabilities exactly:
• What is the probability that your left knee will be ok
by next week? ("at least 50%")
• What is the probability that the sun will be shining
next Christmas? ("no idea, any probability")
• What is the probability that you are available
tomorrow evening? ("at least 98%")
• Consequence: we need to introduce uncertainty
on the level of probabilities too!
Intervals for Probabilities (2)
• Suppose W = {E1, ..., En} is a set of events with an
unknown probability distribution P(E1), ..., P(En),
Σi P(Ei) = 1.
• Def.: A set of real intervals [Li, Ui], 0 ≤ Li ≤ Ui ≤ 1, is called
an n-dimensional probability interval for W. Its purpose is
to give estimates for the unknown probabilities.
• Def.: An n-dimensional probability interval is called
reasonable if there are numbers pi, 0 ≤ pi ≤ 1, Σi pi = 1,
such that Li ≤ pi ≤ Ui for all i.
• Reasonable means it estimates at least one probability
distribution.
Intervals for Probabilities (3)
• Proposition: An n-dimensional probability interval is
reasonable if and only if Σi Li ≤ 1 ≤ Σi Ui.
Proof: If the condition holds, then there is some number
p, 0 ≤ p ≤ 1, with p·Σi Li + (1-p)·Σi Ui = 1; the desired
probabilities are P(Ei) = p·Li + (1-p)·Ui. On the other
hand the condition is necessary, because otherwise the
probabilities could not be normalized to 1.
• A probability interval for an unknown distribution is a
knowledge unit. If several such units are given, then
they can be combined, resulting in a sharper
interval estimate.
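Both the reasonableness criterion and the distribution constructed in the proof can be sketched directly; the interval data is made up:

```python
# Checking Sum(L) <= 1 <= Sum(U) and constructing the distribution
# P(Ei) = p*Li + (1-p)*Ui from the proof of the proposition.
def reasonable(intervals):
    lo = sum(l for l, u in intervals)
    hi = sum(u for l, u in intervals)
    return lo <= 1 <= hi

def witness(intervals):
    lo = sum(l for l, u in intervals)
    hi = sum(u for l, u in intervals)
    p = (hi - 1) / (hi - lo) if hi != lo else 0.0   # solves p*lo + (1-p)*hi = 1
    return [p * l + (1 - p) * u for l, u in intervals]

iv = [(0.2, 0.5), (0.1, 0.4), (0.3, 0.6)]   # a hypothetical 3-dim interval
print(reasonable(iv), witness(iv), sum(witness(iv)))   # True, ..., 1.0
```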