Course : T0273 EXPERT SYSTEMS
Year : 2012
Reasoning Under Uncertainty
Session 4
Lecture Outline
• Introduction to frames
• Learn the meaning of uncertainty and explore some theories designed to deal with it
• Find out what types of errors can be attributed to uncertainty and induction
• Learn about classical, experimental, subjective, and conditional probability
• Explore hypothetical reasoning and backward induction
Frames
• One type of schema that has been used in many AI
applications is the frame.
• Another type of schema is the script, which is essentially a
time-ordered sequence of frames.
• The basic characteristic of a frame is that it represents
related knowledge about a narrow subject that has much
default knowledge.
• The frame contrasts with the semantic net, which is
generally used for broad knowledge representation.
• A frame is analogous to a record structure in a high-level
language such as C or an atom with its property list in
LISP.
Frames
• A frame is basically a group of slots and fillers that defines a stereotypical object.
• Example: a car frame

  Slot           Filler
  manufacturer   General Motors
  model          Chevrolet Caprice
  year           1979
  transmission   automatic
  engine         gasoline
  tires          4
  color          Blue
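As an illustrative aside (not from the original slides), a frame such as the car frame above can be sketched as a dictionary of slots and fillers; the generic_car parent frame and get_slot helper below are hypothetical names used only to hint at how default knowledge can be inherited.

```python
# Illustrative sketch only: the car frame above as a Python dictionary of
# slots and fillers.
car_frame = {
    "manufacturer": "General Motors",
    "model": "Chevrolet Caprice",
    "year": 1979,
    "transmission": "automatic",
    "engine": "gasoline",
    "tires": 4,
    "color": "Blue",
}

# Default (generic) knowledge lives in a parent frame; a specific frame
# falls back to it when one of its slots has no filler.
generic_car = {"transmission": "automatic", "wheels": 4}

def get_slot(frame, slot, parent=None):
    """Return a slot filler, falling back to the parent frame's default."""
    if slot in frame:
        return frame[slot]
    return parent.get(slot) if parent is not None else None

print(get_slot(car_frame, "color"))                 # Blue
print(get_slot(car_frame, "wheels", generic_car))   # 4 (inherited default)
```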
Frames
• Frames are generally designed to represent either generic or specific knowledge.
• Example: a generic frame for property

  Slot               Filler
  name               property
  specialization_of  a_kind_of_object
  types              (car, boat, house)
                     if-added: Procedure ADD_PROPERTY
  owner              default: government
                     if-needed: Procedure FIND_OWNER
  location           (home, work, mobile)
  status             (missing, poor, good)
  under_warranty     (yes, no)
How Do Expert Systems Deal
with Uncertainty?
• Expert systems provide an advantage when dealing with
uncertainty as compared to decision trees.
• With decision trees, all the facts must be known to arrive
at an outcome.
• Probability theory is devoted to dealing with uncertainty.
• There are many theories of probability – each with
advantages and disadvantages.
What is Uncertainty?
• Uncertainty is essentially lack of information to formulate
a decision.
• Uncertainty may result in making poor or bad decisions.
• As living creatures, we are accustomed to dealing with
uncertainty – that’s how we survive.
• Dealing with uncertainty requires reasoning under
uncertainty along with possessing a lot of common
sense.
Theories for Uncertainty
• Bayesian Probability
• Hartley Theory
• Shannon Theory
• Dempster-Shafer Theory
• Markov Models
• Zadeh's Fuzzy Theory
Dealing with Uncertainty
• Deductive reasoning deals with exact facts and exact conclusions.
• Deduction proceeds from the general to the specific:
  All men are mortal
  Socrates is a man
  -> Socrates is mortal
• Induction goes from the specific to the general:
  My disk has never crashed
  -> My disk will never crash
Types of Error
Many different types of errors can contribute to uncertainty:
1. Data might be missing or unavailable.
2. Data might be ambiguous or unreliable due to measurement errors.
3. The representation of the data may be imprecise or inconsistent.
4. The data may be just the user's best guess (random).
5. Data may be based on defaults, and the defaults may have exceptions.
Reasoning Under Uncertainty
Given these sources of errors, most knowledge-based systems incorporate some form of uncertainty management.
There are three issues to be considered:
1. How to represent uncertain data.
2. How to combine two or more pieces of uncertain data.
3. How to draw inferences using uncertain data.
Reasoning Under Uncertainty
Errors and Induction
Deduction goes from the general to the specific:
  All men are mortal
  Socrates is a man
  therefore Socrates is mortal
Induction tries to generalize from the specific:
  My disk has never crashed.
  therefore my disk will never crash.
Reasoning Under Uncertainty
Inductive arguments can never be proven correct (except in mathematical induction). Inductive arguments can only provide some degree of confidence that the conclusion is correct.
Deductive errors or fallacies may also occur (the fallacy of affirming the consequent):
  If p implies q
  q is true
  therefore p
Example:
  If the valve is in good condition then the output is normal.
  The output is normal.
  Therefore, the valve is in good condition.
Uncertainty is a major problem in knowledge elicitation, especially when the expert's knowledge must be quantified in rules.
Approaches in Dealing with Uncertainty
Numerically oriented (quantitative) methods:
• Bayes' Rule
• Certainty Factors
• Dempster-Shafer Theory
• Fuzzy Sets
Symbolic approaches:
• Non-monotonic reasoning
• Cohen's Theory of Endorsements
• Fox's semantic systems
Classical Probability
• This is also called a priori probability. It is assumed that
all possible events are known and that each event is
equally likely to happen (rolling a die).
• Prior or unconditional probability is the one before the
evidence is received.
• Posterior or conditional probability is the one after the
evidence is received.
Theory of Probability
Experimental or Subjective Probabilities
• In contrast to the prior approach, experimental
probability defines the probability of an event P(E) as the
limit of a frequency distribution.
  P(E) = lim (N → ∞) f(E)/N
where f(E) is the frequency of occurrences of E in N trials. This type of probability is called a posteriori probability.
A subjective probability is a belief or opinion expressed as a probability, rather than a probability based on axioms or empirical measurements. It is applied to decisions about non-repeatable events.
Theory of Probability
Compound Probabilities
• What is the probability of rolling a die with an outcome of an even number divisible by 3?
  Event A = {2, 4, 6}
  Event B = {3, 6}
  A ∩ B = {6}
  P(A ∩ B) = n(A ∩ B) / n(S) = 1/6
  P(A ∩ B) = P(A) · P(B)
Theory of Probability
The two events are called stochastically independent
events if and only if the above formula is true.
Stochastic is a Greek word meaning "guess". It is
commonly used as a synonym for probability, random or
chance.
The probability of rolling a die with an outcome of an even number or a number divisible by 3:
  P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
           = 3/6 + 2/6 − 1/6 = 4/6
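A brief sketch (not part of the original slides) that checks these die-roll probabilities by enumerating the sample space; the event definitions below are simply the sets A and B from the example.

```python
from fractions import Fraction

# Sample space for one roll of a fair die, and the two events from the example.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even outcomes
B = {3, 6}      # outcomes divisible by 3

def prob(event):
    """Classical (a priori) probability: favourable outcomes / all outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

p_a, p_b = prob(A), prob(B)
p_and = prob(A & B)            # P(A ∩ B) = 1/6
p_or = p_a + p_b - p_and       # P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = 4/6

print(p_and == p_a * p_b)      # True: A and B are stochastically independent
print(p_or)                    # 2/3
```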
Theory of Probability
Conditional Probabilities
• The probability of an event A, given that event B occurred, is called a conditional probability and is indicated by P(A | B).
  P(A | B) = P(A ∩ B) / P(B),  for P(B) ≠ 0
  P(A | B) · P(B) = P(A ∩ B)
Bayes' Theorem
Bayes' theorem in terms of evidence E and hypotheses Hi:
  P(Hi | E) = P(E ∩ Hi) / Σj P(E ∩ Hj)
            = P(E | Hi) P(Hi) / Σj P(E | Hj) P(Hj)
            = P(E | Hi) P(Hi) / P(E)
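A minimal sketch (not from the slides) of the normalization form of Bayes' theorem above for a set of competing hypotheses; the priors and likelihoods used here are made-up illustrative numbers.

```python
def posterior(priors, likelihoods):
    """P(Hi | E) for each hypothesis Hi, given P(Hi) and P(E | Hi)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(E ∩ Hi)
    p_e = sum(joint)                                       # P(E) = Σj P(E|Hj) P(Hj)
    return [j / p_e for j in joint]

# Three competing hypotheses with made-up priors and likelihoods of evidence E.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.2, 0.6, 0.9]
print([round(p, 3) for p in posterior(priors, likelihoods)])  # posteriors sum to 1
```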
Bayes' Theorem
The conditional probability P(A | B) states the probability of event A given that event B occurred. The inverse problem is to find the inverse probability, which states the probability of an earlier event given that a later one occurred.
Example: the probability of choosing Brand X given that it has crashed.
This is an inverse or posterior probability.
Example
The table below shows hypothetical disk crashes using a Brand X drive within one year.

                    X     X'    Total of Rows
  Crash (C)         0.6   0.1   0.7
  No Crash (C')     0.2   0.1   0.3
  Total of Columns  0.8   0.2   1.0

P(C | X) = ?
  P(C | X)  = P(C ∩ X) / P(X)   = 0.6 / 0.8 = 0.75
  P(C | X') = P(C ∩ X') / P(X') = 0.1 / 0.2 = 0.50
P(X | C) = ?
  P(X | C)  = P(C ∩ X) / P(C)   = 0.6 / 0.7 = 6/7
  P(X | C)  = P(C | X) P(X) / P(C) = 0.75 × 0.8 / 0.7 = 0.6 / 0.7
Example
Suppose statistics show that a Brand X drive crashes within one year with probability 75%, and a non-Brand X drive crashes within one year with probability 50%. The inverse question is: what is the probability that a crashed drive is Brand X or non-Brand X?
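A short sketch (not from the slides) that answers this inverse question with Bayes' theorem, using the priors and crash probabilities from the table above.

```python
# Figures from the table: P(X) = 0.8, P(X') = 0.2,
# P(C | X) = 0.75, P(C | X') = 0.50.
p_x, p_not_x = 0.8, 0.2
p_c_given_x, p_c_given_not_x = 0.75, 0.50

# Total probability of a crash: P(C) = P(C|X)P(X) + P(C|X')P(X')
p_c = p_c_given_x * p_x + p_c_given_not_x * p_not_x        # 0.7

# Bayes' theorem gives the inverse (posterior) probabilities.
p_x_given_c = p_c_given_x * p_x / p_c                      # 6/7 ≈ 0.857
p_not_x_given_c = p_c_given_not_x * p_not_x / p_c          # 1/7 ≈ 0.143

print(round(p_x_given_c, 3), round(p_not_x_given_c, 3))
```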
Reasoning with Certainty Factors
During the development of MYCIN, researchers developed the certainty factor formalism for the following reasons:
• Medical domains lack the large quantities of data and/or the numerous approximations required by Bayes' theorem.
• There is a need to represent medical knowledge and heuristics explicitly, which cannot be done using probabilities alone.
• Physicians reason by gathering evidence that supports or denies a particular hypothesis.
Certainty Factor (CF) Formalism
Example of a MYCIN rule:
  IF the stain of the organism is gram-positive
  AND the morphology of the organism is coccus
  AND the growth of the organism is chains
  THEN there is evidence that the organism is streptococcus (CF = 0.7)
Given the evidence, a doctor only partially believes the conclusion.
• General form:
  IF E1 AND E2 ... THEN H (CF = CFi)
  where the Ei are pieces of evidence and H is the conclusion (hypothesis).
Certainty Factor (CF) Formalism
• A measure of belief, MB(h, e), indicates the degree to which our belief in hypothesis h is increased based on the presence of evidence e.
• A measure of disbelief, MD(h, e), indicates the degree to which our disbelief in hypothesis h is increased based on the presence of evidence e.
When
  P(h | e) = 0:  MB(h, e) = 0,  MD(h, e) = 1
  P(h | e) = 1:  MB(h, e) = 1,  MD(h, e) = 0
Certainty Factor (CF) Formalism
CF interpretation: a CF ranges from -1 (the hypothesis is definitely false) to +1 (the hypothesis is definitely true), with 0 meaning the evidence neither supports nor denies the hypothesis. (Interpretation scale figure omitted.)
Certainty Factor (CF) Formalism
Propagation of Certainty Factors
When there are two or more rules supporting the same conclusion, CFs are propagated as follows:
  CFrevised = CFold + CFnew (1 - CFold)                          if both CFold and CFnew > 0
            = CFold + CFnew (1 + CFold)                          if both CFold and CFnew < 0
            = (CFold + CFnew) / (1 - min(|CFold|, |CFnew|))      otherwise
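A minimal sketch (not part of the original slides) of the propagation rule above as a Python function; the name combine_cf is just an illustrative choice.

```python
def combine_cf(cf_old, cf_new):
    """Combine the CFs of two rules that support the same conclusion."""
    if cf_old > 0 and cf_new > 0:
        return cf_old + cf_new * (1 - cf_old)
    if cf_old < 0 and cf_new < 0:
        return cf_old + cf_new * (1 + cf_old)
    # Opposite signs: damp the sum by the smaller magnitude.
    return (cf_old + cf_new) / (1 - min(abs(cf_old), abs(cf_new)))

print(combine_cf(0.6, 0.4))     # 0.76: two positive rules reinforce each other
print(combine_cf(0.6, -0.4))    # 0.333...: conflicting evidence pulls the CF down
```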
Certainty Factor Example
In a murder trial, the defendant is being accused of first-degree murder (the hypothesis). The jury must balance the evidence presented by the prosecutor and the defense attorney to decide whether the suspect is guilty.
  RULE 001  IF the defendant's fingerprints are on the weapon,
            THEN the defendant is guilty.  CF = 0.75
  RULE 002  IF the defendant has a motive,
            THEN the defendant is guilty.  CF = 0.60
  RULE 003  IF the defendant has an alibi,
            THEN he is not guilty.  CF = -0.80
Certainty Factor Example
We start with CF = 0.0 for the defendant being guilty.
• After submission of evidence 1 (fingerprints on the weapon), with CFevid1 = 0.90:
  CFcomb1 = CF of rule 1's conclusion × CF of evidence 1
          = 0.75 × 0.90 = 0.675
  CFrevised = CFold + CFnew × (1 - CFold)
            = 0.0 + 0.675 × (1 - 0.0) = 0.675
Example of CFs Propagation
RULE 1. IF the defendant's fingerprints are on the weapon THEN the defendant is guilty (CFrule1 = 0.75)
Evidence: fingerprints on the weapon, CFevid1 = 0.90
  CFcon1 = CFevid1 × CFrule1 = 0.90 × 0.75 = 0.675   (single-premise rule)
  CFnew = CFcon1 = 0.675, CFold = 0.0
  CFrevised = CFold + CFnew × (1 - CFold) = 0.0 + 0.675 × (1 - 0.0) = 0.675
Certainty Factor Example
The defendant's mother-in-law says that he had the motive for the slaying; this evidence is assigned CFevid2 = 0.50.
RULE 2. IF the defendant has a motive THEN the defendant is guilty of the crime (CFrule2 = 0.60)
  CFcon2 = CFnew = CFevid2 × CFrule2 = 0.50 × 0.60 = 0.30   (single-premise rule)
  CFold = 0.675
  CFrevised = CFold + CFnew × (1 - CFold) = 0.675 + 0.30 × (1 - 0.675) = 0.7725
Certainty Factor Example
A respected judge testifies to the alibi, so a CF of 0.95 is assigned to this evidence (CFevid3 = 0.95).
RULE 3. IF the defendant has an alibi THEN he is not guilty (CFrule3 = -0.80)
  CFcon3 = CFnew = CFevid3 × CFrule3 = 0.95 × (-0.80) = -0.76
  CFold = 0.7725
Since CFold and CFnew have opposite signs:
  CFrevised = (CFold + CFnew) / (1 - min(|CFold|, |CFnew|))
            = (0.7725 - 0.76) / (1 - 0.76) = 0.052
Certainty Factor Example
The certainty factor for the guilty verdict after the introduction of all the evidence is therefore 0.052, indicating only very weak net support for guilt.
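As a quick check (not part of the original slides), the three pieces of evidence can be propagated step by step; the combination rule sketched earlier is restated inline so the snippet stands alone.

```python
def combine_cf(cf_old, cf_new):
    """MYCIN-style combination of CFs for rules sharing a conclusion."""
    if cf_old > 0 and cf_new > 0:
        return cf_old + cf_new * (1 - cf_old)
    if cf_old < 0 and cf_new < 0:
        return cf_old + cf_new * (1 + cf_old)
    return (cf_old + cf_new) / (1 - min(abs(cf_old), abs(cf_new)))

# (rule CF, evidence CF) for the three rules in the trial example.
rules = [(0.75, 0.90),    # RULE 001: fingerprints on the weapon
         (0.60, 0.50),    # RULE 002: motive exists
         (-0.80, 0.95)]   # RULE 003: alibi vouched for by a judge

cf = 0.0
for cf_rule, cf_evid in rules:
    cf = combine_cf(cf, cf_rule * cf_evid)   # single-premise rule: CFcon = CFevid * CFrule
    print(round(cf, 4))                      # 0.675, 0.7725, 0.0521
```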
Advantages of Certainty Factors
• It is a simple computational model that permits experts to
estimate their confidence in conclusions being drawn.
• It permits the expression of belief and disbelief in each
hypothesis, allowing the expression of the effect of
multiple sources of evidence.
• It allows knowledge to be captured in a rule
representation while allowing the quantification of
uncertainty.
• The gathering of the CF values is significantly easier
than the gathering of values for the other methods. No
statistical base is required – you merely have to ask the
expert for the values.
Difficulties
Deep Inference Chains
If we have a chain of inference such as:
  IF A THEN B   CF = 0.8
  IF B THEN C   CF = 0.9
then, because of the multiplication of CFs, the resulting CF decreases.
For example, if CF(A) = 0.8, then
  CF(C) = 0.8 × 0.8 × 0.9 = 0.58
With a long chain of inferences, the final CF may become very small.
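A tiny sketch (not from the slides) showing how CFs attenuate along a chain of rules; the 0.8 and 0.9 values are the rule CFs from the example above.

```python
# CF of the initial evidence and the CFs of the chained rules.
cf = 0.8                      # CF(A)
for rule_cf in (0.8, 0.9):    # IF A THEN B (CF 0.8), IF B THEN C (CF 0.9)
    cf *= rule_cf             # each step multiplies, so certainty only shrinks
    print(round(cf, 3))       # 0.64 for B, then 0.576 (about 0.58) for C
```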
Difficulties
Many Rules with same Conclusion
The more rules there are with the same conclusion, the higher the combined CF value. If there are many such rules, the CF can become artificially high.
Difficulties
Conjunctive Rules
If a rule has a number of conjunctive premises, the overall CF may be reduced too much.
  IF the sky is dark AND the temperature is dropping
  THEN it will rain   CF = 0.9
If CF(sky dark) = 1 and CF(temperature dropping) = 0.1, then
  CF(will rain) = min(1, 0.1) × 0.9 = 0.09
whereas if we had two separate rules:
  IF the sky is dark THEN it will rain   CF = 0.7
  IF the temperature is dropping THEN it will rain   CF = 0.5
  CF1 = 1 × 0.7 = 0.7
  CF2 = 0.1 × 0.5 = 0.05
  CF(will rain) = 0.7 + 0.05 × (1 - 0.7) = 0.7 + 0.015 = 0.715
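A small sketch (not part of the original slides) contrasting the two formulations above; the min-combination for AND-ed premises and the propagation rule are the ones given earlier in this lecture.

```python
def combine_cf(cf_old, cf_new):
    """Combination rule for two positive CFs supporting the same conclusion."""
    return cf_old + cf_new * (1 - cf_old)

cf_sky_dark, cf_temp_dropping = 1.0, 0.1

# Single conjunctive rule: the AND-ed premises contribute the minimum CF.
cf_conjunctive = min(cf_sky_dark, cf_temp_dropping) * 0.9
print(round(cf_conjunctive, 3))               # 0.09

# Two separate rules, each fired on its own evidence, then propagated together.
cf1 = cf_sky_dark * 0.7
cf2 = cf_temp_dropping * 0.5
print(round(combine_cf(cf1, cf2), 3))         # 0.715
```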
Fuzzy Logic
In everyday speech we use vague or imprecise terms to
describe properties.
Fuzzy logic was developed by Zadeh to deal with these
imprecise values in a mathematical way.
Fuzzy Logic
• It allows us to deal with fuzzy rules such as:
IF the temperature is cold
THEN the motor speed stops
IF speed is slow
THEN make acceleration high.
Fuzzy Sets
• In ordinary set theory, an element from the domain is
either in a set or not in a set.
• In fuzzy sets, a number in the range 0-1 is attached to an
element – the degree to which the element belongs to
the set.
• A value of 1 means the element is definitely in the set
• A value of 0 means the element is definitely not in the
set
• Other values are grades of membership.
• Formally, a fuzzy set A over a domain X is given by its membership function, which has type
  μA : X → [0, 1]
Fuzzy Sets
(Figures omitted: the fuzzy set of small men, and a simpler curve for the same set.)
Fuzzy Sets
• The figure (omitted here) showed the representation of three fuzzy sets for small, medium, and tall men. We see that a man of height 4.8 feet is considered both small and medium to some degree.
Boolean Operations
The Boolean operations of union, intersection, and complement can be defined in a straightforward manner.
Complement
The complement operation is
  μ¬A(x) = 1 − μA(x)
Boolean Operations
Intersection
The intersection of two fuzzy sets A and B is given by
  μA∩B(x) = min(μA(x), μB(x))
Union
The union of two fuzzy sets A and B is given by
  μA∪B(x) = max(μA(x), μB(x))
Fuzzy Reasoning
In this section, fuzzy rules and how inference is performed on these rules are presented.
This will be illustrated by a fuzzy system used to control an
air conditioner. The variables to be used (with fuzzy values)
are temperature (of the room) and speed (of the fan
motor).
Fuzzy Reasoning
The rules are given as follows:
• IF the temperature is cold THEN motor speed stops
• IF the temperature is cool THEN motor speed slows
• IF the temperature is just right THEN motor speed medium
• IF the temperature is warm THEN motor speed fast
• IF the temperature is hot THEN motor speed blast
(Figures omitted: temperature fuzzy sets; speed fuzzy sets.)
Fuzzy Reasoning
• In a fuzzy system all the rules fire in parallel, although in
the end many will not contribute to the output.
• What we need to determine in the above system is: given a particular value of the temperature, how do we calculate the motor speed?
Fuzzy Reasoning
• Now, the temperature can be measured fairly accurately, but it will lie in several fuzzy sets. For example, if the temperature were 17 °C then, from the (omitted) figure, we see that it is about 25% cool and 80% just right.
Fuzzy Reasoning
• This means that rules 2 and 3 will contribute to the output speed of the motor.
• The fuzzy set for the output can be calculated by multiplying the slow graph by 0.25 and the medium graph by 0.80, assuming the contribution is proportional to the fuzzy values of the input temperature.
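A compact sketch (not from the original slides) of this style of fuzzy inference for the air-conditioner example. The triangular membership functions and representative speed values are assumptions standing in for the omitted temperature and speed figures (chosen so that 17 °C is roughly 25% cool and 80% just right, as in the example), and a simple weighted average is used as the defuzzification step.

```python
def tri(x, a, b, c):
    """Triangular membership function that peaks at b and is zero outside [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Assumed temperature fuzzy sets (°C) and representative motor speeds (%).
temp_sets = {"cold": (-10, 0, 10), "cool": (5, 11, 19), "just right": (15, 17.5, 22.5),
             "warm": (20, 27, 34), "hot": (30, 40, 50)}
speed_for = {"cold": 0, "cool": 25, "just right": 50, "warm": 75, "hot": 100}

def motor_speed(temperature):
    """All rules fire in parallel; each contributes in proportion to its membership."""
    degrees = {name: tri(temperature, *abc) for name, abc in temp_sets.items()}
    total = sum(degrees.values())
    if total == 0:
        return 0.0
    # Weighted average of the rule outputs: a simple defuzzification step.
    return sum(degrees[name] * speed_for[name] for name in degrees) / total

print(round(motor_speed(17.0), 1))   # about 44: between the slow and medium speeds
```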
References
Textbooks:
• Joseph Giarratano, Gary Riley. 2005. Expert Systems: Principles and Programming. Thomson Course Technology. Australia. ISBN: 0-534-38447-1
• Stuart Russell, Peter Norvig. 2010. Artificial Intelligence: A Modern Approach. Pearson Education. New Jersey. ISBN: 978-0-13-207148-2
Web:
• www.widodo.com

Bina Nusantara University