CS6659-ARTIFICIAL INTELLIGENCE
VI SEM CSE
UNIT I - INTRODUCTION TO AI AND PRODUCTION SYSTEMS 9
Introduction to AI - Problem formulation, Problem definition - Production systems, Control strategies, Search strategies, Problem characteristics, Production system characteristics - Specialized production systems - Problem solving methods - Problem graphs, Matching, Indexing and Heuristic functions - Hill climbing - Depth first and Breadth first, Constraint satisfaction - Related algorithms, Measure of performance and analysis of search algorithms.
UNIT II - REPRESENTATION OF KNOWLEDGE 9
Game playing - Knowledge representation, Knowledge representation using Predicate logic,
Introduction to predicate calculus, Resolution, Use of predicate calculus, Knowledge
representation using other logic-Structured representation of knowledge.
UNIT III - KNOWLEDGE INFERENCE 9
Knowledge representation -Production based system, Frame based system. Inference –
Backward chaining, Forward chaining, Rule value approach, Fuzzy reasoning, Certainty factors, Bayesian theory - Bayesian network - Dempster-Shafer theory.
UNIT IV - PLANNING AND MACHINE LEARNING 9
Basic plan generation systems - STRIPS - Advanced plan generation systems - K-STRIPS - Strategic explanations - Why, Why not and How explanations. Learning - Machine learning, Adaptive learning.
UNIT V - EXPERT SYSTEMS 9
Expert systems - Architecture of expert systems, Roles of expert systems - Knowledge acquisition - Meta knowledge, Heuristics. Typical expert systems - MYCIN, DART, XCON, Expert system shells.
What is Artificial Intelligence?
Artificial Intelligence (AI) is a branch of science which deals with helping machines find solutions to complex problems in a more human-like fashion. This generally involves borrowing characteristics from human intelligence and applying them as algorithms in a computer-friendly way. A more or less flexible or efficient approach can be taken depending on the requirements established, which influences how artificial the intelligent behaviour appears. AI is generally associated with Computer Science, but it has many important links with other fields such as Maths, Psychology, Cognition, Biology and Philosophy, among many others. Our ability to combine knowledge from all these fields will ultimately benefit our progress in the quest to create an intelligent artificial being.
AI currently encompasses a huge variety of subfields, from general-purpose areas such as
perception and logical reasoning, to specific tasks such as playing chess, proving mathematical
theorems, writing poetry, and diagnosing diseases. Often, scientists in other fields move
gradually into artificial intelligence, where they find the tools and vocabulary to systematize
and automate the intellectual tasks on which they have been working all their lives. Similarly,
workers in AI can choose to apply their methods to any area of human intellectual endeavour.
In this sense, it is truly a universal field.
HISTORY OF AI
The origin of artificial intelligence lies in the earliest days of machine computation. During the 1940s and 1950s, AI began to grow with the emergence of the modern computer. Among the first researchers to attempt to build intelligent programs were Newell and Simon. Their first well known program, the Logic Theorist, proved statements using the accepted rules of logic and a problem solving procedure of their own design. By the late fifties, programs existed that could do a passable job of translating technical documents, and it was seen as only a matter of extra databases and more computing power to apply the techniques to less formal, more ambiguous texts. Most problem solving work revolved around the work of Newell, Shaw and Simon on the General Problem Solver (GPS). Unfortunately GPS did not fulfil its promise, and its failure was not simply due to a lack of computing capacity.
In the 1970s the most important concept of AI, the Expert System, was developed; it expresses the knowledge of an expert as a set of rules. The application area of expert systems is very large. The 1980s saw the development of neural networks as a method of learning from examples.
Prof. Peter Jackson (University of Edinburgh) classified the history of AI into three periods as:
1. Classical
2. Romantic
3. Modern
1. Classical Period:
It started around 1950, and in 1956 the concept of Artificial Intelligence came into existence. During this period, the main research work carried out included game playing, theorem proving and the concept of the state space approach for solving a problem.
2. Romantic Period:
It started in the mid-1960s and continued until the mid-1970s. During this period people were interested in making machines understand, which usually meant the understanding of natural language. During this period the knowledge representation technique known as the "semantic net" was developed.
3. Modern Period:
It started from 1970 and continues to the present day. This period turned to more complex problems and includes research on both the theoretical and practical aspects of Artificial Intelligence. It saw the birth of concepts like Expert Systems, Artificial Neurons and Pattern Recognition. Research on the various advanced concepts of Pattern Recognition and Neural Networks is still going on.
COMPONENTS OF AI
There are three types of components in AI
1) Hardware Components of AI
a) Pattern Matching
b) Logic Representation
c) Symbolic Processing
d) Numeric Processing
e) Problem Solving
f) Heuristic Search
g) Natural Language processing
h) Knowledge Representation
i) Expert System
j) Neural Network
k) Learning
l) Planning
m) Semantic Network
2) Software Components
a) Machine Language
b) Assembly language
c) High level Language
d) LISP Language
e) Fourth generation Language
f) Object Oriented Language
g) Distributed Language
h) Natural Language
i) Particular Problem Solving Language
3) Architectural Components
a) Uniprocessor
b) Multiprocessor
c) Special Purpose Processor
d) Array Processor
e) Vector Processor
f) Parallel Processor
g) Distributed Processor
Ten Definitions of Artificial Intelligence
1. AI is the study of how to make computers do things which, at the moment, people do better. This definition is ephemeral as it refers to the current state of computer science, and it excludes a major area: problems that cannot be solved well either by computers or by people at the moment.
2. AI is a field of study that encompasses computational techniques for performing tasks that
apparently require intelligence when performed by humans.
3. AI is the branch of computer science that is concerned with the automation of intelligent
behaviour. AI is based upon the principles of computer science, namely the data structures used in
knowledge representation, the algorithms needed to apply that knowledge and the languages
and programming techniques used in their implementation.
4. AI is the field of study that seeks to explain and emulate intelligent behaviour in terms of
computational processes.
5. AI is about generating representations and procedures that automatically or autonomously
solve problems heretofore solved by humans.
6. AI is the part of computer science concerned with designing intelligent computer systems, that is, computer systems that exhibit the characteristics we associate with intelligence in human behaviour, such as understanding language, learning, reasoning and solving problems.
7. AI is the study of mental faculties through the use of computational models.
8. AI is the study of the computations that make it possible to perceive, reason, and act.
9. AI is the exciting new effort to make computers think: machines with minds, in the full and literal sense.
10. AI is concerned with developing computer systems that can store knowledge and
effectively use the knowledge to help solve problems and accomplish tasks. This brief
statement sounds a lot like one of the commonly accepted goals in the education of humans.
We want students to learn (gain knowledge) and to learn to use this knowledge to help solve
problems and accomplish tasks.
WEAK AND STRONG AI
There are two conceptual positions in AI, namely weak AI and strong AI. Strong AI is very optimistic about the fact that a machine can be made capable of solving complex problems like an intelligent human. Its proponents claim that a computer can be more efficient at solving such problems than some human experts. According to strong AI, the computer is not merely a tool in the study of the mind; rather, the appropriately programmed computer really is a mind. Strong AI is the supposition that some forms of artificial intelligence can truly reason and solve problems. The term strong AI was originally coined by John Searle.
In contrast, weak AI is not so enthusiastic about the outcomes of AI and simply says that some thinking-like features can be added to computers to make them more useful tools. It says that computers cannot be made intelligent equal to human beings unless constructed significantly differently. Its proponents claim that
computers may be similar to human experts but not equal in any case. Generally, weak AI refers to the use of software to study or accomplish specific problem-solving tasks that do not encompass the full range of human cognitive abilities. An example of weak AI would be a chess program. Weak AI programs cannot be called "intelligent" because they cannot really think.
TASK DOMAIN OF AI
Areas of Artificial Intelligence
- Perception
· Machine Vision: It is easy to interface a TV camera to a computer and get an image into memory; the problem is understanding what the image represents. Vision takes a lot of computation; in humans, roughly 10% of all calories consumed are burned in vision computation.
· Speech Understanding: Speech understanding is available now. Some systems must be trained for the individual user and require pauses between words. Understanding continuous speech with a larger vocabulary is harder.
· Touch (tactile or haptic) Sensation: Important for robot assembly tasks.
- Robotics Although industrial robots have been expensive, robot hardware can be cheap:
Radio Shack has sold a working robot arm and hand for $15. The limiting factor in application
of robotics is not the cost of the robot hardware itself. What is needed is perception and
intelligence to tell the robot what to do; ``blind'' robots are limited to very well-structured
tasks (like spray painting car bodies).
- Planning Planning attempts to order actions to achieve goals. Planning applications include
logistics, manufacturing scheduling, planning manufacturing steps to construct a desired
product. There are huge amounts of money to be saved through better planning.
- Expert Systems Expert Systems attempt to capture the knowledge of a human expert and
make it available through a computer program. There have been many successful and
economically valuable applications of expert systems. Expert systems provide the following
benefits
• Reducing the skill level needed to operate complex devices.
• Diagnostic advice for device repair.
• Interpretation of complex data.
• ``Cloning'' of scarce expertise.
• Capturing the knowledge of an expert who is about to retire.
• Combining knowledge of multiple experts.
- Theorem Proving Proving mathematical theorems might seem to be mainly of academic
interest. However, many practical problems can be cast in terms of theorems. A general
theorem prover can therefore be widely applicable.
Examples:
• Automatic construction of compiler code generators from a description of a CPU's
instruction set.
• J Moore and colleagues proved the correctness of the floating-point division algorithm on an AMD CPU chip.
- Symbolic Mathematics Symbolic mathematics refers to manipulation of formulas,
rather than arithmetic on numeric values.
• Algebra
• Differential and Integral Calculus
Symbolic manipulation is often used in conjunction with ordinary scientific computation as a
generator of programs used to actually do the calculations. Symbolic manipulation programs
are an important component of scientific and engineering workstations.
- Game Playing Games are good vehicles for research because they are well formalized,
small, and self-contained. They are therefore easily programmed. Games can be good models
of competitive situations, so principles discovered in game-playing programs may be
applicable to practical problems.
AI Technique
Intelligence requires knowledge but knowledge possesses less desirable properties such as
- It is voluminous
- it is difficult to characterise accurately
- it is constantly changing
- it differs from data by being organised in a way that corresponds to its application
An AI technique is a method that exploits knowledge that is represented so that
- The knowledge captures generalisations; situations that share properties are grouped together rather than being given separate representations.
- It can be understood by people who must provide it; although for many programs the bulk of
the data may come automatically, such as from readings. In many AI domains people must
supply the knowledge to programs in a form the people understand and in a form that is
acceptable to the program.
- It can be easily modified to correct errors and reflect changes in real conditions.
- It can be widely used even if it is incomplete or inaccurate.
- It can be used to help overcome its own sheer bulk by helping to narrow the range of possibilities that must usually be considered.
Problem Spaces and Search
Building a system to solve a problem requires the following steps
- Define the problem precisely including detailed specifications and what constitutes an
acceptable solution;
- Analyse the problem thoroughly, for some features may have a dominant effect on the
chosen method of solution;
- Isolate and represent the background knowledge needed in the solution of the problem;
- Choose the best problem solving techniques in the solution.
Defining the Problem as a State Space Search
To understand what exactly artificial intelligence is, we illustrate some common problems.
Problems dealt with in artificial intelligence generally use a common term called 'state'. A
state represents a status of the solution at a given step of the problem solving procedure. The
solution of a problem, thus, is a collection of the problem states. The problem solving
procedure applies an operator to a state to get the next state. Then it applies another operator to
the resulting state to derive a new state. The process of applying an operator to a state and its
subsequent transition to the next state, thus, is continued until the goal (desired) state is
derived. Such a method of solving a problem is generally referred to as state space approach
For example, in order to solve the problem play a game, which is restricted to two person table
or board games, we require the rules of the game and the targets for winning as well as a
means of representing positions in the game. The opening position can be defined as the initial
state and a winning position as a goal state (there can be more than one). Legal moves allow for transfer from the initial state to other states leading to the goal state. However, the rules are far too
copious in most games, especially chess, where their number would exceed the number of particles in the universe. Thus the rules cannot in general be supplied accurately and computer programs cannot easily handle them.
cannot easily handle them. The storage also presents another problem but searching can be
achieved by hashing. The number of rules that are used must be minimised and the set can be
produced by expressing each rule in as general a form as possible. The representation of
games in this way leads to a state space representation and it is natural for well organised
games with some structure. This representation allows for the formal definition of a problem
which necessitates the movement from a set of initial positions to one of a set of target
positions. It means that the solution involves using known techniques and a systematic search.
This is quite a common method in AI.
Formal description of a problem
- Define a state space that contains all possible configurations of the relevant objects, without
enumerating all the states in it. A state space represents a problem in terms of states and
operators that change states
- Define some of these states as possible initial states;
- Specify one or more as acceptable solutions, these are goal states;
- Specify a set of rules as the possible actions allowed. This involves thinking about the
generality of the rules, the assumptions made in the informal presentation and how much work
can be anticipated by inclusion in the rules.
The control strategy is again not fully discussed but the AI program needs a structure to
facilitate the search which is a characteristic of this type of program.
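As an illustration of this formal description, the following is a minimal Python sketch of how a state-space problem might be packaged for a search routine. The class and field names (StateSpaceProblem, initial_states, rules, goal_test) are illustrative assumptions, not part of any standard library.

# A minimal, illustrative way to package a state-space problem for a search routine.
class StateSpaceProblem:
    def __init__(self, initial_states, rules, goal_test):
        self.initial_states = initial_states   # one or more possible initial states
        self.rules = rules                     # list of (condition, action) pairs
        self.goal_test = goal_test             # predicate: is this state a goal state?

    def successors(self, state):
        # Apply every rule whose condition is satisfied by the given state.
        for condition, action in self.rules:
            if condition(state):
                yield action(state)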
Example:
The water jug problem: There are two jugs, called four and three; four holds a maximum of four gallons and three a maximum of three gallons. How can we get 2 gallons into the jug four?
The state space is a set of ordered pairs giving the number of gallons in the pair of jugs at any
time ie (four, three) where four = 0, 1, 2, 3, 4 and three = 0, 1, 2, 3. The start state is (0,0)
and the goal state is (2,n) where n is a don't care but is limited to three holding from 0 to 3
gallons. The major production rules for solving this problem are shown below:
Rule | Current state | Condition | Next state | Action
1 | (four, three) | if four < 4 | (4, three) | fill four from the tap
2 | (four, three) | if three < 3 | (four, 3) | fill three from the tap
3 | (four, three) | if four > 0 | (0, three) | empty four into the drain
4 | (four, three) | if three > 0 | (four, 0) | empty three into the drain
5 | (four, three) | if four + three < 4 | (four + three, 0) | empty three into four
6 | (four, three) | if four + three < 3 | (0, four + three) | empty four into three
7 | (0, three) | if three > 0 | (three, 0) | empty three into four
8 | (four, 0) | if four > 0 | (0, four) | empty four into three
9 | (0, 2) | | (2, 0) | empty three into four
10 | (2, 0) | | (0, 2) | empty four into three
11 | (four, three) | if four < 4 | (4, three - diff) | pour diff, 4 - four, into four from three
12 | (four, three) | if three < 3 | (four - diff, 3) | pour diff, 3 - three, into three from four

A solution is given below:

Jug four | Jug three | Rule applied
0 | 0 | (start)
0 | 3 | 2
3 | 0 | 7
3 | 3 | 2
4 | 2 | 11
0 | 2 | 3
2 | 0 | 10
Control strategies
A good control strategy should satisfy the following requirements. The first requirement is that it causes motion: in a game-playing program the pieces move on the board, and in the water jug problem water is used to fill the jugs. The second requirement is that it is systematic: it would not be sensible to fill a jug and empty it repeatedly, nor, in a game, would it be advisable to move a piece round and round the board in a cyclic way. We shall initially consider two systematic approaches to searching.
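As a sketch of a control strategy that both causes motion and is systematic, the following Python fragment searches the water jug state space breadth-first, using pour amounts equivalent to the production rules above. The function names and the use of a FIFO queue are illustrative choices, not the only possible strategy.

from collections import deque

# Breadth-first control strategy for the (four, three) water jug problem.
def successors(state):
    four, three = state
    nxt = set()
    nxt.add((4, three))                          # fill four from the tap
    nxt.add((four, 3))                           # fill three from the tap
    nxt.add((0, three))                          # empty four into the drain
    nxt.add((four, 0))                           # empty three into the drain
    pour = min(three, 4 - four)
    nxt.add((four + pour, three - pour))         # pour from three into four
    pour = min(four, 3 - three)
    nxt.add((four - pour, three + pour))         # pour from four into three
    nxt.discard(state)
    return nxt

def solve(start=(0, 0), goal_amount=2):
    frontier = deque([[start]])                  # FIFO queue keeps the search systematic
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1][0] == goal_amount:           # 2 gallons in the four-gallon jug
            return path
        for state in successors(path[-1]):
            if state not in visited:             # never revisit a state, so no cycling
                visited.add(state)
                frontier.append(path + [state])
    return None

print(solve())   # prints one shortest sequence of states ending with 2 gallons in jug four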
Production Systems
A knowledge representation formalism of this kind consists of a collection of condition-action rules (production rules or operators), a database which is modified in accordance with the rules, and a production system interpreter which controls the operation of the rules, i.e. the 'control mechanism' of a production system, determining the order in which production rules are fired.
A system that uses this form of knowledge representation is called a production system.
A production system consists of rules and facts. Knowledge is encoded in a declarative form comprising a set of rules of the form
Situation ------------ Action
i.e. SITUATION implies ACTION.
Example: IF the initial state is a goal state THEN quit.
The major components of an AI production system are
i. A global database
ii. A set of production rules and
iii. A control system
The global database is the central data structure used by an AI production system. The production rules operate on the global database. Each rule has a
precondition that is either satisfied or not by the database. If the precondition is satisfied, the
rule can be applied. Application of the rule changes the database. The control system chooses
which applicable rule should be applied and ceases computation when a termination condition
on the database is satisfied. If several rules are to fire at the same time, the control system
resolves the conflicts.
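The control loop just described can be pictured with a small, illustrative interpreter sketch in Python; the rule format (a condition function plus an action function) and the conflict-resolution policy (take the first applicable rule) are simplifying assumptions, not a fixed standard.

# Illustrative production-system interpreter: global database + rules + control system.
def run_production_system(database, rules, terminated):
    while not terminated(database):
        applicable = [rule for rule in rules if rule["condition"](database)]
        if not applicable:
            return None                          # no rule applies: computation fails
        chosen = applicable[0]                   # trivial conflict resolution: first match wins
        database = chosen["action"](database)    # applying a rule changes the database
    return database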
Four classes of production systems:
1. A monotonic production system
2. A non-monotonic production system
3. A partially commutative production system
4. A commutative production system
Advantages of production systems:
1. Production systems provide an excellent tool for structuring AI programs.
2. Production systems are highly modular because the individual rules can be added, removed or modified independently.
3. The production rules are expressed in a natural form, so the statements contained in the knowledge base should be a recording of an expert thinking out loud.
Disadvantages of production systems:
One important disadvantage is the fact that it may be very difficult to analyse the flow of control within a production system because the individual rules don't call each other.
Production systems describe the operations that can be performed in a search for a solution to
the problem. They can be classified as follows.
Monotonic production system: A system in which the application of a rule never prevents the later application of another rule that could also have been applied at the time the first rule was selected.
Partially commutative production system: A production system in which, if the application of a particular sequence of rules transforms state X into state Y, then any allowable permutation of those rules also transforms state X into state Y.
Theorem proving falls under monotonic, partially commutative systems. Blocks world and 8-puzzle problems come under non-monotonic, partially commutative systems, while problems like chemical analysis and synthesis come under monotonic, not partially commutative systems. Playing the game of bridge comes under non-monotonic, not partially commutative systems.
For any problem, several production systems exist. Some will be more efficient than others.
Though it may seem that there is no relationship between kinds of problems and kinds of
production systems, in practice there is a definite relationship.
Partially commutative, monotonic production systems are useful for solving ignorable
problems. These systems are important from an implementation standpoint because they can be implemented without the ability to backtrack to previous states when it is discovered that an incorrect path has been followed. Such systems increase efficiency since it is not necessary to keep track of the changes made in the search process.
Non-monotonic, partially commutative systems are useful for problems in which changes occur but can be reversed and in which the order of operations is not critical (for example, the 8-puzzle problem).
Production systems that are not partially commutative are useful for many problems in which
irreversible changes occur, such as chemical analysis. When dealing with such systems, the
order in which operations are performed is very important and hence correct decisions have to
be made at the first time itself.
Monotonic and Non monotonic Learning :
Monotonic learning is when an agent may not learn any knowledge that contradicts what it
already knows. For example, it may not replace a statement with its negation. Thus, the
knowledge base may only grow with new facts in a monotonic fashion. The advantages of
monotonic learning are:
1. Greatly simplified truth maintenance
2. Greater choice in learning strategies
Non-monotonic learning is when an agent may learn knowledge that contradicts what it
already knows. So it may replace old knowledge with new if it believes there is sufficient
reason to do so. The advantages of non-monotonic learning are:
1. Increased applicability to real domains
2. Greater freedom in the order things are learned
A related property is the consistency of the knowledge. If an architecture must maintain a
consistent knowledge base then any learning strategy it uses must be monotonic.
7- PROBLEM CHARACTERISTICS
A problem may have different aspects of representation and explanation. In order to choose
the most appropriate method for a particular problem, it is necessary to analyze the problem
along several key dimensions. Some of the main key features of a problem are given below.
- Is the problem decomposable into a set of sub-problems?
- Can the solution steps be ignored or undone?
- Is the problem universe predictable?
- Is a good solution to the problem obvious without comparison to all the possible solutions?
- Is the desired solution a state of the world or a path to a state?
- Is a large amount of knowledge absolutely required to solve the problem?
- Will the solution of the problem require interaction between the computer and a person?
The above characteristics of a problem are called the seven problem characteristics, under which the solution must take place.
PRODUCTION SYSTEM AND ITS CHARACTERISTICS
The production system is a model of computation that can be applied to implement search
algorithms and model human problem solving. Such problem solving knowledge can be
packed up in the form of little quanta called productions. A production is a rule consisting of a
situation recognition part and an action part. A production is a situation-action pair in which
the left side is a list of things to watch for and the right side is a list of things to do. When productions are used in deductive systems, the situations that trigger productions are specified combinations of facts. The actions are restricted to assertions of new facts deduced directly from the triggering combination. Such production systems may be called premise-conclusion pairs rather than situation-action pairs.
A production system consists of the following components.
(a) A set of production rules, each with a left hand side constituent that represents the current problem state and a right hand side that represents an output state. A rule is applicable if its left hand side matches the current problem state.
(b) A database, which contains all the appropriate information for the particular task.
Some part of the database may be permanent while some part of this may pertain only
to the solution of the current problem.
(c) A control strategy that specifies the order in which the rules will be compared with the database and a way of resolving the conflicts that arise when several rules match simultaneously.
(d) A rule applier, which checks the applicability of a rule by matching the current state with the left hand side of the rule and finds the appropriate rule from the database of rules.
The important roles played by production systems include a powerful knowledge
representation scheme. A production system not only represents knowledge but also action. It
acts as a bridge between AI and expert systems. Production system provides a language in
which the representation of expert knowledge is very natural. We can represent knowledge in
a production system as a set of rules of the form
IF (condition) THEN (action)
along with a control system and a database. The control system serves as a rule interpreter
and sequencer. The database acts as a context buffer, which records the conditions evaluated
by the rules and information on which the rules act. The production rules are also known as
condition – action, antecedent – consequent, pattern – action, situation – response, feedback –
result pairs.
For example,
If (you have an exam tomorrow)
THEN (study the whole night)
The production system can be classified as monotonic, non-monotonic, partially
commutative and commutative.
Figure Architecture of Production System
Features of Production System
Some of the main features of production system are:
Expressiveness and intuitiveness: In the real world, situations often take the form "if this happens, you will do that" or "if this is so, then this should happen", and many more. The production rules essentially tell us what to do in a given situation.
1. Simplicity: The structure of each sentence in a production system is unique and uniform as it uses the "IF-THEN" structure. This structure provides simplicity in knowledge representation. This feature of a production system improves the readability of production rules.
2. Modularity: This means production rules code the knowledge available in discrete pieces. Information can be treated as a collection of independent facts which may be added to or deleted from the system with essentially no deleterious side effects.
3. Modifiability: This means the facility of modifying rules. It allows production rules to be developed in a skeletal form first and then refined to suit a specific application.
4. Knowledge intensive: The knowledge base of production system stores pure knowledge.
This part does not contain any type of control or programming information. Each
production rule is normally written as an English sentence; the problem of semantics is
solved by the very structure of the representation.
Disadvantages of production system
1. Opacity: This problem is generated by the combination of production rules. Opacity arises from the lack of prioritization among rules; the more clearly a rule is prioritized, the less the opacity.
2. Inefficiency: During execution of a program several rules may be active. A well devised control strategy reduces this problem. As the rules of a production system are large in number and are hardly ever written in a hierarchical manner, some form of complex search through all the production rules is required for each cycle of the control program.
3. Absence of learning: Rule based production systems do not store the result of a problem for future use. Hence, they do not exhibit any type of learning capability, and each time a particular problem is posed the solution may have to be derived afresh.
4. Conflict resolution: The rules in a production system should not have any type of conflict
operations. When a new rule is added to a database, it should ensure that it does not have
any conflicts with the existing rules.
Problem characteristics
Analyze each of the following problems with respect to the seven problem characteristics:
- Chess
- Water jug
1. Chess
- Is the problem decomposable? No. One game has a single solution; it cannot be split into independent sub-problems.
- Can solution steps be ignored or undone? No. In an actual game (not on a computer) we cannot undo previous moves.
- Is the problem universe predictable? No. The problem universe is not predictable because we are not sure about the move of the other (second) player.
- Is a good solution absolute or relative? Absolute. An absolute solution means that once you get one solution you do not need to bother about other possible solutions; a relative solution means that once you get one solution you have to find the other possible solutions to check which is best (i.e. lowest cost). By this consideration chess is absolute.
- Is the solution a state or a path? Path. For natural language understanding, some words have different interpretations and a sentence may therefore be ambiguous; to solve that problem we only need to find an interpretation, and the workings (the path to the solution) are not necessary. In chess, by contrast, the winning (goal) state is described by the path of moves leading to it.
- What is the role of knowledge? A lot of knowledge helps to constrain the search for a solution.
- Does the task require human interaction? No. Conversational problems involve intermediate communication between a person and the computer, either to provide additional assistance to the computer or to provide additional information to the user, or both. In chess such additional assistance is not required.
2. Water jug
- Is the problem decomposable? No. There is one single solution; the problem is not decomposed.
- Can solution steps be ignored or undone? Yes.
- Is the problem universe predictable? Yes. The problem universe is predictable because solving it requires only one person, and we can predict what will happen in the next step.
- Is a good solution absolute or relative? Absolute. The water jug problem may have a number of solutions, but once we have found one solution there is no need to bother about the others, because it does not affect the cost.
- Is the solution a state or a path? Path. The solution is the path of pourings leading to the goal state.
- What is the role of knowledge? A lot of knowledge helps to constrain the search for a solution.
- Does the task require human interaction? Yes. Additional assistance is required, for example to get the jugs or a pump.
ALGORITHM OF PROBLEM SOLVING
No single algorithm is applicable to all types of problems in all situations. So there should be a general problem solving algorithm, which may work for different strategies on different problems.
Algorithm (problem name and specification)
Step 1: Analyze the problem to get the starting state and the goal state.
Step 2: Find out the data about the starting state and the goal state.
Step 3: Find out the production rules from the initial database for moving the problem from the starting state towards the goal state.
Step 4: Select some rules from the set of rules that can be applied to the data.
Step 5: Apply those rules to the initial state and proceed to get the next state.
Step 6: Determine the new states generated after applying the rules and accordingly make one of them the current state.
Step 7: Finally, check whether the current state matches the goal state; when it does, the goal state has been reached.
Step 8: Exit.
After applying the above steps a user may get the solution of the problem, moving from the given state to the goal state. Let us take a few examples.
VARIOUS TYPES OF PROBLEMS AND THEIR SOLUTIONS
Water Jug Problem
Definition:
Some jugs are given which are not calibrated. At least one of the jugs is filled with water. The process by which we divide the whole quantity of water among the jugs according to the question is called the water jug problem.
Procedure:
Suppose that you are given 3 jugs A, B and C with capacities 8, 5 and 3 liters respectively, but not calibrated (i.e. there is no measuring mark on them). Jug A is filled with 8 liters of water. By a series of pourings back and forth among the 3 jugs, divide the 8 liters into 2 equal parts, i.e. 4 liters in jug A and 4 liters in jug B. How?
In this problem, the start state is that the jug A will contain 8 liters water whereas jug B and
jug C will be empty. The production rules involve filling a jug with some amount of water,
taking from the jug A. The search will be finding the sequence of production rules which
transform the initial state to final state. The state space for this problem can be described by
set of ordered pairs of three variables (A, B, C) where variable A represents the 8 liter jug,
variable B represents the 5 liter and variable C represents the 3 liters jug respectively.
Figure
The production rules are formulated as follows:
Step 1:
In this step, the initial state will be (8, 0, 0) as the jug B and jug C will be empty. So the water
of jug A can be poured like:
(5, 0, 3) means 3 liters to jug C and 5 liters will remain in jug A.
(3, 5, 0) means 5 liters to jug B and 3 liters will be in jug A.
(0, 5, 3) means 5 liters to jug B and 3 liters to jug C, and jug A will be empty.
Step 2:
In this step, start with the first current state of step 1, i.e. (5, 0, 3). This state can only be extended by pouring the 3 liters of water in jug C into jug B, so the state will be (5, 3, 0). Next, come to the second current state of step 1, i.e. (3, 5, 0). This state can only be extended by pouring the 5 liters of water in jug B into jug C; 3 liters go into jug C and 2 liters remain in jug B, so the state will be (3, 2, 3). Finally come to the third current state of step 1, i.e. (0, 5, 3). From this state no further states can be generated, because every pouring leads to (5, 0, 3), (3, 5, 0) or (8, 0, 0), which are repeated states. Hence this state is not considered again for moving towards the goal.
So the states will be:
(5, 0, 3) → (5, 3, 0)
(3, 5, 0) → (3, 2, 3)
(0, 5, 3) → X (no new state)
Step 3:
In this step, start with the first current state of step 2, i.e. (5, 3, 0), and proceed likewise with the other states:
(5, 3, 0) → (2, 3, 3)
(3, 2, 3) → (6, 2, 0)
Step 4:
In this step, start with the current states of step 3, i.e. (2, 3, 3) and (6, 2, 0), and proceed.
(2, 3, 3) → (2, 5, 1)
(6, 2, 0) → (6, 0, 2)
Step 5:
(2, 5, 1) → (7, 0, 1)
(6, 0, 2) → (1, 5, 2)
Step 6:
(7, 0, 1) → (7, 1, 0)
(1, 5, 2) → (1, 4, 3)
Step 7:
(7, 1, 0) → (4, 1, 3)
(1, 4, 3) → (4, 4, 0) (Goal)
So finally the state will be (4, 4, 0), which means jug A and jug B contain 4 liters of water each, which is our goal state. One thing you have to be very careful about when pouring water from one jug to another is that the capacity of the receiving jug must be able to contain that much water.
The tree of the water jug problem can be like:
Figure
Comments:
This problem takes a lot of time to find the goal state.
The process of searching in this problem is very lengthy.
At each step of the problem the user has to strictly follow the production rules; otherwise the problem may go on indefinitely.
Missionaries and Cannibals Problem
Definition:
In the Missionaries and Cannibals problem, initially some missionaries and some cannibals are on one side of a river and want to cross it. There is only one boat available, with a capacity of two, and the cannibals must never be allowed to outnumber the missionaries on either bank. Solving the problem by finding the sequence of safe states is called the Missionaries and Cannibals problem.
Procedure:
Let us take an example of the same kind. Initially a boatman, grass, a tiger and a goat are present at the left bank of the river and want to cross it. The only boat available can carry two objects at a time (the boatman and one other). The condition of safe crossing is that at no time are the tiger and the goat, or the goat and the grass, left together on either side of the river without the boatman. How will they cross the river?
The objective of the solution is to find the sequence of their transfers from one bank of the river to the other, using the boat, while satisfying these constraints.
Let us use the following representations:
B: Boat
T: Tiger
G: Goat
Gr: Grass
Step 1:
According to the question, the initial state will be (B, T, G, Gr), as the boatman and all the objects are on one side of the river. The different states that can be generated from this state are:
The states (B, T, O, O) and (B, O, O, Gr) will not be considered, because the boatman cannot take the tiger first (leaving the goat with the grass) or the grass first (leaving the tiger with the goat).
Step 2:
Now consider the current state of step 1, i.e. the state (B, O, G, O) on the right side of the river. The boatman returns alone, so on the left side the state becomes (B, T, O, Gr), i.e.
(O, O, G, O) (right) and (B, T, O, Gr) (left)
Step 3:
Now proceed according to the left and right sides of the river such that the condition of the
problem must be satisfied.
Step 4:
First, consider the first current state on the right side of step 3 i.e.
Now consider the second current state on the right side of step-3 i.e.
Step 5:
Now first consider the first current state of step 4 i.e.
Hence the final state will be (B, T, G, Gr) which are on the right side of the river.
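For completeness, a compact sketch of how this crossing could be searched automatically is given below; a state is represented as the set of items on the left bank plus the side the boat is on. The representation and the safety check are assumptions made for illustration.

from collections import deque

# Illustrative solver for the boatman (B), tiger (T), goat (G), grass (Gr) puzzle.
# A state is (frozenset of items on the left bank, side of the boat: 'L' or 'R').
ITEMS = {"T", "G", "Gr"}

def safe(bank, boatman_here):
    # A bank is unsafe only when the boatman is absent and a forbidden pair is together.
    if boatman_here:
        return True
    return not ({"T", "G"} <= bank or {"G", "Gr"} <= bank)

def solve():
    start = (frozenset(ITEMS), "L")              # everything on the left, boat on the left
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        left, side = path[-1]
        if not left and side == "R":             # everything (and the boatman) on the right
            return path
        here = left if side == "L" else ITEMS - left
        for cargo in [None] + sorted(here):      # boatman crosses alone or with one item
            new_left = left - {cargo} if side == "L" else left | ({cargo} - {None})
            new_side = "R" if side == "L" else "L"
            if safe(new_left, new_side == "L") and safe(ITEMS - new_left, new_side == "R"):
                state = (new_left, new_side)
                if state not in seen:
                    seen.add(state)
                    frontier.append(path + [state])
    return None

print(solve())   # the seven-crossing solution, starting by taking the goat across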
Comments:
This problem requires a lot of space for its state implementation.
It takes a lot of time to search for the goal node.
The production rules at each level of state are very strict.
Chess Problem
Definition:
It is a normal chess game. In a chess problem, the start is the initial configuration of the chessboard. The final state is any board configuration which is a winning position for a player. There may be multiple final positions, and each board configuration can be thought of as representing a state of the game. Whenever a player moves any piece, it leads to a different state of the game.
Procedure:
Figure
The above figure shows a 3x3 chessboard with each square labeled with integers 1 to 9. We
simply enumerate the alternative moves rather than developing a general move operator
because of the reduced size of the problem. Using a predicate called move in predicate
calculus, whose parameters are the starting and ending squares, we have described the legal
moves on the board. For example, move (1, 8) takes the knight from the upper left-hand corner
to the middle of the bottom row. While playing Chess, a knight can move two squares either
horizontally or vertically followed by one square in an orthogonal direction as long as it does
not move off the board.
All possible moves for the figure are as follows:
move(1, 8)   move(6, 1)
move(1, 6)   move(6, 7)
move(2, 9)   move(7, 2)
move(2, 7)   move(7, 6)
move(3, 4)   move(8, 3)
move(3, 8)   move(8, 1)
move(4, 1)   move(9, 2)
move(4, 3)   move(9, 4)
The above predicates of the chess problem form the knowledge base for this problem. A unification algorithm is used to access the knowledge base.
Suppose we need to find the positions to which the knight can move from a particular location,
square 2. The goal move(2, X) unifies with two different predicates in the knowledge base, with the substitutions {7/X} and {9/X}. Given the goal move(2, 3), the response is failure, because no fact move(2, 3) exists in the knowledge base.
Comments:
In this game a lot of production rules are applied for each move on the chessboard.
A lot of searching is required in this game.
Implementation of the algorithm over the knowledge base is very important.
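The knowledge base of move facts can be mimicked with a small Python sketch; instead of full unification, a simple lookup returns the bindings for a goal such as move(2, X). This is an illustration only, not an implementation of a real unification algorithm.

# The move(X, Y) facts for the knight on the 3x3 board, stored as a set of tuples.
MOVES = {
    (1, 8), (1, 6), (2, 9), (2, 7), (3, 4), (3, 8), (4, 1), (4, 3),
    (6, 1), (6, 7), (7, 2), (7, 6), (8, 3), (8, 1), (9, 2), (9, 4),
}

def query(square):
    # Answer the goal move(square, X): return every binding for X.
    return [dest for (src, dest) in MOVES if src == square]

print(query(2))          # the two substitutions for X, i.e. 9 and 7 (in some order)
print((2, 3) in MOVES)   # False: the goal move(2, 3) fails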
8- Queen Problem
Definition:
"We have 8 queens and an 8x8 chessboard having alternate black and white squares. The queens are placed on the chessboard. Any queen can attack any other queen placed on the same row, column or diagonal. We have to find a placement of the queens on the chessboard such that no queen attacks any other queen."
Procedure:
Figure A possible board configuration of 8 queen problem
The figure shows a possible board configuration for the 8-queen problem. The board has alternating black and white positions on it, and the different positions on the board hold the queens. The production rule for this game is that you cannot put two queens in the same row, the same column or the same diagonal. After shifting a single queen from its position on the board, the user has to shift the other queens according to the production rule. Starting from the first row on the board, the queens of the corresponding rows and columns are moved from their original positions to other positions. Finally the player has to ensure that no two queens share a row, column or diagonal.
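One common way to realise this production rule is a row-by-row backtracking search; the sketch below is an illustrative version of that idea, not the only possible placement procedure.

# Illustrative backtracking solver: place one queen per row, trying columns in order.
def solve_queens(n=8, placed=()):
    row = len(placed)
    if row == n:
        return placed                              # placed[r] is the column of the queen in row r
    for col in range(n):
        if all(col != c and abs(col - c) != row - r    # no shared column or diagonal
               for r, c in enumerate(placed)):
            result = solve_queens(n, placed + (col,))
            if result is not None:
                return result
    return None                                    # dead end: backtrack

print(solve_queens())   # e.g. (0, 4, 7, 5, 2, 6, 1, 3)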
Comments:
This problem requires a lot of space to store the board.
It requires a lot of searching to reach the goal state.
The time factor for each queen's move is very lengthy.
The problem is very strict about the production rules.
8- Puzzle Problem
Definition:
"It has a 3x3 board with 9 block spaces, out of which 8 blocks hold tiles bearing numbers from 1 to 8. One space is left blank. A tile adjacent to the blank space can move into it. We have to arrange the tiles in a sequence to reach the goal state."
Procedure:
The 8-puzzle problem belongs to the category of "sliding block puzzle" problems. The 8-puzzle is a square tray in which eight square tiles are placed; the remaining ninth square is uncovered. Each tile in the tray has a number on it. A tile that is adjacent to the blank space can be slid into that space. The game consists of a starting position and a specified goal position. The goal is to transform the starting position into the goal position by sliding the tiles around. The control mechanism for an 8-puzzle solver must keep track of the order in which operations are performed, so that the operations can be undone one at a time if necessary. The objective of the puzzle is to find a sequence of tile movements that leads from a starting configuration to a goal configuration, such as the two situations given below.
Figure
(Starting State)
(Goal State)
The state of 8-puzzle is the different permutation of tiles within the frame. The operations are
the permissible moves up, down, left, right. Here at each step of the problem a function f(x)
will be defined
which is the combination of g(x) and h(x), i.e.
f(x) = g(x) + h(x)
where
g(x): the number of steps already taken, i.e. the cost of reaching the current state from the initial state.
h(x): the heuristic estimate of the number of steps needed to reach the goal state from the current state; for example, it can be obtained by comparing the current state with the goal state and counting how many tiles are displaced.
After calculating the f(x) value of each candidate state at a step, take the state with the smallest f(x) as the next current state, and repeat until the goal state is reached. Let us take an example.
Figure
(Initial State)
(Goal State)
Step 1:
f(x) is the estimated number of steps required to reach the goal state from the initial state. In the tray, either 6 or 8 can change position to fill the empty square, so there are two possible successor states, namely B and C. The f(x) value of B is 6 and that of C is 4. As 4 is the minimum, take C as the current state for the next step.
Step 2:
In this step, three states can be drawn from tray C: the empty position can be filled by 5, 3 or 6, so three different states are obtained. Then calculate each of their f(x) values and take the minimum one. Here state F has the minimum value, i.e. 4, and hence it is taken as the next current state.
Step 3:
Tray F can have 4 different states, as the empty position can be filled by 4 values, i.e. 2, 4, 5, 8.
Step 4:
In step 3, tray I has the smallest f(x) value. Tray I can be expanded into 3 different states because the empty position can be filled by 7, 8 or 6. Hence we reach the goal state after a few changes of tiles in different positions of the trays.
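The evaluation used in these steps can be sketched as follows, with h(x) taken to be the number of misplaced tiles; encoding the board as a tuple of 9 numbers (0 standing for the blank) is an assumption made for illustration.

# Illustrative evaluation of f(x) = g(x) + h(x) for the 8-puzzle.
# A board is a tuple of 9 numbers read row by row, with 0 standing for the blank.
def h(state, goal):
    # Heuristic: the number of tiles displaced from their goal position.
    return sum(1 for tile, target in zip(state, goal) if tile != 0 and tile != target)

def successors(state):
    # Slide a neighbouring tile into the blank (up, down, left or right).
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            other = r * 3 + c
            board = list(state)
            board[blank], board[other] = board[other], board[blank]
            yield tuple(board)

def best_next_state(state, g, goal):
    # Choose the successor with the smallest f = (g + 1) + h, as in the steps above.
    return min(successors(state), key=lambda s: (g + 1) + h(s, goal))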
Comments:
This problem requires a lot of space for saving the different trays.
Time complexity is more than that of other problems.
The user has to be very careful about the shifting of tiles in the trays.
Very complex puzzle games can be solved by this technique.
Tower of Hanoi Problem
Definition:
"We are given a tower of eight discs (initially four in the applet below), initially stacked in increasing size on one of three pegs. The objective is to transfer the entire tower to one of the other pegs (the rightmost one in the applet below), moving only one disc at a time and never placing a larger one onto a smaller one."
Procedure:
The Tower of Hanoi puzzle was invented by the French mathematician Édouard Lucas in 1883. The puzzle is well known to students of computer science since it appears in virtually every introductory text on data structures and algorithms.
The objective of the puzzle is to move the entire stack to another rod, obeying the
following rules.
Only one disc can be moved at a time.
Each move consists of taking the upper disc from one of the rods and sliding it onto another rod, on top of the other discs that may already be present on that rod.
No disc may be placed on top of a smaller disc.
There is a legend about a Vietnamese temple which contains a large room with three time-worn posts in it, surrounded by 64 golden disks. The priests of Hanoi, acting out the command of an ancient prophecy, have been moving these disks in accordance with the rules of the puzzle since that time. The puzzle is therefore also known as the Tower of Brahma puzzle. According to the legend, when the last move of the puzzle is completed, the world will end.
There are many variations on this legend. For instance, in some tellings, the temple is a
monastery and the priests are monks. The temple or monastery may be said to be in
different parts of the world including Hanoi, Vietnam and may be associated with any
religion. The flag tower of Hanoi may have served as the inspiration for the name.
The puzzle can be played with any number of disks, although many toy versions have
around seven to nine of them. The game seems impossible to many novices yet is
solvable with a simple algorithm. The following solution is a very simple method to solve
the tower of Hanoi problem.
Alternate moves between the smallest piece and a non-smallest piece. When moving the smallest piece, always move it in the same direction (to the right if the starting number of pieces is even, to the left if the starting number of pieces is odd). If there is no tower in the chosen direction, move the piece to the opposite end, but then continue to move in the correct direction; for example, if you started with three pieces, you would move the smallest piece to the opposite end, then continue in the left direction after that.
When the turn is to move the non-smallest piece, there is only one legal move. Doing
this should complete the puzzle using the least amount of moves to do so. Finally, the user
will reach at the goal. Also various types of solutions may be possible to solve
the tower of Hanoi problem like recursive procedure, non-recursive procedure and
binary solution procedure.
Another simple solution to the problem is given below.
For an even number of disks: make the legal move between pegs A and B, then the legal move between pegs A and C, then the legal move between pegs B and C. Repeat until complete.
For an odd number of disks: make the legal move between pegs A and C, then the legal move between pegs A and B, then the legal move between pegs B and C. Repeat until complete.
A recursive solution for the Tower of Hanoi problem is as follows.
A key to solving this problem is to recognize that it can be solved by breaking it down into a collection of smaller problems, and further breaking those down into even smaller problems, until a solution is reached. The following procedure demonstrates this approach.
Label the pegs A, B and C; these labels may refer to different pegs at different steps. Let n be the total number of disks. Number the disks from 1 (smallest, topmost) to n (largest, bottommost). To move n disks from peg A to peg C:
a) Move n-1 disks from A to B. This leaves disk #n alone on peg A.
b) Move disk #n from A to C.
c) Move n-1 disks from B to C so they sit on disk #n.
To carry out steps a and c, apply the same algorithm again for n-1 disks. The entire procedure takes a finite number of steps, since at some point the algorithm will be required for n = 1, and this step, moving a single disc from peg A to peg B, is trivial.
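The recursive procedure just described translates almost directly into code; a minimal sketch:

# Recursive Tower of Hanoi: move n disks from peg 'A' to peg 'C' using 'B' as the spare.
def hanoi(n, source="A", spare="B", target="C"):
    if n == 0:
        return
    hanoi(n - 1, source, target, spare)                  # step (a): move n-1 disks out of the way
    print("move disk", n, "from", source, "to", target)  # step (b): move the largest disk
    hanoi(n - 1, spare, source, target)                  # step (c): put the n-1 disks back on top

hanoi(3)   # prints the 7 moves needed for three disks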
Comments:
The tower of Hanoi is frequently used in psychological research on problem solving.
This problem is frequently used in neuro-psychological diagnosis and treatment of
executive functions.
The Tower of Hanoi is also used as a backup rotation scheme when performing computer data backups where multiple tapes or media are involved.
This problem is very popular for teaching recursive algorithms to beginning programming students.
A pictorial version of this puzzle is programmed into the Emacs editor, accessed by
typing M-x hanoi.
The tower of Hanoi is also used as a test by neuro-psychologists trying to
evaluate frontal lobe deficits.
Cryptarithmatic Problem
Definition:
"It is an arithmetic problem represented in letters. It involves decoding the digits represented by characters. It takes the form of an arithmetic equation where the digits are distinctly represented by characters, and the problem requires finding the digit represented by each character. Assign a decimal digit to each of the letters in such a way that the answer to the problem is correct. If the same letter occurs more than once, it must be assigned the same digit each time. No two different letters may be assigned the same digit."
Procedure:
The cryptarithmetic problem is an interesting constraint satisfaction problem for which different algorithms have been developed. A cryptarithm is a mathematical puzzle in which digits are replaced by letters of the alphabet or other symbols. Cryptarithmetic is the science and art of creating and solving cryptarithms.
The constraints defining a cryptarithmetic problem are as follows:
1) Each letter or symbol represents one and only one digit throughout the problem.
2) When the digits replace the letters or symbols, the resulting arithmetic operation must be correct.
These two constraints lead to some other restrictions in the problem.
For example:
Consider that the base of the numbers is 10. Then there can be at most 10 unique symbols or letters in the problem; otherwise it would not be possible to assign a unique digit to each unique letter or symbol. To be semantically meaningful, a number must not begin with 0, so the letters at the beginning of each number should not correspond to 0. One can solve the problem by a simple blind search, but a rule based searching technique can provide the solution in less time.
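As a concrete illustration of these constraints, here is a small brute-force sketch for the well-known puzzle SEND + MORE = MONEY; the particular puzzle is chosen only as an example, since the text above does not prescribe one. It enforces each constraint named above: one digit per letter, no shared digits, no leading zeros, and a correct sum.

from itertools import permutations

# Brute-force cryptarithmetic solver for SEND + MORE = MONEY (illustrative puzzle).
def value(word, assignment):
    return int("".join(str(assignment[ch]) for ch in word))

def solve():
    letters = sorted(set("SENDMOREMONEY"))                # 8 distinct letters, at most 10 allowed
    for digits in permutations(range(10), len(letters)):  # distinct digits per letter
        assignment = dict(zip(letters, digits))
        if assignment["S"] == 0 or assignment["M"] == 0:
            continue                                      # numbers must not begin with 0
        if value("SEND", assignment) + value("MORE", assignment) == value("MONEY", assignment):
            return assignment
    return None

print(solve())   # expected: O=0, M=1, Y=2, E=5, N=6, D=7, R=8, S=9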
SEARCHING
Problem solving in artificial intelligence may be characterized as a systematic search
through a range of possible actions in order to reach some predefined goal or solution.
In AI, problem solving by search algorithms is quite a common technique. In the coming age of AI it will have a big impact on the technologies of robotics and path finding. It is also widely used in travel planning. This chapter contains the different search algorithms of AI used in various applications. Let us look at the concepts used for visualizing the algorithms.
A search algorithm takes a problem as input and returns the solution in the form of an
action sequence. Once the solution is found, the actions it recommends can be carried
out. This phase is called the execution phase. After formulating a goal and a problem to solve, the agent calls a search procedure to solve it. A problem can be defined by 5 components.
a) The initial state: The state from which agent will start.
b) The goal state: The state to be finally reached.
c) The current state: The state at which the agent is present after starting from the
initial state.
d) Successor function: It is the description of possible actions and their outcomes.
e) Path cost: It is a function that assigns a numeric cost to each path.
DIFFERENT TYPES OF SEARCHING
Searching algorithms can be of various types. When any type of searching is performed, there may or may not be some information about the search space, and the searching procedure may depend upon constraints or rules. However, searching can generally be classified into two types, i.e. uninformed searching and informed searching. Some further classifications of these searches are given in the figure below.
The searching algorithms are divided into two categories
1. Uninformed Search Algorithms (Blind Search)
2. Informed Search Algorithms (Heuristic Search)
There are six Uninformed Search Algorithms
1. Breadth First Search
2. Uniform-cost search
3. Depth-first search
4. Depth-limited search
5. Iterative deepening depth-first search
6. Bidirectional Search
There are three Informed Search Algorithms
1. Best First Search
2. Greedy Search
3. A* Search
Blind search Vs Heuristic search
Blind search: No information about the number of steps (or) path cost from the current state to the goal state. Heuristic search: The path cost from the current state to the goal state is calculated, and the minimum path cost is selected as the next state.
Blind search: Less effective search method. Heuristic search: More effective search method.
Blind search: The problem is solved with the given information only. Heuristic search: Additional information can be added as assumptions to solve the problem.
Breadth-first search
Breadth-first search is a simple strategy in which the root node is expanded first, then all the successors of the root node are expanded next, then their successors, and so on. In general, all the nodes at a given depth in the search tree are expanded before any nodes at the next level are expanded.
Breadth-first search can be implemented by calling TREE-SEARCH with an empty fringe that is a first-in-first-out (FIFO) queue, assuring that the nodes that are visited first will be expanded first. In other words, calling TREE-SEARCH(problem, FIFO-QUEUE()) results in a breadth-first search. The FIFO queue puts all newly generated successors at the end of the queue, which means that shallow nodes are expanded before deeper nodes.
Breadth-first search trees after node expansions
Example: Route finding problem
Task: Find a path from S to G using BFS
The path in the 2nd depth level is selected, i.e. S-B-G (or) S-C-G.
Algorithm:
function BFS(problem) returns a solution or failure
    return TREE-SEARCH(problem, FIFO-QUEUE())
Time and space complexity:
Time complexity = 1 + b + b^2 + ...... + b^d = O(b^d)
The space complexity is the same as the time complexity, because all the leaf nodes of the tree must be maintained in memory at the same time = O(b^d)
Completeness: Yes
Optimality: Yes, provided the path cost is a non decreasing function of the depth of
the node
Advantage: Guaranteed to find the single solution at the shallowest depth level.
Disadvantage: Suitable only for small instances of problems (i.e. the number of levels should be minimum (or) the branching factor should be minimum).
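A minimal Python sketch of breadth-first search over the successor map assumed earlier (illustrative, not the textbook TREE-SEARCH pseudocode itself):

# Minimal sketch: breadth-first search with a FIFO queue (illustrative).
from collections import deque

def bfs(successors, start, goal):
    frontier = deque([[start]])          # FIFO queue of paths
    while frontier:
        path = frontier.popleft()        # shallowest path first
        state = path[-1]
        if state == goal:
            return path
        for nxt, _cost in successors.get(state, []):
            frontier.append(path + [nxt])
    return None

graph = {"S": [("A", 1), ("B", 2)], "A": [("G", 5)], "B": [("G", 1)], "G": []}
print(bfs(graph, "S", "G"))              # ['S', 'A', 'G'], a shallowest path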
Uniform-cost search
Breadth-first search is optimal when all step costs are equal, because it always expands the shallowest unexpanded node. By a simple extension, we can find an algorithm that is optimal with any step cost function. Instead of expanding the shallowest node, uniform-cost search expands the node n with the lowest path cost. Note that if all step costs are equal, this is identical to breadth-first search.
Uniform-cost search does not care about the number of steps a path has, but only
about their total cost.
Example: Route finding problem
Task : Find a minimum path cost from S to G
Since the path cost of A is lower, it is expanded first, but it is not optimal; B is expanded next. S-B-G is the path with the minimum path cost. There is no need to expand the path through C, because the cost to reach C from S is high, and the goal state has already been reached along the previous path with minimum cost.
Time and space complexity:
The time complexity is the same as for breadth-first search, except that instead of the depth level the minimum path cost is considered.
Time complexity: O(b^d)
Space complexity: O(b^d)
Completeness: Yes
Optimality: Yes
Advantage: Guaranteed to find the single solution at minimum path cost.
Disadvantage: Suitable only for small instances of problems.
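A minimal Python sketch of uniform-cost search using a priority queue ordered by path cost (illustrative; the map is the one assumed earlier):

# Minimal sketch: uniform-cost search with a priority queue keyed on path cost.
import heapq

def uniform_cost_search(successors, start, goal):
    frontier = [(0, [start])]                 # (path_cost, path)
    best = {start: 0}
    while frontier:
        cost, path = heapq.heappop(frontier)  # cheapest path so far
        state = path[-1]
        if state == goal:
            return cost, path
        for nxt, step in successors.get(state, []):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, path + [nxt]))
    return None

graph = {"S": [("A", 1), ("B", 2)], "A": [("G", 5)], "B": [("G", 1)], "G": []}
print(uniform_cost_search(graph, "S", "G"))   # (3, ['S', 'B', 'G'])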
Depth-first search
Depth-first search always expands the deepest node in the current fringe of the search tree. The search proceeds immediately to the deepest level of the search tree, where the nodes have no successors. As those nodes are expanded, they are dropped from the fringe, so the search then "backs up" to the next shallowest node that still has unexplored successors. This strategy can be implemented by TREE-SEARCH with a last-in-first-out (LIFO) queue, also known as a stack.
Depth first search tree with 3 level expansion
Example: Route finding problem
Task: Find a path from S to G using DFS
The path in the 3rd depth level is selected, i.e. S-A-D-G.
Algorithm:
function DFS(problem) return a solution or failure
TREE-SEARCH(problem, LIFO-QUEUE())
Time and space complexity:
In the worst case depth-first search has to expand all the nodes.
Time complexity: O(b^m)
The nodes are expanded towards one particular direction, which requires memory only for those nodes.
Space complexity: O(bm)
Example: b = 2, m = 2, so bm = 4
Completeness: No
Optimality: No
Advantage: If more than one solution exists (or) number of levels is high then DFS is
best because exploration is done only in a small portion of the whole space.
Disadvantage: Not guaranteed to find a solution
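A minimal iterative sketch of depth-first search using an explicit stack (illustrative, tree-search style with no cycle checking); the map is chosen so that the S-A-D-G path from the example above is returned:

# Minimal sketch: depth-first search with a LIFO stack (tree-search style).
def dfs(successors, start, goal):
    frontier = [[start]]                  # stack of paths
    while frontier:
        path = frontier.pop()             # most recently added (deepest) path first
        state = path[-1]
        if state == goal:
            return path
        for nxt, _cost in successors.get(state, []):
            frontier.append(path + [nxt])
    return None

graph = {"S": [("A", 1), ("B", 2)], "A": [("D", 1)], "D": [("G", 1)], "B": []}
print(dfs(graph, "S", "G"))               # ['S', 'A', 'D', 'G']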
Depth-limited search
Definition: A cutoff (maximum depth level) is introduced in this search technique to overcome the disadvantage of depth-first search. The cutoff value depends on the number of states.
Example: Route finding problem
The number of states in the given map is 5, so it is possible to reach the goal state at a maximum depth of 4. Therefore the cutoff value is 4.
Task: Find a path from A to E.
A recursive implementation of depth-limited search
function DEPTH-LIMITED-SEARCH(problem, limit) returns a solution, or failure/cutoff
    return RECURSIVE-DLS(MAKE-NODE(INITIAL-STATE[problem]), problem, limit)
function RECURSIVE-DLS(node, problem, limit) returns a solution, or failure/cutoff
    cutoff-occurred? <- false
    if GOAL-TEST[problem](STATE[node]) then return SOLUTION(node)
    else if DEPTH[node] = limit then return cutoff
    else for each successor in EXPAND(node, problem) do
        result <- RECURSIVE-DLS(successor, problem, limit)
        if result = cutoff then cutoff-occurred? <- true
        else if result ≠ failure then return result
    if cutoff-occurred? then return cutoff else return failure
Time and space complexity:
The worst-case time complexity is equivalent to BFS and worst-case DFS.
Time complexity: O(b^l)
The nodes expanded in one particular direction have to be stored.
Space complexity: O(bl)
Optimality: No, because it is not guaranteed to find the shortest solution first.
Completeness: Yes, guaranteed to find the solution if it exists.
Advantage: A cutoff level is introduced into the DFS technique.
Disadvantage: Not guaranteed to find the optimal solution.
Iterative deepening search
Definition: Iterative deepening search is a strategy that sidesteps the issue of choosing the best depth limit by trying all possible depth limits.
Example: Route finding problem
Task: Find a path from A to G
Limit = 0
Limit = 1
Limit = 2
Solution: The goal state G can be reached from A in four ways. They are:
1. A - B - D - E - G ------- Limit 4
2. A - B - D - E - G ------- Limit 4
3. A - C - E - G ------- Limit 3
4. A - F - G ------- Limit 2
Since it is an iterative deepening search, it selects the path with the lowest depth limit, i.e. A-F-G is selected as the solution path.
The iterative deepening search algorithm:
function ITERATIVE-DEEPENING-SEARCH(problem) returns a solution, or failure
    inputs: problem
    for depth <- 0 to infinity do
        result <- DEPTH-LIMITED-SEARCH(problem, depth)
        if result ≠ cutoff then return result
Time and space complexity:
Iterative deepening combines the advantages of breadth-first search and depth-first search, i.e. the expansion of states is done as in BFS and the memory requirement is equivalent to DFS.
Time complexity: O(b^d)
Space complexity: O(bd)
Optimality: Yes, because the order of expansion of states is similar to breadth-first search.
Completeness: Yes, guaranteed to find the solution if it exists.
Advantage: This method is preferred for a large state space and when the depth of the solution is not known.
Disadvantage: Many states are expanded multiple times.
Example : The state D is expanded twice in limit 2
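A minimal Python sketch of depth-limited search with the iterative deepening driver (illustrative; it mirrors the pseudocode above, using the A-to-G map from the example under the same successor-map assumption):

# Minimal sketch: recursive depth-limited search plus iterative deepening.
CUTOFF, FAILURE = "cutoff", "failure"

def depth_limited_search(successors, state, goal, limit):
    if state == goal:
        return [state]                              # solution found
    if limit == 0:
        return CUTOFF                               # depth limit reached
    cutoff_occurred = False
    for nxt, _cost in successors.get(state, []):
        result = depth_limited_search(successors, nxt, goal, limit - 1)
        if result == CUTOFF:
            cutoff_occurred = True
        elif result != FAILURE:
            return [state] + result                 # prepend current state to the path
    return CUTOFF if cutoff_occurred else FAILURE

def iterative_deepening_search(successors, start, goal, max_depth=50):
    for depth in range(max_depth + 1):              # limits 0, 1, 2, ...
        result = depth_limited_search(successors, start, goal, depth)
        if result != CUTOFF:
            return result
    return FAILURE

graph = {"A": [("B", 1), ("C", 1), ("F", 1)], "B": [("D", 1)], "C": [("E", 1)],
         "D": [("E", 1)], "E": [("G", 1)], "F": [("G", 1)], "G": []}
print(iterative_deepening_search(graph, "A", "G"))  # ['A', 'F', 'G'], found at the lowest limit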
Bidirectional search
Definition: Bidirectional search is a strategy that simultaneously searches in both directions, i.e. forward from the initial state and backward from the goal, and stops when the two searches meet in the middle.
Example: Route finding problem
Task: Find a path from A to E.
Search from forward (A):
Search from backward (E):
Time and space complexity:
The forward and backward searches done at the same time will lead to the solution in O(2b^(d/2)) = O(b^(d/2)) steps, because each search only has to go halfway. If the two searches meet at all, the nodes of at least one of them must all be retained in memory, which requires O(b^(d/2)) space.
Optimality: Yes, because the order of expansion of states is done in both the
directions.
Completeness: Yes, guaranteed to find the solution if it exists.
Advantage : Time and space complexity is reduced.
Disadvantage: If the two searches (forward, backward) do not meet at all, the search technique becomes complex. In backward search, calculating predecessors is a difficult task. If more than one goal state exists, then an explicit, multiple-state search is required.
Comparing uninformed search strategies

Criterion   Breadth-First  Uniform-Cost  Depth-First  Depth-Limited  Iterative Deepening  Bidirectional
Complete    Yes            Yes           No           No             Yes                  Yes
Time        O(b^d)         O(b^d)        O(b^m)       O(b^l)         O(b^d)               O(b^(d/2))
Space       O(b^d)         O(b^d)        O(bm)        O(bl)          O(bd)                O(b^(d/2))
Optimal     Yes            Yes           No           No             Yes                  Yes

Avoiding Repeated States
The most important complication of a search strategy is expanding states that have already been encountered and expanded before on some other path (a state space and its exponentially larger search tree).
The repeated states can be avoided in three different ways. They are:
1. Do not return to the state you just came from, i.e. avoid any successor that is the same state as the node's parent.
2. Do not create paths with cycles, i.e. avoid any successor of a node that is the same as any of the node's ancestors.
3. Do not generate any state that was ever generated before.
The general TREE-SEARCH algorithm is modified with additional data structures, such as:
Closed list - which stores every expanded node. Open list - the fringe of unexpanded nodes.
If the current node matches a node on the closed list, then it is discarded and is not considered for expansion. This is done with the GRAPH-SEARCH algorithm. This algorithm is efficient for problems with many repeated states.
function GRAPH-SEARCH(problem, fringe) returns a solution, or failure
    closed <- an empty set
    fringe <- INSERT(MAKE-NODE(INITIAL-STATE[problem]), fringe)
    loop do
        if EMPTY?(fringe) then return failure
        node <- REMOVE-FIRST(fringe)
        if GOAL-TEST[problem](STATE[node]) then return SOLUTION(node)
        if STATE[node] is not in closed then
            add STATE[node] to closed
            fringe <- INSERT-ALL(EXPAND(node, problem), fringe)
The worst-case time and space requirements are proportional to the size of the state space; this may be much smaller than O(b^d).
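A minimal Python sketch of graph search with a closed (explored) set, using a FIFO fringe (illustrative; the cyclic map is assumed only to show why the closed list matters):

# Minimal sketch: graph search with a closed set to avoid repeated states.
from collections import deque

def graph_search(successors, start, goal):
    closed = set()                            # states already expanded
    fringe = deque([[start]])                 # open list: fringe of unexpanded paths
    while fringe:
        path = fringe.popleft()
        state = path[-1]
        if state == goal:
            return path
        if state not in closed:               # discard repeated states
            closed.add(state)
            for nxt, _cost in successors.get(state, []):
                fringe.append(path + [nxt])
    return None

# A graph with a cycle S <-> A; the closed set prevents re-expansion.
graph = {"S": [("A", 1)], "A": [("S", 1), ("G", 1)], "G": []}
print(graph_search(graph, "S", "G"))          # ['S', 'A', 'G']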
Searching With Partial Information
When the knowledge of the states or actions is incomplete, only partial information about the environment is known to the agent. This incompleteness leads to three distinct problem types. They are:
(i) Sensorless problems (conformant problems): If the agent has no sensors at all, then it could be in one of several possible initial states, and each action might therefore lead to one of several possible successor states.
(ii) Contingency problems: If the environment is partially observable or if actions are uncertain, then the agent's percepts provide new information after each action. A problem is called adversarial if the uncertainty is caused by the actions of another agent. To handle unknown circumstances the agent needs a contingency plan.
(iii) Exploration problems: An extreme case of contingency problems, in which the states and actions of the environment are unknown and the agent must act to discover them.
Measuring Performance of Algorithms
There are two aspects of algorithmic performance:
• Time
- Instructions take time.
- How fast does the algorithm perform?
- What affects its runtime?
• Space
- Data structures take space
- What kind of data structures can be used?
- How does choice of data structure affect the runtime?
Algorithms cannot be compared simply by running them on computers, since run time is system dependent; even on the same computer it would depend on the language. Real-time units like microseconds are therefore not used.
UNIT II REPRESENTATION OF KNOWLEDGE
Game playing - Knowledge representation, Knowledge representation using Predicate logic, Introduction to predicate calculus, Resolution, Use of predicate calculus, Knowledge representation using other logic - Structured representation of knowledge.
The Knowledge Base
The central component of a Knowledge Based agent is its
Knowledge Base or KB. Informally a KB is a set of
sentences. Each sentence is expressed in a language called
a Knowledge Representation Language
A knowledge-based agent must represent the world and reason over that representation. The tasks the agent should be able to perform include:
- knowing the current state of the world,
- inferring the unseen properties of the world from percepts,
- keeping track of new changes in the environment,
- knowing the goal of the agent, and
- deciding how to perform actions depending on the circumstances.
The knowledge of a knowledge-based agent is expressed as sentences in a knowledge representation language, over which reasoning is carried out to achieve the goal. How do we add new sentences to the existing knowledge base, and how do we query it for the answer to a given percept (inference)? To perform these operations two standard tasks are used, i.e. TELL and ASK.
TELL - to add new sentences to the knowledge base.
ASK - to query what is known.
function KB-AGENT(percept) returns an action
    static: KB, a knowledge base
            t, a counter, initially 0, indicating time
    TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
    action <- ASK(KB, MAKE-ACTION-QUERY(t))
    TELL(KB, MAKE-ACTION-SENTENCE(action, t))
    t <- t + 1
    return action
MAKE-PERCEPT-SENTENCE constructs a sentence representing the fact that the agent perceived the given percept at the given time. MAKE-ACTION-QUERY takes the time as input and returns a sentence asking what action should be performed at that time. MAKE-ACTION-SENTENCE constructs a sentence asserting that the chosen action was executed.
Three levels of a knowledge-based agent:
(i) The knowledge level: the most abstract level, which describes the agent by saying what it knows. For example, a taxi agent delivering a passenger to Marin County might know that it is in San Francisco and that the Golden Gate Bridge is the only link between the two.
(ii) The logical level: the level at which the knowledge is encoded into sentences of a formal representation language. If TELL and ASK work correctly, then most of the time we can work at the knowledge level and not worry about the lower levels.
(iii) The implementation level: the level that runs on the agent architecture; the logical sentences are represented as physical representations in memory.
The Wumpus World Environment
Problem statement: Wumpus World was an early computer game, based on an agent who explores a cave consisting of rooms connected by passageways. Somewhere in the cave is the wumpus, a beast that eats anyone who enters its room. Some rooms contain bottomless pits that trap anyone who wanders into them (except for the wumpus, which is too big to fall in). The only redeeming feature of living in this environment is the possibility of finding a heap of gold.
Monster: the wumpus. Hazard: bottomless pits. Action: shoot an arrow. Treasure: gold.
The wumpus world is a grid of squares surrounded by walls. Each square can contain the agent and other objects. The agent starts at square (1,1), the lower left corner, facing to the right. The gold and the wumpus are placed at random positions in the grid and cannot be in the start square. The wumpus can be shot by the agent, but the agent has only one arrow. In the square containing the wumpus and in the directly adjacent squares the agent perceives a stench; in squares adjacent to a pit it perceives a breeze; in the square containing the gold it perceives a glitter. When the agent walks into a wall, it perceives a bump. When the wumpus is killed, it emits a scream that can be perceived anywhere in the cave. The percepts are given in the order (Stench, Breeze, Glitter, Bump, Scream).
The actions are: move forward, turn left by 90 degrees, turn right by 90 degrees, Grab (pick up an object in the same square), Shoot (fire the single arrow in the direction the agent is facing), and Climb (leave the cave from the start square).
Points are gained and/or lost for finding the gold, death by the wumpus, death by a pit, each action taken, and using up the arrow, as given in the PEAS description below.
PEAS description for the WUMPUS WORLD ENVIRONMENT
Performance measure: +1000 for picking up the gold, -1000 for falling into a pit or being eaten by the wumpus, -1 for each action taken and -10 for using up the arrow.
Environment: A 4 x 4 grid of rooms. The agent always starts in the square labeled [1,1], facing to the right. The locations of the gold and the wumpus are chosen randomly, with a uniform distribution, from the squares other than the start square. In addition, each square other than the start can be a pit, with probability 0.2.
Actuators: The agent can move Forward, TurnLeft by 90 degrees, or TurnRight by 90 degrees. The agent dies a miserable death if it enters a square containing a pit or a live wumpus. Moving Forward has no effect if there is a wall in front of the agent. The action Grab can be used to pick up an object that is in the same square as the agent. The action Shoot can be used to fire an arrow in a straight line in the direction the agent is facing. The arrow continues until it hits (and hence kills) the wumpus or hits a wall. The agent only has one arrow, so only the first Shoot action has any effect.
Sensors: The agent has five sensors, each of which gives a single bit of information:
- In the square containing the wumpus and in the directly (not diagonally) adjacent squares, the agent will perceive a stench.
- In the squares directly adjacent to a pit, the agent will perceive a breeze.
- In the square where the gold is, the agent will perceive a glitter.
- When the agent walks into a wall, it will perceive a bump.
- When the wumpus is killed, it emits a woeful scream that can be perceived anywhere in the cave.
For example, if there is a stench and a breeze but no glitter, bump or scream, the percept will be (Stench, Breeze, None, None, None).
Initial state: the agent is in square (1,1).
Goal state: to find the gold and bring it out of the cave as quickly as possible, without getting killed.
Exploring the wumpus world:
1. The first step taken by the agent: in the initial situation the percept in (1,1) is (None, None, None, None, None), so the neighbouring squares (1,2) and (2,1) are safe, and the agent moves to (2,1).
2. After the move to (2,1), the percept is (None, Breeze, None, None, None). Because of the breeze in (2,1), a pit must exist in one of the neighbouring squares, (2,2) or (3,1) (by the breeze rule). The agent therefore backtracks to (1,1) and moves towards the other safe square, (1,2).
3. After the move to (1,2), the percept is (Stench, None, None, None, None), so the wumpus must exist in (1,3) or (2,2). The agent now identifies (2,2) as a safe square from the following two inferences: if the wumpus were in (2,2), a stench should exist in (2,1) also, therefore the wumpus exists only in square (1,3); and if a pit were in (2,2), a breeze would be perceived in (1,2), therefore the pit exists in (3,1).
4. After the fourth move, to (2,2), the percept is (None, None, None, None, None), so the agent can continue safely.
5. After the fifth move, with percept (Stench, Breeze, Glitter, None, None): the square (2,3) has the percept Glitter, which indicates that the gold is in the square. The agent then uses the following operators in sequence to come out of the cave with the gold:
Grab - pick up the gold object.
Climb - leave the cave from the start square.
Representation, Reasoning and Logic
The main aim is to represent knowledge in a computer-tractable form, so that it can be used to help agents perform well. A knowledge representation language is defined by two aspects:
- The syntax of the language describes how sentences are formed in the language.
- The semantics determines the facts in the world to which the sentences refer, i.e. what the agent believes about the world.
Example: x = y is a valid sentence of the language; if the values of x and y match then it is true, else it is false.
Logic: A knowledge representation language in which syntax and semantics are defined precisely is known as a logic.
Semantics: In logic, the semantics of a sentence is the meaning of the sentence, i.e. what it states about the world.
Inference (or) Reasoning: From the set of given facts, a conclusion is derived using logical inference or reasoning.
Validity: A sentence is valid or necessarily true if and only if it is true under all interpretations in all possible worlds, i.e. it is universally true. Example: "There is a stench at [1,1] or there is not a stench at [1,1]" in the wumpus world problem. The sentence is valid because it is true regardless of the actual wumpus world environment.
Satisfiability: A sentence is satisfiable if and only if there is some interpretation in some world for which it is true. Example: "There is a wumpus at [1,3]" is a satisfiable sentence, because it is true in some wumpus worlds. A sentence such as "There is a wall in front of me and there is no wall in front of me" is not satisfiable in any world.
LOGICS - AN INTRODUCTION
Logics consist of the following two components:
1. A formal system for describing the state of affairs, consisting of:
   - Syntax, which describes how to make sentences, and
   - Semantics, which describes the meaning of the sentences.
2. The proof theory - a set of rules for deducing the entailments of a set of sentences.
We will represent sentences using two different logics. They are:
1. Propositional logic (or) Boolean logic
2. Predicate logic (or) First-order logic
Propositional logic (or) Boolean logic
The syntax of propositional logic defines the allowable sentences. The atomic sentences - the indivisible syntactic elements - consist of a single proposition symbol. Each symbol stands for a proposition that can be true or false. Complex sentences are constructed from simpler sentences using logical connectives. There are five connectives in common use:
¬ (not): A sentence such as ¬P is called the negation of P. A literal is either an atomic sentence (a positive literal) or a negated atomic sentence (a negative literal).
∧ (and): A sentence whose main connective is ∧, such as P ∧ Q, is called a conjunction.
∨ (or): A sentence using ∨, such as P ∨ Q, is called a disjunction.
⇒ (implies): A sentence such as (P ∧ Q) ⇒ R is called an implication. Its premise is P ∧ Q and its conclusion or consequent is R. Implications are also known as rules or if-then statements.
⇔ (if and only if): A sentence such as P ⇔ Q is a biconditional.
A BNF grammar of sentences in propositional logic:
Sentence -> AtomicSentence | ComplexSentence
AtomicSentence -> True | False | Symbol (P, Q, R, ...)
ComplexSentence -> (Sentence) | ¬Sentence | (Sentence Connective Sentence)
Connective -> ∧ | ∨ | ⇒ | ⇔
Order of precedence (from highest to lowest): ¬, ∧, ∨, ⇒, ⇔
Note: The grammar is very strict about parentheses: every sentence constructed with binary connectives must be enclosed in parentheses.
Example: ¬P ∨ Q ∧ R ⇒ S is equivalent to ((¬P) ∨ (Q ∧ R)) ⇒ S
Truth table for the five logical connectives:
P      Q      ¬P     P ∧ Q   P ∨ Q   P ⇒ Q   P ⇔ Q
false  false  true   false   false   true    true
false  true   true   false   true    true    false
true   false  false  false   true    false   false
true   true   false  true    true    true    true
The semantics of propositional logic defines the rules for determining the truth of a sentence with respect to a model. In propositional logic, a model simply fixes the truth value - true or false - for every proposition symbol. For example, if the sentences in the knowledge base make use of the proposition symbols P1,2, P2,2 and P3,1, then one possible model is
m1 = {P1,2 = false, P2,2 = false, P3,1 = true}
With three proposition symbols, there are 2^3 = 8 possible models.
The semantics for propositional logic must specify how to compute the truth value of any sentence, given a model. Atomic sentences are read directly from the model; the truth of complex sentences is computed recursively from the truth of their parts using the truth tables of the five connectives.
For example, a square is breezy if a neighboring square has a pit, and a square is breezy only if a neighboring square has a pit. So we need a biconditional, such as B1,1 ⇔ (P1,2 ∨ P2,1).
A sentence that is true in every row of its truth table (i.e. for every possible assignment of truth values to its proposition symbols) is a valid sentence. Example: check whether the sentence ((P ∨ H) ∧ ¬H) ⇒ P is valid. Constructing its truth table shows that it is true in every row, so it is a valid sentence.
Two sentences a and b are logically equivalent if they are true in the same set of models. We write this as a ≡ b. For example, we can easily show (using truth tables) that P ∧ Q and Q ∧ P are logically equivalent.
An alternative definition of equivalence is as follows: for any two sentences a and b, a ≡ b if and only if a entails b and b entails a.
Standard logical equivalences:
(a ∧ b) ≡ (b ∧ a)                     commutativity of ∧
(a ∨ b) ≡ (b ∨ a)                     commutativity of ∨
((a ∧ b) ∧ c) ≡ (a ∧ (b ∧ c))         associativity of ∧
((a ∨ b) ∨ c) ≡ (a ∨ (b ∨ c))         associativity of ∨
¬(¬a) ≡ a                             double-negation elimination
(a ⇒ b) ≡ (¬b ⇒ ¬a)                   contraposition
(a ⇒ b) ≡ (¬a ∨ b)                    implication elimination
(a ⇔ b) ≡ ((a ⇒ b) ∧ (b ⇒ a))         biconditional elimination
¬(a ∧ b) ≡ (¬a ∨ ¬b)                  De Morgan
¬(a ∨ b) ≡ (¬a ∧ ¬b)                  De Morgan
(a ∧ (b ∨ c)) ≡ ((a ∧ b) ∨ (a ∧ c))   distributivity of ∧ over ∨
(a ∨ (b ∧ c)) ≡ ((a ∨ b) ∧ (a ∨ c))   distributivity of ∨ over ∧
A sentence is satisfiable if it is true in some model. If a sentence a is true in a model m, then we say that m satisfies a, or that m is a model of a.
Inference rules for propositional logic
Inference rules are standard patterns of inference that can be applied to derive chains of conclusions leading to the desired goal.
1. Modus Ponens (or Implication Elimination): from an implication and the premise of the implication, the conclusion can be inferred: from a ⇒ b and a, infer b. For example, if (WumpusAhead ∧ WumpusAlive) ⇒ Shoot and (WumpusAhead ∧ WumpusAlive) are given, Shoot can be inferred.
2. And-Elimination: from a conjunction, any of the conjuncts can be inferred: from a ∧ b, infer a. For example, from (WumpusAhead ∧ WumpusAlive), WumpusAlive can be inferred.
3. And-Introduction: from a list of sentences, their conjunction can be inferred.
4. Or-Introduction: from a sentence, its disjunction with anything else can be inferred.
5. Double-Negation Elimination: from ¬¬a, infer a.
6. Unit Resolution: from a disjunction, if one of the disjuncts is false, then the other one can be inferred: from a ∨ b and ¬b, infer a.
7. Resolution: from a ∨ b and ¬b ∨ c, infer a ∨ c.
Knowledge base for the wumpus world
Let Pi,j be true if there is a pit in square [i,j], and let Bi,j be true if there is a breeze in square [i,j]. The knowledge base contains the following sentences:
There is no pit in [1,1]:
R1: ¬P1,1
A square is breezy if and only if there is a pit in a neighboring square. This has to be stated for each square; for now we include just the relevant squares:
R2: B1,1 ⇔ (P1,2 ∨ P2,1)
R3: B2,1 ⇔ (P1,1 ∨ P2,2 ∨ P3,1)
The preceding sentences are true in all wumpus worlds. Now we include the breeze percepts for the first two squares visited in the specific world the agent is in, leading to:
R4: ¬B1,1
R5: B2,1
Let us see how these inference rules and equivalences can be used in the wumpus world. We start with the knowledge base containing R1 through R5 and show how to prove ¬P1,2, that is, there is no pit in [1,2]:
1. First, apply biconditional elimination to R2 to obtain R6: (B1,1 ⇒ (P1,2 ∨ P2,1)) ∧ ((P1,2 ∨ P2,1) ⇒ B1,1).
2. Then apply And-Elimination to R6 to obtain R7: (P1,2 ∨ P2,1) ⇒ B1,1.
3. Logical equivalence for contrapositives gives R8: ¬B1,1 ⇒ ¬(P1,2 ∨ P2,1).
4. Now apply Modus Ponens with R8 and the percept R4 (i.e. ¬B1,1) to obtain R9: ¬(P1,2 ∨ P2,1).
5. Finally, De Morgan's rule gives the conclusion R10: ¬P1,2 ∧ ¬P2,1.
That is, neither [1,2] nor [2,1] contains a pit. The process of applying a sequence of inference rules in this way is called a proof.
15
CS6659-ARTIFICIAL INTELLIGENCE
VI SEM CSE
The Knowledge Base: The agent percepts are
sentences and entered into the knowledge base,
valid sentences that
entailed by the percept
sentences and
some
,
the
is added to the knowledge
percept
— There is a Stench in
B1,2 — There is no Breeze in
— There
no
in
— There is
Breeze
(2,1)
— There
in
There
in
B1,1
— for the square
— for the square
— for the square
In
environment.
then
agent should
For example, if
the square nor any
some knowledge about the
is no smell in a square,
its adjacent square
house a wumpus. Now we will frame the rule of three
quares, are:
If there is a stench in (1,2) then there must
in (1,2)
in one or more of
neighboring
Finding
Wumpus
1.
obtain
Modus Ponens
be
and the sentence
a
CSE – Panimalar Institute of Technology
16
CS6659-ARTIFICIAL INTELLIGENCE
VI SEM CSE
2. Applying And - Elimination to this, we obtain the
three separate
3. Applying Modus Ponens and the sentence R2 , obtain
4. Applying and - Elimination to this, we obtain four
sentences
5. Applying Modus Ponens
obtain
and the
R4
6. Apply the Unit Resolution rule,
(derived in step 5 is splitted
and the
(from step 2),
7. Apply
Resolution rule,
and the
8. Apply
W2,2
yields:
First-Order Logic
The
facts in the
logic
and
Wl,2
and
Wl,3
Resolution rule,
sentences are
,
V
from step
is Wl,3
CSE – Panimalar Institute of Technology
17
CS6659-ARTIFICIAL INTELLIGENCE
VI SEM CSE
First-order logic represents the facts of a given domain in terms of the objects, properties, relations and functions belonging to the world:
Objects: things with individual identities, used to distinguish one object from another. Examples: home, people, college, colors, ...
Properties: attributes of objects. Examples: big, small, smelly, round, ...
Relations: relations that exist between one object and another. Examples: owns, bigger than, has color, brother of, ...
Functions: a kind of relation which has exactly one value for a given input. Examples: father of, best friend, ...
Identify the objects, properties, relations and functions in the following sentences:
1. "Elephant is bigger than a cat": Objects - elephant, cat; Relation - bigger than.
2. "Squares neighboring the wumpus are smelly": Objects - wumpus, squares; Property - smelly; Relation - neighboring.
3. "One plus two equals three": Objects - one, two, three; Relation - equals; Function - plus (one plus two derives a new object, three).
Syntax and Semantics of First-Order Logic
Models for first-order logic contain objects in them. The domain of a model is the set of objects it contains; the objects are sometimes called domain elements. The objects in the model may be related in various ways.
A relation is just the set of tuples of objects that are related. (A tuple is a collection of objects arranged in a fixed order, written with angle brackets surrounding the objects.) A model may also contain unary relations, or properties, and functions that map a given object to exactly one other object.
Symbols: The basic syntactic elements of first-order logic are the symbols that stand for objects, relations, and functions: constant symbols, which stand for objects; predicate symbols, which stand for relations; and function symbols, which stand for functions. The convention is that these symbols begin with uppercase letters.
Example: a model whose objects are Richard the Lionheart, King of England from 1189 to 1199; his younger brother, the evil King John, who ruled from 1199 to 1215; the left legs of Richard and John; and a crown.
Thus, the brotherhood relation in this model is the set
{ (Richard the Lionheart, King John), (King John, Richard the Lionheart) }
The "person" property is true of both Richard and John; the "king" property is true only of John. Each person has one left leg, so the model has a "left leg" function that includes the following mappings:
(Richard the Lionheart) -> Richard's left leg
(King John) -> John's left leg
The syntax of first-order logic can be specified in Backus-Naur form:
Sentence -> AtomicSentence
          | (Sentence Connective Sentence)
          | Quantifier Variable, ... Sentence
          | ¬Sentence
AtomicSentence -> Predicate(Term, ...) | Term = Term
Term -> Function(Term, ...) | Constant | Variable
Connective -> ⇒ | ∧ | ∨ | ⇔
Quantifier -> ∀ | ∃
Constant -> A | X1 | John | ...
Variable -> a | x | s | ...
Predicate -> Before | HasColor | Raining | ...
Function -> Mother | LeftLeg | ...
Explanation for each element in the BNF description:
Term: A term is a logical expression that refers to an object.
Constants: Constant symbols are names; a constant symbol refers to exactly one object. Example: the name of a person, such as John or KingJohn.
Variables: A variable assumes different values in the domain during the inference process. Example: the variable x may be assigned to different objects in the domain depending on the step.
Predicate symbols: A predicate symbol refers to a particular relation in the model. Example: brotherhood (a relation that holds, or does not hold, between a pair of objects).
Tuple: In a model, a relation is defined by the set of tuples of objects that satisfy it; a tuple is a collection of objects arranged in a fixed order. For example, the set of tuples that defines the relation of brotherhood between Ram and Raju is {<Ram, Raju>, <Raju, Ram>}.
Function symbols: A function is a type of relation in which a given object is related to exactly one other object by the relation. Example: FatherOf.
Atomic sentences: An atomic sentence is formed from a predicate symbol (which refers to a relation) followed by a parenthesized list of terms (which refer to objects). Examples:
(i) Brother(John, Richard), i.e. John is a brother of Richard.
(ii) Married(FatherOf(Ram), MotherOf(John)), i.e. Ram's father is married to John's mother.
Complex sentences: Logical connectives are used to construct more complex sentences, in which the semantics of the given sentence has to be satisfied. In general, P(x, y) means x is a P of y, where P is a predicate and x, y are terms. Examples:
(i) Brother(Ram, John) is true when Ram is the brother of John.
(ii) Brother(Ram, John) ∧ Brother(John, Raju) is true when Ram is the brother of John and John is the brother of Raju.
(iii) Brother(Ram, John) ⇒ ¬Brother(Ram, xyz) states that if Ram is a brother of John then he is not a brother of xyz.
Quantifiers: Quantifiers are used to express properties of entire collections of objects, rather than representing the objects by name. FOL contains two standard quantifiers: (i) the universal quantifier (∀) and (ii) the existential quantifier (∃).
Universal quantifier (∀): General notation: ∀x P, where P is a logical sentence and x is a variable; ∀ is read "for all", i.e. P is true for every object x in the universe. Example: "All cats are mammals" is written ∀x Cat(x) ⇒ Mammal(x). All the cats in the universe belong to the type mammal, and the variable x may be replaced by any of the cat names (objects). For instance, Spot is a cat: Cat(Spot); hence Cat(Spot) ⇒ Mammal(Spot) implies that Spot is also a mammal. Another example: "All kings are persons" is written ∀x King(x) ⇒ Person(x).
Existential quantifier (∃): General notation: ∃x P, where P is a logical sentence and x is a variable; ∃ is read "there exists", i.e. P is true for at least one object x in the universe. Example: "Spot has a sister who is a cat" is written ∃x Sister(x, Spot) ∧ Cat(x); here x may be replaced by the sister of Spot, if she exists. For instance, if Felix is a cat and Felix is a sister of Spot, then Cat(Felix) ∧ Sister(Felix, Spot) makes the sentence true. Another example: "King John has a crown on his head" is written ∃x Crown(x) ∧ OnHead(x, John).
Nested quantifiers: More complex sentences can be expressed using multiple quantifiers (of the same or different type).
Example 1: "Brothers are siblings": ∀x,y Brother(x, y) ⇒ Sibling(x, y)
Example 2: "For all x and all y, if x is the parent of y then y is the child of x": ∀x,y Parent(x, y) ⇒ Child(y, x)
Example 3: "Everybody loves somebody": ∀x ∃y Loves(x, y)
Example 4: "There is someone who is loved by everyone": ∃y ∀x Loves(x, y)
Connections between quantifiers: The two quantifiers are actually connected with each other through negation.
Example: Asserting that everyone dislikes parsnips is the same as asserting that there does not exist someone who likes them: ∀x ¬Likes(x, Parsnips) is equivalent to ¬∃x Likes(x, Parsnips).
Example: "Everyone likes ice cream" means that there is no one who does not like ice cream: ∀x Likes(x, IceCream) is equivalent to ¬∃x ¬Likes(x, IceCream).
Ground term (or clause): A term with no variables is called a ground term. Example: Cat(Spot) is a ground (atomic) sentence.
The De Morgan rules for quantified sentences, and the corresponding rules for unquantified sentences (showing the connection between ∀ and ∃), are:
Quantified:
∀x ¬P ≡ ¬∃x P          ¬∀x P ≡ ∃x ¬P
∀x P ≡ ¬∃x ¬P          ∃x P ≡ ¬∀x ¬P
Unquantified:
¬P ∧ ¬Q ≡ ¬(P ∨ Q)     ¬(P ∧ Q) ≡ ¬P ∨ ¬Q
P ∧ Q ≡ ¬(¬P ∨ ¬Q)     P ∨ Q ≡ ¬(¬P ∧ ¬Q)
Well Formed Formula (WFF): A sentence in which all the variables are properly introduced with their quantifiers at the beginning of the sentence itself is called a well formed formula. Examples: (i) ∀x Likes(x, IceCream) (ii) ∃y Likes(y, IceCream)
Horn clause: A Horn clause is a clause with at most one positive literal, for example P ∧ Q ⇒ R.
We can also quantify over relations and functions as well as objects:
(i) Two objects are equal if and only if all predicates applied to them are equivalent: ∀x,y (x = y) ⇔ (∀p p(x) ⇔ p(y))
(ii) Two functions are equal if and only if they have the same value for all arguments: ∀f,g (f = g) ⇔ (∀x f(x) = g(x))
A predicate expression can also be built using the lambda (λ) operator, and the resulting λ-expression can be applied to arguments in the same way as an ordinary function term.
Uniqueness quantifier (∃!): ∃! is read "there exists exactly one". Example: "Every country has exactly one ruler": ∀c Country(c) ⇒ ∃!r Ruler(r, c)
Uniqueness operator (ι): The uniqueness operator is used to refer to the unique object that satisfies a predicate. General notation: ιx P(x), "the unique x such that P(x)". Example: "The unique ruler of Japan is dead": Dead(ιr Ruler(r, Japan))
The equality symbol can be used to make statements to the effect that two terms refer to the same object. Example: Father(John) = Henry says that the object referred to by Father(John) and the object referred to by Henry are the same.
Using First-Order Logic
Domain: A domain is a section of the world about which knowledge is represented; the knowledge representation is done for that section of the world.
Assertions and queries in first-order logic
Sentences are added to a knowledge base using TELL, exactly as in propositional logic; such sentences are called assertions. For example, we can assert that John is a king and that all kings are persons:
TELL(KB, King(John))
TELL(KB, ∀x King(x) ⇒ Person(x))
We can ask questions of the knowledge base using ASK. For example, ASK(KB, King(John)) returns true. Questions asked of the knowledge base with ASK are called queries or goals. We can also ask quantified queries, such as ASK(KB, ∃x Person(x)), which returns true by substituting John for x, i.e. {x/John}. If there is more than one possible answer, a list of substitutions can be returned. This sort of answering is called substitution or binding-list answering.
The kinship domain
As an example, consider the domain of family relationships, or kinship. This domain includes facts such as "Elizabeth is the mother of Charles" and "Charles is the father of William", and rules such as "One's grandmother is the mother of one's parent". The vocabulary of the kinship domain consists of:
(a) Objects: people
(b) Unary predicates: Male, Female
(c) Binary predicates: Parent, Sibling, Brother, Sister, Child, Daughter, Son, Spouse, Wife, Husband, Grandparent, Grandchild, Cousin, Aunt and Uncle
(d) Functions: Father, Mother
Some axioms of the kinship domain are:
(i) Male and female are disjoint categories: ∀x Male(x) ⇔ ¬Female(x)
(ii) Parent and child are inverse relations: ∀p,c Parent(p, c) ⇔ Child(c, p)
(iii) One's mother is one's female parent: ∀m,c Mother(c) = m ⇔ Female(m) ∧ Parent(m, c)
(iv) One's husband is one's male spouse: ∀w,h Husband(h, w) ⇔ Male(h) ∧ Spouse(h, w)
(v) A grandparent is a parent of one's parent: ∀g,c Grandparent(g, c) ⇔ ∃p Parent(g, p) ∧ Parent(p, c)
(vi) A sibling is another child of one's parents: ∀x,y Sibling(x, y) ⇔ x ≠ y ∧ ∃p Parent(p, x) ∧ Parent(p, y)
Numbers, sets, and lists
Numbers are perhaps the most vivid example of how a large theory can be built up from a tiny kernel of axioms. Natural numbers are defined recursively with the predicate NatNum and the successor function S:
NatNum(0)
∀n NatNum(n) ⇒ NatNum(S(n))
Axioms to constrain the successor function:
∀n 0 ≠ S(n)
∀m,n m ≠ n ⇒ S(m) ≠ S(n)
Then we can define addition in terms of the successor function:
∀m NatNum(m) ⇒ +(0, m) = m
∀m,n NatNum(m) ∧ NatNum(n) ⇒ +(S(m), n) = S(+(m, n))
The domain of mathematical set representation consists of:
(a) Constant - the empty set
(b) Predicates - Member and Subset
(c) Functions - Intersection, Union and Adjoin
* Axioms - sentences in the mathematical set knowledge base.
* Independent axiom - an axiom that cannot be derived from all the other axioms.
Examples:
(i) Two sets are equal if and only if each is a subset of the other:
∀s1,s2 (s1 = s2) ⇔ (Subset(s1, s2) ∧ Subset(s2, s1))
(ii) Adjoining an element already in the set has no effect:
∀x,s Member(x, s) ⇔ s = Adjoin(x, s)
Lists are similar to sets. The differences are that lists are ordered and the same element can appear more than once in a list.
Vocabulary of Lisp for lists:
Nil is the constant list with no elements; Cons, Append, First, and Rest are functions; Find is the predicate that does for lists what Member does for sets.
Example list syntax:
The empty list is [ ]. The term Cons(x, y), where y is a nonempty list, is written [x | y]. The term Cons(x, Nil), i.e. the list containing the single element x, is written as [x]. A list of several elements, such as [A, B, C], corresponds to the nested term Cons(A, Cons(B, Cons(C, Nil))).
The wumpus world in first-order logic
The first-order axioms in this section are much more concise than their propositional counterparts, capturing in a very natural way exactly what we want to say. The wumpus agent receives a percept vector with five elements. The corresponding sentence stored in the knowledge base must include both the percept and the time at which it occurred; otherwise the agent will get confused about when it saw what. We use integers for time steps. A typical percept sentence would be:
Percept([Stench, Breeze, Glitter, None, None], 5)
Here, Percept is a binary predicate and Stench and so on are constants placed in a list. The actions in the wumpus world can be represented by logical terms such as Turn(Right), Turn(Left), Forward, Grab, Shoot and Climb. To determine which action is best, the agent program constructs a query such as ∃a BestAction(a, 5). ASK should solve this query and return a binding list such as {a/Grab}. The agent program can then return Grab as the action to take, but first it must TELL its own knowledge base that it is performing that action.
Simple reflex behavior can also be implemented by quantified implication sentences. For example:
∀t Glitter(t) ⇒ BestAction(Grab, t)
The sentences in the knowledge base are classified by time as follows:
(i) Synchronic ("same time") rules: rules that relate properties of a world state to other properties of the same world state.
(ii) Diachronic ("across time") rules: sentences describing the way in which the world changes (or does not change) are diachronic sentences. For example, the agent needs to know how to combine information about its previous location with information about the action it just performed in order to determine its current location. Representing change is one of the most important areas in knowledge representation.
There are two kinds of synchronic rules that could allow us to capture the necessary information for inference:
(i) Diagnostic rules: lead from observed effects to hidden causes. For finding pits: if a square is breezy, some adjacent square must contain a pit:
∀s Breezy(s) ⇒ ∃r Adjacent(r, s) ∧ Pit(r)
If a square is not breezy, no adjacent square contains a pit:
∀s ¬Breezy(s) ⇒ ¬∃r Adjacent(r, s) ∧ Pit(r)
Combining these two gives the biconditional sentence:
∀s Breezy(s) ⇔ ∃r Adjacent(r, s) ∧ Pit(r)
(ii) Causal rules: reflect the assumed direction of causality in the world; some hidden property of the world causes certain percepts to be generated. A pit causes all adjacent squares to be breezy:
∀r Pit(r) ⇒ [∀s Adjacent(r, s) ⇒ Breezy(s)]
And if all squares adjacent to a given square are pitless, the square will not be breezy:
∀s [∀r Adjacent(r, s) ⇒ ¬Pit(r)] ⇒ ¬Breezy(s)
Systems that reason with causal rules are called model-based reasoning systems, because the causal rules form a model of how the environment operates.
Knowledge Engineering
In knowledge engineering, the general knowledge of a particular domain is encoded, here in First-Order Logic, into a Knowledge Base (KB).
Knowledge Engineer: someone who investigates a particular domain, learns what objects and relations are important in that domain, and creates a formal representation of the objects and relations in the domain.
The knowledge engineering process: knowledge engineering projects vary widely in content, scope, and difficulty, but all such projects include the following steps:
1. Identify the task: identify the PEAS description of the problem.
2. Assemble the relevant knowledge: the knowledge engineer might already be an expert in the domain, or might need to work with real experts to extract what they know - a process called knowledge acquisition.
3. Decide on a vocabulary of predicates, functions, and constants: translate the important domain-level concepts into logical-level names. The resulting vocabulary is known as the ontology of the domain, which determines what kinds of things exist but does not determine their specific properties and interrelationships.
4. Encode general knowledge about the domain: the knowledge engineer writes down the axioms (rules) for all the vocabulary terms; misconceptions from step 3 are clarified and the process is repeated.
5. Encode a description of the specific problem instance, in terms of the concepts that are already part of the ontology.
UNIT -III
KNOWLEDGE INFERENCE
Knowledge representation - Production based system, Frame based system. Inference - Backward chaining, Forward chaining, Rule value approach, Fuzzy reasoning - Certainty factors, Bayesian Theory - Bayesian Network - Dempster-Shafer theory.
Knowledge is a general term.
An answer to the question, "how to represent knowledge", requires an analysis to distinguish between knowledge "how" and knowledge "that".
1. Procedural knowledge: knowing "how to do something". e.g. "how to drive a car" is procedural knowledge.
2. Declarative knowledge: knowing "that something is true or false". e.g. "that is the speed limit for a car on a motorway" is declarative knowledge.
3. Knowledge and Representation are two distinct entities. They play central but distinguishable roles in an intelligent system.
- Knowledge is a description of the world. It determines a system's competence by what it knows.
- Representation is the way knowledge is encoded. It defines a system's performance in doing something.
Different types of knowledge require different kinds of representation. The Knowledge Representation models/mechanisms are often based on:
◊ Logic
◊ Rules
◊ Frames
◊ Semantic Nets
• Different types of knowledge require different kinds of reasoning.
1.1 Framework of Knowledge Representation (Poole 1998)
A computer requires a well-defined problem description to process and provide a well-defined, acceptable solution.
To collect fragments of knowledge we need first to formulate a description in our spoken language and then represent it in a formal language so that the computer can understand it. The computer can then use an algorithm to compute an answer. This process is illustrated below.
Fig. Knowledge Representation Framework (an informal problem is represented as a formal representation, the computer computes an output, and the output is interpreted back into a solution; "solve" connects problem and solution at the informal level).
The steps are
− The informal formulation of the problem takes place first.
− It is then represented formally and the computer produces an output.
− This output can then be represented as an informally described solution that the user understands or checks for consistency.
Note : The Problem solving requires
− formal knowledge representation, and
− conversion of informal knowledge to formal knowledge , that is conversion of implicit
knowledge to explicit knowledge.
• Knowledge and Representation
Problem solving requires a large amount of knowledge and some mechanism for manipulating that knowledge. The Knowledge and the Representation are distinct entities that play central but distinguishable roles in an intelligent system.
− Knowledge is a description of the world; it determines a system's competence by what it knows.
− Representation is the way knowledge is encoded; it defines the system's performance in doing something.
In simple words, we:
− need to know about things we want to represent, and
− need some means by which things we can manipulate.
◊ Things to represent:
  ‡ Objects - facts about objects in the domain.
  ‡ Events - actions that occur in the domain.
  ‡ Performance - knowledge about how to do things.
  ‡ Meta-knowledge - knowledge about what we know.
◊ Means to manipulate: some formalism is required for what we represent.
Thus, knowledge representation can be considered at two levels:
(a) the knowledge level, at which facts are described, and
(b) the symbol level, at which the representations of the objects, defined in terms of symbols, can be manipulated in programs.
(b) Mapping between Facts and Representation
Knowledge is a collection of “facts” from some domain.
We need a representation of "facts" that can be manipulated by a program. Normal English is insufficient: it is currently too hard for a computer program to draw inferences from natural language.
Thus some symbolic representation is necessary.
Therefore, we must be able to map "facts to symbols" and "symbols to facts" using forward and
backward representation mapping.
Example: Consider an English sentence.
Facts: "Spot is a dog" - a fact represented as an English sentence.
Representations:
◊ dog(Spot) - using the forward mapping function, the above fact is represented in logic.
◊ ∀x : dog(x) → hastail(x) - a logical representation of the fact that "all dogs have tails".
Knowledge Representation is divided into 2 types:
I. Production systems.
II. Frame-based systems.
Knowledge-based system
1. Knowledge base:
 A set of sentences that describe the world in some formal (representational) language (e.g. first-order logic)
 Domain specific knowledge
2. Inference engine:
 A set of procedures that work upon the representational language and can infer new facts or answer KB queries (e.g. resolution algorithm, forward chaining)
 Domain independent
Automated reasoning systems
• Theorem provers
– Prove sentences in the first-order logic. Use inference rules, resolution rule and resolution refutation.
• Deductive retrieval systems
– Systems based on rules (KBs in Horn form)
– Prove theorems or infer new assertions
• Production systems
– Systems based on rules with actions in the consequents
– Forward chaining mode of operation
• Semantic networks
– Graphical representation of the world, objects are nodes in the graphs, relations are various links
• Frames:
– object oriented representation, some procedural control of inference
PRODUCTION BASED SYSTEMS
Based on rules, but different from KBs in the Horn form. The knowledge base is divided into:
• A Rule base (includes rules)
• A Working memory (includes facts)
Rules: a special type of if-then rule: p1 ∧ p2 ∧ … ∧ pn ⇒ a1, a2, …, ak
- Antecedent: a conjunction of conditions
- Consequent: a sequence of actions
Basic operation:
• Check if the antecedent of a rule is satisfied
• Decide which rule to execute (if more than one rule is satisfied)
• Execute actions in the consequent of the rule
Working memory
• Consists of a set of facts - statements about the world - but can also represent various data structures
• The exact syntax and representation of facts may differ across different systems
• Examples:
  – predicates such as Red(car12), but only ground statements
  – or objects of the form (type attr1:value1 attr2:value2 …), such as: (person age 27 home Toronto); the type, attributes and values are all atoms
Rules: p1 ∧ p2 ∧ … ∧ pn ⇒ a1, a2, …, ak
• Antecedents: conjunctions of conditions
• Examples:
  – a conjunction of literals, A(x) ∧ B(x) ∧ C(y): simple negated or non-negated statements in predicate logic
  – or conjunctions of conditions on objects/object attributes, (type attr1 spec1 attr2 spec2 …), where the specs can be an atom, a variable, an expression or a condition, e.g. (person age [n+4] occupation x), (person age {< 23 ∧ > 6})
Production systems: p1 ∧ p2 ∧ … ∧ pn ⇒ a1, a2, …, ak
• Consequent: a sequence of actions
• An action can be:
– ADD the fact to the working memory (WM)
– REMOVE the fact from the WM
– MODIFY an attribute field
– QUERY the user for input, etc …
• Examples:
A(x) ∧ B(x) ∧ C(y) ⇒ ADD D(x)
• Or
(Student name x) ⇒ ADD (Person name x)
• Use forward chaining to do reasoning:
– If the antecedent of the rule is satisfied (rule is said to be “active”) then its consequent can be executed
(it is “fired”)
• Problem: Two or more rules are active at the same time. Which one to execute next?
(For example, the conditions of two rules R27 and R105 may both be satisfied at once - which of their actions should be executed first?)
• Strategy for selecting the rule to be fired from among possible candidates is called conflict resolution
• Why is conflict resolution important? Or, Why do we care about the order?
• Assume that we have two rules and the preconditions of both are satisfied:
R1: A(x)∧B(x)∧C(y)⇒add D(x)
R2: A(x)∧B(x)∧E(z)⇒delete A(x)
• What can happen if rules are triggered in different order?
– If R1 goes first, R2 condition is still satisfied and we infer D(x)
– If R2 goes first we may never infer D(x)
• Problems with production systems:
– Additions and Deletions can change a set of active rules;
– If a rule contains variables testing all instances in which the rule is active may require a large
number of unifications.
– Conditions of many rules may overlap, thus requiring to repeat the same unifications multiple
times.
• Solution: Rete algorithm
– gives more efficient solution for managing a set of active rules and performing unifications
– Implemented in the system OPS-5 (used to implement XCON – an expert system for
configuration of DEC computers)
Rete algorithm
• Assume a set of rules:
  A(x) ∧ B(x) ∧ C(y) ⇒ ADD D(x)
  A(x) ∧ B(y) ∧ D(x) ⇒ ADD E(x)
  A(x) ∧ B(x) ∧ E(z) ⇒ DELETE A(x)
• And facts: A(1), A(2), B(2), B(3), B(4), C(5)
• Rete:
– Compiles the rules to a network that merges conditions of multiple rules together (avoid repeats)
– Propagates valid unifications
– Reevaluates only changed conditions
Rete algorithm network: the rules and facts above are compiled into a network that shares their common conditions (figure omitted).
Conflict resolution strategies
• Problem: Two or more rules are active at the same time. Which one to execute next?
• Solutions:
  – No duplication (do not execute the same rule twice)
  – Recency: rules referring to facts newly added to the working memory take precedence
  – Specificity: rules that are more specific are preferred
  – Priority levels: define the priority of rules and actions based on expert opinion; have multiple priority levels such that the higher priority rules fire first
FRAME-BASED REPRESENTATION
• Frames – semantic net with properties
• A frame represents an entity as a set of slots (attributes) and associated values
• A frame can represent a specific entry, or a general concept
• Frames are implicitly associated with one another because the value of a slot can be another
frame
9
CS6659-ARTIFICIAL INTELLIGENCE
VI SEM
3 components of a frame
• frame name
• attributes (slots)
• values (fillers: list of values, range, string, etc.)
Book Frame
  Slot     Filler
  Title    AI: A Modern Approach
  Author   Russell & Norvig
  Year     2003
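As a minimal illustration (not from the notes), a frame can be modelled as a name plus a slot/filler dictionary, with inheritance handled by looking up missing slots in a parent frame. The Book frame above is reused; the Publication parent frame and its "medium" slot are assumptions added only to show inheritance.

# Minimal sketch: frames as slot/filler dictionaries with simple inheritance.
class Frame:
    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent          # more general frame to inherit from
        self.slots = dict(slots)      # attribute (slot) -> filler

    def get(self, slot):
        if slot in self.slots:
            return self.slots[slot]
        if self.parent is not None:   # inherit missing slots from the parent
            return self.parent.get(slot)
        return None

publication = Frame("Publication", medium="print")
book = Frame("Book", parent=publication,
             Title="AI: A Modern Approach", Author="Russell & Norvig", Year=2003)
print(book.get("Title"), book.get("medium"))   # own slot value and inherited default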
Features of Frame Representation
• More natural support of values than semantic nets (each slot has constraints describing the legal values that the slot can take)
• Can be easily implemented using object-oriented programming techniques
• Inheritance is easily controlled
Inheritance (frames are organized into a hierarchy, and a more specific frame inherits slot values from a more general one; figure omitted)
Benefits of Frames
• Makes programming easier by grouping related knowledge
• Easily understood by non-developers
• Expressive power
• Easy to set up slots for new properties and relations
• Easy to include default information and detect missing values
Inferential Knowledge :
This knowledge generates new information from the given information. This new information does not require further data gathering from the source, but does require analysis of the given information to generate new knowledge.
- Given a set of relations and values, one may infer other values or relations.
- A predicate logic (a mathematical deduction) is used to infer from a set of attributes.
- Inference through predicate logic uses a set of logical operations to relate individual data.
- The symbols used for the logic operations are: "→" (implication), "¬" (not), "V" (or), "Λ" (and), "∀" (for all), "∃" (there exists).
Examples of predicate logic statements :
1. "Wonder" is a name of a dog: dog(Wonder)
2. All dogs belong to the class of animals: ∀x : dog(x) → animal(x)
3. All animals either live on land or in water: ∀x : animal(x) → live(x, land) V live(x, water)
Declarative knowledge:
Here, the knowledge is based on declarative facts about axioms and domains.
- Axioms are assumed to be true unless a counter example is found to invalidate them.
- Domains represent the physical world and the perceived functionality.
- Axioms and domains thus simply exist and serve as declarative statements that can stand alone.
Procedural knowledge:
Here, the knowledge is a mapping process between domains that specifies "what to do when", and the representation is of "how to make it" rather than "what it is". Procedural knowledge:
o may have inferential efficiency, but no inferential adequacy and acquisitional efficiency.
o is represented as small programs that know how to do specific things, how to proceed.
Example: A parser in a natural language system has the knowledge that a noun phrase may contain articles, adjectives and nouns. It accordingly calls routines that know how to process articles, adjectives and nouns.
■ Inference Procedural Rules
These rules advise on how to solve a problem, while certain facts are known.
Example :
IF the data needed is not in the system THEN request it from the user.
These rules are part of the inference engine.
The logic model of computation is based on relations and logical inference.
Programs consist of definitions of relations. Computations are inferences (a computation is a proof).
Example 1 :Linear function
A linear function y = 2x + 3 can be represented as : f (X , Y) if Y is 2 ∗ X + 3.
Here the function represents the relation between X and Y.
Example 2: Determine a value for Circumference.
The circumference computation can be represented as:
Circle (R , C) if Pi = 3.14 and C = 2 ∗ pi ∗ R.
Here the function is represented as the relation between radius R and circumference C.
Forward versus Backward Reasoning
A rule-based system architecture consists of a set of rules, a set of facts, and an inference engine. The need is to find what new facts can be derived. Given a set of rules, there are essentially two ways to generate new knowledge: one, forward chaining, and the other, backward chaining.
Forward chaining : also called data driven.
It starts with the facts, and sees what rules apply.
Backward chaining : also called goal driven.
It starts with something to find out, and looks for rules that will help in answering it.
KR – forward-backward reasoning
Example 1
Rule R1: IF hot AND smoky THEN fire
Rule R2: IF alarm_beeps THEN smoky
Rule R3: IF fire THEN switch_on_sprinklers
Fact F1: alarm_beeps [Given]
Fact F2: hot [Given]
Example 2
Rule R1: IF hot AND smoky THEN ADD fire
Rule R2: IF alarm_beeps THEN ADD smoky
Rule R3: IF fire THEN ADD switch_on_sprinklers
Fact F1: alarm_beeps [Given]
Fact F2: hot [Given]
KR – forward-backward reasoning
Example 3: A typical Forward Chaining
Rule R1: IF hot AND smoky THEN ADD fire
Rule R2: IF alarm_beeps THEN ADD smoky
Rule R3: IF fire THEN ADD switch_on_sprinklers
Fact F1: alarm_beeps [Given]
Fact F2: hot [Given]
Fact F4: smoky [from F1 by R2]
Fact F5: fire [from F2, F4 by R1]
Fact F6: switch_on_sprinklers [from F5 by R3]
Example 4: A typical Backward Chaining
Rule R1: IF hot AND smoky THEN fire
Rule R2: IF alarm_beeps THEN smoky
Rule R3: IF fire THEN switch_on_sprinklers
Fact F1: hot [Given]
Fact F2: alarm_beeps [Given]
Goal: Should I switch sprinklers on?
Properties of Forward Chaining
- All rules which can fire do fire.
- Can be inefficient: leads to spurious rules firing and unfocused problem solving.
- The set of rules that can fire is known as the conflict set.
- The decision about which rule to fire is conflict resolution.
KR – forward chaining
Forward chaining algorithm - I
Repeat
Collect the rule whose condition matches a fact in WM.
Do actions indicated by the rule.
(add facts to WM or delete facts from WM)
Until the problem is solved or no conditions match
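A minimal Python sketch of this forward chaining loop, using the alarm/smoke rules from Example 3 above (illustrative; ADD actions only, with sets standing in for the working memory):

# Minimal sketch: forward chaining over the alarm/smoke rules above.
rules = [
    ({"hot", "smoky"}, "fire"),          # R1: IF hot AND smoky THEN ADD fire
    ({"alarm_beeps"}, "smoky"),          # R2: IF alarm_beeps THEN ADD smoky
    ({"fire"}, "switch_on_sprinklers"),  # R3: IF fire THEN ADD switch_on_sprinklers
]

def forward_chain(facts):
    wm = set(facts)                      # working memory
    changed = True
    while changed:                       # repeat until no rule adds a new fact
        changed = False
        for conditions, conclusion in rules:
            if conditions <= wm and conclusion not in wm:
                wm.add(conclusion)
                changed = True
    return wm

print(forward_chain({"alarm_beeps", "hot"}))
# {'alarm_beeps', 'hot', 'smoky', 'fire', 'switch_on_sprinklers'}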
Backward chaining system
Backward chaining means reasoning from goals back to facts. The idea is to focus the search. Rules and facts are processed using a backward chaining interpreter, which checks a hypothesis, e.g. "should I switch the sprinklers on?"
Backward chaining algorithm
Prove goal G:
If G is in the initial facts, it is proven.
Otherwise, find a rule which can be used to conclude G, and try to prove each of that rule's conditions.
Goal tree for the example: switch_on_sprinklers <- fire <- (hot AND smoky), and smoky <- alarm_beeps.
Encoding of rules
Rule R1 : IF hot AND smoky THEN fire
Rule R2 : IF alarm_beeps THEN smoky
Rule R3 : IF fire THEN switch_on_sprinklers
Fact F1 : hot [Given]
Fact F2 : alarm_beeps [Given]
Goal : Should I switch sprinklers on?
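A corresponding backward-chaining sketch, again using the alarm rules and facts above. The recursive prove function is an illustrative assumption of how a goal-driven interpreter could be written; it works here because the rule set contains no cycles.

# A minimal backward-chaining sketch (illustrative; same rules and facts as above).
rules = [
    ("fire", {"hot", "smoky"}),                 # R1
    ("smoky", {"alarm_beeps"}),                 # R2
    ("switch_on_sprinklers", {"fire"}),         # R3
]
facts = {"hot", "alarm_beeps"}                  # F1, F2

def prove(goal):
    """Return True if the goal is a known fact or can be concluded by some rule."""
    if goal in facts:
        return True
    for head, conditions in rules:
        if head == goal and all(prove(c) for c in conditions):
            return True
    return False

print(prove("switch_on_sprinklers"))            # True: the sprinklers should be switched on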
Forward vs Backward Chaining
The choice depends on the problem and on the properties of the rule set.
Backward chaining is likely to be better if there is a clear hypothesis to test. Examples: diagnostic or
classification problems, medical expert systems.
Forward chaining may be better if there is no clear hypothesis and we want to see what can be concluded
from the current situation.
Examples : Synthesis systems - design / configuration.
Uncertain Knowledge
When an agent knows enough facts about its environment, the logical approach enables it to derive plans
that are guaranteed to work. This is a good thing. Unfortunately, agents almost never have access to the
whole truth about their environment. Agents must, therefore, act under uncertainty. The real world is far
more complex than the wumpus world. For a logical agent, it might be impossible to construct a complete
and correct description of how its actions will work.
Suppose, for example, that the agent wants to drive someone to the airport to catch a flight and is
considering a plan, A90, that involves leaving home 90 minutes before the flight departs and driving at a
reasonable speed. Even though the airport is only about 15 miles away, the agent will not be able to
conclude with certainty that "Plan A90 will get us to the airport in time." Instead, it reaches the weaker
conclusion "Plan A90 will get us to the airport in time, as long as my car doesn't break down or run out of
gas, and I don't get into an accident, and there are no accidents on the bridge, and the plane doesn't leave
early, and . . . ." None of these conditions can be deduced, so the plan's success cannot be inferred. This is
an example of the qualification problem.
The right thing to do, the rational decision, therefore depends on both the relative importance of
various goals and the likelihood that, and degree to which, they will be achieved.
Handling Uncertain Knowledge
Diagnosis, whether for medicine, automobile repair, or anything else, is a task that almost always involves
uncertainty. Let us try to write rules for dental diagnosis using first-order logic, so that we can see how the
logical approach breaks down. Consider the following rule:
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) .
The problem is that this rule is wrong. Not all patients with toothaches have cavities; some of them have
gum disease, an abscess, or one of several other problems:
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) ∨ Disease(p, GumDisease) ∨ Disease(p, Abscess) ∨ . . .
Unfortunately, in order to make the rule true, we have to add an almost unlimited list of possible causes.
We could try turning the rule into a causal rule:
∀p Disease(p, Cavity) ⇒ Symptom(p, Toothache)
But this rule is not right either; not all cavities cause pain. The only way to fix the rule is to make it
logically exhaustive: to augment the left-hand side with all the qualifications required for a cavity to cause
a toothache. Even then, for the purposes of diagnosis, one must also take into account the possibility that
the patient might have a toothache and a cavity that are unconnected.
Trying to use first-order logic to cope with a domain like medical diagnosis thus fails for three main
reasons:
Laziness: It is too much work to list the complete set of antecedents or consequents needed to ensure an
exceptionless rule and too hard to use such rules.
Theoretical ignorance: Medical science has no complete theory for the domain.
Practical ignorance: Even if we know all the rules, we might be uncertain about a particular patient
because not all the necessary tests have been or can be run.
Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance.
We might not know for sure what afflicts a particular patient, but we believe that there is, say, an 80%
chance (that is, a probability of 0.8) that the patient has a cavity if he or she has a toothache.
Assigning a probability of 0 to a given sentence corresponds to an unequivocal belief that the sentence is
false, while assigning a probability of 1 corresponds to an unequivocal belief that the sentence is true.
Probabilities between 0 and 1 correspond to intermediate degrees of belief in the truth of the sentence. The
sentence itself is in fact either true or false.
All probability statements must therefore indicate the evidence with respect to which the probability is
being assessed. As the agent receives new percepts, its probability assessments are updated to reflect the
new evidence. Before the evidence is obtained, we talk about prior or unconditional probability; after the
evidence is obtained, we talk about posterior or conditional probability.
Uncertainty and Rational Decisions
The presence of uncertainty radically changes the way an agent makes decisions. A logical agent typically
has a goal and executes any plan that is guaranteed to achieve it. An action can be selected or rejected on
the basis of whether it achieves the goal, regardless of what other actions might achieve. When uncertainty
enters the picture, this is no longer the case.
To make such choices, an agent must first have preferences between the different possible outcomes of
the various plans. A particular outcome is a completely specified state, including such factors as whether
the agent arrives on time and the length of the wait at the airport. We will be using utility theory to
represent and reason with preferences. (The term utility is used here in the sense of "the quality of being
useful," not in the sense of the electric company or water works.) Utility theory says that every state has a
degree of usefulness, or utility, to an agent and that the agent will prefer states with higher utility.
The utility of a state is relative to the agent whose preferences the utility function is supposed to represent.
The utility of a state in which White has won a game of chess is obviously high for the agent playing
White, but low for the agent playing Black.
A utility function can even account for altruistic behavior, simply by including
the welfare of others as one of the factors contributing to the agent's own utility.
Preferences, as expressed by utilities, are combined with probabilities in the general theory of rational
decisions called decision theory:
Decision theory = probability theory + utility theory.
The fundamental idea of decision theory is that an agent is rational if and only if it chooses the action that
yields the highest expected utility, averaged over all the possible outcomes of the action. This is called the
principle of Maximum Expected Utility (MEU).
Design for a Decision-Theoretic Agent
The primary difference is that the decision-theoretic agent's knowledge of the current state is uncertain; the
agent's belief state is a representation of the probabilities of all possible actual states of the world. As time
passes, the agent accumulates more evidence and its belief state changes. Given the belief state, the agent
can make probabilistic predictions of action outcomes and hence select the action with highest expected
utility.
Basic Probability Notation
Any notation for describing degrees of belief must be able to deal with two main issues:
 The nature of the sentences to which degrees of belief are assigned
 The dependence of the degree of belief on the agent's experience
Propositions
Probability theory typically uses a language that is slightly more expressive than propositional logic.
The basic element of the language is the random variable, which can be thought of as referring to a "part"
of the world whose "status" is initially unknown. For example, Cavity might refer to whether my lower left
wisdom tooth has a cavity.
We will always capitalize the names of random variables.
(However, we still use lowercase, single-letter names to represent an unknown random variable,
for example: P(a) = 1 − P(¬a).)
Each random variable has a domain of values that it can take on.
For example, the domain of Cavity might be (true, false). (We will use lowercase for the names of values.)
The simplest kind of proposition asserts that a random variable has a particular value drawn from its
domain. For example, Cavity = true might represent the proposition that I do in fact have a cavity in my
lower left wisdom tooth.
Random Variables are divided into three kinds:
Boolean random variables, such as Cavity, have the domain (true, false). We will often abbreviate a
proposition such as Cavity = true simply by the lowercase name cavity. Similarly, Cavity = false would be
abbreviated by ¬cavity.
Discrete random variables, which include Boolean random variables as a special case, take on values
from a countable domain. For example, the domain of Weather might be (sunny, rainy, cloudy, snow). The
values in the domain must be mutually exclusive and exhaustive. Where no confusion arises, we will use,
for example, snow as an abbreviation for Weather = snow.
Continuous random variables take on values from the real numbers. The domain can be either the entire
real line or some subset such as the interval [0, 1]. For example, the proposition X = 4.02 asserts that the
random variable X has the exact value 4.02. Propositions concerning continuous random variables can also
be inequalities, such as X ≤ 4.02.
Elementary propositions, such as Cavity = true and Toothache = false, can be combined to form complex
propositions using all the standard logical connectives. For example, Cavity = true ∧ Toothache = false is a
proposition to which one may ascribe a degree of (dis)belief.
Atomic events
An atomic event is a complete specification of the state of the world about which the agent is uncertain. It
can be thought of as an assignment of particular values to all the variables of which the world is composed.
For example, if my world consists of only the Boolean variables Cavity and Toothache, then there are just
four distinct atomic events; the proposition Cavity = false ∧ Toothache = true is one such event.
Atomic events have some important properties:
They are mutually exclusive - at most one can actually be the case. For example,
cavity ∧ toothache and cavity ∧ ¬toothache cannot both be the case.
The set of all possible atomic events is exhaustive - at least one must be the case. That is, the disjunction
of all atomic events is logically equivalent to true.
Any particular atomic event entails the truth or falsehood of every proposition, whether simple or
complex. For example, the atomic event cavity ∧ ¬toothache entails the truth of cavity and the falsehood of
cavity ⇒ toothache.
Any proposition is logically equivalent to the disjunction of all atomic events that entail the truth of
the proposition. For example, the proposition cavity is equivalent to the disjunction of the atomic events
cavity ∧ toothache and cavity ∧ ¬toothache.
Prior probability
The unconditional or prior probability associated with a proposition a is the degree of belief accorded to
it in the absence of any other information; it is written as P(a). For example, if the prior probability that I
have a cavity is 0.1, then we would write
P(Cavity = true) = 0.1 or P(cavity) = 0.1 .
It is important to remember that P(a) can be used only when there is no other information. As soon as
some new information is known, we must reason with the conditional probability of a given that new
information.
We will use an expression such as P( Weather), which denotes a vector of values for the probabilities of
each individual state of the weather.
P( Weather = sunny) = 0.7
P( Weather = rain) = 0.2
P( Weather = cloudy) = 0.08
P( Weather = snow) = 0.02 .
we may simply write
P( Weather) = (0.7,0.2,0.08,0.02) .
This statement defines a prior probability distribution for the random variable Weather.
We will also use expressions such as P(Weather, Cavity) to denote the probabilities of all combinations of
the values of a set of random variables. In that case, P(Weather, Cavity) can be represented by a 4 × 2 table
of probabilities. This is called the joint probability distribution of Weather and Cavity.
Sometimes it will be useful to think about the complete set of random variables used to describe the world.
A joint probability distribution that covers this complete set is called the full joint probability
distribution. For example, if the world consists of just the variables Cavity, Toothache, and Weather, then
the full joint distribution is given by
P (Cavity, Toothache, Weather).
Full joint distribution specifies the probability of every atomic event and is therefore a complete
specification of one's uncertainty about the world in question.
Probability distributions for continuous variables are called probability density functions. Density
functions differ in meaning from discrete distributions.
Conditional probability
Once the agent has obtained some evidence concerning the previously unknown random variables making
up the domain, prior probabilities are no longer applicable. Instead, we use conditional or posterior
probabilities. The notation used is P(a|b), where a and b are any propositions. This is read as "the
probability of a, given that all we know is b." For example, P(cavity | toothache) = 0.8
indicates that if a patient is observed to have a toothache and no other information is yet available, then the
probability of the patient's having a cavity will be 0.8. A prior probability, such as P(cavity), can be thought
of as a special case of the conditional probability P(cavity | ), where the probability is conditioned on no
evidence.
Conditional probabilities can be defined in terms of unconditional probabilities. The defining equation is
P(a | b) = P(a ∧ b) / P(b) ,
which holds whenever P(b) > 0. This equation can also be written as
P(a ∧ b) = P(a | b) P(b) ,
which is called the product rule. The product rule is perhaps easier to remember: it comes from the fact
that, for a and b to be true, we need b to be true, and we also need a to be true given b. We can also have it
the other way around:
P(a ∧ b) = P(b | a) P(a)
In some cases, it is easier to reason in terms of prior probabilities of conjunctions, but for the most part, we
will use conditional probabilities as our vehicle for probabilistic inference.
AXIOMS OF PROBABILITY
1. All probabilities are between 0 and 1. For any proposition a, 0 ≤ P(a) ≤ 1.
2. Necessarily true (i.e., valid) propositions have probability 1, and necessarily false (i.e.,
unsatisfiable) propositions have probability 0: P(true) = 1 and P(false) = 0.
Next, we need an axiom that connects the probabilities of logically related propositions. The simplest way
to do this is to define the probability of a disjunction as follows:
3. The probability of a disjunction is given by P(a ∨ b) = P(a) + P(b) − P(a ∧ b).
These three axioms are often called Kolmogorov's axioms in honor of the Russian mathematician Andrei
Kolmogorov.
INFERENCE USING FULL JOINT DISTRIBUTION
We will use the full joint distribution as the "knowledge base" from which answers to all questions may be
derived.
For example, there are six atomic events in which cavity ∨ toothache holds:
P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
One particularly common task is to extract the distribution over some subset of variables or a single
variable. For example, adding the entries in the first row gives the unconditional or marginal probability of
cavity:
P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2 .
This process is called marginalization, or summing out, because the variables other than Cavity are summed
out. We can write the following general marginalization rule for any sets of variables Y and Z:
P(Y) = Σz P(Y, z) .
A variant of this rule involves conditional probabilities instead of joint probabilities, using the product rule:
P(Y) = Σz P(Y | z) P(z) .
This rule is called conditioning. Marginalization and conditioning will turn out to be useful rules for all
kinds of derivations involving probability expressions.
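As a concrete illustration of marginalization and conditioning, the following Python sketch assumes the usual Toothache-Catch-Cavity joint table; the table values are assumed here for demonstration only.

# Marginalization and conditioning over a small full joint distribution (assumed values).
joint = {  # (cavity, toothache, catch) -> probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

# P(cavity): sum out Toothache and Catch (marginalization)
p_cavity = sum(p for (c, t, k), p in joint.items() if c)
print(round(p_cavity, 3))             # 0.2

# P(cavity OR toothache): sum over the atomic events where either holds
p_cav_or_tooth = sum(p for (c, t, k), p in joint.items() if c or t)
print(round(p_cav_or_tooth, 3))       # 0.28

# P(cavity | toothache) via the product rule: P(a AND b) / P(b)
p_tooth = sum(p for (c, t, k), p in joint.items() if t)
p_cav_and_tooth = sum(p for (c, t, k), p in joint.items() if c and t)
print(round(p_cav_and_tooth / p_tooth, 3))   # 0.6 with this assumed table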
BAYES’ RULE
The product rule can be written in two forms:
P(a ∧ b) = P(a | b) P(b) and P(a ∧ b) = P(b | a) P(a) .
Equating the two right-hand sides and dividing by P(a), we get
P(b | a) = P(a | b) P(b) / P(a) .
This equation is known as Bayes' rule (also Bayes' law or Bayes' theorem). This simple equation underlies
all modern AI systems for probabilistic inference. The more general case of multivalued variables can be
written in the P notation as
P(Y | X) = P(X | Y) P(Y) / P(X) ,
where again this is to be taken as representing a set of equations, each dealing with specific values of the
variables.
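A small numeric illustration of Bayes' rule in Python. The disease/symptom figures are assumed purely for illustration (a rare disease m, a common symptom s).

# Bayes' rule on assumed illustrative numbers:
#     P(m | s) = P(s | m) P(m) / P(s)
p_m = 1 / 50000        # prior probability of the disease (assumed)
p_s = 1 / 20           # prior probability of the symptom (assumed)
p_s_given_m = 0.5      # likelihood of the symptom given the disease (assumed)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)     # 0.0002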
REPRESENTING KNOWLEDGE IN UNCERTAIN DOMAIN
A Bayesian network is a directed graph in which each node is annotated with quantitative probability
information. The full specification is as follows:
1. A set of random variables makes up the nodes of the network. Variables may be discrete or continuous.
2. A set of directed links or arrows connects pairs of nodes. If there is an arrow from node X to node Y, X is
said to be a parent of Y.
3. Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the
parents on the node.
4. The graph has no directed cycles (and hence is a directed acyclic graph, or DAG).
The topology of the network-the set of nodes and links-specifies the conditional independence relationships
that hold in the domain.
The intuitive meaning of an arrow in a properly constructed network is usually that X has a direct influence
on Y. It is usually easy for a domain expert to decide what direct influences exist in the domain.
You have a new burglar alarm installed at home. It is fairly reliable at detecting a burglary, but also
responds on occasion to minor earthquakes. You also have two neighbors, John and Mary, who have
promised to call you at work when they hear the alarm. John always calls when he hears the alarm, but
sometimes confuses the telephone ringing with the alarm and calls then, too. Mary, on the other hand, likes
rather loud music and sometimes misses the alarm altogether. Given the evidence of who has or has not
called, we would like to estimate the probability of a burglary.
In the case of the burglary network, the topology shows that burglary and earthquakes directly affect the
probability of the alarm's going off, but whether John and Mary call depends only on the alarm. The
network thus represents our assumptions that they do not perceive any burglaries directly, they do not
notice the minor earthquakes, and they do not confer before calling.
Notice that the network does not have nodes corresponding to Mary's currently listening to loud music or to
the telephone ringing and confusing John. These factors are summarized in the uncertainty associated with
the links from Alarm to JohnCalls and MaryCalls. This shows both laziness and ignorance in operation: it
would be a lot of work to find out why those factors would be more or less likely in any particular case, and
we have no reasonable way to obtain the relevant information anyway.
The probabilities actually summarize a potentially infinite set of circumstances in which the alarm might
fail to go off (high humidity, power failure, dead battery, cut wires, a dead mouse stuck inside the bell, etc.)
or John or Mary might fail to call and report it (out to lunch, on vacation, temporarily deaf, passing
helicopter, etc.). In this way, a small agent can cope with a very large world, at least approximately. Each
row in a CPT contains the conditional probability of each node value for a conditioning case. A
conditioning case is just a possible combination of values for the parent nodes-a miniature atomic event.
Each row must sum to 1, because the entries represent an exhaustive set of cases for the variable. For
Boolean variables, once you know that the probability of a true value is p, the probability of false must be 1
- p, so we often omit the second number, as in Figure 14.2. In general, a table for a Boolean variable with k
Boolean parents contains 2^k independently specifiable probabilities. A node with no parents has only one
row, representing the prior probabilities of each possible value of the variable.
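One simple way to hold such a network in a program is as a parent list plus conditional probability tables. The sketch below uses the commonly quoted textbook numbers for the burglary network; treat both the data layout and the values as assumptions made for illustration.

# The burglary network as plain Python data: parents plus CPTs giving
# P(node = True | parent values).  The numbers are assumed illustrative values.
parents = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}

cpt = {  # key: (node, tuple of parent values) -> P(node = True | parents)
    ("Burglary", ()): 0.001,
    ("Earthquake", ()): 0.002,
    ("Alarm", (True, True)): 0.95,  ("Alarm", (True, False)): 0.94,
    ("Alarm", (False, True)): 0.29, ("Alarm", (False, False)): 0.001,
    ("JohnCalls", (True,)): 0.90,   ("JohnCalls", (False,)): 0.05,
    ("MaryCalls", (True,)): 0.70,   ("MaryCalls", (False,)): 0.01,
}

def p_node(node, value, assignment):
    """P(node = value | parent values taken from the full assignment dict)."""
    pv = tuple(assignment[p] for p in parents[node])
    p_true = cpt[(node, pv)]
    return p_true if value else 1.0 - p_true

print(p_node("Alarm", True, {"Burglary": False, "Earthquake": False}))   # 0.001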
THE SEMANTICS OF BAYESIAN NETWORKS
There are two ways in which one can understand the semantics of Bayesian networks. The first is to see the
network as a representation of the joint probability distribution. The second is to view it as an encoding of
a collection of conditional independence statements. The two views are equivalent, but the first turns out to
be helpful in understanding how to construct networks, whereas the second is helpful in designing
inference procedures.
Representing the full joint distribution
A Bayesian network provides a complete description of the domain. Every entry in the full joint
probability distribution (hereafter abbreviated as "joint") can be calculated from the information in the
network. A generic entry in the joint distribution is the probability of a conjunction of particular
assignments to each variable, such as P(X1 = x1 ∧ . . . ∧ Xn = xn).
We use the notation P(x1, . . . , xn) as an abbreviation for this. The value of this entry is given by the formula
P(x1, . . . , xn) = Πi P(xi | parents(Xi))     (14.1)
where parents(Xi) denotes the specific values of the variables in Parents(Xi). Thus, each entry in the joint
distribution is represented by the product of the appropriate elements of the conditional probability tables
(CPTs) in the Bayesian network. The CPTs therefore provide a decomposed representation of the joint
distribution.
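Using the same assumed CPT values as in the sketch above, a single joint entry is just a product of CPT entries, e.g. the probability that both neighbours call, the alarm sounds, and there is neither a burglary nor an earthquake:

# One entry of the joint via Equation (14.1), with the assumed CPT values.
p = (0.90           # P(j | a)
     * 0.70         # P(m | a)
     * 0.001        # P(a | ~b, ~e)
     * (1 - 0.001)  # P(~b)
     * (1 - 0.002)) # P(~e)
print(p)            # about 0.000628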
A method for constructing Bayesian networks
Equation (14.1) defines what a given Bayesian network means. It does not, however, explain how to
construct a Bayesian network in such a way that the resulting joint distribution is a good representation of a
given domain. We will now show that Equation (14.1) implies certain conditional independence
relationships that can be used to guide the knowledge engineer in constructing the topology of the network.
First, we rewrite the joint distribution in terms of a conditional probability, using the product rule:
P(x1, . . . , xn) = P(xn | xn−1, . . . , x1) P(xn−1, . . . , x1) .
Then we repeat the process, reducing each conjunctive probability to a conditional probability and a
smaller conjunction. We end up with one big product:
P(x1, . . . , xn) = P(xn | xn−1, . . . , x1) P(xn−1 | xn−2, . . . , x1) . . . P(x2 | x1) P(x1) = Πi P(xi | xi−1, . . . , x1) .
CHAIN RULE: This identity holds true for any set of random variables and is called the chain rule.
Comparing it with Equation (14.1), we see that the specification of the joint distribution is equivalent to the
general assertion that, for every variable Xi in the network,
P(Xi | Xi−1, . . . , X1) = P(Xi | Parents(Xi))     (14.2)
provided that Parents(Xi) ⊆ {Xi−1, . . . , X1}. This last condition is satisfied by labeling the nodes in any order
that is consistent with the partial order implicit in the graph structure.
What Equation (14.2) says is that the Bayesian network is a correct representation of the domain only if
each node is conditionally independent of its predecessors in the node ordering, given its parents. Hence, in
order to construct a Bayesian network with the correct structure for the domain, we need to choose parents
for each node such that this property holds. Intuitively, the parents of node Xi should contain all those
nodes in X1, . . . , Xi−1
that directly influence Xi. For example, suppose we have completed the network in Figure 14.2 except for
the choice of parents for MaryCalls. MaryCalls is certainly influenced by whether there is a Burglary or an
Earthquake, but not directly influenced. Intuitively, our knowledge of the domain tells us that these events
influence Mary's calling behavior only through their effect on the alarm. Also, given the state of the alarm,
whether John calls has no influence on Mary's calling. Formally speaking, we believe that the following
conditional independence statement holds:
P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm) .
Compactness and Node ordering
The compactness of Bayesian networks is an example of a very general property of locally structured
(also called sparse) systems. In a locally structured system, each subcomponent interacts directly with only
a bounded number of other components, regardless of the total number of components.
Even in a locally structured domain, constructing a locally structured Bayesian network is not a trivial
problem. We require not only that each variable be directly influenced by only a few others, but also that
the network topology actually reflects those direct influences with the appropriate set of parents. Because
of the way that the construction procedure works, the "direct influencers" will have to be added to the
network first if they are to become parents of the node they influence. Therefore, the correct order in which
to add nodes is to add the "root causes" first, then the variables they influence, and so on, until we reach
the "leaves," which have no direct causal influence on the other variables. If we stick to a causal model, we
end up having to specify fewer numbers, and the numbers will often be easier to come up with.
Conditional Independence Relation in Bayesian Networks
The topological semantics is given by either of the following specifications, which are equivalent:
1. A node is conditionally independent of its non-descendants, given its parents. For example, in Figure
14.2, JohnCalls is independent of Burglary and Earthquake, given the value of Alarm.
2. A node is conditionally independent of all other nodes in the network, given its parents, children, and
children's parents-that is, given its Markov blanket. For example, Burglary is independent of JohnCalls and
MaryCalls, given Alarm and Earthquake.
EXACT INFERENCE IN BAYESIAN NETWORKS
The basic task for any probabilistic inference system is to compute the posterior probability distribution for
a set of query variables, given some observed event-that is, some assignment of values to a set of evidence
variables.
Inference By Enumeration
A query P(X | e) can be answered using
P(X | e) = α P(X, e) = α Σy P(X, e, y) .     (13.6)
Now, a Bayesian network gives a complete representation of the full joint distribution. More specifically,
Equation (14.1) shows that the terms P(x, e, y) in the joint distribution can be written as products of
conditional probabilities from the network. Therefore, a query can be answered using a Bayesian
network by computing sums of products of conditional probabilities from the network.
ENUMERATE-JOINT-ASK is given for inference by enumeration from the full joint distribution. The
algorithm takes as input a full joint distribution P and looks up values therein. Consider the query
P(Burglary | JohnCalls = true, MaryCalls = true). The hidden variables for this query are Earthquake and
Alarm. From Equation (13.6), using initial letters for the variables in order to shorten the expressions, we
have
P(B | j, m) = α Σe Σa P(B) P(e) P(a | B, e) P(j | a) P(m | a) .
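The nested sums can be transcribed directly into code. The sketch below answers the query by brute-force enumeration over the hidden variables, again with the assumed CPT values; it illustrates Equation (13.6) rather than the optimised ENUMERATION-ASK algorithm.

# Enumerating the hidden variables (Earthquake, Alarm) to estimate
# P(Burglary | JohnCalls = true, MaryCalls = true), with assumed CPT values.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls = true | Alarm)

def prob(x, p_true):
    return p_true if x else 1 - p_true

unnormalised = {}
for b in (True, False):
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            total += (prob(b, P_B) * prob(e, P_E) *
                      prob(a, P_A[(b, e)]) * P_J[a] * P_M[a])
    unnormalised[b] = total

alpha = 1.0 / sum(unnormalised.values())
print({b: round(alpha * p, 3) for b, p in unnormalised.items()})
# roughly {True: 0.284, False: 0.716}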
The Variable Elimination Algorithm
The idea is simple: do the calculation once and save the results for later use. This is a form of dynamic
programming. There are several versions of this approach; we present the variable elimination algorithm,
which is the simplest.
Variable elimination works by evaluating expressions such as Equation (14.3) in right-to-left order.
Intermediate results are stored, and summations over
each variable are done only for those portions of the expression that depend on the variable.
Examining this sequence of steps, we see that there are two basic computational operations required:
pointwise product of a pair of factors, and summing out a variable from a product of factors.
The pointwise product is not matrix multiplication, nor is it element-by-element multiplication.
The pointwise product of two factors f1 and f2 yields a new factor f whose variables are the union of the
variables in f1 and f2. Suppose the two factors have variables Y1, . . . , Yk in common. Then we have
f(X1 . . . Xj, Y1 . . . Yk, Z1 . . . Zl) = f1(X1 . . . Xj, Y1 . . . Yk) f2(Y1 . . . Yk, Z1 . . . Zl) .
Summing out a variable from a product of factors is also a straightforward computation.
The only trick is to notice that any factor that does not depend on the variable to be summed out can be
moved outside the summation process.
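The two operations can be sketched as follows; the factor representation (a variable list plus a table keyed by Boolean assignment tuples) and the example numbers (from the sprinkler network used later in these notes) are assumptions made for clarity.

# Factor operations used by variable elimination: pointwise product and summing out.
from itertools import product

def pointwise_product(f1, f2):
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for values in product([True, False], repeat=len(out_vars)):
        assign = dict(zip(out_vars, values))
        v1 = t1[tuple(assign[v] for v in vars1)]
        v2 = t2[tuple(assign[v] for v in vars2)]
        table[values] = v1 * v2
    return (out_vars, table)

def sum_out(var, f):
    vars_, t = f
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i+1:]
    table = {}
    for values, p in t.items():
        key = values[:i] + values[i+1:]
        table[key] = table.get(key, 0.0) + p
    return (out_vars, table)

# Example: P(Rain | Cloudy) * P(Cloudy), then sum out Cloudy to get P(Rain).
f_rain = (["Rain", "Cloudy"], {(True, True): 0.8, (False, True): 0.2,
                               (True, False): 0.2, (False, False): 0.8})
f_cloudy = (["Cloudy"], {(True,): 0.5, (False,): 0.5})
print(sum_out("Cloudy", pointwise_product(f_rain, f_cloudy))[1])
# {(True,): 0.5, (False,): 0.5}  -> P(Rain = true) = 0.5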
Complexity of Exact Inference
The time and space requirements of variable elimination are dominated by the size of the largest factor
constructed during the operation of the algorithm. This in turn is determined by the order of elimination of
variables and by the structure of the network. Networks in which there is at most one
undirected path between any two nodes are called singly connected networks or polytrees,
and they have a particularly nice property: the time and space complexity of exact inference in polytrees is
linear in the size of the network.
For multiply connected networks, variable elimination can have exponential time and space complexity in
the worst case, even when the number of parents per node is bounded.
Clustering Algorithms
The basic idea of clustering is to join individual nodes of the network to form cluster nodes in such a way
that the resulting network is a polytree. For example, the multiply connected network shown can be
converted into a polytree by combining the Sprinkler and Rain nodes into a cluster node called
Sprinkler+Rain, as shown in the figure.
The two Boolean nodes are replaced by a meganode that takes on four possible values: TT, TF, FT, and FF.
The meganode has only one parent, the Boolean variable Cloudy, so there are two conditioning cases.
Approximate Inference In Bayesian Networks
This section describes randomized sampling algorithms, also called Monte Carlo algorithms that provide
approximate answers whose accuracy depends on the number of samples generated.
The primitive element in any sampling algorithm is the generation of samples from a known probability
distribution. For example, an unbiased coin can be thought of as a random variable Coin with values
(heads, tails) and a prior distribution P(Coin) = (0.5,0.5). Sampling from this distribution is exactly like
flipping the coin: with probability 0.5 it will return heads, and with probability 0.5 it will return tails.
The simplest kind of random sampling process for Bayesian networks generates events from a network that
has no evidence associated with it. The idea is to sample each variable in turn, in topological order. The
probability distribution from which the value is sampled is conditioned on the values already assigned to
the variable's parents. This algorithm is shown in Figure 14.12. We can illustrate its operation on the
network in Figure 14.11(a), assuming an ordering [Cloudy, Sprinkler, Rain, WetGrass]:
1. Sample from P(Cloudy) = (0.5, 0.5); suppose this returns true.
2. Sample from P(Sprinkler | Cloudy = true) = (0.1, 0.9); suppose this returns false.
3. Sample from P(Rain | Cloudy = true) = (0.8, 0.2); suppose this returns true.
4. Sample from P(WetGrass | Sprinkler = false, Rain = true) = (0.9, 0.1); suppose this returns true.
In this case, PRIOR-SAMPLE returns the event [true, false, true, true].
It is easy to see that PRIOR-SAMPLE generates samples from the prior joint distribution specified by the
network. First, let S_PS(x1, . . . , xn) be the probability that a specific event is generated by the
PRIOR-SAMPLE algorithm. Just looking at the sampling process, we have
S_PS(x1, . . . , xn) = Πi P(xi | parents(Xi))
because each sampling step depends only on the parent values.
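A sketch of PRIOR-SAMPLE for the sprinkler network follows; the CPT values are the ones assumed in the walk-through above, and the WetGrass entries are further assumed illustrative values.

# Prior sampling for the sprinkler network (assumed CPT values).
import random

def prior_sample():
    cloudy = random.random() < 0.5
    sprinkler = random.random() < (0.1 if cloudy else 0.5)
    rain = random.random() < (0.8 if cloudy else 0.2)
    p_wet = {(True, True): 0.99, (True, False): 0.90,
             (False, True): 0.90, (False, False): 0.0}[(sprinkler, rain)]
    wet_grass = random.random() < p_wet
    return cloudy, sprinkler, rain, wet_grass

# Estimate P(Rain = true) by counting samples, in the spirit of Equation (14.4).
N = 10000
print(sum(prior_sample()[2] for _ in range(N)) / N)   # roughly 0.5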
In any sampling algorithm, the answers are computed by counting the actual samples generated. Suppose
there are N total samples, and let N(x1, . . . , xn) be the frequency of the specific event x1, . . . , xn. We
expect this frequency to converge, in the limit, to its expected value according to the sampling probability:
limN→∞ N(x1, . . . , xn)/N = S_PS(x1, . . . , xn) = P(x1, . . . , xn)     (14.4)
Whenever we use an approximate equality (≈) in what follows, we mean it in exactly this sense - that the
estimated probability becomes exact in the large-sample limit. Such an estimate is called consistent. For
example, one can produce a consistent estimate of the probability of any partially specified event x1, . . . ,
xm, where m ≤ n, as follows:
P(x1, . . . , xm) ≈ N(x1, . . . , xm)/N .
That is, the probability of the event can be estimated as the fraction of all complete events generated by the
sampling process that match the partially specified event. For example, if we generate 1000 samples from
the sprinkler network, and 511 of them have Rain = true, then the estimated probability of rain, written as
P(Rain = true), is 0.511.
Rejection Sampling in Bayesian Networks
Rejection sampling is a general method for producing samples from a hard-to-sample distribution given an
easy-to-sample distribution.
First, it generates samples from the prior distribution specified by the network. Then, it rejects all those that
do not match the evidence. Finally, the estimate
P(X = x|e) is obtained by counting how often X = x occurs in the remaining samples.
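A sketch of rejection sampling for the sprinkler network, reusing the same assumed prior sampler; samples inconsistent with the evidence are simply discarded.

# Rejection sampling: draw prior samples, keep those matching the evidence, count.
import random

def prior_sample():
    cloudy = random.random() < 0.5
    sprinkler = random.random() < (0.1 if cloudy else 0.5)
    rain = random.random() < (0.8 if cloudy else 0.2)
    wet = random.random() < {(True, True): 0.99, (True, False): 0.90,
                             (False, True): 0.90, (False, False): 0.0}[(sprinkler, rain)]
    return {"Cloudy": cloudy, "Sprinkler": sprinkler, "Rain": rain, "WetGrass": wet}

def rejection_sample(query, evidence, n=100000):
    counts = {True: 0, False: 0}
    for _ in range(n):
        s = prior_sample()
        if all(s[var] == val for var, val in evidence.items()):  # keep consistent samples
            counts[s[query]] += 1
        # otherwise the sample is rejected
    total = counts[True] + counts[False]
    return counts[True] / total if total else None

# Estimate P(Rain = true | Sprinkler = true)
print(rejection_sample("Rain", {"Sprinkler": True}))   # roughly 0.3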
Likelihood Weighting
Likelihood weighting avoids the inefficiency of rejection sampling by generating only events that are
consistent with the evidence e.
LIKELIHOOD-WEIGHTING (see Figure 14.14) fixes the values for the evidence variables E and samples
only the remaining variables X and Y. This guarantees that each event generated is consistent with the
evidence. Not all events are equal. Before tallying the counts in the distribution for the query variable, each
event is weighted by the likelihood that the event accords to the evidence, as measured by the product of
the conditional probabilities for each evidence variable, given its parents. Intuitively, events in which the
actual evidence appears unlikely should be given less weight.
Let us apply the algorithm to the network shown in Figure 14.11(a), with the query P(Rain | Sprinkler =
true, WetGrass = true). The process goes as follows: First, the weight w is set to 1.0. Then an event is
generated:
(i) Sample from P(Cloudy) = (0.5, 0.5); suppose this returns true.
(ii) Sprinkler is an evidence variable with value true. Therefore, we set
w ← w × P(Sprinkler = true | Cloudy = true) = 0.1 .
(iii) Sample from P(Rain | Cloudy = true) = (0.8, 0.2); suppose this returns true.
(iv) WetGrass is an evidence variable with value true. Therefore, we set
w ← w × P(WetGrass = true | Sprinkler = true, Rain = true) = 0.099 .
Here WEIGHTED-SAMPLE returns the event [true, true, true, true] with weight 0.099, and this is tallied
under Rain = true. The weight is low because the event describes a cloudy day, which makes the sprinkler
unlikely to be on.
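The same walk-through can be coded directly: evidence variables are never sampled, they only multiply the weight. The sprinkler-network CPT values are assumed as before.

# Likelihood weighting for P(Rain | Sprinkler = true, WetGrass = true).
import random

def weighted_sample():
    w = 1.0
    cloudy = random.random() < 0.5
    w *= 0.1 if cloudy else 0.5                     # Sprinkler fixed to true: weight by P(s | Cloudy)
    rain = random.random() < (0.8 if cloudy else 0.2)
    w *= 0.99 if rain else 0.90                     # WetGrass fixed to true: weight by P(w | s, Rain)
    return rain, w

totals = {True: 0.0, False: 0.0}
for _ in range(100000):
    rain, w = weighted_sample()
    totals[rain] += w                               # tally the weight under Rain = rain

norm = totals[True] + totals[False]
print(round(totals[True] / norm, 3))                # roughly 0.32 (estimate of P(rain | s, w))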
To understand why likelihood weighting works, we start by examining the sampling distribution S_WS for
WEIGHTED-SAMPLE. Remember that the evidence variables E are fixed with values e. We will call the
other variables Z, that is, Z = {X} ∪ Y. The algorithm samples each variable in Z given its parent values:
S_WS(z, e) = Πi P(zi | parents(Zi))     (14.6)
Notice that Parents(Zi) can include both hidden variables and evidence variables. Unlike the prior
distribution P(z), the distribution S_WS pays some attention to the evidence: the sampled values for each Zi
will be influenced by evidence among Zi's ancestors. On the other hand, S_WS pays less attention to the
evidence than does the true posterior distribution P(Z | e), because the sampled values for each Zi ignore
evidence among Zi's non-ancestors.
The likelihood weight w makes up for the difference between the actual and desired sampling distributions.
The weight for a given sample x, composed from z and e, is the product of the likelihoods for each
evidence variable given its parents (some or all of which may be among the Zi):
w(z, e) = Πi P(ei | parents(Ei))     (14.7)
Multiplying Equations (14.6) and (14.7), we see that the weighted probability of a sample has the
particularly convenient form
S_WS(z, e) w(z, e) = Πi P(zi | parents(Zi)) Πi P(ei | parents(Ei)) = P(z, e) .
Because likelihood weighting uses all the samples generated, it can be much more efficient than rejection
sampling. It will, however, suffer a degradation in performance as the number of evidence variables
increases. This is because most samples will have very low weights and hence the weighted estimate will
be dominated by the tiny fraction of samples that accord more than an infinitesimal likelihood to the
evidence.
Inference by Markov chain simulation
In this section, we describe the Markov chain Monte Carlo (MCMC) algorithm for inference in Bayesian
networks.
The MCMC algorithm
MCMC generates each event by making a random change to the preceding event. It is therefore helpful to
think of the network as being in a particular current state specifying a value for every variable. The next
state is generated by randomly sampling a value for one of the nonevidence variables Xi, conditioned on
the current values of the variables in the Markov
blanket of Xi.
MCMC therefore wanders randomly around the state space (the space of possible complete assignments),
flipping one variable at a time, but keeping the evidence variables fixed.
Consider the query P(Rain | Sprinkler = true, WetGrass = true) applied to the network in Figure 14.11(a).
The evidence variables Sprinkler and WetGrass are fixed to their observed values and the hidden variables
Cloudy and Rain are initialized randomly - let us say to true and false respectively. Thus, the initial state is
[true, true, false, true]. Now the following steps are executed repeatedly:
(a) Cloudy is sampled, given the current values of its Markov blanket variables: in this case, we sample
from P(Cloudy | Sprinkler = true, Rain = false). (Shortly, we will show how to calculate this distribution.)
Suppose the result is Cloudy = false. Then the new current state is [false, true, false, true].
(b) Rain is sampled, given the current values of its Markov blanket variables: in this case, we sample from
P(Rain | Cloudy = false, Sprinkler = true, WetGrass = true). Suppose this yields Rain = true. The new
current state is [false, true, true, true].
Each state visited during this process is a sample that contributes to the estimate for the query variable
Rain. If the process visits 20 states where Rain is true and 60 states where Rain is false, then the answer to
the query is NORMALIZE((20, 60)) = (0.25,0.75). The complete
algorithm is shown in Figure 14.15.
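A sketch of Gibbs sampling for the same query. Each step resamples one non-evidence variable from its distribution given its Markov blanket; the sprinkler-network CPT values are assumed as before.

# Gibbs sampling (MCMC) for P(Rain | Sprinkler = true, WetGrass = true).
import random

P_C = 0.5
P_S = {True: 0.1, False: 0.5}                 # P(Sprinkler = true | Cloudy)
P_R = {True: 0.8, False: 0.2}                 # P(Rain = true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}

def bern(value, p_true):                      # P(var = value) from a Boolean CPT entry
    return p_true if value else 1 - p_true

SPRINKLER, WETGRASS = True, True              # evidence, kept fixed
cloudy, rain = random.random() < 0.5, random.random() < 0.5
counts = {True: 0, False: 0}

for _ in range(100000):
    # Resample Cloudy given its Markov blanket (Sprinkler, Rain):
    p_t = P_C * bern(SPRINKLER, P_S[True]) * bern(rain, P_R[True])
    p_f = (1 - P_C) * bern(SPRINKLER, P_S[False]) * bern(rain, P_R[False])
    cloudy = random.random() < p_t / (p_t + p_f)
    # Resample Rain given its Markov blanket (Cloudy, Sprinkler, WetGrass):
    p_t = bern(True, P_R[cloudy]) * P_W[(SPRINKLER, True)]
    p_f = bern(False, P_R[cloudy]) * P_W[(SPRINKLER, False)]
    rain = random.random() < p_t / (p_t + p_f)
    counts[rain] += 1                         # every visited state is a sample

print(round(counts[True] / sum(counts.values()), 3))   # roughly 0.32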
Why MCMC works
Let q(x → x') be the probability that the process makes a transition from state x to state x'. This
transition probability defines what is called a Markov chain on the state space. Suppose we run the Markov
chain for long enough; then the fraction of time spent in each state converges to the chain's stationary distribution.
Let Xi be the variable to be sampled, and let X̄i be all the hidden variables other than Xi. Their values in
the current state are xi and x̄i. If we sample a new value xi' for Xi conditionally on all the other variables,
including the evidence, we have
q(x → x') = q(xi → xi') = P(xi' | x̄i, e) .
This transition probability is called the Gibbs sampler and is a particularly convenient form
of MCMC.
INFERENCE IN TEMPORAL MODELS
The basic tasks include:
Filtering or monitoring: This is the task of computing the belief state-the posterior distribution over the
current state, given all evidence to date. That is, we wish to compute P(Xt | e1:t), assuming that evidence
arrives in a continuous stream beginning at t = 1. In the umbrella example, this would mean computing the
probability of rain today, given all the observations of the umbrella carrier made so far. Filtering is what a
rational agent needs to do in order to keep track of the current state so that rational decisions
can be made.
Prediction: This is the task of computing the posterior distribution over the future state, given all evidence
to date. In the umbrella example, this might mean computing the probability of rain three days
from now, given all the observations of the umbrella-carrier made so far. Prediction is useful for evaluating
possible courses of action.
Smoothing, or hindsight: This is the task of computing the posterior distribution over a past state, given
all evidence up to the present. In the umbrella example, it might mean computing the probability that it
rained last Wednesday, given all the observations of the umbrella carrier made up to today. Hindsight
provides a better estimate of the state than was
available at the time, because it incorporates more evidence.
Most likely explanation: Given a sequence of observations, we might wish to find the sequence of states
that is most likely to have generated those observations. For example, if the umbrella appears on each of the
first three days and is absent on the fourth, then the most likely explanation is that
it rained on the first three days and did not rain on the fourth. Algorithms for this task are useful in many
applications, including speech recognition-where the aim is to find the most likely sequence of words,
given a series of sounds-and the reconstruction of bit strings transmitted over a noisy channel.
Filtering and Prediction
We will show that this can be done in a simple online fashion:
given the result of filtering up to time t, one can easily compute the result for t + 1 from the new evidence
e_t+1. That is,
P(X_t+1 | e_1:t+1) = f(e_t+1, P(X_t | e_1:t))
for some function f. This process is often called recursive estimation.
We can view the calculation as actually being composed of two parts: first, the current state distribution is
projected forward from t to t + 1; then it is updated using the new evidence e_t+1. This two-part process
emerges quite simply:
P(X_t+1 | e_1:t+1) = α P(e_t+1 | X_t+1) Σ_xt P(X_t+1 | xt) P(xt | e_1:t) .
Here and throughout this chapter, α is a normalizing constant used to make probabilities sum
up to 1. The term Σ_xt P(X_t+1 | xt) P(xt | e_1:t) = P(X_t+1 | e_1:t) represents a one-step prediction of the next
state, and the term P(e_t+1 | X_t+1) then updates this prediction with the new evidence.
Smoothing
Smoothing is the process of computing the distribution over past states
given evidence up to the present; that is, P(Xk | e_1:t) for 1 ≤ k < t. (See Figure 15.3.) This is done most
conveniently in two parts - the evidence up to k and the evidence from k + 1 to t:
P(Xk | e_1:t) = α f_1:k × b_k+1:t ,
where f_1:k is the forward (filtering) message and b_k+1:t is the backward message.
Both the forward and backward recursions take a constant amount of time per step; hence, the time
complexity of smoothing with respect to evidence e_1:t is O(t). This is the complexity for smoothing at a
particular time step k. If we want to smooth the whole sequence, one obvious method is simply to run the
whole smoothing process once for each time step to be smoothed. This results in a time complexity of
O(t^2). A better approach uses a very simple application of dynamic programming to reduce the complexity
to O(t). A clue appears in the preceding analysis of the umbrella example, where we were able to reuse the
results of the forward filtering phase. The key to the linear-time algorithm is to record the results of forward
filtering over the whole sequence. Then we run the backward recursion from t down to 1, computing the
smoothed estimate at each step k from the computed backward message b_k+1:t and the stored forward
message f_1:k. The algorithm, aptly called the forward-backward algorithm, is shown in Figure 15.4.
Drawbacks of the forward-backward algorithm:
1. Its space complexity can be too high for applications where the state space is large and the sequences are
long.
2. It needs to be modified to work in an online setting where smoothed estimates must be computed for
earlier time slices as new observations are continuously added to the end of the sequence. The most common
requirement is for fixed-lag smoothing, which requires computing the smoothed estimate for a fixed lag d.
That is, smoothing is done for the time slice d steps behind the current time t; as t increases, the smoothing
has to keep up. Obviously, we can run the forward-backward algorithm over the d-step "window" as each
new observation is added, but this seems inefficient.
Hidden Markov Models
An HMM is a temporal probabilistic model in which the state of the process is described by a single
random
variable. The possible values of the variable are the possible states of the world. Additional state variables
can be added to a temporal model while staying within the HMM framework, but only by combining all the
state variables into a single "megavariable" whose values are all possible tuples of values of the individual
state variables.
Simplified Matrix Algorithms
With a single, discrete state variable Xt, we can give concrete form to the representations of the transition
model, the sensor model, and the forward and backward messages. Let the state variable Xt have values
denoted by integers 1, . . . , S, where S is the number of possible states. The transition model becomes an
S × S matrix T, where Tij = P(Xt = j | Xt−1 = i).
That is, Tij is the probability of a transition from state i to state j. For example, the transition matrix for the
umbrella world (with states rain, no rain) is
T = ( 0.7  0.3 ; 0.3  0.7 ) .
We also put the sensor model in matrix form. In this case, because the value of the evidence variable Et is
known to be, say, et, we need use only that part of the model specifying the probability that et appears. For
each time step t, we construct a diagonal matrix Ot whose diagonal entries are given by the values P(et | Xt
= i) and whose other entries are 0. For example, on day 1 in the umbrella world, U1 = true, so, from Figure
15.2, we have
O1 = ( 0.9  0 ; 0  0.2 ) .
Now, if we use column vectors to represent the forward and backward messages, the computations become
simple matrix-vector operations. The forward equation (15.3) becomes
f_1:t+1 = α O_t+1 T^T f_1:t ,
and the backward equation (15.7) becomes
b_k+1:t = T O_k+1 b_k+2:t .
Besides providing an elegant description of the filtering and smoothing algorithms for HMMs, the matrix
formulation reveals opportunities for improved algorithms. The first is a simple variation on the
forward-backward algorithm that allows smoothing to be carried out in constant space, independently of the
length of the sequence.
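A sketch of the matrix formulation for the umbrella world, using the commonly quoted transition and sensor values (assumed here) and numpy for the matrix-vector operations.

# Forward filtering and forward-backward smoothing for the umbrella world (assumed values).
import numpy as np

T = np.array([[0.7, 0.3],       # state order: (rain, no rain)
              [0.3, 0.7]])
def O(umbrella):                # diagonal sensor matrix for one observation
    return np.diag([0.9, 0.2]) if umbrella else np.diag([0.1, 0.8])

def normalize(v):
    return v / v.sum()

evidence = [True, True]         # umbrella seen on days 1 and 2 (assumed)
f = np.array([0.5, 0.5])        # prior P(X_0)

# Forward pass: f_{1:t+1} = alpha * O_{t+1} T^T f_{1:t}
forwards = []
for e in evidence:
    f = normalize(O(e) @ T.T @ f)
    forwards.append(f)

# Backward pass: b_{k+1:t} = T O_{k+1} b_{k+2:t}
b = np.array([1.0, 1.0])
smoothed = [None] * len(evidence)
for k in range(len(evidence) - 1, -1, -1):
    smoothed[k] = normalize(forwards[k] * b)    # combine forward and backward messages
    b = T @ O(evidence[k]) @ b

print(forwards[1])   # filtered P(Rain_2 | u_1, u_2), about [0.883, 0.117]
print(smoothed[0])   # smoothed P(Rain_1 | u_1, u_2), about [0.883, 0.117]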
Fuzzy Reasoning
Fuzzy Logic Systems (FLS) produce acceptable but definite output in response to incomplete,
ambiguous, distorted, or inaccurate (fuzzy) input.
What is Fuzzy Logic?
Fuzzy Logic (FL) is a method of reasoning that resembles human reasoning. The approach of FL
imitates the way of decision making in humans that involves all intermediate possibilities between digital
values YES and NO.
The conventional logic block that a computer can understand takes precise input and produces a
definite output as TRUE or FALSE, which is equivalent to human’s YES or NO.
The inventor of fuzzy logic, Lotfi Zadeh, observed that unlike computers, the human decision making
includes a range of possibilities between YES and NO, such as −
CERTAINLY YES
POSSIBLY YES
CANNOT SAY
POSSIBLY NO
CERTAINLY NO
The fuzzy logic works on the levels of possibilities of input to achieve the definite output.
Implementation
 It can be implemented in systems with various sizes and capabilities, ranging from small
micro-controllers to large, networked, workstation-based control systems.
 It can be implemented in hardware, software, or a combination of both.
Why Fuzzy Logic?
Fuzzy logic is useful for commercial and practical purposes.
 It can control machines and consumer products.
 It may not give accurate reasoning, but acceptable reasoning.
 Fuzzy logic helps to deal with the uncertainty in engineering.
Fuzzy Logic Systems Architecture
It has four main parts as shown −
 Fuzzification Module − It transforms the system inputs, which are crisp numbers, into fuzzy sets.
It splits the input signal into five steps such as −
LP  x is Large Positive
MP  x is Medium Positive
S   x is Small
MN  x is Medium Negative
LN  x is Large Negative
 Knowledge Base − It stores IF-THEN rules provided by experts.
 Inference Engine − It simulates the human reasoning process by making fuzzy inference on the
inputs and IF-THEN rules.
 Defuzzification Module − It transforms the fuzzy set obtained by the inference engine into a
crisp value.
The membership functions work on fuzzy sets of variables.
Membership Function
Membership functions allow you to quantify linguistic term and represent a fuzzy set graphically. A
membership function for a fuzzy set A on the universe of discourse X is defined as µA:X → [0,1].
Here, each element of X is mapped to a value between 0 and 1. It is called membership value or degree of
membership. It quantifies the degree of membership of the element in X to the fuzzy set A.
 x axis represents the universe of discourse.
 y axis represents the degrees of membership in the [0, 1] interval.
 There can be multiple membership functions applicable to fuzzify a numerical value. Simple
membership functions are used, as the use of complex functions does not add more precision in the output.
All membership functions for LP, MP, S, MN, and LN are shown as below −
The triangular membership function shapes are most common among various other membership
function shapes such as trapezoidal, singleton, and Gaussian.
Here, the input to 5-level fuzzifier varies from -10 volts to +10 volts. Hence the corresponding output also
changes.
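A sketch of a triangular membership function and a five-level fuzzifier for a -10 V to +10 V input; the breakpoints for each linguistic level are assumed for illustration.

# Triangular membership functions for a 5-level fuzzifier (assumed breakpoints).
def triangular(x, a, b, c):
    """Degree of membership for a triangle with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

levels = {           # (a, b, c) breakpoints for each linguistic level (assumed)
    "LN": (-10, -10, -5),
    "MN": (-10, -5, 0),
    "S":  (-5, 0, 5),
    "MP": (0, 5, 10),
    "LP": (5, 10, 10),
}

def fuzzify(x):
    return {name: round(triangular(x, *abc), 2) for name, abc in levels.items()}

print(fuzzify(3.0))   # e.g. {'LN': 0.0, 'MN': 0.0, 'S': 0.4, 'MP': 0.6, 'LP': 0.0}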
Example of a Fuzzy Logic System
Let us consider an air conditioning system with 5-level fuzzy logic system. This system adjusts the
temperature of air conditioner by comparing the room temperature and the target temperature value.
Algorithm
 Define linguistic variables and terms.
 Construct membership functions for them.
 Construct knowledge base of rules.
 Convert crisp data into fuzzy data sets using membership functions. (fuzzification)
 Evaluate rules in the rule base. (inference engine)
 Combine results from each rule. (inference engine)
 Convert output data into non-fuzzy values. (defuzzification)
Logic Development
Step 1: Define linguistic variables and terms
Linguistic variables are input and output variables in the form of simple words or sentences. For room
temperature, cold, warm, hot, etc., are linguistic terms.
Temperature (t) = {very-cold, cold, warm, very-warm, hot}
Every member of this set is a linguistic term and it can cover some portion of overall temperature values.
Step 2: Construct membership functions for them
The membership functions of temperature variable are as shown −
Step 3: Construct knowledge base rules
Create a matrix of room temperature values versus target temperature values that an air conditioning
system is expected to provide.
Room Temp. \ Target | Very_Cold  | Cold       | Warm       | Hot        | Very_Hot
Very_Cold           | No_Change  | Heat       | Heat       | Heat       | Heat
Cold                | Cool       | No_Change  | Heat       | Heat       | Heat
Warm                | Cool       | Cool       | No_Change  | Heat       | Heat
Hot                 | Cool       | Cool       | Cool       | No_Change  | Heat
Very_Hot            | Cool       | Cool       | Cool       | Cool       | No_Change
Build a set of rules into the knowledge base in the form of IF-THEN-ELSE structures.
Sr. No.  Condition                                                Action
1        IF temperature = (Cold OR Very_Cold) AND target = Warm   THEN Heat
2        IF temperature = (Hot OR Very_Hot) AND target = Warm     THEN Cool
3        IF temperature = Warm AND target = Warm                  THEN No_Change
Step 4: Obtain fuzzy value
Fuzzy set operations perform evaluation of rules. The operations used for OR and AND are Max and
Min respectively. Combine all results of evaluation to form a final result. This result is a fuzzy value.
Step 5: Perform defuzzification
Defuzzification is then performed according to membership function for output variable.
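Steps 4 and 5 can be sketched as follows. The fuzzified input degrees, the Min/Max rule evaluation, and the weighted-average defuzzifier (with assumed representative output values) are all illustrative assumptions, not the only possible choices.

# Rule evaluation (AND -> min, OR -> max) and a simple weighted-average defuzzification.
room = {"Cold": 0.2, "Warm": 0.7, "Hot": 0.1}   # fuzzified room temperature (assumed degrees)
target = {"Warm": 1.0}                          # fuzzified target temperature (assumed)

heat      = min(max(room.get("Cold", 0), room.get("Very_Cold", 0)), target["Warm"])   # rule 1
cool      = min(max(room.get("Hot", 0), room.get("Very_Hot", 0)), target["Warm"])     # rule 2
no_change = min(room.get("Warm", 0), target["Warm"])                                  # rule 3

# Defuzzification by weighted average of representative output values (assumed):
outputs = {"Heat": +5.0, "No_Change": 0.0, "Cool": -5.0}     # temperature adjustment, assumed
strengths = {"Heat": heat, "No_Change": no_change, "Cool": cool}
crisp = sum(outputs[k] * strengths[k] for k in outputs) / sum(strengths.values())
print(round(crisp, 2))   # 0.5 with these assumed degrees: a small heating adjustment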
Application Areas of Fuzzy Logic
The key application areas of fuzzy logic are as given −
Automotive Systems
 Automatic Gearboxes
 Four-Wheel Steering
 Vehicle environment control
Consumer Electronic Goods
 Hi-Fi Systems
 Photocopiers
 Still and Video Cameras
 Television
Domestic Goods
 Microwave Ovens
 Refrigerators
 Toasters
 Vacuum Cleaners
 Washing Machines
Environment Control
 Air Conditioners/Dryers/Heaters
 Humidifiers
Advantages of FLSs
 Mathematical concepts within fuzzy reasoning are very simple.
 You can modify an FLS by just adding or deleting rules, due to the flexibility of fuzzy logic.
 Fuzzy logic systems can take imprecise, distorted, noisy input information.
 FLSs are easy to construct and understand.
 Fuzzy logic is a solution to complex problems in all fields of life, including medicine, as
it resembles human reasoning and decision making.
Disadvantages of FLSs
 There is no systematic approach to fuzzy system designing.
 They are understandable only when simple.
 They are suitable for the problems which do not need high accuracy.
Dempster-Shafer Theory
• Dempster-Shafer theory is an approach to combining evidence
• Dempster (1967) developed means for combining degrees of belief derived from independent items of
evidence.
• His student, Glenn Shafer (1976), developed a method for obtaining degrees of belief for one question
from subjective probabilities for a related question.
• People working in Expert Systems in the 1980s saw their approach as ideally suitable for such
systems.
• Each fact has a degree of support, between 0 and 1:
– 0 No support for the fact
– 1 full support for the fact
• Differs from the Bayesian approach in that:
– Belief in a fact and its negation need not sum to 1.
– Both values can be 0 (meaning no evidence for or against the fact)
Frame of discernment:
Θ = { θ1, θ2, …, θn}
• Bayes was concerned with evidence that supported single conclusions (e.g., evidence for each outcome
θi in Θ):
p(θi | E)
• D-S Theory is concerned with evidences which support subsets of outcomes in Θ, e.g.,
θ1 ∨ θ2 ∨ θ3 or {θ1, θ2, θ3}
• The “frame of discernment” (or “Power set”) of Θ is the set of all possible subsets of Θ:
– E.g., if Θ = { θ1, θ2, θ3}
• Then the frame of discernment of Θ is:
( Ø, θ1, θ2, θ3, {θ1, θ2}, {θ1, θ3}, {θ2, θ3}, { θ1, θ2, θ3} )
• Ø, the empty set, has a probability of 0, since one of the outcomes has to be true.
• Each of the other elements in the power set has a probability between 0 and 1.
• The probability of { θ1, θ2, θ3} is 1.0 since one has to be true.
Mass function m(A):
(where A is a member of the power set)
= proportion of all evidence that supports this element of the power set.
“The mass m(A) of a given member of the power set, A, expresses the proportion of all relevant and
available evidence that supports the claim that the actual state belongs to A but to no particular subset of
A.” (wikipedia)
“The value of m(A) pertains only to the set A and makes no additional claims about any subsets of A,
each of which has, by definition, its own mass.
Mass function m(A):
• Each m(A) is between 0 and 1.
• All m(A) sum to 1.
• m(Ø) is 0 - at least one must be true.
Interpretation of m({A ∨ B}) = 0.3
• means there is evidence for {A ∨ B} that cannot be divided among more specific
beliefs for A or B.
Belief in A:
The belief in an element A of the Power set is the sum of the masses of elements which
are subsets of A (including A itself).
E.g., given A = {q1, q2, q3},
Bel(A) = m({q1}) + m({q2}) + m({q3}) + m({q1, q2}) + m({q2, q3}) + m({q1, q3}) + m({q1, q2, q3})
Belief in A: example
• Given the mass assignments as assigned by the detectives:
A {B}
{J} {S}
m(A) 0.1
0.2 0.1
{B,J}
{B,S}{J,S} {B,J,S}
0.1 0.1 0.3 0.1
 bel({B}) = m({B}) = 0.1
 bel({B,J}) = m({B})+m({J})+m({B,J}) =0.1+0.2+0.1=0.4
Result:
A {B}
{J} {S}
{B,J}
{B,S}{J,S} {B,J,S}
m(A) 0.1
0.2 0.1
0.1 0.1 0.3 0.1
bel(A) 0.1
0.2 0.1
0.4 0.3 0.6 1.0
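The belief computation above can be checked with a few lines of Python; the masses are taken from the table, and bel(A) sums the masses of all subsets of A.

# Belief from masses in Dempster-Shafer theory (values from the table above).
masses = {
    frozenset("B"): 0.1, frozenset("J"): 0.2, frozenset("S"): 0.1,
    frozenset("BJ"): 0.1, frozenset("BS"): 0.1, frozenset("JS"): 0.3,
    frozenset("BJS"): 0.1,
}

def bel(A):
    A = frozenset(A)
    return sum(m for subset, m in masses.items() if subset <= A)

print(round(bel("B"), 2))     # 0.1
print(round(bel("BJ"), 2))    # 0.4  = m({B}) + m({J}) + m({B,J})
print(round(bel("BJS"), 2))   # 1.0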