CS6659 - ARTIFICIAL INTELLIGENCE (VI SEM CSE)

UNIT I - INTRODUCTION TO AI AND PRODUCTION SYSTEMS 9
Introduction to AI - Problem formulation, problem definition - Production systems, control strategies, search strategies. Problem characteristics, production system characteristics. Specialized production systems - Problem solving methods - Problem graphs, matching, indexing and heuristic functions - Hill climbing - Depth first and breadth first, constraint satisfaction - Related algorithms, measures of performance and analysis of search algorithms.

UNIT II - REPRESENTATION OF KNOWLEDGE 9
Game playing - Knowledge representation, knowledge representation using predicate logic, introduction to predicate calculus, resolution, use of predicate calculus, knowledge representation using other logics - Structured representation of knowledge.

UNIT III - KNOWLEDGE INFERENCE 9
Knowledge representation - Production based systems, frame based systems. Inference - Backward chaining, forward chaining, rule value approach, fuzzy reasoning - Certainty factors, Bayesian theory - Bayesian networks - Dempster-Shafer theory.

UNIT IV - PLANNING AND MACHINE LEARNING 9
Basic plan generation systems - STRIPS - Advanced plan generation systems - K-STRIPS - Strategic explanations - Why, why not and how explanations. Learning - Machine learning, adaptive learning.

UNIT V - EXPERT SYSTEMS 9
Expert systems - Architecture of expert systems, roles of expert systems - Knowledge acquisition - Meta knowledge, heuristics. Typical expert systems - MYCIN, DART, XCON, expert system shells.

WHAT IS ARTIFICIAL INTELLIGENCE?
Artificial Intelligence (AI) is a branch of science concerned with helping machines find solutions to complex problems in a more human-like fashion. This generally involves borrowing characteristics from human intelligence and applying them as algorithms in a computer-friendly way. A more or less flexible or efficient approach can be taken depending on the requirements, which influences how artificial the intelligent behaviour appears.

AI is generally associated with computer science, but it has many important links with other fields such as mathematics, psychology, cognition, biology and philosophy, among many others. Our ability to combine knowledge from all these fields will ultimately benefit our progress in the quest of creating an intelligent artificial being.

AI currently encompasses a huge variety of subfields, from general-purpose areas such as perception and logical reasoning to specific tasks such as playing chess, proving mathematical theorems, writing poetry and diagnosing diseases. Often, scientists in other fields move gradually into artificial intelligence, where they find the tools and vocabulary to systematize and automate the intellectual tasks on which they have been working all their lives. Similarly, workers in AI can choose to apply their methods to any area of human intellectual endeavour. In this sense, it is truly a universal field.

HISTORY OF AI
The origin of artificial intelligence lies in the earliest days of machine computation. During the 1940s and 1950s, AI began to grow with the emergence of the modern computer. Among the first researchers to attempt to build intelligent programs were Newell and Simon. Their first well-known program, the Logic Theorist, proved statements using the accepted rules of logic and a problem solving procedure of their own design.
By the late fifties, programs existed that could do a passable job of translating technical documents, and it was seen as only a matter of extra databases and more computing power to apply the techniques to less formal, more ambiguous texts. Most problem solving work revolved around the work of Newell, Shaw and Simon on the General Problem Solver (GPS). Unfortunately, GPS did not fulfil its promise, and not simply because of a lack of computing capacity.

In the 1970s the most important concept of AI was developed, known as the expert system, which represents the knowledge of an expert as a set of rules. The application area of expert systems is very large. The 1980s saw the development of neural networks as a method of learning from examples.

Prof. Peter Jackson (University of Edinburgh) classified the history of AI into three periods:
1. Classical period: It started from 1950. In 1956 the concept of artificial intelligence came into existence. During this period the main research work carried out included game playing, theorem proving and the concept of the state space approach for solving a problem.
2. Romantic period: It started in the mid-1960s and continued until the mid-1970s. During this period people were interested in making machines "understand", which usually meant the understanding of natural language. During this period the knowledge representation technique of semantic nets was developed.
3. Modern period: It started from 1970 and continues to the present day. This period addresses more complex problems and includes research on both the theoretical and the practical aspects of artificial intelligence. It includes the birth of concepts such as expert systems, artificial neurons and pattern recognition. Research on the more advanced concepts of pattern recognition and neural networks is still going on.

COMPONENTS OF AI
There are three types of components in AI.
1) Hardware components of AI: a) pattern matching, b) logic representation, c) symbolic processing, d) numeric processing, e) problem solving, f) heuristic search, g) natural language processing, h) knowledge representation, i) expert systems, j) neural networks, k) learning, l) planning, m) semantic networks.
2) Software components: a) machine language, b) assembly language, c) high level languages, d) LISP, e) fourth generation languages, f) object oriented languages, g) distributed languages, h) natural language, i) particular problem solving languages.
3) Architectural components: a) uniprocessor, b) multiprocessor, c) special purpose processor, d) array processor, e) vector processor, f) parallel processor, g) distributed processor.

TEN DEFINITIONS OF ARTIFICIAL INTELLIGENCE
1. AI is the study of how to make computers do things which, at the moment, people do better. This is ephemeral, as it refers to the current state of computer science, and it excludes a major area: problems that cannot be solved well either by computers or by people at the moment.
2. AI is a field of study that encompasses computational techniques for performing tasks that apparently require intelligence when performed by humans.
3. AI is the branch of computer science that is concerned with the automation of intelligent behaviour.
AI is based upon the principles of computer science, namely the data structures used in knowledge representation, the algorithms needed to apply that knowledge, and the languages and programming techniques used in their implementation.
4. AI is the field of study that seeks to explain and emulate intelligent behaviour in terms of computational processes.
5. AI is about generating representations and procedures that automatically or autonomously solve problems heretofore solved by humans.
6. AI is the part of computer science concerned with designing intelligent computer systems, that is, computer systems that exhibit the characteristics we associate with intelligence in human behaviour, such as understanding language, learning, reasoning and solving problems.
7. AI is the study of mental faculties through the use of computational models.
8. AI is the study of the computations that make it possible to perceive, reason and act.
9. AI is the exciting new effort to make computers think: machines with minds, in the full and literal sense.
10. AI is concerned with developing computer systems that can store knowledge and effectively use that knowledge to help solve problems and accomplish tasks. This brief statement sounds a lot like one of the commonly accepted goals in the education of humans: we want students to learn (gain knowledge) and to learn to use this knowledge to help solve problems and accomplish tasks.

WEAK AND STRONG AI
There are two conceptual schools of thought about AI, namely weak AI and strong AI. Strong AI is very optimistic that a machine can be made capable of solving complex problems like an intelligent human; its proponents claim that a computer can be much more efficient at solving some problems than human experts. According to strong AI, the computer is not merely a tool in the study of the mind; rather, the appropriately programmed computer really is a mind. Strong AI is the supposition that some forms of artificial intelligence can truly reason and solve problems. The term strong AI was originally coined by John Searle.
In contrast, weak AI is not so enthusiastic about the outcomes of AI and simply says that some thinking-like features can be added to computers to make them more useful tools. It holds that computers cannot be made intelligent equal to human beings unless constructed significantly differently, and that they may be similar to human experts but not equal to them in all cases. Generally, weak AI refers to the use of software to study or accomplish specific problem solving tasks that do not encompass the full range of human cognitive abilities. An example of weak AI would be a chess program. Weak AI programs cannot be called "intelligent" because they cannot really think.

TASK DOMAINS OF AI
Areas of artificial intelligence:
- Perception
  Machine vision: it is easy to interface a TV camera to a computer and get an image into memory; the problem is understanding what the image represents. Vision takes a lot of computation; in humans, roughly 10% of all calories consumed are burned in vision computation.
  Speech understanding: speech understanding is available now. Some systems must be trained for the individual user and require pauses between words. Understanding continuous speech with a large vocabulary is harder.
  Touch (tactile or haptic) sensation: important for robot assembly tasks.
- Robotics
Although industrial robots have been expensive, robot hardware can be cheap: Radio Shack has sold a working robot arm and hand for $15. The limiting factor in the application of robotics is not the cost of the robot hardware itself. What is needed is perception and intelligence to tell the robot what to do; "blind" robots are limited to very well-structured tasks (like spray painting car bodies).
- Planning
Planning attempts to order actions to achieve goals. Planning applications include logistics, manufacturing scheduling, and planning the manufacturing steps to construct a desired product. There are huge amounts of money to be saved through better planning.
- Expert Systems
Expert systems attempt to capture the knowledge of a human expert and make it available through a computer program. There have been many successful and economically valuable applications of expert systems. Expert systems provide the following benefits:
• reducing the skill level needed to operate complex devices;
• diagnostic advice for device repair;
• interpretation of complex data;
• "cloning" of scarce expertise;
• capturing the knowledge of an expert who is about to retire;
• combining the knowledge of multiple experts.
- Theorem Proving
Proving mathematical theorems might seem to be mainly of academic interest. However, many practical problems can be cast in terms of theorems, so a general theorem prover can be widely applicable. Examples:
• automatic construction of compiler code generators from a description of a CPU's instruction set;
• J Moore and colleagues proved the correctness of the floating-point division algorithm on an AMD CPU chip.
- Symbolic Mathematics
Symbolic mathematics refers to the manipulation of formulas, rather than arithmetic on numeric values, for example algebra and differential and integral calculus. Symbolic manipulation is often used in conjunction with ordinary scientific computation as a generator of the programs used to actually do the calculations. Symbolic manipulation programs are an important component of scientific and engineering workstations.
- Game Playing
Games are good vehicles for research because they are well formalized, small and self-contained, and are therefore easily programmed. Games can be good models of competitive situations, so principles discovered in game-playing programs may be applicable to practical problems.

AI Technique
Intelligence requires knowledge, but knowledge possesses less desirable properties such as:
- it is voluminous;
- it is difficult to characterise accurately;
- it is constantly changing;
- it differs from data by being organised in a way that corresponds to its application.
An AI technique is a method that exploits knowledge that is represented so that:
- the knowledge captures generalisations; situations that share properties are grouped together, rather than being given separate representations;
- it can be understood by the people who must provide it; although for many programs the bulk of the data may come automatically, such as from instrument readings, in many AI domains people must supply the knowledge to programs in a form the people understand and in a form acceptable to the program;
- it can be easily modified to correct errors and reflect changes in real conditions;
- it can be widely used even if it is incomplete or inaccurate;
- it can be used to help overcome its own sheer bulk by helping to narrow the range of possibilities that must usually be considered.
Problem Spaces and Search
Building a system to solve a problem requires the following steps:
- define the problem precisely, including detailed specifications of what constitutes an acceptable solution;
- analyse the problem thoroughly, for some features may have a dominant effect on the chosen method of solution;
- isolate and represent the background knowledge needed in the solution of the problem;
- choose the best problem solving techniques for the solution.

Defining the Problem as a State Space Search
To understand what exactly artificial intelligence is, we illustrate some common problems. Problems dealt with in artificial intelligence generally use a common term called 'state'. A state represents the status of the solution at a given step of the problem solving procedure. The solution of a problem, thus, is a collection of problem states. The problem solving procedure applies an operator to a state to get the next state; it then applies another operator to the resulting state to derive a new state. This process of applying an operator to a state and moving to the next state is continued until the goal (desired) state is derived. Such a method of solving a problem is generally referred to as the state space approach.
For example, in order to solve the problem of playing a game (restricted here to two-person table or board games), we require the rules of the game and the targets for winning, as well as a means of representing positions in the game. The opening position can be defined as the initial state and a winning position as a goal state (there can be more than one). Legal moves allow transfer from the initial state to other states leading to the goal state. However, the rules are far too copious in most games, especially chess, where the number of possibilities exceeds the number of particles in the universe. Thus the rules cannot in general be supplied accurately and computer programs cannot easily handle them. Storage also presents a problem, but searching can be helped by hashing. The number of rules that are used must be minimised, and the set can be produced by expressing each rule in as general a form as possible. The representation of games in this way leads to a state space representation, and it is natural for well organised games with some structure. This representation allows the formal definition of a problem as the movement from a set of initial positions to one of a set of target positions. It means that the solution involves using known techniques and a systematic search. This is quite a common method in AI.

Formal description of a problem
- Define a state space that contains all possible configurations of the relevant objects, without enumerating all the states in it. A state space represents a problem in terms of states and operators that change states.
- Define some of these states as possible initial states.
- Specify one or more states as acceptable solutions; these are the goal states.
- Specify a set of rules as the possible actions allowed. This involves thinking about the generality of the rules, the assumptions made in the informal presentation, and how much work can be anticipated by inclusion in the rules.
The control strategy is again not fully discussed here, but the AI program needs a structure to facilitate the search, which is a characteristic of this type of program.
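This formal description maps naturally onto a small programming interface. The following is a minimal illustrative sketch in Python (the names StateSpaceProblem, is_goal and rules are ours, not from the text); a concrete problem such as the water jug example below would supply its own initial states, goal test and rules.

from dataclasses import dataclass
from typing import Any, Callable, Iterable, List

State = Any

@dataclass
class StateSpaceProblem:
    initial_states: List[State]                      # possible starting configurations
    is_goal: Callable[[State], bool]                 # acceptable-solution test
    rules: List[Callable[[State], Iterable[State]]]  # operators: state -> next states

    def successors(self, state: State) -> List[State]:
        # Apply every rule; a rule that does not apply simply yields nothing.
        out: List[State] = []
        for rule in self.rules:
            out.extend(rule(state))
        return out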
Example: the water jug problem. There are two jugs called four and three; four holds a maximum of four gallons and three holds a maximum of three gallons. How can we get 2 gallons into jug four? The state space is the set of ordered pairs giving the number of gallons in the pair of jugs at any time, i.e. (four, three), where four = 0, 1, 2, 3 or 4 and three = 0, 1, 2 or 3. The start state is (0, 0) and the goal state is (2, n), where n is a "don't care" limited to jug three holding from 0 to 3 gallons. The major production rules for solving this problem are shown below.

Rule  Initial condition                   Goal state         Comment
1     (four, three) if four < 4           (4, three)         fill four from the tap
2     (four, three) if three < 3          (four, 3)          fill three from the tap
3     (four, three) if four > 0           (0, three)         empty four into the drain
4     (four, three) if three > 0          (four, 0)          empty three into the drain
5     (four, three) if four + three < 4   (four+three, 0)    empty three into four
6     (four, three) if four + three < 3   (0, four+three)    empty four into three
7     (0, three) if three > 0             (three, 0)         empty three into four
8     (four, 0) if four > 0               (0, four)          empty four into three
9     (0, 2)                              (2, 0)             empty three into four
10    (2, 0)                              (0, 2)             empty four into three
11    (four, three) if four < 4           (4, three-diff)    pour diff, 4-four, into four from three
12    (four, three) if three < 3          (four-diff, 3)     pour diff, 3-three, into three from four

A solution is given below.

Jug four   Jug three   Rule applied
0          0           -
0          3           2
3          0           7
3          3           2
4          2           11
0          2           3
2          0           10

Control strategies
A good control strategy should satisfy the following requirements. The first requirement is that it causes motion: in a game playing program the pieces move on the board, and in the water jug problem water is used to fill jugs. The second requirement is that it is systematic: it would not be sensible to fill a jug and empty it repeatedly, nor, in a game, would it be advisable to move a piece round and round the board in a cyclic way. We shall initially consider two systematic approaches to searching.

Production Systems
A knowledge representation formalism consists of collections of condition-action rules (production rules or operators), a database which is modified in accordance with the rules, and a production system interpreter which controls the operation of the rules, i.e. the 'control mechanism' of a production system, determining the order in which production rules are fired. A system that uses this form of knowledge representation is called a production system. A production system consists of rules and facts. Knowledge is encoded in a declarative form which comprises a set of rules of the form
Situation ------------> Action
i.e. SITUATION implies ACTION. Example: IF the initial state is a goal state THEN quit.
The major components of an AI production system are:
i. a global database,
ii. a set of production rules, and
iii. a control system.
The global database is the central data structure used by an AI production system. The production rules operate on the global database. Each rule has a precondition that is either satisfied or not by the database. If the precondition is satisfied, the rule can be applied. Application of the rule changes the database. The control system chooses which applicable rule should be applied and ceases computation when a termination condition on the database is satisfied. If several rules are eligible to fire at the same time, the control system resolves the conflict.
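As an illustration of these components, the following is a minimal Python sketch (names are ours, not from the text) in which the global database is the current (four, three) state, the production rules are condition-action pairs drawn from the table above (a representative subset, with the preconditions tightened slightly so that every firing changes the state), and a breadth-first control strategy decides which applicable rule to fire.

from collections import deque

# (rule number, condition on (four, three), action producing the next state)
RULES = [
    (1,  lambda f, t: f < 4,                 lambda f, t: (4, t)),              # fill four from tap
    (2,  lambda f, t: t < 3,                 lambda f, t: (f, 3)),              # fill three from tap
    (3,  lambda f, t: f > 0,                 lambda f, t: (0, t)),              # empty four into drain
    (4,  lambda f, t: t > 0,                 lambda f, t: (f, 0)),              # empty three into drain
    (5,  lambda f, t: t > 0 and f + t <= 4,  lambda f, t: (f + t, 0)),          # empty three into four
    (6,  lambda f, t: f > 0 and f + t <= 3,  lambda f, t: (0, f + t)),          # empty four into three
    (11, lambda f, t: f < 4 and f + t >= 4,  lambda f, t: (4, t - (4 - f))),    # pour from three until four is full
    (12, lambda f, t: t < 3 and f + t >= 3,  lambda f, t: (f - (3 - t), 3)),    # pour from four until three is full
]

def solve(start=(0, 0), goal_four=2):
    """Breadth-first control strategy: systematic, and every rule firing causes motion."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        (f, t), path = frontier.popleft()
        if f == goal_four:                       # goal state (2, n)
            return path + [(f, t)]
        for num, cond, act in RULES:
            if cond(f, t):
                new = act(f, t)
                if new not in visited:
                    visited.add(new)
                    frontier.append((new, path + [(f, t)]))
    return None

if __name__ == "__main__":
    for state in solve():
        print(state)

Running the script prints one shortest sequence of states ending in a state with 2 gallons in jug four; depending on the order in which rules are tried, it may be the trace shown above or the symmetric six-step solution that begins by filling jug four.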
Four classes of production systems:
1. monotonic production systems,
2. non-monotonic production systems,
3. partially commutative production systems, and
4. commutative production systems.

Advantages of production systems:
1. Production systems provide an excellent tool for structuring AI programs.
2. Production systems are highly modular, because individual rules can be added, removed or modified independently.
3. The production rules are expressed in a natural form, so the statements contained in the knowledge base should read like a recording of an expert thinking out loud.

Disadvantages of production systems:
One important disadvantage is that it may be very difficult to analyse the flow of control within a production system, because the individual rules do not call each other.

Production systems describe the operations that can be performed in a search for a solution to the problem. They can be classified as follows.
Monotonic production system: a system in which the application of a rule never prevents the later application of another rule that could also have been applied at the time the first rule was selected.
Partially commutative production system: a production system in which, if the application of a particular sequence of rules transforms state X into state Y, then any allowable permutation of those rules also transforms state X into state Y.
Theorem proving falls under monotonic, partially commutative systems. Problems like chemical analysis and synthesis, which involve irreversible changes, come under systems that are not partially commutative, while blocks world and 8-puzzle problems come under non-monotonic, partially commutative systems. Playing the game of bridge comes under non-monotonic, not partially commutative systems.
For any problem, several production systems exist, and some will be more efficient than others. Though it may seem that there is no relationship between kinds of problems and kinds of production systems, in practice there is a definite relationship.
Partially commutative, monotonic production systems are useful for solving ignorable problems. These systems are important from an implementation standpoint, because they can be implemented without the ability to backtrack to previous states when it is discovered that an incorrect path has been followed. Such systems increase efficiency, since it is not necessary to keep track of the changes made in the search process.
Non-monotonic, partially commutative systems are useful for problems in which changes occur but can be reversed and in which the order of operations is not critical (for example, the 8-puzzle problem).
Production systems that are not partially commutative are useful for many problems in which irreversible changes occur, such as chemical analysis. When dealing with such systems, the order in which operations are performed is very important, and hence correct decisions have to be made the first time.

Monotonic and non-monotonic learning
Monotonic learning is when an agent may not learn any knowledge that contradicts what it already knows; for example, it may not replace a statement with its negation. Thus the knowledge base may only grow with new facts, in a monotonic fashion. The advantages of monotonic learning are:
1. greatly simplified truth maintenance;
2. greater choice in learning strategies.
Non-monotonic learning is when an agent may learn knowledge that contradicts what it already knows, so it may replace old knowledge with new if it believes there is sufficient reason to do so. The advantages of non-monotonic learning are:
1. increased applicability to real domains;
2. greater freedom in the order in which things are learned.
A related property is the consistency of the knowledge.
If an architecture must maintain a consistent knowledge base, then any learning strategy it uses must be monotonic.

THE SEVEN PROBLEM CHARACTERISTICS
A problem may have different aspects of representation and explanation. In order to choose the most appropriate method for a particular problem, it is necessary to analyze the problem along several key dimensions. The main key features of a problem are given below.
- Is the problem decomposable into a set of sub-problems?
- Can solution steps be ignored or undone?
- Is the problem universe predictable?
- Is a good solution to the problem obvious without comparison to all possible solutions?
- Is the desired solution a state of the world or a path to a state?
- Is a large amount of knowledge absolutely required to solve the problem?
- Will the solution of the problem require interaction between the computer and a person?
These are known as the seven problem characteristics, under which the solution must take place.

PRODUCTION SYSTEMS AND THEIR CHARACTERISTICS
The production system is a model of computation that can be applied to implement search algorithms and to model human problem solving. Such problem solving knowledge can be packaged up in the form of little quanta called productions. A production is a rule consisting of a situation-recognition part and an action part: a situation-action pair in which the left side is a list of things to watch for and the right side is a list of things to do. When productions are used in deductive systems, the situations that trigger productions are specified combinations of facts, and the actions are restricted to assertions of new facts deduced directly from the triggering combination. Such production systems may be called premise-conclusion pairs rather than situation-action pairs.
A production system consists of the following components.
(a) A set of rules, each with a left hand side constituent that represents the current problem state and a right hand side that represents an output state. A rule is applicable if its left hand side matches the current problem state.
(b) A database, which contains all the appropriate information for the particular task. Some parts of the database may be permanent, while other parts may pertain only to the solution of the current problem.
(c) A control strategy, which specifies the order in which the rules will be compared to the database and a way of resolving the conflicts that arise when several rules match simultaneously.
(d) A rule applier, which checks the applicability of a rule by matching the current state with the left hand side of the rule and finds the appropriate rule from the database of rules.
The important roles played by production systems include providing a powerful knowledge representation scheme. A production system not only represents knowledge but also action. It acts as a bridge between AI and expert systems. Production systems provide a language in which the representation of expert knowledge is very natural. We can represent knowledge in a production system as a set of rules of the form
IF (condition) THEN (action)
along with a control system and a database. The control system serves as a rule interpreter and sequencer. The database acts as a context buffer, which records the conditions evaluated by the rules and the information on which the rules act.
The production rules are also known as condition-action, antecedent-consequent, pattern-action, situation-response, or feedback-result pairs. For example:
IF (you have an exam tomorrow) THEN (study the whole night)
The production system can be classified as monotonic, non-monotonic, partially commutative and commutative.
Figure: architecture of a production system.

Features of production systems
Some of the main features of a production system are:
1. Expressiveness and intuitiveness: in the real world, many situations arise of the kind "if this happens, you will do that" or "if this is so, then this should happen". The production rules essentially tell us what to do in a given situation.
2. Simplicity: the structure of each sentence in a production system is unique and uniform, as it uses the IF-THEN structure. This structure provides simplicity in knowledge representation and improves the readability of production rules.
3. Modularity: the production rules encode the knowledge available in discrete pieces. Information can be treated as a collection of independent facts which may be added to or deleted from the system with essentially no deleterious side effects.
4. Modifiability: this means the facility of modifying rules. It allows production rules to be developed in a skeletal form first and later refined to suit a specific application.
5. Knowledge intensive: the knowledge base of a production system stores pure knowledge; this part does not contain any control or programming information. As each production rule is normally written as an English-like sentence, the problem of semantics is solved by the very structure of the representation.

Disadvantages of production systems
1. Opacity: this problem is caused by the combination of production rules. Opacity arises from poor prioritization of rules; the higher the priority given to a rule, the lower the opacity.
2. Inefficiency: during execution of a program several rules may be active. A well devised control strategy reduces this problem. As the rules of a production system are large in number and are hardly ever written in a hierarchical manner, some form of complex search through all the production rules is required in each cycle of the control program.
3. Absence of learning: rule based production systems do not store the result of the problem for future use. Hence they do not exhibit any learning capability, so each time a particular problem is solved, a new solution may be produced.
4. Conflict resolution: the rules in a production system should not conflict with one another. When a new rule is added to the database, it should be ensured that it does not conflict with the existing rules.
Problem characteristics of example problems
Let us analyze the chess and water jug problems with respect to the seven problem characteristics.

1. Chess
- Is the problem decomposable? No. One game has a single solution; it cannot be split into independent sub-problems.
- Can solution steps be ignored or undone? No. In an actual game (as opposed to analysis on a PC) we cannot undo previous moves.
- Is the problem universe predictable? No. The problem universe is not predictable, as we are not sure about the move of the other (second) player.
- Is a good solution absolute or relative? Absolute. An absolute solution means that once you get one solution you do not need to bother about other possible solutions; a relative solution means that once you get one solution you have to find the other possible solutions to check which is best (i.e. lowest cost). By this criterion chess is absolute.
- Is the solution a state or a path? Path. For natural language understanding, some words have different interpretations, so a sentence may be ambiguous; to solve that problem we need to find the interpretation only, and the workings are not necessary (the path to the solution is not needed). In chess, by contrast, the winning (goal) state is described by the path to that state.
- What is the role of knowledge? A lot of knowledge helps to constrain the search for a solution.
- Does the task require human interaction? No. Conversational problems involve intermediate communication between a person and the computer, either to provide additional assistance to the computer or to provide additional information to the user, or both. In chess, additional assistance is not required.

2. Water jug
- Is the problem decomposable? No. There is one single solution.
- Can solution steps be ignored or undone? Yes.
- Is the problem universe predictable? Yes. The problem universe is predictable, because only one person acts to solve the problem; we can predict what will happen in the next step.
- Is a good solution absolute or relative? Absolute. The water jug problem may have a number of solutions, but once one solution is found there is no need to bother about the others, because it does not affect the cost.
- Is the solution a state or a path? Path. The solution is the path to the goal state.
- What is the role of knowledge? A lot of knowledge helps to constrain the search for a solution.
- Does the task require human interaction? Yes. Additional assistance is required, for example to obtain the jugs or a pump.

ALGORITHM OF PROBLEM SOLVING
No single algorithm for a particular problem is applicable to all types of problems in all situations, so there should be a general problem solving algorithm which may work for different strategies on different problems.
Algorithm (problem name and specification)
Step 1: Analyze the problem to get the starting state and the goal state.
Step 2: Find out the data about the starting state and the goal state.
Step 3: Find out the production rules from the initial database for proceeding from the starting state towards the goal state.
Step 4: Select some rules from the set of rules that can be applied to the data.
Step 5: Apply those rules to the initial state and proceed to get the next state.
Step 6: Determine the new states generated after applying the rules and accordingly make one of them the current state.
Step 7: Finally, obtain information about the goal state from the most recent current state and reach the goal state.
Step 8: Exit.
After applying the above steps a user may get the solution of the problem, moving from a given state to another state. Let us take a few examples.

VARIOUS TYPES OF PROBLEMS AND THEIR SOLUTIONS

Water Jug Problem
Definition: Some jugs are given which have no calibration. At least one of the jugs is filled with water. The process by which we divide the whole of the water among the different jugs according to the question is called a water jug problem.
Procedure: Suppose that you are given 3 jugs A, B and C with capacities 8, 5 and 3 litres respectively, none of which is calibrated (i.e. there are no measuring marks).
Jug A is filled with 8 litres of water. By a series of pourings back and forth among the 3 jugs, divide the 8 litres into 2 equal parts, i.e. 4 litres in jug A and 4 litres in jug B. How?
In this problem the start state is that jug A contains 8 litres of water while jug B and jug C are empty. The production rules involve pouring water from one jug into another, starting from jug A. The search is for the sequence of production rules which transforms the initial state into the final state. The state space for this problem can be described by the set of ordered triples (A, B, C), where A represents the 8 litre jug, B the 5 litre jug and C the 3 litre jug.
The production rules are applied as follows.
Step 1: The initial state is (8, 0, 0), as jugs B and C are empty. The water of jug A can be poured to give:
(5, 0, 3) - 3 litres into jug C, 5 litres remain in jug A;
(3, 5, 0) - 5 litres into jug B, 3 litres remain in jug A;
(0, 5, 3) - 5 litres into jug B and 3 litres into jug C, jug A empty.
Step 2: Start with the first current state of step 1, i.e. (5, 0, 3). This state can only be extended by pouring the 3 litres of jug C into jug B, giving (5, 3, 0). Next take the second current state of step 1, i.e. (3, 5, 0); it can be extended only by pouring the 5 litres of jug B into jug C, leaving 2 litres in jug B, giving (3, 2, 3). Finally take the third current state of step 1, i.e. (0, 5, 3); from this state no new state can be generated, because every pouring leads to (5, 0, 3), (3, 5, 0) or (8, 0, 0), which are repeated states, so they are not considered again on the way towards the goal. The states therefore develop as:
(5, 0, 3) -> (5, 3, 0)
(3, 5, 0) -> (3, 2, 3)
(0, 5, 3) -> X
Step 3: Start with the first current state of step 2 and proceed in the same way:
(5, 3, 0) -> (2, 3, 3)
(3, 2, 3) -> (6, 2, 0)
Step 4: (2, 3, 3) -> (2, 5, 1)
(6, 2, 0) -> (6, 0, 2)
Step 5: (2, 5, 1) -> (7, 0, 1)
(6, 0, 2) -> (1, 5, 2)
Step 6: (7, 0, 1) -> (7, 1, 0)
(1, 5, 2) -> (1, 4, 3)
Step 7: (7, 1, 0) -> (4, 1, 3)
(1, 4, 3) -> (4, 4, 0)  [goal]
So finally the state is (4, 4, 0), which means that jug A and jug B each contain 4 litres of water, which is our goal state. One has to be very careful when pouring water from one jug to another that the capacity of the receiving jug can hold that much water. The search tree of the water jug problem can be drawn accordingly.
Figure: search tree for the 8-5-3 water jug problem.
Comments:
- This problem takes a lot of time to find the goal state.
- The process of searching in this problem is very lengthy.
- At each step of the problem the user has to strictly follow the production rules; otherwise the problem may run for an unbounded number of steps.
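The same search can be mechanised. Below is a small illustrative Python sketch (not from the text) that explores the (A, B, C) state space with breadth-first search, using "pour from one jug into another until the source is empty or the target is full" as the only operator, and finds a pouring sequence from (8, 0, 0) to (4, 4, 0).

from collections import deque

CAPACITY = (8, 5, 3)          # jugs A, B, C
START = (8, 0, 0)

def successors(state):
    """All states reachable by one pouring between two jugs."""
    result = []
    for src in range(3):
        for dst in range(3):
            if src == dst or state[src] == 0:
                continue
            amount = min(state[src], CAPACITY[dst] - state[dst])
            if amount == 0:
                continue
            nxt = list(state)
            nxt[src] -= amount
            nxt[dst] += amount
            result.append(tuple(nxt))
    return result

def solve(goal=(4, 4, 0)):
    frontier = deque([(START, [START])])
    visited = {START}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None

if __name__ == "__main__":
    print(solve())   # e.g. (8,0,0) -> (3,5,0) -> ... -> (4,4,0)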
Missionaries and Cannibals Problem
Definition: In the missionaries and cannibals problem, some missionaries and some cannibals are initially on one side of a river and want to cross it, but only one boat is available. The capacity of the boat is 2, and the crossings must be made so that no unsafe combination of missionaries and cannibals is ever left together. Solving this problem, i.e. finding the solution through the different states, is called the missionaries and cannibals problem.
Procedure: Let us take an example of the same kind. Initially a boatman, grass, a tiger and a goat are present at the left bank of the river and want to cross it. The only boat available can carry two objects or persons at a time. The condition for a safe crossing is that at no time may the tiger be left with the goat, or the goat with the grass, on either side of the river without the boatman. How will they cross the river?
The objective of the solution is to find the sequence of their transfers from one bank of the river to the other, using the boat, that satisfies these constraints. Let us use the following representations:
B: boat (with the boatman), T: tiger, G: goat, Gr: grass.
Step 1: According to the question, the initial state is (B, T, G, Gr), as all of them are on one side of the river. Different states can be generated from this state, but the crossings (B, T, O, O) and (B, O, O, Gr) are not allowed, because the boatman cannot take the tiger (leaving the goat with the grass) or the grass (leaving the tiger with the goat).
Step 2: Now consider the resulting state of step 1, i.e. (B, O, G, O), which is on the right side of the river; the state on the left side is then (O, T, O, Gr):
(B, O, G, O) (right)   (O, T, O, Gr) (left)
Step 3: Now proceed, generating states for the left and right sides of the river such that the condition of the problem is always satisfied.
Step 4: First consider the first current state on the right side of step 3, and then the second current state on the right side of step 3 (see figure).
Step 5: Now consider the first current state of step 4 (see figure).
Hence the final state is (B, T, G, Gr) on the right side of the river.
Figure: state transitions for the boatman, tiger, goat and grass crossing.
Comments:
- This problem requires a lot of space for its state implementation.
- It takes a lot of time to search for the goal node.
- The production rules at each level of the state space are very strict.
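The following is an illustrative Python sketch (names assumed, not from the text) of the boatman / tiger / goat / grass crossing just described, searched breadth-first. A state is (boat side, set of items on the left bank); the boat carries at most one item besides the boatman, and the tiger-goat and goat-grass pairs may never be left without the boatman.

from collections import deque

ITEMS = {"T", "G", "Gr"}              # tiger, goat, grass
UNSAFE = [{"T", "G"}, {"G", "Gr"}]    # pairs that cannot be left alone together

def safe(bank):
    return not any(pair <= bank for pair in UNSAFE)

def successors(state):
    side, left = state                       # side: "L" or "R" (where the boat is)
    here = left if side == "L" else ITEMS - left
    other = "R" if side == "L" else "L"
    for item in [None] + sorted(here):       # cross alone, or with one item
        new_left = set(left)
        if item is not None:
            new_left.discard(item) if side == "L" else new_left.add(item)
        left_after = frozenset(new_left)
        # the bank the boatman leaves behind must be safe
        unattended = left_after if other == "R" else ITEMS - left_after
        if safe(set(unattended)):
            yield (other, left_after)

def solve():
    start, goal = ("L", frozenset(ITEMS)), ("R", frozenset())
    frontier, visited = deque([(start, [start])]), {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [nxt]))

if __name__ == "__main__":
    for side, left in solve():
        print("boat on", side, "- left bank:", sorted(left))

Run as a script, it prints the familiar seven-crossing sequence: take the goat over, return, take the tiger, bring the goat back, take the grass, return, and finally take the goat again.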
Chess Problem
Definition: It is a normal chess game. In a chess problem, the start state is the initial configuration of the chessboard. The final state is any board configuration that is a winning position for a player. There may be multiple final positions, and each board configuration can be thought of as representing a state of the game. Whenever any player moves any piece, it leads to a different state of the game.
Procedure:
Figure: a 3x3 chessboard with each square labelled with the integers 1 to 9.
Because of the reduced size of the problem, we simply enumerate the alternative moves rather than developing a general move operator. The legal moves on the board are described using a predicate called move in predicate calculus, whose parameters are the starting and ending squares. For example, move(1, 8) takes the knight from the upper left-hand corner to the middle of the bottom row. While playing chess, a knight can move two squares either horizontally or vertically, followed by one square in an orthogonal direction, as long as it does not move off the board. All possible moves for the figure are as follows:
move(1, 8)   move(6, 1)
move(1, 6)   move(6, 7)
move(2, 9)   move(7, 2)
move(2, 7)   move(7, 6)
move(3, 4)   move(8, 3)
move(3, 8)   move(8, 1)
move(4, 1)   move(9, 2)
move(4, 3)   move(9, 4)
The above predicates of the chess problem form the knowledge base for this problem. A unification algorithm is used to access the knowledge base. Suppose we need to find the positions to which the knight can move from a particular location, say square 2. The goal move(2, X) unifies with two different predicates in the knowledge base, with the substitutions {7/X} and {9/X}. Given the goal move(2, 3), the response is failure, because no fact move(2, 3) exists in the knowledge base.
Comments:
- In this game a lot of production rules are applied for each move of a piece on the chessboard.
- A lot of searching is required in this game.
- Implementation of the algorithm over the knowledge base is very important.

8-Queen Problem
Definition: We have 8 queens and an 8x8 chessboard with alternating black and white squares. The queens are to be placed on the chessboard. Any queen can attack any other queen placed in the same row, column or diagonal. We have to find a placement of the queens on the chessboard such that no queen attacks any other queen.
Procedure:
Figure: a possible board configuration for the 8-queen problem.
In the figure, a possible board configuration for the 8-queen problem is shown. The board has alternating black and white squares, and the different positions on the board hold the queens. The production rule for this game is that you cannot put two queens in the same row, the same column or the same diagonal. After shifting a single queen from its position on the board, the other queens must be shifted according to this production rule. Starting from the first row of the board, the queens of the corresponding rows and columns are moved from their original positions to other positions, and finally the player has to ensure that no two queens share a row, column or diagonal.
Comments:
- This problem requires a lot of space to store the board.
- It requires a lot of searching to reach the goal state.
- The time factor for each queen's move is very high.
- The problem is very strict towards the production rules.
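A compact sketch (not from the text) of the production rule stated above: place one queen per row so that no two queens share a column or a diagonal, backtracking whenever the rule is violated.

def solve_queens(n=8, placed=()):
    """Return one safe placement as a tuple of column indices, one per row."""
    row = len(placed)
    if row == n:
        return placed
    for col in range(n):
        if all(col != c and abs(col - c) != row - r      # no shared column or diagonal
               for r, c in enumerate(placed)):
            result = solve_queens(n, placed + (col,))
            if result is not None:
                return result
    return None                                          # backtrack

if __name__ == "__main__":
    print(solve_queens())    # e.g. (0, 4, 7, 5, 2, 6, 1, 3)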
8-Puzzle Problem
Definition: We have a 3x3 board with 9 block spaces, out of which 8 blocks hold tiles numbered 1 to 8; one space is left blank. A tile adjacent to the blank space can move into it. We have to arrange the tiles in a sequence to reach the goal state.
Procedure: The 8-puzzle problem belongs to the category of "sliding block puzzle" problems. The 8-puzzle is a square tray in which eight square tiles are placed; the remaining ninth square is uncovered. Each tile in the tray has a number on it. A tile that is adjacent to the blank space can be slid into that space. The game consists of a starting position and a specified goal position, and the goal is to transform the starting position into the goal position by sliding the tiles around. The control mechanism for an 8-puzzle solver must keep track of the order in which operations are performed, so that the operations can be undone one at a time if necessary. The objective of the puzzle is to find a sequence of tile movements that leads from a starting configuration to a goal configuration, such as the two situations shown below.
Figure: a starting state and a goal state of the 8-puzzle.
The states of the 8-puzzle are the different permutations of the tiles within the frame. The operations are the permissible moves: up, down, left, right. At each step of the problem a function f(x) is defined, which is the combination of g(x) and h(x):
f(x) = g(x) + h(x)
where
g(x) = the number of steps already taken, i.e. the cost of reaching the current state from the initial state;
h(x) = the heuristic estimator, which compares the current state with the goal state and notes how many tiles are displaced from their goal positions, i.e. an estimate of how far the current state is from the goal state.
After calculating the f(x) values at each step, take the smallest f(x) value at every step and choose the corresponding state as the next current state on the way to the goal state. Let us take an example.
Figure: an initial state and a goal state of the 8-puzzle.
Step 1: In the tray, either tile 6 or tile 8 can change position to fill the empty space, so there are two possible current states, B and C. The f(x) value of B is 6 and that of C is 4. As 4 is the minimum, C is taken as the current state for the next step.
Step 2: From tray C three states can be drawn: the empty position can be filled by 5, 3 or 6. So three different states are obtained; calculate each of their f(x) values and take the minimum. Here state F has the minimum value, 4, and hence it is taken as the next current state.
Step 3: Tray F can lead to 4 different states, as the empty position can be filled by 4 values, i.e. 2, 4, 5 or 8.
Step 4: In step 3, tray I has the smallest f(x) value. Tray I can be expanded into 3 different states, because the empty position can be filled by 7, 8 or 6. Hence we reach the goal state after a few changes of tiles in the different positions of the trays.
Comments:
- This problem requires a lot of space for saving the different trays.
- Time complexity is higher than that of other problems.
- The user has to be very careful about the shifting of tiles in the trays.
- Very complex puzzle games can be solved by this technique.
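A compact way to see the f(x) = g(x) + h(x) strategy in action is the sketch below (illustrative only; the start and goal layouts are assumed, since the figures are not reproduced here). States are 9-tuples read row by row with 0 marking the blank, h counts misplaced tiles, and a priority queue always expands the state with the smallest f.

import heapq
from itertools import count

GOAL = (1, 2, 3, 8, 0, 4, 7, 6, 5)                 # assumed goal configuration
MOVES = {0: (1, 3), 1: (0, 2, 4), 2: (1, 5),
         3: (0, 4, 6), 4: (1, 3, 5, 7), 5: (2, 4, 8),
         6: (3, 7), 7: (4, 6, 8), 8: (5, 7)}       # blank index -> swappable indices

def h(state):
    """Heuristic: number of tiles displaced from their goal position."""
    return sum(1 for i, tile in enumerate(state) if tile and tile != GOAL[i])

def solve(start):
    tie = count()                                   # tie-breaker for the heap
    frontier = [(h(start), 0, next(tie), start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, _, state, path = heapq.heappop(frontier)
        if state == GOAL:
            return path
        blank = state.index(0)
        for j in MOVES[blank]:
            nxt = list(state)
            nxt[blank], nxt[j] = nxt[j], nxt[blank]
            nxt = tuple(nxt)
            if g + 1 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, next(tie), nxt, path + [nxt]))
    return None

if __name__ == "__main__":
    for state in solve((2, 8, 3, 1, 6, 4, 7, 0, 5)):   # assumed start configuration
        print(state[:3], state[3:6], state[6:])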
Tower of Hanoi Problem
Definition: We are given a tower of eight discs (four in the illustrative applet), initially stacked in increasing size on one of three pegs. The objective is to transfer the entire tower to one of the other pegs (the rightmost one), moving only one disc at a time and never placing a larger disc onto a smaller one.
Procedure: The Tower of Hanoi puzzle was invented by the French mathematician Édouard Lucas in 1883. The puzzle is well known to students of computer science, since it appears in virtually every introductory text on data structures and algorithms. The objective of the puzzle is to move the entire stack to another rod, obeying the following rules:
- only one disc can be moved at a time;
- each move consists of taking the upper disc from one of the rods and sliding it onto another rod, on top of any other discs that may already be present on that rod;
- no disc may be placed on top of a smaller disc.
There is a legend about a Vietnamese temple which contains a large room with three time-worn posts in it, surrounded by 64 golden discs. The priests of Hanoi, acting out the command of an ancient prophecy, have been moving these discs in accordance with the rules of the puzzle since that time. The puzzle is therefore also known as the Tower of Brahma puzzle. According to the legend, when the last move of the puzzle is completed, the world will end. There are many variations on this legend; for instance, in some tellings the temple is a monastery and the priests are monks. The temple or monastery may be said to be in different parts of the world, including Hanoi, Vietnam, and may be associated with any religion. The Flag Tower of Hanoi may have served as the inspiration for the name.
The puzzle can be played with any number of discs, although many toy versions have around seven to nine of them. The game seems impossible to many novices, yet it is solvable with a simple algorithm. The following is a very simple method to solve the Tower of Hanoi problem: alternate moves between the smallest piece and a non-smallest piece. When moving the smallest piece, always move it in the same direction (to the right if the starting number of pieces is even, to the left if the starting number of pieces is odd). If there is no tower in the chosen direction, move the piece to the opposite end, but then continue to move in the correct direction; for example, if you started with three pieces, you would move the smallest piece to the opposite end and then continue in the left direction after that. When it is the turn of the non-smallest piece, there is only one legal move. Doing this completes the puzzle in the least number of moves, and finally the user reaches the goal. Various other kinds of solution are possible for the Tower of Hanoi problem, such as recursive procedures, non-recursive procedures and the binary solution procedure.
Another simple solution to the problem is given below.
For an even number of discs:
- make the legal move between pegs A and B,
- make the legal move between pegs A and C,
- make the legal move between pegs B and C,
and repeat until complete.
For an odd number of discs:
- make the legal move between pegs A and C,
- make the legal move between pegs A and B,
- make the legal move between pegs B and C,
and repeat until complete.
A recursive solution for the Tower of Hanoi problem is as follows. A key to solving this problem is to recognise that it can be solved by breaking it down into a collection of smaller problems and further breaking those down into even smaller problems until a solution is reached. The following procedure demonstrates this approach.
- Label the pegs A, B, C; these labels may refer to different pegs at different steps.
- Let n be the total number of discs, numbered from 1 (smallest, topmost) to n (largest, bottom-most).
To move n discs from peg A to peg C:
a) move n-1 discs from A to B, which leaves disc n alone on peg A;
b) move disc n from A to C;
c) move n-1 discs from B to C, so that they sit on disc n.
To carry out steps a and c, apply the same algorithm again for n-1 discs. The entire procedure takes a finite number of steps, since at some point the algorithm will be required for n = 1; this step, moving a single disc from one peg to another, is trivial.
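The recursive procedure above translates almost line for line into code; a minimal sketch:

def hanoi(n, source="A", target="C", spare="B"):
    if n == 0:
        return
    hanoi(n - 1, source, spare, target)                   # step (a): move n-1 discs out of the way
    print(f"move disc {n} from {source} to {target}")     # step (b): move the largest disc
    hanoi(n - 1, spare, target, source)                   # step (c): stack the n-1 discs on top

if __name__ == "__main__":
    hanoi(3)    # prints 2**3 - 1 = 7 moves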
Comments:
- The Tower of Hanoi is frequently used in psychological research on problem solving, and in the neuro-psychological diagnosis and treatment of executive functions; it is also used as a test by neuro-psychologists trying to evaluate frontal lobe deficits.
- The Tower of Hanoi is also used as a backup rotation scheme when performing computer data backups in which multiple tapes/media are involved.
- The problem is very popular for teaching recursive algorithms to beginning programming students. A pictorial version of the puzzle is programmed into the Emacs editor, accessed by typing M-x hanoi.

Cryptarithmetic Problem
Definition: It is an arithmetic problem represented in letters. It involves decoding the digits represented by the characters: the problem is given as an arithmetic equation in which the digits are represented by distinct characters, and it requires finding the digit represented by each character. Assign a decimal digit to each of the letters in such a way that the answer to the problem is correct. If the same letter occurs more than once, it must be assigned the same digit each time. No two different letters may be assigned the same digit.
Procedure: The cryptarithmetic problem is an interesting constraint satisfaction problem for which different algorithms have been developed. A cryptarithm is a mathematical puzzle in which digits are replaced by letters of the alphabet or other symbols; cryptarithmetic is the science and art of creating and solving cryptarithms. The constraints defining a cryptarithmetic problem are as follows:
1) each letter or symbol represents one digit, which is unique throughout the problem;
2) when the digits replace the letters or symbols, the resulting arithmetical operation must be correct.
These two constraints lead to some other restrictions in the problem. For example, suppose that the base of the numbers is 10. Then there can be at most 10 unique symbols or letters in the problem; otherwise it would not be possible to assign a unique digit to each unique letter or symbol. To be semantically meaningful, a number must not begin with a 0, so the letters at the beginning of each number should not correspond to 0. One can solve such a problem by a simple blind search, but a rule based searching technique can provide the solution in much less time.
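The contrast between blind and rule based search can be seen in the sketch below (the puzzle SEND + MORE = MONEY is an assumed example, not given in the text). It is a blind search that simply tries digit assignments satisfying the two constraints above; constraint propagation of the kind described in the text would prune this enormously.

from itertools import permutations

def solve(a="SEND", b="MORE", total="MONEY"):
    letters = sorted(set(a + b + total))
    assert len(letters) <= 10, "base 10 allows at most 10 distinct letters"
    # Net positional weight of each letter in the equation a + b - total = 0.
    weight = dict.fromkeys(letters, 0)
    for word, sign in ((a, 1), (b, 1), (total, -1)):
        for power, ch in enumerate(reversed(word)):
            weight[ch] += sign * 10 ** power
    leading = {w[0] for w in (a, b, total)}          # leading letters cannot be 0
    for digits in permutations(range(10), len(letters)):
        env = dict(zip(letters, digits))
        if any(env[ch] == 0 for ch in leading):
            continue
        if sum(weight[ch] * env[ch] for ch in letters) == 0:
            return env
    return None

if __name__ == "__main__":
    print(solve())   # for SEND + MORE = MONEY: 9567 + 1085 = 10652

Because it enumerates all digit permutations, this blind version may take several seconds even on this small puzzle, which is exactly why rule based techniques are preferred.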
SEARCHING
Problem solving in artificial intelligence may be characterized as a systematic search through a range of possible actions in order to reach some predefined goal or solution. In AI, problem solving by search algorithms is quite a common technique. In the coming age of AI it will have a big impact on technologies such as robotics and path finding; it is also widely used in travel planning. This chapter contains the different search algorithms of AI used in various applications. Let us look at the concepts needed for visualizing the algorithms.
A search algorithm takes a problem as input and returns the solution in the form of an action sequence. Once the solution is found, the actions it recommends can be carried out; this phase is called the execution phase. After formulating a goal and a problem to solve, the agent calls a search procedure to solve it. A problem can be defined by five components:
a) the initial state: the state from which the agent will start;
b) the goal state: the state to be finally reached;
c) the current state: the state at which the agent is present after starting from the initial state;
d) the successor function: the description of the possible actions and their outcomes;
e) the path cost: a function that assigns a numeric cost to each path.

DIFFERENT TYPES OF SEARCHING
Search algorithms can be of various types. When any type of searching is performed, there may or may not be information available about the search, and the searching procedure may depend upon constraints or rules. In general, however, searching can be classified into two types, uninformed searching and informed searching; further classifications of these searches are shown in the figure.
Figure: classification of search algorithms.
The searching algorithms are divided into two categories:
1. uninformed search algorithms (blind search), and
2. informed search algorithms (heuristic search).
There are six uninformed search algorithms:
1. breadth first search,
2. uniform-cost search,
3. depth-first search,
4. depth-limited search,
5. iterative deepening depth-first search, and
6. bidirectional search.
There are three informed search algorithms:
1. best first search,
2. greedy search, and
3. A* search.

Blind search vs heuristic search
- Blind search has no information about the number of steps or the path cost from the current state to the goal state; in heuristic search the path cost from the current state to the goal state is calculated, so as to select the minimum path cost as the next state.
- Blind search is the less effective search method; heuristic search is the more effective search method.
- In blind search the problem is solved with the given information only; in heuristic search additional information can be added as assumptions to solve the problem.

Breadth-first search
Breadth-first search is a simple strategy in which the root node is expanded first, then all the successors of the root node are expanded next, then their successors, and so on. In general, all the nodes at a given depth in the search tree are expanded before any nodes at the next level are expanded. Breadth-first search can be implemented by calling TREE-SEARCH with an empty fringe that is a first-in-first-out (FIFO) queue, ensuring that the nodes that are visited first will be expanded first. In other words, calling TREE-SEARCH(problem, FIFO-QUEUE()) results in a breadth-first search. The FIFO queue puts all newly generated successors at the end of the queue, which means that shallow nodes are expanded before deeper nodes.
Figure: breadth-first search trees after successive node expansions.
Example (route finding problem). Task: find a path from S to G using BFS. The path at the 2nd depth level is selected, i.e. S-B-G (or) S-C-G.
Algorithm:
function BFS(problem) returns a solution or failure
  return TREE-SEARCH(problem, FIFO-QUEUE())
Time and space complexity:
Time complexity = 1 + b + b^2 + ... + b^d = O(b^d).
The space complexity is the same as the time complexity, because all the leaf nodes of the tree must be maintained in memory at the same time: O(b^d).
Completeness: yes.
Optimality: yes, provided the path cost is a non-decreasing function of the depth of the node.
Advantage: guaranteed to find the single solution at the shallowest depth level.
Disadvantage: suitable only for small problem instances (the number of levels or the branching factor must be small).
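A short sketch of TREE-SEARCH with a FIFO queue on a route-finding graph. The graph below is illustrative (the figure from the text is not reproduced), chosen so that the shallowest solution is the depth-2 path S-B-G mentioned above.

from collections import deque

GRAPH = {"S": ["A", "B", "C"], "A": ["D"], "B": ["G"],
         "C": ["G"], "D": ["G"], "G": []}               # assumed route map

def bfs(start="S", goal="G"):
    frontier = deque([[start]])          # FIFO queue of paths (the fringe)
    while frontier:
        path = frontier.popleft()        # shallowest node is expanded first
        node = path[-1]
        if node == goal:
            return path
        for nxt in GRAPH[node]:
            frontier.append(path + [nxt])
    return None

if __name__ == "__main__":
    print(bfs())     # ['S', 'B', 'G'] - a path found at the 2nd depth level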
Uniform-cost search

Breadth-first search is optimal when all step costs are equal, because it always expands the shallowest unexpanded node. By a simple extension, we can find an algorithm that is optimal with any step cost function. Instead of expanding the shallowest node, uniform-cost search expands the node n with the lowest path cost. Note that if all step costs are equal, this is identical to breadth-first search. Uniform-cost search does not care about the number of steps a path has, but only about their total cost.

Example: Route finding problem. Task: find a minimum-cost path from S to G. Since the cost of A is less, it is expanded first, but it is not optimal. B is expanded next; S-B-G is the path with minimum path cost. There is no need to expand the next path S-C, because the cost of reaching C from S is high, and the goal state has already been reached along the previous path with minimum cost.

Time and space complexity: The time complexity is the same as breadth-first search, except that instead of the depth level, the minimum path cost is considered.
Time complexity: O(b^d)
Space complexity: O(b^d)
Completeness: Yes
Optimality: Yes
Advantage: Guaranteed to find the single solution at minimum path cost.
Disadvantage: Suitable only for the smallest problem instances.

Depth-first search

Depth-first search always expands the deepest node in the current fringe of the search tree. The search proceeds immediately to the deepest level of the search tree, where the nodes have no successors. As those nodes are expanded, they are dropped from the fringe, so the search then "backs up" to the next shallowest node that still has unexplored successors. This strategy can be implemented by TREE-SEARCH with a last-in-first-out (LIFO) queue, also known as a stack.

(Figure: depth-first search tree with 3 levels of expansion.)

Example: Route finding problem. Task: find a path from S to G using DFS. The path in the 3rd depth level is selected, i.e. S-A-D-G.

Algorithm:
function DFS(problem) returns a solution or failure
    return TREE-SEARCH(problem, LIFO-QUEUE())

Time and space complexity: In the worst case, depth-first search has to expand all the nodes.
Time complexity: O(b^m)
The nodes are expanded towards one particular direction, which requires memory only for those nodes.
Space complexity: O(bm); for example, with b = 2 and m = 2, bm = 4.
Completeness: No
Optimality: No
Advantage: If more than one solution exists, or the number of levels is high, then DFS is best because exploration is done in only a small portion of the whole space.
Disadvantage: Not guaranteed to find a solution.

Depth-limited search

Definition: A cutoff (maximum level of depth) is introduced in this search technique to overcome the disadvantage of depth-first search. The cutoff value depends on the number of states.

Example: Route finding problem. The number of states in the given map is 5, so it is possible to reach the goal state at a maximum depth of 4; therefore the cutoff value is 4. Task: find a path from A to E.

A recursive implementation of depth-limited search:

function DEPTH-LIMITED-SEARCH(problem, limit) returns a solution, or failure/cutoff
    return RECURSIVE-DLS(MAKE-NODE(INITIAL-STATE[problem]), problem, limit)

function RECURSIVE-DLS(node, problem, limit) returns a solution, or failure/cutoff
    cutoff-occurred? <- false
    if GOAL-TEST[problem](STATE[node]) then return SOLUTION(node)
    else if DEPTH[node] = limit then return cutoff
    else for each successor in EXPAND(node, problem) do
        result <- RECURSIVE-DLS(successor, problem, limit)
        if result = cutoff then cutoff-occurred? <- true
        else if result ≠ failure then return result
    if cutoff-occurred? then return cutoff else return failure
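The recursive scheme above translates almost line for line into Python. The following is a minimal, illustrative sketch; the graph representation, sentinel values and toy map are assumptions, not from the notes.

CUTOFF, FAILURE = 'cutoff', 'failure'    # sentinel values, as in the pseudocode

def depth_limited_search(graph, start, goal, limit):
    return recursive_dls([start], graph, goal, limit)

def recursive_dls(path, graph, goal, limit):
    node = path[-1]
    if node == goal:
        return path                       # SOLUTION(node)
    if len(path) - 1 == limit:            # DEPTH[node] = limit
        return CUTOFF
    cutoff_occurred = False
    for successor in graph.get(node, []):
        result = recursive_dls(path + [successor], graph, goal, limit)
        if result == CUTOFF:
            cutoff_occurred = True
        elif result != FAILURE:
            return result
    return CUTOFF if cutoff_occurred else FAILURE

# Example: a path from A to E with cutoff 4 (toy map, for illustration only).
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['E'], 'D': ['E']}
print(depth_limited_search(graph, 'A', 'E', 4))   # e.g. ['A', 'B', 'D', 'E']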
Time and space complexity: The worst-case time complexity is equivalent to BFS and to worst-case DFS.
Time complexity: O(b^l)
The nodes expanded in one particular direction have to be stored.
Space complexity: O(bl)
Optimality: No, because it is not guaranteed to find the shortest solution first.
Completeness: Yes, guaranteed to find the solution if it exists.
Advantage: A cutoff level is introduced into the DFS technique.
Disadvantage: Not guaranteed to find the optimal solution.

Iterative deepening search

Definition: Iterative deepening search is a strategy that sidesteps the issue of choosing the best depth limit by trying all possible depth limits.

Example: Route finding problem. Task: find a path from A to G. The search is repeated with increasing depth limits: Limit = 0, Limit = 1, Limit = 2, and so on.

Solution: The goal state G can be reached from A in four ways:
1. A - B - D - E - G   (limit 4)
2. A - B - D - E - G   (limit 4)
3. A - C - E - G       (limit 3)
4. A - F - G           (limit 2)
Since it is an iterative deepening search, it selects the lowest depth limit, i.e. A-F-G is selected as the solution path.

The iterative deepening search algorithm:
function ITERATIVE-DEEPENING-SEARCH(problem) returns a solution, or failure
    inputs: problem
    for depth <- 0 to infinity do
        result <- DEPTH-LIMITED-SEARCH(problem, depth)
        if result ≠ cutoff then return result

Time and space complexity: Iterative deepening combines the advantages of breadth-first and depth-first search, i.e. the expansion of states is done as in BFS while the memory requirement is equivalent to DFS.
Time complexity: O(b^d)
Space complexity: O(bd)
Optimality: Yes, because the order of expansion of states is similar to breadth-first search.
Completeness: Yes, guaranteed to find the solution if it exists.
Advantage: This method is preferred for a large state space and when the depth of the solution is not known.
Disadvantage: Many states are expanded multiple times. Example: the state D is expanded twice at limit 2.

Bidirectional search

Definition: Bidirectional search is a strategy that simultaneously searches in both directions, i.e. forward from the initial state and backward from the goal, and stops when the two searches meet in the middle.

Example: Route finding problem. Task: find a path from A to E. Search forward from A and search backward from E.

Time and space complexity: The forward and backward searches done at the same time will lead to the solution in O(2b^(d/2)) = O(b^(d/2)) steps, because each search has to go only halfway. If the two searches meet at all, the nodes of at least one of them must all be retained in memory, which requires O(b^(d/2)) space.
Optimality: Yes, because the order of expansion of states is done in both directions.
Completeness: Yes, guaranteed to find the solution if it exists.
Advantage: Time and space complexity are reduced.
Disadvantage: If the two searches (forward and backward) do not meet at all, complexity arises in the search technique. In backward search, calculating predecessors is a difficult task, and if more than one goal state exists then explicit, multiple-state search is required.
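The "meet in the middle" idea above can be sketched with two breadth-first frontiers that expand alternately. The code below is a minimal illustration; the undirected toy graph and helper names are assumptions, not from the notes.

from collections import deque

def bidirectional_search(graph, start, goal):
    # Two BFS frontiers, one growing forward from the start, one backward from the goal.
    if start == goal:
        return [start]
    parents_f, parents_b = {start: None}, {goal: None}
    frontier_f, frontier_b = deque([start]), deque([goal])

    def expand(frontier, parents, other_parents):
        for _ in range(len(frontier)):            # expand one full depth level
            node = frontier.popleft()
            for succ in graph.get(node, []):
                if succ not in parents:
                    parents[succ] = node
                    if succ in other_parents:     # the two searches meet here
                        return succ
                    frontier.append(succ)
        return None

    while frontier_f and frontier_b:
        meet = expand(frontier_f, parents_f, parents_b) or \
               expand(frontier_b, parents_b, parents_f)
        if meet:
            # Stitch the two half-paths together at the meeting node.
            path, n = [], meet
            while n is not None:
                path.append(n)
                n = parents_f[n]
            path.reverse()
            n = parents_b[meet]
            while n is not None:
                path.append(n)
                n = parents_b[n]
            return path
    return None

# Undirected toy map for the A-to-E task (illustrative only).
graph = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'E'],
         'D': ['B', 'E'], 'E': ['C', 'D']}
print(bidirectional_search(graph, 'A', 'E'))      # e.g. ['A', 'C', 'E']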
Comparing uninformed search strategies

Criterion              Complete   Time          Space         Optimal
Breadth First          Yes        O(b^d)        O(b^d)        Yes
Uniform Cost           Yes        O(b^d)        O(b^d)        Yes
Depth First            No         O(b^m)        O(bm)         No
Depth Limited          No         O(b^l)        O(bl)         No
Iterative Deepening    Yes        O(b^d)        O(bd)         Yes
Bidirectional          Yes        O(b^(d/2))    O(b^(d/2))    Yes

Avoiding Repeated States

The most important complication of a search strategy is expanding states that have already been encountered and expanded before on some other path.

(Figure: a state space and its exponentially larger search tree.)

Repeated states can be avoided in three different ways:
1. Do not return to the state you just came from, i.e. avoid any successor that is the same state as the node's parent.
2. Do not create paths with cycles, i.e. avoid any successor of a node that is the same as any of the node's ancestors.
3. Do not generate any state that was ever generated before.

The general TREE-SEARCH algorithm is modified with additional data structures, such as:
Closed list - stores every expanded node.
Open list - the fringe of unexpanded nodes.
If the current node matches a node on the closed list, then it is discarded and is not considered for expansion. This is done with the GRAPH-SEARCH algorithm. This algorithm is efficient for problems with many repeated states.

function GRAPH-SEARCH(problem, fringe) returns a solution, or failure
    closed <- an empty set
    fringe <- INSERT(MAKE-NODE(INITIAL-STATE[problem]), fringe)
    loop do
        if EMPTY?(fringe) then return failure
        node <- REMOVE-FIRST(fringe)
        if GOAL-TEST[problem](STATE[node]) then return SOLUTION(node)
        if STATE[node] is not in closed then
            add STATE[node] to closed
            fringe <- INSERT-ALL(EXPAND(node, problem), fringe)

The worst-case time and space requirements are proportional to the size of the state space; this may be much smaller than O(b^d).

Searching With Partial Information

When the knowledge of the states or actions is incomplete about the environment, then only partial information is known to the agent. This incompleteness leads to three distinct problem types:
(i) Sensorless problems (conformant problems): If the agent has no sensors at all, then it could be in one of several possible initial states, and each action might therefore lead to one of several possible successor states.
(ii) Contingency problems: If the environment is partially observable or if actions are uncertain, then the agent's percepts provide new information after each action. A problem is called adversarial if the uncertainty is caused by the actions of another agent. To handle the situation of unknown circumstances, the agent needs a contingency plan.
(iii) Exploration problems: An extreme case of contingency problems, where the states and actions of the environment are unknown and the agent must act to discover them.

Measuring Performance of Algorithms

There are two aspects of algorithmic performance:
• Time
- Instructions take time.
- How fast does the algorithm perform?
- What affects its runtime?
• Space
- Data structures take space.
- What kind of data structures can be used?
- How does the choice of data structure affect the runtime?
Algorithms cannot be compared fairly just by running them on computers: run time is system dependent, and even on the same computer it would depend on the language. Real-time units like microseconds should not be used.
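Because wall-clock time is system dependent, as noted above, search algorithms are usually compared with machine-independent counts such as the number of nodes expanded. The sketch below instruments a simple graph search (with a closed set, as in GRAPH-SEARCH) with such a counter; the names and the toy graph are illustrative assumptions.

from collections import deque

def graph_search(graph, start, goal):
    # BFS-style graph search with a closed set; returns (path, nodes_expanded).
    closed = set()
    fringe = deque([[start]])
    expanded = 0                          # machine-independent performance measure
    while fringe:
        path = fringe.popleft()
        node = path[-1]
        if node == goal:
            return path, expanded
        if node not in closed:
            closed.add(node)
            expanded += 1                 # count each node expansion
            for succ in graph.get(node, []):
                fringe.append(path + [succ])
    return None, expanded

graph = {'S': ['A', 'B'], 'A': ['S', 'D'], 'B': ['S', 'G'], 'D': ['A', 'G'], 'G': []}
path, count = graph_search(graph, 'S', 'G')
print(path, count)    # e.g. ['S', 'B', 'G'] after a handful of expansions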
50 - UNIT II REPRESENTATION OF KNOWLEDGE Game playing – Knowledge representation, Knowledge representation using Predicate logic,Introduction to predicate calculus, Resolution, Use of predicate calculus, Knowledge representation using other logic-Structured representation of knowledge. The Knowledge Base The central component of a Knowledge Based agent is its Knowledge Base or KB. Informally a KB is a set of sentences. Each sentence is expressed in a language called a Knowledge Representation Language The knowledge based agent perform the representation. The tasks (constructed following agent To know the current state of the How to infer the unseen properties of the New changes in the Goal of the How to perform actions depends on language to represent which reasoning is carried out to achieve the The of a knowledge sentence. sentences are based knowledge agent is expressed a language How to sentences to the existing knowledge and how to the answer for the given percept (inference)? To these operations two standard tasks are used (i.e) TELL and ASK. TELL - to add new sentences to the known To know the current state of the function KB-AGENT(percept) returns an KB, a knowledge t, a counter, initially 0, indicating <TELL(KB,MAKE-ACTION-SENTENCE(action, t <- t + return CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE and sentence representing the MAKE-ACTION-QUERY: takes returns about the the time as input, and returns MAKE-ACTION-SENTENCE: constructs a sentence asserting that the chosen action was Three levels of knowledge based agent : level knows a formal work correctly work knowledge which describes the agent by saying what languages representation. If TELL then most of the time we can level and not worry about lower delivering a passenger to Marin County and might know that it San Francisco and that the Golden Gate Bridge the only link between two The knowledge is encoded is logical iii) The implementation level : is the level agent logical represented as a physical representation in The Wumpus World that runs Environment Problem statement: Wumpus was an early game, based on agent who explores a cave consisting of connected passage ways. Somewhere in the cave is the wumpus, a some contain bottomless pits that will anyone who into these rooms (except for the who too big fall in). The only feature living in this environment is the occasional of Monster: the Bottomless Action: shoot an Treasure: CSE – Panimalar Institute of Technology 2 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE Wumpus world is a grid of squares surrounded walls, Each square can agent and Agent at 1,1 (lower Agent facing to the The gold the Wumpus are placed at random in the grid — can’t be in start The wumpus can be shot by an agent, but the agent has only In the square containing the wumpus and in the adjacent squares the agent will perceive a a a a agent When walks into a wall, bump. When the wumpus is killed, perceived anywhere in To perceive the agent order perceive a percept CSE – Panimalar Institute of Technology 3 CS6659-ARTIFICIAL INTELLIGENCE 2 3 4 5 VI SEM CSE Turn left by Turn right by Grab - pick up an Shoot - fire an arrow to kill the wumpus, to used only to Points and/or Find Death by Death by Each Picking up the PEAS description for WUMPUS WORLD ENVIRONMENT measure: +1000 for picking up the into a pit or being eaten by the wumpus, -1 for taken and -10 for using up the for each Environment: A 4 x 4 grid of rooms. 
The agent always [1,1], to locations of the gold and the wumpus are chosen randomly, with uniform distribution, from the squares other the start square. In addition, each square other than the start be a pit, with probability Actuators: right The agent can by 900. The forward, turn left by 900, dies miserable death if it a live wumpus. Moving wall in front of enters a square containing pit forward has no effect if there is agent. The action Grab can be used to pick up an object that is the same squareas the agent. The action Shoot can used to fire an arrow in a straight line in the the agent is facing. The arrow continues until it hits hence kills) the wumpus or hits a The only has one arrow, so only the first Shoot action has any effect. CSE – Panimalar Institute of Technology 4 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE Sensors: The agent has five sensors, each of which gives single bit (not diagonally) adjacent squares will perceive the directly adjacent to a pit, will perceive breeze. the a When agent walks into a wall, will perceive bump. When wumpus is killed, it scream that be perceived anywhere the For example, percept there is stench breeze, will Breeze, None, None, Initial state : Agent in the location of square Goal state : To find the gold and bring it out of as quickly as possible, without getting After percept (None, None, None, None, 1. The first step taken by the agent in The initial situation, after percept None, wumpus None, but CSE – Panimalar Institute of Technology 5 CS6659-ARTIFICIAL INTELLIGENCE After the VI SEM CSE with percept (None, Breeze, None, None, 2. The perceives, the pit be exist (3,1) (or) (2,2) (as backtracks to the towards the other Breeze in square (2,1). anyone of the squares the rule). Now the (1,1) and selects a (1,2). After the move with (Stench, None, None, None, agent in Therefore the wumpus exist in any one of the squares (1,3) (or) Now the agent identifies the safe square (2,2) with the following two should exist in (2,1) also. Therefore the is exist only in the square If the pit is exist in (2,2) be perceived by the (1,2) the pit is exist in the should Therefore (3,1) CSE – Panimalar Institute of Technology 6 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE After the fourth move (None, None, None, After the fifth move with percept (Stench, Breeze, Glitter, The square (2,3) has the percept — Glitter, indicates is in the square. the agent uses following operators in sequence to come out of the with Grab — Pick up an object Climb — To leave the cave from the start Representation, Reasoning And Logic main aim knowledge in computer tractable improve defined by two aspects The of a language inside the form, help agents representation language describes how sentences CSE – Panimalar Institute of Technology 7 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE The semantics determines facts in the world (i.e) agent believes the Example: x=y, a valid statement If the value of x and y matches then true else false Logic: A knowledge representation language in which syntax and semantics are defined correctly is known as Semantics: In logic, semantics represents the meaning of a sentence is what it states about the Inference (or) Reasoning: From the set of given facts conclusion is derived using logical inference or Validity: A sentence is valid or necessarily true if it interpretations in worlds (i.e) universally Example: There is a stench at [1, 1] wumpus The above example is at [1,1] or there problem. true, not because, of is world environment. 
Satisfiability: A sentence is satisfiable if and only there is some interpretation in some world for which it true. Example: There is a wumpus at [1,3] sentence Example: There is a wall in front of wall in front of me world wumpus world wumpus and there CSE – Panimalar Institute of Technology 8 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE LOGICS - AN INTRODUCTION Logics consist of the following two representations sequence. 1. A formal system Syntax - to which describe the describes state how of the to make which describes the meaning of the 2. The entailments theory a set of a set of rules for deducing We will represent the sentences using two different logics. They 1. Propositional logic (or) Boolean 2. Predicate logic (or) First order Propositional logic (or) Boolean logic The syntax of sentences. The atomic sentences the syntactic elements-consist of a single proposition Each symbol stands for a proposition that can be true or Complex sentences are constructed from simpler using logical There are five connectives in common : A sentence such is called the A literal is either an atomic sentence (a literal) or a negated atomic sentence (a negative (and) : A sentence whose main connective , such P 3,1 V (or) a is, called a A sentence using V, such (implies) A sentence such called an implication. Its premise ) V P3,1 CSE – Panimalar Institute of Technology 9 CS6659-ARTIFICIAL INTELLIGENCE P3,1 VI SEM CSE or consequent as rules or if-then its are also only if). (if is of sentences propositional sentence -> Atomic sentence I Complex sentence Atomic Sentence -> True I False I Symbol (P, Q, R) Complex sentence -> (Sentence) (Sentence Connective Sentence) Sentence connective I V Order of precedence (from highest to lowest) , V Note : The grammar is very strict about parentheses: sentence must enclosed in Example PVQ S is equivalent to(( P)V(Q R))=> Truth table for the five logical P The semantics Propositional true P V defines with logic, a the model P P rules for a simply determining fixes the truth the value- false-for every proposition For example, if the sentences in the use of the proposition P1,2 possible model ml = = P3,1 knowledge base then CSE – Panimalar Institute of Technology 10 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE With three proposition symbols, there = 8 The semantics for propositional logic must specify how compute the truth value of any sentence, given a are the connectives; therefore, compute the truth of of need atomic how and to how the For example, a square is breezy a neighboring square has a pit, and a square is breezy if neighboring has a pit. we need bi conditional such P1,2 can define sentence. 
the is true in every row (i.e different types of logical constants) then the sentence a valid Example: (P V is a valid sentence or P V P check whether the given (P V H) a true ((P V H) because different logical Two sentences a and b are logically equivalent if they are true the same set of We write this as a For example, we can easily show (using truth tables) that Q and P are logically CSE – Panimalar Institute of Technology 11 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE An alternative definition of equivalence is as follows: for any sentences a and b if and only if a=b and logical (a (a V (b ((a V ( (a ( a (a commutativity commutativity of c ) ) associativity V associativity of a double-negation contraposition implication biconditional a V (b over v over ( b b) V c V A sentence is satisfiable if sentence is true in a a, or that m is a model of Inference rules for De c)) distributivity V c)) distributivity of is true in some model. If then satisfies propositional logic Inference rules is a standard patterns of inference can be applied to derive chains of conclusions that lead to the desired 1. Modus Ponens or Implication — The best-known rule is called Modus Ponens or Implication Elimination is written as From an implication and the premise of the conclusion is The notation means that, the sentence the CSE – Panimalar Institute of Technology 12 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE For Shoot and ( WumpusAlive) are given, Shoot can be And - From a conjunction, any of the conjunct is For example, from ( WumpusAlive), can 3. From list of sentences their conjunction is 4. Or From a sentence, its disjunction with is 5. Double - Negation 6. Unit From a disjunction, if one of the disjuncts is false, the true one is 7. CSE – Panimalar Institute of Technology 13 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE Knowledge base for the wumpus world each be true be true there there a pit [i, in [i, knowledge There is no pit in following sentences, for A square is breezy if neighboring square. This the one labeled only if there is a pit in to be stated for each The preceding sentences true in wumpus worlds. we include the breeze percepts for first two squares visited in the specific world the agent is in, leading to Let us see used in these inference rules and wumpus world. start equivalences the base R l through R5, that is, there is no pit in show how CSE – Panimalar Institute of Technology 14 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE First, apply biconditional elimination to R2 to Then apply And—Elimination to R6 to Logical equivalence for contrapositives Now we can apply Modus Ponens with R8 B1,1), to and the percept Finally, De Morgan’s rule, giving the That is, neither [1,2] nor [2,1] contains inference rules—is called a An Agent For The Assume that agent has reached the square CSE – Panimalar Institute of Technology 15 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE The Knowledge Base: The agent percepts are sentences and entered into the knowledge base, valid sentences that entailed by the percept sentences and some , the is added to the knowledge percept — There is a Stench in B1,2 — There is no Breeze in — There no in — There is Breeze (2,1) — There in There in B1,1 — for the square — for the square — for the square In environment. then agent should For example, if the square nor any some knowledge about the is no smell in a square, its adjacent square house a wumpus. 
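The truth-table semantics and De Morgan's rule used in the derivation above can be verified mechanically. The following is a small, illustrative Python sketch (not part of the original notes) that checks the logical equivalence of ¬(P ∧ Q) and ¬P ∨ ¬Q by enumerating all models, and checks the validity of P ∨ ¬P.

from itertools import product

def logically_equivalent(f, g, num_symbols):
    # Two sentences are equivalent if they have the same truth value in every model.
    return all(f(*model) == g(*model)
               for model in product([True, False], repeat=num_symbols))

# De Morgan's rule:  not(P and Q)  is equivalent to  (not P) or (not Q)
lhs = lambda p, q: not (p and q)
rhs = lambda p, q: (not p) or (not q)
print(logically_equivalent(lhs, rhs, 2))           # True

# A sentence is valid if it is true in every model, e.g.  P or not P
print(all(p or not p for p in [True, False]))      # True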
Now we will frame the rule of three quares, are: If there is a stench in (1,2) then there must in (1,2) in one or more of neighboring Finding Wumpus 1. obtain Modus Ponens be and the sentence a CSE – Panimalar Institute of Technology 16 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE 2. Applying And - Elimination to this, we obtain the three separate 3. Applying Modus Ponens and the sentence R2 , obtain 4. Applying and - Elimination to this, we obtain four sentences 5. Applying Modus Ponens obtain and the R4 6. Apply the Unit Resolution rule, (derived in step 5 is splitted and the (from step 2), 7. Apply Resolution rule, and the 8. Apply W2,2 yields: First-Order Logic The facts in the logic and Wl,2 and Wl,3 Resolution rule, sentences are , V from step is Wl,3 CSE – Panimalar Institute of Technology 17 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE Order functions Properties Relations Functions for a given the world belongings to be Things with individual identities. Used to distinguish one object from the Relation exist between one One kind of relation which has Properties Relations Functions home, people, college, colors . . big, small, smell, round . . owns, bigger than, has brotherof . father of, best friend . . . Identify the Objects, Properties, sentences: Elephant is bigger than Objects - elephant, Relation -, Squares neighboring the wumpus are Objects - wumpus Properties — One plus two equals Objects - one, two, Relations Function - plus (one plus two, derives : Syntax and Semantics of only First-Order Logic Models for first-order logic contains object in them . domain of a model is the set of objects it contains; domain objects are sometimes The objects in the model be related in various CSE – Panimalar Institute of Technology 18 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE A relation is just the set of tuples of objects that are related. (A tuple is a collection of objects arranged in, a fixed order written with angle brackets surrounding the The model contains unary relations, or are that best given to exactly Symbols The basic syntactic elements of first—order logic symbols that stand for objects, relations, and symbols, which stand stand for relations; functions. objects; predicate symbols, function symbols, which stand The convention that letters. symbols will begin with Example : model with Richard his the Lionheart, legs King of evil England from 1189 John, who ruled to CSE – Panimalar Institute of Technology 19 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE Thus, the brotherhood in this model is the { (Richard the Lionheart, King John), (King John, the Lionheart) ) the “person“ property is true of both Richard and John; “king“ property is true only of Each person has one left leg, so the model has a “left leg“ function that includes the following (Richard the Lionheart) -> Richard’s left (King John) -> John’s left leg The syntax of first-order specified in Backus-Naur form. Sentence -> Atomic I ( Sentence Connective Sentence I Quantifier Variable,. . . I AtomicSentence -> Predicate(Term, . . .) I Term = Term -> Function(Term, . . I I -> I Quantifier -> Constant -> A I X1 I John I . . Variable -> a I x I s I . . Predicate -> Before I HasColor I Raining . . Function -> Mother I LeftLeg I . . Explanation for each element in the BNF description of Term A Constants: not names. 
Example: Name of (John, is a logical expression constant symbol denotes name, might be King that refers one by to CSE – Panimalar Institute of Technology 20 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE Variables: Assumes different values in the domain the inference Example: The variable x may be assigned to different in the domain depends on the step Predicate symbols: It refers to a particular the Example: brotherhood (a relation that holds or hold between pair of Tuple: In a model the relation is defined arranged in a fixed A with objects defines the relation of brotherhood is of [<Ram, Raju> <Raju, A type relation to one of the other object by the Example: by defined the by the related is (refers by a parenthesized of terms (refers to (i) Brother(john, (i.e) John is a brother of 2 Married(FatherOf(Ram), (i.e) Ram’s father is married to John’s connectives construct more complex sentences, in which the semantics of given sentence has to be satisfied. In P(x,y) — x is a P of P — Predicate x, Y — Example: (i) Brother (Ram, when Ram is brother of (ii) Brother (Ram, John) when Ram is the brother Brother(john, John John Ram the Raju), is the brother CSE – Panimalar Institute of Technology 21 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE (ii) Brother (Ram, John) Brother (Ram, xyz), that if Ram is a brother of John then he is not a brother of Quantifiers: Quantifiers are used to express properties entire collection of objects, rather than representing objects by name. FOL contains two standard (i) Universal quantifier ( (ii) Existential quantifier ( Universal quantifier Vs Universal quantifier ( Existential quantifier Existential quantifier ( General notation x P, P - Logical x — variable - for General notation x P, P - Logical x — variable - there (i.e) P is true for x in the (i.e) Example All cats are Example Spot has a sister who is P is true for x Cat(x)-> x Sister(x, Spot) -> All the cats in universe belongs to the type The Spot’s sister is a of mammal and hence implies Spot is also a cat variable x may be replaced by and hence x may be any of the cat name (object x may be name) replaced by, sister of Spot, if it exists. 
Felix is a Spot is a Felix is a sister of is a Cat(Felix) Sister(Felix, Spot) Sister(Felix, Spot)-> cat(Felix) Mammal(Spot Example Cat(Spot) -> Example King John has a crown on All kings are x Crown(x) -> OnHead x King(x) > John) CSE – Panimalar Institute of Technology 22 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE Nested quantifiers : The multiple quantifiers (of same are represented using or different Example 1 : “Brothers are y Brother(x, y) -> Sibling(x, Example For all x and all y, if x is the parent of y then y child y Parent (x, y)-> Child (y, Example 3 “Everybody y Loves Example 4 There is who is loved by x Loves Connections : The two quantifiers are actually connected with each other through Example the likes : Asserting as that asserting everyone there dislikes does not ( partnerships exist someone Likes(x, Partnerships) is equivalent x Likes(x, Example: Everyone likes ice cream means that there is no one does not like ice x Likes (x, IceCream) is equivalent Likes(x, Ground term (or) clause: A term with no variable is caned ground Example : Cat is CSE – Panimalar Institute of Technology 23 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE The de Morgan rules for quantified ) and unquantified (connection between ) are Quantified Unquantified Well Formed Formula (WFF) : A in which variables are properly Introduced with the the beginning of the sentence itself, is called all x Likes (x, Ice) (ii) y Horn clause: Horn clause is a clause with at most literal in the Example : P quantify (i) Two Objects are equal applied to them are x, y (x p p(x) <=> well as objects. and all CSE – Panimalar Institute of Technology 24 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE (ii) Two functions are equal if and only if they have same value all f, g ( x f(x) =g(x) Predicate expression using the Lambda expression can be applied to term in the same way Example: — (i.e) (4,3) =4 quantifier ( !): “there the predicate ) operator: to 2 Uniqueness = quantifier ( Every country has exactly one c Country(c) ! Ruler (r, Uniqueness operator Ci) : Uniqueness operator to represent the unique object is General notation: ix i — x — unique Example: The unique ruler of Japan is Dead (i r Ruler(r, equality symbol to statements to the effect that two terms refer to the object. Example Father( John) = says the object object referred to by Henry Using First Order to by Father(John) and the the Logic Domain : Domain is a knowledge representation can of the world in which done for that section CSE – Panimalar Institute of Technology 25 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE Assertions and queries in first-order Sentences are added to a knowledge base using TELL, logic. king TELL TELL kings King x King (x) => We can ask questions of the knowledge base using ASK. For example, ASK (KB, King(John)) returns Questions are asked to the knowledge base using ASK, called as queries or We can also ask quantified queries, such ASK x Person(x))) returns true by substituting to x (i.e){x/John}. If there is more than one answer a list of substitutions can be returned. This sort of answering is called substitution or binding The kinship consider domain relationships, or kinship. 
This domain includes facts such as “Elizabeth is the mother of Charles“ and “Charles is the father William” and rules such as “One’s grandmother is the mother of domain consists, (b) Unary (c) Binary Sister, (e) Male Parent, Sibling, Child, Daughter, Son, Spouse, Wife, Grandchild, Cousin, Aunt, and Father, Mother brotherhood, CSE – Panimalar Institute of Technology 26 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE (i) Male and female are disjoint x Male (x) (ii) Parent and child are inverse p, c Parent(p, c) <=> Child(c, (iii) One’s mother is one’s female Parent(m, c) (iv) One’s husband is one’s male w,h Husband(h,w ) Male(h) (v) A grandparent is a parent of one’s p Parent (vi) A sibling is x,y Sibling(x, y) Parent child of one’s p Numbers, sets, and Numbers are perhaps the theory can be built up from vivid example of how a kernel of Natural numbers are defined n Natnum(n) Axioms to constrain the successor n O m,n S(m) we can define addition in terms of the successor m NatNum(m) m,n + (O,m) = (S(m)n, )= S(+(m,n The domain of mathematical set representation, consists (a) Constant — Empty CSE – Panimalar Institute of Technology 27 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE (b) Predicates - Member and (c) Functions - Intersection, Union and * Axioms - Sentences in the mathematical set knowledge base. * Independent axiom - An axiom that can’t be derived all other (i) Two sets are equal if and only if each is a subset of the Sl, S2 (si = S2) <=> (Subset (Subset (s2, ii) Adjoining an element already in the set has no x, s Member(x, s) <=> s = Adjoin(x, Lists are similar to sets. The differences are that are ordered and the same element can appear more than once in a Vocabulary of Lisp for Nil is the constant list with no Cons, Append, First, and Rest are Find is the predicate that does for lists what does for Example set of The empty list is [ The term Cons (x, y), where y is a nonempty list, written [x I The term Cons(x, Nil), (i.e., the list element x), is written as corresponds to the nested Cons(A,Cons (B, Cons lists CSE – Panimalar Institute of Technology 28 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE The wumpus axioms section concise, capturing in a very natural way exactly what want to wumpus agent receives a percept vector five corresponding knowledge base must include both the percept and the at it occurred; otherwise the agent will get confused about it saw what. We use integers for steps. typical percept sentence would Percept([Stench, Breeze, Glitter, None, None], Here, Percept is a binary predicate and Stench and so on are constants placed in a list. The actions in the world can represented by logical Climb To query such which is best, the agent program constructs a BestAction(a, ASK should solve this query and return a binding list such as Grab). The agent program can then return Grab as action to but first must TELL its own base that it performing Simple behavior also be implemented quantified implication sentences. For example, we t Glitter(t) => Bestaction(Grab, percept sentences are classified Synchronic (same time) dealing relate properties of a world state other properties the same world CSE – Panimalar Institute of Technology 29 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE (ii) Diachronic (across time) sentences describing the way in which the world changes (or does change) are diachronic sentences. For the agent needs to know how to information location with its just determine current because change is one of the most area in the representation. 
There are two kinds of synchronic rules that could allow capture the necessary information for Infer For finding square must if s Breezy(s) => If a square pit. is Breezy(s) Combining these s a square is breezy, some Adjacent (r, breezy, r the adjacent square contains (r, biconditional sentence r Adjacent (r, Reflect the the world, Some hidden property of the certain percepts to be casuality causes A pit causes all adjacent squares to be r pit(r) => [ s all squares adjacent square will not be s r Adjacent (r,s) (r, s) => Breezy a given square are pitless, pit (r)] Breezy CSE – Panimalar Institute of Technology 30 CS6659-ARTIFICIAL INTELLIGENCE VI SEM CSE that reason with casual reasoning systems” because the how the environment Knowledge Engineering In called rules form based model of First Order Logic Base(KB) Knowledge Engineer: Who a learns a formal representation relations in the of the particular that objects the creates domain and The knowledge engineering engineering projects vary widely and difficulty, but all such projects include scope, following identify the PEAS description of the 2. Assemble the relevant knowledge. The knowledge engineer might already be an expert in the domain, or might need work with real experts to extract what they know-a called knowledge Decide constants: Translate the important into logical level name. The resulting vocabulary as ontology is kinds of the domain, which determines specific properties and 4. Encode general know/edge about the domain: The knowledge engineer writes the axioms (rules) for all the vocabulary misconceptions are clarified from step 3 and process 5. a description of the specific terms. problem concepts that are already part of CS6659-ARTIFICIAL INTELLIGENCE VI SEM UNIT -III KNOWLEDGE INFERENCE Knowledge representation -Production based system,Frame based system. Inference -Backward chaining,Forwardchaining,Rulevalueapproach,Fuzzyreasoning-Certainty factors,Bayesian Theory- BayesianNetwork-Dempster Shafertheory. Knowledge is a general term. An answer to the question, "how to represent knowledge", requires an analysis to distinguish between knowledge “how” and knowledge “that”. knowing "how to do something". e.g. "how to drive a car" is a Procedural knowledge. knowing "that something is true or false". e.g. "that is the speed limit for a car on a motorway" is a Declarative knowledge. 3. knowledge and Representation are two distinct entities. They play a central but distinguishable roles in intelligent system. Knowledge is a description of the world. It determines a system's competence by what it knows. • ■ Representation is the way knowledge is encoded. It defines a system's performance in doing something. • Different types of knowledge require different kinds of representation. The Knowledge Representation models/mechanisms are often based on: Logic ◊ Rules Frames ◊ Semantic Net • Different types of knowledge require different kinds of reasoning. 1.1 Framework of Knowledge Representation (Poole 1998) . Computer requires a well-defined problem description to process and provide well-defined acceptable solution. To collect fragments of knowledge we need first to formulate a description in our spoken language and then represent it in formal language so that computer can understand. The computer can then use an algorithm to 1 CS6659-ARTIFICIAL INTELLIGENCE VI SEM compute an answer. This process is illustrated below. Solve Problem Solution Represent Interpret Informal Formal Compute Representation Output Fig. 
Knowledge Representation Framework The steps are − The informal formalism of the problem takes place first. − It is then represented formally and the computer produces an output. − This output can then be represented in a informally described solution that user understands or checks for consistency. Note : The Problem solving requires − formal knowledge representation, and − conversion of informal knowledge to formal knowledge , that is conversion of implicit knowledge to explicit knowledge. • Knowledge and Representation . large Problem solving requires amount of knowledge and mechanism for manipulating that knowledge. and the Representation The Knowledge are central distinct entities, but distinguishable roles in intelligent system. − Knowledge is a description of the world; it determines a system's competence by what it knows. − Representation is the way knowledge is encoded; it defines the system's performance in doing something. In simple words, we : − need to know about things we want to represent , and 2 CS6659-ARTIFICIAL INTELLIGENCE VI SEM − need some means by which things we can manipulate. ◊ know things to represent ◊ need means to manipulate ‡ Objects - facts about objects in the domain. ‡ Events - actions that occur in the domain. ‡ Performance - knowledge about how to do things ‡ Metaknowledge - knowledge about what we know ‡ Requires some formalism - to what we represent ; Thus, knowledge representation can be considered at two levels : (a) knowledge level at which facts are described, and (b) symbol level at which the representations of the objects, defined in terms of symbols, can be manipulated in the programs. (b) Mapping between Facts and Representation Knowledge is a collection of “facts” from some domain. We need a representation of "facts" that can be manipulated by a program. Normal English is insufficient, too hard currently for a computer program to draw inferences in natural languages. Thus some symbolic representation is necessary. Therefore, we must be able to map "facts to symbols" and "symbols to facts" using forward and backward representation mapping. Example : Consider an English sentence Representations 3 CS6659-ARTIFICIAL INTELLIGENCE VI SEM Facts ◊ Spot is a dog A fact represented in English sentence ◊ dog (Spot) Using forward mapping function the ◊ ∀ x : dog(x) → above fact is represented in logic hastail (x) A logical representation of the fact that "all dogs have tails" Knowledge Representationis divided into 2 types: I. Production systems. II. Frame-based systems. Knowledge-based system 6. Knowledge base: A set of sentences that describe the world in some formal (representational) language (e.g. firstorder logic) Domain specific knowledge 7. Inference engine: A set of procedures that work upon the representational language and can infer new facts or answer KB queries (e.g. resolution algorithm, forward chaining) Domain independent Automated reasoning systems • Theorem proverbs 4 CS6659-ARTIFICIAL INTELLIGENCE VI SEM – Prove sentences in the first-order logic. Use inference rules, resolution rule and resolution refutation. 
• Deductive retrieval systems
– Systems based on rules (KBs in Horn form)
– Prove theorems or infer new assertions
• Production systems
– Systems based on rules with actions in the consequents
– Forward chaining mode of operation
• Semantic networks
– Graphical representation of the world; objects are nodes in the graph, relations are the various links
• Frames
– Object-oriented representation, with some procedural control of inference

PRODUCTION BASED SYSTEMS

Based on rules, but different from KBs in Horn form. The knowledge base is divided into:
• A rule base (includes rules)
• A working memory (includes facts)

Rules: a special type of if-then rule
p1 ∧ p2 ∧ ... ∧ pn ⇒ a1, a2, ..., ak
Antecedent: a conjunction of conditions. Consequent: a sequence of actions.

Basic operation:
• Check if the antecedent of a rule is satisfied
• Decide which rule to execute (if more than one rule is satisfied)
• Execute the actions in the consequent of the rule

Working memory
• Consists of a set of facts – statements about the world, but it can also represent various data structures
• The exact syntax and representation of facts may differ across different systems
• Examples:
– predicates such as Red(car12), but only ground statements, or
– objects of the form (type attr1:value1 attr2:value2 ...), such as (person age 27 home Toronto). The type, attributes and values are all atoms.

Rules
p1 ∧ p2 ∧ ... ∧ pn ⇒ a1, a2, ..., ak
• Antecedents: conjunctions of conditions
• Examples:
– a conjunction of literals A(x) ∧ B(x) ∧ C(y) – simple negated or non-negated statements in predicate logic, or
– conjunctions of conditions on objects: (type attr1 spec1 attr2 spec2 ...), where a spec can be an atom, a variable, an expression or a condition, e.g. (person age [n+4] occupation x), (person age {< 23 ∧ > 6})

Production systems
p1 ∧ p2 ∧ ... ∧ pn ⇒ a1, a2, ..., ak
• Consequent: a sequence of actions
• An action can be:
– ADD the fact to the working memory (WM)
– REMOVE the fact from the WM
– MODIFY an attribute field
– QUERY the user for input, etc.
• Examples: A(x) ∧ B(x) ∧ C(y) ⇒ add D(x), or (Student name x) ⇒ ADD (Person name x)
• Use forward chaining to do reasoning:
– If the antecedent of the rule is satisfied (the rule is said to be "active") then its consequent can be executed (it is "fired")
• Problem: Two or more rules are active at the same time. Which one to execute next?

(Figure: two active rules, R27 and R105, each with their own conditions and actions, competing to fire.)

• The strategy for selecting the rule to be fired from among the possible candidates is called conflict resolution.
• Why is conflict resolution important? Or, why do we care about the order? Assume that we have two rules and the preconditions of both are satisfied:
R1: A(x) ∧ B(x) ∧ C(y) ⇒ add D(x)
R2: A(x) ∧ B(x) ∧ E(z) ⇒ delete A(x)
• What can happen if the rules are triggered in a different order?
– If R1 goes first, R2's condition is still satisfied and we infer D(x)
– If R2 goes first, we may never infer D(x)
• Problems with production systems:
– Additions and deletions can change the set of active rules;
– If a rule contains variables, testing all instances in which the rule is active may require a large number of unifications;
– Conditions of many rules may overlap, thus requiring the same unifications to be repeated multiple times. (A naive sketch of this match-resolve-act cycle follows.)
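The cycle described above can be sketched as a naive propositional rule engine; because it re-checks every rule against the whole working memory on every cycle, it also motivates the Rete optimisation discussed next. The rule syntax, fact names and conflict-resolution choice below are illustrative assumptions, not from the notes.

# A naive production system: each rule has a set of condition facts and a
# list of (action, fact) pairs, where the action is 'add' or 'remove'.
rules = [
    {'name': 'R1', 'if': {'A', 'B', 'C'}, 'then': [('add', 'D')]},
    {'name': 'R2', 'if': {'A', 'B', 'E'}, 'then': [('remove', 'A')]},
]
working_memory = {'A', 'B', 'C', 'E'}

def conflict_resolution(active):
    # Specificity: prefer the rule with the most conditions (ties: first listed).
    return max(active, key=lambda r: len(r['if']))

fired = set()
while True:
    # Match phase: naively re-check every rule against the working memory.
    active = [r for r in rules
              if r['if'] <= working_memory and r['name'] not in fired]
    if not active:
        break
    rule = conflict_resolution(active)           # resolve phase
    fired.add(rule['name'])                      # "no duplication" strategy
    for action, fact in rule['then']:            # act phase
        if action == 'add':
            working_memory.add(fact)
        else:
            working_memory.discard(fact)
    print(rule['name'], '->', sorted(working_memory))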
• Solution: the Rete algorithm
– gives a more efficient solution for managing a set of active rules and performing unifications
– implemented in the system OPS-5 (used to implement XCON – an expert system for configuration of DEC computers)

Rete algorithm
• Assume a set of rules:
A(x) ∧ B(x) ∧ C(y) ⇒ add D(x)
A(x) ∧ B(y) ∧ D(x) ⇒ add E(x)
A(x) ∧ B(x) ∧ E(z) ⇒ delete A(x)
• And facts: A(1), A(2), B(2), B(3), B(4), C(5)
• Rete:
– compiles the rules into a network that merges the conditions of multiple rules together (avoiding repeats)
– propagates valid unifications
– re-evaluates only changed conditions

Rete algorithm. Network.
(Figure: the Rete network built from the three rules and six facts listed above.)

Conflict resolution strategies
• Problem: Two or more rules are active at the same time. Which one to execute next?
• Solutions:
– No duplication (do not execute the same rule twice).
– Recency. Rules referring to facts newly added to the working memory take precedence.
– Specificity. Rules that are more specific are preferred.
– Priority levels. Define the priority of rules and actions based on expert opinion. Have multiple priority levels such that the higher-priority rules fire first.

FRAME-BASED REPRESENTATION

• Frames – a semantic net with properties
• A frame represents an entity as a set of slots (attributes) and associated values
• A frame can represent a specific entity or a general concept
• Frames are implicitly associated with one another because the value of a slot can be another frame

3 components of a frame
• frame name
• attributes (slots)
• values (fillers: list of values, range, string, etc.)

Book frame
Slot        Filler
Title       AI. A Modern Approach
Author      Russell & Norvig
Year        2003

Features of Frame Representation
• More natural support of values than semantic nets (each slot has constraints describing the legal values that the slot can take)
• Can be easily implemented using object-oriented programming techniques
• Inheritance is easily controlled

Inheritance
(Figure: an inheritance hierarchy of frames.)

Benefits of Frames
• Makes programming easier by grouping related knowledge
• Easily understood by non-developers
• Expressive power
• Easy to set up slots for new properties and relations
• Easy to include default information and detect missing values

Inferential Knowledge: This knowledge generates new information from the given information. The new information does not require further data gathering from the source, but does require analysis of the given information to generate new knowledge.
• Given a set of relations and values, one may infer other values or relations.
• Predicate logic (a mathematical deduction) is used to infer from a set of attributes.
• Inference through predicate logic uses a set of logical operations to relate individual data.
• The symbols used for the logic operations are: "→" (implication), "¬" (not), "∨" (or), "∧" (and), "∀" (for all), "∃" (there exists).

Examples of predicate logic statements:
a) "Wonder" is the name of a dog: dog(wonder)
b) All dogs belong to the class of animals: ∀x : dog(x) → animal(x)
c) All animals either live on land or in water: ∀x : animal(x) → live(x, land) ∨ live(x, water)

Declarative knowledge: Here the knowledge is based on declarative facts about axioms and domains.
• Axioms are assumed to be true unless a counter-example is found to invalidate them.
• Domains represent the physical world and the perceived functionality.
• Axioms and domains thus simply exist and serve as declarative statements that can stand alone.

Procedural knowledge: Here the knowledge is a mapping process between domains that specifies "what to do when", and the representation is of "how to make it" rather than "what it is". Procedural knowledge:
• may have inferential efficiency, but no inferential adequacy and acquisitional efficiency;
• is represented as small programs that know how to do specific things and how to proceed.
Example: A parser for a natural language has the knowledge that a noun phrase may contain articles, adjectives and nouns. It accordingly calls routines that know how to process articles, adjectives and nouns.

■ Inference Procedural Rules
These rules advise on how to solve a problem while certain facts are known.
Example: IF the data needed is not in the system THEN request it from the user.
These rules are part of the inference engine.

The logic model of computation is based on relations and logical inference. Programs consist of definitions of relations. Computations are inferences (a computation is a proof).
Example 1: Linear function. A linear function y = 2x + 3 can be represented as: f(X, Y) if Y is 2 * X + 3. Here the function represents the relation between X and Y.
Example 2: Determine a value for the circumference. The circumference computation can be represented as: circle(R, C) if Pi = 3.14 and C = 2 * Pi * R. Here the function is represented as the relation between the radius R and the circumference C.

Forward versus Backward Reasoning

A rule-based system architecture consists of a set of rules, a set of facts, and an inference engine. The need is to find what new facts can be derived. Given a set of rules, there are essentially two ways to generate new knowledge: one, forward chaining, and the other, backward chaining.
Forward chaining: also called data driven. It starts with the facts, and sees what rules apply.
Backward chaining: also called goal driven. It starts with something to find out, and looks for rules that will help in answering it.
KR – forward-backward reasoning

Example 1
Rule R1: IF hot AND smoky THEN fire
Rule R2: IF alarm_beeps THEN smoky
Rule R3: IF fire THEN switch_on_sprinklers
Fact F1: alarm_beeps [Given]
Fact F2: hot [Given]

Example 2
Rule R1: IF hot AND smoky THEN ADD fire
Rule R2: IF alarm_beeps THEN ADD smoky
Rule R3: IF fire THEN ADD switch_on_sprinklers
Fact F1: alarm_beeps [Given]
Fact F2: hot [Given]

KR – forward-backward reasoning

Example 3: A typical Forward Chaining
Rule R1: IF hot AND smoky THEN ADD fire
Rule R2: IF alarm_beeps THEN ADD smoky
Rule R3: IF fire THEN ADD switch_on_sprinklers
Fact F1: alarm_beeps [Given]
Fact F2: hot [Given]
Fact F4: smoky [from F1 by R2]
Fact F5: fire [from F2, F4 by R1]
Fact F6: switch_on_sprinklers [from F5 by R3]

Example 4: A typical Backward Chaining
Rule R1: IF hot AND smoky THEN fire
Rule R2: IF alarm_beeps THEN smoky
Rule R3: IF fire THEN switch_on_sprinklers
Fact F1: hot [Given]
Fact F2: alarm_beeps [Given]
Goal: Should I switch the sprinklers on?

Properties of Forward Chaining
• All rules which can fire do fire.
• Can be inefficient - leads to spurious rules firing and unfocused problem solving.
• The set of rules that can fire is known as the conflict set.
• The decision about which rule to fire is conflict resolution.

KR – forward chaining
Forward chaining algorithm - I
Repeat
    Collect the rules whose conditions match facts in WM.
    Do the actions indicated by the rules (add facts to WM or delete facts from WM).
Until the problem is solved or no conditions match.

Backward chaining system
Backward chaining means reasoning from goals back to facts. The idea is to focus the search. Rules and facts are processed using a backward chaining interpreter, which checks a hypothesis, e.g. "should I switch the sprinklers on?"

Backward chaining algorithm
Prove goal G:
    If G is in the initial facts, it is proven.
    Otherwise, find a rule which can be used to conclude G, and try to prove each of that rule's conditions.

(Figure: backward chaining proof tree, with switch_on_sprinklers proved from fire, fire from hot and smoky, and smoky from alarm_beeps.)

Encoding of rules
Rule R1: IF hot AND smoky THEN fire
Rule R2: IF alarm_beeps THEN smoky
Rule R3: IF fire THEN switch_on_sprinklers
Fact F1: hot [Given]
Fact F2: alarm_beeps [Given]
Goal: Should I switch the sprinklers on?

Forward vs Backward Chaining
Which to use depends on the problem, and on the properties of the rule set.
Backward chaining is likely to be better if there is a clear hypothesis. Examples: diagnostic problems or classification problems, medical expert systems.
Forward chaining may be better if there is a less clear hypothesis and one wants to see what can be concluded from the current situation. Examples: synthesis systems - design / configuration.

Uncertain Knowledge

When an agent knows enough facts about its environment, the logical approach enables it to derive plans that are guaranteed to work. This is a good thing. Unfortunately, agents almost never have access to the whole truth about their environment. Agents must, therefore, act under uncertainty.

The real world is far more complex than the wumpus world. For a logical agent, it might be impossible to construct a complete and correct description of how its actions will work. Suppose, for example, that the agent wants to drive someone to the airport to catch a flight and is considering a plan, A90, that involves leaving home 90 minutes before the flight departs and driving at a reasonable speed.
Even though the airport is only about 15 miles away, the agent will not be able to conclude with certainty that "Plan A90 will get us to the airport in time." Instead, it reaches the weaker conclusion "Plan A90 will get us to the airport in time, as long as my car doesn't break down or run out of gas, and I don't get into an accident, and there are no accidents on the bridge, and the plane doesn't leave early, and . . . ." None of these conditions can be deduced, so the plan's success cannot be inferred. This is an example of the qualification problem.

The right thing to do - the rational decision - therefore depends on both the relative importance of various goals and the likelihood that, and degree to which, they will be achieved.

Handling Uncertain Knowledge

Diagnosis - whether for medicine, automobile repair, or whatever - is a task that almost always involves uncertainty. Let us try to write rules for dental diagnosis using first-order logic, so that we can see how the logical approach breaks down. Consider the following rule:
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity).
The problem is that this rule is wrong. Not all patients with toothaches have cavities; some of them have gum disease, an abscess, or one of several other problems:
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) ∨ Disease(p, GumDisease) ∨ Disease(p, Abscess) . . .
Unfortunately, in order to make the rule true, we have to add an almost unlimited list of possible causes. We could try turning the rule into a causal rule:
∀p Disease(p, Cavity) ⇒ Symptom(p, Toothache)
But this rule is not right either; not all cavities cause pain. The only way to fix the rule is to make it logically exhaustive: to augment the left-hand side with all the qualifications required for a cavity to cause a toothache. Even then, for the purposes of diagnosis, one must also take into account the possibility that the patient might have a toothache and a cavity that are unconnected.

Trying to use first-order logic to cope with a domain like medical diagnosis thus fails for three main reasons:
Laziness: It is too much work to list the complete set of antecedents or consequents needed to ensure an exceptionless rule, and too hard to use such rules.
Theoretical ignorance: Medical science has no complete theory for the domain.
Practical ignorance: Even if we know all the rules, we might be uncertain about a particular patient because not all the necessary tests have been or can be run.

Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance. We might not know for sure what afflicts a particular patient, but we believe that there is, say, an 80% chance - that is, a probability of 0.8 - that the patient has a cavity if he or she has a toothache. Assigning a probability of 0 to a given sentence corresponds to an unequivocal belief that the sentence is false, while assigning a probability of 1 corresponds to an unequivocal belief that the sentence is true.

Probabilities between 0 and 1 correspond to intermediate degrees of belief in the truth of the sentence. The sentence itself is in fact either true or false. All probability statements must therefore indicate the evidence with respect to which the probability is being assessed. As the agent receives new percepts, its probability assessments are updated to reflect the new evidence.
Before the evidence is obtained, we talk about prior or unconditional probability; after the evidence is obtained, we talk about posterior or conditional probability.

Uncertainty and Rational Decisions
The presence of uncertainty radically changes the way an agent makes decisions. A logical agent typically has a goal and executes any plan that is guaranteed to achieve it. An action can be selected or rejected on the basis of whether it achieves the goal, regardless of what other actions might achieve. When uncertainty enters the picture, this is no longer the case. To make such choices, an agent must first have preferences between the different possible outcomes of the various plans. A particular outcome is a completely specified state, including such factors as whether the agent arrives on time and the length of the wait at the airport. We will be using utility theory to represent and reason with preferences. (The term utility is used here in the sense of "the quality of being useful," not in the sense of the electric company or water works.) Utility theory says that every state has a degree of usefulness, or utility, to an agent and that the agent will prefer states with higher utility. The utility of a state is relative to the agent whose preferences the utility function is supposed to represent. The utility of a state in which White has won a game of chess is obviously high for the agent playing White, but low for the agent playing Black. A utility function can even account for altruistic behavior, simply by including the welfare of others as one of the factors contributing to the agent's own utility. Preferences, as expressed by utilities, are combined with probabilities in the general theory of rational decisions called decision theory:
Decision theory = probability theory + utility theory.
The fundamental idea of decision theory is that an agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all the possible outcomes of the action. This is called the principle of Maximum Expected Utility (MEU).

Design for a Decision-Theoretic Agent
The primary difference is that the decision-theoretic agent's knowledge of the current state is uncertain; the agent's belief state is a representation of the probabilities of all possible actual states of the world. As time passes, the agent accumulates more evidence and its belief state changes. Given the belief state, the agent can make probabilistic predictions of action outcomes and hence select the action with the highest expected utility.

Basic Probability Notation
Any notation for describing degrees of belief must be able to deal with two main issues:
The nature of the sentences to which degrees of belief are assigned.
The dependence of the degree of belief on the agent's experience.

Propositions
Probability theory typically uses a language that is slightly more expressive than propositional logic. The basic element of the language is the random variable, which can be thought of as referring to a "part" of the world whose "status" is initially unknown. For example, Cavity might refer to whether my lower left wisdom tooth has a cavity. We will always capitalize the names of random variables. (However, we still use lowercase, single-letter names to represent an unknown random variable, for example: P(a) = 1 − P(¬a).) Each random variable has a domain of values that it can take on.
For example, the domain of Cavity might be {true, false}. (We will use lowercase for the names of values.) The simplest kind of proposition asserts that a random variable has a particular value drawn from its domain. For example, Cavity = true might represent the proposition that I do in fact have a cavity in my lower left wisdom tooth.

Random variables are divided into three kinds:
Boolean random variables, such as Cavity, have the domain {true, false}. We will often abbreviate a proposition such as Cavity = true simply by the lowercase name cavity. Similarly, Cavity = false would be abbreviated by ¬cavity.
Discrete random variables, which include Boolean random variables as a special case, take on values from a countable domain. For example, the domain of Weather might be {sunny, rainy, cloudy, snow}. The values in the domain must be mutually exclusive and exhaustive. Where no confusion arises, we will use, for example, snow as an abbreviation for Weather = snow.
Continuous random variables take on values from the real numbers. The domain can be either the entire real line or some subset such as the interval [0, 1]. For example, the proposition X = 4.02 asserts that the random variable X has the exact value 4.02. Propositions concerning continuous random variables can also be inequalities, such as X <= 4.02.
Elementary propositions, such as Cavity = true and Toothache = false, can be combined to form complex propositions using all the standard logical connectives. For example, Cavity = true ∧ Toothache = false is a proposition to which one may ascribe a degree of (dis)belief.

Atomic events
An atomic event is a complete specification of the state of the world about which the agent is uncertain. It can be thought of as an assignment of particular values to all the variables of which the world is composed. For example, if my world consists of only the Boolean variables Cavity and Toothache, then there are just four distinct atomic events; the proposition Cavity = false ∧ Toothache = true is one such event.
Atomic events have some important properties:
They are mutually exclusive: at most one can actually be the case. For example, cavity ∧ toothache and cavity ∧ ¬toothache cannot both be the case.
The set of all possible atomic events is exhaustive: at least one must be the case. That is, the disjunction of all atomic events is logically equivalent to true.
Any particular atomic event entails the truth or falsehood of every proposition, whether simple or complex. For example, the atomic event cavity ∧ ¬toothache entails the truth of cavity and the falsehood of cavity ⇒ toothache.
Any proposition is logically equivalent to the disjunction of all atomic events that entail the truth of the proposition. For example, the proposition cavity is equivalent to the disjunction of the atomic events cavity ∧ toothache and cavity ∧ ¬toothache.
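A short sketch makes these properties concrete by enumerating the four atomic events for a world with only the Boolean variables Cavity and Toothache. The representation and variable names below are chosen only for this illustration.

from itertools import product

# World with only two Boolean variables; each atomic event assigns a value to both.
variables = ["Cavity", "Toothache"]
atomic_events = [dict(zip(variables, values))
                 for values in product([True, False], repeat=len(variables))]

print(len(atomic_events))   # 4 distinct, mutually exclusive and exhaustive events

# The proposition "cavity" holds exactly in the atomic events that entail it:
# cavity AND toothache, and cavity AND NOT toothache.
entailing = [e for e in atomic_events if e["Cavity"]]
print(entailing)            # two events, both with Cavity = True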
Prior probability
The unconditional or prior probability associated with a proposition a is the degree of belief accorded to it in the absence of any other information; it is written as P(a). For example, if the prior probability that I have a cavity is 0.1, then we would write P(Cavity = true) = 0.1, or P(cavity) = 0.1. It is important to remember that P(a) can be used only when there is no other information. As soon as some new information is known, we must reason with the conditional probability of a given that new information.
We will use an expression such as P(Weather), which denotes a vector of values for the probabilities of each individual state of the weather:
P(Weather = sunny) = 0.7
P(Weather = rain) = 0.2
P(Weather = cloudy) = 0.08
P(Weather = snow) = 0.02
We may simply write P(Weather) = (0.7, 0.2, 0.08, 0.02).
This statement defines a prior probability distribution for the random variable Weather. We will also use expressions such as P(Weather, Cavity) to denote the probabilities of all combinations of the values of a set of random variables. In that case, P(Weather, Cavity) can be represented by a 4 × 2 table of probabilities. This is called the joint probability distribution of Weather and Cavity. Sometimes it will be useful to think about the complete set of random variables used to describe the world. A joint probability distribution that covers this complete set is called the full joint probability distribution. For example, if the world consists of just the variables Cavity, Toothache, and Weather, then the full joint distribution is given by P(Cavity, Toothache, Weather). The full joint distribution specifies the probability of every atomic event and is therefore a complete specification of one's uncertainty about the world in question. Probability distributions for continuous variables are called probability density functions. Density functions differ in meaning from discrete distributions.

Conditional probability
Once the agent has obtained some evidence concerning the previously unknown random variables making up the domain, prior probabilities are no longer applicable. Instead, we use conditional or posterior probabilities. The notation used is P(a|b), where a and b are any propositions. This is read as "the probability of a, given that all we know is b." For example,
P(cavity | toothache) = 0.8
indicates that if a patient is observed to have a toothache and no other information is yet available, then the probability of the patient's having a cavity will be 0.8. A prior probability, such as P(cavity), can be thought of as a special case of the conditional probability P(cavity | ), where the probability is conditioned on no evidence. Conditional probabilities can be defined in terms of unconditional probabilities. The defining equation is
P(a|b) = P(a ∧ b) / P(b),
which holds whenever P(b) > 0. This equation can also be written as
P(a ∧ b) = P(a|b) P(b),
which is called the product rule. The product rule is perhaps easier to remember: it comes from the fact that, for a and b to be true, we need b to be true, and we also need a to be true given b. We can also have it the other way around:
P(a ∧ b) = P(b|a) P(a).
In some cases, it is easier to reason in terms of prior probabilities of conjunctions, but for the most part, we will use conditional probabilities as our vehicle for probabilistic inference.

AXIOMS OF PROBABILITY
1. All probabilities are between 0 and 1. For any proposition a,
0 <= P(a) <= 1.
2. Necessarily true (i.e., valid) propositions have probability 1, and necessarily false (i.e., unsatisfiable) propositions have probability 0:
P(true) = 1, P(false) = 0.
Next, we need an axiom that connects the probabilities of logically related propositions. The simplest way to do this is to define the probability of a disjunction as follows:
3. The probability of a disjunction is given by
P(a ∨ b) = P(a) + P(b) − P(a ∧ b).
These three axioms are often called Kolmogorov's axioms, in honor of the Russian mathematician Andrei Kolmogorov.
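As a small worked check of the product rule and the disjunction axiom, the sketch below uses an illustrative joint distribution over the two Boolean variables Cavity and Toothache. The four numbers are assumptions made up for this example, not figures from the text; the point is only that the definitions above can be verified numerically.

# Illustrative joint probabilities over (Cavity, Toothache); values are assumed.
joint = {
    (True, True): 0.12, (True, False): 0.08,
    (False, True): 0.08, (False, False): 0.72,
}

p_cavity = sum(p for (c, t), p in joint.items() if c)   # marginal P(cavity)
p_tooth = sum(p for (c, t), p in joint.items() if t)    # marginal P(toothache)
p_both = joint[(True, True)]                            # P(cavity AND toothache)

# Product rule: P(cavity | toothache) = P(cavity AND toothache) / P(toothache)
print(p_both / p_tooth)                 # 0.6 with these illustrative numbers

# Disjunction axiom: P(cavity OR toothache) = P(cavity) + P(toothache) - P(both)
print(p_cavity + p_tooth - p_both)      # 0.28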
INFERENCE USING THE FULL JOINT DISTRIBUTION
We will use the full joint distribution as the "knowledge base" from which answers to all questions may be derived. For example, there are six atomic events in which cavity ∨ toothache holds:
P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28.
One particularly common task is to extract the distribution over some subset of variables or a single variable. For example, adding the entries in the first row gives the unconditional or marginal probability of cavity:
P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2.
This process is called marginalization, or summing out, because the variables other than Cavity are summed out. We can write the following general marginalization rule for any sets of variables Y and Z:
P(Y) = Σz P(Y, z).
A variant of this rule involves conditional probabilities instead of joint probabilities, using the product rule:
P(Y) = Σz P(Y | z) P(z).
This rule is called conditioning. Marginalization and conditioning will turn out to be useful rules for all kinds of derivations involving probability expressions.

BAYES' RULE
The product rule can be written in two forms:
P(a ∧ b) = P(a|b) P(b) and P(a ∧ b) = P(b|a) P(a).
Equating the two right-hand sides and dividing by P(a), we get
P(b|a) = P(a|b) P(b) / P(a).
This equation is known as Bayes' rule (also Bayes' law or Bayes' theorem). This simple equation underlies all modern AI systems for probabilistic inference. The more general case of multivalued variables can be written in the P notation as
P(Y|X) = P(X|Y) P(Y) / P(X),
where again this is to be taken as representing a set of equations, each dealing with specific values of the variables.

REPRESENTING KNOWLEDGE IN AN UNCERTAIN DOMAIN
A Bayesian network is a directed graph in which each node is annotated with quantitative probability information. The full specification is as follows:
1. A set of random variables makes up the nodes of the network. Variables may be discrete or continuous.
2. A set of directed links or arrows connects pairs of nodes. If there is an arrow from node X to node Y, X is said to be a parent of Y.
3. Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node.
4. The graph has no directed cycles (and hence is a directed acyclic graph, or DAG).
The topology of the network, that is, the set of nodes and links, specifies the conditional independence relationships that hold in the domain. The intuitive meaning of an arrow in a properly constructed network is usually that X has a direct influence on Y. It is usually easy for a domain expert to decide what direct influences exist in the domain.
You have a new burglar alarm installed at home. It is fairly reliable at detecting a burglary, but it also responds on occasion to minor earthquakes. You also have two neighbors, John and Mary, who have promised to call you at work when they hear the alarm. John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then, too. Mary, on the other hand, likes rather loud music and sometimes misses the alarm altogether. Given the evidence of who has or has not called, we would like to estimate the probability of a burglary. In the case of the burglary network, the topology shows that burglary and earthquakes directly affect the probability of the alarm's going off, but whether John and Mary call depends only on the alarm.
The network thus represents our assumptions that they do not perceive any burglaries directly, they do not notice minor earthquakes, and they do not confer before calling. Notice that the network does not have nodes corresponding to Mary's currently listening to loud music or to the telephone ringing and confusing John. These factors are summarized in the uncertainty associated with the links from Alarm to JohnCalls and MaryCalls. This shows both laziness and ignorance in operation: it would be a lot of work to find out why those factors would be more or less likely in any particular case, and we have no reasonable way to obtain the relevant information anyway. The probabilities actually summarize a potentially infinite set of circumstances in which the alarm might fail to go off (high humidity, power failure, dead battery, cut wires, a dead mouse stuck inside the bell, etc.) or John or Mary might fail to call and report it (out to lunch, on vacation, temporarily deaf, passing helicopter, etc.). In this way, a small agent can cope with a very large world, at least approximately.
Each row in a CPT contains the conditional probability of each node value for a conditioning case. A conditioning case is just a possible combination of values for the parent nodes, a miniature atomic event. Each row must sum to 1, because the entries represent an exhaustive set of cases for the variable. For Boolean variables, once you know that the probability of a true value is p, the probability of false must be 1 − p, so we often omit the second number, as in Figure 14.2. In general, a table for a Boolean variable with k Boolean parents contains 2^k independently specifiable probabilities. A node with no parents has only one row, representing the prior probabilities of each possible value of the variable.

THE SEMANTICS OF BAYESIAN NETWORKS
There are two ways in which one can understand the semantics of Bayesian networks. The first is to see the network as a representation of the joint probability distribution. The second is to view it as an encoding of a collection of conditional independence statements. The two views are equivalent, but the first turns out to be helpful in understanding how to construct networks, whereas the second is helpful in designing inference procedures.

Representing the full joint distribution
A Bayesian network provides a complete description of the domain. Every entry in the full joint probability distribution (hereafter abbreviated as "joint") can be calculated from the information in the network. A generic entry in the joint distribution is the probability of a conjunction of particular assignments to each variable, such as P(X1 = x1 ∧ . . . ∧ Xn = xn). We use the notation P(x1, . . . , xn) as an abbreviation for this. The value of this entry is given by the formula
P(x1, . . . , xn) = Πi P(xi | parents(Xi)),    (14.1)
where parents(Xi) denotes the specific values of the variables in Parents(Xi). Thus, each entry in the joint distribution is represented by the product of the appropriate elements of the conditional probability tables (CPTs) in the Bayesian network. The CPTs therefore provide a decomposed representation of the joint distribution.
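Equation (14.1) can be made concrete with a tiny implementation of the burglary network. The dictionary-of-CPTs representation below is only a sketch, and the probabilities in it are illustrative placeholder values rather than figures quoted in this text; the point is simply that a joint entry is the product of one CPT entry per variable.

# Burglary network: each variable maps to (parents, CPT), where the CPT gives
# P(var = True | parent values). The numbers are illustrative placeholders.
network = {
    "Burglary":   ([], {(): 0.001}),
    "Earthquake": ([], {(): 0.002}),
    "Alarm":      (["Burglary", "Earthquake"],
                   {(True, True): 0.95, (True, False): 0.94,
                    (False, True): 0.29, (False, False): 0.001}),
    "JohnCalls":  (["Alarm"], {(True,): 0.90, (False,): 0.05}),
    "MaryCalls":  (["Alarm"], {(True,): 0.70, (False,): 0.01}),
}

def joint_probability(event):
    """Equation (14.1): P(x1,...,xn) = product over i of P(xi | parents(Xi))."""
    p = 1.0
    for var, (parents, cpt) in network.items():
        p_true = cpt[tuple(event[par] for par in parents)]
        p *= p_true if event[var] else 1 - p_true
    return p

# One atomic event: john and mary call, the alarm sounds, no burglary, no earthquake.
event = {"Burglary": False, "Earthquake": False,
         "Alarm": True, "JohnCalls": True, "MaryCalls": True}
print(joint_probability(event))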
A method for constructing Bayesian networks
Equation (14.1) defines what a given Bayesian network means. It does not, however, explain how to construct a Bayesian network in such a way that the resulting joint distribution is a good representation of a given domain. We will now show that Equation (14.1) implies certain conditional independence relationships that can be used to guide the knowledge engineer in constructing the topology of the network. First, we rewrite the joint distribution in terms of a conditional probability, using the product rule. Then we repeat the process, reducing each conjunctive probability to a conditional probability and a smaller conjunction. We end up with one big product:
P(x1, . . . , xn) = P(xn | xn−1, . . . , x1) P(xn−1 | xn−2, . . . , x1) · · · P(x2 | x1) P(x1) = Πi P(xi | xi−1, . . . , x1).

CHAIN RULE
This identity holds true for any set of random variables and is called the chain rule. Comparing it with Equation (14.1), we see that the specification of the joint distribution is equivalent to the general assertion that, for every variable Xi in the network,
P(Xi | Xi−1, . . . , X1) = P(Xi | Parents(Xi)),    (14.2)
provided that Parents(Xi) ⊆ {Xi−1, . . . , X1}. This last condition is satisfied by labeling the nodes in any order that is consistent with the partial order implicit in the graph structure. What Equation (14.2) says is that the Bayesian network is a correct representation of the domain only if each node is conditionally independent of its predecessors in the node ordering, given its parents. Hence, in order to construct a Bayesian network with the correct structure for the domain, we need to choose parents for each node such that this property holds. Intuitively, the parents of node Xi should contain all those nodes in X1, . . . , Xi−1 that directly influence Xi. For example, suppose we have completed the network in Figure 14.2 except for the choice of parents for MaryCalls. MaryCalls is certainly influenced by whether there is a Burglary or an Earthquake, but not directly influenced. Intuitively, our knowledge of the domain tells us that these events influence Mary's calling behavior only through their effect on the alarm. Also, given the state of the alarm, whether John calls has no influence on Mary's calling. Formally speaking, we believe that the following conditional independence statement holds:
P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm).

Compactness and node ordering
The compactness of Bayesian networks is an example of a very general property of locally structured (also called sparse) systems. In a locally structured system, each subcomponent interacts directly with only a bounded number of other components, regardless of the total number of components. Even in a locally structured domain, constructing a locally structured Bayesian network is not a trivial problem. We require not only that each variable be directly influenced by only a few others, but also that the network topology actually reflect those direct influences with the appropriate set of parents. Because of the way that the construction procedure works, the "direct influencers" will have to be added to the network first if they are to become parents of the nodes they influence. Therefore, the correct order in which to add nodes is to add the "root causes" first, then the variables they influence, and so on, until we reach the "leaves," which have no direct causal influence on the other variables. If we stick to a causal model, we end up having to specify fewer numbers, and the numbers will often be easier to come up with.

Conditional independence relations in Bayesian networks
The topological semantics is given by either of the following specifications, which are equivalent:
1. A node is conditionally independent of its non-descendants, given its parents.
For example, in Figure 14.2, JohnCalls is independent of Burglary and Earthquake, given the value of Alarm.
2. A node is conditionally independent of all other nodes in the network, given its parents, children, and children's parents, that is, given its Markov blanket. For example, Burglary is independent of JohnCalls and MaryCalls, given Alarm and Earthquake.

EXACT INFERENCE IN BAYESIAN NETWORKS
The basic task for any probabilistic inference system is to compute the posterior probability distribution for a set of query variables, given some observed event, that is, some assignment of values to a set of evidence variables.

Inference by enumeration
A query P(X | e) can be answered using
P(X | e) = α P(X, e) = α Σy P(X, e, y).    (13.6)
Now, a Bayesian network gives a complete representation of the full joint distribution. More specifically, Equation (14.1) shows that the terms P(x, e, y) in the joint distribution can be written as products of conditional probabilities from the network. Therefore, a query can be answered using a Bayesian network by computing sums of products of conditional probabilities from the network. ENUMERATE-JOINT-ASK is given for inference by enumeration from the full joint distribution. The algorithm takes as input a full joint distribution P and looks up values therein. Consider the query P(Burglary | JohnCalls = true, MaryCalls = true). The hidden variables for this query are Earthquake and Alarm. From Equation (13.6), using initial letters for the variables in order to shorten the expressions, we have
P(B | j, m) = α Σe Σa P(B) P(e) P(a | B, e) P(j | a) P(m | a).

The Variable Elimination Algorithm
The idea is simple: do the calculation once and save the results for later use. This is a form of dynamic programming. There are several versions of this approach; we present the variable elimination algorithm, which is the simplest. Variable elimination works by evaluating expressions such as Equation (14.3) in right-to-left order. Intermediate results are stored, and summations over each variable are done only for those portions of the expression that depend on the variable. Examining this sequence of steps, we see that there are two basic computational operations required: the pointwise product of a pair of factors, and summing out a variable from a product of factors. The pointwise product is not matrix multiplication, nor is it element-by-element multiplication. The pointwise product of two factors f1 and f2 yields a new factor f whose variables are the union of the variables in f1 and f2. Suppose the two factors have variables Y1, . . . , Yk in common. Then we have
f(X1 . . . Xj, Y1 . . . Yk, Z1 . . . Zl) = f1(X1 . . . Xj, Y1 . . . Yk) f2(Y1 . . . Yk, Z1 . . . Zl).
Summing out a variable from a product of factors is also a straightforward computation. The only trick is to notice that any factor that does not depend on the variable to be summed out can be moved outside the summation process.
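The two factor operations just described can be sketched directly. In the code below a factor is represented as a pair (variables, table) mapping an assignment tuple to a number; this representation and the helper names are chosen for this illustration and are not the textbook's pseudocode.

from itertools import product

# A factor is (variables, table): table maps a tuple of Boolean values
# (one per variable, in order) to a number.

def pointwise_product(f1, f2):
    """New factor over the union of the variables of f1 and f2."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for values in product([True, False], repeat=len(out_vars)):
        assign = dict(zip(out_vars, values))
        v1 = t1[tuple(assign[v] for v in vars1)]
        v2 = t2[tuple(assign[v] for v in vars2)]
        table[values] = v1 * v2
    return out_vars, table

def sum_out(var, f):
    """Sum a variable out of a factor."""
    vars_, t = f
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    table = {}
    for values, p in t.items():
        key = values[:i] + values[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return out_vars, table

# Tiny illustrative factors f1(A, B) and f2(B, C) with assumed numbers:
f1 = (["A", "B"], {(True, True): 0.3, (True, False): 0.7,
                   (False, True): 0.9, (False, False): 0.1})
f2 = (["B", "C"], {(True, True): 0.2, (True, False): 0.8,
                   (False, True): 0.6, (False, False): 0.4})
f = pointwise_product(f1, f2)   # factor over A, B, C
g = sum_out("B", f)             # factor over A, C
print(g)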
Complexity of Exact Inference
The time and space requirements of variable elimination are dominated by the size of the largest factor constructed during the operation of the algorithm. This in turn is determined by the order of elimination of variables and by the structure of the network. The family of networks in which there is at most one undirected path between any two nodes in the network are called singly connected networks or polytrees, and they have a particularly nice property: the time and space complexity of exact inference in polytrees is linear in the size of the network. For multiply connected networks, variable elimination can have exponential time and space complexity in the worst case, even when the number of parents per node is bounded.

Clustering Algorithms
The basic idea of clustering is to join individual nodes of the network to form cluster nodes in such a way that the resulting network is a polytree. For example, the multiply connected sprinkler network shown in the figure can be converted into a polytree by combining the Sprinkler and Rain nodes into a cluster node called Sprinkler+Rain. The two Boolean nodes are replaced by a meganode that takes on four possible values: TT, TF, FT, and FF. The meganode has only one parent, the Boolean variable Cloudy, so there are two conditioning cases.

Approximate Inference in Bayesian Networks
This section describes randomized sampling algorithms, also called Monte Carlo algorithms, that provide approximate answers whose accuracy depends on the number of samples generated. The primitive element in any sampling algorithm is the generation of samples from a known probability distribution. For example, an unbiased coin can be thought of as a random variable Coin with values (heads, tails) and a prior distribution P(Coin) = (0.5, 0.5). Sampling from this distribution is exactly like flipping the coin: with probability 0.5 it will return heads, and with probability 0.5 it will return tails.
The simplest kind of random sampling process for Bayesian networks generates events from a network that has no evidence associated with it. The idea is to sample each variable in turn, in topological order. The probability distribution from which the value is sampled is conditioned on the values already assigned to the variable's parents. This algorithm is shown in Figure 14.12. We can illustrate its operation on the network in Figure 14.11(a), assuming an ordering [Cloudy, Sprinkler, Rain, WetGrass]:
1. Sample from P(Cloudy) = (0.5, 0.5); suppose this returns true.
2. Sample from P(Sprinkler | Cloudy = true) = (0.1, 0.9); suppose this returns false.
3. Sample from P(Rain | Cloudy = true) = (0.8, 0.2); suppose this returns true.
4. Sample from P(WetGrass | Sprinkler = false, Rain = true) = (0.9, 0.1); suppose this returns true.
In this case, PRIOR-SAMPLE returns the event [true, false, true, true]. It is easy to see that PRIOR-SAMPLE generates samples from the prior joint distribution specified by the network. First, let S_PS(x1, . . . , xn) be the probability that a specific event is generated by the PRIOR-SAMPLE algorithm. Just looking at the sampling process, we have
S_PS(x1, . . . , xn) = Πi P(xi | parents(Xi)),
because each sampling step depends only on the parent values.
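A minimal version of PRIOR-SAMPLE for the sprinkler network can be written as below. The CPT entries for Cloudy = false and for the other WetGrass cases are not stated in the passage above, so the values used for them here are illustrative assumptions; only the entries quoted in the walkthrough come from the text.

import random

# CPTs for the sprinkler network, giving P(var = True | parent values).
# Entries marked "assumed" are illustrative, not taken from the text above.
P_cloudy = 0.5
P_sprinkler = {True: 0.1, False: 0.5}                   # False row assumed
P_rain = {True: 0.8, False: 0.2}                        # False row assumed
P_wetgrass = {(True, True): 0.99, (True, False): 0.9,   # all rows except (F, T) assumed
              (False, True): 0.9, (False, False): 0.0}

def prior_sample():
    """Sample each variable in topological order, conditioned on its parents."""
    c = random.random() < P_cloudy
    s = random.random() < P_sprinkler[c]
    r = random.random() < P_rain[c]
    w = random.random() < P_wetgrass[(s, r)]
    return {"Cloudy": c, "Sprinkler": s, "Rain": r, "WetGrass": w}

# Estimate P(Rain = true) by counting samples, as described next.
N = 10_000
count = sum(prior_sample()["Rain"] for _ in range(N))
print(count / N)   # should be close to 0.5 * 0.8 + 0.5 * 0.2 = 0.5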
In any sampling algorithm, the answers are computed by counting the actual samples generated. Suppose there are N total samples, and let N(x1, . . . , xn) be the frequency of the specific event x1, . . . , xn. We expect this frequency to converge, in the limit, to its expected value according to the sampling probability:
lim N→∞ N(x1, . . . , xn)/N = S_PS(x1, . . . , xn) = P(x1, . . . , xn).    (14.4)
Whenever we use an approximate equality ("≈") in what follows, we mean it in exactly this sense: the estimated probability becomes exact in the large-sample limit. Such an estimate is called consistent. For example, one can produce a consistent estimate of the probability of any partially specified event x1, . . . , xm, where m <= n, as follows:
P(x1, . . . , xm) ≈ N(x1, . . . , xm)/N.
That is, the probability of the event can be estimated as the fraction of all complete events generated by the sampling process that match the partially specified event. For example, if we generate 1000 samples from the sprinkler network, and 511 of them have Rain = true, then the estimated probability of rain, written as P(Rain = true), is 0.511.

Rejection Sampling in Bayesian Networks
Rejection sampling is a general method for producing samples from a hard-to-sample distribution given an easy-to-sample distribution. First, it generates samples from the prior distribution specified by the network. Then, it rejects all those that do not match the evidence. Finally, the estimate P(X = x | e) is obtained by counting how often X = x occurs in the remaining samples.

Likelihood Weighting
Likelihood weighting avoids the inefficiency of rejection sampling by generating only events that are consistent with the evidence e. LIKELIHOOD-WEIGHTING (see Figure 14.14) fixes the values for the evidence variables E and samples only the remaining variables X and Y. This guarantees that each event generated is consistent with the evidence. Not all events are equal, however. Before tallying the counts in the distribution for the query variable, each event is weighted by the likelihood that the event accords to the evidence, as measured by the product of the conditional probabilities for each evidence variable, given its parents. Intuitively, events in which the actual evidence appears unlikely should be given less weight.
Let us apply the algorithm to the network shown in Figure 14.11(a), with the query P(Rain | Sprinkler = true, WetGrass = true). The process goes as follows. First, the weight w is set to 1.0. Then an event is generated:
(i) Sample from P(Cloudy) = (0.5, 0.5); suppose this returns true.
(ii) Sprinkler is an evidence variable with value true. Therefore, we set w <- w x P(Sprinkler = true | Cloudy = true) = 0.1.
(iii) Sample from P(Rain | Cloudy = true) = (0.8, 0.2); suppose this returns true.
(iv) WetGrass is an evidence variable with value true. Therefore, we set w <- w x P(WetGrass = true | Sprinkler = true, Rain = true) = 0.099.
Here WEIGHTED-SAMPLE returns the event [true, true, true, true] with weight 0.099, and this is tallied under Rain = true. The weight is low because the event describes a cloudy day, which makes the sprinkler unlikely to be on.
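The weighted-sample walkthrough above can be sketched in a few lines. As before, CPT entries not quoted in the text (the Cloudy = false rows and most WetGrass rows) are illustrative assumptions, and the function names are chosen only for this sketch.

import random
from collections import defaultdict

# P(var = True | parents); rows not quoted in the text are assumed values.
P_cloudy = 0.5
P_sprinkler = {True: 0.1, False: 0.5}                       # False row assumed
P_rain = {True: 0.8, False: 0.2}                            # False row assumed
P_wetgrass = {(True, True): 0.99,                           # implied by w = 0.099 above
              (True, False): 0.9, (False, True): 0.9, (False, False): 0.0}  # assumed

def weighted_sample(sprinkler, wetgrass):
    """Fix the evidence (Sprinkler, WetGrass), sample the rest, return (rain, weight)."""
    w = 1.0
    c = random.random() < P_cloudy                           # sampled (hidden)
    w *= P_sprinkler[c] if sprinkler else 1 - P_sprinkler[c] # evidence weight
    r = random.random() < P_rain[c]                          # sampled (query variable)
    p_w = P_wetgrass[(sprinkler, r)]
    w *= p_w if wetgrass else 1 - p_w                        # evidence weight
    return r, w

# Estimate P(Rain | Sprinkler = true, WetGrass = true) by weighted counting.
totals = defaultdict(float)
for _ in range(10_000):
    rain, w = weighted_sample(sprinkler=True, wetgrass=True)
    totals[rain] += w
norm = totals[True] + totals[False]
print(totals[True] / norm)   # weighted estimate of P(rain | evidence)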
To understand why likelihood weighting works, we start by examining the sampling distribution S_WS for WEIGHTED-SAMPLE. Remember that the evidence variables E are fixed with values e. We will call the other variables Z, that is, Z = {X} ∪ Y. The algorithm samples each variable in Z given its parent values:
S_WS(z, e) = Πi P(zi | parents(Zi)).    (14.6)
Notice that Parents(Zi) can include both hidden variables and evidence variables. Unlike the prior distribution P(z), the distribution S_WS pays some attention to the evidence: the sampled values for each Zi will be influenced by evidence among Zi's ancestors. On the other hand, S_WS pays less attention to the evidence than does the true posterior distribution P(Z | e), because the sampled values for each Zi ignore evidence among Zi's non-ancestors. The likelihood weight w makes up for the difference between the actual and desired sampling distributions. The weight for a given sample x, composed from z and e, is the product of the likelihoods for each evidence variable given its parents (some or all of which may be among the Zi):
w(z, e) = Πi P(ei | parents(Ei)).    (14.7)
Multiplying Equations (14.6) and (14.7), we see that the weighted probability of a sample has the particularly convenient form
S_WS(z, e) w(z, e) = Πi P(zi | parents(Zi)) Πi P(ei | parents(Ei)) = P(z, e).
Because likelihood weighting uses all the samples generated, it can be much more efficient than rejection sampling. It will, however, suffer a degradation in performance as the number of evidence variables increases. This is because most samples will have very low weights, and hence the weighted estimate will be dominated by the tiny fraction of samples that accord more than an infinitesimal likelihood to the evidence.

Inference by Markov chain simulation
In this section, we describe the Markov chain Monte Carlo (MCMC) algorithm for inference in Bayesian networks. MCMC generates each event by making a random change to the preceding event. It is therefore helpful to think of the network as being in a particular current state specifying a value for every variable. The next state is generated by randomly sampling a value for one of the nonevidence variables Xi, conditioned on the current values of the variables in the Markov blanket of Xi. MCMC therefore wanders randomly around the state space, the space of possible complete assignments, flipping one variable at a time but keeping the evidence variables fixed.
Consider the query P(Rain | Sprinkler = true, WetGrass = true) applied to the network in Figure 14.11(a). The evidence variables Sprinkler and WetGrass are fixed to their observed values, and the hidden variables Cloudy and Rain are initialized randomly, let us say to true and false respectively. Thus, the initial state is [true, true, false, true]. Now the following steps are executed repeatedly:
(a) Cloudy is sampled, given the current values of its Markov blanket variables: in this case, we sample from P(Cloudy | Sprinkler = true, Rain = false). (Shortly, we will show how to calculate this distribution.) Suppose the result is Cloudy = false. Then the new current state is [false, true, false, true].
(b) Rain is sampled, given the current values of its Markov blanket variables: in this case, we sample from P(Rain | Cloudy = false, Sprinkler = true, WetGrass = true). Suppose this yields Rain = true. The new current state is [false, true, true, true].
Each state visited during this process is a sample that contributes to the estimate for the query variable Rain. If the process visits 20 states where Rain is true and 60 states where Rain is false, then the answer to the query is NORMALIZE((20, 60)) = (0.25, 0.75). The complete algorithm is shown in Figure 14.15.
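A compact Gibbs-sampling loop for this query looks roughly as follows. The sampling distribution for each variable given its Markov blanket is computed by the usual proportionality rule (the variable's prior given its parents, times the probability of each of its children given their parents); the CPT rows not quoted in the text are, once again, illustrative assumptions.

import random

# P(var = True | parents); rows not quoted in the text are assumed values.
P_cloudy = 0.5
P_sprinkler = {True: 0.1, False: 0.5}
P_rain = {True: 0.8, False: 0.2}
P_wetgrass = {(True, True): 0.99, (True, False): 0.9,
              (False, True): 0.9, (False, False): 0.0}

def bern(score_true, score_false):
    """Sample True/False from two unnormalized scores."""
    return random.random() < score_true / (score_true + score_false)

def gibbs_rain_estimate(steps=20_000, sprinkler=True, wetgrass=True):
    c, r = True, False                      # arbitrary initial hidden state
    rain_count = 0
    for _ in range(steps):
        # Sample Cloudy given its Markov blanket (children: Sprinkler, Rain).
        score = {}
        for cv in (True, False):
            prior = P_cloudy if cv else 1 - P_cloudy
            like_s = P_sprinkler[cv] if sprinkler else 1 - P_sprinkler[cv]
            like_r = P_rain[cv] if r else 1 - P_rain[cv]
            score[cv] = prior * like_s * like_r
        c = bern(score[True], score[False])
        # Sample Rain given its Markov blanket (parent: Cloudy; child: WetGrass).
        score = {}
        for rv in (True, False):
            prior = P_rain[c] if rv else 1 - P_rain[c]
            pw = P_wetgrass[(sprinkler, rv)]
            score[rv] = prior * (pw if wetgrass else 1 - pw)
        r = bern(score[True], score[False])
        rain_count += r
    return rain_count / steps

print(gibbs_rain_estimate())   # approximate P(Rain = true | sprinkler, wet grass)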
Why MCMC works
Let q(x → x′) be the probability that the process makes a transition from state x to state x′. This transition probability defines what is called a Markov chain on the state space. Let Xi be the variable to be sampled, and let X̄i denote all the hidden variables other than Xi; their values in the current state are xi and x̄i. If we sample a new value xi′ for Xi conditionally on all the other variables, including the evidence, we have
q(x → x′) = q((xi, x̄i) → (xi′, x̄i)) = P(xi′ | x̄i, e).
This transition probability is called the Gibbs sampler and is a particularly convenient form of MCMC.

INFERENCE IN TEMPORAL MODELS
The basic tasks include:
Filtering or monitoring: This is the task of computing the belief state, the posterior distribution over the current state, given all evidence to date. That is, we wish to compute P(Xt | e1:t), assuming that evidence arrives in a continuous stream beginning at t = 1. In the umbrella example, this would mean computing the probability of rain today, given all the observations of the umbrella carrier made so far. Filtering is what a rational agent needs to do in order to keep track of the current state so that rational decisions can be made.
Prediction: This is the task of computing the posterior distribution over a future state, given all evidence to date. In the umbrella example, this might mean computing the probability of rain three days from now, given all the observations of the umbrella carrier made so far. Prediction is useful for evaluating possible courses of action.
Smoothing, or hindsight: This is the task of computing the posterior distribution over a past state, given all evidence up to the present. In the umbrella example, it might mean computing the probability that it rained last Wednesday, given all the observations of the umbrella carrier made up to today. Hindsight provides a better estimate of the state than was available at the time, because it incorporates more evidence.
Most likely explanation: Given a sequence of observations, we might wish to find the sequence of states that is most likely to have generated those observations. For example, if the umbrella appears on each of the first three days and is absent on the fourth, then the most likely explanation is that it rained on the first three days and did not rain on the fourth. Algorithms for this task are useful in many applications, including speech recognition, where the aim is to find the most likely sequence of words given a series of sounds, and the reconstruction of bit strings transmitted over a noisy channel.

Filtering and Prediction
We will show that filtering can be done in a simple online fashion: given the result of filtering up to time t, one can easily compute the result for t + 1 from the new evidence et+1. That is,
P(Xt+1 | e1:t+1) = f(et+1, P(Xt | e1:t))
for some function f. This process is often called recursive estimation. We can view the calculation as actually being composed of two parts: first, the current state distribution is projected forward from t to t + 1; then it is updated using the new evidence et+1. This two-part process emerges quite simply:
P(Xt+1 | e1:t+1) = α P(et+1 | Xt+1) P(Xt+1 | e1:t) = α P(et+1 | Xt+1) Σxt P(Xt+1 | xt) P(xt | e1:t).
Here and throughout this chapter, α is a normalizing constant used to make probabilities sum to 1. The second term, P(Xt+1 | e1:t), represents a one-step prediction of the next state, and the first term updates this with the new evidence.

Smoothing
Smoothing is the process of computing the distribution over past states given evidence up to the present; that is, P(Xk | e1:t) for 1 <= k < t. (See Figure 15.3.) This is done most conveniently in two parts, the evidence up to k and the evidence from k + 1 to t:
P(Xk | e1:t) = α P(Xk | e1:k) P(ek+1:t | Xk) = α f1:k × bk+1:t.
Both the forward and backward recursions take a constant amount of time per step; hence, the time complexity of smoothing with respect to evidence e1:t is O(t).
This is the complexity for smoothing at a particular time step k. If we want to smooth the whole sequence, one obvious method is simply to run the whole smoothing process once for each time step to be smoothed. This results in a time complexity of O(t^2). A better approach uses a very simple application of dynamic programming to reduce the complexity to O(t). A clue appears in the preceding analysis of the umbrella example, where we were able to reuse the results of the forward filtering phase. The key to the linear-time algorithm is to record the results of forward filtering over the whole sequence. Then we run the backward recursion from t down to 1, computing the smoothed estimate at each step k from the computed backward message bk+1:t and the stored forward message f1:k. The algorithm, aptly called the forward-backward algorithm, is shown in Figure 15.4.

Drawbacks of the forward-backward algorithm:
1. Its space complexity can be too high for applications where the state space is large and the sequences are long.
2. The basic algorithm needs to be modified to work in an online setting where smoothed estimates must be computed for earlier time slices as new observations are continuously added to the end of the sequence. The most common requirement is for fixed-lag smoothing, which requires computing the smoothed estimate for a fixed lag d. That is, smoothing is done for the time slice d steps behind the current time t; as t increases, the smoothing has to keep up. Obviously, we can run the forward-backward algorithm over the d-step "window" as each new observation is added, but this seems inefficient.

Hidden Markov Models
An HMM is a temporal probabilistic model in which the state of the process is described by a single random variable. The possible values of the variable are the possible states of the world. Additional state variables can be added to a temporal model while staying within the HMM framework, but only by combining all the state variables into a single "megavariable" whose values are all possible tuples of values of the individual state variables.

Simplified Matrix Algorithms
With a single, discrete state variable Xt, we can give concrete form to the representations of the transition model, the sensor model, and the forward and backward messages. Let the state variable Xt have values denoted by integers 1, . . . , S, where S is the number of possible states. The transition model then becomes an S × S matrix T, where Tij = P(Xt = j | Xt−1 = i); that is, Tij is the probability of a transition from state i to state j. For example, the transition matrix for the umbrella world is the 2 × 2 matrix of probabilities P(Raint | Raint−1). We also put the sensor model in matrix form. In this case, because the value of the evidence variable Et is known to be, say, et, we need use only that part of the model specifying the probability that et appears. For each time step t, we construct a diagonal matrix Ot whose diagonal entries are given by the values P(et | Xt = i) and whose other entries are 0. For example, on day 1 in the umbrella world, U1 = true, so, from Figure 15.2, the diagonal entries of O1 are P(u1 | Rain1 = true) and P(u1 | Rain1 = false). Now, if we use column vectors to represent the forward and backward messages, the computations become simple matrix-vector operations. The forward equation (15.3) becomes
f1:t+1 = α Ot+1 T^T f1:t,
and the backward equation (15.7) becomes
bk+1:t = T Ok+1 bk+2:t.
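A small numeric sketch of these matrix operations is given below. The umbrella-world numbers used here (rain persisting with probability 0.7, the umbrella observed with probability 0.9 on rainy days and 0.2 on dry days) are commonly used illustrative values for this example; they are assumptions, not figures quoted in the passage above.

import numpy as np

# Assumed umbrella-world model (illustrative values).
T = np.array([[0.7, 0.3],      # T[i, j] = P(X_t = j | X_{t-1} = i); state 0 = rain
              [0.3, 0.7]])
O_true = np.diag([0.9, 0.2])   # diagonal sensor matrix when the umbrella is seen
O_false = np.diag([0.1, 0.8])  # and when it is not

def forward(f, O):
    """One filtering step: f1:t+1 = alpha * O_{t+1} * T^T * f1:t."""
    f = O @ T.T @ f
    return f / f.sum()

def backward(b, O):
    """One backward step: bk+1:t = T * O_{k+1} * bk+2:t."""
    return T @ O @ b

# Two days of evidence: umbrella observed on day 1 and day 2.
evidence = [O_true, O_true]
f = np.array([0.5, 0.5])                   # prior P(X_0)
forwards = []
for O in evidence:                         # forward (filtering) pass
    f = forward(f, O)
    forwards.append(f)

b = np.array([1.0, 1.0])                   # backward messages, run from t down to 1
smoothed = []
for O, f in zip(reversed(evidence), reversed(forwards)):
    s = f * b
    smoothed.append(s / s.sum())           # smoothed estimate at step k
    b = backward(b, O)
smoothed.reverse()
print(smoothed)   # P(Rain_k | u_1, u_2) for k = 1, 2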
Besides providing an elegant description of the filtering and smoothing algorithms for HMMs, the matrix formulation reveals opportunities for improved algorithms. The first is a simple variation on the forward-backward algorithm that allows smoothing to be carried out in constant space, independently of the length of the sequence.

Fuzzy Reasoning
Fuzzy Logic Systems (FLS) produce acceptable but definite output in response to incomplete, ambiguous, distorted, or inaccurate (fuzzy) input.

What is Fuzzy Logic?
Fuzzy Logic (FL) is a method of reasoning that resembles human reasoning. The FL approach imitates the way humans make decisions, which involves all intermediate possibilities between the digital values YES and NO. The conventional logic block that a computer can understand takes precise input and produces a definite output of TRUE or FALSE, which is equivalent to a human's YES or NO. The inventor of fuzzy logic, Lotfi Zadeh, observed that unlike computers, human decision making includes a range of possibilities between YES and NO, such as:
CERTAINLY YES
POSSIBLY YES
CANNOT SAY
POSSIBLY NO
CERTAINLY NO
Fuzzy logic works on these levels of possibility of the input to achieve a definite output.

Implementation
It can be implemented in systems of various sizes and capabilities, ranging from small micro-controllers to large, networked, workstation-based control systems. It can be implemented in hardware, software, or a combination of both.

Why Fuzzy Logic?
Fuzzy logic is useful for commercial and practical purposes.
It can control machines and consumer products.
It may not give accurate reasoning, but it gives acceptable reasoning.
Fuzzy logic helps to deal with uncertainty in engineering.

Fuzzy Logic Systems Architecture
It has four main parts:
Fuzzification Module: It transforms the system inputs, which are crisp numbers, into fuzzy sets. It splits the input signal into five levels:
LP: x is Large Positive
MP: x is Medium Positive
S: x is Small
MN: x is Medium Negative
LN: x is Large Negative
Knowledge Base: It stores the IF-THEN rules provided by experts.
Inference Engine: It simulates the human reasoning process by making fuzzy inferences on the inputs and the IF-THEN rules.
Defuzzification Module: It transforms the fuzzy set obtained by the inference engine into a crisp value.
The membership functions work on fuzzy sets of variables.

Membership Function
Membership functions allow you to quantify a linguistic term and represent a fuzzy set graphically. A membership function for a fuzzy set A on the universe of discourse X is defined as µA: X → [0, 1]. Here, each element of X is mapped to a value between 0 and 1, called the membership value or degree of membership. It quantifies the degree of membership of an element of X to the fuzzy set A. The x axis represents the universe of discourse; the y axis represents the degrees of membership in the [0, 1] interval. There can be multiple membership functions applicable to fuzzify a numerical value. Simple membership functions are used, as the use of more complex functions does not add precision to the output. The membership functions for LP, MP, S, MN, and LN are shown in the accompanying figure. The triangular membership function shape is the most common among the various membership function shapes, such as trapezoidal, singleton, and Gaussian. Here, the input to the 5-level fuzzifier varies from -10 volts to +10 volts; hence the corresponding output also changes.
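A triangular membership function of the kind just described can be written in a few lines. The breakpoints used below for the five levels over the -10 V to +10 V range are illustrative assumptions, not values given in the text.

def triangular(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Illustrative 5-level fuzzifier over the -10 V .. +10 V input range (assumed breakpoints).
levels = {
    "LN": (-15, -10, -5),   # Large Negative
    "MN": (-10, -5, 0),     # Medium Negative
    "S":  (-5, 0, 5),       # Small
    "MP": (0, 5, 10),       # Medium Positive
    "LP": (5, 10, 15),      # Large Positive
}

def fuzzify(x):
    """Map a crisp voltage to a degree of membership in each linguistic level."""
    return {name: round(triangular(x, *abc), 3) for name, abc in levels.items()}

print(fuzzify(3.0))   # e.g. partly "Small" (0.4), partly "Medium Positive" (0.6)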
Example of a Fuzzy Logic System
Let us consider an air conditioning system with a 5-level fuzzy logic system. This system adjusts the temperature of the air conditioner by comparing the room temperature and the target temperature value.

Algorithm
Define linguistic variables and terms.
Construct membership functions for them.
Construct the knowledge base of rules.
Convert crisp data into fuzzy data sets using the membership functions (fuzzification).
Evaluate the rules in the rule base (inference engine).
Combine the results from each rule (inference engine).
Convert the output data into non-fuzzy values (defuzzification).

Logic Development
Step 1: Define linguistic variables and terms
Linguistic variables are input and output variables in the form of simple words or sentences. For room temperature, cold, warm, hot, etc., are linguistic terms.
Temperature (t) = {very-cold, cold, warm, very-warm, hot}
Every member of this set is a linguistic term, and it can cover some portion of the overall temperature values.

Step 2: Construct membership functions for them
The membership functions of the temperature variable are as shown in the accompanying figure.

Step 3: Construct knowledge base rules
Create a matrix of room temperature values versus target temperature values that an air conditioning system is expected to provide.

Room Temp. \ Target   Very_Cold   Cold        Warm        Hot         Very_Hot
Very_Cold             No_Change   Heat        Heat        Heat        Heat
Cold                  Cool        No_Change   Heat        Heat        Heat
Warm                  Cool        Cool        No_Change   Heat        Heat
Hot                   Cool        Cool        Cool        No_Change   Heat
Very_Hot              Cool        Cool        Cool        Cool        No_Change

Build a set of rules into the knowledge base in the form of IF-THEN-ELSE structures.

Sr. No.   Condition                                               Action
1         IF temperature = (Cold OR Very_Cold) AND target = Warm  Heat
2         IF temperature = (Hot OR Very_Hot) AND target = Warm    Cool
3         IF (temperature = Warm) AND (target = Warm)             No_Change

Step 4: Obtain the fuzzy value
Fuzzy set operations perform the evaluation of the rules. The operations used for OR and AND are Max and Min respectively. Combine all the results of evaluation to form a final result. This result is a fuzzy value.

Step 5: Perform defuzzification
Defuzzification is then performed according to the membership function for the output variable.
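Steps 4 and 5 can be illustrated with a tiny end-to-end sketch. The membership breakpoints, the crisp output values assigned to Heat, No_Change, and Cool, and the use of a weighted-average defuzzifier are all illustrative assumptions made for this example; only the Max-for-OR and Min-for-AND convention comes from the text, and the Very_Cold/Very_Hot sets are omitted for brevity.

def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Assumed membership functions for room temperature (degrees Celsius).
temp_sets = {
    "Cold": lambda t: tri(t, 5, 12, 19),
    "Warm": lambda t: tri(t, 17, 22, 27),
    "Hot":  lambda t: tri(t, 25, 32, 39),
}

# Assumed crisp output levels for the action (negative = heat, positive = cool).
action_value = {"Heat": -1.0, "No_Change": 0.0, "Cool": 1.0}

def decide(room_temp):
    """Evaluate the three rules from Step 3 (AND = min, OR = max), then defuzzify."""
    cold, warm, hot = (temp_sets[s](room_temp) for s in ("Cold", "Warm", "Hot"))
    target_warm = 1.0                        # assume the target is exactly "Warm"
    strengths = {
        "Heat":      min(cold, target_warm),   # IF temp is Cold AND target is Warm
        "Cool":      min(hot, target_warm),    # IF temp is Hot AND target is Warm
        "No_Change": min(warm, target_warm),   # IF temp is Warm AND target is Warm
    }
    # Weighted-average defuzzification (a simple, assumed defuzzifier).
    num = sum(w * action_value[a] for a, w in strengths.items())
    den = sum(strengths.values()) or 1.0
    return num / den

print(decide(30))   # positive: mostly "Cool"
print(decide(22))   # near zero: "No_Change"
print(decide(12))   # negative: mostly "Heat"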
Application Areas of Fuzzy Logic
The key application areas of fuzzy logic are:
Automotive Systems: Automatic Gearboxes, Four-Wheel Steering, Vehicle environment control
Consumer Electronic Goods: Hi-Fi Systems, Photocopiers, Still and Video Cameras, Television
Domestic Goods: Microwave Ovens, Refrigerators, Toasters, Vacuum Cleaners, Washing Machines
Environment Control: Air Conditioners/Dryers/Heaters, Humidifiers

Advantages of FLSs
The mathematical concepts within fuzzy reasoning are very simple.
You can modify an FLS by just adding or deleting rules, owing to the flexibility of fuzzy logic.
Fuzzy logic systems can take imprecise, distorted, noisy input information.
FLSs are easy to construct and understand.
Fuzzy logic is a solution to complex problems in all fields of life, including medicine, as it resembles human reasoning and decision making.

Disadvantages of FLSs
There is no systematic approach to fuzzy system design.
They are understandable only when simple.
They are suitable only for problems that do not need high accuracy.

Dempster-Shafer Theory
• Dempster-Shafer theory is an approach to combining evidence.
• Dempster (1967) developed means for combining degrees of belief derived from independent items of evidence.
• His student, Glenn Shafer (1976), developed a method for obtaining degrees of belief for one question from subjective probabilities for a related question.
• People working in expert systems in the 1980s saw this approach as ideally suited to such systems.
• Each fact has a degree of support, between 0 and 1:
  – 0: no support for the fact
  – 1: full support for the fact
• It differs from the Bayesian approach in that:
  – Belief in a fact and its negation need not sum to 1.
  – Both values can be 0 (meaning there is no evidence for or against the fact).

Frame of discernment: Θ = {θ1, θ2, …, θn}
• Bayes was concerned with evidence that supports single conclusions (e.g., evidence for each outcome θi in Θ): P(θi | E).
• D-S theory is concerned with evidence which supports subsets of outcomes in Θ, e.g., θ1 ∨ θ2 ∨ θ3, or {θ1, θ2, θ3}.
• The "frame of discernment" (or "power set") of Θ is the set of all possible subsets of Θ:
  – E.g., if Θ = {θ1, θ2, θ3}, then the frame of discernment of Θ is
    (Ø, θ1, θ2, θ3, {θ1, θ2}, {θ1, θ3}, {θ2, θ3}, {θ1, θ2, θ3}).
• Ø, the empty set, has a probability of 0, since one of the outcomes has to be true.
• Each of the other elements in the power set has a probability between 0 and 1.
• The probability of {θ1, θ2, θ3} is 1.0, since one of the outcomes has to be true.

Mass function m(A) (where A is a member of the power set): the proportion of all evidence that supports this element of the power set.
"The mass m(A) of a given member of the power set, A, expresses the proportion of all relevant and available evidence that supports the claim that the actual state belongs to A but to no particular subset of A." (Wikipedia)
"The value of m(A) pertains only to the set A and makes no additional claims about any subsets of A, each of which has, by definition, its own mass."
Properties of the mass function m(A):
• Each m(A) is between 0 and 1.
• All m(A) sum to 1.
• m(Ø) is 0, since at least one outcome must be true.
Interpretation of m({A ∨ B}) = 0.3:
• It means there is evidence for {A ∨ B} that cannot be divided among more specific beliefs for A or B.

Belief in A
The belief in an element A of the power set is the sum of the masses of the elements which are subsets of A (including A itself). E.g., given A = {q1, q2, q3},
Bel(A) = m(q1) + m(q2) + m(q3) + m({q1, q2}) + m({q2, q3}) + m({q1, q3}) + m({q1, q2, q3}).

Belief in A: example
• Given the mass assignments as assigned by the detectives:

A      {B}   {J}   {S}   {B,J}  {B,S}  {J,S}  {B,J,S}
m(A)   0.1   0.2   0.1   0.1    0.1    0.3    0.1

bel({B}) = m({B}) = 0.1
bel({B,J}) = m({B}) + m({J}) + m({B,J}) = 0.1 + 0.2 + 0.1 = 0.4

Result:

A       {B}   {J}   {S}   {B,J}  {B,S}  {J,S}  {B,J,S}
m(A)    0.1   0.2   0.1   0.1    0.1    0.3    0.1
bel(A)  0.1   0.2   0.1   0.4    0.3    0.6    1.0
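The belief values in the result table can be reproduced with a few lines of code; the frozenset-based representation and the helper name belief are simply choices made for this sketch.

# Mass assignments from the detectives' example above (suspects B, J, S).
masses = {
    frozenset("B"): 0.1, frozenset("J"): 0.2, frozenset("S"): 0.1,
    frozenset("BJ"): 0.1, frozenset("BS"): 0.1, frozenset("JS"): 0.3,
    frozenset("BJS"): 0.1,
}

def belief(A):
    """bel(A) = sum of the masses of all subsets of A (including A itself)."""
    A = frozenset(A)
    return sum(m for s, m in masses.items() if s <= A)

for A in ("B", "J", "S", "BJ", "BS", "JS", "BJS"):
    print(set(A), round(belief(A), 3))
# Reproduces the bel(A) row: 0.1  0.2  0.1  0.4  0.3  0.6  1.0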