ARTIFICIAL INTELLIGENCE BCA - 601
This SIM has been prepared exclusively under the guidance of Punjab Technical University (PTU) and reviewed by experts and approved by the concerned statutory Board of Studies (BOS). It conforms to the syllabi and contents as approved by the BOS of PTU.
Reviewer: Dr. N. Ch. S.N. Iyengar, Senior Professor, School of Computing Sciences, VIT University, Vellore
Author: Prof. Angajala Srinivasa Rao
Copyright © Author, 2009. Reprint 2010.
All rights reserved. No part of this publication which is material protected by this copyright notice may be reproduced or transmitted or utilized or stored in any form or by any means now known or hereinafter invented, electronic, digital or mechanical, including photocopying, scanning, recording or by any information storage or retrieval system, without prior written permission from the Publisher.
Information contained in this book has been published by VIKAS® Publishing House Pvt. Ltd. and has been obtained by its Authors from sources believed to be reliable and correct to the best of their knowledge. However, the Publisher and its Authors shall in no event be liable for any errors, omissions or damages arising out of use of this information and specifically disclaim any implied warranties of merchantability or fitness for any particular use.
Vikas® is the registered trademark of Vikas® Publishing House Pvt. Ltd.
VIKAS® PUBLISHING HOUSE PVT LTD, E-28, Sector-8, Noida - 201301 (UP). Phone: 0120-4078900 • Fax: 0120-4078999. Regd. Office: 576, Masjid Road, Jangpura, New Delhi 110 014 • Website: www.vikaspublishing.com • Email: [email protected]
CAREER OPPORTUNITIES
Computer software engineers are projected to be one of the fastest growing occupations in the next decade. To improve the intelligence of computer systems, ‘computer applications software engineers’ analyze users’ needs and design, construct, and maintain specialized utility programs.
Software engineers coordinate the construction and maintenance of a company's computer systems and plan their future growth. Software engineers work for companies that configure, implement, and install complete computer systems. Increasing emphasis on the intelligence of computers suggests that software engineers with advanced degrees that include artificial intelligence will be sought after by software developers, government agencies, and consulting firms specializing in information assurance and security.
PTU DEP SYLLABI-BOOK MAPPING TABLE
Artificial Intelligence Syllabi
SECTION-I
Introduction to AI: Definitions, AI problems, the underlying assumption, and AI techniques, Level of Model, Criteria for Success. Problems, Problem Space and Search: Defining the problem as a state space search, Production System, Problem Characteristics, Production System Characteristics, issues in design of search programs.
SECTION-II
Knowledge Representation Issues: Representation and mapping, approaches to knowledge representation, issues in knowledge representation, the frame problem. Knowledge representation using predicate logic: Representing simple facts in logic, representing instance and isa relationships, resolution.
SECTION-III
Weak slot-and-filler structures: Semantic nets, frames as sets and instances. Strong slot-and-filler structures: Conceptual dependency, scripts, CYC. Natural language processing: Syntactic processing, semantic analysis, discourse and pragmatic processing.
Mapping in Book
Unit 1: What is Artificial Intelligence? (Pages 3-30)
Unit 2: Problems, Problem Spaces and Search (Pages 31-62)
Unit 3: Knowledge Representation Issues (Pages 63-83)
Unit 4: Using Predicate Logic (Pages 85-112)
Unit 5: Weak Slot and Filler Structures (Pages 113-132)
Unit 6: Strong Slot and Filler Structures (Pages 133-153)
Unit 7: Natural Language Processing (Pages 155-190)
CONTENTS
INTRODUCTION
UNIT 1 WHAT IS ARTIFICIAL INTELLIGENCE?
1 3-30
1.0 Introduction
1.1 Unit Objectives
1.2 Basics of Artificial Intelligence (AI)
 1.2.1 AI Influence on Communication Systems
 1.2.2 Timeline of Telecommunication-related Technology
 1.2.3 Timeline of AI-related Technology
 1.2.4 Timeline of AI Events
 1.2.5 Modern Digital Communications
 1.2.6 Building Intelligent Communication Systems
1.3 Beginning of AI
 1.3.1 Definitions
 1.3.2 AI Vocabulary
1.4 Problems of AI
1.5 Branches of AI
1.6 Applications of AI
 1.6.1 Perception
 1.6.2 Robotics
 1.6.3 Natural Language Processing
 1.6.4 Planning
 1.6.5 Expert Systems
 1.6.6 Machine Learning
 1.6.7 Theorem Proving
 1.6.8 Symbolic Mathematics
 1.6.9 Game Playing
1.7 Underlying Assumption of AI
 1.7.1 Physical Symbol System Hypothesis
1.8 AI Technique
 1.8.1 Tic-Tac-Toe
 1.8.2 Question Answering
1.9 Level of the AI Model
1.10 Criteria for Success
1.11 Conclusion
1.12 Summary
1.13 Key Terms
1.14 Answers to ‘Check Your Progress’
1.15 Questions and Exercises
1.16 Further Reading
UNIT 2 PROBLEMS, PROBLEM SPACES AND SEARCH 31-62
2.0 Introduction
2.1 Unit Objectives
2.2 Problem of Building a System
 2.2.1 AI - General Problem Solving
2.3 Defining Problem in a State Space Search
 2.3.1 State Space Search
 2.3.2 The Water Jug Problem
2.4 Production Systems
 2.4.1 Control Strategies
 2.4.2 Search Algorithms
 2.4.3 Heuristics
2.5 Problem Characteristics
 2.5.1 Problem Decomposition
 2.5.2 Can Solution Steps be Ignored?
 2.5.3 Is the Problem Universe Predictable?
 2.5.4 Is Good Solution Absolute or Relative?
2.6 Characteristics of Production Systems
2.7 Design of Search Programs and Solutions
 2.7.1 Design of Search Programs
 2.7.2 Additional Problems
2.8 Summary
2.9 Key Terms
2.10 Answers to ‘Check Your Progress’
2.11 Questions and Exercises
2.12 Further Reading
UNIT 3 KNOWLEDGE REPRESENTATION ISSUES 63-83
3.0 Introduction
3.1 Unit Objectives
3.2 Knowledge Representation
 3.2.1 Approaches to AI Goals
 3.2.2 Fundamental System Issues
 3.2.3 Knowledge Progression
 3.2.4 Knowledge Model
 3.2.5 Knowledge Typology Map
3.3 Representation and Mappings
 3.3.1 Framework of Knowledge Representation
 3.3.2 Representation of Facts
 3.3.3 Using Knowledge
3.4 Approaches to Knowledge Representation
 3.4.1 Properties for Knowledge Representation Systems
 3.4.2 Simple Relational Knowledge
3.5 Issues in Knowledge Representation
 3.5.1 Important Attributes and Relationships
 3.5.2 Granularity
 3.5.3 Representing Set of Objects
 3.5.4 Finding Right Structure
3.6 The Frame Problem
 3.6.1 The Frame Problem in Logic
3.7 Summary
3.8 Key Terms
3.9 Questions and Exercises
3.10 Further Reading
UNIT 4 USING PREDICATE LOGIC 85-112
4.0 Introduction
4.1 Unit Objectives
4.2 The Basics in Logic
4.3 Representing Simple Facts in Logic
 4.3.1 Logic
4.4 Representing Instance and Isa Relationships
4.5 Computable Functions and Predicates
4.6 Resolution
 4.6.1 Algorithm: Convert to Clausal Form
 4.6.2 The Basis of Resolution
 4.6.3 Resolution in Propositional Logic
 4.6.4 The Unification Algorithm
 4.6.5 Resolution in Predicate Logic
4.7 Natural Deduction
4.8 Summary
4.9 Key Terms
4.10 Answers to ‘Check Your Progress’
4.11 Questions and Exercises
4.12 Further Reading
UNIT 5 WEAK SLOT AND FILLER STRUCTURES 113-132
5.0 Introduction
5.1 Unit Objectives
5.2 Semantic Nets
 5.2.1 Representation in a Semantic Net
 5.2.2 Inference in a Semantic Net
 5.2.3 Extending Semantic Nets
5.3 Frames
 5.3.1 Frame Knowledge Representation
 5.3.2 Distinction between Sets and Instances
5.4 Summary
5.5 Key Terms
5.6 Answers to ‘Check Your Progress’
5.7 Questions and Exercises
5.8 Further Reading
UNIT 6 STRONG SLOT AND FILLER STRUCTURES 133-153
6.0 Introduction
6.1 Unit Objectives
6.2 Conceptual Dependency
6.3 Scripts
6.4 CYC
 6.4.1 Motivation
6.5 CYCL
6.6 Global Ontology
6.7 Summary
6.8 Key Terms
6.9 Answers to ‘Check Your Progress’
6.10 Questions and Exercises
6.11 Further Reading
UNIT 7 NATURAL LANGUAGE PROCESSING 155-190
7.0 Introduction
7.1 Unit Objectives
7.2 The Problem of Natural Language
 7.2.1 Steps in the Process
 7.2.2 Why is Language Processing Difficult?
7.3 Syntactic Processing
7.4 Augmented Transition Networks
7.5 Semantic Analysis
7.6 Discourse and Pragmatic Processing
 7.6.1 Conversational Postulates
7.7 General Comments
7.8 Summary
7.9 Key Terms
7.10 Answers to ‘Check Your Progress’
7.11 Questions and Exercises
7.12 Further Reading
INTRODUCTION
Artificial Intelligence (AI) is a technique that helps create software programs to make computers perform operations which require human intelligence. Currently, AI is used in various application areas such as intelligent game playing, natural language processing, vision processing and speech processing. In AI, a large amount of knowledge is required to solve problems such as natural language understanding and generation. To represent knowledge in AI, predicate logic is used, which helps represent simple facts. Artificial intelligence is also referred to as computational intelligence.
Introduction NOTES
This book, Artificial Intelligence, is divided into seven units. The first unit covers the basics of AI, whereas the second unit covers general problem solving, heuristics, control strategies, problem decomposition and design of search programs. The third unit discusses knowledge representation issues, including knowledge progression, the knowledge model and the typology map. The fourth unit focusses on predicate logic, covering the basis of resolution, the unification algorithm and resolution in propositional logic as well as predicate logic.
Units five and six are devoted to weak and strong slot and filler structures. While the former discusses semantic nets and frames, the latter discusses global ontology, conceptual dependency, scripts, CYC, CYCL and motivation. The last unit is devoted to natural language processing, syntactic processing, semantic analysis, discourse and pragmatic processing and conversational postulates. The book follows the self-instructional mode wherein each Unit begins with an Introduction to the topic. The Unit Objectives are then outlined before going on to the presentation of the detailed content in a simple and structured format. Check Your Progress questions are provided at regular intervals to test the student’s understanding of the subject. A Summary, a list of Key Terms and a set of Questions and Exercises are provided at the end of each unit for recapitulation.
Self-Instructional Material 1
UNIT 1 WHAT IS ARTIFICIAL INTELLIGENCE?
Structure
1.0 Introduction
1.1 Unit Objectives
1.2 Basics of Artificial Intelligence (AI)
 1.2.1 AI Influence on Communication Systems
 1.2.2 Timeline of Telecommunication-related Technology
 1.2.3 Timeline of AI-related Technology
 1.2.4 Timeline of AI Events
 1.2.5 Modern Digital Communications
 1.2.6 Building Intelligent Communication Systems
1.3 Beginning of AI
 1.3.1 Definitions
 1.3.2 AI Vocabulary
1.4 Problems of AI
1.5 Branches of AI
1.6 Applications of AI
 1.6.1 Perception
 1.6.2 Robotics
 1.6.3 Natural Language Processing
 1.6.4 Planning
 1.6.5 Expert Systems
 1.6.6 Machine Learning
 1.6.7 Theorem Proving
 1.6.8 Symbolic Mathematics
 1.6.9 Game Playing
1.7 Underlying Assumption of AI
 1.7.1 Physical Symbol System Hypothesis
1.8 AI Technique
 1.8.1 Tic-Tac-Toe
 1.8.2 Question Answering
1.9 Level of the AI Model
1.10 Criteria for Success
1.11 Conclusion
1.12 Summary
1.13 Key Terms
1.14 Answers to ‘Check Your Progress’
1.15 Questions and Exercises
1.16 Further Reading
1.0 INTRODUCTION
In this unit, you will learn about the basics of artificial intelligence (AI), a key technology today, used in many applications. AI is also known as computational intelligence. With the development of the digital electronic computer in 1941, the technology to create machine intelligence was finally available. The term ‘Artificial Intelligence’ was first used in 1956 at the Dartmouth conference, and since then AI has expanded with theories and principles developed by dedicated researchers. Though advances in the field of AI have been slower than first estimated, progress continues to be made. During the last four decades, a variety of AI programs have been developed in sync with other technological advancements. In 1941, the development of the digital computer revolutionized every aspect of the storage and processing of information in the US and Germany.
The early computers required large, separate air-conditioned rooms and were a programmer's nightmare: getting a program running involved separately configuring thousands of wires. The stored-program computer, an innovation of 1949, made the job of entering a program easier. Further advancements in computer theory led to developments in computer science and eventually to AI. With the invention of an electronic means of processing data came a medium that made AI possible.
1.1 UNIT OBJECTIVES
After going through this unit, you will be able to:
• Explain the basics of AI
• Discuss the advantages and constraints of AI
• Explain the beginning of AI
• Describe the key applications of AI
• Understand the underlying assumption of AI
• Discuss AI techniques
• Explain the level of the model
• Describe the criteria for success
1.2 BASICS OF ARTIFICIAL INTELLIGENCE
Artificial intelligence is ‘the study and design of an intelligent system’: one that recognizes its surroundings and initiates actions that maximize its chance of success. The term AI is used to describe the ‘intelligence’ that the system demonstrates. Tools and insights from many fields, including linguistics, psychology, computer science, cognitive science, neuroscience, probability, optimization and logic, are used. Since the human brain uses many techniques to plan and cross-check facts, system integration is now seen as essential for AI.
1.2.1 AI Influence on Communication Systems
• The telephone is one of the most marvellous inventions of the communications era: physical distance is conquered instantly.
 ■ Any telephone in the world can be reached through a vast communication network that spans oceans and continents.
 ■ The form of communication is natural, namely human speech.
• Communication is no longer just two persons talking; it is much more.
 ■ Humans communicate with a knowledge source to gather facts.
 ■ Humans communicate with intelligent systems to solve problems requiring higher mental processes.
 ■ Humans communicate with experts to seek specialized opinion.
 ■ Humans communicate with logic machines to seek guidance.
 ■ Humans communicate with reasoning systems to get new knowledge.
 ■ Humans communicate for many more things.
• Advances in communication technologies have led to increased worldwide connectivity, while new technologies like the cell phone have increased mobility.
1.2.2 Timeline of Telecommunication-related Technology
• Roots of Communication: The development of communication systems began two centuries ago with wire-based electrical systems called the telegraph and the telephone. Before that, it was human messengers on foot or horseback; Egypt and China built messenger relay stations.
• The electric telegraph was invented by Samuel Morse in 1831.
• Morse code, a method for transmitting telegraphic information, was invented by Samuel Morse in 1835.
• The typewriter was invented by Christopher Latham Sholes in 1867, the first practical typewriting machine for business offices.
• The telephone was invented by Alexander Graham Bell in 1875, an instrument through which speech sounds, though not yet voice, were first transmitted electrically.
• A telephone exchange with a rotary dialling system became operational in New Haven, Connecticut in 1878.
• The Kodak camera was invented in 1888 by George Eastman, who also introduced rolled photographic film.
• The telegraphone was invented by Valdemar Poulsen in 1899; it was a tape recorder whose recording medium was a magnetized steel tape.
• Wireless telegraphy was demonstrated by Guglielmo Marconi in 1902; it transmitted MF-band radio signals across the Atlantic Ocean, from Cornwall to Newfoundland.
• The Audion vacuum tube was invented by Lee De Forest in 1906, a two-electrode detector device, followed in 1908 by a three-electrode amplifier device.
• The cross-continental telephone call, made by Graham Bell in 1914, marked the evolution of the telegraph into the telephone; the Greek word tele means ‘from afar’ and phone means ‘voice’.
• Radios with tuners came in 1916, a technological revolution of their time; pilots directed the gunfire of artillery batteries on the ground through a wireless operator attached to each battery.
• The iconoscope was invented by Vladimir Zworykin in 1923, a camera tube needed for TV transmission.
• A television system was invented by John Logie Baird in 1925; it transmitted TV signals in 1927 between London and Glasgow over a telephone line.
• Radio networks came in 1927, systems which distributed programming (content) to multiple stations simultaneously, for the purpose of extending total coverage beyond the limits of a single broadcast station. Radio broadcasting is traditionally through the air as radio waves; it can also be via cable, FM, local wire networks, satellite and the Internet. Stations are linked in radio networks, either in syndication or simulcast or both.
1.2.3 Timeline of AI-related Technology
• Roots of AI: The development of artificial intelligence actually began centuries ago, long before the computer.
• Abacus (5000 years ago): a machine with memory.
• Pascaline (1642): a calculating machine that mechanized arithmetic.
• Difference Engine (1849): a mechanical calculating machine programmable to tabulate polynomial functions.
• Boolean algebra (1854): ‘An Investigation of the Laws of Thought’, a symbolic language of calculus.
• Turing machine (1936): an abstract symbol-manipulating device that can be adapted to simulate any logic; the first computer invented (on paper only).
• Von Neumann architecture (1945): a computer design model with a processing unit and a shared memory structure to hold both instructions and data.
• ENIAC (1946): Electronic Numerical Integrator and Computer, the first electronic ‘general-purpose’ digital computer, by Eckert and Mauchly.
1.2.4 Timeline of AI Events
The concept of AI as a true scientific pursuit is very new; for centuries it remained a plot for popular science fiction stories. Most researchers date the beginning of AI to Alan Turing.
• Turing test, proposed by Alan Turing in 1950 in the paper ‘Computing Machinery and Intelligence’, is used to measure machine intelligence.
• Intelligent behaviour: Norbert Wiener in 1950 observed the link between human intelligence and machines and theorized about intelligent behaviour.
• Logic Theorist, a program by Allen Newell and Herbert Simon in 1955, whose authors claimed that machines can contain minds just as human bodies do; it proved 38 of the first 52 theorems in Principia Mathematica.
• Birth of AI: the Dartmouth Summer Research Conference on Artificial Intelligence in 1956, organized by John McCarthy, who is regarded as the father of AI.
• Seven years later, in 1963, AI began to pick up momentum. The field was still undefined, and the ideas formed at the conference were re-examined.
• In 1957, the General Problem Solver (GPS) was tested. GPS was an extension of Wiener’s feedback principle, capable of solving common-sense problems to a great extent.
• In 1958, the LISP language was invented by McCarthy and soon adopted as the language of choice among most AI developers.
• In 1963, the DoD’s Advanced Research Projects program started at MIT, researching Machine-Aided Cognition (artificial intelligence), drawing computer scientists from around the world.
• In 1968, the micro-world program SHRDLU, at MIT, controlled a robot arm operating above a flat surface scattered with play blocks.
• In the mid-1970s, expert systems for medical diagnosis (Mycin), chemical data analysis (Dendral) and mineral exploration (Prospector) were developed.
• During the 1970s, computer vision (CV), the technology of machines that ‘see’, emerged.
David Marr was the first to model the functions of the visual system.
• In 1972, Alain Colmerauer created Prolog, a logic programming language. Logic programming is the use of logic as both a declarative and a procedural representation language.
1.2.5 Modern Digital Communications
In 1947, Shannon created a mathematical theory which formed the basis for modern digital communications. Since then the developments were:
• 1960s: three geosynchronous communication satellites, launched by NASA.
• 1961: packet switching theory was published by Leonard Kleinrock at MIT.
• 1965: a wide-area computer network, over a low-speed dial-up telephone line, was created by Lawrence Roberts and Thomas Merrill.
• 1966: optical fibre was used for the transmission of telephone signals.
• Late 1966: Roberts went to DARPA to develop the computer network concept and put forward his plan for the ‘Advanced Research Projects Agency Network – ARPANET’, which he presented at a conference in 1967. There Paul Baran and others at RAND presented a paper on packet switching networks for secure voice communication in military use.
• 1968: Roberts and DARPA revised the overall structure and specifications for the ARPANET and released an RFQ for development of key components: the packet switches called Interface Message Processors (IMPs).
• Architectural design by Bob Kahn.
• Network topology and economics design and optimization by Roberts with Howard Frank and his team at Network Analysis Corporation.
• Data networking technology and network measurement system preparation by Kleinrock’s team at the University of California, Los Angeles (UCLA).
The year 1969 saw the beginning of the Internet era: the development of ARPANET, an unprecedented integration of the capabilities of telegraph, telephone, radio, television, satellite, optical fibre and computer. In 1969, a day after Labour Day, UCLA became the first node to join the ARPANET.
That meant the UCLA team connected the first switch (IMP) to the first host computer (a minicomputer from Honeywell). A month later the second node was added at SRI (Stanford Research Institute) and the first host-to-host message on the Internet was launched from UCLA. It worked in a clever way, as described below:
• Programmers logging on to the SRI host from the UCLA host typed in ‘log’ and the system at SRI added ‘in’, thus creating the word ‘login’.
• Programmers could communicate by voice as the message was transmitted, using telephone headsets at both ends.
• Programmers at the UCLA end typed in the ‘l’ and asked SRI if they received it; back came the voice reply ‘got the l’.
• By 1969, they had connected four nodes (UCLA, SRI, UC Santa Barbara and the University of Utah). UCLA served for many years as the ARPANET Measurement Centre.
• In the mid-1970s, UCLA controlled a geosynchronous satellite by sending messages through ARPANET from California to an East Coast satellite dish. By 1970, they had connected ten nodes.
• In 1972, the International Network Working Group (INWG) was formed to further explore packet switching concepts and internetworking, as there would be multiple independent networks of arbitrary design.
• In 1973, Kahn developed a protocol that could meet the needs of an open-architecture network environment. This protocol is popularly known as TCP/IP (Transmission Control Protocol/Internet Protocol).
Note: TCP/IP is named after two of the most important protocols in it.
• IP is responsible for moving packets of data from node to node.
• TCP is responsible for verifying delivery of data from client to server.
• Sockets are subroutines that provide access to TCP/IP on most systems.
• In 1976, the X.25 protocol was developed for public packet networking.
• In 1977, the first internetwork was demonstrated by Cerf and Kahn.
They connected three networks with TCP: the radio network, the satellite network (SATNET) and the Advanced Research Projects Agency Network (ARPANET).
• In the 1980s, the ARPANET evolved into the Internet. The Internet is officially defined as networks using TCP/IP.
• On 1 January 1983, the ARPANET and every other network attached to the ARPANET officially adopted the TCP/IP networking protocol. From then on, all networks that use TCP/IP have been collectively known as the Internet. The standardization of TCP/IP allowed the number of Internet sites and users to grow exponentially.
• Today, the Internet has millions of computers and hundreds of thousands of networks. The network traffic is dominated by its ability to promote ‘people-to-people’ interaction.
1.2.6 Building Intelligent Communication Systems
Modern telecommunications are based on the coordination and utilization of individual services, such as telephony, cellular, cable, microwave terrestrial and satellite, and their integration into a seamless global network. Successful design, planning, coordination, management and financing of a global communications network require a broad understanding of these services, their costs and their advantages and limitations. The next generation of wireless and wired communication systems and networks has requirements for many AI and AI-related concepts, algorithms, techniques and technologies. Research groups are currently exploring and using Semantic Web languages in monitoring, learning, adaptation and reasoning.
To present an idea of intelligent communication systems, examples of an ‘Intelligent Mobile Platform’, ‘Voice Recognition across Mobile Phone’ and the ‘Project Knowledge Based Networking’ by DARPA are briefly illustrated below.
Ex.1 Intelligent Mobile Platform: Magitti
The mobile phone is no longer a simple two-way communication device. Intelligent mobiles infer our behaviour and suggest appropriate lists of restaurants, stores and events.
The difference between Magitti and other GPS-enabled mobile applications is artificial intelligence: Magitti is an intelligent personal assistant. Magitti’s specification has not been released, but mobile phones are becoming increasingly powerful, with sensors, entertainment tools, accelerometers, GPS, etc. Perhaps AI will make even more sense in the not-so-distant future.
Ex.2 Voice Recognition across Mobile Phone: Lingo
Mobile phones can do lots of things, but a majority of people never use them for more than calls and short text messages. A voice recognition-correction interface across mobile phone applications is coming to market to provide speech recognition. Now you can talk to your phone and the phone understands you.
Ex.3 Project Knowledge Based Networking by DARPA
Military research aims to develop self-configuring, secure wireless nets. Academic concepts of Artificial Intelligence and the Semantic Web, combined with technologies such as the Mobile Ad-hoc Network (MANET), Cognitive Radio and Peer-to-Peer networking, provide the elements of such a network. This project by DARPA is intended for soldiers in the field.
Ex.4 Semantic Web
Web 1.0 is the first generation of the World Wide Web and contains static HTML pages. Web 2.0 is the second generation, focused on the ability for people to collaborate and share information online; it is dynamic, serves applications to users and offers open communications. The Web requires a human operator using computer systems to perform the tasks required to find, search and aggregate its information. It is impossible for a computer to do these tasks without human guidance because Web pages are specifically designed for human readers.
Ex.5 Mobile Ad-hoc Network (MANET)
The traditional wireless mobile networks have been ‘infrastructure based’, in which mobile devices communicate with access points like base stations connected to the fixed network.
Typical examples of this kind of wireless network are GSM, WLL, WLAN, etc. Approaches to the next generation of wireless mobile networks are ‘infrastructure less’; MANET is one such network. A MANET is a collection of wireless nodes that can dynamically form a network to exchange information without using any existing fixed network infrastructure. This is very important because in many contexts information exchange between mobile units cannot rely on any fixed network infrastructure, but on rapid configuration of wireless connections on-the-fly. The MANET is a self-configuring network of mobile routers and associated hosts connected by wireless links, the union of which forms an arbitrary topology that may change rapidly and unpredictably.
MANET concept
A mobile ad-hoc network is a collection of wireless nodes that can dynamically be set up anywhere and anytime without using any existing network infrastructure. It is an autonomous system in which mobile hosts connected by wireless links are free to move randomly and often act as routers at the same time. The MANET traffic includes:
• Peer-to-Peer: between two nodes
• Remote-to-Remote: between two nodes beyond a single hop, and
• Dynamic Traffic: nodes are moving around.
Ex.6 Cognitive Radio
The concept of ‘cognitive’ radio originated with Defense Advanced Research Projects Agency (DARPA) scientist Joseph Mitola in 1999, and is the ‘next step up’ for the software defined radios (SDR) that are emerging today, primarily in military applications. Most commercial radios and many two-way communication devices are hardware-based, with predetermined, analog operating parameters. A Software Defined Radio (SDR) is a radio communication system where components that have typically been implemented in hardware (e.g. mixers, filters, amplifiers, modulators/demodulators, detectors, etc.) are implemented using software. The SDR concept is not new.
A basic SDR consists of a computer (PC), a sound card, some form of RF front end and embedded signal-processing algorithms. An SDR can receive and transmit different radio protocols just by running different software.
1.3 BEGINNING OF AI
Although the computer provided the technology necessary for AI, it was not until the early 1950s that the link between human intelligence and machines was really observed. Norbert Wiener was one of the first Americans to make observations on the principle of feedback theory. The most familiar example of feedback theory is the thermostat: it controls the temperature of an environment by gathering the actual temperature of the house, comparing it to the desired temperature, and responding by turning the heat up or down. What was so important about his research into feedback loops was that Wiener theorized that all intelligent behaviour was the result of feedback mechanisms, mechanisms that could possibly be simulated by machines. This discovery influenced much of the early development of AI.
In late 1955, Newell and Simon developed the Logic Theorist, considered by many to be the first AI program. The program, representing each problem as a tree model, would attempt to solve it by selecting the branch that would most likely result in the correct conclusion. The impact that the Logic Theorist made on both the public and the field of AI has made it a crucial stepping stone in developing the AI field.
In 1956 John McCarthy, the father of AI, organized a conference to draw on the talent and expertise of others interested in machine intelligence for a month of brainstorming. He invited them to New Hampshire for ‘The Dartmouth Summer Research Project on Artificial Intelligence’. From that point, because of McCarthy, the field would be known as Artificial Intelligence.
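The tree-model strategy attributed to the Logic Theorist above (grow a tree of possibilities and always pursue the branch judged most likely to lead to the correct conclusion) is essentially what is now called best-first search. The sketch below is for illustration only: the integer state space, the successor function and the distance heuristic are invented here, not taken from the Logic Theorist itself.

```python
import heapq

def best_first_search(start, goal, successors, score):
    """Repeatedly expand the most promising node, as ranked by `score`
    (lower score = more promising). Returns the path found, or None."""
    frontier = [(score(start), start, [start])]   # (priority, node, path)
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt in successors(node):
            if nxt not in visited:
                heapq.heappush(frontier, (score(nxt), nxt, path + [nxt]))
    return None

# Toy example: search from 1 towards 9 using +1 and *2 moves,
# scoring each node by its distance from the goal.
if __name__ == "__main__":
    goal = 9
    path = best_first_search(
        1, goal,
        successors=lambda n: [n + 1, n * 2] if n <= goal else [],
        score=lambda n: abs(goal - n),
    )
    print(path)  # → [1, 2, 4, 8, 9]
```

A priority queue (here Python's `heapq`) keeps the 'most likely' branch at the front, so the search always resumes from the node the heuristic currently rates best.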
Although not a huge success, the Dartmouth conference did bring together the founders of AI and served to lay the basis for the future of AI research.

The 1956 conference at Dartmouth, New Hampshire, discussed the possibilities of simulating human intelligence and thinking with computers. Today Artificial Intelligence is a well-established and rapidly growing branch of computer science. It is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence: understanding language, learning, reasoning and solving problems. But AI does not have to confine itself to methods that are biologically observable. We will discuss alternative definitions of Artificial Intelligence below.

1.3.1 Definitions

Artificial Intelligence is the study of how to make computers do things which at the moment people do better. This definition is ephemeral, as it refers to the current state of computer science, and it excludes major problem areas that cannot be solved well either by computers or by people at the moment.

Alternative - I
Artificial Intelligence is the branch of computer science that is concerned with the automation of intelligent behaviour. AI is based upon the principles of computer science, namely the data structures used in knowledge representation, the algorithms needed to apply that knowledge, and the languages and programming techniques used in their implementation.

Alternative - II
Artificial Intelligence is a field of study that encompasses computational techniques for performing tasks that seem to require intelligence when performed by humans.

Alternative - III
Artificial Intelligence is the field of study that seeks to explain and emulate intelligent behaviour in terms of computational processes.
Alternative - IV
Artificial Intelligence is about generating representations and procedures that automatically or autonomously solve problems heretofore solved by humans.

Alternative - V
Artificial Intelligence is that part of computer science concerned with designing intelligent computer systems, i.e. computer systems that exhibit the characteristics we associate with intelligence in human behaviour: understanding language, learning, reasoning and solving problems.

There are various definitions of Artificial Intelligence according to different authors. Most of these definitions take a very technical direction and avoid the philosophical problems connected with the idea that AI's purpose is to construct an artificial human. These definitions fall into four categories:
1. Systems that think like humans (focus on reasoning and a human framework)
2. Systems that think rationally (focus on reasoning and a general concept of intelligence)
3. Systems that act like humans (focus on behaviour and a human framework)
4. Systems that act rationally (focus on behaviour and a general concept of intelligence)

What is rationality? Simply speaking, 'doing the right thing'.

Definitions:
1. 'The art of creating machines that perform functions that require intelligence when performed by humans' (Kurzweil). This involves cognitive modelling: we have to determine how humans think in a literal sense.
2. GPS, the General Problem Solver (Newell and Simon), deals with 'right thinking' and dives into the field of logic. It uses logic to represent the world and the relationships between objects in it, and to come to conclusions about it. Problems: it is hard to encode informal knowledge into a formal logic system, and theorem provers have limitations (e.g. when there is no solution expressible in a given logical notation).
3.
Turing defined intelligent behaviour as the ability to achieve human-level performance in all cognitive tasks, sufficient to fool a human interrogator (the Turing Test). Physical contact with the machine is avoided, because physical appearance is not relevant to exhibiting intelligence. However, the 'Total Turing Test' includes appearance as well, by encompassing visual input and robotics.
4. The rational agent: achieving one's goals given one's beliefs. Instead of focusing on humans, this approach is more general, focusing on agents (which perceive and act); it is also more general than the strictly logical approach (i.e. thinking rationally).

In practical terms, the intelligence of a computer system provides:
• Ability to automatically perform tasks that currently require human operators
• More autonomy in computer systems; fewer requirements for human interference or monitoring
• Flexibility in dealing with inconsistency in the environment
• Ability to understand what the user wants from limited instructions
• Increase in performance by learning from experience

1.3.2 AI Vocabulary

Intelligence relates to tasks involving higher mental processes, e.g. creativity, solving problems, pattern recognition, classification, learning, induction, deduction, building analogies, optimization, language processing, knowledge and many more. Intelligence is the computational part of the ability to achieve goals.

Intelligent behaviour is depicted by perceiving one's environment, acting in complex environments, learning and understanding from experience, reasoning to solve problems and discover hidden knowledge, applying knowledge successfully in new situations, thinking abstractly, using analogies, communicating with others, and more.

Science-based goals of AI pertain to developing concepts, mechanisms and an understanding of biologically intelligent behaviour. The emphasis is on understanding intelligent behaviour.
Engineering-based goals of AI relate to developing the concepts, theory and practice of building intelligent machines. The emphasis is on system building.

AI techniques depict how we represent, manipulate and reason with knowledge in order to solve problems.

Knowledge is a collection of 'facts'. To manipulate these facts by a program, a suitable representation is required. A good representation facilitates problem solving.

Learning: programs can only learn what their formalisms for facts or behaviour can represent. Learning denotes changes in a system that are adaptive; in other words, it enables the system to do the same task(s) more efficiently next time.

Applications of AI refer to problem solving, search and control strategies, speech recognition, natural language understanding, computer vision, expert systems, etc.

1.4 PROBLEMS OF AI

Intelligence does not imply perfect understanding; every intelligent being has limited perception, memory and computation. Many points on the spectrum of intelligence versus cost are viable, from insects to humans. AI seeks to understand the computations required for intelligent behaviour and to produce computer systems that exhibit intelligence. Aspects of intelligence studied by AI include perception, communication using human languages, reasoning, planning, learning and memory.

Let us consider some of the problems that Artificial Intelligence is used to solve. Early examples are game playing and theorem proving, which involves resolution. Common sense reasoning formed the basis of GPS, a general problem solver. Natural language processing met with early success; then the limited power of computers hindered progress, but currently the topic is experiencing a resurgence. Expert systems are interesting because they represent one of the best examples of an application of AI that appears useful to non-AI people. An expert system solves a particular subset of problems using knowledge and rules about a particular topic.
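The rule-plus-knowledge idea behind expert systems can be sketched with a tiny forward-chaining loop; the facts and rules below are invented purely for illustration:

```python
# Each rule pairs a set of premises with a conclusion ('if ... then ...').
RULES = [
    ({"has_fever", "has_rash"}, "suspect_measles"),
    ({"suspect_measles"}, "recommend_specialist"),
]

def forward_chain(facts, rules):
    """Fire every rule whose premises are all known, adding its
    conclusion, until no new fact can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = forward_chain({"has_fever", "has_rash"}, RULES)
print(sorted(derived))
# ['has_fever', 'has_rash', 'recommend_specialist', 'suspect_measles']
```

Real expert systems add far richer knowledge representation and explanation facilities, but the core 'knowledge and rules about a particular topic' pattern is the same.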
The following questions are to be considered before we can step forward:
1. What are the underlying assumptions about intelligence?
2. What kinds of techniques will be useful for solving AI problems?
3. At what level can human intelligence be modelled?
4. How will we know when an intelligent program has been built?

Check Your Progress
1. Who advanced mathematical theory?
2. Web 2.0 is the second generation of the World Wide Web, focused on enabling people to collaborate and share information online. (True or False)
3. A MANET is a collection of wireless nodes that can dynamically form a network to exchange information without using any existing fixed network infrastructure. (True or False)

1.5 BRANCHES OF AI

A list of branches of AI is given below. However, some branches are surely missing, because no one has identified them yet. Some of these may be regarded as concepts or topics rather than full branches.

Logical AI
What a program knows about the world in general, the facts of the specific situation in which it must act, and its goals are all represented by sentences of some mathematical logical language. The program decides what to do by inferring that certain actions are appropriate for achieving its goals.

Search
Artificial Intelligence programs often examine large numbers of possibilities, for example, moves in a chess game or inferences by a theorem-proving program. Discoveries are frequently made about how to do this more efficiently in various domains.

Pattern Recognition
When a program makes observations of some kind, it is often programmed to compare what it sees with a pattern. For example, a vision program may try to match a pattern of eyes and a nose in a scene in order to find a face. More complex patterns, e.g. those in a natural language text, a chess position or the history of some event, require quite different methods from the simple patterns that have been studied the most.
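The 'Search' branch, systematically examining large numbers of possibilities, can be sketched as a minimal breadth-first search over an explicitly listed state space; the graph and names below are invented for illustration:

```python
from collections import deque

def breadth_first_search(start, goal, neighbours):
    """Examine possibilities level by level, returning the shortest
    path (by number of moves) from start to goal, or None."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

# A small hand-made state space standing in for, say, game positions.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(breadth_first_search("A", "E", graph))  # ['A', 'B', 'D', 'E']
```

The `visited` set is the kind of efficiency discovery the text alludes to: without it, the program would re-examine the same positions repeatedly.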
Representation
Usually, languages of mathematical logic are used to represent facts about the world.

Inference
From some facts, others can be inferred. Mathematical logical deduction is sufficient for some purposes, but new methods of non-monotonic inference have been added to logic since the 1970s. The simplest kind of non-monotonic reasoning is default reasoning, in which a conclusion is inferred by default, but the conclusion can be withdrawn if there is evidence to the contrary. For example, when we hear of a bird, we infer that it can fly, but this conclusion can be reversed when we hear that it is a penguin. It is the possibility that a conclusion may have to be withdrawn that constitutes the non-monotonic character of the reasoning. Normal logical reasoning is monotonic in that the set of conclusions that can be drawn from a set of premises is a monotonically increasing function of the premises. Circumscription is another form of non-monotonic reasoning.

Common sense knowledge and reasoning
This is the area in which AI is farthest from the human level, in spite of the fact that it has been an active research area since the 1950s. While there has been considerable progress in developing systems of non-monotonic reasoning and theories of action, yet more new ideas are needed.

Learning from experience
There are some rules for learning expressed in logic. Programs can only learn what facts or behaviour their formalisms can represent, and unfortunately learning systems are almost all based on very limited abilities to represent information.

Planning
Planning starts with general facts about the world (especially facts about the effects of actions), facts about the particular situation, and a statement of a goal. From these, planning programs generate a strategy for achieving the goal. In the most common case, the strategy is just a sequence of actions.
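The bird/penguin example of default reasoning can be sketched as follows; note how adding a fact withdraws an earlier conclusion, which is exactly the non-monotonic behaviour described above (the function and fact names are illustrative):

```python
def can_fly(animal, kb):
    """Default reasoning: birds fly by default; a more specific fact
    such as 'penguin' defeats the default conclusion."""
    properties = kb.get(animal, set())
    if "penguin" in properties:
        return False        # the exception overrides the default
    return "bird" in properties

kb = {"tweety": {"bird"}}
print(can_fly("tweety", kb))   # True: inferred by default

kb["tweety"].add("penguin")    # new evidence arrives
print(can_fly("tweety", kb))   # False: the conclusion is withdrawn
```

In monotonic logic, adding a premise can never remove a conclusion; here, adding "penguin" does exactly that.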
Epistemology
This is a study of the kinds of knowledge that are required for solving problems in the world.

Ontology
Ontology is the study of the kinds of things that exist. In AI, programs and sentences deal with various kinds of objects, and we study what these kinds are and what their basic properties are. Ontology assumed importance from the 1990s.

Heuristics
A heuristic is a way of trying to discover something, or an idea embedded in a program. The term is used variously in AI. Heuristic functions are used in some approaches to search to measure how far a node in a search tree seems to be from a goal. Heuristic predicates compare two nodes in a search tree to see if one is better than the other, i.e. constitutes an advance toward the goal; these may be more useful.

Genetic programming
Genetic programming is an automated method for creating a working computer program from a high-level statement of a problem. Genetic programming starts from a high-level statement of 'what needs to be done' and automatically creates a computer program to solve the problem. It is being developed by John Koza's group.

1.6 APPLICATIONS OF AI

AI has applications in all fields of human study, such as finance and economics, environmental engineering, chemistry, computer science, and so on. Some of the applications of AI are listed below:
• Perception
  ■ Machine vision
  ■ Speech understanding
  ■ Touch (tactile or haptic) sensation
• Robotics
• Natural Language Processing
  ■ Natural Language Understanding
  ■ Speech Understanding
  ■ Language Generation
  ■ Machine Translation
• Planning
• Expert Systems
• Machine Learning
• Theorem Proving
• Symbolic Mathematics
• Game Playing

1.6.1 Perception

Perception is defined as 'the formation, from a sensory signal, of an internal representation suitable for intelligent processing'. Computer perception is an example of artificial intelligence.
It focuses on vision and speech. Since it involves incident time-varying continuous energy distributions prior to interpretation in symbolic terms, perception is seen to be distinct from intelligence.

1.6.1.1 Machine vision
It is easy to interface a TV camera to a computer and get an image into memory; the problem is understanding what the image represents. Vision takes a great deal of computation; in humans, roughly 10 per cent of all calories consumed are burned in vision computation.

1.6.1.2 Speech understanding
Speech understanding is available now. Some systems must be trained for the individual user and require pauses between words. Understanding continuous speech with a larger vocabulary is harder.

1.6.1.3 Touch (tactile or haptic) sensation
This is important for robot assembly tasks.

1.6.2 Robotics

Although industrial robots have been expensive, robot hardware can be cheap. The limiting factor in the application of robotics is not the cost of the robot hardware itself. What is needed is the perception and intelligence to tell the robot what to do; 'blind' robots are limited to very well-structured tasks (like spray painting car bodies).

1.6.3 Natural Language Processing

Natural language processing is explained below.

1.6.3.1 Natural language understanding
Natural languages are human languages such as English. Making computers understand English allows non-programmers to use them with little training. Getting computers to generate human or natural language is called natural language generation; it is much easier than natural language understanding. A text written in one language and then generated in another language by means of computers is called machine translation, which is important for organizations that operate in many countries.

1.6.3.2 Natural language generation
Generation is easier than natural language understanding. It can serve as an inexpensive output device.
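One reason generation is the easier direction is that simple template filling often suffices; a minimal sketch (the template text and slot names are invented for illustration):

```python
def generate(template, **slots):
    """Fill a fixed English template with slot values; a cheap way
    to produce fluent-looking output without any understanding."""
    return template.format(**slots)

report = generate("The {device} is {state}, so please {action}.",
                  device="printer", state="offline", action="restart it")
print(report)  # The printer is offline, so please restart it.
```

Nothing here models meaning; the fluency comes entirely from the human-written template, which is why generation can act as an 'inexpensive output device'.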
1.6.3.3 Machine translation
Usable translation of text is available now. It is important for organizations that operate in many countries.

1.6.4 Planning

Planning attempts to order actions so as to achieve goals. Planning applications include logistics, manufacturing scheduling, and planning the manufacturing steps needed to construct a desired product. There are huge amounts of money to be saved through better planning.

1.6.5 Expert Systems

Expert systems attempt to capture the knowledge of a human expert and present it through a computer system. There have been many successful and economically valuable applications of expert systems.

1.6.5.1 Benefits
• Reducing the skill level needed to operate complex devices
• Diagnostic advice for device repair
• Interpretation of complex data
• 'Cloning' of scarce expertise
• Capturing the knowledge of an expert who is about to retire
• Combining the knowledge of multiple experts
• Intelligent training

1.6.6 Machine Learning

Machine learning has been a central part of AI research from the beginning. In machine learning, unsupervised learning is a class of problems in which one seeks to determine how the data is organized: the ability to find patterns in a stream of input. Supervised learning includes both classification (determining what category something belongs to) and regression (given a set of numerical input/output examples, discovering a continuous function that would generate the outputs from the inputs).

1.6.7 Theorem Proving

Proving mathematical theorems might seem to be mainly of academic interest. However, many practical problems can be cast in terms of theorems, so a general theorem prover can be widely applicable.

1.6.8 Symbolic Mathematics

Symbolic mathematics refers to the manipulation of formulas, rather than arithmetic on numeric values.
• Algebra
• Differential Calculus
• Integral Calculus

Symbolic manipulation is often used in conjunction with ordinary scientific computation, as a generator of the programs used to actually do the calculations. Symbolic manipulation programs are an important component of scientific and engineering workstations.

1.6.9 Game Playing

Games are good for research because they are well formalized, small and self-contained. They are also good models for competitive situations, so principles discovered in game-playing programs may be applicable to practical problems.

Apart from these, the following four categories highlight application areas where AI technology is having a strong impact on industry and everyday life.

1. Authorizing Financial Transactions: Credit card providers, telephone companies, mortgage lenders, banks and the US Government employ AI systems to detect fraud and expedite financial transactions, with daily transaction volumes in the billions. These systems first use learning algorithms to construct profiles of customer usage patterns, and then use the resulting profiles to detect unusual patterns and take the appropriate action (e.g. disabling the credit card). Such automated oversight of financial transactions is an important component in achieving a viable basis for electronic commerce.

2. Configuring Hardware and Software: Systems that diagnose and treat problems, whether illness in people or problems in hardware and software, are now in widespread use. Diagnostic systems based on Artificial Intelligence technology are being built into photocopiers, computer operating systems and office automation tools to reduce service calls. Standalone units are being used to monitor and control operations in factories and office buildings. Artificial Intelligence based systems assist physicians in many kinds of medical diagnosis, in prescribing treatments and in monitoring patient responses.
Microsoft's Office Assistant, an integral part of every Office application, provides users with customized help by means of decision-theoretic reasoning.

3. Scheduling for Manufacturing: The use of automatic scheduling for manufacturing operations is growing as manufacturers realize that remaining competitive demands ever more efficient use of resources. This Artificial Intelligence technology, supporting rapid rescheduling up and down the 'supply chain' to respond to changing orders, changing markets and unexpected events, has been shown to be superior to less adaptive systems based on older technology. The same technology has proven highly effective in other commercial tasks, including job shop scheduling and assigning airport gates and railway crews. It has also proven highly effective in military settings: DARPA reported that an AI-based logistics planning tool, DART, pressed into service for Operations Desert Shield and Desert Storm, completely repaid its three decades of investment in AI research.

4. The Future: AI began as an attempt to answer some of the most fundamental questions about human existence by understanding the nature of intelligence, but it has grown into a scientific and technological field affecting many aspects of commerce and society. Even as AI technology becomes integrated into the framework of everyday life, AI researchers remain focused on the grand challenges of automating intelligence. Artificial intelligence is used in the fields of medical diagnosis, stock trading, robot control, law, scientific discovery, video games and toys. It can also be applied to specific problems such as small problems in chemistry, handwriting recognition and game playing. Natural language processing (NLP), which concerns the interaction between computers and human (natural) languages, gives AI machines the ability to read and understand the languages that human beings speak.
Some straightforward applications of natural language processing include information retrieval (or text mining) and machine translation. The pursuit of the ultimate goals of AI, the design of intelligent artifacts, the understanding of human intelligence, and an abstract understanding of intelligence (possibly superhuman), continues to have practical consequences in the form of new industries, enhanced functionality for existing systems, increased productivity in general and improvements in the quality of life. But the ultimate promises of AI are still decades away, and the necessary advances in knowledge and technology will require a continuous fundamental research effort.

1.7 UNDERLYING ASSUMPTION OF AI

Newell and Simon presented the Physical Symbol System Hypothesis, which lies at the heart of research in Artificial Intelligence. A physical symbol system consists of a set of entities called symbols, which are physical patterns that can occur as components of another kind of entity called an expression or symbol structure. A symbol structure is a collection of instances (or tokens) of symbols related in some physical way. At any instant, the system will contain a collection of these symbol structures. In addition, the system contains a collection of processes that operate on expressions to produce other expressions: processes of creation, modification, reproduction and destruction. A physical symbol system is thus a machine that produces an evolving collection of symbol structures.

1.7.1 Physical Symbol System Hypothesis

A physical symbol system has the necessary and sufficient means for general intelligent action. This is only a hypothesis, and there is no way to prove it other than by empirical validation. Evidence in support of the physical symbol system hypothesis has come not only from areas such as game playing but also from areas such as visual perception. We will discuss the visual perception area later.
The importance of the physical symbol system hypothesis is twofold. It is a major theory of human intelligence, and it also underlies the belief that it is possible to build programs that can perform intelligent tasks now performed by people.

Check Your Progress
Fill in the blanks:
4. Perception is one of the _______ by AI.
5. Pattern recognition is one of the _______ of AI.
6. Robotics is one of the _______ of AI.

1.8 AI TECHNIQUE

Artificial Intelligence research during the last three decades has concluded that intelligence requires knowledge. Unfortunately, knowledge possesses some less desirable properties:
A. It is huge.
B. It is difficult to characterize correctly.
C. It is constantly varying.
D. It differs from data by being organized in a way that corresponds to its application.
E. It is complicated.

An AI technique is a method that exploits knowledge that is represented so that:
• The knowledge captures generalizations; situations that share properties are grouped together rather than being given separate representations.
• It can be understood by the people who must provide it, even though for many programs the bulk of the data comes automatically from readings; in many AI domains, people must supply the knowledge to a program in terms they themselves understand.
• It can be easily modified to correct errors and reflect changes in real conditions.
• It can be used even if it is incomplete or inaccurate.
• It can be used to help overcome its own sheer bulk, by helping to narrow the range of possibilities that must usually be considered.

In order to characterize an AI technique, let us consider OXO, or tic-tac-toe, and use a series of different approaches to play the game. The programs increase in complexity, in their use of generalizations, in the clarity of their knowledge and in the extensibility of their approach; in this way they move towards being representations of AI techniques.
1.8.1 Tic-Tac-Toe

1.8.1.1 The first approach (simple)

The Tic-Tac-Toe board is a nine-element vector called BOARD; it represents the squares numbered 1 to 9 in three rows:

1 2 3
4 5 6
7 8 9

An element contains the value 0 for a blank, 1 for X and 2 for O. A MOVETABLE vector of 19,683 elements (3^9) is needed, where each element is itself a nine-element vector. The contents of the vector are especially chosen to help the algorithm. The algorithm makes moves as follows:
1. View the BOARD vector as a ternary number and convert it to a decimal number.
2. Use the decimal number as an index into MOVETABLE and access the vector stored there.
3. Set BOARD to this vector, indicating how the board looks after the move.

This approach is efficient in time, but it has several disadvantages. It takes a great deal of space, and calculating the table entries requires a stunning amount of effort. The method is specific to this game and cannot be extended.

1.8.1.2 The second approach

The structure of the data is as before, but we use 2 for a blank, 3 for an X and 5 for an O. A variable called TURN indicates the move number: 1 for the first move and 9 for the last. The algorithm consists of three procedures:
MAKE2 returns 5 if the centre square is blank; otherwise it returns any blank non-corner square, i.e. 2, 4, 6 or 8.
POSSWIN(p) returns 0 if player p cannot win on the next move, and otherwise returns the number of the square that gives a winning move. It checks each line using products: 3*3*2 = 18 means a possible win for X, 5*5*2 = 50 a possible win for O, and the winning move is the square holding the blank.
GO(n) makes a move to square n, setting BOARD[n] to 3 or 5.

This algorithm is more involved and takes longer, but it is more efficient in storage, which compensates for the longer time. It also depends on the programmer's skill.
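The POSSWIN product trick of the second approach can be sketched directly: with blank = 2, X = 3 and O = 5, a line holding two X's and a blank multiplies to 3*3*2 = 18, and one holding two O's and a blank to 5*5*2 = 50. This sketch uses 0-based square indices and returns None rather than 0 for 'no winning move':

```python
BLANK, X, O = 2, 3, 5

# The eight winning lines, as 0-based square indices.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def posswin(board, player):
    """Return the square where `player` can win on the next move,
    else None.  A line's product is 18 when X can win, 50 for O."""
    target = player * player * BLANK
    for a, b, c in LINES:
        if board[a] * board[b] * board[c] == target:
            for i in (a, b, c):
                if board[i] == BLANK:
                    return i    # the blank square completes the line
    return None

board = [X, X, BLANK,
         O, O, BLANK,
         BLANK, BLANK, BLANK]
print(posswin(board, X))  # 2
print(posswin(board, O))  # 5
```

The prime encoding (2, 3, 5) is what makes the test a single multiplication: no other combination of three square values can produce 18 or 50.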
1.8.1.3 The final approach

The structure of the data consists of BOARD, which contains a nine-element vector, a list of the board positions that could result from the next move, and a number representing an estimate of how likely the board position is to lead to an ultimate win for the player to move.

This algorithm looks ahead in order to decide which move would be the most promising at any stage, and selects it: consider all possible moves and the replies the opponent can make, and continue this process for as long as time permits until a winner emerges, then choose the move that leads to the program winning, if possible in the shortest time. This is by far the most difficult of the three approaches to program well, but it is the one whose technique can be extended to other games. It makes relatively few demands on the programmer in terms of game-specific technique, but the overall game strategy must be known to the adviser.

1.8.2 Question Answering

Let us consider question answering systems that accept input in English and provide answers also in English. This problem is harder than the previous one, as it is more difficult to specify the problem properly. Another area of difficulty concerns deciding whether the answer obtained is correct or not, and, further, what is meant by 'correct'. For example, consider the following situation:

1.8.2.1 Text

Rani went shopping for a new coat. She found a red one she really liked. When she got home, she found that it went perfectly with her favourite dress.

1.8.2.2 Questions

1. What did Rani go shopping for?
2. What did Rani find that she liked?
3. Did Rani buy anything?

Method 1

1.8.2.3 Data Structures

A set of templates that match common questions and produce patterns used to match against inputs.
Templates and patterns are used so that a template that matches a given question is associated with the corresponding pattern used to find the answer in the input text. For example, the template WHO DID X Y generates the pattern X Y Z; if a match occurs, Z is the answer to the question. The given text and the question are both stored as strings.

1.8.2.4 Algorithm

Answering a question requires the following steps:
• Compare the templates against the question and store all successful matches to produce a set of text patterns.
• Pass these text patterns through a substitution process to change the person or voice and produce an expanded set of text patterns.
• Apply each of these patterns to the text, collect all the answers, and then print them.

1.8.2.5 Example

In question 1 we use the template WHAT DID X Y, which generates "Rani go shopping for Z"; after substitution we get "Rani goes shopping for Z" and "Rani went shopping for Z", giving Z ≡ "a new coat".

In question 2 we need a very large number of templates, and also a scheme to allow the insertion of 'find' before 'that she liked', the insertion of 'really' in the text, and the substitution of 'she' for 'Rani'; this gives the answer 'a red one'.

Question 3 cannot be answered.

1.8.2.6 Comments

This is a very primitive approach that basically does not match the criteria we set for intelligence; it is worse than the approach used in the game. Surprisingly, this type of technique was actually used in ELIZA, which will be considered later in the course.

Method 2

1.8.2.7 Data Structures

A structure called English consists of a dictionary, a grammar and some semantics about the vocabulary we are likely to come across. This data structure provides the knowledge to convert English text into a storable internal form and also to convert the response back into English.
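Method 1's template matching can be sketched with regular expressions; the single template below stands in for the WHAT DID X Y family, and the wildcard verb slot stands in for the substitution step (all names and patterns here are illustrative):

```python
import re

TEXT = ("Rani went shopping for a new coat. "
        "She found a red one she really liked.")

def answer_what_did(question, text):
    """Match a 'What did X Y Z for?' question template and search
    the text with the derived pattern, returning the answer slot."""
    m = re.match(r"What did (\w+) (\w+) (\w+) for\?", question)
    if not m:
        return None                  # template does not match
    x, _, z = m.groups()             # e.g. x='Rani', z='shopping'
    # Instead of enumerating tense substitutions (go/goes/went),
    # accept any single word in the verb position.
    pattern = rf"{x} \w+ {z} for ([^.]+)\."
    hit = re.search(pattern, text)
    return hit.group(1) if hit else None

print(answer_what_did("What did Rani go shopping for?", TEXT))
# a new coat
```

As the text notes, this shallow matching answers question 1 but fails as soon as the wording of the text diverges from the template, which is why question 3 is out of reach.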
The structured representation of the text is a processed form that defines the context of the input text by making explicit all references, such as pronouns. There are three types of such knowledge representation systems: production rules of the form 'if x then y', slot and filler systems, and statements in mathematical logic. The system used here will be the slot and filler system. Take, for example, the sentence:

'She found a red one she really liked.'

Event1
  instance: finding
  tense: past
  agent: Rani
  object: Thing1

Event2
  instance: liking
  tense: past
  modifier: much
  object: Thing1

Thing1
  instance: coat
  colour: red

The question is stored in two forms: as input and in the above structured form.

1.8.2.8 Algorithm

• Convert the question to a structured form using the English knowledge, then use a marker to indicate the substring (like 'who' or 'what') of the structure that should be returned as an answer. If a slot and filler system is used, a special marker can be placed in more than one slot.
• The answer is found by matching this structured form against the structured text.
• The structured form is matched against the text and the requested segments of the question are returned.

1.8.2.9 Examples

Questions 1 and 2 generate the answers 'a new coat' and 'a red coat' respectively. Question 3 cannot be answered, because there is no direct response in the text.

1.8.2.10 Comments

This approach is more meaningful than the previous one and so is more effective. The extra power gained must be paid for by additional search time in the knowledge bases. A warning must be given here: generating an unambiguous English knowledge base is a complex task and must be left until later in the course. The problems of handling pronouns are difficult. For example:

Rani walked up to the salesperson: she asked where the toy department was.
Rani walked up to the salesperson: she asked her if she needed any help.
Whereas in the original text the linkage of 'she' to 'Rani' is easy, the linkage of 'she' in each of the above sentences to Rani or to the salesperson requires additional knowledge about the context, i.e. the people in a shop.

Method 3

1.8.2.11 Data Structures

The world model contains knowledge about the objects, actions and situations that are described in the input text. This structure is used to create integrated text from the input text. The diagram shows how the system's knowledge of shopping might be represented and stored. This information is known as a script, in this case a shopping script.

Shopping Script: C - Customer, S - Salesperson
Props: M - Merchandize, D - Money-dollars; Location: L - a Store
• C (Customer) selects the products to purchase.
• C contacts S (Salesperson) to know the product's details and the payment scheme.
• Materials are delivered by S after payment is cleared.
• The Admin Dept maintains M (Merchandize), D (Money-dollars) and L (store details) to control the whole shopping transaction, and refers and instructs S.
• The Marketing Dept keeps C's details, such as name, address, phone number and email.
(In the diagram, one box represents C's, S's and Marketing activities, another box represents Admin Dept activities, and the arrows represent the control of the shopping flow.)

Fig. 1.1 Diagrammatic Representation of Shopping Script

1.8.2.12 Algorithm

Convert the question to a structured form using both the knowledge contained in Method 2 and the world model, generating even more possible structures, since even more knowledge is being used. Sometimes filters are introduced to prune the possible answers. To answer a question, the scheme followed is: convert the question to a structured form as before, but use the world model to resolve any ambiguities that may occur. The structured form is matched against the text and the requested segments of the question are returned.

1.8.2.13 Example
Both questions 1 and 2 generate answers, as in the previous program. Question 3 can now be answered. The shopping script is instantiated and, from the last sentence, the path through step 14 is the one used to form the representation: 'M' is bound to the red coat. 'Rani buys a red coat' comes from step 10, and the integrated text generates that she bought a red coat.

1.8.2.14 Comments

This program is more powerful than both the previous programs because it has more knowledge. Thus, like the last game program, it is exploiting AI techniques. However, we are not yet in a position to handle any English question. The major omission is that of a general reasoning mechanism, known as inference, to be used when the required answer is not explicitly given in the input text. This approach can, however, handle questions of the following form, with some modifications:

Saturday morning Rani went shopping. Her brother tried to call her but she did not answer.
Question: Why couldn't Rani's brother reach her?
Answer: Because she was not in.

This answer is derived because we have supplied the additional fact that a person cannot be in two places at once. This patch is not sufficiently general to work in all cases and does not provide the type of solution we are really looking for.

1.9 LEVEL OF THE AI MODEL

What is our goal in trying to produce programs that do the intelligent things that people do? Are we trying to produce programs that do the tasks the same way that people do? OR are we trying to produce programs that simply do the tasks in whatever way is easiest? Programs in the first class attempt to solve problems that a computer can easily solve, and do not usually use AI techniques.
AI techniques usually include search, since no direct method is available; the use of knowledge about the objects involved in the problem area; and abstraction, which allows an element of pruning to occur and enables a solution to be found in real time (otherwise the data could explode in size). An example of these trivial problems in the first class, which are now of interest only to psychologists, is EPAM (Elementary Perceiver and Memorizer), which memorized nonsense syllables. The second class of problems attempts to solve problems that are non-trivial for a computer, using AI techniques. We wish to model human performance on these:
1. To test psychological theories of human performance, e.g. PARRY [Colby, 1975], a program to simulate the conversational behaviour of a paranoid person.
2. To enable computers to understand human reasoning, for example, programs that answer questions based upon newspaper articles describing human behaviour.
3. To enable people to understand computer reasoning. Some people are reluctant to accept computer results unless they understand the mechanisms involved in arriving at the results.
4. To exploit the knowledge gained by people who are best at gathering information. This persuaded the early workers to simulate human behaviour, the SB (Simulated Behaviour) part of AISB.

Examples of this type of approach led to GPS (General Problem Solver), discussed in detail later, and also to natural language understanding, which will be covered in later lessons.

1.10 CRITERIA FOR SUCCESS

In any engineering project, we often ask how we will know when we have succeeded. In 1950, Alan Turing proposed the Turing Test to determine whether a machine could think. The test involves an interrogator in one room and, in another room, a person and a computer C. The only communication is by means of typed messages on a simple terminal (the technology of around 1950).
The interrogator is given the task of determining whether a response comes from the person or from the computer C. The goal of the machine is to fool the interrogator into believing that the machine is the person; if it succeeds, the interrogator will believe the machine can think. The machine may even trick the interrogator by, for example, deliberately giving a wrong answer to 12324 times 73981. Some believe a computer will never pass the Turing Test, that is, never be able to maintain the kind of dialogue Turing described and believed a computer would need to exhibit to pass the test. Recently, computers have convinced five people out of ten that the programs loaded inside them were of human intelligence. Also, Chinook, a computer program, won the world draughts championship. This is a good example of AI at work, and we can identify three important AI techniques:

Search

A search program finds a solution for a problem by trying various sequences of actions or operators until a solution is found.

Advantages

To solve a problem using search, it is only necessary to code the operators that can be used; the search will find the sequence of actions that will provide the desired result. For example, a program can be written to play chess using search if one knows the rules of chess; it is not necessary to know how to play good chess.

Disadvantages

Most problems have search spaces so large that it is impossible to search the whole space. Chess has been estimated to have 10^120 possible games. The rapid growth of combinations of possible moves is called the combinatorial explosion problem.

Use of knowledge

Use of knowledge provides a way of solving complex problems by exploiting the structures of the objects that are concerned.

Abstraction

Abstraction provides a way of separating important features and variations from the many unimportant ones that would otherwise overwhelm any process.
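The combinatorial explosion mentioned under Disadvantages is easy to quantify: with branching factor b (legal moves per position) and depth d (moves looked ahead), a blind search faces on the order of b^d move sequences. A small sketch (b = 35 is a commonly quoted rough average branching factor for chess):

```python
# Number of move sequences a blind search must consider with branching
# factor b and depth d: b ** d. Even modest numbers grow unmanageably fast.
def sequences(b, d):
    return b ** d

for d in (2, 4, 8, 16):
    print(d, sequences(35, d))
```

Already at depth 8 the count exceeds 10^12, which is why exhaustive search must be supplemented by knowledge and abstraction.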
1.11 CONCLUSION

We live in an era of rapid change, moving towards the information, knowledge or network society. We require seamless, easy-to-use, high-quality, reasonably priced communications between people and machines, anywhere and anytime. The creation of intelligence in a machine has been a long-cherished desire, a wish to replicate the functionality of the human mind. Intelligent information and communications technology (IICT) emulates and employs some aspects of human intelligence in performing a task. IICT-based systems include sensors, computers, knowledge-based software, human-machine interfaces and other devices. IICT-enabled machines and devices anticipate requirements, deal with environments that are complex, unknown and unpredictable, and bring the power of computing technology into our daily lives and business practices. Intelligent systems were first developed for use in traditional industries, such as manufacturing and mining, enabling the automation of routine or dangerous tasks to improve productivity and quality. Today, intelligent systems applications exist in virtually all sectors, where they deliver social as well as economic benefits. Post-World War II, a number of people started working on intelligent machines. The English mathematician Alan Turing was probably the first: he argued in a 1947 lecture that AI was best researched by programming computers rather than by building machines. By the late 1950s, there were many researchers in AI, and most of their work was based on programming computers. Daniel Dennett's book Brainchildren contains an excellent discussion of the Turing test, and various partial Turing tests have been implemented, i.e. with restrictions on the observer's knowledge of AI and on the subject matter of questioning. It turns out that some people are easily led into believing that a rather dumb program is intelligent.
Alan Turing's 1950 article 'Computing Machinery and Intelligence' discussed the conditions for considering a machine to be intelligent. He argued that if a machine could successfully pretend to be human to a knowledgeable observer, then we should certainly consider it intelligent. This test would satisfy most people, but not all philosophers. The observer could interact with the machine and with a human by teletype (to avoid requiring the machine to imitate the appearance or voice of a person); the human would try to persuade the observer that he or she was human, and the machine would try to fool the observer. The Turing test is a one-sided test: a machine that passes the test should certainly be considered intelligent, but a machine could still be considered intelligent without knowing enough about humans to imitate one.

1.12 SUMMARY

In this unit, you have learned about the various concepts related to AI, beginning with its basics. It was in late 1955 that Newell and Simon developed the Logic Theorist, considered by many to be the first AI program. The program, representing each problem as a tree model, would attempt to solve it by selecting the branch most likely to lead to the correct conclusion. You have also learned the various definitions of AI according to different authors. Most of these definitions take a very technical direction and avoid the philosophical problems connected with the idea that AI's purpose is to construct an artificial human. AI seeks to understand the computations required for intelligent behaviour and to produce computer systems that exhibit intelligence. This unit also explained in detail the applications of AI, which include problem solving, search and control strategies, speech recognition, natural language understanding, computer vision, expert systems, etc. Aspects of intelligence studied by AI include perception, communication using human languages, reasoning, planning, learning and memory.
Another important topic that you have learned is AI techniques. The latter describe how we represent, manipulate and reason with knowledge in order to solve problems. Knowledge is a collection of 'facts'. To manipulate these facts by a program, a suitable representation is required. A good representation facilitates problem solving. The level of the AI model and the criteria for success have also been discussed.

1.13 KEY TERMS

• Web 2.0: The second generation of the World Wide Web, focused on the ability for people to collaborate and share information online. It is dynamic, serves applications to users and offers open communications.
• MANET concept: A mobile ad-hoc network is a collection of wireless nodes that can dynamically be set up anywhere and anytime without using any existing network infrastructure.
• Software-defined radio: A radio communication system where components that have typically been implemented in hardware (e.g., mixers, filters, amplifiers, modulators/demodulators, detectors, etc.) are implemented in software.
• Artificial intelligence: The branch of computer science that is concerned with the automation of intellectual performance.
• AI techniques: These describe how we represent, manipulate and reason with knowledge in order to solve problems. Knowledge is a collection of 'facts'.

1.14 ANSWERS TO 'CHECK YOUR PROGRESS'

1. Shannon
2. True
3. True
4. Aspects of intelligence studied by AI
5. Branches
6. Applications

1.15 QUESTIONS AND EXERCISES

1. Explain the Turing test to find out whether a machine is intelligent or not.
2. What would convince you that a given machine is truly intelligent?
3. What is intelligence? How do we measure it? Are these measurements useful?
4. When the temperature falls and the thermostat turns the heater on, does it act because it believes the room to be too cold? Does it feel cold? What sorts of things can have beliefs or feelings?
Is this related to the idea of consciousness?
5. Some people believe that the relationship between your mind (a non-physical thing) and your brain (the physical thing inside your skull) is exactly like the relationship between a computational process (a non-physical thing) and a physical computer. Do you agree?
6. How good are machines at playing chess? If a machine can consistently beat all the best human chess players, does this prove that the machine is intelligent?

1.16 FURTHER READING

1. www.mm.iit.uni-miskolc.hu
2. www.cs.washington.edu
3. www.cs.sonoma.edu
4. www.littera.deusto.es
5. www.cs.waikato.ac.nz
6. www.en.wikibooks.org
7. www.deri.net
8. www.cs.wise.edu
9. www.cs.bham.ac.uk
10. www.mediateam.oulu.fi
11. www.indastudychannel.com
12. www.isoc.org
13. www.lk.cs.ucla.edu
14. www.precarn.ca
15. www.cotsjournalonline.com
16. www.jhu-oei.com
17. www.webopedia.com
18. www.answers.com
19. www.en.wikipedia.org
20. www.genetic-programming.com
21. www.cc.gatech.com
22. www.aamhomeschool.org
23. www.etd.gsu.edu
24. www.ebiguity.umbc.edu
25. www.cs.utexas.edu
26. www.lutter1.de
27. www.etd.gsu.edu
28. www.techiwarehouse.com
29. www.uli.se
30. www.cs.ou.edu
31. www.aamhomeschool.org
32. www.a-website.org
33. www.moloth.com
34. www.sir.com

UNIT 2 PROBLEMS, PROBLEM SPACES AND SEARCH

Structure
2.0 Introduction
2.1 Unit Objectives
2.2 Problem of Building a System
    2.2.1 AI - General Problem Solving
2.3 Defining Problem in a State Space Search
    2.3.1 State Space Search
    2.3.2 The Water Jug Problem
2.4 Production Systems
    2.4.1 Control Strategies
    2.4.2 Search Algorithms
    2.4.3 Heuristics
2.5 Problem Characteristics
    2.5.1 Problem Decomposition
    2.5.2 Can Solution Steps be Ignored?
    2.5.3 Is the Problem Universe Predictable?
    2.5.4 Is Good Solution Absolute or Relative?
2.6 Characteristics of Production Systems
2.7 Design of Search Programs and Solutions
    2.7.1 Design of Search Programs
    2.7.2 Additional Problems
2.8 Summary
2.9 Key Terms
2.10 Answers to 'Check Your Progress'
2.11 Questions and Exercises
2.12 Further Reading

2.0 INTRODUCTION

In the preceding unit, you learnt about the basic concepts and applications of AI. In this unit, you will learn about problems, their definitions, characteristics and the design of search programs. You will learn to define problems in a state space search. A state space represents a problem in terms of states and operators that change states. You will also learn about problem solving, which is a process of generating solutions from observed or given data. This unit also examines production systems. Production systems provide appropriate structures for performing and describing search processes. Other important topics discussed in this unit are problem characteristics and the characteristics of production systems. The latter provide us with a good way of describing the operations that can be performed in a search for a solution to a problem. Finally, you will also learn about the design of search programs and solutions.

2.1 UNIT OBJECTIVES

After going through this unit, you will be able to:
• Understand the problem of building a system
• Define problems in a state space search
• Discuss production systems
• Explain the characteristics of problems
• Describe the characteristics of production systems
• Design search programs and solutions

2.2 PROBLEM OF BUILDING A SYSTEM

To solve the problem of building a system, you should take the following steps:
1. Define the problem accurately, including detailed specifications of what constitutes a suitable solution.
2. Scrutinize the problem carefully, for some features may have a central effect on the chosen method of solution.
3. Segregate and represent the background knowledge needed in the solution of the problem.
4.
Choose the best problem-solving techniques and apply them to obtain a solution.

Problem solving is a process of generating solutions from observed data.
• A 'problem' is characterized by a set of goals, a set of objects, and a set of operations. These could be ill-defined and may evolve during problem solving.
• A 'problem space' is an abstract space. A problem space encompasses all valid states that can be generated by the application of any combination of operators on any combination of objects. The problem space may contain one or more solutions. A solution is a combination of operations and objects that achieves the goals.
• A 'search' refers to the search for a solution in a problem space. Search proceeds with different types of 'search control strategies'. The depth-first search and breadth-first search are the two common search strategies.

2.2.1 AI - General Problem Solving

Problem solving has been the key area of concern for Artificial Intelligence. Problem solving is a process of generating solutions from observed or given data. It is, however, not always possible to use direct methods (i.e. to go directly from data to solution). Instead, problem solving often needs to use indirect or model-based methods. General Problem Solver (GPS) was a computer program created in 1957 by Simon and Newell to build a universal problem solver machine. GPS was based on Simon and Newell's theoretical work on logic machines. GPS in principle can solve any formalized symbolic problem, such as theorem proving, geometric problems and chess playing. GPS solved many simple problems, such as the Towers of Hanoi, that could be sufficiently formalized, but GPS could not solve any real-world problems.
To build a system to solve a particular problem, we need to:
• Define the problem precisely: find the input situations as well as the final situations for an acceptable solution to the problem
• Analyse the problem: find the few important features that may have an impact on the appropriateness of various possible techniques for solving the problem
• Isolate and represent the task knowledge necessary to solve the problem
• Choose the best problem-solving technique(s) and apply to the particular problem

2.2.1.1 Problem definitions

A problem is defined by its 'elements' and their 'relations'. To provide a formal description of a problem, we need to do the following:
a. Define a state space that contains all the possible configurations of the relevant objects, including some impossible ones.
b. Specify one or more states that describe possible situations from which the problem-solving process may start. These states are called initial states.
c. Specify one or more states that would be acceptable solutions to the problem. These states are called goal states.
d. Specify a set of rules that describe the actions (operators) available.

The problem can then be solved by using the rules, in combination with an appropriate control strategy, to move through the problem space until a path from an initial state to a goal state is found. This process is known as 'search'. Thus:
• Search is fundamental to the problem-solving process.
• Search is a general mechanism that can be used when a more direct method is not known.
• Search provides the framework into which more direct methods for solving subparts of a problem can be embedded. A very large number of AI problems are formulated as search problems.

• Problem space

A problem space is represented by a directed graph, where nodes represent search states and paths represent the operators applied to change the state.
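The formal description above (a state space, initial states, goal states and a set of operators) translates directly into a generic search skeleton. A sketch using breadth-first search, with a toy numeric problem standing in for a real state space:

```python
from collections import deque

def state_space_search(initial, is_goal, operators):
    """Breadth-first search over a state space.

    initial   -- the initial state (hashable)
    is_goal   -- predicate recognizing goal states
    operators -- function mapping a state to its successor states
    Returns the path from the initial state to a goal state, or None.
    """
    frontier = deque([[initial]])
    visited = {initial}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if is_goal(state):
            return path
        for nxt in operators(state):
            if nxt not in visited:   # avoid revisiting duplicated nodes
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

# Toy state space: reach 7 from 1, where the operators double a number
# or add one to it.
path = state_space_search(1, lambda s: s == 7,
                          lambda s: [s * 2, s + 1])
print(path)  # -> [1, 2, 3, 6, 7]
```

The same skeleton solves any problem that supplies its own initial state, goal test and operators, which is exactly the point of the formal description.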
To simplify search algorithms, it is often convenient to logically and programmatically represent a problem space as a tree. A tree usually decreases the complexity of the search, at a cost. Here, the cost is due to duplicating some nodes on the tree that were linked numerous times in the graph, e.g. node B and node D. A tree is a graph in which any two vertices are connected by exactly one path. Alternatively, any connected graph with no cycles is a tree.

Fig. 2.1 Graph and Tree

• Problem solving

The term problem solving relates to analysis in AI. Problem solving may be characterized as a systematic search through a range of possible actions in order to reach some predefined goal or solution. Problem-solving methods are categorized as special purpose and general purpose.
• A special-purpose method is tailor-made for a particular problem and often exploits very specific features of the situation in which the problem is embedded.
• A general-purpose method is applicable to a wide variety of problems. One general-purpose technique used in AI is 'means-end analysis': a step-by-step, or incremental, reduction of the difference between the current state and the final goal.

2.3 DEFINING PROBLEM AS A STATE SPACE SEARCH

To solve the problem of playing a game, we require the rules of the game and the targets for winning, as well as a means of representing positions in the game. The opening position can be defined as the initial state and a winning position as a goal state. Moves from the initial state to other states leading to the goal state follow legally. However, the possibilities are far too abundant in most games, especially in chess, where the possible games exceed the number of particles in the universe. Thus, the positions cannot all be supplied explicitly, and computer programs cannot handle them easily. Storage also presents a problem, although searching can be helped by hashing.
The number of rules that are used must be minimized, and the set can be created by expressing each rule in as general a form as possible. The representation of games leads to a state space representation, and this is common for well-organized games with some structure. This representation allows for the formal definition of a problem that requires movement from a set of initial positions to one of a set of target positions. It means that the solution involves using known techniques and a systematic search. This is quite a common method in Artificial Intelligence.

2.3.1 State Space Search

A state space represents a problem in terms of states and operators that change states. A state space consists of:
• A representation of the states the system can be in. For example, in a board game, the board represents the current state of the game.
• A set of operators that can change one state into another state. In a board game, the operators are the legal moves from any given state. Often the operators are represented as programs that change a state representation to represent the new state.
• An initial state.
• A set of final states; some of these may be desirable, others undesirable. This set is often represented implicitly by a program that detects terminal states.

2.3.2 The Water Jug Problem

In this problem, we use two jugs called four and three; four holds a maximum of four gallons of water and three a maximum of three gallons of water. How can we get two gallons of water into the four jug? The state space is a set of ordered pairs giving the number of gallons of water in the pair of jugs at any time, i.e., (four, three) where four = 0, 1, 2, 3 or 4 and three = 0, 1, 2 or 3. The start state is (0, 0) and the goal state is (2, n), where n may be any value from 0 to 3.
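This state space is small enough to search exhaustively. A sketch of a breadth-first solver, in which the fill, empty and pour operations are folded into one successor function rather than listed as separate rules:

```python
from collections import deque

# States are (four, three) pairs: gallons in the four-gallon and
# three-gallon jugs respectively.
def successors(state):
    four, three = state
    states = {
        (4, three), (four, 3),           # fill a jug from the tap
        (0, three), (four, 0),           # empty a jug into the drain
    }
    pour = min(three, 4 - four)          # pour three into four
    states.add((four + pour, three - pour))
    pour = min(four, 3 - three)          # pour four into three
    states.add((four - pour, three + pour))
    states.discard(state)
    return states

def solve(start=(0, 0)):
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1][0] == 2:             # goal: two gallons in the four jug
            return path
        for nxt in successors(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

print(solve())
```

Running this prints one shortest sequence of states from (0, 0) to a state with two gallons in the four jug.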
Here, 'four' and 'three' name the jugs, and the numbers show the amount of water in each jug. The major production rules for solving this problem are shown below:

    Initial condition                     Goal                Comment
1.  (four, three) if four < 4             (4, three)          fill four from tap
2.  (four, three) if three < 3            (four, 3)           fill three from tap
3.  (four, three) if four > 0             (0, three)          empty four into drain
4.  (four, three) if three > 0            (four, 0)           empty three into drain
5.  (four, three) if four + three ≤ 4     (four + three, 0)   empty three into four
6.  (four, three) if four + three ≤ 3     (0, four + three)   empty four into three
7.  (0, three) if three > 0               (three, 0)          empty three into four
8.  (four, 0) if four > 0                 (0, four)           empty four into three
9.  (0, 2)                                (2, 0)              empty three into four
10. (2, 0)                                (0, 2)              empty four into three
11. (four, three) if four < 4             (4, three - diff)   pour diff = 4 - four into four from three
12. (four, three) if three < 3            (four - diff, 3)    pour diff = 3 - three into three from four

Fig. 2.2 Production Rules for the Water Jug Problem

One solution is given below:

Gallons in Four Jug    Gallons in Three Jug    Rule Applied
0                      0                       -
0                      3                       2
3                      0                       7
3                      3                       2
4                      2                       11
0                      2                       3
2                      0                       9

Fig. 2.3 One Solution to the Water Jug Problem

The problem is solved by using the production rules in combination with an appropriate control strategy, moving through the problem space until a path from an initial state to a goal state is found. In this problem-solving process, search is the fundamental concept. For simple problems it is easy to achieve this goal by hand, but there will be cases where this is far too difficult.

2.4 PRODUCTION SYSTEMS

Production systems provide appropriate structures for performing and describing search processes. A production system has four basic components, as enumerated below.
• A set of rules, each consisting of a left side that determines the applicability of the rule and a right side that describes the operation to be performed if the rule is applied.
• A database of current facts established during the process of inference.
• A control strategy that specifies the order in which the rules will be compared with the facts in the database, and also specifies how to resolve conflicts when several rules or several facts are selected.
• A rule firing module.

The production rules operate on the knowledge database. Each rule has a precondition that is either satisfied or not by the knowledge database. If the precondition is satisfied, the rule can be applied. Application of the rule changes the knowledge database. The control system chooses which applicable rule should be applied and ceases computation when a termination condition on the knowledge database is satisfied.

Example: Eight puzzle (8-Puzzle)

The 8-puzzle is a 3 × 3 array containing eight square pieces, numbered 1 through 8, and one empty space. A piece can be moved horizontally or vertically into the empty space, in effect exchanging the positions of the piece and the empty space. There are four possible moves: UP (move the blank space up), DOWN, LEFT and RIGHT. The aim of the game is to make a sequence of moves that will convert the board from the start state into the goal state, for example (_ marks the blank):

Initial State      Goal State
1 3 4              1 2 3
8 6 2              8 _ 4
7 _ 5              7 6 5

This example can be solved by the operator sequence UP, RIGHT, UP, LEFT, DOWN.

Example: Missionaries and Cannibals

The Missionaries and Cannibals problem illustrates the use of state space search for planning under constraints: three missionaries and three cannibals wish to cross a river using a two-person boat. If at any time the cannibals outnumber the missionaries on either side of the river, they will eat the missionaries.
How can a sequence of boat trips be performed that will get everyone to the other side of the river without any missionaries being eaten?

State representation:
1. BOAT position: original (T) or final (NIL) side of the river.
2. Number of Missionaries and Cannibals on the original side of the river.
3. Start is (T 3 3); Goal is (NIL 0 0).

Operators:
(MM 2 0) Two Missionaries cross the river.
(MC 1 1) One Missionary and one Cannibal.
(CC 0 2) Two Cannibals.
(M 1 0) One Missionary.
(C 0 1) One Cannibal.

Fig.: Missionaries/Cannibals search graph (each node shows the missionaries on the left bank, the cannibals on the left bank and the boat position; edges are labelled with the operators MM, MC, CC, M and C).

2.4.1 Control Strategies

The word 'search' refers to the search for a solution in a problem space.
• Search proceeds with different types of 'search control strategies'.
• A strategy is defined by picking the order in which the nodes expand.

Search strategies are evaluated along the following dimensions: completeness, time complexity, space complexity and optimality. (The search-related terms are explained first; the search algorithms and control strategies are illustrated next.)

Search-related terms

• Algorithm's performance and complexity

Ideally we want a common measure so that we can compare approaches in order to select the most appropriate algorithm for a given situation. The performance of an algorithm depends on internal and external factors.

Internal factors: the time required to run and the space (memory) required to run.
External factors: the size of the input to the algorithm, the speed of the computer and the quality of the compiler.

Complexity is a measure of the performance of an algorithm. Complexity measures the internal factors, usually in time rather than space.
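Returning to the missionaries and cannibals problem: the state representation and the five boat-load operators above can be driven by a breadth-first control strategy. A sketch (the tuple encoding and helper names are illustrative):

```python
from collections import deque

# A state is (missionaries_left, cannibals_left, boat_on_left); the five
# boat loads are the operators (MM), (MC), (CC), (M) and (C).
LOADS = [(2, 0), (1, 1), (0, 2), (1, 0), (0, 1)]

def safe(m, c):
    # Missionaries must never be outnumbered on either bank (a bank with
    # no missionaries is always safe).
    return (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)

def successors(state):
    m, c, boat = state
    sign = -1 if boat else 1          # the boat carries people away from its side
    for dm, dc in LOADS:
        nm, nc = m + sign * dm, c + sign * dc
        if 0 <= nm <= 3 and 0 <= nc <= 3 and safe(nm, nc):
            yield (nm, nc, not boat)

def solve(start=(3, 3, True), goal=(0, 0, False)):
    frontier, visited = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in successors(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

plan = solve()
print(len(plan) - 1)  # -> 11 crossings
```

Breadth-first search finds the classic shortest plan of 11 one-way crossings.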
• Computational complexity

This is the measure of resources in terms of time and space.
■ If A is an algorithm that solves a decision problem f, then the run-time of A is the number of steps taken on an input of length n.
■ The time complexity T(n) of a decision problem f is the run-time of the 'best' algorithm A for f.
■ The space complexity S(n) of a decision problem f is the amount of memory used by the 'best' algorithm A for f.

• 'Big-O' notation

Big-O is a theoretical measure of the execution of an algorithm. It usually indicates the time or the memory needed, given the problem size n, which is usually the number of items. The Big-O notation is used to give an approximation of the run-time efficiency of an algorithm; the letter 'O' stands for the order of magnitude of operations or space at run-time.

• The Big-O of an Algorithm A
■ If an algorithm A requires time proportional to f(n), then algorithm A is said to be of order f(n), denoted O(f(n)).
■ If algorithm A requires time proportional to n^2, then the order of the algorithm is said to be O(n^2).
■ If algorithm A requires time proportional to n, then the order of the algorithm is said to be O(n).

The function f(n) is called the algorithm's growth-rate function. In other words, if an algorithm has performance complexity O(n), this means that the run-time t should be directly proportional to n, i.e. t ∝ n, or t = kn where k is a constant of proportionality. Similar interpretations hold for algorithms having performance complexity O(log2(n)), O(log N), O(N log N), O(2^N) and so on.

Example 1: Determine the Big-O of an algorithm that calculates the sum of the n elements in an integer array a[0..n-1].
Instructions No of execution steps line 1 sum 1 line 2 for (i = 0; i < n; i++) n+1 line 3 sum += a[i] n line 4 print sum 1 Total 2n + 3 Self-Instructional Material 39 Problems, Problem Spaces and Search NOTES Thus, the polynomial (2n + 3) is dominated by the 1st term as n while the number of elements in the array becomes very large. • In determining the Big-O, ignore constants such as 2 and 3. So the algorithm is of order n. • So the Big-O of the algorithm is O(n). • In other words the run-time of this algorithm increases roughly as the size of the input data n, e.g., an array of size n. Example 2: Determine the Big-O of an algorithm for finding the largest element in a square 2D array a[0 . . . . . n-1] [0 . . . . . n-1] Line no Instructions No of execution line 1 max = a[0][0] 1 line 2 for (row = 0; row < n; row++) line 3 for (col = 0; col < n; col++) line 4 if (a[row][col] > max) max = a[row][col]. line 5 print max n+1 n*(n+1) n*(n) 1 2n2 + 2n + 3 Total Thus, the polynomial (2n2 + 2n + 3) is dominated by the 1st term as n2 while the number of elements in the array becomes very large. • In determining the Big-O, ignore constants such as 2, 2 and 3. So the algorithm is of order n2. • The Big-O of the algorithm is O(n2). • In other words, the run-time of this algorithm increases roughly as the square of the size of the input data which is n2, e.g., an array of size n x n. Example 3 : Polynomial in n with degree k. The number of steps needed to carry out an algorithm is expressed as f(n) = ak nk + ak-1 nk-1 + ... + a1 n1 + a0 Then f(n) is a polynomial in n with degree k and f(n) ∈ O(nk). • To obtain the order of a polynomial function, use the term, which is of the highest degree and disregard the constants and the terms which are of lower degrees. The Big-O of the algorithm is O(nk). In other words, the run-time of this algorithm increases exponentially. 
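The step counts in Example 1 can be checked empirically. The following is an illustrative sketch (not from the text): it sums an array while counting executed steps exactly as the table does, confirming the total of 2n + 3. The function name `summation_steps` is our own.

```python
def summation_steps(a):
    """Sum the array while counting executed steps as in Example 1."""
    steps = 0
    total = 0
    steps += 1                      # line 1: sum = 0 (1 step)
    n = len(a)
    i = 0
    while True:
        steps += 1                  # line 2: loop test (runs n + 1 times)
        if i >= n:
            break
        total += a[i]
        steps += 1                  # line 3: sum += a[i] (runs n times)
        i += 1
    steps += 1                      # line 4: print sum (1 step)
    return total, steps

# The counted steps match the growth-rate function 2n + 3 for any n.
for n in (0, 4, 100):
    total, steps = summation_steps(list(range(n)))
    assert steps == 2 * n + 3
```

Doubling n roughly doubles the step count, which is exactly what the O(n) classification predicts.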
Example of growth-rate variation:
Problem: If an algorithm requires 1 second of run-time for a problem of size 8, find the run-time of that algorithm for a problem of size 16.
Solution: If the order of the algorithm is O(f(n)), then the execution time T(n) calculated as the problem size increases is as below.

O(1) ⇒ T(n) = 1 second; the algorithm is constant time, independent of the size of the problem.
O(log₂ n) ⇒ T(n) = (1 × log₂ 16) / log₂ 8 = 4/3 seconds; the algorithm is of logarithmic time, and increases slowly with the size of the problem.
O(n) ⇒ T(n) = (1 × 16) / 8 = 2 seconds; the algorithm is linear time, and increases directly with the size of the problem.
O(n log₂ n) ⇒ T(n) = (1 × 16 × log₂ 16) / (8 × log₂ 8) = 8/3 seconds; the algorithm is of log-linear time, and increases more rapidly than a linear algorithm.
O(n²) ⇒ T(n) = (1 × 16²) / 8² = 4 seconds; the algorithm is quadratic time, and increases rapidly with the size of the problem.
O(n³) ⇒ T(n) = (1 × 16³) / 8³ = 8 seconds; the algorithm is cubic time, and increases more rapidly than a quadratic algorithm with the size of the problem.
O(2ⁿ) ⇒ T(n) = (1 × 2¹⁶) / 2⁸ = 2⁸ seconds = 256 seconds; the algorithm is exponential time, and increases too rapidly to be practical.

Tree structure
A tree is a way of organizing objects related in a hierarchical fashion.
• A tree is a type of data structure in which each element is attached to one or more elements directly beneath it.
• The connections between elements are called branches.
• A tree is often called an inverted tree because it is drawn with the root at the top.
• The elements that have no elements below them are called leaves.
• A binary tree is a special type: each element has at most two branches below it.

Properties
• A tree is a special case of a graph.
• The topmost node in a tree is called the root node.
• All operations on the tree begin at the root node.
• A node has at most one parent.
• The topmost node (root node) has no parents.
• Each node has zero or more child nodes, which are below it.
• The nodes at the bottommost level of the tree are called leaf nodes. Since leaf nodes are at the bottommost level, they do not have children.
• A node that has a child is called the child's parent node.
• The depth of a node n is the length of the path from the root to the node.
• The root node is at depth zero.

• Stacks and Queues
Stacks and queues are data structures that maintain the order of last-in, first-out and first-in, first-out respectively. Both stacks and queues are often implemented as linked lists, but that is not the only possible implementation.

Stack - Last In First Out (LIFO) lists
■ An ordered list; a sequence of items, piled one on top of the other.
■ The insertions and deletions are made at one end only, called the Top.
■ If Stack S = (a[1], a[2], . . . , a[n]) then a[1] is the bottommost element.
■ Any intermediate element a[i] is on top of element a[i-1], 1 < i <= n.
■ In a stack, all operations take place at the Top. The Pop operation removes the item from the top of the stack. The Push operation adds an item on top of the stack.

Queue - First In First Out (FIFO) lists
• An ordered list; a sequence of items; there are restrictions on how items can be added to and removed from the list. A queue has two ends.
• All insertions (enqueue) take place at one end, called the Rear or Back.
• All deletions (dequeue) take place at the other end, called the Front.
• If a queue has a[n] as its rear element, then a[i+1] is behind a[i], 1 <= i < n.
• All operations take place at one end of the queue or the other. The Dequeue operation removes the item at the Front of the queue. The Enqueue operation adds an item to the Rear of the queue.

Search
Search is the systematic examination of states to find a path from the start / root state to the goal state.
• Search usually results from a lack of knowledge.
• Search explores knowledge alternatives to arrive at the best answer.
• Search algorithm output is a solution, that is, a path from the initial state to a state that satisfies the goal test.
For general-purpose problem solving, 'search' is an approach.
• Search deals with finding nodes having certain properties in a graph that represents the search space.
• Search methods explore the search space 'intelligently', evaluating possibilities without investigating every single possibility.

Examples:
• For a robot this might consist of PICKUP, PUTDOWN, MOVEFORWARD, MOVEBACK, MOVELEFT and MOVERIGHT, applied until the goal is reached.
• Puzzles and games have explicit rules: e.g., the 'Tower of Hanoi' puzzle.

Fig. 2.4 Tower of Hanoi Puzzle: (a) Start, (b) Final

• This puzzle involves a set of rings of different sizes that can be placed on three different pegs.
• The puzzle starts with the rings arranged as shown in Figure 2.4(a).
• The goal of this puzzle is to move them all to the arrangement of Figure 2.4(b).
• Condition: Only the top ring on a peg can be moved, and it may only be placed on a larger ring, or on an empty peg.

In this Tower of Hanoi puzzle:
• Situations encountered while solving the problem are described as states.
• The set of all possible configurations of rings on the pegs is called the 'problem space'.

• States
A state is a representation of the elements at a given moment. A problem is defined by its elements and their relations. At each instant of a problem, the elements have specific descriptors and relations; the descriptors indicate how to select the elements.
Among all possible states, there are two special states:
■ Initial state - the start point
■ Final state - the goal state

• State change: successor function
A 'successor function' is needed for state change. The successor function moves one state to another state.
Successor function:
■ It is a description of possible actions; a set of operators.
■ It is a transformation function on a state representation, which converts that state into another state.
■ It defines a relation of accessibility among states.
■ It represents the conditions of applicability of a state and the corresponding transformation function.

Check Your Progress
1. What does a search refer to?
2. Problem solving is a process of generating solutions from observed or given data. (True or False)
3. What are the two characteristics by which a problem is defined?

• State space
A state space is the set of all states reachable from the initial state.
■ A state space forms a graph (or map) in which the nodes are states and the arcs between nodes are actions.
■ In a state space, a path is a sequence of states connected by a sequence of actions.
■ The solution of a problem is part of the map formed by the state space.

• Structure of a state space
The structures of a state space are trees and graphs.
■ A tree is a hierarchical structure in a graphical form.
■ A graph is a non-hierarchical structure.
• A tree has only one path to a given node; i.e., a tree has one and only one path from any point to any other point.
• A graph consists of a set of nodes (vertices) and a set of edges (arcs). Arcs establish relationships (connections) between the nodes; i.e., a graph has several paths to a given node.
• The operators are directed arcs between nodes.
A search process explores the state space. In the worst case, the search explores all possible paths between the initial state and the goal state.

• Problem solution
In the state space, a solution is a path from the initial state to a goal state or, sometimes, just a goal state.
■ A solution cost function assigns a numeric cost to each path; it also gives the cost of applying the operators to the states.
■ Solution quality is measured by the path cost function; an optimal solution has the lowest path cost among all solutions.
■ The solutions sought may be any solution, an optimal solution, or all solutions.
■ The importance of cost depends on the problem and the type of solution asked for.

• Problem description
A problem consists of the description of:
■ the current state of the world,
■ the actions that can transform one state of the world into another,
■ the desired state of the world.
The following are used to describe the problem:
❑ State space: defined explicitly or implicitly. A state space should describe everything that is needed to solve a problem and nothing that is not needed to solve the problem.
❑ Initial state: the start state.
❑ Goal state: the conditions it has to fulfil. The description of a desired state may be complete or partial.
❑ Operators: these change the state. Operators do actions that can transform one state into another. Operators consist of preconditions and instructions: preconditions provide a partial description of the state of the world that must be true in order to perform the action, and instructions tell the user how to create the next state. Operators should be as general as possible, so as to reduce their number.
❑ Elements of the domain that have relevance to the problem: knowledge of the starting point.
❑ Problem solving is finding a solution: find an ordered sequence of operators that transform the current (start) state into a goal state.
❑ Restrictions on solution quality (any, optimal, or all): finding the shortest sequence, or finding the least expensive sequence defining cost, or finding any sequence as quickly as possible.

This can also be explained with the help of an algebraic function, as given below.

Algebraic Function
A function may take the form of a set of ordered pairs, a graph or an equation. Regardless of the form it takes, a function must obey the condition that no two of its ordered pairs have the same first member with different second members.
Relation: A set of ordered pairs of the form (x, y) is called a relation.
Function: A relation in which no two ordered pairs have the same x-value but different y-values is called a function. Functions are usually named by lower-case letters such as f, g and h.
For example, f = {(-3, 9), (0, 0), (3, 9)}, g = {(4, -2), (4, 2)}. Here f is a function; g is not a function, since the same x-value, 4, is paired with two different y-values.
Domain and Range: The domain of a function is the set of all the first members of its ordered pairs, and the range of a function is the set of all second members of its ordered pairs. If function f = {(a, A), (b, B), (c, C)}, then its domain is {a, b, c} and its range is {A, B, C}.
Function and mapping: A function may be viewed as a mapping or a pairing of one set with elements of a second set such that each element of the first set (called the domain) is paired with exactly one element of the second set (called the codomain). E.g., if a function f maps {a, b, c} into {A, B, C, D} such that a → A (read 'a is mapped into A'), b → B, c → C, then the domain is {a, b, c} and the codomain is {A, B, C, D}. Since a is paired with A in the codomain, A is called the image of a. Each element of the codomain that corresponds to an element of the domain is called the image of that element. The set of image points, {A, B, C}, is called the range. Thus, the range is a subset of the codomain.
Onto mappings: Set A is mapped onto set B if each element of set B is the image of an element of set A. Thus, every function maps its domain onto its range.
Describing a function by an equation: The rule by which each x-value gets paired with the corresponding y-value may be specified by an equation. For example, the function described by the equation y = x + 1 requires that for any choice of x in the domain, the corresponding range value is x + 1. Thus, 2 → 3, 3 → 4, and 4 → 5.
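The defining condition of a function (no x-value paired with two different y-values) is mechanical enough to check in a few lines of code. The following illustrative sketch (names `is_function`, `domain` and `range_of` are our own) tests a relation given as a set of ordered pairs:

```python
def is_function(pairs):
    """A relation is a function iff no x is paired with two different y's."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False            # same first member, different second member
        seen[x] = y
    return True

def domain(pairs):
    return {x for x, _ in pairs}    # set of all first members

def range_of(pairs):
    return {y for _, y in pairs}    # set of all second members

f = {(-3, 9), (0, 0), (3, 9)}       # a function: each x has one image
g = {(4, -2), (4, 2)}               # not a function: x = 4 has two images

assert is_function(f) and not is_function(g)
assert domain(f) == {-3, 0, 3} and range_of(f) == {0, 9}
```

Note that f pairing both -3 and 3 with 9 is allowed: two x-values may share a y-value, but one x-value may not have two y-values.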
Restricting domains of functions: Unless otherwise indicated, the domain of a function is assumed to be the largest possible set of real numbers. Thus:
• The domain of y = x / (x² - 4) is the set of all real numbers except ±2, since for these values of x the denominator is 0.
• The domain of y = (x - 1)^(1/2) is the set of real numbers greater than or equal to 1, since for any value of x less than 1 the radical has a negative radicand, so the radical does not represent a real number.

Example: Find which of the relations describe functions.
(a) y = x^(1/2), (b) y = x³, (c) y > x, (d) x = y²
Equations (a) and (b) produce exactly one value of y for each value of x. Hence, equations (a) and (b) describe functions. The relation (c), y > x, does not represent a function, since it contains ordered pairs such as (1, 2) and (1, 3), where the same value of x is paired with different values of y. The equation (d), x = y², is not a function, since ordered pairs such as (4, 2) and (4, -2) satisfy the equation but have the same value of x paired with different values of y.

Function notation: For any function f, the value of y that corresponds to a given value of x is denoted by f(x). If y = 5x - 1, then f(2), read as 'f of 2', represents the value of y when x = 2: f(2) = 5·2 - 1 = 9; when x = 3, f(3) = 5·3 - 1 = 14. In an equation that describes function f, f(x) may be used in place of y; for example, f(x) = 5x - 1. If y = f(x), then y is said to be a function of x. Since the value of y depends on the value of x, y is called the dependent variable and x is called the independent variable.

2.4.2 Search Algorithms
Many traditional search algorithms are used in AI applications. For complex problems, the traditional algorithms are unable to find solutions within some practical time and space limits. Consequently, many special techniques have been developed, using heuristic functions. The algorithms that use heuristic functions are called heuristic algorithms.
• Heuristic algorithms are not really intelligent; they appear to be intelligent because they achieve better performance.
• Heuristic algorithms are more efficient because they take advantage of feedback from the data to direct the search path.
• Uninformed search algorithms, or brute-force algorithms, search through the search space for all possible candidates for the solution, checking whether each candidate satisfies the problem's statement.
• Informed search algorithms use heuristic functions that are specific to the problem and apply them to guide the search through the search space, to try to reduce the amount of time spent in searching.

A good heuristic will make an informed search dramatically outperform any uninformed search: for example, the Travelling Salesman Problem (TSP), where the goal is to find a good solution instead of finding the best solution. In such problems, the search proceeds using current information about the problem to predict which path is closer to the goal and follows it, although this does not always guarantee finding the best possible solution. Such techniques help in finding a solution within reasonable time and space (memory).

Some prominent intelligent search algorithms are stated below:
1. Generate and Test Search
2. Best-first Search
3. Greedy Search
4. A* Search
5. Constraint Search
6. Means-ends Analysis
There are some more algorithms; they are either improvements or combinations of these.

• Hierarchical representation of search algorithms: A hierarchical representation of most search algorithms is illustrated below. The representation begins with two types of search:
• Uninformed Search: Also called blind, exhaustive or brute-force search, it uses no information about the problem to guide the search and therefore may not be very efficient.
• Informed Search: Also called heuristic or intelligent search, this uses information about the problem to guide the search, usually a guess of the distance to a goal state, and is therefore efficient; but such guidance may not always be available.

Fig. 2.5 Different Search Algorithms
(The figure shows a hierarchy of search algorithms over G(State, Operator, Cost). With no heuristics, Uninformed Search comprises Depth First Search (LIFO stack), Breadth First Search (FIFO queue), Depth Limited Search (imposed fixed depth limit), Iterative Deepening DFS (gradually increasing depth limit) and Cost First Search (priority queue on g(n)). With user heuristics h(n), Informed Search comprises Generate and Test, Hill Climbing, Best First Search (priority queue on h(n)), Problem Reduction, Constraint Satisfaction, Means-end Analysis, and A* / AO* Search (priority queue on f(n) = h(n) + g(n)).)

The first requirement of a control strategy is that it causes motion: in a game-playing program it moves pieces on the board, and in the water jug problem it fills or empties the jugs. Control strategies that do not cause motion will never lead to the solution.

The second requirement is that it is systematic, that is, it corresponds to the need for global motion as well as for local motion. Clearly, it would neither be rational to fill a jug and empty it repeatedly, nor worthwhile to move a piece round and round the board in a cyclic way in a game. We shall initially consider two systematic approaches to searching. Searches can be classified by the order in which operators are tried: depth-first, breadth-first and bounded depth-first.

(Figure: Breadth-first search expands nodes level by level but requires a lot of storage; depth-first search goes too deep if the tree is very large; bounded depth-first search bounds the depth artificially and has a low storage requirement.)
2.4.2.1 Breadth-first search
A search strategy in which the highest layer of a decision tree is searched completely before proceeding to the next layer is called breadth-first search (BFS).
• In this strategy, no viable solutions are omitted, and therefore it is guaranteed that an optimal solution is found.
• This strategy is often not feasible when the search space is large.

Algorithm
1. Create a variable called LIST and set it to be the starting state.
2. Loop until a goal state is found or LIST is empty, Do:
a. Remove the first element from LIST and call it E. If LIST is empty, quit.
b. For each rule that can match the state E, Do:
(i) Apply the rule to generate a new state.
(ii) If the new state is a goal state, quit and return this state.
(iii) Otherwise, add the new state to the end of LIST.

Advantages
1. Guaranteed to find an optimal solution (in terms of the shortest number of steps to reach the goal).
2. Can always find a goal node if one exists (complete).

Disadvantages
1. High storage requirement: exponential with tree depth.

2.4.2.2 Depth-first search
A search strategy that extends the current path as far as possible before backtracking to the last choice point and trying the next alternative path is called depth-first search (DFS).
• This strategy does not guarantee that the optimal solution has been found.
• In this strategy, search reaches a satisfactory solution more rapidly than breadth-first search, an advantage when the search space is large.

Algorithm
Depth-first search applies operators to each newly generated state, trying to drive directly toward the goal.
1. If the starting state is a goal state, quit and return success.
2. Otherwise, do the following until success or failure is signalled:
a. Generate a successor E to the starting state. If there are no more successors, then signal failure.
b.
Call Depth-first Search with E as the starting state.
c. If success is returned, signal success; otherwise, continue in the loop.

Advantages
1. Low storage requirement: linear with tree depth.
2. Easily programmed: the function call stack does most of the work of maintaining the state of the search.

Disadvantages
1. May find a sub-optimal solution (one that is deeper or more costly than the best solution).
2. Incomplete: without a depth bound, it may not find a solution even if one exists.

2.4.2.3 Bounded depth-first search
Depth-first search can spend much time (perhaps infinite time) exploring a very deep path that does not contain a solution, when a shallow solution exists. An easy way to solve this problem is to put a maximum depth bound on the search. Beyond the depth bound, a failure is generated automatically without exploring any deeper.
Problems:
1. It is hard to guess how deep the solution lies.
2. If the estimated depth is too deep (even by 1), the computer time used is dramatically increased, by a factor of b for each extra level (b being the branching factor).
3. If the estimated depth is too shallow, the search fails to find a solution; all that computer time is wasted.

2.4.3 Heuristics
A heuristic is a method that improves the efficiency of the search process. Heuristics are like tour guides: they are good to the extent that they point in generally interesting directions, and bad to the extent that they may neglect points of interest to particular individuals. Some heuristics help in the search process without sacrificing any claims to completeness that the process might previously have had. Others may occasionally cause an excellent path to be overlooked; by sacrificing completeness they increase efficiency. Heuristics may not find the best solution every time, but they guarantee finding a good solution in a reasonable time.
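The breadth-first algorithm given in Section 2.4.2.1 above can be sketched concretely, using the missionaries-and-cannibals problem from earlier in this chapter as the test case. This is an illustrative sketch only: the state encoding (m, c, boat) for the original bank, and the names `safe`, `successors` and `bfs`, are our own, not from the text.

```python
from collections import deque

# Operators from the text: (MM), (MC), (CC), (M), (C)
MOVES = [(2, 0), (1, 1), (0, 2), (1, 0), (0, 1)]

def safe(m, c):
    """Missionaries are never outnumbered on either bank (or absent)."""
    return (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)

def successors(state):
    m, c, boat = state
    for dm, dc in MOVES:
        # The boat carries people away from, or back to, the original bank.
        m2, c2 = (m - dm, c - dc) if boat else (m + dm, c + dc)
        if 0 <= m2 <= 3 and 0 <= c2 <= 3 and safe(m2, c2):
            yield (m2, c2, not boat)

def bfs(start, goal):
    frontier = deque([[start]])            # LIST of paths, as in the algorithm
    visited = {start}
    while frontier:
        path = frontier.popleft()          # remove the first element
        for nxt in successors(path[-1]):
            if nxt == goal:
                return path + [nxt]        # goal state: quit and return
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])   # add to the end of LIST
    return None

solution = bfs((3, 3, True), (0, 0, False))
assert solution is not None and len(solution) == 12   # 11 river crossings
```

Because nodes are expanded level by level, the first solution found is a shortest one (the classic 11-crossing answer); the price is that every frontier path must be stored, illustrating the high storage requirement noted above.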
Heuristics are particularly useful in solving tough and complex problems, solutions to which would otherwise require infinite time, i.e., far longer than a lifetime, for problems that cannot be solved in any other way.

2.4.3.1 Heuristic search
To find a solution in proper time, rather than a complete solution in unlimited time, we use heuristics. 'A heuristic function is a function that maps from problem state descriptions to measures of desirability, usually represented as numbers.' Heuristic search methods use knowledge about the problem domain and choose promising operators first. These heuristic search methods use heuristic functions to evaluate the next state towards the goal state.
For finding a solution by using the heuristic technique, one should carry out the following steps:
1. Add domain-specific information to select the best path to continue searching along.
2. Define a heuristic function h(n) that estimates the 'goodness' of a node n. Specifically, h(n) = estimated cost (or distance) of the minimal cost path from n to a goal state.
3. The term 'heuristic' means 'serving to aid discovery'; h(n) is an estimate, based on domain-specific information computable from the current state description, of how close we are to a goal.

Finding a route from one city to another is an example of a search problem in which different search orders and the use of heuristic knowledge are easily understood.
1. State: The current city in which the traveller is located.
2. Operators: Roads linking the current city to other cities.
3. Cost metric: The cost of taking a given road between cities.
4. Heuristic information: The search could be guided by the direction of the goal city from the current city, or we could use the airline distance as an estimate of the distance to the goal.

Heuristic search techniques
For complex problems, the traditional algorithms presented above are unable to find a solution within some practical time and space limits.
Consequently, many special techniques have been developed, using heuristic functions.
• Blind search is not always possible, because it requires too much time or space (memory).
• Heuristics are rules of thumb; they do not guarantee a solution to a problem.
• Heuristic search is a weak technique but can be effective if applied correctly; it requires domain-specific information.

Characteristics of heuristic search
• Heuristics are knowledge about the domain, which help search and reasoning in that domain.
• Heuristic search incorporates domain knowledge to improve efficiency over blind search.
• A heuristic is a function that, when applied to a state, returns a value as the estimated merit of the state with respect to the goal.
■ Heuristics might (for reasons) underestimate or overestimate the merit of a state with respect to the goal.
■ Heuristics that underestimate are desirable and are called admissible.
• A heuristic evaluation function estimates the likelihood of a given state leading to the goal state.
• A heuristic search function estimates the cost from the current state to the goal, presuming the function is efficient.

Heuristic search compared with other search
Heuristic search is compared with brute-force (blind) search techniques below:

Comparison of algorithms
Brute force / blind search:
- Can only search what it has knowledge about already.
- Has no knowledge about how far a node is from the goal state.
Heuristic search:
- Estimates the 'distance' to the goal state through explored nodes.
- Guides the search process toward the goal node.
- Prefers states (nodes) that lead close to, and not away from, the goal state.

Example: Travelling salesman
A salesman has to visit a list of cities, and he must visit each city only once. There are different routes between the cities. The problem is to find the shortest route through the cities so that the salesman visits all the cities once. Suppose there are N cities; then one solution would be to examine all N!
possible combinations to find the shortest route. This is not efficient, as with N = 10 there are 36,28,800 possible routes. This is an example of combinatorial explosion.
There are better methods for the solution of such problems. One is called branch and bound: first, generate the complete paths one by one and record the distance of the first complete path; if the next path is shorter, save it instead; proceed this way, abandoning any partial path as soon as its length exceeds the saved shortest path length. Although this is better than the previous method, it is still expensive.

Algorithm: Travelling salesman problem (nearest neighbour)
1. Select a city at random as a starting point.
2. Repeat
3. Select the next city from the list of all the cities to be visited: choose the nearest one to the current city, then go to it,
4. until all cities are visited.
This produces a significant improvement and reduces the time from order N! to order N².

Our goal is to find the shortest route that visits each city exactly once. Suppose the cities to be visited and the distances between them are as shown below (distances in kilometres):

              Hyderabad  Secunderabad  Mumbai  Bangalore  Chennai
Hyderabad         -           15         270      780       740
Secunderabad     15            -         280      760       780
Mumbai          270          280          -       340       420
Bangalore       780          760         340       -        770
Chennai         740          780         420      770        -

One possibility is that the salesman starts from Hyderabad. In that case, one path might be followed as shown below:
Hyderabad → Secunderabad (15) → Chennai (780) → Mumbai (420) → Bangalore (340) → Hyderabad (780); TOTAL: 2335
Here the total distance is 2335 km. But this may not be the best answer to the problem; other paths may give a shorter route. It is sometimes possible to create a bound on the error in the answer, but in general it is not possible to make such an error bound.
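The nearest-neighbour algorithm above can be run directly on the distance table from the text. The following is an illustrative sketch; the names `CITIES`, `DIST`, `d` and `nearest_neighbour_tour` are our own, while the distances are those given in the table (in kilometres).

```python
CITIES = ["Hyderabad", "Secunderabad", "Mumbai", "Bangalore", "Chennai"]
DIST = {
    ("Hyderabad", "Secunderabad"): 15,  ("Hyderabad", "Mumbai"): 270,
    ("Hyderabad", "Bangalore"): 780,    ("Hyderabad", "Chennai"): 740,
    ("Secunderabad", "Mumbai"): 280,    ("Secunderabad", "Bangalore"): 760,
    ("Secunderabad", "Chennai"): 780,   ("Mumbai", "Bangalore"): 340,
    ("Mumbai", "Chennai"): 420,         ("Bangalore", "Chennai"): 770,
}

def d(a, b):
    """Symmetric lookup into the distance table."""
    return DIST[(a, b)] if (a, b) in DIST else DIST[(b, a)]

def nearest_neighbour_tour(start):
    """Repeatedly go to the nearest unvisited city, then return home."""
    tour, unvisited = [start], set(CITIES) - {start}
    while unvisited:
        nxt = min(unvisited, key=lambda c: d(tour[-1], c))  # greedy choice
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(start)                                      # close the tour
    length = sum(d(a, b) for a, b in zip(tour, tour[1:]))
    return tour, length

tour, length = nearest_neighbour_tour("Hyderabad")
# tour: Hyderabad → Secunderabad → Mumbai → Bangalore → Chennai → Hyderabad
# length: 2145 km
```

On this table the greedy tour comes to 2145 km, shorter than the 2335 km route traced above; note, however, that nearest neighbour is a heuristic and in general does not guarantee the optimal tour.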
In real problems, the value of a particular solution is trickier to establish; this problem is easier when value can be measured in miles, while other problems have only vague measures. Although heuristics can be created for unstructured knowledge, producing a coherent analysis is another issue, and this means that the solution lacks reliability. Rarely is the result an optimal solution, since the required approximations are usually insufficient; but although heuristic solutions can be bad in the worst case, the worst case occurs very infrequently.

Formal statement: Problem solving uses a set of statements describing the desired states, expressed in a suitable language, e.g., first-order logic. The solution of many problems (like chess, or noughts and crosses) can be described by finding a sequence of actions that lead to a desired goal.
• Each action changes the state, and
• the aim is to find the sequence of actions that lead from the initial (start) state to a final (goal) state.

A well-defined problem can be described by the example given below:
Example
• Initial state: (S)
• Operator or successor function: for any state x, returns s(x), the set of states reachable from x with one action.
• State space: all states reachable from the initial one by any sequence of actions.
• Path: a sequence through the state space.
• Path cost: a function that assigns a cost to a path; the cost of a path is the sum of the costs of the individual actions along the path.
• Goal state: (G)
• Goal test: a test to determine whether we are at a goal state.

• Search notations
Search is the systematic examination of states to find a path from the start / root state to the goal state.
The notations used for this purpose are given below:
• Evaluation function f(n): estimates the least-cost solution through node n.
• Heuristic function h(n): estimates the least-cost path from node n to the goal node.
• Cost function g(n): estimates the least-cost path from the start node to node n.
f(n) = g(n) + h(n)

(Figure: along a path from Start to Goal through node n, g(n) is the actual cost from the start to n, h(n) is the estimated cost from n to the goal, and f(n) is their sum.)

• The notations f^, g^, h^ are sometimes used to indicate that these values are estimates of f, g, h:
f^(n) = g^(n) + h^(n)
• If h(n) <= the actual cost of the shortest path from node n to the goal, then h(n) is an underestimate.

• Estimated cost function g*
The estimated least-cost path from the start node to node n is written g*(n).
• g* is calculated as the actual cost, so far, of the explored path.
• g* is known exactly, by summing all path costs from the start to the current state.
• If the search space is a tree, then g* = g, because there is only one path from the start node to the current node.
• In general, the search space is a graph.
• If the search space is a graph, then g* >= g:
■ g* can never be less than the cost of the optimal path; it can only overestimate the cost.
■ g* can be equal to g in a graph if chosen properly.

• Estimated heuristic function h*
The estimated least-cost path from node n to the goal node is written h*(n).

Check Your Progress
4. Production systems provide appropriate structures for performing and describing search processes. (True or False)
5. Stacks and queues are data structures that maintain the order of first-in, first-out and last-in, first-out respectively. (True or False)
6. Heuristic algorithms are more efficient than traditional algorithms. (True or False)
7. What is a breadth-first search (BFS)?
8. Heuristics may not find the best solution every time but guarantee that they find a good solution in reasonable time.
(True or False)

• h* is heuristic information; it represents a guess at 'how hard is it to reach the goal state from the current node?'.
• h* may be estimated using an evaluation function f(n) that measures the 'goodness' of a node.
• h* may take different values; the values lie in the range 0 ≤ h*(n) ≤ h(n), and different values give a different search algorithm.
• If h* = h, it is a perfect heuristic; this means no unnecessary nodes are ever expanded.

2.5 PROBLEM CHARACTERISTICS

Heuristics cannot be generalized, as they are domain specific. Production systems provide ideal techniques for representing such heuristics in the form of IF-THEN rules. Most problems requiring simulation of intelligence use heuristic search extensively. Some heuristics are used to define the control structure that guides the search process, as seen in the example described above. But heuristics can also be encoded in the rules to represent the domain knowledge. Since most AI problems make use of knowledge and guided search through the knowledge, AI can be described as the study of techniques for solving exponentially hard problems in polynomial time by exploiting knowledge about the problem domain.
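As a sketch of how the evaluation function f(n) = g(n) + h(n) guides a heuristic search, the following best-first (A*-style) search expands nodes in order of f. The 5×5 grid world and the Manhattan-distance heuristic are illustrative assumptions, not taken from the text.

```python
import heapq

def best_first(start, goal):
    """Expand nodes in increasing order of f(n) = g(n) + h(n)."""
    def h(n):                      # heuristic: Manhattan distance to goal
        return abs(n[0] - goal[0]) + abs(n[1] - goal[1])

    frontier = [(h(start), 0, start, [start])]   # entries: (f, g, node, path)
    seen = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dx, node[1] + dy)
            if 0 <= nxt[0] <= 4 and 0 <= nxt[1] <= 4:
                # g of the neighbour is g + 1 (unit step cost)
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None
```

Because the Manhattan distance never overestimates the true cost on this grid (h is an underestimate, as discussed above), the first path returned is an optimal one.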
To use heuristic search for problem solving, we suggest analysing the problem for the following considerations:
• Decomposability of the problem into a set of independent smaller subproblems
• Possibility of undoing solution steps, if they are found to be unwise
• Predictability of the problem universe
• Possibility of obtaining an obvious solution to a problem without comparison of all other possible solutions
• Type of the solution: whether it is a state or a path to the goal state
• Role of knowledge in problem solving
• Nature of the solution process: with or without interacting with the user

The general classes of engineering problems such as planning, classification, diagnosis, monitoring and design are generally knowledge intensive and use a large amount of heuristics. Depending on the type of problem, suitable knowledge representation schemes and control strategies for search are to be adopted. Combining heuristics with the two basic search strategies has been discussed above. There are a number of other general-purpose search techniques which are essentially heuristics based. Their efficiency primarily depends on how they exploit the domain-specific knowledge to prune undesirable paths. Such search methods are called 'weak methods', since the progress of the search depends heavily on the way the domain knowledge is exploited. A few such search techniques, which form the centre of many AI systems, are briefly presented in the following sections.

2.5.1 Problem Decomposition

Suppose the expression to solve is:

    ∫(x³ + x² + 2x + 3 sin x) dx

The integrand is a sum, so the integral decomposes into independent subproblems, each solved by its own rule:

    ∫x³ dx = x⁴/4
    ∫x² dx = x³/3
    ∫2x dx = 2∫x dx = x²
    ∫3 sin x dx = 3∫sin x dx = –3 cos x

This problem can be solved by breaking it into smaller problems, each of which we can solve by using a small collection of specific rules. Using this technique of problem decomposition, we can solve very large problems very easily. This can be considered as intelligent behaviour.
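The decomposition of the integral above can be sketched as follows. This is a minimal sketch: the rule table covers only the terms of this one example, and representing terms and results as strings is an assumption made for illustration.

```python
# Problem decomposition: split a sum into independent subproblems,
# then solve each one by looking it up in a small table of rules.

RULES = {                       # term -> its integral, as a display string
    "x**3": "x**4/4",
    "x**2": "x**3/3",
    "2*x":  "x**2",
    "3*sin(x)": "-3*cos(x)",
}

def integrate(expr):
    """Decompose a sum of terms and integrate each term by rule lookup."""
    terms = [t.strip() for t in expr.split("+")]
    return " + ".join(RULES[t] for t in terms)

print(integrate("x**3 + x**2 + 2*x + 3*sin(x)"))
```

Each subproblem is solved independently, so a large integral reduces to several small table lookups; a real symbolic integrator works the same way, with a much larger rule collection and recursive decomposition.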
2.5.2 Can Solution Steps be Ignored?

Suppose we are trying to prove a mathematical theorem: we first proceed on the assumption that proving a lemma will be useful. Later we realize that it is not at all useful. We start with another approach to prove the theorem, and simply ignore the first attempt.

Consider the 8-puzzle problem: we make a wrong move and realize the mistake. Here, the control strategy must keep track of all the moves, so that we can backtrack to an earlier state and start with some new move.

Consider the problem of playing chess. Here, once we make a move, we can never recover from that step. These situations illustrate the three important classes of problems mentioned below:
1. Ignorable, in which solution steps can be ignored. Eg: Theorem proving
2. Recoverable, in which solution steps can be undone. Eg: 8-Puzzle
3. Irrecoverable, in which solution steps cannot be undone. Eg: Chess

2.5.3 Is the Problem Universe Predictable?

Consider the 8-puzzle problem. Every time we make a move, we know exactly what will happen. This means that it is possible to plan an entire sequence of moves and be confident of what the resulting state will be. We can backtrack to earlier moves if they prove unwise. Suppose we want to play bridge. We need to plan before the first play, but we cannot play with certainty; the outcome of this game is very uncertain. In the case of the 8-puzzle, the outcome is very certain. To solve uncertain-outcome problems, we follow a process of plan revision as the plan is carried out and the necessary feedback is provided. The disadvantage is that planning in this case is often very expensive.

2.5.4 Is a Good Solution Absolute or Relative?

Consider the problem of answering questions based on a database of simple facts, such as the following:
1. Siva was a man.
2. Siva was a worker in a company.
3. Siva was born in 1905.
4. All men are mortal.
5.
All workers in the company died when there was an accident in 1952.
6. No mortal lives longer than 100 years.

Suppose we ask the question: 'Is Siva alive?' By representing these facts in a formal language, such as predicate logic, and then using formal inference methods, we can derive an answer to this question easily. There are two ways to answer the question, shown below:

Method I:
1. Siva was a man.
2. Siva was born in 1905.
3. All men are mortal.
4. Now it is 2008, so Siva's age is 103 years.
5. No mortal lives longer than 100 years.

Method II:
1. Siva was a worker in the company.
2. All workers in the company died in 1952.

Answer: So Siva is not alive. Both methods lead to this answer.

We are interested in answering the question; it does not matter which path we follow. If we follow one path successfully to the correct answer, there is no reason to go back and check whether another path leads to the same solution.

2.6 CHARACTERISTICS OF PRODUCTION SYSTEMS

Production systems provide us with good ways of describing the operations that can be performed in a search for a solution to a problem. At this point, two questions may arise:
1. Can production systems be described by a set of characteristics? And how can they be easily implemented?
2. What relationships are there between problem types and the types of production systems well suited for solving those problems?

To answer these questions, first consider the following definitions of classes of production systems:
1. A monotonic production system is a production system in which the application of a rule never prevents the later application of another rule that could also have been applied at the time the first rule was selected.
2. A nonmonotonic production system is one in which this is not true.
3.
A partially commutative production system is a production system with the property that if the application of a particular sequence of rules transforms state P into state Q, then any permutation of those rules that is allowable also transforms state P into state Q.
4. A commutative production system is a production system that is both monotonic and partially commutative.

Table 2.1 Four Categories of Production Systems

                   Partially Commutative     Non-partially Commutative
Monotonic          Theorem Proving           Chemical Synthesis
Non-monotonic      Robot Navigation          Bridge

Is there any relationship between classes of production systems and classes of problems? For any solvable problem, there exist an infinite number of production systems that show how to find solutions. Any problem that can be solved by any production system can be solved by a commutative one, but the commutative one is often practically useless. It may use individual states to represent entire sequences of applications of rules of a simpler, non-commutative system. In the formal sense, there is no relationship between kinds of problems and kinds of production systems, since all problems can be solved by all kinds of systems. But in the practical sense, there is definitely such a relationship between the kinds of problems and the kinds of systems that lend themselves to describing those problems.

Partially commutative, monotonic production systems are useful for solving ignorable problems. These are important from an implementation point of view, for they can be implemented without the ability to backtrack to previous states when it is discovered that an incorrect path has been followed. Both types of partially commutative production systems are significant from an implementation point of view: they tend to lead to many duplications of individual states during the search process.
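A production system of the kind classified above can be sketched as a loop that fires IF-THEN rules against a working memory of facts. The rules and facts below are illustrative assumptions; note that because these rules only ever add facts, applying one rule never disables another, so the sketch is monotonic in the sense defined above.

```python
# A minimal production-system interpreter: each rule is a pair
# (condition over working memory, fact to add when the rule fires).

RULES = [
    (lambda m: "raining" in m and "outside" in m, "wet"),
    (lambda m: "wet" in m, "seek-shelter"),
]

def run(memory):
    """Fire rules repeatedly until no rule adds a new fact."""
    memory = set(memory)
    fired = True
    while fired:
        fired = False
        for condition, action in RULES:
            if condition(memory) and action not in memory:
                memory.add(action)          # apply the rule's action
                fired = True
    return memory
```

A nonmonotonic system would, in addition, let a rule delete facts from memory, so that firing one rule could prevent another from ever applying.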
Production systems that are not partially commutative are useful for many problems in which permanent changes occur.

2.7 DESIGN OF SEARCH PROGRAMS AND SOLUTIONS

Solutions can be good in different ways: good in terms of time or storage, or in the simplicity of the algorithm. In the case of the travelling salesman problem, finding the best path can lead to a significant amount of computation, and solving such problems is practical only by using heuristics. In this type of problem, suppose a path is found with a distance of 8850 miles and another with 7750. It is clear that the second is better than the first, but is it the best? Infinite time may be needed to be sure, so heuristics are usually used to find a very good path in finite time.

2.7.1 Design of Search Programs

Each search process can be considered to be a tree traversal. The object of the search is to find a path from the initial state to a goal state using a tree. The number of nodes generated might be huge, and in practice many of the nodes would not be needed. The secret of a good search routine is to generate only those nodes that are likely to be useful, rather than building the complete tree explicitly. The rules are used to represent the tree implicitly, and nodes are created explicitly only if they are actually to be of use. The following issues arise when searching:
• The tree can be searched forward from the initial node to the goal state, or backwards from the goal state to the initial state.
• To select applicable rules, it is critical to have an efficient procedure for matching rules against states.
• How should each node of the search process be represented? This is the knowledge representation problem or the frame problem. In games, an array suffices; in other problems, more complex data structures are needed.

Finally, in terms of data structures, considering the water jug as a typical problem, do we use a graph or a tree? The breadth-first structure does take note of all nodes generated, but the depth-first one can be modified to do so.
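For the water jug problem just mentioned, a breadth-first search over the state graph can be sketched as follows, with a visited set so that states reachable by several different move sequences are expanded only once. The jug sizes (4 and 3 litres) are the usual ones for this puzzle; the goal of 2 litres in the large jug is an illustrative assumption.

```python
from collections import deque

def water_jug_bfs(goal=(2, 0)):
    """BFS over (large, small) jug states; duplicates pruned via a set."""
    start = (0, 0)
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        x, y = path[-1]                                 # large jug x, small jug y
        if (x, y) == goal:
            return path
        moves = {(4, y), (x, 3), (0, y), (x, 0),        # fill or empty a jug
                 (min(4, x + y), max(0, x + y - 4)),    # pour small into large
                 (max(0, x + y - 3), min(3, x + y))}    # pour large into small
        for state in moves:
            if state not in visited:
                visited.add(state)                      # duplicate states pruned here
                frontier.append(path + [state])
    return None
```

Treating the state space as a graph in this way keeps the frontier small; treating it as a tree would re-expand the same jug states over and over.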
2.7.1.1 Check Duplicate Nodes

1. Examine the set of nodes that have already been generated whenever a new node is created.
2. If the new node is not already present, add it to the graph.
3. If it already exists, then:
   a. Set the node that is being expanded to point to the already existing node corresponding to its successor, rather than to the new one. The new one can be thrown away.
   b. If the best or shortest path is being determined, check to see whether this path is better or worse than the old one. If worse, do nothing. If better, save the new path and work the change in length through the chain of successor nodes if necessary.

Example: Tic-Tac-Toe

State spaces are good representations for board games such as Tic-Tac-Toe. The position of a game can be described by the contents of the board and the player whose turn is next. The board can be represented as an array of 9 cells, each of which may contain an X or O or be empty.
• State:
  ■ Player to move next: X or O.
  ■ Board configuration, for example:

      X | O | O
        | X |
        |   | X

• Operators: Change an empty cell to X or O.
• Start State: Board empty; X's turn.
• Terminal States: Three X's in a row; three O's in a row; all cells full.

Search Tree

The sequence of states formed by possible moves is called a search tree. Each level of the tree is called a ply. Since the same state may be reachable by different sequences of moves, the state space may in general be a graph. It may be treated as a tree for simplicity, at the cost of duplicating states.

2.7.1.2 Solving Problems Using Search

• Given an informal description of the problem, construct a formal description as a state space:
  ■ Define a data structure to represent the state.
  ■ Make a representation for the initial state from the given data.
  ■ Write programs to represent operators that change a given state representation to a new state representation.
  ■ Write a program to detect terminal states.
• Choose an appropriate search technique:
  ■ How large is the search space?
  ■ How well structured is the domain?
  ■ What knowledge about the domain can be used to guide the search?

2.7.2 Additional Problems

2.7.2.1 The Monkey and Bananas Problem

The monkey is in a closed room in which there is a small table. There is a bunch of bananas hanging from the ceiling, but the monkey cannot reach them. Let us assume that the monkey can move the table, and that if the monkey stands on the table, it can reach the bananas. Establish a method to instruct the monkey on how to capture the bananas.

1. Task Environment
A. Information
   Set of places: {on the floor, place1, place2, under the bananas}
B. Operators
   CLIMB
     Pretest: the monkey's place is the table's place.
     Move: the monkey gets on the table.
   WALK
     Variable: x, from the set of places.
     Move: the monkey's place becomes x.
   MOVE TABLE
     Variable: x, from the set of places.
     Pretest: the monkey's place is the table's place.
     Move: the monkey's place becomes x; the table's place becomes x.
   GET BANANAS
     Pretest: the table's place is under the bananas; the monkey is on the table.
     Move: the contents of the monkey's hand are the bananas.
C. Differences
   d1 is the monkey's place.
   d2 is the table's place.
   d3 is the contents of the monkey's hand.
D. Difference ordering
   d3 is harder to reduce than d2, which is harder to reduce than d1.

2. Specific Task
Goal: transform the initial object into the desired object.
Objects
   Initial: the monkey's place is place1; the table's place is place2; the contents of the monkey's hand are empty.
   Desired: the contents of the monkey's hand are the bananas.

2.8 SUMMARY

In this unit, you have learned about problems, their definitions, characteristics and the design of search programs. You have also learned how to define a problem as a state space search.
A state space represents a problem in terms of states and operators that change states. You have further learned about problem solving, which is a process of generating solutions from observed or given data.

In this unit, you have also studied the two key steps that must be taken towards designing a program to solve a particular problem:
1. First, accurately define the problem: specify the problem space, the operators for moving within the space, and the starting and goal states.
2. Second, analyse the problem to determine where it falls.

Finally, consider the last two steps for developing a program to solve that problem:
3. Identify and represent the knowledge required by the task.
4. Choose one or more techniques to solve the problem and apply those techniques.

Gradually, the relationship between problem characteristics and specific techniques should become clearer and clearer.

2.9 KEY TERMS

• Problem solving: It is a process of generating solutions from observed or given data. It is, however, not always possible to use direct methods (i.e. to go directly from data to solution).
• Problem: A problem is defined by its 'elements' and their 'relations'.
• State space search: A state space represents a problem in terms of states and operators that change states.
• Production systems: These provide appropriate structures for performing and describing search processes.
• Stacks and queues: Stacks and queues are data structures that maintain the order of last-in, first-out and first-in, first-out respectively.

2.10 ANSWERS TO 'CHECK YOUR PROGRESS'

1. A search refers to the search for a solution in a problem space.
2. True
3. A problem is defined by its 'elements' and their 'relations'.
4. True
5. False
6. True
7. A search strategy in which the highest layer of a decision tree is searched completely before proceeding to the next layer is called Breadth-first search (BFS).
8. True
2.11 QUESTIONS AND EXERCISES

1. Discuss the meaning of 'The first requirement of a good control strategy is that it causes motion'.
2. Describe, with the necessary diagrams, a suitable state space representation for the 8-puzzle problem, and explain how the problem can be solved by state space search. Show how heuristics can improve the efficiency of the search.
3. State the travelling salesman problem and explain: combinatorial explosion, the branch and bound technique and the nearest neighbour heuristic.
4. Show that the Tower of Hanoi problem can be classified under the area of AI. Give a state space representation of the problem.

2.12 FURTHER READING

1. www.mm.iit.uni-miskolc.hu
2. www.induastudychannel.com
3. www.mtsu.edu
4. www.cosc.brocku.com
5. www.cc.gatech.edu
6. www.cosmo.marymount.edu
7. www.tu-berlin.de
8. www.dubmail.gcd.ie
9. www.alanturing.net
10. www.en.wikipedia.org
11. www.cds.caltech.edu
12. www.dl.icr.org
13. www.cs.cardiff.ac.uk
14. www.niceindia.com
15. www.rdspc.com
16. www.cs.wise.edu
17. www.web.inet-tr.org.tr
18. www.books.google.com
19. www.acs.ilstu.edu
20. www.amsglossary.allenpress.com
21. www.sjc.edu.hk
22. www.lif-sud.uni-mrs.fr
23. Albrecht. Encyclopedia of Disability.
24. Kumar, Deepak. 1995. 'Undergraduate AI and its Non-imperative Prerequisite'. ACM SIGART Bulletin, 4th Jan.
25. Zhou, R. 2006. 'Breadth-first Heuristic Search'. Artificial Intelligence.
26. Bagchi, A. 1985. 'Three Approaches to Heuristic Search in Networks'. Journal of the ACM, 1st Jan.
27. Chandrasekaran, B., T. Johnson and R. Smit. 1992. 'Task-Structure Analysis for Knowledge Modeling (Technical)'. Communications of the ACM, Sept.
UNIT 3 KNOWLEDGE REPRESENTATION ISSUES

Structure
3.0 Introduction
3.1 Unit Objectives
3.2 Knowledge Representation
    3.2.1 Approaches to AI Goals
    3.2.2 Fundamental System Issues
    3.2.3 Knowledge Progression
    3.2.4 Knowledge Model
    3.2.5 Knowledge Typology Map
3.3 Representation and Mappings
    3.3.1 Framework of Knowledge Representation
    3.3.2 Representation of Facts
    3.3.3 Using Knowledge
3.4 Approaches to Knowledge Representation
    3.4.1 Properties for Knowledge Representation Systems
    3.4.2 Simple Relational Knowledge
3.5 Issues in Knowledge Representation
    3.5.1 Important Attributes and Relationships
    3.5.2 Granularity
    3.5.3 Representing Set of Objects
    3.5.4 Finding Right Structure
3.6 The Frame Problem
    3.6.1 The Frame Problem in Logic
3.7 Summary
3.8 Key Terms
3.9 Questions and Exercises
3.10 Further Reading

3.0 INTRODUCTION

In the preceding unit, you learnt about problems, their definitions, characteristics and the design of search programs. In this unit, you will learn about the various aspects of knowledge representation. Let us start with the observation that we have no perfect method of knowledge representation today. This stems largely from our ignorance of just what knowledge is. Nevertheless, many methods have been worked out and used by AI researchers. You will learn that the knowledge representation problem concerns the mismatch between human and computer 'memory', i.e., how to encode knowledge so that it is a faithful reflection of the expert's knowledge and can be manipulated by computers. We call these representations of knowledge 'knowledge bases', and the manipulative operations on these knowledge bases are known as inference engine programs. In this unit, you will also learn about the frame problem, which is the problem of representing the facts that change as well as those that do not change. For example, consider a table with a plant on it under a window.
Suppose it is moved to the centre of the room. Here, it must be inferred that the plant is now in the centre but the window is not.

3.1 UNIT OBJECTIVES

After going through this unit, you will be able to:
• Explain the different knowledge representation issues
• Understand representation and mapping
• Describe the various approaches to knowledge representation
• Explain the issues in knowledge representation
• Describe the frame problem

3.2 KNOWLEDGE REPRESENTATION

Knowledge may be expressed in a formal notation, for example:

    ∀x: animal(x) → eat(x, fruit) ∨ eat(x, meat)

Some basic questions arise at once:
• What is knowledge?
• How do we search through knowledge?
• How do we get knowledge?
• How do we get a computer to understand knowledge?

What AI researchers call 'knowledge' appears as data at the level of programming. Data becomes knowledge when a computer program represents and uses the meaning of some data. Many knowledge-based programs are written in the LISP programming language, which is designed to manipulate data as symbols.

Knowledge may be declarative or procedural. Declarative knowledge is represented as a static collection of facts with a set of procedures for manipulating the facts. Procedural knowledge is described by an executable code that performs some action. Procedural knowledge refers to 'how to' do something. Usually, there is a need for both kinds of knowledge representation to capture and represent knowledge in a particular domain.

First-order predicate calculus (FOPC) is the best-understood scheme for knowledge representation and reasoning. In FOPC, knowledge about the world is represented as objects and relations between objects. Objects are real-world things that have individual identities and properties, which are used to distinguish them from other objects. In a first-order predicate language, knowledge about the world is expressed in terms of sentences that are subject to the language's syntax and semantics.
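The split between declarative facts and procedural manipulation described above can be sketched in Python. This is a minimal sketch, not FOPC proper: the triple format and the facts themselves are illustrative assumptions.

```python
# Declarative knowledge: a static collection of facts, stored as
# (predicate, subject, object) triples.

FACTS = {
    ("is_a", "guitar", "string-instrument"),
    ("is_a", "trumpet", "brass-instrument"),
    ("plays", "vinay", "guitar"),
}

def query(predicate, subject=None, obj=None):
    """Procedural knowledge: how to retrieve matching facts.

    Arguments left as None act as wildcards, like unbound variables."""
    return [f for f in FACTS
            if f[0] == predicate
            and (subject is None or f[1] == subject)
            and (obj is None or f[2] == obj)]
```

For instance, `query("is_a", "guitar")` retrieves what kind of thing a guitar is, while `query("plays")` retrieves every playing relation; the facts stay declarative while the querying procedure does the work.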
3.2.1 Approaches to AI Goals

There are four approaches to the goals of AI: (1) computer systems that act like humans; (2) programs that simulate the human mind; (3) knowledge representation and mechanistic reasoning; and (4) intelligent or rational agent design. The first two approaches focus on studying humans and how they solve problems, while the latter two focus on studying real-world problems and developing rational solutions regardless of how a human would solve the same problems.

Programming a computer to act like a human is a difficult task. It requires that the computer system be able to understand and process commands in natural language, store knowledge, retrieve and process that knowledge in order to derive conclusions and make decisions, learn to adapt to new situations, perceive objects through computer vision, and have robotic capabilities to move and manipulate objects. Although this approach was inspired by the Turing Test, most programs have been developed with the goal of enabling computers to interact with humans in a natural way rather than passing the Turing Test.

Some researchers focus instead on developing programs that simulate the way in which the human mind works on problem-solving tasks. The first attempts to imitate human thinking were the Logic Theorist and General Problem Solver programs developed by Allen Newell and Herbert Simon. Their main interest was in simulating human thinking rather than solving problems correctly. Cognitive science is the interdisciplinary field that studies the human mind and intelligence. The basic premise of cognitive science is that the mind uses representations that are similar to computer data structures, and computational procedures that are similar to computer algorithms that operate on those structures.
Other researchers focus on developing programs that use logical notation to represent a problem and use formal reasoning to solve it. This is called the 'logicist approach' to developing intelligent systems. Such programs require huge computational resources to create vast knowledge bases and to perform complex reasoning algorithms. Researchers continue to debate whether this strategy will lead to computer problem solving at the level of human intelligence.

Still other researchers focus on the development of 'intelligent agents' within computer systems. Russell and Norvig define these agents as 'anything that can be viewed as perceiving its environment through sensors and acting upon that environment through effectors'. The goal for computer scientists working in this area is to create agents that incorporate information about the users and the use of their systems into the agents' operations.

3.2.2 Fundamental System Issues

A robust AI system must be able to store knowledge, apply that knowledge to the solution of problems and acquire new knowledge through experience. Among the challenges that face researchers in building AI systems, three are fundamental: knowledge representation, reasoning and search, and learning.

How do we represent what we know?
• Knowledge is a general term. An answer to the question 'how to represent knowledge' requires an analysis to distinguish between knowledge 'how' and knowledge 'that':
  ■ Knowing 'how to do something', e.g. 'how to drive a car', is procedural knowledge.
  ■ Knowing 'that something is true or false', e.g. knowing the speed limit for a car on a motorway, is declarative knowledge.
• Knowledge and representation are distinct entities that play central but distinguishable roles in an intelligent system:
  ■ Knowledge is a description of the world. It determines a system's competence by what it knows.
  ■ Representation is the way knowledge is encoded. It defines the performance of a system in doing something.
• Different types of knowledge require different kinds of representation. Knowledge representation models/mechanisms are often based on:
  ■ Logic
  ■ Rules
  ■ Frames
  ■ Semantic nets
• Different types of knowledge require different kinds of reasoning.

Knowledge is a general term. Knowledge is a progression that starts with data, which is of limited utility. By organizing or analysing the data, we understand what the data means, and this becomes information. The interpretation or evaluation of information yields knowledge. An understanding of the principles within the knowledge is wisdom.

3.2.3 Knowledge Progression

[Figure 3.1 Knowledge Progression: data becomes information through organizing and analysing; information becomes knowledge through interpretation; knowledge evolves into wisdom through understanding of principles.]

Data is viewed as a collection of disconnected facts.
Eg: It is raining.

Information emerges when relationships among facts are established and understood. Providing answers to 'who', 'what', 'where' and 'when' gives the relationships among facts.
Eg: The temperature dropped 15 degrees and it started raining.

Knowledge emerges when relationships among patterns are identified and understood. Answering 'how' gives the relationships among patterns.
Eg: If the humidity is very high and the temperature drops substantially, the atmosphere is unlikely to hold the moisture, so it rains.

Wisdom is the understanding of the principles of the relationships that describe patterns. Providing the answer to 'why' gives an understanding of the interactions between patterns.
Eg: Understanding all the interactions that may happen between raining, evaporation, air currents, temperature gradients and changes.

Let us look at the various kinds of knowledge that an AI system might need to represent:

Objects
Objects are defined through facts.
Eg: Guitars have strings; trumpets are brass instruments.
Events
Events are actions.
Eg: Vinay played the guitar at the farewell party.

Performance
Playing the guitar involves performance knowledge: knowledge about how to do things.

Meta-knowledge
Knowledge about what we know.

To solve problems in AI, we must represent knowledge and must deal with these entities.

Facts
Facts are truths about the real world and what we represent. This can be considered the knowledge level.

3.2.4 Knowledge Model

The knowledge model holds that as the level of 'connectedness' and 'understanding' increases, we progress from data through information and knowledge to wisdom. The model represents transitions and understanding:
• The transitions are from data to information, from information to knowledge, and finally from knowledge to wisdom;
• Understanding supports the transition from one stage to the next.

The distinctions between data, information, knowledge and wisdom are not very discrete. They are more like shades of grey rather than black and white. Data and information deal with the past and are based on gathering facts and adding context. Knowledge deals with the present and enables us to perform. Wisdom deals with the future: a vision of what will be, rather than what is or was.

[Figure 3.2 Knowledge Model: data, information, knowledge and wisdom plotted against increasing degrees of connectedness and understanding, with the transitions labelled understanding relations, understanding patterns and understanding principles.]

3.2.5 Knowledge Typology Map

According to the typology map:
1. Tacit knowledge derives from experience, doing action and subjective insight.
2. Explicit knowledge derives from principles, procedures, processes and concepts.
3. Information derives from data, context and facts.
4. Knowledge conversion proceeds through internalization, socialization, combination and externalization.
Finally, knowledge derives from tacit knowledge, explicit knowledge and information.

Let us see the contents of the typology map:

Facts are data or instances that are specific and unique.
Concepts are classes of items, words or ideas that are known by a common name and share common features.
Processes are flows of events or activities that describe how things work, rather than how to do things.
Procedures are series of step-by-step actions and decisions that give the result of a task.
Principles are guidelines, rules and parameters that govern or monitor.

[Figure 3.3 Knowledge Typology Map: tacit knowledge (from experience, doing action and subjective insight), explicit knowledge (from principles, procedures, processes and concepts) and information (from data, context and facts) feed into knowledge, which undergoes knowledge conversion through internalization, socialization, combination and externalization, leading towards wisdom.]

Principles are the basic building blocks of theoretical models and allow for making predictions and drawing implications. These artifacts support the knowledge creation process for two knowledge types, declarative and procedural, which are explained below.

• Knowledge Type

Cognitive psychologists sort knowledge into declarative and procedural categories, and some researchers add strategic as a third category.

Procedural knowledge                          Declarative knowledge
• Examples: procedures, rules,                • Examples: concepts, objects, facts,
  strategies, agendas, models.                  propositions, assertions, semantic
                                                nets, logic and descriptive models.
• Focuses on tasks that must be               • Refers to representations of objects
  performed to reach a particular               and events; knowledge about facts
  objective or goal.                            and relationships.
• Knowledge about 'how to do                  • Knowledge about 'that something is
  something'; e.g., to determine if             true or false'; e.g., a car has four
  Peter or Robert is older, first               tyres; Peter is older than Robert.
  find their ages.
Note: About procedural knowledge, there is some disparity in views. One view is that it is close to tacit knowledge: it manifests itself in the doing of something, yet cannot be expressed in words; e.g., we read faces and moods. Another view is that it is close to declarative knowledge: the difference is that a task or method is described instead of facts or things. All declarative knowledge is explicit knowledge; it is knowledge that can be and has been articulated. Strategic knowledge is considered to be a subset of declarative knowledge.

3.3 REPRESENTATION AND MAPPINGS

When we collect knowledge, we face the problem of how to record it. And when we try to build a knowledge base, we face the similar problem of how to represent it. We could just write down what we are told, but as the information grows it becomes more and more difficult to keep track of the relationships between the items. Let us start with the observation that we have no perfect method of knowledge representation today. This stems largely from our ignorance of just what knowledge is. Nevertheless, many methods have been worked out and used by AI researchers.

The knowledge representation problem concerns the mismatch between human and computer 'memory', i.e., how to encode knowledge so that it is a faithful reflection of the expert's knowledge and can be manipulated by the computer. We call these representations of knowledge 'knowledge bases', and the manipulative operations on these knowledge bases 'inference engine programs'.

What to Represent?

Facts: Truths about the real world and what we represent. This can be regarded as the base knowledge level.

Representation of the facts: That which we manipulate. This can be regarded as the symbol level, since we usually define the representation in terms of symbols that can be manipulated by programs.

Simple representation: A simple way to store facts.
• Each fact about a set of objects is set out systematically in columns.
• Little opportunity for inference.
• Knowledge basis for inference engines.

3.3.1 Framework of Knowledge Representation

A computer requires a well-defined problem description in order to process it and provide a well-defined, acceptable solution. To collect fragments of knowledge, we first need to formulate a description in our spoken language and then represent it in a formal language so that the computer can understand it. The computer can then use an algorithm to compute an answer. This process is illustrated below.

Fig. 3.4 Knowledge Representation Framework (the problem is interpreted into an informal representation, then into a formal representation; the computer computes a formal output, which is interpreted back into a solution)

The steps are:
• The informal formalization of the problem takes place first.
• It is then represented formally, and the computer produces an output.
• This output can then be rendered as an informally described solution that the user understands or checks for consistency.

Note: Problem solving requires
• formal knowledge representation, and
• conversion of informal (implicit) knowledge to formal (explicit) knowledge.

Knowledge and Representation

Problem solving requires a large amount of knowledge and some mechanism for manipulating that knowledge. Knowledge and representation are distinct entities and play central but distinguishable roles in an intelligent system.
■ Knowledge is a description of the world; it determines a system's competence by what it knows.
■ Representation is the way knowledge is encoded; it defines the system's performance in doing something.

In simple words, we:
• need to know about the things we want to represent, and
• need some means by which we can manipulate them.
• know things to represent
  ■ Objects – facts about objects in the domain
  ■ Events – actions that occur in the domain
  ■ Performance – knowledge about how to do things
  ■ Meta-knowledge – knowledge about what we know
• need means to manipulate
  ■ Requires some formalism for what we represent

Thus, knowledge representation can be considered at two levels:
(a) the knowledge level, at which facts are described, and
(b) the symbol level, at which the representations of the objects, defined in terms of symbols, can be manipulated in programs.

Note: A good representation enables fast and accurate access to knowledge and understanding of the content.

3.3.2 Representation of Facts

Representations of facts are those which we manipulate. This can be regarded as the symbol level, since we normally define the representation in terms of symbols that can be manipulated by programs. We can structure these entities in two levels:

The knowledge level, at which the facts are described.
The symbol level, at which representations of objects are defined in terms of symbols that can be manipulated in programs.

Fig. 3.5 Two Entities in Knowledge Representation

Natural language (or English) is one way of representing and handling the facts. Logic enables us to consider the following fact:

Spot is a dog    represented as    dog(Spot)

We infer that all dogs have tails:

dog(x) → has_a_tail(x)

The logical conclusion is

has_a_tail(Spot)

Using a backward mapping function, the sentence 'Spot has a tail' can be generated. The available mapping functions are not always one-to-one but are often many-to-many, which is a characteristic of English representations. The sentences 'All dogs have tails' and 'Every dog has a tail' both say that each dog has a tail, but from the first sentence one could also read that each dog has more than one tail; try substituting 'teeth' for 'tails'.
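The inference above can be sketched as a single forward-chaining step. The data structures are illustrative choices, not a standard library: facts are stored as (predicate, argument) pairs, and the rule dog(x) → has_a_tail(x) is applied to every matching fact.

```python
# Facts at the symbol level: (predicate, argument) pairs.
facts = {("dog", "Spot")}

def forward_chain(facts):
    """Apply the rule dog(x) -> has_a_tail(x) to all matching facts."""
    derived = {("has_a_tail", x) for (p, x) in facts if p == "dog"}
    return facts | derived

facts = forward_chain(facts)
print(("has_a_tail", "Spot") in facts)  # True
```

A backward mapping function would then render the derived symbol-level fact as the English sentence 'Spot has a tail'.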
When an AI program manipulates the internal representation of facts, the new representations should also be interpretable as representations of facts. Consider the classic problem of the mutilated chessboard. In this problem, two opposite corner squares of a normal chessboard have been removed. The task is to cover all the squares on the remaining board with dominoes so that each domino covers exactly two squares; overlapping of dominoes is not allowed. Consider three data structures:

Fig. 3.6 Mutilated Chessboard

The first two data structures are shown in the diagrams above; the third data structure is simply the number of black squares and the number of white squares. The first diagram loses the colour of the squares, and a solution is not easy to see. The second preserves the colours but still offers no easy path to a solution. Counting the number of squares of each colour, giving 32 black and 30 white, immediately gives the negative answer: a domino must cover one white square and one black square, so the numbers of squares of each colour must be equal for a covering to exist.

3.3.3 Using Knowledge

We have briefly discussed above where we can use knowledge. Let us consider how knowledge can be used in various applications.

Learning
Acquiring knowledge is learning. It simply means adding new facts to a knowledge base. New data may have to be classified prior to storage for easy retrieval, and integrated with the existing facts through interaction and inference, avoiding redundancy and replication in the knowledge base. These facts should also be kept up to date.

Retrieval
The representation scheme used has a critical effect on the efficiency of retrieval. Humans are very good at it, and many AI methods have tried to model humans.

Reasoning
Inferring facts from the existing data. Suppose a system only knows that:
• Ravi is a jazz musician.
• All jazz musicians can play their instruments well.
If the question is 'Is Ravi a jazz musician?' or 'Can jazz musicians play their instruments well?', the answer is easy to obtain from the data structures and procedures. However, a question like 'Can Ravi play his instrument well?' requires reasoning. The activities above are all related; for example, it is fairly obvious that learning and reasoning involve retrieval.

3.4 APPROACHES TO KNOWLEDGE REPRESENTATION

We discuss the various concepts relating to approaches to knowledge representation in the following sections.

3.4.1 Properties of Knowledge Representation Systems

A knowledge representation system should possess the following properties:
• Representational adequacy
• Inferential adequacy
• Inferential efficiency
• Acquisitional efficiency
• Well-defined syntax and semantics
• Naturalness
• Handling the frame problem

3.4.1.1 Representational adequacy
A knowledge representation scheme must be able to actually represent the knowledge appropriate to our problem.

3.4.1.2 Inferential adequacy
A knowledge representation scheme must allow us to make new inferences from old knowledge: the ability to manipulate the knowledge represented so as to produce new knowledge corresponding to that inferred from the original. It must make inferences that are:
• Sound – the new knowledge actually does follow from the old knowledge.
• Complete – it makes all the right inferences.
Soundness is usually easy, but completeness is very hard.

Example: Given the knowledge
  Tom is a man.
  All men are mortal.
the inference 'James is mortal' is not sound, whereas 'Tom is mortal' is sound.

3.4.1.3 Inferential efficiency
A knowledge representation scheme should be tractable, i.e., it should make inferences in reasonable time.
Unfortunately, any knowledge representation scheme with interesting expressive power is not going to be efficient; often, the more general knowledge representation schemes are less efficient. We have to use knowledge representation schemes tailored to the problem domain, i.e., less general but more efficient. Inferential efficiency also includes the ability to direct the inferential mechanisms in the most productive directions by storing appropriate guides.

3.4.1.4 Acquisitional efficiency
The ability to acquire new knowledge using automatic methods wherever possible, rather than relying on human intervention. To date, no single system optimizes all of these properties. We will now discuss some of the representation schemes.

3.4.1.5 Well-defined syntax and semantics
It should be possible to tell:
• whether any construction is 'grammatically correct', and
• how to read any particular construction, i.e., there should be no ambiguity.
Thus a knowledge representation scheme should have a well-defined syntax. It should also be possible to determine precisely, for any given construction, exactly what its meaning is; thus a knowledge representation scheme should have well-defined semantics. Syntax is easy, but semantics is hard for a knowledge representation scheme.

3.4.1.6 Naturalness
A knowledge representation scheme should closely correspond to our way of thinking, reading and writing, and should allow a knowledge engineer to read and check the knowledge base. Put simply, knowledge representation is the problem of representing what the computer knows.

3.4.1.7 Frame problem
The frame problem is the problem of representing the facts that change as well as those that do not change. For example, consider a table with a plant on it under a window. Suppose we move the table to the centre of the room. We must infer that the plant is now in the centre of the room but the window is not. Frame axioms are used to describe all the things that do not change when an operator is applied in one state to go to the next state, say n + 1.
3.4.2 Simple Relational Knowledge
• A simple way to store facts.
• Each fact about a set of objects is set out systematically in columns.
• Little opportunity for inference.
• Knowledge basis for inference engines.

  Musician    Style          Instrument    Age
  Ravi        Jazz           Trumpet       36
  John        Avant Garde    Saxophone     35
  Robert      Rock           Guitar        43
  Krishna     Jazz           Guitar        47

Fig. 3.7 Simple Relational Knowledge

We can ask things like:
• Who is dead?
• Who plays jazz/trumpet, etc.?
This sort of representation is popular in database systems.

3.4.2.1 Inheritable knowledge

Fig. 3.8 Property Inheritance Hierarchy (an isa hierarchy: Adult Male, with age 35, is the parent of Musician; Jazz and Avant Garde/Jazz are specializations; Ravi and John are instances, with bands such as Ravi's Group, Ravi's Quintet and Naked City)

Relational knowledge is made up of objects consisting of
• attributes, and
• corresponding associated values.

We can extend the base by allowing inference mechanisms:
• Property inheritance
  ■ Elements inherit values from being members of a class.
  ■ Data must be organized into a hierarchy of classes.
• Boxed nodes – objects and values of attributes of objects.
• Values can be objects with attributes, and so on.
• Arrows – point from an object to its value.
• This structure is known as a slot-and-filler structure, a semantic network, or a collection of frames.

The following algorithm retrieves a value for an attribute of an instance object:

Algorithm: Property Inheritance
1. Find the object in the knowledge base.
2. If there is a value for the attribute, report it.
3. Otherwise, look for a value of instance; if none, fail.
4. Otherwise, go to that node and find a value for the attribute, and then report it.
5. Otherwise, search upward through isa links until a value is found for the attribute.

3.4.2.2 Inferential knowledge

Represent knowledge as formal logic:
All dogs have tails: dog(x) → has_a_tail(x)

Advantages:
• A set of strict rules:
  ■ can be used to derive more facts,
  ■ truths of new statements can be verified,
  ■ guaranteed correctness.
• Many inference procedures are available to implement standard rules of logic.
• Popular in AI systems, e.g., automated theorem proving.

3.4.2.3 Procedural knowledge

The basic idea of procedural knowledge is knowledge encoded in procedures: small programs that know how to do specific things and how to proceed. For example, a parser in a natural language understander has the knowledge that a noun phrase may contain articles, adjectives and nouns; this is represented by calls to routines that know how to process articles, adjectives and nouns.

Advantages:
• Heuristic or domain-specific knowledge can be represented.
• Extended logical inferences, such as default reasoning, are facilitated.
• Side effects of actions may be modelled. Some rules may become false over time; keeping track of this in large systems may be tricky.

Disadvantages:
• Completeness – not all cases may be represented.
• Consistency – not all deductions may be correct.
• Modularity is sacrificed.
• Cumbersome control information.

3.5 ISSUES IN KNOWLEDGE REPRESENTATION

The issues in knowledge representation are discussed below.

3.5.1 Important Attributes and Relationships

There are two especially important attributes in knowledge representation: instance and isa. They are important because each supports property inheritance. What about the relationships between the attributes of an object, such as inverses, existence, techniques for reasoning about values, and single-valued attributes? Consider an example of an inverse:

band(John, Naked City)

This can be read as 'John plays in the band Naked City' or 'John's band is Naked City'.
Another representation is
  band = Naked City
  band-members = John, Robert, ...

The relationships between the attributes of an object are independent of the specific knowledge they encode, and may hold properties like: inverses, existence in an isa hierarchy, techniques for reasoning about values, and single-valued attributes.

Inverses
This is about consistency checking when a value is added to one attribute. Entities are related to each other in many different ways. The figure shows attributes (isa, instance and team), each with a directed arrow originating at the object being described and terminating either at another object or at its value.

Existence in an isa hierarchy
This is about generalization and specialization: classes of objects, specialized subsets of those classes, their attributes, and specializations of attributes. E.g.: height is a specialization of a general attribute; physical-size is a specialization of physical-attribute. These generalization-specialization relationships are important for attributes because they support inheritance.

Techniques for reasoning about values
This is about reasoning over values of attributes that are not given explicitly. Several kinds of information are used in such reasoning; e.g., a height must be in a unit of length, and the age of a person cannot be greater than the ages of the person's parents. Such values are often specified when a knowledge base is created.

Single-valued attributes
This is about a specific attribute that is guaranteed to take a unique value. For example, a player can be a member of only one team. KR systems take different approaches to providing support for single-valued attributes.

3.5.2 Granularity

At what level should the knowledge be represented, and what are the primitives?

Choosing the Granularity of Representation
Primitives are fundamental concepts, such as holding, seeing, playing.
English is a very rich language, with over half a million words, and we will clearly find it difficult to decide which words to choose as our primitives in a given series of situations.

If Syam feeds a dog, then it could be represented as:
  feeds(Syam, dog)
If Syam gives the dog a bone, it will be:
  gives(Syam, dog, bone)
Are these the same? In what sense does giving an object food constitute feeding? If
  give(x, food) → feed(x)
then we are making progress. But we need to add certain inferential rules.

Consider the famous program on relationships: Ravi is Sai's cousin. How do we represent this?
  Ravi = daughter(brother or sister(father or mother(Sai)))
Suppose instead it is Sree; then we do not know whether Sree is male or female, and so 'son' applies as well. Clearly, separate levels of understanding require different levels of primitives, and these need many rules to link apparently similar primitives together. Obviously there is a potential storage problem, and the underlying question must be: what level of comprehension is needed?

3.5.3 Representing Sets of Objects

There are certain properties of objects that are true of them as members of a set but not as individuals; for example, consider the assertions made in the sentences 'There are more sheep than people in India' and 'English speakers can be found all over the world.' The only way to describe these facts is to attach the assertions to the sets representing people, sheep and English speakers.

The reason to represent sets of objects is that if a property is true of all or most of the elements of a set, then it is more efficient to associate it with the set than to associate it explicitly with every element of the set. This is done
• in logical representation, through the use of a universal quantifier, and
• in hierarchical structures, where nodes represent sets and inheritance propagates a set-level assertion down to the individuals.
For example, for the assertion large(elephant), remember to make a clear distinction between
• asserting some property of the set itself, meaning 'the set of elephants is large', and
• asserting some property that holds for the individual elements of the set, meaning 'anything that is an elephant is large'.

There are three ways in which sets may be represented:
(a) by a name, for example the node Musician and the predicates Jazz and Avant Garde in logical representation (see Fig. 3.8);
(b) by an extensional definition, which lists the members; and
(c) by an intensional definition, which provides a rule that returns true or false depending on whether an object is in the set or not.

3.5.4 Finding the Right Structure

To access the right structure for describing a particular situation, we must select an initial structure and then revise the choice. While doing so, it is necessary to solve the following problems:
• how to perform an initial selection of the most appropriate structure,
• how to fill in appropriate details from the current situation,
• how to find a better structure if the one chosen initially turns out not to be appropriate,
• what to do if none of the available structures is appropriate, and
• when to create and remember a new structure.

There is no good, general-purpose method for solving all these problems; some knowledge representation techniques solve some of them. Let us consider two of the problems:
1. How to select an initial structure
2. How to find a better structure

(1) Selecting an initial structure
There are three important approaches to the selection of an initial structure.
(a) Index the structures directly by specific English words. Let each verb have its own structure that describes its meaning.
Disadvantages:
(i) Many words may have several different meanings. E.g.:
  1. John flew to Delhi.
  2. John flew the kite.
'flew' has a different meaning in each of the two contexts.
(ii) It is useful only when there is an English description of the problem.
(b) Consider each major concept as a pointer to all of the structures in which it is involved. Examples:
  1. The concept Steak might point to two scripts, one for the restaurant and the other for the supermarket.
  2. The concept Bill might point to a restaurant script or a shopping script.
We then take the intersection of these sets to get the structure that involves all the content words.
Disadvantages:
  1. If the problem contains extraneous concepts, the intersection will turn out empty.
  2. It may require a great deal of computation to compute all the possible sets and then to intersect them.
(c) Locate a major clue and use it to select an initial structure.
Disadvantages:
  1. We cannot identify a major clue in some situations.
  2. It is difficult to anticipate which clues are important and which are not.

(2) Revising the choice when necessary
Once we find a structure, if it does not seem to be appropriate, we opt for another choice. The different ways in which this can be done are:
1. Select fragments of the current structure and match them against the other alternatives available. Choose the best one.
2. Make an excuse for the failure of the current structure and continue to use it. There are heuristics, such as the fact that a structure is more plausible if a desired feature is missing than if it has an inappropriate feature. Example: a person with one leg is more plausible than a person with a tail.
3. Refer to specific links between the structures in order to follow new directions of exploration.
4. If the knowledge is represented in an isa hierarchy, then move upward until a structure is found that does not conflict with the evidence. Use this structure if it provides the required knowledge, or create a new structure below it.
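The upward movement through an isa hierarchy described in point 4 is essentially the property-inheritance algorithm of Section 3.4.2.1. A minimal sketch, using an illustrative knowledge base modelled loosely on the hierarchy of Fig. 3.8 (node names and values are assumptions):

```python
# Each node stores its local attribute values plus optional
# 'instance'/'isa' links up the hierarchy.
kb = {
    "Adult Male": {"age": 35},
    "Musician":   {"isa": "Adult Male"},
    "Jazz":       {"isa": "Musician"},
    "Ravi":       {"instance": "Jazz", "band": "Ravi's Group"},
}

def get_value(obj, attr):
    """Property inheritance: report a local value if present,
    otherwise climb the instance link, then isa links, until a
    value is found or the hierarchy is exhausted."""
    node = kb.get(obj)
    while node is not None:
        if attr in node:
            return node[attr]
        node = kb.get(node.get("instance") or node.get("isa"))
    return None

print(get_value("Ravi", "band"))  # local value: Ravi's Group
print(get_value("Ravi", "age"))   # inherited from Adult Male: 35
```

If no value is found anywhere on the path to the root, the algorithm fails (here, returns None), which is the cue to revise the choice of structure.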
3.6 THE FRAME PROBLEM

The frame problem is the challenge of representing the effects of actions in logic without having to represent explicitly a large number of intuitively obvious non-effects. For philosophers and AI researchers, the frame problem is suggestive of a wider epistemological issue: namely, whether it is possible, even in principle, to limit the scope of the reasoning required to derive the consequences of an action.

Check Your Progress
1. What AI researchers call 'knowledge' appears as data at the level of programming. (True or False)
2. Knowledge is a description of the world. It determines a system's competence by what it knows. (True or False)
3. What is a typology map?

3.6.1 The Frame Problem in Logic

Using mathematical logic, how is it possible to write formulae that describe the effects of actions without having to write a large number of accompanying formulae that describe the mundane, obvious non-effects of those actions? Let us look at an example. The difficulty can be illustrated without the full apparatus of formal logic, but it should be borne in mind that the devil is in the mathematical details. Suppose we write two formulae, one describing the effects of painting an object and the other describing the effects of moving an object:

1. colour(x, c) holds after paint(x, c)
2. position(x, p) holds after move(x, p)

Now suppose we have an initial situation in which colour(A, Red) and position(A, House) hold. According to the machinery of deductive logic, what then holds after the action paint(A, Blue) followed by the action move(A, Garden)? Intuitively, we would expect colour(A, Blue) and position(A, Garden) to hold. Unfortunately, this is not what follows. If written out more formally in classical predicate logic, using a suitable formalism for representing time and action such as the situation calculus, the two formulae above yield only the conclusion that position(A, Garden) holds.
This is because they do not rule out the possibility that the colour of A gets changed by the Move action. The most obvious way to augment such a formalization so that the right common sense conclusions fall out is to add a number of formulae that explicitly describe the non-effects of each action. These formulae are called frame axioms. For our example, we need a pair of frame axioms:

3. colour(x, c) holds after move(x, p) if colour(x, c) held beforehand
4. position(x, p) holds after paint(x, c) if position(x, p) held beforehand

In other words, painting an object will not affect its position, and moving an object will not affect its colour. With the addition of these two formulae, all the desired conclusions can be drawn. However, this is not at all a satisfactory solution. Since most actions do not affect most properties of a situation, in a domain comprising M actions and N properties we will, in general, have to write out almost M × N frame axioms. Whether these formulae are destined to be stored explicitly in a computer's memory, or are merely part of the designer's specification, this is an unwelcome burden. The challenge is to find a way to capture the non-effects of actions more succinctly in formal logic.

What we need is some way of declaring the general rule of thumb that an action can be assumed not to change a given property of a situation unless there is evidence to the contrary. This default assumption is known as the common sense law of inertia. The technical frame problem can be viewed as the task of formalizing this law. The main obstacle to doing so is the monotonicity of classical logic: in classical logic, the set of conclusions that can be drawn from a set of formulae always increases with the addition of further formulae. This makes it impossible to express a rule that has an open-ended set of exceptions, and the common sense law of inertia is just such a rule.
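Although the common sense law of inertia is hard to state in classical logic, it is trivial to state computationally: when an action is applied, copy the old situation unchanged and overwrite only the property the action's effect axiom mentions. A sketch of the paint/move example under this reading (the dictionary representation is an illustrative choice, not part of the formal treatment):

```python
# Each situation is a dictionary of properties of object A.
# Applying an action copies the situation (the 'law of inertia')
# and then changes only the property the action affects.

def paint(state, colour):
    new = dict(state)          # everything persists by default ...
    new["colour"] = colour     # ... except the painted colour
    return new

def move(state, position):
    new = dict(state)
    new["position"] = position
    return new

s0 = {"colour": "Red", "position": "House"}
s1 = move(paint(s0, "Blue"), "Garden")
print(s1)  # {'colour': 'Blue', 'position': 'Garden'}
```

In logic, by contrast, nothing says that the unmentioned properties persist; that is exactly the gap the frame axioms, or a non-monotonic default, must fill.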
Researchers in logic-based AI have put a lot of effort into developing a variety of non-monotonic reasoning formalisms, such as circumscription, and into investigating their application to the frame problem. None of this has turned out to be at all straightforward. One of the most troublesome barriers to progress was highlighted in the so-called Yale shooting problem, a simple scenario that gives rise to counter-intuitive conclusions if naively represented with a non-monotonic formalism. To make matters worse, a full solution needs to work in the presence of concurrent actions, actions with non-deterministic effects, continuous change and actions with indirect ramifications. In spite of these subtleties, a number of solutions to the technical frame problem now exist that are adequate for logic-based AI research. Although improvements and extensions continue to be found, it is fair to say that the dust has settled, and that the frame problem, technically, is more or less solved.

3.7 SUMMARY

In this unit, you have learned about the various aspects of knowledge representation. There is no perfect method of knowledge representation today. This stems largely from our ignorance of just what knowledge is. Nevertheless, many methods have
been worked out and used by AI researchers. You have also learned about mapping. The knowledge representation problem concerns the mismatch between human and computer 'memory', i.e., how to encode knowledge so that it is a faithful reflection of the expert's knowledge and can be manipulated by the computer. For mapping, we call these representations of knowledge 'knowledge bases', and the manipulative operations on these knowledge bases 'inference engine programs'. You have also learned about the frame problem, which is the problem of representing the facts that change as well as those that do not change.

In this unit, you have also studied knowledge representation and reasoning formalisms, which allow new conclusions to be drawn from the knowledge we have. It might seem that we now have enough to build a knowledge-based agent. However, propositional logic is a weak language; there are many things it cannot express. To express knowledge about objects, their properties and the relationships that exist between objects, we need a more expressive language: first-order logic.

3.8 KEY TERMS

• Knowledge: A general term. Knowledge is a progression that starts with data, which is of limited utility. By organizing or analysing the data, we understand what the data means, and this becomes information. The interpretation or evaluation of information yields knowledge.
• Information: It emerges when relationships among facts are established and understood. Providing answers to 'who', 'what', 'where' and 'when' gives the relationships among facts.
• Wisdom: The understanding of the principles of relationships that describe patterns. Providing the answer to 'why' gives understanding of the interaction between patterns.
• Frame problem: The problem of representing the facts that change as well as those that do not change.

3.9 QUESTIONS AND EXERCISES

1. What do AI researchers mean by 'knowledge'?
2. In which language are many knowledge-based programs written?
3. Knowledge is a description of the world. It determines a system's competence by what it knows. Explain.
4. Discuss the knowledge model.
5. Explain the key features of the frame problem.

3.10 FURTHER READING

1. www.plato.standford.edu
2. www.indiastudychannel.com
3. www.egeria.cs.cf.ac.uk
4. www.cs.cardiff.ac.uk
5. www.guthulamurali.freeservers.com
6. www.meanderingpassage.com
7. www.jpmf.house.cern.ch
8. www.mm.iit.uni-miskolc.hu
9. www.bookrags.com
10. www.stewardess.inhatc.ac.kr
11. www.seop.leeds.ac.uk
12.
www.nwlink.com

UNIT 4 USING PREDICATE LOGIC

Structure
4.0 Introduction
4.1 Unit Objectives
4.2 The Basics in Logic
4.3 Representing Simple Facts in Logic
    4.3.1 Logic
4.4 Representing Instance and Isa Relationships
4.5 Computable Functions and Predicates
4.6 Resolution
    4.6.1 Algorithm: Convert to Clausal Form
    4.6.2 The Basis of Resolution
    4.6.3 Resolution in Propositional Logic
    4.6.4 The Unification Algorithm
    4.6.5 Resolution in Predicate Logic
4.7 Natural Deduction
4.8 Summary
4.9 Key Terms
4.10 Answers to 'Check Your Progress'
4.11 Questions and Exercises
4.12 Further Reading

4.0 INTRODUCTION

In the preceding unit, you learnt about the various aspects of knowledge representation. In this unit, you will learn about the various concepts relating to the use of predicate logic, and about how to represent simple facts in logic. The unit will also discuss resolution and natural deduction.

4.1 UNIT OBJECTIVES

After going through this unit, you will be able to:
• Define the types of logic
• Discuss the use of predicate logic
• Describe the representation of simple facts in logic
• Discuss the representation of instance and isa relationships
• Analyse resolution and natural deduction

4.2 THE BASICS IN LOGIC

Logic is a language for reasoning; a collection of rules is used in logical reasoning. We briefly discuss the following:

Types of logic
• Propositional logic – the logic of sentences
• Predicate logic – the logic of objects
• Logic involving uncertainties
• Fuzzy logic – dealing with fuzziness
• Temporal logic, etc.

Propositional logic and predicate logic are fundamental to all logic.

Propositional logic
• Propositions are 'sentences', either true or false but not both.
• A sentence is the smallest unit in propositional logic.
• If a proposition is true, then its truth value is 'true'; else it is 'false'.
• Ex: Sentence – 'Grass is green'; truth value – 'true'; it is a proposition.

Predicate logic
• A predicate is a function; it may be true or false for its arguments.
• Predicate logic consists of rules that govern quantifiers.
• Predicate logic is propositional logic extended with quantifiers.
Ex: 'The sky is blue', 'The cover of this book is blue'. The predicate is 'is blue'; give it the name 'B'. The sentence is represented as 'B(x)', read as 'x is blue'; the object is represented as 'x'.

Mathematical predicate logic is an important area of Artificial Intelligence:
• Predicate logic is one of the major knowledge representation and reasoning methods.
• Predicate logic serves as a standard of comparison for other representation and reasoning methods.
• Predicate logic has a sound mathematical basis.
• The PROLOG language is based on logic.

The form of predicate logic most commonly used in AI is first-order predicate calculus. Predicate logic requires that certain strong conditions be satisfied by the data being represented:
• Discrete objects: The objects represented must be discrete individuals, such as people or trucks, but not 1 ton of sugar.
• Truth or falsity: Propositions must be entirely true or entirely false; inaccuracy or degrees of belief are not representable.
• Non-contradiction: Not only must the data not be contradictory, but facts derivable by rules must not contradict either.

In many applications, the information to be encoded in a database originates from descriptive sentences that are difficult and unnatural to represent by simple structures such as arrays or sets. Hence logic is used to represent these sentences; we call this predicate calculus, which represents real-world facts as logical propositions written as well-formed formulas.

Predicate logic is one in which a statement from a natural language such as English is translated into symbolic structures comprising predicates, functions, variables, constants, quantifiers and logical connectives.
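The idea that a predicate is a function which is true or false of its arguments can be sketched directly in Python. This is an illustrative aside, not part of the original text; the predicates is_blue and loves, and their small 'universes', are made up for the example:

```python
# A one-place predicate: a property of objects, B(x) = 'x is blue'.
def is_blue(x):
    # Hypothetical lookup table standing in for the real world.
    return x in {"sky", "cover of this book"}

# A two-place predicate: a relation between objects, loves(a, b).
def loves(a, b):
    return (a, b) in {("Rama", "Sita")}

print(is_blue("sky"))          # B(sky) holds
print(is_blue("grass"))        # 'grass' is not in our blue set
print(loves("Rama", "Sita"))   # loves(Rama, Sita) holds
```

Each predicate returns a truth value for given objects, exactly as 'B(x)' is read 'x is blue'.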
The syntax of predicate logic is determined by the symbols that are allowed and the rules for combining them. The semantics is determined by the interpretations assigned to predicates. The symbols and rules of combination of a predicate logic are:
(1) Predicate symbol (2) Constant symbol (3) Variable symbol (4) Function symbol
In addition to these symbols, parentheses, delimiters and commas are also used.

Predicate Symbol: This symbol is used to represent a relation in the domain of discourse. For example: Valmiki wrote Ramayana – wrote(Valmiki, Ramayana). Here, wrote is a predicate. E.g., Rama loves Sita – loves(Rama, Sita).

Constant Symbol: A constant symbol is a simple term and is used to represent objects or entities in the domain of discourse. These objects may be physical objects or people or anything we name. E.g., Rama, Sita, Ramayana, Valmiki.

Variable Symbol: These symbols are permitted to have indefinite values for the entity being referred to or specified. E.g., X loves Y – loves(X, Y). Here X and Y are variables.

Function Symbol: These symbols are used to represent a special type of relationship or mapping. E.g., Rama's father is married to Rama's mother – married(father(Rama), mother(Rama)). Here father and mother are functions and married is a predicate.

Constants, variables and functions are referred to as terms, and predicates are referred to as atomic formulas or atoms. The statements in predicate logic are termed well-formed formulas. A predicate with no variables is called a ground atom.

4.3 REPRESENTING SIMPLE FACTS IN LOGIC

4.3.1 Logic
Logic is concerned with the truth of statements about the world. Generally, each statement is either TRUE or FALSE. Logic includes: syntax, semantics and an inference procedure.
Syntax: Specifies the symbols in the language and how they can be combined to form sentences. The facts about the world are represented as sentences in logic.
Semantics: Specifies how to assign a truth value to a sentence based on its meaning in the world. It specifies what facts a sentence refers to. A fact is a claim about the world, and it may be TRUE or FALSE.
Inference Procedure: Specifies methods for computing new sentences from existing sentences.

Note: Facts are claims about the world that are True or False. A representation is an expression (sentence) that stands for the objects and relations. Sentences can be encoded in a computer program.

• Logic as a Knowledge Representation Language: Logic is a language for reasoning, a collection of rules used while doing logical reasoning. Logics are studied as KR languages in artificial intelligence.
• Logic is a formal system in which the formulas or sentences have true or false values.
• The problem of designing a KR language is a tradeoff between a language that is:
i. Expressive enough to represent the important objects and relations in a problem domain.
ii. Efficient enough in reasoning to answer questions about implicit information in a reasonable amount of time.
• Logics are of different types: propositional logic, predicate logic, temporal logic, modal logic, description logic, etc.; they differ in what they can represent and in how efficiently they allow inference.
• Propositional logic and predicate logic are fundamental to all logic. Propositional logic is the study of statements and their connectivity. Predicate logic is the study of individuals and their properties.

4.3.1.1 Logic representation
Facts are claims about the world that are TRUE or FALSE. Logic can be used to represent simple facts. To build a logic-based representation:
• The user defines a set of primitive symbols and the associated semantics.
• Logic defines ways of putting symbols together so that the user can define legal sentences in the language that represent TRUE facts.
• Logic defines ways of inferring new sentences from existing ones.
• Sentences that are either TRUE or FALSE, but not both, are called propositions.
• A declarative sentence expresses a statement with a proposition as its content. Example: the declarative sentence 'snow is white' expresses that snow is white; further, the proposition 'snow is white' is TRUE exactly when snow is white.
In this section, propositional logic (PL) is first briefly explained, and then predicate logic is illustrated in detail.

4.3.2 Propositional logic (PL)
A proposition is a statement, which in English would be a declarative sentence. Every proposition is either TRUE or FALSE. Examples: (a) The sky is blue. (b) Snow is cold. (c) 12 × 12 = 144.
• Propositions are 'sentences', either true or false but not both.
• A sentence is the smallest unit in propositional logic.
• If a proposition is true, its truth value is 'true'; if a proposition is false, its truth value is 'false'.

Example:

Sentence                          Truth value    Proposition (Y/N)
'Grass is green'                  true           Yes
'2 + 5 = 5'                       false          Yes
'Close the door'                  –              No
'Is it hot outside?'              –              No
'x > 2', where x is a variable    –              No (since x is not defined)
'x = x'                           –              No (we do not know what 'x' and '=' are; '3 = 3', 'air is equal to air' or 'water is equal to water' has no meaning)

– Propositional logic is fundamental to all logic.
– Propositional logic is also called propositional calculus, sentential calculus, or Boolean algebra.
– Propositional logic describes the ways of joining and/or modifying entire propositions, statements or sentences to form more complicated propositions, statements or sentences, as well as the logical relationships and properties that are derived from these methods of combining or altering statements.

Statements, variables and symbols
These, and a few more related terms such as connective, truth value, contingency, tautology, contradiction, antecedent, consequent and argument, are explained below.
– Statement: Simple statements (sentences), TRUE or FALSE, that do not contain any other statement as a part, are basic propositions; lower-case letters p, q, r are symbols for simple statements. Large, compound or complex statements are constructed from basic propositions by combining them with connectives.
– Connective or Operator: The connectives join simple statements into compounds, and join compounds into larger compounds. Table 4.1 indicates the five basic connectives and their symbols:
– Listed in decreasing order of operation priority;
– An operation with higher priority is solved first.
Example of a formula: ((a ∧ ¬b) ∨ c → d) ↔ ¬(a ∨ c)

Table 4.1 Connectives and Symbols in Decreasing Order of Operation Priority

Connective    Symbols                        Read as
assertion     p                              'p is true'
negation      ¬p, ~p, !p                     NOT; 'p is false'
conjunction   p ∧ q, p · q, p && q, p & q    AND; 'both p and q are true'
disjunction   p ∨ q, p || q, p | q           OR; 'either p is true, or q is true, or both'
implication   p → q, p ⊃ q, p ⇒ q            if..then; 'if p is true, then q is true'; 'p implies q'
equivalence   p ↔ q, p ≡ q, p ⇔ q            if and only if; 'p and q are either both true or both false'

Note: The propositions and connectives are the basic elements of propositional logic.

Truth value
The truth value of a statement is its TRUTH or FALSITY. Example: p is either TRUE or FALSE; ~p is either TRUE or FALSE; p ∨ q is either TRUE or FALSE; and so on. Use 'T' or '1' to mean TRUE, and 'F' or '0' to mean FALSE.

Truth table defining the basic connectives:

p  q  ¬p  ¬q  p ∨ q  p → q  p ↔ q  q → p
T  T  F   F   T      T      T      T
T  F  F   T   T      F      F      T
F  T  T   F   T      T      F      F
F  F  T   T   F      T      T      T

Tautologies
A proposition that is always true is called a tautology, e.g. (P ∨ ¬P) is always true regardless of the truth value of the proposition P.

Contradictions
A proposition that is always false is called a contradiction, e.g. (P ∧ ¬P) is always false, regardless of the truth value of the proposition P.
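Truth tables like the one above can be generated mechanically. The following Python sketch is an illustrative aside, not from the original text; it defines implication and equivalence as Boolean functions and enumerates every row:

```python
from itertools import product

def implies(p, q):
    # p -> q is false only when p is True and q is False
    return (not p) or q

def iff(p, q):
    # p <-> q: both true or both false
    return p == q

# Enumerate the four rows of the truth table for p, q.
for p, q in product([True, False], repeat=2):
    print(p, q, not p, p and q, p or q, implies(p, q), iff(p, q))

# A tautology is true in every row; a contradiction in none.
assert all(p or not p for p in [True, False])        # P v ~P
assert not any(p and not p for p in [True, False])   # P ^ ~P
```

The two final assertions check the tautology (P ∨ ¬P) and the contradiction (P ∧ ¬P) by exhausting all truth values, which is exactly what a truth table does.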
Contingencies
A proposition is called a contingency if it is neither a tautology nor a contradiction, e.g. (P ∨ Q) is a contingency.

Antecedent, Consequent
In the conditional statement p → q, the first statement or 'if-clause' (here p) is called the antecedent; the second statement or 'then-clause' (here q) is called the consequent.

Argument
Any argument can be expressed as a compound statement. Take all the premises, conjoin them, and make that conjunction the antecedent of a conditional with the conclusion as consequent. This implication statement is called the corresponding conditional of the argument.

Note:
• Every argument has a corresponding conditional, and every implication statement has a corresponding argument.
• Because the corresponding conditional of an argument is a statement, it is therefore either a tautology, a contradiction or a contingency.
• An argument is valid 'if and only if' its corresponding conditional is a tautology.
• Two statements are consistent 'if and only if' their conjunction is not a contradiction.
• Two statements are logically equivalent 'if and only if' their truth table columns are identical; that is, 'if and only if' the statement of their equivalence using '↔' is a tautology.

Note: Truth tables are adequate to test validity, tautology, contradiction, contingency, consistency and equivalence.

Here we will highlight the major principles involved in knowledge representation. In particular, predicate logic will be met again in other knowledge representation schemes and reasoning methods. The following are the standard logic symbols we use in this topic:

There are two types of quantifiers:
1. For all: ∀
2. There exists: ∃

Connectives:
Implies →
Not ¬
Or ∨
And ∧

Let us now look at an example of how predicate logic is used to represent knowledge. There are other ways, but this form is popular.
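Over a finite universe of discourse, the two quantifiers behave exactly like Python's built-ins all (for ∀) and any (for ∃). A small illustrative sketch, not part of the original text:

```python
# Universe of discourse: the natural numbers 1..5.
universe = range(1, 6)

# 'For all x, x > 0'  --  universal quantifier
forall_positive = all(x > 0 for x in universe)

# 'There exists an x such that x > 4'  --  existential quantifier
exists_gt_4 = any(x > 4 for x in universe)

# Negation moves through quantifiers:
# not (for all x: P(x))  ==  there exists x: not P(x)
assert (not all(x > 4 for x in universe)) == any(not (x > 4) for x in universe)

print(forall_positive, exists_gt_4)   # True True
```

The final assertion previews a rule used later when converting formulas to clausal form: a negation applied to ∀ becomes ∃ applied to the negated predicate.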
Propositional logic is not powerful enough for all types of assertions. Example: the assertion 'x > 1', where x is a variable, is not a proposition because it is neither true nor false unless the value of x is defined. For x > 1 to be a proposition:
• either we substitute a specific number for x;
• or change it to something like 'There is a number x for which x > 1 holds' or 'For every number x, x > 1 holds'.

Consider the following example:
'All men are mortal. Socrates is a man. Then Socrates is mortal.'
These cannot be expressed in propositional logic as a finite and logically valid argument (formula). We need languages that allow us to describe properties (predicates) of objects, or relationships among objects represented by variables. Predicate logic satisfies these requirements of a language.
• Predicate logic is powerful enough for expression and reasoning.
• Predicate logic is built upon the ideas of propositional logic.

Predicate: Every complete sentence contains two parts: a subject and a predicate. The subject is what (or whom) the sentence is about. The predicate tells something about the subject. Example: in the sentence 'Judy {runs}', the subject is Judy and the predicate is runs. The predicate always includes a verb and tells something about the subject.

A predicate is a verb-phrase template that describes a property of objects or a relation among objects represented by the variables. Example: 'The car Tom is driving is blue'; 'The sky is blue'; 'The cover of this book is blue.' The predicate is 'is blue', describing a property. Predicates are given names; let 'B' be the name for the predicate 'is blue'. The sentence is represented as 'B(x)', read as 'x is blue'; 'x' represents an arbitrary object.

Predicate logic expressions: The propositional operators combine predicates, as in
If ( p(....) && ( !q(....) || r(....) ) )
Examples of logic operators: disjunction (OR) and conjunction (AND).
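Such combinations of predicates with propositional operators can be evaluated directly. The sketch below is illustrative only; it uses the same comparison predicates as the worked example that follows:

```python
# x < y  ||  ( y < z  &&  z < x ), written with Python's 'or' and 'and'.
def expr(x, y, z):
    return x < y or (y < z and z < x)

print(expr(1, 2, 3))   # first disjunct 1 < 2 holds, so True
print(expr(3, 2, 1))   # 3 < 2 fails and (2 < 1 && 1 < 3) fails, so False
```

Once values are assigned to the variables, each comparison predicate becomes a proposition, and the whole expression evaluates by the truth tables of OR and AND.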
Consider the expression with the respective logic symbols || and &&:
x < y || ( y < z && z < x )
For values where each comparison holds, this is true || (true && true); applying the truth table, the result is TRUE. For the assignment x = 3, y = 2, z = 1, the value is:
3 < 2 || ( 2 < 1 && 1 < 3 )
which is FALSE.

Predicate Logic Quantifiers
As said before, x > 1 is not a proposition, and we have seen what is required for it to become one. Generally, a predicate with variables (called an atomic formula) can be made a proposition by applying one of the following two operations to each of its variables:
1. Assign a value to the variable; e.g. x > 1, if 3 is assigned to x, becomes 3 > 1, which is then a true statement and hence a proposition.
2. Quantify the variable, using a quantifier on formulas of predicate logic (called wffs), such as x > 1 or P(x).

Apply Quantifiers on Variables
Variable x
* x > 5 is not a proposition; its truth depends upon the value of the variable x. To reason about such statements, x needs to be declared.
Declaration x: a
* x: a declares the variable x
* x: a is read as 'x is an element of set a'
Statement p is a statement about x
* Q x: a • p is the quantification of the statement, where p is the statement, x: a is the declaration of variable x as an element of set a, and Q is the quantifier.
* Quantifiers are of two types: universal quantifiers, denoted by the symbol ∀, and existential quantifiers, denoted by the symbol ∃.

Universe of Discourse
The universe of discourse, also called the domain of discourse or universe, indicates:
– a set of entities that the quantifiers deal with;
– entities can be the set of real numbers, the set of integers, the set of all cars in a parking lot, the set of all students in a classroom, etc.;
– the universe is thus the domain of the (individual) variables;
– propositions in predicate logic are statements about objects of a universe.
The universe is often left implicit in practice, but it should be obvious from the context.
Examples:
– About natural numbers: in ∀x, y (x < y or x = y or x > y), there is no need to be more precise and say ∀x, y in N, because N is implicit, being the universe of discourse.
– About a property that holds for natural numbers but not for real numbers, it is necessary to qualify what the allowable values of x and y are.

Apply Universal Quantifier 'For All'
Universal quantification allows us to make a statement about a collection of objects.
* Universal quantification: ∀ x: a • p
* read 'for all x in a, p holds'
* a is the universe of discourse
* x is a member of the domain of discourse
* p is a statement about x
* In propositional form, it is written as: ∀x P(x)
* read 'for all x, P(x) holds', 'for each x, P(x) holds' or 'for every x, P(x) holds', where P(x) is a predicate, ∀x means all the objects x in the universe, and P(x) is true for every object x in the universe.

Example: English language to propositional form
* 'All cars have wheels'
∀ x: car • x has wheels
∀x P(x), where the predicate P(x) tells: 'x has wheels', and x is a variable for the objects 'cars' that populate the universe of discourse.

Apply Existential Quantifier 'There Exists'
Existential quantification allows us to state that an object exists, without naming it.
Existential quantification: ∃ x: a • p
* read 'there exists an x such that p holds'
* a is the universe of discourse
* x is a member of the domain of discourse
* p is a statement about x
In propositional form it is written as: ∃x P(x)
* read 'there exists an x such that P(x)' or 'there exists at least one x such that P(x)', where P(x) is a predicate, ∃x means at least one object x in the universe, and P(x) is true for at least one object x in the universe.

Example: English language to propositional form
* 'Someone loves you'
∃ x: Someone • x loves you
∃x P(x), where the predicate P(x) tells: 'x loves you', and x is a variable for the object 'someone' that populates the universe of discourse.

Formula: In mathematical logic, a formula is a type of abstract object, a token of which is a symbol or string of symbols which may be interpreted as any meaningful unit in a formal language.

Terms: Defined recursively as variables, or constants, or functions like f(t1, . . . , tn), where f is an n-ary function symbol and t1, . . . , tn are terms. Applying predicates to terms produces atomic formulas.

Atomic formulas: An atomic formula (or simply atom) is a formula with no deeper propositional structure, i.e. a formula that contains no logical connectives, or a formula that has no strict sub-formulas.
– Atoms are thus the simplest well-formed formulas of the logic.
– Compound formulas are formed by combining the atomic formulas using the logical connectives.
– A well-formed formula ('wff') is a symbol or string of symbols (a formula) generated by the formal grammar of a formal language.
An atomic formula is one of the form:
– t1 = t2, where t1 and t2 are terms, or
– R(t1, . . . , tn), where R is an n-ary relation symbol and t1, . . . , tn are terms.
– ¬a is a formula when a is a formula.
– (a ∧ b) and (a ∨ b) are formulas when a and b are formulas.
Compound formula example: ((a ∧ b) ∧ c) ∨ ((¬a ∧ b) ∧ c) ∨ ((a ∧ ¬b) ∧ c)

Example: Consider the following:
– Prince is a megastar.
– Megastars are rich.
– Rich people have fast cars.
– Fast cars consume a lot of petrol.
and try to draw the conclusion: Prince's car consumes a lot of petrol.
We can translate 'Prince is a megastar' into:
megastar(prince)
and 'Megastars are rich' into:
∀m: megastar(m) → rich(m)
'Rich people have fast cars', the third axiom, is more difficult:
• Is car a relation? Then car(c, m) says that car c is m's car. OR
• Is car a function? Then we may have car_of(m).
Assume cars are a relation; then axiom 3 may be written:
∀c, m: car(c, m) ∧ rich(m) → fast(c).
The fourth axiom is a general statement about fast cars. Let consume(c) mean that car c consumes a lot of petrol. Then we may write:
∀c: fast(c) ∧ (∃m: car(c, m)) → consume(c).
Is this enough? NO! Does Prince have a car? We need the car_of function after all (in addition to car):
∀m: car(car_of(m), m).
The result of applying car_of to m is m's car. The final set of predicates is:
megastar(prince)
∀m: megastar(m) → rich(m)
∀m: car(car_of(m), m)
∀c, m: car(c, m) ∧ rich(m) → fast(c)
∀c: fast(c) ∧ (∃m: car(c, m)) → consume(c)
Given this, we could conclude: consume(car_of(prince)).

Example: Predicate Logic
1. Marcus was a man. Man(Marcus)
2. Marcus was a Pompeian. Pompeian(Marcus)
3. All Pompeians were Romans. ∀x: Pompeian(x) → Roman(x)
4. Caesar was a ruler. Ruler(Caesar)
5. All Romans were either loyal to Caesar or hated him. ∀x: Roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)
6. Everyone is loyal to someone. ∀x: ∃y: loyalto(x, y)
7. People only try to assassinate rulers they are not loyal to. ∀x: ∀y: person(x) ∧ ruler(y) ∧ tryassassinate(x, y) → ¬loyalto(x, y)
8. Marcus tried to assassinate Caesar. tryassassinate(Marcus, Caesar)
9. All men are people. ∀x: man(x) → person(x)

4.4 REPRESENTING INSTANCE AND ISA RELATIONSHIPS

The two attributes isa and instance play an important role in many aspects of knowledge representation. The reason for this is that they support property inheritance.
isa: Used to show class inclusion. Ex: isa(megastar, rich).
instance: Used to show class membership. Ex: instance(prince, megastar).
From the above it should be simple to see how to represent these in predicate logic.

4.5 COMPUTABLE FUNCTIONS AND PREDICATES

The objective is to define a class of functions C computable in terms of F. This is expressed as C{F} and is explained below using two examples: (1) 'evaluate factorial n' and (2) 'expression for triangular functions'.

Ex (1): A conditional expression to define factorial n, or n!
The expression 'If p1 then e1, else if p2 then e2, . . ., else if pn then en', i.e.
(p1 → e1, p2 → e2, . . ., pn → en)
Here p1, p2, . . ., pn are propositional expressions taking the values T or F for true and false respectively. The value of (p1 → e1, p2 → e2, . . ., pn → en) is the value of the e corresponding to the first p that has value T.
The expressions defining n!, for example n = 5, recursively are:
n! = n × (n − 1)! for n ≥ 1
5! = 1 × 2 × 3 × 4 × 5 = 120
0! = 1
The above definition incorporates the convention that the product of no numbers is 1, i.e. 0! = 1; only then does the recursive relation (n + 1)! = n! × (n + 1) work for n = 0. Now, use conditional expressions
n! = (n = 0 → 1, n ≠ 0 → n.(n − 1)!)
to define functions recursively.
Ex: Evaluate 2! according to the above definition.
2! = (2 = 0 → 1, 2 ≠ 0 → 2.(2 − 1)!)
= 2 × 1!
= 2 × (1 = 0 → 1, 1 ≠ 0 → 1.(1 − 1)!)
= 2 × 1 × 0!
= 2 × 1 × (0 = 0 → 1, 0 ≠ 0 → 0.(0 − 1)!)
= 2 × 1 × 1
= 2

4.6 RESOLUTION

Resolution is a procedure used to prove that arguments expressible in predicate logic are correct. Resolution is a procedure that produces proofs by refutation or contradiction; it is a refutation theorem-proving technique for sentences in propositional logic and first-order logic. Resolution is a purely syntactic, uniform proof procedure.
Resolution does not consider what predicates may mean, but only what logical conclusions may be derived from the axioms. The advantage is that resolution is universally applicable to problems that can be described in first-order logic. The disadvantage is that resolution by itself cannot make use of any domain-dependent heuristics; despite many attempts to improve its efficiency, it often takes exponential time.
– Resolution is a rule of inference.
– Resolution is a computerized theorem prover.
– Resolution is, so far, only defined for propositional logic. The strategy is that the resolution techniques of propositional logic be adapted to predicate logic.
1. A proof in predicate logic is composed of a variety of processes, including modus ponens, the chain rule, substitution, matching, etc.
2. However, we have to take variables into account; e.g. parent_of(Ram, Sam) and ¬parent_of(Ram, Lakshman) cannot be resolved.
3. In other words, the resolution method has to look inside predicates to discover whether they can be resolved.

4.6.1 Algorithm: Convert to Clausal Form
1. Eliminate →:
P → Q ≡ ¬P ∨ Q
2. Reduce the scope of each ¬ to a single term:
¬(P ∨ Q) ≡ ¬P ∧ ¬Q
¬(P ∧ Q) ≡ ¬P ∨ ¬Q
¬∀x: P ≡ ∃x: ¬P
¬∃x: P ≡ ∀x: ¬P
¬¬P ≡ P
3. Standardize variables so that each quantifier binds a unique variable.
(∀x: P(x)) ∨ (∃x: Q(x)) ≡ (∀x: P(x)) ∨ (∃y: Q(y))
4. Move all quantifiers to the left without changing their relative order.
(∀x: P(x)) ∨ (∃y: Q(y)) ≡ ∀x: ∃y: (P(x) ∨ Q(y))
5. Eliminate ∃ (Skolemization).
∃x: P(x) ≡ P(c)    (Skolem constant)
∀x: ∃y: P(x, y) ≡ ∀x: P(x, f(x))    (Skolem function)
6. Drop ∀.
∀x: P(x) ≡ P(x)
7. Convert the formula into a conjunction of disjuncts.
(P ∧ Q) ∨ R ≡ (P ∨ R) ∧ (Q ∨ R)
8. Create a separate clause for each conjunct.
9. Standardize apart the variables in the set of obtained clauses.
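The propositional core of this algorithm (steps 1, 2 and 7; quantifier handling and Skolemization are omitted) can be sketched in Python. This is an illustration, not the book's code; formulas are represented as nested tuples such as ('->', 'P', 'Q'), with atoms as plain strings:

```python
def to_cnf(f):
    """Convert a propositional formula to conjunctive normal form."""
    if isinstance(f, str):                      # an atom
        return f
    op = f[0]
    if op == '->':                              # step 1: P -> Q == ~P v Q
        return to_cnf(('or', ('not', f[1]), f[2]))
    if op == 'not':                             # step 2: reduce scope of ~
        g = f[1]
        if isinstance(g, str):                  # ~atom: already minimal
            return f
        if g[0] == 'not':                       # ~~P == P
            return to_cnf(g[1])
        if g[0] == 'and':                       # ~(P ^ Q) == ~P v ~Q
            return to_cnf(('or', ('not', g[1]), ('not', g[2])))
        if g[0] == 'or':                        # ~(P v Q) == ~P ^ ~Q
            return to_cnf(('and', ('not', g[1]), ('not', g[2])))
        if g[0] == '->':                        # eliminate -> first
            return to_cnf(('not', ('or', ('not', g[1]), g[2])))
    if op == 'and':
        return ('and', to_cnf(f[1]), to_cnf(f[2]))
    if op == 'or':                              # step 7: distribute v over ^
        a, b = to_cnf(f[1]), to_cnf(f[2])
        if not isinstance(a, str) and a[0] == 'and':
            return to_cnf(('and', ('or', a[1], b), ('or', a[2], b)))
        if not isinstance(b, str) and b[0] == 'and':
            return to_cnf(('and', ('or', a, b[1]), ('or', a, b[2])))
        return ('or', a, b)

# Step 7's own example: (P ^ Q) v R becomes (P v R) ^ (Q v R).
print(to_cnf(('or', ('and', 'P', 'Q'), 'R')))
# -> ('and', ('or', 'P', 'R'), ('or', 'Q', 'R'))
```

Each conjunct of the result is one clause (step 8); in full first-order conversion, steps 3 to 6 and 9 would additionally rename variables and replace existential variables by Skolem terms.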
Example 1:
∀x: [Roman(x) ∧ know(x, Marcus)] → [hate(x, Caesar) ∨ (∀y: ∃z: hate(y, z) → thinkcrazy(x, y))]
1. Eliminate →:
∀x: ¬[Roman(x) ∧ know(x, Marcus)] ∨ [hate(x, Caesar) ∨ (∀y: ¬(∃z: hate(y, z)) ∨ thinkcrazy(x, y))]
2. Reduce the scope of ¬:
∀x: [¬Roman(x) ∨ ¬know(x, Marcus)] ∨ [hate(x, Caesar) ∨ (∀y: ∀z: ¬hate(y, z) ∨ thinkcrazy(x, y))]
3. 'Standardize' variables: ∀x: P(x) ∨ ∀x: Q(x) converts to ∀x: P(x) ∨ ∀y: Q(y)
4. Move quantifiers:
∀x: ∀y: ∀z: [¬Roman(x) ∨ ¬know(x, Marcus)] ∨ [hate(x, Caesar) ∨ (¬hate(y, z) ∨ thinkcrazy(x, y))]
5. Eliminate existential quantifiers:
∃y: President(y) will be converted to President(S1)
∀x: ∃y: father-of(y, x) will be converted to ∀x: father-of(S2(x), x)
6. Drop the prefix:
[¬Roman(x) ∨ ¬know(x, Marcus)] ∨ [hate(x, Caesar) ∨ (¬hate(y, z) ∨ thinkcrazy(x, y))]
7. Convert to a conjunction of disjuncts:
¬Roman(x) ∨ ¬know(x, Marcus) ∨ hate(x, Caesar) ∨ ¬hate(y, z) ∨ thinkcrazy(x, y)
8. Create a separate clause corresponding to each conjunct.
9. Standardize apart the variables in the set of clauses generated in step 8.

First-Order Resolution
Generalized resolution rule: For clauses P ∨ Q and ¬Q′ ∨ R, with Q and Q′ atomic formulas, where θ is a most general unifier for Q and Q′, (P ∨ R)θ is the resolvent of the two clauses.

Applying Resolution Refutation
– Negate the query to be proven (resolution is a refutation system).
– Convert the knowledge base and negated query into CNF and extract clauses.
– Repeatedly apply resolution to clauses or copies of clauses until either the empty clause (a contradiction) is derived or no more clauses can be derived (a copy of a clause is the clause with all variables renamed).
– If the empty clause is derived, answer 'yes' (the query follows from the knowledge base); otherwise, answer 'no' (the query does not follow from the knowledge base).

Resolution: Example 1
|- ∃x (P(x) → ∀x P(x))
CNF(¬∃x (P(x) → ∀x P(x))):
1. ∀x ¬(¬P(x) ∨ ∀x P(x))    [eliminate →]
2. ∀x (¬¬P(x) ∧ ¬∀x P(x))   [De Morgan]
3. ∀x (P(x) ∧ ∃x ¬P(x))     [reduce the scope of ¬]
4.
∀x (P(x) ∧ ∃y ¬P(y))    [standardize variables]
5. ∀x ∃y (P(x) ∧ ¬P(y))    [move quantifiers]
6. ∀x (P(x) ∧ ¬P(f(x)))    [Skolemize]
Clauses: P(x), ¬P(f(x))
1. P(x)    [¬ Conclusion]
2. ¬P(f(y))    [Copy of ¬ Conclusion]
3. □    [1, 2 Resolution {x/f(y)}]

Resolution: Example 2
|- ∃x ∀y ∀z ((P(y) ∨ Q(z)) → (P(x) ∨ Q(x)))
1. P(f(x)) ∨ Q(g(x))    [¬ Conclusion]
2. ¬P(x)    [¬ Conclusion]
3. ¬Q(x)    [¬ Conclusion]
4. ¬P(y)    [Copy of 2]
5. Q(g(x))    [1, 4 Resolution {y/f(x)}]
6. ¬Q(z)    [Copy of 3]
7. □    [5, 6 Resolution {z/g(x)}]

4.6.2 The Basis of Resolution
The resolution procedure is a simple iterative process: at every step, two clauses (the parent clauses) are compared, and a new clause is derived that is inferred from them. The new clause represents the ways in which the two parent clauses interact with each other.

4.6.2.1 Soundness and completeness
The notation p |= q is read 'p entails q'; it means that q holds in every model in which p holds. The notation p |-m q means that q can be derived from p by some proof mechanism m. A proof mechanism m is sound if p |-m q → p |= q. A proof mechanism m is complete if p |= q → p |-m q.
Resolution for predicate calculus is:
Sound: If the empty clause is derived by resolution, then the original set of clauses is unsatisfiable.
Complete: If a set of clauses is unsatisfiable, resolution will eventually derive the empty clause. However, this is a search problem and may take a very long time.
We are generally not willing to give up soundness, since we want our conclusions to be valid. We might be willing to give up completeness: if a sound proof procedure will prove the theorem we want, that is enough.

Resolution
– Another type of proof system, based on refutation
– Better suited to computer implementation than systems of axioms and rules (can give correct 'no' answers)
– Generalizes to first-order logic
– The basis of Prolog's inference method
– To apply resolution, all formulae in the knowledge base and the query must be in clausal form (cf.
Prolog clauses).

Resolution Rule of Inference
The resolution rule: from P ∨ Q and ¬Q′ ∨ R, where θ is a most general unifier of Q and Q′, infer the resolvent (P ∨ R)θ.

Resolution Rule: Key Idea
Consider A ∨ B and ¬B ∨ C:
– if B is True, ¬B is False, and the truth of the second formula depends only on C;
– if B is False, the truth of the first formula depends only on A.
Only one of B, ¬B is True; so if both A ∨ B and ¬B ∨ C are True, either A or C is True, i.e. A ∨ C is True.

Applying Resolution
– The resolution rule is sound (the resolvent is entailed by the two 'parent' clauses).
– How can we use the resolution rule? One way:
– Convert the knowledge base into clausal form
– Repeatedly apply the resolution rule to the resulting clauses
– A query A follows from the knowledge base if and only if each of the clauses in the CNF of A can be derived using resolution
– There is a better way . . .

Refutation Systems
– To show that P follows from S (i.e. S |= P) using refutation, start with S and ¬P in clausal form and derive a contradiction using resolution
– A contradiction is the 'empty clause' (a clause with no literals)
– The empty clause □ is unsatisfiable (always False)
– So if the empty clause □ is derived using resolution, the original set of clauses is unsatisfiable (never all True together)
– That is, if we can derive □ from the clausal forms of S and ¬P, these clauses can never be all True together
– Hence whenever the clauses of S are all True, at least one clause from ¬P must be False, i.e.
¬P must be False and P must be True
– By definition, S |= P (so P can correctly be concluded from S)

Applying Resolution Refutation
– Negate the query to be proven (resolution is a refutation system)
– Convert the knowledge base and negated conclusion into CNF and extract clauses
– Repeatedly apply resolution until either the empty clause (a contradiction) is derived or no more clauses can be derived
– If the empty clause is derived, answer 'yes' (the query follows from the knowledge base); otherwise answer 'no' (the query does not follow from the knowledge base)

Resolution: Example 1
(G ∨ H) → (¬J ∧ ¬K), G |– ¬J
Clausal form of (G ∨ H) → (¬J ∧ ¬K) is {¬G ∨ ¬J, ¬H ∨ ¬J, ¬G ∨ ¬K, ¬H ∨ ¬K}

Check Your Progress
1. Predicate logic does not serve as a standard of comparison for other representation and reasoning methods. (True or False)
2. List four types of logic.
3. Define atomic formula.
4. List the two natural deduction methods.
there is an algorithm implementing resolution which, when asked whether S |– P, can always answer ‘yes’ or ‘no’ (correctly)

Heuristics in Applying Resolution
– Clause elimination—certain types of clauses can be disregarded:
– Pure clauses: contain a literal L where ¬L does not appear elsewhere
– Tautologies: clauses containing both L and ¬L
– Subsumed clauses: another clause exists containing a subset of the literals
– Ordering strategies:
– Resolve unit clauses (those with only one literal) first
– Start with the query clauses
– Aim to shorten clauses

4.6.2.2 Resolution strategies

Different strategies have been tried for selecting the clauses to be resolved. These include:
a. Level saturation or two-pointer method: The outer pointer starts at the negated conclusion; the inner pointer starts at the first clause. The two clauses denoted by the pointers are resolved, if possible, with the result added to the end of the list of clauses. The inner pointer is incremented to the next clause until it reaches the outer pointer; then the outer pointer is incremented and the inner pointer is reset to the front. The two-pointer method is a breadth-first method that will generate many duplicate clauses.
b. Set of support: One clause in each resolution step must be part of the negated conclusion or a clause derived from it. This can be combined with the two-pointer method by putting the clauses from the negated conclusion at the end of the list. Set of support keeps the proof process focused on the theorem to be proved rather than trying to prove everything.
c. Unit preference: Clauses are prioritized, with unit clauses preferred or, more generally, shorter clauses preferred. Our goal, the empty clause, has zero literals and can only be obtained by resolving two unit clauses; resolution with a unit clause also makes the result smaller.
d. Linear resolution: One clause in each step must be the result of the previous step. This is a depth-first strategy.
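The saturation-style refutation procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not from the text; representing a clause as a frozenset of string literals, with '~' marking negation, is an assumption made here for simplicity.

```python
def negate(lit):
    # "~B" is the negation of "B", and vice versa.
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    # Every clause obtainable by resolving c1 with c2 on one literal:
    # drop the complementary pair, keep everything else (A v C from A v B, ~B v C).
    return {frozenset((c1 - {l}) | (c2 - {negate(l)}))
            for l in c1 if negate(l) in c2}

def refutes(clauses):
    """Saturate the clause set; True iff the empty clause is derived."""
    clauses = set(clauses)
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a is b:
                    continue
                for r in resolvents(a, b):
                    if not r:              # empty clause: contradiction found
                        return True
                    new.add(r)
        if new <= clauses:                 # no progress: query does not follow
            return False
        clauses |= new

# KB of Ex. 2 below: P, (P ^ Q) -> R, (S v T) -> Q, T; query R.
kb = [frozenset(c) for c in
      [{"P"}, {"~P", "~Q", "R"}, {"~S", "Q"}, {"~T", "Q"}, {"T"}]]
print(refutes(kb + [frozenset({"~R"})]))   # True: R follows from the KB
```

This naive loop is essentially level saturation; the set-of-support, unit-preference and linear strategies above change only the order in which clause pairs are selected, not the result.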
It may be necessary to back up to a previous clause if no resolution with the current clause is possible.

4.6.3 Resolution in Propositional Logic

Algorithm: Propositional Resolution
1. Convert all the propositions of KB to clause form (S).
2. Negate α and convert it to clause form. Add it to S.
3. Repeat until either a contradiction is found or no progress can be made:
a. Select two clauses (α ∨ ¬P) and (γ ∨ P).
b. Add the resolvent (α ∨ γ) to S.

Ex: 1. KB = {P, (P ∧ Q) → R, (S ∨ T) → Q, T}, α = R. In predicate form, KB = {P(a), ∀x: (P(x) ∧ Q(x)) → R(x), ∀y: (S(y) ∨ T(y)) → Q(y), T(a)}, α = R(a).

Ex: 2. A few facts in propositional logic:

Given Axioms | Clause Form
P | P (1)
(P ∧ Q) → R | ¬P ∨ ¬Q ∨ R (2)
(S ∨ T) → Q | ¬S ∨ Q (3)
 | ¬T ∨ Q (4)
T | T (5)

The refutation of the negated goal ¬R proceeds: ¬P ∨ ¬Q ∨ R with ¬R gives ¬P ∨ ¬Q; with P this gives ¬Q; with ¬T ∨ Q this gives ¬T; with T this gives the empty clause.

Fig. 4.1 Resolution in Propositional Logic

4.6.4 The Unification Algorithm

The resolution step is still valid for predicate calculus, i.e. if clause C1 contains a literal L and clause C2 contains ¬L, then the resolvent of C1 and C2 is a logical consequence of C1 and C2 and can be added to the set of clauses without changing its truth value. However, since L in C1 and ¬L in C2 may contain different arguments, we must do something to make L in C1 and ¬L in C2 exactly complementary. Since every variable is universally quantified, we are justified in substituting any term for a variable. Thus, we need to find a set of substitutions of terms for variables that will make the literals L in C1 and ¬L in C2 exactly the same. The process of finding this set of substitutions is called unification. We wish to find the most general unifier of two clauses, which will preserve as many variables as possible.

Example: C1 = P(z) ∨ Q(z), C2 = ¬P(f(x)) ∨ R(x). If we substitute f(x) for z in C1, C1 becomes P(f(x)) ∨ Q(f(x)). The resolvent is Q(f(x)) ∨ R(x).

Examples of Unification

Consider unifying the literal P(x, g(x)) with:
1.
P(z, y): unifies with {x/z, g(x)/y}
2. P(z, g(z)): unifies with {x/z} or {z/x}
3. P(Socrates, g(Socrates)): unifies with {Socrates/x}
4. P(z, g(y)): unifies with {x/z, x/y} or {z/x, z/y}
5. P(g(y), z): unifies with {g(y)/x, g(g(y))/z}
6. P(Socrates, f(Socrates)): does not unify: f and g do not match.
7. P(g(y), y): does not unify: no substitution works.

Example 2: S = {f(x, g(x)), f(h(y), g(h(z)))}
D0 = {x, h(y)}, so s1 = {x/h(y)}
S1 = {f(h(y), g(h(y))), f(h(y), g(h(z)))}
D1 = {y, z}, so s2 = {x/h(z), y/z}
S2 = {f(h(z), g(h(z))), f(h(z), g(h(z)))}, i.e. s2 is an m.g.u.
S = {f(h(x), g(x)), f(g(x), h(x))}. . .?

4.6.4.1 Substitutions

Unification produces a set of substitutions that make two literals the same. A substitution ti/vi specifies the substitution of term ti for variable vi. A substitution set can be represented either as sequential substitutions (done one at a time, in sequence) or as simultaneous substitutions (done all at once). Unification can be done correctly either way.

4.6.4.2 Unification algorithm

The basic unification algorithm is simple. However, it must be implemented with care to ensure that the results are correct. We begin by making sure that the two expressions have no variables in common. If there are common variables, substitute a new variable in one of the expressions. (Since variables are universally quantified, another variable can be substituted without changing the meaning.) Imagine moving a pointer left-to-right across both expressions until parts are encountered that are not the same in both expressions. Suppose, for example, one is a variable and the other is a term not containing that variable. Then:
1. Substitute the term for the variable in both expressions.
2. Substitute the term for the variable in the existing substitution set. [This is necessary so that the substitution set will be simultaneous.]
3. Add the substitution to the substitution set.

Algorithm: Unify (L1, L2)
1.
If L1 or L2 is a variable or constant, then:
(a) If L1 and L2 are identical, then return NIL.
(b) Else if L1 is a variable, then if L1 occurs in L2 return FAIL, else return {(L2/L1)}.
(c) Else if L2 is a variable, then if L2 occurs in L1 return FAIL, else return {(L1/L2)}.
(d) Else return FAIL.
2. If the initial predicate symbols in L1 and L2 are not identical, then return FAIL.
3. If L1 and L2 have a different number of arguments, then return FAIL.
4. Set SUBST to NIL.
5. For i ← 1 to the number of arguments in L1:
(a) Call Unify with the ith argument of L1 and the ith argument of L2, putting the result in S.
(b) If S = FAIL then return FAIL.
(c) If S is not equal to NIL then:
i. Apply S to the remainder of both L1 and L2.
ii. SUBST := APPEND(S, SUBST).
6. Return SUBST.

4.6.4.3 Unification implementation
1. Initialize the substitution set to be empty. A nil set indicates failure.
2. Recursively unify expressions:
(a) Identical items match.
(b) If one item is a variable vi and the other is a term ti not containing that variable, then:
i. Substitute ti/vi in the existing substitutions.
ii. Add ti/vi to the substitution set.
(c) If both items are functions, the function names must be identical and all arguments must unify. Substitutions are made in the rest of the expression as unification proceeds.

Unification
We can get the inference immediately if we can find a substitution θ such that King(x) and Greedy(x) match King(John) and Greedy(y); θ = {x/John, y/John} works. In general, Unify(α, β) = θ if αθ = βθ. Consider unifying p = Knows(John, x) in turn with q = Knows(John, Jane), Knows(y, OJ), Knows(y, Mother(y)) and Knows(x, OJ). Standardizing apart eliminates the overlap of variables, e.g.
Knows(z17, OJ)

Filling in the θ column case by case gives the completed table:

p | q | θ
Knows(John, x) | Knows(John, Jane) | {x/Jane}
Knows(John, x) | Knows(y, OJ) | {x/OJ, y/John}
Knows(John, x) | Knows(y, Mother(y)) | {y/John, x/Mother(John)}
Knows(John, x) | Knows(x, OJ) | fail

Standardizing apart eliminates the overlap of variables, e.g. Knows(z17, OJ).

To unify Knows(John, x) and Knows(y, z): θ = {y/John, x/z} or θ = {y/John, x/John, z/John}. The first unifier is more general than the second.
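The table above matches what the recursive Unify algorithm of Section 4.6.4.2 computes. The following Python sketch is a minimal illustration, not the book's implementation: the tuple representation of literals and the convention that lowercase strings are variables (x, y, z17) while capitalized strings are constants (John, OJ) are assumptions made here.

```python
def is_var(t):
    return isinstance(t, str) and t[0].islower()   # x, y, z17 are variables

def occurs(v, t):
    # Occurs check: does variable v appear anywhere inside term t?
    return v == t or (isinstance(t, tuple) and any(occurs(v, a) for a in t[1:]))

def subst(t, s):
    # Apply substitution set s to term t, following chains of bindings.
    if is_var(t):
        return subst(s[t], s) if t in s else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(a, s) for a in t[1:])
    return t

def unify(l1, l2, s=None):
    """Return an MGU as a dict {var: term}, or None on FAIL."""
    s = {} if s is None else s
    l1, l2 = subst(l1, s), subst(l2, s)
    if l1 == l2:
        return s
    if is_var(l1):
        return None if occurs(l1, l2) else {**s, l1: l2}
    if is_var(l2):
        return None if occurs(l2, l1) else {**s, l2: l1}
    if (isinstance(l1, tuple) and isinstance(l2, tuple)
            and l1[0] == l2[0] and len(l1) == len(l2)):
        for a, b in zip(l1[1:], l2[1:]):       # unify arguments left to right
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

# Unify Knows(John, x) with Knows(y, Mother(y)): {y/John, x/Mother(John)}
theta = unify(("Knows", "John", "x"), ("Knows", "y", ("Mother", "y")))
print(theta)
```

Unifying Knows(John, x) with Knows(x, OJ) returns None (FAIL), matching the last row of the table; after standardizing apart to Knows(z17, OJ) it succeeds.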
There is a single most general unifier (MGU) that is unique up to renaming of variables. Here, MGU = {y/John, x/z}.

4.6.5 Resolution in Predicate Logic

Algorithm: Resolution
1. Convert all the propositions of F to clause form.
2. Negate P and convert the result to clause form. Add it to the set of clauses obtained in 1.
3. Repeat until a contradiction is found, no progress can be made, or a predetermined amount of effort has been expended:
a. Select two clauses. Call these the parent clauses.
b. Resolve them together. The resolvent will be the disjunction of all the literals of both parent clauses with the appropriate substitutions performed and with the following exception: If there is one pair of literals T1 and ¬T2 such that one of the parent clauses contains T1 and the other contains ¬T2, and if T1 and T2 are unifiable, then neither T1 nor ¬T2 should appear in the resolvent. If there is more than one pair of complementary literals, only one pair should be omitted from the resolvent.
c. If the resolvent is the empty clause, then a contradiction has been found. If it is not, then add it to the set of clauses available to the procedure.

Example:
1. Marcus was a man.
2. Marcus was a Pompeian.
3. All Pompeians were Romans.
4. Caesar was a ruler.
5. All Pompeians were either loyal to Caesar or hated him.
6. Everyone is loyal to someone.
7. People only try to assassinate rulers they are not loyal to.
8. Marcus tried to assassinate Caesar.

• The axioms in clause form are:
1. man(Marcus)
2. Pompeian(Marcus)
3. ¬Pompeian(x1) ∨ Roman(x1)
4. ruler(Caesar)
5. ¬Roman(x2) ∨ loyalto(x2, Caesar) ∨ hate(x2, Caesar)
6. loyalto(x3, f1(x3))
7. ¬man(x4) ∨ ¬ruler(y1) ∨ ¬tryassassinate(x4, y1) ∨ loyalto(x4, y1)
8. tryassassinate(Marcus, Caesar)

Prove: hate(Marcus, Caesar)

The refutation begins with the negated goal ¬hate(Marcus, Caesar) and resolves it with clause 5 (Marcus/x2) to give ¬Roman(Marcus) ∨ loyalto(Marcus, Caesar); with clause 3 (Marcus/x1) this gives ¬Pompeian(Marcus) ∨ loyalto(Marcus, Caesar); with clause 2, loyalto(Marcus, Caesar); with clause 7 (Marcus/x4, Caesar/y1), ¬man(Marcus) ∨ ¬ruler(Caesar) ∨ ¬tryassassinate(Marcus, Caesar); resolving with clauses 1, 4 and 8 in turn then yields the empty clause.

Fig. 4.2 Resolution Proof

Prove: loyalto(Marcus, Caesar)

Starting instead from the negated goal ¬loyalto(Marcus, Caesar) and resolving with clause 5 (Marcus/x2) gives ¬Roman(Marcus) ∨ hate(Marcus, Caesar); with clause 3 (Marcus/x1), ¬Pompeian(Marcus) ∨ hate(Marcus, Caesar); with clause 2, hate(Marcus, Caesar). From there, additional hate/persecute axioms (9 and 10, with Marcus/x5, Caesar/y2 and Marcus/x6, Caesar/y3) only let the proof cycle between hate(Marcus, Caesar) and persecute(Caesar, Marcus) without ever producing the empty clause.

Fig. 4.3 An Unsuccessful Attempt at Resolution

4.7 NATURAL DEDUCTION

Natural deduction methods perform deduction in a manner similar to the reasoning used by humans, e.g. in proving mathematical theorems. Forward chaining and backward chaining are natural deduction methods. These are similar to the algorithms described earlier for propositional logic, with extensions to handle variable bindings and unification. Backward chaining by itself is not complete, since it only handles Horn clauses (clauses that have at most one positive literal). Not all clauses are Horn; for example, ‘every person is male or female’ becomes ¬Person(x) ∨ Male(x) ∨ Female(x), which has two positive literals. Such clauses do not support backward chaining. Splitting can be used with backward chaining to make it complete. Splitting makes assumptions (e.g. ‘Assume x is Male’) and attempts to prove the theorem for each case.

4.8 SUMMARY

In this unit, you have learned how predicate logic can be used as the basis of a technique for knowledge representation.
You also learned how resolution is applied in knowledge representation and problem-solving techniques. There are two ways to approach the computational goal. The first is to find good heuristics that can inform a theorem-proving program. The second is to change the data given to the program rather than the program itself. A difficulty with the use of theorem proving in AI systems is that there are some kinds of information that are not easily represented in predicate logic. Consider the following examples:
• ‘It is very cool today.’ How can a relative degree of coolness be represented?
• ‘Black-haired people often have grey eyes.’ How can the amount of certainty be represented?
• ‘It is better to have more pieces on the board than the opponent has.’ How can we represent this kind of heuristic information?
• ‘I know Dravid thinks Sachin’s team will win, but I think they are going to lose.’ How can several different belief systems be represented at once?
The knowledge representations covered so far do not yet give a suitable representation for these examples. They primarily deal with knowledge bases that are incomplete, although other problems also exist, such as the difficulty of representing continuous phenomena in a discrete system. You will learn about their solutions in the next units.
You have also learned in this unit that:
• First-order logic allows us to speak about objects, properties of objects and relationships between objects.
• It also allows quantification over variables.
• First-order logic is quite an expressive knowledge representation language; much more so than propositional logic.
• However, you need to add things like equality if you wish to be able to do things like counting.
• You have also traded expressiveness for decidability.
• How much of a problem is this?
• If you add (Peano) axioms for mathematics, then you encounter Gödel’s famous incompleteness theorem (which is beyond the scope of this course).
You have now investigated one knowledge representation and reasoning formalism. This means you can draw new conclusions from the knowledge you have: you can reason. You have enough to build a knowledge-based agent. However, you have also learned that propositional logic is a weak language; there are many things that cannot be expressed. To express knowledge about objects, their properties and the relationships that exist between objects, you need a more expressive language: first-order logic.

4.9 KEY TERMS

• Logic: A language for reasoning; a collection of rules used in logical reasoning.
• Predicate logic: One of the major knowledge representation and reasoning methods. It serves as a standard of comparison for other representation and reasoning methods.
• Propositional logic: The study of statements and their connectives.
• Argument: Any argument can be expressed as a compound statement. Take all the premises, conjoin them, make that conjunction the antecedent of a conditional and make the conclusion the consequent. This implication statement is called the corresponding conditional of the argument.
• Natural deduction: Methods that perform deduction in a manner similar to the reasoning used by humans, e.g. in proving mathematical theorems. Forward chaining and backward chaining are natural deduction methods.

4.10 ANSWERS TO ‘CHECK YOUR PROGRESS’

1. False
2. Propositional logic, predicate logic, temporal logic and modal logic.
3. An atomic formula (or simply an atom) is a formula with no deeper propositional structure, i.e., a formula that contains no logical connectives, or equivalently a formula that has no strict sub-formulas.
4. Forward chaining and backward chaining are the two natural deduction methods.

4.11 QUESTIONS AND EXERCISES

1. Assume the facts given below:
• Ram only likes easy courses.
• Science courses are hard.
• All the courses in the basket weaving department are easy.
• MA-301 is a basket weaving course.
Use resolution to answer the question ‘What course would Ram like?’.
2. Consider the following sentences:
• Sam likes all kinds of food.
• Apples are food.
• Chicken is food.
• Anything anyone eats and isn’t killed by is food.
• John eats peanuts and is still alive.
• Myke eats everything John eats.
a. Translate these sentences into formulae in predicate logic.
b. Prove that Sam likes peanuts using backward chaining.
c. Convert the formulae of part (a) into clause form.
d. Prove that Sam likes peanuts using resolution.
e. Use resolution to answer the question ‘What food does Myke eat?’
3. Trace the operation of the unification algorithm on each of the following pairs of literals:
a. f(Marcus) and f(Caesar)
b. f(x) and f(g(x))
c. f(Marcus, g(x, y)) and f(x, g(Caesar, Marcus))
4. Suppose that we are attempting to resolve the following clauses:
loves(mother(a), a)
¬loves(y, x) ∨ loves(x, y)
a. What will be the result of the unification algorithm?
b. What must be generated as a result of resolving these two clauses?
c. What does the example show about the order in which the substitutions determined by the unification procedure must be performed?
5. Assume the following facts:
– Steve only likes easy courses.
– Computing courses are hard.
– All courses in Sociology are easy.
– ‘Society is evil’ is a sociology course.
Represent these facts in predicate logic and answer the question: What course would Steve like?
6. Find out what knowledge representation schemes are used in the STRIPS system.
7. Translate the following sentences into propositional logic:
(i) If Jane and John are not in town we will play tennis.
(ii) It will either rain today or it will be dry today.
(iii) You will not pass this course unless you study.
To do the translation you will need to:
(a) Identify a scheme of abbreviation
(b) Identify the logical connectives
8.
Convert the following formulae into Conjunctive Normal Form (CNF):
(i) P → Q
(ii) (P → ¬Q) → R
(iii) ¬(P ∧ ¬Q) → (¬R ∨ ¬Q)
9. Show, using the truth table method, that the following inferences are valid:
(i) P → Q, ¬Q |= ¬P
(ii) P → Q |= ¬Q → ¬P
(iii) P → Q, Q → R |= P → R
10. Repeat question 9 using resolution. In this case we want to show:
(i) P → Q, ¬Q |– ¬P
(ii) P → Q |– ¬Q → ¬P
(iii) P → Q, Q → R |– P → R
11. Determine whether the following sentences are valid (i.e., tautologies) using truth tables:
(i) ((P ∨ Q) ∧ ¬P) → Q
(ii) ((P → Q) ∧ ¬(P → R)) → (P → Q)
(iii) ¬(¬P ∧ P) ∧ P
(iv) (P ∨ Q) → ¬(¬P ∧ ¬Q)
12. Repeat question 11 using resolution. In this case we aim to show:
(i) |– ((P ∨ Q) ∧ ¬P) → Q
(ii) |– ((P → Q) ∧ ¬(P → R)) → (P → Q)
(iii) |– ¬(¬P ∧ P) ∧ P
(iv) |– (P ∨ Q) → ¬(¬P ∧ ¬Q)

4.12 FURTHER READING

1. www.mm.iit.uni-kiskilc.hu
2. www.cs.cardiff.ac.uk
3. www.yoda.cis.temple.edu
4. www.csie.ntu.edu.tw
5. www.csie.cyut.edu.tw
6. www.asutonlab.org
7. www.ndsu.nodak.edu
8. www.cs.wise.edu
9. www.acs.madonna.edu
10. www.pub.ufasta.edu.ar
11. www.virtualworldlets.net
12. www.neda10.norminet.org.ph
13. www.csi.usiuthal.edu
14. www.cs.colostate.edu
15. www.cs.cf.ac.uk
16. www.cs.ust.hk
17. www.it.bond.edu.au
18. www.rwc.us.edu
19. www.cs.uwp.edu
20. Hon, Giora. 1995. ‘Going Wrong: To Make a Mistake, to Fall into an Error’. The Review of Metaphysics, September.
21. Jensen, Thomas. 1997. ‘Disjunctive Program Analysis for Algebraic Data Types’. ACM Transactions on Programming Languages and Systems, September.
22. Field, Hartry. 2006. ‘Compositional Principle vs Schematic Reasoning’. The Monist, January.
23. Edgington, Dorothy. 1995. ‘On Conditionals’. Mind, April.
UNIT 5 WEAK SLOT AND FILLER STRUCTURES

Structure
5.0 Introduction
5.1 Unit Objectives
5.2 Semantic Nets
5.2.1 Representation in a Semantic Net
5.2.2 Inference in a Semantic Net
5.2.3 Extending Semantic Nets
5.3 Frames
5.3.1 Frame Knowledge Representation
5.3.2 Distinction between Sets and Instances
5.4 Summary
5.5 Key Terms
5.6 Answers to ‘Check Your Progress’
5.7 Questions and Exercises
5.8 Further Reading

5.0 INTRODUCTION

In the preceding unit, you learnt the various concepts relating to the use of predicate logic and how to use simple facts in logic. In this unit, you will learn the various concepts relating to weak slot and filler structures. There are two attributes that are of very general significance, the use of which you have already studied in the previous units: instance and isa. These attributes are important because they support property inheritance. They are called a variety of things in AI systems, but the names do not matter. What does matter is that they represent class membership and inclusion, and that class inclusion is transitive. In slot and filler systems, these attributes are usually represented explicitly. In logic-based systems, these relationships may be represented the same way, or they may be represented implicitly by a set of predicates describing a particular class. The slot and filler structure is introduced as a device to support property inheritance along isa and instance links. This is an important aspect of these structures. Monotonic inheritance can be performed substantially more efficiently with such structures than with pure logic, and non-monotonic inheritance is easily supported. The reason that inheritance is easy is that the knowledge in slot and filler systems consists of structures as a set of entities and their attributes. These structures turn out to be useful for other reasons besides the support for inheritance.
Weak slot and filler structures are a kind of data structure, and the reasons to use such a structure are as follows:
• It enables attribute values to be retrieved quickly.
■ Assertions are indexed by the entities.
■ Binary predicates are indexed by their first argument.
• Properties of relations are easy to describe.
• It allows ease of consideration, as it embraces aspects of object-oriented programming.
This structure is called a weak slot and filler structure because:
• A slot is an attribute–value pair in its simplest form.
• A filler is a value that a slot can take—it could be a numeric, string (or any data type) value or a pointer to another slot.
• A weak slot and filler structure does not consider the content of the representation.
We describe two kinds of such structures: semantic nets and frames. In this unit, you will learn about the representation of these structures themselves and techniques for reasoning with them. We call these ‘knowledge poor’ structures ‘weak’ by analogy with the weak methods for problem solving.

5.1 UNIT OBJECTIVES

After going through this unit, you will be able to:
• Define slot and filler systems
• Analyse the concept of semantic nets
• Discuss the role of frames

5.2 SEMANTIC NETS

The simplest kind of structured objects are semantic nets, which were originally developed in the early 1960s to represent the meaning of English words. They are important both historically and in introducing the basic ideas of class hierarchies and inheritance. A semantic net is really just a graph, where the nodes in the graph represent concepts and the arcs represent binary relationships between concepts. The most important relations between concepts are subclass relations between classes and subclasses, and instance relations between particular objects and their parent class. However, any other relations are allowed, such as has-part, colour, etc.
So, to represent some knowledge about animals (as AI people so often do) we might have a network representing the facts that mammals and reptiles are animals, that mammals have heads, that an elephant is a large grey mammal, that Clyde and Nellie are both elephants and that Nellie likes apples. The subclass relations define a class hierarchy (in this case a very simple one). The subclass and instance relations may be used to derive new information which is not explicitly represented. We should be able to conclude that both Clyde and Nellie have a head and are large and grey. They inherit information from their parent classes. Semantic networks normally allow efficient inheritance-based inferences using special-purpose algorithms. Semantic nets are fine at representing relationships between two objects, but what if we want to represent a relation between three or more objects? Let us say we want to represent the fact that ‘John gives Mary the book’. This might be
For example, if it’s just a typical elephant, then Clyde may have properties different from general elephant properties (such as being pink and not grey). The simplest way to interpret the class nodes is as denoting sets of objects. So, an elephant node denotes the set of all elephants. Nodes such as Clyde and Nellie denote individuals. So, the instance relationship can be defined in terms of set membership (Nellie is a member of the set of all elephants), while the subclass relation can be defined in terms of a subset relation—the set of all elephants is a subset of the set of all mammals. Saying that elephants are grey means (in the simple model) that every individual in the set of elephants is grey (so Clyde can’t be pink). If we interpret networks in this way we have the advantage of a clear, simple semantics, but the disadvantage of a certain lack of flexibility—maybe Clyde is pink! In the debate about semantic nets, people were also concerned about their representational adequacy (i.e. what sort of facts they were capable of representing). Things that are easy to represent in logic (such as ‘every dog in town has bitten the constable’) are hard to represent in nets (at least, in a way that has a clear and well-defined interpretation). Techniques were developed to allow such things to be represented, which involved partitioning the net into sections and introducing a special relationship. These techniques didn’t really catch on, so we won’t go into them here, but they can be found in many AI textbooks. To summarize, nets allow us to simply represent knowledge about an object that can be expressed as binary relations. Subclass and instance relations allow us to use inheritance to infer new facts/relations from the explicitly represented ones. However, early nets didn’t have very clear semantics (i.e. it wasn’t clear what the nodes and links really meant).
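The inheritance-based inference described above (Clyde inheriting ‘grey’ and ‘has a head’ from his parent classes) can be sketched with a simple dictionary-based net. This is a toy illustration, not a standard API; the node names and slot layout are assumptions made here.

```python
# A toy semantic net as a dict of nodes; isa/ako links drive inheritance.
net = {
    "elephant": {"ako": "mammal", "colour": "grey", "size": "large"},
    "mammal":   {"ako": "animal", "has_part": "head"},
    "animal":   {},
    "clyde":    {"isa": "elephant"},
    "nellie":   {"isa": "elephant", "likes": "apples"},
}

def get_property(node, prop):
    """Look at the node; if the property is absent, climb isa/ako links."""
    while node is not None:
        slots = net[node]
        if prop in slots:
            return slots[prop]
        node = slots.get("isa") or slots.get("ako")   # climb to the parent
    return None   # inheritance hierarchy exhausted

print(get_property("clyde", "colour"))    # inherited from elephant
print(get_property("clyde", "has_part"))  # inherited from mammal
```

Note that nothing in this lookup stops a node from overriding an inherited value (adding `"colour": "pink"` to clyde would shadow the elephant default), which is exactly the typical-elephant ambiguity discussed above.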
It was difficult to use nets in a fully consistent and meaningful manner and still use them to represent what you wanted to represent. Techniques evolved to get round this, but they are quite complex and seem to partly remove the attractive simplicity of the initial idea. A semantic net is a formal graphic language representing facts about entities in some world about which we wish to reason. The meaning of a concept comes from the ways in which it is connected to other concepts. Computer languages cannot tolerate the ambiguity of natural languages; it is therefore necessary to have well-defined sets of predicates and arguments for a domain. Semantic nets for realistic problems become large and cluttered; the problems need to be broken down. Semantic nets are a simple way of representing the relationships between entities and concepts.
The major ideas are that:
• The meaning of a concept comes from its relationship to other concepts, and that
• The information is stored by interconnecting nodes with labelled arcs.
Let us discuss the following aspects of semantic nets:
• Representation in a Semantic Net
• Inference in a Semantic Net
• Extending Semantic Nets

5.2.1 Representation in a Semantic Net

The physical attributes of a person can be represented as in Figure 5.1.
[Figure 5.1 shows a net in which Person isa Mammal, Person has_part Head, Dhoni is an instance of Person, Dhoni has team India and India has team_colours Black/Blue.]
Fig. 5.1 A Semantic Network
These values can also be represented in logic as: isa(person, mammal), instance(Dhoni, person), team(Dhoni, India). We have already seen how conventional predicates such as lecturer(Rao) can be written as instance(Rao, lecturer). Recall that isa and instance represent inheritance and are popular in many knowledge representation schemes. But we have a problem: how can we have predicates with more than two places in semantic nets? E.g. score(India, Australia, 236).
Solution:
• Create new nodes to represent new objects either contained or alluded to in the knowledge—game and fixture in the current example. Relate information to the nodes and fill up the slots (see Fig. 5.2).
[Figure 5.2 shows a Fixture node that isa Cricket Match, with home-team India, away-team Australia and score 236.]
Fig. 5.2 A Semantic Network for an n-Place Predicate
As a more complex example, consider the sentence: Sam gave Ravi a book. Here we have several aspects of an event (see Fig. 5.3).
[Figure 5.3 shows event1, an instance of gave, with agent Sam, receiver Ravi and object book13, an instance of book.]
Fig. 5.3 A Semantic Network for a Sentence

5.2.2 Inference in a Semantic Net

The basic inference mechanism is to follow links between nodes. There are two methods to do this:

5.2.2.1 Intersection search
The notion is that spreading activation out of two nodes and finding their intersection reveals relationships among objects. This is achieved by assigning a special tag to each visited node. The advantages include the entity-based organization and fast parallel implementation. However, every structured question needs a highly structured network.

Inheritance
The isa and instance representations provide a mechanism to implement inheritance. Inheritance also provides a means of dealing with default reasoning. For example, we could represent:
• Emus are birds.
• Typically birds fly and have wings.
• Emus run.
in the following semantic net:
[Figure 5.4 shows bird with action fly and has_part wings, and emu, an instance of bird, with action run.]
Fig. 5.4 A Semantic Network for Default Reasoning
In making certain inferences we will also need to distinguish between the link that defines a new entity and holds its value, and the other kind of link that relates two existing entities. Consider the example shown, where the heights of two people are depicted and we also wish to compare them. We need extra nodes for the concept as well as its value.
[Figure 5.5 shows Ravi with height 170 and Sam with height 160.]
Fig.
5.5 Two Heights Special procedures are needed to process these nodes, but without this distinction the analysis would be very limited. Ravi Sam height height greater than H2 H1 value value 170 160 Fig. 5.6 Comparison of Two Heights Defaults and inheritance • Defaults and inheritance are ways of achieving some common sense. • Inheritance is a way of reasoning by default, that is, when information is missing, fall back to defaults. • Semantic networks represent inheritance. grey 1 skin 4 tail elephant trunk legs 1 isa e1 name clyde 118 Self-Instructional Material Weak Slot and Filler Structures Using inheritance • To find the value of a property of e1, first look at e1. • If the property is not attached to that node, ‘climb’ the isa link to the node’s parent and search there. ■ isa signifies set membership: ■ ako signifies the subset relation: NOTES • Repeat, using isa/ako links, until the property is found or the inheritance hierarchy is exhausted. • Sets of things in a semantic network are termed types. • Individual objects in a semantic network are termed instances. Examples of Semantic Networks State: I own a tan leather chair. furniture ako person chair isa me part seal isa owner my-chair colour covering leather tan isa brown Event: John gives the book to Mary. give isa John agent event7 isa person object beneficiary isa Mary book23 isa book Complex event: Bilbo finds the magic ring in Gollum’s cave. Bilbo isa habbit ako person isa ring agent event9 isa find object magicring2 location owned-by cave77 isa Gollum cave Self-Instructional Material 119 Weak Slot and Filler Structures 5. 2.3 Extending Semantic Nets Here, we will consider some extensions to Semantic nets that overcome a few problems (see Questions and Exercises) or extend their expression of knowledge. NOTES Partitioned Networks Partitioned Semantic Networks allow for: • propositions to be made without commitment to truth. • expressions to be quantified. 
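Both inference mechanisms can be sketched in a few lines. The code below is an illustrative sketch: the node names follow the figures, everything else (the link tables, function names) is assumed. It climbs isa links for inheritance and spreads activation from two nodes for intersection search.

```python
from collections import deque

# Property lists and isa links, following the elephant/e1 example above.
properties = {
    "elephant": {"skin": "grey", "legs": 4, "trunk": 1, "tail": 1},
    "e1": {"name": "clyde"},
}
isa = {"e1": "elephant", "elephant": "mammal"}

def get_property(node, prop):
    """Look at the node itself, then climb isa links until found."""
    while node is not None:
        if prop in properties.get(node, {}):
            return properties[node][prop]
        node = isa.get(node)  # climb the isa link
    return None

# Intersection search: spread activation out of two nodes and intersect
# the sets of visited (tagged) nodes.
links = {"Dhoni": {"person", "India"}, "Sachin": {"person", "India"},
         "person": {"mammal"}, "India": {"Cricket-Team"}}

def spread(start):
    seen, frontier = {start}, deque([start])
    while frontier:
        for nxt in links.get(frontier.popleft(), set()) - seen:
            seen.add(nxt)
            frontier.append(nxt)
    return seen

print(get_property("e1", "skin"))          # 'grey', inherited from elephant
print(spread("Dhoni") & spread("Sachin"))  # nodes the two players share
```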
The basic idea is to break the network into spaces which consist of groups of nodes and arcs, and to regard each space as a node.

Consider the following: John believes that the earth is flat. We can encode the proposition 'the earth is flat' in a space and within it have nodes and arcs that represent the fact (see Fig. 5.7). We can then have nodes and arcs that link this space with the rest of the network to represent John's belief.

Fig. 5.7 Partitioned Network (event1 instance believes; agent John; object space 1, which contains Prop 1: Object 1 instance earth, has_property flat)

Now consider the quantified expression: Every parent loves their child. To represent this we:
• Create a general statement, GS, as a special class.
• Make node g an instance of GS.
• Every element will have at least two attributes:
  ■ a form that states which relation is being asserted.
  ■ one or more forall (∀) or exists (∃) connections; these represent the quantified variables in such statements, e.g. x and y in ∀x: parent(x) → ∃y: child(y) ∧ loves(x, y).

Here we have to construct two spaces, one for each of x and y. Note: We can express the variables as existentially quantified variables and express the event of love as having an agent p and receiver b for every parent p, which could simplify the network.

Also, if we change the sentence to 'Every parent loves the child', then the node of the object being acted on (the child) lies outside the form of the general statement. Thus it is not viewed as an existentially quantified variable whose value may depend on the agent. So we could construct a partitioned network as in Figure 5.8.

Fig. 5.8 A Quantified (forall/exists) Partitioned Network (gs1 instance GS; form loves with agent p1 in space1 and receiver c2 in space2; p1 forall, c2 exists)

Construct the semantic nets for the following sentences:
1. The dog bit the mail carrier.
2. Every dog has bitten a mail carrier.
3.
Every dog in town has bitten the mail carrier.
4. Every dog has bitten every mail carrier.

1. The dog bit the mail carrier.
(b instance Bite; d instance Dogs, the assailant; m instance mail carrier, the victim)

2. Every dog has bitten a mail carrier.
(g instance GS; its form is space S1, containing: b instance Bite; d instance Dogs as assailant; m instance mail carrier as victim)

3. Every dog in town has bitten the mail carrier.
(As in 2, but d is an instance of Town Dogs, a subclass of Dogs, and the mail carrier node lies outside the form of the general statement since it refers to one specific individual)

4. Every dog has bitten every mail carrier.
(As in 2, but both d and m lie inside the form of the general statement and are universally quantified)

Check Your Progress
1. A semantic net is a formal, non-graphic language representing facts about entities in some world about which we wish to reason. (True or False)
2. Semantic nets do not allow us to simply represent knowledge about an object that can be expressed as binary relations. (True or False)
3. Frames on their own are not particularly helpful but frame systems are a powerful way of encoding information to support reasoning. (True or False)

5.3 FRAMES

Frames can also be regarded as an extension to semantic nets. Indeed, it is not clear where the distinction between a semantic net and a frame ends. Initially, semantic nets were used to represent labelled connections between objects. As tasks became more complex, the representation needed to be more structured; the more structured the system, the more beneficial it is to use frames.

A frame is a collection of attributes or slots and associated values that describe some real-world entity. Frames on their own are not particularly helpful, but frame systems are a powerful way of encoding information to support reasoning. Set theory provides a good basis for understanding frame systems. Each frame represents:
• a class (set), or
• an instance (an element of a class).
Instead of properties a frame has slots.
A slot is like a property, but can contain more kinds of information, sometimes called facts of the slot:
• The value of the slot
• A procedure that can be run to compute the value (an if-needed procedure)
• Procedures to be run when a value is put into the slot or removed
• Data type information and constraints on possible slot fillers
• Documentation

Frames can inherit slots from parent frames. For example, Man might inherit properties from Ape or Mammal (a parent class of Man).

Properties
• Frames implement semantic networks.
• They add procedural attachment.
• A frame has slots and slots have values.
• A frame may be generic, i.e. it describes a class of objects.
• A frame may be an instance, i.e. it describes a particular object.
• Frames can inherit properties from generic frames.

Frames are a variant of nets and are one of the most popular ways of representing non-procedural knowledge in an expert system. In a frame, all the information relevant to a particular concept is stored in a single complex entity, called a frame. Superficially, frames look pretty much like record data structures. However frames, at the very least, support inheritance. They are often used to capture knowledge about typical objects or events, such as a typical bird or a typical restaurant meal.

We could represent some knowledge about elephants in frames as follows:

Mammal:
  subclass: Animal
  warm_blooded: yes

Elephant:
  subclass: Mammal
  * colour: grey
  * size: large

Clyde:
  instance: Elephant
  colour: pink
  owner: Fred

Nellie:
  instance: Elephant
  size: small

A particular frame (such as Elephant) has a number of attributes or slots such as colour and size, where these slots may be filled with particular values, such as grey. We have used a '*' to indicate those attributes that are only true of a typical member of the class, and not necessarily of every member.
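The elephant frames above can be sketched directly as dictionaries. This is an illustrative sketch (the lookup function and its name are assumed, not from the text): a frame's own value wins, otherwise the lookup climbs instance/subclass links to find a '*'-style default.

```python
# The elephant frames as dictionaries, with defaults supplied by
# inheritance unless a more specific frame overrides them.
frames = {
    "Mammal":   {"subclass": "Animal", "warm_blooded": "yes"},
    "Elephant": {"subclass": "Mammal", "colour": "grey", "size": "large"},
    "Clyde":    {"instance": "Elephant", "colour": "pink", "owner": "Fred"},
    "Nellie":   {"instance": "Elephant", "size": "small"},
}

def lookup(name, slot):
    """Return the frame's own value, else climb instance/subclass links."""
    while name in frames:
        slots = frames[name]
        if slot in slots:
            return slots[slot]
        name = slots.get("instance") or slots.get("subclass")
    return None

print(lookup("Clyde", "colour"))         # 'pink': own value beats the default
print(lookup("Nellie", "colour"))        # 'grey': inherited from Elephant
print(lookup("Nellie", "warm_blooded"))  # 'yes': inherited from Mammal
```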
Most frame systems will let you distinguish between typical attribute values and definite values that must be true. [Rich & Knight in fact distinguish between attribute values that are true of the class itself, such as the number of members of that class, and typical attribute values of members.] In the above frame system, we would be able to infer that Nellie is small, grey and warm blooded, while Clyde is large, pink, warm blooded and owned by Fred.

Objects and classes inherit the properties of their parent classes UNLESS they have an individual property value that conflicts with the inherited one. Inheritance is simple where each object/class has a single parent class, and where slots take single values. If slots may take more than one value, it is less clear whether to block inheritance when you have more specific information. For example, if you know that a mammal has_part head, and that an elephant has_part trunk, you may still want to infer that an elephant has a head. It is therefore useful to label slots according to whether they take single values or multiple values.

If objects/classes have several parent classes (e.g. Clyde is both an elephant and a circus animal), then you may have to decide which parent to inherit from (maybe elephants are by default wild, but circus animals are by default tame). There are various mechanisms for making this choice, based on choosing the most specific parent class to inherit from.

In general, both slots and slot values may themselves be frames. Allowing slots to be frames means that we can specify various attributes of a slot. We might want to say, for example, that the slot size must always take a single value of type size-set (where size-set is the set of all sizes). The slot owner may take multiple values of type person (Clyde could have more than one owner).
We could specify this in the frames:

Size:
  instance: Slot
  single_valued: yes
  range: Size-set

Owner:
  instance: Slot
  single_valued: no
  range: Person

The attribute value Fred (and even large and grey, etc.) could be represented as a frame, e.g.:

Fred:
  instance: Person
  occupation: Elephant-breeder

One final useful feature of frame systems is the ability to attach procedures to slots. So, if we do not know the value of a slot, but know how it could be calculated, we can attach a procedure to be used, if needed, to compute the value of that slot. Maybe we have slots representing the length and width of an object and sometimes need to know the object's area: we would write a (simple) procedure to calculate it, and put that in place of the slot's value. Such mechanisms of procedural attachment are useful, but perhaps should not be overused, or else our nice frame system would consist mainly of lots of procedures interacting in an unpredictable fashion.

Frame systems, in all their full glory, are pretty complex and sophisticated things. More details are available in [Rich & Knight, sec 9.2] or [Luger & Stubblefield, sec 9.4.1] (or, in less detail, in [Bratko]). The main idea to get clear is the notion of inheritance and default values. Most of the other features are developed to support inheritance reasoning in a flexible but principled manner. As we saw for nets, it is easy to get confused about what slots and objects really mean. In frame systems we partly get round this by distinguishing between default and definite values, and by allowing users to make slots 'first class citizens', giving a slot particular properties by writing a frame-based definition.

Advantages of frames
1. A frame collects information about an object in a single place in an organized fashion.
2.
By relating slots to other kinds of frames, a frame can represent typical structures involving an object; these can be very important for reasoning based on limited information.
3. Frames provide a way of associating knowledge with objects.
4. Frames may be a relatively efficient way of implementing AI applications.
5. Frames allow data that are stored and data that are computed to be treated in a uniform manner. For example, a stored class and a computed grade or marks of a student might both be retrieved the same way.

Disadvantages of frames
1. Slot fillers must be real data. For example, it is not possible to say that Ram is a player or a student, since there is no way to deal with a disjunction in a slot filler.
2. It is not possible to quantify over slots. For example, there is no way to represent 'some students made 100 on the examination'.
3. It is necessary to repeat the same information to make it usable from different viewpoints, since methods are associated with slots or particular object types. For example, it may be easy to answer 'Whom does Sam love?' but hard to answer 'Who loves Nisha?'

Frames are good for applications in which the structure of the data and the available information are relatively fixed. The procedures or methods associated with slots provide a good way to associate knowledge with a class of objects. Frames are not particularly good for applications in which deep deductions are made from a few principles, as in theorem proving.

Object oriented programming and frame systems have much in common:
• Class/instance structure
• Inheritance from higher classes
• Ability to associate programs with classes and call them automatically
Because of these similarities it is not possible to draw a hard distinction between object oriented systems and frames. However, there are some differences that are typically found. Frame systems typically have the following features in contrast to the typical object oriented system:
1. Richer slots: A slot can contain more kinds of information.
E.g. documentation, default values, restrictions on slot fillers.
2. Slot orientation: Usually there are no messages that are not associated with slots.
3. More complex structure: A frame system could potentially have instance values, default values and, if needed, methods for the same slot. When all these are combined with multiple inheritance paths, the result can be complex.

There are certain problems with frames:
• Inheritance causes trouble.
• Inference tends to become complex and ill structured.
• Frames are good for representing static situations; they are not so good for representing things that change over time.

An example of a simple frame is shown below:

(Ravi
  (profession (value professor))
  (age (value 42))
  (wife (value Rani))
  (children (value A, B))
  (address
    (street (value Nagarampalem))
    (city (value Guntur))
    (state (value AP))
    (PIN (value 522 004))))

The general frame template structure is shown below:

(<frame name>
  (<slot 1>
    (<fact 1> <value 1> ... <value k1>)
    (<fact 2> <value 1> ... <value k2>))
  (<slot 2>
    (<fact 1> <value 1> ... <value km>))
  ...)

From this general structure it is seen that a frame may have any number of slots, and a slot may have any number of facts, each with any number of values. This provides a general framework from which to build a variety of knowledge structures. The slots in a frame may be general or specific.
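The Ravi frame translates naturally into nested dictionaries following the template's frame → slot → fact → values shape. The accessor below is an illustrative sketch; its name is assumed.

```python
# The Ravi frame as nested dictionaries: frame -> slot -> fact -> values.
ravi = {
    "profession": {"value": ["professor"]},
    "age":        {"value": [42]},
    "wife":       {"value": ["Rani"]},
    "children":   {"value": ["A", "B"]},
    "address": {
        "street": {"value": ["Nagarampalem"]},
        "city":   {"value": ["Guntur"]},
        "state":  {"value": ["AP"]},
        "PIN":    {"value": ["522 004"]},
    },
}

def slot_values(frame, slot, fact="value"):
    """Return the list of values stored under a slot's fact."""
    return frame[slot][fact]

print(slot_values(ravi, "children"))  # ['A', 'B']
```

Because the address slot is itself frame-shaped, the same accessor works one level down, e.g. `slot_values(ravi["address"], "city")`.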
Now we will discuss:
• Frame knowledge representation
• Interpreting frames

5.3.1 Frame Knowledge Representation

Consider the example first discussed for semantic nets:

Person:
  isa: Mammal
  Cardinality:

Adult-Male:
  isa: Person
  Cardinality:

Cricket-Player:
  isa: Adult-Male
  Cardinality:
  Height:
  Weight:
  Position:
  Team:
  Team-Colours:

Back:
  isa: Cricket-Player
  Cardinality:
  Tries:

Dhoni:
  instance: Back
  Height: 6-0
  Position: Centre
  Team: India
  Team-Colours: Black

Cricket-Team:
  isa: Team
  Cardinality:
  Team-size: 10
  Coach:

Indian-Team:
  instance: Cricket-Team
  Team-size: 10
  Coach: Amarnadh
  Players: {Sachin, Yuvaraj Singh, Srinath, Harbajan, ...}

Fig. 5.9 A Simple Frame System

Here, the frames Person, Adult-Male, Cricket-Player, Back and Cricket-Team are all classes, and the frames Dhoni and Indian-Team are instances.

Note:
• The isa relation is in fact the subset relation.
• The instance relation is in fact element-of.
• The isa attribute possesses a transitivity property. This implies: Dhoni is a Back, and a Back is a Cricket-Player, who in turn is an Adult-Male and also a Person.
• Both isa and instance have inverses, which are called subclasses and all-instances.
• There are attributes that are associated with the class or set itself, such as cardinality, and on the other hand there are attributes that are possessed by each member of the class or set.

5.3.2 Distinction between Sets and Instances

It is important that this distinction is clearly understood. India can be thought of as a set of players or as an instance of a Cricket-Team. If India were a class then:
• its instances would be players
• it could not be a subclass of Cricket-Team; otherwise, its elements would be members of Cricket-Team, which we do not want.
Instead, we make it a subclass of Cricket-Player, and this allows the players to inherit the correct properties while enabling India to inherit information about teams. This means that India is an instance of Cricket-Team. But there is a problem here:
• A class is a set and its elements have properties.
• We wish to use inheritance to bestow values on its members.
• But there are properties that the set or class itself has, such as the manager of a team.
This is why we need to view India as both a subclass of one class (players) and an instance of another (teams). We seem to have a CATCH-22. Solution: metaclasses.

A metaclass is a special class whose elements are themselves classes. Now consider our cricket teams as:

Class:
  instance: Class
  isa: Class
  Cardinality: ...

Team:
  instance: Class
  isa: Class
  Cardinality: the number of teams
  Team-size: 10

Cricket-Team:
  instance: Class
  isa: Team
  Cardinality: the number of cricket teams
  Team-size: 10
  Coach: Amarnath

India:
  instance: Cricket-Team
  Team-size: 10
  Coach: Amarnath

Amarnath:
  instance: Back
  Height: 6-0
  Position: Slip
  Team: India
  Team-colours: Blue

Fig. 5.10 A Metaclass Frame System

The basic metaclass is Class, and this allows us to:
• define classes which are instances of other classes, and (thus)
• inherit properties from this class.
Inheritance of default values occurs when one element or class is an instance of a class.

Slots as Objects
How can we represent the following properties in frames?
• Attributes such as weight and age, attached where they make sense
• Constraints on values, such as age being less than a hundred
• Default values
• Rules for inheritance of values, such as children inheriting parents' names
• Rules for computing values
• Many values for a slot

A slot is a relation that maps from its domain of classes to its range of values. A relation is a set of ordered pairs, so one relation can be a subset of another.
Since a slot is a set, the set of all slots can be represented by a metaclass called, say, SLOT. Consider the following:

SLOT:
  isa: Class
  instance: Class
  domain:
  range:
  range-constraint:
  definition:
  default:
  to-compute:
  single-valued:

Coach:
  instance: SLOT
  domain: Cricket-Team
  range: Person
  range-constraint: λx (experience x.manager)
  default:
  single-valued: TRUE

Colour:
  instance: SLOT
  domain: Physical-Object
  range: Colour-Set
  single-valued: FALSE

Team-Colours:
  instance: SLOT
  isa: Colour
  domain: team-player
  range: Colour-Set
  range-constraint: not Pink
  single-valued: FALSE

Position:
  instance: SLOT
  domain: Cricket-Player
  range: {Back, Forward, Reserve}
  to-compute: λx x.position
  single-valued: TRUE

Note the following:
• Instances of SLOT are slots.
• Associated with SLOT are attributes that each instance will inherit.
• Each slot has a domain and a range.
• Range is split into two parts: one, the class of the elements; the other, a constraint, which is a logical expression; if absent, it is taken to be true.
• If there is a value for default then it must be passed on unless an instance has its own value.
• The to-compute attribute involves a procedure to compute its value, e.g. in Position, where we use the dot notation to assign values to the slot of a frame.
• Transfers-through lists other slots from which values can be derived by inheritance.

Interpreting frames
A frame system interpreter must be capable of the following in order to exploit the frame slot representation:
• Consistency checking: when a slot value is added to a frame, the domain attribute is consulted and the value is checked for legality using range and range-constraints.
• Propagation of definition values along isa and instance links.
• Inheritance of default values along isa and instance links.
• Computation of the value of a slot as needed.
• Checking that only the correct number of values is computed.
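The consistency check the interpreter performs can be sketched as follows. This is an illustrative sketch: the slot specifications mirror the SLOT frames above (including the 'not Pink' range-constraint), while the table contents and function names are assumed.

```python
# Slot specifications mirroring the SLOT frames: range, range-constraint
# and single-valued, used to validate a filler before it is stored.
slot_spec = {
    "Team-Colours": {"range": {"Black", "Blue", "Pink", "Red"},
                     "range_constraint": lambda v: v != "Pink",
                     "single_valued": False},
    "Position": {"range": {"Back", "Forward", "Reserve"},
                 "single_valued": True},
}

def legal_filler(slot, value):
    """A filler is legal if it lies in the range and passes the constraint."""
    spec = slot_spec[slot]
    in_range = value in spec["range"]
    # A missing range-constraint is taken to be true, as in the text.
    constraint = spec.get("range_constraint", lambda v: True)
    return in_range and constraint(value)

print(legal_filler("Team-Colours", "Blue"))  # True
print(legal_filler("Team-Colours", "Pink"))  # False: blocked by the constraint
print(legal_filler("Position", "Centre"))    # False: not in the range
```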
5.4 SUMMARY

In this unit, you have learned the various concepts about weak slot and filler structures. There are two attributes that are of very general significance, the use of which you have already learned in the previous units. These attributes are important because they support property inheritance. This unit also discussed semantic nets and frames in detail. The idea of a frame system as a way to represent declarative knowledge has been encapsulated in a series of frame-oriented representation languages, whose features have evolved and been driven by an increased understanding of the representation issues involved.

• Semantic networks start out by encoding relationships like isa and ako, but can represent quite complicated information on this simple basis.
• Frames organize knowledge around concepts considered to be of interest (like person and patient in the example code).
• A frame can be a generic frame (or template) or an instance frame.
• Frames also allow procedural attachment; that is, demons can be attached to slots so that the mere fact of creating a frame or accessing a slot can cause significant computation to be performed.

5.5 KEY TERMS

• Monotonic inheritance: This can be performed substantially more efficiently with such structures than with pure logic, and non-monotonic inheritance is easily supported.
• Semantic net: A formal graphic language representing facts about entities in some world about which we wish to reason.
• Frames: These can be regarded as an extension to semantic nets. Indeed, it is not clear where the distinction between a semantic net and a frame ends.

5.6 ANSWERS TO 'CHECK YOUR PROGRESS'

1. False
2. False
3. True

5.7 QUESTIONS AND EXERCISES

1. Discuss in detail, with proper illustrations, semantic nets and scripts. What are their advantages and limitations?
2.
Express the following statements using semantic nets:
(i) Every student has been hit by every teacher (at least once).
(ii) Every student has been hit by some teacher (or the other).
(iii) Some students have been hit by some teachers.
3. Give semantic nets to describe the following:
Narayana is a writer. Narayana lives in Hyderabad. Jagan is a teacher. Jagan lives in Mumbai. Narayana sent a copy of his book to Jagan. Jagan sent his thanks to Narayana.

5.8 FURTHER READING

1. www.cee.hw.ac.uk
2. www.cs.cardiff.ac.uk
3. www.cse.unsw.edu.ac
4. www.mm.iit.uni-miskolc.hu
5. www.cindy.cis.netu.edu.tw
6. www.uet.net.pk
7. Marinov, M. 2008. 'Using Frames for Knowledge Representation in a CORBA-Based Distributed Environment'. Knowledge Based Systems. July.
8. Cheng, H.C. 2006. 'A Web-Based Distributed Problem Solving Environment for Engineering Application'. Advances in Engineering Software. February.

UNIT 6 STRONG SLOT AND FILLER STRUCTURES

Structure
6.0 Introduction
6.1 Unit Objectives
6.2 Conceptual Dependency
6.3 Scripts
6.4 CYC
    6.4.1 Motivation
6.5 CYCL
6.6 Global Ontology
6.7 Summary
6.8 Key Terms
6.9 Answers to 'Check Your Progress'
6.10 Questions and Exercises
6.11 Further Reading

6.0 INTRODUCTION

In the previous unit, you learnt about slot and filler structures in very general outlines. In this unit, you will learn about strong slot and filler structures. Individual semantic networks and frame systems may have specialized links and inference procedures, but there are no hard and fast rules about what kinds of objects and links are good in general for knowledge representation. Such decisions are left up to the builder of the semantic network or frame system.
Strong slot and filler structures typically:
• Represent links between objects according to more rigid rules
• Provide specific notions of what types of objects, and what relations between them, are allowed
• Represent knowledge about common situations

We have two types of strong slot and filler structures:
1. Conceptual Dependency (CD)
2. Scripts

For a specific need, we can place restrictions on the structure; that is, we can have a meaningful structure with specific structural elements and primitives. This will enable us to derive specific methods for using the structure.

• Conceptual dependencies – for dealing with natural languages
  A low-level representation capturing textual information.
  Primitives:
    Actions – transfer, movement, mental, sensory
    Objects – modifiers of actions, modifiers of objects
    Tenses, directionality

Note that many sentences can form the same CD because they convey the same information, just expressed differently:
– I gave the man a book.
– I gave the book to a man.
– That was the man that I gave the book to.
If a sentence is ambiguous, the conceptual dependency must also contain the ambiguity, or we could create multiple conceptual dependencies, one for each interpretation. CDs can be very lengthy for even a very simple idea.

• Scripts – for planning or reasoning over 'stereotypical events'
  A script captures the typical sequence of actions along with the actors and props of a typical event. It includes preconditions, postconditions, roles, props, settings and variations. Scripts are used primarily for answering questions about and summarizing the input text. What about a typical action?

• CYC – an experiment dealing with common sense knowledge
  Most systems have no common sense reasoning abilities at all; this has been a criticism of AI. CYC is an experiment to program common sense knowledge, using simple inferencing strategies (e.g. resolution) to derive new common sense facts.
CYC systems consist of perhaps 10 million individual common sense facts, the result of an effort of over five years by many people. CYC uses a frame-based representation and is capable of generating new generalizations.

• Functional representation – for reasoning over device functions and behaviour
  How do we represent functionality and device behaviour? How do we reason about how things work? We can use a 'functional' representation to perform hypothetical reasoning. Elements of a functional representation:
  i. Devices are described by a function, structure and behaviour.
  ii. Function: the purposes of component X.
  iii. Structure: how the components are physically constructed and connected.
  iv. Behaviour: the sequence of states that X undergoes to bring about the function.

  Examples of functional representation:
  • Physical objects: doorbell ringer system, chemical processing unit, human body subsystem (e.g. Gate)
  • Abstract objects: computer program, combat mission plan, diagnostic reasoning process
  • Uses of functional representations: given a functional representation of how something works, generate a fault tree for diagnosis, perform hypothetical reasoning, produce predictions and simulations, and perform redesign.

6.1 UNIT OBJECTIVES

After going through this unit, you will be able to:
• Understand conceptual dependency
• Describe how to represent knowledge through conceptual dependency
• Discuss the role of scripts
• Understand the concept of CYC
• Explain CYCL
• Discuss the concept of global ontology

6.2 CONCEPTUAL DEPENDENCY (CD)

Conceptual Dependency was originally developed to represent knowledge acquired from natural language input. The goals of this theory are:
• To help in the drawing of inferences from sentences
• To be independent of the words used in the original input; that is to say, for any two (or more) sentences that are identical in meaning, there should be only one representation of that meaning.

It has been used by many programs that claim to understand English. Conceptual Dependency is a theory of Roger Schank that combines several ideas. CD provides:
• a structure into which nodes representing information can be placed
• a specific set of primitives
• a given level of granularity

Sentences are represented as a series of diagrams depicting actions using both abstract and real physical situations.
• The agent and the objects are represented.
• The actions are built up from a set of primitive acts which can be modified by tense.

Examples of primitive acts are:
ATRANS – Transfer of an abstract relationship, e.g. give.
It has been used by many programs that claim to understand English. Conceptual Dependency is a theory of Roger Schank that combines several ideas. CD provides: • a structure into which nodes representing information can be placed • a specific set of primitives • a given level of granularity Sentences are represented as a series of diagrams depicting actions using both abstract and real physical situations. • The agent and the objects are represented. • The actions are built up from a set of primitive acts which can be modified by tense. Examples of Primitive Acts are: ATRANS Transfer of an abstract relationship. e.g., give. Self-Instructional Material 135 Strong Slot and Filler Structures PTRANS Transfer of the physical location of an object. e.g., go. NOTES PROPEL Application of a physical force to an object. e.g., push. MTRANS Transfer of mental information. e.g., tell. MBUILD Construct new information from old. e.g., decide. SPEAK Utter a sound. e.g., say. ATTEND Focus a sense on a stimulus. e.g., listen, watch. MOVE Movement of a body part by owner. e.g., punch, kick. GRASP Actor grasping an object. e.g., clutch. INGEST Actor ingesting an object. e.g., eat. EXPEL Actor getting rid of an object from the body. Conceptual Dependency introduced several interesting ideas: 1. An internal representation that is language free, using primitive ACTs instead. 2. A small number of primitive ACTs rather than thousands. 3. Different ways of saying the same thing can map the same internal representation. There are some problems too: 1. The expansion of some verbs into primitive ACT structures can be complex. 2. Graph matching is NP-hard. Six primitive conceptual categories provide building blocks which are the set of allowable dependencies in the concepts in a sentence: 136 Self-Instructional Material ACTs Actions PPs Object (Picture Procedure) AAs Modifiers of actions (Action aiders) PAs Modifiers of objects, i.e. 
PPs (picture aiders)

AA – Action aiders are properties or attributes of primitive actions.
PP – Picture producers are actors or physical objects that perform different acts.

How do we connect these things together? Consider the example: John gives Mary a book.
• Arrows indicate the direction of dependency. Letters above an arrow indicate certain relationships: o – object; R – recipient-donor; I – instrument (e.g. eats with a spoon); D – destination (e.g. going home).
• Double arrows (⇔) indicate two-way links between the actor (PP) and the action (ACT).
• The actions are built from the set of primitive acts (see above), which can be modified by tense etc.
• The primitive ATRANS indicates transfer of possession.
• O indicates the object relation.
• R indicates the recipient relation.

A list of basic conceptual dependencies is shown below. These capture the fundamental semantic structures of natural language. For example, the first conceptual dependency describes the relationship between a subject and its verb, and the third describes the verb-object relation.

1. PP ⇔ ACT indicates that an actor acts.
2. PP ⇔ PA indicates that an object has a certain attribute.
3. ACT → PP (with an o above the arrow) indicates the object of an action.

The use of tense and mood in describing events is extremely important, and Schank introduced the following modifiers:

p – past
f – future
t – transition
ts – transition start
tf – transition finished
k – continuing
? – interrogative
/ – negative
delta – timeless
c – conditional

The absence of any modifier implies the present tense.

4. Indicates the recipient and the donor of an object within an action.
5. Indicates the direction of an object within an action.
6. Indicates the instrumental conceptualization for an action.
7. Indicates that conceptualization X caused conceptualization Y.
8. Indicates a state change of an object.
9. PP1 ← PP2 indicates that PP2 is part of PP1.
10.
Describes the relationship between a PP and an attribute that has already been predicated of it.
11. Represents the relationship between a PP and a state in which it started and another in which it ended.
12. The two forms of the rule describe (a) the cause of an action and (b) the cause of a state change.
13. Describes the relationship between a conceptualization and the time at which the event it describes occurred.
These can be combined to represent a simple transitive sentence such as: 'John throws the ball.'
John ⇔ PROPEL ←o Ball
So the past tense of the above example, 'John gave Mary a book', becomes a similar diagram with the past marker p on the double arrow. The diagram has an actor, PP, and an action, ACT, i.e. PP ⇔ ACT. The triple arrow is a similar two-way link, but between an object, PP, and its attribute, PA, i.e. PP ⇔ PA. It represents isa-type dependencies. E.g.
Rao ⇔ lecturer: Rao is a lecturer.
Primitive states are used to describe many state descriptions such as height, health, mental state and physical state. There are many more physical states than primitive actions. They use a numeric scale. E.g.
John ⇔ height(+10): John is the tallest.
John ⇔ height(< average): John is short.
Frank ⇔ health(-10): Frank is dead.
Dave ⇔ mental_state(-10): Dave is sad.
Vase ⇔ physical_state(-10): The vase is broken.
You can also specify things like the time of occurrence in the relationship. For example, 'John gave Mary the book yesterday' is drawn (shown here linearly, with the diagram flattened) as:
John ⇔(p) ATRANS ←o Book, R: to Mary, from John, time: yesterday.
Now, let us consider a more complex sentence: Since smoking can kill you, I stopped. Let's look at how we represent the inference that smoking can kill:
• Use the notion of 'One' to apply the knowledge to.
• Use the primitive act of One INGESTing smoke from a cigarette.
• Killing is a transition from being alive to dead. We use triple arrows to indicate a transition from one state to another.
• Have a conditional (c) causality link. The triple arrow indicates dependency of one concept on another.
(The flattened diagram here shows: One ⇔ INGEST ←o Smoke, R: from a cigarette to One, conditionally causing a transition of One's state to health(-10), i.e., dead.)
To add the fact that I stopped smoking:
• Use similar rules to imply that I smoke cigarettes.
• The qualification tf (transition finished) attached to this dependency indicates that the act of INGESTing smoke has stopped.
Advantages of CD:
• Using these primitives involves fewer inference rules.
• Many inference rules are already represented in the CD structure.
• The holes in the initial structure help to focus on the points still to be established.
Disadvantages of CD:
• Knowledge must be decomposed into fairly low-level primitives.
• It is impossible or difficult to find the correct set of primitives.
• A lot of inference may still be required.
• Representations can be complex even for relatively simple actions. Consider: Dave bet Frank five pounds that Wales would win the Rugby World Cup. Complex representations require a lot of storage.
Applications of CD:
MARGIE (Meaning Analysis, Response Generation and Inference on English): models natural language understanding.
SAM (Script Applier Mechanism): uses scripts to understand stories (see the next section).
PAM (Plan Applier Mechanism): uses plans to understand stories.
Schank et al. developed all of the above.

6.3 SCRIPTS

A script is a structure that prescribes a set of circumstances which could be expected to follow on from one another. It is similar to a thought sequence or a chain of situations which could be anticipated. It could be considered to consist of a number of slots or frames but with more specialized roles. A script can be viewed as a frame for a sequence of actions. Understanding natural language often requires knowledge of typical sequences. Example: Shyam went to a restaurant.
He ordered an exclusive dinner. He had forgotten his wallet. He had to wash dishes. The sequence of sentences mentions only the parts of the story that are different from what might otherwise be expected. Understanding such a sequence requires knowledge of a restaurant script that specifies the typical sequence of actions involved in going to a restaurant. In contrast to the relatively static slots of a frame, a script may have a directed graph of events composing it.
A script is a structure that describes a stereotyped sequence of events in a particular context. A script consists of slots. Each slot contains some information about the kinds of values it may contain, as well as a default value to be used if no other information is available. Scripts are frame-like structures used to represent commonly occurring situations such as going to the movies, shopping in a supermarket, eating in a restaurant, visiting a dentist, or reading about baseball or politics. Scripts are used in natural language understanding systems to organize a knowledge base in terms of the situations that the system is to understand.
When people enter a restaurant they know what to expect and how to act. They are met at the entrance by restaurant staff, or by a sign indicating that they should continue in and be directed to a seat. Either a menu is available at the table, or it is presented by the waiter, or the customer asks for it. Then there are the routines for ordering food, eating, paying and leaving. In fact, this restaurant script is quite different from other eating scripts, such as the 'fast food' model or the 'formal family meal'. In the fast food model the customer enters, gets in the line to order, pays for the meal (before eating), waits for a tray with the food, takes the tray, tries to find a clean table, and so on. These are two different stereotyped sequences of events, and each has a potential script.
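A script with its entry conditions, props, roles, track and scenes can be sketched as a simple data structure. The following Python sketch is illustrative only; the field names and the fast-food events are invented for this example, not taken from the text:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Script:
    """A frame-like structure for a stereotyped sequence of events."""
    name: str
    track: str                      # specific variation of the general pattern
    props: List[str]                # objects whose presence can be inferred
    roles: List[str]                # participants in the events
    entry_conditions: List[str]     # must hold before the script applies
    scenes: Dict[str, List[str]]    # scene name -> ordered primitive-act events
    results: List[str]              # hold after the script has run

fast_food = Script(
    name="Restaurant",
    track="Fast food",
    props=["counter", "menu board", "tray", "money", "food"],
    roles=["customer", "server", "cashier"],
    entry_conditions=["customer is hungry", "customer has money"],
    scenes={
        "entering": ["customer PTRANS into restaurant",
                     "customer PTRANS to queue"],
        "ordering": ["customer MTRANS order to server",
                     "customer ATRANS money to cashier"],
        "eating":   ["server ATRANS tray to customer",
                     "customer INGEST food"],
        "exiting":  ["customer PTRANS out of restaurant"],
    },
    results=["customer is no longer hungry", "restaurant has more money"],
)

# Default reasoning: a prop can be assumed present even if never mentioned.
print("tray" in fast_food.props)
```

A story understander would match incoming sentences against the scene events and then infer the unmentioned ones as defaults.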
Scripts are beneficial because:
• Events tend to occur in known runs or patterns.
• Causal relationships between events exist.
• Entry conditions exist which allow an event to take place.
• Events depend on prerequisites being satisfied, e.g., when a student progresses through a degree scheme or when a purchaser buys a house.
The components of a script include:
Entry Conditions: Descriptors of the world that must be true for the script to be called. From the above script, these include an open restaurant and a hungry customer that has some money; in general, conditions that must be satisfied before the events described in the script can occur.
Results: Those results or facts that are true once the script has terminated. For example, the customer is full and poorer, and the restaurant owner has more money; in general, conditions that will be true after the events described in the script have occurred.
Props: The 'things' that support the content of the script. These might include tables, waiters and menus. The set of props supports reasonable default assumptions about the situation; a restaurant is assumed to have tables and chairs unless stated otherwise. In other words, these are the slots representing the objects that are involved in the events described in the script. The presence of these objects can be inferred even if they are not mentioned explicitly.
Roles: The actions that the individual participants perform. The waiter takes the order, delivers food and presents the bill; the customers order, eat and pay. In other words, these are the slots representing the people who are involved in the events described in the script. If specific individuals are mentioned, they can be inserted into the appropriate slots.
Track: The specific variation on a more general pattern that is represented by a particular script.
Scenes: The script is broken into a sequence of scenes, each of which presents a temporal aspect of the script: the sequence of events that occur. Events are represented in conceptual dependency form. In the restaurant there is entering, ordering, eating, etc.
Example 1: Restaurant Script Structure
SCRIPT : Restaurant
TRACK : Oberoi
PROPS : Tables, Chairs, Menu, Money, Food
ROLES : Customer, Waiter, Cashier, Owner, Cook
Entry Conditions:
(a) Customer is hungry.
(b) Customer has money.
Scene 1: Entering
(a) Customer PTRANS into restaurant.
(b) Customer ATTEND eyes to the tables.
(c) Customer MBUILD where to sit.
(d) Waiter PTRANS customer to an empty table.
(e) Customer MOVE to sitting position.
Scene 2: Ordering
(a) Waiter PTRANS the menu.
(b) Customer MBUILD choice of food.
(c) Customer MTRANS signal to waiter.
(d) Waiter PTRANS to table.
(e) Customer MTRANS 'I want food' to waiter.
Scene 3: Eating
(a) Cook ATRANS food to waiter.
(b) Waiter ATRANS food to customer.
(c) Customer INGEST food.
(Option: Return to Scene 2 to order more; otherwise, go to Scene 4.)
Scene 4: Exiting
(a) Customer MTRANS to waiter.
(b) Waiter PTRANS to customer.
(c) Waiter ATRANS bill to customer.
(d) Customer ATRANS tip to waiter.
(e) Customer PTRANS to cashier.
(f) Customer ATRANS money to cashier.
(g) Customer PTRANS out of restaurant.
Results:
(a) Customer is no longer hungry.
(b) Customer is satisfied.
(c) Owner gets money.
Example 2: A Supermarket Script Structure
SCRIPT : Food market
TRACK : Supermarket
PROPS : Shopping cart, Market items, Check-out stands, Money
ROLES : Shopper, Dairy Attendant, Seafood Attendant, Check-out Clerk, Sacking Clerk, Cashier, Other Shoppers
Entry Conditions:
(a) Shopper needs groceries.
(b) Food market is open.
Scene 1: Entering
(a) Shopper enters the market.
(b) Shopper PTRANS from one location to another in the market.
(c) Shopper PTRANS shopping_cart to shopper.
Scene 2: Shopping
(a) Shopper shops for items.
(b) Shopper MOVE (moves about the market).
(c) Shopper ATTEND eyes on display items.
(d) Shopper PTRANS items to shopping_cart.
Scene 3: Check-out
(a) Shopper MOVE to check-out stand.
(b) Shopper WAITS for shopper's turn.
(c) Shopper ATTEND eyes to charge.
(d) Shopper ATRANS money to cashier.
(e) Sales boy ATRANS bags to shopper.
Scene 4: Exit
(a) Shopper PTRANS out of market.
Results:
(a) Shopper has less money.
(b) Shopper has groceries.
(c) Market has less grocery items.
(d) Market has more money.
Scripts are useful in describing certain situations such as robbing a bank. This might involve:
• Getting a gun
• Holding up the bank
• Escaping with the money
Here the props might be:
• Gun, G
• Loot, L
• Bag, B
• Getaway car, C
The roles might be:
• Robber, R
• Cashier, M
• Bank Manager, O
• Policeman, P
The entry conditions might be:
• R is poor.
• R is destitute.
The results might be:
• R has more money.
• O is angry.
• M is in a state of shock.
• P is shot.
There are three scenes: obtaining the gun, robbing the bank, and the getaway.
Script: Robbery
Track: Successful Snatch
Props: G = Gun, L = Loot, B = Bag, C = Getaway car
Roles: R = Robber, M = Cashier, O = Bank Manager, P = Policeman
Entry Conditions:
R is poor.
R is destitute.
Scene 1: Getting a gun
R PTRANS R into gun shop.
R MBUILD choice of G.
R MTRANS choice.
R ATRANS (buys) G.
Scene 2: Holding up the bank
R PTRANS R into bank.
R ATTEND eyes to M, O and P.
R MOVE R to M's position.
R GRASP G.
R MOVE G to point at M.
R MTRANS 'Give me the money or else' to M.
P MTRANS 'Hold it! Hands up!' to R.
R PROPEL (shoots) G.
P INGEST bullet from G.
M ATRANS L to R.
R ATRANS L into bag B.
R PTRANS exit.
O MTRANS (raises the alarm).
Scene 3: The getaway
R PTRANS C.
Results:
R has more money.
O is angry.
M is in a state of shock.
P is shot.
Some additional points to note on scripts:
• If a particular script is to be applied it must be activated, and the activating depends on its significance.
• If a topic is mentioned in passing, then a pointer to that script could be held.
• If the topic is important, then the script should be opened.
• The danger lies in having too many active scripts, much as one might have too many windows open on the screen or too many recursive calls in a program.
• Provided events follow a known trail, we can use scripts to represent the actions involved and use them to answer detailed questions.
• Different trails may be allowed for different outcomes of scripts (e.g., the bank robbery goes wrong).
Advantages of scripts:
• Ability to predict events.
• A single coherent interpretation may be built up from a collection of observations.
Disadvantages:
• Less general than frames.
• May not be suitable to represent all kinds of knowledge.

6.4 CYC

What is CYC?
• An ambitious attempt to form a very large knowledge base aimed at capturing commonsense reasoning.
• Initial goal: to capture the knowledge in a hundred randomly selected articles in the Encyclopaedia Britannica.
• Both implicit and explicit knowledge are encoded.
• Emphasis on the study of underlying information (knowledge assumed by the authors but not explicitly stated for the readers).
Example: Suppose we read that Sam learned of Napoleon's death. Then we (humans) can conclude that Napoleon never knew that Sam had died.
How do we do this? We require special implicit knowledge, or commonsense, such as:
• We only die once.
• You stay dead.
• You cannot learn of anything when dead.
• Time cannot go backwards.
Why build large knowledge bases at all?

6.4.1 Motivations
There are many reasons:
Brittleness: Specialized knowledge bases are brittle.
It is hard to encode new situations, and degradation in performance is not graceful. Commonsense-based knowledge bases should have a firmer foundation.
Form and Content: A knowledge representation may not be suitable for AI. Commonsense strategies could point out where difficulties in content may affect the form.
Shared Knowledge: Common bases and assumptions should allow greater communication among systems.
Building an immense knowledge base is an overwhelming task; however, we should ask whether there are any methods for acquiring this knowledge automatically. There are two possibilities:
1. Machine Learning: Current techniques permit only modest extensions of a program's knowledge. In order for a system to learn a great deal, it must already know a great deal. In particular, systems with a lot of knowledge will be able to employ powerful analogical reasoning.
2. Natural Language Understanding: Humans extend their knowledge by reading books and talking with other humans. We now have online versions of dictionaries and encyclopaedias, so why not feed these into an AI program and have it assimilate all the information automatically? Although there are many methods for building language understanding systems, these methods are themselves very knowledge intensive.
The approach taken by CYC is to hand-code the ten million or so facts that make up commonsense knowledge. It may then be possible to develop more automatic methods. CYC is coded:
• By hand
• In a special CYC language (CYCL) which is:
1. LISP-like
2. Frame-based
3. Supports multiple inheritance
4. Treats slots as fully fledged objects
5. Generalizes inheritance to any link, not just 'isa' and instance
Example: 'Shyam learnt of Napoleon's death.' A human can then conclude that Napoleon never knew that Shyam had died.
For doing this, we need special implicit knowledge of common sense, such as:
• We only die once.
• You stay dead.
• You cannot learn when dead.
• Time cannot go backwards.
To encode such large bodies of knowledge, there are multiple ways of representing it:
1. Simple relational knowledge
2. Inheritable knowledge
3. Inferential knowledge
4. Procedural knowledge
1. Simple relational knowledge: The simplest way to represent declarative facts is as a set of relations of the same sort. This sort of representation provides very weak inferential capabilities. The following table shows an example of simple relational knowledge.

Player   Height (feet)   Weight (pounds)   Bats-Throws
ABC      6-0             180               Left-Right
XYZ      6-1             170               Right-Right
PQR      6-2             190               Right-Left
EFG      5-10            215               Left-Left

The advantage of this knowledge is that it serves as the input to powerful inference engines. On its own, however, this system cannot answer a question such as 'Who is the heaviest player?'; it needs some procedure for finding the heaviest player.
2. Inheritable knowledge: The structure of the knowledge representation must correspond to the inference mechanism. The most useful form of inference is property inheritance, in which elements of specific classes inherit attributes and values from more general classes. Objects must be organized into classes, and classes must be arranged in a generalization hierarchy. Here two attributes, isa and instance, provide the basis for property inheritance; isa is used for class inclusion and instance is used for class membership. Property inheritance provides an inference technique for the retrieval of facts that have been explicitly stored, and of facts that can be derived from those that are explicitly stored. Thus, network representations provide a means of structuring and exhibiting the structure in knowledge.
This gives us a pictorial representation of objects, their attributes, and the relationships that exist between them and other entities.
3. Inferential knowledge: We may need traditional logic, such as first-order predicate logic, to describe the inferences that are needed. We then need an inference procedure to exploit the knowledge represented and make those inferences. Many such procedures are available, and they may reason in either the backward or the forward direction. An example is resolution, in which the proof is obtained by refutation.
4. Procedural knowledge: This knowledge specifies what to do and when to do it. Procedural knowledge can be represented in programs.

6.5 CYCL

CYC's knowledge is encoded in a representation language called CYCL. CYCL is a frame-based system that incorporates techniques such as multiple inheritance, slots as full-fledged objects, transfers-through, mutually-disjoint-with, etc. CYCL generalizes the notion of inheritance so that properties can be inherited along any link, not just isa and instance. For example:
1. All birds have two legs.
2. All of Mary's friends speak Spanish.
We can encode the first fact using standard inheritance: any frame with Bird in its instance slot inherits the value 2 in its legs slot. The second fact can be encoded in a similar fashion if we allow inheritance to proceed along the friend relation.

Check Your Progress
1. Who developed the theory of conceptual dependency?
2. A script is similar to a thought sequence or a chain of situations that could be anticipated. (True or False)
3. CYCL generalizes the notion of inheritance so that properties cannot be inherited along any link, not just isa and instance. (True or False)

In addition to frames, CYCL contains a constraint language that allows the expression of arbitrary first-order logical expressions. The time at which default reasoning is actually performed is determined by the direction of the slotValueSubsumes rules.
If the direction is backward, the rule is an if-needed rule, and it is invoked whenever someone inquires. If the direction is forward, the rule is an if-added rule, and additions are automatically propagated. While forward rules can be very useful, they can also require substantial time and space to propagate their values. If a rule is entered as backward, then the system defers reasoning until the information is specifically requested. CYC maintains a separate background process for accomplishing forward propagations: a knowledge engineer can continue entering knowledge while its effects are propagated during idle keyboard time.
Frame-based inference is very efficient, while general logical reasoning is computationally hard. CYC actually supports about twenty types of efficient inference mechanism, each with its own truth maintenance facility. The constraint language allows for the expression of facts that are too complex for any of these mechanisms to handle. The constraint language also provides an elegant, abstract layer of representation. In reality, CYC maintains two levels of representation: the epistemological level (EL) and the heuristic level (HL). The EL contains facts stated in the logical constraint language, while the HL contains the same facts stored using efficient inference templates. There is a translation program for automatically converting an EL statement into an efficient HL representation. The EL provides a clean, simple functional interface to CYC, so that users and computer programs can easily insert and retrieve information from the knowledge base. The EL/HL distinction represents one way of combining the formal neatness of logic with the computational efficiency of frames.
In addition to frames, inference mechanisms and the constraint language, CYCL performs consistency checking and conflict resolution.

6.6 GLOBAL ONTOLOGY

Ontology is the philosophical study of what exists.
All knowledge-based systems refer to entities in the world, but in order to capture the breadth of human knowledge we need a well-designed global ontology that specifies, at a very high level, what kinds of things exist and what their general properties are. The highest-level concept in CYC is called Thing. Everything is an instance of Thing. Below this top-level concept, CYC makes several distinctions, including:
1. Individual Object versus Collection
2. Intangible, Tangible and Composite
3. Substance
4. Intrinsic versus Extrinsic properties
5. Event and Process
6. Slots
7. Time
8. Agent

6.7 SUMMARY

In this unit, you have learned about the various concepts related to strong slot and filler structures. You have studied CYC, a multi-user system that provides each knowledge enterer with a textual and graphical interface to the knowledge base. Users' modifications to the knowledge base are transmitted to a central server, where they are checked and then propagated to other users. We do not yet have much experience with the engineering problems of building and maintaining very large knowledge bases; in future, we should have many tools that check consistency in the knowledge base. In this unit, you have also learned in detail how conceptual dependency works in various kinds of scenarios, and you have learned (with several examples) how to write scripts and scenes for various situations.

6.8 KEY TERMS

• Conceptual Dependency: Originally developed to represent knowledge acquired from natural language input. The goals of this theory are to help in drawing inferences from sentences and to be independent of the words used in the original input.
• Script: A script is a structure that prescribes a set of circumstances that could be expected to follow on from one another.
• CYC: It is an ambitious attempt to form a very large knowledge base aimed at capturing commonsense reasoning.
6.9 ANSWERS TO 'CHECK YOUR PROGRESS'

1. Roger Schank
2. True
3. False

6.10 QUESTIONS AND EXERCISES

Short-Answer Questions
1. Suggest a semantic net to describe the main organs of the human body.
2. Convert the following statements to conceptual dependencies:
(i) I gave a pen to my friend.
(ii) Ram ate ice cream.
(iii) I borrowed a book from your friend.
(iv) While going home, I saw a car.
3. Construct a script for going to a movie from the viewpoint of the movie-goer.
4. Would conceptual dependency be a good way to represent the contents of a typical issue of National Geographic?
5. Construct CD representations of the following:
(i) John begged Mary for a pencil.
(ii) Jim stirred his coffee with a spoon.
(iii) Dave took the book from Jim.
(iv) On my way home, I stopped to fill my car with petrol.
(v) I heard strange music in the woods.
(vi) Drinking beer makes you drunk.
(vii) John killed Mary by strangling her.
Long-Answer Questions
1. Try capturing the differences between the following in CD: John slapped Dave / John punched Dave; Sue likes Prince / Sue adores Prince.
2. Rewrite the script given in the lecture so that the bank robbery goes wrong.
3. Write a script to allow for both outcomes of the bank robbery: getting away, and going wrong and getting caught.
4. Write a script for enrolling as a student.
5. Find out how MARGIE, SAM and PAM are implemented. In particular, pay attention to their reasoning and inference mechanisms.
6. Find out how the CYCL language represents knowledge.
7. What are the two levels of representation in the constraints of CYC?
8. Find out the relevance of meta-knowledge in CYC and how it controls the interpretation of knowledge.

6.11 FURTHER READING

1. www.egeria.cs.cf.ac.uk
2. www.cs.cf.ac.uk
3. www.cato.cl-ki.uni-osnabrueck.de
4. www.tip.psychology.org
5. www.csis.hku.hk
6. www.eil.utoronto.ca
7. www.cs.cardiff.ac.uk
8. www.rhyshaden.com
9. Levenston, E. A.
1974. ‘Teaching Indirect Object Structures in English – A Case Study in Applied Linguistics’. ELT Journal.

UNIT 7 NATURAL LANGUAGE PROCESSING

Structure
7.0 Introduction
7.1 Unit Objectives
7.2 The Problem of Natural Language
7.2.1 Steps in the Process
7.2.2 Why is Language Processing Difficult?
7.3 Syntactic Processing
7.4 Augmented Transition Networks
7.5 Semantic Analysis
7.6 Discourse and Pragmatic Processing
7.6.1 Conversational Postulates
7.7 General Comments
7.8 Summary
7.9 Key Terms
7.10 Answers to ‘Check Your Progress’
7.11 Questions and Exercises
7.12 Further Reading

7.0 INTRODUCTION

In the preceding unit, you learnt about the various concepts related to strong slot and filler structures. In this unit, you will learn about the fundamental techniques of natural language processing and evaluate some current and potential applications. This unit will also help you to develop an understanding of the limits of these techniques and of current research issues.

7.1 UNIT OBJECTIVES

After going through this unit, you will be able to:
• Describe the problem of natural language
• Understand the concept of syntactic processing
• Discuss augmented transition networks
• Explain discourse and pragmatic processing

7.2 THE PROBLEM OF NATURAL LANGUAGE

Natural language systems are developed both to explore general linguistic theories and to produce natural language interfaces or front ends to application systems. In the discussion here we’ll generally assume that there is some application system that the user is interacting with, and that it is the job of the understanding system to interpret the user’s utterances and ‘translate’ them into a suitable form for the application. (For example, we might translate into a database query.) We’ll also
We’ll also Self-Instructional Material 155 Natural Language Processing NOTES assume for now that the natural language in question is English, though of course in general, it might be any other language. In general, the user may communicate with the system by speaking or by typing. Understanding spoken language is much harder than understanding typed language—our input is just the raw speech signals (normally a plot of how much energy is coming in at different frequencies at different points in time). Before we can get to work on what the speech means we must work out from the frequency spectrogram what words are being spoken. This is very difficult to do in general. To begin with, different speakers have different voices and accents and individual speakers may articulate differently on different occasions, so there’s no simple mapping from speech waveform to word. There may also be background noise, so we have to separate the signals resulting from the wind whistling in the trees from the signals resulting from Fred saying ‘Hello’. Even if the speech is very clear, it may be hard to work out what words are spoken. There may be many different ways of splitting up a sentence into words. Generally, in fluent speech, there are virtually no pauses between words, and the understanding system must guess where word breaks are. As an example of this, consider the sentence, ‘how to recognize speech’. If spoken quickly, this might be misread as saying ‘how to wreck a nice beach’. And even if we get the word breaks right, we may still not know what words are spoken—some words sound similar (e.g. bear and bare) and it may be impossible to tell which was meant without thinking about the meaning of the sentence. Because of all these problems, a speech understanding system may come up with a number of different alternative sequences of words, perhaps ranked according to their likelihood. Any ambiguities as to what the words in the sentence are (e.g. 
bear/bare) will be resolved when the system starts trying to work out the meaning of the sentence.
Whether we start with speech signals or typed input, at some stage we’ll have a list (or lists) of words and will have to figure out what they mean. There are three main stages to this analysis:
Syntactic analysis: Where we use grammatical rules describing the legal structure of the language to obtain one or more parses of the sentence.
Semantic analysis: Where we try to obtain an initial representation (or representations) of the meaning of the sentence, given the possible parses.
Pragmatic analysis: Where we use additional contextual information to fill in gaps in the meaning representation, and to work out what the speaker was really getting at.
(So, if we include speech understanding, we can think of there being four stages of processing.)
We won’t be saying much more about speech understanding in this course. Generally, it is feasible if we train a system for a particular speaker who speaks clearly in a quiet room, using a limited vocabulary. State-of-the-art systems might allow us to relax one of these constraints (e.g., single speaker), but no system has been developed that works effectively for a wide range of speakers, speaking quickly with a wide vocabulary in a normal environment.
Anyway, the next section will discuss the stage of syntactic analysis in some detail. The sections after that will discuss rather more briefly what is involved in semantic and pragmatic analysis. We will present these as successive stages, where we first do syntax, then semantics, then pragmatics. However, you should note that in practical systems it may NOT always be appropriate to fully separate the stages, and we may want to interleave, for example, syntactic and semantic analysis.
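As an illustration of the syntactic stage, the following sketch parses a sentence with a tiny context-free grammar using top-down search with backtracking. The grammar, lexicon and function names are invented for this example, not taken from the text:

```python
# Toy context-free grammar and lexicon (illustrative only).
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Name"], ["Det", "N"]],
    "VP": [["V", "NP", "NP"], ["V", "NP"]],
}
LEXICON = {
    "Name": {"john", "mary"},
    "Det":  {"a", "the"},
    "N":    {"book"},
    "V":    {"gives", "reads"},
}

def parse(symbol, tokens, i):
    """Yield (parse_tree, next_position) for every way `symbol`
    can be matched against `tokens` starting at position `i`."""
    if symbol in LEXICON:                      # pre-terminal: consume one word
        if i < len(tokens) and tokens[i] in LEXICON[symbol]:
            yield (symbol, tokens[i]), i + 1
        return
    for production in GRAMMAR[symbol]:         # non-terminal: try each rule
        yield from expand(production, tokens, i, symbol)

def expand(symbols, tokens, i, parent):
    """Match a sequence of grammar symbols, backtracking over alternatives."""
    if not symbols:
        yield (parent, []), i
        return
    for child, j in parse(symbols[0], tokens, i):
        for (_, rest), k in expand(symbols[1:], tokens, j, parent):
            yield (parent, [child] + rest), k

tokens = "john gives mary a book".split()
trees = [t for t, end in parse("S", tokens, 0) if end == len(tokens)]
print(len(trees))   # one complete parse of the whole sentence
```

Ambiguous sentences would simply yield more than one complete tree, which is why the later semantic and pragmatic stages are needed to choose among parses.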
Normally, to define or understand natural language processing, we should cover the following topics:
• Introduction: Brief history of NLP research, current applications, generic NLP system architecture, knowledge-based versus probabilistic approaches
• Finite state techniques: Inflectional and derivational morphology, finite-state automata in NLP, finite-state transducers
• Prediction and part-of-speech tagging: Corpora, simple N-grams, word prediction, stochastic tagging, evaluating system performance
• Parsing and generation I: Generative grammar, context-free grammars, parsing and generation with context-free grammars, weights and probabilities
• Parsing and generation II: Constraint-based grammar, unification, simple compositional semantics
• Lexical semantics: Semantic relations, WordNet, word senses, word sense disambiguation
• Discourse: Anaphora resolution, discourse relations
• Applications: Machine translation, e-mail response, spoken dialogue systems
Natural language processing (NLP) can be defined as the automatic (or semi-automatic) processing of human language. The term ‘NLP’ is sometimes used rather more narrowly than that, often excluding information retrieval and sometimes even excluding machine translation. NLP is sometimes contrasted with ‘computational linguistics’, with NLP being thought of as more applied. Nowadays, alternative terms are often preferred, such as ‘language technology’ or ‘language engineering’. Language is often used in contrast with speech (e.g., speech and language technology).
In general, language understanding is a complex problem because it requires programming to extract meaning from sequences of words and sentences. At the lexical level, the program uses words, prefixes, suffixes and other morphological forms and inflections. At the syntactic level, it uses grammar to parse a sentence.
Semantic interpretation (i.e., deriving meaning from a group of words) depends on domain knowledge to assess what an utterance means. For example, ‘Let’s meet by the bank to get a few bucks’ means one thing to bank robbers and another to weekend hunters. Finally, to interpret the pragmatic significance of a conversation, the computer needs a detailed understanding of the goals of the participants in the conversation and the context of the conversation. For example, to build a program that understands spoken language, we need all the facilities for understanding written language as well as enough additional knowledge to handle all the noise and ambiguities of the audio signal. Thus it is useful to divide the entire language processing problem into two tasks:
• Processing the written text, using lexical, syntactic and semantic knowledge of the language as well as the required real-world information
• Processing the spoken language, using all the information needed above plus additional knowledge about phonology as well as enough added information to handle the further ambiguities that arise in speech
Examples:
1. No natural language program can be complete, because new words, expressions and meanings can be generated quite freely: I’ll bring it to you. Language develops as the experiences that we want to communicate change.
2. The same expression means different things in different contexts:
Where’s the water? (in a chemistry lab, it must be pure)
Where’s the water? (when you are thirsty, it must be potable)
Where’s the water? (dealing with a leaky roof, it can be filthy)
Language lets us communicate about an infinite world using a finite number of symbols.
3. There are lots of ways to say the same thing:
John was born on November 10.
John’s birthday is November 10.
When you know a lot, facts imply each other. Language is intended to be used by agents who know a lot.
4.
English sentences are incomplete descriptions of the information that they are intended to convey:
Some dogs are outside.
↓
Some dogs are on the lawn.
Three dogs are on the lawn.
Don, Spot and Tiger are on the lawn.
I called Sonia to ask her to the movies. She said she’d love to go.
↓
She was home when I called.
She answered the phone.
I actually asked her.
Language allows speakers to be as vague or as precise as they like. It also allows speakers to leave out things they believe their hearers already know. We concentrate on written language processing, which is usually called natural language processing. Natural language processing includes understanding and generation, as well as other tasks such as multilingual translation. Natural languages are human languages like English, Telugu and Hindi. The different areas of natural language processing are as follows:
1. Natural language understanding
2. Natural language generation
3. Natural language translation
The reasons for studying natural languages are:
1. To understand how language is structured
2. To understand the mental mechanisms necessary to support the use of language, e.g., memory
3. Easier communication with computers for humans: talking is easier than typing, and speech allows compact communication of complex concepts
4. Machine translation, i.e., converting one language to another language
5. In future, intelligent systems may use natural language to talk to each other
Natural language evolved because humans needed to communicate complex concepts over a bandwidth-limited serial channel, i.e., speech. All of our communication methods are serial:
• A small number of basic symbols (characters).
• Basic symbols are combined into words.
• Words are combined into phrases and sentences.
Natural language is strongly biased toward minimality:
• Never say something that the listener already knows.
• Omit things that can be inferred.
• Eliminate redundancy.
• Shorten.
In general, natural language processing involves a translation from the natural language to some internal representation that captures its meaning. The internal representation might be predicate calculus, a semantic network, a frame or a script representation. There are many problems in making such a translation:
1. Ambiguity: There may be multiple ways of translating a statement.
Ex.: I went to the bank.
Here ‘bank’ may have different meanings.
2. Incompleteness: The statements are usually only the bare outline of the message. The missing parts must be filled in.
Ex.: I was late to my office. My car wouldn’t start. The battery was dead.
The study of language is traditionally divided into several broad areas:
1. Syntax: The rules by which words can be put together legally to form sentences.
2. Semantics: The meaning of sentences.
3. Pragmatics: Knowledge about the world and the social context of language use.
Before getting into detail on the several components of the natural language understanding process, it is useful to survey all of them and see how they fit together.
7.2.1 Steps in the Process
1. Morphological Analysis: Individual words are analysed into their components, and non-word tokens, for example punctuation marks, are separated from the words.
2. Syntactic Analysis: Linear sequences of words are changed into structures that show how the words relate to each other. Some word sequences may be rejected if they violate the language’s rules for how words may be combined. For example, an English syntactic analyser would reject the sentence ‘Boy the go the to store.’
3. Semantic Analysis: The structures created by the syntactic analyser are assigned meanings. In other words, a mapping is made between the syntactic structures and objects in the task domain. Structures for which no such mapping is possible may be rejected.
For example, the sentence ‘Colourless green ideas sleep furiously’ would be rejected as semantically anomalous.
4. Discourse Integration: The meaning of an individual sentence may depend on the sentences that precede it and may influence the meaning of the sentences that follow it. For example, the word ‘it’ in the sentence ‘Shyam wanted it’ depends on the prior discourse context, while the word ‘Shyam’ may influence the meaning of subsequent sentences (such as ‘He always had’).
5. Pragmatic Analysis: The structure representing what was said is reinterpreted to determine what was actually meant. For example, the sentence ‘Do you know what time it is?’ should be interpreted as a request to be told the time.
7.2.1.1 Syntax
The stage of syntactic analysis is the best understood stage of natural language processing. Syntax helps us understand how words are grouped together to make complex sentences, and gives us a starting point for working out the meaning of the whole sentence. For example, consider the following two sentences:
1. The dog ate the bone.
2. The bone was eaten by the dog.
The rules of syntax help us work out that it is the bone that gets eaten and not the dog. A simple rule like ‘it’s the second noun that gets eaten’ just won’t work. Syntactic analysis allows us to determine the possible groupings of words in a sentence. Sometimes there will only be one possible grouping, and we will be well on the way to working out the meaning. For example, in the following sentence:
3. The rabbit with long ears enjoyed a large green lettuce.
We can work out from the rules of syntax that ‘the rabbit with long ears’ forms one group (a noun phrase), and ‘a large green lettuce’ forms another noun phrase group. When we get down to working out the meaning of the sentence we can start off by working out the meaning of these word groups, before combining them together to get the meaning of the whole sentence.
In other cases, there may be many possible groupings of words. For example, in the sentence ‘John saw Mary with a telescope’, there are two different readings based on the following groupings:
(i) John saw (Mary with a telescope), i.e., Mary has the telescope.
(ii) John (saw Mary with a telescope), i.e., John saw her with the telescope.
When there are many possible groupings, the sentence is syntactically ambiguous. Sometimes we will be able to use general knowledge to work out which is the intended grouping. For example, consider the following sentence:
4. I saw the Forth Bridge flying into Edinburgh.
We can probably guess that the Forth Bridge isn’t flying! So, this sentence is syntactically ambiguous, but unambiguous if we bring to bear general knowledge about bridges. The ‘John saw Mary ...’ example is more seriously ambiguous, though we may be able to work out the right reading if we know something about John and Mary (is John in the habit of looking at girls through a telescope?). Anyway, rules of syntax specify the possible organizations of words in sentences. They are normally specified by writing a grammar for the language. Of course, just having a grammar isn’t enough to analyse the sentence; we need a parser to use the grammar to analyse the sentence. The parser should return possible parse trees for the sentence, indicating the possible groupings of words into higher-level syntactic sections. The next section will describe how simple grammars and parsers may be written, focusing on Prolog’s built-in definite clause grammar (DCG) formalism. Language is meant for communicating about the world. If we succeed at building a computational model of language, we should have a powerful tool for communicating about the world. So we can exploit knowledge about the world, in combination with linguistic facts, to build computational natural language systems.
The largest part of human linguistic communication occurs as speech. Written language is a recent invention and plays a smaller role than speech in most activities.
Communication: The intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs.
• Humans use language to communicate most of what is known about the world.
• The Turing test is based on language.
Communication as Action
• Speech act
• Language production viewed as an action
• Speaker, hearer, utterance
Examples:
Query: ‘Have you smelled the wumpus anywhere?’
Inform: ‘There’s a breeze here in 3 4.’
Request: ‘Please help me carry the gold.’ ‘I could use some help carrying this.’
Acknowledge: ‘OK’
Promise: ‘I’ll shoot the wumpus.’
Fundamentals of Language
• Formal language: A (possibly infinite) set of strings
• Grammar: A finite set of rules that specifies a language
• Rewrite rules over non-terminal symbols (S, NP, etc.) and terminal symbols (he):
S → NP VP
NP → Pronoun
Pronoun → he
Chomsky’s Hierarchy
Every language is specified by a particular grammar. Therefore, the classification of languages is based on the classification of the grammars used to specify them. Consider a grammar G(VN, Σ, P, S), where VN is the set of non-terminal symbols, Σ is the set of terminal symbols and S ∈ VN is the start symbol. For classifying a grammar, you need to look at P, the set of productions of the grammar.
Four classes of grammatical formalisms:
• Recursively enumerable grammars: unrestricted rules; both sides of a rewrite rule can have any number of terminal and non-terminal symbols, e.g., AB → C
• Context-sensitive grammars: the RHS must contain at least as many symbols as the LHS, e.g., ASB → AXB
• Context-free grammars (CFG): the LHS is a single non-terminal symbol, e.g., S → XYa
• Regular grammars: e.g., X → a, X → aY
In terms of the forms of productions, Chomsky classified grammars into a hierarchy of four types, which are as follows:
• Type 0 Grammar
• Type 1 Grammar
• Type 2 Grammar
• Type 3 Grammar
Type 0 Grammar
In formal language theory, type 0 grammar is the superset of grammars and is defined as a phrase structure grammar that has no restrictions. In other words, in type 0 grammar there are no restrictions on the left and the right sides of the productions of the grammar. Type 0 grammar is also known as unrestricted grammar. In type 0 grammar, the productions used for generating strings are known as type 0 productions. Type 0 productions are not associated with any restrictions. The languages specified by type 0 grammars are known as type 0 languages.
In a production of the form φAψ → φαψ, with A being a variable, φ and ψ are respectively known as the left context and the right context, while the string φαψ is known as the replacement string. A production having the form φAψ → φαψ is known as a type 1 production when α ≠ Λ. If you consider the production abcAcd → abcABcd, then abc is the left context, while cd is the right context. Similarly, in AC → A, A is the left context, while Λ is the right context, and the production here simply erases C.
Theorem 1: If G is a type 0 grammar, then you can find an equivalent grammar G1 where each production is either of the form α → β or A → a. Here, α and β are strings of variables, while A is a variable and a is a terminal.
Proof: To construct G1, consider a production α → β in G with α or β containing terminals. In both α and β, replace each terminal a by a new variable Ca to produce α′ and β′. Now, for every α → β where α or β contains terminals, there is a corresponding α′ → β′, together with productions of the form Ca → a for each terminal a that appears in α or β. The new productions obtained from this construction are the productions of G1, and the variables of G along with the new variables of the form Ca are the variables of G1. Similarly, the terminals and the start symbol of G1 are the same as those of G. Thus, G1 satisfies the required conditions for a grammar and is equivalent to G. Therefore, L(G) = L(G1).
Note: This theorem also holds for grammars of types 1, 2 and 3.
Type 1 Grammar
A grammar is termed a type 1 grammar when all of its productions are type 1 productions. In other words, you can define a grammar G(VN, Σ, P, S) where all the productions are of the form φAψ → φαψ. Type 1 grammar is also known as context-sensitive or context-dependent grammar. The formal languages that can be generated using type 1 grammars are known as type 1 or context-sensitive languages. In type 1 grammar, a production of the type S → Λ is allowed; however, in that case S does not appear on the right-hand side of any production. A production φAψ → φαψ in a type 1 grammar does not decrease the length of the working string, as |φAψ| ≤ |φαψ| when α ≠ Λ. However, if a production α → β is such that |α| ≤ |β|, it is not necessarily of type 1. For example, AB → BC is not of type 1. A grammar G(VN, Σ, P, S) is called monotonic when every production in P is of the form α → β with |α| ≤ |β|, or is S → Λ; in the latter case, S does not appear on the right-hand side of any production of G.
Theorem: Every monotonic grammar G is equivalent to a type 1 grammar.
Proof: For the grammar G, you can apply Theorem 1 to get an equivalent grammar G1. Now, consider a production A1A2…Am → B1B2…Bn, where n ≥ m, in G1. For m = 1, this production is of type 1. Now consider m ≥ 2. You can introduce new variables C1, C2, …, Cm to construct the following type 1 productions corresponding to A1A2…Am → B1B2…Bn:
A1A2…Am → C1A2…Am
C1A2…Am → C1C2A3…Am
C1C2A3…Am → C1C2C3A4…Am
…
C1C2…Cm−1Am → C1C2…CmBm+1Bm+2…Bn
C1C2…CmBm+1Bm+2…Bn → B1C2…CmBm+1Bm+2…Bn
B1C2C3…Bn → B1B2C3…Bn
…
B1B2…CmBm+1…Bn → B1B2…Bm…Bn
Explaining the above construction: since more than one symbol on the left-hand side of the production A1A2…Am → B1B2…Bn would have to be replaced at once, that production is not of type 1. In the chain of productions above, A1, A2, …, Am−1 are replaced by C1, C2, …, Cm−1, and Am is replaced by CmBm+1…Bn. Then C1, C2, etc. are in turn replaced by B1, B2, etc. Since only one variable is replaced at a time, these productions are of type 1. Now, for each production in G1 that is not of type 1, you can repeat this construction process. This gives rise to a new grammar G2 whose variables are those of G1 together with the new variables. The productions of G2 are of type 1, and the terminals and the start symbol of G2 are those of G1. Therefore, you can conclude that G2 is of type 1 (context-sensitive), i.e., L(G2) = L(G1) = L(G).
Type 2 Grammar
In formal language theory, a grammar is called a type 2 grammar if it contains only type 2 productions. A type 2 production is of the form A → α, where A is in VN and α is in (VN ∪ Σ)*. In a type 2 production, the left-hand side of the production has no right or left context. For example, S → Ba, A → ab and B → abc are examples of type 2 productions. Type 2 grammar is often termed context-free grammar and its productions are called context-free productions.
If a language is generated using a type 2 grammar, then it is known as a type 2 language or context-free language.
Type 3 Grammar
A grammar is termed a type 3 grammar if it contains only productions of type 3. Here, a production of type 3 is a production of the form A → a or A → aB, where A and B are in VN, while a is in Σ. In type 3 grammar, productions of the form S → Λ are allowed; however, in such cases, the right-hand sides of the productions do not contain S. Type 3 grammar is generally known as regular grammar. The languages specified by type 3 grammars are known as type 3 languages or regular languages.
Example 1. Find the highest type number that can be applied to the following productions:
1. S → A0, A → 1 | 2 | B0, B → 012
2. S → ASB | b, A → bA | c
3. S → bS | bc
Solution:
1. Here, S → A0, A → B0 and B → 012 are of type 2, while A → 1 and A → 2 are of type 3. Therefore, the highest type number is 2.
2. Here, S → ASB is of type 2, while S → b, A → bA and A → c are of type 3. Therefore, the highest type number is 2.
3. Here, S → bS is of type 3, while S → bc is of type 2. Therefore, the highest type number is 2.
Chomsky Normal Form
In the Chomsky Normal Form (CNF), there are certain restrictions on the length and the type of symbols to be used on the RHS of the productions. A grammar G is said to be in the CNF if every production of the grammar is in one of the following forms:
• A → a
• A → BC
• S → Λ, if Λ ∈ L(G)
For example, the grammar G with the productions S → AB, S → Λ, A → a and B → b is in the CNF. The grammar G with the productions S → ABC, S → aC, A → b, B → a and C → c is not in the CNF, because the productions S → ABC and S → aC are not in the form required by the definition of the CNF.
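The type tests applied in Example 1, and the CNF membership condition, can be sketched in Python. The representation of a production as a pair of symbol lists, and the convention that uppercase symbols are non-terminals, are assumptions made for this sketch; also, the type 1 test below uses the monotonicity condition, which by the theorem on monotonic grammars is only equivalent to (not identical with) the strict type 1 form.

```python
# Rough Chomsky-type classifier for productions given as (lhs, rhs) pairs
# of symbol lists. Convention assumed here: non-terminals are uppercase,
# everything else (letters, digits) is a terminal. Null productions are
# not handled; this is a sketch, not a full treatment.

def is_nonterminal(sym):
    return sym.isupper()

def production_type(lhs, rhs):
    """Return the highest type number (0-3) that fits this production."""
    if len(lhs) == 1 and is_nonterminal(lhs[0]):
        # A -> a  or  A -> aB  are type 3 forms
        if len(rhs) == 1 and not is_nonterminal(rhs[0]):
            return 3
        if (len(rhs) == 2 and not is_nonterminal(rhs[0])
                and is_nonterminal(rhs[1])):
            return 3
        return 2              # single non-terminal on the LHS: context-free
    if len(lhs) <= len(rhs):
        return 1              # monotonic; treated as context-sensitive here
    return 0

def grammar_type(productions):
    """A grammar's type is the most restrictive type met by ALL productions."""
    return min(production_type(l, r) for l, r in productions)

def is_cnf(productions):
    """CNF check: every production is A -> a or A -> BC (S -> null ignored)."""
    return all(
        len(l) == 1 and is_nonterminal(l[0]) and (
            (len(r) == 1 and not is_nonterminal(r[0])) or
            (len(r) == 2 and all(is_nonterminal(s) for s in r)))
        for l, r in productions)

# Example 1(1): S -> A0, A -> 1 | 2 | B0, B -> 012
prods = [(["S"], ["A", "0"]), (["A"], ["1"]), (["A"], ["2"]),
         (["A"], ["B", "0"]), (["B"], ["0", "1", "2"])]
print(grammar_type(prods))  # 2
```

Running `grammar_type` on the productions of Example 1(1) reproduces the answer worked out by hand: the lexical rules are type 3, but S → A0 and B → 012 force the grammar down to type 2.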
You can reduce a CFG into an equivalent grammar G2, which is in the CNF, by performing the following steps:
• Step 1: Eliminate all the null productions and then eliminate all the unit productions from the given grammar. Represent the grammar obtained after the elimination of all the null and unit productions by G = (VN, Σ, P, S).
• Step 2: Eliminate all the terminals that appear together with non-terminals on the RHS of productions. For this, you need to define a grammar G1 = (V′N, Σ, P1, S). P1 and V′N in G1 are constructed in the following ways:
o The productions in P which are of the forms A → a and A → BC are placed in P1. Also, all the non-terminal symbols in VN are included in V′N.
o Consider a production A → X1X2…Xn with some terminal symbols on the RHS. If Xi is a terminal symbol, say ai, then add a new non-terminal Cai to V′N and a new production Cai → ai to P1. All the terminal symbols in a production of the form A → X1X2…Xn are replaced by the corresponding new non-terminal symbols, and the resulting production is included in P1.
• Step 3: Restrict the number of non-terminals on the RHS of a production. The RHS of each production in P1 consists either of a single terminal symbol or of one or more non-terminal symbols. Thus, you need to define the grammar G2 = (V″N, Σ, P2, S) in the following ways:
o All the productions in P1 which are already in the required form are included in P2. All the non-terminals in V′N are also included in V″N.
o For productions of the type A → A1A2…Am, where m ≥ 3, add new productions of the following form to P2:
A → A1C1
C1 → A2C2
…
Cm−2 → Am−1Am
where C1, C2, …, Cm−2 are new non-terminal symbols added to V″N.
Example 2.
Reduce the grammar G to the CNF, given the following productions: S → bBaA, A → aA | a, B → bB | b
Solution: Since the grammar does not contain any null or unit productions, you can proceed directly to step 2.
Step 2: Let G1 = (V′N, Σ, P1, S). Now, construct P1 and V′N as follows. Add the productions A → a and B → b to P1. Eliminate the terminal symbols from the productions S → bBaA, A → aA and B → bB by using the following productions:
S → CbBCaA
A → CaA
B → CbB
Ca → a
Cb → b
All these productions are added to P1, and V′N = {S, A, B, Ca, Cb}.
Step 3: The length of the RHS of the production S → CbBCaA is four; therefore, it should be replaced by the following productions:
S → CbC1
C1 → BC2
C2 → CaA
Therefore, the productions in P2 are as follows: S → CbC1, C1 → BC2, C2 → CaA, A → CaA, B → CbB, Ca → a, Cb → b, A → a and B → b.
V″N = {S, A, B, Ca, Cb, C1, C2}
Σ = {a, b}
Thus, the grammar in the CNF is represented by G2 = (V″N, Σ, P2, S).
Component Steps of Communication
SPEAKER:
• Intention: Know (H, ¬Alive (Wumpus, S3))
• Generation: ‘The wumpus is dead’
• Synthesis: [thaxwahmpaxsihzdehd]
HEARER:
• Perception: ‘The wumpus is dead’
• Analysis (Parsing):
(Semantic Interpretation): ¬Alive (Wumpus, Now); Tired (Wumpus, Now)
(Pragmatic Interpretation): ¬Alive (Wumpus1, S3); Tired (Wumpus1, S3)
• Disambiguation: ¬Alive (Wumpus1, S3)
• Incorporation: TELL (KB, ¬Alive (Wumpus1, S3))
Writing Grammar
A natural language grammar specifies allowable sentence structures in terms of basic syntactic categories such as nouns and verbs, and allows us to determine the structure of a sentence. It is defined in a similar way to a grammar for a programming language, though it tends to be more complex, and the notations used are somewhat different. Because of the complexity of natural language, a given grammar is unlikely to cover all possible syntactically acceptable sentences.
[Note: In natural language we don’t usually parse language in order to check that it is correct. We parse it in order to determine the structure and help work out the meaning. But most grammars are concerned only with the structure of ‘correct’ English, as parsing gets much more complex if you allow bad English.] A starting point for describing the structure of a natural language is to use a context-free grammar (as often used to describe the syntax of programming languages). Suppose we want a grammar that will parse sentences like:
1. John ate the biscuit.
2. The lion ate the schizophrenic.
3. The lion kissed John.
But we want to exclude incorrect sentences like:
1. Ate John biscuit the.
2. Schizophrenic the lion the ate.
3. Biscuit lion kissed.
A simple grammar that deals with this is the following:
1. sentence → noun_phrase, verb_phrase.
2. noun_phrase → proper_name.
3. noun_phrase → determiner, noun.
4. verb_phrase → verb, noun_phrase.
proper_name → [Mary]
proper_name → [John]
noun → [schizophrenic]
noun → [biscuit]
noun → [lion]
verb → [ate]
verb → [kissed]
determiner → [the]
The notation is similar to what is sometimes used for grammars of programming languages. A sentence consists of a noun phrase and a verb phrase. A noun phrase consists of either a proper name (using rule 2) or a determiner (e.g., the, a) and a noun. A verb phrase consists of a verb (e.g., ate) and a noun phrase. The rules at the end are really like dictionary entries, which state the syntactic categories of different words. Basic syntactic categories such as ‘noun’ and ‘verb’ are terminal symbols in the grammar, as they cannot be expanded into lower-level categories. (We put words in square brackets because that is the way it is generally done in Prolog’s built-in grammar notation.) If we consider the example sentences just stated, the sentence ‘John ate the biscuit’ consists of a noun phrase ‘John’ and a verb phrase ‘ate the biscuit’.
The noun phrase is just a proper name, while the verb phrase consists of a verb ‘ate’ and another noun phrase (‘the biscuit’). This noun phrase consists of a determiner ‘the’ and a noun ‘biscuit’. The incorrect sentences will be excluded by the grammar. For example, ‘Biscuit lion kissed’ starts with two nouns, which is not allowed by the grammar. However, some odd sentences will be allowed, such as ‘The biscuit kissed John’. This sentence is syntactically acceptable, just semantically odd, so it should still be parsed. For a given grammar, we can illustrate the syntactic structure of a sentence by giving its parse tree, which shows how the sentence is broken down into different syntactic constituents. This kind of information may be useful for later semantic processing. Given the above grammar, the parse tree for ‘John ate the lion’ splits the sentence into the noun phrase ‘John’ and the verb phrase ‘ate the lion’, and so on down to the individual words. (The parse-tree figure is not reproduced here.) Of course, the grammar given above is not really adequate to parse natural language properly. Consider the following two sentences:
• Mary eat the lion.
• Mary eats the ferocious lion.
If we have ‘eat’ and ‘eats’ both categorized as verbs then, given the simple grammar above, the first sentence will be acceptable according to the grammar, while the second won’t, since we don’t have any mention of adjectives in our grammar. To deal with the first problem we need some method of enforcing number/person agreement between subjects and verbs, so that things like ‘I am ...’ and ‘We are ...’ are accepted, but ‘I are ...’ and ‘We am ...’ are not. To deal with the second problem, we need to add further rules to our grammar. To enforce subject-verb agreement, the simplest method is to add arguments to our grammar rules.
If we’re only concerned about singular vs plural nouns (and assume that we don’t have any first or second person pronouns), we might get rules and dictionary entries which include the following:
sentence → noun_phrase(Num), verb_phrase(Num)
noun_phrase(Num) → proper_name(Num)
noun_phrase(Num) → determiner(Num), noun(Num)
verb_phrase(Num) → verb(Num), noun_phrase(_)
proper_name(sing) → [Mary]
noun(sing) → [lion]
noun(plur) → [lions]
determiner(sing) → [the]
determiner(plur) → [the]
verb(sing) → [eats]
verb(plur) → [eat]
Note that, strictly, we no longer have a context-free grammar once we have added these extra arguments to our rules. In general, getting the agreement right in a grammar is much more complex than this. We need fairly complex rules, and we also need to put more information in dictionary entries. A good dictionary will not state everything explicitly, but will exploit general information about word structure, such as the fact that, given a verb such as ‘eat’, the third person singular form generally involves adding an ‘s’: hit/hits, eat/eats, like/likes, etc. Morphology is the area of natural language processing concerned with such things. To extend the grammar to allow adjectives we need to add an extra rule or two, e.g.:
noun_phrase(Num) → determiner(Num), adjectives, noun(Num).
adjectives → adjective, adjectives.
adjectives → adjective.
adjective → [ferocious].
adjective → [ugly].
etc. That is, noun phrases can consist of a determiner, some adjectives and a noun. Adjectives consist of an adjective and some more adjectives, OR just an adjective. We can now parse the sentence ‘The ferocious ugly lion eats Mary’. Another thing we may need to do is to extend the grammar so we can distinguish between transitive verbs that take an object (e.g., likes) and intransitive verbs that don’t (e.g., talks). (‘Mary likes the lion’ is OK while ‘Mary likes’ is not. ‘Mary talks’ is OK while ‘Mary talks the lion’ is not.)
This is left as an exercise for the reader. Our grammar so far (if we put all the bits together) still only parses sentences of a very simple form. It certainly wouldn’t parse the sentences I’m currently writing! We can try adding more and more rules to account for more and more of English: for example, we need rules that deal with prepositional phrases (e.g., ‘Mary likes the lion with the long mane’) and relative clauses (e.g., ‘The lion that ate Mary kissed John’). These are left as another exercise for the reader. As we add more and more rules to allow more bits of English to be parsed, we may find that our basic grammar formalism becomes inadequate, and we need a more powerful one to allow us to concisely capture the rules of syntax. There are lots of different grammar formalisms that have been developed (e.g., unification grammar, categorial grammar), but we won’t go into them.
Formal Grammar
• The lexicon for ε0:
Noun → stench | breeze | glitter | wumpus | pit | pits | gold | …
Verb → is | see | smell | shoot | stinks | go | grab | turn | …
Adjective → right | left | east | dead | back | smelly | …
Adverb → here | there | nearby | ahead | right | left | east | …
Pronoun → me | you | I | it | …
Name → John | Mary | Bush | Rocky | …
Article → the | a | an | …
Preposition → to | in | on | near | …
Conjunction → and | or | but | …
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• The grammar for ε0 (with an example after each rule):
S → NP VP (I + feel a breeze)
S → S Conjunction S (I feel a breeze + and + I smell a wumpus)
NP → Pronoun (I)
NP → Name (John)
NP → Noun (pits)
NP → Article Noun (the + wumpus)
NP → Digit Digit (3 4)
NP → NP PP (the wumpus + to the east)
NP → NP RelClause (the wumpus + that is smelly)
VP → Verb (stinks)
VP → VP NP (feel + a breeze)
VP → VP Adjective (is + smelly)
VP → VP PP (turn + to the east)
VP → VP Adverb (go + ahead)
PP → Preposition NP (to + the east)
RelClause → that VP (that + is smelly)
• Parts of speech
Open classes: noun, verb, adjective, adverb
Closed classes: pronoun, article, preposition, conjunction, …
• The grammar overgenerates (it accepts non-sentences such as ‘Me go Boston’) and undergenerates (it rejects good sentences such as ‘I think the wumpus is smelly’).
7.2.1.2 Parsers
Having a grammar isn’t enough to parse natural language; you need a parser. The parser should search for possible ways the rules of the grammar can be used to parse the sentence, so parsing can be viewed as a kind of search problem. In general, there may be many different rules that can be used to ‘expand’ or rewrite a given syntactic category, and the parser must check through them all to see if the sentence can be parsed using them. For example, in our mini-grammar above there were two rules for noun phrases; a parse of the sentence may use either one or the other. In fact, we can view the grammar as defining an AND-OR tree to search: alternative ways of expanding a node give OR branches, while rules with more than one syntactic category on the right-hand side give AND branches. So, to parse a sentence we need to search through all these possibilities, effectively going through all possible syntactic structures to find one that fits the sentence. There are good ways and bad ways of doing this, just as there are good and bad ways of parsing programming languages. One way is basically to do a depth-first search through the parse tree. When you reach the first terminal node in the grammar (i.e., a primitive syntactic category, such as noun) you check whether the first word of the sentence belongs to this category (e.g., is a noun). If it does, then you continue the parse with the rest of the sentence. If it doesn’t, you backtrack and try alternative grammar rules. As an example, suppose you were trying to parse ‘John loves Mary’, given the following grammar:
sentence → noun_phrase, verb_phrase.
verb_phrase → verb, noun_phrase.
noun_phrase → det, noun.
noun_phrase → p_name.
verb → [loves].
p_name → [john].
p_name → [mary].
You might start off expanding 'sentence' to a noun phrase and a verb phrase. Then the noun phrase would be expanded to give a determiner and a noun, using the third rule. A determiner is a primitive syntactic category (a terminal node in the grammar), so we check whether the first word (John) belongs to that category. It doesn't (John is a proper name), so we backtrack, find another way of expanding 'noun_phrase' and try the fourth rule. Now, as John is a proper name, this will work OK, so we continue the parse with the rest of the sentence ('loves Mary'). We haven't yet expanded the verb phrase, so we try to parse 'loves Mary' as a verb phrase. This will eventually succeed, so the whole thing succeeds.

It may be clear by now that in Prolog the parsing mechanism is really just Prolog's built-in search procedure. Prolog will just backtrack to explore the different possible syntactic structures. The extra arguments which Prolog (internally) adds to DCG rules allow it to parse a bit of the sentence using one rule, then the rest of the sentence using another. Note that this kind of simple top-down parser can be implemented reasonably easily in other languages that don't support backtracking by using an agenda-based search mechanism. An item in the agenda may be the remaining syntactic categories and bits of sentences left to parse, given this branch of the search tree (e.g., [[verb, noun_phrase], [loves, mary]]).

Simple parsers are often inefficient, because they don't keep a record of all the bits of the sentence that have been parsed.
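The depth-first, backtracking procedure just described can be sketched outside Prolog. The following Python version is an illustration only, not the DCG mechanism itself: alternative rules give OR branches, and the symbols on a rule's right-hand side give AND branches.

```python
# A sketch of the depth-first, backtracking parser described above.
# Alternative rules are OR branches; a rule's right-hand side symbols
# are AND branches that must all be parsed in sequence.

GRAMMAR = {
    "sentence":    [["noun_phrase", "verb_phrase"]],
    "verb_phrase": [["verb", "noun_phrase"]],
    "noun_phrase": [["det", "noun"], ["p_name"]],
}
LEXICON = {"verb": {"loves"}, "p_name": {"john", "mary"},
           "det": {"the"}, "noun": {"cat"}}

def parse(category, words):
    """Yield each possible remainder after parsing a prefix as `category`."""
    if category in LEXICON:                   # terminal: match one word
        if words and words[0] in LEXICON[category]:
            yield words[1:]
        return
    for rule in GRAMMAR[category]:            # OR: try each alternative rule
        remainders = [words]
        for symbol in rule:                   # AND: parse each symbol in turn
            remainders = [rest for r in remainders for rest in parse(symbol, r)]
        yield from remainders

def accepts(sentence):
    return any(rest == [] for rest in parse("sentence", sentence.lower().split()))

print(accepts("John loves Mary"))   # True
print(accepts("John loves"))        # False
```

Trying to parse 'John' as a det-noun noun phrase fails and the p_name alternative is tried instead, exactly as in the walkthrough above.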
Using simple depth-first search with backtracking can result in useful bits of parsing being undone on backtracking: if you have a rule 'a -> b, c, d' and a rule 'a -> b, c, e' then you may find the first part of the sentence consists of 'b' and 'c' constituents, go on to check if the rest is a 'd' constituent, but when that fails a Prolog-like system will throw away its conclusions about the first half when backtracking to try the second rule. It will then duplicate the parsing of the 'b' and 'c' bits. Anyway, better parsers try to avoid this.

Two important kinds of parsers used in natural language are transition network parsers and chart parsers. A transition network parser makes use of a more flexible network representation for grammar rules to partially avoid this problem: the rules above would be represented as a network whose arcs are labelled with syntactic categories. The transition network parser would basically traverse this network, checking that words in the sentence match the syntactic categories on each arc. A chart parser goes further in avoiding duplicated work by explicitly recording in a chart all the possible parses of each bit of the sentence.

Syntactic Analysis (Parsing)
• Parsing: The process of finding a parse tree for a given input string.
• Top-down parsing: Start with the S symbol and search for a tree that has the words as its leaves.
• Bottom-up parsing: Start with the words and search for a tree with root S.

Trace of bottom-up parsing of 'the wumpus is dead':

List of nodes            Subsequence     Rule
the wumpus is dead       the             Article → the
Article wumpus is dead   wumpus          Noun → wumpus
Article Noun is dead     Article Noun    NP → Article Noun
NP is dead               is              Verb → is
NP Verb dead             dead            Adjective → dead
NP Verb Adjective        Verb            VP → Verb
NP VP Adjective          VP Adjective    VP → VP Adjective
NP VP                    NP VP           S → NP VP
S

7.2.1.3 Semantics and pragmatics

The remaining two stages of analysis, semantics and pragmatics, are concerned with getting at the meaning of a sentence.
In the first stage (semantics), a partial representation of the meaning is obtained based on the possible syntactic structure(s) of the sentence and on the meanings of the words in that sentence. In the second stage, the meaning is elaborated based on contextual and world knowledge. To illustrate the difference between these stages, consider the sentence:

He asked for the boss.

From the knowledge of the meaning of the words and the structure of the sentence we can work out that someone (who is male) asked for someone who is the boss. But we can't say who these people are and why the first wanted the second. If we know something about the context (including the last few sentences spoken or written) we may be able to figure these things out. Maybe the last sentence was 'Fred had just been sacked.', and we know from our general knowledge that bosses generally sack people, and that if people want to speak to people who sack them it is generally to complain about it. We could then really start to get at the meaning of the sentence: Fred wants to complain to his boss about getting sacked. This second stage of getting at the real contextual meaning is referred to as pragmatics. The first stage, based on the meanings of the words and the structure of the sentence, is semantics, and is what we'll discuss a bit more next.

• Semantics
• Pragmatics

Semantics

In general, the input to the semantic stage of analysis may be viewed as a set of possible parses of the sentence and information about the possible word meanings. The aim is to combine the word meanings, given knowledge of the sentence structure, to obtain an initial representation of the meaning of the whole sentence. The hard thing, in a sense, is to represent word meanings in such a way that they may be combined with other word meanings in a simple and general way.
It is not really possible even to present a simple approach to semantic analysis in less than half a lecture: we can only outline some of the problems and show what the output of this stage of analysis may be (though in general there are many different semantic theories and representations, just as there are many different grammar formalisms).

First, let's go back to our syntactically ambiguous sentences and see how semantics could help:
• Time flies like an arrow.
• Fruit flies like a banana.

If we have some representation of the meanings of the different words in the sentence we can probably rule out the silly parse. We might look up 'banana' (maybe in some frame or semantic net system) and find that it is a fruit, and fruits generally don't fly. We might then be able to throw out the reading 'flies like a banana' if we made sure that sentences which mean 'X does something like Y' require that X and Y can do that thing!

Sometimes ambiguity is introduced at the stage of semantic analysis. For example:

John went to the bank.

Did John go to the river bank or the financial bank? We might want to make this explicit in our semantic representation, but without contextual knowledge we have no good way of choosing between them. This kind of ambiguity occurs when a word has two possible meanings, both of which may, for example, be nouns.

To obtain a semantic representation it helps if you can combine the meanings of the parts of the sentence in a simple way to get at the meaning of the whole (the term compositional semantics refers to this process). For those familiar with lambda expressions, one way to do this is to represent word meanings as complex lambda expressions, and just use function application to combine them. To combine a noun phrase 'John' and a verb phrase 'sleeps' we might have:

Verb phrase meaning: λX.sleeps(X)
Noun phrase meaning: john
Apply the VP meaning to the NP meaning: λX.sleeps(X)(john) = sleeps(john)

The output of the semantic analysis stage may be anything from a semantic net to an expression in some complex logic. It will partially specify the meaning of the sentence. From 'He went to the bank' we might have two possible readings, represented in predicate logic as:

∃X ∃Y male(X) ∧ wentto(X, Y) ∧ financialbank(Y)
∃X ∃Y male(X) ∧ wentto(X, Y) ∧ riverbank(Y)

Good representations for sentence meaning tend to be much more complex than this, to properly capture tense, conditionals, etc., but this gives the general flavour.

Semantic Interpretation:
• Semantics: Meaning of utterances
• First-order logic as the representation language
• Compositional semantics: Meaning of a phrase is composed of the meanings of the constituent parts of the phrase

Exp(x) → Exp(x1) Operator(op) Exp(x2) {x = Apply(op, x1, x2)}
Exp(x) → ( Exp(x) )
Exp(x) → Number(x)
Number(x) → Digit(x)
Number(x) → Number(x1) Digit(x2) {x = 10 × x1 + x2}
Digit(x) → x {0 ≤ x ≤ 9}
Operator(x) → x {x ∈ {+, –, ×, ÷}}

Example: John loves Mary ⇒ Loves(John, Mary)

(λy λx Loves(x, y))(Mary) ≡ λx Loves(x, Mary)
(λx Loves(x, Mary))(John) ≡ Loves(John, Mary)

S(rel(obj)) → NP(obj) VP(rel)
VP(rel(obj)) → Verb(rel) NP(obj)
NP(obj) → Name(obj)
Name(John) → John
Name(Mary) → Mary
Verb(λy λx Loves(x, y)) → loves

Pragmatics

Pragmatics is the last stage of analysis, where the meaning is elaborated based on contextual and world knowledge. Contextual knowledge includes knowledge of the previous sentences (spoken or written), general knowledge about the world and knowledge of the speaker. One important task at this stage is to work out the referents of expressions. For example, in the sentence 'he kicked the brown dog' the expression 'the brown dog' refers to a particular brown dog (say, Fido), and the pronoun 'he' refers to the particular person we are talking about (Fred).
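The lambda-application steps and the arithmetic attachment shown above can be run directly as Python closures. This is an illustration only; the text's notation is language-neutral.

```python
# Compositional semantics by function application: word meanings are
# functions, and combining phrase meanings is just applying them.

loves = lambda y: lambda x: f"Loves({x}, {y})"   # Verb: λy.λx.Loves(x, y)

vp = loves("Mary")       # VP 'loves Mary'      ≡ λx.Loves(x, Mary)
s = vp("John")           # S 'John loves Mary'  ≡ Loves(John, Mary)
print(s)                 # Loves(John, Mary)

# The semantic attachment {x = 10 × x1 + x2} for Number composes digit
# meanings the same way, applied left to right.
def number(digits):
    x = 0
    for d in digits:
        x = 10 * x + int(d)
    return x

print(number("34"))      # 34
```

Note that this compositional step still leaves referents open: deciding who 'he' or 'the brown dog' actually is belongs to pragmatics.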
A full representation of the meaning of the sentence should mention Fido and Fred. We can often find this out by looking at the previous sentence, e.g.:

Fred went to the park. He kicked the brown dog.

We can work out from this that 'he' refers to Fred. We might also guess that the brown dog is in the park, but to figure out that we mean Fido we'd need some extra general or contextual knowledge: that the only brown dog that generally frequents the park is Fido. In general, this kind of inference is pretty difficult, though quite a lot can be done using simple strategies, like looking at who's mentioned in the previous sentence to work out who 'he' refers to. Of course, sometimes there may be two people (or two dogs) that the speaker might be referring to, e.g.:

There was a brown dog and a black dog in the park. Fred went to the park with Jim. He kicked the dog.

In cases like this we have referential ambiguity. It is seldom quite as explicit as this, but in general it can be a big problem. When the intended referent is unclear, a natural language dialogue system may have to initiate a clarification subdialogue, asking for example 'Do you mean the black one or the brown one?'.

Another thing that is often done at this stage of analysis (pragmatics) is to try to guess at the goals underlying utterances. For example, if someone asks how much something is, you generally assume that they have the goal of (probably) buying it. If you can guess at people's goals you can be a bit more helpful in responding to their questions. So an automatic airline information service, when asked when the next flight to Paris is, shouldn't just say '6 pm' if it knows the flight is full. It should guess that the questioner wants to travel on it, check that this is possible, and say '6 pm, but it's full.
The next flight with an empty seat is at 8 pm.'

Pragmatic Interpretation
• Adding context-dependent information about the current situation to each candidate semantic interpretation
• Indexicals: Phrases that refer directly to the current situation
Ex.: 'I am in Boston today' ('I' refers to the speaker and 'today' refers to now)

Generation

We have so far only discussed natural language understanding. However, you should also be aware of the problems in generating natural language. If you have something you want to express (e.g., eats (John, chocolate)), or some goal you want to achieve (e.g., get Fred to close the door), then there are many ways you can achieve that through language:

He eats chocolate. It's chocolate that John eats. John eats chocolate. Chocolate is eaten by John.
Close the door. It's cold in here. Can you close the door?

A generation system must be able to choose appropriately from among the different possible constructions, based on knowledge of the context. If a complex text is to be written, it must further know how to make that text coherent. Anyway, that's enough on natural language. The main points to understand are roughly what happens at each stage of analysis (for language understanding), what the problems are and why, and how to write simple grammars in Prolog's DCG formalism.
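Before moving on to generation, here is a sketch of the simple referent-finding strategy mentioned under pragmatics: resolve 'he' to the most recent masculine name in the preceding sentences. The tiny gender lexicon is an assumption made purely for illustration.

```python
# A deliberately crude pronoun-resolution heuristic: scan the previous
# sentences, most recent word first, for a known masculine name.

MASCULINE = {"Fred", "John", "Jim"}    # illustrative gender lexicon

def resolve_he(previous_sentences):
    for sent in reversed(previous_sentences):          # most recent sentence first
        for word in reversed(sent.rstrip(".").split()):
            if word in MASCULINE:
                return word
    return None                                        # would need clarification

print(resolve_he(["Fred went to the park."]))          # Fred
```

On 'Fred went to the park with Jim.' this heuristic would return Jim: exactly the kind of referential ambiguity that may require a clarification subdialogue.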
Language Generation

The same DCG can be used for parsing and generation.
• Parsing: Given: S(sem, [John, loves, Mary]) Return: sem = Loves(John, Mary)
• Generation: Given: S(Loves(John, Mary), words) Return: words = [John, loves, Mary]

• Ambiguity
Lexical ambiguity: 'the back of the room' vs 'back up your files'; 'In the interest of stimulating the economy, the government lowered the interest rate.'
Syntactic ambiguity (structural ambiguity): 'I smelled a wumpus in 2,2'
Semantic ambiguity: 'the IBM lecture'
Pragmatic ambiguity: 'I'll meet you next Friday'

Discourse Understanding
• Discourse: Multiple sentences
• Reference resolution: The interpretation of a pronoun or a definite noun phrase that refers to an object in the world.
'John flagged down the waiter. He ordered a ham sandwich.' ('He' refers to 'John')
'After John proposed to Mary, they found a preacher and got married. For the honeymoon, they went to Hawaii.' ('they'? 'the honeymoon'?)
• Structure of coherent discourse: Sentences are joined by coherence relations
• Examples of coherence relations between S1 and S2:
Enable or cause: S1 brings about a change of state that causes or enables S2. 'I went outside. I drove to school.'
Explanation: The reverse of enablement; S2 causes or enables S1 and is an explanation for S1. 'I was late for school. I overslept.'
Exemplification: S2 is an example of the general principle in S1. 'This algorithm reverses a list. The input [A, B, C] is mapped to [C, B, A].'
Etc.

7.2.2 Why is Language Processing Difficult?

Consider trying to build a system that would answer e-mails sent by customers to a retailer selling laptops and accessories via the Internet. This might be expected to handle queries such as the following:
• Has my order number 4291 been shipped yet?
• Is FD5 compatible with a 565G?
• What is the speed of the 565G?
Assume the query is to be evaluated against a database containing product and order information, with relations such as the following:

ORDER
Order number    Date ordered    Date shipped
4290            2/12/08         2/1/09
4291            2/12/08         2/1/09
4292            2/12/08

USER: Has my order number 4291 been shipped yet?
DB QUERY: Order (number=4291, date shipped=?)
RESPONSE TO USER: Order number 4291 was shipped on 2/1/09

It might look quite easy to write patterns for these queries, but very similar strings can mean very different things, while very different strings can mean much the same thing. Sentences 1 and 2 below look very similar but mean something completely different, while 2 and 3 look very different but mean much the same thing.

1. How fast is the 565G?
2. How fast will my 565G arrive?
3. Please tell me when I can expect the 565G I ordered.

Check Your Progress
1. List the three main stages of speech analysis.
2. State the two important kinds of parsers used in natural language.

While some tasks in NLP can be done adequately without having any sort of account of meaning, others require that we can construct detailed representations that will reflect the underlying meaning rather than the superficial string. In fact, in natural languages (as opposed to programming languages), ambiguity is ubiquitous, so exactly the same string might mean different things. For instance, in the query:

Do you sell Sony laptops and disk drives?

the user may or may not be asking about Sony disk drives. We'll see lots of examples of different types of ambiguity in these lectures. Often, humans have knowledge of the world which resolves a possible ambiguity, probably without the speaker or hearer even being aware that there is a potential ambiguity.
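A toy version of the pattern-based approach that 'might look quite easy' can be sketched as follows: a regular expression stands in for real syntactic and semantic analysis, and a dict stands in for the ORDER relation. The data values mirror the table above.

```python
# A sketch of pattern-based query answering over the toy ORDER relation.
import re

ORDERS = {4290: {"ordered": "2/12/08", "shipped": "2/1/09"},
          4291: {"ordered": "2/12/08", "shipped": "2/1/09"},
          4292: {"ordered": "2/12/08", "shipped": None}}

def answer(query):
    # The pattern only covers 'has order (number) N ... shipped' queries.
    m = re.search(r"order (?:number )?(\d+).*shipped", query, re.IGNORECASE)
    if not m:
        return "Sorry, I did not understand the question."
    order = ORDERS.get(int(m.group(1)))
    if order is None:
        return "No such order."
    if order["shipped"] is None:
        return "Not shipped yet."
    return f"Order number {m.group(1)} was shipped on {order['shipped']}"

print(answer("Has my order number 4291 been shipped yet?"))
```

Pattern matching of this kind fails exactly where world knowledge is needed: the same machinery cannot tell whether 'Sony laptops and disk drives' is asking about Sony disk drives.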
But hand-coding such knowledge in NLP applications has turned out to be impossibly hard to do for more than very limited domains: the term AI-complete is sometimes used (by analogy to NP-complete), meaning that we'd have to solve the entire problem of representing the world and acquiring world knowledge. The term AI-complete is intended somewhat jokingly, but conveys what's probably the most important guiding principle in current NLP: we're looking for applications which don't require AI-complete solutions, i.e., ones where we can work with very limited domains or approximate full world knowledge by relatively simple techniques.

7.3 SYNTACTIC PROCESSING

The most basic unit of linguistic structure appears to be the word. Morphology is the study of the construction of words from more basic components corresponding roughly to meaning units. In particular, a word may consist of a root form plus additional affixes. For example, the word friend can be considered a root form from which the adjective friendly is constructed by adding the suffix -ly. This form can then be used to derive the adjective unfriendly by adding a prefix.

Noun phrases (abbreviated as NP) are used to refer to things: objects, concepts, places, events, qualities and so on. While noun phrases are used to refer to things, sentences (abbreviated as S) are used to assert or query some partial description of the world. You can build more complex sentences from smaller sentences by allowing one sentence to include another as a sub-clause. To examine how the syntactic structure of a sentence can be computed, we must consider two things:
1. Grammar, which is a formal specification of the structures allowable in the language, and
2. Parsing technique, which is the method of analysing a sentence to determine its structure according to the grammar.
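Before turning to grammars, the morphological derivation friend → friendly → unfriendly can be mimicked in reverse by a toy analyser that strips known affixes. The affix lists here are illustrative only; real morphological analysis needs a proper lexicon and spelling rules.

```python
# A toy morphological analyser: recover the root by greedily stripping
# known prefixes and suffixes, mirroring un- + friend + -ly.

PREFIXES = ["un"]
SUFFIXES = ["ly"]

def analyse(word):
    """Return (prefixes, root, suffixes) by stripping known affixes."""
    prefixes, suffixes = [], []
    changed = True
    while changed:
        changed = False
        for p in PREFIXES:
            if word.startswith(p) and len(word) > len(p):
                prefixes.append(p)
                word = word[len(p):]
                changed = True
        for s in SUFFIXES:
            if word.endswith(s) and len(word) > len(s):
                suffixes.insert(0, s)
                word = word[:-len(s)]
                changed = True
    return prefixes, word, suffixes

print(analyse("unfriendly"))   # (['un'], 'friend', ['ly'])
```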
The most common way of representing sentence structure is to use a tree-like representation that outlines how a sentence is broken into its major subparts. For example, consider the following sentence:

Shyam ate the bun.

It can be represented as a tree in which:
S – Sentence
NP – Noun Phrase
VP – Verb Phrase
NAME – Name
ART – Article

To construct such descriptions, you must know what structures are legal for English. A set of rules called rewrite rules describes the allowable tree structures. These rules say that a certain symbol may be expanded in the tree into a sequence of other symbols. For instance, a set of rules that would allow the tree structure above is the following:

S → NP VP
VP → VERB NP
NP → NAME
NP → ART NOUN

Here, S may consist of an NP followed by a VP, a VP may consist of a VERB followed by an NP, and so on. Grammars consisting entirely of rules of the form

<Sym> → <Sym>1 … <Sym>n, for n ≥ 1

are called Context-Free Grammars (CFGs). Symbols that cannot be further decomposed in a grammar, such as NOUN, ART and VERB in the above example, are called terminal symbols. The other symbols, such as NP, VP and S, are called non-terminal symbols.

Two common and simple parsing techniques for CFGs are known as top-down parsing and bottom-up parsing. Top-down parsing begins by starting with the symbol S and rewriting it, say, to NP VP. These symbols may then themselves be rewritten as per the rewrite rules. Finally, terminal symbols such as NOUN may be rewritten by a word, such as rat, which is marked as a noun in the lexicon. Thus, in top-down parsing you use the rules of the grammar in such a way that the right-hand side of a rule is always used to rewrite the symbol on its left-hand side. In bottom-up parsing, we start with the individual words and replace them with their syntactic categories.
A possible top-down parse of 'Shyam ate the bun' is:

S → NP VP
  → NAME VP
  → Shyam VP
  → Shyam VERB NP
  → Shyam ate NP
  → Shyam ate ART NOUN
  → Shyam ate the NOUN
  → Shyam ate the bun.

A possible bottom-up parse of the sentence 'Shyam ate the bun' is:

Shyam ate the bun
  → NAME ate the bun
  → NAME VERB the bun
  → NAME VERB ART bun
  → NAME VERB ART NOUN
  → NP VERB ART NOUN
  → NP VERB NP
  → NP VP
  → S

Another grammar representation is often more convenient for visualizing the grammar. This formalism is based upon the notion of a transition network consisting of nodes and labelled arcs. Consider the network named S in the following grammar. Each arc is labelled with a word category. Starting at a given node, you can traverse an arc if the current word in the sentence is in the category on the arc. If the arc is followed, the current word is updated to the next word. This network recognizes the same set of sentences as the following context-free grammar:

S → NOUN S1
S1 → VERB S2
S2 → NOUN

Example: 'Shyam ate mango' can be accepted by the above grammar.

The Simple Transition Network (STN) formalism is not powerful enough to describe all languages that can be described by a CFG. To get the descriptive power of CFGs, we need a notion of recursion in the network grammar. A Recursive Transition Network (RTN) is like a simple transition network, except that it allows arc labels that refer to other networks rather than word categories.

7.4 AUGMENTED TRANSITION NETWORKS

An augmented transition network (ATN) is a top-down parsing procedure that allows various kinds of knowledge to be incorporated into the parsing system so it can operate efficiently. Suppose we want to collect the structure of legal sentences in order to further analyse them. One structure seen earlier was the syntactic parse tree that motivated context-free grammars.
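Traversing the simple S network above (S → NOUN S1, S1 → VERB S2, S2 → NOUN) can be sketched as a table of arcs; a sentence is accepted if every word matches an arc's category and we finish in the final node. The word categories form a tiny illustrative lexicon.

```python
# A Simple Transition Network as an arc table: (node, category) → next node.

ARCS = {("S", "NOUN"): "S1",
        ("S1", "VERB"): "S2",
        ("S2", "NOUN"): "final"}
CATEGORY = {"Shyam": "NOUN", "ate": "VERB", "mango": "NOUN"}

def recognize(words):
    node = "S"
    for w in words:
        node = ARCS.get((node, CATEGORY.get(w)))   # follow the matching arc
        if node is None:                           # no matching arc: reject
            return False
    return node == "final"

print(recognize(["Shyam", "ate", "mango"]))   # True
```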
It is more convenient, though, to represent structure using a slot-filler representation, which better reflects the functional role of phrases in a sentence. For instance, you could identify one particular noun phrase as the syntactic subject (SUBJ) of a sentence and another as the syntactic object of the verb (OBJ). Within noun phrases, you might identify the determiner structure, adjectives, the head noun and so on. Thus, the sentence 'Shyam found a diamond' might be represented by the structure:

(S SUBJ (NP NAME Shyam)
   MAIN-V found
   TENSE PAST
   OBJ (NP DET a
           HEAD diamond))

Such a structure is created by the recursive transition network parser by allowing each network to have a set of registers. Registers are local to each network. Thus, each time a new network is pushed, a new set of empty registers is created. When the network is popped, the registers disappear. Registers can be set to values, and values can be retrieved from registers. In this case, the registers will have the names of the slots used for each of the preceding syntactic structures. Thus the NP network has registers named DET, ADJS, HEAD and NUM.

Registers are set by actions that can be specified on the arcs. When an arc is followed, the actions associated with it are executed. The most common action involves setting a register to a certain value. When a pop arc is followed, all the registers set in the current network are automatically collected to form a structure consisting of the network name followed by a list of the registers with their values. An RTN with registers, and tests and actions on those registers, is an Augmented Transition Network.

The registers can be used to make the grammar more selective without unnecessarily complicating the network. For instance, in English, the number of the first noun phrase must agree with the number of the verb. Thus, you cannot have a singular first NP and a plural verb, as in 'The cat are sleeping'.
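The register-and-test machinery can be sketched as follows: an NP network that fills DET and HEAD registers as arcs are followed, plus a number-agreement test that fails when the intersection of NUM feature sets is empty (the feature notation is introduced with the lexicon shortly). The four-word lexicon is illustrative only.

```python
# An NP network with registers and a number-agreement test, as a sketch.
# Each lexical entry carries a NUM feature set; the arc's test intersects
# feature sets and fails on the empty set.

LEXICON = {"a":    ("ART",  {"3S"}),
           "the":  ("ART",  {"3S", "3P"}),
           "cat":  ("NOUN", {"3S"}),
           "cats": ("NOUN", {"3P"})}

def parse_np(words):
    registers = {}                             # fresh registers on pushing NP
    cat, num = LEXICON[words[0]]
    if cat == "ART":
        registers["DET"] = words[0]            # action on the ART arc
        registers["NUM"] = num
        words = words[1:]
    cat, num = LEXICON[words[0]]
    if cat != "NOUN":
        return None
    agreed = registers.get("NUM", num) & num   # the test on the NOUN arc
    if not agreed:                             # empty set: the test fails
        return None
    registers["HEAD"] = words[0]
    registers["NUM"] = agreed
    return ("NP", registers)                   # pop: network name + registers

print(parse_np(["a", "cat"]))
print(parse_np(["a", "cats"]))                 # None: 'a' is 3S, 'cats' is 3P
```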
More importantly, number agreement is crucial in disambiguating some sentences. Consider the contrast between 'flying planes is dangerous', where the subject is the activity of flying planes, and 'flying planes are dangerous', where the subject is a set of planes that are flying. Without checking subject-verb agreement, the grammar could not distinguish between these two readings.

You could encode number information by classifying all words as to whether they are singular or plural; then you could build up one grammar for singular sentences and another for plural sentences. In an RTN, this approach is shown in the following figure. But the method doubles the size of the grammar. In addition, to check some other feature, you would need to double the size of the grammar again. Besides the problem of size, this approach also forces you to make the singular/plural distinction even when it is not necessary to check it.

A better solution is to allow words and syntactic structures to have features as well as a basic category. We can do so by using the slot-value list notation introduced earlier for syntactic structure. This notation allows you to store number information, as well as other useful information about the word, in a data structure called the lexicon. The following table shows a lexicon with the root and number information for some words, in which '3S' means third person singular and '3P' means third person plural.

Word     Representation
cats     (NOUN ROOT cat NUM {3P})
cat      (NOUN ROOT cat NUM {3S})
the      (ART ROOT the NUM {3S 3P})
a        (ART ROOT a NUM {3S})
cried    (VERB ROOT cry NUM {3S 3P})
hates    (VERB ROOT hate NUM {3S})
hate     (VERB ROOT hate NUM {3P})
Shyam    (NAME ROOT Shyam)

Check Your Progress
3. List the two common techniques for CFGs.
4. The Simple Transition Network (STN) formalism is not powerful enough to describe all languages that can be described by a CFG. (True or False?)
5. An augmented transition network (ATN) is a bottom-up parsing procedure that allows various kinds of knowledge to be incorporated into the parsing system so it can operate efficiently. (True or False?)

Now extend the RTN formalism by adding a test to each arc in the network. A test is simply a function that is said to succeed if it returns a non-empty value, such as a set or atom, and to fail if it returns the empty set or nil. If a test fails, its arc is not traversed.

7.5 SEMANTIC ANALYSIS

Deriving the syntactic structure of a sentence is just one step toward the goal of building a model of the language understanding process. A sentence's meaning is as important as its structure. While sentence meaning is difficult to define precisely, it allows you to conclude that the following two sentences are saying much the same thing:
• I gave a contribution to the local party.
• The local party received a donation from me.

Sentence meaning, coupled with general knowledge, is also used in question answering. Thus, if it is asserted that 'Giri drove to the office', then the answer to the question 'Did Giri get into a car?' is 'yes'. To answer such a question, you need to know that driving, the action described by the verb, is an activity that must be done in a car.

The approach taken here divides the problem of semantic interpretation into two stages. In the first stage, the appropriate meaning of each word is determined, and these meanings are combined to form a logical form (LF). The logical form is then interpreted with respect to contextual knowledge, resulting in a set of conclusions that can be drawn from the sentence. An intermediate semantic representation is desirable because it provides a natural division between two separate, but not totally independent, problems.
The first problem concerns word sense and sentence ambiguity. Just as many words fall into multiple syntactic classes, many have several different meanings or senses. The verb go, for instance, has over 50 distinct meanings. Even though most words in a sentence have multiple senses, a combination of words in a phrase often has only a single sense, since the words mutually constrain each other's possible interpretations.

The second problem involves using knowledge of the world and the present context to identify the particular consequences of a certain sentence. For instance, if you hear the sentence 'The principal has resigned', you must use knowledge of the situation to identify who the principal is (and what college he or she heads) and to conclude that the person no longer holds a position in that college. Depending on the circumstances, you might also infer other consequences, such as the need for a replacement principal. In fact, the consequences of a given sentence could have far-reaching effects. Most attempts to formalize this process involve encoding final sentence meaning in some form of logic and modelling the derivation of consequences as a form of logical deduction.

The first process just described, which is known as word sense disambiguation, is not easily modelled as a deductive process. The components of meaning (word meanings, phrase meanings and so on) often correspond at best to fragments of logic, such as predicate names, types and so on, over which deduction is not defined. For example, when analysing the meaning of the following sentences you might use the predicate BREAK to model the verb meaning:

1. Sasi broke the window with a hammer.
2. The hammer broke the window.
3. Sasi broke the window.
4. The window broke.

Furthermore, assume that BREAK takes three arguments: the breaker, the thing broken and the instrument used for breaking.
To analyse these sentences, you must utilize information about the semantics of the NPs Sasi, the hammer and the window, together with the syntactic form of the sentence, to decide that the subject NP corresponds to the third argument in sentence 2, the first argument in sentences 1 and 3, and the second argument in sentence 4.

Semantic Grammars

A semantic grammar is a context-free grammar in which the choice of non-terminals and production rules is governed by semantic as well as syntactic function. There is usually a semantic action associated with each grammar rule. The result of parsing and applying all the associated semantic actions is the meaning of the sentence. This close coupling of semantic actions to grammar rules works because the grammar rules themselves are designed around key semantic concepts. Let us look at an example of a semantic grammar, with semantic actions shown in braces:

S → what is FILE-PROPERTY of FILE? {query FILE, FILE-PROPERTY}
S → I want to ACTION {command ACTION}
FILE-PROPERTY → the FILE-PROP {FILE-PROP}
FILE-PROP → extension | protection | creation date | owner {value}
FILE → FILE-NAME | FILE1 {value}
FILE1 → USER'S FILE2 {FILE2.owner: USER}
FILE1 → FILE2
FILE2 → EXT file {instance: file-struct, extension: EXT}
EXT → .init | .text | .lsp | .for | .ps | .mss {value}
ACTION → print FILE {instance: printing, object: FILE}
ACTION → print FILE on PRINTER {instance: printing, object: FILE, printer: PRINTER}
USER → Bill | Susan {value}

A semantic grammar can be used by a parsing system in exactly the same way in which a strictly syntactic grammar could be used. Several existing parsing systems that have used semantic grammars have been built around an ATN parsing system, since it offers a great deal of flexibility.

7.6 DISCOURSE AND PRAGMATIC PROCESSING

To understand even a single sentence, it is necessary to consider the discourse and pragmatic context in which the sentence was uttered.
There are a number of important relationships that may hold between phrases and parts of their discourse contexts, including:

• Identical entities. Ex.: Ram had a red balloon. Sam wanted it.
The word 'it' should be identified as referring to the red balloon. Such references are called anaphoric references or anaphora.
• Parts of entities. Ex.: Rani opened the book she just bought. The title pages were torn.
The phrase 'the title pages' should be recognized as referring to part of the book bought by Rani.
• Parts of actions. Ex.: Sam went on a business trip to New York. He left on an early morning flight.
Taking a flight should be recognized as part of going on a trip.
• Entities involved in actions. Ex.: My house was broken into last week. They took the TV and stereo.
The pronoun 'they' should be recognized as referring to the burglars who broke into the house.
• Elements of sets. Ex.: The decals we have in stock are stars, the moon, item and a flag. I'll take two moons.
To understand the second sentence, all that is required is that we use the context of the first sentence to establish that the word 'moons' means moon decals.
• Names of individuals. Ex.: John went to the movie.
'John' should be understood to be some person named John.
• Causal chains. Ex.: There was a big snowstorm yesterday. The schools were closed today.
The snow should be recognized as the reason that the schools were closed.
• Planning sequences. Ex.: Shalu wanted a new car. She decided to get a job.
Shalu's sudden interest in a job should be recognized as arising out of her desire for a new car, and thus for the money to buy a car.
• Illocutionary force. Ex.: It sure is cold in here.
In many circumstances, this sentence should be recognized as having, as its intended effect, that the hearer should do something like close the window or turn up the thermostat.
• Implicit presuppositions. Ex.: Did Sam fail CS101?
The speaker's presuppositions, including the facts that CS101 is a valid course, that Sam is a student, and that Sam took CS101, should be recognized.

In order to be able to recognize these kinds of relationships among sentences, a great deal of knowledge about the world is required. Programs that can do multiple-sentence understanding rely either on large knowledge bases or on strong constraints on the domain of discourse, so that only a more limited knowledge base is necessary. The way this knowledge is organized is critical to the success of the understanding program. We should focus on the use of the following kinds of knowledge:
1. The current focus of the dialogue.
2. A model of each participant's current beliefs.
3. The goal-driven character of dialogue.
4. The rules of conversation shared by all participants.

7.6.1 Conversational Postulates

Unfortunately, this analysis of language is complicated by the fact that we do not always say exactly what we mean; we often use indirect speech acts such as 'It sure is cold in here'. Such indirection, however, follows regular patterns, and this regularity gives rise to a set of conversational postulates, which are rules about conversation that are shared by all speakers. Usually, these rules are followed. Some of these conversational postulates are:

• Sincerity conditions. Ex: — Can you open the door?
• Reasonableness conditions. Ex: — Can you open the door? Why do you want it open?
• Appropriateness conditions. Ex: — Who won the race? Someone with long, dark hair. I thought you knew all the runners.

Assuming a cooperative relationship between the parties to a dialogue, the shared assumption of these postulates greatly facilitates communication.

7.7 GENERAL COMMENTS

• Even 'simple' NLP applications, such as spelling checkers, need complex knowledge sources for some problems.
• Applications cannot be 100 per cent perfect, because full real-world knowledge is not possible.
• Applications that are less than 100 per cent perfect can still be useful (humans are not 100 per cent perfect either).
• Applications that aid humans are much easier to construct than applications that replace humans.
• It is difficult to make the limitations of systems that accept speech or language obvious to naive human users.
• NLP interfaces are nearly always competing with a non-language-based approach.
• Nearly all applications either do relatively shallow processing on arbitrary input or deep processing on narrow domains. MT can be domain-specific to varying extents: MT on arbitrary text is not very good, but it has some applications.
• Limited-domain systems require extensive and expensive expertise to port. Research that relies on extensive hand-coding of knowledge for small domains is now generally regarded as a dead end, though reusable hand-coding is a different matter.
• The development of NLP has mainly been driven by hardware and software advances and by societal and infrastructure changes, not by great new ideas. Improvements in NLP techniques are generally incremental rather than revolutionary.

7.8 SUMMARY

In this unit, you have learned about the tough problem of language understanding. One interesting way to summarize the natural language understanding problem is to view it as a constraint satisfaction problem. Unfortunately, many kinds of constraints must be considered. But constraint satisfaction does provide a reasonable framework in which to view the whole collection of steps that together create a meaning for a sentence. Essentially, each of the steps described in the unit exploits a particular kind of knowledge that contributes a specific set of constraints that must be satisfied by any correct final interpretation of a sentence.

You have also learned about syntactic processing, which contributes a set of constraints derived from the grammar of the language.
Semantic processing contributes an additional set of constraints derived from the knowledge it has about entities that can exist in the world. The unit also discussed discourse processing, which contributes a further set of constraints that arise from the structure of coherent discourses. And finally, pragmatics contributes yet another set of constraints. There are many important issues in natural language processing. By combining understanding and generation systems, it is possible to attack the problem of machine translation, by which we understand text written in one language and then generate it in another language.

7.9 KEY TERMS

• Natural language systems: These are developed both to explore general linguistic theories and to produce natural language interfaces, or front ends, to application systems.
• Natural language processing: It can be defined as the automatic (or semi-automatic) processing of human language.
• Syntax: The stage of syntactic analysis is the best understood stage of natural language processing. Syntax helps us understand how words are grouped together to make complex sentences, and gives us a starting point for working out the meaning of the whole sentence.

7.10 ANSWERS TO 'CHECK YOUR PROGRESS'

1. Syntactic analysis, semantic analysis and pragmatic analysis.
2. Two important kinds of parsers used in natural language processing are transition network parsers and chart parsers.
3. Two common and simple parsing techniques for CFGs are top-down parsing and bottom-up parsing.
4. True
5. False

7.11 QUESTIONS AND EXERCISES

1. What is natural language processing?
2. Write the production rules necessary to check the syntax of an English noun. The grammar shall include both proper and common nouns.
3. What is a simple transition network (STN)?
4. Describe syntactic grammars and semantic grammars.

7.12 FURTHER READING

1. www.home.yawl.com.br
2. www.cee.hw.ac.uk
3. www.comp.nus.edu
4. www.gmitweb.gmit.ie
5.
www.atn.ath.cx
6. www.cs.darmouth.edu
7. www.aiexplained.com
8. www.cl.cam.ac.uk
9. www.kbs.twi.tudelft.nl
10. www.eecs.berkeley.edu
11. www.csic.cornell.edu
12. www.claes.sci.eg
13. www.bluehawk.monmouth.edu
14. www.www-ist.massey.ac.nz
15. www.min.ecn.purdue.edu
16. www.crg9.uwaterloo.ca
17. www.free-esl-blogs.com
18. De, Suranjan. 1986. 'A Framework for Integrated Problem Solving in Manufacturing'. IIE Transactions, 9 January.