Artificial Intelligence and the SAS® System: Why You Have To Teach the SAS® System about Sex!

David B. Malkovsky, SAS Institute Inc., Cary, NC

ABSTRACT

This paper presents a very brief survey of the areas that comprise artificial intelligence (AI) research (knowledge representation, natural language understanding, expert systems, and so on).

The majority of the paper describes AI research done at SAS Institute and a new product, SAS/ENGLISH software, which is a result of that research effort. SAS/ENGLISH software provides users with a means of asking for information about their data by typing English questions using a user-friendly interface that avoids a computer-specific language or syntax. The importance of knowledge acquisition to make this interface work is discussed. The subtitle of the paper is used to draw attention to the need for teaching AI programs the words and concepts that humans take for granted when they query each other in normal conversation.

INTRODUCTION

There is disagreement in the field of artificial intelligence (AI) as to what constitutes the discipline. It is not unusual to find people who believe they are working in the field of AI, but are not considered to be doing so by any of their colleagues. Conversely, there are other individuals working in areas that are traditionally considered to be part of AI who refuse to apply that label to their work. In reading announcements of new products or advertisements in leading computer periodicals, you are led to believe that almost everyone in the computer field today is working on, or in, the AI field.

Answers to the question of what constitutes AI often center on defining the borders or limits of the field. One possible (and overly simplistic) test for receiving the AI classification is that any program written in LISP or Prolog belongs to the field of AI. Although in the first half of the history of AI this was very true, it is no longer a necessary or sufficient test. Other litmus tests of AI-ness identify systems that use knowledge, employ search techniques, have an inference engine, or modify their behavior over time.

One humorous definition of AI states that "AI is any program or system that never gets to market." The basis of this view is that AI is magic and complex. If a marketable product is produced, the problem it solves was never in the field of AI. Rather, the task solved by such a system was only a complex software engineering task that was earlier misclassified as an AI problem.

Obviously there is no one correct answer to the question "What is artificial intelligence?" One acceptable answer is that AI "is the part of computer science concerned with designing intelligent computer systems, that is, systems that exhibit the characteristics we associate with intelligence in human behavior - understanding language, learning, reasoning, solving problems, and so on" (Barr, Cohen, and Feigenbaum 1989, p. 3).

MAJOR AI APPLICATIONS

The kinds of intelligent human behavior that AI investigators have tried to simulate include deduction, vision, learning, planning, problem solving, and natural language understanding. In addition, there has been research in basic AI techniques, such as ways to represent knowledge, development of special programming languages (for example, LISP, logic programming languages like Prolog, and object-oriented programming languages), tools (expert systems shells, for example), and architectures for AI applications.

To date, the following kinds of AI applications have been developed:

• natural language interfaces to databases: interfaces that do not require users to learn special command languages
• expert systems: systems that use the knowledge of specialists to perform tasks such as medical diagnosis, chemical analysis, geological exploration, and computer system configuration
• automatic programming systems: systems to create programs based on descriptions of what the programs should accomplish
• vision and robotic systems with sensory apparatus that respond to changes in their environment
• neural network approaches to pattern recognition and knowledge acquisition
• fuzzy logic approaches for performing inexact reasoning.

Such systems were developed for medicine, business, engineering, science, and manufacturing applications.

The remainder of this paper will examine one specific AI application from SAS Institute, SAS/ENGLISH software, which draws on a long tradition of language processing and AI research.

SAS/ENGLISH SOFTWARE

The function of a natural language interface such as SAS/ENGLISH software is to provide easy access to data without the user's having to learn the syntax of a retrieval language or know the structure and field names associated with individual data values.

Using the full-screen interactive facility illustrated in Display 1, you type your question and SAS/ENGLISH software provides your answer. Once information is accessed, a user can use the full power of the SAS System to manage, analyze, and present the data without ever leaving the menu-driven environment.

SAS/ENGLISH software works by translating a question into Structured Query Language (SQL) commands and presents the retrieved data or answers in the SAS/ASSIST software environment.

SAS/ENGLISH software is a quick and easy way to get information and generate reports. Users can find, order, and summarize data by asking simple who, what, when, where, how many, and yes/no questions.

[Display 1: full-screen query panel showing the Knowledge Base name and description, a prompt for entering the query, Listing and Report presentation choices, and options including Run Generated Code, Show Query Paraphrase, and Show Unknown Words]

Display 1  SAS/ENGLISH English Language Query Screen

SAS/ENGLISH Knowledge Base

To understand a question or query, SAS/ENGLISH software uses an information storage facility, or data model, called the Knowledge Base (KB). The KB describes the data in terms of real-world concepts, relationships, and vocabulary. This type of information is called domain-specific information. It includes the content, concepts, and context of the data that do not exist in the built-in Knowledge Base of SAS/ENGLISH software. The activity of acquiring domain-specific information is known as knowledge base definition. The person who performs this task is the KB Administrator (KBA). The successful completion of this task requires an understanding both of the purpose of the application and of how a typical user interacts with that application.

SAS/ENGLISH Knowledge Base Definition

The SAS/ENGLISH Knowledge Base Definition facility is a full-screen SAS/AF software application that is used to interactively define a Knowledge Base describing the data for a specific application. Before a KB is defined, the data must be identified and made accessible to SAS/ENGLISH software. This step may require defining SAS views or using a SAS/ACCESS product if the data are not in native SAS data sets.

The effort required to provide a level of detail and coverage to successfully implement a complete SAS/ENGLISH KB is not trivial. There are three major tasks of KB administration:

• creating the physical data model
• creating the conceptual data model
• defining the vocabulary.

Creating the Physical Data Model

The first task of KB administration is to specify the name and location of the data files associated with the application. The application-specific data are stored in one or more data sets that can be accessed by the SAS System. The physical location and storage method are unimportant to the SAS/ENGLISH product, as long as they are accessible by a native SAS engine or through a SAS/ACCESS software product.

Creating the Conceptual Data Model

Having identified the particular data components of the application, the second task is to specify what concept or object each data element represents. For example, if a particular employee data set is a component of a human resource application, the employee data set can be modeled by an employee object. If the user enterprise is related to the sale of a product, a particular group of data may represent the concept of a sales transaction.

Part of identifying the objects or the concepts of an enterprise is specifying the relationships between the various objects. Continuing with the human resource example, a second data source may contain the job descriptions for the various positions within the company. The data file might include not only the job title, but also the salary range, job responsibilities, payroll status, and other job-related accounting information. The relationship between the employee object and the job information must be defined to the SAS/ENGLISH KB in order for questions about an employee's job to be answered.

Defining the Vocabulary

The last function of KB administration is to define the vocabulary and specify idioms that describe the data. Each application has a specific vocabulary that is customarily used with the data accessed by the application. In addition, the same data in different industries or applications are often known by different names. For example, people are commonly known as students in a school environment, as employees in a corporate setting, and as subjects in medical experiments.

Objects, Attributes, and Relationships

SAS/ENGLISH Knowledge Base administration makes some very important and fundamental assumptions about the data that it can model. The data model is oriented toward recognizing objects (also called entities), attributes of objects, and relationships between objects.

An object corresponds to a concept or thing in the real world and is represented by data in a data set. It can be a person, an event, a physical object, or an organizational entity, such as a department in a company. Objects or entities have attributes or characteristics. For example, people have names, heights, and weights; and events have times and places.

Defining a Knowledge Base is usually an iterative process. A KBA begins with an initial model of the data, builds a Knowledge Base, and then tests some sample queries. If the queries are not understood, additional vocabulary may be needed or objects and their attributes may need to be defined differently. In some cases, it may be necessary to structure the data differently and redefine the Knowledge Base.

The first attempt at defining a KB is usually not the last because it is nearly impossible to think of everything during the first session of KB definition. Modifications and improvements of the KB are always possible over the life of the application. Most decisions made during the definition process are easily altered or reversed.

Sample SAS/ENGLISH Knowledge Base Definition

This section of the paper will discuss various components of the Knowledge Base definition process and present some of the interactive screens associated with each component. Once a name and location for the Knowledge Base is chosen, the Select Data Set screen, illustrated in Display 2, is the central point for building the KB.
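The three administration tasks described above (physical data model, conceptual data model, vocabulary) can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not SAS/ENGLISH code: the class names, the naive plural stripping, the shape of the generated SQL, and the synonym "worker" are all hypothetical, and the data set name PERMDATA.EMPLYEE is read from the sample screens rather than confirmed by the text.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    column: str          # variable name in the data set
    nouns: set[str]      # vocabulary a user may type for this attribute


@dataclass
class KBObject:
    data_set: str        # physical data model: where the data live
    nouns: set[str]      # conceptual model: words that name the object
    attributes: list[Attribute] = field(default_factory=list)


def singular(word: str) -> str:
    """Very naive plural stripping; enough for this illustration only."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def to_sql(question: str, kb: list[KBObject]) -> str:
    """Map a question to a SELECT statement using only KB vocabulary."""
    words = {singular(w) for w in question.lower().strip("?.! ").split()}
    for obj in kb:
        if words & obj.nouns:
            # Keep the columns whose vocabulary appears in the question.
            cols = [a.column for a in obj.attributes if words & a.nouns]
            return f"SELECT {', '.join(cols) or '*'} FROM {obj.data_set}"
    raise ValueError("no object in the knowledge base matches the question")


# Hypothetical knowledge base for the employee example.
kb = [
    KBObject(
        data_set="PERMDATA.EMPLYEE",
        nouns={"employee", "worker"},
        attributes=[
            Attribute("NAME", {"name"}),
            Attribute("SALARY", {"salary", "pay", "compensation"}),
        ],
    )
]

print(to_sql("What is the pay of the workers?", kb))
```

The point of the sketch is that the translator knows nothing about employees or pay on its own. Every word it can resolve was supplied when the knowledge base was built, which is why a question with unfamiliar vocabulary fails until the KBA adds the missing words.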
[Display 2: Select Data Set screen with the Choose a New Data Set and Set Other KB Options buttons, a list headed "Select a data set that has already been defined" with columns for data set name and reference word, and the Query, Browser, Goback, Help, and Debug buttons]

Display 2  SAS/ENGLISH Select Data Set Screen

This screen has options for adding new data sets to the Knowledge Base (Choose a New Data Set button) and performing specialized and advanced KB operations (Set Other KB Options button). The Query option allows you to test queries to see the effects of adding or deleting information from the KB. The Browser option lets you examine the KB using the Knowledge Base Browser.

As new data sets are added to the Knowledge Base, they are listed with their corresponding main reference word. The main reference word is the word that best describes the object represented by each data set. This word is used by SAS/ENGLISH software instead of the data set name to specify the object. A complex KB for a business application may look like the one shown in Display 3 below.

[Display 3: Select Data Set screen listing several data sets in the PERMDATA library with their reference words, including EMPLOYEE, SALESPERSON, DEPARTMENT, CLIENT, and PROJECT]

Display 3  SAS/ENGLISH Select Data Set Screen

The Select Data Set screen is the central point for specifying the contents of the KB. The Define a Data Set screen is the central point for defining the contents of each data file to SAS/ENGLISH software. The screen example in Display 4 below shows that much of the information is similar to that produced by running the CONTENTS procedure on the data set. Information unique to SAS/ENGLISH software includes specifying vocabulary, defining special attributes, and setting object options.

[Display 4: Define a Data Set screen for the employee data set, with selections to specify nouns and verbs that refer to the data set, define special attributes, and set options for the data set object, followed by a list of the data set's attributes]

Display 4  SAS/ENGLISH Define a Data Set Screen

The next level of specification is the definition of the type and use of each variable in the data set. Fortunately, there are only two groups into which attributes fall. They are based on the data type of the attribute; the data are either numeric or character. For the purpose of this discussion, we will look at one example of each type. SALARY and SEX are the attributes used here to represent numeric and character type attributes, respectively.

After selecting the SALARY attribute on the Define a Data Set screen, the next screen seen by the Knowledge Base Administrator is the Define Numeric Attribute screen, as shown in Display 5 below.

[Display 5: Define Numeric Attribute screen for the variable SALARY, showing the noun used to refer to the attribute, a "Choose the best description of SALARY" list (a number that identifies EMPLOYEE; a quantity of EMPLOYEE; a measurement of EMPLOYEE; a number that identifies an object related to EMPLOYEE; a time, date, or duration of EMPLOYEE; a money value associated with EMPLOYEE), and selections to specify nouns, verbs, connections, adjectives, and class options for SALARY]

Display 5  SAS/ENGLISH Define Numeric Attribute Screen

In the middle of the screen, there are several general types of numeric attributes from which the KBA must choose the best description of the attribute. The selection of the description or type adds information to the Knowledge Base as to how or where in a query this data can be specified. The various numeric types are the following:
a number that identifies the object
   This type is chosen if the attribute uniquely specifies or identifies the observation in the data set. For those of you familiar with relational databases, SAS/ENGLISH software assumes that such an attribute forms a primary singleton key.

a quantity
   This type is chosen if the attribute counts some object.

a measurement
   This type is chosen if the attribute measures or rates some aspect of the object. Examples of this type of attribute are temperature, age, and weight.

a number that identifies a related object
   This type is chosen if the values of the attribute identify some data set object other than the current object. This type of attribute is sometimes known as a join field.

a time, date, or duration
   This type is chosen if the attribute represents a time, date, or some length of time.

a money value
   This type is chosen if the attribute represents a price of something or money received for services.

Once the KBA selects one of these types as the primary type description of the attribute, additional information may be required to define more uniquely the use or scope of the attribute. In this example, once the KBA specifies that a money value is the best description of the attribute SALARY, the screen shown below in Display 6 appears requesting more specific information.

[Display 6: Money Attribute screen for the attribute SALARY, asking the KBA to choose the best description (a price of EMPLOYEE; a form of compensation received by EMPLOYEE; some other kind of money attribute of EMPLOYEE) and offering a selection to define the unit of measure for the attribute]

Display 6  SAS/ENGLISH Money Attribute Screen

Money is usually the purchase price of some object or service, or a form of compensation received by some object or person. If the KBA selects price as the specialized type of money attribute, the KB is automatically modified to add the words PRICE, COST, and VALUE as synonyms for the attribute salary. The verb COST is also added so that queries of the form "Which sales cost more than $100?" are supported. Finally, the adjectives EXPENSIVE, CHEAP, and COSTLY are incorporated so the command "List the cheapest sale" is now possible.

If the form of compensation selection is made instead, as would be the case in this example, then SALARY, COMPENSATION, PAY, and REMUNERATION are added to the Knowledge Base as vocabulary associated with salary. Two verbs, EARN and MAKE, are also included so queries like "How much did Bob earn?" are recognized.

There is a third option, Some other kind of money attribute, for which SAS/ENGLISH software makes no extra assumptions about this attribute.

As shown above, default adjectives like CHEAP are provided for a price monetary attribute, but none are given for a compensation monetary attribute. A KBA can define meaningful adjectives for this attribute by selecting Specify adjectives describing EMPLOYEE by SALARY on the Define Numeric Attribute screen. Making such a selection results in the screen shown in Display 7:

[Display 7: Define Adjectives screen for the numeric attribute SALARY, listing each adjective with its direction and value range; adjectives shown include EXPENSIVE, FAT, HOT, CHEAP, SLIM, POOR, and RICH]

Display 7  SAS/ENGLISH Define Adjectives Screen

This example Define Adjectives screen is already filled with information specified by the KBA. For example, the words EXPENSIVE, HOT, and FAT are equated with large salary values, and the words CHEAP and SLIM are associated with small data values. The KBA specified the root form of the adjectives because SAS/ENGLISH software automatically recognizes both the comparative and superlative forms when they are used in queries. Thus the KBA specified FAT rather than the comparative form, FATTER, or the superlative form, FATTEST.

The direction field (Plus, None, and Minus) is needed to indicate the direction associated with a comparative data value. In this example, the larger the data value, the higher (or fatter, or hotter) the salary. If the attribute were age and the data values were represented by a person's date of birth, then the oldest person would be the person with the smallest (or earliest) birth date (direction of minus).

This screen also provides a method of specifying values for numeric attributes that can be associated with adjectives with subjective meanings. Such values are needed because words like rich and poor or young and old can have very different meanings, depending on the audience or situation. In this example, RICH is associated with salaries over 70,000 dollars.

SAS/ENGLISH software makes no assumptions about the data and how they are used until the KBA provides this type of information. The original creator of the employee data set did a good job in providing descriptive names for the various variables in the data set. The variable names impart meaning and implied use for the data values to a human but not to a computer program (at least not yet). Although SAS/ENGLISH software understands the concepts associated with the numeric and character attributes outlined in this paper, the system does not make any inference as to the use of any object or attribute without explicit instructions from the Knowledge Base Administrator.

The second major classification of attributes is those attributes that are represented by character data values. The Define Character Attribute screen, illustrated in Display 8, is used to define such attributes to the Knowledge Base.
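The adjective definitions described for the Define Adjectives screen (a root form, a direction, and an optional value range) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the `Adjective` class, the `degree_forms` and `interpret` functions, and the tuple results are hypothetical names, and the suffix rule handles only short regular adjectives such as FAT or RICH. The only value taken from the paper is the 70,000 threshold for RICH.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

VOWELS = "aeiou"


@dataclass
class Adjective:
    root: str                      # root form entered by the KBA, e.g. "fat"
    direction: int                 # +1 = Plus, -1 = Minus, 0 = None
    low: Optional[float] = None    # lower bound for the plain form, if any
    high: Optional[float] = None   # upper bound for the plain form, if any


def degree_forms(root: str) -> dict:
    """Map the root, comparative, and superlative spellings to their degree.

    Doubles a final consonant after a single vowel (fat -> fatter).
    Longer adjectives such as "expensive" are not handled by this rule.
    """
    stem = root
    if (len(root) >= 3 and root[-1] not in VOWELS + "wxy"
            and root[-2] in VOWELS and root[-3] not in VOWELS):
        stem = root + root[-1]
    return {root: "plain", stem + "er": "comparative", stem + "est": "superlative"}


def interpret(word: str, adj: Adjective) -> Optional[Tuple]:
    """Describe what a query word asks for, or None if it is not this adjective."""
    kind = degree_forms(adj.root).get(word.lower())
    if kind == "plain":
        return ("range", adj.low, adj.high)
    if kind in ("comparative", "superlative") and adj.direction != 0:
        if kind == "comparative":
            return ("compare", ">" if adj.direction > 0 else "<")
        return ("extreme", "MAX" if adj.direction > 0 else "MIN")
    return None


rich = Adjective("rich", +1, low=70000)   # RICH: salaries over 70,000
fat = Adjective("fat", +1)                # larger salary = fatter
old = Adjective("old", -1)                # oldest = earliest birth date

print(interpret("rich", rich))
print(interpret("fattest", fat))
print(interpret("oldest", old))
```

Storing only the root form keeps the KBA's work small, and the direction field is what lets the same superlative logic pick the largest salary for FATTEST but the earliest birth date for OLDEST.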
[Display 8: Define Character Attribute screen for the variable SEX, showing the noun used to refer to the attribute, a "Choose the best description of SEX" list (a name associated with EMPLOYEE; an identifier of EMPLOYEE; a value that classifies EMPLOYEE; a location associated with EMPLOYEE; an identifier of another object related to EMPLOYEE; some other property of EMPLOYEE), and selections to specify nouns, verbs, and connections, add data values of SEX to the dictionary, define data value mappings, and define class options]

Display 8  SAS/ENGLISH Define Character Attribute Screen

In the middle of the screen there are several general types of character attributes from which the KBA must choose the best description of the attribute. As was the case with the numeric attributes, the selection of the description or type adds information to the Knowledge Base. The various character types are the following:

a name
   This type is chosen if the values of the attribute are names of the data set object being defined. These may be names of persons or things; they may be complete names or parts of names.

an identifier
   This type is chosen if the attribute uniquely identifies the object or data set observation. SAS/ENGLISH software assumes that such an attribute forms a primary singleton key.

a value that classifies
   This type is chosen if the attribute classifies or categorizes the object in some manner. The data values may classify as boolean attributes, or into many categories. Typical classifiers might be job titles or gender of employees (as in this example), or a status label for a project (active, inactive, completed, and so on).

a location
   This type is chosen if the data values for the attribute represent locations or addresses. This attribute may be part of a typical address or some other variety of geographical location.

an identifier of another object
   This type is chosen if the values of the attribute identify some data set object other than the current object.

some other property
   This type is chosen if none of the other types fit the attribute. SAS/ENGLISH software will make no assumptions about this type of attribute and will not add any default information to the Knowledge Base for it. This type should rarely, if ever, be chosen.

For the current character attribute SEX, most people would draw the logical conclusion that the data values are likely to be F and M and represent FEMALE and MALE, respectively. Although this inference is indeed correct in this example, it need not always be true. SAS/ENGLISH software uses the Classifier Attribute screen, shown in Display 9, to define the relationships between concept or vocabulary and data value.

[Display 9: Classifier Attribute screen for the classification attribute SEX, with Word and Value columns pairing words such as MALE, MASCULINE, FEMALE, and FEMININE with data values]

Display 9  SAS/ENGLISH Classifier Attribute Screen

On this screen, the vocabulary that can be recognized in a query is found in the Word column and the data values that are equated with each word are found in the Value column. More than one word can be associated with a particular data value, just as multiple data values can be associated with a particular word. This screen shows examples of the first type of association.

As an example of the second type of association, in which one word refers to a list of possible values, suppose a job attribute exists for an employee object. The classifier attribute JOB could have the data values MANAGER, CIVIL, and MECH. The words manager, civil engineer, and mechanical engineer could be associated with these values, respectively. Furthermore, you can define engineer as a synonym for someone who is either a civil or a mechanical engineer.
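The two kinds of word-to-value association described for classifier attributes can be sketched in Python as a small lookup structure. This is a hedged illustration, not how SAS/ENGLISH stores its Knowledge Base: the `Classifier` class, its method names, and the SQL-style condition it returns are assumptions, and the words MASCULINE and FEMININE are read from the sample screen rather than stated in the text.

```python
from collections import defaultdict
from typing import Optional


class Classifier:
    """Word-to-value mappings for one classifier attribute (hypothetical)."""

    def __init__(self, column: str):
        self.column = column
        self._values = defaultdict(set)   # word -> set of data values

    def define(self, word: str, *values: str) -> None:
        """Associate a query word with one or more data values."""
        self._values[word.lower()].update(values)

    def condition(self, word: str) -> Optional[str]:
        """Return a condition for the word, or None if the word is unknown."""
        values = sorted(self._values.get(word.lower(), ()))
        if not values:
            return None
        quoted = ", ".join(f"'{v}'" for v in values)
        return f"{self.column} IN ({quoted})"


# First type of association: several words share one data value.
sex = Classifier("SEX")
sex.define("male", "M")
sex.define("masculine", "M")
sex.define("female", "F")
sex.define("feminine", "F")

# Second type of association: one word covers several data values.
job = Classifier("JOB")
job.define("manager", "MANAGER")
job.define("civil engineer", "CIVIL")
job.define("mechanical engineer", "MECH")
job.define("engineer", "CIVIL", "MECH")

print(sex.condition("Masculine"))
print(job.condition("engineer"))
print(sex.condition("woman"))
```

The last call returns None because "woman" was never defined, which mirrors the paper's point: the mapping from everyday words to F and M exists only after someone teaches it to the system.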
In the screen example, the obvious choice for the best description of the attribute SEX is that the data values for this variable classify employees into one of two groups, based on gender. An important concept to understand is that the SAS/ENGLISH product does not (and cannot) make any assumptions about the data.

CONCLUSION

There are other aspects of Knowledge Base definition that are not covered here. These include how to specify which attributes are components of composite locations or names, how to define new attributes that are mathematically computed from the values of other attributes, how to collect attributes into a group so they can be accessed collectively as a single attribute, how to specify units of measure for numeric quantities, how to specify more specialized object options, and how to define additional verbs for use in queries.

SAS/ENGLISH software, like any other AI application of this type, is only as powerful and as useful to users as the Knowledge Base that defines the application.

DISCLAIMER

SAS/ENGLISH software is an experimental product (the first distribution of the product requires Release 6.07 of the SAS System), and the final implementation appearance of the product may differ from what is presented in this paper.

LIST OF ARTIFICIAL INTELLIGENCE SOURCES

Barr, A., Cohen, P., and Feigenbaum, E., eds. (1989), The Handbook of Artificial Intelligence, 4 vols., Los Altos, CA: William Kaufmann.

Charniak, E. and McDermott, D. (1985), Introduction to Artificial Intelligence, Reading, MA: Addison-Wesley.

Charniak, E., Riesbeck, C., and McDermott, D. (1987), Artificial Intelligence Programming, Hillsdale, NJ: Lawrence Erlbaum Associates.

Cullingford, R.E. (1986), Natural Language Processing, Totowa, NJ: Rowman and Littlefield.

Grimson, E. and Patil, R., eds. (1987), AI in the 1980s and Beyond, Cambridge, MA: The MIT Press.

Hayes-Roth, F., Waterman, D., and Lenat, D. (1983), Building Expert Systems, Reading, MA: Addison-Wesley.

Klahr, P. and Waterman, D.A., eds. (1986), Expert Systems, Reading, MA: Addison-Wesley.

Lenat, D. B. and Guha, R. V. (1990), Building Large Knowledge-Based Systems, Reading, MA: Addison-Wesley.

Mockler, R. J. and Dologite, D. G. (1992), An Introduction to Expert Systems: Knowledge-Based Systems, New York: Macmillan.

Parsaye, K. and Chignell, M. (1988), Expert Systems for Experts, New York: John Wiley and Sons.

Rich, E. (1983), Artificial Intelligence, New York: McGraw-Hill Book Company.

Schank, R.C. (1987), "What is AI, Anyway?" AI Magazine, 8(4), 59-65.

Schank, R.C. (1991), "Where's the AI?" AI Magazine, 12(4), 38-49.

Wilensky, R. (1986), Common LISPcraft, New York: W. W. Norton and Company.

Winograd, T. (1983), Language as a Cognitive Process, Reading, MA: Addison-Wesley.

Winston, P.H. and Berthold, K. (1984), Artificial Intelligence, Second Edition, Reading, MA: Addison-Wesley.

Winston, P.H. and Horn, B. (1984), LISP, Reading, MA: Addison-Wesley.

SAS, SAS/ACCESS, SAS/AF, SAS/ASSIST, and SAS/ENGLISH are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are registered trademarks or trademarks of their respective companies.