Natural Language Interaction with Robots
Alden Walker
May 7, 2007
Abstract
Natural language communication with robots has obvious uses in almost all areas
of life. Computer-based natural language interaction is an active area of research in
Computational Linguistics and AI. While there have been several NL systems built for
specific computer applications, NL interaction with robots remains largely unexplored.
Our research focuses on implementing a natural language interpreter for commands
and queries given to a small mobile robot. Our goal is to implement a complete system
for natural language understanding in this domain; the system consists of two main
parts: a parser for the subset of English our robot is to understand and a
semantic analyzer used to extract meaning from the natural language. By using such a
system we will be able to demonstrate that a mobile robot is capable of understanding
NL commands and queries and responding to them appropriately.
Contents

1 Introduction
2 Overview of the robot and its language capabilities
  2.1 The Robot
    2.1.1 Sensors
    2.1.2 Output
  2.2 Natural Restrictions on Language
  2.3 Myro (Python Module)
3 Natural Language Processing Unit
  3.1 Overview
  3.2 Grammar and Lexicon (and Parsing)
  3.3 General Language Model and Robot Control Architecture
    3.3.1 Language Model (Imperatives)
    3.3.2 Language Model (Queries and Declaratives)
    3.3.3 Model of Thought (Subsumptional Brain Architecture)
    3.3.4 Robot Control Interface
  3.4 Semantic Analysis
    3.4.1 Semantic Analysis Overview
    3.4.2 Imperatives
    3.4.3 Contextual Commands
    3.4.4 (Embedded) Declaratives
    3.4.5 Queries
4 Examples
  4.1 Example: Simple Commands
  4.2 Example: Adjectives, Adverbs
  4.3 Example: Simple Prepositions
  4.4 Example: Contextual Commands
  4.5 Example: Embedded Declaratives
  4.6 Example: Queries
  4.7 Example: Complete Interactions
5 Progress and Future Work
6 Related Work and Background
A Grammar
B Acceptable Sentences
  B.1 Imperatives
    B.1.1 Moving
    B.1.2 Turning
    B.1.3 Lights
    B.1.4 Sound
  B.2 Queries
  B.3 Contextual Commands
1 Introduction
When giving a robot commands, a user typically must give short, blunt commands or remember a long list of precise phrases that must be input exactly as expected. Usually, robots
either do not take commands in textual form (they have a joystick, for instance), or take a set
of commands which are pre-programmed and must be said exactly as pre-programmed. For
instance, a robot might know what “Move quickly” means but have no idea what “Quickly
move” means. It might know “Turn left for 45 degrees” and “Turn left slowly” but not “Turn
left slowly for 45 degrees.” These limitations arise because making a robot understand commands in a looser, more natural way is much more complicated than simply mapping a list
of commands to a list of movements and activity scripts. Finding an appropriate response
to natural language is more in the domain of AI and natural language processing, and it is
understandable that people who build robots tend to err on the side of dependability and
require precise commands.
Coordinating both the basic functioning of the robot and natural language processing
and understanding at the same time poses a considerable challenge. Both are very sensitive
and prone to error. Our goal is to use a very simple, pre-made robot and build a natural
language processor on top of it which can handle the small subset of English which makes
sense for commands and queries. By limiting the problem in such a way, we hope not only
to end up with a functioning example of a robot able to follow commands given in natural
language but also to develop methods which can be used when creating similar systems for
more complicated robots which allow for more complicated interactions with the real world.
In order to satisfactorily solve this problem, we must attempt to bridge gaps between
robotics and natural language processing. Natural language processing is typically done on
a computer, which processes the input sentence, extracts meaning and represents it in a
formalism, and responds appropriately in natural language. With a robot, we must produce
a real action. Research in natural language processing can show us ways to understand
natural language on a computer. Research in robotics can show us some of the best ways to
help robots function in the real world. Transforming loose, natural language commands into
the precise low-level commands for the robot involves new kinds of semantic interpretation,
and that is the goal of this project.
By using a combination of techniques from robotics and natural language processing, we
have developed an architecture which looks promising.
Here is an example of an interaction with our robot using the current version of our
natural language processor:
Hello, I’m Scribby!
Type a command: do you see a wall
No
Type a command:
beep whenever you see a wall
Type a command:
turn right whenever you see a wall to your left
Type a command:
turn left whenever you see a wall to your right
Type a command:
move for 60 seconds
The result of this sequence of commands is that the robot will move around for a minute
avoiding walls and beeping whenever it sees a wall. As is clearly demonstrated above, the
robot can respond appropriately to many natural language commands.
2 Overview of the robot and its language capabilities
The natural language robot interface is composed of many pieces. However, there are natural
chunks into which the program pieces divide. In our system, all of the language processing is
done on a host computer. The host computer communicates with the robot over a wireless
Bluetooth connection. All the robot control software also runs on the host computer. Communication takes place using a client-server paradigm. Thus, the robot runs only a small
server which listens for commands. The basic design schematic is shown in Figure 1.
In the following sections, the various pieces which make up the lower-level system
will be explored in detail. This is necessary to understand the decisions we made and how
we dealt with the natural language issues which arose during research.
2.1 The Robot
We use the Parallax Scribbler robot, which is actually marketed more or less as a toy: it can
carry a Sharpie and draw lines on a piece of paper and avoid obstacles straight out of the box
(see Figure 2). Connecting it to a computer through a serial port allows the microcontroller
to be reprogrammed as a robot typically would be.
All of our work is done in the context of the Scribbler, but our methods have broader applications.
In order to solve the problem of natural language interaction, the architecture and design
have been tailored for the Scribbler. Thus, it is necessary to understand the capabilities of
the robot to understand our design choices.
[Figure 1: The hardware interface. Low-level hardware/software interface schematic: the NLPU and Myro run as software on the host and exchange commands, queries, and responses with the Scribbler robot (IR sensors, light sensors, line sensors below, speaker, 3 LEDs, wheels) over a serial Bluetooth connection.]
Figure 2: The Scribbler robot
2.1.1 Sensors
The robot has three sets of sensors: Proximity (IR), light, and line:
IR: The IR sensors consist of two IR emitters and one IR detector. The emitters shine IR
light forward, one slightly left and the other slightly right. Thus, though the robot can only
see forward, it can detect if an object is a little to the left or right. In this way, it can turn
in the correct direction to avoid obstacles and perform other similar actions. The command
do you see a wall to your left would be translated and understood as a procedure for
polling the IR sensors.
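As a rough illustration, such a polling procedure might look like the sketch below. The Myro call names (init, getIR), the port string, and the return convention are assumptions for illustration, not the project's actual code.

    # Hypothetical grounding of "do you see a wall to your left" in an IR poll.
    from myro import *

    init("/dev/rfcomm0")            # open the Bluetooth serial link (port name assumed)

    def wall_to_left():
        """True if the left-pointing IR emitter reports an obstacle."""
        left, right = getIR()       # assumed to return a (left, right) reading pair
        return left == 0            # 0 is taken here to mean "obstacle detected"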
Light: The robot has three forward-looking light sensors pointing left, straight, and right.
Lower reported values from the light sensor correspond to brighter light. By turning toward
the light sensor reporting the lowest value, the robot can follow a bright light. Though
not always completely reliable, the light sensors do typically report different values if the
incoming light to the sensors is significantly different, so the sensor looking at the brightest
light will usually report a lower value than the others.
Line: The line sensor is actually a pair of IR emitter/detector pairs underneath the robot
looking straight down. They are placed side by side, so the robot can see if it is on the edge
of a dark area (the right sensor will report white and the other black). By moving slowly
and making careful use of these sensors, the robot can limit its movement to a box of a color
significantly different from the surrounding area or follow a dark line on a white background.
2.1.2 Output
The robot has three ways of producing output which interacts with the world: it can move
around, it can play sounds, and it can turn a bank of three LEDs on and off:
Movement: The robot has two wheels which can be controlled separately. Each can spin
forward or backward at a given speed. Because the robot is round and has two wheels controlled so freely, it is very mobile and can spin in place or perform just about any sort of
movement. The command move to the wall would be realized by a procedure which polls the IR sensors and then directs the wheels accordingly.
Sound generation: The robot has a speaker which can play a wide range of frequencies
or a pair of overlapped frequencies.
Lights: There are three LEDs which can be turned on and off. They are small, but useful
for indicating status.
2.2 Natural Restrictions on Language
Because of the simplicity of the robot, the set of natural language sentences/commands which
could appear in an interaction between a human and the robot is restricted. The small set
of ways in which the robot can affect the world (output: movement, sound, and lights) keeps
small the set of verbs which make sense in commands to the robot, and likewise the small
set of sensors makes small the set of possible queries. This restriction is very good for our
project: in order to make natural language interaction between a robot and a human possible,
the natural language domain must be restricted, but good interaction will never occur unless
the restriction is natural. By using such a simple robot we achieve this.
2.3 Myro (Python Module)
Myro is a high-level interface module for Python created by IPRE. Myro handles the serial
connection and allows a program written in Python to communicate with the robot easily without bothering
with the low-level details. We used this module as our starting base and built our system
on top of it. Thus, our NLPU communicates with the robot exclusively through the Myro
module.
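To give a sense of the level of abstraction Myro provides, here is a short sketch of driving the robot directly; the call names follow common Myro conventions (init, forward, turnLeft, beep, stop), but the exact signatures and the port name are assumptions.

    # Rough sketch of issuing low-level commands through Myro.
    from myro import *

    init("/dev/rfcomm0")   # Myro hides the serial/Bluetooth details behind one call

    forward(0.5, 2)        # half speed for 2 seconds -- roughly "move"
    turnLeft(0.5, 1)       # roughly "turn left"
    beep(1, 440)           # a one-second 440 Hz tone -- roughly "beep"
    stop()

In our system, calls like these are produced as the output of the brain control interface rather than issued directly by the language code.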
3 Natural Language Processing Unit

3.1 Overview
The system for natural language understanding can be divided into three parts: the grammar,
which performs syntactic analysis using a context-free grammar, the semantic analyzer, and
the brain engine, which organizes currently running commands into our model of thought.
The order of this section may seem strange: we will first discuss how we parse the input
provided by a user. Then we will talk about how we represent meaning in our language
domain. Only after that will we be ready to explore semantic analysis. Why is this? The
goal of our project boils down to doing semantic analysis on the parse trees provided by our
parser. It doesn’t make sense to talk about semantic analysis until we know what output
we should produce. In order to answer this question, we will take a detour into our method
for representing the meaning of language and specifically imperative commands, how these
representations are realized in practice, and how the robot control architecture interacts with
the robot. At that point, we will be ready to bridge the gap with semantic analysis.

[Figure 3: The NLPU. Input such as "move to the wall", "move again", or "turn right for 2 seconds" enters the Language Interface, where the NLTK-lite grammar/parser produces a parse tree; the semantic analyzer turns the tree into a layer for the Brain Control Interface, which issues raw Myro commands over the Myro/hardware connection to the Scribbler and returns responses to queries.]
3.2 Grammar and Lexicon (and Parsing)
The small mobile robot that we elected to work with made the design of a grammar easier.
Only a small subset of English is applicable to the situation of giving commands and queries
to such a robot, so the language is naturally restricted. Within this subset, however, we
tried to be as complete as possible in the design of the grammar. The grammar parses
standard declarative sentences and commands (verb phrases) and allows for adverbial and
prepositional phrases.
The parsing is done by an NLTK-lite parser class. Though it is not very fast, our parser uses a standard recursive-descent algorithm; the danger with the faster shift-reduce parsing algorithm is the possibility of not finding valid parses, which would be unacceptable.
Because the parsing is done in such a compartmented fashion, it is useful to let the
grammar contain some semantic information and tailor it to the specific situation. We
evaluate the meaning of a declarative sentence in the usual way, i.e. as a logical statement, but
commands and contextual commands have meaning consisting of a series of Myro commands.
Thus, sentences get evaluated in completely different ways depending on whether they are
declarative or imperative. The grammar can assist in the discrimination between types of
sentences: it immediately distinguishes a contextual command by finding the presence of a
contextual adverb such as “again” or “more,” and the grammar considers declarative verb
phrases and imperative verb phrases to be completely different. This allows the semantic
analysis to be carried out more effectively. An example of this is shown in Figure 4.
Note how the prepositional phrase is marked as a VPP, or verb prepositional phrase, and how the whole command is noted at the top as an IS (imperative sentence). In contrast, the contextual command
move again is marked as a CC (contextual command). Thus, some of the semantic analysis
comes directly out of the syntax of the command. For the complete grammar, refer to
appendix A.
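To make the parsing step concrete, here is a self-contained sketch using a tiny assumed fragment of the Appendix A grammar. It is written against the modern nltk package rather than the 2007 nltk_lite layout the project actually used, so treat it as illustrative only.

    # Toy grammar fragment parsed with a recursive-descent parser, as in NLTK-lite.
    import nltk

    grammar = nltk.CFG.fromstring("""
    C    -> IS
    IS   -> VP
    VP   -> V AdvP VPP
    VPP  -> P DP
    DP   -> NP
    NP   -> NumP N
    NumP -> Num
    AdvP -> Adv AdvP | Adv
    V    -> 'turn'
    Adv  -> 'quickly' | 'left'
    P    -> 'for'
    Num  -> '5'
    N    -> 'seconds'
    """)

    parser = nltk.RecursiveDescentParser(grammar)
    for tree in parser.parse("turn quickly left for 5 seconds".split()):
        tree.pretty_print()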
3.3 General Language Model and Robot Control Architecture
We have now covered the way in which we do syntactic analysis. Now we turn to the way in
which we represent meaning and how we realize these representations in practice. We will
show how we create a fundamental unit of meaning, called a block, and how we build the
meaning of a command, a layer, out of these pieces. Once we have shown how to create
layers, we will describe our model of thought and how the layers are combined in a dynamic
data structure called a brain to create output.
[Figure 4: Parse trees. Left: the imperative “turn quickly left for 5 seconds” parsed as C -> IS, with a VP containing the verb, two AdvPs, and the VPP “for 5 seconds.” Right: the contextual command “move again” parsed as C -> CC, with the verb and the contextual adverb (CAV) “again.”]
3.3.1 Language Model (Imperatives)
In order to understand how the grammar was designed, and especially how the semantic
analysis works, it is necessary to understand the reasoning behind our representation of
meaning. The best way to see why we use the representation we do is to describe it and
show its flexibility. The reasoning described here applies only to imperatives.
We use a verb-centric system of meaning. We consider a verb to have a certain fundamental meaning which is realized as a function. This verb function is the meaning of the
verb, and it, when called, generates a compact version of a Myro command performing the
appropriate action. Other pieces of an imperative, such as adverbs or verbal prepositional
phrases, can modify the function. The function can take arguments as necessary: a function
representing a transitive verb, for instance, needs an argument representing the object which
is acted upon. Our final meaning for an imperative phrase is the original verb function, modified by the other pieces of the clause. If necessary, the verb function could have access to
the general state of the robot, such as whether there is a wall in front of it.
An important question is how the various phrases modify the meaning (function) of the
verb. We chose to make the meaning, as far as possible, constrained. That is, “move”
has a fundamental meaning, and the prepositional phrase “for 2 seconds” simply serves to
inhibit the behavior of the verb function so that after 2 seconds it ceases to have meaning.
An imperative like “beep whenever you see a wall” contains the prepositional phrase “whenever...,” which inhibits the verb function until the robot “sees” a wall, at which point the
verb function is left free to act (and produce a beep).
[Figure 5: The block diagram for turn right for 2 seconds. Senses, time, and commands flow into the “for 2 seconds” block, which points (via nextBlock) to the “turn right” block; the released commands flow out.]
This representation of meaning is useful and serves to make the meaning of multiple
prepositional phrases clearer: “beep for 2 seconds whenever you see a wall” has two stacked
prepositional phrases: when a wall is in sight, the outside blocking preposition “whenever”
releases the command under it, which happens to be “beep for 2 seconds.” The second
prepositional phrase inhibits the verb function for “beep” so that the activity occurs for the
duration of 2 seconds.
We call the fundamental unit of meaning corresponding to a verb or prepositional phrase
a block. As described above, blocks can be linked together in such a way that one block
inhibits the behavior of another. In our example, the block representation for the command
turn right for 2 seconds is shown in Figure 5.
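A minimal sketch of this block structure is shown below; the class and attribute names (VerbBlock, ForSecondsBlock, nextBlock, act) are illustrative only and stand in for the project's actual data structures.

    # Illustrative block chain for "turn right for 2 seconds".
    import time

    class VerbBlock:
        """Fundamental verb meaning, with any bound adverbs or nouns baked in."""
        def __init__(self, command):
            self.command = command                 # a compact Myro-style command

        def act(self, senses):
            return self.command                    # always wants to act

    class ForSecondsBlock:
        """Prepositional block: lets its nextBlock act only for `duration` seconds."""
        def __init__(self, duration, nextBlock):
            self.deadline = time.time() + duration
            self.nextBlock = nextBlock

        def act(self, senses):
            if time.time() <= self.deadline:
                return self.nextBlock.act(senses)  # release the inner block
            return None                            # inhibit: the meaning has expired

    # The layer for "turn right for 2 seconds" is just this two-block chain.
    layer = ForSecondsBlock(2, VerbBlock(("turnRight", 0.5)))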
This inhibitory theory works well for prepositional phrases, but adverbs are a more
complicated situation: prepositions tend to have meaning more disjoint from the verbs they
modify. However, the meaning of an adverb is tied up in the meaning of the verb it “modifies.”
It is almost as though adverbs trigger a modification rather than create one. To see the
distinction, consider the two sentences “turn around” and “move around,” or “move back”
and “throw it back.” The way in which the adverbs modify the meaning is strikingly different:
turning around is probably the rather rigidly defined act of turning about 180 degrees, while
moving around is vaguely defined as walking in a room in various directions, probably in
a relaxed manner. Moving back is moving backwards, while throwing something back is
probably actually throwing it forwards to the person who threw it last. This last example
is weak because the verbs “move” and “throw” are intransitive and transitive, but the point
is clear.
In light of the complications with adverbs, we consider the meaning of adverbs to be
inherent in the verb. Certainly, there are “default” adverbs that come pre-programmed, even into newly made verbs, so the real-world situation is more complicated. But in our robot world, adverbs trigger already-defined modifications in the verb functions. Thus, adverbs
get bound up with the verb in the verb block: the block representation for the meaning
of turn right quickly for 2 seconds would be identical to the block diagram above,
except the inhibited verb block would be the block for turn right quickly. Note that we
already implicitly did this before with right without calling attention to it.
So far, we have verbs, adverbs, and prepositions. The main area left to cover is nouns
and noun phrases. For imperative sentences, nouns and noun phrases tend to have an actual
tangible meaning. Obviously, this is not the case in all sentences, but for imperatives a noun
carries with it contextual information which allows people to know how to pick it up, turn it
on, move it, etc. Nouns might also represent an abstract quantity (a second, for instance),
but even these abstract nouns must have some sort of realization, like the duration of an
action. Because of the simplicity of the world and range of activities for the robot, we chose
to understand a noun as an abstract object. There can be many things associated with a
noun object, such as its name or names, attributes that it has, verbs (functions) which can
be used to determine its state, command fields that will alter it, etc. When we attempt
to find the meaning of a noun phrase, we are searching for a single noun object or a set of
objects. Note that an object might be abstract, such as a second. This “second” won’t have
many attributes, and certainly no function to check its state or modify it, but it will have the
abstract quantity of duration. The abstract noun “three” has basically only the cardinality
expression of 3. These abstract quantities can get added together to create “three seconds.”
We actually consider adjectives and nouns to be more or less on equal footing. When
we ask for the left LED, we mean to perform an intersection of the set of objects which have the attribute “LED” and the set of objects which are “left.” The intersection will leave
us with the single object of the left LED. When we ask for “lights,” we will get everything
which is a light. That is, we’ll get a list of objects. The same thing applies to prepositions
inside noun phrases, which are basically just fancied up adjectives.
For the most part, nouns (and therefore adjectives) get bound up in fundamental meaning.
To see why this is justified, consider the examples “play an A” and “play a B.” The actual
physical, imperative meanings of the commands are very different, even though the difference
is not in the verb. Here we can see that we need to take nouns directly into the block of a
verb or prepositional phrase.
It is clear now that we end up sticking many things inside these so-called “fundamental”
blocks, but this comes only out of necessity. In support of this representation, it should be
noted that though prepositional phrases are only one syntactic/semantic category that we
must consider, they are very prevalent in imperatives. Also, the nouns and adverbs that
get stuck inside blocks are simple to handle, and once their meaning is understood we can
be done constructing our blocks. Compare this to the prepositional phrases, which naturally seem to regulate other blocks and whose meaning can change with time or the situation. With this in
mind, it is clear that such a block structure is useful to use even if it only makes two types
of phrases “fundamental.”
It should be noted that some commands, such as move, have an unspecified ending condition, so they will technically continue forever. In our language model, there is an understood
temporal restriction which kicks in if the command does not provide some condition under
which it stops. This makes sense in general: if someone is told to “turn,” they will turn
for a little while, maybe 180 degrees, and then stop, and likewise for such imperatives as
“move” and “beep.” In addition to being probably more realistic, this addition to the model
simplifies the picture, and it helps keep the robot under control!
3.3.2 Language Model (Queries and Declaratives)
Queries and declarative sentences have a similar structure. We think of both as logical
propositions: declarative sentences (our robot can understand only declaratives which are
embedded with a subordinating preposition) are simple propositions; queries are just propositions modified to make them questions. In both cases, the meaning of the sentence is its
logical truth value. Here we are simply taking the typical linguistic stance about declarative
sentences. However, queries need an extra note: a query is asking for a response from the
robot, and we can think of that as an action, so the meaning of a query (a response to
it) is a block which tells the brain to print a message to the computer screen—it’s still a
block. We do not have to worry about this with declaratives because they are embedded in
prepositional phrases and thus get integrated into blocks automatically.
3.3.3 Model of Thought (Subsumptional Brain Architecture)
We have seen how we can represent the meaning of a command by breaking it down into
blocks inhibiting each other. Once we have this representation, what does it mean for
the robot to carry out the meaning? This is an important linguistic and cognitive question,
because it involves not only how the robot will carry out a single command (it will just follow
orders) but also how the robot will deal with many simultaneous commands. We know now
what the meanings of move and turn right when you see a wall are. However, what is
the meaning of both of them together? This section attempts to answer that question.
We need some “container” representation to make the discussion of collections of commands easier: we define a layer to be the collection of interconnected blocks which represent
the meaning of a command. We consider a layer as the representation of a single, self-contained command. For example, consider the command: turn right for two seconds.
Verbs, together with adverbs, form single units of meaning, that is, blocks. Thus, we will
create one block for “turn right.” We will create another block for “for two seconds,” and we
will give the verb block to the prepositional block as the block to inhibit. These two blocks
are wrapped up in a single layer, and this layer is the meaning of the command.
The representation of the entire robot thought process is the brain. It consists of a
collection of active layers, ordered by precedence. In our case, we consider older commands
to have higher precedence. The brain has the same input and output as a single layer, namely
the senses and state of the robot and a collection of Myro commands respectively. However,
the brain synthesizes its whole collection of layers to generate just a few commands. Note
that the brain and each layer can generate multiple commands because there are three output
paths (lights, sound, and movement), and each layer can use any of them. The method for
this is as follows: the brain goes through each layer and asks it what it wants to do given
the current situation. It then asks the next one, and so on. The highest-precedence response
for each of the three output paths is the one taken as the generated command.
In addition to this, we allow the brain to let the currently working layer have access to
what the previous layer wanted to do. In this way, layers can be “nice” and choose to allow
other, less powerful layers to have a say. For instance, if one layer says to turn right and
another to move forward, the higher precedence one would be nice to simply combine both
desires into forward movement with a slight right slant. In some other cases it is best to have
higher layers simply rule out their lower counterparts. It depends on the context. However,
this layering effect is very powerful. As an example of what it can achieve, consider the series
of commands (layers), ordered in decreasing precedence.
1. turn right when you see a wall to your left
2. turn left when you see a wall only to your right
3. move forward
When no wall is in sight, the robot will just move forward. When a wall is detected, the more
powerful layers will kick in and force the robot to turn in a direction appropriate to avoid
the wall. The result of these commands is that the robot will move around, avoiding the
walls. This is a rather complicated behavior, but stacking the various pieces of the activity
allows it to be broken into simple chunks.
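A sketch of this arbitration is given below. It assumes each layer exposes an act(senses) method returning a dict that maps output paths to desired commands; the names and the exact output paths are illustrative, not the project's code.

    # Illustrative precedence arbitration: older layers win each output path.
    class Brain:
        def __init__(self):
            self.layers = []                        # oldest first = highest precedence

        def addLayer(self, layer):
            self.layers.append(layer)

        def step(self, senses):
            chosen = {"wheels": None, "speaker": None, "leds": None}
            for layer in self.layers:               # highest precedence asked first
                wants = layer.act(senses) or {}     # e.g. {"wheels": (0.5, 0.3)}
                for path, command in wants.items():
                    if chosen[path] is None and command is not None:
                        chosen[path] = command      # keep the strongest answer per path
            return chosen                           # the few Myro commands to issue

In the real system a layer may also be shown what the previously asked layer wanted, so it can choose to blend desires rather than overrule them outright.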
This layer stacking effect is our attempt to incorporate the successful “subsumptional
architecture” described in [1]. Because we incorporate new layers on the fly and do incorporate representations into our design, the motivation behind implementing the subsumption
architecture is a little different, but it works well.
3.3.4 Robot Control Interface
Note that the brain never actually sends commands to the robot. The brain is our representation of an entire collection of layers operating in concert. When we want to actually use
the commands that the brain produces, we can use our brain control interface to do this.
The robot command interface is simple. The proposed interface has two threads, one of
which receives input commands, parses, and analyzes them (the Language Interface), and
the other which manages the running brain and passes commands between the brain and
the robot (the Brain Control Interface). Because of technical issues, this interface is not
finished. The proposed interface differs from the current one most importantly in the fact
that the current interface waits for a command to finish before asking for another one. In
the proposed interface, things will happen in parallel so that effective layer stacking can take
place. Currently, the interface performs a few rather mundane tasks. It runs an endless loop
of:
1. Wait for a command
2. Parse and analyze it — if the command is a query, ask the brain for an appropriate
response
3. Add the command to list of previous commands (context)
4. Add the resulting layer to the brain
5. Run the brain until it stops doing anything
This interface is weak because it does not allow commands (e.g. “Stop!”) to be entered while the robot is doing something. Commands with unspecified ending conditions do get truncated, as discussed above, but the interface is still not ideal. However, it is effective.
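A compact sketch of this loop is below; parse, analyze, read_command, and read_senses are stand-in names passed in as parameters, not the project's actual functions, and query handling is elided.

    # Hypothetical rendering of the current single-threaded control loop.
    def run_interface(brain, parse, analyze, read_command, read_senses):
        history = []                                 # context for "again", "more", ...
        while True:
            text = read_command()                    # 1. wait for a command
            tree = parse(text)                       # 2. parse it ...
            layer = analyze(tree)                    #    ... and analyze it
            history.append((text, tree))             # 3. remember it as context
            brain.addLayer(layer)                    # 4. add the resulting layer
            while any(brain.step(read_senses()).values()):   # 5. run the brain until
                pass                                          #    nothing wants to act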
3.4 Semantic Analysis

3.4.1 Semantic Analysis Overview
Now we know what the semantic analyzer gets as input: a parse tree containing some
semantic information. We also know what it must produce: a single layer. The method we
developed for doing this is as follows.
Once an NLTK-lite tree structure has been created out of an imperative clause using
the NLTK-lite parser, our semantic analyzer is called. We tried to keep the analysis as
compartmented as possible, and in this vein, it proceeds in a recursive fashion: each node
in the tree has an associated function (getVP, for example). Our main analysis function,
getMeaning, looks at the top node of the tree and calls the appropriate function. It passes
as an argument a list of subtrees. The functions which process a given type of node rely
on getMeaning to collect the meanings of the subtrees. The node function is primarily
responsible for piecing things together.
We make heavy use of lambda functions during the analysis. Because verbs and prepositions must eventually make their way into the final layer as functions, node functions often
build special functions by piecing together the results of getting the meanings of subtrees.
We will look at each type of sentence/command in turn.
3.4.2 Imperatives
The inner workings of each function are too complicated to go into in full.
However, let’s take a look at a representative example. Consider the imperative turn quickly
left for 5 seconds. The parse tree for this sentence is shown in Figure 4.
We will go through the process of semantic analysis as our algorithm does; however, for
a broad overview, note that the call tree for the function getMeaning called on the above
command will be almost identical to the parse tree itself: every time a function needs to get
the meaning (in whatever form that is) of a subtree, it calls getMeaning, which in turn calls
the appropriate get command. Approximate pseudocode for any of the analysis functions
would be:
getMeaning(X)
1. Break the X into its constituent pieces (sub-projections) A, B, ...
2. Get the meanings of these pieces by calling getMeaning(A), getMeaning(B), ...
3. Assemble the meanings and return the meaning of X
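A toy rendering of this recursion is shown below, using an nltk Tree and plain strings as the "meanings" just to show the dispatch-and-recurse shape; the real node functions build blocks and layers instead, and the handler table here is deliberately tiny.

    # Toy getMeaning: dispatch on the node label, recurse over subtrees, reassemble.
    from nltk import Tree

    def getMeaning(node):
        if isinstance(node, str):                  # a leaf: the word itself
            return node
        handler = HANDLERS.get(node.label(), getPhrase)
        return handler(list(node))                 # hand over the list of subtrees

    def getPhrase(subtrees):                       # generic node: just join the pieces
        return " ".join(getMeaning(t) for t in subtrees)

    def getVP(subtrees):                           # verb phrase: wrap its pieces in a "block"
        return "<block: %s>" % " ".join(getMeaning(t) for t in subtrees)

    HANDLERS = {"VP": getVP}

    tree = Tree.fromstring(
        "(VP (V turn) (AdvP (Adv quickly) (AdvP (Adv left)))"
        " (VPP (P for) (DP (NP (NumP (Num 5)) (N seconds)))))")
    print(getMeaning(tree))                        # <block: turn quickly left for 5 seconds>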
The first interesting work is done by the function getVP. It will first create a block of
meaning corresponding to turn quickly left. To do this, it will collect the adverbs and
pass them to getV, which will take the simple function for turn and add the adverbs to
create the block. The available verb functions come with pre-set adverbial modifications,
which the adverbs just trigger.
Next, getVP will call getVPP to get the prepositional block. Here, we will find the DP
seconds and add to it the numeral quantity of 5. This will create an object with a duration
of 5 seconds. This is collected with the preposition (using getP) to create the inhibitory
block for for 5 seconds. If we had been required to parse a command like move to the
wall, we would have to have gotten the meaning of the noun wall. In order to make this
possible, we keep a database of all the known objects with which the robot can interact. The
phrase right LED, for instance, will refer to the specific object in our database representing
the right LED on the robot. We find this meaning by giving each member of the database a
series of keywords. If we need to find the right LED, we first find all LEDs and then search in
this sublist for all things which are right. In the case of this command, we find the object
which represents a second and add the numeral quantity as described above.
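The lookup itself can be pictured as a set intersection over keyword-tagged objects. The sketch below assumes a simple list-of-dicts database; the field names are illustrative.

    # Illustrative keyword-intersection lookup for noun phrases like "the right LED".
    OBJECTS = [
        {"name": "left LED",   "keywords": {"LED", "light", "left"}},
        {"name": "center LED", "keywords": {"LED", "light", "center"}},
        {"name": "right LED",  "keywords": {"LED", "light", "right"}},
        {"name": "wall",       "keywords": {"wall"}},
    ]

    def lookup(*words):
        """Return every known object whose keyword set contains all the given words."""
        return [obj for obj in OBJECTS if set(words) <= obj["keywords"]]

    print(lookup("LED", "right"))   # just the right LED
    print(lookup("light"))          # all three LEDs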
getVP now takes the verb block and attaches it as the “next block” onto the prepositional
block. Now, the verb block for turn quickly left is monitored and controlled by the
prepositional block for for 5 seconds. This chain of blocks is packaged in a layer and is
finally returned as the result of the first call to getMeaning.
3.4.3 Contextual Commands
Contextual commands are handled differently: the interface saves the full parse tree of every
entered command along with the name of the verb and important characteristics of that
command. Thus, after a while, we’ll have a long history of parses and key words. Basically,
contextual commands search through this list to find the previous command being referred
to.
This may seem like a rather limited context from which to draw. However, it is sufficient
for most interactions with the robot. Consider the string of commands:
1. turn right
2. move to the wall
3. turn left for 3 seconds
4. move again
It is debatable what is actually meant by the last command. Is the prepositional phrase
to the wall included in doing the action again? We say yes: the meaning of a contextual
repeat command is to repeat exactly some command which has already been done. The
getMeaning function will recognize this last command as contextual from its parse, and it
will call the getCC function, which will search through the list of keywords from previous
commands until it finds the keyword “move.” It will then take the entire stored parse tree
and just call getMeaning on it again and return the result.
Why wouldn’t we just store the layer resulting from the semantic analysis and not
redo all the hard work? We must accommodate time-sensitive functions. When we say:
move backwards for 3 seconds, we really mean move back until 3 seconds from now (at
least, that is the meaning we arrive at after semantic analysis). However, if we then say
move again, we certainly don’t mean to move until 3 seconds from the original time of the
first command! Thus, commands sometimes have meaning which depends on the time of
the command, and repeating a command necessarily moves the calling time to the present.
To most easily accomplish this, we re-analyze the entire parse tree. Note that this is not
actually that much of a loss: creating the parse tree is by far the most time-consuming part
of the analysis process—re-analyzing is negligible compared to re-parsing (which we do not
do).
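The lookup for a contextual command is then essentially a reverse search through that history followed by a fresh call to the analyzer; the sketch below uses illustrative names.

    # Illustrative handling of "X again": find the last matching command, re-analyze it.
    def getCC(verb, history, analyze):
        """history holds (text, parse_tree, keywords) triples, oldest first."""
        for text, tree, keywords in reversed(history):
            if verb is None or verb in keywords:    # bare "again" matches the last command
                # Re-run only the semantic analysis (not the parse) so that
                # time-sensitive meanings are rebuilt relative to "now".
                return analyze(tree)
        return None                                 # nothing suitable to repeat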
3.4.4 (Embedded) Declaratives
An important class of prepositions is the group of subordinating prepositions, which allow
an embedded declarative sentence to appear inside a prepositional phrase, which in turn
governs the behavior of a command. For example, turn left if you see a wall. Here
we see the embedded declarative you see a wall. This is a special sort of prepositional
phrase, because it directly asks a question; regular prepositional phrases usually have a
question or some sort of condition built into them (move to the wall has the question “do
you see a wall” built in to it), but here we have an explicit condition on, for example, turning
left. What we need to do here (and for queries, as we will see) is evaluate the truth value
of the declarative in the current context of the robot. The meaning of the declarative, as
we discussed earlier, is its truth value: once the truth of the embedded declarative has been
evaluated, the prepositional phrase can use this truth value to govern its own inhibitory
behavior.
Our robot does not use a logic engine: that sort of system would be overkill. The
declaratives we need to understand are not adding facts to a knowledge base or anything
complicated like that; they are simply asking for the truth value of a simple sentence. In fact,
the simplicity of the robot helps us dramatically restrict the domain of possible sentences. If
we are asking about the current state of the robot, we must ask about the value of one of its
sensors, and in all cases, the only possible verb that can appear in the declarative is “see.” For
example, turn if you see a wall asks about the IR sensors, turn if you see a light
asks about the light sensors, and move until you see a line asks about the line sensors.
Therefore, the declarative analysis functions are looking only for sentences of the form “you
see X.” Obviously, this is an extremely simple subset of English. However, the restriction is
appropriate here.
The semantic analysis is done by first building a tiny logical tree, so you see a wall
becomes (V: see (N: you) (DP: a wall)). Once this has taken place, a second set of functions
looks at the tree and builds a lambda function which returns true or false depending on
whether the robot (“you”) sees “a wall.” In our database of objects, each object has a
function associated with it which senses it, so we find the object “a wall,” perhaps using
adjectives and prepositions to pare down the list of possibilities, and call its sensor function.
This process seems more complicated than is necessary, since as we’ve already said, the only
possible verb is “see:” why don’t we just find the object of the verb and not bother with the
tree? We’d like to be as general and flexible as possible; with the current system, we could
expand it to understand more verbs if that started to make sense, or, more likely, we could
pass the logical tree to a logical engine for understanding.
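Concretely, the product of the declarative analysis can be pictured as a closure over the matched object's sensor function; the helper below is an illustrative sketch, and the "sense" field is an assumption about how the object database might expose its sensor polls.

    # Illustrative construction of the truth-value closure for "you see a wall".
    def makeCondition(obj):
        """obj is an entry from the object database with a sensing function attached."""
        sense = obj["sense"]                      # e.g. a wrapper around an IR poll
        return lambda: bool(sense())              # True iff the robot "sees" it right now

    # wall = {"name": "wall", "sense": lambda: min(getIR()) == 0}
    # condition = makeCondition(wall)             # handed to the "if"/"whenever" block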
Once the declarative semantic analysis has taken place, it returns the lambda function,
which is then integrated into the preposition (inhibitory) block. In other words, the analysis of turn left if you see a wall to your right makes a lambda function out of the
embedded declarative, so the command really becomes turn left if X, where X is a
black-box lambda function. Then the sentence is analyzed as an imperative, except getVPP
uses the black box, in combination with how “if” deals with true and false, to create the verb
preposition layer.
3.4.5 Queries
Queries are just declaratives marked as needing some kind of language response (rather than
an action). We deal with queries in a way very similar to declaratives. Currently, we only
handle very simple “do” queries. This is not a language restriction imposed by the robot;
it certainly makes sense to ask the robot “what do you see?” But that kind of query is
quite complicated since it requires a search through all possible objects. We chose to only
handle the simpler type of query, such as “do you see a wall?” Here, we just analyze
the embedded declarative. From our declarative analysis functions, we get a function which
gives us the truth value of the declarative in the context of the robot. We then build a
block (and layer) out of this function, but we use a special feature of our brain class: a
layer can return the commands it wants for the motors, lights, and speaker, and it can pass
meta-commands such as “kill me,” but it can also send messages for printing. Our query-analysis functions produce a layer which immediately kills itself and sends a message “yes”
or “no” depending on the truth value of the declarative. In other words, we still produce
a layer from a query, but it doesn’t perform any action; it just sends a message. In this
way, we maintain the communication between the brain and the parser/analyzer: for every
command, we produce a layer and add it to the brain. For imperatives, this works as already
described. For queries, the layer we add to the brain is special, but it’s still a layer. Keep
in mind that the brain is not directly connected to the robot. It must be connected through
a data structure we created called a nerves interface, which passes commands to the actual
robot from the brain and relays the sensor data. To answer the query without adding a layer,
we’d have to connect the analyzer directly to the nerves instantiation and bypass the brain.
This goes against both our attempts to keep everything as compartmentalized as possible
and our desire for consistent representation of meaning.
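A query layer can therefore be sketched as follows; the field names and the meta-command string are illustrative. It answers once, asks to be removed, and never touches the motor, light, or speaker paths.

    # Illustrative query layer: send an answer for printing, then ask to be removed.
    class QueryLayer:
        def __init__(self, condition):
            self.condition = condition             # the closure from declarative analysis

        def act(self, senses):
            return {"message": "Yes!" if self.condition() else "No",
                    "meta": "kill me"}             # no wheel/speaker/LED commands at all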
4 Examples
Here we exhibit a set of example commands and the behaviors they induce in the robot.
4.1 Example: Simple Commands
Simple commands ask for a single, uncomplicated behavior. A selection of nice simple
commands:
• move —The robot moves forward for a small amount of time (about 2 seconds)
• turn —The robot turns a little
• beep —The robot beeps
• play a D —The robot plays a D (note).
4.2 Example: Adjectives, Adverbs
Things can get more interesting if we allow ourselves some adjectives and adverbs. All of
the following commands have obvious results. Note that none of them specify the length of
time for which they should continue, so they all run for the default “small amount of time,”
which is about 2 seconds.
• turn left
• move backwards
• turnon your left light
• quickly turn right
• quickly go back
4.3 Example: Simple Prepositions
Prepositions allow us to specify conditions under which an action should be performed. The
following commands do what you would expect.
• move to the wall
• turn left to the wall
These simple prepositions do not allow us to do too much. We’ll see later that subordinating prepositions are much more powerful.
4.4 Example: Contextual Commands
With contextual commands, we can refer to previous commands. These commands aren’t too
complicated, but they are interesting. Assume that we’ve already given the robot the commands (in order): move backwards, turn right, turn left, beep, move back quickly.
Then we could give the following commands:
• again — The robot would move back quickly again.
• beep again — The robot would beep again (not terribly exciting)
• turn again — The robot would turn left.
4.5 Example: Embedded Declaratives
With subordinating prepositions, we can embed declarative sentences inside preposition
phrases, and these embedded declaratives now govern how the prepositional phrase inhibits
the verb. The results of these commands are self-evident. Note that we no longer have the
default 2-second duration, since these commands have explicit termination conditions.
We can now also start to make use of noun prepositions to specify which wall (right or
left), for instance, the robot sees.
• move until you see a wall
• turn left if you see a bright light
• beep whenever you see a light
• go back quickly if you see a line
• turn left quickly if you see a wall to your right
4.6 Example: Queries
Queries allow us to ask about the current state of the robot. The robot responds “Yes!” or
“No” depending on the answer to the question.
• Do you see a wall
• Do you see a bright light
• Do you see a wall to your right
4.7 Example: Complete Interactions
Now we can put everything together. First, let’s use the layered, combining ability of the
brain to have the robot move around avoiding walls:
turn left if you see a wall to your right
turn right if you see a wall to your left
move for 45 seconds
The robot will move around for 45 seconds and avoid any walls that it sees.
We can also use queries to, for instance, find a corner (“>>>” denotes commands, the
other lines are robot responses):
>>> do you see a wall
No
>>> move to a wall — the robot moves forward until it sees a wall anywhere (right,
left, or in front)
>>> turn left until you see a wall to your front — make sure the robot is
directly facing the wall it just found
>>> do you see a wall — just checking
Yes!
>>> turn left for 90 degrees — make the robot face parallel to the wall. We might
also have done the command turn left until you can’t see a wall.
>>> move until you see a wall
The robot should now be facing a wall and be parallel to the first wall it found, so it’s
in a corner. In both of these examples, we can replace “wall” with “light” or “line” or a
modified version of any of them.
5 Progress and Future Work
Probably the best way to understand the current situation is to look at the schematic in
Figure 6.
The major incomplete area of the proposed NLPU is the robot control architecture.
Python, the language in which our system is implemented, has support for multi-threading.
However, there are technical issues with IDLE, a development environment for Python, which
have stalled our efforts in this area. This is unfortunate, because only with multi-threading can we take full advantage of our layering, subsumptional architecture.
6 Related Work and Background
Our project is a blend of natural language processing and robotics, and there are a few
key sources which inspired many of our design choices and made our project possible. On
a very basic level, we relied on all the research done to make the foundations of natural
language processing, like parsing, easy to do. In particular, we used the Natural Language
Toolkit (NLTK-Lite) for Python, which implements a recursive-descent parser and a tree data
structure. Using NLTK-Lite allowed us to focus our efforts on the design of the language
model and the semantic analysis.
The basic goal of our project is inspired most by research in natural language processing.
In a very broad sense, this field attempts to develop software which can interact with a person
using natural language, and the products of this research are varied. One simple example
is the machine which answers telephones and can direct calls based on spoken commands
from a user. This machine does mostly phonetic and syntactic processing: the challenge is
understanding which words were spoken. On the other end of the natural language processing
spectrum are programs which take typed natural language sentences, such as paragraphs of
a story, and can reason about what the meaning of the input is. These programs tackle
the problem of semantic and pragmatic analysis, and it is these latter programs which most
inspired our research.
In terms of semantic analysis, there are two distinct areas: research in more standard,
linguistics-oriented efforts to create programs which can understand declarative sentences
and reason about them, and research in programs which can carry out natural language commands, that is, procedural semantics. Clearly, our project falls more in line with the second area, but for basic reading, [5] is a good example of the first type of semantic analysis and demonstrates that software which understands a short story (among other things) was quite feasible in the 1970s.

[Figure 6: The proposed NLPU. The same pipeline as Figure 3, but split across two threads: Thread 1 contains the Language Interface (input, NLTK-lite grammar/parser, parse tree, semantic analyzer, layer), and Thread 2 contains the Brain Control Interface, which handles queries and issues raw Myro commands over the Myro/hardware connection to the Scribbler.]
While this research is inspiring for us, we are interested more in creating a system which
works with procedural semantics. A very important paper here is [7], from 1972, which
develops a system for logical deduction and understanding of commands which require complicated behaviors. This is a simulated robot arm program which can manipulate blocks.
As an example of the power of this system, if a red block lies on top of a blue block, the
system can figure out that if it is required to move the blue block, it will need to set the red
block aside. Though everything is simulated, this is a clear predecessor to the project we
undertook and was our major source for clarification and understanding of the challenges we
faced.
For general background in procedural semantics and natural language processing, [4] and [2] were very useful. Following in the footsteps of Winograd are [6] and [3].
The most influential research we looked to for reference was [1]. This paper argues that
the standard way of building AI systems has the wrong approach. The vast majority of
research attempts to create an AI system in a simplified environment (all of the papers we
have discussed so far do this). For instance, the usual habitat of an AI robot is a block
world: the things in the world with which the robot can interact are just blocks, perhaps
with color or texture. The hope is that creating a good AI system in a simplified world can
be slowly extended to the creation of a good AI system in the real world. Brooks argues
that we should instead start with simple robots in the real world: this paper describes the
development of several robots which operate in the real world, performing relatively simple
tasks, such as searching a building for soda cans to pick up and recycle. The key insight here is
the subsumptional architecture: one process directs the robot (the “move around” process)
until the robot senses a situation requiring specialized behavior and a higher-precedence
process takes over (e.g. the “grab a soda can” process). The idea of having multiple levels of
processes which inhibit each other played directly into our design of the model of thought of
the robot and our design of “block” as a linguistic idea (and an implemented data structure).
Our realization of the ideas put forward by Brooks has an ironic twist: the very title of [1]
contains the thesis that intelligent robots do not need (and in fact are hindered by) a fixed representation of the world. Note that we took the idea of a subsumptional model of
control, made it dynamic, and added representations! The inhibitory model of procedural
semantics that we inherited from this work made the most difference for us, and it, more
than anything else, made our project possible.
A Grammar
Here we show our complete grammar. Some abbreviations: C is “command” (the start
symbol). IS, CC, and DS are “imperative sentence,” “contextual command,” and “declarative
sentence.” In addition, most standard phrase names are prefixed with a D to indicate that
they are phrases in a declarative sentence.
C -> IS | CC | DS | Q
Q -> DQ | WH WHQ
WHQ -> DO DWHQ
DQ -> DO DS
DWHQ -> DS
DO -> do
WH -> what
CC -> V CAV | CAV
CAV -> again | more
DS -> DP DVP
DVP -> (AdvP | Adv) DV (AdvP) (VPP)
DV -> (AUX) RDV
DP -> NP | D NP
IS -> VP
VP -> (Adv) V (AdvP) (VDP) (VPP)
VDP -> (D) VNP
VNP -> VN | VN AP | AP VN
NP -> N | AP NP | N NPP | AP NP NPP | NumP NP | N AP
AP -> A
AdvP -> Adv AdvP |
NPP -> P DP
VPP -> P DP | SubP DS | P DP VPP | SubP DS VPP
NumP -> Num
Num -> 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
       20 | 45 | 30 | 90 | 180 | 360 | 270 | one |
       two | three | four | five | six | seven | eight |
       nine | ten
Adv -> quickly | slowly | carefully | right | left |
       straight | back | backwards | forward | forwards |
       around
VN -> light | LED | LEDs | sound | beep | song | note |
      A | B | C | D | E | F | G
N -> I | you | little | room | dark | lights | yourself |
     wall | light | box | line | seconds | second | foot |
     feet | inches | meters | meter | degrees | right |
     left | ahead | front
D -> the | your | a | an | that |
A -> right | left | center | high | low | black |
     white | dark | bright | ahead | front
P -> to | until | for | along | toward | towards |
     from | into
SubP -> after | if | unless | until | when | whenever
V -> face | spin | move | go | turn | turnon |
     turnoff | play | light | blink | beep
AUX -> can | can’t | dont
RDV -> face | spin | move | go | turn | turnon |
       turnoff | play | light | blink | beep | say |
       hear | see | is
B Acceptable Sentences
Here is an example suite of sentences. This list is not exhaustive—most standard combinations of the phrases shown here are acceptable.
B.1 Imperatives

B.1.1 Moving
Here, go can be substituted for move, and light can almost always be substituted for wall:
move forward
move back
move backwards
move forward a little
move forward for two feet
move forward for 4 seconds
move forward slowly
move forward quickly
move forward slowly for two feet
move
move to the line
move to the wall
move forward to the wall
move until you can see the wall
move to the light
quickly move forward to the wall
move for 4 seconds whenever you see a wall to your right
B.1.2 Turning

turn right
turn left
turn right for 2 seconds
turn right for 90 degrees
turn right a little
turn right until you see a wall
turn right slowly
turn right quickly
slowly turn
turn slowly right
turn right until you can’t see a wall whenever you see a wall to your left
B.1.3 Lights
turnon your left light
turnon your lights
turn off your lights
turnon your right light when you see the wall
turnon your lights until you see the wall
turnon your left light if you see a wall to your left
B.1.4 Sound
play an A
beep when you see the wall
beep for 2 seconds when you can’t see a bright light
B.2 Queries
do you see a wall
do you see a bright light to your left
do you see a wall to your right
B.3 Contextual Commands
again
move again
play again
References
[1] Rodney A. Brooks. Intelligence without representation. Artificial Intelligence, 47:139–159, 1991.
[2] Colleen Crangle and Patrick C. Suppes. Language and Learning for Robots. CSLI Publications, Stanford, CA, USA, 1994.
[3] Masoud Ghaffari, Souma Alhaj Ali, and Ernest L. Hall. A perception-based approach
toward robot control by natural language. Intelligent Engineering Systems through Artificial Neural Networks, 14, 2004.
[4] Barbara J. Grosz, Karen Sparck-Jones, and Bonnie Lynn Webber, editors. Readings
in natural language processing. Morgan Kaufmann Publishers Inc., San Francisco, CA,
USA, 1986.
[5] R. C. Schank and C. K. Riesbeck. Lisp. In R. C. Schank and C. K. Riesbeck, editors,
Inside Computer Understanding: Five Programs Plus Miniatures, pages 41–74. Erlbaum,
Hillsdale, NJ, 1981.
[6] Stuart C. Shapiro. Natural language competent robots. IEEE Intelligent Systems,
21(4):76–77, 2006.
[7] Terry Winograd. A procedural model of language understanding. In Computation &
intelligence: collected readings, pages 203–234. American Association for Artificial Intelligence, Menlo Park, CA, USA, 1995.