Natural Language Interaction with Robots
Alden Walker
May 7, 2007
Abstract
Natural language communication with robots has obvious uses in almost all areas
of life. Computer-based natural language interaction is an active area of research in
Computational Linguistics and AI. While there have been several NL systems built for
specific computer applications, NL interaction with robots remains largely unexplored.
Our research focuses on implementing a natural language interpreter for commands
and queries given to a small mobile robot. Our goal is to implement a complete system
for natural language understanding in this domain; the system consists of two main
parts: a parser for the subset of English our robot is to understand and a
semantic analyzer used to extract meaning from the natural language. By using such a
system we will be able to demonstrate that a mobile robot is capable of understanding
NL commands and queries and responding to them appropriately.
Contents

1 Introduction
2 Overview of the robot and its language capabilities
  2.1 The Robot
    2.1.1 Sensors
    2.1.2 Output
  2.2 Natural Restrictions on Language
  2.3 Myro (Python Module)
3 Natural Language Processing Unit
  3.1 Overview
  3.2 Grammar and Lexicon (and Parsing)
  3.3 General Language Model and Robot Control Architecture
    3.3.1 Language Model (Imperatives)
    3.3.2 Language Model (Queries and Declaratives)
    3.3.3 Model of Thought (Subsumptional Brain Architecture)
    3.3.4 Robot Control Interface
  3.4 Semantic Analysis
    3.4.1 Semantic Analysis Overview
    3.4.2 Imperatives
    3.4.3 Contextual Commands
    3.4.4 (Embedded) Declaratives
    3.4.5 Queries
4 Examples
  4.1 Example: Simple Commands
  4.2 Example: Adjectives, Adverbs
  4.3 Example: Simple Prepositions
  4.4 Example: Contextual Commands
  4.5 Example: Embedded Declaratives
  4.6 Example: Queries
  4.7 Example: Complete Interactions
5 Progress and Future Work
6 Related Work and Background
A Grammar
B Acceptable Sentences
  B.1 Imperatives
    B.1.1 Moving
    B.1.2 Turning
    B.1.3 Lights
    B.1.4 Sound
  B.2 Queries
  B.3 Contextual Commands
1 Introduction
When giving a robot commands, a user typically must give short, blunt commands or remember a long list of precise phrases that must be input exactly as expected. Usually, robots
either do not take commands in textual form (they have a joystick, for instance), or take a set
of commands which are pre-programmed and must be said exactly as pre-programmed. For
instance, a robot might know what “Move quickly” means but have no idea what “Quickly
move” means. It might know “Turn left for 45 degrees” and “Turn left slowly” but not “Turn
left slowly for 45 degrees.” These limitations arise because making a robot understand commands in a looser, more natural way is much more complicated than simply mapping a list
of commands to a list of movements and activity scripts. Finding an appropriate response
to natural language is more in the domain of AI and natural language processing, and it is
understandable that people who build robots tend to err on the side of dependability and
require precise commands.
Coordinating both the basic functioning of the robot and natural language processing
and understanding at the same time poses a considerable challenge. Both are very sensitive
and prone to error. Our goal is to use a very simple, pre-made robot and build a natural
language processor on top of it which can handle the small subset of English which makes
sense for commands and queries. By limiting the problem in such a way, we hope not only
to end up with a functioning example of a robot able to follow commands given in natural
language but also to develop methods which can be used when creating similar systems for
more complicated robots which allow for more complicated interactions with the real world.
In order to satisfactorily solve this problem, we must attempt to bridge gaps between
robotics and natural language processing. Natural language processing is typically done on
a computer, which processes the input sentence, extracts meaning and represents it in a
formalism, and responds appropriately in natural language. With a robot, we must produce
a real action. Research in natural language processing can show us ways to understand
natural language on a computer. Research in robotics can show us some of the best ways to
help robots function in the real world. Transforming loose, natural language commands into
the precise low-level commands for the robot involves new kinds of semantic interpretation,
and that is the goal of this project.
By using a combination of techniques from robotics and natural language processing, we
have developed an architecture which looks promising.
Here is an example of an interaction with our robot using the current version of our
natural language processor:
Hello, I’m Scribby!
Type a command: do you see a wall
No
Type a command:
beep whenever you see a wall
Type a command:
turn right whenever you see a wall to your left
Type a command:
turn left whenever you see a wall to your right
Type a command:
move for 60 seconds
The result of this sequence of commands is that the robot will move around for a minute
avoiding walls and beeping whenever it sees a wall. As is clearly demonstrated above, the
robot can respond appropriately to many natural language commands.
2 Overview of the robot and its language capabilities
The natural language robot interface is composed of many pieces. However, there are natural
chunks into which the program pieces divide. In our system, all of the language processing is
done on a host computer. The host computer communicates with the robot over a wireless
Bluetooth connection. All the robot control software also runs on the host computer. Communication takes place using a client-server paradigm. Thus, the robot runs only a small
server which listens for commands. The basic design schematic is shown in Figure 1.
In the following sections, the various pieces which make up the lower-level system
will be explored in detail. This is necessary to understand the decisions we made and how
we dealt with the natural language issues which arose during research.
2.1 The Robot
We use the Parallax Scribbler robot, which is actually marketed more or less as a toy: it can
carry a Sharpie and draw lines on a piece of paper and avoid obstacles straight out of the box
(see Figure 2). Connecting it to a computer through a serial port allows the microcontroller
to be reprogrammed as a robot typically would be.
All of our work is done in the context of the Scribbler, but our methods have broader applications.
In order to solve the problem of natural language interaction, the architecture and design
have been tailored for the Scribbler. Thus, it is necessary to understand the capabilities of
the robot to understand our design choices.
[Figure 1: The hardware interface. Low-level hardware/software interface schematic: the NLPU and Myro run as software on the host and exchange commands, queries, and responses with the Scribbler robot (IR sensors, light sensors, line sensors below, speaker, 3 LEDs, wheels) over a serial Bluetooth connection.]
Figure 2: The Scribbler robot
2.1.1 Sensors
The robot has three sets of sensors: Proximity (IR), light, and line:
IR: The IR sensors consist of two IR emitters and one IR detector. The emitters shine IR
light forward, one slightly left and the other slightly right. Thus, though the robot can only
see forward, it can detect if an object is a little to the left or right. In this way, it can turn
in the correct direction to avoid obstacles and perform other similar actions. The command
do you see a wall to your left would be translated and understood as a procedure for
polling the IR sensors.
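As a rough illustration, such a polling procedure might look like the sketch below. The Myro call names (init, getIR), the port string, and the return convention are assumptions for illustration, not the project's actual code.

    # Hypothetical grounding of "do you see a wall to your left" in an IR poll.
    from myro import *

    init("/dev/rfcomm0")            # open the Bluetooth serial link (port name assumed)

    def wall_to_left():
        """True if the left-pointing IR emitter reports an obstacle."""
        left, right = getIR()       # assumed to return a (left, right) reading pair
        return left == 0            # 0 is taken here to mean "obstacle detected"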
Light: The robot has three forward-looking light sensors pointing left, straight, and right.
Lower reported values from the light sensor correspond to brighter light. By turning toward
the light sensor reporting the lowest value, the robot can follow a bright light. Though
not always completely reliable, the light sensors do typically report different values if the
incoming light to the sensors is significantly different, so the sensor looking at the brightest
light will usually report a lower value than the others.
Line: The line sensor is actually a pair of IR emitter/detector pairs underneath the robot
looking straight down. They are placed side by side, so the robot can see if it is on the edge
of a dark area (the right sensor will report white and the other black). By moving slowly
and making careful use of these sensors, the robot can limit its movement to a box of a color
significantly different from the surrounding area or follow a dark line on a white background.
2.1.2 Output
The robot has three ways of producing output which interacts with the world: it can move
around, it can play sounds, and it can turn a bank of three LEDs on and off:
Movement: The robot has two wheels which can be controlled separately. Each can spin
forward or backward at a given speed. Because the robot is round and has two wheels controlled so freely, it is very mobile and can spin in place or perform just about any sort of
movement. The command move to the wall would be realized by a procedure which polls the IR sensors and then directs the wheels accordingly.
Sound generation: The robot has a speaker which can play a wide range of frequencies
or a pair of overlapped frequencies.
Lights: There are three LEDs which can be turned on and off. They are small, but useful
for indicating status.
2.2 Natural Restrictions on Language
Because of the simplicity of the robot, the set of natural language sentences/commands which
could appear in an interaction between a human and the robot is restricted. The small set
of ways in which the robot can affect the world (output: movement, sound, and lights) keeps
small the set of verbs which make sense in commands to the robot, and likewise the small
set of sensors makes small the set of possible queries. This restriction is very good for our
project: in order to make natural language interaction between a robot and a human possible,
the natural language domain must be restricted, but good interaction will never occur unless
the restriction is natural. By using such a simple robot we achieve this.
2.3 Myro (Python Module)
Myro is a high-level interface module for Python created by IPRE. Myro handles the serial
connection and allows a program written in Python to communicate with the robot easily without bothering
with the low-level details. We used this module as our starting base and built our system
on top of it. Thus, our NLPU communicates with the robot exclusively through the Myro
module.
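To give a sense of the level of abstraction Myro provides, here is a short sketch of driving the robot directly; the call names follow common Myro conventions (init, forward, turnLeft, beep, stop), but the exact signatures and the port name are assumptions.

    # Rough sketch of issuing low-level commands through Myro.
    from myro import *

    init("/dev/rfcomm0")   # Myro hides the serial/Bluetooth details behind one call

    forward(0.5, 2)        # half speed for 2 seconds -- roughly "move"
    turnLeft(0.5, 1)       # roughly "turn left"
    beep(1, 440)           # a one-second 440 Hz tone -- roughly "beep"
    stop()

In our system, calls like these are produced as the output of the brain control interface rather than issued directly by the language code.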
3 Natural Language Processing Unit

3.1 Overview
The system for natural language understanding can be divided into three parts: the grammar,
which performs syntactic analysis using a context-free grammar, the semantic analyzer, and
the brain engine, which organizes currently running commands into our model of thought.
The order of this section may seem strange: we will first discuss how we parse the input
provided by a user. Then we will talk about how we represent meaning in our language
domain. Only after that will we be ready to explore semantic analysis. Why is this? The
goal of our project boils down to doing semantic analysis on the parse trees provided by our
parser. It doesn’t make sense to talk about semantic analysis until we know what output
we should produce. In order to answer this question, we will take a detour into our method
for representing the meaning of language and specifically imperative commands, how these
representations are realized in practice, and how the robot control architecture interacts with
the robot. At that point, we will be ready to bridge the gap with semantic analysis.

[Figure 3: The NLPU. Input such as "move to the wall", "move again", or "turn right for 2 seconds" enters the Language Interface, where the NLTK-lite grammar/parser produces a parse tree; the semantic analyzer turns the tree into a layer for the Brain Control Interface, which issues raw Myro commands over the Myro/hardware connection to the Scribbler and returns responses to queries.]
3.2 Grammar and Lexicon (and Parsing)
The small mobile robot that we elected to work with made the design of a grammar easier.
Only a small subset of English is applicable to the situation of giving commands and queries
to such a robot, so the language is naturally restricted. Within this subset, however, we
tried to be as complete as possible in the design of the grammar. The grammar parses
standard declarative sentences and commands (verb phrases) and allows for adverbial and
prepositional phrases.
The parsing is done by an NLTK-lite parser class. Though it is not very fast, our parser uses a standard recursive-descent algorithm; the danger with the faster shift-reduce parsing algorithm is the possibility of not finding valid parses, which would be unacceptable.
Because the parsing is done in such a compartmented fashion, it is useful to let the
grammar contain some semantic information and tailor it to the specific situation. We
evaluate the meaning of a declarative sentence in the usual way, i.e. as a logical statement, but
commands and contextual commands have meaning consisting of a series of Myro commands.
Thus, sentences get evaluated in completely different ways depending on whether they are
declarative or imperative. The grammar can assist in the discrimination between types of
sentences: it immediately distinguishes a contextual command by finding the presence of a
contextual adverb such as “again” or “more,” and the grammar considers declarative verb
phrases and imperative verb phrases to be completely different. This allows the semantic
analysis to be carried out more effectively. An example of this is shown in Figure 4.
Note how the prepositional phrase is marked as a VPP, or verb prepositional phrase, and how the whole command is noted at the top as an IS (imperative sentence). In contrast, the contextual command
move again is marked as a CC (contextual command). Thus, some of the semantic analysis
comes directly out of the syntax of the command. For the complete grammar, refer to
appendix A.
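To make the parsing step concrete, here is a self-contained sketch using a tiny assumed fragment of the Appendix A grammar. It is written against the modern nltk package rather than the 2007 nltk_lite layout the project actually used, so treat it as illustrative only.

    # Toy grammar fragment parsed with a recursive-descent parser, as in NLTK-lite.
    import nltk

    grammar = nltk.CFG.fromstring("""
    C    -> IS
    IS   -> VP
    VP   -> V AdvP VPP
    VPP  -> P DP
    DP   -> NP
    NP   -> NumP N
    NumP -> Num
    AdvP -> Adv AdvP | Adv
    V    -> 'turn'
    Adv  -> 'quickly' | 'left'
    P    -> 'for'
    Num  -> '5'
    N    -> 'seconds'
    """)

    parser = nltk.RecursiveDescentParser(grammar)
    for tree in parser.parse("turn quickly left for 5 seconds".split()):
        tree.pretty_print()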
3.3 General Language Model and Robot Control Architecture
We have now covered the way in which we do syntactic analysis. Now we turn to the way in
which we represent meaning and how we realize these representations in practice. We will
show how we create a fundamental unit of meaning, called a block, and how we build the
meaning of a command, a layer, out of these pieces. Once we have shown how to create
layers, we will describe our model of thought and how the layers are combined in a dynamic
data structure called a brain to create output.
[Figure 4: Parse trees. Left: the imperative “turn quickly left for 5 seconds” parsed as C -> IS, with a VP containing the verb, two AdvPs, and the VPP “for 5 seconds.” Right: the contextual command “move again” parsed as C -> CC, with the verb and the contextual adverb (CAV) “again.”]
3.3.1 Language Model (Imperatives)
In order to understand how the grammar was designed, and especially how the semantic
analysis works, it is necessary to understand the reasoning behind our representation of
meaning. The best way to see why we use the representation we do is to describe it and
show its flexibility. The reasoning described here applies only to imperatives.
We use a verb-centric system of meaning. We consider a verb to have a certain fundamental meaning which is realized as a function. This verb function is the meaning of the
verb, and it, when called, generates a compact version of a Myro command performing the
appropriate action. Other pieces of an imperative, such as adverbs or verbal prepositional
phrases, can modify the function. The function can take arguments as necessary: a function
representing a transitive verb, for instance, needs an argument representing the object which
is acted upon. Our final meaning for an imperative phrase is the original verb function, modified by the other pieces of the clause. If necessary, the verb function could have access to
the general state of the robot, such as whether there is a wall in front of it.
An important question is how the various phrases modify the meaning (function) of the
verb. We chose to make the meaning, as far as possible, constrained. That is, “move”
has a fundamental meaning, and the prepositional phrase “for 2 seconds” simply serves to
inhibit the behavior of the verb function so that after 2 seconds it ceases to have meaning.
An imperative like “beep whenever you see a wall” contains the prepositional phrase “whenever...,” which inhibits the verb function until the robot “sees” a wall, at which point the
verb function is left free to act (and produce a beep).
[Figure 5: The block diagram for turn right for 2 seconds. Senses, time, and commands flow into the “for 2 seconds” block, which points (via nextBlock) to the “turn right” block; the released commands flow out.]
This representation of meaning is useful and serves to make the meaning of multiple
prepositional phrases clearer: “beep for 2 seconds whenever you see a wall” has two stacked
prepositional phrases: when a wall is in sight, the outside blocking preposition “whenever”
releases the command under it, which happens to be “beep for 2 seconds.” The second
prepositional phrase inhibits the verb function for “beep” so that the activity occurs for the
duration of 2 seconds.
We call the fundamental unit of meaning corresponding to a verb or prepositional phrase
a block. As described above, blocks can be linked together in such a way that one block
inhibits the behavior of another. In our example, the block representation for the command
turn right for 2 seconds is shown in Figure 5.
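A minimal sketch of this block structure is shown below; the class and attribute names (VerbBlock, ForSecondsBlock, nextBlock, act) are illustrative only and stand in for the project's actual data structures.

    # Illustrative block chain for "turn right for 2 seconds".
    import time

    class VerbBlock:
        """Fundamental verb meaning, with any bound adverbs or nouns baked in."""
        def __init__(self, command):
            self.command = command                 # a compact Myro-style command

        def act(self, senses):
            return self.command                    # always wants to act

    class ForSecondsBlock:
        """Prepositional block: lets its nextBlock act only for `duration` seconds."""
        def __init__(self, duration, nextBlock):
            self.deadline = time.time() + duration
            self.nextBlock = nextBlock

        def act(self, senses):
            if time.time() <= self.deadline:
                return self.nextBlock.act(senses)  # release the inner block
            return None                            # inhibit: the meaning has expired

    # The layer for "turn right for 2 seconds" is just this two-block chain.
    layer = ForSecondsBlock(2, VerbBlock(("turnRight", 0.5)))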
This inhibitory theory works well for prepositional phrases, but adverbs are a more
complicated situation: prepositions tend to have meaning more disjoint from the verbs they
modify. However, the meaning of an adverb is tied up in the meaning of the verb it “modifies.”
It is almost as though adverbs trigger a modification rather than create one. To see the
distinction, consider the two sentences “turn around” and “move around,” or “move back”
and “throw it back.” The way in which the adverbs modify the meaning is strikingly different:
turning around is probably the rather rigidly defined act of turning about 180 degrees, while
moving around is vaguely defined as walking in a room in various directions, probably in
a relaxed manner. Moving back is moving backwards, while throwing something back is
probably actually throwing it forwards to the person who threw it last. This last example
is weak because the verbs “move” and “throw” are intransitive and transitive, but the point
is clear.
In light of the complications with adverbs, we consider the meaning of adverbs to be
inherent in the verb. Certainly, there are “default” adverbs that come pre-programmed, even into newly made verbs, so the real-world situation is more complicated. But in our robot world, adverbs trigger already-defined modifications in the verb functions. Thus, adverbs
get bound up with the verb in the verb block: the block representation for the meaning
of turn right quickly for 2 seconds would be identical to the block diagram above,
except the inhibited verb block would be the block for turn right quickly. Note that we
already implicitly did this before with right without calling attention to it.
So far, we have verbs, adverbs, and prepositions. The main area left to cover is nouns
and noun phrases. For imperative sentences, nouns and noun phrases tend to have an actual
tangible meaning. Obviously, this is not the case in all sentences, but for imperatives a noun
carries with it contextual information which allows people to know how to pick it up, turn it
on, move it, etc. Nouns might also represent an abstract quantity (a second, for instance),
but even these abstract nouns must have some sort of realization, like the duration of an
action. Because of the simplicity of the world and range of activities for the robot, we chose
to understand a noun as an abstract object. There can be many things associated with a
noun object, such as its name or names, attributes that it has, verbs (functions) which can
be used to determine its state, command fields that will alter it, etc. When we attempt
to find the meaning of a noun phrase, we are searching for a single noun object or a set of
objects. Note that an object might be abstract, such as a second. This “second” won’t have
many attributes, and certainly no function to check its state or modify it, but it will have the
abstract quantity of duration. The abstract noun “three” has basically only the cardinality
expression of 3. These abstract quantities can get added together to create “three seconds.”
We actually consider adjectives and nouns to be more or less on equal footing. When
we ask for the left LED, we mean to perform an intersection of the set of objects which have the attribute “LED” and the set of objects which are “left.” The intersection will leave
us with the single object of the left LED. When we ask for “lights,” we will get everything
which is a light. That is, we’ll get a list of objects. The same thing applies to prepositions
inside noun phrases, which are basically just fancied up adjectives.
For the most part, nouns (and therefore adjectives) get bound up in fundamental meaning.
To see why this is justified, consider the examples “play an A” and “play a B.” The actual
physical, imperative meanings of the commands are very different, even though the difference
is not in the verb. Here we can see that we need to take nouns directly into the block of a
verb or prepositional phrase.
It is clear now that we end up sticking many things inside these so-called “fundamental”
blocks, but this comes only out of necessity. In support of this representation, it should be
noted that though prepositional phrases are only one syntactic/semantic category that we
must consider, they are very prevalent in imperatives. Also, the nouns and adverbs that
get stuck inside blocks are simple to handle, and once their meaning is understood we can
be done constructing our blocks. Compare this to the prepositional phrases, which naturally seem to regulate other blocks and whose meaning can change with time or the situation. With this in
mind, it is clear that such a block structure is useful to use even if it only makes two types
of phrases “fundamental.”
It should be noted that some commands, such as move, have an unspecified ending condition, so they will technically continue forever. In our language model, there is an understood
temporal restriction which kicks in if the command does not provide some condition under
which it stops. This makes sense in general: if someone is told to “turn,” they will turn
for a little while, maybe 180 degrees, and then stop, and likewise for such imperatives as
“move” and “beep.” In addition to being probably more realistic, this addition to the model
simplifies the picture, and it helps keep the robot under control!
3.3.2 Language Model (Queries and Declaratives)
Queries and declarative sentences have a similar structure. We think of both as logical
propositions: declarative sentences (our robot can understand only declaratives which are
embedded with a subordinating preposition) are simple propositions; queries are just propositions modified to make them questions. In both cases, the meaning of the sentence is its
logical truth value. Here we are simply taking the typical linguistic stance about declarative
sentences. However, queries need an extra note: a query is asking for a response from the
robot, and we can think of that as an action, so the meaning of a query (a response to
it) is a block which tells the brain to print a message to the computer screen—it’s still a
block. We do not have to worry about this with declaratives because they are embedded in
prepositional phrases and thus get integrated into blocks automatically.
3.3.3 Model of Thought (Subsumptional Brain Architecture)
We have seen how we can represent the meaning of a command by breaking it down into
blocks inhibiting each other. Once we have this representation, what does it mean for
the robot to carry out the meaning? This is an important linguistic and cognitive question,
because it involves not only how the robot will carry out a single command (it will just follow
orders) but also how the robot will deal with many simultaneous commands. We know now
what the meanings of move and turn right when you see a wall are. However, what is
the meaning of both of them together? This section attempts to answer that question.
We need some “container” representation to make the discussion of collections of commands easier: we define a layer to be the collection of interconnected blocks which represent
the meaning of a command. We consider a layer as the representation of a single, self-contained command. For example, consider the command: turn right for two seconds.
Verbs, together with adverbs, form single units of meaning, that is, blocks. Thus, we will
create one block for “turn right.” We will create another block for “for two seconds,” and we
will give the verb block to the prepositional block as the block to inhibit. These two blocks
are wrapped up in a single layer, and this layer is the meaning of the command.
The representation of the entire robot thought process is the brain. It consists of a
collection of active layers, ordered by precedence. In our case, we consider older commands
to have higher precedence. The brain has the same input and output as a single layer, namely
the senses and state of the robot and a collection of Myro commands respectively. However,
the brain synthesizes its whole collection of layers to generate just a few commands. Note
that the brain and each layer can generate multiple commands because there are three output
paths (lights, sound, and movement), and each layer can use any of them. The method for
this is as follows: the brain goes through each layer and asks it what it wants to do given
the current situation. It then asks the next one, and so on. The highest-precedence response
for each of the three output paths is the one taken as the generated command.
In addition to this, we allow the brain to let the currently working layer have access to
what the previous layer wanted to do. In this way, layers can be “nice” and choose to allow
other, less powerful layers to have a say. For instance, if one layer says to turn right and
another to move forward, the higher precedence one would be nice to simply combine both
desires into forward movement with a slight right slant. In some other cases it is best to have
higher layers simply rule out their lower counterparts. It depends on the context. However,
this layering effect is very powerful. As an example of what it can achieve, consider the series
of commands (layers), ordered in decreasing precedence.
1. turn right when you see a wall to your left
2. turn left when you see a wall only to your right
3. move forward
When no wall is in sight, the robot will just move forward. When a wall is detected, the more
powerful layers will kick in and force the robot to turn in a direction appropriate to avoid
the wall. The result of these commands is that the robot will move around, avoiding the
walls. This is a rather complicated behavior, but stacking the various pieces of the activity
allows it to be broken into simple chunks.
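A sketch of this arbitration is given below. It assumes each layer exposes an act(senses) method returning a dict that maps output paths to desired commands; the names and the exact output paths are illustrative, not the project's code.

    # Illustrative precedence arbitration: older layers win each output path.
    class Brain:
        def __init__(self):
            self.layers = []                        # oldest first = highest precedence

        def addLayer(self, layer):
            self.layers.append(layer)

        def step(self, senses):
            chosen = {"wheels": None, "speaker": None, "leds": None}
            for layer in self.layers:               # highest precedence asked first
                wants = layer.act(senses) or {}     # e.g. {"wheels": (0.5, 0.3)}
                for path, command in wants.items():
                    if chosen[path] is None and command is not None:
                        chosen[path] = command      # keep the strongest answer per path
            return chosen                           # the few Myro commands to issue

In the real system a layer may also be shown what the previously asked layer wanted, so it can choose to blend desires rather than overrule them outright.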
This layer stacking effect is our attempt to incorporate the successful “subsumptional
architecture” described in [1]. Because we incorporate new layers on the fly and do incorporate representations into our design, the motivation behind implementing the subsumption
architecture is a little different, but it works well.
3.3.4 Robot Control Interface
Note that the brain never actually sends commands to the robot. The brain is our representation of an entire collection of layers operating in concert. When we want to actually use
the commands that the brain produces, we can use our brain control interface to do this.
The robot command interface is simple. The proposed interface has two threads, one of
which receives input commands, parses, and analyzes them (the Language Interface), and
the other which manages the running brain and passes commands between the brain and
the robot (the Brain Control Interface). Because of technical issues, this interface is not
finished. The proposed interface differs from the current one most importantly in the fact
that the current interface waits for a command to finish before asking for another one. In
the proposed interface, things will happen in parallel so that effective layer stacking can take
place. Currently, the interface performs a few rather mundane tasks. It runs an endless loop
of:
1. Wait for a command
2. Parse and analyze it — if the command is a query, ask the brain for an appropriate
response
3. Add the command to list of previous commands (context)
4. Add the resulting layer to the brain
5. Run the brain until it stops doing anything
This interface is weak because it does not allow commands (e.g. “Stop!”) to be entered while the robot is doing something. Commands with unspecified ending conditions do get truncated, as discussed above, but the interface is still not ideal. However, it is effective.
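A compact sketch of this loop is below; parse, analyze, read_command, and read_senses are stand-in names passed in as parameters, not the project's actual functions, and query handling is elided.

    # Hypothetical rendering of the current single-threaded control loop.
    def run_interface(brain, parse, analyze, read_command, read_senses):
        history = []                                 # context for "again", "more", ...
        while True:
            text = read_command()                    # 1. wait for a command
            tree = parse(text)                       # 2. parse it ...
            layer = analyze(tree)                    #    ... and analyze it
            history.append((text, tree))             # 3. remember it as context
            brain.addLayer(layer)                    # 4. add the resulting layer
            while any(brain.step(read_senses()).values()):   # 5. run the brain until
                pass                                          #    nothing wants to act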
3.4 Semantic Analysis

3.4.1 Semantic Analysis Overview
Now we know what the semantic analyzer gets as input: a parse tree containing some
semantic information. We also know what it must produce: a single layer. The method we
developed for doing this is as follows.
Once an NLTK-lite tree structure has been created out of an imperative clause using
the NLTK-lite parser, our semantic analyzer is called. We tried to keep the analysis as
compartmented as possible, and in this vein, it proceeds in a recursive fashion: each node
in the tree has an associated function (getVP, for example). Our main analysis function,
getMeaning, looks at the top node of the tree and calls the appropriate function. It passes
as an argument a list of subtrees. The functions which process a given type of node rely
on getMeaning to collect the meanings of the subtrees. The node function is primarily
responsible for piecing things together.
We make heavy use of lambda functions during the analysis. Because verbs and prepositions must eventually make their way into the final layer as functions, node functions often
build special functions by piecing together the results of getting the meanings of subtrees.
We will look at each type of sentence/command in turn.
3.4.2 Imperatives
The inner workings of each function are too complicated to go into in full.
However, let’s take a look at a representative example. Consider the imperative turn quickly
left for 5 seconds. The parse tree for this sentence is shown in Figure 4.
We will go through the process of semantic analysis as our algorithm does; however, for
a broad overview, note that the call tree for the function getMeaning called on the above
command will be almost identical to the parse tree itself: every time a function needs to get
the meaning (in whatever form that is) of a subtree, it calls getMeaning, which in turn calls
the appropriate get command. Approximate pseudocode for any of the analysis functions
would be:
getMeaning(X)
1. Break the X into its constituent pieces (sub-projections) A, B, ...
2. Get the meanings of these pieces by calling getMeaning(A), getMeaning(B), ...
3. Assemble the meanings and return the meaning of X
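A toy rendering of this recursion is shown below, using an nltk Tree and plain strings as the "meanings" just to show the dispatch-and-recurse shape; the real node functions build blocks and layers instead, and the handler table here is deliberately tiny.

    # Toy getMeaning: dispatch on the node label, recurse over subtrees, reassemble.
    from nltk import Tree

    def getMeaning(node):
        if isinstance(node, str):                  # a leaf: the word itself
            return node
        handler = HANDLERS.get(node.label(), getPhrase)
        return handler(list(node))                 # hand over the list of subtrees

    def getPhrase(subtrees):                       # generic node: just join the pieces
        return " ".join(getMeaning(t) for t in subtrees)

    def getVP(subtrees):                           # verb phrase: wrap its pieces in a "block"
        return "<block: %s>" % " ".join(getMeaning(t) for t in subtrees)

    HANDLERS = {"VP": getVP}

    tree = Tree.fromstring(
        "(VP (V turn) (AdvP (Adv quickly) (AdvP (Adv left)))"
        " (VPP (P for) (DP (NP (NumP (Num 5)) (N seconds)))))")
    print(getMeaning(tree))                        # <block: turn quickly left for 5 seconds>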
The first interesting work is done by the function getVP. It will first create a block of
meaning corresponding to turn quickly left. To do this, it will collect the adverbs and
pass them to getV, which will take the simple function for turn and add the adverbs to
create the block. The available verb functions come with pre-set adverbial modifications,
which the adverbs just trigger.
Next, getVP will call getVPP to get the prepositional block. Here, we will find the DP
seconds and add to it the numeral quantity of 5. This will create an object with a duration
of 5 seconds. This is collected with the preposition (using getP) to create the inhibitory
block for for 5 seconds. If we had been required to parse a command like move to the
wall, we would have to have gotten the meaning of the noun wall. In order to make this
possible, we keep a database of all the known objects with which the robot can interact. The
phrase right LED, for instance, will refer to the specific object in our database representing
the right LED on the robot. We find this meaning by giving each member of the database a
series of keywords. If we need to find the right LED, we first find all LEDs and then search in
this sublist for all things which are right. In the case of this command, we find the object
which represents a second and add the numeral quantity as described above.
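The lookup itself can be pictured as a set intersection over keyword-tagged objects. The sketch below assumes a simple list-of-dicts database; the field names are illustrative.

    # Illustrative keyword-intersection lookup for noun phrases like "the right LED".
    OBJECTS = [
        {"name": "left LED",   "keywords": {"LED", "light", "left"}},
        {"name": "center LED", "keywords": {"LED", "light", "center"}},
        {"name": "right LED",  "keywords": {"LED", "light", "right"}},
        {"name": "wall",       "keywords": {"wall"}},
    ]

    def lookup(*words):
        """Return every known object whose keyword set contains all the given words."""
        return [obj for obj in OBJECTS if set(words) <= obj["keywords"]]

    print(lookup("LED", "right"))   # just the right LED
    print(lookup("light"))          # all three LEDs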
getVP now takes the verb block and attaches it as the “next block” onto the prepositional
block. Now, the verb block for turn quickly left is monitored and controlled by the
prepositional block for for 5 seconds. This chain of blocks is packaged in a layer and is
finally returned as the result of the first call to getMeaning.
3.4.3 Contextual Commands
Contextual commands are handled differently: the interface saves the full parse tree of every
entered command along with the name of the verb and important characteristics of that
command. Thus, after a while, we’ll have a long history of parses and key words. Basically,
contextual commands search through this list to find the previous command being referred
to.
This may seem like a rather limited context from which to draw. However, it is sufficient
for most interactions with the robot. Consider the string of commands:
1. turn right
2. move to the wall
3. turn left for 3 seconds
4. move again
It is debatable what is actually meant by the last command. Is the prepositional phrase
to the wall included in doing the action again? We say yes: the meaning of a contextual
repeat command is to repeat exactly some command which has already been done. The
getMeaning function will recognize this last command as contextual from its parse, and it
will call the getCC function, which will search through the list of keywords from previous
commands until it finds the keyword “move.” It will then take the entire stored parse tree
and just call getMeaning on it again and return the result.
Why wouldn’t we just store the layer resulting from the semantic analysis and not
redo all the hard work? We must accommodate time-sensitive functions. When we say:
move backwards for 3 seconds, we really mean move back until 3 seconds from now (at
least, that is the meaning we arrive at after semantic analysis). However, if we then say
move again, we certainly don’t mean to move until 3 seconds from the original time of the
first command! Thus, commands sometimes have meaning which depends on the time of
the command, and repeating a command necessarily moves the calling time to the present.
To most easily accomplish this, we re-analyze the entire parse tree. Note that this is not
actually that much of a loss: creating the parse tree is by far the most time-consuming part
of the analysis process—re-analyzing is negligible compared to re-parsing (which we do not
do).
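The lookup for a contextual command is then essentially a reverse search through that history followed by a fresh call to the analyzer; the sketch below uses illustrative names.

    # Illustrative handling of "X again": find the last matching command, re-analyze it.
    def getCC(verb, history, analyze):
        """history holds (text, parse_tree, keywords) triples, oldest first."""
        for text, tree, keywords in reversed(history):
            if verb is None or verb in keywords:    # bare "again" matches the last command
                # Re-run only the semantic analysis (not the parse) so that
                # time-sensitive meanings are rebuilt relative to "now".
                return analyze(tree)
        return None                                 # nothing suitable to repeat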
3.4.4 (Embedded) Declaratives
An important class of prepositions is the group of subordinating prepositions, which allow
an embedded declarative sentence to appear inside a prepositional phrase, which in turn
governs the behavior of a command. For example, turn left if you see a wall. Here
we see the embedded declarative you see a wall. This is a special sort of prepositional
phrase, because it directly asks a question; regular prepositional phrases usually have a
question or some sort of condition built into them (move to the wall has the question “do
you see a wall” built in to it), but here we have an explicit condition on, for example, turning
left. What we need to do here (and for queries, as we will see) is evaluate the truth value
of the declarative in the current context of the robot. The meaning of the declarative, as
we discussed earlier, is its truth value: once the truth of the embedded declarative has been
evaluated, the prepositional phrase can use this truth value to govern its own inhibitory
behavior.
Our robot does not use a logic engine: that sort of system would be overkill. The
declaratives we need to understand are not adding facts to a knowledge base or anything
complicated like that; they are simply asking for the truth value of a simple sentence. In fact,
the simplicity of the robot helps us dramatically restrict the domain of possible sentences. If
we are asking about the current state of the robot, we must ask about the value of one of its
sensors, and in all cases, the only possible verb that can appear in the declarative is “see.” For
example, turn if you see a wall asks about the IR sensors, turn if you see a light
asks about the light sensors, and move until you see a line asks about the line sensors.
Therefore, the declarative analysis functions are looking only for sentences of the form “you
see X.” Obviously, this is an extremely simple subset of English. However, the restriction is
appropriate here.
The semantic analysis is done by first building a tiny logical tree, so you see a wall
becomes (V: see (N: you) (DP: a wall)). Once this has taken place, a second set of functions
looks at the tree and builds a lambda function which returns true or false depending on
whether the robot (“you”) sees “a wall.” In our database of objects, each object has a
function associated with it which senses it, so we find the object “a wall,” perhaps using
adjectives and prepositions to pare down the list of possibilities, and call its sensor function.
This process seems more complicated than is necessary, since as we’ve already said, the only
possible verb is “see:” why don’t we just find the object of the verb and not bother with the
tree? We’d like to be as general and flexible as possible; with the current system, we could
expand it to understand more verbs if that started to make sense, or, more likely, we could
pass the logical tree to a logical engine for understanding.
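Concretely, the product of the declarative analysis can be pictured as a closure over the matched object's sensor function; the helper below is an illustrative sketch, and the "sense" field is an assumption about how the object database might expose its sensor polls.

    # Illustrative construction of the truth-value closure for "you see a wall".
    def makeCondition(obj):
        """obj is an entry from the object database with a sensing function attached."""
        sense = obj["sense"]                      # e.g. a wrapper around an IR poll
        return lambda: bool(sense())              # True iff the robot "sees" it right now

    # wall = {"name": "wall", "sense": lambda: min(getIR()) == 0}
    # condition = makeCondition(wall)             # handed to the "if"/"whenever" block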
Once the declarative semantic analysis has taken place, it returns the lambda function,
which is then integrated into the preposition (inhibitory) block. In other words, the analysis of turn left if you see a wall to your right makes a lambda function out of the
embedded declarative, so the command really becomes turn left if X, where X is a
black-box lambda function. Then the sentence is analyzed as an imperative, except getVPP
uses the black box, in combination with how “if” deals with true and false, to create the verb
preposition layer.
3.4.5 Queries
Queries are just declaratives marked as needing some kind of language response (rather than
an action). We deal with queries in a way very similar to declaratives. Currently, we only
handle very simple “do” queries. This is not a language restriction imposed by the robot;
it certainly makes sense to ask the robot “what do you see?” But that kind of query is
quite complicated since it requires a search through all possible objects. We chose to only
handle the simpler type of query, such as “do you see a wall?” Here, we just analyze
the embedded declarative. From our declarative analysis functions, we get a function which
gives us the truth value of the declarative in the context of the robot. We then build a
block (and layer) out of this function, but we use a special feature of our brain class: a
layer can return the commands it wants for the motors, lights, and speaker, and it can pass
meta-commands such as “kill me,” but it can also send messages for printing. Our query-analysis functions produce a layer which immediately kills itself and sends a message “yes”
or “no” depending on the truth value of the declarative. In other words, we still produce
a layer from a query, but it doesn’t perform any action; it just sends a message. In this
way, we maintain the communication between the brain and the parser/analyzer: for every
command, we produce a layer and add it to the brain. For imperatives, this works as already
described. For queries, the layer we add to the brain is special, but it’s still a layer. Keep
in mind that the brain is not directly connected to the robot. It must be connected through
a data structure we created called a nerves interface, which passes commands to the actual
robot from the brain and relays the sensor data. To answer the query without adding a layer,
we’d have to connect the analyzer directly to the nerves instantiation and bypass the brain.
This goes against both our attempts to keep everything as compartmentalized as possible
and our desire for consistent representation of meaning.
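A query layer can therefore be sketched as follows; the field names and the meta-command string are illustrative. It answers once, asks to be removed, and never touches the motor, light, or speaker paths.

    # Illustrative query layer: send an answer for printing, then ask to be removed.
    class QueryLayer:
        def __init__(self, condition):
            self.condition = condition             # the closure from declarative analysis

        def act(self, senses):
            return {"message": "Yes!" if self.condition() else "No",
                    "meta": "kill me"}             # no wheel/speaker/LED commands at all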
4 Examples
Here we exhibit a set of example commands and the behaviors they induce in the robot.
4.1 Example: Simple Commands
Simple commands ask for a single, uncomplicated behavior. A selection of nice simple
commands:
• move —The robot moves forward for a small amount of time (about 2 seconds)
• turn —The robot turns a little
• beep —The robot beeps
• play a D —The robot plays a D (note).
4.2 Example: Adjectives, Adverbs
Things can get more interesting if we allow ourselves some adjectives and adverbs. All of
the following commands have obvious results. Note that none of them specify the length of
time for which they should continue, so they all run for the default “small amount of time,”
which is about 2 seconds.
• turn left
• move backwards
• turnon your left light
• quickly turn right
• quickly go back
4.3 Example: Simple Prepositions
Prepositions allow us to specify conditions under which an action should be performed. The
following commands do what you would expect.
• move to the wall
• turn left to the wall
These simple prepositions do not allow us to do too much. We’ll see later that subordinating prepositions are much more powerful.
4.4 Example: Contextual Commands
With contextual commands, we can refer to previous commands. These commands aren’t too
complicated, but they are interesting. Assume that we’ve already given the robot the commands (in order): move backwards, turn right, turn left, beep, move back quickly.
Then we could give the following commands:
• again — The robot would move back quickly again.
• beep again — The robot would beep again (not terribly exciting)
• turn again — The robot would turn left.
4.5 Example: Embedded Declaratives
With subordinating prepositions, we can embed declarative sentences inside preposition
phrases, and these embedded declaratives now govern how the prepositional phrase inhibits
the verb. The results of these commands are self-evident. Note that we no longer have the
default 2-second duration, since these commands have explicit termination conditions.
We can now also start to make use of noun prepositions to specify which wall (right or
left), for instance, the robot sees.
• move until you see a wall
• turn left if you see a bright light
• beep whenever you see a light
• go back quickly if you see a line
• turn left quickly if you see a wall to your right
4.6 Example: Queries
Queries allow us to ask about the current state of the robot. The robot responds “Yes!” or
“No” depending on the answer to the question.
• Do you see a wall
• Do you see a bright light
• Do you see a wall to your right
4.7 Example: Complete Interactions
Now we can put everything together. First, let’s use the layered, combining ability of the
brain to have the robot move around avoiding walls:
turn left if you see a wall to your right
turn right if you see a wall to your left
move for 45 seconds
The robot will move around for 45 seconds and avoid any walls that it sees.
We can also use queries to, for instance, find a corner (“>>>” denotes commands, the
other lines are robot responses):
>>> do you see a wall
No
>>> move to a wall — the robot moves forward until it sees a wall anywhere (right,
left, or in front)
>>> turn left until you see a wall to your front — make sure the robot is
directly facing the wall it just found
>>> do you see a wall — just checking
Yes!
>>> turn left for 90 degrees — make the robot face parallel to the wall. We might
also have done the command turn left until you can’t see a wall.
>>> move until you see a wall
The robot should now be facing a wall and be parallel to the first wall it found, so it’s
in a corner. In both of these examples, we can replace “wall” with “light” or “line” or a
modified version of any of them.
5 Progress and Future Work
Probably the best way to understand the current situation is to look at the schematic in
Figure 6.
The major incomplete area of the proposed NLPU is the robot control architecture.
Python, the language in which our system is implemented, has support for multi-threading.
However, there are technical issues with IDLE, a development environment for Python, which
have stalled our efforts in this area. This is unfortunate, because only with multi-threading can we take full advantage of our layering, subsumptional architecture.
6 Related Work and Background
Our project is a blend of natural language processing and robotics, and there are a few
key sources which inspired many of our design choices and made our project possible. On
a very basic level, we relied on all the research done to make the foundations of natural
language processing, like parsing, easy to do. In particular, we used the Natural Language
Toolkit (NLTK-Lite) for Python, which implements a recursive-descent parser and a tree data
structure. Using NLTK-Lite allowed us to focus our efforts on the design of the language
model and the semantic analysis.
The basic goal of our project is inspired most by research in natural language processing.
In a very broad sense, this field attempts to develop software which can interact with a person
using natural language, and the products of this research are varied. One simple example
is the machine which answers telephones and can direct calls based on spoken commands
from a user. This machine does mostly phonetic and syntactic processing: the challenge is
understanding which words were spoken. On the other end of the natural language processing
spectrum are programs which take typed natural language sentences, such as paragraphs of
a story, and can reason about what the meaning of the input is. These programs tackle
the problem of semantic and pragmatic analysis, and it is these latter programs which most
inspired our research.
In terms of semantic analysis, there are two distinct areas: research in more standard,
linguistics-oriented efforts to create programs which can understand declarative sentences
and reason about them, and research in programs which can carry out natural language commands, that is, procedural semantics. Clearly, our project falls more in line with the second area, but for basic reading, [5] is a good example of the first type of semantic analysis and demonstrates that software which understands a short story (among other things) was quite feasible in the 1970s.

[Figure 6: The proposed NLPU. The same pipeline as Figure 3, but split across two threads: Thread 1 contains the Language Interface (input, NLTK-lite grammar/parser, parse tree, semantic analyzer, layer), and Thread 2 contains the Brain Control Interface, which handles queries and issues raw Myro commands over the Myro/hardware connection to the Scribbler.]
While this research is inspiring for us, we are interested more in creating a system which
works with procedural semantics. A very important paper here is [7], from 1972, which
develops a system for logical deduction and understanding of commands which require complicated behaviors. This is a simulated robot arm program which can manipulate blocks.
As an example of the power of this system, if a red block lies on top of a blue block, the
system can figure out that if it is required to move the blue block, it will need to set the red
block aside. Though everything is simulated, this is a clear predecessor to the project we
undertook and was our major source for clarification and understanding of the challenges we
faced.
For general background in procedural semantics and natural language processing, [4] and [2] were very useful. Following in the footsteps of Winograd are [6] and [3].
The most influential research we looked to for reference was [1]. This paper argues that
the standard way of building AI systems has the wrong approach. The vast majority of
research attempts to create an AI system in a simplified environment (all of the papers we
have discussed so far do this). For instance, the usual habitat of an AI robot is a block
world: the things in the world with which the robot can interact are just blocks, perhaps
with color or texture. The hope is that creating a good AI system in a simplified world can
be slowly extended to the creation of a good AI system in the real world. Brooks argues
that we should instead start with simple robots in the real world: this paper describes the
development of several robots which operate in the real world, performing relatively simple
tasks, such as searching a building for soda cans to pick up and recycle. The key insight here is
the subsumptional architecture: one process directs the robot (the “move around” process)
until the robot senses a situation requiring specialized behavior and a higher-precedence
process takes over (e.g. the “grab a soda can” process). The idea of having multiple levels of
processes which inhibit each other played directly into our design of the model of thought of
the robot and our design of “block” as a linguistic idea (and an implemented data structure).
Our realization of the ideas put forward by Brooks has an ironic twist: the very title of [1]
contains the thesis that intelligent robots do not need (and in fact are hindered by) a fixed representation of the world. Note that we took the idea of a subsumptional model of
control, made it dynamic, and added representations! The inhibitory model of procedural
semantics that we inherited from this work made the most difference for us, and it, more
than anything else, made our project possible.
A Grammar
Here we show our complete grammar. Some abbreviations: C is “command” (the start
symbol). IS, CC, and DS are “imperative sentence,” “contextual command,” and “declarative
sentence.” In addition, most standard phrase names are prefixed with a D to indicate that
they are phrases in a declarative sentence.
C -> IS | CC | DS | Q
Q -> DQ | WH WHQ
WHQ -> DO DWHQ
DQ -> DO DS
DWHQ -> DS
DO -> do
WH -> what
CC -> V CAV | CAV
CAV -> again | more
DS -> DP DVP
DVP -> (AdvP | Adv) DV (AdvP) (VPP)
DV -> (AUX) RDV
DP -> NP | D NP
IS -> VP
VP -> (Adv) V (AdvP) (VDP) (VPP)
VDP -> (D) VNP
VNP -> VN | VN AP | AP VN
NP -> N | AP NP | N NPP | AP NP NPP | NumP NP | N AP
AP -> A
AdvP -> Adv AdvP |
NPP -> P DP
VPP -> P DP | SubP DS | P DP VPP | SubP DS VPP
NumP -> Num
Num -> 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
       20 | 45 | 30 | 90 | 180 | 360 | 270 | one |
       two | three | four | five | six | seven | eight |
       nine | ten
Adv -> quickly | slowly | carefully | right | left |
       straight | back | backwards | forward | forwards |
       around
VN -> light | LED | LEDs | sound | beep | song | note |
      A | B | C | D | E | F | G
N -> I | you | little | room | dark | lights | yourself |
     wall | light | box | line | seconds | second | foot |
     feet | inches | meters | meter | degrees | right |
     left | ahead | front
D -> the | your | a | an | that |
A -> right | left | center | high | low | black |
     white | dark | bright | ahead | front
P -> to | until | for | along | toward | towards |
     from | into
SubP -> after | if | unless | until | when | whenever
V -> face | spin | move | go | turn | turnon |
     turnoff | play | light | blink | beep
AUX -> can | can’t | dont
RDV -> face | spin | move | go | turn | turnon |
       turnoff | play | light | blink | beep | say |
       hear | see | is
B Acceptable Sentences
Here is an example suite of sentences. This list is not exhaustive—most standard combinations of the phrases shown here are acceptable.
B.1 Imperatives

B.1.1 Moving
Here, go can be substituted for move, and light can almost always be substituted for wall:
move forward
move back
move backwards
move forward a little
move forward for two feet
move forward for 4 seconds
move forward slowly
move forward quickly
move forward slowly for two feet
move
move to the line
move to the wall
move forward to the wall
move until you can see the wall
move to the light
quickly move forward to the wall
move for 4 seconds whenever you see a wall to your right
B.1.2 Turning

turn right
turn left
turn right for 2 seconds
turn right for 90 degrees
turn right a little
turn right until you see a wall
turn right slowly
turn right quickly
slowly turn
turn slowly right
turn right until you can’t see a wall whenever you see a wall to your left
B.1.3 Lights
turnon your left light
turnon your lights
turn off your lights
turnon your right light when you see the wall
turnon your lights until you see the wall
turnon your left light if you see a wall to your left
B.1.4 Sound
play an A
beep when you see the wall
beep for 2 seconds when you can’t see a bright light
B.2 Queries
do you see a wall
do you see a bright light to your left
do you see a wall to your right
B.3 Contextual Commands
again
move again
play again
References
[1] Rodney A. Brooks. Intelligence without representation. Artificial Intelligence, 47:139–159, 1991.
[2] Colleen Crangle and Patrick C. Suppes. Language and Learning for Robots. CSLI Publications, Stanford, CA, USA, 1994.
[3] Masoud Ghaffari, Souma Alhaj Ali, and Ernest L. Hall. A perception-based approach
toward robot control by natural language. Intelligent Engineering Systems through Artificial Neural Networks, 14, 2004.
[4] Barbara J. Grosz, Karen Sparck-Jones, and Bonnie Lynn Webber, editors. Readings
in natural language processing. Morgan Kaufmann Publishers Inc., San Francisco, CA,
USA, 1986.
[5] R. C. Schank and C. K. Riesbeck. Lisp. In R. C. Schank and C. K. Riesbeck, editors,
Inside Computer Understanding: Five Programs Plus Miniatures, pages 41–74. Erlbaum,
Hillsdale, NJ, 1981.
[6] Stuart C. Shapiro. Natural language competent robots. IEEE Intelligent Systems,
21(4):76–77, 2006.
[7] Terry Winograd. A procedural model of language understanding. In Computation &
intelligence: collected readings, pages 203–234. American Association for Artificial Intelligence, Menlo Park, CA, USA, 1995.