The Symbol Grounding Problem
has been solved. So what’s next?
Luc Steels
Sony Computer Science Laboratory Paris
VUB AI Lab, Vrije Universiteit Brussel
November 5, 2006
In the nineteen eighties, a lot of ink was spent on the question of symbol
grounding, largely triggered by Searle’s Chinese Room story. Searle’s article
had the advantage of stirring up discussion about when and how symbols
could be about things in the world, whether intelligence involves representations or not, what embodiment means and under what conditions cognition
is embodied, etc. But almost twenty years of philosophical discussion have shed little light on the issue, partly because it has been mixed up with arguments about whether artificial intelligence is possible at all. Today I believe that sufficient progress has been made in cognitive science and AI that we can move forward and study the processes involved in representations, instead of worrying about the general framework within which this should be done.
1 Symbols about the world
As suggested by Glenberg, De Vega and Graesser [4], let us start from Peirce
and the (much longer) semiotic tradition which makes a distinction between
a symbol, the objects in the world with which the symbol is associated, for
example for purposes of reference, and the concept (interpretant) associated
with the symbol (see figure 1).
Here we are interested primarily in triads for which there is a method that constrains the use of a symbol for the objects with which it is associated.

Figure 1: The semiotic triad relates a symbol, an object, and a concept applicable to the object. The method is a procedure to decide whether the concept applies or not.

The method could be a classifier, a perceptual/pattern recognition process that
operates over sensori-motor data to decide whether the object ’fits’ with
the concept. If an effective method is available, then we call the symbol
grounded. There are a lot of symbols which are not about the real world but
about abstractions of various sorts and so they will never be grounded this
way. Some concepts are also about other concepts.
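The triad just described can be sketched in code. The following is a minimal illustration (all names and the toy colour classifier are invented for this sketch, not taken from the paper's experiments): a symbol counts as grounded exactly when an effective method is attached that decides, over sensori-motor data, whether the concept applies.

```python
# A minimal sketch of the semiotic triad: symbol, concept, and a method
# (classifier) over sensori-motor data. All names here are illustrative.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SemioticTriad:
    symbol: str                # the external form, e.g. a word
    concept: str               # the interpretant associated with the symbol
    method: Optional[Callable[[dict], bool]] = None  # classifier over sensor data

    def is_grounded(self) -> bool:
        # A symbol is grounded when an effective method is available.
        return self.method is not None

    def applies_to(self, sensor_data: dict) -> bool:
        if not self.is_grounded():
            raise ValueError(f"symbol {self.symbol!r} is not grounded")
        return self.method(sensor_data)

# A toy method: "red" applies when the red channel dominates the reading.
red = SemioticTriad(
    symbol="red",
    concept="RED",
    method=lambda d: d["r"] > d["g"] and d["r"] > d["b"],
)

print(red.is_grounded())                              # True
print(red.applies_to({"r": 200, "g": 40, "b": 30}))   # True
```

A symbol for an abstraction would simply have `method=None`: it can still take part in semiotic relations, but it is not grounded in this sense.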
The Semiotic Map
Together with these semiotic relations among the entities in the triad, there
are some additional relations that provide more pathways for semantic navigation between concepts, objects, and symbols. Objects occur in a context
and have various domain relationships with each other so that they can be
grouped into sets. Symbols co-occur with other symbols in texts and speech,
and this statistical structure can be picked up using inductive methods (as in
the Landauer-Dumais [9] Latent Semantic Analysis proposal). Concepts may
have semantic relations among each other, for example because they tend to
apply to the same objects. There are also relations between methods, for
example because they both use the same feature of the environment or use
the same technique for classification. Humans effortlessly use all these links
and make jumps far beyond what would be logically warranted.
The various links between objects, symbols, concepts and their methods,
constitute a semiotic map in the form of a giant network, which gets dynamically expanded and reshuffled every time the individual interacts with the
world or others. Individuals navigate through this network for purposes of
communication: when a speaker wants to draw the attention of an addressee
to an object, he can use a concept whose method applies to the object, then
choose the symbol associated with this concept and render it in speech or
some other medium. The listener gets the symbol, uses his own vocabulary
to retrieve the concept and hence the method, and applies the method to
decide which object might be intended. Thus if the speaker says "Could you pass me the wine please", then "the wine" refers to a bottle on the table, and the interaction (the language game) is a success if the addressee indeed takes up the bottle and gives it, or perhaps pours some wine into the speaker's glass.
Clearly, grounded symbols play a crucial role in communication about the
real world, but the other links may also play important roles. For example,
the sentence earlier was "Could you pass me the wine please", whereas in fact the speaker wanted the bottle that contained the wine. The speaker could also have said "Could you pass me the Bordeaux please" and nobody would think that she requested to pass the city of Bordeaux. In this paper, however, we will focus only on grounded symbols, as these are the main issue in the symbol grounding debate.
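The navigation path described above (speaker: object, concept, symbol; hearer: symbol, concept, object) can be sketched as a tiny language game. The scene, vocabulary, and function names below are invented for illustration, and the two agents are assumed to already share a coordinated vocabulary:

```python
# A sketch of one round of a language game over an already-shared vocabulary.
# The speaker finds a concept whose method applies to the intended object and
# utters its symbol; the hearer applies her own method to find the object.

scene = [
    {"id": "bottle", "contains": "wine"},
    {"id": "glass", "contains": "water"},
]

# Shared vocabulary: symbol -> method (a classifier over objects).
vocabulary = {
    "the wine": lambda obj: obj.get("contains") == "wine",
    "the water": lambda obj: obj.get("contains") == "water",
}

def speak(intended, vocab):
    """Pick a symbol whose associated method applies to the intended object."""
    for symbol, method in vocab.items():
        if method(intended):
            return symbol
    return None  # no grounded symbol available for this object

def interpret(symbol, vocab, scene):
    """Apply the method behind the symbol to decide which object is intended."""
    method = vocab[symbol]
    return [obj for obj in scene if method(obj)]

utterance = speak(scene[0], vocabulary)
print(utterance)                                          # the wine
print(interpret(utterance, vocabulary, scene)[0]["id"])   # bottle
```

The game succeeds because both agents' methods pick out the same object; nothing here requires the methods themselves to be identical, only sufficiently coordinated.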
Symbols in Computer Science
The term symbol has a venerable history in philosophy and linguistics, and I have tried to sketch its meaning. But it was hijacked in the late fifties
by computer scientists (more specifically AI researchers) and adopted in the
context of programming language research. The notion of a symbol in (so
called symbolic) programming languages like LISP or Prolog is quite precise.
It is a pointer (i.e. an address in computer memory) to a list structure which
involves a string known as the Print Name (to read and write the symbol),
possibly a value bound to this symbol, possibly a definition of a function
associated with this symbol, and a list of properties and values associated
with the symbol.
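The list structure just described can be recreated in a few lines. The sketch below is in Python rather than LISP, with invented names; it shows the essential machinery of a computer-science symbol: a unique interned record with a print name, a value slot, a function slot, and a property list.

```python
# A sketch of the computer-science symbol, rebuilt in Python. Each symbol is
# an interned record carrying a print name, an optional value, an optional
# function definition, and a property list -- as in LISP.

class Symbol:
    def __init__(self, print_name):
        self.print_name = print_name   # string used to read/write the symbol
        self.value = None              # value bound to the symbol, if any
        self.function = None           # function definition, if any
        self.plist = {}                # property/value pairs

_obarray = {}  # intern table: print name -> unique Symbol (one pointer per name)

def intern(name):
    """Return the unique symbol with this print name, creating it on first use."""
    if name not in _obarray:
        _obarray[name] = Symbol(name)
    return _obarray[name]

# Interning guarantees that two reads of the same name yield the same pointer.
a = intern("apple")
a.value = 42
a.plist["color"] = "red"
print(intern("apple") is a)      # True: same address in memory
print(intern("apple").value)     # 42
```

This is exactly the sense in which a symbolic program can create and discard thousands of symbols on the fly; none of it implies any philosophical commitment.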
The function of symbols can be recreated in other programming languages (like C++), but then the programmer has to take care of reading and writing strings, turning them into internal pointers, and introducing other data structures to associate information with the symbol. In a symbolic programming language all that is done automatically. So a symbol is a very
useful computational abstraction (like the notion of an array or structure)
and a typical LISP program might involve hundreds of thousands of symbols,
created on the fly and reclaimed for memory (garbage collected) when the
need arises. Programs that use symbols are called symbolic programs, and almost all sophisticated AI technology, as well as much of the web technology behind search engines, rests on this elegant and enormously powerful concept.
But clearly this notion of symbol is not related to anything I discussed
in the previous paragraphs. Unfortunately, when philosophers and cognitive scientists who are not knowledgeable about computer programming read the AI literature, they come with the baggage of their own field and immediately assume that anyone who uses a symbolic programming language subscribes to all sorts of philosophical doctrines. All this has given rise to what is probably the greatest terminological confusion in the history of science. The debate about the role of symbols in cognition or intelligence is totally unrelated to whether one uses a symbolic programming language or not, and the rejection of so-called 'traditional AI' is mostly based on misunderstandings. A neural network, for example, may be implemented technically using symbols, but these are computer science symbols, not the philosophical kind.
The Symbol Grounding Problem
Let me return now to the question originally posed by Searle: Can a
robot deal with grounded symbols? More precisely, is it possible to build
an artificial system that has a body, sensors and actuators, signal and image
processing and pattern recognition processes, and information structures to
store and use semiotic maps, and that uses all of this, for example, in communication about the real world?
Without hesitation, my answer is yes. Already in the early seventies,
AI experiments like Winograd’s SHRDLU or Nilsson’s Shakey achieved this.
Shakey was a robot moving around and accepting commands to go to certain
places in its environment. To act appropriately upon a command like ’go to
the next room and approach the big pyramid standing in the corner’, it is
necessary to perceive the world, construct a world model, parse sentences
and interpret them in terms of this world model, and then make a plan and
execute it. It was slow, but then it used a computer with roughly the same power as the processors we find in modern-day hotel door knobs, and about as much memory as we find in yesterday's mobile
phones.
So what was all the fuss created by Searle about? Was Searle's paper (and the subsequent philosophical discussion) based on ignorance or on a lack of understanding of what was going on in these experiments? Probably partly.
It has always been popular to bash AI because that puts one in the glorious
position of defending humanity.
But one part of the criticism was justified: everything was programmed by human designers. The semiotic relations were not autonomously established
by the artificial agent but carefully mapped out and then coded by human
programmers. In the case of natural intelligence, this is clearly not the case.
The mind/brain is capable of autonomously developing an enormous repertoire
of concepts to deal with the environment and to associate them with symbols which are invented, adopted, and negotiated with others. This early
AI research did however make it very clear that the methods needed to
ground symbols had to be hugely complicated, domain-specific, and context-sensitive if they were to work at all. Continuing to program these methods
by hand was therefore out of the question and that route was more or less
abandoned.
So the key question for symbol grounding is actually another question,
well formulated by Harnad [7]: if someone claims that a robot can deal with grounded symbols, we expect that this robot autonomously establishes the
semiotic map that it is going to use to relate symbols with the world.
Semiotic Dynamics
Very recent developments in AI have now shown that this second question
can also be resolved. We now know how to conceive of artificial systems
that autonomously set up and coordinate semiotic landscapes and there is a
growing number of concrete realisations achieving this [11]. That is why I
believe we can now say that the symbol grounding problem is solved. The
breakthrough idea is to set up a particular kind of ’semiotic’ dynamics. Each
agent is endowed with the capacity to invent new concepts which initially
act like hooks. The concepts get progressively more meaning by associating
methods with them that relate the concepts to the world or by associating
symbols with them to coordinate their use within the group. Agents interact
with each other, for example through language games, and as part of the
interaction they adapt both their grounding methods and the relation to
Figure 2: Results of experiments showing the coordination of color categories
in a cultural evolution process. Agents play guessing games naming color
distinctions and coordinate both their lexicons and color categories through
the game. The graph shows category variance with (bottom graph) and without (top graph) language. Category variance is much less if agents use language
to coordinate their categories. The color prototypes of all agents are shown
in the right top corner with (right) and without (left) use of language.
symbols. One way to implement this adaptation is to have weighted links that
get updated and changed based on success in interactions. If all this is done
right, we see a gradual convergence towards shared semiotic systems among
the agents, in the sense that their concepts become sufficiently coordinated
to make successful communication possible.
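The dynamics described in this paragraph can be sketched as a minimal naming game. The simulation below uses invented parameters and the simplest possible adaptation rule (adopt on failure, prune to the winning word on success), not the exact models of the cited experiments, but it exhibits the same qualitative behaviour: a shared lexicon emerges through purely local interactions.

```python
# A minimal naming-game sketch: agents invent words for objects, adopt each
# other's words, and prune on success. No global control, no telepathy; the
# population nevertheless converges on a shared lexicon. Parameters invented.

import random
random.seed(1)

N_AGENTS, N_OBJECTS, N_GAMES = 10, 3, 3000
agents = [{obj: [] for obj in range(N_OBJECTS)} for _ in range(N_AGENTS)]
word_counter = 0

def invent_word():
    """Invent a fresh symbol, as agents do when they lack a word for an object."""
    global word_counter
    word_counter += 1
    return f"w{word_counter}"

successes = []
for _ in range(N_GAMES):
    speaker, hearer = random.sample(agents, 2)
    obj = random.randrange(N_OBJECTS)           # shared topic of this game
    if not speaker[obj]:
        speaker[obj].append(invent_word())      # invent when no word is known
    word = random.choice(speaker[obj])
    if word in hearer[obj]:
        speaker[obj] = [word]                   # success: both prune their
        hearer[obj] = [word]                    # inventories to the winner
        successes.append(1)
    else:
        hearer[obj].append(word)                # failure: hearer adopts the word
        successes.append(0)

late_success = sum(successes[-500:]) / 500
print(f"success rate in last 500 games: {late_success:.2f}")
```

Early games mostly fail while competing inventions circulate; late games mostly succeed, which is the convergence the text describes.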
It has been shown convincingly that interaction is crucial for coordinating
the conceptual repertoires of agents. There is no global control, no telepathy
and no prior (innate) programming of concepts for the agents, and hence
coordination can only take place through adaptation based on local one-on-one interaction (see for example figure 2 from [15]). Amazingly, it is possible
to orchestrate the interaction between groups of agents so as to spontaneously
self-organise communication systems without human intervention. Research
has been steadily pushing up the complexity of these communication systems
from simple lexical one-word utterances to complex grammatical multi-word
utterances with rich underlying conceptualisations [14].
Alignment in dialogue
These fascinating developments are in sync with observations gaining ground (again) in various cognitive studies, showing that the conceptualisations used for language are not only culture- and individual-specific but also imposed on reality rather than statistically derived from it. More specifically: studies of
perceptually grounded categories, like color or sound distinctions, have shown
that there are important differences between cultures [3] and that these differences have co-evolved with the languages used in these cultures. Similar
studies have shown that there are also tremendous individual differences both
in categorisation and in the use of symbols, even within the same community
[17]. So the idea that there is an objective, universally shared unique ’meaning’ for human symbols is a fallacy. This may explain why neural networks,
which usually implement some form of statistical induction, have never been
able to go beyond toy examples with carefully prepared data sets. It also
explains why projects like CYC, which try to define 'the' basic human concepts appearing in language or thought once and for all, are chasing a holy grail that can never be found.
If conceptualisations and the use of symbols are really individual and culture-based, as well as domain- and situation-specific, how come we humans are ever able to communicate and share thoughts at all? The answer
is in a phenomenon that psycholinguists are currently studying intensely,
namely alignment (see for example [2], [10]). For the purposes of a specific
dialogue or interaction, humans adapt their conceptualisations and language
surprisingly quickly, so that efficient communication becomes possible. They
do this at all levels: the phonetic (the structure of the sign), the lexical, the grammatical, but also the conceptual.
The experiments with semiotic dynamics on autonomous robots or software agents can be seen as a way to operationalise the cognitive mechanisms
that are needed to explain how these alignment phenomena as observed by
psycholinguists are computationally possible. It is perhaps better to talk
about co-ordination instead of alignment. The idea is that agents do not really align (in the sense of making equal) their semiotic systems, which is basically impossible: there is no telepathic way to inspect the brain states of another, and the semiotic map of each individual is based on a history of millions of situated interactions. Rather, they co-ordinate their classifiers and use of symbols sufficiently, and temporarily, to make
joint action and dialogue possible. The crucial insight is that the grounding of symbols is not a once-and-for-all affair, but a matter of progressive and
continuous coordination. Classifiers operationalising the use of a symbol are
constantly adapted by the agents, both to deal better with the environment
and to yield more success in communication with others within the same
community.
Another fascinating development is that we currently see systems emerging on the web that try to orchestrate and support human semiotic dynamics.
They take the form of social tagging sites (such as Flickr, del.icio.us, Connotea, Last.fm, etc.) that create peer-to-peer networks among users and allow them to associate tags (i.e. symbols) with objects that they upload and thus make available to others. The semiotic dynamics seen in the collaborative interaction among hundreds of thousands of users resembles that seen in human populations or populations of artificial agents: winner-take-all phenomena around a dominant tag, new tags emerging, shifts in tag use, etc. [5], [16].
2 Representations and Meanings
We now come to the second issue: representations. The notion of representation had a venerable history in philosophy, art history, etc. before it was hijacked by computer scientists, who came to use it without all the baggage that philosophers associate with the notion of representation. But
then representations returned into cognitive science through the backdoor
of AI, resulting in some rather confusing debates. Let me try to trace this
shift of meaning, starting with the original pre-computational notion of representation, as we find it for example in the famous essay on the hobbyhorse by Gombrich [6], and as we still find it in cultural studies.
Defining Representations
A representation is a stand-in for something else, so that it can be made
present again, ’re’-’present’-ed. Anything can be a representation of anything
by fiat. For example, a pen can be a representation of a boat or a person
or upward movement, or whatever. The magic of representations happens
simply when one person decides to establish that x is a representation for y,
and others could possibly agree with this or understand that this representational relation holds. There is nothing in the nature of an object that makes
it a representation or not; it is rather the role the object plays in subsequent
interaction. Of course it helps a lot if the representation has some properties that help others to guess what it might be a representation of. But a
representation is seldom a picture-like copy of what is represented.
It is clear that symbols are a particular type of representation, one where
the physical object used has a rather arbitrary relation to what is represented.
It therefore follows that all the remarks made earlier about symbols are valid
for representations as well. The term symbol seems more often used for
a building block of representations, whereas a representation can be more
complex as it is typically a combination of symbols which compositionally
express more global structured meanings.
Human representations (and more complex symbols like grammatical sentences) have some additional features. First of all they seldom represent
physical things, but rather meanings. A meaning is a feature which is relevant in the interaction between the person and the thing being represented.
For example, a child may represent a fire engine by a square painted red.
Why red? Because this is a distinctive important feature of the fire engine for the child. To represent something we select a number of important
meanings and we select ways to invoke these meanings in ourselves or others.
The invocation process is always indirect and requires inference. Without
knowing the context, personal history, prior use of representations, etc. it
is often very difficult to decipher representations. Second, human representations typically involve perspective. Things are seen and represented from
particular points of view and often these perspectives are mixed together.
A nice example of a representation is shown in figure 3. Perhaps you
thought that this drawing represents a garden, with the person in the middle
watering the plants. Someone else might have interpreted this as a kitchen
with pots and pans and somebody cooking a meal. As a matter of fact, the
drawing represents a British double-decker bus (the word “baz” is written on
the drawing). Once you adopt this interpretation, it is easy to guess that
the bus driver must be sitting on the right hand side in his enclosure. The
conductor is shown in the middle of the picture. The top shows a set of
windows of increasing size and the bottom a set of wheels. Everything is
enclosed in a big box which covers the whole page. This drawing has been
made by a child, Monica, when she was 4,2 years old.
Figure 3: Drawing by a child called Monica. Can you guess what meanings this representation expresses?

Clearly the drawing is not a realistic depiction of a bus. Instead it expresses some of the important meanings of a bus from the viewpoint of Monica. Some of the meanings have to do with recognising objects, in this case recognising whether a bus is approaching so that you can get ready to get on
it. A bus is huge. That is perhaps why the drawing fills the whole page,
and why the windows at the top and the wheels at the bottom are drawn
as far apart as possible. A bus has many more windows than an ordinary
car, so Monica has drawn many windows. Same thing for the wheels. A bus
has many more wheels than an ordinary car and so many wheels are drawn.
The exact number of wheels or windows does not matter; there need only be
enough of them to express the concept of ’many’. The shape of the windows
and the wheels has been chosen by analogy with their shape in the real world.
They are positioned at the top and the bottom as in a normal bus viewed
sideways. The concept of ‘many’ is expressed in the visual grammar invented
by this child.
Showing your ticket to the conductor or buying a ticket must have been
an important event for Monica. It is of such importance that the conductor,
who plays a central role in this interaction, is put in the middle of the picture
and drawn prominently to make her stand out as foreground. Once again,
features of the conductor have been selected that are meaningful, in the sense
of relevant for recognising the conductor. The human figure is schematically
represented by drawing essential body parts (head, torso, arms, legs). Nobody fails to recognise that this is a human figure, which cannot be said
for the way the driver has been drawn. The conductor carries a ticketing
machine. Monica’s mother, with whom I corresponded about this picture,
wrote me that this machine makes a loud “ping” and so it would be impossible not to notice it. There is also something drawn on the right side of
the head, which is most probably a hat, another characteristic feature of the
conductor. The activity of the conductor is also represented. The right arm
is extended as if ready to accept money or check a ticket.
Composite objects are superimposed and there is no hesitation to mix
different perspectives. This happens quite often in children’s drawings. If
they draw a table viewed from the side, they typically draw all the forks,
knives, and plates as separate objects ‘floating above’ the table. In figure
3, the bird’s-eye perspective is adopted so that the driver is located on the
right-hand side in a separate box and the conductor in the middle. The
bus is at the same time viewed from the side, so that the wheels are at
the bottom and the windows near the top. And then there is the third
perspective of a sideways view (as if seated inside the bus), which is used to
draw the conductor as a standing up figure. Even for drawing the conductor,
two perspectives are adopted: the sideways view for the figure itself and the
bird’s-eye view for the ticketing machine so that we can see what is inside.
Multiple perspectives in a single drawing are later abandoned as children try to make their drawings ‘more realistic’ and hence less creative. But they may reappear in artists' drawings. For example, many of Picasso's
paintings play around with different perspectives on the same object in the
same image.
The fact that the windows get bigger towards the front expresses another
aspect of a bus which is most probably very important to Monica. At the
front of a doubledecker bus there is a huge window and it is great fun to sit
there and watch the street. The increasing size of the windows reflects the desirability of being near the front. Size is a general representational tool used
by many children (and also in medieval art) to emphasise what is considered
to be important. Most representations express not only facts but above all
attitudes and interpretations of facts. The representations of very young
children, which seem to be random marks on a piece of paper, are for them
almost purely emotional expressions of attitudes and feelings, like anger or
love.
Human representations are clearly incredibly rich and serve multiple purposes. They help us to engage with the world and share world views and
feelings with others. But let us now turn to the computer science notion of
representation.
C-representations and m-representations
In the fifties, when higher-level computer programming started to develop, computer scientists began to adopt the term 'representation' for data structures that hold information for an ongoing computational process. For example, in order to calculate the amount of water left in a tank with a hole
in it, you have to represent the initial amount of water, the flow rate, the
amount of water after a certain time, etc., so that calculations can be done
over these representations. Information processing can be understood as designing representations and orchestrating the processes for the creation and
manipulation of these representations.
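The tank example can be written out directly: the c-representations are just variables standing in for physical quantities so that calculations can run over them. The numbers below are invented, and the flow rate is assumed constant for simplicity:

```python
# The tank example as c-representations: plain variables stand in for physical
# quantities so that calculations can be done over them. Numbers are invented,
# and the leak rate is assumed constant.

initial_volume = 100.0   # litres in the tank at t = 0
flow_rate = 2.5          # litres leaking out per second
t = 12.0                 # seconds elapsed

volume_left = max(initial_volume - flow_rate * t, 0.0)  # never below empty
print(volume_left)   # 70.0
```

Nothing more is claimed of these representations than that they can be created, transformed, and stored; that is precisely the narrower computer-science sense discussed next.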
Clearly computer scientists (and ipso facto AI researchers) are adopting
only one aspect of representations: the creation of a ’stand-in’ for something
within the physical world of the computer so that it can be transformed,
stored, transmitted, etc. They are not thinking of the much more complex
process of meaning selection, representational choice, composition, perspective-taking, inferential interpretation, etc., which are all an important part
of human representation-making. Let me use the term c-representations to
mean representations as used in computer science, and m-representations for
the original meaning-oriented use of representations in the humanities.
In building AI systems there can be no doubt that c-representations
are needed to do anything at all; and among computational neuroscience researchers it is now accepted that there must be something like c-representations used in the brain, i.e. information structures for visual processing, motor control, and so on. Mirror neurons are a clear example of c-representations
for actions. So the question is not whether or not c-representations are used
in cognition but rather what their nature might be.
As AI research progressed, a debate arose between advocates of ‘symbolic’
c-representations and ‘non-symbolic’ c-representations. Simplified, symbolic
c-representations use building blocks that correspond to categories, and they are the hallmark of the logic-based approach to AI, whereas non-symbolic c-representations use continuous values and are therefore much more directly
related to the real world. For example, we could have a sensory channel
picking up the wavelength of light, and numerical values on this channel (a
non-symbolic c-representation), or we could have categories like ‘red’, ‘green’,
etc. (a symbolic c-representation). Similarly we could have an infrared or
sonar channel that reflects numerically the distance of the robot to obstacles (a non-symbolic c-representation), or we could have a discrete symbol
representing ’obstacle-seen’ (a symbolic c-representation).
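The colour example can be made concrete. The sketch below contrasts the two styles side by side; the wavelength boundaries are rough textbook values used purely for illustration, not data from the paper:

```python
# The two styles of c-representation contrasted above: a raw numerical sensory
# channel (non-symbolic) versus discrete categories laid over it (symbolic).
# Wavelength boundaries are rough illustrative values.

def colour_category(wavelength_nm: float) -> str:
    """Symbolic c-representation: map the continuous channel to a category."""
    if 380 <= wavelength_nm < 495:
        return "blue"
    if 495 <= wavelength_nm < 570:
        return "green"
    if 620 <= wavelength_nm <= 750:
        return "red"
    return "other"

reading = 640.0                  # non-symbolic: a value on the sensory channel
print(colour_category(reading))  # symbolic: prints "red"
```

The same pattern applies to the infrared case: a numerical distance reading versus a discrete 'obstacle-seen' token produced by thresholding it.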
When the connectionist movement arose in the nineteen-eighties, it de-emphasised symbolic c-representations in favor of the propagation of (continuous) activation values in neural networks (although some researchers later used these networks in an entirely symbolic way, as pointed out in the discussion by [4] of Roger et al.). When Brooks wrote a paper on 'Intelligence without representation' [1] in the nineties, he wanted to argue that
real-time robotic behavior could often be better achieved without symbolic
c-representations. For example, instead of having a rule that says if obstacle then move back and turn away from obstacle, we could have a dynamical system that directly couples changes in sensory values to changes on the actuators: the more infrared reflection from obstacles, the slower the robot moves, or the sharper it turns away.
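Such a direct sensor-to-actuator coupling can be sketched in a few lines, in the spirit of a Braitenberg-style controller. There is no symbolic 'obstacle' token anywhere; the gains and sensor ranges below are invented for illustration:

```python
# A sketch of direct sensor-to-actuator coupling: no symbolic 'obstacle'
# token, just a continuous mapping from infrared reflection strengths to
# wheel speeds. Gains and ranges are invented.

def wheel_speeds(ir_left, ir_right, base=1.0, gain=0.8):
    """Map IR reflection strengths in [0, 1] to (left, right) wheel speeds."""
    left = base - gain * ir_right    # reflection on the right slows the left
    right = base - gain * ir_left    # wheel, so the robot veers away from it
    return left, right

print(wheel_speeds(0.0, 0.0))    # clear path: full speed straight ahead
l, r = wheel_speeds(0.0, 1.0)    # obstacle on the right
print(l < r)                     # True: left wheel slower, robot turns left
```

The avoidance behaviour falls out of the coupling itself: the stronger the reflection, the slower the robot moves and the sharper it turns, exactly as described above.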
But does this mean that all symbolic c-representations have to be rejected? This does not seem to be warranted, particularly not if we are
considering natural language processing, expert problem solving, conceptual
memory, etc. At least from a practical point of view it makes much more
sense to design and implement such systems using symbolic c-representations
and to mix symbolic and non-symbolic c-representations when building embodied AI systems.
But what about m-representations? There is no question that humans
are able to create m-representations and judging from the great importance
of text, art, film, dialogue, etc. in all human societies, representation-making
and interpreting appear to be of great importance to us. We have seen that models are beginning to appear of how representations about the world are made and
interpreted.
But the fact that a system produces m-representations or is able to interpret them does not necessarily mean that this system uses them internally.
The process to produce representations (including symbols and language)
need not be a mere translation process. In fact it seems clear that it is
a true construction process whose outcome is uncertain, with a
strong interaction between the final form and the conceptualisations that are
being expressed. This is similar to the way that a bird does not have a preconceived image of the nest (like an architectural design for a building) but
goes through the movements of nest building which results in a particular
nest embedded in the specific environment.
However, as I argued elsewhere [12], there is a further possibility by which the brain (or an artificial system) might be able to construct and entertain m-representations, namely by internalising the process of creating m-representations. Rather than producing the representation in terms of external physical symbols (sounds, gestures, lines on a piece of paper), an internal image is created, re-entered, and processed as if it were perceived externally. The inner voice that we each hear while thinking is a manifestation
of this phenomenon. In such a case, we see the emergence of an ’intelligence
with representation’ [13], in the full rich sense of m-representations. The
power that such representations bring to the community in terms of memory, aid in communication, etc. then becomes available to the individual
brain and its different components.
3 Notions of Embodiment
We have seen that both the notion of symbol and that of representation are used by computer scientists in a quite different way than by cognitive scientists in general, muddling the debate. This has also happened, I think, with the term
embodiment. In the technical literature (e.g. in patents), the term embodiment refers to the implementation, i.e. the physical realisation, of a method
or idea. Computer scientists more commonly use the word implementation.
An implementation of an algorithm is a definition of that algorithm in a form
so that it can be physically instantiated on a computer and actually run, i.e.
that the various steps of the algorithm find their analog in physical operations. It is an absolute truth in computer science that any algorithm can be
implemented in many different physical media and on many different types of
computer architectures and in different programming languages, even though
of course it may take much longer (either to implement or to execute) in one
medium versus another one.
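This medium-independence can be made concrete with a toy example (mine, not the essay's): Euclid's algorithm for the greatest common divisor written as two different realisations. The abstract algorithm is the same; only its concrete form differs, just as it would differ again in another programming language or another physical medium.

```python
# Illustrative sketch: one algorithm, two "implementations".

def gcd_recursive(a, b):
    """Euclid's algorithm, realised with recursion."""
    return a if b == 0 else gcd_recursive(b, a % b)

def gcd_iterative(a, b):
    """The same algorithm, realised with a loop and mutable state."""
    while b != 0:
        a, b = b, a % b
    return a

# Both realisations compute the same abstract function.
for x, y in [(48, 18), (1071, 462), (7, 13)]:
    assert gcd_recursive(x, y) == gcd_iterative(x, y)
```

The two functions may differ in speed and in memory use, but not in what they compute, which is exactly the sense in which computer scientists speak of an algorithm being separable from its implementation.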
It is therefore natural that computer scientists apply this way of thinking
to the brain as well. Their goal is to identify the algorithms and study them
through various kinds of implementations which may not at all be brain-like.
They also might talk about neural implementation or neural embodiment,
meaning the physical instantiation of a particular process with the hardware
components and processes available to the brain. But most computer scientists
know that the distance between an algorithm (even a simple one) and its
embodiment, for example in an electronic circuit, is huge, with many layers of
complexity intervening. And the translation through all these layers is never
done by hand but by compilers and assemblers. It may very well be that we
will have to follow the same strategy to bridge the gap between high-level
cognitive models and low-level neural implementations.
There has been considerable resistance, both from biologists and from
philosophers, to accepting this notion of implementation, i.e. the idea that
the same process (or algorithm) can be instantiated (embodied) in many
different media. They seem to believe that the (bio-)physics of the brain must
have unique characteristics with unique causal powers. For example, Penrose
has argued that intelligence is due to certain (unknown) quantum processes
which are unique to the brain, and that therefore any other type of implementation
can never achieve the same functionality. It is of course possible in theory to
build physical systems that implement computational steps much more
efficiently, or that implement something which is not implementable in another
medium. But so far there is no serious evidence that cognitive processes
require such unique physical properties.
There is another notion of embodiment which is also relevant in this discussion.
It refers quite literally to ’having a body’ for interacting with the
world, i.e. having a physical structure, dotted with sensors and actuators,
together with the signal processing and pattern recognition needed to bridge the
gap from reality to symbol use. Embodiment in this sense is a clear precondition
for symbol grounding, and as soon as we step outside the realm of
computer simulations and embed computers in physical robotic systems, we
achieve some form of embodiment, even if that embodiment is not the same
as human embodiment. It is entirely possible, and in fact quite likely, that
the human body and its biophysical properties for interacting with the world
make certain behaviors possible and allow certain forms of interaction that
are unique. This puts a limit on how similar artificial intelligence systems
can be to human intelligence. But it only means that artificial intelligence
is necessarily of a different nature due to the differences in embodiment, not
that artificial intelligence itself is impossible.
4 So What’s Next?
The main goal of this essay was to clarify some terminology and to indicate
where we stand with respect to some of the most fundamental questions
in cognition. I boldly stated that the symbol grounding problem is solved.
By that I meant that we now understand enough to create systems in which
groups of agents self-organise a symbolic communication system that is
grounded in their interactions with the world, and these systems may act as
models for understanding how humans manage to self-organise their
communication systems. But this does not mean that we know all there is to
know; on the contrary. The discussion of representations made it clear that
symbols are a special case of representations, and that representation-making
is a much richer process than just having internal data structures holding
information about the world. Much remains to be learned about the enormously
complex processes that humans effortlessly use to build up and navigate their
semiotic networks in the course of creating and interpreting representations,
and I believe that our research effort should now be focused on understanding
all this better. This can be done either through psychological observations
of representation-making in action (for example in dialogue or drawing) or
through AI models of the cognitive processes that appear necessary to deal
with representations.
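The kind of self-organised symbol system referred to above can be sketched as a minimal naming game. The following toy simulation is illustrative only (it assumes the simplest possible adoption dynamics, not Steels' actual experimental code): agents repeatedly pair up, a speaker names an object, and on failure the hearer adopts the speaker's word, so that a shared vocabulary emerges without any central control.

```python
# Minimal naming-game sketch: convergence to a shared lexicon
# through repeated local interactions, with no global coordination.
import random

random.seed(0)

OBJECTS = ["obj-a", "obj-b", "obj-c"]

class Agent:
    def __init__(self):
        self.lexicon = {}  # object -> this agent's current word for it

    def name_for(self, obj):
        # Invent a new word if the agent has none for this object.
        if obj not in self.lexicon:
            self.lexicon[obj] = "w%04d" % random.randrange(10000)
        return self.lexicon[obj]

def play(speaker, hearer):
    obj = random.choice(OBJECTS)
    word = speaker.name_for(obj)
    if hearer.lexicon.get(obj) == word:
        return True              # success: conventions already aligned
    hearer.lexicon[obj] = word   # failure: hearer adopts the speaker's word
    return False

agents = [Agent() for _ in range(10)]
for _ in range(30000):
    speaker, hearer = random.sample(agents, 2)
    play(speaker, hearer)

# After many games the population shares one word per object.
for obj in OBJECTS:
    assert len({a.lexicon.get(obj) for a in agents}) == 1
```

Even this crude dynamic converges, because every failed game reduces disagreement; the richer models discussed in the essay add grounding of the objects in perception and scoring of competing words rather than outright replacement.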
Acknowledgement
This research was funded and carried out at the Sony Computer Science
Laboratory in Paris, with additional funding from the EU FET ECAgents
Project IST-1940, and at the University of Brussels (VUB AI Lab). I am
indebted to the participants of the Garachico workshop on symbol grounding
for their tremendously valuable feedback, and to the organisers Manuel de
Vega, Art Glenberg and Arthur Graesser for orchestrating such a remarkably
fruitful workshop.
References
[1] Brooks, R. (1991) Intelligence Without Representation. Artificial Intelligence, 47: 139-159.
[2] Clark, H. and S. Brennan (1991) Grounding in communication. In:
Resnick, L. J. Levine and S. Teasley (eds.) Perspectives on Socially Shared
Cognition. APA Books, Washington. p. 127-149.
[3] Davidoff, J. (2001). Language and perceptual categorisation. Trends in
Cognitive Sciences, 5(9):382 - 387.
[4] Glenberg, De Vega, Graesser (2005) Framing the debate. Discussion paper
for the Garachico Workshop on Symbols, Embodiment, and Meaning.
[5] Golder, S. and B. Huberman (2006) The Structure of Collaborative Tagging. Journal of Information Science. To appear.
[6] Gombrich, E.H. (1969) Art and Illusion: A Study in the Psychology of
Pictorial Representation. Princeton, NJ: Princeton University Press.
[7] Harnad, S. (1990) The Symbol Grounding Problem. Physica D, 42: 335-346.
[8] Kay, P. and Regier, T. (2003) Resolving the question of color naming
universals. Proceedings of the National Academy of Sciences, 100(15): 9085-9089.
[9] Landauer, T.K. and S.T. Dumais (1997) A solution to Plato’s problem:
the Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104, 211-240.
[10] Pickering, M.J. and S. Garrod (2004) Toward a mechanistic psychology
of dialogue. Behavioral and Brain Sciences, 27: 169-225.
[11] Steels, L. (2003) Evolving grounded communication for robots. Trends
in Cognitive Sciences, 7(7): 308-312.
[12] Steels, L. (2003) Language Re-entrance and the Inner Voice. Journal of
Consciousness Studies, 10(4-5): 173-185.
[13] Steels, L. (2003) Intelligence with Representation. Philosophical Transactions of the Royal Society of London A, 361(1811): 2381-2395.
[14] Steels, L. (2005) The Emergence and Evolution of Linguistic Structure:
From Minimal to Grammatical Communication Systems. Connection Science, 17(3).
[15] Steels, L. and T. Belpaeme (2005) Coordinating Perceptually Grounded
Categories through Language: A Case Study for Colour. Behavioral and
Brain Sciences, 28(4): 469-489.
[16] Steels, L. and P. Hanappe (2006) A Semiotic Dynamics approach to
Emergent Semantics. Journal of Data Semantics, to appear.
[17] Webster, M.A. and P. Kay (2005) Variations in color naming within and
across populations. Behavioral and Brain Sciences, 28(4): 512-513.