Making Artificial Intelligence which
Copies the Way Babies Learn
Severin Fichtl
CS5577 - Scientific, Technological and Market Research
Supervisor: Frank Guerin
Department of Computing Science
April 2011
Executive summary
Independent knowledge creation and maintenance is the key to powerful artificial
general intelligence. Uncertainty about how to build a system with this key feature and
the capability to obtain all the commonsense knowledge of human adults has led to
growing interest in developmental artificial intelligence.
This research document introduces the developmental approach to artificial intelligence
and describes developmental psychology as its guide and motivator. It then outlines the
developmental AI approach in more detail and describes some high-level architectures,
frameworks and ideas used in developmental AI systems. This is followed by a
description of selected techniques used within this research project and related work.
Finally, this paper draws a conclusion and discusses some suggestions for future
improvements.
1. Introduction
2. Developmental Psychology
   2.1. Piaget's Sensorimotor Theory
   2.2. Information Processing Principles
   2.3. Discussion
3. Technology
   3.1. Reinforcement Learning
   3.2. Q-Learning
   3.3. Function Approximation
      3.3.1. Artificial Neural Networks
      3.3.2. Neural Fitted Q
      3.3.3. Self-organizing Maps
4. Developmental Artificial Intelligence
   4.1. Intrinsic Motivation
      4.1.1. Evolutionary Perspective
      4.1.2. Intelligent Adaptive Curiosity
   4.2. Perception Aided Learning
   4.3. Constructivist Learning Architecture
5. Conclusions
Bibliography
1. Introduction
Infants are incredibly smart and flexible. They devise and apply highly sophisticated
strategies in order to circumvent obstacles and achieve their goals in novel ways. One-year-olds by far outperform any of today's state-of-the-art artificial general intelligence
(Goertzel & Pennachin, 2007) implementations. Strong and successful AI systems are
still limited to very specific problem domains such as chess-playing. This introduction
explains the problems with traditional AI systems and outlines the developmental
approach which sets out to resolve those problems.
The traditional approach to Artificial Intelligence is to implement a huge knowledge
database alongside the AI controller. Engineers of traditional AI systems have to
prepare the system for every possible problem. This approach works only for a very
limited scope of application, as firstly, in order to make the AI system more flexible and
overarching, the manually implemented knowledge base has to be huge and secondly, it
is impossible to foresee every possible problem an AI system might encounter. In
addition, Brooks (1991) criticised the traditional approach to AI for two further reasons:
firstly, the developer's conceptualisation of the environment might not be the most
suitable one for an AI system using dissimilar perception and action mechanisms and
secondly, the abstractions, implemented by a human designer, might be entirely
different to what the human is actually using himself. “Essentially the criticism states
that when human engineers try to make up the appropriate knowledge representation for
an AI system, they will invariably do it wrongly” (Guerin, 2011). Sutton (2001) claims
that the ability to independently analyse whether or not an AI system is working
correctly is the key to a successful AI. An agent that can assess its own success may be
able to modify itself to ensure or regain success. Sutton postulates a “Verification
Principle" to allow this self-assessment:
“An AI system can create and maintain knowledge only to the extent
that it can verify that knowledge itself” (Sutton, 2001)
Today's AI systems and knowledge bases rely entirely on human construction and
maintenance. "'Birds have wings' they say, but of course they have no way of verifying
this." (Sutton, 2001)
These problems have led to growing attention to developmental Artificial Intelligence
(Cangelosi et al., 2010; Prince, Helder, & Hollich, 2005; Lungarella, Metta, Pfeifer, &
Sandini, 2003; Meeden & Blank, 2006), an idea which originates in Turing's paper on
"Computing machinery and intelligence" (Turing, 1950). Figure 1 shows the iCub robot
as an example of the state of the art in developmental robotics.
Figure 1: The iCub robot "playing" with colourful foam balls.
Rather than implementing the AI system's knowledge base manually, research in
developmental AI tries to answer the question of “How does one create a learning
system that develops its own model of the world, but is also sophisticated enough to
handle complex tasks efficiently?” (Chaput, 2004). Evolution already answered that
question by creating the learning system innate to every human being. While
developmental psychology tries to describe, comprehend and explain this system,
developmental artificial intelligence, inspired by developmental psychology, aims to
implement artificial intelligence systems which learn their own knowledge, world
model and abilities from infant-like interactions.
In the following chapter (Chapter 2), developmental psychology is presented as a basis
of developmental artificial intelligence. Chapter 3 briefly describes a selection of
techniques used within this research project and related work. Chapter 4 describes
developmental AI and some ideas and architectures used in this area in more detail.
Finally, Chapter 5 draws a conclusion.
2. Developmental Psychology
Developmental psychology tries to describe, comprehend and explain the development
of human behaviour and cognition. A great deal of developmental psychologists'
attention is devoted to human development during infancy. This part of developmental
psychology has a strong influence on developmental artificial intelligence (Guerin,
2011; Cangelosi et al., 2010).
2.1. Piaget's Sensorimotor Theory
Piaget's Sensorimotor Theory (Piaget, 1952) is his attempt to explain human cognitive
development during infancy and early childhood. According to Piaget, the first two
years of a child's life consist of six successive stages, where mastering one stage enables
a transition to the next stage. During these stages the child shifts from performing
simple random actions with no knowledge of the world to a stage where, equipped with
knowledge of the world, it can predict the outcome of an action in advance. In Piaget's
view, an infant comes with little inborn intellectual structure, but with the capability to
perform basic actions on its environment which ensure progressive cognitive
development and adaptation to the world.
Stage I: Reflexes
The first stage describes the behaviour of up to one-month-old infants, whose actions
solely consist of inborn reflexes like sucking, closing the hand if touched and tracking
moving objects with the eyes. These reflexes gradually become actions the infant can
perform at will. So far the infant has no concept of an external world.
Stage II: Primary Circular Reactions
During stage two (1-4 months), new and more sophisticated action schemas emerge by
randomly combining, assimilating and accommodating basic stage one actions. That is,
by generalizing and adapting existing action schemas to suit new situations and objects,
new behaviour schemas arise that were not there at birth, e.g. reaching for an object and
bringing it to the mouth. These schemas, which Piaget calls Primary Circular Reactions,
are executed for their own sake, with the gratification lying within the act, not in the
object the schema is performed on. In the infant's current world model, objects and
actions are linked to each other and cannot be distinguished from one another.
Stage III: Secondary Circular Reactions
In stage three (4-8 months) the infants, by understanding that desirable states may result
from their actions, start to develop a model of an external world. The infant begins to
perform actions not for their own sake, but for the desired result of the action. Piaget
calls these Secondary Circular Reactions. The infants begin to separate their actions (the
means) from their sensations (the end) and therefore start to develop simple means-end
behaviour. An example of stage three behaviour could be an infant who has learned that
shaking a rattle results in an interesting noise.
Currently, this project focuses on emulating the transition from stage three to stage four.
We therefore discuss this part in more detail.
Willatts (1999) studied means-end behaviour of six- to eight-month-old infants in more
detail. His findings refute theories contradicting Piaget, such as Baillargeon's claim that
young infants already possess subgoaling capacity (Baillargeon, 1993), and confirm
Piaget's theory "of a shift from transitional to intentional means-end behaviour and
suggest that that development of means-end behaviour involves acquisition of
knowledge of appropriate means-end relations" (Willatts, 1999). However, some
disagreement remains concerning whether Piaget's claims about when something
happens are always accurate. E.g., according to Goubet & Clifton (1998), there is
evidence that 6-month-old infants are able to use memory of complex events to guide
and correct their actions in order to achieve a goal. This observation contradicts Piaget's
claim that goal-directed behaviour first appears in stage four (Willatts, 1999). Piaget did
not observe this specific behaviour at this specific time, but his theory does allow for
overlap between stages, and so this behaviour can still belong to stage four. This is
because Piaget claims that transitions from one stage to another are not disjunctive but
overlapping, and progress in different stages can take place in parallel.
Stage IV: Coordination of Schemas
While in stage three the infant discovered means-end relations only after performing an
action and observing its result, in stage four (8-12 months) infants develop the ability to
create means-end relationships even before performing the action. Infants now put
together series of actions, so far only performed in isolation, in order to reach a goal.
Simultaneously, infants develop basic understanding of spatial relations of objects to
each other. This understanding leads to object permanence - the knowledge that an
object continues to exist even when hidden. With this understanding an infant is able to
search for hidden objects by removing a lid and then grasping the object. It also is able
to obtain an object out of reach by using a means-end schema and first pulling a support
to bring the object into reach and then grasping it.
Stage V: Tertiary Circular Reactions
Beginning in stage five (12-18 months), infants start to experiment in order to find new
solutions to problems, rather than just adapting and combining existing schemas. E.g.
infants try to use objects as tools to get hold of other objects which are out of reach.
Bremner describes them as "[having] taken a step 'outside' activity to see how activities
work and to develop new activities that will have desired outcomes" (Bremner, 1994, p.
129). Piaget calls this experimenting Tertiary Circular Reactions. The infants have also
further developed their model of the world, and their concept of objects is now fully
independent of their actions, but they do not yet see themselves as just another object in
their world.
Stage VI: Mental Representation
After reaching stage six (18-24 months), an infant is able to predict the outcome of a
particular action without having to try that action first. The infants' behaviour is no
longer only trial-and-error driven when searching for solutions to problems. Instead, the
infants now come up with sophisticated plans to effectively reach a goal. "The infant is
imagining his own displacements 'as if he saw them from outside'" (Piaget, 1954, p.
204). Thus, infants now have a fully developed model of the world and have placed
themselves into this world as just another object. They have also developed a mental
representation of objects, allowing them to imagine that changes may occur to objects
out of sight.
2.2. Information Processing Principles
The Information Processing Principles (Cohen, Chaput, & Cashon, 2002) are another
approach to describing the cognitive development of infants. Like Piaget's Sensorimotor
Theory, the Information Processing Principles suggest a hierarchical learning process
and are "essentially a restatement of Piaget's constructivism framed in the context of
modern learning systems" (Chaput, 2004, p. 9). The Principles are based on Cohen's
Information Processing Approach to infant cognitive development (Cohen, 1998).
1) Infants are endowed with an innate information-processing system:
Cohen, Chaput & Cashon (2002) state that infants are born without innate core
knowledge, but with an innate system enabling them to develop their own repertoire of
knowledge by accessing low-level information, such as sound, texture and movement.
2) Infants form higher schemas from lower schemas:
Cohen (1998) claims that learning and cognitive development are based on a hierarchical
learning system. The ability to process more complex information is the result of
integrating lower-level schemas into higher-level schemas based on correlations of the
activity of the lower-level schemas.
3) Higher schemas serve as components for still-higher schemas:
There is no inherent limit to the hierarchy. Not only during infancy, but throughout the
human lifespan, any schema can be integrated into even higher-level schemas, growing
the hierarchy constructed so far.
4) There is a bias to process using highest-formed schemas:
New information is preferably processed using the highest available level of schemas,
as those schemas are more generalized and better adapted to the environment.
5) If, for some reason, higher schemas are not available, lower-level schemas
are utilized:
In the case that the current highest level is not capable of dealing with the situation,
lower level schemas are executed. By being able to fall back to lower levels in the
hierarchy, an infant can still cope well with a new situation, rather than trying basic
random actions again. The infant can then adapt or grow the higher levels to increase its
performance.
6) This learning system applies throughout development and across domains:
The proposed hierarchical learning system represents not only cognitive development
during infancy, but cognitive development and learning in general.
2.3. Discussion
Both Piaget's Sensorimotor Theory (Piaget, 1952) and the Information Processing
Principles (Cohen, Chaput, & Cashon, 2002) suggest that learning and cognitive
development in infants are based on a hierarchical learning system which enables an
infant to, over time, process more and more complex information. The Sensorimotor
Theory and the Information Processing Principles complement each other. While Piaget
mainly describes and justifies the actual stages he observed and the transitions
occurring, he does not say much about how the transition system underlying his
observations works. The Information Processing Principles, however, do not describe
actual stages, but instead describe the underlying principles leading to the occurrence of
different stages in a hierarchical learning system.
Both theories have a strong influence on developmental artificial intelligence. Piaget
provides guiding principles to inspire AI (see Drescher's simulated infant in Chapter 4),
whereas Cohen provides more detail which can be directly used to design an
architecture for developmental AI systems (see Chaput's Constructivist Learning
Architecture in Section 4.3).
3. Technology
This chapter briefly describes a selection of techniques used within this research project
and related work.
3.1. Reinforcement Learning
Reinforcement learning (RL) is one computational approach to learning from interaction
(Sutton & Barto, 1998). Unlike most other machine learning systems, a RL system
learns by experimenting with actions and evaluating their outcome via reward, rather
than from guidance where the system is given the correct action for a series of training
states. This gives rise to infant-like trial-and-error behaviour. A RL system has to find a
trade-off between exploration and exploitation. By exploiting its current knowledge a
RL system tries to reach its goals as efficiently as possible. However, in order to
discover efficient behaviour policies it must first explore and perform actions not
selected before.
A Reinforcement learning system consists of four main elements: a policy, a reward
function, a value function and, optionally, a model of the environment (Sutton & Barto,
1998).
The reward function determines the instantaneous, inherent desirability of a perceived
state. This corresponds to pleasure and pain in biological systems, where a positive reward
stands for pleasure and a negative reward stands for pain. The goal of agents in
reinforcement learning systems is defined by the reward function.
The value function is an approximation of the total sum of reward an agent can expect
to gather in the future, starting from a specific state. The value function not only takes
into account the reward of the very next state but also the rewards of the states expected
to come after the next state. E.g. one state may give a low reward but still have a high
value, as it is followed by high-reward states.
The policy is the core of a RL agent and specifies the behaviour of that agent. The
policy maps from experienced states to actions best to be taken in those states. This
mapping ranges from simple lookup tables to sophisticated search processes involving
extensive computation. The policy's action choices are based on the value function, as
the value function helps to maximise the collected reward in the long run.
The model of the environment is an optional part of a reinforcement learning agent. In
the first RL systems the selection of the best action was purely based on the learned
value function. Newer RL systems incorporate models in order to, given a state and an
action, predict the next state and its reward. Taking into account predicted rewards,
rather than only those actually experienced, is called planning. Modern RL systems
simultaneously “learn by trial and error, learn a model of the environment, and use the
model for planning” (Sutton & Barto, 1998, p. 9).
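To make these elements concrete, the sketch below shows how they could fit together in one interaction loop. It is only an illustration: the environment interface (reset/step), the dictionary-based value function and the parameter values are assumptions, not a specific published system.

def run_episode(env, value, policy, model=None, alpha=0.1, gamma=0.9):
    # env    - assumed environment with reset() and step(action) -> (next_state, reward, done)
    # value  - dict mapping states to estimated long-term return (the value function)
    # policy - callable mapping a state to an action (e.g. epsilon-greedy over the values)
    # model  - optional dict remembering (state, action) -> (next_state, reward) for planning
    state = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        action = policy(state)
        next_state, reward, done = env.step(action)   # reward: instantaneous desirability
        total_reward += reward
        # Value-function update: move the estimate towards reward + discounted successor value.
        v = value.get(state, 0.0)
        target = reward + gamma * value.get(next_state, 0.0)
        value[state] = v + alpha * (target - v)
        if model is not None:
            model[(state, action)] = (next_state, reward)
        state = next_state
    return total_reward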
Reinforcement learning systems have successfully been used to implement
developmental artificial intelligence systems resulting in robust autonomous systems.
Three recent accomplishments are worth mentioning here. Chaput (2004) used a RL
system in connection with his constructivist learning architecture to create a robust and
flexible autonomous robot controller. Bakker and Schmidhuber (2004) implemented a
hierarchical RL system with sub-goaling capabilities. Last but not least, Mugan &
Kuipers (2008) implemented a RL system with an enhanced function approximation
method for complex continuous environments.
3.2. Q-Learning
Reinforcement learning is learning a behaviour policy to maximise the reward earned
for reaching specific states. A reinforcement learning system is not told what to do but
has to discover good behaviour strategies by trying them. Several algorithms exist to
realise this mapping of situation to actions.
“One of the most important breakthroughs in reinforcement learning was the
development of an off-policy […] control algorithm known as Q-learning" (Sutton &
Barto, 1998). The Q-learning algorithm (Watkins, 1989) can be used to learn an
optimal policy while gathering experience following another policy. No matter what
policy is followed to gather experience, the Q-learning algorithm is proven to converge
to an optimum (Watkins & Dayan, 1992), as long as each state-action pair continues to
be updated. The policy followed during training could be completely random
behaviour or a fixed policy which is to be replaced by a better one.
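A minimal tabular sketch of the algorithm is given below. The environment interface, the action list and the parameter values are illustrative assumptions; the update itself is the standard Q-learning rule Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    # Tabular Q-learning: behaviour is epsilon-greedy, but the update is off-policy,
    # always bootstrapping from the greedy (maximising) action in the next state.
    Q = defaultdict(float)                                  # Q[(state, action)]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            if random.random() < epsilon:                   # explore
                action = random.choice(actions)
            else:                                           # exploit current knowledge
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q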
3.3. Function Approximation
For reasonably small state-action spaces, value functions can be represented as a simple
look-up table with one entry for each state-action pair (Sutton & Barto, 1998). This is a
simple and straightforward value-function representation, but it is inappropriate for
large and continuous domains. Not only is the memory needed for large look-up tables
an issue, but, more importantly, so is the amount of time necessary to fill the table
accurately. In complex continuous environments, such as the real world, the time to fill
such a table approaches infinity. It is therefore crucial to generalise from a limited
subset of experienced state-action pairs to the whole state-action space. This
generalisation is called function approximation, and the methods used are instances of
supervised learning.
3.3.1. Artificial Neural Networks
An artificial neural network (ANN) is a biologically inspired mesh of artificial neurons.
ANNs were first described and used by McCulloch & Pitts (1943) who used a simple
version of artificial neurons and proved that ANNs can calculate almost every logic or
arithmetic function. Four years later they were the first to point out that such ANNs can
be successfully used for spatial pattern recognition (Pitts & McCulloch, 1947). The
strength of ANNs lies in their ability to generalise from training examples. Typical areas
in which ANNs are employed are function approximation, classification and clustering.
A classic artificial neuron has multiple weighted inputs and a single output value,
where the output is the sum of the weighted inputs or a function of this sum (Rey &
Beck, n.d.). Typical functions are simple Heaviside step functions for binary ANNs or
sigmoid functions for continuous ANNs. An output can be input to several subsequent
neurons, each of which can assign a different weight to it.
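As an illustration (not taken from any of the cited systems), such a neuron can be written in a few lines; the weights and inputs below are arbitrary example values.

import math

def neuron(inputs, weights, bias=0.0, step=False):
    # Classic artificial neuron: weighted sum of the inputs passed through an output function.
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    if step:                                # Heaviside step function for binary ANNs
        return 1.0 if s >= 0.0 else 0.0
    return 1.0 / (1.0 + math.exp(-s))       # sigmoid function for continuous ANNs

# Example: a neuron with two weighted inputs.
print(neuron([0.5, -1.0], [0.8, 0.3]))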
The knowledge and experience of an ANN lie within its weights and connections (Rey
& Beck, n.d.). Methods to train an ANN include adding or deleting weighted
connections, changing the output function or adding and deleting neurons, but usually
an ANN is trained by changing its weights. This training can be either supervised or
unsupervised. Backpropagation (Bryson & Ho, 1969) is a commonly used supervised
learning method to train ANNs from examples where the correct output is known.
3.3.2. Neural Fitted Q
The Neural Fitted Q (NFQ) algorithm (Riedmiller, 2005) uses an ANN to approximate a
Q-Value function. NFQ trains its ANN offline via backpropagation using batches of
saved experience. The backpropagation algorithm used by NFQ is the Rprop algorithm
(Riedmiller & Braun, 1993), which is one of the “best performing first-order learning
methods for neural networks” (Igel & Husken, 2003). Thanks to its efficient learning
algorithm and offline learning in batch mode, the NFQ algorithm is highly
data-efficient (Riedmiller, Gabel, Hafner, & Lange, 2009). That means only relatively few
training instances are necessary to obtain a good learning result.
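The core NFQ idea can be sketched as an outer loop that repeatedly rebuilds supervised training targets from the stored transitions and retrains the network on the whole batch. The regressor interface (fit/predict) and the way state and action are concatenated are assumptions made for illustration; Riedmiller's NFQ uses a multilayer perceptron trained with Rprop.

def nfq(net, transitions, actions, gamma=0.95, iterations=20):
    # transitions - stored experience: (state, action, reward, next_state, done) tuples
    # net         - any regressor offering fit(X, y) and predict(X) (assumed interface)
    for _ in range(iterations):
        inputs, targets = [], []
        for state, action, reward, next_state, done in transitions:
            if done:
                target = reward
            else:
                # Bootstrap the target from the current network: r + gamma * max_a' Q(s', a')
                target = reward + gamma * max(
                    net.predict([list(next_state) + [a]])[0] for a in actions)
            inputs.append(list(state) + [action])
            targets.append(target)
        net.fit(inputs, targets)          # offline, supervised training on the whole batch
    return net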
3.3.3. Self-organizing Maps
Self-organizing maps (SOM) (Kohonen, 1990) are a special type of artificial neural
network. SOMs map from multi-dimensional state-spaces to low-dimensional
representations. They are trained in an unsupervised manner and typically used to
classify or find patterns within the input data. SOMs are especially interesting for use in
developmental artificial intelligence systems, as their mode of operation reflects
biological neural network mechanisms. Chaput (2004) uses SOMs in his Constructivist
Learning Architecture (see Section 4.3).
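A minimal sketch of how a SOM is trained follows: for each input, the best-matching node is found, and it, together with its grid neighbours, is pulled towards the input. Grid size, learning-rate schedule and neighbourhood function are illustrative choices, not Kohonen's exact parameters.

import math
import random

def train_som(data, rows=10, cols=10, epochs=100, lr0=0.5, radius0=3.0):
    # Each node of a rows x cols grid holds a weight vector of the input dimension.
    dim = len(data[0])
    weights = {(r, c): [random.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)                   # decaying learning rate
        radius = max(1.0, radius0 * (1.0 - epoch / epochs))
        for x in data:
            # Best-matching unit: the node whose weight vector is closest to the input.
            bmu = min(weights, key=lambda n: sum((w - xi) ** 2
                                                 for w, xi in zip(weights[n], x)))
            for node, w in weights.items():
                d = math.hypot(node[0] - bmu[0], node[1] - bmu[1])
                if d <= radius:
                    h = math.exp(-(d ** 2) / (2 * radius ** 2))   # neighbourhood factor
                    weights[node] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return weights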
4. Developmental Artificial Intelligence
While developmental psychology tries to describe, comprehend and explain the learning
system innate to every human being, developmental artificial intelligence, inspired by
developmental psychology, aims to implement artificial intelligence systems which
learn their own knowledge, world model and abilities from infant-like interactions.
These developmental AI systems are not meant to copy human development exactly,
but instead they should loosely copy infant cognitive development and follow the same
essential milestones. Zlatev & Balkenius (2001) stated that “true intelligence in natural
and (possibly) artificial systems presupposes three crucial properties:”
The agent needs:
1. a body
2. a physical and social environment
3. a learning system where development is the result of interaction with the
environment
One of the earliest and most influential works on developmental artificial intelligence is
Drescher's (1991) simulated infant. Drescher based his AI system on Piaget's (1952)
sensorimotor theory and used it to simulate an infant in a small physical environment.
Even though Drescher's implementation has been criticised for being inordinately
inefficient and unable to scale (Witkowski, 1997), his work is a promising approach to
developmental artificial intelligence and still motivates and drives current research, e.g.
Chaput's Constructivist Learning Architecture (Chaput, 2004).
In my Summer project, a reinforcement learning system is used as the learning system
of a learning agent. The agent controls the body of a simulated infant, situated in a
simple physical environment, and gets reward for basic behaviours such as tracking an
object with the eye or putting an object into its mouth and sucking on it. By getting
reward for actions that initially result from random behaviour, the infant learns
sophisticated behaviours like searching for objects with the eye and bringing them to
the mouth with the arm to suck on them.
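Purely as an illustration of how such reward shaping might look in code, a hypothetical reward function is sketched below; the state fields and reward values are invented for this example and are not the project's actual implementation.

def infant_reward(state):
    # Hypothetical shaping reward: small reward for visually tracking an object,
    # larger rewards for getting it into the mouth and sucking on it.
    reward = 0.0
    if state.get("eye_on_object"):
        reward += 0.1
    if state.get("object_in_mouth"):
        reward += 0.5
    if state.get("sucking"):
        reward += 1.0
    return reward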
The following subsections describe some high- level architectures, frameworks and ideas
used in developmental AI systems.
4.1. Intrinsic Motivation
Research is being done to replace the purely random exploration behaviour of simple RL
systems with more sophisticated exploration mechanisms, leading to improved RL
system performance. Intrinsic motivation is reward that is not triggered by the
environment. In terms of RL systems, the environment is not only the environment of
an agent's body (external environment), but also the body itself (internal environment).
Classical reward, the equivalent of pleasure and pain, is considered extrinsic motivation,
as it is triggered by the (internal) environment. However, a lot of human behaviour,
such as children's play, is not motivated by the environment, hence it must be motivated
intrinsically. Intrinsically motivated exploratory activities are crucial for the cognitive
development of children (Oudeyer, Kaplan, & Hafner, 2007). Intrinsic motivation
possibly provides a path to intelligent exploratory behaviour, instead of exploring with
random actions only.
4.1.1. Evolutionary Perspective
Singh et al. (2010) adopted an evolutionary perspective on Intrinsic Motivation. They
believe that “the evolutionary process gave exploration, play, discovery, etc., positive
hedonic valence because these behaviours contributed to reproductive success
throughout evolution”. In parallel to the extrinsic reward, intrinsic reward can be
triggered by any abstract feature the agent can perceive. In an evolutionary approach
they search in the space of possible intrinsic reward functions for the one that
maximises the accumulated external reward of an agent which learns using that
particular reward function.
The search for optimal intrinsic reward functions leads to more effective learning agents,
but it does not lead to agents with better end results. In fact, the final performance may
even be reduced. E.g. intrinsic reward for excessive eating helped humans to survive
until today, but now leads to overweight populations. This kind of intrinsic motivation
reflects evolution's approach to dealing with the fact that the experience of a biological
entity cannot be transferred to its offspring, but that is not necessarily the case in
artificial systems. The search for optimal intrinsic reward functions is extremely
computationally expensive and takes a lot of experience which is then lost for the
optimised agent, as it comes with no inherent knowledge. Singh et al. (2010) only
compared the learning effectiveness of agents with and without intrinsic motivation.
However, one could imagine training a simple RL system directly with all the training
examples, instead of first learning the optimal intrinsic motivation function. Singh et al.
did not compare the performance of these two approaches. It is left unclear which agent
would perform better in such a comparison. This is especially true because the search
for optimised intrinsic reward functions is so computationally expensive that, for their
study, Singh et al. (2010) had to deliberately limit the search space to cover only
intrinsic reward functions they expected to be effective in the first place.
4.1.2. Intelligent Adaptive Curiosity
“Intelligent Adaptive Curiosity” (IAC) (Oudeyer, Kaplan, & Hafner, 2007) is an
intrinsic motivation based framework for “artificial curiosity” and rewards an agent for
performing actions leading to maximal learning outcome. To achieve this, IAC relies on
a memory of all the agent's experience. Based on this former experience, the
sensorimotor space is divided into regions. For each region there exists a learning
machine called an "expert", which is trained with the experience belonging to its region.
An expert is used to predict the outcome of possible next actions when the agent is in a
situation that belongs to the expert's region. For each of an expert's predictions, the
error made is measured, stored in a list, smoothed and used to extrapolate the derivative
of the error over time. The agent then chooses the action which maximises its expected
learning progress, based on this extrapolated derivative (i.e. on how quickly the
prediction error is decreasing).
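A hedged sketch of this mechanism follows: each region keeps a list of its prediction errors, learning progress is measured as the decrease of the smoothed error, and the region with the highest expected progress is preferred. The window size and data structures are illustrative, not the exact formulation of Oudeyer, Kaplan, & Hafner (2007).

def learning_progress(errors, window=10):
    # Learning progress of a region: how much its smoothed prediction error has decreased.
    if len(errors) < 2 * window:
        return 0.0
    recent = sum(errors[-window:]) / window                 # smoothed current error
    older = sum(errors[-2 * window:-window]) / window       # smoothed earlier error
    return older - recent                                   # positive when the error is falling

def choose_region(region_errors):
    # Prefer the region (and hence the kind of action) with the highest expected progress.
    return max(region_errors, key=lambda r: learning_progress(region_errors[r]))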
Unlike the evolutionary approach of Singh et al., the IAC does not give reward for
predetermined features of the state space, but rewards novelty and learning itself. The
evolutionary approach leads the agent to successful behaviour by rewarding
intermediate steps on the way to extrinsically rewarded goals but does not lead the
exploration process itself. The IAC approach leads the agent to explore where
something can be learned and continues exploring as long as there is any further
learning outcome expected.
Both intrinsic motivation approaches use different methods to achieve different goals
and could be used complementarily in a RL system, combining the advantages of both.
4.2. Perception Aided Learning
Imitation of other individuals' behaviour is yet another powerful strategy used by infants to
guide their exploration activities. The ability to imitate demonstrated behaviour could
lead to extremely fast learning and easily programmable robots (Demiris & Johnson,
2003). Imitation would provide a means for agents to limit their sensorimotor search
space to a region in which the agent is guaranteed to find successful action schemas.
However, this area of research is relatively young and proposed systems, such as the
HAMMER architecture of Demiris & Khadhouri (2006), are not yet able to reproduce
infant imitation skills. Current approaches need very accurate information about what a
demonstrator perceives and merely enable agents to exactly copy the observed
behaviour for the sake of copying it. Research has to be done to enable an agent to make
sense of a demonstrated action in order to abstract and generalize from it and guide its
own learning and goal-reaching behaviour.
4.3. Constructivist Learning Architecture
As outlined in the developmental psychology chapter, a developmental AI system
modelling infant cognitive development inevitably has to implement a hierarchical
learning system. The Constructivist Learning Architecture (CLA) (Chaput, 2004)
implements a hierarchical AI system based on the Information Processing Principles
(IPP) (Cohen, Chaput, & Cashon, 2002) to recreate and improve on the achievements of
Drescher‟s simulated infant. Self Organizing Maps (SOM) (Kohonen, 1997),
unsupervised learning machines, are used by CLA to build a hierarchical knowledge
base. In the lowest level, a SOM is stimulated and trained by sensor values only. Once
one layer is trained, it is frozen and a new SOM is created. This next level SOM is
trained with both the sensor values and the output of the preceding SOM. This way, as
IPPs 2 and 3 (see Section 2.2) demand, trained SOMs stay available and new SOMs can
make use of the knowledge already learned. CLA was shown to successfully replicate
several characteristics of human cognitive development, in particular the
perception of causality.
The main advantage of the CLA over other AI systems is that it successfully
implements the IPPs 4 and 5, too. That means CLA allows an agent to fall back to lower
level behaviour if necessary. If applicable, the output of the highest level SOM is used
to control the agent's behaviour. If, for any reason, the highest level is not capable of
dealing with a situation successfully, the agent falls back to the next lower level to
control its behaviour. This fallback method enables agents to respond to unexpected
situations, such as a change in the environment, “by gracefully degrading in
performance, rather than failing outright” (Chaput, 2004).
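A hedged sketch of this fallback mechanism is given below. The layer interface (can_handle/act) is an assumption made for illustration; in the actual CLA the layers are trained SOMs and the decision of whether a level applies is part of Chaput's architecture.

def cla_control(layers, sensors):
    # layers are ordered from lowest to highest; try the highest-formed schemas first
    # and gracefully degrade to lower levels instead of failing outright.
    for layer in reversed(layers):
        if layer.can_handle(sensors):
            return layer.act(sensors)
    return layers[0].act(sensors)      # last resort: lowest-level behaviour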
Stage transitions in Piaget's theory do not happen in a successive manner from one
self-contained stage to the next. Rather, consecutive levels of the hierarchy are
constructed in an overlapping way, where one stage keeps evolving even after higher
stages have started evolving too (Piaget, 1952). Once parts of a stage reach a sufficiently
stable state, the next stage in the hierarchy can begin to emerge, building a new level
based on the stable parts of the preceding levels in the hierarchy whilst the lower level
still undergoes further progress. This further progress then leads to supplementary
development in the higher levels, which can now exploit the newly developed parts of
the lower level.
The Constructivist Learning Architecture, however, freezes different stages after they
have been trained to a specific degree, which contradicts the theories of developmental
psychology. The performance of the CLA might be improved by not freezing stages.
Without the freezing implementation, CLA would also model the property of
overlapping stages. This property may have a positive impact on the learning system‟s
performance. Firstly, higher level features could be available earlier, if their
development was initiated even before the preceding level is mastered. Secondly, lower
levels could keep optimizing themselves, hence, the performance of higher levels,
building on these lower levels, would improve too. And thirdly, in the case of falling
back to a lower level, the performance drop would be less significant, as the constantly
improved lower level would be superior to a frozen one.
5. Conclusions
This dissertation outlined and justified developmental psychology as a basis for
developmental AI. It further reviewed recent works in the area of developmental AI,
such as intrinsic motivation frameworks and Chaput's CLA architecture, and described
important techniques used. The main conclusions drawn by this dissertation were,
firstly, that to build a successful developmental AI system one has to implement a
hierarchical learning system resembling Piagetian stages which follows the essential
milestones of infant cognitive development. Secondly, RL systems have to apply more
sophisticated exploration strategies, such as intrinsic motivation or perception aided
learning, but those strategies first need to improve further in order to really emulate
infant-like exploration and playing behaviour. Reviewing current work, I observed that
systems are often tested in very limited environments in which most modern
developmental AI systems are likely to succeed. In order to obtain not only positive
results, more complex environments which overstrain the system should be used for
testing. This is necessary to seriously evaluate the capabilities of an implementation and
to find not only its strengths but also its weaknesses and drawbacks. Furthermore, I see
a need for a standard benchmarking environment. One standard environment used to
assess all the different implementations of developmental AI systems would allow for a
better and easier comparison between them.
A possible next step for the Summer project could be the implementation of a
hierarchical learning system similar to the CLA but without the shortcoming of frozen
stages. That would bring the system a step closer to the cognitive system innate to
human infants.
Bibliography
Baillargeon, R. (1993). The object concept revisited: New directions in the investigation
of infants' physical knowledge. In C. E. Granrud (Ed.), Visual perception and cognition
in infancy (pp. 265-315). New Jersey.
Bakker, B., & Schmidhuber, J. (2004). Hierarchical Reinforcement Learning Based on
Subgoal Discovery and Subpolicy Specialization. In F. Groen, N. Amato, A. Bonarini,
E. Yoshida, & B. Krose (Eds.), Intelligent Autonomous Systems 8 (pp. 438-445).
Amsterdam: IOS Press.
Bremner, J. G. (1994). Infancy. Oxford: Blackwell.
Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence, 47 (1-3), 139-159.
Bryson, A. E., & Ho, Y.-C. (1969). Applied Optimal Control: Optimization, Estimation,
and Control. IEEE Transactions on Systems, Man and Cybernetics , 366 - 367.
Cangelosi, A., Metta, G., Sagerer, G., Nolfi, S., Nehaniv, C., Fischer, K., et al. (2010).
Integration of Action and Language Knowledge: A Roadmap for Developmental
Robotics. IEEE Transactions on Autonomous Mental Development , 2 (3), 167-195.
Chaput, H. H. (2004). The Constructivist Learning Architecture: A Model of Cognitive
Development for Robust Autonomous Robots. PhD thesis, The University of Texas at
Austin, Artificial Intelligence Laboratory. Austin: The University of Texas at Austin.
Cohen, L. B. (1998). An Information-Processing Approach to Infant Perception and
Cognition. In F. Simion, & G. Butterworth (Eds.), The Development of Sensory, Motor,
and Cognitive Capacities in Early Infancy (pp. 277-300). Hove: Psychology Press.
Cohen, L. B., Chaput, H. H., & Cashon, C. H. (2002). A constructivist model of infant
cognition. Cognitive Development , 17 (3-4), 1323-1343.
Demiris, Y., & Johnson, M. (2003). Distributed, predictive perception of actions: a
biologically inspired robotics architecture for imitation and learning. Connection
Science , 15 (4), 231-243.
Demiris, Y., & Khadhouri, B. (2006). Hierarchical attentive multiple models for
execution and recognition of actions. Robotics and Autonomous Systems, 54 (3), 361-369.
Drescher, G. L. (1991). Made-Up Minds: A Constructivist Approach to Artificial
Intelligence. Cambridge: MIT Press.
Goertzel, B., & Pennachin, C. (Eds.). (2007). Artificial general intelligence. Dordrecht:
Springer.
Goubet, N., & Clifton, R. (1998). Object and event representation in 6 1/2-month-old
infants. Developmental Psychology, 34 (1), 63-76.
Guerin, F. (2011). Learning Like a Baby: A Survey of AI Approaches. The Knowledge
Engineering Review (to appear).
Igel, C., & Husken, M. (2003). Empirical evaluation of the improved Rprop learning
algorithms. Neurocomputing , 50, 105-123.
John. (n.d.). DigitalBlind.com. Retrieved April 17, 2011, from
http://digitalblind.com/wp-content/uploads/2011/03/Shades-of-Cognition.jpg
Kohonen, T. (1997). Self-Organizing Maps. Berlin: Springer.
Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78 (9), 1464-1480.
LN. (n.d.). RobotCub.org. Retrieved April 17, 2011, from
http://www.robotcub.org/index.php/robotcub/content/download/1382/4817/file/201001-grasp2.jpg
Lungarella, M., Metta, G., Pfeifer, R., & Sandini, G. (2003). Developmental robotics: a
survey. Connection Science , 15 (4), 151–190.
Meeden, L. A., & Blank, D. S. (2006). Introduction to developmental robotics.
Connection Science , 18 (2), 93-96.
Mugan, J., & Kuipers, B. (2008). Continuous-Domain Reinforcement Learning Using a
Learned Qualitative State Representation. 22nd International Workshop on Qualitative
Reasoning.
Oudeyer, P.-Y., Kaplan, F., & Hafner, V. V. (2007). Intrinsic Motivation Systems for
Autonomous Mental Development. IEEE Transactions on Evolutionary Computation,
11 (2), 265-286.
Piaget, J. (1954). The Construction of Reality in the Child. (M. Cook, Trans.) New
York: Basic books.
Piaget, J. (1952). The Origins of Intelligence in Children. (M. Cook, Trans.) New York:
Basic Books (originally published in French 1936).
Pitts, W., & McCulloch, W. (1943). A logical calculus of the ideas immanent in nervous
activity. Bulletin of Mathematical Biology , 5 (4), 115-133.
Pitts, W., & McCulloch, W. (1947). How we know universals the perception of auditory
and visual forms. Bulletin of Mathematical Biology , 9 (3), 127-147.
Prince, C., Helder, N., & Hollich, G. (2005). Ongoing Emergence: A Core Concept in
Epigenetic Robotics. Proceedings of EpiRob05 - International Conference on
Epigenetic Robotics, (pp. 63-70).
Rey, G. D., & Beck, F. (n.d.). Neuronale Netze - Eine Einführung [Neural networks - an
introduction]. Retrieved April 10, 2011, from http://www.neuronalesnetz.de
Riedmiller, M. (2005). Neural Fitted Q Iteration – First Experiences with a Data
Efficient Neural Reinforcement Learning Method. In Machine Learning: ECML 2005
(Vol. 3720, pp. 317-328). Berlin/Heidelberg: Springer.
Riedmiller, M., & Braun, H. (1993). A direct adaptive method for faster
backpropagation learning: the RPROP algorithm. IEEE International Conference on
Neural Networks, 1, pp. 586 - 591.
Riedmiller, M., Gabel, T., Hafner, R., & Lange, S. (2009). Reinforcement learning for
robot soccer. Autonomous Robots , 27 (1), 55-73.
Singh, S., Lewis, R. L., Barto, A. G., & Sorg, J. (2010). Intrinsically Motivated
Reinforcement Learning: An Evolutionary Perspective. IEEE Transactions on
Autonomous Mental Development , 2 (2), 70 - 82.
Sutton, R. S. (2001, November 15). Verification, The Key to AI. Retrieved April 10, 2011, from
http://webdocs.cs.ualberta.ca/~sutton/IncIdeas/KeytoAI.html
Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction.
Cambridge: MIT Press.
Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59 (236), 433-460.
Watkins, C. J. (1989). Learning from Delayed Rewards. PhD thesis, Cambridge:
Cambridge University.
Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning , 8 (3-4), 279-292.
Willatts, P. (1999). Development of Means-end Behaviour in Young Infants: Pulling a
Support to Retrieve a Distant Object. Developmental Psychology , 35 (3), 651-667.
Witkowski, W. (1997). Schemes for Learning and Behaviour: A New Expectancy
Model. PhD thesis, Queen Mary Westfield College, Department of Computer Science.
London: University of London.
Zlatev, J., & Balkenius, C. (2001). Introduction: Why "epigenetic robotics"? (C.
Balkenius, J. Zlatev, H. Kozima, K. Dautenhahn, & C. Breazeal, Eds.) Epigenetic
Robotics , Vol. 85.