Making Artificial Intelligence which Copies the Way Babies Learn

Severin Fichtl

CS5577 - Scientific, Technological and Market Research
Supervisor: Frank Guerin
Department of Computing Science
April 2011

Executive Summary

Independent knowledge creation and maintenance is the key to powerful artificial general intelligence. Uncertainty about how to build a system with this key feature and the capability to obtain all the commonsense knowledge of human adults has led to growing interest in developmental artificial intelligence. This research document introduces the developmental approach to artificial intelligence and describes developmental psychology as its guide and motivator. It then briefly describes selected techniques used within this research project and related work, and outlines the developmental AI approach in more detail, describing some high-level architectures, frameworks and ideas used in developmental AI systems. Finally, this paper draws a conclusion and discusses some suggestions for future improvements.

Contents

1. Introduction
2. Developmental Psychology
   2.1. Piaget's Sensorimotor Theory
   2.2. Information Processing Principles
   2.3. Discussion
3. Technology
   3.1. Reinforcement Learning
   3.2. Q-Learning
   3.3. Function Approximation
      3.3.1. Artificial Neural Networks
      3.3.2. Neural Fitted Q
      3.3.3. Self-organizing Maps
4. Developmental Artificial Intelligence
   4.1. Intrinsic Motivation
      4.1.1. Evolutionary Perspective
      4.1.2. Intelligent Adaptive Curiosity
   4.2. Perception Aided Learning
   4.3. Constructivist Learning Architecture
5. Conclusions
Bibliography

1. Introduction

Infants are incredibly smart and flexible.
They devise and apply highly sophisticated strategies in order to circumvent obstacles and achieve their goals in novel ways. One-year-olds by far outperform any of today's state-of-the-art artificial general intelligence (Goertzel & Pennachin, 2007) implementations. Strong and successful AI systems are still limited to very specific problem domains such as chess playing. This introduction explains the problems with traditional AI systems and outlines the developmental approach, which sets out to resolve those problems.

The traditional approach to Artificial Intelligence is to implement a huge knowledge database alongside the AI controller. Engineers of traditional AI systems have to prepare the system for every possible problem. This approach works only for a very limited scope of application: firstly, in order to make the AI system more flexible and overarching, the manually implemented knowledge base has to be huge, and secondly, it is impossible to foresee every possible problem an AI system might encounter. In addition, Brooks (1991) criticised the traditional approach to AI for two further reasons: firstly, the developer's conceptualisation of the environment might not be the most suitable one for an AI system using dissimilar perception and action mechanisms, and secondly, the abstractions implemented by a human designer might be entirely different from what the human is actually using himself. "Essentially the criticism states that when human engineers try to make up the appropriate knowledge representation for an AI system, they will invariably do it wrongly" (Guerin, 2011).

Sutton (2001) claims that the ability to independently analyse whether or not an AI system is working correctly is the key to a successful AI. An agent that can assess its own success may be able to modify itself to ensure or regain success. Sutton postulates a "Verification Principle" for allowing this self-assessment: "An AI system can create and maintain knowledge only to the extent that it can verify that knowledge itself" (Sutton, 2001). Today's AI systems and knowledge bases rely entirely on human construction and maintenance. "'Birds have wings' they say, but of course they have no way of verifying this." (Sutton, 2001)

These problems have led to rising attention to developmental Artificial Intelligence (Cangelosi et al., 2010; Prince, Helder, & Hollich, 2005; Lungarella, Metta, Pfeifer, & Sandini, 2003; Meeden & Blank, 2006), whose idea originates in Turing's paper on "Computing machinery and intelligence" (Turing, 1950). Figure 1 shows the iCub robot as an example of the state of the art in developmental robotics.

Figure 1: iCub robot "playing" with colourful foam balls.

Rather than implementing the AI system's knowledge base manually, research in developmental AI tries to answer the question of "How does one create a learning system that develops its own model of the world, but is also sophisticated enough to handle complex tasks efficiently?" (Chaput, 2004). Evolution already answered that question by creating the learning system innate to every human being. While developmental psychology tries to describe, comprehend and explain this system, developmental artificial intelligence, inspired by developmental psychology, aims to implement artificial intelligence systems which learn their own knowledge, world model and abilities from infant-like interactions.

In the following chapter (Chapter 2), developmental psychology is presented as a basis of developmental artificial intelligence.
Chapter 3 briefly describes a selection of techniques used within this research project and related work. Chapter 4 describes developmental AI and some ideas and architectures used in this area in more detail. Finally, Chapter 5 draws a conclusion.

2. Developmental Psychology

Developmental psychology tries to describe, comprehend and explain the development of human behaviour and cognition. A great deal of developmental psychologists' attention is devoted to human development during infancy. This part of developmental psychology has a strong influence on developmental artificial intelligence (Guerin, 2011) (Cangelosi et al., 2010).

2.1. Piaget's Sensorimotor Theory

Piaget's Sensorimotor Theory (Piaget, 1952) is his attempt to explain human cognitive development during infancy and early childhood. According to Piaget, the first two years of a child consist of six successive stages, where mastering one stage enables a transition to the next stage. During these stages the child progresses from performing simple random actions with no knowledge of the world to a stage with a model of the world in which the child can predict the outcome of an action in advance. In Piaget's view, an infant comes with little inborn intellectual structure, but with the capability to perform basic actions on its environment which ensure progressive cognitive development and adaptation to the world.

Stage I: Reflexes
The first stage describes the behaviour of up to one-month-old infants, whose actions solely consist of inborn reflexes like sucking, closing the hand if touched and tracking moving objects with the eyes. These reflexes gradually become actions the infant can perform at will. So far the infant itself has no concept of an external world.

Stage II: Primary Circular Reactions
During stage two (1-4 months), new and more sophisticated action schemas emerge by randomly combining, assimilating and accommodating basic stage one actions. That is, by generalizing and adapting existing action schemas to suit new situations and objects, new behaviour schemas arise that were not there at birth, e.g. reaching for an object and bringing it to the mouth. These schemas, which Piaget calls Primary Circular Reactions, are executed for their own sake, with the gratification lying within the act, not in the object the schema is performed on. In the infant's current world model, objects and actions are linked to each other and cannot be distinguished from one another.

Stage III: Secondary Circular Reactions
In stage three (4-8 months) the infants, by understanding that desirable states may result from their actions, start to develop a model of an external world. The infant begins to perform actions not for their own sake, but for the desired result of the action. Piaget calls these Secondary Circular Reactions. The infants begin to separate their actions (the means) from their sensations (the end) and therefore they start to develop simple means-end behaviour. An example of stage three behaviour could be an infant who has learned that shaking a rattle results in an interesting noise.

Currently, this project focuses on emulating the transition from stage three to stage four. We therefore discuss this part in more detail. Willatts (1999) studied means-end behaviour of six- to eight-month-old infants in more detail.
His findings refute theories contradicting Piaget, such as Baillargeon's claim that young infants already possess subgoaling capacity (Baillargeon, 1993), and confirm Piaget's theory "of a shift from transitional to intentional means-end behaviour and suggest that development of means-end behaviour involves acquisition of knowledge of appropriate means-end relations" (Willatts, 1999). However, some disagreement remains concerning whether Piaget's claims about when something happens are always accurate. For example, according to Goubet & Clifton (1998), there is evidence that 6-month-old infants are able to use memory of complex events to guide and correct their actions in order to achieve a goal. This observation contradicts Piaget's claim that goal-directed behaviour first appears in stage four (Willatts, 1999). Piaget did not observe this specific behaviour at this specific time, but his theory does allow for overlap between stages, and so this behaviour can still belong in stage four. This is because Piaget claims that transitions from one stage to another are not disjunctive, but overlapping, and progress in different stages can take place in parallel.

Stage IV: Coordination of Schemas
While in stage three the infant discovered means-end relations only after performing an action and observing its result, in stage four (8-12 months) infants develop the ability to create means-end relationships even before performing the action. Infants now put together series of actions, so far only performed in isolation, in order to reach a goal. Simultaneously, infants develop a basic understanding of the spatial relations of objects to each other. This understanding leads to object permanence - the knowledge that an object continues to exist even when hidden. With this understanding an infant is able to search for hidden objects by removing a lid and then grasping the object. It is also able to obtain an object that is out of reach by using a means-end schema: first pulling a support to bring the object into reach and then grasping it.

Stage V: Tertiary Circular Reactions
Beginning in stage five (12-18 months), infants start to experiment in order to find new solutions to problems, rather than just adapting and combining existing schemas. For example, infants try to use objects as tools to get hold of other objects which are out of reach. Bremner describes them as "[having] taken a step 'outside' activity to see how activities work and to develop new activities that will have desired outcomes" (Bremner, 1994, p. 129). Piaget calls this experimenting Tertiary Circular Reactions. The infants have also further developed their model of the world and their concept of objects is now fully independent of their actions, but they do not yet see themselves as another object in their world.

Stage VI: Mental Representation
After reaching stage six (18-24 months), an infant is able to predict the outcome of a particular action without having to try that action first. The infants' behaviour is no longer only trial-and-error driven when searching for solutions to problems. Instead, the infants now come up with sophisticated plans to effectively reach a goal. "The infant is imagining his own displacements 'as if he saw them from outside'" (Piaget, 1954, p. 204). Thus, infants now have a fully developed model of the world and have placed themselves into this world as just another object. They have also developed a mental representation of objects, allowing them to imagine that changes may occur to objects out of sight.
2.2. Information Processing Principles

The Information Processing Principles (Cohen, Chaput, & Cashon, 2002) are another approach to describing the cognitive development of infants. Like Piaget's Sensorimotor Theory, the Information Processing Principles suggest a hierarchical learning process and are "essentially a restatement of Piaget's constructivism framed in the context of modern learning systems" (Chaput, 2004, p. 9). The Principles are based on Cohen's Information Processing Approach to infant cognitive development (Cohen, 1998).

1) Infants are endowed with an innate information-processing system: Cohen, Chaput & Cashon (2002) state that infants are born without innate core knowledge, but with an innate system enabling them to develop their own repertoire of knowledge by accessing low-level information, such as sound, texture and movement.

2) Infants form higher schemas from lower schemas: Cohen (1998) claims that learning and cognitive development is based on a hierarchical learning system. The ability to process more complex information is the result of integrating lower-level schemas into higher-level schemas based on correlations of the activity of the lower-level schemas.

3) Higher schemas serve as components for still-higher schemas: There is no inherent limit to the hierarchy. Not only during infancy, but throughout the human lifespan, any schema can be integrated into even higher-level schemas and grow the hierarchy constructed so far.

4) There is a bias to process using highest-formed schemas: New information is preferably processed using the highest available level of schemas, as those schemas are more generalized and better adapted to the environment.

5) If, for some reason, higher schemas are not available, lower-level schemas are utilized: In the case that the current highest level is not capable of dealing with the situation, lower-level schemas are executed. By being able to fall back to lower levels in the hierarchy, an infant can still cope well with a new situation, rather than trying basic random actions again. The infant can then adapt or grow the higher levels to increase its performance.

6) This learning system applies throughout development and across domains: The proposed hierarchical learning system represents not only cognitive development during infancy, but cognitive development and learning in general.

2.3. Discussion

Both Piaget's Sensorimotor Theory (Piaget, 1952) and the Information Processing Principles (Cohen, Chaput, & Cashon, 2002) suggest that learning and cognitive development in infants is based on a hierarchical learning system which enables an infant to, over time, process more and more complex information. The Sensorimotor Theory and the Information Processing Principles complement each other. While Piaget mainly describes and justifies the actual stages he observed and the transitions occurring, he does not tell much about how the transition system underlying his observations works. The Information Processing Principles, however, do not describe actual stages; instead they describe the underlying principles leading to the occurrence of different stages in a hierarchical learning system. Both theories have a strong influence on developmental artificial intelligence.
Piaget provides guiding principles to inspire AI (see Drescher's simulated infant in Chapter 4), whereas Cohen provides more detail which can be directly used to design an architecture for developmental AI systems (see Chaput's Constructivist Learning Architecture in Section 4.3).

3. Technology

This chapter briefly describes a selection of techniques used within this research project and related work.

3.1. Reinforcement Learning

Reinforcement learning (RL) is one computational approach to learning from interaction (Sutton & Barto, 1998). Unlike most other machine learning systems, an RL system learns from experimenting with actions and evaluating the outcome of the actions via reward, rather than from guidance where the system is given the correct action for a series of learning states. Here arises the need for infant-like trial-and-error behaviour. An RL system has to find a trade-off between exploration and exploitation. By exploiting its current knowledge an RL system tries to reach its goals as efficiently as possible. However, in order to discover efficient behaviour policies it must first explore and perform actions it has not selected before.

A reinforcement learning system consists of four main elements: a policy, a reward function, a value function and, optionally, a model of the environment (Sutton & Barto, 1998). The reward function determines the instantaneous, inherent desirability of a perceived state. This corresponds to pleasure and pain in biological systems, where a positive reward stands for pleasure and a negative reward stands for pain. The goal of agents in reinforcement learning systems is defined by the reward function. The value function is an approximation of the total sum of reward an agent can expect to gather in the future, starting from a specific state. The value function not only takes into account the reward of the very next state but also the rewards of the states expected to come after the next state. For example, one state may give a low reward, but still have a high value, as it is followed by high-reward states. The policy is the core of an RL agent and specifies the behaviour of that agent. The policy maps experienced states to the actions best taken in those states. This mapping ranges from simple lookup tables to sophisticated search processes involving extensive computation. The policy's action choices are based on the value function, as the value function helps to maximise the collected reward in the long run. The model of the environment is an optional part of a reinforcement learning agent. In the first RL systems the selection of the best action was based purely on the learned value function. Newer RL systems incorporate models in order to, given a state and an action, predict the next state and its reward. Taking into account predicted rewards, rather than only actually experienced ones, is called planning. Modern RL systems simultaneously "learn by trial and error, learn a model of the environment, and use the model for planning" (Sutton & Barto, 1998, p. 9).

Reinforcement learning systems have successfully been used to implement developmental artificial intelligence systems, resulting in robust autonomous systems. Three recent accomplishments are worth mentioning here. Chaput (2004) used an RL system in connection with his constructivist learning architecture to create a robust and flexible autonomous robot controller. Bakker and Schmidhuber (2004) implemented a hierarchical RL system with sub-goaling capabilities. Last but not least, Mugan & Kuipers (2008) implemented an RL system with an enhanced function approximation method for complex continuous environments.
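To make the interplay of policy, reward and exploration concrete, the following is a minimal sketch of the generic agent-environment interaction loop with an epsilon-greedy policy, illustrating the exploration-exploitation trade-off described above. The environment interface (reset, step) and the value estimates stored in a dictionary are purely illustrative assumptions, not taken from any of the cited systems.

```python
import random

def epsilon_greedy(q_values, state, actions, epsilon=0.1):
    """Exploration-exploitation trade-off: with probability epsilon explore
    (random action), otherwise exploit the current value estimates."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))

def run_episode(env, q_values, actions, epsilon=0.1):
    """One episode of the generic RL interaction loop: observe a state, choose
    an action from the policy, receive reward and the next state."""
    state = env.reset()                          # hypothetical environment interface
    total_reward, done = 0.0, False
    while not done:
        action = epsilon_greedy(q_values, state, actions, epsilon)
        next_state, reward, done = env.step(action)   # reward from the environment
        total_reward += reward
        state = next_state
    return total_reward
```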
3.2. Q-Learning

Reinforcement learning is learning a behaviour policy to maximise the reward earned for reaching specific states. A reinforcement learning system is not told what to do but has to discover good behaviour strategies by trying them. Several algorithms exist to realise this mapping from situations to actions. "One of the most important breakthroughs in reinforcement learning was the development of an off-policy [...] control algorithm known as Q-learning" (Sutton & Barto, 1998). The Q-learning algorithm (Watkins, 1989) can be used to learn an optimal policy while gathering experience following another policy. No matter what policy is followed to gather experience, the Q-learning algorithm is proven to converge to an optimum (Watkins & Dayan, 1992), as long as each state-action pair keeps getting updated eventually. The policy followed during training could be completely random behaviour or a fixed policy which is to be replaced by a better one.
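As a minimal illustration of the rule behind Q-learning, the sketch below applies the standard tabular update Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)] to a plain dictionary serving as the look-up table discussed in the next section. The learning rate, discount factor and surrounding experience-gathering loop are illustrative choices, not taken from the cited work.

```python
def q_update(q_values, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning update on a dict used as a look-up table:
    Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    best_next = max(q_values.get((next_state, a), 0.0) for a in actions)
    old = q_values.get((state, action), 0.0)
    q_values[(state, action)] = old + alpha * (reward + gamma * best_next - old)
```

Because the update uses the maximum over possible next actions rather than the action the behaviour policy actually takes next, the experience-gathering policy can be arbitrary, which is exactly the off-policy property mentioned above.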
3.3. Function Approximation

For reasonably small state-action spaces, value functions can be represented as a simple look-up table with one entry for each state-action pair (Sutton & Barto, 1998). This is a simple and straightforward form of value function representation, but it is inappropriate for large and continuous domains. Not only is the memory needed for large look-up tables an issue, but more importantly the amount of time necessary to fill the table accurately is. In complex continuous environments, such as the real world, the time to fill such a table approaches infinity. It is therefore crucial to generalise from a limited subset of experienced state-action pairs to the whole state-action space. This generalisation is called function approximation and the methods used are instances of supervised learning.

3.3.1. Artificial Neural Networks

An artificial neural network (ANN) is a biologically inspired mesh of artificial neurons. ANNs were first described and used by McCulloch & Pitts (1943), who used a simple version of artificial neurons and proved that ANNs can calculate almost every logic or arithmetic function. Four years later they were the first to point out that such ANNs can be successfully used for spatial pattern recognition (Pitts & McCulloch, 1947). The strength of ANNs lies in their ability to generalise from training examples. Typical areas in which ANNs are employed are function approximation, classification and clustering.

A classic artificial neuron has multiple weighted inputs and one single output value, where the output is the sum of the weighted inputs or a function of this sum (Rey & Beck, n.d.). Typical functions are simple Heaviside step functions for binary ANNs or sigmoid functions for continuous ANNs. An output can be input to several subsequent neurons, each of which can assign a different weight to it. The knowledge and experience of an ANN lies within its weights and connections (Rey & Beck, n.d.). Methods to train an ANN include adding or deleting weighted connections, changing the output function or adding and deleting neurons, but usually the training of ANNs happens by changing the weights. This training can happen either supervised or unsupervised. Backpropagation (Bryson & Ho, 1969) is a commonly used supervised learning method to train ANNs from examples where the correct output is known.

3.3.2. Neural Fitted Q

The Neural Fitted Q (NFQ) algorithm (Riedmiller, 2005) uses an ANN to approximate a Q-value function. NFQ trains its ANN offline via backpropagation using batches of saved experience. The backpropagation algorithm used by NFQ is the Rprop algorithm (Riedmiller & Braun, 1993), which is one of the "best performing first-order learning methods for neural networks" (Igel & Husken, 2003). Thanks to its efficient learning algorithm and the offline learning in batch mode, the NFQ algorithm is highly data efficient (Riedmiller, Gabel, Hafner, & Lange, 2009). That means only relatively few training instances are necessary to obtain a good learning result.

3.3.3. Self-organizing Maps

Self-organizing maps (SOMs) (Kohonen, 1990) are a special type of artificial neural network. SOMs map from multi-dimensional state spaces to low-dimensional representations. They are trained in an unsupervised manner and are typically used to classify or find patterns within the input data. SOMs are especially interesting for use in developmental artificial intelligence systems, as their mode of operation reflects biological neural network mechanisms. Chaput (2004) uses SOMs in his Constructivist Learning Architecture (see Section 4.3).

4. Developmental Artificial Intelligence

While developmental psychology tries to describe, comprehend and explain the learning system innate to every human being, developmental artificial intelligence, inspired by developmental psychology, aims to implement artificial intelligence systems which learn their own knowledge, world model and abilities from infant-like interactions. These developmental AI systems are not meant to copy human development exactly; instead they should loosely copy infant cognitive development and follow the same essential milestones. Zlatev & Balkenius (2001) stated that "true intelligence in natural and (possibly) artificial systems presupposes three crucial properties". The agent needs:

1. a body
2. a physical and social environment
3. a learning system where development is the result of interaction with the environment

One of the earliest and most influential works on developmental artificial intelligence is Drescher's (1991) simulated infant. Drescher based his AI system on Piaget's (1952) sensorimotor theory and used it to simulate an infant in a small physical environment. Even though Drescher's implementation has been criticised for being inordinately inefficient and unable to scale (Witkowski, 1997), his work is a promising approach to developmental artificial intelligence and still motivates and drives current research, e.g. Chaput's Constructivist Learning Architecture (Chaput, 2004).

In my Summer project, a reinforcement learning system is used as the learning system of a learning agent. The agent controls the body of a simulated infant, situated in a simple physical environment, and gets reward for basic behaviours such as tracking an object with the eye or putting an object into its mouth and sucking on it (an illustrative sketch of such a reward function is given at the end of this section). By getting reward for actions, first resulting from random behaviour, the infant learns sophisticated behaviours like searching for objects with the eye and bringing them to the mouth with the arm to suck on them. The following subsections describe some high-level architectures, frameworks and ideas used in developmental AI systems.
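The following is a purely illustrative sketch of how the extrinsic reward described above could be expressed. The state fields (eye_on_object, object_in_mouth, sucking) and the reward magnitudes are hypothetical and are not taken from the actual project implementation.

```python
def extrinsic_reward(state):
    """Illustrative extrinsic reward for a simulated infant: a small reward for
    visually tracking an object, a larger reward for mouthing and sucking on it.
    All field names and magnitudes are hypothetical."""
    reward = 0.0
    if state.get("eye_on_object", False):                        # gaze is on an object
        reward += 0.1
    if state.get("object_in_mouth", False) and state.get("sucking", False):
        reward += 1.0                                            # most strongly rewarded behaviour
    return reward
```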
4.1. Intrinsic Motivation

Research is being done to replace the purely random exploration behaviour of simple RL systems with more sophisticated exploration mechanisms, leading to improved RL system performance. Intrinsic motivation is reward that is not triggered by the environment. In the terms of RL systems, the environment is not only the environment of an agent's body (external environment), but also the body itself (internal environment). Classical reward, the equivalent of pleasure and pain, is considered extrinsic motivation, as it is triggered by the (internal) environment. However, a lot of human behaviour, such as child's play, is not motivated by the environment, hence it must be motivated intrinsically. Intrinsically motivated exploratory activities are crucial for the cognitive development of children (Oudeyer, Kaplan, & Hafner, 2007). Intrinsic motivation possibly provides a way towards intelligent exploratory behaviour instead of exploring with random actions only.

4.1.1. Evolutionary Perspective

Singh et al. (2010) adopted an evolutionary perspective on intrinsic motivation. They believe that "the evolutionary process gave exploration, play, discovery, etc., positive hedonic valence because these behaviours contributed to reproductive success throughout evolution". In parallel to the extrinsic reward, intrinsic reward can be triggered by any abstract feature the agent can perceive. In an evolutionary approach they search the space of possible intrinsic reward functions for the one that maximises the accumulated external reward of an agent which learns using that particular reward function. The search for optimal intrinsic reward functions leads to more effective learning agents, but it does not lead to agents with better end results. In fact, the final performance may even be reduced. For example, intrinsic reward for excessive eating helped humans to survive until today, but now leads to overweight populations. This kind of intrinsic motivation reflects evolution's approach to dealing with the fact that the experience of a biological entity cannot be transferred to its offspring, but that is not necessarily the case in artificial systems. The search for optimal intrinsic reward functions is extremely computationally expensive and takes a lot of experience, which is then lost for the optimised agent as it comes with no inherent knowledge. Singh et al. (2010) only compared the learning effectiveness of agents with and without intrinsic motivation. However, one could imagine training a simple RL system directly with all the training examples, instead of first learning the optimal intrinsic motivation function. Singh et al. did not compare the performance of these two approaches, so it remains unclear which agent would perform better in such a comparison. This is especially true because the search for optimised intrinsic reward functions is so computationally expensive that, for their study, Singh et al. (2010) had to deliberately limit the search space to cover only intrinsic reward functions they expected to be effective in the first place.

4.1.2. Intelligent Adaptive Curiosity

"Intelligent Adaptive Curiosity" (IAC) (Oudeyer, Kaplan, & Hafner, 2007) is an intrinsic motivation based framework for "artificial curiosity" and rewards an agent for performing actions leading to maximal learning outcome. To achieve this, IAC relies on a memory of all the agent's experience. Based on this former experience, the sensorimotor space is divided into regions. For each region there exists a learning machine called an "expert", which is trained with the experience belonging to its region. An expert is used to predict the outcome of possible next actions when the agent is in a situation that belongs to the expert's region. For each expert's prediction, the error made is measured, stored in a list, smoothed and used to extrapolate the derivative of the error over time. The agent then chooses the action which maximises its expected learning progress based on this extrapolated derivative.
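A minimal sketch of the learning-progress idea behind IAC is given below: intrinsic reward is taken to be the recent decrease of a region expert's smoothed prediction error. The window sizes and the bookkeeping of errors per region are illustrative simplifications, not the actual IAC implementation.

```python
def learning_progress_reward(error_history, window=5):
    """Illustrative IAC-style intrinsic reward: the decrease of a region expert's
    smoothed prediction error over a recent window. A positive value means the
    expert is still improving, i.e. there is something left to learn here."""
    if len(error_history) < 2 * window:
        return 0.0                                   # too little experience to estimate progress
    recent = sum(error_history[-window:]) / window                   # current smoothed error
    earlier = sum(error_history[-2 * window:-window]) / window       # older smoothed error
    return earlier - recent                          # error decrease = expected learning progress
```

The agent would then prefer actions whose region currently yields the largest such progress value, and would lose interest in a region once its prediction error stops decreasing.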
Unlike the evolutionary approach of Singh et al., IAC does not give reward for predetermined features of the state space, but rewards novelty and learning itself. The evolutionary approach leads the agent to successful behaviour by rewarding intermediate steps on the way to extrinsically rewarded goals, but does not guide the exploration process itself. The IAC approach leads the agent to explore where something can be learned and to continue exploring as long as any further learning outcome is expected. The two intrinsic motivation approaches use different methods to achieve different goals and could be used complementarily in an RL system, utilizing the advantages of both.

4.2. Perception Aided Learning

Imitation of other individuals' behaviour is yet another powerful strategy infants use to guide their exploration activities. The ability to imitate demonstrated behaviour could lead to extremely fast learning and easily programmable robots (Demiris & Johnson, 2003). Imitation would provide a means for agents to limit their sensorimotor search space to a region in which the agent is guaranteed to find successful action schemas. However, this research area is relatively young and proposed systems, such as the HAMMER architecture of Demiris & Khadhouri (2006), are not yet able to reproduce infant imitation skills. Current approaches need very accurate information about what a demonstrator perceives and merely enable agents to exactly copy the observed behaviour for the sake of copying it. Research has to be done to enable an agent to make sense of a demonstrated action in order to abstract and generalize from it and guide its own learning and goal-reaching behaviour.

4.3. Constructivist Learning Architecture

As outlined in the developmental psychology part, a developmental AI system modelling infant cognitive development inevitably has to implement a hierarchical learning system. The Constructivist Learning Architecture (CLA) (Chaput, 2004) implements a hierarchical AI system based on the Information Processing Principles (IPP) (Cohen, Chaput, & Cashon, 2002) to recreate and improve on the achievements of Drescher's simulated infant. Self-organizing maps (SOMs) (Kohonen, 1997), unsupervised learning machines, are used by CLA to build a hierarchical knowledge base. In the lowest level, a SOM is stimulated and trained by sensor values only. Once one layer is trained, it is frozen and a new SOM is created. This next-level SOM is trained with both the sensor values and the output of the preceding SOM. This way, as IPPs 2 and 3 (see Section 2.2) demand, trained SOMs stay available and new SOMs can make use of the knowledge already learned. CLA was shown to successfully replicate several characteristics of human cognitive development, in particular the perception of causality.

The main advantage of the CLA over other AI systems is that it successfully implements IPPs 4 and 5, too. That means CLA allows an agent to fall back to lower-level behaviour if necessary. If applicable, the output of the highest-level SOM is used to control the agent's behaviour. If, for any reason, the highest level is not capable of dealing with a situation successfully, the agent falls back to the next lower level to control its behaviour. This fallback method enables agents to respond to unexpected situations, such as a change in the environment, "by gracefully degrading in performance, rather than failing outright" (Chaput, 2004).
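To make the layer-stacking idea concrete, here is a minimal sketch of CLA-style hierarchical SOM training: each new SOM is trained on the raw sensor vector concatenated with the activation of the frozen layer below it. The SOM update rule, the one-hot activation and all parameters are a generic simplification, not Chaput's actual implementation.

```python
import numpy as np

def train_som(data, n_units, n_iter=1000, lr=0.5, radius=1.0, seed=0):
    """Train a small 1-D SOM: move the best matching unit and its neighbours
    towards each randomly drawn sample. Generic sketch, not CLA's exact rule."""
    rng = np.random.default_rng(seed)
    weights = rng.random((n_units, data.shape[1]))
    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))     # best matching unit
        dist = np.abs(np.arange(n_units) - bmu)
        h = np.exp(-(dist ** 2) / (2 * radius ** 2))             # neighbourhood function
        weights += lr * (1 - t / n_iter) * h[:, None] * (x - weights)
    return weights

def som_activation(weights, x):
    """One-hot activation: which unit best matches the input vector."""
    act = np.zeros(len(weights))
    act[np.argmin(np.linalg.norm(weights - x, axis=1))] = 1.0
    return act

def build_hierarchy(sensor_data, layer_sizes):
    """CLA-style stacking: train a layer, freeze it, then train the next layer
    on the sensor values concatenated with the frozen layer's activations."""
    layers, inputs = [], sensor_data
    for size in layer_sizes:
        weights = train_som(inputs, size)
        layers.append(weights)                                    # frozen after training
        acts = np.array([som_activation(weights, x) for x in inputs])
        inputs = np.concatenate([sensor_data, acts], axis=1)      # input for the next layer
    return layers
```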
Stage transitions in Piaget's theory do not happen in a successive manner from one self-contained stage to the next. Rather, consecutive levels of the hierarchy are constructed in an overlapping way, where one stage keeps evolving even after higher stages have started evolving too (Piaget, 1952). Once parts of a stage reach a sufficiently stable state, the next stage in the hierarchy can begin to emerge, building a new level based on the stable parts of the preceding levels in the hierarchy while the lower level still undergoes further progress. This further progress then leads to supplementary development in the higher levels, which can now exploit the newly developed parts of the lower level. The Constructivist Learning Architecture, however, freezes stages after they have been trained to a specific degree, which contradicts the theories of developmental psychology. The performance of the CLA might be increased by not freezing stages. Without the freezing implementation, CLA would also model the property of overlapping stages. This property may have a positive impact on the learning system's performance. Firstly, higher-level features could be available earlier if their development was initiated even before the preceding level is mastered. Secondly, lower levels could keep optimizing themselves; hence the performance of higher levels, building on these lower levels, would improve too. And thirdly, in the case of falling back to a lower level, the performance drop would be less significant, as the constantly improved lower level would be superior to a frozen one.

5. Conclusions

This dissertation outlined and justified developmental psychology as a basis for developmental AI. It further reviewed recent works in the area of developmental AI, such as intrinsic motivation frameworks and Chaput's CLA architecture, and described important techniques used. The main conclusions drawn by this dissertation were, firstly, that to build a successful developmental AI system one has to implement a hierarchical learning system resembling Piagetian stages which follows the essential milestones of infant cognitive development. Secondly, RL systems have to apply more sophisticated exploration strategies such as intrinsic motivation or perception-aided learning, but those strategies first need to improve further in order to really emulate infant-like exploration and playing behaviour.

Reviewing current work, I observed that systems are often tested in very limited environments in which most modern developmental AI systems are likely to succeed. In order to obtain not only positive results, more complex environments which overstrain the system should be used for testing. This is necessary to seriously evaluate the capabilities of an implementation and to find not only its strengths but also its weaknesses and drawbacks. Furthermore, I see a need for a standard benchmarking environment. One standard environment used to assess all the different implementations of developmental AI systems would allow for a better and easier comparison between them.
A possible next step for the Summer project could be the implementation of a hierarchical learning system similar to the CLA but without the shortcoming of frozen stages. That would bring the system a step closer to the cognitive system innate to human infants.

Bibliography

Baillargeon, R. (1993). The object concept revisited: New directions in the investigation of infants' physical knowledge. In C. E. Granrud (Ed.), Visual perception and cognition in infancy (pp. 265-315). New Jersey.

Bakker, B., & Schmidhuber, J. (2004). Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization. In F. Groen, N. Amato, A. Bonarini, E. Yoshida, & B. Krose (Eds.), Intelligent Autonomous Systems 8 (pp. 438-445). Amsterdam: IOS Press.

Bremner, J. G. (1994). Infancy. Oxford: Blackwell.

Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence, 47(1-3), 139-159.

Bryson, A. E., & Ho, Y.-C. (1969). Applied Optimal Control: Optimization, Estimation, and Control. IEEE Transactions on Systems, Man and Cybernetics, 366-367.

Cangelosi, A., Metta, G., Sagerer, G., Nolfi, S., Nehaniv, C., Fischer, K., et al. (2010). Integration of Action and Language Knowledge: A Roadmap for Developmental Robotics. IEEE Transactions on Autonomous Mental Development, 2(3), 167-195.

Chaput, H. H. (2004). The Constructivist Learning Architecture: A Model of Cognitive Development for Robust Autonomous Robots. PhD thesis, Artificial Intelligence Laboratory, The University of Texas at Austin.

Cohen, L. B. (1998). An Information-Processing Approach to Infant Perception and Cognition. In F. Simion & G. Butterworth (Eds.), The Development of Sensory, Motor, and Cognitive Capacities in Early Infancy (pp. 277-300). Hove: Psychology Press.

Cohen, L. B., Chaput, H. H., & Cashon, C. H. (2002). A constructivist model of infant cognition. Cognitive Development, 17(3-4), 1323-1343.

Demiris, Y., & Johnson, M. (2003). Distributed, predictive perception of actions: a biologically inspired robotics architecture for imitation and learning. Connection Science, 15(4), 231-243.

Demiris, Y., & Khadhouri, B. (2006). Hierarchical attentive multiple models for execution and recognition of actions. Robotics and Autonomous Systems, 54(3), 361-369.

Drescher, G. L. (1991). Made-Up Minds: A Constructivist Approach to Artificial Intelligence. Cambridge: MIT Press.

Goertzel, B., & Pennachin, C. (Eds.). (2007). Artificial General Intelligence. Dordrecht: Springer.

Goubet, N., & Clifton, R. (1998). Object and event representation in 6 1/2-month-old infants. Developmental Psychology, 34(1), 63-76.

Guerin, F. (2011, to appear). Learning Like Baby: A Survey of AI approaches. The Knowledge Engineering Review.

Igel, C., & Husken, M. (2003). Empirical evaluation of the improved Rprop learning algorithms. Neurocomputing, 50, 105-123.

John. (n.d.). DigitalBlind.com. Retrieved April 17, 2011, from http://digitalblind.com/wp-content/uploads/2011/03/Shades-of-Cognition.jpg

Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78(9), 1464-1480.

Kohonen, T. (1997). Self-Organizing Maps. Berlin: Springer.

LN. (n.d.). RobotCub.org. Retrieved April 17, 2011, from http://www.robotcub.org/index.php/robotcub/content/download/1382/4817/file/201001-grasp2.jpg

Lungarella, M., Metta, G., Pfeifer, R., & Sandini, G. (2003). Developmental robotics: a survey. Connection Science, 15(4), 151-190.

Meeden, L. A., & Blank, D. S. (2006). Introduction to developmental robotics. Connection Science, 18(2), 93-96.
Mugan, J., & Kuipers, B. (2008). Continuous-Domain Reinforcement Learning Using a Learned Qualitative State Representation. 22nd International Workshop on Qualitative Reasoning.

Oudeyer, P.-Y., Kaplan, F., & Hafner, V. V. (2007). Intrinsic Motivation Systems for Autonomous Mental Development. IEEE Transactions on Evolutionary Computation, 11(2), 265-286.

Piaget, J. (1952). The Origins of Intelligence in Children (M. Cook, Trans.). New York: Basic Books (originally published in French, 1936).

Piaget, J. (1954). The Construction of Reality in the Child (M. Cook, Trans.). New York: Basic Books.

Pitts, W., & McCulloch, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5(4), 115-133.

Pitts, W., & McCulloch, W. (1947). How we know universals: the perception of auditory and visual forms. Bulletin of Mathematical Biophysics, 9(3), 127-147.

Prince, C., Helder, N., & Hollich, G. (2005). Ongoing Emergence: A Core Concept in Epigenetic Robotics. Proceedings of EpiRob05 - International Conference on Epigenetic Robotics (pp. 63-70).

Rey, G. D., & Beck, F. (n.d.). Neuronale Netze - Eine Einführung [Neural networks - an introduction]. Retrieved April 10, 2011, from http://www.neuronalesnetz.de

Riedmiller, M. (2005). Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method. In Machine Learning: ECML 2005 (Vol. 3720, pp. 317-328). Berlin/Heidelberg: Springer.

Riedmiller, M., & Braun, H. (1993). A direct adaptive method for faster backpropagation learning: the RPROP algorithm. IEEE International Conference on Neural Networks, 1, 586-591.

Riedmiller, M., Gabel, T., Hafner, R., & Lange, S. (2009). Reinforcement learning for robot soccer. Autonomous Robots, 27(1), 55-73.

Singh, S., Lewis, R. L., Barto, A. G., & Sorg, J. (2010). Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective. IEEE Transactions on Autonomous Mental Development, 2(2), 70-82.

Sutton, R. S. (2001, November 15). Verification, The Key to AI. Retrieved April 10, 2011, from http://webdocs.cs.ualberta.ca/~sutton/IncIdeas/KeytoAI.html

Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge: MIT Press.

Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433-460.

Watkins, C. J. (1989). Learning from Delayed Rewards. PhD thesis, Cambridge University.

Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4), 279-292.

Willatts, P. (1999). Development of Means-end Behaviour in Young Infants: Pulling a Support to Retrieve a Distant Object. Developmental Psychology, 35(3), 651-667.

Witkowski, W. (1997). Schemes for Learning and Behaviour: A New Expectancy Model. PhD thesis, Department of Computer Science, Queen Mary Westfield College, University of London.

Zlatev, J., & Balkenius, C. (2001). Introduction: Why "epigenetic robotics"? In C. Balkenius, J. Zlatev, H. Kozima, K. Dautenhahn, & C. Breazeal (Eds.), Epigenetic Robotics, Vol. 85.