
Cybernetics and Systems: An International Journal, 32: 155-193, 2001
Copyright © 2001 Taylor & Francis
0196-9722/01 $12.00 + .00

LEARNING MOTOR SKILLS BY IMITATION: A BIOLOGICALLY INSPIRED ROBOTIC MODEL

AUDE BILLARD
Robotics Laboratory, University of Southern California, Los Angeles, California, USA

Address correspondence to Aude Billard, Robotics Laboratory, University of Southern California, SAL 230, Los Angeles, CA 90089. E-mail: [email protected]

This article presents a biologically inspired model for motor skills imitation. The model is composed of modules whose functionalities are inspired by corresponding brain regions responsible for the control of movement in primates. These modules are high-level abstractions of the spinal cord, the primary and premotor cortexes (M1 and PM), the cerebellum, and the temporal cortex. Each module is modeled at a connectionist level. Neurons in PM respond both to visual observation of movements and to corresponding motor commands produced by the cerebellum. As such, they give an abstract representation of mirror neurons. Learning of new combinations of movements is done in PM and in the cerebellum. Premotor cortexes and cerebellum are modeled by the DRAMA neural architecture, which allows learning of time series and of spatio-temporal invariance in multimodal inputs. The model is implemented in a mechanical simulation of two humanoid avatars, the imitator and the imitatee. Three types of sequence learning are presented: (1) learning of repetitive patterns of arm and leg movements; (2) learning of oscillatory movements of shoulders and elbows, using video data of a human demonstration; (3) learning of precise movements of the extremities for grasp and reach.

MODELS OF LEARNING BY IMITATION

From a very early age and throughout our lives, we learn many new ways of using our limbs, from driving, cooking, and dancing to speaking and writing. An important part of this is done by the observation of others. Learning new motor skills by the observation and then reproduction of the behavior of conspecifics is an example of social learning. It might be described as an imitative act (Byrne & Whiten, 1988; Moore, 1996). There is still some debate about which behaviors the term "imitation" refers to and in which species it is exhibited (see, e.g., Byrne & Whiten, 1988; Tomasello, 1990). Imitation (or "true" imitation) is contrasted with mimicry: imitation is more than the mere ability to reproduce others' actions; it is the ability to replicate and, by so doing, learn "new" skills (i.e., skills that are not part of the animal's usual repertoire) by the simple observation of those performed by others. The current agreement is that only apes and humans have the ability for true imitation. Simpler forms of imitation or mimicry have, however, been shown in rats (Heyes, 1996), monkeys (Visalberghi & Fragaszy, 1990), parrots (Moore, 1996), and dolphins (Herman, 1990).

Neuroscientists and psychologists find a common interest in the study of imitation, which provides them with a means to compare and analyze the similarities and differences between human cognition and that of other animals. In order to better understand the leap between the different levels of imitation in animals, there is a need to better describe the neural mechanisms underlying imitation.

This work aims to contribute to research on learning by imitation by proposing a neural model of the different cognitive processes involved in imitation. The motivation underlying this work is two-fold. First, it aims at developing a potential control mechanism for imitation in robots. Second, it aims at giving a possible model of the neurological circuits underlying primates' imitative skills.

Mechanisms Underlying Imitation

Motor skills imitation relies on the ability to recognize conspecifics' actions and to transform visual patterns into motor commands (see footnote 1).
Footnote 1: Note that imitation can use sensor modalities other than vision, such as sound or touch. In this paper, only imitation of movement based on visual observation is addressed.

The mirror neural system in monkey premotor cortex has been proposed as the neural system responsible for the linkage of self-generated and observed actions (di Pellegrino et al., 1992; Gallese et al., 1996; Rizzolatti et al., 1996a). Recent studies showed that a subset of the neurons located in the rostral part of inferior area 6 (F5) of the monkey become active both during monkey movements and when the monkey observes the experimenter or another monkey performing "an action similar to the one that, when actively performed, triggers [that] neuron" (Ferraina et al., 1997; Fogassi et al., 1998; Gallese et al., 1996). Neurons endowed with these properties are referred to as "mirror neurons" (Rizzolatti et al., 1996a). The interpretation of mirror neurons is that they might be responsible "for matching the neural command for an action with the neural code for the recognition of the same action executed by another primate" (Jeannerod et al., 1995; Rizzolatti et al., 1996; Rizzolatti & Arbib, 1998).

Research on the mirror system is still in its early stages. So far, mirror neurons have been observed only for reaching and grasping actions. It remains to be shown that mirror neurons exist for movements other than those of the arms and hands and that they exist in animals capable of true imitation (which is not the case with monkeys (Moore, 1996; Whiten & Ham, 1992)). Note that recent studies in humans measured an increased activity in left Broca's area (area 45) (Rizzolatti et al., 1996b) and in the left dorsal premotor area 6 (see footnote 2) during both observation and execution of hand actions.

The discovery of the mirror system in monkeys is very exciting to those who wish to understand the neurological processes behind imitation.
It suggests a possible neural circuit for transforming visual patterns into motor commands in primates. The work of this paper aims to contribute to research on the neural mechanisms behind imitation by developing a computational model of those mechanisms. The model is biologically inspired in its function, as its composite modules have functionalities similar to those of specific brain regions, and in its structure, as the modules are composed of artificial neural architectures. It is loosely based on neurological findings in primates and incorporates an abstract model of the spinal cord, the primary motor cortex (M1) and premotor cortex (PM), the temporal cortex (TC), and the cerebellum. Visual recognition of human movements is done in the temporal cortex, following recent observation (Perrett et al., 1985). In the model, neurons in the premotor cortex respond to both visual recognition of movements and to the corresponding motor commands produced by the cerebellum. As such, they give an abstract representation of mirror neurons. Learning of new combinations of movements is done in the PM module and in the cerebellum module. These two modules are implemented using the dynamical recurrent associative memory architecture (DRAMA) (Billard, 1998; Billard & Hayes, 1999), which allows learning of time series and of spatio-temporal invariance in multimodal inputs.

Footnote 2: Personal communication from Michael Arbib.

DRAMA has been successfully applied to on-line learning in autonomous mobile robots. For instance, it was used for the robot to record its path, by learning sequences of sensory inputs and motor commands (Billard & Hayes, 1999). Further, it was used in experiments in which the robot learned a proto-language, by extracting spatio-temporal invariance across its sensor perceptions (Billard, 2000, 1999; Billard & Dautenhahn, 1999).
The robot learned the sequential ordering behind words occurring in a sentence (e.g., "You move left foot"). It also learned to attach a meaning to each word of the sentence by associating the word with a particular proprio- or exteroception (e.g., it associates the term "left" with all instances of touch on the left side of its body).

The DRAMA architecture is not a model of a particular biological neural circuit. Its internal structure and functioning (the neural activation function and the learning rules) are not biological. Therefore, the modeling of the PM and cerebellum does not respect the biological structure of the corresponding brain areas. In the model presented here, only the spinal cord module, whose structure was borrowed from Ijspeert's model of vertebrates' spinal circuits (Ijspeert et al., 1998, 1999), is biologically plausible. The biological inspiration underlying our model of imitation lies in the particular modular division, the connectivity across the modules, and the modules' functionality. The modeling of M1, PM, and the cerebellum respects some of the functionality of the corresponding brain modules, namely: M1 allows the control of limb movements following a topographic map of the body; PM allows learning of actions as coactivation of movements stored in M1; and the cerebellum allows learning of the timing and extent of the sequential activation of motor commands in PM and M1 (the parallel between the model and the brain is further expanded in the second section). As such, this work intends to contribute to biology by proposing a model of imitation which investigates the dynamics between specific brain regions and which, although it does not model the details of these cerebral areas, is implemented at a connectionist level using (abstract) neurons as building blocks.
Model of Learning by Imitation

A better understanding of the neurological substrate of learning by imitation is also relevant to roboticists. Roboticists would benefit from the possibility of implementing a control mechanism allowing the robot to learn new skills (which would otherwise require complex programming) solely by observing another agent's performance. Imitation can be the direct means of learning the skill, as in the case of learning new motor skills (see, for instance, Cooke et al., 1997; Dautenhahn, 1995; Demiris & Hayes, 1996; Gaussier et al., 1998; Kuniyoshi & Inoue, 1994; Schaal, 1999). It can also be an indirect means of teaching. For instance, in previous work, the robot's ability to imitate the teacher is used to lead the robot to make specific perceptual experiences upon which the robot grounds its understanding of a proto-language (Billard, 2000; Billard & Hayes, 1999, 1998). The fifth section discusses the relationships between imitation and the development of language and suggests potential contributions that our model could bring to this issue.

Models of imitation and, in particular, of learning by imitation are scarce. A number of theoretical models of animals' imitation, which propose different decompositions of the underlying cognitive processes, have been proposed (see, e.g., Heyes, 1996; Nehaniv & Dautenhahn, 1998). Computational models have also been proposed (Schmajuk & Zanutto, 1997), among which the most relevant are those implemented in robots. Kuniyoshi et al. (1994) did experiments in which a robot was able to reproduce a human demonstration of object manipulation. Recognition of movements was done by preprocessing visual input from fixed cameras placed above the scene. The robot's controller had a predefined set of possible hand and arm movements, which it instantiated sequentially following the recognition of these in the demonstration. Demiris et al. (1997) developed a controller which allowed a robot to reproduce the head movements (left-right and up-down shake) of a human demonstrator. The algorithm consisted of a predefined mapping between the robot's camera and the motors, from recognition of the visual flow direction to activation of the corresponding motors. Schaal (1997) did experiments in which a robot learned ball juggling by observing a human demonstration. A fixed external camera, placed behind the robot, recorded the movements of the balls and the experimenter's hands. Learning to juggle consisted of training a connectionist algorithm, which learned the dynamics of the ball-hand movements. The algorithm was trained by comparing the desired motion (as observed during the demonstration) to that achieved through numerous trials by the robot.

The model presented in this article intends to bring three new contributions with respect to other models of imitation. First, the model is biased by biologically motivated constraints. These are the use of a connectionist representation and the building of a hierarchy of neural mechanisms which follows the neural functional decomposition found in primates. Second, it proposes a comprehensive model of learning by imitation from visual segmentation to motor control, using the 65 degrees of freedom of a humanoid body rather than a restricted set of joints. Note that the visual abilities of the model are for now limited to (video) tracking of human movements of the upper body in the plane only (but the model allows tracking of movements of the complete body in simulation). Third, the model is validated through implementation in a mechanical simulation of two humanoids with high degrees of freedom, for reproducing a variety of actions.

Experiments are conducted in which the imitator avatar learns different sequences of limb movements, as first demonstrated by the imitatee avatar. Three types of sequence learning are presented:

1. learning of repetitive patterns of arm and leg movements;
2. learning of oscillatory movements of shoulder and elbows, using video data of a human demonstration;
3. learning of precise movements of the extremities: grasp and reach.

Although the experiments presented here do not use a physical robot, the model has been built with the goal of implementing it on a real humanoid robot (see footnote 3). For this reason, we use a realistic mechanical simulation of a humanoid, whose parameters can be adjusted to describe the particular dynamics of a robot, and have adapted the model so that it can take real input from the tracking system of a camera.

Footnote 3: In collaboration with USC colleague Stefan Schaal and his collaborators at the Kawato Laboratory, the author will be working towards the implementation of the model on the hydraulic humanoid robot of the ATR Kawato Laboratory, located in Kyoto, Japan.

The rest of the article is organized as follows. The second section describes the architecture in detail, referring precisely to the neurobiological correspondence of each module. The third section explains the mechanical simulation of two humanoid avatars used as a platform for the implementation of the model. The fourth section reports on the results of the architecture's implementation in the simulated platform. The fifth section discusses the results and the hypotheses behind the model. The sixth section concludes the paper with a short summary of the work presented, followed by a brief outlook on continuing work. In the following, the agent that imitates is referred to as the imitator and the agent that is being imitated as the imitatee.

THE ARCHITECTURE

The architecture is inspired by neurological models of visuo-motor processing. Figure 1 shows two corresponding schematics of the brain structure as identified by neurologists (Geschwind, 1979) and the proposed architecture.
The architecture is divided into three parts for visual recognition, motor control, and learning and is composed of seven modules. The seven modules are the attentional and temporal cortex modules, the primary motor cortex and spinal cord modules, the premotor cortex and cerebellum modules, and the decision module. Visual recognition is performed by the temporal cortex and attentional modules. Motor control is directed by the spinal cord module and the primary motor cortex (M1) module, which both have direct connections to the motor neurons. The motor neurons activate the avatars' muscles (see the third section). The M1 can also activate the spinal cord neurons. Learning of new motor sequences is done inside the premotor cortex (PM) and the cerebellum module. The neural connectivity inside the visual cortex, spinal cord, and M1 is predefined, while that inside the PM and the cerebellum builds up during learning. Learning builds the connectivity between M1, PM, and the cerebellum and within PM and the cerebellum. The next sections describe in detail our implementation of each of these modules. The decision module controls the passage between observing and reproducing the motor sequences. It is implemented as a set of if-then rules and has no direct biological inspiration.

Figure 1. Above: The brain structure as identified by neurologists (adapted from Geschwind, 1979). Opposite: The architecture proposed. The architecture is divided into three parts for visual recognition, motor control, and learning and is composed of seven modules. The seven modules are the attentional and temporal cortex modules, the primary motor cortex and spinal cord modules, the premotor cortex and cerebellum modules, and the decision module.

The Visual System

The temporal cortex module.
The temporal cortex module (TC) performs recognition of the direction and orientation of movement of each of the imitatee's limbs relative to a frame of reference located on the imitatee's body. That is, the module takes as input the Cartesian coordinates of each joint of the imitatee's limbs in an eccentric frame of reference. It then transforms these coordinates to a new set of coordinates relative to an egocentric frame of reference. For doing this, we assume the existence of a visual preprocessing module, which recognizes human shapes and decomposes the visual information of a human body into joint coordinates. In the experiments using video data (second learning example presented in the fourth section), this visual preprocessing is done by a video tracking system developed by Weber (see footnote 4).

The module processes the visual input to calculate the speed and direction of movement of each limb segment relative to its parent limb segment (that is, the limb segment to which it is attached). For instance, the speed of the hand is zero if the hand movement in space is due to the bending of the elbow or the shoulder rather than that of the wrist. The transformation of the frame of reference is done symbolically (as opposed to using a connectionist representation) by calculating the vector projections. Its output activates a series of cells coding for the six possible orientations relative to a Cartesian referential attached to each limb (see Figure 1). The farther the limb is from the rest position in one particular direction, the greater the output excitation of the cell coding for this direction. This decomposition of the limbs' relative position is transferred to the nodes of the M1 module, which encode the excitation states of each of the muscles associated with each limb, each muscle representing a movement in one of the six possible directions relative to its rest position (see the third section for details).
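As an illustration of the coding just described, the following is a minimal sketch, not the author's implementation. It assumes that each joint position is first projected onto axes attached to the parent segment and that each of the six direction cells is excited in proportion to the displacement from the rest position along its direction. The function names, the `gain` parameter, and the saturation of the excitation at 1 are hypothetical choices made for the example.

```python
import numpy as np

def to_parent_frame(child_pos, parent_pos, parent_axes):
    """Express a joint position in the frame attached to its parent segment.

    parent_axes is a 3x3 array whose rows are the parent's unit x, y, z axes
    given in the observer's frame, so the product is a set of vector projections.
    """
    offset = np.asarray(child_pos, float) - np.asarray(parent_pos, float)
    return np.asarray(parent_axes, float) @ offset

def direction_cells(local_pos, rest_pos, gain=1.0):
    """Return six excitations ordered as (+x, -x, +y, -y, +z, -z).

    The excitation grows with the distance from the rest position along each
    direction; clipping to [0, 1] is an assumption made for this sketch.
    """
    d = gain * (np.asarray(local_pos, float) - np.asarray(rest_pos, float))
    positive = np.clip(d, 0.0, 1.0)   # cells coding for +x, +y, +z
    negative = np.clip(-d, 0.0, 1.0)  # cells coding for -x, -y, -z
    return np.stack([positive, negative], axis=1).ravel()

if __name__ == "__main__":
    # Hypothetical elbow position relative to a shoulder whose frame is axis-aligned.
    local = to_parent_frame([0.3, 0.1, 0.0], [0.1, 0.1, 0.0], np.eye(3))
    print(direction_cells(local, rest_pos=[0.0, 0.0, 0.0]))
```

A joint with fewer than three degrees of freedom could be handled in this sketch by keeping only the cells of the axes along which it can move.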
Note that if there are fewer than three degrees of freedom in a joint, then fewer than six nodes will be activated for representing the possible orientations of that joint.

Footnote 4: A technical report of the visual tracking system will soon be available at the University of Southern California.

Attentional module. The vision system also incorporates a simplified attentional mechanism which triggers whenever a significant change of position (relative to the position at the previous time-step) in one of the limbs is observed. Note that at this stage of the modeling and given the simplicity of this module, the attentional module does not relate to any specific brain area. The module is represented by two nodes. The first node has a self-connection and receives input from all nodes in the temporal cortex (TC). It computes the sum of activation of the TC nodes; if this sum is different from that at the previous time-step (projected through the self-connection), it fires. The second node receives an inhibition from the first node and outputs to each synapse which links M1 to PM. This node creates an inhibition, preventing information from flowing from M1 to PM and further to the cerebellum, therefore allowing learning of new movements only when a change in the limb position is observed.

Biological motivations. The recognition of conspecifics is clearly a capacity with which all animals capable of imitation are endowed. How this is done is still not completely understood. The visual system plays an important role by performing the first stages of recognition of shapes and movement. In primates, the primary visual cortex has a quasi-lattice structure where cells are arranged functionally into orientation-, color-, and size-specific columns (Newsome et al., 1989; Perrett et al., 1985).
Cells located in the cortex of the temporal lobe in monkeys have also been shown to play an active role in the recognition of movements in both the observer's (extrinsic) frame of reference and (important for our model) in the egocentric frame of reference of the observed agent (Perrett et al., 1989a, 1989b). Therefore, the assumption of the recognition of each human limb and of an explicit coding of their orientation in the imitatee's frame of reference is biologically plausible. However, this model makes no attempt to explain how this is done. Note that the aim is to increase step-by-step the biological plausibility of each module of the model. Our ongoing work is building up a more complex visual and attentional module, taking inspiration from other neural models of visual attention (Niebur & Koch, 1994; Usher & Niebur, 1996).

Motor Control

There is a three-level hierarchy in human motor control (Voogt, 1993). On the lowest level is the spinal cord, composed of primary neural circuits made of motor neurons (afferent to the muscle spindles, as well as responsible for muscle activation or inhibition) and interneurons (see footnote 5). The spinal circuits encode stretch and retracting arm movements and rhythmic movements of legs and arms involved in locomotion, i.e., central pattern generators (Stein et al., 1997). The second level is the brain stem, which is responsible for coordination of muscle activation across the spinal circuits and for low-level motor response to somato- and visuo-sensory feedback (e.g., for postural adjustments and compensation of head and eye movements) (Requin & Stelmach, 1996). The third level corresponds to three cortical areas: the primary motor cortex, premotor cortex, and supplementary motor area. The two latter areas play an important role in coordinating and planning complex sequences of movements (Rothwell, 1994). The primary motor cortex contains a motor map of the body (Penfield & Rasmussen, 1950).
It is divided into subparts, each of which activates distinct parts of the body. The division gives a topographic representation of each limb motor dimension, with bigger parts for the limbs with more degrees of freedom such as the hands and face. This model gives a basic representation of some of the functionality of the spinal cord, the primary motor cortex, and the premotor cortex.

Footnote 5: Inter- and motor-neurons are common terminology for describing the spinal cord neurons with, respectively, no direct and direct input to the muscles.

In addition to these levels, another level of motor control is provided by the cerebellum and the basal ganglia (Voogt, 1993). The main functional difference between these two regions lies in their connectivity with the rest of the motor circuit. Parts of the cerebellum have direct afferent connections from the spinal cord and efferent connections to the brain stem, and reciprocal connections with the premotor and supplementary motor cortexes. In contrast, the basal ganglia have no direct connection with the spinal cord and very few with the brain stem, while they project to regions of the prefrontal association cortex. The basal ganglia are thought to play a role in the high-level cognitive aspects of motor control (planning, execution of complex motor strategies) (Houk, 1997; Houk & Wise, 1995), as well as in gating all types of voluntary movement (see, e.g., Mink, 1996). The cerebellum has been shown to participate in motor learning (Houk et al., 1996) and in particular in learning the timing of motor sequences (Thach, 1996).

Figure 2. Schematics of the neural structure of each module and their interconnections. Uni- and bi-directional connectivity between the modules is represented by single and bidirectional arrows, respectively. Plastic connectivity is represented by plain arrows, while connections which form during learning are represented as dashed arrows (namely, those between cerebellum, PM, and M1). Connections are one neuron to all in all cases except from M1 to the spinal cord, in which case the specific connectivity is drawn.

The cerebellum module in this model is used to learn a combination of movements encoded in the premotor cortex module (PM). It is represented by the DRAMA architecture and learns the timing of the sequences. It has a bidirectional connectivity with PM. Activation of nodes in the cerebellum after learning reactivates the learned sequences of node activation in PM, which further activates nodes in the primary motor cortex (M1) and, further down, the spinal cord and the motor neurons. The PM and cerebellum modules will be described in a later section, devoted to the learning system. The modules responsible only for motor control, namely the spinal cord and primary motor cortex modules, are described in the following.

The spinal cord module. The spinal cord module comprises built-in networks of neurons, which produce retracting and stretch movements of the left and right arms and oscillations of the legs, knees, and arms, together resulting in a walking behavior. Note that only the templates for these motor behaviors are simulated here, without integrating sensory feedback (such as postural and/or balance information). At this stage, the walking pattern is given by six coupled oscillators with variable frequency. Each oscillator is composed of two interneurons and one motor neuron (see Figure 2). The stretch and retracting movements of the arms are implemented as a set of two interconnected interneurons which, when activated, lead to the sequential activation of the shoulder and elbow extensor (for stretch) and flexor (for retracting) muscles.

The oscillators are composed of neurons of intermediate complexity.
Instead of simulating each activity spike of a real neuron, the neuron unit is modeled as a leaky integrator which computes the average firing frequency (Hopfield, 1984). According to this model, the mean membrane potential $m_i$ of a neuron $N_i$ is governed by the equation

\[ \tau_i \frac{dm_i}{dt} = -m_i + \sum_j w_{i,j}\, x_j \qquad (1) \]

where $x_j = (1 + e^{(m_j + b_j)})^{-1}$ represents the neuron's short-term average firing frequency, $b_j$ is the neuron's bias, $\tau_i$ is a time constant associated with the passive properties of the neuron's membrane, and $w_{i,j}$ is the synaptic weight of a connection from neuron $N_j$ to neuron $N_i$. Each neuron exhibits internal dynamics, and even small networks of these neurons have proven able to produce rich dynamics (Beer, 1995).

The structure and parameters of the oscillators are inspired by oscillators developed using evolutionary algorithms for representing the central pattern generator underlying the swimming of the lamprey (Ijspeert et al., 1999) and the aquatic and terrestrial locomotion of the salamander (Ijspeert, 1999). These oscillators were developed to produce regular oscillations over a wide range of frequencies, with the frequency depending monotonically on the level of excitation applied to the oscillator (see footnote 6).

Footnote 6: It is relatively easy to define an oscillatory network by hand, but it is a hard task to set all the $\tau$ and $w$ parameters for producing stable oscillations over a large range of frequencies, as was realized with the genetic algorithm in Ijspeert et al. (1999) and Ijspeert (1999). In this model, the values of the weights were then adjusted by hand in order to fit the constraints created by the mechanical simulation of the humanoid, which were different from those of the salamander in Ijspeert (1999) and Ijspeert et al. (1999).

Figure 3. The humanoid avatar walking (left) and running (right).

The networks for the retracting/stretching arm movements and the walking behavior receive input from the primary motor cortex module.
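The leaky-integrator dynamics of equation 1 can be explored numerically. The following is a minimal sketch that integrates the equation with a forward Euler scheme; it is not the simulation code used in the article. The step size, the network size, and the function names are assumptions made for the example, and the sign of the exponent in the firing-rate function is taken as printed in the text (the more common logistic convention uses a negative exponent, which is a one-character change here).

```python
import numpy as np

def firing_rate(m, b):
    """Short-term average firing frequency x_j = (1 + e^(m_j + b_j))^(-1)."""
    return 1.0 / (1.0 + np.exp(m + b))

def euler_step(m, w, b, tau, dt):
    """One forward Euler step of tau_i dm_i/dt = -m_i + sum_j w[i, j] x_j."""
    x = firing_rate(m, b)
    dm = (-m + w @ x) / tau
    return m + dt * dm

def simulate(m0, w, b, tau, dt=0.001, steps=1000):
    """Integrate the network from m0 and return the membrane potentials over time."""
    m = np.asarray(m0, float).copy()
    trace = np.empty((steps + 1, m.size))
    trace[0] = m
    for k in range(steps):
        m = euler_step(m, w, b, tau, dt)
        trace[k + 1] = m
    return trace

if __name__ == "__main__":
    # Hypothetical two-neuron network; these weights are illustrative, not the article's.
    w = np.array([[0.0, -2.0], [2.0, 0.0]])
    trace = simulate(m0=[0.5, -0.5], w=w, b=np.zeros(2), tau=np.array([0.05, 0.1]))
    print(trace[-1])
```

With all weights set to zero, the potential of each neuron simply decays towards zero with time constant tau, which is a convenient sanity check before connecting neurons into oscillators.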
The amplitude of the movements and the frequency of the oscillations can easily be modulated by varying the cerebral input to the motor neurons (for the amplitude) and to the network of interneurons (for the frequency) as in Ijspeert et al. (1999), with the motor neuron output and the frequency of the oscillation increasing monotonically with the excitation level. Figure 4 shows the oscillatory activity of the motor neurons of the shoulders, legs, and knees during open-loop walking. The motor neurons of the elbows are continuously activated to produce the elbow bending which can be observed in human walking and running. When the excitation of the motor cortex neural signals sent to the motor neurons is low, the humanoid walks (making small oscillations of the legs and shoulders and always keeping one foot on the ground) and the elbows are half bent (Figure 3, left). When the excitation is increased, the amplitude and frequency of movement increase and the humanoid starts running (the gait goes through a phase in which two feet are simultaneously in the air). As mentioned above, these patterns are just the templates underlying locomotion and would by no means be sufficient to produce a dynamically stable gait without being modulated by sensory feedback.

The primary motor cortex. The primary motor cortex (M1) contains two layers of nodes, each of which has a set of nodes for each joint. The represented joints are the shoulder, elbow, wrist, finger, head, hip, and knee joints. There are three nodes for each pair of muscles (flexor-extensor) in order to regulate independently the amplitude (two nodes, one for each muscle) and the frequency (one node) of the movement. One pair of muscles is used per degree of freedom (DOF) attached to each joint. Figure 2 shows the M1 connectivity for the elbow and the shoulder DOF along the x-axis.
The first layer of neurons gets excited by the output of the visual system (TC module) for the recognition of specific limb movements in the imitatee's behavior. Recall that the output of the TC module to M1 is activated by the attentional module once a change has been observed in one of the imitatee's limb movements (see the previous section). The corresponding three-node sets in M1 are then activated to represent the new state of activation of the limbs which have been seen to have changed. Hence, the first layer of nodes in M1 represents the current state of the imitatee's limb activity in an egocentric frame of reference (as opposed to being represented in an intrinsic frame of reference, as is the case in the TC module).

Figure 4. Activity of motor neurons of extensor and flexor (dashed and solid lines, respectively) for left and right shoulders (L-Sh, R-Sh), elbows (L-El, R-El), legs (L-Le, R-Le), and knees (L-Kn, R-Kn) during open-loop walking.

The second layer of nodes gets activated by the output of the premotor area for the execution of a movement. The execution of a movement is started by the decision module, by activating one of the cerebellum nodes (the node which encodes the corresponding sequence of muscle activation). The activity of the cerebellum node is passed down to the nodes of the premotor cortex to which it is connected and, further, down to the nodes of the second layer of the primary motor cortex. Finally, the activity of the nodes in the second layer of M1 activates the nodes in the spinal cord module, which further activate the motor neurons, and these in turn activate the simulated muscles of the avatar.

There is a one-to-one mapping between the nodes of the first and second layers of M1. That is, there is an isomorphic mapping between the neural area representing the recognition of a limb movement and that controlling the execution of the same limb movement.
This mapping does not completely respect current biological findings. This will be further discussed in a later section.

The Learning System

The premotor cortex module. The premotor cortex (PM) is the location of the first stage of the learning of movement sequences. It learns combinations of excitation of the neurons in the first layer of M1, which encode the recognition of limb movements in the imitatee. The activation function of the PM neurons is the same as that of the M1 and cerebellum neurons and is given by equation 2. Learning in PM follows the same rules as those used for learning in the cerebellum, which are given by equations 4 and 5 (see the next section). The PM neurons receive input from all nodes in the first layer of M1 and output to all nodes in the second layer of M1 (see Figure 2). Learning of new sequences of movement consists of (1) updating the forward connections from the active nodes in the first layer of M1 to PM (for learning the visual pattern of the movement), and (2) updating the backward connections from PM to the corresponding neurons in the second layer of M1 (for learning the visuo-motor correspondence). Forward and backward connections, with the first and second layers respectively, have the same synaptic weights after update. In short, learning of M1-PM connectivity results in learning the visual pattern of the observed sequence as well as learning how to perform it.

In the simulations reported here, the PM module contains 130 neurons (two times the number of degrees of freedom of the joints; see the third section), similarly to the cerebellum network. Initially, the weights of all connections to these neurons are zero, except those encoding the predefined movements of reaching and grasping. Reaching consists of the coordinated activation of spinal networks, which encode the stretch movements of the elbow and the shoulder in horizontal and vertical directions.
The level of excitation given to the shoulder flexors determines the position that will be reached, as it fixes the amplitude of the movement through the motor neuron excitation. Left and right grasps consist of the coordinated activation of all flexor muscles of all fingers in the left and right hand, respectively.

The cerebellum module. Similarly to the PM module, the cerebellum module is composed of 130 nodes. Learning in both PM and cerebellum modules follows the rules of the DRAMA architecture, which is fully described in Billard and Hayes (1999). The modules in our model are composed of a set of nodes which are fully connected to all nodes in M1 (for the PM module) and in PM (for the cerebellum module), as shown in Figure 2. Each unit in the network also has a self-connection.

While in the spinal cord module the neurons were represented as leaky integrators, in the M1, PM, and cerebellum modules the neuron’s activation function follows a linear first-order differential equation given by equation 2.

$$y_i(t) = F\Big(x_i(t) + \tau_{ii}\, y_i(t-1) + \sum_{j \neq i} G\big(\tau_{ji}, w_{ji}, y_j(t-1)\big)\Big). \qquad (2)$$

F is the identity function for input values less than 1 and saturates to 1 for input values greater than 1 ($F(x) = x$ if $x \leq 1$ and $F(x) = 1$ otherwise), and G is the retrieving function whose equation is given in 3. The index notation used in the equations should be interpreted as follows: $w_{ji}$ is the weight of the connection leading from unit j to unit i.

$$G\big(\tau_{ji}, w_{ji}, y_j(t-1)\big) = A(\tau_{ji}) \cdot B(w_{ji})$$
$$A(\tau_{ji}) = 1 - \Theta\big(|y_j(t-1) - \tau_{ji}|,\ \epsilon(\tau_{ij})\big)$$
$$B(w_{ji}) = \Theta\big(w_{ji},\ \delta(w_{ij})\big). \qquad (3)$$

The function $\Theta(x, H)$ is a threshold function that outputs 1 when $x \geq H$ and 0 otherwise. The factor $\epsilon$ is an error margin on the time parameter. It is equal to $0.1 \cdot \tau_{ij}$ in the simulations, allowing a 10% imprecision in the record of the time delay of units’ coactivation. The term $\delta(w_{ij})$ is a threshold on the weight.
It is equal to $\max_{y_j > 0}(w_{ji}) / \theta(w_{ij})$, with $\theta(w_{ij}) = 2$ in the experiments of the second section. $\max_{y_j > 0}(w_{ji})$ is the maximal value of the confidence factor of all the connections between activated units j and unit i which satisfy the temporal condition encoded in $A(\tau_{ji})$.

The self-connections on the units provide a short-term memory of the units’ activation (the term $dy_i/dt = (\tau_{ii} - 1) \cdot y_i$, where $\tau_{ii} < 1$); the memory duration is determined by the decay rate $\tau_{ii}$ of unit activation along the self-connection on the unit.

Equation 2 can be paraphrased as follows: the output $y_i$ of a unit i in the network takes values between 0 and 1. $y_i(t) = 1$ when (i) an input unit $x_i$ (M1 nodes input to the PM and PM nodes input to the cerebellum) has just been activated (new motor event) or (ii) the sum of activation provided by the other network units is sufficient to pass the two thresholds of time and weight, represented by the function G (see equation 3). A value less than 1 represents the memory of a past full activation (value 1).

Table 1. Learning algorithm

1: Present an input I to TC. Compute the output of the attentional mechanism and of M1. The output vector of TC to M1 is either equal to the visual input to TC, if the TC input activity is sufficiently different from the TC input activity at the previous cycle, or equal to the zero vector.

2: Compute the output $y_i$ of all units i of the PM and cerebellum DRAMA networks, according to equation 2. An output unit is activated when the two following conditions are satisfied: (i) the time delay since activation of the input units which vote for the activation of the output unit is equal to the recorded time lag between these units’ coactivation and (ii) the connection weights of all active input units which vote for the activation of this unit are greater than a fixed percentage of the maximal value of connection strength between all active units and output units at the time of retrieval.
3: Update the connection parameters of the DRAMA networks: if $\exists\, i$ and $j$ (units of the DRAMA network) s.t. $y_i = 1$ and $y_j > 0$, choose a node $k$ s.t. $\forall l \neq k$, $w_{kl} = 0$ and $\tau_{kl} = 0$, then update $w_{ki}$, $\tau_{ki}$, $w_{kj}$, $\tau_{kj}$ of the connections from unit k to units i and j according to equations 4 and 5.

Each connection between units i and j is associated with two parameters, a weight $w_{ij}$ and a time parameter $\tau_{ij}$. Connections are bidirectional and $w_{ij} \neq w_{ji}$, unless it is so as a result of learning (as is the case in the experiments reported in this paper). Weights correspond to the synaptic strength, while the time parameters correspond to the decay rate of predendritic neurons’ activity along the synapses (similarly to feed-forward time-delay networks). Both parameters are modulated by the learning in order to represent the spatial (w) and temporal ($\tau$) regularity of the input to a node. The parameters are updated following Hebbian rules, given by equations 4 and 5. Learning starts with all weights and time parameters set to zero, unless specified differently to represent predefined connections. This is the case for the M1 module, where connections are preset for defining the grasp and reach movements.

$$\delta w_{ji} = a \cdot y_i(t) \cdot y_j(t) \qquad (4)$$

$$\tau_{ji}(t) = \frac{\tau_{ji}(t-1) \cdot (w_{ji}/a) + \big(y_j(t)/y_i(t)\big)}{(w_{ji}/a) + 1} \cdot y_i(t) \cdot y_j(t), \qquad (5)$$

where a is a constant factor by which the weights are incremented.

The result of the learning in the PM and cerebellum modules is that the network builds up the connectivity of its nodes so as to represent spatio-temporal patterns of activation in the primary and premotor systems, respectively. This will be further explained in the fourth section, which presents the results of the implementation. Table 1 presents the complete learning algorithm.

THE AVATAR ENVIRONMENT

Cosimir (Freund & Rossman, 1999), a three-dimensional simulator of two humanoid avatars (see Figure 4), is used.
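The retrieval rule of equations 2 and 3 can be made concrete with a small sketch. The Python code below is only an illustration under stated assumptions, not the implementation used in the experiments: the function names, the two-unit network, and the decay value 0.8 are hypothetical, and the extra condition that a voting connection must have a nonzero weight is an assumption added so that unlearned connections never vote. It shows a unit that is retrieved only when the decayed activity of its input matches the stored time parameter within the 10% margin.

```python
def saturate(x):
    """F in equation 2: identity up to 1, saturating at 1."""
    return min(x, 1.0)


def threshold(x, h):
    """Theta(x, H) in equation 3: 1 when x >= H, 0 otherwise."""
    return 1.0 if x >= h else 0.0


def step(x, y_prev, tau, w, theta=2.0, eps_factor=0.1):
    """One update of all units following equations 2 and 3.

    tau[j][i] and w[j][i] belong to the connection from unit j to unit i;
    tau[i][i] is the decay rate on the self-connection of unit i.
    """
    n = len(y_prev)
    y_new = []
    for i in range(n):
        # Active units j whose decayed activity matches the stored time
        # parameter within the error margin, i.e. A(tau_ji) = 1.
        # Requiring w[j][i] > 0 is an assumption of this sketch.
        passing = [
            j for j in range(n)
            if j != i and y_prev[j] > 0 and w[j][i] > 0
            and threshold(abs(y_prev[j] - tau[j][i]),
                          eps_factor * tau[j][i]) == 0.0
        ]
        # delta: maximal weight among those connections, divided by theta.
        delta = max((w[j][i] for j in passing), default=0.0) / theta
        votes = sum(threshold(w[j][i], delta) for j in passing)  # B(w_ji)
        y_new.append(saturate(x[i] + tau[i][i] * y_prev[i] + votes))
    return y_new


def run_demo(cycles=6):
    """Return the cycles at which unit 1 is fully activated."""
    decay = 0.8                      # hypothetical self-connection decay
    tau = [[decay, decay ** 2],      # connection 0 -> 1 stores a delay of
           [0.0, decay]]             # two decay steps
    w = [[0.0, 1.0],
         [0.0, 0.0]]
    y = [0.0, 0.0]
    fire_times = []
    for t in range(cycles):
        x = [1.0, 0.0] if t == 0 else [0.0, 0.0]  # unit 0 is driven at t = 0
        y = step(x, y, tau, w)
        if y[1] == 1.0:
            fire_times.append(t)
    return fire_times


if __name__ == "__main__":
    print(run_demo())
```

In this toy run, unit 0 is activated once and then decays by a factor 0.8 per cycle. Unit 1 is retrieved only at the cycle where the previous activity of unit 0 lies within 10% of the stored time parameter, and afterwards its own activity decays along its self-connection.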
An avatar has 65 degrees of freedom (DOFs): hip, shoulder, head, wrist, and ankle joints have 3 DOFs. The elbow, finger, and knee joints have 1 DOF. Fingers have three joints, except the thumbs, which have only 2. A basic mechanical simulation for the avatar was developed, simulating two muscles (flexor and extensor) for each DOF of the joints. Each muscle is represented as a spring and a damper (this is a standard model; see Ijspeert (1999)), which are excited by the motor neuron output. The external force applied to each joint is gravitation. Balance is handled by supporting the hips; ground contact is not modeled. There is no collision avoidance module. Finally, the internal torques that keep the limbs connected are not explicitly calculated. Newton’s equation for the forces acting on a joint, whose angle is $\theta$, is the following:

$$m \frac{d\dot{\theta}}{dt} = (k_e E - k_f F)\,\theta + (k_{pf} - k_{pe})\,\dot{\theta} - m\, g \sin(\theta). \qquad (6)$$

m is the mass of the limb, $g = 9.81$ m/s$^2$ is the gravitational acceleration, E and F are the amplitudes of the motor neuron signals for the extensor and flexor muscles, and $\alpha = 5$ is a factor of conversion of muscle strength resulting from the motor neuron excitation. $k_e = 0.3$ and $k_f = 0.3$ are the spring constants of the muscles. $k_{pf} = 30$ and $k_{pe} = 30$ are the damping constants of the muscles.

RESULTS

We present three examples of sequence learning implemented with the two avatars. Sequence 1 is a series of movements involving the shoulders, elbows, hips, and knees. Sequence 2 consists of oscillatory movements of the two arms. For this sequence, video data recorded from a human demonstration were used (see footnote 7). Sequence 3 is a series of movements of the right arm, hand, and fingers: reaching, followed by grasping (contraction of all fingers), a wrist rotation, and arm retraction with bending of the elbow.
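Equation 6 can be integrated numerically to see how the motor neuron signals drive a joint. The sketch below is a hedged illustration, not the simulator’s code: it uses a semi-implicit Euler step, and the limb mass, the time step, and the function names are hypothetical. It follows the equation as printed, so the conversion factor α is not applied.

```python
import math

K_E, K_F = 0.3, 0.3        # spring constants of extensor and flexor (from the text)
K_PF, K_PE = 30.0, 30.0    # damping constants of the muscles (from the text)
G = 9.81                   # gravitational acceleration in m/s^2


def joint_acceleration(theta, omega, E, F, m=1.0):
    """Angular acceleration from equation 6 (m is a hypothetical limb mass)."""
    force = ((K_E * E - K_F * F) * theta
             + (K_PF - K_PE) * omega
             - m * G * math.sin(theta))
    return force / m


def euler_step(theta, omega, E, F, m=1.0, dt=0.001):
    """Advance the joint angle and angular velocity by one time step dt."""
    acc = joint_acceleration(theta, omega, E, F, m)
    omega_new = omega + dt * acc
    theta_new = theta + dt * omega_new  # uses the updated velocity
    return theta_new, omega_new


if __name__ == "__main__":
    theta, omega = 0.1, 0.0
    for _ in range(1000):
        # Equal extensor and flexor excitation: only gravity acts here.
        theta, omega = euler_step(theta, omega, E=0.5, F=0.5)
    print(theta, omega)
```

With equal extensor and flexor signals the spring term vanishes, and with the equal damping constants quoted in the text the damping term vanishes as well, so the joint in this sketch behaves like a pendulum under gravity.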
Our choice of these sequences was motivated by the wish to demonstrate different aspects of the work, namely,

1. that learning of repetitive patterns of movements is possible (Sequence 1);
2. that the algorithm can use real data as visual input (Sequence 2);
3. that the algorithm allows learning of movements of all limbs, including precision movements of the extremities (Sequence 3).

The experiments consisted of first running the demonstration, either by entering the video data from the human demonstration (Sequence 2) or by letting the first avatar perform the predefined sequence of movements (Sequences 1 and 3). The movements of the imitatee were generated by sequentially activating specific neurons in its primary motor cortex (imitator and imitatee have the same neural structure), which in turn activated the spinal cord neurons and finally the muscles. The imitator observes the demonstration (that is, simulated or real data are processed by the visual module for recognition of limb movements) and simultaneously learns the sequential activation of each limb motion, i.e., updates the M1-PM and PM-cerebellum connectivity. Once the demonstration is finished, rehearsal of the learned sequence is instantiated in the imitator and recorded for further comparison of demonstration and imitation. Learning and rehearsal of the sequences are directed by the decision module. That is, the decision module activates the learning or rehearsal routines of the DRAMA architecture, depending on the value of a flag, instantiated by the experimenter as input to the program.

Figures 5 and 6 show the intermediate positions of the sequences of movements 1 and 3, respectively. Because Sequence 2 was composed of oscillations of small amplitude, it was difficult to represent them through a series of snapshots. Instead, the figure shows superimposed plots of the hand and elbow positions during the demonstration and one snapshot of the video recording at the beginning of the sequence (Figure 7).

Figure 5. Snapshots of intermediate positions in the taught sequence 1: The figures on top show the imitatee’s demonstration and the bottom figures show the imitator’s reproduction.

Figure 6. Snapshots of intermediate positions in the taught sequence 3 (1: reaching a position at about 30° on the right, 2: closing the fingers for grasp, 3: wrist rotation, 4: opening of grasp, retracting of the arm and flexion of the elbow). The figures on top show the imitatee’s demonstration and the bottom figures show the imitator’s reproduction.

Figure 7. Left: Snapshot of the video of the human demonstration in sequence 2 (series of oscillations of shoulders and elbows). Right: Superpositions of the hand (star points), elbow (dots), and shoulder positions during the demonstration. The figure shows the lines joining the elbows to the shoulders and the two shoulders together.

Figure 8. Activity of the motor neurons in imitator (plain line) and imitatee (dashed lines) in sequences 1 and 3. L-kn-F is the motor neuron of the flexor of the left knee. R-Sh-E-x is the motor neuron of the extensor of the right shoulder in the direction x. F1/5-1/3 correspond to the flexors of the five fingers and the three joints per finger. The thumb, finger 1, has only two joints.

Figure 9. Activity of motor neurons of imitator during repetition of sequence 2. L-Sh-x/y/z is the motor neuron for left shoulder extensor for direction x, y, and z, respectively.

Footnote 7. The visual tracking system could track only movement of the upper body part in a vertical plane. Therefore, movements of Sequence 1 and Sequence 3 could not be recorded from a human demonstration and were generated in simulation using the imitatee’s avatar.
Animations of each of the three simulations and the video of the human motion recording can be seen at the following Web site: http://www-robotics.usc.edu/ billard/imitation.html

Figure 8 shows the motor neurons’ activity in the imitatee during the demonstration (dashed line) superimposed on that in the imitator during rehearsal of the sequence (plain line) for Sequences 1 and 3 (top and bottom). Figure 9 shows the activity of the imitator avatar’s motor neurons during rehearsal of Sequence 2 (note that only the neural activity of the imitator, that is, the avatar, was accessible for this sequence).

Table 2. Growth of M1-PM and PM-cerebellum interconnectivity before and during learning of each of the three sequences

Learning stage   M1-PM   PM-Cereb
0                52      0
1                64      22
2                94      34
3                130     48

In all three examples, the imitator’s reproduction of the sequence is complete (the reader can refer to the video and animations on the above-mentioned Web site for observing the correct reproduction of Sequence 2). That is, the sequential order of muscle excitation is respected and all steps in the sequences are reproduced. However, the exact timing (the duration of excitation of each muscle) and the amplitude of the excitation are not perfectly reproduced. This is due in our model to the error margin $\epsilon$ in equation 2, which permits up to 10% (in these simulations) imprecision on the measured time delay of units’ coactivation. In order for a motor neuron to reach the maximum of its amplitude and, hence, to activate the muscle, it must receive an external excitation for a sufficiently long time. When the duration of activation is too short (due to an imprecise reproduction of the timing of excitation/inhibition of the excitatory M1 neurons), the motor neuron excitation is very weak (as in Sequence 1). This problem can easily be overcome by reducing the error margin. However, reducing the error margin decreases the robustness of the learning in
front of noisy input and one has to find a tradeoff between the two issues. In previous work on learning of time series with an autonomous robot (Billard & Hayes, 1999), we proposed an algorithm to adapt the parameters $\epsilon$ and $\theta$ in equation 2 during the learning. This algorithm was not implemented in the experiments presented here.

Table 2 shows the building of the connectivity between M1 and PM and between PM and the cerebellum during learning of the three sequences (starting with Sequence 1, followed by Sequences 2 and 3). Data are the number of nonzero connections at the four learning stages. Stage 0 is before learning, and stages 1, 2, and 3 are after learning of each of the three sequences. Initially (stage 0), 22 nodes in PM are already connected to M1 nodes (making 52 nonzero connections), while no nodes in PM and the cerebellum are yet interconnected (hence, 0 nonzero connections in PM-cerebellum). The predefined M1-PM connections encode the predefined movements of reaching (in the two frontal directions) and grasping for the two arms, as well as the connections for starting the oscillatory movements of legs and knees in walking, retracting, and kicking movements. During stages 1, 2, and 3, new connections are created between M1 and PM to represent non-predefined simultaneous activation of muscles, resulting from the excitation of specific PM neurons. Simultaneously, new connections within the cerebellum and between the cerebellum and PM are created to represent the sequential activation of the coordinated muscle activations learned in PM.

Results of Table 2 show that Sequence 1 has the biggest increase of connections between PM and cerebellum compared to Sequences 2 and 3, and that Sequence 3 builds more connections than Sequence 2.
These differences are due to the fact that Sequence 1 activates more limbs in sequence than Sequences 2 and 3, and, similarly, that Sequence 3 activates more limbs than Sequence 2: Sequence 1 requires activation of flexor and extensor muscles of the shoulders (in x and z directions), elbows, legs, and knees; Sequence 3 requires activation of the right shoulder (flexor-extensor in x), elbow (flexor), wrist (flexor-extensor), and fingers (flexor and extensor); and, finally, Sequence 2 requires movements of only the shoulders and elbow (flexor and extensor in x).

Sequence 3 results in the building of more connections between M1 and PM than Sequences 1 and 2. This is due to the fact that Sequence 3 involves the coactivation of more limbs, namely, the coactivation of extensor muscles in the simultaneous retraction of the arm with elbow flexion and opening of the grip. These movements (arm retraction with elbow flexion and opening of the grip) were not yet encoded, while the coactivations of flexor muscles in reaching and grasping were encoded. Sequences 1 and 2 made fewer connections, because an important part of the movements could be described by the preencoded stretch and retraction movements of the shoulders and elbows.

Note that in the choice of encoding, some but not all of the movements used in the demonstration were preencoded in order to show, through these three training examples, both the building of new connections and the reuse of predefined ones. In further experiments, which will address the development of human motor skill, we will investigate learning of new arbitrary movements on top of predefined movements, corresponding to those present in the early stages of infants’ development (such as grasping, reaching, and crawling (Clifton et al., 1994; Berthier, 1996; Konczak & Dichgans, 1997)).
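The comparisons drawn from Table 2 rest on the number of connections added at each learning stage, which can be checked with a short calculation. The Python sketch below is only an illustration: the counts are those of Table 2, with the sequences learned in the order 1, 2, 3, while the variable and function names are hypothetical.

```python
# Nonzero connection counts at learning stages 0 to 3, taken from Table 2.
TABLE_2 = {
    "M1-PM": [52, 64, 94, 130],
    "PM-Cereb": [0, 22, 34, 48],
}


def new_connections(counts):
    """Connections added by each sequence (difference between stages)."""
    return [after - before for before, after in zip(counts, counts[1:])]


if __name__ == "__main__":
    for name, counts in TABLE_2.items():
        added = new_connections(counts)
        for sequence, n in enumerate(added, start=1):
            print(f"{name}: Sequence {sequence} added {n} connections")
```

The differences give 12, 30, and 36 new M1-PM connections and 22, 12, and 14 new PM-cerebellum connections for Sequences 1, 2, and 3, which matches the ordering described in the text.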
DISCUSSION

This article presented a biologically inspired model of the visuo-motor transformation and the learning processes involved in learning by imitation of new motor skills in a humanoid avatar. The model was applied to learning skills involving discrete and oscillatory movements of upper and lower limbs, such as balancing of legs and arms with flexion of knees and elbows. It was also tested on more precise movements of the extremities, namely, (1) grasping, consisting of coordinated flexion of all fingers, and (2) reaching a specific point in space, which requires precise tuning of the duration of excitation of the M1 neurons responsible for the excitation of the shoulder and elbow extensor muscles.

Results showed that learning of the sequences was correct to the extent that each step of the sequence was reproduced. However, the imitator did not learn the exact duration of neural excitation for each movement, as the model allowed large imprecision in the recording of the time delay of neural coactivation. Consequently, the imitator’s reproduction of the movement was imprecise: it would allow less delay between each step of the sequences and it would sometimes make movements of lower amplitude than those demonstrated (as the amplitude of the movement is directly related to the duration of excitation of the motor neurons of the muscles responsible for producing the movement).

Biological Inspiration of the Model

The architecture proposed here gives a very high-level and abstract representation of the functionality, and not the detailed structure, of the modeled brain areas. A number of biological features were represented by this model. The modules are all modeled at a connectionist level, with the exception of the visual and decision modules. The connectivity between the modules respects that identified by neurologists. We have not introduced connections which have not been observed in the brain, but not all existing connections have been modeled.
Motor control is hierarchical (two of the three levels indicated by neurobiologists are modeled) with, at the lowest level, predefined neural oscillatory circuits, central pattern generators (Stein et al., 1997), encoding simple rhythmic movements.

A substantial number of biological features, however, are not represented in this model. Motor control is done without sensory feedback. In vertebrates, sensory feedback from muscle spindles (measuring muscle stretch), tendon, joint, and skin receptors is used to direct reflexes and control locomotor patterns. The mechanical simulation of the avatar is only a first approximation of human biomechanics and is incomplete (see the third section for details). The neural structure of each module does not correspond to that of the corresponding brain areas; the DRAMA architecture is not a plausible model of any brain area. The visual and attentional modules are not modeled to correspond to specific brain functionalities. They only serve as functional modules for a possible robotic implementation.

In addition, there are a number of problems in relation to visuo-motor control which this model did not attempt to address. These are the different neural processes involved in visual recognition of human shapes, decomposition of limb movements, and frame of reference transformation. Also, there are the aspects related to the learning of fine motor tuning in the presence of noise and in coordination with sensory feedback. Detailed models of specific parts of the brain involved in motor control and learning have been developed (e.g., Arbib & Dominey, 1995; Kawato et al., 1987; Schweighofer et al., 1998). Current and continuing work is inspired by those models.
While the modeling of a humanoid avatar’s imitation abilities is far from approaching the immense complexity of similar processes in primates, this work might still bring some insight to research on imitation: it is the first neural architecture that accounts for the imitation of grasping and reaching movements and shows that the same architecture could be used for producing imitation of all other limb movements. As such, it represents a first step towards the development of a complete neurological model of learning by imitation and towards its implementation on robots.

Mirror Neurons

Recently, the imitation community has shown an increased interest in the area F5 of the motor cortex in Rhesus monkeys, in which mirror neurons have been observed (Rizzolatti et al., 1996a; di Pellegrino et al., 1992). Mirror neurons are those neurons which fire both when the monkey observes an action performed by a human or another monkey and when it produces the same action itself. The experiments reported data on the recognition of finger prehension, holding, grasping, and manipulating (involving wrist rotations). These are all behaviors that are part of the animal’s natural repertoire and, as such, their status as a demonstration of imitation behavior is questioned. Nonetheless, there is a need for a neurological encoding of the visuo-motor transform which allows the imitator to understand its visual perceptions in terms of its own motor commands. In our model, this is done by the primary and premotor cortex modules, whose neurons fire when a specific limb action (in M1) or a combination of these (in PM) is observed or performed.

The area F5 in the monkey, in which the mirror neurons have been observed, is located within the premotor cortex, which might correspond to Broca’s area in area 6 of humans (Rizzolatti & Arbib, 1998). In this model, the same PM neurons get activated by both the observation and production of the same movements.
For this reason, the PM module gives a high-level (functional rather than neurological) representation of mirror neurons. Note that this model assumes that mirror neurons exist throughout the premotor cortex to represent movements of all limbs. However, there is as yet no evidence of a similar neural activation in monkey premotor cortex for movements of the lower limbs. Only the area F5, associated with hand movements, has been reported so far.

Fagg and Arbib, analyzing the data of Rizzolatti and Sakata, have developed a detailed model, the FARS model (Fagg & Arbib, 1998), of the neural pathways in the monkey’s premotor cortex, based on the interactions between the anterior intraparietal sulcus (which transfers visual information) and the F5 area. The model represents a biologically plausible pathway for the visuo-motor transformation involved in grasping. This model is currently being extended by Oztop and Arbib to incorporate a model of mirror neurons (personal communication). In the FARS model, the parietal analysis of visual information is transmitted directly to the premotor cortex. In this respect and others (as the brain areas are not modeled at such a detailed level), this model differs from the FARS model and lacks biological plausibility.

In this model, the visuo-motor transformation is done in M1, and not in PM as in the FARS model. The M1 module is composed of two layers of nodes which are duplicates of one another. Neurons in the first layer of M1 fire for the visual recognition of a specific limb movement (performed by the imitatee), while the corresponding nodes in the second layer of M1 fire when the imitator performs the corresponding limb movement. The connectivity between M1 and TC and the correspondence between the two M1 layers are built into this model, and it is not explained how they are learned or developed.
The building of connections between M1 and PM leads to the learning of the visuo-motor correspondence, the first step in the learning by imitation process.

This two-layer decomposition of M1 and of the PM connectivity with the TC does not correspond to (nor contradict) biological data. To our knowledge, there is so far no evidence of a neural activation in the primary motor cortex during both visual observation and motor command, as this model assumes there would be in two spatially close areas of M1. Our choice to build the visuo-motor correspondence in M1, rather than through the PM module, was motivated by the wish to give a biologically plausible explanation for the building of the backward connections to the motor system. The fact that learning an observed sequence also means learning how to perform it is based on the assumption that when synapses of the connections from the upper layer of M1 to PM are updated (e.g., following an observed movement in the left knee), the synapses of the corresponding connections from the PM to the lower layer of M1 are updated in the same way (e.g., to encode the movement of the knee). Such a simultaneous update of synapses could be explained by a spatial proximity of the two connections. Note that having these two distinct layers is important to separate learning an observed movement from performing it, and allows, for instance, the imitator to learn an observed movement while performing another.

In contrast, if TC were directly connected to PM (as in the FARS model), then a more complex mechanism (than the connection proximity used in this model) would be needed in order to learn the correspondence between TC-PM and PM-M1 connectivity. This mechanism could, for instance, be a meta-level representation of the correspondence between visual and motor patterns for each movement.
In this case, the mirror neuron system would not be sufficient to produce the visuo-motor transformation necessary for learning by imitation, and another brain area should be found to account for this meta-level representation.

Mirror neurons are often described as reacting to the goal of movements rather than to the actual motor pattern. They describe an action (such as grasping or reaching) rather than a movement (i.e., the particular limbs’ activity) (Rizzolatti et al., 1996a; Rizzolatti & Arbib, 1998). In the experiments presented here, the model allowed copying of movements, i.e., of a specific sequence of limb activations, rather than copying of the movements’ goal or effect on the environment. In this respect, the model does not satisfy the above description of mirror neurons. The model should be improved so that it incorporates a module which generalizes over different limbs’ movements and recognizes an action as the co-occurrence of one of these movements with the recognition of a specific perceptual (visual) context (e.g., the contact of the end point of the moving limb with an object for the action of reaching). In the present model, the recognition of movements can be made independent of the specific timing between each limb motion in the sequence and the specific amplitude of each limb movement (by allowing a loose match of weight and time parameters in equation 2). This is a first step toward recognizing general motions over specific ones. It remains now to improve the model’s visual ability for recognition of objects versus human limbs, and to add a module that makes associations across this new visual input and the output of the current modules for recognition of movements.

However, one should mention that true imitation, such as that required for learning dance steps, must rely on the ability to recognize limb-specific movements, which often cannot be related to a usual, goal-directed motion.
Two dance steps are often discriminated only by the timing and the amplitude of the movement, while the two steps activate the same limbs. The leap between mimicry, i.e., the ability to reproduce motions which are part of the animal’s usual repertoire, and true imitation lies perhaps in having the ability to decompose the recognition of movements with respect to a limb-by-limb representation, using a detailed parametrization of the movement. The human mirror system, if it exists, would need to incorporate neurons that would be selective to specific movements, in addition to neurons selective to actions (as those of monkeys). While this model does not account for goal-directed imitation, it does allow imitation of arbitrary movements that are limb, amplitude, and timing specific. A complete model of human imitation should allow both goal-directed and arbitrary imitation. Further work will improve the model towards this end.

Imitation and the Development of Language

Mirror neurons permit the passage from observation to execution of movements. Such a neural mechanism provides the grounds for body communication and, further, verbal communication. Mirror neurons could thus be a necessary device for both imitation and language. The observation that the area F5 in monkeys could correspond to Broca’s area (see footnote 8) in the corresponding area 6 of the human motor cortex leads Rizzolatti and Arbib (1998) to propose that “human language [...] evolved from a basic mechanism that was not originally related to communication: the capacity to recognize actions. [...] Natural selection yielded a set of generic structures for matching action, observation, and execution. These structures coupled with appropriate learning mechanisms proved great enough to support cultural evolution of human languages in all their richness.” It is interesting to relate this claim to studies of psycholinguistics (Garton, 1992; Nadel et al., 1999; Trevarth et al.
1999), which stress the importance of social cues, such as coordinated behavior (of which imitation is one instance), as the precursor to language development in infants (see Billard (2000) for a review). Imitation has been attributed to three different roles in infants’ language development: in the motor control of speech, the infants’ internal cognitive development, and the infants’ social interactions. In the last, imitation is as a social factor which guide the infants’ cognitive development. This developmental step is `à marker for the child’s development of a more complex form of 8 Broca’s area (see Figure 1) is thought to be involved in the control of motor program for coordinating mouth movements for speech. It also has a role in the processing of syntax, as Broca’s patients not only speak slowly but also agrammatically. LEARNING MOTOR SKILLS BY IMITATION 187 verbal communication exchanges’’ (Nadel et al., 1999). If imitation and language require the development of a common neural system, it is not surprising that both skills are observed to develop within the same stage of development. The hypothesis of a relationship between the neural mechanisms responsible for imitation and that responsible for language are very interesting, as we use the same learning architecture (DRAMA) in both the present model, which allows learning of motor skills by imitation, as in previous experiments on robot learning of a language (Billard & Hayes, 1999, 1998; Billard and Dautenhahn, 1999). In this article, DRAMA models the M1 (which contains the mirror neurons) and the cerebellum. Recent studies suggest other cognitive functions for the cerebellum in addition to motor control (Paulin, 1993; Keele & Ivry, 1990). Particularly relevant to the argument of this section is the study of Leiner et al. (1993), which links the changes in the cerebellar structure (occurring during hominid evolution) and the evolution of human language. 
Their argument follows from the observation that “in the human brainstem a neural loop has evolved in which the red nucleus receives a projection from language areas of the cerebral cortex. This input to the red nucleus would enable the neural loop to participate in language functions as well as motor functions. It could participate both in the cognitive process of expressing these words and in the motor process of expressing these words, perhaps functioning as a language-learning loop.”

The model presented in this article will be extended to integrate an auditory module. Experiments will then be conducted in which the robot will be taught complete sentences to describe the newly learned sequences of movements. These experiments will investigate a potential link between the neural mechanisms used for learning a language and those used in motor learning. In previous work (Billard, 2000, 1999), it was shown that a robot controlled by the DRAMA architecture could be taught proto-sentences, such as “you touch left arm” and “I move head right,” to describe its interactions with the teacher. We hypothesize that the neural processes for motor and language learning require a general mechanism for spatio-temporal association across multimodal inputs and for learning of complex time-series. Such a mechanism corresponds in part to the function of the cerebellum in the brain, and the DRAMA architecture is a possible model of it.

CONCLUSION

This article presented a connectionist architecture for learning motor skills by imitation. The architecture is biologically inspired. It gives an abstract and very high-level representation of the functionality, but not the structure, of some of the brain’s cortical areas, namely, the visual cortex, premotor and primary motor cortexes, and cerebellum. It also models the spinal cord as predefined networks of motor- and inter-neurons, i.e., central pattern generators.
Learning in the motor cortex and cerebellum results from spatio-temporal association of multimodal inputs and is provided by DRAMA, a connectionist architecture for learning of time-series. We discussed the limitations and contributions of this model to robotics and neurobiology. As a robotic model, it provides a complete connectionist mechanism for learning of motor skills by imitation, involving all degrees of freedom of a humanoid robot. As a biological model, it gives a high-level representation of the visuo-motor pathways responsible for learning by imitation in primates. It gives a simple representation of mirror neurons. We also discussed further development of the model, following our previous work on robot learning of a language, which would address the hypothesis of a similarity between the cognitive structures responsible for the learning of motor skills and those responsible for the learning of a language.

The architecture was validated in a mechanical simulation of a pair of imitator-imitatee humanoid avatars for learning three types of movement sequences. These experiments showed that the architecture can learn (1) combinations of movements involving all joints, including the finger joints; (2) complex oscillatory patterns; and (3) sequences with variable timing, as is the case with the human demonstration.

Further work will gradually improve the biological plausibility of each of the architecture’s modules and of the overall organization. The mechanical simulation of the avatars is currently being improved with a view to its implementation in a humanoid robot.

ACKNOWLEDGMENTS

Many thanks to the anonymous reviewers whose comments helped greatly improve the article. Lots of thanks to Michael Arbib for constructive comments concerning the writing of this article and for his guidance towards improving the current model. Warmest thanks to Auke Jan Ijspeert for his invaluable help on modeling the spinal cord module.
Many thanks to Maja Mataric for fruitful discussions which helped improve this work. Many thanks to Jürgen Rossmann and Dirk Pensky at the Robotics Institute at the University of Dortmund for providing the humanoid version of the Cosimir simulator. Many thanks to Stefan Weber for providing the visual tracking system and the data on human motion. This work was funded by the National Science Foundation Award IRI-9624237 to M. Mataric.

REFERENCES

Arbib, M. A., and P. F. Dominey. 1995. Modeling the roles of basal ganglia in timing and sequencing saccadic eye movements. In Models of information processing in the basal ganglia, 149-162.
Beer, R. D. 1995. On the dynamics of small continuous-time recurrent neural networks. Adaptive Behavior 3(4):469-510.
Berthier, N. 1996. Infant reaching strategies: Theoretical considerations. Infant Behavior and Development 17:521.
Billard, A. 1998. DRAMA, a connectionist model for robot learning: Experiments on grounding communication through imitation in autonomous robots. PhD thesis, Dept. of Artificial Intelligence, University of Edinburgh, UK.
Billard, A. 1999. DRAMA, a connectionist architecture for on-line learning and control of autonomous robots: Experiments on learning of a synthetic proto-language with a doll robot. Industrial Robot 26(1):59-66.
Billard, A. 2000. Imitation: A means to enhance learning of a synthetic proto-language in an autonomous robot. In K. Dautenhahn and C. Nehaniv, eds., Imitation in animals and artifacts. Cambridge: MIT Press.
Billard, A., and K. Dautenhahn. 1999. Experiments in social robotics: Grounding and use of communication in autonomous agents. Adaptive Behavior, Special Issue on Simulation of Social Agents 7(3/4):415-438.
Billard, A., and G. Hayes. 1998. Transmitting communication skills through imitation in autonomous robots. In A. Birk and J. Demiris, eds., Learning robots: A multi-perspective exploration, 79-95. LNAI Series, Springer-Verlag.
Billard, A., and G. Hayes. 1999. DRAMA, a connectionist architecture for control and learning in autonomous robots. Adaptive Behavior 7(1):35-64.
Byrne, R. W., and A. Whiten. 1988. Machiavellian intelligence: Social expertise and the evolution of intellect in monkeys, apes, and humans. Oxford Science Publications. Oxford: Clarendon Press.
Clifton, R., P. Rochat, D. Robin, and N. Berthier. 1994. Multimodal perception in the control of infant reaching. Journal of Experimental Psychology: Human Perception and Performance 20:876-886.
Cooke, S. R., B. Kitts, R. Sekuler, and M. J. Mataric. 1997. Delayed and real-time imitation of complex visual gestures. In Proceedings of the International Conference on Vision, Recognition, Action: Neural Models of Mind and Machine, Boston University, May 28-31.
Dautenhahn, K. 1995. Getting to know each other - artificial social intelligence for autonomous robots. Robotics and Autonomous Systems 16:333-356.
Demiris, J., and G. Hayes. 1996. Imitative learning mechanisms in robots and humans. In Proceedings of the 5th European Workshop on Learning Robots, Bari, Italy, 9-16. Also as Research Paper No. 814, Dept. of Artificial Intelligence at the University of Edinburgh.
Demiris, J., S. Rougeaux, G. M. Hayes, L. Berthouze, and Y. Kuniyoshi. 1997. Deferred imitation of human head movements by an active stereo vision head. In Proceedings of the 6th IEEE International Workshop on Robot Human Communication, 88-93. IEEE Press, Sendai, Japan, Sept.
di Pellegrino, G., L. Fadiga, L. Fogassi, V. Gallese, and G. Rizzolatti. 1992. Understanding motor events: A neurophysiological study. Experimental Brain Research 91:176-180.
Fagg, A. H., and M. A. Arbib. 1998. Modelling parietal-premotor interactions in primate control of grasping. Neural Networks 11:1277-1303.
Ferraina, S., P. B. Johnson, and M. R. Garasto. 1997. Combination of hand and gaze signals during reaching: Activity in parietal area 7m of the monkey. Journal of Neurophysiology 77(2):1034-1038.
Gallese, V., G. Rizzolatti, L. Fadiga, and L. Fogassi. 1996. Action recognition in the premotor cortex. Brain 119:593-609.
Garton, A. F. 1992. Social interaction and the development of language and cognition. In Essays in developmental psychology. Mahwah, USA: Lawrence Erlbaum Associates.
Gaussier, P., S. Moga, J. P. Banquet, and M. Quoy. 1998. From perception-action loop to imitation processes: A bottom-up approach of learning by imitation. Applied Artificial Intelligence 7(1):701-729.
Geschwind, N. 1979. Specialisation of the human brain. Scientific American 24(3):180-199.
Herman, L. M. 1990. Cognitive performance of dolphins in visually-guided tasks. In J. Thomas and R. Kastelein, eds., Sensory abilities of cetaceans, 455-462. New York: Plenum Press.
Heyes, C. M. 1996. Social learning in animals: The roots of culture. San Diego: Academic Press.
Hopfield, J. J. 1984. Neurons with graded response properties have collective computational properties like those of two-state neurons. In Proceedings of the National Academy of Sciences, Volume 81, 3088-3092. Washington, DC: The Academy.
Houk, J. C. 1997. On the role of the cerebellum and basal ganglia in cognitive signal processing. In C. I. De Zeeuw, P. Strata, and J. Voogd, eds., Progress in brain research, Volume 114, 543-552. New York: Elsevier Science.
Houk, J. C., J. T. Buckingham, and A. G. Barto. 1996. Models of the cerebellum and motor learning. Behavioral and Brain Sciences 19(3):368-383.
Houk, J. C., and S. P. Wise. 1995. Distributed modular architectures linking basal ganglia, cerebellum, and cerebral cortex: Their role in planning and controlling action. Cerebral Cortex 5(2):95-110.
Ijspeert, A. J. 1999. Synthetic approaches to neurobiology: Review and case study in the control of anguilliform locomotion. In D. Floreano, F. Mondada, and J.-D. Nicoud, eds., Proceedings of the Fifth European Conference on Artificial Life, ECAL99, 195-204. New York: Springer-Verlag.
Ijspeert, A. J., J. Hallam, and D. Willshaw. 1998. From lampreys to salamanders: Evolving neural controllers for swimming and walking. In R. Pfeifer, B. Blumberg, J.-A. Meyer, and S. W. Wilson, eds., From Animals to Animats, Proceedings of the Fifth International Conference of The Society for Adaptive Behavior (SAB98), 390-399. Cambridge: MIT Press.
Ijspeert, A. J., J. Hallam, and D. Willshaw. 1999. Evolving swimming controllers for a simulated lamprey with inspiration from neurobiology. Adaptive Behavior 7(2):151-172.
Kawato, M., K. Furukawa, and R. Suzuki. 1987. A hierarchical neural network model for control and learning of voluntary movement. Biological Cybernetics 57:169-185.
Keele, S. W., and R. Ivry. 1990. Does the cerebellum provide a common computation for diverse tasks? A timing hypothesis. Annals of the New York Academy of Sciences 608:179-211.
Konczak, J., and J. Dichgans. 1997. The development toward stereotypic arm kinematics during reaching in the first 3 years of life. Experimental Brain Research 117(2):346-354.
Kuniyoshi, M. I., and I. Inoue. 1994. Learning by watching: Extracting reusable task knowledge from visual observation of human performance. IEEE Transactions on Robotics and Automation 10:799-822.
Fogassi, L., V. Gallese, L. Fadiga, and G. Rizzolatti. 1998. Neurons responding to the sight of goal-directed hand/arm actions in the parietal area PF (7b) of the macaque monkey. 28th Annual Meeting of Society for Neuroscience, Los Angeles, CA.
Leiner, H. C., A. L. Leiner, and R. S. Dow. 1993. Cognitive and language function of the human cerebellum. Trends in Neurosciences 16:1-4.
Jeannerod, M., M. A. Arbib, G. Rizzolatti, and H. Sakata. 1995. Grasping objects: Cortical mechanisms of visuomotor transformations. Trends in Neurosciences 18:314-320.
Paulin, M. 1993. The role of the cerebellum in motor control and perception. Brain, Behavior and Evolution 41:39-50.
Mink, J. W. 1996. The basal ganglia: Focused selection and inhibition of competing motor programs. Progress in Neurobiology 50(4):381-425.
Moore, B. R. 1996. The evolution of imitative learning. In C. M. Heyes and B. G. Galef, eds., Social learning in animals: The roots of culture, 245-265. New York: Academic Press.
Nadel, J., C. Guerini, A. Peze, and C. Rivet. 1999. The evolving nature of imitation as a format for communication. In Imitation in infancy, 209-234. London: Cambridge University Press.
Nehaniv, C., and K. Dautenhahn. 1998. Mapping between dissimilar bodies: Affordances and the algebraic foundations of imitation. In A. Birk and J. Demiris, eds., Proceedings of EWLR97, 7th European Workshop on Learning Robots, Edinburgh, July.
Newsome, W. T., K. H. Britten, and J. H. Movshon. 1989. Neuronal correlates of a perceptual decision. Nature 341:52-54.
Niebur, E., and C. Koch. 1994. A model for the neuronal implementation of selective visual attention based on temporal correlation among neurons. Journal of Computational Neuroscience 1:141-158.
Penfield, W., and T. Rasmussen. 1950. The cerebral cortex of man: A clinical study of localisation of function. New York: Macmillan.
Perret, D. I., M. Harries, R. Bevan, S. Thomas, P. J. Benson, A. J. Mistlin, A. J. Chitty, J. K. Hietanen, and J. E. Ortega. 1989. Frameworks of analysis for the neural representation of animate objects and actions. Journal of Experimental Biology 146:87-113.
Perret, D. I., M. Harries, A. J. Mistlin, and A. J. Chitty. 1989b. Three stages in the classification of body movements by visual neurons. In H. B. Barlow, ed., Images and understanding, 94-107. London: Cambridge University Press.
Perret, D. I., P. A. J. Smith, A. J. Mistlin, A. J. Chitty, A. S. Head, D. D. Potter, R. Broennimann, A. D. Milner, and M. A. Jeeves. 1985. Visual analysis of body movements by neurones in the temporal cortex of the macaque monkey: A preliminary report. Behavioral Brain Research 16:153-170.
Requin, J., and G. E. Stelmach. 1990. Tutorials in motor neuroscience. In NATO ASI Series, Series D: Behavioral and social sciences, Vol. 62. Norwell, MA: Kluwer Academic Publishers.
Rizzolatti, G., and M. A. Arbib. 1998. Language within our grasp. Trends in Neurosciences 21:188-194.
Rizzolatti, G., L. Fadiga, V. Gallese, and L. Fogassi. 1996a. Premotor cortex and the recognition of motor actions. Cognitive Brain Research 3:131-141.
Rizzolatti, G., L. Fadiga, M. Matelli, V. Bettinardi, D. Perani, and F. Fazio. 1996b. Localization of grasp representations in humans by positron emission tomography: 1. Observation versus execution. Experimental Brain Research 111:246-252.
Freund, E., and J. Rossmann. 1999. Projective virtual reality: Bridging the gap between virtual reality and robotics. IEEE Transactions on Robotics and Automation, Special Section on Virtual Reality in Robotics and Automation 15(3):411-422. www.irf.de/cosimir.eng/
Rothwell, J. 1994. Control of human voluntary movement. London: Chapman & Hall.
Schaal, S. 1997. Learning from demonstration. Advances in Neural Information Processing Systems 9:1040-1046.
Schaal, S. 1999. Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences 3(6):233-242.
Schmajuk, N. A., and B. S. Zanutto. 1997. Escape, avoidance, and imitation: A neural network approach. Adaptive Behavior 6(1):63-129.
Schweighofer, N., J. Spoelstra, M. A. Arbib, and M. Kawato. 1998. Role of the cerebellum in reaching movements in humans. II. A neural model of the intermediate cerebellum. European Journal of Neuroscience 10(1):95-105.
Stein, P. S. G., S. Grillner, A. I. Selverston, and D. G. Stuart. 1997. Neurons, networks and motor behavior. A Bradford book. Cambridge: MIT Press.
Thach, W. T. 1996. On the specific role of the cerebellum in motor learning and cognition - clues from PET activation and lesion studies in man. Behavioral and Brain Sciences 19(3):411-431.
Tomasello, M. 1990. Cultural transmission in the tool use and communicatory signaling of chimpanzees. In Language and intelligence in monkeys and apes: Comparative developmental perspectives, 274-311.
Trevarthen, C., T. Kokkinaki, and G. A. Fiamenghi Jr. 1999. What infants’ imitations communicate: With mothers, with fathers and with peers, 61-124. London: Cambridge University Press.
Usher, M., and E. Niebur. 1996. A neural model for parallel, expectation-driven attention for objects. Journal of Cognitive Neuroscience 8(3):305-321.
Visalberghy, E., and D. Fragaszy. 1990. Do monkeys ape? In Language and intelligence in monkeys and apes: Comparative developmental perspectives, 247-273.
Voogt, J. 1993. Anatomy of the motor system. In Neural aspect of human movements, 1-11. Amsterdam: Swets & Zeitlinger.
Whiten, A., and R. Ham. 1992. On the nature and evolution of imitation in the animal kingdom: Reappraisal of a century of research. Advances in the Study of Behavior 21:239-283.
