Emergence of Mirror Neurons in a Model of Gaze Following
Jochen Triesch, Hector Jasso, and Gedeon O. Deák
Frankfurt Institute for Advanced Studies, Max-von-Laue-Str. 1, D-60438 Frankfurt/Main, Germany
Dept. of Cognitive Science, 9500 Gilman Drive, La Jolla, CA 92093, USA, triesch,deak@cogsci.ucsd.edu
Dept. of Computer Science and Engineering, 9500 Gilman Drive, La Jolla, CA 92093, USA, [email protected]
Abstract— We present a computational model of the emergence of gaze following, which is the ability to re-direct
one’s gaze to the location that another agent is looking at.
The model acquires gaze following by learning to predict the
locations of interesting sights from the looking behavior of
other agents through reinforcement learning. In doing so, the
model develops pre-motor representations that exhibit many
properties characteristic of mirror neurons. The predicted new
class of mirror neurons is specific to looking behaviors. The
model offers a simple account of how these and other mirror
neurons may acquire their special response properties. In this
account, visual representations of other agents' actions become
associated with pre-motor neurons that represent the intention
to perform corresponding actions.
Index Terms— gaze following, mirror neuron, reinforcement
learning, temporal difference learning, actor critic architecture,
imitation, response facilitation
I. INTRODUCTION
Gaze following is the ability to look where somebody
else is looking. This skill is considered to be a foundational
component of humans’ social interaction abilities. Gaze following emerges in a progressive fashion during the first two
years of life [16]. While precursors of gaze following can
be observed in newborns, some gaze following behaviors do
not emerge until 18 months or later. For example, while young
infants will only follow gaze to target objects that are already
inside their field of view, older infants will also follow gaze
to targets behind them [1]. Similarly, while young infants
will be easily “fooled” by additional distractor objects which
are not being looked at, older infants are more accurate in
estimating the correct target of others’ gaze. Finally, there is
also an interesting development in the kinds of visual cues
that infants use for gaze following. While they seem to mostly
follow others’ head movements early on, they will later
become sensitive to the status of the eyes of the other person.
There has been much interest in gaze following in recent
years, because researchers believe that the developing gaze
following capacities in infants may reflect infants’ developing
understanding of others as perceiving intelligent agents that
are “like them.” In fact, many gaze following experiments
have been designed specifically with the goal of elucidating
what the nature of the infant’s understanding of other people
may be.
Over the last few years, several groups, including our own, have
been working on computational models that try to account
for various aspects of the infant’s developing gaze following
abilities and explain this development in neural terms. We
think that it is good modeling practice to keep models as
simple as possible. In doing so, we do not deny that infants
will ultimately develop very sophisticated representations of
themselves and others. Rather, we would like to clarify how
these representations may emerge, how they may be built on
top of earlier and more primitive representations, and what
the underlying developmental driving forces for this process
may be. In order to do so, it is best to start with relatively
simple models and to extend and refine them as needed.
In a companion paper [6], we have proposed a new model
for the emergence of gaze following skills during the first
18 months of life. The model is able to account for a
wealth of experimental findings and is based on neurally
plausible learning mechanisms. An outline of the model
will be presented below. The focus of this paper is the
unexpected finding that the model develops internal pre-motor representations that share many properties with so-called
mirror neurons.
A. Mirror Neurons
Mirror neurons are a class of pre-motor neurons originally
found in macaque area F5 [14]. Their defining characteristic
is that these neurons become activated when the animal
performs an action such as reaching for an object and
grasping it or when the animal sees another agent perform the
same action. Because of this property, it has been suggested
that mirror neurons may play a role in the understanding of
other’s actions and recent evidence supports this hypothesis
[5]. Some mirror neurons can be triggered through different
modalities. For example, a mirror neuron may respond to
seeing the action of tearing paper or to hearing the same
action being performed [7]. Interestingly, for some mirror
neurons it is sufficient if the performance of the action can
be inferred, even though it may not be fully visible [22].
Converging evidence points to the existence of a similar
system of mirror neurons in humans. What is currently
unclear, however, is how mirror neurons arrive at their
specific response properties. We find it very unlikely that a
sophisticated mirror system could be innate, in the sense of
a detailed pre-specified connection pattern for every neuron.
Rather, we believe that learning processes must play an
important role in the formation of the mirror system and
below we will demonstrate how such a learning process might
occur.
B. Gaze Following as Imitation
Mirror neurons are thought to play a central role in imitation behaviors and representations of goal-directed actions. In
its most general sense, imitation occurs when an individual
observes another’s behavior and replicates it. In this sense,
gaze following can be viewed as imitation [12], [19]. The
behavior that is being observed is another’s gaze shift to a
particular location in space, and this behavior is replicated (since the "behavior" is very simple in this case, it may be more
appropriate to speak of a motor act, but the imitative nature of gaze following is still obvious).
Most authors use stricter definitions of "true" imitation,
however, and might prefer to consider acts of gaze following
as a form of response facilitation because the copied behavior
(a gaze shift to a certain location) is not novel but already
part of the agent’s behavioral repertoire (see, e.g., [17]). This
distinction is of little importance for the argument of this
paper, however, and we will not elaborate on this issue. It
is easy to imagine the role that mirror neurons may play in
imitation or response facilitation. If another agent is observed
performing an action, then this leads to the activation of a
population of mirror neurons that code specifically for this
action. Because of the motor properties of mirror neurons,
this representation of the other agent’s action can be used to
trigger execution of the “same” action. As we will see below,
this is exactly how gaze following occurs in the model. The
novelty of this paper is that our model offers an account of how
neurons with mirror properties can be learned from scratch
(see also [11] for a proposal of how mirror neurons for object-directed actions may emerge) and that it predicts a new class
of mirror neurons that have not been observed yet.
II. COMPUTATIONAL MODEL OF THE EMERGENCE OF GAZE FOLLOWING
The proposed model is a blend of two previous models of
the emergence of gaze following. Like [2], our model is based
on reinforcement learning [18], [20]. In particular, it uses
an actor-critic architecture as a biologically plausible model
for reward-driven learning. Like [8], our model addresses
spatial aspects of the gaze following problem. In particular,
it allows us to model how infants learn to cope with the spatial
ambiguity that results from only being able to observe their
caregiver's gaze direction, but not the precise location of the caregiver's
focus of attention. Due to space limitations, we can only give
an overview of the model here; a more detailed description
of the model and its behavior is given in a companion paper
[6].

Fig. 1. The model comprises an infant and a caregiver interacting with a
number of salient objects.
In the model, an infant and its caregiver interact with a
number of salient objects as illustrated in Fig. 1. The infant
receives a reward for looking at salient objects, modeling
the infant’s looking preferences. During learning, the infant
discovers that the caregiver's direction of gaze often
predicts the locations of salient objects. The infant learns
to associate a particular gaze direction of the caregiver with
an increased probability of finding interesting visual stimuli
in the locations along the caregiver’s line of sight.
The infant is modeled as a reinforcement learning agent
with an actor-critic architecture as shown in Fig. 2. The
infant’s visual system extracts three kinds of feature vectors
from the visual scene. We can think of each of them as
being represented in a layer of model neurons described by
a vector of real numbers. First, the visual system extracts the
locations of salient objects within the infant’s field of view.
For convenience, these are represented in a two-dimensional,
body-centered representation s, where each element s_k of s
corresponds to one of 64 locations around the infant, as
indicated in Fig. 2. (The restriction to two dimensions is a matter of
convenience and reflects the fact that most experiments on infant gaze
following use a two-dimensional layout where infant, adult, and objects
are in the same plane, i.e. at the same height from the floor. We expect
our model to generalize easily to a fully three-dimensional setting.)

Fig. 2. The infant model comprises a number of brain areas including different visual cortical areas, basal ganglia, pre-motor cortex, and motor cortex. See
text for details.

The components of s are given by:

$$ s_k \;=\; \begin{cases} \sum_{i \in k} \sigma_i & \text{if location } k \text{ is visible,} \\ 0 & \text{otherwise,} \end{cases} \qquad (1) $$

where σ_i is the saliency of object i and the sum runs over
all objects present at location k. Location k is considered
visible if it falls within the infant's current field of view.
Maps of the saliency of different visual stimuli may be
present in primary visual cortex [9] but also in higher visual
areas, particularly in the dorsal pathway. Our assumption of
a body-centered representation (in contrast to a retinotopic
one) is not physiologically accurate but it frees us from
having to model coordinate transformations between different
coordinate systems (although it is an interesting question in
its own right when and how infants learn to compute certain
coordinate transformations).
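As a concrete illustration, the following sketch (in Python/NumPy, not part of the original model code) shows how a body-centered saliency vector of the form (1) could be computed. The number of locations matches the 64 used in the model, but the object representation and the visibility mask are assumptions made for this example only.

```python
import numpy as np

N_LOCATIONS = 64  # body-centered locations around the infant

def saliency_vector(objects, visible):
    """Compute the saliency vector s of Eq. (1).

    objects : list of (location_index, saliency) pairs for the objects in the scene
    visible : boolean array of length N_LOCATIONS, True for locations inside the
              infant's current field of view
    """
    s = np.zeros(N_LOCATIONS)
    for loc, sigma in objects:
        if visible[loc]:
            s[loc] += sigma  # sum the saliencies of all objects at a visible location
    return s

# Example: a single object of saliency 1.0 at location 10, with the front half visible
visible = np.zeros(N_LOCATIONS, dtype=bool)
visible[:32] = True
s = saliency_vector([(10, 1.0)], visible)
```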
When the caregiver is inside the infant's field of view,
the infant will represent the head pose of the caregiver and
the direction of the caregiver's eyes in vectors h and e,
respectively. Each unit in layer h (or e) codes for a specific
orientation of the caregiver's head (eyes). Such representations of head pose and gaze direction may be found in the
superior temporal sulcus (STS) in monkeys [13]. Separate
representations for the caregiver's head pose and eye direction
allow us to capture the development of the infant's differential
sensitivity to these cues, although we do not claim that these
representations have to be anatomically separated in the brain.
The three vectors of visual features serve as the input to
the actor and the critic parts of the infant model. The actor maintains
a representation of competing action plans (in our case,
plans for different gaze shifts) in a pre-motor layer m. Each
element of m corresponds to a gaze shift to a certain location
in space. These locations are again conveniently represented
in a body-centered coordinate system, such that there is a one-to-one correspondence between features in the visual saliency
layer s and the pre-motor layer m. The visual representations
s, h, e activate layer m via a matrix of connection weights
M. Formally, m = M u, where u is the full vector of features
resulting from concatenating s, h, and e. The actor chooses
among the different action plans via probabilistic soft-max
action selection, where the probability of choosing action a
is given by:

$$ P(a) \;=\; \frac{\exp(\beta\, m_a)}{\sum_{a'} \exp(\beta\, m_{a'})}\,, \qquad (2) $$

where β is an inverse temperature parameter balancing
exploration and exploitation.
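A minimal sketch of this soft-max action selection, assuming the pre-motor activations are held in a NumPy array m; the default inverse temperature is an illustrative value, not one taken from the paper.

```python
import numpy as np

def select_action(m, beta=2.0, rng=None):
    """Sample a gaze shift according to the soft-max policy of Eq. (2).

    m    : pre-motor activations, one entry per candidate gaze-shift location
    beta : inverse temperature; larger values favor exploitation over exploration
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = beta * np.asarray(m, dtype=float)
    logits -= logits.max()                      # subtract the max for numerical stability
    p = np.exp(logits) / np.exp(logits).sum()   # action probabilities, Eq. (2)
    a = rng.choice(len(p), p=p)                 # sample one gaze shift
    return a, p
```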
The critic estimates the value of the current situation via a
linear estimate based on the full feature vector u: V = w^T u,
where w^T denotes the transpose of the weight vector w. Based
on this estimate it calculates an error signal δ:

$$ \delta \;=\; r - V\,, \qquad (3) $$

where r is the received reward and V is the estimated value
of the state (see below).
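In the same illustrative Python notation as above (w and u are NumPy arrays), the critic's linear value estimate and the error signal of Eq. (3) amount to two lines:

```python
import numpy as np

def critic_error(w, u, r):
    """Value estimate V = w^T u and error signal delta = r - V (Eq. 3)."""
    V = float(np.dot(w, u))
    delta = r - V
    return V, delta
```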
The infant receives two kinds of rewards. First, the infant
receives a positive reward corresponding to the saliency of
the location where the infant is looking. This models infants’
preferences for looking at salient stimuli (moving, high
contrast, etc.). Second, the infant receives a small negative
reward that is proportional to how far the infant is turning
away from the forward direction. This models the discomfort
associated with turning around, e.g., to look at what’s behind
the infant [4].
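A sketch of this reward scheme. The saliency term is read off the feature vector at the fixated location; the turning cost is assumed here to be a linear penalty on the angular distance from straight ahead, with an arbitrary constant, since the text only states that it is proportional to how far the infant turns.

```python
import numpy as np

def reward(s, gaze_location, location_angles, turn_cost=0.01):
    """Reward for one time step: saliency of the fixated location minus a turning cost.

    s               : body-centered saliency vector (Eq. 1)
    gaze_location   : index of the location the infant is currently fixating
    location_angles : angle of each location relative to straight ahead (radians)
    turn_cost       : assumed constant scaling the discomfort of turning away
    """
    positive = s[gaze_location]
    negative = turn_cost * abs(location_angles[gaze_location])
    return positive - negative
```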
Learning occurs through adaptation of the weight matrix M
for the actor and the weight vector w for the critic, based
on the received rewards [3]. The element M_{ak} of the weight
matrix M is updated according to:

$$ M_{ak} \;\leftarrow\; M_{ak} + \alpha\, \delta\, \bigl(\delta_{a a^{*}} - P(a)\bigr)\, u_k\,, \qquad (4) $$

where a^{*} is the action taken, α is a learning rate, P(a) is
the probability of taking action a in state u as given by (2),
u_k is the k-th component of the feature vector u, δ is the
error signal of (3), and δ_{a a^{*}} is the Kronecker delta,
defined as 1 if a = a^{*} and 0 otherwise. The weight vector w
used by the critic to estimate the value of a state is updated
according to:

$$ w \;\leftarrow\; w + \alpha\, \delta\, u\,. \qquad (5) $$
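The updates (4) and (5) can be written compactly as a single learning step. This sketch assumes the actor weights M have shape (number of actions, number of features) and reuses the error signal of Eq. (3); the learning rate is an illustrative value.

```python
import numpy as np

def learning_step(M, w, u, a_star, p, r, alpha=0.05):
    """Apply the actor update (4) and the critic update (5) for one time step.

    M      : actor weight matrix, shape (n_actions, n_features), updated in place
    w      : critic weight vector, shape (n_features,), updated in place
    u      : concatenated feature vector (s, h, e)
    a_star : index of the action that was actually taken
    p      : action probabilities from the soft-max policy, Eq. (2)
    r      : reward received on this step
    """
    delta = r - np.dot(w, u)                      # error signal, Eq. (3)
    kron = np.zeros_like(p)
    kron[a_star] = 1.0                            # Kronecker delta term of Eq. (4)
    M += alpha * delta * np.outer(kron - p, u)    # actor update, Eq. (4)
    w += alpha * delta * u                        # critic update, Eq. (5)
    return delta
```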
III. EXPERIMENTS AND RESULTS
A. Learning environment
In the experiments reported here, the infant and the caregiver
interact with the environment as follows. The
interaction proceeds in discrete trials, each lasting a fixed
number of time steps.
There is one salient object positioned at a random location.
The caregiver looks at this object for the entire duration of the trial. The infant starts the trial looking at the
caregiver. In every time step, first the visual feature vector u
is computed. Then the activity in the pre-motor layer m
is calculated. The infant selects an action, which is then
executed. This results in a new feature vector u, a reward
r, and new activations in m and V. Now the weights are
updated according to (4) and (5). This process repeats until
the end of the trial.
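Putting the previous sketches together, one trial of this interaction could look as follows. The environment object and its two methods are hypothetical stand-ins for the parts of the simulation not spelled out here, the trial length is an arbitrary choice, and select_action and learning_step are the helper functions defined in the sketches above.

```python
def run_trial(M, w, env, n_steps=20, beta=2.0, alpha=0.05):
    """Simulate one trial of infant-caregiver interaction (Section III-A).

    env is assumed to expose:
      compute_features()     -> u, the concatenation of s, h, and e
      execute_gaze_shift(a)  -> r, the reward obtained after gaze shift a
    """
    for _ in range(n_steps):
        u = env.compute_features()               # visual feature vector for this step
        m = M @ u                                # pre-motor activations
        a, p = select_action(m, beta)            # soft-max action selection, Eq. (2)
        r = env.execute_gaze_shift(a)            # execute the gaze shift, observe reward
        learning_step(M, w, u, a, p, r, alpha)   # weight updates, Eqs. (3)-(5)
```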
B. Emergence of gaze following

Figure 3 shows a subset of the learned connection weights
to layer m after 50000 steps of learning (a more detailed
analysis of the emergence of gaze following in the model is
given in [6]). Initially, all weights are set to zero. Soon the
connections from layer s to layer m develop a characteristic
one-to-one mapping (not shown): the infant learns how to
make accurate saccades to salient objects in its field of view.
At the same time, the connections from h and e to the
pre-motor area develop their specific pattern. The infant learns
that specific head poses and eye directions of the caregiver
predict rewarding stimuli in certain locations which lie along
the caregiver's line of sight. The model's behavior nicely
reflects the developmental progression from rudimentary to
more sophisticated gaze following observed in human infants
[6]. This development also results in an interesting transition
in the infant's behavior. While the initial behavior is purely
driven by visual saliency (bottom-up attention), the later
looking behavior becomes increasingly driven by top-down
predictions about the locations of rewards. Since the locations
of interesting objects are more strongly correlated with the
caregiver's eye direction than with the head pose, the weights
from e become slightly "sharper" than those from h. Behaviorally,
this is reflected in the development of a higher sensitivity to
the caregiver's eyes as compared to the head.

Fig. 3. Learned connection weights from the head features h and eye
features e to the pre-motor representation m. Note how the same head pose
of the caregiver has become associated with several locations that lie along
the caregiver's line of sight.

C. Formation of mirror neurons in the pre-motor area
At the end of the learning process, the model neurons
in layer m share many characteristics with classical mirror
neurons. First, a unit in this layer will usually be active
during the execution of a gaze shift to a certain location
in space. This is because the probability of performing
such a gaze shift is directly related to the activation of
the unit, as described by (2). Second, the unit will also be
active when the model observes another agent performing
the action of looking at this location. This is due to the
learned connection weights from the representation of the
caregiver's head and eyes in layers h and e to the pre-motor
units in m. The combination of being active during execution
and observation of a motor act is the defining characteristic
of mirror neurons. At the same time, these neurons are not
merely motor neurons. The model will not always perform
a gaze shift when the corresponding pre-motor neuron is
activated. Instead, the pre-motor neurons only represent a
plan or proposal to perform a certain gaze shift from which
the action selection mechanism will select one. This means
that the activation of a pre-motor unit due to a salient stimulus
or the gaze shift of another agent does not automatically lead
to the corresponding gaze shift. Instead, multiple such action
plans will usually compete for being executed. At the same
time, execution of any action may be inhibited by additional
brain structures which we have not included in our model.
Clearly, the neurons in layer m have the defining characteristics of mirror neurons. Note that area F5 mirror neurons
selective for grasping typically do not respond to the presence
of a visual stimulus alone, even if that stimulus is of interest
to the animal. In this respect, the mirror neurons in our model
behave differently. A salient visual object to which the model
is not habituated will be sufficient to activate a neuron in
layer m. This activation will be stronger, however, if the model also
sees the caregiver looking at this object.
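To make the mirror property concrete in the notation of the sketches above: after learning, the row of M belonging to a given gaze-shift location has non-zero weights both from the saliency entry for that location and from the caregiver head and eye features that predict it, so the same pre-motor unit can be driven by either pathway. A toy check with hand-built weights (illustrative values only, far smaller than the model's 64 locations):

```python
import numpy as np

n_loc = 4                                  # toy number of locations
M = np.zeros((n_loc, 3 * n_loc))           # feature vector ordered as (s, h, e)
M[2, 2] = 1.0                              # saliency at location 2 -> plan a gaze shift to 2
M[2, n_loc + 1] = 0.8                      # a head pose predicting location 2 -> same plan

u_object = np.zeros(3 * n_loc)
u_object[2] = 1.0                          # the infant itself sees an object at location 2

u_caregiver = np.zeros(3 * n_loc)
u_caregiver[n_loc + 1] = 1.0               # the infant only sees the caregiver's head pose

print((M @ u_object)[2], (M @ u_caregiver)[2])  # the same pre-motor unit is active in both cases
```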
IV. DISCUSSION
We have presented a computational model of the emergence of gaze following. Despite its simplicity, the model
explains a large number of findings about the emergence
of gaze following in human infants as we have shown in
a companion paper to this article [6]. These include the
progression in expertise when following gaze to targets in
different locations, the improving ability to ignore distractor
objects, and the changing utilization of head pose and eye
cues for gaze following. An earlier model [2], [21], from
which the current one is derived, also explained delays or the
complete absence of gaze following in certain developmental
disorders such as autism or in other species. These models
are based on biologically plausible reinforcement learning
mechanisms. A number of other models of the emergence of
gaze following have been proposed in the past, but, to the
best of our knowledge, none of them accounts for the wide
range of experimental findings that our model can capture.
Our model was designed to offer a simple and parsimonious account of the complicated sequence of behavior patterns observed in the development of gaze following abilities
in human infants. Only after the model was completed did we
realize that the representation in the model's pre-motor area
shares important properties with the mirror neuron system
in primates. The pre-motor representations in our model are
different from the mirror neurons that have been reported
in monkeys so far in the sense that they are not concerned
with manual or oral motor acts but with gaze shifts. Thus,
the model predicts the existence of a new class of mirror
neurons specific to looking behaviors. If such a class of
mirror neurons could be found, this would lend support
to our model. This raises the important question of where
in the brains of monkeys (or humans) one should look
for such neurons. Electrophysiological and brain imaging
studies suggest some tentative answers. Area F5, where the
first mirror neurons (for grasping) have been reported, is
an obvious candidate. More generally, we may expect the
presence of the predicted class of mirror neurons in any area
intermediate between the superior temporal sulcus (STS),
where head and gaze direction sensitive neurons are found,
and eye movement related areas such as the frontal eye fields
(FEF). Some of the predicted mirror neurons may also be
present inside the FEF. The model also predicts that this area
should receive direct or indirect input from a visual saliency
map.
Beyond gaze following per se, our model may have broader
implications for our understanding of the mirror system — in
particular with respect to the question of whether mirror neurons
are innate or whether they acquire their properties through a
learning process (e.g., [10]). If the model is in fact an accurate
description of the processes that underlie the emergence of
gaze following, and if the predicted class of mirror neurons
does in fact exist, then this raises the possibility that other
classes of mirror neurons acquire their specific properties
in a similar learning process. In the current model, this
process essentially has two (not necessarily successive) steps.
First, the model learns to perform certain motor acts in
the appropriate situations. Concretely, it learns to map the
discovery of visually salient stimuli in certain locations to
gaze shifts to those locations. This corresponds to learning
the appropriate pattern of weights from the representation
of visually salient stimuli in layer s to the layer m of pre-motor neurons that encode the intention to make gaze shifts
to specific locations. Second, the model learns to associate
representations of the looking behavior of other agents to
appropriate pre-motor neurons. This corresponds to learning
the appropriate pattern of weights from the representation
of the other agent's gaze direction in layers h and e to the
same layer of pre-motor neurons m, thereby establishing an
alternative pathway for activating neurons in this layer. This
process is purely driven by the desire to maximize rewards,
which, in this case, are obtained for looking at interesting
visual stimuli. Thus, the gaze following behavior is learned
because the gaze shift of another agent indicates that it is
rewarding to perform the same gaze shift, i.e. to look at the
same location. Is it conceivable that the same mechanism
for learning could also work for other behaviors and other
kinds of mirror neurons? For the “classic” mirror neurons
concerned with the grasping of objects, we find it plausible
that there may be situations where observing an agent grasp
an object (e.g., a food item grasped by the mother in order
to eat it) may be indicative of a reward if the same action
is attempted (grasping a second food item from the same or
another source in order to also consume it). This explanation
of the emergence of mirror neuron properties seems to have
been rejected by some authors. Specifically, Rizzolatti and coworkers write [15]: “In conditions in which mirror neurons
become active, hardly any imitation would be useful.” If this
were generally true, then our proposed learning mechanism
could not work. For gaze behaviors, we believe that this claim is
certainly not true. With respect to grasping, while there may
in fact be such situations (e.g., when only one food item is
present), we believe that there may also be situations (such
as the one described above) where it is very useful to imitate
a grasping movement and these situations may be sufficient
for the emergence of mirror neurons for grasping. Leaving
aside the specifics of grasping, our reinforcement learning
explanation of the emergence of imitation may be applicable
in many instances where imitation (in its various forms) and
a corresponding set of mirror neurons are observed. To resolve
this issue, it will likely be necessary to study the emergence
of imitation and the mirror system from a developmental
perspective. If the appearance of certain imitative behaviors
(such as gaze following or the imitation of grasping movements)
during an individual’s development turns out to coincide with
the appearance of mirror neurons for these behaviors, then
this would be consistent with our hypothesis. More critically,
however, our model predicts that if an animal were raised
without the opportunity to ever observe a specific action
performed by other animals, then no mirror neurons specific
to this action should develop (see also [10] for a different but
related proposal). Furthermore, if an animal were raised in
an environment where behavior A was rewarded whenever
another agent performed an unrelated behavior B, then we
may expect the emergence of “mirror” neurons that respond
to the animal performing A or to the observation of another
animal performing B.
In the context of theories of imitation, our account of the
emergence of gaze following can be considered a simple
associative learning account of response facilitation. It is
worth highlighting that our goal was not a model of imitation,
and our model does not start with any mechanisms for,
say, matching other’s bodies to one’s own. On the contrary,
our model has a generic reinforcement learning architecture.
Nevertheless, it acquires the ability to map other’s motor acts
onto its own behaviors. This finding may be of interest for
the question of the development of higher imitative behaviors.
While it may be the case that specific mappings from other
bodies to one’s own body may be present at birth (e.g., [10]),
we have shown that such mappings can also result from
generic reinforcement learning mechanisms.
ACKNOWLEDGMENT
This work was done as part of the MESA project (Modeling the Emergence of Shared Attention) at the University
of California, San Diego (http://mesa.ucsd.edu). We thank
all members of the MESA team for their continuing collaboration. We also thank Jaime Pineda and Garrison Cottrell
for fruitful discussions. This work was supported by the
National Science Foundation under grant SES-0527756. JT
acknowledges support from the Hertie foundation.
REFERENCES
[1] G. E. Butterworth and N. Jarrett. What minds have in common in
space: Spatial mechanisms serving joint visual attention in infancy.
British J. of Developmental Psychology, 9:55–72, 1991.
[2] E. Carlson and J. Triesch. A computational model of the emergence
of gaze following. In H. Bowman and C. Labiouse, editors, Connectionist Models of Cognition and Perception II, pages 105–114. World
Scientific, 2004.
[3] P. Dayan and L. F. Abbott. Theoretical Neuroscience. MIT Press,
Cambridge, MA, 2001.
[4] G. O. Deák, R. Flom, and A. D. Pick. Perceptual and motivational
factors affecting joint visual attention in 12- and 18-month-olds.
Developmental Psychology, 36:511–523, 2000.
[5] G. Pobric and A. F. de C. Hamilton. Action understanding requires
the left inferior frontal cortex. Current Biology, 16:524–529, 2006.
[6] H. Jasso, J. Triesch, C. Teuscher, and G.O. Deák. A reinforcement
learning model explains the development of gaze following. Int. Conf.
on Cognitive Modeling (ICCM), 2006.
[7] Evelyne Kohler, Christian Keysers, M. Alessandra Umiltà, L. Fogassi,
V. Gallese, and G. Rizzolatti. Hearing sounds, understanding actions:
Action representation in mirror neurons. Science, 297:846–848, 2002.
[8] B. Lau and J. Triesch. Learning gaze following in space: a computational model. In Jochen Triesch and Tony Jebara, editors, Proc.
ICDL’04 — Third International Conference on Development and
Learning, San Diego, USA, pages 57–64. The Salk Institute for
Biological Studies, 2004.
[9] Zhaoping Li. A saliency map in primary visual cortex. Trends in
Cognitive Sciences, 6(1):9–16, 2002.
[10] Andrew N. Meltzoff. Imitation and other minds: The “like me”
hypothesis. In S. Hurley and N. Chater, editors, Perspectives on
Imitation: From Neuroscience to Social Science, pages 55–77. MIT
Press, 2005.
[11] G. Metta, G. Sandini, L. Natale, L. Craighero, and L. Fadiga. Understanding mirror neurons: a bio-robotic approach. Interaction Studies,
in press, 2006.
[12] Y. Nagai. Joint attention development in infant-like robot based on
head movement imitation. In Proc. Third Int. Symposium on Imitation
in Animals and Artifacts (AISB’05), pages 87–96, 2005.
[13] D.I. Perrett, P.A.J. Smith, D.D. Potter, A.J. Mistlin, A.S. Head, A.D.
Milner, and M.A. Jeeves. Visual cells in the temporal cortex sensitive
to face view and gaze direction. Proceedings of the Royal Society of
London. Series B, 223:293–317, 1985.
[14] G. Rizzolatti and L. Craighero. The mirror-neuron system. Annu. Rev.
Neurosci., 27:169–192, 2004.
[15] G. Rizzolatti, L. Fogassi, and V. Gallese. Neurophysiological mechanisms underlying the understanding and imitation of action. Nature
Reviews Neuroscience, 2:661–670, 2001.
[16] M. Scaife and J. S. Bruner. The capacity for joint visual attention in
the infant. Nature, 253:265–266, 1975.
[17] S. Schaal. Is imitation learning the route to humanoid robots? Trends
in Cognitive Sciences, 3:233–242, 1999.
[18] W. Schultz, P. Dayan, and P. R. Montague. A neural substrate of
prediction and reward. Science, 275:1593–1599, 1997.
[19] A.P. Shon, D.B. Grimes, C.L. Baker, M.W. Hoffman, S. Zhou, and
R.P.N. Rao. Probabilistic gaze imitation and saliency learning in a
robotic head. In Proc. Int. Conf. on Robotics and Automation (ICRA
05), 2005.
[20] R. S. Sutton and A. G. Barto. Reinforcement Learning: an introduction.
MIT Press, Cambridge, MA, 1998.
[21] J. Triesch, C. Teuscher, G. Deák, and E. Carlson. Gaze following: why
(not) learn it? Developmental Science, 9(2):125–157, 2006.
[22] M. A. Umiltà, E. Kohler, V. Gallese, L. Fogassi, L. Fadiga, C. Keysers,
and G. Rizzolatti. "I know what you are doing": a neurophysiological
study. Neuron, 31:91–101, 2001.