Embodied Models of Emotions
Verification of Psychological and
Neurobiological Theories of Emotions Using
Virtual and Situated Agents
Martin Pascal Inderbitzin
UPF DOCTORAL THESIS / YEAR 2011
THESIS DIRECTOR
Paul F. M. J. Verschure, Departament Tecnologies de la
Informació i les Comunicacions
To Patrick, Pia, Werner, Meg, Nina & Judith
Acknowledgments
The accomplishment of this thesis would not have been possible without the professional and personal support of so many people. First of all I
would like to thank Paul Verschure for giving me the opportunity to learn
from his profound experience during all these years. His constant support and motivation to go further were fundamental for the success of this
project!
Second of all I would like to thank Ulysses for his co-supervision.
Your countless critical and constructive inputs were very helpful to improve the quality of my work. Thanks a lot!
My special thanks goes to Dominic Massaro who guided my final
study with his impressive experience in scientific research. It was a big
honor for me to work with you on this project. I would also like to thank
Karen for the warm reception in your house. You two made me feel
at home during my stay! Thank you so much for all you have done for me!
I would also like to thank the members of SPECS and my closest
friends for their moral and amoral support. Thanks Sytse for introducing me to the scientific world of Belgian alchemy. Thanks Encarnushka
for joining me on all our round trips (I am still convinced that it goes to
the left!). Thanks Cesu for covering my back during all our C.o.D. discussions. Thanks Anna for giving me a hard time in proofreading my
documents. Thanks Alasdair for all the inspiring and so helpful inputs
on my work. Thanks Ivan for not giving up on me. Thanks Carme,
Mireia, Christian, Santa, Joana and Lydia for guiding me through the
UPF paperwork jungle! Thanks Alberto for enriching our group with
Ukulele live performances. Thanks Deco for the invention of one brain
one song contest. Thanks Zenuschka for being Indian. Thanks Sylvain
for bringing me to NASA (and back to earth). Thanks Sam for all the
programming work. Thanks Arnau or never. Thanks Elena and Eliza for
the frappe! Thanks Marti for the lunch invitation. Thanks Pecho Paloma
for the lunch. Thanks Armin for showing me that it's possible to finish.
Thanks Quique for his funny anecdotes. Thanks Cristina and Belen for
enriching our group. And thanks Vicky for the tsipouro, it helped!
My final and deepest thanks goes to my family. To Patrick for
teaching me resistance, to Pia & Werner for being my compass in so many
aspects of life, to Lexa, Maxa and Milla for showing me life outside science, to Meg for her accompaniment, to Nina for always being there and
to Judith for all her support during this journey.
Abstract
The investigation of the influence of emotions on human cognition and behavior has challenged scientists for a long time. So far, the most popular approach to investigating this phenomenon has been to observe brain processes and behavior. In the past decade the field of computational neuroscience has proposed a new methodology: the construction of embodied models of emotions and their verification in real-world environments.
In this thesis we present different studies that use computational models of emotions to control the behavior and the expressions of situated agents. Using different methodologies we evaluate both the performance of the models and the behavioral responses of humans interacting with them. Our results add to a deeper understanding of the multidimensional phenomenon of emotions on three levels: perception, interaction, and how the processing of emotional cues influences learning and behavior.
Summary
In this dissertation we address the issue of understanding the phenomenon
of human emotions. To do so we pose the question of how we can construct biologically plausible embodied models of emotions. The motivation to ask this question is based on our strong belief that we can understand the nature of emotions by building situated models of them. We do
this by equipping agents with emotive architectures to control their behavior in virtual and physical environments. The observation of the agent’s
performance and the behavior of users interacting with it are used in this
thesis to verify existing theories of emotions.
Emotions are multidimensional body-mind states that emerge over time. The basis of every emotion is an appraisal mechanism that evaluates an internal or external stimulus against the goals and needs of an individual (Arnold, 1960; Lazarus, 1991; Scherer, 2001). The results of this evaluation mechanism are positive or negative somatic and neuronal adaptations that influence cognition, perception and behavior. Hence, the main function of emotions is the creation of a valence map that helps an individual increase its ability to cope with ambiguous physical and social environments (LeDoux, 1996; Damasio et al., 1996; Craig, 2010). Despite this importance, the underlying neurobiological and psychological mechanisms of emotions are not understood in full detail.
In the first part of this thesis we investigate the perception and integration of affective behavioral features that form the basis of social interaction. In the second part we focus on the neurocomputational processing of emotional cues and their influence on learning and behavior. In the final part we propose an advanced neurocomputational emotive architecture that is based on the insights from our results and the conceptual framework of recent emotion theories.
We start the discourse with the investigation of the perception of a basic emotional behavior: the regulation of the interpersonal space to others (Hall, 1966). Our first study addresses the question of how the perception of a virtual agent or a real person affects social interaction on a spatial scale. Our results reveal that the regulation of personal space is a social behavior that is fundamentally influenced by the perceptual salience of the interactors (Inderbitzin et al., 2009, submitted). The established psychological concept of the 'vividness effect' (Frijda, 1988) states that a more salient stimulus construct induces altered cognitive and behavioral responses. Based on our findings we propose that this is a general mechanism of human perception. Our results add to the understanding of this effect, which is found to crucially influence social interaction.
The result of our first study opens the question of which additional non-verbal behaviors code social signals. In our second study we investigate the perception of emotional states communicated by different styles of locomotion. Our results identify a number of canonical parameters defining the body configuration of a walking person that code different valence qualities (Inderbitzin et al., 2011). These results are important for the understanding of the underlying behavioral mechanism that codes non-verbal emotional behavior.
So far, the presented studies focused on the perception of non-verbal affect. In the next study we add the verbal dimension and investigate how humans perceive the emotions transmitted by a talking face. In face-to-face communication, verbal and non-verbal features transmitting emotional meaning form a complex multidimensional stimulus construct. In our third study we investigate the perception and integration of emotional features transmitted by facial expressions and affective words. We compare the behavioral performance of people perceiving a multidimensional stimulus construct that codes either coherent or incoherent affect qualities with the predictions of the fuzzy logical model of perception (FLMP; Massaro, 1998). Subjects were instructed to judge the affect of the facial expression, of the meaning of the word pronounced by the face, or of the global event combining these two properties. As described by the FLMP, both properties influenced judgments when the participants responded fast. With increasing reaction time, the FLMP did not make better predictions than other models of perception. We conclude that the perception of affect in multiple modalities is an automatic process that can produce interferences, while the integration of these modalities into a global impression is more controlled.
In the second part of this thesis we investigate the underlying mechanism of emotion processing and its influence on learning and behavioral control. We propose a computational model of emotional conditioning that is based on the two-phase theory of conditioning (Inderbitzin et al., 2010a). This theory states that associative learning processes can be separated into a fast valence-driven, non-specific learning system and a slow specific learning system. We provide a complete account of Konorski's proposal (Konorski, 1968) by integrating these two systems into a biologically grounded computational model. As an additional benchmark we apply this model to control the behavior of an autonomous robot in an obstacle avoidance task (Inderbitzin et al., 2010b).
In the last study we construct a neurocomputational model of fear conditioning in order to elicit appropriate behavioral expressions in an android. The robot's ability to learn the valence qualities of different stimuli is tested in a real-world setup that involves interaction with humans. Based on these findings we propose an advanced emotive architecture.
In this thesis we apply different embodied emotive systems to investigate the underlying mechanisms of emotions. The presented studies illuminate how humans perceive and integrate affective features. We show that this mechanism influences social interaction on a spatial scale. Using different computational models of conditioning we analyze how the underlying computational mechanisms affect behavioral control. These computational models are used successfully to control different types of robots in physical and social environments. Our results add to a deeper understanding of emotions on three levels: perception, interaction and learning.
List of Appended Papers
This thesis is based on the studies listed below. They will be referred to in the text.
M Inderbitzin, S Wierenga, A Väljamäe, U Bernardet, and P F M J
Verschure. Cooperation and competition in the mixed reality space eXperience Induction Machine XIM. Virtual Reality, 13, 153–158, 2009.
M Inderbitzin, I Herreros-Alonso, and P F M J Verschure. Amygdala Induced Plasticity in an Integrated Computational Model of the Two-Phase Theory of Conditioning. 4th International Conference on Cognitive
Systems, Zurich, 2010.
M Inderbitzin, I Herreros-Alonso, and P F M J Verschure. An integrated computational model of the two phase theory of classical conditioning. The 2010 International Joint Conference on Neural Networks
(IJCNN), 1-8, 2010.
M Inderbitzin, A Väljamäe, J M B Calvo, P F M J Verschure, and U
Bernardet. Expression of emotional states during locomotion based on
canonical parameters. IEEE International Conference on Automatic Face
and Gesture Recognition, 809-814, 2011.
M Inderbitzin, A Betella, U Bernardet, and P F M J Verschure. The
Social Perceptual Salience Effect. Journal of Experimental Psychology: Human Perception and Performance, submitted.
M Inderbitzin, P F M J Verschure, and D W Massaro. Emotion Perception in a Talking Face: Facial and Linguistic Influences. To be submitted.
Other Papers
The author has also contributed to the following publications.
U Bernardet, M Inderbitzin, S Wierenga, A Väljamäe, A Mura and P
F M J Verschure. Validating presence by relying on recollection: Human
experience and performance in the mixed reality system XIM. The 10th
International Workshop on Presence, Padova, Italy, 2008.
U Bernardet, S Bermúdez i Badia, A Duff, M Inderbitzin, S LeGroux,
J Manzolli, Z Mathews, A Mura, A Väljamäe, and P F M J Verschure.
The experience induction machine: a new paradigm for mixed-reality interaction design and psychological experimentation. The Engineering of
Mixed Reality Systems, 357–379, 2010.
U Bernardet, A Väljamäe, M Inderbitzin, S Wierenga and P F M J
Verschure. Quantifying human subjective experience and social interaction using the eXperience Induction Machine. Brain Research Bulletin,
In press.
Contents

Summary
List of Figures
List of Tables

1 INTRODUCTION
    1.1 What Are Emotions?
    1.2 The Building Blocks of an Emotion
        1.2.1 Needs and Motivation
        1.2.2 The Valence System
        1.2.3 The Appraisal Mechanism
        1.2.4 Neurocomputational, Physiological and Behavioral Responses
    1.3 The Time Scale of Emotions
    1.4 What Distinguishes an Emotion From a Non-Emotion?
        1.4.1 The Feeling Theory of Emotions
        1.4.2 The Cognitive Approach to Emotions
    1.5 Basic and Complex Emotions
    1.6 The Neurobiological Basis of Emotions
        1.6.1 Subcortical Areas
        1.6.2 Cortical Areas

2 SYNTHETIC EMOTIONS AND EMOTIONAL AGENTS
    2.1 Synthetic Emotions
        2.1.1 Theory Modeling
        2.1.2 Application Modeling
    2.2 Emotional Agents
        2.2.1 Virtual Agents
        2.2.2 Physical Agents

3 NON-VERBAL BEHAVIOR AND SOCIAL INTERACTION
    3.1 Human Spatial Behavior
    3.2 The Effect of Apparent Reality
    3.3 Methods
        3.3.1 Materials
        3.3.2 Research Design
        3.3.3 Measures
        3.3.4 Procedure
        3.3.5 Participants
    3.4 Results
        3.4.1 Spatial Scale of Collaborative Behavior
        3.4.2 Effect of Players Representation on the Spatial Interaction
    3.5 Discussion
    3.6 Conclusion

4 PERCEPTION OF EMOTIONS
    4.1 Emotion Perception in Locomotion
        4.1.1 Methods
        4.1.2 Results
        4.1.3 Discussion & Conclusion
    4.2 Emotion Perception in the Talking Face
        4.2.1 The Fuzzy Logical Model of Perception
        4.2.2 The Weighted Average Model of Perception
        4.2.3 Automatic Processing of Information
        4.2.4 Automatic Processing of Affective Faces and Words
        4.2.5 Experiment 1
        4.2.6 Results
        4.2.7 Discussion
        4.2.8 Experiment 2
        4.2.9 Results
        4.2.10 Discussion
    4.3 Conclusion

5 COMPUTATIONAL MODEL OF EMOTION INDUCED LEARNING
    5.1 The Two Phase Model of Conditioning
    5.2 Methods
        5.2.1 The Circuit
        5.2.2 The Non-specific Learning System
        5.2.3 The Specific Learning System
        5.2.4 Integrating the NSL with the SLS
        5.2.5 Robot Application
    5.3 Results
        5.3.1 Performance of the Integrated Model
        5.3.2 Performance of the Robot
    5.4 Conclusion

6 CONSTRUCTING AN EMOTIVE ANDROID
    6.1 The Neurobiological Mechanism of Fear
    6.2 Embodied Emotive Model
        6.2.1 Model Architecture
        6.2.2 Experimental Design
        6.2.3 Conditioning
        6.2.4 Discussion & Conclusion
    6.3 Proposal for an Advanced Emotive Architecture
        6.3.1 Theoretical Basis
        6.3.2 Distributed Adaptive Control
        6.3.3 Conclusion

7 CONCLUSION
List of Figures

1.1 Schematic illustration of the Cartesian analysis of anger. The exciting cause from the external world stimulates the body spirits, which are conceptualized as the immediate cause of emotions. The bodily spirits give rise to both the behavioral response and the emotion itself. In the Cartesian approach it is not clear what the object of the emotion is. Figure retrieved from Power and Dalgleish (1997).

1.2 The cognitive account of emotions by Aristotle applied to anger. The object describes the external event that becomes evaluated by the individual, who is in an appropriate state of mind. The result is an internal representation or stimulus that elicits the emotional response, which is divided into the dimensions of matter and form. Figure retrieved from Power and Dalgleish (1997).

1.3 The connectivity of the amygdala. This nucleus receives inputs from all sensory modalities and from cortical and subcortical areas. The output is transmitted to modulatory systems and neuronal correlates in the brain stem. The direct connection to the hypothalamus allows the amygdala to trigger hormonal responses. Figure adapted from LeDoux (2006).

1.4 The connectivity of cortical and sub-cortical clusters. The prefrontal cortex is highly connected to the amygdala, sensory cortices, the hippocampus and nuclei in the brain stem that regulate hormonal responses. Figure from Salzman and Fusi (2010).

2.1 Full body humanoid robots. Asimo (left), Hubo (middle) and iCub (right) are three examples of androids with different capabilities and objectives.

2.2 Small full body humanoid robots. Nao (left) and Qrio from Sony (right).

2.3 The three most popular teleoperated androids, so-called geminoids: Model F (left), HI (middle) and DK (right) in the front row, with their human 'originals': an anonymous young female (left), Hiroshi Ishiguro from Osaka University, Japan (middle) and Henrik Scharfe from Aalborg University, Denmark (right).

2.4 Upper torso robots: Nexi (left), Domo (center left), Barthoc (center right) and Armar3 (right), partially with mobile platform.

2.5 Expressive robot heads: Kismet (left), Mertz (center) and Roman (right).

2.6 Zoomorphic robots: Emuu (left), iCat (center left), Leonardo (center right) and Probo (right).

3.1 The eXperience Induction Machine XIM, a fully instrumented mixed reality space that can be accessed by multiple users either as physical visitors or virtual representations. Virtual visitors are represented in the physical space of the XIM on the surrounding screen and as lit floor tiles. Physical visitors are represented as virtual characters in the virtual world.

3.2 In the Mixed condition one remote player built a team with one physical player. The remote players played the game using a computer and a game pad. Physical players inside the XIM were represented as avatars on the screen of remote players. Verbal communication between the remote and physical player was established over a wireless communication headset.

3.3 Spatial distribution of an example epoch. The ball play-out (red dot) starts in the middle of the field. At the beginning of the epoch team players were positioned in their team side (blue and green dots). The trajectories of the players show their spatial behavior over time. Play direction was vertical. Team 2 scored a goal when the ball reached the back line of team 1.

3.4 Distribution of epoch winners (right panel) and epoch losers (left panel) for all goal events. The graph only shows one side of the game field; play direction is from top down and vice versa. The colorbar indicates the accumulated position of players over time. Winners chose more static and defensive positions compared to losers.

3.5 Distribution of epoch winners (right panel) and epoch losers (left panel) for all goal events. The graph only shows one side of the game field; play direction is from top down and vice versa. The colorbar indicates the accumulated position of players over time. Winners chose more static and defensive positions compared to losers.

3.6 Schematic representation of the three conditions. Only teams of the same condition played against each other. Left panel: two physical teams compete against each other; all four players are physically present inside XIM. Middle panel: in the Mixed condition one player of each team is present inside XIM and the other player is virtually represented; virtual players use a computer to play the game. Right panel: in the Virtual condition all four players use a computer to play and are virtually represented inside XIM.

3.7 Schematic representation of the detailed analysis of player behavior in different conditions. We compared the behavior of XIM players in the Physical condition with the behavior of the XIM players in the Mixed condition (A), and the behavior of the remote players in the Mixed condition with the behavior of the remote players in the Virtual condition (B).

4.1 Still images of stimuli in frontal view (A-C) and side view (D-F). Head/torso inclination varied between 55 degrees down (A, D), zero degrees (B, E), and 15 degrees up (C, F).

4.2 Valence and arousal rating for head/torso inclination. Error bars indicate standard error. Valence rating 0 indicates a very sad emotional state, rating 10 a very happy state. Arousal rating 0 indicates low arousal, arousal rating 10 indicates high arousal.

4.3 Valence and arousal rating for different speed parameters. Error bars indicate standard error. Valence rating 0 indicates a very sad emotional state, rating 10 a very happy state. Arousal rating 0 indicates a low arousal state, arousal rating 10 a high arousal state.

4.4 Distribution of the animations in the circumplex space. The legend indicates the stimulus parameter space of the different animations: <speed>.<viewing angle>.<head/torso inclination>. The speed parameter is defined as Fast = 1.4 m/sec, Medium = 0.75 m/sec and Slow = 0.5 m/sec. The viewing angle varies between profile view = 90 degrees and rotated frontal view = 45 degrees. The parameter for the head/torso inclination varies between Neutral = 0 degrees, Up = +15 degrees and Down = -55 degrees.

4.5 Schematic representation of the three stages involved in perceptual recognition proposed by the Fuzzy Logical Model of Perception (FLMP). The three processes are temporally successive, but overlapping. Reading direction in the diagram is from left to right. The model is explained with a task where subjects have to integrate affect from words and expressions. The sources of information are indicated by upper case letters: expressive information by Ei, word information by Wj. The evaluation process transforms this information into perceived features, indicated by lower case letters ei and wj. The integration process results in an overall degree of support sk for a given affect k. The decision process maps the output of the integration into a response Rk. All three processes make use of prototypes stored in memory.

4.6 Tree of wisdom illustrating binary oppositions central to the differences among theories of perception. Figure retrieved from Massaro (1998).

4.7 The affective facial expressions of the stimulus space used in experiment 1. The eyebrows and the mouth corner deflection of Baldi were varied to produce a stimulus continuum from happy to angry.

4.8 Reaction time in the expression condition (left) and the word condition (right). When the stimulus construct had coherent valence qualities, reaction times were reduced in both conditions. The box indicates the 25th and the 75th percentiles, the whiskers indicate the most extreme data points not considered as outliers. The horizontal line is the median.

4.9 Observations (symbols) and predictions (lines) for the fuzzy logical model of perception (FLMP) in the expression condition (left) and the linguistic semantics condition (right). We observed a significant influence of the angry words on the judgments of the neutral facial expressions (left panel). This effect was not observed in the linguistic semantics condition (right panel).

4.10 Observations (symbols) and predictions (lines) for the fuzzy logical model of perception (FLMP, left) and the weighted additive model of perception (AMP, right). The plot shows the fits for the bimodal condition where subjects had to identify the affect of the overall event. The FLMP makes a significantly better prediction for the observed data compared to the AMP.

4.11 The affective facial expressions of the stimulus space used in experiment 2. The eyebrows and the mouth corner deflection of Baldi were varied to produce a stimulus continuum from happy H (top left) to angry A (bottom right) in 10 steps. The letter N indicates a neutral intermediate state. The number indicates the strength of the affect.

4.12 The mean RT in experiment 1 (M = 0.97, SD = 0.5) was significantly faster compared to experiment 2 (M = 2.12, SD = 1.1) (Wilcoxon z = 79.6, p < 0.01).

4.13 Observations (symbols) and predictions (lines) for the fuzzy logical model of perception (FLMP) in the expression condition (left) and the linguistic semantics condition (right).

4.14 Observations (symbols) and predictions (lines) for the fuzzy logical model of perception (FLMP, left) and the weighted average model (WAM, right). The average root mean square deviation (RMSD) for the FLMP (0.032) and the WAM (0.031) did not differ in quality of prediction.

5.1 The architecture of the integrated model: the non-specific learning system (NLS) is shown on the left, the specific learning system (SLS) on the right. In the NLS the activation of the amygdala (A) and the nucleus basalis (NB) induces plasticity in the auditory cortex (AC). The conditioning stimulus (CS) reaches the auditory cortex over the thalamus (Th), where it converges with the unconditioned stimulus (US). Inhibitory interneurons (IN) regulate the amount of plasticity. The pontine nucleus (PN) gates the stimulation from the NLS to the SLS. In the SLS the CS and the US converge at the level of the Purkinje cell, resulting in the induction of LTD at the Purkinje synapse. This induces a dis-inhibition of the deep nucleus (DN), leading to the exactly timed motor conditioned response (CR). The reflexive unconditioned response (UR) is elicited without adaptive processing. A amygdala; AC auditory cortex; CS conditioning stimulus; DN deep nucleus; GC granule cells; IN inhibitory interneurons; IO inferior olive; NB nucleus basalis; CR conditioned reaction; PN pontine nucleus; PU Purkinje cell; Th thalamus; US unconditioned stimulus.

5.2 The architecture of the cerebellar SLS. The CS and the US converge at the Purkinje cell synapse (PU-SYN). CF climbing fibre; CR conditioned reaction; CS conditioned stimulus; DN deep nucleus; GA granule cells; GO golgi cells; IIN inhibitory interneurons; IO inferior olive; MF mossy fibre; PF parallel fibre; PU-SP Purkinje cell spontaneous activity; PU-SO Purkinje cell soma; PU-SYN Purkinje cell synapse; US unconditioned stimulus.

5.3 Robot application: an ePuck robot moves autonomously in a circular open field arena. The association of the red color on the floor detected by a camera (CS) and the detection of the wall by proximity sensors (US) induced learning in the proposed computational mechanism. The green arrows indicate the moving direction of the robot.

5.4 Reactivity of the auditory cortex before and after the conditioning. The CS is the stimulus with ID 1. Before the conditioning the cortical reaction to all 5 stimuli is homogeneous. After the conditioning the cortex response to the CS is increased.

5.5 Learning of the exactly timed CR by the SLS: the PU cell activity decreases during conditioning trials 1-13. During trial 12 the activity under-runs the threshold for the first time, resulting in the dis-inhibition of the deep nucleus. During trial 13 the PU cell activity under-runs the threshold before the US and an exactly timed CR is triggered. The CS and the US are only schematically represented in this plot.

5.6 The performance of the integrated model before the conditioning. The Purkinje cell (PU) does not change its activity and no CR is elicited. CS conditioned stimulus; US unconditioned stimulus; AC auditory cortex; PU Purkinje cell; CR conditioned reaction.

5.7 The performance of the model after the conditioning. The CS representation in the auditory cortex (AC) is increased. A delayed pause in the Purkinje cell (PU) can be observed. The CR is elicited just before the US presentation. CS conditioned stimulus; US unconditioned stimulus; AC auditory cortex; PU Purkinje cell; CR conditioned reaction.

5.8 The behavior of the ePuck robot before conditioning. The robot enters the red area of the arena. The proximity sensors detect the wall (US) and elicit the unconditioned response (UR) in the form of a late turning. The blue line indicates the track of the robot in the arena.

5.9 The behavior of the ePuck robot after conditioning. The robot does not enter the red area of the arena. The camera detects the red color (CS) and the model elicits a conditioned response (CR) in the form of an exactly timed turning. The blue line indicates the track of the robot.

5.10 The change of the synaptic weight at the level of the PF-PU synapse during the robot experiment. Every time a CS and a US coincide at the level of the Purkinje synapse, LTD is induced. Once the synaptic efficacy reaches a critical level a conditioned response is triggered, avoiding future LTD induction, and the synaptic weight becomes stable.

5.11 The performance of the ePuck robot measured by the percentage of performed conditioned responses and occurred US. After 113 trials the robot shows conditioned behavior. The fluctuation in response is due to a spontaneous recovery of the synaptic transmission at the Purkinje cell. Whiskers indicate STD.

6.1 During the conditioning phase (left panel) an animal is exposed to a neutral tone (CS) and an aversive foot shock (US). After the conditioning phase (right panel) the animal reacts with a freeze response when exposed to the originally neutral tone (CS). Figure adapted from Nadel and Land (2000).

6.2 An aversive stimulus is transmitted by two pathways to the amygdala: the low route transmits the sensory information directly from the thalamus to the amygdala. This route is fast and responsible for unspecific behavioral responses. The high route sends the sensory input to cortical areas for the evaluation of the stimulus features. This route is slower, but capable of eliciting more specific cognitive and behavioral responses. Figure adapted from LeDoux (1994).

6.3 The processing of a neutral CS and an aversive US. When CS and US coincide at the location of the amygdala, learning is induced. The results are different physiological and behavioral responses. LA lateral amygdala; CE central amygdala; CG central gray; LH lateral hypothalamus; PVN paraventricular hypothalamus. Figure adapted from Medina et al. (2002).

6.4 Schematic representation of the fear conditioning model. The visual stimulus and the audio stimulus are transmitted over the thalamus to the amygdala, where they coincide. This co-activation induces an adaptation of the synaptic weight. After conditioning the change in synaptic weight allows the CS to trigger the behavioral response.

6.5 The iCub uses LED lights to express different emotions in the face. The picture shows its angry expression that was used in the present study.

6.6 Experimental design of the fear conditioning in the iCub. The association of a neutral CS with an aversive US induces a change in plasticity. After conditioning the CS alone is capable of eliciting the behavioral response. A non-conditioned stimulus (NS) elicits an unconditioned response also after the conditioning phase.

6.7 The conditioning phase of the iCub. Before conditioning the iCub smiles when seeing either red (A) or blue (B). During the conditioning phase the robot sees the blue while hearing 4-5 aversive noise events (C). After the conditioning the robot reacts with an angry face when seeing the blue hat (E), but still smiles when seeing the red color (D).

6.8 The component process model (Scherer (2001); Sander et al. (2005)). Represented are the five components of emotion (vertical) as well as the sequence of appraisals (horizontal) and the interaction between subsystems that gradually shape the emotion, supporting the genesis of a particular feeling.

6.9 The system architecture of DAC: the system consists of three tightly coupled layers: reactive, adaptive and contextual. The reactive layer endows a behaving system with a prewired repertoire of reflexes (low complexity unconditioned stimuli and responses) that enable it to display simple adaptive behaviors. The activation of any reflex, however, also provides cues for learning that are used by the adaptive layer via representations of internal states, i.e. aversive and appetitive. The adaptive layer provides the mechanisms for the adaptive classification of sensory events and the reshaping of responses. The sensory and motor representations formed at the level of adaptive control provide the inputs to the contextual layer, which acquires, retains, and expresses sequential representations using systems for short and long term memory. The contextual layer describes goal oriented learning and reflexive mechanisms.
List of Tables

1.1 'Basic' emotion classes of different theorists according to Ortony and Turner (1990).

3.1 Proxemics behavior of winners and losers: mean time of shared interaction space; standard deviation in brackets. IS = intimate space; PS = personal space; Sig = significance (a p < 0.1, * p < 0.05, ** p < 0.01).

3.2 Spatial intra-team interactions for winners and losers during the entire game, winning and losing epochs and offensive and defensive game situations: mean intra-team member distance; standard deviation in brackets. ITMD = Intra-Team Member Distance; Sig = significance level (** p < 0.01).

3.3 Spatial behavior of XIM players and remote players. Mean sprinted distance, mean distance to the mid-line of the team side and mean time spent in the field side of the team member (time behind mid-line); standard deviation in brackets. Sig = significance level (* p < 0.05, ** p < 0.01).

4.1 Specification of the stimuli parameters.

6.1 Levels of processing for stimulus evaluation checks. Adapted from Leventhal and Scherer (1987).
Chapter 1

INTRODUCTION

1.1 What Are Emotions?
Before I can introduce our studies, I have to answer some basic questions addressing the phenomenon of emotions. This includes basic definitions that I will use later in our argumentation. This is an important step because different emotion scientists use the same terms for different concepts, so it is fundamental to clearly define one's own position in this discourse.
We will use the following five questions to start the discussion about emotions.
1. What are the building blocks of an emotion?
2. What is the time scale of an emotion?
3. What distinguishes an emotion from a non-emotion?
4. How do we distinguish different emotions?
5. What is the neurobiological basis of an emotion?
Each of these questions will be answered during the introduction. The last part of the introduction provides a summary of how scientists have constructed models of synthetic emotions and how they have implemented them in virtual and physical agents.
1.2 The Building Blocks of an Emotion
An emotion is a multi-dimensional body-mind state that emerges over time. Different functional building blocks that can be identified and described form part of this state.
The driving forces of every emotion are the needs and motivations of an agent to act. As we will see, such needs can be very basic or complex. At the core of the emergence of an affective state is an appraisal mechanism that evaluates the congruency of an agent's needs and goals with a perceived internal state or external stimulus. This evaluation is a multi-modal process that involves simple reactive and complex cognitive brain processes. Goal congruency induces positive emotions, goal incongruency induces negative emotions. The resulting emotional state is a mix of specific cognitive activity patterns and physiological reactions. It allows an agent to access implicit mental states that affect behavior, cognition, memory and perception. The physiological state affects somatic performance and sensory sensitivity. Mental and physiological states are connected in a closed loop. The function of emotions is to increase an agent's capability to survive in the physical and social environment.
1.2.1 Needs and Motivation
Needs define the conditions of an agent's well-being. The satisfaction of such needs is the driving motor of emotions.
The most basic examples are physiological needs that control nutrition intake, sleep and the urge for security. In biology the processes that control fundamental needs are described as homeostasis and allostasis. Homeostasis is the regulation of a physiological state within a certain threshold, controlling for one parameter. The regulation of body temperature is a good example to visualize this process: a discrepancy from the desired value induces physiological and behavioral changes to re-establish stability. Allostasis maintains the internal stability of an organism in an adaptive fashion; by adjusting actively to predictable and unpredictable future needs, allostasis controls anticipatorily. Dehydration is a good example to visualize the difference between homeostasis and allostasis. The reduction of sweating is a simple homeostatic process to control for the loss of water. The orchestration of multiple such homeostatic processes, like urine reduction, mucous membrane dehydration or blood pressure regulation, that directly or indirectly help to maintain stability is called allostasis. In contrast to homeostasis, this process can happen anticipatorily. The biological regulation of fundamental needs establishes the health and survival of an organism. The results of this regulation are basic drives, like foraging or the defense of territory. One may ask why reproduction is not listed as a fundamental need of an organism. This is because reproduction is only a fundamental need of a species, but not of an individual: an organism can survive without reproduction, while the species cannot. This does not mean that reproduction is not a strong need.
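To make the distinction concrete, here is a minimal sketch of the two regulation schemes described above: a homeostat that corrects a single variable once it leaves a tolerance band, and an allostat that coordinates several homeostats on a blend of current and predicted state. All class names, parameters and numerical values are illustrative assumptions, not quantities from this thesis.

```python
# A minimal sketch of homeostatic versus allostatic regulation.

class Homeostat:
    """Regulates a single physiological variable back toward a fixed set point."""

    def __init__(self, set_point, tolerance, gain=0.5):
        self.set_point = set_point    # desired value, e.g. 36.5 degrees C
        self.tolerance = tolerance    # band within which no correction occurs
        self.gain = gain              # strength of the corrective response

    def correction(self, value):
        """Corrective adjustment once the value leaves the tolerance band."""
        error = self.set_point - value
        return self.gain * error if abs(error) > self.tolerance else 0.0


class Allostat:
    """Coordinates several homeostats and acts on predicted, not only current, state."""

    def __init__(self, homeostats, anticipation=0.4):
        self.homeostats = homeostats      # e.g. sweating, urine output, blood pressure
        self.anticipation = anticipation  # weight given to the predicted future demand

    def corrections(self, current, predicted):
        # Blend current and anticipated state, then let each homeostat correct
        # the blend: regulation starts before the discrepancy has materialized.
        return [h.correction((1 - self.anticipation) * now + self.anticipation * later)
                for h, now, later in zip(self.homeostats, current, predicted)]


# Homeostasis: body temperature is only corrected outside 36.5 +/- 0.5 degrees C.
temp = Homeostat(set_point=36.5, tolerance=0.5)
print(temp.correction(37.6))   # nonzero: re-establish stability
print(temp.correction(36.7))   # 0.0: within the threshold

# Allostasis: a corrective response although the current value is still fine,
# because a future water loss is predicted.
sweat = Allostat([Homeostat(set_point=1.0, tolerance=0.1)])
print(sweat.corrections(current=[1.0], predicted=[0.4]))
```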
Basic needs directly regulate survival and are therefore defined as fundamental. Non-fundamental needs drive behavior that is not directly linked to survival, but to the well-being of an organism. Some examples are sex, the need for belongingness and esteem. Even more complex needs are self-actualization, a cognitive need for understanding and a need for aesthetics (Maslow, 1954). The need for self-actualization describes an individual's desire for self-fulfillment. The cognitive need for knowledge and understanding has a different underlying mechanism: acquiring knowledge and systematizing the environment increases security and reduces unpredictability. Curiosity is an innate mechanism that does not have to be taught to infants. A second explanation of this need is the satisfaction of insight and understanding. Aesthetic needs are based on a craving for beauty and a repulsion of ugliness.
The main difference between needs and motivation is that needs define desired conditions while motivations describe cognitive states of wanting. In a healthy subject every unsatisfied need is succeeded by a motivational state. Mostly, but not always, this state drives a behavioral action. Motivation is not the motor of behavior; it describes only the causal mechanism of action initiation. An agent is motivated because he or she detects an unsatisfied need. Needs are organized into a hierarchy of relative prepotency. This means that if two needs from different hierarchy levels are unsatisfied, the more basic need will drive behavior. Gratification and satisfaction form a fundamental concept in motivation theory. A satisfied need gives rise to another need. This does not imply that a need has to be 100% satisfied before the next need starts driving.
Gratification of needs, and deprivation of the resources and conditions that satisfy needs, form the basic mechanism for the emergence of emotional states. Motivational states provide an agent with the drive that is necessary to act when a need is not satisfied.
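As a minimal sketch of the prepotency rule just described, the function below scans the hierarchy from the most basic need upward and returns the first one whose satisfaction falls below a threshold. The hierarchy labels and the 0.8 threshold are illustrative assumptions, not values from the thesis.

```python
# Prepotency rule: among unsatisfied needs, the most basic one drives behavior,
# and a need does not have to be 100% satisfied before the next one engages.

NEED_HIERARCHY = ["physiological", "safety", "belongingness", "esteem",
                  "self-actualization"]  # most basic first

def driving_need(satisfaction, threshold=0.8):
    """Return the most prepotent need whose satisfaction is below threshold,
    or None when every need is sufficiently gratified (no drive)."""
    for need in NEED_HIERARCHY:
        if satisfaction.get(need, 0.0) < threshold:
            return need
    return None

# Esteem is far less satisfied than safety, but safety sits lower in the
# hierarchy and therefore wins the competition for behavioral control.
state = {"physiological": 0.9, "safety": 0.5, "belongingness": 0.85, "esteem": 0.1}
print(driving_need(state))  # -> "safety"
```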
1.2.2 The Valence System
Emotions code valence. This is the most important functional axiom of every emotion: emotions inform an individual whether a stimulus is good or bad. Because stimuli can have different levels of complexity, different mental and somatic processes are involved in the human valence system. As we will see later in this introduction, different subcortical and cortical clusters as well as bodily reactions have been identified so far as processing the valence of a stimulus. The mechanism of evaluation is called appraisal.
1.2.3 The Appraisal Mechanism
Appraisal is a cognitive-somatic evaluation process that identifies a stimulus as affecting the agent positively or negatively (Arnold, 1960). This stimulus can be internal or external. The evaluation process is structured into different dimensions (Lazarus, 1991). Goal relevance determines whether a given stimulus or situation is relevant to one's goals. The dimension of goal congruency determines whether the stimulus or the situation facilitates or averts the agent's goals. The type of ego-involvement identifies the relationship of the stimulus to the agent's ego-identity within a group of people. The fourth dimension determines who or what is accountable and whether credit or blame should be assigned. The coping potential defines whether the agent is capable of dealing with the result of the stimulus or the situation. The future expectancy estimates the likelihood of further congruencies with the agent's goals.
The six dimensions of appraisal:
• Goal relevance
• Goal congruency
• Type of ego-involvement
• Blame or credit
• Coping potential
• Future expectancy
All six dimensions are processed in parallel, overlapping in time. The result of this appraisal mechanism is the induction of positive or negative affect, expressed by different responses at the cognitive and somatic levels.
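The sketch below renders the six dimensions as fields of a data record and derives a signed affect value from them. The theory specifies the dimensions, not a numerical combination rule; the simple relevance-gated rule in affect() and all field values are illustrative assumptions only.

```python
# A hedged sketch of the six appraisal dimensions as a data record.

from dataclasses import dataclass

@dataclass
class Appraisal:
    goal_relevance: float    # 0 = irrelevant to the agent's goals, 1 = highly relevant
    goal_congruency: float   # -1 = averts the goals, +1 = facilitates them
    ego_involvement: float   # relation to the agent's ego-identity in the group
    blame_or_credit: float   # -1 = blame assigned, +1 = credit assigned
    coping_potential: float  # 0 = cannot deal with the outcome, 1 = can fully cope
    future_expectancy: float # estimated likelihood of further goal congruency

    def affect(self):
        """Signed affect: positive values -> positive emotion, negative -> negative."""
        if self.goal_relevance == 0.0:
            return 0.0  # goal-irrelevant stimuli elicit no emotion
        core = self.goal_congruency + 0.5 * self.future_expectancy
        if core < 0:
            core *= 2.0 - self.coping_potential  # low coping amplifies negative affect
        return self.goal_relevance * core

# A highly relevant, goal-incongruent event the agent can barely cope with.
bear = Appraisal(goal_relevance=1.0, goal_congruency=-1.0, ego_involvement=0.2,
                 blame_or_credit=-0.5, coping_potential=0.1, future_expectancy=-0.6)
print(bear.affect())  # strongly negative
```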
1.2.4 Neurocomputational, Physiological and Behavioral Responses
The final building block of emotions is the response repertoire. On a cognitive level we observe influences on memory building processes, perception, attention and decision mechanisms. Adaptations in synaptic plasticity and neurotransmitter release have been identified as crucial neurocomputational factors responsible for these cognitive responses. On a physiological level the human body modulates hormonal levels, energy consumption, transpiration, respiration, body temperature and muscle control. These adaptations produce a wide variety of expressive and goal directed behaviors.
1.3 The Time Scale of Emotions
It is important to understand that the emergence of an emotion is a sequentially structured process. The three mechanisms that underlie this process are: perception, evaluation and response.
Through perception a system detects and identifies internal and external stimuli, for example a bear as a big animal. This process can also happen unconsciously; the perception of homeostatic or physiological states, for example, does not always reach consciousness. At this stage of processing we do not have any information about the meaning of the stimulus. The second stage is appraisal. This mechanism evaluates whether the stimulus is goal-congruent or not, and it differentiates emotions from non-emotional cognitive processing. The appraisal mechanism can be processed on a somato-sensory level with very little cognitive activity. The smell of rotten food, for example, induces a straightforward emotion of disgust. But appraisal can also happen without any somato-sensory input; for example, the imagination of past or future events, like singing in front of an audience, can induce emotions such as fear or happiness. The last stage of the sequence is the response. This includes three dimensions: cognition, physiology and behavior. The paradigm of fear conditioning is a prime example of how dramatically cognition, physiology and behavior can be affected by potent stimuli. The endurance of these responses varies across systems. Behavioral responses often happen quickly, while physiological and cognitive responses can have long-lasting effects.
Emotions are not an on-off mechanism. They arise and disappear over time. Because the evaluation and the responses include somatic and cognitive systems, the time courses vary highly across systems. A fearful stimulus induces a quick behavioral and also hormonal response, and the released stress hormones may then stay for hours in the blood. On a cognitive level a positive experience can be stored in memory and retrieved over and over again, inducing a long-lasting positive feeling. This means that emotions can be induced very quickly but have long-lasting effects. Emotion research therefore classifies emotional states that last over multiple hours or days as moods (Frijda, 1993). The conscious perception of the mental representations that characterize emotions has been classified as feeling (Damasio, 2001). It is important to make these differentiations, which are based on the time course of the emergence of an affective state.
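A small sketch can illustrate the differing time courses just described: a behavioral response that decays within minutes versus a hormonal response that rises after the trigger and lingers for hours. The exponential shapes and time constants are illustrative assumptions, not measured values.

```python
# Contrasting response time courses after a single fearful stimulus at t = 0.

import math

def behavioral_response(t_min):
    """Fast onset, decays within a few minutes."""
    return math.exp(-t_min / 2.0) if t_min >= 0 else 0.0

def hormonal_response(t_min, tau_rise=5.0, tau_decay=120.0):
    """Slower onset; stress hormones stay in the blood for hours."""
    if t_min < 0:
        return 0.0
    return (1.0 - math.exp(-t_min / tau_rise)) * math.exp(-t_min / tau_decay)

for t in (1, 10, 60, 240):  # minutes after the fearful stimulus
    print(t, round(behavioral_response(t), 3), round(hormonal_response(t), 3))
# The behavioral response has essentially vanished by t = 10, while the
# hormonal response is still substantial hours later.
```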
1.4 What Distinguishes an Emotion From a Non-Emotion?
The understanding of emotions has changed fundamentally over history. One main conceptual shift can be summarized with the body-mind problem: are emotions body responses or cognitive states? Where is the location of emotions? As we will see, one fundamental difference between emotions and non-emotions is a unique interconnection of somatic and cognitive processes. Emotion scientists have elaborated different theories that try to identify the location and relation of different emotional processes.
Already in early theories of emotions, which can be traced back to the Greek and Asian philosophers, we observe two main accounts of emotions that later gave rise to distinguishable philosophical streams addressing this problem. In the dualistic approach the emotions or passions are placed in the immortal soul and affected by the bodily spirits. The immediate cause of bodily spirit movements is an external cause that gives rise to an emotional reaction. The object of such an event is the content of what the emotion is about (see Figure 1.1).
1.4.1 The Feeling Theory of Emotions
The Cartesian idea, also known as the Feeling Theory of Emotions, was fundamentally expressed in the work of William James (1842-1910) and Carl Lange (1834-1900). The James-Lange theory of emotions can be conceptualized with the well-known example of the bear:

    Walking through the woods one day, Susan stumbles across a large grizzly bear which then starts running towards her. She turns and runs away. The conscious perception of the change of her physiological state makes her feel terrified.

So the person starts running, and the running induces a physiological change. The perception of this exciting fact is the emotion itself (James, 1884).

Figure 1.1: Schematic illustration of the Cartesian analysis of anger. The exciting cause from the external world stimulates the body spirits, which are conceptualized as the immediate cause of emotions. The bodily spirits give rise to both the behavioral response and the emotion itself. In the Cartesian approach it is not clear what the object of the emotion is. Figure retrieved from Power and Dalgleish (1997).
There are different problems associated with the Feeling Theory of Emotions. It has been claimed that this theory cannot explain the wide range of different emotions and behaviors (Cannon, 1927). Physiological states are ambiguous, so how do we differentiate between fear, anger or jealousy? Moreover, the body can also become physiologically aroused without experiencing any emotion; intense sport activity is a good example of that case. Another problem of the feeling theory is that it regards emotions as inner states that can only be known through introspection. Philosophers conceptualized this problem as the Private Language Problem (Wittgenstein, 1963). It states that any word that only describes a state that can be accessed by subjective observation acquires a purely private and unverifiable meaning. These two problems summarize the main critique scientists expressed in response to the Feeling Theory of Emotions.
A related approach is that of psychological behaviorism. The principal idea of this scientific direction is that we cannot base our theories on introspective mental states that cannot be accessed from 'outside'. Its theoretical goal is to make predictions of behavior using objective observational experimental methods (Watson, 1930). Both approaches, psychological behaviorism and the Feeling Theory of Emotions, claim that the constituent parts of emotions are the physiological responses induced by the stimulus. The main difference is that James, Lange and their followers make some claims about the mental states of the subject, while the psychological behaviorists base their conclusions on objective observations. The Watsonian account describes the basic emotions of fear, rage and love. But this approach faces the same difficulties as the Feeling Theory of Emotions when it wants to describe a wider range of emotions. Other behaviorists tried to meet this criticism by identifying sets of operants that induce clearly defined reinforced behavior (Skinner, 1976). The main idea is that a set of operants A induces a set of reinforcers A; this set defines emotion 1. Emotional state 2 is then induced by a set of operants B leading to a set of reinforcers B. Going back to the bear example we can visualize the problem we are facing with this account: let us assume that Susan decides to stand still rather than start running. According to Skinner's theory this would lead to a different emotional state, because we are changing the operants. But we cannot be sure that this is the case.
1.4.2 The Cognitive Approach to Emotions
We see that we are facing fundamental problems with emotion theories that base the description of emotional states mainly on the physiological changes in the body. This is the main reason why scientists early on started to include the mind in their investigations of emotions, elaborating cognitive theories of emotions. A fundamental basis of Aristotle's account is the distinction of matter and form in any individual entity (Evans, 1995). It means that any individual entity can be described by what it is made of - the matter - and by what makes it what it is - the form. Applied to the emotions this means that the physiological response, for example the boiling blood, accounts for the matter, while the relationship to the object of the induced emotion accounts for the form. Aristotle's functionalist view of the emotions is based on three conditions that have to be satisfied to elicit an emotional state: first, the object that describes the external situation the individual is confronted with; second, the individual must have a state of mind that allows him to experience the emotion; and third, a stimulus capable of eliciting the emotion (Figure 1.2).
Aristotle's functional concept of the stimulus introduces the mind as an evaluator of the confronted situation and allows us to distinguish between different emotions. Thomas Aquinas (1225-1274) and Baruch Spinoza (1632-1677) are probably the best-known philosophers who took up this functional approach to emotion in their work. A difference to the concept of Aristotle is that both philosophers include in their theory a non-cognitive impulse that controls the initial approach to or avoidance of an object. This impulse induces a physiological tone that affects a basic emotional state, like pleasure or pain. In a second step cognition evaluates the 'accompanied idea' of the emotion. In Spinoza's approach cognition has no causal role; thus it is an example of a weak cognitive theory of emotions. Another issue of discussion is the non-cognitive nature of the initial impulse: it is unclear how this impulse gives rise to one or the other physiological tone.

Figure 1.2: The cognitive account of emotions by Aristotle applied to anger. The object describes the external event that becomes evaluated by the individual, who is in an appropriate state of mind. The result is an internal representation or stimulus that elicits the emotional response, which is divided into the dimensions of matter and form. Figure retrieved from Power and Dalgleish (1997).
Both philosophical streams, the Feeling Theory of Emotions and the cognitive approach, tried to identify the location of emotion processing either in the body or in the mind. Current theories of emotions are mainly inspired by the cognitive account, without blending out the somatic component of the body. As we will see, we find components of both streams in modern emotion research. One unifying idea in the current discussion is the mechanism of appraisal, or the cognitive evaluation of a stimulus as affecting oneself in some way that matters (Arnold, 1960).
Although the cognitive appraisal theory of emotion is very popular today, it would be wrong to say that the Feeling Theory of Emotions is absent from the current debate. One of the most popular defenders of this idea was Robert Zajonc. He stated in his well-recognized article that affective and cognitive systems are largely independent (Zajonc, 1984). One core concept of his theory is the 'primacy of affect', meaning that an emotion does not by necessity require a cognitive state. He bases his argument on differences in the phylogenetic evolution and the separate neuroanatomical structures of the two systems. He also challenges the appraisal theory by showing that an emotional state can be induced without any prior mental state using drugs, hormones or electrical stimulation.
An interesting example of how cognitive evaluation and somatic state are inter-connected can be seen in the empirical work of Schachter (1964). In their famous experiment they manipulated the physiological arousal of people by giving them epinephrine (also known as adrenaline). They then exposed the test group, which was not informed about the effects of the epinephrine injection, and the control group, which was informed about the side effects, to either a euphoria- or an anger-inducing situation. Their results show that the emotional response to positive and negative affect was stronger in the uninformed test group. This example shows how a cognitive mechanism labels an ambiguous somatic state and thereby induces a different emotional experience. This means that the appraisal mechanism cannot be completely uncoupled from the body. A similar approach is the somatic marker theory, which states that body markers influence the process of responding to a stimulus at multiple levels of operation (Damasio et al., 1996; Bechara et al., 2000). Some of these processes occur consciously in the mind and some of them non-consciously in a non-minded manner. These markers are called somatic because they arise in bioregulatory processes normally involved in emotional states. The theory concludes that not only the mechanism of appraisal but also reasoning and decision making processes that are traditionally understood as purely cognitive are influenced by this mechanism.
We can summarize that the 'feeling' approach of James-Lange faces difficulties in explaining the full complexity of different emotional states by referring only to the perception of physiological changes. On the other hand, a purely cognitive approach cannot explain why emotions can precede mental cognition and can be induced artificially without affecting mental processes. This brings us to the conclusion that an embodied appraisal that includes psychosomatic representations in the cognitive evaluation of the stimulus could be understood as a constructive proposal to unify the two dimensions (Prinz, 2004). We also have to be aware that during this section we were shifting between emotional states of different degrees of complexity. The generalization across emotional states of different levels of complexity is critical. As we will see in the next section, we have to differentiate between simple bottom-up and complex top-down mechanisms if we want to understand the phenomenon in its full picture.
1.5
Basic and Complex Emotions
If we talk about emotions we have to be aware that the term includes a wide range of different body-mind states. In this section I want to address the third question stated at the beginning of the introduction: 'What distinguishes one emotion from another?' The question is whether emotional states can be categorized into different classes or whether they have to be described as a multidimensional continuum.
Emotions are evolved, adaptive perception-response patterns that help an individual to survive.
Different situations and stimuli induce different emotional states. This causality can be explained by the idea that emotions have different formal objects: fear is about danger and sadness is about loss, for example. So emotions do not have a unifying formal object (De Sousa, 1987). This leaves us with the problem of how to describe the different observed states. We can follow two approaches to do so: we can either describe the differences of the phenomenon using modular parameters, constructing a multidimensional semantic space, or we can use categories. The first approach was conceptualized in the disunity thesis, which states that emotions do not form a natural class (Griffiths, 1997). By natural class we mean boundaries between categories that derive from nature and not from the classification of humans observing nature. Griffiths argues that the phenomenon of emotion can be divided into at least two subcategories. These two categories can be described by the concepts of 'affect programs' and higher cognitive emotions.
Affect programs are fast appraisal mechanisms that induce physiological changes and action dispositions (Ekman and Friesen, 1986). These programs are modular, meaning that they are divided into modules, each processing a certain information stream that is not affected by the information representation in other processes. Affect programs describe a basic set of emotions that includes fear, anger, happiness, sadness, surprise and disgust. This means that we are capable of producing a predefined stimulus that triggers one of the named basic emotional states. Ekman showed that these basic emotions are universally expressed and perceived across different cultures (Ekman and Friesen, 1986). Scientists investigating the neural correlates of basic emotions have shown that evolutionarily old subcortical structures are involved in the processing of these states (LeDoux, 2000). However, scientists do not agree on a clear definition of which emotional states should be described as basic and which not (See table 1.1).
The concept of basic emotions only captures a subset of the states that
Table 1.1: 'Basic' emotion classes of different theorists, according to Ortony and Turner (1990).

Theorist                               Basic Emotions
Arnold (1960)                          Anger, aversion, courage, dejection, desire, despair, fear, hate, hope, love, sadness
Ekman, Friesen, Ellsworth (1982)       Anger, disgust, fear, joy, sadness, surprise
Frijda (Personal communication, 1986)  Desire, happiness, interest, surprise, wonder, sorrow
Izard (1971)                           Anger, contempt, disgust, distress, fear, guilt, interest, joy, shame, surprise
James (1884)                           Fear, grief, love, rage
Mowrer (1960)                          Pain, pleasure
Oatley & Johnson-Laird (1987)          Anger, disgust, anxiety, happiness, sadness
Panksepp (1982)                        Expectancy, fear, rage, panic
Plutchik (1980)                        Acceptance, anger, anticipation, disgust, joy, fear, sadness, surprise
Tomkins (1984)                         Anger, interest, contempt, disgust, distress, fear, joy, shame, surprise
Watson (1930)                          Fear, love, rage
we call emotions. The so-called higher cognitive emotions are based on the cognitive evaluation of a stimulus or of the self in a social context. They are completely disconnected from direct somatosensory streams. Some examples are the emotions pride, shame, embarrassment, empathy and guilt (Strongman, 1987; Lewis, 1993). These emotions are also called self-conscious emotions, because they require a concept of self in order to emerge. They are highly influenced by the standards of a society and therefore differ across cultures. Another difference to basic emotions is that this type of affective state often lacks classes of specific stimuli capable of eliciting the emotion. The elicitation of pride, for example, requires different factors, all having to do with cognition related to the self. Cognitive factors may also play a role in the elicitation of more basic emotions; however, the nature of these processes is much less cognitively elaborated than in self-conscious emotions (Plutchik, 1980). Another reason why higher cognitive emotions are more difficult to classify is their lack of coherent physiological and behavioral response patterns. While basic emotions like happiness or sadness can be differentiated by distinct facial expressions, complex emotions like guilt or shame are more difficult to identify on all levels: physiological, behavioral and neurocomputational.
The neuroanatomical organization of the cortical and subcortical clusters involved in the processing of basic and complex emotions also reflects the evolutionary development of the two types of processing (LeDoux, 2000; Aggleton, 1992). As we will see, basic emotions such as fear or disgust are associated with activity in evolutionarily old subcortical structures in the brain stem. Social and self-conscious emotions additionally induce activity in cortical areas like the prefrontal cortex. A functional organization reflecting this development has been proposed for the insular cortex (Craig, 2009).
1.6
The Neurobiological Basis of Emotions
This section will provide an overview of the most important areas involved in the processing of affect and the elicitation of appropriate responses.
1.6.1
Subcortical Areas
In traditional emotion research, a set of subcortical brain structures including the hippocampus, amygdala, anterior thalamic nuclei, septum, limbic cortex and the fornix is conceptualized as the main location where emotions are processed. This network is also known as the limbic system. Although this term is widely used in the field, little empirical work can be found that supports and defends the functionality of the limbic system (LeDoux, 2003). Because of this lack of empirical results we will avoid this term in the following discussion and rather explain the connectivity, anatomy and functionality of each area in particular.
One of the most prominently investigated nuclei in the subcortical cluster is the amygdala. This structure is highly connected to multiple cortical and subcortical areas (Swanson and Petrovich, 1998; Aggleton, 1992). It receives input from all five senses and is therefore a dominant relay station of sensory information transmission (See figure 1.3).
The most profound investigation of the amygdala’s functionality has
been done using the paradigm of fear conditioning (LeDoux and Phillips,
1992; Smith et al., 1995; LeDoux, 1996, 2000; Sehlmeyer et al., 2009;
Johansen et al., 2010). These studies have shown that the amygdala is
responsible for the evaluation of fearful stimuli. Recent studies suggest that the amygdala also evaluates positive valence, establishing this structure as a general affect detector (Paton et al., 2006; Salzman and Fusi, 2010).
The activity of the amygdala stimulates different subcortical and cortical clusters. The most prominent modulatory systems are the nucleus basalis, the raphe nuclei, the pons and the ventral tegmental area, which regulate the neurotransmitters serotonin, acetylcholine and dopamine. These
Figure 1.3: The connectivity of the amygdala. This nucleus receives inputs from all sensory modalities and from cortical and subcortical areas. The output is transmitted to modulatory systems and to neuronal correlates in the brain stem. The direct connection to the hypothalamus allows the amygdala to trigger hormonal responses. Figure adapted from (LeDoux, 2006).
neurotransmitters are responsible for the regulation of moods (Young and Leyton, 2002), anxiety (Hariri et al., 2002), memory acquisition (Gold, 2003) and reward processing (Schultz, 2002). Another output target of the amygdala is the autonomic nervous system (ANS). The two main divisions of the ANS, the parasympathetic nervous system and the sympathetic nervous system, receive inputs via the vagus nerve and the hypothalamus (Bechara et al., 1999). The parasympathetic nervous system is responsible for the regulation of rest-and-digest functions, the sympathetic nervous system for fight-and-flight behaviors. Additionally the amygdala connects to the association cortex, which is related to cognition (Price, 2003), and to the prefrontal cortex, which is related to the control of behavior (Quirk et al., 2003).
The dense connectivity of the amygdala provides the basis for its two
main functionalities: the evaluation of affect and the orchestration of different response patterns (Anderson and Phelps, 2001). Functional imaging studies show evidence that the amygdala is involved in the detection
of fear expressions in faces (Adolphs et al., 1994; Morris et al., 1998), the
regulation of anxiety (Sehlmeyer et al., 2009) and social behavior (Bickart
et al., 2010; Davis et al., 2009; Haruno and Frith, 2010; Schiller et al.,
2009).
Another subcortical region involved in emotion processing is the hippocampus. This structure has a long-established role in spatial memory. Recent studies have also identified it as a regulator of defensive responses. Rats with bilateral hippocampal lesions show reduced freezing (LeDoux and Phillips, 1992), display fewer defensive reactions when confronted with a cat (Pentkowski et al., 2006), show reduced expression of unconditioned responses (Deacon et al., 2002) and show reduced avoidance of threatening stimuli (Chudasama et al., 2009). These results underline the hippocampus' important role in the normal expression of fear responses.
The described network of subcortical clusters, with the amygdala functioning as a relay station, the hippocampus as a controller of defensive behavior and the different nuclei in the brainstem as global regulators of hormonal and behavioral responses, can be seen as the basis of emotion processing. Compared to cortical areas these structures evolved earlier and can be found in non-primate animals. Therefore emotion researchers describe this pathway as the 'old route' (LeDoux, 1996).
1.6.2
Cortical Areas
Complex stimuli engage cognitive mechanisms that are processed in a wide cortical network. Three cortical brain regions involved in the processing of emotions have to be pointed out: the insular, the lateral somatosensory and the prefrontal cortex.
The anatomical architecture reveals a strong connection between the prefrontal cortex and the amygdala, the hippocampus and the hypothalamus (Salzman and Fusi, 2010) (See figure 1.4). It has been shown that the prefrontal cortex is involved in the regulation of fear conditioning (Quirk et al., 2003; Sehlmeyer et al., 2009), phobias (Hermann et al., 2009), the control of goal-directed behavior (Fuster, 2008) and the regulation of hormonal and expressive emotional responses (Kalin et al., 2007). Based on these results it has been suggested that the prefrontal cortex is involved in the regulation of emotion processing.
The second important cortical structure involved in the processing of emotions is the insula. This area receives visceral, pain and gustatory sensory input and has therefore been proposed as the location of the somatosensory body representation (Craig, 2010). This body representation is involved in the regulation of homeostasis, but also in decision processes. The somatic marker theory states that the imagination of hypothetical future decision outcomes induces somatic states (Damasio et al., 1996; Bechara et al., 1999; Bechara and Damasio, 2005). The conscious perception of these somatic states then affects the decision process. Functional neuroimaging studies show activity in the anterior insula that is related to social emotions like empathy, compassion and cooperation (Lamm and Singer, 2010). Electrical stimulation of the insular cortex produces social behavior related to these emotions (Caruana et al., 2011). Craig (2009, 2010) has proposed that the emergence of consciousness is based on the somatic and sentient-self representation processed in the anterior
Figure 1.4: The connectivity of cortical and subcortical clusters. The prefrontal cortex is highly connected to the amygdala, the sensory cortices, the hippocampus and nuclei in the brain stem that regulate hormonal responses. Figure from (Salzman and Fusi, 2010).
part of the insula.
Another structure involved in the processing of emotional and social signals is the lateral cortex, especially the superior temporal sulcus (STS) and the fusiform gyrus. These areas have been proposed to be involved in the perception of facial expressions and face identity (McCarthy et al., 1997; Haxby et al., 2002; Kanwisher et al., 1997). The understanding of emotional connotations in voice and prosody has been related to increased activity in the right hemisphere (Bookheimer, 2002), while the understanding of the emotional content transmitted on the semantic channel has been associated with the left hemisphere (Bookheimer, 2002; Binder et al., 1996, 2009).
The discussed areas show how multilayered the processing of an emotional stimulus is. The variety in stimulus complexity is one of the main reasons why it is difficult to define exactly which brain areas are involved in emotion processing. More exact definitions of the different dimensions of the phenomenon of emotion will thus be needed to clearly identify the underlying neurobiological substrate.
Chapter 2
SYNTHETIC EMOTIONS AND
EMOTIONAL AGENTS
So far artificial intelligence (AI) has focused on the construction of computational models and applications capable of solving cognitive or behavioral tasks, ignoring any emotional component. Based on new insights from neuroscience and psychology we have observed a paradigm shift that attributes more functional importance to emotional processes. Today it is widely accepted that emotions have a fundamental function that increases the fitness of an individual in complex environments. In recent years we have seen a growing number of computational models of emotions and affective processes. This trend runs in parallel with the development of android robots and autonomous virtual agents that target social interaction with humans. The following chapter gives a short introduction to the science of synthetic emotions and emotional agents.
2.1
Synthetic Emotions
The multidimensional phenomenon of emotion can be layered into stimulus perception, appraisal and response elicitation. Computational models of emotions address one or several of these stages. In affective computing an important distinction has to be made between theory modeling, which focuses on understanding the phenomenon of emotion, and application modeling, which aims to improve the control of autonomous agents.
2.1.1
Theory Modeling
Theory modeling starts with a formalization process based on the insights from neurophysiological and psychological experiments. This process results in a conceptualization of a theory of an emotion mechanism. To verify the plausibility of the model, scientists compare the data from neurophysiological experiments with the performance of the model. An established paradigm is the well-studied fear conditioning paradigm (LeDoux, 2000). The insights from these experiments gave rise to computational models that try to illuminate the underlying neurocomputational mechanism of plasticity in the amygdala (Armony et al., 1997; Mor, 1995). The proposal of protoemotions, basic hard-wired reactive mechanisms that result in the detection of positive or negative valence, has been used to construct a bottom-up model of synthetic emotions (Vallverdú and Casacuberta, 2009). All these approaches model a basic neurobiological mechanism that underlies emotions. Other models target more complex mechanisms of emotions like appraisal (Wehrle and Scherer, 2001), reasoning (Davis and Lewis, 2003), the regulation of emotions (Elliott and Siegle, 1993) and the involvement of multiple brain areas (Balkenius and Morén, 1998).
The main objective of theory modeling is the investigation of the underlying neurocomputational mechanisms of emotion processes.
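To give a flavor of what such theory models compute at their core, the following minimal sketch implements a Rescorla-Wagner-style associative learning rule of the kind that many fear conditioning models build on. It is an illustration under our own simplifying assumptions (the function name, learning rate and trial counts are arbitrary choices of ours), not a reimplementation of any of the cited amygdala models:

# Minimal sketch of associative fear learning (Rescorla-Wagner style).
# v: associative strength of the conditioned stimulus (CS);
# lambda_us: reinforcement delivered by the unconditioned stimulus (US).
def rw_update(v, lambda_us, alpha=0.1):
    """One conditioning trial: move v towards the US magnitude."""
    return v + alpha * (lambda_us - v)

v = 0.0                          # no initial CS-US association
for trial in range(20):          # acquisition: CS paired with shock
    v = rw_update(v, lambda_us=1.0)
for trial in range(20):          # extinction: CS presented alone
    v = rw_update(v, lambda_us=0.0)
print(round(v, 3))               # residual fear response after extinction

The delta term (lambda_us - v) is the prediction error that more detailed models attribute to plasticity in the amygdala's input pathways.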
2.1.2
Application Modeling
The second stream of synthetic emotion modeling focuses on the construction of controllers for interactive agents. The increasing number of applications targeting social interaction with computers, machines and agents motivates researchers from different disciplines to construct computational emotive architectures. These models target the dynamics of affective states, appraisal and response patterns (Velásquez, 1997; El-Nasr et al., 2000; Marsella and Gratch, 2009; Gratch and Marsella, 2004), the perception and expression of emotions (Breazeal, 2003) or the influence of emotions in human-agent conversations (Pelachaud and Bilvi, 2003). Other approaches layer their models into emotions, moods and personality to capture the different time scales of affective states and the influence of personal characteristics (Gebhard, 2005; Corchado Ramos et al., 2009). All these models aim to enrich the social interaction of virtual or real agents with humans.
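As an illustration of such a layered architecture, the following sketch lets short-lived emotion impulses feed a slowly decaying mood, biased by a static personality trait. The decay constants, the trait and the mapping from events to affect are hypothetical choices of ours, not parameters of the cited models:

import math

# Illustrative two-layer affect model: fast emotions feed a slow mood,
# biased by a static personality trait. All constants are hypothetical.
class AffectiveState:
    def __init__(self, optimism=0.2):
        self.optimism = optimism   # personality: static bias in [-1, 1]
        self.emotion = 0.0         # fast component, seconds time scale
        self.mood = 0.0            # slow component, minutes time scale

    def appraise(self, event_valence):
        """An appraised event produces an emotion impulse."""
        self.emotion += event_valence + 0.1 * self.optimism

    def step(self, dt=1.0):
        """Let the emotion drive the mood, then decay both."""
        self.mood += 0.05 * self.emotion * dt
        self.emotion *= math.exp(-dt / 5.0)    # ~5 s emotion decay
        self.mood *= math.exp(-dt / 300.0)     # ~5 min mood decay

agent = AffectiveState()
agent.appraise(+1.0)               # positive event, e.g. praise by a user
for _ in range(10):
    agent.step()
print(agent.emotion, agent.mood)   # emotion has faded, mood persists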
In this first section we have seen that the modeling of synthetic emotions has two main objectives: the understanding of the phenomenon, achieved by theory modeling, and the construction of useful applications. In the next section we introduce some of the most popular existing agents that aim to interact with humans.
2.2
Emotional Agents
In recent years we have observed an increasing number of virtual and physical agents constructed for social interaction with humans. The variation in surface properties, aesthetics, control and functionality is huge and makes it difficult to keep an overview. In this section we introduce some of the most important achievements in the field.
2.2.1
Virtual Agents
The boom of computer applications that use autonomous virtual agents to interact with humans increases the demand for believable behavior. The appropriate elicitation of emotional expressions that are recognized as such is fundamental in this approach. This does not imply that the agent has to be realistic. For example, the expressive agent Simon is a comical representation of a human baby, but highly expressive (Velásquez, 1997). Another approach is to parameterize the facial features used in visual speech production in order to produce realistic text-to-speech conversion in real time (Massaro, 1998). Newer agents show more realistic renderings and an increased variety of expressions (Bevacqua et al., 2008). The behavior of some of these agents is controlled by the Affective Presentation Markup Language, a systematic collection of commands useful for the control of an agent in conversation (DeCarolis et al., 2004). Similar programming languages deal with the multi-modality of conversations (Zong et al., 2000) or with the emotive components (Schröder et al., 2007). Emotional conversational agents (ECA) are agents capable of expressing and perceiving emotions while communicating with humans (Becker et al., 2004; Gratch et al., 2002; Schröder et al., 2008). Such agents are used in online stores, video games, help-lines, street navigation and intelligent homes.
2.2.2
Physical Agents
Today a wide variety of robots constructed for social interaction exists. These agents differ in anatomy and function. The following section gives an overview of some of the most recent physical agents related to social interaction.
Full Body Humanoids
One main objective of full body humanoids is to construct robots that are equipped with human-like motion (See figure 2.1). Their joints have many degrees of freedom, allowing a wide variety of gestural and postural behaviors. Such agents are designed to interact with humans. One big challenge for mobile humanoids is the supply of energy and computational power. This problem can be solved by outsourcing the computational processes to external servers and equipping the robot with mobile batteries.
Figure 2.1: Full body humanoid robots. Asimo (left), Hubo (middle) and
iCub (right) are three examples of androids with different capabilities and
objectives.
In recent years the construction of small full body humanoids has made impressive progress (See figure 2.2). These robot platforms provide autonomous agents with impressive motor control at much lower cost than their bigger brothers.
Figure 2.2: Small full body humanoid robots. Nao (left) and Qrio from
Sony (right).
Geminoids
Geminoids are android robots with a highly realistic anatomy. The first geminoid ever developed is a copy of Hiroshi Ishiguro from Osaka University, Japan (See figure 2.3). The main objective of geminoids is the investigation of the underlying psychological mechanisms of android perception and interaction. The body and the voice of these robots are teleoperated, meaning that a human remotely controls the interaction with other humans.
Upper Torso Humanoids
Upper torso humanoids form a class of robots that are not equipped with legs or lower body parts (See figure 2.4). Some of these humanoids can move using a mobile platform. One main difference to full body androids or geminoids is their increased perceptual capability, because such robot platforms target social interaction and communication rather than motor control.
Figure 2.3: The three most popular teleoperated androids, so-called geminoids: Model F (left), HI (middle) and DK (right) in the front row, with their human 'originals': an anonymous young female (left), Hiroshi Ishiguro from Osaka University, Japan (middle) and Henrik Scharfe from Aalborg University, Denmark (right).
Figure 2.4: Upper torso robots: Nexi (left), Domo (center left), Barthoc
(center right) and Armar3 (right), partially with mobile platform.
Humanoid Heads
Humanoid heads are constructed for verbal and non-verbal interaction with humans in a face-to-face setup. These systems are equipped with face and expression detection capabilities and an expressive behavioral repertoire (See figure 2.5).
Figure 2.5: Expressive robot heads: Kismet (left), Mertz (center) and
Roman (right).
Zoomorphic Robots
The construction of emotive interactive agents does not necessarily lead to humanoid robots. The so-called zoomorphic robots use animals or animal-like entities as inspiration for their construction (See figure 2.6). They have an expressive repertoire focused mainly on verbal and non-verbal social interaction.
Figure 2.6: Zoomorphic robots: Emuu (left), iCat (center left), Leonardo
(center right) and Probo (right).
As we have seen, there exists a wide variety of interactive physical agents. The anatomy, computational capability and functionality of these applications differ with their objectives. In the next chapters we introduce different studies from our lab that make use of such agents to investigate the underlying mechanisms of emotion perception, processing and expression in relation to social interaction.
Chapter 3
NON-VERBAL BEHAVIOR
AND SOCIAL INTERACTION
The first question we address is whether and how humans perceive artificial agents. Before we dive into the complex world of emotive communication we investigate a more subtle code of social interaction: the regulation of interpersonal space.
3.1
Human Spatial Behavior
Humans use a complex code of non-verbal interaction including facial expressions, eye-contact, gestures, postures and the regulation of interpersonal distance to communicate their intentions and feelings (Birdwhistell, 1975; Ekman, 1993; Mehrabian, 1972; Sommer, 1969; Argyle and Dean, 1965). In this study we investigate the spatial dimension of social behavior. In particular we analyze how people regulate their interpersonal distance to each other while they are engaged in a cooperative task. We are also interested in understanding how the salience of a stimulus, for example the perception of another person, affects social interaction. Therefore we investigate the proxemic behavior of players interacting with either a virtual character or a physical counterpart.
Three factors that regulate the spatial distance to others have been identified so far (Baldassare, 1978): biologically pre-programmed instincts, the environment, and the cultural background of people. Behavioral studies have shown that animals have innate behavioral mechanisms that regulate the defense of their territory (Hediger, 1964). The violation of this space induces psychological stress expressed in physiological arousal and behavioral fight-or-flight responses. A hypothesis on the underlying neural substrate of this regulation has been proposed in a computational model of allostatic control (Sanchez-Fibla et al., 2010). As a second factor, ecological psychology has identified environmental aspects that affect social interaction on a spatial scale (Stokols, 1978). According to this work, the organization of space influences grouping behavior (Sommer, 1969), the building of friendships (Festinger et al., 1950), crime rates (Newman, 1973) and community life (Jacobs, 1961). Ecological psychology distinguishes between different types of spatial cognition that trigger either active or reactive behavioral responses on a spatial dimension, and it defines different modes of human-environment transactions that are used to explain the environmental influence on human behavior (Stokols, 1978). Another theory dealing with the influence of space on cognition and behavior is that of space syntax. It proposes that the spatial configuration of buildings and cities implicitly influences spatial cognition and navigation performance, without explicitly assuming anything about individuals' motivations (Hillier, 1996; Penn, 2003). The theory states that environmental cognition constructs a topological rather than a metric representation of space that affects individual behavior in predictable ways.
The third factor affecting spatial behavior is culture. The theory of proxemics states that people regulate their inter-personal distance to each other as a subtle code of social behavior that differs across cultures (Hall, 1963, 1966). Hall classified the inter-personal distance to other humans into four categories: intimate space (0 - 0.46 meters), personal space (0.46 - 1.2 meters), social space (1.2 - 3.66 meters) and public space (3.66 - 7.6 meters). The intimate space is shared only with the closest friends and confidants, the personal space with familiar persons. The social space is the interaction space for routine social interactions with acquaintances as well as strangers, while the public space is not perceived as personal and is relatively anonymous. The perception of space varies across different cultures.
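Hall's categories translate directly into a mapping from distance to zone. The following minimal sketch encodes the boundaries listed above; it is our own illustration of how such zones can be operationalized, for instance when quantifying shared interaction space as in the measures reported later:

# Classify an inter-personal distance (in meters) into Hall's proxemic
# zones; the boundaries follow Hall (1963, 1966) as listed above.
HALL_ZONES = [
    (0.46, "intimate"),   # 0 - 0.46 m
    (1.20, "personal"),   # 0.46 - 1.2 m
    (3.66, "social"),     # 1.2 - 3.66 m
    (7.60, "public"),     # 3.66 - 7.6 m
]

def proxemic_zone(distance_m):
    for upper_bound, zone in HALL_ZONES:
        if distance_m <= upper_bound:
            return zone
    return "beyond public"

print(proxemic_zone(0.3))   # -> intimate
print(proxemic_zone(2.0))   # -> social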
The three described factors of inter-personal distance regulation do not explain the entire phenomenon of spatial behavior. We know from behavioral psychology that people interacting with each other follow regulative dynamics and adapt their actions, and thus their distance, in response to the behavior of their counterpart (Burgoon et al., 2007). The dyadic interaction between two individuals, which is the product of approach and avoidance forces, balances the mutual comfort of the interactors (Patterson, 1973). The result is a synchronization of actions and responses that expresses the adaptive regulation of ease and stress. Approach tendencies triggered by affiliative needs balance the avoidance tendencies controlled by various fears. The behavioral equilibrium, which is expressed through a number of non-verbal interaction patterns, is the result of a comfortably perceived level of intimacy (Argyle and Dean, 1965). Based on these findings we propose a first hypothesis to investigate spatial dyadic interaction: people that are engaged in a collaborative spatial task perform significantly differently depending on the team strategy they choose. Such strategies are expressed in quantifiable spatial behavior.
3.2
The Effect of Apparent Reality
One feature of the organization of behavior is the attribution of a change to a perceptual unit (Heider, 1944). This theoretical proposition from behavioral psychology helps in designing studies that try to decompose and understand the mechanisms by which singular percepts affect human behavior. Nico Frijda describes in his law of apparent reality how the perceptual salience of a stimulus affects action tendencies that lead to the elicitation of emotions (Frijda, 1988). Frijda states in this law that the reality that affects behavior is the perceived stimulus property and not the property itself. An empirical example of this effect are studies that investigate methods to treat spider phobia with stimuli that differ in
Figure 3.1: The eXperience Induction Machine (XIM), a fully instrumented mixed reality space that can be accessed by multiple users either as physical visitors or as virtual representations. Virtual visitors are represented in the physical space of the XIM on the surrounding screens and as lit floor tiles. Physical visitors are represented as virtual characters in the virtual world.
realism (Bandura, 1977). Other studies show that symbolic information has a lower impact on people's psychological state than pictures of the same event (Fiske and Taylor, 1984). In social psychology this phenomenon is known as 'the vividness effect' (Borgida and Nisbett, 1977): a vividly perceived stimulus induces a stronger psychological and behavioral response than mere cognitive knowledge of the stimulus.
Although some studies have challenged the power of this phenomenon (Taylor and Thompson, 1982; Kisielius and Sternthal, 1986), it has been shown that the vividness of a stimulus affects memory building processes and judgments (Baddeley and Andrade, 2000; McCabe and Castel, 2008).
The perceptual salience of a stimulus is only one important factor that influences cognition and performance. Another interesting question is how the mere presence of a person influences the behavior of another. Darley and Latané (1970) showed that the perception of others reduces an individual's feeling of responsibility to act in an emergency situation. Bystander inaction is often explained by apathy and alienation, or by the diffusion of the observer's responsibility. This is an example of how the mere presence of others fundamentally affects the behavior of an individual. Another example of this phenomenon is the so-called audience effect: different studies have shown that people adapt their behavior and expressions depending on whether they perform an action alone or in the presence of an audience (Kraut and Johnston, 1979; Fridlund et al., 1991). Based on these findings we assume that the salience of the stimulus fundamentally affects human behavior.
Hence, following this line of concepts and evidence on the role of the perceived salience of stimuli in action, we propose a second hypothesis: the perceptual salience of another person affects social interaction, and this effect can be measured in its spatial dimension.
To test these two hypotheses, we constructed a cooperative ball game in a human-accessible mixed reality environment, the eXperience Induction Machine (XIM) (Bernardet et al., 2007, In press), in which two teams of two players each had to find the optimal spatial strategy to win the game. In previous work we showed that inter-personal distance regulation is a subtle code of social interaction that can be attributed to cooperative and competitive behavior (Inderbitzin et al., 2009). In this study we use mixed virtual reality as a tool to understand a psychological phenomenon known from the real world. Mixed virtual reality combines virtual reality with a physical space in which the real and the virtual world merge into an immersive experience that does not restrict users' natural physical actions. Such applications offer new possibilities to investigate fundamental psychological questions of human behavior and social interaction, because they provide experimental control without losing mundane realism, a trade-off well known from traditional psychological methods (Blascovich et al., 2002). By constructing an experimental setup that provides a collaborative space for virtual and physical humans, we are able to investigate spatial cooperation and the effect of stimulus salience on this behavior. When humans are present in virtual worlds they retain certain behavioral interaction patterns. Recent studies investigating gaze control and personal distance regulation in immersive virtual environments show that humans behave similarly to the way they do in real world situations (Bailenson et al., 2003, 2001). Repulsive reactions following the violation of the personal space in stereoscopic 3D views have also been documented (Wilcox et al., 2006). For our study we used the mixed virtual reality space eXperience Induction Machine (XIM) (Bernardet et al., 2007). XIM can be accessed by physical and virtual visitors, providing a collaborative space for the investigation of human behavior. This unique setup allows us to analyze the behavior of humans that are either physical or virtual without changing the context of the situation.
3.3
Methods
3.3.1
Materials
The study was conducted in the mixed virtual reality space eXperience Induction Machine (XIM) (Bernardet et al., 2007, In press). The physical space has a size of 5.5 by 5.5 meters and surrounds the visitor on all four sides with wide screen projection walls. The luminous floor is built from 72 pressure-sensitive hexagonal floor tiles (Delbruck et al., 2007). People in the space are tracked by the Multimodal Tracking System (MMT), which combines infrared tracking information with the tactile information from the floor (Mathews et al., 2007). The virtual world is produced by the game engine Torque (GarageGames, 2010). XIM can be experienced by multiple users of different modalities - physical or virtual - sharing a collective space of social interaction (See Figure 3.1). Users that enter XIM remotely over a network see a virtual representation of the physical space and of the users present in the space on a computer screen. Remote users are represented in the space as avatars on the surrounding screen and as lit tiles on the floor. Remote visitors control their avatars using a game pad and use a wireless communication headset (Logitech ClearChat) to talk to their physical team player (See Figure 3.2).
3.3.2
Research Design
We constructed a cooperative mixed reality ball game in which two teams of two players had to find the optimal spatial strategy to win. The ball was represented as a yellow floor tile in the space (See Figure 3.1). Players controlled a virtual representation of a paddle by changing their position in space. The aim of the game was to use the paddle to hit the ball and reflect it towards the opposing team's side. If the ball passed the back-most border of a team's playing field, a goal was scored. The game could be played either by physical action in XIM or by using a game pad to control an avatar visible on a computer screen (See figures 3.1 and 3.2). The independent variable in our study was the body representation
Figure 3.2: In the Mixed condition one remote player formed a team with one physical player. The remote player played the game using a computer and a game pad. Physical players inside the XIM were represented as avatars on the screen of the remote players. Verbal communication between the remote and the physical player was established over a wireless communication headset.
itself. The dependent variables were the performance and the spatial behavior of the players. By varying the players' representation between virtual and physical we constructed three game conditions: Physical, Mixed and Virtual (See figure 3.6). In the Physical condition all participants were inside the XIM and had to move physically to play the game. In the Mixed condition one physical player inside XIM formed a team with one virtual player using a computer to play. In the Virtual condition two virtual players formed a team. Only teams using the same modality played against each other: Physical teams vs. Physical teams, Mixed teams vs. Mixed teams, Virtual teams vs. Virtual teams.
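To make the game mechanics concrete, the following simplified sketch reduces the game to one dimension along the play axis: the ball advances each tick, is reflected when it comes within reach of a defending player's paddle, and a goal is scored when it crosses a back line. The field length matches the XIM floor, but the paddle reach and the 25 Hz tick are illustrative assumptions of ours, not the exact game parameters:

# Simplified 1-D sketch of the game logic. Team 1 defends the line at
# y = 0, team 2 the line at y = FIELD_LENGTH. Paddle reach is hypothetical.
FIELD_LENGTH = 5.5      # meters; the XIM floor is 5.5 x 5.5 m
PADDLE_REACH = 0.4      # assumed hit radius of a paddle

def step_ball(ball_y, ball_vy, team1_paddles, team2_paddles, dt=0.04):
    """Advance the ball one 25 Hz tick; return (y, vy, scoring_team)."""
    ball_y += ball_vy * dt
    # A paddle reflects the ball only while it moves towards its own line.
    if ball_vy < 0 and any(abs(ball_y - p) < PADDLE_REACH for p in team1_paddles):
        ball_vy = -ball_vy
    elif ball_vy > 0 and any(abs(ball_y - p) < PADDLE_REACH for p in team2_paddles):
        ball_vy = -ball_vy
    if ball_y <= 0.0:
        return ball_y, ball_vy, 2    # ball crossed team 1's back line
    if ball_y >= FIELD_LENGTH:
        return ball_y, ball_vy, 1    # ball crossed team 2's back line
    return ball_y, ball_vy, None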
3.3.3
Measures
The positions of the four players, the ball position, goal events and paddle-ball collisions were recorded at a sampling rate of 25 samples per second (See figure 3.3). We calculated three different aspects of the participants' spatial behavior: the inter-personal distance regulation between team players, the players' activity and their position in space. We quantified the inter-personal distance and the time that team members shared either the intimate or the personal space. To understand the spatial tactics on both a global and a local level, we measured the inter-personal distance regulation for entire games, for all winning and losing epochs and for all offensive and defensive game situations. An epoch was defined as the time window lasting from the ball play-out until a goal was scored. An offensive game situation was defined as the time period in which the ball was moving away from a team, a defensive game situation as the time period in which the ball was moving towards a team. To investigate the overall activity, we calculated the mean distance that players moved in space. To analyze the position of players we calculated the mean distance to the team field mid-line, defined as a line parallel to the side line separating the team field into two equal parts. The time that a player spent in the field side of his/her team partner was used as an additional measure of the players' spatial distribution.
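The following sketch shows how these measures can be computed from the 25 Hz position logs, using the Hall radii introduced in section 3.1. The array names, shapes and the numpy-based implementation are our assumptions for illustration; they do not reproduce the original analysis code:

import numpy as np

DT = 1.0 / 25.0          # sampling interval [s]
INTIMATE_SPACE = 0.46    # Hall's intimate-space radius [m]
PERSONAL_SPACE = 1.2     # Hall's personal-space radius [m]

def interpersonal_distance(p1, p2):
    """Frame-by-frame distance between two players; p1, p2: (T, 2) arrays."""
    return np.linalg.norm(p1 - p2, axis=1)

def shared_space_time(p1, p2, radius=PERSONAL_SPACE):
    """Seconds two team members spent within the given radius of each other."""
    return np.sum(interpersonal_distance(p1, p2) < radius) * DT

def travelled_distance(p):
    """Total path length of one player, used as the activity measure."""
    return np.sum(np.linalg.norm(np.diff(p, axis=0), axis=1))

def offensive_mask(ball_y, own_back_line_at_zero=True):
    """True per frame while the ball moves away from the team's back line."""
    dy = np.diff(ball_y)
    return dy > 0 if own_back_line_at_zero else dy < 0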
3.3.4
Procedure
The team assignment and the order of the game modalities were randomized. Every team played the game in all three conditions (Physical, Mixed and Virtual). One game lasted three minutes. An experimenter explained the game to the participants inside XIM and answered questions to make sure that all players understood the rules. A rehearsal trial of about one minute was played in every modality so that participants could familiarize themselves with the setup. The experimenter informed all participants that data was recorded during the game and that they could leave the space at any time if they did not feel comfortable.
3.3.5
Participants
Fifty-two healthy adults aged 18 to 30 years (M = 23.6, SD = 3.9; 33 % women, 67 % men) were recruited from different universities of Barcelona through an advertisement. All participants had at least finished undergraduate education and were permanent Spanish residents. Participants took part in the study voluntarily, without financial reward, and gave consent for the experimental data to be used for scientific investigation.
3.4
Results
We recorded 13 games for each of the three conditions, yielding a total
sample size of 39 games. Three ties were observed, two in the Mixed
condition and one in the Physical condition. Winning teams scored a
mean of 10.5 goals (SD = 3.9), losing teams 5.3 goals (SD = 1.9). In tie
games 6.3 goals (SD = 0.5) were scored. Overall 609 goals were observed,
191 in the Physical condition, 213 in the Mixed condition and 205 in the
Virtual condition.
Figure 3.3: Spatial distribution of an example epoch. The ball play-out (red dot) starts in the middle of the field. At the beginning of the epoch the team players were positioned on their team's side (blue and green dots). The trajectories of the players show their spatial behavior over time. The play direction was vertical. Team 2 scored a goal when the ball reached the back line of team 1.
Table 3.1: Proxemics behavior of winners and losers: mean time of shared interaction space; standard deviation in brackets. IS = intimate space; PS = personal space; Sig = significance (a p < 0.1, * p < 0.05, ** p < 0.01).

                      Winning Team   Losing Team    Sig
Game
  Shared IS [sec]     0.97 (2.5)     0.57 (2.0)
  Shared PS [sec]     15.18 (19.9)   11.15 (14.5)   a
Epoch
  Shared IS [sec]     0.06 (0.1)     0.04 (0.1)
  Shared PS [sec]     0.91 (1.0)     0.70 (1.3)     *
Offensive Situation
  Shared IS [sec]     0.02 (0.1)     0.01 (0.1)
  Shared PS [sec]     0.63 (0.7)     0.35 (0.6)     **
Defensive Situation
  Shared IS [sec]     0.01 (0.1)     0.01 (0.0)
  Shared PS [sec]     0.25 (0.4)     0.34 (0.5)
3.4.1
Spatial Scale of Collaborative Behavior
We used two measurements to quantify the spatial scale of the interaction between team members: the mean distance between team players and the time they shared the intimate and personal interaction space. We evaluated the difference in the time that team members shared personal space between epoch-winning and epoch-losing teams (See figure 3.4). Winners shared their personal space with each other longer than losers: χ2 (1, N = 155) = 5.15, p < 0.05 (See table 3.1). Winners also shared their personal space significantly longer with their team mates during offensive moves: χ2 (1, N = 143) = 8.3, p < 0.01. As we will see, this does not mean that epoch winners in general displayed a shorter interpersonal distance. Additionally, winners chose a more defensive distribution than losers: Wilcoxon z = 98.4, p < 0.01. The mean of the ranks for the winners' distance to the back line was 0.55 meters, while the mean of the ranks for the losers' distance to the back line was 0.69 meters.
The analysis of the dyadic regulation of the inter-personal distance during offensive and defensive game situations revealed significantly different patterns between epoch winners and epoch losers. Winners chose a significantly bigger inter-personal distance during offensive game situations: χ2 (1, N = 143) = 35.3, p < 0.01, and a closer inter-personal distance during defensive situations: χ2 (1, N = 143) = 37.2, p < 0.01 (See table 3.2). This means that epoch winners and epoch losers chose opposite dyadic inter-personal distance regulations. Winners stood closer together during defensive game situations, but were more widely distributed in space during offensive moves, compared to losing teams. So winners and losers regulated their interpersonal space in an inversely oscillating manner, while winners tolerated the presence of their team members in their personal space for significantly longer.
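As an aside on how such group comparisons can be computed, the sketch below contrasts winners' and losers' per-epoch shared-space durations with a nonparametric two-sample test from scipy. The data values are invented for illustration, and the choice of the Mann-Whitney U test is ours; it does not necessarily reproduce the original analysis pipeline behind the reported χ2 and Wilcoxon statistics:

from scipy import stats

# Hypothetical per-epoch durations [s] of shared personal space.
winners_ps = [0.9, 1.2, 0.7, 1.5, 0.8, 1.1]
losers_ps = [0.4, 0.6, 0.9, 0.3, 0.5, 0.7]

# Nonparametric comparison of the two groups (illustrative test choice).
u_stat, p_value = stats.mannwhitneyu(winners_ps, losers_ps,
                                     alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.3f}")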
Figure 3.4: Distribution of epoch winners (right panel) and epoch losers (left panel) for all goal events. The graph shows only one side of the game field; the play direction is from top down and vice versa. The colorbar indicates the accumulated positions of players over time. Winners chose more static and defensive positions compared to losers.
Table 3.2: Spatial intra-team interactions for winners and losers during the entire game, winning and losing epochs, and offensive and defensive game situations: mean intra-team member distance; standard deviation in brackets. ITMD = Intra-Team Member Distance; Sig = significance level (** p < 0.01).

           ITMD Winners [m]   ITMD Losers [m]   Sig
Game       2.23 (0.4)         2.33 (0.4)
Epoch      2.34 (0.4)         2.33 (0.4)
Offense    2.29 (0.4)         1.71 (0.6)        **
Defense    1.76 (0.6)         2.32 (0.4)        **

3.4.2
Effect of the Players' Representation on Spatial Interaction
To investigate the effects of the players' representation - physical or virtual - we analyzed the spatial behavior under the different conditions. We found significant differences in the time spent in the team member's intimate space across the conditions: χ2 (2, N = 78) = 15.76, p < 0.01 (See figure 3.5). Post hoc analysis using the Bonferroni criterion indicated that Mixed teams shared their intimate space longer than Physical teams. Additionally we observed differences in the duration for which personal space was shared across conditions: χ2 (2, N = 78) = 9.86, p < 0.01. Significant differences between the Physical and Mixed conditions and between the Physical and Virtual conditions were revealed by a Bonferroni post hoc analysis.
This is an interesting result, but we have to be careful in interpreting the underlying mechanisms responsible for this change in behavior. Because we are changing the modality of playing the game (physical action vs. game pad) we do not know how much this difference in playing affected the behavioral adaptation. To exclude such side effects we investigated the effect of the representation on spatial interaction using a method that rules them out: we compared the behavior of
Figure 3.5: Time that team members shared the intimate and the personal space across the three conditions (Physical, Mixed and Virtual).
XIM players playing in Mixed teams with XIM players playing in Physical teams (See figure 3.7). All players playing the game inside XIM share the same representation and thereby the same game modality, but they differ in the interaction setup: XIM players in Physical teams interact with another XIM player, while XIM players in Mixed teams interact with avatars controlled by the remote players (See figure 3.7). Analogously, we compared behavioral differences between remote players that participated in either Mixed or Virtual teams. All remote players shared the same modality for playing the game and perceiving their team members.
A two-sided t-test revealed that XIM players in Mixed teams sprinted more than XIM players in Physical teams: t(76) = 2.03, p < 0.05. No differences were observed between remote players of Mixed teams and remote players of Virtual teams (See table 3.3). To investigate the distribution of players in the space we analyzed the mean distance of subjects to the mid-line of the team field. The mid-line is defined as a line parallel to the side line separating the field into two equal parts. A two-sided t-test revealed that XIM players in Mixed teams chose a smaller distance to the mid-line than XIM players in Physical teams: t(76) = 2.61, p < 0.01. No differences between remote players of Mixed teams and remote players of Virtual teams were observed (See table 3.3). Additionally we calculated the time that players spent beyond the mid-line, in the field side of their team mate. XIM players of Mixed teams spent significantly more time in this field side than XIM players of Physical teams: χ2 (2, N = 76) = 5.96, p < 0.01. No differences between remote players of Mixed teams and remote players of Virtual teams were observed (See table 3.3).
3.5
Discussion
Based on previous findings we hypothesized that the spatial behavior of multiple people engaged in a cooperative task codes social interaction that can be quantified. Additionally we hypothesized that these interaction patterns are affected by the salience with which another person is perceived. The
Figure 3.6: Schematic representation of the three conditions. Only teams of the same condition played against each other. Left panel: two Physical teams compete against each other; all four players are physically present inside XIM. Middle panel: in the Mixed condition one player of each team is present inside XIM and the other player is virtually represented; virtual players use a computer to play the game. Right panel: in the Virtual condition all four players use a computer to play and are virtually represented inside XIM.
Table 3.3: Spatial behavior of XIM players and remote players: mean sprinted distance, mean distance to the mid-line of the team side and mean time spent in the field side of the team member (time behind mid-line); standard deviation in brackets. Sig = significance level (* p < 0.05, ** p < 0.01).

                              Physical      Mixed         Sig
Sprinted Distance [m]         67.6 (26.1)   80.3 (25.3)   *
Distance to mid-line [m]      1.17 (0.3)    0.90 (0.6)    **
Time behind mid-line [sec]    2.35 (3.6)    3.89 (17.8)   **
Figure 3.7: Schematic representation of the detailed analysis of players' behavior in the different conditions. We compared the behavior of XIM players in the Physical condition with the behavior of XIM players in the Mixed condition (A), and the behavior of remote players in the Mixed condition with the behavior of remote players in the Virtual condition (B).
results of our study provide support for both proposed hypotheses.
The task participants had to complete favors teams that optimally coordinate their distribution in space (Inderbitzin et al., 2009). This means
that the two team players had to find a spatial strategy that led to success.
Our data shows that the spatial strategy of this social interaction can be
identified and quantified.
Winners in general chose a more defensive strategy, selecting positions closer to the back line of the space. This simple but fundamental difference in behavior seems to have given the winners more time to react to the attacks of the opposing team. Another difference is that winners shared their personal space with their team players for significantly longer, in particular during offensive moves. Losers also entered the personal space of their team players, but not for as long as winners. Winners and losers also differed in their spatial dyadic interaction: successful teams stood more compactly during defensive game situations and were more dispersed during attacking moves, while losing teams showed the inverse movement pattern.
These two results seem contradictory. A closer look at the data revealed a very specific movement pattern for winning teams: during defense they chose a compact disposition in space without entering the personal space of their team members; during offense they increased their interpersonal distance, but at particular moments entered the personal space of the other player. This detailed analysis reveals a complex spatial interaction that is not visible at first glance. One interpretation of this behavior is that winners played more individualistically, increasing the interpersonal distance to their team member during offense. Such an excited play mode led to situations in which both players tried to hit the ball and thereby entered each other's personal space. We can summarize that winners chose a more efficient spatial strategy during the rally. This adaptation of the inter-personal space had a crucial effect on the success of the team.
It has been shown that the regulation of the personal space depends on the familiarity of the interactors (Hall, 1963, 1966). Humans share their direct surroundings with familiar and intimate friends, while they prefer to interact with strangers at a wider distance. The invasion of an unknown person into the personal space can be perceived as a threat and therefore induces discomfort (Hayduk, 1978). Recently it has been shown that neuronal clusters that respond to fearful situations also show increased activity during a violation of the direct personal space (Kennedy et al., 2009).
Humans compensate for the invading behavior of a person with a counterbalancing action (Patterson, 1973). The behavioral equilibrium of non-verbal interaction patterns results in a comfortable level of intimacy (Argyle and Dean, 1965).
From psychological studies we know that people with a high social status claim more direct space and position themselves closer to other people than people with a lower social status (McKenzie and Strongman, 1981; Leffler et al., 1982). Interestingly, people with reduced self-esteem increase their inter-personal distance to others (Roger, 1982). This means that the regulation of the personal space is an indirect indicator of the social relationship we establish with others. It could be that such underlying psychological mechanisms were affecting the behavior of players in our game. This would mean that winners felt less discomfort sharing the personal space with their team members and therefore regulated their dyads more naturally. The spatial game we used in our experiment by definition favors a homogeneous distribution of players in the space. This implies that an asymmetric spatial distribution of players negatively affects the performance of a team. So perhaps teams that perceived social discomfort chose an inhomogeneous spatial distribution and therefore suffered from a reduced success rate. This is an interesting interpretation that relates the players' perception of 'the self' to the observed behavioral performance of the team. So far we base these interpretations on the theory of proxemics, which shows that social discomfort increases spatial distance. To confirm the factors responsible for this change in behavior we would need reports of the players' feelings during the game. So far we have quantified the spatial patterns coding social behavior.
Our second hypothesis addressed the question of how the salience of the stimulus, in our case the perception of another person, influences social interaction on a spatial scale. The results show that players significantly adapted their spatial distance to another person depending on whether this person was physically present or virtually represented. Members of Mixed teams shared their intimate and personal space with each other longer than members of Virtual and Physical teams. This finding raises an interesting question: are these behavioral differences induced by the varying modality of playing the game (game pad vs. physical action) or are they induced by a change of the stimulus salience (virtual team partner vs. physical team partner)?
To investigate this question we compared the behavior of XIM players in Mixed teams with the behavior of XIM players in Physical teams, and analogously the behavior of remote players in Mixed teams with the behavior of remote players in Virtual teams (see figure 3.7). All XIM players and all remote players share the same game modality, meaning they either use a game pad or perform physical actions to control their body. The only difference between the two groups is the representation or salience of their team partner (virtual vs. physical). Our results show that XIM players of Mixed teams playing with a virtual avatar as team partner were much more active and more centrally positioned compared to XIM players in Physical teams. XIM players of Mixed teams also spent significantly more time on the field side of their virtually represented team partner compared to XIM players of Physical teams playing with a physical team partner. The change of the salience of the stimulus, in our case the representation of the team partner as either a virtual character or a real player, had a fundamental effect on the behavior of players.
In social psychology this phenomenon has been described as the 'vividness effect' (Fiske and Taylor, 1984; Baddeley and Andrade, 2000). It states that symbolic knowledge has a weaker impact than pictures and events. Frijda (1988) describes a similar effect in his law of apparent reality: "Emotions are elicited by events appraised as real, and their intensity corresponds to the degree to which this is the case" (p. 352). So the mental state of how strongly we perceive the world as real affects how we react to it. This concept is also supported by physiological studies showing that the subjective perception of realism can influence the physiological responses to a fearful stimulus (Bridger and Mandel, 1964). In our study we reduced the apparent realism, or the salience of the stimulus, by varying the representation of players. The perception of physical and virtual players differs in the amount of accessible information.
A physical player not only marks his position in space, but also expresses body gestures that are important non-verbal cues for understanding other people's intentions (Birdwhistell, 1975; Sommer, 1969). It could be that physical players joining a team with a virtual player behaved more 'egoistically' because they could not understand their partner's intentions. This interpretation points out the importance of gestures for understanding the immediate actions of other people.
Another interpretation of the 'egoistic' behavior of physical players in Mixed teams is that the physical absence of their team partner induced an increased feeling of responsibility to act, a phenomenon that has been observed in the so-called bystander effect (Darley and Latane, 1970). It states that the presence of others in an emergency situation reduces impulses to act. This would mean that physical players in Mixed teams felt alone in the physical space and therefore more responsible to run for the ball.
We conclude that the reduction of the stimulus salience induces a significant change in social behavior on a spatial scale. To what extent the lack of gestural information or the lack of physical presence is responsible for this change is difficult to say.
The behavioral differences between Physical and Virtual teams are also very interesting. Our observations show that people interacting with other people in a virtual world significantly relax their proxemic regulation. In particular, virtual players entered the intimate and personal space of their team partners more often than physical players did. The perception of a real physical person does not induce the same behavioral response as the perception of a virtual character on a computer screen. It seems that the regulation of interpersonal space is fundamentally affected by how we perceive others, rather than being the consequence of a cognitively applied concept. But we have to be careful with claiming that the change in perception is the only factor affecting our results, because we do not know whether the game modality of controlling the body additionally influenced the observed behavior. Obviously, moving a virtual body using a game pad is not the same as performing physical actions in a space. So any behavioral difference between the Physical and Virtual conditions is probably influenced by both the salience of the stimulus and the change of game modality.
3.6 Conclusion
Understanding the regulation of non-verbal spatial behavior is a complex problem. With our study we could show which dimensions of spatial behavior can be related to cooperative interaction patterns and how we can quantify them. Winners in general chose a closer interpersonal distance and a more successful dyadic interaction. It could be that this pattern was positively or negatively affected by the level of perceived social status of individual players inside the teams. The salience of perceiving another person as either physical or virtual also influenced this spatial behavior. So far we know of three factors responsible for the regulation of spatial interaction
(Baldassare, 1978): genetically pre-programmed behavioral patterns, the environment, and culture. Based on our results we propose that the variation in stimulus salience acts as a gating mechanism for these factors. This concept is not new and has already been described by others (Frijda, 1988; Borgida and Nisbett, 1977). With our study we provide empirical data supporting the idea that perceived apparent realism works as a psychological mechanism that modulates behavioral responses. One strategy for further investigation is to construct a set of different stimulus properties that induce different behavioral responses and thereby help us to understand the underlying psychology. Mixed virtual reality environments are therefore powerful tools for constructing experimental designs addressing this question (Blascovich et al., 2002).
Future studies investigating the effect of perceived salience have to be capable of gradually reducing the vividness of the stimulus, which could be done by using 3D holographic representations of real humans and objects. Recently such technologies have been used successfully in the entertainment industry (McQueen, 2006). More realistic approaches to controlling a virtual avatar by physical actions will help to reduce the behavioral discrepancy between the real and virtual world. This could be done by using interfaces that do not restrict physical actions, such as the multidirectional endless treadmill CyberWalk (De Luca et al., 2009).
Chapter 4
PERCEPTION OF EMOTIONS
The expression of emotions is fundamental for social interaction. Humans communicate their emotions to others using spoken language and a variety of non-verbal behaviors. Often these emotional expressions communicate the internal state of an individual to the group (Scherer and Ekman, 1984; Ekman, 1993; Izard, 1994). But not always. It has been shown that certain expressive behaviors, like smiling for example, are more often used to foster social relationships and hierarchical structures than to express a true affective state (Fridlund et al., 1991; Kraut and Johnston, 1979).

The ability to perceive and understand emotional cues is fundamental for social interaction, because it allows an individual to derive the intentions of others (Baron-Cohen, 1997b; Blakemore and Decety, 2001). The importance of being able to perceive affective behavior can be seen in patients suffering from Asperger's syndrome (Baron-Cohen et al., 1997a). These patients show fundamental deficits in social interactions, because they lack the ability to 'read' the affective state of others.
In this chapter we investigate the perception of verbal and non-verbal features of emotions. We want to know which behavioral parameters code an emotional state and how the brain perceives and integrates these parameters into a global impression. To do so we use different artificial agents that allow us to construct realistic and controllable stimulus spaces. The behavioral observations will be tested against different models of perception. The insights from these studies add to the understanding of how humans perceive emotions and which behavioral parameters are crucial for the communication of these emotions.
4.1 Emotion Perception in Locomotion
A major challenge for understanding the meaning of expressive behavior is to find a schematic classification. This is not a trivial task; it has motivated researchers from sociology, behavioral psychology, theatre, and dance studies for decades. While a great deal of attention has been focused on the understanding and classification of emotional facial expressions (Ekman and Friesen, 1978; Scherer and Ekman, 1984; Ekman, 1993; Izard, 1994), relatively little systematic research has been carried out in the field of emotional body language. One approach to describing body movements is the Laban Movement Analysis (LMA), which divides expressive biological motion into four different dimensions: Body, Effort, Shape and Space (Pforsich, 1977). Using its own symbolic notation, this analysis method is capable of describing body movements in detail. The LMA is a powerful tool that can be used for the production and especially the reproduction of human behavior in acting and dance. Despite these capabilities, the LMA lacks a clear linkage between expressive behavior and somatic and cognitive states. Another theoretical concept is Birdwhistell's theory of kinesics, which understands the language of the human body as a "structured dynamic process of communication" (Birdwhistell, 1975). According to this theory all movements of the body have a meaning, and these movements have a grammar that is based on kinemes, interchangeable units of movement. Unfortunately, the results of his extensive studies are not systematically ordered and thereby difficult to quantify (Jolly, 2000). A simpler classification system was proposed by Mehrabian, focusing on the orientation of the head in relation to the body and the angles of bodies interacting with each other (Mehrabian, 1972). What all classification systems have in common is a difficulty in finding a direct relationship between the affective state and concrete corporal configuration and body movement. In contrast to facial expressions, where we can observe coherent relationships between basic emotions and expressive behavior (Scherer and Ekman, 1984), the interpretation of expressive body behavior is more sensitive to contextual and social influences (Kret and de Gelder, 2010). Nevertheless, there is some empirical evidence that the movement and the form of the human body communicate emotions (Camurri et al., 2003; Blake and Shiffrar, 2007), also at a distance where facial expression is not detectable (Walters and Walk, 1986). A promising approach is to analyze the emotion attributed to predefined body postures or movements and correlate it with the parameters defining the body configuration. Studies following this idea differ methodologically by exposing viewers either to real actors playing (Camurri et al., 2003), video scenes of actors playing (Wallbott, 1998), computer animations of virtual humans (Coulson, 2004), or point-light animations that conceptualize human body movements (Clarke et al., 2005; Pollick et al., 2001). The results of these and similar studies show that affective states can be identified by observing static postures (Kleinsmith and Bianchi-Berthouze, 2007; Coulson, 2004; Kleinsmith et al., 2006; De Silva and Bianchi-Berthouze, 2004) or moving behavior (Kamisato et al., 2004; Camurri et al., 2003; Wallbott, 1998).
The exact contribution of form and movement to the perception of emotional states is the topic of an extended discussion in the field. A recent study by Roether et al. (2009) states that the understanding of affective body language is an integrative process of the perception of both dimensions, form and movement. Roether identifies the limb flexion velocity as an important feature for the perception of fear and anger, while the upper body posture, especially the head inclination, communicates sadness. These results are in line with a study by Thurman et al. (2010) that investigates the perception of different critical features for biological motion. Exaggerated body movement facilitates the recognition of affective states, especially their intensity (Atkinson et al., 2004). The contribution of the form dimension to the identification of emotional states was made visible by a study using inverted and reversely played sequences of a moving person (Atkinson et al., 2007). The results of these studies can be interpreted as showing that form plays a crucial role in affect identification, while kinematics helps to resolve conflicts and to identify the intensity of the emotion. This finding is in line with perceptual studies investigating the neurobiological mechanism of motion perception (Giese and Poggio, 2003).
The emotional classifications used to describe affective behavior differ in complexity. The basic emotion approach claims that there exists a finite set of distinguishable emotions that can be attributed to expressive behavior (Izard, 1977; Ekman, 1992). The dimensional approach to emotions describes affective states using a two-dimensional classification system known as the circumplex model (Russell, 1980; Plutchik, 1980). This theory provides a circular classification space of basic emotions using valence and arousal to describe the quality and intensity of different emotional states. Both systems are used to describe expressive body movements (Coulson, 2004; Wallbott, 1998).
4.1.1 Methods
Based on the results of previous studies (Coulson, 2004; Wallbott, 1998), we constructed different animations of expressive locomotion by varying three parameters of the movement: the head/torso inclination, including the erection of the shoulders, the speed of the movement, and the viewing angle.

We selected 18 participants from the University Pompeu Fabra for our study. All participants were either master students, PhD students, or professionals working in academia, and all were permanently living in Spain. The mean age of the participants was 28.4 years (SD = 4.3; 70% men, 30% women). The animations were modeled using Autodesk 3ds Max (Autodesk Inc., San Francisco, CA, USA, 2007) and transferred to the Torque Game Engine (GarageGames, 2010). As stimuli we exported 10 sequences of 10 seconds each. For the stimulus exposure and the rating of the sequences we used a 15 inch IBM ThinkPad laptop running the E-Prime 1 experiment presentation software (Psychology Software Tools, Inc., Sharpsburg, PA, USA, 2007). The self-assessment manikin
rating scale (Bradley and Lang, 1994) was used for the evaluation of the
sequences.
Figure 4.1: Still images of stimuli in frontal view (A-C) and side view (D-F). Head/torso inclination varied between 55 degrees down (A, D), zero degrees (B, E), and 15 degrees up (C, F).
We constructed 12 different animations of a person walking by varying the parameters of the head/torso inclination, the speed of the movement, and the viewing angle (Figure 4.1). We defined the head/torso inclination of the neutral body posture as inclination angle zero and used this as a reference for the other animations. The inclination of the head/torso varied between -55 and +15 degrees; by convention, a negative inclination indicates a ventral direction and a positive inclination a dorsal one. Half of the animations showed the walking body in profile view (90 degree viewing angle), the other half in a 45 degree rotated frontal view (Table 4.1). The animated avatar was a woman wearing a dark, red-blackish suit and dark shoes. To avoid any contextual influence we used a neutral gray color as background (Kret and de Gelder, 2010). The face of the character was blurred to avoid any influence of the facial expression (Van den Stock et al., 2007; Meeren et al., 2005).
Table 4.1: Specification of the stimuli parameters

Viewing Angle [degrees]   Inclination [degrees]   Speed [steps/sec]
45                        Neutral [0]             Medium [0.75]
90                        Neutral [0]             Medium [0.75]
45                        Up [+15]                Medium [0.75]
90                        Up [+15]                Medium [0.75]
45                        Down [-55]              Medium [0.75]
90                        Down [-55]              Medium [0.75]
45                        Neutral [0]             Slow [0.5]
90                        Neutral [0]             Slow [0.5]
45                        Neutral [0]             Fast [1.4]
90                        Neutral [0]             Fast [1.4]
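Note that the stimulus set in Table 4.1 is the union of two sub-designs rather than a full factorial: the three inclinations are crossed with the medium speed, and the two additional speeds are crossed with the neutral inclination, with every combination rendered from both viewing angles. The following is a minimal sketch of how such a grid can be enumerated; the variable names are illustrative and not taken from our animation pipeline:

    # Sketch: enumerate the stimulus grid of Table 4.1 (names are illustrative).
    from itertools import product

    viewing_angles = [45, 90]                 # degrees
    inclinations = ["neutral", "up", "down"]  # 0, +15, -55 degrees
    extra_speeds = ["slow", "fast"]           # 0.5 and 1.4 steps/sec

    stimuli = []
    # Vary the inclination at medium speed (0.75 steps/sec) ...
    for angle, incl in product(viewing_angles, inclinations):
        stimuli.append((angle, incl, "medium"))
    # ... and vary the speed at neutral inclination (medium is covered above).
    for angle, speed in product(viewing_angles, extra_speeds):
        stimuli.append((angle, "neutral", speed))

    assert len(stimuli) == 10  # the ten rows of Table 4.1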
Participants sat alone at a table in front of the laptop used for the stimulus presentation and were asked to rate the valence and arousal state of a walking person. Each sequence was played for 10 seconds, followed by a black screen. After 2 seconds the valence and arousal rating scale appeared and remained until a rating was given. The pause before the next stimulus sequence was 4 seconds. The order of the sequences was randomized. After the experiment, participants were asked by the experimenter whether they had had any problems following the experiment. Participants were not informed about the specific objective of the study.
4.1.2 Results
The data was analyzed using the SPSS software package. The valence and arousal ratings were submitted to two multivariate analyses of variance (MANOVAs) where Wilks' lambda was used as the multivariate criterion. The factors of the first MANOVA were 2 (viewing angle) x 3 (head inclination), and the factors of the second MANOVA were 2 (viewing angle) x 3 (movement speed). All data satisfied the normality criterion as verified
using the Kolmogorov-Smirnov test.
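As an illustration only, such a MANOVA could be reproduced with a standard statistics package; the sketch below assumes a long-format data frame with hypothetical column names, not our actual SPSS analysis files:

    # Sketch: 2 (viewing angle) x 3 (head inclination) MANOVA on the valence
    # and arousal ratings; column names of the data frame are hypothetical.
    import pandas as pd
    from statsmodels.multivariate.manova import MANOVA

    ratings = pd.read_csv("ratings.csv")  # columns: valence, arousal, angle, inclination
    maov = MANOVA.from_formula("valence + arousal ~ angle * inclination",
                               data=ratings)
    print(maov.mv_test())  # reports Wilks' lambda among other criteria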
Effects of head/torso inclination
The analysis showed that the head/torso inclination factor had a significant effect on the ratings (F(4, 13) = 23.5, p < 0.001, Λ = 0.1). This effect was pronounced both for arousal, F(2, 29) = 49.9, p < 0.001, and for valence, F(1, 24) = 45.2, p < 0.001. The post-hoc Bonferroni comparisons for the arousal ratings showed that the head/torso down condition (M = 2.5, SD = 0.3) was rated significantly lower (p < 0.001) than the neutral head/torso condition (M = 5.4, SD = 0.3) and the head/torso up condition (M = 6, SD = 0.3). The same comparisons for the valence ratings showed significant differences between all three conditions. The head/torso up condition was perceived as most pleasant, followed by the neutral head/torso position and head/torso down. The means were M = 6.7, SD = 0.2; M = 5.9, SD = 0.3; and M = 2.8, SD = 0.5, respectively. No effect of the viewing angle, or interaction between the angle and the head/torso position, reached significance.
Figure 4.2: Valence and arousal ratings for head/torso inclination. Error bars indicate standard error. Valence rating 0 indicates a very sad emotional state, rating 10 a very happy state. Arousal rating 0 indicates low arousal, arousal rating 10 high arousal.
Effects of walking speed
The movement speed factor reached significance at F(4, 13) = 41.1, p < 0.001, Λ = 0.07. This effect was caused only by the arousal ratings, F(2, 27) = 58.6, p < 0.001. The post-hoc Bonferroni comparisons for the arousal ratings showed that fast motion (M = 8.1, SD = 0.2) was significantly different from the normal speed (M = 5.4, SD = 0.3) and from the slow speed condition (M = 4.2, SD = 0.3). No effect of viewing angle, or interaction between the angle and the movement speed, reached significance.
Figure 4.3: Valence and arousal ratings for different speed parameters. Error bars indicate standard error. Valence rating 0 indicates a very sad emotional state, rating 10 a very happy state. Arousal rating 0 indicates a low arousal state, arousal rating 10 a high arousal state.
When locating the animations in the circumplex model of valence and arousal (Figure 4.4), we see that a wide area is covered, indicating the power of the head/torso inclination and speed parameters to express a range of emotional states. The coordinates that are not yet sufficiently covered are the combinations of high valence/low arousal and low valence/high arousal.

Figure 4.4: Distribution of the animations in the circumplex space. The legend indicates the stimulus parameter space of the animations as <speed>.<viewing angle>.<head/torso inclination>. The speed parameter is defined as Fast = 1.4 m/sec, Medium = 0.75 m/sec and Slow = 0.5 m/sec. The viewing angle varies between profile view = 90 degrees and rotated frontal view = 45 degrees. The head/torso inclination varies between Neutral = 0 degrees, Up = +15 degrees and Down = -55 degrees.
4.1.3 Discussion & Conclusion
Our results show that participants assigned distinct emotional states to animations of a walking person that differed only in the erection of the posture and the walking speed. An upright head/torso position was significantly related to a positive emotional state or high valence, a lower position
with a more unpleasant emotional state. Even small changes in head/torso position of 15 degrees induced a significantly different perception of the emotional quality. This is indicative of the high sensitivity of humans in relating subtle differences in body language to internal states. Besides the valence, the arousal rating was also significantly affected by the body posture: animations with negative head/torso inclinations were perceived as less aroused compared to body postures with more upright head/torso positions. These results are in line with studies showing that especially the static configuration of the upper body codes important features responsible for the perception of emotional states (Roether et al., 2009; Atkinson et al., 2007). While the valence ratings differed between all three head/torso conditions, in the arousal ratings we only observed a significant difference for the most extreme negative head/torso inclination. This finding suggests that only extreme down positions of the head clearly code low values of arousal, which is in line with other studies that found that depressive states were characterized by non-erected postures (Roether et al., 2009). The different walking speeds had a clear effect on the perception of arousal: higher speed yielded higher arousal ratings compared to slower movements. This means that the velocity of the body movements does provide information about the magnitude of a person's emotional state. This finding is in line with recent studies showing that the velocity of body movements codes the intensity of a perceived emotional state (Atkinson et al., 2004; Roether et al., 2009). In contrast, the speed had no effect on the valence rating of the perceived emotion. If we are searching for canonical parameters that control the expression of emotions in animations, we aim at finding parameters that are independent of the angle from which they are seen. Indeed, our results show that the emotional quality of the animations generated from the chosen set of parameters is independent of the viewing angle.
The identification and empirical evaluation of canonical parameters that control the expression of emotions in locomotive behavior is the main contribution of this study. Our results are consistent with previous work showing that upright upper body postures are perceived as emotionally more positive and forward leaning postures as more negative (Coulson, 2004; Roether et al., 2009), and with studies that found associations between "dropped head" positions and sadness (Wallbott, 1998). The perception of the arousal state can be related to variations in the velocity of the movement, which is in line with findings from Ekman and Friesen (1974). The results of our study confirm previous results stating that the intensity of a perceived emotion is directly linked to the velocity of the identified body gesture (Atkinson et al., 2004). Our study therefore supports the hypothesis proposed by others that the static configuration of the body parts, especially the upper back, shoulders and head inclination, codes the valence value (Roether et al., 2009; Atkinson et al., 2007), while the kinematic dimension codes the intensity of the emotion (Atkinson et al., 2004).
Even though context (Wallbott, 1998; Aviezer et al., 2008; Kret and de Gelder, 2010) and facial expressions (Van den Stock et al., 2007; Meeren et al., 2005) play an important role in giving meaning to bodily expression, our results show that people recognize distinguishable emotional states of a moving person independently of those two factors. Hence, we show that the characteristics of locomotion by themselves can convey emotional states.
These findings are important as they allow us to build virtual characters whose emotional expression is recognizable at distances larger than those at which facial expression can be decoded. Additionally, the moving characters can keep their emotional state over an extended period of time. This is important since observing an isolated emotive face over a long time can be perceived as non-natural behavior. The understanding of both of these aspects is of relevance for the construction of avatars that interact with users in virtual worlds or in environments such as CAVEs (Cruz-Neira et al., 1992) and mixed-reality spaces such as the eXperience Induction Machine (Bernardet et al., 2008). Future work will include the investigation of additional parameters that allow us to cover the entire circumplex space. Additionally, we plan to apply our findings to the control of the emotional expression of real-world robotic platforms such as the humanoid robot iCub (Sandini et al., 2007).
4.2 Emotion Perception in the Talking Face
The perception of emotions is not a trivial task, because humans use multiple modalities in parallel to transmit their affect to their environment. While talking, for example, humans communicate their emotions with the meaning of the words (Johnson-Laird and Oatley, 1989; Ortony et al., 1987), the prosody (Buchanan et al., 2000), and abstract vocalizations (Sauter et al., 2010). Postures, gestures, touch, facial expressions, eye gaze and the regulation of interpersonal distance are additional communicative cues that support or weaken the verbal dimension (Argyle, 1988; Pentland, 2008). The result is a complex multimodal information stream consisting of verbal and non-verbal cues. Different theories of how the brain processes this information stream have been proposed (Farah et al., 1995; Etcoff and Magee, 1992; Tanaka and Farah, 1993).
4.2.1 The Fuzzy Logical Model of Perception
We base our research on the idea that perceiving a multimodal stimulus is a pattern recognition problem. Rather than perceiving the stimulus as a holistic category, we propose that the brain perceives the single features independently and integrates them in a multiplicative manner. Because no particular set of features characterizes a particular emotion, it has been proposed that the features coding an emotion are continuous (Ellison and Massaro, 1997). The theory behind this idea has been synthesized into the Fuzzy Logical Model of Perception (FLMP) (Massaro, 1998). I will first introduce the theoretical concept of the FLMP and compare the model against other ideas and concepts of perception.

The main principle of the FLMP is that the perception of a multimodal stimulus stream is a pattern recognition problem. The model assumes three basic stages of processing, as shown in figure 4.5: (1) each source of continuous information is evaluated to ascertain the degree to which it matches various stored prototypes; (2) the sources are integrated according to a multiplicative formula to provide an overall degree to which they support each alternative; and (3) a decision is made on the basis of the relative goodness of fit of each prototype. The three processes are successively ordered in time, but overlapping. The information is based on sensory primitives, or features. Based on the empirical results of previous studies we propose that the encoding of affect in a talking face follows this principle (Ellison and Massaro, 1997).
Figure 4.5: Schematic representation of the three stages involved in perceptual recognition proposed by the Fuzzy Logical Model of Perception (FLMP). The three processes are temporally successive, but overlapping. Reading direction in the diagram is from left to right. The model is explained with a task where subjects have to integrate affect from words and expressions. The sources of information are indicated by upper case letters: expressive information by Ei, word information by Wj. The evaluation process transforms this information into perceived features, indicated by lower case letters ei and wj. The integration process results in an overall degree of support sk for a given affect k. The decision process maps the output of the integration into a response Rk. All three processes make use of prototypes stored in memory.
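To make these stages concrete, the following is a minimal sketch of the FLMP for the two-alternative case (happy vs. angry) used later in this chapter: multiplicative integration followed by the relative goodness of fit gives P(happy | Ei, Wj) = ei wj / (ei wj + (1 - ei)(1 - wj)). The feature values in the example are illustrative, not parameters estimated from our data:

    # Minimal sketch of the FLMP for two response alternatives.
    # e and w are fuzzy truth values in [0, 1]: the degree to which the
    # facial expression (e) and the word (w) support the alternative "happy".
    def flmp_p_happy(e, w):
        support_happy = e * w              # multiplicative integration
        support_angry = (1 - e) * (1 - w)  # support for the other prototype
        # Decision: relative goodness of fit of each prototype.
        return support_happy / (support_happy + support_angry)

    # A mildly happy face (0.7) paired with a clearly angry word (0.2):
    print(flmp_p_happy(0.7, 0.2))  # ~0.37, more extreme than the average of 0.45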
The FLMP has shown superior performance in multiple empirical experiments in different domains (Massaro, 1998). In the following we want to use a deductive approach to test the model against alternative models of perception. We do this by answering different questions concerning the underlying psychological mechanisms of perception. These questions are hierarchically ordered into a tree of wisdom to simplify the answering procedure. This order does not imply any functional dependency. The concept of the tree of wisdom has already been applied successfully in the field of multimodal speech perception (Massaro, 1987a). Figure 4.6 illustrates the tree of wisdom, which consists of a set of binary oppositions about how a multimodal stimulus stream is processed.
Figure 4.6: Tree of wisdom illustrating binary oppositions central to the differences among theories of perception: holistic vs. featural parts, categorical vs. continuous, dependent vs. independent, and additive vs. multiplicative. Figure retrieved from Massaro (1998).
At the first stage we ask whether the perception of emotion is a holistic process or a pattern recognition problem. Holistic processing can be divided into holistic encoding and configural encoding (Farah et al., 1995). The theory of holistic encoding states that the stimulus is perceived as a whole. In a face recognition task it has been shown that face features in the context of the whole face are perceived with a higher accuracy than the separated features (Tanaka and Farah, 1993). If we want to test the psychological mechanism behind this result we have to find a model that predicts the outcome of this experiment. The problem with holistic processing is that there does not exist any model to test, because such a model would require as many free parameters as there are stimuli. According to the holistic idea of perception every emotion is unique, and its identification cannot be predicted on the basis of components. The only approach we can follow is to show that a contradictory model provides a good fit for the observed behavioral data. The FLMP predicts the emotion judgment from single features with high accuracy. This result provides evidence against holistic encoding (Massaro, 1998).
Configural encoding states that the spatial relations of the single features are important for the global perception mechanism. This form of perception is more difficult to falsify, because the relative spatial configuration can itself be seen as a feature. This means, for example, that the displacement of a smile is evaluated in relation to the absolute position of the mouth center. Interestingly, it has been shown that the FLMP could predict the emotion in half and whole faces with the same parameter values (Ellison and Massaro, 1997). This result supports the idea that component features, and not the spatial configuration, are the most important determinants of emotion perception. We conclude that the perception of a multimodal stimulus stream is not a holistic process and move down the right branch of the tree of wisdom.
In the next step we have to answer whether stimulus features are categorical or continuous. Experimental results show that, given a stimulus continuum between two alternatives, identification judgments change abruptly. Scientists have taken this as support for the categorical perception theory (Etcoff and Magee, 1992). Because the shape of the identification function is discontinuous, supporters of the categorical idea falsely interpreted this as proof of their theory. It has been shown that continuous information can also lead to a discrete identification function (Massaro, 1987b). In a study using two features of emotional cues, Ellison and Massaro (1997) have shown that the FLMP describes well the observed emotion judgments that follow a 'discontinuous' identification function. Because the FLMP assumes continuous information about each feature, this result demonstrates that the identification function alone is not sufficient to determine whether perception is continuous or categorical. The verification of which psychological mechanism is responsible for the observed result has to be based on quantitative tests. Unfortunately, like holistic models, categorical models do not allow compositional tests, because such a model would have as many free parameters as there are stimuli. One approach that has been used to prove the categorical model wrong is to compare the performance of the FLMP against the single channel model (SCM), which is mathematically equivalent to the categorical model of perception. This model states that people categorize information from each feature and respond with the outcome of the categorization of only one of the features. The poor fit of the SCM compared to the FLMP supports the idea that categorical perception is not an adequate model for the mechanism of emotion recognition (Massaro, 1998).
The good fit of the FLMP provides us with the answers for the last two branches of the tree of wisdom. The theoretical basis of the FLMP assumes that single features are independent and combined in a multiplicative manner. The rating results of individual subjects show that the combined feature evaluation is more extreme than the rating given to either source alone. The multiple empirical results supporting the FLMP show that this model is a robust framework for inquiry. In this thesis we will evaluate this framework in the context of affect perception in a talking face and compare its performance against the weighted average model (WAM) of perception.
4.2.2 The Weighted Average Model of Perception
Another concept of how multimodal stimulus integration may be achieved is conceptualized by the weighted average model of perception, referred to below as the additive model of perception (AMP) (Bruno and Cutting, 1988; Massaro, 1988; Massaro and Ferguson, 1993). The evaluation stage is similar to the FLMP, but the values are added at the integration stage. If we allow one feature dimension to have more influence than the other, the model can be made more general. Then the probability of identifying angry affect is equal to

P(A \mid E_i, W_j) = w_e \, e_i + (1 - w_e) \, w_j \qquad (4.1)

where $w_e$ is the weight given to the expression and $(1 - w_e)$ the weight given to the word. The AMP is mathematically equivalent to a single channel model in which participants attend to information from only one modality (Thompson and Massaro, 1989).
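As a companion to the FLMP sketch above, the following is a minimal sketch of equation 4.1; the default weight of 0.5 is an illustrative choice, not a fitted value:

    # Minimal sketch of the weighted average model (equation 4.1).
    def wam_p_angry(e, w, w_e=0.5):
        # w_e is the weight given to the expression, (1 - w_e) to the word.
        return w_e * e + (1 - w_e) * w

Unlike the multiplicative integration of the FLMP, an averaged prediction can never be more extreme than its most extreme source, which provides one empirical handle for distinguishing the two models.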
4.2.3 Automatic Processing of Information
Various criteria have been proposed to define a perceptual process as an automatic mechanism. These criteria consider whether minimal attention is required, whether the stimulus can be processed unintentionally, or whether the detection is effortless (Bargh, 1989; Logan, 1989; Shiffrin, 1988; Evans, 2003). Automatic processes can be structured along a continuum, with some processes being more or less automatic than others (Logan, 1989). The extent to which a process is automatic can be measured by the ability of some distracter information to interfere with the processing of attended information. A widely used method that measures such interference is the Stroop task (Stroop, 1935; MacLeod, 1991). In the original version of this method, participants had to either read words printed in different colors, or name the colors of the words. The interference was measured by longer reaction times (RT) in the condition where the word color and the word meaning mismatched. The results show that word reading was less influenced by the non-attended stimulus dimension than the naming of the color. One explanation for this difference in interference is the claim that word reading is more automatic than color naming (Logan, 1989; MacLeod, 1991; Shiffrin, 1977). We will use the same method to investigate the automaticity of affect perception in a talking face.
4.2.4 Automatic Processing of Affective Faces and Words
Affective stimuli communicate information that can be very important for survival (Öhman, 2002; Öhman et al., 2001). The processing of such stimuli is fast and very efficient (Globisch et al., 1999). So far, two brain pathways involved in the evaluation of affect have been identified. The first is an old and fast subcortical pathway connecting the sensory input directly to the amygdala, a nucleus involved in the evaluation of valence quality (LeDoux, 2000; Lang et al., 2000; Paton et al., 2006). This pathway is responsible for unspecific physiological and behavioral responses. The second pathway involves cortical processing of the stimulus and elicits more complex goal-directed responses (LeDoux, 1996). The exact relationship between the processing of affect in faces and words and the brain pathways involved in the evaluation of valence is not clearly understood.
Recognizing emotion in faces is a skill that develops early in infancy (LaBarbera et al., 1976; Meltzoff and Moore, 1977; Schwartz et al., 1985). This process requires little attention. Threatening facial expressions are capable of influencing responses even when people are unaware of the presence of the face (Esteves et al., 1994; Morris et al., 1998). Masked-priming stimuli of affective faces are also capable of inducing unconscious mimicry behavior in subjects, measured by electromyographic responses (Dimberg et al., 2000). Another interesting result comes from studies that have observed physiological responses to affective faces in patients suffering from face blindness or prosopagnosia (Bauer, 1984; Damasio et al., 1990). This implies that the perception of affect does not require conscious perception of the face. An explanation for these findings is the claim that threat-relevant stimuli like angry faces are more salient and are therefore processed differently compared to non-threat stimuli (Morris et al., 1998; Schupp et al., 2004). The results of these studies suggest that emotions communicated by facial expressions are automatically perceived and processed.
Similar results have been presented for the perception of linguistic semantics. In masked priming experiments where participants were exposed to apparently undetectable affective words just a moment before they had to judge the valence quality of a follow-up word, Greenwald et al. (1989) observed significant influences of these masked words on the reaction time for judging the affective quality of the follow-up words. Also, impression formation and preference responses can be influenced by words not consciously detected (Bargh, 1989; Kihlstrom, 1987). Dehaene (1998) showed that words and numbers presented as masked primes induce detectable changes in behavior and in electrophysiological activity measured in the premotor area. These results show that the processing of linguistic semantics can also happen automatically.
The understanding of spoken language is based on the integration of both the verbal and the non-verbal dimension. Interestingly, multiple functional MRI studies and physiological results have described a lateralization effect for the comprehension of the linguistic semantic and the expressive dimension of language (Schirmer and Kotz, 2006). The perception of non-verbal expressions of language, including vocal tone, is related to increased activity of different brain areas in the right cerebral hemisphere (Nakamura et al., 1999). These results are consistent with the identification of a special cortical region responsible for the processing of faces in the lateral fusiform gyrus of the right hemisphere (Haxby et al., 2002; McCarthy et al., 1997; Kanwisher et al., 1997). The existence of a specialized region responsible only for the processing of the different aspects of faces points out their evolutionary importance. The processing of linguistic semantics has been related to increased brain activity in the left cerebral hemisphere, concretely in regions of the inferior frontal and temporal lobes (Bookheimer, 2002; Binder et al., 2009). Different regions responsible for the integration of semantic-lexical or phonological content could be distinguished (Demonet et al., 1992). Interestingly, scientists have failed to locate category-specific brain areas responsible for the detection of different semantic classes (Devlin et al., 2002; Bookheimer, 2002).
These results show that the human brain processes the linguistic semantics and the expressive dimension of language in different regions of the brain, and that this processing can happen automatically, without conscious control.
The present study focuses on how facial expressions and linguistic semantics are perceived and integrated in the judgment of two specific emotions: happiness and anger. To investigate this question we designed two experiments using an emotional Stroop task in which the subjects always saw a face saying a word. They had to rate the emotional content of the facial expression, or the meaning of the word, or both. Our goal was to vary the amount of information supporting happiness or anger in the spoken words and the face, without claiming to produce a complete stimulus continuum between happiness and anger. We used a controllable synthetic talking head to manipulate the expressive dimension of the face but not the voice (Cohen and Massaro, 1993; Massaro and Cohen, 1995). The reaction time (RT) for valence-coherent and valence-incoherent stimuli was used to investigate the degree of automaticity of affect perception. In previous studies we successfully used an expanded factorial design to investigate the integration mechanism of vocal and facial emotional components (Massaro and Egan, 1996). The results of these studies showed that the fuzzy logical model of perception (FLMP), which assumes continuous and independent perceptual features, fit the judgments better than an additive model. Hence, in this study we extended this paradigm to the independent manipulation of the face and the linguistic content of the spoken word.
4.2.5 Experiment 1
Methods
Seven female undergraduate students from the University of California Santa Cruz participated in the experiment. Participants were recruited by an advertisement on the UCSC campus. All subjects received a financial reimbursement of 45 US dollars for their participation. They ranged in age from 18 to 20 (M = 18.6; SD = 0.79) and were all native English speakers.

We generated a stimulus space by parametrically controlling the animation of a 3-D talking head, Baldi (Massaro et al., 1998). This application is capable of synthesizing the audio-visual and affective components of a talking face following a modular principle. This powerful approach allows us to modulate and blend the different components of the multimodal stimulus stream, producing a complete set of stimuli that portrays different affect. Given that Baldi apps are currently available on the iPhone (Massaro et al., 2009), we used an in-house app to present the stimuli on an Apple iPad to the subjects at a distance of about 45 cm. No visual fixation point was provided.
We selected the two basic emotions happy and angry for our stimulus space. This decision was based on the concept that the two emotions code opposite affect (Russell, 1980; Ekman et al., 1982). To create the affective expressions we varied the eyebrow and mouth corner deflections because of their influence on affective ratings between angry and happy expressions (Ekman and Friesen, 1978) (see figure 4.7). For the linguistic semantic dimension we defined a stimulus continuum consisting of fifteen words. The selection of these words was based on the evaluation of affect and activation measured by others (Whissell, 1989; Morgan and Heise, 1988). We also controlled for word frequency, selecting familiar words that appear between once every million and once every one hundred thousand tokens (Carroll, 1971). The happy words were: happy, joyful, delighted, proud, pleased and enthusiastic. The angry words were: bitter, resentful, envious, angry, outraged and furious. The neutral words were: neutral, demanding and rebellious. The 6 angry words and 6 happy words were pooled into two classes coding high and low affect. The neutral words were pooled into one class. For the vocalization of the words we used the MARY text-to-speech engine speaking in a neutral voice (Schröder and Trouvain, 2003).
Figure 4.7: The affective facial expressions of the stimulus space used
in experiment 1. The eyebrows and the mouth corner deflection of Baldi
were varied to produce a stimulus continuum from happy to angry.
The experiment used a factorial design with fifteen words and five facial expressions, yielding 75 distinguishable stimuli. Three different conditions were tested in the experiment. In the first condition participants had to judge the expression of the face without paying attention to the semantic meaning of the word. In the second condition they had to judge the linguistic meaning without paying attention to the affect of the face. In the last condition subjects had to judge the global event. In each condition the presentation of the 75 stimuli was repeated 5 times. Between these blocks participants had a break of 2 minutes. Normally only one condition was tested per day. Subjects that were tested on two conditions on the same day had a break of at least 4 hours between the sessions. The order of the 75 stimuli and the 3 conditions was randomized.
After each stimulus, subjects had to give a rating by pressing a button on the touch screen labeled 'Positive' or 'Negative'. During the rating the face was not visible. The subject's response and reaction time were recorded. After the rating, a one-second break was implemented before the next stimulus was presented. The mean observed proportion of happiness identifications was computed for each of the 75 stimuli for each subject by pooling across all 5 blocks of each condition.
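As a minimal sketch of this pooling step, assuming a long-format response table with hypothetical column names:

    # Sketch: pooled proportion of 'happy' responses per subject and stimulus.
    import pandas as pd

    # columns: subject, stimulus, block, happy (1 = 'Positive', 0 = 'Negative')
    responses = pd.read_csv("responses.csv")
    p_happy = responses.groupby(["subject", "stimulus"])["happy"].mean()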
4.2.6 Results
For the data analysis we classified the 15 affective words into 5 classes coding different strengths of affect: Happy, Medium Happy, Neutral, Medium Angry and Angry. Hence, each class contained 3 words: Happy (happy, joyful, delighted), Medium Happy (proud, pleased, enthusiastic), Neutral (neutral, demanding, rebellious), Medium Angry (bitter, resentful, envious) and Angry (angry, outraged, furious). The analysis of the reaction time revealed a significant difference between the linguistic semantic and the expressive condition. Subjects responded faster in the expressive condition (Median = 0.77 [sec]) compared to the linguistic semantic condition (Median = 0.98 [sec]): Kruskal-Wallis χ2(1, N = 1050) = 344.05, p < 0.01. This result has to be interpreted carefully, because the face was exposed 0.6 seconds before the head started to talk. This means that the expression was perceived earlier than the word, and any difference in performance is influenced by this head start.
Figure 4.8: Reaction time in the expression condition (left) and the word condition (right). When the stimulus construct had coherent valence qualities, reaction times were reduced in both conditions. The boxes indicate the 25th and the 75th percentiles, the whiskers indicate the most extreme data points not considered outliers. The horizontal line is the median.

To analyze the influence of one affective dimension on the rating of the other, we calculated the reaction time for coherent and incoherent
stimulus constructs. A coherent stimulus construct is defined as a stimulus coding the same valence, happy or angry, in both the linguistic semantic and the expression dimension. An incoherent stimulus construct codes opposite valences in the two dimensions. People responded faster in both conditions when the stimulus construct coded a coherent valence quality (see figure 4.8). A Kruskal-Wallis test revealed a significant reduction in reaction time in the expressive condition (median coherent = 0.74 [sec]; median incoherent = 0.8 [sec]: χ2(1, N = 85) = 10.96, p < 0.01) and the linguistic semantic condition (median coherent = 1.03 [sec]; median incoherent = 1.10 [sec]: χ2(1, N = 85) = 8.04, p < 0.01). No difference for the type of valence (angry vs. happy) was observed.
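Computationally, the coherent vs. incoherent comparison above is a two-group Kruskal-Wallis test on the reaction times; the following is a minimal sketch with hypothetical values:

    # Sketch: Kruskal-Wallis test on reaction times for coherent vs.
    # incoherent stimulus constructs (the arrays hold hypothetical values).
    import numpy as np
    from scipy.stats import kruskal

    rt_coherent = np.array([0.71, 0.74, 0.76, 0.73])  # seconds
    rt_incoherent = np.array([0.79, 0.82, 0.78, 0.81])
    h_stat, p_value = kruskal(rt_coherent, rt_incoherent)
    print(h_stat, p_value)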
In the second analysis we investigated how the non-attended word or face affected the judgment of the attended face or word. We observed a significant influence of the angry vs. happy words on the judgments of neutral faces: Wilcoxon W = 129.66, z = 2.53, p < 0.01 (see figure 4.9).
To investigate the psychological mechanism responsible for the integration of the two modalities, we tested the observed data against the predictions of the Fuzzy Logical Model of Perception (FLMP) and the Additive Model of Perception (AMP). The model fitting was accomplished with STEPIT (Chandler, 1969). The FLMP and the AMP were fit to the observations of each of the 7 individuals. The fit of the FLMP requires the estimation of five ei values for the 5 different classes of expressions and five wj values for the 5 different classes of semantic information. The AMP requires the same number of parameters plus one for the weight value w. The goodness of fit was calculated by the root mean square deviation (RMSD) between the observations and the model's predictions. Figure 4.9 shows the average fit of the FLMP in the expression and semantic conditions. The average RMSDs of the FLMP in these two conditions were 0.0234 (expressive) and 0.021 (linguistic). The average RMSDs of the AMP in these two conditions were 0.0454 (expressive) and 0.0287 (linguistic).
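The parameter estimation sketched below follows the same logic as the STEPIT fits reported above, minimizing the RMSD between observed and predicted identification proportions; the use of scipy here is a stand-in for illustration, not the original STEPIT routine, and the observed matrix is a placeholder:

    # Sketch: fit the FLMP's free parameters (five e_i values for the
    # expressions and five w_j values for the words) by minimizing the RMSD
    # between predicted and observed P(happy) identification proportions.
    import numpy as np
    from scipy.optimize import minimize

    observed = np.random.rand(5, 5)  # placeholder for one subject's 5x5 data

    def rmsd(params):
        e, w = params[:5], params[5:]
        num = np.outer(e, w)                # e_i * w_j for every cell
        den = num + np.outer(1 - e, 1 - w)  # plus the support for "angry"
        return np.sqrt(np.mean((num / den - observed) ** 2))

    x0 = np.full(10, 0.5)  # start all parameters at the neutral value
    result = minimize(rmsd, x0, bounds=[(0.001, 0.999)] * 10)
    print(result.fun)  # the best RMSD found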
Figure 4.10 shows the average fit of the FLMP along with the observed data in the bimodal condition. The average RMSD of the FLMP in this condition was 0.048. The fit of the AMP produced a larger RMSD in this condition, 0.13. So in all three conditions the AMP produced larger RMSDs than the FLMP. An ANOVA was carried out on the RMSDs of the fits of the two models. The FLMP provided a significantly better fit than the AMP [F(1, 40) = 7.89, p < 0.01].
4.2.7 Discussion
The differences in reaction time between stimuli that coded coherent and stimuli that coded incoherent valence quality indicate interference effects in the perception of affect in different modalities. Comparing the behavioral observations with the predictions of different models of perception shows that the FLMP fits the data significantly better than the other models of perception.
Figure 4.9: Observations (symbols) and predictions (lines) of the fuzzy logical model of perception (FLMP) in the expression condition (left) and the linguistic semantics condition (right). We observed a significant influence of the angry words on the judgments of the neutral facial expressions (left panel). This effect was not observed in the linguistic semantics condition (right panel).

Figure 4.10: Observations (symbols) and predictions (lines) of the fuzzy logical model of perception (FLMP, left) and the weighted additive model of perception (AMP, right). The plots show the fits for the bimodal condition where subjects had to identify the affect of the overall event. The FLMP makes a significantly better prediction of the observed data than the AMP.

In the original Stroop task the meaning of a word interferes with color categorization (Stroop, 1935). This interference effect has been explained by two different hypotheses: the relative speed of processing for different stimulus features and the bottleneck of attention (MacLeod, 1991).
The first hypothesis states that the processing of the word is faster than the naming of the color. Because of this difference in speed, the reading task interferes with the color identification task but not vice versa (MacLeod, 1991). The second hypothesis claims that there is a bottleneck of attention. Specifically, people can ignore the color of the word when reading the word, but not the meaning of the word when they have to name the color (Cohen et al., 1990). According to this hypothesis, automatic processes are less affected by the distracting power of the non-attended stimulus feature. Following this idea, theorists have argued that word reading is an automatic process, while color naming or picture identification requires more cognitive load and is therefore a controlled process (MacLeod, 1991). Other studies showed that pictures interfere with word categorization (Stenberg et al., 1998; Glaser and Glaser, 1989), but words do not interfere with picture categorization (Glaser and Dungelhoff, 1984; De Houwer and Hermans, 1994). The same result has been presented in a recent study that used affective words superimposed on affective faces (Beall and Herbert, 2008).
In our experiment we used a modified Stroop task where people had to judge the affect of spoken words and facial expressions. Our analysis of the reaction times in the single mode conditions shows that not only does the facial expression influence the judgment of the word meaning, but the meaning of spoken words also interferes with the judgment of affective facial expressions. In both conditions, expression and linguistic semantic, the participants made faster judgments when the two dimensions of the stimulus construct coded coherent valence quality. Interestingly, this difference was only observed for strong affect quality. Words and facial expressions that did not code extreme affect did not show such interference effects. These results indicate that the processing of affect in the facial expression and in the linguistic semantics interfere with each other. Because both dimensions show this influence, we can exclude the idea that one of the two dimensions is processed faster than the other. A more suitable explanation for our results is the bottleneck of attention theory: the processing of affect in the face and in the linguistic semantics is not completely automatic, and the two therefore interfere. The fact that we observed this interference only for words and faces coding strong affect indicates that the processing of valence differs across affect quality. This suggests that the brain prioritizes the processing of stimuli with strong affect. The fast processing of threatening faces has been explained with the same argument (Schupp et al., 2004). We also observed an influence of the linguistic dimension on the judgment of the neutral facial expression, but no influence of the facial expression on the judgment of the linguistic semantics (see figure 4.9). Probably the neutral faces were coding ambiguous affect that could be influenced by the non-attended feature. We hypothesize that further interferences were not detected because the binary rating produces only extreme ratings. Therefore we implemented a slider interface in experiment 2.
Compared to previous studies we used an animated face that talked directly to the subjects. This experimental design is more natural than reading superimposed words on static photographs, which could explain the difference from studies that did not show interference of the word dimension on the rating of the faces (Beall and Herbert, 2008). In studies that used printed words superimposed on static photographs we cannot be sure that the subjects actually read the words. It can be speculated that the missing interference of the words on the judgment of the facial expressions was due to a lack of stimulus perception and not because the perception of facial expressions is an automatic process.
The interesting question now is how the brain integrates the dimensions into a global percept. Because the face is an evolutionarily important stimulus it has been claimed that the perception of faces is unique, involving holistic and non-analytic brain processes (Levine et al., 1988). This hypothesis can be tested using our synthetic talking head Baldi, which can express different levels of affect. The comparison between the observed performance in identifying affect and the predictions of different models of perception shows that the FLMP fits the data significantly better than the additive model (see figure 4.10). This means that in experiment 1 the subjects use both cues to judge affect in the same manner as they
combine speech features (Massaro, 1989; Massaro and Ferguson, 1993;
Massaro et al., 1993; Massaro and Egan, 1996).
In experiment 1 the subjects used a binary choice to give their ratings. It could be that the ratings were influenced by the participants' urge to give coherent ratings, as they could memorize the ratings they had given to expressions and words. The binary choice was probably not sensitive enough to detect further interference. Therefore we designed a second experiment with a slider to give continuous judgments. While a binary choice can be remembered, we hypothesized that a slider is more sensitive in detecting interferences in the ratings.
4.2.8 Experiment 2
Methods
Six male and three female academics from the University Pompeu Fabra participated in experiment 2. The participants did not receive any financial reimbursement. Their age ranged from 24 to 37 (M = 30.1, SD = 5).
As in experiment 1 we generated a stimulus continuum using Baldi to modulate the facial expressions and the MARY text-to-speech engine to vocalize the words. We varied the eyebrow and mouth corner deflections to generate 10 different facial expressions coding different strengths of the emotions happiness and anger (see figure 4.11). The selection of the 10 words coding happy or angry affect was based on the evaluation of other studies (Whissell, 1989; Morgan and Heise, 1988). We controlled for word frequency, using only words that appeared between once every million and once every one hundred thousand tokens (Carroll, 1971). The words were: joyful, happy, delighted, pleased, surprised, neutral, disappointed, angry, furious and outraged. We asked participants whether they understood these words before the experiment started. Only words that were understood were included in the analysis.
In experiment 2 we used a factorial design with 10 facial expressions and 10 words, producing 100 distinguishable stimuli. These stimuli were tested in 3 conditions where participants had to rate either the affective meaning of the word, of the facial expression or of the global event. Each
Figure 4.11: The affective facial expressions of the stimulus space used in experiment 2. The eyebrow and mouth corner deflections of Baldi were varied to produce a stimulus continuum from happy H (top left) to angry A (bottom right) in 10 steps. The letter N indicates a neutral intermediate state. The number indicates the strength of the affect.
condition was tested on a different day. Before the experiment we presented the complete stimulus space (words and faces) to the participants to familiarize them with the continuum.
The stimuli were presented on an Apple iPad at a distance of about 45 cm. No visual fixation point was provided. After each stimulus, subjects had to give a rating using a slider labeled 'Positive' and 'Negative' at its end points. A set button below the slider was used to submit the rating. During the rating the face was not visible. The subject's response and reaction time were recorded. After the rating, a one-second break was inserted before the next stimulus was presented. The mean observed proportion of happiness identification was computed for each of the 100 stimuli for each subject.
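This aggregation step can be sketched in a few lines of Python; the fragment below assumes the raw trials are stored in a long-format table, and all column names are hypothetical placeholders rather than the names used in our actual analysis scripts.

    import pandas as pd

    # Hypothetical long-format trial data: one row per presented stimulus.
    trials = pd.DataFrame({
        "subject":    [1, 1, 1, 1],
        "expression": ["H4", "H4", "A3", "A3"],       # 10 expression levels
        "word":       ["joyful", "furious"] * 2,      # 10 words
        "rating":     [0.92, 0.61, 0.33, 0.10],       # slider mapped to [0, 1]
    })

    # Mean observed proportion of happiness identification for each of the
    # 100 expression-word stimuli, computed separately for each subject.
    p_happy = trials.groupby(["subject", "expression", "word"])["rating"].mean()
    print(p_happy)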
4.2.9 Results
A Wilcoxon test was conducted to evaluate the difference in RT between experiments 1 and 2. The result indicates a significant difference: z = 79.6, p < 0.01 (see figure 4.12). Also in experiment 2 the reaction time for the face condition was significantly faster compared to the semantic or the global condition: Kruskal-Wallis χ2(2, N = 2687) = 93.9, p < 0.01. This result is not very informative, however, because it is influenced by the stimulus onset difference of 0.6 seconds between the facial expression and the moment when the head started to talk. Participants did not rate stimuli coding coherent valence faster than stimuli coding incoherent valence.
We analyzed whether the ratings in the single modal conditions were influenced by the non-attended stimulus dimension. Figure 4.13 shows the FLMP fits for the expression and the linguistic semantic condition. The dots show observations, the lines the fit of the FLMP. In the expression condition, words coding strong affect showed an influence on the ratings of the faces. But this influence was not coherent: for example, we observed a significant positive influence of the word happy on the probability to identify a happy affect in the facial expression compared to the word furious: Wilcoxon p < 0.01. Interestingly, the word joyful had a negative
[Bar plot: mean reaction time in seconds for experiment 1 and experiment 2; the asterisk marks a significant difference.]
Figure 4.12: The mean RT in experiment 1 (M = 0.97, SD = 0.5) was significantly faster than in experiment 2 (M = 2.12, SD = 1.1) (Wilcoxon z = 79.6, p < 0.01).
effect on the probability to identify happy affect in the face compared to the word outraged: Wilcoxon p < 0.01 (see figure 4.13, left panel). In the semantic condition we did not observe influences of the facial expression on the ratings of the affect coded by the linguistic semantic (figure 4.13, right panel). In general the participants identified stronger positive affect in the linguistic semantics than in the expression condition. A t-test comparing the 5 most positive classes revealed significant differences across the two conditions: t(8) = 2.2, p = 0.05.
To investigate the underlying mechanism of multi-modal stimulus integration we analyzed the average fit of the fuzzy logical model of perception FLMP and the weighted average model WAM for the bimodal condition. The root mean square deviation (RMSD) for the FLMP is rbimodal = 0.032, for the WAM rbimodal = 0.031. The two model fits did not differ in the quality of their predictions (see figure 4.14).
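For readers who want to reproduce this comparison, the following Python sketch shows how the two models generate bimodal predictions and how an RMSD is computed. It follows the standard formulations of the FLMP and a weighted average model; the stimulus values are illustrative toy numbers, not our fitted parameters.

    import numpy as np

    def flmp(e, s):
        # Fuzzy logical model of perception: multiplicative integration of
        # the unimodal supports e (expression) and s (semantics) for "happy".
        return (e * s) / (e * s + (1 - e) * (1 - s))

    def wam(e, s, w):
        # Weighted average model with a free weight w for the expression source.
        return w * e + (1 - w) * s

    def rmsd(observed, predicted):
        return np.sqrt(np.mean((observed - predicted) ** 2))

    # Toy stimulus space: 5 expression levels crossed with 5 semantic levels.
    e = np.linspace(0.9, 0.1, 5)[:, None]
    s = np.linspace(0.9, 0.1, 5)[None, :]
    observed = flmp(e, s)                 # stand-in data, for illustration only
    print(rmsd(observed, wam(e, s, w=0.5)))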
4.2.10 Discussion
We did not observe the interference effects in the reaction times for coherent and non-coherent stimulus constructs in experiment 2 that we had seen in experiment 1. The observed differences in RT between the two
[Plots: P(Happy) identification in the expression condition as a function of expression level (H4 to A4), one curve per word (left), and in the linguistic semantic condition as a function of word, one curve per expression level (right).]
Figure 4.13: Observations (symbols) and predictions (lines) for the fuzzy
logical model of perception FLMP in the expression condition (left) and
the linguistic semantics condition (right).
[Plots: P(Happy) identification in the bimodal condition as a function of expression level (H4 to A4), one curve per word, for the FLMP (left) and the WAM (right).]
Figure 4.14: Observations (symbols) and predictions (lines) for the fuzzy logical model of perception FLMP (left) and the weighted average model WAM (right). The average root mean square deviations (RMSD) for the FLMP (0.032) and the WAM (0.031) show that the two models did not differ in the quality of their predictions.
experiments could be a possible explanation for this result. The mean reaction time in experiment 2 (M = 2.12 seconds) was much slower than in experiment 1 (M = 0.97).
One objective of experiment 2 was to investigate the influence of one modality (face or word) on the rating of the other. In the expression condition we did observe influences of the words on the ratings. The word happy positively affected the identification of positive affect in the face compared to the word furious. Surprisingly, these influences were not structured according to the valence quality. For example, we observed that the word joyful had a stronger negative influence on the probability to identify positive affect in the face than the word outraged. These effects were not observed in the linguistic semantic condition. Different interpretations can explain this result: it could be that the perception of affect in the face is more continuous than the perception of the affective meaning of words. It seems that we perceive word meanings as classes and are capable of remembering these classes. These memory traces are not influenced by whether the word is presented with a happy or an angry facial expression. Another interpretation is that the facial expressions were more ambiguous than the selected words, which coded stronger affect. The mean probability to identify positive affect in the linguistic semantic condition was higher than in the facial expression condition.
A control experiment would be to let participants rate the facial expressions multiple times without the face saying a word. A more homogeneous identification curve would indicate that the observed inhomogeneous identifications were induced by the meaning of the words and are not intrinsic properties of the facial stimulus.
In the second experiment the predictions of the FLMP and the WAM did not differ significantly. We have to point out that the FLMP uses one free parameter less than the WAM but still achieves the same performance. Nevertheless, this alone does not provide enough support to favor the FLMP over the WAM in matching the observed behavioral data. The poor fit of the FLMP allows two interpretations: first, in general the model does not make stronger predictions for non-extreme observations. The fact that participants did
not perceive maximal or minimal positive affect in the bimodal condition can be an explanation for the poor fit of the FLMP. The second interpretation is that the integration of the affect coded in facial expressions and linguistic semantics does not follow the mechanism of the FLMP. Because the WAM also does not make significantly better predictions we cannot say which of the two models is more appropriate to explain the multimodal integration of affect.
4.3 Conclusion
We investigated the perception of affect in a talking face, focusing on linguistic and expressive influences. In experiment 1 we observed interferences of the non-attended dimension on the judgment of the attended dimension. These interferences were measured as differences in reaction time and judgments. The observations of the multi-modal condition were significantly better predicted by the FLMP than by the WAM. In experiment 2 we only observed interference of the linguistic meaning with the judgments of the facial expressions. While in the first experiment we observed interferences measured as differences in reaction time, in the second experiment we observed interferences measured as changes in the probability to identify positive affect. Surprisingly, these interferences were not coherently structured according to the valence quality continuum. The observations of the bimodal condition were not significantly better predicted by the FLMP than by the WAM.
A main difference between experiments 1 and 2 is the mean reaction time. The slider interface of experiment 2 increased the time participants used to rate the stimulus from approximately 1 second to 2 seconds (see figure 4.12). This difference in reaction time could explain the different results observed in the two experiments. The fast responses in experiment 1 could have favored automatic processing of the affect. In experiment 2 participants took more time and the responses were probably more controlled. It has been shown that the FLMP makes better predictions for automatic processes than for controlled ones.
Therefore, the good predictions of the FLMP in experiment 1 support the interpretation that the evaluation and integration of the affect was, in this case, based on automatic processing.
In experiment 2 RTs were significantly longer, which indicates that the answers were more controlled than in experiment 1. In experiment 1 we observed interferences measured as differences in RTs. But when we look at the judgments in both experiments we see that the meaning of the non-attended feature is not integrated into the final perception. Participants based their decision mainly on the valence quality of the attended feature. We observed only unstructured influences of the word on the judgments of affective faces in experiment 2. The inhomogeneity of these results increases our doubt that this is a reliable influence. The similar performance of the FLMP and the WAM in experiment 2 brings us to the conclusion that the controlled integration of affect coded in the word and the facial expression does not follow the same mechanism observed in bimodal speech perception.
Bringing the results from the two experiments together, we conclude that the perception of affective linguistic and expressive features happens automatically. The differences in RT when evaluating stimuli coding coherent or incoherent valence quality observed in experiment 1 indicate that the two processes interfere with each other, even when participants are instructed to focus on either the linguistic or the expressive feature. Masked priming experiments showed that both the processing of affective words (Greenwald et al., 1989; Dehaene et al., 1998; Bargh, 1989; Kihlstrom, 1987) and of facial expressions can happen automatically (Dimberg et al., 2000; Winkielman et al., 2005). This means that the participants cannot avoid perceiving the meaning of the non-attended feature. The fact that only words and faces coding strong affect had the power to interfere indicates that the observed phenomenon is not a capacity problem in processing two stimuli at the same time. A more satisfying explanation is that the valence quality is the crucial factor responsible for this interference. The fact that the evaluation of valence quality is mainly a sub-cortical process supports this interpretation.
Our results show that when participants have enough time they do not
integrate a multi-modal affective stimulus according to the mechanism of the FLMP. We know that the processing of the linguistic semantic and different aspects of face perception are located in different cortical areas (Schirmer and Kotz, 2006). Our results support the idea that the communication between these areas is not based on automatic mechanisms. We propose that the integration of affective features communicated by the face and by the linguistic semantic is more controlled than bimodal speech perception and perhaps uses a different, so far unknown, integration process. Future studies should address this issue if we want to understand how humans integrate emotions perceived from a talking face.
Chapter 5
COMPUTATIONAL MODEL OF EMOTION INDUCED LEARNING
One of the most interesting questions in emotion research is how the brain processes affect and how this mechanism influences behavioral performance and cognitive activity. One approach to study this phenomenon is to construct computational models using both the knowledge from studies investigating the anatomical architecture of the neuronal network and physiological data on the brain's activity patterns from real world experiments. By comparing the performance of the model with the performance of the neurobiological system we gain insight into the underlying neurobiological mechanisms.
The emergence of emotions is a complex multidimensional process that involves different brain areas. Because of this complexity, researchers modeling emotions have focused on specific aspects of emotion processing (Velásquez, 1997; Gebhard, 2005; Gratch and Marsella, 2005; Armony et al., 1997; Marsella and Gratch, 2009; Mor, 1995; El-Nasr et al., 2000). Here we address the mechanism of classical conditioning as it is affected by the emotional strength of a stimulus. We investigate how the underlying mechanisms of affect evaluation influence behavior and memory
acquisition.
5.1 The Two Phase Model of Conditioning
Learning is defined as a change in behavior that occurs as a result of experience (Mackintosh, 1974). The classical conditioning paradigm introduced by Pavlov (Pavlov, 1927) is based upon the association of two stimuli. A conditioned stimulus (CS), such as a tone, produces either no overt or a weak response, usually unrelated to the response that eventually will be learned. The unconditioned stimulus (US), such as a shock to the leg, elicits a strong, consistent response called the unconditioned response (UR). Presenting the CS before the US will cause the CS to elicit a new response: the conditioned response (CR), which reaches its peak amplitude just before the expected US. The probability of observing a correctly timed conditioned response increases over multiple training sessions.
The classical conditioning paradigm provides an opportunity for the acquisition of both emotional and motor CRs. In the 1960s, Mowrer investigated how the avoidance of a conditioned stimulus that induces fear can act as a reinforcer for associative learning (Mowrer, 1960). His study stimulated the discussion of how emotional states affect behavioral adaptations. He and Miller formulated the two-factor learning theory, which states that behavior that reduces fear will be reinforced (Miller, 1948). Based on this idea, the Polish psychologist Jerzy Konorski studied in the early 1960s the relative independence of classical and instrumental conditioning responses (Konorski, 1948; Konorski, 1968). He proposed the existence of two distinguishable associative learning mechanisms: a fast non-specific learning system (NLS), which within 1 to 5 trials produces a global state of arousal and elicits simple self-protective reaction patterns, and a slow specific learning system (SLS), which is responsible for the accumulation of fine-tuned motor reactions over a longer period of conditioning (Ellison and Konorski, 1964). Acquisition of such motor CRs, however, requires massive training, and the response involves the musculature of organs challenged by the aversive US (Schneiderman et al., 1962; Powell and
Levine-Bryce, 1988). This distributed learning mechanism was conceptualized as the Two Phase Theory of Learning (Ellison and Konorski, 1964; Konorski, 1948, 1968; Rescorla and Solomon, 1967; Rescorla et al., 1972; Gormezano et al., 1987; Bakin and Weinberger, 1990), stating that association involves two stages: rapid stimulus-stimulus learning followed by slower stimulus-response learning. The first stage shows that the subject has learned the apparent cause-effect relationship of the stimuli, i.e., is able to predict the course of events. The second stage shows that the subject is attempting to alter its physical relationship to the outside world, reducing the impact of the US if it is noxious.
At an abstract level we have already modeled this relationship using our Distributed Adaptive Control architecture, which has been successfully applied to robots (Verschure et al., 2003). We have also provided a formal analysis of how prediction-based models for perceptual and behavioral learning can be interfaced (Duff and Verschure, 2010). Here we generalize these abstract models to a biologically constrained solution in terms of the NLS and the SLS (Inderbitzin et al., 2010a).
Eye-blink Conditioning
One of the best studied cases of associative learning is the eye-blink conditioning paradigm, introduced by Gormezano. In this paradigm, a tone or light (CS) is paired with an air puff or electric shock (US) to the eye. The US alone leads to a reflexive eye-blink (UR). The CS-US pairing results in a precisely timed closure of the eyelid, milliseconds before the predicted air puff or electric shock arrives.
Eye-blink conditioning provides an experimental setup that allows us to study in detail the multi-dimensional mechanisms that lead to the acquisition of the CR. An aversive US induces within a few trials a range of bodily responses (e.g. freezing, changes in cardiovascular rhythm, respiratory systems) (LeDoux, 1996). Evidence for the areas responsible for eyelid and fear conditioning was obtained by removing or destroying various brain areas and examining whether learning was still possible. By inactivating or removing the amygdala, it was shown that the construction
of CS representational maps in the cortex was negatively affected (Armony et al., 1998) and the CR is disrupted (Phillips and LeDoux, 1992; LeDoux, 2000). By lesioning the vermis of the cerebellum the CR can be abolished without the UR being affected (Thompson, 2005). The same result can be observed if, instead of the cerebellum, the interpositus nucleus, a deep cerebellar nucleus, is lesioned (Thompson, 2005; Fanselow and Poulos, 2005). Lesions of the cerebellar cortex, in contrast, have a negative impact on the exact timing of the CR (Perrett et al., 1993). These studies support the view that the amygdala is responsible for the acquisition of emotional CRs, taking the form of non-specific, autonomic arousal (Lennartz and Weinberger, 1992), and the cerebellum for the induction of the motor conditioned reaction, in the form of an exactly timed CR.
Stimulus–Stimulus Conditioning
Rapidly developing CRs like changes in heart rate, respiration, blood pressure or skin conductance develop regardless of the locus or the type of the US (Schneiderman et al., 1962; Powell and Levine-Bryce, 1988). These reactions have been termed non-specific (Lennartz and Weinberger, 1992). Such non-specific CRs have an important role in behavioral adaptation during conditioning. In 1956 Galambos et al. identified the primary auditory cortex as a location of associative plasticity (Galambos et al., 1956). Subsequent neurophysiological studies further strengthened this long-held idea of learning-induced plasticity in sensory cortices (Weinberger, 2004). Classical fear conditioning to an auditory CS retunes the receptive fields in the primary auditory cortex to favor the processing of the frequency that was used as the CS (Bakin et al., 1996; Bakin and Weinberger, 1990; Kisley and Gerstein, 2001). These changes of receptive fields develop very rapidly (Edeline et al., 1993). It has been shown that the amygdala codes aversive events (LeDoux, 2000; Paton et al., 2006; Tazumi and Okaichi, 2002) and stimulates subcortical modulatory cell clusters like the nucleus basalis in the basal forebrain (Aggleton, 1992; LeDoux, 1995). The activation of the nucleus basalis by
the amygdala releases cortical acetylcholine, a modulatory neurotransmitter that acts as an inducer of plasticity in the cortex (Gold, 2003; Wenk, 1997).
Stimulus–Response Conditioning
Stimulus-response associations are responsible for forming the specific somatic motor responses that are directed at a specific unconditioned stimulus (US). Such CRs must not only be specific to the locus of the nociceptive US, they also need to be well timed, occurring preceding and during delivery of the US.
One brain region that is highly involved in the control of well-coordinated motor behavior is the cerebellum (Perrett et al., 1993). It receives sensory information from cortical and subcortical parts of the brain and integrates these inputs into a fine-tuned motor response. Lesion studies involving the classical eye-blink conditioning paradigm provide strong evidence that the cerebellum is one location where the acquisition of stimulus-response conditioning can be observed (Krupa and Thompson, 1997). Inactivation of the different cerebellar structures prevents the construction of a measurable CS-CR relation. Several formulations of the adaptive plasticity of sensory-motor responses in the cerebellum have been described (Albus, 1975, 1971; Marr, 1969; Floeter and Greenough, 1979). Multiple investigations have shown that the granule cell-Purkinje cell-deep nucleus circuit is a locus of CS-US convergence (Ito, 1989, 2002). Theories that assign learning to cerebellar circuits are based on the observation of activity-induced synaptic plasticity at the level of the parallel fibre-Purkinje cell synapse (James et al., 2004). The CS activates the PU over the mossy fibre connection, while the US excites the Purkinje cell via the inferior olive-climbing fibre pathway. This CS-US convergence at the locus of the Purkinje cells leads to a co-activation and an induction of long-term depression at the synaptic level (Aizenman et al., 1998; Ito, 1989, 2002). This long-lasting reduction of synaptic strength induces a dis-inhibition of the deep nucleus.
The Link
Physiological and lesion studies have identified the basilar pontine nuclei as a relay structure that transmits auditory information from the cortex to the cerebellum (Steinmetz et al., 1991; Thompson, 1986). This cell structure receives input from the cochlear nuclei, the nuclei of the lateral lemniscus and the inferior colliculus. Lesions of these nuclei result in a disruption of the motor CR (Steinmetz et al., 1987; Lewis et al., 1987). Stimulation of the pontine nuclei as a substitute for an external CS leads to a fast induction of conditioning (Steinmetz et al., 1986). Animals conditioned with pontine stimulation showed immediate CRs in follow-up exposure to a real tone CS (Steinmetz, 1990). These findings indicate that the pontine nucleus acts as a gate for the transmission of auditory stimuli to the cerebellum.
5.2 Methods
5.2.1 The circuit
We propose a model of the two phase theory of conditioning with a system architecture composed of two subsystems: the non-specific learning system (NLS) and the specific learning system (SLS) (Figure 5.1). We study local and global learning mechanisms of activity-induced plasticity in an integrated neuronal circuit that models the auditory system, including the subcortical amygdala and nucleus basalis, and the cerebellum. In both systems we model synaptic plasticity at the locus where CS and US converge.
5.2.2 The Non-specific Learning System
We propose to model plasticity in the non-specific learning system with a circuit including the amygdala, the nucleus basalis and the primary auditory cortex. The amygdala plays an important role in learning to respond defensively to stimuli that predict punishment (LeDoux, 2000, 1996; Phillips and LeDoux, 1992) and in the elicitation of fast non-specific
Figure 5.1: The architecture of the integrated model: The non-specific learning system (NLS) is shown on the left, the specific learning system (SLS) on the right. In the NLS the activation of the amygdala (A) and the nucleus basalis (NB) induces plasticity in the auditory cortex (AC). The conditioned stimulus (CS) reaches the auditory cortex via the thalamus (Th), where it converges with the unconditioned stimulus (US). Inhibitory interneurons (IN) regulate the amount of plasticity. The pontine nucleus (PN) gates the stimulation from the NLS to the SLS. In the SLS the CS and the US converge at the level of the Purkinje cell, resulting in the induction of LTD at the Purkinje synapse. This induces a dis-inhibition of the deep nucleus (DN), leading to the exactly timed motor conditioned response (CR). The reflexive unconditioned response (UR) is elicited without adaptive processing. A amygdala; AC auditory cortex; CS conditioned stimulus; DN deep nucleus; GC granule cells; IN inhibitory interneurons; IO inferior olive; NB nucleus basalis; CR conditioned reaction; PN pontine nucleus; PU Purkinje cell; Th thalamus; US unconditioned stimulus
arousal states (Dedovic et al., 2009; LeDoux, 2000). Rodent fear conditioning studies have reported that amygdala lesions selectively impair acquisition and expression of conditioned fear responses to the CS, without altering unconditioned reflex responses to the innately aversive US (Phillips and LeDoux, 1992). Lee et al. were able to demonstrate that, consistent with the two-phase model of conditioning, rats exhibit two successive stages of non-specific emotional (fear) and specific musculature (eyelid) learning during delay eye-blink conditioning (Lee and Kim, 2004). As a cell cluster that is highly connected to subcortical modulatory systems (LeDoux, 2000; Aggleton, 1992), the amygdala can be seen as a relay station that channels the valence quality of a stimulus to other parts of the brain. One of the target destinations of the amygdala's output is the cholinergic neurons of the basal forebrain. Cholinergic neurons of the nucleus basalis globally regulate synaptic plasticity in the cortex. Experimental examples of specific learning-induced cortical plasticity are studies of the auditory cortex A1 (Weinberger, 2004; Bakin and Weinberger, 1990; Weinberger, 1998). The ventral medial geniculate body of the thalamus (MGv) transmits the tone detection from the cochlea to the primary auditory cortex. The released ACh results from the amygdala-nucleus basalis stimulation and acts at muscarinic receptors in A1. Its convergence with the cortical excitation produced by the tone thus induces long-term plasticity.
Spike Time Dependent Synaptic Plasticity STDP
The timing of pre- and postsynaptic activity is the crucial factor for the adaptation of signal transmission at a synapse (Markram et al., 1997; GuoQiang and MuMing, 1998). Back-propagating action potentials (BAPs), which travel backwards from the soma to the dendrite (Stuart and Sakmann, 1994; Kuczewski et al., 2008), and the inhibition of BAPs through inhibitory interneurons regulate the activity pattern at the synaptic level and thereby the induction of STDP (Lowe, 2002). I(t) is the amount of inhibition received during the interval [t, t_post]. If the inhibition I(t) is high, the back-propagating AP is blocked and synaptic depression is induced. The pre-synaptic activity pattern then determines which of two types of depression is induced: long-term depression (LTD) if there is coincident pre-synaptic activity, and heterosynaptic long-term depression (HLTD) if there is none. If the inhibition I(t) is low and not strong enough to block the back-propagating AP, LTP is induced.
The synaptic efficacy of the weights in the current model evolves according to a modification of a recently proposed learning rule which utilizes back-propagating action potentials (Sanchez-Montanes et al., 2002; Hofstoffer et al., 2002). The efficacy of a synapse is increased if a back-propagating action potential arrives at a synapse within a small symmetrical temporal window around the arrival of a pre-synaptic action potential:

\Delta w = \alpha_{LTP} \, \frac{\tau_0}{\tau_0 + |t_{post} - t_{pre}|}    (5.1)
with α_LTP being the LTP learning rate, τ0 = 10 defining the temporal window, and t_post, t_pre the timing of the post- and pre-synaptic action potentials, respectively. The activation of the inhibitory interneurons through the negative feedback loop attenuates this retrograde propagation in the dendritic trees of the cortical excitatory neurons, decreasing the efficacy of the activated synapses according to:

\Delta w = -\beta_{LTD} \, \frac{\tau_0}{\tau_0 + |t_{post} - t_{pre}|}    (5.2)
with β_LTD being the LTD learning rate, τ0 = 10 defining the temporal window, and t_post, t_pre the timing of the post- and pre-synaptic action potentials, respectively. To further alter the weights, an additional heterosynaptic LTD (HLTD) was implemented, which decreases the synaptic efficacy if postsynaptic activity occurs without coincident presynaptic activity:

\Delta w = -\alpha_{heteroLTD}    (5.3)
with α_heteroLTD being the heterosynaptic LTD learning rate. The modification of the weights is therefore crucially dependent on the temporal dynamics of the neuronal network, taking the relative timing of the excitatory and inhibitory inputs to the cortical neurons into account. In our model US activity drives the nucleus basalis activity and modulates the inhibition of the cortical interneurons, in this way regulating the ratio of LTP/LTD in the network.
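The three update rules can be condensed into a small function. The Python sketch below is a simplified rendering of equations 5.1-5.3 with illustrative learning rates; it collapses the detection of a blocked back-propagating AP into a boolean flag and is not the implementation used in the model.

    # Illustrative parameters; the values in the actual model may differ.
    ALPHA_LTP, BETA_LTD, ALPHA_HLTD = 0.01, 0.01, 0.001
    TAU_0 = 10.0  # temporal window

    def stdp_update(w, t_pre, t_post, bap_blocked, presyn_active):
        # One weight update for a postsynaptic spike at t_post.
        # bap_blocked: True if interneuron inhibition I(t) blocked the
        # back-propagating action potential.
        if not presyn_active:
            return w - ALPHA_HLTD                  # heterosynaptic LTD (5.3)
        window = TAU_0 / (TAU_0 + abs(t_post - t_pre))
        if bap_blocked:
            return w - BETA_LTD * window           # LTD (5.2)
        return w + ALPHA_LTP * window              # LTP (5.1)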
5.2.3 The Specific Learning System
The model described here is an extension of the model published by Verschure and Mintz (Hofstoffer et al., 2002). The circuit is built from the granule cells, Purkinje cells, inferior olive, deep nucleus, mossy fibers, climbing fibers and parallel fibers (Figure 5.2). The system receives input from the NLS via the pontine nucleus. The co-activation of the Purkinje cell by US-induced climbing fibre activity results in a reduction of synaptic efficacy at the PF-PU synapse, or long-term depression (LTD) (Ito, 1989). PF stimulation alone leads to a weak net increase of the connection strength of the PF-PU synapse, or LTP.
Purkinje Cell The Purkinje cell is composed of three different compartments (Figure 5.2). The compartment representing the soma of the cell, called PU-SO, receives excitatory inputs from PU-SP, PU-SYN and IO. PU-SP is responsible for the spontaneous activity of the Purkinje cell. PU-SYN represents the dendritic region of the PU which forms synapses with the PF. PU-SO emits spikes as long as it is not inhibited by the inhibitory neurons (I). PU-SYN, on the other hand, represents the metabolic postsynaptic responses in Purkinje cell dendrites to parallel fiber stimulation. Unlike a generic integrate-and-fire neuron, PU-SYN does not emit spikes but behaves like a linear threshold neuron showing continuous dynamics. In order for the PU to form an association, a persistent trace – the eligibility trace – has to be present in its dendrites (PU-SYN). The high persistence value of PU-SYN, β^SYN, defines this prolonged response in the PU dendrites, forming a CS trace.
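As an illustration, the prolonged PU-SYN response can be sketched as a simple leaky integrator; the persistence value below is an arbitrary stand-in for β^SYN, not the value used in the model.

    BETA_SYN = 0.99   # high persistence -> prolonged CS trace

    def pu_syn_step(trace, pf_input):
        # One time step of the dendritic response: decay plus parallel fibre
        # input. A trace > 0 keeps the synapse eligible for US-induced change.
        return BETA_SYN * trace + pf_input

    trace = 0.0
    for t in range(5):
        trace = pu_syn_step(trace, 1.0 if t == 0 else 0.0)
        print(t, round(trace, 3))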
In the present model a CS trace, obtained through prolonged responses in the PU-SYN dendrites, allows the association of CS and US. Such a notion is supported by physiological studies (Wang et al., 2000) and has already been suggested by Hull (Hull, 1939). Thus, synapses which have been
Figure 5.2: The architecture of the cerebellar SLS. The CS and the US converge at the Purkinje cell synapse (PU-SYN). CF climbing fibre, CR conditioned reaction, CS conditioned stimulus, DN deep nucleus, GA granule cells, GO Golgi cells, IIN inhibitory interneurons, IO inferior olive, MF mossy fibre, PF parallel fibre, PU-SP Purkinje cell spontaneous activity, PU-SO Purkinje cell soma, PU-SYN Purkinje cell synapse, US unconditioned stimulus.
activated by a CS-related input remain eligible for US-induced weight changes for some period of time. The Purkinje cell operates in two modes: a default, spontaneous mode and a CS mode. In the spontaneous mode, the PU-SP compartment is active, providing the tonic inhibition of the deep nuclei. Once a CS is presented, this activity is suppressed through inhibition. The duration of this suppression is matched to the duration of the CS trace in PU-SYN. Tonic inhibition from then on is under the control of the PF. To support the learning mechanism outlined, the model needs to account for the acquisition of a pause in Purkinje cell activity following a CS. Only this would lead to a rebound excitation in the deep nucleus.
Synaptic plasticity in the model
The processes contributing to the learning effect in the cerebellum are LTD and LTP. Many experiments have shown that the synaptic weights between the PF and the PU undergo plasticity (Ito, 1989; Aizenman et al., 1998). In order to learn, the weight has to be altered during the conditioning process. According to the Marr-Albus theory (Marr, 1969; Albus, 1971), which was implemented in the model, LTD can only occur in the presence of an active stimulus trace once the CF is activated; thus, alterations can only happen if there is a stimulus trace of a CS (A_PU-SYN > 0) in the PU dendrites (PU-SYN). Such a CS trace, which is believed to be formed by a prolonged metabolic second-messenger response in Purkinje cells following parallel fiber stimulation, was included in the model through high persistence values of PU-SYN. PF stimulation alone leads to a weak net increase of the connection strength of the PF-PU synapse, or LTP, while activation of a CF in the presence of an active stimulus trace leads to a net decrease, or LTD.
Long-term Potentiation Rule In the present model E^{LTP}_{min} and E^{LTP}_{max} define the range in which an LTP can be triggered. If E_i \in [E^{LTP}_{min}, E^{LTP}_{max}]:

w_{ij}(t+1) = w_{ij}(t) + \eta \, (w^{max}_{ij} - w_{ij}(t))    (5.4)

otherwise:

w_{ij}(t+1) = w_{ij}(t)    (5.5)
η describes the rate constant for the potentiation. The chosen values for these parameters allow several weak potentiation events following a PF input.
Long-term Depression Rule The magnitude of the long-term depression in the present model is determined by the internal calcium concentration. As the model described in this work is an abstract and reduced description of the cerebellum, the notion of an internal calcium concentration has to be seen as the internal trace of a past CS event. Work from Coesmans and colleagues (Coesmans et al., 2004) supports the concept of such a calcium dependent response. They observed that the bidirectional PF long-term plasticity is governed by a calcium threshold mechanism, which is characterized by a high calcium threshold for LTD and a lower calcium threshold for LTP. The minimal value for an LTD to be triggered is defined as E^{LTD}_{min}:

w_{ij}(t+1) = \begin{cases} \epsilon \, w_{ij}(t) & \text{if } E_i > E^{LTD}_{min}; \\ w_{ij}(t) & \text{otherwise,} \end{cases}    (5.6)

where ε describes the rate constant of the depression.
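Taken together, equations 5.4-5.6 amount to the following update. This Python sketch uses illustrative thresholds and rate constants and represents the calcium level and the CS trace by a single scalar; it is a simplified reading of the rules, not the model's implementation.

    ETA, EPSILON = 0.005, 0.95          # potentiation rate / depression factor
    W_MAX = 1.0
    E_LTP_MIN, E_LTP_MAX = 0.1, 0.5     # range in which LTP is triggered
    E_LTD_MIN = 0.5                     # calcium threshold for LTD

    def pf_pu_update(w, e_i, cf_active):
        # e_i: internal trace of a past CS event ("calcium level");
        # cf_active: US-driven climbing fibre input.
        if cf_active and e_i > E_LTD_MIN:
            return EPSILON * w                  # LTD (eq. 5.6)
        if E_LTP_MIN <= e_i <= E_LTP_MAX:
            return w + ETA * (W_MAX - w)        # LTP (eq. 5.4)
        return w                                # no change (eq. 5.5)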
5.2.4 Integrating the NLS with the SLS
The connection of the non-specific learning system, responsible for the cortical CS representation, and the specific learning system, responsible for the exact timing of the CR, produces an integrated model of the two phase theory of learning. In our study the pontine nucleus has a gating function, allowing the transmission of stimuli with behavioral importance. The pontine nucleus is modeled as an integrate-and-fire neuron i with a membrane potential at time t+1, V_i(t+1):

V_i(t+1) = \beta V_i(t) + E_i(t) + I_i(t)    (5.7)
where β ∈ [0, 1] is the persistence of the membrane potential, which defines the speed of the decay towards the resting state, and E_i(t) and I_i(t) are the excitatory and inhibitory inputs at time t.
The functionality of the integrated model depends on the quality of the cortical representation, which is a function of the strength of the STDP-induced plasticity in the auditory cortex, and on the gating threshold of the pontine nucleus. The model transmits only stimuli with behavioral importance from the NLS to the SLS.
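A minimal sketch of this gate, assuming equation 5.7 plus a fixed firing threshold, is given below; both parameter values are illustrative placeholders.

    BETA = 0.9         # persistence of the membrane potential
    THETA_GATE = 1.0   # gating threshold for transmission to the SLS

    class PontineNucleus:
        def __init__(self):
            self.v = 0.0

        def step(self, excitation, inhibition):
            # Integrate-and-fire update (eq. 5.7); only a sufficiently strong
            # cortical CS representation drives the potential above threshold.
            self.v = BETA * self.v + excitation + inhibition
            return 1.0 if self.v > THETA_GATE else 0.0

    pn = PontineNucleus()
    # After conditioning, an amplified CS representation opens the gate.
    print([pn.step(0.6, 0.0) for _ in range(5)])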
We tested the performance of our network in an eye-blink conditioning simulation. The auditory cortex was constructed as a cell array of 50 units. All the other components of the model were built from single cell units. The CS was a conceptualized auditory stimulus coded as a pattern of 5 active cells in the array of 50. Thirty trace conditioning trials with a CS exposure time of 400 ms and a US exposure time of 100 ms were applied to the model. To check the performance of the model, the CS and 4 different control stimuli were presented after the conditioning phase.
5.2.5 Robot Application
In a second step we verified the reliability of the model by checking the performance of an autonomously behaving robot in an obstacle avoidance task (Figure 5.3). In an open field arena the robot had to learn to avoid any collision with the wall by detecting a red color patch with a camera. In this setup the detection of the wall by the robot's proximity sensors was used as the US, and the detection of the red color by the camera as the CS. Because the visual field of the camera exceeded the sensitive range of the proximity sensors, the detection of the CS always preceded the US. To keep the interstimulus interval constant, the velocity of the robot did not change. The robot moved around freely until a conditioned response, in the form of an exactly timed turn, was observed.
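The behavioral loop can be sketched as follows. The sensor and motor functions are hypothetical stubs standing in for the ePuck interface, and the learning dynamics are collapsed into a single scalar weight; the trial timing and eligibility trace of the full model are omitted.

    import random

    def camera_sees_red():         # CS: red floor patch in view (stub sensor)
        return random.random() < 0.10

    def proximity_detects_wall():  # US: wall within proximity range (stub sensor)
        return random.random() < 0.05

    def turn():                    # avoidance turn (stub motor command)
        pass

    w_pf_pu = 1.0        # PF-PU efficacy; LTD lowers it on CS-US pairings
    CR_THRESHOLD = 0.4   # below this, the PU pause triggers the early turn

    for _ in range(10000):
        cs, us = camera_sees_red(), proximity_detects_wall()
        if cs and w_pf_pu < CR_THRESHOLD:
            turn()                 # conditioned response: early turn
        elif us:
            turn()                 # unconditioned reflex: late turn
            if cs:
                w_pf_pu *= 0.95    # CS-US coincidence induces LTD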
Figure 5.3: Robot application: An ePuck robot moves autonomously in a circular open field arena. The association of the red color on the floor detected by a camera (CS) and the detection of the wall by proximity sensors (US) induces learning in the proposed computational mechanism. The green arrows indicate the moving direction of the robot.
5.3 Results
5.3.1 Performance of the Integrated Model
We recorded the activity of the auditory cortex (AC), the Purkinje cells (PU) and the deep nucleus (DN) before, during and after the eye-blink conditioning simulation. An analysis of the learning curves of the NLS and the integrated model was made to verify the timing of the adaptive processes. The response was quantified by analyzing the spiking behavior of the different cell groups. In the non-specific learning system we observe, before conditioning, a homogeneous response intensity (Figure 5.4, left). The co-activation of cortical cells by the amygdala-nucleus basalis pathway and the thalamic pathway induces a tonotopic reorganization of the different tones. After conditioning, a stronger response of cortical cells to the CS can be observed (Figure 5.4, right). Once the CS representation has exceeded a critical threshold, the pontine nucleus transmits
the signal to the specific learning system.
Figure 5.4: Reactivity of the auditory cortex before and after the conditioning. The CS is the stimulus with ID 1. Before the conditioning, the cortical reaction to all 5 stimuli is homogeneous. After the conditioning, the cortical response to the CS is increased.
The plasticity in the integrated model starts when the activations of the Purkinje cell from the parallel fibers (CS) and the climbing fibres (US) coincide. This co-activation of the Purkinje cell induces LTD, which results in a decrease of the Purkinje cell activity (Figure 5.5). During conditioning trial 12 this activity falls below the threshold for the first time, causing the dis-inhibition of the deep nucleus, which leads to the elicitation of a first imprecisely timed motor reaction. As long as the CR is not optimally timed, the sustained LTD induction results in an ongoing decrease of the Purkinje cell activity until the exactly timed CR is established. Before conditioning no increased AC activity can be observed and the PU keeps its inhibition of the deep nucleus constant (Figure 5.6). After conditioning the AC reacts with an increased firing rate and the PU-induced pause releases the deep nucleus from inhibition, producing the rebound excitation responsible for the exactly timed conditioned response (Figure 5.7).
Figure 5.5: Learning of the exactly timed CR by the SLS: The PU cell activity decreases during conditioning trials 1-13. During trial 12 the activity falls below the threshold for the first time, resulting in the dis-inhibition of the deep nucleus. During trial 13 the PU cell activity falls below the threshold before the US, and an exactly timed CR is triggered. The CS and the US are only schematically represented in this plot.
Figure 5.6: The performance of the integrated model before the conditioning. The Purkinje cell (PU) does not change its activity and no CR is elicited. CS conditioned stimulus, US unconditioned stimulus, AC auditory cortex, PU Purkinje cell, CR conditioned reaction.
Figure 5.7: The performance of the model after the conditioning. The CS representation in the auditory cortex (AC) is increased. A delayed pause in the Purkinje cell (PU) can be observed. The CR is elicited just before the US presentation. CS conditioned stimulus, US unconditioned stimulus, AC auditory cortex, PU Purkinje cell, CR conditioned reaction.
5.3.2 Performance of the Robot
At the beginning of the robot experiment the ePuck drives over the red area (CS) until it detects the wall of the arena with its proximity sensors (US). The late turn can be classified as an unconditioned response or reflex (Figure 5.8). A co-activation of the Purkinje synapse (PU-SYN) by the CS and the US induces LTD, decreasing the synaptic weight of the PF-PU synapse (Figure 5.10). After 113 conditioning trials the robot performs a conditioned response for the first time, in the form of an early turn. From this point on the robot avoids the wall as soon as the camera detects the red color (Figure 5.9).
Figure 5.8: The behavior of the ePuck robot before conditioning. The robot enters the red area of the arena. The proximity sensors detect the wall (US) and elicit the unconditioned response (UR) in the form of a late turn. The blue line indicates the track of the robot in the arena.
Figure 5.9: The behavior of the ePuck robot after conditioning. The robot does not enter the red area of the arena. The camera detects the red color (CS) and the model elicits a conditioned response (CR) in the form of an exactly timed turn. The blue line indicates the track of the robot.
Figure 5.10: The change of the synaptic weight at the level of the PF-PU synapse during the robot experiment. Every time a CS and a US coincide at the level of the Purkinje synapse, LTD is induced. Once the synaptic efficacy reaches a critical level, a conditioned response is triggered, avoiding further LTD induction, and the synaptic weight becomes stable.
Figure 5.11: The performance of the ePuck robot measured as the percentage of performed conditioned responses and occurring USs. After 113 trials the robot shows conditioned behavior. The fluctuation in responding is due to a spontaneous recovery of the synaptic transmission at the Purkinje cell. Whiskers indicate the standard deviation.
5.4 Conclusion
We have presented an integrated model of the two phase theory of conditioning including neurobiological constraints of the non-specific and specific learning systems. In a simulated eye-blink conditioning experiment we demonstrated, in a first step, that the model increases the cortical representations of stimuli with behavioral importance. The model's capability to gate those representations to the specific learning system induces the adaptation of the exactly timed CR. The performance of the NLS is controlled by the biologically based STDP, taking into account the effects of back-propagating action potentials. The inhibition of these back-propagating APs by inhibitory interneurons is a fundamental mechanism controlling the strength of the STDP. The performance of the specific learning system is controlled by the rates of LTP and LTD and by the CS trace at the level of the Purkinje cell. By integrating the circuits of the non-specific and specific learning systems we demonstrate how cortical plasticity supports effective cerebellar associative learning.
Chapter 6
CONSTRUCTING AN EMOTIVE ANDROID
So far we have presented studies that either investigated the perception of emotions or the underlying computational mechanisms. In this chapter we introduce a study that combines these two approaches by constructing an emotive android. While robots can have any form and function, an android is defined as a synthetic system that is designed to look and behave like a human. In recent years we have observed a dramatic increase in the number of such robots. In the present study we implemented a model of fear in a humanoid robot to control its behavior. The results presented here are collaborative work by Zenon Mathews, Etienne Roesch, Can Erogul, Cassandra Gould and myself.
6.1 The Neurobiological Mechanism of Fear
The evaluation of a threatening situation and the elicitation of an appropriate response to it is one of the most important survival mechanisms (LeDoux, 1996). Fear is an emotion that has been intensively studied using the behavioral paradigm of fear conditioning (Maren, 2001). This paradigm is based on the association of a neutral stimulus, like a tone, with an aversive stimulus, like a foot shock, resulting in the expression of fear
responses to the original neutral stimulus. The neutral stimulus is called the 'conditioned stimulus' (CS), the aversive stimulus the 'unconditioned stimulus' (US), and the response the 'conditioned response' (CR). A classic example is an animal learning to freeze as a response to a conditioned stimulus (see figure 6.1).
Figure 6.1: During the conditioning phase (left panel) an animal is exposed to a neutral tone (CS) and an aversive foot shock (US). After the conditioning phase (right panel) the animal reacts with a freezing response when exposed to the originally neutral tone (CS). Figure adapted from Nadel and Land (2000).
The processing of the valence quality of a stimulus, or its relevance, has been located in the subcortical cluster called the amygdala (LeDoux, 2000; Paton et al., 2006; Tazumi and Okaichi, 2002; Sander et al., 2003). The amygdala is highly connected to modulatory sub-systems and behavioral response centers in the brain stem (Aggleton, 1992; LeDoux, 1996). Two different pathways transmitting signals to the amygdala have been identified (see figure 6.2). The low route transmits the signal, without conscious experience, over the thalamus to the amygdala. It is the fast route to a non-specific bodily response. The high route is activated simultaneously and involves cortical clusters in the evaluation of the importance of the stimulus and the elicitation of specific stimulus-directed responses. This process takes more time but provides more information about the importance of the stimulus.
Recently it has been proposed that the association of CS and US in
Figure 6.2: An aversive stimulus is transmitted by two pathways to the amygdala: The low route transmits the sensory information directly from the thalamus to the amygdala. This route is fast and responsible for unspecific behavioral responses. The high route sends the sensory input to cortical areas for the evaluation of the stimulus features. This route is slower, but capable of eliciting more specific cognitive and behavioral responses. Figure adapted from LeDoux (1994).
the amygdala is based on the Hebbian plasticity mechanism (Armony et al., 1997; Johnson et al., 2008). Hebb's rule states that any two cells or systems of cells that are repeatedly active at the same time will tend to become 'associated', so that activity in one facilitates activity in the other (Hebb, 1949). The theory is often summarized as 'cells that fire together, wire together'. In the context of a fear conditioning paradigm this means that the co-activation of two information streams induces plasticity at the synaptic level. This mechanism has been supported by different animal studies using the fear conditioning paradigm (see figure 6.3):
Figure 6.3: The processing of a neutral CS and an aversive US. When CS
and US coincide at the location of the amygdala, learning is induced. The
results are different physiological and behavioral responses. LA lateral
amygdala, CE central amygdala, CG central gray, LH lateral hypothalamus, PVN paraventricular hypothalamus. Figure adapted from Medina
et al. (2002)
In the following study we use this paradigm of fear conditioning to
equip a humanoid agent with learning capabilities to control appropriate
emotional expressions.
6.2 Embodied Emotive Model
6.2.1 Model Architecture
We constructed a neurobiologically constrained model of fear conditioning that consists of 3 subunits: the visual thalamus, the auditory thalamus and the amygdala (see figure 6.4). This system is capable of processing a visual CS (a red or blue color) and an aversive US (a loud tone). When the CS and the US coincide in the amygdala, an adaptation in plasticity is induced. After the conditioning, the CS alone is able to trigger the behavioral response.
The implemented Hebbian learning rule changes the weight of the synaptic transmission in the amygdala:

w_{i,j} = \frac{1}{p} \sum_{k=1}^{p} x_i^k x_j^k    (6.1)

where w_{i,j} is the weight of the connection between neurons i and j, p is the number of training patterns, and x_i^k the kth input for neuron i. The implemented slow decay of activity in the activated cells defines the time window sensitive for learning.
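Equation 6.1 translates directly into code; the two-unit example below (one visual CS unit, one auditory US unit) is purely illustrative and omits the activity-decay time window.

    import numpy as np

    def hebbian_weight(x_i, x_j):
        # w_ij = (1/p) * sum_k x_i^k * x_j^k over p training patterns (eq. 6.1)
        p = len(x_i)
        return np.dot(x_i, x_j) / p

    # Five conditioning trials in which the blue CS and the noise US co-occur.
    cs_activity = np.ones(5)
    us_activity = np.ones(5)
    print(hebbian_weight(cs_activity, us_activity))   # -> 1.0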
As an experimental platform we used the iCub, a humanoid robot that is equipped with multiple sensors and the capability to communicate its emotional state using facial expressions (Sandini et al., 2007). The iCub has two color cameras and two microphones to detect visual and audio input.
6.2.2 Experimental Design
We conditioned the iCub using an aversive audio noise (US) and the colors blue (CS) and red (non-conditioned stimulus, NS). The behavioral
Figure 6.4: Schematic representation of the fear conditioning model. The
visual stimulus and the audio stimulus are transmitted over the thalamus
to the amygdala where they coincide. This co-activation induces an adaptation of the synaptic weight. After conditioning the change in synaptic
weight allows the CS to trigger the behavioral response.
Figure 6.5: The iCub uses LED lights to express different emotions in the face. The picture shows the angry expression that was used in the present study.
response was either an angry facial expression (CR) or a happy facial expression (UR) (see figure 6.6).
Figure 6.6: Experimental design of the fear conditioning in the iCub. The association of a neutral CS with an aversive US induces a change in plasticity. After conditioning the CS alone is capable of eliciting the behavioral response. A non-conditioned stimulus (NS) still elicits an unconditioned response after the conditioning phase.
6.2.3 Conditioning
Before the conditioning the iCub smiles when it detects either of the two colors (see figure 6.7 A and B). When solely listening to the noise (US), the robot expresses an angry face. During conditioning the iCub was exposed to the blue color (CS) and to 4-5 noise events (US). The number of events depends on the adaptation of the synaptic weight defined by formula 6.1. After the conditioning the robot responds with an angry face when seeing the blue hat, but still with a happy face when seeing the red backpack (figure 6.7 C and D). A video of the conditioning procedure can be seen on our YouTube channel: www.youtube.com/user/SpecsUPF
Figure 6.7: The conditioning phase of the iCub. Before conditioning the iCub smiles when seeing either red (A) or blue (B). During the conditioning phase the robot sees the blue color while hearing 4-5 aversive noise events (C). After the conditioning the robot reacts with an angry face when seeing the blue hat (E), but still smiles when seeing the red color (D).
6.2.4 Discussion & Conclusion
Fear is one of the most important emotions, responsible for the fast evaluation of a situation and the elicitation of fight or flight responses. Using the paradigm of fear conditioning, the processing of this emotion has been extensively studied. In this study we successfully implemented a neurobiologically constrained model of fear conditioning in a humanoid robot. We used this model to perceive different types of stimuli, to learn to associate some of them, and to use this association to elicit appropriate expressive behavior.
The construction of a robot that is not only capable of making logical calculations but can interact in a socially meaningful way with its environment and other people is a big ongoing challenge. An artificial system that can achieve this task has to be capable of perceiving and understanding the valence quality of a situation and of integrating this perception with experiences already stored in memory to express appropriate responses.
Humans have different senses to perceive different types of valence qualities. One of the most basic valence qualities is pain. Unfortunately, we find very few examples of robots equipped with such receptors. This reflects a missing conceptual shift in how humanoid robots that aim to interact with people should be constructed. In our study we used a loud noise as an aversive stimulus.
The processing of the valence quality and the elicitation of an appropriate response are very fast mechanisms in nature. Robots still lag behind this benchmark. The problem emerges in robot platforms that use different operating systems running on different machines. Such systems cannot provide real-time processing and communication between software and hardware. This means that the transmission of the stimulus, its processing, and, most importantly, the control of behavior are subject to delays ranging from milliseconds to seconds. In our study we therefore had to adapt the timing of the plasticity change and of the elicitation of the response. This problem will hopefully be solved by newer, more powerful machines that use optimized system architectures.
Despite these technical restrictions, we were able to control the behavior of a robot using the paradigm of fear conditioning. Our system is capable of perceiving the valence quality of a stimulus and of associating it with the neutral meaning of another stimulus. The implementation of this system in a humanoid robot equips the agent with the capability to process emotive content.
So far we have evaluated the system's performance by observing its behavior in a real-world interaction. In future steps we also want to quantify the learning performance of the computational circuit. Additionally, it would be interesting to extend the model with different types of valence stimuli and expressive behaviors. We therefore propose that basic experimental approaches can be used to construct more complex emotive systems.
6.3 Proposal for an Advanced Emotive Architecture

6.3.1 Theoretical Basis
Emotions are structured processes that emerge over time to evaluate the valence quality of an internal or external stimulus. These stimuli can address very basic needs, like the regulation of the internal milieu, or be quite complex, for example the evaluation of social signals. This implies that the underlying mechanism of appraisal involves different levels of processing addressing different levels of complexity (Leventhal and Scherer, 1987).
Table 6.1: Levels of processing for stimulus evaluation checks. Adapted from Leventhal and Scherer (1987).

Level           Pleasantness   Goals/Needs   Coping potential
Sensory-motor   Innate         Basic         Available energy
Schematic       Learned        Acquired      Body schemata
Conceptual      Recalled       Conscious     Problem solving
Scherer et al. proposed the component process model (CPM) of emotions to deal with this problem (Scherer, 2001; Sander et al., 2005). This model defines the genesis of an emotion by a layered appraisal mechanism. This evaluation is described in terms of the following proposed objectives:

1. Relevance - Is the stimulus relevant for the individual? Does it require the deployment of attention or further information processing?

2. Implication - What are the potential consequences of the stimulus for the individual?

3. Coping - Does the individual have sufficient resources to cope with the consequences of the event?

4. Normative significance - How does the stimulus relate to the individual's social or personal norms and standards?
Each of these objectives encompasses more subtle cognitive appraisals, dubbed stimulus evaluation checks (SECs), the interaction of which yields the differentiation of the ensuing emotion (see table 6.1). Throughout the appraisal process, the evaluative function of the checks increases in complexity. Core to this theory is the proposal that appraisals occur sequentially (Grandjean and Scherer, 2008) and in turn influence each of the five components of emotion (see figure 6.8).
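To make the sequential nature of these checks concrete, the following sketch runs the four appraisal objectives in a fixed order, each check able to read the outcomes of the earlier ones. The check functions and the event representation are hypothetical simplifications for illustration, not part of the CPM itself:

def relevance(event, state):
    # Does the stimulus require attention and further processing?
    return event["intensity"] > 0.5

def implication(event, state):
    # Potential consequences: here, the negative valence of a relevant event.
    return -event["valence"] if state["relevance"] else 0.0

def coping(event, state):
    # Are the available resources sufficient for the implied consequences?
    return event["resources"] >= state["implication"]

def normative_significance(event, state):
    # Does the event conform to the individual's norms and standards?
    return event["valence"] >= 0

CHECKS = [relevance, implication, coping, normative_significance]

def appraise(event):
    # Run the stimulus evaluation checks in sequence; later checks can
    # read the outcomes of earlier ones, as proposed by the CPM.
    state = {}
    for check in CHECKS:
        state[check.__name__] = check(event, state)
    return state

print(appraise({"intensity": 0.9, "valence": -0.7, "resources": 0.4}))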
Figure 6.8: The component process model (Scherer, 2001; Sander et al., 2005). Represented are the five components of emotion (vertical) as well as the sequence of appraisals (horizontal) and the interaction between subsystems that gradually shape the emotion, supporting the genesis of a particular feeling.
6.3.2 Distributed Adaptive Control
In the previous sections we introduced a computational model of the two-phase theory of conditioning. This model distinguishes two different associative learning mechanisms: a fast non-specific learning system (NLS), which produces a global state of arousal within 1 to 5 trials and elicits simple self-protective reaction patterns, and a slow specific learning system (SLS), which is responsible for the accumulation of fine-tuned motor reactions over a longer period of conditioning. We now want to extend this model to a multilayered appraisal and control structure. Following the theoretical basis of the component process model, we propose the Distributed Adaptive Control (DAC) as an advanced computational architecture for emotive processing (Verschure et al., 2003; Duff and Verschure, 2010). The architecture of DAC is structured into three layers: reactive, adaptive and contextual. The reactive layer is responsible for innate, prewired reflexes; in the stimulus evaluation checks of the CPM this addresses the sensory-motor level (see table 6.1). The reflexes of the reactive layer provide cues for learning in the adaptive layer of DAC; in the stimulus evaluation checks of the CPM this addresses the schematic level. The acquired representations in the adaptive layer provide inputs for the contextual layer, which stores sequential representations; this mechanism corresponds to the conceptual level of the stimulus evaluation checks of the CPM. We propose to model each stage of the CPM using the three-layered architecture of the distributed adaptive control. Relevance can be addressed by the action space of our system (see figure 6.9, red panel). Implication and Coping will be modeled by the self-introspection mechanism; this circuit evaluates the allostatic control and the higher cognitive goals of the agent (see figure 6.9, green panel). The model of the three components will produce a parallel architecture of three sub-DAC systems, each dealing with one of the components. Normative significance is the most complex stage because it needs a model of self. Future models of DAC have to propose such a structure to be able to process how a stimulus relates to the agent's social or personal norms.
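As a conceptual sketch of this mapping, the three layers and their interaction could be expressed as follows. The class interfaces and the toy stimulus representation are illustrative assumptions, not the actual DAC implementation:

# Conceptual sketch of the three DAC layers mapped onto the CPM levels.
class ReactiveLayer:                 # CPM sensory-motor level: prewired reflexes
    def process(self, stimulus):
        reflex = "withdraw" if stimulus.get("aversive") else None
        return reflex                # the reflex also serves as a learning cue

class AdaptiveLayer:                 # CPM schematic level: learned associations
    def __init__(self):
        self.associations = {}
    def process(self, stimulus, reflex_cue):
        if reflex_cue:               # reflexes gate associative learning
            self.associations[stimulus["id"]] = reflex_cue
        return self.associations.get(stimulus["id"])

class ContextualLayer:               # CPM conceptual level: sequential memory
    def __init__(self):
        self.sequence = []
    def process(self, percept):
        if percept:
            self.sequence.append(percept)
        return list(self.sequence)

reactive, adaptive, contextual = ReactiveLayer(), AdaptiveLayer(), ContextualLayer()
stim = {"id": "blue", "aversive": True}
cue = reactive.process(stim)
percept = adaptive.process(stim, cue)
plan = contextual.process(percept)
print(cue, percept, plan)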
Figure 6.9: The system architecture of DAC. The system consists of three tightly coupled layers: reactive, adaptive and contextual. The reactive layer endows a behaving system with a prewired repertoire of reflexes (low-complexity unconditioned stimuli and responses) that enable it to display simple adaptive behaviors. The activation of any reflex, however, also provides cues for learning that are used by the adaptive layer via representations of internal states, i.e. aversive and appetitive. The adaptive layer provides the mechanisms for the adaptive classification of sensory events and the reshaping of responses. The sensory and motor representations formed at the level of adaptive control provide the inputs to the contextual layer, which acquires, retains, and expresses sequential representations using systems for short- and long-term memory. The contextual layer supports goal-oriented learning and reflexive mechanisms.
6.3.3 Conclusion
Based on the empirical results of our models of conditioning, we propose to extend their complexity on a perceptual and behavioral level. To do so we use the theoretical framework of the component process model, which states that the stimulus evaluation process can be layered into three levels: sensory-motor, schematic and conceptual. This framework theoretically supports the proposed control mechanism of the distributed adaptive control (DAC).
Chapter 7
CONCLUSION
In this dissertation, we addressed the issue of understanding the phenomenon of human emotions. To do so we posed the question of how we can construct embodied models of emotions. Following this methodology, we implemented neurobiological and psychological models of affect expression in autonomous behaving agents to investigate both the underlying neuronal mechanisms and the perception of affect. This approach allowed us to investigate and revise existing theories by comparing physiological data from behavioral and neurophysiological experiments with the performance of our models. In a second step we used a computational model of conditioning to control the expressive behavior of an android robot. Based on the findings of these studies, we propose an emotive architecture that can be used to control the behavior of android robots that aim to interact socially with humans. The contributions of this thesis add to a deeper understanding of the multidimensional phenomenon of emotions on three levels: perception, interaction, and how the processing of emotional cues influences learning and behavior.
In the first part we investigated the perception of emotional behavior and its impact on social interaction. Humans use a complex code of verbal communication and non-verbal behavior to express their emotions and intentions. The perception of the physical presence of others is probably one of the most basic social interaction patterns in humans (Hall, 1966; Baldassare, 1978). Before investigating complex aspects of emotion processing and affect perception, we posed the basic question of how humans perceive the physical presence of other humans and of virtual avatars. We hypothesized that the perception of a virtual avatar is less salient compared to a real human and that this decrease in salience has a fundamental impact on social interaction on a spatial scale. This concept is known in psychology as the law of apparent reality (Frijda, 1988) or the 'vividness effect' (Borgida and Nisbett, 1977; Baddeley and Andrade, 2000; McCabe and Castel, 2008). As an experimental paradigm to study this question, we constructed a collaborative mixed reality ball game in which two teams of two players had to coordinate their spatial movements. This game could be played either by physical players present in the space or by remote players controlling a virtual avatar. The results of our study show that the spatial interaction patterns of winners differ significantly from those of losers (Inderbitzin et al., 2009). This social interaction is fundamentally influenced by the salience of the interactors (Inderbitzin et al., submitted). Our empirical data support the concept that the salience of a stimulus acts as a gating mechanism for cognition and behavior. We propose this concept as a general mechanism of perception and behavioral control. In our study we showed that humans perceive virtual agents as less salient and that this difference in vividness induces a fundamental adaptation of their interaction patterns. These results contribute to a better understanding of how the stimulus salience of perceiving another person influences social interaction. Understanding this effect has important implications for the construction of interactive virtual emotive agents that aim at social interaction with humans.
The results of our first study showed that the perception of others influences the regulation of interpersonal distance, a subtle code of social interaction. This raises the question of what additional non-verbal behaviors transmit the intentions of others. In our second study we investigated how people perceive the expression of emotional states based on the observation of different styles of locomotion (Inderbitzin et al., 2011). Our goal was to find a small set of canonical parameters that allow us to control a wide range of emotional expressions. The results showed that, independent of the viewing angle, participants perceived distinct states of arousal and valence. Moreover, we could show that parametrized body posture codes emotional states, irrespective of contextual influences or facial expressions. Our results show that human locomotion transmits basic emotional cues that can be directly related to the modulation of the two canonical parameters, speed and head/torso inclination. These findings are important for the understanding of how humans perceive non-verbal behavior. The knowledge acquired from this investigation allows us to build virtual characters whose emotional expression is recognizable at large distances and over extended periods of time.
We know that human communication is a multidimensional stream of non-verbal and verbal features. So far our studies analyzed the perception of non-verbal codes. In our third study we addressed the question of how humans perceive and integrate emotional meaning transmitted by facial expressions and by the meaning of spoken words. We used an expressive dialog system to investigate how humans perceive and integrate the linguistic semantics and the expressive dimension of an affective stimulus construct. Differences in reaction times when judging coherent and incoherent stimulus constructs were used to evaluate the automaticity of affect processing in the two dimensions. We tested the fuzzy logical model of perception (FLMP) against an additive model to investigate the underlying psychological mechanism of affect perception. Using a computer-animated avatar face, we constructed a stimulus continuum from angry to happy facial and linguistic expressions, yielding a stimulus space that transmits various degrees of coherent or incoherent affect. Subjects were instructed to judge the affect of the facial expression, the affect of the meaning of the word, or the affect of the global event combining these two properties. Both properties influenced judgments as described by the FLMP when participants responded quickly. With increasing reaction time, the FLMP did not make better predictions than other models of perception. Reaction times increased when subjects had to rate stimuli that coded incoherent valence qualities in the two modalities. These results indicate that people cannot avoid the perception of affect, even when they are instructed to do so. Masked-priming experiments support this interpretation of our data (Esteves et al., 1994; Morris et al., 1998; Bargh, 1989; Kihlstrom, 1987). When participants had enough time, they did not integrate a multimodal affective stimulus according to the mechanism of the FLMP. We conclude that the perception of affect in multiple modalities is an automatic process that can produce interferences, while the integration of these modalities into a global impression is more controlled.
So far we have investigated the perception of emotions and its relation to social interaction. The results of these studies contribute to the understanding of the perceptual dimension of emotions. In the second part of this thesis we focused on the computational mechanisms of emotions. In the first study we wanted to investigate the neuronal plasticity responsible for the elicitation of an appropriate behavioral response to an aversive stimulus. We investigated this question using the experimental paradigm of classical conditioning. According to Konorski's two-phase theory of conditioning, the associative processes underlying classical conditioning can be separated into a fast, valence-driven non-specific learning system (NLS) and a slow specific learning system (SLS) (Konorski, 1948; Konorski, 1968). The theory states that the NLS elicits a non-specific state of arousal and that the SLS is responsible for the exact elicitation of a coordinated motor response (Ellison and Konorski, 1964). Based on biological evidence, we propose the amygdala, the basal forebrain and the auditory cortex as an example of the NLS (Sanchez-Montanes et al., 2002) and the cerebellum for the SLS (Hofstotter et al., 2002). The performance of the model was tested by applying the eye-blink paradigm of classical conditioning. The stimulation of the nucleus basalis by the amygdala, induced by the unconditioned stimulus, elicits plasticity in the NLS. This leads to an increased representation of the conditioned stimulus in the cortex. The plasticity of the cerebellar SLS was regulated by this increased cortical representation, which codes the behavioral importance of the conditioned stimulus (Inderbitzin et al., 2010a). To verify the credibility of our model we connected it to an autonomous robot that had to achieve an obstacle avoidance task; the behavioral performance was used as a benchmark (Inderbitzin et al., 2010b). The results of these studies provide a complete account of Konorski's proposal by integrating these two systems into a complete biologically grounded computational model of the two-phase theory of classical conditioning.
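The qualitative dissociation between the two systems can be illustrated with two learning rates. The parameter values below are assumptions chosen only to mimic the fast/slow contrast, not fitted model parameters:

# Illustrative contrast: the NLS saturates within a few trials,
# while the SLS accumulates its response slowly over conditioning.
FAST_RATE, SLOW_RATE = 0.6, 0.05

nls, sls = 0.0, 0.0
for trial in range(1, 31):
    nls += FAST_RATE * (1.0 - nls)   # non-specific arousal: asymptote in ~1-5 trials
    sls += SLOW_RATE * (1.0 - sls)   # specific motor response: slow acquisition
    if trial in (3, 30):
        print(f"trial {trial}: NLS={nls:.2f}, SLS={sls:.2f}")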
As a next step we applied the knowledge gained from the previous studies to design an emotive android agent that is capable of learning the valence quality of a stimulus and of producing an appropriate expressive response. Based on the results of studies investigating the mechanism of fear, we constructed a neurobiologically constrained model of the amygdala. This model used a Hebbian plasticity mechanism (Armony et al., 1997; Johnson et al., 2008) to associate an unconditioned stimulus with a conditioned stimulus and to elicit an appropriate behavioral response in the humanoid robot iCub. The performance of the model was tested in a real-world setup, exposing the robot to different colors (CS) and an aversive tone (US). The correct elicitation of a conditioned response (CR) when exposed to the CS after the conditioning phase was used as a benchmark. In this study we successfully implemented a neurobiologically based model of fear conditioning in the control architecture of an android. The analysis of our results helps us to understand the connection between somatic and cognitive processes involved in the control and elicitation of emotions. The fear mechanism is a very fast and efficient process in nature. While neuronal modeling software can learn in real time, the iCub robot platform still lags behind in the speed of behavioral control. This restriction has to be considered in the future design of studies addressing real-time social interaction with android robots.
We investigated different aspects of emotions using embodied emotive models. The results of our studies contribute to the understanding of the phenomenon of emotions on three levels: perception, interaction, and how learning affects behavioral control. As a main contribution we propose a biologically inspired architecture of emotion processing that can control the behavior of an android. The results of this thesis show that embodied emotive models can be successfully used to investigate human psychology. They also show that we are capable of equipping android agents with synthetic emotions. Robots and virtual avatars equipped with such psychologically and neurocomputationally inspired mechanisms will dramatically increase in number over the next decades. This will have a profound impact on modern society's hegemonic, economic and socio-cultural development.
Bibliography
R Adolphs, D Tranel, H Damasio, and A Damasio. Impaired recognition of emotion in facial expressions following bilateral damage to the
human amygdala. Nature, 372(6507):669–672, 1994.
J P Aggleton. The Amygdala. Wiley-Liss, Inc., New York, 1992.
C D Aizenman, P B Manis, and D J Linden. Polarity of long-term synaptic gain change is related to postsynaptic spike firing at a cerebellar
inhibitory synapse. Neuron, 21(4):827–835, 1998.
J S Albus. A theory of cerebellar function. Mathematical Biosciences,
10:25–61, 1971.
J S Albus. A new approach to manipulator control: the cerebellar model
articulation controller. Journal of Dynamic Systems, Measurement, and
Control, 97:220–227, 1975.
A K Anderson and E A Phelps. Lesions of the human amygdala impair
enhanced perception of emotionally salient events. Nature, 411(6835):
305–9, 2001.
M Argyle. Bodily Communication. Methuen, 2nd edition, 1988.
M Argyle and J Dean. Eye-contact, distance and affiliation. Sociometry,
28(3):289–304, 1965.
J L Armony, D Servan-Schreiber, J D Cohen, and J E LeDoux. Computational modeling of emotion: explorations through the anatomy and physiology of fear conditioning. Trends in Cognitive Sciences, 1(1):28–34, 1997.
J L Armony, G J Quirk, and J E LeDoux. Differential effects of amygdala
lesions on early and late plastic components of auditory cortex spike
trains during fear conditioning. The Journal of Neuroscience, 18(7):
2592–601, 1998.
M Arnold. Emotion and Personality. Columbia University Press, New
York, 1960.
A P Atkinson, W H Dittrich, A J Gemmell, and A W Young. Emotion
perception from dynamic and static body expressions in point-light and
full-light displays. Perception, 33(6):717–746, 2004.
A P Atkinson, M L Tunstall, and W H Dittrich. Evidence for distinct contributions of form and motion information to the recognition of emotions from body gestures. Cognition, 104(1):59–72, 2007.
Autodesk Inc., San Francisco, CA, USA. Autodesk 3ds max, 2007.
H Aviezer, R R Hassin, J Ryan, C Grady, J Susskind, A Anderson,
M Moscovitch, and S Bentin. Angry, disgusted, or afraid? Studies
on the malleability of emotion perception. Psychological science : A
Journal of the American Psychological Society / APS, 19(7):724–32,
2008.
A D Baddeley and J Andrade. Working memory and the vividness of
imagery. Journal of Experimental Psychology: General, 129(1):126–
145, 2000.
J N Bailenson, J Blascovich, A C Beall, and J M Loomis. Equilibrium
theory revisited: Mutual gaze and personal space in virtual environments. Presence: Teleoperators & Virtual Environments, 10(6):583–
598, 2001.
J N Bailenson, A C Beall, and J M Loomis. Interpersonal distance in
immersive virtual environments. Personality and Social Psychology
Bulletin, 29:1–15, 2003.
J S Bakin and N M Weinberger. Classical conditioning induces cs-specific
receptive field plasticity in the auditory cortex of the guinea pig. Brain
Res, 536(1-2):271–286, 1990.
J S Bakin, D A South, and N M Weinberger. Induction of receptive field
plasticity in the auditory cortex of the guinea pig during instrumental avoidance conditioning. Behavioral Neuroscience, 110(5):905–913,
1996.
M Baldassare. Human spatial behavior. Annual Review of Sociology, 4:
29–56, 1978.
C Balkenius and J Morén. A computational model of emotional conditioning in the brain. In Proceedings of Workshop on Grounding Emotions in Adaptive Systems, Zurich. Citeseer, 1998.
A Bandura. Social learning theory. Prentice Hall, Englewood Cliffs, NJ,
1977.
J.A. Bargh. Conditional automaticity: Varieties of automatic influence in
social perception and cognition. In James S Uleman and J.A. Bargh,
editors, Unintended thought, chapter 1, pages 3–51. Guilford Press,
New York, 1989.
S Baron-Cohen. Mindblindness: An Essay on Autism and Theory of Mind.
The MIT Press, 1997b.
S Baron-Cohen, S Wheelwright, and T Jolliffe. Is there a "language of the eyes"? Evidence from normal adults, and adults with autism or Asperger syndrome. Visual Cognition, 4(3):311–331, 1997a.
R M Bauer. Autonomic recognition of names and faces in prosopagnosia:
a neuropsychological application of the Guilty Knowledge Test. Neuropsychologia, 22(4):457–69, January 1984.
Paula Beall and Andrew Herbert. The face wins: Stronger automatic processing of affect in facial expressions than words in a modified Stroop
task. Cognition & Emotion, 22(8):1613–1642, May 2008.
A Bechara and A R Damasio. The somatic marker hypothesis: A neural
theory of economic decision. Games and Economic Behavior, 52(2):
336–372, 2005.
A Bechara, H Damasio, A R Damasio, and G P Lee. Different contributions of the human amygdala and ventromedial prefrontal cortex to
decision-making. The Journal of Neuroscience : The Official Journal
of the Society for Neuroscience, 19(13):5473–81, 1999.
A Bechara, H Damasio, and A R Damasio. Emotion , Decision Making
and the Orbitofrontal Cortex. Cerebral Cortex, 10:295–307, 2000.
C Becker, S Kopp, and I Wachsmuth. Simulating the emotion dynamics of
a multimodal conversational agent. Affective Dialogue Systems, pages
154–165, 2004.
U Bernardet, S Bermúdez i Badia, and P F M J Verschure. The experience
induction machine and its role in the research on presence. 10th Annual
International Workshop on Presence. Barcelona: Spain, 2007.
U Bernardet, M Inderbitzin, S Wierenga, A Väljamäe, A Mura, and P F
M J Verschure. Validating presence by relying on recollection: Human
experience and performance in the mixed reality system XIM. The 10th
International Workshop on Presence, 2008.
U Bernardet, A Väljamäe, M Inderbitzin, S Wierenga, and P F M J Verschure. Quantifying human subjective experience and social interaction using the experience induction machine. Brain Research Bulletin,
In press.
E Bevacqua, M Mancini, and C Pelachaud. A listening agent exhibiting variable behaviour. In Intelligent Virtual Agents, pages 262–269.
Springer, 2008.
K C Bickart, C I Wright, R J Dautoff, B C Dickerson, and L F Barrett.
Amygdala volume and social network size in humans. Nature Neuroscience, 14(2):163–164, 2010.
J R Binder, S J Swanson, T A Hammeke, G L Morris, W M Mueller, M Fischer, S Benbadis, J A Frost, S M Rao, and V M Haughton. Determination of language dominance using functional MRI: a comparison with the Wada test. Neurology, 46(4):978–84, 1996.
J R Binder, R H Desai, W W Graves, and L L Conant. Where is the
semantic system? A critical review and meta-analysis of 120 functional
neuroimaging studies. Cerebral Cortex, 19(12):2767–96, 2009.
R L Birdwhistell. Introduction to kinesics: An annotation system for analysis of body motion and gesture. University of Louisville, Louisville,
KY, 1975.
R Blake and M Shiffrar. Perception of human motion. Annual Review of
Psychology, 58:47–73, 2007.
S J Blakemore and J Decety. From the perception of action to the understanding of intention. Nature Reviews Neuroscience, 2(8):561–7,
2001.
J Blascovich, J Loomis, A C Beall, and K R Swinth. Virtual environment
technology as a methodological tool for social psychology. Psychological Inquiry, 13(2):103–124, 2002.
S Bookheimer. Functional MRI of language: new approaches to understanding the cortical organization of semantic processing. Annual Review of Neuroscience, 25:151–88, 2002.
E Borgida and R Nisbett. The Differential Impact of Abstract vs. Concrete
Information on Decisions. Journal of Applied Social Psychology, 7(3):
258–271, 1977.
M M Bradley and P J Lang. Measuring emotion: the self-assessment
manikin and the semantic differential. Journal of Behavior Therapy
and Experimental Psychiatry, 25(1):49–59, 1994.
C Breazeal. Emotion and sociable humanoid robots. International Journal of Human-Computer Studies, 59(1-2):119–155, 2003.
W H Bridger and I J Mandel. A comparison of GSR fear responses produced by threat and electric shock. Journal of Psychiatric Research,
54:31–40, 1964.
N Bruno and J E Cutting. Minimodularity and the perception of layout.
Journal of experimental psychology. General, 117(2):161–70, 1988.
T W Buchanan, K Lutz, S Mirzazade, K Specht, N J Shah, K Zilles, and
L Jäncke. Recognition of emotional prosody and verbal components
of spoken language: an fMRI study. Cognitive Brain Research, 9(3):
227–38, 2000.
J K Burgoon, L A Stern, and L Dillman. Interpersonal Adaption: Dyadic
Interaction Patterns. Cambridge University Press, New York, NY,
2007.
A Camurri, I Lagerlöf, and V Gualtiero. Recognizing emotion from dance
movement: comparison of spectator recognition and automated techniques. International Journal of Human-Computer Studies, 59(1-2):
213–225, 2003.
W B Cannon. The James-Lang Theory of Emotions: A critical examination and an alternative theory. American Journal of Psychology, 39
(1/4):106–124, 1927.
J B Carroll. Word Frequency Book. In P Davis and R Barry, editors, The
American Heritage Word Frequency Book, New York, 1971. American
Heritage Publishing Co., Inc.
F Caruana, A Jezzini, B Sbriscia-Fioretti, G Rizzolatti, and V Gallese.
Emotional and Social Behaviors Elicited by Electrical Stimulation of
the Insula in the Macaque Monkey. Current Biology, 21(3):195–199,
2011.
J.P. Chandler. Subroutine STEPIT - Finds local minima of a smooth function of several parameters. Behavioral Science, 14:81–82, 1969.
Y Chudasama, A Izquierdo, and E A Murray. Distinct contributions of the
amygdala and hippocampus to fear expression. The European Journal
of Neuroscience, 30(12):2327–37, 2009.
T J Clarke, M F Bradshaw, D T Field, S E Hampson, and D Rose. The
perception of emotion from body movement in point-light displays of
interpersonal dialogue. Perception, 34(10):1171–1180, 2005.
M Coesmans, J T Weber, C I De Zeeuw, and C Hansel. Bidirectional
parallel fiber plasticity in the cerebellum under climbing fiber control.
Neuron, 44(4):691–700, 2004.
J D Cohen, K Dunbar, and J L McClelland. On the control of automatic
processes: a parallel distributed processing account of the Stroop effect.
Psychological review, 97(3):332–61, July 1990.
MM Cohen and DW Massaro. Modeling coarticulation in synthetic visual
speech. Models and techniques in computer animation, pages 139–156,
1993.
F F Corchado Ramos, H R Orozco Aguirre, and L A Razo Ruvalcab. The
Use of Artificial Emotional Intelligence in Virtual Creatures. In J Vallverdú and D Casacuberta, editors, Handbook of Research on Synthetic
Emotions and Sociable Robotics: New Applications in Affective Computing and Artificial Intelligence, pages 350–378. IGI Global, 2009.
M Coulson. Attributing Emotion to Static Body Postures: Recognition
Accuracy, Confusions, and Viewpoint Dependence. Journal of Nonverbal Behavior, 28(2):117–139, 2004.
A D B Craig. How do you feel–now? The anterior insula and human
awareness. Nature Reviews Neuroscience, 10(1):59–70, 2009.
A D B Craig. The sentient self. Brain structure & function, 214(5-6):
563–77, 2010.
C Cruz-Neira, D J Sandin, T A DeFanti, R V Kenyon, and J C Hart. The
cave: Audio visual experience automatic virtual environment. Communications of the ACM, 35(6):64–72, 1992.
A R Damasio. Fundamental feelings. Nature, 413(6858):781, 2001.
A R Damasio, D Tranel, and H Damasio. Face agnosia and the neural substrates of memory. Annual review of neuroscience, 13:89–109, January
1990.
A R Damasio, B J Everitt, and D Bishop. The Somatic Marker Hypothesis and the Possible Functions of the Prefrontal Cortex. Philosophical
Transactions: Biological Sciences, pages 1413–1420, 1996.
J M Darley and B Latane. The unresponsive bystander: why doesn’t he
help? Appleton-Century Crofts, New York, NY, 1970.
D N Davis and S C Lewis. Computational models of emotion for autonomy and reasoning. Informatica Special Edition on Perception and
Emotion Based Reasoning, 27(2):157–164, 2003.
F C Davis, T Johnstone, E C Mazzulla, J A Oler, and P J Whalen. Regional Response Differences Across the Human Amygdaloid Complex during Social Conditioning. Cerebral Cortex, 12(10):1217–1218,
2009.
J De Houwer and D Hermans. Differences in the affective processing of
words and pictures. Cognition & Emotion, 8(1):1–20, 1994.
A De Luca, R Mattone, P R Giordano, and H H Bulthoff. Control design
and experimental evaluation of the 2D cyberwalk platform. In Proceedings from the IEEE/RSJ International Conference on Intelligent Robots
and Systems, St. Louis, USA, 2009.
P R De Silva and N Bianchi-Berthouze. Modeling human affective postures: an information theoretic characterization of posture features.
Computer Animation and Virtual Worlds, 15(3-4):269–276, 2004.
R De Sousa. The rationality of emotions. MIT Press, Cambridge, 1987.
R M J Deacon, D M Bannerman, and J N P Rawlins. Anxiolytic Effects
of Cytotoxic Hippocampal Lesions in Rats. Behavioral Neuroscience,
116(3):494–497, 2002.
B DeCarolis, C Pelachaud, and I Poggi. APML, a mark-up language
for believable behavior generation. Life-like Characters., pages 1–22,
2004.
K Dedovic, A Duchesne, J Andrews, V Engert, and J C Pruessner. The
brain and the stress axis: The neural correlates of cortisol regulation in
response to stress. NeuroImage, 47:864–871, 2009.
S Dehaene, L Naccache, G Le Clec'H, E Koechlin, M Mueller, G Dehaene-Lambertz, P F van de Moortele, and D Le Bihan. Imaging unconscious semantic priming. Nature, 395:597–600, 1998.
T Delbruck, A Whatley, R Douglas, K Eng, K Hepp, and P F M J Verschure. A tactile luminous floor for an interactive autonomous space.
Robotics and Autonomous Systems, 55(6):433–443, 2007.
J-F Demonet, F Chollet, S Ramsay, D Cardebat, J-L Nespoulous, R Wise,
A Rascol, and R Frackowiak. The anatomy of phonological and semantic processing in normal subjects. Brain, 115:1753–1768, 1992.
J T Devlin, R P Russell, M H Davis, C J Price, H E Moss, M J Fadili, and L K Tyler. Is there an anatomical basis for category-specificity? Semantic memory studies in PET and fMRI. Neuropsychologia, 40(1):54–75, 2002.
U Dimberg, M Thunberg, and K Elmehed. Unconscious facial reactions
to emotional facial expressions. Psychological Science : A Journal of
the American Psychological Society, 11(1):86–89, 2000.
A Duff and P F M J Verschure. Unifying perceptual and behavioral learning with a correlative subspace learning rule. Neurocomputing, 73(10-12):1818–1830, 2010.
J M Edeline, P Pham, and N M Weinberger. Rapid development of
learning-induced receptive field plasticity in the auditory cortex. Behavioral Neuroscience, 107(4):539–551, 1993.
P Ekman. An argument for basic emotions. Cognition & Emotion, 6(3):
169–200, 1992.
P Ekman. Facial expression and emotion. American Psychologist, 48(4):
384–392, 1993.
P Ekman and W V Friesen. Detecting deception from the body or face.
Journal of Personality and Social Psychology, 29(3):288–298, 1974.
P Ekman and W V Friesen. Facial Action Coding System: A Technique
for the Measurement of Facial Movement. Consulting Psychologists
Press, Palo Alto, 1978.
P Ekman and W V Friesen. A new pan-cultural facial expression of emotion. Motivation and Emotion, 10(2):159–168, 1986.
P Ekman, W V Friesen, and P Ellsworth. Emotion in the Human Face.
Oxford University Press, New York, 2nd edition, 1982.
M S El-Nasr, J Yen, and T R Ioerger. FLAME - Fuzzy Logic Adaptive
Model of Emotions. Autonomous Agents and Multi-agent systems, 3
(3):219–257, 2000.
C Elliott and G Siegle. Variables Influencing the Intensity of Simulated
Affective States. In AAAI Spring Symposium on Reasoning about Mental States: Formal Theories and Applications, pages 58–67, 1993.
G D Ellison and J Konorski. Separation of the salivary and motor responses in instrumental conditioning. Science, 146(3647):1071–1072,
1964.
J W Ellison and D W Massaro. Featural evaluation, integration, and judgment of facial affect. Journal of Experimental Psychology. Human Perception and Performance, 23(1):213–26, 1997.
F Esteves, U Dimberg, and A Öhman. Automatically elicited fear: Conditioned skin conductance responses to masked facial expressions. Cognition & Emotion, 8(5):393–413, 1994.
N L Etcoff and J J Magee. Categorical perception of facial expressions.
Cognition, 44(3):227 – 240, 1992.
J Evans. In two minds: dual-process accounts of reasoning. Trends in
Cognitive Sciences, 7(10):454–459, 2003.
J D G Evans. Review: Aristotle's De Anima. The Classical Review, 45(1):60–61, 1995.
M S Fanselow and A M Poulos. The neuroscience of mammalian associative learning. Annual Review of Psychology, 56:207–234, 2005.
M J Farah, J W Tanaka, and H M Drain. What causes the face inversion
effect? Journal of Experimental Psychology. Human Perception and
Performance, 21(3):628–634, 1995.
L Festinger, S Schachter, and K Back. Social Pressures in Informal
Groups. Harper, New York, 1950.
S T Fiske and S E Taylor. Social cognition. New York, NY: Random
House, 1984.
M K Floeter and W T Greenough. Cerebellar plasticity: modification of
Purkinje cell structure by differential rearing in monkeys. Science, 206
(4415):227–229, 1979.
A J Fridlund, B Apfelbaum, G Blum, D Brown, J Balakrishnan, J Loomis,
G Mchugo, M Platow, and P Rozin. Sociality of Solitary Smiling:
Potentiation by an Implicit Audience. Journal of Personality and Social
Psychology, 60(2):229–240, 1991.
N H Frijda. The laws of emotion. The American Psychologist, 43(5):
349–58, 1988.
N H Frijda. Moods, emotion episodes and emotions. In M Lewis and
J M Haviland, editors, Handbook of Emotions, pages 381–403. Guilford Press, New York, 1993.
J M Fuster. The prefrontal cortex. Elsevier, Amsterdam, 4th edition, 2008.
R Galambos, G Sheatz, and V G Vernier. Electrophysiological correlates
of a conditioned response in cats. Science, 123(3192):376–377, 1956.
GarageGames. Torque Game Engine [Computer software]. Eugene, OR,
2010.
P Gebhard. ALMA: a layered model of affect. In Proceedings of the
fourth international joint conference on Autonomous agents and multiagent systems, pages 29–36. ACM, 2005.
M A Giese and T Poggio. Neural mechanisms for the recognition of
biological movements. Nature Reviews Neuroscience, 4(3):179–192,
2003.
W R Glaser and F-J Dungelhoff. The time course of picture-word interference. Journal of experimental psychology. Human perception and
performance, 10(5):640–654, 1984.
W R Glaser and M O Glaser. Context effects in stroop-like word and
picture processing. Journal of Experimental Psychology. General, 118
(1):13–42, 1989.
J Globisch, A O Hamm, F Esteves, and A Öhman. Fear appears fast:
temporal course of startle reflex potentiation in animal fearful subjects.
Psychophysiology, 36(1):66–75, 1999.
P Gold. Acetylcholine modulation of neural systems involved in learning
and memory. Neurobiology of Learning and Memory, 80:194–210,
2003.
I Gormezano, W F Prokasy, and R F Thompson. Classical conditioning.
Lawrence Erlbaum, Hillsdale, NJ, England, 1987.
D Grandjean and K R Scherer. Unpacking the cognitive architecture of
emotion processes. Emotion, 8(3):341–51, 2008.
J Gratch and S Marsella. A domain-independent framework for modeling
emotion. Journal of Cognitive Systems Research, 5(4):269–306, 2004.
J Gratch and S Marsella. Evaluating a Computational Model of Emotion.
Autonomous Agents and Multi-Agent Systems, 11(1):23–43, 2005.
J Gratch, J Rickel, E André, J Cassell, E Petajan, and N Badler. Creating interactive virtual humans: Some assembly required. Intelligent
Systems, IEEE, 17(4):54–63, 2002.
A G Greenwald, M R Klinger, and T J Liu. Unconscious processing of
dichoptically masked words. Memory & cognition, 17(1):35–47, 1989.
P E Griffiths. What emotions really are. University of Chicago Press,
Chicago, 1997.
G-Q Bi and M-M Poo. Synaptic modifications in cultured hippocampal neurons: Dependence on spike timing, synaptic strength, and postsynaptic cell type. Journal of Neuroscience, 18(24):10464–10472, 1998.
E T Hall. A system for the notation of proxemic behavior. American
Anthropologist, 65:1003–1026, 1963.
E T Hall. The Hidden Dimension. Anchor Books, New York, 1966.
A R Hariri, V S Mattay, A Tessitore, B Kolachana, F Fera, D Goldman,
M F Egan, and D R Weinberger. Serotonin transporter genetic variation
and the response of the human amygdala. Science, 297(5580):400–3,
2002.
M Haruno and C D Frith. Activity in the amygdala elicited by unfair
divisions predicts social value orientation. Nature Neuroscience, 13
(2):160–1, 2010.
J V Haxby, E A Hoffman, and M I Gobbini. Human neural systems for
face recognition and social communication. Biological Psychiatry, 51
(1):59–67, 2002.
L A Hayduk. Personal space: An evaluative and orienting overview. Psychological Bulletin, 85(1):117 – 134, 1978.
D O Hebb. The Organization of Behavior: A Neuropsychological Theory.
Wiley, New York, 1949.
H Hediger. Wild Animals in Captivity. Dover Publications, New York,
1964.
F Heider. Social perception and phenomenal causality. Psychological
Review, 51(6):358–374, 1944.
A Hermann, A Schäfer, B Walter, R Stark, D Vaitl, and A Schienle. Emotion regulation in spider phobia: role of the medial prefrontal cortex.
Social Cognitive and Affective Neuroscience, 4(3):257–67, 2009.
B Hillier. Space is the Machine. Press Syndicate of the University of
Cambridge, 1996.
C Hofstotter, M Mintz, and P F M J Verschure. The cerebellum in action:
a simulation and robotics study. European Journal of Neuroscience, 16
(7):1361–1376, 2002.
C Hull. The problem of stimulus equivalence in behavior theory. Psychological Review, 46:9–30, 1939.
M Inderbitzin, S Wierenga, A Väljamäe, U Bernardet, and P F M J Verschure. Cooperation and competition in the mixed reality space Experience Induction Machine XIM. Virtual Reality, 13:153–158, 2009.
M Inderbitzin, I Herreros-Alonso, and P F M J Verschure. An integrated
computational model of the two phase theory of classical conditioning. In The 2010 International Joint Conference on Neural Networks
(IJCNN), pages 1–8. IEEE, 2010a.
M Inderbitzin, I Herreros-Alonso, and P F M J Verschure. Amygdala Induced Plasticity in an Integrated Computational Model of the Two-Phase Theory of Conditioning. In 4th International Conference on Cognitive Systems, Zurich, Switzerland, 2010b.
M Inderbitzin, A Valjamae, J M B Calvo, P F M J Verschure, and
U Bernardet. Expression of emotional states during locomotion based
on canonical parameters. In IEEE International Conference on Automatic Face and Gesture Recognition, pages 809 –814, 2011.
M Inderbitzin, A Betella, U Bernardet, and P F M J Verschure. The
social perceptual salience effect. Journal of Experimental Psychology.
Human Perception and Performance, submitted.
M Ito. Long-term depression. Annual Review of Neuroscience, 12(1):
85–102, 1989.
M Ito. Historical review of the signification of the cerebellum and the
role of the purkinje cells in motor learning. Annals of the New York
Academy of Sciences, 978:273–288, 2002.
C E Izard. The face of emotions. Appleton-Century-Crofts, New York,
1971.
C E Izard. Human emotions. Plenum, New York, 1977.
C E Izard. Innate and universal facial expressions: Evidence from developmental and cross-cultural research. Psychological Bulletin, 115(2):
288–299, 1994.
J Jacobs. The Death and Life of Great American Cities. Random House,
New York, 1961.
J C Houk, J T Buckingham, and A G Barto. Models of the cerebellum and motor learning. Behavioral and Brain Sciences, 19:368–383, 1996.
W James. What is an emotion? Mind, 9(34):188–205, 1884.
J P Johansen, J W Tarpley, J E Ledoux, and H T Blair. Neural substrates
for expectation-modulated fear learning in the amygdala and periaqueductal gray. Nature Neuroscience, 13(8):979–986, 2010.
L R Johnson, J E LeDoux, and V Doyère. Hebbian reverbrations in emotional memory micro circuits. Frontiers in Neuroscience, 3(2):198–
205, 2008.
P N Johnson-Laird and K Oatley. The language of emotions: An analysis
of a semantic field. Cognition & Emotion, 3(2):81–123, 1989.
S Jolly. Understanding body language: Birdwhistell’s theory of kinesics.
Corporate Communications: An International Journal, 5(3):133–139,
2000.
N H Kalin, S E Shelton, and R J Davidson. Role of the primate orbitofrontal cortex in mediating anxious temperament. Biological Psychiatry, 62(10):1134–9, 2007.
S Kamisato, S Odo, Y Ishikawa, and K Hoshino. Extraction of Motion
Characteristics Corresponding to Sensitivity Information Using Dance
Movement. Computational Intelligence, 8(2), 2004.
N Kanwisher, J McDermott, and M M Chun. The fusiform face area:
a module in human extrastriate cortex specialized for face perception.
The Journal of neuroscience : the official journal of the Society for
Neuroscience, 17(11):4302–11, 1997.
D P Kennedy, J Gläscher, J M Tyszka, and R Adolphs. Personal space regulation by the human amygdala. Nature Neuroscience, 12(10):1226–7,
2009.
J F Kihlstrom. The Cognitive Unconscious. Science, 237(4821):1445–52,
1987.
J Kisielius and B Sternthal. Examining the Vividness Controversy: An
Availability-Valence Interpretation. The Journal of Consumer Research, 12(4):418–431, 1986.
M A Kisley and G L Gerstein. Daily variation and appetitive conditioning-induced plasticity of auditory cortex receptive fields. European Journal of Neuroscience, 13(10):1993–2003, 2001.
A Kleinsmith and N Bianchi-Berthouze. Recognizing affective dimensions from body posture. In International Conference of Affective
Computing and Intelligent Interaction, pages 48–58, Lisboa (Portugal),
2007.
A Kleinsmith, P De Silva, and N Bianchi-Berthouze. Cross-cultural differences in recognizing affect from body posture. Interacting with
Computers, 18(6):1371–1389, 2006.
J Konorski. Conditioned reflex and neuron organization. University Press, Cambridge, 1948.
J Konorski. Integrative Activity of the Brain. An Interdisciplinary Approach. University of Chicago Press, Chicago, 1968.
R E Kraut and R E Johnston. Social and Emotional Messages of Smiling :
An Ethological Approach. Journal of Personality and Social Psychology, 37(9):1539–1553, 1979.
M E Kret and B de Gelder. Social context influences recognition of bodily expressions. Experimental brain research. Experimentelle Hirnforschung. Expérimentation cérébrale, 203(1):169–80, 2010.
D J Krupa and R F Thompson. Reversible inactivation of the cerebellar
interpositus nucleus completely prevents acquisition of the classically
conditioned eye-blink response. Learning & Memory, 3(6):545–556,
1997.
N Kuczewski, C Porcher, V Lessmann, I Medina, and J-L Gaiarsa. Backpropagating action potential: A key contributor in activity-dependent
dendritic release of BDNF. Communicative & Integrative Biology, 1
(2):153–155, 2008.
J D LaBarbera, C E Izard, P Vietze, and S A Parisis. Four- and sixmonth-old infants’ visual responses to joy, anger, and neutral expressions. Child Development, 47(2), 1976.
C Lamm and T Singer. The role of anterior insular cortex in social emotions. Brain Structure & Function, pages 579–591, 2010.
P J Lang, M Davis, and A Öhman. Fear and anxiety: Animal models and
human cognitive psychophysiology. Journal of Affective Disorders, 61
(3):137–159, 2000.
R S Lazarus. Stress, appraisal and coping. Springer, New York, 1991.
J E LeDoux. Emotion, memory and the brain. Scientific American, 270(6):32–39, 1994.
J E LeDoux. Emotion: clues from the brain. Annual Review of Psychology, 46:209–35, 1995.
J E LeDoux. The emotional brain. Simon and Schuster Paperbacks, New
York, 1996.
J E LeDoux. Emotion circuits in the brain. Annual review of neuroscience,
23(1):155–184, 2000.
J E LeDoux. Synaptic Self: How Our Brains Become Who We Are. Penguin (Non-Classics), 2003.
J E LeDoux. Amygdala. Scholarpedia, 2006.
J E LeDoux and R G Phillips. Differential Contribution of Amygdala and
Hippocampus to Cued and Contextual Fear Conditioning. Behavioral
Neuroscience, 106(2):274–285, 1992.
T Lee and J J Kim. Differential effects of cerebellar, amygdalar, and
hippocampal lesions on classical eyeblink conditioning in rats. The
Journal of Neuroscience : The Official Journal of the Society for Neuroscience, 24(13):3242–3250, 2004.
A Leffler, D L Gillespie, and J C Conaty. The Effects of Status Differentiation on Nonverbal Behavior. Social Psychology Quarterly, 45(3):
153–161, 1982.
R Lennartz and N M Weinberger. Analysis of response systems in pavlovian conditioning reveal rapidly versus slowly acquired conditioned responses: Support for two–factors and implications for neurobiology.
Psychobiology, 20:93–119, 1992.
H Leventhal and K R Scherer. The relationship of emotion to cognition: A
functional approach to a semantic controversy. Cognition and Emotion,
1(1):3–28, 1987.
S C Levine, M T Banich, and M P Koch-Weser. Face recognition: a
general or specific right hemisphere capacity? Brain and Cognition, 8
(3):303–25, 1988.
J L Lewis, J J LoTurco, and P R Solomon. Lesions of the middle cerebellar peduncle disrupt acquisition and retention of the rabbit's classically conditioned nictitating membrane response. Behavioral Neuroscience, 101, 1987.
M Lewis. Self-Conscious Emotions: Embarrassment, Pride, Shame, and
Guilt. In M Lewis and J M Haviland, editors, Handbook of Emotions,
pages 563–573. Guilford Press, New York, 1993.
G D Logan. Automaticity and cognitive control. In J S Uleman and J A
Bargh, editors, Unintended thoughts. Guilford Press, New York, 1989.
G Lowe. Inhibition of backpropagating action potentials in mitral cell
secondary dendrites. Journal of Neurophysiology, 88(1):64–85, 2002.
N Mackintosh. The psychology of animal learning. Academic Press,
London, 1974.
C M MacLeod. Half a century of research on the Stroop effect: an integrative review. Psychological Bulletin, 109(2):163–203, 1991.
S Maren. Neurobiology of Pavlovian Fear Conditioning. Annual Review
of Neuroscience, 24:897–931, 2001.
H Markram, J Lubke, M Frotscher, and B Sakmann. Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science,
275(5297):213–215, 1997.
D Marr. A theory of cerebellar cortex. The Journal of Physiology, 202
(2):437–470, 1969.
S C Marsella and J Gratch. EMA : A process model of appraisal dynamics. Cognitive Systems Research, 10(1):70–90, 2009.
A H Maslow. Motivation and Personality. Harper & Row, Publishers,
Inc., Oxford, England, 1954.
D W Massaro. Speech perception by ear and eye: A paradigm for psychological inquiry. Erlbaum, Hillsdale, NJ, England, 1987a.
D W Massaro. Categorical partition: A fuzzy-logical model of categorization behaviour. In H Stevan, editor, Categorical perception: The
groundwork of cognition, pages 254–283. Cambridge University Press,
New York, 1987b.
D W Massaro. Ambiguity in Perception and Experimentation. Journal of
Experimental Psychology: General, 117(4):417–421, 1988.
D W Massaro. Testing between the TRACE model and the fuzzy logical
model of speech perception. Cognitive Psychology, 21(3):398–421,
1989.
D W Massaro. Perceiving talking faces: from speech perception to a
behavioral principle. MIT Press, Cambridge, MA, USA, 1998.
D W Massaro and M M Cohen. Perceiving Talking Faces. Current Directions in Psychological Science, 4(4):104–109, 1995.
D W Massaro and P B Egan. Perceiving affect from the voice and the
face. Psychonomic Bulletin and Review, 3:215–221, 1996.
D W Massaro and E L Ferguson. Cognitive style and perception : the
relationship between category width and speech perception , categorization , and discrimination. The American Journal of Psychology,
106(1):25–49, 1993.
D W Massaro, M M Cohen, A Gesi, R Heredia, and M Tsuzaki. Bimodal speech perception: an examination across languages. Journal of
Phonetics, 21:445–478, 1993.
D W Massaro, M M Cohen, J Beskow, and R A Cole. Developing and
evaluating conversational agents. In Workshop on Embodied Conversational Characters WECC, pages 287–318, Lake Tahoe, CA, USA,
1998.
D W Massaro, M M Cohen, and S Vanderhyden. Baldi. iPhone Software,
2009.
Z Mathews, S Bermúdez i Badia, and P F M J Verschure. A novel brainbased approach for multi-modal multi-target tracking in a mixed reality
space. 4th Intuition international conference and workshop on virtual
reality, Athens, Greece, 2007.
D P McCabe and A D Castel. Seeing is believing: the effect of brain
images on judgments of scientific reasoning. Cognition, 107(1):343–
52, 2008.
G McCarthy, A Puce, J C Gore, and T Allison. Face-Specific Processing
in the Human Fusiform Gyrus. Journal of Cognitive Neuroscience, 9
(5):605–610, 1997.
I K McKenzie and K T Strongman. Rank (status) and interaction distance.
European Journal of Social Psychology, 11(2):227–230, 1981.
A McQueen. Fall Winter collection: Kate Moss holographic projection. fashionWATCH [Video file]. Retrieved from: http://www.youtube.com/user/fashionWATCH, 2006.
J F Medina, J C Repa, M D Mauk, and J E LeDoux. Parallels between
cerebellum- and amygdala-dependent conditioning. Nature Reviews
Neuroscience, 3(2):122–31, 2002.
H K M Meeren, C C R J van Heijnsbergen, and B de Gelder. Rapid perceptual integration of facial expression and emotional body language.
Proceedings of the National Academy of Sciences of the United States
of America, 102(45):16518–23, 2005.
A Mehrabian. Nonverbal communication. Aldine Transaction Publishers,
New Jersey, USA, 1972.
A N Meltzoff and M K Moore. Imitation of Facial and Manual Gestures
by Human Neonates. Science, 198(4312):75–78, 1977.
N E Miller. Studies of fear as acquirable drive. Journal of Experimental
Psychologye, 38:89–101, 1948.
J Morén. A Computational Model of Emotional Learning in the Amygdala. Cognitive Science, 1995.
R L Morgan and D Heise. Structure of Emotions. Social Psychology
Quarterly, 51(1):19, 1988.
J S Morris, A Ohman, and R J Dolan. Conscious and unconscious emotional learning in the human amygdala. Nature, 393:467–70, 1998.
O H Mowrer. Learning theory and behavior. Wiley, New York, 1960.
L Nadel and C Land. Commentary - Reconsolidation: Memory traces revisited. Nature Reviews Neuroscience, 1(3):209–212, 2000.
K Nakamura, R Kawashima, K Ito, M Sugiura, T Kato, A Nakamura, K Hatano, S Nagumo, K Kubota, H Fukuda, and S Kojima. Activation of the right inferior frontal cortex during assessment of facial emotion. Journal of Neurophysiology, 82(3):1610–4, 1999.
O Newman. Defensible Space. Macmillan, New York, 1973.
K Oatley and P N Johnson-Laird. Towards a cognitive theory of emotions.
Cognition & Emotion, 1:29–50, 1987.
A Öhman. Automaticity and the amygdala: Nonconscious responses to
emotional faces. Current Directions in Psychological Science, 11(2):
62–66, 2002.
A Öhman, A Flykt, and F Esteves. Emotion drives attention: Detecting the snake in the grass. Journal of Experimental Psychology: General, 130(3):466–478, 2001.
A Ortony and T J Turner. What’s basic about basic emotions? Psychological Review, 97(3):315–331, 1990.
A Ortony, G Clore, and M Foss. The referential structure of the affective
lexicon. Cognitive Science, 11(3):341–364, 1987.
J Panksepp. Toward a general psychobiological theory of emotions. Behavioral and Brain Sciences, 5(3):407–422, 1982.
J J Paton, M A Belova, S E Morrison, and C D Salzman. The primate
amygdala represents the positive and negative value of visual stimuli
during learning. Nature, 439:865–70, 2006.
M L Patterson. Compensation in nonverbal immediacy behaviors: A review. Sociometry, 36(2):237–252, 1973.
I Pavlov. Conditioned reflexes. Oxford University Press, Oxford, 1927.
C Pelachaud and M Bilvi. Computational model of believable conversational agents. Communications in Multiagent Systems, pages 300–317,
2003.
A Penn. Space Syntax And Spatial Cognition: Or Why the Axial Line?
Environment & Behavior, 35(1):30–65, 2003.
N S Pentkowski, D C Blanchard, C Lever, Y Litvin, and R J Blanchard.
Effects of lesions to the dorsal and ventral hippocampus on defensive
behaviors in rats. The European Journal of Neuroscience, 23(8):2185–
96, 2006.
A S Pentland. Honest Signals: How They Shape Our World. MIT Press, Cambridge, MA, 2008.
S P Perrett, B P Ruiz, and M D Mauk. Cerebellar cortex lesions disrupt
learning-dependent timing of conditioned eyelid responses. Journal of
Neuroscience, 13(4):1708–18, 1993.
J Pforsich. Handbook for Laban Movement Analysis. Janis Pforsich, New
York, 1977.
R G Phillips and J E LeDoux. Differential contribution of amygdala
and hippocampus to cued and contextual fear conditioning. Behavioral
Neuroscience, 106(2):274–285, 1992.
R Plutchik. A general psychoevolutionary theory of emotion. In R Plutchik and H Kellerman, editors, Emotion: Theory, research, and experience. Theories of emotion, volume 1, pages 3–31. Academic Press, New York, 1980.
F E Pollick, H M Paterson, A Bruderlin, and A J Sanford. Perceiving
affect from arm movement. Cognition, 82(2):B51–B61, 2001.
D A Powell and D Levine-Bryce. A comparison of two model systems of
associative learning: heart rate and eyeblink conditioning in the rabbit.
Psychophysiology, 25(6):672–682, 1988.
M J Power and T Dalgleish. Cognition and Emotion: From Order to
disorder. Psychology Press, Sussex, UK, 1997.
J L Price. Comparative aspects of amygdala connectivity. Annals of the
New York Academy of Sciences, 985(1):50–58, 2003.
J J Prinz. Gut Reactions. Oxford University Press, New York, 2004.
Psychology Software Tools, Inc., Sharpsburg, PA, USA. E-prime 1, 2007.
G J Quirk, E Likhtik, J G Pelletier, and D Pare. Stimulation of Medial Prefrontal Cortex Decreases the Responsiveness of Central Amygdala Output Neurons. The Journal of Neuroscience, 23(25):8800–8807, 2003.
R A Rescorla and R L Solomon. Two-process learning theory: Relationships between Pavlovian conditioning and instrumental learning.
Psychological Review, 74:151–182, 1967.
R A Rescorla and A R Wagner. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A H Black and W F Prokasy, editors,
Classical conditioning II: Current research and theory, pages 64–99.
Appleton-Century-Crofts, New York, 1972.
C L Roether, L Omlor, A Christensen, and M A Giese. Critical features
for the perception of emotion from gait. Journal of Vision, 9(6):1–32,
2009.
D B Roger. Body-Image, Personal Space and Self-Esteem: Preliminary
Evidence for “Focusing” Effects. Journal of Personality Assessment,
46(5):468–476, 1982.
J A Russell. A circumplex model of affect. Journal of Personality and
Social Psychology, 39(6):1161–1178, 1980.
C D Salzman and S Fusi. Emotion, cognition, and mental state representation in amygdala and prefrontal cortex. Annual Review of Neuroscience, 33:173–202, 2010.
M Sanchez-Fibla, U Bernardet, E Wasserman, T Pelc, M Mintz, J C Jackson, C Lansink, C Pennartz, and P F M J Verschure. Allostatic control
for robot behavior regulation: a comparative rodent-robot study. Advances in Complex Systems, 13:377–403, 2010.
M A Sanchez-Montanes, P Konig, and P F M J Verschure. Learning
sensory maps with real-world stimuli in real time using a biophysically
realistic learning rule. IEEE Transactions on Neural Networks, 13(3):
619–632, 2002.
D Sander, J Grafman, and T Zalla. The Human Amygdala: An evolved
system for relevance detection. Reviews in the Neurosciences, 14(4):
303–316, 2003.
D Sander, D Grandjean, and K R Scherer. A systems approach to appraisal mechanisms in emotion. Neural Networks, 18(4):317–352, 2005.
G Sandini, G Metta, and D Vernon. The iCub Cognitive Humanoid Robot:
An Open-System Research Platform for Enactive Cognition. In M Lungarella,
F Iida, J Bongard, and R Pfeifer, editors, 50 Years of Artificial Intelligence, pages 358–369.
Springer, Berlin, 2007.
D A Sauter, F Eisner, P Ekman, and S K Scott. Cross-cultural recognition
of basic emotions through nonverbal emotional vocalizations. Proceedings of the National Academy of Sciences, 107(6):2408, 2010.
S Schachter. The interaction of cognitive and physiological determinants
of emotional state. In L Berkowitz, editor, Advances in Experimental
Social Psychology, volume 1, pages 49–80. Academic Press, 1964.
K R Scherer. Appraisal considered as a process of multilevel sequential
checking. In K R Scherer and A Schorr, editors, Appraisal processes in
emotion: Theory, methods, research, pages 92–120. Oxford University
Press, New York, 2001.
K R Scherer and P Ekman. Approaches to Emotion, chapter Expression
and the nature of emotion, pages 319–344. Lawrence Erlbaum Associates, Hillsdale, NJ, 1984.
D Schiller, J B Freeman, J P Mitchell, J S Uleman, and E A Phelps. A
neural mechanism of first impressions. Nature Neuroscience, 12(4):
508–14, 2009.
A Schirmer and S A Kotz. Beyond the right hemisphere: brain mechanisms mediating vocal emotional processing. Trends in Cognitive Sciences, 10(1):24–30, 2006.
N Schneiderman, I Fuentes, and I Gormezano. Acquisition and extinction of the classically conditioned eyelid response in the albino rabbit.
Science, 136:650–652, 1962.
M Schröder and J Trouvain. The German text-to-speech synthesis system
MARY: A tool for research, development and teaching. International
Journal of Speech Technology, 6(4):365–377, 2003.
M Schröder, L Devillers, K Karpouzis, J C Martin, C Pelachaud, C Peter,
H Pirker, B Schuller, J Tao, and I Wilson. What should a generic emotion markup language be able to represent? Affective Computing and
Intelligent Interaction, pages 440–451, 2007.
M Schröder, R Cowie, D Heylen, M Pantic, C Pelachaud, and
B Schuller. Towards responsive sensitive artificial listeners. In Proceedings of the 4th International Workshop on Human-Computer Conversation, page 6, Sheffield, UK, 2008.
W Schultz. Getting formal with dopamine and reward. Neuron, 36(2):
241–63, 2002.
H T Schupp, A Öhman, M Junghöfer, A I Weike, J Stockburger, and
A O Hamm. The facilitated processing of threatening faces: an ERP
analysis. Emotion, 4(2):189–200, 2004.
G M Schwartz, C E Izard, and S E Ansul. The 5-month-old’s ability to
discriminate facial expressions of emotion. Infant Behavior and Development, 8(1):65–77, 1985.
C Sehlmeyer, S Schöning, P Zwitserlood, B Pfleiderer, T Kircher,
V Arolt, and C Konrad. Human fear conditioning and extinction in
neuroimaging: a systematic review. PLoS ONE, 4(6):e5865, 2009.
R M Shiffrin and W Schneider. Controlled and automatic human information processing:
II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84:127–190, 1977.
R M Shiffrin. Attention. In R C Atkinson, R J Herrnstein, G Lindzey, and
R D Luce, editors, Stevens’ handbook of experimental psychology, pages
739–811. Wiley, New York, 1988.
B F Skinner. About Behaviorism. Random House, New York, 1976.
A Bechara, D Tranel, H Damasio, R Adolphs,
C Rockland, and A R Damasio. Double Dissociation of Conditioning
and Declarative Knowledge Relative to the Amygdala and Hippocampus in Humans. Science, 269:1115–8, 1995.
R Sommer. Personal Space: The Behavioral Basis of Design. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1969.
J E Steinmetz. Neuronal activity in the cerebellar interpositus nucleus
during classical NM conditioning with a pontine stimulation CS. Psychological Science, 1:378–382, 1990.
J E Steinmetz, D J Rosen, P F Chapman, D G Lavond, and R F Thompson.
Classical Conditioning of the Rabbit Eyelid Response With a Mossy-Fibre Stimulation CS: I. Pontine Nuclei and Middle Cerebellar Peduncle Stimulation. Behavioral Neuroscience, 100(6):878, 1986.
J E Steinmetz, C G Logan, D J Rosen, J K Thompson, D G Lavond, and
R F Thompson. Initial localization of the acoustic conditioned stimulus projection system to the cerebellum essential for classical eyelid
conditioning. Proceedings of the National Academy of Sciences of the
United States of America, 84(10):3531–5, 1987.
J E Steinmetz, L L Sears, M Gabriel, Y Kubota, and A Poremba. Cerebellar interpositus nucleus lesions disrupt classical nictitating membrane
conditioning but not discriminative avoidance learning in rabbits. Behavioural Brain Research, 45(1):71–80, 1991.
G Stenberg, S Wiking, and M Dahl. Judging Words at Face Value: Interference in a Word Processing Task Reveals Automatic Processing of
Affective Facial Expressions. Cognition & Emotion, 12(6):755–782,
1998.
D Stokols. Environmental psychology. Annual Review of Psychology, 29:
253–295, 1978.
K T Strongman. Specific emotions theory. In The psychology of emotion,
chapter 8, pages 132–151. John Wiley & Sons, Oxford, England, 1987.
J R Stroop. Studies of interference in serial verbal reactions. Journal of
Experimental Psychology, 18:643–662, 1935.
G J Stuart and B Sakmann. Active propagation of somatic action potentials into neocortical pyramidal cell dendrites. Nature, 367(6458):
69–72, 1994.
L W Swanson and G D Petrovich. What is the amygdala? Trends in
Neurosciences, 21(8):323–331, 1998.
J W Tanaka and M J Farah. Parts and Wholes in Face Recognition. The
Quarterly Journal of Experimental Psychology Section A, 46(2):225–
245, 1993.
S E Taylor and S Thompson. Stalking the elusive “vividness” effect.
Psychological Review, 89(2):155–181, 1982.
T Tazumi and H Okaichi. Effect of lesions in the lateral nucleus of the
amygdala on fear conditioning using auditory and visual conditioned
stimuli in rats. Neuroscience Research, 43(2):163–170, 2002.
L A Thompson and D W Massaro. Before you see it, you see its parts:
evidence for feature encoding and integration in preschool children and
adults. Cognitive Psychology, 21(3):334–62, 1989.
R F Thompson. The Neurobiology of learning and memory. Science,
233:941–947, 1986.
R F Thompson. In search of memory traces. Annual Review of Psychology, 56:1–23, 2005.
S M Thurman, M A Giese, and E D Grossman. Perceptual and computational analysis of critical features for biological motion. Journal of
Vision, 10(12):1–14, 2010.
S S Tomkins. Affect theory. In P Ekman and K R Scherer, editors, Approaches to emotion, pages 163–195. Erlbaum, Hillsdale, NJ, 1984.
J Vallverdú and D Casacuberta. Modelling Hardwired Synthetic Emotions: TPR 2.0. In J Vallverdú and D Casacuberta, editors, Handbook
of Research on Synthetic Emotions and Social Robots, pages 452–463.
IGI Global, 2009.
J Van den Stock, R Righart, and B de Gelder. Body expressions influence
recognition of emotions in the face and voice. Emotion, 7(3):487–94,
2007.
J D Velásquez. Modeling emotions and other motivations in synthetic
agents. In Proceedings of the National Conference on Artificial Intelligence, pages 10–15. Citeseer, 1997.
P F M J Verschure, T Voegtlin, and R J Douglas. Environmentally mediated synergy between perception and behavior in mobile robots. Nature, 425:620–624, 2003.
H Wallbott. Bodily expression of emotion. European Journal of Social
Psychology, 28(6):879–896, 1998.
K L Walters and R D Walk. Perception of emotion from body posture.
Bulletin of the Psychonomic Society, 24(5):329, 1986.
S S Wang, W Denk, and M Häusser. Coincidence detection in single
dendritic spines mediated by calcium release. Nature Neuroscience, 3
(12):1266–1273, 2000.
J B Watson. Behaviorism. University of Chicago Press, Chicago, 1930.
T Wehrle and K R Scherer. Towards Computational Modeling of Appraisal Theories. In K R Scherer, A Schorr, and T Johnstone, editors, Appraisal processes in emotion: Theory, methods, research, pages
350–368. Oxford University Press, New York, 2001.
N M Weinberger. Physiological memory in primary auditory cortex: characteristics and mechanisms. Neurobiology of Learning and Memory,
70(1-2):226–251, 1998.
N M Weinberger. Specific long-term memory traces in primary auditory
cortex. Nature Reviews Neuroscience, 5(4):279–290, 2004.
G L Wenk. The nucleus basalis magnocellularis cholinergic system: one
hundred years of progress. Neurobiology of Learning and Memory, 67:
85–95, 1997.
C Whissell. The dictionary of affect. In R Plutchik and H Kellerman,
editors, Emotion: Theory, research, and experience. Academic Press,
New York, 1989.
L M Wilcox, R S Allison, S Elfassy, and C Grelik. Personal space in
virtual reality. ACM Transactions on Applied Perception, 3(4):412–428,
2006.
P Winkielman, K C Berridge, and J L Wilbarger. Unconscious
affective reactions to masked happy versus angry faces influence consumption behavior and judgments of value. Personality & Social Psychology Bulletin, 31(1):121–35, 2005.
L Wittgenstein. Philosophical Investigations. Basil Blackwell, Oxford,
1963.
S N Young and M Leyton. The role of serotonin in human mood and social interaction. Insight from altered tryptophan levels. Pharmacology,
Biochemistry, and Behavior, 71(4):857–65, 2002.
R B Zajonc. On the primacy of affect. American Psychologist, 39(2):
117–123, 1984.
Y Zong, H Dohi, and M Ishizuka. Multimodal presentation markup language MPML with emotion expression functions attached. In Proceedings of the International Symposium on Multimedia Software Engineering, pages 359–365. IEEE Computer Society, 2000.