Embodied Models of Emotions
Verification of Psychological and
Neurobiological Theories of Emotions Using
Virtual and Situated Agents
Martin Pascal Inderbitzin
DOCTORAL THESIS UPF / YEAR 2011
THESIS DIRECTOR
Paul F. M. J. Verschure Departament Tecnologies de la
Informació i les Comunicacions
To Patrick, Pia, Werner, Meg, Nina & Judith
Acknowledgments
The accomplishment of this thesis would not have been possible without the professional and personal support of so many people. First of all I would like to thank Paul Verschure for giving me the opportunity to learn from his profound experience during all these years. His constant support and motivation to go further were fundamental to the success of this project!
Second of all I would like to thank Ulysses for his co-supervision. Your countless critical and constructive inputs were very helpful in improving the quality of my work. Thanks a lot!
My special thanks go to Dominic Massaro, who guided my final study with his impressive experience in scientific research. It was a big honor for me to work with you on this project. I would also like to thank Karen for the warm reception in your house. You two made me feel at home during my stay! Thank you so much for all you have done for me!
I would also like to thank the members of SPECS and my closest friends for their moral and amoral support. Thanks Sytse for introducing me to the scientific world of Belgian alchemy. Thanks Encarnushka for joining me on all our round trips (I am still convinced that it goes to the left!). Thanks Cesu for covering my back during all our C.o.D. discussions. Thanks Anna for giving me a hard time proofreading my documents. Thanks Alasdair for all the inspiring and so helpful inputs on my work. Thanks Ivan for not giving up on me. Thanks Carme, Mireia, Christian, Santa, Joana and Lydia for guiding me through the UPF paperwork jungle! Thanks Alberto for enriching our group with ukulele live performances. Thanks Deco for the invention of the one brain one song contest. Thanks Zenuschka for being Indian. Thanks Sylvain for bringing me to NASA (and back to earth). Thanks Sam for all the programming work. Thanks Arnau or never. Thanks Elena and Eliza for the frappe! Thanks Marti for the lunch invitation. Thanks Pecho Paloma for the lunch. Thanks Armin for showing me that it's possible to finish. Thanks Quique for his funny anecdotes. Thanks Cristina and Belen for enriching our group. And thanks Vicky for the tsipouro, it helped!
My final and deepest thanks go to my family. To Patrick for teaching me resistance, to Pia & Werner for being my compass in so many aspects of life, to Lexa, Maxa and Milla for showing me life outside science, to Meg for her accompaniment, to Nina for always being there and to Judith for all her support during this journey.
Abstract
The investigation of the influence of emotions on human cognition and behavior has challenged scientists for a long time. So far, the most popular approach to investigating this phenomenon has been to observe brain processes and behavior. In the past decade the field of computational neuroscience has proposed a new methodology: the construction of embodied models of emotions and their verification in real-world environments.

In this thesis we present different studies that use computational models of emotions to control the behavior and the expressions of situated agents. Using different methodologies we evaluate both the performance of the models and the behavioral responses of humans interacting with them. Our results add to a deeper understanding of the multidimensional phenomenon of emotions on three levels: perception, interaction, and how the processing of emotional cues influences learning and behavior.
Summary
In this dissertation we address the issue of understanding the phenomenon of human emotions. To do so we pose the question of how we can construct biologically plausible embodied models of emotions. The motivation for asking this question is based on our strong belief that we can understand the nature of emotions by building situated models of them. We do this by equipping agents with emotive architectures that control their behavior in virtual and physical environments. The observation of the agents' performance and the behavior of users interacting with them are used in this thesis to verify existing theories of emotions.
Emotions are multidimensional body-mind states that emerge over time. The basis of every emotion is an appraisal mechanism that evaluates the congruency of an internal or external stimulus with the goals and needs of an individual (Arnold, 1960; Lazarus, 1991; Scherer, 2001). The results of this evaluation mechanism are positive or negative somatic and neuronal adaptations that influence cognition, perception and behavior. Hence, the main function of emotions is the creation of a valence map that helps an individual increase its ability to cope with ambiguous physical and social environments (LeDoux, 1996; Damasio et al., 1996; Craig, 2010). Despite this importance, the underlying neurobiological and psychological mechanisms of emotions are not understood in full detail.
In the first part of this thesis we investigate the perception and integration of affective behavioral features that form the basis of social interaction. In the second part we focus on the neurocomputational processing of emotional cues and their influence on learning and behavior. In the final part we propose an advanced neurocomputational emotive architecture that is based on the insights from our results and the conceptual framework of recent emotion theories.
We start the discourse with the investigation of the perception of a basic emotional behavior: the regulation of interpersonal space towards others (Hall, 1966). Our first study addresses the question of how the perception of a virtual agent or a real person affects social interaction on a spatial scale. Our results reveal that the regulation of personal space codes social behavior that is fundamentally influenced by the perceptual salience of the interactors (Inderbitzin et al., 2009, submitted). The established psychological concept of the 'vividness effect' (Frijda, 1988) states that a more salient stimulus construct induces altered cognitive and behavioral responses. Based on our findings we propose that this is a general mechanism of human perception. Our results add to the understanding of this effect, which is found to crucially influence social interaction.
The result of our first study raises the question of which additional non-verbal behaviors code social signals. In our second study we investigate the perception of emotional states communicated by different styles of locomotion. Our results identify a number of canonical parameters, defining the body configuration of a walking person, that code different valence qualities (Inderbitzin et al., 2011). These results are important for the understanding of the underlying behavioral mechanism that codes non-verbal emotional behavior.
So far, the presented studies focused on the perception of non-verbal affect. In the next study we add the verbal dimension and investigate how humans perceive the emotions transmitted by a talking face. In face-to-face communication, verbal and non-verbal features transmitting emotional meaning form a complex multidimensional stimulus construct. In our third study we investigate the perception and integration of emotional features transmitted by facial expressions and affective words. We compare the behavioral performance of people perceiving a multidimensional stimulus construct that codes either coherent or incoherent affect qualities with the predictions of the fuzzy logical model of perception (FLMP; Massaro, 1998). Subjects were instructed to judge the affect of the facial expression, of the meaning of the word pronounced by the face, or of the global event combining these two properties. As described by the FLMP, both properties influenced judgments when the participants responded quickly. With increasing reaction time, the FLMP did not make better predictions than other models of perception. We conclude that the perception of affect in multiple modalities is an automatic process that can produce interferences, while the integration of these modalities into a global impression is more controlled.
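To make the model comparison concrete, the following minimal sketch contrasts the two integration rules using the standard two-alternative FLMP formulation (Massaro, 1998); the feature values are illustrative and are not the fitted parameters of our experiments.

def flmp(e, w):
    """Fuzzy Logical Model of Perception: multiplicative integration of the
    two fuzzy truth values, followed by the relative goodness rule."""
    support_angry = e * w
    support_happy = (1.0 - e) * (1.0 - w)
    return support_angry / (support_angry + support_happy)

def wam(e, w, weight=0.5):
    """Weighted average model: a linear compromise between the two sources."""
    return weight * e + (1.0 - weight) * w

# A mildly angry face paired with a clearly angry word; values in [0, 1]
# express the degree of support for the response "angry".
e, w = 0.6, 0.9
print(f"FLMP P(angry) = {flmp(e, w):.2f}")  # ~0.93: the clearer source dominates
print(f"WAM  P(angry) = {wam(e, w):.2f}")   # 0.75: simple averaging

The multiplicative rule lets the less ambiguous source dominate the judgment; a weighted average cannot reproduce this super-additive pattern, which is the diagnostic behavior the model fits test for.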
In the second part of this thesis we investigate the underlying mechanism of emotion processing and its influence on learning and behavioral control. We propose a computational model of emotional conditioning that is based on the two-phase theory of conditioning (Inderbitzin et al., 2010a). This theory states that associative learning processes can be separated into a fast, valence-driven, non-specific learning system and a slow, specific learning system. We provide a complete account of Konorski's proposal (Konorski, 1968) by integrating these two systems into a biologically grounded computational model. As an additional benchmark we apply this model to control the behavior of an autonomous robot in an obstacle avoidance task (Inderbitzin et al., 2010b).
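The following minimal sketch illustrates only the conceptual fast/slow separation of the two learning systems; the learning rates and the gating rule are illustrative assumptions, not the biologically grounded circuit presented in Chapter 5.

import numpy as np

N_STIMULI = 5
nls_valence = np.zeros(N_STIMULI)  # fast, non-specific valence weights
sls_timing = np.zeros(N_STIMULI)   # slow, stimulus-specific response weights
ETA_FAST, ETA_SLOW = 0.5, 0.05     # illustrative learning rates

def trial(cs_id, us_present):
    """One conditioning trial: a CS (cue index) optionally paired with a US."""
    if us_present:
        # Non-specific system: rapid valence acquisition for the cue
        nls_valence[cs_id] += ETA_FAST * (1.0 - nls_valence[cs_id])
        # Specific system: slow acquisition, gated by the valence tag
        if nls_valence[cs_id] > 0.5:
            sls_timing[cs_id] += ETA_SLOW * (1.0 - sls_timing[cs_id])

for _ in range(30):
    trial(cs_id=1, us_present=True)

# The fast system saturates within a few trials; the slow system lags behind:
print(nls_valence[1], sls_timing[1])

The point of the sketch is only the division of labor: the non-specific system tags a cue with valence within a few trials, and the specific system then slowly builds the precise response.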
In the last study we construct a neurocomputational model of fear conditioning in order to elicit appropriate behavioral expressions in an android. The robot's ability to learn the valence qualities of different stimuli is tested in a real-world setup that involves interaction with humans. Based on these findings we propose an advanced emotive architecture.
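As a rough illustration of the coincidence-based learning rule that drives the android's expressions, consider the following sketch; the weight update, threshold and parameter values are simplifying assumptions, not the implemented neurocomputational model.

w_cs = 0.0    # synaptic weight of the CS-to-amygdala pathway (illustrative)
ETA = 0.3     # assumed learning rate
THETA = 0.5   # assumed response threshold

def step(cs, us):
    """One time step: update the weight on CS-US co-activation and
    return the expression the model would drive."""
    global w_cs
    if cs and us:                      # Hebbian coincidence at the amygdala
        w_cs += ETA * (1.0 - w_cs)
    activation = w_cs * cs + 1.0 * us  # the US pathway is prewired
    return "angry face" if activation > THETA else "smile"

print(step(cs=1, us=0))  # before pairing: the CS alone elicits a smile
for _ in range(3):
    step(cs=1, us=1)     # paired CS-US presentations
print(step(cs=1, us=0))  # after pairing: the CS alone elicits the angry face

After a few paired presentations the learned weight alone drives the expression, mirroring the conditioning protocol used with the android.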
In this thesis we apply different embodied emotive systems to investigate the underlying mechanisms of emotions. The presented studies illuminate how humans perceive and integrate affective features. We show that this mechanism influences social interaction on a spatial scale. Using different computational models of conditioning we analyze how the underlying computational mechanisms affect behavioral control. These computational models are used successfully to control different types of robots in physical and social environments. Our results add to a deeper understanding of emotions on three levels: perception, interaction and learning.
List of Appended Papers
This thesis is based on the studies listed below. They will be referred to in the text.
M Inderbitzin, S Wierenga, A Väljamäe, U Bernardet, and P F M J
Verschure. Cooperation and competition in the mixed reality space eXperience Induction Machine XIM. Virtual Reality, 13, 153–158, 2009.
M Inderbitzin, I Herreros-Alonso, and P F M J Verschure. Amygdala Induced Plasticity in an Integrated Computational Model of the Two-Phase Theory of Conditioning. 4th International Conference on Cognitive Systems, Zurich, 2010.
M Inderbitzin, I Herreros-Alonso, and P F M J Verschure. An integrated computational model of the two phase theory of classical conditioning. The 2010 International Joint Conference on Neural Networks
(IJCNN), 1-8, 2010.
M Inderbitzin, A Väljamäe, J M B Calvo, P F M J Verschure, and U Bernardet. Expression of emotional states during locomotion based on canonical parameters. IEEE International Conference on Automatic Face and Gesture Recognition, 809–814, 2011.
M Inderbitzin, A Betella, U Bernardet, and P F M J Verschure. The Social Perceptual Salience Effect. Journal of Experimental Psychology: Human Perception and Performance, submitted.
M Inderbitzin, P F M J Verschure, and D W Massaro. Emotion Perception in a Talking Face: Facial and Linguistic Influences. To be submitted.
Other Papers
The author has also contributed to the following publications.
U Bernardet, M Inderbitzin, S Wierenga, A Väljamäe, A Mura and P
F M J Verschure. Validating presence by relying on recollection: Human
experience and performance in the mixed reality system XIM. The 10th
International Workshop on Presence, Padova, Italy, 2008.
U Bernardet, S Bermúdez i Badia, A Duff, M Inderbitzin, S LeGroux,
J Manzolli, Z Mathews, A Mura, A Väljamäe, and P F M J Verschure.
The experience induction machine: a new paradigm for mixed-reality interaction design and psychological experimentation. The Engineering of
Mixed Reality Systems, 357–379, 2010.
U Bernardet, A Väljamäe, M Inderbitzin, S Wierenga and P F M J
Verschure. Quantifying human subjective experience and social interaction using the eXperience Induction Machine. Brain Research Bulletin,
In press.
Contents

Summary
List of Figures
List of Tables

1 INTRODUCTION
  1.1 What Are Emotions?
  1.2 The Building Blocks of an Emotion
    1.2.1 Needs and Motivation
    1.2.2 The Valence System
    1.2.3 The Appraisal Mechanism
    1.2.4 Neurocomputational, Physiological and Behavioral Responses
  1.3 The Time Scale of Emotions
  1.4 What Distinguishes an Emotion From a Non-Emotion?
    1.4.1 The Feeling Theory of Emotions
    1.4.2 The Cognitive Approach to Emotions
  1.5 Basic and Complex Emotions
  1.6 The Neurobiological Basis of Emotions
    1.6.1 Subcortical Areas
    1.6.2 Cortical Areas

2 SYNTHETIC EMOTIONS AND EMOTIONAL AGENTS
  2.1 Synthetic Emotions
    2.1.1 Theory Modeling
    2.1.2 Application Modeling
  2.2 Emotional Agents
    2.2.1 Virtual Agents
    2.2.2 Physical Agents

3 NON-VERBAL BEHAVIOR AND SOCIAL INTERACTION
  3.1 Human Spatial Behavior
  3.2 The Effect of Apparent Reality
  3.3 Methods
    3.3.1 Materials
    3.3.2 Research Design
    3.3.3 Measures
    3.3.4 Procedure
    3.3.5 Participants
  3.4 Results
    3.4.1 Spatial Scale of Collaborative Behavior
    3.4.2 Effect of Players Representation on the Spatial Interaction
  3.5 Discussion
  3.6 Conclusion

4 PERCEPTION OF EMOTIONS
  4.1 Emotion Perception in Locomotion
    4.1.1 Methods
    4.1.2 Results
    4.1.3 Discussion & Conclusion
  4.2 Emotion Perception in the Talking Face
    4.2.1 The Fuzzy Logical Model of Perception
    4.2.2 The Weighted Average Model of Perception
    4.2.3 Automatic Processing of Information
    4.2.4 Automatic Processing of Affective Faces and Words
    4.2.5 Experiment 1
    4.2.6 Results
    4.2.7 Discussion
    4.2.8 Experiment 2
    4.2.9 Results
    4.2.10 Discussion
  4.3 Conclusion

5 COMPUTATIONAL MODEL OF EMOTION INDUCED LEARNING
  5.1 The Two Phase Model of Conditioning
  5.2 Methods
    5.2.1 The Circuit
    5.2.2 The Non-specific Learning System
    5.2.3 The Specific Learning System
    5.2.4 Integrating the NSL with the SLS
    5.2.5 Robot Application
  5.3 Results
    5.3.1 Performance of the Integrated Model
    5.3.2 Performance of the Robot
  5.4 Conclusion

6 CONSTRUCTING AN EMOTIVE ANDROID
  6.1 The Neurobiological Mechanism of Fear
  6.2 Embodied Emotive Model
    6.2.1 Model Architecture
    6.2.2 Experimental Design
    6.2.3 Conditioning
    6.2.4 Discussion & Conclusion
  6.3 Proposal for an Advanced Emotive Architecture
    6.3.1 Theoretical Basis
    6.3.2 Distributed Adaptive Control
    6.3.3 Conclusion

7 CONCLUSION
List of Figures

1.1 Schematic illustration of the Cartesian analysis of anger. The exciting cause from the external world stimulates the body spirits that are conceptualized as the immediate cause of emotions. The bodily spirits give rise to both the behavioral response and the emotion itself. In the Cartesian approach it is not clear what the object of the emotion is. Figure retrieved from (Power and Dalgleish, 1997).

1.2 The cognitive account of emotions by Aristotle applied to anger. The object describes the external event that becomes evaluated by the individual, who is in an appropriate state of mind. The result is an internal representation or stimulus that elicits the emotional response, which is divided into the dimensions of matter and form. Figure retrieved from (Power and Dalgleish, 1997).

1.3 The connectivity of the amygdala. This nucleus receives inputs from all sensory modalities, cortical and subcortical areas. The output is transmitted to modulatory systems and neuronal correlates in the brain stem. The direct connection to the hypothalamus allows the amygdala to trigger hormonal responses. Figure adapted from (LeDoux, 2006).

1.4 The connectivity of cortical and sub-cortical clusters. The prefrontal cortex is highly connected to the amygdala, sensory cortices, the hippocampus and nuclei in the brain stem that regulate hormonal responses. Figure from (Salzman and Fusi, 2010).

2.1 Full body humanoid robots. Asimo (left), Hubo (middle) and iCub (right) are three examples of androids with different capabilities and objectives.

2.2 Small full body humanoid robots. Nao (left) and Qrio from Sony (right).

2.3 The three most popular teleoperated androids, so-called geminoids: Model F (left), HI (middle) and DK (right) in the front row, with their human 'originals': an anonymous young female (left), Hiroshi Ishiguro from Osaka University, Japan (middle) and Henrik Schärfe from Aalborg University, Denmark (right).

2.4 Upper torso robots: Nexi (left), Domo (center left), Barthoc (center right) and Armar3 (right), partially with mobile platform.

2.5 Expressive robot heads: Kismet (left), Mertz (center) and Roman (right).

2.6 Zoomorphic robots: Emuu (left), iCat (center left), Leonardo (center right) and Probo (right).

3.1 The eXperience Induction Machine XIM, a fully instrumented mixed reality space that can be accessed by multiple users either as physical visitors or virtual representations. Virtual visitors are represented in the physical space of the XIM on the surrounding screen and as lit floor tiles. Physical visitors are represented as virtual characters in the virtual world.

3.2 In the Mixed condition one remote player built a team with one physical player. The remote players played the game using a computer and a game pad. Physical players inside the XIM were represented as avatars on the screen of remote players. Verbal communication between the remote and physical player was established over a wireless communication headset.

3.3 Spatial distribution of an example epoch. The ball play-out (red dot) starts in the middle of the field. At the beginning of the epoch team players were positioned on their team side (blue and green dots). The trajectories of the players show their spatial behavior over time. Play direction was vertical. Team 2 scored a goal when the ball reached the back line of team 1.

3.4 Distribution of epoch winners (right panel) and epoch losers (left panel) for all goal events. The graph only shows one side of the game field; play direction is from top down and vice versa. The colorbar indicates the accumulated position of players over time. Winners chose more static and defensive positions compared to losers.

3.5 Distribution of epoch winners (right panel) and epoch losers (left panel) for all goal events. The graph only shows one side of the game field; play direction is from top down and vice versa. The colorbar indicates the accumulated position of players over time. Winners chose more static and defensive positions compared to losers.

3.6 Schematic representation of the three conditions. Only teams of the same condition played against each other. Left panel: two Physical teams compete against each other; all four players are physically present inside XIM. Middle panel: in the Mixed condition one player of each team is present inside XIM and the other player is virtually represented; virtual players use a computer to play the game. Right panel: in the Virtual condition all four players use a computer to play and are virtually represented inside XIM.

3.7 Schematic representation of the detailed analysis of player behavior in different conditions. We compared the behavior of XIM players in the Physical condition with the behavior of the XIM players in the Mixed condition (A), and the behavior of the remote players in the Mixed condition with the behavior of the remote players in the Virtual condition (B).

4.1 Still images of stimuli in frontal view (A-C) and side view (D-F). Head/torso inclination varied between 55 degrees down (A, D), zero degrees (B, E), and 15 degrees up (C, F).

4.2 Valence and arousal rating for head/torso inclination. Error bars indicate standard error. Valence rating 0 indicates a very sad emotional state, rating 10 a very happy state. Arousal rating 0 indicates low arousal, arousal rating 10 indicates high arousal.

4.3 Valence and arousal rating for different speed parameters. Error bars indicate standard error. Valence rating 0 indicates a very sad emotional state, rating 10 a very happy state. Arousal rating 0 indicates a low arousal state, arousal rating 10 indicates a high arousal state.

4.4 Distribution of the animations in the circumplex space. The legend indicates the stimuli parameter space of the different animations: <speed>.<viewing angle>.<head/torso inclination>. The speed parameter is defined as Fast = 1.4 m/sec, Medium = 0.75 m/sec and Slow = 0.5 m/sec. The viewing angle varies between profile view = 90 degrees and rotated frontal view = 45 degrees. The parameter for the head/torso inclination varies between Neutral = 0 degrees, Up = +15 degrees and Down = -55 degrees.

4.5 Schematic representation of the three stages involved in perceptual recognition proposed by the Fuzzy Logical Model of Perception (FLMP). The three processes are temporally successive, but overlapping. Reading direction in the diagram is from left to right. The model is explained with a task where subjects have to integrate affect from words and expressions. The sources of information are indicated by upper-case letters: expressive information by Ei, word information by Wj. The evaluation process transforms this information into perceived features, indicated by lower-case letters ei and wj. The integration process results in an overall degree of support sk for a given affect k. The decision process maps the output of the integration into a response Rk. All three processes make use of prototypes stored in memory.

4.6 Tree of wisdom illustrating binary oppositions central to the differences among theories of perception. Figure retrieved from (Massaro, 1998).

4.7 The affective facial expressions of the stimulus space used in experiment 1. The eyebrows and the mouth corner deflection of Baldi were varied to produce a stimulus continuum from happy to angry.

4.8 Reaction time in the expression condition (left) and the word condition (right). When the stimulus construct had coherent valence qualities reaction times were reduced in both conditions. The box indicates the 25th and the 75th percentile; the whiskers indicate the most extreme data points not considered as outliers. The horizontal line is the median.

4.9 Observations (symbols) and predictions (lines) for the fuzzy logical model of perception (FLMP) in the expression condition (left) and the linguistic semantics condition (right). We observed a significant influence of the angry words on the judgments of the neutral facial expressions (left panel). This effect was not observed in the linguistic semantics condition (right panel).

4.10 Observations (symbols) and predictions (lines) for the fuzzy logical model of perception (FLMP, left) and the weighted additive model of perception (AMP, right). The plot shows the fits for the bimodal condition where subjects had to identify the affect of the overall event. The FLMP makes a significantly better prediction for the observed data compared to the AMP.

4.11 The affective facial expressions of the stimulus space used in experiment 2. The eyebrows and the mouth corner deflection of Baldi were varied to produce a stimulus continuum from happy H (top left) to angry A (bottom right) in 10 steps. The letter N indicates a neutral intermediate state. The number indicates the strength of the affect.

4.12 The mean RT in experiment 1 (M = 0.97, SD = 0.5) was significantly faster compared to experiment 2 (M = 2.12, SD = 1.1) (Wilcoxon z = 79.6, p < 0.01).

4.13 Observations (symbols) and predictions (lines) for the fuzzy logical model of perception (FLMP) in the expression condition (left) and the linguistic semantics condition (right).

4.14 Observations (symbols) and predictions (lines) for the fuzzy logical model of perception (FLMP, left) and the weighted average model (WAM, right). The average root mean square deviation (RMSD) for the FLMP (0.032) and the WAM (0.031) did not differ in their quality of prediction.

5.1 The architecture of the integrated model: the non-specific learning system (NLS) is shown on the left, the specific learning system (SLS) on the right. In the NLS the activation of the amygdala (A) and the nucleus basalis (NB) induces plasticity in the auditory cortex (AC). The conditioning stimulus (CS) reaches the auditory cortex via the thalamus (Th), where it converges with the unconditioned stimulus (US). Inhibitory interneurons (IN) regulate the amount of plasticity. The pontine nucleus (PN) gates the stimulation from the NLS to the SLS. In the SLS the CS and the US converge at the level of the Purkinje cell, resulting in the induction of LTD at the Purkinje synapse. This induces a dis-inhibition of the deep nucleus (DN), leading to the exactly timed motor conditioned response (CR). The reflexive unconditioned response (UR) is elicited without adaptive processing. A amygdala; AC auditory cortex; CS conditioning stimulus; DN deep nucleus; GC granule cells; IN inhibitory interneurons; IO inferior olive; NB nucleus basalis; CR conditioned reaction; PN pontine nucleus; PU Purkinje cell; Th thalamus; US unconditioned stimulus.

5.2 The architecture of the cerebellar SLS. The CS and the US converge at the Purkinje cell synapse (PU-SYN). CF climbing fibre; CR conditioned reaction; CS conditioned stimulus; DN deep nucleus; GA granule cells; GO golgi cells; IIN inhibitory interneurons; IO inferior olive; MF mossy fibre; PF parallel fibre; PU-SP Purkinje cell spontaneous activity; PU-SO Purkinje cell soma; PU-SYN Purkinje cell synapse; US unconditioned stimulus.

5.3 Robot application: an ePuck robot moves autonomously in a circular open field arena. The association of the red color on the floor detected by a camera (CS) and the detection of the wall by proximity sensors (US) induced learning in the proposed computational mechanism. The green arrows indicate the moving direction of the robot.

5.4 Reactivity of the auditory cortex before and after the conditioning. The CS is the stimulus with ID 1. Before the conditioning the cortical reaction to all 5 stimuli is homogeneous. After the conditioning the cortical response to the CS is increased.

5.5 Learning of the exactly timed CR by the SLS: the PU cell activity decreases during conditioning trials 1-13. During trial 12 the activity under-runs the threshold for the first time, resulting in the dis-inhibition of the deep nucleus. During trial 13 the PU cell activity under-runs the threshold before the US and an exactly timed CR is triggered. The CS and the US are only schematically represented in this plot.

5.6 The performance of the integrated model before the conditioning. The Purkinje cell (PU) does not change its activity and no CR is elicited. CS conditioned stimulus; US unconditioned stimulus; AC auditory cortex; PU Purkinje cell; CR conditioned reaction.

5.7 The performance of the model after the conditioning. The CS representation in the auditory cortex (AC) is increased. A delayed pause in the Purkinje cell (PU) can be observed. The CR is elicited just before the US presentation. CS conditioned stimulus; US unconditioned stimulus; AC auditory cortex; PU Purkinje cell; CR conditioned reaction.

5.8 The behavior of the ePuck robot before conditioning. The robot enters the red area of the arena. The proximity sensors detect the wall (US) and elicit the unconditioned response (UR) in the form of a late turning. The blue line indicates the track of the robot in the arena.

5.9 The behavior of the ePuck robot after conditioning. The robot does not enter the red area of the arena. The camera detects the red color (CS) and the model elicits a conditioned response (CR) in the form of an exactly timed turning. The blue line indicates the track of the robot.

5.10 The change of the synaptic weight at the level of the PF-PU synapse during the robot experiment. Every time a CS and a US coincide at the level of the Purkinje synapse, LTD becomes induced. Once the synaptic efficacy reaches a critical level a conditioned response becomes triggered, avoiding future LTD induction, and the synaptic weight becomes stable.

5.11 The performance of the ePuck robot measured by the percentage of performed conditioned responses and occurred US. After 113 trials the robot shows conditioned behavior. The fluctuation in response is due to spontaneous recovery of the synaptic transmission at the Purkinje cell. Whiskers indicate STD.

6.1 During the conditioning phase (left panel) an animal is exposed to a neutral tone (CS) and an aversive foot shock (US). After the conditioning phase (right panel) the animal reacts with a freezing response when exposed to the originally neutral tone (CS). Figure adapted from Nadel and Land (2000).

6.2 An aversive stimulus is transmitted by two pathways to the amygdala: the low route transmits the sensory information directly from the thalamus to the amygdala; this route is fast and responsible for unspecific behavioral responses. The high route sends the sensory input to cortical areas for the evaluation of the stimulus features; this route is slower, but capable of eliciting more specific cognitive and behavioral responses. Figure adapted from LeDoux (1994).

6.3 The processing of a neutral CS and an aversive US. When CS and US coincide at the location of the amygdala, learning is induced. The results are different physiological and behavioral responses. LA lateral amygdala; CE central amygdala; CG central gray; LH lateral hypothalamus; PVN paraventricular hypothalamus. Figure adapted from Medina et al. (2002).

6.4 Schematic representation of the fear conditioning model. The visual stimulus and the audio stimulus are transmitted over the thalamus to the amygdala, where they coincide. This co-activation induces an adaptation of the synaptic weight. After conditioning, the change in synaptic weight allows the CS to trigger the behavioral response.

6.5 The iCub uses LED lights to express different emotions in the face. The picture shows its angry expression that was used in the present study.

6.6 Experimental design of the fear conditioning in the iCub. The association of a neutral CS with an aversive US induces a change in plasticity. After conditioning, the CS alone is capable of eliciting the behavioral response. A non-conditioned stimulus (NS) elicits an unconditioned response also after the conditioning phase.

6.7 The conditioning phase of the iCub. Before conditioning the iCub smiles when seeing either red (A) or blue (B). During the conditioning phase the robot sees the blue hat while hearing 4-5 aversive noise events (C). After the conditioning the robot reacts with an angry face when seeing the blue hat (E), but still smiles when seeing the red color (D).

6.8 The component process model (Scherer, 2001; Sander et al., 2005). Represented are the five components of emotion (vertical) as well as the sequence of appraisals (horizontal) and the interaction between subsystems that gradually shape the emotion, supporting the genesis of a particular feeling.

6.9 The system architecture of DAC: the system consists of three tightly coupled layers: reactive, adaptive and contextual. The reactive layer endows a behaving system with a prewired repertoire of reflexes (low complexity unconditioned stimuli and responses) that enable it to display simple adaptive behaviors. The activation of any reflex, however, also provides cues for learning that are used by the adaptive layer via representations of internal states, i.e. aversive and appetitive. The adaptive layer provides the mechanisms for the adaptive classification of sensory events and the reshaping of responses. The sensory and motor representations formed at the level of adaptive control provide the inputs to the contextual layer, which acquires, retains, and expresses sequential representations using systems for short- and long-term memory. The contextual layer describes goal-oriented learning and reflexive mechanisms.
List of Tables

1.1 'Basic' emotion classes of different theorists according to Ortony and Turner (1990).

3.1 Proxemics behavior of winners and losers: mean time of shared interaction space; standard deviation in brackets. IS = intimate space; PS = personal space; Sig = significance (a p < 0.1, * p < 0.05, ** p < 0.01).

3.2 Spatial intra-team interactions for winners and losers during the entire game, winning and losing epochs, and offensive and defensive game situations: mean intra-team member distance; standard deviation in brackets. ITMD = Intra-Team Member Distance; Sig = significance level (** p < 0.01).

3.3 Spatial behavior of XIM players and remote players: mean sprinted distance, mean distance to the mid-line of the team side and mean time spent in the field side of the team member (time behind mid-line); standard deviation in brackets. Sig = significance level (* p < 0.05, ** p < 0.01).

4.1 Specification of the stimuli parameters.

6.1 Levels of processing for stimulus evaluation checks. Adapted from Leventhal and Scherer (1987).
Chapter 1
INTRODUCTION
1.1 What Are Emotions?
Before I can introduce our studies, I have to answer some basic questions addressing the phenomenon of emotions. This includes basic definitions that I will use later in our argumentation. This is an important step because different emotion scientists use the same terms for different concepts, so it is fundamental to clearly define one's own position in this discourse.

We will use the following five questions to start the discussion about emotions.
1. What are the building blocks of an emotion?
2. What is the time scale of an emotion?
3. What distinguishes an emotion from a non-emotion?
4. How do we distinguish different emotions?
5. What is the neurobiological basis of an emotion?
Each of these questions will be answered during the introduction. The last part of the introduction provides a summary of how scientists have constructed models of synthetic emotions and how they implemented them in virtual and physical agents.
1.2 The Building Blocks of an Emotion
An emotion is a multi-dimensional body-mind state that emerges over time. Different functional building blocks that can be identified and described form part of this state.

The driving forces of every emotion are the needs and motivations of an agent to act. As we will see, such needs can be very basic or complex. At the core of the emergence of an affective state is an appraisal mechanism that evaluates the congruency of an agent's needs and goals with a perceived internal state or external stimulus. This evaluation is a multi-modal process that involves simple reactive and complex cognitive brain processes. Goal congruency induces positive emotions; goal incongruency induces negative emotions. The resulting emotional state is a mix of specific cognitive activity patterns and physiological reactions. It allows an agent to access implicit mental states that affect behavior, cognition, memory and perception. The physiological state affects somatic performance and sensory sensitivity. Mental and physiological states are connected in a closed loop. The function of emotions is to increase an agent's capability to survive in its physical and social environment.
1.2.1 Needs and Motivation
Needs define the conditions of an agent's well-being. The satisfaction of such needs is the driving motor of emotions.
The most basic examples are physiological needs that control nutrition intake, sleep and the urge for security. In biology, the processes that control such fundamental needs are described as homeostasis and allostasis. Homeostasis is the regulation of a physiological state within a certain threshold, controlling for one parameter. The regulation of body temperature is a good example to visualize this process: a discrepancy from the desired value induces physiological and behavioral changes to re-establish stability. Allostasis maintains the internal stability of an organism in an adaptive fashion; by actively adjusting to predictable and unpredictable future needs, allostasis controls in an anticipatory manner. Dehydration is a good example to visualize the difference between homeostasis and allostasis. The reduction of sweating is a simple homeostatic process to control for the loss of water. The orchestration of multiple such homeostatic processes, like urine reduction, mucous membrane dehydration or blood pressure regulation, that directly or indirectly help to maintain stability is called allostasis. In contrast to homeostasis, this process can happen in an anticipatory fashion.
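The following sketch makes the distinction concrete; the set point, gains and trigger conditions are illustrative assumptions, not physiological values.

SET_POINT = 37.0  # desired core temperature in degrees Celsius

def homeostasis(temperature):
    """Reactive control of a single parameter: respond to the current
    discrepancy from the set point."""
    error = temperature - SET_POINT
    return {"sweating": max(0.0, error), "shivering": max(0.0, -error)}

def allostasis(hydration, predicted_heat):
    """Anticipatory orchestration of several homeostatic processes:
    adjust before the predicted future need arrives."""
    responses = {}
    if predicted_heat > 0.5 and hydration < 0.5:
        responses["reduce_sweating"] = True       # conserve water now...
        responses["reduce_urine"] = True          # ...across several systems
        responses["raise_blood_pressure"] = True
    return responses

print(homeostasis(38.2))  # reacts to an actual deviation
print(allostasis(hydration=0.3, predicted_heat=0.9))  # acts on a prediction

The homeostatic controller only ever reacts to a present deviation, whereas the allostatic one recruits several processes on the basis of a prediction.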
The biological regulation of fundamental needs establishes the health and survival of an organism. The result of this regulation is a set of basic drives, like foraging or the defense of territory. One may ask why reproduction is not listed as a fundamental need of an organism. This is because reproduction is a fundamental need of a species, but not of an individual: an organism can survive without reproduction, while the species cannot. This does not mean that reproduction is not a strong need.
Basic needs directly regulate survival and are therefore defined as fundamental. Non-fundamental needs drive behavior that is not directly linked to survival, but to the well-being of an organism. Some examples are sex, the need for belongingness and esteem. Even more complex needs are self-actualization, a cognitive need for understanding and a need for aesthetics (Maslow, 1954). The need for self-actualization describes an individual's desire for self-fulfillment. The cognitive need for knowledge and understanding has different underlying mechanisms. Acquiring knowledge and systematizing the environment increases security and reduces unpredictability. Curiosity is an innate mechanism that does not have to be taught to infants. A second explanation of this need is the satisfaction of insight and understanding. Aesthetic needs are based on a craving for beauty and a repulsion of ugliness.
The main difference between needs and motivation is that needs define desired conditions while motivations describe cognitive states of wants. In a healthy subject all unsatisfied needs are succeeded by a motivational state. Mostly, but not always, this state drives a behavioral action. Motivation is not the motor of behavior; it only describes the causal mechanism of action initiation. An agent is motivated because he or she detects an unsatisfied need. Needs are organized into a hierarchy of relative prepotency. This means that if two needs from different hierarchy levels are unsatisfied, the more basic need will drive behavior. Gratification and satisfaction form a fundamental concept in motivation theory. A satisfied need gives rise to another need. This does not imply that a need has to be 100% satisfied before the next need starts driving.

Gratification of needs, and deprivation of the resources and conditions that satisfy needs, form the basic mechanism for the emergence of emotional states. Motivational states provide an agent with the drive that is necessary to act when a need is not satisfied.
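The following sketch illustrates the principle of relative prepotency; the hierarchy follows Maslow (1954), while the satisfaction threshold is an illustrative assumption.

HIERARCHY = ["physiological", "security", "belongingness",
             "esteem", "self-actualization"]

def driving_need(satisfaction, threshold=0.8):
    """Return the most basic need whose satisfaction is below threshold;
    a need does not have to be 100% satisfied before the next one can drive."""
    for need in HIERARCHY:
        if satisfaction.get(need, 0.0) < threshold:
            return need
    return None  # all needs sufficiently gratified

state = {"physiological": 0.9, "security": 0.4, "esteem": 0.2}
print(driving_need(state))  # -> "security": more basic than esteem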
1.2.2 The Valence System
Emotions code valence. This is the most important functional axiom of every emotion: they inform an individual whether a stimulus is good or bad. Because stimuli can have different levels of complexity, different mental and somatic processes are involved in the human valence system. As we will see later in this introduction, different subcortical and cortical clusters as well as bodily reactions have been identified so far as processing the valence of a stimulus. The mechanism of evaluation is called appraisal.
1.2.3 The Appraisal Mechanism
Appraisal is a cognitive-somatic evaluation process that identifies a stimulus as affecting an agent positively or negatively (Arnold, 1960). This stimulus can be internal or external. The evaluation process is structured into different dimensions (Lazarus, 1991). Goal relevance determines whether a given stimulus or situation is relevant to one's goals. The dimension of goal congruency determines whether the stimulus or situation facilitates or averts the agent's goals. The type of ego-involvement identifies the relationship between the agent's ego-identity and a group of people. The fourth dimension determines who or what is accountable and whether credit or blame should be assigned. The coping potential defines whether the agent is capable of dealing with the result of the stimulus or situation. The future expectancy estimates the likelihood of further congruencies with the agent's goals.
The six dimensions of appraisal:
• Goal relevance
• Goal congruency
• Type of ego-involvement
• Blame or credit
• Coping potential
• Future expectancy
All six dimensions are processed in parallel, overlapping in time. The result of this appraisal mechanism is the induction of positive or negative affect, expressed by different responses on cognitive and somatic levels.
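A minimal sketch of such an appraisal follows, assuming an illustrative scoring of the six dimensions; the weights are not taken from the appraisal literature.

from dataclasses import dataclass

@dataclass
class Appraisal:
    goal_relevance: float     # 0 = irrelevant .. 1 = highly relevant
    goal_congruency: float    # -1 = averts goals .. +1 = facilitates goals
    ego_involvement: float    # 0 .. 1
    blame_or_credit: float    # -1 = blame .. +1 = credit
    coping_potential: float   # 0 = cannot cope .. 1 = copes easily
    future_expectancy: float  # -1 .. +1 expected future (in)congruency

def affect(a):
    """Positive return value -> positive affect, negative -> negative affect.
    An irrelevant stimulus (relevance near 0) produces no emotion."""
    core = (a.goal_congruency + 0.5 * a.future_expectancy
            + 0.3 * a.blame_or_credit)
    if core < 0:
        core *= 2.0 - a.coping_potential  # low coping amplifies negative affect
    return a.goal_relevance * core

bear = Appraisal(goal_relevance=1.0, goal_congruency=-1.0, ego_involvement=0.2,
                 blame_or_credit=0.0, coping_potential=0.1,
                 future_expectancy=-0.8)
print(affect(bear))  # strongly negative: a fear-like state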
1.2.4 Neurocomputational, Physiological and Behavioral Responses
The final building block of emotions is the response repertoire. On a cognitive level we observe influences on memory-building processes, perception, attention and decision mechanisms. Adaptations in synaptic plasticity and neurotransmitter release have been identified as crucial neurocomputational factors responsible for these cognitive responses. On a physiological level the human body modulates hormonal levels, energy consumption, transpiration, respiration, body temperature and muscle control. These adaptations produce a wide variety of expressive and goal-directed behaviors.
1.3 The Time Scale of Emotions
It is important to understand that the emergence of an emotion is a sequentially structured process. The three mechanisms that underlie this process are perception, evaluation and response.

Through perception a system detects and identifies internal and external stimuli, for example a bear as a big animal. This process can also happen unconsciously; the perception of homeostatic or physiological states, for example, does not always reach consciousness. At this stage of processing we do not have any information about the meaning of the stimulus. The second stage is appraisal. This mechanism evaluates whether the stimulus is goal-congruent or not, and it differentiates emotions from non-emotional cognitive processing. The appraisal mechanism can be processed on a somato-sensory level with very little cognitive activity: the smell of rotten food, for example, induces a straightforward emotion of disgust. But appraisal can also happen without any somato-sensory input. For example, the imagination of past or future events, like singing in front of an audience, can induce emotions such as fear or happiness. The last stage of the sequence is the response. This includes three dimensions: cognition, physiology and behavior. The paradigm of fear conditioning is a prime example of how dramatically cognition, physiology and behavior can be affected by potent stimuli. The endurance of these responses varies across systems. Behavioral responses often happen quickly, while physiological and cognitive responses can have long-lasting effects.

Emotions are not an on-off mechanism. They arise and disappear over time. Because the evaluation and the responses involve somatic and cognitive systems, the time course varies highly across systems. A fearful stimulus induces a quick behavioral and also a hormonal response, and the released stress hormones may stay in the blood for hours. On a cognitive level a positive experience can be stored in memory and retrieved over and over again, inducing a long-lasting positive feeling. This means that emotions can be induced very quickly but have long-lasting effects. Emotion research therefore classifies emotional states that last over multiple hours or days as moods (Frijda, 1993). The conscious perception of the mental representations that characterize emotions has been classified as feeling (Damasio, 2001). It is important to make these differentiations, which are based on the time course of the emergence of an affective state.
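The following sketch illustrates the diverging time courses with two leaky integrators; the time constants are illustrative assumptions, not physiological measurements.

import numpy as np

dt = 1.0                            # seconds per step
t = np.arange(0, 600, dt)           # ten minutes
stimulus = (t < 2.0).astype(float)  # a brief fearful event

behavior = np.zeros_like(t)  # e.g. fleeing
hormone = np.zeros_like(t)   # e.g. circulating stress hormone
TAU_FAST, TAU_SLOW = 3.0, 180.0

for i in range(1, len(t)):
    behavior[i] = behavior[i-1] + dt * (stimulus[i] - behavior[i-1] / TAU_FAST)
    hormone[i] = hormone[i-1] + dt * (stimulus[i] - hormone[i-1] / TAU_SLOW)

print(behavior[5], hormone[5])      # just after the event: both elevated
print(behavior[300], hormone[300])  # five minutes later: the behavioral
                                    # response has vanished, the hormonal
                                    # trace is still elevated

Both traces are driven by the same two-second event, but the slow system remains elevated long after the fast one has returned to baseline, which is the basis for distinguishing emotions from moods.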
1.4 What Distinguishes an Emotion From a Non-Emotion?
The understanding of emotions has changed fundamentally throughout history. One main conceptual shift can be summarized with the body-mind problem: Are emotions body responses or cognitive states? Where is the location of emotions? As we will see, one fundamental difference between emotions and non-emotions is a unique interconnection of somatic and cognitive processes. Emotion scientists have elaborated different theories that try to identify the location and relation of different emotional processes.

Already in early theories of emotions, which can be traced back to the Greek and Asian philosophers, we observe two main accounts of emotions that later gave rise to distinguishable philosophical streams addressing this problem. In the dualistic approach the emotions, or passions, are placed in the immortal soul and affected by the bodily spirits. The immediate cause of bodily spirit movements is an external cause that gives rise to an emotional reaction. The object of such an event is the content of what the emotion is about (see Figure 1.1).
1.4.1 The Feeling Theory of Emotions
The Cartesian idea, also known as the Feeling Theory of Emotions, was fundamentally expressed in the work of William James (1842-1910) and Carl Lange (1834-1900). The James-Lange theory of emotions can be conceptualized with the well-known example of the bear:

    Walking through the woods one day, Susan stumbles across a large grizzly bear which then starts running towards her. She turns and runs away. The conscious perception of the change of her physiological state makes her feel terrified.

Figure 1.1: Schematic illustration of the Cartesian analysis of anger. The exciting cause from the external world stimulates the body spirits that are conceptualized as the immediate cause of emotions. The bodily spirits give rise to both the behavioral response and the emotion itself. In the Cartesian approach it is not clear what the object of the emotion is. Figure retrieved from (Power and Dalgleish, 1997)

So the person starts running, which induces a physiological change. The perception of this exciting fact is the emotion itself (James, 1884).
There are different problems associated with the Feeling Theory of Emotions. It has been claimed that this theory cannot explain the wide range of different emotions and behaviors (Cannon, 1927). Physiological states are ambiguous, so how do we differentiate between fear, anger or jealousy? Moreover, the body can also become physiologically aroused without experiencing any emotion; intense sport activity is a good example of that case. Another problem of the Feeling Theory is that it regards emotions as inner states that can only be known through introspection. Philosophers conceptualized this problem as the Private Language Problem (Wittgenstein, 1963). It states that any word that only describes a state that can be accessed by subjective observation acquires a purely private and unverifiable meaning. The two described problems summarize the main critique scientists expressed in response to the Feeling Theory of Emotions.
A related approach is that of psychological behaviorism. The principal idea of this scientific direction is that we cannot base our theories on introspective mental states that cannot be accessed from 'outside'. Its theoretical goal is to make predictions of behavior using objective observational experimental methods (Watson, 1930). Both approaches, psychological behaviorism and the Feeling Theory of Emotions, claim that the constituent parts of emotions are the physiological responses induced by the stimulus. The main difference is that James, Lange and their followers make some claims about the mental states of the subject, while the psychological behaviorists base their conclusions on objective observations. The Watsonian account describes the basic emotions of fear, rage and love. But this approach faces the same difficulties as the Feeling Theory of Emotions when it wants to describe a wider range of emotions. Other behaviorists tried to face this critique by identifying sets of operants that induce clearly defined reinforced behavior (Skinner, 1976). The main idea is that a set of operants A induces a set of reinforcers A; this set defines emotion 1. Emotional state 2 is then induced by a set of operants B leading to a set of reinforcers B. Going back to the bear example we can visualize the problem we are facing with this account: let us assume that Susan decides to stand still rather than start running. According to Skinner's theory this would lead to a different emotional state, because we are changing the operants. But we cannot be sure that this is the case.
1.4.2 The Cognitive Approach to Emotions
We see that we are facing fundamental problems with emotion theories that base the description of emotional states mainly on the physiological changes of the body. This is the main reason why scientists early on started to include the mind in their investigations of emotions, elaborating cognitive theories of emotions. A fundamental basis of Aristotle's work is the distinction of matter and form in any individual entity (Evans, 1995). It means that any individual entity can be described by what it is made of (the matter) and by what makes it what it is (the form). Applied to the emotions this means that the physiological response, for example the boiling blood, accounts for the matter, while the relationship to the object of the induced emotion accounts for the form. Aristotle's functionalist view of the emotions is based on three conditions that have to be satisfied to elicit an emotional state: first, the object that describes the external situation the individual is confronted with; second, the individual must have a state of mind that allows him to experience the emotion; and third, there must be a stimulus capable of eliciting the emotion (see Figure 1.2).

Figure 1.2: The cognitive account of emotions by Aristotle applied to anger. The object describes the external event that becomes evaluated by the individual, who is in an appropriate state of mind. The result is an internal representation or stimulus that elicits the emotional response, which is divided into the dimensions of matter and form. Figure retrieved from (Power and Dalgleish, 1997)

Aristotle's functional concept of the stimulus introduces the mind as an evaluator of the confronted situation and allows us to distinguish between different emotions. Thomas Aquinas (1225-1274) and Baruch Spinoza (1632-1677) are probably the best-known philosophers who took up this functional approach to emotion in their work. A difference to the concept of Aristotle is that both philosophers include in their theory a non-cognitive impulse that controls the initial approach to or avoidance of an object. This impulse induces a physiological tone that affects a basic emotional state, like pleasure or pain. In a second step, cognition evaluates the 'accompanied idea' of the emotion. In Spinoza's approach cognition has no causal role and thus it is an example of a weak cognitive theory of emotions. Another issue of discussion is the non-cognitive nature of the initial impulse: it is unclear how this impulse gives rise to one or another physiological tone.
Both philosophical streams, the Feeling Theory of Emotions and the cognitive approach, tried to identify the location of emotion processing either in the body or in the mind. Current theories of emotions are mainly inspired by the cognitive account, without ignoring the somatic component of the body. As we will see, we find components of both streams in modern emotion research. One unifying idea in the current discussion is the mechanism of appraisal, or the cognitive evaluation of a stimulus as affecting oneself in some way that matters (Arnold, 1960).
Although the cognitive appraisal theory of emotion is very popular today, it would be wrong to say that the Feeling Theory of Emotions is absent from the current debate. One of the most popular defenders of this idea was Robert Zajonc. He stated in his well-recognized article that affective and cognitive systems are largely independent (Zajonc, 1984). One core concept of his theory is the 'primacy of affect', meaning that an emotion does not necessarily require a cognitive state. He bases his argument on differences in the phylogenetic evolution and the separate neuroanatomical structures of the two systems. He also challenges the appraisal theory by showing that an emotional state can be induced without any prior mental state using drugs, hormones or electrical stimulation.
An interesting example of how the cognitive evaluation and the somatic state are interconnected can be seen in the empirical work of Schachter (1964). In their famous experiment, Schachter and colleagues manipulated the physiological arousal of people by giving them epinephrine (also known as adrenalin). They then exposed a test group, which was not informed about the effects of the epinephrine injection, and a control group, which was informed about the side effects, to either a euphoria- or an anger-inducing situation. Their results show that the emotional response to positive and negative affect was larger in the misinformed test group. This example shows how a cognitive mechanism labels an ambiguous somatic state and thereby induces a different emotional experience.
This means that the appraisal mechanism cannot be completely uncoupled from the body. A similar approach is the somatic marker theory, which states that body markers influence the processing of responses to a stimulus at multiple levels of operation (Damasio et al., 1996; Bechara et al., 2000). Some of these processes occur consciously in the mind and some of them non-consciously, in a non-minded manner. These markers are called somatic because they arise in bioregulatory processes normally involved in emotional states. The theory concludes that not only the mechanism of appraisal but also reasoning and decision-making processes that are traditionally understood as purely cognitive are influenced by this mechanism.
We can summarize that the 'feeling' approach of James-Lange faces difficulties in explaining the full complexity of different emotional states by referring only to the perception of physiological changes. On the other hand, a purely cognitive approach cannot explain why emotions can precede mental cognition and can be induced artificially without affecting mind processes. This brings us to the conclusion that an embodied appraisal, which includes psychosomatic representations in the cognitive evaluation of the stimulus, can be understood as a constructive proposal to unify the two dimensions (Prinz, 2004). We also have to be aware that throughout this section we were shifting between emotional states of different grades of complexity. Generalizing across emotional states of different levels of complexity is critical. As we will see in the next section, we have to differentiate between simple bottom-up and complex top-down mechanisms if we want to understand the phenomenon in its full picture.
1.5
Basic and Complex Emotions
If we talk about emotions we have to be aware that the term includes a wide range of different body-mind states. In this section I want to address the third question stated at the beginning of the introduction: 'What distinguishes one emotion from another?' The question is thus whether emotional states can be categorized into different classes or whether they have to be described as a multidimensional continuum.
Emotions are evolved, adaptive perception-response patterns that help an individual to survive.
Different situations and stimuli induce different emotional states. This causality can be explained by the idea that emotions have different formal objects: fear is about danger and sadness is about loss, for example. So emotions do not have a unifying formal object (De Sousa, 1987). This brings us to the problem of how to describe the different observed states. We can follow two approaches to do so: we can either describe the differences of the phenomenon using modular parameters, constructing a multidimensional semantic space, or we can use categories. The first approach was conceptualized in the disunity thesis, which states that emotions do not form a natural class (Griffiths, 1997). By natural class we mean boundaries between categories that derive from nature and not from the classification of humans observing nature. Griffiths argues that the phenomenon of emotion can be divided into at least two subcategories, which can be described by the concepts of 'affect programs' and higher cognitive emotions.
Affect programs are fast appraisal mechanisms that induce physiological changes and action dispositions (Ekman and Friesen, 1986). These programs are modular, meaning they are divided into modules, each processing a certain information stream that is not affected by the information representation in other processes. Affect programs describe a basic set of emotions that includes fear, anger, happiness, sadness, surprise and disgust. This means that we are capable of producing a predefined stimulus that triggers one of these basic emotional states. Ekman showed that these basic emotions are universally expressed and perceived across different cultures (Ekman and Friesen, 1986). Scientists investigating the neural correlates of basic emotions have shown that evolutionarily old subcortical structures are involved in the processing of these states (LeDoux, 2000). Scientists do not agree, however, on a clear definition of which emotional states should be described as basic and which not (See table 1.1).
The concept of basic emotions captures only a subset of the states that we call emotions.
Table 1.1: 'Basic' emotion classes of different theorists according to Ortony and Turner (1990)

Theorist                                Basic Emotions
Arnold (1960)                           Anger, aversion, courage, dejection, desire, despair, fear, hate, hope, love, sadness
Ekman, Friesen, Ellsworth (1982)        Anger, disgust, fear, joy, sadness, surprise
Frijda (Personal communication, 1986)   Desire, happiness, interest, surprise, wonder, sorrow
Izard (1971)                            Anger, contempt, disgust, distress, fear, guilt, interest, joy, shame, surprise
James (1884)                            Fear, grief, love, rage
Mowrer (1960)                           Pain, pleasure
Oatley & Johnson-Laird (1987)           Anger, disgust, anxiety, happiness, sadness
Panksepp (1982)                         Expectancy, fear, rage, panic
Plutchik (1980)                         Acceptance, anger, anticipation, disgust, joy, fear, sadness, surprise
Tomkins (1984)                          Anger, interest, contempt, disgust, distress, fear, joy, shame, surprise
Watson (1930)                           Fear, love, rage
The so-called higher cognitive emotions are based on the cognitive evaluation of a stimulus or of the self in the social context. They are completely disconnected from direct somatosensory streams. Some examples are the emotions pride, shame, embarrassment, empathy and guilt (Strongman, 1987; Lewis, 1993). These emotions are also called self-conscious emotions, because they require a concept of self in order to emerge. They are highly influenced by the standards of a society and therefore differ across cultures. Another difference to basic emotions is that these types of affective states often lack classes of specific stimuli capable of eliciting the emotion. The elicitation of pride, for example, requires different factors, all having to do with cognition related to the self. Cognitive factors may also play a role in the elicitation of more basic emotions; however, the nature of these processes is much less cognitively elaborated than in self-conscious emotions (Plutchik, 1980). Another reason why higher cognitive emotions are more difficult to classify is the lack of coherent physiological and behavioral response patterns. While basic emotions like happiness or sadness can be differentiated by distinct facial expressions, complex emotions like guilt or shame are more difficult to identify at all levels: physiological, behavioral and neurocomputational.
The neuroanatomical organization of the cortical and subcortical clusters involved in the processing of basic and complex emotions also reflects the evolutionary development of the two types of processing (LeDoux, 2000; Aggleton, 1992). As we will see, basic emotions such as fear or disgust are associated with activity in subcortical, evolutionarily old structures in the brain stem. Social and self-conscious emotions induce, in addition to activity in these old structures, activity in cortical areas like the prefrontal cortex. A functional organization reflecting this development has been proposed for the insular cortex (Craig, 2009).
1.6
The Neurobiological Basis of Emotions
This section will provide an overview of the most important areas involved in the processing of affect and the elicitation of appropriate responses.
1.6.1
Subcortical Areas
In traditional emotion research, a set of subcortical brain structures including the hippocampus, amygdala, anterior thalamic nuclei, septum, limbic cortex and the fornix is conceptualized as the main location where emotions are processed. This network is also known as the limbic system. Although this term is widely used in the field, little empirical work can be found that fosters and defends the functionality of the limbic system as a whole (LeDoux, 2003). Because of this lack of empirical results we will avoid this term in the following discussion and rather explain the connectivity, anatomy and functionality of each area in particular.
One of the most prominently investigated nuclei in the subcortical cluster is the amygdala. The structure is highly connected to multiple cortical and subcortical areas (Swanson and Petrovich, 1998; Aggleton, 1992). It receives input from all five senses and is therefore a dominant relay station of sensory information transmission (See figure 1.3).
The most profound investigation of the amygdala’s functionality has
been done using the paradigm of fear conditioning (LeDoux and Phillips,
1992; Smith et al., 1995; LeDoux, 1996, 2000; Sehlmeyer et al., 2009;
Johansen et al., 2010). These studies have shown that the amygdala is
responsible for the evaluation of fearful stimuli. Recent studies provide evidence that the amygdala also evaluates positive valence, proposing this
structure as a general affect detector (Paton et al., 2006; Salzman and Fusi,
2010).
The activity of the amygdala stimulates different subcortical and cortical clusters. The most prominent modulatory systems are the nucleus basalis, the raphe nuclei, the pons and the ventral tegmental area, which regulate the neurotransmitters serotonin, acetylcholine and dopamine.

Figure 1.3: The connectivity of the amygdala. This nucleus receives inputs from all sensory modalities and from cortical and subcortical areas. The output is transmitted to modulatory systems and neuronal correlates in the brain stem. The direct connection to the hypothalamus allows the amygdala to trigger hormonal responses. Figure adapted from (LeDoux, 2006)

These neurotransmitters are responsible for the regulation of moods (Young and
Leyton, 2002), anxiety (Hariri et al., 2002), memory acquisition (Gold,
2003) and reward processing (Schultz, 2002). Another output target of the amygdala is the autonomic nervous system (ANS). The two main divisions of the ANS, the parasympathetic nervous system and the sympathetic nervous system, receive inputs via the vagus nerve and the hypothalamus (Bechara et al., 1999). The parasympathetic nervous system is responsible for the regulation of rest-and-digest functions, the sympathetic nervous system for fight-and-flight behaviors. Additionally, the amygdala connects to the association cortex, which is related to cognition (Price, 2003), and to the prefrontal cortex, which is related to the control of behavior (Quirk et al., 2003).
The dense connectivity of the amygdala provides the basis for its two
main functionalities: the evaluation of affect and the orchestration of different response patterns (Anderson and Phelps, 2001). Functional imaging studies show evidence that the amygdala is involved in the detection
of fear expressions in faces (Adolphs et al., 1994; Morris et al., 1998), the
regulation of anxiety (Sehlmeyer et al., 2009) and social behavior (Bickart
et al., 2010; Davis et al., 2009; Haruno and Frith, 2010; Schiller et al.,
2009).
Another subcortical region involved in emotion processing is the hippocampus. This structure has a long-established role in spatial memory. Recent studies have also identified this structure as a regulator of defensive responses. Rats with bilateral hippocampus lesions show reduced freezing (LeDoux and Phillips, 1992), display fewer defensive reactions when confronted with a cat (Pentkowski et al., 2006), show reduced expression of unconditioned responses (Deacon et al., 2002) and show reduced avoidance behavior of threatening stimuli (Chudasama et al., 2009). These results underline the hippocampus' important role in the normal expression of fear responses.
The described network of subcortical clusters, with the amygdala functioning as a relay station, the hippocampus as a controller of defensive behavior and the different nuclei in the brainstem as global regulators of hormonal and behavioral responses, can be seen as the basis of emotion processing. Compared to cortical areas, these structures evolved earlier and can be found in non-primate animals. Therefore emotion researchers describe this pathway as the 'old route' (LeDoux, 1996).
1.6.2
Cortical Areas
Complex stimuli engage cognitive mechanisms that are processed in a wide cortical network. Three cortical brain regions that are involved in the processing of emotions have to be pointed out: the insular, the lateral somatosensory and the prefrontal cortex.
The anatomical architecture reveals a strong connection between the prefrontal cortex and the amygdala, the hippocampus and the hypothalamus (Salzman and Fusi, 2010) (See figure 1.4). It has been shown that the prefrontal cortex is involved in the regulation of fear conditioning (Quirk et al., 2003; Sehlmeyer et al., 2009), phobias (Hermann et al., 2009), the control of goal-directed behavior (Fuster, 2008) and the regulation of hormonal and expressive emotional responses (Kalin et al., 2007). Based on these results it has been suggested that the prefrontal cortex is involved in the regulation of emotion processing.
The second important cortical structure involved in the processing of emotions is the insula. This area receives visceral, pain and gustatory sensory input and has therefore been proposed as the location of the somatosensory body representation (Craig, 2010). This body representation is involved in the regulation of homeostasis, but also in decision processes. The somatic marker theory states that the imagination of hypothetical future decision outcomes induces somatic states (Damasio et al., 1996; Bechara et al., 1999; Bechara and Damasio, 2005). The conscious perception of these somatic states then affects the decision process. Functional neuroimaging studies show activity in the anterior insula related to social emotions like empathy, compassion and cooperation (Lamm and Singer, 2010). Electrical stimulation of the insular cortex produces social behavior related to these emotions (Caruana et al., 2011). A. D. Craig (2009; 2010) has proposed that the emergence of consciousness is based on the somatic and sentient-self representation processed in the anterior part of the insula.

Figure 1.4: The connectivity of cortical and subcortical clusters. The prefrontal cortex is highly connected to the amygdala, sensory cortices, the hippocampus and nuclei in the brain stem that regulate hormonal responses. Figure from (Salzman and Fusi, 2010).
Another structure involved in the processing of emotional and social signals is the lateral somatosensory cortex, especially the superior temporal sulcus (STS) and the fusiform gyrus. These areas have been proposed to be involved in the perception of facial expressions and face identity (McCarthy et al., 1997; Haxby et al., 2002; Kanwisher et al., 1997). The understanding of emotional connotation in voices and prosody has been related to increased activity in the right lateral hemisphere (Bookheimer, 2002), while the understanding of the emotional content transmitted on the semantic channel has been associated with the left lateral hemisphere (Bookheimer, 2002; Binder et al., 1996, 2009).
The discussed areas show how multilayered the processing of an emotional stimulus is. The variety in stimulus complexity is one of the main reasons why it is difficult to define exactly which brain areas are involved in emotion processing. This means that more exact definitions of the different dimensions of the phenomenon of emotion will also be needed to clearly identify the underlying neurobiological substrate.
Chapter 2
SYNTHETIC EMOTIONS AND
EMOTIONAL AGENTS
So far, artificial intelligence (AI) has focused on the construction of computational models and applications capable of solving cognitive or behavioral tasks, ignoring any emotional component. Based on new insights from neuroscience and psychology we have observed a paradigm shift dedicating more functional importance to emotional processes. Today it is widely accepted that emotions have a fundamental function that increases the fitness of an individual in complex environments. In recent years we have observed a considerable number of computational models of emotions and affective processes. This trend runs in parallel with the development of android robots and autonomous virtual agents that target social interaction with humans. The following chapter will give a short introduction to the science of synthetic emotions and emotional agents.
2.1
Synthetic Emotions
The multidimensional phenomenon of emotion can be layered into stimulus perception, appraisal, and response elicitation. Computational models of emotions address one or more of these stages. In affective computing an important distinction has to be made between theory modeling, which focuses on the understanding of the phenomenon of emotion, and application modeling, which aims to improve the control of autonomous agents.
2.1.1
Theory Modeling
Theory modeling starts with a formalization process based on the insights from neurophysiological and psychological experiments. This process results in the conceptualization of a theory of an emotion mechanism. To verify the plausibility of the model, scientists compare the data from neurophysiological experiments with the performance of the model. An established example is the well-studied fear conditioning paradigm (LeDoux, 2000). The insights from these experiments gave rise to computational models that try to illuminate the underlying neurocomputational mechanism of plasticity in the amygdala (Armony et al., 1997; Mor, 1995). The proposal of protoemotions, basic hard-wired reactive mechanisms that result in the detection of positive or negative valence, has been used to construct a bottom-up model of synthetic emotions (Vallverdú and Casacuberta, 2009). All these approaches model a basic neurobiological mechanism that underlies emotions. Other models target more complex mechanisms of emotions like appraisal (Wehrle and Scherer, 2001), reasoning (Davis and Lewis, 2003), the regulation of emotions (Elliott and Siegle, 1993) and the involvement of multiple brain areas (Balkenius and Morén, 1998). The main objective of theory modeling is the investigation of the underlying neurocomputational mechanisms of emotion processes. A minimal sketch of the kind of learning rule such fear conditioning models build on is given below.
2.1.2
Application Modeling
The second stream of synthetic emotion modeling focuses on the construction of controllers for interactive agents. The increasing number of applications targeting social interaction with computers, machines and agents motivates researchers from different disciplines to construct computational emotive architectures. These models target the dynamics of affective states, appraisal and response patterns (Velásquez, 1997; El-Nasr et al., 2000; Marsella and Gratch, 2009; Gratch and Marsella, 2004), the perception and expression of emotions (Breazeal, 2003) or the influence of emotions in human-agent conversations (Pelachaud and Bilvi, 2003). Other approaches layer their models into emotions, moods and personality to include the time component related to different affective states and personal characteristics (Gebhard, 2005; Corchado Ramos et al., 2009). All these models aim to enrich the social interaction of virtual or real agents with humans. A sketch of such a layered architecture is given below.
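To make the layering idea concrete, here is a minimal sketch of a scalar valence signal composed of a fast emotion layer, a slower mood layer and a constant personality bias; the class name, time constants and coupling factor are illustrative assumptions and do not reproduce any of the cited architectures.

import math

class LayeredAffect:
    # Valence composed of a fast emotion layer, a slow mood layer and a
    # constant personality bias, each operating on its own time scale.
    def __init__(self, bias=0.1, tau_emotion=2.0, tau_mood=60.0):
        self.bias = bias                # trait-like offset, does not decay
        self.tau_emotion = tau_emotion  # decay time constant in seconds
        self.tau_mood = tau_mood
        self.emotion = 0.0
        self.mood = 0.0

    def appraise(self, valence):
        # An appraised event perturbs the emotion layer fully and the
        # mood layer only weakly.
        self.emotion += valence
        self.mood += 0.1 * valence

    def step(self, dt=1.0):
        # Exponential decay of both layers towards zero.
        self.emotion *= math.exp(-dt / self.tau_emotion)
        self.mood *= math.exp(-dt / self.tau_mood)

    def state(self):
        return self.bias + self.mood + self.emotion

agent = LayeredAffect()
agent.appraise(1.0)             # a single positive event
for _ in range(10):
    agent.step(dt=1.0)
print(round(agent.state(), 3))  # emotion has largely decayed, mood persists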
In this first section we have seen that the modeling of synthetic emotions has two main objectives: the understanding of the phenomenon, achieved by theory modeling, and the construction of useful applications. In the next section we will introduce some of the most popular existing agents that aim to interact with humans.
2.2
Emotional Agents
In recent years we have observed an increasing number of virtual and physical agents constructed for social interaction with humans. The variation in surface properties, aesthetics, control and functionality is huge and makes it difficult to keep an overview. In this section we would like to introduce some of the most important achievements in the field.
2.2.1
Virtual Agents
The boom of computer applications that use autonomous virtual agents to interact with humans increases the demand for believable behavior. The appropriate elicitation of emotional expressions that are recognized as such is fundamental in this approach. This does not imply that the agent has to be realistic. For example, the expressive agent Simon is a comical representation of a human baby, but highly expressive (Velásquez, 1997). Another approach is to parameterize the facial features used in visual speech production in order to produce realistic text-to-speech synthesis in real time (Massaro, 1998). Newer agents show more realistic renderings and an increased variety of expressions (Bevacqua et al., 2008). The behavior of some of these agents is controlled by the Affective Presentation Markup Language, a systematic collection of commands useful for the control of an agent in conversation (DeCarolis et al., 2004). Similar programming languages deal with the multi-modality of conversations (Zong et al., 2000) or with the emotive components (Schröder et al., 2007). Emotional conversational agents (ECA) are agents capable of expressing and perceiving emotions while communicating with humans (Becker et al., 2004; Gratch et al., 2002; Schröder et al., 2008). Some of these agents are used in online stores, video games, help-lines, street navigation or intelligent homes.
2.2.2
Physical Agents
Today a wide variety of robots constructed for social interaction can be
observed. These agents differ in anatomy and function. The following
section will give an overview of some of the most recent physical agents
that are related to social interaction.
Full Body Humanoids
One main objective of full body humanoids is to construct robots that are equipped with human-like motion (See figure 2.1). Their joints have many degrees of freedom, allowing them to produce a wide variety of gestural and postural behaviors. Such agents are designed to interact with humans. One big challenge for mobile humanoids is the supply of power and computational resources. This problem can be solved by outsourcing the computational processes to external servers and equipping the robot with mobile batteries.
Figure 2.1: Full body humanoid robots. Asimo (left), Hubo (middle) and
iCub (right) are three examples of androids with different capabilities and
objectives.
In recent years the construction of small full body humanoids has made impressive progress (See figure 2.2). These robot platforms provide autonomous agents with impressive motor control at much lower cost compared to their bigger brothers.
Figure 2.2: Small full body humanoid robots. Nao (left) and Qrio from
Sony (right).
Geminoids
Geminoids are android robots with a highly realistic anatomy. The first geminoid ever developed is a copy of Hiroshi Ishiguro from Osaka University, Japan (See figure 2.3). The main objective of geminoids is the investigation of the underlying psychological mechanisms of android perception and interaction. The body and the voice of these robots are teleoperated, meaning that a human remotely controls the interaction with other humans.
Upper Torso Humanoids
Upper torso humanoids form a class of robots that are not equipped with legs or lower body parts (See figure 2.4). Some of these humanoids can move using a mobile platform. One main difference to full body androids or geminoids is their increased perceptual capability. This is because such robot platforms target social interaction and communication rather than motor control.
Figure 2.3: The three most popular teleoperated androids, so-called geminoids: Model F (left), HI (middle) and DK (right) in the front row, with their human 'originals': an anonymous young female (left), Hiroshi Ishiguro from Osaka University, Japan (middle), and Henrik Scharfe from Aalborg University, Denmark (right).
Figure 2.4: Upper torso robots: Nexi (left), Domo (center left), Barthoc
(center right) and Armar3 (right), partially with mobile platform.
Humanoid Heads
Humanoid heads are constructed for verbal and non-verbal interaction with humans in a face-to-face setup. These systems are equipped with face and expression detection capabilities and an expressive behavioral repertoire (See figure 2.5).
Figure 2.5: Expressive robot heads: Kismet (left), Mertz (center) and
Roman (right).
Zoomorphic Robots
The construction of emotive interactive agents does not necessarily have to lead to humanoid robots. The so-called zoomorphic robots use either animals or animal-like entities as inspiration for their construction (See figure 2.6). They have an increased expressive repertoire, focusing mainly on social interaction on a verbal or non-verbal dimension.
Figure 2.6: Zoomorphic robots: Emuu (left), iCat (center left), Leonardo
(center right) and Probo (right).
As we have seen, there exists a wide variety of interactive physical agents. The anatomy, computational capability and functionality of these applications differ with their objectives. In the next chapters we will introduce different studies from our lab that make use of such agents to investigate the underlying mechanisms of emotion perception, processing and expression in relation to social interaction.
Chapter 3
NON-VERBAL BEHAVIOR
AND SOCIAL INTERACTION
The first question we address is if and how humans perceive artificial agents. Before we dive into the complex world of emotive communication, we investigate a more subtle code of social interaction: the regulation of interpersonal space.
3.1
Human Spatial Behavior
Humans use a complex code of non-verbal interaction, including facial expressions, eye contact, gestures, postures, and the regulation of interpersonal distance, to communicate their intentions and feelings (Birdwhistell, 1975; Ekman, 1993; Mehrabian, 1972; Sommer, 1969; Argyle and Dean, 1965). In this study we investigate the spatial dimension of social behavior. In particular, we analyze how people regulate their interpersonal distance to each other while they are engaged in a cooperative task. We are also interested in understanding how the salience of a stimulus, for example the perception of another person, affects social interaction. Therefore we investigate the proxemic behavior of players interacting with either a virtual character or a physical counterpart.
Three factors that regulate the spatial distance to others have been identified so far (Baldassare, 1978): biologically pre-programmed instincts, the environment, and the cultural background of people. Behavioral studies have shown that animals have innate behavioral mechanisms that affect the regulation of their territorial defense (Hediger, 1964). The violation of this space induces psychological stress expressed in physiological arousal and behavioral fight-or-flight responses. A hypothesis on the underlying neural substrate of this regulation has been proposed in a computational model of allostatic control (Sanchez-Fibla et al., 2010). As a second factor, ecological psychology has identified environmental aspects that affect social interaction on a spatial scale (Stokols, 1978). Accordingly, the spatial organization of space influences grouping behavior (Sommer, 1969), the building of friendships (Festinger et al., 1950), crime rates (Newman, 1973) and community life (Jacobs, 1961). Ecological psychology distinguishes between different types of spatial cognition that trigger either active or reactive behavioral responses on a spatial dimension. The theory defines different modes of human-environment transactions that are used to explain the environmental influence on human behavior (Stokols, 1978). Another theory dealing with the influence of space on cognition and behavior is that of space syntax. It proposes that the spatial configuration of buildings and cities implicitly influences spatial cognition and navigation performance, without explicitly assuming anything about individuals' motivations (Hillier, 1996; Penn, 2003). The theory states that environmental cognition constructs a topological rather than a metric representation of space, which affects individual behavior in predictable ways.
The third factor affecting spatial behavior is culture. The theory of proxemics states that people regulate their interpersonal distance to each other as a subtle code of social behavior that differs across cultures (Hall, 1963, 1966). Hall classified the interpersonal distance to other humans into four different categories: intimate space (0 - 0.46 meters), personal space (0.46 - 1.2 meters), social space (1.2 - 3.66 meters) and public space (3.66 - 7.6 meters). The intimate space is only shared with the closest friends and confidants, the personal space with familiar persons. The social space is the interaction space for routine social interactions with acquaintances as well as strangers, while the public space is not perceived as personal and is relatively anonymous. The perception of space varies across different cultures. Hall's boundaries can be expressed directly as a simple classification rule, as sketched below.
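A minimal sketch of Hall's classification, using the boundary values given above (the function name is illustrative):

def proxemic_zone(distance_m):
    # Classify an interpersonal distance according to Hall's categories.
    if distance_m < 0.46:
        return "intimate"
    elif distance_m < 1.2:
        return "personal"
    elif distance_m < 3.66:
        return "social"
    elif distance_m < 7.6:
        return "public"
    return "beyond public space"

print(proxemic_zone(0.8))  # 'personal'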
The three described factors of interpersonal distance regulation do not explain the entire phenomenon of spatial behavior. We know from behavioral psychology that people interacting with each other follow regulative dynamics and adapt their actions, and thus their distance, in response to the behavior of their counterpart (Burgoon et al., 2007). The dyadic interaction between two individuals, which is the product of approach and avoidance forces, balances the mutual comfort of the interactors (Patterson, 1973). The result is a synchronization of actions and responses that expresses the adaptive regulation of ease and stress. Approach tendencies triggered by affiliative needs balance the avoidance tendencies controlled by various fears. The behavioral equilibrium, which is expressed through a number of non-verbal interaction patterns, is the result of a comfortably perceived level of intimacy (Argyle and Dean, 1965); this force balance can be caricatured as the simple dynamical system sketched at the end of this section. Based on these findings we propose a first hypothesis to investigate the spatial dyadic interaction: people that are engaged in a collaborative spatial task perform significantly differently depending on the team strategy they choose. Such strategies are expressed in quantifiable spatial behavior.
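As an illustration of the equilibrium idea only, and not a model taken from the cited literature, the following sketch lets a dyadic distance relax to the point where an assumed approach force and an assumed avoidance force cancel; both force laws and all parameters are hypothetical.

def approach_force(d, affiliation=1.0):
    # Approach tendency, assumed to grow linearly with distance.
    return affiliation * d

def avoidance_force(d, fear=1.0):
    # Avoidance tendency, assumed to decay with distance.
    return fear / max(d, 1e-6)

def settle(d0=3.0, steps=200, dt=0.05):
    # Gradient-like relaxation towards the behavioral equilibrium,
    # where the approach and avoidance tendencies cancel.
    d = d0
    for _ in range(steps):
        d -= dt * (approach_force(d) - avoidance_force(d))
    return d

print(round(settle(), 2))  # converges near sqrt(fear/affiliation) = 1.0 m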
3.2
The Effect of Apparent Reality
One feature of the organization of behavior is the attribution of a change to a perceptual unit (Heider, 1944). This is a theoretical proposition in behavioral psychology that helps in designing studies that try to decompose and understand the mechanisms of how singular percepts affect human behavior. Nico Frijda describes in his law of apparent reality how the perceptual salience of a stimulus affects action tendencies that lead to the elicitation of emotions (Frijda, 1988). Frijda states in this law that the reality that affects behavior is the perceived stimulus property and not the property itself. An empirical example of this effect are studies that investigate methods to treat spider phobia with stimuli that differ in realism (Bandura, 1977).
Figure 3.1: The eXperience Induction Machine (XIM), a fully instrumented mixed reality space that can be accessed by multiple users, either as physical visitors or as virtual representations. Virtual visitors are represented in the physical space of the XIM on the surrounding screens and as lit floor tiles. Physical visitors are represented as virtual characters in the virtual world.
Other studies show a lower impact of symbolic information compared to the impact of pictures of the same event on people's psychological state (Fiske and Taylor, 1984). In social psychology this phenomenon is known as 'the vividness effect' (Borgida and Nisbett, 1977). It summarizes the finding that a vividly perceived stimulus induces a stronger psychological and behavioral response than the cognitive knowledge of the stimulus. Although some studies have challenged the power of this phenomenon (Taylor and Thompson, 1982; Kisielius and Sternthal, 1986), it has been shown that the vividness of a stimulus affects memory-building processes and judgments (Baddeley and Andrade, 2000; McCabe and Castel, 2008).
The perceptual salience of a stimulus is, however, only one important factor that influences cognition and performance. Another interesting question is how the mere presence of others by itself influences the behavior of a person. Darley and Latané (1970) showed that the perception of others reduces the individual's feeling of responsibility to act in an emergency situation. The bystander's inaction is often explained by apathy and alienation, or by the diffusion of responsibility among the observers. This is an example of how the mere presence of others fundamentally affects the behavior of an individual. Another example of this phenomenon is the so-called audience effect. Different studies have shown that people adapt their behavior and expressions depending on whether they are performing an action alone or in the presence of an audience (Kraut and Johnston, 1979; Fridlund et al., 1991). Based on these findings we assume that the salience of the stimulus fundamentally affects human behavior.
Hence, following this line of concepts and evidence on the role of the perceived salience of stimuli in action, we propose a second hypothesis: the perceptual salience of another person affects social interaction in ways that can be measured in its spatial dimension.
To test these two hypotheses, we constructed a cooperative ball game in a human-accessible mixed reality environment called the eXperience Induction Machine (XIM) (Bernardet et al., 2007, in press), where two teams of two players each had to find the optimal spatial strategy to win the game. In previous work we showed that interpersonal distance regulation is a subtle code of social interaction that can be attributed to cooperative and competitive behavior (Inderbitzin et al., 2009). In this study we use mixed virtual reality as a tool to understand a psychological phenomenon known from the real world. Mixed virtual reality combines virtual reality with a physical space where the real world and the virtual world merge into an immersive experience that does not restrict the users' natural physical actions. Such applications offer new possibilities to investigate fundamental psychological questions of human behavior and social interaction, because they provide experimental control without losing mundane realism, a trade-off familiar from traditional psychological methods (Blascovich et al., 2002). By constructing an experimental setup that provides a collaborative space for virtual and physical humans, we are capable of investigating spatial cooperation and the effects of stimulus salience on this behavior. When humans are present in virtual worlds they keep certain behavioral interaction patterns. Recent studies investigating the effect of gaze control and personal distance regulation in immersive virtual environments show that humans behave similarly to the way they do in real-world situations (Bailenson et al., 2003, 2001). Repulsive reactions following the violation of the personal space in stereoscopic 3D views have also been documented (Wilcox et al., 2006). For our study we used the mixed virtual reality space eXperience Induction Machine (XIM) (Bernardet et al., 2007). XIM can be accessed by physical and virtual visitors, providing a collaborative space for the investigation of human behavior. This unique setup allows us to analyze the behavior of humans that are either physical or virtual without changing the context of the situation.
3.3
Methods
3.3.1
Materials
The study was conducted in the mixed virtual reality space eXperience Induction Machine (XIM) (Bernardet et al., 2007, in press). The physical space has a size of 5.5 by 5.5 meters and surrounds the visitor on all four sides with wide-screen projection walls. The luminous floor is built from 72 pressure-sensitive hexagonal floor tiles (Delbruck et al., 2007). People in the space are tracked by the Multimodal Tracking System (MMT), which combines infrared tracking information with the tactile information from the floor (Mathews et al., 2007). The virtual world is produced by the game engine Torque (GarageGames, 2010). The XIM can be experienced by multiple users of different modalities - physical or virtual - sharing a collective space of social interaction (See figure 3.1). Users that enter XIM remotely over a network see a virtual representation of the physical space and of the users present in the space on a computer screen. Remote users are represented in the space as avatars on the surrounding screens and as lit tiles on the floor. Remote visitors control their avatars using a gamepad and talk to their physical team player through a wireless communication headset (Logitech ClearChat) (See figure 3.2).
3.3.2
Research Design
We constructed a cooperative mixed reality ball game, in which two teams of two players had to find the optimal spatial strategy to win. The ball was represented as a yellow floor tile in the space (See figure 3.1). Players could control a virtual representation of a paddle by changing their position in space. The aim of the game was to use the paddle to hit the ball and reflect it towards the opposing team's side. If the ball passed the rearmost border of a team's playing field, a goal was scored. The game could be played either by physical action in XIM or by using a gamepad to control an avatar visible on a computer screen (See figures 3.1 and 3.2). The independent variable in our study was the body representation
Figure 3.2: In the Mixed condition one remote player formed a team with one physical player. The remote player played the game using a computer and a gamepad. Physical players inside the XIM were represented as avatars on the screen of the remote players. Verbal communication between the remote and the physical player was established over a wireless communication headset.
itself. The dependent variables were the performance and the spatial behavior of the players. By varying the players' representation between virtual and physical we constructed three different game conditions: Physical, Mixed and Virtual (See figure 3.6). In the Physical condition all participants were inside the XIM and had to move physically to play the game. In the Mixed condition one physical player inside XIM formed a team with one virtual player using a computer to play. In the Virtual condition two virtual players formed a team. Only teams using the same modality played against each other: Physical teams vs. Physical teams, Mixed teams vs. Mixed teams, Virtual teams vs. Virtual teams.
3.3.3
Measures
The positions of the four players, the ball position, goal events and paddle-ball collisions were recorded at a sampling rate of 25 cycles per second (See figure 3.3). We calculated three different aspects of the participants' spatial behavior: the interpersonal distance regulation between team players, the players' activity and their position in space. We quantified the interpersonal distance and the time that team members shared either the intimate or the personal space. To understand the spatial tactics on both a global and a local level, we measured the interpersonal distance regulation for entire games, for all winning and losing epochs and for all offensive and defensive game situations. An epoch was defined as the time window lasting from the ball play-out until a goal was scored. An offensive game situation was defined as the time period in which the ball was moving away from a team, a defensive game situation as the time period in which the ball was moving towards a team. To investigate the overall activity, we calculated the mean distance that players moved in space. To analyze the position of players we calculated the mean distance to the team field mid-line, defined as a line parallel to the side line separating the team field into two equal parts. The time that a player spent in the field side of his/her team partner was used as an additional measure of the players' spatial distribution. A sketch of how such measures can be computed from the recorded trajectories is given below.
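The following sketch illustrates how these measures could be computed from the position logs; it assumes trajectories stored as NumPy arrays of shape (T, 2) and a y-axis aligned with the play direction. All function names and the example data are illustrative, not the actual analysis code; only the 25 Hz sampling rate is taken from the recordings.

import numpy as np

FS = 25  # sampling rate in Hz, as used in the recordings

def shared_space_time(p1, p2, radius):
    # Seconds during which two players were closer than `radius` meters
    # (e.g. 0.46 for intimate space, 1.2 for personal space).
    dist = np.linalg.norm(p1 - p2, axis=1)
    return np.sum(dist < radius) / FS

def sprinted_distance(p):
    # Total path length of one player, used as the activity measure.
    return np.sum(np.linalg.norm(np.diff(p, axis=0), axis=1))

def offensive_mask(ball_y, own_back_line_sign):
    # Samples where the ball moves away from the team (offensive);
    # own_back_line_sign is +1 if the team defends the positive-y side.
    vel_y = np.diff(ball_y)
    return own_back_line_sign * vel_y < 0

def mean_distance_to_midline(p, midline_y):
    # Mean distance of a player to the team-field mid-line.
    return np.mean(np.abs(p[:, 1] - midline_y))

# Example with two synthetic 10-second trajectories
T = 10 * FS
rng = np.random.default_rng(0)
p1 = np.cumsum(rng.normal(0, 0.01, (T, 2)), axis=0) + [1.0, 1.0]
p2 = np.cumsum(rng.normal(0, 0.01, (T, 2)), axis=0) + [2.0, 1.0]
print(shared_space_time(p1, p2, radius=1.2))  # time in shared personal space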
3.3.4
Procedure
The team assignment and the order of the game modalities were randomized. Every team played the game in all three conditions (Physical, Mixed and Virtual). One game lasted three minutes. An experimenter explained the game to the participants inside XIM and answered questions to make sure that all players understood the rules. A rehearsal trial of about one minute was played in every modality so that participants could familiarize themselves with the setup. The experimenter informed all participants that data were recorded during the game and that they could leave the space at any time if they did not feel comfortable.
3.3.5
Participants
Fifty-two healthy adults aged 18 to 30 years (M = 23.6, SD = 3.9; 33% women, 67% men) were recruited from different universities in Barcelona through an advertisement. All participants had at least completed undergraduate education and were permanent Spanish residents. Participants took part in the study voluntarily, without financial reward, and gave consent for the data of the experiment to be used for scientific investigation.
3.4
Results
We recorded 13 games for each of the three conditions, yielding a total sample size of 39 games. Three ties were observed, two in the Mixed condition and one in the Physical condition. Winning teams scored a mean of 10.5 goals (SD = 3.9), losing teams 5.3 goals (SD = 1.9). In tied games 6.3 goals (SD = 0.5) were scored. Overall, 609 goals were observed: 191 in the Physical condition, 213 in the Mixed condition and 205 in the Virtual condition.
Figure 3.3: Spatial distribution of an example epoch. The ball play-out (red dot) starts in the middle of the field. At the beginning of the epoch the team players were positioned on their team sides (blue and green dots). The trajectories of the players show their spatial behavior over time. The play direction was vertical. Team 2 scored a goal when the ball reached the back line of team 1.
Table 3.1: Proxemic behavior of winners and losers: mean time of shared interaction space; standard deviation in brackets. IS = intimate space; PS = personal space; Sig = significance (a p < 0.1, * p < 0.05, ** p < 0.01).

                       Winning Team    Losing Team     Sig
Game
  Shared IS [sec]      0.97 (2.5)      0.57 (2.0)
  Shared PS [sec]      15.18 (19.9)    11.15 (14.5)    a
Epoch
  Shared IS [sec]      0.06 (0.1)      0.04 (0.1)
  Shared PS [sec]      0.91 (1.0)      0.70 (1.3)      *
Offensive Situation
  Shared IS [sec]      0.02 (0.1)      0.01 (0.1)
  Shared PS [sec]      0.63 (0.7)      0.35 (0.6)      **
Defensive Situation
  Shared IS [sec]      0.01 (0.1)      0.01 (0.0)
  Shared PS [sec]      0.25 (0.4)      0.34 (0.5)
3.4.1
Spatial Scale of Collaborative Behavior
We used two measurements to quantify the spatial scale of the intra-team member interaction: the mean distance between team players and the time they shared the intimate and personal interaction space. We evaluated the difference in the time in which team members shared personal space between epoch-winning and epoch-losing teams (See figure 3.4). Winners shared their personal space longer with each other compared to losers: χ2 (1, N = 155) = 5.15, p < 0.05 (See table 3.1). Winners also shared their personal space significantly longer with their team mates during offensive moves: χ2 (1, N = 143) = 8.3, p < 0.01. As we will see, this does not mean that epoch winners in general displayed a shorter interpersonal distance. Additionally, winners chose a more defensive distribution compared to losers: Wilcoxon z = 98.4, p < 0.01. The mean of the ranks for the winners' distance to the back line was 0.55 meters, while the mean of the ranks for the losers' distance to the back line was 0.69 meters.
The analysis of the dyadic regulation of the interpersonal distance during offensive and defensive game situations revealed significantly different patterns between epoch winners and epoch losers. Winners chose a significantly larger interpersonal distance during offensive game situations: χ2 (1, N = 143) = 35.3, p < 0.01, and a closer interpersonal distance during defensive situations: χ2 (1, N = 143) = 37.2, p < 0.01 (See table 3.2). This means that epoch winners and epoch losers chose opposite dyadic interpersonal distance regulations. Winners stood closer together during defensive game situations, but were more widely distributed in the space during offensive moves compared to losing teams. So winners and losers regulated their interpersonal space in an inversely oscillating manner, while winners accepted the presence of their team members in their personal space for significantly longer.
Figure 3.4: Distribution of epoch winners (right panel) and epoch losers (left panel) for all goal events. The graph only shows one side of the game field; the play direction is from top down and vice versa. The colorbar indicates the accumulated positions of players over time. Winners chose more static and defensive positions compared to losers.
Table 3.2: Spatial intra-team interactions for winners and losers during the entire game, winning and losing epochs and offensive and defensive game situations: mean intra-team member distance; standard deviation in brackets. ITMD = Intra-Team Member Distance; Sig = significance level ** p < 0.01.

          ITMD Winners [m]    ITMD Losers [m]    Sig
Game      2.23 (0.4)          2.33 (0.4)
Epoch     2.34 (0.4)          2.33 (0.4)
Offense   2.29 (0.4)          1.71 (0.6)         **
Defense   1.76 (0.6)          2.32 (0.4)         **

3.4.2
Effect of the Players' Representation on the Spatial Interaction
To investigate the effects of the players' representation - physical or virtual - we analyzed the spatial behavior under the different conditions. We found significant differences in the seconds spent in the intimate space of the team member across the different conditions: χ2 (2, N = 78) = 15.76, p < 0.01 (See figure 3.5). Post hoc analysis using the Bonferroni criterion indicated that Mixed teams shared their intimate space longer compared to Physical teams. Additionally we observed differences in the duration that personal space was shared across conditions: χ2 (2, N = 78) = 9.86, p < 0.01. Significant differences between the conditions Physical and Mixed and between the conditions Physical and Virtual were confirmed by a Bonferroni post hoc analysis.
This is an interesting result, but we have to be careful in making interpretations about the underlying mechanisms responsible for this change in behavior. Because we are changing the modality of playing the game (physical action vs. gamepad), we do not know how much this difference in playing affected the behavioral adaptation. To address this criticism we investigated the effect of the representation on spatial interaction using a method that excludes such side effects: we compared the behavior of XIM players playing in Mixed teams with the behavior of XIM players playing in Physical teams (See figure 3.7).
Figure 3.5: Time that team members shared the intimate and the personal space in the three conditions (Physical, Mixed and Virtual).
All players playing the game inside XIM share the same representation and thereby the same game modality, but they differ in the interaction setup: XIM players playing in Physical teams interact with another XIM player, while XIM players playing in Mixed teams interact with avatars controlled by the remote players (See figure 3.7). Analogously, we compared behavioral differences between remote players that participated in either Mixed or Virtual teams. All remote players shared the same modality to play the game and to perceive their team members.
A two-sided t-test revealed that XIM players in Mixed teams sprinted more than XIM players in Physical teams: t(76) = 2.03, p < 0.05. No differences were observed between remote players of Mixed teams and remote players of Virtual teams (See table 3.3). To investigate the distribution of players in the space we analyzed the mean distance of subjects to the mid-line of the team field, defined as a line parallel to the side line separating the field into two equal parts. A two-sided t-test revealed that XIM players in Mixed teams chose a smaller distance to the mid-line compared to XIM players in Physical teams: t(76) = 2.61, p < 0.01. No differences between remote players of Mixed teams and remote players of Virtual teams were observed (See table 3.3). Additionally we calculated the time that players spent behind the mid-line, in the field side of their team mate. XIM players of Mixed teams entered this field side for significantly longer compared to XIM players of Physical teams: χ2 (2, N = 76) = 5.96, p < 0.01. No differences between remote players of Mixed teams and remote players of Virtual teams were observed (See table 3.3).
3.5
Discussion
Based on previous findings we hypothesized that the spatial behavior of multiple people engaged in a cooperative task codes social interaction that can be quantified. Additionally, we hypothesized that these interaction patterns are affected by the salience with which another person is perceived. The
Figure 3.6: Schematic representation of the three conditions. Only teams of the same condition played against each other. Left panel: two Physical teams compete against each other; all four players are physically present inside XIM. Middle panel: in the Mixed condition one player of each team is present inside XIM and the other player is virtually represented; virtual players use a computer to play the game. Right panel: in the Virtual condition all four players use a computer to play and are virtually represented inside XIM.
Table 3.3: Spatial behavior of XIM players and remote players. Mean sprinted distance, mean distance to the mid-line of the team side and mean time spent in the field side of the team member (time behind mid-line); standard deviation in brackets. Sig = significance level (* p < 0.05, ** p < 0.01).

                              Physical       Mixed          Sig
Sprinted distance [m]         67.6 (26.1)    80.3 (25.3)    *
Distance to mid-line [m]      1.17 (0.3)     0.90 (0.6)     **
Time behind mid-line [sec]    2.35 (3.6)     3.89 (17.8)    **
Figure 3.7: Schematic representation of the detailed analysis of players' behavior in the different conditions. We compared the behavior of XIM players in the Physical condition with the behavior of XIM players in the Mixed condition (A), and the behavior of the remote players in the Mixed condition with the behavior of the remote players in the Virtual condition (B).
results of our study provide support for both proposed hypotheses.
The task participants had to complete favors teams that optimally coordinate their distribution in space (Inderbitzin et al., 2009). This means that the two team players had to find a spatial strategy that led to success. Our data show that the spatial strategy of this social interaction can be identified and quantified.
Winners in general chose a more defensive strategy, selecting positions closer to the back line of the space. This simple but fundamental difference in behavior seems to have given the winners more time to react to the attacks of the opposing teams. Another difference is that winners shared their personal space with their team players for significantly longer, in particular during offensive moves. Losers also entered the personal space of their team players, but not for as long as winners. Winners and losers also differed in the spatial dyadic interaction: successful teams stood more compactly during defensive game situations and more dispersed during attacking moves, while losing teams chose the reciprocal moving pattern. These two results seem contradictory. A closer look at the data
revealed a very specific moving pattern for winning teams: during defense they chose a compact disposition in space without entering the personal space of their team members. During offense they increased their interpersonal distance, but entered the personal space of the other player at particular moments. This detailed analysis shows a complex spatial interaction that is not visible at first glance. An interpretation of this behavior is that winners played more individualistically, increasing their interpersonal space to their team member during offense. Such an excited play mode led to situations where both tried to hit the ball, entering the personal space of the other player. We can summarize that winners chose a more efficient spatial strategy during the rally. This adaptation of the interpersonal space had a crucial effect on the success of the team.
It has been shown that the regulation of the personal space depends on the familiarity of the interactors (Hall, 1963, 1966). Humans share their direct surroundings with familiar and intimate friends, while they prefer to interact with strangers at a wider distance. The invasion of an unknown person into the personal space can be perceived as a threat and therefore induces discomfort (Hayduk, 1978). Recently it has been shown that neuronal clusters that respond to fearful situations also show an increased activity during a violation of the direct personal space (Kennedy et al., 2009).
Humans compensate the invading behavior of a person with a counterbalancing action (Patterson, 1973). The behavioral equilibrium of non-verbal interaction patterns results in a comfortable level of intimacy (Argyle and Dean, 1965).
From psychological studies we know that people with a high social status claim more direct space and position themselves closer to other people compared to people with a lower social status (McKenzie and Strongman, 1981; Leffler et al., 1982). Interestingly, people with a reduced self-esteem increase their interpersonal distance to others (Roger, 1982). This means that the regulation of the personal space is an indirect indicator of the social relationship we establish with others. It could be that such underlying psychological mechanisms were affecting the behavior of players in our game. This would mean that winners felt less discomfort sharing the personal space with their team members and therefore regulated their dyads more naturally. The spatial game we used in our experiment by definition favors a homogeneous distribution of players in the space. This implies that an asymmetric spatial distribution of players negatively affects the performance of a team. So maybe teams that perceived social discomfort chose an inhomogeneous spatial distribution and therefore suffered from a reduced success rate. This is an interesting interpretation that relates the players' perception of 'the self' to the observed behavioral performance of the team. So far we base these interpretations on the theoretical basis of proxemics, which shows that social discomfort increases the spatial distance. To confirm the factors responsible for this change in behavior we would need reports of the players' feelings during the game. So far we have quantified the spatial patterns coding social behavior.
Our second hypothesis addressed the question of how the salience of the
stimulus, in our case the perception of another person, influences social
interaction on a spatial scale. The results show that players significantly
adapted their spatial distance to another person depending on whether this
person was physically present or virtually represented. Members of Mixed teams
shared their intimate and personal space longer with each other compared
to Virtual and Physical teams. This finding raises an interesting question:
are these behavioral differences induced by the varying modality to play
the game (game pad vs. physical action) or are they induced by a change
of the stimulus salience (virtual team partner vs. physical team partner)?
To investigate this question we compared the behavior of XIM players
in Mixed teams with the behavior of XIM players in Physical teams, and
analogously the behavior of remote players in Mixed teams with the behavior of remote players in Virtual teams (see figure 3.7). All XIM players and all remote players share the same game modality, meaning they either use a game pad or perform physical actions to control their body. The
only difference between the two groups is the representation, or salience,
of their team partner (virtual vs. physical). Our results show that XIM
players of Mixed teams playing with a virtual avatar as team partner were
much more active and more centrally positioned compared to XIM players in
Physical teams. XIM players of Mixed teams also spent significantly
more time in the team field side of their virtually represented team partner, compared
to XIM players of Physical teams playing with a physical team partner. The change of the salience of the stimulus, in our case the representation of the team partner either as a virtual character or a real player, had
a fundamental effect on the behavior of players.
In social psychology this phenomenon has been described as the 'vividness effect' (Fiske and Taylor, 1984; Baddeley and Andrade, 2000). It
states that symbolic knowledge has a weaker impact than pictures and events. Frijda (1988) describes a similar effect in his law
of apparent reality: “Emotions are elicited by events appraised as
real, and their intensity corresponds to the degree to which this is the
case“ (p. 352). The degree to which we perceive the world
as real thus affects how we react to it. This concept is also supported by
physiological studies showing that the subjective perception of realism
can influence the physiological responses to a fearful stimulus (Bridger
and Mandel, 1964). In our study we reduced the apparent realism, or the
salience of the stimulus, by varying the representation of players. The perception of physical and virtual players differs in the amount of accessible
information.
A physical player not only marks his position in space, but also expresses body gestures that are important non-verbal cues for understanding other people's intentions (Birdwhistell, 1975; Sommer, 1969). It
could be that physical players joining a team with a virtual player behaved
more "egoistically" because they could not read their partner's intentions.
This interpretation points out the importance of gestures for understanding
the immediate actions of other people.
Another interpretation of the "egoistic" behavior of physical players
in Mixed teams is that the physical absence of their team partner induced
an increased feeling of responsibility to act, a phenomenon related to the
so-called bystander effect (Darley and Latane, 1970). The bystander effect
states that the presence of others in an emergency situation reduces impulses to act. This would mean that physical players in Mixed teams felt
alone in the physical space and therefore more responsible to run for the
ball.
We conclude that the reduction of the stimulus salience induces a significant change in social behavior on a spatial scale. To what extent the
lack of gestural information or the lack of physical presence is responsible
for this change is difficult to say.
The behavioral differences between Physical and Virtual teams are
also very interesting. Our observations show that people interacting with
other people in a virtual world significantly reduce their proxemic regulation. In particular, virtual players entered the intimate and
personal space of their team partners more often than physical players did. The perception of a real physical person does not induce the same behavioral response
as the perception of a virtual character on a computer screen. It seems
that the regulation of the inter-personal space is fundamentally affected
by how we perceive others, rather than being the consequence of a cognitively applied concept. But we have to be careful with claiming that the change
in perception is the only factor affecting our results, because we do not
know whether the game modality of controlling the body additionally influenced
the observed behavior. Obviously, moving a virtual body using a game pad
is not the same as performing physical actions in a space. So any behavioral difference between the Physical and Virtual conditions is probably
influenced by both the salience of the stimulus and the change of the game
modality.
3.6 Conclusion
The understanding of the regulation of non-verbal spatial behavior is a
complex problem. With our study we could show which dimensions of
spatial behavior can be related to cooperative interaction patterns and how
we can quantify them. Winners generally chose a closer interpersonal
distance and a more successful dyadic configuration. It could be that this pattern was
positively or negatively affected by the level of perceived social status of
individual players inside the teams. The salience of perceiving another person, either as physical or virtual, influenced this spatial behavior. So far
we know three factors responsible for the regulation of spatial interaction
(Baldassare, 1978): genetically pre-programmed behavioral patterns, the
environment and culture. Based on our results we propose that variation in stimulus salience acts as a gating mechanism for these factors.
This concept is not new and has already been described by others (Frijda,
1988; Borgida and Nisbett, 1977). With our study we provide empirical
data supporting the idea that the perceived apparent realism works as a
psychological mechanism that modulates behavioral responses. One strategy for investigation is to construct a set of different stimulus properties
that induce different behavioral responses, which can help us to understand the
underlying psychology. Mixed virtual reality environments are therefore
powerful tools to construct experimental designs addressing this question
(Blascovich et al., 2002).
Future studies investigating the effect of the perceived salience have
to be capable of gradually reducing the vividness of the stimulus, which
could be done by using 3D holographic representations of real humans
and objects. Recently such technologies have been used successfully in
the entertainment industry (McQueen, 2006). More realistic approaches to
control a virtual avatar by physical actions will help to reduce the behavioral
discrepancy between the real and the virtual world. This could be done by using interfaces that do not restrict physical actions, such as the multidirectional
endless treadmill CyberWalk (De Luca et al., 2009).
Chapter 4
PERCEPTION OF EMOTIONS
The expression of emotions is fundamental for social interaction. Humans
communicate their emotions to others using spoken language and a variety of non-verbal behaviors. Often these emotional expressions communicate the internal state of an individual to the group (Scherer and Ekman,
1984; Ekman, 1993; Izard, 1994). But not always: it has been shown that
certain expressive behaviors, smiling for example, are more often used
to foster social relationships and hierarchical structures than to express a
true affective state (Fridlund et al., 1991; Kraut and Johnston, 1979).
The ability to perceive and understand emotional cues is fundamental
for social interaction, because it allows an individual to derive the intentions of others (Baron-Cohen, 1997b; Blakemore and Decety, 2001). The
importance of being able to perceive affective behavior can be seen
in patients suffering from Asperger's syndrome (Baron-Cohen et al.,
1997a). These patients show fundamental deficits in social interactions,
because they lack the ability to 'read' the affective state of others.
In this chapter we investigate the perception of verbal and non-verbal
features of emotions. We want to know which behavioral parameters code
an emotional state and how the brain perceives and integrates these parameters into a global impression. To do so we use different artificial
agents that allow us to construct realistic and controllable stimulus spaces.
The behavioral observations will be tested against different models of perception. The insights from these studies add to the understanding of how
humans perceive emotions and which behavioral parameters are crucial
for the communication of these emotions.
4.1 Emotion Perception in Locomotion
A major challenge for the understanding of the meaning of expressive
behavior is to find a schematic classification. This is not a trivial task;
it has motivated researchers from sociology, behavioral psychology, theatre, and dance studies for decades. While a great deal of attention has
focused on the understanding and classification of emotional facial expressions (Ekman and Friesen, 1978; Scherer and Ekman, 1984; Ekman,
1993; Izard, 1994), relatively little systematic research has been carried
out in the field of emotional body language. One approach to describe
body movements is the Laban Movement Analysis (LMA), which divides
expressive biological motion into four different dimensions: Body, Effort,
Shape and Space (Pforsich, 1977). Using its own symbolic notation, this
analysis method is capable of describing body movements in detail.
The LMA is a powerful tool that can be used for the production and especially the reproduction of human behavior in acting and dance. Despite
these capabilities, the LMA lacks a clear linkage between expressive behavior and somatic and cognitive states. Another theoretical concept is
Birdwhistell's theory of kinesics, which understands the language of the human body as a “structured dynamic process of communication“ (Birdwhistell, 1975). According to this theory all movements of the body have
a meaning, and these movements have a grammar that is based on kinemes,
interchangeable units of movement. Unfortunately, the results of his
extensive studies are not systematically ordered and thereby difficult to
quantify (Jolly, 2000). A simpler classification system was proposed by
Mehrabian, focusing on the orientation of the head in relation to the body
and the angles of bodies interacting with each other (Mehrabian, 1972).
What all classification systems have in common is the difficulty of finding a
direct relationship between the affective state and a concrete corporal configuration and body movement. In contrast to facial expressions, where
we can observe coherent relationships between basic emotions and expressive behavior (Scherer and Ekman, 1984), the interpretation of expressive body behavior is more sensitive to contextual and social influences (Kret and de Gelder, 2010). Nevertheless, there is some empirical
evidence that the movement and the form of the human body communicate emotions (Camurri et al., 2003; Blake and Shiffrar, 2007), also at
a distance where facial expression is not detectable (Walters and Walk,
1986). A promising approach is to analyze the emotions attributed to
predefined body postures or movements, and correlate them with the parameters defining the body configuration. Studies following this idea
differ methodologically by exposing viewers either to real actors playing
(Camurri et al., 2003), video scenes of actors playing (Wallbott, 1998),
computer animations of virtual humans (Coulson, 2004) or point-light animations that conceptualize human body movements (Clarke et al., 2005;
Pollick et al., 2001). The results of these and similar studies show that
affective states can be identified by observing static postures (Kleinsmith
and Bianchi-Berthouze, 2007; Coulson, 2004; Kleinsmith et al., 2006; De
Silva and Bianchi-Berthouze, 2004) or moving behavior (Kamisato et al.,
2004; Camurri et al., 2003; Wallbott, 1998).
The exact contribution of form and movement to the perception of
emotional states is the topic of an extended discussion in the field. A recent study by Roether et al. (2009) states that the understanding of affective body language is an integrative process of the perception of both dimensions, form and movement. Roether identifies the limb flexion velocity as an important feature for the perception of fear and anger, while the
upper body posture, especially the head inclination, communicates sadness. These results are in line with a study from Thurman et al. (2010) that
investigated the perception of different critical features for biological motion. Exaggerated body movement facilitates the recognition of affective
states, especially their intensity (Atkinson et al., 2004). The contribution of the form dimension to the identification of emotional states was
made visible by a study using inverted and reversely played sequences of
a moving person (Atkinson et al., 2007). The results of these studies can be
interpreted as showing that form plays a crucial role in affect identification, while
kinematics helps to resolve conflicts and to identify the intensity of
the emotion, a finding that is in line with perceptual studies investigating
the neurobiological mechanism of motion perception (Giese and Poggio,
2003).
The emotional classifications used to describe affective behavior differ in complexity. The basic emotion approach claims that there exists
a finite set of distinguishable emotions that can be attributed to expressive behavior (Izard, 1977; Ekman, 1992). The dimensional approach
to emotions describes affective states using a two-dimensional classification system known as the circumplex model (Russell, 1980; Plutchik,
1980). This theory provides a circular classification space of basic emotions using valence and arousal to describe the quality and intensity of
different emotional states. Both systems have been used to describe expressive
body movements (Coulson, 2004; Wallbott, 1998).
4.1.1 Methods
Based on the results of previous studies (Coulson, 2004; Wallbott, 1998),
we constructed different animations of expressive locomotion by varying
three parameters of the movement: the head/torso inclination (including
the erection of the shoulders), the speed of the movement, and the viewing
angle.

We selected 18 participants from the University Pompeu Fabra for our
study. All participants were either master students, PhD students or
professionals working in academia, and all were permanently living in Spain.
The mean age of the participants was 28.4 years (SD = 4.3; 70 % men,
30 % women). The animations were modeled using Autodesk 3ds Max (Autodesk Inc., San Francisco, CA, USA, 2007) and transferred to the Torque
Game Engine (GarageGames, 2010). As stimuli we exported 10 sequences of 10 seconds each. For the stimulus exposure and the
rating of the sequences we used a 15 inch IBM ThinkPad laptop running the E-Prime 1 experiment presentation software (Psychology Software
Tools, Inc., Sharpsburg, PA, USA, 2007). The self-assessment manikin
rating scale (Bradley and Lang, 1994) was used for the evaluation of the
sequences.
Figure 4.1: Still images of stimuli in frontal view (A-C), and side view
(D-F). Head/torso inclination varied between 55 degrees down (A, D),
zero degrees (B, E), and 15 degrees up (C, F).
We constructed 10 different animations of a person walking by varying the parameters of the head/torso inclination, the speed of the movement, and the viewing angle (Figure 4.1). We defined the head/torso inclination of the neutral body posture as inclination angle zero, and used
this as a reference for the other animations. The deviation of the head/torso varied between -55 and +15 degrees; by convention, a negative inclination indicates a ventral direction, a positive inclination
a dorsal one. Half of the animations showed the walking body in
profile view (90 degree viewing angle), the other half in a 45 degree rotated
frontal view (Table 4.1). The animated avatar was a woman wearing a
dark, red-blackish suit and dark shoes. To avoid any contextual influence
we used a neutral gray color as background (Kret and de Gelder, 2010).
The face of the character was blurred to avoid any influence of the facial
expression (Van den Stock et al., 2007; Meeren et al., 2005).
Table 4.1: Specification of the stimuli parameters

Viewing Angle [degree]   Inclination [degree]   Speed [steps/sec]
45                       Neutral [0]            Medium [0.75]
90                       Neutral [0]            Medium [0.75]
45                       Up [+15]               Medium [0.75]
90                       Up [+15]               Medium [0.75]
45                       Down [-55]             Medium [0.75]
90                       Down [-55]             Medium [0.75]
45                       Neutral [0]            Slow [0.5]
90                       Neutral [0]            Slow [0.5]
45                       Neutral [0]            Fast [1.4]
90                       Neutral [0]            Fast [1.4]
Participants sat alone at a table in front of the laptop used for the stimulus
presentation, and were asked to rate the valence and arousal state of a
walking person. Each sequence was played for 10 seconds, followed by a
black screen. After 2 seconds the valence and arousal rating scale appeared
and remained until a rating was given. The pause before the next stimulus
sequence was 4 seconds. The order of the sequences was randomized.
After the experiment, participants were asked by the experimenter whether
they had any problems following the experiment. Participants were not
informed about the specific objective of the study.
4.1.2 Results
The data was analyzed using the SPSS software package. The valence
and arousal ratings were submitted to two multivariate analyses of variance (MANOVAs), where Wilks' lambda was used as the multivariate criterion. The first MANOVA had the factors 2 (viewing angle) x 3 (head inclination), and the second MANOVA had the factors 2 (viewing angle) x 3
(movement speed). All data satisfied the normality criterion as verified
using the Kolmogorov-Smirnov test.
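For readers who want to reproduce this kind of analysis outside SPSS, the same 2 x 3 MANOVA can be sketched in Python with the statsmodels package. The data frame layout and column names here are our own assumptions, not the original analysis script:

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical long-format ratings: one row per participant and stimulus,
# with the two dependent measures and the two factors.
df = pd.read_csv("ratings.csv")  # columns: valence, arousal, angle, inclination

# 2 (viewing angle) x 3 (head inclination) MANOVA on valence and arousal.
# Wilks' lambda is part of the table printed by mv_test().
fit = MANOVA.from_formula("valence + arousal ~ C(angle) * C(inclination)", data=df)
print(fit.mv_test())
```

The second MANOVA is obtained analogously by replacing the inclination factor with the movement speed factor.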
Effects of head/torso inclination
The analysis showed that the head/torso inclination factor had a significant effect on the ratings (F(4, 13) = 23.5, p < 0.001, Λ = 0.1). This
effect was pronounced both for arousal, F(2, 29) = 49.9, p < 0.001,
and for valence, F(1, 24) = 45.2, p < 0.001. The post-hoc Bonferroni
comparisons for the arousal ratings showed that the head/torso down
condition (M = 2.5, SD = 0.3) was rated significantly lower (p < 0.001)
than the neutral head/torso condition (M = 5.4, SD = 0.3)
and the head/torso up condition (M = 6, SD = 0.3). The same comparisons for the valence ratings showed significant differences between all three
conditions. The head/torso up condition was perceived as most pleasant,
followed by the neutral head/torso position and head/torso down. The
means were M = 6.7, SD = 0.2; M = 5.9, SD = 0.3; and M = 2.8, SD =
0.5, respectively. No effect of the viewing angle, or interaction between
the angle and the head/torso position, reached significance.
Figure 4.2: Valence and arousal rating for Head/Torso inclination. Error
bars indicate standard error. Valence rating 0 indicates a very sad emotional state, rating 10 a very happy state. Arousal rating 0 indicates low
arousal, arousal rating 10 indicates high arousal.
Effects of walking speed
The movement speed factor reached significance at F(4, 13) = 41.1, p <
0.001, Λ = 0.07. This effect was driven only by the arousal ratings,
F(2, 27) = 58.6, p < 0.001. The post-hoc Bonferroni comparisons for
the arousal ratings showed that the fast speed condition (M = 8.1, SD = 0.2) differed
significantly from the normal speed (M = 5.4, SD = 0.3) and
from the slow speed condition (M = 4.2, SD = 0.3). No effect of viewing
angle, or interaction between the angle and the movement speed, reached
significance.
Figure 4.3: Valence and arousal rating for different speed parameters.
Error bars indicate standard error. Valence rating 0 indicates a very sad
emotional state, rating 10 a very happy state. Arousal rating 0 indicates a
low arousal state, arousal rating 10 indicates a high arousal state.
When locating the animations in the circumplex model of valence and
arousal (Figure 4.4), we see that a wide area is covered, indicating the power
of the head/torso inclination and speed parameters to express a range of
emotional states. The coordinates that are not yet sufficiently covered
are the combinations of high valence/low arousal and low valence/high
arousal.
4.1.3 Discussion & Conclusion
Our results show that participants assigned distinct emotional states to animations of a walking person that only differed in the erection of the posture and in walking speed.
Figure 4.4: Distribution of the animations in the circumplex space. The
legend indicates the stimuli parameter space of the different animations:
<speed>.<viewing angle>.<head/torso inclination>. The speed parameter is defined as Fast = 1.4 m/sec, Medium = 0.75 m/sec and Slow = 0.5
m/sec. The viewing angle varies between profile view = 90 degrees, and
rotated frontal view = 45 degrees. The parameter for the head/torso inclination varies between Neutral = 0 degrees, Up = + 15 degrees and Down
= -55 degrees.
An upright head/torso position was significantly
associated with a positive emotional state, or high valence, a lower position
with a more unpleasant emotional state. Even small changes in head/torso
position of 15 degrees induced a significantly different perception of the
emotional quality. This is indicative of the high sensitivity of humans in
relating subtle differences in body language to internal states. In addition to
valence, the arousal rating was also significantly affected by the body posture: animations with negative head/torso inclinations were perceived as
less aroused compared to body postures with more upright head/torso positions. These results are in line with studies showing that especially the
static configuration of the upper body codes important features responsible for the perception of emotional states (Roether et al., 2009; Atkinson
et al., 2007). While the valence ratings differed between all three head/torso conditions, in the arousal ratings we only observed a significant difference for
the most extreme negative head/torso inclination. This finding suggests
that only extreme down positions of the head clearly code low values of
arousal, which is in line with other studies that found that depressive states
were characterized by non-erected postures (Roether et al., 2009). The
different walking speeds had a clear effect on the perception of arousal:
higher speed yielded higher arousal ratings compared to slower movements. This means that the velocity of the body movements does provide
information about the magnitude of the emotional state of a person. This
finding is in line with recent studies showing that the velocity of body
movements codes the intensity of a perceived emotional state (Atkinson
et al., 2004; Roether et al., 2009). In contrast, the speed
had no effect on the valence rating of the perceived emotion. If we are
searching for canonical parameters that control the expression of emotions in animations, we aim at finding parameters that are independent of
the angle from which they are seen. Indeed, our results show that the
emotional quality of the animations generated from the chosen set of
parameters is independent of the viewing angle.
The identification and empirical evaluation of canonical parameters
that control the expression of emotions in locomotive behavior is the
main contribution of this study. Our results are consistent with previous
work showing that upright upper body postures are perceived as emotionally more positive and forward leaning postures as more negative (Coulson,
2004; Roether et al., 2009), and with studies that found associations between
“dropped head” positions and sadness (Wallbott, 1998). The perception of
the arousal state can be related to variations of the velocity of the movement, which is in line with findings from Ekman and Friesen (1974).
The results of our study confirm previous results stating that the intensity
of a perceived emotion is directly linked to the velocity of the identified
body gesture (Atkinson et al., 2004). Our study therefore supports the
hypothesis proposed by others that the static configuration of the body
parts, especially the upper back, shoulders and head inclination, codes the
valence value (Roether et al., 2009; Atkinson et al., 2007), while the kinematic
dimension codes the intensity of the emotion (Atkinson et al., 2004).
Even though context (Wallbott, 1998; Aviezer et al., 2008; Kret and
de Gelder, 2010) and facial expressions (Van den Stock et al., 2007;
Meeren et al., 2005) play an important role in giving meaning to bodily
expression, our results show that people recognize distinguishable emotional states of a moving person independent of those two factors. Hence,
we show that the characteristics of locomotion by themselves can convey emotional states.
These findings are important as they allow us to build virtual characters whose emotional expression is recognizable at distances larger than
those at which facial expression can be decoded. Additionally, moving characters can keep their emotional state over an extended period
of time. This is important since observing an isolated emotive face over a
long time can be perceived as non-natural behavior. The understanding
of both of these aspects is of relevance for the construction of
avatars that interact with users in virtual worlds or in environments such
as CAVEs (Cruz-Neira et al., 1992) and mixed-reality spaces such as the
eXperience Induction Machine (Bernardet et al., 2008). Future work will
include the investigation of additional parameters that allow covering the
entire circumplex space. Additionally, we plan to apply our findings to the
control of the emotional expression of real-world robotic platforms such
as the humanoid robot iCub (Sandini et al., 2007).
4.2 Emotion Perception in the Talking Face
The perception of emotions is not a trivial task, because humans use
multiple modalities in parallel to transmit their affect to their environment. While talking, for example, humans communicate their emotions
through the meaning of the words (Johnson-Laird and Oatley, 1989; Ortony
et al., 1987), the prosody (Buchanan et al., 2000) and abstract vocalizations (Sauter et al., 2010). Postures, gestures, touch, facial expressions,
eye-gaze and the regulation of inter-personal distance are additional communicative cues that support or weaken the verbal dimension (Argyle, 1988;
Pentland, 2008). The result is a complex multimodal information stream
consisting of verbal and non-verbal cues. Different theories of how the
brain processes this information stream have been proposed (Farah et al.,
1995; Etcoff and Magee, 1992; Tanaka and Farah, 1993).
4.2.1 The Fuzzy Logical Model of Perception
We base our research on the idea that a multi-modal stimulus poses a pattern
recognition problem. Rather than perceiving the stimulus as a holistic
category, we propose that the brain perceives the single features independently and integrates them in a multiplicative manner. Because no set of
particular features characterizes a particular emotion, it has been proposed
that the features coding an emotion are continuous (Ellison and Massaro,
1997). The theory behind this idea has been synthesized into the Fuzzy
Logical Model of Perception (FLMP) (Massaro, 1998). I will first introduce the theoretical concept of the FLMP and then compare the model against
other ideas and concepts of perception.
The main principle of the FLMP is that the perception of a multimodal stimulus stream is a pattern recognition problem. The model assumes three basic stages of processing, as shown in figure 4.5: (1) each
source of continuous information is evaluated to ascertain the degree to
which it matches various stored prototypes; (2) the sources are integrated
according to a multiplicative formula to provide an overall degree to which
they support each alternative; and (3) a decision is made on the basis of
the relative goodness of fit of each prototype. The three processes are successively ordered in time, but overlapping. The information is based on
sensory primitives, or features. Based on the empirical results of previous
studies we propose that the encoding of affect in a talking face follows this
principle (Ellison and Massaro, 1997).
Figure 4.5: Schematic representation of the three stages involved in perceptual recognition proposed by the Fuzzy Logical Model of Perception
(FLMP). The three processes are temporally successive, but overlapping.
Reading direction in the diagram is from left to right. The model is explained with a task where subjects have to integrate affect from words
and expressions. The sources of information are indicated by upper case
letters: expressive information by Ei, word information by Wj. The evaluation process transforms this information into perceived features, indicated by lower case letters ei and wj. The integration process results in
an overall degree of support sk for a given affect k. The decision process
maps the output of the integration into a response Rk. All three processes
make use of prototypes stored in memory.
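To make the multiplicative integration and the relative goodness-of-fit decision concrete, here is a minimal sketch for the two-alternative case (happy vs. angry); the function and variable names are ours, and the truth values would come from the evaluation stage:

```python
def flmp_happy(e_i: float, w_j: float) -> float:
    """Predicted probability of a 'happy' judgment, given the degree of
    support for 'happy' from the expression (e_i) and the word (w_j),
    both continuous truth values between 0 and 1."""
    support_happy = e_i * w_j                # multiplicative integration
    support_angry = (1 - e_i) * (1 - w_j)    # support for the alternative
    # decision: relative goodness of fit of the two prototypes
    return support_happy / (support_happy + support_angry)
```

Note two signatures of the model: an ambiguous source leaves the judgment at the value of the other source (flmp_happy(0.9, 0.5) = 0.9), while two moderately supporting sources yield a response more extreme than either source alone (flmp_happy(0.9, 0.9) ≈ 0.99).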
The FLMP has shown superior performance in multiple empirical experiments in different domains (Massaro, 1998). In the following we use
a deductive approach to test the model against alternative models
of perception. We do this by answering different questions concerning
the underlying psychological mechanism of perception. These questions
are hierarchically ordered into a tree of wisdom to simplify the answering
procedure. This order does not imply any functional dependency. The
concept of the tree of wisdom has already been applied successfully in the
field of multi-modal speech perception (Massaro, 1987a). Figure 4.6 illustrates the tree of wisdom, which consists of a set of binary oppositions
about how a multimodal stimulus stream is processed.
Figure 4.6: Tree of wisdom illustrating binary oppositions central to the
differences among theories of perception: holistic vs. featural parts, categorical vs. continuous, dependent vs. independent, additive vs. multiplicative. Figure retrieved from (Massaro, 1998).
In the first stage we ask whether the perception of emotion is a holistic
process or a pattern recognition problem. Holistic processing can be divided into holistic encoding and configural encoding (Farah et al., 1995).
The theory of holistic encoding states that the stimulus is perceived as
a whole. In a face recognition task it has been shown that face features
in the context of the whole face are perceived with a higher accuracy
than the separated features (Tanaka and Farah, 1993). If we want to test
the psychological mechanism behind this result we have to find a model
that predicts the outcome of this experiment. The problem with holistic
processing is that there exists no model to test, because such a model
would require the same number of free parameters as stimuli. According to the
holistic idea of perception every emotion is unique, and its identification
cannot be predicted on the basis of components. The only approach we
can follow is to show that a contradictory model provides a good fit for
the observed behavioral data. The FLMP predicts the emotion judgment
from single features with high accuracy. This result provides evidence
against holistic encoding (Massaro, 1998).
Configural encoding states that the spatial relations of the single features are important for the global perception mechanism. This form of
perception is more difficult to falsify, because the relative spatial configuration can itself be seen as a feature. This means, for example, that the displacement of a smile is evaluated in relation to the absolute position of the
mouth center. Interestingly, it has been shown that the FLMP could predict the emotion in half and whole faces with the same parameter values
(Ellison and Massaro, 1997). This result supports the idea that component
features, and not the spatial configuration, are the most important determinants of emotion perception. We conclude that the perception of a multimodal
stimulus stream is not a holistic process and move down the right branch
of the tree of wisdom.
In the next step we have to answer whether stimulus features are categorical
or continuous. Experimental results show that, given a stimulus continuum between two alternatives, identification judgments change abruptly.
Scientists have taken this as support for the categorical perception theory (Etcoff and Magee, 1992). Because the shape of the identification
function is discontinuous, supporters of the categorical idea falsely interpreted this as proof of their theory. It has been shown that continuous
information can also lead to a discrete identification function (Massaro, 1987b).
In a study using two features of emotional cues, Ellison and Massaro
(1997) have shown that the FLMP describes well the observed emotion
judgments that follow a 'discontinuous' identification function. Because
the FLMP assumes continuous information about each feature, this result
demonstrates that the identification function alone is not sufficient to determine whether perception is continuous or categorical. The verification
of which psychological mechanism is responsible for the observed result
has to be based on quantitative tests. Unfortunately, like holistic models, categorical models do not allow compositional tests, because such a
model would have the same number of free parameters as stimuli. One
approach that has been used to prove the categorical model wrong is to
compare the performance of the FLMP against the single channel model
(SCM), which is mathematically equivalent to the categorical model of perception. This model states that people categorize information from each
feature and respond with the outcome of the categorization of only one of
the features. The poor fit of the SCM compared to the FLMP supports the idea
that categorical perception is not an adequate model for the mechanism
of emotion recognition (Massaro, 1998).
The good fit of the FLMP provides us with the answers for the last two
branches in the tree of wisdom. The theoretical basis of the FLMP assumes that single features are independent and combined in a multiplicative manner. The rating results of individual subjects show that the combined
feature evaluation is more extreme than the rating given to either source
alone. The multiple empirical results supporting the FLMP show that this
model is a robust framework for inquiry. In this thesis we will evaluate
this framework in the context of affect perception in a talking face and
compare its performance against the Weighted Average Model (WAM) of
perception.
4.2.2 The Weighted Average Model of Perception
Another concept of how multi-modal stimulus integration could be achieved is
conceptualized by the weighted average model of perception (Bruno and
Cutting, 1988; Massaro, 1988; Massaro and Ferguson, 1993). The evaluation stage is similar to the FLMP, but the values are added at the integration stage. If we allow one feature dimension to have more influence
than the other, the model can be made more general. The probability
of identifying angry affect is then equal to:

P(A|Ei, Wj) = we ei + (1 − we) wj     (4.1)

where we is the weight given to the expression and (1 − we) the
weight given to the word. Mathematically, this weighted average model is
equivalent to a single channel model in which participants attend to the
information of only one modality (Thompson and Massaro, 1989).
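Under the same hypothetical naming as the FLMP sketch above, the additive integration of equation 4.1 reads:

```python
def wam_angry(e_i: float, w_j: float, w_e: float = 0.5) -> float:
    """Weighted average prediction (Eq. 4.1): sources are added, with
    w_e the weight given to the expression and (1 - w_e) to the word."""
    return w_e * e_i + (1 - w_e) * w_j
```

Unlike the FLMP, this prediction can never be more extreme than the more extreme source (e.g. wam_angry(0.9, 0.9) = 0.9), which is why the observation that combined judgments are more extreme than either source alone speaks for multiplicative integration.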
4.2.3 Automatic Processing of Information
Various criteria have been proposed to define a perceptual process as an
automatic mechanism. These criteria consider whether minimal attention is
required, whether the stimulus can be processed unintentionally, or whether the
detection is effortless (Bargh, 1989; Logan, 1989; Shiffrin, 1988; Evans,
2003). Automatic processes can be structured along a continuum, with
some processes being more or less automatic than others (Logan, 1989).
The extent to which a process is automatic can be measured by the ability of
distracter information to interfere with the processing of attended
information. A widely used method that measures such interferences is
the Stroop task (Stroop, 1935; MacLeod, 1991). In the original version
of this method, participants had to either read words printed in different
colors, or name the colors of the words. The interference was measured
by longer reaction times (RT) in the condition where the word color and
the word meaning mismatched. The results show that word reading was
less influenced by the non-attended stimulus dimension than the naming
of the color. One explanation for this difference in interference is the
claim that word reading is more automatic than color naming (Logan,
1989; MacLeod, 1991; Shiffrin, 1977). We will use the same method to
investigate the automaticity of affect perception in a talking face.
4.2.4 Automatic Processing of Affective Faces and Words
Affective stimuli communicate information that can be very important
for survival (Öhman, 2002; Öhman et al., 2001). The processing of such
stimuli is fast and very efficient (Globisch et al., 1999). So far, two brain
pathways involved in the evaluation of affect have been identified. The
first is an old and fast subcortical pathway connecting the sensory
input directly to the amygdala, a nucleus involved in the evaluation of
valence quality (LeDoux, 2000; Lang et al., 2000; Paton et al., 2006).
This pathway is responsible for unspecific physiological and behavioral
responses. The second pathway involves cortical processing of the stimulus and elicits more complex goal directed responses (LeDoux, 1996).
The exact relationship between the processing of affect in faces and words
and the brain pathways involved in the evaluation of valence is not clearly
understood.
Recognizing emotion in faces is a skill that develops early in infancy
(LaBarbera et al., 1976; Meltzoff and Moore, 1977; Schwartz et al., 1985).
This process requires little attention. Threatening facial expressions are
capable of influencing responses even when people are unaware of the
presence of the face (Esteves et al., 1994; Morris et al., 1998). Masked-priming stimuli of affective faces are also capable of inducing unconscious mimicry behavior in subjects, measured by electromyographic responses (Dimberg et al., 2000). Another interesting result comes from
studies that have observed physiological responses to affective faces in
patients suffering from face-blindness or prosopagnosia (Bauer, 1984;
Damasio et al., 1990). This implies that the perception of affect does not
require conscious perception of the face. An explanation for these findings
is the claim that threat-relevant stimuli like angry faces are more salient
and are therefore processed differently compared to non-threat stimuli
(Morris et al., 1998; Schupp et al., 2004). The results of these studies
suggest that emotions communicated by facial expressions are automatically perceived and processed.
Similar results have been presented for the perception of linguistic semantics. In masked priming experiments where participants were
exposed to apparently undetectable affective words just a moment before
they had to judge the valence quality of a follow-up word, Greenwald
et al. (1989) observed significant influences of these masked words on
the reaction time to judge the affective quality of the follow-up words.
Also, impression formation and preference responses can be influenced
by words not consciously detected (Bargh, 1989; Kihlstrom, 1987). Dehaene (1998) showed that words and numbers presented as masked primes
induce detectable changes in behavior and in electrophysiological activity
measured in the premotor area. These results show that the processing
of linguistic semantics can also happen automatically.
The understanding of spoken language is based on the integration of
both the verbal and the non-verbal dimension. Interestingly, multiple functional
MRI studies and physiological results have described a lateralization effect for the comprehension of linguistic semantics and the expressive
dimension of language (Schirmer and Kotz, 2006). The perception of
non-verbal expressions of language, including vocal tone, is related to
increased activity of different brain areas in the right cerebral hemisphere
(Nakamura et al., 1999). These results are consistent with the identification of a special cortical region responsible for the processing of faces
in the lateral fusiform gyrus of the right hemisphere (Haxby et al., 2002;
McCarthy et al., 1997; Kanwisher et al., 1997). The existence of a specialized region responsible only for the processing of the different aspects
of faces points out its evolutionary importance. The processing of linguistic semantics has been related to increased brain activity in the left
cerebral hemisphere, concretely in regions of the inferior frontal and
temporal lobes (Bookheimer, 2002; Binder et al., 2009). Different regions
responsible for the integration of semantic-lexical or phonological content could be distinguished (Demonet et al., 1992). Interestingly, scientists have failed to locate category-specific brain areas responsible for the
detection of different semantic classes (Devlin et al., 2002; Bookheimer,
2002).
These results show that the human brain processes linguistic semantics and the expressive dimension of language in different regions, and
that this processing can happen automatically, without conscious control.
The present study focuses on how facial expressions and linguistic
semantics are perceived and integrated in the judgment of two specific
emotions: happiness and anger. To investigate this question we designed
two experiments using an emotional Stroop task where the subjects always saw a face saying a word. They had to rate the emotional content of
the facial expression, or the meaning of the word, or both. Our goal was
to vary the amount of information supporting happiness or anger in the
spoken words and the face, without claiming to produce a complete stimulus continuum between happiness and anger. We used a controllable
synthetic talking head to manipulate the expressive dimension of the face
but not the voice (Cohen and Massaro, 1993; Massaro and Cohen, 1995).
The reaction time (RT) for valence-coherent and valence-incoherent stimuli was used to investigate the degree of automaticity of affect perception.
In previous studies we successfully used an expanded factorial design to
investigate the integration mechanism of vocal and facial emotional components (Massaro and Egan, 1996). The results of these studies showed
that the fuzzy logical model of perception (FLMP), which assumes continuous and independent perceptual features, fit the judgments better than
an additive model. Hence, in this study we extended this paradigm to
the independent manipulation of the face and the linguistic content of the
spoken word.
4.2.5 Experiment 1
Methods
Seven female undergraduate students from the University of California Santa
Cruz participated in the experiment. Participants were recruited by an ad
on the UCSC campus. All subjects received a financial reimbursement of
45 US dollars for their participation. They ranged in age from 18 to 20 (M
= 18.6; SD = 0.79) and were all native English speakers.

We generated a stimulus space by parametrically controlling the animation of a 3-D talking head, Baldi (Massaro et al., 1998). This application
is capable of synthesizing the audio-visual and affective components of
a talking face following a modular principle. This powerful approach allows us to modulate and blend the different components of the multimodal
stimulus stream, producing a complete set of stimuli that portrays different
affect. Given that Baldi apps are currently available on the iPhone (Massaro et al.,
2009), we used an in-house app to present the stimuli on an Apple iPad
to the subjects at a distance of about 45 cm. No visual fixation point was
provided.
We selected the two basic emotions happy and angry for our stimulus
space. This decision was based on the concept that the two emotions code
opposite affect (Russell, 1980; Ekman et al., 1982). To create the affective
expressions we varied the eyebrow and mouth corner deflections because
of their influence on affective ratings between angry and happy expressions
(Ekman and Friesen, 1978) (see figure 4.7). For the linguistic semantic
dimension we defined a stimulus continuum consisting of fifteen words.
The selection of these words was based on the evaluation of affect and
activation measured by others (Whissell, 1989; Morgan and Heise, 1988).
We also controlled for word frequency, selecting familiar words that
appear between once every million and once every one hundred thousand
tokens (Carroll, 1971). The happy words were: happy, joyful, delighted,
proud, pleased and enthusiastic. The angry words were: bitter, resentful,
envious, angry, outraged and furious. The neutral words were: neutral,
demanding and rebellious. The 6 angry words and 6 happy words were
pooled into two classes coding high and low affect. The neutral words
were pooled into one class. For the vocalization of the words we used the
MARY text-to-speech engine speaking in a neutral voice (Schröder and
Trouvain, 2003).
Figure 4.7: The affective facial expressions of the stimulus space used
in experiment 1. The eyebrows and the mouth corner deflection of Baldi
were varied to produce a stimulus continuum from happy to angry.
The experiment used a factorial design with fifteen words and five
facial expressions, so that 75 distinguishable stimuli were produced. Three different conditions were tested in the experiment. In the
first condition participants had to judge the expression of the face without
paying attention to the semantic meaning of the word.
In the second condition they had to judge the linguistic meaning without paying attention to the affect of the face. In the last condition subjects
had to judge the global event. In each condition the presentation of the
75 stimuli was repeated 5 times. Between these blocks participants had
a break of 2 minutes. Normally only one condition was tested per day.
Subjects that were tested on two conditions on the same day had a break
of at least 4 hours between the sessions. The order of the 75 stimuli and
of the 3 conditions was randomized.
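The factorial stimulus set and its randomization can be illustrated with a few lines of Python (the expression labels are placeholders; the words are those listed above):

```python
import itertools
import random

happy = ["happy", "joyful", "delighted", "proud", "pleased", "enthusiastic"]
angry = ["bitter", "resentful", "envious", "angry", "outraged", "furious"]
neutral = ["neutral", "demanding", "rebellious"]
words = happy + angry + neutral                    # 15 words
expressions = ["E1", "E2", "E3", "E4", "E5"]       # 5 facial expressions

stimuli = list(itertools.product(words, expressions))  # 15 x 5 = 75 stimuli
assert len(stimuli) == 75
random.shuffle(stimuli)  # randomized presentation order within a block
```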
After each stimulus, subjects had to give a rating by pressing a button
on the touch screen labeled 'Positive' or 'Negative'. During
the rating the face was not visible. The subject's response and reaction
time were recorded. After the rating, a one second break was implemented
before the next stimulus was presented. The mean observed proportion of
happiness identifications was computed for each of the 75 stimuli for each
subject by pooling across the 5 blocks of each condition.
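As an illustration of this pooling step (the trial log format and the mapping of the 'Positive' button to a happiness identification are our assumptions):

```python
import pandas as pd

trials = pd.read_csv("exp1_trials.csv")
# assumed columns: subject, condition, expression, word, response, rt

trials["happy"] = (trials["response"] == "Positive").astype(int)

# Mean proportion of happiness identifications per subject and stimulus,
# pooled across the 5 repetition blocks of each condition.
p_happy = (trials
           .groupby(["subject", "condition", "expression", "word"])["happy"]
           .mean())
```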
4.2.6 Results
For the data analysis we classified the 15 affective words into 5 classes
coding different strengths of affect: Happy (happy, joyful, delighted), Medium Happy (proud, pleased, enthusiastic), Neutral
(neutral, demanding, rebellious), Medium Angry (bitter, resentful, envious) and Angry (angry, outraged, furious). The analysis of the reaction
times revealed a significant difference between the linguistic semantic and
the expressive condition. Subjects responded faster in the expressive condition (median = 0.77 sec) compared to the linguistic semantic condition (median = 0.98 sec): Kruskal-Wallis χ2(1, N = 1050) = 344.05,
p < 0.01. This result has to be interpreted carefully, because the face
was exposed 0.6 seconds before the head started to talk. This means that
the expression was perceived earlier than the word, and any difference in
performance is influenced by this lag.

To analyze the influence of one affective dimension on the rating of
the other, we calculated the reaction times for coherent and incoherent
stimulus constructs.
Figure 4.8: Reaction time in the expression condition (left) and the word
condition (right). When the stimulus construct had coherent valence qualities, reaction times were reduced in both conditions. The box indicates the
25th and the 75th percentile, the whiskers indicate the most extreme data
points not considered outliers. The horizontal line is the median.
A coherent stimulus construct is defined as a stimulus
coding happy or angry valence in both the linguistic semantic and the
expression dimension. An incoherent stimulus construct codes opposite valence in
the two dimensions. People responded faster in both conditions when
the stimulus construct coded coherent valence quality (see figure 4.8). A
Kruskal-Wallis test revealed a significant reduction in reaction time in the
expressive condition (median coherent = 0.74 sec; median incoherent =
0.8 sec: χ2(1, N = 85) = 10.96, p < 0.01) and in the linguistic semantic
condition (median coherent = 1.03 sec; median incoherent = 1.10 sec:
χ2(1, N = 85) = 8.04, p < 0.01). No difference for the type of valence
(angry vs. happy) was observed.
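The coherent vs. incoherent comparison above is a standard Kruskal-Wallis test on per-trial reaction times; a minimal, self-contained sketch with made-up data:

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)
rt_coherent = rng.normal(0.74, 0.10, size=43)    # illustrative values only
rt_incoherent = rng.normal(0.80, 0.10, size=42)

h, p = kruskal(rt_coherent, rt_incoherent)
print(f"chi2(1, N={rt_coherent.size + rt_incoherent.size}) = {h:.2f}, p = {p:.4f}")
```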
In a second analysis we investigated how the word affected the judgment of the face and vice versa. We observed a significant
influence of angry vs. happy words on the judgments of neutral faces:
Wilcoxon W = 129.66, z = 2.53, p < 0.01 (see figure 4.9).
To investigate the psychological mechanism responsible for the integration of the two modalities, we tested the observed data against the
predictions of the Fuzzy Logical Model of Perception (FLMP) and the Additive Model of Perception (AMP). The model fitting was accomplished
with STEPIT (Chandler, 1969). The FLMP and the AMP were fit to the
observations of each of the 7 individuals. The fit of the FLMP requires
the estimation of five ei values for the 5 different classes of expressions
and five wj values for the 5 different classes of semantic information. The AMP
requires the same number of parameters plus one for the weight value
w. The goodness of fit was calculated as the root mean square deviation
(RMSD) between the observations and the model's predictions. Figure 4.9
shows the average fit of the FLMP in the expression and semantic conditions. The average RMSDs for the FLMP in these two conditions were
0.0234 (expression) and 0.021 (linguistic semantics); for the AMP they
were 0.0454 and 0.0287, respectively.
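STEPIT is a classical FORTRAN function minimizer; the same fit can be sketched with scipy, under our assumptions about the parameter layout and the data format (a 5 x 5 matrix of observed proportions):

```python
import numpy as np
from scipy.optimize import minimize

def fit_flmp(observed):
    """Fit five expression parameters e and five word parameters w to a
    5 x 5 matrix of observed P(happy) proportions by minimizing the RMSD."""
    def rmsd(params):
        e, w = params[:5], params[5:]
        num = np.outer(e, w)                          # multiplicative support
        pred = num / (num + np.outer(1 - e, 1 - w))   # FLMP prediction
        return np.sqrt(np.mean((pred - observed) ** 2))

    x0 = np.full(10, 0.5)
    res = minimize(rmsd, x0, bounds=[(1e-3, 1 - 1e-3)] * 10, method="L-BFGS-B")
    return res.x, res.fun  # fitted parameters and the achieved RMSD
```

The AMP is fit the same way with its additive prediction and one extra weight parameter.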
Figure 4.10 shows the average fit of the FLMP along with the observed data in the bimodal condition. The average RMSD for the FLMP
in this condition was 0.048; the fit of the AMP produced a larger RMSD
of 0.13. So in all three conditions the AMP produced larger RMSDs than
the FLMP. An ANOVA was carried out on the RMSDs of the fits of the
two models. The FLMP provided a significantly better fit than the AMP
[F(1, 40) = 7.89, p < 0.01].
4.2.7 Discussion
The differences in reaction time between stimuli coding coherent and
stimuli coding incoherent valence quality indicate interference effects
in the perception of affect across modalities. Comparing the behavioral observations with the predictions of different models of perception
shows that the FLMP fits the data significantly better than the other models.

In the original Stroop task, the meaning of a word interferes with color
categorization (Stroop, 1935). This interference effect has been explained
by two different hypotheses: the relative speed of processing for different
stimulus features, and the bottleneck of attention (MacLeod, 1991).
Figure 4.9: Observations (symbols) and predictions (lines) for the fuzzy
logical model of perception FLMP in the expression condition (left) and
the linguistic semantics condition (right). We observed a significant influence of the angry words on the judgments of the neutral facial expressions
(left panel). This effect was not observed in the linguistic semantics condition (right panel).
Figure 4.10: Observations (symbols) and predictions (lines) for the fuzzy
logical model of perception FLMP (left) and the weighted additive model
of perception AMP (right). The plot shows the fits for the bimodal condition where subjects had to identify the affect of the overall event. The
FLMP makes a significantly better prediction for the observed data compared to the AMP.
The first hypothesis states that the processing of the word is faster than
the naming of the color. Because of this difference in speed, the reading task interferes with the color identification task but not vice versa
(MacLeod, 1991). The second hypothesis claims that there is a bottleneck
of attention. Specifically, people ignore the color of the word when reading the word, but not the meaning of the word when they have to name
the color (Cohen et al., 1990). According to this hypothesis, automatic
processes are less affected by the distractive power of the non-attended
stimulus feature. Following this idea, theorists have argued that word
reading is an automatic process, while color naming or picture identification requires more cognitive load and is therefore a controlled process
(MacLeod, 1991). Other studies showed that pictures interfere with word
categorization (Stenberg et al., 1998; Glaser and Glaser, 1989), but words
do not interfere with picture categorization (Glaser and Dungelhoff, 1984;
De Houwer and Hermans, 1994). The same result has been presented in
a recent study that used affective words superimposed on affective faces
(Beall and Herbert, 2008).
In our experiment we used a modified Stroop task where people had to
judge the affect of spoken words and facial expressions. Our analysis of
the reaction times in the single mode conditions shows that not only the
facial expression influences the judgment of the word meaning but also
the meaning of spoken words interferes with the judgment of affective
facial expressions. In both conditions, expression and linguistic semantic, the participants made faster judgments when the two dimensions of
the stimulus construct coded coherent valence quality. Interestingly, this
difference has only been observed for strong affect quality. Words and
facial expressions that did not code extreme affect did not show such interference effects. These results indicate that the processing of affect in
the facial expression and the linguistic semantic interfere with each other.
Because both dimension show this influence we can exclude the idea that
one of the two dimensions is processed faster than the other. A more suitable explanation for our results is the bottleneck of attention theory: The
processing of affect in the face and the linguistic semantic are not com82
pletely automatic and therefore interfering. The fact that we observed
this interference only in words and faces coding strong affect indicates
that the processing of valence differs across the quality. This suggets that
the brain prioritizes the processing of stimuli with strong affect. The fast
processing of threatening faces have been explained with the same argument (Schupp et al., 2004). We also observed an influence of the linguistic
dimension on the judgment of the neutral facial expression, but no influence of the facial expression on the judgment of the linguistic semantic
(See figure 4.9). Probably the neutral faces were coding ambiguous affect
that could be influenced by the non-attended feature. We hypothesize that further interferences were not detected because the binary rating produces only extreme ratings. Therefore we implemented a slider interface in experiment 2.
Compared to previous studies we used an animated face that was talking directly to the subjects. This experimental design is more natural than
reading superimposed words on static photographs. This could be an explanation for the different results compared to studies that did not show
interferences of the word dimension on the rating of the faces (Beall and
Herbert, 2008). In studies that used printed words superimposed on static
photographs we cannot be sure that the subjects were reading the words.
It can be speculated that the missing interference of the words on the judgment of the facial expressions was due to a lack of stimulus perception and not because the perception of facial expressions is an automatic
process.
The interesting question now is how the brain integrates the dimensions into a global percept. Because the face is an important evolutionary
stimulus it has been claimed that the perception of faces is unique, involving holistic and non-analytic brain processes (Levine et al., 1988). This
hypothesis can be tested using our synthetic talking head Baldi that can
express different levels of affect. The comparison between the observed
performance of identifying affect and the predictions of different models of perception shows that the FLMP fits the data significantly better
than the additive model (See figure 4.10). This means that in experiment
1 the subjects use both cues to judge affect in the same manner as they
combine speech features (Massaro, 1989; Massaro and Ferguson, 1993;
Massaro et al., 1993; Massaro and Egan, 1996).
In experiment 1 the subjects used a binary choice to give their ratings.
It could be that the ratings were influenced by the participants' urge to give coherent ratings, with participants memorizing the ratings they gave to expressions and words. The binary choice probably was not sensitive enough to detect further interference. Therefore we designed a second experiment with a slider for giving continuous judgments. While a binary choice can be remembered, we hypothesized that a slider is more sensitive for detecting interferences in the ratings.
4.2.8 Experiment 2
Methods
Six male and three female academics from the Universitat Pompeu Fabra participated in experiment 2. The participants did not receive any financial reimbursement. Ages ranged from 24 to 37 (M = 30.1, SD = 5).
As in experiment 1 we generated a stimulus continuum using Baldi to modulate the facial expressions and the MARY text-to-speech engine to vocalize the words. We varied the eyebrow and mouth corner deflections to generate 10 different facial expressions coding different strengths of the emotions happiness and anger (See figure 4.11). The selection of the 10 words coding happy or angry affect was based on the evaluations of other studies (Whissell, 1989; Morgan and Heise, 1988). We controlled for word frequency, using only words that appeared between once every million and once every one hundred thousand tokens (Carroll, 1971). The words were: joyful, happy, delighted, pleased, surprised, neutral, disappointed, angry, furious and outraged. We asked participants if they understood these words before the experiment started. Only words that were understood were included in the analysis.
In experiment 2 we used a factorial design with 10 facial expressions
and 10 words producing 100 distinguishable stimuli. These stimuli were
tested in 3 conditions where participants had to rate either the affective
meaning of the word, of the facial expression or of the global event. Each
Figure 4.11: The affective facial expressions of the stimulus space used in experiment 2. The eyebrow and mouth corner deflections of Baldi were varied to produce a stimulus continuum from happy H (top left) to angry A (bottom right) in 10 steps. The letter N indicates a neutral intermediate state. The number indicates the strength of the affect.
condition was tested on a different day. Before the experiment we presented the complete stimulus space (words and faces) to the participants to
familiarize them with the continuum.
The stimuli were presented on an Apple iPad at a distance of about 45 cm. No visual fixation point was provided. After each stimulus, subjects had to give a rating using a slider that was labeled 'Positive' and 'Negative' at its end points. A set button below the slider was used to submit the rating. During the rating the face was not visible. The subject's response and reaction time were recorded. After the rating a one-second break was inserted before the next stimulus was presented. The mean observed proportion of happiness identification was computed for each of the 100 stimuli for each subject.
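For concreteness, this aggregation can be sketched as follows (a minimal sketch with placeholder data; the array shape and repetition count are assumptions, not the actual design):

```python
import numpy as np

# Placeholder ratings in [0, 1] (1 = slider fully at 'Positive'), with an
# assumed shape of (subjects, faces, words, repetitions); not the real data.
rng = np.random.default_rng(0)
ratings = rng.random((9, 10, 10, 2))

# Mean observed proportion of happiness identification per subject
# for each of the 100 face x word stimuli.
p_happy = ratings.mean(axis=-1)      # shape (9, 10, 10)
grand_mean = p_happy.mean(axis=0)    # pooled over subjects, shape (10, 10)
print(grand_mean.shape)
```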
4.2.9 Results
A Wilcoxon test was conducted to evaluate the difference in RT between experiments 1 and 2. The results indicate a significant difference: z = 79.6, p < 0.01 (See figure 4.12). Also in experiment 2 the reaction time in the face condition was significantly faster compared to the semantic or the global condition: Kruskal-Wallis χ2(2, N = 2687) = 93.9, p < 0.01. This result is not very informative because it is influenced by the stimulus onset difference of 0.6 seconds between the facial expression and the moment when the head started to talk. Participants did not rate stimuli coding coherent valence faster than stimuli coding incoherent valence.
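As a reference for how such tests can be computed, here is a minimal sketch using scipy.stats on placeholder RT arrays (the data, sample sizes and split points are hypothetical, not the experimental values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Placeholder reaction times in seconds; the real data come from the studies.
rt_exp1 = rng.normal(0.97, 0.5, 500).clip(0.2)
rt_exp2 = rng.normal(2.12, 1.1, 500).clip(0.2)

# Two-sample Wilcoxon rank-sum test: RT in experiment 1 vs experiment 2.
z, p = stats.ranksums(rt_exp1, rt_exp2)
print(f"rank-sum z = {z:.1f}, p = {p:.3g}")

# Kruskal-Wallis test across the three rating conditions of experiment 2
# (here simply an arbitrary split of the placeholder sample).
rt_face, rt_semantic, rt_global = np.split(rt_exp2, [150, 300])
h, p = stats.kruskal(rt_face, rt_semantic, rt_global)
print(f"Kruskal-Wallis H = {h:.1f}, p = {p:.3g}")
```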
We analyzed whether the ratings in the single-mode conditions were influenced by the non-attended stimulus dimension. Figure 4.13 shows the FLMP fits for the expression and the linguistic semantic condition. The dots show observations, the lines the fit of the FLMP. In the expression condition, words coding strong affect showed an influence on the ratings of the faces. But these influences were not coherent: for example, we observed a significant positive influence of the word happy on the probability of identifying happy affect in the facial expression compared to the word furious: Wilcoxon p < 0.01. Interestingly the word joyful had a negative
[Figure 4.12 appears here: bar plot of mean reaction time (sec) in experiments 1 and 2; the asterisk marks the significant difference.]
Figure 4.12: The mean RT in experiment 1 (M = 0.97, SD = 0.5) was significantly faster than in experiment 2 (M = 2.12, SD = 1.1) (Wilcoxon z = 79.6, p < 0.01).
effect on the probability to identify happy affect in the face compared to
the word outraged: Wilcoxon p < 0.01 (See figure 4.13, left panel). In the
semantic condition we did not observe influences of the facial expression
on the ratings of the affect coded by the linguistic semantic (Figure 4.13,
right panel). In general the participants identified stronger positive affect
in the linguistic semantics compared to the expressive condition. A t-test comparing the 5 most positive classes revealed significant differences across the two conditions: t(8) = 2.2, p = 0.05.
To investigate the underlying mechanism of multi-modal stimulus integration we analyzed the average fit for the fuzzy logical model of perception FLMP and the weighted average model WAM for the bimodal
condition. The root mean square deviation (RMSD) in the bimodal condition was 0.032 for the FLMP and 0.031 for the WAM. The two model fits did not differ in the quality of their predictions (See figure 4.14).
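To make the model comparison concrete, the following is a minimal sketch (not the analysis code used here) of how FLMP and WAM predictions for the 10 x 10 bimodal design can be fitted by minimizing the RMSD; the observed matrix is a random placeholder and the optimizer settings are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
observed = rng.random((10, 10))  # placeholder P(happy) identification matrix

def flmp(params):
    # One support value per face level and per word level, combined by the
    # multiplicative (relative goodness) rule of the FLMP.
    f, s = params[:10], params[10:20]
    num = np.outer(f, s)
    return num / (num + np.outer(1 - f, 1 - s))

def wam(params):
    # Weighted average: the same support values plus one shared weight w,
    # hence one free parameter more than the FLMP.
    f, s, w = params[:10], params[10:20], params[20]
    return w * f[:, None] + (1 - w) * s[None, :]

def fit(model, n_params):
    loss = lambda p: np.sqrt(np.mean((model(p) - observed) ** 2))
    res = minimize(loss, np.full(n_params, 0.5),
                   bounds=[(0.001, 0.999)] * n_params)
    return res.fun  # best RMSD

print("FLMP RMSD:", fit(flmp, 20))
print("WAM  RMSD:", fit(wam, 21))
```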
4.2.10 Discussion
We did not observe the interference effects in the reaction times for coherent and non-coherent stimulus constructs in experiment 2 as we have
seen in experiment 1. The observed differences in RT between the two
[Figure 4.13 appears here: two panels (Expression Condition left, Linguistic Semantic Condition right) plotting P(Happy) identification against the expression continuum (H4 to A4) and the word continuum (joyful to outraged), with one curve per level of the non-attended dimension.]
Figure 4.13: Observations (symbols) and predictions (lines) for the fuzzy
logical model of perception FLMP in the expression condition (left) and
the linguistic semantics condition (right).
[Figure 4.14 appears here: two panels (FLMP left, WAM right) plotting P(Happy) identification against the expression continuum (H4 to A4), with one curve per word (joyful to outraged).]
Figure 4.14: Observations (symbols) and predictions (lines) for the fuzzy
logical model of perception FLMP (left) and the weighted average model
WAM (right). The average root mean square deviation RMSD for the
FLMP (0.032) and the WAM (0.031) did not differ in their quality of
prediction.
experiments could be a possible explanation for this result. The mean reaction time in experiment 2 was, at 2.12 seconds, much slower than in experiment 1 (M = 0.97).
One objective of experiment 2 was to investigate the influence of one
modality (face or word) on the rating of the other. In the expression condition we did observe influences of the words on the ratings. The word
happy positively affected the identification of positive affect in the face
compared to the word furious. Surprisingly these influences were not
structured according to the valence quality. For example we observed
that the word joyful had a stronger negative influence on the probability to identify positive affect in the face compared to the word outraged.
These effects have not been observed in the linguistic semantic condition.
Different interpretations can explain this result: it could be that the perception of affect in the face is more continuous than the perception of the
affective meaning of words. It seems that we perceive word meanings as classes and are capable of remembering these classes. These memory traces are not influenced by whether the word is presented with a happy or an angry facial expression. Another interpretation is that the facial expressions were more ambiguous than the selected words, which coded stronger affect. The mean probability of identifying positive affect in the linguistic
semantic condition was higher than in the facial expression condition.
A control experiment would be to let participants rate the facial expressions multiple times without the face saying a word. A more homogeneous
identification curve would indicate that the observed inhomogeneous identifications were induced by the meaning of the words and are not intrinsic
properties of the facial stimulus.
In the second experiment we observed that the predictions of the FLMP
and the WAM did not differ significantly. We have to point out that the FLMP uses one free parameter fewer than the WAM but still achieves the same performance. Nevertheless, this alone does not provide enough grounds to favor the FLMP over the WAM for matching the observed behavioral data. The poor fit of the FLMP allows two interpretations: first, in general the model does not make stronger predictions for non-extreme observations. The fact that participants did
not perceive maximal or minimal positive affect in the bimodal condition
can be an explanation for the bad fit of the FLMP. The second interpretation is that the integration of the affect coded in facial expressions and
linguistic semantics does not follow the mechanism of the FLMP. Because the WAM also does not make significantly better predictions we
cannot say which of the two models is more appropriate to explain the
multimodal integration of affect.
4.3 Conclusion
We investigated the perception of affect in a talking face focusing on linguistic and expressive influences. In experiment 1 we observed interferences of the non-attended dimension on the judgment of the attended
dimension. These interferences were measured in differences in reaction
time and judgments. The observations of the multi-modal condition were
significantly better predicted by the FLMP than by the WAM. In experiment 2 we only observed interference of the linguistic meaning on the
judgments of the facial expressions. While in the first experiment we observed interferences as measured by differences in reaction time, in the
second experiment we observed interferences measured by changes of
the probability to identify positive affect. Surprisingly these interferences
were not coherently structured according to the valence quality continuum. The observations of the bimodal condition were not significantly
better predicted by the FLMP than by the WAM.
A main difference between experiment 1 and 2 is the mean reaction
time. The slider interface of experiment 2 increased the time participants
used to rate the stimulus from approximately 1 second to 2 seconds (See
figure 4.12). This difference in reaction time could give us an explanation for the different results observed in the two experiments. The fast
responses in experiment one could have favored automatic processing of
the affect. In experiment two participants took more time and the responses probably were more controlled. It has been shown that the FLMP
makes better predictions for automatic processes than for controlled ones.
Therefore, the good predictions of the FLMP in experiment 1 supports the
interpretation that the evaluation and integration of the affect was, in this
case, based on automatic processing.
In experiment 2 RTs were significantly longer. This indicates that the
answers were more controlled than in experiment 1. In experiment 1 we
have observed interferences measured in differences in RTs. But when
we look at the judgments in both experiments we see that the meaning of
the non-attended feature is not integrated into the final perception. Participants based their decision mainly on the valence quality of the attended
feature. We have observed only unstructured influences of the word on
the judgments of affective faces in experiment 2. The inhomogeneity of
these results increases our doubt that this is a reliable influence. The similar performance of the FLMP compared to the WAM in experiment 2
brings us to the conclusion that the controlled integration of affect coded
in the word and the facial expression does not follow the same mechanism
observed in the bi-modal speech perception.
Bringing the results from the two experiments together we conclude
that the perception of affective linguistic and expressive features happens
automatically. The differences in RT to evaluate stimuli coding coherent or in-coherent valence quality observed in experiment 1 indicate that
the two processes interfere with each other, even when participants
are instructed to focus on either the linguistic or the expressive feature.
Masked priming experiments showed that both the processing of affective words (Greenwald et al., 1989; Dehaene et al., 1998; Bargh, 1989;
Kihlstrom, 1987) and facial expressions can happen automatically (Dimberg et al., 2000; Winkielman et al., 2005). This means that the participants cannot avoid the perception of the meaning of the non-attended
feature. The fact that only words and faces coding strong affect had the
power to interfere indicates that the observed phenomenon is not due to a capacity limit in processing two stimuli at the same time. A more satisfying
explanation however is that the valence quality is the crucial factor responsible for this interference. The fact that the evaluation of the valence
quality is mainly a sub-cortical process supports this interpretation.
Our results show that when participants have enough time they do not
integrate multi-modal affective stimuli according to the mechanism of
the FLMP. We know that the processing of the linguistic semantic and
different aspects of face perception is located in different cortical areas
(Schirmer and Kotz, 2006). Our results support the idea that the communication between these areas is not based on automatic mechanisms.
We propose that the integration of affective features communicated by the
face and in the linguistic semantic is more controlled than the bi-modal
speech perception and perhaps uses a different, so far unknown integration process. Future studies should address this issue if we
want to understand how humans integrate emotions perceived by a talking
face.
Chapter 5

COMPUTATIONAL MODEL OF EMOTION INDUCED LEARNING
One of the most interesting questions in emotion research is how the brain
processes affect and how this mechanism influences behavioral performance and cognitive activity. One approach to study this phenomenon is
to construct computational models using both the knowledge from studies investigating the anatomical architecture of the neuronal network and physiological data of the brain's activity patterns from real world experiments. By comparing the performance of the model with the performance
of the neurobiological system we gain insight about the underlying neurobiological mechanisms.
The emergence of emotions is a complex multidimensional process
that involves different brain areas. Because of this complexity, researchers modeling emotions have focused on specific aspects of emotion processing
(Velásquez, 1997; Gebhard, 2005; Gratch and Marsella, 2005; Armony
et al., 1997; Marsella and Gratch, 2009; Mor, 1995; El-Nasr et al., 2000).
Here we address the mechanism of classical conditioning that is affected
by the emotional strength of a stimulus. We investigate how the underlying mechanisms of affect evaluation influence behavior and memory
acquisition.
5.1 The Two Phase Model of Conditioning
Learning is defined as a change in behavior that occurs as a result of
experience (Mackintosh, 1974). The classical conditioning paradigm introduced by Pavlov (Pavlov, 1927) is based upon the association of two
stimuli. A conditioned stimulus (CS) such as a tone produces either no
overt or a weak response, usually unrelated to the response that eventually
will be learned. The unconditioned stimulus (US) such as a shock to the
leg elicits a strong, consistent response called the unconditioned response
(UR). Presenting the CS before the US will start to elicit a new response:
the conditioned response (CR), which reaches its peak amplitude just before the expected US. The probability of observing a correctly timed conditioned
response increases over multiple training sessions.
The classical conditioning paradigm provides an opportunity for the
acquisition of both emotional and motor CRs. In the 1960s, Mowrer investigated how the avoidance of a conditioned stimulus that induces fear
can act as a reinforcer for associative learning (Mowrer, 1960). His study
stimulated the discussion of how emotional states affect behavioral adaptations. He and Miller formulated the two-factor learning theory, which states
that behavior that reduces fear will be reinforced (Miller, 1948). Based on
this idea the Polish psychologist Jerzy Konorski studied in the early 1960s the relative independence of classical and instrumental conditioning responses (Konorski, 1948; Konorski, 1968). He proposed the existence of two distinguishable associative learning mechanisms: a fast non-specific learning system (NLS), which produces a global state of arousal within 1 to 5 trials and elicits simple self-protective reaction patterns, and a slow specific learning system (SLS), which is responsible for the accumulation of fine-tuned motor reactions over a longer period of conditioning (Ellison
and Konorski, 1964). Acquisition of such motor CRs however requires
massive training and the response involves the musculature of organs
challenged by the aversive US (Schneiderman et al., 1962; Powell and
Levine-Bryce, 1988). This distributed learning mechanism was conceptualized as the Two Phase Theory of Learning (Ellison and Konorski, 1964;
Konorski, 1948; Konorski, 1968; Rescorla and Solomon, 1967; Rescorla
et al., 1972; Gormezano et al., 1987; Bakin and Weinberger, 1990), stating that association involves two stages: rapid stimulus-stimulus learning
followed by slower stimulus-response learning. The first step shows that
the subject has learned the apparent cause–effect relationship of the stimuli, i.e., is able to predict the course of events. The second step shows that
the subject is attempting to alter its physical relationship to the outside
world, by reducing the impact of the US if it is noxious.
At an abstract level we have already modeled this relationship using
our Distributed Adaptive Control Architecture that has been successfully
applied to robots (Verschure et al., 2003). We have also provided a formal analysis of how prediction based models for perceptual and behavioral learning can be interfaced (Duff and Verschure, 2010). Here we
specifically generalize these abstract models to a biologically constrained
solution in terms of the NLS and the SLS (Inderbitzin et al., 2010a).
Eye-blink Conditioning
One of the best studied cases of associative learning is the eye-blink conditioning paradigm, which was introduced by Gormezano. In this
paradigm, a tone or light (CS) is paired with an air puff or electric shock
(US) to the eye. The US alone leads to a reflexive eye-blink (UR). The
CS–US pairing results in a precisely timed closure of the eyelid, milliseconds before the predicted air–puff or electrical shock arrives.
Eye-blink conditioning provides an experimental set up that allows us
to study in detail the multi-dimensional mechanisms that lead to the acquisition of the CR. An aversive US induces within a few trials a range of bodily responses (e.g. freezing, changes in cardiovascular and respiratory rhythms) (LeDoux, 1996). Evidence of the areas responsible for eyelid and fear conditioning was obtained by removing or destroying various brain areas and examining whether learning was still possible. By
inactivating or removing the amygdala, it was shown that the construction
of CS representational maps in the cortex was negatively affected (Armony et al., 1998) and the CR is disrupted (Phillips and LeDoux, 1992;
LeDoux, 2000). By lesioning the vermis of the cerebellum the CR can be
abolished without the UR being affected (Thompson, 2005). The same
result can be observed, if instead of the cerebellum, the interpositus nucleus, a deep cerebellar nucleus, is lesioned (Thompson, 2005; Fanselow
and Poulos, 2005). Lesions of the cerebellar cortex, in turn, have a negative impact on the exact timing of the CR (Perrett et al., 1993). These studies support the view that the amygdala is responsible for the acquisition of emotional CRs, taking the form of non-specific, autonomic arousal
(Lennartz and Weinberger, 1992) and the cerebellum for the induction of
the motor conditioned reaction, in the form of an exactly timed CR.
Stimulus–Stimulus Conditioning
Rapidly-developing CRs like the change of heart rate, respiration, blood
pressure or skin conductance, develop regardless of the locus or the type
of the US (Schneiderman et al., 1962; Powell and Levine-Bryce, 1988).
These reactions have been termed non–specific (Lennartz and Weinberger,
1992). Such non-specific CRs have an important role in behavioral adaptation during conditioning. In 1956 Galambos et al. identified the primary
auditory cortex as a location of associative plasticity (Galambos et al.,
1956). Subsequent neurophysiological studies then further strengthened
this long-thought idea of learning–induced plasticity in sensory cortices
(Weinberger, 2004). Classical fear conditioning to a tone CS retunes
the receptive fields in the primary auditory cortex to favor the processing
of the frequency which was used as the CS (Bakin et al., 1996; Bakin
and Weinberger, 1990; Kisley and Gerstein, 2001). These changes of
receptive fields develop very rapidly (Edeline et al., 1993). It has been
shown that the amygdala codes aversive events (LeDoux, 2000; Paton
et al., 2006; Tazumi and Okaichi, 2002) and stimulates subcortical modulatory cell clusters like the nucleus basalis in the basal forebrain (Aggleton, 1992; LeDoux, 1995). The activation of the nucleus basalis by
the amygdala releases cortical acetylcholine, a modulatory neurotransmitter that acts as an inducer of plasticity in the cortex (Gold, 2003; Wenk,
1997).
Stimulus–Response Conditioning
Stimulus–response associations are responsible for forming the specific
somatic motor responses that are directed to a specific unconditioned
stimulus (US). Such CRs must not only be specific to the locus of the
nociceptive US, but they also need to be well–timed, to occur preceding
and during delivery of the US.
One brain region that is highly involved in the controlling of well
coordinated motor behavior is the cerebellum (Perrett et al., 1993). It
receives sensory information from cortical and subcortical parts of the
brain and integrates these inputs into a fine tuned motor response. Lesion
studies involving the classical eye-blink conditioning paradigm provide
strong evidence that the cerebellum is one location where the acquisition
of stimulus–response conditioning can be observed (Krupa and Thompson, 1997). Inactivation of the different cerebellar structures prevents the
construction of a measurable CS–CR relation. Several formulations for
the adaptive plasticity of sensory-motor response in cerebellum have been
described (Albus, 1975, 1971; Marr, 1969; Floeter and Greenough, 1979).
Multiple investigations have shown that the granule cell–purkinje cell–
deep nucleus circuit is a locus of CS–US convergence (Ito, 1989, 2002).
Theories that assign learning to cerebellar circuits are based on the observation of activity induced synaptic plasticity at the level of the parallel
fibre – Purkinje cell synapse (James et al., 2004). The CS activates the PU over the mossy-fibre connection, the US excites the Purkinje cell via the inferior olive–climbing fibre pathway. This CS–US convergence at the locus of the Purkinje cells leads to a co-activation and an induction of long-term depression at the synaptic level (Aizenman et al., 1998; Ito, 1989, 2002). This long-lasting reduction of synaptic strength induces a dis-inhibition of the deep nucleus.
The Link
Physiological and lesion studies have identified the basilar pontine nuclei as a relay structure that transmits auditory information from the cortex to the cerebellum (Steinmetz et al., 1991; Thompson, 1986). This cell structure receives input from the cochlear nuclei, the nuclei of the lateral lemniscus and the inferior colliculus. Lesions of these nuclei result in a disruption of the motor CR (Steinmetz et al., 1987; Lewis et al., 1987). Stimulation of the pontine nucleus as a substitute for an external CS leads to a fast induction of conditioning (Steinmetz et al., 1986). Animals exposed to pontine stimulation showed immediate CRs in follow-up exposures to a real tone CS (Steinmetz, 1990). These findings indicate that the pontine nucleus acts as a gate for the transmission of auditory stimuli to the cerebellum.
5.2 Methods
5.2.1 The Circuit
We propose a model of the two phase theory of conditioning with a system architecture composed of two subsystems: the non-specific learning system (NLS) and the specific learning system (SLS) (Figure 5.1). We study local and global learning mechanisms of activity-induced plasticity in an integrated neuronal circuit that models the auditory system, including the subcortical amygdala and nucleus basalis, and the cerebellum. In both systems we model synaptic plasticity at the locus where
CS and US converge.
5.2.2 The Non-specific Learning System
We propose to model plasticity in the non-specific learning system with
a circuit including the amygdala, the nucleus basalis and the primary
auditory cortex. The amygdala plays an important role in learning to
respond defensively to stimuli that predict punishment (LeDoux, 2000,
1996; Phillips and LeDoux, 1992) and the elicitation of fast non-specific
Figure 5.1: The architecture of the integrated model: The Non-specific
learning system (NLS) is shown on the left, the specific learning systems
(SLS) on the right. In the NLS the activation of the amygdala (A) and the
nucleus basalis (NB) induces plasticity in the auditory cortex (AC). The
conditioning stimulus (CS) reaches the auditory cortex over the thalamus
(Th) where it converges with the unconditioned stimulus (US). Inhibitory
interneurons (IN) regulate the amount of plasticity. The pontine nucleus
(PN) gates the stimulation from the NLS to the SLS. In the SLS the CS
and the US converge at the level of the Purkinje cell, resulting in the induction of LTD at the Purkinje synapse. This induces a dis-inhibition of the deep nucleus (DN), leading to the exactly timed motor conditioned response
(CR). The reflexive unconditioned response (UR) is elicited without adaptive processing. A amygdala; AC auditory cortex; CS conditioning stimulus; DN deep nucleus; GC granule cells; IN inhibitory interneurons; IO
inferior olive; NB nucleus basalis; CR conditioned reaction; PN pontine
nucleus; PU purkinje cell; Th thalamus; US unconditioned stimulus
arousal states (Dedovic et al., 2009; LeDoux, 2000). Rodent fear conditioning studies have reported that amygdala lesions selectively impair
acquisition and expression of conditioned fear responses to the CS, without altering unconditioned reflex responses to the innately aversive US
(Phillips and LeDoux, 1992). Lee et al. were able to demonstrate that,
consistent with the two-phase model of conditioning, rats exhibit two
successive stages of non-specific emotional (fear) and specific musculature (eyelid) learning during delay eye-blink conditioning (Lee and Kim,
2004). As a cell cluster that is highly connected to subcortical modulatory systems (LeDoux, 2000; Aggleton, 1992), the amygdala can be seen
as a relay station that channels the valence quality of a stimulus to other
parts of the brain. One of the target destinations of the amygdala's output
is the cholinergic neurons of the basal forebrain. Cholinergic neurons of
the nucleus basalis globally regulate synaptic plasticity in the cortex. Experimental examples of specific learning-induced cortical plasticity are
studies of the auditory cortex A1 (Weinberger, 2004; Bakin and Weinberger, 1990; Weinberger, 1998). The ventral medial geniculate body of
the thalamus (MGv) transmits the tone detection from the cochlea to the
primary auditory cortex. The released ACh results from the amygdala–nucleus basalis stimulation and acts at muscarinic receptors in A1. Its convergence with the cortical excitation produced by the tone thus induces long-term plasticity.
Spike Time Dependent Synaptic Plasticity (STDP)

The timing of pre- and post-synaptic activity is the crucial factor for the adaptation of signal transmission at a synapse (Markram et al., 1997; GuoQiang and MuMing, 1998). Back-propagating action potentials (BAPs), which travel backwards from the soma to the dendrite (Stuart and Sakmann, 1994; Kuczewski et al., 2008), and the inhibition of BAPs through inhibitory interneurons regulate the activity pattern at the synaptic level and thereby the induction of STDP (Lowe, 2002). $I(t)$ is the amount of inhibition received during the interval $[t, t_{post}]$. If the inhibition $I(t)$ is high, the back-propagating AP gets blocked and synaptic depression is induced. The pre-synaptic activity pattern determines which of two types of depression occurs: long-term depression (LTD) if there is coincident pre-synaptic activity, heterosynaptic long-term depression (HLTD) if there is none. If the inhibition $I(t)$ is low and not strong enough to block the back-propagating AP, LTP is induced.
The synaptic weights in the current model evolve according to a modification of a recently proposed learning rule, which utilizes back-propagating action potentials (Sanchez-Montanes et al., 2002; Hofstötter et al., 2002). The efficacy of a synapse is increased if a back-propagating action potential and a pre-synaptic action potential arrive at the synapse within a small symmetrical temporal window:

$$\Delta w = \alpha_{LTP} \frac{\tau_0}{\tau_0 + |t_{post} - t_{pre}|} \quad (5.1)$$

with $\alpha_{LTP}$ being the LTP learning rate, $\tau_0 = 10$ defining the temporal window and $t_{post}$, $t_{pre}$ the timing of the post- and pre-synaptic action potentials respectively. The activation of the inhibitory interneurons through the negative feedback loop attenuates this retrograde propagation in the dendritic trees of the cortical excitatory neurons, decreasing the efficacy of the activated synapses according to:

$$\Delta w = -\beta_{LTD} \frac{\tau_0}{\tau_0 + |t_{post} - t_{pre}|} \quad (5.2)$$

with $\beta_{LTD}$ being the LTD learning rate. To further alter the weights, an additional heterosynaptic LTD (HLTD) was implemented, which decreases the synaptic efficacy if postsynaptic activity occurs without coincident presynaptic activity:

$$\Delta w = \alpha_{heteroLTD} \quad (5.3)$$

with $\alpha_{heteroLTD}$ being the heterosynaptic LTD learning rate. The modification of the weights is therefore crucially dependent on the temporal dynamics of the neuronal network, taking the relative timing of the excitatory and inhibitory inputs to the cortical neurons into account. In our model US activity drives the nucleus basalis activity and modulates the inhibition of the cortical interneurons, in this way regulating the ratio of LTP/LTD in the network.
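A minimal Python sketch of this update scheme (equations 5.1–5.3); the learning-rate values are illustrative assumptions, not the parameters used in the thesis, and, following the text, HLTD is applied as a decrease in efficacy:

```python
TAU0 = 10.0  # temporal window, as in equations 5.1 and 5.2

def stdp_update(w, t_pre, t_post, bap_blocked, pre_active,
                a_ltp=0.01, b_ltd=0.01, a_hltd=0.001):
    """Return the updated weight for one pre/post spike pairing.

    bap_blocked: inhibitory interneurons blocked the back-propagating AP.
    pre_active:  there was coincident pre-synaptic activity.
    """
    window = TAU0 / (TAU0 + abs(t_post - t_pre))
    if not pre_active:
        return w - a_hltd            # heterosynaptic LTD (eq. 5.3)
    if bap_blocked:
        return w - b_ltd * window    # LTD under strong inhibition (eq. 5.2)
    return w + a_ltp * window        # LTP when BAP and pre-spike meet (eq. 5.1)
```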
5.2.3 The Specific Learning System
The model described here is an extension of the model published by Verschure and Mintz (Hofstötter et al., 2002). The circuit is built up from the granule cells, Purkinje cells, inferior olive, deep nucleus, mossy fibres, climbing fibres and parallel fibres (Figure 5.2). The system receives input from the NLS via the pontine nucleus. The co-activation of the Purkinje cell by US-induced climbing fibre activity results in a reduction of synaptic efficacy at the PF–PU synapse, or long-term depression (LTD) (Ito, 1989). PF stimulation alone leads to a weak net increase of the connection strength of the PF–PU synapse, or LTP.
Purkinje Cell

The Purkinje cell is composed of three different compartments (Figure 5.2). The compartment representing the soma of the cell, called PU–SO, receives excitatory inputs from PU–SP, PU–SYN and IO. PU–SP is responsible for the spontaneous activity of the Purkinje cell. PU–SYN represents the dendritic region of the PU which forms synapses with PF. PU–SO emits spikes as long as it is not inhibited by the inhibitory neurons (I). PU–SYN, on the other hand, represents the metabolic postsynaptic responses in Purkinje cell dendrites to parallel fibre stimulation. Unlike a generic integrate-and-fire neuron, PU–SYN does not emit spikes but behaves like a linear threshold neuron with continuous dynamics. In order for the PU to form an association, a permanent trace – the eligibility trace – has to be present in its dendrites (PU–SYN). The high persistence value of PU–SYN, $\beta^{SYN}$, defines this prolonged response in PU dendrites, forming a CS–trace.
In the present model a CS–trace, obtained through prolonged responses in PU–SYN dendrites, allows the association of CS and US. Such a notion
is supported by physiological studies (Wang et al., 2000) and has already
been suggested by Hull (Hull, 1939). Thus, synapses which have been
Figure 5.2: The architecture of the cerebellar SLS. The CS and the US
converge at the purkinje cell synapse (PU-SYN). CF climbing fibre, CR
conditioned reaction, CS conditioned stimulus, DN deep nucleus, GA
granule cells, GO golgi cells, IIN inhibitory interneurons, IO inferior
olive, MF mossy fibre, PF parallel fibre, PU-SP purkinje cell spontaneous
activity, PU-SO purkinje cell soma, PU-SYN purkinje cell synapse, US
unconditioned stimulus.
activated by a CS–related input remain eligible for US–induced weight
changes for some period of time. The Purkinje cell operates in two modes:
a default, spontaneous mode and a CS–mode. In the spontaneous mode,
the PU–SP compartment is active, providing the tonic inhibition of the
Deep Nuclei. Once a CS is presented, this activity is suppressed through
inhibition. The duration of this suppression is matched to the duration of
the CS–trace in PU–SYN. Tonic inhibition from now on is under the control of PF. To support the learning mechanism outlined, the model needs
to account for the acquisition of a pause in Purkinje cell activity following
a CS. Only this would lead to a rebound excitation in the deep nucleus.
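As an illustration, the PU–SYN compartment can be sketched as a leaky linear threshold unit whose high persistence keeps the CS-trace, and thus the eligibility for US-induced changes, alive; all parameter values here are illustrative assumptions:

```python
class PuSyn:
    """Dendritic PU-SYN compartment: continuous dynamics, no spikes."""

    def __init__(self, beta_syn=0.99, threshold=0.1):
        self.beta_syn = beta_syn  # high persistence -> prolonged CS-trace
        self.threshold = threshold
        self.a = 0.0              # continuous activation

    def step(self, pf_input):
        # Leaky integration of parallel fibre input.
        self.a = self.beta_syn * self.a + pf_input
        return max(self.a - self.threshold, 0.0)  # linear threshold output

    def eligible(self):
        # US-driven weight changes apply only while the trace persists.
        return self.a > 0.0
```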
Synaptic plasticity in the model
The processes contributing to the learning effect in the cerebellum are LTD
and LTP. Many experiments have shown that synapse weights between the
PF and PU undergo plasticity (Ito, 1989; Aizenman et al., 1998). In order
to learn, the weight has to be altered during the conditioning process.
According to the Marr–Albus theory (Marr, 1969; Albus, 1971), which
was implemented in the model, LTD can only occur in the presence of an
active stimulus trace once the CF gets activated, thus alterations can only
happen if there is a stimulus trace of a CS ($A_{PU-SYN} > 0$) in the PU dendrites (PU–SYN). Such a CS–trace, which is believed to be formed
by a prolonged metabolic second–messenger response in Purkinje cells
following parallel fiber stimulation, was included in the model through
high persistence values of PU–SYN. PF stimulation alone leads to a weak
net increase of the connection strength of the PF–PU–synapse or LTP,
while activation of a CF in the presence of an active stimulus trace leads
to a net decrease or LTD.
Long–term Potentiation Rule

In the present model $E^{LTP}_{min}$ and $E^{LTP}_{max}$ define the range in which LTP can be triggered. If $E_i \in [E^{LTP}_{min}, E^{LTP}_{max}]$:

$$w_{ij}(t+1) = w_{ij}(t) + \eta(w^{max}_{ij} - w_{ij}(t)) \quad (5.4)$$

otherwise:

$$w_{ij}(t+1) = w_{ij}(t) \quad (5.5)$$

$\eta$ describes the rate constant for the potentiation. The chosen values for these parameters allow several weak potentiation events following a PF input.
Long–term Depression Rule

The magnitude of the long-term depression in the present model is determined by the internal calcium concentration. As the model described in this work is an abstract and reduced description of the cerebellum, the notion of an internal calcium concentration has to be seen as the internal trace of a past CS event. Work from Coesmans and colleagues (Coesmans et al., 2004) supports the concept of such a calcium dependent response. They observed that the bidirectional PF long–term plasticity is governed by a calcium threshold mechanism, which is characterized by a high calcium threshold for LTD and a lower calcium threshold for LTP. The minimal value for LTD to be triggered is defined as $E^{LTD}_{min}$:

$$w_{ij}(t+1) = \begin{cases} \epsilon\, w_{ij}(t) & \text{if } E_i > E^{LTD}_{min};\\ w_{ij}(t) & \text{otherwise,} \end{cases} \quad (5.6)$$

where $\epsilon$ describes the rate constant of the depression.
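The combined PF–PU update (equations 5.4–5.6) can be sketched as below; the threshold values, the depression factor and the variable names are illustrative assumptions, not the thesis parameters:

```python
def pf_pu_update(w, e, w_max=1.0, eta=0.001, eps=0.99,
                 e_ltp=(0.1, 0.5), e_ltd_min=0.8):
    """One weight update for the PF-PU synapse given the CS-trace e (E_i)."""
    if e > e_ltd_min:                 # high 'calcium': CF on top of a trace
        return eps * w                # LTD (eq. 5.6)
    lo, hi = e_ltp
    if lo <= e <= hi:                 # lower range: PF input alone
        return w + eta * (w_max - w)  # weak LTP (eq. 5.4)
    return w                          # otherwise unchanged (eq. 5.5)
```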
5.2.4 Integrating the NLS with the SLS
The connection of the non-specific learning system, responsible for the
cortical CS representation and the specific learning system, responsible
for the exact timing of the CR, produces an integrated model of the two
phase theory of learning. In our study the pontine nucleus has a gating function, allowing the transmission of stimuli with behavioral importance. The pontine nucleus is modeled as an integrate-and-fire neuron $i$ with a membrane potential at time $t+1$, $V_i(t+1)$:

$$V_i(t+1) = \beta V_i(t) + E_i(t) + I_i(t) \quad (5.7)$$

where $\beta \in [0, 1]$ is the persistence of the membrane potential, which defines the speed of the decay towards the resting state, and $E_i(t)$ and $I_i(t)$ are the excitatory and inhibitory input at time $t$.
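A sketch of this gating neuron (equation 5.7); the threshold, persistence value and reset behavior are illustrative assumptions:

```python
class PontineGate:
    """Integrate-and-fire pontine neuron relaying stimuli to the SLS."""

    def __init__(self, beta=0.9, threshold=1.0):
        self.beta = beta            # persistence of the membrane potential
        self.threshold = threshold  # gating threshold
        self.v = 0.0

    def step(self, excitation, inhibition):
        # Eq. 5.7: V(t+1) = beta*V(t) + E(t) + I(t), with I(t) <= 0.
        self.v = self.beta * self.v + excitation + inhibition
        if self.v >= self.threshold:
            self.v = 0.0            # reset after firing
            return True             # stimulus gated through to the SLS
        return False
```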
The functionality of the integrated model depends on the quality of the cortical representation, which is a function of the strength of the STDP-induced plasticity in the auditory cortex and of the gating threshold of the pontine nucleus. The model transmits only stimuli with behavioral importance from the NLS to the SLS.
We tested the performance of our network with an eye-blink conditioning simulation. The auditory cortex was constructed as an array of 50 cells. All other components of the model were built from single cell units. The CS was a conceptualized auditory stimulus coded as a pattern of 5 active cells in the array of 50. Thirty trace conditioning trials with a CS exposure time of 400 ms and a US exposure time of 100 ms were applied to the model. To check the performance of the model, the CS and 4 different control stimuli were presented after the conditioning phase.
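The trial structure of this simulation can be summarized in a few lines; the Model stub, the 1 ms timestep and the trace interval are assumptions for illustration, as the thesis does not specify them here:

```python
class Model:
    """Stub standing in for the integrated NLS/SLS circuit."""
    def step(self, cs, us):
        pass  # propagate one timestep of activity through the network

N_TRIALS, CS_MS, TRACE_MS, US_MS = 30, 400, 200, 100  # TRACE_MS assumed

model = Model()
for trial in range(N_TRIALS):
    for t in range(CS_MS + TRACE_MS + US_MS):
        model.step(cs=(t < CS_MS), us=(t >= CS_MS + TRACE_MS))
```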
5.2.5 Robot Application
In a second step we verified the reliability of the model by testing the performance of an autonomously behaving robot in an obstacle avoidance task (Figure 5.3). In an open field arena the robot had to learn to avoid any collision with the wall by detecting a red color patch with a camera. In this set-up the detection of the wall by the robot's proximity sensors was used as the US, and the detection of the red color by the camera as the CS. Because the visual field of the camera exceeded the sensitive range of the proximity sensors, the detection of the CS always occurred prior to the US. To keep the interstimulus interval constant, the velocity of the robot did not change. The robot moved around freely until a conditioned response, in the form of an exactly timed turn, was observed.
Figure 5.3: Robot application: An ePuck robot moves autonomously in a circular open field arena. The association of the red color on the floor detected by a camera (CS) and the detection of the wall by proximity sensors (US) induces learning in the proposed computational mechanism. The green arrows indicate the moving direction of the robot.
5.3 Results
5.3.1 Performance of the Integrated Model
We recorded the activity of the auditory cortex (AC), the Purkinje cells (PU) and the deep nucleus (DN) before, during and after the eye-blink conditioning simulation. An analysis of the learning curves of the NLS and the integrated model was made to verify the timing of the adaptive processes. The response was quantified by analyzing the spiking behavior of the different cell groups. In the non-specific learning system we observe a homogeneous response intensity before conditioning (Figure 5.4, left). The co-activation of cortical cells by the amygdala–nucleus basalis pathway and the thalamic pathway induces a tonotopic reorganization of the responses to the different tones. After conditioning a stronger response of cortical cells to the CS can be observed (Figure 5.4, right). Once the CS representation has exceeded a critical threshold, the pontine nucleus transmits the signal to the specific learning system.
Figure 5.4: Reactivity of the auditory cortex before and after the conditioning. The CS is the stimulus with ID 1. Before the conditioning the cortical reaction to all 5 stimuli is homogeneous. After the conditioning the cortical response to the CS is increased.
The plasticity in the integrated model starts when the activation of the Purkinje cell by the parallel fibres (CS) and the climbing fibres (US) coincides. This co-activation of the Purkinje cell induces LTD, which results in a decrease of the Purkinje cell activity (Figure 5.5). During conditioning trial 12 this activity under-runs the threshold for the first time, causing the dis-inhibition of the deep nucleus, which leads to the elicitation of a first imprecise motor reaction. As long as the CR is not optimally timed, the sustained LTD induction results in an ongoing decrease of the Purkinje cell activity until the exactly timed CR is established. Before conditioning no increase in AC activity can be observed and the PU keeps its inhibition of the deep nucleus constant (Figure 5.6). After conditioning the AC reacts with an increased firing rate and the PU-induced pause releases the deep nucleus from inhibition, producing the rebound excitation responsible for the exactly timed conditioned response (Figure 5.7).
Figure 5.5: Learning of the exactly timed CR by the SLS: The PU cell activity decreases during conditioning trials 1–13. During trial 12 the activity under-runs the threshold for the first time, resulting in the dis-inhibition of the deep nucleus. During trial 13 the PU cell activity under-runs the threshold before the US and an exactly timed CR is triggered. The CS and the US are only schematically represented in this plot.
Figure 5.6: The performance of the integrated model before the conditioning. The purkinje cell (PU) does not change its activity and no CR is
elicited. CS conditioned stimulus, US unconditioned stimulus, AC auditory cortex, PU purkinje cell, CR conditioned reaction.
Figure 5.7: The performance of the model after the conditioning. The
CS representation in the auditory cortex (AC) is increased. A delayed
pause in the purkinje cell (PU) can be observed. The CR is elicited just
before the US presentation. CS conditioned stimulus, US unconditioned
stimulus, AC auditory cortex, PU purkinje cell, CR conditioned reaction.
5.3.2 Performance of the Robot
In the beginning of the robot experiment the ePuck drives over the red area (CS) until it detects the wall of the arena with its proximity sensors (US). The late turn can be classified as an unconditioned response or reflex (Figure 5.8). A co-activation of the Purkinje synapse (PU-SYN) by the CS and the US induces LTD, decreasing the synaptic weight of the PF-PU synapse (Figure 5.10). After 113 conditioning trials the robot performs a conditioned response for the first time, in the form of an early turn. From this point on the robot avoids the wall as soon as the camera detects the red color (Figure 5.9). The blue line indicates the track of the robot.
Figure 5.8: The behavior of the ePuck robot before conditioning. The robot enters the red area of the arena. The proximity sensors detect the wall (US) and elicit the unconditioned response (UR) in the form of a late turn. The blue line indicates the track of the robot in the arena.
Figure 5.9: The behavior of the ePuck robot after conditioning. The robot does not enter the red area of the arena. The camera detects the red color (CS) and the model elicits a conditioned response (CR) in the form of an exactly timed turn. The blue line indicates the track of the robot.
Figure 5.10: The change of the synaptic weight at the PF-PU synapse during the robot experiment. Every time a CS and a US coincide at the level of the Purkinje synapse, LTD is induced. Once the synaptic efficacy reaches a critical level, a conditioned response is triggered, avoiding further LTD induction, and the synaptic weight becomes stable.
Figure 5.11: The performance of the ePuck robot measured as the percentage of performed conditioned responses and of occurring USs. After 113 trials the robot shows conditioned behavior. The fluctuation in the response is due to a spontaneous recovery of the synaptic transmission at the Purkinje cell. Whiskers indicate the standard deviation.
5.4 Conclusion
We have presented an integrated model of the two phase theory of conditioning including neurobiological constraints of the non-specific and specific learning systems. In a simulated eye-blink conditioning experiment we have demonstrated in a first step that the model increases the cortical representations of stimuli with behavioral importance. The model's capability to gate those representations to the specific learning system induces the adaptation of the exactly timed CR. The performance of the NLS is controlled by the biologically based STDP, taking into account the effects of back-propagating action potentials. The inhibition of these back-propagating APs by inhibitory interneurons is a fundamental mechanism controlling the strength of the STDP. The performance of the specific learning system is controlled by the rates of LTP and LTD and by the CS-trace at the level of the Purkinje cell. By integrating the circuits of the non-specific and specific learning systems we demonstrate how cortical plasticity supports effective cerebellar associative learning.
Chapter 6

CONSTRUCTING AN EMOTIVE ANDROID
So far we have presented studies that either investigated the perception of emotions or the underlying computational mechanisms. In this chapter we introduce a study that combines these two approaches by constructing an emotive android. While robots can have any form and function, an android is defined as a synthetic system that is designed to look and behave like a human. In recent years we have observed a dramatic increase in the number of such robots. In the present study we implemented a model of fear in a humanoid robot to control its behavior. The results presented here are collaborative work by Zenon Mathews, Etienne Roesch, Can Erogul, Cassandra Gould and myself.
6.1 The Neurobiological Mechanism of Fear
The evaluation of a threatening situation and the elicitation of an appropriate response to it is one of the most important survival mechanisms (LeDoux, 1996). Fear is an emotion that has been intensively studied using the behavioral paradigm of fear conditioning (Maren, 2001). This paradigm is based on the association of a neutral stimulus, like a tone, with an aversive stimulus, like a foot shock, resulting in the expression of fear responses to the originally neutral stimulus. The neutral stimulus is called the 'conditioned stimulus' (CS), the aversive stimulus the 'unconditioned stimulus' (US) and the response the 'conditioned response' (CR). A classical example is when an animal learns to freeze in response to a conditioned stimulus (See figure 6.1).
Figure 6.1: During the conditioning phase (left panel) an animal is exposed to a neutral tone (CS) and an aversive foot shock (US). After the conditioning phase (right panel) the animal reacts with a freezing response when exposed to the originally neutral tone (CS). Figure adapted from Nadel and Land (2000).
The processing of the valence quality of a stimulus, or its relevance, has been located in the subcortical cluster called the amygdala (LeDoux, 2000; Paton et al., 2006; Tazumi and Okaichi, 2002; Sander et al., 2003). The amygdala is highly connected to modulatory sub-systems and behavioral response centers in the brain stem (Aggleton, 1992; LeDoux, 1996). Two different pathways transmitting signals to the amygdala have been identified (See figure 6.2). The low route transmits the signal, without conscious experience, via the thalamus to the amygdala. It is the fast route to a non-specific bodily response. The high route is activated simultaneously and involves cortical clusters for the evaluation of the importance of the stimulus and the elicitation of specific stimulus-directed responses. This process takes more time but provides more information about the importance of the stimulus.
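Purely as an illustration of this dual-route architecture, a toy sketch (all values, names and timings are hypothetical):

```python
import time

def low_route(stimulus):
    # Fast, non-specific: coarse intensity check straight from the 'thalamus'.
    return "startle" if stimulus["intensity"] > 0.8 else None

def high_route(stimulus):
    # Slower, specific: stands in for cortical evaluation of stimulus features.
    time.sleep(0.05)
    return "directed avoidance" if stimulus["threat"] else "ignore"

stimulus = {"intensity": 0.9, "threat": True}
print(low_route(stimulus))   # available first, non-specific response
print(high_route(stimulus))  # arrives later, stimulus-directed response
```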
Recently it has been proposed that the association of CS and US in
Figure 6.2: An aversive stimulus is transmitted by two pathways to the amygdala: The low route transmits the sensory information directly from the thalamus to the amygdala. This route is fast and responsible for unspecific behavioral responses. The high route sends the sensory input to cortical areas for the evaluation of the stimulus features. This route is slower, but capable of eliciting more specific cognitive and behavioral responses. Figure adapted from LeDoux (1994).
the amygdala is based on the Hebbian plasticity mechanism (Armony
et al., 1997; Johnson et al., 2008). Hebb’s rule states that any two cells
or systems of cells that are repeatedly active at the same time will tend
to become ’associated’, so that activity in one facilitates activity in the
other (Hebb, 1949). The theory is often summarized as ’Cells that fire
together, wire together’. In the context of a fear conditioning paradigm this means that the co-activation of two information streams induces plasticity at the synaptic level. This mechanism has been supported by different animal studies using the fear conditioning paradigm (See figure 6.3):
Figure 6.3: The processing of a neutral CS and an aversive US. When CS
and US coincide at the location of the amygdala, learning is induced. The
results are different physiological and behavioral responses. LA lateral
amygdala, CE central amygdala, CG central gray, LH lateral hypothalamus, PVN paraventricular hypothalamus. Figure adapted from Medina
et al. (2002)
In the following study we use this paradigm of fear conditioning to
equip a humanoid agent with learning capabilities to control appropriate
emotional expressions.
6.2 Embodied Emotive Model
6.2.1 Model Architecture
We constructed a neurobiologically constrained model of fear conditioning that consists of 3 subunits: the visual thalamus, the auditory thalamus and the amygdala (See figure 6.4). This system is capable of processing a visual CS (red or blue color) and an aversive US (loud tone). When the CS and the US coincide in the amygdala, an adaptation in plasticity is induced. After the conditioning the CS alone is able to trigger the behavioral response.
The implemented Hebbian learning rule changes the weight of the
synaptic transmission in the amygdala:
$$w_{i,j} = \frac{1}{p}\sum_{k=1}^{p} x_i^k x_j^k \quad (6.1)$$

where $w_{i,j}$ is the weight of the connection between neurons $i$ and $j$, $p$ is the number of training patterns, and $x_i^k$ the $k$th input for neuron $i$. The implemented slow decay of activity in the activated cells defines the time window sensitive for learning.
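A minimal sketch of this batch Hebbian rule (equation 6.1) with toy patterns, which are illustrative and not the robot's sensory encoding:

```python
import numpy as np

def hebbian_weights(patterns):
    """w_ij = (1/p) * sum_k x_i^k x_j^k over the p training patterns."""
    x = np.asarray(patterns, dtype=float)  # shape (p, n)
    return x.T @ x / x.shape[0]

# Toy example: unit 0 (CS channel) and unit 1 (US channel) co-active twice.
training = [[1, 1, 0],
            [1, 1, 0],
            [0, 0, 1]]
w = hebbian_weights(training)
print(w[0, 1])  # CS-US association strength: 2/3
```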
As an experimental platform we used the iCub, a humanoid robot that is equipped with multiple sensors and the capability to communicate its emotional state using facial expressions (Sandini et al., 2007). The iCub has two color cameras and two microphones to detect visual and audio input.
6.2.2 Experimental Design
We conditioned the iCub using an aversive audio noise (US) and the colors blue (CS) and red (non-conditioned stimulus, NS). The behavioral
Figure 6.4: Schematic representation of the fear conditioning model. The visual stimulus and the audio stimulus are transmitted via the thalamus to the amygdala, where they coincide. This co-activation induces an adaptation of the synaptic weight. After conditioning the change in synaptic weight allows the CS to trigger the behavioral response.
Figure 6.5: The iCub uses LED lights to express different emotions in the face. The picture shows its angry expression, which was used in the present study.
response was either an angry facial expression (CR) or a happy facial expression (UR) (See figure 6.6).
Figure 6.6: Experimental design of the fear conditioning in the iCub. The association of a neutral CS with an aversive US induces a change in plasticity. After conditioning the CS alone is capable of eliciting the behavioral response. A non-conditioned stimulus (NS) elicits an unconditioned response also after the conditioning phase.
6.2.3 Conditioning
Before the conditioning the iCub smiles when it detects one of the two colors (See figure 6.7 A and B). When only hearing the noise (US), the robot expresses an angry face. During conditioning the iCub was exposed to the blue color (CS) and to 4-5 noise events (US). The number of events depends on the adaptation of the synaptic weight defined by equation 6.1. After the conditioning the robot responds with an angry face when seeing the blue hat, but still with a happy face when seeing the red backpack (Figure 6.7 C and D). A video of the conditioning procedure can be seen on our YouTube channel: www.youtube.com/user/SpecsUPF
Figure 6.7: The conditioning phase of the iCub. Before conditioning the iCub smiles when seeing either red (A) or blue (B). During the conditioning phase the robot sees blue while hearing 4-5 aversive noise events (C). After the conditioning the robot reacts with an angry face when seeing the blue hat (E), but still smiles when seeing the red color (D).
6.2.4 Discussion & Conclusion
Fear is one of the most important emotions, responsible for the fast evaluation of a situation and the elicitation of fight or flight responses. Using the paradigm of fear conditioning, the processing of this emotion has been studied extensively. In this study we successfully implemented a neurobiologically constrained model of fear conditioning in a humanoid robot. We used this model to perceive different types of stimuli, to learn to associate some of them, and to use this association to elicit appropriate expressive behavior.

The construction of a robot that is not only capable of making logical calculations but can also interact in a socially meaningful way with its environment and other people is a big ongoing challenge. An artificial system that can achieve this task has to be capable of perceiving and understanding the valence quality of a situation and of integrating this perception with experiences already stored in memory to express appropriate responses.
Humans have different senses to perceive different types of valence qualities. One of the most basic valence qualities is pain. Unfortunately we find very few examples of robots equipped with such receptors. This shows that the concept of how humanoid robots that aim to interact with people should be constructed has yet to change. In our study we used a loud noise as an aversive stimulus.
The processing of the valence quality and the elicitation of an appropriate response are very fast mechanisms in nature. Robots still lag behind this benchmark in performance. The problem emerges in robot platforms that use different operating systems running on different machines. Such systems cannot provide real-time processing and communication between software and hardware. This means that the transmission of the stimulus, the processing, and most importantly the control of behavior are subject to delays that range between milliseconds and seconds. In our study we therefore had to make some adaptations in the timing of the plasticity change and the elicitation of the response. This problem will hopefully be solved by new, more powerful machines that use optimized system architectures.
Despite these technical restrictions we were able to control the behavior of a robot using the paradigm of fear conditioning. Our system is capable of perceiving the valence quality of a stimulus and associating it with the neutral meaning of another stimulus. The implementation of this system in a humanoid robot equips the agent with the capability to process emotive content.
So far we have evaluated the system's performance by observing its behavior in a real-world interaction. In future steps we also want to quantify the learning performance of the computational circuit. Additionally, it would be interesting to extend the model with different types of valence stimuli and expressive behaviors. We therefore propose that basic experimental approaches can be used to construct more complex emotive systems.
6.3 Proposal for an Advanced Emotive Architecture
6.3.1 Theoretical Basis
Emotions are structured processes that emerge over time to evaluate the valence quality of an internal or external stimulus. These stimuli can address very basic needs, like the regulation of the internal milieu, or be quite complex, for example the evaluation of social signals. This implies that the underlying mechanism of appraisal involves different levels of processing addressing different levels of complexity (Leventhal and Scherer, 1987).
Table 6.1: Levels of processing for stimulus evaluation checks. Adapted from Leventhal and Scherer (1987).

Level          | Pleasantness | Goals/Needs | Coping potential
Sensory-motor  | Innate       | Basic       | Available energy
Schematic      | Learned      | Acquired    | Body schemata
Conceptual     | Recalled     | Conscious   | Problem solving
Scherer et al. proposed the component process model (CPM) of emotions to deal with this problem (Scherer, 2001; Sander et al., 2005). This model defines the genesis of an emotion by a layered appraisal mechanism. This evaluation is described in terms of the following proposed objectives:
1. Relevance - Is the stimulus relevant for the individual? Does it require attention deployment or further information processing?
2. Implication - What are the potential consequences of the stimulus for the individual?
3. Coping - Does the individual have sufficient resources to cope with the consequences of the event?
4. Normative significance - How does the stimulus relate to the individual's social or personal norms and standards?
Each of these objectives encompasses more subtle cognitive appraisals, dubbed stimulus evaluation checks (SECs), whose interaction yields the differentiation of the ensuing emotion (see table 6.1). Throughout the appraisal process, the evaluative function of the checks increases in complexity. Core to this theory is the proposal that appraisals occur sequentially (Grandjean and Scherer, 2008) and in turn influence each of the five components of emotion (see figure 6.8).
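As an illustration of this sequentiality, the sketch below runs hypothetical checks in the proposed order, each one reading the outcomes of the earlier ones; the concrete check functions are placeholders for illustration and not part of the CPM.

    def appraise(stimulus, checks):
        # Run the stimulus evaluation checks in sequence; each check sees
        # the stimulus and the results of all earlier checks.
        results = {}
        for name, check in checks:
            results[name] = check(stimulus, results)
        return results

    # Hypothetical checks, ordered as in the CPM:
    checks = [
        ("relevance",   lambda s, r: s.get("intensity", 0.0) > 0.5),
        ("implication", lambda s, r: -s.get("valence", 0.0) if r["relevance"] else 0.0),
        ("coping",      lambda s, r: s.get("resources", 0.0) >= abs(r["implication"])),
        ("normative",   lambda s, r: s.get("norm_violation", False)),
    ]

    print(appraise({"intensity": 0.8, "valence": -1.0, "resources": 0.3}, checks))

For this example stimulus the relevance check passes, the implication is negative for the agent, and the coping check fails, sketching how later checks depend on earlier outcomes.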
Figure 6.8: The component process model (Scherer, 2001; Sander et al., 2005). Represented are the five components of emotion (vertical) as well as the sequence of appraisals (horizontal) and the interactions between subsystems that gradually shape the emotion, supporting the genesis of a particular feeling.
6.3.2 Distributed Adaptive Control
In the previous sections we introduced a computational model of the two-phase theory of conditioning. This model distinguishes two different associative learning mechanisms: a fast non-specific learning system (NLS), which produces a global state of arousal within 1 to 5 trials and elicits simple self-protective reaction patterns, and a slow specific learning system (SLS), which is responsible for the accumulation of fine-tuned motor reactions over a longer period of conditioning. Now we want to extend this model to a multilayered appraisal and control structure. Following the theoretical basis of the component process model, we propose the Distributed Adaptive Control (DAC) as an advanced computational architecture for emotive processing (Verschure et al., 2003; Duff and Verschure, 2010). The architecture of DAC is structured into three layers: reactive, adaptive and contextual. The reactive layer is responsible for innate prewired reflexes; in the stimulus evaluation checks of the CPM this addresses the sensory-motor level (see table 6.1). The reflexes of the reactive layer provide cues for the learning in the adaptive layer of DAC; in the stimulus evaluation checks of the CPM this addresses the schematic level. The acquired representations in the adaptive layer provide inputs for the contextual layer, which stores sequential representations; this mechanism corresponds to the conceptual level of the stimulus evaluation checks of the CPM. We propose to model each stage of the CPM using the three-layered architecture of the distributed adaptive control. Relevance can be addressed by the action space of our system (see figure 6.9, red panel). Implication and coping will be modeled by the self-introspection mechanism; this circuit evaluates the allostatic control and the higher cognitive goals of the agent (see figure 6.9, green panel). The model of these three components will produce a parallel architecture of three sub-DAC systems, each dealing with one of the components. Normative significance is the most complex stage because it needs a model of self. Future models of DAC have to propose such a structure to be able to process how the stimulus relates to the agent's social or personal norms.
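A structural sketch of this proposal is given below; the class and method names are hypothetical and only illustrate the composition of three layered sub-DAC systems, not an actual DAC implementation.

    class Layer:
        def __init__(self, name):
            self.name = name
        def process(self, signal):
            return signal  # placeholder for layer-specific processing

    class SubDAC:
        # One reactive -> adaptive -> contextual pipeline per CPM component.
        def __init__(self, component):
            self.component = component
            self.layers = [Layer("reactive"), Layer("adaptive"), Layer("contextual")]
        def evaluate(self, stimulus):
            signal = stimulus
            for layer in self.layers:           # reflexes cue adaptive learning,
                signal = layer.process(signal)  # which feeds contextual sequence memory
            return signal

    # The proposed parallel architecture: one sub-DAC per appraisal component.
    sub_dacs = [SubDAC("relevance"), SubDAC("implication"), SubDAC("coping")]
    appraisal = {d.component: d.evaluate({"cs": "blue"}) for d in sub_dacs}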
Figure 6.9: The system architecture of DAC: the system consists of three tightly coupled layers: reactive, adaptive and contextual. The reactive layer endows a behaving system with a prewired repertoire of reflexes (low-complexity unconditioned stimuli and responses) that enable it to display simple adaptive behaviors. The activation of any reflex, however, also provides cues for learning that are used by the adaptive layer via representations of internal states, i.e. aversive and appetitive. The adaptive layer provides the mechanisms for the adaptive classification of sensory events and the reshaping of responses. The sensory and motor representations formed at the level of adaptive control provide the inputs to the contextual layer, which acquires, retains, and expresses sequential representations using systems for short- and long-term memory. The contextual layer describes goal-oriented learning and reflexive mechanisms.
6.3.3 Conclusion
Based on the empirical results of our models of conditioning, we propose to extend their complexity on a perceptual and behavioral level. To do so we use the theoretical framework of the component process model, which states that the stimulus evaluation process can be layered into three levels: sensory-motor, schematic and conceptual. This framework provides the theoretical support for the proposed control mechanism of the distributed adaptive control (DAC).
Chapter 7
CONCLUSION
In this dissertation we addressed the issue of understanding the phenomenon of human emotions. To do so we posed the question of how we can construct embodied models of emotions. Following this methodology we implemented neurobiological and psychological models of affect expression in autonomous behaving agents to investigate both the underlying neuronal mechanisms and the perception of affect. This approach allowed us to investigate and revise existing theories by comparing physiological data from behavioral and neurophysiological experiments with the performance of our models. In a second step we used a computational model of conditioning to control the expressive behavior of an android robot. Based on the findings of these studies we propose an emotive architecture that can be used to control the behavior of android robots that aim to socially interact with humans. The contributions of this thesis add to a deeper understanding of the multidimensional phenomenon of emotions on three levels: perception, interaction, and how the processing of emotional cues influences learning and behavior.
In the first part we investigated the perception of emotional behavior and its impact on social interaction. Humans use a complex code of verbal communication and non-verbal behavior to express their emotions and intentions. The perception of the physical presence of others is probably one of the most basic social interaction patterns in humans (Hall, 1966; Baldassare, 1978). Before we investigated complex aspects of emotion processing and affect perception, we posed the basic question of how humans perceive the physical presence of other humans and virtual avatars. We hypothesized that the perception of a virtual avatar is less salient compared to that of a real human and that this decrease in salience has a fundamental impact on social interaction on a spatial scale. This concept is known in psychology as the law of apparent reality (Frijda, 1988) or the 'vividness effect' (Borgida and Nisbett, 1977; Baddeley and Andrade, 2000; McCabe and Castel, 2008). As an experimental paradigm to study this question we constructed a collaborative mixed reality ball game in which two teams of two players had to coordinate their spatial movements. This game could be played either by physical players present in the space or by remote players controlling a virtual avatar. The results of our study show that the spatial interaction of winners differs significantly from the interaction patterns of losers (Inderbitzin et al., 2009). This social interaction is fundamentally influenced by the salience of the interactors (Inderbitzin et al., submitted). Our empirical data support the concept that the salience of a stimulus acts like a gating mechanism for cognition and behavior. We propose this concept as a general mechanism of perception and behavioral control. In our study we showed that humans perceive virtual agents as less salient and that this difference in vividness induces a fundamental adaptation of their interaction patterns. These results contribute to a better understanding of how the stimulus salience of perceiving another person influences social interaction. The understanding of this effect has important implications for the construction of interactive virtual emotive agents that aim at social interaction with humans.
The results of our first study showed that the perception of others influences the regulation of interpersonal distance, a subtle code of social interaction. This raises the question of what kinds of additional non-verbal behaviors transmit the intentions of others. In our second study we investigated how people perceive the expression of emotional states based on the observation of different styles of locomotion (Inderbitzin et al., 2011). Our goal was to find a small set of canonical parameters that allow us to control a wide range of emotional expressions. The results showed that, independent of the viewing angle, participants perceived distinct states of arousal and valence. Moreover, we could show that parametrized body posture codes emotional states, irrespective of contextual influences or facial expressions. Our results show that human locomotion transmits basic emotional cues that can be directly related to the modulation of the two canonical parameters, speed and head/torso inclination. These findings are important for the understanding of how humans perceive non-verbal behavior. The knowledge acquired from this investigation allows us to build virtual characters whose emotional expression is recognizable at large distances and over extended periods of time.
We know that human communication is a multidimensional stream of non-verbal and verbal features. So far our studies analyzed the perception of non-verbal codes. In our third study we addressed the question of how humans perceive and integrate emotional meaning transmitted by the facial expression and the meaning of spoken words. We used an expressive dialog system to investigate how humans perceive and integrate the linguistic semantics and the expressive dimension of an affective stimulus construct. Differences in reaction times to judge coherent and incoherent stimulus constructs were used to evaluate the automaticity of affect processing in the two dimensions. We tested between the fuzzy logical model of perception (FLMP) and an additive model to investigate the underlying psychological mechanism of affect perception. Using a computer-animated avatar face we constructed a stimulus continuum from angry to happy facial and linguistic expressions. This stimulus space transmits various degrees of coherent or incoherent affect. Subjects were instructed to judge the affect of the facial expression, the affect of the meaning of the word, or the affect of the global event combining these two properties. Both properties influenced judgments as described by the FLMP when participants responded quickly. With increasing reaction time, the FLMP did not make better predictions than other models of perception. Reaction times increased when subjects had to rate stimuli that coded incoherent valence qualities in the two modalities. These results indicate that people cannot avoid the perception of affect, even when they are instructed to do so. Masked-priming experiments support this interpretation of our data (Esteves et al., 1994; Morris et al., 1998; Bargh, 1989; Kihlstrom, 1987). When participants had enough time they did not integrate a multi-modal affective stimulus according to the mechanism of the FLMP. We conclude that the perception of affect in multiple modalities is an automatic process that can produce interferences, while the integration of these modalities into a global impression is more controlled.
So far we have investigated the perception of emotions and its relation to social interaction. The results of these studies contribute to the understanding of the perceptual dimension of emotions. In the second part of this thesis we focused on the computational mechanisms of emotions. In the first study we wanted to investigate the neuronal plasticity responsible for the elicitation of an appropriate behavioral response to an aversive stimulus. We investigated this question using the experimental paradigm of classical conditioning. According to Konorski's two-phase theory of conditioning, the associative processes underlying classical conditioning can be separated into a fast valence-driven non-specific learning system (NLS) and a slow specific learning system (SLS) (Konorski, 1948; Konorski, 1968). The theory states that the NLS elicits a non-specific state of arousal and that the SLS is responsible for the exact elicitation of a coordinated motor response (Ellison and Konorski, 1964). Based on biological evidence we propose the amygdala, the basal forebrain and the auditory cortex as an example of the NLS (Sanchez-Montanes et al., 2002) and the cerebellum as an example of the SLS (Hofstotter et al., 2002). The performance of the model was tested by applying the eye-blink paradigm of classical conditioning. The amygdala stimulation of the nucleus basalis induced by the unconditioned stimulus elicits plasticity in the NLS. This leads to an increased representation of the conditioned stimulus in the cortex. The plasticity of the cerebellar SLS was regulated by this increased cortical representation coding the behavioral importance of the conditioned stimulus (Inderbitzin et al., 2010a). To verify the credibility of our model we connected it to an autonomous robot that had to achieve an obstacle avoidance task. The behavioral performance was used as a benchmark (Inderbitzin et al., 2010b). The results of these studies provide a complete account of Konorski's proposal by integrating these two systems into a complete biologically grounded computational model of the two-phase theory of classical conditioning.
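A compact sketch of this interaction is given below; the learning rates are illustrative assumptions, not parameters of the model.

    def two_phase_trial(us_present, cs_strength, cortical_rep, w_sls,
                        nls_rate=0.5, sls_rate=0.05):
        # NLS: US-driven amygdala stimulation of the nucleus basalis quickly
        # enhances the cortical representation of the CS (within 1-5 trials).
        if us_present:
            cortical_rep += nls_rate * cs_strength
        # SLS: cerebellar plasticity accumulates slowly, gated by how strongly
        # the cortex now represents the behaviorally relevant CS.
        w_sls += sls_rate * cortical_rep * cs_strength
        return cortical_rep, w_sls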
As a next step we applied the knowledge we had gained from the previous studies to design an emotive android agent that is capable of learning the valence quality of a stimulus and producing an appropriate expressive response. Based on the results of studies investigating the mechanisms of fear, we constructed a neurobiologically constrained model of the amygdala. This model used a Hebbian plasticity mechanism (Armony et al., 1997; Johnson et al., 2008) to associate an unconditioned stimulus with a conditioned stimulus to elicit an appropriate behavioral response in the humanoid robot iCub. The performance of the model was tested in a real-world setup, exposing the robot to different colors (CS) and an aversive tone (US). The correct elicitation of a conditioned response (CR) when exposed to the CS after the conditioning phase was used as a benchmark. In this study we successfully implemented a neurobiologically based model of fear conditioning in the control architecture of an android. The analysis of our results helps us to understand the connection between somatic and cognitive processes involved in the control and elicitation of emotions. The fear mechanism is a very fast and efficient process in nature. While the neuronal modeling software can learn in real time, the iCub robot platform still lags behind in the speed of behavioral control. This restriction has to be considered in the future design of studies addressing real-time social interaction with android robots.
We investigated different aspects of emotions using embodied emotive models. The results of our studies contribute to the understanding of the phenomenon of emotions on three levels: perception, interaction, and how learning affects behavioral control. As a main contribution we propose a biologically inspired architecture of emotion processing that can control the behavior of an android. The results of this thesis show that embodied emotive models can be successfully used to investigate human psychology. They also show that we are capable of equipping android agents with synthetic emotions. Robots and virtual avatars equipped with such psychologically and neurocomputationally inspired mechanisms will increase dramatically in the next decades. This will have a profound impact on modern society's hegemonic, economic and socio-cultural development.
Bibliography
R Adolphs, D Tranel, H Damasio, and A Damasio. Impaired recognition of emotion in facial expressions following bilateral damage to the
human amygdala. Nature, 372(6507):669–672, 1994.
J P Aggleton. The Amygdala. Wiley-Liss, Inc., New York, 1992.
C D Aizenman, P B Manis, and D J Linden. Polarity of long-term synaptic gain change is related to postsynaptic spike firing at a cerebellar
inhibitory synapse. Neuron, 21(4):827–835, 1998.
J S Albus. A theory of cerebellar function. Mathematical Biosciences,
10:25–61, 1971.
J S Albus. A new approach to manipulator control: the cerebellar model
articulation controller. Journal of Dynamic Systems, Measurement, and
Control, 97:220–227, 1975.
A K Anderson and E A Phelps. Lesions of the human amygdala impair
enhanced perception of emotionally salient events. Nature, 411(6835):
305–9, 2001.
M Argyle. Bodily Communication. Methuen, 2nd edition, 1988.
M Argyle and J Dean. Eye-contact, distance and affiliation. Sociometry,
28(3):289–304, 1965.
J L Armony, D Servan-Schreiber, J D Cohen, and J E LeDoux. Computational modeling of emotion: explorations through the anatomy and physiology of fear conditioning. Trends in Cognitive Sciences, 1(1):28–34, 1997.
J L Armony, G J Quirk, and J E LeDoux. Differential effects of amygdala
lesions on early and late plastic components of auditory cortex spike
trains during fear conditioning. The Journal of Neuroscience, 18(7):
2592–601, 1998.
M Arnold. Emotion and Personality. Columbia University Press, New
York, 1960.
A P Atkinson, W H Dittrich, A J Gemmell, and A W Young. Emotion
perception from dynamic and static body expressions in point-light and
full-light displays. Perception, 33(6):717–746, 2004.
A P Atkinson, M L Tunstall, and W H Dittrich. Evidence for distinct contributions of form and motion information to the recognition of emotions from body gestures. Cognition, 104(1):59–72, 2007.
Autodesk Inc., San Francisco, CA, USA. Autodesk 3ds max, 2007.
H Aviezer, R R Hassin, J Ryan, C Grady, J Susskind, A Anderson,
M Moscovitch, and S Bentin. Angry, disgusted, or afraid? Studies
on the malleability of emotion perception. Psychological science : A
Journal of the American Psychological Society / APS, 19(7):724–32,
2008.
A D Baddeley and J Andrade. Working memory and the vividness of
imagery. Journal of Experimental Psychology: General, 129(1):126–
145, 2000.
J N Bailenson, J Blascovich, A C Beall, and J M Loomis. Equilibrium
theory revisited: Mutual gaze and personal space in virtual environments. Presence: Teleoperators & Virtual Environments, 10(6):583–
598, 2001.
J N Bailenson, A C Beall, and J M Loomis. Interpersonal distance in
immersive virtual environments. Personality and Social Psychology
Bulletin, 29:1–15, 2003.
J S Bakin and N M Weinberger. Classical conditioning induces cs-specific
receptive field plasticity in the auditory cortex of the guinea pig. Brain
Res, 536(1-2):271–286, 1990.
J S Bakin, D A South, and N M Weinberger. Induction of receptive field
plasticity in the auditory cortex of the guinea pig during instrumental avoidance conditioning. Behavioral Neuroscience, 110(5):905–913,
1996.
M Baldassare. Human spatial behavior. Annual Review of Sociology, 4:
29–56, 1978.
C Balkenius and J Morén. A computational model of emotional conditioning in the brain. In Proceedings of Workshop on Grounding Emotions in Adaptive Systems, Zurich. Citeseer, 1998.
A Bandura. Social learning theory. Prentice Hall, Englewood Cliffs, NJ,
1977.
J.A. Bargh. Conditional automaticity: Varieties of automatic influence in
social perception and cognition. In James S Uleman and J.A. Bargh,
editors, Unintended thought, chapter 1, pages 3–51. Guildford Press,
New York, 1989.
S Baron-Cohen. Mindblindness: An Essay on Autism and Theory of Mind.
The MIT Press, 1997b.
S Baron-Cohen, S Wheelwright, and T Jolliffe. Is there a "language of the eyes"? Evidence from normal adults, and adults with autism or Asperger syndrome. Visual Cognition, 4(3):311–331, 1997a.
R M Bauer. Autonomic recognition of names and faces in prosopagnosia:
a neuropsychological application of the Guilty Knowledge Test. Neuropsychologia, 22(4):457–69, January 1984.
Paula Beall and Andrew Herbert. The face wins: Stronger automatic processing of affect in facial expressions than words in a modified Stroop
task. Cognition & Emotion, 22(8):1613–1642, May 2008.
A Bechara and A R Damasio. The somatic marker hypothesis: A neural
theory of economic decision. Games and Economic Behavior, 52(2):
336–372, 2005.
A Bechara, H Damasio, A R Damasio, and G P Lee. Different contributions of the human amygdala and ventromedial prefrontal cortex to
decision-making. The Journal of Neuroscience : The Official Journal
of the Society for Neuroscience, 19(13):5473–81, 1999.
A Bechara, H Damasio, and A R Damasio. Emotion , Decision Making
and the Orbitofrontal Cortex. Cerebral Cortex, 10:295–307, 2000.
C Becker, S Kopp, and I Wachsmuth. Simulating the emotion dynamics of
a multimodal conversational agent. Affective Dialogue Systems, pages
154–165, 2004.
U Bernardet, S Bermúdez i Badia, and P F M J Verschure. The experience
induction machine and its role in the research on presence. 10th Annual
International Workshop on Presence. Barcelona: Spain, 2007.
U Bernardet, M Inderbitzin, S Wierenga, A Väljamäe, A Mura, and P F
M J Verschure. Validating presence by relying on recollection: Human
experience and performance in the mixed reality system XIM. The 10th
International Workshop on Presence, 2008.
U Bernardet, A Väljamäe, M Inderbitzin, S Wierenga, and P F M J Verschure. Quantifying human subjective experience and social interaction using the experience induction machine. Brain Research Bulletin,
In press.
E Bevacqua, M Mancini, and C Pelachaud. A listening agent exhibiting variable behaviour. In Intelligent Virtual Agents, pages 262–269.
Springer, 2008.
K C Bickart, C I Wright, R J Dautoff, B C Dickerson, and L F Barrett.
Amygdala volume and social network size in humans. Nature Neuroscience, 14(2):163–164, 2010.
J R Binder, S J Swanson, T A Hammeke, G L Morris, W M Mueller,
M Fischer, S Benbadis, J A Frost, S M Rao, and V M Haughton. Determination of language dominance using functional MRI: a comparison
with the Wada test. Neurology, 46(4):978–84, 1996.
J R Binder, R H Desai, W W Graves, and L L Conant. Where is the
semantic system? A critical review and meta-analysis of 120 functional
neuroimaging studies. Cerebral Cortex, 19(12):2767–96, 2009.
R L Birdwhistell. Introduction to kinesics: An annotation system for analysis of body motion and gesture. University of Louisville, Louisville,
KY, 1975.
R Blake and M Shiffrar. Perception of human motion. Annual Review of
Psychology, 58:47–73, 2007.
S J Blakemore and J Decety. From the perception of action to the understanding of intention. Nature Reviews Neuroscience, 2(8):561–7,
2001.
J Blascovich, J Loomis, A C Beall, and K R Swinth. Virtual environment
technology as a methodological tool for social psychology. Psychological Inquiry, 13(2):103–124, 2002.
S Bookheimer. Functional MRI of language: new approaches to understanding the cortical organization of semantic processing. Annual Review of Neuroscience, 25:151–88, 2002.
E Borgida and R Nisbett. The Differential Impact of Abstract vs. Concrete
Information on Decisions. Journal of Applied Social Psychology, 7(3):
258–271, 1977.
M M Bradley and P J Lang. Measuring emotion: the self-assessment
manikin and the semantic differential. Journal of Behavior Therapy
and Experimental Psychiatry, 25(1):49–59, 1994.
C Breazeal. Emotion and sociable humanoid robots. International Journal of Human-Computer Studies, 59(1-2):119–155, 2003.
W H Bridger and I J Mandel. A comparison of GSR fear responses produced by threat and electric shock. Journal of Psychiatric Research,
54:31–40, 1964.
N Bruno and J E Cutting. Minimodularity and the perception of layout.
Journal of experimental psychology. General, 117(2):161–70, 1988.
T W Buchanan, K Lutz, S Mirzazade, K Specht, N J Shah, K Zilles, and
L Jäncke. Recognition of emotional prosody and verbal components
of spoken language: an fMRI study. Cognitive Brain Research, 9(3):
227–38, 2000.
J K Burgoon, L A Stern, and L Dillman. Interpersonal Adaption: Dyadic
Interaction Patterns. Cambridge University Press, New York, NY,
2007.
A Camurri, I Lagerlöf, and V Gualtiero. Recognizing emotion from dance
movement: comparison of spectator recognition and automated techniques. International Journal of Human-Computer Studies, 59(1-2):
213–225, 2003.
W B Cannon. The James-Lange Theory of Emotions: A critical examination and an alternative theory. American Journal of Psychology, 39
(1/4):106–124, 1927.
J B Carroll. Word Frequency Book. In P Davis and R Barry, editors, The
American Heritage Word Frequency Book, New York, 1971. American
Heritage Publishing Co., Inc.
F Caruana, A Jezzini, B Sbriscia-Fioretti, G Rizzolatti, and V Gallese.
Emotional and Social Behaviors Elicited by Electrical Stimulation of
the Insula in the Macaque Monkey. Current Biology, 21(3):195–199,
2011.
J.P. Chandler. Subroutine STEPIT - Finds local minima of a smooth function of several parameters. Behavioral Science, 14:81–82, 1969.
Y Chudasama, A Izquierdo, and E A Murray. Distinct contributions of the
amygdala and hippocampus to fear expression. The European Journal
of Neuroscience, 30(12):2327–37, 2009.
T J Clarke, M F Bradshaw, D T Field, S E Hampson, and D Rose. The
perception of emotion from body movement in point-light displays of
interpersonal dialogue. Perception, 34(10):1171–1180, 2005.
M Coesmans, J T Weber, C I De Zeeuw, and C Hansel. Bidirectional
parallel fiber plasticity in the cerebellum under climbing fiber control.
Neuron, 44(4):691–700, 2004.
J D Cohen, K Dunbar, and J L McClelland. On the control of automatic
processes: a parallel distributed processing account of the Stroop effect.
Psychological review, 97(3):332–61, July 1990.
MM Cohen and DW Massaro. Modeling coarticulation in synthetic visual
speech. Models and techniques in computer animation, pages 139–156,
1993.
F F Corchado Ramos, H R Orozco Aguirre, and L A Razo Ruvalcab. The
Use of Artificial Emotional Intelligence in Virtual Creatures. In J Vallverdú and D Casacuberta, editors, Handbook of Research on Synthetic
Emotions and Sociable Robotics: New Applications in Affective Computing and Artificial Intelligence, pages 350–378. IGI Global, 2009.
M Coulson. Attributing Emotion to Static Body Postures: Recognition
Accuracy, Confusions, and Viewpoint Dependence. Journal of Nonverbal Behavior, 28(2):117–139, 2004.
A D B Craig. How do you feel–now? The anterior insula and human
awareness. Nature Reviews Neuroscience, 10(1):59–70, 2009.
A D B Craig. The sentient self. Brain structure & function, 214(5-6):
563–77, 2010.
C Cruz-Neira, D J Sandin, T A DeFanti, R V Kenyon, and J C Hart. The
cave: Audio visual experience automatic virtual environment. Communications of the ACM, 35(6):64–72, 1992.
A R Damasio. Fundamental feelings. Nature, 413(6858):781, 2001.
A R Damasio, D Tranel, and H Damasio. Face agnosia and the neural substrates of memory. Annual review of neuroscience, 13:89–109, January
1990.
A R Damasio, B J Everitt, and D Bishop. The Somatic Marker Hypothesis and the Possible Functions of the Prefrontal Cortex. Philosophical
Transactions: Biological Sciences, pages 1413–1420, 1996.
J M Darley and B Latane. The unresponsive bystander: why doesn’t he
help? Appleton-Century Crofts, New York, NY, 1970.
D N Davis and S C Lewis. Computational models of emotion for autonomy and reasoning. Informatica Special Edition on Perception and
Emotion Based Reasoning, 27(2):157–164, 2003.
F C Davis, T Johnstone, E C Mazzulla, J A Oler, and P J Whalen. Regional Response Differences Across the Human Amygdaloid Complex during Social Conditioning. Cerebral Cortex, 12(10):1217–1218,
2009.
J De Houwer and D Hermans. Differences in the affective processing of
words and pictures. Cognition & Emotion, 8(1):1–20, 1994.
A De Luca, R Mattone, P R Giordano, and H H Bulthoff. Control design
and experimental evaluation of the 2D cyberwalk platform. In Proceedings from the IEEE/RSJ International Conference on Intelligent Robots
and Systems, St. Louis, USA, 2009.
P R De Silva and N Bianchi-Berthouze. Modeling human affective postures: an information theoretic characterization of posture features.
Computer Animation and Virtual Worlds, 15(3-4):269–276, 2004.
R De Sousa. The rationality of emotions. MIT Press, Cambridge, 1987.
R M J Deacon, D M Bannerman, and J N P Rawlins. Anxiolytic Effects
of Cytotoxic Hippocampal Lesions in Rats. Behavioral Neuroscience,
116(3):494–497, 2002.
B DeCarolis, C Pelachaud, and I Poggi. APML, a mark-up language
for believable behavior generation. Life-like Characters., pages 1–22,
2004.
K Dedovic, A Duchesne, J Andrews, V Engert, and J C Pruessner. The
brain and the stress axis: The neural correlates of cortisol regulation in
response to stress. NeuroImage, 47:864–871, 2009.
S Dehaene, L Naccache, H Gurvan Le Clec, E Koechlin, M Mueller,
G Dehaene-Lambertz, P F van De Moortele, and D Le Bihan. Imaging
unconscious semantic priming. Nature, 395:597–600, 1998.
T Delbruck, A Whatley, R Douglas, K Eng, K Hepp, and P F M J Verschure. A tactile luminous floor for an interactive autonomous space.
Robotics and Autonomous Systems, 55(6):433–443, 2007.
J-F Demonet, F Chollet, S Ramsay, D Cardebat, J-L Nespoulous, R Wise,
A Rascol, and R Frackowiak. The anatomy of phonological and semantic processing in normal subjects. Brain, 115:1753–1768, 1992.
J T Devlin, R P Russell, M H Davis, C J Price, H E Moss, M J Fadili,
and L K Tyler. Is there an anatomical basis for category-specificity?
Semantic memory studies in PET and fMRI. Neuropsychologia, 40(1):
54–75, 2002.
U Dimberg, M Thunberg, and K Elmehed. Unconscious facial reactions
to emotional facial expressions. Psychological Science : A Journal of
the American Psychological Society, 11(1):86–89, 2000.
A Duff and P F M J Verschure. Unifying perceptual and behavioral learning with a correlative subspace learning rule. Neurocomputing, 73(10-12):1818–1830, 2010. Subspace Learning / Selected papers from the European Symposium on Time Series Prediction.
J M Edeline, P Pham, and N M Weinberger. Rapid development of
learning-induced receptive field plasticity in the auditory cortex. Behavioral Neuroscience, 107(4):539–551, 1993.
P Ekman. An argument for basic emotions. Cognition & Emotion, 6(3):
169–200, 1992.
P Ekman. Facial expression and emotion. American Psychologist, 48(4):
384–392, 1993.
P Ekman and W V Friesen. Detecting deception from the body or face.
Journal of Personality and Social Psychology, 29(3):288–298, 1974.
P Ekman and W V Friesen. Facial Action Coding System: A Technique
for the Measurement of Facial Movement. Consulting Psychologists
Press, Palo Alto, 1978.
P Ekman and W V Friesen. A new pan-cultural facial expression of emotion. Motivation and Emotion, 10(2):159–168, 1986.
P Ekman, W V Friesen, and P Ellsworth. Emotion in the Human Face.
Oxford University Press, New York, 2nd edition, 1982.
M S El-Nasr, J Yen, and T R Ioerger. FLAME - Fuzzy Logic Adaptive
Model of Emotions. Autonomous Agents and Multi-agent systems, 3
(3):219–257, 2000.
C Elliott and G Siegle. Variables Influencing the Intensity of Simulated
Affective States. In AAAI Spring Symposium on Reasoning about Mental States: Formal Theories and Applications, pages 58–67, 1993.
G D Ellison and J Konorski. Separation of the salivary and motor responses in instrumental conditioning. Science, 146(3647):1071–1072,
1964.
J W Ellison and D W Massaro. Featural evaluation, integration, and judgment of facial affect. Journal of Experimental Psychology. Human Perception and Performance, 23(1):213–26, 1997.
F Esteves, U Dimberg, and A Öhman. Automatically elicited fear: Conditioned skin conductance responses to masked facial expressions. Cognition & Emotion, 8(5):393–413, 1994.
N L Etcoff and J J Magee. Categorical perception of facial expressions.
Cognition, 44(3):227 – 240, 1992.
J Evans. In two minds: dual-process accounts of reasoning. Trends in
Cognitive Sciences, 7(10):454–459, 2003.
J D G Evans. Review: Aristotle’s de anima. The Classical Review, 45(1):
pp. 60–61, 1995.
M S Fanselow and A M Poulos. The neuroscience of mammalian associative learning. Annual Review of Psychology, 56:207–234, 2005.
M J Farah, J W Tanaka, and H M Drain. What causes the face inversion
effect? Journal of Experimental Psychology. Human Perception and
Performance, 21(3):628–634, 1995.
L Festinger, S Schachter, and K Back. Social Pressures in Informal
Groups. Harper, New York, 1950.
S T Fiske and S E Taylor. Social cognition. New York, NY: Random
House, 1984.
M K Floeter and W T Greenough. Cerebellar plasticity: modification of
Purkinje cell structure by differential rearing in monkeys. Science, 206
(4415):227–229, 1979.
A J Fridlund, B Apfelbaum, G Blum, D Brown, J Balakrishnan, J Loomis,
G Mchugo, M Platow, and P Rozin. Sociality of Solitary Smiling:
Potentiation by an Implicit Audience. Journal of Personality and Social
Psychology, 60(2):229–240, 1991.
N H Frijda. The laws of emotion. The American Psychologist, 43(5):
349–58, 1988.
N H Frijda. Moods, emotion episodes and emotions. In M Lewis and
J M Haviland, editors, Handbook of Emotions, pages 381–403. Guilford Press, New York, 1993.
J M Fuster. The prefrontal cortex. Elsevier, Amsterdam, 4th edition, 2008.
R Galambos, G Sheatz, and V G Vernier. Electrophysiological correlates
of a conditioned response in cats. Science, 123(3192):376–377, 1956.
GarageGames. Torque Game Engine [Computer software]. Eugene, OR,
2010.
P Gebhard. ALMA: a layered model of affect. In Proceedings of the
fourth international joint conference on Autonomous agents and multiagent systems, pages 29–36. ACM, 2005.
M A Giese and T Poggio. Neural mechanisms for the recognition of
biological movements. Nature Reviews Neuroscience, 4(3):179–192,
2003.
W R Glaser and F-J Dungelhoff. The time course of picture-word interference. Journal of experimental psychology. Human perception and
performance, 10(5):640–654, 1984.
W R Glaser and M O Glaser. Context effects in stroop-like word and
picture processing. Journal of Experimental Psychology. General, 118
(1):13–42, 1989.
J Globisch, A O Hamm, F Esteves, and A Öhman. Fear appears fast:
temporal course of startle reflex potentiation in animal fearful subjects.
Psychophysiology, 36(1):66–75, 1999.
P Gold. Acetylcholine modulation of neural systems involved in learning
and memory. Neurobiology of Learning and Memory, 80:194–210,
2003.
I Gormezano, W F Prokasy, and R F Thompson. Classical conditioning.
Lawrence Erlbaum, Hillsdale, NJ, England, 1987.
D Grandjean and K R Scherer. Unpacking the cognitive architecture of
emotion processes. Emotion, 8(3):341–51, 2008.
J Gratch and S Marsella. A domain-independent framework for modeling
emotion. Journal of Cognitive Systems Research, 5(4):269–306, 2004.
J Gratch and S Marsella. Evaluating a Computational Model of Emotion.
Autonomous Agents and Multi-Agent Systems, 11(1):23–43, 2005.
J Gratch, J Rickel, E André, J Cassell, E Petajan, and N Badler. Creating interactive virtual humans: Some assembly required. Intelligent
Systems, IEEE, 17(4):54–63, 2002.
A G Greenwald, M R Klinger, and T J Liu. Unconscious processing of
dichoptically masked words. Memory & cognition, 17(1):35–47, 1989.
P E Griffiths. What emotions really are. University of Chicago Press,
Chicago, 1997.
B GuoQiang and P MuMing. Synaptic modifications in cultured hippocampal neurons: Dependence on spike timing, synaptic strength, and
postsynaptic cell type. Journal of Neuroscience, 18(24):10464–10472,
1998.
E T Hall. A system for the notation of proxemic behavior. American
Anthropologist, 65:1003–1026, 1963.
E T Hall. The Hidden Dimension. Anchor Books, New York, 1966.
A R Hariri, V S Mattay, A Tessitore, B Kolachana, F Fera, D Goldman,
M F Egan, and D R Weinberger. Serotonin transporter genetic variation
and the response of the human amygdala. Science, 297(5580):400–3,
2002.
M Haruno and C D Frith. Activity in the amygdala elicited by unfair
divisions predicts social value orientation. Nature Neuroscience, 13
(2):160–1, 2010.
J V Haxby, E A Hoffman, and M I Gobbini. Human neural systems for
face recognition and social communication. Biological Psychiatry, 51
(1):59–67, 2002.
L A Hayduk. Personal space: An evaluative and orienting overview. Psychological Bulletin, 85(1):117 – 134, 1978.
D O Hebb. The Organization of Behavior: A Neuropsychological Theory.
Wiley, New York, 1949.
H Hediger. Wild Animals in Captivity. Dover Publications, New York,
1964.
F Heider. Social perception and phenomenal causality. Psychological
Review, 51(6):358–374, 1944.
A Hermann, A Schäfer, B Walter, R Stark, D Vaitl, and A Schienle. Emotion regulation in spider phobia: role of the medial prefrontal cortex.
Social Cognitive and Affective Neuroscience, 4(3):257–67, 2009.
B Hillier. Space is the Machine. Press Syndicate of the University of
Cambridge, 1996.
C Hofstotter, M Mintz, and P F M J Verschure. The cerebellum in action:
a simulation and robotics study. European Journal of Neuroscience, 16
(7):1361–1376, 2002.
C Hull. The problem of stimulus equivalence in behavior theory. Psychological Review, 46:9–30, 1939.
M Inderbitzin, S Wierenga, A Väljamäe, U Bernardet, and P F M J Verschure. Cooperation and competition in the mixed reality space experience induction machine xim. Virtual Reality, 13:153–158, 2009.
M Inderbitzin, I Herreros-Alonso, and P F M J Verschure. An integrated
computational model of the two phase theory of classical conditioning. In The 2010 International Joint Conference on Neural Networks
(IJCNN), pages 1–8. IEEE, 2010a.
M Inderbitzin, I Herreros-Alonso, and P F M J Verschure.
Amygdala Induced Plasticity in an Integrated Computational Model of
the Two-Phase Theory of Conditioning. In 4th International Conference on Cognitive Systems, Zurich: Switzerland, 2010b.
M Inderbitzin, A Valjamae, J M B Calvo, P F M J Verschure, and
U Bernardet. Expression of emotional states during locomotion based
on canonical parameters. In IEEE International Conference on Automatic Face and Gesture Recognition, pages 809 –814, 2011.
M Inderbitzin, A Betella, U Bernardet, and P F M J Verschure. The
social perceptual salience effect. Journal of Experimental Psychology.
Human Perception and Performance, submitted.
M Ito. Long-term depression. Annual Review of Neuroscience, 12(1):
85–102, 1989.
M Ito. Historical review of the signification of the cerebellum and the
role of the purkinje cells in motor learning. Annals of the New York
Academy of Sciences, 978:273–288, 2002.
C E Izard. The face of emotions. Appleton-Century-Crofts, New York,
1971.
C E Izard. Human emotions. Plenum, New York, 1977.
C E Izard. Innate and universal facial expressions: Evidence from developmental and cross-cultural research. Psychological Bulletin, 115(2):
288–299, 1994.
J Jacobs. The Death and Life of Great American Cities. Random House,
New York, 1961.
C H James, T J Buckingham, and G A Barto. Models of the cerebellum
and motor learning. Behavioral and Brain Sciences, 19:368–383, 2004.
W James. What is an emotion? Mind, 9(34):188–205, 1884.
J P Johansen, J W Tarpley, J E Ledoux, and H T Blair. Neural substrates
for expectation-modulated fear learning in the amygdala and periaqueductal gray. Nature Neuroscience, 13(8):979–986, 2010.
L R Johnson, J E LeDoux, and V Doyère. Hebbian reverbrations in emotional memory micro circuits. Frontiers in Neuroscience, 3(2):198–
205, 2008.
P N Johnson-Laird and K Oatley. The language of emotions: An analysis
of a semantic field. Cognition & Emotion, 3(2):81–123, 1989.
S Jolly. Understanding body language: Birdwhistell’s theory of kinesics.
Corporate Communications: An International Journal, 5(3):133–139,
2000.
N H Kalin, S E Shelton, and R J Davidson. Role of the primate orbitofrontal cortex in mediating anxious temperament. Biological Psychiatry, 62(10):1134–9, 2007.
S Kamisato, S Odo, Y Ishikawa, and K Hoshino. Extraction of Motion
Characteristics Corresponding to Sensitivity Information Using Dance
Movement. Computational Intelligence, 8(2), 2004.
N Kanwisher, J McDermott, and M M Chun. The fusiform face area:
a module in human extrastriate cortex specialized for face perception.
The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 17(11):4302–11, 1997.
D P Kennedy, J Gläscher, J M Tyszka, and R Adolphs. Personal space regulation by the human amygdala. Nature Neuroscience, 12(10):1226–7,
2009.
J F Kihlstrom. The Cognitive Unconscious. Science, 237(4821):1445–52,
1987.
J Kisielius and B Sternthal. Examining the Vividness Controversy: An
Availability-Valence Interpretation. The Journal of Consumer Research, 12(4):418–431, 1986.
M A Kisley and G L Gerstein. Daily variation and appetitive conditioning-induced plasticity of auditory cortex receptive fields. European Journal of Neuroscience, 13(10):1993–2003, 2001.
A Kleinsmith and N Bianchi-Berthouze. Recognizing affective dimensions from body posture. In International Conference of Affective
Computing and Intelligent Interaction, pages 48–58, Lisboa (Portugal),
2007.
A Kleinsmith, P De Silva, and N Bianchi-Berthouze. Cross-cultural differences in recognizing affect from body posture. Interacting with
Computers, 18(6):1371–1389, 2006.
J Konorski. Conditioned reflex and neuron organization. University Press,
Cambridge, 1948.
J Konorski. Integrative Activity of the Brain. An Interdisciplinary Approach. University of Chicago Press, Chicago, 1968.
R E Kraut and R E Johnston. Social and Emotional Messages of Smiling:
An Ethological Approach. Journal of Personality and Social Psychology, 37(9):1539–1553, 1979.
M E Kret and B de Gelder. Social context influences recognition of bodily expressions. Experimental Brain Research, 203(1):169–80, 2010.
D J Krupa and R F Thompson. Reversible inactivation of the cerebellar
interpositus nucleus completely prevents acquisition of the classically
conditioned eye-blink response. Learning & Memory, 3(6):545–556,
1997.
N Kuczewski, C Porcher, V Lessmann, I Medina, and J-L Gaiarsa. Backpropagating action potential: A key contributor in activity-dependent
dendritic release of BDNF. Communicative & Integrative Biology, 1
(2):153–155, 2008.
J D LaBarbera, C E Izard, P Vietze, and S A Parisis. Four- and sixmonth-old infants’ visual responses to joy, anger, and neutral expressions. Child Development, 47(2), 1976.
C Lamm and T Singer. The role of anterior insular cortex in social emotions. Brain Structure & Function, pages 579–591, 2010.
P J Lang, M Davis, and A Öhman. Fear and anxiety: Animal models and
human cognitive psychophysiology. Journal of Affective Disorders, 61
(3):137–159, 2000.
R S Lazarus. Stress, appraisal and coping. Springer, New York, 1991.
J E LeDoux. Emotion memory and the brain. Scientific American, 270
(6):32–39, 1994.
J E LeDoux. Emotion: clues from the brain. Annual Review of Psychology, 46:209–35, 1995.
J E LeDoux. The emotional brain. Simon and Schuster Paperbacks, New
York, 1996.
J E LeDoux. Emotion circuits in the brain. Annual review of neuroscience,
23(1):155–184, 2000.
J E LeDoux. Synaptic Self: How Our Brains Become Who We Are. Penguin (Non-Classics), 2003.
J E LeDoux. Amygdala. Scholarpedia, 2006.
J E LeDoux and R G Phillips. Differential Contribution of Amygdala and
Hippocampus to Cued and Contextual Fear Conditioning. Behavioral
Neuroscience, 106(2):274–285, 1992.
T Lee and J J Kim. Differential effects of cerebellar, amygdalar, and
hippocampal lesions on classical eyeblink conditioning in rats. The
Journal of Neuroscience : The Official Journal of the Society for Neuroscience, 24(13):3242–3250, 2004.
A Leffler, D L Gillespie, and J C Conaty. The Effects of Status Differentiation on Nonverbal Behavior. Social Psychology Quarterly, 45(3):
153–161, 1982.
R Lennartz and N M Weinberger. Analysis of response systems in pavlovian conditioning reveal rapidly versus slowly acquired conditioned responses: Support for two–factors and implications for neurobiology.
Psychobiology, 20:93–119, 1992.
H Leventhal and K R Scherer. The relationship of emotion to cognition: A
functional approach to a semantic controversy. Cognition and Emotion,
1(1):3–28, 1987.
S C Levine, M T Banich, and M P Koch-Weser. Face recognition: a
general or specific right hemisphere capacity? Brain and Cognition, 8
(3):303–25, 1988.
J L Lewis, J J LoTurco, and P R Solomon. Lesions of the middle cerebellar peduncle disrupt acquisition and retention of the rabbit's classically
conditioned nictitating membrane response. Behavioral Neuroscience,
101, 1987.
M Lewis. Self-Conscious Emotions: Embarrassment, Pride, Shame, and
Guilt. In M Lewis and J M Haviland, editors, Handbook of Emotions,
pages 563–573. Guildford Press, New York, 1993.
G D Logan. Automaticity and cognitive control. In J S Uleman and J A
Bargh, editors, Unintended thoughts. Guildford Press, New York, 1989.
G Lowe. Inhibition of backpropagating action potentials in mitral cell
secondary dendrites. Journal of Neurophysiology, 88(1):64–85, 2002.
N Mackintosh. The psychology of animal learning. Academic Press,
London, 1974.
C M MacLeod. Half a century of research on the Stroop effect: an integrative review. Psychological Bulletin, 109(2):163–203, 1991.
S Maren. Neurobiology of Pavlovian Fear Conditioning. Annual Review
of Neuroscience, 24:897–931, 2001.
H Markram, J Lubke, M Frotscher, and B Sakmann. Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science,
275(5297):213–215, 1997.
D Marr. A theory of cerebellar cortex. The Journal of Physiology, 202
(2):437–470, 1969.
S C Marsella and J Gratch. EMA: A process model of appraisal dynamics. Cognitive Systems Research, 10(1):70–90, 2009.
A H Maslow. Motivation and Personality. Harper & Row, Publishers,
Inc., Oxford, England, 1954.
D W Massaro. Speech perception by ear and eye; a paradigm for psychological inquiry. Erlbaum, Hillsdale, NJ, England, 1987a.
D W Massaro. Categorical partition: A fuzzy-logical model of categorization behaviour. In H Stevan, editor, Categorical perception: The
groundwork of cognition, pages 254–283. Cambridge University Press,
New York, 1987b.
D W Massaro. Ambiguity in Perception and Experimentation. Journal of
Experimental Psychology: General, 117(4):417–421, 1988.
D W Massaro. Testing between the TRACE model and the fuzzy logical
model of speech perception. Cognitive Psychology, 21(3):398–421,
1989.
D W Massaro. Perceiving talking faces: from speech perception to a
behavioral principle. MIT Press, Cambridge, MA, USA, 1998.
D W Massaro and M M Cohen. Perceiving Talking Faces. Current Directions in Psychological Science, 4(4):104–109, 1995.
D W Massaro and P B Egan. Perceiving affect from the voice and the
face. Psychonomic Bulletin and Review, 3:215–221, 1996.
D W Massaro and E L Ferguson. Cognitive style and perception: the relationship between category width and speech perception, categorization, and discrimination. The American Journal of Psychology, 106(1):25–49, 1993.
D W Massaro, M M Cohen, A Gesi, R Heredia, and M Tsuzaki. Bimodal speech perception: an examination across languages. Journal of
Phonetics, 21:445–478, 1993.
D W Massaro, M M Cohen, J Beskow, and R A Cole. Developing and
evaluating conversational agents. In Workshop on Embodied Conversational Characters WECC, pages 287–318, Lake Tahoe, CA, USA,
1998.
D W Massaro, M M Cohen, and S Vanderhyden. Baldi. iPhone Software,
2009.
Z Mathews, S Bermúdez i Badia, and P F M J Verschure. A novel brainbased approach for multi-modal multi-target tracking in a mixed reality
space. 4th Intuition international conference and workshop on virtual
reality, Athens, Greece, 2007.
D P McCabe and A D Castel. Seeing is believing: the effect of brain
images on judgments of scientific reasoning. Cognition, 107(1):343–
52, 2008.
G McCarthy, A Puce, J C Gore, and T Allison. Face-Specific Processing
in the Human Fusiform Gyrus. Journal of Cognitive Neuroscience, 9
(5):605–610, 1997.
I K McKenzie and K T Strongman. Rank (status) and interaction distance.
European Journal of Social Psychology, 11(2):227–230, 1981.
A McQueen. Fall Winter collection: Kate Moss holographic projection. fashionWATCH [Video file]. Retrieved from: http://www.youtube.com/user/fashionWATCH, 2006.
J F Medina, J C Repa, M D Mauk, and J E LeDoux. Parallels between
cerebellum- and amygdala-dependent conditioning. Nature Reviews
Neuroscience, 3(2):122–31, 2002.
H K M Meeren, C C R J van Heijnsbergen, and B de Gelder. Rapid perceptual integration of facial expression and emotional body language.
Proceedings of the National Academy of Sciences of the United States
of America, 102(45):16518–23, 2005.
A Mehrabian. Nonverbal communication. Aldine Transaction Publishers,
New Jersey, USA, 1972.
A N Meltzoff and M K Moore. Imitation of Facial and Manual Gestures
by Human Neonates. Science, 198(4312):75–78, 1977.
N E Miller. Studies of fear as acquirable drive. Journal of Experimental
Psychologye, 38:89–101, 1948.
J Morén. A Computational Model of Emotional Learning in the Amygdala.
Cognitive Science, 1995.
R L Morgan and D Heise. Structure of Emotions. Social Psychology
Quarterly, 51(1):19, 1988.
J S Morris, A Ohman, and R J Dolan. Conscious and unconscious emotional learning in the human amygdala. Nature, 393:467–70, 1998.
O H Mowrer. Learning theory and behavior. Wiley, New York, 1960.
L Nadel and C Land. Commentary - Reconsolidation: Memory traces
revisited. Nature Reviews Neuroscience, 1(3):209–212, 2000.
K Nakamura, R Kawashima, K Ito, M Sugiura, T Kato, A Nakamura,
K Hatano, S Nagumo, K Kubota, H Fukuda, and S Kojima. Activation
of the right inferior frontal cortex during assessment of facial emotion.
Journal of Neurophysiology, 82(3):1610–4, 1999.
O Newman. Defensible Space. Macmillan, New York, 1973.
K Oatley and P N Johnson-Laird. Towards a cognitive theory of emotions.
Cognition & Emotion, 1:29–50, 1987.
A Öhman. Automaticity and the amygdala: Nonconscious responses to
emotional faces. Current Directions in Psychological Science, 11(2):
62–66, 2002.
A Öhman, A Flykt, and F Esteves. Emotion drives attention: Detecting
the snake in the grass. Journal of Experimental Psychology General,
130(3):466–478, 2001.
A Ortony and T J Turner. What’s basic about basic emotions? Psychological Review, 97(3):315–331, 1990.
A Ortony, G Clore, and M Foss. The referential structure of the affective
lexicon. Cognitive Science, 11(3):341–364, 1987.
J Panksepp. Toward a general psychobiological theory of emotions. Behavioral and Brain Sciences, 5(3):407–422, 1982.
J J Paton, M A Belova, S E Morrison, and C D Salzman. The primate
amygdala represents the positive and negative value of visual stimuli
during learning. Nature, 439:865–70, 2006.
M L Patterson. Compensation in nonverbal immediacy behaviors: A review. Sociometry, 36(2):237–252, 1973.
I Pavlov. Conditioned reflexes. Oxford University Press, Oxford, 1927.
C Pelachaud and M Bilvi. Computational model of believable conversational agents. Communications in Multiagent Systems, pages 300–317,
2003.
A Penn. Space Syntax And Spatial Cognition: Or Why the Axial Line?
Environment & Behavior, 35(1):30–65, 2003.
N S Pentkowski, D C Blanchard, C Lever, Y Litvin, and R J Blanchard.
Effects of lesions to the dorsal and ventral hippocampus on defensive
behaviors in rats. The European Journal of Neuroscience, 23(8):2185–
96, 2006.
A S Pentland. Honest Signals: How They Shape Our World. MIT Press,
Cambridge. MA, 2008.
S P Perrett, B P Ruiz, and M D Mauk. Cerebellar cortex lesions disrupt
learning-dependent timing of conditioned eyelid responses. Journal of
Neuroscience, 13(4):1708–18, 1993.
J Pforsich. Handbook for Laban Movement Analysis. Janis Pforsich, New
York, 1977.
R G Phillips and J E LeDoux. Differential contribution of amygdala
and hippocampus to cued and contextual fear conditioning. Behavioral
Neuroscience, 106(2):274–285, 1992.
R Plutchik. A general psychoevolutionary theory of emotion. In R Plutchik and H Kellerman, editors, Emotion: Theory, research, and experience. Theories of emotion, volume 1, pages 3–31. Academic Press,
New York, 1980.
F E Pollick, H M Paterson, A Bruderlin, and A J Sanford. Perceiving
affect from arm movement. Cognition, 82(2):B51–B61, 2001.
D A Powell and D Levine-Bryce. A comparison of two model systems of
associative learning: heart rate and eyeblink conditioning in the rabbit.
Psychophysiology, 25(6):672–682, 1988.
M J Power and T Dalgleish. Cognition and Emotion: From Order to
disorder. Psychology Press, Sussex, UK, 1997.
J L Price. Comparative aspects of amygdala connectivity. Annals of the
New York Academy of Sciences, 985(1):50–58, 2003.
J J Prinz. Gut Reactions. Oxford University Press, New York, 2004.
Psychology Software Tools, Inc., Sharspburg, PA, USA. E-prime 1, 2007.
G J Quirk, E Likhtik, J G Pelletier, and D Pare. Stimulation of Medial
Prefrontal Cortex Decreases the Responsiveness of Central Amygdala
Output Neurons. Behavioral Neuroscience, 23(25):8800 – 8807, 2003.
R A Rescorla and R L Solomon. Two-process learning theory: Relationships between Pavlovian conditioning and instrumental learning. Psychological Review, 74:151–182, 1967.
R A Rescorla and A R Wagner. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In A H Black and W F Prokasy, editors, Classical conditioning II: Current research and theory, pages 64–99. Appleton-Century-Crofts, New York, 1972.
C L Roether, L Omlor, A Christensen, and M A Giese. Critical features
for the perception of emotion from gait. Journal of Vision, 9(6):1–32,
2009.
D B Roger. Body-Image, Personal Space and Self-Esteem: Preliminary
Evidence for “Focusing” Effects. Journal of Personality Assessment,
46(5):468–476, 1982.
J A Russell. A circumplex model of affect. Journal of Personality and
Social Psychology, 39(6):1161–1178, 1980.
C D Salzman and S Fusi. Emotion, cognition, and mental state representation in amygdala and prefrontal cortex. Annual Review of Neuroscience, 33:173–202, 2010.
M Sanchez-Fibla, U Bernardet, E Wasserman, T Pelc, M Mintz, J C Jackson, C Lansink, C Pennartz, and P F M J Verschure. Allostatic control
for robot behavior regulation: a comparative rodent-robot study. Advances in Complex Systems, 13:377–403, 2010.
M A Sanchez-Montanes, P König, and P F M J Verschure. Learning sensory maps with real-world stimuli in real time using a biophysically realistic learning rule. IEEE Transactions on Neural Networks, 13(3):619–632, 2002.
D Sander, J Grafman, and T Zalla. The Human Amygdala: An evolved
system for relevance detection. Reviews in the Neurosciences, 14(4):
303–316, 2003.
D Sander, D Grandjean, and K R Scherer. A systems approach to appraisal mechanisms in emotion. Neural Networks, 18(4):317–52, 2005.
G Sandini, G Metta, and D Vernon. The iCub Cognitive Humanoid Robot: An Open-System Research Platform for Enactive Cognition. In M Lungarella, F Iida, J Bongard, and R Pfeifer, editors, 50 Years of Artificial Intelligence, pages 358–369. Springer, Berlin, 2007.
D A Sauter, F Eisner, P Ekman, and S K Scott. Cross-cultural recognition
of basic emotions through nonverbal emotional vocalizations. Proceedings of the National Academy of Sciences, 107(6):2408, 2010.
S Schachter. The interaction of cognitive and physiological determinants of emotional state. In L Berkowitz, editor, Advances in Experimental Social Psychology, volume 1, pages 49–80. Academic Press, 1964.
K R Scherer. Appraisal considered as a process of multilevel sequential
checking. In K R Scherer and A Schorr, editors, Appraisal processes in
emotion: Theory, methods, research, pages 92–120. Oxford University
Press, New York, 2001.
K R Scherer and P Ekman. Approaches to Emotion, chapter Expression
and the nature of emotion, pages 319–344. Lawrence Erlbaum Associates, Hillsdale, NJ, 1984.
D Schiller, J B Freeman, J P Mitchell, J S Uleman, and E A Phelps. A
neural mechanism of first impressions. Nature Neuroscience, 12(4):
508–14, 2009.
A Schirmer and S A Kotz. Beyond the right hemisphere: brain mechanisms mediating vocal emotional processing. Trends in Cognitive Sciences, 10(1):24–30, 2006.
N Schneiderman, I Fuentes, and I Gormezano. Acquisition and extinction of the classically conditioned eyelid response in the albino rabbit.
Science, 136:650–652, 1962.
M Schröder and J Trouvain. The German text-to-speech synthesis system
MARY: A tool for research, development and teaching. International
Journal of Speech Technology, 6(4):365–377, 2003.
M Schröder, L Devillers, K Karpouzis, J C Martin, C Pelachaud, C Peter,
H Pirker, B Schuller, J Tao, and I Wilson. What should a generic emotion markup language be able to represent? Affective Computing and
Intelligent Interaction, pages 440–451, 2007.
M Schröder, R Cowie, D Heylen, M Pantic, C Pelachaud, and B Schuller. Towards responsive sensitive artificial listeners. In Proceedings of the 4th International Workshop on Human-Computer Conversation, page 6, Sheffield, UK, 2008.
W Schultz. Getting formal with dopamine and reward. Neuron, 36(2):
241–63, 2002.
H T Schupp, A Öhman, M Junghöfer, A I Weike, J Stockburger, and
A O Hamm. The facilitated processing of threatening faces: an ERP
analysis. Emotion, 4(2):189–200, 2004.
G M Schwartz, C E Izard, and S E Ansul. The 5-month-old’s ability to
discriminate facial expressions of emotion. Infant Behavior and Development, 8(1):65–77, 1985.
C Sehlmeyer, S Schöning, P Zwitserlood, B Pfleiderer, T Kircher,
V Arolt, and C Konrad. Human fear conditioning and extinction in
neuroimaging: a systematic review. PLoS ONE, 4(6):e5865, 2009.
R M Shiffrin and W Schneider. Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84:127–190, 1977.
R M Shiffrin. Attention. In R C Atkinson, R J Herrnstein, G Lindzey, and R D Luce, editors, Stevens' Handbook of Experimental Psychology, pages 739–811. Wiley, New York, 1988.
B F Skinner. About Behaviorism. Random House, New York, 1976.
A Bechara, D Tranel, H Damasio, R Adolphs, C Rockland, and A R Damasio. Double Dissociation of Conditioning and Declarative Knowledge Relative to the Amygdala and Hippocampus in Humans. Science, 269:1115–8, 1995.
R Sommer. Personal Space: The Behavioral Basis of Design. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1969.
J E Steinmetz. Neuronal activity in the cerebellar interpositus nucleus during classical NM conditioning with a pontine stimulation CS. Psychological Science, 1:378–382, 1990.
J E Steinmetz, D J Rosen, P F Chapman, D G Lavond, and R F Thompson.
Classical Conditioning of the Rabbit Eyelid Response With a Mossy-Fibre Stimulation CS: I. Pontine Nuclei and Middle Cerebellar Peduncle Stimulation. Behavioral Neuroscience, 100(6):878, 1986.
J E Steinmetz, C G Logan, D J Rosen, J K Thompson, D G Lavond, and
R F Thompson. Initial localization of the acoustic conditioned stimulus projection system to the cerebellum essential for classical eyelid
conditioning. Proceedings of the National Academy of Sciences of the
United States of America, 84(10):3531–5, 1987.
J E Steinmetz, L L Sears, M Gabriel, Y Kubota, and A Poremba. Cerebellar interpositus nucleus lesions disrupt classical nictitating membrane
conditioning but not discriminative avoidance learning in rabbits. Behavioural Brain Research, 45(1):71–80, 1991.
G Stenberg, S Wiking, and M Dahl. Judging Words at Face Value: Interference in a Word Processing Task Reveals Automatic Processing of
Affective Facial Expressions. Cognition & Emotion, 12(6):755–782,
1998.
D Stokols. Environmental psychology. Annual Review of Psychology, 29:
253–295, 1978.
K T Strongman. Specific emotions theory. In The psychology of emotion,
chapter 8, pages 132–151. John Wiley & Sons, Oxford, England, 1987.
J R Stroop. Studies of interference in serial verbal reactions. Journal of
Experimental Psychology, 18:643–662, 1935.
G J Stuart and B Sakmann. Active propagation of somatic action potentials into neocortical pyramidal cell dendrites. Nature, 367(6458):
69–72, 1994.
L W Swanson and G D Petrovich. What is the amygdala? Trends in
Neurosciences, 21(8):323–331, 1998.
J W Tanaka and M J Farah. Parts and Wholes in Face Recognition. The
Quarterly Journal of Experimental Psychology Section A, 46(2):225–
245, 1993.
S E Taylor and S Thompson. Stalking the elusive “vividness” effect.
Psychological Review, 89(2):155–181, 1982.
T Tazumi and H Okaichi. Effect of lesions in the lateral nucleus of the
amygdala on fear conditioning using auditory and visual conditioned
stimuli in rats. Neuroscience Research, 43(2):163–170, 2002.
L A Thompson and D W Massaro. Before you see it, you see its parts:
evidence for feature encoding and integration in preschool children and
adults. Cognitive Psychology, 21(3):334–62, 1989.
R F Thompson. The Neurobiology of learning and memory. Science,
233:941–947, 1986.
R F Thompson. In search of memory traces. Annual Review of Psychology, 56:1–23, 2005.
S M Thurman, M A Giese, and E D Grossman. Perceptual and computational analysis of critical features for biological motion. Journal of
Vision, 10(12):1–14, 2010.
S S Tomkins. Affect theory. In P Ekman and K R Scherer, editors, Approaches to emotion, pages 163–195. Erlbaum, Hillsdale, NJ, 1984.
J Vallverdú and D Casacuberta. Modelling Hardwired Synthetic Emotions: TPR 2.0. In J Vallverdú and D Casacuberta, editors, Handbook
of Research on Synthetic Emotions and Social Robots, pages 452–463.
IGI Global, 2009.
J Van den Stock, R Righart, and B de Gelder. Body expressions influence
recognition of emotions in the face and voice. Emotion, 7(3):487–94,
2007.
J D Velásquez. Modeling emotions and other motivations in synthetic agents. In Proceedings of the National Conference on Artificial Intelligence, pages 10–15. Citeseer, 1997.
P F M J Verschure, T Voegtlin, and R J Douglas. Environmentally mediated synergy between perception and behavior in mobile robots. Nature, 425:620–624, 2003.
H Wallbott. Bodily expression of emotion. European Journal of Social
Psychology, 28(6):879–896, 1998.
K L Walters and R D Walk. Perception of emotion from body posture.
Bulletin of the Psychonomic Society, 24(5):329, 1986.
S S Wang, W Denk, and M Hausser. Coincidence detection in single
dendritic spines mediated by calcium release. Nature Neuroscience, 3
(12):1266–1273, 2000.
J B Watson. Behaviorism. University of Chicago Press, Chicago, 1930.
T Wehrle and K R Scherer. Towards Computational Modeling of Appraisal Theories. In K R Scherer, A Schorr, and T Johnstone, editors, Appraisal processes in emotion: Theory, methods, research, pages 350–368. Oxford University Press, New York, 2001.
N M Weinberger. Physiological memory in primary auditory cortex: characteristics and mechanisms. Neurobiology of Learning and Memory, 70(1-2):226–251, 1998.
N M Weinberger. Specific long-term memory traces in primary auditory
cortex. Nature Reviews Neuroscience, 5(4):279–290, 2004.
G L Wenk. The nucleus basalis magnocellularis cholinergic system: one
hundred years of progress. Neurobiology of Learning and Memory, 67:
85–95, 1997.
C Whissell. The dictionary of affect. In R Plutchik and H Kellerman,
editors, Emotion: Theory, research, and experience. Academic Press,
New York, 1989.
L M Wilcox, R S Allison, S Elfassy, and C Grelik. Personal space in
virtual reality. ACM Transactions on Applied Perception, 3(4):412–428,
2006.
P Winkielman, K C Berridge, and J L Wilbarger. Unconscious affective reactions to masked happy versus angry faces influence consumption behavior and judgments of value. Personality & Social Psychology Bulletin, 31(1):121–35, 2005.
L Wittgenstein. Philosophical Investigations. Basil Blackwell, Oxford,
1963.
S N Young and M Leyton. The role of serotonin in human mood and social interaction. Insight from altered tryptophan levels. Pharmacology,
Biochemistry, and Behavior, 71(4):857–65, 2002.
R B Zajonc. On the primacy of affect. American Psychologist, 39(2):
117–123, 1984.
Y Zong, H Dohi, and M Ishizuka. Multimodal presentation markup language MPML with emotion expression functions attached. In Proceedings of the International Symposium on Multimedia Software Engineering, pages 359–365. IEEE Computer Society, 2000.