A conversation with a 3D face
M. Bilvi², V. Carofiglio¹, M.T. Cassotta¹, B. De Carolis¹,
F. de Rosis¹, G. Diomede¹, R. Grassano¹, C. Pelachaud²

¹ Gruppo Interfacce Intelligenti - Dipartimento di Informatica – Università di Bari
http://aos2.uniba.it:8080/IntInt.html
² Dipartimento di Informatica e Sistemistica – Università “La Sapienza” di Roma

1. Introduction
Natural conversation involves more than speech. Humans communicate using language in combination with many other signals: gestures (pointing at something, emphasising object dimensions, …), gaze (making eye contact, looking down, up, or at a particular object, …), facial expressions, body postures and so on. In addition, while communicating, people exhibit a behaviour that is consistent with their personality, their affective state and the context in which the conversation takes place.
Our aim, in the context of the MagiCster project (ref.), is to build a conversational interface that makes use of non-linguistic signals when delivering information, in order to achieve an effective and natural communication with humans or other artificial agents. To this end, the best way to incorporate non-linguistic conversational aspects into an interface is to allow users to interact with an embodied Animated Agent that has communicative and expressive capabilities similar to those exhibited by humans (body postures, gestures, facial expressions, lip movements, voice intonation, etc.). This is considered to be a necessary feature of a conversational agent (ref.); however, in our opinion, simply providing a 2D or 3D character, a full-body virtual human or a face with multimodal communicative functions is not enough to achieve a believable behaviour. The various definitions of believability discussed in the literature involve several dimensions: personality, affect, social intelligence, goal-oriented behaviour, reactivity and responsiveness, consistent evolution, and so on (ref.). In our case, this means providing the agent with a personality, a social role and, consequently, social relationships, allowing it to express its emotions but also to refrain from showing them, consistently with the cognitive context of the conversation and with the agent's goals. In addition to this reflexive capability, our agent has to exhibit a rich expressivity during the conversation, displaying the communicative functions typically used in human discursive strategies: syntactic, dialogic, meta-cognitive, performative, deictic, adjectival and belief-relation functions (for more details …).
The demo we propose shows a system that allows a natural conversation between a user and a reflexive conversational agent, embodied in a 3D face, which is able to express all these communicative functions through verbal and non-verbal signals, in a way that is consistent with its personality and with the cognitive context of the conversation. To do this, a set of factors is used to trigger and regulate the display, and the intensity of display, of these communicative functions. Examples of such factors are the agent's personality, the features of the interlocutor, the social and private relationship between them and, in the particular case of emotion display, the nature and valence of the felt emotion. Since this paper is aimed at illustrating a system demo, we focus on the architectural aspects of the agent rather than on its cognitive modelling (more details can be found in …).
2. MagiCster Conversational Agent Architecture
In the architecture of our Agent, we make a main distinction between two components: the Agent's Mind and its Body. This distinction is motivated by the opportunity to couple the Mind component with different Bodies (virtual humans, 3D faces, 2D characters, and so on), without constraining the symbolic representation of the conversation to a particular Body, and by delegating the surface realization of every dialog move to the Body itself.
At this stage of our work, the Body we use is a combination of a 3D face model (Greta), developed by the University of Roma “La Sapienza”, and a speech synthesiser (Festival), developed by the University of Edinburgh in the scope of MagiCster. We will use the name ‘Greta’ for the ‘3D talking head’ that results from the combination of the two modules. Greta is capable of expressing the main (verbal and nonverbal) communicative functions foreseen for our conversational agent: to ensure that her behavior is ‘believable’, we have to provide her with an appropriate input, which results from some ‘reasoning ability’, that is, from a ‘Mind’.
An outline of the Mind of Greta is shown in Figure 1; it includes several modules, all aimed at generating a dialog whose moves are enriched combinations of conversational acts. Let us see in more detail what this means and how it is implemented.
The main input of Mind is a dialog goal in a particular domain. From this goal, an overall discourse plan is produced for the Agent (either through a planner or by retrieving an appropriate ‘recipe’ from a plan library).

Figure 1. Outline of the Mind module: User Moves, the Discourse Plan and the Social Context are the inputs of Mind, which produces the XML specification of the next move.
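As an illustration of the plan-retrieval step, a plan library can be as simple as a table from dialog goals to ‘recipes’. The following Python sketch is purely illustrative; the goal names, the recipe structure and the function name are our own invention, not the actual MagiCster implementation.

# Minimal sketch of discourse-plan retrieval from a plan library.
# Goal names and recipe structures are invented for illustration.
PLAN_LIBRARY = {
    "inform-diagnosis": [
        ("greet", {}),
        ("inform", {"topic": "disease"}),
        ("suggest", {"topic": "therapy"}),
    ],
}

def select_discourse_plan(dialog_goal, planner=None):
    """Return a discourse plan for the goal: either retrieve a stored
    'recipe' from the plan library or fall back to a planner."""
    if dialog_goal in PLAN_LIBRARY:
        return list(PLAN_LIBRARY[dialog_goal])
    if planner is not None:
        return planner(dialog_goal)
    raise KeyError(f"No recipe or planner for goal: {dialog_goal}")

plan = select_discourse_plan("inform-diagnosis")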
This plan represents the way the Agent will try to achieve the specified communicative goal during the conversation. However, the way the dialog goes on is a function not only of this plan but also of the User Moves and of what we call the Social Context. The Social Context is a knowledge base which includes two main components:
- an Agent's Mental State, which is a dynamic BDI model of the rational and affective components of the Agent's mind, and
- a Domain Knowledge Base, which describes the objects, the events and the actions that may occur in the domain and may influence the Agent's mental state in a way that (as we will see later on) depends on her ‘personality’.
The output of the Mind module is an enriched dialog move, expressed according to an XML specification which is interpretable by the Body. XML is used as a middleware level in order to overcome integration problems between the two components.
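The paper does not specify how the Social Context is encoded; as a minimal sketch, with all field names assumed by us, it could be represented along these lines:

# Sketch of the Social Context knowledge base; attribute names are
# illustrative, not the actual MagiCster data structures.
from dataclasses import dataclass, field

@dataclass
class MentalState:
    """Dynamic BDI model of the Agent's rational and affective state."""
    beliefs: dict = field(default_factory=dict)     # e.g. {"user-suffers": 0.7}
    desires: list = field(default_factory=list)     # active communicative goals
    intentions: list = field(default_factory=list)  # adopted plan steps
    emotions: dict = field(default_factory=dict)    # e.g. {"sorry-for": 0.6}

@dataclass
class SocialContext:
    mental_state: MentalState
    domain_kb: dict    # objects, events and actions in the domain
    personality: dict  # traits modulating emotion activation and display
    relationship: str  # social relationship with the interlocutor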
2.1 Dialog Manager
The type of conversations we simulate in our first prototype of MagiCster are information-giving dialogs: in these dialogs, the main function of Greta is to provide some kind of information to the User in a given domain. She therefore has an overall ‘communication goal’, which she tries to achieve through an appropriate discourse plan. However, MagiCster is a ‘mixed-initiative’ system: the User may interrupt Greta in any phase of her discourse to ask questions; this opens a question-answering subdialog, after which Greta takes back control of the dialog by revising her discourse plan according to what has occurred so far. As we said, we are particularly interested in simulating the ‘affective’ aspects of conversations; for this reason, we take medical dialogs as the main application domain for our first prototype and will show examples in this domain.
As illustrated in Figure 2, Mind includes several modules:
- Hamlet, which is responsible for updating the Agent's mental state by deciding, according to the Social Context, whether a particular affective state of the Agent should be activated and with which intensity, and whether the felt emotion should be displayed or hidden. Hamlet is based on a dynamic Bayesian network (a toy sketch of this triggering mechanism is given after this list);
- the Dialogue Manager, which manages the dialogue flow and decides “what to say” in every dialogue turn. It generates the Agent's dialogue turn output according to the following control flow:
- after a ‘dialog goal’ has been specified, the Agent selects an appropriate discourse plan and generates the first move according to the first steps of this plan. At the end of this first move, the initiative is passed to the User, who can ask the Agent questions about the topic under discussion;
- the User input is translated into a symbolic communicative act (through a very simple interpretation process) and is passed to Hamlet, which updates Greta's mental state and possibly returns the name of an ‘affect’ that should be associated with the communicative act. This information, together with the User input and the state of the Social Context, is employed to produce the next dialogue move;
- this dialogue move may be very simple (e.g. greeting, thanking, and so on) or may correspond to a small discourse plan (for instance: describing an object and its properties); in both cases, the next step of the Dialogue Manager is to generate, through the MIDA module, an annotated output valid against the APML (Affective Presentation Markup Language) DTD (ref. deliverable). The move is translated into an XML string, in which the natural language act is enriched with the tags that the graphical and speech generation components of Greta need in order to produce the required expressions: rhetorical relations between communicative acts, deictic or adjectival components, certainty values, metacognitive or turn-taking expressions and so on (for more details, see Poggi and Pelachaud, …);
- the annotated XML dialogue act may now be passed to the Body Generator module (Greta), which interprets it and decides which signal to convey on which channel; in case of conflicts (more than one signal on the same channel at the same time), it solves the conflict through a dedicated module (the Goal-Media Prioritiser). The last step is to generate, from the XML structure, the FAP and Wave files that realize the talking head.
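The paper does not detail Hamlet's network. As a toy illustration of the kind of probabilistic triggering it performs, consider a single boolean hypothesis revised by Bayes' rule; all variable names, likelihoods and thresholds below are invented for the example and are much simpler than a real dynamic belief network.

# Toy illustration of probabilistic emotion triggering.
# Variables, probabilities and the intensity formula are invented;
# the actual Hamlet model is a dynamic Bayesian network.
def bayes_update(prior, p_obs_if_true, p_obs_if_false):
    """P(H | observation) by Bayes' rule for a boolean hypothesis H."""
    num = p_obs_if_true * prior
    return num / (num + p_obs_if_false * (1.0 - prior))

def trigger_emotion(belief_goal_threatened, empathy):
    """Activate 'sorry-for' with an intensity that grows with the belief
    that the interlocutor's goal is threatened and with empathy."""
    posterior = bayes_update(belief_goal_threatened, 0.9, 0.2)
    intensity = posterior * empathy
    return ("sorry-for", intensity) if intensity > 0.3 else (None, 0.0)

# e.g. prior belief 0.5, empathic personality 0.8:
print(trigger_emotion(0.5, 0.8))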
Figure 2. The architecture of the Dialogue Manager (DM): User Moves, the Discourse Plan and the Social Context feed the MagiCster Dialogue Manager (built on TRINDI), which interacts with Hamlet and MIDA to produce the XML specification for the Body Generator.
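The control flow listed above can be summarised in a few lines. In the following sketch every function is a stand-in for one of the modules of Figure 2; the bodies are trivial stubs of our own invention, not the actual MagiCster code.

# Sketch of one dialogue turn, following the control flow described above.
def interpret(user_input):
    """Map the User input to a symbolic communicative act (stub)."""
    return {"act": "ask", "content": user_input}

def hamlet_update(social_context, move):
    """Stub for Hamlet: update the mental state, possibly return an affect."""
    return "sorry-for" if "disease" in move["content"] else None

def select_next_move(move, affect):
    """Stub for the DM: choose the next act or small discourse plan."""
    return {"performative": "inform", "affect": affect,
            "text": "You have been diagnosed with angina pectoris."}

def mida_annotate(move):
    """Stub for MIDA: wrap the move into an APML-like string."""
    aff = f' affect="{move["affect"]}"' if move["affect"] else ""
    return f'<performative type="{move["performative"]}"{aff}>{move["text"]}</performative>'

def dialogue_turn(user_input, social_context):
    move = interpret(user_input)                  # symbolic communicative act
    affect = hamlet_update(social_context, move)  # felt affect, if any
    apml = mida_annotate(select_next_move(move, affect))
    return apml  # handed to the Body Generator (FAP and Wave files)

print(dialogue_turn("which disease do I have?", {}))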
In order to connect all the components (Hamlet, MIDA, Greta and the DM itself), we use a Java class (jcontroller), which controls activation, termination and information exchange, via sockets, for the various processes involved in the dialogue management.
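The actual controller is a Java class; the underlying pattern, exchanging strings with each component process over a TCP socket, can be sketched as follows (host and port values are arbitrary assumptions).

# Sketch of the controller pattern: exchange newline-terminated strings
# with a component (e.g. Hamlet, MIDA or the Body) over a TCP socket.
import socket

def send_and_receive(host, port, message):
    """Send one message to a component process and wait for its reply."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall((message + "\n").encode("utf-8"))
        reply = sock.makefile().readline()
    return reply.strip()

# e.g. hand an APML move to the Body Generator process:
# fap_path = send_and_receive("localhost", 5000, apml_string)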
The MagiCster dialogue manager has been built on top of the TRINDI framework (ref.), a toolkit developed in Prolog for building and experimenting with a dialogue move engine (DME) and an information state (IS). Briefly, TRINDI allows creating and manipulating an IS containing the information internally stored by an agent. A dialogue move engine, or DME, updates the information state on the basis of the observed dialogue moves and selects appropriate moves to be performed. TRINDIKIT proposes a general system architecture and also specifies formats for defining information states, update rules, dialogue moves and the associated algorithms. To build a dialogue move engine, one needs to provide definitions of update rules, moves and algorithms, as well as the internal structure of the information state. The DME forms the core of a complete dialogue system. Simple interpretation, generation, input and output modules are also provided by TRINDIKIT, to simulate an end-to-end dialogue system.
The IS and the DME modules can be “shaped” so as to represent the agent's mental state and to formalize the chosen dialogue strategy. In MagiCster we personalized a number of modules: the information state (IS) and a number of resources hooked up to it. In addition to the control module, which wires together the other modules and establishes the execution order, MagiCster includes the following components: input, which receives input from the user and allows handling synonyms; interpret, which interprets utterances as dialogue moves with some content; generate, which builds up the tagged dialogue acts; update, which updates the information state based on the interpreted moves; and select, which selects the next moves to perform. The last two constitute the DME of MagiCster. In particular, the update module consists of a set of update rules that update the information state after a dialogue move has been executed.
In MagiCster the IS is divided into a private part (the agent's set of beliefs and plans) and a shared part (information shared among the dialogue participants, established during the conversation). The IS contains a static part that remains constant during a dialogue (rules for interpreting utterances, updating the IS and selecting further moves, move definitions, plan libraries, static databases, etc.) and a dynamic one that changes during the dialogue.
3. An Example
To describe how the system works, we show a simple example of the output produced in a medical dialogue in which an artificial agent, representing a doctor, informs a patient (the user) about her current disease and explains the appropriate therapy to her.
After the greeting phase, the agent takes the initiative and informs the patient about her disease. Then, the patient can ask any question concerning the topic in focus of the conversation. For instance, she can ask for explanations about the disease, its severity, the related therapy and so on. The system answers in a way appropriate to the context, and the dialogue evolves in this way: the user may interrupt the agent at any moment to ask questions. Obviously, the type of user question may cause the selection of a different plan step or the activation of a different plan.
Figure 3 reports an example of dialogue in which the system output is annotated.
S: <performative type="inform" affect="sorry-for"> I'm sorry to tell you that
   you have been diagnosed as suffering from a <adjectival type="small">mild
   </adjectival> form of what we call angina pectoris. </performative>
U: what is angina?
S: <performative type="inform"><belief-relation type="gen-spec">This is
   </belief-relation> a spasm of the <deictic coord="0 30 0">chest</deictic>
   <belief-relation type="cause-effect"> resulting from </belief-relation>
   overexertion when the heart is diseased.</performative>
U: which is the therapy? how to cure it?
S: <performative type="suggest"> <belief-relation type="cause-effect"> To solve
   <deictic coord="10 10 0">your</deictic> problem </belief-relation> there are
   two drugs I would like you to take </performative>
   <performative type="inform"> <topic-comment type="comment"> The first one
   </topic-comment> is Aspirin <belief-relation type="sequence">and
   </belief-relation> <topic-comment type="comment"> the second one
   </topic-comment> is Adenolol </performative>
... and so on.
Figure 3. An example of dialogue act tagged according to the APML XML specification [?].
Annotations are translated into facial expressions and voice intonation. Table 1 reports how the Body Generator module interprets this annotation language, in the case of the 3D face, and how it turns the language tags into facial expressions.
Tag              Value                             Facial expression
certainty        certain                           small frown
                 uncertain
adjectival       small, subtle, tiny, difficult    small aperture of the eye
performative     suggest                           small raised eyebrow, look at, head a bit aside right
                 inform                            look at
deictic                                            gaze direction to referent
turn allocation  give turn                         look at
affective        happy-for                         smile
                 sorry-for

Table 1. How the Body Generator turns APML tags into facial expressions.
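Read as data, Table 1 is a lookup from APML tag/value pairs to facial signals. The following minimal sketch assumes a dictionary structure of our own; only the signal names are taken from the table.

# Lookup from APML (tag, value) pairs to facial signals, as in Table 1.
TAG_TO_SIGNALS = {
    ("certainty", "certain"):         ["small frown"],
    ("adjectival", "small"):          ["small aperture of the eye"],
    ("performative", "suggest"):      ["small raised eyebrow", "look at",
                                       "head a bit aside right"],
    ("performative", "inform"):       ["look at"],
    ("deictic", None):                ["gaze direction to referent"],
    ("turn-allocation", "give-turn"): ["look at"],
    ("affective", "happy-for"):       ["smile"],
}

def signals_for(tag, value=None):
    return TAG_TO_SIGNALS.get((tag, value), [])

When two tags request the same channel at the same time, the conflict is resolved by the Goal-Media Prioritiser mentioned in Section 2.1.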
Figure 4. Some of Greta's expressive wrinkles.
4. Conclusion and Future Work
In this paper, we have presented the architecture of a conversational agent, embodied in a 3D face (Greta), that tries to achieve a believable behavior during the conversation. Its behavior has to reflect what the agent's mind has decided according to the personality and the other factors represented in the Social Context. In particular, our aim is to build a reflexive agent, that is, an agent who, when feeling an emotion, “decides” whether to display it immediately or not. The architecture we proposed has been developed to this aim. In the future, we plan to look at how to deal with multi-agent dialogues and how to link the dialogue manager to a high-level dialogue planner.
References
E. André, T. Rist, S. van Mulken, M. Klesen and S. Baldes: The automated design of believable dialogues for animated presentation teams. In S. Prevost, J. Cassell, J. Sullivan and E. Churchill (Eds): Embodied Conversational Characters. MIT Press, 2000.
J. Arafa, P. Charlton, A. Mamdani and P. Fehin: Designing and building Personal Service Assistants with personality. In S. Prevost and E. Churchill (Eds): Proceedings of the Workshop on Embodied Conversational Characters, Tahoe City, 1999.
G. Ball and J. Breese: Emotion and personality in a conversational agent. In S. Prevost, J. Cassell, J. Sullivan and E. Churchill (Eds): Embodied Conversational Characters. MIT Press, 2000.
J. Bates: The role of emotion in believable agents. Communications of the ACM, 37, 7, 1994.
S. Carberry and L. Schroeder: Recognizing and conveying attitude and its underlying motivation. Second Workshop on ‘Attitudes, Personality and Emotions in User-Adapted Interaction’, 2001.
J. G. Carbonell: Towards a process model of human personality traits. Artificial Intelligence, 1980.
C. Castelfranchi, F. de Rosis, R. Falcone and S. Pizzutilo: Personality traits and social attitudes in multiagent cooperation. Applied Artificial Intelligence, 12, 7-8, 1998.
C. Castelfranchi: Affective appraisal versus cognitive evaluation in social emotions and interactions. In A. Paiva (Ed): Affective Interactions. Springer, 2000.
C. Castelfranchi, F. de Rosis and V. Carofiglio: Can computers deliberately deceive? A simulation attempt for Turing's ‘Imitation Game’. Submitted for publication.
G. Clore and A. Ortony: The semantics of the affective lexicon. In V. Hamilton, G. H. Bower and N. H. Frijda (Eds): Cognitive perspectives on emotion and motivation. Kluwer, 1988.
F. de Rosis, F. Grasso and D. Berry: Refining instructional text generation after evaluation. Artificial Intelligence in Medicine, 17, 1999.
F. de Rosis, F. Grasso, C. Castelfranchi and I. Poggi: Modeling conflict-resolution dialogs. In R. Dieng and J. Mueller (Eds): Computational Conflicts: Conflict Modeling for Distributed Intelligent Systems. Kluwer, 2000.
F. de Rosis and F. Grasso: Affective natural language generation. In A. Paiva (Ed): Affective Interactions. Springer LNAI 1814, 2000.
C. D. Dryer: Dominance and valence: A two-factor model for emotion in HCI.
P. Ekman and W. Friesen: Unmasking the face: A guide to recognizing emotions from facial clues. Prentice Hall, 1978.
C. Elliott and G. Siegle: Variables influencing the intensity of simulated affective states. Proceedings of the AAAI Spring Symposium on Mental States, 1993.
C. Elliott: Research problems in the use of a shallow Artificial Intelligence model of personality and emotion. Proceedings of the 12th AAAI Conference, 1994.
C. Elliott: Why boys like motorcycles: using emotion theory to find structure in humorous stories. Proceedings of the Workshop on EBAA '99, 1999.
J. Fogg and C. Nass: Silicon sycophants: the effects of computers that flatter. International Journal of Human-Computer Studies, 46, 1997.
N. H. Frijda and J. Swagerman: Can computers feel? Theory and design of an emotional system. Cognition and Emotion, 1987.
E. H. Hovy: Affect in text. In: Generating natural language under pragmatic constraints (Chapter 4). Lawrence Erlbaum Associates, ????
E. Hudlicka and J.-M. Fellous: Review of computational models of emotions. Where?
C. Isbister and C. Nass: Personality in conversational characters: building better digital interaction partners using knowledge about human personality preferences and perceptions. In S. Prevost and E. Churchill (Eds): Proceedings of the Workshop on Embodied Conversational Characters, 1999.
J. C. Lester, S. G. Towns, C. B. Callaway, J. L. Voerman and P. J. FitzGerald: Deictic and emotive communication in animated pedagogical agents. In S. Prevost, J. Cassell, J. Sullivan and E. Churchill (Eds): Embodied Conversational Characters. MIT Press, 2000.
A. B. Loyall and J. Bates: Personality-rich believable agents that use language. Proceedings of Autonomous Agents, 1997.
R. McCrae and O. P. John: An introduction to the Five-Factor model and its applications.
Y. Moon and C. Nass: How “Real” are Computer Personalities? Communication Research, 23, 6, 1996.
Y. Moon and C. Nass: Are computers scapegoats? Attributions of responsibility in human-computer interactions. International Journal of Human-Computer Studies, 49, 1998.
C. Nass, Y. Moon, B. J. Fogg, B. Reeves and C. D. Dryer: Can computer personalities be human personalities? International Journal of Human-Computer Studies, 43, 1995.
C. Nass, B. J. Fogg and Y. Moon: Can computers be teammates? International Journal of Human-Computer Studies, 1996.
C. Nass, C. Isbister and E.-J. Lee: Truth is beauty: Researching embodied conversational agents. In S. Prevost, J. Cassell, J. Sullivan and E. Churchill (Eds): Embodied Conversational Characters. MIT Press, 2000.
A. Ortony, G. L. Clore and A. Collins: The cognitive structure of emotions. Cambridge University Press, 1988.
A. Ortony: Subjective importance and computational models of emotions. In V. Hamilton, G. H. Bower and N. H. Frijda (Eds): Cognitive perspectives on emotion and motivation. Kluwer, 1988.
R. Pfeifer and D. W. Nicholas: Toward computational models of emotion. In L. Steels and J. A. Campbell (Eds): Progress in Artificial Intelligence. Ellis Horwood, 1985.
R. Pfeifer: Artificial Intelligence models of emotion. In V. Hamilton, G. H. Bower and N. H. Frijda (Eds): Cognitive perspectives on emotion and motivation. Kluwer, 1988.
I. Poggi and C. Pelachaud: Emotional meaning and expression in animated faces. In A. Paiva (Ed): Affective Interactions. Springer, 2000.
E. Rich: User modeling via stereotypes. In A. Kobsa and W. Wahlster (Eds): User Modeling in Dialog Systems.
I. C. Taylor, F. R. McInness, S. Love, J. C. Foster and M. A. Jack: Providing animated characters with designated personality profiles. In S. Prevost and E. Churchill (Eds): Proceedings of the Workshop on Embodied Conversational Characters, Tahoe City, 1999.
A. M. Turing: Computing Machinery and Intelligence. Mind, London N.S., 1950.
M. A. Walker, J. A. Cahn and S. J. Whittaker: Improvising linguistic style: Social and affective bases for agent personality. Proceedings of Autonomous Agents, 1997.
C. Wegman: Emotion and argumentation in expression of opinion. In V. Hamilton, G. H. Bower and N. H. Frijda (Eds): Cognitive perspectives on emotion and motivation. Kluwer, 1988.
Main References
1. E. André, T. Rist, S. van Mulken, M. Klesen, and S. Baldes. The automated design of believable dialogues for animated presentation teams. In S. Prevost, J. Cassell, J. Sullivan and E. Churchill, editors, Embodied Conversational Characters. MIT Press, Cambridge, MA, 2000.
2. G. Ball and J. Breese. Emotion and personality in a conversational agent. In S. Prevost, J. Cassell, J. Sullivan and E. Churchill, editors, Embodied Conversational Characters. MIT Press, Cambridge, MA, 2000.
3. J. Cassell, C. Pelachaud, N. I. Badler, M. Steedman, B. Achorn, T. Becket, B. Douville, S. Prevost, and M. Stone. Animated conversation: Rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents. In Computer Graphics Proceedings, Annual Conference Series, pages 413-420. ACM SIGGRAPH, 1994.
4. J. Cassell, J. Bickmore, M. Billinghurst, L. Campbell, K. Chang, H. Vilhjalmsson, and H. Yan. Embodiment in conversational interfaces: Rea. In Proceedings of CHI'99, pages 520-527, Pittsburgh, PA, 1999.
5. J. Cassell and M. Stone. Living hand to mouth: Psychological theories about speech and gestures in interactive dialogue systems. In AAAI '99 Fall Symposium on Psychological Models of Communication in Collaborative Systems, 1999.
6. B. De Carolis, C. Pelachaud and I. Poggi. Verbal and nonverbal discourse planning. In Proceedings of the Agents 2000 Workshop on Achieving Human-Like Behavior in Interactive Animated Agents, Barcelona, 2000.
7. P. Ekman and W. Friesen. Unmasking the Face: A guide to recognizing emotions from facial clues. Prentice-Hall, 1975.
8. C. Elliott. The Affective Reasoner: A process model of emotions in a multi-agent system. Technical Report No. 32, The Institute for the Learning Sciences, Northwestern University, 1992.
9. E. H. Hovy. Automated Discourse Generation using Discourse Structure Relations. Artificial Intelligence, 63, (1993), 341-385.
10. J. C. Lester, S. G. Towns, C. B. Callaway, J. L. Voerman, and P. J. FitzGerald. Deictic and emotive communication in animated pedagogical agents. In S. Prevost, J. Cassell, J. Sullivan and E. Churchill, editors, Embodied Conversational Characters. MIT Press, Cambridge, MA, 2000.
11. M. Maybury. Communicative Acts for Explanation Generation. International Journal of Man-Machine Studies, 37, (1992), 135-172.
12. J. D. Moore. Participating in Explanatory Dialogues: Interpreting and Responding to Questions in Context. ACL-MIT Press Series in NLP, 1995.
13. J. D. Moore and C. Paris. Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information. Computational Linguistics, 19(4), (1993), 651-694.
14. A. Ortony, G. L. Clore and A. Collins. The Cognitive Structure of Emotions. Cambridge University Press, 1988.
15. R. Picard. Affective Computing. MIT Press, 1997.
16. I. Poggi, C. Pelachaud, and F. de Rosis. Eye communication in a conversational 3D synthetic agent. AI Communications, Special Issue on Behavior Planning for Life-Like Characters and Avatars, 2000.
17. J. Rickel and W. L. Johnson. Animated agents for procedural training in virtual reality: Perception, cognition, and motor control. Applied Artificial Intelligence, 13:343-382, 1999.