* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download A conversation with a 3D face - Dipartimento di Informatica
Survey
Document related concepts
Intelligence explosion wikipedia , lookup
Embodied cognition wikipedia , lookup
Ethics of artificial intelligence wikipedia , lookup
Ecological interface design wikipedia , lookup
History of artificial intelligence wikipedia , lookup
Existential risk from artificial general intelligence wikipedia , lookup
Agent-based model in biology wikipedia , lookup
Wizard of Oz experiment wikipedia , lookup
Philosophy of artificial intelligence wikipedia , lookup
Adaptive collaborative control wikipedia , lookup
Human–computer interaction wikipedia , lookup
Agent (The Matrix) wikipedia , lookup
Cognitive model wikipedia , lookup
Transcript
A conversation with a 3D face M Bilvi2,V. Carofiglio1, MT Cassotta1, B. De Carolis1, F de Rosis1, G. Diomede1, R. Grassano1, C. Pelachoud2 1 Gruppo Interfacce Intelligenti - Dipartimento di Informatica – Università di Bari http://aos2.uniba.it:8080/IntInt.html 2 1. Dipartimento di Informatica e Sistemistica – Università “La Sapienza” di Roma Introduction Natural conversation involves more than speech. Humans communicate using language and a lot of other signals in combination with speech: gestures (pointing at something, emphasising object dimensions, …), gaze (making eye contact, looking down, up, to a particular object, and so on,…), facial expression , body postures and so on. In addition , people, while communicating, exhibit a behaviour that is consistent with their personality, affective state and with the context in which the conversation takes place. Our aim, in the context of the Magicster project (rif), is to build a conversational communication interface that make use of non-linguistic signals when delivering information in order to achieve an effective and natural communication with other humans/artificial agents. To this aim, the best way to incorporate non-linguistic conversational aspects into an interface is to allow users to interact with an embodied Animated Agent which has communicative and expressive capabilities similar to those exhibited by humans (body postures, gestures, face expressions, lip movements, voice intonation, etc.). This is considered to be a necessary feature of a conversational agent (citare), however, in our opinion, simply providing a 2 or 3D character, a full-body virtual humans or a face with multimodal communicative functions is not enough for achieving a believable behaviour. The various definitions of believability discussed in the literature on the topic, involve several dimensions: personality, affect, social intelligence, goal-oriented behaviour, reactivity and responsiveness, consistent evolution, and so on (citare). In our case, this means providing the agent with a personality, a social role and, consequently, social relationships, allowing it to express its emotions but also to refrain from showing them consistently to the conversation cognitive context and the agent goals. In addition to this reflexive capability, our agent has to achieve a very rich expressivity during the conversation showing several communicative functions typically used in human discursive strategies such as the use of: syntactic, dialogic, meta-cognitive, performative, deictic, adjectival, and belief relation functions (for more details …). The demo we propose is about a system allowing a natural conversation between a user and a reflexive conversational agent embodied in a 3D face able to express, using verbal and non-verbal signals, all these communicative functions in a way that is consistent with its personality and with the cognitive conversation context. In order to do this, a set of factors are used to trigger and regulate the displaying and the intensity of display of these communicative functions. Example of this factors are: the agent’s personality, the features of the interlocutor, the social and private relationship between them, and in particular case of emotion display also the nature and valence of emotion. In this paper, since it is aimed at illustrating a system demo, we focus on architectural aspects of such an agent more than on its cognitive modeling aspects (more details can be found in …). 2. MagiCster Conversational Agent Architecture In the architecture of our Agent, we make a main distinction between two components: the Agent’s Mind and its Body. This distinction is motivated by the opportunity to couple the Mind component to different Bodies (virtual humans, 3D faces, 2D characters, and so on), without constraining the symbolic representation of the conversation to a particular Body and by delegating the surface realization of every dialog move to the Body itself. At this stage of our work, the Body we use is a combination of a 3D face model (Greta) that was developed by the University of Roma “La Sapienza” and a speech synthesiser (Festival) that was developed by the University of Edinburgh, in the scope of MagiCster. We will refer to ‘Greta’ to name the ‘3D talking head’ that results from the combination of the two mentioned modules. Greta is capable to express the main (verbal and nonverbal) communicative functions foreseen for our conversational agent: to insure that her behavior is ‘believable’, we have to provide her with an appropriate input, that results from some ‘reasoning ability’, that is from a ‘Mind’. An outline of the Mind of Greta is shown in Figure 1; it includes several modules, all aimed at generating a dialog whose moves are enriched combinations of conversational acts. Let’s see in more detail what this means exactly and how it is implemented. User Moves Discourse Plan MIND XML Spec The main input of Mind is a dialog goal in a particular domain. From this goal, an overall discourse plan is produced for the Agent (either through a planner or by retrieving an appropriate ‘recipe’ from a plan library). Social Context Figura 1. Outline of the Mind Module This plan represents the way that the Agent will try to achieve the specified communicative goal during the conversation. However, the way the dialog goes on is a function not only of this plan but also of the User Moves and of what we call the Social Context. The Social Context is a knowledge base which includes two main components: - an Agent’s Mental State, which is a dynamic BDI model of the rational and affective components of the Agent’s mind and a domain knowledge base, which describes the objects, the events and the actions that may occur in the domain and may influence the Agent’s mental state in a way that (as we will see later on) depends on her ‘personality’. The output of the Mind module is an enriched dialog move that is expressed according to an XML specification which is interpretable by the Body. XML is used as a middle-ware level in order to overcome integration problems between the two components. 2.1 Dialog Manager The type of conversations we simulate, in our first prototype of MagiCster, are information-giving dialogs: in these dialogs, the main function of Greta is to provide some kind of information to the User, in a given domain. She has therefore an overall ‘communication goal’, that she tries to achieve through an appropriate discourse plan. However, MagiCster is a ‘mixed-initiative’ system: so, the User may interrupt Greta in any phase of her discourse, to ask questions; this opens a question-answering subdialog, after which Greta takes again the dialog control, by revising her discourse plan according to what occurred so far. As we said, we are particularly interested in simulating the ‘affective’ aspects of conversations; for this reason, we take medical dialogs as the main application domain for our first prototype and will show examples in this domain. As illustrated in Figure 2, Mind includes several modules: - Hamlet, which is responsible for updating the Agent’s mental state by deciding, according to the Social Context, whether a particular affective state of the Agent should be activated and with which intensity and whether the felt emotion should be displayed or hidden. Hamlet is based on a dynamic bayesian network; - the Dialogue manager, which manages the dialogue flow and decides “what to say” in every dialogue turn. It generates the Agent’s dialogue turn output according to the following control flow: - after a ‘dialog goal’ has been specified, the Agent selects an appropriate discourse plan and generates the first move according to the first steps of this plan. At the end of this first move, the initiative is passed to the User, that can make questions to the Agent regarding the topic under discussion. - The User input is translated into a symbolic communicative act (through a very simple interpretation process) and is passed to Hamlet, that updates Greta’s mental state and possibly returns the name of an ‘affect’ that should be associated with the communicative act. This information, together with the User input and the social context state (???), is employed to produce the next dialogue move; - this dialogue move may be very simple (i.e. greet, thanks, and so on) or may correspond to a small discourse plan (for instance: describe an object and its properties); in both cases, the next step of the Dialogue Manager is to generate, through the MIDA module, an annotated output valid against the APML (Affective Presentation Markup Language) DTD (Rif deliverable). The move is translated into a XML string, in which the natural language act is enriched with the tags that the graphical and the speech generation components of Greta need, to produce the required expressions: rhetorical relations between communicative acts, deictic or adjectival components, certainty values, metacognitive or turn-taking expressions and so on (for more details, see Poggi and Pelachaud,….). - the annotated XML dialogue act may now be passed to the Body Generator module (Greta), that interprets it and decides which signal to convey on which channel and, in case of conflicts (more than one signal on the same channel at the same time), solves the conflict through a dedicated module (the Goal-Media Prioritiser). The last step is to generate, from the XML structure, the FAP and Wave files that realize the talking head. Social Context HAMLET User Moves Discourse plan MAGICSTER DIALOGUE MANAGER (TRINDI) MIDA XML Spec BODY Generator Figure 2. The Architecture of the Dialogue Manager (DM). In order to connect all the components (Hamlet, MIDA, Greta, and the DM itself), we use a Java class (jcontroller), which controls activation, termination and information exchange for the various processes involved in the dialogue management, via socket. The dialogue manager Magicster has been built on the top of TRINDI framework (rif), it is toolkit developed in Prolog for building and experimenting with a dialogue move engine (DME) and an information states (IS). Briefly TRINDI allow to create and manipulate an IS containing information internally stored by an agent. A dialogue move engine, or DME, updates the information state on the basis of the observed dialogue moves and selects appropriate moves to be performed. TRINDIKIT proposes a general system architecture and specifies, as well, formats for defining information states, update rules, dialogue moves and associated algorithms. To build a dialogue move engine, one needs to provide definitions of update rules, moves and algorithms, as well as the internal structure of the information state. The DME forms a core of a complete dialogue system. Simple interpretation, generation, input and output modules are also provided by TRINDIKIT, to simulate an end-to-end dialogue system. The IS and the DME modules can be “shaped” in order to represent the agent mental state and to formalize the chosen dialogue strategy. In MagiCster we personalized a number of modules: the information state (IS), and a number of resources hooked up to it. In addition to the control module, which wires together the other modules and establishes the execution order, MagiCster includes the following components: input, which receives input from the user, and allow to handle synonymous; interpret, which interprets utterances as dialogue moves with some content; generate, which buils up the tagged dialogue acts; update, which updates the information state based on interpreted moves; and select, which selects the next moves to perform. The last two constitute the DME of MagiCster. In particular, the update module consist of a set of update rules that allow for updating the information state after a dialogue move has been executed. In MagiCster the IS is divided in a private (the agent’s set of beliefs and plans) and a shared (information shared among dialogue participants established during the conversation) parts. The IS containes a static part that remains constant during a dialogue (rules for interpreting utterances, updating the IS, selecting further moves, move definition, plan libraries, static databases, etc.) and a dynamic one that changes during a dialogue. 3. An Example To describe how the system works, we show a simple example of outputs produced in the case of a medical dialogue in which an artificial agent representing a doctor informing a patient (the user) about her current disease and explaining her the appropriate therapy. After the greeting phase, the agent takes the initiative and inform the patient about her disease. Then, the patient can make any question concerning the topic in focus of the conversation. For instance, she can ask for explanation about the disease, its severity, the related therapy and so on. The system will answer in a way appropriate to the context. The dialogue then evolves in this way. The user may interrupt the agent in any moment and make questions. Obviously, the type of user question may cause the selection of a different plan step or a the activation of a different plan. The following Figure 3 reports an example of dialogue in which the system output is annotated. S: <performative type="inform" affect="sorry-for"> I'm sorry to tell you that you have been diagnosed as suffering from a <adjectival type="small">mild </adjectival> form of what we call angina pectoris. </performative> U: what is angina? S: <performative type="inform"><belief-relation type="gen-spec">This is </belief-relation> a spasm of <deictic coord="0 30 0">chest</deictic><beliefrelation type="cause-effect"> resulting from </belief-relation> overexertion when heart is diseased.</performative> U: which is the therapy? how to cure it? S: <performative type="suggest"> <belief-relation type="cause-effect"> To solve <deictic coord=”10 10 0>your</deictic> problem </belief-relation> there are two drugs I would like you to take </performative> <performative type="inform"> <topic-comment type="comment"> The first one </topic-comment> is Aspirin <belief-relation type="sequence">and</beliefrelation> <topic-comment type="comment"> the second one </topic-comment> is Adenolol </performative> … and so on. Figure 3 . An example of dialogue act tagged according to APML XML specification [?]. Annotations are translated into a face expression and voice intonations. The following table reports how the Body Generator module interprets this annotation language, in the case of the 3D face, and how it turnes the language tags into facial expressions. certainty certain: small frown adjectival: small, subtle, tiny, difficult: small aperture of the eye performative: suggest: inform: small raised eyebrow, look at, head a bit aside right **uncertain** look at deictic: gaze direction to referent turn allocation: give turn: look at affective: happy-for: sorry-for: smile ?? cath Table 1. sarebbe carino mettere la faccia per ogni riga nella table Figure 4. Some expressive Greta’s wrinkles. 4 Conclusion and Future Work In this paper, we have presented the architecture of a conversational agent embodied in a 3D face (Greta) that tries to achieve a believable behavior during the conversation. Its behavior has to reflect what the agent mind has decided according to the personality and other factors represented in the Social Context. In particular, our aim is to build a reflexive agent, that is, an agent who, when feeling an emotion, “decides” whether to display it immediately or not. The architecture we proposed has been developed to this aim. In the future, we plan to look at how to deal with multiagent dialogues and how to link the dialogue manger to a high -level dialogue planner. References (tutte da rivedere!) E. André, T. Rist, S. Van Mulken, M. Klesen and S. Baldes: The automated design of believable dialogues for animated presentation teams. In S. Prevost, J. Cassel, J.Sullivan and E.Churchill (eds), Embodied Conversational Characters. MIT Press, 2000. J Arafa, P Charlton, A Mamdani and P Fehin: Designing and building Personal Service Assistants with personality. Proceedings of the Workshop on Embodied Conversational Characters, S Prevost and E Churchill (Eds), Tahoe City, 1999. G. Ball and J. Breese: Emotion and personality in a conversational agent. In S. Prevost, J. Cassel, J.Sullivan and E.Churchill (eds), Embodied Conversational Characters. MIT Press, 2000. J. Bates: The role of emotion in believable agents. Communications of the ACM, 37, 7, 1994. S. Carberry and L. Schroeder: Recognizing and conveying attitude and its underlying motivation. Second Workshop on ‘Attitudes, Personality and Emotions in User-Adapted Interaction’. 2001. J. G. Carbonell: Towards a process model of human personality traits. Artificial Intelligence, 1980. C Castelfranchi, F de Rosis, R Falcone and S Pizzutilo: Personality traits and social attitudes in multiagent cooperation. Applied Artificial Intelligence, 12, 7-8, 1998. C. Castelfranchi: Affective appraisal versus cognitive evaluation in social emotions and interactions. In A Paiva (Ed): Affective Interactions. Springer, 2000. C. Castelfranchi, F. de Rosis and V. Carofiglio: Can computers deliberately deceive? A simulation attempt for Turing’s ‘Imitation Game’. submitted for publication. G Clore and A Ortony: The semantics of the affective lexicon. In V Hamilton, G H Bower and N H Frjida (Eds): Cognitive perspectives on emotion and motivation. Kluwer, 1988. F. de Rosis, F. Grasso and D. Berry: Refining instructional text generation after evaluation. Artificial Intelligence in Medicine. 17, 1999. F. de Rosis, F. Grasso, C. Castelfranchi and I. Poggi:Modeling conflict-resolution dialogs. In: R Dieng and J Mueller (Eds): Computational Conflicts: Conflict modeling for Distributed Intelligent Systems. Kluwer Publ Co, 2000. F. de Rosis and F. Grasso: Affective natural language generation. In: A. Paiva (Ed): Affective Interactions. Springer LNAI1814, 2000. C D Dryer: Dominance and valence: A two-factor model for emotion in HCI. P.Ekman and W. Friesen: Unmasking the face: A guide to recognizing emotions from facial clues. Prentice Hall Inc, 1978. C. Elliott and G. Siegle: Variables influencing the intensity of simulated affective states. Proceedings of the AAAI Spring Symposium on Mental States ’93. 1993. C Elliott: Research problems in the use of a shallow Artificial Intelligence model of personality and emotion. Proceedings of 12th AAAI Conference, 1994. C. Elliott: Why boys like motorcycles: using emotion theory to find structure in humorous stories. Proceedings of the Workshop on EBAA ’99, 1999. J. Fogg and C. Nass: Silicon sycophants: the effects of computers that flatter. International Journal of HumanComputer Studies, 46, 1997. N.H.Frjida and J Swagerman: Can computers feel? Theory and design of an emotional system. Cognition and Emotion, 1987 E. H. Hovy: Affect in text. In: Generating natural language under pragmatic constraints (Chapter 4). Lawrence Erlbaum Associates, Publishers, ???? E. Hudlicka and J-M. Fellous: Review of computational models of emotions. Where? C Isbister and C Nass: Personality in conversational characters: building better digital interaction partners using knowledge about human personality preferences and perceptions. In: S Prevost and E Churchill (Eds): Proceedings of the Workshop on Embodied Conversational Characters, 1999. J. C. Lester, S. G. Towns, C.B. Callaway, J.L. Voerman and P.J. FitzGerald: Deictic and emotive communication in animated pedagogical agents. In S. Prevost, J. Cassel, J.Sullivan and E.Churchill (eds), Embodied Conversational Characters. MIT Press, 2000. A. B. Loyall and J. Bates: Personality-rich believable agents that use language. Proceedings of Autonomous Agents 1997. R Mc Crae and O P John: An introduction to the Five-Factor model and its applications. Y Moon and C Nass: How “Real”! are Computer Personalties? Communication Research, 23, 6, 1996 Y.Moon and C. Nass: Are computers scapegoats? Attributions of responsibility in human-computer interactions. International Journal of Human-Computer Studies, 49, 1998. C Nass, Y Moon, B J Fogg, B Reeves and C D Dryer: Can computer personalities be human personalities? International Journal of Human-Computer Studies. 43, 1995. C. Nass, B.J. Fogg and Y. Moon: Can computers be teammates? International Journal of Human-Computer Studies. 1996. C. Nass, C. Isbister and E-J. Lee: Truth is beauty: Researching embodied conversational agents. In S. Prevost, J. Cassel, J.Sullivan and E.Churchill (eds), Embodied Conversational Characters. MIT Press, 2000. A. Ortony, G.L. Clore and A. Collins: The cognitive structure of emotions. Cambridge University Press, 1988. A.Ortony: Subjective importance and computational models of emotions. In V Hamilton, G H Bower and N H Frjida (Eds): Cognitive perspectives on emotion and motivation. Kluwer, 1988. R Pfeifer and D W Nicholas: Toward computational models of emotion. In L Steels and J A Campbell (Eds): Progress in Artificial Intelligence. Ellis Horwood, 1985. R Pfeifer: Artificial Intelligence models of emotion. In V Hamilton, G H Bower and N H Frjida (Eds): Cognitive perspectives on emotion and motivation. Kluwer, 1988. I. Poggi and C. Pelachaud: Emotional meaning and expression in animated faces. In A. Paiva (Ed): Affective Interactions. Springer, 2000. E. Rich: User modeling via stereotypes. In: A. Kobsa and W. Wahlster (Eds): User Modeling in Dialog Systems. I C Taylor, F R McInness, S Love, J C Foster and M A Jack: Providing animated characters with designated personality profiles. Proceedings of the Workshop on Embodied Conversational Characters, S Prevost and E Churchill (Eds), Tahoe City, 1999 A.M. Turing: Computing Machinery and Intelligence. Mind. London N.S., 1950. M. A. Walker, J. A. Cahn and S. J. Whittaker: Improvising linguistic style: Social and affective bases for Agent personality. Proceedings of Autonomous Agents 1997. C Wegman: Emotion and argumentation in expression of opinion. In V Hamilton, G H Bower and N H Frjida (Eds): Cognitive perspectives on emotion and motivation. Kluwer, 1988. Main References 1. Andre, T. Rist, S. van Mulken, M. Klesen, and S. Baldes. The automated design of believable dialogues for animated presentation teams. In S. Prevost J. Cassell, J. Sullivan and E. Churchill, editors, Embodied Conversational Characters. MITpress, Cambridge, MA, 2000. 2. G .Ball and J. Breese. Emotion and personality in a conversational agent. In S. Prevost J. Cassell, J. Sullivan and E. Churchill editors. Embodied Conversational Characters. MITpress, Cambridge, MA, 2000. 3. J. Cassell, C. Pelachaud, N.I. Badler, M. Steedman, B. Achorn, T. Becket, B. Douville, S. Prevost, and M. Stone. Animated conversation: Rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents. In Computer Graphics Proceedings, Annual Conference Series, pages 413--420. ACM SIGGRAPH, 1994. 1. J. Cassell, J. Bickmore, M. Billinghurst, L. Campbell, K. Chang, H. Vilhjalmsson, and H. Yan. Embodiment in conversational interfaces: Rea. In Proceedings of CHI'99, pages 520--527, Pittsburgh, PA, 1999. 2. J. Cassell and M. Stone. Living hand and mouth. Psychological theories about speech and gestures in interactive dialogue systems. In AAAI 99 Fall Symposium on Psychological Models of Communication in Collaborative Systems, 1999. 3. B. De Carolis., C. Pelachaud and I. Poggi. Verbal and nonverbal discourse planning. In Proceedings of International Agents 2000 Workshop on Achieving Human-Like Behavior in Interactive Animated Agents. Barcelona, 2000. 4. P.Ekman and W. Friesen. Unmasking the Face: A guide to recognizing emotions from facial clues. Prentice-Hall, Inc., 1975. 5. C. Elliott. An Affective Reasoner: A process model of emotions in a multiagent system, Technical Report No. 32 of The Institute for the Learning Sciences Northwestern University,1992. 6. E.H. Hovy. Automated Discourse Generation using Discourse Structure Relations. Artificial Intelligence, 63, (1993), 341-385. 7. J.C. Lester, S.G. Stuart, C.B. Callaway, J.L. Voerman, and P.J.Fitzgerald. Deictic and emotive communication in animated pedagogical agents. In S. Prevost J. Cassell, J. Sullivan and E. Churchill, editors, Embodied Conversational Characters. MITpress, Cambridge, MA, 2000. 8. M. Maybury. Communicative Acts for Explanation Generation. Int. J Man-Machine Studies . 37, (1992),135-172. 9. J.D. Moore. Participating in Explanatory Dialogues. Interpreting and Responding to Question in Context. ACL-MIT Press series in NLP, (1995). 10. J.D. Moore and C. Paris. Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information. Computational Linguistics, 19(4), (1993), 651-694. 11. A. Ortony, G.L. Clore and A. Collins. The Cognitive Structure of Emotions. Cambridge University Press, 1988. 12. R. Picard. Affective Computing, MIT Press, 1997. 13. I. Poggi, C. Pelachaud, and F. de Rosis. Eye communication in a conversational 3D synthetic agent. Special Issue on Behavior Planning for Life-Like Characters and Avatars of AI Communications. 2000. 14. J. Rickel and W.L. Johnson. Animated agents for procedural training in virtual reality: Perception, cognition, and motor control. Applied Artificial Intelligence, 13:343--382, 1999.