Multimodal Dialogue Analysis
INOUE, Masashi
Yamagata University
29-Nov-09 @FIU Dr. Tao Li’s Group

Name of the discipline
• Computational Social Linguistics
  – Society influences language use
• Conversation Analysis (CA)
• Discourse Analysis (DA)

OVERVIEW (1/5)

Layers of investigation
• Data: sensing (objective); device development and signal processing
• Information: event detection (ambiguous); pattern recognition
• Knowledge: pattern discovery (subjective); data mining

Major Conferences and Journals
• ICMI-MLMI
  – ICMI (User Interface) and MLMI (Dialogue Analysis) merged in 2009
• Some in multimedia or NLP conferences
  – ACM Multimedia
  – ACL
  – etc.

Research Initiatives in Europe
• CHIL Corpus
• AMI Corpus
  – Augmented multi-party interaction
  – http://corpus.amiproject.org/
• SSPNET
  – A European network of excellence in social signal processing
  – http://sspnet.eu/

FIRST EXAMPLE (2/5)

Paper 1 (ICMI-MLMI 2009)
• “Discovering group nonverbal conversational patterns with topics” by Dinesh Babu Jayagopi, Daniel Gatica-Perez (IDIAP)
• Goal: understand group dynamics (here, leadership) from conversational video

Method
• Feature descriptor
  – Time slices of conversation are treated as documents
  – Different time scales show different patterns: a 1-min scale captures monologue, while a 5-min scale captures a lot of interaction
  – Speaking energy / speaking status
• Bag of nonverbal patterns (NVP)
  – Speech length, number of turns, successful interruptions
• Method (what’s new)
  – Unsupervised
  – A topic model (LDA) shows which features are prominent

Feature categories
1. Generic group patterns (the group as a whole)
  – silence, one-speaker, two-speaker, other, evenly
2. Leadership patterns
  – proposed in the social psychology field
  – whether the designated leader (‘L’) or someone else (‘NL’) takes the maximum value
• 21-dimensional feature vectors (vocabulary)
• 6 tokens per slice (words)

Data
• AMI Corpus
  – Meetings for product design
  – 17 meetings (17 hours)
  – 4 participants per group: ‘Project Manager’, ‘User Interface Specialist’, ‘Marketing Expert’, and ‘Industrial Designer’

Result (3 topics)

Result (visual)
• The 3 topics can be used to characterize groups

Validation
• Comparison with ground truth (GT):
  – 5-min scale, top 8 documents per class
  – 3 annotators per meeting
  – GT is the label agreed on by the majority
• Accuracy: 62%, 100%, and 75% for the three classes
  – Autocratic, Participative, Free rein

Questions
• Feature representation (are the features good?)
  – Some magic numbers (e.g., 6 words/slice)
  – Balancing the vocabulary size and the number of words
• Modeling technique (is LDA a valid choice?)
  – Can we regard nonverbal cues (NVC) as words and group dynamics as topics?
  – An arbitrary number of topics leads to different interpretations

EXAMPLE 2 (3/5)

Paper 2 (MSSSC 2009)
• “Sensor-Based Organizational Engineering” by Daniel Olguin-Olguin, Alex (Sandy) Pentland (MIT Media Lab)
• [16] Olguin-Olguin, D., & Pentland, A. (2008). Social Sensors for Automatic Data Collection. 14th Americas Conference on Information
• Social signals / reality mining / sensible organizations
  – Introduction to their research projects
  – Use of sensors to collect data in groups
  – Combination of textual and survey data

Method
• Sensor data
  – Face / body / vocal behavior / space and environment / affective behavior
  – Cameras, infrared sensors, accelerometers, gyroscopes, inclinometers, pressure sensors, microphones, vibration, ...
• Pattern recognition
• Social network analysis
  – Who talks to whom
  – How well they are communicating

Case 1
• Communication in a call center
  – Wearable sensor devices (sociometric badges)
  – Differences in completion time (productivity)
  – 2,200 hours of data (100 hours per employee) and 880 reciprocal e-mails
• Findings
  – More interaction implied lower productivity
  – Higher variance in physical activity implied lower productivity

Case 2
• Communication in a marketing division
  – Face-to-face vs. e-mail
  – Questionnaire (satisfaction)
• Findings
  – Total communication = e-mail + face-to-face
  – Total communication correlates negatively with satisfaction

Questions
• Evaluation
  – Some domains do not have a clear definition of good/bad conversation
• Interestingness
  – High proximity -> low e-mail usage
• Implementation
  – Management practices for productivity improvements, customer satisfaction, and a better competitive position

OVERVIEW OF OUR PROJECT (3/5)

Pattern discovery from dialogue
• Goal: finding recurring events or event sequences in human face-to-face dialogues
• Why? Human communication skills are often based on experience or assumptions.
  – Enable smooth communication
  – Prevent problematic communication
• Task: have machines identify plausible hypotheses that humans cannot notice by observation

Target dialogue
• Psychotherapeutic interview (counseling)
  – Counseling at schools
  – Counseling at hospitals
• Increasing demand for therapists
• Shortage of qualified teachers
• Lack of effective training methods
• Therapist training setting (non-experimental)

Our Corpus (Private)
• Psychotherapeutic interviews (counseling)
  – Training opportunity for students
• 25 dialogues (approx. 2 hrs each, 21 hrs in total)
• Adding more dialogues (3/year)

Recording and data format
• Priority: minimize disturbance for participants
• Single camera, two microphones
• AVI -> MPEG
• Video data, transcript, annotation

Multimodality
• Verbal cues are dominant in defining meaning (textual information)
• What is the impact of nonverbal cues such as gestures, eye gaze, styles, timing, or context including social background?

Can gestures indicate misunderstandings?
• “Prediction of Misunderstanding from Gesture Patterns in Psychotherapy”, M. Inoue, R. Hanada, N. Furuyama, NII-2009-001E, Feb. 2009
• Negative result
  – We should rely on verbal content

Gestural features for Th & Cl
• Before / during / after the misunderstanding
• 5 / 10 / 50 sec. windows
• Frequency (x1, x2, x3)
• Frequency difference (x4, x5)
• Duration: mean, max, min (x6, x7, x8)
• Mean interval (x9)

Predictability by gestural cues
• Classification by linear discriminant analysis
  – Is there any feature that has a similar precision/recall tendency across different dialogues?
[Figure: precision (P) vs. recall (R) plots for Dialogue 1 and Dialogue 2]

SPEECH-GESTURE INTERACTION (4/5)

Analysis of speech type patterns
• Understand how therapists speak to their clients, based on speech type transition patterns
A taxonomy used in the counseling domain:
1. Closed question, e.g., “Do you mean ~?”
2. Open question, e.g., “Can you elaborate on that?”
3. Encouragement/Repeat, e.g., “Go on.” “I see.”
4. Rephrase, e.g., “So, you are thinking ~.”
5. Reflection of emotion
6. Reflection of meaning
7. Other

Relationship between speech and gesture
• Frequencies of speech types
• At the beginning or the end of dialogues
• How do speech patterns differ when gestures are taken into consideration?
  – Speech that co-occurs with gestures vs. speech without gestures
• Does this division lead to any changes in the speech type transition patterns?
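The speech type transition analysis can be sketched as counting adjacent speech-type pairs (bigrams) in a therapist’s utterance sequence, kept separately for speech with and without co-occurring gestures. This is a minimal sketch, not the authors’ method: the `Utterance` record, its `co_gesture` flag, and the speech-type identifiers are hypothetical. Grouping each transition by its source utterance is also our assumption, since the slides do not say how transitions between co-gesture and non-co-gesture speech were handled.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Tuple

# Speech types from the counseling taxonomy (identifiers are hypothetical labels).
SPEECH_TYPES = {
    "closed_question", "open_question", "encouragement_repeat", "rephrase",
    "reflection_emotion", "reflection_meaning", "other",
}


class Utterance(NamedTuple):
    speech_type: str  # one of SPEECH_TYPES
    co_gesture: bool  # hypothetical flag: True if a gesture overlaps the utterance


def split_transitions(utts: List[Utterance]) -> Dict[bool, Counter]:
    """Count adjacent transitions (type_i -> type_i+1), grouped by whether
    the source utterance co-occurs with a gesture."""
    for u in utts:
        if u.speech_type not in SPEECH_TYPES:
            raise ValueError(f"unknown speech type: {u.speech_type}")
    groups: Dict[bool, Counter] = {True: Counter(), False: Counter()}
    for cur, nxt in zip(utts, utts[1:]):
        groups[cur.co_gesture][(cur.speech_type, nxt.speech_type)] += 1
    return groups


def to_probabilities(counts: Counter) -> Dict[Tuple[str, str], float]:
    """Normalize transition counts into P(next type | current type)."""
    totals: Dict[str, int] = defaultdict(int)
    for (src, _dst), n in counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in counts.items()}


if __name__ == "__main__":
    # Toy sequence invented for illustration; not taken from the corpus.
    seq = [
        Utterance("open_question", True),
        Utterance("encouragement_repeat", False),
        Utterance("rephrase", True),
        Utterance("encouragement_repeat", False),
        Utterance("open_question", True),
        Utterance("rephrase", False),
    ]
    groups = split_transitions(seq)
    for flag, label in ((True, "co-gesture"), (False, "non-co-gesture")):
        print(label, to_probabilities(groups[flag]))
```

To contrast the beginning and the ending part of a dialogue, the same functions can be applied to a slice of the utterance list; where the cut points lie is not stated on the slides.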

Speech type transition in the beginning of the dialogue
[Figure: speech type transition diagrams for co-gesture and non-co-gesture speech; annotated “Sequences beginning from questioning” and “Generic encouragement”]

Speech type transition in the ending part of the dialogue
[Figure: speech type transition diagrams for co-gesture and non-co-gesture speech; annotated “Sequence beginning from question”, “Question and rephrase”, and “Sequence beginning from encouragement”]

Speech type transition in the ending part of the dialogue (beginner therapist)
[Figure: speech type transition diagrams for co-gesture and non-co-gesture speech; annotated “Reflection of therapists’ skill?”]

Summary
• Various speech sequence patterns can be interpreted as dialogue techniques.
• Patterns can be better understood when multimodality is taken into account.
• Discovered patterns could be used to assess the proficiency of therapists.

VERBAL CONTENT MISMATCH (5/5)

Mismatch between intention and perception of an utterance
• Therapists (Th) want to empower clients (Cl) through compliments.
• Clients want to be empowered by Th’s compliments.
• They share the same goal, but in reality this process does not go well.
  – Th tried to compliment, but Cl did not notice it
  – Some complimentary expressions make Cl uncomfortable
  – Th cannot figure out what makes Cl feel praised

Compliment as a counseling technique
• Therapists learn the concept and necessity of compliments through lectures, but
  – There is not enough analysis of failures.
  – Concrete examples of expressions are scarce.
• As a result, inexperienced Th often fail to use compliment techniques successfully in actual interviews.

Analysis approach
• How do mismatches arise at the vocabulary level?
  – The focus is on what Th say rather than how they say it.
• How do intention and perception differ in word usage?
  – The timing of utterances is ignored.
• To understand the generic tendency, multiple dialogues are mixed together into a word pool.

Data preparation
• Transcripts based on the videos of psychotherapeutic interviews (13 pairs, 27 participants)
• The transcripts are handed out to the participants.
• Both Th and Cl highlight Th’s utterances: those where Th intended a compliment (Th) or where Cl felt empowered (Cl).
• Highlighted utterances are extracted and put into the word pool.

Degree of discrepancy
• Number of utterances highlighted by therapists: 114 (M = 8.1)
• Number of utterances highlighted by clients: 69 (M = 4.6)
• Agreement: 6% (11/183)
[Figure: overlap diagram; Th marked (114), Cl marked (69), both marked (11)]

Pre-processing
• Morphological analysis
• Replacement of words (normalizing spelling variants; removing proper nouns for anonymity)
• Number of tokens: 4250
• Removal of low-frequency (tf < 2) or single-document (df < 2) words, focusing on generic (cross-dialogue) expressions
• Vocabulary size: 476 -> 113

Frequent words
Overall      TF   Therapist    TF   Client       TF
Say          64   Say          22   Say          42
Think        45   Thing        19   Very         30
Something    42   Think        18   Role         28
Role         41   That         18   Think        27
Very         40   Something    15   Something    27
Thing        39   Role         13   Well         25
Well         38   Well         13   Do           22
That         36   Do           11   Like this    22
Do           33   Great        10   Not          21
Like this    31   Not           9   Thing        20

Eliminate high-frequency words
[Figure: term frequency (total, th, cl) plotted against word ID, with the threshold used to eliminate high-frequency words]

Mid-frequency words
Overall      TF   Therapist    TF   Client       TF
Feeling      16   Feeling       7   Now          13
Now          16   Say           6   Hmm          11
How          13   Talk          6   Story        10
Story        13   How           5   Feeling       9
Hmm          12   Become        5   Yes           8
Listen       12   Thing         5   How           8
Yes          10   Tough         5   Listen        8
Then         10   Listen        4   Think         8
So           10   Absent        4   I             8
Think        10   Hard          4   Enter         8

Summary
• Problem: compliments used by therapists (Th) during counseling are not well accepted by clients (Cl).
• Data: 13 dialogue transcripts; utterances where Th intended the compliment technique and where Cl felt empowered by a compliment are marked.
• Analysis: to understand the mismatch at the vocabulary level, differences in word usage are explored in terms of frequency.
  – Th tend to use the compliment technique to focus on the difficulties of the problem.
  – Cl may be empowered by words referring to internal mental states.
• Future direction: understanding how mismatches are resolved, taking into account differences in therapist proficiency and dialogue topics.
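The pre-processing and frequency comparison in the compliment study can be sketched as follows, assuming tokens have already been produced by morphological analysis and treating each dialogue as one document for the df count. The pool layout (dialogue -> role -> tokens), the role keys `"th"` and `"cl"`, and both function names are hypothetical. Summing the two role counts into an overall count is also an assumption, though it agrees with the frequent-word totals (e.g., Say: 22 for Th plus 42 for Cl gives 64 overall).

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical layout: dialogue_id -> role ("th" or "cl") -> tokens of the
# therapist utterances highlighted by that role (already morphologically analyzed).
Pool = Dict[str, Dict[str, List[str]]]


def filter_vocabulary(pool: Pool, min_tf: int = 2, min_df: int = 2) -> Set[str]:
    """Drop words with tf < min_tf or df < min_df, where tf counts all
    occurrences in the pool and df counts the dialogues containing the word."""
    tf: Counter = Counter()
    df: Counter = Counter()
    for roles in pool.values():
        tokens = [t for role_tokens in roles.values() for t in role_tokens]
        tf.update(tokens)
        df.update(set(tokens))  # each dialogue counts once per word
    return {w for w in tf if tf[w] >= min_tf and df[w] >= min_df}


def role_frequencies(pool: Pool, vocab: Set[str]) -> Dict[str, Counter]:
    """Term frequencies over the kept vocabulary, per role and overall."""
    freqs: Dict[str, Counter] = {"th": Counter(), "cl": Counter()}
    for roles in pool.values():
        for role, tokens in roles.items():
            freqs[role].update(t for t in tokens if t in vocab)
    freqs["overall"] = freqs["th"] + freqs["cl"]  # assumed: overall = Th + Cl
    return freqs


if __name__ == "__main__":
    # Toy pool invented for illustration; not taken from the study.
    pool: Pool = {
        "d1": {"th": ["say", "think", "role"], "cl": ["say", "feeling"]},
        "d2": {"th": ["say", "tough"], "cl": ["think", "now"]},
        "d3": {"th": ["feeling"], "cl": ["now", "now"]},
    }
    vocab = filter_vocabulary(pool)
    freqs = role_frequencies(pool, vocab)
    print(sorted(vocab))
    print(freqs["overall"].most_common(3))
```

With dialogues as documents, df ≥ 2 already implies tf ≥ 2, so the tf test only has an effect if documents are defined at a finer grain (for example, per utterance).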