Multimodal Dialogue Analysis
INOUE, Masashi
Yamagata University
29-Nov-09 @FIU Dr. Tao Li’s Group
Name of the discipline
• Computational Social Linguistics
– Society influences language use
• Conversation Analysis (CA)
• Discourse Analysis (DA)
2
OVERVIEW (1/5)
3
Layers of investigation
Data
• Sensing (Objective)
• Device development and signal processing
Information
• Event detection (Ambiguous)
• Pattern recognition
Knowledge
• Pattern Discovery (Subjective)
• Data mining
4
Major Conferences and Journals
• ICMI-MLMI
– ICMI (User Interface) and MLMI (Dialogue
Analysis) merged in 2009
• Some in multimedia or NLP conferences
– ACM Multimedia
– ACL
– etc.
5
Research Initiatives In Europe
• CHIL Corpus
• AMI Corpus
– Augmented multi-party interaction
– http://corpus.amiproject.org/
• SSPNET
– A European network of excellence in social signal
processing
– http://sspnet.eu/
6
FIRST EXAMPLE (2/5)
8
Paper 1 (ICMI-MLMI 2009)
• "Discovering group nonverbal conversational
patterns with topics" by Dinesh Babu
Jayagopi, Daniel Gatica-Perez (IDIAP)
• Goal: Understand group dynamics (=
leadership) from conversational video
9
Method
• Feature descriptor
– Time slices of conversation (documents)
• different time scales show different patterns
– 1 min scale: monologue vs. 5 min scale: a lot of interaction
– Speaking energy/Speaking status
• Bag of non-verbal patterns (NVP)
– speech length, # of turns, successful interruptions
• Method (what’s new)
– Unsupervised
– Topic model (LDA) – which feature is prominent
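The slicing step above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the frame-level binary speaking-status input, the function names `frame_token` and `slice_to_documents`, and the four token names are all made up for the example.

```python
from collections import Counter

def frame_token(status):
    """Map one frame of binary speaking status (one flag per participant)
    to a coarse group-pattern token. Token names are hypothetical."""
    n_speaking = sum(status)
    if n_speaking == 0:
        return "silence"
    if n_speaking == 1:
        return "one_speaker"
    if n_speaking == 2:
        return "two_speaker"
    return "other"

def slice_to_documents(frames, frames_per_slice):
    """Cut a meeting into fixed-length time slices and return one
    bag of patterns (token counts) per slice."""
    documents = []
    for start in range(0, len(frames), frames_per_slice):
        window = frames[start:start + frames_per_slice]
        documents.append(Counter(frame_token(f) for f in window))
    return documents

# Toy input (not from the AMI corpus): 6 frames, 4 participants, 1 = speaking.
frames = [
    (1, 0, 0, 0), (1, 0, 0, 0), (0, 0, 0, 0),
    (1, 1, 0, 0), (0, 1, 0, 0), (1, 1, 1, 0),
]
docs = slice_to_documents(frames, frames_per_slice=3)
```

Changing `frames_per_slice` corresponds to changing the time scale (e.g., 1 min vs. 5 min). Count vectors of this kind are the usual input format for a topic model such as LDA.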
10
Feature categories
1. Generic group patterns: group as a whole
– silence, one-speaker, two-speaker, other, evenly
2. Leadership patterns:
– proposed in social psychology field
– position of designated leader (‘L’) or someone else
(‘NL’): taking maximum values
• 21 dimensional feature vectors (vocabulary)
• 6 tokens per slice (words)
11
12
Data
• AMI Corpus
– Meeting for product design
– 17 meetings (17 hours)
– 4 participants / group:
• ‘Project Manager’, ‘User Interface specialist’,
‘Marketing Expert’, and ‘Industrial Designer’.
13
Result (3 topics)
14
Result (visual)
3 topics
Can be used to characterize groups
15
Validation
• Comparison with ground truth (GT):
– 5 min scale, 8 top docs per class
– 3 annotators / meeting
– GT is the label agreed on by the majority
• Accuracy: 62%, 100%, 75% for each class
– Autocratic, Participative, Free rein
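The validation procedure above (majority-vote ground truth, then accuracy per class) might look like the sketch below. The label strings, the toy annotations, and the rule of skipping slices without a majority are assumptions for illustration; the slide does not say how ties were handled.

```python
from collections import Counter

def majority_label(labels):
    """Return the label chosen by a strict majority of annotators, or None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

def per_class_accuracy(predicted, ground_truth):
    """For each predicted class, the fraction of documents whose
    majority-vote ground truth matches the prediction."""
    hits, totals = Counter(), Counter()
    for pred, gt in zip(predicted, ground_truth):
        if gt is None:  # no majority agreement: skip this document
            continue
        totals[pred] += 1
        hits[pred] += int(pred == gt)
    return {c: hits[c] / totals[c] for c in totals}

# Hypothetical annotations from 3 annotators for 3 time slices.
annotations = [
    ["autocratic", "autocratic", "participative"],
    ["free_rein", "participative", "autocratic"],   # no majority
    ["participative", "participative", "participative"],
]
gt = [majority_label(a) for a in annotations]
predicted = ["autocratic", "free_rein", "autocratic"]
accuracy = per_class_accuracy(predicted, gt)
```

With 8 top documents per class, the reported 62%, 100%, and 75% would be consistent with 5, 8, and 6 matches respectively, although the slide does not give the raw counts.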
16
Questions
• Feature representation (Are they good?)
– Some magic numbers (e.g., 6 words/slice)
– Balancing vocabulary size and # of words
• Modeling technique (Is LDA a valid one?)
– Can we regard the NVC as words and Group
Dynamics as topics?
– Arbitrary number of topics, different
interpretation
17
EXAMPLE 2 (3/5)
18
Paper 2 (MSSSC 2009)
• "Sensor-Based Organizational Engineering” by
Daniel Olguin-Olguin, Alex (Sandy) Pentland
(MIT Media Lab)
• [16] Olguin-Olguin, D., & Pentland, A. (2008). Social
Sensors for Automatic Data Collection. 14th Americas
Conference on Information
• Social signals/Reality mining/Sensible
organizations
– Introduction to their research projects
– Use of sensors to collect data in groups
– Combination of textual and survey data
19
Method
• Sensor data
– Face/body/vocal behavior/space and
environment/affective behavior
– cameras, infrared sensors, accelerometers,
gyroscopes, inclinometers, pressure
sensors, microphones, vibration sensors, ...
• Pattern recognition
• Social network analysis
– Who talks to whom
– How well they are communicating
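A "who talks to whom" network can be built from interaction records roughly as below. This is a minimal sketch: the `(speaker, listener, seconds)` record format and both function names are assumptions, not the output format of the sociometric badges.

```python
from collections import defaultdict

def build_talk_graph(interactions):
    """Aggregate (speaker, listener, seconds) records into a weighted
    directed graph: graph[speaker][listener] = total seconds."""
    graph = defaultdict(lambda: defaultdict(float))
    for speaker, listener, seconds in interactions:
        graph[speaker][listener] += seconds
    return graph

def total_talk_time(graph, person):
    """Seconds the person talks to others plus seconds others talk to them."""
    outgoing = sum(graph[person].values()) if person in graph else 0.0
    incoming = sum(edges.get(person, 0.0) for edges in graph.values())
    return outgoing + incoming

# Hypothetical interaction log.
interactions = [
    ("A", "B", 30.0),
    ("B", "A", 10.0),
    ("A", "C", 5.0),
    ("A", "B", 20.0),
]
graph = build_talk_graph(interactions)
```

Edge weights give a crude measure of how much two people communicate; measuring how well they communicate would need further signals that this sketch does not model.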
20
Case 1
• Communication in a call center
– wearable sensor devices (sociometric badge)
– completion time difference (productivity)
– 2,200 hours of data (100 hours per employee) and
880 reciprocal e-mails
• Findings
– more interaction implies lower productivity
– higher variance in physical activity implies lower
productivity
21
Case 2
• Communication in a marketing division
– face-to-face vs. emails
– questionnaire (satisfaction)
• Findings:
– Total comm = email + face-to-face
– Total comm correlates negatively with satisfaction
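The relation between total communication and satisfaction can be checked with a plain Pearson correlation, sketched here. All numbers are invented for illustration and are not data from the paper; `pearson` is a helper written for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-employee values.
email = [10, 20, 30, 40]
face_to_face = [5, 5, 10, 10]
satisfaction = [5.0, 4.0, 3.0, 2.0]

# Total comm = email + face-to-face, as defined on the slide.
total_comm = [e + f for e, f in zip(email, face_to_face)]
r = pearson(total_comm, satisfaction)  # negative for this toy data
```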
22
Questions
• Evaluation
– Some domains do not have a clear definition of
good/bad conversation
• Interestingness
– High proximity -> low email usage
• Implementation
– management practices for productivity
improvements, customer satisfaction, and a better
competitive position
24
OVERVIEW OF OUR PROJECT (3/5)
25
Pattern discovery from dialogue
• Goal: Finding recurring events or event
sequences in human face-to-face dialogues.
• Why?: Human communication skills are often
based on experience or assumption.
– Enable smooth communication
– Prevent problematic communication
• Task: Use machines to identify plausible
hypotheses that humans cannot notice by
observation
26
Target dialogue
• Psychotherapeutic Interview (Counseling)
– Counseling at schools
– Counseling at hospitals
• Increasing demand for therapists
• Shortage of qualified teachers
• Lack of effective training methods
• Therapist training setting (non-experimental)
27
Our Corpus (Private)
• Psychotherapeutic interview (counseling)
– Training opportunity for students
• 25 dialogues (approx. 2 hrs each, 21 hrs in
total)
• Adding more dialogues (3/year)
30
Recording and data format
Priority: minimize disturbance for participants
Single Camera
Two microphones
AVI -> MPEG
Video Data
Transcript
Annotation
31
Multimodality
• Verbal cues are dominant in defining meaning
(textual information)
• What is the impact of nonverbal cues such
as gestures, eye gaze, style, timing, or
context, including social background?
32
33
Can gestures indicate
misunderstandings?
• “Prediction of Misunderstanding from Gesture
Patterns in Psychotherapy”, M. Inoue, R.
Hanada, N. Furuyama, NII-2009-001E, Feb.
2009
• Negative result
– We should rely on verbal content
34
Gestural Feature for Th & Cl
• Before/During/After the misunderstanding
• 5/10/50 sec. windows
• Frequency (x1; x2; x3)
• Frequency Difference (x4; x5)
• Duration (Mean & Max & Min) (x6; x7; x8)
• Mean Interval (x9)
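The feature types listed above can be computed from gesture intervals inside a time window, as in this sketch. The `(start_sec, end_sec)` annotation format, the function name, and the toy values are assumptions, and the code does not try to reproduce the exact x1 to x9 definitions, which the slide does not spell out.

```python
def gesture_features(gestures, window_start, window_end):
    """Summarise gestures that start inside [window_start, window_end).

    gestures: list of (start_sec, end_sec) tuples, sorted by start time.
    The interval is the gap between the end of one gesture and the
    start of the next.
    """
    inside = [(s, e) for s, e in gestures if window_start <= s < window_end]
    durations = [e - s for s, e in inside]
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(inside, inside[1:])]
    return {
        "frequency": len(inside),
        "duration_mean": sum(durations) / len(durations) if durations else 0.0,
        "duration_max": max(durations, default=0.0),
        "duration_min": min(durations, default=0.0),
        "mean_interval": sum(gaps) / len(gaps) if gaps else 0.0,
    }

# Hypothetical annotations for a 10 sec. window.
th = gesture_features([(1.0, 2.0), (4.0, 7.0), (12.0, 13.0)], 0.0, 10.0)
cl = gesture_features([(3.0, 3.5)], 0.0, 10.0)
frequency_difference = th["frequency"] - cl["frequency"]
```

Running the same function over the before, during, and after windows at 5, 10, and 50 sec. would give one feature vector per window setting.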
35
Predictability by gestural cues
• Classification by linear discriminant analysis
– Is there any feature that has a similar
precision/recall tendency across different
dialogues?
[Figure: precision (P) and recall (R) plots for Dialogue 1 and Dialogue 2, with points labeled 1 to 3]
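For reference, precision and recall for a binary "misunderstanding" classifier can be computed as below. The function name and the toy predictions are hypothetical; the classifier itself (linear discriminant analysis in the study) is not reproduced here.

```python
def precision_recall(predicted, actual):
    """Precision and recall for the positive class (True = misunderstanding)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical predictions for five windows of one dialogue.
predicted = [True, True, False, False, True]
actual = [True, False, True, False, True]
precision, recall = precision_recall(predicted, actual)
```

Computing this pair per feature and per dialogue allows the comparison the slide asks about: whether any feature shows a similar precision/recall tendency across dialogues.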
36
SPEECH-GESTURE INTERACTION
(4/5)
37
Analysis of speech type patterns
– Understand how therapists speak to their
clients based on speech type transition patterns
1. Closed question, e.g., "Do you mean ~?"
2. Open question, e.g., "Can you elaborate that?"
3. Encouragement/Repeat, e.g., "Go on." "I see."
4. Rephrase, e.g., "So, you are thinking ~."
5. Reflection of emotion
6. Reflection of meaning
7. Other
A taxonomy used in counseling domain
38
Relationship between speech and
gesture
• Frequencies of speech types
• At the beginning or the end of dialogues
• How do speech patterns look different when
gestures are taken into consideration?
• Speeches that co-occur with gestures
VS
Speeches without gestures
• Does the above division lead to any changes in the
speech type transition patterns?
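Counting speech type transitions separately for co-gesture and non-co-gesture speech could be done along these lines. The input format, the type labels, and in particular the choice to assign each transition to the group of its first utterance are assumptions; the slides do not state how transitions were assigned.

```python
from collections import Counter

def transition_counts(utterances):
    """Count speech type bigrams (current type -> next type).

    utterances: list of (speech_type, with_gesture) in dialogue order.
    Each transition is filed under the gesture flag of its first utterance.
    """
    counts = {True: Counter(), False: Counter()}
    for (cur_type, with_gesture), (next_type, _) in zip(utterances, utterances[1:]):
        counts[with_gesture][(cur_type, next_type)] += 1
    return counts

# Hypothetical therapist utterance sequence.
utterances = [
    ("closed_question", True),
    ("encouragement", False),
    ("encouragement", False),
    ("rephrase", True),
    ("open_question", False),
]
counts = transition_counts(utterances)
```

Comparing `counts[True]` with `counts[False]` shows whether the frequent transitions differ once gestures are taken into account.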
39
Speech type transition in the beginning of the dialogue
Sequences beginning from questioning
Co-gesture
Generic encouragement
Non-Co-Gesture
40
Speech type transition in the ending part of the dialogue
Sequence beginning from question
Question and rephrase
Co-Gesture
Sequence beginning from encouragement
Non Co-Gesture
41
Speech type transition in the ending part of the dialogue
(Beginner therapist)
Co-Gesture
Reflection of therapists’ skill?
Non Co-Gesture
42
Summary
• Various speech sequence patterns can be
interpreted as the techniques in dialogues.
• Patterns could be better understood when
multimodality is taken into account.
• Discovered patterns could be used to assess
the proficiency of therapists.
43
VERBAL CONTENT MISMATCH (5/5)
44
Mismatch between intention and
perception over an utterance
• Therapists (Th) want to empower clients (Cl)
by compliments.
• Clients want to be empowered by Th through
their compliments.
• They share the same goal, but this process does
not go well in reality.
– Th tried a compliment but Cl did not notice it
– Some complimentary expressions are
uncomfortable for Cls
– Th cannot figure out how Cls are praised
45
Compliment as a counseling
technique
• Therapists learn the concept and necessity of
compliment through lectures, but
– There is not enough analysis of failures.
– Concrete examples of expression are scarce.
As a result
• Inexperienced Th often fail to use
compliment techniques successfully in actual
interviews.
46
Analysis approach
• How mismatches arise in terms of
vocabulary.
– The focus is on what Ths say rather than how they say it.
• How intention and perception differ
in word usage
– Timing of the utterances is ignored.
• To understand the generic tendency, multiple
dialogues are mixed together into a word pool.
47
Data preparation
• Transcripts based on the videos of
psychotherapeutic interviews (13 pairs, 27
participants)
• They are given to the participants.
• Both Th and Cl highlight the parts of Th's speech where
Th intended a compliment (Th) or Cl felt
empowered (Cl).
• Highlighted speeches are extracted and put
into the word pool.
48
Degree of discrepancy
• Number of speeches highlighted by therapists:
– 114 (M=8.1)
• Number of speeches highlighted by clients:
– 69 (M=4.6)
• Agreement:
– 6% (11/183)
[Figure: Venn diagram of Th marked (114), Cl marked (69), and both marked (11)]
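The agreement figure follows from the counts on the slide, as this small calculation shows. The Jaccard-style variant at the end is an added assumption: it applies only if the 114 and 69 counts each include the 11 shared utterances, which the slide does not say.

```python
th_marked = 114    # speeches highlighted by therapists
cl_marked = 69     # speeches highlighted by clients
both_marked = 11   # speeches highlighted by both

# Denominator used on the slide: all marks, 114 + 69 = 183.
total_marks = th_marked + cl_marked
agreement = both_marked / total_marks
print(f"{agreement:.1%}")  # prints 6.0%

# Alternative (assumption): overlap relative to distinct marked speeches.
jaccard = both_marked / (th_marked + cl_marked - both_marked)
```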
49
Pre-processing
• Morphological analysis
• Replacement of words (normalizing spelling variation, removing
proper nouns for anonymity)
• Number of tokens: 4250
• Removal of low-frequency (tf<2) or single-document
(df<2) words, to focus on
generic (cross-dialogue) expressions
• Vocabulary size: 476 -> 113
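The tf/df filtering step can be sketched as follows. The function name and the toy documents are hypothetical, and the morphological analysis that produces the tokens is assumed to have been done already.

```python
from collections import Counter

def filter_vocabulary(documents, min_tf=2, min_df=2):
    """Keep words whose total term frequency is at least min_tf and
    that occur in at least min_df documents."""
    tf, df = Counter(), Counter()
    for tokens in documents:
        tf.update(tokens)        # every occurrence counts
        df.update(set(tokens))   # each document counts once per word
    return {w for w in tf if tf[w] >= min_tf and df[w] >= min_df}

# Hypothetical tokenized highlights from three dialogues.
documents = [
    ["feeling", "say", "say", "tough"],
    ["feeling", "say", "now"],
    ["now", "now", "hard"],
]
vocabulary = filter_vocabulary(documents)
```

Any word that appears in two or more documents necessarily has tf >= 2, so with these thresholds the df condition is the binding one.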
50
Frequent words
Overall            Therapist          Client
Word        TF     Word        TF     Word        TF
Say         64     Say         22     Say         42
Think       45     Thing       19     Very        30
Something   42     Think       18     Role        28
Role        41     That        18     Think       27
Very        40     Something   15     Something   27
Thing       39     Role        13     Well        25
Well        38     Well        13     Do          22
That        36     Do          11     Like this   22
Do          33     Great       10     Not         21
Like this   31     Not          9     Thing       20
51
Eliminate high frequency words
[Figure: word frequency (0 to 70) plotted against word id (0 to 120) for total, th, and cl, with a threshold line marked]
52
Mid frequency words
Overall          Therapist        Client
Word      Tf     Word      Tf     Word      Tf
Feeling   16     Feeling    7     Now       13
Now       16     Say        6     Hmm       11
How       13     Talk       6     Story     10
Story     13     How        5     Feeling    9
Hmm       12     Become     5     Yes        8
Listen    12     Thing      5     How        8
Yes       10     Tough      5     Listen     8
Then      10     Listen     4     Think      8
So        10     Absent     4     I          8
Think     10     Hard       4     Enter      8
53
Summary
• Problem: Compliments used by therapists (Th) during counseling are not
well received by clients (Cl).
• Data: 13 dialogue transcripts; utterances where Th intended a compliment
and utterances where Cl felt empowered by a compliment are marked.
• Analysis: To understand the mismatch at the vocabulary level, differences in
word usage are explored in terms of frequency.
– Th tend to use the compliment technique to focus on the difficulty of the
problem.
– Cl may be empowered by words referring to internal mental states.
• Future direction: Understanding how mismatches are resolved, taking
differences in therapist proficiency and dialogue topic into account.
54