Trying to Understand Misunderstanding: How Robust Can Spoken Natural Language Dialogue Systems Be?
Ronnie W. Smith
East Carolina University
Sponsors
• National Science Foundation
• Duke University
• East Carolina University
• DARPA
• BBN
Collaborators
1987-1994: Dr. Alan Biermann, Dr. Ruth Day,
Dr. Robert Rodman, Richard Hipp,
Barry Koster, Dania Egedi, Robin Gambill,
Curry Guinn
1994-2000: Dr. Steve Gordon, Robert Hoggard,
Shannon Pollard, Chris Shaffer, Greg Keim,
Jeremy Mauget, Devender Singh, Jason Carrow,
Brian Manning
Quick Outline
• Initial motivation and results
• Careful studies
• Not so careful studies
• What I’d really like to do!
Initial Research Focus
Develop a model of dialog processing that:
• Uses problem-solving to achieve target goal
• Exploits contextual knowledge from the discourse and user
• Allows mixed-initiative interaction (computer or user
controlled)
• Allows voice interaction
• Operates in real time to permit experimentation
Experimental Evaluation
Issues:
• Is the computational model practical?
• Would users' dialog behavior change as they gained experience and were given control of the dialog?
Hypotheses
As users gained experience and were given the initiative, they:
• Would complete the task faster
• Would speak fewer utterances
• Would speak longer utterances
• Would spend more time thinking before producing a response
Side effect: analyzable data
Data Collection: The Circuit Fix-It Shop
• Constructed at Duke University
• Assisted users in repairing an electronic circuit
• 125 word vocabulary, speaker-dependent connected speech
• Varying initiative fixed per interaction
• 141 dialogs, 2840 user utterances
• 114 or 84% of the dialogs were successfully completed
• 50% utterance recognition rate
• 81.5% correct interpretation rate
Sample Interaction – Part 1
C: This is the Circuit Fix-It Shop. How may I help you?
U: This RS111 circuit is not working.
C: I am familiar with that circuit. The LED is supposed to be displaying
alternately flashing one and seven.
U: The LED is flashing seven.
C: Did you mean to say that the LED is flashing?
U: The LED is displaying a flashing seven.
C: Is the one on the LED displaying?
U: No.
C: Is anything else on the LED displaying other than the NOT** flashing seven?
U: The seven is flashing.
C: Is anything else on the LED displaying other than the flashing seven?
U: No.
C: What is the switch at when the LED is displaying only a flashing seven?
U: It is up.
C: What is the knob at when the LED is displaying only a flashing seven?
U: The knob is at ten.
Sample Interaction – Part 2
C: Is there a wire between connector one eight and connector four one?
U: Where is connector four one?
C: Below the white knob with the orange circle is a pink box, called
transistors. Now, in the upper right of this pink box, called transistors
is connector four one.
U: The wire is not there.
C: Did you mean to say that there IS** a wire between connectors one
eight and four one?
U: There is no wire.
C: Add a wire between connector one eight and connector four one.
Initial Miscommunication Handling Techniques
Error-correcting Parsing (Hipp)
• Convert input to “nearest” grammatical utterance
• “nearest” is determined by a cost matrix for insertions,
deletions, and substitutions of words
• Costs are not all the same (e.g., “a” vs. “not”)
Tell the user what went wrong
• Only tell user what computer’s interpretation was
• Only when misrecognition caused contradictory
interpretation (but required for only 48% of these)
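The error-correcting parsing idea above can be illustrated with a weighted, word-level edit distance. This is only a minimal sketch, not the parser used in the Circuit Fix-It Shop: the cost values, the names `WORD_COST`, `parse_cost`, and `nearest_grammatical`, and the use of an enumerated list of grammatical utterances in place of a real grammar are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical costs: function words such as "a" are cheap to insert, delete,
# or replace; meaning-changing words such as "not" are expensive.
WORD_COST: Dict[str, float] = {"a": 0.2, "the": 0.2, "not": 2.0, "no": 2.0}
DEFAULT_COST = 1.0


def word_cost(word: str) -> float:
    """Cost of inserting or deleting a single word."""
    return WORD_COST.get(word, DEFAULT_COST)


def substitution_cost(heard: str, target: str) -> float:
    """Replacing a word costs as much as the more important of the two."""
    if heard == target:
        return 0.0
    return max(word_cost(heard), word_cost(target))


def parse_cost(heard: Sequence[str], candidate: Sequence[str]) -> float:
    """Weighted edit distance that turns `heard` into `candidate`."""
    rows, cols = len(heard) + 1, len(candidate) + 1
    dist: List[List[float]] = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):  # delete every heard word
        dist[i][0] = dist[i - 1][0] + word_cost(heard[i - 1])
    for j in range(1, cols):  # insert every candidate word
        dist[0][j] = dist[0][j - 1] + word_cost(candidate[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + word_cost(heard[i - 1]),      # deletion
                dist[i][j - 1] + word_cost(candidate[j - 1]),  # insertion
                dist[i - 1][j - 1]
                + substitution_cost(heard[i - 1], candidate[j - 1]),
            )
    return dist[-1][-1]


def nearest_grammatical(heard: str, grammatical: Sequence[str]) -> Tuple[str, float]:
    """Return the lowest-cost grammatical utterance and its cost."""
    words = heard.split()
    cost, best = min((parse_cost(words, g.split()), g) for g in grammatical)
    return best, cost
```

With the recognizer output "stays no wire I connector one zero four" and two candidate utterances that differ only in "no" versus "a", the unequal costs steer the choice toward the candidate that keeps "no", since dropping a negation is priced higher than repairing the other words.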
What to Do Next?
Get a better speech recognizer! Well...
• Better is not the same as perfect!
• Better => stretch its limits anyway
• There will probably always be ungrammatical spoken inputs.
• There will always be mismatched speaker/hearer background knowledge.
What to Do Next?
Investigate strategies for the prevention, detection, and repair
of miscommunication in natural language dialog
• Detailed analysis of existing dialogs
• Development and evaluation of strategies for handling
miscommunication
Effects of Variable Initiative on Linguistic Behavior in Human-Computer Spoken Natural Language Dialog
• Smith and Gordon (Computational Linguistics, March 1997)
• Based on Circuit Fix-It Shop Data
• Based on classifying utterances according to task phase
– Introduction: establish task purpose
– Assessment: establish current system behavior
– Diagnosis: establish cause for errant behavior
– Repair: establish completion of correction
– Test: establish correctness of behavior
Result 1: Relative Number of Utterances
                  Computer-Controlled     User-Controlled
Subdialog Type    Average    Percent      Average    Percent
Introduction      2.9        5.2%         2.6        11.6%
Assessment        15.4       27.4%        7.9        35.1%
Diagnosis         11.8       21.0%        6.7        29.8%
Repair            2.9        5.2%         0.3        1.3%
Test              23.2       41.2%        5.0        22.2%
Conclusion: Experienced users tend not to discuss details they
can handle themselves.
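The Percent columns in Result 1 appear to be each phase's average utterance count divided by the total across the five phases. The short sketch below recomputes them under that assumption; the function name `utterance_shares` is made up for illustration, and because the slide's averages are already rounded, the recomputed shares agree with the slide only to within about 0.1 percentage point.

```python
from typing import Dict


def utterance_shares(averages: Dict[str, float]) -> Dict[str, float]:
    """Convert per-phase average utterance counts into percentages of the total."""
    total = sum(averages.values())
    return {phase: 100.0 * avg / total for phase, avg in averages.items()}


# Average utterances per subdialog type, copied from the Result 1 table.
COMPUTER_CONTROLLED = {
    "Introduction": 2.9, "Assessment": 15.4, "Diagnosis": 11.8,
    "Repair": 2.9, "Test": 23.2,
}
USER_CONTROLLED = {
    "Introduction": 2.6, "Assessment": 7.9, "Diagnosis": 6.7,
    "Repair": 0.3, "Test": 5.0,
}

if __name__ == "__main__":
    for label, data in (("Computer", COMPUTER_CONTROLLED), ("User", USER_CONTROLLED)):
        for phase, share in utterance_shares(data).items():
            print(f"{label:8s} {phase:12s} {share:5.1f}%")
```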
Result 2: Frequency of User Subdialog Transitions
Subdialog Type    Computer Controlled    User Controlled
Introduction      0.0%                   0.0%
Assessment        11.4%                  19.4%
Diagnosis         0.0%                   6.8%
Repair            0.0%                   62.5%
Test              23.7%                  92.8%
Conclusion: Computer initiates most subdialogs except when
experienced users are completing the task.
Result 3: Predictability of Subdialog Transitions
[Figure: Idealized Transition Model over the states I, A, D, R, T, F]
Result 3: Predictability of Subdialog Transitions
[Figure: Empirical Transition Model over the states I, A, D, R, T, F; each transition is labeled with a computer controlled % and a user controlled %]
Percentage “normal” dialogs
• Computer-controlled: 64%
• User-controlled: 33%
Study Conclusions
Computer controlled dialogs:
• Have an orderly pattern of computer-initiated subdialogs
• Have terse user responses
• Are not amenable to user-correction during miscommunication
User controlled dialogs:
• Are less orderly
• Contain more user-initiated subdialogs
• Indicate user willingness to exploit growing expertise
Analysis of Strategies for Selective Utterance Verification
• Smith (ANLP, 1997; IJHCS, 1998)
• Motivation---miscommunication due to speech recognition errors
Spoken:     I want to fix this circuit
Recognized: power a six a circuit
Spoken:     there is no wire on connector one zero four
Recognized: stays no wire I connector one zero four
Verification Subdialogs
Computer: This is the circuit fix-it shop. How may I help you?
Spoken: I want to fix a circuit.
Recognized: power a six a circuit.
Computer: Did you mean to say there is a power circuit?
WHEN TO USE THIS??
Goal: Selective Verification
• Initiate a verification subdialog only when it is believed to
be needed.
• Criteria for need: sufficiently unsure you’ve fully
understood AND the need to fully understand is
sufficiently great.
• Terminology
– Under-verification---system generates an incorrect meaning that is
not verified
– Over-verification---a correct meaning is verified
• Ideal: minimize under-verifications while keeping over-verifications to a minimum as well
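The two terms defined above can be turned into rates by counting outcomes over a set of utterances. The sketch below is one plausible way to do that; the `Outcome` record and the choice to divide by the total number of utterances are assumptions, since the slides do not state which denominator the reported rates use.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Outcome:
    """What happened to one user utterance (hypothetical record layout)."""
    meaning_correct: bool  # did the system generate the correct meaning?
    verified: bool         # did the system start a verification subdialog?


def verification_rates(outcomes: Iterable[Outcome]) -> Tuple[float, float]:
    """Return (under_verification_rate, over_verification_rate).

    Assumption: both rates are fractions of all utterances.
    """
    items = list(outcomes)
    if not items:
        raise ValueError("need at least one outcome")
    # Under-verification: an incorrect meaning that was not verified.
    under = sum(1 for o in items if not o.meaning_correct and not o.verified)
    # Over-verification: a correct meaning that was verified anyway.
    over = sum(1 for o in items if o.meaning_correct and o.verified)
    return under / len(items), over / len(items)
```

An incorrect meaning that is verified, or a correct meaning that is left alone, counts toward neither rate; those are the two outcomes the selective strategy is aiming for.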
Measurements of Uncertainty
• Parse Cost---sum of costs incurred by error-correcting
parser in transforming input to a grammatical utterance
• Expectation Cost---how expected was the response given
the dialog context
Measuring Utterance Importance
• Unexplored
• Domain-dependent?
• Fixed-threshold (depends on risk due to miscommunication)
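The decision rule described on these slides (verify only when uncertainty is high enough and the utterance matters enough) might be sketched as follows. Everything specific here is an assumption for illustration: the function names, the default thresholds, and in particular combining the two costs by simple addition, which the slides do not specify.

```python
def uncertainty(parse_cost: float, expectation_cost: float, strategy: str) -> float:
    """Uncertainty score under one of three illustrative strategies."""
    if strategy == "parse":
        return parse_cost
    if strategy == "expectation":
        return expectation_cost
    if strategy == "combination":
        # Assumption: a plain sum; the actual combination is not given here.
        return parse_cost + expectation_cost
    raise ValueError(f"unknown strategy: {strategy!r}")


def should_verify(
    parse_cost: float,
    expectation_cost: float,
    *,
    strategy: str = "combination",
    uncertainty_threshold: float = 2.0,  # hypothetical value
    importance: float = 1.0,             # hypothetical per-utterance importance
    importance_threshold: float = 0.5,   # fixed threshold, set by the risk of miscommunication
) -> bool:
    """Start a verification subdialog only if both criteria are met."""
    unsure = uncertainty(parse_cost, expectation_cost, strategy) > uncertainty_threshold
    important = importance > importance_threshold
    return unsure and important
```

For example, `should_verify(3.0, 0.0, strategy="parse")` is true while `should_verify(3.0, 0.0, strategy="expectation")` is false, which shows how the choice of uncertainty measure alone can change whether the same utterance is verified.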
Basic Results
Strategy            Under-verification Rate    Over-verification Rate
Parse Cost Only     2.6%                       19.2%
Expectation Only    1.4%                       31.8%
Combination         4.7%                       7.6%
Generalization: This result holds under most threshold values.
Study Conclusions
• Key parameters for selecting strategy and threshold value:
expectation strength, word recognition rate, and penalty for
under-verifications
• Results depend on the level of system initiative
• Results assume all verification subdialogs succeed
• Limits to verification subdialogs => other strategies are
needed.
General Problem
• “Knowing” when you don’t know.
OR
• “Suspecting” when you don’t know.
BUT
• If you don’t know, how do you talk about it?
THUS
• System must provide some information on its
understanding of context
Domain-Level Techniques
• Detection of dead ends (suspicious error path checks out
OK)
C: The LED is supposed to be displaying alternately flashing
one and seven.
U: LED displaying seven (misrecognized and understood as
“LED displaying flashing seven”)
C: Is anything else on the LED on?
U: No.
C: Is there a wire between connectors 18 and 41?
U: Yes
C: Are you sure that the LED is displaying only a flashing
seven?
Domain-Level Techniques
• Detection of impossibilities (description not possible given
“known” conditions)
C: What is the LED displaying?
U: The LED is displaying a one on the left hand side.
C: Is anything else on the LED on?
U: No. (NOTE: misstatement---seven also displaying)
C: Is the one on the LED flashing?
U: Yes.
C: Are you sure that the one on the LED is flashing?
U: Yes.
C: Are you sure that the seven on the LED is off?
U: The LED is displaying alternately flashing one and seven.
Domain-Level Techniques
In general, these are detections of logical inconsistencies, but
are there general dialogue strategies for discussing them?
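One way to picture the detection of impossibilities is as a check of reported observations against simple domain rules, followed by "Are you sure that ...?" questions about every observation involved in a violated rule. The sketch below is hypothetical throughout: the `Rule` layout, the proposition names, and the example rule (the one cannot be flashing while the seven is off) are invented to mirror the sample dialog and are not taken from the actual system.

```python
from typing import Dict, List, NamedTuple


class Rule(NamedTuple):
    """If `if_prop` has `if_value`, then `then_prop` must have `then_value`."""
    if_prop: str
    if_value: bool
    then_prop: str
    then_value: bool


def violated_rules(beliefs: Dict[str, bool], rules: List[Rule]) -> List[Rule]:
    """Rules whose premise holds but whose conclusion is contradicted."""
    violated = []
    for rule in rules:
        premise_holds = beliefs.get(rule.if_prop) == rule.if_value
        conclusion_known = rule.then_prop in beliefs
        if premise_holds and conclusion_known and beliefs[rule.then_prop] != rule.then_value:
            violated.append(rule)
    return violated


def propositions_to_recheck(beliefs: Dict[str, bool], rules: List[Rule]) -> List[str]:
    """Observations the system should ask the user to confirm, in order."""
    props: List[str] = []
    for rule in violated_rules(beliefs, rules):
        for prop in (rule.if_prop, rule.then_prop):
            if prop not in props:
                props.append(prop)
    return props


# Hypothetical domain rule and user reports, mirroring the dialog above.
RULES = [Rule("one_flashing", True, "seven_on", True)]
REPORTED = {"one_flashing": True, "seven_on": False}
```

Here `propositions_to_recheck(REPORTED, RULES)` yields `["one_flashing", "seven_on"]`, matching the order of the two "Are you sure" questions in the sample dialog. The sketch only identifies what to ask about; how to discuss the inconsistency with the user remains the open question raised on the slide.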
Grounding---establishing common context of understanding
• Beyond context-free acknowledgments
U: The LED is displaying a flashing seven. (misunderstood)
C: Is the one on the LED displaying?
U: No.
C: Is anything else on the LED displaying other than the NOT** flashing seven?
U: The seven is flashing.
C: Is anything else on the LED displaying other than the flashing seven?
U: No.
• How much grounding is enough?
The Initiative Factor
• Explicit semantic grounding leverages human intelligence
• But only if we allow it!
C: Whether there is a wire between connectors 111 and 120 needs to be
observed.
U: There is a wire between connectors 111 and 120 (misrecognized).
C: There is not supposed to be a wire between connectors 111 and 1.
U: (repeats and is understood)
The Initiative Factor
• Too much computer control?
C: What is the voltage between connectors 121 and 120?
(computer suspects problem in power circuit)
U: I believe there’s a problem with the LED circuit.
C: What is the voltage between connectors 121 and 120?
Suppose instead the response is:
U: Voltage is eight (misrecognized as control knob at six).
Then computer response is okay.
Final Thoughts/Summary
• Design systems to leverage human intelligence
– The ability to follow step-by-step instruction?
– Varying levels of system initiative
• VERY challenging when user expertise evolves.
• Menus vs. keyboard shortcuts????
– Explicit semantic grounding
• Verification subdialogs, etc.
• What’s the right amount?
• We still need carefully designed studies with real systems!!!