74.419 Artificial Intelligence
Speech and Natural Language Processing
Speech and Natural Language Processing
• Communication
• Natural Language
• Syntax
• Semantics
• Pragmatics
• Speech
Evolution of Human Language
• communication for "work"
• social interaction
• basis of cognition and thinking (Whorf & Sapir)
Communication
"Communication is the intentional
exchange of information brought
about by the production and
perception of signs drawn from a
shared system of conventional
signs."
[Russell & Norvig, p.651]
Natural Language - General
Natural Language is characterized by
• a common or shared set of signs
  (alphabet; lexicon)
• a systematic procedure to produce combinations of signs
  (syntax)
• a shared meaning of signs and combinations of signs
  ((constructive) semantics)
Speech and Natural Language
• Speech Recognition
  – acoustic signal as input
  – conversion into phonemes and written words
• Natural Language Processing
  – written text as input; sentences (or 'utterances')
  – syntactic analysis: parsing; grammar
  – semantic analysis: "meaning", semantic representation
  – pragmatics; dialogue; discourse
• Spoken Language Processing
  – transcribed utterances
  – phenomena of spontaneous speech
Speech Recognition (processing pipeline)
acoustic signal / sound wave
  → filtering, FFT; spectral analysis →
frequency spectrum
  → signal processing / analysis →
features (phonemes; context)
  → phoneme recognition: HMMs, neural networks →
phonemes
  → grammar or statistics →
phoneme sequences / words
  → grammar or statistics for likely word sequences →
word sequence / sentence
Areas in Natural Language Processing
• Morphology (word stem + ending)
• Syntax, Grammar & Parsing (syntactic description & analysis)
• Semantics & Pragmatics (meaning; constructive; context-dependent;
  references; ambiguity)
• Pragmatic Theory of Language; Intentions; Metaphor
  (Communication as Action)
• Discourse / Dialogue / Text
• Spoken Language Understanding
• Language Learning
NLP Syntax Analysis Processes
[Diagram: the Morphological Analyzer and the Part-of-Speech (POS)
Tagger draw on the Lexicon; the Parser draws on the Grammar Rules
(together: Linguistic Background Knowledge).
Example: "the" is looked up in the lexicon ("the – determiner") and
tagged as Det; the parser applies the rule NP → Det Noun; once both
constituents are found, an NP is recognized and entered into the
parse tree as NP(Det, Noun).]
NLP - Syntactic Analysis
[Diagram: "eats" is split by the Morphological Analyzer into
eat + s (3rd person singular), looked up in the lexicon
("eat – verb") and tagged as Verb; the parser applies the rule
VP → Verb Noun; the recognized VP(Verb, Noun) is entered into the
parse tree.]
Morphology
A morphological analyzer determines (at least)
• the stem + ending of a word,
and usually delivers related information, like
• the word class,
• the number,
• the person and
• the case of the word.
The morphology can be part of the lexicon or implemented as a
separate component, for example as a rule-based system.
eats → eat + s   (verb, singular, 3rd person)
dog  → dog       (noun, singular)
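The slide's rule-based analyzer can be sketched in a few lines of Python. The tiny stem lexicon and the single `-s` suffix rule below are invented for this illustration; a real analyzer would cover many more endings and irregular forms.

```python
# Minimal rule-based morphological analyzer (illustrative sketch).
# STEMS is a hypothetical mini-lexicon invented for this example.
STEMS = {"eat": "verb", "go": "verb", "dog": "noun", "bone": "noun"}

def analyze(word):
    """Return (stem, features) for a small set of English endings."""
    if word in STEMS:
        cls = STEMS[word]
        feats = {"class": cls}
        if cls == "noun":
            feats["number"] = "singular"
        return word, feats
    if word.endswith("s") and word[:-1] in STEMS:
        stem = word[:-1]
        cls = STEMS[stem]
        feats = {"class": cls}
        if cls == "verb":                      # eats → eat + s
            feats.update(number="singular", person="3rd")
        else:                                  # dogs → dog + s
            feats["number"] = "plural"
        return stem, feats
    return word, {"class": "unknown"}

print(analyze("eats"))   # ('eat', {'class': 'verb', 'number': 'singular', 'person': '3rd'})
print(analyze("dog"))    # ('dog', {'class': 'noun', 'number': 'singular'})
```

This mirrors the slide's examples: the stem is separated from the ending, and the word class, number, and person are delivered alongside it.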
Lexicon
The Lexicon contains information on words, as
• inflected forms (e.g. goes, eats) or
• word-stems (e.g. go, eat).
The Lexicon usually assigns a syntactic category,
• the word class or Part-of-Speech category.
Sometimes also
• further syntactic information (see Morphology);
• semantic information (e.g. agent);
• syntactic-semantic information (e.g. verb complements,
  like: 'give' requires a direct object).
Lexicon
Example contents:
eats → verb; singular, 3rd person (-s);
       can have a direct object (verb subcategorization)
dog  → dog, noun, singular;
       animal (semantic annotation)
POS (Part-of-Speech) Tagging
POS Tagging determines the word class or 'part-of-speech' category
(basic syntactic categories) of single words or word-stems.
The   – det (determiner)
dog   – noun
eats  – verb (3rd person; singular)
the   – det
bone  – noun
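In its simplest form, tagging is a lexicon lookup per word. The mini-lexicon below is invented for this example and assumes every word is unambiguous; real taggers must resolve ambiguous words ("book" as noun or verb) statistically, e.g. with HMMs.

```python
# Lexicon-based POS tagging sketch (every word assumed unambiguous).
# LEXICON is a hypothetical mini-lexicon invented for this example.
LEXICON = {
    "the": "det", "dog": "noun", "bone": "noun",
    "eats": "verb", "in": "prep", "park": "noun",
}

def pos_tag(sentence):
    """Tag each word with its part-of-speech category from the lexicon."""
    return [(w, LEXICON.get(w.lower(), "unknown")) for w in sentence.split()]

print(pos_tag("The dog eats the bone"))
# [('The', 'det'), ('dog', 'noun'), ('eats', 'verb'), ('the', 'det'), ('bone', 'noun')]
```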
Open Word Class: Nouns
Nouns denote objects, concepts, …
Proper Nouns
Names for specific individual objects, entities
e.g. the Eiffel Tower, Dr. Kemke
Common Nouns
Names for categories or classes or abstracts
e.g. fruit, banana, table, freedom, sleep, ...
Count Nouns
enumerable entities, e.g. two bananas
Mass Nouns
not countable items, e.g. water, salt, freedom
Open Word Class: Verbs
Verbs
denote actions, processes, states
e.g. smoke, dream, rest, run
Several morphological forms, e.g.
  non-3rd person: eat
  3rd person: eats
  progressive / present participle / gerundive: eating
  past participle: eaten
Auxiliaries, e.g. be, as a sub-class of verbs
Open Word Class: Adjectives
Adjectives
denote qualities or properties of objects, e.g.
heavy, blue, content
most languages have concepts for
  colour – white, green, ...
  age    – young, old, ...
  value  – good, bad, ...
not all languages have adjectives as separate class
Open Word Class: Adverbs
Adverbs
denote modifications of actions (verbs) or qualities (adjectives)
e.g. walk slowly, heavily drunk
Directional or Locational Adverbs
specify direction or location
e.g. go home, stay here
Degree Adverbs
specify the extent of a process, action, or property
e.g. extremely slow, very modest
Open Word Class: Adverbs 2
Manner Adverbs
specify the manner of an action or process
e.g. walk slowly, run fast
Temporal Adverbs
specify the time of an event or action
e.g. yesterday, Monday
Closed Word Classes
prepositions: on, under, over, at, from, to, with, ...
determiners: a, an, the, ...
pronouns: he, she, it, his, her, who, I, ...
conjunctions: and, or, as, if, when, ...
auxiliary verbs: can, may, should, are, ...
particles: up, down, on, off, in, out, ...
numerals: one, two, three, ..., first, second, ...
Language and Grammar
Natural Language described as Formal
Language L using a Formal Grammar G:
• start-symbol S ≡ sentence
• non-terminals NT ≡ syntactic constituents
• terminals T ≡ lexical entries/ words
• production rules P ≡ grammar rules
Generate sentences or recognize sentences
(Parsing) of the language L through the
application of grammar rules.
Grammar
Here, POS tags are included in the grammar rules.
det  → the
noun → dog | bone
verb → eat
NP → det noun     (NP – noun phrase)
VP → verb         (VP – verb phrase)
VP → verb NP
S  → NP VP        (S – sentence)
Most often we deal with Context-free Grammars,
with a distinguished Start-symbol S (sentence).
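A grammar can be used in both directions: to generate sentences and to recognize them. A minimal generation sketch over the toy grammar above (with "eats" as the terminal verb form, i.e. after morphology has applied):

```python
# Random sentence generation from the toy context-free grammar above.
# Each non-terminal maps to a list of alternative right-hand sides.
import random

GRAMMAR = {
    "S":    [["NP", "VP"]],
    "NP":   [["det", "noun"]],
    "VP":   [["verb"], ["verb", "NP"]],
    "det":  [["the"]],
    "noun": [["dog"], ["bone"]],
    "verb": [["eats"]],
}

def generate(symbol="S", rng=random):
    """Expand a symbol by randomly chosen rules until only words remain."""
    if symbol not in GRAMMAR:                 # terminal word
        return [symbol]
    rule = rng.choice(GRAMMAR[symbol])
    return [w for sym in rule for w in generate(sym, rng)]

print(" ".join(generate()))   # e.g. "the dog eats the bone"
```

Every sentence this grammar generates has the shape det noun verb (det noun), so it produces either 3 or 5 words.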
Parsing
Parsing means to
• derive the syntactic structure of a sentence based on a language
  model (grammar)
• construct a parse tree, i.e. the derivation of the sentence based
  on the grammar (rewrite system)
Parsing (here: bottom-up)
determine the syntactic structure of the sentence
the      → det
dog      → noun
det noun → NP
eats     → verb
the      → det
bone     → noun
det noun → NP
verb NP  → VP
NP VP    → S
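The reduction steps above can be sketched as a naive bottom-up (shift-reduce-style) recognizer: repeatedly replace a rule's right-hand side found in the working sequence by its left-hand side. The greedy strategy below happens to work for this toy grammar but is not a general parsing algorithm.

```python
# Naive bottom-up recognition for the toy grammar: reduce any
# right-hand side found in the sequence to its left-hand side,
# exactly as in the derivation steps above.
RULES = [
    ("det",  ["the"]),
    ("noun", ["dog"]),
    ("noun", ["bone"]),
    ("verb", ["eats"]),
    ("NP",   ["det", "noun"]),
    ("VP",   ["verb", "NP"]),
    ("VP",   ["verb"]),
    ("S",    ["NP", "VP"]),
]

def reduce_all(words):
    """Greedily reduce right-hand sides until no rule applies."""
    seq = list(words)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            for i in range(len(seq) - len(rhs) + 1):
                if seq[i:i + len(rhs)] == rhs:
                    seq[i:i + len(rhs)] = [lhs]   # one reduction step
                    changed = True
                    break
            if changed:
                break
    return seq

print(reduce_all("the dog eats the bone".split()))  # ['S']
```

Recognition succeeds when the whole sentence reduces to the start symbol S.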
Sample Grammar
Grammar (S, NT, T, P) – NT Non-Terminals; T Terminals; P Productions
• Sentence Symbol S ∈ NT
• Word-Classes / Part-of-Speech ∈ NT
• syntactic Constituents ∈ NT
• terminal words ∈ T
• Grammar Rules P: NT → (NT ∪ T)*
S → NP VP | Aux NP VP
NP → Det Nominal | Proper-Noun
Nominal → Noun | Nominal PP
VP → Verb | Verb NP | Verb PP | Verb NP PP
PP → Prep NP
Det → that | this | a
Noun → book | flight | meal | money
Proper-Noun → Houston | American Airlines | TWA
Verb → book | include | prefer
Prep → from | to | on
Aux → do | does
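A top-down backtracking recognizer for this grammar can be sketched as below. Two simplifications (both assumptions of this sketch, not the slide's method): the left-recursive rule Nominal → Nominal PP is rewritten as Nominal → Noun | Noun PP so that plain recursive descent terminates (cf. the parsing-problems slide later), and proper nouns are reduced to single lowercase tokens.

```python
# Top-down backtracking recognizer for the sample grammar.
# Left recursion removed; words lowercased, single-token proper nouns.
GRAMMAR = {
    "S":           [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP":          [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal":     [["Noun"], ["Noun", "PP"]],
    "VP":          [["Verb"], ["Verb", "NP"], ["Verb", "PP"], ["Verb", "NP", "PP"]],
    "PP":          [["Prep", "NP"]],
    "Det":         [["that"], ["this"], ["a"]],
    "Noun":        [["book"], ["flight"], ["meal"], ["money"]],
    "Proper-Noun": [["houston"]],
    "Verb":        [["book"], ["include"], ["prefer"]],
    "Prep":        [["from"], ["to"], ["on"]],
    "Aux":         [["do"], ["does"]],
}

def expand(symbols, words):
    """Yield every remaining word suffix reachable after matching symbols."""
    if not symbols:
        yield words
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                      # non-terminal: try each rule
        for rhs in GRAMMAR[first]:
            for remaining in expand(rhs, words):
                yield from expand(rest, remaining)
    elif words and words[0] == first:         # terminal: consume one word
        yield from expand(rest, words[1:])

def accepts(sentence):
    """True if some S-derivation covers the whole sentence."""
    return any(rem == () for rem in expand(["S"], tuple(sentence.lower().split())))

print(accepts("does this flight include a meal"))  # True
print(accepts("flight the does"))                  # False
```

Note the grammar overgenerates (no agreement checking), so e.g. "this flight include a meal" is also accepted.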
Sample Parse Tree
Parse "Does this flight include a meal?"
(S (Aux does)
   (NP (Det this) (Nominal (Noun flight)))
   (VP (Verb include)
       (NP (Det a) (Nominal (Noun meal)))))
Bottom-up vs. Top-Down Parsing
Bottom-up parsing – from the word-nodes up to the sentence-symbol
Top-down parsing – from the sentence-symbol down to the words
[Figure: the parse tree of "Does this flight include a meal?" from
the previous slide, traversed in either direction.]
Ambiguity
“One morning, I shot an elephant in my pajamas.
How he got into my pajamas, I don’t know.”
– Groucho Marx
syntactic or structural ambiguity – several parse trees
  example: the sentence above
semantic or lexical ambiguity – several word meanings
  bank (where you get money) vs. (river) bank
even different word categories are possible:
  He books the flight. vs. The books are here.
  Fruit flies from the balcony. vs. Fruit flies are on the balcony.
Lexical Ambiguity
Several word senses or word categories
e.g. chase – noun or verb
e.g. plant - ????
Syntactic Ambiguity
Several parse trees
e.g. “The dog eats the bone in the park.”
e.g. “The dog eats the bone in the package.”
Who/what is in the park and who/what is in the package?
Syntactically speaking:
How do I bind the Prepositional Phrase
"in the ... " ?
Problems in Parsing
Problems with left-recursive rules like NP → NP PP: don’t
know how many times recursion is needed.
Pure Bottom-up or Top-down Parsing is inefficient because
it generates and explores too many structures which in
the end turn out to be invalid.
Combine top-down and bottom-up approach:
Start with sentence; use rules top-down (look-ahead);
read input; try to find shortest path from input to highest
unparsed constituent (from left to right).
→ Chart-Parsing / Earley-Parser
Chart-Parsing / Earley Algorithm
Essence:
• Integrate top-down and bottom-up parsing.
• Keep recognized sub-structures (sub-trees) for shared use during
  parsing.
Top-down Prediction: Start with the S-symbol. Generate all
applicable rules for S. Go further down with the left-most
constituent in the rules and add rules for these constituents, until
you encounter a left-most node on the RHS which is a word category
(POS).
Bottom-up Completion: Read the input word and compare. If the word
matches, mark it as recognized and continue the recognition
bottom-up, trying to complete active rules.
Earley Algorithm - Functions
predictor
  generates new rules for a partly recognized RHS with a constituent
  right of the • (top-down generation);
  the • indicates how far a rule has been recognized
scanner
  if a word category (POS) is found right of the •, the scanner
  reads the next input word and adds a rule for it to the chart
  (bottom-up mode)
completer
  if a rule is completely recognized (the • is at the far right),
  the recognition state of earlier rules in the chart advances: the
  • is moved over the recognized constituent (bottom-up recognition).
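The predictor/scanner/completer loop can be sketched as a compact Earley recognizer. The grammar below is a hypothetical mini-grammar matching the chart example on the next slide ("Book this flight"); states are (LHS, RHS, dot position, origin), with the dot playing the role of the • above.

```python
# Earley recognizer sketch: predictor / scanner / completer.
# States are (lhs, rhs, dot, origin); the dot marks how far the
# right-hand side has been recognized.
GRAMMAR = {
    "S":    [["VP"]],
    "VP":   [["V", "NP"]],
    "NP":   [["Det", "Nom"]],
    "Nom":  [["Noun"]],
    "V":    [["book"]],
    "Det":  [["this"]],
    "Noun": [["flight"]],
}

def earley(words):
    chart = [set() for _ in range(len(words) + 1)]
    chart[0].add(("γ", ("S",), 0, 0))               # dummy start state
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs) and rhs[dot] in GRAMMAR:      # predictor
                for prod in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], tuple(prod), 0, i)
                    if new not in chart[i]:
                        chart[i].add(new); agenda.append(new)
            elif dot < len(rhs):                            # scanner
                if i < len(words) and words[i] == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                           # completer
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new); agenda.append(new)
    return ("γ", ("S",), 1, 0) in chart[len(words)]

print(earley(["book", "this", "flight"]))  # True
```

Because completed sub-structures live in the shared chart, each constituent is recognized once and reused, which is the efficiency gain over pure top-down or bottom-up parsing.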
Chart
Chart after parsing "Book this flight" (completed states):
S   → VP •
VP  → V NP •
NP  → Det Nom •
Nom → Noun •
V: Book    Det: this    Noun: flight
Semantics
Semantic Representation
Representation of the meaning of a sentence.
Generate
• a logic-based representation or
• a frame-based representation
based on the syntactic structure, the lexical entries, and
particularly the head-verb (which determines how to arrange the
parts of the sentence in the semantic representation).
Semantic Representation
Verb-centered Representation
Verb (action, head) is regarded as center of verbal
expression and determines the case frame with
possible case roles; other parts of the sentence
are described in relation to the action as fillers of
case slots. (cf. also Schank’s CD Theory)
Typing of case roles possible (e.g. 'agent' refers to
a specific sort or concept)
General Frame for "eat"
Agent:    animate
Action:   eat
Patiens:  food
Manner:   {e.g. fast}
Location: {e.g. in the yard}
Time:     {e.g. at noon}
Example Frame with Fillers
Agent:    the dog
Action:   eat
Patiens:  the bone / the bone in the package
Location: in the park
General Frame for "drive"
Agent:       animate
Action:      drive
Patiens:     vehicle
Manner:      {the way it is done}
Location:    Location-spec
Source:      Location-spec
Destination: Location-spec
Time:        Time-spec

Frame with fillers
Agent:       she
Action:      drives
Patiens:     the convertible
Manner:      fast
Location:    [in the] Rocky Mountains
Source:      [from] home
Destination: [to the] ASIC conference
Time:        [in the] summer holidays
Representation in Logic
Action:   eat
Agent:    the dog
Patiens:  the bone / the bone in the package
Location: in the park

eat (dog-1, bone-1, park-1)
  eat – predicate; dog-1, bone-1, park-1 – constants
Representation in Logic
lexical:    eat (dog-1, bone-1, park-1)   – constants
general:    eat (x, y, z)                 – variables
  semantic frame:  animate-being(x), food(y), location(z)
syntactic:  eat (NP-1, NP-2, PP)
  syntactic frame: NP-1(x), NP-2(y), PP(z)
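The frame-based and logic-based representations can both be held as plain data structures. The sketch below (roles, instance constants, and the flattening function are illustrative choices, not a fixed scheme) turns the filled case frame for "The dog eats the bone in the park" into the predicate form shown above.

```python
# Case frame for "The dog eats the bone in the park" and its
# flattening into a logic-style predicate over instance constants.
frame = {
    "Action":   "eat",
    "Agent":    "the dog",
    "Patiens":  "the bone",
    "Location": "in the park",
}

# Hypothetical mapping from phrases to instance constants.
instances = {"the dog": "dog-1", "the bone": "bone-1", "in the park": "park-1"}

def to_predicate(frame, instances):
    """Flatten a case frame into predicate(constant, ...) notation."""
    args = [instances[frame[role]] for role in ("Agent", "Patiens", "Location")]
    return "{}({})".format(frame["Action"], ", ".join(args))

print(to_predicate(frame, instances))  # eat(dog-1, bone-1, park-1)
```

The head-verb fixes the slot order, so the same frame yields the same predicate regardless of how the sentence arranged its constituents.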
Pragmatics
Pragmatics
Pragmatics includes context-related aspects of NL expressions
(utterances), in particular anaphoric references, elliptic
expressions, and deictic expressions.
anaphoric references – refer to items mentioned before
deictic expressions  – simulate pointing gestures
elliptic expressions – incomplete expressions; relate to an item
                       mentioned before
Pragmatics
“I put the box on the top shelf.”
“I know that. But I can’t find it there.”
  – "that", "it": anaphoric references; "there": deictic expression
“The candy-box?”
  – elliptic expression
Intentions
Intentions
One philosophical assumption is that natural
language is used to achieve things or situations:
“Do things with words.”
The meaning of an utterance is essentially
determined by the intention of the speaker.
Intentionality - Examples
What was said:                      What was meant:
“There is a terrible draft here.”   "Can you please close the window."
“How does it look here?”            "I am really mad; clean up your room."
"Will this ever end?"               "I would prefer to be with my friends
                                     than to sit in class now."
Metaphors
Metaphors
The meaning of a sentence or expression is not directly inferable
from the sentence structure and the word meanings. Metaphors
transfer concepts and relations from one area of discourse into
another area, for example, seeing time as a line (in space), or
seeing friendship or life as a journey.
Metaphors - Examples
“This car eats a lot of gas.”
“She devoured the book.”
“He was tied up with his clients.”
“Marriage is like a journey.”
“Their marriage was a one-way road into hell.”
(see George Lakoff, Women, Fire and Dangerous Things)
Dialogue and Discourse
Discourse / Dialogue Structure
Grammar for various sentence types (speech acts):
dialogue, discourse, story grammar
Distinguish questions, commands, and statements:
 Where is the remote-control?
 Bring the remote-control!
 The remote-control is on the brown table.
Dialogue Grammars describe possible sequences of
Speech Acts in communication, e.g. that a question is
followed by an answer/statement.
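A dialogue grammar of this kind can be modeled as a small finite-state machine over speech-act types. The states and allowed transitions below are invented for illustration; they encode just the constraint mentioned above, that a question is followed by an answer/statement.

```python
# A dialogue grammar as a tiny finite-state model over speech acts.
# Transitions are a hypothetical example, not a complete theory.
DIALOGUE = {
    "start":     {"question", "command", "statement"},
    "question":  {"statement"},      # a question is followed by an answer
    "command":   {"statement"},      # e.g. a confirmation
    "statement": {"question", "command", "statement"},
}

def well_formed(acts):
    """Check that a sequence of speech acts follows the dialogue grammar."""
    state = "start"
    for act in acts:
        if act not in DIALOGUE.get(state, set()):
            return False
        state = act
    return True

print(well_formed(["question", "statement"]))  # True
print(well_formed(["question", "question"]))   # False
```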
Speech
Speech Production & Reception
Sound and Hearing
• change in air pressure → sound wave
• reception through the inner-ear membrane / microphone
• break-up into frequency components: receptors in the cochlea /
  mathematical frequency analysis (e.g. Fast Fourier Transform,
  FFT) → frequency spectrum
• perception/recognition of phonemes and subsequently words
  (e.g. Neural Networks, Hidden Markov Models)
Speech Recognition Phases
Speech Recognition
• acoustic signal as input
• signal analysis - spectrogram
• feature extraction
• phoneme recognition
• word recognition
• conversion into written words
Speech Signal
Speech Signal composed of
• harmonic signals (sine waves)
  with different frequencies and amplitudes
  – frequency – waves/second → like pitch
  – amplitude – height of wave → like loudness
• non-harmonic signal (not a sine wave): noise
[Figure: glottis and speech signal in lingWAVES (from
http://www.lingcom.de)]
Speech Signal Analysis
Analog-Digital Conversion of the Acoustic Signal
Sampling in Time Frames (“windows”)
• frequency = zero-crossings per time frame
  – e.g. 2 crossings/second is 1 Hz (1 wave)
  – e.g. a 10 kHz signal needs a sampling rate of 20 kHz
• measure the amplitudes of the signal in the time frame
  → digitized wave form
• separate the different frequency components
  → FFT (Fast Fourier Transform) → spectrogram
  → other frequency-based representations:
    LPC (linear predictive coding), Cepstrum
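The spectral-analysis step can be illustrated in pure Python with a naive discrete Fourier transform over one analysis window (a real system uses the FFT, which computes the same spectrum faster). The signal below is a synthetic 100 Hz sine sampled at 1 kHz, comfortably above its Nyquist rate of 200 Hz.

```python
# Magnitude spectrum of one sampled window via a naive O(N^2) DFT,
# illustrating how spectral analysis exposes frequency components.
import cmath, math

def dft_magnitudes(samples):
    """Return |X[k]| for k = 0 .. N-1."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples)))
            for k in range(n)]

# 100 Hz sine sampled at 1 kHz; 100 samples = one 0.1 s window,
# so the frequency resolution is rate/n = 10 Hz per bin.
rate, n = 1000, 100
signal = [math.sin(2 * math.pi * 100 * t / rate) for t in range(n)]
spectrum = dft_magnitudes(signal)
peak_bin = max(range(n // 2), key=spectrum.__getitem__)
print(peak_bin * rate / n)  # 100.0 – the peak sits at the sine's frequency
```

Computing such a spectrum for each successive window and stacking the results over time is exactly what a spectrogram shows.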
[Figure: waveform (amplitude/pressure over time) of the utterance
"She just had a baby."]
[Figure: waveform for the vowel ae]
[Figure: waveform and spectrogram; waveform and LPC spectrum for the
vowel ae – energy over frequency, with formants visible as peaks]
Phoneme Recognition
Recognition process based on
• features extracted from spectral analysis
• phonological rules
• statistical properties of language / pronunciation
Recognition methods
• Hidden Markov Models
• Neural Networks
• pattern classification in general
Speech Signal Characteristics
Derived from the signal representation:
• formants – dark stripes in the spectrum
  strong frequency components; characterize particular vowels;
  gender of the speaker
• pitch – fundamental frequency
  baseline for higher-frequency harmonics like formants; gender
  characteristic
• change in frequency distribution
  characteristic for e.g. plosives (form of articulation)
[Figure: features for vowels & consonants]
[Figure: probabilistic FAs as word models]
[Figure: word recognition with a Hidden Markov Model]
Viterbi Algorithm
The Viterbi algorithm finds an optimal sequence of states in
continuous speech recognition, given an observation sequence of
phones and a probabilistic (weighted) FA (state graph). The
algorithm returns the path through the automaton which has maximum
probability and accepts the observation sequence.
a[s,s'] is the transition probability (in the phonetic word model)
from the current state s to the next state s', and b[s',ot] is the
observation likelihood of s' given ot. b[s',ot] is 1 if the
observation symbol matches the state, and 0 otherwise.
(cf. Jurafsky Ch. 5)
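A sketch of Viterbi decoding under exactly these assumptions: transition probabilities a[s][s'] and the 0/1 observation likelihood b[s', ot] described above. The phone states and probabilities are invented for this example (a toy word model with two alternative first phones).

```python
# Viterbi decoding over a small probabilistic word model.
# a: transition probabilities between phone states (invented values);
# b: 0/1 observation likelihood, as in the text above.
a = {
    "start": {"n": 0.7, "m": 0.3},
    "n":     {"iy": 1.0},
    "m":     {"iy": 1.0},
    "iy":    {"end": 1.0},
}

def b(s, o):
    """Observation likelihood: 1 if the symbol matches the state."""
    return 1.0 if s == o else 0.0

def viterbi(obs):
    """Return (probability, state path) of the best path accepting obs."""
    paths = {"start": (1.0, ["start"])}
    for o in obs:
        nxt = {}
        for s, (p, path) in paths.items():
            for s2, trans in a.get(s, {}).items():
                score = p * trans * b(s2, o)
                if score > nxt.get(s2, (0.0, None))[0]:
                    nxt[s2] = (score, path + [s2])   # keep only the best
        paths = nxt
    return max(paths.values(), key=lambda v: v[0], default=(0.0, []))

print(viterbi(["n", "iy"]))  # (0.7, ['start', 'n', 'iy'])
```

Keeping only the best-scoring path into each state at each step is what makes Viterbi linear in the observation length instead of exponential.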
Speech Recognizer Architecture
Speech Processing - Characteristics
• Speech Recognition vs. Speaker Identification (Voice Recognition)
• speaker-dependent vs. speaker-independent
• training
• unlimited vs. large vs. small vocabulary
• single-word vs. continuous speech
Spoken Language
Spoken Language
• Output of a Speech Recognition System serves as input "text".
• Can be associated with probabilities for different word sequences.
• Contains ungrammatical structures, so-called "disfluencies",
  e.g. repetitions and corrections.
Spoken Language - Examples
1. no [s-] straight southwest
2. right to [my] my left
3. [that is] that is correct
From: Robin J. Lickley. HCRC Disfluency Coding Manual.
http://www.ling.ed.ac.uk/~robin/maptask/HCRCdsm-01.html
Spoken Language - Examples
1. we're going to [g--] ... turn straight back around for testing.
2. [come to] ... walk right to the ... right-hand side of the page.
3. right [up ... past] ... up on the left of the ... white mountain
   walk ... right up past.
4. [i'm still] ... i've still gone halfway back round the lake again.
Spoken Language - Examples
1. [I’d] [d if] I need to go
2. [it’s basi--] see if you go over the old mill
3. [you are going] make a gradual slope … to your right
4. [I’ve got one] I don’t realize why it is there
Spoken Language - Disfluency
Reparandum and Repair
[come to] ... walk right to [the] ... the right-hand side of the page
  reparandum – the bracketed material that the speaker abandons;
  repair – the corrected continuation that replaces it
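Given utterances annotated in this bracket style, the intended wording can be recovered by deleting reparanda. The sketch below assumes the reparanda are already marked with [...] and pauses with "..." (as in the coding manual's examples); detecting unannotated disfluencies in raw recognizer output is a much harder problem.

```python
# Removing bracketed reparanda to recover the intended utterance.
# Assumes [...] marks reparanda and "..." marks pauses, as annotated.
import re

def clean_disfluency(utterance):
    """Drop [reparandum] spans and pause dots, normalize spaces."""
    s = re.sub(r"\[[^\]]*\]", " ", utterance)   # remove reparanda
    s = s.replace("...", " ")                    # remove pause markers
    return " ".join(s.split())

print(clean_disfluency(
    "[come to] ... walk right to [the] ... the right-hand side of the page"))
# walk right to the right-hand side of the page
```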
Additional References
Jurafsky, D. & J. H. Martin: Speech and Language Processing.
Prentice-Hall, 2000.
Huang, X., A. Acero & H. Hon: Spoken Language Processing. A Guide
to Theory, Algorithm, and System Development. Prentice-Hall, NJ,
2001.
Kemke, C.: 74.793 Natural Language and Speech Processing - Course
Notes, 2nd Term 2004, Dept. of Computer Science, U. of Manitoba.
Lickley, R. J.: HCRC Disfluency Coding Manual.
http://www.ling.ed.ac.uk/~robin/maptask/HCRCdsm-01.html
Figures
Figures taken from:
Jurafsky, D. & J. H. Martin: Speech and Language Processing,
Prentice-Hall, 2000, Chapters 5 and 7.
lingWAVES (from http://www.lingcom.de)