74.419 Artificial Intelligence
Speech and Natural Language Processing

Speech and Natural Language Processing
• Communication
• Natural Language
• Syntax
• Semantics
• Pragmatics
• Speech

Evolution of Human Language
• communication for "work"
• social interaction
• basis of cognition and thinking (Whorf & Sapir)

Communication
"Communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs." [Russell & Norvig, p. 651]

Natural Language - General
Natural Language is characterized by
• a common or shared set of signs: alphabet; lexicon
• a systematic procedure to produce combinations of signs: syntax
• a shared meaning of signs and combinations of signs: (constructive) semantics

Speech and Natural Language
Speech Recognition
• acoustic signal as input
• conversion into phonemes and written words
Natural Language Processing
• written text as input; sentences (or 'utterances')
• syntactic analysis: parsing; grammar
• semantic analysis: "meaning", semantic representation
• pragmatics; dialogue; discourse
Spoken Language Processing
• transcribed utterances
• phenomena of spontaneous speech

Speech Recognition
Processing pipeline:
acoustic signal / sound wave
→ filtering, FFT; spectral analysis: frequency spectrum
→ signal processing / analysis: features (phonemes; context)
→ phoneme recognition (HMM, neural networks): phonemes
→ grammar or statistics: phoneme sequences / words
→ grammar or statistics for likely word sequences: word sequence / sentence

Areas in Natural Language Processing
• Morphology (word stem + ending)
• Syntax, Grammar & Parsing (syntactic description & analysis)
• Semantics & Pragmatics (meaning; constructive; context-dependent; references; ambiguity)
• Pragmatic Theory of Language; Intentions; Metaphor (Communication as Action)
• Discourse / Dialogue / Text
• Spoken Language Understanding
• Language Learning

NLP Syntax Analysis Processes
Pipeline, drawing on linguistic background knowledge (lexicon, morphological analyzer, grammar rules):
word → Morphological Analyzer → Part-of-Speech (POS) Tagging → Parser → parse tree
• Example 1: "the"; lexicon: the = determiner; grammar rule NP → Det Noun; the parser recognizes the constituent NP (Det Noun) in the parse tree.
• Example 2: "eats"; morphological analyzer: eat + s, 3rd person singular; lexicon: eat = verb; grammar rule VP → Verb Noun; the parser recognizes the constituent VP (Verb Noun) in the parse tree.

Morphology
A morphological analyzer determines (at least) the stem + ending of a word, and usually delivers related information, like the word class, the number, the person, and the case of the word. The morphology can be part of the lexicon or implemented as a component of its own, for example as a rule-based system.
• eats → eat + s (verb, singular, 3rd person)
• dog → dog (noun, singular)

Lexicon
The Lexicon contains information on words, as inflected forms (e.g. goes, eats) or word stems (e.g. go, eat). The Lexicon usually assigns a syntactic category, the word class or Part-of-Speech category. Sometimes it also holds further syntactic information (see Morphology), semantic information (e.g. agent), or syntactic-semantic information (e.g. verb complements: 'give' requires a direct object).

Lexicon Example
Example contents:
• eats: verb; singular, 3rd person (-s); can have a direct object (verb subcategorization)
• dog: noun, singular; animal (semantic annotation)
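To make the morphological analyzer and lexicon described above concrete, here is a minimal Python sketch of rule-based morphological analysis with lexicon lookup. The LEXICON contents and the single "-s" suffix rule are illustrative assumptions, not part of the course material.

    # Toy lexicon: word stems with their part-of-speech and annotations.
    LEXICON = {
        "eat": {"pos": "verb"},
        "dog": {"pos": "noun", "semantics": "animal"},
        "bone": {"pos": "noun"},
        "the": {"pos": "det"},
    }

    def analyze(word):
        """Return (stem, features): lexicon lookup first, then suffix rules."""
        if word in LEXICON:
            entry = dict(LEXICON[word])
            if entry["pos"] == "noun":
                entry["number"] = "singular"
            return word, entry
        if word.endswith("s") and word[:-1] in LEXICON:   # eats -> eat + s
            entry = dict(LEXICON[word[:-1]])
            if entry["pos"] == "verb":
                entry.update(person="3rd", number="singular")
            else:                                          # dogs -> dog + s
                entry["number"] = "plural"
            return word[:-1], entry
        return word, {"pos": "unknown"}

    print(analyze("eats"))  # ('eat', {'pos': 'verb', 'person': '3rd', 'number': 'singular'})
    print(analyze("dog"))   # ('dog', {'pos': 'noun', 'semantics': 'animal', 'number': 'singular'})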
POS (Part-of-Speech) Tagging
POS Tagging determines the word class or 'part-of-speech' category (basic syntactic categories) of single words or word stems.
Example: "The dog eats the bone"
• The: det (determiner)
• dog: noun
• eats: verb (3rd person; singular)
• the: det
• bone: noun

Open Word Class: Nouns
Nouns denote objects, concepts, ...
• Proper Nouns: names for specific individual objects or entities, e.g. the Eiffel Tower, Dr. Kemke
• Common Nouns: names for categories, classes, or abstracts, e.g. fruit, banana, table, freedom, sleep, ...
• Count Nouns: enumerable entities, e.g. two bananas
• Mass Nouns: non-countable items, e.g. water, salt, freedom

Open Word Class: Verbs
Verbs denote actions, processes, states, e.g. smoke, dream, rest, run.
Several morphological forms:
• non-3rd person: eat
• 3rd person: eats
• progressive / present participle / gerundive: eating
• past participle: eaten
Auxiliaries (e.g. be) form a sub-class of verbs.

Open Word Class: Adjectives
Adjectives denote qualities or properties of objects, e.g. heavy, blue, content.
Most languages have concepts for
• colour: white, green, ...
• age: young, old, ...
• value: good, bad, ...
Not all languages have adjectives as a separate class.

Open Word Class: Adverbs
Adverbs denote modifications of actions (verbs) or qualities (adjectives), e.g. walk slowly, heavily drunk.
• Directional or Locational Adverbs specify direction or location, e.g. go home, stay here.
• Degree Adverbs specify the extent of a process, action, or property, e.g. extremely slow, very modest.
• Manner Adverbs specify the manner of an action or process, e.g. walk slowly, run fast.
• Temporal Adverbs specify the time of an event or action, e.g. yesterday, Monday.

Closed Word Classes
• prepositions: on, under, over, at, from, to, with, ...
• determiners: a, an, the, ...
• pronouns: he, she, it, his, her, who, I, ...
• conjunctions: and, or, as, if, when, ...
• auxiliary verbs: can, may, should, are
• particles: up, down, on, off, in, out
• numerals: one, two, three, ..., first, second, ...

Language and Grammar
Natural Language described as a Formal Language L using a Formal Grammar G:
• start-symbol S ≡ sentence
• non-terminals NT ≡ syntactic constituents
• terminals T ≡ lexical entries / words
• production rules P ≡ grammar rules
Generate sentences or recognize sentences (Parsing) of the language L through the application of grammar rules.

Grammar
Here, POS tags are included in the grammar rules:
det → the
noun → dog | bone
verb → eat
NP → det noun (NP: noun phrase)
VP → verb | verb NP (VP: verb phrase)
S → NP VP (S: sentence)
Most often we deal with Context-Free Grammars, with a distinguished start-symbol S (sentence).

Parsing
Parsing:
• derive the syntactic structure of a sentence based on a language model (grammar)
• construct a parse tree, i.e. the derivation of the sentence based on the grammar (rewrite system)

Parsing (here: bottom-up)
Determine the syntactic structure of the sentence "the dog eats the bone":
the/det, dog/noun → NP
eats/verb
the/det, bone/noun → NP
verb NP → VP
NP VP → S

Sample Grammar
Grammar (S, NT, T, P): NT non-terminals; T terminals; P productions
• sentence symbol S ∈ NT
• word classes / parts-of-speech ∈ NT
• syntactic constituents ∈ NT
• terminal words ∈ T
• grammar rules P: NT → (NT ∪ T)*

S → NP VP | Aux NP VP
NP → Det Nominal | Proper-Noun
Nominal → Noun | Nominal PP
VP → Verb | Verb NP | Verb PP | Verb NP PP
PP → Prep NP
Det → that | this | a
Noun → book | flight | meal | money
Proper-Noun → Houston | American Airlines | TWA
Verb → book | include | prefer
Prep → from | to | on
Aux → do | does

Sample Parse Tree
Parse of "Does this flight include a meal?":
(S (Aux does)
   (NP (Det this) (Nominal (Noun flight)))
   (VP (Verb include)
       (NP (Det a) (Nominal (Noun meal)))))
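The sample grammar can be tried out directly, for example with NLTK's chart parser (a sketch assuming NLTK is installed; not part of the course material). Two simplifications: the rule names below avoid hyphens (Proper-Noun becomes PropN), and the multi-word name "American Airlines" is omitted because the terminals here are single tokens.

    import nltk

    grammar = nltk.CFG.fromstring("""
    S -> NP VP | Aux NP VP
    NP -> Det Nominal | PropN
    Nominal -> Noun | Nominal PP
    VP -> Verb | Verb NP | Verb PP | Verb NP PP
    PP -> Prep NP
    Det -> 'that' | 'this' | 'a'
    Noun -> 'book' | 'flight' | 'meal' | 'money'
    PropN -> 'Houston' | 'TWA'
    Verb -> 'book' | 'include' | 'prefer'
    Prep -> 'from' | 'to' | 'on'
    Aux -> 'do' | 'does'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("does this flight include a meal".split()):
        print(tree)
    # (S (Aux does) (NP (Det this) (Nominal (Noun flight)))
    #    (VP (Verb include) (NP (Det a) (Nominal (Noun meal)))))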
Bottom-up vs. Top-Down Parsing
• Bottom-up parsing: from the word nodes up to the sentence symbol.
• Top-down parsing: from the sentence symbol down to the words.
Both yield the parse tree of "Does this flight include a meal?" shown above.

Ambiguity
"One morning, I shot an elephant in my pajamas. How he got into my pajamas, I don't know." (Groucho Marx)
• syntactic or structural ambiguity: several parse trees; example: the sentence above
• semantic or lexical ambiguity: several word meanings, e.g. bank (where you get money) and (river) bank
• even different word categories are possible: He books the flight. vs. The books are here. Fruit flies from the balcony. vs. Fruit flies are on the balcony.

Lexical Ambiguity
Several word senses or word categories:
• e.g. chase: noun or verb
• e.g. plant: ?

Syntactic Ambiguity
Several parse trees:
• e.g. "The dog eats the bone in the park."
• e.g. "The dog eats the bone in the package."
Who/what is in the park, and who/what is in the package? Syntactically speaking: how do I bind the prepositional phrase "in the ..."?

Problems in Parsing
• Problems with left-recursive rules like NP → NP PP: we don't know how many times the recursion is needed.
• Pure bottom-up or top-down parsing is inefficient because it generates and explores too many structures which in the end turn out to be invalid.
• Combine the top-down and bottom-up approaches: start with the sentence; use rules top-down (look-ahead); read input; try to find the shortest path from the input to the highest unparsed constituent (from left to right). → Chart-Parsing / Earley-Parser

Chart-Parsing / Earley Algorithm
Essence: integrate top-down and bottom-up parsing; keep recognized sub-structures (sub-trees) for shared use during parsing.
• Top-down prediction: start with the S-symbol. Generate all applicable rules for S. Go further down with the left-most constituent in those rules and add rules for these constituents, until you encounter a left-most node on the RHS which is a word category (POS).
• Bottom-up completion: read the input word and compare. If the word matches, mark it as recognized and continue the recognition bottom-up, trying to complete active rules.

Earley Algorithm - Functions
• predictor: generates new rules for a partly recognized RHS with a constituent right of the • (top-down generation); the • indicates how far a rule has been recognized
• scanner: if a word category (POS) is found right of the •, the scanner reads the next input word and adds a rule for it to the chart (bottom-up mode)
• completer: if a rule is completely recognized (the • is at the far right), the recognition state of earlier rules in the chart advances: the • is moved over the recognized constituent (bottom-up recognition)

Chart
Example chart entries (completed, dotted rules) after recognizing "Book this flight":
S → VP •
VP → V NP •
NP → Det Nom •
V → Book •
Nom → Noun •
Det → this •
Noun → flight •
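As a rough illustration of the predictor / scanner / completer interplay described above, here is a compact Earley recognizer sketch in Python. The toy grammar, lexicon, and state representation are simplified choices of this sketch; a full parser would also keep back-pointers to build the parse trees.

    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["Det", "Noun"]],
        "VP": [["Verb"], ["Verb", "NP"]],
    }
    LEXICON = {"the": "Det", "dog": "Noun", "bone": "Noun", "eats": "Verb"}

    def earley_recognize(words):
        # A state is (lhs, rhs, dot, origin); chart[i] holds states ending at i.
        chart = [set() for _ in range(len(words) + 1)]
        chart[0].add(("GAMMA", ("S",), 0, 0))            # dummy start state
        for i in range(len(words) + 1):
            agenda = list(chart[i])
            while agenda:
                lhs, rhs, dot, origin = agenda.pop()
                if dot < len(rhs) and rhs[dot] in GRAMMAR:        # predictor
                    for prod in GRAMMAR[rhs[dot]]:
                        new = (rhs[dot], tuple(prod), 0, i)
                        if new not in chart[i]:
                            chart[i].add(new); agenda.append(new)
                elif dot < len(rhs):                              # scanner
                    if i < len(words) and LEXICON.get(words[i]) == rhs[dot]:
                        chart[i + 1].add((lhs, rhs, dot + 1, origin))
                else:                                             # completer
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            new = (l2, r2, d2 + 1, o2)
                            if new not in chart[i]:
                                chart[i].add(new); agenda.append(new)
        return ("GAMMA", ("S",), 1, 0) in chart[len(words)]

    print(earley_recognize("the dog eats the bone".split()))  # True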
Semantics

Semantic Representation
Representation of the meaning of a sentence. Generate a logic-based or a frame-based representation from the syntactic structure, the lexical entries, and particularly the head-verb (it determines how to arrange the parts of the sentence in the semantic representation).

Semantic Representation: Verb-centered Representation
The verb (action, head) is regarded as the center of the verbal expression and determines the case frame with its possible case roles; the other parts of the sentence are described in relation to the action, as fillers of the case slots (cf. also Schank's CD Theory). Typing of case roles is possible (e.g. 'agent' refers to a specific sort or concept).

General Frame for "eat"
Agent: animate
Action: eat
Patient: food
Manner: {e.g. fast}
Location: {e.g. in the yard}
Time: {e.g. at noon}

Example-Frame with Fillers
Agent: the dog
Action: eat
Patient: the bone / the bone in the package
Location: in the park

General Frame for "drive"
Agent: animate
Action: drive
Patient: vehicle
Manner: {the way it is done}
Location: Location-spec
Source: Location-spec
Destination: Location-spec
Time: Time-spec

Frame with Fillers
Agent: she
Action: drives
Patient: the convertible
Manner: fast
Location: [in the] Rocky Mountains
Source: [from] home
Destination: [to the] ASIC conference
Time: [in the] summer holidays

Representation in Logic
From the frame (Action: eat; Agent: the dog; Patient: the bone / the bone in the package; Location: in the park) to a logical form:
eat(dog-1, bone-1, park-1): predicate eat with constants
General lexical form with variables: eat(x, y, z)
Syntactic form: eat(NP-1, NP-2, PP)
Semantic frame: animate-being(x), food(y), location(z)
Syntactic frame: NP-1(x), NP-2(y), PP(z)
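A case frame of this kind can be sketched as a plain data structure. In the Python sketch below the slot names follow the "eat" frame above, while the fill_frame helper and its error handling are illustrative assumptions.

    # General frame for "eat"; None marks optional, unfilled case roles.
    EAT_FRAME = {
        "Agent":    "animate",
        "Action":   "eat",
        "Patient":  "food",
        "Manner":   None,    # e.g. "fast"
        "Location": None,    # e.g. "in the yard"
        "Time":     None,    # e.g. "at noon"
    }

    def fill_frame(frame, **fillers):
        """Return a filled copy of the frame; unknown case roles are rejected."""
        filled = dict(frame)
        for slot, value in fillers.items():
            if slot not in filled:
                raise KeyError(f"no such case role: {slot}")
            filled[slot] = value
        return filled

    # "The dog eats the bone in the park."
    print(fill_frame(EAT_FRAME, Agent="the dog", Patient="the bone",
                     Location="in the park"))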
Pragmatics

Pragmatics
Pragmatics covers context-related aspects of NL expressions (utterances), in particular anaphoric references, elliptic expressions, and deictic expressions:
• anaphoric references: refer to items mentioned before
• deictic expressions: simulate pointing gestures
• elliptic expressions: incomplete expressions; relate to an item mentioned before

Pragmatics - Example
"I put the box on the top shelf."
"I know that. But I can't find it there." (anaphoric reference: "it"; deictic expression: "there")
"The candy-box?" (elliptic expression)

Intentions

Intentions
One philosophical assumption is that natural language is used to achieve things or situations: "do things with words." The meaning of an utterance is essentially determined by the intention of the speaker.

Intentionality - Examples
What was said / what was meant:
• "There is a terrible draft here." / "Can you please close the window."
• "How does it look here?" / "I am really mad; clean up your room."
• "Will this ever end?" / "I would prefer to be with my friends rather than sit in class now."

Metaphors

Metaphors
The meaning of a sentence or expression is not directly inferable from the sentence structure and the word meanings. Metaphors transfer concepts and relations from one area of discourse into another, for example seeing time as a line (in space), or seeing friendship or life as a journey.

Metaphors - Examples
• "This car eats a lot of gas."
• "She devoured the book."
• "He was tied up with his clients."
• "Marriage is like a journey."
• "Their marriage was a one-way road into hell."
(see George Lakoff, Women, Fire and Dangerous Things)

Dialogue and Discourse

Discourse / Dialogue Structure
Grammar for various sentence types (speech acts): dialogue, discourse, story grammar. Distinguish questions, commands, and statements:
• Where is the remote-control?
• Bring the remote-control!
• The remote-control is on the brown table.
Dialogue Grammars describe possible sequences of speech acts in communication, e.g. that a question is followed by an answer/statement.

Speech

Speech Production & Reception
Sound and hearing:
• change in air pressure → sound wave
• reception through the inner-ear membrane / microphone
• break-up into frequency components: receptors in the cochlea / mathematical frequency analysis (e.g. Fast Fourier Transform, FFT) → frequency spectrum
• perception/recognition of phonemes and subsequently words (e.g. Neural Networks, Hidden Markov Models)

Speech Recognition Phases
• acoustic signal as input
• signal analysis: spectrogram
• feature extraction
• phoneme recognition
• word recognition
• conversion into written words

Speech Signal
The speech signal is composed of
• a harmonic signal (sine waves) with different frequencies and amplitudes; frequency: waves per second (like pitch); amplitude: height of the wave (like loudness)
• a non-harmonic signal (not a sine wave): noise
[Figure: glottis and speech signal in lingWAVES (from http://www.lingcom.de)]

Speech Signal Analysis
Analog-digital conversion of the acoustic signal:
• sampling in time frames ("windows"); frequency = 0-crossings per time frame, e.g. 2 crossings/second is 1 Hz (1 wave); e.g. a 10 kHz signal needs a 20 kHz sampling rate
• measure the amplitudes of the signal in each time frame → digitized wave form
• separate the different frequency components: FFT (Fast Fourier Transform) → spectrogram; other frequency-based representations: LPC (linear predictive coding), Cepstrum
[Figures: waveform of "She just had a baby." (amplitude/pressure over time); waveform for the vowel /ae/; waveform and spectrogram; waveform and LPC spectrum for the vowel /ae/ (energy over frequency, showing the formants)]

Phoneme Recognition
Recognition process based on
• features extracted from spectral analysis
• phonological rules
• statistical properties of language / pronunciation
Recognition methods:
• Hidden Markov Models
• Neural Networks
• pattern classification in general

Speech Signal Characteristics
Derived from the signal representation:
• formants: dark stripes in the spectrum; strong frequency components; characterize particular vowels and the gender of the speaker
• pitch: fundamental frequency; baseline for higher-frequency harmonics like the formants; gender characteristic
• change in frequency distribution: characteristic for e.g. plosives (form of articulation)
[Figures: features for vowels and consonants; probabilistic FAs as word models; word recognition with a Hidden Markov Model]

Viterbi Algorithm
The Viterbi Algorithm finds an optimal sequence of states in continuous speech recognition, given an observation sequence of phones and a probabilistic (weighted) FA (state graph). The algorithm returns the path through the automaton which has maximum probability and accepts the observation sequence. a[s,s'] is the transition probability (in the phonetic word model) from the current state s to the next state s', and b[s',ot] is the observation likelihood of s' given ot; b[s',ot] is 1 if the observation symbol matches the state, and 0 otherwise. (cf. Jurafsky Ch. 5)
[Figure: speech recognizer architecture]

Speech Processing - Characteristics
• speech recognition vs. speaker identification (voice recognition)
• speaker-dependent vs. speaker-independent training
• unlimited vs. large vs. small vocabulary
• single-word vs. continuous speech
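The sampling / windowing / FFT pipeline described above can be sketched in a few lines of NumPy; the frame length, hop size, and the synthetic sine-wave stand-in for a speech signal are illustrative assumptions.

    import numpy as np

    rate = 16000                                  # samples per second
    t = np.arange(0, 1.0, 1.0 / rate)
    signal = np.sin(2 * np.pi * 440 * t)          # stand-in for a speech signal

    frame_len, hop = 400, 160                     # 25 ms windows, 10 ms hop
    window = np.hamming(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len, hop)]

    # One spectrum per time frame: magnitudes of the positive frequencies.
    spectrogram = np.array([np.abs(np.fft.rfft(f)) for f in frames])
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / rate)
    print(spectrogram.shape)                 # (number of frames, frame_len // 2 + 1)
    print(freqs[np.argmax(spectrogram[0])])  # strongest component: 440.0 Hz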
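Following the a[s,s'] / b[s',ot] notation above, here is a minimal Viterbi sketch over a toy weighted word model; the three-state phone model for "need" and its probabilities are invented for illustration.

    # Phone model for "need": start -> n -> iy -> d, with self-loops.
    TRANS = {                     # a[s, s']: transition probabilities
        ("start", "n"): 1.0,
        ("n", "n"): 0.3, ("n", "iy"): 0.7,
        ("iy", "iy"): 0.4, ("iy", "d"): 0.6,
        ("d", "d"): 1.0,
    }

    def b(state, obs):            # b[s', ot]: 1 if the observed phone matches
        return 1.0 if state == obs else 0.0

    def viterbi(observations, states=("n", "iy", "d")):
        # Initialization from the start state.
        v = {s: TRANS.get(("start", s), 0.0) * b(s, observations[0]) for s in states}
        path = {s: [s] for s in states}
        # Recursion: keep only the best-scoring predecessor of each state.
        for obs in observations[1:]:
            v_new, path_new = {}, {}
            for s2 in states:
                s1 = max(states, key=lambda s: v[s] * TRANS.get((s, s2), 0.0))
                v_new[s2] = v[s1] * TRANS.get((s1, s2), 0.0) * b(s2, obs)
                path_new[s2] = path[s1] + [s2]
            v, path = v_new, path_new
        best = max(states, key=lambda s: v[s])
        return path[best], v[best]

    print(viterbi(["n", "iy", "iy", "d"]))  # (['n', 'iy', 'iy', 'd'], 0.168)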
Spoken Language
The output of the speech recognition system serves as the input "text". It can be associated with probabilities for different word sequences, and it contains ungrammatical structures, so-called "disfluencies", e.g. repetitions and corrections.

Spoken Language - Examples
1. no [s-] straight southwest
2. right to [my] my left
3. [that is] that is correct
From: Robin J. Lickley, HCRC Disfluency Coding Manual. http://www.ling.ed.ac.uk/~robin/maptask/HCRCdsm-01.html

Spoken Language - Examples
1. we're going to [g--] ... turn straight back around for testing.
2. [come to] ... walk right to the ... right-hand side of the page.
3. right [up ... past] ... up on the left of the ... white mountain walk ... right up past.
4. [i'm still] ... i've still gone halfway back round the lake again.

Spoken Language - Examples
1. [I'd] [d if] I need to go
2. [it's basi--] see if you go over the old mill
3. [you are going] make a gradual slope ... to your right
4. [I've got one] I don't realize why it is there

Spoken Language - Disfluency
Reparandum and repair:
[come to] ... walk right to [the] ... the right-hand side of the page
The bracketed material is the reparandum; the restart that replaces it is the repair.

Additional References
Jurafsky, D. & J. H. Martin: Speech and Language Processing. Prentice-Hall, 2000.
Huang, X., A. Acero & H. Hon: Spoken Language Processing. A Guide to Theory, Algorithms, and System Development. Prentice-Hall, NJ, 2001.
Kemke, C.: 74.793 Natural Language and Speech Processing, Course Notes, 2nd Term 2004, Dept. of Computer Science, U. of Manitoba.
Lickley, Robin J.: HCRC Disfluency Coding Manual. http://www.ling.ed.ac.uk/~robin/maptask/HCRCdsm-01.html

Figures
Figures taken from: Jurafsky, D. & J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000, Chapters 5 and 7; lingWAVES (from http://www.lingcom.de).