Hidden Markov Models

Martin Emms

March 22, 2017

Outline
◮ Best path through an HMM: Viterbi Decoding
◮ Illustration: Part of Speech Tagging


Best path through an HMM: Viterbi Decoding

Decoding

We want to find the most probable hidden state sequence for a given sequence of visible observations:

    decode(o_{1:T}) = argmax_{s_{1:T}} [ π(s_1) b_{s_1}(o_1) × ∏_{t=2}^{T} a_{s_{t-1} s_t} b_{s_t}(o_t) ]

◮ with N possible states, there are N^T possible state sequences for o_{1:T}
◮ it is not computationally feasible to simply enumerate the possible state sequences
◮ the Viterbi algorithm is an efficient, dynamic-programming method for avoiding this


Part of Speech tagging example

[Figure: a fragment of the tagging HMM, with states PNI, VVZ, AT0, NN1 and STOP. States emit words, e.g. VVZ emits "wants" and "tries" with probabilities .002 and .003, AT0 emits "a" and "the", PNI emits "one", NN1 emits "pause"; the arcs between states carry transition probabilities such as .5 and .3.]

In this example
◮ states are part-of-speech tags
◮ observation symbols are words
◮ e.g. P(tag at t is AT0 | tag at t − 1 is VVZ) = 0.5
◮ e.g. P(word at t is "wants" | tag at t is VVZ) = 0.002

This part-of-speech tagging scenario will be used to illustrate the Viterbi algorithm. The following few slides just explain the POS tags a little, though this is not really necessary to follow the algorithm.

AJ0  Adjective (general or positive)       good, old, beautiful
AJC  Comparative adjective                 better, older
AJS  Superlative adjective                 best, oldest
AT0  Article                               the, a, an, no
AV0  General adverb                        often, well, longer
AVP  Adverb particle                       up, off, out
AVQ  Wh-adverb                             when, where, how, why, wherever
CJC  Coordinating conjunction              and, or, but
CJS  Subordinating conjunction             although, when
CJT  The subordinating conjunction that    that
CRD  Cardinal number                       one, 3, fifty-five, 3609
DPS  Possessive determiner                 your, their, his
DT0  General determiner                    this
DTQ  Wh-determiner                         which, what, whose, whichever
EX0  Existential there                     there in "there is"
ITJ  Interjection                          oh, yes, mhm, wow
NN0  Common noun, neutral for number       aircraft, data, committee
NN1  Singular common noun                  pencil, goose, time, revelation
NN2  Plural common noun                    pencils, geese, times, revelations
NP0  Proper noun                           London, Michael, Mars

ORD  Ordinal numeral                       first, sixth, 77th, last
PNI  Indefinite pronoun                    none, everything, one
PNP  Personal pronoun                      I, you, them, ours
PNQ  Wh-pronoun                            who, whoever, whom
PNX  Reflexive pronoun                     myself, yourself, itself, ourselves
POS  The possessive or genitive marker     's
PRF  The preposition of                    of
PRP  Preposition (except for of)           about, at, in, on
PUL  Punctuation: left bracket             ( or [
PUN  Punctuation: general separating mark  . , ! : ; - or ?
PUQ  Punctuation: quotation mark           '
PUR  Punctuation: right bracket            ) or ]
TO0  Infinitive marker to                  to
UNC  Unclassified items                    formulae

VBB  The present tense forms of the verb BE     am, are, 'm, 're
VBD  The past tense forms of the verb BE        was, were
VBG  The -ing form of the verb BE               being
VBI  The infinitive form of the verb BE         be
VBN  The past participle form of the verb BE    been
VBZ  The -s form of the verb BE                 is, 's
VDB  The finite base form of the verb DO        do
VDD  The past tense form of the verb DO         did
VDG  The -ing form of the verb DO               doing
VDI  The infinitive form of the verb DO         do
VDN  The past participle form of the verb DO    done
VDZ  The -s form of the verb DO                 does, 's
VHB  The finite base form of the verb HAVE      have, 've
VHD  The past tense form of the verb HAVE       had, 'd
VHG  The -ing form of the verb HAVE             having
VHI  The infinitive form of the verb HAVE       have
VHN  The past participle form of the verb HAVE  had
VHZ  The -s form of the verb HAVE               has, 's
VM0  Modal auxiliary verb                       will, would, can, could, 'll, 'd
VVB  The finite base form of lexical verbs      forget, send, live, return
VVD  The past tense form of lexical verbs       forgot, sent, lived, returned
VVG  The -ing form of lexical verbs             forgetting, sending, living, returning
VVI  The infinitive form of lexical verbs       forget, send, live, return
VVN  The past participle form of lexical verbs  forgotten, sent, lived, returned
VVZ  The -s form of lexical verbs               forgets, sends, lives, returns
XX0  The negative particle                      not or n't
ZZ0  Alphabetical symbols                       A, a, B, b, c, d
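To make the decoding objective concrete before returning to the algorithm, here is a minimal scoring sketch in Python. The representation is hypothetical, not from the slides: π, a and b become nested dictionaries pi, a and b, with missing entries read as probability 0.

    def sequence_score(states, obs, pi, a, b):
        # pi(s_1) * b_{s_1}(o_1) * prod over t = 2..T of a_{s_{t-1} s_t} * b_{s_t}(o_t)
        p = pi.get(states[0], 0.0) * b.get(states[0], {}).get(obs[0], 0.0)
        for t in range(1, len(obs)):
            p *= (a.get(states[t - 1], {}).get(states[t], 0.0)
                  * b.get(states[t], {}).get(obs[t], 0.0))
        return p

decode(o_{1:T}) is then the argmax of sequence_score over all N^T state sequences of length T; the point of the Viterbi algorithm below is to find that argmax without enumerating them.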
Best path through an HMM: Viterbi Decoding

Decoding

This part-of-speech example provides a model which is intended to assign probabilities to state+observation sequences such as

    s: PNI  VVZ    AT0  NN1        s: AT0  NN1  VVZ
    o: one  wants  a    pause      o: the  cup  wins

etc.

◮ best_path(t, i): the best path through the HMM which accounts for the first t observation symbols and ends in state i
◮ abs_best_path(t): the best path through the HMM which accounts for the first t observation symbols
◮ abs_best_path(T) is what you eventually want, but since clearly

    abs_best_path(t) = max_{1 ≤ i ≤ N} best_path(t, i)

computing best_path(t, i) suffices.

Viterbi in words

◮ abs_best_path(t) cannot be calculated from abs_best_path(t − 1) alone
◮ best_path(t, ·), however, is easy to calculate from best_path(t − 1, ·), in outline as follows: for a given state j at time t, consider every possible immediate predecessor i of j and compare best_path(t − 1, i) × a_{ij} b_j(o_t); take the maximum, and remember which i was j's best predecessor
◮ this can be implemented by tabulating the values of best_path(t, i), filling the entries for t − 1 before the entries for t

Viterbi pseudocode

Initialisation:

    for (i = 1; i <= N; i++) {
        best_path(1, i).prob = π(i) * b_i(o_1);
    }

Iteration:

    for (t = 2; t <= T; t++) {
        for (j = 1; j <= N; j++) {
            max = 0;
            for (i = 1; i <= N; i++) {
                p = best_path(t-1, i).prob * a_ij * b_j(o_t);
                if (p > max) { max = p; prev_state = i; }
            }
            best_path(t, j).prob = max;
            best_path(t, j).prev_state = prev_state;
        }
    }

The cost of this algorithm is of the order of N²T, compared to the brute-force cost N^T.
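For concreteness, here is the pseudocode transcribed into runnable Python (a sketch only, using the same nested-dictionary representation of π, a and b as in the earlier scoring sketch; the final traceback, which follows the stored prev_state links to recover the winning state sequence, is left implicit on the slides):

    def viterbi(obs, states, pi, a, b):
        # best_path[t][j] = (prob of the best path over o_1..o_t ending in j,
        #                    j's best predecessor, or None at t = 1)
        best_path = [dict() for _ in obs]
        # Initialisation: best_path(1, i) = pi(i) * b_i(o_1)
        for i in states:
            best_path[0][i] = (pi.get(i, 0.0) * b.get(i, {}).get(obs[0], 0.0), None)
        # Iteration: best_path(t, j) = max_i best_path(t-1, i) * a_ij * b_j(o_t)
        for t in range(1, len(obs)):
            for j in states:
                emit = b.get(j, {}).get(obs[t], 0.0)
                prob, prev = max((best_path[t - 1][i][0]
                                  * a.get(i, {}).get(j, 0.0) * emit, i)
                                 for i in states)
                best_path[t][j] = (prob, prev)
        # Traceback: take the most probable final state and follow the
        # remembered predecessors back to t = 1
        prob, state = max((best_path[-1][j][0], j) for j in states)
        seq = [state]
        for t in range(len(obs) - 1, 0, -1):
            state = best_path[t][state][1]
            seq.append(state)
        return list(reversed(seq)), prob

The two nested loops over i and j inside the loop over t are where the N²T cost comes from.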
Illustration: Part of Speech Tagging

Trellis

◮ the operation of the algorithm is best visualised with a trellis; this has as many columns as there are observation symbols, and the column at t shows the states i with non-zero P(o_t | i)
◮ the next picture shows such a trellis for a part-of-speech tagging example, where the observation sequence is "one wants a pause"

The defining probabilities

Initial:      π(CRD) = .004      π(PNI) = .001

Emission:     one|CRD   = .001   one|PNI   = .001
              wants|VVZ = .002   wants|NN2 = .0002
              a|AT0     = .45
              pause|NN1 = .002   pause|VVB = .001   pause|VVI = .001

Transition:   VVZ|CRD = .0001    VVZ|PNI = .5
              NN2|CRD = .45      NN2|PNI = .0001
              AT0|VVZ = .5       AT0|NN2 = .01
              NN1|AT0 = .5       VVB|AT0 = .01      VVI|AT0 = .01

A trellis

[Figure: the trellis for "one wants a pause". Column 1 (one) holds CRD and PNI; column 2 (wants) holds VVZ and NN2; column 3 (a) holds AT0; column 4 (pause) holds NN1, VVB and VVI. Parts of the decode equation are assigned to the trellis: each column-1 node carries π(s) P(one|s), e.g. π(PNI) P(one|PNI) = (.001)(.001); each arc i → j at time t carries P(j|i) P(o_t|j), e.g. the arc PNI → VVZ carries P(VVZ|PNI) P(wants|VVZ) = (.5)(.002), and the arc VVZ → AT0 carries P(AT0|VVZ) P(a|AT0) = (.5)(.45).]

The problem is now to find the best path through this trellis, where the total probability of a path is the product of its segments.

Viterbi initialisation

    best_path(1, CRD) = π(CRD) P(one|CRD) = (.004)(.001) = 4e−6
    best_path(1, PNI) = π(PNI) P(one|PNI) = (.001)(.001) = 1e−6
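These defining probabilities translate directly into the dictionary encoding assumed in the earlier sketches (the code is mine; the numbers are the slide's):

    pi = {"CRD": .004, "PNI": .001}
    a = {"CRD": {"VVZ": .0001, "NN2": .45},
         "PNI": {"VVZ": .5,    "NN2": .0001},
         "VVZ": {"AT0": .5},
         "NN2": {"AT0": .01},
         "AT0": {"NN1": .5, "VVB": .01, "VVI": .01}}
    b = {"CRD": {"one": .001},   "PNI": {"one": .001},
         "VVZ": {"wants": .002}, "NN2": {"wants": .0002},
         "AT0": {"a": .45},
         "NN1": {"pause": .002}, "VVB": {"pause": .001}, "VVI": {"pause": .001}}
    states = list(b)

    # Reproduce the initialisation step (up to floating-point rounding):
    for s in ("CRD", "PNI"):
        print(s, pi[s] * b[s]["one"])   # CRD ≈ 4e-06, PNI ≈ 1e-06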
Best path to wants/VVZ: for best_path(2, VVZ), compare

    CRD → VVZ: (4e−6) × (.0001)(.002) = (4e−6)(2e−7) = 8e−13
    PNI → VVZ: (1e−6) × (.5)(.002)    = (1e−6)(1e−3) = 1e−9      winner!

Best path to wants/NN2: for best_path(2, NN2), compare

    CRD → NN2: (4e−6) × (.45)(.0002)   = (4e−6)(9e−5) = 3.6e−10   winner!
    PNI → NN2: (1e−6) × (.0001)(.0002) = (1e−6)(2e−8) = 2e−14

best_path(2, ·) finished:

    best_path(2, VVZ) = PNI VVZ   1e−9
    best_path(2, NN2) = CRD NN2   3.6e−10

Best path to a/AT0: for best_path(3, AT0), compare

    PNI VVZ → AT0: (1e−9) × (.5)(.45)     = (1e−9)(2.25e−1)   = 2.25e−10   winner!
    CRD NN2 → AT0: (3.6e−10) × (.01)(.45) = (3.6e−10)(4.5e−3) = 1.6e−12

best_path(3, ·) finished:

    best_path(3, AT0) = PNI VVZ AT0   2.25e−10

Best path to pause/NN1:

    best_path(4, NN1) = PNI VVZ AT0 NN1: (2.25e−10) × (.5)(.002)  = (2.25e−10)(1e−3) = 2.25e−13

Best path to pause/VVB:

    best_path(4, VVB) = PNI VVZ AT0 VVB: (2.25e−10) × (.01)(.001) = (2.25e−10)(1e−5) = 2.25e−15

Best path to pause/VVI:

    best_path(4, VVI) = PNI VVZ AT0 VVI: (2.25e−10) × (.01)(.001) = (2.25e−10)(1e−5) = 2.25e−15

best_path(4, ·) finished:

    best_path(4, NN1) = PNI VVZ AT0 NN1   2.25e−13   (final max)
    best_path(4, VVB) = PNI VVZ AT0 VVB   2.25e−15
    best_path(4, VVI) = PNI VVZ AT0 VVI   2.25e−15
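So the decoded tag sequence for "one wants a pause" is PNI VVZ AT0 NN1. Running the full decode with the viterbi sketch and the dictionaries given earlier agrees (again, hypothetical code, not from the slides):

    tags, prob = viterbi("one wants a pause".split(), states, pi, a, b)
    print(tags, prob)   # ['PNI', 'VVZ', 'AT0', 'NN1']  ≈ 2.25e-13

As a final hand check, the winning probability is just the product of the winning path's node and arc weights:

    π(PNI) P(one|PNI) × P(VVZ|PNI) P(wants|VVZ) × P(AT0|VVZ) P(a|AT0) × P(NN1|AT0) P(pause|NN1)
    = (.001)(.001) × (.5)(.002) × (.5)(.45) × (.5)(.002)
    = (1e−6)(1e−3)(2.25e−1)(1e−3) = 2.25e−13

which agrees with best_path(4, NN1).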