Hidden Markov Models

Martin Emms

March 22, 2017

Outline

Hidden Markov Models
Best path through an HMM: Viterbi Decoding
Illustration: Part of Speech Tagging
Decoding

want to find the most probable hidden state sequence for a given sequence of visible observations:

    decode(o_{1:T}) = \arg\max_{s_{1:T}} \Big[ \pi(s_1)\, b_{s_1}(o_1) \times \prod_{t=2}^{T} b_{s_t}(o_t)\, a_{s_{t-1} s_t} \Big]

◮ N possible states
◮ N^T possible state sequences for o_{1:T}
◮ not computationally feasible to simply enumerate the possible state sequences
◮ the Viterbi algorithm is an efficient, dynamic programming method for avoiding this
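To make the infeasibility concrete, here is a minimal sketch (mine, not from the slides) of the naive decoder that the Viterbi algorithm avoids: it scores every one of the N^T candidate state sequences. The encoding of the parameters as dicts pi, A and B is my assumption.

from itertools import product

def brute_force_decode(obs, states, pi, A, B):
    # pi: state -> initial prob; A: (state i, state j) -> transition prob a_ij;
    # B: (state, symbol) -> emission prob. Enumerates all N**T sequences,
    # which is exactly what is not feasible for realistic N and T.
    best_seq, best_p = None, 0.0
    for seq in product(states, repeat=len(obs)):
        p = pi.get(seq[0], 0.0) * B.get((seq[0], obs[0]), 0.0)
        for t in range(1, len(obs)):
            p *= A.get((seq[t - 1], seq[t]), 0.0) * B.get((seq[t], obs[t]), 0.0)
        if p > best_p:
            best_seq, best_p = seq, p
    return best_seq, best_p

Even for a modest tagset of N = 50 and a 20-word sentence this loop would visit 50^20 ≈ 10^34 sequences.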
Part of Speech tagging example

[Figure: a fragment of an HMM for tagging. States are the tags PNI, VVZ, AT0 and NN1, plus STOP; PNI emits one, VVZ emits wants (.002) and tries (.003), AT0 emits a (.5) and the (.3), NN1 emits pause, and arcs between states carry transition probabilities.]
In this example

◮ states are part-of-speech tags
◮ observation symbols are words
◮ e.g. P(tag at t is AT0 | tag at t − 1 is VVZ) = 0.5
◮ e.g. P(word at t is wants | tag at t is VVZ) = 0.002
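In the π, a, b notation of the decode formula, these two example numbers are a transition parameter and an emission parameter respectively (this restatement is mine, not on the slide):

    a_{\mathrm{VVZ},\mathrm{AT0}} = P(\mathrm{AT0} \mid \mathrm{VVZ}) = 0.5, \qquad
    b_{\mathrm{VVZ}}(\textit{wants}) = P(\textit{wants} \mid \mathrm{VVZ}) = 0.002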
This part-of-speech tagging scenario will be used to illustrate the Viterbi algorithm. The following table explains the POS tags a little, though this is not really necessary to follow the algorithm.

AJ0  Adjective (general or positive)             good, old, beautiful
AJC  Comparative adjective                       better, older
AJS  Superlative adjective                       best, oldest
AT0  Article                                     the, a, an, no
AV0  General adverb                              often, well, longer
AVP  Adverb particle                             up, off, out
AVQ  Wh-adverb                                   when, where, how, why, wherever
CJC  Coordinating conjunction                    and, or, but
CJS  Subordinating conjunction                   although, when
CJT  The subordinating conjunction that          that
CRD  Cardinal number                             one, 3, fifty-five, 3609
DPS  Possessive determiner                       your, their, his
DT0  General determiner                          this
DTQ  Wh-determiner                               which, what, whose, whichever
EX0  Existential there                           there in the there is
ITJ  Interjection                                oh, yes, mhm, wow
NN0  Common noun, neutral for number             aircraft, data, committee
NN1  Singular common noun                        pencil, goose, time, revelation
NN2  Plural common noun                          pencils, geese, times, revelations
NP0  Proper noun                                 London, Michael, Mars
ORD  Ordinal numeral                             first, sixth, 77th, last
PNI  Indefinite pronoun                          none, everything, one
PNP  Personal pronoun                            I, you, them, ours
PNQ  Wh-pronoun                                  who, whoever, whom
PNX  Reflexive pronoun                           myself, yourself, itself, ourselves
POS  The possessive or genitive marker           's
PRF  The preposition of                          of
PRP  Preposition (except for of)                 about, at, in, on
PUL  Punctuation: left bracket                   ( or [
PUN  Punctuation: general separating mark        . , ! : ; - or ?
PUQ  Punctuation: quotation mark                 '
PUR  Punctuation: right bracket                  ) or ]
TO0  Infinitive marker to                        to
UNC  Unclassified items                          formulae
VBB  The present tense forms of the verb BE      am, are, 'm, 're
VBD  The past tense forms of the verb BE         was, were
VBG  The -ing form of the verb BE                being
VBI  The infinitive form of the verb BE          be
VBN  The past participle form of the verb BE     been
VBZ  The -s form of the verb BE                  is, 's
VDB  The finite base form of the verb DO         do
VDD  The past tense form of the verb DO          did
VDG  The -ing form of the verb DO                doing
VDI  The infinitive form of the verb DO          do
VDN  The past participle form of the verb DO     done
VDZ  The -s form of the verb DO                  does, 's
VHB  The finite base form of the verb HAVE       have, 've
VHD  The past tense form of the verb HAVE        had, 'd
VHG  The -ing form of the verb HAVE              having
VHI  The infinitive form of the verb HAVE        have
VHN  The past participle form of the verb HAVE   had
VHZ  The -s form of the verb HAVE                has, 's
VM0  Modal auxiliary verb                        will, would, can, could, 'll, 'd
VVB  The finite base form of lexical verbs       forget, send, live, return
VVD  The past tense form of lexical verbs        forgot, sent, lived, returned
VVG  The -ing form of lexical verbs              forgetting, sending, living, returning
VVI  The infinitive form of lexical verbs        forget, send, live, return
VVN  The past participle form of lexical verbs   forgotten, sent, lived, returned
VVZ  The -s form of lexical verbs                forgets, sends, lives, returns
XX0  The negative particle                       not or n't
ZZ0  Alphabetical symbols                        A, a, B, b, c, d
Decoding

this part-of-speech example provides a model which is intended to assign probabilities to state+obs sequences like

s: PNI VVZ   AT0 NN1        s: AT0 NN1 VVZ
o: one wants a   pause      o: the cup wins

etc

◮ best_path(t, i): best path through the HMM which accounts for the first t obs symbols and ends with state i
◮ abs_best_path(t): best path through the HMM which accounts for the first t obs symbols
◮ abs_best_path(T) is what you eventually want, but since clearly

      abs_best_path(t) = max_{1≤i≤N} best_path(t, i)

  best_path(t, i) suffices
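Stated as equations, the recursion that the next slides spell out is (a restatement of the above in the π, a, b notation of the decode formula; not on the original slide):

\begin{align*}
\mathrm{best\_path}(1, i)   &= \pi(i)\, b_i(o_1) \\
\mathrm{best\_path}(t, j)   &= \max_{1 \le i \le N}\big[\mathrm{best\_path}(t-1, i) \times a_{ij}\, b_j(o_t)\big] \\
\mathrm{abs\_best\_path}(T) &= \max_{1 \le i \le N} \mathrm{best\_path}(T, i)
\end{align*}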
Viterbi in words

◮ abs_best_path(t) is impossible to calculate from abs_best_path(t − 1)
◮ best_path(t, ·) is easy to calculate from best_path(t − 1, ·), in outline as follows:
  for a given state j at time t,
  consider every possible immediate predecessor i for j,
  compare best_path(t − 1, i) × a_ij b_j(o_t),
  take the maximum and remember which i was j's best predecessor
◮ can implement by tabulating values of best_path(t, i), tabulating entries for t − 1 before entries for t

Viterbi Pseudo code

Initialisation:

for (i = 1; i ≤ N; i++) {
    best_path(1, i) = π(i) b_i(o_1);
}

Iteration:

for (t = 2; t ≤ T; t++) {
    for (j = 1; j ≤ N; j++) {
        max = 0;
        for (i = 1; i ≤ N; i++) {
            p = best_path(t − 1, i).prob × a_ij b_j(o_t);
            if (p > max) { max = p; prev_state = i; }
        }
        best_path(t, j).prob = max;
        best_path(t, j).prev_state = prev_state;
    }
}

The cost of this algorithm is of the order of N²T, compared to the brute-force cost N^T.
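A runnable rendering of this pseudocode in Python (my sketch, not from the slides: the pi/A/B dict encoding of the parameters is assumed, and a backpointer walk is added at the end to recover the state sequence that the prev_state fields implicitly store):

def viterbi(obs, pi, A, B):
    # obs: observation symbols o_1..o_T (a list)
    # pi:  dict, state -> initial probability pi(i)
    # A:   dict, (state i, state j) -> transition probability a_ij
    # B:   dict, (state j, symbol o) -> emission probability b_j(o)
    # Returns (most probable state sequence, its probability).
    states = sorted(set(pi) | {s for pair in A for s in pair})
    # best[t][j] = (prob of best path ending in state j at time t, predecessor)
    best = [dict() for _ in obs]

    # Initialisation: best_path(1, i) = pi(i) * b_i(o_1)
    for i in states:
        best[0][i] = (pi.get(i, 0.0) * B.get((i, obs[0]), 0.0), None)

    # Iteration: best_path(t, j) = max_i best_path(t-1, i) * a_ij * b_j(o_t)
    for t in range(1, len(obs)):
        for j in states:
            p_max, prev = 0.0, None
            for i in states:
                p = best[t - 1][i][0] * A.get((i, j), 0.0) * B.get((j, obs[t]), 0.0)
                if p > p_max:
                    p_max, prev = p, i
            best[t][j] = (p_max, prev)

    # Termination: take max_j best_path(T, j), then follow the stored
    # predecessors backwards to recover the state sequence itself.
    last = max(states, key=lambda j: best[-1][j][0])
    prob = best[-1][last][0]
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, prob

Like the pseudocode, the triple loop costs on the order of N²T.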
Illustration: Part of Speech Tagging

Trellis

◮ operation of the algorithm is best visualised with a trellis; this has as many columns as there are observation symbols, and the column at t shows the states i with non-zero P(o_t | i)
◮ the next picture shows such a trellis for a part-of-speech tagging example, where the observation sequence is one wants a pause

The defining probabilities

Emission probabilities:

P(one | CRD) = .001      P(one | PNI) = .001
P(wants | VVZ) = .002    P(wants | NN2) = .0002
P(a | AT0) = .45
P(pause | NN1) = .002    P(pause | VVB) = .001    P(pause | VVI) = .001

Initial probabilities:

π(CRD) = .004    π(PNI) = .001

Transition probabilities:

P(VVZ | CRD) = .0001    P(VVZ | PNI) = .5
P(NN2 | CRD) = .45      P(NN2 | PNI) = .0001
P(AT0 | VVZ) = .5       P(AT0 | NN2) = .01
P(NN1 | AT0) = .5       P(VVB | AT0) = .01    P(VVI | AT0) = .01
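Encoded for the viterbi sketch given earlier (the dict layout pi, A, B is my assumption; the numbers are exactly those above):

pi = {"CRD": .004, "PNI": .001}                 # initial probabilities

A = {                                           # (tag at t-1, tag at t) -> a_ij
    ("CRD", "VVZ"): .0001, ("PNI", "VVZ"): .5,
    ("CRD", "NN2"): .45,   ("PNI", "NN2"): .0001,
    ("VVZ", "AT0"): .5,    ("NN2", "AT0"): .01,
    ("AT0", "NN1"): .5,    ("AT0", "VVB"): .01, ("AT0", "VVI"): .01,
}

B = {                                           # (tag, word) -> b_j(o)
    ("CRD", "one"): .001,   ("PNI", "one"): .001,
    ("VVZ", "wants"): .002, ("NN2", "wants"): .0002,
    ("AT0", "a"): .45,
    ("NN1", "pause"): .002, ("VVB", "pause"): .001, ("VVI", "pause"): .001,
}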
A trellis

[Figure: the trellis for one wants a pause. Column 1 (one) contains CRD and PNI, column 2 (wants) contains VVZ and NN2, column 3 (a) contains AT0, and column 4 (pause) contains NN1, VVB and VVI.]

Parts of the equation are assigned to each arc in this trellis; for example, along the path PNI VVZ AT0 NN1 the arcs carry

π(PNI) P(one | PNI)
  × P(VVZ | PNI) P(wants | VVZ)
  × P(AT0 | VVZ) P(a | AT0)
  × P(NN1 | AT0) P(pause | NN1)
A trellis

[Figure: the same trellis with each arc labelled by its (transition)(emission) probability product, e.g. the CRD → VVZ arc carries (.0001)(.002) and the VVZ → AT0 arc carries (.5)(.45).]

The problem is now to find the best path through this trellis, where the total path probability is the product of its segments.

Viterbi initialisation

[Figure: the trellis with the first column scored.]

best_path(1,CRD) = π(CRD) P(one | CRD) = (.004)(.001) = 4e−6
best_path(1,PNI) = π(PNI) P(one | PNI) = (.001)(.001) = 1e−6
best path to wants/VVZ

[Figure: the trellis with the two arcs into VVZ at t = 2 highlighted, CRD → VVZ labelled (.0001)(.002) and PNI → VVZ labelled (.5)(.002).]

for best_path(2,VVZ) compare
  CRD VVZ (4e−6)(2e−7) = 8e−13
  PNI VVZ (1e−6)(1e−3) = 1e−9   winner!

best path to wants/NN2

[Figure: the trellis with the two arcs into NN2 at t = 2 highlighted, CRD → NN2 labelled (.45)(.0002) and PNI → NN2 labelled (.0001)(.0002).]

for best_path(2,NN2) compare
  CRD NN2 (4e−6)(9e−5) = 3.6e−10   winner!
  PNI NN2 (1e−6)(2e−8) = 2e−14
best path(2,.) finished

[Figure: the trellis after step 2.]

best_path(2,VVZ) = PNI VVZ 1e−9
best_path(2,NN2) = CRD NN2 3.6e−10

best path to a/AT0

[Figure: the trellis with the two arcs into AT0 at t = 3 highlighted, VVZ → AT0 labelled (.5)(.45) and NN2 → AT0 labelled (.01)(.45).]

for best_path(3,AT0) compare
  PNI VVZ AT0 (1e−9)(2.25e−1) = 2.25e−10   winner!
  CRD NN2 AT0 (3.6e−10)(4.5e−3) = 1.6e−12
best path(3,.) finished

[Figure: the trellis after step 3.]

best_path(3,AT0) = PNI VVZ AT0 2.25e−10

best path to pause/NN1

[Figure: the trellis with the arc AT0 → NN1 at t = 4 highlighted, labelled (.5)(.002).]

best_path(4,NN1) = PNI VVZ AT0 NN1 (2.25e−10)(1e−3) = 2.25e−13
best path to pause/VVB

[Figure: the trellis with the arc AT0 → VVB at t = 4 highlighted, labelled (.01)(.001).]

best_path(4,VVB) = PNI VVZ AT0 VVB (2.25e−10)(1e−5) = 2.25e−15

best path to pause/VVI

[Figure: the trellis with the arc AT0 → VVI at t = 4 highlighted, labelled (.01)(.001).]
best_path(4,VVI) = PNI VVZ AT0 VVI (2.25e−10)(1e−5) = 2.25e−15

best path(4,.) finished

[Figure: the completed trellis, all four columns scored.]

best_path(4,NN1) = PNI VVZ AT0 NN1 = 2.25e−13   (final max)
best_path(4,VVB) = PNI VVZ AT0 VVB = 2.25e−15
best_path(4,VVI) = PNI VVZ AT0 VVI = 2.25e−15

So the decoded tag sequence for one wants a pause is PNI VVZ AT0 NN1, with probability 2.25e−13.
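Running the earlier viterbi sketch with the pi, A and B dicts given for the defining probabilities reproduces this walkthrough (a usage sketch; the printed probability matches the slides' 2.25e−13 up to floating-point rounding):

obs = ["one", "wants", "a", "pause"]
path, prob = viterbi(obs, pi, A, B)
print(path, prob)   # ['PNI', 'VVZ', 'AT0', 'NN1'] 2.25e-13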