Hidden Markov Models
Usman Roshan
CS 675
Machine Learning
HMM applications
• Determine coding and non-coding
regions in DNA sequences
• Separating “bad” pixels from “good”
ones in image files
• Classification of DNA sequences
Hidden Markov Models
• Alphabet of symbols: S
• Set of states that emit symbols from the
alphabet: Q
• Set of probabilities
– State transition: akl for each k, l ∈ Q
– Emission probabilities: ek(b) for each k ∈ Q and b ∈ S
Loaded die problem
S = {1,2,3,4,5,6}, Q = {F,L}
eF(i) = 1/6 for all i = 1..6
eL(i) = 0.1 for i = 1..5 and eL(6) = 0.5
aFF = 0.95
aFL = 0.05
aLL = 0.9
aLF = 0.1
Loaded die automata
[Figure: two-state automaton with states F and L; self-loop transitions aFF and aLL, cross transitions aFL and aLF, and emission probabilities eF(i) and eL(i).]
Loaded die problem
• Consider the following rolls:
Observed (X):
21665261
Underlying die (P): FFLLFLLF
• What is the probability that the
underlying path generated the observed
sequence?
P(X, P) = ap0,p1 · ∏i=1..L [ epi(xi) · api,pi+1 ]
where p0 = begin and pL+1 = end
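The joint probability above can be sketched directly in code. This is a minimal illustration, not the lecture's own code; since the slide does not give begin/end transition probabilities, a uniform start distribution is assumed and the end state is omitted.

```python
# Loaded-die HMM parameters from the slides
trans = {("F", "F"): 0.95, ("F", "L"): 0.05, ("L", "L"): 0.9, ("L", "F"): 0.1}
emit = {"F": {s: 1 / 6 for s in "123456"},
        "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}}
start = {"F": 0.5, "L": 0.5}  # assumption: uniform start, no explicit begin state

def joint_prob(rolls, path):
    # P(X, P): start probability, then emission and transition along the path
    p = start[path[0]] * emit[path[0]][rolls[0]]
    for i in range(1, len(rolls)):
        p *= trans[(path[i - 1], path[i])] * emit[path[i]][rolls[i]]
    return p

p = joint_prob("21665261", "FFLLFLLF")
```

For the single roll "6" under hidden state L this gives 0.5 · 0.5 = 0.25, matching the product form of the equation.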
HMM computational problems
                                   Hidden sequence known    Hidden sequence unknown
Transition and emission            Model fully specified    Viterbi to determine the
probabilities known                                         optimal hidden sequence
Transition and emission            Maximum likelihood       Expectation-Maximization (EM),
probabilities unknown                                       also known as Baum-Welch
Probabilities unknown but
hidden sequence known
• Akl: number of transitions from state k to l
• Ek(b): number of times state k emits symbol b
akl = Akl / Σq∈Q Akq
ek(b) = Ek(b) / Σs∈S Ek(s)
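When the hidden path is known, these estimates are just normalized counts. A short sketch (assuming, for simplicity, that every state in the path has at least one outgoing transition):

```python
from collections import Counter

def ml_estimates(rolls, path):
    # Akl: number of transitions from state k to l
    A = Counter(zip(path, path[1:]))
    # Ek(b): number of times state k emits symbol b
    E = Counter(zip(path, rolls))
    states, syms = sorted(set(path)), sorted(set(rolls))
    # Normalize counts: akl = Akl / sum_q Akq, ek(b) = Ek(b) / sum_s Ek(s)
    a = {(k, l): A[(k, l)] / sum(A[(k, q)] for q in states)
         for k in states for l in states}
    e = {(k, b): E[(k, b)] / sum(E[(k, s)] for s in syms)
         for k in states for b in syms}
    return a, e

a, e = ml_estimates("21665261", "FFLLFLLF")
```

For the example rolls, F transitions out 3 times (once to F, twice to L), so a(F,L) = 2/3; F emits 4 symbols, twice the symbol 1, so e(F,1) = 1/2.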
Probabilities known but hidden
sequence unknown
• Problem: Given an HMM and a sequence of rolls, find the most probable underlying generating path.
• Let x1 x2 … xn be the sequence of rolls.
• Let VF(i) denote the probability of the most probable path of x1 x2 … xi that ends in state F. (Define VL(i) similarly.)
Probabilities known but hidden
sequence unknown
• Initialize: Vbegin(0) = 1, VL(0) = 0, VF(0) = 0
• Recurrence: for i = 0..n-1

VL(i+1) = eL(xi+1) · max{ VL(i) aLL, VF(i) aFL, VB(i) aBL }
VF(i+1) = eF(xi+1) · max{ VL(i) aLF, VF(i) aFF, VB(i) aBF }
V(i+1) = max{ VF(i+1), VL(i+1) }
T(i+1) = 0 if max from VF, 1 if max from VL
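The recurrence above can be sketched as follows. This is an illustrative implementation, not the course's code: it works in log space to avoid underflow, folds the begin state into an assumed uniform start distribution, and keeps backpointers (the slide's T) for the traceback.

```python
import math

trans = {("F", "F"): 0.95, ("F", "L"): 0.05, ("L", "L"): 0.9, ("L", "F"): 0.1}
emit = {"F": {s: 1 / 6 for s in "123456"},
        "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}}
start = {"F": 0.5, "L": 0.5}  # assumption: uniform start

def viterbi(rolls):
    states = "FL"
    # Initialize with the first emission (begin state folded into start)
    V = {k: math.log(start[k]) + math.log(emit[k][rolls[0]]) for k in states}
    back = []
    for x in rolls[1:]:
        new_V, ptr = {}, {}
        for l in states:
            # Best predecessor k maximizing V[k] + log a(k, l)
            best = max(states, key=lambda k: V[k] + math.log(trans[(k, l)]))
            new_V[l] = V[best] + math.log(trans[(best, l)]) + math.log(emit[l][x])
            ptr[l] = best
        back.append(ptr)
        V = new_V
    # Traceback from the best final state
    last = max(states, key=lambda k: V[k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))
```

On a long run of sixes the loaded state dominates (eL(6) = 0.5 vs. eF(6) = 1/6), so the decoded path is all L.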
Probabilities and hidden
sequence unknown
• Use the Expectation-Maximization algorithm (also known as the EM algorithm)
• Very popular, with many applications
• For HMMs it is also called the Baum-Welch algorithm
• Outline:
1. Start with a random assignment to transition and emission probabilities
2. Find expected transition and emission probabilities
3. Estimate actual transition and emission probabilities from the expected values in the previous step
4. Go to step 2 if the probabilities have not converged
HMM forward probabilities
• Consider the total probability of all hidden sequences
under a given HMM.
• Let fL(i) be the sum of the probabilities of all hidden sequences up to i that end in the state L.
• Then fL(i) is given by

fL(i) = eL(xi) · ( aBL fB(i-1) + aLL fL(i-1) + aFL fF(i-1) )

• We calculate fF(i) in the same way.
• We call these forward probabilities:
– f(i) = fL(i) + fF(i) + fB(i)
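The forward recurrence can be sketched and checked against a brute-force sum over every hidden path. This is an illustrative version with an assumed uniform start distribution in place of the begin state; it keeps the slide's convention that fk(i) includes the emission at position i.

```python
from itertools import product

trans = {("F", "F"): 0.95, ("F", "L"): 0.05, ("L", "L"): 0.9, ("L", "F"): 0.1}
emit = {"F": {s: 1 / 6 for s in "123456"},
        "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}}
start = {"F": 0.5, "L": 0.5}  # assumption: uniform start

def forward(rolls):
    # f[k] holds fk(i); the emission at position i is included, as on the slide
    f = {k: start[k] * emit[k][rolls[0]] for k in "FL"}
    for x in rolls[1:]:
        f = {l: emit[l][x] * sum(f[k] * trans[(k, l)] for k in "FL") for l in "FL"}
    return sum(f.values())

def brute_force(rolls):
    # Sum P(X, P) over every hidden path explicitly, for checking
    total = 0.0
    for path in product("FL", repeat=len(rolls)):
        p = start[path[0]] * emit[path[0]][rolls[0]]
        for i in range(1, len(rolls)):
            p *= trans[(path[i - 1], path[i])] * emit[path[i]][rolls[i]]
        total += p
    return total
```

The point of the recurrence is that `forward` needs O(n) work per state while `brute_force` enumerates 2^n paths, yet both give the same total probability.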
HMM backward probabilities
• Similarly we can calculate backward probabilities bL(i).
• Let bL(i) be the sum of the probabilities of all hidden sequences from i to the end that start in state L.
• Then bL(i) is given by

bL(i) = eL(xi) · ( aLE bE(i+1) + aLL bL(i+1) + aLF bF(i+1) )

• We calculate bF(i) in the same way.
• We call these backward probabilities:
– b(i) = bL(i) + bF(i) + bB(i)
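A sketch of the backward recurrence, under the same assumptions as before: no explicit end state (so the aLE term drops and bk(n) is just the emission at n), and the slide's convention that bk(i) includes the emission at position i.

```python
trans = {("F", "F"): 0.95, ("F", "L"): 0.05, ("L", "L"): 0.9, ("L", "F"): 0.1}
emit = {"F": {s: 1 / 6 for s in "123456"},
        "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}}
start = {"F": 0.5, "L": 0.5}  # assumption: uniform start

def backward(rolls):
    # bk(n) = ek(xn): emit the last symbol and stop (no end state assumed)
    b = {k: emit[k][rolls[-1]] for k in "FL"}
    for x in reversed(rolls[:-1]):
        # bk(i) = ek(xi) * sum_l a(k, l) * bl(i+1)
        b = {k: emit[k][x] * sum(trans[(k, l)] * b[l] for l in "FL") for k in "FL"}
    return b

# Total probability of the rolls, summed over start states
total = sum(start[k] * backward("166")[k] for k in "FL")
```

As a consistency check, this total must equal the forward total f(n) for the same rolls, since both sum P(X, P) over all hidden paths.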
Baum Welch
• How do we calculate expected transition
and emission probabilities?
• Consider the fair-loaded die problem. What is the expected number of transitions from fair (F) to loaded (L)?
• To answer, we count the number of times F transitions to L in each possible hidden sequence, weight each count by the probability of that hidden sequence, and sum.
Baum Welch
• For example suppose the input is 12
• What are the different hidden sequences?
• What is the probability of each?
• What is the total probability?
• What is the probability of all hidden sequences where the first state is F?
• Can we determine these answers
automatically with forward and backward
probabilities?
Baum Welch
General formula for expected number of transitions
from state k to l.
General formula for expected number of emissions
of b from state k.
Equations from Durbin et al., 1998.
Baum Welch
1. Initialize random values for all
parameters
2. Calculate forward and backward
probabilities
3. Calculate new model parameters
4. Did the new probabilities (parameters) change by more than 0.001? If yes, go to step 2. Otherwise stop.
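The four steps above can be sketched as one loop. This is an illustrative version, not the course's code: it uses the standard conventions of Durbin et al. internally (the backward values here exclude the emission at position i, unlike the slide's bk), assumes a fixed uniform start distribution instead of re-estimating a begin state, and uses the 0.001 threshold from step 4.

```python
STATES = "FL"
SYMS = "123456"
trans = {("F", "F"): 0.95, ("F", "L"): 0.05, ("L", "L"): 0.9, ("L", "F"): 0.1}
emit = {"F": {s: 1 / 6 for s in SYMS},
        "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}}
start = {"F": 0.5, "L": 0.5}  # assumption: fixed uniform start

def baum_welch(rolls, trans, emit, start, max_iter=200):
    n = len(rolls)
    for _ in range(max_iter):
        # Step 2: forward values f[i][k] = P(x1..xi, state i = k)
        f = [{k: start[k] * emit[k][rolls[0]] for k in STATES}]
        for x in rolls[1:]:
            f.append({l: emit[l][x] * sum(f[-1][k] * trans[(k, l)] for k in STATES)
                      for l in STATES})
        px = sum(f[-1].values())
        # Backward values b[i][k] = P(x_{i+1}..xn | state i = k)
        b = [None] * n
        b[n - 1] = {k: 1.0 for k in STATES}
        for i in range(n - 2, -1, -1):
            b[i] = {k: sum(trans[(k, l)] * emit[l][rolls[i + 1]] * b[i + 1][l]
                           for l in STATES) for k in STATES}
        # Step 3: expected counts, then re-estimated parameters
        A = {(k, l): sum(f[i][k] * trans[(k, l)] * emit[l][rolls[i + 1]] * b[i + 1][l]
                         for i in range(n - 1)) / px
             for k in STATES for l in STATES}
        E = {(k, s): sum(f[i][k] * b[i][k] for i in range(n) if rolls[i] == s) / px
             for k in STATES for s in SYMS}
        new_trans = {(k, l): A[(k, l)] / sum(A[(k, q)] for q in STATES)
                     for k in STATES for l in STATES}
        new_emit = {k: {s: E[(k, s)] / sum(E[(k, t)] for t in SYMS) for s in SYMS}
                    for k in STATES}
        # Step 4: stop once no transition probability moves by more than 0.001
        delta = max(abs(new_trans[kl] - trans[kl]) for kl in new_trans)
        trans, emit = new_trans, new_emit
        if delta <= 0.001:
            break
    return trans, emit
```

Each iteration leaves valid distributions: every transition row and every emission row still sums to 1.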