Download bioinfo5a

Hidden Markov Models Fundamentals and applications to bioinformatics. Markov Chains  Given a finite discrete set S of possible states, a Markov chain process occupies one of these states at each unit of time.  The process either stays in the same state or moves to some other state in S.  This occurs in a stochastic way, rather than in a deterministic one.  The process is memoryless and time homogeneous. Transition Matrix  Let S={S1, S2, S3}. A Markov Chain is described by a table of transition probabilities such as the following: S1 S2 S3 S1 0 1 0 S2 1/3 2/3 0 S3 1/2 1/3 1/6 S1 1 S2 1/3 2/3 1/3 1/2 S3 1/6 A simple example  Consider a 3-state Markov model of the weather. We assume that once a day the weather is observed as being one of the following: rainy or snowy, cloudy, sunny.  We postulate that on day t, weather is characterized by a single one of the three states above, and give ourselves a transition probability matrix A given by:  0.4 0.3 0.3     0.2 0.6 0.2   0.1 0.1 0.8    -2 Given that the weather on day 1 is sunny, what is the probability that the weather for the next 7 days will be “sun-sun-rain-rain-suncloudy-sun”? PS3 , S3 , S3 , S1 , S1 , S3 , S2 , S3 | Model  P[ S3 ]  P[ S3 | S3 ]P[ S3 | S3 ]  P[ S1 | S3 ]  P[ S1 | S1 ]  P[ S3 | S1 ]  P[ S 2 | S3 ]  P[ S3 | S 2 ]  1 (0.8) 2 (0.1)(0.4)(0.3)(0.1)(0.2)  1.536 104 -3 Given that the model is in a known state, what is the probability it stays in that state for exactly d days?  The answer is pi d   aii d 1 1  aii   Thus the expected number of consecutive days in the same state is 1 di  1  aii  So the expected number of consecutive sunny days, according to the model is 5. Elements of an HMM  What if each state does not correspond to an observable (physical) event? What if the observation is a probabilistic function of the state? An HMM is characterized by the following: 1) N, the number of states in the model. 2) M, the number of distinct observation symbols per state. 3) the state transition probability distribution A  aij where  aij  P q j | qi 4) the observation symbol probability distribution in state qj, , where bj(k) is the probability that the k-th observation symbol pops up at time t, given that the model is in state Ej. B  bi k  5)    the initial state distribution p   pi  Three Basic Problems for HMMs 1) 2) 3) Given the observation sequence O = O1O2O3…Ot, and a model m = (A, B, p), how do we efficiently compute P(O | m)? Given the observation sequence O and a model m, how do we choose a corresponding state sequence Q = q1q2q3…qt which is optimal in some meaningful sense? How do we adjust the model parameters to maximize P(O | m)? Solution to Problem (1)  Given an observed output sequence O, we have that P (O)   P[O | Q]  P[Q] Q  This calculation involves the sum of NT multiplications, each being a multiplication of 2T terms. The total number of operations is on the order of 2T NT.  Fortunately, there is a much more efficient algorithm, called the forward algorithm. The Forward Algorithm  It focuses on the calculation of the quantity  t , i   P[O1 ,Ot | qt  Si ] which is the joint probability that the sequence of observations seen up to and including time t is O1,…,Ot, and that the state of the HMM at time t is Ei. Once these quantities are known, N PO     T , i  i 1 …continuation  The calculation of the (t, i)’s is by induction on t.  1, i   pi bi O1  From the formula N  t  1, i    PO1 Ot 1 , qt 1  Si and qt  S j  j 1 we get N  t  1, i    t , j a ijbi Ot 1  j 1 Backward Algorithm  Another approach is the backward algorithm.  Specifically, we calculate (t, i) by the formula  Again, by induction one can find the (t,i)’s starting with the value t = T – 1, then for the value t = T – 2, and so on, eventually working back to t = 1.  t , i   POt 1 OT | qt  Ei  Solution to Problem (2)  Given an observed sequence O = O1,…,OT of outputs, we want to compute efficiently a state sequence Q = q1,…,qT that has the highest conditional probability given O.  In other words, we want to find a Q that makes P[Q | O] maximal.  There may be many Q’s that make P[Q | O] maximal. We give an algorithm to find one of them. The Viterbi Algorithm  It is divided in two steps. First it finds maxQ P[Q | O], and then it backtracks to find a Q that realizes this maximum.  First define, for arbitrary t and i, (t,i) to be the maximum probability of all ways to end in state Si at time t and have observed sequence O1O2…Ot.  Then maxQ P[Q and O] = maxi (T,i) -2 But P[Q and O] max P[Q | O]  max Q Q P[O]  Since the denominator on the RHS does not depend on Q, we have arg max P[Q | O]  arg max P[Q and O] Q Q  We calculate the (t,i)’s inductively.  (1, i)  pi bi (O1 )  t , j   max  t  1, i aijb j Ot  1i  N -3 Finally, we recover the qi’s as follows. Define and put: q T  S (T) . ; T   arg max  T , i  1i  N  This is the last state in the state sequence desired. The remaining qt for t < T are found recursively by defining  t   arg max  t , i ai t 1 and putting 1i  N qt  S t  Solution to Problem (3)  We are given a set of observed data from an HMM for which the topology is known. We wish to estimate the parameters in that HMM.  We briefly describe the intuition behind the BaumWelch method of parameter estimation.  Assume that the alphabet M and the number of states N is fixed at the outset.  The data we use to estimate the parameters constitute a set of observed sequences {O(d)}. The Baum-Welch Algorithm  We start by setting the parameters pi, aij, bi(k) at some initial values.  We then calculate, using these initial parameter values: 1) pi* = the expected proportion of times in state Si at the first time point, given {O(d)}. -22)   a  EN | {O } E N (a) | {O } b (k )  * E Nij | {O ( d ) } ij 3) (d ) j (d ) i i E[ N i | {O ( d ) }] where Nij is the random number of times qt(d) =Si and qt+1(d) = Sj for some d and t; Ni is the random number of times qt(d) = Si for some d and t; and Ni(k) equals the random number of times qt(d) = Si and it emits symbol k, for some d and t. Upshot  It can be shown that if  = (pi, ajk, bi(k)) is substituted by * = (pi*, ajk*, bi*(k)) then P[{O(d)}| *]  P[{O(d)}| ], with equality holding if and only if * = .  Thus successive iterations continually increase the probability of the data, given the model. Iterations continue until a local maximum of the probability is reached.

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download bioinfo5a