Class 5: Hidden Markov Models

Sequence Models
So far we examined several probabilistic sequence models. These models, however, assumed that positions are independent, which means that the order of elements in the sequence did not play a role. In this class we learn about probabilistic models of sequences in which order matters.

Probability of Sequences
Fix an alphabet. Let $X_1,\dots,X_n$ be a sequence of random variables over this alphabet. We want to model $P(X_1,\dots,X_n)$.

Markov Chains
Assumption: $X_{i+1}$ is independent of the past once we know $X_i$. This allows us to write:

$$P(X_1,\dots,X_n) = P(X_1)\prod_i P(X_{i+1} \mid X_1,\dots,X_i) = P(X_1)\prod_i P(X_{i+1} \mid X_i)$$

Markov Chains (cont.)
Assumption (homogeneity): $P(X_{i+1} \mid X_i)$ is the same for all $i$. Notation: $P(X_{i+1}=b \mid X_i=a) = A_{ab}$. By specifying the matrix $A$ and the initial probabilities, we define $P(X_1,\dots,X_n)$. To avoid the special case of $P(X_1)$, we can use a special start state $s$ and denote $P(X_1=a) = A_{sa}$.

Example: CpG Islands
In the human genome, CpG dinucleotides are relatively rare. CpG pairs undergo a process called methylation that modifies the C nucleotide, and a methylated C can (with relatively high chance) mutate to a T. Promoter regions are CpG rich. These regions are not methylated, and thus mutate less often. They are called CpG islands.

CpG Islands
We construct Markov chains for CpG-rich ("+") and CpG-poor ("-") regions. Using maximum likelihood estimates from 60K nucleotides, we get two models.

Ratio Test for CpG Islands
Given a sequence $x_1,\dots,x_n$ we compute the log-likelihood ratio (a code sketch appears at the end of this part):

$$S(x_1,\dots,x_n) = \log\frac{P(x_1,\dots,x_n \mid +)}{P(x_1,\dots,x_n \mid -)} = \sum_i \log\frac{A^{+}_{x_i x_{i+1}}}{A^{-}_{x_i x_{i+1}}}$$

Empirical Evaluation

Finding CpG Islands
Simple-minded approach: pick a window of size $N$ ($N = 100$, for example), compute the log-ratio for the sequence in the window, and classify based on that. Problems: How do we select $N$? What do we do when the window intersects the boundary of a CpG island?

Alternative Approach
Build a model that includes "+" states and "-" states. A state "remembers" the last nucleotide and the type of region. A transition from a "-" state to a "+" state describes the start of a CpG island.

Hidden Markov Models
Two components:
A Markov chain of hidden states $H_1,\dots,H_n$ with $L$ values: $P(H_{i+1}=l \mid H_i=k) = A_{kl}$.
Observations $X_1,\dots,X_n$. Assumption: $X_i$ depends only on the hidden state $H_i$: $P(X_i=a \mid H_i=k) = B_{ka}$.

Semantics

$$P(X_1,\dots,X_n, H_1,\dots,H_n) = P(H_1,\dots,H_n)\,P(X_1,\dots,X_n \mid H_1,\dots,H_n) = P(H_1)\prod_i P(H_{i+1}\mid H_i)\,P(X_i\mid H_i) = A_{0 H_1}\prod_i A_{H_i H_{i+1}}\,B_{H_i X_i}$$

Example: Dishonest Casino
A casino occasionally switches between a fair die and a loaded die; we observe only the sequence of rolls, not which die produced each one.

Computing Most Probable Sequence
Given: $x_1,\dots,x_n$. Output: $h^*_1,\dots,h^*_n$ such that

$$P(x_1,\dots,x_n, h^*_1,\dots,h^*_n) = \max_{h_1,\dots,h_n} P(x_1,\dots,x_n, h_1,\dots,h_n)$$

Idea: if we know the value of $h_i$, then the most probable sequence on positions $i+1,\dots,n$ does not depend on observations before time $i$. Let $V_i(l)$ be the probability of the best sequence $h_1,\dots,h_i$ such that $h_i = l$.

Dynamic Programming Rule

$$P(x_1,\dots,x_{i+1}, h_1,\dots,h_{i+1}) = P(x_1,\dots,x_i, h_1,\dots,h_i)\,P(h_{i+1}\mid h_i)\,P(x_{i+1}\mid h_{i+1}) = P(x_1,\dots,x_i, h_1,\dots,h_i)\,A_{h_i h_{i+1}}\,B_{h_{i+1} x_{i+1}}$$

so

$$V_{i+1}(l) = B_{l x_{i+1}} \max_k V_i(k)\,A_{kl}$$

Viterbi Algorithm (see the sketch at the end of this part)
$V_0(0) = 1$, $V_0(l) = 0$ for $l > 0$
for $i = 0,\dots,n-1$:
  for $l = 1,\dots,L$:
    set $V_{i+1}(l) = B_{l x_{i+1}} \max_k V_i(k)\,A_{kl}$
    set $P_{i+1}(l) = \arg\max_k V_i(k)\,A_{kl}$
$h^*_n = \arg\max_l V_n(l)$
for $i = n-1,\dots,1$: set $h^*_i = P_{i+1}(h^*_{i+1})$

Computing Probabilities
Given: $x_1,\dots,x_n$. Output: $P(x_1,\dots,x_n)$.

$$P(x_1,\dots,x_n) = \sum_{h_1,\dots,h_n} P(x_1,\dots,x_n, h_1,\dots,h_n)$$

How do we sum over an exponential number of hidden sequences?
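The log-ratio test is easy to make concrete. Below is a minimal Python sketch; the transition matrices `A_PLUS` and `A_MINUS` contain illustrative placeholder values, not the maximum likelihood estimates from the 60K-nucleotide data mentioned above. Positive scores favor the "+" (island) model.

```python
import math

# Illustrative transition matrices for the "+" (CpG-rich) and "-" (CpG-poor)
# Markov chains. Rows: current nucleotide; columns: next nucleotide.
# Real values would be maximum likelihood estimates from labeled data.
A_PLUS = {
    'A': {'A': 0.18, 'C': 0.27, 'G': 0.43, 'T': 0.12},
    'C': {'A': 0.17, 'C': 0.37, 'G': 0.27, 'T': 0.19},
    'G': {'A': 0.16, 'C': 0.34, 'G': 0.38, 'T': 0.12},
    'T': {'A': 0.08, 'C': 0.36, 'G': 0.38, 'T': 0.18},
}
A_MINUS = {
    'A': {'A': 0.30, 'C': 0.21, 'G': 0.28, 'T': 0.21},
    'C': {'A': 0.32, 'C': 0.30, 'G': 0.08, 'T': 0.30},
    'G': {'A': 0.25, 'C': 0.24, 'G': 0.30, 'T': 0.21},
    'T': {'A': 0.18, 'C': 0.24, 'G': 0.29, 'T': 0.29},
}

def log_ratio_score(seq):
    """S(x) = sum_i log(A+[x_i][x_{i+1}] / A-[x_i][x_{i+1}])."""
    return sum(math.log(A_PLUS[a][b] / A_MINUS[a][b])
               for a, b in zip(seq, seq[1:]))

print(log_ratio_score("CGCGCGTATA"))  # positive suggests a CpG island
```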
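The Viterbi recursion translates directly into code. The sketch below follows the slides' formulation with raw probability products and explicit traceback pointers; a production implementation would work in log-space to avoid underflow on long sequences. For simplicity it takes an explicit initial distribution `start` instead of the special start state, and the dishonest-casino parameters at the bottom are illustrative guesses, not values from the lecture.

```python
def viterbi(x, A, B, start):
    """Most probable hidden sequence for observations x.

    A[k][l]  = P(H_{i+1}=l | H_i=k)   (L x L transition matrix)
    B[l][a]  = P(X_i=a | H_i=l)       (emission matrix)
    start[l] = P(H_1=l)               (initial distribution)
    """
    L, n = len(A), len(x)
    V = [start[l] * B[l][x[0]] for l in range(L)]   # V_1(l)
    ptr = []                                        # ptr[i][l] = best predecessor
    for i in range(1, n):
        prev, V = V, []
        ptr.append([])
        for l in range(L):
            k_best = max(range(L), key=lambda k: prev[k] * A[k][l])
            ptr[-1].append(k_best)
            V.append(B[l][x[i]] * prev[k_best] * A[k_best][l])
    # Trace back from the best final state.
    h = [max(range(L), key=lambda l: V[l])]
    for back in reversed(ptr):
        h.append(back[h[-1]])
    return list(reversed(h))

# Dishonest casino sketch: state 0 = fair die, state 1 = loaded die.
A = [[0.95, 0.05], [0.10, 0.90]]
B = [[1 / 6] * 6, [0.1] * 5 + [0.5]]     # loaded die favors a six
start = [0.5, 0.5]
rolls = [5, 5, 5, 1, 2, 5, 5, 0, 5, 5]   # observed faces, 0-indexed; 5 = a six
print(viterbi(rolls, A, B, start))
```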
Forward Algorithm
Perform dynamic programming on prefixes of the sequence. Let $f_i(l) = P(x_1,\dots,x_i, H_i=l)$. Recursion rule:

$$f_{i+1}(l) = B_{l x_{i+1}} \sum_k f_i(k)\,A_{kl}$$

Conclusion:

$$P(x_1,\dots,x_n) = \sum_k f_n(k)$$

Backward Algorithm
Perform dynamic programming on suffixes of the sequence. Let $b_i(l) = P(x_{i+1},\dots,x_n \mid H_i=l)$. Recursion rule:

$$b_i(l) = \sum_k A_{lk}\,B_{k x_{i+1}}\,b_{i+1}(k)$$

Conclusion:

$$P(x_1,\dots,x_n) = b_0(0)$$

where $0$ is the special start state.

Computing Posteriors
How do we compute $P(H_i \mid x_1,\dots,x_n)$? Combine the forward and backward messages (a code sketch appears at the end of this part):

$$P(H_i=l \mid x_1,\dots,x_n) = \frac{P(H_i=l, x_1,\dots,x_n)}{P(x_1,\dots,x_n)} = \frac{P(H_i=l, x_1,\dots,x_i)\,P(x_{i+1},\dots,x_n \mid H_i=l)}{P(x_1,\dots,x_n)} = \frac{f_i(l)\,b_i(l)}{P(x_1,\dots,x_n)}$$

Dishonest Casino (again)
Computing posterior probabilities of "fair" at each point in a long sequence.

Learning
Given a sequence $x_1,\dots,x_n$ together with $h_1,\dots,h_n$, how do we learn $A_{kl}$ and $B_{ka}$? We want to find parameters that maximize the likelihood $P(x_1,\dots,x_n, h_1,\dots,h_n)$. We simply count: $N_{kl}$ is the number of times $h_i=k$ and $h_{i+1}=l$; $N_{ka}$ is the number of times $h_i=k$ and $x_i=a$. Then:

$$A_{kl} = \frac{N_{kl}}{\sum_{l'} N_{kl'}}, \qquad B_{ka} = \frac{N_{ka}}{\sum_{a'} N_{ka'}}$$

Learning
Given only a sequence $x_1,\dots,x_n$, how do we learn $A_{kl}$ and $B_{ka}$? We want to find parameters that maximize the likelihood $P(x_1,\dots,x_n)$. Problem: the counts are inaccessible, since we do not observe $h_i$. But if we have $A_{kl}$ and $B_{ka}$ we can compute:

$$P(H_i=k, H_{i+1}=l \mid x_1,\dots,x_n) = \frac{P(H_i=k, H_{i+1}=l, x_1,\dots,x_n)}{P(x_1,\dots,x_n)} = \frac{P(H_i=k, x_1,\dots,x_i)\,P(H_{i+1}=l \mid H_i=k)\,P(x_{i+1} \mid H_{i+1}=l)\,P(x_{i+2},\dots,x_n \mid H_{i+1}=l)}{P(x_1,\dots,x_n)} = \frac{f_i(k)\,A_{kl}\,B_{l x_{i+1}}\,b_{i+1}(l)}{P(x_1,\dots,x_n)}$$

Expected Counts
We can compute the expected number of times $h_i=k$ and $h_{i+1}=l$:

$$E[N_{kl}] = \sum_i P(H_i=k, H_{i+1}=l \mid x_1,\dots,x_n)$$

Similarly:

$$E[N_{ka}] = \sum_{i:\,x_i=a} P(H_i=k \mid x_1,\dots,x_n)$$

Expectation Maximization (EM)
Choose $A_{kl}$ and $B_{ka}$ (a code sketch of one iteration appears at the end of this part).
E-step: compute expected counts $E[N_{kl}]$, $E[N_{ka}]$.
M-step: re-estimate:

$$A'_{kl} = \frac{E[N_{kl}]}{\sum_{l'} E[N_{kl'}]}, \qquad B'_{ka} = \frac{E[N_{ka}]}{\sum_{a'} E[N_{ka'}]}$$

Reiterate.

EM - basic properties
$P(x_1,\dots,x_n : A'_{kl}, B'_{ka}) \ge P(x_1,\dots,x_n : A_{kl}, B_{ka})$: the likelihood grows in each iteration. If $P(x_1,\dots,x_n : A_{kl}, B_{ka}) = P(x_1,\dots,x_n : A'_{kl}, B'_{ka})$, then $(A_{kl}, B_{ka})$ is a stationary point of the likelihood: either a local maximum, a local minimum, or a saddle point.

Complexity of E-step
Compute forward and backward messages: space complexity $O(nL)$, time complexity $O(nL^2)$ (each message update sums over $L$ predecessors). Accumulate expected counts: time complexity $O(nL^2)$, space complexity $O(L^2)$.

EM - problems
Local maxima: learning can get stuck in local maxima, is sensitive to initialization, and requires some method for escaping such maxima.

Choosing L
We often do not know how many hidden values we should have or can learn.
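To make the forward/backward recursions and the posterior computation concrete, here is a minimal Python sketch. It follows the unscaled recursions from the slides (production code would rescale each position or work in log-space to avoid underflow) and, for simplicity, takes an explicit initial distribution `start` instead of the special start state.

```python
def forward(x, A, B, start):
    """f_i(l) = P(x_1..x_i, H_i=l); f[i-1][l] holds f_i(l) (0-indexed)."""
    L = len(A)
    f = [[start[l] * B[l][x[0]] for l in range(L)]]
    for xi in x[1:]:
        f.append([B[l][xi] * sum(f[-1][k] * A[k][l] for k in range(L))
                  for l in range(L)])
    return f

def backward(x, A, B):
    """b_i(l) = P(x_{i+1}..x_n | H_i=l); returned so b[i-1][l] holds b_i(l)."""
    L, n = len(A), len(x)
    b = [[1.0] * L]                      # b_n(l) = 1
    for i in range(n - 1, 0, -1):
        b.append([sum(A[l][k] * B[k][x[i]] * b[-1][k] for k in range(L))
                  for l in range(L)])
    return list(reversed(b))

def posteriors(x, A, B, start):
    """P(H_i=l | x) = f_i(l) * b_i(l) / P(x)."""
    f, b = forward(x, A, B, start), backward(x, A, B)
    px = sum(f[-1])                      # P(x_1..x_n) = sum_k f_n(k)
    return [[f[i][l] * b[i][l] / px for l in range(len(A))]
            for i in range(len(x))]
```

With the dishonest-casino parameters from the earlier sketch, `posteriors(rolls, A, B, start)` yields, for each roll, the posterior probability of the fair and loaded states, exactly the quantity plotted in the "Dishonest Casino (again)" slide.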
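One full EM (Baum-Welch) iteration then combines these messages into expected counts and renormalizes them. A minimal sketch, reusing `forward` and `backward` from the previous block; it handles a single training sequence, keeps `start` fixed, and omits pseudocounts, all of which a practical implementation would extend.

```python
def em_step(x, A, B, start):
    """One EM iteration: E-step accumulates expected transition and
    emission counts from posteriors; M-step renormalizes them."""
    L, n = len(A), len(x)
    f, b = forward(x, A, B, start), backward(x, A, B)
    px = sum(f[-1])                      # current likelihood P(x)
    # E-step: expected counts.
    N_trans = [[0.0] * L for _ in range(L)]
    N_emit = [[0.0] * len(B[0]) for _ in range(L)]
    for i in range(n - 1):
        for k in range(L):
            for l in range(L):
                # P(H_i=k, H_{i+1}=l | x) = f_i(k) A_kl B_{l,x_{i+1}} b_{i+1}(l) / P(x)
                N_trans[k][l] += f[i][k] * A[k][l] * B[l][x[i + 1]] * b[i + 1][l] / px
    for i in range(n):
        for k in range(L):
            N_emit[k][x[i]] += f[i][k] * b[i][k] / px
    # M-step: renormalize expected counts.
    A2 = [[N_trans[k][l] / sum(N_trans[k]) for l in range(L)] for k in range(L)]
    B2 = [[N_emit[k][a] / sum(N_emit[k]) for a in range(len(B[0]))] for k in range(L)]
    return A2, B2
```

Iterating `em_step` until `sum(forward(x, A, B, start)[-1])` stops increasing gives the EM loop from the slides; each iteration costs $O(nL^2)$ time, matching the E-step complexity noted above.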