Download Markov chain Monte Carlo

Markov Chain Monte Carlo Hadas Barkay Anat Hashavit Motivation  Models built to represent diverse types of genetic data tend to have a complex structure. Computation on these structures are often intractable.   We see this, for instance, in pedigree analysis. Stochastic simulation techniques are good ways to obtain approximation.  Markov Chain Monte Carlo methods have proven to be quite useful for such simulations. Markov chain   Markov chains are random processes X1,X2,…,having a discrete 1-dimensional index set (1,2,3…) domain D of m states {s1,…,sm} from which it takes it’s values  Satisfying the markov property:  The distribution of xt+1 given all the previous states xt xt-1 … depends only on the immediately preceding state xt P[X t 1  x t 1 | X t  x t , X t -1  x t -1 , . . . , X1  x1 , X 0  x 0 ]  P[X t 1  x t 1 | X t  x t ] X S1 X2 S2 X3 S3 Xi-1 Si-1 Xi Si Xi+1 Si+1 X S1 X2 S2 X3 S3 Xi-1 Si-1 Xi Si Xi+1 Si+1 A markov chain also consists of 1. An m dimensional initial distribution vector (  (s1 ), ,  (s m )) 2. An m×m transition probabilities transition matrix P= (Aij) s.t P(i,j)=Pr[xt+1=si|xt=sj] Notice that the transition probability between Xt and Xt+2 are also derived in the following way: Pr( xt  2  sk | xt  si )   Pr(xt  2  sk , xt 1  s j | xt  si ) j   Pr( xt  2  sk | xt  s j ) Pr(xt 1  s j | xt  si ) j   pij p jk j   The multiplications shown in the previous slide are equivalent to multiplying M by itself . We conclude that P2ik= Pr(xt+2=sk|xt=si) more generally Pnik = Pr(xt+n=sk|xt=si) Stationary distribution A distribution P = (1,…, m) is stationary if P P = P • • When P is unique for a transition matrix P , the rows of lim n  P n all coinside with P. in other words , the long-run probability of being in state j is Pj, regardless of the initial state. The gambler example     a gambler wins 1 dollar with probability losses 1 dollar with probability 1-p the gambler plays 5 rounds. The transition matrix of that markov chain is: 0 1 2 3 4 5 0 1 1 1  p  2 0 3 0  4 0  5 0 0 0 1 p 0 0 0 0 p 0 1 p 0 0 0 0 p 0 1 p 0 0 0 0 p 0 0 0 0  0 0  p  1    there is no unique stationary distribution for this example any distribution  for which 1+6=1 will satisfy P=. For example: 0 0 0  1 1  p 0 p 0  p  0 1 p 0    0.5 0 0 0 0 0.5   0 0 1 p 0   0 0 0 1 p  0 0 0  0 0 0 0 p 0 0 0 0  0    0.5 0 0 0 0 0.5    0  p  1  For p=0.5 we can see that 0 0 0 0 0  1 1 0.5 0 0.5 0 0 0  0.8     0 0.5 0 0.5 0 0  0.6   0   0 0.5 0 p 0 0.4    0 0.2 0 0 0.5 0 0.5    0 0 0 0 0 1   0  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 The long run frequency of being in a given state depends on the initial state 0  0.2   0.4  0.6   0.8   1  Some properties of Markov chains      A Markov chain is aperiodic if the chances of getting from state to state are not periodic in the number of states needed. States i,j communicate if state j is reachable from state i, i.e there exists a finite number n s.t pn(i,j)>0. A Markov chain is said to be irreducible if all its states communicate. A Markov chain is said to be ergodic if it is irreducible and aperiodic. An ergodic Markov chain always has a unique stationary distribution MCMC - Monte Carlo Markov Chain Goal: Calculating an Expectatoin Value      (i) i i When the probabilities i are given up to an unknown normalization factor Z. Examples :  E  1 1. Boltzman distribution:  i  exp   i  Z  kbT   Ei  where Z   exp    k T i  b  is the canonical Partitioning function, which is sometimes incalculable. 2. Constructing the likelihood function … Prob(data| 2) = P(x11 ,x12 ,x13 ,x21 ,x22 ,x23) = Probability of data (sum over all states of all hidden variables) Prob(data| 2) = P(x11 ,x12 ,x13 ,x21 ,x22 ,x23) =  l11m, l11f … s23f [P(l11m) P(l11f) P(x11 | l11m, l11f,) … P(s13m) P(s13f) P(s23m | s13m, 2) P(s23m | s13m, 2) ] Method: Simulate don’t go over all space. Calculate an estimator to the expectation value. We want to calculate the expectation value     s ( s ) s Instead, we calculate an estimator to the expectation value, by sampling from the space . 1 n 1   E  ( S )    ( Si ) n i 0 MCMC tells us how to carry out the sampling to create a good (unbiased, low variance) estimator. 1 n 1 lim   ( Si )  lim    n  n n  i 0 MCMC: create an ergodic (=irreducible, aperiodic) Markov Chain and simulate from it. We need to define an ergodic transition matrix P   pij  for which  is a unique stationary distibution. If P satisfies detailed balance, i.e.  i pij   j p ji for all i  j , then  is stationary for P:  p   i ij j j j   i    j p ji j p ji Metropolis (1953) suggested a simulating algorithm, in which one defines P indirectly via a symmetric transition matrix Q in the following way: If you are in state i, propose a state j with probability qij  j Move from i to j with probability aij  min 1, .   i  This assures detailed balance of P :  i pij   j p ji  i   j  i qij min 1,    j q ji min 1,   i   j qij min  i ,  j   q ji min  j ,  i  Hastings (1970) generalized for a non-symmetric Q :   j q ji  aij  min 1,    i qij  Metropolis – Hastings Algorithm: • Start with any state x0 . • Step t: current state is xt 1  i . Propose state j with probablity qij.   j q ji  compute aij  min 1,    i qij  Accept state j if aij  1. else, draw uniformly from [0,1] a random aij number cc. If state j , accept else , stay in state Simulated Annealing Goal: Find the most probable state of a Markov Chain. Use the Metropolis Algorithm with a different acceptance probability: 1         aij  min 1,  j      i   Where  is the temperature. Cool gradually, i.e., allow the chain to run m times with an initial temperature  0 , then set a new temperature  1   0   and allow the chain to run again. Repeat n times, until   0 Simulated Annealing (Kirkpatrick,1983) Goal: Find the global minimum of a multi -dimensional surface S ( x1 , x2 , , xn ), when all states have an equal probability. Use another variation of the Metropolis Algorithm. Suppose you are in state i. Draw a new state j, and accept using the acceptance probabilty:   ( S j  Si )   aij  min 1 , C  exp    kT    Where T is the temperature. Cool gradually. Parameters to play with:  n (number of loops)  m (number of iterations in each loop)  T0 (initial temperature, shouldn’t be too high or too low)  T (cooling speed, shouldn’t be too fast)  k (“Boltzman constant” – scaling parameter)  C (first step scaling parameter)  x0  ( x0,1 , x0,2 ,  qij (how to draw the next step) , x0,n ) (initial state) MCMC Example: Descent Graph    Corresponds to a specific inheritance vector. Vertices: the individuals’ genes (2 genes for each individual in the pedigree). Edges: represent the gene flow specified by the inheritance vector. A child’s gene is connected by an edge to the parent’s gene from which it flowed. (Taken from tutorial 6) 1 2 11 12 13 14 21 22 23 24 a/b a/b a/b a/b a/c Descent Graph 1 2 (a,b) Assume that the descent graph vertices below represent the pedigree on the left. b/d 3 4 (a,b) (a,b) 5 6 (a,b) (a,c) 7 8 (b,d) (Taken from tutorial 6) Descent Graph 1 2 (a,b) 3 4 (a,b) (a,b) 5 6 (a,b) (a,c) 7 8 (b,d) Assume that paternally inherited genes are on the left. 2. Assume that non-founders are placed in increasing order. 3. A ‘1’ (‘0’) is used to denote a paternally (maternally) originated gene.  The gene flow above corresponds to the inheritance vector: v = ( 1,1; 0,0; 1,1; 1,1; 1,1; 0,0 ) 1. (Taken from tutorial 6) Creating a Markov Chain of descent graphs: •A Markov Chain state is a (legal) descent graph. •We need to define transition rules from one state to another. •We will show 4 possible transition rules : T0 , T1 , T2a , T2b . Transition rule T0 : Choose parent, child. Transition rule T0 : Choose parent, child. Swap paternal arc to maternal arc (or vice versa) Small change in the graph slow mixing Transition rule T1 : Choose person i. Transition rule T1 : Choose person i. Perform a T0 transition from i to each one of his/her children. Transition rule T2a : Choose a couple i.j with common children. Transition rule T2a : Choose a couple i.j with common children. Exchange the subtree rooted at the maternal node of i with the subtree rooted at the maternal node of j. exchange also the paternally rooted subtrees. Transition rule T2b : Choose a couple i.j with common children. Transition rule T2b : Choose a couple i.j with common children. Exchange the subtree rooted at the maternal node of i with the subtree rooted at the paternal node of j, and vice versa. Problem : a 1/1 2/2 3/3 b 1/1 2/2 1/1 3/3 3/3 1/2 2/2 1/1 2/2 3/3 1/2 Two legal graphs of inheritance. No way to move from a to b using allowed transitions (the states don’t communicate). Solution: allow passage through illegal graphs (tunneling) b 1/1 2/2 1/1 3/3 1/2 1/1 2/2 1/1 3/3 3/3 2/2 3/3 1/1 2/2 2/2 3/3 2/2 3/3 1/2 1/1 1/2 Applying tunneling in MCMC: In order to move from state i to state j (one step of the chain), (1) Select a transition (+person+locus) and perform it. (2) Toss a coin. If it says “head”, stop. If it says “tail”, goto (1). (3) If the proposed state j is legal, accept it with the Metropolis probability  j aij  min 1,   i  This assures that the Markov Chain is irreducible. One use of descent graphs MCMC is the calculation of the location score for location d, given marker evidence M and trait phenotypes T :  Prd T M   log10   Pr( T )   The difficult term to calculate is the conditional probability Prd T M  . Assuming linkage equilibrium we can write Prd T M    Prd T Gˆ Pr  Gˆ M  ,where Ĝ is a descent graph. Gˆ   MCMC: the stationary distribution of the chain is the conditional distribution Pr Gˆ M .   If a sequence of legal descent graphs Gˆ1 , , Gˆ n is generated by running the chain, then for n large enough the estimate n 1 is: Prd T M    Prd T Gˆ . The term Prd T Gˆ can be n i 1 calculated quickly, using MENDEL.     The Gibbs sampler     Special case of the Metropolis - Hastings algorithm for Cartesian product state spaces Suppose that each sample point i=(i1,i2,i3..im) has m components, for instance, a pedigree of m individuals The Gibbs sampler updates one component of i at a time. If the component is chosen randomly and resampled conditional on the components, then the acceptance probability is 1 . The Gibbs sampler Proof:  Let ic be the uniformly chosen component  Denote the remaining components by i-c  If j is a neighbor of i reachable by changing only component ic then j-c=i-c. j 1  The proposal probability is q  ij m  {k : k c  i c } k  and satisfies piqij=pjqji The ratio appearing in the acceptance probability is i 1 j  j q ji m  {k : k c  j c } k  1 j 1  i qij i m  {k : k c  i c } k Example: MCMC with ordered genotypes   Suppose we want to generate a pedigree for which we know only the unordered genotype of 4 individuals in the last generation . We make the following simplifying assumptions:  The population is in HW equilibrium  The three loci are in linkage equilibrium  There is no crossover interference between the three loci  One option is to first simulate the founders genotype according to population frequencies, then “drop” genes along the tree according to Mendel’s law and the recombination fractions 1 2 11 12 13 14 21 22 23 24 2,4 1,2 2,2 2,4 1,2 1,2 1,2 1,3 1,3 3,4 1,2 1,3  One realization of an ordered genotype consisting with the data we have is the following pedigree : 1 2 2|1 1|2 1|3 1|4 3|1 2|1 11 12 13 14 2|1 1|3 1|2 4|2 2|3 2|3 2|4 1|3 3|2 3|3 21 22 23 2|4 1|2 2|2 2|4 1|2 1|2 2|1 3|1 2|1 1|3 1|3 24 4|3 1|2 1|3  Computing the probability of a certain realization of this pedigree is easy but even for this small tree the proposal we saw is just one of ((43)2)4x(23)12=284 options.  It is obvious we need a better method . The Gibbs sampler The Gibbs sampler updates the genotype of one individual at one locus at a time.  This is easy given all other data due to the local nature of the dependency. We need to compute * Pr[xim(l), xip(l)|xim(-l), xip(-l), x-i,y] Where xim(l), xip(l) are the maternal and paternal alleles of locus l of individual i, xim(-l), xip(-l) are the paternal and maternal alleles of all other loci of individual i, y is the phenotype data, and x-i denotes the data of all other individuals in the pedigree.     This calculation involves only the phenotype of i and the genotypes of his parents and children under the no crossover assumption the genotype of on locus across a chromosome depends only on it’s adjacent loci We get the following expression to calculate * Pr[x (l ), x (l ) | x ( l ), x ( l ), x , y ]  im ip im ip i Pr[x | xm ]Pr[x | x p ]Pr[y (l ) | x (l )] Pr[x |x ]  im ip i i km k kids (i ) Pr[x |x ] x (l ) Pr[xim | xm ]Pr[xip | x p ]Pr[y (l ) | xi (l )]  i km i k kids (i )  Lets return to the pedigree we showed earlier. Assume that the ordered genotype is the current state, and we want to update individual 13 at locus number 2.  There are 3 genotypes consistent with the genotype configuration of other individuals in the pedigree 1|1,2|1, 1|3  We need to calculate the probability of each one of these states according to the equation shown in the last slide  We first compute the paternal and maternal transmission probabilities for each allele in these three genotypes.  Then we need the transmission probabilities from individual 13 to his kids.  Once the three have been computed ,they are normalized by dividing each one by the total sum.  The genotype is sampled according to that distribution . Problems :   Updating is done a single genotype at a time. For a large pedigree this is a very slow process The Gibbs sampler may also be reducible, i.e. some legal states will never be reached. For example: A|B A|A A|C B|C A|C by Gibbs A|A A|B C|B  One solution to this problem is allowing illegal states in the chain which will later be discarded.  Another is to identify the irreducible components called islands and to define metropolis states allowing to jump from one island to another.  However the whole process of finding the islands and setting up proposal for the metropolis jumps is not fully automated  the work involved for each different pedigree prevents the widespread use of this method. Other MCMC that use the Gibbs Sampler  The L-sampler (Heath, 1997)  A Whole Locus Gibbs sampler combining Monte Carlo simulation with single-locus peeling Other MCMC that use the Gibbs Sampler  The M-sampler (Thompson and Heath, 1999; Thompson, 2000a)  A whole-meiosis Gibbs sampler which jointly updates an entire meiosis from the full conditional distribution References    K. Lange, Mathematical and Statistical Methods for Genetic Analysis, chapter 9, Springer (1997). A. Bureau, T.P. Speed, Course stat 260: Statistics in Genetics, week 7. A.W. George, E. Thompson, Discovering Disease Genes: Multipoint Linkage Analysis via a New Markov Chain Monte Carlo Approach, Statistical Science, Vol. 18, No.4 (2003)  Lecture notes.  Tutorial notes. The End!!

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Markov chain Monte Carlo