Download Markov chain Monte Carlo

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Species distribution wikipedia, lookup

Quantitative trait locus wikipedia, lookup

Hardy–Weinberg principle wikipedia, lookup

Transcript
Markov Chain Monte Carlo
Hadas Barkay
Anat Hashavit
Motivation

Models built to represent diverse types of genetic
data tend to have a complex structure. Computation
on these structures are often intractable.


We see this, for instance, in pedigree analysis.
Stochastic simulation techniques are good ways to
obtain approximation.

Markov Chain Monte Carlo methods have proven to be
quite useful for such simulations.
Markov chain


Markov chains are random processes X1,X2,…,having a
discrete 1-dimensional index set (1,2,3…)
domain D of m states {s1,…,sm} from which it takes
it’s values

Satisfying the markov property:
 The distribution of xt+1 given all the previous states
xt xt-1 … depends only on the immediately preceding
state xt
P[X t 1  x t 1 | X t  x t , X t -1  x t -1 , . . . , X1  x1 , X 0  x 0 ]
 P[X t 1  x t 1 | X t  x t ]
X
S1
X2
S2
X3
S3
Xi-1
Si-1
Xi
Si
Xi+1
Si+1
X
S1
X2
S2
X3
S3
Xi-1
Si-1
Xi
Si
Xi+1
Si+1
A markov chain also consists of
1. An m dimensional initial distribution vector
(  (s1 ),
,  (s m ))
2. An m×m transition probabilities transition matrix
P= (Aij) s.t P(i,j)=Pr[xt+1=si|xt=sj]
Notice that the transition probability between Xt
and Xt+2 are also derived in the following way:
Pr( xt  2  sk | xt  si )   Pr(xt  2  sk , xt 1  s j | xt  si )
j
  Pr( xt  2  sk | xt  s j ) Pr(xt 1  s j | xt  si )
j
  pij p jk
j


The multiplications shown in the previous slide are
equivalent to multiplying M by itself .
We conclude that P2ik= Pr(xt+2=sk|xt=si)
more generally Pnik = Pr(xt+n=sk|xt=si)
Stationary distribution
A distribution P = (1,…, m) is stationary if P P = P
•
•
When P is unique for a transition matrix P , the rows
of lim n  P n all coinside with P.
in other words , the long-run probability of being in
state j is Pj, regardless of the initial state.
The gambler example




a gambler wins 1 dollar with probability
losses 1 dollar with probability 1-p
the gambler plays 5 rounds.
The transition matrix of that markov chain is:
0
1
2
3
4 5
0 1
1 1  p

2 0
3 0

4 0

5 0
0
0
1 p
0
0
0
0
p
0
1 p
0
0
0
0
p
0
1 p
0
0
0
0
p
0
0
0
0

0
0

p

1



there is no unique stationary distribution for this
example
any distribution  for which 1+6=1 will satisfy P=.
For example:
0
0
0
 1
1  p 0
p
0

p
 0 1 p 0


 0.5 0 0 0 0 0.5  
0
0 1 p 0

 0
0
0 1 p

0
0
0
 0
0
0
0
p
0
0
0
0

0 


0.5
0
0
0
0
0.5



0

p

1

For p=0.5 we can see that
0
0
0 0 0 
1
1
0.5 0 0.5 0 0 0 
0.8



 0 0.5 0 0.5 0 0 
0.6
 
0


0 0.5 0 p 0
0.4



0
0.2
0
0 0.5 0 0.5



0
0
0
0
0
1


0

0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
The long run frequency of being in a given state
depends on the initial state
0 
0.2 

0.4 
0.6 

0.8 

1 
Some properties of Markov chains





A Markov chain is aperiodic if the chances of getting
from state to state are not periodic in the number of
states needed.
States i,j communicate if state j is reachable from
state i, i.e there exists a finite number n s.t pn(i,j)>0.
A Markov chain is said to be irreducible if all its
states communicate.
A Markov chain is said to be ergodic if it is
irreducible and aperiodic.
An ergodic Markov chain always has a unique
stationary distribution
MCMC - Monte Carlo Markov Chain
Goal:
Calculating an Expectatoin Value  
   (i)
i
i
When the probabilities i are given up to an
unknown normalization factor Z.
Examples :
 E 
1
1. Boltzman distribution:  i  exp   i 
Z
 kbT 
 Ei 
where Z   exp  

k
T
i
 b 
is the canonical Partitioning function,
which is sometimes incalculable.
2. Constructing the likelihood function …
Prob(data| 2) = P(x11 ,x12 ,x13 ,x21 ,x22 ,x23) =
Probability of data (sum over all
states of all hidden variables)
Prob(data| 2) = P(x11 ,x12 ,x13 ,x21 ,x22 ,x23) =
 l11m, l11f … s23f [P(l11m) P(l11f) P(x11 | l11m, l11f,) … P(s13m)
P(s13f) P(s23m | s13m, 2) P(s23m | s13m, 2) ]
Method: Simulate
don’t go over all space.
Calculate an estimator to the expectation value.
We want to calculate the expectation value     s ( s )
s
Instead, we calculate an estimator to the
expectation value, by sampling from the space .
1 n 1
  E  ( S )    ( Si )
n i 0
MCMC tells us how to carry out the sampling to create
a good (unbiased, low variance) estimator.
1 n 1
lim   ( Si )  lim   
n  n
n 
i 0
MCMC: create an ergodic (=irreducible, aperiodic)
Markov Chain and simulate from it.
We need to define an ergodic transition matrix P   pij 
for which  is a unique stationary distibution.
If P satisfies detailed balance, i.e.  i pij   j p ji
for all i  j , then  is stationary for P:
 p  
i
ij
j
j
j
  i    j p ji
j
p ji
Metropolis (1953) suggested a simulating algorithm, in
which one defines P indirectly via a symmetric
transition matrix Q in the following way:
If you are in state i, propose a state j with probability qij
 j
Move from i to j with probability aij  min 1, . 
 i 
This assures detailed balance of P :
 i pij   j p ji
 i 
 j
 i qij min 1,    j q ji min 1, 
 i 
 j
qij min  i ,  j   q ji min  j ,  i 
Hastings (1970) generalized for a non-symmetric Q :
  j q ji 
aij  min 1,

  i qij 
Metropolis – Hastings Algorithm:
• Start with any state x0 .
• Step t: current state is xt 1  i .
Propose state j with probablity qij.
  j q ji 
compute aij  min 1,

  i qij 
Accept state j if aij  1.
else, draw uniformly from [0,1] a random
aij
number cc. If
state j
, accept
else , stay in state
Simulated Annealing
Goal: Find the most probable state of a
Markov Chain.
Use the Metropolis Algorithm with a
different acceptance probability:
1








aij  min 1,  j  
   i  
Where  is the temperature.
Cool gradually, i.e., allow the chain to run m
times with an initial temperature  0 , then
set a new temperature  1   0   and allow
the chain to run again. Repeat n times, until   0
Simulated Annealing (Kirkpatrick,1983)
Goal: Find the global minimum of a multi
-dimensional surface S ( x1 , x2 , , xn ), when
all states have an equal probability.
Use another variation of the Metropolis
Algorithm.
Suppose you are in state i. Draw a new state j,
and accept using the acceptance probabilty:

 ( S j  Si )  
aij  min 1 , C  exp  

kT



Where T is the temperature.
Cool gradually.
Parameters to play with:

n (number of loops)

m (number of iterations in each loop)

T0 (initial temperature, shouldn’t be too high or
too low)

T (cooling speed, shouldn’t be too fast)

k (“Boltzman constant” – scaling parameter)

C (first step scaling parameter)

x0  ( x0,1 , x0,2 ,

qij (how to draw the next step)
, x0,n ) (initial state)
MCMC Example: Descent Graph



Corresponds to a specific inheritance vector.
Vertices: the individuals’ genes (2 genes for each
individual in the pedigree).
Edges: represent the gene flow specified by the
inheritance vector. A child’s gene is connected by
an edge to the parent’s gene from which it flowed.
(Taken from tutorial 6)
1
2
11
12
13
14
21
22
23
24
a/b
a/b
a/b
a/b
a/c
Descent Graph
1
2
(a,b)
Assume that the descent
graph vertices below
represent the pedigree on
the left.
b/d
3
4
(a,b)
(a,b)
5
6
(a,b)
(a,c)
7
8
(b,d)
(Taken from tutorial 6)
Descent Graph
1
2
(a,b)
3
4
(a,b)
(a,b)
5
6
(a,b)
(a,c)
7
8
(b,d)
Assume that paternally inherited genes are on the left.
2. Assume that non-founders are placed in increasing
order.
3. A ‘1’ (‘0’) is used to denote a paternally (maternally)
originated gene.
 The gene flow above corresponds to the inheritance
vector: v = ( 1,1; 0,0; 1,1; 1,1; 1,1; 0,0 )
1.
(Taken from tutorial 6)
Creating a Markov Chain of descent graphs:
•A Markov Chain state is a (legal) descent graph.
•We need to define transition rules from one state to
another.
•We will show 4 possible transition rules :
T0 , T1 , T2a , T2b .
Transition rule T0 :
Choose parent, child.
Transition rule T0 :
Choose parent, child.
Swap paternal arc to maternal arc (or vice versa)
Small change in the graph
slow mixing
Transition rule T1 :
Choose person i.
Transition rule T1 :
Choose person i.
Perform a T0 transition from i to each one of
his/her children.
Transition rule T2a :
Choose a couple i.j with common children.
Transition rule T2a :
Choose a couple i.j with common children.
Exchange the subtree rooted at the maternal node
of i with the subtree rooted at the maternal node
of j. exchange also the paternally rooted subtrees.
Transition rule T2b :
Choose a couple i.j with common children.
Transition rule T2b :
Choose a couple i.j with common children.
Exchange the subtree rooted at the maternal node
of i with the subtree rooted at the paternal node
of j, and vice versa.
Problem :
a
1/1
2/2
3/3
b
1/1
2/2
1/1
3/3
3/3
1/2
2/2
1/1
2/2
3/3
1/2
Two legal graphs of inheritance.
No way to move from a to b using allowed transitions
(the states don’t communicate).
Solution: allow passage through illegal graphs (tunneling)
b
1/1
2/2
1/1
3/3
1/2
1/1
2/2
1/1
3/3
3/3
2/2
3/3
1/1
2/2
2/2
3/3
2/2
3/3
1/2
1/1
1/2
Applying tunneling in MCMC:
In order to move from state i to state j (one step of
the chain),
(1) Select a transition (+person+locus) and perform it.
(2) Toss a coin. If it says “head”, stop.
If it says “tail”, goto (1).
(3) If the proposed state j is legal, accept it with the
Metropolis probability
 j
aij  min 1, 
 i 
This assures that the Markov Chain is irreducible.
One use of descent graphs MCMC is the calculation of the
location score for location d, given marker evidence M and
trait phenotypes T :
 Prd T M  
log10 

Pr(
T
)


The difficult term to calculate is the conditional probability
Prd T M  .
Assuming linkage equilibrium we can write
Prd T M    Prd T Gˆ Pr  Gˆ M  ,where Ĝ is a descent graph.
Gˆ


MCMC: the stationary distribution of the chain is the
conditional distribution Pr Gˆ M .


If a sequence of legal descent graphs Gˆ1 ,
, Gˆ n is generated
by running the chain,
then for n large enough the estimate
n
1
is: Prd T M    Prd T Gˆ . The term Prd T Gˆ can be
n i 1
calculated quickly, using MENDEL.




The Gibbs sampler




Special case of the Metropolis - Hastings algorithm
for Cartesian product state spaces
Suppose that each sample point i=(i1,i2,i3..im) has m
components, for instance, a pedigree of m individuals
The Gibbs sampler updates one component of i at a
time.
If the component is chosen randomly and resampled
conditional on the components, then the acceptance
probability is 1 .
The Gibbs sampler
Proof:
 Let ic be the uniformly chosen component
 Denote the remaining components by i-c
 If j is a neighbor of i reachable by changing only
component ic then j-c=i-c.
j
1
 The proposal probability is q 
ij
m  {k : k c  i c } k

and satisfies piqij=pjqji
The ratio appearing in the acceptance probability is
i
1
j
 j q ji
m  {k : k c  j c } k

1
j
1
 i qij
i
m  {k : k c  i c } k
Example:
MCMC with ordered genotypes


Suppose we want to generate a pedigree for which we
know only the unordered genotype of 4 individuals in
the last generation .
We make the following simplifying assumptions:

The population is in HW equilibrium

The three loci are in linkage equilibrium

There is no crossover interference between the
three loci

One option is to first simulate the founders
genotype according to population frequencies,
then “drop” genes along the tree according to
Mendel’s law and the recombination fractions
1
2
11
12
13
14
21
22
23
24
2,4
1,2
2,2
2,4
1,2
1,2
1,2
1,3
1,3
3,4
1,2
1,3

One realization of an ordered genotype consisting
with the data we have is the following pedigree :
1
2
2|1
1|2
1|3
1|4
3|1
2|1
11
12
13
14
2|1
1|3
1|2
4|2
2|3
2|3
2|4
1|3
3|2
3|3
21
22
23
2|4
1|2
2|2
2|4
1|2
1|2
2|1
3|1
2|1
1|3
1|3
24
4|3
1|2
1|3

Computing the probability of a certain realization of
this pedigree is easy but even for this small tree the
proposal we saw is just one of ((43)2)4x(23)12=284
options.

It is obvious we need a better method .
The Gibbs sampler
The Gibbs sampler updates the genotype of one
individual at one locus at a time.
 This is easy given all other data due to the local nature
of the dependency.
We need to compute
* Pr[xim(l), xip(l)|xim(-l), xip(-l), x-i,y]
Where xim(l), xip(l) are the maternal and paternal alleles
of locus l of individual i,
xim(-l), xip(-l) are the paternal and maternal alleles of all
other loci of individual i,
y is the phenotype data,
and x-i denotes the data of all other individuals in the
pedigree.




This calculation involves only the phenotype of i
and the genotypes of his parents and children
under the no crossover assumption the genotype
of on locus across a chromosome depends only
on it’s adjacent loci
We get the following expression to calculate *
Pr[x (l ), x (l ) | x ( l ), x ( l ), x , y ] 
im
ip
im
ip
i
Pr[x | xm ]Pr[x | x p ]Pr[y (l ) | x (l )]
Pr[x
|x ]

im
ip
i
i
km
k kids (i )
Pr[x
|x ]
x (l ) Pr[xim | xm ]Pr[xip | x p ]Pr[y (l ) | xi (l )]

i
km
i
k kids (i )

Lets return to the pedigree we showed earlier.
Assume that the ordered genotype is the current
state, and we want to update individual 13 at locus
number 2.

There are 3 genotypes consistent with the genotype
configuration of other individuals in the pedigree
1|1,2|1, 1|3

We need to calculate the probability of each one of
these states according to the equation shown in the
last slide

We first compute the paternal and maternal
transmission probabilities for each allele in these
three genotypes.

Then we need the transmission probabilities from
individual 13 to his kids.

Once the three have been computed ,they are
normalized by dividing each one by the total sum.

The genotype is sampled according to that
distribution .
Problems :


Updating is done a single genotype at a time.
For a large pedigree this is a very slow process
The Gibbs sampler may also be reducible, i.e. some
legal states will never be reached.
For example:
A|B
A|A
A|C
B|C
A|C
by Gibbs
A|A
A|B
C|B

One solution to this problem is allowing illegal states
in the chain which will later be discarded.

Another is to identify the irreducible components
called islands and to define metropolis states
allowing to jump from one island to another.

However the whole process of finding the islands
and setting up proposal for the metropolis jumps is
not fully automated

the work involved for each different pedigree
prevents the widespread use of this method.
Other MCMC that use the Gibbs Sampler

The L-sampler (Heath, 1997)

A Whole Locus Gibbs sampler combining
Monte Carlo simulation with single-locus
peeling
Other MCMC that use the Gibbs
Sampler

The M-sampler (Thompson and Heath, 1999;
Thompson, 2000a)

A whole-meiosis Gibbs sampler which jointly
updates an entire meiosis from the full
conditional distribution
References



K. Lange, Mathematical and Statistical Methods for
Genetic Analysis, chapter 9, Springer (1997).
A. Bureau, T.P. Speed, Course stat 260: Statistics in
Genetics, week 7.
A.W. George, E. Thompson, Discovering Disease
Genes: Multipoint Linkage Analysis via a New Markov
Chain Monte Carlo Approach, Statistical Science,
Vol. 18, No.4 (2003)

Lecture notes.

Tutorial notes.
The
End!!