S2 Methods
Part 1: The posterior probability of the transmission tree and phylogenetic tree.
Definitions:
- $n$ = outbreak size
- $S$ with elements $S_i$ = sampling times per host $i \in \{1, \dots, n\}$
- $G$ with elements $G_i$ = genome sequences per host $i \in \{1, \dots, n\}$, each consisting of $L$ nucleotides A, C, G, T, or (if unknown) N
  o $D_{ij}$ = number of different nucleotides between hosts $i$ and $j$, not counting N’s
  o $D_{\max}$ = maximum $D_{ij}$ over all host pairs $i$ and $j$
  o $N_{ij}$ = number of nucleotide positions with N in host $i$ and/or $j$
  o $d_{ij} = D_{ij} + N_{ij} D_{\max} / L$ = sequence distance between hosts $i$ and $j$
- $I$ with elements $I_i$ = infection times per host $i \in \{1, \dots, n\}$
- $M$ with elements $M_i$ = infectors per host $i \in \{1, \dots, n\}$
- $P$ is the phylogenetic tree, being a set of numbered nodes $x$:
  o $x \in \{1, \dots, n\}$ are the sampling nodes, corresponding to hosts $i$
  o $x \in \{n+1, \dots, 2n-1\}$ are the coalescent nodes
  o $x \in \{2n, \dots, 3n-1\}$ are the transmission nodes, with node $x$ corresponding to the infection of host $i = x + 1 - 2n$
  Further, we define $t_x$ as the time of node $x$, $v_x$ as the ancestor node of node $x$, and $h_x$ as the host in which node $x$ resides (transmission nodes are assigned to the infector, as the value of $x$ already defines the infectee)
  o $\mathrm{mrca}_{x,y}$ = MRCA (most recent common ancestor) of nodes $x$ and $y$
- $P_i$ are the phylogenetic trees within each host: $P_i = \{x \mid h_x = i\}$, for which we further define $\tau_x = t_x - t_{h_x + 2n - 1}$ as the time of node $x$ since infection of the host.
- $\theta$ = parameter vector, with parameters for the generation time and sampling time distributions, the parameters describing the within-host coalescent model, and the mutation rate
  o $a_G$, $m_G$ = shape parameter and mean of the gamma-distributed generation time
  o $a_S$, $m_S$ = shape parameter and mean of the gamma-distributed sampling time
  o $r$ = slope of the within-host pathogen growth model
  o $\mu$ = mutation rate
Notation:
- $d_{a,m}(\cdot)$; $p_{a,m}(\cdot)$ are the density and cumulative density of a Gamma distribution with shape parameter $a$ and mean $m$
- $\Pr(\cdot)$ is a probability or probability density
- $u(\tau)$ is the Heaviside step function, equal to 0 if $\tau < 0$, and equal to 1 if $\tau \ge 0$
The complete posterior probability
$$\begin{aligned}
\Pr(I, M, P, \theta \mid S, G) &\propto \Pr(S, G \mid I, M, P, \theta) \cdot \Pr(I, M, P, \theta) \\
&= \Pr(S, G \mid I, M, P, \theta) \cdot \Pr(I, M, P \mid \theta) \cdot \Pr(\theta) \\
&= \Pr(S, G, I, M, P \mid \theta) \cdot \Pr(\theta) \\
&= \Pr(G \mid S, I, M, P, \theta) \cdot \Pr(P \mid S, I, M, \theta) \cdot \Pr(S \mid I, M, \theta) \cdot \Pr(I, M \mid \theta) \cdot \Pr(\theta) \\
&= \Pr(G \mid P, \theta) \cdot \Pr(P \mid S, I, M, \theta) \cdot \Pr(S \mid I, \theta) \cdot \Pr(I, M \mid \theta) \cdot \Pr(\theta)
\end{aligned}$$
All steps until the last follow from standard probability rules. In the last step, dependencies that do not exist in the model are removed. In the last line, the first four terms are likelihood terms, which are elaborated below. The last term is the prior.
Likelihood #1 for the genetic data
We assume a Jukes-Cantor substitution model with mutation rate $\mu$:
$$\Pr(G \mid P, \theta) = \prod_{\text{loci}} \; \sum_{\{A,C,T,G\}} \; \prod_{x=1}^{3n-1} \left[\tfrac{1}{4} - \tfrac{1}{4} \exp\left(-\mu (t_x - t_{v_x})\right)\right]^{\delta_{\mathrm{mut}} (1 - \delta_N)} \left[\tfrac{1}{4} + \tfrac{3}{4} \exp\left(-\mu (t_x - t_{v_x})\right)\right]^{(1 - \delta_{\mathrm{mut}})(1 - \delta_N)}$$
For each locus, for each possible assignment at each internal node (coalescent and transmission), and for all branches in the tree indicated by the end node $x$, the probability of a mutation is calculated, where $\delta_{\mathrm{mut}}$ indicates if a mutation occurred on that branch and $\delta_N$ indicates if the branch ends in a tip without observed nucleotide (‘n’ in the sequence data). This likelihood can be calculated using Felsenstein’s pruning algorithm [1]. Note that in the above formulation the actual rate of nucleotide change is 0.75µ, because a mutation gives rise to any of the four nucleotides, including the original one.
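The branch probabilities in this likelihood can be illustrated with a minimal Python sketch. The function names are hypothetical, and the two-tip example with a uniform 1/4 weight on the root nucleotide is an assumption made for illustration only; it is not the pruning implementation used for the analyses.

```python
import math

def jc_branch_probs(mu, branch_length):
    """Return (p_same, p_diff): probability that the end node of a branch
    carries the same nucleotide as its ancestor, or one specific other
    nucleotide, under the Jukes-Cantor model with rate mu."""
    decay = math.exp(-mu * branch_length)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same, p_diff

def two_tip_site_likelihood(mu, length_a, length_b, nuc_a, nuc_b):
    """Likelihood of one locus for two tips joined at a single ancestor.
    Assumption for this sketch: uniform weight 1/4 on the ancestral state.
    A tip with unobserved nucleotide 'N' contributes a factor 1."""
    total = 0.0
    for ancestor in "ACGT":
        term = 0.25
        for tip, length in ((nuc_a, length_a), (nuc_b, length_b)):
            if tip == "N":
                continue
            p_same, p_diff = jc_branch_probs(mu, length)
            term *= p_same if tip == ancestor else p_diff
        total += term
    return total
```

Because `p_same + 3 * p_diff` equals 1, summing over the four possible end states of a branch always gives 1, which is why unobserved tips can simply be skipped.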
Likelihood #2 for the phylogenetic tree
The likelihood for the complete phylogenetic tree is a product of the likelihoods of the trees in individual hosts:
$$\Pr(P \mid S, I, M, \theta) = \prod_i \Pr(P_i \mid S_i, I, M, \theta)$$
The dependency on the complete vectors $I$ and $M$ remains, because these determine the transmission nodes with host $i$ as infector.
We define the number of lineages in host $i$ at time $\tau$ since infection as
$$L_i(\tau) = 1 + \sum_{\{x \mid x \in P_i \wedge n < x < 2n\}} u(\tau - \tau_x) - \sum_{\{x \mid x \in P_i \wedge x \ge 2n\}} u(\tau - \tau_x) - u(\tau - \tau_i),$$
where $u(\cdot)$ is the Heaviside step function, i.e. $u(\tau) = 0$ if $\tau < 0$ and $u(\tau) = 1$ if $\tau \ge 0$, adding 1 at infection and at each coalescent node, and subtracting 1 at each transmission node and at sampling.
The within-host dynamics $w(\tau, r)$ describes the product of pathogen generation time and effective population size, the inverse of which determines the coalescent rate. As a general form, we choose
$$w(\tau, r) = r\tau,$$
which automatically gives a bottleneck of size 1 (because the coalescent rate $\to \infty$ as $\tau \to 0$), and allows for most coalescent nodes to lie close to the time of infection (high $r$) or close to the transmission nodes (low $r$).
Thus, the likelihood for the phylogenetic tree in host $i$ becomes
$$\Pr(P_i \mid S_i, I, M, \theta) = \exp\left(-\int_0^\infty \binom{L_i(\tau)}{2} \frac{1}{w(\tau, r)} \, d\tau\right) \prod_{\{x \mid x \in P_i \wedge n < x < 2n\}} \frac{1}{w(\tau_x, r)},$$
with $\binom{0}{2} = \binom{1}{2} = 0$.
Likelihood #3 for the sampling intervals
The sampling intervals are assumed to follow a gamma distribution with shape parameter $a_S$ and mean $m_S$. The likelihood is the product of densities for all sampling times:
$$\Pr(S \mid I, \theta) = \prod_i d_{a_S, m_S}(S_i - I_i)$$
Likelihood #4 for infection times and infectors
The generation intervals are assumed to follow a gamma distribution with shape parameter $a_G$ and mean $m_G$. The likelihood is the product of densities for all generation intervals, multiplied by the probability of each transmission tree topology, which we leave out because we assume that every topology is a priori equally likely:
$$\Pr(I, M \mid \theta) = \Pr(I \mid M, \theta) \cdot \Pr(M \mid \theta) \propto \prod_{\{i \mid M_i \neq 0\}} d_{a_G, m_G}(I_i - I_{M_i})$$
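Likelihoods #3 and #4 are both products of Gamma densities parameterised by shape and mean. A minimal Python sketch, with hypothetical function names and hosts numbered from 1 (infector 0 marking the index case, as in the definition of $M$):

```python
import math

def gamma_log_density(x, shape, mean):
    """Log density of a Gamma distribution with the given shape and mean
    (rate = shape / mean); minus infinity for non-positive x."""
    if x <= 0:
        return float("-inf")
    rate = shape / mean
    return (shape * math.log(rate) + (shape - 1.0) * math.log(x)
            - rate * x - math.lgamma(shape))

def sampling_log_likelihood(sampling_times, infection_times, a_s, m_s):
    """Log of likelihood #3: one sampling interval per host."""
    return sum(gamma_log_density(s - t, a_s, m_s)
               for s, t in zip(sampling_times, infection_times))

def generation_log_likelihood(infection_times, infectors, a_g, m_g):
    """Log of likelihood #4, up to the constant topology term:
    one generation interval per host that has an infector."""
    total = 0.0
    for host_index, infector in enumerate(infectors):
        if infector != 0:
            interval = infection_times[host_index] - infection_times[infector - 1]
            total += gamma_log_density(interval, a_g, m_g)
    return total
```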
Prior distributions
The model has 6 parameters: mutation rate $\mu$, within-host growth rate $r$, sampling interval distribution parameters $a_S$ and $m_S$, and generation interval distribution parameters $a_G$ and $m_G$. In the current implementation, $a_S$ and $a_G$ are not estimated but chosen before the analysis. The prior distributions for the other parameters are
- $\log(\mu) \sim U(-\infty, \infty)$, which corresponds to a relative density on the original scale of $d(\mu) = 1/\mu$.
- $m_S \sim D(\mu_{0,S}, \sigma_{0,S})$, a prior distribution with mean $\mu_{0,S}$ and standard deviation $\sigma_{0,S}$, which is translated (see box below) into a Gamma-distributed prior for the rate parameter $b_S$ of the sampling interval Gamma distribution:
  $$b_S \sim \Gamma\left(a_{0,S} = 2 + \frac{\mu_{0,S}^2}{\sigma_{0,S}^2}, \; b_{0,S} = \frac{\mu_{0,S}}{a_S}\left(1 + \frac{\mu_{0,S}^2}{\sigma_{0,S}^2}\right)\right).$$
  Here, $a_{0,S}$ is the shape and $b_{0,S}$ is the rate. When using an uninformative prior, we set $\mu_{0,S} = 1$ and $\sigma_{0,S} = \infty$.
- $m_G \sim D(\mu_{0,G}, \sigma_{0,G})$, as for $m_S$.
- $r \sim \Gamma(a_{0,r}, m_{0,r})$. When using an uninformative prior, we set $a_{0,r} = 3$ and $m_{0,r} = 1$.
We have $a_S$ fixed, and define $b_S \sim \Gamma(a_{0,S}, b_{0,S})$, such that
1. $E(a_S / b_S) = \mu_{0,S} \Rightarrow E(1 / b_S) = \mu_{0,S} / a_S$
2. $E\left((\mu_{0,S} - a_S / b_S)^2\right) = \sigma_{0,S}^2 \Rightarrow E\left((\mu_{0,S} / a_S - 1 / b_S)^2\right) = \sigma_{0,S}^2 / a_S^2$
Now, we wish to calculate $a_{0,S}$ and $b_{0,S}$.
If the density of rate $b_S$ is $d(b_S)$, then the density of scale $\beta_S = 1 / b_S$ is $d(1/\beta_S) / \beta_S^2$. This is an inverse Gamma distribution, of which the mean and variance can be filled into the equations above:
1. $\dfrac{\mu_{0,S}}{a_S} = \dfrac{b_{0,S}}{a_{0,S} - 1}$
2. $\dfrac{\sigma_{0,S}^2}{a_S^2} = \dfrac{b_{0,S}^2}{(a_{0,S} - 1)^2 (a_{0,S} - 2)}$
Solving these equations for $a_{0,S}$ and $b_{0,S}$ gives the prior distribution for $b_S$.
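The translation from a prior mean and standard deviation for the mean interval into the shape and rate of the Gamma prior on the rate parameter can be checked numerically. The sketch below uses hypothetical function names and verifies the two inverse-Gamma moment equations for one example.

```python
def rate_prior_from_mean_prior(prior_mean, prior_sd, interval_shape):
    """Shape a0 and rate b0 of the Gamma prior on the rate parameter b,
    given prior mean and sd of the mean interval m = interval_shape / b."""
    ratio = prior_mean ** 2 / prior_sd ** 2  # 0.0 when prior_sd is infinite
    a0 = 2.0 + ratio
    b0 = prior_mean / interval_shape * (1.0 + ratio)
    return a0, b0

def inverse_gamma_moments(a0, b0):
    """Mean and variance of 1/b when b ~ Gamma(shape a0, rate b0), for a0 > 2."""
    mean = b0 / (a0 - 1.0)
    variance = b0 ** 2 / ((a0 - 1.0) ** 2 * (a0 - 2.0))
    return mean, variance

# Example: prior mean 2, prior sd 1, fixed interval shape 3.
a0, b0 = rate_prior_from_mean_prior(2.0, 1.0, 3.0)
mean, variance = inverse_gamma_moments(a0, b0)
# mean should equal 2/3 (= prior mean / shape), variance 1/9 (= sd^2 / shape^2)
```

With an infinite prior standard deviation the ratio term vanishes, giving $a_{0,S} = 2$ and $b_{0,S} = \mu_{0,S} / a_S$, for which the variance of $1/b_S$ is infinite.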
Part 2: Updating steps in the MCMC chain.
The mean sampling and generation intervals are directly sampled from their posteriors. For the other parameters in $\theta$ and the unobserved variables $Z = \{I, M, P\}$, updating is always done by a Metropolis-Hastings step:
- proposing new values with proposal density $G(\theta' \mid \theta)$ or $H(Z' \mid Z, S, \theta)$, respectively
- accepting the new values with probability $\alpha$,
$$\alpha = \min\left(1, \frac{\Pr(S, G, Z \mid \theta') \cdot \Pr(\theta') \cdot G(\theta \mid \theta')}{\Pr(S, G, Z \mid \theta) \cdot \Pr(\theta) \cdot G(\theta' \mid \theta)}\right)$$
or
$$\alpha = \min\left(1, \frac{\Pr(S, G, Z' \mid \theta) \cdot \Pr(\theta) \cdot H(Z \mid Z', S, \theta)}{\Pr(S, G, Z \mid \theta) \cdot \Pr(\theta) \cdot H(Z' \mid Z, S, \theta)}\right) = \min\left(1, \frac{\Pr(S, G, Z' \mid \theta) \cdot H(Z \mid Z', S, \theta)}{\Pr(S, G, Z \mid \theta) \cdot H(Z' \mid Z, S, \theta)}\right),$$
respectively.
Updating the mean sampling and generation intervals
The parameters $m_S$ and $m_G$ are updated by direct sampling from the posterior distributions of the rate parameters $b_S$ and $b_G$ and calculating $m_S = a_S / b_S$ and $m_G = a_G / b_G$. These posterior distributions are:
$$b_S \sim \Gamma\left(a_{\mathrm{post},S} = a_{0,S} + n \cdot a_S, \; b_{\mathrm{post},S} = b_{0,S} + \sum_i (S_i - I_i)\right)$$
$$b_G \sim \Gamma\left(a_{\mathrm{post},G} = a_{0,G} + (n - 1) \cdot a_G, \; b_{\mathrm{post},G} = b_{0,G} + \sum_{\{i \mid M_i \neq 0\}} (I_i - I_{M_i})\right)$$
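A minimal Python sketch of this direct sampling step, with hypothetical function names. The same function serves both updates: pass the $n$ sampling intervals with $a_S$, or the $n - 1$ generation intervals with $a_G$. Note that the second argument of Python's `random.gammavariate` is a scale, so the posterior rate is inverted.

```python
import random

def posterior_rate_parameters(prior_shape, prior_rate, interval_shape, intervals):
    """Shape and rate of the Gamma posterior of the rate parameter b."""
    post_shape = prior_shape + len(intervals) * interval_shape
    post_rate = prior_rate + sum(intervals)
    return post_shape, post_rate

def sample_mean_interval(prior_shape, prior_rate, interval_shape, intervals, rng=random):
    """Draw b from its posterior and return the mean interval a / b."""
    post_shape, post_rate = posterior_rate_parameters(
        prior_shape, prior_rate, interval_shape, intervals)
    rate = rng.gammavariate(post_shape, 1.0 / post_rate)
    return interval_shape / rate
```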




Updating other parameters in $\theta$
Proposal
A new value $\varphi'$ (being $\mu'$ or $r'$) is proposed as
$$\log(\varphi') \sim N(\log(\varphi), \sigma^2)$$
Thus, the proposal density of $\varphi'$ is
$$G(\varphi' \mid \varphi) = d_N\left(\log(\varphi') \mid \log(\varphi), \sigma^2\right) / \varphi'$$
The proposal variance $\sigma^2$ is calculated from the available data as follows:
- For $\log(\mu)$, the posterior density is proportional to the likelihood (because of the uniform prior). Although the actual likelihood calculation is done with Felsenstein’s pruning algorithm [1], $\mu$ is the rate of a Poisson process, so it should be possible to express it in terms of a number of realizations and the total exposure time. The number of realizations $x$ is the number of mutations, i.e. the parsimony of the phylogenetic tree; the exposure time $D$ is the sum of all branch lengths multiplied by the sequence length. By expressing the likelihood of $\log(\mu)$ in terms of $x$ and $D$, and normalizing it to get a proper distribution, the variance of this distribution can be calculated as $\psi_1(x)$, which is the trigamma function. Optimal proposals in a Metropolis-Hastings step have a variance of $2.38^2$ times the target variance [2]. Assuming that the posterior phylogenetic trees reach maximum parsimony, the proposal variance is set to $\sigma_\mu^2 = 2.38^2 \, \psi_1(x)$, with $x$ the number of SNPs in the dataset.
- For $\log(r)$, a similar reasoning is used (although the prior distribution is different), because $1/r$ is proportional to the rate of a Poisson process as well. Here, the number of realizations $x$ is the number of coalescent nodes, which is equal to $n - 1$. So, $\sigma_r^2 = 2.38^2 \, \psi_1(n - 1)$.
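The proposal variance and the log-normal proposal can be sketched in Python as follows. The trigamma function is not in the Python standard library, so this sketch computes it with the recurrence $\psi_1(x) = \psi_1(x + 1) + 1/x^2$ followed by a standard asymptotic series; the function names are hypothetical.

```python
import math
import random

def trigamma(x):
    """Trigamma function psi_1(x) for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    value = 0.0
    while x < 6.0:  # shift upwards until the asymptotic series is accurate
        value += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    value += inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0)))
    return value

def proposal_variance(n_realizations):
    """2.38^2 times the target variance psi_1(x) on the log scale."""
    return 2.38 ** 2 * trigamma(n_realizations)

def propose_on_log_scale(current, variance, rng=random):
    """Draw log(new) ~ Normal(log(current), variance) and return new."""
    return math.exp(rng.gauss(math.log(current), math.sqrt(variance)))
```

For $\mu$ the argument of `proposal_variance` would be the number of SNPs, and for $r$ it would be $n - 1$.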
Acceptance probability
The acceptance probability is
$$\begin{aligned}
&\min\left(1, \frac{\Pr(S, G, Z \mid \theta') \cdot \Pr(\theta') \cdot G(\theta \mid \theta')}{\Pr(S, G, Z \mid \theta) \cdot \Pr(\theta) \cdot G(\theta' \mid \theta)}\right) \\
&= \min\left(1, \frac{\Pr(S, G, Z \mid \theta') \cdot \Pr(\varphi') \cdot d_N\left(\log(\varphi) \mid \log(\varphi'), \sigma^2\right) / \varphi}{\Pr(S, G, Z \mid \theta) \cdot \Pr(\varphi) \cdot d_N\left(\log(\varphi') \mid \log(\varphi), \sigma^2\right) / \varphi'}\right)
\end{aligned}$$
For $\mu$, the prior and proposal densities cancel out, and we are left with the acceptance probability $\min\left(1, \Pr(G \mid P, \mu') / \Pr(G \mid P, \mu)\right)$. For $r$, the acceptance probability becomes
$$\min\left(1, \frac{\Pr(P \mid S, I, M, r') \cdot d_{a_{0,r}, m_{0,r}}(r') / r}{\Pr(P \mid S, I, M, r) \cdot d_{a_{0,r}, m_{0,r}}(r) / r'}\right)$$
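In practice such ratios are usually evaluated on the log scale. The sketch below, with hypothetical function names, shows the acceptance step for a parameter proposed on the log scale: the two normal densities are symmetric and cancel, leaving only the factors $1/\varphi$ and $1/\varphi'$.

```python
import math
import random

def log_acceptance_ratio(log_lik_new, log_lik_old,
                         log_prior_new, log_prior_old,
                         new_value, old_value):
    """Log of the Metropolis-Hastings ratio for a log-normal random walk.
    The proposal ratio G(old | new) / G(new | old) reduces to new / old."""
    return ((log_lik_new - log_lik_old)
            + (log_prior_new - log_prior_old)
            + math.log(new_value) - math.log(old_value))

def accept(log_ratio, rng=random):
    """Accept with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))
```

With the prior density $1/\mu$, `log_prior_new - log_prior_old` equals `math.log(old_value) - math.log(new_value)`, so only the likelihood difference remains, in line with the simplification for $\mu$.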
Updating the transmission tree and phylogenetic tree
Proposals
The unobserved variables are updated in small sets at once. Three types of proposal are
distinguished: the first in which both transmission tree and phylogenetic tree are changed, the
second in which only the transmission tree is changed, and the third in which only the phylogenetic
tree is changed. In the R package the user can choose the probabilities with which these three
proposals are used; the default is 80%-20%-0% for the three types, respectively.
All three proposals take a single focal host i; proposals 1 and 2 start with step 1:
1. propose a new infection time $I_i'$ by sampling $T \sim \Gamma(a_P, m_S)$ from a Gamma distribution with shape $a_P = \tfrac{2}{3} a_S$ and mean $m_S$, and calculating $I_i' = S_i - T$
Then, the following decision trees determine the subsequent proposal steps and acceptance probabilities, laid out in proposal paths A-K below. In some cases, the proposal is immediately rejected because no reverse proposal exists:
Proposal 1: changing phylogenetic and transmission trees.
Q1: is host $i$ the index case?
{Q1=Y} Q2: is $I_i' < \min\{I_j \mid M_j = i\}$, i.e. is $I_i'$ before host $i$’s first transmission node?
  {Q12=YY} follow proposal path A (Fig 5A)
  {Q12=YN} Q3: is $I_i' < \{I_j \mid M_j = i\}_{(2)}$ (the second-smallest element), i.e. is $I_i'$ before host $i$’s second transmission node? (Yes, if no second transmission node exists)
    {Q123=YNY} follow proposal path B (Fig 5B)
    {Q123=YNN} follow proposal path C (Fig 5C)
{Q1=N} Q2: is $I_i' < \min(I)$, i.e. is $I_i'$ before infection of the index case?
  {Q12=NY} follow proposal path D (Fig 5D)
  {Q12=NN} Q3: is $I_i' < \min\{I_j \mid M_j = i\}$, i.e. is $I_i'$ before host $i$’s first transmission node? (Yes, if no transmission node exists)
    {Q123=NNY} follow proposal path E (Fig 5E)
    {Q123=NNN} follow proposal path F (Fig 5F)
Proposal 2: changing transmission tree, but not phylogenetic tree.
Q1: is host $i$ the index case?
{Q1=Y} Q2: is $I_i' < \min\{t_x \mid h_x = i\}$, i.e. is $I_i'$ before host $i$’s first coalescent node?
  {Q12=YY} follow proposal path G
  {Q12=YN} reject
{Q1=N} Q2: is $I_i' > t_{\mathrm{mrca}(i, M_i)}$, i.e. is $I_i'$ after the MRCA of the samplings in hosts $i$ and $M_i$?
  {Q12=NY} follow proposal path H (Fig 6A)
  {Q12=NN} new proposal step:
  2. propose $I_{M_i}'$ by sampling $T \sim \Gamma(a_P, m_S)$ and calculating $I_{M_i}' = S_{M_i} - T$
  Q3: is $I_{M_i}' > t_{\mathrm{mrca}(i, M_i)}$, i.e. is $I_{M_i}'$ after the MRCA of host $i$ and its infector?
    {Q123=NNY} Q4: is $M_{M_i} = 0$, i.e. is host $i$'s infector the index case?
      {Q1234=NNYY} follow proposal path I (Fig 6B)
      {Q1234=NNYN} Q5: is $I_i'$ after the MRCA of the samplings in hosts $i$ and $M_{M_i}$?
        {Q12345=NNYNY} follow proposal path J (Fig 6C)
        {Q12345=NNYNN} reject
    {Q123=NNN} reject
Proposal 3: changing phylogenetic tree topology only.
follow proposal path K
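The first step and the decision tree for Proposal 1 depend only on the infection times and infectors, so they can be sketched compactly in Python. The function names and the list-based representation (hosts numbered from 1, infector 0 for the index case) are assumptions for illustration; the decision tree for Proposal 2 additionally needs the phylogenetic tree and is not covered.

```python
import random

def propose_infection_time(sampling_time, a_s, m_s, rng=random):
    """Step 1: draw T from a Gamma with shape (2/3) * a_s and mean m_s,
    and return the proposed infection time S_i - T."""
    shape = 2.0 * a_s / 3.0
    scale = m_s / shape  # gammavariate takes shape and scale
    return sampling_time - rng.gammavariate(shape, scale)

def proposal_1_path(host, proposed_time, infection_times, infectors):
    """Return the proposal path 'A'..'F' for focal host `host`."""
    infectee_times = sorted(
        t for t, m in zip(infection_times, infectors) if m == host)
    first = infectee_times[0] if infectee_times else float("inf")
    second = infectee_times[1] if len(infectee_times) > 1 else float("inf")
    if infectors[host - 1] == 0:          # Q1: host is the index case
        if proposed_time < first:         # Q2
            return "A"
        return "B" if proposed_time < second else "C"   # Q3
    if proposed_time < min(infection_times):            # Q2
        return "D"
    return "E" if proposed_time < first else "F"        # Q3
```

Using infinity for a missing first or second transmission node reproduces the "Yes, if no transmission node exists" rules of the decision tree.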
Proposal path A {Tree 1, Q12=YY}
Situation
Host $i$ is the index case, and $I_i'$ is before its first transmission node.
Proposed changes to transmission tree
1. Topological changes: none
2. Infection time changes:
a. $I_i$ changes to $I_i'$
Proposal steps
2. propose a new tree $P_i'$ by simulating the within-host coalescent model
Proposal distribution
The proposal distribution for $I_i'$ and $P_i'$ is
$$\begin{aligned}
H(I_i', P_i' \mid I, M, P, S, \theta) &= \Pr(P_i' \mid I_i', I, M, P, S, \theta) \cdot \Pr(I_i' \mid I, M, P, S, \theta) \\
&= \Pr(P_i' \mid I', M, S_i, \theta) \cdot \Pr(I_i' \mid S_i) \\
&= \Pr(P_i' \mid I', M, S_i, \theta) \cdot d_{a_P, m_S}(S_i - I_i')
\end{aligned}$$
Acceptance probability
The acceptance probability (removing the dependency on $\theta$ after the first line, for readability) is
$$\begin{aligned}
&\min\left(1, \frac{\Pr(S, G, Z' \mid \theta) \cdot H(Z \mid Z', S, \theta)}{\Pr(S, G, Z \mid \theta) \cdot H(Z' \mid Z, S, \theta)}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(P' \mid S, I', M) \cdot \Pr(S \mid I') \cdot \Pr(I', M) \cdot \Pr(P_i \mid S_i, I, M) \cdot d_{a_P, m_S}(S_i - I_i)}{\Pr(G \mid P) \cdot \Pr(P \mid S, I, M) \cdot \Pr(S \mid I) \cdot \Pr(I, M) \cdot \Pr(P_i' \mid S_i, I', M) \cdot d_{a_P, m_S}(S_i - I_i')}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(S_i \mid I_i') \cdot \Pr(I', M) \cdot d_{a_P, m_S}(S_i - I_i)}{\Pr(G \mid P) \cdot \Pr(S_i \mid I_i) \cdot \Pr(I, M) \cdot d_{a_P, m_S}(S_i - I_i')}\right)
\end{aligned}$$
Reverse proposal
This sampling step can be reversed by proposing through (the same) proposal path A, with the original infection time $I_i$ proposed for the same focal host $i$ (going back in Fig 5A).
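As an illustration of how the simplified ratio for path A might be evaluated, the sketch below works on the log scale. It assumes, following likelihoods #3 and #4, that $\Pr(S_i \mid I_i)$ is the sampling interval density and that the ratio $\Pr(I', M) / \Pr(I, M)$ reduces to the generation intervals from the index case $i$ to its infectees, since only $I_i$ changes. The function names and argument layout are hypothetical, and the genetic log-likelihoods are taken as given inputs.

```python
import math

def gamma_log_density(x, shape, mean):
    """Log density of a Gamma distribution with given shape and mean."""
    if x <= 0:
        return float("-inf")
    rate = shape / mean
    return (shape * math.log(rate) + (shape - 1.0) * math.log(x)
            - rate * x - math.lgamma(shape))

def path_a_log_acceptance(log_gen_new, log_gen_old, s_i, i_old, i_new,
                          infectee_times, a_s, m_s, a_g, m_g, a_p):
    """Log of the path A acceptance ratio (before taking min with 0)."""
    log_ratio = log_gen_new - log_gen_old  # Pr(G | P') / Pr(G | P)
    # Sampling interval of the focal host: Pr(S_i | I_i') / Pr(S_i | I_i)
    log_ratio += (gamma_log_density(s_i - i_new, a_s, m_s)
                  - gamma_log_density(s_i - i_old, a_s, m_s))
    # Generation intervals to the infectees of the index case
    for t_j in infectee_times:
        log_ratio += (gamma_log_density(t_j - i_new, a_g, m_g)
                      - gamma_log_density(t_j - i_old, a_g, m_g))
    # Proposal densities: reverse move over forward move
    log_ratio += (gamma_log_density(s_i - i_old, a_p, m_s)
                  - gamma_log_density(s_i - i_new, a_p, m_s))
    return log_ratio
```

If the proposal shape `a_p` were equal to `a_s`, the sampling interval terms and the proposal terms would cancel exactly.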
Proposal path B {Tree 1, Q123=YNY}
Situation
Host $i$ is the index case and $I_i'$ is sampled after its first transmission node (infecting host $j$), but before its second transmission node, if there is any.
Proposed changes to transmission tree
1. Topological changes:
a. host $i$ gets a new infector
b. host $j$ becomes index case
2. Infection time change:
a. $I_i$ changes to $I_i'$
Proposal steps
2. propose a new infector $M_i'$ from the proposal distribution $F_j(I_i')$, which is the probability to select infector $j$ at time $I_i'$ in the outbreak. The proposal distribution is given by
$$F_j(t) = \frac{u(t - I_j) \cdot \left(d_{a_G, m_G}(t - I_j) + 1 / d_{ij}\right)}{\sum_k u(t - I_k) \cdot \left(d_{a_G, m_G}(t - I_k) + 1 / d_{ik}\right)},$$
so the probability to select infector $j$ is proportional to the density of the generation time plus a weight $1 / d_{ij}$, with $d_{ij}$ equal to the sequence distance between isolates from hosts $i$ and $j$ (see list of definitions). This proposal distribution gives a sampling weight to possible infectors based on their infection time and related to the genetic distance between isolates.
3. host $j$ becomes the index case: $M_j' = 0$
4. bookkeeping: make corresponding changes in the nodes and move one of the coalescent nodes from host $i$ to host $M_i'$
5. propose new trees $P_i'$ and $P_{M_i'}'$ by simulating the within-host coalescent model
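The infector proposal distribution $F_j(t)$ in step 2 can be sketched as follows. This is an illustration under stated assumptions rather than the package code: hosts are indexed from 0 here, the focal host is excluded from the candidates, all sequence distances are assumed strictly positive, and the step function is taken to switch off the whole weight for hosts infected after time $t$. The function names are hypothetical.

```python
import math

def gamma_density(x, shape, mean):
    """Density of a Gamma distribution with given shape and mean; 0 for x <= 0."""
    if x <= 0:
        return 0.0
    rate = shape / mean
    return math.exp(shape * math.log(rate) + (shape - 1.0) * math.log(x)
                    - rate * x - math.lgamma(shape))

def infector_proposal_probs(host, time, infection_times, distances, a_g, m_g):
    """Return {candidate: probability} for the new infector of `host` at `time`.
    distances[i][j] is the sequence distance d_ij (assumed > 0 for i != j)."""
    weights = {}
    for k, t_k in enumerate(infection_times):
        if k == host:
            continue
        if time >= t_k:  # Heaviside step: candidate must already be infected
            weights[k] = gamma_density(time - t_k, a_g, m_g) + 1.0 / distances[host][k]
        else:
            weights[k] = 0.0
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("no host is infected before the proposed time")
    return {k: w / total for k, w in weights.items()}
```

A new infector could then be drawn with, for example, `random.choices(list(probs), weights=list(probs.values()))`.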
Proposal distribution
The proposal distribution for $I_i'$, $M_i'$, $M_j'$, $P_i'$, and $P_{M_i'}'$ is
$$\begin{aligned}
H(I_i', M_i', M_j', P_i', P_{M_i'}' \mid I, M, P, S, \theta) &= \Pr(P_i', P_{M_i'}' \mid I_i', M_i', M_j', I, M, P, S, \theta) \cdot \Pr(M_i', M_j' \mid I_i', I, M, P, S, \theta) \cdot \Pr(I_i' \mid I, M, P, S, \theta) \\
&= \Pr(P_i' \mid I', M', S_i, \theta) \cdot \Pr(P_{M_i'}' \mid I', M', S_{M_i'}, \theta) \cdot \Pr(M_i' \mid I', \theta) \cdot \Pr(M_j' \mid I', \theta) \cdot \Pr(I_i' \mid S_i, \theta) \\
&= \Pr(P_i', P_{M_i'}' \mid I', M', S, \theta) \cdot F_{M_i'}(I_i') \cdot d_{a_P, m_S}(S_i - I_i')
\end{aligned}$$
Here, $\Pr(M_j' \mid \dots) = 1$ because it follows automatically from the proposed $I_i'$.
Acceptance probability
The acceptance probability (removing the dependency on $\theta$ after the first line, for readability) is
$$\begin{aligned}
&\min\left(1, \frac{\Pr(S, G, Z' \mid \theta) \cdot H(Z \mid Z', S, \theta)}{\Pr(S, G, Z \mid \theta) \cdot H(Z' \mid Z, S, \theta)}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(P' \mid S, I', M') \cdot \Pr(S \mid I') \cdot \Pr(I', M') \cdot \Pr(P_i, P_{M_i'} \mid S, I, M) \cdot d_{a_P, m_S}(S_i - I_i)}{\Pr(G \mid P) \cdot \Pr(P \mid S, I, M) \cdot \Pr(S \mid I) \cdot \Pr(I, M) \cdot \Pr(P_i', P_{M_i'}' \mid S, I', M') \cdot F_{M_i'}(I_i') \cdot d_{a_P, m_S}(S_i - I_i')}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(S_i \mid I_i') \cdot \Pr(I', M') \cdot d_{a_P, m_S}(S_i - I_i)}{\Pr(G \mid P) \cdot \Pr(S_i \mid I_i) \cdot \Pr(I, M) \cdot F_{M_i'}(I_i') \cdot d_{a_P, m_S}(S_i - I_i')}\right)
\end{aligned}$$
Reversal
Reversal is possible through proposal path D. Fig 5D shows the reverse proposal path D from the original path B in Fig 5B.
Proposal path C {Tree 1, Q123=YNN}
Situation
Host $i$ is the index case and $I_i'$ is sampled after its second transmission node. Host $j$ is the first secondary case.
Proposed changes to transmission tree
1. Topological changes:
a. host $j$ becomes index case, transmitting to host $i$
b. with 50% probability: the other secondary cases of hosts $i$ and $j$ are exchanged
2. Infection time change:
a. the sampled $I_i'$ is discarded; instead, the infection times of hosts $i$ and $j$ are switched
Proposal steps
2. discard the sampled infection time and switch infection times by proposing $I_i' = I_j$ and $I_j' = I_i$
3. switch roles by proposing $M_i' = j$ and $M_j' = 0$
4. bookkeeping: make corresponding changes in the nodes and move one of the coalescent nodes from host $i$ to host $j$
5. with 50% probability: for all $\{k \mid M_k = i \wedge k \neq j\}$, propose $M_k' = j$; for all $\{k \mid M_k = j\}$, propose $M_k' = i$; finish by bookkeeping: swap all transmission and coalescent nodes between $P_i$ and $P_j$
6. propose new trees $P_i'$ and $P_j'$ by simulating the within-host coalescent model
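The changes to the transmission tree in steps 2, 3 and 5 can be sketched as a pure relabelling of the infection times and infectors. The function name and the list-based representation (hosts numbered from 1, infector 0 for the index case) are assumptions; the bookkeeping of phylogenetic nodes and the simulation of new within-host trees are left out.

```python
import random

def switch_host_with_first_infectee(host_i, host_j, infection_times, infectors, exchange):
    """Return new (infection_times, infectors) after switching host i with
    its first infectee j. `exchange` says whether the other secondary cases
    of i and j are exchanged (decided with probability 0.5 by the caller)."""
    new_times = list(infection_times)
    new_infectors = list(infectors)
    # Step 2: switch infection times.
    new_times[host_i - 1] = infection_times[host_j - 1]
    new_times[host_j - 1] = infection_times[host_i - 1]
    # Step 3: j takes over i's infector (0 when i was the index case), i gets j.
    new_infectors[host_j - 1] = infectors[host_i - 1]
    new_infectors[host_i - 1] = host_j
    # Step 5: optionally exchange the remaining infectees of i and j.
    if exchange:
        for k, infector in enumerate(infectors, start=1):
            if k in (host_i, host_j):
                continue
            if infector == host_i:
                new_infectors[k - 1] = host_j
            elif infector == host_j:
                new_infectors[k - 1] = host_i
    return new_times, new_infectors

# The caller decides on the exchange with 50% probability.
exchange_infectees = random.random() < 0.5
```

Because host $j$ inherits whatever infector host $i$ had, the same relabelling also applies when host $i$ is not the index case.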
Proposal distribution
The proposal distribution for $I_i'$, $I_j'$, $M'$, $P_i'$, and $P_j'$ is
$$\begin{aligned}
H(I_i', I_j', M', P_i', P_j' \mid I, M, P, S, \theta) &= \Pr(P_i', P_j' \mid I_i', I_j', M', I, M, P, S, \theta) \cdot \Pr(M' \mid I_i', I_j', I, M, P, S, \theta) \cdot \Pr(I_i', I_j' \mid I, M, P, S, \theta) \\
&= \Pr(P_i' \mid I', M', S_i, \theta) \cdot \Pr(P_j' \mid I', M', S_j, \theta) \cdot \Pr(M' \mid I', \theta) \cdot \Pr(I_j' \mid I_i', \theta) \cdot \Pr(I_i' \mid I, S_i, \theta) \\
&= \Pr(P_i', P_j' \mid I', M', S, \theta) \cdot 0.5 \cdot p_{a_P, m_S}(S_i - I_j)
\end{aligned}$$
Here, $\Pr(M' \mid I', \theta) = 0.5$ because of the two possible rearrangements of infectees that follow automatically from the proposed $I_i'$, and $\Pr(I_j' \mid I_i', \theta) = 1$ because it follows automatically from the proposed $I_i'$. In the last step, $\Pr(I_i' \mid I, S_i, \theta) = p_{a_P, m_S}(S_i - I_j)$ is the cumulative density of the sampling interval distribution, which is the probability of taking proposal path C.
Acceptance probability
The acceptance probability (removing the dependency on $\theta$ after the first line, for readability) is
$$\begin{aligned}
&\min\left(1, \frac{\Pr(S, G, Z' \mid \theta) \cdot H(Z \mid Z', S, \theta)}{\Pr(S, G, Z \mid \theta) \cdot H(Z' \mid Z, S, \theta)}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(P' \mid S, I', M') \cdot \Pr(S \mid I') \cdot \Pr(I', M') \cdot \Pr(P_i, P_j \mid S, I, M) \cdot 0.5 \cdot p_{a_P, m_S}(S_j - I_j)}{\Pr(G \mid P) \cdot \Pr(P \mid S, I, M) \cdot \Pr(S \mid I) \cdot \Pr(I, M) \cdot \Pr(P_i', P_j' \mid S, I', M') \cdot 0.5 \cdot p_{a_P, m_S}(S_i - I_j)}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(S \mid I') \cdot \Pr(I', M') \cdot p_{a_P, m_S}(S_j - I_j)}{\Pr(G \mid P) \cdot \Pr(S \mid I) \cdot \Pr(I, M) \cdot p_{a_P, m_S}(S_i - I_j)}\right)
\end{aligned}$$
Reversal
Reversal is possible through proposal path C, with focal host $j$. In Fig 5C, reversal takes place by proposing an infection time for host IV after its second transmission event. It is important to note that if host IV did not have a second transmission event, the reversal step would be impossible; therefore, proposal path C is always rejected if the proposed new index case ends up with only one secondary case.
Proposal path D {Tree 1, Q12=NY}
Situation
Host $i$ is not the index case, and $I_i'$ is sampled before infection of the index case.
Proposed changes to transmission tree
1. Topological changes:
a. host $i$ becomes index case, and the original index case $j$ its first secondary case
2. Infection time change:
a. $I_i$ changes to $I_i'$
Proposal steps
2. host $i$ becomes the index case: $M_i' = 0$
3. host $i$ becomes the infector of the original index case: $M_j' = i$
4. bookkeeping: make corresponding changes in the nodes and move one of the coalescent nodes from host $M_i$ to host $i$
5. propose new trees $P_i'$ and $P_{M_i}'$ by simulating the within-host coalescent model
Proposal distribution
The proposal distribution for $I_i'$, $M_i'$, $M_j'$, $P_i'$, and $P_{M_i}'$ is
$$\begin{aligned}
H(I_i', M_i', M_j', P_i', P_{M_i}' \mid I, M, P, S, \theta) &= \Pr(P_i', P_{M_i}' \mid I_i', M_i', M_j', I, M, P, S, \theta) \cdot \Pr(M_i', M_j' \mid I_i', I, M, P, S, \theta) \cdot \Pr(I_i' \mid I, M, P, S, \theta) \\
&= \Pr(P_i' \mid I', M', S_i, \theta) \cdot \Pr(P_{M_i}' \mid I', M', S_{M_i}, \theta) \cdot \Pr(M_i', M_j' \mid I', \theta) \cdot \Pr(I_i' \mid S_i) \\
&= \Pr(P_i', P_{M_i}' \mid I', M', S, \theta) \cdot d_{a_P, m_S}(S_i - I_i')
\end{aligned}$$
Here, $\Pr(M_i', M_j' \mid \dots) = 1$ because it follows automatically from the proposed $I_i'$.
Acceptance probability
The acceptance probability (removing the dependency on $\theta$ after the first line, for readability) is
$$\begin{aligned}
&\min\left(1, \frac{\Pr(S, G, Z' \mid \theta) \cdot H(Z \mid Z', S, \theta)}{\Pr(S, G, Z \mid \theta) \cdot H(Z' \mid Z, S, \theta)}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(P' \mid S, I', M') \cdot \Pr(S \mid I') \cdot \Pr(I', M') \cdot \Pr(P_i, P_{M_i} \mid S, I, M) \cdot F_{M_i}(I_i) \cdot d_{a_P, m_S}(S_i - I_i)}{\Pr(G \mid P) \cdot \Pr(P \mid S, I, M) \cdot \Pr(S \mid I) \cdot \Pr(I, M) \cdot \Pr(P_i', P_{M_i}' \mid S, I', M') \cdot d_{a_P, m_S}(S_i - I_i')}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(S_i \mid I_i') \cdot \Pr(I', M') \cdot F_{M_i}(I_i) \cdot d_{a_P, m_S}(S_i - I_i)}{\Pr(G \mid P) \cdot \Pr(S_i \mid I_i) \cdot \Pr(I, M) \cdot d_{a_P, m_S}(S_i - I_i')}\right)
\end{aligned}$$
Here, $F_{M_i}(I_i)$ is the probability of proposing the original infector in the reverse proposal (path B).
Reversal
Reversal is possible through proposal path B. Fig 5B shows the reverse proposal path B from the original path D in Fig 5D.
Proposal path E {Tree 1, Q123=NNY}
Situation
Host $i$ is not the index case, and $I_i'$ is sampled after infection of the index case, but before the first transmission node of host $i$, if there is any.
Proposed changes to transmission tree
1. Topological changes:
a. host $i$ gets a (possibly) new infector
2. Infection time changes:
a. $I_i$ changes to $I_i'$
Proposal steps
2. propose a new infector $M_i'$ from the proposal distribution $F_j(I_i')$, which is the probability to select infector $j$ at time $I_i'$ in the outbreak. The proposal distribution is given by
$$F_j(t) = \frac{u(t - I_j) \cdot \left(d_{a_G, m_G}(t - I_j) + 1 / d_{ij}\right)}{\sum_k u(t - I_k) \cdot \left(d_{a_G, m_G}(t - I_k) + 1 / d_{ik}\right)},$$
so the probability to select infector $j$ is proportional to the density of the generation time plus a weight $1 / d_{ij}$, with $d_{ij}$ equal to the sequence distance between isolates from hosts $i$ and $j$ (see list of definitions). This proposal distribution gives a sampling weight to possible infectors based on their infection time and related to the genetic distance between isolates.
3. bookkeeping: make corresponding changes in the nodes and move one of the coalescent nodes from host $M_i$ to host $M_i'$
4. propose new trees $P_i'$, $P_{M_i}'$, and $P_{M_i'}'$ by simulating the within-host coalescent model
Proposal distribution
The proposal distribution for $I_i'$, $M_i'$, $P_i'$, $P_{M_i}'$, and $P_{M_i'}'$ is
$$\begin{aligned}
H(I_i', M_i', P_i', P_{M_i}', P_{M_i'}' \mid I, M, P, S, \theta) &= \Pr(P_i', P_{M_i}', P_{M_i'}' \mid I_i', M_i', I, M, P, S, \theta) \cdot \Pr(M_i' \mid I_i', I, M, P, S, \theta) \cdot \Pr(I_i' \mid I, M, P, S, \theta) \\
&= \Pr(P_i' \mid I', M', S_i, \theta) \cdot \Pr(P_{M_i}' \mid I', M', S_{M_i}, \theta) \cdot \Pr(P_{M_i'}' \mid I', M', S_{M_i'}, \theta) \cdot \Pr(M_i' \mid I', \theta) \cdot \Pr(I_i' \mid S_i, \theta) \\
&= \Pr(P_i', P_{M_i}', P_{M_i'}' \mid I', M', S, \theta) \cdot F_{M_i'}(I_i') \cdot d_{a_P, m_S}(S_i - I_i')
\end{aligned}$$
Acceptance probability
The acceptance probability (removing the dependency on $\theta$ after the first line, for readability) is
$$\begin{aligned}
&\min\left(1, \frac{\Pr(S, G, Z' \mid \theta) \cdot H(Z \mid Z', S, \theta)}{\Pr(S, G, Z \mid \theta) \cdot H(Z' \mid Z, S, \theta)}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(P' \mid S, I', M') \cdot \Pr(S \mid I') \cdot \Pr(I', M') \cdot \Pr(P_i, P_{M_i}, P_{M_i'} \mid S, I, M) \cdot F_{M_i}(I_i) \cdot d_{a_P, m_S}(S_i - I_i)}{\Pr(G \mid P) \cdot \Pr(P \mid S, I, M) \cdot \Pr(S \mid I) \cdot \Pr(I, M) \cdot \Pr(P_i', P_{M_i}', P_{M_i'}' \mid S, I', M') \cdot F_{M_i'}(I_i') \cdot d_{a_P, m_S}(S_i - I_i')}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(S_i \mid I_i') \cdot \Pr(I', M') \cdot F_{M_i}(I_i) \cdot d_{a_P, m_S}(S_i - I_i)}{\Pr(G \mid P) \cdot \Pr(S_i \mid I_i) \cdot \Pr(I, M) \cdot F_{M_i'}(I_i') \cdot d_{a_P, m_S}(S_i - I_i')}\right)
\end{aligned}$$
Reversal
Reversal is possible through proposal path E, with the original infection time $I_i$ proposed for the same focal host $i$ (going back in Fig 5E).
Proposal path F {Tree 1, Q123=NNN}
Situation
Host $i$ is not the index case and $I_i'$ is sampled after infection of its first infectee $j$.
Proposed changes to transmission tree
1. Topological changes:
a. $M_i$ becomes infector of host $j$
b. host $j$ becomes infector of host $i$
c. with 50% probability: the other secondary cases of hosts $i$ and $j$ are exchanged
2. Infection time change:
a. the sampled $I_i'$ is discarded; instead, the infection times of hosts $i$ and $j$ are switched
Proposal steps
2. discard the sampled infection time and switch infection times by proposing $I_i' = I_j$ and $I_j' = I_i$
3. switch roles by proposing $M_i' = j$ and $M_j' = M_i$
4. bookkeeping: make corresponding changes in the nodes and move one of the coalescent nodes from host $i$ to host $j$
5. with 50% probability: for all $\{k \mid M_k = i \wedge k \neq j\}$, propose $M_k' = j$; for all $\{k \mid M_k = j\}$, propose $M_k' = i$; finish by bookkeeping: swap all transmission and coalescent nodes between $P_i$ and $P_j$
6. propose new trees $P_i'$ and $P_j'$ by simulating the within-host coalescent model
Proposal distribution
The proposal distribution for $I_i'$, $I_j'$, $M'$ (including $M_i'$ and $M_j'$), $P_i'$, and $P_j'$ is
$$\begin{aligned}
H(I_i', I_j', M', P_i', P_j' \mid I, M, P, S, \theta) &= \Pr(P_i', P_j' \mid I_i', I_j', M', I, M, P, S, \theta) \cdot \Pr(M' \mid I_i', I_j', I, M, P, S, \theta) \cdot \Pr(I_i', I_j' \mid I, M, P, S, \theta) \\
&= \Pr(P_i' \mid I', M', S_i, \theta) \cdot \Pr(P_j' \mid I', M', S_j, \theta) \cdot \Pr(M' \mid I', \theta) \cdot \Pr(I_j' \mid I_i', \theta) \cdot \Pr(I_i' \mid I, S_i, \theta) \\
&= \Pr(P_i', P_j' \mid I', M', S, \theta) \cdot 0.5 \cdot p_{a_P, m_S}(S_i - I_j)
\end{aligned}$$
Here, $\Pr(M' \mid I', \theta) = 0.5$ because of the two possible rearrangements of infectees that follow automatically from the proposed $I_i'$, and $\Pr(I_j' \mid I_i', \theta) = 1$ because it follows automatically from the proposed $I_i'$. In the last step, $\Pr(I_i' \mid I, S_i, \theta) = p_{a_P, m_S}(S_i - I_j)$ is the cumulative density of the sampling interval distribution, which is the probability of taking proposal path F.
Acceptance probability
The acceptance probability (removing the dependency on $\theta$ after the first line, for readability) is
$$\begin{aligned}
&\min\left(1, \frac{\Pr(S, G, Z' \mid \theta) \cdot H(Z \mid Z', S, \theta)}{\Pr(S, G, Z \mid \theta) \cdot H(Z' \mid Z, S, \theta)}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(P' \mid S, I', M') \cdot \Pr(S \mid I') \cdot \Pr(I', M') \cdot \Pr(P_i, P_j \mid S, I, M) \cdot 0.5 \cdot p_{a_P, m_S}(S_j - I_j)}{\Pr(G \mid P) \cdot \Pr(P \mid S, I, M) \cdot \Pr(S \mid I) \cdot \Pr(I, M) \cdot \Pr(P_i', P_j' \mid S, I', M') \cdot 0.5 \cdot p_{a_P, m_S}(S_i - I_j)}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(S \mid I') \cdot \Pr(I', M') \cdot p_{a_P, m_S}(S_j - I_j)}{\Pr(G \mid P) \cdot \Pr(S \mid I) \cdot \Pr(I, M) \cdot p_{a_P, m_S}(S_i - I_j)}\right)
\end{aligned}$$
Reversal
Reversal is possible through proposal path F, with focal host $j$. In Fig 5F, reversal takes place by proposing an infection time for host III after the transmission time to host IV.
Proposal path G {Tree 2, Q12=YY}
Situation
Host $i$ is the index case, and $I_i'$ is before its first coalescent node.
Proposed changes to transmission tree
1. Topological changes: none
2. Infection time changes:
a. $I_i$ changes to $I_i'$
Proposal steps
2. bookkeeping: propose $P_i'$ by adjusting the infection time
Proposal distribution
The proposal distribution for $I_i'$ and $P_i'$ is
$$H(I_i', P_i' \mid I, M, P, S, \theta) = \Pr(P_i' \mid I_i', I, M, P, S, \theta) \cdot \Pr(I_i' \mid I, M, P, S, \theta) = \Pr(I_i' \mid S_i) = d_{a_P, m_S}(S_i - I_i')$$
Here, $\Pr(P_i' \mid \dots) = 1$, because it follows automatically from the proposed $I_i'$.
Acceptance probability
The acceptance probability (removing the dependency on $\theta$ after the first line, for readability) is
$$\begin{aligned}
&\min\left(1, \frac{\Pr(S, G, Z' \mid \theta) \cdot H(Z \mid Z', S, \theta)}{\Pr(S, G, Z \mid \theta) \cdot H(Z' \mid Z, S, \theta)}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(P' \mid S, I', M) \cdot \Pr(S \mid I') \cdot \Pr(I', M) \cdot d_{a_P, m_S}(S_i - I_i)}{\Pr(G \mid P) \cdot \Pr(P \mid S, I, M) \cdot \Pr(S \mid I) \cdot \Pr(I, M) \cdot d_{a_P, m_S}(S_i - I_i')}\right) \\
&= \min\left(1, \frac{\Pr(P_i' \mid S_i, I', M) \cdot \Pr(S_i \mid I_i') \cdot \Pr(I', M) \cdot d_{a_P, m_S}(S_i - I_i)}{\Pr(P_i \mid S_i, I, M) \cdot \Pr(S_i \mid I_i) \cdot \Pr(I, M) \cdot d_{a_P, m_S}(S_i - I_i')}\right)
\end{aligned}$$
Here, $\Pr(G \mid P') = \Pr(G \mid P)$, because it does not depend on the infection time of the index case.
Reverse proposal
This sampling step can be reversed by proposing through (the same) proposal path G, with the original infection time $I_i$ proposed for the same focal host $i$ (similar to path A, Fig 5A).
Proposal path H {Tree 2, Q12=NY}
Situation
Host $i$ is not the index case, and $I_i'$ is after the MRCA of the sampling nodes in hosts $i$ and $M_i$.
Proposed changes to transmission tree
1. Topological changes:
a. transmission nodes move between host $i$ and its infector, if $I_i$ and $I_i'$ are on different branches in the phylogenetic tree $P$
2. Infection time changes:
a. $I_i$ changes to $I_i'$
Proposal steps
2. bookkeeping: change $h_x$ for all nodes $x$ involved: if $I_i$ and $I_i'$ are on different branches in the phylogenetic tree $P$, coalescent nodes and transmission nodes move from host $i$ to host $M_i$ if $I_i' > I_i$, and vice versa if $I_i' < I_i$.
Proposal distribution
The proposal distribution for $I_i'$, $M'$, $P_i'$, and $P_{M_i}'$ is
$$\begin{aligned}
H(I_i', M', P_i', P_{M_i}' \mid I, M, P, S, \theta) &= \Pr(P_i', P_{M_i}' \mid I_i', M', I, M, P, S, \theta) \cdot \Pr(M' \mid I_i', I, M, P, S, \theta) \cdot \Pr(I_i' \mid I, M, P, S, \theta) \\
&= d_{a_P, m_S}(S_i - I_i')
\end{aligned}$$
Here, $\Pr(P_i', P_{M_i}' \mid \dots) = 1$ and $\Pr(M' \mid \dots) = 1$ because these follow automatically from the proposed $I_i'$.
Acceptance probability
The acceptance probability (removing the dependency on $\theta$ after the first line, for readability) is
$$\begin{aligned}
&\min\left(1, \frac{\Pr(S, G, Z' \mid \theta) \cdot H(Z \mid Z', S, \theta)}{\Pr(S, G, Z \mid \theta) \cdot H(Z' \mid Z, S, \theta)}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(P' \mid S, I', M') \cdot \Pr(S \mid I') \cdot \Pr(I', M') \cdot d_{a_P, m_S}(S_i - I_i)}{\Pr(G \mid P) \cdot \Pr(P \mid S, I, M) \cdot \Pr(S \mid I) \cdot \Pr(I, M) \cdot d_{a_P, m_S}(S_i - I_i')}\right) \\
&= \min\left(1, \frac{\Pr(P_i', P_{M_i}' \mid S_i, I', M') \cdot \Pr(S_i \mid I_i') \cdot \Pr(I', M') \cdot d_{a_P, m_S}(S_i - I_i)}{\Pr(P_i, P_{M_i} \mid S_i, I, M) \cdot \Pr(S_i \mid I_i) \cdot \Pr(I, M) \cdot d_{a_P, m_S}(S_i - I_i')}\right)
\end{aligned}$$
Here, $\Pr(G \mid P') = \Pr(G \mid P)$, because the phylogenetic tree as a whole does not change.
Reverse proposal
This sampling step can be reversed by proposing through (the same) proposal path H, with the original infection time $I_i$ proposed for the same focal host $i$ (going back in Fig 6A).
Proposal path I {Tree 2, Q1234=NNYY}
Situation
Host $M_i$ is the index case, $I_i'$ is before the MRCA of the sampling nodes in hosts i and $M_i$, and $I_{M_i}'$ is
after the MRCA of the sampling nodes in hosts i and $M_i$.
Proposed changes to transmission tree
1. Topological changes:
a. host i becomes the index case, and host $M_i$ its secondary case
b. transmission nodes move from host $M_i$ to host i, consistent with the branch on
which $I_{M_i}'$ is placed
2. Infection time changes:
a. $I_i$ changes to $I_{M_i}$
b. $I_{M_i}$ changes to $I_{M_i}'$
Proposal steps (after steps 1 and 2 to propose $I_i'$ and $I_{M_i}'$)
3. discard the proposed $I_i'$; instead, propose $I_i' = I_{M_i}$
4. host i becomes the index case: $M_i' = 0$
5. host $M_i$ gets host i as infector: $M_{M_i}' = i$
6. bookkeeping: propose other new infectors M' by changing $h_x$ for all nodes x involved: some
coalescent nodes and transmission nodes move from host $M_i$ to host i.
Proposal distribution
The proposal distribution for $I_i'$, $I_{M_i}'$, M', $P_i'$, and $P_{M_i}'$ is

$$
\begin{aligned}
H(I_i', I_{M_i}', M', P_i', P_{M_i}' \mid I, M, P, S, \theta)
&= \Pr(P_i', P_{M_i}' \mid I_i', I_{M_i}', M', I, M, P, S, \theta) \cdot \Pr(M' \mid I_i', I_{M_i}', I, M, P, S, \theta) \cdot \Pr(I_i', I_{M_i}' \mid I, M, P, S, \theta) \\
&= \left(1 - p_{a_P, m_S}(S_i - t_{\mathrm{mrca}(i, M_i)})\right) \cdot d_{a_P, m_S}(S_{M_i} - I_{M_i}')
\end{aligned}
$$

Here, $\Pr(P_i', P_{M_i}' \mid \ldots) = 1$ and $\Pr(M' \mid \ldots) = 1$ because these follow automatically from the proposed $I_i'$
and $I_{M_i}'$. In the last step, $1 - p_{a_P, m_S}(S_i - t_{\mathrm{mrca}(i, M_i)})$ is 1 minus the cumulative distribution function of the proposal
distribution, which is the probability of taking proposal path I (conditional on $I_{M_i}' > t_{\mathrm{mrca}(i, M_i)}$).
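The two factors of this proposal distribution can be evaluated numerically as in the following sketch. It assumes that the proposal distribution is a Gamma distribution with shape $a_P$ and mean $m_S$, so that $d$ is its density and $p$ its cumulative distribution function. The series-based CDF is a simple stand-in for a library routine, and all function names are hypothetical.

```python
import math


def gamma_logpdf(x, shape, mean):
    """Log density of a Gamma distribution with the given shape and mean."""
    if x <= 0:
        return -math.inf
    rate = shape / mean
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def gamma_cdf(x, shape, mean, tol=1e-12, max_terms=10_000):
    """Gamma CDF via the power series of the regularised lower incomplete gamma.

    Adequate for moderate arguments; not tuned for extreme values.
    """
    if x <= 0:
        return 0.0
    z = x * shape / mean              # rescale to a unit-rate Gamma variable
    term = 1.0 / shape
    total = term
    for k in range(1, max_terms):
        term *= z / (shape + k)
        total += term
        if term < tol * total:
            break
    prefactor = math.exp(shape * math.log(z) - z - math.lgamma(shape))
    return min(1.0, total * prefactor)


def path_i_proposal_density(S_i, S_Mi, t_mrca, I_Mi_new, a_P, m_S):
    """(1 - p(S_i - t_mrca)) * d(S_Mi - I_Mi'), following the last line above."""
    prob_before_mrca = 1.0 - gamma_cdf(S_i - t_mrca, a_P, m_S)
    density_new_time = math.exp(gamma_logpdf(S_Mi - I_Mi_new, a_P, m_S))
    return prob_before_mrca * density_new_time


# Made-up example values.
print(path_i_proposal_density(S_i=5.0, S_Mi=4.0, t_mrca=3.0,
                              I_Mi_new=3.5, a_P=1.0, m_S=2.0))
```

The first factor is a probability rather than a density because the proposed $I_i'$ is discarded in step 3, so only the event that it falls before the MRCA matters.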
Acceptance probability
The acceptance probability (removing the dependency on $\theta$ after the first line, for readability) is

$$
\begin{aligned}
&\min\left(1, \frac{\Pr(S, G, Z' \mid \theta) \cdot H(Z \mid Z', S, \theta)}{\Pr(S, G, Z \mid \theta) \cdot H(Z' \mid Z, S, \theta)}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(P' \mid S, I', M') \cdot \Pr(S \mid I') \cdot \Pr(I', M') \cdot \left(1 - p_{a_P, m_S}(S_{M_i} - t_{\mathrm{mrca}(i, M_i)})\right) \cdot d_{a_P, m_S}(S_i - I_i)}{\Pr(G \mid P) \cdot \Pr(P \mid S, I, M) \cdot \Pr(S \mid I) \cdot \Pr(I, M) \cdot \left(1 - p_{a_P, m_S}(S_i - t_{\mathrm{mrca}(i, M_i)})\right) \cdot d_{a_P, m_S}(S_{M_i} - I_{M_i}')}\right) \\
&= \min\left(1, \frac{\Pr(P_i', P_{M_i}' \mid S_i, S_{M_i}, I', M') \cdot \Pr(S_i, S_{M_i} \mid I_i', I_{M_i}') \cdot \Pr(I', M') \cdot \left(1 - p_{a_P, m_S}(S_{M_i} - t_{\mathrm{mrca}(i, M_i)})\right) \cdot d_{a_P, m_S}(S_i - I_i)}{\Pr(P_i, P_{M_i} \mid S_i, S_{M_i}, I, M) \cdot \Pr(S_i, S_{M_i} \mid I_i, I_{M_i}) \cdot \Pr(I, M) \cdot \left(1 - p_{a_P, m_S}(S_i - t_{\mathrm{mrca}(i, M_i)})\right) \cdot d_{a_P, m_S}(S_{M_i} - I_{M_i}')}\right)
\end{aligned}
$$
Reverse proposal
This sampling step can be reversed by proposing through (the same) proposal path I, with the original
index case (now secondary case) as focal host. In Fig 6B, reversal occurs by first proposing any
infection time before the MRCA of hosts I and II for host I, and then proposing for host II its original infection time.
Proposal path J {Tree 2, Q12345=NNYNY}
Situation
Hosts i and $M_i$ are not the index case, $I_i'$ is before the MRCA of the sampling nodes in hosts i and $M_i$
but after the MRCA of the sampling nodes in hosts i and $M_{M_i}$, and $I_{M_i}'$ is after the MRCA of the
sampling nodes in hosts i and $M_i$.
Proposed changes to transmission tree
1. Topological changes:
a. host $M_{M_i}$ becomes the infector of host i
b. host i becomes the infector of host $M_i$
c. transmission nodes move from hosts $M_i$ and $M_{M_i}$ to host i, consistent with the
branches on which $I_i'$ and $I_{M_i}'$ are placed
2. Infection time changes:
a. $I_i$ changes to $I_i'$
b. $I_{M_i}$ changes to $I_{M_i}'$
Proposal steps (after steps 1 and 2 to propose $I_i'$ and $I_{M_i}'$)
3. switch role in transmission tree by proposing $M_i' = M_{M_i}$ and $M_{M_i}' = i$
4. bookkeeping: propose other new infectors M' by changing $h_x$ for all nodes x involved: some
coalescent nodes and transmission nodes move from hosts $M_i$ and $M_{M_i}$ to host i.
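The role switch in step 3 can be sketched as an update of an infector map. The representation is an assumption made for illustration: a dictionary from host to infector, with 0 denoting the index case (as in $M_i' = 0$ in proposal path I), and a hypothetical function name. The bookkeeping of step 4, which may reassign further infectors, is not covered.

```python
def swap_infector_roles(infectors, i):
    """Return a new infector map with M_i' = M_{M_i} and M_{M_i}' = i.

    `infectors` maps each host to its infector; 0 marks the index case.
    The input map is left unchanged.
    """
    m_i = infectors[i]
    if m_i == 0:
        raise ValueError("host i is the index case and has no infector to swap with")
    updated = dict(infectors)
    updated[i] = infectors[m_i]   # host i takes over the infector of its old infector
    updated[m_i] = i              # the old infector is now infected by host i
    return updated


# Chain 1 -> 2 -> 3 with host 3 as focal host becomes 1 -> 3 -> 2.
print(swap_infector_roles({1: 0, 2: 1, 3: 2}, 3))
```

When the infector of host i is the index case, the same update yields $M_i' = 0$ and $M_{M_i}' = i$, which matches steps 4 and 5 of proposal path I.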
Proposal distribution
The proposal distribution for $I_i'$, $I_{M_i}'$, $P_i'$, and $P_{M_i}'$ is

$$
\begin{aligned}
H(I_i', I_{M_i}', M', P_i', P_{M_i}' \mid I, M, P, S, \theta)
&= \Pr(P_i', P_{M_i}' \mid I_i', I_{M_i}', M', I, M, P, S, \theta) \cdot \Pr(M' \mid I_i', I_{M_i}', I, M, P, S, \theta) \cdot \Pr(I_i', I_{M_i}' \mid I, M, P, S, \theta) \\
&= d_{a_P, m_S}(S_i - I_i') \cdot d_{a_P, m_S}(S_{M_i} - I_{M_i}')
\end{aligned}
$$

Here, $\Pr(P_i', P_{M_i}' \mid \ldots) = 1$ and $\Pr(M' \mid \ldots) = 1$ because these follow automatically from the proposed $I_i'$
and $I_{M_i}'$.
Acceptance probability
The acceptance probability (removing the dependency on $\theta$ after the first line, for readability) is

$$
\begin{aligned}
&\min\left(1, \frac{\Pr(S, G, Z' \mid \theta) \cdot H(Z \mid Z', S, \theta)}{\Pr(S, G, Z \mid \theta) \cdot H(Z' \mid Z, S, \theta)}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(P' \mid S, I', M') \cdot \Pr(S \mid I') \cdot \Pr(I', M') \cdot d_{a_P, m_S}(S_i - I_i) \cdot d_{a_P, m_S}(S_{M_i} - I_{M_i})}{\Pr(G \mid P) \cdot \Pr(P \mid S, I, M) \cdot \Pr(S \mid I) \cdot \Pr(I, M) \cdot d_{a_P, m_S}(S_i - I_i') \cdot d_{a_P, m_S}(S_{M_i} - I_{M_i}')}\right) \\
&= \min\left(1, \frac{\Pr(P_i', P_{M_i}' \mid S_i, S_{M_i}, I', M') \cdot \Pr(S_i, S_{M_i} \mid I_i', I_{M_i}') \cdot \Pr(I', M') \cdot d_{a_P, m_S}(S_i - I_i) \cdot d_{a_P, m_S}(S_{M_i} - I_{M_i})}{\Pr(P_i, P_{M_i} \mid S_i, S_{M_i}, I, M) \cdot \Pr(S_i, S_{M_i} \mid I_i, I_{M_i}) \cdot \Pr(I, M) \cdot d_{a_P, m_S}(S_i - I_i') \cdot d_{a_P, m_S}(S_{M_i} - I_{M_i}')}\right)
\end{aligned}
$$
Reverse proposal
This sampling step can be reversed by proposing through (the same) proposal path J, with the
original infector (now secondary case) as focal host. In Fig 6C, reversal occurs by first proposing the
original infection time for host II (which is between the two MRCAs), and then proposing the original
infection time for host III.
Proposal path K {Tree 3}
Situation
Any.
Proposed changes to transmission tree
None.
Proposal steps
2. discard the sampled infection time $I_i'$
3. propose new tree $P_i'$ by simulating only new $v_x$ for $x \in P_i$ (a new topology, not coalescent times)
Proposal distribution
The proposal distribution for $P_i'$ is

$$
H(P_i' \mid I, M, P, S, \theta) = \Pr(P_i' \mid I, M, S_i, \theta)
$$
Acceptance probability
The acceptance probability (removing the dependency on $\theta$ after the first line, for readability) is

$$
\begin{aligned}
&\min\left(1, \frac{\Pr(S, G, Z' \mid \theta) \cdot H(Z \mid Z', S, \theta)}{\Pr(S, G, Z \mid \theta) \cdot H(Z' \mid Z, S, \theta)}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P') \cdot \Pr(P' \mid S, I, M) \cdot \Pr(S \mid I) \cdot \Pr(I, M) \cdot \Pr(P_i \mid S, I, M)}{\Pr(G \mid P) \cdot \Pr(P \mid S, I, M) \cdot \Pr(S \mid I) \cdot \Pr(I, M) \cdot \Pr(P_i' \mid S, I, M)}\right) \\
&= \min\left(1, \frac{\Pr(G \mid P')}{\Pr(G \mid P)}\right)
\end{aligned}
$$
Reversal
Reversal is possible through the same proposal path K, by resampling the original Pi .
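Because the acceptance probability for proposal path K reduces to a ratio of sequence likelihoods, the accept or reject decision can be made on the log scale. The sketch below assumes that the log-likelihoods $\log \Pr(G \mid P')$ and $\log \Pr(G \mid P)$ have been computed elsewhere; the function names are hypothetical.

```python
import math
import random


def acceptance_probability(loglik_new, loglik_old):
    """min(1, Pr(G | P') / Pr(G | P)) computed from log-likelihoods."""
    return math.exp(min(0.0, loglik_new - loglik_old))


def accept_path_k(loglik_new, loglik_old, rng=random):
    """Draw a Metropolis-Hastings accept/reject decision for proposal path K."""
    log_alpha = min(0.0, loglik_new - loglik_old)
    u = 1.0 - rng.random()            # uniform on (0, 1], so log(u) is finite
    return math.log(u) <= log_alpha


# A proposal that lowers the log-likelihood by 2 is accepted with
# probability exp(-2), roughly 0.135.
print(acceptance_probability(-12.0, -10.0))
print(accept_path_k(-12.0, -10.0, rng=random.Random(1)))
```

Comparing $\log u$ with the log ratio avoids exponentiating large log-likelihood differences.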
Irreducibility of the MCMC chain
Here we argue heuristically that the MCMC chain is irreducible, i.e. any configuration of the
transmission tree and phylogenetic tree consistent with the sampling times can be reached from any
(current) configuration:
- for every host i, it is possible to reach any infection time Ii (prior to its sampling time Si),
without changing the other hosts’ infection times:
  o if host i currently has no secondary cases:
    ▪ sample any infection time with host i as focal host, followed by proposal path D or E
  o if host i currently has secondary cases, but is not the index case:
    ▪ first, lose all secondary cases of host i by taking these secondary cases as
      focal hosts, sampling the infection times they already have, thus following
      proposal path E, and proposing alternative infectors
    ▪ then, sample any infection time with host i as focal host, followed by
      proposal path D or E
  o if host i is currently the index case:
    ▪ first, lose the index case status by a single proposal path B with host i as focal host
    ▪ then, lose all secondary cases of host i by taking these secondary cases as
      focal hosts, sampling the infection times they already have, thus following
      proposal path E, and proposing alternative infectors
    ▪ then, sample any infection time with host i as focal host, followed by
      proposal path D or E
- for every set of infection times I, all transmission trees consistent with those times can be
reached:
  o with host i as focal host, sample the infection time it already has, and follow proposal
    path A (if host i is the index case) or proposal path E (otherwise), and propose any host
    infected before Ii as infector.
- for every transmission tree, all phylogenetic trees consistent with that tree can be reached:
  o with host i as focal host, sample the infection time it already has, follow proposal
    path A (if host i is the index case) or proposal path E (otherwise), sample the infector it
    already has, and simulate the phylogenetic minitree in host i (and its infector).
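The second point of the argument relies on every host infected before Ii being a possible infector of host i. The sketch below lists these candidate infectors for a given set of infection times and counts the resulting infector assignments. It is an illustration only, with hypothetical function names. It assumes distinct infection times and uses only the time ordering, ignoring any other model constraint.

```python
def candidate_infectors(infection_times):
    """Map each host to the sorted list of hosts infected strictly before it."""
    return {
        i: sorted(j for j, t_j in infection_times.items() if t_j < t_i)
        for i, t_i in infection_times.items()
    }


def count_transmission_trees(infection_times):
    """Number of infector assignments consistent with the time ordering.

    The earliest host has no candidates and is taken to be the index case.
    """
    count = 1
    for candidates in candidate_infectors(infection_times).values():
        if candidates:
            count *= len(candidates)
    return count


# Made-up infection times for four hosts.
times = {1: 0.0, 2: 1.5, 3: 2.0, 4: 3.1}
print(candidate_infectors(times))
print(count_transmission_trees(times))
```

With n hosts and distinct infection times this count equals (n - 1)!, so each of these assignments must be reachable by repeatedly re-proposing single infectors as described above.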
References
1. Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol. 1981;17(6):368-76. PubMed PMID: 7288891.
2. Roberts GO, Gelman A, Gilks WR. Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann Appl Prob. 1997;7(1):110-20.