Algorithms Linking Phylogenetic and Transmission Trees
for Molecular Infectious Disease Epidemiology
Eben Kenah1, Tom Britton2, M. Elizabeth Halloran3,4, and Ira M. Longini, Jr.1
1 Biostatistics Department and Emerging Pathogens Institute, University of Florida
2 Mathematics Department, Stockholm University
3 Biostatistics Department, University of Washington
4 Vaccine and Infectious Diseases Division, Fred Hutchinson Cancer Research Center
September 3, 2015
E. Kenah, T. Britton, M. E. Halloran, and I. M. Longini, Jr.
Phylogenetic algorithms for infectious disease epidemiology
Phylodynamics of infectious disease
Sparsely-sampled genetic sequence data are an important source of
information about large-scale spread of infectious disease. Phylodynamics
combines population genetics and infectious disease dynamics to:
Reconstruct geographic spread using diffusion processes.1
Reconstruct the effective number of infections over time using
coalescent models.2
Reconstruct rates of transmission and recovery using birth-death
processes.3
These and related methods have been used to understand the origins and
spread of HIV-1, the global circulation of influenza, and the invasion of the
eastern US by raccoon-specific rabies virus.4
1 Lemey et al. (2009), PLoS Computational Biology 5: e1000520.
2 Volz et al. (2009), Genetics 183: 1421–1430.
3 Stadler et al. (2013), Proceedings of the National Academy of Sciences 110: 228–233.
4 Biek et al. (2007), Proceedings of the National Academy of Sciences 104: 7993–7998.
Densely-sampled genetic sequence data
One of the earliest applications of genetics in infectious disease
epidemiology was to confirm or rule out a specific source of infection:
Confirming transmission of HIV in a Florida dental practice.5
Exonerating an HIV-positive surgeon in Baltimore.6
A more ambitious task was to reconstruct transmission trees:
An early analysis of a small HIV-1 cluster in Sweden with a known
transmission tree showed that it was accurately reflected in
phylogenies reconstructed from HIV genetic sequences.7
The increasing availability of genetic sequence data has renewed interest in
combining pathogen genetic sequence and epidemiologic data to
reconstruct transmission trees.
5 Ou et al. (1992), Science 256: 1165–1171.
6 Holmes et al. (1993), Journal of Infectious Diseases 167: 1411–1414.
7 Leitner et al. (1996), Proceedings of the National Academy of Sciences 93: 10864–10869.
Complexity at small scales
Coalescent times are not transmission times
On a large time scale, coalescent times approximate transmission times.
On a small time scale, the difference can become important.8
Figure 1. Schematic for viral dynamics. In all panels, time progresses from left to right. Hosts
are depicted as gray pods, virus particles as blue dots and sampled virus particles as red
dots. (A) The timing of coalescence of viral lineages depends on within-host viral dynamics.
8 Ypma et al. (2013), Genetics 195: 1055–1062.
Complexity at small scales
Phylogenetic trees are not transmission trees
Because of pathogen evolution within hosts, the phylogenetic and
transmission trees can have different topologies.a
Selecting an optimal transmission tree based on a phylogeny can
overestimate the information about transmission contained in the
genetic sequences.
Figure 3 | Reconstruction of a known HIV-1 transmission chain. A phylogeny of
13 HIV-1 viral particles (blue circles) sampled at different times (horizontal axis) from
9 different patients for whom the times and direction of viral transmission are known.
The virus phylogeny (blue lines) can be mapped within the transmission tree (yellow boxes …).
Panels: transmission tree and virus tree for Patients 1, 2, 3, 5, 6, 7, 8, 9, and 10.
a Pybus and Rambaut (2009), Nature Reviews Genetics 10: 540–550.
Transmission trees
What’s the purpose of reconstructing them?
The transmission tree from one epidemic does not generalize to future
epidemics of the same disease. When infectious disease transmission is
analyzed using a survival analysis framework:
Parametric likelihoods are sums over possible transmission trees.9
Nonparametric estimates10 and semiparametric regression models11
depend on averages over possible transmission trees.
By restricting the set of possible transmission trees, a pathogen phylogeny
can help us get more efficient estimates of parameters governing the
transmission of an emerging infection.
9 Kenah (2011), Biostatistics 12: 548–566.
10 Kenah (2013), Journal of the Royal Statistical Society, Series B 75: 277–303.
11 Kenah (2015), Journal of the American Statistical Association 110: 313–325.
Generation and serial intervals
Incubation period: the time between infection and onset of symptoms in
an infected individual.
Generation interval: the time between the infection of a secondary case
and the infection of his or her infector.
Serial interval: the time between symptom onset in a secondary case and
symptom onset in his or her infector.
If i infects j, let $t_i$ and $t_j$ denote their infection times and
$t_i^{sym}$ and $t_j^{sym}$ their symptom onset times. Then:
incubation period of i: $t_i^{sym} - t_i$
incubation period of j: $t_j^{sym} - t_j$
generation interval: $t_j - t_i$
serial interval: $t_j^{sym} - t_i^{sym}$
Generation intervals, branching processes, and epidemics
Statistical methods based on generation or serial intervals treat the spread of infection
as a branching process, where the generation interval is the time
between infections of infectors and infectees.
Figure: the same outbreak drawn as a branching process and as an epidemic.
Legend: X = infected person; O = susceptible person; edges mark disease
transmission or failure to transmit.
Branching processes:
People are created when they are infected;
susceptibles do not exist, and there is no
uninfected person-time.
Epidemics:
Infection spreads from infected to susceptible
in a preexisting population. Infection is
transmitted to a susceptible by the first person
to make infectious contact with him or her.
Uninfected person-time contains information
about disease transmission.
Generation interval contraction
It is often assumed that the generation interval is a stable characteristic of
an infectious disease, but the mean generation interval actually contracts
during an epidemic.12
When there is more than one infectious person, they compete to
infect the available susceptibles. The infector is the first one to make
infectious contact.
As the prevalence of infection increases, the mean generation and
serial intervals contract.
When transmission occurs in groups of close contacts such as
households, this contraction can occur even when the global
prevalence of infection is low.
12 Kenah, Lipsitch, and Robins (2008), Mathematical Biosciences 213: 71–79.
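The competition effect can be illustrated with a toy Monte Carlo experiment. This is only a sketch under simplifying assumptions that are not in the slides: n infectious people all become infectious at time 0, each has an independent exponential(1) contact interval to a single susceptible, and the generation interval is the time of the first infectious contact. The function name is hypothetical.

```python
import random

def mean_generation_interval(n_infectors, n_reps=20000, seed=1):
    """Monte Carlo mean time to the first infectious contact among competing infectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        # Each infector draws an exponential(1) contact interval; the earliest one infects.
        total += min(rng.expovariate(1.0) for _ in range(n_infectors))
    return total / n_reps

for n in (1, 2, 4):
    print(n, round(mean_generation_interval(n), 3))
```

Under these assumptions the expected value is 1/n, so the simulated mean shrinks as more infectors compete for the same susceptible.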
Simulated generation intervals
SIR model with random mixing
Figure: Smoothed mean serial interval. Vertical axis: serial interval;
horizontal axis: source infection time; curves for R0 = 2, 3, 4, and 5.
Contact intervals
Definition
A more flexible approach to the analysis of infectious disease transmission
data can be based on contact intervals.
An infectious contact from i to j is a contact sufficient to infect j if i
is infectious and j is susceptible.
The contact interval from i to j is the time between the onset of
infectiousness in i and the first infectious contact from i to j, whether
or not this causes infection in j.
We drop the requirement that i infected j—the source of the problems
with generation and serial intervals outlined above.
Contact intervals
Observation and censoring
The price of this flexibility is that contact intervals can be right censored:
Contact interval uncensored if i infects j, censored otherwise.
Figure: three timelines, each marking when i becomes infectious, when i
recovers, when j is infected, and when observation ends. One shows an
uncensored contact interval and two show censored contact intervals.
If infectious periods and the end of observation independently censor
contact intervals, transmission data can be treated like standard survival
data when we observe who-infected-whom.
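The censoring rules can be turned into a small helper that builds the survival-style record for an ordered pair. This is a minimal sketch, assuming infection times, onsets of infectiousness, and recovery times are all observed; the `Person` class and the function name are hypothetical, not part of the talk.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    infected: Optional[float]       # infection time, None if never infected
    infectious: Optional[float]     # onset of infectiousness
    recovered: Optional[float]      # end of the infectious period
    infector: Optional[str] = None  # known only when who-infected-whom is observed

def contact_interval(i, j, people, end_of_observation):
    """Return (tau_ij, delta_ij), or None if i was never infectious while j was susceptible."""
    pi, pj = people[i], people[j]
    if pi.infectious is None or pi.infectious >= end_of_observation:
        return None
    if pj.infected is not None and pj.infected <= pi.infectious:
        return None  # j was no longer susceptible when i became infectious
    # Follow-up stops at the first of: end of observation, recovery of i, infection of j.
    stops = [end_of_observation]
    if pi.recovered is not None:
        stops.append(pi.recovered)
    if pj.infected is not None:
        stops.append(pj.infected)
    tau = min(stops) - pi.infectious
    delta = 1 if pj.infector == i else 0  # uncensored only if i infected j
    return tau, delta

people = {
    "A": Person(0.0, 1.0, 5.0),
    "B": Person(2.0, 3.0, 6.0, infector="A"),
    "C": Person(None, None, None),
}
print(contact_interval("A", "B", people, 10.0))  # uncensored: A infected B
print(contact_interval("A", "C", people, 10.0))  # censored by A's recovery
```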
Maximum likelihood estimation
Who-infects-whom is observed
Let the contact interval distribution have hazard function h(τ, θ0) and
survival function S(τ, θ0), where θ0 is an unknown parameter.
Let τij denote the follow-up time for the ordered pair ij, δij = 1 if i
infected j and δij = 0 otherwise, and P be the set of pairs ij in which
i was infectious while j was susceptible.
Then the likelihood
\[
L(\theta) = \prod_{ij \in P} h(\tau_{ij}, \theta)^{\delta_{ij}} S(\tau_{ij}, \theta) \tag{1}
\]
will produce a consistent and asymptotically normal maximum
likelihood estimate of θ0.13
13 Kenah (2011), Biostatistics 12: 548–566.
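As a worked instance of likelihood (1), take the simplest parametric choice, an exponential contact interval with h(τ, θ) = θ and S(τ, θ) = exp(−θτ). This choice is an assumption made here for illustration; the slides leave the parametric family open. The log-likelihood is then Σ δij log θ − θ Σ τij, which is maximized at the number of observed transmissions divided by the total follow-up time. The data below are hypothetical.

```python
import math

def exponential_log_likelihood(theta, pairs):
    """Log of likelihood (1) when h(tau, theta) = theta and S(tau, theta) = exp(-theta * tau)."""
    return sum(delta * math.log(theta) - theta * tau for tau, delta in pairs)

def exponential_mle(pairs):
    """Closed-form maximizer: number of transmissions divided by total follow-up time."""
    return sum(delta for _, delta in pairs) / sum(tau for tau, _ in pairs)

pairs = [(1.0, 1), (4.0, 0), (3.0, 0), (0.5, 1)]  # hypothetical (tau_ij, delta_ij) records
theta_hat = exponential_mle(pairs)
print(theta_hat, exponential_log_likelihood(theta_hat, pairs))
```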
Maximum likelihood estimation
Who-infects-whom is not observed
When who-infects-whom is not observed, the likelihood is the sum of the
likelihoods for all possible transmission trees.14
For each non-imported infection j, we replace h(τij, θ)^δij with the sum
of the hazards of transmission from all possible infectors of j.
This gives us
\[
L(\theta) = \prod_{j \in I} \sum_{i \in V_j} h(\tau_{ij}, \theta) \prod_{ij \in P} S(\tau_{ij}, \theta),
\]
where I is the set of non-imported infections and Vj is the set of
possible infectors of j.
14 Kenah (2011), Biostatistics 12: 548–566.
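The likelihood summed over transmission trees can be evaluated directly from the sets Vj and the follow-up times. The sketch below is illustrative only: the function name, the data layout, and the exponential hazard used in the example are assumptions, and the three-person data set is hypothetical.

```python
import math

def log_likelihood_unobserved(theta, hazard, survival, infectors, follow_up):
    """Log-likelihood when who-infects-whom is not observed.

    infectors: dict j -> list of possible infectors i (the set V_j), non-imported j only
    follow_up: dict (i, j) -> tau_ij for every pair in P
    """
    ll = 0.0
    for j, v_j in infectors.items():
        # Sum of the hazards of transmission from all possible infectors of j.
        ll += math.log(sum(hazard(follow_up[(i, j)], theta) for i in v_j))
    for tau in follow_up.values():
        ll += math.log(survival(tau, theta))
    return ll

hazard = lambda tau, theta: theta                      # exponential contact interval (assumed)
survival = lambda tau, theta: math.exp(-theta * tau)
infectors = {"B": ["A"], "C": ["A", "B"]}
follow_up = {("A", "B"): 1.0, ("A", "C"): 2.5, ("B", "C"): 0.5}
print(log_likelihood_unobserved(0.5, hazard, survival, infectors, follow_up))
```

With a constant hazard every possible infector of j contributes the same term, so the maximizer is again the number of non-imported infections divided by the total follow-up time (here 2 / 4 = 0.5). Uncertainty about who infected whom only affects the estimate when the hazard varies with τ.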
Nonparametric inference
Who-infects-whom is observed
A more flexible assumption is that the contact interval distribution is
continuous with an unknown cumulative hazard function
\[
H_0(\tau) = \int_0^\tau h_0(u) \, du.
\]
When who-infects-whom is observed, the Nelson-Aalen estimate from
standard survival analysis produces a consistent and asymptotically
normal estimate Ĥ(τ) of H0(τ).15
Smoothing Ĥ(τ) and taking the first derivative yields a nonparametric
estimate of h0(τ).
15 Kenah (2013), Journal of the Royal Statistical Society, Series B 75: 277–303.
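For readers who have not used it, the Nelson-Aalen estimator adds, at each failure time, the number of failures divided by the number still at risk. A minimal sketch on hypothetical (τij, δij) records follows; the function name is an assumption.

```python
def nelson_aalen(pairs):
    """Nelson-Aalen estimate of the cumulative hazard.

    pairs: list of (tau_ij, delta_ij). Returns [(time, H_hat)] at each distinct failure time.
    """
    failure_times = sorted({tau for tau, delta in pairs if delta == 1})
    estimate, h_hat = [], 0.0
    for t in failure_times:
        at_risk = sum(1 for tau, _ in pairs if tau >= t)  # pairs still under follow-up at t
        events = sum(1 for tau, delta in pairs if tau == t and delta == 1)
        h_hat += events / at_risk
        estimate.append((t, h_hat))
    return estimate

pairs = [(0.5, 1), (1.0, 1), (3.0, 0), (4.0, 0)]  # hypothetical contact interval data
print(nelson_aalen(pairs))
```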
Nonparametric inference
Who-infects-whom not observed
When who-infects-whom is not observed, we don’t know which contact
intervals are censored and which are not.
Given the observed data, the probability that i infected j is
\[
p_{ij} = \frac{h(\tau_{ij})}{\sum_{i' \in V_j} h(\tau_{i'j})},
\]
and the infector of each j can be chosen independently.
With the correct weights, the average of the Nelson-Aalen estimators
for all possible transmission trees is a consistent and asymptotically
normal estimate of H(τ).
But if we knew h(τ), we wouldn’t need to estimate it!
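The weights pij are just the hazards normalized over the possible infectors of each j. The sketch below assumes a known hazard, here h(τ) = 2τ chosen purely for illustration, and a hypothetical data layout.

```python
def infector_probabilities(hazard, infectors, follow_up):
    """p_ij = h(tau_ij) / sum over i' in V_j of h(tau_i'j), for each infected j."""
    probs = {}
    for j, v_j in infectors.items():
        weights = {i: hazard(follow_up[(i, j)]) for i in v_j}
        total = sum(weights.values())
        probs[j] = {i: w / total for i, w in weights.items()}
    return probs

hazard = lambda tau: 2.0 * tau                     # assumed increasing hazard
infectors = {"B": ["A"], "C": ["A", "B"]}          # V_B and V_C
follow_up = {("A", "B"): 1.0, ("A", "C"): 2.5, ("B", "C"): 0.5}
print(infector_probabilities(hazard, infectors, follow_up))
```

In this example A has been infectious for longer than B when C is infected, so the increasing hazard makes A the more likely infector of C.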
Nonparametric inference
Expectation-Maximization (EM) algorithm
Pretend the hazard function is h(τ) = h^(0)(τ), where h^(0)(τ) is a guess.
Set k = 0.
1. Calculate the weights pij using h^(k)(τ).
2. Average the Nelson-Aalen estimates for all possible transmission trees
   using these weights to get a cumulative hazard H^(k)(τ).
3. Smooth H^(k)(τ) and take a derivative to get h^(k+1)(τ).
4. Set k = k + 1. Go back to Step (1).
This loop is repeated until H^(k)(τ) stabilizes. The stable cumulative
hazard estimate H̃(τ) is a consistent and asymptotically normal estimate
of the true cumulative hazard H(τ).16
16 Kenah (2013), Journal of the Royal Statistical Society, Series B 75: 277–303.
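The loop can be sketched in a few lines. This is an illustration under stated assumptions rather than the published implementation: the weighted average over transmission trees is written as jumps of size pij / Y(τij), where Y(τ) counts pairs still under follow-up (this uses linearity and assumes the risk set is the same for every tree); smoothing uses a Gaussian kernel with an arbitrary bandwidth; the initial guess is a constant hazard; and the loop runs a fixed number of iterations instead of checking for stabilization. All names are hypothetical.

```python
import math

def em_cumulative_hazard(follow_up, infectors, bandwidth=0.5, n_iter=50):
    """EM-style iteration for the jumps of the cumulative hazard.

    follow_up: dict (i, j) -> tau_ij for every pair in P
    infectors: dict j -> list of possible infectors (V_j)
    Returns (increments, probs), both keyed by candidate pair (i, j).
    """
    taus = list(follow_up.values())
    at_risk = lambda t: sum(1 for tau in taus if tau >= t)
    candidates = [(i, j) for j, v_j in infectors.items() for i in v_j]
    # Initial guess: constant hazard, so every possible infector of j is equally likely.
    probs = {(i, j): 1.0 / len(infectors[j]) for i, j in candidates}
    increments = {}
    for _ in range(n_iter):
        # Step 2: weighted Nelson-Aalen estimate, stored as jumps p_ij / Y(tau_ij).
        increments = {ij: probs[ij] / at_risk(follow_up[ij]) for ij in candidates}

        # Step 3: kernel-smooth the jumps to obtain a hazard estimate.
        def hazard(t, inc=increments):
            norm = bandwidth * math.sqrt(2.0 * math.pi)
            return sum(v * math.exp(-0.5 * ((t - follow_up[ij]) / bandwidth) ** 2)
                       for ij, v in inc.items()) / norm

        # Step 1 of the next pass: recompute the weights p_ij from the new hazard.
        for j, v_j in infectors.items():
            h = {i: hazard(follow_up[(i, j)]) for i in v_j}
            total = sum(h.values())
            for i in v_j:
                probs[(i, j)] = h[i] / total
    return increments, probs

follow_up = {("A", "B"): 1.0, ("A", "C"): 2.5, ("B", "C"): 0.5,
             ("A", "D"): 3.0, ("B", "D"): 2.0, ("C", "D"): 1.0}  # D is never infected
infectors = {"B": ["A"], "C": ["A", "B"]}
increments, probs = em_cumulative_hazard(follow_up, infectors)
print(probs)
```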
Semiparametric regression
Effects of covariates on infectiousness and susceptibility
The EM algorithm can be adapted to fit the semiparametric regression
model
\[
h_{ij}(\tau) = \exp(\beta_0^\top X_{ij}) \, h_0(\tau),
\]
where h0(τ) is an unspecified baseline hazard function, β0 is an unknown
coefficient vector, and Xij is a vector of covariates that can include
infectiousness covariates for i (e.g., age or vaccination status),
susceptibility covariates for j (e.g., age or vaccination status), or
pairwise covariates (e.g., living in the same household).
This allows us to simultaneously estimate infectiousness, susceptibility, and
the evolution of infectiousness over time.17
17 Kenah (2015), Semiparametric relative-risk regression for infectious disease transmission data,
Journal of the American Statistical Association 110: 313–325.
Semiparametric regression
Simulation study: Estimates of hazard ratios
In just a few iterations, the semiparametric regression model produces
good estimates of β0 (below) and of H0(τ) (not shown).
Figure: three scatter plots of estimated versus true βinf, βsus, and βpair,
each with a line of equality, and a plot of probability mass against the
number of iterations.
Black circles indicate estimates β̂ where who-infected-whom was observed.
Gray circles indicate estimates β̃ where who-infected-whom was unobserved.
Semiparametric regression
Simulation study: Value of observing who-infects-whom
Observing who-infects-whom would help us get more efficient estimates of
hazard ratios and the baseline hazard. Knowing who-infected-whom is:
Equivalent to a 20–40% sample size increase for infectiousness and
pairwise covariate effects.
Equivalent to a 10–20% sample size increase for the baseline hazard.
For susceptibility effects, who was infected is more important than who
infected them.
Figure: 95% CI width ratios for β̃inf versus β̂inf, β̃sus versus β̂sus, and
β̃pair versus β̂pair (each against the true coefficient), and for Λ̃0(τ) versus
Λ̂0(τ) (against the contact interval τ), with α = 0.5 and α = 2.
Gray and black circles represent models with different baseline
hazards and Λ0(τ) = H0(τ).
Observing who-infected-whom
Could genetic sequence data help find the transmission tree?
Suppose A, B, and C were infected in alphabetical order; A infected B and
both are possible infectors of C.
Figure: the 2 possible transmission trees and the 3 possible phylogenetic
trees for A, B, and C.
Hosts of leaves in the phylogeny are known. Hosts of interior nodes are
unknown. Possible hosts are written underneath each interior node.
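Because each infector can be chosen independently from the set of possible infectors, the transmission trees consistent with the epidemiologic data can be enumerated as a Cartesian product, and their number is the product of the set sizes. The sketch below does this for the A, B, C example; the function name and data layout are hypothetical.

```python
from itertools import product

def transmission_trees(infectors):
    """Yield every transmission tree as a dict infectee -> infector."""
    infectees = sorted(infectors)
    for choice in product(*(infectors[j] for j in infectees)):
        yield dict(zip(infectees, choice))

infectors = {"B": ["A"], "C": ["A", "B"]}  # A infected B; A and B are possible infectors of C
trees = list(transmission_trees(infectors))
print(len(trees), trees)
```

Enumeration grows quickly with outbreak size, which is why constraints that prune the set of possible trees are valuable.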
Likelihood calculation with genetic sequence data
Weighted sums depending on within-host evolution
Let Epi denote the epidemiologic data, Ph denote a phylogenetic tree, and
Tr denote a transmission tree. Then the likelihood for our data is
\begin{align}
\Pr(Ph, Epi \mid \theta) &= \sum_{Tr} \Pr(Tr, Ph, Epi \mid \theta) \tag{2} \\
&= \sum_{Tr} \Pr(Ph \mid Tr, Epi, \theta) \Pr(Tr, Epi \mid \theta). \tag{3}
\end{align}
The term Pr(Ph|Tr, Epi) depends on within-host pathogen evolution.
Least optimistic: Assign equal probability to all rooted, bifurcating
within-host phylogenetic topologies.
Most optimistic: Assume a single dominant strain within each
individual can be transmitted.
Disease-specific, biologically motivated models would be better.
Phylogenies and transmission trees
Inferring their relationship from assumptions about biology and study design
Instead of making assumptions about the relationship between
phylogenetic and transmission trees, this relationship should be inferred
from more basic assumptions about biology and study design.
1. Each individual is infected at most once.
2. The order in which infections occurred is known.
3. Each infection is initiated by a single pathogen. Following infection,
   pathogen evolution takes place within the host. Evolved pathogens are
   transmitted to other hosts.
4. We have at least one pathogen sequence from each infected person. These
   sequences are linked in a rooted phylogeny.
5. Each pathogen in the phylogeny had a host. Parent-child relationships
   between pathogens with different hosts represent direct transmissions of
   infection from the host of the parent to the host of the child.
Interior node hosts and transmission trees
Phylogeny + interior node hosts = transmission tree
Lemma
The nodes hosted by each infected individual form a subtree of the
phylogeny.
This subtree represents pathogen evolution within the host.
Theorem
A phylogeny with known interior node hosts implies a unique transmission
tree.
The subtree hosted by any individual i has a unique root ri . If ri is the
root of the phylogeny, then i was infected from outside the population.
Otherwise, ri has a parent whose host infected i.
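The argument in the theorem translates directly into code: wherever a node and its parent have different hosts, the parent's host infected the node's host. The sketch below uses a hypothetical toy phylogeny ((B, C), A) with interior nodes r (the root) and x; the function name and data layout are assumptions.

```python
def transmission_tree(parent, host):
    """Map each infected host to its infector (None if infected from outside the population).

    parent: dict node -> parent node (the root maps to None)
    host:   dict node -> host of that node (leaves and interior nodes)
    """
    infector = {}
    for node, p in parent.items():
        if p is None:
            infector[host[node]] = None     # root of the phylogeny: imported infection
        elif host[p] != host[node]:
            infector[host[node]] = host[p]  # node is the root r_i of its host's subtree
    return infector

parent = {"r": None, "x": "r", "a": "r", "b": "x", "c": "x"}
host = {"r": "A", "x": "B", "a": "A", "b": "B", "c": "C"}
print(transmission_tree(parent, host))
```

Changing host["x"] to "A" yields the other transmission tree, in which A infects both B and C.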
Example 1: foot and mouth disease virus (FMDV)
Transmission cluster of 12 farms in Durham, UK in 2001
Figure: 2001 FMDV outbreak in Durham, UK. Two panels: a map of the farms
(latitude versus longitude, 5 km scale bar) and, for each farm, the latent
period range and infectious period on a timeline from March 15 to June 01.
There are 19,440 transmission trees consistent with the epidemiologic data
if latent periods are allowed to vary from 2 to 16 days.
Example 1: 2001 FMDV outbreak in the UK
Phylogeny
Figure: 2001 FMDV outbreak PhyML phylogeny. Horizontal axis: onset of
infectiousness (Mar 15 to Jun 01); scale bar 0.0002.
Seaview, 2001seq−PhyML_tree, Fri Apr 24 13:23:40 2015.
PhyML ln(L) = −12044.9, 8196 sites, GTR, 1000 replicates, 4 rate classes.
Multifurcations were resolved to maximize the number of possible
transmission trees.
First hosts
At most one possible host of a clade root from within the clade
If x is an interior node of the phylogeny, let Cx be the clade rooted at x
and Lx be the set of leaf hosts in Cx . There is at most one possible host
of x in Lx , whom we call first(x).
If the order of infections is known, first(x) must be the earliest
infection in Lx.
If the order of onsets of infectiousness is known, first(x) must have the
earliest onset of infectiousness in Lx.
If both are known, then either can be used to define first(x).
Figure: 2001 FMDV outbreak: first hosts. Interior nodes of the phylogeny are
labeled with their first hosts; horizontal axis: onset of infectiousness
(Mar 15 to Jun 01).
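Since first(x) is the earliest-infected leaf host in the clade rooted at x, it can be computed by passing the earliest host up from the leaves. The sketch below assumes the order of infections is known and uses a hypothetical toy phylogeny ((B, C), A) with interior nodes r and x; the function name and data layout are assumptions.

```python
def first_hosts(children, leaf_host, infection_order):
    """Compute first(x) for every node, visiting children before parents.

    children: dict interior node -> list of child nodes
    leaf_host: dict leaf node -> host
    infection_order: list of hosts, earliest infection first
    """
    rank = {h: k for k, h in enumerate(infection_order)}
    first = dict(leaf_host)  # a leaf's clade contains only its own host

    def visit(x):
        if x not in first:
            # The earliest infection in the clade is the earliest among the children's firsts.
            first[x] = min((visit(y) for y in children[x]), key=rank.__getitem__)
        return first[x]

    for x in children:
        visit(x)
    return first

children = {"r": ["a", "x"], "x": ["b", "c"]}
leaf_host = {"a": "A", "b": "B", "c": "C"}
print(first_hosts(children, leaf_host, ["A", "B", "C"]))
```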
Transmission trees and interior node host assignments
Different interior node hosts ⇒ different transmission tree
Lemma
For any node x in the phylogeny, either host(x) = first(x) or host(x)
infected first(x).
Theorem
A transmission tree corresponds to at most one possible assignment of
interior node hosts in a phylogeny.
Combined with the earlier theorem, this implies a one-to-one relationship
between the possible transmission trees and the possible assignments of
interior node hosts in the phylogeny.
Postorder host sets
Within-clade constraints on the set of possible hosts
Lemma
If x is an interior node, then host(x) = first(x) or host(x) = host(parent(x)).
Let Dx be the set of hosts h such that at least one possible transmission
network within the clade Cx can be generated when host(x) = h.
Lemma
If x is an interior node with child y in the phylogeny, then
\[
\mathrm{host}(x) \in D_y^* =
\begin{cases}
D_y & \text{if } \mathrm{first}(y) \notin D_y, \\
D_y \cup V_{\mathrm{first}(y)} & \text{if } \mathrm{first}(y) \in D_y,
\end{cases}
\]
where Vi is the set of possible infectors of i given the epidemiologic data.
Postorder host sets
Calculation in a postorder traversal of the phylogeny
Theorem
For any interior node x in the phylogeny,
\[
D_x = \bigcap_{y \in \mathrm{children}(x)} D_y^*,
\]
where children(x) denotes the children of x.
Since Dℓ = {ℓ} for any leaf node ℓ, we can calculate all Dx in a postorder
(children before parents) traversal of the phylogeny.
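The postorder calculation can be sketched as a short recursion. This is an illustrative reading of the lemma and theorem, not the authors' code: first(x) and the infector sets Vi are taken as given inputs, and the toy phylogeny ((B, C), A) with interior nodes r and x is hypothetical.

```python
def postorder_host_sets(children, leaf_host, first, infectors):
    """Compute D_x for every node, children before parents.

    first: dict node -> first(x); infectors: dict host -> set V_i of possible infectors
    """
    D = {leaf: {h} for leaf, h in leaf_host.items()}

    def d_star(y):
        d_y = visit(y)
        if first[y] in d_y:
            # The parent may also be hosted by a possible infector of first(y).
            return d_y | infectors[first[y]]
        return d_y

    def visit(x):
        if x not in D:
            D[x] = set.intersection(*(d_star(y) for y in children[x]))
        return D[x]

    for x in children:
        visit(x)
    return D

children = {"r": ["a", "x"], "x": ["b", "c"]}
leaf_host = {"a": "A", "b": "B", "c": "C"}
first = {"a": "A", "b": "B", "c": "C", "x": "B", "r": "A"}
infectors = {"A": set(), "B": {"A"}, "C": {"A", "B"}}  # A is the initial case
print(postorder_host_sets(children, leaf_host, first, infectors))
```

In this toy example the node x can be hosted by A or B, while the root can only be hosted by A.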
Example 1: 2001 FMDV outbreak in the UK
Postorder host sets
Leaf host sets and infectious sets are known.
Figure: 2001 FMDV outbreak: postorder host sets. Nodes of the phylogeny are
labeled with their postorder host sets and infectious sets; horizontal axis:
onset of infectiousness (Mar 15 to Jun 01).
Host sets
Ancestral constraints on the set of possible hosts
Lemma
If x is an interior node, then host(x) = first(x) or host(x) = host(parent(x)).
The root node r0 of the tree has a known host. For simplicity, assume that
all transmission occurs within the population after an initial case. Then
host(r0) is the initial case. For all other nodes, let
\[
A_x = H_{\mathrm{parent}(x)} \cup \{\mathrm{first}(x)\}.
\]
Theorem
The set of possible hosts of x is Hx = Ax ∩ Dx.
The host set of r0 is known, so all Hx can be calculated in a preorder
(parents before children) traversal of the phylogeny.
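The preorder pass is a simple top-down sweep once first(x) and Dx are available. In the sketch below both are supplied as precomputed inputs for a hypothetical toy phylogeny ((B, C), A) with interior nodes r and x and initial case A; the function name and data layout are assumptions.

```python
def host_sets(children, root, initial_case, first, D):
    """Compute H_x = A_x ∩ D_x for every interior node, parents before children."""
    H = {root: {initial_case}}
    stack = [root]
    while stack:
        x = stack.pop()
        for y in children.get(x, []):
            if y in children:                      # interior nodes only; leaf hosts are known
                H[y] = (H[x] | {first[y]}) & D[y]  # A_y = H_parent(y) ∪ {first(y)}
                stack.append(y)
    return H

children = {"r": ["a", "x"], "x": ["b", "c"]}
first = {"r": "A", "x": "B"}
D = {"r": {"A"}, "x": {"A", "B"}}                 # postorder host sets, taken as given
print(host_sets(children, "r", "A", first, D))
```

Here x can be hosted by A or B, which corresponds to the two transmission trees in which C is infected by A or by B.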
Example 1: 2001 FMDV outbreak in the UK
Host sets
Root host set is known.
Infectors of E and P can be chosen independently.
Figure: 2001 FMDV outbreak: host sets. Nodes of the phylogeny are labeled
with their host sets; horizontal axis: onset of infectiousness (Mar 15 to
Jun 01).
Figure: 2001 FMDV outbreak: transmission trees. Map of the farms (latitude
versus longitude, 5 km scale bar).
There are 4 possible transmission trees simultaneously consistent with the
phylogeny and the epidemic data (down from 19,440).
Example 2: foot and mouth disease virus (FMDV)
Transmission cluster of 7 farms in Berkshire and Surrey, UK in 2007
Figure: 2007 FMDV outbreak in Berkshire and Surrey, UK. Two panels: a map of
the farms (latitude versus longitude, 2 km scale bar) and, for each farm, the
latent period range and infectious period on a timeline from August 15 to
October 01.
There are 576 transmission trees consistent with the epidemiologic data if
latent periods are allowed to vary from 2 to 16 days.
Example 2: 2007 FMDV outbreak in the UK
Phylogeny
Figure: 2007 FMDV outbreak PhyML phylogeny. Horizontal axis: onset of
infectiousness (August 15 to October 01); scale bar 0.0001.
Seaview, 2007seq−PhyML_tree, Fri Apr 24 14:14:02 2015.
PhyML ln(L) = −11516.0, 8176 sites, GTR, 1000 replicates, 4 rate classes.
Multifurcations were resolved to maximize the number of possible
transmission trees.
Example 2: 2007 FMDV outbreak in the UK
First hosts
Figure: 2007 FMDV outbreak: first hosts. Interior nodes of the phylogeny are
labeled with their first hosts; horizontal axis: onset of infectiousness
(August 15 to October 01).
Example 2: 2007 FMDV outbreak in the UK
Postorder host sets
Leaf host sets and infectious sets are known.
Figure: 2007 FMDV outbreak: postorder host sets. Nodes of the phylogeny are
labeled with their postorder host sets and infectious sets; horizontal axis:
onset of infectiousness (August 15 to October 01).
Example 2: 2007 FMDV outbreak in the UK
Host sets
Root host set is known.
6b and 7 have the same infector X, and the infector of 8 is 7 or X.
Figure: 2007 FMDV outbreak: host sets. Nodes of the phylogeny are labeled
with their host sets; horizontal axis: onset of infectiousness (August 15 to
October 01).
Figure: 2007 FMDV outbreak in Berkshire and Surrey, UK. Map of the farms
(latitude versus longitude, 2 km scale bar).
There are 4 possible transmission trees simultaneously consistent with the
phylogeny and the epidemic data (down from 576).
Molecular infectious disease epidemiology
Contact interval hazard function estimates with and without a phylogeny
The farm-to-farm contact interval distribution was estimated via maximum
likelihood using a log-logistic distribution with rate λ and shape γ
under the “least optimistic” within-host evolutionary model.
Pointwise 95% confidence bands were generated by calculating the hazard
function for each of 4000 (λ, γ) samples from the likelihood and taking
the 2.5% and 97.5% quantiles of the hazards at each time point.
Figure: Farm-to-farm FMDV infectiousness: 2001 and 2007. Hazard of infectious
contact versus days since onset of infectiousness (0 to 12), estimated
without and with the phylogeny.
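The band construction can be sketched as follows. Two things here are assumptions rather than details from the talk: the log-logistic hazard is written in the common rate and shape form h(τ) = λγ(λτ)^(γ−1) / (1 + (λτ)^γ), and quantiles are taken by a simple index rule. The parameter samples are hypothetical, and the times should be positive when γ < 1.

```python
def log_logistic_hazard(t, lam, gamma):
    """Hazard of a log-logistic distribution with rate lam and shape gamma (assumed form)."""
    return lam * gamma * (lam * t) ** (gamma - 1) / (1 + (lam * t) ** gamma)

def pointwise_band(times, samples, lower=0.025, upper=0.975):
    """Pointwise quantile band of the hazard over parameter samples [(lam, gamma), ...]."""
    band = []
    for t in times:
        hazards = sorted(log_logistic_hazard(t, lam, gamma) for lam, gamma in samples)
        low = hazards[int(lower * (len(hazards) - 1))]   # simple index-based quantile
        high = hazards[int(upper * (len(hazards) - 1))]
        band.append((t, low, high))
    return band

samples = [(1.0, 2.0), (0.5, 2.0), (2.0, 2.0)]  # hypothetical (lambda, gamma) samples
print(pointwise_band([0.5, 1.0, 2.0], samples))
```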
Molecular infectious disease epidemiology
Simulation study: Household transmission model
To study the potential value of a phylogeny, we conducted a series of
1,000 simulations of 100 independent households of size 6.
Each household had a single index case with an infection time chosen
from an exponential(1) distribution.
Each individual $i$ had a binary covariate $X_i$ that affected
infectiousness and susceptibility such that the hazard of infectious
contact from $i$ to $j$ at time $\tau$ after the onset of infectiousness in $i$ was
$\lambda_{ij}(\tau) = \exp(\beta_{\mathrm{inf}} X_i + \beta_{\mathrm{sus}} X_j)\,\lambda_0$.
In each simulation, the true $\beta_{\mathrm{inf}}$ and $\beta_{\mathrm{sus}}$ were independently chosen
from a uniform(-1, 1) distribution and $\lambda_0 = 1$.
The infectious periods were independently chosen from an
exponential(1) distribution.
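The household model above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' simulation code. It assumes no latent period (infectiousness begins at infection), covariates drawn as Bernoulli(0.5), and person 0 as the index case; none of these are specified on the slide. The function name `simulate_household` is hypothetical.

```python
import heapq

import numpy as np


def simulate_household(beta_inf, beta_sus, lambda0=1.0, size=6, rng=None):
    """Simulate one household; returns covariates, infection times,
    recovery times, and infectors (-1 for the index case or uninfected)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.integers(0, 2, size=size)       # ASSUMED Bernoulli(0.5) covariate
    period = rng.exponential(1.0, size=size)  # infectious periods
    t_inf = np.full(size, np.inf)
    infector = np.full(size, -1)
    # Person 0 is taken as the index case (an assumption of this sketch).
    heap = [(rng.exponential(1.0), 0, -1)]  # (time, infectee, infector)
    while heap:
        t, j, i = heapq.heappop(heap)
        if np.isfinite(t_inf[j]):
            continue  # j was already infected by an earlier contact
        t_inf[j], infector[j] = t, i
        for k in range(size):
            if np.isfinite(t_inf[k]):
                continue  # skip j itself and anyone already infected
            hazard = np.exp(beta_inf * x[j] + beta_sus * x[k]) * lambda0
            tau = rng.exponential(1.0 / hazard)  # contact interval
            if tau < period[j]:  # contact occurs before j recovers
                heapq.heappush(heap, (t + tau, k, j))
    return x, t_inf, t_inf + period, infector


rng = np.random.default_rng(1)
beta_inf, beta_sus = rng.uniform(-1.0, 1.0, size=2)
households = [simulate_household(beta_inf, beta_sus, rng=rng)
              for _ in range(100)]
```

Because the hazard is constant in τ, each contact interval is exponential with rate λij, and each person is infected by the earliest infectious contact to reach them. The priority queue processes infections in time order to enforce this.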
Simulation study: Data analysis
Data from the first 200 infections were analyzed three ways to estimate
βinf, βsus, and λ0:
With a parametric likelihood using only epidemiologic data on times
of infection and recovery.
With epidemiologic data and who-infected-whom.
With epidemiologic data and within-household phylogenies with one
sample for each infected individual. The within-host phylogeny for an
individual who infected k − 1 other people was chosen uniformly at
random from all rooted, bifurcating phylogenies with k tips.
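To make the last step concrete, the sketch below counts and samples rooted, bifurcating trees with labeled tips. It assumes the tips are distinguishable (one per sample), in which case there are (2k − 3)!! such trees on k tips. The sampler adds tips one at a time, attaching each to a uniformly chosen edge (counting a root edge), which is a standard way to draw uniformly from this set. The nested-tuple representation and the function names are choices made for this example, not taken from the talk.

```python
import math
import random


def count_rooted_bifurcating(k):
    """Number of rooted bifurcating trees on k labeled tips: (2k - 3)!!."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return math.prod(range(1, 2 * k - 2, 2))  # empty product is 1


def count_nodes(tree):
    """Total nodes (tips and internal) in a nested-tuple tree."""
    if isinstance(tree, tuple):
        return 1 + count_nodes(tree[0]) + count_nodes(tree[1])
    return 1


def attach(tree, target, new_tip):
    """Attach new_tip on the edge above the node with preorder index target."""
    if target == 0:
        return (tree, new_tip)
    left, right = tree
    n_left = count_nodes(left)
    if target <= n_left:
        return (attach(left, target - 1, new_tip), right)
    return (left, attach(right, target - 1 - n_left, new_tip))


def random_rooted_bifurcating(k, rng=random):
    """Draw a tree on tips 0..k-1 by sequential uniform edge insertion."""
    tree = 0
    for tip in range(1, k):
        # A tree with m tips has 2m - 1 nodes, hence 2m - 1 attachment edges.
        tree = attach(tree, rng.randrange(count_nodes(tree)), tip)
    return tree


tree = random_rooted_bifurcating(4, random.Random(0))
```

Each tree on k tips arises from exactly one sequence of insertions, because removing the last tip gives a unique smaller tree. Uniform edge choices at every step therefore give a uniform draw overall. Child order within a tuple carries no meaning.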
Simulation study: Relative efficiencies
Figure: Phylogenetic relative efficiency: infectiousness. Relative efficiency (y-axis) versus log hazard ratio for infectiousness (βinf, x-axis) for three analyses: epidemiologic data, epidemiologic data with phylogeny, and epidemiologic data with who-infected-whom.
Figure: Phylogenetic relative efficiency: baseline hazard. Relative efficiency (y-axis) versus simulation number (x-axis, 0 to 1000) for the same three analyses.
Here, relative efficiency was calculated using mean squared error (MSE).
All analyses were equally efficient for βsus.
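A relative efficiency based on MSE can be computed as a ratio of mean squared errors across simulations. The sketch below assumes the ratio is MSE of the reference analysis over MSE of the analysis of interest, so that values above 1 favor the analysis of interest. The slides do not spell out this direction, and the estimates shown are synthetic placeholders, not results from the study.

```python
import numpy as np


def mse(estimates, truth):
    """Mean squared error of estimates against the true values."""
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean((estimates - truth) ** 2))


def relative_efficiency(reference_estimates, candidate_estimates, truth):
    """MSE(reference) / MSE(candidate); > 1 means the candidate analysis
    is more efficient than the reference (ASSUMED direction)."""
    return mse(reference_estimates, truth) / mse(candidate_estimates, truth)


rng = np.random.default_rng(0)
truth = rng.uniform(-1.0, 1.0, size=1000)  # one true value per simulation
# Synthetic placeholder estimates with different noise levels.
epi_only = truth + rng.normal(0.0, 0.30, size=1000)
with_phylogeny = truth + rng.normal(0.0, 0.25, size=1000)
re_phylo_vs_epi = relative_efficiency(epi_only, with_phylogeny, truth)
```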
Simulation study: Overall relative efficiency
Compared to the analysis with epidemiologic data only, the relative
efficiency of the phylogenetic analysis was 1.30 for βinf and 1.15 for λ0.
Compared to the analysis with who-infected-whom, the relative
efficiencies were 0.81 and 0.94, respectively.
When parameter estimates were used to calculate the marginal
probability of transmission from an infectious person to a susceptible
household member for all four covariate combinations, the relative
efficiencies were 1.38 (compared to epidemiologic data only) and 0.90
(compared to who-infected-whom).
A phylogeny can recover much of the information that would be gained by
observing who-infected-whom.
Conclusion
Implications and future directions
In emerging epidemics, more emphasis needs to be given to collecting
detailed epidemiologic data from well-defined groups of contacts.
There is a tradeoff between the breadth and depth of surveillance.
It is essential to collect data on both infected people and people who
were exposed to infection but not infected.
Collecting pathogen samples from all infected individuals can be
equivalent to a 10% to 30% increase in sample size for estimates of
infectiousness and baseline hazards (i.e., evolution of infectiousness
over time).
These methods need to be extended to account for phylogenetic
uncertainty and to be integrated into non- and semiparametric
statistical methods for infectious disease transmission data, most
likely in a Bayesian MCMC framework.
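As a rough reading of the sample-size statement above, assume (this is not stated in the slides) that bias is negligible and that the MSE of each estimator scales as $1/n$ in the number of observations $n$. A relative efficiency $\mathrm{RE}$ of the phylogenetic analysis over the epidemiologic-only analysis then translates into an equivalent sample size $n'$ for the epidemiologic-only analysis:

$$
\mathrm{RE} = \frac{\mathrm{MSE}_{\mathrm{epi}}(n)}{\mathrm{MSE}_{\mathrm{phylo}}(n)},
\qquad
\mathrm{MSE}_{\mathrm{epi}}(n') = \mathrm{MSE}_{\mathrm{phylo}}(n)
\iff
\frac{n}{n'}\,\mathrm{MSE}_{\mathrm{epi}}(n) = \frac{\mathrm{MSE}_{\mathrm{epi}}(n)}{\mathrm{RE}}
\iff
n' = \mathrm{RE} \cdot n .
$$

Under that assumption, the relative efficiencies of 1.15 and 1.30 reported for the simulation study correspond to about 15% and 30% more data.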
Acknowledgements
The data on the FMDV outbreaks were made publicly available in the
following publications:
Eleanor Cottam, Gaël Thébaud, Jemma Wadsworth, John Gloster, Leonard
Mansley, David J. Paton, Donald P. King, and Daniel T. Haydon (2008).
Integrating genetic and epidemiological data to determine transmission
pathways of foot-and-mouth disease virus. Proceedings of the Royal
Society B 275: 887–895.
Marco J. Morelli, Gaël Thébaud, Joël Chadœuf, Donald P. King, Daniel T.
Haydon, and Samuel Soubeyrand (2012). PLoS Computational Biology 8:
e1002768.
This research was supported by National Institute of Allergy and Infectious
Diseases (NIAID) grant R00 AI095302 and National Institute of General
Medical Sciences (NIGMS) grant U54 GM111274. The content is solely
the responsibility of the authors and does not represent the official views
of NIAID, NIGMS, or the National Institutes of Health.