Algorithms Linking Phylogenetic and Transmission Trees for Molecular Infectious Disease Epidemiology

Eben Kenah (1), Tom Britton (2), M. Elizabeth Halloran (3,4), and Ira M. Longini, Jr. (1)

(1) Biostatistics Department and Emerging Pathogens Institute, University of Florida
(2) Mathematics Department, Stockholm University
(3) Biostatistics Department, University of Washington
(4) Vaccine and Infectious Diseases Division, Fred Hutchinson Cancer Research Center

September 3, 2015

E. Kenah, T. Britton, M. E. Halloran, and I. M. Longini, Jr. Phylogenetic algorithms for infectious disease epidemiology

Phylodynamics of infectious disease

Sparsely-sampled genetic sequence data are an important source of information about the large-scale spread of infectious disease. Phylodynamics combines population genetics and infectious disease dynamics to:

- Reconstruct geographic spread using diffusion processes. [1]
- Reconstruct the effective number of infections over time using coalescent models. [2]
- Reconstruct rates of transmission and recovery using birth-death processes. [3]

These and related methods have been used to understand the origins and spread of HIV-1, the global circulation of influenza, and the invasion of the eastern US by raccoon-specific rabies virus. [4]

[1] Lemey et al. (2009), PLoS Computational Biology 5: e1000520.
[2] Volz et al. (2009), Genetics 183: 1421–1430.
[3] Stadler et al. (2013), Proceedings of the National Academy of Sciences 110: 228–233.
[4] Biek et al. (2007), Proceedings of the National Academy of Sciences 104: 7993–7998.

Densely-sampled genetic sequence data

One of the earliest applications of genetics in infectious disease epidemiology was to confirm or rule out a specific source of infection:

- Confirmation of HIV transmission in a Florida dental practice. [5]
- Exoneration of an HIV-positive surgeon in Baltimore. [6]

A more ambitious task was to reconstruct transmission trees. An early analysis of a small HIV-1 cluster in Sweden with a known transmission tree showed that the tree was accurately reflected in phylogenies reconstructed from HIV genetic sequences. [7]

The increasing availability of genetic sequence data has renewed interest in combining pathogen genetic sequence data and epidemiologic data to reconstruct transmission trees.

[5] Ou et al. (1992), Science 256: 1165–1171.
[6] Holmes et al. (1993), Journal of Infectious Diseases 167: 1411–1414.
[7] Leitner et al. (1996), Proceedings of the National Academy of Sciences 93: 10864–10869.

Complexity at small scales
Coalescent times are not transmission times

On a large time scale, coalescent times approximate transmission times. On a small time scale, the difference can become important. [8]

Figure 1. Schematic for viral dynamics. In all panels, time progresses from left to right. Hosts are depicted as gray pods, virus particles as blue dots and sampled virus particles as red dots. (A) The timing of coalescence of viral lineages depends on within-host viral dynamics.

[8] Ypma et al. (2013), Genetics 195: 1055–1062.

Complexity at small scales
Phylogenetic trees are not transmission trees

[Figure: page excerpt from Pybus and Rambaut (2009) with two panels, "Transmission tree" and "Virus tree", for Patients 1, 2, 3, 5, 6, 7, 8, 9, and 10. Caption, truncated in the source: "Figure 3 | Reconstruction of a known HIV-1 transmission chain. A phylogeny of 13 HIV-1 viral particles (blue circles) sampled at different times (horizontal axis) from 9 different patients for whom the times and direction of viral transmission are known. The virus phylogeny (blue lines) can be mapped within the transmission tree (yellow boxes"]

Because of pathogen evolution within hosts, the phylogenetic and transmission trees can have different topologies. [a]

Selecting an optimal transmission tree based on a phylogeny can overestimate the information about transmission contained in the genetic sequences.

[a] Pybus and Rambaut (2009), Nature Reviews Genetics 10: 540–550.

Transmission trees
What's the purpose of reconstructing them?

The transmission tree from one epidemic does not generalize to future epidemics of the same disease.

When infectious disease transmission is analyzed using a survival analysis framework:

- Parametric likelihoods are sums over possible transmission trees. [9]
- Nonparametric estimates [10] and semiparametric regression models [11] depend on averages over possible transmission trees.

By restricting the set of possible transmission trees, a pathogen phylogeny can help us get more efficient estimates of the parameters governing the transmission of an emerging infection.

[9] Kenah (2011), Biostatistics 12: 548–566.
[10] Kenah (2013), Journal of the Royal Statistical Society, Series B 75: 277–303.
[11] Kenah (2015), Journal of the American Statistical Association 110: 313–325.

Generation and serial intervals

- Incubation period: the time between infection and onset of symptoms in an infected individual.
- Generation interval: the time between the infection of a secondary case and the infection of his or her infector.
- Serial interval: the time between symptom onset in a secondary case and symptom onset in his or her infector.

If $i$ infects $j$, with infection times $t_i$, $t_j$ and symptom onset times $t_i^{\mathrm{sym}}$, $t_j^{\mathrm{sym}}$, then

\[
\text{incubation periods: } t_i^{\mathrm{sym}} - t_i \text{ and } t_j^{\mathrm{sym}} - t_j, \qquad
\text{generation interval: } t_j - t_i, \qquad
\text{serial interval: } t_j^{\mathrm{sym}} - t_i^{\mathrm{sym}}.
\]

Generation intervals, branching processes, and epidemics

Statistical methods based on generation or serial intervals treat the spread of infection as a branching process, where the generation interval is the time between infections of infectors and infectees.

[Figure: transmission diagram. Legend: infected person, susceptible person, disease transmission, failure to transmit.]

Branching processes: People are created when they are infected; susceptibles do not exist, and there is no uninfected person-time.

Epidemics: Infection spreads from infected to susceptible in a preexisting population. Infection is transmitted to a susceptible by the first person to make infectious contact with him or her. Uninfected person-time contains information about disease transmission.

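As a small worked example of the incubation period, generation interval, and serial interval defined above, the sketch below computes all three for one infector-infectee pair. The `Case` record and the times are hypothetical values chosen for illustration, not data from the talk.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical record for one infected person (times in days)."""
    t_inf: float  # infection time
    t_sym: float  # symptom onset time


def incubation_period(case: Case) -> float:
    # Time from infection to symptom onset in the same person.
    return case.t_sym - case.t_inf


def generation_interval(infector: Case, infectee: Case) -> float:
    # Time between the infection of the infector and of the infectee.
    return infectee.t_inf - infector.t_inf


def serial_interval(infector: Case, infectee: Case) -> float:
    # Time between symptom onsets of the infector and the infectee.
    return infectee.t_sym - infector.t_sym


i = Case(t_inf=0.0, t_sym=3.0)  # infector
j = Case(t_inf=2.0, t_sym=4.0)  # infectee
print(incubation_period(i), incubation_period(j))
print(generation_interval(i, j), serial_interval(i, j))
```

The serial interval equals the generation interval plus the infectee's incubation period minus the infector's, so it can be shorter than the generation interval (as here) or even negative.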
Generation interval contraction

It is often assumed that the generation interval is a stable characteristic of an infectious disease, but the mean generation interval actually contracts during an epidemic. [12]

- When there is more than one infectious person, they compete to infect the available susceptibles. The infector is the first one to make infectious contact.
- As the prevalence of infection increases, the mean generation and serial intervals contract.
- When transmission occurs in groups of close contacts such as households, this contraction can occur even when the global prevalence of infection is low.

[12] Kenah, Lipsitch, and Robins (2008), Mathematical Biosciences 213: 71–79.

Simulated generation intervals

[Figure: "Smoothed mean serial interval" for an SIR model with random mixing. Horizontal axis: source infection time (0 to 10). Vertical axis: serial interval (0.25 to 0.5). Curves for R0 = 2, 3, 4, and 5.]

Contact intervals
Definition

A more flexible approach to the analysis of infectious disease transmission data can be based on contact intervals.

- An infectious contact from i to j is a contact sufficient to infect j if i is infectious and j is susceptible.
- The contact interval from i to j is the time between the onset of infectiousness in i and the first infectious contact from i to j, whether or not this causes infection in j.

We drop the requirement that i infected j, which is the source of the problems with generation and serial intervals outlined above.

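Contact intervals also make the contraction of generation intervals easy to see. The sketch below is a deliberately simplified illustration, not the SIR simulation from the talk: it assumes k people become infectious at the same moment, contact intervals are independent exponential variables with a hypothetical rate, and nobody recovers. The time until a single susceptible is infected is then the smallest of the k contact intervals.

```python
import random


def mean_generation_interval(k, rate=1.0, n_sim=20000, seed=1):
    """Mean time to the first infectious contact when k infectious
    people compete to infect one susceptible (illustrative model)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        # The infector is whoever makes infectious contact first.
        total += min(rng.expovariate(rate) for _ in range(k))
    return total / n_sim


for k in (1, 2, 4):
    print(k, round(mean_generation_interval(k), 3))
```

Under these assumptions the expected value is 1 / (k * rate), so the mean interval shrinks as more infectious people compete, even though the contact interval distribution itself never changes.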
Contact intervals
Observation and censoring

The price of this flexibility is that contact intervals can be right censored.

[Figure: three timelines for an ordered pair ij, each marking when i becomes infectious, when i recovers, when j is infected, and when observation ends. In the first, the contact interval is uncensored if i infects j and censored otherwise. In the other two, the contact interval is censored.]

If infectious periods and the end of observation independently censor contact intervals, transmission data can be treated like standard survival data when we observe who-infected-whom.

Maximum likelihood estimation
Who-infects-whom is observed

Let the contact interval distribution have hazard function $h(\tau, \theta_0)$ and survival function $S(\tau, \theta_0)$, where $\theta_0$ is an unknown parameter. Let $\tau_{ij}$ denote the follow-up time for the ordered pair $ij$, let $\delta_{ij} = 1$ if $i$ infected $j$ and $\delta_{ij} = 0$ otherwise, and let $\mathcal{P}$ be the set of pairs $ij$ in which $i$ was infectious while $j$ was susceptible. Then the likelihood

\[
L(\theta) = \prod_{ij \in \mathcal{P}} h(\tau_{ij}, \theta)^{\delta_{ij}} \, S(\tau_{ij}, \theta) \tag{1}
\]

will produce a consistent and asymptotically normal maximum likelihood estimate of $\theta_0$. [13]

[13] Kenah (2011), Biostatistics 12: 548–566.

Maximum likelihood estimation
Who-infects-whom is not observed

When who-infects-whom is not observed, the likelihood is the sum of the likelihoods for all possible transmission trees. [14] For each non-imported infection $j$, we replace $h(\tau_{ij}, \theta)^{\delta_{ij}}$ with the sum of the hazards of transmission from all possible infectors of $j$. This gives us

\[
L(\theta) = \prod_{j \in \mathcal{I}} \Bigl[ \sum_{i \in V_j} h(\tau_{ij}, \theta) \Bigr] \prod_{ij \in \mathcal{P}} S(\tau_{ij}, \theta),
\]

where $\mathcal{I}$ is the set of non-imported infections and $V_j$ is the set of possible infectors of $j$.

[14] Kenah (2011), Biostatistics 12: 548–566.

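The two likelihoods above can be sketched in a few lines. This is a minimal illustration that assumes an exponential contact interval distribution (constant hazard `lam`), which the slides do not specify; the pair data, `infector_sets`, and all function names are hypothetical.

```python
import math

# Each ordered pair ij in P: (i, j, follow-up time tau_ij, delta_ij).
pairs = [
    ("A", "B", 1.0, 1),  # A infected B
    ("A", "C", 2.5, 0),  # A exposed C but did not infect C
    ("B", "C", 1.5, 1),  # B infected C
]
# Possible infectors V_j of each non-imported infection j.
infector_sets = {"B": ["A"], "C": ["A", "B"]}


def hazard(tau, lam):
    return lam  # exponential model: constant hazard (an assumption)


def survival(tau, lam):
    return math.exp(-lam * tau)


def loglik_observed(pairs, lam):
    """log of prod h^delta * S over pairs (who-infects-whom observed)."""
    return sum(
        d * math.log(hazard(t, lam)) + math.log(survival(t, lam))
        for _, _, t, d in pairs
    )


def loglik_unobserved(pairs, infector_sets, lam):
    """log of prod_j [sum_{i in V_j} h] * prod S (who-infects-whom unobserved)."""
    tau = {(i, j): t for i, j, t, _ in pairs}
    ll = sum(math.log(survival(t, lam)) for _, _, t, _ in pairs)
    for j, v_j in infector_sets.items():
        ll += math.log(sum(hazard(tau[i, j], lam) for i in v_j))
    return ll


print(loglik_observed(pairs, 0.4))
print(loglik_unobserved(pairs, infector_sets, 0.4))
```

With a constant hazard the two log likelihoods differ only by a constant, so they share the same maximizer; with a time-varying hazard the sum over possible infectors changes the shape of the likelihood as well.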
Nonparametric inference
Who-infects-whom is observed

A more flexible assumption is that the contact interval distribution is continuous with an unknown cumulative hazard function

\[
H_0(\tau) = \int_0^{\tau} h_0(u)\, du.
\]

When who-infects-whom is observed, the Nelson-Aalen estimate from standard survival analysis produces a consistent and asymptotically normal estimate $\hat{H}(\tau)$ of $H_0(\tau)$. [15]

Smoothing $\hat{H}(\tau)$ and taking the first derivative yields a nonparametric estimate of $h_0(\tau)$.

[15] Kenah (2013), Journal of the Royal Statistical Society, Series B 75: 277–303.

Nonparametric inference
Who-infects-whom is not observed

When who-infects-whom is not observed, we don't know which contact intervals are censored and which are not. Given the observed data, the probability that $i$ infected $j$ is

\[
p_{ij} = \frac{h(\tau_{ij})}{\sum_{i' \in V_j} h(\tau_{i'j})},
\]

and the infector of each $j$ can be chosen independently.

With the correct weights, the average of the Nelson-Aalen estimators for all possible transmission trees is a consistent and asymptotically normal estimate of $H(\tau)$. But if we knew $h(\tau)$, we wouldn't need to estimate it!

Nonparametric inference
Expectation-Maximization (EM) algorithm

Pretend the hazard function is $h(\tau) = h^{(0)}(\tau)$, where $h^{(0)}(\tau)$ is a guess. Set $k = 0$.

1. Calculate the weights $p_{ij}$ using $h^{(k)}(\tau)$.
2. Average the Nelson-Aalen estimates for all possible transmission trees using these weights to get a cumulative hazard $H^{(k)}(\tau)$.
3. Smooth $H^{(k)}(\tau)$ and take a derivative to get $h^{(k+1)}(\tau)$.
4. Set $k = k + 1$. Go back to step 1.

This loop is repeated until $H^{(k)}(\tau)$ stabilizes.

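Step 1 of the loop, calculating the weights p_ij from the current hazard guess, can be sketched as follows. The starting hazard `guess_hazard` and the follow-up times are hypothetical; the averaging and smoothing steps are not shown.

```python
def infector_weights(tau_by_infector, hazard):
    """p_ij = h(tau_ij) / sum over i' in V_j of h(tau_i'j) for one case j.

    tau_by_infector maps each possible infector i in V_j to tau_ij.
    """
    h = {i: hazard(tau) for i, tau in tau_by_infector.items()}
    total = sum(h.values())
    return {i: h_i / total for i, h_i in h.items()}


def guess_hazard(tau):
    # Hypothetical starting guess h^(0): a hazard that decreases with tau.
    return 1.0 / (1.0 + tau)


# Case j could have been infected by A (tau = 3.0) or by B (tau = 1.0).
weights = infector_weights({"A": 3.0, "B": 1.0}, guess_hazard)
print(weights)
```

Because the guessed hazard decreases with time since onset of infectiousness, the more recently infectious candidate B receives the larger weight; a different guess would shift the weights, which is why the loop re-estimates them on every pass.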
The stable cumulative hazard estimate $\tilde{H}(\tau)$ is a consistent and asymptotically normal estimate of the true cumulative hazard $H(\tau)$. [16]

[16] Kenah (2013), Journal of the Royal Statistical Society, Series B 75: 277–303.

Semiparametric regression
Effects of covariates on infectiousness and susceptibility

The EM algorithm can be adapted to fit a semiparametric regression model

\[
h_{ij}(\tau) = \exp(\beta_0^{\top} X_{ij})\, h_0(\tau),
\]

where $h_0(\tau)$ is an unspecified baseline hazard function, $\beta_0$ is an unknown coefficient vector, and $X_{ij}$ is a vector of covariates that can include infectiousness covariates for $i$ (e.g., age or vaccination status), susceptibility covariates for $j$ (e.g., age or vaccination status), or pairwise covariates (e.g., living in the same household).

This allows us to simultaneously estimate infectiousness, susceptibility, and the evolution of infectiousness over time. [17]

[17] Kenah (2015), Semiparametric relative-risk regression for infectious disease transmission data, Journal of the American Statistical Association 110: 313–325.

Semiparametric regression
Simulation study: Estimates of hazard ratios

In just a few iterations, the semiparametric regression model produces good estimates of $\beta_0$ (below) and $H_0(\tau)$ (not shown).

[Figure: four panels. "Estimated vs. true βinf", "Estimated vs. true βsus", and "Estimated vs. true βpair" plot the estimated coefficient against the true coefficient (both from −1.0 to 1.0) with a line of equality. "Iterations" plots probability mass against the number of iterations (5 to 20).]

Black circles indicate estimates $\hat{\beta}$ where who-infected-whom was observed. Gray circles indicate estimates $\tilde{\beta}$ where who-infected-whom was unobserved.

Semiparametric regression
Simulation study: Value of observing who-infects-whom

Observing who-infects-whom would help us get more efficient estimates of hazard ratios and the baseline hazard. Knowing who-infected-whom is:

- Equivalent to a 20–40% sample size increase for infectiousness and pairwise covariate effects.
- Equivalent to a 10–20% sample size increase for the baseline hazard.

For susceptibility effects, who was infected is more important than who infected them.

[Figure: four panels of 95% CI width ratios, with α = 0.5 and α = 2. "β̃inf versus β̂inf", "β̃sus versus β̂sus", and "β̃pair versus β̂pair" are plotted against the true coefficient (−1.0 to 1.0). "Λ̃0(τ) versus Λ̂0(τ)" is plotted against the contact interval τ (0.0 to 1.4).]

Gray and black circles represent models with different baseline hazards, and $\Lambda_0(\tau) = H_0(\tau)$.

Observing who-infected-whom
Could genetic sequence data help find the transmission tree?

Suppose A, B, and C were infected in alphabetical order; A infected B and both are possible infectors of C.

[Figure: the 2 possible transmission trees and the 3 possible phylogenetic trees for A, B, and C.]

Hosts of leaves in the phylogeny are known. Hosts of interior nodes are unknown. Possible hosts are written underneath each interior node.

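The count of possible transmission trees in the A, B, C example can be checked by enumeration. The sketch below assumes, as in the example, that the order of infection is known, so that choosing one infector per case from its set of possible infectors always gives a valid tree. The function name is hypothetical.

```python
from itertools import product

# Possible infectors of each non-index case in the A, B, C example.
infector_sets = {"B": ["A"], "C": ["A", "B"]}


def transmission_trees(infector_sets):
    """Yield every assignment of one infector to each case."""
    cases = list(infector_sets)
    for choice in product(*(infector_sets[j] for j in cases)):
        yield dict(zip(cases, choice))


trees = list(transmission_trees(infector_sets))
for tree in trees:
    print(tree)
print(len(trees), "possible transmission trees")
```

The number of trees is the product of the sizes of the infector sets, which is why it grows so quickly in larger outbreaks and why constraints from a phylogeny are valuable.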
Likelihood calculation with genetic sequence data
Weighted sums depending on within-host evolution

Let Epi denote the epidemiologic data, Ph denote a phylogenetic tree, and Tr denote a transmission tree. Then the likelihood for our data is

\begin{align}
\Pr(\mathrm{Ph}, \mathrm{Epi} \mid \theta) &= \sum_{\mathrm{Tr}} \Pr(\mathrm{Tr}, \mathrm{Ph}, \mathrm{Epi} \mid \theta) \tag{2} \\
&= \sum_{\mathrm{Tr}} \Pr(\mathrm{Ph} \mid \mathrm{Tr}, \mathrm{Epi}, \theta) \Pr(\mathrm{Tr}, \mathrm{Epi} \mid \theta). \tag{3}
\end{align}

The term $\Pr(\mathrm{Ph} \mid \mathrm{Tr}, \mathrm{Epi}, \theta)$ depends on within-host pathogen evolution.

- Least optimistic: Assign equal probability to all rooted, bifurcating within-host phylogenetic topologies.
- Most optimistic: Assume a single dominant strain within each individual can be transmitted.

Disease-specific, biologically motivated models would be better.

Phylogenies and transmission trees
Inferring their relationship from assumptions about biology and study design

Instead of making assumptions about the relationship between phylogenetic and transmission trees, this relationship should be inferred from more basic assumptions about biology and study design.

1. Each individual is infected at most once.
2. The order in which infections occurred is known.
3. Each infection is initiated by a single pathogen. Following infection, pathogen evolution takes place within the host. Evolved pathogens are transmitted to other hosts.
4. We have at least one pathogen sequence from each infected person. These sequences are linked in a rooted phylogeny.
5. Each pathogen in the phylogeny had a host. Parent-child relationships between pathogens with different hosts represent direct transmissions of infection from the host of the parent to the host of the child.

Interior node hosts and transmission trees
Phylogeny + interior node hosts = transmission tree

Lemma. The nodes hosted by each infected individual form a subtree of the phylogeny. This subtree represents pathogen evolution within the host.

Theorem. A phylogeny with known interior node hosts implies a unique transmission tree.

The subtree hosted by any individual $i$ has a unique root $r_i$. If $r_i$ is the root of the phylogeny, then $i$ was infected from outside the population. Otherwise, $r_i$ has a parent whose host infected $i$.

Example 1: foot and mouth disease virus (FMDV)
Transmission cluster of 12 farms in Durham, UK in 2001

[Figure: two panels titled "2001 FMDV outbreak in Durham, UK". One is a map of farms C, D, E, F, G, I, J, K, L, M, O, and P by longitude (−1.80 to −1.65) and latitude (54.54 to 54.62) with a 5 km scale bar. The other is a timeline by farm from March 15 to June 01 showing the latent period range and the infectious period.]

There are 19,440 transmission trees consistent with the epidemiologic data if latent periods are allowed to vary from 2 to 16 days.

Example 1: 2001 FMDV outbreak in the UK
Phylogeny

[Figure: "2001 FMDV outbreak PhyML phylogeny", with tips plotted against onset of infectiousness (Mar 15 to Jun 01). Seaview header: 2001seq-PhyML_tree, Fri Apr 24 13:23:40 2015, PhyML ln(L) = −12044.9, 8196 sites, GTR, 1000 replicates, 4 rate classes. Scale bar: 0.0002.]

Multifurcations were resolved to maximize the number of possible transmission trees.

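The theorem that a phylogeny with known interior node hosts implies a unique transmission tree follows from the argument about subtree roots, which can be sketched directly. The data layout below (a `parent` map and a `host` map) and the node names are hypothetical; the example is the three-host case in which A infected B, drawn as the phylogeny ((B, C), A).

```python
def transmission_tree(parent, host):
    """Map each host to its infector (None = infected from outside).

    parent: node -> parent node, with the root mapped to None
    host:   node -> host of that node (leaves and interior nodes)
    """
    infector = {}
    for node, par in parent.items():
        if par is None:
            source = None        # root of the phylogeny
        elif host[par] != host[node]:
            source = host[par]   # host changes along this edge
        else:
            continue             # within-host evolution, no transmission
        if host[node] in infector:
            # The nodes of one host must form a single subtree.
            raise ValueError(f"{host[node]} has more than one subtree root")
        infector[host[node]] = source
    return infector


parent = {"r": None, "y": "r", "leaf_A": "r", "leaf_B": "y", "leaf_C": "y"}
leaf_hosts = {"leaf_A": "A", "leaf_B": "B", "leaf_C": "C"}

# Root hosted by A; the interior node y may be hosted by A or by B.
tree_if_y_in_B = transmission_tree(parent, {"r": "A", "y": "B", **leaf_hosts})
tree_if_y_in_A = transmission_tree(parent, {"r": "A", "y": "A", **leaf_hosts})
print(tree_if_y_in_B)
print(tree_if_y_in_A)
```

Changing the host of the single interior node y switches the infector of C from B to A, so each host assignment yields a different transmission tree.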
First hosts
At most one possible host of a clade root from within the clade

If $x$ is an interior node of the phylogeny, let $C_x$ be the clade rooted at $x$ and $L_x$ be the set of leaf hosts in $C_x$. There is at most one possible host of $x$ in $L_x$, whom we call $\mathrm{first}(x)$.

- If the order of infections is known, $\mathrm{first}(x)$ must be the earliest infection in $L_x$.
- If the order of onsets of infectiousness is known, $\mathrm{first}(x)$ must have the earliest onset of infectiousness in $L_x$.
- If both are known, then either can be used to define $\mathrm{first}(x)$.

[Figure: "2001 FMDV outbreak: first hosts". The phylogeny plotted against onset of infectiousness (Mar 15 to Jun 01), with first hosts labeled.]

Transmission trees and interior node host assignments
Different interior node hosts ⇒ different transmission tree

Lemma. For any node $x$ in the phylogeny, either $\mathrm{host}(x) = \mathrm{first}(x)$ or $\mathrm{host}(x)$ infected $\mathrm{first}(x)$.

Theorem. A transmission tree corresponds to at most one possible assignment of interior node hosts in a phylogeny.

Combined with the earlier theorem, this implies a one-to-one relationship between the possible transmission trees and the possible assignments of interior node hosts in the phylogeny.

Postorder host sets
Within-clade constraints on the set of possible hosts

Lemma. If $x$ is an interior node, then $\mathrm{host}(x) = \mathrm{first}(x)$ or $\mathrm{host}(x) = \mathrm{host}(\mathrm{parent}(x))$.

Let $D_x$ be the set of hosts $h$ such that at least one possible transmission network within the clade $C_x$ can be generated when $\mathrm{host}(x) = h$.

Lemma. If $x$ is an interior node with child $y$ in the phylogeny, then

\[
\mathrm{host}(x) \in D_y^{*} =
\begin{cases}
D_y & \text{if } \mathrm{first}(y) \notin D_y, \\
D_y \cup V_{\mathrm{first}(y)} & \text{if } \mathrm{first}(y) \in D_y,
\end{cases}
\]

where $V_i$ is the set of possible infectors of $i$ given the epidemiologic data.

Postorder host sets
Calculation in a postorder traversal of the phylogeny

Theorem. For any interior node $x$ in the phylogeny,

\[
D_x = \bigcap_{y \in \mathrm{children}(x)} D_y^{*},
\]

where $\mathrm{children}(x)$ denotes the children of $x$.

Since $D_\ell = \{\ell\}$ for any leaf node $\ell$, we can calculate all $D_x$ in a postorder (children before parents) traversal of the phylogeny.

Example 1: 2001 FMDV outbreak in the UK
Postorder host sets

[Figure: "2001 FMDV outbreak: postorder host sets". The phylogeny plotted against onset of infectiousness (Mar 15 to Jun 01), with the postorder host set written at each node and the infectious set marked for each leaf.]

Leaf host sets and infectious sets are known.

Host sets
Ancestral constraints on the set of possible hosts

Lemma. If $x$ is an interior node, then $\mathrm{host}(x) = \mathrm{first}(x)$ or $\mathrm{host}(x) = \mathrm{host}(\mathrm{parent}(x))$.

The root node $r_0$ of the tree has a known host. For simplicity, assume that all transmission occurs within the population after an initial case. Then $\mathrm{host}(r_0)$ is the initial case.

For all other nodes, let $A_x = H_{\mathrm{parent}(x)} \cup \{\mathrm{first}(x)\}$.

Theorem. The set of possible hosts of $x$ is $H_x = A_x \cap D_x$.

The host set of $r_0$ is known, so all $H_x$ can be calculated in a preorder (parents before children) traversal of the phylogeny.

Example 1: 2001 FMDV outbreak in the UK
Host sets

[Figure: two panels. "2001 FMDV outbreak: host sets" shows the phylogeny plotted against onset of infectiousness (Mar 15 to Jun 01) with the host set written at each node; the only nodes with more than one possible host have host sets {O, C} and {K, L}. "2001 FMDV outbreak: transmission trees" shows the possible transmission links on the map of farms (longitude −1.80 to −1.65, latitude 54.54 to 54.62, 5 km scale bar).]

Root host set is known. Infectors of E and P can be chosen independently.

There are 4 possible transmission trees simultaneously consistent with the phylogeny and the epidemic data (down from 19,440).

Example 2: foot and mouth disease virus (FMDV)
Transmission cluster of 7 farms in Berkshire and Surrey, UK in 2007

[Figure: two panels titled "2007 FMDV outbreak in Berkshire and Surrey, UK". One is a map of farms 3b, 3c, 4b, 5, 6b, 7, and 8 by longitude (−0.58 to −0.52) and latitude (51.40 to 51.44) with a 2 km scale bar. The other is a timeline by farm from August 15 to October 01 showing the latent period range and the infectious period.]

There are 576 transmission trees consistent with the epidemiologic data if latent periods are allowed to vary from 2 to 16 days.

Example 2: 2007 FMDV outbreak in the UK
Phylogeny

[Figure: "2007 FMDV outbreak PhyML phylogeny", with tips IP1b, IP1b/2, IP2b, IP3b, IP3c, IP4b, IP5, IP6b, IP7, and IP8 plotted against onset of infectiousness (August 15 to October 01). Seaview header: 2007seq-PhyML_tree, Fri Apr 24 14:14:02 2015, PhyML ln(L) = −11516.0, 8176 sites, GTR, 1000 replicates, 4 rate classes. Scale bar: 0.0001.]

Multifurcations were resolved to maximize the number of possible transmission trees.

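The host sets in these examples come from a postorder pass that computes the sets D_x followed by a preorder pass that computes H_x = A_x ∩ D_x. The sketch below is a minimal illustration of those two passes, not the authors' implementation; the function signature, the data layout, and the three-leaf test phylogeny ((B, C), A) are assumptions made for the example. It returns the set of possible hosts at each node and does not enumerate joint host assignments.

```python
def host_sets(root, children, leaf_host, order, infectors, root_host):
    """Possible hosts H_x for every node x of a rooted phylogeny.

    children:  interior node -> list of child nodes
    leaf_host: leaf node -> its known host
    order:     hosts sorted by time of infection, earliest first
    infectors: host -> set of possible infectors V_i
    root_host: known host of the root node
    """
    rank = {h: k for k, h in enumerate(order)}
    first, D, H = {}, {}, {}

    def postorder(x):  # children before parents
        if x in leaf_host:
            first[x] = leaf_host[x]
            D[x] = {leaf_host[x]}
            return
        for y in children[x]:
            postorder(y)
        # first(x): earliest-infected leaf host in the clade rooted at x
        first[x] = min((first[y] for y in children[x]), key=rank.get)
        stars = []
        for y in children[x]:
            d_star = set(D[y])
            if first[y] in D[y]:
                d_star |= infectors[first[y]]  # D_y united with V_first(y)
            stars.append(d_star)
        D[x] = set.intersection(*stars)

    def preorder(x, parent):  # parents before children
        if parent is None:
            H[x] = {root_host}
        else:
            H[x] = (H[parent] | {first[x]}) & D[x]  # A_x intersected with D_x
        for y in children.get(x, []):
            preorder(y, x)

    postorder(root)
    preorder(root, None)
    return H


children = {"r": ["y", "a"], "y": ["b", "c"]}  # phylogeny ((B, C), A)
leaf_host = {"a": "A", "b": "B", "c": "C"}
infectors = {"A": set(), "B": {"A"}, "C": {"A", "B"}}
H = host_sets("r", children, leaf_host, ["A", "B", "C"], infectors, "A")
print(H["r"], H["y"])
```

On this phylogeny the interior node above B and C can be hosted by A or by B, which corresponds to the two possible infectors of C.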
Example 2: 2007 FMDV outbreak in the UK
First hosts

[Figure: "2007 FMDV outbreak: first hosts". The phylogeny plotted against onset of infectiousness (August 15 to October 01), with first hosts labeled.]

Example 2: 2007 FMDV outbreak in the UK
Postorder host sets

[Figure: "2007 FMDV outbreak: postorder host sets". The phylogeny plotted against onset of infectiousness (August 15 to October 01), with the postorder host set written at each node and the infectious set marked for each leaf.]

Leaf host sets and infectious sets are known.

Example 2: 2007 FMDV outbreak in the UK
Host sets

[Figure: two panels. "2007 FMDV outbreak: host sets" shows the phylogeny plotted against onset of infectiousness (August 15 to October 01) with the host set written at each node; the nodes with more than one possible host have host sets {4b, 3b, 7}, {4b, 3b}, and {4b, 3b}. "2007 FMDV outbreak in Berkshire and Surrey, UK" shows the possible transmission links on the map of farms (longitude −0.58 to −0.52, latitude 51.40 to 51.44, 2 km scale bar).]

Root host set is known. 6b and 7 have the same infector X, and the infector of 8 is 7 or X.

There are 4 possible transmission trees simultaneously consistent with the phylogeny and the epidemic data (down from 576).

Molecular infectious disease epidemiology
Contact interval hazard function estimates with and without a phylogeny

The farm-to-farm contact interval distribution was estimated via maximum likelihood using a log-logistic distribution with rate λ and shape γ under the "least optimistic" within-host evolutionary model.

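For reference, a log-logistic contact interval distribution can be written down as below. The slides do not state the exact parameterization, so this sketch assumes the common one in which the survival function is S(τ) = 1 / (1 + (λτ)^γ); the function names and the parameter values in the usage lines are hypothetical, not the fitted FMDV estimates.

```python
import math


def loglogistic_survival(tau, lam, gamma):
    """S(tau) = 1 / (1 + (lam * tau) ** gamma), for tau > 0."""
    return 1.0 / (1.0 + (lam * tau) ** gamma)


def loglogistic_cumulative_hazard(tau, lam, gamma):
    """H(tau) = -log S(tau) = log(1 + (lam * tau) ** gamma)."""
    return math.log1p((lam * tau) ** gamma)


def loglogistic_hazard(tau, lam, gamma):
    """h(tau) = dH/dtau, for tau > 0."""
    x = (lam * tau) ** gamma
    return gamma * lam * (lam * tau) ** (gamma - 1.0) / (1.0 + x)


# Hazard of infectious contact on days 1 to 5 after onset of infectiousness.
for day in range(1, 6):
    print(day, round(loglogistic_hazard(day, lam=0.5, gamma=2.0), 4))
```

With shape γ greater than 1 the hazard rises to a peak and then declines, and with γ at most 1 it decreases from the start, which lets the fitted curve describe infectiousness that builds up and then wanes.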
[Figure: "Farm-to-farm FMDV infectiousness: 2001 and 2007". Hazard of infectious contact (0.00 to 0.30) versus days since onset of infectiousness (0 to 12), estimated without the phylogeny and with the phylogeny.]

Pointwise 95% confidence bands were generated by calculating the hazard function for each of 4000 (λ, γ) samples from the likelihood and taking the 2.5% and 97.5% quantiles of the hazards at each time point.

Molecular infectious disease epidemiology
Simulation study: Household transmission model

To study the potential value of a phylogeny, we conducted a series of 1,000 simulations of 100 independent households of size 6.

- Each household had a single index case with an infection time chosen from an exponential(1) distribution.
- Each individual $i$ had a binary covariate $X_i$ that affected infectiousness and susceptibility such that the hazard of infectious contact from $i$ to $j$ at time $\tau$ after the onset of infectiousness in $i$ was

\[
\lambda_{ij}(\tau) = \exp(\beta_{\mathrm{inf}} X_i + \beta_{\mathrm{sus}} X_j)\, \lambda_0.
\]

- In each simulation, the true $\beta_{\mathrm{inf}}$ and $\beta_{\mathrm{sus}}$ were independently chosen from a uniform(−1, 1) distribution and $\lambda_0 = 1$.
- The infectious periods were independently chosen from an exponential(1) distribution.

Molecular infectious disease epidemiology
Simulation study: Data analysis

Data from the first 200 infections were analyzed three ways to estimate $\beta_{\mathrm{inf}}$, $\beta_{\mathrm{sus}}$, and $\lambda_0$:

- Using a parametric likelihood with only epidemiologic data on times of infection and recovery.
- Using epidemiologic data and who-infected-whom.
- Using epidemiologic data and within-household phylogenies with one sample for each infected individual.

The within-host phylogeny for an individual who infected $k - 1$ other people was chosen uniformly at random from all rooted, bifurcating phylogenies with $k$ tips.

Molecular infectious disease epidemiology
Simulation study: Relative efficiencies
[Figure, two panels. Panel 1: Phylogenetic relative efficiency: infectiousness; vertical axis: relative efficiency (0.0 to 1.5); horizontal axis: log hazard ratio for infectiousness ($\beta_{\mathrm{inf}}$, −1.0 to 1.0). Panel 2: Phylogenetic relative efficiency: baseline hazard; vertical axis: relative efficiency (0.0 to 1.5); horizontal axis: simulation number (0 to 1000). Legend for both panels: epidemiologic data; epidemiologic data with phylogeny; epidemiologic data with who-infected-whom.]
Here, relative efficiency was calculated using mean squared error (MSE).
All analyses were equally efficient for $\beta_{\mathrm{sus}}$.

Molecular infectious disease epidemiology
Simulation study: Overall relative efficiency
Compared to the analysis with epidemiologic data only, the relative efficiency of the phylogenetic analysis was 1.30 for $\beta_{\mathrm{inf}}$ and 1.15 for $\lambda_0$.
Compared to the analysis with who-infected-whom, the relative efficiencies were 0.81 and 0.94, respectively.
When parameter estimates were used to calculate the marginal probability of transmission from an infectious person to a susceptible household member for all four covariate combinations, the relative efficiencies were 1.38 and 0.90.
A phylogeny can recover much of the information that would be gained by observing who-infected-whom.

Conclusion
Implications and future directions
In emerging epidemics, more emphasis needs to be given to collecting detailed epidemiologic data from well-defined groups of contacts.
There is a tradeoff between the breadth and depth of surveillance.
It is essential to collect data on both infected people and people who were exposed to infection but not infected.
Complete pathogen sampling can be equivalent to a 10% to 30% increase in sample size for estimates of infectiousness and baseline hazards (i.e., evolution of infectiousness over time).
These methods need to be extended to account for phylogenetic uncertainty and to be integrated into non- and semiparametric statistical methods for infectious disease transmission data, most likely in a Bayesian MCMC framework.

Acknowledgements
The data on the FMDV outbreaks were made publicly available in the following publications:
Eleanor Cottam, Gaël Thébaud, Jemma Wadsworth, John Gloster, Leonard Mansley, David J. Paton, Donald P. King, and Daniel T. Haydon (2008). Integrating genetic and epidemiological data to determine transmission pathways of foot-and-mouth disease virus. Proceedings of the Royal Society B 275: 887–895.
Marco J. Morelli, Gaël Thébaud, Joël Chadœuf, Donald P. King, Daniel T. Haydon, and Samuel Soubeyrand (2012). PLoS Computational Biology 8: e1002768.
This research was supported by National Institute of Allergy and Infectious Diseases (NIAID) grant R00 AI095302 and National Institute of General Medical Sciences (NIGMS) grant U54 GM111274. The content is solely the responsibility of the authors and does not represent the official views of NIAID, NIGMS, or the National Institutes of Health.