Sources of HIV infection among men having sex with men and implications for prevention *

O. Ratmann1, A. van Sighem2, D. Bezemer2, A. Gavryushkina3, S. Jurriaans4, A. Wensing5, F. de Wolf1, P. Reiss2,6, C. Fraser1

1 Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom.
2 Stichting HIV Monitoring, Amsterdam, the Netherlands.
3 Department of Computer Science, University of Auckland, New Zealand.
4 Department of Medical Microbiology, Academic Medical Center, Amsterdam, the Netherlands.
5 Department of Medical Microbiology, University Medical Center Utrecht, Utrecht, the Netherlands.
6 Department of Global Health, Academic Medical Center, Amsterdam, the Netherlands.

Corresponding author: Oliver Ratmann [email protected]

* This manuscript has been accepted for publication in Science Translational Medicine. This version has not undergone final editing. Please refer to the complete version of record at www.sciencetranslationalmedicine.org/. The manuscript may not be reproduced or used in any manner that does not fall within the fair use provisions of the Copyright Act without the prior, written permission of AAAS.

ONE SENTENCE SUMMARY

To tailor HIV prevention strategies amongst men having sex with men, we characterized the sources of ~600 transmission events in the Netherlands. More than half of these infections could have been averted with available antiretrovirals, but only if considerably more men had tested annually.

ABSTRACT

New HIV diagnoses among men having sex with men (MSM) have not decreased appreciably in most countries, even though care and prevention services have been scaled up substantially in the past twenty years. To maximize the impact of prevention strategies, it is crucial to quantify the sources of transmission at the population level.
We used viral sequence and clinical patient data from one of Europe’s nation-wide cohort studies to estimate probable sources of transmission for 617 recently infected MSM. 71% of transmissions were from undiagnosed men, 6% from men who had initiated antiretroviral therapy (ART), 1% from men with no contact to care for at least 18 months, and 43% from those in their first year of infection. The lack of substantial reductions in incidence amongst Dutch MSM is not a result of ineffective ART provision or inadequate retention in care. In counterfactual modeling scenarios, 19% of these past cases could have been averted with current annual testing coverage and immediate ART to those testing positive. 66% of these cases could have been averted with available antiretrovirals (immediate ART provided to all MSM testing positive, and pre-exposure antiretroviral prophylaxis taken by half of all who test negative for HIV), but only if half of all men at risk of transmission had tested annually. With increasing sequence coverage, molecular epidemiological analyses can be a key tool to direct HIV prevention strategies to the predominant sources of infection, and help send HIV epidemics amongst MSM into a decisive decline.

INTRODUCTION

Combination antiretroviral therapy (ART) transformed HIV from a deadly to a life-long disease, and is also one of the most effective strategies for preventing onward infections (1, 2). However, among men having sex with men (MSM), the substantial scale-up of ART in the past twenty years has not resulted in appreciable reductions of new HIV infections and diagnoses (table 1) (3). Building on successful behavioural and biomedical HIV prevention strategies (4), further interventions exist that could be used to reduce the number of HIV infections amongst MSM.
The 2016 WHO guidelines now recommend ART initiation regardless of CD4 cell count after diagnosis (immediate ART), as well as provision of antiretrovirals as pre-exposure prophylaxis (PrEP) to those at substantial risk of infection (5). Future prevention programmes could focus on one or both recommended interventions, as well as on increased routine HIV testing and diagnosis (6); RNA testing to detect MSM in early acute infection when they are thought to be the most infectious (7); and improved adherence and linkage support to assist patients with attaining and sustaining undetectable viral loads whilst on ART (8). The potential impact of any of these interventions, and specifically those recommended by the WHO, relies crucially on how many HIV transmissions originate from different stages in the entire HIV infection and care continuum, ranging from undiagnosed acute infection through treated infection and loss to follow-up. This has been challenging to measure directly through classical epidemiological approaches.

In this study, we use the viral phylogenetic relationship between partial HIV-1 subtype B polymerase sequences to reconstruct past, probable transmission events in the Netherlands (figure 1). These sequences were routinely collected for drug resistance testing of HIV-infected patients that are in care (9). Amongst sampled MSM, 94% were of subtype B. Then, we use clinical records to determine the staging of probable transmission events within the infection and care continuum (figure 2A and table 2). This enabled us to estimate the population-level proportion of transmissions amongst the reconstructed transmission events that are attributable to fourteen stages of the infection and care continuum in figure 2A. Transmissions could be attributed to stages before diagnosis because HIV sequences, always collected after diagnosis, diverge fast enough to indicate past transmission events (10).
Similarly, transmissions could also be attributed to men with no contact to care for at least 18 months. Finally, using these estimates, we quantified the potential impact of available, but currently not implemented prevention programmes in the Dutch MSM population, had they been used in the last three years. In particular, we evaluate if the revised 2016 WHO guidelines on immediate ART and PrEP could have substantially altered the course of the Dutch HIV epidemic amongst MSM.

Understanding which interventions should be prioritized for the Dutch MSM epidemic is an important case study. First, the number of new MSM infections in the Netherlands has not decreased appreciably (9) despite comprehensive linkage and retention in care, substantial ART scale up free of charge, and frequent follow up to maintain viral control of the vast majority of those on ART (table 1). Second, similar epidemic trends are reported from other countries with an overall equally comprehensive cascade of care (table 1), casting more general doubts on the population-level impact of current prevention strategies targeting MSM epidemics (11). Third, nearly all HIV-infected MSM in care are enrolled in the clinical, national opt-out ATHENA cohort since early 1996 (12). HIV care is monitored comprehensively at high frequency (clinic visits, treatment histories, co-morbidities recorded; ~3 viral load/CD4 measurements per year per individual) (12), which allowed us to characterize phylogenetically reconstructed transmission events in detail.

RESULTS

Potential transmissions to MSM in confirmed recent infection at time of diagnosis

By 2013, 11,863 HIV-infected MSM were registered and still in care in the Netherlands. To estimate their sources of transmission and then the impact of prevention programmes, we focussed on transmissions to MSM that were recently infected at time of diagnosis (stage A in figure 1).
Between July 1996 and December 2010, 1,794 MSM had been infected at most 12 months prior to diagnosis. Types of evidence were a previous negative HIV test (76%), laboratory diagnosis (7%), or clinical diagnosis of acute infection (17%). For 1,045 (58%) of these, a sequence was available.

To these recipient MSM, we considered as potential transmitters all HIV-infected men whose course of infection overlapped with the infection window of the recipient (stage A in figure 1). With this approach, we could resolve the timing and direction of potential transmission events (13). Out of all 12,207 potential transmitters, 5,593 (46%) had a viral sequence and formed ~4.4 million potential transmission pairs with sequences available for both individuals (stage B in figure 1).

Phylogenetically probable transmission events

Genetic sequences of the virus alone cannot prove epidemiological linkage (14). However, most of the potential transmission pairs could be ruled out as implausible, based on the phylogenetic relationship of the viral sequences. The viral phylogeny among the Dutch sequences and their closest matches in the Los Alamos HIV sequence database (http://www.hiv.lanl.gov/) was reconstructed with maximum-likelihood methods, and reliable subtrees were identified (see Materials and Methods). Potential transmitters whose sequences did not occur in the same reliable subtree as those of the recipient MSM were excluded (stage C in figure 1) (14), as were potential transmitters whose sequences were incompatible with a direct HIV transmission event (stage D in figure 1) (15). Direct transmission could be excluded in 99.96% of all potential transmission pairs. We identified 903 phylogenetically probable transmitters to 617 recipient MSM in 2,343 pairs. Our analyses are based on this open observational cohort of past, phylogenetically reconstructed transmission events.
To guide and interpret this exclusion analysis, we evaluated patterns of viral divergence between sequences isolated from epidemiologically confirmed transmission pairs (16), and pairings of Dutch MSM that could not have infected each other (see Materials and Methods). Based on these pairs, the above exclusion criteria were highly specific (true transmitters to recipients are not excluded, >90%), whilst sensitivity was low (incorrect transmission pairs could not always be excluded, ~60%). This indicates that the actual transmitter is almost certainly among the phylogenetically reconstructed, probable transmitters, provided he was sequenced. From the known sequence coverage alone, we expected that approximately half of all 1,045 recipient MSM with a sequence had their actual transmitter sampled. This suggests further that the actual transmitter is among the phylogenetically reconstructed, probable transmitters for the large majority of the reconstructed 617 transmission events.

Clinical and demographic characteristics of the selected 617 recipient MSM were typical of all 1,794 MSM that were in confirmed recent infection at time of diagnosis (table 3). This indicates that the probable transmitters in the cohort are also typical of the transmitters to recently infected MSM.

Characterization of individual transmission events by stage in the HIV infection and care continuum

Using clinical records, we then enumerated all stages in the HIV infection and care continuum during which the 617 transmission events could have occurred. Probable transmitters progressed in stage over time, and overlapped with infection windows in 13,169 time-resolved, six-week-long transmission intervals (figure 2B). Censoring and sequence sampling biases were identified for each stage by comparing men with and without a sequence, and were adjusted in line with previous work (17). Reflecting targeted sequence collection, intervals were not missing at random (figures 2C and S9).
Each interval was associated with a phylogenetic transmission probability, based on the genetic distance between sequences from the transmitter and recipient and the time elapsed between the putative transmission interval and the sampling dates of both individuals (see Materials and Methods and figure S10). For each recipient, the probability that transmission occurred from one of the fourteen stages then depends on the number of his probable transmitters in that stage, and the transmission probabilities associated with each of the corresponding transmission intervals (see Materials and Methods).

Sources of HIV transmission

The population-level proportions of HIV transmissions attributable to the fourteen infection/care stages were obtained by summing individual-level transmission probabilities by stage across all recipients, and are shown in table 4. Figure 3 compares the proportion of transmissions from each stage to the population-level proportion of infected men in these stages. Between July 1996 and December 2010, an estimated 71% [66%-73%] of all 617 transmission events originated from undiagnosed men, 22% [21%-26%] from diagnosed but not yet treated men, 6% [5%-8%] from men who initiated ART and 1% [0.7%-1.6%] from men with no contact to care for at least 18 months. An estimated 43% [37%-46%] of the 617 recipient MSM were infected by men undergoing their first year of infection.

Impact of prevention strategies

Figure 4 describes the counterfactual prevention scenarios for which we calculated the proportion of transmissions in the cohort that could have been averted between mid 2008 and December 2010, had we intervened to re-distribute the identified, probable transmitters to less infectious infection/care stages. Young MSM are at particularly high risk of infection (18, 19).
We therefore considered, in line with the revised 2016 WHO guidelines (5), roll-out of immediate ART to all infected MSM and PrEP to half of all MSM aged 30 or less that test negative: at most 30% [22%-39%] of infections could have been averted without increased annual testing. Immediate ART alone could have averted 19% [13%-26%] of these cases at current testing levels. In practice, low adherence is associated with decreasing effectiveness of PrEP (20). We assumed an 86% efficacy of PrEP as reported in the recent Ipergay and PROUD trials (21, 22). Figure S12 reports the impact of lower efficacy values. Figure S13 reports the impact of lower or higher PrEP coverage.

Next, we considered increased annual testing. Only 17% of identified probable transmitters had a last negative test in the year before diagnosis, compared to 27% of diagnosed MSM between mid 2008 and December 2010 and 38% of uninfected MSM in 2013 (table 1). If half of all transmitters had tested annually, immediate ART and PrEP to half of all MSM aged 30 or less that test negative could have averted 45% [34%-56%] of infections. Additional roll-out of PrEP to half of all men testing negative would have substantially boosted the combination intervention: 66% [50%-78%] of infections could have been averted.

DISCUSSION

HIV epidemics amongst MSM have, unlike other settings (23), not declined appreciably with substantial improvements to care and ART scale-up (table 1). We characterized 617 past transmission events amongst MSM in the Netherlands based on phylogenetic and clinical data, estimated their sources throughout the infection and care continuum, and quantified the impact that biomedical prevention programmes could have had in averting the reconstructed transmission events. Analysing this transmission cohort, we aim to inform the design of future prevention interventions beyond high levels of ART coverage and the numerous successful behavioural interventions that are already in place (9).
A potential limitation of this study is that transmitters to MSM in recent infection at diagnosis may differ from typical transmitters. On average, fewer men diagnosed late with a CD4 count below 350 cells/µl occurred in phylogenetic transmission clusters with a recipient MSM, compared to those without (figure S23). This may imply that overall, the proportion of transmissions from undiagnosed men in chronic infection is higher, and consequently that the impact that immediate ART could have had is lower than our estimates. Conversely, the impact of increased annual testing and PrEP could be larger than reported, if men diagnosed late are not more difficult to reach than the average transmitter in our cohort. Further, this study focuses on the sources and prevention of in-country transmissions: 97% of the recipient MSM reported that infection was likely acquired in the Netherlands compared to 86% of diagnosed MSM. The contribution of cross-border transmissions may increase as the response is strengthened (24), an effect which we did not consider. Phylogenetic uncertainty and the phylogenetic exclusion criteria had little impact on our findings (figures S14-S22). A further potential caveat to the robustness of our findings is that only half of all potential transmitters had a viral sequence sampled. Although population-level sampling biases were adjusted, we must acknowledge that the actual transmitter may not have been sampled for all recipients. Improving sequence sampling coverage at time of diagnosis is needed to facilitate phylogenetic prevention analyses (25).

The identified sources of transmission imply, first, that viral suppression induced by ART is highly effective in preventing transmissions in this population (figure 3).
The relative risk of HIV transmission from men after ART initiation varies by stage but is always estimated well below one when compared to diagnosed, untreated men with a CD4 count above 500 cells/µl, and is in particular 0.04 [0.02-0.1] for men with viral suppression (figure S11).

Second, very few transmissions are attributable to temporary or permanent loss to follow up, which must be considered in the context of high linkage and retention to care in the Netherlands: few diagnosed MSM had subsequently no contact to care for at least 18 months (8.2%) and most re-entered care within five years (69%) (9). In contrast, several studies indicate that more than half of all transmissions amongst MSM in the United States originate from men that were not retained in care (26-28). The estimated impact of particular prevention strategies in figure 4 is limited to settings with a similar epidemic profile and care cascade as the Netherlands (table 1).

Third, not more than an estimated 20% of infections in the cohort could have been averted between mid 2008 and December 2010 with immediate ART after diagnosis. Given the remarkable expansion of ART coverage in the Netherlands in the past (9), the prevention potential of immediate ART is now limited. Nonetheless, starting ART at a CD4 cell count above 500 cells/µl leads to improved clinical outcomes and remains a priority (29).

Fourth, and similar to other locations (25, 30), almost half of all infections in our transmission cohort originated from men in their first year of infection. Frequent early transmission limits the overall impact of annual testing plus immediate ART to those testing positive (figure 4), and implies that prevention services to uninfected MSM must be strengthened. The substantial, estimated impact that PrEP would have had in averting transmissions in our cohort (figure 4) supports making PrEP available to MSM testing negative as in the United States (31).
Recent PrEP demonstration projects (32, 33) indicate that existing barriers such as low awareness (34) and a lack of experience amongst providers (35) can be addressed. Concerns regarding the toxicity of PrEP, increasing sexual risk behaviour and emerging drug resistance have to date not been substantiated since PrEP was made available in the United States (36). In the context of PrEP-experienced prevention services, high discontinuation rates after PrEP initiation appear to be the greatest challenge to maintain protection from infection (32).

Fifth, without substantial increases to current annual testing coverage, ART and PrEP offered along the revised 2016 WHO guidelines could not have prevented more than a quarter of all infections in our transmission cohort. Since phylogenetically probable transmitters tend to test much less frequently than the average diagnosed MSM, substantial barriers likely exist in reaching men at high risk of onward transmission, and further work is needed to characterize these (37). Strategies such as self-testing (38), community-based testing (39), and more provider-initiated routine testing in general practices and at medical admissions raised annual testing coverage in pilot projects (12), and need to be expanded alongside biomedical interventions.

Sixth, this study indicates that substantial reductions in HIV incidence amongst MSM could have been realized with a combination approach that includes, critically, increased annual testing, with uptake of PrEP by young MSM testing negative and provision of immediate ART to those testing positive. This finding is primarily based on the impact of increased annual testing and the higher efficacy of PrEP reported in two recent randomized controlled trials (21, 22), and updates previous studies that estimate more limited benefits (4, 40, 41).
Beyond age at testing, other characteristics not available to this study may also indicate high infection risk (42), and thereby identify groups of MSM to which PrEP should be made available as a priority. Provision of PrEP to all men testing negative is not affordable at current drug prices in high-income countries (40). The magnitude of the predicted impact of test-and-PrEP-and-treat for all (figure 4) could set an aspirational target for the fight against HIV amongst MSM.

The lack of substantial reductions in incidence amongst Dutch MSM is not a result of ineffective ART provision or inadequate retention in care. New HIV infections amongst MSM are challenging to prevent due to frequent early transmission and continued low testing uptake of men at risk of transmission. Counterfactual prevention scenarios on phylogenetically reconstructed, past transmission events to MSM in recent infection at diagnosis predict that increased annual testing and uptake of PrEP by men at high risk of infection have a key role to send the HIV epidemic amongst MSM into a decisive decline.

MATERIALS AND METHODS

Study design

We conducted a retrospective viral phylogenetic transmission and prevention study that focuses on transmissions to MSM in confirmed recent HIV infection at time of diagnosis in the Netherlands (figure 1). The pre-specified objectives were to, first, reconstruct past, phylogenetically probable transmission events to these recipient MSM; second, to estimate the proportion of transmissions originating throughout the infection and care continuum based on the reconstructed transmission events; and, third, to estimate the proportion of infections that could have been averted through reallocating past, probable transmitters to less infectious stages in counterfactual modeling scenarios.
The ATHENA national observational HIV cohort includes anonymized data of all HIV-infected patients followed longitudinally in the 27 HIV treatment centres in the Netherlands since 1996, except 1.5% who opt-out (9). ATHENA patients are informed of data collection by their treating physician and can refuse further collection of clinical data according to an opt-out procedure. Patients who were diagnosed between 1981 and 1995 were included in the cohort when they were still alive in 1996 (9). Demographic, clinical, and viral sequence data were collected at entry and follow-up visits as described previously (9). By March 2013, viral sequence data had been systematically entered until December 2010. Therefore, recipients were enrolled between early 1996 and December 2010. Potential transmitters were enrolled until database closure in March 2013. Table S1 characterizes the demographic, clinical, and viral sequence data that were used in this study. The resolution of the infection/care stages in table 2 was adjusted to ensure adequate sample sizes. The number of probable transmission intervals after first viral suppression was too small to enable further stratification by treatment class. This study was reviewed and approved by the HIV Monitoring Institutional Data Access and Ethics Committee, and reported in line with STROME-ID guidelines.

Viral sequences of different subtypes (n=355 from MSM), with fewer than 250 nucleotides (n=368) or indication for intra-subtype recombination (n=52) were removed prior to analysis. Primary drug resistance mutations were masked in each sequence (43). Demographic and clinical data were checked for consistency along patient timelines, and to lie within appropriate ranges. Outliers were reported to the ATHENA quality control team, and manually updated.

Recently infected, recipient MSM and infection windows

We enrolled as recipients all MSM for whom a narrow infection window could be identified.
MSM had evidence for infection within 12 months prior to diagnosis if either a last negative HIV-1 antibody test in the 12 months preceding diagnosis, an indeterminate HIV-1 western blot, or clinical diagnosis of acute infection were reported. Figure S1 shows enrollment progress over time. Infection windows were at most 12 months, or shorter if indicated by a last negative HIV antibody test (figure S2).

Potential transmitters to recipient MSM

We enrolled as potential transmitters all registered infected men that overlapped with infection windows of recipients, and thus could have in principle infected a recipient. This definition required estimation of putative infection times. Calculations are based on a method by Rice and colleagues (44), see the online supplementary material. Estimated infection times are associated with substantial uncertainty, and sensitivity analyses were conducted for lower and upper 95% estimates. Table S2 characterizes the potential transmitters to all recipients. Further analysis was restricted to potential transmission pairs with sequences from both individuals (stage B in figure 1).

Viral phylogenetic exclusion analysis to construct the transmission cohort

The viral phylogeny was reconstructed under the GTR nucleotide substitution model with maximum-likelihood methods (45) and is shown in figure S3. 500 bootstrap trees were created to quantify uncertainty in tree reconstruction (14). Genetic distances between sequences from transmitter-recipient pairs were highly variable (figure S4), which was accounted for in all analyses. To guide our choice of exclusion criteria, we considered, first, epidemiologically confirmed transmission pairs from previously published transmission chains in Belgium and Sweden (16, 46). The Belgian transmission chain was subsequently oversampled (15), providing 2,807 sequence pairs from confirmed transmitters and recipients without multi-drug resistance.
Further, we considered 4,117 pairs of sequences from the same Dutch patient and 201,605 pairs of Dutch patients in which one patient died before the last negative antibody test of the other. These pairs were used to quantify patterns of viral evolutionary diversification that can be expected among confirmed linked and unlinked pairs, and to develop exclusion criteria with high specificity; see online supplementary material. The Swedish pairs were used for validation purposes. All potential transmitters that were not excluded were considered phylogenetically probable, and are characterized in table S4.

Relative pairwise transmission probabilities

Among the 2,807 confirmed transmission pairs (15), the genetic distance between sequences from the transmitter and the recipient was strongly associated with the time elapsed between both sampling dates and the midpoint of the established infection window (figure S5). We fitted a probabilistic molecular clock model to these data to describe the relative probability of observing a given genetic distance between sequences from a transmission pair that diverged for a specified amount of time from each other. The fitted model was then used to express the relative probability that a phylogenetically identified transmitter was the actual transmitter to a recipient (figure S5).

Matching of clinical data to associate infection/care stages with transmission intervals

Sources of transmission were not defined in terms of individuals, but in terms of the fourteen stages in the infection and care continuum in table 2 (stage E in figure 1). Stages were allocated to transmission intervals based on available clinical data (table S1). The duration of transmission intervals was set to six weeks to accommodate abrupt changes in infection/care stages.

Adjusting for censoring and sequence sampling biases

Towards the present, an increasing fraction of potential transmitters may not have been diagnosed by the time of database closure.
Potential transmitters in recent infection at time of diagnosis must, by definition, have been diagnosed within 12 months after the putative transmission interval. Therefore, the extent of right censoring differs between stages. To adjust for right censoring, we counted when potential transmitters in a particular infection/care stage became diagnosed in relation to the time of diagnosis of their recipient (figure S6). This enabled us to estimate the proportion of censored intervals for a hypothetical database closure time in the past (figure S6). We then extrapolated these estimates to the actual database closure time with a bootstrap algorithm; see the online supplementary material.

To quantify sequence sampling biases, we compared men with and without a sequence in the near complete population cohort (figure S7). A Negative Binomial missing data model was then used to adjust for the number of missing transmission intervals (17). Adjustments accounted for censoring; increasing sampling frequency with duration in care; high sampling frequency of men returning to care, men participating in particular sub-studies, and men with indication of drug-resistance; as well as increasing sampling frequency with calendar time (figure S7).

Epidemiological transmission analysis

Each interval was associated with a phylogenetic transmission probability (stage F in figure 1). The relative pairwise transmission probabilities (figure S5) were equally apportioned to all observed intervals of the same transmitter-recipient pair. Stage-specific data such as viral load was not used to determine these probabilities, to avoid circularity in the attribution of transmissions to infection/care stages.
Then, the transmission probability in an observed interval $\tau$ from transmitter $i$ to recipient $j$ was calculated by

$$p_{ij\tau} = \frac{\omega_{ij\tau}}{\sum_{k,s} \omega_{kjs} + \sum_{z} m_j(z)\,\omega(z)},$$

where $\omega_{ij\tau}$ is the relative transmission probability in interval $\tau$, and the denominator sums over all observed, competing intervals as well as expected missing intervals $m_j(z)$ in stage $z$ to recipient $j$. For missing intervals, relative transmission probabilities were imputed and set to the median $\omega_{ijs}$ of all observed intervals $s$ in stage $z$, denoted by $\omega(z)$. For a missing transmission interval $\upsilon$ in stage $x$ to recipient $j$, we calculated

$$p_{j\upsilon} = \frac{\omega(x)}{\sum_{k,s} \omega_{kjs} + \sum_{z} m_j(z)\,\omega(z)}.$$

In 24 cases, two recipients were each other’s phylogenetically probable transmitter. We considered transmission in each direction equally likely. The relative transmission probabilities $\omega_{ij\tau}$ were calculated by

$$\omega_{ij\tau} = \frac{\omega_{ij}\,\varphi_{ij}}{\tau_{ij}},$$

where $\varphi_{ij}$ equals 0.5 if $i$ and $j$ are each other’s phylogenetically probable transmitters and otherwise one, $\omega_{ij}$ are the relative pairwise probabilities shown in figure S5, and $\tau_{ij}$ is the number of transmission intervals between transmitter $i$ and recipient $j$.

These probabilities sum to one per recipient. If all transmitters are sampled, we obtain $p_{ij\tau} = \omega_{ij\tau} / \sum_{k,s} \omega_{kjs}$. If some transmitters are not sampled, the first part of the denominator, $\sum_{k,s} \omega_{kjs}$, is smaller and adjusted by the second part of the denominator. The number of expected missing intervals $m_j(z)$ differs by stage, and adjusts for stage-specific censoring and sampling biases.

The proportion of transmissions originating from the fourteen infection/care stages were obtained by summing the corresponding individual-level transmission probabilities (figure S8).
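The interval-level calculation above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the stage labels and all numbers are hypothetical and chosen only to show how the relative weights are apportioned to intervals and then normalised over observed and expected missing intervals for a single recipient.

```python
def interval_weight(pair_weight, n_intervals, mutual=False):
    """omega_ij_tau = omega_ij * phi_ij / tau_ij for one transmitter-recipient pair."""
    # phi is 0.5 when both men are each other's probable transmitter, else 1
    phi = 0.5 if mutual else 1.0
    return pair_weight * phi / n_intervals


def normalise_for_recipient(observed, expected_missing, median_weight):
    """Turn relative weights into probabilities for a single recipient j.

    observed:         list of (transmitter, stage, omega_ij_tau), one per observed interval
    expected_missing: dict stage z -> m_j(z), expected number of missing intervals
    median_weight:    dict stage z -> omega(z), median weight of observed intervals in z
    """
    denominator = sum(w for _, _, w in observed) + sum(
        m * median_weight[z] for z, m in expected_missing.items()
    )
    # Probability for each observed interval
    p_observed = [(i, z, w / denominator) for i, z, w in observed]
    # Probability for one missing interval in each stage
    p_missing = {z: median_weight[z] / denominator for z in expected_missing}
    return p_observed, p_missing


# Toy recipient (hypothetical values): transmitter "a" spans three intervals,
# transmitter "b" spans one.
w_a = interval_weight(0.6, 3)
w_b = interval_weight(0.2, 1)
observed = [
    ("a", "undiagnosed", w_a),
    ("a", "undiagnosed", w_a),
    ("a", "diagnosed", w_a),
    ("b", "undiagnosed", w_b),
]
expected_missing = {"undiagnosed": 2.0, "diagnosed": 0.5}
median_weight = {"undiagnosed": 0.2, "diagnosed": 0.2}

p_observed, p_missing = normalise_for_recipient(observed, expected_missing, median_weight)

# Observed plus expected missing intervals together account for all probability mass.
total = sum(p for _, _, p in p_observed) + sum(
    expected_missing[z] * p_missing[z] for z in expected_missing
)
print(round(total, 12))
```

In this toy case the expected missing intervals take up part of the denominator, so each observed interval receives a smaller probability than it would if all transmitters had been sampled.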
Precisely, the proportion of transmissions from stage $x$ to recipients diagnosed in $[t_1, t_2]$ was calculated by

$$P_T(x, t_1, t_2) = \frac{\sum_{j \in R(t_1,t_2)} p_j(x)}{\sum_{z} \sum_{j \in R(t_1,t_2)} p_j(z)} = \frac{1}{J} \sum_{j \in R(t_1,t_2)} p_j(x),$$

where $R(t_1, t_2)$ is the set of recipients with date of diagnosis in $[t_1, t_2]$, $J$ is the number of recipients with date of diagnosis in $[t_1, t_2]$, and $p_j(x)$ is the probability that recipient $j$ was infected by a transmitter in stage $x$. The probability $p_j(x)$ is the sum

$$p_j(x) = \sum_{i \in I_j} \sum_{\tau \in V_{ij}(x)} p_{ij\tau} + \sum_{\upsilon=1}^{m_j(x)} p_{j\upsilon},$$

where $I_j$ are the observed, phylogenetically probable transmitters to recipient $j$, $V_{ij}(x)$ is the set of observed transmission intervals between $i$ and $j$ in stage $x$, and all other quantities are as defined above.

The formula for $P_T(x, t_1, t_2)$ can be interpreted intuitively as the average probability that a recipient was infected by a transmitter in stage $x$. Thus, the precision in the estimated $P_T(x, t_1, t_2)$ depends primarily on the number of available recipients. We identified substantial individual-level variation in the transmission probabilities $p_j(x)$ (figure S8), suggesting that a relatively large number of past transmission events is needed in order to reliably quantify sources of transmission.

To obtain a central estimate of $P_T(x, t_1, t_2)$, we used the central estimates of the $\omega_{ij\tau}$ and the expected number of missing transmission intervals. To quantify uncertainty in $P_T(x, t_1, t_2)$, we propagated uncertainty in the genetic distances and the number of missing transmission intervals with a bootstrap algorithm.

Epidemiological prevention analysis

With the sources of transmission estimated, we compared the impact of prevention strategies in counterfactual scenarios that modelled the re-distribution of phylogenetically identified transmitters to less infectious stages in the HIV infection and care continuum. This reduced the overall probability that any of the recipients would have been infected to less than one.
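As an illustration of the stage-level proportion $P_T(x, t_1, t_2)$ defined earlier in this section, the following Python sketch averages per-recipient probabilities $p_j(x)$ and attaches a percentile bootstrap interval. It is a simplified sketch under stated assumptions, not the authors' procedure: it resamples recipients only, whereas the study also propagates uncertainty in genetic distances and in the number of missing intervals. The function names and toy values are hypothetical.

```python
import math
import random

def stage_proportions(p_by_recipient):
    """Average p_j(x) over recipients: P_T(x) = (1/J) * sum_j p_j(x).

    p_by_recipient: list of dicts stage -> p_j(x), each summing to one.
    """
    n_recipients = len(p_by_recipient)
    stages = sorted({x for p in p_by_recipient for x in p})
    return {x: sum(p.get(x, 0.0) for p in p_by_recipient) / n_recipients
            for x in stages}

def bootstrap_ci(p_by_recipient, stage, n_boot=1000, seed=0):
    """Percentile 95% interval for P_T(stage), resampling recipients only."""
    rng = random.Random(seed)
    n_recipients = len(p_by_recipient)
    estimates = []
    for _ in range(n_boot):
        # Draw J recipients with replacement and recompute the proportion.
        sample = [p_by_recipient[rng.randrange(n_recipients)]
                  for _ in range(n_recipients)]
        estimates.append(stage_proportions(sample).get(stage, 0.0))
    estimates.sort()
    lower = estimates[(25 * n_boot) // 1000]
    upper = estimates[(975 * n_boot) // 1000 - 1]
    return lower, upper

# Toy data (hypothetical): three recipients, two stages.
toy = [
    {"undiagnosed": 0.8, "diagnosed": 0.2},
    {"undiagnosed": 0.6, "diagnosed": 0.4},
    {"undiagnosed": 1.0},
]
proportions = stage_proportions(toy)
lower, upper = bootstrap_ci(toy, "undiagnosed")
```

Because each recipient contributes one term to the average, the width of the interval in this sketch is driven by the number of recipients and by how much $p_j(x)$ varies between them, in line with the remark above on precision.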
The proportion of infections that could have been averted in the period $[t_1, t_2]$ with a counterfactual prevention scenario $H$ is

$$a(H) = 1 - \frac{1}{J} \sum_{j \in R(t_1,t_2)} \sum_{x} p_j^H(x),$$

where $p_j^H(x)$ is the probability that recipient $j$ is infected by someone in stage $x$ under the counterfactual prevention scenario $H$, and $J$ is the number of recipients diagnosed in $[t_1, t_2]$ as above. The individual-level prevention models are described in the supplementary online material.

Statistical uncertainty

Central estimates of $P_T(x, t_1, t_2)$ and $a(H)$ were obtained under central estimates of the genetic distances in figure S4, the resulting phylogenetic transmission probabilities $\omega_{ij\tau}$, and the expected number of missing transmission intervals (figure 2C). Bootstrap sampling of the recipients, the empirical distribution of genetic distances, the number of missing transmission intervals under a Negative Binomial missing data model, and the counterfactual re-allocation procedure of probable transmitters to less infectious infection/care stages was conducted to obtain non-parametric 95% confidence intervals. Confidence intervals are based on 1,000 bootstrap replicates.

Supplementary Materials

Word document Online Materials and Methods
Fig. S1 Number of identified recipient MSM by 3-month intervals.
Fig. S2 Duration of infection windows of recipient MSM.
Fig. S3 Snapshot of the reconstructed viral phylogeny.
Fig. S4 Uncertainty in the estimated genetic distance between sequences from the transmitter and recipient of potential transmission pairs.
Fig. S5 Genetic distance between sequence pairs from previously published, epidemiologically confirmed transmitter-recipient pairs, and sequence pairs from the phylogenetically probable transmission pairs in this study.
Fig. S6 Right censoring at past, hypothetical database closure times.
Fig. S7 Sequence sampling probabilities by stage in the infection and care continuum.
Fig. S8 Individual-level variation in phylogenetically derived transmission probabilities by infection/care stages.
Fig. S9 Frequency of infection/care stages among phylogenetically probable transmitters.
Fig. S10 Phylogenetically derived transmission probabilities of observed transmission intervals.
Fig. S11 Transmission risk ratio from men after ART start, compared to diagnosed untreated men with CD4 > 500 cells/µl.
Fig. S12 Sensitivity analysis on the impact of PrEP with lower efficacy.
Fig. S13 Sensitivity analysis on the impact of lower or higher PrEP coverage.
Fig. S14 Impact of sampling and censoring adjustments on the estimated proportion of transmissions from stages in the infection and care continuum.
Fig. S15 Impact of phylogenetic transmission probabilities on the estimated proportion of transmissions from stages in the infection and care continuum.
Fig. S16 Impact of infection time estimates on the estimated proportion of transmissions from stages in the infection and care continuum.
Fig. S17 Impact of phylogenetic clustering criteria on the estimated proportion of transmissions from stages in the infection and care continuum.
Fig. S18 Impact of additional genetic distance criteria on the estimated proportion of transmissions from stages in the infection and care continuum.
Fig. S19 Impact of sequence sampling and censoring adjustments on the estimated proportion of averted infections.
Fig. S20 Impact of phylogenetic transmission probabilities on the estimated proportion of averted infections.
Fig. S21 Impact of infection time estimates and phylogenetic exclusion criteria on the estimated proportion of averted infections.
Fig. S22 Impact of additional genetic distance criteria on the estimated proportion of averted infections per biomedical intervention.
Fig. S23 Differences in transmission networks with and without a recipient MSM.
Table S1 Clinical and viral sequence data used in this study.
Table S2 Potential transmitters and potential transmission pairs to the recipient MSM.
Table S3 Identified phylogenetically probable transmitters and phylogenetically probable transmission pairs to the recipient MSM in the ATHENA cohort.

References and Notes

1. M. S. Cohen, Y. Q. Chen, M. McCauley, T. Gamble, M. C. Hosseinipour, N. Kumarasamy, J. G. Hakim, J. Kumwenda, B. Grinsztejn, J. H. Pilotto, S. V. Godbole, S. Mehendale, S. Chariyalertsak, B. R. Santos, K. H. Mayer, I. F. Hoffman, S. H. Eshleman, E. Piwowar-Manning, L. Wang, J. Makhema, L. A. Mills, G. de Bruyn, I. Sanne, J. Eron, J. Gallant, D. Havlir, S. Swindells, H. Ribaudo, V. Elharrar, D. Burns, T. E. Taha, K. Nielsen-Saines, D. Celentano, M. Essex, T. R. Fleming, Prevention of HIV-1 infection with early antiretroviral therapy. N Engl J Med 365, 493-505 (2011).
2. A. Rodger, T. Bruun, V. Cambiano, P. Vernazza, V. Estrada, J. Van Lunzen, S. Collins, A. M. Geretti, A. Phillips, J. Lundgren, HIV transmission risk through condomless sex if HIV+ partner on suppressive ART: PARTNER Study, 21st Conference on Retroviruses and Opportunistic Infections, Boston, MA, USA, 2014.
3. C. Beyrer, S. D. Baral, F. van Griensven, S. M. Goodreau, S. Chariyalertsak, A. L. Wirtz, R. Brookmeyer, Global epidemiology of HIV infection in men who have sex with men. Lancet 380, 367-377 (2012).
4. P. S. Sullivan, A. Carballo-Dieguez, T. Coates, S. M. Goodreau, I. McGowan, E. J. Sanders, A. Smith, P. Goswami, J. Sanchez, Successes and challenges of HIV prevention in men who have sex with men. Lancet 380, 388-399 (2012).
5. World Health Organization, Guideline on when to start antiretroviral therapy and on pre-exposure prophylaxis for HIV, No. September 2015, Geneva, 2015.
6. A. Fogarty, L. Mao, Z. M. I, H. Santana, G. Prestage, J. Rule, P. Canavan, D. Murphy, M. D, The Health in Men and Positive Health cohorts: A comparison of trends in the health and sexual behaviour of HIV-negative and HIV-positive gay men, 2002-2005, National Centre in HIV Social Research, Sydney, 2006.
7. C. D. Pilcher, S. A. Fiscus, T. Q. Nguyen, E. Foust, L. Wolf, D. Williams, R. Ashby, J. O. O'Dowd, J. T. McPherson, B. Stalzer, L. Hightow, W. C. Miller, J. J. Eron, Jr., M. S. Cohen, P. A. Leone, Detection of acute infections during HIV testing in North Carolina. N Engl J Med 352, 1873-1883 (2005).
8. H. A. Weiss, J. N. Wasserheit, R. V. Barnabas, R. J. Hayes, L. J. Abu-Raddad, Persisting with prevention: the importance of adherence for HIV prevention. Emerg Themes Epidemiol 5, 8 (2008).
9. A. van Sighem, L. Gras, A. Kesselring, C. Smit, I. Engelhard, I. Stolte, P. Reiss, Monitoring of human immunodeficiency virus infection in the Netherlands. Report 2013, Amsterdam, 2013.
10. T. T. Lam, C. C. Hon, J. W. Tang, Use of phylogenetics in the molecular epidemiology and evolutionary studies of viral infections. Crit Rev Clin Lab Sci 47, 5-49 (2010).
11. D. P. Wilson, HIV treatment as prevention: natural experiments highlight limits of antiretroviral treatment as HIV prevention. PLoS Med 9, e1001231 (2012).
12. Public Health England, Time to test for HIV: Expanding HIV testing in healthcare and community services in England, 2011.
13. E. Romero-Severson, H. Skar, I. Bulla, J. Albert, T. Leitner, Timing and order of transmission events is not directly reflected in a pathogen phylogeny. Mol Biol Evol 31, 2472-2482 (2014).
14. D. Pillay, A. Rambaut, A. M. Geretti, A. J. Brown, HIV phylogenetics. BMJ 335, 460-461 (2007).
15. B. Vrancken, A. Rambaut, M. A. Suchard, A. Drummond, G. Baele, I. Derdelinckx, E. Van Wijngaerden, A. M. Vandamme, K. Van Laethem, P. Lemey, The genealogical population dynamics of HIV-1 in a large transmission chain: bridging within and among host evolutionary rates. PLoS Comput Biol 10, e1003505 (2014).
16. P. Lemey, I. Derdelinckx, A. Rambaut, K. Van Laethem, S. Dumont, S. Vermeulen, E. Van Wijngaerden, A. M. Vandamme, Molecular footprint of drug-selective pressure in a human immunodeficiency virus transmission chain. J Virol 79, 11981-11989 (2005).
17. R. J. A. Little, D. B. Rubin, Statistical analysis with missing data. (Wiley, New York; Chichester, 1987).
18. F. van Griensven, T. H. Holtz, W. Thienkrua, W. Chonwattana, W. Wimonsate, S. Chaikummao, A. Varangrat, T. Chemnasiri, W. Sukwicha, M. E. Curlin, T. Samandari, A. Chitwarakorn, P. A. Mock, Temporal trends in HIV-1 incidence and risk behaviours in men who have sex with men in Bangkok, Thailand, 2006-13: an observational study. Lancet HIV 2, e64-70 (2015).
19. F. D. Koedijk, B. H. van Benthem, E. M. Vrolings, W. Zuilhof, M. A. van der Sande, Increasing sexually transmitted infection rates in young men having sex with men in the Netherlands, 2006-2012. Emerg Themes Epidemiol 11, 12 (2014).
20. R. M. Grant, J. R. Lama, P. L. Anderson, V. McMahan, A. Y. Liu, L. Vargas, P. Goicochea, M. Casapia, J. V. Guanira-Carranza, M. E. Ramirez-Cardich, O. Montoya-Herrera, T. Fernandez, V. G. Veloso, S. P. Buchbinder, S. Chariyalertsak, M. Schechter, L. G. Bekker, K. H. Mayer, E. G. Kallas, K. R. Amico, K. Mulligan, L. R. Bushman, R. J. Hance, C. Ganoza, P. Defechereux, B. Postle, F. Wang, J. J. McConnell, J. H. Zheng, J. Lee, J. F. Rooney, H. S. Jaffe, A. I. Martinez, D. N. Burns, D. V. Glidden, iPrEx Study Team, Preexposure chemoprophylaxis for HIV prevention in men who have sex with men. N Engl J Med 363, 2587-2599 (2010).
21. S. McCormack, D. T. Dunn, M. Desai, D. I. Dolling, M. Gafos, R. Gilson, A. K. Sullivan, A. Clarke, I. Reeves, G. Schembri, N. Mackie, C. Bowman, C. J. Lacey, V. Apea, M. Brady, J. Fox, S. Taylor, S. Antonucci, S. H. Khoo, J. Rooney, A. Nardone, M. Fisher, A. McOwan, A. N. Phillips, A. M. Johnson, B. Gazzard, O. N. Gill, Pre-exposure prophylaxis to prevent the acquisition of HIV-1 infection (PROUD): effectiveness results from the pilot phase of a pragmatic open-label randomised trial. Lancet, (2015).
22. J. M. Molina, C. Capitant, B. Spire, G. Pialoux, C. Chidiac, I. Charreau, C. Tremblay, L. Meyer, J. F. Delfraissy, in CROI. (Seattle, 2015).
23. F. Tanser, T. Barnighausen, E. Grapsa, J. Zaidi, M. L. Newell, High coverage of ART associated with decline in risk of HIV acquisition in rural KwaZulu-Natal, South Africa. Science 339, 966-971 (2013).
24. D. Frentz, A. M. Wensing, J. Albert, D. Paraskevis, A. B. Abecasis, O. Hamouda, L. B. Jorgensen, C. Kucherer, D. Struck, J. C. Schmit, B. Asjo, C. Balotta, D. Beshkov, R. J. Camacho, B. Clotet, S. Coughlan, S. De Wit, A. Griskevicius, Z. Grossman, A. Horban, T. Kolupajeva, K. Korn, L. G. Kostrikis, K. Liitsola, M. Linka, C. Nielsen, D. Otelea, R. Paredes, M. Poljak, E. Puchhammer-Stockl, A. Sonnerborg, D. Stanekova, M. Stanojevic, A. M. Vandamme, C. A. Boucher, D. A. Van de Vijver, SPREAD Programme, Limited cross-border infections in patients newly diagnosed with HIV in Europe. Retrovirology 10, 36 (2013).
25. B. G. Brenner, M. A. Wainberg, Future of phylogeny in HIV prevention. J Acquir Immune Defic Syndr 63 Suppl 2, S248-254 (2013).
26. A. B. Cope, K. A. Powers, J. D. Kuruc, P. A. Leone, J. A. Anderson, L. H. Ping, L. P. Kincer, R. Swanstrom, V. L. Mobley, E. Foust, C. L. Gay, J. J. Eron, M. S. Cohen, W. C. Miller, Ongoing HIV Transmission and the HIV Care Continuum in North Carolina. PLoS One 10, e0127950 (2015).
27. J. Skarbinski, E. Rosenberg, G. Paz-Bailey, H. I. Hall, C. E. Rose, A. H. Viall, J. L. Fagan, A. Lansky, J. H. Mermin, Human immunodeficiency virus transmission at each step of the care continuum in the United States. JAMA Intern Med 175, 588-596 (2015).
28. E. S. Rosenberg, G. A. Millett, P. S. Sullivan, C. Del Rio, J. W. Curran, Understanding the HIV disparities between black and white men who have sex with men in the USA using the HIV care continuum: a modeling study. Lancet HIV 1, e112-e118 (2014).
29. INSIGHT START Study Group, J. D. Lundgren, A. G. Babiker, F. Gordin, S. Emery, B. Grund, S. Sharma, A. Avihingsanon, D. A. Cooper, G. Fatkenheuer, J. M. Llibre, J. M. Molina, P. Munderi, M. Schechter, R. Wood, K. L. Klingman, S. Collins, H. C. Lane, A. N. Phillips, J. D. Neaton, Initiation of Antiretroviral Therapy in Early Asymptomatic HIV Infection. N Engl J Med 373, 795-807 (2015).
30. E. Volz, E. Ionides, E. Romero-Severson, M. G. Brandt, E. Mokotoff, J. Koopman, HIV-1 Transmission During Early Infection in Men Who Have Sex with Men: A Phylodynamic Analysis. PLoS Med 10, e1001568 (2013).
31. U.S. Food and Drug Administration, Truvada approved to reduce the risk of sexually transmitted HIV in people who are not infected with the virus (2012).
32. R. M. Grant, P. L. Anderson, V. McMahan, A. Liu, K. R. Amico, M. Mehrotra, S. Hosek, C. Mosquera, M. Casapia, O. Montoya, S. Buchbinder, V. G. Veloso, K. Mayer, S. Chariyalertsak, L. G. Bekker, E. G. Kallas, M. Schechter, J. Guanira, L. Bushman, D. N. Burns, J. F. Rooney, D. V. Glidden, iPrEx Study Team, Uptake of pre-exposure prophylaxis, sexual practices, and HIV incidence in men and transgender women who have sex with men: a cohort study. Lancet Infect Dis 14, 820-829 (2014).
33. A. Liu, S. Cohen, S. Follansbee, D. Cohan, S. Weber, D. Sachdev, S. Buchbinder, Early experiences implementing pre-exposure prophylaxis (PrEP) for HIV prevention in San Francisco. PLoS Med 11, e1001613 (2014).
34. J. P. Bil, U. Davidovich, W. M. van der Veldt, M. Prins, H. J. de Vries, G. J. Sonder, I. G. Stolte, What do Dutch MSM think of preexposure prophylaxis to prevent HIV-infection? A cross-sectional study. AIDS 29, 955-964 (2015).
35. M. J. Mimiaga, J. M. White, D. S. Krakower, K. B. Biello, K. H. Mayer, Suboptimal awareness and comprehension of published preexposure prophylaxis efficacy results among physicians in Massachusetts. AIDS Care 26, 684-693 (2014).
36. K. H. Mayer, S. Hosek, S. Cohen, A. Liu, J. Pickett, M. Warren, D. Krakower, R. Grant, Antiretroviral pre-exposure prophylaxis implementation in the United States: a work in progress. J Int AIDS Soc 18, 19980 (2015).
37. D. Pao, M. Fisher, S. Hue, G. Dean, G. Murphy, P. A. Cane, C. A. Sabin, D. Pillay, Transmission of HIV-1 during primary infection: relationship to sexual risk and sexually transmitted infections. AIDS 19, 85-90 (2005).
38. N. Pant Pai, J. Sharma, S. Shivkumar, S. Pillay, C. Vadnais, L. Joseph, K. Dheda, R. W. Peeling, Supervised and unsupervised self-testing for HIV in high- and low-risk populations: a systematic review. PLoS Med 10, e1001414 (2013).
39. N. Lorente, M. Preau, C. Vernay-Vaisse, M. Mora, J. Blanche, J. Otis, A. Passeron, J. M. Le Gall, P. Dhotte, M. P. Carrieri, M. Suzan-Monti, B. Spire, ANRS-DRAG Study Group, Expanding access to nonmedicalized community-based rapid testing to men who have sex with men: an urgent HIV prevention intervention (the ANRS-DRAG study). PLoS One 8, e61225 (2013).
40. G. B. Gomez, A. Borquez, K. K. Case, A. Wheelock, A. Vassall, C. Hankins, The cost and impact of scaling up pre-exposure prophylaxis for HIV prevention: a systematic review of cost-effectiveness modelling studies. PLoS Med 10, e1001401 (2013).
41. R. B. Birger, T. B. Hallett, A. Sinha, B. T. Grenfell, S. L. Hodder, Modeling the impact of interventions along the HIV continuum of care in Newark, New Jersey. Clin Infect Dis 58, 274-284 (2014).
42. J. Heuker, G. J. Sonder, I. Stolte, R. Geskus, A. van den Hoek, High HIV incidence among MSM prescribed postexposure prophylaxis, 2000-2009: indications for ongoing sexual risk behaviour. AIDS 26, 505-512 (2012).
43. V. A. Johnson, V. Calvez, H. F. Gunthard, R. Paredes, D. Pillay, R. W. Shafer, A. M. Wensing, D. D. Richman, Update of the drug resistance mutations in HIV-1: March 2013. Top Antivir Med 21, 6-14 (2013).
44. B. D. Rice, J. Elford, Z. Yin, V. C. Delpech, A new method to assign country of HIV infection among heterosexuals born abroad and diagnosed with HIV. AIDS 26, 1961-1966 (2012).
45. A. M. Kozlov, A. J. Aberer, A. Stamatakis, ExaML version 3: a tool for phylogenomic analyses on supercomputers. Bioinformatics 31, 2577-2579 (2015).
46. T. Leitner, D. Escanilla, C. Franzen, M. Uhlen, J. Albert, Accurate reconstruction of a known HIV-1 transmission history by phylogenetic tree analysis. Proc Natl Acad Sci U S A 93, 10864-10869 (1996).
47. A. van Sighem, F. Nakagawa, D. De Angelis, C. Quinten, D. Bezemer, E. O. de Coul, M. Egger, F. de Wolf, C. Fraser, A. Phillips, Estimating HIV Incidence, Time to Diagnosis, and the Undiagnosed HIV Epidemic Using Routine Surveillance Data. Epidemiology 26, 653-660 (2015).
48. Health Protection Agency, Longitudinal analysis of the trajectories of CD4 cell counts, 2011.
49. D. Bezemer, F. de Wolf, M. C. Boerlijst, A. van Sighem, T. D. Hollingsworth, C. Fraser, 27 years of the HIV epidemic amongst men having sex with men in the Netherlands: an in depth mathematical model-based analysis. Epidemics 2, 66-79 (2010).
50. S. H. Eshleman, S. E. Hudelson, A. D. Redd, L. Wang, R. Debes, Y. Q. Chen, C. A. Martens, S. M. Ricklefs, E. J. Selig, S. F. Porcella, S. Munshaw, S. C. Ray, E. Piwowar-Manning, M. McCauley, M. C. Hosseinipour, J. Kumwenda, J. G. Hakim, S. Chariyalertsak, G. de Bruyn, B. Grinsztejn, N. Kumarasamy, J. Makhema, K. H. Mayer, J. Pilotto, B. R. Santos, T. C. Quinn, M. S. Cohen, J. P. Hughes, Analysis of genetic linkage of HIV from couples enrolled in the HIV Prevention Trials Network 052 trial. J Infect Dis 204, 1918-1926 (2011).
51. A. Gavryushkina, D. Welch, T. Stadler, A. J. Drummond, Bayesian inference of sampled ancestor trees for epidemiology and fossil calibration. PLoS Comput Biol 10, e1003919 (2014).
52. S. B. McCombs, E. McCray, D. A. Wendell, P. A. Sweeney, I. M. Onorato, Epidemiology of HIV-1 infection in bisexual women. J Acquir Immune Defic Syndr 5, 850-852 (1992).
53. B. Efron, R. J. Tibshirani, An introduction to the bootstrap. (Chapman and Hall, Boca Raton, reprint ed., 1998).
54. M. S. Cohen, G. M. Shaw, A. J. McMichael, B. F. Haynes, Acute HIV-1 Infection. N Engl J Med 364, 1943-1954 (2011).
55. Associated Partners of EMIS, EMIS 2010: The European Men-Who-Have-Sex-With-Men Internet Survey. Findings from 38 countries, Stockholm, 2013.

Acknowledgments: We thank the Imperial College High Performance Computing Service (http://www3.imperial.ac.uk/ict/services/hpc), three anonymous referees, the HIV treating physicians, HIV nurse consultants and staff of the diagnostic laboratories and facilities in the HIV treatment centres, along with the data collecting and monitoring staff both within and outside the Stichting HIV Monitoring Foundation for their contributions to make this work possible.

Funding: OR is supported by the Wellcome Trust (fellowship WR092311MF); CF by the European Research Council (Advanced Grant PBDR-339251) and the Bill & Melinda Gates Foundation (PANGEA-HIV consortium). PR through his institution received independent scientific grant support from Bristol-Myers Squibb, ViiV Healthcare, Gilead Sciences, Janssen Pharmaceuticals Inc., Merck & Co, served on a scientific advisory board for Gilead Sciences and serves on a data safety monitoring committee for Janssen Pharmaceuticals Inc., for which his institution has received remuneration. The Aids Therapy Evaluation in the Netherlands (ATHENA) observational cohort study is part of Stichting HIV Monitoring and supported by a grant from the Netherlands Ministry of Health, Welfare and Sport through its Centre for Infectious Disease Control-National Institute for Public Health and the Environment. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author contributions: OR, FW, PR, CF conceived the study. OR, CF developed the methods, did the analysis, and reviewed all statistical aspects of the analysis. AS, DB, SJ, AW provided data used to conduct the analysis. AG assisted in estimating the viral phylogeny. AS, DB, FW, PR advised on analysis and interpretation. OR, CF wrote the first draft.
All authors reviewed and approved the final version.

Competing interests: None declared.

Data and materials availability: Data are available from the HIV Monitoring Institutional Data Access / Ethics Committee for researchers who meet the criteria for access to confidential data. Contact email: [email protected].

Fig. 1. Study design. Nationwide sources of transmission were identified for MSM with evidence for recent infection in the first year prior to diagnosis (recipient MSM). (A) Out of all patients in the ATHENA cohort, men whose course of infection overlapped with the infection window were considered as potential transmitters. (B) Only those pairs with sequences from both individuals were considered for further analysis. (C-D) Using viral phylogenetic analyses, the vast majority of pairs could be ruled out. All remaining pairs were considered phylogenetically probable. (E) Based on detailed clinical records, probable transmission events were characterized by stage in the HIV infection and care continuum. Because transmitters progressed in stage over time, we considered time-resolved transmission intervals. (F) Independent viral phylogenetic data from epidemiologically confirmed pairs were used to determine the phylogenetic probability of direct transmission during each interval. Statistical analyses adjusted for extensive sampling and censoring biases.

Fig. 2. Phylogenetically probable transmission intervals, linked to stages in the infection and care continuum. (A) Left: Each recipient could have been infected during his infection window from multiple probable transmitters. For each transmitter, the transmission window was split into six-week long probable transmission intervals. Infection/care stages were assigned to these intervals based on clinical data to reflect progression of the transmitters through the infection/care continuum. Right: Relationship between the fourteen infection/care stages as defined in table 2. Transmitters progress uni-directionally, except for stages after first viral suppression, or when individuals re-enter care (as indicated by arrows). (B) For each stage, the total number of observed transmission intervals to recipient MSM during their infection windows is shown. Overall, the number of transmission intervals per recipient increases with time, reflecting the increasing number of infected men in care. Transmitters are increasingly less likely to have been diagnosed by 2013, resulting in a decreasing number of undiagnosed transmission intervals towards the present. (C) In addition to censoring, diagnosed transmitters may not have a sequence sampled. Comparing men with and without a sequence in the near complete population cohort, we could adjust for these biases. The total number of expected missing transmission intervals to recipients diagnosed in one of four observation periods is shown, along with 95% bootstrap confidence intervals. Observed and expected missing transmission intervals were associated with phylogenetic transmission probabilities, which sum to one per recipient.

Fig. 3. Proportion of transmissions by stage in the infection and care continuum, versus proportion of these stages amongst infected men. (A) Relative frequency of infection/care stages in the population, among potential transmitters that overlap with the infection windows of recipient MSM and could have in principle transmitted to one of the recipient MSM (stage A in figure 1, colour codes as in figure 2). (B) Proportion of the 617 transmission events attributable to each infection/care stage (bar: 95% bootstrap confidence interval).

Fig. 4. Impact of biomedical interventions amongst MSM in the Netherlands.
Estimated proportion of transmissions that could have been averted in the period 2008/07-2010/12 if the corresponding additional prevention strategies had been implemented by 2008/07 (line: median, box: bootstrap interquartile range, whiskers: 95% bootstrap confidence interval). Scenarios were varied by annual testing coverage of phylogenetically identified, probable transmitters. Current testing coverage was 17%, corresponding to the proportion of probable transmitters that had a negative test in the twelve months prior to diagnosis.

Table 1. HIV incidence trends and care for infected MSM in the Netherlands and other countries.

| Country | Annual testing of uninfected MSM: Year | % | Diagnosed MSM receiving ART: Year | % | Median CD4 count at ART initiation (cells/µl) | Treated MSM with suppressed viral load: Year | % | Viral load threshold (cps/ml) | Retention of MSM in care: Year | % | HIV incidence among MSM: Year | Trend |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Netherlands | 2003 | ?? | 2003 | 79 | 202 | 2003 | 80 | <100 | 2003 | 92 | 2003 | Increasing n |
| Netherlands | 2013 | 38.4 a | 2013 | 90 | 382 | 2013 | 91 | <100 | 2012 | 95 | 2013 | Stable n |
| Australia | 2013 | 61.1 b | 2013 | 75 b | 379 h | 2013 | 88 k, | <50 | 2013 | 96 h, | 2013 | Increasing, stable in Western Australia and Queensland o |
| British Columbia | 2009 | 51 c | 2014 | 85 e | 411 e | 2014 | 84 e | <50 | 2011 | 86 d, | 2013 | Stable p |
| Switzerland | 2010 | 39.3 a | 2014 | 86 f | 402 f | 2012 | 96 l, | <200 | 2012 | 97 i, | 2014 | Decreasing new diagnoses and recent infections q |
| United Kingdom | 2010 | 36.4 a | 2013 | 86 g | 420 j | 2013 | 91 m | <200 | 2013 | 95 m | 2013 | Stable r |

a From The EMIS Network. EMIS 2010: The European Men-Who-Have-Sex-With-Men Internet Survey. Findings from 38 countries. Stockholm: European Centre for Disease Prevention and Control, 2013.
b From Gay Community Periodic Surveys, https://kirby.unsw.edu.au/projects/gay-community-periodic-surveys, reported in HIV, hepatitis, and sexually transmissible infections in Australia, Annual surveillance report 2014.
c From Mancount, prospective cross-sectional survey in Vancouver, http://www.mancount.ca/files/ManCount_Report2010.pdf.
d From Nosyk B, Montaner JS, Colley G, et al. The cascade of HIV care in British Columbia, Canada, 1996-2011: a population-based retrospective cohort study. The Lancet infectious diseases 2014;14:40-9.
e From HIV monitoring quarterly report for British Columbia, Fourth quarter 2014.
f 2621 out of 3081 MSM on ART and registered in the Swiss HIV Cohort Study, personal communication with the Datacenter of the Swiss HIV Cohort Study.
g From https://www.gov.uk/government/statistics/hiv-data-tables.
h From Australian HIV Observational Database Annual Report 2014, reporting care indicators in a closed observational cohort.
i From http://www.shcs.ch/155-shcs-key-data-figures, update June 2014.
j Within 9 months prior to ART initiation, personal communication PHE.
k From the Australian HIV Observational Database, reported in HIV, hepatitis, and sexually transmissible infections in Australia, Annual surveillance report 2014.
l Kohler P, Schmidt AJ, Ledergerber B, Vernazza P, CROI2015, http://www.croiconference.org/sites/default/files/posters-2015/1008.pdf.
m From HIV in the United Kingdom: 2014 Report.
n From reference (44).
o From fact sheet HIV and AIDS in Australia, 20th International AIDS conference.
p From http://www.phac-aspc.gc.ca/aids-sida/publication/epi/2010/index-eng.php.
q From HIV- und STI-Fallzahlen 2014: Berichterstattung, Analysen und Trends, in comparison with numbers for 2008 in the 2012 report, http://www.bag.admin.ch/hiv_aids/12472/12480/12481/12484/index.html?lang=de.
r From Birrell PJ, Gill ON, Delpech VC, et al. HIV incidence in men who have sex with men in England and Wales 2001-10: a nationwide population study. The Lancet infectious diseases 2013;13:313-8.
Estimate not specific to MSM.

Table 2.
Stages in the HIV infection and care continuum

| Infection/care stage of transmitter | Definition |
|---|---|
| Undiagnosed | Transmission intervals whose midpoint is before diagnosis: |
| Confirmed recent infection at diagnosis | All transmission intervals of transmitters that were in laboratory confirmed recent infection at time of diagnosis. |
| Estimated to be in recent infection | Considering transmitters that had no evidence for recent infection at time of diagnosis, all transmission intervals whose midpoint is less than 12 months after the estimated infection date. |
| Estimated to be in chronic infection | Considering transmitters that had no evidence for recent infection at time of diagnosis, all transmission intervals whose midpoint is more than 12 months after the estimated infection date. |
| Diagnosed | Transmission intervals whose midpoint is after diagnosis and before ART start (only of transmitters that are in contact with care services): |
| Diagnosed < 3mo, Recent infection at diagnosis | Considering potential or probable transmitters that were in laboratory confirmed recent infection at time of diagnosis, all transmission intervals whose midpoint is within the first three months after diagnosis. |
| No CD4 measured | No available CD4 count since diagnosis up to the midpoint of the interval. |
| CD4 > 500 | CD4 counts remained above 500 cells/µl between the first CD4 count up to the midpoint of the interval. |
| CD4 in [350-500] | CD4 counts decreased to 350-500 cells/µl between the first CD4 count up to the midpoint of the interval. |
| CD4 < 350 | CD4 counts decreased to below 350 cells/µl between the first CD4 count up to the midpoint of the interval. |
| ART initiated | Transmission intervals whose midpoint is after ART start (only of transmitters that are in contact with care services): |
| Before first viral suppression | No first viral load measurement below 100 copies/ml in any transmission interval of the transmitter after ART start. |
| After first viral suppression¶ | |
| No viral load measured¶ | No viral load measurement in any transmission interval of the transmitter after ART start. |
| No viral suppression¶ | At least one viral load measurement at or above 100 copies/ml in any transmission interval of the transmitter after ART start. |
| Viral suppression, one observation¶ | One viral load measurement in any transmission interval of the transmitter after ART start, which is below 100 copies/ml. |
| Viral suppression, >1 observations¶ | Several viral load measurements in any transmission interval of the transmitter after ART start, all of which are below 100 copies/ml. |
| Not in contact¶ | No patient record (last contact, clinic visit, CD4 measurement, viral load measurement) in the past and future 9 months from the midpoint of the transmission interval. |

¶ While flow through the stages is typically unidirectional, men could move freely between these stages.

Table 3. Characteristics of the recipient MSM with identified sources of transmission

| Characteristic | Recipient MSM with a phylogenetically probable transmitter (n = 617) | Recipient MSM with or without a sequence (n = 1,794) | Diagnosed MSM (n = 7,978) |
|---|---|---|---|
| Evidence for infection in the past year | | | |
| Previous negative test in the past year (%) | 77 | 76 | 17 |
| Laboratory diagnosis (%) | 8 | 7 | 2 |
| Clinical diagnosis of acute infection (%) | 15 | 17 | 4 |
| Age at diagnosis (years; mean and IQR) | 36.8 (29.5-42.9) | 37.2 (29.9-43.5) | 38.7 (31.3-45.1) |
| First CD4 count within 12 months of diagnosis and before ART start (cells/µl; mean and IQR) | 505 (350-630) | 534 (360-670) | 402 (200-560) |
| Viral load count within 12 months of diagnosis (log10 RNA; mean and IQR) | 4.9 (4.4-5.5) | 4.8 (4.3-5.4) | 4.7 (4.3-5.3) |
| In care in the Amsterdam metropolitan area (%) | 45.1 | 43.5 | 43.6 |
| Last negative test within 12 months prior to diagnosis (%) | 77.0 | 76.1 | 17.1 |
| Self-reported in country infection¶ (%) | 96.9 | 91.9 | 88.5 |

¶ Of those self-reporting a country of origin.

Table 4.
Proportion of transmissions by stage in the HIV infection and care continuum. Infection/care stage of transmitter % of transmissions by time of diagnosis of recipient MSM (95% confidence interval) Overall (n=617) 96/07-06/04 (n=165) 06/05-07/12 (n=145) 08/01-09/06 (n=151) 09/07-10/12 (n=156) Undiagnosed (total) Confirmed recent infection at diagnosis Estimated to be in recent infection Estimated to be in chronic infection 70.9 (65.8-72.5) 67.6 (59.3-72.7) 72.3 (64.2-76.9) 71.8 (63.4-76.3) 72.2 (63.3-76.3) 15.5 (11.9-17.4) 15 (7.6-19.4) 21.7 (15-26.5) 16.4 (11-20.8) 9.4 (5.6-14.1) 25.1 (19.4-28.1) 17.3 (11.7-22.7) 23 (15.1-30.1) 25.9 (15.4-33.6) 34.6 (19.4-43.4) 30.3 (28-34) 35.2 (30.2-42) 27.6 (22.4-34) 29.5 (24.2-36.1) 28.2 (23-35.7) Diagnosed (total) Diagnosed < 3mo, Recent infection at diagnosis No CD4 measured CD4 >500 CD4 in [350-500] CD4 < 350 22.4 (20.7-26.2) 23.6 (18.5-29.7) 22.9 (18.6-29.1) 22.8 (18.3-29.4) 20.7 (17.4-27.3) 2.9 (2.2-4.1) 1.6 (1.2-2.4) 8.3 (7-10.3) 6.4 (5.4-7.9) 3.4 (2.5-4.3) 2.5 (1-4.9) 2.9 (1.6-4.8) 10.2 (6.7-14.2) 4.8 (2.6-7.8) 3.2 (1.2-5.5) 3.2 (1.7-5.5) 0.8 (0.4-1.8) 7 (4.5-10.8) 7.3 (5.1-10.5) 4.6 (2.6-6.6) 3 (1.9-5.4) 1.5 (0.6-3) 8.7 (5.9-12.5) 5.9 (4.2-8.3) 3.7 (2.2-5.6) 2.8 (1.8-4.4) 1 (0.6-2.1) 7.1 (5.4-10.1) 7.7 (5.7-11) 2.1 (1.3-3.3) 5.7 (5.2-7.8) 7 (4.8-11.7) 3.7 (2.2-6.5) 4.9 (3.7-8.1) 6.7 (5.4-10.2) 1.8 (1.6-2.7) 2.2 (1.2-4.4) 0.7 (0.4-1.5) 1.3 (0.9-2.6) 2.8 (2.1-4.6) ART initiated (total) Before first viral suppression After first viral suppression No viral load measured No viral suppression Viral suppression, one observation Viral suppression, >1 observations Not in contact Recent infection (total) 0.5 (0.3-1) 0.9 (0.1-2.4) 0.1 (0-0.3) 0.3 (0.1-0.9) 0.8 (0.4-1.8) 1.4 (0.9-2.1) 2.8 (1.2-5.2) 1.2 (0.4-2.6) 0.9 (0.4-1.9) 0.5 (0.1-1) 0.4 (0.3-0.8) 0.1 (0-0.5) 0.2 (0-0.8) 0.5 (0.2-1.7) 0.6 (0.3-1.5) 1.6 (1.1-2.5) 1 (0.1-2.6) 1.5 (0.6-3.1) 1.9 (0.9-3.6) 2 (1.1-3.6) 1 (0.7-1.6) 1.8 (0.8-3.4) 1.1 (0.4-2.3) 0.5 (0.2-1.4) 0.4 
(0.2-0.8) 43.5 (36.6-46) 34.9 (25.4-40.6) 47.9 (36.9-54.8) 45.3 (33.3-54.1) 47.7 (32.8-53.8) 28 780 Extended acknowledgements The ATHENA database is maintained by Stichting HIV Monitoring and supported by a grant from the Dutch Ministry of Health, Welfare and Sport through the Centre for Infectious Disease Control of the National Institute for Public Health and the Environment. 790 800 810 820 830 CLINICAL CENTRES * denotes site coordinating physician Academic Medical Centre of the University of Amsterdam: HIV treating physicians: J.M. Prins*, T.W. Kuijpers, H.J. Scherpbier, J.T.M. van der Meer, F.W.M.N. Wit, M.H. Godfried, P. Reiss, T. van der Poll, F.J.B. Nellen, S.E. Geerlings, M. van Vugt, D. Pajkrt, J.C. Bos, W.J. Wiersinga, M. van der Valk, A. Goorhuis, J.W. Hovius, A.M. Weijsenfeld. HIV nurse consultants: J. van Eden, A. Henderiks, A.M.H. van Hes, M. Mutschelknauss, H.E. Nobel, F.J.J. Pijnappel. HIV clinical virologists/chemists: S. Jurriaans, N.K.T. Back, H.L. Zaaijer, B. Berkhout, M.T.E. Cornelissen, C.J. Schinkel, X.V. Thomas. Admiraal De Ruyter Ziekenhuis, Goes: HIV treating physicians: M. van den Berge, A. Stegeman. HIV nurse consultants: S. Baas, L. Hage de Looff. HIV clinical virologists/chemists: D. Versteeg. Catharina Ziekenhuis, Eindhoven: HIV treating physicians: M.J.H. Pronk*, H.S.M. Ammerlaan. HIV nurse consultants: E.S. de Munnik. HIV clinical virologists/chemists: A.R. Jansz, J. Tjhie, M.C.A. Wegdam, B. Deiman, V. Scharnhorst. Emma Kinderziekenhuis: HIV nurse consultants: A. van der Plas, A.M. Weijsenfeld. Erasmus Medisch Centrum, Rotterdam: HIV treating physicians: M.E. van der Ende*, T.E.M.S. de Vries-Sluijs, E.C.M. van Gorp, C.A.M. Schurink, J.L. Nouwen, A. Verbon, B.J.A. Rijnders, H.I. Bax, M. van der Feltz. HIV nurse consultants: N. Bassant, J.E.A. van Beek, M. Vriesde, L.M. van Zonneveld. Data collection: A. de Oude-Lubbers, H.J. van den Berg-Cameron, F.B. Bruinsma-Broekman, J. de Groot, M. de Zeeuw- de Man. 
HIV clinical virologists/chemists: C.A.B. Boucher, M.P.G Koopmans, J.J.A van Kampen. Erasmus Medisch Centrum–Sophia, Rotterdam: HIV treating physicians: G.J.A. Driessen, A.M.C. van Rossum. HIV nurse consultants: L.C. van der Knaap, E. Visser. Flevoziekenhuis, Almere: HIV treating physicians: J. Branger*, A. Rijkeboer-Mes. HIV nurse consultant and data collection: C.J.H.M. Duijf-van de Ven. HagaZiekenhuis, Den Haag: HIV treating physicians: E.F. Schippers*, C. van Nieuwkoop. HIV nurse consultants: J.M. van IJperen, J. Geilings. Data collection: G. van der Hut. HIV clinical virologist/chemist: P.F.H. Franck. HIV Focus Centrum (DC Klinieken): HIV treating physicians: A. van Eeden*. HIV nurse consultants: W. Brokking, M. Groot, L.J.M. Elsenburg. HIV clinical virologists/chemists: M. Damen, I.S. Kwa. Isala, Zwolle: HIV treating physicians: P.H.P. Groeneveld*, J.W. Bouwhuis. HIV nurse consultants: J.F. van den Berg, A.G.W. van Hulzen. Data collection: G.L. van der Bliek, P.C.J. Bor. HIV clinical virologists/chemists: P. Bloembergen, M.J.H.M. Wolfhagen, G.J.H.M. Ruijs. Leids Universitair Medisch Centrum, Leiden: HIV treating physicians: F.P. Kroon*, M.G.J. de Boer, M.P. Bauer, H. Jolink, A.M. Vollaard. HIV nurse consultants: W. Dorama, N. van Holten. HIV clinical virologists/chemists: E.C.J. Claas, E. Wessels. Maasstad Ziekenhuis, Rotterdam: HIV treating physicians: J.G. den Hollander*, K. Pogany, A. Roukens. HIV nurse consultants: M. Kastelijns, J.V. Smit, E. Smit, D. Struik-Kalkman, C. Tearno. Data collection: M. Bezemer, T. van Niekerk. HIV clinical virologists/chemists: O. Pontesilli. Maastricht UMC+, Maastricht: HIV treating physicians: S.H. Lowe*, A.M.L. Oude Lashof, D. Posthouwer. HIV nurse consultants: R.P. Ackens, J. Schippers, R. Vergoossen. Data collection: B. Weijenberg-Maes. HIV clinical virologists/chemists: I.H.M. van Loo, T.R.A. Havenith. MC Slotervaart, Amsterdam: HIV treating physicians: J.W. Mulder, S.M.E. Vrouenraets, F.N. Lauw. 
HIV nurse consultants: M.C. van Broekhuizen, H. Paap, D.J. Vlasblom. HIV clinical virologists/chemists: P.H.M. Smits. MC Zuiderzee, Lelystad: HIV treating physicians: S. Weijer*, R. El Moussaoui. HIV nurse consultant: A.S. Bosma. Medisch Centrum Alkmaar: HIV treating physicians: W. Kortmann*, G. van Twillert*, J.W.T. Cohen Stuart, B.M.W. Diederen. HIV nurse consultant and data collection: D. Pronk, F.A. van Truijen-Oud. HIV clinical virologists/chemists: W.A. van der Reijden, R. Jansen. Medisch Centrum Haaglanden, Den Haag: HIV treating physicians: E.M.S. Leyten*, L.B.S. Gelinck. HIV nurse consultants: A. van Hartingsveld, C. Meerkerk, G.S. Wildenbeest. HIV clinical virologists/chemists: J.A.E.M. Mutsaers, C.L. Jansen. Medisch Centrum Leeuwarden, Leeuwarden: HIV treating physicians: M.G.A. van Vonderen*, D.P.F. van Houte, L.M. Kampschreur. HIV nurse consultants: K. Dijkstra, S. Faber. HIV clinical virologists/chemists: J. Weel. Medisch Spectrum Twente, Enschede: HIV treating physicians: G.J. Kootstra*, C.E. Delsing. HIV nurse consultants: M. van der Burg-van de Plas, H. Heins. Data collection: E. Lucas. OLVG Amsterdam: HIV treating physicians: K. Brinkman*, G.E.L. van den Berk, W.L. Blok, P.H.J. Frissen, K.D. Lettinga, W.E.M. Schouten, J. Veenstra. HIV nurse consultants: C.J. Brouwer, G.F. Geerders, K. Hoeksema, M.J. Kleene, I.B. van der Meché, M. Spelbrink, H. Sulman, A.J.M. Toonen, S. Wijnands. HIV clinical virologists: M. Damen, D. Kwa. Data collection: E. Witte. Radboudumc, Nijmegen: HIV treating physicians: P.P. Koopmans, M. Keuter, A.J.A.M. van der Ven, H.J.M. ter Hofstede, A.S.M. Dofferhoff, R. van Crevel. HIV nurse consultants: M. Albers, M.E.W. Bosch, K.J.T. Grintjes-Huisman, B.J. Zomer. HIV clinical virologists/chemists: F.F. Stelma, J. Rahamat-Langendoen. HIV clinical pharmacology consultant: D. Burger. Rijnstate, Arnhem: HIV treating physicians: C. Richter*, E.H. Gisolf, R.J. Hassing. HIV nurse consultants: G. ter Beest, P.H.M.
van Bentum, N. Langebeek. HIV clinical virologists/chemists: R. Tiemessen, C.M.A. Swanink. Spaarne Gasthuis, Haarlem: HIV treating physicians: S.F.L. van Lelyveld*, R. Soetekouw. HIV nurse consultants: N. Hulshoff, L.M.M. van der Prijt, J. van der Swaluw. Data collection: N. Bermon. HIV clinical virologists/chemists: W.A. van der Reijden, R. Jansen, B.L. Herpers, D.Veenendaal. Stichting Medisch Centrum Jan van Goyen, Amsterdam: HIV treating physicians: D.W.M. Verhagen. HIV nurse consultants: M. van Wijk. St Elisabeth Ziekenhuis, Tilburg: HIV treating physicians: M.E.E. van Kasteren*, A.E. Brouwer. HIV nurse consultants and data collection: B.A.F.M. de Kruijf-van de Wiel, M. Kuipers, R.M.W.J. Santegoets, B. van der Ven. HIV clinical virologists/chemists: J.H. Marcelis, A.G.M. Buiting, P.J. Kabel. Universitair Medisch Centrum Groningen, Groningen: HIV treating physicians: W.F.W. Bierman*, H. Scholvinck, K.R. Wilting, Y. Stienstra. HIV nurse consultants: H. de Groot-de Jonge, P.A. van der Meulen, D.A. de Weerd, J. Ludwig-Roukema. HIV clinical virologists/chemists: H.G.M. Niesters, A. Riezebos-Brilman, C.C. van Leer-Buter, M. Knoester. Universitair Medisch Centrum Utrecht, Utrecht: HIV treating physicians: A.I.M. Hoepelman*, T. Mudrikova, P.M. Ellerbroek, J.J. Oosterheert, J.E. Arends, R.E. Barth, M.W.M. Wassenberg, E.M. Schadd. HIV nurse consultants: D.H.M. van Elst-Laurijssen, E.E.B. van Oers-Hazelzet, S. Vervoort, Data collection: M. van Berkel. HIV clinical virologists/chemists: R. Schuurman, F. Verduyn-Lunel, A.M.J. Wensing. VU medisch centrum, Amsterdam: HIV treating physicians: E.J.G. Peters*, M.A. van Agtmael, M. Bomers, J. de Vocht. HIV nurse consultants: M. Heitmuller, L.M. Laan. HIV clinical virologists/chemists: A.M. Pettersson, C.M.J.E. Vandenbroucke-Grauls, C.W. Ang. Wilhelmina Kinderziekenhuis, UMCU, Utrecht: HIV treating physicians: S.P.M. Geelen, T.F.W. Wolfs, L.J. Bont. HIV nurse consultants: N. Nauta. COORDINATING CENTRE Director: P. Reiss. 
Data analysis: D.O. Bezemer, A.I. van Sighem, C. Smit, F.W.M.N. Wit. Data management and quality control: S. Zaheri, M. Hillebregt, A. de Jong. Data monitoring: D. Bergsma, P. Hoekstra, A. de Lang, S. Grivell, A. Jansen, M.J. Rademaker, M. Raethke. Data collection: L. de Groot, M. van den Akker, Y. Bakker, M. Broekhoven, E. Claessen, A. El Berkaoui, J. Koops, E. Kruijne, C. Lodewijk, R. Meijering, L. Munjishvili, B. Peeck, C. Ree, R. Regtop, Y. Ruijs, T. Rutkens, L. van de Sande, M. Schoorl, S. Schnörr, E. Tuijn, L. Veenenberg, S. van der Vliet, T. Woudstra. Patient registration: B. Tuk.

Supplementary Online Material and Methods

SOM 1 Procedure to identify potential transmitters of recipient MSM

To reconstruct an evidence base of past transmission events amongst MSM in the Netherlands between July 1996 and December 2010, we first identified MSM for whom a narrow infection window could be defined (see Materials and Methods in the main text). Next, we considered as potential transmitters all registered infected men who could in principle have infected a recipient. Potential transmitters were defined as infected men in the ATHENA cohort whose period of infection overlaps with the infection window of a recipient MSM. To determine whether an infected individual overlapped with an infection window, we need to estimate when the individual in question became infected. Equivalently, we estimate the time from infection to diagnosis, which we denote by $T_i^{I \to D}$ for individual $i$. This section describes how individual-level time to diagnosis estimates were obtained. We denote the estimated time to diagnosis for individual $i$ by $\hat{T}_i^{I \to D}$. Estimated infection times are associated with substantial uncertainty, and sensitivity analyses were conducted for lower and upper 95% estimates. Findings did not depend substantially on these infection time estimates (figures S16 and S21).
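The overlap check described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that an individual counts as overlapping when his estimated infection date (diagnosis date minus the estimated time to diagnosis) is not later than the end of the recipient's infection window, and when his follow-up has not ended before the window starts. The function names, the optional `end_of_follow_up` argument and the example dates are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def estimated_infection_date(diagnosis: date, time_to_diagnosis_years: float) -> date:
    """Back-calculate the infection date from the diagnosis date (assumed rule)."""
    return diagnosis - timedelta(days=round(365.25 * time_to_diagnosis_years))


def overlaps_infection_window(
    diagnosis: date,
    time_to_diagnosis_years: float,
    window_start: date,
    window_end: date,
    end_of_follow_up: Optional[date] = None,  # hypothetical: death or loss to follow-up
) -> bool:
    """Return True if the individual was (estimated to be) infected during the window."""
    infected_from = estimated_infection_date(diagnosis, time_to_diagnosis_years)
    infected_until = end_of_follow_up if end_of_follow_up is not None else date.max
    # Two closed intervals overlap if each one starts before the other one ends.
    return infected_from <= window_end and infected_until >= window_start


if __name__ == "__main__":
    # Hypothetical man diagnosed in June 2008 with an estimated 2 years to diagnosis.
    print(overlaps_infection_window(date(2008, 6, 1), 2.0,
                                    date(2007, 1, 1), date(2007, 6, 30)))
```

Repeating the check with lower and upper time to diagnosis estimates gives a simple way to see how the set of potential transmitters depends on the infection time estimates.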
We adapted a previously published method that estimates an individual's time to diagnosis based on certain risk variables at time of diagnosis (41, 45). This approach proceeds in two steps. First, HIV surveillance data from an MSM cohort of drug naïve HIV seroconverters are used to estimate the association between the time to diagnosis since the midpoint of the seroconversion interval and risk variables at diagnosis. This association is described with a suitable regression model. Next, the fitted regression model is used to predict the expected time to diagnosis for all infected individuals. Previous work found that CD4 cell count at diagnosis, age at diagnosis, infection route and ethnicity are significantly associated with the time to diagnosis since the midpoint of the seroconversion interval (45). Here, ethnicity was not available and infection route was always MSM, so neither of these two variables was considered in this analysis.

The previous method to estimate an individual's time of infection assumes, first, that the time from the midpoint of the seroconversion interval to diagnosis is representative of the unknown time to diagnosis among seroconverting MSM. We denote the time to diagnosis from the midpoint by $\tilde{T}_i^{I \to D}$ for seroconverter $i$. Second, the previous approach assumes that the approximated time to diagnosis among seroconverting MSM is representative of the time to diagnosis among all infected MSM. Here, we adapt this approach in order to relax both assumptions, using $\tilde{T}_i^{I \to D}$ as an intermediate step to obtain the final estimate $\hat{T}_i^{I \to D}$.

In the ATHENA cohort, data on 3,025 MSM with a last negative test and a date of diagnosis between 2003/01 and 2010/12 were available to estimate the association between $\tilde{T}_i^{I \to D}$ and risk variables at time of diagnosis. Table S4 characterizes these MSM with a last negative test.
We conducted an exploratory data analysis, shown in figure S24, which indicated that infection status at time of diagnosis (evidence for infection within 12 months prior to diagnosis), age at diagnosis, status of HIV infection at diagnosis and (to a lesser extent) the first CD4 count within 12 months of diagnosis are associated with $\tilde{T}_i^{I \to D}$ among drug naïve MSM with a last negative test.

For individuals in confirmed recent infection at time of diagnosis, we set $\hat{T}_i^{I \to D} = 1$ year. For all other individuals, we first estimated $\tilde{T}_i^{I \to D}$ from age and first CD4 count at time of diagnosis. Based on the exploratory data analysis shown in figure S24, we fitted the regression model

$$\tilde{T}_i^{I \to D} \sim \mathrm{Gamma}(\mu_i, \phi_i),$$

$$\log \mu_i = \beta_0 + R_i^{Nind} \left( \beta_1^{Nind} C_i^{850} A_i + \beta_2^{Nind} C_i^{250-850} A_i + \beta_3^{Nind} C_i^{250} A_i + \beta_4^{Nind} C_i^{NA} A_i \right) + R_i^{miss} \left( \beta_1^{miss} C_i^{850} A_i + \beta_2^{miss} C_i^{250-850} A_i + \beta_3^{miss} C_i^{250} A_i + \beta_4^{miss} C_i^{NA} A_i \right),$$

$$\log \phi_i = \gamma_0 + R_i^{Nind} \left( \gamma_1^{Nind} C_i^{850} A_i + \gamma_2^{Nind} C_i^{250-850} A_i + \gamma_3^{Nind} C_i^{250} A_i + \gamma_4^{Nind} C_i^{NA} A_i \right) + R_i^{miss} \left( \gamma_1^{miss} C_i^{850} A_i + \gamma_2^{miss} C_i^{250-850} A_i + \gamma_3^{miss} C_i^{250} A_i + \gamma_4^{miss} C_i^{NA} A_i \right)$$

among MSM with a last negative test, where

$\tilde{T}_i^{I \to D}$: time between the midpoint of the seroconversion interval and diagnosis
$\mu_i$: mean of the Gamma distribution for the $i$th seroconverter
$\phi_i$: dispersion of the Gamma distribution for the $i$th seroconverter
$\beta_k$: location parameters
$\gamma_k$: shape parameters
$R_i^{Nind}$: $\mathbf{1}$($i$th seroconverter with recent infection at diagnosis not indicated)
$R_i^{miss}$: $\mathbf{1}$($i$th seroconverter with missing infection status at diagnosis)
$C_i^{850}$: $\mathbf{1}$($i$th seroconverter with first CD4 count > 850 cells/µl within 12 months after diagnosis and before ART start)
$C_i^{250-850}$: $\mathbf{1}$($i$th seroconverter with first CD4 count in [250-850] cells/µl within 12 months after diagnosis and before ART start)
$C_i^{250}$: $\mathbf{1}$($i$th seroconverter with first CD4 count < 250 cells/µl within 12 months after diagnosis and before ART start)
$C_i^{NA}$: $\mathbf{1}$($i$th seroconverter with no first CD4 count within 12 months after diagnosis and before ART start)
$A_i$: min(age at diagnosis of $i$th seroconverter, 45).

All regression coefficients were significant. Figure S25 illustrates the predictions obtained with the fitted multivariable regression model. The fitted regression model explained 53% of the variance in the observed $\tilde{T}_i^{I \to D}$ among MSM with a last negative test.

Rice et al. (41) used the expected $\tilde{T}_i^{I \to D}$ as an estimate of the unknown time to diagnosis. To relax the two underlying assumptions noted earlier, we instead used a particular upper quantile $\alpha$ of the estimated probability density function of $\tilde{T}_i^{I \to D}$. Figure S26 illustrates the probability density function of $\tilde{T}_i^{I \to D}$, which was obtained from the parameters of the fitted regression model. The upper quantile parameter $\alpha$ was estimated so that the average $\hat{T}_i^{I \to D}$ is consistent with the average time to diagnosis derived from two mathematical modelling studies between 1996 and 1999. We chose this period in order to validate whether the predictive model can reproduce previously estimated reductions in average time to diagnosis in subsequent years. For this period, Bezemer et al. (46) estimated an average time to diagnosis amongst MSM of 3.16 years (95% confidence interval 3.00-3.41 years). van Sighem et al. (7) estimated a mean time to diagnosis of 4.34 years (3.87-5.11 years) by the end of 1999. We chose the quantile parameters $\alpha$ = 0.109, 0.148, 0.194 in correspondence to these estimates (figure S27-A). The fact that the chosen quantile parameters are substantially lower than 0.5 indicates that the expected time to diagnosis since the midpoint of the seroconversion interval cannot be considered representative of the time to diagnosis among infected MSM. We used these quantile parameters to obtain central, lower, and upper individual-level time to diagnosis estimates, as shown in Figure S27-B.
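The step from a fitted Gamma distribution to an upper-quantile estimate can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. It assumes that the Gamma distribution with mean $\mu_i$ and dispersion $\phi_i$ has shape $1/\phi_i$ and scale $\mu_i \phi_i$, that "upper quantile $\alpha$" means the value exceeded with probability $\alpha$, and it approximates that quantile by Monte Carlo sampling rather than an exact quantile function. The function name and the values of `mu` and `phi` in the example are hypothetical and are not fitted values from the study.

```python
import random


def upper_quantile_time_to_diagnosis(mu: float, phi: float, alpha: float,
                                     n_draws: int = 100_000, seed: int = 1) -> float:
    """Approximate the value exceeded with probability alpha under Gamma(mean=mu, dispersion=phi).

    Assumed parameterization: shape = 1/phi, scale = mu*phi, so that the mean is mu.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if mu <= 0.0 or phi <= 0.0:
        raise ValueError("mu and phi must be positive")
    rng = random.Random(seed)
    shape, scale = 1.0 / phi, mu * phi
    draws = sorted(rng.gammavariate(shape, scale) for _ in range(n_draws))
    # Empirical (1 - alpha) quantile of the sorted Monte Carlo draws.
    index = min(n_draws - 1, int((1.0 - alpha) * n_draws))
    return draws[index]


if __name__ == "__main__":
    mu, phi = 2.0, 0.5  # hypothetical values, for illustration only
    for alpha in (0.5, 0.194, 0.148, 0.109):
        print(alpha, round(upper_quantile_time_to_diagnosis(mu, phi, alpha), 2))
```

Under these assumptions, smaller values of $\alpha$ give longer estimated times to diagnosis than the median, which is the direction of the adjustment described in the text.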
Overall, the individual-level predictive model is able to reproduce previously estimated reductions in average time to diagnosis without the addition of time-dependent variables (black lines in Figure S27-B) (7, 46). The linear drop in time to diagnosis after 2005 may in part be explained by right censoring in the cohort: as the study endpoint was 2010/12 and the maximum estimated time to diagnosis is around 7 years, we expect that an increasing fraction of men infected since 2004-2005 is not yet diagnosed. In comparison to Rice et al. (41), our approach results in larger estimates of time to diagnosis. If the 50% quantile had been used to estimate times to diagnosis, the average time to diagnosis for MSM infected in the period 1996-1999 would have been slightly less than 2 years (figure S27-A).

SOM 2 Procedure to declare potential transmission pairs phylogenetically implausible

HIV sequences can prove neither epidemiological linkage nor the direction of HIV infection (11, 12). However, viral sequences can be used to exclude potential transmission events between individuals whose viral sequences are phylogenetically unrelated. There is currently no consensus on viral phylogenetic exclusion criteria (8).

To guide the viral phylogenetic exclusion criteria adopted in this study, we conducted an evolutionary analysis of sequences from transmitters and recipients in confirmed transmission pairs. This analysis is described in figure S5. In addition, we considered 4,117 pairs of sequences from the same Dutch patient and 201,605 pairs of sequences from two Dutch patients of whom one died before the last negative antibody test of the other. These analyses are described below, and were used to develop exclusion criteria with high specificity (i.e. small type-I error of falsely excluding true transmission pairs). We chose central exclusion criteria for the main transmission analysis and varied lower and upper criteria over the identified range.
Sensitivity analyses demonstrate that these criteria did not substantially affect the reported transmission and prevention analyses.

Figure S5 shows the genetic distance between sequences from confirmed pairs in the Belgian transmission chain as a function of time elapsed. It is clear that the genetic distance between sequences from confirmed pairs can exceed typical phylogenetic clustering thresholds, provided the time elapsed is sufficiently large. This analysis indicates that genetic distances of not more than 2% between partial HIV pol sequences from true transmission pairs are only expected when the total time elapsed is small. This is typically the case when individuals are frequently followed up, as in a controlled, randomized trial (47). Among the phylogenetically probable transmission pairs in this study, the maximum time elapsed was 10.87 years. Considering figure S28, the corresponding upper 97.5% quantile of the genetic distances between sequences from true transmission pairs is ~ 7%.

To validate the analysis in figure S5, we estimated the genetic distance between sequences from confirmed transmission pairs in the Swedish transmission chain in the same manner. Figure S28 shows that these genetic distances fall into the 80% probability range estimated from the Belgian transmission pairs. This argues against tight genetic distance thresholds to declare transmission pairs phylogenetically implausible in this study.

To exclude potential transmission pairs, we used the following two criteria:

- Bootstrap clade support. If the potential transmitter did not occur in the same clade as the recipient MSM in sufficiently many bootstrap phylogenetic trees, the pair was excluded. Such bootstrap criteria are frequently used (8).
- Phylogenetic incompatibility with direct transmission.
We found that within phylogenetic clades with high bootstrap support, branches between the remaining potential transmitters and the recipient MSM were often relatively long (figure S3). With approximately half of all potential transmitters sampled, one explanation is that the actual transmitter did not have a sequence sampled or was not diagnosed by March 2013. Unobserved intermediate transmitters were detected with a coalescent compatibility test that was recently introduced by Vrancken et al. (13). The idea behind this test is that viral lineages of a true transmission pair must coalesce at a time when the transmitter was already infected. The test assumes that transmitters are infected with a single virus. The test calculates the probability that the viral lineages from the potential transmitter and the recipient coalesce after the transmitter was infected and before the recipient was diagnosed. The test excludes the potential pair if this coalescent compatibility probability is below a certain threshold. To apply this test, we dated coalescent events within phylogenetic clusters. Specifically, the sampled ancestor birth-death model was used in order to allow for the possibility that transmission might have occurred after the time of sequence sampling. To accommodate temporal variation in model parameters, we implemented a skyline version of the sampled ancestor birth-death model following previous work (48).

We then sought to determine thresholds so that potential transmitters are excluded with high specificity (that is, a large proportion of true transmitters to recipients is not excluded). Typically, viral phylogenetic studies aim to identify transmission chains (23). This leads to relatively strict thresholds. Here, we aim to exclude pairs of individuals that did not infect each other. This different objective leads to relatively permissive thresholds.
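The way the two exclusion criteria combine can be sketched in Python. This is a simplified illustration, not the test of Vrancken et al. (13) or the authors' code: it assumes that posterior samples of the coalescence time for a pair are already available as decimal years, for example from a dated phylogeny, and it only performs the final thresholding step. The function names and the example numbers are hypothetical. The thresholds used in the example, 80% and 20%, are the central values adopted later in this section.

```python
from typing import Sequence


def coalescent_compatibility(coalescence_times: Sequence[float],
                             transmitter_infected: float,
                             recipient_diagnosed: float) -> float:
    """Fraction of sampled coalescence times that fall after the transmitter's
    infection and before the recipient's diagnosis (all times in decimal years)."""
    if not coalescence_times:
        raise ValueError("at least one sampled coalescence time is required")
    compatible = sum(1 for t in coalescence_times
                     if transmitter_infected <= t <= recipient_diagnosed)
    return compatible / len(coalescence_times)


def is_excluded(clade_frequency: float, compatibility: float,
                clade_threshold: float, compatibility_threshold: float) -> bool:
    """A pair is excluded if it fails either the clade-support or the compatibility criterion."""
    return clade_frequency < clade_threshold or compatibility < compatibility_threshold


if __name__ == "__main__":
    # Hypothetical posterior samples of the coalescence time for one pair.
    samples = [2004.5, 2005.2, 2006.1, 2003.0]
    p = coalescent_compatibility(samples, transmitter_infected=2004.0,
                                 recipient_diagnosed=2006.0)
    print(p, is_excluded(clade_frequency=0.9, compatibility=p,
                         clade_threshold=0.8, compatibility_threshold=0.2))
```

Keeping the thresholds as explicit arguments makes it straightforward to rerun the exclusion step under lower and upper criteria in a sensitivity analysis.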
For the clade frequency criterion, the type-I error is the probability that sequence pairs of a true transmission pair do not co-cluster. As a proxy, we calculated the probability that sequence pairs from the same individual do not co-cluster. Figure S29 shows this probability as a function of the clade frequency threshold. The approximate type-I error is more than 10% for clade frequency thresholds above 85%. To limit this error, we settled on 80% as the central clade frequency threshold, and considered 70% and 85% as the upper and lower thresholds respectively.

To determine the threshold of the coalescent compatibility test below which potential transmission pairs are excluded, we proceeded as for the phylogenetic clustering test. We approximated the type-I error with the probability that co-clustering sequence pairs from the same individual were excluded by the coalescent compatibility test. Figure S30-A shows this probability as a function of the coalescent compatibility threshold. The approximate type-I error is around 5% for thresholds in the range of 10% to 30%. We chose 20% as the central threshold and considered 10% and 30% in sensitivity analyses. We also evaluated the power of the test in excluding co-clustering female-female pairs. All female-female pairs were considered as incorrect transmission pairs (49). Figure S30-B shows that the coalescent compatibility test excludes more than half of all co-clustering female-female pairs if the compatibility threshold is at least 10%.

To summarize, we adopted the following exclusion criteria:

Exclusion criteria | Clade frequency in bootstrap viral phylogenies | Coalescent compatibility with direct transmission
Central | 80% | 20%
Lower-I | 80% | 30%
Lower-II | 85% | 20%
Upper-I | 80% | 10%
Upper-II | 70% | 20%

Viral phylogenetic analyses were remarkably successful in excluding potential transmission events.
Across the above exclusion criteria, between 99.94% and 99.96% of all potential transmission pairs with sequences available for both individuals could be excluded. Table S3 characterizes the phylogenetically probable transmitters. The difference between using a 7% threshold or no threshold at all was minimal: only 3 more recipients would have been excluded. Sensitivity analyses demonstrate that the findings reported in this study did not vary substantially across these exclusion criteria, and additional genetic distance criteria (figures S14-S22).

SOM 3 Procedure to quantify censoring bias

The observed, probable transmission intervals reported in figure 2 are subject to two main sources of bias. Below, we describe the technical bootstrap procedure to quantify the extent of censoring bias. The idea behind this procedure is described in the Materials and Methods of the main text and figure S6.

Bootstrap techniques proceed by constructing sub-samples from observed data to estimate properties of the observed data that is sampled from the population (50). Here, we implemented a bootstrap technique that sub-censors the observed data to estimate the extent of censoring of the observed data. Censoring describes the proportion of infected individuals that have not yet been registered in the ATHENA cohort, irrespective of whether a sequence was sampled or not. To quantify censoring, we considered all potential transmitters (stage A in figure 1) and their "overlap" intervals, during which the potential transmitters overlapped with infection windows of recipients. The probable transmitters and their transmission intervals do not enter the calculations below.

We adopt the following notation:

$t_E$: end of the observation period
$t_C$: censoring time of potential transmitters
$[t_1, t_2]$: observation period of recipients
$t_C^* = t_C - \delta$: bootstrap censoring time, where $\delta > 0$
$[t_1^*, t_2^*]$: bootstrap observation period, where $t_1^* = t_1 - \delta$ and $t_2^* = t_2 - \delta$.
Here, we set $t_E$ = 2013/03, the time of database closure; $t_C$ = 2010/12, the end of the study period; and $[t_1, t_2]$ to one of the six time intervals 1996/07-2006/06, 2006/07-2007/12, 2008/01-2009/06, 2009/07-2009/12, 2010/01-2010/06, 2010/07-2010/12. The fourth period in table 4 was split into three intervals because of the rapidly increasing impact of censoring towards the present.

For a bootstrap censoring time $t_C^*$, we can calculate the proportion of non-censored intervals in infection/care stage $x$ to recipients that are diagnosed during the bootstrap observation period $[t_1^*, t_2^*]$,

$$c_E(x, t_1^*, t_2^*, t_C^*) = \frac{\sum_{j \in R(t_1^*, t_2^*)} \sum_{\tau \in V_j(x)} \mathbf{1}\{\tau \text{ observed before } t_C^*\}}{\sum_{j \in R(t_1^*, t_2^*)} \sum_{\tau \in V_j(x)} \mathbf{1}\{\tau \text{ observed before } t_E\}},$$

where

$R(t_1^*, t_2^*)$: set of recipient MSM diagnosed in the period $[t_1^*, t_2^*]$,
$V_j(x)$: set of overlap intervals to recipient $j$ that are in stage $x$.

If the corresponding potential transmitter is not diagnosed before $t_C^*$, then $\mathbf{1}\{\tau \text{ observed before } t_C^*\}$ equals zero and otherwise one. This is illustrated in figure S6, where $t_1^*$ = 2006/06, $t_2^*$ = 2007/12, $t_E$ = 2013/03 and $t_C^*$ could be any time between 2008/01 and 2013/03.

We aim to estimate, for the actual censoring time $t_C$, the proportion of non-censored overlap intervals in stage $x$ to recipients that are diagnosed during the period $[t_1, t_2]$. This can be written as

$$c_\infty(x, t_1, t_2, t_C) = \frac{\sum_{j \in R(t_1, t_2)} \sum_{\tau \in V_j(x)} \mathbf{1}\{\tau \text{ observed before } t_C\}}{\sum_{j \in R(t_1, t_2)} \sum_{\tau \in V_j(x)} \mathbf{1}\{\tau \text{ observed before } \infty\}}.$$

We need to assume that the censoring process has not changed within the last $\Delta_{max}$ years from $t_2$. In this case,

$$c_\infty(x, t_1, t_2, t_C) = c_\infty(x, t_1 - \delta, t_2 - \delta, t_C - \delta)$$

for all $\delta < \Delta_{max}$. We need to assume further that all overlap intervals have been observed by $t_E$. This is only the case when the bootstrap observation period lies sufficiently far back in time, that is $\delta > \Delta_{min}$. In this case,

$$c_\infty(x, t_1 - \delta, t_2 - \delta, t_C - \delta) = c_E(x, t_1 - \delta, t_2 - \delta, t_C - \delta)$$

for all $\delta > \Delta_{min}$.
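The quantity $c_E$ and its use at shifted dates can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: overlap intervals are represented by a hypothetical `OverlapInterval` record holding the stage, the diagnosis time of the recipient and the diagnosis time of the potential transmitter, all as decimal years; an interval counts as observed before a given time if the potential transmitter was diagnosed before that time; and the estimate of $c_\infty$ is taken as the average of $c_E$ over shifts $\delta$ drawn uniformly between $\Delta_{min}$ and $\Delta_{max}$, which the two equalities above justify.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OverlapInterval:  # hypothetical record layout, for illustration only
    stage: str
    recipient_diagnosed: float    # decimal year
    transmitter_diagnosed: float  # decimal year


def non_censored_proportion(intervals: Sequence[OverlapInterval], stage: str,
                            t1: float, t2: float, t_censor: float,
                            t_end: float) -> Optional[float]:
    """c_E: share of stage-x overlap intervals observed before t_censor,
    among those observed before t_end, for recipients diagnosed in [t1, t2]."""
    by_end = [iv for iv in intervals
              if iv.stage == stage and t1 <= iv.recipient_diagnosed <= t2
              and iv.transmitter_diagnosed < t_end]
    if not by_end:
        return None  # proportion undefined without any observed interval
    by_censor = sum(1 for iv in by_end if iv.transmitter_diagnosed < t_censor)
    return by_censor / len(by_end)


def estimate_non_censored(intervals: Sequence[OverlapInterval], stage: str,
                          t1: float, t2: float, t_censor: float, t_end: float,
                          delta_min: float, delta_max: float,
                          n_iter: int = 1000, seed: int = 1) -> Optional[float]:
    """Average c_E over shifts delta drawn uniformly from [delta_min, delta_max]."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_iter):
        delta = rng.uniform(delta_min, delta_max)
        c = non_censored_proportion(intervals, stage, t1 - delta, t2 - delta,
                                    t_censor - delta, t_end)
        if c is not None:
            values.append(c)
    return sum(values) / len(values) if values else None
```

Returning `None` when no interval is observed avoids a division by zero; how such empty draws were handled in the study is not stated, so skipping them here is an assumption.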
Under these assumptions on $\delta$, the following bootstrap algorithm provides an estimate of the proportion of overlap intervals that are not censored, $c_\infty(x, t_1, t_2, t_C)$.

Bootstrap algorithm

Let $B$ be the number of bootstrap iterations. For $b = 1, \dots, B$ do
1. Draw $\delta_b$ from a uniform distribution with minimum $\Delta_{min}$ and maximum $\Delta_{max}$.
2. Compute $\hat{c}_b = c_E(x, t_1 - \delta_b, t_2 - \delta_b, t_C - \delta_b)$.
Estimate $c_\infty(x, t_1, t_2, t_C)$ with $\hat{c} = \frac{1}{B} \sum_{b=1}^{B} \hat{c}_b$.

We chose $\Delta_{min}$ and $\Delta_{max}$ as follows. Mathematical modelling indicates that the average time to diagnosis amongst MSM in the Netherlands is ~ 2-3 years in recent years (7, 46). For some individuals, time to diagnosis may be substantially longer and we allowed for up to 4 years. In this case, any $\delta$ such that $t_2 - \delta \leq$ 2009/03 should be sufficiently large. Since the most recent $t_2$ is 2010/12, we have $\delta \geq \Delta_{min}$ = 2 years. Further, we assumed that $\Delta_{max}$ can be set to 3 years.

Figure S31 shows that estimated censoring bias is extensive: for recipients diagnosed between 2010/07 and 2010/12, only an estimated 20% of overlap intervals from potential transmitters estimated to be in chronic infection are observed. As expected from figure S6, the estimated censoring bias was substantially smaller for overlap intervals of potential transmitters in recent infection at time of diagnosis.

SOM 4 Modelling counterfactual prevention scenarios

We formulated prevention models that moved probable transmitters to less infectious infection/care stages, thereby changing the overall probability that any of the recipient MSM would have been infected to less than one. This section describes these individual-level prevention models and how they were parameterized.

SOM 4.1 Improved testing with conventional assays

Counterfactual testing scenarios re-allocated undiagnosed men to less infectious infection/care stages between diagnosis and ART start.
The individual-level testing for prevention model has three parameters,

$\theta_1^{Test}$: duration between consecutive HIV tests in months
$\theta_2^{Test}$: additional fraction of probable transmitters that are tested with frequency $\theta_1^{Test}$
$\theta_3^{Test}$: window period of HIV testing assay,

and proceeds as follows to simulate a counterfactual scenario.

A fraction $\theta_2^{Test}$ of randomly chosen, undiagnosed probable transmitters is tested at intervals of $\theta_1^{Test}$ months. The first test date was randomly allocated so that the average first test was in mid-2008. We assumed that the window period $\theta_3^{Test}$ of conventional assays is exactly 1 month (51). Before this window period, all tests were assumed to be negative. After this window period, all tests were assumed to correctly identify HIV status. After a counterfactual positive test, probable transmission intervals before diagnosis were randomly re-allocated to one of the stages between diagnosis and ART start. The re-allocation stage was drawn in proportion to the adjusted number of probable transmitters in that stage. Each re-allocated probable transmission interval was associated with a randomly chosen transmission probability from the new stage. Thus, the testing for prevention model changes the probability of secondary infections from undiagnosed men to a lower probability of secondary infections from diagnosed, untreated men.

To parameterize this model, we reviewed testing behaviour amongst uninfected MSM in the Netherlands, recipient MSM, and probable transmitters to the recipient MSM in this study. The duration between consecutive tests, $\theta_1^{Test}$, was set to 12 months throughout. 38% of uninfected MSM in the Netherlands reported testing annually in the EMIS 2010: The European Men-Who-Have-Sex-With-Men Internet Survey (52). Amongst MSM diagnosed between 2009/07 and 2010/12, 26.8% had a last negative test within 12 months prior to diagnosis.
Amongst probable transmitters to recipient MSM diagnosed between 2009/07 and 2010/12, 17.3% had a last negative test within 12 months prior to diagnosis. Figure S32 shows that this low proportion of probable transmitters with a last negative test was not sensitive to the choice of infection time estimates or phylogenetic exclusion criteria.

Figure 4 reports estimates of the proportion of transmissions that could have been averted by overall testing coverage $\gamma_{target}^{Test}$. Given a proportion $\gamma_{current}^{Test}$ of probable transmitters that are already testing annually, we determined $\theta_2^{Test}$ through the relationship

$\gamma_{target}^{Test} = \gamma_{current}^{Test} + (1 - \gamma_{current}^{Test})\,\theta_2^{Test}$.

Based on figure S32, we set $\gamma_{current}^{Test} = 0.17$.

SOM 4.2 Improved testing with specialized assays that detect early infection before the presence of HIV antibodies
Counterfactual testing scenarios with specialized assays that can detect early infection before the presence of HIV antibodies were simulated in the same way as the scenarios based on conventional assays, except that the window period was set to zero (51).

SOM 4.3 Antiretroviral pre-exposure prophylaxis
Counterfactual PrEP scenarios prevented randomly chosen, uninfected men from becoming infected. The individual-level PrEP prevention model has two parameters,

$\theta_1^{PrEP}$: fraction of individuals that take PrEP
$\theta_2^{PrEP}$: probability that an individual taking PrEP is not infected,

and proceeds as follows to simulate a counterfactual scenario.

A fraction $\theta_1^{PrEP}$ of recipients that test negative was randomly chosen to take PrEP by mid-2008. The intervention was assumed to be efficacious on a randomly chosen fraction $\theta_2^{PrEP}$ of those. This fraction was removed from the newly infected recipients (infection probability set from 1 to 0). In addition, a fraction $\theta_1^{PrEP}$ of probable transmitters was randomly chosen to take PrEP from the time they first tested negative. Test dates were simulated as in SOM 4.1.
The intervention was also assumed to be efficacious on a randomly chosen fraction $\theta_2^{PrEP}$ of those. This fraction was removed from the infected probable transmitters (lowering infection probabilities of the corresponding recipients to below 1). The PrEP prevention model thus averts secondary infections amongst recipients as well as primary infections of probable transmitters that were uninfected at time of testing.

We parameterized $\theta_2^{PrEP}$ based on findings from the iPrEX, PROUD and ANRS Ipergay studies (18, 19, 37). The iPrEX trial demonstrated an overall reduction in HIV incidence of 44% (95% confidence interval 15-63%) with daily oral tenofovir-based PrEP amongst MSM from diverse settings (18). The PROUD study demonstrated a reduction in HIV incidence of 86% (58%-96%) with daily oral single-pill PrEP amongst predominantly white, high-risk MSM recruited from sexual health clinics in the United Kingdom (19). Reports from the ANRS Ipergay study indicate a reduction in HIV incidence of 86% (40%-99%) amongst MSM in France and Canada who followed an on-demand dosing scheme 2-24 hours before sex (37). Reflecting the more recent PROUD and Ipergay trials, $\theta_2^{PrEP}$ was drawn, for each simulated counterfactual scenario, from a Beta distribution with a mean of 86% and a 95% quantile range of 40%-99%. Uncertainty in this parameter is the main reason why confidence intervals associated with prevention strategies that include PrEP are wider than those without PrEP in figure 4. For the sensitivity analysis reported in figure S12, reflecting the iPrEX trial, $\theta_2^{PrEP}$ was drawn, for each simulated counterfactual scenario, from a Beta distribution with a mean of 44% and a 95% quantile range of 20%-70%.

SOM 4.4 Treatment as prevention
Counterfactual treatment as prevention (TasP) scenarios re-allocated diagnosed, untreated men to less infectious infection/care stages after ART start.
The individual-level TasP prevention model has one parameter

$\theta_1^{TasP}$: time to first viral suppression,

and proceeds as follows.

In case of immediate ART provision, all diagnosed but untreated probable transmitters started ART. Corresponding probable transmission intervals were randomly re-allocated to stages after ART start, with the exception of the intervals between diagnosis and time to first viral suppression $\theta_1^{TasP}$. These intervals were always re-allocated to the stage 'before first viral suppression'. Each re-allocated probable transmission interval was associated with a randomly chosen transmission probability from the new stage. Thus, the TasP prevention model replaces the probability of secondary infections from diagnosed, untreated men with the lower probability of secondary infections from treated men. In case of ART provision when CD4 counts progress to below 500 cells/µl, only the probable transmission intervals after diagnosis with CD4 progression to below 500 cells/µl were randomly re-allocated.

To parameterize this model, available Kaplan-Meier estimates of the percentage of patients with initial suppression to below 100 copies/ml were used (7). An estimated 50% of all patients diagnosed between 2007/01 and 2010/12 reached first viral suppression within 3.6 months, and $\theta_1^{TasP}$ was set to this value.

SOM 4.5 Combinations
Counterfactual combination prevention scenarios were evaluated by combining the single intervention models. To evaluate test-and-treat prevention interventions, we first applied the testing for prevention model, followed by the treatment as prevention model. The PrEP prevention model was always linked to an HIV testing component. To evaluate PrEP in combination with test-and-treat interventions, we first applied the PrEP+test prevention model, followed by the treatment as prevention model.
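As an illustration of the testing component in SOM 4.1, the following minimal sketch inverts the coverage relationship to obtain the additional fraction of probable transmitters that must test, and finds the first scheduled test that would be positive given a window period. The function names, the representation of time as months from an arbitrary origin, and the treatment of a test taken exactly at the end of the window period as positive are assumptions of this sketch, not part of the original analysis code.

```python
import math


def additional_test_fraction(gamma_target, gamma_current=0.17):
    """Invert gamma_target = gamma_current + (1 - gamma_current) * theta2.

    Returns theta2, the additional fraction of probable transmitters that
    would need to test annually to reach the target coverage.
    (Hypothetical helper; 0.17 is the current coverage stated in SOM 4.1.)
    """
    if not (0.0 <= gamma_current < 1.0 and 0.0 <= gamma_target <= 1.0):
        raise ValueError("coverages must be proportions, with gamma_current < 1")
    if gamma_target < gamma_current:
        raise ValueError("target coverage is below current coverage")
    return (gamma_target - gamma_current) / (1.0 - gamma_current)


def first_positive_test(infection_time, first_test, interval=12.0, window=1.0):
    """Time of the first scheduled test that would return a positive result.

    Tests are scheduled at first_test, first_test + interval, ... (in months).
    Assumption of this sketch: a test is positive if and only if it is taken
    at least `window` months after infection.
    """
    detectable_from = infection_time + window
    if first_test >= detectable_from:
        return first_test
    # Number of whole testing intervals needed to reach the detectable period.
    n_intervals = math.ceil((detectable_from - first_test) / interval)
    return first_test + n_intervals * interval


if __name__ == "__main__":
    for target in (0.3, 0.5, 0.7):
        print(target, round(additional_test_fraction(target), 3))
    print(first_positive_test(infection_time=10.0, first_test=0.0))
```

With a current coverage of 0.17, target coverages of 30%, 50% and 70% correspond to additional fractions of roughly 0.16, 0.40 and 0.64. Setting `window=0.0` mimics the specialized assays of SOM 4.2.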
[Figure S1 plot: y-axis, infection status at diagnosis (%); x-axis, calendar year, 1997-2011; categories: confirmed recent HIV infection, recent HIV infection not indicated, missing.]
Figure S1 Number of identified recipient MSM by 3-month intervals. MSM were confirmed to be in recent HIV infection at time of diagnosis if one of the following was reported: a last negative HIV-1 antibody test in the 12 months preceding diagnosis, an indeterminate HIV-1 western blot, or a clinical diagnosis of acute infection. MSM with confirmed recent infection were considered as recipients in the viral phylogenetic transmission and prevention study. To evaluate trends over time, recipient MSM were stratified into four time periods as illustrated by the four blocks in the figure.

[Figure S2 plot: panels by time of diagnosis (96/07-06/06, 06/07-07/12, 08/01-09/06, 09/07-10/12); x-axis, duration of putative infection window (months).]
Figure S2 Duration of infection windows of recipient MSM. Infection windows were at most 12 months long, reflecting the definition of recency of HIV infection. Where available, last negative HIV antibody tests were used to shorten infection windows. We assumed that the window period of HIV antibody tests is approximately 4 weeks, so that the last negative test had to be within 11 months preceding diagnosis in order to reduce the duration of the infection window.

Figure S3 Snapshot of the reconstructed viral phylogeny. Dutch sequences were enriched with subtype B sequences from the Los Alamos HIV sequence database because multiple subtype B lineages were likely imported into the Netherlands (7). Sequences were aligned with ClustalX v2.1 (http://www.clustal.org/clustal2/) using default parameters, and the alignment was manually curated. Primary drug resistance mutations listed in the IAS-USA March 2013 update were masked in each sequence.
The viral phylogeny of the enriched ATHENA sequences was reconstructed under the GTR nucleotide substitution model with the ExaML maximum-likelihood method (42). Each clade in the viral phylogeny was annotated with the frequency with which it occurred among all bootstrap trees. Sequences from the Los Alamos sequence database are shown in grey. Sequences from men in recent infection at diagnosis are shown in dark red. Sequences from men for whom recent infection at diagnosis was not indicated are shown in orange. Sequences from men with unknown infection status at diagnosis are shown in yellow. Sub-clades that occurred in 400 out of 500 bootstrap trees are shown with thicker branches. Estimated branch lengths are in units of substitutions per site.

[Figure S4 plot: y-axis, patristic distance (subst/site); x-axis, sequence pairs (first sequence from recipient MSM and second sequence from potential transmitter).]
Figure S4 Uncertainty in the estimated genetic distance between sequences from the transmitter and recipient of potential transmission pairs. For illustration purposes, 100 pairs with a median genetic distance below 2% were selected. Genetic distances (sum of the average number of nucleotide substitutions per site) were calculated from the reconstructed viral phylogeny on the sequence alignment (red dot), and from reconstructed viral phylogenies on bootstrap sequence alignments (boxplot, bar: median, box: interquartile range, whiskers: 95% quantile range). The genetic distance calculated on the tree with overall highest likelihood is shown as a blue dot. Uncertainty in genetic distances was accounted for in transmission analyses through bootstrap resampling.
Figure S5 Genetic distance between sequence pairs from previously published, epidemiologically confirmed transmitter-recipient pairs, and sequence pairs from the phylogenetically probable transmission pairs in this study. (A) Aligned sequences from the Belgian transmission chain were obtained from the authors (13). Drug-resistance sites were masked in each sequence. Patient A developed multi-drug resistance (13), and sequences from patient A were not considered. The viral phylogeny among all sequences was constructed with maximum-likelihood methods (42) under the GTR nucleotide substitution model. Genetic distances between pairs of sequences from the confirmed transmitter and the confirmed recipient were calculated from the reconstructed viral phylogeny. Infection windows were determined through in-depth patient interviews, and made available by the authors (13). The time elapsed between sequences from a transmission pair was calculated as the time from the midpoint of the established infection window of the recipient to the sampling date of the transmitter, plus the time from the midpoint to the sampling date of the recipient. Genetic distances between confirmed pairs were strongly correlated with the time elapsed (Spearman correlation $\rho = 0.84$, n=2,807). We fitted the probabilistic molecular clock model

$d_i \sim \mathrm{Gamma}(\mu_i, \phi_i)$, $\mu_i = \beta t$, $\phi_i = \gamma_0 + \gamma_1 t$,

where

$d_i$: genetic distance between sequence pair $i$
$\mu_i$: mean of the Gamma distribution for the $i$th pair
$\phi_i$: dispersion of the Gamma distribution for the $i$th pair
$t$: time elapsed between the sequences of the pair
$\beta$: evolutionary rate
$\gamma_k$: dispersion parameters

with regression techniques. The estimated model parameters were $\hat{\beta} = 0.00416$, $\hat{\gamma}_0 = 1.008$, $\hat{\gamma}_1 = -0.0523$. The fitted model explained 28% of the variance in the genetic distances between sequences from confirmed transmission pairs. Quantile ranges of the probabilistic molecular clock model are shown in red.
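To make the fitted clock model in (A) concrete, the following minimal sketch converts the reported point estimates into shape and scale parameters of a Gamma distribution and draws a genetic distance for a given elapsed time. The caption does not state how the dispersion enters the Gamma density; this sketch assumes the variance equals the dispersion times the squared mean (shape = 1/dispersion, scale = mean × dispersion), and it assumes that the elapsed time is supplied in the same unit that was used when the model was fitted. The function names are hypothetical.

```python
import random

# Point estimates reported in the caption of figure S5.
BETA = 0.00416    # evolutionary rate
GAMMA0 = 1.008    # dispersion intercept
GAMMA1 = -0.0523  # dispersion slope


def clock_shape_scale(t):
    """Shape and scale of the Gamma distribution at elapsed time t.

    Assumption of this sketch: variance = phi * mu**2, so that
    shape = 1 / phi and scale = mu * phi (the mean is shape * scale = mu).
    """
    mu = BETA * t
    phi = GAMMA0 + GAMMA1 * t
    if mu <= 0.0 or phi <= 0.0:
        raise ValueError("mean and dispersion must be positive for this t")
    return 1.0 / phi, mu * phi


def sample_distance(t, rng):
    """Draw one genetic distance (substitutions per site) at elapsed time t."""
    shape, scale = clock_shape_scale(t)
    return rng.gammavariate(shape, scale)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducibility
    print(clock_shape_scale(5.0))
    print(sample_distance(5.0, rng))
```

Because the estimated dispersion slope is negative, the linear dispersion term becomes non-positive for large elapsed times; the sketch raises an error in that case rather than extrapolating.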
(B) The fitted model was then applied to the 2,343 phylogenetically probable transmission pairs in this study to express the relative probability that a phylogenetically identified transmitter was the actual transmitter to a recipient. To reflect uncertainty in the genetic distance between probable transmission pairs, calculations were repeated on genetic distance values sampled from the distributions shown in figure S4. The time elapsed between sequences from phylogenetically probable pairs was calculated as the time from the midpoint of the infection window of the recipient to the sampling date of the transmitter, plus the time from the midpoint to the sampling date of the recipient. Transmission probabilities clearly varied between probable transmitters.

Figure S6 Right censoring at past, hypothetical database closure times. (A) Distribution of time of diagnosis of potential transmitters to recipients that were diagnosed between $t_1^* = 2006/06$ and $t_2^* = 2007/12$. (Left) Histogram of the time of diagnosis of potential transmitters with confirmed recent infection at diagnosis. Infection windows of the recipients start at the earliest in June 2005, and so do the putative transmission intervals between potential transmitters and their recipients ("overlap intervals"). Therefore, all potential transmitters with an overlap interval before diagnosis must be diagnosed after June 2005. This explains the abrupt start of the histogram after June 2005. (Right) Histogram of the time of diagnosis of potential transmitters estimated to be in chronic infection. Potential transmitters in undiagnosed, chronic infection at the putative transmission time may be diagnosed several years after their recipient. (B) Estimated proportion of censored overlap intervals at hypothetical database closure times after $t_2^* = 2007/12$. For a hypothetical closure time, say $t_C^* = 2008/12$, we first identified the potential transmitters with date of diagnosis after $t_C^*$.
Next, we counted the overlap intervals of the hypothetically censored potential transmitters in each stage. Then we determined the proportion of these intervals among all intervals by stage. This proportion is plotted against hypothetical closure times, and quantifies the proportion of intervals that would have been censored, had the database been closed at the hypothetical closure time. A bootstrap algorithm described in the supplementary online material was used to extrapolate these estimates to the actual database closure time.

[Figure S7 plot: x-axis, overlap intervals of a potential transmitter with a sequence (%); colours, time of diagnosis of recipient MSM (96/07-06/06, 06/07-07/12, 08/01-09/06, 09/07-10/12); rows, stages in the infection and care continuum:
Not in contact
ART initiated, After first viral suppression, Viral suppression, >1 observations
ART initiated, After first viral suppression, Viral suppression, 1 observation
ART initiated, After first viral suppression, No viral suppression
ART initiated, After first viral suppression, No viral load measured
ART initiated, Before first viral suppression
Diagnosed, No CD4 measured
Diagnosed, CD4 progression to <350
Diagnosed, CD4 progression to [350-500]
Diagnosed, CD4 progression to >500
Diagnosed < 3mo, Recent infection at diagnosis
Undiagnosed, Unconfirmed chronic infection
Undiagnosed, Unconfirmed recent infection
Undiagnosed, Confirmed recent infection at diagnosis]
Figure S7 Sequence sampling probabilities by stage in the infection and care continuum. To characterize sequence coverage by stage in the infection/care continuum, we considered potential transmitters with and without a sequence, and their "overlap" intervals during which they overlapped with infection windows of recipients. Then, the proportion of overlap intervals whose potential transmitter had a viral sequence sampled was calculated, and plotted by stage and time of diagnosis of the recipient. Colour codes are as in figure 2 in the main text.
Typically, sampling probabilities increased with calendar time. Reflecting preferential sequencing for drug resistance testing, intervals with viral load measurements below 100 copies/ml were sampled least frequently, while those above 100 copies/ml were sampled twice as often. Intervals with a lower CD4 count were more likely to be sampled than those with a higher CD4 count. Intervals of transmitters in confirmed recent infection at diagnosis were also more likely to be sampled than those without, reflecting participation of the former in sub-studies of the ATHENA cohort (7).

[Figure S8 plot: y-axis, transmission probability (%); x-axis, transmission intervals; one panel per infection/care stage.]
Figure S8 Individual-level variation in phylogenetically derived transmission probabilities by infection/care stages. Transmission probabilities for observed transmission intervals $\tau$ were calculated as described in Materials and Methods, and are shown for a random sample of 40 observed transmission intervals for four infection/care stages. Colour codes match those in figure 2 in the main text. Uncertainty in the estimated phylogenetic transmission probabilities is indicated with boxplots (black bar: median, box: interquartile range, whiskers: 95% quantile range). Substantial individual-level variation in transmission probabilities indicates that a relatively large number of past transmission events is needed in order to reliably quantify sources of transmission.

[Figure S9 plot: y-axis, transmission intervals (%); x-axis, stage in HIV infection and care continuum; panels: central estimate of HIV infection times with central phylogenetic exclusion criteria; lower estimate of HIV infection times; upper estimate of HIV infection times; phylogenetic exclusion criteria with coalescent compatibility < 30%, 20% or 10% and clade frequency < 80%, 85% or 70%.]
Figure S9 Frequency of infection/care stages among phylogenetically probable transmitters.
Phylogenetic exclusion criteria and infection time estimates were varied as described in the panels. Colour codes are as in figure 2 in the main text. Overall, infection/care stages before ART start were overrepresented amongst phylogenetically probable transmitters (marked by an asterisk), while all stages after ART start were underrepresented amongst phylogenetically probable transmitters (marked by an asterisk). Periods with no contact to HIV care services for at least 18 months were also underrepresented amongst phylogenetically probable transmitters, likely reflecting that a large proportion of potential transmitters that are listed in the ATHENA cohort but had no contact for 18 months moved abroad or died.

[Figure S10 plot: y-axis, phylogenetic transmission probability per observed, probable interval (%); x-axis, stage in HIV infection and care continuum; panels: central estimate of HIV infection times with central phylogenetic exclusion criteria; lower estimate of HIV infection times; upper estimate of HIV infection times; phylogenetic exclusion criteria with coalescent compatibility < 30%, 20% or 10% and clade frequency < 80%, 85% or 70%.]
Figure S10 Phylogenetically derived transmission probabilities of observed transmission intervals. Phylogenetic exclusion criteria and infection time estimates were varied as described in the panels. Colour codes are as in figure 2 in the main text.
Overall, transmission probabilities were small, with a mean of 2.1% (25% quantile: 0.9%, 75% quantile: 2.5%, 97.5% quantile: 11.6%). However, when grouped by infection/care stage, the phylogenetic transmission probabilities were highly informative as to how transmission rates change with progression through the infection and care continuum.

[Figure S11 plot: y-axis, transmission risk ratio compared to diagnosed, untreated men with CD4>500 (%); x-axis, time of diagnosis of recipient MSM (96/07-06/06, 06/07-07/12, 08/01-09/06, 09/07-10/12); panels: central estimate of HIV infection times with central phylogenetic exclusion criteria; lower estimate of HIV infection times; upper estimate of HIV infection times; phylogenetic exclusion criteria with coalescent compatibility < 30%, 20% or 10% and clade frequency < 80%, 85% or 70%.]
Figure S11 Transmission risk ratio from men after ART start, compared to diagnosed untreated men with CD4 > 500 cells/µl. Colour codes are as in figure 2 in the main text.

[Figure S12 plot: x-axis, HIV infections amongst MSM in the transmission cohort that could have been averted in 08/07-10/12 (%); columns: no improvements to annual testing coverage, annual testing coverage of probable transmitters 30%, 50%, 70%; row blocks: PrEP reduction in incidence 44%, PrEP reduction in incidence 86%; interventions in each block: test-PrEP (<30 yrs), test-PrEP (<30 yrs)-treat (CD4<500), test-PrEP (<30 yrs)-treat (Immediate), test-PrEP (all), test-PrEP (all)-treat (CD4<500), test-PrEP (all)-treat (Immediate).]
Figure S12 Sensitivity analysis on the impact of PrEP with lower efficacy. Estimated proportion of infections between mid-2008 and 2011 that could have been averted through the listed interventions, assuming a 44% efficacy of PrEP as reported in the iPrEX trial, and an 86% efficacy of PrEP as reported in the more recent Ipergay and PROUD trials.
[Figure S13 plot: x-axis, HIV infections amongst MSM in the transmission cohort that could have been averted in 08/07-10/12 (%); columns: no improvements to annual testing coverage, annual testing coverage of probable transmitters 30%, 50%, 70%; row blocks: PrEP coverage of probable transmitters that test negative 33%, 50%, 66%; interventions in each block: test, test (RNA), test-treat (CD4<500), test-treat (Immediate), test-PrEP (<30 yrs), test-PrEP (<30 yrs)-treat (CD4<500), test-PrEP (<30 yrs)-treat (Immediate), test-PrEP (all), test-PrEP (all)-treat (CD4<500), test-PrEP (all)-treat (Immediate).]
Figure S13 Sensitivity analysis on the impact of lower or higher PrEP coverage.

[Figure S14 plot: y-axis, proportion of transmissions (%); x-axis, stage in HIV infection and care continuum; columns: 96/07-06/04, 06/05-07/12, 08/01-09/06, 09/07-10/12; rows: with adjustment for censoring and sampling biases, with adjustment for sampling biases, no adjustment for sampling and censoring biases.]
Figure S14 Impact of sampling and censoring adjustments on the estimated proportion of transmissions from stages in the infection and care continuum.
The proportion of transmissions attributable to infection/care stages was calculated as described in the Materials and Methods (top row), with equal sequence sampling probabilities in the missing data model (middle row), and with $m_j(z) = 0$ (bottom row, see Materials and Methods). Colour codes are as in figure 2 in the main text. Without corrections for censoring and sampling bias, the proportion of transmissions attributable to undiagnosed men declines to less than 40% by 2009/07-2010/12. Without corrections for censoring bias but with corrections for sequence sampling bias, the proportions of stages with large per capita transmission probabilities increase most. Those stages with a large number of missing intervals are not necessarily amplified most, because each of these intervals may be associated with a relatively small transmission probability. In particular, the estimated proportion of transmissions from undiagnosed men in recent infection at diagnosis is large, even though the total number of added, missing intervals in this stage is comparatively small. This comparison confirms, first, that sequence sampling and censoring bias can have an extensive impact on population-based phylogenetic analyses. Second, it demonstrates that it is not intuitive how corrections for sequence sampling and censoring biases affect such analyses when relative transmission rates also vary across risk groups.

[Figure S15 plot: y-axis, proportion of transmissions (%); x-axis, stage in HIV infection and care continuum; columns: 96/07-06/04, 06/05-07/12, 08/01-09/06, 09/07-10/12; rows: with phylogenetic transmission probability per interval, with phylogenetic transmission probability per probable transmitter, every interval equally likely, every probable transmitter equally likely.]
Figure S15 Impact of phylogenetic transmission probabilities on the estimated proportion of transmissions from stages in the infection and care continuum.
The proportion of transmissions attributable to infection/care stages was calculated as described in the Materials and Methods (top row), without adjusting for differences in the number of intervals per pair ($\omega_{ij\tau} = \varphi_{ij}\,\omega_{ij}$, second row), with transmission from every probable transmitter equally likely ($\omega_{ij\tau} = 1/\tau_{ij}$, third row), and transmission from every interval equally likely ($\omega_{ij\tau} = 1$, bottom row). Colour codes are as in figure 2 in the main text. Setting $\omega_{ij\tau} = \varphi_{ij}\,\omega_{ij}$ (second row) had no substantial impact on the estimated proportions. In the last two cases, the proportion of transmissions from undiagnosed men in recent infection at time of diagnosis is very small, because the high infectiousness during this short stage in the infection and care continuum is ignored. In addition, the proportion of transmissions from men after ART start is much higher because the low infectiousness during these stages in the infection and care continuum is ignored. Ignoring differential transmission probabilities among probable transmitters may complicate interpretation of viral phylogenetic cluster association studies.

[Figure S16, image not reproduced. Time periods: 96/07-06/04, 06/05-07/12, 08/01-09/06, 09/07-10/12. Panel titles: central estimate of HIV infection times, central phylogenetic exclusion criteria; lower estimate of HIV infection times; upper estimate of HIV infection times. x-axis: stage in HIV infection and care continuum. y-axis: Proportion of transmissions (%).]

Figure S16 Impact of infection time estimates on the estimated proportion of transmissions from stages in the infection and care continuum. The proportion of transmissions attributable to infection/care stages was calculated as described in the Materials and Methods (top row), based on the potential transmitters identified with lower 95% estimates of HIV infection times in table S2 (middle row), and upper 95% estimates of HIV infection times (bottom row). Colour codes are as in figure 2 in the main text.
The estimated proportions did not vary substantially.

[Figure S17, image not reproduced. Time periods: 96/07-06/04, 06/05-07/12, 08/01-09/06, 09/07-10/12. Panel titles: central estimate of HIV infection times, central phylogenetic exclusion criteria; phylogenetic exclusion criteria with coalescent compatibility < 30% and clade frequency < 80%; coalescent compatibility < 10% and clade frequency < 80%; coalescent compatibility < 30% and clade frequency < 85%; coalescent compatibility < 20% and clade frequency < 85%; coalescent compatibility < 10% and clade frequency < 85%; coalescent compatibility < 30% and clade frequency < 70%. y-axis: Proportion of transmissions (%).]

Figure S17 Impact of phylogenetic clustering criteria on the estimated proportion of transmissions from stages in the infection and care continuum. The proportion of transmissions attributable to infection/care stages was calculated as described in the Materials and Methods (top row), and then using alternative upper and lower phylogenetic exclusion criteria as described in the row panels. Colour codes are as in figure 2 in the main text. The estimated proportions did not vary substantially.

Figure S18 Impact of additional genetic distance criteria on the estimated proportion of transmissions from stages in the infection and care continuum. The proportion of transmissions attributable to infection/care stages was calculated as described in the Materials and Methods, but potential transmitters were also excluded with additional genetic distance criteria described in the row panels. Colour codes are as in figure 2 in the main text.
Before censoring and sampling biases were adjusted, an additional 2% genetic distance criterion led to a slight increase in the proportion of transmissions attributable to men in their first year of infection, while the additional 4% genetic distance criterion led to estimates that are comparable to those obtained without the genetic distance criterion. After censoring and sampling biases were adjusted, the estimated proportions did not differ substantially from those obtained without an additional genetic distance criterion.

[Figure S19, image not reproduced. Testing-coverage panels: no improvements to annual testing coverage; annual testing coverage of probable transmitters 30%, 50%, 70%. Panel titles: with adjustment for censoring and sampling biases; with adjustment for sampling biases; no adjustment for sampling and censoring biases. Interventions as in Figure S13. x-axis: HIV infections amongst MSM in the transmission cohort that could have been averted in 08/07 - 10/12 (%).]

Figure S19 Impact of sequence sampling and censoring adjustments on the estimated proportion of averted infections.
The proportion of transmissions that could have been averted was calculated as described in the Materials and Methods (top row), with equal sequence sampling probabilities in the missing data model (middle row), and with $m_j(z) = 0$ (bottom row). With no corrections to censoring and sampling bias, the estimated proportion of undiagnosed men is less than 40% in 2008/07-2010/12. Correspondingly, the estimated impact of test-and-treat is higher when compared to the central estimate. However, even under this extreme case of model misspecification, interventions including test-and-PrEP are associated with the largest reductions in HIV incidence. The estimated $a(H)$ differ from the central estimate by at most 10%. The case where sampling bias is adjusted but censoring bias is ignored is overall similar to the central estimates. This comparison indicates that the evaluation of the short-term impact of prevention strategies is robust to extensive differences in how sequence sampling and right censoring biases are adjusted for.
[Figure S20, image not reproduced. Testing-coverage panels: no improvements to annual testing coverage; annual testing coverage of probable transmitters 30%, 50%, 70%. Panel titles: with phylogenetic transmission probability per interval; every interval equally likely; every probable transmitter equally likely. Interventions as in Figure S13. x-axis: HIV infections amongst MSM in the transmission cohort that could have been averted in 08/07 - 10/12 (%).]

Figure S20 Impact of phylogenetic transmission probabilities on the estimated proportion of averted infections. The proportion of transmissions that could have been averted was calculated as described in the Materials and Methods without adjusting for differences in the number of intervals per pair ($\omega_{ij\tau} = \varphi_{ij}\,\omega_{ij}$, top row), with transmission from every probable transmitter equally likely ($\omega_{ij\tau} = 1/\tau_{ij}$, middle row), and transmission from every interval equally likely ($\omega_{ij\tau} = 1$, bottom row). It is clear that if men after diagnosis are considered as infectious as undiagnosed men, then there is no secondary benefit in moving these individuals to stages further down in the HIV infection and care continuum.
Figure S21 Impact of infection time estimates and phylogenetic exclusion criteria on the estimated proportion of averted infections. The proportion of transmissions that could have been averted was calculated as described in the Materials and Methods, but using alternative upper and lower phylogenetic exclusion criteria as described in the row panels. The estimated proportions averted did not vary substantially.

[Figure S22, image not reproduced. Testing-coverage panels: no improvements to annual testing coverage; annual testing coverage of probable transmitters 30%, 50%, 70%. Panel titles: central estimate of HIV infection times, central phylogenetic exclusion criteria and genetic distance <2%; central estimate of HIV infection times, central phylogenetic exclusion criteria and genetic distance <4%. Interventions as in Figure S13. x-axis: HIV infections amongst MSM in the transmission cohort that could have been averted in 08/07 - 10/12 (%).]

Figure S22 Impact of additional genetic distance criteria on the estimated proportion of averted infections per biomedical intervention. The proportion of transmissions that could have been averted was calculated as described in the Materials and Methods, but potential transmitters were also excluded with additional genetic distance criteria described in the row panels.
The estimated short-term impact of prevention strategies is insensitive to an additional 4% genetic distance criterion. With an additional 2% genetic distance criterion, the predicted impact that test-and-treat strategies could have had on reducing incidence is lower than without a genetic distance criterion.

Figure S23 Differences in transmission networks with and without a recipient MSM. To investigate whether transmitters to MSM in recent infection at diagnosis might differ from typical transmitters, we considered all identified viral phylogenetic clusters with and without a recipient MSM. We then sought to evaluate whether the number of late presenters (defined as men with a first CD4 count below 350 cells/µl within 12 months after diagnosis and before ART start) was enriched amongst clusters with no recipient MSM. (A) The number of late presenters increases linearly with cluster size (blue: loess smooth, black: regression model with linear dependence on cluster size). Adjusting for cluster size, we then fitted the following Poisson model with identity link

$$n_i \sim \mathrm{Poi}(\mu_i), \qquad \mu_i = \beta_0 + \beta_1 r_i + \beta_2 s_i,$$

where
$n_i$ is the number of late presenters in cluster $i$,
$\mu_i$ is the mean number of late presenters in cluster $i$,
$r_i$ is 1 if cluster $i$ has a recipient MSM and 0 otherwise,
$s_i$ is the size of cluster $i$,

to estimate the contrast $\beta_1$ between the average number of late presenters in clusters with and without a recipient. $\beta_1$ was significantly smaller than zero after differences in cluster size were adjusted for (p=2e-14). (B) To visualize, we calculated the adjusted number of late presenters in a cluster as the number of late presenters minus the expected number of late presenters in clusters with a recipient MSM ($r_i = 1$) under the fitted Poisson model. Dots indicate the adjusted number of late presenters for all viral phylogenetic transmission clusters. Boxplots indicate the mean and two standard deviations from the mean.
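The adjustment used for panel (B) can be illustrated with a minimal Python sketch of the identity-link mean mu_i = beta0 + beta1 * r_i + beta2 * s_i. The coefficient values and the three example clusters below are hypothetical and are not estimates from the study; the function names are likewise illustrative, and no model fitting is attempted here.

```python
import math

def expected_late_presenters(beta0, beta1, beta2, has_recipient, size):
    """Identity-link mean: mu_i = beta0 + beta1 * r_i + beta2 * s_i."""
    r = 1 if has_recipient else 0
    return beta0 + beta1 * r + beta2 * size

def poisson_loglik(counts, means):
    """Poisson log-likelihood; the identity link requires every mean to be positive."""
    if any(m <= 0 for m in means):
        return float("-inf")
    return sum(n * math.log(m) - m - math.lgamma(n + 1)
               for n, m in zip(counts, means))

def adjusted_late_presenters(n, size, beta0, beta1, beta2):
    """Observed count minus the expected count of a same-sized cluster with r_i = 1."""
    return n - expected_late_presenters(beta0, beta1, beta2, True, size)

# Hypothetical coefficients, for illustration only (NOT the fitted values).
beta0, beta1, beta2 = 0.5, -0.4, 0.2

# Hypothetical clusters: (late presenters n_i, has recipient r_i, size s_i).
clusters = [(3, False, 10), (1, True, 10), (0, True, 4)]

means = [expected_late_presenters(beta0, beta1, beta2, r, s) for _, r, s in clusters]
adjusted = [adjusted_late_presenters(n, s, beta0, beta1, beta2) for n, _, s in clusters]
loglik = poisson_loglik([n for n, _, _ in clusters], means)

for (n, r, s), a in zip(clusters, adjusted):
    print(f"n={n} recipient={r} size={s} adjusted={a:.2f}")
```

Under these assumed coefficients, a cluster without a recipient that holds more late presenters than a same-sized cluster with a recipient would be expected to hold receives a positive adjusted count, which is the quantity summarised by the boxplots.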
Viral phylogenetic transmission clusters without a recipient MSM are clearly enriched in late presenters.

Figure S24 Exploratory local polynomial regression fits to the time to diagnosis of MSM with a last negative test in the ATHENA cohort. Data from MSM with a last negative test (dots) are shown on top of the mean local polynomial regression fit (blue line) and the corresponding 95% quantile ranges (light blue region).

Figure S25 Multivariable Gamma regression model fitted to the time between the midpoint of the seroconversion interval and diagnosis of MSM with a last negative test in the ATHENA cohort. Data from MSM with a last negative test (dots) are shown on top of the mean Gamma regression fit (blue line) and the corresponding 95% quantile ranges (light blue region).

[Figure S26, image not reproduced. Curves for seroconverting MSM for whom recent HIV infection was not indicated at diagnosis and: CD4 < 250, 25 years at diagnosis; CD4 < 250, 35 years at diagnosis; CD4 in [250-850], 35 years at diagnosis; CD4 > 850, 35 years at diagnosis. x-axis: t (years). y-axis: Probability that time between midpoint of seroconversion interval and diagnosis is > t.]

Figure S26 Estimated probability that the time between the midpoint of the seroconversion interval and diagnosis among MSM with a last negative test is larger than t years.

Figure S27 Time to diagnosis estimates. (A) Average estimated time to diagnosis for MSM infected between 1996-1999 under the regression model as a function of the quantile parameter. The maximum-likelihood estimate and the 95% confidence interval derived from two mathematical modelling studies are highlighted in black/grey. (B) Predicted time to diagnosis for all MSM in the ATHENA cohort with estimated date of infection between 1990-2011. The three subplots show the higher, central and lower estimates of time to diagnosis that correspond to the calibrated quantile parameters 0.109, 0.148 and 0.194, respectively.

[Figure S28, image not reproduced. Data sets: confirmed pairs, Belgium; confirmed pairs, Sweden. x-axis: time elapsed (years). y-axis: evolutionary divergence (nucleotide substitutions / site).]

Figure S28 Genetic distance among sequence pairs from transmitter-recipient pairs in the Belgian and Swedish transmission chains. The 80% and 95% quantile ranges under the fitted probabilistic molecular clock model are shown with dashed and dotted lines.

[Figure S29, image not reproduced. Distance threshold: None; 2% subst/site; 4% subst/site. x-axis: clade frequency threshold (0.7 to 0.95). y-axis: probability that sequences from the same individual do not co-cluster (%).]

Figure S29 Approximate type-I error of the phylogenetic clustering criterion as a function of the clade frequency threshold. Approximate type-I error in excluding sequences from the same individual if no genetic distance threshold is used (green), a 2% substitutions / site distance threshold is used (orange), and a 4% substitutions / site genetic distance threshold is used (purple).

Figure S30 Type-I error and power of the coalescence compatibility test. (A) Approximate type-I error in excluding co-clustering sequences from the same individual. (B) Power in excluding co-clustering female-female pairs.

Figure S31 Estimated fraction of non-censored potential transmission intervals. The fraction of non-censored potential transmission intervals was estimated for six time intervals, $[t_1, t_2]$ = 1996/07-2006/06, 2006/07-2007/12, 2008/01-2009/06, 2009/07-2009/12, 2010/01-2010/06, 2010/07-2010/12. The fraction is plotted at the end time of each time interval.
[Figure S32, image not reproduced. Panel titles: central estimate of HIV infection times, central phylogenetic exclusion criteria; lower estimate of HIV infection times; upper estimate of HIV infection times; phylogenetic exclusion criteria with coalescent compatibility < 10%, < 20% or < 30% combined with clade frequency < 70%, < 80% or < 85%. Legend: Diagnosed in 09/07-10/12; Probable transmitters to recipients diagnosed in 09/07-10/12. x-axis: time between last negative HIV test and diagnosis (years). y-axis: Proportion with a last negative test.]

Figure S32 Time between last negative test and diagnosis amongst MSM diagnosed in 2009/07-2010/12 and probable transmitters of recipients diagnosed in 2009/07-2010/12.
Table S1 Clinical and viral sequence data used in this study

Sample sizes                                        n
  Registered MSM by March 2013                 11,863
  MSM with confirmed recent infection           1,794
  Viral load measurements                     265,853
  CD4 measurements                            284,151

Coverage of linked clinical data of potential transmitters, by time of diagnosis of recipient MSM
                                                            96/07-06/06  06/07-07/12  08/01-09/06  09/07-10/12
  Unknown recency status at diagnosis (%)                            51           48           46           45
  No CD4 measurement between diagnosis and ART start (%)            8.6          8.0          7.8          7.6
  No viral load measurement after ART start (%)                     8.4          8.2          8.0          8.0

Frequency of linked clinical data, by time of diagnosis of recipient MSM

  CD4 measurements between diagnosis and ART start of potential transmitters (number/year)
                 2.5%    25%   Median    75%   97.5%
  96/07-06/06    0.16   1.38     2.75   3.93    6.71
  06/07-07/12    0.18   1.61     2.92   3.99    6.73
  08/01-09/06    0.19   1.67     2.97   4.01    6.74
  09/07-10/12    0.19   1.68     2.97   4.01    6.74

  Viral load measurements after ART start of potential transmitters (number/year)
                 2.5%    25%   Median    75%   97.5%
  96/07-06/06    0.72   2.13     2.70   3.29    5.00
  06/07-07/12    0.76   2.11     2.66   3.23    4.54
  08/01-09/06    0.75   2.10     2.64   3.21    4.50
  09/07-10/12    0.76   2.09     2.62   3.19    4.41

Sequences                                                                                   n
  Partial polymerase HIV-1 subtype B sequences available                                8,748
  Dutch sequences enriched with 10 most similar sequences in the
    Los Alamos Sequence Database (http://www.hiv.lanl.gov/)                             9,474
  Data set used in analysis, after exclusion of potentially recombinant sequences
    (identified with 3SEQ), and exclusion of sequences with length <= 250 nucleotides   9,054

Characteristics of Dutch HIV-1 subtype B sequences used in analysis
                                                  n
  Dutch sequences                             8,328
  Sequences sampled after ART start           3,693
  Patients sampled                            6,231

                                                                   5%    25%   Mean    75%    95%
  Length of ATHENA sequences (nt)                                1175   1235   1256   1274   1600
  Time from diagnosis to sampling of an individual's
    first sequence (years)                                          0   0.03    2.8    4.1   13.5

  Individuals with at least one sequence      Male   Female
  MSM                                         4767        0
  Drug user                                    123       57
  Other                                         82       42
  Unknown                                      174       14
  Heterosexual                                 518      455

Table S2 Potential transmitters and potential transmission pairs to the recipient MSM

                       Central estimate of         Lower estimate of           Upper estimate of
                       infection time ¶            infection time ¶            infection time ¶
Time of diagnosis      All (n)   With a            All (n)   With a            All (n)   With a
of recipient MSM                 sequence (n)                sequence (n)                sequence (n)

Recipient MSM
  Overall                 1794       1045             1794       1045             1794       1045
  96/07-06/06              695        368              695        368              695        368
  06/07-07/12              323        216              323        216              323        216
  08/01-09/06              405        233              405        233              405        233
  09/07-10/12              371        228              371        228              371        228

Potential transmitters*
  Overall                12193       5585            12189       5585            12193       5585
  96/07-06/06             9687       4322             9537       4239             9816       4376
  06/07-07/12            10179       4750            10093       4718            10272       4779
  08/01-09/06            10962       5148            10935       5142            10989       5161
  09/07-10/12            11419       5329            11415       5329            11419       5329

Potential transmission pairs*
  Overall              9722349    4428060          9617162    4378332          9824133    4475601
  96/07-06/06          2712072    1158948          2649412    1125357          2777347    1193496
  06/07-07/12          2075669     961286          2052721     951409          2096432     969131
  08/01-09/06          2427787    1134094          2412127    1128919          2440400    1138425
  09/07-10/12          2506821    1173732          2502902    1172647          2509954    1174549

* Potential transmitters and potential transmission pairs were counted for recipient MSM with a sequence for computational reasons.
¶ The infection time of potential transmitters was estimated from their age at diagnosis, recency of HIV infection at diagnosis, and first CD4 count within 12 months of diagnosis. The central estimate is based on 𝛼 = 0.148 in the estimation procedure described in the supplementary online material; the lower estimate is based on 𝛼 = 0.194, and the upper estimate is based on 𝛼 = 0.109.

Table S3 Identified phylogenetically probable transmitters and phylogenetically probable transmission pairs to the recipient MSM.
Exclusion criteria:
  Central:   clade freq <80%, coal comp <20%
  Lower-I:   clade freq <80%, coal comp <30%
  Upper-I:   clade freq <80%, coal comp <10%
  Lower-II:  clade freq <85%, coal comp <20%
  Upper-II:  clade freq <70%, coal comp <20%

Time of diagnosis      Central         Lower-I         Upper-I         Lower-II        Upper-II
of recipient MSM       (n)    (%)      (n)    (%)      (n)    (%)      (n)    (%)      (n)    (%)

Recipient MSM with a phylogenetically probable transmitter (% *)
  Overall              617   59.14     594   56.84     638   61.05     564   53.97     656   62.78
  96/07-06/06          165   44.84     150   40.76     172   46.74     146   39.67     179   48.64
  06/07-07/12          144   66.67     143   66.2      146   67.59     134   62.04     150   69.44
  08/01-09/06          152   65.24     149   63.95     159   68.24     138   59.23     160   68.67
  09/07-10/12          157   68.86     152   66.67     161   70.61     146   64.04     167   73.25

Phylogenetically probable transmitters (% §)
  Overall              903   16.17     841   15.06     981   17.56     823   14.74     975   17.46
  96/07-06/06          268    6.32     237    5.59     308    7.27     240    5.55     308    7.13
  06/07-07/12          331    7.02     302    6.4      362    7.67     307    6.46     359    7.56
  08/01-09/06          348    6.77     322    6.26     400    7.78     323    6.27     391    7.6
  09/07-10/12          407    7.64     370    6.94     448    8.41     367    6.89     442    8.29

Phylogenetically probable transmission pairs (% ¶)
  Overall             2343    0.05    2059    0.05    2698    0.06    2097    0.05    2718    0.06
  96/07-06/06          401    0.04     353    0.03     477    0.04     380    0.03     498    0.04
  06/07-07/12          506    0.05     446    0.05     569    0.06     488    0.05     579    0.06
  08/01-09/06          731    0.06     636    0.06     842    0.07     642    0.06     819    0.07
  09/07-10/12          705    0.06     624    0.05     810    0.07     587    0.05     822    0.07

See supplementary online material for a description of sensitivity analyses.
* Proportion among all recipient MSM with a sequence.
§ Proportion among potential transmitters with a sequence.
¶ Proportion among potential transmission pairs with sequences from both individuals, based on central estimates of infection times.
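As a worked example of the footnoted proportions, the short Python sketch below recomputes the percentage of recipient MSM with a phylogenetically probable transmitter under the central exclusion criteria, dividing the period-specific counts in Table S3 by the number of recipient MSM with a sequence in Table S2. The variable and function names are illustrative only.

```python
# Recipient MSM with a sequence, by time of diagnosis (Table S2).
recipients_with_sequence = {
    "96/07-06/06": 368,
    "06/07-07/12": 216,
    "08/01-09/06": 233,
    "09/07-10/12": 228,
}

# Recipient MSM with a phylogenetically probable transmitter,
# central exclusion criteria (Table S3).
recipients_with_probable_transmitter = {
    "96/07-06/06": 165,
    "06/07-07/12": 144,
    "08/01-09/06": 152,
    "09/07-10/12": 157,
}

def percentage(numerator, denominator):
    """Percentage rounded to two decimals, as reported in Table S3."""
    return round(100.0 * numerator / denominator, 2)

proportions = {
    period: percentage(recipients_with_probable_transmitter[period],
                       recipients_with_sequence[period])
    for period in recipients_with_sequence
}

for period, value in proportions.items():
    print(period, value)
```

For these four periods the recomputed values are 44.84, 66.67, 65.24 and 68.86, matching the central-criteria column of Table S3.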
Table S4 Demographic and clinical characteristics of the 3,025 MSM with a last negative test who were used to fit the multivariable regression model

                                                                n
Time of diagnosis
  <=06/06                                                     931
  06/07-07/12                                                 609
  08/01-09/06                                                 721
  09/07-10/12                                                 764

Age at diagnosis
  <=25                                                        242
  26-35                                                       944
  36-45                                                      1114
  46-55                                                       545
  >55                                                         180

CD4 count at diagnosis
  ≤250                                                        471
  251-850                                                    2169
  >850                                                        230
  Missing: no CD4 measurement to date                          10
  Missing: CD4 measurement after 1 year of diagnosis           93
  Missing: first CD4 measurement after ART start               52

Infection status at diagnosis
  Recent HIV infection not indicated                          722
  Confirmed recent HIV infection                             1369
  [category label lost in extraction]                         934