Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017 rstb.royalsocietypublishing.org Research Cite this article: Yaari G, Benichou JIC, Vander Heiden JA, Kleinstein SH, Louzoun Y. 2015 The mutation patterns in B-cell immunoglobulin receptors reflect the influence of selection acting at multiple time-scales. Phil. Trans. R. Soc. B 370: 20140242. http://dx.doi.org/10.1098/rstb.2014.0242 Accepted: 19 May 2015 One contribution of 13 to a theme issue ‘The dynamics of antibody repertoires’. Subject Areas: immunology, computational biology Keywords: antigen-driven selection, affinity maturation, mutations, lineage trees, B-cell receptor, next generation sequencing Authors for correspondence: Steven H. Kleinstein e-mail: [email protected] Yoram Louzoun e-mail: [email protected] The mutation patterns in B-cell immunoglobulin receptors reflect the influence of selection acting at multiple time-scales Gur Yaari1, Jennifer I. C. Benichou2, Jason A. Vander Heiden4, Steven H. Kleinstein4,5,† and Yoram Louzoun2,3,† 1 Bioengineering Program, Faculty of Engineering, 2Department of Mathematics, and 3Gonda Brain Research Center, Bar-Ilan University, Ramat Gan 5290002, Israel 4 Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA 5 Departments of Pathology and Immunobiology, Yale School of Medicine, New Haven, CT 06511, USA During the several-week course of an immune response, B cells undergo a process of clonal expansion, somatic hypermutation of the immunoglobulin (Ig) genes and affinity-dependent selection. Over a lifetime, each B cell may participate in multiple rounds of affinity maturation as part of different immune responses. These two time-scales for selection are apparent in the structure of B-cell lineage trees, which often contain a ‘trunk’ consisting of mutations that are shared across all members of a clone, and several branches that form a ‘canopy’ consisting of mutations that are shared by a subset of clone members. The influence of affinity maturation on the B-cell population can be inferred by analysing the pattern of somatic mutations in the Ig. While global analysis of mutation patterns has shown evidence of strong selection pressures shaping the B-cell population, the effect of different time-scales of selection and diversification has not yet been studied. Analysis of B cells from blood samples of three healthy individuals identifies a range of clone sizes with lineage trees that can contain long trunks and canopies indicating the significant diversity introduced by the affinity maturation process. We here show that observed mutation patterns in the framework regions (FWRs) are determined by an almost purely purifying selection on both short and long time-scales. By contrast, complementarity determining regions (CDRs) are affected by a combination of purifying and antigen-driven positive selection on the short term, which leads to a net positive selection in the long term. In both the FWRs and CDRs, long-term selection is strongly dependent on the heavy chain variable gene family. 1. Introduction † These authors equally contributed to this work. Electronic supplementary material is available at http://dx.doi.org/10.1098/rstb.2014.0242 or via http://rstb.royalsocietypublishing.org. B lymphocytes recognize pathogens through the binding of specific B-cell receptors (BCRs), also referred to as immunoglobulin (Ig), expressed on their cell surface. Receptor diversity in the B-cell population is generated in two stages. First, an initial BCR is created through recombination of different germline gene segments during B-cell maturation in the bone marrow [1]. Second, somatic hypermutation (SHM) introduces point mutations into the DNA coding for the BCR during T-celldependent adaptive immune responses. The SHM rate has been estimated to be approximately 1023 per base-pair per cell division [2–4]. This is 106-fold higher than the background mutation rate in other somatic cells. These mutations may alter the affinity or specificity of the BCR, and thus are a source of diversity within an expanding B-cell clone. B cells with affinity-increasing mutations are preferentially expanded in germinal centres, in a process known as affinity maturation, which results in an increase in average affinity in the population over time. Some of these B cells will differentiate into long-lived memory and plasma cells, which are & 2015 The Author(s) Published by the Royal Society. All rights reserved. Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017 (a) germline (b) 2 vy ea ch h n ai h tc 14 trunk gh li MRCA FWR1 CDR1 FWR2 CDR2 FWR3 CDR3 FWR4 2 5 canopy 2 3 germline M A I R L S F Q G I A R I P atg gcc ata cgg tta agc ttc caa ggg ata gcg cgc ata ccc atg gcc aga cgg tta agc ttg caa ggc ata gtg cgc ata ccc sequence M A R R L S L Q G I V R I P NS S Figure 1. B-cell lineages divide the affinity maturation process into long (trunk) and short (canopy) time-scales. (a) Antibodies are composed of two identical heavy chains and two identical light chains. The mRNA coding for each chain is divided into FWRs and CDRs, which are interlaced in the linear genetic code. To identify the somatic mutations that are accumulated during affinity maturation, antibody sequences are first aligned with the corresponding inferred germline sequence. (b) After clustering the repertoire into clonally related sequences (clones) and building lineage trees, the affinity maturation process was divided into two non-overlapping intervals: trunk and canopy, which were separated by the MRCA in the lineage, and correspond to long and short time-scales, respectively. The number of mutations is indicated along each branch (assumed to be unity if no number). (Online version in colour.) critical to protect us from recurrent infections with the same (or a closely related) microorganism. Within the germinal centres, B cells also undergo isotype switching (e.g. from IgM to IgG) which allows for different effector functions. Unlike naive B cells that start with a BCR in the unmutated germline state, B memory cells that are reactivated through exposure to recurrent and related infections usually begin with a mutated, affinity-matured receptor, which is then further diversified as part of the adaptive immune response. These two time-scales for selection are apparent in the structure of B-cell lineage trees, which often contain a ‘trunk’ consisting of mutations that are shared across all sampled members of a clone, and several branches that form a ‘canopy’ consisting of mutations that are shared by a subset of clone members (figure 1b). The trunk and canopy are separated by the most recent common ancestor (MRCA), which estimates the state of the B cell that initiated the most recent expansion. The MRCA also contains some of the mutations that were fixed during affinity maturation in the most recent germinal centre reaction [5]. Previous studies of selection in the B-cell repertoire have not differentiated between these two scales, and it is unclear if the selection processes are uniform over time. Our previous work suggests that selection may operate differently in the long- and short-term scales as removing the most recent mutations altered the signal for selection [6]. B-cell affinity maturation is a micro-evolutionary stochastic process. The dynamics of B-cell clonal expansion and selection for binding pathogens represents an example of a process involving rapid asexual reproduction, where constant diversification and adaptation occurs in parallel with a high mutation rate [7 –9]. The BCR appears to be tuned by evolution to optimize the impact of SHM. Mutation hotspots are more common in the complementarity-determining regions (CDRs), which include most antigen contact residues, and less common in framework regions (FWRs), which are important for the overall receptor structure. Mutations are also more likely to be non-synonymous (NS) than synonymous (S) in the CDRs compared with the FWRs [10] (figure 1a). Mutation in the FWRs may destabilize the antibody (and thus be subject to negative selection), whereas mutations in the CDRs may improve the antibody (and thus be positively selected). Multiple computational methods have been proposed to measure selection in populations or between populations, in the sense of evolving towards a higher fitness phenotype. A common measure compares the observed frequency of NS mutations (NS/(NS þ S)) to its expected value. The expected ratio is calculated based on an underlying mutation probability model (e.g. [11–13]), or based on genetic regions where no selection is assumed to occur [14–16]. An increased frequency of NS mutations is treated as an indication of positive selection, while a decreased frequency indicates negative selection. A potential drawback of such methods is their strong sensitivity to the baseline mutation model (i.e. the expected probability of each mutation type), especially when the mutations rate is position dependent [17]. SHM is significantly affected by the local nucleotide composition, resulting in sequence-specific hot- and cold-spots for mutation [11,12]. We have recently developed a background mutation model for SHM targeting and nucleotide substitution that can be used to estimate the expected NS and S mutation frequencies, and thus quantify selection more accurately than previously possible [11]. Phil. Trans. R. Soc. B 370: 20140242 2 rstb.royalsocietypublishing.org n ai Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017 (b) (a) IgA IgG IgM 5 × 102 rstb.royalsocietypublishing.org 1 × 103 3 trunk FWR canopy no. clones 1 × 102 trunk 5 × 10 CDR canopy 1 × 10 trunk FWR + CDR canopy 1×1 5 ×10–1 1 2 3−4 5−8 9−16 17−32 33−64 65−128 clone size (unique sequences) 0 5 10 no. mutations 15 20 Figure 2. Trunk mutations dominate B-cell repertoires from blood samples of healthy donors. (a) Clone size distribution for each of the sequenced isotypes from subject 420IV. Axes are logarithmic and bins are ‘logarithmic’ sized (each bin is double the size of the previous one). (b) The number of mutations associated with the trunk or canopy was determined for IgG sequences of subject 420IV. Mutations from different regions (FWRs, CDRs or both) are shown in the three panels. Recent advances in high-throughput sequencing technologies allow for large-scale characterization of B-cell repertoires, and construction of lineage trees from observed B-cell clones. We analysed the observed blood cDNA BCR repertoire from three human donors, and show clear differences in the distribution of mutations in different genetic regions, and between fixed (trunk) and non-fixed (canopy) mutations. The results presented here strengthen the common thinking about negative selection in the FWRs, but propose a more complex view of the positive selection argued to occur in CDRs. the clones (35% of sequences) in each donor were singletons, and over 25% contained both trunks and canopies. In a significant portion of trees, the MRCA was sampled (42.7%) and not inferred. Somatic mutations were not distributed evenly between the trunk and canopy. The median number of trunk mutations was five per sequence, while two mutations per sequence were observed in the canopy for IgG sequences (figure 2b). Note that only mutations occurring in the heavy chain V gene (VH), excluding the CDR3, can be taken into account, as the germline sequence in the junctional regions is estimated with significantly lower confidence. The skewing of mutations towards the trunk was observed for both FWR and CDR mutations, with a higher variance among clones in the CDRs (figure 2b). 2. Results (a) B-cell lineages in the blood contain both trunks and canopies (b) Selection estimation measures B-cell repertoire sequencing data were obtained from blood samples of three healthy human donors [18]. As described in §4(a), these sequencing data were processed using the pRESTO pipeline [19] to generate high-quality Ig sequences, which were then submitted to IMGT/HighV-QUEST for germline V(D)J segment identification. The Change-O pipeline [20] was then used to partition the sequences into clonally related groups, and a lineage tree was constructed for each clone (see the electronic supplementary material for representative examples of these trees). Observed clone sizes ranged from 1 to 128 unique sequences and followed a scale-free distribution (figure 2a). Each reconstructed clonal lineage was split into a trunk and canopy. The trunk was defined as the branch leading from the germline Ig sequence inferred by IMGT/HighV-QUEST to the MRCA, while the canopy contained all other branches. The mutation analysis focused on the variable (V) gene segment. Differences in the nucleotide sequence comparing the inferred germline to the MRCA were defined as trunk mutations, while all V gene segment mutations from the MRCA to the observed sequences were defined as canopy mutations. Singletons (i.e. clones containing a single unique sequence) were not included in the analysis, while clones where the inferred germline Ig sequence was the MRCA were included. On average, 72% of Selection can be estimated by comparing the observed and expected number of NS mutations. If the observed number of NS mutations is higher than expected, this is interpreted as an advantage induced by NS mutations (i.e. positive selection). Similarly, a lower than expected number of NS mutations is interpreted as a disadvantage of NS mutations (i.e. negative selection). To quantify selection, we use the previously proposed BASELINe method [21], along with Multi-locus Bayesian Selection Measure (MBSM), a novel method proposed herein (see the electronic supplementary material for full derivation). MBSM provides a clear Bayesian interpretation of the observed NS and S value in each region. In order to estimate the selection pressure affecting a single sequence using the number of NS and S mutations, the distribution of expected NS and S is required, and not only their average. While it may be correct to assume a symmetrical normal distribution around an expected number when a large number of sequences is observed, this may not be correct when a single sequence is analysed. To estimate the full distribution of expected NS values in a given region of a sequence, MBSM uses a Bayesian approach, which provides a direct interpretation of the number of S mutations as the probability distribution of generations since the onset of mutations. Phil. Trans. R. Soc. B 370: 20140242 5×1 Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017 (a) (b) 0.3 fraction 0.2 0 S −1 0 0.1 −2 0.3 −3 canopy 0.4 IGHA IGHG IGHM IGHA IGHG IGHM 0.1 hu420143−negative 420IV−negative PGP1−negative hu420143−positive 420IV−positive PGP1−positive 1 420IV−canopy hu420143−canopy PGP1−canopy 420IV−trunk hu420143−trunk PGP1−trunk 0 S −1 0 −2 0.1 −3 canopy 0.2 IGHA IGHG IGHM IGHA IGHG IGHM Figure 3. Mutations in the FWRs are negatively selected in both trunk and canopy. (a,c) The fraction of clones with significant negative (filled bars) or positive (hatched bars) selection as determined by (a) BASELINe and (c) MBSM applied to the FWRs. In each panel, the upper part refers to the trunk, while the lower part refers to the canopy. (b,d ) The average selection strength across all sequences as determined by (b) BASELINe and (d ) MBSM applied to the FWRs. For BASELINe (b), 95% CIs are shown for each of the selection estimations. (c) The framework regions are subject to purifying selection on both short and long time-scales The BCR FWRs are important to maintain the structural integrity of the receptor, and it has been estimated that approximately 50% of NS mutations in these regions are subject to negative selection [22]. Indeed, all three individuals studied here exhibited strong negative selection pressure on FWR mutations. This negative selection was observed in both trunk and canopy mutations, and for all isotypes (IgA, IgG and IgM) (figure 3). Significantly, when each clone was analysed separately, practically none displayed evidence of significant positive selection in the FWRs. These results were confirmed using two different computational methods for analysing selection mentioned above (BASELINe and MBSM; see §4(c) for details). While there were quantitative differences in the selection strengths estimated by the different methods, both agreed about the direction and consistency of negative selection in the FWRs. This observation was also not dependent on specific V genes, and similar results were found when analysing each V gene separately (data not shown). Overall, these results suggest that negative selection in the FWRs is sweeping, and not the result of a net negative balance between advantageous and deleterious mutations with selection force of the same order (figure 3). (d) The complementarity determining regions are subject to purifying selection on short time-scales Most of the residues that contact antigen are found in the CDRs, and it has been assumed that positive selection for affinity-increasing mutations would be focused in these regions [23]. However, many studies have failed to detect positive selection in the CDRs, and it has been suggested that methods based on the frequency of NS mutations may not be appropriate [24]. All three individuals studied here exhibited neutral selection in the CDRs, even when the analysis was restricted to class-switched (and presumably affinity-matured) sequences. However, a much more complex picture emerged when CDR selection was analysed separately for trunk and canopy mutations. CDR mutations that appear in the trunks of lineage trees displayed a strong signal for positive selection, while mutations in the canopy were negatively selected. These results were consistent across methods to quantify selection, isotypes and individuals. The strengths of positive and negative selection were similar (across individuals and isotypes) in the trunk and canopy, respectively. This may explain why neutral selection is commonly observed when considering all mutations as a group. Unlike the case for FWRs, where negative selection was dominant in virtually all clones, selection in the CDRs was variable among clones. Clones with either positive or negative selection among trunk mutations were identified (figure 4). Similar diversity was found when considering selection among canopy mutations. As expected from the population-level analysis, more clones were positively selected when analysing trunk mutations, while more clones were negatively selected when considering canopy mutations. A high NS frequency in the trunk can be interpreted as a high fixation probability, which has been directly correlated with selection [25]. The interpretation of the same frequency in the canopy is less straightforward as, by definition, sequences with and without the mutations are observed (otherwise, the mutation would be in the trunk). In this case, positive selection may be more evident in the frequency of cells carrying NS mutations. However, the selection analysis presented above assumes that all identical sequences were derived from a single cell, no matter how many times the sequence was observed. This is because the data are derived from PCR amplification of mRNA and it is impossible to tell whether two identical sequences were derived from: (1) PCR amplification from a Phil. Trans. R. Soc. B 370: 20140242 (d) 0.2 trunk fraction 1 420IV−trunk hu420143−trunk PGP1−trunk 0.1 0.2 (c) 4 420IV−canopy hu420143−canopy PGP1−canopy rstb.royalsocietypublishing.org trunk 0.4 hu420143−negative 420IV−negative PGP1−negative hu420143−positive 420IV−positive PGP1−positive Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017 (a) (b) 0.3 420IV−canopy hu420143−canopy PGP1−canopy 420IV−trunk hu420143−trunk PGP1−trunk 1.0 fraction 0.2 0.1 0.5 S 0 0.1 0 0.2 0.3 canopy 0.4 IGHA IGHG hu420143−negative 420IV−negative PGP1−negative hu420143−positive 420IV−positive PGP1−positive −0.5 −1.0 IGHM (c) IGHA IGHG IGHM (d) 0.3 420IV−canopy hu420143−canopy PGP1−canopy 420IV−trunk hu420143−trunk PGP1−trunk 1.0 0.2 0.1 0.5 S 0 0.1 0 0.2 0.3 canopy 0.4 IGHA IGHG hu420143−negative 420IV−negative PGP1−negative hu420143−positive 420IV−positive PGP1−positive IGHM −0.5 −1.0 IGHA IGHG IGHM Figure 4. Mutations in the CDRs are positively selected in the trunk. (a,c) The fraction of clones with significant negative (filled bars) or positive (hatched bars) selection as determined by (a) BASELINe and (c) MBSM applied to the CDRs. In each panel, the upper part refers to the trunk, while the lower part refers to the canopy. (b,d ) The average selection strength across all sequences as determined by (b) BASELINe and (d ) MBSM applied to the CDRs. For BASELINe (b), 95% CIs are shown for each of the selection estimations. single mRNA molecule, (2) different mRNA molecules from the same cell, or (3) mRNA molecules in different cells carrying the same Ig sequence. Potential biases were overcome by requiring: (1) a minimum coverage of two independent reads to call a sequence to reduce sequencing errors, and (2) using only the set of unique reads to reduce PCR amplification biases. As we estimate selection for each tree twice: once for the trunk and once for the canopy using a representative sequence, the effect of PCR amplification bias is expected to be small. In order to ensure that these results were not an artefact of collapsing duplicate sequences, the same analysis was repeated with all sequences in a clone treated independently, and with random sampling of only one sequence per clone. This type of analysis produced similar results (data not shown) suggesting that, indeed, CDR mutation patterns in the trunk and canopy are shaped by different selection pressures. (e) Selection strength is dependent upon VH family and conserved across individuals Selection may be affected by the germline sequence of VH. To check for such an effect, we correlated the selection values for each VH separately between the three donors (figure 5). The correlations are stronger in the trunk relative to the canopy. This similar selection behaviour in different VH segments between unrelated individuals might indicate a similar selection process for clones having similar VH genes. This could be owing to similar antigens that trigger related immune responses demonstrated by similar selection strengths of the corresponding clones. 3. Discussion The extent to which affinity maturation shapes the observed pattern of somatic mutations in B-cell repertoires has been debated, as several studies have failed to observe the expected excess of NS mutations in CDRs [26]. In some cases, the effects of selection can be confounded by the intrinsic biases of SHM [7], and integration of improved background models into methods for quantifying selection is critical. However, another limitation of current approaches based on analysing the frequency of NS mutations may relate to the short time-scale of B-cell affinity maturation. These methods were originally developed for the analysis of fixed mutations (i.e. those that appear in the trunk). However, many somatic mutations in BCR sequences are shared by only a subset of the cells in a clone, and these mutations may not share a strong signature of selection. Indeed, we have previously shown that removing the most recent mutations (based on a lineage analysis) can improve the signal for selection [6]. Here, we leverage recent advances in sequencing technologies [27] to carry out large-scale lineage and selection analysis of the human B-cell repertoire. To quantify selection, we have used our previously proposed BASELINe method [24], along with a state-of-the-art SHM targeting and nucleotide substitution model to properly account for the intrinsic biases of the mutation process [7]. In addition, we have developed and used a precise Bayesian estimate to quantify selection and show that its maximumlikelihood approximation converges to our previously proposed BASELINe method (using the Focused statistic) [24]. This provides a precise estimate of the expected distribution of NS mutations in any given region, even for a single sequence. Using these methods, we obtain an estimate of the selection forces affecting the BCR. Interestingly, we find that distinct selection pressures are apparent in the trunk and canopy of B-cell clonal lineages. The trunk mutations are shared by all members of a clone, and thus may be considered fixed. Clear and consistent selection pressures are found for these trunk mutations. In particular, the FWRs show evidence of negative selection, while the CDRs Phil. Trans. R. Soc. B 370: 20140242 1.5 trunk 0.4 fraction rstb.royalsocietypublishing.org trunk 0.4 5 1.5 Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017 (a) (b) 6 1.0 −1 SCDR–canopy(hu420143) 0.5 0 −2 −3 −0.5 −4 −1.0 FWR−canopy CDR−canopy FWR−trunk CDR−trunk −4 −3 −2 SCDR–canopy(PGP1) −1 Figure 5. Correlation of the V gene selection estimations between subjects. (a) Pearson’s correlations are shown for the selection estimations calculated for each V gene using MBMS. Only V genes that were used in more than 1% of the sampled repertoire were included. (b) Scatter plot for the V gene selection estimations in the CDRs and canopy data of subject PGP1 versus subject hu420143. Colours represent different V genes as indicated in the colour key to the right. (Online version in colour.) show evidence of positive selection. Trunk mutation patterns are likely devoid of any strongly deleterious mutations and are likely to be enriched for advantageous mutations in the population. Such advantageous mutations are sometimes denoted ‘key mutations’ [5,28]. Interesting evidence of such mutations emerges in transgenic mouse systems, where common mutations occur among multiple mice [29]. Such mutations will be rapidly fixed, and seem to induce the main effect of selection in the observed clones. By contrast, mutation patterns in the canopy show overall evidence of negative selection in both FWRs and CDRs. Nevertheless, some individual clones show evidence of positive selection in the CDRs. These may reflect mutations that provide a moderate advantage, or which have occurred recently and thus have not had sufficient time to become fixed in the population. The difference in selection pressures between trunk and canopy may reflect two different underlying mechanisms. First, the difference may reflect distinct time-scales. In this case, the trunk reflects the cumulative history of mutations that have been selected over the course of many immune responses; for example, as would happen from restimulation of memory cells with recurrent infections. An alternative explanation is that each clone is the result of a single germinal centre reaction. In this case, the trunk reflects strongly selected mutations which have become fixed in the population, whereas the canopy reflects weakly selected mutations that are not fixed. This study is unable to distinguish between these two models, and we suspect that they both contribute to the observed mutation patterns. All of the sequencing data analysed in this study were amplified from mRNA. Thus, it is possible that sequences from plasma cells may be over-represented in the data because they contain larger amounts of mRNA and consequently have a high probability of being observed in cDNA samples. Other biases in PCR amplification could also skew the number of times that the same sequence appears in the dataset. In order to ensure that these effects did not impact our results, we repeated the analysis with or without removing redundant sequences, which both led to similar results. When redundant sequences are ignored, the mRNA level of each BCR does not affect its importance in the analysis. We found a high correlation in the average selection strength in different VH genes among subjects. As negative selection is likely driven mainly by structural constraints, it is easy to see how selection could be VH gene-specific. Conservation of selection strength in the CDRs, especially on the trunk where positive selection is observed, is harder to explain. It seems that different VH germline segments are more (or less) able to optimize their binding affinity. Another possibility is that the correlated selection pressures represent the shared responses to common antigens. Although it is possible that these correlations represent biases in the background model for SHM targeting, this is unlikely as the correlation is much stronger in the trunk than in the canopy. If the correlation was an artefact of errors in the SHM model, no difference would be expected between the two regions. In most observed evolutionary systems, the mutation rate is too low to observe diversification in real time. The humoral immune response is an exceptional case, where the BCR is subject to a mutation rate estimated to be approximately 1023 per base-pair per cell division [2–4]. This process occurs within germinal centre structures in the secondary lymphoid organs where B cells are rapidly dividing (up to four divisions per day), producing an optimal observable micro-evolution system. Our results suggest that a better understanding of selection pressures can be obtained by separating the trunk of the lineage tree (mutations from the germline to the MRCA) and the canopy (mutations from the MRCA to the leaves). While mutations on the trunk show consistent selection pressures among donors and isotypes, the effect of canopy mutations is much more variable and probably represents weak/incremental selection. 4. Material and methods (a) Repertoire sequencing data B-cell repertoire sequencing data from blood samples of three healthy human donors was obtained from a previous study [18]. In the original publication, a time-series of samples before and after influenza vaccination were sequenced and IgA, IgG and IgM isotypes were identified based on the constant region Phil. Trans. R. Soc. B 370: 20140242 hu420143−420IV hu420143−PGP1 420IV−PGP1 rstb.royalsocietypublishing.org V7.4.1 V6.1 V5.a V5.51 V4.b V4.61 V4.4 V4.34 V4.31 V4.30.4 V4.30.2 V4.28 V3.h V3.71 V3.7 V3.66 V3.64 V3.49 V3.43 V3.33 V3.30.3 V3.30 V3.23 V3.21 V3.20 V3.15 V3.13 V3.11 V2.70 V2.5 V2.26 V1.f V1.8 V1.69 V1.58 V1.45 V1.3 V1.24 V1.2 Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017 — Quality filtering (1) Removal of reads where the primer could not be identified or had a poor alignment score (mismatch rate greater than 0.1). (2) Removal of sequences that did not appear in a single sample at least twice. — Assignment of germline V(D)J segments for each of the Ig sequences: initial V(D)J assignments for each sequence were obtained using IMGT/HighV-QUEST [28]. — Removal of non-functional sequences due to the occurrence of a stop codon or/and a reading frame shift between the V gene and the J gene. — Removal of sequences with more than 30 mutations. — Identification of clonally related sequences: sequences were assigned into clonal groups by first partitioning sequences based on common V gene, J gene and junction region length. Within these larger groups, sequences differing from one another by a weighted distance of less than 5 (within the junction region) were then defined as clones. Distance was measured as the number of point mutations weighted by a symmetric version of the nucleotide substitution probability previously described (50). The five threshold corresponds to up to four transition mutations or one to two transversion mutations [13]. (c) Selection estimation For each clone, a lineage tree was inferred via maximum parsimony with PHYLIP v. 3.69 [30] as described in [31]. FWRs and CDRs were defined using the IMGT definitions according to their unique numbering scheme [32]. A consensus sequence for each clone was formed by a weighted sampling of each mutated (and non ‘N’) base from all the sequences in the clone. BASELINe [21] was applied to the set of sequences using the focused test [17]. In order to analyse selection in the canopy, all sequences were collapsed to form a representative sequence as previously described [24]. Mutations in codons that had more than one mutation were discarded, as it is usually not possible to infer the order in which the mutations occurred (and thus the micro-sequence context of the mutations is unknown). MBSM, a novel method proposed here (see electronic supplementary material for full derivation) was used on all sequences. For trunk mutations, similar to BASELINe, the MRCA was compared with the inferred Ig germline sequence. For canopy sequences, all sequences were used and compared with the MRCA. Authors’ contributions. G.Y., S.H.K. and Y.L. designed the experiments; G.Y. and J.A.V.H. processed the data; G.Y., J.B., J.A.V.H. and Y.L. analysed the data; G.Y., S.H.K. and Y.L. wrote the manuscript. All authors read and commented on the text. Competing interests. All authors have no competing interests. Funding. This work was supported by United States-Israel Binational Science Foundation grants (2013395 and 2009046). Acknowledgements. We wish to acknowledge Mohamed Uduman for his help in organizing the data. References 1. 2. 3. 4. 5. 6. 7. Janeway C, Travers P, Walport M, Shlomchik M. 2004 Immunobiology, pp. 49 –53, 6th edn. New York, NY: Garland Science. McKean D, Huppi K, Bell M, Staudt L, Gerhard W, Weigert M. 1984 Generation of antibody diversity in the immune response of BALB/c mice to influenza virus hemagglutinin. Proc. Natl Acad. Sci. USA 81, 3180–3184. (doi:10.1073/pnas.81.10.3180) Storb U. 1996 The molecular basis of somatic hypermutation of immunoglobulin genes. Curr. Opin. Immunol. 8, 206– 214. (doi:10.1016/S09527915(96)80059-8) Kleinstein SH, Louzoun Y, Shlomchik MJ. 2003 Estimating hypermutation rates from clonal tree data. J. Immunol. 171, 4639 –4649. (doi:10.4049/ jimmunol.171.9.4639) Radmacher MD, Kelsoe G, Kepler TB. 1998 Predicted and inferred waiting times for key mutations in the germinal centre reaction: evidence for stochasticity in selection. Immunol. Cell Biol. 76, 373–381. (doi:10.1046/j.1440-1711.1998.00753.x) Uduman M, Shlomchik MJ, Vigneault F, Church GM, Kleinstein SH. 2014 Integrating B cell lineage information into statistical tests for detecting selection in Ig sequences. J. Immunol. 192, 867–874. (doi:10.4049/jimmunol.1301551) Liu Y, Joshua D, Williams G, Smith C, Gordon J, MacLennan I. 1989 Mechanism of antigen-driven 8. 9. 10. 11. 12. 13. selection in germinal centres. Nature 342, 929 –931. Berek C, Berger A, Apel M. 1991 Maturation of the immune response in germinal centers. Cell 67, 1121 –1129. (doi:10.1016/0092-8674(91)90289-B) Hodgkin PD, Heath WR, Baxter AG. 2007 The clonal selection theory: 50 years since the revolution. Nat. Immunol. 8, 1019–1026. (doi:10. 1038/ni1007-1019) Kocks C, Rajewsky K. 1988 Stepwise intraclonal maturation of antibody affinity through somatic hypermutation. Proc. Natl Acad. Sci. USA 85, 8206 –8210. (doi:10.1073/pnas.85.21.8206) Yaari G et al. 2013 Models of somatic hypermutation targeting and substitution based on synonymous mutations from high-throughput immunoglobulin sequencing data. Front. Immunol. 4, 358. (doi:10.3389/fimmu.2013.00358) Shapiro GS, Ellison MC, Wysocki LJ. 2003 Sequencespecific targeting of two bases on both DNA strands by the somatic hypermutation mechanism. Mol. Immunol. 40, 287–295. (doi:10.1016/S01615890(03)00101-9) Smith DS, Creadon G, Jena PK, Portanova JP, Kotzin BL, Wysocki LJ. 1996 Di- and trinucleotide target preferences of somatic mutagenesis in normal and autoreactive B cells. J. Immunol. 156, 2642 – 2652. 14. MacCarthy T, Kalis SL, Roa S, Pham P, Goodman MF, Scharff MD, Bergman A. 2009 V-region mutation in vitro, in vivo, and in silico reveal the importance of the enzymatic properties of AID and the sequence environment. Proc. Natl Acad. Sci. USA 106, 8629– 8634. (doi:10.1073/pnas.0903803106) 15. Shlomchik MJ, Aucoin AH, Pisetsky DS, Weigert MG. 1987 Structure and function of anti-DNA autoantibodies derived from a single autoimmune mouse. Proc. Natl Acad. Sci. USA 84, 9150–9154. (doi:10.1073/pnas.84.24.9150) 16. Liberman G, Benichou J, Tsaban L, Glanville J, Louzoun Y. 2013 Multi step selection in IgH chains is initially focused on CDR3 and then on other CDR regions. Front. Immunol. 4, 274. (doi:10.3389/ fimmu.2013.00274) 17. Hershberg U, Uduman M, Shlomchik MJ, Kleinstein SH. 2008 Improved methods for detecting selection by mutation analysis of Ig V region sequences. Int. Immunol. 20, 683– 694. (doi:10.1093/intimm/ dxn026) 18. Laserson U et al. 2014 High-resolution antibody dynamics of vaccine-induced immune responses. Proc. Natl Acad. Sci. USA 111, 4928–4933. (doi:10. 1073/pnas.1323862111) 19. Vander Heiden JA, Yaari G, Uduman M, Stern JN, O’Connor KC, Hafler DA, Vigneault F, Kleinstein SH. 2014 pRESTO: a toolkit for processing high- 7 Phil. Trans. R. Soc. B 370: 20140242 (b) Lineage tree construction rstb.royalsocietypublishing.org of the sequences. In this study, we pooled together data from samples collected at three time-points prior to the vaccination. Raw sequencing reads were filtered in several steps to identify and remove low-quality sequences. Conservative thresholds were applied in all cases to increase the reliability of the resulting mutation calls, at the potential expense of excluding some real mutations. Pre-processing was carried out using the Repertoire Sequencing Toolkit ( pRESTO) [19], and involved: Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017 21. 23. 25. 26. 27. 28. 29. 30. 31. 32. affinity maturation of an anti-hapten response. EMBO J. 7, 1995–2001. Anderson SM et al. 2009 Taking advantage: highaffinity B cells in the germinal center have lower death rates, but similar rates of division, compared to low-affinity cells. J. Immunol. 183, 7314–7325. (doi:10.4049/jimmunol.0902452) Feldstein J. 2009 PHYLIP (Phylogeny Inference Package) version 3.69. Seattle, WA: Department of Genetics, University of Washington. Stern JNH et al. 2014 B cells populating the multiple sclerosis brain mature in the draining cervical lymph nodes. Sci. Transl. Med. 6, 248ra107. (doi:10.1126/scitranslmed.3008879) Lefranc MP, Pommie C, Ruiz M, Giudicelli V, Foulquier E, Truong L, Thouvenin-Contet V, Lefranc G. 2003 IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains. Dev. Comp. Immunol. 27, 55 –77. (doi:10.1016/S0145305X(02)00039-3) 8 Phil. Trans. R. Soc. B 370: 20140242 22. 24. J. Mol. Biol. 275, 269 –294. (doi:10.1006/jmbi. 1997.1442) Uduman M, Yaari G, Hershberg U, Stern JA, Shlomchik MJ, Kleinstein SH. 2011 Detecting selection in immunoglobulin sequences. Nucleic Acids Res. 39, W499–W504. (doi:10.1093/ nar/gkr413) Kimura M. 1962 On the probability of fixation of mutant genes in a population. Genetics 47, 713 –719. Bose B, Sinha S. 2005 Problems in using statistical analysis of replacement and silent mutations in antibody genes for determining antigen-driven affinity selection. Immunology 116, 172–183. (doi:10.1111/j.1365-2567.2005.02208.x) Benichou J, Ben-Hamo R, Louzoun Y, Efroni S. 2012 Rep-Seq: uncovering the immunological repertoire through next-generation sequencing. Immunology 135, 183–191. (doi:10.1111/j.1365-2567.2011.03527.x) Allen D, Simon T, Sablitzky F, Rajewsky K, Cumano A. 1988 Antibody engineering for the analysis of rstb.royalsocietypublishing.org 20. throughput sequencing raw reads of lymphocyte receptor repertoires. Bioinformatics 30, 1930– 1932. (doi:10.1093/bioinformatics/btu138) Gupta NT, Vander Heiden J, Uduman M, GadalaMaria D, Yaari G, Kleinstein SH. In press. Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics btv359. (doi:10.1093/bioinformatics/ btv359) Yaari G, Uduman M, Kleinstein SH. 2012 Quantifying selection in high-throughput immunoglobulin sequencing data sets. Nucleic Acids Res. 40, e134. (doi:10.1093/nar/gks457) Shlomchik MJ, Marshak-Rothstein A, Wolfowicz CB, Rothstein TL, Weigert MG. 1987 The role of clonal selection and somatic mutation in autoimmunity. Nature 328, 805 – 811. (doi:10. 1038/328805a0) Morea V, Tramontano A, Rustici M, Chothia C, Lesk AM. 1998 Conformations of the third hypervariable region in the VH domain of immunoglobulins.