Download The mutation patterns in B-cell immunoglobulin receptors reflect the

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Molecular mimicry wikipedia , lookup

Immunomics wikipedia , lookup

Transcript
Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017
rstb.royalsocietypublishing.org
Research
Cite this article: Yaari G, Benichou JIC,
Vander Heiden JA, Kleinstein SH, Louzoun Y.
2015 The mutation patterns in B-cell
immunoglobulin receptors reflect the influence
of selection acting at multiple time-scales. Phil.
Trans. R. Soc. B 370: 20140242.
http://dx.doi.org/10.1098/rstb.2014.0242
Accepted: 19 May 2015
One contribution of 13 to a theme issue
‘The dynamics of antibody repertoires’.
Subject Areas:
immunology, computational biology
Keywords:
antigen-driven selection, affinity maturation,
mutations, lineage trees, B-cell receptor,
next generation sequencing
Authors for correspondence:
Steven H. Kleinstein
e-mail: [email protected]
Yoram Louzoun
e-mail: [email protected]
The mutation patterns in B-cell
immunoglobulin receptors reflect the
influence of selection acting at multiple
time-scales
Gur Yaari1, Jennifer I. C. Benichou2, Jason A. Vander Heiden4,
Steven H. Kleinstein4,5,† and Yoram Louzoun2,3,†
1
Bioengineering Program, Faculty of Engineering, 2Department of Mathematics, and 3Gonda Brain Research
Center, Bar-Ilan University, Ramat Gan 5290002, Israel
4
Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven,
CT 06511, USA
5
Departments of Pathology and Immunobiology, Yale School of Medicine, New Haven, CT 06511, USA
During the several-week course of an immune response, B cells undergo a process of clonal expansion, somatic hypermutation of the immunoglobulin (Ig)
genes and affinity-dependent selection. Over a lifetime, each B cell may participate in multiple rounds of affinity maturation as part of different immune
responses. These two time-scales for selection are apparent in the structure
of B-cell lineage trees, which often contain a ‘trunk’ consisting of mutations
that are shared across all members of a clone, and several branches that
form a ‘canopy’ consisting of mutations that are shared by a subset of clone
members. The influence of affinity maturation on the B-cell population can
be inferred by analysing the pattern of somatic mutations in the Ig. While
global analysis of mutation patterns has shown evidence of strong selection
pressures shaping the B-cell population, the effect of different time-scales of
selection and diversification has not yet been studied. Analysis of B cells
from blood samples of three healthy individuals identifies a range of clone
sizes with lineage trees that can contain long trunks and canopies indicating
the significant diversity introduced by the affinity maturation process.
We here show that observed mutation patterns in the framework regions
(FWRs) are determined by an almost purely purifying selection on both
short and long time-scales. By contrast, complementarity determining regions
(CDRs) are affected by a combination of purifying and antigen-driven positive
selection on the short term, which leads to a net positive selection in the long
term. In both the FWRs and CDRs, long-term selection is strongly dependent
on the heavy chain variable gene family.
1. Introduction
†
These authors equally contributed to this
work.
Electronic supplementary material is available
at http://dx.doi.org/10.1098/rstb.2014.0242 or
via http://rstb.royalsocietypublishing.org.
B lymphocytes recognize pathogens through the binding of specific B-cell receptors
(BCRs), also referred to as immunoglobulin (Ig), expressed on their cell surface.
Receptor diversity in the B-cell population is generated in two stages. First, an
initial BCR is created through recombination of different germline gene segments
during B-cell maturation in the bone marrow [1]. Second, somatic hypermutation
(SHM) introduces point mutations into the DNA coding for the BCR during T-celldependent adaptive immune responses. The SHM rate has been estimated to be
approximately 1023 per base-pair per cell division [2–4]. This is 106-fold higher
than the background mutation rate in other somatic cells. These mutations may
alter the affinity or specificity of the BCR, and thus are a source of diversity
within an expanding B-cell clone. B cells with affinity-increasing mutations are preferentially expanded in germinal centres, in a process known as affinity maturation,
which results in an increase in average affinity in the population over time. Some of
these B cells will differentiate into long-lived memory and plasma cells, which are
& 2015 The Author(s) Published by the Royal Society. All rights reserved.
Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017
(a)
germline
(b)
2
vy
ea
ch
h
n
ai
h
tc
14
trunk
gh
li
MRCA
FWR1
CDR1 FWR2 CDR2
FWR3
CDR3
FWR4
2
5
canopy
2
3
germline M A I R L S F Q G I A R I P
atg gcc ata cgg tta agc ttc caa ggg ata gcg cgc ata ccc
atg gcc aga cgg tta agc ttg caa ggc ata gtg cgc ata ccc
sequence M A R R L S L Q G I V R I P
NS
S
Figure 1. B-cell lineages divide the affinity maturation process into long (trunk) and short (canopy) time-scales. (a) Antibodies are composed of two identical heavy
chains and two identical light chains. The mRNA coding for each chain is divided into FWRs and CDRs, which are interlaced in the linear genetic code. To identify the
somatic mutations that are accumulated during affinity maturation, antibody sequences are first aligned with the corresponding inferred germline sequence. (b) After
clustering the repertoire into clonally related sequences (clones) and building lineage trees, the affinity maturation process was divided into two non-overlapping
intervals: trunk and canopy, which were separated by the MRCA in the lineage, and correspond to long and short time-scales, respectively. The number of mutations
is indicated along each branch (assumed to be unity if no number). (Online version in colour.)
critical to protect us from recurrent infections with the same (or a
closely related) microorganism. Within the germinal centres, B
cells also undergo isotype switching (e.g. from IgM to IgG)
which allows for different effector functions.
Unlike naive B cells that start with a BCR in the unmutated
germline state, B memory cells that are reactivated through
exposure to recurrent and related infections usually begin
with a mutated, affinity-matured receptor, which is then
further diversified as part of the adaptive immune response.
These two time-scales for selection are apparent in the structure
of B-cell lineage trees, which often contain a ‘trunk’ consisting
of mutations that are shared across all sampled members of a
clone, and several branches that form a ‘canopy’ consisting of
mutations that are shared by a subset of clone members
(figure 1b). The trunk and canopy are separated by the most
recent common ancestor (MRCA), which estimates the state
of the B cell that initiated the most recent expansion. The
MRCA also contains some of the mutations that were fixed
during affinity maturation in the most recent germinal centre
reaction [5]. Previous studies of selection in the B-cell repertoire
have not differentiated between these two scales, and it is
unclear if the selection processes are uniform over time. Our
previous work suggests that selection may operate differently
in the long- and short-term scales as removing the most
recent mutations altered the signal for selection [6].
B-cell affinity maturation is a micro-evolutionary stochastic process. The dynamics of B-cell clonal expansion and
selection for binding pathogens represents an example of a
process involving rapid asexual reproduction, where constant
diversification and adaptation occurs in parallel with a high
mutation rate [7 –9]. The BCR appears to be tuned by
evolution to optimize the impact of SHM. Mutation hotspots are more common in the complementarity-determining
regions (CDRs), which include most antigen contact residues,
and less common in framework regions (FWRs), which are
important for the overall receptor structure. Mutations are
also more likely to be non-synonymous (NS) than synonymous
(S) in the CDRs compared with the FWRs [10] (figure 1a).
Mutation in the FWRs may destabilize the antibody (and
thus be subject to negative selection), whereas mutations
in the CDRs may improve the antibody (and thus be positively selected). Multiple computational methods have been
proposed to measure selection in populations or between
populations, in the sense of evolving towards a higher fitness
phenotype. A common measure compares the observed frequency of NS mutations (NS/(NS þ S)) to its expected value.
The expected ratio is calculated based on an underlying
mutation probability model (e.g. [11–13]), or based on genetic
regions where no selection is assumed to occur [14–16]. An
increased frequency of NS mutations is treated as an indication
of positive selection, while a decreased frequency indicates
negative selection. A potential drawback of such methods is
their strong sensitivity to the baseline mutation model (i.e.
the expected probability of each mutation type), especially
when the mutations rate is position dependent [17]. SHM is
significantly affected by the local nucleotide composition,
resulting in sequence-specific hot- and cold-spots for mutation
[11,12]. We have recently developed a background mutation
model for SHM targeting and nucleotide substitution that
can be used to estimate the expected NS and S mutation frequencies, and thus quantify selection more accurately than
previously possible [11].
Phil. Trans. R. Soc. B 370: 20140242
2
rstb.royalsocietypublishing.org
n
ai
Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017
(b)
(a)
IgA
IgG
IgM
5 × 102
rstb.royalsocietypublishing.org
1 × 103
3
trunk
FWR
canopy
no. clones
1 × 102
trunk
5 × 10
CDR
canopy
1 × 10
trunk
FWR + CDR
canopy
1×1
5 ×10–1
1
2
3−4 5−8 9−16 17−32 33−64 65−128
clone size (unique sequences)
0
5
10
no. mutations
15
20
Figure 2. Trunk mutations dominate B-cell repertoires from blood samples of healthy donors. (a) Clone size distribution for each of the sequenced isotypes from
subject 420IV. Axes are logarithmic and bins are ‘logarithmic’ sized (each bin is double the size of the previous one). (b) The number of mutations associated with
the trunk or canopy was determined for IgG sequences of subject 420IV. Mutations from different regions (FWRs, CDRs or both) are shown in the three panels.
Recent advances in high-throughput sequencing technologies allow for large-scale characterization of B-cell repertoires,
and construction of lineage trees from observed B-cell clones.
We analysed the observed blood cDNA BCR repertoire from
three human donors, and show clear differences in the distribution of mutations in different genetic regions, and between
fixed (trunk) and non-fixed (canopy) mutations. The results presented here strengthen the common thinking about negative
selection in the FWRs, but propose a more complex view of
the positive selection argued to occur in CDRs.
the clones (35% of sequences) in each donor were singletons,
and over 25% contained both trunks and canopies. In a significant portion of trees, the MRCA was sampled (42.7%) and not
inferred. Somatic mutations were not distributed evenly between
the trunk and canopy. The median number of trunk mutations
was five per sequence, while two mutations per sequence were
observed in the canopy for IgG sequences (figure 2b). Note
that only mutations occurring in the heavy chain V gene (VH),
excluding the CDR3, can be taken into account, as the germline
sequence in the junctional regions is estimated with significantly
lower confidence. The skewing of mutations towards the trunk
was observed for both FWR and CDR mutations, with a
higher variance among clones in the CDRs (figure 2b).
2. Results
(a) B-cell lineages in the blood contain both trunks
and canopies
(b) Selection estimation measures
B-cell repertoire sequencing data were obtained from blood
samples of three healthy human donors [18]. As described in
§4(a), these sequencing data were processed using the pRESTO
pipeline [19] to generate high-quality Ig sequences, which were
then submitted to IMGT/HighV-QUEST for germline V(D)J segment identification. The Change-O pipeline [20] was then used to
partition the sequences into clonally related groups, and a lineage tree was constructed for each clone (see the electronic
supplementary material for representative examples of these
trees). Observed clone sizes ranged from 1 to 128 unique
sequences and followed a scale-free distribution (figure 2a).
Each reconstructed clonal lineage was split into a trunk and
canopy. The trunk was defined as the branch leading from
the germline Ig sequence inferred by IMGT/HighV-QUEST
to the MRCA, while the canopy contained all other branches.
The mutation analysis focused on the variable (V) gene segment.
Differences in the nucleotide sequence comparing the inferred
germline to the MRCA were defined as trunk mutations, while
all V gene segment mutations from the MRCA to the observed
sequences were defined as canopy mutations. Singletons (i.e.
clones containing a single unique sequence) were not included
in the analysis, while clones where the inferred germline Ig
sequence was the MRCA were included. On average, 72% of
Selection can be estimated by comparing the observed and
expected number of NS mutations. If the observed number of
NS mutations is higher than expected, this is interpreted as
an advantage induced by NS mutations (i.e. positive selection).
Similarly, a lower than expected number of NS mutations is
interpreted as a disadvantage of NS mutations (i.e. negative
selection). To quantify selection, we use the previously
proposed BASELINe method [21], along with Multi-locus
Bayesian Selection Measure (MBSM), a novel method proposed herein (see the electronic supplementary material for
full derivation). MBSM provides a clear Bayesian interpretation
of the observed NS and S value in each region. In order to estimate the selection pressure affecting a single sequence using
the number of NS and S mutations, the distribution of expected
NS and S is required, and not only their average. While it may
be correct to assume a symmetrical normal distribution around
an expected number when a large number of sequences is
observed, this may not be correct when a single sequence is
analysed. To estimate the full distribution of expected NS
values in a given region of a sequence, MBSM uses a Bayesian
approach, which provides a direct interpretation of the number
of S mutations as the probability distribution of generations
since the onset of mutations.
Phil. Trans. R. Soc. B 370: 20140242
5×1
Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017
(a)
(b)
0.3
fraction
0.2
0
S −1
0
0.1
−2
0.3
−3
canopy
0.4
IGHA
IGHG
IGHM
IGHA
IGHG
IGHM
0.1
hu420143−negative
420IV−negative
PGP1−negative
hu420143−positive
420IV−positive
PGP1−positive
1
420IV−canopy
hu420143−canopy
PGP1−canopy
420IV−trunk
hu420143−trunk
PGP1−trunk
0
S −1
0
−2
0.1
−3
canopy
0.2
IGHA
IGHG
IGHM
IGHA
IGHG
IGHM
Figure 3. Mutations in the FWRs are negatively selected in both trunk and canopy. (a,c) The fraction of clones with significant negative (filled bars) or positive
(hatched bars) selection as determined by (a) BASELINe and (c) MBSM applied to the FWRs. In each panel, the upper part refers to the trunk, while the lower part
refers to the canopy. (b,d ) The average selection strength across all sequences as determined by (b) BASELINe and (d ) MBSM applied to the FWRs. For BASELINe (b),
95% CIs are shown for each of the selection estimations.
(c) The framework regions are subject to purifying
selection on both short and long time-scales
The BCR FWRs are important to maintain the structural
integrity of the receptor, and it has been estimated that
approximately 50% of NS mutations in these regions are subject to negative selection [22]. Indeed, all three individuals
studied here exhibited strong negative selection pressure on
FWR mutations. This negative selection was observed in
both trunk and canopy mutations, and for all isotypes (IgA,
IgG and IgM) (figure 3). Significantly, when each clone was
analysed separately, practically none displayed evidence of
significant positive selection in the FWRs. These results
were confirmed using two different computational methods
for analysing selection mentioned above (BASELINe and
MBSM; see §4(c) for details). While there were quantitative
differences in the selection strengths estimated by the different methods, both agreed about the direction and
consistency of negative selection in the FWRs. This observation was also not dependent on specific V genes, and
similar results were found when analysing each V gene separately (data not shown). Overall, these results suggest that
negative selection in the FWRs is sweeping, and not the
result of a net negative balance between advantageous and
deleterious mutations with selection force of the same order
(figure 3).
(d) The complementarity determining regions are
subject to purifying selection on short time-scales
Most of the residues that contact antigen are found in the
CDRs, and it has been assumed that positive selection for
affinity-increasing mutations would be focused in these
regions [23]. However, many studies have failed to detect
positive selection in the CDRs, and it has been suggested
that methods based on the frequency of NS mutations may
not be appropriate [24]. All three individuals studied
here exhibited neutral selection in the CDRs, even when the
analysis was restricted to class-switched (and presumably
affinity-matured) sequences. However, a much more complex
picture emerged when CDR selection was analysed separately for trunk and canopy mutations. CDR mutations that
appear in the trunks of lineage trees displayed a strong
signal for positive selection, while mutations in the canopy
were negatively selected. These results were consistent across
methods to quantify selection, isotypes and individuals. The
strengths of positive and negative selection were similar
(across individuals and isotypes) in the trunk and canopy,
respectively. This may explain why neutral selection is commonly observed when considering all mutations as a group.
Unlike the case for FWRs, where negative selection was dominant in virtually all clones, selection in the CDRs was variable
among clones. Clones with either positive or negative selection
among trunk mutations were identified (figure 4). Similar
diversity was found when considering selection among
canopy mutations. As expected from the population-level
analysis, more clones were positively selected when analysing
trunk mutations, while more clones were negatively selected
when considering canopy mutations.
A high NS frequency in the trunk can be interpreted as a
high fixation probability, which has been directly correlated
with selection [25]. The interpretation of the same frequency
in the canopy is less straightforward as, by definition,
sequences with and without the mutations are observed (otherwise, the mutation would be in the trunk). In this case, positive
selection may be more evident in the frequency of cells carrying
NS mutations. However, the selection analysis presented above
assumes that all identical sequences were derived from a single
cell, no matter how many times the sequence was observed.
This is because the data are derived from PCR amplification
of mRNA and it is impossible to tell whether two identical
sequences were derived from: (1) PCR amplification from a
Phil. Trans. R. Soc. B 370: 20140242
(d)
0.2
trunk
fraction
1
420IV−trunk
hu420143−trunk
PGP1−trunk
0.1
0.2
(c)
4
420IV−canopy
hu420143−canopy
PGP1−canopy
rstb.royalsocietypublishing.org
trunk
0.4
hu420143−negative
420IV−negative
PGP1−negative
hu420143−positive
420IV−positive
PGP1−positive
Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017
(a)
(b)
0.3
420IV−canopy
hu420143−canopy
PGP1−canopy
420IV−trunk
hu420143−trunk
PGP1−trunk
1.0
fraction
0.2
0.1
0.5
S
0
0.1
0
0.2
0.3
canopy
0.4
IGHA
IGHG
hu420143−negative
420IV−negative
PGP1−negative
hu420143−positive
420IV−positive
PGP1−positive
−0.5
−1.0
IGHM
(c)
IGHA
IGHG
IGHM
(d)
0.3
420IV−canopy
hu420143−canopy
PGP1−canopy
420IV−trunk
hu420143−trunk
PGP1−trunk
1.0
0.2
0.1
0.5
S
0
0.1
0
0.2
0.3
canopy
0.4
IGHA
IGHG
hu420143−negative
420IV−negative
PGP1−negative
hu420143−positive
420IV−positive
PGP1−positive
IGHM
−0.5
−1.0
IGHA
IGHG
IGHM
Figure 4. Mutations in the CDRs are positively selected in the trunk. (a,c) The fraction of clones with significant negative (filled bars) or positive (hatched bars)
selection as determined by (a) BASELINe and (c) MBSM applied to the CDRs. In each panel, the upper part refers to the trunk, while the lower part refers to the
canopy. (b,d ) The average selection strength across all sequences as determined by (b) BASELINe and (d ) MBSM applied to the CDRs. For BASELINe (b), 95% CIs are
shown for each of the selection estimations.
single mRNA molecule, (2) different mRNA molecules from the
same cell, or (3) mRNA molecules in different cells carrying the
same Ig sequence. Potential biases were overcome by requiring:
(1) a minimum coverage of two independent reads to call a
sequence to reduce sequencing errors, and (2) using only the
set of unique reads to reduce PCR amplification biases. As we
estimate selection for each tree twice: once for the trunk and
once for the canopy using a representative sequence, the effect
of PCR amplification bias is expected to be small. In order to
ensure that these results were not an artefact of collapsing duplicate sequences, the same analysis was repeated with all
sequences in a clone treated independently, and with random
sampling of only one sequence per clone. This type of analysis
produced similar results (data not shown) suggesting that,
indeed, CDR mutation patterns in the trunk and canopy are
shaped by different selection pressures.
(e) Selection strength is dependent upon VH family and
conserved across individuals
Selection may be affected by the germline sequence of VH. To
check for such an effect, we correlated the selection values for
each VH separately between the three donors (figure 5). The
correlations are stronger in the trunk relative to the canopy.
This similar selection behaviour in different VH segments
between unrelated individuals might indicate a similar selection process for clones having similar VH genes. This could
be owing to similar antigens that trigger related immune
responses demonstrated by similar selection strengths of the
corresponding clones.
3. Discussion
The extent to which affinity maturation shapes the observed
pattern of somatic mutations in B-cell repertoires has been
debated, as several studies have failed to observe the
expected excess of NS mutations in CDRs [26]. In some
cases, the effects of selection can be confounded by the intrinsic biases of SHM [7], and integration of improved
background models into methods for quantifying selection
is critical. However, another limitation of current approaches
based on analysing the frequency of NS mutations may relate
to the short time-scale of B-cell affinity maturation. These
methods were originally developed for the analysis of fixed
mutations (i.e. those that appear in the trunk). However,
many somatic mutations in BCR sequences are shared by
only a subset of the cells in a clone, and these mutations may
not share a strong signature of selection. Indeed, we have previously shown that removing the most recent mutations (based
on a lineage analysis) can improve the signal for selection [6].
Here, we leverage recent advances in sequencing technologies
[27] to carry out large-scale lineage and selection analysis of the
human B-cell repertoire.
To quantify selection, we have used our previously proposed BASELINe method [24], along with a state-of-the-art
SHM targeting and nucleotide substitution model to properly
account for the intrinsic biases of the mutation process [7]. In
addition, we have developed and used a precise Bayesian
estimate to quantify selection and show that its maximumlikelihood approximation converges to our previously
proposed BASELINe method (using the Focused statistic)
[24]. This provides a precise estimate of the expected distribution of NS mutations in any given region, even for a
single sequence. Using these methods, we obtain an estimate
of the selection forces affecting the BCR. Interestingly, we
find that distinct selection pressures are apparent in the
trunk and canopy of B-cell clonal lineages. The trunk
mutations are shared by all members of a clone, and thus
may be considered fixed. Clear and consistent selection pressures are found for these trunk mutations. In particular, the
FWRs show evidence of negative selection, while the CDRs
Phil. Trans. R. Soc. B 370: 20140242
1.5
trunk
0.4
fraction
rstb.royalsocietypublishing.org
trunk
0.4
5
1.5
Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017
(a)
(b)
6
1.0
−1
SCDR–canopy(hu420143)
0.5
0
−2
−3
−0.5
−4
−1.0
FWR−canopy
CDR−canopy
FWR−trunk
CDR−trunk
−4
−3
−2
SCDR–canopy(PGP1)
−1
Figure 5. Correlation of the V gene selection estimations between subjects. (a) Pearson’s correlations are shown for the selection estimations calculated for each V gene
using MBMS. Only V genes that were used in more than 1% of the sampled repertoire were included. (b) Scatter plot for the V gene selection estimations in the CDRs and
canopy data of subject PGP1 versus subject hu420143. Colours represent different V genes as indicated in the colour key to the right. (Online version in colour.)
show evidence of positive selection. Trunk mutation patterns
are likely devoid of any strongly deleterious mutations and
are likely to be enriched for advantageous mutations in
the population. Such advantageous mutations are sometimes
denoted ‘key mutations’ [5,28]. Interesting evidence of such
mutations emerges in transgenic mouse systems, where
common mutations occur among multiple mice [29]. Such
mutations will be rapidly fixed, and seem to induce the
main effect of selection in the observed clones. By contrast,
mutation patterns in the canopy show overall evidence of
negative selection in both FWRs and CDRs. Nevertheless,
some individual clones show evidence of positive selection
in the CDRs. These may reflect mutations that provide a moderate advantage, or which have occurred recently and thus
have not had sufficient time to become fixed in the population. The difference in selection pressures between trunk
and canopy may reflect two different underlying mechanisms. First, the difference may reflect distinct time-scales. In
this case, the trunk reflects the cumulative history of
mutations that have been selected over the course of many
immune responses; for example, as would happen from restimulation of memory cells with recurrent infections. An
alternative explanation is that each clone is the result of a
single germinal centre reaction. In this case, the trunk reflects
strongly selected mutations which have become fixed in the
population, whereas the canopy reflects weakly selected
mutations that are not fixed. This study is unable to distinguish between these two models, and we suspect that
they both contribute to the observed mutation patterns.
All of the sequencing data analysed in this study were
amplified from mRNA. Thus, it is possible that sequences
from plasma cells may be over-represented in the data
because they contain larger amounts of mRNA and consequently have a high probability of being observed in cDNA
samples. Other biases in PCR amplification could also
skew the number of times that the same sequence appears
in the dataset. In order to ensure that these effects did not
impact our results, we repeated the analysis with or without
removing redundant sequences, which both led to similar
results. When redundant sequences are ignored, the mRNA
level of each BCR does not affect its importance in the
analysis.
We found a high correlation in the average selection
strength in different VH genes among subjects. As negative
selection is likely driven mainly by structural constraints, it
is easy to see how selection could be VH gene-specific. Conservation of selection strength in the CDRs, especially on the
trunk where positive selection is observed, is harder to
explain. It seems that different VH germline segments are
more (or less) able to optimize their binding affinity. Another
possibility is that the correlated selection pressures represent the shared responses to common antigens. Although
it is possible that these correlations represent biases in the
background model for SHM targeting, this is unlikely as
the correlation is much stronger in the trunk than in
the canopy. If the correlation was an artefact of errors
in the SHM model, no difference would be expected between
the two regions.
In most observed evolutionary systems, the mutation rate is
too low to observe diversification in real time. The humoral
immune response is an exceptional case, where the BCR is subject to a mutation rate estimated to be approximately 1023 per
base-pair per cell division [2–4]. This process occurs within
germinal centre structures in the secondary lymphoid organs
where B cells are rapidly dividing (up to four divisions per
day), producing an optimal observable micro-evolution
system. Our results suggest that a better understanding of
selection pressures can be obtained by separating the trunk
of the lineage tree (mutations from the germline to the
MRCA) and the canopy (mutations from the MRCA to the
leaves). While mutations on the trunk show consistent selection
pressures among donors and isotypes, the effect of canopy
mutations is much more variable and probably represents
weak/incremental selection.
4. Material and methods
(a) Repertoire sequencing data
B-cell repertoire sequencing data from blood samples of three
healthy human donors was obtained from a previous study
[18]. In the original publication, a time-series of samples before
and after influenza vaccination were sequenced and IgA, IgG
and IgM isotypes were identified based on the constant region
Phil. Trans. R. Soc. B 370: 20140242
hu420143−420IV
hu420143−PGP1
420IV−PGP1
rstb.royalsocietypublishing.org
V7.4.1
V6.1
V5.a
V5.51
V4.b
V4.61
V4.4
V4.34
V4.31
V4.30.4
V4.30.2
V4.28
V3.h
V3.71
V3.7
V3.66
V3.64
V3.49
V3.43
V3.33
V3.30.3
V3.30
V3.23
V3.21
V3.20
V3.15
V3.13
V3.11
V2.70
V2.5
V2.26
V1.f
V1.8
V1.69
V1.58
V1.45
V1.3
V1.24
V1.2
Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017
— Quality filtering
(1) Removal of reads where the primer could not be identified
or had a poor alignment score (mismatch rate greater than
0.1).
(2) Removal of sequences that did not appear in a single
sample at least twice.
— Assignment of germline V(D)J segments for each of the Ig
sequences: initial V(D)J assignments for each sequence were
obtained using IMGT/HighV-QUEST [28].
— Removal of non-functional sequences due to the occurrence
of a stop codon or/and a reading frame shift between the V
gene and the J gene.
— Removal of sequences with more than 30 mutations.
— Identification of clonally related sequences: sequences were
assigned into clonal groups by first partitioning sequences
based on common V gene, J gene and junction region
length. Within these larger groups, sequences differing from
one another by a weighted distance of less than 5 (within
the junction region) were then defined as clones. Distance
was measured as the number of point mutations weighted
by a symmetric version of the nucleotide substitution
probability previously described (50). The five threshold
corresponds to up to four transition mutations or one to
two transversion mutations [13].
(c) Selection estimation
For each clone, a lineage tree was inferred via maximum parsimony with PHYLIP v. 3.69 [30] as described in [31]. FWRs and
CDRs were defined using the IMGT definitions according to
their unique numbering scheme [32]. A consensus sequence for
each clone was formed by a weighted sampling of each mutated
(and non ‘N’) base from all the sequences in the clone.
BASELINe [21] was applied to the set of sequences using the
focused test [17]. In order to analyse selection in the canopy,
all sequences were collapsed to form a representative sequence
as previously described [24]. Mutations in codons that had
more than one mutation were discarded, as it is usually not possible to infer the order in which the mutations occurred (and thus
the micro-sequence context of the mutations is unknown).
MBSM, a novel method proposed here (see electronic supplementary material for full derivation) was used on all
sequences. For trunk mutations, similar to BASELINe, the
MRCA was compared with the inferred Ig germline sequence.
For canopy sequences, all sequences were used and compared
with the MRCA.
Authors’ contributions. G.Y., S.H.K. and Y.L. designed the experiments;
G.Y. and J.A.V.H. processed the data; G.Y., J.B., J.A.V.H. and Y.L.
analysed the data; G.Y., S.H.K. and Y.L. wrote the manuscript. All
authors read and commented on the text.
Competing interests. All authors have no competing interests.
Funding. This work was supported by United States-Israel Binational
Science Foundation grants (2013395 and 2009046).
Acknowledgements. We wish to acknowledge Mohamed Uduman for his
help in organizing the data.
References
1.
2.
3.
4.
5.
6.
7.
Janeway C, Travers P, Walport M, Shlomchik M.
2004 Immunobiology, pp. 49 –53, 6th edn.
New York, NY: Garland Science.
McKean D, Huppi K, Bell M, Staudt L, Gerhard W,
Weigert M. 1984 Generation of antibody diversity in
the immune response of BALB/c mice to influenza
virus hemagglutinin. Proc. Natl Acad. Sci. USA 81,
3180–3184. (doi:10.1073/pnas.81.10.3180)
Storb U. 1996 The molecular basis of somatic
hypermutation of immunoglobulin genes. Curr.
Opin. Immunol. 8, 206– 214. (doi:10.1016/S09527915(96)80059-8)
Kleinstein SH, Louzoun Y, Shlomchik MJ. 2003
Estimating hypermutation rates from clonal tree
data. J. Immunol. 171, 4639 –4649. (doi:10.4049/
jimmunol.171.9.4639)
Radmacher MD, Kelsoe G, Kepler TB. 1998 Predicted
and inferred waiting times for key mutations in the
germinal centre reaction: evidence for stochasticity
in selection. Immunol. Cell Biol. 76, 373–381.
(doi:10.1046/j.1440-1711.1998.00753.x)
Uduman M, Shlomchik MJ, Vigneault F, Church GM,
Kleinstein SH. 2014 Integrating B cell lineage
information into statistical tests for detecting
selection in Ig sequences. J. Immunol. 192,
867–874. (doi:10.4049/jimmunol.1301551)
Liu Y, Joshua D, Williams G, Smith C, Gordon J,
MacLennan I. 1989 Mechanism of antigen-driven
8.
9.
10.
11.
12.
13.
selection in germinal centres. Nature 342,
929 –931.
Berek C, Berger A, Apel M. 1991 Maturation of the
immune response in germinal centers. Cell 67,
1121 –1129. (doi:10.1016/0092-8674(91)90289-B)
Hodgkin PD, Heath WR, Baxter AG. 2007 The
clonal selection theory: 50 years since the
revolution. Nat. Immunol. 8, 1019–1026. (doi:10.
1038/ni1007-1019)
Kocks C, Rajewsky K. 1988 Stepwise intraclonal
maturation of antibody affinity through somatic
hypermutation. Proc. Natl Acad. Sci. USA 85,
8206 –8210. (doi:10.1073/pnas.85.21.8206)
Yaari G et al. 2013 Models of somatic
hypermutation targeting and substitution based on
synonymous mutations from high-throughput
immunoglobulin sequencing data. Front. Immunol.
4, 358. (doi:10.3389/fimmu.2013.00358)
Shapiro GS, Ellison MC, Wysocki LJ. 2003 Sequencespecific targeting of two bases on both DNA strands
by the somatic hypermutation mechanism. Mol.
Immunol. 40, 287–295. (doi:10.1016/S01615890(03)00101-9)
Smith DS, Creadon G, Jena PK, Portanova JP,
Kotzin BL, Wysocki LJ. 1996 Di- and trinucleotide
target preferences of somatic mutagenesis in
normal and autoreactive B cells. J. Immunol. 156,
2642 – 2652.
14. MacCarthy T, Kalis SL, Roa S, Pham P, Goodman MF,
Scharff MD, Bergman A. 2009 V-region mutation in
vitro, in vivo, and in silico reveal the importance of
the enzymatic properties of AID and the sequence
environment. Proc. Natl Acad. Sci. USA 106,
8629– 8634. (doi:10.1073/pnas.0903803106)
15. Shlomchik MJ, Aucoin AH, Pisetsky DS, Weigert MG.
1987 Structure and function of anti-DNA
autoantibodies derived from a single autoimmune
mouse. Proc. Natl Acad. Sci. USA 84, 9150–9154.
(doi:10.1073/pnas.84.24.9150)
16. Liberman G, Benichou J, Tsaban L, Glanville J,
Louzoun Y. 2013 Multi step selection in IgH chains
is initially focused on CDR3 and then on other CDR
regions. Front. Immunol. 4, 274. (doi:10.3389/
fimmu.2013.00274)
17. Hershberg U, Uduman M, Shlomchik MJ, Kleinstein
SH. 2008 Improved methods for detecting selection
by mutation analysis of Ig V region sequences. Int.
Immunol. 20, 683– 694. (doi:10.1093/intimm/
dxn026)
18. Laserson U et al. 2014 High-resolution antibody
dynamics of vaccine-induced immune responses.
Proc. Natl Acad. Sci. USA 111, 4928–4933. (doi:10.
1073/pnas.1323862111)
19. Vander Heiden JA, Yaari G, Uduman M, Stern JN,
O’Connor KC, Hafler DA, Vigneault F, Kleinstein SH.
2014 pRESTO: a toolkit for processing high-
7
Phil. Trans. R. Soc. B 370: 20140242
(b) Lineage tree construction
rstb.royalsocietypublishing.org
of the sequences. In this study, we pooled together data from
samples collected at three time-points prior to the vaccination.
Raw sequencing reads were filtered in several steps to identify
and remove low-quality sequences. Conservative thresholds
were applied in all cases to increase the reliability of the resulting mutation calls, at the potential expense of excluding some
real mutations. Pre-processing was carried out using the
Repertoire Sequencing Toolkit ( pRESTO) [19], and involved:
Downloaded from http://rstb.royalsocietypublishing.org/ on August 3, 2017
21.
23.
25.
26.
27.
28.
29.
30.
31.
32.
affinity maturation of an anti-hapten response.
EMBO J. 7, 1995–2001.
Anderson SM et al. 2009 Taking advantage: highaffinity B cells in the germinal center have lower
death rates, but similar rates of division, compared
to low-affinity cells. J. Immunol. 183, 7314–7325.
(doi:10.4049/jimmunol.0902452)
Feldstein J. 2009 PHYLIP (Phylogeny Inference
Package) version 3.69. Seattle, WA: Department of
Genetics, University of Washington.
Stern JNH et al. 2014 B cells populating the
multiple sclerosis brain mature in the draining
cervical lymph nodes. Sci. Transl. Med. 6, 248ra107.
(doi:10.1126/scitranslmed.3008879)
Lefranc MP, Pommie C, Ruiz M, Giudicelli V,
Foulquier E, Truong L, Thouvenin-Contet V, Lefranc
G. 2003 IMGT unique numbering for
immunoglobulin and T cell receptor variable
domains and Ig superfamily V-like domains. Dev.
Comp. Immunol. 27, 55 –77. (doi:10.1016/S0145305X(02)00039-3)
8
Phil. Trans. R. Soc. B 370: 20140242
22.
24.
J. Mol. Biol. 275, 269 –294. (doi:10.1006/jmbi.
1997.1442)
Uduman M, Yaari G, Hershberg U, Stern JA,
Shlomchik MJ, Kleinstein SH. 2011 Detecting
selection in immunoglobulin sequences. Nucleic
Acids Res. 39, W499–W504. (doi:10.1093/
nar/gkr413)
Kimura M. 1962 On the probability of fixation
of mutant genes in a population. Genetics 47,
713 –719.
Bose B, Sinha S. 2005 Problems in using statistical
analysis of replacement and silent mutations in
antibody genes for determining antigen-driven
affinity selection. Immunology 116, 172–183.
(doi:10.1111/j.1365-2567.2005.02208.x)
Benichou J, Ben-Hamo R, Louzoun Y, Efroni S. 2012
Rep-Seq: uncovering the immunological repertoire
through next-generation sequencing. Immunology 135,
183–191. (doi:10.1111/j.1365-2567.2011.03527.x)
Allen D, Simon T, Sablitzky F, Rajewsky K, Cumano
A. 1988 Antibody engineering for the analysis of
rstb.royalsocietypublishing.org
20.
throughput sequencing raw reads of lymphocyte
receptor repertoires. Bioinformatics 30, 1930– 1932.
(doi:10.1093/bioinformatics/btu138)
Gupta NT, Vander Heiden J, Uduman M, GadalaMaria D, Yaari G, Kleinstein SH. In press. Change-O:
a toolkit for analyzing large-scale B cell
immunoglobulin repertoire sequencing data.
Bioinformatics btv359. (doi:10.1093/bioinformatics/
btv359)
Yaari G, Uduman M, Kleinstein SH. 2012 Quantifying
selection in high-throughput immunoglobulin
sequencing data sets. Nucleic Acids Res. 40, e134.
(doi:10.1093/nar/gks457)
Shlomchik MJ, Marshak-Rothstein A, Wolfowicz
CB, Rothstein TL, Weigert MG. 1987 The role of
clonal selection and somatic mutation in
autoimmunity. Nature 328, 805 – 811. (doi:10.
1038/328805a0)
Morea V, Tramontano A, Rustici M, Chothia C, Lesk
AM. 1998 Conformations of the third hypervariable
region in the VH domain of immunoglobulins.