The Journal of Immunology

Development of the Expressed Ig CDR-H3 Repertoire Is Marked by Focusing of Constraints in Length, Amino Acid Use, and Charge That Are First Established in Early B Cell Progenitors¹

Ivaylo I. Ivanov,²* Robert L. Schelonka,²† Yingxin Zhuang,‡ G. Larry Gartland,* Michael Zemlin,§ and Harry W. Schroeder, Jr.³*‡

J Immunol 2005; 174:7773-7780; doi: 10.4049/jimmunol.174.12.7773
http://www.jimmunol.org/content/174/12/7773

Supplementary Material: http://www.jimmunol.org/content/suppl/2005/06/06/174.12.7773.DC1
References: This article cites 40 articles, 14 of which you can access for free at: http://www.jimmunol.org/content/174/12/7773.full#ref-list-1

The Journal of Immunology is published twice each month by The American Association of Immunologists, Inc., 1451 Rockville Pike, Suite 650, Rockville, MD 20852. Copyright © 2005 by The American Association of Immunologists. All rights reserved. Print ISSN: 0022-1767. Online ISSN: 1550-6606.

This information is current as of June 12, 2017. Downloaded from http://www.jimmunol.org/ by guest on June 12, 2017.

In jawed vertebrates, the adaptive immune system is characterized by the exponential diversity of its Ag receptors (1–5).
In contrast to the receptors of the innate immune system that bind relatively invariant pathogen-associated epitopes (6), diverse Ag receptor repertoires allow recognition of novel or divergent epitopes on pathogens or toxins.

The diversity of Ig, the BCR, is primarily the property of the V domains of the H and L chains (1–5). Diversity is asymmetrically distributed within each V domain (7, 8). In the primary sequence, three intervals of hypervariability, termed CDRs, are separated from each other by four relatively conserved intervals, termed framework regions (FRs).⁴ In the native form of the Ab, the FRs create a scaffold that supports the H and L chain CDRs. These CDRs are juxtaposed to form the Ag binding site. CDR-H1, -H2, -L1, and -L2 create the outside border; CDR-L3 forms the base; and CDR-H3 lies at the center of this Ag binding site. CDR-H1, -H2, -L1, and -L2 are entirely encoded by the V gene segment, and are thus initially restricted to germline sequence, whereas CDR-L3 and -H3 are created de novo by VL→JL and VH→DH→JH joining, respectively. The inclusion of a D gene segment and the addition of nongermline-encoded nucleotides (N regions) vastly enhance the potential for both combinatorial and somatic diversity of CDR-H3. Enhanced diversity and a central position within the Ag binding site allow CDR-H3 to often play a critical role in the recognition of Ag (7, 8).

The composition of the functional CDR-H3 repertoire is biased in length, amino acid composition, predicted loop and base structure, and charge (9). The distribution of lengths of both murine and human CDR-H3 forms normal Gaussian curves with differing means, suggesting that each species achieves its own preferred CDR-H3 length (9, 10). The average hydrophobicity of the amino acids within the CDR-H3 loop also forms a Gaussian distribution centering on neutrality to mild hydrophilicity (11). This neutral, hydrophilic preference reflects enrichment for tyrosine and glycine residues in the CDR-H3 loop in excess of that which would be predicted by random chance alone (9, 11, 12).

Construction of CDR-H3 begins early in B cell progenitors. The various defined stages of B cell development can be viewed, in part, as transitions through a series of checkpoints that test the assembly and function of the V domains (13–15). A number of studies have established that the CDR-H3 plays a crucial role in these selection processes (16–18). In humans, repertoire selection during B cell development is associated with a reduction in the distribution and mean length of the expressed CDR-H3 repertoire (19), and loss of highly charged or hydrophobic sequences (20–22). It has been proposed that the loss of longer sequences as well as those that are enriched for charged amino acids reflects a higher likelihood of self-reactivity in the Igs that bear them (22).

Departments of *Microbiology, †Pediatrics, and ‡Medicine, University of Alabama at Birmingham, Birmingham, AL 35294; and §Department of Pediatrics, Philipps Universität Marburg, Marburg, Germany

Received for publication December 22, 2004. Accepted for publication March 22, 2005.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

¹ This work was supported by National Institutes of Health Grants AI42732 (to H.W.S.), AI48115 (to H.W.S.), and HD043327 (to R.L.S.), and P.E. Kempkes-Stiftung (to M.Z.).
² I.I.I. and R.L.S. contributed equally to the preparation of this manuscript.
³ Address correspondence and reprint requests to Dr. Harry W. Schroeder, Jr., WTI 378, 1530 3rd Avenue South, Birmingham, AL 35294-3300. E-mail address: [email protected]
⁴ Abbreviation used in this paper: FR, framework region.
To gain insight into the mechanisms used to regulate the Ab repertoire, to determine when during development constraints on

To gain insight into the mechanisms that regulate the development of the H chain CDR3 (CDR-H3), we used the scheme of Hardy to sort mouse bone marrow B lineage cells into progenitor, immature, and mature B cell fractions, and then performed sequence analysis on VH7183-containing Cµ transcripts. The essential architecture of the CDR-H3 repertoire observed in the mature B cell fraction F was already established in the early pre-B cell fraction C. These architectural features include VH gene segment use preference, DH family usage, JH rank order, predicted structures of the CDR-H3 base and loop, and the amino acid composition and average hydrophobicity of the CDR-H3 loop. With development, the repertoire was focused by eliminating outliers to what appears to be a preferred repertoire in terms of length, amino acid composition, and average hydrophobicity. Unlike humans, the average length of CDR-H3 increased during development. The majority of this increase came from enhanced preservation of JH sequence. This was associated with an increase in the prevalence of tyrosine. With an accompanying increase in glycine, a shift in hydrophobicity was observed in the CDR-H3 loop from near neutral in fraction C (−0.08 ± 0.03) to mildly hydrophilic in fraction F (−0.17 ± 0.02). Fundamental constraints on the sequence and structure of CDR-H3 are thus established before surface IgM expression. The Journal of Immunology, 2005, 174: 7773–7780.

Materials and Methods

Mice

CD43 (FITC) (BD Pharmingen), anti-IgD (PE) (Southern Biotechnology Associates), and anti-IgM (Cy-5).

Sorting, RNA preparation, RT-PCR, and sequencing

From each bone marrow fraction, 2 × 10⁴ cells were sorted directly into RLT lysing buffer (Qiagen RNeasy minikit).
Total RNA was prepared per the manufacturer’s protocol. RNA was eluted in 30 µl of water. Subsequently, 10 µl of the RNA was used to generate first-strand cDNA with avian myeloblastosis virus reverse transcriptase (Roche Molecular Biochemicals) and the primer Cµ1 (5′-GACAGGGGGCTCTCG-3′). The cDNA was diluted 4-fold and used as a template to amplify VH7183DJCµ joins by PCR (Qiagen Taq Core PCR kit). PCR conditions were as follows: hot start at 95°C for 3 min; 35 cycles of 1 min at 94°C, 1 min at 60°C, and 1 min at 72°C; and a final extension at 72°C for 10 min. The primers were a FR1 VH7183-specific upstream primer, AF303 (5′-GGGGCTCGAGGAGTCTGGGGGA-3′) (23), and the Cµ exon 1 primer Cµ2 (5′-CAGGATCCGAGGGGGAAGACATTTGG-3′). PCR products were subcloned (TOPO-TA Cloning kit; Invitrogen Life Technologies), primed with the AF303 primer, and sequenced at the University of Alabama at Birmingham Heflin Center Sequencing Core. Sequences have been submitted to GenBank under the identifiers AY895210-AY895829.

Sequence analysis

CDR-H3 was identified as the region between (but not including) the 3′ VH-encoded conserved cysteine (TGT) at Kabat position 92 (IMGT 104) and the 5′ JH-encoded conserved tryptophan (TGG) at Kabat position 103 (IMGT 118) (7, 28). CDR-H3 was separated into two components: the base (Kabat aa 93 and 94 (IMGT 105 and 106), typically alanine and arginine; and Kabat aa 100–102 (IMGT 115–117), typically phenylalanine or methionine, aspartic acid or alanine, and tyrosine or valine) and the loop (the intervening amino acids) (Fig. 1).

We sorted bone marrow B cell fractions from four separate mice on three separate occasions. The mice analyzed represent the progeny of mice on a mixed 129/C57BL6 background that had been backcrossed for 10 generations onto BALB/cJ (The Jackson Laboratory; stock 000651). All studies were performed in accordance with Institutional Animal Care and Use Committee regulations.
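The base and loop split described under Sequence analysis can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' software. It assumes the CDR-H3 has already been translated and trimmed so that it excludes the flanking cysteine (Kabat 92) and tryptophan (Kabat 103). The function name `split_cdr_h3` and the example sequence are hypothetical.

```python
from typing import NamedTuple


class CdrH3Parts(NamedTuple):
    base_n: str  # Kabat 93-94 (IMGT 105-106), typically "AR"
    loop: str    # the intervening amino acids
    base_c: str  # Kabat 100-102 (IMGT 115-117), e.g. "FDY"


def split_cdr_h3(cdr_h3: str) -> CdrH3Parts:
    """Split a CDR-H3 amino acid string into N-terminal base, loop, and C-terminal base.

    Assumption: `cdr_h3` excludes the conserved Cys (Kabat 92) and Trp (Kabat 103),
    so the first two and last three residues form the base.
    """
    if len(cdr_h3) < 5:
        raise ValueError("CDR-H3 needs at least 5 residues to contain a complete base")
    return CdrH3Parts(cdr_h3[:2], cdr_h3[2:-3], cdr_h3[-3:])


# Hypothetical 12-residue CDR-H3, for illustration only.
parts = split_cdr_h3("ARDYYGSSYFDY")
print(parts.base_n, parts.loop, parts.base_c)  # AR DYYGSSY FDY
```

Under this convention, a 12-codon CDR-H3 has a 7-residue loop. The loop is the part used for the amino acid composition and hydrophobicity measurements.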
Flow cytometry and cell sorting

Single cell suspensions were prepared by flushing the bone marrow from two femurs with FACS buffer (1× PBS + 2% heat-inactivated FCS). RBC were lysed in 1 ml of erythrocyte lysing buffer (0.15 M NH₄Cl, 1 mM KHCO₃, 0.1 mM EDTA; Comprehensive Cancer Center Media Prep; catalogue ACK) for 5 min at room temperature. Cells were washed and resuspended in an appropriate volume of FACS buffer for counting and staining. The total bone marrow from individual mice was incubated in 500 µl of fluorescently labeled Abs in FACS buffer. Sorting was then performed on a MoFlo instrument (DakoCytomation). The following sets of mAbs were used: for Hardy fractions B and C, anti-CD19 (streptavidin SpectralRed) (Southern Biotechnology Associates), anti-CD43 (FITC) (BD Pharmingen), anti-BP-1 (PE) (gift from J. Kearney, University of Alabama at Birmingham, Birmingham, AL), and anti-IgM (Cy5) (Jackson ImmunoResearch Laboratories); for fractions D–F, anti-CD19 (streptavidin SpectralRed) (Southern Biotechnology Associates), anti-

Statistical analysis

Differences between populations were assessed, where appropriate, by Student’s t test, two tailed; Fisher’s exact test, two tailed; χ²; or the Levene test for the homogeneity of variance. Analysis was performed with JMP IN version 5.1 (SAS Institute). Means are accompanied by the SEM.

Results

Cells within the Hardy bone marrow fractions B–F were sorted using the gates shown in Fig. 2. A total of 707 transcripts was sequenced, of which 649 (92%) were unique. Of these, 619 (95%) contained in-frame, open rearrangements. By fraction, there were 66 sequences from B (pro-B), 192 sequences from C (early pre-B), 131 sequences from D (late pre-B), 121 sequences from E (immature B), and 109 sequences from F (mature B).
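The screening step behind the counts above (unique transcripts, then in-frame, open rearrangements) can be approximated as follows. This is a minimal sketch under stated assumptions, not the procedure the authors used. It assumes each nucleotide sequence has been trimmed to start on a codon boundary of the VH reading frame, so that "in-frame" reduces to a length divisible by three and "open" to the absence of stop codons. The function names and the example reads are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame_open(nt: str) -> bool:
    """True if the length is a multiple of 3 and no stop codon occurs in frame 0.

    Assumption: `nt` starts on a codon boundary of the V region reading frame.
    """
    nt = nt.upper()
    if len(nt) == 0 or len(nt) % 3 != 0:
        return False
    codons = (nt[i:i + 3] for i in range(0, len(nt), 3))
    return all(codon not in STOP_CODONS for codon in codons)


def unique_functional(seqs):
    """Drop exact duplicates (keeping first occurrences), then keep in-frame, open sequences."""
    unique = list(dict.fromkeys(s.upper() for s in seqs))
    return unique, [s for s in unique if is_in_frame_open(s)]


# Hypothetical reads: a duplicate pair, one with an in-frame stop, one out of frame.
reads = [
    "TGTGCAAGAGATTACTGG",
    "TGTGCAAGAGATTACTGG",
    "TGTGCAAGATGATACTGG",
    "TGTGCAAGAGATTACTG",
]
unique, functional = unique_functional(reads)
print(len(unique), len(functional))  # 3 1

# Consistency check on the reported counts.
print(round(100 * 649 / 707), round(100 * 619 / 649))  # 92 95
print(66 + 192 + 131 + 121 + 109)  # 619
```

The last two lines only confirm that the reported percentages and per-fraction counts agree with the totals given in the text.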
Preferential use of VH7183.10 is established early in B cell development

In accordance with previous studies by other investigators (23, 25, 26, 29), the prevalence of VH7183.1 (VH81X) declined with development (B→F; p < 0.001). VH81X represented 38% of the

FIGURE 1. Analysis of CDR-H3. In this hypothetical sequence, the location of CDR-H3, the CDR-H3 loop, and frameworks 3 and 4 is shown. Each VDJCµ sequence was deconstructed and evaluated for VH, DH, and JH usage; P junctions and N addition; the length of CDR-H3 in codons; and the distribution of individual amino acids and average Kyte-Doolittle hydrophobicity (31, 32) in the CDR-H3 loop. Kabat and IMGT (7, 28) number designations for the TGT codon, which marks the terminus of framework 3, and the TGG, which marks the beginning of framework 4, are identified. Amino acids at the extreme (arginine and isoleucine) have been included to demonstrate the range of the hydrophobicity index. A single palindromic nucleotide (T) flanks VH sequence, and DFL16.1 sequence is accompanied by 3 nt of N addition on each flank. The normalized average hydrophobicity of the CDR-H3 loop is −0.24.

CDR-H3 composition are imposed, and to establish the extent to which murine development resembles that of humans, we sought to establish the pattern of CDR-H3 repertoire development in mice bearing an IgMᵃ H chain repertoire. We used the scheme of Hardy (14) to sort bone marrow B lineage cells into progenitor, immature, and mature B cell fractions. We then cloned, sequenced, and deconstructed the CDR-H3 component of VH7183DJCµ transcripts. We chose to look at RNA message, as this is most representative of the expressed, and thus functional, Ig repertoire.
We focused on the VH7183 family because its germline complement in IgHᵃ alleles has been well defined (23); it represents a manageable 10% of the active repertoire (24); patterns of VH7183 use during ontogeny and development have been well established (23, 25, 26); and it contributes to both self and nonself reactivities (reviewed in Ref. 27). We show in this study that the essential architecture of the CDR-H3 repertoire, including patterns of gene segment use, amino acid composition, charge, predicted base and loop structure, and average length, is established very early in B cell development, well before the expression of surface IgM. Development appears to focus the repertoire by eliminating outliers to what appears to be a preferred repertoire in terms of length, amino acid composition, and average hydrophobicity.

CDR-H3 REPERTOIRE DEVELOPMENT IN BONE MARROW

Patterns of DH use remain relatively unchanged with development

Use of the various DH families did not undergo a significant change with development (Fig. 3B). Using a minimum of 5 nt of identity to assign germline DH origin, we identified members of the DSP and DFL families in ~50% and ~30% of the transcripts, respectively. DQ52 was used in 4–10% of the transcripts, and DST4 contributed to <2% of the sequences. Due to exonucleolytic nibbling and N addition, we were unable to identify a DH progenitor in the remaining transcripts. The DFL16.1 gene segment was the single most commonly used DH gene segment at all stages of development, representing ~20% of sequences in all of the fractions.

Increased prevalence of reading frame 2 in fraction B sequences

fraction B sequences, 23% of the fraction C sequences (B→C; p = 0.03), 7% of the fraction D sequences (C→D; p < 0.001), 10% of the fraction E sequences (D→E; p = 0.52), and 2% of the fraction F sequences (E→F; p = 0.02) (Fig. 3A). VH7183.10 was the most commonly used VH7183 gene segment in fractions C, D, E, and F.
VH7183.10 increased from 5% in fraction B to 21% in fraction C (p < 0.01), and then remained relatively unchanged in fractions D, E, and F (31, 20, and 22%; p = 1.0). Changes in the prevalence of VH gene segments other than VH7183.10 were also observed, but none of these individual changes achieved statistical significance.

FIGURE 3. Use of VH, DH, and JH gene segments and DH reading frame as a function of B cell development through Hardy fractions B through F (Fig. 1). A, Distribution of VH7183 family gene segment usage. The VH segments are arranged in germline order, with the most DH proximal sequences to the right. VH, DH, and JH use is reported as the percentage of the sequenced population of unique, in-frame, open transcripts from each B lineage fraction. B, Distribution of DH families. C, DH reading frame use in CDR-H3 intervals containing DFL or DSP family gene segments. Use is reported as the percentage of the sequenced population of DFL- and DSP-containing transcripts from each B lineage fraction. D, Distribution of JH gene segments.

FIGURE 2. Representative gates used to identify and sort Hardy fractions B through F from bone marrow.

A shift in reading frame prevalence was observed during B cell development (Fig. 3C). We identified 53 fraction B, 152 fraction C, 88 fraction D, 92 fraction E, and 88 fraction F sequences that contained identifiable DFL or DSP gene segments. Reading frame 1 was the predominant reading frame at all stages of B cell development. However, use of reading frame 1 increased from 57% in fraction B to 70% in fraction C, 68% in fraction D, 78% in fraction E, and 78% in fraction F. Use of reading frame 3 decreased from 17% in fraction B to 12% in fraction C, 18% in fraction D, 11% in fraction E, and 9% in fraction F. Use of reading frame 2 began at 26% in fraction B, and then decreased to 18% in fraction C, 14% in fraction D, 11% in fraction E, and 13% in fraction F.
The change in distribution of reading frames between fractions B and F was significant at p = 0.02. Reading frame 3 typically encodes one or more termination codons. Functional sequences containing RF3 were significantly shorter (11.7 ± 0.3 codons) than those using RF1 (12.7 ± 0.1; p = 0.004) and RF2 (12.5 ± 0.3; p = 0.05). No significant differences were observed in the average length of RF1- and RF2-containing sequences (p = 0.60).

Increased use of JH1 in the transition to the immature B cell stage

A shift in rank order of JH use was observed in the fraction B→C transition (Fig. 3D). In fraction B, JH2 (31%) and JH3 (31%) were the most frequently used JH, followed by JH4 (27%) and JH1 (4%). In fractions C through F, JH4 was the most commonly used sequence (35–40%), followed by relatively equivalent use of JH3 (26–27%) and JH2 (21–28%), and then JH1 (7–16%). The rise in the use of JH1 from fractions B (4%), through C (7%), to D (16%) reached statistical significance (p < 0.03; χ²). Use of JH1 then remained relatively stable (14 and 12% in fractions E and F, respectively).

containing sequences were observed. By fraction F, the differences in DFL16.1- and DSP-containing sequences no longer achieved statistical significance (13.6 ± 0.6 vs 12.9 ± 0.2, respectively; p = 0.37). However, DFL16.1 sequences retained a length advantage over those with DQ52 (11.0 ± 0.5; p = 0.01). The increase in length from fraction B to fraction F reflected, in part, a reduction in the prevalence of sequences whose CDR-H3 length was <9 aa (Fig. 5A). Due to the larger number of sequences, this was best observed in a comparison between fractions C and F. Of the 192 sequences in fraction C, 24 were 8 aa or less, whereas only 3 of 109 sequences were 8 aa or less in fraction F (p < 0.01). This also led to a significant narrowing in the variance of the distribution of lengths (p = 0.01; Levene).
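The comparison of short CDR-H3 sequences in fractions C and F (24 of 192 vs 3 of 109) is a 2×2 count comparison of the kind handled by Fisher's exact test, one of the tests named under Statistical analysis. The text does not state which test produced this particular p value, so applying Fisher's test here is an assumption. The pure-Python sketch below is an illustration rather than a reproduction of the JMP IN analysis, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= observed * (1 + 1e-9))


# Fraction C: 24 short, 168 longer; fraction F: 3 short, 106 longer.
p_value = fisher_exact_two_sided(24, 168, 3, 106)
print(f"two-sided p = {p_value:.4f}")
```

The Levene comparison of variances needs the individual CDR-H3 lengths, which are not listed in the text, so it is not sketched here.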
An increase in average CDR-H3 length with development

The increase in CDR-H3 length reflected increased preservation of terminal JH sequence

Sequences containing identifiable DH gene segments were deconstructed to assess the contribution of VH, DH, and JH sequence, and of N addition and P junctions to the change in CDR-H3 length (Fig. 6). From fractions B to F, the average length increased by 3.4 nt (p = 0.007), or 1.1 codons. Minor increases in the contribution of the VH sequence (+0.2 nt) and N addition at the 5′ and 3′ junctions (+0.4 nt each), which reflected one-third of the increase, were observed. However, none of these rather subtle increases in length achieved statistical significance. In contrast, the contribution of JH germline sequence increased by 2.6 nt, or two-thirds of the total increase. On average, JH sequence contributed 10.7 ± 0.6 nt in fraction B and 13.3 ± 0.4 nt in fraction F (p < 0.001; Student’s t test).

The increase in average JH component length reflected both the increased use of JH1 and JH4 and enhanced preservation of 5′ terminal nucleotides among sequences that used JH2 or JH3. The four JH sequences differ in length, with JH2 and JH3 contributing up to 14 nt each to CDR-H3, JH1 19 nt, and JH4 20 nt (Fig. 6). We examined the complete database of 619 unique sequences for the contribution of JH. The average length of the 22 sequences that contain JH1 and JH4 in fraction B was 12.8 ± 0.6 vs 12.9 ± 0.4 in 51 sequences from fraction F (p = 0.82). In these CDR-H3 intervals, JH1 and JH4 contributed 14.2 ± 0.9 nt in fraction B vs 15.4 ± 0.6 nt in fraction F (p = 0.25; Student’s t test). The average length of the 44 sequences containing JH2 and JH3 in fraction B was 10.7 ± 0.4 vs 12.1 ± 0.3 in 58 sequences from fraction F (p =

FIGURE 4. Change in average CDR-H3 length and hydrophobicity in VH7183DJCµ transcripts isolated from Hardy fractions B through F (14) (Fig. 1). Error bars depict the SE of each mean.
A, Change in the average length of CDR-H3 intervals containing DFL16.1, DSP family gene segments, DQ52, or no identifiable D compared with the change in the population as a whole. Sequences containing DFL16.2 or DST4 are not included due to paucity of numbers. B, Change in the average hydrophobicity index value of the CDR-H3 loops.

The average length of CDR-H3 increased during development from an average of 11.4 ± 0.3 in fraction B to 12.5 ± 0.2 in fraction F (p = 0.01) (Fig. 4A).

Mouse DH sequences differ in length. To assess the contribution of the identity and length of the DH on the length of CDR-H3, we compared the average lengths of sequences that contained DFL16.1 with those with DSP gene family members, DQ52, or no identifiable D gene segment, respectively (Fig. 4A). DFL16.1 contains 23 nt, DFL16.2 and the DSP gene segments are two codons shorter with 17 nt, DST4 contains 16 nt, and DQ52 is 4 codons shorter with only 11 nt. For sequences containing DFL16.1, the average length increased from 12.6 ± 0.6 in fraction B to 13.6 ± 0.6 in fraction F (p = 0.29). For sequences containing DSP family members, the length increased from 11.7 ± 0.5 in fraction B to 12.9 ± 0.2 in fraction F (p = 0.01; Student’s t test). For sequences containing DQ52, the length increased from 9.0 ± 0.7 to 11.0 ± 0.5 (p = 0.05).

DFL16.1-containing sequences remained statistically similar in length distribution throughout development (Fig. 4A). With development, DSP-containing sequences converged in length to the DFL16.1 standard, whereas DQ52-containing sequences retained a length disadvantage. In fraction B, sequences containing DFL16.1 were ~0.9 codons longer than those that contained DSP gene segments (p = 0.31) and 3.5 codons longer than those that contained DQ52 (p = 0.002).
In fraction C, the length distribution of DFL16.1- and DSP-containing sequences significantly diverged (13.1 ± 0.4 vs 11.8 ± 0.3, respectively; p = 0.001). In fractions D and E, DSP-containing sequences increased in length to 12.5 ± 0.3 codons, while no significant changes in the length of DFL16.1-

0.001). In these CDR-H3 intervals, JH2 and JH3 contributed 9.2 ± 0.6 nt in fraction B vs 10.9 ± 0.3 nt in fraction F (p = 0.0003).

The convergence in length distribution between DFL16.1- and DSP-containing sequences reflected a balance between an increase in the contribution of JH and loss of 5′ terminal DFL16.1 sequence. At the point of greatest divergence in fraction C, the 44 sequences that contain DFL16.1 lost an average of 3.5 ± 0.4 5′ terminal nucleotides vs 4.7 ± 0.3 for the 94 sequences that contain DSP

FIGURE 6. Deconstruction of the components contributing to CDR-H3 length in sequences containing identifiable DH gene segments (DFL, DSP, DST, and DQ52). The potential contribution of the germline sequence of the VH gene segment, P junctions, N region addition, the DH gene segment, and the JH gene segment to CDR-H3 length is illustrated. The length of DFL16.2 is identical with that of DSP family members. The actual contribution of these components to 57 fraction B, 164 fraction C, 101 fraction D, 98 fraction E, and 96 fraction F sequences is shown. All components are shown to scale. The major contributor to the increase in average CDR-H3 length with development is observed to be an increase in the contribution of JH sequence (p = 0.0007).

FIGURE 5. Distribution of CDR-H3 lengths and hydrophobicity in VH7183DJCµ transcripts isolated from Hardy fractions B through F (14) (Fig. 1). Prevalence is reported as the percentage of the sequenced population of unique, in-frame, open transcripts from each B lineage fraction.
To facilitate visualization of the change in variance of the distribution, the lines mark the preferred range of lengths or average hydrophobicity observed in fraction F. To facilitate visualization of the change in the mean of the lengths or average hydrophobicity, arrows have been placed to mark the position of the average length or average hydrophobicity in fraction F. A, Distribution of CDR-H3 lengths as a function of B cell development. B, Distribution of the average hydrophobicity of individual CDR-H3 loops as assessed by reference to a normalized Kyte-Doolittle scale (31, 32).

FIGURE 7. Distribution of amino acids in the CDR-H3 loops of the VH7183DJCµ transcripts isolated from Hardy fractions B through F (14) (Fig. 1). The amino acids are arranged by relative hydrophobicity, as assessed by a normalized Kyte-Doolittle scale (31, 32). Use is reported as the percentage of the sequenced population from each B lineage fraction.

Increased prevalence of tyrosine and glycine in fraction F

The general bias for tyrosine and glycine in the CDR-H3 loop was first apparent in fraction B and intensified during B cell development (Fig. 7). Of the 423 predicted aa in the CDR-H3 loops from fraction B, 135 (32%) were either tyrosine or glycine, whereas of the 814 predicted aa in the loops from fraction F, 324 (40%) were tyrosine or glycine (p = 0.025). Overall, the amino acid composition differed significantly between fraction B and fraction F (p = 0.01, χ², 19 degrees of freedom). Fraction B loops, for example, contained more hydrophobic amino acids (11% valine, isoleucine, or leucine) than fraction F (9%). However, these and other changes in the prevalence of individual amino acids did not achieve statistical significance. Shifts in the prevalence of tyrosine and glycine, in length, and in N addition have been associated with changes in the distribution of the predicted structures of the CDR-H3 loop and base (9, 30).
These types of changes have been observed as a function of ontogeny as well as of species origin. However, in the adult mouse sequences analyzed in this work, the stability in the relative prevalence of amino acid sequence, length, and N addition was accompanied by stability in the distribution of predicted base and loop structures (data not shown). No significant changes in predicted structure were observed from fractions B to F.

A shift in average hydrophobicity from near neutrality to hydrophilicity

To determine whether there was a global change in the distribution of hydrophobicity with development, we used a normalized Kyte-Doolittle scale (31, 32) to calculate the relative average hydrophobicity of the CDR-H3 loops (Fig. 4B). We observed a shift to neutrality from fraction B (−0.15 ± 0.04) to fraction C (−0.08 ± 0.03) that was followed by a shift toward hydrophilicity in fractions D, E, and F (−0.15 ± 0.03, −0.14 ± 0.03, and −0.17 ± 0.02, respectively). The shift in average hydrophobicity from fraction C to D is significant at p = 0.05, and the shift from fraction C to F is significant at p = 0.02.

As in the case of length, the variance in average hydrophobicity decreased with development. This shift in variance was significant at p = 0.03 between fractions B and F, and at p < 0.01 between fractions C and F. The change in variance is due, in part, to the loss of sequences at the extremes (Fig. 5B). In fraction B, there was 1 sequence whose average hydrophobicity score was greater than 0.6; there were 13 in fraction C, 3 in fraction D, 4 in fraction E, and none in fraction F. Similarly, there were 3 sequences in fraction B with an average hydrophobicity score of less than −0.6; there were 6 in fraction C, 6 in fraction D, 2 in fraction E, and none in fraction F. The difference in the prevalence of sequences at the extreme in fraction C (19 of 211) vs fraction F (0 of 110) is significant at p < 0.01.
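The two loop-level measurements used in these sections, the tyrosine plus glycine fraction and the average hydrophobicity, can be sketched as follows. The raw values are the published Kyte-Doolittle hydropathy indices. The normalization is an assumption: the text says only that the scale was "normalized" (31, 32), and this sketch rescales the 20 values to zero mean and unit standard deviation, which may not match the authors' scale exactly. The function names and the example loop are hypothetical.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy indices for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# ASSUMPTION: "normalized" means zero mean and unit SD across the 20 residues.
_MU = mean(KYTE_DOOLITTLE.values())
_SD = pstdev(KYTE_DOOLITTLE.values())
NORMALIZED = {aa: (value - _MU) / _SD for aa, value in KYTE_DOOLITTLE.items()}


def average_hydrophobicity(loop: str) -> float:
    """Mean normalized hydropathy over the residues of one CDR-H3 loop."""
    return mean(NORMALIZED[aa] for aa in loop.upper())


def tyr_gly_fraction(loops) -> float:
    """Fraction of all loop residues that are tyrosine (Y) or glycine (G)."""
    residues = "".join(loops).upper()
    return sum(residues.count(aa) for aa in "YG") / len(residues)


# Hypothetical loop, for illustration only.
loop = "DYYGSSY"
print(average_hydrophobicity(loop) < 0)  # True: the loop is on the hydrophilic side
print(tyr_gly_fraction([loop]))          # 4 of 7 residues are Y or G

# Consistency check on the reported tyrosine/glycine percentages.
print(round(100 * 135 / 423), round(100 * 324 / 814))  # 32 40
```

Under this normalization, isoleucine and arginine sit at the two extremes of the scale, consistent with the Fig. 1 legend, and a tyrosine- and glycine-rich loop scores slightly below zero.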
A comparison of highly charged sequences between surface IgM− pre/pro-B cells (fractions B, C, and D) and surface IgM+ B cells (fractions E and F) is also significant at p < 0.05.

Discussion

We have shown in this study that the major patterns of VH, DH, and JH use in the expressed repertoire are already established in progenitor B cells, and thus before the expression of membrane-bound IgM. The relative prevalence of the various DH families in fraction B remained relatively unchanged from fraction B through fraction F. The rank order of JH prevalence that was first established in fraction C was maintained through fraction F. Consistent with previous reports that focused on rearrangement preference (23, 25, 26, 29), we found a high frequency of transcripts using the VH81X (VH7183.1) gene segment in fraction B. The prevalence of VH81X then steadily diminished in the progression from fraction C to fraction F. A preference for VH7183.10 was established in fraction C, and its prevalence remained remarkably stable from fraction C to fraction F. Thus, the dominant pattern of VH, DH, and JH use remained essentially unchanged from fraction C to F.

Fraction C includes cells that are still at the intermediate DJ rearrangement stage, early pre-B cells that have rearranged their VDJ locus, and pre-B cells that express the pre-BCR (15, 33). Active translation of mRNA is associated with increased transcript abundance due to stabilization of the mRNA by polysomes, a process that is enhanced by B cell activation (34–36). If the successful assembly of the pre-BCR activates early pre-B cells, it is possible that assembly may similarly enhance mRNA abundance and thereby stabilize VH and JH preference. Testing of this hypothesis will require detailed sequence analysis of transcripts and mRNA message abundance from fraction C cells that have been separated on the basis of pre-BCR expression.
Our work confirms and extends a previous observation regarding the enhanced use of DH reading frame 2 at the earliest stages of B cell development (25). The primary mechanism postulated to limit use of reading frame 2 in the expressed repertoire is the ability of Dµ protein to create a pre-BCR complex (37, 38), which can then activate the allelic exclusion signal transduction pathway to prevent further V→DJ rearrangement. We speculate that all of the components of this pathway may not be fully active in fraction B, allowing reading frame 2 DJ rearrangements to undergo V→DJ recombination. Rearrangement at stages lacking an intact pre-BCR signal transduction complex may represent a mechanism by which

(p = 0.04), giving DFL16.1 an average net gain of 1.2 germline nt relative to DSP. By fraction F, 5′ terminal loss among the 20 sequences that contain DFL16.1 had increased to an average of 5.5 ± 0.6 nt vs 4.1 ± 0.4 for the 57 sequences that contain DSP (p = 0.05), yielding an average net loss of 1.4 nt relative to DSP. This is a net flip of 2.6 nt, or almost 1 codon.

containing charged amino acids have been reported to be sequentially purged from the repertoire (22). As with length and amino acid use, we observed an adjustment and focusing of the charge distribution of CDR-H3 during B cell development. This reflected not only a decrease in the average charge, but also a significant reduction in the contribution of highly charged or highly hydrophobic sequences.

Together, these data show that the essential architecture of the CDR-H3 repertoire, including gene segment use, length, structure, amino acid composition, and average hydrophobicity, is established early in B cell development before the surface expression of membrane-bound IgM.
With development, many of these features are fine-tuned and focused into an apparent optimal range of lengths, charge, and amino acid composition. These observations raise a series of open questions. DH reading frame 1 sequences encode the same amino acids that dominate CDR-H3 throughout bone marrow development. Does germline conservation of DH sequence serve as the deciding factor in dictating the composition of CDR-H3? If germline DH sequence plays a critical role in regulating CDR-H3, what role do individual DH sequences play in the development of the repertoire and the ability to mount humoral immune responses? What are the specific selective pressures and the mechanisms that serve to fine-tune the composition of CDR-H3 as the B cell population matures? And finally, what might be the functional consequences of violating these apparent constraints on CDR-H3 composition? The answer to these questions will require targeted manipulation of the germline DH locus, which may shed new light on the role of the DH gene segment in navigating B cell developmental checkpoints and optimizing immune function. Acknowledgments We thank P. Burrows, T. Carvalho, G. Ippolito, and J. Link for their invaluable advice and support. Disclosures The authors have no financial conflict of interest. References 1. Tonegawa, S. 1983. Somatic generation of antibody diversity. Nature 302: 575–581. 2. Alt, F. W., and D. Baltimore. 1982. Joining of immunoglobulin heavy chain gene segments: implications from a chromosome with evidence of three D-J heavy fusions. Proc. Natl. Acad. Sci. USA 79: 4118 – 4122. 3. Rajewsky, K. 1996. Clonal selection and learning in the antibody system. Nature 381: 751–758. 4. Hood, L., and D. Galas. 2003. The digital code of DNA. Nature 421: 444 – 448. 5. Nossal, G. J. V. 2003. The double helix and immunology. Nature 421: 440 – 444. 6. Janeway, C. A., Jr., and R. Medzhitov. 2000. Innate immune recognition. Annu. Rev. Immunol. 20: 197–216. 7. Kabat, E. A., T. T. Wu, H. 
M. Perry, K. S. Gottesman, and C. Foeller. Sequences of Proteins of Immunological Interest. U.S. Department of Health and Human Services, Bethesda, pp. 1–2387. 8. Padlan, E. A. 1994. Anatomy of the antibody molecule. Mol. Immunol. 31: 169 –217. 9. Zemlin, M., M. Klinger, J. Link, C. Zemlin, K. Bauer, J. A. Engler, H. W. Schroeder, Jr., and P. M. Kirkham. 2003. Expressed murine and human CDR-H3 intervals of equal length exhibit distinct repertoires that differ in their amino acid composition and predicted range of structures. J. Mol. Biol. 334: 733–749. 10. Wu, T. T., G. Johnson, and E. A. Kabat. 1993. Length distribution of CDRH3 in antibodies. Proteins Struct. Funct. Genet. 16: 1–7. 11. Ivanov, I. I., J. M. Link, G. C. Ippolito, and H. W. Schroeder, Jr. 2002. Constraints on hydropathicity and sequence composition of HCDR3 are conserved across evolution. In The Antibodies, Vol. 7. M. Zanetti and J. D. Capra, eds. Taylor and Francis Group, London, pp. 43– 67. 12. Golub, R., J. S. Fellah, and J. Charlemagne. 1997. Structure and diversity of the heavy chain VDJ junctions in the developing Mexican axolotl. Immunogenetics 46: 402– 409. 13. Meffre, E., R. Casellas, and M. C. Nussenzweig. 2000. Antibody regulation of B cell development. Nat. Immunol. 1: 379 –385. 14. Hardy, R. R., and K. Hayakawa. 2001. B cell development pathways. Annu. Rev. Immunol. 19: 595– 621. Downloaded from http://www.jimmunol.org/ by guest on June 12, 2017 suboptimal H chain V domains may enter the immature B cell repertoire. Unlike humans, in which the average length of CDR-H3 decreases with development (37, 38), we observed an increase in the average length of CDR-H3 with development in mice. On average, however, mouse sequences are significantly shorter than human ones (9). Thus, even in fraction F, the mice lacked the longer CDR-H3 sequences that have been associated with enhanced selfreactivity in humans. 
As with humans (19), the changes in sequence that contribute to the increase in length can be subtle. Although the only significant changes affecting overall CDR-H3 length were the enhanced retention of 5⬘ JH terminal sequence and the increased loss of 5⬘ DFL16.1 sequence, given the number of sequences analyzed it is possible that our analysis was underpowered to confirm the statistical significance of other more delicate differences. Adjustments to JH and DH length with development and with ontogeny are also observed in humans, macaques, and chimpanzees (19, 30). They appear to represent common mechanisms used to respond to potentially common selective forces. The increase in average length of CDR-H3 with development occurred in association with decreased representation of outlier lengths. Although our analysis cannot distinguish whether this phenomenon results from the loss of outliers or positive selection for sequences with lengths closer to the final average length of CDR-H3 in the mature B cell fraction F, the net effect is to decrease the range of diversity, thereby focusing the mature B cell repertoire into what appears to be a preferred range. As with length, the broad outline of amino acid preference was also established early in B lineage development. Preference for tyrosine and glycine in the CDR-H3 loop was already established in fraction B. With development, the representation of these two amino acids was enhanced. The adjustments to CDR-H3 length, the loss of 5⬘ terminal DFL16.1 and the gain of 5⬘ terminal JH sequence appear to play a role in this process. The sequences of the CDR-H3 loop portion of the four JH gene segments encode YWY, Y, W, and YYA, respectively. Thus, tyrosine represents 50% of the JH amino acids that can contribute to the CDR-H3 loop. Enhanced preservation of 5⬘ JH sequence will enrich for tyrosine at the C terminus of the loop. In reading frame 1, the 5⬘ terminus of DFL16.1 encodes tyrosine. 
The increased loss of 5⬘ nucleotides with development has the effect of decreasing representation of tyrosine at the amino terminus of the DFL16.1-containing loops. In humans, there is an asymmetric distribution of tyrosine in the CDR-H3 loop, with the C terminus enriched for tyrosine (9). The tyrosine gradient increases with increasing loop length. A similar tyrosine gradient is not obvious in mice. However, DFL16.1 gene segment-containing sequences are among the longest CDR-H3 loops. The combination of loss of 5⬘ DFL16.1-encoded tyrosine and gain of 3⬘ JH-encoded tyrosine may act to prevent a relative enrichment for tyrosine at the amino terminus of the loop. An excess of tyrosine at the beginning of the CDR-H3 loop may be as undesirable in mice as in humans. The average length of sequences that contain DQ52 converges toward the average for sequences in which the contribution of DH cannot be ascertained. Both of these types of short sequences are less likely to contain tyrosine in the CDR-H3 loop, although glycine is common (data not shown). The role of short, glycine-enriched, tyrosine-depleted CDR-H3 structures is unknown, but they appear clearly distinct from CDR-H3 structures that contain extensive sequence derived from the DFL and DSP gene segments, which are longer and enriched for tyrosine as well as glycine. The presence of excess charged amino acids has been correlated with self-reactivity (22, 39 – 41), especially to dsDNA. Sequences 7779 7780 27. Kirkham, P. M., and H. W. Schroeder, Jr. 1994. Antibody structure and the evolution of immunoglobulin V gene segments. Semin. Immunol. 6: 347–360. 28. Lefranc, M.-P. 2003. IMGT, the international ImMunoGeneTics database. Nucleic Acids Res. 31: 307–310. 29. Yancopoulos, G. D., S. V. Desiderio, M. Paskind, J. F. Kearney, D. Baltimore, and F. W. Alt. 1984. Preferential utilization of the most JH-proximal VH gene segments in pre-B cell lines. Nature 311: 727–733. 30. Link, J. M., J. E. Larson, and H. W. 
Schroeder, Jr. 2005. Despite extensive similarity in germline DH and JH sequence, the adult Rhesus macaque CDR-H3 repertoire differs from human. Mol. Immunol. 42: 943–55. 31. Kyte, J., and R. F. Doolittle. 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157: 105–132. 32. Eisenberg, D. 1984. Three-dimensional structure of membrane and surface proteins. Annu. Rev. Biochem. 53: 595– 623. 33. Martensson, I. L., A. Rolink, F. Melchers, C. Mundt, S. Licence, and T. Shimizu. 2002. The pre-B cell receptor and its role in proliferation and Ig heavy chain allelic exclusion. Semin. Immunol. 14: 335–342. 34. Kelley, D. E., and R. P. Perry. 1986. Transcriptional and posttranscriptional control of immunoglobulin mRNA production during B lymphocyte development. Nucleic Acids Res. 14: 5431–5447. 35. Jack, H. M., J. Berg, and M. Wabl. 1989. Translation affects immunoglobulin mRNA stability. Eur. J. Immunol. 19: 843– 847. 36. Maquat, L. E. 2004. Nonsense-mediated mRNA decay: splicing, translation and mRNP dynamics. Nat. Rev. Mol. Cell Biol. 5: 89 –99. 37. Gu, H., D. Kitamura, and K. Rajewsky. 1991. B cell development regulated by gene rearrangement: arrest of maturation by membrane-bound D protein and selection of DH element reading frames. Cell 65: 47–54. 38. Horne, M. C., P. E. Roth, and A. L. DeFranco. 1996. Assembly of the truncated immunoglobulin heavy chain D into antigen receptor-like complexes in pre-B cells but not in B cells. Immunity 4: 145–158. 39. Shlomchik, M. J., M. A. Mascelli, H. Shan, M. Z. Radic, D. S. Pisetsky, A. Marshak-Rothstein, and M. G. Weigert. 1990. Anti-DNA antibodies from autoimmune mice arise by clonal expansion and somatic mutation. J. Exp. Med. 171: 265–292. 40. Barbas, S. M., H. J. Ditzel, E. M. Salonen, W. P. Yang, G. J. Silverman, and D. R. Burton. 1995. Human autoantibody recognition of DNA. Proc. Natl. Acad. Sci. USA 92: 2529 –2533. 41. Zemlin, M., R. L. Schelonka, K. Bauer, and H. W. Schroeder, Jr. 
2002. Regulation and chance in the ontogeny of B and T cell antigen receptor repertoires. Immunol. Res. 26: 265–278. Downloaded from http://www.jimmunol.org/ by guest on June 12, 2017 15. Burrows, P. D., R. P. Stephan, Y. H. Wang, K. Lassoued, Z. Zhang, and M. D. Cooper. 2002. The transient expression of pre-B cell receptors governs B cell development. Semin. Immunol. 14: 343–349. 16. Keyna, U., G. B. Beck-Engeser, J. Jongstra, S. E. Applequist, and H. M. Jack. 1995. Surrogate light chain-dependent selection of Ig heavy chain V regions. J. Immunol. 155: 5536 –5542. 17. Kline, G. H., L. Hartwell, G. B. Beck-Engeser, U. Keyna, S. Zaharevitz, N. R. Klinman, and H.-M. Jack. 1998. Pre-B cell receptor-mediated selection of pre-B cells synthesizing functional heavy chains. J. Immunol. 161: 1608 –1618. 18. Martin, D. A., H. Bradl, T. J. Collins, E. Roth, H. M. Jack, and G. E. Wu. 2003. Selection of Ig heavy chains by complementarity-determining region 3 length and amino acid composition. J. Immunol. 171: 4663– 4671. 19. Shiokawa, S., F. Mortari, J. O. Lima, C. Nunez, F. E. I. Bertrand, P. M. Kirkham, S. Zhu, A. P. Dasanayake, and H. W. Schroeder, Jr. 1999. IgM HCDR3 diversity is constrained by genetic and somatic mechanisms until two months after birth. J. Immunol. 162: 6060 – 6070. 20. Raaphorst, F. M., C. S. Raman, J. Tami, M. Fischbach, and I. Sanz. 1997. Human Ig heavy chain CDR3 regions in adult bone marrow pre-B cells display an adult phenotype of diversity: evidence for structural selection of DH amino acid sequences. Int. Immunol. 9: 1503–1515. 21. Schroeder, H. W., Jr., G. C. Ippolito, and S. Shiokawa. 1998. Regulation of the antibody repertoire through control of HCDR3 diversity. Vaccine 16: 1383–1390. 22. Wardemann, H., S. Yurasov, A. Schaefer, J. W. Young, E. Meffre, and M. C. Nussenzweig. 2003. Predominant autoantibody production by early human B cell precursors. Science 301: 1374 –1377. 23. Williams, G. S., A. Martinez, A. Montalbano, A. Tang, A. Mauhar, K. 
M. Ogwaro, D. Merz, C. Chevillard, R. Riblet, and A. J. Feeney. 2001. Unequal VH gene rearrangement frequency within the large VH7183 gene family is not due to recombination signal sequence variation, and mapping of the genes shows a bias of rearrangement based on chromosomal location. J. Immunol. 167: 257–263. 24. Viale, A. C., A. Coutinho, and A. A. Freitas. 1992. Differential expression of VH gene families in peripheral B cell repertoires of newborn or adult immunoglobulin H chain congenic mice. J. Exp. Med. 175: 1449 –1456. 25. Huetz, F., L. Carlsson, U.-C. Tornberg, and D. Holmberg. 1993. V-region directed selection in differentiating B lymphocytes. EMBO J. 12: 1819 –1826. 26. Marshall, A. J., G. E. Wu, and G. J. Paige. 1996. Frequency of VH81x usage during B cell development: initial decline in usage is independent of Ig heavy chain cell surface expression. J. Immunol. 156: 2077–2084. CDR-H3 REPERTOIRE DEVELOPMENT IN BONE MARROW