Divergent Evolution and Evolution by the Birth-and-Death Process in the Immunoglobulin VH Gene Family

Tatsuya Ota and Masatoshi Nei
Department of Biology and Institute of Molecular Evolutionary Genetics, The Pennsylvania State University

Immunoglobulin diversity is generated primarily by the heavy- and light-chain variable-region gene families. To understand the pattern of long-term evolution of the heavy-chain variable-region (VH) gene family, which is composed of a large number of member genes, the evolutionary relationships of representative VH genes from diverse vertebrates were studied by constructing a phylogenetic tree. This tree indicates that the vertebrate VH genes can be classified into group A, B, C, D, and E genes. All VH genes from cartilaginous fishes such as sharks and skates form a monophyletic group and belong to group E, whereas group D consists of bony-fish VH genes. By contrast, group C includes not only some fish genes but also amphibian, reptile, bird, and mammalian genes. Group A and B genes were composed of the genes from mammals and amphibians. The phylogenetic analysis also suggests that mammalian VH genes are classified into three clusters (i.e., mammalian clans I, II, and III) and that these clans have coexisted in the genome for >400 Myr. To study the short-term evolution of VH genes, the phylogenetic analysis of human group A (clan I) and C (clan III) genes was also conducted. The results obtained show that VH pseudogenes have evolved much faster than functional genes and that they have branched off from various functional VH genes. There is little indication that the VH gene family has been subject to concerted evolution that homogenizes member genes. These observations indicate that the VH genes are subject to divergent evolution due to diversifying selection and evolution by the birth-and-death process caused by gene duplication and dysfunctioning mutation.
Thus, the evolutionary pattern of this monofunctional multigene family is quite different from that of such gene families as the ribosomal RNA and histone gene families.

Key words: birth-and-death process, divergent evolution, immunoglobulin, multigene family, VH genes.
Address for correspondence and reprints: Masatoshi Nei, Institute of Molecular Evolutionary Genetics, The Pennsylvania State University, 328 Mueller Laboratory, University Park, Pennsylvania 16802.
Mol. Biol. Evol. 11(3):469-482. 1994. © 1994 by The University of Chicago. All rights reserved. 0737-4038/94/1103-0014$02.00

Introduction

Monofunctional multigene families, where all member genes have the same function, are generally believed to undergo coincidental or concerted evolution, which homogenizes the DNA sequences of the member genes (Smith et al. 1971; Smith 1974; Zimmer et al. 1980). Good examples of concerted evolution are seen with ribosomal RNA (rRNA) genes (e.g., see Arnheim 1983) and histone genes (e.g., see Matsuo and Yamazaki 1989), where all the member genes have virtually identical coding sequences within species. This is understandable, because these gene families produce a large quantity of gene products of the same function that are essential for the survival of an organism. Concerted evolution is caused either by unequal crossover or intergenic gene conversion.

Concerted evolution was also invoked to explain the evolution of the immunoglobulin heavy-chain variable-region (VH) genes (Hood et al. 1975; Ohta 1980, 1983). Gojobori and Nei (1984), however, showed that the VH gene (often called "gene segment") family in the mouse and human undergoes a very slow rate of concerted evolution, if any; that the extent of sequence divergence between member genes in the mouse or the human is substantial; and that some of these genes diverged much earlier than the mammalian radiation (~80 Mya) (also see Tutter and Riblet 1989a).
During the past 10 years, the DNA sequences of VH genes have been determined for many lower vertebrate organisms (see Litman et al. 1993). We can therefore study the pattern of evolution of the VH gene family on a much longer evolutionary time scale. This type of study is particularly interesting, because the gene organization of VH genes is now known to vary extensively among different classes of vertebrates (Du Pasquier 1993). In this paper, we show that the pattern of VH gene evolution is very different from the typical concerted evolution and that two evolutionary processes, which may be called "divergent evolution" and "evolution by the birth-and-death process," predominate in the evolution of VH genes in vertebrates, despite the fact that these genes have the same function. We first study the long-term evolution of VH genes, examining the evolutionary relationships of the genes obtained from various classes of vertebrates, and then the short-term evolution of human VH genes.

FIG. 1.-Genomic organizations of immunoglobulin heavy-chain genes. Cartilaginous fishes have the (VH-DH-DH-JH-CH)n type organization including germ-line-joined genes, whereas most higher vertebrates, including bony fishes and tetrapods, have the (VH)n-(DH)n-(JH)n-(CH)n type gene organization. Chicken has a single functional variable gene. V = variable-segment gene; D = diversity-segment gene; J = joining-segment gene; C = constant-segment gene; and ψ = pseudogene. For more detailed information, see the work of Litman et al. (1993).

Material and Methods

Background Information

Immunoglobulin or antibody molecules consist of two heavy chains and two light chains. Both the heavy and light chains are composed of the variable region and the constant region (C).
The variable region is responsible for antigen binding, whereas the constant region is responsible for effector function. The heavy-chain variable region is encoded by the variable-segment (VH), diversity-segment (DH), and joining-segment (JH) genes, whereas the light-chain variable region is encoded by the variable-segment (VL) and joining-segment (JL) genes. However, the major portion of the variable region is determined by either VH or VL. The VH and VL genes are evolutionarily homologous with each other, but their sequence similarity is quite low. The VH (as well as VL) genes can be subdivided further into two hypervariable or complementarity-determining regions (CDR1 and CDR2) and three framework regions (FW1, FW2, and FW3). (CDR3 is determined primarily by the DH and JH genes, so it was excluded in this paper.)

The human and mouse genomes contain >100 VH genes and a few to a dozen DH and JH genes (see fig. 1). Each of the VH's is recombined with one of the DH's, one of the JH's, and one of the CH's (constant-region genes), to form a heavy-chain gene (Tonegawa 1983). This process therefore generates a large number of different immunoglobulins that are necessary for defending an individual from many different foreign antigens.

Essentially the same humoral immune system exists for most of the bony fishes and tetrapods. In lower vertebrates such as sharks and skates, however, the genomic organization of the immunoglobulin genes is different from that of higher vertebrates, which may be described as (Vn-Dn-Jn-Cn). In lower vertebrates, the basic unit of repeat of genes is (V-D-D-J-C) (fig. 1); that is, the linked gene group VH-DH-DH-JH-CH is repeated several hundred times in the genome, and the repeat seems to be scattered on different chromosomes (Litman et al. 1993). Therefore, the immunoglobulin genes in these organisms may be represented by (V-D-D-J-C)n. Of course, there are many exceptions to this rule.
Some units of shark genes have the VH and DH genes already united (VDD-J-C) in the germ-line genome, whereas in some other units all of VH, the DH's, and JH are united (VDDJ-C) (Litman et al. 1993). In the little skate (Raja erinacea), (VD-DJ-C) units are also present. Furthermore, in the coelacanth (Latimeria chalumnae), which belongs to the lobe-finned fishes, (V-D) is apparently repeated many times on the 5' side of the J and C genes (Amemiya et al. 1993). Actually, the number of lower vertebrates studied so far is very small, so it is likely that some new types of genomic organization of VH genes will be discovered in the near future.

Another group of vertebrates that has a deviant genomic organization of the VH genes is that of avian species represented by chicken (for review, see McCormack et al. 1991). In these species, there is only one functional VH gene and ~80 VH pseudogenes in the genome (Reynaud et al. 1989), and the antibody diversity is generated by gene conversion of the functional gene by the pseudogenes (fig. 1). (The same process operates for the light-chain genes as well; Reynaud et al. 1987; Thompson and Neiman 1987.) However, this system may not apply to all avian species, because in the Muscovy duck there seem to be at least two functional VL genes (McCormack et al. 1989). It is interesting that a somewhat similar pattern has been observed in the rabbit. In this organism, the D-proximal VH gene is preferentially used, though there are ≥100 VH genes and many of them seem to be functional (Knight 1992). The immunoglobulin diversity in this organism seems to be generated by gene conversion and somatic mutation.

In the primitive jawless vertebrates such as lampreys and hagfishes, an extensive search for immunoglobulin genes has been conducted, but no genes that are clearly homologous to the immunoglobulin genes in cartilaginous fishes or higher vertebrates have been identified, though antibodies are clearly present in these organisms (Litman et al. 1993). Therefore, a completely different immune system may exist in these species.

Nucleotide Sequences Used

At the present time, there are >600 VH gene sequences that have been determined for the various groups of organisms mentioned above. This number includes both germ-line and cDNA sequences, and the majority of them come from the human and mouse (Kabat et al. 1991). For studying the long-term evolution of VH genes, it is necessary to use representative genes from each species and to cover a wide range of organisms. We have therefore included (a) almost all sequences from a species where a small number of genes have been studied but (b) relatively few representative sequences from a species where a large number of genes have been studied. For the study of long-term evolution, no pseudogenes were included.

In both human and mouse, >60 functional germ-line genes have been sequenced, but Schroeder et al. (1990) showed that they can be classified into three major clusters, i.e., clans I-III (also see Kirkham et al. 1992). Our preliminary analysis of ~600 mammalian VH genes, which were obtained from the GenBank and Kabatnuc databases, has also suggested that these genes can be grouped into the three major clusters (data not shown). We have therefore used 11 and 7 representative sequences from the mouse and the human, respectively, covering the three major clusters, i.e., clans I-III. More than 30 different VH sequences are also available from the African toad Xenopus laevis. We chose 11 representative sequences from them.
For other organisms, we used almost all available genes after exclusion of closely related ones, except for chicken, where we used only one gene, which is used exclusively for antibody generation. For the purpose of rooting the VH gene tree, we included a horned shark light-chain VL gene and a human preB gene, which is known to be homologous to vertebrate lambda VL genes (Kudo and Melchers 1987). The total number of VH and outgroup genes used for our study of long-term evolution was 57, and they are listed in table 1.

We used the germ-line sequences whenever possible, to avoid the effect of somatic mutation (Tonegawa 1983). In a few species, however, we used cDNA, because germ-line genes were not available and sequence divergences were so large that the effect of somatic mutation was negligible. The cDNA sequences are denoted by asterisks in table 1. Since we used many genes from many different organisms, we designated each gene by the first letter of both the genus name and the species name plus the gene designation used by the original authors, except in the African toad (X. laevis), where we used "Xe" instead of "Xl" because "l" (el) is easily confused with "1" (one).

For the study of "short-term" evolution, we included pseudogenes as well as functional genes and used 10 VH genes belonging to the clan I genes and 28 genes belonging to the clan III genes. (Clan II genes were not used, because there was only one pseudogene.) All of these data were taken from recent exhaustive studies of human VH genes by Shin et al. (1991) and Matsuda et al. (1993). These VH genes are scattered throughout a 0.8-Mb region of the human VH gene cluster of chromosome 14 and are intermingled with different VH clan genes.

Table 1
VH Genes Used for Phylogenetic Analysis in Figure 4

Heavy chains:
  Cartilaginous fishes:
    Sharks:
      Nurse shark (Ginglymostoma cirratum): GcA4-2* (Vazquez et al. 1992)
      Horned shark (Heterodontus francisci): HfF101, Hf1207, Hf1315, Hf1320, Hf1403, Hf2806, Hf2807, Hf2809, Hf3083 (Kokubu et al. 1988)
    Skate:
      Little skate (Raja erinacea): Re102, Re107, Re110 (Harding et al. 1990b)
  Bony fishes:
    Ray-finned fishes:
      Goldfish (Carassius auratus): CaVH99A (Wilson et al. 1991)
      Ladyfish (Elops saurus): Es14501*, Es14515* (Amemiya and Litman 1990)
      Channel catfish (Ictalurus punctatus): IpNG10*, IpNG22*, IpNG54*, IpNG64*, IpNG66* (Ghaffari and Lobb 1991); Ip10.4* (Warr et al. 1991)
      Rainbow trout (Oncorhynchus mykiss): OmRTVH431 (Matsunaga et al. 1990)
    Lobe-finned fishes:
      Coelacanth (Latimeria chalumnae): Lc15011a (Amemiya et al. 1993)
  Amphibians:
      African toad (Xenopus laevis): XeVH1 (Yamasaki-Kataoka and Honjo 1987); XeLL2.8, XeLL3.4 (Schwager et al. 1989); Xe4, Xe5, Xe6, Xe7, Xe9, Xe10, Xe11.1b (Haire et al. 1991)
  Reptiles:
      Caiman (Caiman crocodylus): CcC3 (Litman et al. 1983)
  Birds:
      Chicken (Gallus domesticus): GdVH1 (Reynaud et al. 1989)
  Mammals:
      Human (Homo sapiens): HsVH251 (Humphries et al. 1988); HsVH26 (Mathyssens and Rabbitts 1980); HsVH6 (Schroeder et al. 1988); HsV1.4.1b, HsV2.5 (Shin et al. 1991); HsV35 (Matsuda et al. 1988); HsV12G-1 (Lee et al. 1987)
      Mouse (Mus musculus): MmA/J (Gefter et al. 1984); MmH30 (Schiff et al. 1985); MmH4a-3 (Schiff et al. 1986); MmME3 (Kasturi et al. 1990); MmPJ14 (Sakano et al. 1980); MmVH283 (Ollo et al. 1983); MmVH441 (Ollo et al. 1981); MmV11 (Siu et al. 1987); Mm1.8, Mm43X (Rathbun et al. 1988); Mm186-2 (Bothwell et al. 1981)
      Rabbit (Oryctolagus cuniculus): Ocp26.9.p2 (Currier et al. 1988)
Light chains (outgroups):
      Horned shark (Heterodontus francisci): Hf122 (Shamblott and Litman 1989)
      Human (Homo sapiens): HsVPB6 (Bauer et al. 1988)

NOTE.-An asterisk (*) indicates a cDNA sequence.
Phylogenetic Analysis

In the study of the evolutionary relationships of VH genes for the entire group of vertebrates, we used amino acid sequences rather than nucleotide sequences, because the VH genes are highly divergent and the nucleotide differences at certain positions apparently have reached the saturation level. All amino acid sequences were aligned by using the program CLUSTAL V (Higgins et al. 1992), with some modification by visual inspection. As mentioned earlier, a VH gene consists of CDRs and FWs. However, CDRs are highly variable and contain many insertions/deletions (indels) when the VH and VL sequences from diverse organisms are included (see fig. 2), so they were excluded from the analysis. In the present analysis even the FW regions showed some indels. All sites showing indels were excluded from the analysis. Therefore, the total number of amino acid sites used for phylogenetic analysis was 64. Three sequences, one from each of the horned shark (Hf1113), little skate (Re20*), and African toad (Xe8), had several indels in FW2 and FW3, and it was difficult to align them with other sequences (fig. 2). Therefore, they were not used for the present analysis.

FIG. 2.-Amino acid sequences of the variable regions of immunoglobulins from diverse organisms. They are representative sequences from five major groups of VH segments (A, B, C, D, and E) in fig. 4 and three divergent sequences (Hf1113 [Hinds-Frey et al. 1993]; Re20* [Harding et al. 1990a]; and Xe8 [Haire et al. 1991]) that contain additional indels in FW2 and FW3. Two VL sequences, which were used as outgroups, are also included. A dash (-) indicates an indel, and the 64 sites marked with a plus sign (+) were used for phylogenetic construction in fig. 4. CDR1 and CDR2 were not used because their sequence divergence was extensive and the sequence alignment was unreliable.
Sequences shown (alignment not reproduced): heavy chain, HsV35, MmH30, HsV12G-1, MmA/J, HsVH26, MmVH283, CaVH99A, HfF101, Xe10, Xe8, Hf1113, Re20*; light chain, HsVPB6, Hf122. Regions: FR1, CDR1, FR2, CDR2, FR3.

In the study of "short-term" evolution of VH genes, we used nucleotide sequences, since the extent of sequence divergence was relatively small in this case. Because we were interested in the dynamics of the evolution of VH genes in this study, we included both functional genes and pseudogenes. There was no difficulty in the alignment of the sequences, except in the case of a few pseudogenes. These few pseudogenes were excluded from the analysis.
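The exclusion of indel sites described in this section corresponds to what is often called complete deletion: any alignment column in which at least one sequence has a gap is dropped before distances are computed. The following Python sketch illustrates the idea on a toy alignment. The function name drop_indel_sites and the three short sequences are hypothetical; they are not taken from figure 2 or from the programs used in the study.

```python
from typing import Dict


def drop_indel_sites(alignment: Dict[str, str], gap: str = "-") -> Dict[str, str]:
    """Keep only the alignment columns in which no sequence has a gap."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # A column is kept only if every sequence has a residue there.
    keep = [
        i for i in range(n_sites)
        if all(seq[i] != gap for seq in alignment.values())
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment (hypothetical sequences, for illustration only).
toy = {
    "seqA": "QVQL-VQSG",
    "seqB": "EVQLVE-SG",
    "seqC": "DVVLT-PEG",
}
filtered = drop_indel_sites(toy)
for name, seq in filtered.items():
    print(name, seq)
```

In this toy case columns 5, 6, and 7 each contain a gap in one sequence, so six of the nine sites remain for every sequence.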

The phylogenetic analysis of both amino acid and nucleotide sequence data was conducted by using the minimum-evolution method (Rzhetsky and Nei 1992). This method has a solid theoretical basis and is known to be efficient in obtaining correct phylogenetic trees (Rzhetsky and Nei 1993). The evolutionary distances used were the Poisson-correction distance (see Nei 1987, p. 41), for amino acid sequence data, and Jukes and Cantor's (1969) distance, for nucleotide sequence data. The reliability of the tree obtained was tested by computing the standard error of each interior branch and conducting a t-test (Rzhetsky and Nei 1992).

Results

Evolutionary Relationships of VH Genes from Diverse Organisms of Vertebrates

The Poisson-correction distances for all pairs of 55 VH and two outgroup sequences are presented in figure 3, whereas the phylogenetic tree obtained is given in figure 4. Figure 4 shows that the VH genes in vertebrates can be classified into five groups, i.e., groups A, B, C, D, and E. Group A includes all mammalian clan I genes and a few Xenopus genes, whereas group B consists of all mammalian clan II genes and some Xenopus genes. Group C includes mammalian clan III genes and VH genes from chicken, caiman (reptile), Xenopus, coelacanth, and some bony fishes. Group D consists of bony-fish VH genes only, whereas all group E genes are from cartilaginous fishes.

Figure 4 shows that the first evolutionary split of VH genes occurred between group E genes and others. This is reasonable, because the genomic organization of the genes from cartilaginous fishes is different from that of other vertebrates and because cartilaginous fishes separated from the other vertebrates ~450 Mya. The second split of VH genes in the rest of vertebrates occurs between bony fishes (group D) and tetrapods, though group C includes some bony (ray-finned) fish and coelacanth (lobe-finned fish) genes.
However, the confidence probability (CP) (1 - type I error) of group D genes is relatively low (63%). This is caused mainly by the fact that the genes CaVH99A, IpNG64*, and Es14515* in this group are relatively closely related to the genes XeVH1, Es14501*, and OmRTVH431 of group C (fig. 3). This somewhat anomalous relationship of genes may be due to either stochastic errors of amino acid substitution or some intergenic recombination or gene conversion events that occurred a long time ago. The next level of splitting is between group A genes and group (B+C) genes. The group A gene cluster also has a relatively low CP value. This low CP value (64%) is mainly due to the fact that the gene Xe10 is similar to some group C and D genes (see fig. 3). If we exclude this gene, the cluster of group A genes shows a CP value of 96%. The final clusters of group B and C genes show a relatively high CP value.

Within each group of VH genes, there are several clusters that are highly significant. In group E, for example, the skate genes are substantially different from the shark genes and form a cluster with a 99% probability. Apparently, this cluster separated from the shark cluster in an early stage of evolution, and, if more data accumulate and the cluster is still supported, we may have to separate it from the shark group. Note that sharks and skates diverged ~200 Mya (Carroll 1988, pp. 62-83). Significant clusters (CP>95%) are also observed in all other groups. An interesting one is the mouse VH cluster in group A. Although this cluster belongs to the mammalian clan I, it is quite different from the human gene belonging to the same clan.

FIG. 3.-Pairwise Poisson-correction distances for 55 VH and two VL sequences. The five major groups of VH genes and two VL outgroup genes are separated by lines. An asterisk (*) indicates a sequence deduced from a cDNA. (Distance matrix not reproduced.)

FIG. 4.-Phylogenetic tree of 55 VH and two VL (outgroup) genes. There are five major groups (A, B, C, D, and E) of VH genes identified from this tree, though the grouping is not necessarily statistically significant. The number given to each interior branch is the probability at which the branch length is different from 0 (confidence probability). Confidence probability <60% is not given. The branch lengths are measured in terms of the number of amino acid substitutions, with the scales given below the tree.

Table 2
Jukes-Cantor Distances (x100) for 10 Human Group A VH Genes

                           1     2     3     4     5     6     7     8     9    10
 1. 1.2
 2. 1.3                  7.7
 3. 1.8                  3.5   7.1
 4. 1.18                 6.1   7.7   6.6
 5. 1.46                 4.0   5.5   4.0   6.1
 6. ψ1.17               10.4  11.0  11.5  12.7  11.5
 7. ψ1.24                9.8  10.4   8.7  10.4   9.3  16.8
 8. ψ1.27               21.2  23.8  21.8  26.5  22.5  31.5  29.3
 9. ψ1.58               10.4  11.0   9.8  12.1   9.8  16.8  13.2  28.6
10. 5.51                27.2  31.5  29.3  29.3  30.0  37.7  33.0  37.7  32.3
11. Xenopus 11.1b(a)    50.8  53.7  51.7  48.9  49.8  56.7  53.7  61.0  54.7  49.8

NOTE.-"ψ" denotes a pseudogene.
(a) Outgroup gene.
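The two distance measures behind figure 3 and table 2 can be written down compactly. The sketch below uses the standard textbook forms, d = -ln(1 - p) for the Poisson-correction distance and d = -(3/4) ln(1 - 4p/3) for the Jukes-Cantor distance, where p is the proportion of differing sites between two aligned sequences. It is assumed, not confirmed by the text, that these are exactly the forms applied by the authors; the function names and example sequences are hypothetical.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of sites at which two aligned, gap-free sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def poisson_correction(p: float) -> float:
    """Poisson-correction distance for amino acid data: d = -ln(1 - p).

    Defined only for p < 1; math.log raises ValueError otherwise.
    """
    return -math.log(1.0 - p)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance for nucleotide data: d = -(3/4) ln(1 - 4p/3).

    Defined only for p < 0.75; math.log raises ValueError otherwise.
    """
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical example: one difference in four aligned sites.
p = p_distance("ACGT", "ACGA")
print(p, poisson_correction(p), jukes_cantor(p))
```

Both corrections are close to p when p is small and grow faster than p as differences accumulate, which is why corrected distances rather than raw proportions are used for highly divergent sequences.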

An important feature that emerges from figure 4 is that the three mammalian VH clans separated a long time ago, before mammals and amphibians diverged (~350 Mya), because group A, B, and C genes are shared by both mammals and Xenopus; that is, mammalian VH gene clans I-III have coexisted in the genome, for ≥350 Myr, as separate lineages. Figure 4 shows that group C includes VH genes from both bony fishes and tetrapods (including mammalian clan III genes) and that separation of tetrapod group C genes from bony-fish group C genes occurred after group A, B, and C genes diverged. This suggests that the divergence of group A, B, and C genes occurred even before bony fishes and tetrapods diverged ~400 Mya. It is impressive that different copies of VH genes have persisted for such a long time in the tetrapod genome, and this raises a serious question about the importance of concerted evolution in VH genes.

It should be noted that the mammalian VH clans I-III may not exist in many mammalian species. Indeed, all VH genes in the rabbit seem to belong to clan III, and they are relatively closely related to each other in terms of sequence divergence (data not shown). In the rabbit, however, the VH gene, which is located proximally to the D segment gene cluster, is used with a high frequency (70%-90%) for generating antibody diversity, though there are ≥100 VH genes in the genome and the rest of the genes seem to be used with a low frequency (Knight 1992). A more extreme case of preferential use of one VH gene is known in chicken. In this organism, only one VH gene (VH1) is functional, and the other VH genes are all pseudogenes (Reynaud et al. 1989; McCormack et al. 1991). These genes again belong to group C (see fig. 4) and are relatively closely related to each other (authors' unpublished data). The V region diversity of this organism is also generated by somatic gene conversion.
Evolutionary Relationships of VH Genes in the Human Genome

In the study mentioned above, we chose distantly related VH genes from diverse organisms to study the pattern of long-term evolution. To understand the dynamics of evolutionary change of VH genes, however, it is necessary to examine the evolutionary relationships of genes within either the same species or closely related species. The most extensive study of the structure and physical map of VH genes has been done in the human. Matsuda et al. (1993) constructed the physical map of the 0.8-Mb DNA fragment that contains 64 VH genes, and the nucleotide sequences of these genes are available from GenBank. We examined the evolutionary relationships of VH genes belonging to groups A and C (groups a and c in the notation of Matsuda et al.), as mentioned earlier. Both of these groups include many pseudogenes, but some of the pseudogenes were truncated or unalignable with the functional genes. We used the Xenopus gene 11.1b as an outgroup for group A genes and the caiman gene C3 for group C genes. (Closely related genes are preferable as outgroups.)

The Jukes-Cantor distances for group A genes are presented in table 2, and the phylogenetic tree for these genes is given in figure 5. This figure clearly shows that pseudogenes evolve much faster than functional genes. This is expected because pseudogenes are free of functional constraint and thus accumulate mutations more rapidly than do functional genes (Li et al. 1981; Miyata and Yasunaga 1981; Kimura 1983). This result is again inconsistent with the view that the VH genes are subject to frequent (germ-line) gene conversion or unequal crossover events. If intergenic gene conversion or unequal crossover occurred very often, we would expect functional genes and pseudogenes to be intermingled in the tree and most genes in the genome to have similar nucleotide sequences. The phylogenetic tree in figure 5 does not show this pattern.

[Figure 5: phylogenetic tree of the genes listed in table 2; scale bar = 0.1.]

FIG. 5.-Phylogenetic tree of 10 group A human VH genes. All sequences except for Xenopus 11.1b were taken from Shin et al. (1991) and Matsuda et al. (1993). The Xenopus gene used here is one of the closest outgroup genes (see fig. 4). ψ = Pseudogene. The branch lengths are measured in terms of the number of nucleotide substitutions, with the scale given below the tree. The confidence probabilities of interior branches are indicated by asterisks: *, >90%; and **, >95%.

It is interesting to see that all functional genes have evolved at nearly the same rate and that some functional genes are quite close to each other. Gojobori and Nei (1984) estimated that the rate of nucleotide substitution for FWs is 1.01 × 10^-9, 0.65 × 10^-9, and 2.63 × 10^-9 per site per year per lineage for the first, second, and third codon positions, respectively. Therefore, the average rate is 1.43 × 10^-9. If this rate is reliable, even the closest pair of functional genes (Hs1.2 and Hs1.8) seems to have diverged ~12 Mya. The most divergent gene, Hs5.51, seems to have diverged from the other five functional genes ~100 Mya. These divergence times suggest that many functional VH genes have persisted for a long time in the human genome. When all group A, B, and C genes are considered together, they have survived for >400 Myr, as mentioned earlier.

Figure 6 shows the phylogenetic tree for group C genes (distance estimates are not shown). The pseudogenes in this group also generally evolve much faster than functional genes. There are a few pseudogenes whose branch lengths are nearly the same as those of the functional genes. These pseudogenes probably lost their function very recently. At any rate, this tree also does not support the view that intergenic gene conversion or unequal crossover plays an important role in homogenizing the member genes.

[Figure 6: phylogenetic tree of human group C VH genes with the Caiman C3 outgroup.]

FIG. 6.-Phylogenetic tree of 28 group C human VH genes. All sequences except for Caiman C3 were taken from Matsuda et al. (1993). The Caiman gene used here is one of the closest outgroup genes (see fig. 4). The confidence probabilities of interior branches are indicated by asterisks: *, >90%; **, >95%; and ***, >99%.

The mouse genome is also known to contain many pseudogenes as well as functional VH genes. Unfortunately, there seems to be no systematic study of VH gene sequences in this organism. However, using the GenBank sequences available, we again constructed a phylogenetic tree for 39 group A mouse VH genes (data not shown). This tree also showed a tendency for pseudogenes to evolve faster, and there was no clear-cut evidence of sequence homogenization.

Discussion

Concerted Evolution

As mentioned earlier, multigene families are believed to be subject to unequal crossover, which generates new duplicate genes or deletes some extant duplicate genes (Smith et al. 1971; Smith 1974). If this process continues, duplicate genes in a multigene family tend to have similar nucleotide sequences even in the presence of mutation. More recently, intergenic gene conversion was considered as an additional mechanism acting to homogenize the member genes of a multigene family (Slightom et al. 1980). These two mechanisms are thought to be the major causes of concerted evolution that homogenizes the member genes of such gene families as RNA genes and histone genes (fig. 7A) (Smith 1974; Ohta 1980; Arnheim 1983). Concerted (coincidental) evolution was also invoked to explain the evolution of immunoglobulin diversity (Ohta 1980, 1983), as mentioned earlier. In this case, unequal crossover or gene conversion was regarded as a mechanism to increase the genetic diversity (polymorphism) at a locus by introducing new variants from different loci.
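The divergence times quoted earlier for the human group A genes (about 12 Myr for Hs1.2 and Hs1.8, and about 100 Myr for Hs5.51) can be checked with a short calculation. The sketch below is only an illustration: it assumes the usual molecular-clock relation t = d / (2r), which the text does not state explicitly, and it takes the pairwise Jukes-Cantor distances from table 2 as read in its lower-triangular layout. All function and variable names are hypothetical.

```python
# Substitution rates for framework regions (FWs), per site per year per
# lineage, for the 1st, 2nd, and 3rd codon positions, as quoted in the
# text from Gojobori and Nei (1984).
CODON_POSITION_RATES = (1.01e-9, 0.65e-9, 2.63e-9)


def average_rate(rates=CODON_POSITION_RATES):
    """Unweighted mean of the three codon-position rates."""
    return sum(rates) / len(rates)


def divergence_time_years(distance_x100, rate):
    """Assumed clock relation t = d / (2r).

    distance_x100 is a table 2 entry (substitutions per site, times 100).
    The factor 2 reflects that both lineages accumulate substitutions.
    """
    distance = distance_x100 / 100.0
    return distance / (2.0 * rate)


rate = average_rate()

# Closest pair of functional genes in table 2: Hs1.2 versus Hs1.8.
t_closest = divergence_time_years(3.5, rate)

# Hs5.51 versus the other five functional genes (1.2, 1.3, 1.8, 1.18, 1.46),
# as read from row 10 of table 2 (assumption: row/column reading is correct).
hs551_distances = [27.2, 31.5, 29.3, 29.3, 30.0]
mean_hs551 = sum(hs551_distances) / len(hs551_distances)
t_hs551 = divergence_time_years(mean_hs551, rate)

print(f"average rate: {rate:.2e} per site per year per lineage")
print(f"Hs1.2 vs Hs1.8: {t_closest / 1e6:.1f} Myr")
print(f"Hs5.51 vs other functional genes: {t_hs551 / 1e6:.1f} Myr")
```

Under these assumptions the closest pair comes out near 12 Myr and Hs5.51 near 100 Myr, in line with the values quoted in the text. Averaging the five Hs5.51 distances is a simplification chosen here; the authors may have worked from branch lengths of the tree instead.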
However, when Gojobori and Nei (1984) conducted a phylogenetic analysis of human and mouse VH genes, they noticed that the extent of sequence divergence among different loci is very high and that, if these genes were subject to concerted evolution, the rate of unequal crossover or gene conversion must be very low, two orders of magnitude lower than that of ribosomal RNA genes. To explain the difference between the two gene families, they proposed that, whereas rRNA genes are subject to purifying selection, VH genes are subject to diversifying selection.

[Figure 7: schematic gene trees for species 1 and species 2 plotted against time; panels (A) Concerted evolution, (B) Divergent evolution, (C) Evolution by birth-and-death process.]

FIG. 7.-Three different modes of evolution of multigene families. Functional genes and pseudogenes are marked by different circle symbols.

Conducting a study on the number of synonymous (dS) and nonsynonymous (dN) substitutions in CDRs and FWs, Ohta (1992) recently concluded that the amino acid variability in CDRs is primarily caused by gene conversion. This conclusion is based on her observations that dS is generally higher in CDRs than in FWs and that dN was not necessarily higher than dS, even in CDRs. She argued that these observations do not support the hypothesis of diversifying selection but are consistent with the gene conversion hypothesis. However, there are three major problems in her study. First, in the study of evolution of VH genes, only germ-line sequences should be used; but she used many cDNA sequences. Second, CDRs evolve rapidly and include many indels. Therefore, for the comparison of dS and dN, only closely related sequences should be used. When this is done, dS is nearly the same for both CDRs and FWs, and dN > dS in CDRs but dN < dS in FWs (Tanaka and Nei 1989). (In the human, dS was higher in CDRs than in FWs in Tanaka and Nei’s study; but the difference was not significant.) Third, there are no data showing that gene conversion occurs exclusively in CDRs.
Actually, somatic gene conversion occurs in both CDRs and FWs in chicken (McCormack and Thompson 1990). For the above reasons, Ohta’s conclusion does not seem to be supported by available data.

Divergent Evolution

In figure 4, we have seen that the major groups of VH genes in higher vertebrates have been maintained in the genome for ~400 Myr. This observation is in rough agreement with the estimate of divergence (300-400 Myr) obtained by Gojobori and Nei (1984) in a statistical analysis of human and mouse VH genes (see Addendum in their paper). However, it is in sharp contrast with the evolution of rRNA genes, where even the human and the chimpanzee, which apparently diverged ~5 Mya (see Stoneking 1993), do not share the same group of genes (Arnheim 1983). In the present paper, this type of long-term persistence of member genes of a multigene family will be called “divergent evolution,” and it is schematically represented in figure 7B.

There are two possible hypotheses to explain this long-term persistence of VH genes in the vertebrate genome (Gojobori and Nei 1984): one is that the rate of occurrence of unequal crossover is low in VH genes, and the other is that diversifying selection operates among VH genes. Gojobori and Nei (1984) estimated the rate of occurrence of unequal crossover by fitting a mathematical theory to the phylogenetic tree for VH genes. The rate obtained was ~100 times lower than the estimate for the rRNA multigene family. However, since there is no reason to believe that the rate must be lower in VH genes than in rRNA genes, they supported the second hypothesis. The recent study of the physical map of VH genes by Matsuda et al. (1993) supports Gojobori and Nei’s (1984) view, because group A, B, and C VH genes are scattered throughout the 0.8-Mb VH gene region, suggesting that unequal crossover has occurred relatively frequently in the past (also see Tutter and Riblet 1989b).
The theoretical basis of the second hypothesis is that vertebrate individuals are exposed to a large number of foreign antigens, and thus, if an individual has many different types of VH genes, it will enhance the fitness of the individual. This means that a new mutant VH gene that encodes an antibody molecule matching a new type of antigen has a selective advantage and that old genes that have proved to be useful will be retained in the genome. Indeed, Tanaka and Nei (1989) have shown that the rate of amino acid-altering (nonsynonymous) nucleotide substitution is higher than the rate of synonymous nucleotide substitution in CDRs of the human and mouse VH genes. In other words, there is positive Darwinian selection operating for the diversification of VH genes.

Diversifying selection would certainly increase the persistence time of different VH genes in the genome, but can it explain the existence of three major groups of VH genes that have survived for ~400 Myr? Theoretically, it can happen by chance when random occurrence of unequal crossover creates new genes or deletes existing ones. However, it is more likely that these three groups are specialized to cope with different groups of antigens, though there is no evidence at the present time. Kirkham et al. (1992) showed that each of the mammalian clans I-III has a unique FR structure that influences the conformation of the antigen-binding site, and they suggested that there might be a differential use of VH clans in the immune response.

Evolution by the Birth-and-Death Process

In the theory of concerted evolution, it is commonly assumed that the total number of duplicate genes of a gene family remains more or less constant (Ohta 1983). Hughes and Nei (1989) and Nei and Hughes (1991, 1992) showed that, in the major-histocompatibility-complex (MHC) gene family, gene duplication often occurs but that many duplicate genes die out because of deleterious mutation.
However, the number of functional genes remains more or less constant, since the effects of the birth and death of duplicate genes are balanced out. The dead genes either stay as pseudogenes in the genome or are eliminated by unequal crossover. They called this the “evolution by the birth-and-death process” (fig. 7C).

Figures 5 and 6 show that evolution by the birth-and-death process also occurs in the VH gene family. Matsuda et al. (1993) estimated that ~50% of VH genes that exist in the human genome are pseudogenes. Similarly, the mouse genome harbors many pseudogenes (Rathbun et al. 1989). Since the number of functional VH genes in the genome is similar for many mammalian species, these results indicate that nearly the same number of functional genes as that of pseudogenes is produced by unequal crossover for a given period of evolutionary time. We can therefore conclude that the VH gene family is subject to evolution by the birth-and-death process, when a relatively short period of evolutionary time is considered.

From the phylogenetic trees given in figures 5 and 6, we previously concluded that concerted evolution does not play an important role in the VH gene family. This does not mean that there is no homogenization process in the evolution of VH genes. As long as unequal crossover occurs, there must be some chance of homogenization. We also do not completely rule out the role of intergenic gene conversion. Because gene conversion is an important mechanism of somatic generation of antibody diversity in some organisms (e.g., chicken), it might have played some role in the evolution of genomic structure of VH genes as well. Nevertheless, as far as currently available data are concerned, the VH gene family appears to have evolved mainly following the scheme of divergent evolution and evolution by the birth-and-death process.
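The balance argument in this section (births and deaths of functional genes cancel, while dead genes linger as pseudogenes until they are deleted) can be made concrete with a toy bookkeeping model. The sketch below is not the authors' model: it tracks only expected gene counts, and the per-step rates, the starting gene number, and all names are hypothetical values chosen for illustration.

```python
def birth_death_expectation(functional, pseudo, birth, death, loss, steps):
    """Iterate expected gene counts under a toy birth-and-death scheme.

    Per time step (all rates are illustrative assumptions):
      birth : chance that a functional gene is duplicated
      death : chance that a functional gene becomes a pseudogene
      loss  : chance that a pseudogene is deleted from the genome
    Returns the list of (functional, pseudo) expected counts over time.
    """
    history = [(functional, pseudo)]
    for _ in range(steps):
        new_functional = functional * (1.0 + birth - death)
        new_pseudo = pseudo * (1.0 - loss) + death * functional
        functional, pseudo = new_functional, new_pseudo
        history.append((functional, pseudo))
    return history


# Hypothetical setting: 30 functional genes, no pseudogenes at the start,
# and equal rates of duplication, inactivation, and deletion.
history = birth_death_expectation(
    functional=30.0, pseudo=0.0, birth=0.01, death=0.01, loss=0.01, steps=2000
)
final_functional, final_pseudo = history[-1]
pseudo_fraction = final_pseudo / (final_functional + final_pseudo)

print(f"functional genes: {final_functional:.2f}")
print(f"pseudogenes:      {final_pseudo:.2f}")
print(f"pseudogene share: {pseudo_fraction:.2f}")
```

With equal birth and death rates the expected number of functional genes stays constant, and the pseudogene count settles where inactivation and deletion balance. Setting the deletion rate equal to the inactivation rate is what gives a pseudogene share near one half in this toy setting; it is an assumption made here to mirror the ~50% figure quoted from Matsuda et al. (1993), not an estimate from data.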
Some Remarks

It should be emphasized that, although we now have a rough picture of the evolution of VH genes in vertebrates, this picture is based on studies of a limited number of species. If more species are studied, we may find new types of genomic organization of VH genes. It is rather surprising that the genomic structure of the coelacanth is somewhat similar to that of cartilaginous fishes. It is also interesting that the rabbit system is similar to the chicken system. Although these systems should be studied in more detail, they suggest that antibody diversity is generated in many different ways and that some systems may have evolved independently in different groups of vertebrates. Obviously, more detailed studies of the genomic structure of VH genes are necessary in many different organisms before we understand the general pattern of evolution of this multigene family.

Previously we suggested that the long persistence of VH genes in vertebrate genomes is aided by diversifying selection. However, the extent of selective advantage of a new mutant VH gene is probably small (possibly 1%-2%). The reason for this is that there are organisms in which only one or a few VH genes are functional and in which others are either pseudogenes or rarely used. If every gene had a unique function that is essential to the survival of the organism, this type of system would never have evolved; somatic mutation and intergenic gene conversion would be too risky and too uneconomical in this case. The fact that many VH genes die out in the course of evolution also suggests that the adaptive differences between different VH genes are rather small. Nevertheless, this small magnitude of adaptive difference (s) would be sufficient to explain the long persistence of VH genes in the vertebrate genome if the population size (N) is sufficiently large, because the maintenance of VH genes would be determined mainly by Ns. If the theoretical study on frequency-dependent selection with respect to MHC genes (Takahata and Nei 1990) gives any guidance, it suggests that the persistence of different VH genes would be enhanced substantially by diversifying selection when Ns > 100.

Acknowledgment

This work was supported by research grants from the National Institutes of Health (GM-20293) and the National Science Foundation (DEB-9119802) to M.N.

LITERATURE CITED

ARNHEIM, N. 1983. Concerted evolution of multigene families. Pp. 38-61 in M. NEI and R. K. KOEHN, eds. Evolution of genes and proteins. Sinauer, Sunderland, Mass.
AMEMIYA, C. T., and G. W. LITMAN. 1990. Complete nucleotide sequence of an immunoglobulin heavy-chain gene and analysis of immunoglobulin gene organization in a primitive teleost species. Proc. Natl. Acad. Sci. USA 87:811-815.
AMEMIYA, C. T., Y. OHTA, R. T. LITMAN, J. P. RAST, R. N. HAIRE, and G. W. LITMAN. 1993. VH gene organization in a relict species, the coelacanth Latimeria chalumnae: evolutionary implications. Proc. Natl. Acad. Sci. USA 90:6661-6665.
BAUER, S. R., A. KUDO, and F. MELCHERS. 1988. Structure and pre-B lymphocyte restricted expression of the VpreB gene in humans and conservation of its structure in other mammalian species. EMBO J. 7:111-116.
BOTHWELL, A. L. M., M. PASKIND, M. RETH, T. IMANISHI-KARI, K. RAJEWSKY, and D. BALTIMORE. 1981. Heavy chain variable region contribution to the NPb family of antibodies: somatic mutation evident in a γ2a variable region. Cell 24:625-637.
CAROLL, R. L. 1988. Vertebrate paleontology and evolution. Freeman, New York.
CURRIER, S. J., J. L. GALLARDA, and K. L. KNIGHT. 1988. Partial molecular genetic map of the rabbit VH chromosomal region. J. Immunol. 140:1651-1659.
DU PASQUIER, L. 1993. Phylogeny of B-cell development. Curr. Opin. Immunol. 5:185-193.
GEFTER, M. L., M. N. MARGOLIES, R. I. NEAR, and L. J. WYSOCKI. 1984. Analysis of the anti-azobenzenearsonate response at the molecular level. Annu. Inst. Pasteur Immunol. 135C:17-30.
GHAFFARI, S. H., and C. J. LOBB. 1991. Heavy chain variable region gene families evolved early in phylogeny. J. Immunol. 146:1037-1046.
GOJOBORI, T., and M. NEI. 1984. Concerted evolution of the immunoglobulin VH gene family. Mol. Biol. Evol. 1:195-212.
HAIRE, R. N., Y. OHTA, R. T. LITMAN, C. T. AMEMIYA, and G. W. LITMAN. 1991. The genomic organization of immunoglobulin VH genes in Xenopus laevis shows evidence for interspersion of families. Nucleic Acids Res. 19:3061-3066.
HARDING, F. A., C. T. AMEMIYA, R. T. LITMAN, N. COHEN, and G. W. LITMAN. 1990a. Two distinct immunoglobulin heavy chain isotypes in a primitive, cartilaginous fish, Raja erinacea. Nucleic Acids Res. 18:6369-6376.
HARDING, F. A., N. COHEN, and G. W. LITMAN. 1990b. Immunoglobulin heavy chain gene organization and complexity in the skate, Raja erinacea. Nucleic Acids Res. 18:1015-1020.
HIGGINS, D. G., A. J. BLEASBY, and R. FUCHS. 1992. CLUSTAL V: improved software for multiple sequence alignment. Comput. Appl. Biosci. 8:189-191.
HINDS-FREY, K. R., H. NISHIKATA, R. T. LITMAN, and G. W. LITMAN. 1993. Somatic variation precedes extensive diversification of germline sequences and combinatorial joining in the evolution of immunoglobulin heavy chain diversity. J. Exp. Med. 178:815-824.
HOOD, L., J. H. CAMPBELL, and S. C. R. ELGIN. 1975. The organization, expression, and evolution of antibody genes and other multigene families. Annu. Rev. Genet. 9:305-353.
HUGHES, A. L., and M. NEI. 1989. Evolution of the major histocompatibility complex: independent origin of nonclassical class I genes in different groups of mammals. Mol. Biol. Evol. 6:559-579.
HUMPHRIES, C. G., A. SHEN, W. A. KUZIEL, J. D. CAPRA, F. R. BLATTNER, and P. W. TUCKER. 1988. A new human immunoglobulin VH family preferentially rearranged in immature B-cell tumours. Nature 331:446-449.
JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. MUNRO, ed. Mammalian protein metabolism. Academic Press, New York.
KABAT, E. A., T. T. WU, H. M. PERRY, K. S. GOTTESMAN, and C. FOELLER. 1991. Sequences of proteins of immunological interest, 5th ed. U.S. Department of Health and Human Services, Government Printing Office, Washington, D.C.
KASTURI, K. N., R. MAYER, C. A. BONA, V. E. SCOTT, and C. L. SIDMAN. 1990. Germline V genes encode viable motheaten mouse autoantibodies against thymocytes and red blood cells. J. Immunol. 145:2304-2311.
KIMURA, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge.
KIRKHAM, P. M., F. MORTARI, J. A. NEWTON, and H. W. SCHROEDER, JR. 1992. Immunoglobulin VH clan and family identity predicts variable domain structure and may influence antigen binding. EMBO J. 11:603-609.
KNIGHT, K. L. 1992. Restricted VH gene usage and generation of antibody diversity in rabbit. Annu. Rev. Immunol. 10:593-616.
KOKUBU, F., R. LITMAN, M. J. SHAMBLOTT, K. HINDS, and G. W. LITMAN. 1988. Diverse organization of immunoglobulin VH gene loci in a primitive vertebrate. EMBO J. 7:3413-3422.
KUDO, A., and F. MELCHERS. 1987. A second gene, VpreB in the λ5 locus of the mouse, which appears to be selectively expressed in pre-B lymphocytes. EMBO J. 6:2267-2272.
LEE, K. H., F. MATSUDA, T. KINASHI, M. KODAIRA, and T. HONJO. 1987. A novel family of variable region genes of the human immunoglobulin heavy chain. J. Mol. Biol. 195:761-768.
LI, W. H., T. GOJOBORI, and M. NEI. 1981. Pseudogenes as a paradigm of neutral evolution. Nature 292:237-239.
LITMAN, G. W., L. BERGER, K. MURPHY, R. LITMAN, K. HINDS, C. L. JAHN, and B. W. ERICKSON. 1983. Complete nucleotide sequence of an immunoglobulin VH gene homologue from Caiman, a phylogenetically ancient reptile. Nature 303:349-352.
LITMAN, G. W., J. P. RAST, M. J. SHAMBLOTT, R. N. HAIRE, M. HULST, W. ROESS, R. T. LITMAN, K. R. HINDS-FREY, A. ZILCH, and C. T. AMEMIYA. 1993. Phylogenetic diversification of immunoglobulin genes and the antibody repertoire. Mol. Biol. Evol. 10:60-72.
MCCORMACK, W. T., L. M. CARLSON, L. W. TJOELKER, and C. B. THOMPSON. 1989. Evolutionary comparison of the avian IgL locus: combinatorial diversity plays a role in the generation of the antibody repertoire in some avian species. Int. Immunol. 1:332-341.
MCCORMACK, W. T., and C. B. THOMPSON. 1990. Chicken IgL variable region gene conversions display pseudogene donor preference and 5' to 3' polarity. Genes Dev. 4:548-558.
MCCORMACK, W. T., L. W. TJOELKER, and C. B. THOMPSON. 1991. Avian B-cell development: generation of an immunoglobulin repertoire by gene conversion. Annu. Rev. Immunol. 9:219-241.
MATSUDA, F., K. H. LEE, S. NAKAI, T. SATO, M. KODAIRA, S. Q. ZONG, H. OHNO, S. FUKUHARA, and T. HONJO. 1988. Dispersed localization of D segments in the human immunoglobulin heavy-chain locus. EMBO J. 7:1047-1051.
MATSUDA, F., E. K. SHIN, H. NAGAOKA, R. MATSUMURA, M. HAINO, Y. FUKITA, S. TAKA-ISHI, T. IMAI, J. H. RILEY, R. ANAND, E. SOEDA, and T. HONJO. 1993. Structure and physical map of 64 variable segments in the 3' 0.8 megabase region of the human immunoglobulin heavy-chain locus. Nature Genet. 3:88-94.
MATSUNAGA, T., T. CHEN, and V. TÖRMÄNEN. 1990. Characterization of a complete immunoglobulin heavy-chain variable region germ-line gene of rainbow trout. Proc. Natl. Acad. Sci. USA 87:7767-7771.
MATSUO, Y., and T. YAMAZAKI. 1989. Nucleotide variation and divergence in the histone multigene family in Drosophila melanogaster. Genetics 122:87-97.
MATTHYSSENS, G., and T. H. RABBITTS. 1980. Structure and multiplicity of genes for the human immunoglobulin heavy chain variable region. Proc. Natl. Acad. Sci. USA 77:6561-6565.
MIYATA, T., and T. YASUNAGA. 1981. Rapidly evolving mouse α-globin-related pseudo gene and its evolutionary history. Proc. Natl. Acad. Sci. USA 78:450-453.
NEI, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.
NEI, M., and A. L. HUGHES. 1991. Polymorphism and evolution of the major histocompatibility complex loci in mammals. Pp. 222-247 in R. SELANDER, A. CLARK, and T. WHITTAM, eds. Evolution at the molecular level. Sinauer, Sunderland, Mass.
NEI, M., and A. L. HUGHES. 1992. Balanced polymorphism and evolution by the birth-and-death process in the MHC loci. Pp. 27-38 in K. TSUJI, M. AIZAWA, and T. SASAZUKI, eds. HLA 1991. Proceedings of the 11th Histocompatibility Workshop and Conference. Vol. 2. Oxford University Press, Oxford.
OHTA, T. 1980. Evolution and variation of multigene families. Springer, Berlin.
OHTA, T. 1983. On the evolution of multigene families. Theor. Popul. Biol. 23:216-240.
OHTA, T. 1992. A statistical examination of hypervariability in complementarity determining regions of immunoglobulins. Mol. Phylogenet. Evol. 1:305-311.
OLLO, R., C. AUFFRAY, J.-L. SIKORAV, and F. ROUGEON. 1981. Mouse heavy chain variable regions: nucleotide sequence of a germ-line VH gene segment. Nucleic Acids Res. 9:4099-4109.
OLLO, R., J.-L. SIKORAV, and F. ROUGEON. 1983. Structural relationships among mouse and human immunoglobulin VH genes in the subgroup III. Nucleic Acids Res. 11:7887-7897.
RATHBUN, G., J. BERMAN, G. YANCOPOULOS, and F. W. ALT. 1989. Organization and expression of the mammalian heavy-chain variable region locus. Pp. 63-90 in T. HONJO, F. W. ALT, and T. H. RABBITTS, eds. Immunoglobulin genes. Academic Press, London.
RATHBUN, G. A., F. OTANI, E. C. B. MILNER, J. D. CAPRA, and P. W. TUCKER. 1988. Molecular characterization of the A/J J558 family of heavy chain variable region gene segments. J. Mol. Biol. 202:383-395.
REYNAUD, C.-A., V. ANQUEZ, H. GRIMAL, and J.-C. WEILL. 1987. A hyperconversion mechanism generates the chicken light chain preimmune repertoire. Cell 48:379-388.
REYNAUD, C.-A., A. DAHAN, V. ANQUEZ, and J.-C. WEILL. 1989. Somatic hyperconversion diversifies the single VH gene of the chicken with a high incidence in the D region. Cell 59:171-183.
RZHETSKY, A., and M. NEI. 1992. A simple method for estimating and testing minimum-evolution trees. Mol. Biol. Evol. 9:945-967.
RZHETSKY, A., and M. NEI. 1993. Theoretical foundation of the minimum-evolution method of phylogenetic inference. Mol. Biol. Evol. 10:1073-1095.
SAKANO, H., R. MAKI, Y. KUROSAWA, W. ROEDER, and S. TONEGAWA. 1980. Two types of somatic recombination are necessary for the generation of complete immunoglobulin heavy-chain genes. Nature 286:676-683.
SCHIFF, C., M. MILILI, and M. FOUGEREAU. 1985. Functional and pseudogenes are similarly organized and may equally contribute to the extensive antibody diversity of the IgVHII family. EMBO J. 4:1225-1230.
SCHIFF, C., M. MILILI, I. HUE, S. RUDIKOFF, and M. FOUGEREAU. 1986. Genetic basis for expression of the idiotypic network. J. Exp. Med. 163:573-587.
SCHROEDER, H. W. JR., J. L. HILLSON, and R. M. PERLMUTTER. 1990. Structure and evolution of mammalian VH families. Int. Immunol. 2:41-50.
SCHROEDER, H. W. JR., M. A. WALTER, M. H. HOFKER, A. EBENS, K. W. VAN DIJK, L. C. LIAO, D. W. COX, E. C. B. MILNER, and R. M. PERLMUTTER. 1988. Physical linkage of a human immunoglobulin heavy chain variable region gene segment to diversity and joining region elements. Proc. Natl. Acad. Sci. USA 85:8196-8200.
SCHWAGER, J., N. BURCKERT, M. COURTET, and L. DU PASQUIER. 1989. Genetic basis of the antibody repertoire in Xenopus: analysis of the VH diversity. EMBO J. 8:2989-3001.
SHAMBLOTT, M. J., and G. W. LITMAN. 1989. Genomic organization and sequences of immunoglobulin light chain genes in a primitive vertebrate suggest coevolution of immunoglobulin gene organization. EMBO J. 8:3733-3739.
SHIN, E. K., F. MATSUDA, H. NAGAOKA, Y. FUKITA, T. IMAI, K. YOKOYAMA, E. SOEDA, and T. HONJO. 1991. Physical map of the 3' region of the human immunoglobulin heavy chain locus: clustering of autoantibody-related variable segments in one haplotype. EMBO J. 10:3641-3645.
SIU, G., E. A. SPRINGER, H. V. HUANG, L. E. HOOD, and S. T. CREWS. 1987. Structure of the T15 VH gene subfamily: identification of immunoglobulin gene promoter homologies. J. Immunol. 138:4466-4471.
SLIGHTOM, J. L., A. E. BLECHL, and O. SMITHIES. 1980. Human fetal Gγ- and Aγ-globin genes: complete nucleotide sequences suggest that DNA can be exchanged between these duplicated genes. Cell 21:627-638.
SMITH, G. P. 1974. Unequal crossover and the evolution of multigene families. Cold Spring Harb. Symp. Quant. Biol. 38:507-513.
SMITH, G. P., L. HOOD, and W. M. FITCH. 1971. Antibody diversity. Annu. Rev. Biochem. 40:969-1012.
STONEKING, M. 1993. DNA and recent human evolution. Evol. Anthropol. 2:60-73.
TAKAHATA, N., and M. NEI. 1990. Allelic genealogy under overdominant and frequency-dependent selection and polymorphism of major histocompatibility complex loci. Genetics 124:967-978.
TANAKA, T., and M. NEI. 1989. Positive Darwinian selection observed at the variable-region genes of immunoglobulins. Mol. Biol. Evol. 6:447-459.
THOMPSON, C. B., and P. E. NEIMAN. 1987. Somatic diversification of the chicken immunoglobulin light chain gene is limited to the rearranged variable gene segment. Cell 48:369-378.
TONEGAWA, S. 1983. Somatic generation of antibody diversity. Nature 302:575-581.
TUTTER, A., and R. RIBLET. 1989a. Conservation of an immunoglobulin variable-region gene family indicates a specific, noncoding function. Proc. Natl. Acad. Sci. USA 86:7460-7464.
TUTTER, A., and R. RIBLET. 1989b. Evolution of the immunoglobulin heavy chain variable region (Igh-V) locus in the genus Mus. Immunogenetics 30:315-329.
VAZQUEZ, M., N. MIZUKI, M. F. FLAJNIK, E. C. MCKINNEY, and M. KASAHARA. 1992. Nucleotide sequence of a nurse shark immunoglobulin heavy chain cDNA clone. Mol. Immunol. 29:1157-1158.
WARR, G. W., D. L. MIDDLETON, N. W. MILLER, L. W. CLEM, and M. R. WILSON. 1991. An additional family of VH sequences in the channel catfish. Eur. J. Immunogen. 18:393-397.
WILSON, M. R., D. MIDDLETON, and G. W. WARR. 1991. Immunoglobulin VH genes of the goldfish Carassius auratus: a re-examination. Mol. Immunol. 28:449-457.
YAMASAKI-KATAOKA, Y., and T. HONJO. 1987. Nucleotide sequences of variable region segments of the immunoglobulin heavy chain of Xenopus laevis. Nucleic Acids Res. 15:5888.
ZIMMER, E. A., S. L. MARTIN, S. M. BEVERLEY, Y. W. KAN, and A. C. WILSON. 1980. Rapid duplication and loss of genes coding for the α chains of hemoglobin. Proc. Natl. Acad. Sci. USA 77:2158-2162.

JAN KLEIN, reviewing editor
Received November 5, 1993
Accepted January 19, 1994