* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Evolution of Immunoglobulin Kappa Chain Variable Region
Gene therapy wikipedia , lookup
X-inactivation wikipedia , lookup
Human genetic variation wikipedia , lookup
Copy-number variation wikipedia , lookup
Adaptive evolution in the human genome wikipedia , lookup
Gene nomenclature wikipedia , lookup
Point mutation wikipedia , lookup
Oncogenomics wikipedia , lookup
Epigenetics of neurodegenerative diseases wikipedia , lookup
Transposable element wikipedia , lookup
Long non-coding RNA wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Non-coding DNA wikipedia , lookup
Public health genomics wikipedia , lookup
Polycomb Group Proteins and Cancer wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Quantitative trait locus wikipedia , lookup
Human genome wikipedia , lookup
Gene desert wikipedia , lookup
Essential gene wikipedia , lookup
History of genetic engineering wikipedia , lookup
Metagenomics wikipedia , lookup
Gene expression programming wikipedia , lookup
Pathogenomics wikipedia , lookup
Genomic imprinting wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Microevolution wikipedia , lookup
Minimal genome wikipedia , lookup
Biology and consumer behaviour wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Ridge (biology) wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Genome (book) wikipedia , lookup
Genome evolution wikipedia , lookup
Evolution of Immunoglobulin Kappa Chain Variable Region Genes in Vertebrates Tatyana Sitnikova and Masatoshi Nei Institute of Molecular Evolutionary Genetics, and Department of Biology, The Pennsylvania State University The major source of immunoglobulin diversity is variation in DNA sequence among multiple copies of variable region (V) genes of the heavy- and light-chain multigene families. In order to clarify the evolutionary pattern of the multigene family of immunoglobulin light k chain V region (Vk) genes, phylogenetic analyses of Vk genes from humans and other vertebrate species were conducted. The results obtained indicate that the Vk genes so far sequenced can be grouped into three major monophyletic clusters, the cartilaginous fish, bony fish and amphibian, and mammalian clusters, and that the cartilaginous fish cluster first separated from the rest of the Vk genes and then the remaining two clusters diverged. The mammalian Vk genes can further be divided into 10 Vk groups, 7 of which are present in the human genome. Human and mouse Vk genes from different Vk groups are intermingled rather than clustered on the chromosome, and there are a large number of pseudogenes scattered on the chromosome. This indicates that the chromosomal locations of Vk genes have been shuffled many times by gene duplication, deletion, and transposition in the evolutionary process and that many genes have become nonfunctional during this process. This mode of evolution is consistent with the model of birth-and-death evolution rather than with the model of concerted evolution. An analysis of duplicate Vk functional genes and pseudogenes in the human genome has indicated that pseudogenes evolve faster than functional genes but that the rate of nonsynonymous nucleotide substitution in the complementarity-determining regions of Vk genes has been enhanced by positive Darwinian selection. Introduction Immunoglobulin molecules are composed of two identical heavy chains and two identical light chains. There are two different types of light chains: kappa (k) and lambda (l) chains. Both heavy and light chains consist of two domains: variable (V) and constant (C) domains. The V domains, which interact with foreign antigens, can further be divided into the complementaritydetermining regions (CDRs) and the framework regions (FRs) (Kabat et al. 1991). CDRs are antigen-binding sites and are known to be highly variable. The V domains of heavy chains are encoded by variable (VH), diversity (DH), and joining (JH) segment genes, whereas the light-chain V domains are encoded by variable (VL) and joining (JL) segment genes (Tonegawa 1983). Immunoglobulin diversity is generated primarily by sequence variation among many copies of VH and VL genes in the genome, although in some organisms somatic mutation and somatic gene conversion play an important role (Knight and Tunyaplin 1995; Weill and Reynaud 1995). For this reason, a number of authors have studied the diversity and evolution of VH and VL genes (e.g., Hood, Campbell, and Elgin 1975; Ohta 1980; Gojobori and Nei 1984; Kurth, Mountain, and Cavalli-Sforza 1993; Ota and Nei 1994; Rast et al. 1994). There are two main hypotheses for explaining the generation and maintenance of immunoglobulin V gene diversity in vertebrates. One is concerted evolution, in which the genetic diversity at a locus or a set of loci is assumed to be generated by introduction of new variants Key words: Ig Vk genes, multigene families, birth-and-death evolution, concerted evolution. Address for correspondence and reprints: Tatyana Sitnikova, Institute of Molecular Evolutionary Genetics, 208 Mueller Laboratory, The Pennsylvania State University, University Park, Pennsylvania 16802. E-mail: [email protected]. Mol. Biol. Evol. 15(1):50–60. 1998 q 1998 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038 50 from different loci in the same VH or VL multigene family (Gally and Edelman 1972; Hood, Campbell, and Elgin 1975; Ohta 1980, 1983). The other is birth-and-death evolution, in which new genes are generated by repeated gene duplication, and some duplicate genes are maintained in the genome for a long time but others are deleted or become nonfunctional by deleterious mutations (Ota and Nei 1994). Taking advantage of the large amount of DNA sequences generated by recent genome projects, Nei, Gu, and Sitnikova (1997) conducted an extensive phylogenetic analysis of the VH multigene family and reached the conclusion that this gene family is primarily subject to birth-and-death evolution rather than to concerted evolution. The purpose of this paper is to extend this analysis to the kappa variable region (Vk) gene family and examine the pattern of evolution of Vk genes. We will first conduct a phylogenetic analysis of human Vk genes, for which the complete sequence is available for all 76 genes of the family. We will then examine the evolutionary relationships of Vk genes from various organisms to study the general pattern of evolution. Materials and Methods Background Information The entire coding region (2 Mb) of the human immunoglobulin k chain gene family has been sequenced (Schäble and Zachau 1993; Zachau 1995). It contains 76 Vk genes, 5 Jk genes, and 1 Ck gene. Of the Vk genes, 40 are functional, 9 do not have obvious defects but are not expressed, and 27 are pseudogenes (Tomlinson et al. 1996). A remarkable feature of the human k gene family is that a large portion of the Vk region is duplicated, so that 33 of 76 Vk genes are present in two copies, and 10 are present in single copies (Schäble and Zachau 1993). This duplication seems to have occurred quite recently because the homologous pairs of Vk genes have Immunoglobulin Vk Genes 51 FIG. 1.—Genomic structure of the human k chain gene family (adapted from Schäble and Zachau 1993). Open or filled rectangles represent Vk genes, solid lines represent Jk genes, and a hatched rectangle represents the Ck gene. Vk genes from the O, A, and L regions, which are shown by rectangles with different shadings, are present in both the p and d contigs. The Vk genes from the B region, which are shown by filled rectangles, exist only in the p contig. A gene homologous to L8 includes only a leader exon, the rest of the gene being deleted from the d contig, and is designated by L229. Gene L22, which does not have a homolog in the p contig, includes only exon II (see Huber et al. 1993a). The roman numeral below each gene indicates the human Vk family. The transcriptional direction is shown by arrows. Open triangles indicate deletions that occurred after the duplication in the p or d contig. a sequence similarity of 95%–100%, and this duplication has not been found in the chimpanzee, gorilla, or orangutan (Ermert et al. 1995). From information on the overall sequence divergence between the two sets of duplicate genes (;1%), Schäble and Zachau (1993) suggested that the duplication occurred about 1 MYA. The copy of the Vk region adjacent to Jk genes is called the p (proximal) contig, whereas the other copy of this region is called the d (distal) contig (fig. 1). A population genetic study of the Vk region has shown that about 5% of human haplotypes do not have the d contig, and that the absence of the d contig does not seem to affect the fitness of its carrier even when the individual is homozygous for the absence of the d contig (Pargent, Schäble, and Zachau 1991; Schaible et al. 1993). The Vk region is often subdivided into four smaller regions, called the O, A, L, and B regions, which correspond to the DNA blocks used for nucleotide sequencing (Zachau 1995). These four regions are present in the p contig, whereas the d contig contains regions O and A and a large part of region L (fig. 1). Vk genes are named according to their region of location on the chromosome (with some exceptions) and are numbered consecutively in each region. Thus, gene O1 is the first gene in region O. Some genes on the boundaries between two regions are named according to the contig where they were first found (e.g., A30 from region L; see Huber et al. 1993a). The human Vk genes are subdivided into seven families according to sequence similarity (Schäble and Zachau 1993). (To be consistent with the classification for VH genes, Schäble and Zachau’s ‘‘subgroups’’ are called ‘‘families’’ in this paper.) The Vk genes that are similar to one another and belong to the same family are not necessarily located contiguously on the chromosome. Rather, they are intermingled with Vk genes from different families. Therefore, evolution by gene duplication followed by transposition has been suggested to explain the arrangement of Vk genes (Pech and Zachau 1984). Sequences Used for Analysis Human Vk Gene Tree To examine the evolutionary relationships of human Vk genes, we used germline sequences of all Vk genes located on the p contig except for genes A22, A24, A28, O13, and O15, which contain large deletions. Duplicate genes from the d contig were not used in the analysis, because the homologous genes in the p and d contigs are closely related to each other. However, we included gene A14 from the d contig in the analysis, because its homolog in the p contig has been deleted (see fig. 1). The GenBank accession numbers for all human Vk sequences used are given in Schäble and Zachau (1993). Vertebrate Vk Gene Tree In this analysis, we used several representative Vk genes from each human gene family. We also used 22 mouse Vk sequences, of which 19 represent 19 mouse Vk families according to Strohal et al.’s (1989) and Kofler and Helmberg’s (1991) classification. Two additional sequences from family 4/5 and one sequence from family 23 were included in the analysis, because these mouse gene families are not shared by other species. Our preliminary phylogenetic analysis of all available mouse Vk sequences (kindly provided by Reinhard Kofler) showed that the Vk genes belonging to different families produce separate clusters. The only exception was a sequence from family 22, which clustered with the family 8 genes. We also analyzed all complete nucleotide sequences of functional Vk genes from other mammals that were available as of January 1997. (However, we did not use rat sequences YTH345HL and IR2, because they were closely related to rat sequence NCS. The horse cDNA 52 Sitnikova and Nei sequence k42 was also excluded, because it was very close to the horse germline sequence k1.) Since light L1 (r) chain V genes from amphibians, VL1 genes from bony fishes, and type III VL genes from sharks have been shown to be more closely related to Vk than to Vl genes (Zezza, Stewart, and Steiner 1992; Daggfeldt, Bengten, and Pilström 1993; Rast et al. 1994; Lundqvist et al. 1996; Partula et al. 1996), we included representative sequences of these Vk-like genes in our analysis. We used the germline sequences whenever they were available, but otherwise we used cDNAs. (cDNAs are marked by asterisks.) However, this should not affect our results very much, because somatic mutations are generally concentrated on CDRs rather than on FRs (see Gojobori and Nei 1986; Reynaud et al. 1995; Klein and Zachau 1995 for studies on mouse, sheep, and human variable-region genes, respectively), and we used FRs in the present study. The Vk and Vk-like sequences used in this study (except for human sequences) are as follows. The sequence names together with their GenBank accession numbers are given after species names and are separated by semicolons. Mouse (Mus musculus): 12/13, MUSIGKVA; 11*, MUSIGKBE; RF*, MMIGRF2L; 38c*, J04577; 9A, MUSIGKVC; 9B, MMIG69; 10, MMIGVKID; 32*, MUSIGKCNE; 33/34*, MUSIGKABH; 1, MUSIGKVOA; 2*, MUSIGKCLM; 24, MUSIGKVH2; 8*, MUSIGKCSF; 22*, Kwan et al. 1981, no GenBank entry found; 28, MUSIGKCAM2; 20*, MUSIGKC85; 21, MUSIGKVR1; 4/5a, 4/5b, and 4/5c (Vk 17, 18, and 28 from R. Kofler’s database, respectively); 23a, MMIG71; 23b* (Vk 14 from R. Kofler’s database) Rat (Rattus norvegicus): 56R-3*, RATIGKVCL; 56R7*, RATIGKVDL; 57R-1*, RATIGKVEL; IR162*, RATIGKAC; NCS*, RATIGKNCSL; NGF*, RATIGNGFLV; Y3-Ad1.2.3*, RNIGKY3; 53R-1*, RATIGKVAL Rat rat (Rattus rattus): CD25*, RATIGCD25L Rabbit (Oryctolagus cuniculus): 18a, OCIG06; 18b, OCIG07; 19a, OCIG08; 19b, OCIG09; b95r*, OCVKVR; b95g, OCVKCDR3; b4*, RABIGKAA; b5*, OCIG03; k20, RABIGKVA; b9*, RABIGKAC; bas*, RABIGKAD Chimpanzee (Pan troglodytes): B3, PTIGK1VR Gorilla (Gorilla gorilla): B3, GGIGKVR Orangutan (Pongo pygmaeus): B3, PPIGICVR Macaque (Macaca mulatta): B3, MMIGKVREG Sheep (Ovis aries): SK*, OAIGKLC Horse (Equus caballus): k1, ECVK1 Frog (Xenopus laevis) r chain: r-V1, XELIGLVA; rV2, XELIGLVB; r-V3, XELIGLVC Trout (Oncorhynchus mykiss): c10*, OMIGLAG; c3*, OMIGLAI Channel catfish (Ictalurus punctatus): G*, ICTIGLSTD; F*, IPU25705 Sturgeon (Acipencer baeri): 202*, ABIGLC202; 309*, ABIGLC309 Shark (Heterodontus francisci): IIIa*, HEFIGCVE; IIIb*, HEFIGCVF In trees for both human and vertebrate sequences, the human germline sequences for the l chain variable region genes 3a.119B (3a), 4a.366F5 (4a), and 8a.88E1 (8a) (Tomlinson et al. 1996) were used as outgroups. In this study, we used only the FRs of Vk sequences (Kabat et al. 1991), because the CDRs are highly variable and difficult to align. All three nucleotide positions of codons were used in the study of evolutionary relationships of human Vk sequences and the relationships of rabbit sequences with other mammalian sequences. In the analysis of vertebrate Vk sequences, however, only first and second codon positions were used, since nucleotide changes at third codon positions seem to have saturated when distantly related sequences were compared. All sequences were aligned by using the SEQED computer program (Zharkikh et al. 1991). Phylogenetic Study Phylogenetic trees were constructed by the neighbor-joining method (NJ; Saitou and Nei 1987). Evolutionary distances between sequences were computed by using Kimura’s (1980) two-parameter model. The reliability of tree topologies was evaluated by two different tests: the bootstrap test (Felsenstein 1985) and the interior-branch test (Rzhetsky and Nei 1992; Sitnikova, Rzhetsky, and Nei 1995; Sitnikova 1996). The bootstrap probability (BP) and the confidence probability (CP) values were computed for each interior branch of the tree, and the interior branches with BP $ 95% and CP $ 95% were considered statistically significant. The computer programs MEGA (Kumar, Tamura, and Nei 1993), METREE (Rzhetsky and Nei 1994), TreePack (I. Belyi, personal communication), and TREEVIEW (K. Tamura, personal communication) were used in this study. Results Phylogenetic Analysis of Human Vk Genes The phylogenetic tree for human Vk genes is presented in figure 2. There are four major clusters (I, II, III, and VI) of Vk’s, which are statistically supported, and three singleton sequences (IV, V, and VII), which do not form any significant cluster. These seven groups of sequences exactly correspond to the human Vk families defined by Schäble and Zachau (1993) according to sequence similarity. The evolutionary relationships of the seven Vk families are not well resolved, since none of the branches connecting the families is statistically significant. This observation is different from that for VH genes, for which seven human VH families can be clearly divided into three groups (Schroeder, Hillson, and Perlmutter 1990; Kabat et al. 1991; Honjo and Matsuda 1995; Nei, Gu, and Sitnikova 1997). Within the families I and II, the human Vk genes located in region L on the chromosome (see fig. 1) tend to form their own clusters in the tree (fig. 2). This is in agreement with the results of sequence similarity analysis reported by Schäble and Zachau (1993). However, Immunoglobulin Vk Genes 53 FIG. 2.—Phylogenetic tree of 36 human Vk sequences. The sequences used are named according to Schäble and Zachau (1993). The symbols C and n in brackets each indicate a pseudogene and a gene which has an open reading frame but is not expressed (Tomlinson et al. 1996). Seven human families are indicated by roman numerals. Three human Vl sequences were used as outgroups. Although the standard length of the FRs is 210 nucleotides, the number of nucleotides used was 201, because we ignored all sites containing deletions in some Vk pseudogenes. Only boostrap probability (BP) . 50% and confidence probability (CP) . 50% are shown above and below each interior branch, respectively. CP is usually higher than BP, but it is not shown for interior branches which are unimportant in our discussion. The BP and CP values that show significant statistical support for the clusters of human Vk sequences from the same human family are shown in bold letters. unlike their results, no other closely located genes form a cluster. The human Vk genes include 27 pseudogenes, i.e., genes that contain stop codons, frameshift mutations, or incomplete exons. These pseudogenes generally show a longer branch compared with the branch leading to a functional gene. The higher rate of nucleotide substitution in pseudogenes is apparently due to relaxation of purifying selection (see also Ota and Nei 1994). Interestingly, the branches leading to the family III pseudogenes are somewhat longer than those leading to the pseudogenes of other families, possibly due to their relatively ancient loss of functionality. Synonymous and Nonsynonymous Sequence Divergence for Duplicate Genes A unique feature of the human k gene family is the recent block gene duplication that encompasses most of the original Vk gene region (Schäble and Zachau 1993). We took advantage of this recent simultaneous duplication of many functional genes and pseudogenes to study the mechanisms of evolution of FRs and CDRs and the relative substitution rates of functional genes and pseudogenes. The number of functional gene pairs used was 12, because we used only duplicates that have been shown to be expressed (Tomlinson et al. 1996). However, in this study, we did not use the gene pairs A30/L14, L1/L15, and L9/L24, because they did not form a cluster in phylogenetic analysis when the whole sequence or a portion of the Vk gene was used for the analysis, and thus they might have been affected by gene conversion (see also Huber et al. 1993b). Eight pseudogene pairs were used in which both members did not contain large insertions or deletions. 54 Sitnikova and Nei Table 1 Numbers of Nucleotide Differences per Site (3103) for Duplicate Human Vk Functional Genes and Pseudogenes, with the Actual Numbers of Substitutions for the Entire Concatenated Sequence Given in Parentheses Number of Gene Pairs Functional genes, pN . . . . . . Functional genes, ps . . . . . . . Pseudogenes, p . . . . . . . . . . . . 12 12 8 FRsa CDRs 2.1 6 1.1 (4)b 7.8 6 3.5 (5) 8.9 6 2.3 (15) 10.6 6 3.8 (8)c 0.0 (0) 4.6 6 2.7 (3) Total 4.6 6 1.3 (12) 5.9 6 2.6 (5) 7.7 6 1.8 (18) a The framework regions (FRs) of a V gene included 70 codons, whereas the length of complementarity-determining k regions (CDRs) varied with gene from 25 to 31 codons because of the length variation in CDR1. The average number of nucleotides used for a pseudogene was 289. b p in the FRs is significantly lower than p for pseudogenes at the 1% level according to Fisher’s exact test. N c p in CDRs is significantly higher than p in FRs at the 1% level. N N We concatenated the nucleotide sequences of the functional genes for the p and d contigs separately and computed the number of synonymous differences per synonymous site (pS) and the number of nonsynonymous differences per nonsynonymous site (pN) by using Nei and Gojobori’s (1986) method. This method does not take into account different rates of transitional and transversional substitutions but seemed to be appropriate in this case, because the transition/transversion ratio was low. Similarly, all pairs of the pseudogenes from the p and d contigs were concatenated separately, and the overall nucleotide difference (p) for duplicate pseudogenes was computed. To test the null hypothesis that synonymous and nonsynonymous substitutions in functional genes occur with the same rate, we used Fisher’s exact test, because we could count all synonymous and nonsynonymous substitutions unambiguously. Furthermore, the total number of substitutions was so small that we could not use large-sample tests such as the likelihood ratio or chi-square test (see Sokal and Rohlf 1995, p. 702; Zhang, Kumar, and Nei 1997). We used a 2 3 2 contingency table for the numbers of synonymous and nonsynonymous sites with and without substitutions. Fisher’s exact test was also applied to compare the numbers of nucleotide substitutions in functional genes and pseudogenes. The results of this study are presented in table 1. In FRs, pN is lower than pS, whereas in CDRs, pN exceeds pS, although not significantly. These results support Tanaka and Nei’s (1989) conclusion that purifying selection operates in FRs, whereas positive selection enhances the diversity of CDRs. Furthermore, in the comparison of FRs and CDRs, pN is significantly higher in CDRs than in FRs, whereas the difference in pS’s between these two regions is not statistically significant. This is again in agreement with Tanaka and Nei’s (1989) results concerning the difference in functional constraints in FRs and CDRs. When both CDRs and FRs are included, the extent of sequence divergence (p) for duplicate pseudogenes is higher than the pN or pS value for duplicate functional genes (table 1). This suggests that pseudogenes generally evolve faster than functional genes, as in the case of nonimmunological genes. In contrast, pN in CDRs is considerably higher than the pseudogene p value, again supporting the notion that CDRs are subject to positive Darwinian selection. Note that pS in CDRs appears to be unusually low compared to pS in FRs and p in pseudogenes. However, this difference is not statistically significant and can be accounted for by sampling errors. In general, our estimates of the nucleotide diversity between genes of the p and d contigs (table 1) are slightly lower than those of Schäble and Zachau (1993) (;1%). This is because, unlike Schäble and Zachau (1993), we analyzed coding regions only and excluded all the pseudogenes with large deletions and insertions and the genes that might have undergone gene conversion. We used synonymous nucleotide substitutions to estimate the time of duplication in the human Vk region. We estimated the rate of synonymous substitutions using the Vk human gene B3 and its homolog from chimpanzee assuming that humans and chimpanzees diverged about 5 MYA. (These genes are likely to be orthologous, because they are very closely related, and B3 is a single gene belonging to the human family IV [Ermert et al. 1995].) Thus, the estimate of the time of duplication in the human Vk region becomes 1–2 MYA, depending on the region used for estimation, i.e., FRs only or both FRs and CDRs. This estimate is still consistent with Schäble and Zachau’s (1993) estimate (1 MYA). Phylogenetic Analysis of Vk Sequences from Vertebrate Species In order to examine the relationships of human Vk genes with those from other species, we conducted a phylogenetic analysis of Vk and Vk-like genes from various vertebrate species. The results obtained (fig. 3) show that the mammalian Vk sequences form a cluster separate from the Vk genes from sharks, bony fishes, and amphibians (fig. 3). The first split of Vk sequences obviously occurred between shark sequences and others. The cluster of shark sequences is statistically supported, and their divergence from other sequences is likely to have occurred at the time or before cartilaginous fishes and other vertebrates diverged. The second split within the rest of the sequences separates the mammalian Vk sequences from the sequences of bony fishes and amphibians. The cluster of the amphibian and bony fish sequences is not statistically supported, suggesting that these sequences may not represent a monophyletic cluster. Immunoglobulin Vk Genes 55 FIG. 3.—Phylogenetic tree of 64 Vk sequences from vertebrate species (see Materials and Methods). Ten mammalian Vk groups are indicated by letters A–I. First and second codon positions were used, because divergent sequences are included. The number of codons used was 69 because of the lack of one codon in the Vl genes compared to the Vk genes. The bootstrap probability and confidence probability values showing significant statistical support for the clusters of mammalian sequences from the same Vk group are shown in bold letters. The cluster of mammalian Vk sequences is also not very reliable, because the bootstrap value was low (,50%). However, the mammalian sequences were monophyletic in the phylogenetic analysis of amino acid sequences for the same genes (data not shown) and in the two previous phylogenetic studies (with the exception of the rabbit sequences discussed below) (Lundqvist et al. 1996; Partula et al. 1996). In contrast, in the phylogenetic studies conducted by Daggfeldt, Bengten, and Pilström (1993), Rast et al. (1994), and Haire et al. (1996), Vk sequences from mammals were intermingled with Vk-like genes from other vertebrate species, possibly due to the error in the tree reconstruction caused by a poor representation of Vk genes. The mammalian Vk sequences can be classified into 10 Vk groups based on the clusters of the tree (fig. 3). We call a cluster of Vk sequences a Vk group if the cluster is statistically supported or nearly supported by at least one of the two statistical tests and remains stable when all available mouse sequences are included in the analysis as well as when amino acid rather than nucleotide sequences are used in the analysis (data not shown). The group Db is not statistically supported but remains stable when other sequences are included. Groups Da and Db were classified as separate groups, because they may represent different clusters when more sequences are added. The mammalian Vk groups defined in our analysis are generally in agreement with those obtained by Krömer et al. (1991) from the phylogenetic analysis of human and mouse Vk sequences. However, they classified human Vk III (group C in our tree), human Vk A10/14 (group F), and mouse Vk 23 (group H) 56 Sitnikova and Nei FIG. 4.—Phylogenetic tree for 48 mammalian Vk sequences. Thirty sequences, including rabbit sequences, belong to Vk group A, whereas nine pairs of other sequences represent nine Vk groups (B–I). into one family and human Vk IV (group Da) and mouse Vk 8, Vk 19/28, and Vk 22 (group Db) into another (see fig. 3 in Krömer et al. 1991). Most mammalian Vk groups include sequences from humans as well as from other mammalian species. Interestingly, all three single-gene human families (IV, V, and VII) have their counterparts in other mammals (groups Da, E, and G). However, two Vk groups (C and F) consist of only human sequences, and three groups (Db, H, and I) are not shared by humans and contain exclusively mouse sequences. As in the case of human Vk families, the evolutionary relationships of mammalian Vk groups are not statistically resolved. Representation of Mammalian Vk Groups in Different Species The tree for the Vk sequences from vertebrate species (fig. 3) shows variation in the representation of Vk groups in various species. The human, mouse, and rat genomes contain representatives of many Vk groups, whereas the sheep and horse sequences are restricted to one group, although it is not clear how many Vk groups actually exist in the latter organisms, because no extensive survey of Vk genes has been made (see Ford, Home, and Gibson 1994). Another species which is interesting in this aspect is the rabbit. In phylogenetic trees constructed from amino acid Vk sequences of various mammalian species, the rabbit sequences often cluster separately from other mammalian sequences (e.g., Hohman, Schluter, and Marchalonis 1992; Lundqvist et al. 1996; Partula et al. 1996). Because it seemed unlikely that the rabbit retained genes from a Vk group that was lost from the genomes of all other mammals, we examined the phylogenetic relationships of all available rabbit Vk sequences together with some representatives of mammalian Vk groups (fig. 4). The results show that rabbit Vk sequences form a tight cluster that is closely related to Vk group A. Since the definition of mammalian Vk groups was based on the statistically supported clusters in the tree (fig. 3), and the evolutionary relationships of these groups were not well established, we can assign the rabbit sequences to Vk group A. However, the interior branch leading to the rabbit cluster is very long, indicating that the rabbit sequences are quite divergent from other group A sequences. Such an extensive divergence is possibly related to an unusual configuration of the majority of rabbit immunoglobulins, where a cysteine bridge exists between variable and constant regions of light k chains (Mage 1987). From this analysis, we can see that mammalian group A genes are shared by all species so far studied, with the possible exception of the horse, for which the only sequence available at this moment belongs to group Da. In addition, group A genes are most numerous in both humans and mice; 23 of the 76 Vk genes in humans Immunoglobulin Vk Genes (Schäble and Zachau 1993) and about 46 of the 140 Vk genes estimated in mice (Kirschbaum, Jaenichen, and Zachau 1996) belong to group A. In this respect, the group A Vk genes are equivalent to the group C VH genes, which exist in the genomes of many mammals, reptiles, birds, amphibians, and bony fishes (Tutter and Riblet 1989; Ota and Nei 1994; Nei, Gu, and Sitnikova 1997). Discussion Long-Term Evolution of Vk Genes As mentioned earlier, two models have been invoked to describe the evolution of immunoglobulin multigene families: birth-and-death evolution and concerted evolution. In the first model, the birth of genes occurs by gene duplication with subsequent divergent evolution and the death of genes is caused by either gene deletion or loss of functionality (Nei, Gu, and Sitnikova 1997). This model predicts the generation of a diverse repertoire of genes in the gene family, including nonfunctional genes. The second model assumes frequent occurrence of gene conversion and recombination between alleles of the same locus as well as between those of different loci (Ohta 1980, 1983). This mode of evolution therefore has a tendency to homogenize the entire multigene family. Our phylogenetic analysis of immunoglobulin light-chain Vk genes from the human and other vertebrate species indicates that the long-term evolution of Vk genes is consistent with the model of birth-and-death evolution rather than with the model of concerted evolution. First, there is a high level of Vk gene diversity within the mammalian genome. We showed that Vk genes from mammalian species form 10 different groups that apparently emerged before the mammalian radiation. Second, a substantial number of genes are nonfunctional. For example, 27 (about 36%) of the 76 Vk genes in the human are pseudogenes (Tomlinson et al. 1996). Although conversion-like events may occur occasionally among Vk genes, they do not seem to affect the long-term evolution of the genes very much. Third, a block gene duplication involving a large number of genes may occur, and some of the duplicate genes may then be deleted, as in the case of human Vk genes. Probably, the DNA region involved in a duplication event is generally smaller than that observed in the human Vk region, but gene duplication and deletion seem to have occurred repeatedly, because different Vk gene families are intermingled with respect to chromosomal location. This strongly supports the idea of birth-and-death evolution. The birth-and-death process plays an important role in the evolution of other immunoglobulin gene families as well (Ota and Nei 1994; Nei, Gu, and Sitnikova 1997). The immunoglobulin heavy-chain VH genes from species of tetrapods form three VH gene groups that have persisted in the genome for more than 400 Myr (Tutter and Riblet 1989; Schroeder, Hillson, and Perlmutter 1990; Ota and Nei 1994). The immunoglobulin l chain V (Vl) gene family in vertebrates comprises three major 57 groups, one group being more closely related to Vk genes than to the other two groups, suggesting that Vl genes are paraphyletic in relation to Vk genes (Hayzer 1990; Zezza, Stewart, and Steiner 1992; Rast et al. 1994; Haire et al. 1996). The divergence of light-chain V genes apparently started before the emergence of cartilaginous fishes (i.e., .470 MYA), since the shark genomes contain VL genes closely related to mammalian Vk genes (type III VL genes) and two other VL groups that cannot be classified as either Vl’s or Vk’s (Rast et al. 1994; Haire et al. 1996). The presence of divergent gene groups in the mammalian genome for such a long period of time cannot be explained by a random birth-and-death process. It seems that some sort of selection is operating to maintain the diverse gene groups in the genome. In recent years, it has been shown that certain antigens called superantigens are preferentially bound by the antibodies encoded by VH genes belonging to a specific human VH family (Silverman 1995; Zouali 1995). The interaction of antibodies with superantigens involves the third framework region (FR3) as well as CDRs. FR3 has been shown to be conserved among the genes belonging to the same VH family (Kirkham et al. 1992). No such antigen specificity has been reported for light chains, but different groups of VL genes are likely to be adapted to cope with different groups of antigens. Chromosomal Location of Human Vk Genes The human Vk genes belonging to the same family are not necessarily located on the chromosome contiguously. A similar observation was made for the human VH and Vl genes (Honjo and Matsuda 1995; Williams et al. 1996). This suggests that both VH and VL genes undergo frequent gene reshuffling. By contrast, the mouse VH and Vk families seem to be located in a more clustered manner, although details of the gene map in mice remain unclear (Honjo and Matsuda 1995; Kirschbaum, Jaenichen, and Zachau 1996). Human Vk genes B3, B2, and B1 are unique in several aspects. They represent single-gene families IV, V, and VII, respectively, and are located next to the Jk gene region. Genes B2 and B3 have a transcriptional direction opposite that of all other Vk genes from the p contig and Jk and Ck genes. Genes homologous to these genes are found in the great apes, where families V and VII are represented by a single gene (Ermert et al. 1995). The counterparts of these families have also been found in the mouse (see fig. 3). Interestingly, the member genes of mouse family 21, which is homologous to human family VII, are all clustered on the chromosome and are located proximal to the Jk region (Kirschbaum, Jaenichen, and Zachau 1996). Similarly, the member genes of the mouse families 8, 22, 19/28, which are closely related to human family IV, are clustered on the chromosome and are located second next (after the family 21) to the Jk region. Therefore, the genes located near the Jk region seem to have been conserved for a long time, suggesting that they are vital for immune response. Similarly, human VH gene 6–1, the only representative of VH family 6, is the most proximal to the DH region. 58 Sitnikova and Nei This gene is quite conserved and is the only member of VH family 6 in great apes and New World monkeys (Meek, Eversole, and Capra 1991). Diversity of VH and VL Genes in Different Species The level of diversity among V genes varies with species, and it is usually related to the mechanism of generating antibody diversity. In the human and mouse, all VH, Vk, and Vl genes can be divided into a number of families that emerged long before the divergence of mammals, and the involvement of various V genes in somatic DNA rearrangement is essential for the production of diverse antibodies. By contrast, V gene families in some species are quite homogeneous, and somatic gene conversion and somatic hypermutation are employed to generate antibody diversity. In chickens, both VH and VL (Vl) gene families include one functional gene and a large number of pseudogenes, and all of these genes belong to a single family of heavy- and a single family of light-chain V genes (Reynaud et al. 1987, 1989; Ota and Nei 1995). (There are no Vk genes in chickens.) Thus, the diversity of expressed antibodies is generated by somatic gene conversion of the functional gene by pseudogenes. A similar situation exists for rabbit VH genes, for which one VH gene is preferentially expressed and undergoes sequence diversification by somatic gene conversion using many other donor genes, all of which belong to a single family (Knight and Tunyaplin 1995). The rabbit k chain genes have not been studied as extensively as the heavychain genes, but our phylogenetic analysis shows that all published rabbit Vk sequences are closely related to one another and form one cluster (see also Mage 1987). This suggests that, as in the case of rabbit heavy-chain genes, there is a mechanism for somatic diversification of rabbit Vk genes. Similarly, cattle, sheep, and pigs are known to have a restricted amount of VH diversity (Sun et al. 1994; Sinclair and Aitken 1995; Dufour, Malinge, and Nau 1996). Although Vk sequences for these organisms are not well studied, it is possible that they also have a restricted amount of Vk gene diversity. Further studies on this point will help us to understand the mechanism of evolution of immunoglobulin genes. Acknowledgments We thank Reinhard Kofler for sending us a database of mouse Vk sequences and Igor Belyi and Koichiro Tamura for making their computer programs available for phylogenetic analysis. We would like to thank Andrey Rzhetsky, Yasuo Ina, George Zhang, and Chen Su for their comments. This study was supported by grants from NIH and NSF to M.N. LITERATURE CITED DAGGFELDT, A., E. BENGTEN, and L. PILSTRÖM. 1993. A cluster type organization of the loci of the immunoglobulin light chain in Atlantic cod (Gadus morhua) and rainbow trout (Oncorhynchus mykiss Walbaum) indicated by nucleotide sequences of cDNAs and hybridization analysis. Immunogenetics 38:199–209. DUFOUR, V., S. MALINGE, and F. NAU. 1996. The sheep Ig variable region repertoire consists of a single VH family. J. Immunol. 156:2163–2170. ERMERT, K., H. MITLÖHNER, W. SCHEMPP, and H. G. ZACHAU. 1995. The immunoglobulin k locus of primates. Genomics 25:623–629. FELSENSTEIN, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791. FORD, J. E., W. A. HOME, and D. M. GIBSON. 1994. Light chain isotype regulation in horse. Characterization of Ig k genes. J. Immunol. 153:1099–1111. GALLY, J. A., and G. M. EDELMAN. 1972. The genetic control of immunoglobulin synthesis. Annu. Rev. Genet. 6:1–46. GOJOBORI, T., and M. NEI. 1984. Concerted evolution of the immunoglobulin VH gene family. Mol. Biol. Evol. 1:195– 212. . 1986. Relative contributions of germline gene variation and somatic mutation to immunoglobulin diversity in the mouse. Mol. Biol. Evol. 3:156–167. HAIRE, R. N., T. OTA, J. P. RAST, R. T. LITMAN, F. Y. CHAN, L. I. ZON, and G. W. LITMAN. 1996. A third Ig light chain gene isotype in Xenopus laevis consists of six distinct VL families and is related to mammalian l genes. J. Immunol. 157:1544–1550. HAYZER, D. J. 1990. Immunoglobulin lambda light chain evolution: Igl and Igl-like sequences form three major groups. Immunogenetics 32:157–174. HOHMAN, V. S., S. F. SCHLUTER, and J. J. MARCHALONIS. 1992. Complete sequence of a cDNA clone specifying sandbar shark immunoglobulin light chain: gene organization and implications for the evolution of light chains. Proc. Natl. Acad. Sci. USA 89:276–280. HONJO, T., and F. MATSUDA. 1995. Immunoglobulin heavy chain loci of mouse and human. Pp. 145–171 in T. HONJO and F. W. ALT, eds. Immunoglobulin genes. 2nd edition. Academic Press, San Diego. HOOD, L., J. H. CAMPBELL, and S. C. R. ELGIN. 1975. The organization, expression, and evolution of antibody genes and other multigene families. Annu. Rev. Genet. 9:305–353. HUBER, C., E. HUBER, A. LAUTNER-RIESKE, K. F. SCHÄBLE, and H. G. ZACHAU. 1993a. The human immunoglobulin k locus. Characterization of the partially duplicated L regions. Eur. J. Immunol. 23:2860–2867. HUBER, C., K. F. SCHÄBLE, E. HUBER, R. KLEIN, A. MEINDL, R. THIEBE, R. LAMM, and H. G. ZACHAU. 1993b. The Vk genes of the L regions and the repertoire of Vk gene sequences in the human germ line. Eur. J. Immunol. 23:2868– 2875. KABAT, E. A., T. T. WU, H. M. PERRY, K. S. GOTTESMAN, and C. FOELLER. 1991. Sequences of proteins of immunological interest. 5th edition. U.S. Department of Health and Human Services, Government Printing Office, Washington, D.C. KIMURA, M. 1980. A simple method for estimating evolutionary rates of base substitution through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120. KIRKHAM, P. M., F. MORTARI, J. A. NEWTON, and H. W. SCHROEDER JR. 1992. Immunoglobulin VH clan and family identity predicts variable domain structure and may influence antigen binding. EMBO J. 11:603–609. KIRSCHBAUM, T., R. JAENICHEN, and H. G. ZACHAU. 1996. The mouse immunoglobulin k locus contains about 140 variable gene segments. Eur. J. Immunol. 26:1613–1620. KLEIN, R., and H. G. ZACHAU. 1995. Expression and hypermutation of human immunoglobulin k genes. Ann. N.Y. Acad. Sci. 764:74–83. KNIGHT, K. L., and C. TUNYAPLIN. 1995. Immunoglobulin heavy chain genes of rabbit. Pp. 289–314 in T. HONJO and Immunoglobulin Vk Genes F. W. ALT, eds. Immunoglobulin genes. 2nd edition. Academic Press, San Diego. KOFLER, R., and A. HELMBERG. 1991. Comment to the article ‘‘A new Igk-V gene family in the mouse.’’ Immunogenetics 34:139–140. KRÖMER, G., A. HELMBERG, A. BERNOT, C. AUFFRAY, and R. KOFLER. 1991. Evolutionary relationship between human and mouse immunoglobulin kappa light chain variable region genes. Immunogenetics 33:42–49. KUMAR, S., K. TAMURA, and M. NEI. 1993. MEGA: molecular evolutionary genetics analysis. The Pennsylvania State University, University Park. KURTH, J. H., J. L. MOUNTAIN, and L. L. CAVALLI-SFORZA. 1993. Subclustering of human immunoglobulin kappa light chain variable region genes. Genomics 16:69–77. KWAN, S. P., E. E. MAX, D. LEDER, and M. D. SCHARFF. 1981. Two kappa immunoglobulin genes are expressed in the myeloma S107. Cell 26:57–66. LUNDQVIST, M., E. BENGTÉN, S. STRÖMBERG, and L. PILSTRÖM. 1996. Ig light chain gene in the siberian sturgeon (Acipenser baeri). Implications for the evolution of the immune system. J. Immunol. 157:2031–2038. MAGE, R. G. 1987. Molecular biology of rabbit immunoglobulin and T-cell receptor genes. Pp. 106–133 in S. DUBINSKI, ed. The rabbit in contemporary immunological research. Longman Scientific & Technical, London. MEEK, K., T. EVERSOLE, and J. D. CAPRA. 1991. Conservation of the most JH proximal Ig VH gene segment (VHVI) throughout primate evolution. J. Immunol. 146:2434–2438. NEI, M., and T. GOJOBORI. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426. NEI, M., X. GU, and T. SITNIKOVA. 1997. Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc. Natl. Acad. Sci. USA 94:7799– 7806. OHTA, T. 1980. Evolution and variation of multigene families. Springer-Verlag, Berlin. . 1983. On the evolution of multigene families. Theor. Popul. Biol. 23:216–240. OTA, T., and M. NEI. 1994. Divergent evolution and evolution by the birth-and-death process in the immunoglobulin VH gene family. Mol. Biol Evol. 11:469–482. . 1995. Evolution of immunoglobulin VH pseudogenes in chickens. Mol. Biol. Evol. 12:94–102. PARGENT, W., K. F. SCHÄBLE, and H. G. ZACHAU. 1991. Polymorphism and haplotypes in the human immunoglobulin k locus. Eur. J. Immunol. 21:1829–1835. PARTULA, S., J. SCHWAGER, S. TIMMUSK, L. PILSTRÖM, and J. CHARLEMAGNE. 1996. A second immunoglobulin light chain isotype in the rainbow trout. Immunogenetics 45:44– 51. PECH, M., and H. G. ZACHAU. 1984. Immunoglobulin genes of different subgroups are interdigited within the Vk locus. Nucleic Acids Res. 12:9229–9236. RAST, J. P., M. K. ANDERSON, T. OTA, R. T. LITMAN, M. MARGITTAI, M. J. SHAMBLOTT, and G. W. LITMAN. 1994. Immunoglobulin light chain class multiplicity and alternative organizational forms in early vertebrate phylogeny. Immunogenetics 40:83–99. REYNAUD, C.-A., V. ANQUEZ, H. GRIMAL, and J.-C. WEILL. 1987. A hyperconversion mechanism generates the chicken light chain preimmune repertoire. Cell 48:379–388. REYNAUD, C.-A., A. DAHAN, V. ANQUEZ, and J.-C. WEILL. 1989. Somatic hyperconversion diversifies the single VH gene of the chicken with a high incidence in the D region. Cell 59:171–183. 59 REYNAUD, C.-A., C. GARCIA, W. R. HEIN, and J.-C. WEILL. 1995. Hypermutation generating the sheep immunoglobulin repertoire is an antigen-independent process. Cell 80:115– 125. RZHETSKY, A., and M. NEI. 1992. A simple method for estimating and testing minimum-evolution trees. Mol. Biol. Evol. 9:945–967. . 1994. METREE: a program package for inferring and testing minimum-evolution trees. CABIOS 10:409–412. SAITOU, N., and M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425. SCHÄBLE, K. F., and H. G. ZACHAU. 1993. The variable genes of the human immunoglobulin k locus. Biol. Chem. Hoppe Seyler. 374:1001–1022. SCHAIBLE, G., G. A. RAPPOLD, W. PARGENT, and H. G. ZACHAU. 1993. The immunoglobulin k locus: polymorphism and haplotypes of Caucasoid and non-Caucasoid individuals. Hum. Genet. 91:261–267. SCHROEDER, H. W. JR., J. L. HILLSON, and R. M. PERLMUTTER. 1990. Structure and evolution of mammalian VH families. Int. Immunol. 2:41–50. SILVERMAN, G. J. 1995. Unconventional B-cell antigens and human immune repertoire. Ann. N.Y. Acad. Sci. 764:342– 355. SINCLAIR, M. C., and R. AITKEN. 1995. PCR strategies for isolation of the 59 end of an immunoglobulin-encoding bovine cDNA. Gene 167:285–289. SITNIKOVA, T. 1996. Bootstrap method of interior-branch test for phylogenetic trees. Mol. Biol. Evol. 13:605–611. SITNIKOVA, T., A. RZHETSKY, and M. NEI. 1995. Interiorbranch and bootstrap tests of phylogenetic trees. Mol. Biol. Evol. 12:319–333. SOKAL, R. R., and F. J. ROHLF. 1995. Biometry: the principles and practice of statistics in biological research. 3rd edition. W. H. Freeman, New York. STROHAL, R., A. HELMBERG, G. KROEMER, and R. KOFLER. 1989. Mouse Vk gene classification by nucleic acid sequence similarity. Immunogenetics 30:475–493. SUN, J., I. KACSAKOVICS, W. R. BROWN, and J. E. J. BUTLER. 1994. Expressed swine VH genes belong to a small VH gene family homologous to human VHIII. J. Immunol. 153:5618– 5627. TANAKA, T., and M. NEI. 1989. Positive Darwinian selection observed at the variable-region genes of immunoglobulins. Mol. Biol. Evol. 6:447–459. TOMLINSON, I. M., S. C. WILLIAMS, S. J. CORBETT, J. P. L. COX, and G. WINTER. 1996. V BASE: a database of human immunoglobulin variable region genes. MRC Centre for Protein Engineering, Cambridge, U.K. (http://www.mrccpe.cam.ac.uk/). TONEGAWA, S. 1983. Somatic generation of antibody diversity. Nature 302:575–581. TUTTER, A., and R. RIBLET. 1989. Conservation of an immunoglobulin variable-region gene family indicates a specific, noncoding function. Proc. Natl. Acad. Sci. USA 86:7460– 7464. WEILL, J.-C., and C.-A. REYNAUD. 1995. Generation of diversity by post-rearrangement diversification mechanisms: the chicken and the sheep antibody repertoires. Pp. 267–288 in T. HONJO and F. W. ALT, eds. Immunoglobulin genes. 2nd edition. Academic Press, San Diego. WILLIAMS, S. C., J.-P. FRIPPIAT, I. M. TOMLINSON, O. IGNATOVICH, M.-P. LEFRANC, and G. WINTER. 1996. Sequence and evolution of the human germline Vl repertoire. J. Mol. Biol. 264:220–232. 60 Sitnikova and Nei ZACHAU, H. G. 1995. The human immunoglobulin k genes. Pp. 173–191 in T. HONJO and F. W. ALT, eds. Immunoglobulin genes. 2nd edition. Academic Press, San Diego. ZEZZA, D. J., S. E. STEWART, and L. A. STEINER. 1992. Genes encoding Xenopus laevis Ig L chains. Implications for the evolution of k and l chains. J. Immunol. 149:3968–3977. ZHANG, J., S. KUMAR, and M. NEI. 1997. Small-sample tests of episodic adaptive evolution: a case study of primate lysozymes. Mol. Biol. Evol. 14:1335–1337. ZHARKIKH, A. A., A. Y. RZHETSKY, P. S. MOROZOV, T. L. SITNIKOVA, and J. S. KRUSHKAL. 1991. VOSTORG: a package of microcomputer programs for sequence analysis and construction of phylogenetic trees. Gene 101:251–254. ZOUALI, M. 1995. B-cell superantigens: implications for selection of the human antibody repertoire. Immunol. Today 16: 399–405. NARUYA SAITOU, reviewing editor Accepted September 24, 1997