Divergent Evolution and Evolution by the Birth-and-Death Process in the Immunoglobulin VH Gene Family
Tatsuya Ota and Masatoshi Nei
Department of Biology and Institute of Molecular Evolutionary Genetics, The Pennsylvania State University
Immunoglobulin diversity is generated primarily by the heavy- and light-chain variable-region gene families. To understand the pattern of long-term evolution of the heavy-chain variable-region (VH) gene family, which is composed of a large number of member genes, the evolutionary relationships of representative VH genes from diverse organisms of vertebrates were studied by constructing a phylogenetic tree. This tree indicates that the vertebrate VH genes can be classified into group A, B, C, D, and E genes. All VH genes from cartilaginous fishes such as sharks and skates form a monophyletic group and belong to group E, whereas group D consists of bony-fish VH genes. By contrast, group C includes not only some fish genes but also amphibian, reptile, bird, and mammalian genes. Group A and B genes were composed of the genes from mammals and amphibians. The phylogenetic analysis also suggests that mammalian VH genes are classified into three clusters, i.e., mammalian clans I, II, and III, and that these clans have coexisted in the genome for >400 Myr. To study the short-term evolution of VH genes, the phylogenetic analysis of human group A (clan I) and C (clan III) genes was also conducted. The results obtained show that VH pseudogenes have evolved much faster than functional genes and that they have branched off from various functional VH genes. There is little indication that the VH gene family has been subject to concerted evolution that homogenizes member genes. These observations indicate that the VH genes are subject to divergent evolution due to diversifying selection and evolution by the birth-and-death process caused by gene duplication and dysfunctioning mutation. Thus, the evolutionary pattern of this monofunctional multigene family is quite different from that of such gene families as the ribosomal RNA and histone gene families.
Introduction
Monofunctional multigene families, where all member genes have the same function, are generally believed to undergo coincidental or concerted evolution, which homogenizes the DNA sequences of the member genes (Smith et al. 1971; Smith 1974; Zimmer et al. 1980). Good examples of concerted evolution are seen with ribosomal RNA (rRNA) genes (e.g., see Arnheim 1983) and histone genes (e.g., see Matsuo and Yamazaki 1989), where all the member genes have virtually identical coding sequences within species. This is understandable, because these gene families produce a large quantity of gene products of the same function that are essential for the survival of an organism. Concerted evolution is caused either by unequal crossover or intergenic gene conversion.
Key words: birth-and-death process, divergent evolution, immunoglobulin, multigene family, VH genes.
Address for correspondence and reprints: Masatoshi Nei, Institute
of Molecular Evolutionary Genetics, The Pennsylvania State University,
328 Mueller Laboratory, University Park, Pennsylvania 16802.
Mol. Biol. Evol. 11(3):469-482. 1994.
© 1994 by The University of Chicago. All rights reserved.
0737-4038/94/1103-0014$02.00
Concerted evolution was also invoked to explain the evolution of the immunoglobulin heavy-chain variable-region (VH) genes (Hood et al. 1975; Ohta 1980, 1983). Gojobori and Nei (1984), however, showed that the VH gene (often called “gene segment”) family in the mouse and human undergoes a very slow rate of concerted evolution, if any; that the extent of sequence divergence between member genes in the mouse or the human is substantial; and that some of these genes diverged much earlier than the mammalian radiation (~80 Mya) (also see Tutter and Riblet 1989a).
During the past 10 years, the DNA sequences of VH genes have been determined for many lower vertebrate organisms (see Litman et al. 1993). We can therefore study the pattern of evolution of the VH gene family on a much longer evolutionary time scale. This type of study is particularly interesting, because the gene organization of VH genes is now known to vary extensively among different classes of vertebrates (Du Pasquier 1993). In this paper, we show that the pattern of VH gene evolution is very different from the typical concerted evolution and that two evolutionary processes, which may be called “divergent evolution” and “evolution by the birth-and-death process,” predominate in the evolution of VH genes in vertebrates, despite the fact that these genes have the same function. We first study the long-term evolution of VH genes, examining the evolutionary relationships of the genes obtained from various classes of vertebrates, and then a short-term evolution of human VH genes.

[Figure 1: schematic diagrams of heavy-chain gene organization. Labels in the diagram: ancestral form; cartilaginous fishes (horned shark, little skate); higher vertebrates (mammals, chicken).]
FIG. 1.-Genomic organizations of immunoglobulin heavy-chain genes. Cartilaginous fishes have the (VH-DH-DH-JH-CH)n-type organization including germ-line-joined genes, whereas most higher vertebrates, including bony fishes and tetrapods, have the (VH)n-(DH)n-(JH)n-(CH)n-type gene organization. Chicken has a single functional variable gene. V = variable-segment gene; D = diversity-segment gene; J = joining-segment gene; C = constant-segment gene; and ψ = pseudogene. For more detailed information, see the work of Litman et al. (1993).

Material and Methods
Background Information
Immunoglobulin or antibody molecules consist of two heavy chains and two light chains. Both the heavy and light chains are composed of the variable region and the constant region (C). The variable region is responsible for antigen-binding, whereas the constant region is responsible for effector function. The heavy-chain variable region is encoded by the variable-segment (VH), diversity-segment (DH), and joining-segment (JH) genes, whereas the light-chain variable region is encoded by the variable-segment (VL) and joining-segment (JL) genes. However, the major portion of the variable region is determined by either VH or VL. The VH and VL genes are evolutionarily homologous with each other, but their sequence similarity is quite low.
The VH (as well as VL) genes can be subdivided further into two hypervariable or complementarity-determining regions (CDR1 and CDR2) and three framework regions (FW1, FW2, and FW3). (CDR3 is determined primarily by the DH and JH genes, so it was excluded in this paper.) The human and mouse genomes contain >100 VH genes and a few to a dozen DH and JH genes (see fig. 1). Each of the VH’s is recombined with one of the DH’s, one of the JH’s, and one of the CH’s (constant region genes), to form a heavy-chain gene (Tonegawa 1983). This process therefore generates a large number of different immunoglobulins that are necessary for defending an individual from many different foreign antigens. Essentially the same humoral immune system exists for most of the bony fishes and tetrapods.
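As a rough illustration of how segment recombination multiplies the number of possible heavy-chain variable regions, the arithmetic can be sketched as follows. The segment counts are hypothetical round numbers chosen to fit the ranges quoted above (they are not taken from any data set), the function name is ours, and the sketch counts germ-line segment combinations only.

```python
def germline_combinations(n_vh, n_dh, n_jh):
    """Number of distinct VH-DH-JH combinations, taking one segment of each kind."""
    for n in (n_vh, n_dh, n_jh):
        if n < 1:
            raise ValueError("each segment count must be at least 1")
    return n_vh * n_dh * n_jh


# Hypothetical counts for illustration: 100 VH, 12 DH, and 4 JH genes.
print(germline_combinations(100, 12, 4))
```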
In lower vertebrates such as sharks and skates, however, the genomic organization of the immunoglobulin genes is different from that of higher vertebrates, which may be described as (V)n-(D)n-(J)n-(C)n. In lower vertebrates, the basic unit of repeat of genes is (V-D-D-J-C) (fig. 1); that is, the linked gene group VH-DH-DH-JH-CH is repeated several hundred times in the genome, and the repeat seems to be scattered on different chromosomes (Litman et al. 1993). Therefore, the immunoglobulin genes in these organisms may be represented by (V-D-D-J-C)n.
Of course, there are many exceptions to this rule. Some units of shark genes have the VH and DH genes that are already united (VDD-J-C) in the germ-line genome, whereas in some other units all VH, DH’s, and JH are united (VDDJ-C) (Litman et al. 1993). In the little skate (Raja erinacea), (VD-DJ-C) are also present. Furthermore, in the coelacanth (Latimeria chalumnae), which belongs to the lobe-finned fishes, (V-D) is apparently repeated many times in the 5’ side of the J and C genes (Amemiya et al. 1993). Actually, the number of lower vertebrates studied so far is very small, so it is likely that some new types of genomic organization of VH genes will be discovered in the near future.
Another group of vertebrates that has a deviant genomic organization of the VH genes is that of avian species represented by chicken (for review, see McCormack et al. 1991). In these species, there is only one functional VH gene and ~80 VH pseudogenes in the genome (Reynaud et al. 1989), and the antibody diversity is generated by gene conversion of the functional gene by the pseudogenes (fig. 1). (The same process operates for the light-chain genes as well; Reynaud et al. 1987; Thompson and Neiman 1987.) However, this system may not apply for all avian species, because in the Muscovy duck there seem to be at least two functional VL genes (McCormack et al. 1989). It is interesting that a somewhat similar pattern has been observed in the rabbit. In this organism, the D-proximal VH gene is preferentially used, though there are ≥100 VH genes and many of them seem to be functional (Knight 1992). The immunoglobulin diversity in this organism seems to be generated by gene conversion and somatic mutation.
In the primitive jawless vertebrates such as lampreys and hagfishes, an extensive search for immunoglobulin genes has been conducted, but no genes that are clearly homologous to the immunoglobulin genes in cartilaginous fishes or higher vertebrates have been identified, though antibodies are clearly present in these organisms (Litman et al. 1993). Therefore, a completely different immune system may exist in these species.
Nucleotide Sequences Used
At the present time, there are >600 VH gene sequences that have been determined for the various groups of organisms mentioned above. This number includes both germ-line and cDNA sequences, and the majority of them come from the human and mouse (Kabat et al. 1991). For studying the long-term evolution of VH genes, it is necessary to use representative genes from each species and to cover a wide range of organisms. We have therefore included (a) almost all sequences from a species where a small number of genes have been studied but (b) a relatively few representative sequences from a species where a large number of genes have been studied. For the study of long-term evolution, no pseudogenes were included. In both human and mouse, >60 functional germ-line genes have been sequenced, but Schroeder et al. (1990) showed that they can be classified into three major clusters, i.e., clans I-III (also see Kirkham et al. 1992). Our preliminary analysis of ~600 mammalian VH genes, which were obtained from GenBank and Kabatnuc databases, has also suggested that these genes can be grouped into the three major clusters (data not shown). We have therefore used 11 and 7 representative sequences from the mouse and the human, respectively, covering the three major clusters, i.e., clans I-III. More than 30 different VH sequences are also available from the African toad Xenopus laevis. We chose 11 representative sequences from them. For other organisms, we used almost all available genes after exclusion of closely related ones, except for chicken, where we used only one gene, which is used exclusively for antibody generation. For the purpose of rooting the VH gene tree, we included a horned shark light-chain VL gene and a human preB gene, which is known to be homologous to vertebrate lambda VL genes (Kudo and Melchers 1987).
The total number of VH and outgroup genes used for our study of long-term evolution was 57, and they are listed in table 1. We used the germ-line sequences whenever possible, to avoid the effect of somatic mutation (Tonegawa 1983). In a few species, however, we used cDNA, because germ-line genes were not available and sequence divergences were so large that the effect of somatic mutation was negligible. The cDNA sequences are denoted by asterisks in table 1. Since we used many genes from many different organisms, we designated each gene by the first letter of both the genus name and the species name plus the gene designation used by the original authors, except in the African toad (X. laevis), where we used “Xe” instead of “Xl” because “l” (el) is easily confused with “1” (one).
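The naming rule just described can be written down as a small helper. This is only a sketch of the convention as stated in the text: the function name, its arguments, and the override table are illustrative assumptions, not part of the original analysis.

```python
def gene_label(genus, species, designation, overrides=None):
    """Build a gene label: first letter of genus + first letter of species + designation.

    `overrides` maps (genus, species) to a fixed prefix for exceptions to the rule.
    """
    if overrides is None:
        # The one exception mentioned in the text: Xenopus laevis uses "Xe", not "Xl".
        overrides = {("Xenopus", "laevis"): "Xe"}
    default_prefix = genus[0].upper() + species[0].lower()
    prefix = overrides.get((genus, species), default_prefix)
    return prefix + designation


print(gene_label("Homo", "sapiens", "VH26"))      # HsVH26
print(gene_label("Xenopus", "laevis", "LL2.8"))   # XeLL2.8
```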
For the study of “short-term” evolution, we included pseudogenes as well as functional genes and used 10 VH genes belonging to the clan I genes and 28 genes belonging to the clan III genes. (Clan II genes were not used, because there was only one pseudogene.) All of these data were taken from recent exhaustive studies of human VH genes by Shin et al. (1991) and Matsuda et al. (1993). These VH genes are scattered throughout a 0.8-Mb region of the human VH gene cluster of chromosome 14 and are intermingled with different VH clan genes.
Table 1
VH Genes Used for Phylogenetic Analysis in Figure 4

Species: V Gene(s) (a) (references)

Heavy chains:
  Cartilaginous fishes:
    Sharks:
      Nurse shark (Ginglymostoma cirratum): GcA4-2* (Vazquez et al. 1992)
      Horned shark (Heterodontus francisci): HfF101, Hf1207, Hf1315, Hf1320, Hf1403, Hf2806, Hf2807, Hf2809, Hf3083 (Kokubu et al. 1988)
    Skate:
      Little skate (Raja erinacea): Re102, Re107, Re110 (Harding et al. 1990b)
  Bony fishes:
    Ray-finned fishes:
      Goldfish (Carassius auratus): CaVH99A (Wilson et al. 1991)
      Ladyfish (Elops saurus): Es14501*, Es14515* (Amemiya and Litman 1990)
      Channel catfish (Ictalurus punctatus): IpNG10*, IpNG22*, IpNG54*, IpNG64*, IpNG66* (Ghaffari and Lobb 1991); Ip10.4* (Warr et al. 1991)
      Rainbow trout (Oncorhynchus mykiss): OmRTVH431 (Matsunaga et al. 1990)
    Lobe-finned fishes:
      Coelacanth (Latimeria chalumnae): Lc15011a (Amemiya et al. 1993)
  Amphibians:
      African toad (Xenopus laevis): XeVH1 (Yamawaki-Kataoka and Honjo 1987); XeLL2.8, XeLL3.4 (Schwager et al. 1989); Xe4, Xe5, Xe6, Xe7, Xe9, Xe10, Xe11.1b (Haire et al. 1991)
  Reptiles:
      Caiman (Caiman crocodylus): Ccc3 (Litman et al. 1983)
  Birds:
      Chicken (Gallus domesticus): GdVH1 (Reynaud et al. 1989)
  Mammals:
      Human (Homo sapiens): HsVH251 (Humphries et al. 1988); HsVH26 (Matthyssens and Rabbitts 1980); HsVH6 (Schroeder et al. 1988); HsV1.4.1b, HsV2.5 (Shin et al. 1991); HsV35 (Matsuda et al. 1988); Hs12G-1 (Lee et al. 1987)
      Mouse (Mus musculus): MmA/J (Gefter et al. 1984); MmH30 (Schiff et al. 1985); MmH4a-3 (Schiff et al. 1986); MmME3 (Kasturi et al. 1990); MmPJ14 (Sakano et al. 1980); MmVH283 (Ollo et al. 1983); MmVH441 (Ollo et al. 1981); MmV11 (Siu et al. 1987); Mm1.8, Mm43X (Rathbun et al. 1988); Mm186-2 (Bothwell et al. 1981)
      Rabbit (Oryctolagus cuniculus): Ocp26.9.p2 (Currier et al. 1988)
Light chains (outgroups):
      Horned shark (Heterodontus francisci): Hf122 (Shamblott and Litman 1989)
      Human (Homo sapiens): HsVPB6 (Bauer et al. 1988)

(a) An asterisk (*) indicates a cDNA sequence.
Phylogenetic Analysis
In the study of the evolutionary relationships of VH genes for the entire group of vertebrates, we used amino acid sequences rather than nucleotide sequences, because the VH genes are highly divergent and the nucleotide differences at certain positions apparently have reached the saturation level. All amino acid sequences were aligned by using the program CLUSTAL V (Higgins et al. 1992), with some modification by visual inspection.
As mentioned earlier, a VH gene consists of CDRs and FWs. However, CDRs are highly variable and contain many insertions/deletions (indels) when the VH and VL sequences from diverse organisms are included (see fig. 2), so they were excluded from the analysis. In the present analysis even the FW regions showed some indels. All sites showing indels were excluded from the analysis. Therefore, the total number of amino acid sites used for phylogenetic analysis was 64. Three sequences, one from each of the horned shark (Hf1113), little skate (Re20*), and African toad (Xe8), had several indels in FW2 and FW3, and it was difficult to align them with other sequences (fig. 2). Therefore, they were not used for the present analysis. In the study of a “short-term” evolution of VH genes, we used nucleotide sequences, since the extent of sequence divergence was relatively small in this case. Because we were interested in the dynamics of the evolution of VH genes in this study, we included both functional genes and pseudogenes. There was no difficulty in the alignment of the sequences, except in the case of a few pseudogenes. These few pseudogenes were excluded from the analysis.

[Figure 2: amino acid alignment. Heavy-chain rows: HsV35, MmH30, HsV12G-1, MmA/J, HsVH26, MmVH283, CaVH99A, HfF101, Xe10, Xe8, Hf1113, Re20*. Light-chain rows: HsVPB6, Hf122. Regions marked above the alignment: FR1, CDR1, FR2, CDR2, FR3.]
FIG. 2.-Amino acid sequences of the variable regions of immunoglobulins from diverse organisms. They are representative sequences from five major groups of VH segments (A, B, C, D, and E) in fig. 4 and three divergent sequences (Hf1113 [Hinds-Frey et al. 1993]; Re20* [Harding et al. 1990a]; and Xe8 [Haire et al. 1991]) that contain additional indels in FW2 and FW3. Two VL sequences, which were used as outgroups, are also included. A dash (-) indicates an indel, and the 64 sites marked with a plus sign (+) were used for phylogenetic construction in fig. 4. CDR1 and CDR2 were not used because their sequence divergence was extensive and the sequence alignment was unreliable.

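The site-exclusion rule described above (drop every alignment column in which any sequence shows an indel) can be sketched as follows. The function names and the toy sequences are hypothetical and serve only to illustrate the filtering step; the real alignment and the exclusion of the CDR columns are not reproduced here.

```python
def gap_free_columns(alignment, gap="-"):
    """Return the indices of alignment columns in which no sequence has a gap."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must be non-empty and of equal length")
    n_sites = lengths.pop()
    seqs = list(alignment.values())
    return [i for i in range(n_sites) if all(seq[i] != gap for seq in seqs)]


def keep_columns(alignment, columns):
    """Rebuild each sequence from the selected columns only."""
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Toy alignment (made-up sequences, not the data of figure 2).
toy = {"seqA": "QVQ-LV", "seqB": "EVQLLV", "seqC": "QV-LLV"}
cols = gap_free_columns(toy)
print(cols, keep_columns(toy, cols))
```

Counting the retained columns (here `len(cols)`) gives the number of sites that enter the distance calculation.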
The phylogenetic analysis of both amino acid and nucleotide sequence data was conducted by using the minimum-evolution method (Rzhetsky and Nei 1992). This method has a solid theoretical basis and is known to be efficient in obtaining correct phylogenetic trees (Rzhetsky and Nei 1993). The evolutionary distances used were the Poisson-correction distance (see Nei 1987, p. 41), for amino acid sequence data, and Jukes and Cantor’s (1969) distance, for nucleotide sequence data. The reliability of the tree obtained was tested by computing the standard error of each interior branch and conducting a t-test (Rzhetsky and Nei 1992).
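The two distance measures named above have standard closed forms: the Poisson-correction distance is d = -ln(1 - p) and the Jukes-Cantor distance is d = -(3/4) ln(1 - 4p/3), where p is the proportion of sites that differ between two aligned sequences. A minimal sketch follows; the function names are ours, and the treatment of gaps (skipping any site with a gap in either sequence) is an assumption for illustration.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites among sites with no gap in either sequence."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_correction(p):
    """Poisson-correction distance for amino acid sequences: d = -ln(1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


def jukes_cantor(p):
    """Jukes-Cantor distance for nucleotide sequences: d = -(3/4) ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGT", "ACGA")  # one difference in four sites, p = 0.25
print(p, jukes_cantor(p), poisson_correction(p))
```

Both corrections grow faster than p itself, which is why heavily diverged sequences approach saturation: as p nears 1 (amino acids) or 0.75 (nucleotides), the estimated distance becomes very large and unstable.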
Results
Evolutionary Relationships of VH Genes from Diverse Organisms of Vertebrates
The Poisson-correction distances for all pairs of 55 VH and two outgroup sequences are presented in figure 3, whereas the phylogenetic tree obtained is given in figure 4. Figure 4 shows that the VH genes in vertebrates can be classified into five groups, i.e., groups A, B, C, D, and E. Group A includes all mammalian clan I genes and a few Xenopus genes, whereas group B consists of all mammalian clan II genes and some Xenopus genes. Group C includes mammalian clan III genes and VH genes from chicken, caiman (reptile), Xenopus, coelacanth, and some bony fishes. Group D consists of bony-fish VH genes only, whereas all group E genes are from cartilaginous fishes.
Figure 4 shows that the first evolutionary split of VH genes occurred between group E genes and others. This is reasonable, because the genomic organization of the genes from cartilaginous fishes is different from that of other vertebrates and because cartilaginous fishes separated from the other vertebrates ~450 Mya. The second split of VH genes in the rest of vertebrates occurs between bony fishes (group D) and tetrapods, though group C includes some bony (ray-finned)-fish and coelacanth (lobe-finned fish) genes. However, the confidence probability (CP) (1 - type I error) of group D genes is relatively low (63%). This is caused mainly by the fact that the genes CaVH99A, IpNG64*, and Es14515* in this group are relatively closely related to the genes XeVH1, Es14501*, and OmRTVH431 of group C (fig. 3). This somewhat anomalous relationship of genes may be due to either stochastic errors of amino acid substitution or some intergenic recombination or gene conversion events that occurred a long time ago. The next level of splitting is between group A genes and group (B+C) genes. The group A gene cluster also has a relatively low CP value. This low CP value (64%) is mainly due to the fact that the gene Xe10 is similar to some group C and D genes (see fig. 3). If we exclude this gene, the cluster of group A genes shows a CP value of 96%. The final clusters of group B and C genes show a relatively high CP value.
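The confidence probability quoted above is described only as 1 minus the type I error of a test on an interior branch length. A minimal sketch of such a calculation is given below, assuming a two-sided test with a normal approximation (a t-test with many degrees of freedom). The exact procedure of Rzhetsky and Nei (1992) may differ in detail, and the function name and inputs are illustrative.

```python
import math


def confidence_probability(branch_length, std_error):
    """CP = 1 - (two-sided p-value) for H0: branch length = 0, normal approximation."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    z = abs(branch_length) / std_error
    # For a standard normal deviate Z, P(|Z| <= z) = erf(z / sqrt(2)).
    return math.erf(z / math.sqrt(2.0))


# Hypothetical branch: estimated length 0.049 with standard error 0.025 (z = 1.96).
print(round(confidence_probability(0.049, 0.025), 3))
```

Under this assumption, a branch whose length is about twice its standard error reaches a CP of roughly 95%, which is the threshold the text uses when calling a cluster significant.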
Within each group of VH genes, there are several clusters that are highly significant. In group E, for example, the skate genes are substantially different from the shark genes and form a cluster with a 99% probability. Apparently, this cluster separated from the shark cluster in an early stage of evolution, and, if more data accumulate and the cluster is still supported, we may have to separate it from the shark group. Note that sharks and skates diverged ~200 Mya (Carroll 1988, pp. 62-83). Significant clusters (CP>95%) are also observed in all other groups. An interesting one is the mouse VH cluster in group A. Although this cluster belongs to the mammalian clan I, it is quite different from the human gene belonging to the same clan.

[Figure 3: lower-triangular matrix of pairwise distances; the individual values are not reproduced here. Rows and columns, in order: 1. HsVH251; 2. HsV1.4.1b; 3. HsV35; 4. MmH30; 5. Mm1.8; 6. Mm43X; 7. MmVH186-2; 8. MmH4a-3; 9. Xe5; 10. Xe11.1b; 11. Xe9; 12. Xe10; 13. HsVH6; 14. MmA/J; 15. HsV12G-1; 16. MmPJ14; 17. HsV2.5; 18. XeLL2.8; 19. XeLL3.4; 20. Xe4; 21. Xe6; 22. Xe7; 23. HsVH26; 24. MmVH283; 25. MmVH441; 26. Ocp26.9p2; 27. MmV11; 28. MmME3; 29. GdVH1; 30. Ccc3; 31. Lc15011a; 32. XeVH1; 33. Es14501*; 34. OmRTVH431; 35. IpNG54*; 36. CaVH99A; 37. IpNG64*; 38. Es14515*; 39. IpNG10*; 40. IpNG22*; 41. IpNG66*; 42. Ip10.4*; 43. HfF101; 44. Hf2809; 45. Hf2807; 46. Hf1320; 47. Hf2806; 48. Hf1403; 49. Hf1207; 50. Hf1315; 51. Hf3083; 52. GcA4-2*; 53. Re102; 54. Re110; 55. Re107; 56. Hf122; 57. HsVPB6.]
FIG. 3.-Pairwise Poisson-correction distances for 55 VH and two VL sequences. The five major groups of VH genes and two VL outgroup genes are separated by lines. An asterisk (*) indicates a sequence deduced from a cDNA.

[Figure 4: phylogenetic tree. Labels in the diagram: groups A, B, C, D, and E; mammalian clans I, II, and III; African toad; chicken; caiman; coelacanth; ray-finned fishes; cartilaginous fishes; outgroup.]
FIG. 4.-Phylogenetic tree of 55 VH and two VL (outgroup) genes. There are five major groups (A, B, C, D, and E) of VH genes identified from this tree, though the grouping is not necessarily statistically significant. The number given to each interior branch is the probability at which the branch length is different from 0 (confidence probability). Confidence probability <60% is not given. The branch lengths are measured in terms of the number of amino acid substitutions, with the scales given below the tree.

Table 2
Jukes-Cantor Distances (×100) for 10 Human Group A VH Genes

Gene                       1     2     3     4     5     6     7     8     9    10
 1. 1.2
 2. 1.3                  7.7
 3. 1.8                  3.5   7.1
 4. 1.18                 6.1   7.7   6.6
 5. 1.46                 4.0   5.5   4.0   6.1
 6. ψ1.17               10.4  11.0  11.5  12.7  11.5
 7. ψ1.24                9.8  10.4   8.7  10.4   9.3  16.8
 8. ψ1.27               21.2  23.8  21.8  26.5  22.5  31.5  29.3
 9. ψ1.58               10.4  11.0   9.8  12.1   9.8  16.8  13.2  28.6
10. 5.51                27.2  31.5  29.3  29.3  30.0  37.7  33.0  37.7  32.3
11. Xenopus 11.1b (a)   50.8  53.7  51.7  48.9  49.8  56.7  53.7  61.0  54.7  49.8

NOTE.-“ψ” denotes a pseudogene.
(a) Outgroup gene.

An important feature that emerges from figure 4 is that the three mammalian VH clans separated a long time ago, before mammals and amphibians diverged (~350 Mya), because group A, B, and C genes are shared by both mammals and Xenopus; that is, mammalian VH gene clans I-III have coexisted in the genome, for ≥350 Myr, as separate lineages. Figure 4 shows that group C includes VH genes from both bony fishes and tetrapods (including mammalian clan III genes) and that separation of tetrapod group C genes from bony-fish group C genes occurred after group A, B, and C genes diverged. This suggests that the divergence of group A, B, and C genes occurred even before bony fishes and tetrapods diverged ~400 Mya. It is impressive that different copies of VH genes have persisted for such a long time in the tetrapod genome, and this raises a serious question about the importance of concerted evolution in VH genes.
It should be noted that the mammalian VH clans I-III may not exist in many mammalian species. Indeed, all VH genes in the rabbit seem to belong to clan III, and they are relatively closely related to each other in terms of sequence divergence (data not shown). In the rabbit, however, the VH gene, which is located proximally to the D segment gene cluster, is used with a high frequency (70%-90%) for generating antibody diversity, though there are ≥100 VH genes in the genome and the rest of the genes seem to be used with a low frequency (Knight 1992).
A more extreme case of preferential use of one VH gene is known in chicken. In this organism, only one VH gene (VH1) is functional, and the other VH genes are all pseudogenes (Reynaud et al. 1989; McCormack et al. 1991). These genes again belong to group C (see fig. 4) and are relatively closely related to each other (authors' unpublished data). The V region diversity of this organism is also generated by somatic gene conversion.
Evolutionary Relationships of VH Genes in the Human Genome
In the study mentioned above, we chose distantly related VH genes from diverse organisms to study the pattern of long-term evolution. To understand the dynamics of evolutionary change of VH genes, however, it is necessary to examine the evolutionary relationships of genes within either the same species or closely related species. The most extensive study of the structure and physical map of VH genes has been done in the human. Matsuda et al. (1993) constructed the physical map of the 0.8-Mb DNA fragment that contains 64 VH genes, and the nucleotide sequences of these genes are available from GenBank.
We examined the evolutionary relationships of VH genes belonging to groups A and C (groups a and c in the notation of Matsuda et al.), as mentioned earlier. Both of these groups include many pseudogenes, but some of the pseudogenes were truncated or unalignable with the functional genes. We used the Xenopus gene 11.1b as an outgroup for group A genes and used the caiman gene C3 for group C genes. (Closely related genes are preferable as outgroups.)
The Jukes-Cantor distances for group A genes are presented in table 2, and the phylogenetic tree for these genes is given in figure 5. This figure clearly shows that pseudogenes evolve much faster than functional genes. This is expected because pseudogenes do not have functional constraint and thus accumulate mutations more rapidly than do functional genes (Li et al. 1981; Miyata and Yasunaga 1981; Kimura 1983). This result is again inconsistent with the view that the VH genes are subject to frequent (germ-line) gene conversion or unequal crossover events. If intergenic gene conversion or unequal crossover occurs very often, we would expect that both functional genes and pseudogenes would be mixed and that most genes in the genome would have similar nucleotide sequences. The phylogenetic tree in figure 5 does not show this pattern. It is interesting to see that all functional genes have evolved at nearly the same rate and that some functional genes are quite close to each other. Gojobori and Nei (1984) estimated that the rate of nucleotide substitution for FWs is 1.01 × 10^-9, 0.65 × 10^-9, and 2.63 × 10^-9 per site per year per lineage for the first, second, and third codon positions, respectively. Therefore, the average rate is 1.43 × 10^-9. If this rate is reliable, even the closest pair of functional genes (Hs1.2 and Hs1.8) seems to have diverged ~12 Mya. The most divergent gene, Hs5.51, seems to have diverged from the other five functional genes ~100 Mya. This suggests that many functional VH genes have persisted for a long time in the human genome. When all group A, B, and C genes are considered together, they have survived for >400 Myr, as mentioned earlier.
Figure 6 shows the phylogenetic tree for group C genes (distance estimates are not shown). The pseudogenes in this group also generally evolve much faster than functional genes. There are a few pseudogenes whose branch lengths are nearly the same as those of the functional genes. These pseudogenes probably lost their function very recently. At any rate, this tree also does not support the view that intergenic gene conversion or unequal crossover plays an important role in homogenizing the member genes.
The mouse genome is also known to contain many pseudogenes as well as functional VH genes. Unfortunately, there seems to be no systematic study of VH gene sequences in this organism. However, using the GenBank sequences available, we again constructed a phylogenetic tree for 39 group A mouse VH genes (data not shown). This tree also showed a tendency for pseudogenes to evolve faster, and there was no clear-cut evidence of sequence homogenization.
FIG. 5.-Phylogenetic tree of 10 group A human VH genes. All sequences except for Xenopus 11.1b were taken from Shin et al. (1991) and Matsuda et al. (1993). The Xenopus gene used here is one of the closest outgroup genes (see fig. 4). ψ = Pseudogene. The branch lengths are measured in terms of the number of nucleotide substitutions, with the scale given below the tree. The confidence probabilities of interior branches are indicated by asterisks: *, >90%; and **, >95%.
FIG. 6.-Phylogenetic tree of 28 group C human VH genes. All sequences except for Caiman C3 were taken from Matsuda et al. (1993). The Caiman gene used here is one of the closest outgroup genes (see fig. 4). The confidence probabilities of interior branches are indicated by asterisks: *, >90%; **, >95%; and ***, >99%.
Discussion
Concerted Evolution
As mentioned earlier, multigene families are believed to be subject to unequal crossover, which generates new duplicate genes or deletes some extant duplicate genes (Smith et al. 1971; Smith 1974). If this process continues, duplicate genes in a multigene family tend to have similar nucleotide sequences even in the presence of mutation. More recently, intergenic gene conversion was considered as an additional mechanism acting to homogenize the member genes of a multigene family (Slightom et al. 1980). These two mechanisms are thought to be the major causes of concerted evolution that homogenizes the member genes of such gene families as RNA genes and histone genes (fig. 7A) (Smith 1974; Ohta 1980; Arnheim 1983).
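The homogenizing effect attributed to these mechanisms can be illustrated with a toy simulation. This is only a sketch under strong simplifying assumptions: conversion overwrites a whole recipient copy with a donor copy, every mutation creates a previously unseen state, and all names and parameter values are hypothetical. It is not the mathematical theory used in the studies cited above.

```python
import random
from itertools import combinations


def mean_pairwise_difference(family):
    """Average number of differing sites over all pairs of gene copies."""
    pairs = list(combinations(family, 2))
    return sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)


def evolve_family(n_genes=5, length=50, steps=1000,
                  p_mutation=0.0, p_conversion=1.0, seed=0):
    """Toy multigene family evolving by mutation and whole-gene conversion."""
    rng = random.Random(seed)
    # Start from fully diverged copies: copy i carries state i at every site.
    family = [[i] * length for i in range(n_genes)]
    next_state = n_genes  # every mutation introduces a state never seen before
    for _ in range(steps):
        if rng.random() < p_mutation:
            family[rng.randrange(n_genes)][rng.randrange(length)] = next_state
            next_state += 1
        if rng.random() < p_conversion:
            donor, recipient = rng.sample(range(n_genes), 2)
            family[recipient] = list(family[donor])  # recipient becomes a copy of donor
    return family


homogenized = evolve_family(p_mutation=0.0, p_conversion=1.0)  # conversion only
diverged = evolve_family(p_mutation=1.0, p_conversion=0.0)     # mutation only
print(mean_pairwise_difference(homogenized), mean_pairwise_difference(diverged))
```

With frequent conversion the copies collapse onto a single sequence, whereas without it the copies stay as different as they started; the pattern reported for VH genes in this paper resembles the second case far more than the first.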
The concerted (coincidental) evolution was also invoked to explain the evolution of immunoglobulin diversity (Ohta 1980, 1983), as mentioned earlier. In this case, unequal crossover or gene conversion was regarded as a mechanism to increase the genetic diversity (polymorphism) at a locus by introducing new variants from different loci. However, when Gojobori and Nei (1984) conducted a phylogenetic analysis of human and mouse VH genes, they noticed that the extent of sequence divergence among different loci is very high and that, if these genes were subject to concerted evolution, the rate of unequal crossover or gene conversion must be very low, two orders of magnitude lower than that of ribosomal RNA genes. To explain the difference between the two gene families, they proposed that, whereas rRNA genes are subject to purifying selection, VH genes are subject to diversifying selection.
FIG. 7.-Three different modes of evolution of multigene families, each drawn for two species (species 1 and species 2) against a time axis: (A) concerted evolution; (B) divergent evolution; (C) evolution by birth-and-death process. ○ = Functional genes; and ● = pseudogenes.
Conducting a study on the number of synonymous (dS) and nonsynonymous (dN) substitutions in CDRs and FWs, Ohta (1992) recently concluded that the amino acid variability in CDRs is primarily caused by gene conversion. This conclusion is based on her observations that dS is generally higher in CDRs than in FWs and that dN was not necessarily higher than dS, even in CDRs. She argued that these observations do not support the hypothesis of diversifying selection but are consistent with the gene conversion hypothesis. However, there are three major problems in her study. First, in the study of evolution of VH genes, only germ-line sequences should be used; but she used many cDNA sequences. Second, CDRs evolve rapidly and include many indels. Therefore, for the comparison of dS and dN, only closely related sequences should be used. When this is done, dS is nearly the same for both CDRs and FWs, and dN > dS in CDRs but dN < dS in FWs (Tanaka and Nei 1989). (In the human, dS was higher in CDRs than in FWs in Tanaka and Nei's study; but the difference was not significant.) Third, there are no data showing that gene conversion occurs exclusively in CDRs. Actually, somatic gene conversion occurs in both CDRs and FWs in chicken (McCormack and Thompson 1990). For the above reasons, Ohta's conclusion does not seem to be supported by available data.
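To make the distinction between synonymous and nonsynonymous change concrete, the sketch below classifies codon differences between two aligned coding sequences using the standard genetic code. It is deliberately simplified: it returns raw counts for codons that differ at exactly one position and ignores codons with multiple differences. Proper dS and dN estimates additionally divide by the numbers of synonymous and nonsynonymous sites and correct for multiple hits, so this is not the procedure of Ohta (1992) or Tanaka and Nei (1989). The function name and toy sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... , GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_single_step_changes(seq_a: str, seq_b: str):
    """Return (synonymous, nonsynonymous) counts over codons differing at one position."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # skip gaps and ambiguous bases
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff != 1:
            continue  # identical codons and multi-step changes are ignored in this sketch
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Hypothetical toy pair: TTT/TTC (Phe/Phe), CTT/CAT (Leu/His), GGG/GGA (Gly/Gly).
print(count_single_step_changes("TTTCTTGGG", "TTCCATGGA"))
```

An excess of nonsynonymous over synonymous change, once both are expressed per site, is the signature that the text interprets as diversifying selection in CDRs.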
Divergent Evolution
In figure 4, we have seen that the major groups of VH genes in higher vertebrates have been maintained in the genome for ~400 Myr. This observation is in rough agreement with the estimate of divergence (300-400 Myr) obtained by Gojobori and Nei (1984) in a statistical analysis of human and mouse VH genes (see Addendum in their paper). However, it is in sharp contrast with the evolution of rRNA genes, where even the human and the chimpanzee, which apparently diverged ~5 Mya (see Stoneking 1993), do not share the same group of genes (Arnheim 1983). In the present paper, this type of long-term persistence of member genes of a multigene family will be called “divergent evolution,” and it is schematically represented in figure 7B.
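Persistence times of this kind rest on simple molecular-clock arithmetic. As a minimal sketch, assuming the usual relation t = d / (2r) between a pairwise distance d and a per-lineage substitution rate r, the following reproduces the averaged FW rate of Gojobori and Nei (1984) and the approximate divergence times quoted earlier for the human group A genes. The distances are the table 2 entries divided by 100; the function and variable names are illustrative.

```python
# FW substitution rates per site per year per lineage for codon positions 1, 2, and 3
# (Gojobori and Nei 1984, as quoted in the text).
FW_RATES = (1.01e-9, 0.65e-9, 2.63e-9)


def mean_rate(rates=FW_RATES) -> float:
    """Unweighted average rate over the three codon positions."""
    return sum(rates) / len(rates)


def divergence_time_years(distance: float, rate: float) -> float:
    """t = d / (2r): both lineages accumulate substitutions after the split."""
    return distance / (2.0 * rate)


r = mean_rate()

# Closest functional pair in table 2: Hs1.2 versus Hs1.8, distance 3.5 x 10^-2.
t_closest = divergence_time_years(0.035, r)

# Hs5.51 versus the five other functional genes (1.2, 1.3, 1.8, 1.18, 1.46).
hs551_distances = [0.272, 0.315, 0.293, 0.293, 0.300]
t_hs551 = divergence_time_years(sum(hs551_distances) / len(hs551_distances), r)

print(f"mean rate   = {r:.2e} per site per year per lineage")
print(f"Hs1.2/Hs1.8 = {t_closest / 1e6:.0f} Myr")
print(f"Hs5.51      = {t_hs551 / 1e6:.0f} Myr")
```

Averaging the five Hs5.51 distances before dating is a choice made for this sketch; the text does not state how the ~100 Mya figure was obtained.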
There are two possible hypotheses to explain this long-term persistence of VH genes in the vertebrate genome (Gojobori and Nei 1984); one is to assume that the rate of occurrence of unequal crossover is low in VH genes, and the other is the hypothesis that diversifying selection operates among VH genes. Gojobori and Nei (1984) estimated the rate of occurrence of unequal crossover by fitting a mathematical theory to the phylogenetic tree for VH genes. The rate obtained was ~100 times lower than the estimate for the rRNA multigene family. However, since there is no reason to believe that the rate must be lower in VH genes than in rRNA genes, they supported the second hypothesis. The recent study of the physical map of VH genes by Matsuda et al. (1993) supports Gojobori and Nei's (1984) view, because group A, B, and C VH genes are scattered throughout the 0.8-Mb VH gene region, suggesting that unequal crossover has occurred relatively frequently in the past (also see Tutter and Riblet 1989b).
The theoretical basis of the second hypothesis is that vertebrate individuals are exposed to a large number of foreign antigens, and thus, if an individual has many different types of VH genes, it will enhance the fitness of the individual. This means that a new mutant VH gene that encodes an antibody molecule matching a new type of antigen has a selective advantage and that old genes that have proved to be useful will be retained in the genome. Indeed, Tanaka and Nei (1989) have shown that the rate of amino acid-altering (nonsynonymous) nucleotide substitution is higher than the rate of synonymous nucleotide substitution in CDRs of the human and mouse VH genes. In other words, there is positive Darwinian selection operating for the diversification of VH genes.
Diversifying selection would certainly increase the persistence time of different VH genes in the genome, but can it explain the existence of three major groups of VH genes that have survived for ~400 Myr? Theoretically, it can happen by chance when random occurrence of unequal crossover creates new genes or deletes existing ones. However, it is more likely that these three groups are specialized to cope with different groups of antigens, though there is no evidence at the present time. Kirkham et al. (1992) showed that each of the mammalian clans I-III has a unique FR structure that influences the conformation of the antigen-binding site, and they suggested that there might be a differential use of VH clans in the immune response.
Evolution by the Birth-and-Death Process
In the theory of concerted evolution, it is commonly assumed that the total number of duplicate genes of a gene family remains more or less constant (Ohta 1983). Hughes and Nei (1989) and Nei and Hughes (1991, 1992) showed that, in the major-histocompatibility-complex (MHC) gene family, gene duplication often occurs but that many duplicate genes die out because of deleterious mutation. However, the number of functional genes remains more or less constant, since the effects of the birth and death of duplicate genes are balanced out. The dead genes either stay as pseudogenes in the genome or are eliminated by unequal crossover. They called this the “evolution by the birth-and-death process” (fig. 7C).
Figures 5 and 6 show that evolution by the birth-and-death process also occurs in the VH gene family. Matsuda et al. (1993) estimated that ~50% of VH genes that exist in the human genome are pseudogenes. Similarly, the mouse genome harbors many pseudogenes (Rathbun et al. 1989). Since the number of functional VH genes in the genome is similar for many mammalian species, these results indicate that nearly the same number of functional genes as that of pseudogenes is produced by unequal crossover for a given period of evolutionary time. We can therefore conclude that the VH gene family is subject to evolution by the birth-and-death process, when a relatively short period of evolutionary time is considered.
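The balance between gene birth and gene death described here can be pictured with a toy simulation. The sketch below is an illustration only, not the model of Nei and Hughes: birth and death are coupled at every step so that the functional count stays exactly constant (a stronger condition than "more or less constant"), dead genes persist as pseudogenes until deleted, and the family size, step count, and deletion probability are hypothetical values.

```python
import random


def simulate_birth_and_death(n_functional=30, steps=500, p_loss=1 / 30, seed=1):
    """Toy birth-and-death process for one gene family (hypothetical parameters)."""
    rng = random.Random(seed)
    # Each functional gene carries the id of the founder copy it descends from.
    functional = list(range(n_functional))
    pseudogenes = []
    for _ in range(steps):
        # Birth: a randomly chosen functional gene is duplicated.
        functional.append(rng.choice(functional))
        # Death: a randomly chosen functional gene is inactivated and becomes a pseudogene.
        pseudogenes.append(functional.pop(rng.randrange(len(functional))))
        # Deletion: each pseudogene is removed independently with probability p_loss.
        pseudogenes = [g for g in pseudogenes if rng.random() >= p_loss]
    return functional, pseudogenes


functional, pseudogenes = simulate_birth_and_death()
print(len(functional), "functional genes from", len(set(functional)), "founder lineages")
print(len(pseudogenes), "pseudogenes retained")
```

Over many steps the surviving functional genes trace back to fewer and fewer founder copies, while a standing pool of pseudogenes records recent deaths; with one death per step and a per-step deletion probability of 1/30, that pool is expected to hover near the functional count, loosely echoing the roughly equal numbers of functional genes and pseudogenes reported for the human VH locus.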
From the phylogenetic trees given in figures 5 and 6, we previously concluded that concerted evolution does not play an important role in the VH gene family. This does not mean that there is no homogenization process in the evolution of VH genes. As long as unequal crossover occurs, there must be some chance of homogenization. We also do not completely rule out the role of intergenic gene conversion. Because gene conversion is an important mechanism of somatic generation of antibody diversity in some organisms (e.g., chicken), it might have played some role in the evolution of genomic structure of VH genes as well. Nevertheless, as far as currently available data are concerned, the VH gene family appears to have evolved mainly following the scheme of divergent evolution and evolution by the birth-and-death process.
Some Remarks
It should be emphasized that, although we now have a rough picture of the evolution of VH genes in vertebrates, this picture is based on studies of a limited number of species. If more species are studied, we may find new types of genomic organization of VH genes. It is rather surprising that the genomic structure of the coelacanth is somewhat similar to that of cartilaginous fishes. It is also interesting that the rabbit system is similar to the chicken system. Although these systems should be studied in more detail, they suggest that antibody diversity is generated in many different ways and that some systems may have evolved independently in different groups of vertebrates. Obviously, more detailed studies of the genomic structure of VH genes are necessary in many different organisms before we understand the general pattern of evolution of this multigene family.
Previously we suggested that the long persistence of VH genes in vertebrate genomes is aided by diversifying selection. However, the extent of selective advantage of a new mutant VH gene is probably small (possibly 1%-2%). The reason for this is that there are organisms in which only one or a few VH genes are functional and in which others are either pseudogenes or rarely used. If every gene had a unique function that is essential to the survival of the organism, this type of system would never have evolved; somatic mutation and intergenic gene conversion would be too risky and too uneconomical in this case. The fact that many VH genes die out in the course of evolution also suggests that the adaptive differences between different VH genes are rather small. Nevertheless, this small magnitude of adaptive difference (s) would be sufficient to explain the long persistence of VH genes in the vertebrate genome if the population size (N) is sufficiently large, because the
maintenance of VH genes would be determined mainly by Ns. If the theoretical study on frequency-dependent selection with respect to MHC genes (Takahata and Nei 1990) gives any guidance, it suggests that the persistence of different VH genes would be enhanced substantially by diversifying selection when Ns > 100.
Acknowledgment
This work was supported by research grants from the National Institutes of Health (GM-20293) and the National Science Foundation (DEB-9119802) to M.N.
LITERATURE CITED
ARNHEIM, N. 1983. Concerted evolution of multigene families. Pp. 38-61 in M. NEI and R. K. KOEHN, eds. Evolution of genes and proteins. Sinauer, Sunderland, Mass.
AMEMIYA, C. T., and G. W. LITMAN. 1990. Complete nucleotide sequence of an immunoglobulin heavy-chain gene and analysis of immunoglobulin gene organization in a primitive teleost species. Proc. Natl. Acad. Sci. USA 87:811-815.
AMEMIYA, C. T., Y. OHTA, R. T. LITMAN, J. P. RAST, R. N. HAIRE, and G. W. LITMAN. 1993. VH gene organization in a relict species, the coelacanth Latimeria chalumnae: evolutionary implications. Proc. Natl. Acad. Sci. USA 90:6661-6665.
BAUER, S. R., A. KUDO, and F. MELCHERS. 1988. Structure and pre-B lymphocyte restricted expression of the VpreB gene in humans and conservation of its structure in other mammalian species. EMBO J. 7:111-116.
BOTHWELL, A. L. M., M. PASKIND, M. RETH, T. IMANISHI-KARI, K. RAJEWSKY, and D. BALTIMORE. 1981. Heavy chain variable region contribution to the NPb family of antibodies: somatic mutation evident in a γ2a variable region. Cell 24:625-637.
CAROLL, R. L. 1988. Vertebrate paleontology and evolution. Freeman, New York.
CURRIER, S. J., J. L. GALLARDA, and K. L. KNIGHT. 1988. Partial molecular genetic map of the rabbit VH chromosomal region. J. Immunol. 140:1651-1659.
DU PASQUIER, L. 1993. Phylogeny of B-cell development. Curr. Opin. Immunol. 5:185-193.
GEFTER, M. L., M. N. MARGOLIES, R. I. NEAR, and L. J. WYSOCKI. 1984. Analysis of the anti-azobenzenearsonate response at the molecular level. Annu. Inst. Pasteur Immunol. 135C:17-30.
GHAFFARI, S. H., and C. J. LOBB. 1991. Heavy chain variable region gene families evolved early in phylogeny. J. Immunol. 146:1037-1046.
GOJOBORI, T., and M. NEI. 1984. Concerted evolution of the immunoglobulin VH gene family. Mol. Biol. Evol. 1:195-212.
HAIRE, R. N., Y. OHTA, R. T. LITMAN, C. T. AMEMIYA, and G. W. LITMAN. 1991. The genomic organization of immunoglobulin VH genes in Xenopus laevis shows evidence for interspersion of families. Nucleic Acids Res. 19:3061-3066.
HARDING, F. A., C. T. AMEMIYA, R. T. LITMAN, N. COHEN, and G. W. LITMAN. 1990a. Two distinct immunoglobulin heavy chain isotypes in a primitive, cartilaginous fish, Raja erinacea. Nucleic Acids Res. 18:6369-6376.
HARDING, F. A., N. COHEN, and G. W. LITMAN. 1990b. Immunoglobulin heavy chain gene organization and complexity in the skate, Raja erinacea. Nucleic Acids Res. 18:1015-1020.
HIGGINS, D. G., A. J. BLEASBY, and R. FUCHS. 1992. CLUSTAL V: improved software for multiple sequence alignment. Comput. Appl. Biosci. 8:189-191.
HINDS-FREY, K. R., H. NISHIKATA, R. T. LITMAN, and G. W. LITMAN. 1993. Somatic variation precedes extensive diversification of germline sequences and combinatorial joining in the evolution of immunoglobulin heavy chain diversity. J. Exp. Med. 178:815-824.
HOOD, L., J. H. CAMPBELL, and S. C. R. ELGIN. 1975. The organization, expression, and evolution of antibody genes and other multigene families. Annu. Rev. Genet. 9:305-353.
HUGHES, A. L., and M. NEI. 1989. Evolution of the major histocompatibility complex: independent origin of nonclassical class I genes in different groups of mammals. Mol. Biol. Evol. 6:559-579.
HUMPHRIES, C. G., A. SHEN, W. A. KUZIEL, J. D. CAPRA, F. R. BLATTNER, and P. W. TUCKER. 1988. A new human immunoglobulin VH family preferentially rearranged in immature B-cell tumours. Nature 331:446-449.
JUKES, T. H., and C. R. CANTOR. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. MUNRO, ed. Mammalian protein metabolism. Academic Press, New York.
KABAT, E. A., T. T. WU, H. M. PERRY, K. S. GOTTESMAN, and C. FOELLER. 1991. Sequences of proteins of immunological interest, 5th ed. U.S. Department of Health and Human Services, Government Printing Office, Washington, D.C.
KASTURI, K. N., R. MAYER, C. A. BONA, V. E. SCOTT, and C. L. SIDMAN. 1990. Germline V genes encode viable motheaten mouse autoantibodies against thymocytes and red blood cells. J. Immunol. 145:2304-2311.
KIMURA, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge.
KIRKHAM, P. M., F. MORTARI, J. A. NEWTON, and H. W. SCHROEDER, JR. 1992. Immunoglobulin VH clan and family identity predicts variable domain structure and may influence antigen binding. EMBO J. 11:603-609.
KNIGHT, K. L. 1992. Restricted VH gene usage and generation of antibody diversity in rabbit. Annu. Rev. Immunol. 10:593-616.
KOKUBU, F., R. LITMAN, M. J. SHAMBLOTT, K. HINDS, and G. W. LITMAN. 1988. Diverse organization of immunoglobulin VH gene loci in a primitive vertebrate. EMBO J. 7:3413-3422.
KUDO, A., and F. MELCHERS. 1987. A second gene, VpreB in the λ5 locus of the mouse, which appears to be selectively expressed in pre-B lymphocytes. EMBO J. 6:2267-2272.
LEE, K. H., F. MATSUDA, T. KINASHI, M. KODAIRA, and T. HONJO. 1987. A novel family of variable region genes of
the human immunoglobulin heavy chain. J. Mol. Biol. 195:761-768.
LI, W. H., T. GOJOBORI, and M. NEI. 1981. Pseudogenes as a paradigm of neutral evolution. Nature 292:237-239.
LITMAN, G. W., L. BERGER, K. MURPHY, R. LITMAN, K. HINDS, C. L. JAHN, and B. W. ERICKSON. 1983. Complete nucleotide sequence of an immunoglobulin VH gene homologue from Caiman, a phylogenetically ancient reptile. Nature 303:349-352.
LITMAN, G. W., J. P. RAST, M. J. SHAMBLOTT, R. N. HAIRE, M. HULST, W. ROESS, R. T. LITMAN, K. R. HINDS-FREY, A. ZILCH, and C. T. AMEMIYA. 1993. Phylogenetic diversification of immunoglobulin genes and the antibody repertoire. Mol. Biol. Evol. 10:60-72.
MCCORMACK, W. T., L. M. CARLSON, L. W. TJOELKER, and C. B. THOMPSON. 1989. Evolutionary comparison of the avian IgL locus: combinatorial diversity plays a role in the generation of the antibody repertoire in some avian species. Int. Immunol. 1:332-341.
MCCORMACK, W. T., and C. B. THOMPSON. 1990. Chicken IgL variable region gene conversions display pseudogene donor preference and 5' to 3' polarity. Genes Dev. 4:548-558.
MCCORMACK, W. T., L. W. TJOELKER, and C. B. THOMPSON. 1991. Avian B-cell development: generation of an immunoglobulin repertoire by gene conversion. Annu. Rev. Immunol. 9:219-241.
MATSUDA, F., K. H. LEE, S. NAKAI, T. SATO, M. KODAIRA, S. Q. ZONG, H. OHNO, S. FUKUHARA, and T. HONJO. 1988. Dispersed localization of D segments in the human immunoglobulin heavy-chain locus. EMBO J. 7:1047-1051.
MATSUDA, F., E. K. SHIN, H. NAGAOKA, R. MATSUMURA, M. HAINO, Y. FUKITA, S. TAKA-ISHI, T. IMAI, J. H. RILEY, R. ANAND, E. SOEDA, and T. HONJO. 1993. Structure and physical map of 64 variable segments in the 3' 0.8 megabase region of the human immunoglobulin heavy-chain locus. Nature Genet. 3:88-94.
MATSUNAGA, T., T. CHEN, and V. TÖRMÄNEN. 1990. Characterization of a complete immunoglobulin heavy-chain variable region germ-line gene of rainbow trout. Proc. Natl. Acad. Sci. USA 87:7767-7771.
MATSUO, Y., and T. YAMAZAKI. 1989. Nucleotide variation and divergence in the histone multigene family in Drosophila melanogaster. Genetics 122:87-97.
MATTHYSSENS, G., and T. H. RABBITTS. 1980. Structure and multiplicity of genes for the human immunoglobulin heavy chain variable region. Proc. Natl. Acad. Sci. USA 77:6561-6565.
MIYATA, T., and T. YASUNAGA. 1981. Rapidly evolving mouse α-globin-related pseudo gene and its evolutionary history. Proc. Natl. Acad. Sci. USA 78:450-453.
NEI, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.
NEI, M., and A. L. HUGHES. 1991. Polymorphism and evolution of the major histocompatibility complex loci in mammals. Pp. 222-247 in R. SELANDER, A. CLARK, and T. WHITTAM, eds. Evolution at the molecular level. Sinauer, Sunderland, Mass.
NEI, M., and A. L. HUGHES. 1992. Balanced polymorphism and evolution by the birth-and-death process in the MHC loci. Pp. 27-38 in K. TSUJI, M. AIZAWA, and T. SASAZUKI, eds. HLA 1991. Proceedings of the 11th Histocompatibility Workshop and Conference. Vol. 2. Oxford University Press, Oxford.
OHTA, T. 1980. Evolution and variation of multigene families. Springer, Berlin.
OHTA, T. 1983. On the evolution of multigene families. Theor. Popul. Biol. 23:216-240.
OHTA, T. 1992. A statistical examination of hypervariability in complementarity determining regions of immunoglobulins. Mol. Phylogenet. Evol. 1:305-311.
OLLO, R., C. AUFFRAY, J.-L. SIKORAV, and F. ROUGEON. 1981. Mouse heavy chain variable regions: nucleotide sequence of a germ-line VH gene segment. Nucleic Acids Res. 9:4099-4109.
OLLO, R., J.-L. SIKORAV, and F. ROUGEON. 1983. Structural relationships among mouse and human immunoglobulin VH genes in the subgroup III. Nucleic Acids Res. 11:7887-7897.
RATHBUN, G., J. BERMAN, G. YANCOPOULOS, and F. W. ALT. 1989. Organization and expression of the mammalian heavy-chain variable region locus. Pp. 63-90 in T. HONJO, F. W. ALT, and T. H. RABBITTS, eds. Immunoglobulin genes. Academic Press, London.
RATHBUN, G. A., F. OTANI, E. C. B. MILNER, J. D. CAPRA, and P. W. TUCKER. 1988. Molecular characterization of the A/J J558 family of heavy chain variable region gene segments. J. Mol. Biol. 202:383-395.
REYNAUD, C.-A., V. ANQUEZ, H. GRIMAL, and J.-C. WEILL. 1987. A hyperconversion mechanism generates the chicken light chain preimmune repertoire. Cell 48:379-388.
REYNAUD, C.-A., A. DAHAN, V. ANQUEZ, and J.-C. WEILL. 1989. Somatic hyperconversion diversifies the single VH gene of the chicken with a high incidence in the D region. Cell 59:171-183.
RZHETSKY, A., and M. NEI. 1992. A simple method for estimating and testing minimum-evolution trees. Mol. Biol. Evol. 9:945-967.
RZHETSKY, A., and M. NEI. 1993. Theoretical foundation of the minimum-evolution method of phylogenetic inference. Mol. Biol. Evol. 10:1073-1095.
SAKANO, H., R. MAKI, Y. KUROSAWA, W. ROEDER, and S. TONEGAWA. 1980. Two types of somatic recombination are necessary for the generation of complete immunoglobulin heavy-chain genes. Nature 286:676-683.
SCHIFF, C., M. MILILI, and M. FOUGEREAU. 1985. Functional and pseudogenes are similarly organized and may equally contribute to the extensive antibody diversity of the IgVHII family. EMBO J. 4:1225-1230.
SCHIFF, C., M. MILILI, I. HUE, S. RUDIKOFF, and M. FOUGEREAU. 1986. Genetic basis for expression of the idiotypic network. J. Exp. Med. 163:573-587.
SCHROEDER, H. W. JR., J. L. HILLSON, and R. M. PERLMUTTER. 1990. Structure and evolution of mammalian VH families. Int. Immunol. 2:41-50.
SCHROEDER, H. W. JR., M. A. WALTER, M. H. HOFKER, A.
EBENS, K. W. VAN DIJK, L. C. LIAO, D. W. COX, E. C. B. MILNER, and R. M. PERLMUTTER. 1988. Physical linkage of a human immunoglobulin heavy chain variable region gene segment to diversity and joining region elements. Proc. Natl. Acad. Sci. USA 85:8196-8200.
SCHWAGER, J., N. BURCKERT, M. COURTET, and L. DU PASQUIER. 1989. Genetic basis of the antibody repertoire in Xenopus: analysis of the VH diversity. EMBO J. 8:2989-3001.
SHAMBLOTT, M. J., and G. W. LITMAN. 1989. Genomic organization and sequences of immunoglobulin light chain genes in a primitive vertebrate suggest coevolution of immunoglobulin gene organization. EMBO J. 8:3733-3739.
SHIN, E. K., F. MATSUDA, H. NAGAOKA, Y. FUKITA, T. IMAI, K. YOKOYAMA, E. SOEDA, and T. HONJO. 1991. Physical map of the 3' region of the human immunoglobulin heavy chain locus: clustering of autoantibody-related variable segments in one haplotype. EMBO J. 10:3641-3645.
SIU, G., E. A. SPRINGER, H. V. HUANG, L. E. HOOD, and S. T. CREWS. 1987. Structure of the T15 VH gene subfamily: identification of immunoglobulin gene promoter homologies. J. Immunol. 138:4466-4471.
SLIGHTOM, J. L., A. E. BLECHL, and O. SMITHIES. 1980. Human fetal Gγ- and Aγ-globin genes: complete nucleotide sequences suggest that DNA can be exchanged between these duplicated genes. Cell 21:627-638.
SMITH, G. P. 1974. Unequal crossover and the evolution of multigene families. Cold Spring Harb. Symp. Quant. Biol. 38:507-513.
SMITH, G. P., L. HOOD, and W. M. FITCH. 1971. Antibody diversity. Annu. Rev. Biochem. 40:969-1012.
STONEKING, M. 1993. DNA and recent human evolution. Evol. Anthropol. 2:60-73.
TAKAHATA, N., and M. NEI. 1990. Allelic genealogy under overdominant and frequency-dependent selection and polymorphism of major histocompatibility complex loci. Genetics 124:967-978.
TANAKA, T., and M. NEI. 1989. Positive Darwinian selection observed at the variable-region genes of immunoglobulins. Mol. Biol. Evol. 6:447-459.
THOMPSON, C. B., and P. E. NEIMAN. 1987. Somatic diversification of the chicken immunoglobulin light chain gene is limited to the rearranged variable gene segment. Cell 48:369-378.
TONEGAWA, S. 1983. Somatic generation of antibody diversity. Nature 302:575-581.
TUTTER, A., and R. RIBLET. 1989a. Conservation of an immunoglobulin variable-region gene family indicates a specific, noncoding function. Proc. Natl. Acad. Sci. USA 86:7460-7464.
TUTTER, A., and R. RIBLET. 1989b. Evolution of the immunoglobulin heavy chain variable region (Igh-V) locus in the genus Mus. Immunogenetics 30:315-329.
VAZQUEZ, M., N. MIZUKI, M. F. FLAJNIK, E. C. MCKINNEY, and M. KASAHARA. 1992. Nucleotide sequence of a nurse shark immunoglobulin heavy chain cDNA clone. Mol. Immunol. 29:1157-1158.
WARR, G. W., D. L. MIDDLETON, N. W. MILLER, L. W. CLEM, and M. R. WILSON. 1991. An additional family of VH sequences in the channel catfish. Eur. J. Immunogen. 18:393-397.
WILSON, M. R., D. MIDDLETON, and G. W. WARR. 1991. Immunoglobulin VH genes of the goldfish Carassius auratus: a re-examination. Mol. Immunol. 28:449-457.
YAMASAKI-KATAOKA, Y., and T. HONJO. 1987. Nucleotide sequences of variable region segments of the immunoglobulin heavy chain of Xenopus laevis. Nucleic Acids Res. 15:5888.
ZIMMER, E. A., S. L. MARTIN, S. M. BEVERLEY, Y. W. KAN, and A. C. WILSON. 1980. Rapid duplication and loss of genes coding for the α chains of hemoglobin. Proc. Natl. Acad. Sci. USA 77:2158-2162.
JAN KLEIN, reviewing editor
Received November 5, 1993
Accepted January 19, 1994