Complex History of a Chromosomal Paralogy Region: Insights from
Amphioxus Aromatic Amino Acid Hydroxylase Genes and
Insulin-Related Genes
Simon J. Patton, Graham N. Luke, and Peter W. H. Holland
School of Animal and Microbial Sciences, University of Reading, Whiteknights, U.K.
Aromatic amino acid hydroxylase (AAAH) genes and insulin-like genes form part of an extensive paralogy region
shared by human chromosomes 11 and 12, thought to have arisen by tetraploidy in early vertebrate evolution.
Cloning of a complementary DNA (cDNA) for an amphioxus (Branchiostoma floridae) hydroxylase gene
(AmphiPAH) allowed us to investigate the ancestry of the human chromosome 11/12 paralogy region. Molecular
phylogenetic evidence reveals that AmphiPAH is orthologous to vertebrate phenylalanine hydroxylase (PAH) genes; the implication is that all three vertebrate AAAH genes arose early in metazoan evolution, predating vertebrates. In contrast,
our phylogenetic analysis of amphioxus and vertebrate insulin-related gene sequences is consistent with duplication
of these genes during early chordate ancestry. The conclusion is that two tightly linked gene families on human
chromosomes 11 and 12 were not duplicated coincidentally. We rationalize this paradox by invoking gene loss in
the AAAH gene family and conclude that paralogous genes shared by paralogous chromosomes need not have
identical evolutionary histories.
Introduction
The past decade has witnessed a rapid expansion
in the number of mammalian genes mapped to chromosomal locations. As data have accumulated, patterns
have become apparent. Chromosomal paralogy regions
are sets of linked genes, each having homologues linked
on another chromosome (Lundin 1993). For example,
the genes for phenylalanine hydroxylase, FGF6, insulin-like growth factor 1 (IGF-1), lactate dehydrogenase B,
two myogenic bHLH proteins, K-ras, and parathyroid hormone-like hormone map to human chromosome
12; all these genes have close homologues on human
chromosome 11. The simplest explanation for this pattern is common ancestry: human chromosomes 11 and
12 probably arose by duplication of an ancestral chromosome, perhaps by tetraploidy of the genome (Lundin
1993).
There is now considerable evidence that tetraploidy
did occur during early vertebrate evolution. Susumu
Ohno was the first to seriously suggest this notion in his
seminal book (Ohno 1970). More recently, analysis of
gene family complexity in vertebrates and in the closely
related invertebrate, amphioxus, has given strong support to the notion that the genome underwent at least
one, and possibly two, rounds of tetraploidy in early
vertebrate evolution (Holland et al. 1994; Sharman and
Holland 1996; Sharman, Hay-Schmidt, and Holland
1997).
If chromosomal paralogy regions arose by
tetraploidy during early vertebrate evolution, it seems
reasonable to assume that each pair of related genes
arose simultaneously at this time. This assumption has
never been adequately tested, since it requires DNA sequence information from a close outgroup for each of
two or more linked gene families within a paralogy
group. Amphioxus may be the ideal outgroup for such
analyses since it is the sister group of the vertebrates
(defined here as synonymous with craniates) and is
thought to have branched from the chordate lineage just
before the putative tetraploidy events. Within the chromosome 11/12 paralogy group, the physical linkage between aromatic amino acid hydroxylase (AAAH) genes
and insulin-like genes is particularly close. Chromosome
11 has tryptophan hydroxylase (TPH) and tyrosine hydroxylase (TH), while chromosome 12 has phenylalanine hydroxylase (PAH). Chromosome 11 has insulin
and IGF-2, while chromosome 12 has IGF-1 (fig. 1). An
amphioxus homologue of the vertebrate insulin-like
genes (insulin, IGF-1, and IGF-2) has already been
cloned (Chan, Cao, and Steiner 1990) and named insulin-like peptide (ILP). The physically closest genes to
IGF-1 on human chromosome 12 and to IGF-2 and to
insulin on human chromosome 11 are members of the
AAAH gene family. We therefore elected to clone an
amphioxus member of this gene family. Molecular phylogenetic analyses of the amphioxus AAAH gene and
ILP gene sequences, in relation to vertebrate and other
metazoan homologues, revealed the dates of gene duplication in each gene family. We find that the gene
duplication yielding AAAH genes on chromosomes 11
and 12 occurred at a vastly different time from the duplication that yielded IGF genes on chromosomes 11
and 12.
Key words: amphioxus, paralogy region, chromosome evolution,
molecular phylogeny, phenylalanine hydroxylase, insulin.
Address for correspondence and reprints: Peter W. H. Holland,
School of Animal and Microbial Sciences, University of Reading,
Whiteknights, U.K. E-mail: [email protected].
Mol. Biol. Evol. 15(11):1373–1380. 1998
© 1998 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
Materials and Methods
Isolation and Characterization of cDNA Clones for
AAAH Genes
An amphioxus cDNA library in λ-ZAPII (made
from 5- to 24-h embryos and a gift from J. Langeland,
University of Kalamazoo, Mich.) was screened with a
probe (p299) containing exons 6 to 11 of a genomic
cosmid clone for amphioxus PAH (Patton and Holland,
unpublished data). The genomic clone was originally
found by library screening with a polymerase chain reaction fragment amplified with primers designed from
vertebrate AAAH genes. Hybridizations were carried
out in 7% SDS and 250 mM sodium phosphate buffer
pH 7.2 at 65°C, and hybridizing clones were plaque purified. cDNA was rescued into pBSII SK+ (Stratagene),
restriction mapped, and sequenced in both directions by
a combination of subcloning and walking with specific
oligonucleotide primers.
FIG. 1—Chromosomal paralogy region shared by human chromosomes 11 and 12. Examples of genes mapping to 11p15 are shown;
the dotted lines indicate gene family relationships to genes mapping
to 12p12 or to 12q21–22. Chromosome 12 is drawn upside down to
facilitate comparison. Map positions are from Online Mendelian Inheritance in Man at http://www.ncbi.nlm.nih.gov/Omim/.
Molecular Phylogenetic Analysis of AAAH Genes and
Insulin-Related Genes
Aromatic amino acid hydroxylase gene or cDNA
sequences were obtained through text and sequence similarity searches against GenBank databases including
data from the Nematode Genome Sequencing Project
(Sanger Centre, Hinxton, Cambridge, U.K.). The sequences used were the following: human TH type 1
(Grima et al. 1987; GenBank X05290); human TPH
(Boularand et al. 1990; GenBank X52836); human PAH
(Kwok et al. 1985; GenBank K03020); bovine TH
(D’Mello et al. 1988; SwissProt P17289); rabbit TPH
(Grenett et al. 1987; GenBank M17250); mouse TH
(Ichikawa, Sasaoka, and Nagatsu 1991; GenBank
M69200); mouse TPH (Stoll, Kozak, and Goldman
1990; GenBank J04758); mouse PAH (Ledley et al.
1990; GenBank X51942); quail TH (Fauquet et al. 1988;
GenBank M24778); chick TPH (Florez et al. 1996;
GenBank U26428); frog TPH (Green and Besharse
1994; GenBank L20679); fly TH (Neckameyer and
Quinn 1989; SwissProt P18459); fly PAH (Neckameyer
and White 1992; SwissProt P17276); and nematode TH,
TPH, and PAH (unpublished sequences citing Wilson et
al. 1994; WEBACE B0432.5, C36H8.3, and K08F8.4).
Vertebrate, amphioxus, Drosophila, and Caenorhabditis
AAAH protein sequences were aligned using the CLUSTAL V program (Higgins, Bleasby, and Fuchs 1991). A
distance matrix was constructed from the confidently
alignable regions of all sequences (323 sites), excluding
regions of dubious alignment, and the phylogeny was
deduced by the neighbor-joining method of the PHYLIP
package, version 3.572c (Felsenstein 1993). Confidence
in each node of the phylogeny was assessed by performing 100 bootstrap resamplings of the data. Neighbor
joining does not assume a molecular clock or a root
position.
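The distance, neighbor-joining, and bootstrap steps described above can be illustrated with a minimal Python sketch. This is not the PHYLIP implementation used in the study. It assumes a simple uncorrected p-distance (the distance model actually applied is not stated here), it reports only tree topology as a set of splits rather than branch lengths, and the function names (`p_distance`, `nj_splits`, `bootstrap_support`) and toy sequences are hypothetical.

```python
import random
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_matrix(aln):
    """aln maps taxon name -> aligned sequence; result is keyed by unordered pairs."""
    return {frozenset(pair): p_distance(aln[pair[0]], aln[pair[1]])
            for pair in combinations(aln, 2)}

def nj_splits(dist, taxa):
    """Run neighbor joining; return the non-trivial splits of the unrooted tree."""
    d = dict(dist)
    members = {t: frozenset([t]) for t in taxa}  # leaves below each node
    everything = frozenset(taxa)
    nodes = list(taxa)
    splits = set()
    while len(nodes) > 3:
        n = len(nodes)
        # r[i]: summed distance from node i to every other active node
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r[i] - r[j]
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        u = (i, j)  # label for the new internal node
        rest = [k for k in nodes if k not in (i, j)]
        for k in rest:
            d[frozenset((u, k))] = 0.5 * (d[frozenset((i, k))]
                                          + d[frozenset((j, k))]
                                          - d[frozenset((i, j))])
        members[u] = members[i] | members[j]
        splits.add(frozenset((members[u], everything - members[u])))
        nodes = rest + [u]
    return splits

def bootstrap_support(aln, n_reps=100, seed=0):
    """Percentage of column-resampled replicates in which each split is recovered."""
    rng = random.Random(seed)
    taxa = list(aln)
    length = len(aln[taxa[0]])
    counts = {}
    for _ in range(n_reps):
        cols = [rng.randrange(length) for _ in range(length)]  # sample sites with replacement
        resampled = {t: "".join(aln[t][c] for c in cols) for t in taxa}
        for split in nj_splits(distance_matrix(resampled), taxa):
            counts[split] = counts.get(split, 0) + 1
    return {split: 100 * c / n_reps for split, c in counts.items()}
```

Because the splits are unordered bipartitions of the taxa, the result is independent of any root position, in keeping with the unrooted analysis described above.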
An identical strategy was undertaken for the insulin-related genes, except that a greater diversity of species could be included. We included human INS (Bell
et al. 1980; GenBank J00265), human IGF-1 (Rotwein
et al. 1986; SwissProt P05019), human IGF-2 (Dull et
al. 1984; SwissProt P01344), rat insulin (Lomedico et
al. 1979; SwissProt P01323), rat IGF-1 (Shimatsu and
Rotwein 1987; SwissProt P08025), mouse IGF-2 (Rotwein and Hall 1990; GenBank U71085), rabbit insulin
(Devaskar et al. 1994; GenBank U03610), chick insulin
(Perler et al. 1980; SwissProt P01332), chick IGF-1 (Kajimoto and Rotwein 1989; SwissProt P18254), frog insulin (Shuldiner et al. 1989; SwissProt P12707), frog
IGF-1 (Kajimoto and Rotwein 1990; SwissProt
P16501), salmon insulin (Koval, Petrenko, and Kavsan
1989; SwissProt P04667), trout IGF-1 and IGF-2
(Shamblott and Chen 1992; SwissProt Q02815 and
Q02816), dogfish insulin (Bajaj et al. 1983; SwissProt
P12704), dogfish IGF-1 and IGF-2 (Duguay et al. 1995;
GenBank Z50081 and Z50082), hagfish insulin (Chan
et al. 1981; SwissProt P01342), hagfish IGF (Nagamatsu
et al. 1991; SwissProt P22618), amphioxus ILP (Chan,
Cao, and Steiner 1990; SwissProt P22334), and ascidian
insulin and IGF (McRory and Sherwood 1997). Analysis was restricted to the confidently alignable A and B
domains (50 sites); large regions of extreme length variability outside these domains (or inside for ascidian
IGF) were deliberately excluded to avoid problems with
gap weighting. Both analyses were performed without
outgroups, since alignments suggested that prokaryotic
AAAH genes (Zhao et al. 1994) and relaxin-like genes
(Hudson et al. 1983) were too distantly related to provide reliable root positions.
Results
Cloning of an Amphioxus AAAH
Screening an amphioxus cDNA library with a genomic AAAH probe identified six hybridizing clones.
Sequencing revealed that all clones derived from the
same gene: all six clones started from the same 5′ EcoRI
site. There was variation in the 3′ untranslated region
(UTR) between clones, as assessed by length and restriction endonuclease mapping; we conclude that there
is polymorphism within the 3′ UTR. The longest clone
was fully sequenced and was found to include a
438-amino acid open reading frame (fig. 2); this remains in
frame at the 5′ end. Alignment to known AAAH protein
sequences suggests that the clone is missing approximately 14 codons (3%) of the protein. Since the extreme
N-terminus of AAAH proteins is highly variable, it
would be uninformative for our phylogenetic analyses
in any case, and the near-full-length clone obtained is
likely to contain ample phylogenetic information. A putative polyadenylation signal was found starting at nucleotide 2837, followed 17 bp later by the poly(A)10
tail. Database searches using the deduced protein sequence revealed close similarity to known eukaryote
AAAH genes, with a particularly strong similarity to
PAH genes. On the basis of similarity, we designated
this cDNA AmphiPAH; molecular phylogenetics provide
a test of this hypothetical assignment.
FIG. 2—Nucleotide and deduced amino acid sequence of amphioxus aromatic amino acid hydroxylase (AAAH) cDNA, AmphiPAH. Amino
acid residues shown in bold cover the region used in phylogenetic analyses, after optimal alignment to 16 AAAH sequences. Asterisk, stop
codon; underline, polyadenylation signal. GenBank/EMBL AJ001677.
Molecular Phylogeny of the AAAH Gene Family
To determine the phylogenetic relationship of
AmphiPAH to other aromatic amino acid hydroxylase
genes, we first aligned it with the deduced protein sequences of all eukaryote AAAH genes. This included
three unpublished nematode AAAH sequences deposited in GenBank by the Caenorhabditis elegans genome-sequencing project. All sequences aligned extremely
well across the C-terminal two-thirds of each enzyme.
Seventy residues were absolutely conserved in all eukaryote hydroxylase proteins. A phylogenetic tree constructed by neighbor joining is shown in figure 3 with
confidence assessed by bootstrapping.
FIG. 3—Phylogenetic relationships of eukaryote aromatic amino acid hydroxylase (AAAH) proteins, inferred by the neighbor-joining
method. Figures on branches signify the percentage of times the sequences to the right of that node were found together following bootstrap
resampling of the data. Nodes receiving below 60% bootstrap support were collapsed, leaving only well-supported aspects of the tree. Branch
lengths are to scale; the scale bar denotes 0.1 inferred mutations per site. The tree is not rooted because of the absence of a suitable outgroup;
nonetheless, TH, TPH, and PAH clearly fall as distinct genes across all metazoan taxa analyzed.
The results show that each of the three hydroxylase
gene types (TH, TPH, and PAH) forms a monophyletic
group supported by very high bootstrap values. The position of the amphioxus hydroxylase gene confirms that
it is directly orthologous to human PAH; the same conclusion applies to the Drosophila gene previously described
as a possible precursor of PAH and TPH (Neckameyer and White 1992). The three nematode AAAH
genes sequenced by the nematode genome project are
clearly orthologous to TH, TPH, and PAH. The tree is
consistent with the hypothesis of two separate phases of
gene duplications in this gene family giving rise to three
genes (TH, TPH, and PAH), but we cannot firmly deduce the relative order of the two duplications because
of the absence of an outgroup. The clear conclusion is
that all three gene types (TH, TPH, and PAH) originated
by duplication prior to the divergence of nematodes, arthropods,
and chordates and hence well before the origin
of vertebrates. Although amphioxus TPH, Drosophila
TPH, and amphioxus TH have not yet been cloned, their
existence is strongly predicted by the topology of the
phylogenetic tree.
FIG. 4—Phylogenetic relationships of insulin-related proteins, inferred by the neighbor-joining method. Percentage bootstrap support is
shown only where this reaches 60% or above; other nodes are collapsed. Scale denotes 0.1 mutations per site.
Molecular Phylogeny of the Insulin Gene Family
An amphioxus gene for insulin-like peptide (ILP)
was cloned by Chan, Cao, and Steiner (1990); its
phylogenetic relationships were analyzed by Ellsworth,
Hewett-Emmett, and Li (1994). We have refined this
analysis by incorporating the recently published sequences of insulin gene family members from the elasmobranch Squalus acanthias (Duguay et al. 1995) and
the ascidian Chelyosoma productum (McRory and Sherwood 1997), since these give further insight into the
timing of gene duplication in the insulin gene family.
Unlike the case with the AAAH genes, we find that the
metazoan insulin-related genes do not all fall into the
three groups characteristic of mammals (fig. 4). For example,
the hagfish IGF gene is found to be a possible
outgroup to the IGF-1 and IGF-2 groups of genes. Furthermore, our phylogenetic analysis conclusively demonstrates that neither amphioxus ILP nor ascidian IGF
genes are direct orthologues of either IGF-1 or IGF-2
genes; they are most probably direct descendants of an
ancestral IGF. This implies that the gene duplication that
gave rise to IGF-1 and IGF-2 occurred after the origin
of vertebrates. The phylogenetic position of ascidian insulin is more confusing; our analyses suggest it could
be a descendant of the precursor to insulin and both IGF
genes. Further insight into the relationships between
these genes comes from examination of their gene structures, including presence of particular protein domains.
The deduced protein products of amphioxus ILP and
ascidian IGF each have a C-terminal extension (domains
D and E) characteristic of vertebrate IGF proteins, not
present in insulin. This argues that amphioxus ILP and
ascidian IGF are true IGF genes, not insulin genes. Both
lines of evidence point to one conclusion: IGF-1 and
IGF-2 arose within the vertebrates.
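The topological reasoning used to date each duplication relative to a lineage split can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis performed here: the trees are hypothetical nested tuples with an arbitrarily chosen root (the published trees are unrooted), the leaf names are invented for the example, and the helper names are not from any library. If the outgroup species' gene falls inside the smallest clade containing two paralogues, the duplication that separated them predates the divergence of that species; if it falls outside, the topology is consistent with a later duplication.

```python
def leaves(node):
    """Set of leaf names under a nested-tuple tree node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))

def smallest_clade(node, targets):
    """Smallest subtree of `node` whose leaves include every name in `targets`."""
    if isinstance(node, str):
        return node
    for child in node:
        if targets <= leaves(child):
            return smallest_clade(child, targets)
    return node

def duplication_predates(tree, paralogues, outgroup_gene):
    """True if `outgroup_gene` nests inside the clade spanned by `paralogues`."""
    clade = smallest_clade(tree, set(paralogues))
    return outgroup_gene in leaves(clade)

# Hypothetical, simplified topologies for illustration only.
aaah_tree = (("human_TH", "fly_TH"),
             (("human_TPH", "rabbit_TPH"), ("human_PAH", "amphioxus_PAH")))
igf_tree = ("amphioxus_ILP",
            ("hagfish_IGF",
             (("human_IGF1", "frog_IGF1"), ("human_IGF2", "trout_IGF2"))))

aaah_old = duplication_predates(aaah_tree, ["human_TPH", "human_PAH"], "amphioxus_PAH")
igf_old = duplication_predates(igf_tree, ["human_IGF1", "human_IGF2"], "amphioxus_ILP")
print("AAAH duplication predates amphioxus split:", aaah_old)
print("IGF duplication predates amphioxus split:", igf_old)
```

In this toy setting the AAAH check returns true and the IGF check returns false, mirroring the contrast between the two gene families. The outcome depends on the assumed rooting, so the sketch illustrates the logic and is not a test of it.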
Discussion
We chose to analyze AAAH genes and insulin-related genes as test cases to investigate the hypothesis
that paralogy regions are reflections of tetraploidy in
early vertebrate evolution. When a set of linked genes
on one chromosome has linked relatives on another, it
seems reasonable to assume that each paralogous pair
arose simultaneously, by chromosome duplication.
However, our characterization of an amphioxus AAAH
gene and our molecular phylogenetic analyses of AAAH
genes and insulin-related genes have yielded an unexpected result. We find that the AAAH genes on both
human chromosomes 11 and 12 arose very early in
metazoan evolution. The gene duplication events yielding TH, PAH, and TPH predate the divergence of nematodes, arthropods, and chordates, well back into the
Precambrian era. In contrast, the neighboring IGF genes
on the same chromosomes arose much later, during
chordate radiation (postdating the hagfish/gnathostome
divergence but predating the chondrichthyan/ray-finned
fish/tetrapod divergences). The IGF duplication is dated
to the time of the putative tetraploidy events in early
vertebrate evolution, for which much supporting data
have now accumulated (Holland et al. 1994; Holland
1996; Sharman and Holland 1996). This apparent paradox raises important questions concerning the nature
and origin of chromosomal paralogy regions. The resolution that we propose below suggests that paralogy
regions have a more complex history than previously
recognized and also highlights the factors that affect the
rates of gene loss in evolution.
The PAH gene is on chromosome 12 at 12q21.1,
whereas TH and TPH map to 11p15.5. The IGF-1 gene
maps adjacent to PAH, whereas IGF-2 is close to TH
and TPH. The genes lie within a well-characterized and
much-cited chromosomal paralogy region, with over 10
different gene families mapping to both chromosome
regions (Lundin 1993; fig. 1). Considering the large
number of gene families involved, it seems an inescapable conclusion that a substantial part of human chromosome 11 is evolutionarily related to part of chromosome 12, probably through chromosome duplication due
to tetraploidy (Brissenden, Ullrich, and Francke 1984;
Craig et al. 1986; Morton et al. 1986; Ledley et al. 1987;
Lundin 1993). The discrepancies in the locations of
some genes are likely to be due to pericentric inversion
and/or translocations altering gene order on chromosome 12 compared with that on chromosome 11 (Brissenden, Ullrich, and Francke 1984; fig. 1). If we accept
that human chromosomes 11 and 12 are evolutionarily
related, how can the different duplication dates of
AAAH and IGF genes be explained?
FIG. 5—Proposed model for the evolution of the aromatic amino
acid hydroxylase (AAAH) genes and insulin-related genes within a
chromosomal paralogy group on human chromosomes (HSA) 11 and
12. Each gene is depicted by a box (hatched boxes, AAAH genes;
dotted boxes, insulin-related genes).
We propose that in the genome of an early chordate, a linked array of genes existed comprising TH,
PAH, TPH, insulin, and IGF (fig. 5). Of these, at least
TH, PAH, and TPH can be traced further back into early
metazoan evolution, although when they became physically
linked to insulin-related gene(s) cannot yet be deduced. During the early stages of vertebrate evolution,
the chromosome containing this cluster of genes duplicated to yield two paralogous chromosomes, each containing linked copies of TH, PAH, TPH, insulin, and
IGF. We propose that this occurred during tetraploidy of
the whole genome, and we date this event to soon after
the divergence of jawed vertebrates from the ancestral
jawless vertebrates. This equates to the second of the
two bouts of gene duplication during vertebrate evolution proposed by Holland et al. (1994) and Sharman and
Holland (1996). After tetraploidy, the two IGF genes
diverged in function, yielding IGF-1 and IGF-2. By contrast, the duplicate copies of TH, PAH, TPH, and insulin
were all purged from the genome. Presumably, deletions, frameshifts, and nonsense mutations corroded duplicate copies to pseudogenes and ultimately deleted
them. If loss of function occurred relatively soon after
tetraploidy, as predicted by theoretical considerations
(Marshall, Raff, and Raff 1994), it is not surprising that
all traces of the duplicate genes have been removed from
the genome during the 450 million years since tetraploidization. An interesting corollary of our proposed
model is that some of the duplicate genes must have
been deleted from each of the two daughter chromosomes. Thus, the genomic region that is now human
chromosome 11 has lost a duplicate copy of the PAH
gene; the loss has been more extensive on chromosome
12, which has been purged of duplicate TH, TPH, and
insulin genes (fig. 5).
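The proposed model can be written out as a short Python sketch: one linked array is duplicated, and different duplicates are then purged from each daughter chromosome. The gene order in the ancestral array is arbitrary here (the order is not specified by the model), `INS` stands for insulin, and all function and variable names are hypothetical.

```python
# Ancestral linked array proposed for an early chordate; order is an assumption.
ANCESTRAL_ARRAY = ["TH", "TPH", "PAH", "INS", "IGF"]

def tetraploidize(array):
    """Whole-genome duplication: one linked array becomes two identical copies."""
    return list(array), list(array)

def purge(array, lost):
    """Drop genes whose duplicate copy decayed and was deleted from this chromosome."""
    return [gene for gene in array if gene not in lost]

def retained_duplicates(chrom_a, chrom_b):
    """Genes still present on both daughters, i.e. true products of the duplication."""
    return sorted(set(chrom_a) & set(chrom_b))

copy_a, copy_b = tetraploidize(ANCESTRAL_ARRAY)
hsa11 = purge(copy_a, {"PAH"})               # lineage leading to human chromosome 11
hsa12 = purge(copy_b, {"TH", "TPH", "INS"})  # lineage leading to human chromosome 12
duplicates = retained_duplicates(hsa11, hsa12)

print("HSA11 lineage:", hsa11)
print("HSA12 lineage:", hsa12)
print("retained on both:", duplicates)
```

Only IGF survives on both daughters (later diverging into IGF-2 and IGF-1), whereas the AAAH genes that end up on the two chromosomes are different members of an array that already existed before tetraploidy.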
These conclusions have several implications for genome evolution. Most important, this example demonstrates that the net result of tetraploidy is not always
gene duplication. Instead, as in the case of the AAAH
genes, the net result can be transfer of a gene from one
chromosome to another. Hence, arrays of paralogous
genes on two chromosomes are likely to include some
genes related by duplication during tetraploidy and others related by gene transfer during tetraploidy. This implies that attempts to date chromosome or genome duplications using dates of gene duplications must proceed
with caution. Certainly, it is not valid to use congruence
of gene duplication dates within paralogous chromosomes as a test of the tetraploidy hypothesis. Likewise,
caution must be exercised in extrapolating gene duplication dates from one gene family to its neighboring
gene family (Bailey et al. 1997). More generally, our
findings highlight differential gene loss as another factor
contributing to the complex evolution of multigene families, alongside gene duplication, gene conversion, rate
differences (Williams and Holland 1998), and convergent or parallel evolution (Stewart, Schilling, and Wilson 1987).
Finally, we note that there is a wide variance in the
degree of gene loss following tetraploidy. All duplicated
AAAH genes have been lost, in stark contrast to a large
number of developmentally expressed genes in which
duplicate genes have been retained (e.g., Hox gene clusters, Msx homeobox genes, myogenic genes, and IGF;
Holland 1996). We speculate that the difference in extent of gene loss is related to the ease with which genes
may acquire novel, selectively advantageous roles after
duplication. Ubiquitous or broadly expressed genes, particularly those with enzymatic functions, may require
rare and multiple substitutions in the protein-coding sequence to alter function. These may be so infrequent
that genes are likely to be lost before novel functions
arise. In contrast, genes with very localized sites of expression, particularly developmentally expressed genes,
may be functionally adapted more simply, by loss or
gain of enhancer sequences, creating novel sites of expression. We suggest that these genes are more likely to
acquire new roles and to be retained after duplication.
Acknowledgments
P.W.H.H. thanks a critical seminar audience at
Cambridge University, U.K., for help in rationalizing
apparently contradictory results; Richard Sandford for
advice at the earliest stages of this project; Seb Shimeld
and Hiroshi Wada for critical comment; and Lars-G.
Lundin for sparking an interest in paralogy. This work
was funded by the Medical Research Council, U.K., and
the University of Reading.
LITERATURE CITED
BAILEY, W. J., J. KIM, G. P. WAGNER, and F. H. RUDDLE. 1997.
Phylogenetic reconstruction of vertebrate Hox cluster duplications. Mol. Biol. Evol. 14:843–853.
BAJAJ, M., T. L. BLUNDELL, J. E. PITTS et al. (12 co-authors).
1983. Dogfish insulin. Primary structure, conformation and
biological properties of an elasmobranch insulin. Eur. J.
Biochem. 135:535–542.
BELL, G. I., R. L. PICTET, W. J. RUTTER, B. CORDELL, E.
TISCHER, and H. M. GOODMAN. 1980. Sequence of the human insulin gene. Nature 284:26–32.
BOULARAND, S., M. C. DARMON, Y. GANEM, J. M. LAUNAY,
and J. MALLET. 1990. Complete coding sequence of human
tryptophan hydroxylase. Nucleic Acids Res. 18:4257.
BRISSENDEN, J. E., A. ULLRICH, and U. FRANCKE. 1984. Human chromosomal mapping of genes for insulin-like growth
factors 1 and 2 and epidermal growth factor. Nature 310:
781–784.
CHAN, S. J., Q. P. CAO, and D. F. STEINER. 1990. Evolution of
the insulin superfamily: cloning of a hybrid insulin/insulin-like growth factor cDNA from amphioxus. Proc. Natl.
Acad. Sci. USA 87:9319–9323.
CHAN, S. J., S. O. EMDIN, S. C. KWOK, J. M. KRAMER, S.
FALKMER, and D. F. STEINER. 1981. Messenger RNA sequence and primary structure of preproinsulin in a primitive
vertebrate, the Atlantic hagfish. J. Biol. Chem. 256:7595–
7602.
CRAIG, S. P., V. J. BUCKLE, A. LAMOUROUX, J. MALLET, and
I. CRAIG. 1986. Localisation of the human tyrosine hydroxylase gene to 11p15: gene duplication and evolution of metabolic pathways. Cytogenet. Cell Genet. 42:29–32.
DEVASKAR, S. U., S. J. GIDDINGS, P. A. RAJAKUMAR, L. R.
CARNAGHI, R. K. MENON, and D. S. ZAHM. 1994. Insulin
gene expression and insulin synthesis in mammalian neuronal cells. J. Biol. Chem. 269:8445–845.
D’MELLO, S. R., E. P. WEISBERG, M. K. STACHOWIAK, L. M.
TURZAI, A. E. GIOIO, and B. B. KAPLAN. 1988. Isolation
and nucleotide sequence of a cDNA clone encoding bovine
adrenal tyrosine hydroxylase: comparative analysis of tyrosine hydroxylase gene products. J. Neurosci. Res. 19:
440–449.
DUGUAY, S. J., S. J. CHAN, T. P. MOMMSEN, and D. F. STEINER.
1995. Divergence of insulin-like growth factors 1 and 2 in
the elasmobranch, Squalus acanthias. FEBS Lett. 371:69–
72.
DULL, T. J., A. GRAY, J. S. HAYFLICK, and A. ULLRICH. 1984.
Insulin-like growth factor 2 precursor gene organization in
relation to insulin gene family. Nature 310:777–781.
ELLSWORTH, D. L., D. HEWETT-EMMETT, and W.-H. LI. 1994.
Evolution of base composition in the insulin and insulin-like growth factor genes. Mol. Biol. Evol. 11:875–885.
FAUQUET, M., B. GRIMA, A. LAMOUROUX, and J. MALLET.
1988. Cloning of quail tyrosine hydroxylase: amino acid
homology with other hydroxylases discloses functional domains. J. Neurochem. 50:142–148.
FELSENSTEIN, J. 1993. PHYLIP. Version 3.572c. University of
Washington, Seattle.
FLOREZ, J. C., K. J. SEIDENMAN, R. K. BARRETT, A. M. SANGORAM, and J. S. TAKAHASHI. 1996. Molecular cloning of
chick pineal tryptophan-hydroxylase and circadian oscillation of its messenger-RNA levels. Mol. Brain Res. 42:25–
30.
GREEN, C. B., and J. C. BESHARSE. 1994. Tryptophan hydroxylase expression is regulated by a circadian clock in Xenopus laevis retina. J. Neurochem. 62:2420–2428.
GRENETT, H. E., F. D. LEDLEY, L. L. REED, and S. L. WOO.
1987. Full-length cDNA for rabbit tryptophan hydroxylase:
functional domains and evolution of aromatic amino acid
hydroxylases. Proc. Natl. Acad. Sci. USA 84:5530–5534.
GRIMA, B., A. LAMOUROUX, C. BONI, J.-F. JULIEN, F. JAVOY-AGID, and J. MALLET. 1987. A single human gene encoding
multiple tyrosine hydroxylases with different predicted
functional characteristics. Nature 326:707–711.
HIGGINS, D. G., A. J. BLEASBY, and R. FUCHS. 1991. CLUSTAL V: improved software for multiple sequence alignment. CABIOS. 8:189–191.
HOLLAND, P. W. H. 1996. Molecular biology of lancelets: insights into development and evolution. Isr. J. Zool. 42:
S247–S272.
HOLLAND, P. W. H., J. GARCIA-FERNÀNDEZ, N. A. WILLIAMS,
and A. SIDOW. 1994. Gene duplications and the origin of
vertebrate development. Dev. Suppl. 1994:125–133.
HUDSON, P., J. HALEY, M. CRONK, R. CRAWFORD, J. HARALAMBIDIS, G. TREGEAR, J. SHINE, and H. NIALL. 1983.
Structure of a genomic clone encoding biologically active
human relaxin. Nature 301:628–631.
ICHIKAWA, S., T. SASAOKA, and T. NAGATSU. 1991. Primary
structure of mouse tyrosine hydroxylase deduced from its
cDNA. Biochem. Biophys. Res. Commun. 176:1610–1616.
KAJIMOTO, Y., and P. ROTWEIN. 1989. Structure and expression
of a chicken insulin-like growth factor 1 precursor. Mol.
Endocrinol. 3:1907–1913.
KAJIMOTO, Y., and P. ROTWEIN. 1990. Evolution of insulin-like growth factor 1 (IGF-1): structure and expression of an IGF-1 precursor from
Xenopus laevis. Mol. Endocrinol. 4:217–226.
KOVAL, A. P., A. I. PETRENKO, and V. M. KAVSAN. 1989. Sequence of the salmon (Oncorhynchus keta) [corrected] preproinsulin gene. Nucleic Acids Res. 17:1758.
KWOK, S. C., F. D. LEDLEY, A. G. DILELLA, K. J. ROBSON,
and S. L. WOO. 1985. Nucleotide sequence of a full-length
complementary DNA clone and amino acid sequence of human phenylalanine hydroxylase. Biochemistry 24:556–561.
LEDLEY, F. D., H. E. GRENETT, D. P. BARTOS, P. VAN TUINEN,
D. H. LEDBETTER, and S. L. C. WOO. 1987. Assignment of
human tryptophan hydroxylase locus to chromosome 11:
gene duplication and translocation in evolution of aromatic
amino acid hydroxylases. Somat. Cell Mol. Genet. 13:575–
580.
LEDLEY, F. D., H. E. GRENETT, B. S. DUNBAR, and S. L. WOO.
1990. Mouse phenylalanine hydroxylase. Homology and divergence from human phenylalanine hydroxylase. Biochem.
J. 267:399–405.
LOMEDICO, P., N. ROSENTHAL, A. EFSTRATIADIS, W. GILBERT,
R. KOLODNER, and R. TIZARD. 1979. The structure and evolution of the two nonallelic rat preproinsulin genes. Cell 18:
545–558.
LUNDIN, L. G. 1993. Evolution of the vertebrate genome as
reflected in paralogous chromosomal regions in man and
the house mouse. Genomics 16:1–19.
MARSHALL, C. R., E. C. RAFF, and R. A. RAFF. 1994. Dollo’s
Law and the death and resurrection of genes. Proc. Natl.
Acad. Sci. USA 91:12283–12287.
MCRORY, J. E., and N. M. SHERWOOD. 1997. Ancient divergence of insulin and insulin-like growth factor. DNA Cell
Biol. 16:939–949.
MORTON, C. C., M. G. BYERS, H. NAKAI, G. I. BELL, and T.
B. SHOWS. 1986. Human genes for insulin-like growth factors 1 and 2 and epidermal growth factor are located on
12q22-q24.1, 11p15, and 4q25-q27, respectively. Cytogenet. Cell Genet. 41:245–249.
NAGAMATSU, S., S. J. CHAN, S. FALKMER, and D. F. STEINER.
1991. Evolution of the insulin gene superfamily. Sequence of a preproinsulin-like growth factor cDNA from
the Atlantic hagfish. J. Biol. Chem. 266:2397–2402.
NECKAMEYER, W. S., and W. G. QUINN. 1989. Isolation and
characterization of the gene for Drosophila tyrosine hydroxylase. Neuron 2:1167–1175.
NECKAMEYER, W. S., and K. WHITE. 1992. A single locus encodes both phenylalanine hydroxylase and tryptophan hydroxylase activities in Drosophila. J. Biol. Chem. 267:
4199–4206.
OHNO, S. 1970. Evolution by gene duplication. Springer-Verlag, Heidelberg, Germany.
PERLER, F., A. EFSTRATIADIS, P. LOMEDICO, W. GILBERT, R.
KOLODNER, and J. DODGSON. 1980. The evolution of genes:
the chicken preproinsulin gene. Cell 20:555–566.
ROTWEIN, P., and L. J. HALL. 1990. Evolution of insulin-like
growth factor 2: characterization of the mouse IGF-2 gene
and identification of two pseudo-exons. DNA Cell Biol. 9:
725–735.
ROTWEIN, P., K. M. POLLOCK, D. K. DIDIER, and G. G. KRIVI.
1986. Organization and sequence of the human insulin-like
growth factor 1 gene. Alternative RNA processing produces
two insulin-like growth factor 1 precursor peptides. J. Biol.
Chem. 261:4828–4832.
SHAMBLOTT, M. J., and T. T. CHEN. 1992. Identification of a
second insulin-like growth factor in a fish species. Proc.
Natl. Acad. Sci. USA 89:8913–8917.
SHARMAN, A. C., A. HAY-SCHMIDT, and P. W. H. HOLLAND.
1997. Cloning and analysis of an HMG gene from the lamprey, Lampetra fluviatilis: gene duplication in vertebrate
evolution. Gene 184:99–105.
SHARMAN, A. C., and P. W. H. HOLLAND. 1996. Conservation,
duplication and divergence of developmental genes during
chordate evolution. Neth. J. Zool. 46:47–67.
SHIMATSU, A., and P. ROTWEIN. 1987. Mosaic evolution of the
insulin-like growth factors. Organization, sequence, and expression of the rat insulin-like growth factor 1 gene. J. Biol.
Chem. 262:7894–7900.
SHULDINER, A. R., S. PHILLIPS, C. T. ROBERTS JR., D. LEROITH,
and J. ROTH. 1989. Xenopus laevis contains two nonallelic
preproinsulin genes. cDNA cloning and evolutionary perspective. J. Biol. Chem. 264:9428–9432.
STEWART, C.-B., J. W. SCHILLING, and A. C. WILSON. 1987.
Adaptive evolution in the lysozymes of foregut fermenters.
Nature 330:401–404.
STOLL, J., C. A. KOZAK, and D. GOLDMAN. 1990. Characterization and chromosomal mapping of a cDNA encoding
tryptophan hydroxylase from a mouse mastocytoma cell
line. Genomics 7:88–96.
WILLIAMS, N. A., and P. W. H. HOLLAND. 1998. Gene and
domain duplication in the chordate Otx gene family: insights from amphioxus Otx. Mol. Biol. Evol. 15:600–607.
WILSON, R., R. AINSCOUGH, K. ANDERSON et al. (53 co-authors). 1994. 2.2 Mb of contiguous nucleotide sequence
from chromosome III of C. elegans. Nature 368:32–38.
ZHAO, G., T. XIA, J. SONG, and R. A. JENSEN. 1994. Pseudomonas aeruginosa possesses homologues of mammalian
phenylalanine hydroxylase and 4α-carbinolamine dehydratase/DCoH as part of a three-component gene cluster. Proc.
Natl. Acad. Sci. USA 91:1366–1370.
CLAUDIA KAPPEN, reviewing editor
Accepted July 23, 1998