REVIEWS
SPLITTING PAIRS: THE DIVERGING
FATES OF DUPLICATED GENES
Victoria E. Prince* and F. Bryan Pickett ‡
Many genes are members of large families that have arisen during evolution through gene
duplication events. Our increasing understanding of gene organization at the scale of whole
genomes is revealing further evidence for the extensive retention of genes that arise during
duplication events of various types. Duplication is thought to be an important means of providing
a substrate on which evolution can work. An understanding of gene duplication and its resolution
is crucial for revealing mechanisms of genetic redundancy. Here, we consider both the theoretical
framework and the experimental evidence to explain the preservation of duplicated genes.
POLYPLOIDY
A polyploid organism has more
than two sets of chromosomes
(two sets being the prevalent
diploid state). For example,
a tetraploid organism has four
sets of chromosomes and an
octaploid has eight sets.
ALLOTETRAPLOIDY
The generation of the tetraploid
state by fusion of two nuclei
from different species. For
example, two fertilized diploid
oocytes can fuse such that the
newly formed single egg has two
complete sets of chromosomes.
*Department of Organismal
Biology and Anatomy,
The University of Chicago,
1027 East 57th Street,
Chicago, Illinois 60615, USA.
‡Department of Biology,
Loyola University of Chicago,
6525 North Sheridan Road,
Chicago, Illinois 60626, USA.
Correspondence to V.E.P.
e-mail:
[email protected]
doi:10.1038/nrg928
At the dawn of the new genomic era, we already know
the entire genome sequences of several organisms, and
the mysteries of genome structure, organization
and evolution are at last beginning to be unveiled.
Recent studies have shown that a surprisingly large
number of duplicated genes are present in all sequenced
genomes, revealing that there is frequent evolutionary
conservation of genes that arise through local, regional
or global DNA duplication events1. Tandem, regional or
whole-genome duplication events produce pairs of initially similar genes, which can ultimately become scattered throughout a dynamically rearranging genome2.
Complete or partial POLYPLOIDY is found in many
plants. It is also found in specific vertebrates, such as
salmonid fishes and certain frogs, including the popular
embryology model Xenopus laevis3. Polyploidy can
occur as the result of whole-genome duplication,
through ALLOTETRAPLOIDY or AUTOTETRAPLOIDY. All vertebrate
animals, despite their generally diploid state, carry large
numbers of duplicated genes. This has been interpreted
by some as evidence that two rounds of whole-genome
duplication occurred at the origin of the vertebrate
lineage, ~400 million years ago (Mya; the ‘2R’ hypothesis4–6; see phylogenetic tree and green text in FIG. 1).
Whether or not this event occurred, there have certainly
been duplication events on a broad scale during vertebrate evolution, although these might represent
‘segmental’ duplications of large stretches of DNA or
perhaps of whole chromosomes. The occurrence of a
whole-genome duplication event in the main branch of
the vertebrate lineage that produced all the TELEOST
fishes, at least 110 Mya, is more widely accepted7 (FIG. 1).
As more whole-genome sequences become available,
it is becoming increasingly important to understand the
forces that have shaped their organization. Here, we
review the mechanisms that lead to retention versus loss
of duplicated genes and consider the broader implications at both a genetic and an evolutionary level. In particular, we use examples from vertebrates and plants to
focus on sub-functionalization, by which duplicate
genes each lose a different subcomponent of their function, therefore reducing their joint activity to that of the
single ancestral gene. This mechanism provides an
appealing model to explain why so many genes have
been retained after duplication.
Evolutionary fates of duplicated genes
No matter how duplicate genes arise, if they are duplicated in their entirety (including regulatory elements)
then they can show inter-gene REDUNDANCY8–10. Classical
models predict two potential fates for these duplicate
gene pairs11,12. The most likely fate is that one of the pair
will degenerate to a pseudogene or be lost from the
genome due to the vagaries of chromosomal remodelling, locus deletion or point mutation (in a process
known as NON-FUNCTIONALIZATION). Gene loss through
these processes is permissible because only one of the
duplicates is required to maintain the function provided
NATURE REVIEWS | GENETICS
VOLUME 3 | NOVEMBER 2002 | 8 2 7
© 2002 Nature Publishing Group
[Figure 1 image: phylogenetic tree. Labels: Amphioxus (one Hox cluster); Lamprey; Sharks and rays;
Ray-finned fishes (Bichir, Sturgeon, Bowfin; Teleost fishes, >4 Hox clusters: Zebrafish, Pufferfish, Medaka,
Cichlids, striped bass); Lobe-finned fishes (Coelacanths, Lungfish); Tetrapods (4 Hox clusters: Amphibians,
Avians, Mammals). Marked events: Duplications/2R, >400 Mya; Genome duplication, 110 Mya.]
Figure 1 | Phylogeny of chordates. The tree indicates the approximate timing of whole-genome
(green) and Hox gene (red) duplication events. ‘2R’ refers to the two rounds of whole-genome
duplication that are believed by some to have occurred at the origin of the vertebrate lineage.
Mya, million years ago. Modified with permission from REF. 91 © (2002) Elsevier Science.
AUTOTETRAPLOIDY
In contrast to allotetraploidy,
both sets of chromosomes are
derived from the same species.
This can occur in the fertilized
oocyte if the nucleus divides but
the cell does not.
TELEOST
A bony fish that belongs to the
infraclass Teleostei (comprising
more than 20,000 species),
which includes nearly all the
important food and game fish,
and many aquarium fish.
REDUNDANCY
When two genes can fulfil an
equivalent function. Because of
pleiotropy, redundancy is often
partial, with two genes having
overlapping rather than
equivalent functions.
NON-FUNCTIONALIZATION
When one of two duplicate
genes acquires a mutation in
coding or regulatory sequences
that ultimately renders the gene
non-functional.
PURIFYING SELECTION
Selection against deleterious
alleles, which will be eliminated
from the population.
NEO-FUNCTIONALIZATION
When one of two duplicate
genes acquires a mutation in
coding or regulatory sequences
that allows the gene to take on a
new and useful function.
CONSERVED SYNTENY
(syn, same; teny, thread).
Homology of gene order
between two chromosomes or
chromosomal segments, within
or between species.
by the single, ancestral gene, leaving one gene under
PURIFYING SELECTION and the other gene free to accumulate
evolutionarily neutral or nearly neutral loss-of-function
mutations in the coding region. A less frequently
expected outcome is that a population acquires a new,
advantageous allele as the result of alterations in coding
or regulatory sequences, exposing the formerly redundant gene to new and distinct selective constraints.
Mutations that lead to such novel gene functions (a
process called NEO-FUNCTIONALIZATION) are assumed to be
extremely rare, so the classical model predicts that few
duplicates should be retained in the genome over the
long term.
The classical model, however, fails to explain the
existence of the many duplicated genes found in extant
genomes. Nadeau and Sankoff estimated that around
half of all duplicated vertebrate genes have been maintained13. Recent analyses of the human genome have
revealed that at least 15% of human genes are indeed
duplicates14, with segmental duplications covering 5.2%
of the genome15. Similarly, comparative genomics has
shown that the zebrafish genome retains at least 20% of
the gene pairs that arose from the latest duplication
event in this lineage, which occurred at least 110 Mya
(REF. 7). The retention of ancient duplicates is also common in plants: in Arabidopsis, 17% of genes are found in
tandem arrays that contain up to 23 genes, and ~31%
are members of duplicate pairs that reside in regions of
the genome with CONSERVED SYNTENY16. These sequence
analyses complement the results of experiments that
have assessed gene function using genetic, molecular
and developmental approaches; such studies indicate
that duplication often results in continuing partial
genetic redundancy. Expression analyses indicate that
extant gene pairs might have, in many cases, partitioned
the multiple, often PLEIOTROPIC, functions of single ancestral genes between the descendant duplicates.
Population-level models and experimental evidence
indicate that gene multifunctionality (BOX 1) might act
to potentiate the preservation of duplicated genes.
The DDC model. What alternatives exist to non-functionalization and neo-functionalization? A broadly
applicable SUB-FUNCTIONALIZATION model was recently
proposed by Force and colleagues to explain the prevalence of duplicate genes that are retained in the
genome17,18. Sub-functionalization proposes that, after
duplication, the two gene copies acquire complementary loss-of-function mutations in independent sub-functions, such that both genes are required to produce
the full complement of functions of the single ancestral
gene. This general process has been described in detail
in the duplication–degeneration–complementation
(DDC) model17,18 (BOX 2). For the sub-functionalization
Box 1 | The multifunctional nature of genes
The recent results derived from evolutionary, developmental and genomic studies in various organisms highlight the
key roles of gene and phenotypic multifunctionality during organismal evolution20,85. Genetic evidence of gene
multifunctionality has a long history and was first described in maize86 and Drosophila 87, in which non-quantitative
ALLELIC SERIES were found. Some members of these allelic series could not be placed on a simple continuum, in which
alleles retained a proportion of the activity of wild-type alleles. Different alleles had an impact on different qualitative
patterns of characters, contributing to unique phenotypes. Such observations supported Hermann Joseph Muller’s
famous proposal that a large array of non-null alleles exists for many genes, which led eventually to a scheme of
classifying allele function that was based on his reinterpretation of William Bateson’s allelomorphy (which was
Bateson’s hypothesis that allelomorphs (or alleles) encode ‘unit characters’ that contribute to an observed Mendelian
character). In addition, the existence of unique alleles that cause unpredictable phenotypes also led Muller to promote
the ‘sub-gene hypothesis’ of Serebrovsky and Dubinin88. This hypothesis posited the first strongly articulated model that
individual genes have not one, but a set of functions, each contributed by independently mutable regions of chromatin
at a single locus.
Muller and colleagues made various attempts to ‘divide’ sub-gene functions using chromosomal rearrangements and,
although their work was inconclusive, the apparent interaction of position effect and the location of inversion
breakpoints on the phenotypic severity of rearranged alleles indicated that some subdivision of gene functions was
possible89. Muller also identified Drosophila dumpy alleles that affect only wing or only thoracic characters87, and the
Small eye (Sy) mutation discovered by Bridges turned out to be an allele of the mutation outstretched wings (od; now
known as os) observed by Muller90. These observations confirmed a key tenet of the ‘sub-gene hypothesis’: functions of
a gene can be independently identified and separated by mutation. The insights gained through observations made by
Muller, Emerson and other prominent early twentieth century maize and Drosophila geneticists are the foundation of
our current appreciation of the multifunctional nature of individual genes.
Box 2 | The duplication–degeneration–complementation model
The duplication–degeneration–complementation (DDC) model relies on complementary degenerative
changes in a pair of duplicate genes, such that the duplicates together retain the original functions of
their single ancestor. The red, blue and green boxes denote cis-regulatory elements, although degenerative
mutations in any functionally discrete, independently mutable portion of a locus (a protein domain or
alternative splice site, for example) could participate in sub-functionalization.
[Box 2 figure labels: Duplication, Degeneration, Complementation.]
The mathematical models that underlie the DDC concept depend ultimately on population-level
processes, including mutation rates and the changes in allele frequency that occur owing to GENETIC DRIFT.
Initially, a contest occurs between mutations that affect any one sub-function of a gene and null mutations
that destroy the ability of a gene to produce a functional protein. A mutational event that affects a
sub-function of either duplicate allows both genes to persist in individuals in a population, therefore
potentially allowing both genes to experience subsequent mutations. Alternatively, any mutational event that
negatively affects a coding region can instantly convert the affected duplicate into a pseudogene, preventing
subsequent participation in sub-functionalization. Until sub-functionalization has occurred, duplicates can
experience sub-function or null mutations in the coding region.
The DDC model depends on parcelling out the pre-existing sub-functions of ancestral genes, potentially leading to a
reduction in the pleiotropy level per gene. As a consequence, the sub-functionalized duplicates are less constrained by
selection than the single ancestral gene, which had to maintain the capacity to fulfil all functions. Selection can therefore
act independently on each duplicate, increasing its functional specificity.
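The contest described in this box can be illustrated with a toy Monte Carlo sketch. It is not the published DDC mathematics: it assumes a population small enough that every tolerated mutation fixes at once, gives every sub-function the same mutation rate, and uses hypothetical names (`resolve_pair`, `preserved_fraction`, `u_sub`, `u_null`) chosen only for illustration.

```python
import random

def resolve_pair(z, u_sub, u_null, rng):
    """Follow one duplicate pair until its fate is decided.

    z      -- number of independent sub-functions per gene (assumed)
    u_sub  -- mutation rate per sub-function (assumed equal for all)
    u_null -- rate of null mutations in the coding region
    Returns "sub-functionalized" or "non-functionalized".
    """
    # intact[c][k] is True while copy c still performs sub-function k.
    intact = [[True] * z for _ in range(2)]
    # (copy, None) is a null mutation; (copy, k) hits sub-function k.
    events = [(c, k) for c in range(2) for k in [None] + list(range(z))]
    weights = [u_null if k is None else u_sub for _, k in events]
    while True:
        copy, k = rng.choices(events, weights=weights)[0]
        other = 1 - copy
        if k is None:
            # A null mutation is tolerated only if the other copy still does everything.
            if all(intact[other]):
                return "non-functionalized"
            continue  # otherwise it is removed by purifying selection
        if not intact[copy][k] or not intact[other][k]:
            continue  # already lost here, or the other copy cannot compensate
        intact[copy][k] = False
        if not any(intact[copy]):
            return "non-functionalized"  # every sub-function gone: a pseudogene
        if not all(intact[0]) and not all(intact[1]):
            return "sub-functionalized"  # each copy now depends on the other

def preserved_fraction(z, u_sub, u_null, trials=2000, seed=1):
    """Fraction of simulated pairs that end up sub-functionalized."""
    rng = random.Random(seed)
    hits = sum(resolve_pair(z, u_sub, u_null, rng) == "sub-functionalized"
               for _ in range(trials))
    return hits / trials
```

Under these assumptions, calling `preserved_fraction(5, 1.0, 1.0)` and `preserved_fraction(2, 1.0, 1.0)` shows the qualitative point of the box: the more of the mutational target that lies in sub-functions rather than in null mutations, the more often both copies survive.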
PLEIOTROPY
When a single gene has a role
in several processes.
SUB-FUNCTION
Any functionally discrete,
independently mutable portion
of a locus. For example,
a cis-regulatory element,
a protein domain or an
alternative splice site.
ALLELIC SERIES
A series of alleles that can be
present at the same locus and
that produce graded
phenotypes.
GENETIC DRIFT
The increase or decrease in allele
frequencies in populations due
to chance.
LENS CRYSTALLIN
A protein that accumulates at
high concentration in the eye
and that forms the crystalline
lens.
DEGENERATIVE MUTATION
A sequence change that causes
a loss of function of the affected
sub-function or gene.
INDIVIDUAL RELATIVE FITNESS
The capacity of the individual to
survive and reproduce.
EFFECTIVE POPULATION SIZE
The equivalent number of
breeding adults in a population
after adjusting for complicating
factors, such as non-random
variation in family size or
stochastic fluctuation in
population size.
model to work, sub-functions need to be independent,
such that mutations in one will not affect another. In
many cases, eukaryotic enhancers can act as sub-functions or components of sub-functions due to their
modular structure. Furthermore, transcription-factor-binding sites are short (often just 8–12 bp), indicating
that point mutations might lead frequently to the disruption or creation of sites19. These properties of regulatory sequences have led many researchers to emphasize that evolutionarily important changes might
happen primarily at the level of gene regulation rather
than protein function20,21. So, a likely way for sub-functionalization to occur is through complementary
changes in regulatory elements, perhaps leading to two
separate expression domains that together recapitulate
the more complex single expression pattern of the
ancestral gene17,22.
Aspects of this idea were previously proposed by several groups. The first paper to put forward clearly the
general idea of sub-functionalization came from
Piatigorsky and Wistow 23, who proposed a ‘gene sharing’ model. Their work with LENS CRYSTALLINS led them to
suggest that the acquisition of two expression domains
could precede a duplication event, with each duplicate
later losing one of the two expression domains. Similar
ideas were expounded by Hughes24, who suggested that
DEGENERATIVE MUTATIONS could lead to the preservation of
duplicated genes. Along similar lines, Averof and colleagues suggested that tandem duplications might often
produce duplicates with differential partitioning of
regulatory elements, such that both genes are required
to recapitulate the single ancestral expression pattern25.
The probability of sub-functionalization aiding the
preservation of duplicate gene loci was also explored
independently as part of Stoltzfus’ general model of the
contribution of neutral mutations to the diversification
of gene function26.
Whether two duplicated genes are initially preserved
through sub-functionalization or neo-functionalization,
they are likely to retain lingering redundant sub-functions. This redundancy can be resolved ultimately
through subsequent rounds of degenerative, complementary mutations in remaining sub-functions. So,
DDC processes can occur subsequent to a neo-functionalization event. The DDC process also requires that sub-function mutant alleles rise to high frequencies in populations, supplanting the ancestral alleles that were
generated at the moment of duplication. Sub-function
mutations would have a low impact on INDIVIDUAL RELATIVE
FITNESS and would, in effect, be neutral or nearly neutral
to selection. Under such near-neutrality, genetic drift has
a major impact on the overall likelihood that sub-functionalized alleles will become fixed in a population.
In common with other drift-based models, the probability of sub-functionalization after whole-genome duplication is extremely sensitive to the EFFECTIVE POPULATION SIZE
and the null mutation rate for individual genes18.
Mathematical modelling and computer simulations18,27
predict that populations experiencing high mutation
rates (10⁻⁴ per site per generation) are only likely to
Box 3 | Hox cluster evolution through gene duplication
The Hox genes provide a remarkably conserved system for providing regional
identity to the primary body axis of developing embryos. Mutations in Hox genes
can lead to marked ‘homeotic’ phenotypes, in which one segment takes on the
identity of another. The Hox genes encode transcription factors with a conserved
60 amino-acid DNA-binding homeodomain and are characterized further by their
clustered organization on the chromosome. Wherever Hox genes have been looked
for among multicellular animals, they have been found — the sole exception, so far,
being the basal sponges.
The evolution of the Hox clusters is characterized by duplication events.
Invertebrates have a single cluster of Hox genes with a variable gene number
(Drosophila has 8 genes, amphioxus has 14; see panel a). Comparative analysis has
indicated that the common bilaterally symmetric ancestor of amphioxus and
Drosophila already had seven Hox genes31. This initial cluster was the result of
tandem duplication from a single ancestral Hox gene. Further tandem duplications
led to the differing complements of Hox genes in the single clusters of the different
invertebrates (for example, all the amphioxus genes indicated in green are believed to
have arisen through tandem duplications from an ancestral gene that is related to
Drosophila Abdominal-B (Abd-B)).
Vertebrates have several Hox clusters as the result of whole-cluster duplications.
Tetrapods, including mouse and human, have four Hox clusters on four separate
chromosomes, which were generated by at least two large-scale duplication events
(panel b; see also FIG. 1). These duplications might have been segmental, including
perhaps only the Hox clusters or the entire chromosome on which they lie, or might have
been genome-wide. Another duplication event in the lineage that leads to teleosts has led
to the presence of more than four Hox clusters in this group, with zebrafish having seven
clusters in total (panel c; see also FIG. 1). The organization of the zebrafish clusters32
reveals that many duplicate genes (with respect to a presumed ancestral four-cluster
organization) have been lost. So, zebrafish has far fewer than twice as many Hox genes as
mouse. In some cases, pseudogenes can be recognized (open circles), revealing the ‘ghost’
of a duplicate gene. Nevertheless, at least 11 duplicated pairs of genes (PARALOGUES) have
been retained in zebrafish, possibly as a consequence of sub-functionalization. Modified
with permission from REF. 91 © (2002) Elsevier Science.
[Box 3 figure. Panel a: single Hox clusters of Drosophila, a hypothetical ancestor and amphioxus.
Panel b: mouse clusters A, B, C and D. Panel c: zebrafish clusters Aa, Ab, Ba, Bb, Ca, Cb and D.
Genes are aligned by paralogue group (1–14); the transitions between panels are labelled
'2× Duplication' and 'Duplication'.]
preserve duplicates through sub-functionalization if
effective population sizes remain below 10,000. By contrast, populations with lower mutation rates (10⁻⁶) will
have duplicates with a high probability of preservation
through sub-functionalization even in populations that
exceed 1,000,000 individuals. In larger populations,
duplicates are most likely to be preserved by neo-functionalization (as this phenomenon depends on the
occurrence of rare beneficial mutations), although DDC
processes could act subsequently to resolve remaining
redundancy. Genome projects are revealing a history of
global and large regional duplication; furthermore, the
rate at which single duplicate genes are generated might
approach that of the single-nucleotide mutation rate1,
indicating that single-gene duplication events might
occur at a surprisingly high rate in extant species.
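The sensitivity to population size can be made concrete with a minimal Wright-Fisher sketch of drift. This is a generic textbook model, not the simulations cited above; it assumes a diploid population of constant size `n` (so 2n gene copies), strict neutrality, and a mutation that starts as a single copy. The function names are illustrative only.

```python
import random

def neutral_allele_fixes(n, rng):
    """Wright-Fisher drift of one new neutral allele among 2n gene copies."""
    total = 2 * n
    copies = 1  # the new allele starts as a single copy
    while 0 < copies < total:
        freq = copies / total
        # Each gene copy of the next generation is drawn independently.
        copies = sum(rng.random() < freq for _ in range(total))
    return copies == total

def fixation_fraction(n, trials=5000, seed=7):
    """Fraction of independent runs in which the new allele reaches fixation."""
    rng = random.Random(seed)
    return sum(neutral_allele_fixes(n, rng) for _ in range(trials)) / trials
```

Under these assumptions the fraction returned by `fixation_fraction(n)` is close to 1/(2n), so a neutral or nearly neutral sub-function allele is far more likely to drift to fixation in a small population than in a large one.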
Below, we consider some examples in which DDC
processes seem to have contributed to retaining duplicate genes, and the implications of this for gene functions and networks. Although our examples are taken
from vertebrates and plants, duplication and sub-functionalization are concepts that apply broadly to
other organisms. For example, genome-sequencing projects have revealed that supposedly more simple organisms, such as the yeast Saccharomyces cerevisiae and the
nematode Caenorhabditis elegans, also have many duplicated genes in their genomes1,28,29. So, the exploration of
the functional complementation between gene pairs in
model systems should provide us with general information about the fates of duplicated genes.
Degenerative complementation in vertebrates
The vertebrate Hox genes are a clear example of evolution by gene duplication (BOX 3), providing a nice opportunity to explore some aspects of the DDC model. The
Hox genes are also particularly interesting because of
their well-known conserved role in the regionalization
of the body plan, which has led to extensive analyses of
Hox gene function and regulation in the mouse30. All
invertebrates seem to have just a single cluster of Hox
genes, whereas tetrapod vertebrates, such as mouse and
chick, have four clusters (HoxA–D), which are arranged
on four separate chromosomes31,32 (BOX 3). More
recently, it has been shown that teleost vertebrates have
more than four clusters of Hox genes, very probably due
to a whole-genome duplication event in their lineage7,33.
As both mouse and zebrafish have tractable genetic systems, their Hox genes provide ideal models for investigating the potential functional complementation
between duplicate genes. Furthermore, the remarkable
conservation of Hox function during evolution allows
meaningful comparisons to be made between mouse
and zebrafish Hox genes, despite their divergence over
~400 Mya (FIG. 1).
The vertebrate Hox genes fall into 13 paralogue
groups, with each cluster having fewer than 13 genes
(BOX 3). This is presumably a result of the loss of redundant duplicates, as predicted by classical models. Often,
more than one member of a vertebrate Hox paralogue
group is expressed in a given location, and these paralogous genes tend to have partially redundant functions.
[Figure 2 image. Panel a: hoxb1b and hoxb1a expression at 10 hpf, 10.5 hpf, 12 hpf and 16 hpf.
Panel b labels: Ancestral state (Hoxb1), Intermediate state, Present-day zebrafish (hoxb1b, hoxb1a),
Sub-functionalization. Key: Autoregulatory sequences; RARE.]
Figure 2 | Zebrafish duplicate genes subdivide ancestral mouse Hoxb1 expression.
a | Expression patterns of zebrafish hoxb1 duplicate genes in embryos. Embryos are shown in
dorsal view with the anterior to the top. Double in situ hybridizations reveal the expression of
hoxb1 genes (purple) and krox20 (now known as egr2; red), which is a marker for rhombomere
(r)3 and r5. The hoxb1b gene is expressed transiently in r4, up to the 10 hours post fertilization
(hpf ) stage, then gradually retreats towards the posterior. By contrast, hoxb1a has a later onset
of expression (10 hpf) and maintains a stable expression domain in r4 due to autoregulatory
control. Together, these expression patterns recapitulate the expression of the ancestral mouse
Hoxb1 gene. b | The diagram shows steps by which the expression of the ancestral mouse
Hoxb1 gene might have been partitioned into those of zebrafish hoxb1a and hoxb1b. The early
expression of mouse Hoxb1 depends on a 3′ retinoic-acid response element (3′ RARE)92, while
the r4 stripe is maintained through an autoregulatory mechanism by three Hox/cofactor binding
sites93. Zebrafish hoxb1b has a 3′ RARE similar to that of mouse Hoxb1, but has point changes
in each of the Hox/cofactor autoregulatory sites, consistent with the absence of a late r4
expression domain for this gene. By contrast, zebrafish hoxb1a retains perfect copies of all
three Hox/cofactor autoregulatory sites, but has no 3′ RARE element, which is consistent with
the lack of early expression. Together, the zebrafish hoxb1a and hoxb1b genes therefore
recapitulate the expression of the ancestral mouse Hoxb1 gene.
PARALOGUES
Homologous genes that are
related by a duplication event.
For example, mouse Hoxa2 and
Hoxb2 are paralogues.
NEURAL CREST
A vertebrate-specific migratory
cell type that derives from the
dorsal-most aspect of the neural
tube and contributes to many
tissues, including the peripheral
nervous system and cranium.
One interesting example of this is the mouse Hox paralogue group 3 genes. Although null mutants for Hoxa3
and Hoxd3 have independent phenotypes that affect NEURAL-CREST-derived structures34,35 and vertebrae36,
respectively, double mutant phenotypes lead to a complete absence of specific vertebral elements, revealing
redundancy between the genes37. The non-redundant
functions of the two paralogues must be a consequence
of differences in their cis-regulatory control rather than
their coding sequences, as Hoxa3 and Hoxd3 proteins
are functionally interchangeable38. Surprisingly, the
overall expression patterns of the two genes seem
superficially similar; however, the data reveal that the
details of their cis-regulation, probably including variations in level of expression, have profound functional
consequences, such that each gene has an important
patterning role in a separate tissue. These two paralogue group 3 genes probably represent an example of
functional complementation, although we need a more
detailed understanding of differences in the regulation
of the two genes, as well as a reconstruction of the
ancestral condition based on comparative data, to
investigate this further.
Among the zebrafish Hox genes, there are at least 11
instances in which duplicate genes have been retained
(BOX 3). In the case of the hoxb5 duplicates, expression
analysis coupled with gain-of-function studies indicates
that both genes have been retained possibly as a result of
sub-functionalization39. Recent studies in V.E.P.’s laboratory40,41 have focused on zebrafish Hox genes in paralogue group 1, which includes a pair of genes —
hoxb1a and hoxb1b — that are duplicated with respect
to the ancestral four-cluster state. In this study, we have
made use of the strengths of the zebrafish system, which
include both the ability to test gene function directly
and the recent availability of genomic sequence data
from the Sanger Sequencing Centre. The results of comparing the expression patterns, functions and regulatory
elements of the zebrafish hoxb1 duplicates with those of
the mouse Hox paralogue group 1 genes indicate that
both zebrafish gene copies were preserved as a consequence of complementary degenerative mutations, as
described below40.
In accordance with the DDC model, the zebrafish
hoxb1 duplicates seem to have subdivided the ancestral
mouse Hoxb1 expression pattern. So, zebrafish hoxb1b
shares the early expression pattern of mouse Hoxb1, in
the hindbrain of gastrulating embryos, whereas hoxb1a
shares the later expression of mouse Hoxb1, in a single
segment of the neurulation-stage hindbrain (RHOMBOMERE
(r)4; FIG. 2a). The DDC model further predicts degeneration of discrete and complementary cis-regulatory elements in the two zebrafish duplicates, and such changes
in these regulatory elements can indeed be recognized40
(FIG. 2b). The degenerative loss of regulatory modules in
each of the two duplicates is likely to have been sufficient
to allow the preservation of the two genes, in accordance
with the DDC model.
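The hoxb1 case can be restated as a small set comparison. The sketch below is only an illustration: the element names are simplified labels for the regulatory modules described for FIG. 2b, and `classify_pair` is a hypothetical helper, not part of any published analysis.

```python
def classify_pair(ancestral, copy_a, copy_b):
    """Compare the regulatory modules of two duplicates with the inferred ancestor."""
    lost_a = ancestral - copy_a
    lost_b = ancestral - copy_b
    if (copy_a | copy_b) >= ancestral and lost_a and lost_b:
        # Each copy lacks a module that only the other still provides.
        return "sub-functionalized"
    if not lost_a and not lost_b:
        return "fully redundant"
    if not lost_a or not lost_b:
        return "one copy degenerating"
    return "ancestral function lost"

# Simplified module labels (assumed for illustration).
ancestral_hoxb1 = {"3' RARE", "autoregulatory sites"}
hoxb1a = {"autoregulatory sites"}  # late, stable r4 expression
hoxb1b = {"3' RARE"}               # early, transient expression

print(classify_pair(ancestral_hoxb1, hoxb1a, hoxb1b))
```

Representing each gene as a set of modules keeps the test for complementation to a single union and two set differences; real regulatory regions are of course not this cleanly separable.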
As the analysis of the expression patterns and regulatory elements of zebrafish hoxb1a and hoxb1b has
shown that these duplicates experienced complementary, degenerative mutations during their evolution40,
the two genes might be expected to subdivide the function of the single Hoxb1 ancestral gene. Although we
cannot know this function, it is probably similar to the
function of mouse Hoxb1. The primary phenotype of
null mutants of mouse Hoxb1 is a change in neuronal
identity: rhombomere-4-derived facial neurons do not
undergo their characteristic posterior migration42–44.
Knockdown of the zebrafish hoxb1 duplicates using
antisense MORPHOLINOS has shown that hoxb1a is similarly required for facial neuron migration40. However,
the hoxb1b gene does not have a role in this process and
NATURE REVIEWS | GENETICS
VOLUME 3 | NOVEMBER 2002 | 831
© 2002 Nature Publishing Group
REVIEWS
[Figure 3 diagram. Panel labels: Ancestral state; Intermediate state; Present-day zebrafish. Genes shown: Hoxa1, hoxa1a, hoxa1b, Hoxb1, hoxb1a, hoxb1b. Annotations: ventral midbrain and hindbrain expression; non-functionalization; sub-functionalization; functional redundancy leads to "function shuffling" as hoxa1a loses hindbrain expression. Key: autoregulatory sequences; 3′ RARE; putative midbrain domain regulatory elements.]
Figure 3 | Function shuffling. Zebrafish hoxb1a and hoxb1b have expression profiles that are remarkably similar to those of mouse Hoxb1 and Hoxa1, respectively41.
The early expression of mouse Hoxa1, like Hoxb1, is dependent on a 3′ retinoic-acid response element (3′ RARE), which transiently drives the expression of Hoxa1 in
the developing hindbrain94,95. By contrast, the only zebrafish orthologue of mouse Hoxa1, zebrafish hoxa1a, is not expressed in the developing hindbrain, but only in
the ventral midbrain41,96. Comparative analyses have indicated that this midbrain expression might be a primitive characteristic of Hoxa1, as it is shared by chick and
Xenopus91,97. In the zebrafish, hoxb1b has taken on the hindbrain patterning role of tetrapod Hoxa1, which has possibly freed hoxa1a to lose its hindbrain expression
domain, while retaining the ancestral midbrain patterning role. This function shuffling relies on a phase of partial functional redundancy between non-orthologous
genes, in this case hoxa1a and hoxb1b. These experiments reveal the importance of studying an entire group of duplicated genes to understand fully the
consequences of a duplication event. Furthermore, function shuffling might prove to be common among teleost paralogues. For example, it has recently been shown
using morpholino-based knock-down experiments that the zebrafish engrailed2a and engrailed2b genes have early developmental roles that are equivalent to that of
the non-orthologous mouse Engrailed 1 (En1) gene98. Modified with permission from REF. 91 © (2002) Elsevier Science.
RHOMBOMERE
A segment of the vertebrate
hindbrain (rhombencephalon).
MORPHOLINO
An antisense reagent that is able
to block translation to knock
down gene function.
TETRASOMY
When one chromosome in the
complement is represented four
times in each nucleus.
ORTHOLOGUES
Homologous genes that are
related by a speciation event. For
example, mouse Hoxa1 and
chick HOXA1 are orthologues.
is instead required for the correct segmental organization of the hindbrain40. This segmentation function of
hoxb1b is shared with mouse Hoxa1 (REFS 45–47). How
did the function of a HoxA gene shift to a HoxB gene?
We suggest that the extensive redundancy found
between paralogue group 1 genes (which are the result
of a series of duplication events) has allowed “function
shuffling” to occur (FIG. 3).
The duplication event that led to extra Hox clusters
in zebrafish was probably a part of a whole-genome
duplication. Evidence for this genome duplication
comes from mapping and sequencing data, coupled
with phylogenetic analysis7,33,48,49. About 20% of the
duplicates that arose from this event have been
retained7. To clarify, the zebrafish is not a tetraploid
organism. Unlike species that have undergone recent
duplications, such as salmonid fishes50, there is no evidence for TETRASOMY in zebrafish. Although many duplicated zebrafish genes have been retained, they are no
longer strictly equivalent genes showing complete
redundancy; instead, their initial redundancy has been
partially resolved in ways that can provide new insight
into the functions of the ancestral gene.
An interesting example is provided by the duplicated
microphthalmia-associated transcription factor (Mitf)
genes. The Mitf genes are required for the formation of
pigment cells, with different mutations in the single
human gene leading to syndromes that affect sensory
systems and pigmentation (Waardenburg syndrome
type 2a (REF. 51) or Tietz syndrome52). The single mouse
Mitf gene is characterized by several splice variants;
mutation of this gene leads to a loss of pigmented
neural-crest-derived melanocytes, as well as to a loss of
retinal pigment epithelium53. In zebrafish, there are two
mitf genes, mitfa and mitfb. A mutation in mitfa (the
nacre mutant) causes an absence of crest-derived
melanocytes; however, despite expression of mitfa in
retinal epithelium, this tissue is intact in nacre fish. This
observation led Lister and colleagues54 to search for and
find the mitfb duplicate gene, which they showed is
expressed with mitfa in the retina. Although the duplicates share significant sequence identity, the mitfb gene
includes an alternative 5′ exon, such that the two duplicates together recapitulate both the expression patterns
and the two distinct isoforms of their mammalian
ORTHOLOGUE. Furthermore, the two mitf genes do not
behave identically in their ability to functionally rescue
the nacre mutant, which indicates that tissue-specific
alternative splice products of a single ancestral gene have
been converted into two genes with distinct properties.
Like the Hox genes, duplicated mitf genes have been
found in several teleosts; the expression and sequence
ORGANIZER
A small dorsal region of the
vertebrate gastrula-stage embryo
that has the remarkable capacity
to organize a complete
embryonic body plan. Hilde
Mangold and Hans Spemann
first identified the organizer in
amphibian embryos using tissue
transplantation.
MADS BOX
A highly conserved sequence
motif found in a family of plant
transcription factors and named
after the initials of the four
founder members of the family.
MERISTEM
An undifferentiated cell
population that resides at the
growing tip of the roots or
shoots of a plant.
analysis of mitf duplicates in the small teleost
Xiphophorus maculatus indicates that sub-functionalization occurred in the common ancestor of Xiphophorus
and zebrafish55.
The zebrafish is a powerful genetic model system,
and high-throughput mutagenesis approaches have
already produced hundreds of mutations in genes that
are required for crucial developmental processes56. The
existence of functionally complementary duplicates in
this species turns out to be a help to genetic analysis
rather than a hindrance, because the pleiotropy of each
gene is reduced, which facilitates their study. For example, squint (sqt; now called nodal-related 1, ndr1) and
cyclops (cyc) are zebrafish mutants in two duplicated
nodal-class genes. Analysis of these individual and double mutant phenotypes has helped to shed light on the
complexities of function of the single mouse nodal
gene57,58. The ndr1 gene is expressed maternally, whereas
cyc is a zygotic transcript, but the two act partially
redundantly during the establishment of the ORGANIZER,
and in the formation of both endoderm and mesoderm.
The cyc gene also has a late function in the patterning of
the neural plate. This requirement is only revealed
genetically because ndr1 provides an early embryonic
nodal signal, allowing the embryo to develop to the
point at which the cyc requirement is revealed. This late
requirement for nodal signalling is masked in the mouse
by the early lethality of the nodal mutant. So, the partial
redundancy shown between these duplicate zebrafish
genes allows us to chart the likely sub-function organization of orthologous genes in other organisms.
Another example is provided by the vertebrate Sox9
genes. Mutations in human SOX9 cause a complex condition known as campomelic dysplasia, which is characterized by extensive cartilage phenotypes and sex reversal.
The early lethality of the homozygous condition has prevented the mechanism of SOX9 function in cartilage formation from being studied in detail either in the human
or in a mouse model. Recently, Yan and colleagues59
established that a zebrafish Sox9 duplicate, sox9a, is
mutated in the jellyfish mutant. Although this is a recessive-lethal mutation, the embryos develop to larval stages,
which allows detailed analysis of cartilage phenotypes.
The jellyfish mutant has revealed that Sox9 function is
required for cartilage morphogenesis and differentiation,
but not for the initial specification or migration of the
neural crest cells from which the cartilage is derived. In
this instance, the existence of duplicate genes has allowed
the analysis of a homozygous mutant condition in
zebrafish that could not be explored in other vertebrates.
The partitioning of ancestral sub-functions between
duplicate gene pairs by DDC mechanisms in the teleost
lineage should continue to make mutant analysis in the
zebrafish system a rewarding approach and to provide
further insight into the details of the pleiotropic functions
of human disease genes.
Functional complementation in plants
INFLORESCENCE MERISTEM
An apical meristem that lies atop
a shoot and that produces
several, lateral flower meristems.
Whereas whole-genome duplication occurs infrequently
in animals, plants have adopted it as a routine. For example, many of the plants we depend on for food, such as
maize and wheat, are ancient polyploids. Extensive gene
duplication during the evolution of both maize60,61 and
the mustard Arabidopsis thaliana1,62,63 has had a marked
impact on the genes that regulate plant reproduction. The
best estimates indicate that >35% of genes in these plants
are preserved as duplicate copies16,61. One well-characterized example of partial redundancy, and apparent functional complementation, after gene duplication involves
the APETALA1 (AP1), CAULIFLOWER (CAL) and
FRUITFULL (FUL) genes of Arabidopsis. These genes all
encode MADS-BOX-containing transcriptional regulators
that have roles in the initial specification of flower
MERISTEMS and the subsequent specification of floral organ
primordia and later organ cell types64. AP1 and CAL have
closely related sequences65 and are embedded in large
regions of conserved synteny that are located on different
arms of chromosome 1. They might be products of either
an ancient linear duplication event, later separated by
chromosome rearrangement, or polyploidization and
translocation62,66. The FUL gene is more closely related to
AP1 and CAL than to other Arabidopsis MADS-box
genes, but its location on chromosome 5 indicates that it
came to reside in the genome through a process of polyploidization. The three genes have between 55% and 75%
identity in their amino-acid sequences in functionally
defined regions, and their pattern of synteny in the
Arabidopsis genome indicates that they might be good
candidates for participating in DDC processes.
The single, double and triple mutant phenotypes of
severe hypomorphic or amorphic alleles of these genes
indicate that functional complementation could explain
their collective persistence in the genome. The single
mutant phenotypes of ap1, cal and ful indicate independent roles for each gene in normal development. The
most striking phenotype of severe ap1 loss-of-function
homozygous alleles is the homeotic transformation of
cells that contribute to the outer whorl organs of the
flower (the sepals and petals) towards an INFLORESCENCE
MERISTEM fate. This results in the ectopic production of
whole-flower meristems and the eventual transformation of single flowers into multi-flower branches67 (FIG. 4).
By contrast, homozygous cal mutant lines show little,
if any, phenotype. Homozygotes for severe loss-of-function alleles of the ful gene produce defects in the cellular differentiation of the seed pod. The key collective
role of the three genes in establishing flower meristem
fate came to be appreciated when double and triple
homozygous mutants were constructed9,64,68. Double
homozygotes for ap1 and cal mutations have a markedly
synergistic phenotype with characteristics not seen in
single mutants. These plants fail to make the normal
inflorescence-to-floral transition, instead producing
large, highly ramified clusters of inflorescence meristems
— reminiscent of the edible part of the cauliflower —
that produce only occasional floral organs (FIG. 4). Triple
mutants that lack all three of these gene activities completely fail to produce floral organs of any type under
normal growth conditions. This result indicates that
these three genes share a high level of interlocus functional redundancy and that this redundant activity stimulates meristems to begin the production of flowers.
Figure 4 | Mutant phenotypes of ap1 and ap1/cal plants. Scanning electron micrographs
of Arabidopsis plants. From left to right: wild type; ap1/ap1 loss of function, showing a single
flower without petals or STAMENS but with this tissue ‘homeotically’ transformed into three new
flowers; a cal/cal;ap1/ap1 double mutant, in which flowers are replaced by ‘cauliflower’-like
inflorescence meristems.
STAMEN
The male, pollen-bearing organ
of the plant.
ECOTYPE
A subdivision of a species that
survives as a distinct population
through environmental selection
and reproductive isolation.
SYNONYMOUS CHANGE
A nucleotide change that does
not alter the amino acid that is
encoded.
REPLACEMENT ALLELE
An allele in which a mutation
causes a resulting change in
amino-acid identity.
Expression studies that complement this mutational
analysis hint at patterns of complementation that might
have contributed to the genetic interactions now seen
between the three genes. In mutants that lack AP1 activity, FUL RNA begins to accumulate in flower meristems
at early stages. FUL is normally not expressed at such an
early stage69, which implies the loss of a negative regulatory interaction between these genes. It is possible that an
ancestral gene that has negative autoregulatory elements
gave rise to one gene that retained a negative regulatory
sub-function (FUL) and to another gene that has lost
this sub-function through degenerative mutation (AP1).
Intensive promoter analysis of Arabidopsis and outgroup
MADS-box genes indicates that a scheme of initial sub-functionalization might be a logical hypothesis. To assert
confidently that the Arabidopsis triple mutant
(ap1/cal/ful) phenotype recapitulates the phenotype that
would be expected from the loss of a single ancestral
gene, the functional analysis of orthologues in an array
of related plants is required. Such phylogenetically driven
analyses might reveal that the process of duplication and
degeneration of sub-functions has led to the development of a gene regulatory network, underpinning both
the establishment of reproduction and the developmental modularity of inflorescences and flowers.
The DDC model indicates that the initial preservation of gene duplicates by sub-functionalization is
followed by further degeneration of redundant sub-functions. This process, referred to as resolution, is anticipated to be completed in <5 million years (REF. 17), in
most cases. Given this time frame, the continuing redundancy seen between the AP1, CAL and FUL genes is surprising as the last major duplication event in the
Arabidopsis lineage occurred at least 65 Mya (REF. 1). So,
the redundancy seen between these genes is difficult to
explain simply from the standpoint of the DDC model.
It is possible that AP1, CAL and FUL have regulatory
regions with physically or functionally overlapping cis-regulatory sites that have an impact on several functions
of each gene17,22. This type of functional entanglement
would violate the requirement of the DDC model that
regulatory regions be independently mutable. Under this
model, all three loci would be left with redundant regulatory regions that could not be resolved by subsequent
rounds of mutation.
Alternatively, it is possible that these genes have
recently changed from a situation in which all three
genes were under purifying or positive selection for
their collective role to a position in which these selective
constraints have been removed. If this model is correct,
a “punctuated DDC” process might have ensued in
which initial degeneration and complementation
under a selective constraint was followed by a change in
that selection, which led to reacquisition of functional
redundancy and the subsequent accumulation of new
degenerate alleles. A singularly powerful study by
Purugganan and Suddith70 serves as a model to explore
the population-level processes that affect redundant
duplicates. Evidence from this study provides some
support for a punctuated DDC process having an
impact on these genes. A sequence survey of 17 CAL
alleles that were isolated from 12 distinct Arabidopsis
ECOTYPE populations showed that CAL is a highly polymorphic gene. A total of 21 polymorphisms were seen
in exons, 16 of which caused non-synonymous changes
at the amino-acid level and 5 of which were
SYNONYMOUS. This is in contrast to a recent survey of 18
coding-region polymorphisms at the AP1 locus, in
which 10 non-synonymous and 8 synonymous variants
were observed71. This result should be treated with a
measure of caution because of the potential for
sequencing errors, but it nevertheless shows that there
is an excess of REPLACEMENT ALLELES at CAL in
Arabidopsis 72, indicating that this gene has been evolving in a non-neutral fashion in these populations. So,
alleles in these ecotypes seem to have diversified under
evolutionary selection. However, the generation of
double homozygotes for AP1-null and CAL ecotype
alleles revealed that at least two populations were fixed
for severe loss-of-function alleles at the CAL locus. This
indicates that the role of CAL as an active gene might
be changing to that of a pseudogene, as FUL and AP1
act in its place. Alternatively, degenerative mutations
might be leading to the fixation of non-functional CAL
alleles in some populations and to alleles that complement FUL and AP1 activities in others, owing to a
recent relaxation of selective constraints. As more
becomes known about the regulation of all three genes,
it will be interesting to expand the sequence analysis of
polymorphic CAL alleles into the cis-regulatory regions
and to test the functional equivalence of FUL and AP1
alleles from ecotypes with and without functional CAL.
The general approach of using population-level assessments of allele polymorphism with the functional
assessment of ‘natural’ alleles, through genetic or transgenic experiments, should shed more light on the
evolving state of duplicate gene interactions.
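The contrast between the CAL and AP1 surveys can be placed in a 2 × 2 table and examined directly. The sketch below is an illustration only: it applies a one-sided Fisher's exact test to the polymorphism counts quoted above and is not the analysis used in the cited studies. The function name `fisher_exact_greater` is hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of a top-left
    count at least as large as the observed value a.
    """
    row1 = a + b            # e.g. all CAL polymorphisms
    col1 = a + c            # e.g. all non-synonymous polymorphisms
    total = a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(col1, x) * comb(total - col1, row1 - x)
        for x in range(a, upper + 1)
    )
    return tail / comb(total, row1)


# Counts quoted in the text: (non-synonymous, synonymous)
cal_nonsyn, cal_syn = 16, 5
ap1_nonsyn, ap1_syn = 10, 8

odds_ratio = (cal_nonsyn * ap1_syn) / (cal_syn * ap1_nonsyn)
p_value = fisher_exact_greater(cal_nonsyn, cal_syn, ap1_nonsyn, ap1_syn)
print(f"odds ratio = {odds_ratio:.2f}, one-sided P = {p_value:.3f}")
```

Because the two surveys sampled different sets of alleles, a comparison of this kind should be read as a rough guide rather than as a formal test of selection.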
The promise of the genomic era
As sub-functionalization might often depend on complementary degenerative changes in the cis-regulatory
elements of duplicated genes, it might ultimately prove
possible to recognize candidate cases of sub-functionalization on the basis of direct analysis of regulatory
sequences. Recently, a large-scale sequence comparison
of Hox clusters was carried out, which failed to produce
CLADE
A lineage of organisms or alleles
that comprises an ancestor and
all its descendants.
strong evidence of sub-functionalization. Chiu and colleagues73 compared the complete HoxA clusters of
species across the main jawed-vertebrate lineages
(human, horn shark (a chondrichthyian fish), striped
bass and zebrafish; FIG. 1) by using several sequence-alignment tools, including the powerful Web-based software PipMaker (for percentage identity plot74; see
Online link to PipMaker and MultiPipMaker). The
PipMaker program aligns multiple large sequences and
produces a readout that is easy to understand and that
shows regions of homology where percentage identity
exceeds 50%. Chiu and colleagues focused on short
100% conserved regions, called ‘phylogenetic footprints’,
which they found occurred frequently in clusters that
could span 200 nucleotides or more. Their comparisons
showed that, although horn shark and human share
large tracts of sequence identity, these were rarely conserved in the teleosts (zebrafish or striped bass). The
analysis rarely showed evidence of duplicated
zebrafish Hox genes partitioning the phylogenetic footprints present in the horn shark or human (which are
representative of the pre-duplication ancestral condition), which is apparently inconsistent with the DDC
model. Instead, they suggested that their results indicate
the action of adaptive modification on the duplicated
Hox clusters.
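The idea of a phylogenetic footprint, a short stretch that is perfectly conserved between aligned sequences, can be sketched in a few lines. The example below is a minimal illustration and not the procedure used by PipMaker or by Chiu and colleagues: it assumes a pre-computed pairwise alignment with `-` as the gap character, and the toy sequences, the function name `conserved_runs` and the `min_len` threshold are all hypothetical.

```python
def conserved_runs(seq_a, seq_b, min_len=6):
    """Return (start, end) index pairs (end exclusive) of alignment
    stretches in which both sequences carry the same non-gap base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    runs, start = [], None
    for i, (x, y) in enumerate(zip(seq_a, seq_b)):
        match = x == y and x != "-"
        if match and start is None:
            start = i                      # a new conserved run begins
        elif not match and start is not None:
            if i - start >= min_len:       # keep only long enough runs
                runs.append((start, i))
            start = None
    if start is not None and len(seq_a) - start >= min_len:
        runs.append((start, len(seq_a)))   # run reaching the end
    return runs


# Toy aligned sequences (invented for illustration)
human = "ACGTTGACCTGA-TTGCAAGT"
shark = "ACGTTGACGTGAATTGCAAGT"
print(conserved_runs(human, shark))
```

A partitioning of footprints between duplicates, as the DDC model predicts, would appear as runs that are present in the pre-duplication outgroup but retained in only one of the two teleost copies.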
Santini and Meyer have carried out rather similar
analyses, but obtained more evidence in support of the
occurrence of DDC processes (S. Santini and A. Meyer,
personal communication). These researchers again
compared HoxA cluster sequences, incorporating data
on the cichlid Oreochromis niloticus into their analysis.
Like Chiu and colleagues, they used the
PipMaker software, but they focused on longer “conserved non-coding sequences” that show at least 60%
identity over 50 nucleotides. They also made use of the
TRANSFAC Web-based software (see Online link to
TRANSFAC) to identify known transcription-factor-binding sites. This approach indicated the possibility
of more extensive conservation of HoxA regulatory
elements between all the vertebrates examined.
Furthermore, in cases where two HoxA gene duplicates
have both been retained in the zebrafish, the conserved
regulatory elements were recognizable in both copies.
However, these elements also showed some differences
from one another, consistent with the action of DDC
processes.
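A threshold such as "at least 60% identity over 50 nucleotides" amounts to a sliding-window scan of an alignment. The sketch below shows one way in which such a scan might be written. It is not the PipMaker implementation, it assumes a pre-computed pairwise alignment with `-` as the gap character, and the function name `identity_windows` is hypothetical.

```python
def identity_windows(seq_a, seq_b, window=50, min_identity=0.60):
    """Return start positions of alignment windows whose fraction of
    identical, non-gap columns is at least min_identity."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [int(x == y and x != "-") for x, y in zip(seq_a, seq_b)]
    hits = []
    running = sum(matches[:window])        # matches in the first window
    for start in range(len(matches) - window + 1):
        if start > 0:
            # slide by one column: add the new column, drop the old one
            running += matches[start + window - 1] - matches[start - 1]
        if running / window >= min_identity:
            hits.append(start)
    return hits


# Small demonstration with a 5-column window instead of 50
print(identity_windows("AAAAAAAAAA", "AAAATTTAAA", window=5))
```

Overlapping hits would normally be merged into a single conserved block before any comparison between duplicates is made.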
Another comparative study, which focused on the
Hoxb2–Hoxb3 intergenic region and used a broad array
of sequence analysis tools, also revealed extensive homology between mammals and teleosts. The Hoxb2–Hoxb3
intergenic sequences of mouse, human, zebrafish, striped
bass and pufferfish all share conserved cis-regulatory elements75. These conserved sequences are important for
the proper expression of mouse Hoxb2 and, consistent
with the conserved function of the elements, the expression patterns of the vertebrate Hoxb2 orthologues are
also largely conserved. Interestingly, in several cases, the
binding sites occur in different orders in different species,
and such reorganization of small cis-regulatory elements
might make it difficult for large-scale alignment techniques to detect all the sequences that are important for
functional homology. More detailed analyses such as this
might therefore hold promise for allowing candidate
sub-functions to be recognized.
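The difficulty caused by reordered binding sites can be illustrated with a comparison that ignores position. The sketch below is a toy example under stated assumptions: the two motif strings and both sequences are invented for illustration and do not correspond to real binding sites, and the function names are hypothetical.

```python
def motif_positions(sequence, motifs):
    """Map each motif to the sorted start positions at which it occurs."""
    found = {}
    for motif in motifs:
        positions, start = [], sequence.find(motif)
        while start != -1:
            positions.append(start)
            start = sequence.find(motif, start + 1)
        if positions:
            found[motif] = positions
    return found


def site_order(sequence, motifs):
    """List the motifs present, ordered by their first occurrence."""
    hits = motif_positions(sequence, motifs)
    return sorted(hits, key=lambda motif: hits[motif][0])


motifs = ["TGATTGAT", "CCATCA"]            # hypothetical site sequences
species_a = "GGTGATTGATCCGGCCATCAGG"
species_b = "GGCCATCAGGTTTGATTGATCC"

same_sites = set(site_order(species_a, motifs)) == set(site_order(species_b, motifs))
same_order = site_order(species_a, motifs) == site_order(species_b, motifs)
print(same_sites, same_order)
```

In this toy case the two sequences share the same complement of sites but in a different order, which is the situation that a purely collinear alignment might fail to report as conserved.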
On a note of caution, the conservation that has
been detected at the sequence level for Hox-cluster regulatory elements might not reflect the situation with
non-Hox genes. The Hox genes are subject to unusual
constraints that have been powerful enough to maintain clustered organization over many millions of
years. For other classes of genes, it might prove far
more difficult to detect the sequences that produce
conserved expression patterns because cis binding sites
and transcription factor proteins can co-evolve. For
example, the potential complexity and rapid evolution
of cis-elements has been amply shown in a detailed
comparison of the eve stripe 2 enhancers from the
closely related drosophilids Drosophila melanogaster
and D. virilis 76. Despite the conserved expression patterns of the eve genes in the two species, the enhancers
have evolved rapidly, with compensatory changes
occurring in each. A different complicating factor has
occurred in the case of maize, in which transposons
have made important contributions to regulatory
sequences77. Such changes might make it difficult to
recognize the complementary degenerative changes
that should be the hallmark of sub-functionalization
events, especially in cases of ancient duplications.
An approach based on sequence analysis has also
been taken by several groups to look at the coding
regions of genes and to identify regional divergence that
would be consistent with the DDC model78,79. To identify
potential duplicates in a species’ genome, computational
methods are used to identify likely members of gene
families, to align nucleotide and amino-acid sequences
and to reconstruct the molecular phylogeny of gene families using tree-building programs78. Terminal branches
of CLADES with pairs of highly similar genes can indicate
likely candidates for duplicates65, which in turn can be
subjected to tests to identify regions with high or low relative rates of non-synonymous/replacement nucleotide
changes. These relative rates can be used to determine
whether aligned regions of closely related genes are
diverging or are under purifying selection to maintain
functional similarity in both genes80,81. The increasingly
sophisticated functional and phylogenetic analyses of
gene families identified from sequencing projects are
already revealing potential redundancy and complementation between duplicates that arose from global and
local duplication events82,83.
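The distinction between synonymous and replacement changes can be shown with a simple codon-by-codon count. The sketch below is a deliberately crude illustration and is not one of the rate-estimation methods used in the cited studies, which also normalize by the number of synonymous and non-synonymous sites. It assumes two aligned, gap-free coding sequences in the same reading frame and the standard genetic code; the function name `count_changes` is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_changes(cds_a, cds_b):
    """Count differing codons between two aligned, gap-free coding
    sequences, split into (synonymous, replacement) changes."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be in-frame and of equal length")
    synonymous = replacement = 0
    for i in range(0, len(cds_a), 3):
        codon_a, codon_b = cds_a[i:i + 3], cds_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1     # same amino acid encoded
        else:
            replacement += 1    # amino acid altered
    return synonymous, replacement


# Met-Phe-Glu versus Met-Phe-Asp: one silent and one replacement change
print(count_changes("ATGTTTGAA", "ATGTTCGAT"))
```

Applying such counts to separate regions of an alignment gives a first impression of which parts of two duplicates are constrained and which are diverging.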
Using amino-acid-sequence and nucleotide-sequence
alignments, and the tools of molecular phylogeny, Gu78
has developed statistical methods to analyse all of the
amino-acid sites from gene families between and within
species that are undergoing statistically significant divergence. Using the DIVERGE software package (see Online
link to DIVERGE software), each amino acid is assigned a
probabilistic value based on its chemical type (for example, acidic or hydrophobic) and the severity of change
compared with another sequence. The program can even
map changes onto known three-dimensional structures
for proteins, to assess whether or not a change is occurring in a region of the protein with a known active site, a
solvent-exposed region or a role in folding. As more
structures become available, the ability to map patterns of
functional complementation onto individual protein
structures might provide evolutionary guidance to the
biochemical and proteomic analysis of protein function.
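The notion of scoring a change by chemical type can be illustrated with a simple site-by-site classification. The sketch below is not the statistical method implemented in DIVERGE. It merely labels each aligned site as identical, conservative or radical, and it assumes one common grouping of the amino acids into four chemical classes; other groupings are in use, and the function name `classify_sites` is hypothetical.

```python
# One common (assumed) grouping of the 20 amino acids by chemical type
CHEMICAL_CLASS = {}
for label, residues in {
    "acidic": "DE",
    "basic": "KRH",
    "polar": "STNQCYG",
    "hydrophobic": "AVLIMFWP",
}.items():
    for residue in residues:
        CHEMICAL_CLASS[residue] = label


def classify_sites(protein_a, protein_b):
    """Label each aligned, gap-free site as 'identical', 'conservative'
    (same chemical class) or 'radical' (different chemical class)."""
    if len(protein_a) != len(protein_b):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    for x, y in zip(protein_a, protein_b):
        if x == y:
            labels.append("identical")
        elif CHEMICAL_CLASS[x] == CHEMICAL_CLASS[y]:
            labels.append("conservative")
        else:
            labels.append("radical")
    return labels


print(classify_sites("MKDL", "MRDE"))
```

Sites labelled as radical in one duplicate but not in the other would be natural starting points for mapping onto a three-dimensional structure.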
Concluding remarks
What are the implications of the sub-functionalization model for evolution? One important concept is
that once sub-functionalization has preserved duplicate genes in the genome, those genes will be under
different selective pressures relative to their shared
ancestor. This might enable duplicates to explore a
mutational space that is closed to their shared ancestor. The differential resolution of functional overlap
through degenerative complementation might also be
important in speciation. For example, if duplicates
from different subpopulations become unable to
complement each other’s lost sub-functions, the two
populations might no longer be able to interbreed
when they are later reunited1,84. The larger the number
of duplicated genes in the original population the
more likely such differential resolution becomes.
Future investigation of the sub-functionalization
model will require an analysis of candidate sub-functionalized genes in a clear phylogenetic context.
Knowledge of the common ancestor will be crucial for
applying rigorous tests, in which gene expression, function, regulation and sequence evolution can be
analysed to piece together the evolutionary history of
duplicates. At present, the two examples that most
closely approach complete tests of the sub-functionalization model are the analyses of the zebrafish mitf and
hoxb1 duplicate genes (as previously discussed).
Zebrafish and other teleost fishes, as well as plants, will
continue to provide useful models in which to investigate DDC processes because of their relatively recent
duplications and the ease with which they can be
manipulated genetically in the lab. More experimental
data will also allow increasingly sophisticated theoretical models to be generated. These might have predictive
value, such that, in the long term, candidate sub-functionalized genes might be recognizable simply on
the basis of sequence analysis.
Finally, the sub-functionalization model allows a
new appreciation of the concepts of redundancy and
multifunctionality. For many researchers, redundancy
has been considered something of a problem.
Theoreticians have grappled with explaining why
redundancy occurs. Mouse molecular geneticists have
been frustrated to find that knockout alleles do not
always produce a phenotype. A consideration of genes
as the sum of their sub-functions might ultimately help
us to understand both redundancy and the informational networks that support genetic and phenotypic
modularity.
1. Lynch, M. & Conery, J. S. The evolutionary fate and
consequences of duplicate genes. Science 290,
1151–1155 (2000).
This paper analyses divergence rates between
duplicated genes from six eukaryotic genomes and
argues that duplications might be important in
speciation.
2. Song, K., Lu, P., Tang, K. & Osborn, T. C. Rapid genome
change in synthetic polyploids of Brassica and its
implications for polyploid evolution. Proc. Natl Acad. Sci.
USA 92, 7719–7723 (1995).
3. Hughes, M. K. & Hughes, A. L. Evolution of duplicate
genes in a tetraploid animal, Xenopus laevis. Mol. Biol.
Evol. 10, 1360–1369 (1993).
4. Sidow, A. Gen(om)e duplications in the evolution of early
vertebrates. Curr. Opin. Genet. Dev. 6, 715–722 (1996).
5. Meyer, A. & Schartl, M. Gene and genome duplications in
vertebrates: the one-to-four (-to-eight in fish) rule and the
evolution of novel gene functions. Curr. Opin. Cell Biol. 11,
699–704 (1999).
6. Wolfe, K. H. Yesterday’s polyploids and the mystery of
diploidization. Nature Rev. Genet. 2, 333–341 (2001).
7. Postlethwait, J. H. et al. Zebrafish comparative genomics
and the origins of vertebrate chromosomes. Genome
Res. 10, 1890–1902 (2000).
8. Tautz, D. Redundancies, development and the flow of
information. Bioessays 14, 263–266 (1992).
9. Pickett, F. B. & Meeks-Wagner, D. R. Seeing double,
appreciating genetic redundancy. Plant Cell 7, 1347–1356
(1995).
10. Thomas, J. H. Thinking about genetic redundancy. Trends
Genet. 9, 395–399 (1993).
11. Fisher, R. A. The sheltering of lethals. Am. Nat. 69,
446–455 (1935).
12. Haldane, J. B. S. The part played by recurrent mutation in
evolution. Am. Nat. 67, 5–9 (1933).
13. Nadeau, J. H. & Sankoff, D. Comparable rates of gene
loss and functional divergence after genome duplications
early in vertebrate evolution. Genetics 147, 1259–1266
(1997).
14. Li, W. H., Gu, Z., Wang, H. & Nekrutenko, A. Evolutionary
analyses of the human genome. Nature 409, 847–849
(2001).
15. Bailey, J. A. et al. Recent segmental duplications in the
human genome. Science 297, 1003–1007 (2002).
16. The Arabidopsis Genome Initiative. Analysis of the genome
sequence of the flowering plant Arabidopsis thaliana. Nature
408, 796–815 (2000).
17. Force, A. et al. Preservation of duplicate genes by
complementary, degenerative mutations. Genetics 151,
1531–1545 (1999).
The original description of the DDC sub-functionalization model is reported here.
18. Lynch, M. & Force, A. The probability of duplicate gene
preservation by subfunctionalization. Genetics 154,
459–473 (2000).
19. Edelman, G. M., Meech, R., Owens, G. C. & Jones, F. S.
Synthetic promoter elements obtained by nucleotide
sequence variation and selection for activity. Proc. Natl
Acad. Sci. USA 97, 3038–3043 (2000).
20. Yuh, C. H., Bolouri, H. & Davidson, E. H. Cis-regulatory logic
in the endo16 gene: switching from a specification to a
differentiation mode of control. Development 128, 617–629
(2001).
21. Carroll, S. B. Endless forms: the evolution of gene regulation
and morphological diversity. Cell 101, 577–580 (2000).
22. Force, A., Cresko, W. F. & Pickett, F. B. in Modularity in
Development and Evolution (eds Schlosser, G. & Wagner,
G.) (Univ. of Chicago Press, Illinois, in the press).
23. Piatigorsky, J. & Wistow, G. The recruitment of crystallins:
new functions precede gene duplication. Science 252,
1078–1079 (1991).
24. Hughes, A. L. The evolution of functionally novel proteins
after gene duplication. Proc. R. Soc. Lond. B Biol. Sci. 256,
119–124 (1994).
25. Averof, M., Dawes, R. & Ferrier, D. Diversification of
arthropod Hox genes as a paradigm for the evolution of
gene functions. Semin. Cell Dev. Biol. 7, 539–551 (1996).
26. Stoltzfus, A. On the possibility of constructive neutral
evolution. J. Mol. Evol. 49, 169–181 (1999).
27. Lynch, M., O’Hely, M., Walsh, B. & Force, A. The probability
of preservation of a newly arisen gene duplicate. Genetics
159, 1789–1804 (2001).
28. Castillo-Davis, C. I. & Hartl, D. L. Genome evolution and
developmental constraint in Caenorhabditis elegans. Mol.
Biol. Evol. 19, 728–735 (2002).
29. Seoighe, C. & Wolfe, K. H. Updated map of duplicated
regions in the yeast genome. Gene 238, 253–261
(1999).
30. McGinnis, W. & Krumlauf, R. Homeobox genes and axial
patterning. Cell 68, 283–302 (1992).
31. De Rosa, R. et al. Hox genes in brachiopods and priapulids
and protostome evolution. Nature 399, 772–776 (1999).
32. Holland, P. W., Garcia-Fernandez, J., Williams, N. A. &
Sidow, A. Gene duplications and the origins of vertebrate
development. Development (Suppl.), 125–133 (1994).
33. Amores, A. et al. Genome duplications in vertebrate
evolution: evidence from zebrafish Hox clusters. Science
282, 1711–1714 (1998).
This study analysed the complete organization of the
zebrafish Hox clusters, providing strong evidence for
the occurrence of a whole-genome duplication event
during teleost evolution.
34. Manley, N. R. & Capecchi, M. R. The role of Hoxa-3 in
mouse thymus and thyroid development. Development 121,
1989–2003 (1995).
35. Chisaka, O. & Capecchi, M. R. Regionally restricted
developmental defects resulting from targeted disruption of
the mouse homeobox gene hox-1.5. Nature 350, 473–479
(1991).
36. Condie, B. G. & Capecchi, M. R. Mice homozygous for a
targeted disruption of Hoxd-3 (Hox-4.1) exhibit anterior
transformations of the first and second cervical vertebrae,
the atlas and the axis. Development 119, 579–595 (1993).
37. Condie, B. G. & Capecchi, M. R. Mice with targeted
disruptions in the paralogous genes hoxa-3 and hoxd-3
reveal synergistic interactions. Nature 370, 304–307
(1994).
38. Greer, J. M., Puetz, J., Thomas, K. R. & Capecchi, M. R.
Maintenance of functional equivalence during paralogous
Hox gene evolution. Nature 403, 661–665 (2000).
An elegant mouse genetics approach to investigating
functional redundancy in Hox genes.
39. Bruce, A., Oates, A., Prince, V. E. & Ho, R. K. Additional hox
clusters in the zebrafish: divergent expression belies
conserved activities of duplicate hoxB5 genes. Evol. Dev. 3,
127–144 (2001).
40. McClintock, J. M., Kheirbek, M. A. & Prince, V. E. Knockdown
of duplicated zebrafish hoxb1 genes reveals distinct
roles in hindbrain patterning and a novel mechanism of
duplicate gene retention. Development 129, 2339–2354
(2002).
Describes the sub-functionalization of a pair of
duplicated zebrafish Hox genes. This study is unique
in including the analysis of not only duplicate gene
expression and function, but also duplicate regulatory
sequences.
41. McClintock, J. M., Carlson, R., Mann, D. M. & Prince, V. E.
Consequences of Hox gene duplication in the vertebrates:
an investigation of the zebrafish Hox paralogue group 1
genes. Development 128, 2471–2484 (2001).
42. Studer, M., Lumsden, A., Ariza-McNaughton, L., Bradley,
A. & Krumlauf, R. Altered segmental identity and abnormal
migration of motor neurons in mice lacking Hoxb1. Nature
384, 630–634 (1996).
43. Goddard, J. M., Rossel, M., Manley, N. R. & Capecchi, M.
R. Mice with targeted disruption of Hoxb1 fail to form the
motor nucleus of the VIIth nerve. Development 122,
3217–3228 (1996).
44. Gaufo, G. O., Flodby, P. & Capecchi, M. R. Hoxb1 controls
effectors of sonic hedgehog and Mash1 signaling
pathways. Development 127, 5343–5354 (2000).
45. Lufkin, T., Dierich, A., LeMeur, M., Mark, M. & Chambon, P.
Disruption of the Hox-1.6 homeobox gene results in
defects in a region corresponding to its rostral domain of
expression. Cell 66, 1105–1119 (1991).
46. Carpenter, E. M., Goddard, J. M., Chisaka, O., Manley, N.
R. & Capecchi, M. R. Loss of Hox-A1 (Hox-1.6) function
results in the reorganization of the murine hindbrain.
Development 118, 1063–1075 (1993).
47. Mark, M. et al. Two rhombomeres are altered in Hoxa1
mutant mice. Development 119, 319–338 (1993).
48. Postlethwait, J. H. et al. Vertebrate genome evolution and
the zebrafish gene map. Nature Genet. 18, 345–349
(1998).
49. Taylor, J. S., Van de Peer, Y., Braasch, I. & Meyer, A.
Comparative genomics provides evidence for an ancient
genome duplication event in fish. Phil. Trans. R. Soc. Lond.
B Biol. Sci. 356, 1661–1679 (2001).
50. Sakamoto, T. et al. A microsatellite linkage map of rainbow
trout (Oncorhynchus mykiss) characterized by large
sex-specific differences in recombination rates. Genetics 155,
1331–1345 (2000).
51. Tassabehji, M., Newton, V. E. & Read, A. P. Waardenburg
syndrome type 2 caused by mutations in the human
microphthalmia (MITF) gene. Nature Genet. 8, 251–255
(1994).
52. Smith, S. D., Kelley, P. M., Kenyon, J. B. & Hoover, D.
Tietz syndrome (hypopigmentation/deafness) caused by
mutation of MITF. J. Med. Genet. 37, 446–448 (2000).
53. Hodgkinson, C. A. et al. Mutations at the mouse
microphthalmia locus are associated with defects in a gene
encoding a novel basic helix–loop–helix zipper protein. Cell
74, 395–404 (1993).
54. Lister, J. A., Close, J. & Raible, D. W. Duplicate mitf genes
in zebrafish: complementary expression and conservation
of melanogenic potential. Dev. Biol. 237, 333–344 (2001).
Shows that the zebrafish mitfa and mitfb duplicate
genes are homologous to distinct isoforms of the
mammalian Mitf gene.
55. Altschmied, J. et al. Subfunctionalization of duplicate mitf
genes associated with differential degeneration of
alternative exons in fish. Genetics 161, 259–267 (2002).
56. Talbot, W. S. & Hopkins, N. Zebrafish mutations and
functional analysis of the vertebrate genome. Genes Dev.
14, 755–762 (2000).
57. Sampath, K. et al. Induction of the zebrafish ventral brain
and floorplate requires cyclops/nodal signalling. Nature
395, 185–189 (1998).
58. Feldman, B. et al. Zebrafish organizer development and
germ-layer formation require nodal-related signals. Nature
395, 181–185 (1998).
59. Yan, Y.-L. et al. A zebrafish sox9 gene is required for
cartilage morphogenesis. Development (in the press).
60. Gaut, B. S. & Doebley, J. F. DNA sequence evidence for the
segmental allotetraploid origin of maize. Proc. Natl Acad.
Sci. USA 94, 6809–6814 (1997).
61. Gaut, B. S. Patterns of chromosomal duplication in maize
and their implications for comparative maps of the grasses.
Genome Res. 11, 55–66 (2001).
62. Vision, T. J., Brown, D. G. & Tanksley, S. D. The origins of
genomic duplications in Arabidopsis. Science 290,
2114–2116 (2000).
63. Blanc, G., Barakat, A., Guyot, R., Cooke, R. & Delseny, M.
Extensive duplication and reshuffling in the Arabidopsis
genome. Plant Cell 12, 1093–1101 (2000).
64. Ferrandiz, C., Gu, Q., Martienssen, R. & Yanofsky, M. F.
Redundant regulation of meristem identity and plant
architecture by FRUITFULL, APETALA1, and
CAULIFLOWER. Development 127, 725–734 (2000).
This report describes the phenotypes of triple
mutants of the Arabidopsis genes AP1, CAL and FUL
and their partially redundant functions in a gene
network.
65. Purugganan, M. D., Rounsley, S. D., Schmidt, R. J. &
Yanofsky, M. F. Molecular evolution of flower
development: diversification of the plant MADS-box
regulatory gene family. Genetics 140, 345–356 (1995).
66. Achaz, G., Netter, P. & Coissac, E. Study of
intrachromosomal duplications among the eukaryote
genomes. Mol. Biol. Evol. 18, 2280–2288 (2001).
67. Irish, V. F. & Sussex, I. M. Function of the apetala-1 gene
during Arabidopsis floral development. Plant Cell 2,
741–753 (1990).
68. Bowman, J. L., Alvarez, J., Weigel, D., Meyerowitz, E. M.
& Smyth, D. R. Control of flower development in
Arabidopsis thaliana by APETALA1 and interacting genes.
Development 119, 721–743 (1993).
69. Mandel, M. A. & Yanofsky, M. F. The Arabidopsis AGL8
MADS box gene is expressed in inflorescence meristems
and is negatively regulated by APETALA1. Plant Cell 7,
1763–1771 (1995).
70. Purugganan, M. D. & Suddith, J. I. Molecular population
genetics of the Arabidopsis CAULIFLOWER regulatory
gene: nonneutral evolution and naturally occurring
variation in floral homeotic function. Proc. Natl Acad. Sci.
USA 95, 8130–8134 (1998).
Describes sequence comparisons of MADS-box
genes from ecotypes of Arabidopsis to reveal that
CAL is a surprisingly polymorphic gene.
71. Olsen, K. M., Womack, A., Garrett, A. R., Suddith, J. I. &
Purugganan, M. D. Contrasting evolutionary forces in the
Arabidopsis thaliana floral developmental pathway.
Genetics 160, 1641–1650 (2002).
72. McDonald, J. H. & Kreitman, M. Adaptive protein
evolution at the Adh locus in Drosophila. Nature 351,
652–654 (1991).
73. Chiu, C.-H. et al. Molecular evolution of the HoxA cluster
in the three major gnathostome lineages. Proc. Natl Acad.
Sci. USA 99, 5492–5497 (2002).
74. Schwartz, S. et al. PipMaker — A web server for aligning
two genomic DNA sequences. Genome Res. 10,
577–586 (2000).
75. Scemama, J.-L., Hunter, M., McCallum, J., Prince, V. &
Stellwag, E. Evolutionary divergence of teleost Hoxb2
expression patterns and transcriptional regulatory loci.
J. Exp. Zool. 294, 285–299 (2002).
76. Ludwig, M. Z., Bergman, C., Patel, N. H. & Kreitman, M.
Evidence for stabilizing selection in a eukaryotic enhancer
element. Nature 403, 564–567 (2000).
77. Zhang, Q., Arbuckle, J. & Wessler, S. R. Recent,
extensive, and preferential insertion of members of the
miniature inverted-repeat transposable element family
Heartbreaker into genic regions of maize. Proc. Natl Acad.
Sci. USA 97, 1160–1165 (2000).
78. Gu, X. Statistical methods for testing functional
divergence after gene duplication. Mol. Biol. Evol. 16,
1664–1674 (1999).
79. Dermitzakis, E. T. & Clark, A. G. Differential selection after
duplication in mammalian developmental genes. Mol. Biol.
Evol. 18, 557–562 (2001).
80. Nei, M. & Kumar, S. Molecular Evolution and
Phylogenetics (Oxford Univ. Press, New York, 2000).
81. Kumar, S., Tamura, K., Jakobsen, I. B. & Nei, M.
MEGA2: molecular evolutionary genetics analysis
software. Bioinformatics 17, 1244–1245 (2001).
82. Shuai, B., Reynaga-Pena, C. G. & Springer, P. S.
The lateral organ boundaries gene defines a novel,
plant-specific gene family. Plant Physiol. 129, 747–761
(2002).
83. Shiu, S. H. & Bleecker, A. B. Receptor-like kinases from
Arabidopsis form a monophyletic gene family related to
animal receptor kinases. Proc. Natl Acad. Sci. USA 98,
10763–10768 (2001).
84. Lynch, M. & Force, A. Gene duplication and the origin of
interspecific genomic incompatibility. Am. Nat. 156,
590–605 (2000).
85. Mezey, J. G., Cheverud, J. M. & Wagner, G. P. Is the
genotype–phenotype map modular? A statistical
approach using mouse quantitative trait loci data.
Genetics 156, 305–311 (2000).
86. Emerson, R. A. Genetic correlation and spurious
allelomorphism in maize. Nebraska Agric. Exp. Stat. Annu.
Rep. 24, 59–90 (1911).
87. Muller, H. J. Further studies on the nature and causes of
gene mutations. Proc. Sixth Int. Congr. Genet. 1,
213–255 (1932).
88. Serebrovsky, A. S. & Dubinin, N. P. Artificial production of
mutations and the problem of the gene. Uspeki
Eksperimental'noi Biologii 8, 235–247 (1929).
89. Raffel, D. & Muller, H. J. Position effect and gene divisibility
considered in connection with three strikingly similar scute
mutations. Genetics 25, 541–583 (1940).
90. Verderosa, F. J. & Muller, H. J. Another case of dissimilar
characters in Drosophila apparently representing changes
of the same locus. Genetics 39, 999 (1954).
91. Prince, V. E. The Hox paradox: more complex(es) than
imagined. Dev. Biol. 249, 1–15 (2002).
92. Studer, M. et al. Genetic interactions between Hoxa1 and
Hoxb1 reveal new roles in regulation of early hindbrain
patterning. Development 125, 1025–1036 (1998).
93. Pöpperl, H. et al. Segmental expression of Hoxb1 is
controlled by a highly conserved autoregulatory loop
dependent upon exd/pbx. Cell 81, 1031–1042 (1995).
94. Dupe, V. et al. In vivo functional analysis of the Hoxa-1 3′
retinoic acid response element (3′RARE). Development
124, 399–410 (1997).
95. Langston, A. W., Thompson, J. R. & Gudas, L. J. Retinoic
acid-responsive enhancers located 3′ of the Hox A and
Hox B homeobox gene clusters. Functional analysis. J.
Biol. Chem. 272, 2167–2175 (1997).
96. Shih, L., Tsay, H., Lin, S. & Hwang, S. L. Expression of
zebrafish Hoxa1a in neuronal cells of the midbrain and
anterior hindbrain. Mech. Dev. 101, 279–281 (2001).
97. Kolm, P. J. & Sive, H. L. Regulation of the Xenopus labial
homeodomain genes, HoxA1 and HoxD1: activation by
retinoids and peptide growth factors. Dev. Biol. 167,
34–49 (1995).
98. Scholpp, S. & Brand, M. Morpholino-induced knockdown
of zebrafish engrailed genes eng2 and eng3 reveals
redundant and unique functions in midbrain–hindbrain
boundary development. Genesis 30, 129–133 (2001).
Acknowledgements
We thank A. Bruce, A. Force, R. Ho, J. Postlethwait and three
reviewers for helpful comments on the manuscript. We are also
grateful to D. Raible for advice on Mitf gene evolution, and to
S. Santini and A. Meyer for sharing their observations before publication. Work cited from the Prince lab was funded by the National
Science Foundation and that from the Pickett lab by the
National Institutes of Health.
Online links
DATABASES
The following terms in this article are linked online to:
LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink
Abd-B | dumpy | En1 | eve stripe 2 | Hoxa1 | Hoxa3 | Hoxb1 |
hoxb1a | hoxb1b | Hoxb2 | Hoxb3 | hoxb5a | Hoxd3 | Mitf | nodal |
os | Sox9 | SOX9 | Sy
OMIM: http://www.ncbi.nlm.nih.gov/Omim
campomelic dysplasia | Tietz syndrome | Waardenburg
syndrome type 2a
The Arabidopsis Information Resource:
AP1 | CAL | FUL
ZFIN: http://zfin.org
cyc | egr2 | engrailed2a | engrailed2b | mitfa | mitfb | ndr1 | sox9a
FURTHER INFORMATION
DIVERGE software: http://xgu1.zool.iastate.edu
F. Bryan Pickett’s lab:
http://www.luc.edu/depts/biology/pickett.htm
Gene Tools LLC: http://www.gene-tools.com
PipMaker and MultiPipMaker:
http://bio.cse.psu.edu/pipmaker
TRANSFAC — The Transcription Factor Database:
http://transfac.gbf.de/TRANSFAC
Victoria Prince’s lab:
http://pondside.uchicago.edu/oba/faculty/prince_v.html
VOLUME 3 | NOVEMBER 2002 | 837
© 2002 Nature Publishing Group