IB404 - 25 – Theories – April 23
1. Gene duplication is such an obvious and common event that ideas
about it are seldom credited to single individuals. Such ideas go back
to the great evolutionary geneticists Haldane and Fisher in the 1930s,
but Susumu Ohno in particular championed the importance of gene
duplication early on. Ohno moved from Japan to the US in 1950 and
worked at the City of Hope Hospital in Los Angeles, California, and
continued to publish reviews on the subject until his death in 2000. After early work on
cytogenetics of mammals and birds, in 1970 he published an influential
book titled Evolution by Gene Duplication in which he strongly
argued the basic idea that after a gene duplication event, in most cases one of the two duplicates
would be disabled by deleterious mutations and become a pseudogene, whether through nonsense
mutations (premature stop codons), frameshifting indels, or crucial amino acid changes. But occasionally the duplicates
would remain active long enough to allow them to start to diverge and one might gain a slightly
different function from the other, and eventually through selection for this new function diverge
significantly from the original. This is now known as the neofunctionalization model.
There are many examples of gene duplicates that seem to follow this model, like the odorant
receptors and p450s. Perhaps the most obvious feature of these is that one of the duplicates
diverges more rapidly than the other, in which case the rapidly diverging duplicate is thought to
have acquired the new function. An example is the VKORC1 gene, which is the more rapidly
evolving duplicate in vertebrates, while its paralog, VKORC1L1, of unknown function, is
far more conserved and hence might better reflect the original role of this gene/protein in
2. In 2000 an alternative model was proposed by Michael Lynch,
who was an assistant professor here in the Ethology, Ecology, and
Evolution (EEE, but now Animal Biology) department and then
moved to the University of Oregon at Eugene. He is now at Indiana
University in Bloomington. Lynch proposed that instead of rapidly
acquiring a new function, most gene duplicates that survived
partition the original functions of the single gene, a model known as
subfunctionalization. He thinks about this model very broadly, for
example, the partitioning could simply involve expression in slightly
different, and still overlapping, regions. The central idea is that any
partitioning of expression or function means that each copy becomes
important for the organism to survive
and compete, that is, they complement
each other, hence both copies are
retained, allowing them to diverge
further as time passes.
Notice that this model, at least
initially, relies only on degenerative
mutations, e.g. deletion or mutation
of different enhancer elements in
each duplicate, which are presumably
far more likely to happen than
mutations giving one copy a novel
function.
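
The degenerative route to subfunctionalization described above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not Lynch's published model: the function names, the `p_null` parameter, and the rule that a mutation is rejected when it would leave an expression domain uncovered are all hypothetical simplifications introduced here.

```python
import random


def duplicate_fate(n_enhancers=2, p_null=0.5, rng=None):
    """Return the fate of one duplicated gene pair in a toy model.

    Assumptions (hypothetical, for illustration only): each copy starts with
    every enhancer intact. Degenerative mutations arrive one at a time and hit
    a random copy: with probability p_null the mutation destroys the whole
    copy, otherwise it deletes one enhancer from that copy. A mutation is
    rejected (assumed lethal) if it would leave any expression domain with no
    working enhancer in either copy.
    """
    rng = rng or random.Random()
    full = set(range(n_enhancers))
    copies = [set(full), set(full)]
    while True:
        target = rng.randrange(2)
        other = 1 - target
        if rng.random() < p_null:
            # Losing a whole copy is tolerated only if the other copy
            # still drives expression in every domain.
            if copies[other] == full:
                return "nonfunctionalization"
            continue
        enhancer = rng.randrange(n_enhancers)
        if enhancer in copies[target] and enhancer in copies[other]:
            copies[target].discard(enhancer)
        if not copies[target]:
            # A copy with no enhancers left is silent, so it is effectively lost.
            return "nonfunctionalization"
        if copies[0] != full and copies[1] != full:
            # Each copy now lacks something the other supplies: both are needed.
            return "subfunctionalization"


def subfunctionalized_fraction(n_trials, n_enhancers, p_null=0.5, seed=0):
    """Fraction of simulated duplicate pairs that end up subfunctionalized."""
    rng = random.Random(seed)
    hits = sum(
        duplicate_fate(n_enhancers, p_null, rng) == "subfunctionalization"
        for _ in range(n_trials)
    )
    return hits / n_trials


if __name__ == "__main__":
    for k in (1, 2, 5):
        print(k, "enhancers:", subfunctionalized_fraction(5000, k))
```

In this sketch a gene with a single enhancer can never be subfunctionalized, while genes with more independently mutable enhancers tend to be preserved as complementary pairs more often, which is the intuition behind the model.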
3. Lynch worked with zebrafish colleagues
at Oregon on one example involving the
engrailed gene, where zebrafish have two
copies (presumably resulting from their
extra polyploidization event) that appear to
partition the usual expression pattern of
this gene in mammals. One gene is
expressed in the developing pectoral bud,
while the other is in the hindbrain and
spinal column. The single mammalian gene
is expressed in both places (most
vertebrates have a single copy of this
crucial gene, so the other three from the 2R
event at the base of vertebrates must have
been lost).
4. Next slide - The HOX complex provides many possible examples of this where the genes,
either duplicated in tandem along the complex, or in duplicated copies of the complex, have come
to be expressed in different spatial patterns from anterior to posterior (for the tandem copies
along the complex), or in different tissues, e.g. nerves versus muscles (in vertebrates with 4
copies of the complex).
[Figure: HOX complex in fly and mouse]
5. Although it remains unclear how duplications of single genes initially occur, if they commonly
occur as tandem repeats, which seems to be the case, then it is easy to see how tandem arrays of
genes can be generated by unequal crossing over. A classic example is provided by the
red/green opsins in our genome. These two recently duplicated genes (paralogs) are 96%
identical in DNA sequence and in tandem on the X-chromosome, and commonly undergo
unequal recombination such that one resultant chromosome has a single copy of either red or green,
while the other chromosome has an extra copy of red or green or a hybrid of the two genes.
Males who inherit the former chromosome are red/green color-blind. It turns out that most of us
have various combinations of duplicated versions of these two genes, but as long as we have at
least one copy of red and one of green we are not red/green color-blind.
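
Unequal crossing over between misaligned tandem arrays can be sketched with plain lists. This is a minimal illustration, assuming each X chromosome is a list of gene labels ("R" for red, "G" for green); the function names are hypothetical, and treating any hybrid gene as neither red nor green is a deliberate simplification.

```python
def unequal_crossover(chrom_a, chrom_b, i, j):
    """Crossover between genes, with position i of A misaligned to position j of B.

    Returns the two recombinant chromosomes: the left part of A joined to the
    right part of B, and the left part of B joined to the right part of A.
    """
    return chrom_a[:i] + chrom_b[j:], chrom_b[:j] + chrom_a[i:]


def intragenic_crossover(chrom_a, chrom_b, i, j):
    """Crossover inside gene i of A paired with gene j of B, giving hybrid genes.

    A hybrid is labeled 'X/Y': its 5' part comes from X and its 3' part from Y.
    """
    product_1 = chrom_a[:i] + [chrom_a[i] + "/" + chrom_b[j]] + chrom_b[j + 1:]
    product_2 = chrom_b[:j] + [chrom_b[j] + "/" + chrom_a[i]] + chrom_a[i + 1:]
    return product_1, product_2


def is_red_green_colorblind(x_chromosome):
    """True if a male's single X lacks an intact red or an intact green gene.

    Simplification: hybrid genes are not counted as either red or green.
    """
    return not ("R" in x_chromosome and "G" in x_chromosome)


if __name__ == "__main__":
    normal = ["R", "G"]
    extra, short = unequal_crossover(normal, normal, 1, 0)
    print(extra, is_red_green_colorblind(extra))   # ['R', 'R', 'G'] False
    print(short, is_red_green_colorblind(short))   # ['G'] True
```

Starting from two normal red-green arrays, a single misaligned exchange yields one chromosome with an extra gene and one with a gene missing, matching the outcome described above.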
6. We routinely see these kinds of tandem arrays in any large gene family, such as the
immunoglobulin genes in vertebrates, or the p450s in most eukaryotes, or the odorant receptors in
any animal. Here’s an example from our work with the odorant receptors of bees, where there has
been an expansion of the family to around 170 genes, compared with about 60 in flies. One bee-specific
subfamily alone consists of 157 genes, and within it there is a large tandem array of 60
genes, which remain today in a perfect tandem array, even though the genes at either end, which
are the oldest genes according to a phylogenetic tree of their encoded proteins, barely resemble
each other (<20% amino acid identity). Eventually these tandem arrays might get broken up by
inversions and other chromosomal rearrangements; indeed, in Drosophila flies there are almost no
such tandem arrays of odorant receptors, the maximum being three genes together, presumably
because Drosophila genomes have been subject to more genome “flux” than have bee genomes.
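
The amino acid identity figure quoted above is a simple ratio computed over an alignment. A minimal sketch follows, assuming the two protein sequences are already aligned to equal length and that gapped columns are excluded from the denominator (other conventions exist); the function name and the example sequences are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned, ungapped columns where the two residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy alignment, not real odorant receptor sequences.
    print(percent_identity("MKT-LV", "MRTALV"))  # 80.0
```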
7. Here’s another example from our work with the tetraspanin family in Drosophila. This time
about half of the genes are dispersed around the genome as single copies; however, 18 of them are
in an array (bottom), mostly in tandem near the centromere of chromosome arm 2R.
Nevertheless, when we compare their sequences, these genes in the array are almost as different
from each other as the rest, although we do believe that they all originated as an array from a
single gene. Among many uncertainties is why these 18 genes have remained in this array for so
long despite diverging enormously. For the HOX genes we know it has to do with their regulation
8. With the coming of genome sequences, various groups started to ask about the frequency of
gene duplication, and again Michael Lynch took the lead with a Science paper in 2000 showing
that in various genomes, the rates of gene duplication are remarkably high. He also confirmed
that, as expected, the numbers of duplicates fell off rapidly, revealing a half-life around 4 Myr,
although there is a large range. Note that he can identify the polyploidization event in
Arabidopsis as a peak of duplicated genes roughly 65 Myr old, with age estimated from Ks, the
number of synonymous substitutions per synonymous site (X axis).
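
A half-life of about 4 Myr can be turned into an expected survival curve. The sketch below assumes simple exponential decay with a constant loss rate, which is an idealization (the notes stress that the range is large); the function names are hypothetical.

```python
def surviving_fraction(age_myr, half_life_myr=4.0):
    """Fraction of gene duplicates still present after age_myr million years,
    assuming exponential loss with the given half-life."""
    return 0.5 ** (age_myr / half_life_myr)


def expected_survivors(n_duplicates, age_myr, half_life_myr=4.0):
    """Expected number of duplicates remaining from an initial cohort."""
    return n_duplicates * surviving_fraction(age_myr, half_life_myr)


if __name__ == "__main__":
    for age in (0, 4, 8, 16):
        print(age, "Myr:", surviving_fraction(age))
```

Under this assumption half of a cohort of duplicates remains after 4 Myr and a quarter after 8 Myr.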
9. The origin of introns remains a controversial topic in
molecular biology and genomics. Some time ago Walter Gilbert
at Harvard (developer of an independent chemical degradation method of
DNA sequencing for which he received a Chemistry Nobel along
with Fred Sanger in 1980) developed the idea that introns might be
ancient features of genes that were originally present in bacteria,
but have been lost from them. The idea is that proteins consist of
modules, and these modules originally were encoded by separate
exons, which were put together into genes where they are
separated by introns. It would then be possible for “exon-shuffling” to occur on a large scale, producing diverse proteins
from a limited set of modules. This is known as the introns-early
model.
One of the essential requirements for such a theory is that the introns all be in the same phase
with respect to codons, because otherwise when exons are shuffled, the introns would lead to
frameshifts. The simplest model has all introns in phase 0, that is, between codons. However,
today we find introns in all three possible phases. Gilbert’s lab has shown, however, and genomic
sequences and their gene annotations confirm this on a grand scale, that the ratios of intron
phases are biased towards phase 0 introns, roughly 2:1:1 for the three possible phases, 0, 1, and 2.
Therefore their modified model is that roughly half the phase 0 introns represent the original
module-separating introns, while the other half, and all the phase 1 and phase 2 introns, are
subsequent acquisitions in the eukaryotic lineages. It is almost impossible to disprove this model,
because you don’t know which 50% of the phase 0 introns are supposed to be ancient.
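
The phase bookkeeping can be made concrete with a short sketch. It assumes the usual convention that an intron's phase is the number of coding bases preceding it, modulo 3 (phase 0 falls between codons), and it works through the 2:1:1 arithmetic of the modified model; the function names are hypothetical.

```python
from fractions import Fraction


def intron_phase(coding_bases_before_intron):
    """Phase 0: intron lies between codons; phase 1: after the first base
    of a codon; phase 2: after the second base."""
    return coding_bases_before_intron % 3


def shuffle_keeps_frame(target_intron_phase, exon_5p_phase, exon_3p_phase):
    """An exon dropped into an intron keeps the reading frame only if the
    introns flanking the exon share the phase of the target intron."""
    return target_intron_phase == exon_5p_phase == exon_3p_phase


def ancient_fraction(phase_ratio=(2, 1, 1), ancient_share_of_phase0=Fraction(1, 2)):
    """Fraction of all introns that are ancient if only a share of the
    phase 0 introns is ancient and phases occur in the given ratio."""
    return ancient_share_of_phase0 * Fraction(phase_ratio[0], sum(phase_ratio))


if __name__ == "__main__":
    print([intron_phase(n) for n in (9, 10, 11)])  # [0, 1, 2]
    print(ancient_fraction())                      # 1/4
    print(1 - ancient_fraction())                  # 3/4
```

With a 2:1:1 ratio, phase 0 introns make up half of all introns, so if half of those are ancient, one quarter of all introns are ancient and three quarters are later acquisitions.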
[Figure: intron names and intron phases]
10. Note that their model now allows that
3/4 of introns are eukaryotic acquisitions,
which is essentially the introns-late
model. Exactly how introns are gained
remains obscure however, and there has
been a cessation of intron acquisition in
the mammalian lineage. But from
phylogenetic studies of various genes and
gene families it is clear that introns have
been gained in all sorts of organismal
lineages. Here’s an example from the
carbon dioxide receptor genes in insects.
The introns are spread throughout the gene
(above) in various phases. When mapped
onto a phylogenetic tree of the three genes
(Gr1-3) in various insects, there are
instances of intron loss (lower case letters
on branches) and intron gains (upper case
letters). The best theory is that new introns
derive from transposon insertions.
11. Another major controversial topic is how some
genomes get to be so small. We’ve seen how
genomes get big, primarily through acquisition of
transposable elements and other junk DNA like
pseudogenes, but also through tetraploidization. The
ranges of genome size are huge. For example,
crickets, grasshoppers, and locusts have genomes up
to 5 times ours, that is, 15 Gbp, while lungfish and
lilies can get up to 100 Gbp. Presumably these
organisms have lost control of transposons which
have flourished in their genomes. We currently think
that RNAi is, at least in part, a genomic defense
mechanism against transposons, so perhaps their
RNA interference systems are compromised.
[Figure: histograms of deletion sizes in transposon copies, human versus nematode]
12. The removal of DNA in the form of random deletions is also important. Several groups
including us have shown that organisms differ enormously in the sizes and frequencies of random
deletions in pseudogenes and transposons in their genomes, with the result that those that delete
lots of DNA have smaller genomes (the histograms above are for transposon copies in human versus
nematode; note the longer deletions in nematode). For example, Drosophila flies are estimated to delete DNA
roughly 75 times more rapidly than humans, explaining in large part why there are almost no
pseudogenes and very few old transposon copies in the Drosophila genome. Even the mouse
genome relative to human appears to be smaller due to more deletions. The big question is why?
13. Michael Lynch at Indiana University has
taken a grand view of genome complexity across
the entire scale of biology and genome sizes.
While the basic notions have been proposed
before, Lynch is the first to systematically address
these grand-scale questions. Basically he suggests
that many of these features of genomes, that is,
large size, presence and number of transposons,
and even number and length of introns, are mildly
deleterious and should be selected against.
However, selection only works effectively against
these kinds of mildly deleterious traits in very
large populations, which are typical of small
organisms at low trophic levels. In large
organisms with relatively small population sizes,
typically in higher trophic levels, drift becomes a
much more significant factor, and frequently
selection does not succeed.
A simple laboratory example of this effect of drift
on population genetics is shown for beetle
populations of two marked strains. When the
populations are small (10-20 individuals) drift can
win (top), but when they are large, selection
wins (bottom).
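
The contrast between drift in small populations and selection in large ones can be reproduced with a toy simulation. This is a sketch under assumptions that are not from the beetle experiment itself: a haploid Wright-Fisher model, two strains starting at equal frequency, and a hypothetical fitness advantage `s` for one strain. The function names are made up for illustration.

```python
import random


def favoured_strain_wins(pop_size, s=0.1, start_freq=0.5, rng=None,
                         max_generations=100000):
    """Run one population until one strain is fixed.

    Returns True if the strain with fitness 1 + s takes over, False if the
    other strain does. Each generation, every individual is drawn
    independently, with the favoured strain weighted by 1 + s.
    """
    rng = rng or random.Random()
    count = round(pop_size * start_freq)  # individuals of the favoured strain
    for _ in range(max_generations):
        if count == 0:
            return False
        if count == pop_size:
            return True
        p = count / pop_size
        p_after_selection = p * (1 + s) / (p * (1 + s) + (1 - p))
        count = sum(1 for _ in range(pop_size) if rng.random() < p_after_selection)
    raise RuntimeError("no strain fixed within max_generations")


def wins_out_of(n_replicates, pop_size, s=0.1, seed=0):
    """Number of replicate populations in which the favoured strain wins."""
    rng = random.Random(seed)
    return sum(favoured_strain_wins(pop_size, s, rng=rng) for _ in range(n_replicates))


if __name__ == "__main__":
    print("N=10: ", wins_out_of(300, pop_size=10), "of 300")
    print("N=500:", wins_out_of(20, pop_size=500), "of 20")
```

In small populations the disfavoured strain takes over in a noticeable share of replicates, whereas in large populations the favoured strain wins essentially every time.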
14. The basic
observation is simply
that smaller
organisms with
larger populations
(top) have smaller
genomes and fewer
genes, taken all the
way from the most
abundant bacteria in
the oceans
(Prochlorococcus) to
mammals (bottom).
An important point
that we’ve seen a
few times before is
that there is no great
discontinuity in any
of these measures
between prokaryotes
and eukaryotes. Thus
there are bacteria
with large genomes
and many genes.
15. Lynch correlated particular features with genome size, and found positive relationships for
transposon numbers, intron size and numbers, and half-life of gene duplicates, explaining them in
terms of the weak selection against these mildly deleterious features in large organisms with
small population sizes and large genomes. This is an unsettling idea, because it suggests that most
of genome complexity arises because selection at the organismal level can’t prevent it, instead of
being selected for and adaptive. It mirrors earlier controversies about whether most molecular
evolution of base and amino acid changes is neutral or slightly deleterious versus advantageous.
It’s now clear that most is effectively neutral.
Nevertheless it remains unclear if these genome complexity relationships will hold for finer scale
comparisons, e.g. related carnivores versus herbivores. Initial analyses within mammals suggest
that genome size is not so simply related, e.g. carnivores actually have slightly smaller genomes
than do rodents, when Lynch’s theory would predict the opposite. Many still believe that adaptive
explanations are important, e.g. small genome size in birds makes them lighter.
[Figure: transposon numbers plotted against genome size (Mb)]