The history of gene duplication
Phylogenies are not just useful for studying morphological traits and geography but they
also are essential tools for making sense of the evolutionary history of genomes. As
already discussed, trees can be used to infer ancestral gene or protein sequences. While
beyond the scope of this primer, statistical analyses of gene sequence evolution along the
branches of a tree can provide evidence that selection has acted to shape molecular
variation. Here, however, we will discuss gene duplication.
When biologists began sequencing genomes they were surprised to find that many genes
have closely related genes within the very same genome. We now understand that during
evolution genes often duplicate – an ancestral genome with one copy gives rise to a
descendant genome with two copies of a particular gene. Over time, repeated
duplications can result in gene families: sets of related genes that have similar, but often
somewhat diverged, functions. It is not necessary to go into the molecular mechanisms of
duplication, nor to discuss the fascinating natural history of gene copies and their
long-term evolutionary fate. But it will be useful to consider the way that gene duplication
shapes gene trees and, correspondingly, how phylogenetic analysis of gene families can
shed light on the history of gene duplication.
There are parallels between biogeography and gene duplication. In the same way that
a species within a particular landmass can undergo lineage splitting to yield two daughter
species, so too can a gene within a species’ genome give rise to two descendant genes.
Furthermore, the splitting of geographic areas during vicariance affects the species living
in those areas in much the same way that the splitting of population lineages affects the
gene copies that “occupy” those populations. We will address these two concepts in turn.
First, we will consider the duplication of genes within a single lineage. Then we will
discuss how lineage splits interact with gene duplication to shape the topology of gene
trees.
Imagine a gene, A, that exists as a single copy in the genome of all organisms in an
ancestral population. Through an error in DNA replication, or the action of a
transposable element, or some other molecular mechanism, a second exact copy of the
original gene is generated somewhere else in the genome. Since it is an exact copy it is
not fruitful to worry about which is the original gene and which is the copy. Let’s call
the duplicate genes A1 and A2. Gene duplication is a lineage branching event in that we
have gone from one ancestral gene to two descendant genes. As with lineage splitting,
following gene duplication the two gene copies will accumulate mutations independently
and will gradually diverge in sequence.
The duplication will only persist in the long run if it first arises in an individual that
leaves offspring and if eventually it comes to be fixed in the population lineage. Imagine
that, after being fixed, gene A2 undergoes yet another gene duplication to give rise to
genes A2a and A2b. Once this second duplication goes to fixation, what will be the
relationship among the three genes? Since A2a and A2b share a more recent common
ancestor (A2) than either does with A1 (A), the correct tree is the one shown in figure x.
If we correctly inferred this (rooted) gene tree we
would immediately see that genes A2a and A2b
represent a more recent gene duplication, whereas A1
vs. A2 was a more ancient gene duplication.
[Figure x: rooted gene tree with tips A1, A2a, and A2b, in which A2a and A2b are sister genes.]
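To make the branching concrete, here is a minimal Python sketch, not part of the original primer, that treats each duplication as replacing one tip of the gene tree with two daughter tips. The function names `duplicate` and `to_newick` and the nested-tuple representation are illustrative assumptions. The result is printed in Newick notation, a common parenthesized text format for trees.

```python
def duplicate(tree, gene, copies):
    """Return a new tree in which the tip `gene` is replaced by its two copies."""
    if tree == gene:
        return copies
    if isinstance(tree, tuple):
        return tuple(duplicate(child, gene, copies) for child in tree)
    return tree  # a different tip: leave it unchanged


def to_newick(tree):
    """Write a nested-tuple tree as a parenthesized Newick string (no branch lengths)."""
    if isinstance(tree, tuple):
        return "(" + ",".join(to_newick(child) for child in tree) + ")"
    return tree


tree = "A"                                         # single ancestral gene
tree = duplicate(tree, "A", ("A1", "A2"))          # older duplication
tree = duplicate(tree, "A2", ("A2a", "A2b"))       # more recent duplication
newick = to_newick(tree) + ";"
print(newick)  # (A1,(A2a,A2b));
```

The innermost parentheses group A2a with A2b, which is the sister relationship expected from the more recent duplication.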
Now we have the opportunity to test your tree-thinking skills: what would the gene tree look like if,
after these duplication events had happened, the
lineage split to give rise to two living species, X and
Y? Species X and Y would each have three gene copies, A1, A2a, and A2b, meaning
there would be six tips. But how would they be related to each other?
One way to think through this problem is to first draw the population lineages as though
they were hollow tubes. Then you can draw the gene tree inside these tubes making sure
that all gene copies present in an ancestral population make it into the two species
lineages. Then use tree-thinking skills to “unfold” the gene tree, labeling the genes based
on which species they came from. As shown in the figure, three nodes, marked with
circles, correspond to lineage-splitting events (X versus Y), whereas two nodes, marked
with squares, correspond to the two gene-duplication events.
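[Figure: the gene tree of A1, A2a, and A2b drawn inside the population lineages of species X and Y, and the unfolded gene tree with tips XA1, YA1, XA2a, YA2a, XA2b, and YA2b.]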
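The "hollow tubes" procedure can also be sketched in Python. This is a rough illustration under one simplifying assumption: every gene copy present in the ancestral population is inherited by both daughter species, and none is lost. The helper names `split_lineage`, `count_events`, and `tips`, and the `(event, children)` node format, are hypothetical choices made for this example.

```python
def split_lineage(gene_tree, species=("X", "Y")):
    """Pass an ancestral gene tree through a lineage split.

    Each ancestral tip becomes a speciation node with one copy per species.
    Every internal node of the ancestral tree was a gene duplication.
    """
    if isinstance(gene_tree, str):
        return ("speciation", [sp + gene_tree for sp in species])
    return ("duplication", [split_lineage(child, species) for child in gene_tree])


def count_events(node, counts=None):
    """Count duplication and speciation nodes in an annotated tree."""
    counts = {"duplication": 0, "speciation": 0} if counts is None else counts
    if isinstance(node, str):
        return counts
    event, children = node
    counts[event] += 1
    for child in children:
        count_events(child, counts)
    return counts


def tips(node):
    """List the tip labels of an annotated tree from left to right."""
    if isinstance(node, str):
        return [node]
    return [tip for child in node[1] for tip in tips(child)]


ancestral = ("A1", ("A2a", "A2b"))   # gene tree before the X/Y split
unfolded = split_lineage(ancestral)
print(tips(unfolded))          # ['XA1', 'YA1', 'XA2a', 'YA2a', 'XA2b', 'YA2b']
print(count_events(unfolded))  # {'duplication': 2, 'speciation': 3}
```

The counts match the circles and squares described above: three lineage-splitting nodes and two gene-duplication nodes.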
Before discussing alternative possible histories of gene duplication and lineage splitting,
we should clarify some widely used terminology applied to genes. Because all these
genes descended relatively recently from a common ancestral gene they are all
homologous genes or homologs. Pairs of genes that occur in different species whose last
common ancestor corresponds to a lineage-splitting event are orthologous genes or
orthologs. For example, XA1 and YA1 are orthologs, because they both descend from a
node that corresponds to the split of the X and Y lineages. In contrast, pairs of genes (in
the same or different organisms) that descend from a gene duplication event are
paralogous genes or paralogs. For example, XA1 is paralogous to XA2a because the last
common ancestor of these two genes was the root node, which corresponds to the A1-A2
gene duplication event.
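These definitions reduce to one question: what kind of event does the last common ancestor of two genes represent? The following Python sketch applies that test to the six-gene tree discussed above. The event labels are entered by hand, and the names `lca_event` and `relationship` are assumptions introduced for illustration only.

```python
from itertools import combinations

# Gene tree for the first scenario, with each internal node labeled by its event.
tree = ("duplication", [
    ("speciation", ["XA1", "YA1"]),
    ("duplication", [
        ("speciation", ["XA2a", "YA2a"]),
        ("speciation", ["XA2b", "YA2b"]),
    ]),
])


def tips(node):
    """List the tip labels below a node."""
    if isinstance(node, str):
        return [node]
    return [tip for child in node[1] for tip in tips(child)]


def lca_event(node, a, b):
    """Event at the last common ancestor of two distinct tips a and b."""
    event, children = node
    for child in children:
        if not isinstance(child, str):
            below = tips(child)
            if a in below and b in below:
                return lca_event(child, a, b)  # the ancestor lies deeper
    return event


def relationship(a, b):
    return "orthologs" if lca_event(tree, a, b) == "speciation" else "paralogs"


print(relationship("XA1", "YA1"))    # orthologs
print(relationship("XA1", "XA2a"))   # paralogs
ortholog_pairs = [pair for pair in combinations(tips(tree), 2)
                  if relationship(*pair) == "orthologs"]
print(ortholog_pairs)
```

Note that XA2a and YA2b come out as paralogs even though they sit in different species, because their last common ancestor is the A2a-A2b duplication.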
The concepts of orthology and paralogy relate to the process that caused the existence of
distinct gene lineages: population splitting or gene duplication. They do not directly relate
to the role that a gene plays in the development of an organism (its function). When
looking between species, orthologs have a more recent common ancestor than paralogs.
For example, XA1 is more closely related to YA1 than to YA2a or YA2b. Because gene
functions generally change slowly, it is more likely that orthologs share functions than
paralogs. However, this is not a rule. Supposing that YA1 acquired a novel function,
while the other genes retained an ancestral function, then XA1 could be functionally
more similar to YA2a than to YA1. This is another manifestation of the principle that
trees depict relationships, not similarity (Chaps 3-4).
The preceding scenario explained the occurrence of three gene copies in species X and Y
via two gene duplication events that predated all lineage splitting events. Now consider a
different scenario where the population lineages split just after the A1-A2 gene
duplication, with separate gene duplications happening independently in species X and Y.
In that case the six gene copies still require a gene tree with five internal nodes.
However, because three of the nodes now correspond to gene duplication events (one
before X and Y split and one each within X and Y), and two are the result of lineage
splitting, the topology is different from the preceding case. In this case XA2a is more
closely related to XA2b than to YA2a.
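The node types in this second scenario can be worked out from the tip labels alone. The Python sketch below uses a simple species-overlap heuristic, which is a swapped-in technique and not a method described in this primer: a node is called a duplication if the same species appears on both sides of it, and a lineage split otherwise. It assumes a correctly inferred, fully bifurcating gene tree, no gene loss, and tip names that begin with a one-letter species code. The name `infer_events` is hypothetical.

```python
def infer_events(node, events=None):
    """Return (species below node, events inferred so far, in post-order)."""
    if events is None:
        events = []
    if isinstance(node, str):
        return {node[0]}, events          # first letter of a tip is its species
    left, _ = infer_events(node[0], events)
    right, _ = infer_events(node[1], events)
    # Shared species on both sides implies the split happened within a genome.
    events.append("duplication" if left & right else "speciation")
    return left | right, events


# Second scenario: A1-A2 duplication, then the X/Y split, then A2 duplicates in each species.
scenario_2 = (("XA1", "YA1"), (("XA2a", "XA2b"), ("YA2a", "YA2b")))
_, events = infer_events(scenario_2)
print(events.count("duplication"), events.count("speciation"))  # 3 2
```

With real data, gene loss or errors in the inferred tree can mislead a rule this simple, so treat it as a teaching aid.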
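[Figure: the gene tree drawn inside the population lineages for the second scenario, and the unfolded gene tree with tips XA1, YA1, XA2a, XA2b, YA2a, and YA2b.]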
To be complete, we should also consider the case in which the three gene copies (A1, A2a,
and A2b) in species X and Y are due to gene duplication events occurring independently
in X and Y. In this case the five nodes correspond to one lineage branching event and
four gene duplication events. Because the gene duplication events occurred after lineage
splitting, there are no one-to-one orthologs between the two species: every gene copy in X
is equally closely related to every gene copy in Y.
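[Figure: the gene tree drawn inside the population lineages for the third scenario, and the unfolded gene tree with tips XA2a, XA2b, XA1, YA1, YA2a, and YA2b.]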
Looking over the three scenarios that could give rise to three genes in each of two
species, you will see that the expected gene tree topologies are different. If you were
confident that the true gene tree was the first one, then you would be able to infer that the
gene duplication events happened before any relevant lineage splitting. Equally, either of
the other two tree topologies would have supported a different history of gene
duplication. The same applies to trees with multiple species, although it can take practice
to learn to read trees to elucidate the history of gene duplications.
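As a rough check on this reasoning, the Python sketch below reads each of the three topologies and reports how many duplications fall before and after the X/Y split. It rests on assumptions that are not in the original text: tip names start with a one-letter species code, no gene copies have been lost, and a duplication is recognized by the same species appearing on both sides of a node. A duplication whose descendants include both species must predate the split. The function name `duplication_timing` and the scenario labels are hypothetical.

```python
def duplication_timing(node, timing=None):
    """Return (species below node, counts of duplications before/after the split)."""
    if timing is None:
        timing = {"before split": 0, "after split": 0}
    if isinstance(node, str):
        return {node[0]}, timing
    left, _ = duplication_timing(node[0], timing)
    right, _ = duplication_timing(node[1], timing)
    species = left | right
    if left & right:  # same species on both sides: a gene duplication
        key = "before split" if len(species) > 1 else "after split"
        timing[key] += 1
    return species, timing


scenarios = {
    "first":  (("XA1", "YA1"), (("XA2a", "YA2a"), ("XA2b", "YA2b"))),
    "second": (("XA1", "YA1"), (("XA2a", "XA2b"), ("YA2a", "YA2b"))),
    "third":  (("XA1", ("XA2a", "XA2b")), ("YA1", ("YA2a", "YA2b"))),
}
results = {name: duplication_timing(tree)[1] for name, tree in scenarios.items()}
for name, timing in results.items():
    print(name, timing)
```

The three topologies give three different answers (two duplications before the split, one before and two after, and four after), which is why a well-supported gene tree is informative about the history of duplication.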