Molecular Evolution

Rates of Nucleotide Substitution
Dan Graur
r = rate of substitution per site per year
K = number of substitutions per site between two sequences
T = time since the two sequences diverged (in years)
$$r = \frac{K}{2T}$$
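As a quick numerical check of the relationship r = K/(2T), the following sketch computes a rate from a divergence estimate. The function name and the input values (K = 0.08 substitutions per site, T = 80 million years) are illustrative assumptions, not data from the slides.

```python
def substitution_rate(k_per_site: float, divergence_time_years: float) -> float:
    """Rate of substitution per site per year, r = K / (2T).

    The factor of 2 reflects that substitutions accumulate along both
    lineages since their common ancestor.
    """
    if divergence_time_years <= 0:
        raise ValueError("divergence time must be positive")
    return k_per_site / (2.0 * divergence_time_years)


# Hypothetical example values (not from the slides).
r = substitution_rate(k_per_site=0.08, divergence_time_years=80e6)
print(f"r = {r:.2e} substitutions/site/year")
```

With these assumed inputs the rate comes out below 10^-9 substitutions per site per year, the order of magnitude the slides quote for mammalian nuclear genomes.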
Mean Rate of Nucleotide Substitution in Mammalian Nuclear Genomes
Less than $10^{-9}$ substitutions/site/year
Evolution is a very slow process at the molecular level.
Not much happens in evolution.
Substitution Rates in Protein-Coding Regions
The rate of synonymous substitution is much larger than the nonsynonymous rate.
Synonymous substitutions are more frequent than nonsynonymous ones.
Mean nonsynonymous rate = $0.75 \times 10^{-9}$ substitutions per site per year
Mean synonymous rate = $3.65 \times 10^{-9}$ substitutions per site per year
The synonymous substitution rate is about 5 times higher than the nonsynonymous substitution rate.
Coefficient of variation of nonsynonymous rate = 95%
Coefficient of variation of synonymous rate = 31%
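The arithmetic behind the "about 5 times" statement, and what the coefficients of variation imply in absolute terms, can be sketched as follows. The four input values are taken from the slide; the variable names are illustrative, and the standard deviations are derived here (CV = standard deviation / mean) rather than quoted from the source.

```python
# Values quoted on the slide (substitutions per site per year).
MEAN_NONSYN = 0.75e-9
MEAN_SYN = 3.65e-9
CV_NONSYN = 0.95  # coefficient of variation, 95%
CV_SYN = 0.31     # coefficient of variation, 31%

# How many times faster synonymous sites evolve, on average.
ratio = MEAN_SYN / MEAN_NONSYN

# CV = standard deviation / mean, so SD = CV * mean.
sd_nonsyn = CV_NONSYN * MEAN_NONSYN
sd_syn = CV_SYN * MEAN_SYN

print(f"synonymous / nonsynonymous = {ratio:.2f}")
print(f"SD nonsynonymous = {sd_nonsyn:.2e}")
print(f"SD synonymous    = {sd_syn:.2e}")
```

The much larger coefficient of variation for nonsynonymous rates says that genes differ from one another far more in their nonsynonymous rates than in their synonymous rates, relative to the respective means.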
[Figure: The distribution of KA/KS ratios in >13,000 orthologous protein-coding genes from human and chimpanzee]
58 nucleotide differences
3 amino acid differences
In a comparison of human and yeast ubiquitin genes, the inferred number of synonymous substitutions per synonymous site is ~6 (almost certainly indicative of saturation). The inferred number of nonsynonymous substitutions per nonsynonymous site is 0.03. Thus, synonymous substitutions have accumulated at least 200 times faster than nonsynonymous substitutions.
[Figure: ratio values 1.5, 4.4, and 1.1; labels not recoverable]
Substitution Rates in Noncoding Regions
Divergence between cow and goat β- and γ-globin genes
and between cow and goat β-globin pseudogenes
______________________________________________
Region                      K
______________________________________________
5’ Flanking region          5.3 ± 1.2
5’ Untranslated region      4.0 ± 2.0
4-fold degenerate sites     8.6 ± 2.5
Introns                     8.1 ± 0.7
3’ Untranslated region      8.8 ± 2.2
3’ Flanking region          8.0 ± 1.5
Pseudogenes                 9.1 ± 0.9
______________________________________________
Coding regions evolve slower than noncoding regions.
Evolutionary Rate Profiles
Alignment of preproinsulin (Xenopus and Bos)

Xenopus  MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos      MALWTRLRPLLALLALWPPPPARAFVNQHL
         **** : * *.*: *:..* :. *:****

Xenopus  CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos      CGSHLVEALYLVCGERGFFYTPKARREVEG
         ***************:***** ** :*::*

Xenopus  AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos      PQVG---ALELAGGPGAGGLEGPPQKRGIV
         .**.     ** *       *    *****

Xenopus  EQCCHSTCSLFQLENYCN
Bos      EQCCASVCSLYQLENYCN
         **** *.***:*******
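A rough way to turn an alignment like this into a rate profile is to count identical residues region by region. The sketch below does this for the four display rows of the Xenopus/Bos alignment; the rows are simply the line breaks of the slide, not annotated protein domains, and the function name `identity` is an illustrative choice.

```python
# Alignment rows copied from the slide ("-" marks a gap).
# Each tuple is (Xenopus, Bos) for one display row.
BLOCKS = [
    ("MALWMQCLP-LVLVLLFSTPNTEALANQHL", "MALWTRLRPLLALLALWPPPPARAFVNQHL"),
    ("CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ", "CGSHLVEALYLVCGERGFFYTPKARREVEG"),
    ("AQVNGPQDNELDG-MQFQPQEYQKMKRGIV", "PQVG---ALELAGGPGAGGLEGPPQKRGIV"),
    ("EQCCHSTCSLFQLENYCN", "EQCCASVCSLYQLENYCN"),
]


def identity(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (identical non-gap columns, total columns) for two aligned rows."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have equal length")
    matches = sum(a == b and a != "-" for a, b in zip(seq_a, seq_b))
    return matches, len(seq_a)


for number, (xenopus, bos) in enumerate(BLOCKS, start=1):
    matches, columns = identity(xenopus, bos)
    print(f"row {number}: {matches}/{columns} identical ({matches / columns:.0%})")
```

In this count the second and fourth rows are markedly more conserved than the third, which is the kind of uneven profile the slide title refers to.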
Functional regions evolve slower than nonfunctional regions.
Rates of amino acid replacement in different proteins
Fibrinogen to Fibrin
• Fibrinogen consists of 6 chains: 2α, 2β, 2γ
• Fibrinopeptides are very negatively charged
• Fibrinopeptides A are cleaved first (to allow polymerization of fibrins)
• Fibrinopeptides B are cleaved second (to enhance crosslinking)
Important proteins evolve slower than unimportant ones.
Can we explain the different rates of substitution by the selectionist model?
1. Mutations can be either deleterious or advantageous.
2. If the fraction of advantageous mutations is large, the rate of evolution will be high. If the fraction of advantageous mutations is small, the rate of evolution will be low.
3. A mutation occurring at a functional site has a higher probability of being advantageous than a mutation occurring at a nonfunctional site.
Expectation: Important entities should evolve faster than less important ones.
Can we explain the different rates of substitution by the neutralist model?
1. Mutations can be either deleterious or neutral.
2. If the fraction of deleterious mutations is large, the rate of evolution will be low. If the fraction of deleterious mutations is small, the rate of evolution will be high.
3. A mutation occurring at a functional site has a higher probability of being deleterious than a mutation occurring at a nonfunctional site.
Expectation: Important entities should evolve slower than less important ones.
Kimura’s First Law of Molecular Evolution
Functional entities evolve slower than entities devoid of function.
Functional constraint = degree of intolerance towards mutations at a genomic location.
The functional constraint defines the range of alternative residues that are acceptable at a site without negatively affecting the fitness of the organism.
For neutral mutations:
$$K = v$$
where $K$ is the rate of substitution and $v$ is the mutation rate.
Kimura’s model of functional constraint
Suppose that a fraction, $f_0$, of all mutations are selectively neutral and the rest ($1 - f_0$) are deleterious. Advantageous mutations are assumed to occur only very rarely, such that their relative frequency is effectively zero.
If we denote by $v_T$ the total mutation rate per unit time, then the rate of neutral mutation, $v_0$, is
$$v_0 = v_T f_0$$
According to the neutral theory, the rate of substitution is
$$K = v_0$$
Hence,
$$K = v_T f_0$$
The highest substitution rate is expected in sequences that do not have any function, such that all mutations are neutral ($f_0 = 1$, so $K = v_T$).
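A minimal sketch of Kimura's relationship K = v_T f_0 shows how the substitution rate falls as the neutral fraction shrinks. The function name, the total mutation rate, and the three f_0 values are hypothetical choices for illustration only.

```python
def neutral_substitution_rate(total_mutation_rate: float, neutral_fraction: float) -> float:
    """Substitution rate under Kimura's model: K = v_T * f_0."""
    if not 0.0 <= neutral_fraction <= 1.0:
        raise ValueError("f_0 must lie between 0 and 1")
    return total_mutation_rate * neutral_fraction


# Hypothetical total mutation rate per site per year (not from the slides).
V_T = 5e-9

# f_0 = 1: no function, every mutation is neutral, so K equals v_T.
# Smaller f_0: stronger functional constraint, slower substitution.
for f0 in (1.0, 0.5, 0.05):
    k = neutral_substitution_rate(V_T, f0)
    print(f"f_0 = {f0:<4} -> K = {k:.2e}")
```

Because v_T is held fixed here, all differences in K come from f_0, which is how the model attributes rate variation among sequences to differences in functional constraint.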
An evolutionary experiment
Spalax ehrenbergi
αA-crystallin
In Spalax, αA-crystallin lost its functional role more than 25 million years ago, when the mole rat became subterranean and presumably lost use of its eyes.
The αA-crystallin of Spalax evolves 20 times faster than the αA-crystallins in other rodents, such as rats, mice, hamsters, gerbils, and squirrels.
Additional Facts:
(1) The αA-crystallin of Spalax possesses all the prerequisites for normal function and expression, including the proper signals for alternative splicing.
(2) The αA-crystallin of Spalax evolves slower than pseudogenes.
Explanation 1:
The αA-crystallin gene may not have lost all of its vision-related functions, such as photoperiod perception and adaptation to seasonal changes.
Contradicting evidence:
The atrophied eye of Spalax does not respond to light.
Explanation 2:
The blind mole rat lost its vision more recently than 25 million years ago. The rate of nonsynonymous substitution after nonfunctionalization has been underestimated.
Contradicting evidence:
The αA-crystallin gene is still an intact gene as far as the essential molecular structures for its expression are concerned.
Explanation 3:
The αA-crystallin gene product serves another function (unrelated to that of the eye). αA-crystallin is a multifunctional protein.
Supporting evidence:
1. αA-crystallin has been found in other tissues.
2. αA-crystallin also functions as a molecular chaperone that binds denaturing proteins and prevents their aggregation.
3. The regions within αA-crystallin responsible for chaperone activity are conserved in the mole rat.
4. The protein has viable secondary and quaternary structures.
Genetic nonfunctionalization or partial nonfunctionalization accelerates evolution.
Most evolutionary “action” occurs after death.
The Concept of Functional Constraint
The intensity of purifying selection is determined by the degree of intolerance characteristic of a site or a genomic region towards mutations.
The functional or selective constraint defines the range of alternative nucleotides that is acceptable at a site without negatively affecting the function or structure of the gene or the gene product.
DNA regions in which a mutation is likely to affect function have a more stringent functional constraint than regions devoid of function.
The stronger the functional constraints on a macromolecule are, the slower its rate of substitution will be.
Functional density (Zuckerkandl 1976)
The functional density, $F$, of a gene is defined as $n_s/N$, where $n_s$ is the number of sites committed to specific functions and $N$ is the total number of sites. $F$, therefore, is the proportion of amino acids that are subject to stringent functional constraints.
The higher the functional density, the lower the rate of substitution is expected to be. Thus, a protein in which the active sites constitute only 1% of its sequence will be less constrained, and therefore will evolve more quickly than a protein that devotes 50% of its sequence to performing specific biochemical or physiological tasks.
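The definition F = n_s/N can be illustrated with the 1% and 50% cases mentioned in the text. The protein length of 400 residues and the function name are assumptions made purely for the example.

```python
def functional_density(committed_sites: int, total_sites: int) -> float:
    """Functional density F = n_s / N (Zuckerkandl 1976)."""
    if total_sites <= 0 or not 0 <= committed_sites <= total_sites:
        raise ValueError("need 0 <= n_s <= N and N > 0")
    return committed_sites / total_sites


# Two hypothetical proteins of 400 residues each.
loosely_constrained = functional_density(committed_sites=4, total_sites=400)
tightly_constrained = functional_density(committed_sites=200, total_sites=400)

print(f"F = {loosely_constrained:.2f}  (expected to evolve faster)")
print(f"F = {tightly_constrained:.2f}  (expected to evolve slower)")
```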
According to the neutral theory of evolution, the rate of substitution (as inferred from between-species comparisons) should positively correlate with the degree of genetic polymorphism (as inferred from comparisons among individuals within one species).
An interesting corollary of this hypothesis is that we should observe very little or no variation at the population level at evolutionarily conserved positions. The variation observed at conserved positions should be mostly deleterious (i.e., associated with disease).
Substitution rates and disease: The case of Gaucher disease
Gaucher disease is an autosomal recessive lysosomal storage disorder due to deficient activity of an enzyme called acid β-glucosidase. There are many subtypes of Gaucher disease, with fitness effects ranging from a slight reduction in fitness to perinatal lethality, in which death occurs in the period from 154 days of gestation to seven days after birth.
We aligned the amino acid sequences of acid β-glucosidase from nine placental mammals (human, chimpanzee, Sumatran orangutan, bovine, pig, dog, horse, rat, and mouse). The length of the alignment (excluding one gap due to a codon deletion in the ancestor of mouse and rat) was 496 amino acids, of which 387 (78%) were identical in all nine species and 109 (22%) were variable.
Thirty-six single amino-acid replacements (at 34 amino-acid positions) resulting in Gaucher disease are described in the literature. Perinatal lethal mutations are shown in red.
All 36 deleterious mutations occur at completely conserved sites (below asterisks). The expectation under a random model is that only 36 × 0.78 ≈ 28 mutations should occur at completely conserved sites. This statistically significant non-random association between disease and evolutionary conservation (p = 0.0002) indicates that invariable sites are conserved because they evolve under extremely stringent functional constraints and cannot tolerate change.
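The expected count and the order of magnitude of the p-value can be checked with a back-of-the-envelope calculation. The slides do not say which statistical test produced p = 0.0002, so the sketch below assumes a simple model in which each replacement (or each affected position) falls independently at a random alignment column; the variable names are illustrative.

```python
# Counts quoted in the text.
CONSERVED_SITES = 387
TOTAL_SITES = 496
N_MUTATIONS = 36   # single amino-acid replacements
N_POSITIONS = 34   # distinct positions they affect

# Probability that a randomly placed replacement hits a conserved site.
p_conserved = CONSERVED_SITES / TOTAL_SITES

# Expected number of replacements at conserved sites under the random model.
expected_at_conserved = N_MUTATIONS * p_conserved

# Probability that ALL of them land on conserved sites (assumed independence),
# counted once per mutation and once per distinct position.
p_all_mutations = p_conserved ** N_MUTATIONS
p_all_positions = p_conserved ** N_POSITIONS

print(f"P(conserved)          = {p_conserved:.3f}")
print(f"expected at conserved = {expected_at_conserved:.1f}")
print(f"P(all 36 mutations)   = {p_all_mutations:.1e}")
print(f"P(all 34 positions)   = {p_all_positions:.1e}")
```

Under these assumptions both probabilities are on the order of 10^-4, the same order as the reported p-value, so the observed clustering at conserved sites is very unlikely to arise by chance.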
Q: What determines functional constraint?
A: Many factors.
Q: Example?
A: Interactions.
A network (or graph) is an abstract representation of a set of objects, where some objects are connected to one another. The objects are represented by vertices (or nodes), and the links that connect the vertices are called edges (or branches). Edges can be polarized to indicate directionality and type of interaction (e.g., activation, inhibition). Edges can also be quantified to denote the extent of the effect.
Protein-protein interaction networks
(a) A simple example of a protein-protein interaction network consisting of five proteins (A-E), represented by the nodes, each of which interacts with at least one other protein. There are five interactions, denoted by the links.
In biological networks, three variables are usually studied:
(b) degree centrality or connectedness = the number of interactions for a protein.
(c) betweenness centrality = the number of shortest paths between pairs of other nodes that pass through a node.
(d) closeness centrality = a measure based on the mean number of links (along shortest paths) connecting a protein to all other proteins in the network; the smaller this mean distance, the higher the closeness.
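The three measures can be computed by hand for a small graph. The sketch below uses a hypothetical five-protein network with five interactions; the edge list is an assumption, not the network drawn in the figure. It reports degree, unnormalized betweenness, and the mean shortest-path distance to all other proteins; closeness is commonly taken as the reciprocal of that mean distance.

```python
from collections import deque
from itertools import combinations

# Hypothetical network: five proteins, five interactions (not the slide's figure).
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def build_graph(edges):
    """Undirected adjacency sets built from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs(graph, source):
    """Shortest-path lengths and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def centralities(graph):
    """Degree, betweenness, and mean distance; assumes a connected graph."""
    info = {v: bfs(graph, v) for v in graph}
    degree = {v: len(graph[v]) for v in graph}
    mean_distance = {
        v: sum(d for t, d in info[v][0].items() if t != v) / (len(graph) - 1)
        for v in graph
    }
    betweenness = {v: 0.0 for v in graph}
    for s, t in combinations(sorted(graph), 2):
        d_st, paths_st = info[s][0][t], info[s][1][t]
        for v in graph:
            if v in (s, t):
                continue
            # v lies on a shortest s-t path if the two legs add up to d(s, t).
            if info[s][0][v] + info[v][0][t] == d_st:
                betweenness[v] += info[s][1][v] * info[v][1][t] / paths_st
    return degree, betweenness, mean_distance


degree, betweenness, mean_distance = centralities(build_graph(EDGES))
for protein in sorted(degree):
    print(protein, degree[protein], betweenness[protein],
          mean_distance[protein], round(1 / mean_distance[protein], 2))
```

In this toy network protein A has the most interactions, lies on the most shortest paths, and has the smallest mean distance to the other proteins, so all three measures single it out as the most central node.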
Proteins with high connectedness evolve slowly.
Proteins with low connectedness evolve fast.
Proteins with high betweenness evolve slowly.
Proteins with low betweenness evolve fast.
Proteins with high closeness evolve slowly.
Proteins with low closeness evolve fast.
Why do the rates of synonymous substitution vary from gene to gene?
(1) The variation represents stochastic fluctuations.
(2) The variation is due to deterministic factors on top of stochastic fluctuations.
(2.1) Variation in the rate of mutation among different regions of the genome.
(2.2) Selection operating on synonymous mutations.
Fact: There is a positive correlation between synonymous and nonsynonymous substitution rates in a gene.
Explanations:
(1) The rate of mutation varies along the genome and among genes (and hence some genes will have both high synonymous and high nonsynonymous rates of substitution).
(2) The extent of selection at synonymous sites is affected by the nucleotide composition at adjacent nonsynonymous positions.
(3) Both (1) and (2).
In the absence of positive Darwinian selection, the universal observation is that important sequences tend to evolve slower than less important ones.
The converse, however, is not always true. That is, conserved regions in the genome may not always be important.
Defining “importance” is not a trivial undertaking.
Hurst and Smith (1999) tested the relationship between rate of substitution and dispensability (a proxy for importance).
Approximately two-thirds of all knockouts of individual mouse genes give rise to viable fertile mice. These genes have been termed “non-essential,” in contrast to “essential” genes, the knockouts of which result in death or infertility.
It is predicted that non-essential genes will be subject to lesser intensities of purifying selection, and should therefore evolve faster than essential genes.
In a comparison of 74 non-essential genes with 64 essential ones, the rate of substitution was found not to correlate with the severity of the knockout phenotype.
To account for differences in function, Hurst and Smith (1999) restricted their analysis exclusively to neuron-specific genes, which have significantly lower rates of substitution than other genes. They could find no difference in the rate of substitution between 16 essential neuron-specific genes and 18 non-essential ones.
The functional role (if any) of ~98% of mammalian genomes remains undetermined.
Nóbrega et al. (2004) deleted two long sequences from the mouse genome: a 1,817,000-bp region mapping to mouse chromosome 3 and a 983,000-bp region mapping to chromosome 19. (Orthologous regions of about the same size are present on human chromosomes 1 and 10, respectively.)
Viable mice homozygous for the deletions were generated and were indistinguishable from wild-type littermates with regard to morphology, reproductive fitness, growth, longevity, and general homeostasis. Further analysis of the expression of multiple genes bracketing the deletions revealed only minor expression differences between homozygous-deletion mice and wild-type mice.
The two deleted segments harbor 1,243 non-coding sequences conserved between humans and rodents (more than 100 base pairs, 70% identity). Yet, the deletion of so many sequences that have been conserved for such a long period of time (mouse-human divergence ≈ 100 million years) resulted in no reduction in fitness.
Conclusion I: There is potentially ‘disposable DNA’ in the genomes of mammals.
Conclusion II: Sequence conservation may not necessarily indicate constraint.
Ahituv et al. (2007) removed from the mouse genome four ultraconserved elements, sequences of 200 base pairs or longer that are 100% identical among human, mouse, and rat.
Remarkably, lines of mice homozygous for the four deletions were viable and fertile, and failed to reveal any developmental or phenotypic abnormalities.
These results indicate that extreme sequence conservation may not necessarily reflect extreme evolutionary constraint.
There must be forces other than selection that promote sequence conservation.