Models of Molecular Evolution II
Level 3 Molecular Evolution and Bioinformatics
Jim Provan
Page and Holmes: Sections 7.3 – 7.4
Isochore structure of vertebrate
genomes
Why do patterns of base composition – the
frequencies of the four bases and of codons used to
specify amino acids – differ between genomes?
Mean G + C content in bacteria ranges from 25% to
75%, but there is little intragenome variation
Genomes of vertebrates have a much greater range
of G + C values:
Caused by continuous sections (> 300kb) each of which has
a uniform G + C content (isochores)
G + C content of isochores also varies between species
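As an illustration of how regional G + C content might be measured, the following is a minimal Python sketch that scans a sequence in fixed-size windows. The function names `gc_content` and `windowed_gc`, the window size and the toy sequence are assumptions made for this example and are not taken from the lecture; real isochore analyses use windows of hundreds of kilobases.

```python
def gc_content(seq):
    """Return the fraction of G and C among the unambiguous bases in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def windowed_gc(seq, window, step=None):
    """GC fraction in successive windows (non-overlapping unless step is given)."""
    step = step or window
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


# Toy sequence (hypothetical): an AT-rich stretch followed by a GC-rich stretch.
toy = "AT" * 10 + "GC" * 10
print(windowed_gc(toy, window=20))
```

On a genome with isochore structure, neighbouring windows inside one isochore would be expected to give similar values, with a shift at isochore boundaries.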
Properties of vertebrate isochores
G + C rich isochores:
  Correlate with reverse Giemsa (R) bands
  Early replicating
  High density of genes
  SINEs present
  CpG islands in genes
  High G + C content at third codon position
  High frequency of retroviral sequences
  High frequency of chiasmata
A + T rich isochores:
  Correlate with Giemsa (G) bands
  Late replicating
  Low density of genes (only tissue specific)
  LINEs present
  No CpG islands
  High A + T content at third codon position
  Low frequency of retroviral sequences
  Low frequency of chiasmata
Theories on the existence of isochores
Selectionist hypothesis of Bernardi et al. suggests
that GC-rich isochores, found predominantly in warm-blooded
vertebrates, are an adaptation to higher body
temperature:
Extra hydrogen bond in G-C pair may lessen possibility of
thermal damage to DNA
Desert plants also have higher GC contents
Evidence for independent occurrence of isochores
since birds and mammals do not share an immediate
ancestor
However, some thermophilic bacteria are AT-rich
Theories on the existence of isochores (continued)
Neutralist explanation for the existence of isochores
is that they simply reflect variation in the process of
mutation across the genome
Studies on argininosuccinate synthetase processed
pseudogenes from anthropoid primates:
Pseudogenes were derived from same functional ancestral
gene but then inserted into different parts of the genome
Despite their common ancestry, they now differ in base
composition
Because pseudogenes are not subject to selection,
differences in base composition must have been due to
regional variation in mutation patterns
Why should mutation patterns vary
across genomes?
Replication hypothesis suggests that genes which
replicate earlier in the cell cycle are more GC-rich than
those which replicate later:
Believed to be due to the fact that G and C precursor pools of
dNTPs are larger at this time – errors are more likely to
incorporate G or C
Repair hypothesis is based on assumption that efficiency
of DNA repair varies across genome:
May be an outcome of transcriptionally active areas being
repaired more efficiently
CpG islands are maintained by a special repair system –
efficiency of DNA replication may be dependent on location
Why should mutation patterns vary
across genomes? (continued)
Recombination hypothesis claims that isochore structure
of vertebrate genomes is the outcome of differences in
the pattern and frequency of recombination:
Low GC localities will be associated with regions of reduced
recombination:
  - Genes with low rates of recombination have low GC values
  - The large, non-recombining region of the Y-chromosome has a low
    GC composition
The fact that recombination plays such a large part in the structuring
of eukaryote genomes makes this an attractive hypothesis
Although the relative contributions of these hypotheses
are still unclear, the neutralist interpretation seems more
likely
Codon usage
[Figure: codon usage compared between E. coli and human; axis scale 0 to 60]
What determines codon usage?
Degeneracy of genetic code:
Null hypothesis is that all codons for a particular amino acid
are used with equal frequency
Refuted when nucleotide sequences became available for a
wide range of organisms
Selectionist argument:
Highly expressed genes show most codon bias because they
require more translational efficiency: coevolution of tRNAs
and codons
Also supports the neutralist prediction of a relationship
between functional constraint and substitution rate
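The null hypothesis of equal usage can be checked by counting synonymous codons in a coding sequence. The sketch below is a minimal Python illustration for the six leucine codons; the function name `synonymous_usage` and the toy sequence are hypothetical and chosen only for this example.

```python
from collections import Counter

# The six codons that specify leucine in the standard genetic code.
LEUCINE_CODONS = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")


def synonymous_usage(cds, codons):
    """Relative frequency of each codon in `codons` within an in-frame sequence."""
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    triplets = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(t for t in triplets if t in codons)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in codons}


toy_cds = "CTGCTGCTGTTACTGCTC"   # made-up sequence: CTG x4, TTA x1, CTC x1
usage = synonymous_usage(toy_cds, LEUCINE_CODONS)
expected = 1 / len(LEUCINE_CODONS)           # null hypothesis: equal usage
for codon, freq in usage.items():
    print(f"{codon}: observed {freq:.2f}, expected under null {expected:.2f}")
```

In this toy sequence CTG accounts for four of the six leucine codons, well above the one in six expected under equal usage. A real analysis would need many more codons and a formal test.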
Gene expression and codon bias
Highly expressed genes -> strong selection for translational efficiency -> restricted tRNAs used -> strong codon bias -> low rate of synonymous substitution (few neutral mutations)
Lowly expressed genes -> weak selection for translational efficiency -> more tRNAs used -> weak codon bias -> high rate of synonymous substitution (many neutral mutations)
The molecular clock
Idea of a molecular clock is central to the neutralist
theory, since it demonstrates the constancy of the
underlying neutral mutation rate
Previous example of α-globin
Does not imply that all genes and proteins evolve at
the same rate:
Great variation between proteins (fibrinopeptides vs. histones)
Variation in rate among genes and proteins is compatible with
the neutral theory if the underlying cause is changes in
selective constraint
Key question concerning the validity of a molecular clock is
whether rates of substitution are constant within genes across
evolutionary time
Neutral theory and the molecular
clock
Rate of nucleotide substitution (fixation) at any site
per year, k, in a diploid population of N individuals
(2N gene copies) is equal to the number of new mutations
(neutral, deleterious or advantageous) arising per year,
2Nm, where m is the mutation rate per gene copy,
multiplied by their probability of fixation, u:
$k = 2N m u$
For a neutral mutation, probability of fixation is the
reciprocal of the number of gene copies:
$u = 1/(2N)$
So substitution rate for a neutral mutation is:
$k = (2N)\,(1/(2N))\,m$
Neutral theory and the molecular
clock (continued)
Parameters for population size (2N) cancel out,
leaving:
$k = m$
One of the most important formulae in molecular
evolution – means that rate of substitution in neutral
mutations is dependent only on underlying mutation
rate and is independent of other factors such as
population size
Also holds for mutants with a very weak selective
advantage, e.g. $s < 1/(2N_e)$
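The cancellation of population size can be checked numerically. The following is a minimal Python sketch, using the symbols from the slides (N, m, u, k); the function names and the mutation rate value are assumptions chosen purely for illustration.

```python
def substitution_rate(n_individuals, m, u):
    """k = 2N * m * u: new mutations per year times their fixation probability."""
    return 2 * n_individuals * m * u


def neutral_fixation_probability(n_individuals):
    """u = 1/(2N): each of the 2N gene copies is equally likely to be fixed."""
    return 1 / (2 * n_individuals)


m = 1e-9  # hypothetical mutation rate per site per year (illustrative only)
for n in (1_000, 1_000_000):
    k = substitution_rate(n, m, neutral_fixation_probability(n))
    print(f"N = {n:>9,}  k = {k:.3e}  m = {m:.3e}")
```

For both population sizes k comes out equal to m, up to floating-point rounding, which is the point of the formula: under neutrality the substitution rate does not depend on N.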
Substitution of selectively
advantageous mutations
Probability of fixation is roughly twice the selection
coefficient:
$u = 2sN_e/N$
Substituting this into the original equation, $k = 2N m u$, we get:
$k = 4N_e s m$
In this case, substitution rate for an advantageous
mutation also depends on population size and magnitude
of selective advantage
For natural selection to produce a molecular clock, it is
necessary for Ne, s and m (combination of ecological,
mutational and selective events) to be the same across
evolutionary time – highly unlikely!
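A short numerical sketch makes the dependence on population size concrete. It is written in Python with the slide's symbols (s, Ne, N, m); the function names and all parameter values are hypothetical and serve only as an illustration.

```python
def fixation_probability_advantageous(s, n_effective, n_individuals):
    """u = 2 * s * Ne / N for a selectively advantageous mutation."""
    return 2 * s * n_effective / n_individuals


def advantageous_substitution_rate(s, n_effective, n_individuals, m):
    """k = 2N * m * u, which simplifies algebraically to 4 * Ne * s * m."""
    u = fixation_probability_advantageous(s, n_effective, n_individuals)
    return 2 * n_individuals * m * u


m, s = 1e-9, 0.01  # hypothetical mutation rate and selection coefficient
for n_e in (1_000, 10_000):
    # The census size N is arbitrarily set to twice Ne; it cancels out anyway.
    k = advantageous_substitution_rate(s, n_e, n_individuals=2 * n_e, m=m)
    print(f"Ne = {n_e:>6,}  k = {k:.2e}  4*Ne*s*m = {4 * n_e * s * m:.2e}")
```

Unlike the neutral case, a tenfold increase in Ne gives a tenfold increase in k here, so a constant rate would require Ne, s and m all to stay constant.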
Constancy of the molecular clock
Neutral theory predicted a molecular clock and first
protein sequence data appeared to confirm this: led
Kimura to cite this as the best evidence for neutrality
As more comparative sequence data became
available, particularly from mammals, examples of
rate variation began to appear
Debate arose concerning the constancy of the
molecular clock
Testing the molecular clock
Dispersion index R(t): test whether there is more rate
variation between lineages than expected under a
Poisson process:
If the data fit a Poisson process, variance in number of
substitutions between lineages should be no greater than
the mean number
If the data fit a Poisson process then R(t) = 1.0, if not then
R(t) > 1.0 and the clock is said to be overdispersed
A star phylogeny should be used, since any phylogenetic
structure will complicate the calculations (e.g. placental
mammals)
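The dispersion index can be sketched as the ratio of the variance to the mean of the substitution counts across lineages. The Python example below assumes the sample variance (n - 1 denominator) and uses made-up counts; published estimators of R(t) may differ in detail, so treat it as an illustration of the idea rather than the exact statistic used in the literature.

```python
from statistics import mean, variance


def dispersion_index(substitutions):
    """R(t) sketched as sample variance / mean of per-lineage substitution counts."""
    return variance(substitutions) / mean(substitutions)


# Hypothetical substitution counts for five lineages in a star phylogeny.
clock_like = [10, 12, 8, 14, 6]      # variance equals the mean
overdispersed = [2, 18, 5, 15, 10]   # variance well above the mean

for label, counts in (("clock-like", clock_like), ("overdispersed", overdispersed)):
    print(f"{label}: R(t) = {dispersion_index(counts):.2f}")
```

The first list gives R(t) = 1.00, as expected for a Poisson process, while the second gives R(t) = 4.45 and would be described as overdispersed.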
Testing the molecular clock (continued)
Protein          Species (n)   Amino acids   R(t)
Haemoglobin α    6             141           1.17
Haemoglobin β    6             146           3.04
Myoglobin        6             153           1.60
Cytochrome c     4             104           3.22
Ribonuclease     4             123           2.15
α-Crystallin     6             175           2.71
Mammalian protein data presented a serious problem for
neutralists
Problems most likely due to inaccuracies in phylogenies:
“Outlier” in data was guinea pig
Guinea pig is much more divergent than previously thought
The relative rate test
The relative rate test compares the difference between the
numbers of substitutions between two closely related taxa in
comparison with a third, more distantly related outgroup
[Figure: tree of taxa A, B and C, with a point labelled X]
If A and B have evolved according to a molecular clock, both should be
equidistant from C: $d_{AC} = d_{BC}$
A and B must be closest relatives and C must not be too far removed
The relative rate test (continued)
Taxa: 1 = Old World monkey, 2 = Human, 3 = New World monkey
Sequence                                            $d_{12}$   $d_{13} - d_{23}$
Synonymous sites in nine nuclear genes (3520 bp)    6.7        2.3 ± 0.6
ψη-globin pseudogene (1827 bp)                      7.9        1.5 ± 0.4
Three introns (3376 bp)                             6.9        1.0 ± 0.5
Two flanking regions (936 bp)                       7.9        3.1 ± 1.1
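One way to read these figures is to divide each difference in distance by its quoted error. The Python sketch below does this for the four data sets above. It rests on two assumptions not stated in the slides: that the ± values are standard errors, and that the ratio can be compared with 1.96, the two-sided 5% point of a standard normal distribution. The name `rate_difference_z` is hypothetical.

```python
# (d13 - d23, quoted error) for each data set, copied from the slide.
DATASETS = {
    "Synonymous sites, nine nuclear genes": (2.3, 0.6),
    "psi-eta-globin pseudogene": (1.5, 0.4),
    "Three introns": (1.0, 0.5),
    "Two flanking regions": (3.1, 1.1),
}


def rate_difference_z(difference, standard_error):
    """Difference in distance to the reference taxon, in units of its error."""
    return difference / standard_error


for name, (diff, se) in DATASETS.items():
    z = rate_difference_z(diff, se)
    # Assumed criterion: |z| > 1.96 is treated as a departure from equal rates.
    verdict = "rates differ" if abs(z) > 1.96 else "no detectable difference"
    print(f"{name}: z = {z:.2f} ({verdict})")
```

Under a strict clock the difference $d_{13} - d_{23}$ should be close to zero, so a large ratio suggests that lineages 1 and 2 have accumulated substitutions at different rates since they diverged.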
Lineage effects and the molecular clock
Substitution rate varies with underlying neutral
mutation rate: k = m
Three ways for rates to vary between species:
Differences in generation time
Differences in metabolic rate
Differences in efficiency of DNA repair
These are known as lineage effects: neutralists believe
that lineage effects alone can account for all variation in
molecular clock
Selectionists believe that genes also show rate variation
due to other, selection-driven factors (residue effects)
Generation time and the molecular clock
At the molecular level, generation time (g) can be
defined as time it takes for germ-line DNA to replicate
i.e. from one gamete to the next
Since most mutations occur at this point, rate of
substitution under neutral theory is a function of both
mutation rate and generation time:
k = m/g
General conclusion from molecular data is that the clock
is generation time dependent at silent sites and in noncoding DNA:
Silent rates in orang-utan, gorilla and chimp are 1.3-, 2.2- and
1.2-fold faster than in humans, which matches differences in
generation times
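The generation time effect in k = m/g can be illustrated with a small Python sketch. It assumes that m is a mutation rate per generation and g is the generation time in years, so that k is a rate per year; the function name and the numerical values are hypothetical.

```python
def substitution_rate_per_year(m_per_generation, generation_time_years):
    """k = m / g: neutral substitution rate per year for generation time g."""
    return m_per_generation / generation_time_years


m = 2e-8  # hypothetical mutation rate per site per generation
short_lived = substitution_rate_per_year(m, generation_time_years=5)
long_lived = substitution_rate_per_year(m, generation_time_years=20)
print(f"g = 5 years:  k = {short_lived:.1e} per year")
print(f"g = 20 years: k = {long_lived:.1e} per year")
print(f"ratio = {short_lived / long_lived:.1f}")
```

With the same per-generation mutation rate, the lineage with a fourfold shorter generation time accumulates neutral substitutions four times faster per year.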
The metabolic rate hypothesis
In sharks, rate of silent change is five- to sevenfold
lower than in primates and ungulates which have
similar generation times:
Led to the hypothesis that differences in metabolic rate are a
better explanation for differences in mutation rates than
differences in generation time (metabolic rate hypothesis)
States that organisms with high metabolic rates have higher
levels of DNA synthesis
Two pieces of mitochondrial DNA evidence support this:
Small bodied animals, which have higher metabolic rates, tend
to have higher mutation rates
Warm-blooded animals also have higher mutation rates than
cold-blooded animals
Relationship between body mass and sequence evolution
[Figure: % sequence divergence per Myr plotted against body mass (kg), both axes logarithmic. Labelled groups: Rodents, Geese, Dogs, Primates, Tortoises, Salmon, Newts, Frogs, Sea turtles, Horses, Bears, Whales, Sharks]
DNA repair and mutation
[Figure: flow diagram linking direct damage and replication errors to DNA repair; lesions are either correctly repaired or incorrectly repaired, the latter giving rise to mutation]
Repair mechanisms are extremely complex and there
are many repair pathways
There is some evidence supporting the hypothesis that
DNA repair influences mutation rate:
Evidence that highly transcribed genes are more efficiently
repaired
Base composition and substitution rates at silent sites in
mammalian genes tend to be gene- rather than species-specific:
suggests that homologous genes are transcribed and
repaired in a similar manner
Conversely, closely related species such as hominid
primates, which share very similar repair mechanisms,
can exhibit greatly differing substitution rates