NEUTRAL THEORY TOPIC 3: Rates and patterns of molecular evolution
Neutral theory predictions
A particularly valuable use of neutral theory is as a rigid null hypothesis. The neutral theory
makes a wide variety of predictions, and one or more of these predictions may be tested in any
given molecular dataset. Depending on which predictions (if any) are rejected, we gain
considerable insight into the underlying process of evolution for the molecular data involved. The
following four predictions are so widely applicable to the field of molecular evolution that they are
often viewed as principles of molecular evolution.
1. The level of within species genetic variation is determined by population size and
mutation rate, and is correlated with the level of sequence divergence between species.
2. The rate of gene evolution (substitution) is inversely related to the level of functional
constraint (purifying selection) acting on the gene.
3. The pattern of base composition (and codon usage in protein coding genes) at neutral
sites reflects mutational equilibrium.
4. There is a constant rate of sequence evolution; i.e., a molecular clock.
Each of these predictions is examined in detail in the following four sections of these notes.
1. Variation within and among species
Neutral theory provides the bridge between microevolution, in populations, and macroevolution.
The connection between the two is actually quite simple, which is one of the reasons why neutral
theory has been so successful as a scientific theory.
Neutral theory makes two clear predictions about genetic variation within and between species:
1. Equilibrium polymorphism (usually measured as heterozygosity) is controlled by only two
parameters: population size (Ne) and mutation rate (µ).
2. Neutral population polymorphism is correlated with divergence between species.
1.1 Equilibrium polymorphism: We covered the first prediction in some detail in the last set of
notes. Not surprisingly, there was early interest (1970s) in comparing natural levels of
heterozygosity inferred by protein gel electrophoresis with those predicted under neutral theory.
The results were surprising, in that natural levels of population polymorphism were lower than
expected. This finding led to two important developments in the field: (i) the recognition that the
parameters Ne and µ are hard to estimate; and (ii) the recognition that there were some problems
with the original theory, which were corrected in what is now called NEARLY NEUTRAL THEORY (we will
return to this topic later). Note that protein gel electrophoresis can underestimate polymorphism,
and that more recent studies have revealed a general association between heterozygosity and
mutation rate. Comparing observed with predicted heterozygosity has low power as a test of neutral
evolution, so most modern work in this area has focused on prediction 2.
1.2 Polymorphism and divergence are correlated: If genes are evolving neutrally, measures of
polymorphism within a species should be proportional to the level of divergence between species.
This is where the impact of the genetic code on the effect of a mutation becomes very important.
Remember that all changes in protein coding sequences can be divided into two classes: (i)
synonymous (S) and (ii) non-synonymous (NS). These types of mutation will be impacted
differently by the effect of selection on the protein product of the gene. Under neutral theory,
selection is not involved, so the ratio of S to NS polymorphism within a species is expected to be
equal to the ratio of S and NS substitutions measured between species. If positive selection were
acting on at least some mutations, their residence time in the population would be lower than that
of neutral mutations. Hence the ratios would not be the same, and NS mutants would represent a
smaller proportion of the within-species polymorphism.
Figure: Comparison of the ratio of synonymous and nonsynonymous polymorphism within
species to divergence between species. Neutral theory suggests that the fraction of
variation that is nonsynonymous within species should be the same as between
species.
[Tree diagram of Species 1, Species 2 and Species 3, labelled with S:NS counts.
Polymorphic: 12:4, 6:2, 10:3. Fixed: 17:6, 14:5, 19:6.]

                        Polymorphism        Substitutions
                        within a species    between species
Synonymous (S)          28                  50
Non-synonymous (NS)     9                   17
S:NS                    3.1                 2.9

Data are hypothetical. Ratios are tested by using a G-test on the counts of S and NS.
These hypothetical data are not significant. If positive selection were acting, residence
times for NS would be lower within species and polymorphic S:NS > fixed S:NS.
Tests for heterogeneity in the pattern of polymorphism to divergence are called NEUTRALITY TESTS.
Tests need not be based on S and NS; amino acid changes can be divided into physicochemically
radical (r) and conservative (c), and the c:r ratio can be tested for heterogeneity.
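As a rough illustration of the G-test mentioned above, the following sketch applies it to the hypothetical counts (28 and 9 polymorphic, 50 and 17 fixed). The function name and the choice of no continuity correction are our own assumptions, not part of the original notes.

```python
import math

def g_test_2x2(table):
    """G-test of independence for a 2x2 table of counts.

    Returns (G, p_value) for 1 degree of freedom, without continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:  # a zero cell contributes nothing to G
                g += 2.0 * observed * math.log(observed / expected)
    # For 1 degree of freedom, P(chi-square > g) = erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p_value

# Rows: polymorphic, fixed. Columns: synonymous (S), nonsynonymous (NS).
counts = [[28, 9], [50, 17]]
g_stat, p = g_test_2x2(counts)
print(f"G = {g_stat:.3f}, p = {p:.3f}")
```

G is far below the 5% critical value of 3.84 for one degree of freedom, in line with the statement that these hypothetical data are not significant. If SciPy is available, `scipy.stats.chi2_contingency` with `lambda_="log-likelihood"` and `correction=False` should give the same statistic.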
Neutrality tests are powerful and useful. However, there is an important caveat with the
interpretation of a significant result. Rejection of strict neutrality does not distinguish between
violation of the assumption of selective equivalence of alleles and violation of one of the
model's other assumptions. For example, if the effect of selection changes over time due
to changes in effective population size, as in nearly neutral theory, a significant result will be
obtained from this test. We will return to this topic later.
2. The rate of gene evolution is inversely related to functional constraint
Under neutral theory the substitution rate is determined by the mutation rate and the probability of
fixation. It is well known that rates vary among genes (e.g., histones versus MHC) and within
genes (e.g., introns versus exons). Such rate variation is consistent with neutral theory, even
when mutation rates are the same. Remember that neutral theory asserts only that polymorphic
variants are selectively equivalent; it does not require that the frequency of such variants be the
same among sites, genes, or species.
2.1 Variation within genes: We begin with rate variation within genes because it is unlikely that
mutation rates vary, and the interpretation of variation in substitution rates is easier. Consider a
protein coding gene. It stands to reason that due to the genetic code mutations at some sites will
have little effect on the encoded protein (e.g., 3rd codon positions) whereas mutations at other
sites (e.g., 1st and 2nd codon positions) are very likely to affect the encoded protein.
Consequently, the frequency of selectively equivalent alleles occurring at 3rd codon positions is
expected to be much higher than 1st and 2nd codon positions. Hence, 3rd positions are expected
to evolve more quickly than 1st and 2nd codon positions. The evolution of functional genes fits this
pattern in the vast majority of known cases. For a real example see plot below.
Figure: Mean number of substitutions per site at the three codon positions of the epsilon-globin
gene of primates. Two measures are presented: (i) the average over all pairwise comparisons
between genes; and (ii) the sum of the branch lengths of the epsilon-globin gene tree.
[Plot: x-axis, codon position (1, 2, 3); y-axes, mean pairwise substitutions/site and
substitutions/site as a sum over the tree. Inset: gene tree for primate epsilon globins (Cebus,
Saimiri, Callithrix, Lagothrix, Brachyteles, Alouatta, Ateles, Pan, Homo, Pongo, Macaca, Aotus,
Hylobates, Tarsius, Galago, Otolemur, Cheirogaleus, Eulemur).]
Under both measures of substitution rate, 3rd codon positions evolve faster than 1st and 2nd
positions.
Note: mean number of substitutions per site were computed in all cases by using the Jukes and
Cantor (1969) correction.
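For readers who want to see the Jukes and Cantor (1969) correction in action, here is a minimal sketch that computes corrected distances separately for each codon position. The two short sequences are invented for illustration (they are not epsilon-globin data), and the helper names are our own.

```python
import math

def jc69_distance(p):
    """Jukes-Cantor corrected distance from the proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def proportion_different(seq_a, seq_b):
    """Fraction of aligned sites that differ between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def codon_position_sites(seq, position):
    """Every third nucleotide, starting at codon position 1, 2 or 3."""
    return seq[position - 1::3]

# Hypothetical in-frame sequences (8 codons each), for illustration only.
seq_a = "ATGGCCAAGTTTGGACTGAAACCC"
seq_b = "ATGGCTAAGTTCGGACTGAGACCC"

distances = {}
for pos in (1, 2, 3):
    p = proportion_different(codon_position_sites(seq_a, pos),
                             codon_position_sites(seq_b, pos))
    distances[pos] = jc69_distance(p)
    print(f"codon position {pos}: p = {p:.3f}, JC69 distance = {distances[pos]:.3f}")
```

The correction matters more as the observed proportion of differences grows, because multiple substitutions at the same site become more likely.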
The previous logical argument, as well as the above plots of real data, demonstrates a well
known principle of molecular evolution:
The greater the functional constraint, the slower the rate of molecular evolution.
Under neutral theory we can formulate this principle as a model (Kimura 1968). First we divide all
mutations into three categories: (i) adaptive, (ii) deleterious, and (iii) neutral. The first category is
assumed to occur very rarely, so their frequency is expected to be effectively zero. Hence the
frequency of deleterious mutations is fD and the frequency of neutral mutations f0 = 1 - fD. Let µT
equal the total mutation rate per site per unit time. Then the neutral mutation rate per site is:
µ0 = µT f0
Hence, the rate of substitution per site per unit time is:
k = µT f0
Figure: The rate of evolution depends on the “size (f0) of the selective sieve”.
[Diagram of Kimura’s sieve, two panels: new mutations leading to fixation in a “slow gene”, and
new mutations leading to fixation in a “fast gene”.]
f0 is the fraction of mutations that passes through the “sieve”.
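The step from the neutral mutation rate to the substitution rate can be made explicit with a small sketch. It uses the standard neutral argument: in a diploid population of size N, 2N × µ0 new neutral mutations arise per site each generation, and each is fixed with probability 1/(2N). The function name and the numbers below are hypothetical.

```python
import math

def substitution_rate(mu_total, f0, n_diploid):
    """Neutral substitution rate per site per generation.

    mu_total  : total mutation rate per site per generation (µT)
    f0        : fraction of mutations that are neutral
    n_diploid : number of diploid individuals (N)
    """
    new_neutral_mutations = 2 * n_diploid * mu_total * f0  # arising per generation
    p_fix = 1.0 / (2 * n_diploid)                          # fixation probability of each
    return new_neutral_mutations * p_fix                   # equals µT * f0

# Hypothetical values: the population size cancels out of the result.
for n in (1_000, 1_000_000):
    print(n, substitution_rate(mu_total=1e-9, f0=0.2, n_diploid=n))
```

Because N cancels, only µT and the size of the sieve (f0) determine k under strict neutrality.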
So within genes we will assume that µT is the same for all sites. Clearly the value of f0 is largest
for the 3rd codon positions of protein coding genes. However we know that not all mutations at 3rd
codon positions are synonymous. Thus we might expect that f0 for synonymous positions is even
larger than f0 for 3rd codon positions, and this turns out to be generally true.
Figure: The average substitution rate between primates and rodents is higher for synonymous sites
as compared with third codon positions. The results are based on a sample of 82 nuclear
genes.
[Histogram: x-axis, substitutions / site / 2 × 80 million years (bins from 0.1 to 1, then “More”);
y-axis, number of proteins; series, 3rd codon positions and synonymous sites. Mean at 3rd
positions: 0.40. Mean at synonymous sites: 0.61. Inset: an ancestral gene diverging into a rodent
gene and a primate gene, with branches t0 and t1.]
The mean number of substitutions per site between primates and rodents is t = t0 + t1. The unit of
time is 2 × 80 million years, where 80 million years is the time since primates and rodents shared
a common ancestor.
This result is consistent with neutral theory, given that f0 is smaller for 3rd codon positions
because some mutations at such sites will be nonsynonymous.
Data from Bielawski, Dunn and Yang (2000) Genetics 156:1299-1308.
Under this model the highest rate should occur when f0 = 1 for an entire gene. Pseudogenes are
expected to satisfy this expectation, and in fact they tend to exhibit the highest rates of evolution.
Figure: Comparison of mean substitution rates in different parts of genes and pseudogenes. Data
are from Li et al. (1985). Substitution rate is the mean number of substitutions per site per 10^9
years. Rates are an average over 3000 mammalian genes.
[Bar chart: y-axis, substitutions per site per 10^9 years (0 to 5); bars for introns, 5' flanking
region, pseudogenes, 5' untranslated region, non-synonymous sites, synonymous sites, 3'
untranslated region, and 3' flanking region.]
This notion can be extended to other classes of sites within genes. Synonymous sites should
have f0 = 1 as long as selection is acting only with respect to the protein product of a gene. In
fact, the above figure illustrates that synonymous sites have a substitution rate comparable with
that of pseudogenes. Non-synonymous sites will have f0 < 1 that depends on the level of
functional constraint affecting the protein product. It is likely that f0 for nonsynonymous sites
comprises the full range between 1 and zero, where some sites are under no constraint (f0 = 1),
others tolerate no change (f0 = 0), and others fall somewhere in between.
The same notion can be extended to functional domains. The most functionally critical domains
will have f0 << 1, whereas others will have larger values of f0.
The figure below illustrates that different domains of the gene that encodes the insulin protein
evolve at different rates. Remember that following cleavage of the initial amino acid chain, the
polypeptide folds to a state of lowest free energy; this state allows the formation of disulfide
bonds. Once this occurs, a section of amino acids called the C-chain is removed, producing the
mature insulin protein (in a conformation that is not at the lowest free energy). The C-chain, then,
is not involved in the function of the mature proteins and consequently is expected to have a
higher f0 as compared with the A and B chains, which have a functional role in the mature insulin
proteins. As expected the substitution rate is higher (5 fold) in the C-chain. Interestingly, the
substitution rate in the C-chain (1.1 per site per 10^9 years) is less than half the average for
synonymous sites, suggesting that it still has considerable functional importance, presumably in
ensuring that an appropriate conformation for formation of the disulfide bonds is reached before it
is removed.
Figure: Substitution rate differs in different polypeptide domains of preproinsulin.
C chain: 1.1 × 10^-9 / site / year
A and B chains: 0.2 × 10^-9 / site / year
The rate is about 5-fold higher in the C chain.
Note that the C-chain rate is still lower than in many other proteins, and lower than at
synonymous sites, so its amino acid sequence must still have considerable
functional importance to the protein, probably in folding to the lowest free energy so
that disulfide bonds can be formed.
The principle that sequence conservation reflects functional constraint is fundamental to nearly all
aspects of bioinformatics. Similarity searches of genetic databases rely on the assumption that
functionally critical sites will be conserved, thereby leaving a sequence similarity signal that can
be identified by the search algorithms. Moreover, methods that attempt to infer the function of
newly discovered genes rely on matching regions of sequence in a new gene to conserved regions in
other genes with a known function. The figure below illustrates regions of
the beta-globin gene that have been conserved for over 450 million years.
Figure: Multiple sequence alignment of four vertebrate beta-globin genes representing 450
million years of evolution. Amino acids shaded in green represent sites that appear
conserved over those 450 million years.
Note: many of the shaded sites are located in the heme pocket or at the interfaces between
globin subunits, consistent with the notion that sites most critical to protein function evolve at
the slowest rates.
2.2 Differences among genes:
The notion of the size of the selective sieve, i.e., f0, is relevant to understanding differences
among genes in the rate of substitution. Mutations in genes with a higher fraction of functionally
important sites will have a higher probability of being deleterious; hence fD will be larger.
Consequently such genes will have small f0 and a relatively low mean rate of substitution per site.
As the fraction of functionally important sites decreases, fD decreases and f0 increases; the
probability that a mutation will be fixed is higher because, on average, the probability that a new
mutation will be neutral is larger.
The nonsynonymous and synonymous substitution rates allow us to estimate the size of f0.
Hence, we can use these rates to compare the level of selection pressure acting on different
genes. The box below provides an example of how the ratio of nonsynonymous to synonymous
substitution rates is used to infer the level of selection acting on the neuroleukin gene of
primates.
Under neutral theory:
1. The synonymous substitution rate (kS) is equal to the mutation rate.
2. The nonsynonymous substitution rate (kN) measures the substitution rate for neutral amino acid
changes.
• Thus the ratio of these rates (kN / kS) represents the fraction of amino acid mutations that are
  neutral: this is f0 for amino acids.
• The fraction of amino acid mutations that are deleterious (fD) must be 1 - (kN / kS).
Let’s take the neuroleukin gene of primates as an example:
kN = 0.016
kS = 0.300
• The fraction of amino acid changes that are neutral is 0.016/0.300 = 0.053, a small amount.
• Hence the fraction of amino acid changes that are deleterious is 1 - 0.053 = 0.95!
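The arithmetic in the box can be wrapped in a small helper so that it can be repeated for other genes. This is only a sketch: the function name is ours, and it inherits the box's assumptions that kS equals the mutation rate and that adaptive substitutions are negligible.

```python
def neutral_fraction(k_n, k_s):
    """Return (f0, fD) for amino acid mutations from nonsynonymous and synonymous rates.

    Assumes strict neutrality at synonymous sites (kS equals the mutation rate)
    and that adaptive substitutions are negligible.
    """
    if k_s <= 0:
        raise ValueError("kS must be positive")
    f0 = k_n / k_s
    return f0, 1.0 - f0

# Neuroleukin gene of primates, values from the box above.
f0, f_d = neutral_fraction(k_n=0.016, k_s=0.300)
print(f"f0 = {f0:.3f}, fD = {f_d:.3f}")  # f0 = 0.053, fD = 0.947
```

A ratio kN / kS greater than one would give a negative fD, which signals that the assumptions of this simple framework do not hold for that gene.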
With this framework we can compare the level of selection pressure acting on different genes.
The figure below compares a highly constrained gene with a gene evolving under relatively weak
constraints for mammals. Note that both functional genes are much more constrained as
compared with a pseudogene.
Figure: Estimated level of functional constraint for three nuclear genes of primates. Each panel
shows the fraction of deleterious mutations and the fraction of neutral mutations.
• A1 adenosine receptor: most mutations in a gene evolving under strong functional constraints
  are deleterious.
• Prolactin: a less functionally constrained gene has more neutral mutations.
• Pseudogene: pseudogenes are nonfunctional, so all mutations are expected to be neutral.
Estimates obtained from relative rates of synonymous and nonsynonymous substitution. Data are from
Bielawski, Dunn, and Yang (2000) Genetics 156:1299-1308.
As expected, we observe considerable variation in nonsynonymous rates. This reflects different
levels of purifying selection acting on proteins with different functional requirements.
Synonymous rates are also variable, reflecting differences in mutation rates in different parts of
mammalian genomes. Note that sampling errors contribute to some of the variance in both
synonymous and nonsynonymous substitution rates.
Figure: Distribution of nonsynonymous and synonymous substitution rates for 82 nuclear
genes of primates.
[Histogram: x-axis, substitutions / site / 80 million years; y-axis, number of proteins; series,
nonsynonymous rate and synonymous rate.]
Mean rate of nonsynonymous substitution: 0.045 / site / 80 million years
Mean rate of synonymous substitution: 0.201 / site / 80 million years
Method: GY94 under ML. Data from Bielawski, Dunn, and Yang (2000) Genetics 156:1299-1308.
3. Mutation patterns
Neutral theory predicts that sites in the genome that are free from purifying selection will have
nucleotide or amino acid frequencies that reflect mutational equilibrium. For a review of
mutational equilibrium see the notes for POPULATION GENETICS TOPIC 5.
Based on our knowledge of molecular biology, we expect that the nucleotide frequencies (A, C,
G, and T) of pseudogenes, introns, and 5’ and 3’ flanking regions of genes should be close to the
equilibrium point. Our knowledge of the genetic code suggests that this is also likely to be true of
the 3rd positions of codons, and even more so at the 3rd positions of 4-fold degenerate codons
(i.e., four-fold degenerate sites of a gene). For simplicity, nucleotide frequencies are often
summarized in terms of %G + %C, or simply “GC content”.
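As a small illustration of how GC content can be summarized by codon position, consider the sketch below. The coding sequence is a made-up toy example (not the beta-globin gene), and the function names are our own.

```python
def gc_content(seq):
    """Fraction of G + C among the unambiguous nucleotides of a sequence."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    return sum(base in "GC" for base in counted) / len(counted)

def gc_by_codon_position(cds):
    """GC content at 1st, 2nd and 3rd codon positions of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    cds = cds[:usable]
    return {pos: gc_content(cds[pos - 1::3]) for pos in (1, 2, 3)}

# Hypothetical in-frame sequence of 8 codons, for illustration only.
toy_cds = "ATGGCCAAGTTTGGACTGAAACCC"
gc = gc_by_codon_position(toy_cds)
for pos, value in gc.items():
    print(f"GC content at codon position {pos}: {value:.3f}")
```

Restricting the third-position calculation to four-fold degenerate codons would require a codon table and is left out of this sketch.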
Figure: Nucleotide frequencies in the human beta-globin gene differ among the three
positions of the codon. Frequencies at positions 1 and 2 reflect selection
acting on the protein product of the gene. Frequencies at position 3 reflect a
strong influence of mutation pressure.
[Three bar charts, one each for the 1st, 2nd and 3rd codon position: x-axis, nucleotide (A, C, G,
T); y-axis, frequency (0 to 0.45).]
We have seen an example of this in the mitochondrial genome of vertebrates, where the GC content
at third codon positions is correlated with distance from the origin of replication. As the
distance is also correlated with the expected amount of mutation by spontaneous decay that
accumulates during the process of replication, this observation is believed to reflect different
mutation equilibrium points in different parts of the genome.
GC content also is highly variable in the nuclear genomes of mammals, Drosophila and yeast,
and the nucleoid chromosome of prokaryotes. The effect of this bias on protein coding genes is
that some codons will be used more frequently than other codons in a given gene. In some
cases this bias appears to reflect a mutational equilibrium, whereas in others natural selection is
thought to play a role.
• Prokaryotes: The usage of a synonymous codon in prokaryotes appears to be positively
  correlated with the frequency of its cognate tRNA. Such a relationship could result
  from natural selection for increased translational efficiency and accuracy. This
  relationship seems to depend on the expression level of the gene. In genes with
  low levels of expression, the correlation is weak, suggesting that selection is too
  weak to offset the effect of mutation pressure. There is general consensus that
  this is a type of WEAK SELECTION, as the selective difference between favourable
  and unfavourable codons must be very small.
• Mammals: Mammalian genomes exhibit a highly organized genomic structure
  where a small fraction of the genome encodes the majority of genes, and these
  gene-rich regions are GC rich. Such regions are called ISOCHORES. There has
  been a long-running debate about the role of natural selection in the origin and
  maintenance of isochores. The consensus opinion today is that the GC content of
  isochores reflects mutation pressure arising from spontaneous mutations and the
  mutagenic consequences of recombination. Here differences in GC content reflect
  differences in equilibrium points rather than selection pressures.
• Drosophila: The case of Drosophila is reminiscent of the case of prokaryotes, in
  that biased use of synonymous codons seems to reflect selection pressure for
  increased translational accuracy. Note that codon bias is reduced in Drosophila
  pseudogenes and in species with small effective population sizes.
The observation that codon usage might be determined, at least in part, by the effects of natural
selection is NOT evidence against the neutral theory. In such cases synonymous mutations to
and from favoured and un-favoured codons are less frequent due to selection; hence the principle
that selection results in lowered rates of substitution holds.
4. Molecular clock
It is clear that evolutionary rates vary among regions within genes and among genes. If we take
the rate of substitution as an average over all sites in a given gene, the neutral theory predicts
that the rate of evolution of that gene should be approximately constant over time, where time is
measured in generations.
Interestingly, the hypothesis of a clock-like tempo of evolution (the MOLECULAR CLOCK HYPOTHESIS)
predates neutral theory. Zuckerkandl and Pauling (1965) noticed an approximately uniform rate
of amino acid substitutions over time measured in years. The plot below illustrates clock-like
evolution in the mitochondrial protein coding genes of teleost fishes.
Figure: Linear relation between mitochondrial substitution rate
and time since common ancestor in teleost fishes.
[Scatter plot: x-axis, time (millions of years, 0 to 200); y-axis, mean number of substitutions /
site (0 to 0.2).]
A linear relationship is expected under a uniform rate of substitution.
Substitutions are the mean number of changes at first codon
positions of all mitochondrial protein coding genes. Data were
kindly provided by K. Dunn.
The notion of a molecular clock has always been controversial. Neutral theory suggests that
generation time is the appropriate unit of time, and some believe that variation in generation times
among different lineages of organisms has a significant effect on the pace of molecular evolution.
The expectation is that molecular clocks should run faster in lineages with shorter generation
times. This is known as the GENERATION-TIME EFFECT HYPOTHESIS. Interestingly, many studies of protein
evolution since Zuckerkandl and Pauling indicate an approximately constant rate in terms of
absolute time in years. Conflict between these opposing views still exists.
Note that some researchers deny that any such pattern of general rate constancy exists at all!
Some argue that the observation of a clock is an artefact of averaging rates over very long
periods of time. Gillespie (1986) argued that substitutions might occur in an episodic fashion,
occurring in clusters rather than at regular intervals. This model suggests that the observation of
a molecular clock is only superficially consistent with neutral theory, and that the actual process
of evolution is more generally non-neutral.
There are two reasons for such considerable interest and controversy over the molecular clock
hypothesis.
1. Macromolecules that evolve at constant rates can be used to date evolutionary events
that are not visible in the fossil record. This includes both species-level divergences
where the fossil record is incomplete and major genomic events, such as gene
duplications, that are unavailable from any type of fossil data.
2. Rate variation among lineages, when it exists, should provide insights into the
mechanisms of molecular evolution. Such variation could indicate a change in mutation
rate, or a change in the substitution rate due to a relaxation of selection pressure or a
period of positive Darwinian selection. So, even if the hypothesis is incorrect, it provides
a valuable null model (i.e., nothing interesting happens) against which we can test for
interesting patterns of evolution.
Neutral theory is the basis for indirect methods of estimating the mutation rate
Remember from our earlier subject of mutation (FOUNDATIONS TOPIC 5) that there are two general
approaches to the measurement of mutation rates: (i) DIRECT METHODS, and (ii) INDIRECT
METHODS. The indirect methods, which estimate the number of substitutions in lineages that have
diverged from a common ancestor, are grounded in neutral theory.
Remember that the substitution rate (k) is equal to the product of the mutation rate (µ) and the
fraction of such mutations that are neutral (f0). Hence, all indirect methods must select a dataset
(or subset of a dataset) where f0 is assumed to equal one. Reasonable candidates for indirect
measurement of the mutation rate are pseudogenes, third codon positions of four-fold degenerate
codons, or introns.
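A minimal sketch of an indirect estimate follows. It takes the mean synonymous divergence between primates and rodents reported earlier in these notes (0.61 substitutions per site, with 80 million years since the common ancestor) and assumes, as the indirect methods must, that f0 = 1 at those sites. The function name is our own, and the result is illustrative rather than a definitive estimate.

```python
def rate_from_divergence(d, t_years):
    """Substitution rate per site per year from a pairwise distance.

    d       : substitutions per site between two lineages
    t_years : time since their common ancestor; the two lineages together
              represent 2 * t_years of evolution
    """
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    return d / (2.0 * t_years)

# With f0 assumed to be 1, the substitution rate k estimates the mutation rate µ.
k_syn = rate_from_divergence(d=0.61, t_years=80e6)
print(f"estimated mutation rate: {k_syn * 1e9:.2f} per site per 10^9 years")
```

If selection acts on some of the sites used (f0 < 1), the same calculation underestimates the mutation rate.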
Nearly neutral theory
Tomoko Ohta was a student of Motoo Kimura during the development of the neutral theory. Ohta
realized almost immediately that some fraction of mutations would probably be only mildly
deleterious, and consequently their fate could be influenced both by drift and by natural selection.
Ohta recognized that observed levels of heterozygosity in natural populations were much higher
than expected if all were subject to selection, but lower than predicted by the strict neutral model.
Ohta realized that by simply allowing for a fraction of mutations with slightly deleterious effects,
the neutral model could be extended so as to accommodate those observations. This
modification was first called the SLIGHTLY DELETERIOUS MODEL (Ohta and Kimura 1971). Later, with
the addition of slightly beneficial mutations, it became known as the NEARLY NEUTRAL THEORY.
SLIGHTLY DELETERIOUS MUTATIONS: Mutations with small selection coefficients such that both drift
and natural selection can influence their probability of fixation. When population sizes are very
large, drift effects are so small that natural selection causes such alleles to be lost from the
population. [Remember when we modelled selection with infinite population sizes, the selection
coefficient always determined the fate of an allele.] However, when population sizes are small the
fate of slightly deleterious alleles will be determined by drift because the magnitude of change in
allele frequency from generation to generation is so much larger than any changes that arise due
to negative selection pressure. Thus slightly deleterious alleles can be fixed due to drift.
SLIGHTLY BENEFICIAL MUTATIONS: The same logic applies as above, except that when population
sizes are very large selection acts to fix the allele in the population and when population sizes are
small the same allele can be lost due to drift.
Figure: The strictly neutral model was extended to accommodate nearly neutral mutations.
[Diagram comparing the classes of mutation recognized by the strictly neutral model, the slightly
deleterious model, and the nearly neutral model. Classes labelled in the diagram: beneficial,
deleterious, neutral, slightly deleterious, slightly beneficial.]
Given this model, the substitution rate is NOT independent of population size. Rather, the
fraction of neutral mutations (f0), and hence the substitution rate, changes over time depending
on any changes in the population size and the value of the selection coefficient.
[Figure, two panels: large population size (selection very effective) and small population size
(selection a little less effective). Each panel is labelled with neutral mutations, slightly
deleterious mutations, and FIXED.]
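The dependence on population size can be illustrated numerically. The sketch below uses Kimura's diffusion approximation for the fixation probability of a new mutation, a formula that is not given in these notes and is brought in here as an assumption, together with Ne = N and additive selection. The selection coefficients and population sizes are hypothetical.

```python
import math

def fixation_probability(s, n):
    """Approximate fixation probability of a new mutation with selection coefficient s.

    Diffusion approximation assuming Ne = N diploid individuals, additive
    selection, and an initial frequency of 1/(2N).
    """
    if s == 0.0:
        return 1.0 / (2 * n)          # strictly neutral case
    exponent = -4.0 * n * s
    if exponent > 700.0:              # exp() would overflow; probability is effectively 0
        return 0.0
    return math.expm1(-2.0 * s) / math.expm1(exponent)

def relative_to_neutral(s, n):
    """Fixation probability divided by the neutral expectation 1/(2N)."""
    return fixation_probability(s, n) * 2 * n

for n in (100, 1_000, 10_000, 100_000):
    del_ratio = relative_to_neutral(-1e-4, n)   # slightly deleterious
    ben_ratio = relative_to_neutral(+1e-4, n)   # slightly beneficial
    print(f"N = {n:>7}: deleterious {del_ratio:.3g}, beneficial {ben_ratio:.3g}")
```

In small populations both classes behave almost like neutral mutations (ratios near 1), whereas in large populations slightly deleterious mutations are almost never fixed and slightly beneficial ones are fixed far more often than neutral ones.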
Nearly neutral theory provides a possible reconciliation of rate constancy in years with expected
rate constancy in generations: Organisms with short generation times tend to be small in size and
maintain large population sizes (case 1; e.g., rodents). Organisms with long generation times
tend to be large and maintain small population sizes (case 2; e.g., elephants). Under nearly
neutral theory the effects of generation time (tg) and population size (Ne) act in opposite directions
and could cancel each other out:
• Case 1: Shorter tg means more generations per year and higher yearly rate. This
  is offset by a reduced rate due to the greater effectiveness of selection; i.e., more
  mildly deleterious mutations are effectively prevented from going to fixation.
• Case 2: Longer tg means fewer generations per year and lower yearly rate. This is
  offset by an increased rate due to reduced effectiveness of selection in small
  populations; i.e., more mildly deleterious mutations fixed per generation due to
  drift.
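The two cases can be compared in a toy calculation. Everything here is an assumption made for illustration: the diffusion approximation for fixation probability (with Ne = N and additive selection), the mutation rate, the fractions of neutral and slightly deleterious mutations, the selection coefficient, and the two hypothetical lineages.

```python
import math

def fixation_probability(s, n):
    """Diffusion approximation for a new mutation; assumes Ne = N, additive selection."""
    if s == 0.0:
        return 1.0 / (2 * n)
    exponent = -4.0 * n * s
    if exponent > 700.0:  # exp() would overflow; probability is effectively zero
        return 0.0
    return math.expm1(-2.0 * s) / math.expm1(exponent)

def subs_per_year(mu_gen, f0, f_sd, s, n, gen_time_years):
    """Yearly substitution rate from neutral plus slightly deleterious mutations."""
    neutral = mu_gen * f0  # 2N*mu*f0 new mutations, each fixed with probability 1/(2N)
    slightly_deleterious = 2 * n * mu_gen * f_sd * fixation_probability(s, n)
    return (neutral + slightly_deleterious) / gen_time_years

# Hypothetical parameters shared by both lineages.
MU, F0, F_SD, S = 1e-8, 0.2, 0.3, -1e-4
case1 = dict(n=100_000, gen_time_years=0.5)   # short generations, large population
case2 = dict(n=1_000, gen_time_years=20.0)    # long generations, small population

# Ratio of yearly rates (case 1 / case 2) without and with slightly deleterious mutations.
strict_ratio = subs_per_year(MU, F0, 0.0, S, **case1) / subs_per_year(MU, F0, 0.0, S, **case2)
nearly_ratio = subs_per_year(MU, F0, F_SD, S, **case1) / subs_per_year(MU, F0, F_SD, S, **case2)
print(f"strictly neutral: {strict_ratio:.1f}-fold, nearly neutral: {nearly_ratio:.1f}-fold")
```

With these made-up numbers the population-size effect only partly offsets the generation-time effect; how close the two come to cancelling depends on the parameter values chosen.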
Nearly neutral theory also predicts a narrower range of heterozygosity than strict neutral
theory, and this is consistent with what is observed in natural populations.
Neutrality can depend on genetic and environmental conditions
Neutral theory makes no predictions about the stability (or instability) of the environment.
Environmental conditions change dramatically among seasons and among years. If an
environmental change leads to a change in the fraction of neutral mutations (f0), then neutral
theory predicts a change in the substitution rate dependent on changing f0.
Changes in the genetic background can also impact the f0 over evolutionary time. Consider the
evolution of an enzyme’s activity over time. If an amino acid is fixed by positive selection
because it increases the activity of an enzyme (let’s call this site X), then the amino acids at other
sites at that time could experience a change in selection pressure, even if they were not the
target of positive selection themselves. Sites interacting with site X could have been evolving
under a neutral model before the action of natural selection on site X, but evolved under strong
purifying selection following that action of natural selection on site X.
Classic examples of this come from viruses such as HIV that cross a species barrier. Following a
cross-species transmission event the environment changes, leading to a change in selection
pressure acting on many sites of viral genes and overall substitution rate. Moreover, genetic
recombination events are common among such viruses, and the altered genetic environment that
arises from the recombination event can also influence the nature of selection, including f0.
Success of the neutral theory
Since its inception, weaknesses in the neutral theory have been identified. It is to the credit of
this theory that it has survived them and has been enlarged to accommodate phenomena such as
slightly deleterious mutations and non-clocklike patterns of evolution. Most molecular biologists,
including “selectionists”, accept the notion of neutral evolution for characters such as
pseudogenes and synonymous mutations. Even among amino acid replacements, particularly
physicochemically conservative ones, the consensus opinion allows for a large component of
neutral evolution. There seems to be very little to debate in terms of the structure of the theory.
Differences of opinion remain as to the relative proportions of neutral, slightly deleterious, and
adaptive substitutions; however, controversies have shifted to more important issues in molecular
evolution. As a theory, the neutral theory appears to be here to stay.
James F. Crow (1985) provided four reasons for its success:
1. The theory provides the best explanation for the dramatic differences in the rates and
patterns of evolution in molecules as compared with morphology.
2. The neutral theory provides a common framework for understanding the dramatic
differences among genes, codon positions, introns, and pseudogenes.
3. The neutral theory correctly predicts the differences in rates among molecular datasets
as well as the similarity of substitution rates between the so-called “living fossil”
organisms and the most rapidly changing species.
4. The neutral theory has stimulated theoretical studies as well as studies of natural
variation in a framework based on a rigid null hypothesis.
In fact, prior to the 1970s evolutionary biology was generally regarded as speculative and
undisciplined. Roger Stanier commented in a speech before members of the Society for General
Microbiology in 1970 that evolutionary studies are “a relatively harmless habit, like eating
peanuts”. The application of rigid mathematical models, beginning in the 1930s, and working up
to neutral theory as a general null model for molecular evolution, played a large part in bringing
about a dramatic change in the way evolutionary biology was pursued as a science.