EQUATIONS USED IN 40-300 POPULATION GENETICS

LECTURE NOTES
POPULATION GENETICS AND EVOLUTION
Molecular Biology And Genetics
40-300
QUANTIFYING GENETIC VARIATION
When we have allele frequency data for multiple loci we can quantify the
variation as follows:
Within populations, we can estimate:
average heterozygosity:
$H = 1 - \sum_i p_i^2$
where $p_i$ = frequency of the ith allele
proportion of polymorphic loci:
$P = \dfrac{\text{number of polymorphic loci}}{\text{total number of loci}}$
Between populations we can estimate Nei's genetic distance, D, as follows:
The probability that a randomly chosen allele from EACH of TWO populations will
be identical, relative to the probability that two randomly chosen alleles from
the SAME population will be identical, is the normalized identity, I:
$I = \dfrac{\sum_x p_{xi}\,p_{xj}}{\left[\left(\sum_x p_{xi}^2\right)\left(\sum_x p_{xj}^2\right)\right]^{1/2}} = \dfrac{J_{XY}}{\left[J_{XX}\,J_{YY}\right]^{1/2}}$
where:
$p_{xi}$ = frequency of allele x in population i
$p_{xj}$ = frequency of allele x in population j
When surveying multiple loci, average each J across loci before taking the ratio:
$I = \dfrac{\bar{J}_{XY}}{\left[\bar{J}_{XX}\,\bar{J}_{YY}\right]^{1/2}}$
Nei's genetic distance is then calculated as:
$D = -\ln I$
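As a rough illustration of the heterozygosity and genetic distance formulas, the following Python sketch computes H for one locus and Nei's D for two populations. The function names and the input layout (one list of allele frequencies per locus, with alleles in the same order in both populations) are assumptions made for this example, not part of the course material.

```python
import math

def heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2) for one locus."""
    return 1.0 - sum(p * p for p in freqs)

def nei_distance(pop_x, pop_y):
    """Nei's D = -ln(I) from per-locus allele-frequency lists.

    pop_x and pop_y are lists of loci; each locus is a list of allele
    frequencies, with alleles in the same order in both populations
    (an assumed input layout).
    """
    n_loci = len(pop_x)
    # Average the identities across loci before forming the ratio.
    j_xx = sum(sum(p * p for p in locus) for locus in pop_x) / n_loci
    j_yy = sum(sum(p * p for p in locus) for locus in pop_y) / n_loci
    j_xy = sum(
        sum(px * py for px, py in zip(lx, ly))
        for lx, ly in zip(pop_x, pop_y)
    ) / n_loci
    identity = j_xy / math.sqrt(j_xx * j_yy)
    return -math.log(identity)
```

Two populations with identical allele frequencies give I = 1 and therefore D = 0; D grows as the populations share fewer alleles.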
When we have information about the sequence of the alleles, we can quantify the
difference between the alleles, as well as differences in their frequencies.
The average number of nucleotide substitutions between a pair of alleles X and
Y, based on restriction site data, is:
$d_{XY} = -\dfrac{\ln S}{j}$
$S = \dfrac{N_{XY}}{(N_X + N_Y)/2}$
where:
$N_X$ = number of restriction sites in allele X
$N_Y$ = number of restriction sites in allele Y
$N_{XY}$ = number of sites shared by X and Y
j = number of nucleotides in a restriction site
The average number of nucleotide substitutions between a pair of alleles X and
Y, based on DNA sequence data, is:
$\delta_{XY} = -\tfrac{3}{4} \ln\left(1 - \tfrac{4}{3}\,d_{XY}\right)$
where: $d_{XY}$ = proportion of nucleotides that differ between alleles X and Y.
The average number of substitutions per nucleotide site in population i is:
$v_i = \dfrac{2}{n(n-1)} \sum_{X<Y} n_X\,n_Y\,\delta_{XY}$
where:
n = number of alleles assayed
$n_X$ = number of alleles of type X
$n_Y$ = number of alleles of type Y
$\delta_{XY}$ = sequence divergence between alleles X and Y
The average number of substitutions per nucleotide site between populations i
and j is:
$v_{ij}' = \sum_X \sum_Y p_{iX}\,p_{jY}\,\delta_{XY}$
where:
$p_{iX}$ = proportion of allele type X in population i
$p_{jY}$ = proportion of allele type Y in population j
$\delta_{XY}$ = sequence divergence between alleles X and Y
The diversity between populations, corrected for diversity within populations,
is:
$v_{ij} = v_{ij}' - \dfrac{v_i + v_j}{2}$
THE HARDY WEINBERG THEOREM
To calculate allele frequencies from raw data for one locus use:
f(A) = f(AA) + 1/2 f(Aa)
f(a) = f(aa) + 1/2 f(Aa)
Under random mating, the GENOTYPE frequencies in the next generation will be:
$f(AA) = p^2$
$f(Aa) = 2pq$
$f(aa) = q^2$
Under inbreeding, the GENOTYPE frequencies in the next generation will be:
$f(AA) = p^2 + pqF$
$f(Aa) = 2pq(1-F)$
$f(aa) = q^2 + pqF$
We can estimate F, the inbreeding coefficient, as:
$F = 1 - \dfrac{f(Aa)}{2pq}$
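A small Python sketch of these one-locus calculations, starting from raw genotype counts. The function names and the choice of counts as input are assumptions for illustration only.

```python
def allele_freqs(n_AA, n_Aa, n_aa):
    """Return (p, q) using f(A) = f(AA) + 1/2 f(Aa)."""
    total = n_AA + n_Aa + n_aa
    p = (n_AA + 0.5 * n_Aa) / total
    return p, 1.0 - p

def inbreeding_coefficient(n_AA, n_Aa, n_aa):
    """Estimate F = 1 - f(Aa) / (2pq) from genotype counts."""
    total = n_AA + n_Aa + n_aa
    p, q = allele_freqs(n_AA, n_Aa, n_aa)
    observed_het = n_Aa / total
    return 1.0 - observed_het / (2.0 * p * q)
```

For example, counts of 25 AA, 50 Aa and 25 aa match the random-mating expectation and give F = 0, whereas 40, 20 and 40 have the same allele frequencies but a deficit of heterozygotes, giving a positive F.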
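To calculate GAMETE frequencies from raw data for two loci, with a recombination
frequency, r, between the loci, we use:
$g_{AB} = f(AABB) + \tfrac{1}{2} f(AABb) + \tfrac{1}{2} f(AaBB) + \tfrac{1}{2} f(AB/ab)(1-r) + \tfrac{1}{2} f(Ab/aB)\,r$
$g_{Ab} = f(AAbb) + \tfrac{1}{2} f(AABb) + \tfrac{1}{2} f(Aabb) + \tfrac{1}{2} f(AB/ab)\,r + \tfrac{1}{2} f(Ab/aB)(1-r)$
$g_{aB} = f(aaBB) + \tfrac{1}{2} f(AaBB) + \tfrac{1}{2} f(aaBb) + \tfrac{1}{2} f(AB/ab)\,r + \tfrac{1}{2} f(Ab/aB)(1-r)$
$g_{ab} = f(aabb) + \tfrac{1}{2} f(Aabb) + \tfrac{1}{2} f(aaBb) + \tfrac{1}{2} f(AB/ab)(1-r) + \tfrac{1}{2} f(Ab/aB)\,r$
where f(AB/ab) and f(Ab/aB) are the frequencies of the coupling and repulsion
double heterozygotes.
Using this information we can calculate D, the coefficient of Linkage
Disequilibrium:
$D = (g_{AB} \times g_{ab}) - (g_{Ab} \times g_{aB})$
We call AB and ab COUPLING gametes.
We call Ab and aB REPULSION gametes.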
The frequency of the nine genotypes after one generation of random mating will
be:
          AA                  Aa                                     aa
BB        $g_{AB}^2$          $2 g_{AB} g_{aB}$                      $g_{aB}^2$
Bb        $2 g_{AB} g_{Ab}$   $2 g_{AB} g_{ab} + 2 g_{Ab} g_{aB}$    $2 g_{aB} g_{ab}$
bb        $g_{Ab}^2$          $2 g_{Ab} g_{ab}$                      $g_{ab}^2$
To calculate the new gamete frequencies we use:
$g_{AB}' = g_{AB}^2 + (0.5 \times 2 g_{AB} g_{Ab}) + (0.5 \times 2 g_{AB} g_{aB}) + (0.5 \times 2 g_{AB} g_{ab})(1-r) + (0.5 \times 2 g_{Ab} g_{aB})\,r$
$= g_{AB}^2 + g_{AB} g_{Ab} + g_{AB} g_{aB} + g_{AB} g_{ab} - r\,g_{AB} g_{ab} + r\,g_{Ab} g_{aB}$
$= g_{AB}\,(g_{AB} + g_{Ab} + g_{aB} + g_{ab}) - r\,[(g_{AB} g_{ab}) - (g_{Ab} g_{aB})]$
$= g_{AB} - rD$
Thus, the frequency of the four gamete types in the next generation is:
$g_{AB}' = g_{AB} - rD$
$g_{Ab}' = g_{Ab} + rD$
$g_{aB}' = g_{aB} + rD$
$g_{ab}' = g_{ab} - rD$
$D_{max}$ = $f(A) \times f(b)$ or $f(a) \times f(B)$, whichever is smaller
$D_{min}$ = $-f(A) \times f(B)$ or $-f(a) \times f(b)$, whichever is larger
We can estimate the value of D after t generations as:
$D_t = (1-r)^t D_0$
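The decay of linkage disequilibrium can be checked numerically. The sketch below, with hypothetical function names, applies the gamete recursion one generation at a time and compares it with the closed-form expression for D after t generations; it assumes random mating and no selection, as in the derivation above.

```python
def linkage_disequilibrium(g_AB, g_Ab, g_aB, g_ab):
    """D = (g_AB * g_ab) - (g_Ab * g_aB)."""
    return g_AB * g_ab - g_Ab * g_aB

def next_gametes(g_AB, g_Ab, g_aB, g_ab, r):
    """Gamete frequencies after one generation of random mating."""
    d = linkage_disequilibrium(g_AB, g_Ab, g_aB, g_ab)
    return (g_AB - r * d, g_Ab + r * d, g_aB + r * d, g_ab - r * d)

def d_after(d0, r, t):
    """Closed form: D_t = (1 - r)^t * D_0."""
    return (1.0 - r) ** t * d0
```

Starting from only coupling gametes at equal frequency (g_AB = g_ab = 0.5) with r = 0.5, D begins at 0.25 and halves every generation under both calculations.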
GENETIC DRIFT AND GENE FLOW
Genetic drift causes gene frequencies to change by chance, due to sampling error
between generations when populations are not infinite. Populations tend to
become fixed for a single allele over time leading to a decrease in variation
within populations. Different populations may randomly become fixed for
different alleles leading to an increase in variation between populations.
We can calculate the probability of getting i alleles of type A in a sample of
N individuals (2N alleles) when f(A) = p and f(a) = q as follows:
$\text{Probability} = \dfrac{(2N)!}{i!\,(2N-i)!}\; p^i\, q^{2N-i}$
where ! is factorial, e.g. 5! = 5 x 4 x 3 x 2 x 1
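The sampling probability can be evaluated directly. This is a minimal sketch using Python's standard `math.comb`; the function name and argument order are assumptions for the example.

```python
from math import comb

def drift_probability(i, n_individuals, p):
    """Probability of drawing i copies of allele A among 2N alleles.

    Binomial sampling with f(A) = p and f(a) = 1 - p.
    """
    two_n = 2 * n_individuals
    return comb(two_n, i) * p ** i * (1.0 - p) ** (two_n - i)
```

Summing the result over i = 0 to 2N gives 1, which is a quick sanity check on any table of drift probabilities.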
Effective population size
Many factors can cause effective population size (Ne) to differ from actual
population size. When this occurs, populations can be much more susceptible to
drift than we may think based on the actual size.
1. Unequal sex ratio.
$N_e = \dfrac{4 N_m N_f}{N_m + N_f}$
where: $N_m$ = number of breeding males and $N_f$ = number of breeding females
2. Fluctuating population size - Ne is the HARMONIC mean of population size over
time
$\dfrac{1}{N_e} = \dfrac{1}{t} \sum_{i=1}^{t} \dfrac{1}{N_i}$
where: $N_i$ = population size in generation i and t = number of generations
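Both effective-size formulas are easy to compute. The sketch below uses hypothetical function names and takes the harmonic mean over a list of per-generation census sizes.

```python
def ne_sex_ratio(n_males, n_females):
    """Ne = 4 * Nm * Nf / (Nm + Nf) for an unequal sex ratio."""
    return 4.0 * n_males * n_females / (n_males + n_females)

def ne_fluctuating(sizes):
    """Ne as the harmonic mean of population sizes over generations."""
    t = len(sizes)
    return t / sum(1.0 / n for n in sizes)
```

For example, 10 breeding males and 90 breeding females give Ne = 36 even though 100 individuals breed, and a population that alternates between 10 and 1000 has an Ne of about 20, much closer to the bottleneck size than to the arithmetic mean.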
Gene flow refers to the movement of genes between demes - it can happen at the
level of individuals or at the level of gametes (eg - pollen).
We can estimate the gene frequency within a deme as a result of gene flow from
other demes. We must assume that the sample of genes entering the population
has the same allele frequency as the average allele frequency for all of the
demes.
After one generation of gene flow
$p_t = p_{t-1}(1-m) + \bar{p}\,m$
where:
m = proportion of population that is migrants
$\bar{p}$ = mean allele frequency across all demes (the source of migrants)
From any starting point $p_0$, we can estimate the value of p within a deme after
t generations of gene flow (m) as:
$p_t = \bar{p} + (p_0 - \bar{p})(1-m)^t$
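To see that the one-generation recursion and the t-generation formula agree, here is a short Python sketch. The function names are hypothetical, and the mean allele frequency of the migrant pool is assumed to stay constant over time, as the text requires.

```python
def migrate_one_generation(p_prev, p_mean, m):
    """p_t = p_(t-1) * (1 - m) + p_mean * m."""
    return p_prev * (1.0 - m) + p_mean * m

def allele_freq_after(p0, p_mean, m, t):
    """Closed form: p_t = p_mean + (p0 - p_mean) * (1 - m)^t."""
    return p_mean + (p0 - p_mean) * (1.0 - m) ** t
```

Iterating the first function t times from p0 should reproduce the second; in both cases the deme converges on the mean frequency of the migrant pool.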
Because gene flow spreads alleles among populations, it tends to increase
variation within populations and decrease variation among populations.
At EQUILIBRIUM, it can be shown that FST is a function of Ne and m as follows:
$\hat{F}_{ST} = \dfrac{1}{4 N_e m + 1}$
We can use OBSERVED values of FST to calculate the parameter Nem from the above
equation. This estimate can be thought of as the combination of gene flow and
drift that would result in the observed value of FST at equilibrium.
When Nem = 1, subpopulations are exchanging one migrant per generation, on
average. Values below 1 are considered to be an indication of restricted gene
flow. Values above 1 indicate substantial gene flow.
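Rearranging the equilibrium relation gives Nem = (1/FST - 1)/4. A minimal Python sketch of that rearrangement follows; the function name and the input check are assumptions for illustration, and the result is only meaningful if the populations are at equilibrium.

```python
def nem_from_fst(fst):
    """Solve F_ST = 1 / (4 * Ne * m + 1) for the product Ne * m."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must be in the interval (0, 1]")
    return (1.0 / fst - 1.0) / 4.0
```

An observed FST of 0.2 corresponds to Nem = 1, the threshold of one migrant per generation described above.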
FIXATION INDICES
Species that are widely distributed are often subdivided into DEMES.
Individuals may be more likely to mate within their deme rather than with
individuals from other demes because of distance or physical barriers. Thus,
populations will tend to drift apart. We can measure the divergence of
populations as a result of the increase in homozygosity due to drift using the
fixation index, F. It is similar to the inbreeding coefficient which measures
the increase in homozygosity due to non-random mating.
FIS = 1 - (HI/HS)
homozygosity of individuals relative to the expectation for the subpopulation
FST = 1 - (HS/HT)
homozygosity of demes relative to the expectation of the total population
FIT = 1 - (HI/HT)
homozygosity of individuals relative to the total population
The relationship among the 3 values is:
(1 - FIT) = (1 - FIS) x (1 - FST)
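$H_I = \dfrac{\sum_i f(Aa)_i}{N}$
$H_S = \dfrac{\sum_i 2 p_i q_i}{N}$
$H_T = 2\,\bar{p}\,\bar{q}$
$\bar{p} = \dfrac{\sum_i p_i}{N} \qquad \bar{q} = \dfrac{\sum_i q_i}{N}$
where:
N = number of populations
$f(Aa)_i$ = observed frequency of heterozygotes in population i
$p_i$ = frequency of allele A in population i
$q_i$ = frequency of allele a in population i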
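The three fixation indices can be computed from per-deme data as follows. This is a sketch under assumptions: the function name is hypothetical, each deme is given as a pair (frequency of A, observed heterozygote frequency), and demes are weighted equally, as in the unweighted averages above.

```python
def f_statistics(demes):
    """Return (F_IS, F_ST, F_IT) for a two-allele locus.

    demes is a list of (p_i, observed_het_i) pairs, one per deme.
    """
    n = len(demes)
    h_i = sum(het for _, het in demes) / n              # observed heterozygosity
    h_s = sum(2.0 * p * (1.0 - p) for p, _ in demes) / n  # expected within demes
    p_bar = sum(p for p, _ in demes) / n
    h_t = 2.0 * p_bar * (1.0 - p_bar)                   # expected in the total
    return 1.0 - h_i / h_s, 1.0 - h_s / h_t, 1.0 - h_i / h_t
```

With two demes at p = 0.2 and p = 0.8 that are each in Hardy-Weinberg proportions (heterozygote frequency 0.32), F_IS is 0 and F_ST equals F_IT at 0.36, consistent with (1 - FIT) = (1 - FIS) x (1 - FST).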
We can also estimate fixation indices from restriction site and sequence data
using $v_i$ and $v_{ij}$. This value tells us what proportion of the total nucleotide
diversity in a group of demes is due to differences among demes.
$v_w = \bar{v}_i$, the unweighted mean of all $v_i$ values
$v_b = \bar{v}_{ij}$, the unweighted mean of all $v_{ij}$ values
$F_{ST} = \dfrac{v_b}{v_w + v_b}$
MUTATION
When there are two alleles in a population and the mutation rate of A to a is
µ and the mutation rate of a to A is v, we can calculate the frequency of the
A allele at any time in the future (t generations) from some starting point,
$p_0$, as:
$p_t = \dfrac{v}{\mu + v} + \left(p_0 - \dfrac{v}{\mu + v}\right)(1 - \mu - v)^t$
The equilibrium frequency of the A allele will be:
$\hat{p} = \dfrac{v}{v + \mu}$
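The approach to mutational equilibrium can be sketched in Python as below. The function names are hypothetical; `mu` is the A-to-a rate and `nu` stands in for the a-to-A rate written v in the text.

```python
def mutation_equilibrium(mu, nu):
    """Equilibrium frequency of A: nu / (nu + mu)."""
    return nu / (nu + mu)

def freq_after_mutation(p0, mu, nu, t):
    """Frequency of A after t generations of two-way mutation."""
    p_hat = mutation_equilibrium(mu, nu)
    return p_hat + (p0 - p_hat) * (1.0 - mu - nu) ** t
```

Because realistic mutation rates are tiny, the term (1 - µ - v)^t shrinks very slowly, so mutation alone moves allele frequencies toward equilibrium only over very long time spans.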
Infinite alleles model
At the molecular level, it is not unreasonable to assume that each new
mutation results in a new allele. Thus, at equilibrium between mutation and
drift it can be shown that:
$\hat{F} = \dfrac{1}{4 N_e \mu + 1}$
where:
$N_e$ = effective population size
µ = the mutation rate to new alleles
$\hat{F}$ is a measure of homozygosity within a population as a result of the loss of
alleles due to drift. Homozygosity can also be measured from allele frequency
data as:
$F = \sum_i p_i^2$
where: $p_i$ = frequency of the ith allele
We can define $n_e$ as the effective number of alleles, or the number of EQUALLY
FREQUENT alleles it would take to provide a particular value of homozygosity.
We can use the 2 equations above to estimate $n_e$, or to estimate the parameter
$N_e\mu$ as follows:
$n_e = \dfrac{1}{\sum_i p_i^2} = 4 N_e \mu + 1$
NATURAL SELECTION
One-locus Models
Genotype    Relative Fitness (Wi)    Contribution to the next generation
AA          $W_{AA}$                 $f(AA) \times W_{AA}$
Aa          $W_{Aa}$                 $f(Aa) \times W_{Aa}$
aa          $W_{aa}$                 $f(aa) \times W_{aa}$
$\bar{W} = [f(AA) \times W_{AA}] + [f(Aa) \times W_{Aa}] + [f(aa) \times W_{aa}]$
New f(A): $p' = \dfrac{f(AA)\,W_{AA} + \tfrac{1}{2} f(Aa)\,W_{Aa}}{\bar{W}}$
New f(a): $q' = \dfrac{f(aa)\,W_{aa} + \tfrac{1}{2} f(Aa)\,W_{Aa}}{\bar{W}}$
Fitness of each allele:
$W_A = \dfrac{f(AA)\,W_{AA} + \tfrac{1}{2} f(Aa)\,W_{Aa}}{p}$
$W_a = \dfrac{f(aa)\,W_{aa} + \tfrac{1}{2} f(Aa)\,W_{Aa}}{q}$
The CHANGE in allele frequency in the next generation can be calculated as:
$\Delta p = \dfrac{pq\,(W_A - W_a)}{\bar{W}}$
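One generation of selection at a single locus can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes the genotypes are in Hardy-Weinberg proportions before selection acts, which the general formulas above do not require.

```python
def select_one_generation(p, w_AA, w_Aa, w_aa):
    """Return (p_next, mean_fitness) after one round of selection.

    Assumes genotype frequencies p^2, 2pq, q^2 before selection.
    """
    q = 1.0 - p
    f_AA, f_Aa, f_aa = p * p, 2.0 * p * q, q * q
    w_bar = f_AA * w_AA + f_Aa * w_Aa + f_aa * w_aa
    p_next = (f_AA * w_AA + 0.5 * f_Aa * w_Aa) / w_bar
    return p_next, w_bar
```

Calling the function repeatedly, feeding each `p_next` back in as `p`, traces the trajectory of the A allele under a given set of relative fitnesses.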
Selection on alleles with varying effects on fitness.
NOTE: $W_i = 1 - s$, where s = selection coefficient
Genotype    AA    Aa        aa
Wi          1     1 - hs    1 - s
when h = 0, then A is dominant to a wrt fitness
when 0 < h < 1, then A and a are codominant
when h = 1, then a is dominant to A wrt fitness
At equilibrium between mutation and selection against an allele, a:
$\hat{q} = \sqrt{\mu/s}$   (a is completely recessive, h = 0)
$\hat{q} = \mu/(hs)$   (a is partially dominant, h > 0)
$\hat{q} = \mu/(Fs)$   (a is recessive, in a population with inbreeding coefficient F)
When the heterozygote has the highest fitness, we say that it is overdominant:
Genotype    AA       Aa    aa
Wi          1 - t    1     1 - s
At equilibrium
$\hat{q} = t/(s + t)$
This is a STABLE equilibrium
When the heterozygote has the lowest fitness, we say that it is underdominant:
Genotype    AA       Aa    aa
Wi          1 + t    1     1 + s
At equilibrium
$\hat{q} = t/(s + t)$
This is an UNSTABLE equilibrium
Two-locus Models
Fitness interactions between loci can be ADDITIVE, MULTIPLICATIVE or EPISTATIC.
ADDITIVE: When an individual obtains an increment of fitness for each locus
affecting a particular trait, overall fitness for the trait is calculated as
THE SUM of the fitness obtained from each locus.
                  AA ($W_{AA}$)          Aa ($W_{Aa}$)          aa ($W_{aa}$)
BB ($W_{BB}$)     $W_{AA} + W_{BB}$      $W_{Aa} + W_{BB}$      $W_{aa} + W_{BB}$
Bb ($W_{Bb}$)     $W_{AA} + W_{Bb}$      $W_{Aa} + W_{Bb}$      $W_{aa} + W_{Bb}$
bb ($W_{bb}$)     $W_{AA} + W_{bb}$      $W_{Aa} + W_{bb}$      $W_{aa} + W_{bb}$
MULTIPLICATIVE: When an individual obtains fitness for each locus affecting
a particular trait independently of the other loci, fitness for the overall trait
is calculated as THE PRODUCT of the fitness obtained from each locus.
                  AA ($W_{AA}$)              Aa ($W_{Aa}$)              aa ($W_{aa}$)
BB ($W_{BB}$)     $W_{AA} \times W_{BB}$     $W_{Aa} \times W_{BB}$     $W_{aa} \times W_{BB}$
Bb ($W_{Bb}$)     $W_{AA} \times W_{Bb}$     $W_{Aa} \times W_{Bb}$     $W_{aa} \times W_{Bb}$
bb ($W_{bb}$)     $W_{AA} \times W_{bb}$     $W_{Aa} \times W_{bb}$     $W_{aa} \times W_{bb}$
EPISTATIC: When the fitness of a genotype at one locus depends on the genotype
at a second locus, the fitness interaction between the loci is said to be
EPISTATIC.
We can calculate the change in GAMETE frequencies and ALLELE frequencies as a
result of selection acting on the 2 loci.
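First, define $\bar{W}_{XY}$, the average fitness of gamete XY:
$\bar{W}_{AB} = g_{AB} W_{AB/AB} + g_{Ab} W_{AB/Ab} + g_{aB} W_{AB/aB} + g_{ab} W_{AB/ab}$
$\bar{W}_{Ab} = g_{AB} W_{Ab/AB} + g_{Ab} W_{Ab/Ab} + g_{aB} W_{Ab/aB} + g_{ab} W_{Ab/ab}$
$\bar{W}_{aB} = g_{AB} W_{aB/AB} + g_{Ab} W_{aB/Ab} + g_{aB} W_{aB/aB} + g_{ab} W_{aB/ab}$
$\bar{W}_{ab} = g_{AB} W_{ab/AB} + g_{Ab} W_{ab/Ab} + g_{aB} W_{ab/aB} + g_{ab} W_{ab/ab}$
where, for example, $W_{AB/Ab}$ is the fitness of the genotype formed by gametes AB and Ab.
The average fitness of the entire population is:
$\bar{W} = \sum f(A_iA_jB_iB_j) \times W_{A_iA_jB_iB_j}$
where i and j are the alleles at each locus.
With 2 alleles at each locus, there will be NINE terms in the equation, one for
each of the 2-locus genotypes.
To calculate GAMETE frequencies after one generation of selection we use:
$g_{AB}' = \dfrac{g_{AB}\,\bar{W}_{AB} - rD\,W_{AaBb}}{\bar{W}}$
$g_{Ab}' = \dfrac{g_{Ab}\,\bar{W}_{Ab} + rD\,W_{AaBb}}{\bar{W}}$
$g_{aB}' = \dfrac{g_{aB}\,\bar{W}_{aB} + rD\,W_{AaBb}}{\bar{W}}$
$g_{ab}' = \dfrac{g_{ab}\,\bar{W}_{ab} - rD\,W_{AaBb}}{\bar{W}}$
where $W_{AaBb}$ is the fitness of the double heterozygote.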
MOLECULAR EVOLUTION
Rates of amino acid substitution per unit time
$\lambda$ = rate of amino acid substitution per unit time
D = observed proportion of amino acid differences
$D_t = 1 - \exp(-2\lambda t)$
k = expected number of amino acid substitutions per site
$k = 2\lambda t = -\ln(1 - D)$
We can calculate k from D using $k = -\ln(1 - D)$.
Then, we can calculate t if we know $\lambda$ OR we can calculate $\lambda$ if we know t.
Rates of nucleotide substitution per unit time
$\lambda$ = rate of nucleotide substitution per unit time
D = observed proportion of nucleotide differences
$D_t = \tfrac{3}{4}\left[1 - \exp\left(-\tfrac{8}{3}\lambda t\right)\right]$
k = expected number of nucleotide substitutions per site
$k = 2\lambda t = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}D\right)$
We can calculate k from D using $k = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}D\right)$.
Then, we can calculate t if we know $\lambda$ OR we can calculate $\lambda$ if we know t.
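The two corrections for multiple substitutions, and the conversion from k to a divergence time, can be sketched in Python. The function names are hypothetical, and `rate` stands for the per-lineage substitution rate per unit time, which has to be supplied from an independent calibration.

```python
import math

def k_amino_acid(d):
    """Expected substitutions per site from the observed proportion D."""
    return -math.log(1.0 - d)

def k_nucleotide(d):
    """Expected substitutions per site, k = -(3/4) ln(1 - 4D/3)."""
    return -0.75 * math.log(1.0 - 4.0 * d / 3.0)

def divergence_time(k, rate):
    """Time since divergence, t = k / (2 * rate)."""
    return k / (2.0 * rate)
```

For small D both corrections return a k close to D itself; the gap widens as D grows, because more sites have been hit by more than one substitution.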
OBSERVATIONS THAT LED TO THE NEUTRAL THEORY
1. Proteins evolve faster and are more polymorphic than would be expected if
substitutions are the result of fixation of beneficial mutations by natural
selection.
2. Proteins evolve at a constant rate through time. We would not expect
beneficial mutations to occur at regular intervals.
3. Different proteins and different parts of proteins evolve at different
rates.
THE NEUTRAL THEORY OF MOLECULAR EVOLUTION
Kimura (1968) and King and Jukes (1969) proposed the NEUTRAL THEORY to explain
these observations. They suggested that most (but NOT all) evolutionary
changes in macromolecules were due to the random fixation of selectively
equivalent (neutral) variants by genetic drift. Prior to this, it was believed
that all variation must be under the influence of natural selection.
1. The neutral theory explains the high levels of variation. Drift decreases
heterozygosity at the rate of 1/(2Ne) per generation. But mutation adds new
variation.
The average time to FIXATION of a neutral allele is 4Ne generations.
The average time to LOSS of a neutral allele is 2(Ne/N) ln(2N) generations.
Table 1 shows that a neutral mutation on its way to fixation will create a
polymorphism for a long period of time.
2. The neutral theory explains why substitution rates are constant.
The steady state rate at which neutral mutations are fixed is
$\dfrac{1}{2N} \times 2Nv = v$
where v is the neutral mutation rate, 2Nv is the number of new neutral mutations
arising each generation, and 1/(2N) is the probability that each one is fixed.
3. The neutral theory explains the variable rates of substitution as variation
in the probability that a new mutation is NEUTRAL. If the probability is high,
then the substitution rate will be high. If the probability is low, then the
substitution rate will be low. So, absolute mutation rates may be similar in
different molecules but neutral mutation rates may vary.
HOW WELL DOES THE NEUTRAL THEORY EXPLAIN OBSERVATIONS?
Levels of heterozygosity are lower than expected under the neutral theory. This
has led to a modification of the theory to incorporate nearly neutral mutations.
The interaction between drift and natural selection will determine the fate of
a new mutation. The relevant quantity is $4N_e s$:
If $4N_e s > 10$, then selection is the primary force acting on the allele
If $4N_e s < 0.1$, then drift is the primary force acting on the allele
If $0.1 < 4N_e s < 10$, then both forces will act on the allele
MOLECULAR CLOCKS
Rates of substitution seem to be fairly constant for many genes and proteins.
We can use this relationship to estimate times of divergence for species with
poor fossil data. It is important that the “clock” be “calibrated” for the
organisms and the genes under consideration.
Estimates of rates of DIVERGENCE PER UNIT TIME are estimates of $2\lambda$, where
$\lambda$ is the rate of substitution per unit time.
Use observed estimates of D to calculate k. Then $t = k/(2\lambda)$.
QUANTITATIVE GENETICS
Many phenotypic traits are nearly continuous in their expression.
If the trait can take on any value it is said to be CONTINUOUS
(eg. height, weight)
If the trait can only take on whole integer values it is said to be MERISTIC
(eg. litter size, bristle or spot number, appendage number)
When considering a population, we can measure the value of the trait and describe
the distribution of the phenotypes by its MEAN and VARIANCE.
mean: $\bar{x} = \dfrac{\sum n_i x_i}{n}$
The actual value of the mean = $\mu$
variance: $s^2 = \dfrac{\sum n_i (x_i - \bar{x})^2}{n-1} = \dfrac{\sum n_i x_i^2 - n\bar{x}^2}{n-1}$
The actual value of the variance is $\sigma^2$
The standard deviation = $\sqrt{s^2} = s$
We can express the relationship between phenotype and genotype as follows:
$P = \mu + G + E$
where:
G = the deviation from the mean due to genotype
E = the deviation from the mean due to the environment
If we assume no genotype x environment interaction then:
$\bar{G} = 0$ and $\bar{E} = 0$
We want to be able to determine the contribution of a particular allele to the
phenotype. Phenotypes cannot be passed on, but we can determine how an allele
contributes to the phenotype “on average”.
AVERAGE EXCESS OF AN ALLELE, a
This can be defined as the average contribution of an allele to the phenotype
beyond the mean phenotype.
$a_A = \dfrac{[f(AA) \times G_{AA}] + [\tfrac{1}{2} f(Aa) \times G_{Aa}]}{p}$
$a_a = \dfrac{[f(aa) \times G_{aa}] + [\tfrac{1}{2} f(Aa) \times G_{Aa}]}{q}$
If there are more than 2 alleles, all the heterozygotes must be considered.
$a_A = \dfrac{[f(AA) \times G_{AA}] + [\tfrac{1}{2} f(AB) \times G_{AB}] + [\tfrac{1}{2} f(AC) \times G_{AC}]}{p}$
To estimate $G_i$ (the genotypic deviation of the ith genotype) use:
$G_i = P_i - \mu$
where:
$\mu$ is the population mean phenotype
$P_i$ is the mean phenotype of genotype i.
We can also calculate the BREEDING VALUE (BV) of each genotype. We need to
calculate $\alpha$, the average effect of an allele, which equals a in a random mating
population.
In an inbred population, $\alpha = a/(1 + F)$
$BV_{AA} = \alpha_A + \alpha_A$
$BV_{Aa} = \alpha_A + \alpha_a$
$BV_{aa} = \alpha_a + \alpha_a$
$BV_i$ and $G_i$ do not have the same value because $G_i$ includes a dominance contribution
to the phenotype whereas $BV_i$ includes only the additive contribution of an allele
to the phenotype.
VARIANCE COMPONENTS
Unless all individuals have exactly the same phenotype, a population will have
phenotypic variance. This variance can be divided into contributions from the
genotype and the environment (as was the mean).
$\sigma_P^2 = \sigma_G^2 + \sigma_E^2$
where:
$\sigma_P^2$ is the total phenotypic variance
$\sigma_G^2$ is the variance due to genetic variation
$\sigma_E^2$ is the variance due to environmental variation
$\sigma_G^2$ can be further subdivided into an additive genetic component ($\sigma_A^2$) and a
dominance genetic component ($\sigma_D^2$) as follows:
$\sigma_G^2 = \sigma_A^2 + \sigma_D^2$
Total phenotypic variance is calculated in the usual way:
$\sigma_P^2 = \dfrac{\sum n_i (P_i - \mu)^2}{n-1}$
Genetic and additive variance components can be estimated as follows:
$\sigma_G^2 = \sum f_i G_i^2$
$\sigma_A^2 = \sum f_i BV_i^2$
The variance components tell us how much of the total phenotypic variance is
due to genetic factors and how much is due to environmental factors. The ONLY
variation that is important for evolution is ADDITIVE GENETIC VARIANCE.
$h^2$ is defined as the proportion of total phenotypic variance that is due to
genetic variance.
Heritability in the broad sense is $h^2_B = \sigma_G^2 / \sigma_P^2$
Heritability in the narrow sense is $h^2_N = \sigma_A^2 / \sigma_P^2$
If $h^2_N = 1$, then ALL phenotypic variation for a particular trait is due to additive
genetic variation and a population will respond quickly to selection on the
trait.
If $h^2_N = 0$, then NONE of the phenotypic variation is due to additive genetic
variation and a population will NOT respond to selection on the trait.
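To make the chain from genotypic deviations to variance components concrete, here is a Python sketch for one locus with two alleles. It is an illustration under assumptions: the function name is hypothetical, the population is taken to be random mating (so the average effect equals the average excess), and the inputs are the mean phenotypes of the three genotypes.

```python
def variance_components(p, pheno_AA, pheno_Aa, pheno_aa):
    """Return (var_G, var_A, var_D) for one locus under random mating.

    p is the frequency of allele A and must lie strictly between 0 and 1.
    """
    q = 1.0 - p
    freqs = (p * p, 2.0 * p * q, q * q)  # Hardy-Weinberg genotype frequencies
    phenos = (pheno_AA, pheno_Aa, pheno_aa)
    mean = sum(f * x for f, x in zip(freqs, phenos))
    g_AA, g_Aa, g_aa = (x - mean for x in phenos)  # genotypic deviations G_i
    # Average excess of each allele (equal to the average effect when F = 0).
    a_A = (freqs[0] * g_AA + 0.5 * freqs[1] * g_Aa) / p
    a_a = (freqs[2] * g_aa + 0.5 * freqs[1] * g_Aa) / q
    breeding = (2.0 * a_A, a_A + a_a, 2.0 * a_a)  # breeding values BV_i
    var_g = sum(f * g * g for f, g in zip(freqs, (g_AA, g_Aa, g_aa)))
    var_a = sum(f * bv * bv for f, bv in zip(freqs, breeding))
    return var_g, var_a, var_g - var_a
```

Dividing `var_A` or `var_G` by a total phenotypic variance that also includes the environmental component would give the narrow-sense or broad-sense heritability. With complete dominance (phenotypes 10, 10, 6 at p = 0.5) part of the genetic variance is non-additive, whereas with intermediate heterozygotes (10, 8, 6) all of it is additive.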
When considering complex phenotypic traits that are controlled by many loci,
POLYGENIC TRAITS, it is very difficult to calculate the individual variance
components for each locus. Thus, we use parent-offspring regression to
estimate heritability. The higher the heritability, the higher the correlation
between parents and their offspring.
We can calculate the COVARIANCE between two variables x and y as:
$\sigma_{XY} = \dfrac{\sum n_i (x_i - \bar{x})(y_i - \bar{y})}{n-1} = \dfrac{\sum n_i x_i y_i - n\bar{x}\bar{y}}{n-1}$
The equation for a line is y = c + bx
It can be shown that the SLOPE of a line, $b = \sigma_{XY}/\sigma_x^2$
and that $h^2 = b$ when we regress the phenotype of the offspring on the phenotype
of the MIDPARENT.
If we regress the phenotype of the offspring on the phenotype of ONE parent,
then $h^2 = 2b$ because each parent only contributes 1/2 of the offspring’s genes.
In an artificial selection experiment, $R = h^2 S$
Response to selection = $R = (\mu' - \mu)$ and
Selection differential = $S = (\mu_S - \mu)$
where:
$\mu'$ = the mean after selection
$\mu_S$ = the mean of the selected parents.
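A brief Python sketch of the regression estimate of heritability and of the predicted response to selection. The function names are hypothetical, and each data point is assumed to be one family, with the midparent value as x and the offspring value as y.

```python
def regression_slope(xs, ys):
    """Slope b = cov(x, y) / var(x) of the least-squares line y = c + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    return cov_xy / var_x

def response_to_selection(h2, mean_selected, mean_population):
    """R = h^2 * S, with S = mean of selected parents - population mean."""
    return h2 * (mean_selected - mean_population)
```

With midparent values on the x axis the slope is itself the estimate of heritability; with a single parent on the x axis the slope would be doubled.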
ADAPTATION
Definitions:
Adaptedness is a measure of a genotype’s (phenotype’s) capacity for survival and reproduction
relative to that of other genotypes.
An adaptation is a phenotypic variant that results in the highest fitness among a specified set of
variants in a given environment.
Adaptation/adaptedness is RELATIVE, not optimal
Life can be divided into different levels of organization. We can ask which level of organization
benefits from adaptations. The question may seem trivial. However, there can be conflict among
the levels. What benefits a gene may not benefit the individual. What benefits an individual may
not benefit the population, or vice versa.
Genic Selection
An allele that is favored by selection at the level of the gene, may not be favored at the level of the
individual.
Examples:
Segregation distorter in Drosophila melanogaster.
Biased gene conversion
Transposable elements.
Individual Selection
An allele is favored because it increases the survival and/or reproduction of an individual. This is
what we normally think of when we talk about natural selection.
Group Selection
An allele is favored because it increases the “survival” of the population. Many traits favored in
individuals do this, in which case there is no need to consider group selection. However, can a trait
which benefits the group at the expense of the individuals who carry it, ever increase in frequency via
natural selection?
Example:
Altruistic behavior.
A behavior is ALTRUISTIC if it increases survival/reproduction in the recipient and decreases
survival/reproduction of the altruist.
Many examples of apparently altruistic behavior have been observed in nature: for example, warning
calls. Theoretically, altruistic behavior could increase in frequency if it increases the probability that
a group (population) will avoid extinction, or if it increases the probability that a group will expand to
form other groups (see Figure 4). However, individuals reproduce at a much faster rate than do
groups, and it seems unlikely that group selection could ever override individual selection.
There is one case where group selection does seem to be important - when the “group” is composed
of relatives.
Kin Selection
If you increase the fitness of your relatives, they can pass on the same genes that you do.
Inclusive fitness: Your total fitness is a function of your own fitness, plus the fitness you get when
you increase the survival/reproduction of relatives.
$w_i = a_i + \sum_j r_{ij}\,b_{ij}$
where:
$w_i$ = inclusive fitness of individual i
$a_i$ = direct effect of the altruistic trait on the individual fitness of i
$b_{ij}$ = the effect of the altruistic trait on the fitness of another individual, j
$r_{ij}$ = coefficient of relatedness between i and j (the fraction of j’s gametes that are identical by descent
to alleles carried by i).
$r_{XY} = \dfrac{2F_{XY}}{1 + F_X}$
where $F_{XY}$ = inbreeding coefficient of hypothetical offspring of X and Y
where $F_X$ = inbreeding coefficient of X
How to calculate F from pedigrees: $F = \sum (1/2)^i (1 + F_A)$
where i is the number of individuals in a path connecting the two parents through a common
ancestor A, $F_A$ is the inbreeding coefficient of A, and the sum is taken over all such paths.
Even if ai <0 (the altruistic behavior works against the individual directly), the altruistic trait can
increase if the individual obtains a large increment towards fitness from helping relatives. (see
example from Ridley concerning scrub jay helpers in Table 12.2)
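The inclusive fitness sum can be written out in a few lines of Python. The function name and the representation of relatives as (relatedness, benefit) pairs are assumptions for this sketch, and the numbers in the usage note are invented for illustration rather than taken from the scrub jay data.

```python
def inclusive_fitness(direct_effect, relatives):
    """w_i = a_i + sum over j of r_ij * b_ij.

    relatives is a list of (r_ij, b_ij) pairs, one per relative helped.
    """
    return direct_effect + sum(r * b for r, b in relatives)
```

For instance, a helper that pays a direct cost of 0.1 but adds 0.3 to the fitness of a full sibling (r = 0.5) and 0.2 to that of a half sibling (r = 0.25) has an inclusive fitness effect of -0.1 + 0.15 + 0.05 = 0.1, which is positive despite the negative direct effect.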
So, to come back to the question: What is the unit of selection?
One viewpoint is that the units that show adaptation are the units that show heritability - the
phenotypic traits and the individuals that possess them. Mutations that influence the phenotype of a
unit (cell, tissue, organ, limb) must be transmitted to offspring of that unit - that is how natural
selection increases the frequency of the trait.
Another viewpoint is that the unit of selection is the gene itself. It is the only entity which is
potentially “immortal”. Phenotypes are not passed on - they are genotype by environment
interactions. Even the very same genotype may not produce an identical phenotype at another time
in another environment. Genotypes are not passed on - genetic combinations are reshuffled each
generation due to meiosis and recombination. Even though ecological processes such as predation,
competition etc., act on the individuals and ultimately cause the change in allele frequencies, it is the
alleles that are passed on.
Thus, it can be argued that adaptations exist because they increase the reproduction of the genes
that encode them, relative to the genes that encode alternate forms of the trait.
Those entities which propagate genes efficiently will show adaptation.
What sorts of genetic changes can cause ADAPTATION?
1. Changes affecting the Biochemistry of an organism.
Enzymes can change to affect their affinity for different substrates, their temperature optima, their
kinetics etc.
There is no doubt that biochemical evolution is important. However, we seldom see major new
biochemical pathways evolving - the basic pathways are the same among all living things.
One example of a new pathway evolving is the divergence of C3 vs. C4 photosynthetic pathways in
plants.
2. Changes affecting the evolution of NEW CELL TYPES.
From a histological perspective, we can recognize a relatively small number of basic cell types,
regardless of the organisms from which they come (muscle, blood cell, nerve cell, epidermal cell,
etc.). There is not enough variation in cell types to account for the amount of morphological
evolution that has occurred.
3. Changes in DEVELOPMENTAL PATTERNING.
Most morphological variation that we observe is due to changes in the developmental patterning of
cellular mechanisms, not due to changes in the mechanisms themselves.
Development
An organism develops from a zygote due to the proliferation and differentiation of various cell lines at
particular times and at distinct rates. Morphology can change if there are changes in:
SPATIAL organization of cell types.
TEMPORAL patterns of differentiation.
Passage from: “The origin of animal body plans” (March-April 1997) American Scientist
85:126-137 by D. Erwin, J. Valentine and D. Jablonski
[Developmental regulation proceeds through the sequential activation of a series of regulatory
switches that in turn activate networks of other genes. In general, regulatory genes produce proteins
that bind to and influence the activity of other genes. The protein products of these genes then
activate still other genes and the cascade continues.
Regulatory genes that are active early in development help set up the body axes by
determining which end of the embryo becomes the head, and which end the tail, which part is the
back and which is the belly. These early expressing genes also set up the basic tissue types.
Genes that are active later in the cascade help block out distinctive morphological regions
within the body - say the head from the abdomen. Later still in the cascade, genes mediate the
growth of appendages like limbs, until the most refined morphological details have been achieved.
Many different classes of regulatory genes share a common DNA sequence which is known as the
homeobox which predates the origin of animals.]
Information about HOX genes from: Homeotic genes and the evolution of arthropods and
chordates. (10 August 1995) Nature 376:479-485 by S.B. Carroll.
HOX genes demarcate relative positions in animals rather than specify any particular structures.
They regulate the expression of large numbers of target genes. In Drosophila, KNOWN HOX
targets include genes encoding other transcriptional regulatory proteins, secreted signaling proteins,
and structural proteins. There are between 85 and 170 genes that are known to be regulated by the
product of one particular HOX gene, Ultrabithorax, alone. A mutation in one HOX gene can affect
regulation of many other genes and thus have profound effects on morphology.
Arthropods differ in the number, type, and organization of body appendages (antennae, claws, mouth
parts, legs) all of which evolved from ancestral arthropod limbs. Changes in HOX gene expression
can explain why some crustaceans have limbs on their abdominal segments, and others do not. It is
possible to evolve new regulatory interactions which determine WHEN and WHERE a limb will
develop.
In vertebrates, HOX genes influence vertebral morphology and patterns of limb and central nervous
system development.
“The creative potential of regulatory evolution lies in the hierarchical and combinatorial nature
of the regulatory networks that guide the organization of body plans and the morphogenesis
of body parts.”
Understanding how adaptation of complex morphological traits occurs is possible if we recognize two
basic properties of development:
1. DEVELOPMENT IS EPIGENETIC - it depends on prior developmental events and cannot be
understood entirely in terms of primary gene action. Mutations that act early in development have
larger effects than mutations that act late in development.
2. DEVELOPMENT IS INTEGRATED - development of complex structures such as limbs involves
changes in many cell types and all of these changes must be timed correctly. The integrated control
of these changes through regulatory genes such as HOX genes makes complex morphological
changes possible. It was previously thought that there had to be changes in the separate genes
controlling all the parts of a trait. This would make the evolution of complex morphological traits
highly unlikely.
ADAPTIVE EXPLANATION
Can natural selection explain all known adaptations?
Traits with simple genetic basis are no problem - colour in moths, hemoglobin in high altitude ducks.
But, what about complex traits like the eyes, wings, organ systems?
Darwin was convinced that such traits must have evolved “gradually” via many small changes.
Population genetics theory is concordant with this view: mutations of small effect are more likely to be
beneficial than mutations of large effect.
The critical requirement is to show that a complex trait COULD have evolved via small changes. It
doesn’t matter if we know exactly what all of those small changes were.
Two classes of adaptations may cause a problem for natural selection.
1.
Complex traits with many integrated parts that must all change simultaneously.
eg. giraffe’s long neck, the eye.
As knowledge of development increases, this problem is easy to overcome. We now know that
much morphological evolution occurs by changes in REGULATORY genes which alter EXPRESSION
of many loci simultaneously. The genes controlling the STRUCTURE of the parts do not have to
change.
2. Traits for which the rudimentary stages would seem to be disadvantageous or functionless.
eg. wings
Adaptations do not generally come from nothing, but from modification of a structure that already
exists. Thus, an important concept concerning the evolution of complex traits when the early stages
would seem not to be useful is PREADAPTATION.
Preadaptation refers to the evolution of a trait for one purpose but the later use of the trait for another
purpose. We often observe a large change in the function of the trait with little change in the
structure. After the trait is used for the new purpose, natural selection can act on variants that
influence the new purpose.
eg. lobe fins evolving into tetrapod limbs.
eg. the evolution of wings prior to the evolution of flight in birds
Some people refer to traits that have changed functions as EXAPTATIONS
THE STUDY OF ADAPTATION.
1. Identify types of genetic variants that a trait may have.
2. Develop hypotheses or models of the function of the trait.
3. Test predictions of the hypotheses.
A. Determine if the actual form of the trait matches the hypothesis. If not, the hypothesis is
incorrect.
B. Perform experiments to determine if the hypothesis is correct. This requires that variant
forms of the trait are available or can be manufactured.
Example: neck teeth and Chaoborus predation on Daphnia
C. The COMPARATIVE METHOD
The hypothesis about the adaptive value of a trait predicts that some species in a particular
environment should have a particular form of the trait that differs from that observed in species
in different environments.
Example: the production of neck teeth evolved independently in 2 groups of Daphnia that coexist
with Chaoborus
WHY ARE ADAPTATIONS IMPERFECT?
1. TIME LAGS - the environment changes so the population must respond. But, evolution via
natural selection takes time.
eg. tropical fruits with hard outer casings evolved for dispersal by now-extinct mammals.
2. GENETIC CONSTRAINTS - Heterozygous advantage is an example of a genetic constraint.
A sexual diploid population cannot be “true-breeding” for heterozygotes. So, the population must
tolerate the existence of less-fit homozygotes.
A population could “get around” this constraint via gene duplication.
3. DEVELOPMENTAL CONSTRAINTS
definition: A developmental constraint is a bias on the production of variant phenotypes or a limitation
on phenotypic variability caused by the structure, character, composition, or dynamics of the
developmental system.
Causes:
PLEIOTROPY - genes affect more than one trait. Selection cannot operate on the traits
independently.
eg. Small salamanders with 4 toes.
Populations can sometimes evolve altered developmental pathways to decrease the constraint.
CANALIZING SELECTION
A new mutation might provide an advantage with respect to one trait, but it also causes some
disruption of development. Selection will favour alleles at MODIFIER loci that decrease this
disruption. In time, the developmental pathway can be restored even when the mutation is fixed,
by fixation of alleles at modifier loci.
eg. Resistance to insecticides
eg. Abnormal abdomen in Drosophila mercatorum.
DEVELOPMENTAL CONSTRAINT HAS BEEN PROPOSED AS AN ALTERNATIVE TO NATURAL
SELECTION AS AN EXPLANATION FOR THE FORM OF SOME TRAITS.
In other words, if the trait COULD evolve, it might be favoured by selection, but developmental
constraints prevent the trait from occurring in the first place.
eg. spotted mammals tend to have ringed tails
How could we distinguish between these 2 alternatives?
1. Adaptive prediction - If you can predict the form a trait will take under particular conditions, you
could argue that natural selection is responsible for its form.
2. Direct measure of selection - If it is possible, we can measure selection on the trait relative to
other forms of the trait. This is not always practical.
3. Heritability - If the trait is highly constrained we do not expect there to be any additive genetic
variation at the loci which control it. We can do artificial selection on the trait and if heritability is not
0, then the trait is probably not constrained.
4. Cross-species evidence - Do the missing forms of the trait occur in other species? Can we
create the missing forms via artificial selection? If we can obtain them, then developmental constraint
is NOT a good explanation for why they do not exist in nature.
Allometry - it could be argued that you can’t get a particular phenotype because allometric
relationships prevent it. However, if you can alter allometric relationships via artificial selection, it is
possible that other forms of the trait could occur in nature.
eg. eye stalks in flies
5. Historical constraint - A population could evolve to a high peak on its adaptive topography, but
the environment may change so that another location on the topography now provides a higher peak.
Even so, the population will be stuck on the current peak, even though a higher one now exists.
eg. the recurrent laryngeal nerve in the neck of mammals
eg. different means to obtain the same trait - neck teeth in different groups of Daphnia.
Adaptation must be understood in a historical context. What was present in the past
provides the raw material for subsequent change.
6. Trade-offs - If a trait is used for many functions, it may not be possible to optimize it for every
one. eg. the vertebrate mouth is used for breathing and ingesting food.
In theory, it is simple to define adaptations as traits that have evolved via natural selection.
In practice, it may be difficult to determine if the current form of the trait did indeed evolve via
natural selection.
SPECIES CONCEPTS AND SPECIATION
First we must distinguish between two types of evolutionary change:
ANAGENESIS is evolution, or a change in the gene pool, within a species. This is what we have
been talking about up until now.
CLADOGENESIS is branching evolution and refers to the development of 2 species from a single
ancestral species.
WHAT IS A SPECIES?
In practice, we recognize species by their morphological differences. So we can define a
PHENETIC SPECIES as a group of organisms that look similar to one another, but is distinct from
other such groups. The criteria usually include a large number of morphological characters.
Unfortunately, a set of characters may not always define the same “groups”. In addition, this
definition of species has no relationship to evolution. We can define species this way even if
evolution does not occur.
The most commonly used definition currently, at least among zoologists, is the BIOLOGICAL
SPECIES which can be defined as a group of interbreeding individuals that is reproductively isolated
from all other such groups. This definition was proposed by Ernst Mayr. This definition is satisfying
from an evolutionary perspective because it incorporates the idea of shared gene pools so that
species and speciation can be studied in the framework of population genetics; a gene pool = a
species.
The phenetic and biological species concepts often describe the same groups of individuals. This is
not surprising - we use morphological characters to identify individuals that belong to the same gene
pool. The phenetic similarity is a direct consequence of the heritability of the traits encoded by the
gene pool. Thus, as far as proponents of the biological species concept are concerned, phenetic
similarity only matters in so far as it is an indicator of interbreeding.
An example of a situation where the 2 concepts disagree is SIBLING SPECIES. In this case we
have 2 species whose individuals are morphologically indistinguishable (at least to humans).
However, genetics, behavior and/or reproductive biology show that there are really 2 groups that do
not interbreed.
NOTE: The “glue” that holds species together under this concept is GENE FLOW.
Over the years there have been many attempts to modify the definition of species. The reason there
has been so much effort in this area is that no definition is “perfect”. For example, how can you use
the criterion of interbreeding with respect to an asexual or parthenogenetic organism? Below is a
short description of some of the other species concepts that have been proposed.
THE RECOGNITION SPECIES CONCEPT (H. Paterson)
A species is a group of individuals sharing the same Specific Mate Recognition System (SMRS).
In general, this definition should define the same groups as the Biological Species concept.
However, instead of framing things in terms of who individuals do NOT breed with, this concept
focuses on who individuals DO breed with.
THE ECOLOGICAL SPECIES CONCEPT
A species is a group of organisms exploiting a single niche. It has been argued that ecological
niches in nature occupy discrete zones with gaps between them. The “glue” holding the species
together in this case is natural selection. Interbreeding between species would not be favored
because of the creation of hybrids that are not adapted to either niche. This concept differs from the
Biological species concept in its focus on natural selection as the cohesive force, rather than gene
flow.
THE COHESION SPECIES CONCEPT (A. Templeton)
A species is the most inclusive group of individuals having the potential for phenotypic cohesion
through intrinsic cohesion mechanisms. Mechanisms of cohesion include: gene flow, stabilizing
selection, developmental constraints, reproductive isolation.
There is, as of yet, no perfect definition of species that everyone can agree on. This is, perhaps, not
surprising as the factors that lead to the development of 2 species from 1 operate over very long time
scales. Should we be surprised to catch a species “in the act” of diverging into 2 species, making it
hard to decide if there is 1 or 2?
ORIGIN OF NEW SPECIES
Speciation is caused by the evolution of genetic barriers to interbreeding.
1. Start with a single species composed of a set of interbreeding individuals.
2. A new variant(s) spreads throughout part of the species range. Bearers of this variant mate only
or preferentially with other bearers of the variant.
3. Once the mating preference becomes exclusive, so that breeding occurs only within each of the two
groups (with and without the variant), two species exist.
4. The two species will continue to diverge at other loci.
REPRODUCTIVE ISOLATING MECHANISMS
Mechanisms that prevent interbreeding are generally divided into 2 groups
Pre-zygotic isolating mechanisms
Post-zygotic isolating mechanisms
How much genetic differentiation must there be for speciation to occur?
* Nothing by itself is critical for speciation. *
Speciation can be caused by changes at a few loci, or by changes at many loci.
The changes can relate to any part of the genome controlling any feature of the organisms;
morphology, behaviour, karyotype, allozymes, habitat preferences, etc etc.
eg. 2 species can be morphologically similar but genetically divergent - Daphnia
eg. 2 species can be morphologically different but genetically similar - humans and chimps
MECHANISMS OF SPECIATION
A speciation mechanism is ANYTHING that restricts gene flow among populations and thus leads to
reproductive isolation.
Mechanisms of speciation have been classified in various ways (SEE HANDOUT).
Mayr has classified speciation mechanisms according to the level at which they occur (individuals vs
populations) and, in the case of populations, according to their geographical relationship.
Templeton has classified speciation mechanisms in a population genetic framework.
GEOGRAPHIC SPECIATION
(Allopatric, Parapatric, Sympatric)
Allopatric speciation - Reproductive isolation (RI) evolves while 2 groups are separated by some
geographical barrier. The genetic changes can be caused by drift or natural selection, but they do
not occur to cause speciation per se. Speciation is a BY-PRODUCT of divergence in the absence of
gene flow.
When isolated populations come into secondary contact, there can be one of two outcomes: the
populations interbreed and remerge into a single species OR they remain separate as a result of RI
that has evolved during the separation.
It has been argued that the occurrence of post-mating isolation between the groups in secondary
contact can cause selection to favour the evolution of pre-mating isolation to REINFORCE the RI that
has already evolved. In other words, natural selection can directly complete the speciation
process by favouring genotypes that mate within their own group. This is called speciation via
REINFORCEMENT.
This is theoretically possible, BUT it requires strong linkage disequilibrium between the loci causing
the pre- and post-mating isolation. This will occur initially, when the groups come back into contact.
However, interbreeding will break down the linkage disequilibrium - usually faster than selection can
increase the frequency of alleles for pre-mating isolation. In the meantime, selection is also acting to
reduce the frequency of alleles that cause the post-mating isolation. As this occurs, the barrier to
interbreeding will decrease and there will no longer be a selective advantage to pre-mating isolation.
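The speed at which interbreeding erodes linkage disequilibrium can be illustrated with a standard population genetics result: under random mating, the disequilibrium D between two loci shrinks by a factor of (1 - r) each generation, where r is the recombination fraction. The sketch below is only an illustration of that decay. The function name, the starting value of D and the recombination fractions are assumptions chosen for the example, and random mating within the contact zone is itself a simplifying assumption.

```python
def ld_after(d0, r, generations):
    """Linkage disequilibrium remaining after some generations of random mating.

    Uses the standard recursion D_t = (1 - r)**t * D_0, where r is the
    recombination fraction between the two loci (0.5 for unlinked loci).
    """
    return d0 * (1.0 - r) ** generations


# Assumed illustrative values: D starts at its maximum possible value of 0.25.
for r in (0.5, 0.1, 0.01):
    decay = [round(ld_after(0.25, r, t), 4) for t in (0, 1, 5, 10)]
    print(f"r = {r}: D at generations 0, 1, 5, 10 = {decay}")
```

With unlinked loci (r = 0.5) the association halves every generation, so selection for pre-mating isolation has very little time to act unless the loci are tightly linked.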
Parapatric speciation - RI evolves in a continuous population which spans an environmental
gradient. If different alleles are favoured in different environments, then a cline in allele
frequencies will develop. Clines are common in nature and when they are gradual, they seldom
lead to speciation. However, when there are abrupt changes in the environment, the cline can be
very steep leading to what is called a STEP CLINE. Heterozygotes tend to be disadvantageous
(post-mating isolation) and their occurrence in the transition zone can lead to the evolution of
pre-mating isolation via reinforcement. If the transition zone is stable and long-lived, it may provide
the conditions necessary for reinforcement to occur - stability of the heterozygote disadvantage
providing sustained selection in favour of pre-mating isolation.
Hybrid zones - An area of contact between two noticeably different forms at which hybridization
takes place.
When the hybrid zone forms on either side of an abrupt environmental transition, it is considered to be
PRIMARY.
When populations come back into secondary contact, they may hybridize at the contact zone. Such
zones of contact are considered to be SECONDARY hybrid zones. In practice, it is difficult to
determine whether a hybrid zone is primary or secondary. The relative frequency of the two types
has important implications for the relative importance of allopatric versus parapatric speciation.
Sympatric speciation - RI evolves within the range of the ancestral group, often as a result of spatial
environmental heterogeneity. This is a controversial idea because it REQUIRES the operation of
reinforcement.
The conditions for the occurrence of sympatric speciation are similar to the conditions required for the
establishment of a multiple niche polymorphism. There must be some sort of HABITAT
SELECTION such that individuals who have high fitness in one environment tend to choose that
environment and thus, tend to mate with other individuals that have high fitness in that same
environment. This habitat selection can provide the reduction in gene flow between habitats that
would allow the development of RI.
eg. The evolution of host races in the fruit fly Rhagoletis.
POPULATION GENETIC MODES OF SPECIATION
Templeton divides mechanisms of speciation into two main groups: TRANSILIENCE and
DIVERGENCE.
Speciation via transilience can occur when some event other than natural selection creates a
change in the genetic composition of a species. Natural selection acting to maintain the “status quo”
is overcome by this event, and then natural selection acts to stabilize the new state.
Speciation via divergence occurs when RI evolves gradually as a consequence of the operation of
natural selection under different conditions.
DIVERGENCE MODES
Adaptive (similar to allopatric)
Some extrinsic barrier to gene flow develops. Isolated populations diverge due to adaptation to
different environments. RI is a secondary consequence of the adaptation. Rates of divergence are
dependent on population structure. Large panmictic populations that occupy similar environments
would be slow to diverge from one another even in the absence of gene flow.
Clinal (similar to parapatric)
Natural selection occurs along an environmental gradient with isolation by distance. This is most
likely to occur if selection is creating a cline of allele frequencies at a major locus with many modifier
loci. When different alleles at the major locus are favoured at opposite ends of the cline, the
accumulation of differences at the modifier loci, which enhance the phenotypic expression of the
trait under selection, can indirectly result in post-mating isolation.
Habitat (similar to sympatric)
There is no isolation by distance, and there is gene flow (or the potential for gene flow) among
groups of individuals which prefer, and have different fitnesses in different habitats. In order for this
to lead to a speciation event, there must be a genetic basis for habitat selection which can then lead
to assortative mating within groups. In general, the loci controlling the habitat selection need to be in
linkage disequilibrium with the loci controlling fitness differences in the various habitats.
TRANSILIENCE MODES
Genetic
A rapid change in the genetic composition of a population due to a founder effect. This is most likely
to occur in a large, panmictic population that gives rise to a small peripheral population. The founder
event can create linkage disequilibrium among loci which can affect the trajectory of natural selection
in the new population. If the population stays small, then inbreeding and homozygosity will increase
(relative to the ancestral population). The resulting change in the genetic architecture of the new
population, through the redevelopment of new co-adapted gene complexes, can lead to the
secondary development of RI between the new population and the ancestral population. One way
to think of this is as speciation via PEAK SHIFTS.
eg. Hawaiian Drosophila
Chromosomal
Chromosomal rearrangements such as inversions, translocation, and fusions can cause a high
degree of hybrid sterility. Generally, we expect the new variant to be eliminated by natural selection.
However, if it becomes fixed in one population, that population will be reproductively isolated from
other populations. One way to overcome the selective barrier is extreme inbreeding in small
populations which could rapidly fix the new chromosomal variant. This seems unlikely in many
species but species with small isolated populations such as rodents, some species of lizards and
some species of plants appear to have speciated via this mechanism.
Hybrid maintenance
When 2 species hybridize, the F1 hybrids may be viable/fertile, but there may be F2 breakdown. In
some cases there has been the evolution of a mechanism to maintain the viable F1 state. In plants,
a common mechanism to do this is polyploidy. In animals, the development of parthenogenesis
can lead to new “species”.
Hybrid recombination
When 2 species hybridize, the F1 hybrids may be viable/fertile, but there may be F2 breakdown. In
plants, the F1 generation may be viable but essentially sterile. However, this F1 state can be
maintained indefinitely via vegetative reproduction. During this time the plants will continue to
produce pollen and seeds and eventually recombinants may occur that are fertile, but that differ from
either parent. If these recombinants are interfertile amongst themselves, they can establish a new
species which is isolated from either parent via post-mating barriers.
DIFFERENT MODES OF SPECIATION ARE MORE LIKELY IN SPECIES WITH CERTAIN
ATTRIBUTES
Allopatric speciation (or speciation via adaptive divergence) is considered to be the most common
form of speciation. There are no theoretical difficulties with this concept and virtually any species
can speciate if it is divided into isolated demes. This mode of speciation requires fairly long periods
of time - it has been estimated that Drosophila species take between 1.5 and 3.5 million years to
evolve allopatrically.
Speciation via genetic transilience could occur more rapidly than adaptive divergence and seems
more likely in situations where islands have been colonized by founders from large mainland populations, and in
organisms with low vagility.
Parapatric (or clinal) speciation is also more likely in low-vagility organisms such as plants,
terrestrial snails, fossorial rodents, flightless insects, lizards, etc.
Sympatric speciation seems to be more likely in organisms that use various “hosts” such as
parasites and phytophagous insects.
Speciation is not adaptive in itself but it has profound consequences for adaptive evolution.
Populations can become more “fine-tuned” to their environment in the absence of gene flow from
populations in other environments. A group of organisms can occupy a much larger “ecological
space” if it is divided into reproductively isolated groups each specializing on one region of that
space. A species cannot be best at everything (recall the difficulty with maintaining polymorphism
via environmental heterogeneity) but reproductively isolated groups can “be best” at one particular
thing.
Rates of speciation have varied immensely over evolutionary time. The fossil record indicates that
there have been long periods of stasis interrupted by periods of rapid diversification.
Often, bursts of speciation involve ADAPTIVE RADIATIONS which are the evolutionary divergence of
members of one lineage into different adaptive zones.
Some generalizations about adaptive radiations:
* They often occur at the edge of species ranges where a new genetic combination might be
favoured in a different environment than that usually occupied by the species.
* A lack of direct competitors or predators will facilitate the process by allowing a species to invade
an environment to which it is not well adapted. This is difficult if well-adapted competitors or
predators are already there.
* Adaptive radiations often happen when something opens up new niches for colonization.
- archipelagos which are uninhabited when they first form are subsequently colonized (eg. the
invasion of the Galapagos Islands by a single finch species that radiated to fill multiple niches
occupied by non-finches on the mainland, the radiation of picture-wing Drosophila and
honeycreepers in the Hawaiian islands).
- profound changes in climate can open up vast areas of novel habitat (eg. the drying of Africa turned
much of the forest into savannah allowing the adaptive radiation of ungulates).
- mass extinctions in one lineage can open up new niches for other lineages (eg. the extinction of
large carnivorous dinosaurs provided the opportunity for the radiation of large carnivorous mammals
and birds - only the canids and felids survive today).
* Adaptive radiations often happen after the evolution of a KEY INNOVATION. The evolution of a
new morphological feature in one lineage often opens up opportunities for that group to invade
niches which it could not previously occupy. (eg: the evolution of flight in birds and in bats, and the
evolution of modified jaw structure in cichlid fishes).
PHYLOGENY RECONSTRUCTION
Evolutionary biologists have 2 major tasks:
1. To determine the ecological and genetic mechanisms of evolutionary change.
2. To determine the actual history of evolutionary change.
To date, we have been mainly concerned with MICROEVOLUTION - changes in gene frequencies
below the species level. If such change proceeds beyond a certain point, we recognize that one
species could become 2. However, we have not considered the evolution of higher taxonomic
groups or MACROEVOLUTION.
The term that we use for evolutionary change within a lineage is ANAGENESIS.
The term that we use for evolutionary change leading to the branching of one lineage into two
lineages is CLADOGENESIS.
To understand macroevolution, we need to understand PHYLOGENY and SYSTEMATICS
Phylogeny is the pattern of branching showing the evolutionary relationships among species.
Systematics refers to the organization of organisms into hierarchical groups. Ultimately, evolutionary
biologists would like the system of classification to reflect phylogeny. Classification based on overall
morphological similarity often does correspond with phylogeny but it need not if organisms have
evolved in parallel or convergently.
Why Worry About Phylogeny?
If we have accurate phylogenies, we can ask questions about rates of evolution, patterns of
evolution, and adaptation via the comparative method. It allows us to identify INDEPENDENT
evolutionary events.
Phylogenetic terms
Monophyletic group - a group of taxa descended from a single ancestral taxon
Polyphyletic group - a group of taxa descended from two or more distinct ancestral taxa.
Paraphyletic group - a group of taxa derived from a single ancestral taxon, but one which does not
contain all of the descendants of the most recent common ancestor.
Plesiomorphy - a character showing the ancestral condition.
Apomorphy - a character derived from and differing from an ancestral condition.
Homologous - used of structures, traits or properties having common ancestry but not necessarily
retaining similarity of structure or function.
Analogous - pertaining to similarity of structure or function due to convergence rather than to
common ancestry.
When constructing phylogenies, we must strive to choose characters that are homologous among the
taxa under consideration, rather than analogous.
PHENETICS
(Numerical Taxonomy)
The phenetic approach involves grouping taxa on the basis of overall similarity for a large number of
characters. Any sort of characters can be used: morphological, molecular, behavioral. The tree of
relationships constructed using this method is called a PHENOGRAM or DENDROGRAM.
Assumptions
Evolution occurs at an approximately constant rate so that higher similarity reflects closer genetic
relationship.
Advantage:
It is not necessary to know whether the forms of each character are ancestral or derived.
Disadvantage:
It is not possible to reconstruct the original character states of the taxa from the phenograms. It is
not possible to learn how the traits are changing along the branches of the phenogram - all of the
data are reduced to single numbers characterizing the similarity (or distance) between pairs of taxa.
Once character state data are gathered from a group of taxa, it is necessary to convert it to a
measure of distance or similarity between all pairs of taxa (a distance matrix). There are various
ways to do this for different types of data. Examples for molecular data follow:
Nei’s Genetic Distance
It is common to gather allele frequency data for a large number of loci and then convert these data to
a measure of genetic distance.
$$I = \frac{\sum_i x_i y_i}{\left(\sum_i x_i^2 \times \sum_i y_i^2\right)^{1/2}} = \frac{J_{xy}}{(J_x \times J_y)^{1/2}}$$
where $x_i$ is the frequency of the ith allele in population x
where $y_i$ is the frequency of the ith allele in population y
I is the normalized identity - the probability of choosing the same allele from each of population x and
y, relative to the probability of choosing the same allele twice from either population x or y.
$$D = -\ln I$$
D is the genetic distance between population x and y.
To calculate I for many loci use
$$I = \frac{\bar{J}_{xy}}{(\bar{J}_x \times \bar{J}_y)^{1/2}}$$
where $\bar{J}$ is the mean of J across loci.
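The multi-locus calculation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name and the allele frequencies in the example are hypothetical, and each locus is assumed to list its alleles in the same order in both populations.

```python
import math


def nei_identity_and_distance(pop_x, pop_y):
    """Return (I, D) for two populations.

    pop_x and pop_y are lists of loci; each locus is a list of allele
    frequencies, with alleles in the same order in both populations.
    """
    n_loci = len(pop_x)
    # Mean across loci of the sums of products and of squares.
    j_xy = sum(sum(x * y for x, y in zip(lx, ly))
               for lx, ly in zip(pop_x, pop_y)) / n_loci
    j_x = sum(sum(x * x for x in lx) for lx in pop_x) / n_loci
    j_y = sum(sum(y * y for y in ly) for ly in pop_y) / n_loci
    identity = j_xy / math.sqrt(j_x * j_y)
    # D is undefined (infinite) when the populations share no alleles.
    distance = math.inf if identity == 0 else -math.log(identity)
    return identity, distance


# Hypothetical data: two loci with two alleles each.
pop_x = [[0.5, 0.5], [1.0, 0.0]]
pop_y = [[0.5, 0.5], [0.0, 1.0]]
identity, distance = nei_identity_and_distance(pop_x, pop_y)
print(round(identity, 4), round(distance, 4))
```

In this made-up example the populations are identical at the first locus and fixed for different alleles at the second, so I = 0.25/0.75 = 0.333 and D = 1.099.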
Sequence Divergence
It is now common to gather data about DNA sequences directly. The raw data must be converted to
a measure of nucleotide divergence between pairs of taxa.
Restriction site data
It is possible to map the location of restriction sites in DNA fragments. The raw data will consist of a
table indicating whether a particular restriction site is present or absent in a particular DNA fragment.
Example:
DNA fragment 1   1111110000111111001011101
DNA fragment 2   1111111111001110111111111
Sequence divergence between fragments 1 and 2 can be estimated as:
$$d_{XY} = \frac{-\ln S}{r}$$
where r = the number of nucleotides in the enzyme recognition site (usually 4 or 6)
where
$$S = \frac{2 m_{XY}}{m_X + m_Y}$$
where $m_{XY}$ = the number of sites shared by sequence X and Y
where $m_X$ = the number of sites in sequence X
where $m_Y$ = the number of sites in sequence Y
In the example above:
m12 = 14, m1 = 17, m2 = 22, S = (2 x 14)/(17 + 22) = 0.7179
If enzymes recognizing hexanucleotide sites were used, then r = 6 and
d12 = 0.055 or 5.5% sequence divergence
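The arithmetic in this example can be checked with a short script. This is a sketch only: the function name is hypothetical, and each fragment is assumed to be coded as a string of 1s (site present) and 0s (site absent), as in the table above.

```python
import math


def restriction_site_divergence(sites_x, sites_y, r):
    """Return (m_xy, m_x, m_y, S, d) for two presence/absence strings.

    r is the length of the enzyme recognition site in nucleotides.
    """
    m_x = sites_x.count("1")
    m_y = sites_y.count("1")
    # A site is shared only if it is present in both fragments.
    m_xy = sum(1 for a, b in zip(sites_x, sites_y) if a == "1" and b == "1")
    s = 2 * m_xy / (m_x + m_y)
    d = -math.log(s) / r
    return m_xy, m_x, m_y, s, d


frag1 = "1111110000111111001011101"
frag2 = "1111111111001110111111111"
m_xy, m_x, m_y, s, d = restriction_site_divergence(frag1, frag2, r=6)
print(m_xy, m_x, m_y, round(s, 4), round(d, 3))
```

Running it on the two fragments reproduces the counts of 14 shared sites, 17 and 22 total sites, and a divergence of about 0.055.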
DNA sequence data
It is now much more common to obtain direct sequence data. In this case,
dxy is the observed proportion of nucleotide sites that differ between a pair of sequences.
This observed proportion is an underestimate of the actual number of changes per site that have
occurred, as there may have been multiple changes at some sites, such as A>G>C.
One correction used to account for this multiple substitution is the Jukes-Cantor correction:
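$$\delta_{xy} = -\frac{3}{4}\ln\left(1 - \frac{4}{3} d_{xy}\right)$$
where $\delta_{xy}$ is the corrected estimate of sequence divergence.
Example:
DNA Sequence 1   AGGCT GAGAG AGATA CCCCG GATAG CAGAT ACGAT ACGAT
DNA Sequence 2   AGGCC GAGAG AGATG CCCCG GGTAG CAAAT ATGAT ACGAT
                     *           *        *      *    *
dxy = 5/40 = 0.125
$$\delta_{xy} = -\frac{3}{4}\ln\left(1 - \frac{4}{3} \times 0.125\right) = 0.137$$
or 13.7% sequence divergence.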
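The same calculation can be written as a short script, which is convenient when many pairs of sequences must be compared. This is a minimal sketch: the function names are hypothetical, the sequences are assumed to be already aligned and of equal length, and the spaces between blocks of five bases are ignored.

```python
import math


def p_distance(seq_x, seq_y):
    """Observed proportion of sites that differ between two aligned sequences."""
    seq_x = seq_x.replace(" ", "")
    seq_y = seq_y.replace(" ", "")
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_x, seq_y) if a != b)
    return differences / len(seq_x)


def jukes_cantor(d):
    """Jukes-Cantor corrected divergence for an observed proportion d."""
    if d >= 0.75:
        # The logarithm is undefined once d reaches 3/4.
        raise ValueError("correction is undefined for d >= 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * d)


seq1 = "AGGCT GAGAG AGATA CCCCG GATAG CAGAT ACGAT ACGAT"
seq2 = "AGGCC GAGAG AGATG CCCCG GGTAG CAAAT ATGAT ACGAT"
d = p_distance(seq1, seq2)
print(d, round(jukes_cantor(d), 3))
```

For the two sequences in the example this gives an observed proportion of 0.125 and a corrected value of about 0.137.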
Once we have constructed a distance matrix for PAIRS of taxa, we require a method to group taxa
based on the similarity (distance). A method that was once commonly used (but has since fallen out
of favor) is UPGMA (Unweighted Pair-group Method using Arithmetic means) (see example on
handout).
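For readers without the handout, the logic of UPGMA can be sketched as follows. At each step the two closest clusters are joined, the node joining them is placed at half the distance between them, and the distance from the new cluster to every other cluster is the average of its members' distances, weighted by cluster size. The code is an illustration only: the function name and the four-taxon distance matrix are hypothetical and are not the handout example.

```python
from itertools import combinations


def upgma(labels, distances):
    """Cluster taxa by UPGMA.

    distances maps pairs of labels, e.g. ("A", "B"), to a distance.
    Returns a list of (cluster, node_height) in the order of joining,
    where each cluster is a nested tuple of labels.
    """
    sizes = {name: 1 for name in labels}
    d = {frozenset(pair): value for pair, value in distances.items()}
    merges = []
    while len(sizes) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        new = (a, b)
        # Size-weighted average distance from the new cluster to the rest.
        for c in sizes:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[new] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
        merges.append((new, height))
    return merges


# Hypothetical distance matrix for four taxa.
example = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
for cluster, height in upgma(["A", "B", "C", "D"], example):
    print(cluster, height)
```

In this made-up matrix A and B join first at a height of 1, C joins them at a height of 3, and D joins last at a height of 5.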
Many other methods have been developed, each designed to minimize the distortion of the original
distance matrix on the final phenogram. In other words, the branch lengths on the phenogram
should correspond to the actual genetic distances in the original data matrix.
Examples of other phenetic clustering methods:
Neighbor-Joining (Saitou and Nei. Molecular Biology and Evolution 4:406-425, 1987)
Fitch-Margoliash (Fitch and Margoliash. Science 155:279-284, 1967).
CLADISTICS
The cladistic approach involves grouping taxa on the basis of shared derived characters
(apomorphies). In other words, organisms that share apomorphies are more closely related to one
another than they are to taxa that do not possess the apomorphic character state. In order to use this
approach, it is necessary to classify character states as derived and ancestral. The tree of
relationships constructed using the approach is called a CLADOGRAM.
Assumption: Derived character states only evolve once.
Advantage: It is possible to reconstruct the pattern of character state changes from the cladogram.
Disadvantage: It may be difficult to determine which character states are ancestral and which are
derived.
One way to polarize character states is to use an OUTGROUP. An outgroup taxon is chosen based
on its close phylogenetic relationship to the group of taxa for which you are attempting to construct a
phylogeny (the INGROUP). The choice of outgroup is very important; it is absolutely essential that
the outgroup NOT be more closely related to any of the ingroup taxa than they are to one another.
Character states are said to be ancestral if they are shared by the outgroup and any of the ingroup
taxa. Character states that are unique to some subset of the ingroup taxa are said to be derived.
These character states define relationships among the ingroup taxa. We attempt to draw a cladogram
that is consistent with the pattern of character state change for all of the characters in our study. In
other words, we try to find a tree that requires each derived character state to evolve only once on
the cladogram. Unfortunately, this is rarely possible. A character that changes between the
ancestral and a particular derived state more than once on the cladogram is said to be
HOMOPLASIOUS. Thus, we have the dilemma of trying to decide which of the many possible trees
we could draw is the “best” tree. There are several criteria that are commonly used to make this
decision.
Parsimony
We draw all the possible trees suggested by our data set, and then ask, how many total character
state changes are required by each tree? The “best” tree is considered to be the tree that requires
the fewest character state changes, or steps, to explain the data. This criterion is based on the idea
that character state change is rare, so that the shortest tree is most likely to represent the true tree.
This method is good if homoplasy is not dispersed among many characters.
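As a rough illustration of counting steps, the sketch below uses Fitch's algorithm, a standard way to count the minimum number of changes for unordered characters on a rooted binary tree. The notes do not name a counting method, so this choice, the function names, and the data matrix are assumptions made for the example.

```python
def fitch_steps(tree, states):
    """Return (possible_states, steps) for one character on a rooted binary tree.

    tree: a taxon name (leaf) or a 2-tuple of subtrees.
    states: dict mapping taxon name -> observed state for this character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state: no extra change needed here.
        return shared, left_steps + right_steps
    # No state in common: one additional change is required at this node.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Total number of steps over all characters in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        states = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch_steps(tree, states)[1]
    return total


# Hypothetical matrix: four taxa, four binary characters.
matrix = {
    "A": [1, 1, 0, 1],
    "B": [1, 1, 0, 0],
    "C": [0, 0, 1, 1],
    "D": [0, 0, 1, 0],
}
candidate_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
lengths = [tree_length(t, matrix) for t in candidate_trees]
most_parsimonious = candidate_trees[lengths.index(min(lengths))]
```

With these data the first tree is the shortest, because only the fourth character has to change more than once on it.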
Compatibility
We draw all the possible trees suggested by our data set, and then ask, which tree is congruent
(requires only a single character state change) with the highest number of characters? The “best”
tree has the fewest homoplasious characters. This method is good if there are a few
characters that seem to be evolutionarily labile (that is, they change easily between the ancestral
and derived states).
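The compatibility count can be sketched as follows. The sketch assumes characters have already been polarized (0 = ancestral, 1 = derived) and treats a character as congruent with a rooted tree when the taxa carrying the derived state form exactly one clade of that tree, which corresponds to a single origin with no reversal. The function names and data are hypothetical.

```python
def clades(tree):
    """Return (leaves, all_clades) for a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, {leaf}
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves = leaves | child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def compatible_count(tree, matrix):
    """Count characters whose derived state marks exactly one clade of the tree.

    Characters with no derived state in any taxon are not counted, because
    they require no change at all (a choice made for this sketch).
    """
    _, tree_clades = clades(tree)
    n_chars = len(next(iter(matrix.values())))
    count = 0
    for i in range(n_chars):
        derived = frozenset(t for t, row in matrix.items() if row[i] == 1)
        if derived in tree_clades:
            count += 1
    return count


# Hypothetical polarized matrix: four ingroup taxa, four characters.
matrix = {
    "A": [1, 1, 0, 1],
    "B": [1, 1, 0, 0],
    "C": [0, 0, 1, 1],
    "D": [0, 0, 1, 0],
}
candidate_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
scores = [compatible_count(t, matrix) for t in candidate_trees]
```

Under this criterion the first tree is again preferred, since three of the four characters are congruent with it.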
Ideally, there is only one “best” tree. However, we often find that there are a large number of equally
parsimonious or equally compatible trees. How do we decide which one is best? The analysis of
methods to determine which trees are best is a very active area of evolutionary research. New
methods are being proposed all the time. However, one example that has been widely used in the
past (and still is) is BOOTSTRAPPING.
Bootstrapping
Bootstrapping involves constructing a large number of replicate data sets from the original data set
by randomly choosing characters from the original data and then replacing them before choosing
again. Suppose we have surveyed 100 characters. We construct a replicate data set by randomly
choosing a character from among the 100 to include in our new data set. Then we “put it back” and
choose again. We repeat this process until we have randomly chosen 100 characters for our new
data set. This data set may not include some of the 100 original characters, but it may also include
some of them many times. We repeat this process until we have constructed a large number (100
or more) of replicate data sets, each consisting of 100 characters. Then we construct trees from
each of the data sets and keep the “best” trees according to our optimality criteria. A CONSENSUS
tree is then constructed from this group of trees: it contains the monophyletic groups of taxa that
occur most frequently among them. The frequency with which each monophyletic group occurs among the
set of trees is called the bootstrap value. The higher the value, the more confident we are in the
proposed grouping.
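The resampling procedure can be sketched as below. To keep the example short, the tree search is replaced by a toy stand-in that picks, from three fixed candidate trees, the one congruent with the most characters; the stand-in, all names, and the data are assumptions for illustration, not a real tree-building method.

```python
import random
from collections import Counter

# Hypothetical candidate trees for taxa A-D, each written as its set of
# two-taxon clades.
CANDIDATE_TREES = [
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AC"), frozenset("BD")},
    {frozenset("AD"), frozenset("BC")},
]


def best_tree_clades(matrix):
    """Toy tree search: return the candidate congruent with most characters."""
    n_chars = len(next(iter(matrix.values())))
    derived_sets = [
        frozenset(t for t, row in matrix.items() if row[i] == 1)
        for i in range(n_chars)
    ]
    return max(CANDIDATE_TREES, key=lambda tree: sum(d in tree for d in derived_sets))


def bootstrap_replicate(matrix, rng):
    """Resample characters (columns) with replacement, keeping the same number."""
    n_chars = len(next(iter(matrix.values())))
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: [row[c] for c in columns] for taxon, row in matrix.items()}


def bootstrap_support(matrix, build_clades, n_replicates=200, seed=0):
    """Fraction of replicate trees in which each clade appears."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_replicates):
        counts.update(build_clades(bootstrap_replicate(matrix, rng)))
    return {clade: n / n_replicates for clade, n in counts.items()}


# Hypothetical polarized matrix (0 = ancestral, 1 = derived).
matrix = {
    "A": [1, 1, 0, 1],
    "B": [1, 1, 0, 0],
    "C": [0, 0, 1, 1],
    "D": [0, 0, 1, 0],
}
support = bootstrap_support(matrix, best_tree_clades)
```

The values in support play the role of bootstrap values: a group that appears in nearly every replicate tree is well supported by the data.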
MAXIMUM LIKELIHOOD
Maximum likelihood methods of phylogenetic inference evaluate a hypothesis about evolutionary
history in terms of the probability that a proposed model of the evolutionary process and the
hypothesized history would give rise to the observed data. In the case of phylogeny reconstruction,
the data are observed nucleotide or protein sequences, and the unknowns are the branching order
and branch lengths of a phylogenetic tree. We must specify a model that accounts for the
conversion of one sequence into another. In some cases, parameters of the model (for example,
patterns of substitution) can be estimated from the data. The maximum likelihood approach evaluates
the probability that the model we have chosen will have generated the observed sequences.
Phylogenies are inferred by finding those trees that yield the highest likelihood values.
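A much reduced sketch of the idea: for just two aligned sequences there is no branching order to infer, so the only unknown is the distance separating them. The example assumes the Jukes-Cantor (JC69) substitution model, which the notes do not specify, and uses made-up sequences and hypothetical function names. It finds the distance that maximizes the likelihood by a simple grid search.

```python
import math


def jc69_log_likelihood(seq1, seq2, d):
    """Log-likelihood of two aligned sequences separated by distance d.

    ASSUMES the Jukes-Cantor model: equal base frequencies and equal rates
    for all substitutions; d is the expected number of substitutions per site.
    """
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay  # probability that a site shows the same base
    p_diff = 0.25 - 0.25 * decay  # probability of one specific different base
    log_l = 0.0
    for a, b in zip(seq1, seq2):
        # 0.25 is the probability of the base observed in the first sequence.
        log_l += math.log(0.25 * (p_same if a == b else p_diff))
    return log_l


def ml_distance(seq1, seq2, grid_step=0.001, d_max=2.0):
    """Grid search for the distance with the highest likelihood."""
    candidates = [grid_step * k for k in range(1, int(d_max / grid_step) + 1)]
    return max(candidates, key=lambda d: jc69_log_likelihood(seq1, seq2, d))


# Made-up sequences: 20 sites, 4 of which differ.
seq1 = "ACGTACGTACGTACGTACGT"
seq2 = "CCGTAGGTACTTACGAACGT"
d_hat = ml_distance(seq1, seq2)

# Under JC69 the maximum has a known closed form, used here as a check.
p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
d_closed_form = -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

A full maximum likelihood analysis does the same thing on a larger scale, searching over tree shapes and all branch lengths at once.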
PROBLEMS WITH PHYLOGENY RECONSTRUCTION
There are two major reasons why we may not find the true phylogeny with these methods.
1. Variation in evolutionary rates. Phenetics is particularly affected by changes in
evolutionary rates. As a result of rapid evolutionary change within one lineage, that lineage may
appear to be quite divergent from even its closest relatives. On the other hand, its close relatives
may appear to be similar to more distant taxa that have been evolving very slowly. Since phenetics
groups taxa based on overall similarity, the fast evolving lineage will be placed in the wrong
position on the phenogram.
2. Homoplasy. When characters change state more than once during evolutionary history they
confuse our perception of phylogenetic relationships.
There are three types of homoplasy we need to be concerned with:
a. Convergence - the evolution of a derived state from two different starting points. The
descendants of two different lineages resemble each other more than their ancestors did. This often
occurs when a common problem is “solved” with a similar solution in two unrelated lineages.
e.g. the wings of birds and bats, the sucking mouth parts of mosquitoes and true bugs, the tracheae of
chelicerates and insects.
b. Parallelism - the evolution of a derived state by a similar pathway in two lineages that share a
common ancestor. e.g. the parallel radiation of marsupial and placental mammals.
c. Evolutionary reversal - the return of a derived state to the ancestral state. e.g. the
redevelopment of wings in a lineage of wingless insects (the ancestral state is winged).
These processes can cause problems for phenetics because they cause distantly related taxa to be
more similar to one another than they are to their closest relatives. With a cladistic approach, we
group taxa that share a derived character state. If that state has independently evolved several
times, we will incorrectly group together all of the taxa that possess it when, in fact, they are not each
other’s closest relatives. These sorts of phenomena are the reason that we do not get one “best”
tree in a cladistic analysis (there are several equally parsimonious or compatible trees depending on
which characters are considered to be homoplasious).
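The rate-variation problem described in point 1 above can be made concrete with a small numerical sketch. The branch lengths are invented for illustration: A and B are true sister taxa, but B has changed far more than A or the more distant taxon C.

```python
from itertools import combinations

# Hypothetical amounts of change on each branch of the true tree ((A,B),C).
# "AB_stem" is the branch leading to the common ancestor of A and B.
branch = {"A": 1.0, "B": 10.0, "C": 2.0, "AB_stem": 1.0}


def path_distance(x, y):
    """Total change along the path joining two taxa on the true tree."""
    if {x, y} == {"A", "B"}:
        return branch["A"] + branch["B"]
    # Any path to C also crosses the stem branch of the (A,B) group.
    other = x if y == "C" else y
    return branch[other] + branch["AB_stem"] + branch["C"]


distances = {pair: path_distance(*pair) for pair in combinations("ABC", 2)}

# A method based on overall similarity joins the least different pair first.
most_similar = min(distances, key=distances.get)
```

Here the least different pair is A and C, so grouping by overall similarity would join them and misplace the fast-evolving taxon B, even though B is the closest relative of A.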