Download Gen660_Lecture6B_MolEvo

The Neutral Theory
M. Kimura, 1968
Most observed DNA substitutions are likely to be neutral = no effect on fitness.
Polymorphisms arise through new mutations.
- deleterious mutations are rapidly removed by purifying selection,
which must be pervasive
- positive selection is likely to be rare
- thus, remaining observed polymorphisms likely to be neutral
Given a ~constant mutation rate, can convert the # of substitutions into
time of divergence since speciation = molecular clock theory.
Neutral changes evolve by genetic drift, not natural selection.
* Most are probably lost, some can become fixed in the population
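The fate of a new neutral mutation under drift can be illustrated with a minimal Wright-Fisher simulation. This is only a sketch under simplifying assumptions that are not part of the lecture (a diploid population of constant size, random mating, non-overlapping generations); the function names and the population size are hypothetical. Under these assumptions the expected fraction of new neutral mutations that fix is 1/(2N), so most are lost.

```python
import random


def neutral_fate(pop_size, rng):
    """Follow one new neutral mutation in a diploid Wright-Fisher population.

    Returns True if the mutation reaches fixation, False if it is lost.
    """
    copies_total = 2 * pop_size   # gene copies in a diploid population
    copies = 1                    # a new mutation starts as a single copy
    while 0 < copies < copies_total:
        freq = copies / copies_total
        # Binomial sampling of the next generation: pure drift, no selection.
        copies = sum(rng.random() < freq for _ in range(copies_total))
    return copies == copies_total


def fixation_fraction(pop_size, replicates, seed=1):
    """Fraction of independent new neutral mutations that end up fixed."""
    rng = random.Random(seed)
    fixed = sum(neutral_fate(pop_size, rng) for _ in range(replicates))
    return fixed / replicates


if __name__ == "__main__":
    n = 10  # hypothetical (very small) population size, chosen for speed
    print("simulated fraction fixed:", fixation_fraction(n, 2000))
    print("expected 1/(2N):        ", 1 / (2 * n))
```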
The Nearly-Neutral Theory
T. Ohta, 1973
Many slightly deleterious (or slightly advantageous) substitutions are not
selected against (or for) efficiently when the effective population size is small.
* Small populations are more subject to drift (e.g. random events); roughly, a
mutation behaves as nearly neutral when |s| is below about 1/(2Ne).
* Selection against weak effects is also slow, even in large populations … therefore many weakly
deleterious substitutions have yet to be removed by selection.
** Therefore, considering the population size N is an important concept
(or more importantly, the ‘effective’ population size Ne .. see more later)
Thus, many substitutions that are nearly neutral can evolve mostly by drift
** Practically what this means is that SOME substitutions found
in extant sequences can be slightly deleterious & have yet to be removed
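The dependence on population size can be made concrete with Kimura's diffusion approximation for the fixation probability of a new mutation. The formula is not given in the lecture; the sketch below assumes a diploid population with Ne = N and an additive selection coefficient s, and the function name and parameter values are hypothetical.

```python
import math


def fixation_probability(s, pop_size):
    """Approximate fixation probability of a new mutation (initial frequency 1/(2N)).

    Assumes Ne = N and uses P = (1 - exp(-2s)) / (1 - exp(-4Ns)).
    """
    if s == 0:
        return 1 / (2 * pop_size)      # neutral limit
    exponent = -4 * pop_size * s
    if exponent > 700:                 # exp() would overflow: probability is ~0
        return 0.0
    # expm1(x) = exp(x) - 1 is numerically stable for small x;
    # the two minus signs in the formula cancel.
    return math.expm1(-2 * s) / math.expm1(exponent)


if __name__ == "__main__":
    s = -0.001  # hypothetical slightly deleterious mutation
    for n in (100, 10_000):
        relative = fixation_probability(s, n) / fixation_probability(0, n)
        print(f"N = {n}: fixation probability relative to neutral = {relative:.3g}")
```

With these illustrative values the slightly deleterious mutation fixes almost as often as a neutral one when N = 100, but essentially never when N = 10,000.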
Theory of Darwinian Evolution (Selectionism)
This opposing theory states that much of inter-specific variation
may NOT be neutral but selected for through positive selection.
Deleterious mutation
Compensatory changes can be positively selected (& context dependent)
The relative importance/frequency of neutral mutations
vs. selected advantageous alleles is still debated.
Clearly both are at work, likely fluctuate over time
(A. Wagner: cycles of neutral change & positive selection:)
Much attention is devoted to cases of positive selection, since it shapes novelty
Signatures of Selection
Different types of selection leave behind different signatures on the genome
Negative selection: reduces variation at the affected site(s) but also at
neighboring ‘linked’ neutral sites through background selection
Positive selection through recent selective sweep: reduces variation flanking
the selected site (even if neutral) due to hitchhiking
* background selection and hitchhiking are important concepts, because
they mean that not all neutral-site evolution is stochastic (drift)
Recent selective sweeps can reduce variation flanking a selected locus
Variation can recover over time due to new mutation and recombination
Strength of selection & time since sweep affect the degree of reduced variation.
If the sweep goes to completion, the selected allele is said to be ‘fixed’ in the species (otherwise a ‘partial sweep’).
Recombination rates have a major influence on selection signatures
After many generations, recombination acts to shorten the region of ‘linked’ sequence.
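One standard way to quantify this shortening is the textbook expectation for the decay of linkage disequilibrium under random mating, D_t = D_0 (1 - r)^t, where r is the recombination rate between two sites. The lecture does not state this formula; the sketch below simply evaluates it for hypothetical values of r and t.

```python
def ld_after(d0, recomb_rate, generations):
    """Expected linkage disequilibrium D after a number of generations.

    Assumes random mating and no selection, drift, or new mutation.
    """
    return d0 * (1 - recomb_rate) ** generations


if __name__ == "__main__":
    # Hypothetical recombination rates between the selected site and a flanking site.
    for r in (0.0001, 0.001, 0.01):
        print(f"r = {r}: D after 1000 generations = {ld_after(0.25, r, 1000):.4f}")
```

Sites that recombine more often with the selected site lose their association faster, which is why the hitchhiking footprint narrows over time.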
Signatures of Selection
Diversifying selection can increase variation since >1 extreme alleles selected
e.g. selection for diverse viral antigens to evade host immune system
Balancing selection can increase variation by maintaining >1 allele in population
e.g. maintained heterozygosity (sickle cell anemia), OR
different alleles in different subpopulations due to fluctuating environments
Signatures of Selection
There are also different methods of looking for these signatures:
1. Evolutionary rate within species vs. between species
e.g. Ka/Ks ratio … McDonald-Kreitman test … HKA test
2. Frequency spectrum: frequency of different alleles in the population
e.g. Tajima’s D … Fay & Wu’s H … Fu & Li’s D*
3. Linkage disequilibrium & haplotype structure
For all of these tests: compare REAL DATA to
a MODEL of what data should look like under neutral evolution
Ks (sometimes called dS): # of silent (synonymous) substitutions per synonymous site, i.e. codon changes that encode the SAME amino acid
* often these changes are ASSUMED to be neutral**
* given a constant rate of point mutations, Ks can be used to date a sequence
** now people realize that Ks can also be constrained by other things besides codon
Ka (sometimes called dN): # of nonsynonymous substitutions per nonsynonymous site, i.e. codon changes that encode a DIFFERENT amino acid
If we assume that Ks reflects the underlying neutral rate of change:
Ka/Ks = 1 … Rate of nonsynonymous changes is the same as the rate of silent changes
* taken to mean NO constraint on gene sequence
Ka/Ks < 1 … Rate of nonsynonymous changes is LESS than the rate of neutral change
* implies deleterious codon changes were removed by purifying selection
* therefore implies constraint on gene sequence
Ka/Ks > 1 … Rate of nonsynonymous changes is GREATER than the rate of silent changes
* implies codon changes have been selected for by positive selection
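The synonymous / nonsynonymous distinction can be sketched in a few lines. The example below only classifies differing codons between two aligned, in-frame sequences using the standard genetic code; it returns raw counts, not Ka and Ks. A real Ka/Ks estimate also normalizes by the number of synonymous and nonsynonymous sites and corrects for multiple hits, which this sketch does not attempt. The function names and the two short sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codon_differences(seq1, seq2):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Assumes both sequences are aligned, in frame, and contain only A/C/G/T.
    """
    syn = nonsyn = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        codon1, codon2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if codon1 == codon2:
            continue
        if CODON_TABLE[codon1] == CODON_TABLE[codon2]:
            syn += 1       # same amino acid: silent change
        else:
            nonsyn += 1    # different amino acid
    return syn, nonsyn


def interpret_ratio(ka, ks):
    """Textbook reading of a Ka/Ks ratio, assuming Ks reflects the neutral rate."""
    ratio = ka / ks
    if ratio < 1:
        return "purifying selection (constraint)"
    if ratio > 1:
        return "positive selection"
    return "no constraint (neutral)"


if __name__ == "__main__":
    # AAA -> AAG is Lys -> Lys (synonymous); GGT -> GCT is Gly -> Ala (nonsynonymous).
    print(count_codon_differences("ATGAAAGGT", "ATGAAGGCT"))
    print(interpret_ratio(0.1, 0.5))
```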
Ka/Ks test is VERY conservative in identifying positive selection
Polymorphism within species vs. Divergence between species
Under neutrality, expect the evolutionary rate of a gene to be constant.
Therefore, evolutionary rate within the species should ~ rate between species.
McDonald-Kreitman (MK) Test for Coding Regions
McDonald & Kreitman, 1991
Ka/Ks within species should be ~ Ka/Ks between species for neutrally evolving sequences
Polymorphism data: 3 polymorphic sites within species (2 non-synonymous, 1 synonymous)
Outgroup sequence(s): 4 fixed differences between species (1 non-synonymous, 3 synonymous)
                                  Non-synonymous changes   Synonymous changes
Polymorphic (within species)                 2                     42
Fixed (between species)                      7                     17
Ratio within species (2/42 = 0.048) is LESS than ratio between species (7/17 = 0.41)
Is this significant? Use statistics to assign p-value (G-test or Fisher’s exact)
here p = 0.007
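The significance test on the 2x2 table above can be sketched with a two-sided Fisher's exact test written against the Python standard library. This is an illustrative implementation (the function name is hypothetical), not necessarily the procedure used to obtain the p-value quoted above; a G-test can give a slightly different number.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)   # small tolerance for float ties
    )


if __name__ == "__main__":
    # Rows: polymorphic, fixed. Columns: non-synonymous, synonymous.
    p_value = fisher_exact_two_sided(2, 42, 7, 17)
    print(f"ratio within  = {2 / 42:.3f}")
    print(f"ratio between = {7 / 17:.3f}")
    print(f"Fisher's exact p = {p_value:.4f}")
```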
MK test for Positive Selection
If rate of evolution is ~equal WITHIN and BETWEEN species (i.e. insignificant p-value):
suggests Neutral Evolution.
If rate of evolution WITHIN species is LESS than rate BETWEEN species:
- could be Positive Selection within the species being considered.
Because rate of variation has been reduced within species due to sweep
- could also be relaxed constraint in outgroup
If rate of evolution WITHIN species is GREATER than rate BETWEEN species:
could be several things:
- loss of selective constraint: no more purifying selection to remove variation
- diversifying selection: selection to increase variation
- balancing selection: may be selection to maintain >1 allele
- OR can actually reflect deleterious alleles yet to be removed from the population
* Deleterious alleles are likely at low frequency, so the test can be redone using only
polymorphisms at medium frequency (e.g. >10% in population)
A related approach generalized beyond coding sequences:
The HKA test
Hudson, Kreitman, & Aguade, 1987
Assumes that # segregating sites S within the species is related to
the # of divergent sites D between species (dependent on
the mutation rate μ and time T between species)
HKA compares estimated parameters at 2 (or more) genes
θ = 4 Ne μ
Ne = ‘effective’ population size (accounting for skews in pop. dynamics, sex ratios, breeding skews)
μ = mutation rate per gene per generation
Neutral model: all genes share same Ne and have their own μ
HKA neutral model: all genes have their own μ and should share the same Ne
Gene 1: S (# variable sites WITHIN species 1) = 3
Gene 1: D (# variable sites fixed BETWEEN species) = 4
θ1 = 4 Ne μ1, where θ1 is calculated from S1 and μ1 = D1/2T
Gene 2: S (# variable sites WITHIN species 1) = 8
Gene 2: D (# variable sites fixed BETWEEN species) = 9
θ2 = 4 Ne μ2, where θ2 is calculated from S2 and μ2 = D2/2T
If Ne is the same at both genes, can’t reject the null model.
But if Ne is different, this indicates that each gene has a distinct history: reject the null.
p-values come from coalescent simulations
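The logic of comparing the Ne implied by each gene can be sketched as follows. This is a simplification for intuition only: it uses Watterson's estimator θ = S / a_n (with a_n = 1 + 1/2 + ... + 1/(n-1)) to obtain θ from S, and μ = D/2T as on the slide. The sample size n and the divergence time T are hypothetical values that the slide does not give, and the real HKA test uses a goodness-of-fit statistic with simulated p-values rather than a direct comparison of point estimates.

```python
def harmonic_number(n_sequences):
    """a_n = sum of 1/i for i = 1 .. n-1, used by Watterson's estimator."""
    return sum(1 / i for i in range(1, n_sequences))


def implied_ne(seg_sites, divergent_sites, n_sequences, divergence_time):
    """Ne implied by theta = 4 * Ne * mu for a single gene.

    theta comes from S (Watterson's estimator); mu = D / (2T) as on the slide.
    """
    theta = seg_sites / harmonic_number(n_sequences)
    mu = divergent_sites / (2 * divergence_time)
    return theta / (4 * mu)


if __name__ == "__main__":
    n, t = 10, 1_000_000          # hypothetical sample size and divergence time (generations)
    ne_gene1 = implied_ne(3, 4, n, t)
    ne_gene2 = implied_ne(8, 9, n, t)
    # n and T cancel in the ratio, leaving (S1/D1) / (S2/D2).
    print(f"Ne(gene 1) / Ne(gene 2) = {ne_gene1 / ne_gene2:.3f}")
```

Because the sample size and T cancel, the comparison reduces to S1/D1 versus S2/D2; whether a given departure from 1 is significant is what the simulations are for.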
We can imagine that choice of genes could have a big impact here …
Multi-locus HKA test (Wright & Charlesworth, Genetics 2004):
uses maximum likelihood (ML) to compare many different genes
(software available)
Compares the log-likelihood of a model in which
ALL genes are evolving neutrally,
vs. model in which one or more chosen genes are evolving
non-neutrally compared to rest.
We want to USE the signatures to IDENTIFY loci subject to selection
BUT ….
some of these signatures can emerge due to demography:
* unknown population structure
* population expansions that change allele frequencies and
change the selection efficiency
* population contractions (‘bottlenecks’) that eliminate
genetic variation
This violates assumptions for many neutral models:
* Random mating within population (panmictic)
* Mating rate according to Poisson distribution
* Constant population size
Population structure can mimic balancing selection
This might appear as excess of intermediate-frequency alleles …
… when in fact it’s really two distinct, non-mating populations
Changes in population size can mimic selection as well
Population expansion can mimic positive selection (excess of rare, low-frequency alleles)
Historical bottleneck can mimic balancing selection (excess of intermediate-frequency alleles)
The power of genome-wide scans:
Demographic effects should be roughly constant across the genome, whereas selection affects specific loci.
Methods based on the Allele Frequency Spectrum
1. For each allele at a given locus, calculate the frequency.
Some alleles will be at high frequencies in the population,
some at low frequencies (i.e. very uncommon)
2. Make a histogram of the % individuals that have alleles of different frequencies
e.g. how many individuals have (any of the) low-frequency alleles?
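The two steps above can be sketched as a site frequency spectrum calculation. The version below tallies, for each count of the derived allele, how many polymorphic sites fall in that class, which is the usual way the spectrum is tabulated. The haplotypes are hypothetical toy data in which "1" marks the derived allele and "0" the ancestral one.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Map derived-allele count -> number of polymorphic sites with that count.

    haplotypes: equal-length strings of '0' (ancestral) and '1' (derived).
    """
    n = len(haplotypes)
    spectrum = Counter()
    for column in zip(*haplotypes):        # iterate over sites
        derived = column.count("1")
        if 0 < derived < n:                # skip sites that are not polymorphic
            spectrum[derived] += 1
    return dict(sorted(spectrum.items()))


if __name__ == "__main__":
    sample = ["10010", "10000", "01010", "00010", "00110", "00010"]
    spectrum = site_frequency_spectrum(sample)
    for count, n_sites in spectrum.items():
        freq = count / len(sample)
        print(f"derived allele frequency {freq:.2f}: {'#' * n_sites} ({n_sites} site(s))")
```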
From Nielsen Nat Rev Gen 2005 review
From “A Haplotype Map of the Human Genome” Nature 2005
Tajima’s D (F. Tajima, 1989): takes the # of segregating sites within species (S)
and also the average # difference between each pair of sequences (π)
S = 3
π = [(2 + 2 + 1 + 2) + (2 + 1 + 0) + (1 + 2) + (1)] / 10 pairwise comparisons = 14/10 = 1.4
(π = avg. # difs between each pair of sequences)
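The calculation of S and π can be sketched directly from a set of aligned sequences. The five sequences below are hypothetical: the original alignment is not in the transcript, so they were chosen only so that the pairwise differences match the numbers in the worked example.

```python
from itertools import combinations


def segregating_sites(seqs):
    """S: number of alignment columns with more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def mean_pairwise_differences(seqs):
    """pi: average number of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Hypothetical alignment (variable sites only), 5 sequences x 3 sites.
    sample = ["TAA", "ATA", "AAT", "AAA", "ATA"]
    print("S  =", segregating_sites(sample))
    print("pi =", mean_pairwise_differences(sample))
```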
Tajima’s D compares S and π to estimate the proportion of low/high-frequency alleles
Negative Tajima’s D = excess of low-frequency alleles (= reduced variation)
Indicates positive selection, OR nearly-neutral alleles, OR population expansion**
Positive Tajima’s D = excess of intermediate-frequency alleles
(low amounts of both high- and low-frequency alleles)
Indicates balancing selection OR partial sweep OR population bottleneck**
How can you get a p-value? Difficult to estimate - best to compare across loci
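For reference, the statistic itself can be computed from the sample size n, S, and π using the normalizing constants from Tajima (1989). The slides do not show the formula, so the sketch below is an addition; the function name is hypothetical, and the inputs reuse n = 5, S = 3 and π = 1.4 from the worked example.

```python
import math


def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, segregating sites S, and pi."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: pi minus Watterson's estimate of theta (S / a1).
    # Denominator: estimated standard deviation of that difference.
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


if __name__ == "__main__":
    print(f"Tajima's D = {tajimas_d(5, 3, 1.4):.3f}")
```

A value this close to zero gives no indication of a skew in the frequency spectrum, and with only five sequences the statistic has very little power either way.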
Empirical model for significance of Tajima’s D
Sliding window across a locus
From Nielsen Nat Rev Gen 2005 review
OR
Compare to several other loci
From Will et al. PLoS Genetics 2010
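Comparing one locus to several others amounts to an empirical p-value: the fraction of background loci whose statistic is at least as extreme as the one observed. A minimal sketch, with a hypothetical function name and made-up background values of Tajima's D:

```python
def empirical_lower_tail(value, background):
    """Fraction of background loci whose statistic is <= the observed value."""
    return sum(b <= value for b in background) / len(background)


if __name__ == "__main__":
    # Hypothetical Tajima's D values at other loci in the same sample.
    background_d = [-2.1, -1.0, -0.4, 0.0, 0.3, 0.5, 0.9, 1.2, 1.4, 1.8]
    observed_d = -2.0
    print("empirical p (lower tail):", empirical_lower_tail(observed_d, background_d))
```

Because every locus in the same sample shares the same demographic history, an outlier relative to this background is harder to explain by demography alone.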
Scores an excess of low-frequency alleles or dearth of low/high frequency alleles
Fay & Wu’s H (Justin Fay & Chung-I Wu, 2000): looks for an excess of derived alleles
(i.e. those not found in the ancestral state) at high frequency
Designed to look for hitchhiking after positive selection
Fu & Li methods (Yun-Xin Fu & Wen-Hsiung Li, 1993):
* Each test is designed to find skews at different windows of the frequency spectrum
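As an illustration of how Fay & Wu's H weights a different window of the spectrum, the sketch below computes H = θπ − θH from an unfolded site frequency spectrum, where θH gives extra weight to high-frequency derived alleles. It assumes the ancestral state at each site is known (for example from an outgroup); the function name and the two toy spectra are hypothetical.

```python
def fay_wu_h(spectrum, n):
    """Fay & Wu's H from an unfolded spectrum.

    spectrum: dict mapping derived-allele count i -> number of sites with that count.
    n: number of sequences sampled.
    """
    norm = n * (n - 1)
    theta_pi = sum(2 * sites * i * (n - i) for i, sites in spectrum.items()) / norm
    theta_h = sum(2 * sites * i * i for i, sites in spectrum.items()) / norm
    return theta_pi - theta_h


if __name__ == "__main__":
    # Two sites with the derived allele in 1 of 5 sequences, one site in 2 of 5.
    print("mostly rare derived alleles:  H =", round(fay_wu_h({1: 2, 2: 1}, 5), 3))
    # Same number of sites, but derived alleles at high frequency.
    print("high-frequency derived alleles: H =", round(fay_wu_h({4: 2, 3: 1}, 5), 3))
```

A strongly negative H reflects an excess of high-frequency derived alleles, the pattern expected from hitchhiking.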