Download Gen660_Lecture6B_MolEvo
The Neutral Theory
M. Kimura, 1968

Most observed DNA substitutions are likely to be neutral = no effect on fitness.
Polymorphisms arise through new mutations.
- deleterious mutations are rapidly removed by purifying selection, which must be pervasive
- positive selection is likely to be rare
- thus, the remaining observed polymorphisms are likely to be neutral
Given a ~constant mutation rate, the # of substitutions can be converted into time of divergence since speciation = molecular clock theory.
Neutral changes evolve by genetic drift, not natural selection.
* Most are probably lost, some can become fixed in the population

The Nearly-Neutral Theory
T. Ohta, 1973

Many slightly deleterious (or slightly advantageous) substitutions are not selected against efficiently if population sizes are large.
* Small populations are more subject to drift (e.g. random events).
* Selection is ‘slower’ in large populations … therefore many weakly deleterious substitutions have yet to be removed by selection.
** Therefore, the population size N is an important concept (or, more importantly, the ‘effective’ population size Ne; see more later)
Thus, many substitutions that are nearly neutral can evolve mostly by drift.
** Practically, this means that SOME substitutions found in extant sequences can be slightly deleterious & have yet to be removed

Theory of Darwinian Evolution (Selectionism)

This opposing theory states that much of inter-specific variation may NOT be neutral but selected for through positive selection.
Deleterious mutation: compensatory changes can be positively selected (& context dependent)
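The neutral-theory statement above, that most new neutral mutations are lost by drift while a few become fixed, can be illustrated with a small simulation. The sketch below is not from the lecture: it assumes a Wright-Fisher population (constant size, non-overlapping generations, random sampling of gene copies), and the names `wright_fisher_fate` and `fraction_fixed` are hypothetical. Under these assumptions the expected fixation probability of a single new neutral copy is 1/(2N).

```python
import random


def wright_fisher_fate(two_n, rng):
    """Follow one new neutral mutation until it is lost (0) or fixed (two_n copies)."""
    copies = 1  # a new mutation starts as a single gene copy
    while 0 < copies < two_n:
        freq = copies / two_n
        # Each gene copy of the next generation is drawn at random from the
        # current generation: pure drift, no selection term anywhere.
        copies = sum(rng.random() < freq for _ in range(two_n))
    return copies


def fraction_fixed(two_n, replicates, seed=0):
    """Fraction of independent new mutations that drift all the way to fixation."""
    rng = random.Random(seed)
    fixed = sum(wright_fisher_fate(two_n, rng) == two_n for _ in range(replicates))
    return fixed / replicates


if __name__ == "__main__":
    two_n = 40  # assumed toy population: 20 diploid individuals
    observed = fraction_fixed(two_n, replicates=2000)
    print("fraction fixed:", observed, "| neutral expectation 1/(2N):", 1 / two_n)
```

The population size and replicate count are arbitrary illustration values; larger populations make fixation of any single new neutral mutation rarer.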
The relative importance/frequency of neutral mutations vs. selected advantageous alleles is still debated.
Clearly both are at work, and their balance likely fluctuates over time (A. Wagner: cycles of neutral change & positive selection).
Much attention is devoted to cases of positive selection, since it shapes novelty.

Signatures of Selection

Different types of selection leave behind different signatures on the genome
Negative selection: reduces variation at the affected site(s) but also at neighboring ‘linked’ neutral sites through background selection
Positive selection through recent selective sweep: reduces variation flanking the selected site (even if neutral) due to hitchhiking
* background selection and hitchhiking are important concepts, because they mean not all neutral-site evolution is stochastic (drift)

Recent selective sweeps can reduce variation flanking a selected locus
Variation can recover over time due to new mutation and recombination.
Strength of selection & time since sweep affect the degree of reduced variation.
If the sweep goes to completion, the polymorphism is said to be ‘fixed’ in the species (otherwise a ‘partial sweep’).

Recombination rates have a major influence on selection signatures
[Figure: gametes]
After many generations, recombination acts to shorten the region of ‘linked’ sequence.

Signatures of Selection (continued)

Diversifying selection can increase variation since >1 extreme alleles are selected, e.g.
selection for diverse viral antigens to evade the host immune system
Balancing selection can increase variation by maintaining >1 allele in the population, e.g. maintained heterozygosity (sickle cell anemia) OR different alleles in different subpopulations due to fluctuating environments

Signatures of Selection

There are also different methods of looking for these signatures:
1. Evolutionary rate within species vs. between species
   e.g. Ka/Ks ratio … McDonald-Kreitman test … HKA test
2. Frequency spectrum: frequency of different alleles in the population
   e.g. Tajima’s D … Fay & Wu’s H … Fu & Li’s D*
3. Linkage disequilibrium & haplotype structure
For all of these tests: compare REAL DATA to a MODEL of what the data should look like under neutral evolution

Ks (sometimes called dS): # of silent substitutions, i.e. the codon still encodes the SAME amino acid (synonymous)
* often these changes are ASSUMED to be neutral**
* given a constant rate of point mutations, Ks can be used to date a sequence
** now people realize that Ks can also be constrained by other things besides codon
Ka (sometimes called dN): # of substitutions that change the encoded amino acid (nonsynonymous)

If we assume that Ks reflects the underlying neutral rate of change:
Ka/Ks = 1 … Rate of codon changes is the same as the rate of silent changes
* taken to mean NO constraint on gene sequence
Ka/Ks < 1 … Rate of codon changes is LESS than the rate of neutral change
* implies deleterious codon changes were removed by purifying selection
* therefore implies constraint on gene sequence
Ka/Ks > 1 …
Rate of codon changes is GREATER than the rate of silent changes
* implies codon changes have been selected for by positive selection
The Ka/Ks test is VERY conservative in identifying positive selection

Polymorphism within species vs. Divergence between species

Under neutrality, expect the evolutionary rate of a gene to be constant.
Therefore, the evolutionary rate within the species should ~ the rate between species.

McDonald-Kreitman (MK) Test for Coding Regions
McDonald & Kreitman, 1991

Ka/Ks within species should be ~ Ka/Ks between species for neutrally evolving sequences
Table layout: rows = Polymorphic (within-species differences) and Fixed (between-species differences); columns = number of non-synonymous changes and number of synonymous changes

Example (polymorphism data plus outgroup sequence(s)):
3 polymorphic sites within species (2 non-synonymous, 1 synonymous)
4 fixed differences between species (1 non-synonymous, 3 synonymous)
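The MK comparison reduces to a 2x2 contingency table, so it can be sketched in a few lines. The code below is an illustration rather than the lecture's own implementation: the function names `fisher_exact_two_sided` and `mk_test` are hypothetical, and the two-sided p-value uses the common convention of summing all tables no more probable than the observed one. The demo uses the small example counts above (2 non-synonymous and 1 synonymous polymorphisms; 1 non-synonymous and 3 synonymous fixed differences).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with margins fixed.
        return comb(col1, k) * comb(total - col1, row1 - k) / comb(total, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - total), min(row1, col1)
    # Sum every possible table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def mk_test(poly_nonsyn, poly_syn, fixed_nonsyn, fixed_syn):
    """Return (within-species ratio, between-species ratio, Fisher p-value).

    Assumes both synonymous counts are non-zero, otherwise the ratios are undefined.
    """
    within = poly_nonsyn / poly_syn
    between = fixed_nonsyn / fixed_syn
    p_value = fisher_exact_two_sided(poly_nonsyn, poly_syn, fixed_nonsyn, fixed_syn)
    return within, between, p_value


if __name__ == "__main__":
    within, between, p_value = mk_test(2, 1, 1, 3)
    print(f"within = {within:.2f}, between = {between:.2f}, p = {p_value:.3f}")
```

With so few sites the test has little power; real applications pool counts over a whole gene or many genes.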
McDonald-Kreitman (MK) Test for Coding Regions: worked example
McDonald & Kreitman, 1991

Ka/Ks within species should be ~ Ka/Ks between species for neutrally evolving sequences

                                           Non-synonymous   Synonymous
Polymorphic (within-species differences)         2              42
Fixed (between-species differences)              7              17

Ratio within species (2/42 = 0.048) is LESS than the ratio between species (7/17 = 0.41)
Is this significant? Use statistics to assign a p-value (G-test or Fisher’s exact test); here p = 0.007

MK test for Positive Selection

If the rate of evolution is ~equal WITHIN and BETWEEN species (i.e. insignificant p-value): suggests Neutral Evolution.
If the rate of evolution WITHIN species is LESS than the rate BETWEEN species:
- could be Positive Selection within the species being considered, because variation has been reduced within the species due to a sweep
- could also be relaxed constraint in the outgroup
If the rate of evolution WITHIN species is GREATER than the rate BETWEEN species, it could be several things:
- loss of selective constraint: no more purifying selection to remove variation
- diversifying selection: selection to increase variation
- balancing selection: may be selection to maintain >1 allele
- * OR can actually reflect deleterious alleles yet to be removed from the population
* Deleterious alleles are likely at low frequency
- can redo the test only on polymorphisms at medium frequency (e.g.
>10% in population)

A related approach generalized beyond coding sequences: The HKA test
Hudson, Kreitman, & Aguadé, 1987

Assumes that the # of segregating sites S within the species is related to the # of divergent sites D between species (dependent on the mutation rate μ and the time T between species)
HKA compares estimated parameters at 2 (or more) genes:
θ = 4 Ne μ
Ne = ‘effective’ population size (accounting for skews in pop. dynamics, sex ratios, breeding skews)
μ = mutation rate per gene per generation
Neutral model: all genes share the same Ne and have their own μ

HKA
Neutral model: all genes have their own μ and should share the same Ne
Gene 1: S (# variable sites WITHIN species 1) = 3
Gene 1: D (# variable sites fixed BETWEEN species) = 4
θ = 4 Ne μ, where θ is calculated from S1 and μ = D1/2T
Gene 2: S (# variable sites WITHIN species 1) = 8
Gene 2: D (# variable sites fixed BETWEEN species) = 9
θ = 4 Ne μ, where θ is calculated from S2 and μ = D2/2T
If Ne is the same at both genes, can’t reject the null model.
But if Ne is different, this indicates each gene has a distinct history: reject the null.
p-values come from coalescent simulations

We can imagine that the choice of genes could have a big impact here …
Multi-locus HKA test (Wright & Charlesworth, Genetics 2004): uses maximum likelihood (ML) to compare many different genes (software available)
Compares the log-likelihood of a model in which ALL genes are evolving neutrally vs. a model in which one or more chosen genes are evolving non-neutrally compared to the rest.

We want to USE the signatures to IDENTIFY loci subject to selection BUT …
some of these signatures can emerge due to demography:
* unknown population structure
* population expansions that change allele frequencies and change the selection efficiency
* population contractions (‘bottlenecks’) that eliminate genetic variation
This violates assumptions for many neutral models:
* Random mating within population (panmictic)
* Mating rate according to Poisson distribution
* Constant population size

Population structure can mimic balancing selection
This might appear as an excess of intermediate-frequency alleles … when in fact it’s really two distinct, non-mating populations

Changes in population size can mimic selection as well
Population expansion can mimic positive selection (increasing frequency of rare alleles)
Historical bottleneck can mimic balancing selection (excess of intermediate-frequency alleles)
The power of genome-wide scans: demographic effects should be constant across each genome.

Methods based on the Allele Frequency Spectrum

1. For each allele at a given locus, calculate the frequency. Some alleles will be at high frequencies in the population, some at low frequencies (i.e. very uncommon)
2. Make a histogram of the % individuals that have alleles of different frequencies
   e.g. how many individuals have (any of the) low-frequency alleles?
[Figure from Nielsen Nat Rev Gen 2005 review]
[Figure from “A Haplotype Map of the Human Genome” Nature 2005]

Tajima’s D (F. Tajima, 1989): takes the # of segregating sites within species (S) and also the average # of differences between each pair of sequences (π)
S = 3
π = [(2 + 2 + 1 + 2) + (2 + 1 + 0) + (1 + 2) + (1)] / 10 = 1.4
10 pairwise comparisons; π = avg.
# difs between each pair of sequences
Tajima’s D compares S and π to estimate the proportion of low/high-frequency alleles

Methods based on the Allele Frequency Spectrum

Negative Tajima’s D = excess of low-frequency alleles (= reduced variation)
Indicates positive selection, OR nearly-neutral alleles, OR population expansion**
Positive Tajima’s D = excess of intermediate-frequency alleles (low amounts of both high- and low-frequency alleles)
Indicates balancing selection OR partial sweep OR population bottleneck**
How can you get a p-value? Difficult to estimate; best to compare across loci

Empirical model for significance of Tajima’s D
Sliding window across a locus (from Nielsen Nat Rev Gen 2005 review)
OR
Compare to several other loci (from Will et al. PLoS Genetics 2010)

Methods based on the Allele Frequency Spectrum

Tajima’s D (F. Tajima, 1989): scores an excess of low-frequency alleles or dearth of low/high frequency alleles
Fay & Wu’s H (Justin Fay & Chung-I Wu, 2000): looks for an excess of derived alleles (i.e. those not found in the ancestral state) at high frequency
Designed to look for hitchhiking after positive selection
Fu & Li methods (Yun-Xin Fu & Wen-Hsiung Li, 1993)
* Each test is designed to find skews at different windows of the frequency spectrum
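The Tajima's D slides above describe the statistic only in words, so here is a minimal sketch of the calculation. It uses the standard constants from Tajima (1989); the function name `tajimas_d` is hypothetical, and the five-sequence alignment is an assumed toy example chosen so that it reproduces the slide's numbers (S = 3, ten pairwise comparisons, π = 1.4). It is not the lecture's actual figure.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (S, pi, D) for a list of equal-length aligned sequences."""
    n = len(seqs)
    # S: number of alignment columns that carry more than one allele.
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    # pi: average number of differences over all n*(n-1)/2 sequence pairs.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    if s == 0:
        return s, pi, None  # D is undefined without segregating sites

    # Constants from Tajima (1989); they depend only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D contrasts pi with S/a1, two estimates of theta that agree under neutrality.
    d = (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
    return s, pi, d


if __name__ == "__main__":
    # Assumed toy alignment (hypothetical), built to match S = 3 and pi = 1.4.
    toy = ["TAA", "ATA", "AAT", "AAA", "ATA"]
    s, pi, d = tajimas_d(toy)
    print("S =", s, "| pi =", pi, "| D =", round(d, 3))
```

As the slides note, a single D value is hard to attach a p-value to; in practice it is compared against other loci or against simulations under a demographic model.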