Population structure

• The evolutionary significance of structure
• Detecting and describing structure
  – Wright’s F statistics
• Implications for genetic variability
  – Inbreeding effects of structure
  – The Wahlund effect
  – Drift and founder effects
• Island models of population structure
  – Identity by descent
  – Diffusion methods
  – The coalescent with structure
• Selection in subdivided populations
  – Local adaptation
  – Clines
  – Wright’s Shifting-Balance theory

Copyright: Gilean McVean, 2001

Population structure

• Non-random location
• Non-random mating
[Figure: distribution of the surname Hannah. Goodacre and Sykes]
Genetic and phenotypic divergence due to:
• Chance
• Selection
• Selection plus chance

Detecting and describing genetic structure

Wright’s F_ST statistic:
  $F_{ST} = \frac{H_T - H_S}{H_T}$
  where $H_T$ is the heterozygosity over all populations and $H_S$ is the average heterozygosity within subpopulations.
• Testing by permutation

The hierarchical nature of F statistics

• F statistics can be used to contrast structure at different levels, e.g.
  $F_{IS} = \frac{H_S - H_I}{H_S}$
  where $H_I$ is the average within-individual heterozygosity; $F_{IS}$ is a measure of inbreeding.
  $H_{Individual} < H_{Subpopulation} < H_{Population} < H_{Region} < H_{Total}$

FST in natural populations

Allozymes (Nei 1975)
  Organism                    H_T      H_S      F_ST
  Human (major races)         0.130    0.121    0.069
  Human (Yanomama)            0.039    0.036    0.077
  House mouse                 0.097    0.086    0.113
  Jumping rodent              0.037    0.012    0.676

SNPs
  Organism                    H_T      H_S      F_ST
  Human (major races)         0.195    0.201    0.067
  Drosophila melanogaster (a) 0.0154   0.0151   0.023
  (a) Based on pairwise diversity

The inbreeding effect of population structure

• Differences in allele frequency between populations lead to an excess of homozygotes
  Expected homozygosity (Hardy-Weinberg equilibrium): $\bar{p}^2 + \bar{q}^2$
  Observed homozygosity (combined samples): $\bar{p}^2 + \bar{q}^2 + \sigma_p^2 + \sigma_q^2$
  Heterozygosity = 1 − Homozygosity
  $F_{ST} = \frac{F_S - F_T}{1 - F_T} = \frac{\sigma_p^2 + \sigma_q^2}{1 - \bar{p}^2 - \bar{q}^2}$

The Wahlund effect

• Increase in heterozygosity following mixing of isolated populations
  [Figure: isolated populations are combined, followed by random mating]
• Medical implications for disease incidence in admixed populations
  – Recessive disease reduced by mixing

  Disease              High risk population   Disease allele frequency
  Cystic fibrosis      Caucasians             0.022
  Albinism             Hopi                   0.07
  Tay-Sachs disease    Ashkenazi Jews         0.013

Differences between allozymes and DNA?

• American oysters (Crassostrea virginica)
  [Figure: two panels, Allozymes and DNA/mtDNA, each on a 0 to 1 scale across sampling locations MA, SC, GA, FL and LA. Avise (1994)]

Differences between allozymes?

Checkerspot butterfly, Euphydryas editha (McKechnie et al. 1975)
  Locus     F_ST
  pgm       0.028
  pgi       0.052
  got       0.017
  ak        0.062
  bdh       0.034
  α-gpdh    0.027
  to        0.035
  hk        0.291   (unusually high differentiation)

Problems with FST

• Arbitrary a priori choice of structure to test
• High sampling variance when polymorphism is low
• Throws away much information

Population genetics models of structure

• Quantify relationship between genetic drift, selection and population differentiation
  [Figure: island model; n-island model]
• Assumptions
  – Infinite mainland population (island)
  – Equal population size (n-island)
  – Constant population size
  – Proportion m of population replaced by migrants each generation
  – Symmetric migration (n-island)

Identity by descent in the island model

  Event               Identity    Probability
  Same parent         1           $1/(2N_e)$
  Different parents   $f_{t-1}$   $1 - 1/(2N_e) - 2m$
  Migration           0           $2m$

At equilibrium:
  $f = \frac{1}{1 + 4N_e m}$
  $4N_e m$ = 2 × number of migrants per generation
Only a few migrants each generation are required to prevent a build-up of identity within the island population.

Relationship between FST and migration rate

  $E[F_{ST}] \approx \frac{1}{1 + 4N_e m}$
• Can estimate scaled migration rate from estimated FST (assuming equilibrium, etc.)
  [Figure: $N_e m$ (log scale, 0.01 to 100) against $F_{ST}$ (0 to 1)]
E.g. in humans, $F_{ST} \approx 0.067$ gives $N_e m \approx 3.5$
NB: This is NOT a good estimator; do not trust the answer!

Wright’s diffusion model for allele frequencies with migration

Mainland frequency = $x_m$; island frequency = $x$
  Deterministic: $M_{\delta x} = m(x_m - x)$
  Drift: $V_{\delta x} = \frac{x(1 - x)}{2N_e}$
Wright (1951)
[Figure: probability density of the allele frequency on the island for $4N_e m = 10$ and $4N_e m = 0.2$, with allele frequency on mainland = 0.5]

Example: SNP frequencies in African Americans

• Goddard et al.
(2000)
  – 114 SNPs in 33 genes
  – 190 African Americans sampled
  [Figure: African American frequency against worldwide frequency, both on a 0 to 1 scale]
• Likelihood estimation of $N_e m$ from sample
  – assume independence between SNPs
  [Figure: ΔLn(L) against $N_e m$, annotated $N_e m = 5.0$]

The coalescent in structured populations

• Two-island model (Population 1, Population 2)
  $\Pr\{\text{coalescence}\} = \frac{n_i(n_i - 1)}{4N_e}$
  $\Pr\{\text{migration}\} = n_i m$

The time to coalescence for two sequences sampled from the same population

  $\Pr\{\text{1st event is a coalescence}\} = \frac{1/(2N_e)}{1/(2N_e) + 2m} = \frac{1}{1 + 4N_e m}$
  $\Pr\{\text{1st event is a migration}\} = \frac{2m}{1/(2N_e) + 2m} = \frac{4N_e m}{1 + 4N_e m}$
  Expected time to coalescence = $4N_e$
For expected pairwise diversity (within population), two populations of size $N_e$ ≡ a single population of size $2N_e$
BUT variance is affected by population structure
[Figure: distribution of average pairwise differences, single population versus subdivided ($4N_e m = 0.2$)]

Effect on allele frequency spectrum

[Figure: genealogy with rapid coalescence within populations and slow coalescence between populations, giving mutations at high frequency]
[Figure: frequency of derived allele, single population versus subdivided ($4N_e m = 0.1$)]

Effect on neutrality statistics within populations

• Tajima’s D statistic
  [Figure: distribution of D, single population versus subdivided ($4N_e m = 0.2$)]
• Fu and Li D statistic
  [Figure: distribution of D, single population versus subdivided ($4N_e m = 0.2$)]
Main effect is to increase the variance
Other statistics (e.g.
Fay and Wu, 2000) more sensitive

Effect on polymorphism between populations

• Tajima’s D statistic
  [Figure: distribution of D, single population versus subdivided ($4N_e m = 0.2$)]
• Frequency distribution
  [Figure: allele frequency distribution, single population versus subdivided ($4N_e m = 0.2$)]

Effect on linkage disequilibrium

• Linkage disequilibrium measures correlations between alleles at different loci
  $D = f_{AB} - f_A f_B$
• Population structure increases linkage disequilibrium between linked loci
  [Figure: distribution of $r^2$ with $4N_e r = 1$, single population versus subdivided ($4N_e m = 0.1$)]
• Population structure creates linkage disequilibrium between unlinked loci in different populations
  Population 1: $f_A = 0.2$, $f_B = 0.8$, $D = 0$
  Population 2: $f_A = 0.8$, $f_B = 0.2$, $D = 0$
  Naive analysis of the admixed sample: $D = -0.09$

Admixture dynamics

• Combination of two previously separated populations
  $f_A^1 - f_A^2 = \delta_A$
  $f_B^1 - f_B^2 = \delta_B$
• Over time random mating returns population to equilibrium
  $D_0 = \frac{1}{4}\delta_A \delta_B$
  $D_t = D_0 (1 - r)^t$
• Disequilibrium between unlinked loci can persist for several generations, while Hardy-Weinberg equilibrium is reached after a single generation of random mating
  [Figure: $D_t / D_0$ against generation (0 to 10) for loci at 1 cM distance and for unlinked loci]

Selection in a subdivided population

• Maruyama (1970)
  – The fixation probability of an unconditionally beneficial mutation is unaffected by population structure ($P_{fix} \approx 2s$)
• Levene (1953)
  – Environmental heterogeneity can maintain genetic polymorphism
  [Figure: two panels, each labelled “favoured on”]
• BUT
  – If migration is high, selection has to be strong and finely balanced to habitat frequencies to maintain polymorphism
• Low migration rates can promote local adaptation
  – Heavy metal tolerance in plants
  – Melanism in the peppered moth
  – Milk tolerance in humans

Selection at different scales

• Evidence for local adaptation from gradients in allele frequency: clines
• Continental clines in Adh activity and
allozyme variation in Drosophila
  Driven by scale of environmental heterogeneity
  [Figure: frequency of ∇1 and F/S against latitude (22 to 47). Berry & Kreitman (1993)]
• Clines in genetic and morphological characters in the toad Bombina
  Balance between selection against hybrids and migration, following secondary contact
  [Figure: frequency of B. variegata morphological and genetic characters against distance (√km). Szymura & Barton (1991)]

Indirect evidence for local adaptation?

• Local hitch-hiking?
  [Figure: microsatellite diversity by locus for India, China, Zimbabwe and Antilles. Schlötterer et al. (1997)]
• But the structured coalescent also leads to variation in coalescence times

The interaction between selection, gene flow and genetic drift

• Wright’s Shifting Balance theory
• Epistasis between alleles at different loci
  [Figure: fitness grid for genotypes at locus 1 (AA, Aa, aa) and locus 2 (BB, Bb, bb), with cells marked “least fit” and “most fit”]
• The adaptive landscape
  – Epistasis creates adaptive valleys between peaks of fitness
  [Figure: population fitness surface over frequency of allele A and frequency of allele B, showing the starting point of the population and an adaptive valley]

The Shifting Balance theory

• Drift allows a population to cross an adaptive valley due to stochastic processes in finite populations
• Subpopulations are natural experiments, allowing species to evolve across complex adaptive landscapes
• Evidence for widespread epistasis?
  – F2 hybrid breakdown
  – Coadapted gene complexes
• Theoretical issues
  – Very difficult for a subpopulation that has crossed a valley to spread throughout the rest of the population
  – The interaction between epistatic selection and genetic drift may be important in reproductive isolation
    • e.g.
      recessive epistatic interactions important in Haldane’s rule of unisexual hybrid sterility

Future directions

• Theoretical and statistical issues
  – Methods for discriminating between local adaptation and chance effects of coalescence in a structured population
  – The relationship between population structure and linkage disequilibrium
  – Selection on polygenic traits in subdivided populations
• Empirical challenges
  – Describing patterns of gene diversity at many loci across genomes (from a well-chosen sample)
  – Comparing differentiation for different types of mutation (e.g. silent v replacement)
  – Mapping genetic variation to phenotypic variation
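
Worked example: FST from heterozygosities

The definition F_ST = (H_T − H_S) / H_T from the "Detecting and describing genetic structure" slide can be checked against the allozyme table (Nei 1975) with a few lines of Python. This is only a sketch: the function name `f_st` and the dictionary layout are illustrative choices, not part of the original slides, and the H_T and H_S values are copied from the table as given.

```python
def f_st(h_total: float, h_sub: float) -> float:
    """Wright's F_ST = (H_T - H_S) / H_T."""
    if h_total <= 0:
        raise ValueError("H_T must be positive")
    return (h_total - h_sub) / h_total


# (H_T, H_S) pairs copied from the allozyme table in the slides.
allozymes = {
    "Human (major races)": (0.130, 0.121),
    "Human (Yanomama)": (0.039, 0.036),
    "House mouse": (0.097, 0.086),
    "Jumping rodent": (0.037, 0.012),
}

# Recompute F_ST for each organism from its two heterozygosities.
fst_values = {name: f_st(ht, hs) for name, (ht, hs) in allozymes.items()}

for name, value in fst_values.items():
    print(f"{name}: F_ST = {value:.3f}")
```

Rounded to three decimals, the recomputed values agree with the F_ST column of the allozyme table.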
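
Worked example: testing FST by permutation

The slides mention "testing by permutation" without giving details. One common way to do this, sketched here under stated assumptions, is to shuffle the subpopulation labels and ask how often the shuffled F_ST is at least as large as the observed one. The data below are hypothetical (a single biallelic locus coded 0/1, two subpopulations of 20 gene copies each), heterozygosity is taken as 2p(1 − p), and the function names are illustrative.

```python
import random


def heterozygosity(alleles):
    """Expected heterozygosity 2p(1-p) at a biallelic locus coded 0/1."""
    p = sum(alleles) / len(alleles)
    return 2.0 * p * (1.0 - p)


def f_st(pop_a, pop_b):
    """F_ST = (H_T - H_S) / H_T for two equally weighted subpopulations."""
    h_s = (heterozygosity(pop_a) + heterozygosity(pop_b)) / 2.0
    # H_T is computed from the mean of the two subpopulation frequencies.
    p_bar = (sum(pop_a) / len(pop_a) + sum(pop_b) / len(pop_b)) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def permutation_p_value(pop_a, pop_b, n_perm=1000, seed=0):
    """Shuffle gene copies between subpopulations; return (F_ST, p-value)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = f_st(pop_a, pop_b)
    pooled = list(pop_a) + list(pop_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if f_st(pooled[:len(pop_a)], pooled[len(pop_a):]) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical sample: allele 1 is common in one subpopulation, rare in the other.
pop_a = [1] * 18 + [0] * 2
pop_b = [1] * 2 + [0] * 18

observed_fst, p_value = permutation_p_value(pop_a, pop_b)
print(f"observed F_ST = {observed_fst:.2f}, permutation p = {p_value:.4f}")
```

Shuffling gene copies rather than individuals is a simplification; with diploid genotype data one would usually permute whole individuals so that within-individual structure is preserved.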
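
Worked example: identity by descent and the scaled migration rate

The table on the "Identity by descent in the island model" slide implies the recursion f_t = 1/(2N_e) + (1 − 1/(2N_e) − 2m) f_{t−1}, whose fixed point is f = 1/(1 + 4N_e m). The sketch below iterates that recursion numerically and also inverts E[F_ST] ≈ 1/(1 + 4N_e m) for the human value quoted in the slides. The values N_e = 1000 and m = 0.001 are hypothetical, chosen only for illustration, and the function names are not from the source.

```python
def identity_recursion(n_e, m, generations, f0=0.0):
    """Iterate f_t = 1/(2N_e) + (1 - 1/(2N_e) - 2m) * f_{t-1}."""
    f = f0
    for _ in range(generations):
        f = 1.0 / (2 * n_e) + (1.0 - 1.0 / (2 * n_e) - 2.0 * m) * f
    return f


def equilibrium_identity(n_e, m):
    """Equilibrium identity by descent, f = 1 / (1 + 4 N_e m)."""
    return 1.0 / (1.0 + 4.0 * n_e * m)


def scaled_migration_from_fst(fst):
    """Invert E[F_ST] ~ 1 / (1 + 4 N_e m) to get N_e m."""
    return (1.0 / fst - 1.0) / 4.0


# Hypothetical parameters, chosen only for illustration (4 N_e m = 4).
n_e, m = 1000, 0.001
f_iterated = identity_recursion(n_e, m, generations=20000)
f_equilibrium = equilibrium_identity(n_e, m)
print(f"iterated f = {f_iterated:.4f}, equilibrium f = {f_equilibrium:.4f}")

# Human F_ST quoted in the slides.
n_e_m_human = scaled_migration_from_fst(0.067)
print(f"N_e m from F_ST = 0.067: {n_e_m_human:.2f}")
```

As the slides warn, the inversion is not a good estimator of migration; the code only reproduces the arithmetic behind the quoted N_e m ≈ 3.5.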
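
Worked example: expected coalescence time in the two-island model

The slide on the time to coalescence for two sequences from the same population can be checked with a first-step argument. This is a sketch that assumes the rates used in the slides (coalescence at rate 1/(2N_e) when both lineages share a population, migration at rate m per lineage) and a symmetric two-island model. Writing T_same and T_diff for the expected times when the two lineages are in the same or in different populations gives two linear equations, solved below. The function name and parameter values are illustrative.

```python
def expected_coalescence_times(n_e, m):
    """Return (T_same, T_diff) in generations for two lineages.

    First-step equations, with c = 1/(2 N_e) and g = 2m:
        T_same = 1/(c + g) + g/(c + g) * T_diff
        T_diff = 1/g + T_same
    """
    if m <= 0:
        raise ValueError("m must be positive for the lineages to ever meet")
    c = 1.0 / (2.0 * n_e)  # coalescence rate, both lineages in one population
    g = 2.0 * m            # rate at which either of the two lineages migrates
    total = c + g
    # Substitute T_diff into the first equation and solve for T_same.
    t_same = (1.0 / total + (g / total) * (1.0 / g)) / (1.0 - g / total)
    t_diff = 1.0 / g + t_same
    return t_same, t_diff


n_e = 1000  # hypothetical population size, for illustration only
results = {m: expected_coalescence_times(n_e, m) for m in (1e-5, 1e-4, 1e-3)}

for m, (t_same, t_diff) in results.items():
    print(f"m = {m:g}: T_same = {t_same:.1f}, T_diff = {t_diff:.1f}")
```

Under these assumptions T_same does not depend on m, while T_diff grows as migration becomes rarer, which is in line with the slide's point that the mean within-population diversity is unchanged but its variance is affected by structure.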
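
Worked example: linkage disequilibrium created by admixture

The numbers on the "Effect on linkage disequilibrium" and "Admixture dynamics" slides can be reproduced directly from D = f_AB − f_A f_B, D_0 = (1/4) δ_A δ_B and D_t = D_0 (1 − r)^t. The sketch assumes a 50:50 mixture of the two populations and uses r = 0.5 for unlinked loci and r = 0.01 for loci 1 cM apart; the function names are illustrative.

```python
def pooled_disequilibrium(f_a1, f_b1, f_a2, f_b2):
    """D in a 50:50 mixture of two populations that each have D = 0."""
    f_a = (f_a1 + f_a2) / 2.0
    f_b = (f_b1 + f_b2) / 2.0
    # With D = 0 inside each population, f_AB there is just f_A * f_B.
    f_ab = (f_a1 * f_b1 + f_a2 * f_b2) / 2.0
    return f_ab - f_a * f_b


def initial_disequilibrium(delta_a, delta_b):
    """D_0 = (1/4) * delta_A * delta_B."""
    return 0.25 * delta_a * delta_b


def disequilibrium_after(d0, r, t):
    """D_t = D_0 * (1 - r)^t after t generations of random mating."""
    return d0 * (1.0 - r) ** t


# Allele frequencies from the slide: (f_A, f_B) = (0.2, 0.8) and (0.8, 0.2).
d_pooled = pooled_disequilibrium(0.2, 0.8, 0.8, 0.2)
d0 = initial_disequilibrium(0.2 - 0.8, 0.8 - 0.2)
print(f"pooled D = {d_pooled:.2f}, D_0 from deltas = {d0:.2f}")

# Fraction of the initial disequilibrium left after 10 generations.
ratio_unlinked = disequilibrium_after(d0, 0.5, 10) / d0
ratio_1cm = disequilibrium_after(d0, 0.01, 10) / d0
print(f"D_10 / D_0: unlinked = {ratio_unlinked:.4f}, 1 cM = {ratio_1cm:.3f}")
```

With these allele labels the disequilibrium is negative, with magnitude 0.09; the sign only reflects which alleles are called A and B.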