Demographic events

Measuring Genetic Diversity
• Theta: θ = 4Nµ, or 4Nm, or 4N(µ+m), where µ is the mutation rate and m is the migration rate.
• For haploid markers: θ = 2Nµ, or 2Nm, or 2N(µ+m).
• The all-important population genetic parameter.
• It is based on the number of alleles or the number of different nucleotides in a given sample.
• It quantifies the genetic diversity of a given population.

Theta (θ) Hom
• Estimated from the expected homozygosity (Zouros, 1979; Chakraborty and Weiss, 1991) in a population at equilibrium between drift and mutation.
• Sensitive to small sample and allele sizes.
• For microsatellite data.

Theta (θ) S
• Estimated from the infinite-site equilibrium relationship (Watterson, 1975) between the number of segregating sites (S), the sample size (n) and θ for a sample of non-recombining DNA.

Theta (θ) k
• Estimated from the infinite-allele equilibrium relationship (Ewens, 1972) between the expected number of alleles (k), the sample size (n) and θ.
• 95% confidence limits are calculated using Stirling numbers (the coefficients in the expansion of a falling factorial).

Theta (θ) π̂
• Estimated from the infinite-site equilibrium relationship (Tajima, 1983) between the mean number of pairwise differences (π̂) and theta (θ).

Why so many θ measures?
• Not all methods are suitable for all types of data.
• Ultimately, all methods should result in the same estimate of theta.
• Differences between estimates can be interpreted as violations of assumptions, and each method is sensitive to different assumptions.

Tajima’s D
• Tajima’s (1989) D test quantifies the discordance between the estimate of theta from the number of segregating sites and the estimate from the average pairwise sequence divergence.
• Negative values are interpreted as a signal of purifying selection or, alternatively, of demographic expansion.

Fu’s Fs
• Fu’s (1997) Fs measures the probability of observing a certain number of haplotypes given a particular value of θ. The test looks at the discordance between values of θ derived from the number of haplotypes and from the average pairwise sequence divergence.
• Negative values are interpreted as a signal of purifying selection or, alternatively, of demographic expansion.

Strobeck’s S
• Strobeck’s (1987) S measures the level of discordance between values of θ derived from the number of haplotypes and from the average pairwise sequence divergence.
• The expected number of alleles in a sample is an increasing function of the migration rates, whereas the expected average number of nucleotide differences is independent of the migration rates and equal to 4Nµ.
• Negative values are interpreted as a signal of population structure.

Differences in θ measures
• Have selective interpretations.
• Have demographic interpretations.

Inferring demographic change
• Demographic changes are changes in effective population size over time.
• Differences between θ summary statistics based on different population genetic measures will detect demographic changes.
• Distributions of allelic frequencies carry the signatures of demographic changes.
• Coalescent analysis of the data will recover signals of demographic changes.

Mismatch distribution
• A frequency graph of pairwise differences between alleles.
• It is usually multimodal in samples drawn from populations at demographic equilibrium (it reflects the highly stochastic shape of gene trees).
• It is usually unimodal in populations that have passed through a recent demographic expansion (Rogers and Harpending, 1992; Hudson and Slatkin, 1991) or through a range expansion with high levels of migration between neighboring demes (Ray et al. 2003, Excoffier 2004).

Mismatch distribution
• Multimodal

Mismatch distribution
• Unimodal

Mismatch distribution
• The mismatch distribution is a graphic way of visualizing the signature of an expansion.
• If there has been a population expansion, then theoretically we can calculate the population size before the expansion, the population size after the expansion, and the time at which the expansion happened.
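The θ estimators, Tajima’s D and the mismatch distribution described above can all be computed directly from a set of aligned haplotypes. The following is a minimal Python sketch, assuming equal-length aligned sequences with no gaps or missing data. The function name `diversity_summary` and the toy input are hypothetical, and the variance constants are assumed to be those of Tajima (1989).

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def diversity_summary(seqs):
    """Summarise aligned, equal-length haplotypes (no gaps or missing data assumed)."""
    n = len(seqs)
    # S: number of alignment columns showing more than one state.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # Number of differences for every pair of haplotypes.
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    pi_hat = sum(diffs) / len(diffs)          # mean number of pairwise differences
    mismatch = dict(sorted(Counter(diffs).items()))  # differences -> number of pairs

    # Watterson's estimator: theta_S = S / a1.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_s = s / a1

    # Tajima's D: (pi_hat - theta_S) scaled by its estimated standard deviation.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    tajima_d = (pi_hat - theta_s) / sqrt(variance) if variance > 0 else float("nan")

    return {"S": s, "pi": pi_hat, "theta_S": theta_s,
            "tajima_D": tajima_d, "mismatch": mismatch}


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "ATTT"]   # hypothetical haplotypes
    print(diversity_summary(toy))
```

Plotting the `mismatch` counts as a bar chart gives the mismatch distribution, and comparing `pi` with `theta_S` shows the discordance that Tajima’s D summarises.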
Pure demographic expansion
• We assume that a population at equilibrium suddenly passed, τ generations ago, from a population size of N0 to N1.
• Under this model, the probability of observing S differences between two randomly chosen non-recombining haplotypes is a function of θ0, θ1 and τ.

Pure demographic expansion
• In a simplified analysis, θ1 is assumed to be ∞, i.e. no coalescent event since the expansion.
• In this case, θ0 and τ are estimated from m and v, the mean and variance of the mismatch distribution.
• This simplifying assumption tends to underestimate the time to expansion, but is a fast solution.
• This model is implemented in DnaSP.

Pure demographic expansion
• Alternatively, all three variables θ1, θ0 and τ can be solved for simultaneously using a generalized nonlinear least-squares approach.
• The objective is to change all three variables simultaneously so that the expected mismatch distribution fits the observed one as closely as possible (least squares minimizes the squared deviations between them).
• This approach is more exact, but it is computationally intensive.
• This model is implemented in Arlequin.

Pure demographic expansion
• Using the model implemented in Arlequin, we can also estimate the confidence intervals around all three variables.
• This is done by a parametric bootstrap.

Parametric bootstrap
• Assume some model plus some set of parameters, and simulate a new dataset.
• Simulate N datasets.
• For each new simulated dataset, calculate a new set of values.
• Plot a frequency distribution of the newly calculated values, and see where the actual values of the parameters are placed relative to the values derived from the simulated data.
• In this case we know the values of the three variables θ1, θ0 and τ, and using these we generate new datasets under the assumption of population expansion.

Test of demographic expansion
• Using the parametric bootstrap, we can place confidence intervals on the three variables θ1, θ0 and τ.
• The model used in the parametric bootstrap assumes population expansion, and we have not yet tested whether the data have a signature of population expansion.
• We use the parametric bootstrap.
• Statistic 1: the sum of squared deviations (SSD) between the observed and the expected mismatch distributions.
• Statistic 2: the raggedness index of the observed distribution.

SSD statistic
• Using the parametric bootstrap (we simulate a new dataset under the assumption of a demographic expansion, with values of θ1, θ0 and τ derived from the original data), we get a new mismatch distribution.
• We calculate the sum of squared deviations of the mismatch distribution for each bootstrapped dataset, and compare it to the sum of squared deviations of the actual dataset.
• The test statistic P is the proportion of bootstrapped datasets whose SSD is larger than or equal to the observed SSD.

Raggedness index statistic
• Using the parametric bootstrap (we simulate a new dataset under the assumption of a demographic expansion, with values of θ1, θ0 and τ derived from the original data), we get a new mismatch distribution.
• We calculate a summary statistic r based on the maximum number of mutational differences (d) and the frequencies of the allelic classes (x).
• The test statistic is the same as for SSD.

Test of demographic expansion
• Populations that have undergone demographic expansions are expected to have a smaller sum of squared deviations and a smaller raggedness index in their mismatch distributions than non-expanded populations.
• Therefore, based on the test statistics, what values of P would be considered significant (i.e. support the hypothesis of a demographic expansion)?

Mismatch distribution
• Multimodal

Mismatch distribution
• Unimodal

Test of demographic expansion
• Maximum likelihood approaches.
  – We assume some model of molecular evolution, and calculate the likelihood of our data under that model of evolution.
  – We estimate relevant parameters under this model.
  – The model of molecular evolution has to be known a priori.
• Bayesian inference.
  – We do not assume a particular model.
  – We divide our sampling throughout the duration of the coalescent and estimate relevant parameters.
  – Based on the distribution of parameter estimates, we infer a process (model).
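Returning to the mismatch-based estimates and the SSD and raggedness tests described above, the calculations can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the DnaSP or Arlequin implementation: the moment estimators are assumed to be θ0 = sqrt(v − m) and τ = m − θ0 (Rogers, 1995), the raggedness index is assumed to be r = Σ (x_i − x_(i−1))² for i = 1 to d+1 (Harpending, 1994), and P is taken as the proportion of simulated statistics at least as large as the observed one. All function names are hypothetical.

```python
from math import sqrt


def moment_estimates(m, v):
    """Return (theta0, tau) from the mean m and variance v of the mismatch
    distribution, assuming theta1 is infinite (assumed Rogers 1995 estimators)."""
    # When v <= m the square root is undefined; theta0 is set to 0 here,
    # which is a choice made for this sketch only.
    theta0 = sqrt(v - m) if v > m else 0.0
    return theta0, m - theta0


def raggedness(freqs):
    """Raggedness index r, where freqs[i] is the relative frequency x_i of
    pairs differing at i sites, for i = 0..d."""
    padded = list(freqs) + [0.0]              # x_(d+1) = 0 closes the sum
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))


def ssd(observed, expected):
    """Sum of squared deviations between two mismatch distributions."""
    return sum((o - e) ** 2 for o, e in zip(observed, expected))


def bootstrap_p(observed_stat, simulated_stats):
    """Proportion of simulated statistics >= the observed statistic."""
    hits = sum(1 for s in simulated_stats if s >= observed_stat)
    return hits / len(simulated_stats)


if __name__ == "__main__":
    print(moment_estimates(5.0, 9.0))
    print(raggedness([0.0, 0.5, 1 / 3, 1 / 6]))
    print(bootstrap_p(0.3, [0.1, 0.2, 0.3, 0.4]))
```

In a full analysis, `simulated_stats` would hold the SSD or raggedness values from datasets simulated under the expansion model; generating those datasets is outside the scope of this sketch.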
Test of demographic expansion
• Both maximum likelihood and Bayesian inference approaches allow the calculation of confidence intervals.
• Likelihoods and posterior probabilities are solved through the Markov Chain Monte Carlo (MCMC) algorithm, which is a kind of resampling algorithm.
• Likelihoods obtained under different models can be compared using standard model selection criteria, including the hierarchical Likelihood Ratio Test (hLRT), the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).

Maximum likelihood methods
• ML assumes a model of sequence evolution.
• ML attempts to answer the question: what is the probability that I would observe these data (a multiple sequence alignment), given a particular model of evolution (a tree and a process)?
• ML uses a ‘model’. This is justifiable, since molecular sequence data can be shown to have arisen according to an evolutionary process.

Maximum Likelihood - goal
• To estimate the probability that we would observe a particular dataset, given a phylogenetic tree and some notion of how the evolutionary process worked over time.

Probability of the data, given

    a b c d
    b a e f
    c e a g
    d c f a

and π = [a, c, g, t]

Bayesian inference methods

Demographic events
• Just as a population can undergo a demographic expansion, it can also undergo a demographic contraction.
• Severe demographic contractions are called bottlenecks.

Demographic declines
• Demographic declines should result in patterns opposite to those of demographic expansions.
• Three popular methodologies exist:
  – Heterozygosity method
  – Allele number vs. allele range method
  – Coalescent method

Demographic declines
• The heterozygosity method takes note of the fact that recently declined populations will have a relative excess of observed over expected heterozygosity.
• Why would this be true?
• Implemented in the program Bottleneck.

Demographic declines
• The allele number vs. allele range method takes note of the fact that recently declined populations will have fewer alleles than expected relative to the allele range.
• Why would this be true?
• Implemented in the program M value.

Demographic declines
• M = k/(R+1), where k is the number of alleles and R is the allele range.

Demographic declines
• The heterozygosity and allele number / allele range methods work only with microsatellite data.
• They model expected heterozygosity under three different models of microsatellite evolution:
  – IAM: infinite alleles model
  – SMM: stepwise mutation model
  – TPM: two-phase model

Demographic declines
• IAM (infinite alleles model): it assumes that every new mutation results in a new allele, and that there is no relationship between the newly generated allele and the parental allele.

Demographic declines
• SMM (stepwise mutation model): it assumes that every new mutation results in a new allele, and that this new allele is either one step larger or one step smaller than the parental allele.

Demographic declines
• TPM (two-phase model): this model is a mix of the SMM and IAM models. Some percentage of the mutations (~80%) are allowed to form according to the SMM, the rest according to the IAM. Some programs allow for the input of an average allele jump size.

Demographic declines
• The method implemented in M value also assumes a priori knowledge of the θ of the population prior to the population decline.
• Calculating θ from the current genetic diversity would result in a conservative estimate of population decline.
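The ratio M = k/(R+1) described above is simple to compute for each locus. The following is a minimal Python sketch, assuming that alleles are scored as numbers of repeat units, that k is the number of distinct alleles observed, and that R is the difference between the largest and smallest allele. The function names and the toy genotypes are hypothetical, and the simulation-based significance test is not reproduced here.

```python
def m_ratio(repeat_counts):
    """M = k / (R + 1) for one microsatellite locus.

    repeat_counts: allele sizes in repeat units (an assumption of this sketch).
    """
    k = len(set(repeat_counts))                            # number of distinct alleles
    allele_range = max(repeat_counts) - min(repeat_counts)  # R, in repeat units
    return k / (allele_range + 1)


def mean_m_ratio(loci):
    """Average M across loci; loci is a list of per-locus allele-size lists."""
    ratios = [m_ratio(alleles) for alleles in loci]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    gapped = [10, 11, 12, 15, 15, 10]   # sizes 13 and 14 are missing from the range
    full = [8, 9, 10, 9, 8]             # every size in the range is present
    print(m_ratio(gapped), m_ratio(full), mean_m_ratio([gapped, full]))
```

A locus with every size class in its range occupied gives M = 1, while gaps in the range, as expected after alleles are lost in a decline, pull M below 1.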
Demographic declines
• Harpia harpyja example:
• The average value of M for the 24 microsatellite loci was 0.84, a value significantly lower than that obtained under simulation of a pre-bottleneck population size (p = 0.026 using the genetic parameter θ of 2.24).
• We derived θ from estimated census sizes of 10^4 to 10^5 harpy eagle individuals, assuming that the effective number of individuals is equivalent to 1/10 of the census size, and that microsatellite mutation rate (µ) estimates range from 2.5 × 10^-3 to 5.6 × 10^-4.

Demographic declines
• Harpia harpyja example:
• When the parameter θ was estimated directly from the microsatellite data (θ = 1.50), the M value was not significant (p = 0.101).
• However, the θ calculated from the data itself is necessarily a lower-bound estimate if H. harpyja shows any population structure, or if the θ does not represent the original population prior to the reduction.

Demographic declines
• Different methods have different power to detect a bottleneck, and register a bottleneck event for different amounts of time.
• Heterozygosity methods are often more immediately sensitive, but they do not register a bottleneck for a very long time.
• Allele number / allele range methods tend to register an equally or slightly less severe bottleneck, but further back in time.

Demographic declines
• The coalescent method estimates parameters for the current and ancestral population sizes, and the time when the change in population size occurred.
• Implemented in the program MSvar.

Demographic declines
• Both populations experienced decreases in effective population size.
• Populations started decreasing 100-150 years ago.
• Reductions range from 25% to 60%.
[Figure: two panels plotting Ne against generations (0 to 25000), with past and present values marked.]

Population size
• Effective population size: a summary statistic representing some ideal number of individuals, based on some summary statistic.
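The link between census size, effective population size and θ used in the harpy eagle example earlier in this section can be checked with a few lines of arithmetic. The sketch below assumes the diploid form θ = 4Nµ given at the start of these notes and the 1/10 ratio of effective to census size stated in the example. The function name `theta_from_census` is hypothetical.

```python
def theta_from_census(census_size, mutation_rate, ne_fraction=0.1):
    """theta = 4 * Ne * mu for a diploid marker, with Ne = ne_fraction * census size.

    ne_fraction = 0.1 follows the 1/10 assumption stated in the example.
    """
    effective_size = census_size * ne_fraction
    return 4 * effective_size * mutation_rate


if __name__ == "__main__":
    # Lower census estimate (10^4) combined with the lower mutation rate (5.6e-4).
    print(round(theta_from_census(10**4, 5.6e-4), 2))
```

With a census size of 10^4 and µ = 5.6 × 10^-4, this reproduces the θ of 2.24 used in the simulation; larger census sizes or mutation rates within the stated ranges give larger values of θ.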
Population size
• Inbreeding effective population size: the ideal number of individuals that are contributing to the reproductive population. It can be calculated from pedigree information and, more commonly, from the coalescent properties of the observed gene tree.

Population size
• Variance effective population size: the ideal number of individuals that represents the sampling variance of the population. This can be either sampling across generations, or based on the variance in gene frequencies observed in the data (i.e. based on θ estimates).

Population size
• Different concepts of effective population size will result in different estimates of effective population size.
• The inbreeding effective population size is a backward-looking statistic, whereas the variance effective population size reflects recent demographic/population genetic processes influencing genetic systems.

Population size
• Large inbreeding effective population sizes and small variance effective population sizes are indicative of recently reduced genetic variation due to decreases in population size or habitat fragmentation (Gerber & Templeton, 1996).
• In contrast, with a rapid increase in population size, theory predicts a small inbreeding effective population size and large variance and eigenvalue effective sizes (Templeton, 1980).

Population size
• There is therefore no simple relationship between effective population sizes and census sizes, although it is often claimed that in stable populations at equilibrium there is a 1:10 relationship between inbreeding effective and census population sizes.