Lecture 3: Allele Frequencies and Hardy-Weinberg Equilibrium
August 24, 2015

Last Time
◆ Review of genetic variation and Mendelian Genetics
  Ø Sample calculations for Mendelian expectations: see solutions in Excel file on website
◆ Methods for detecting variation
  Ø Morphology
  Ø Allozymes
  Ø DNA Markers (deferred to Friday: Guest lecture)
    ➤ Anonymous
    ➤ Sequence-tagged

Today
◆ Introduction to statistical distributions
◆ Estimating allele frequencies
◆ Introduction to Hardy-Weinberg Equilibrium
◆ Using Hardy-Weinberg: Estimating allele frequencies for dominant loci

Statistical Distributions: Normal Distribution
◆ Many types of estimates follow a normal distribution
  Ø Can be visualized as a frequency distribution (histogram)
  Ø Can be interpreted as a probability density function
[Figure: normal distribution with intervals of 1 sd and 2 sd marked]

Expected Value (Mean):
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where n is the number of samples

Variance (Vx): A measure of the dispersion around the mean:
$V_x = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

Standard Deviation (sd): A measure of dispersion around the mean that is on the same scale as the mean:
$sd = \sqrt{V_x}$

Standard Error of Mean
◆ Standard Deviation is a measure of how individual points differ from the mean estimate in a single sample
◆ Standard Error is a measure of how much the estimate differs from the true parameter value (in the case of means, µ)
  Ø If you repeated the experiment, how close would you expect the mean estimate to be to your previous estimate?
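As a minimal sketch of the mean, variance, and standard deviation formulas from the normal distribution slide, the calculation can be written out in Python. The data values are hypothetical and are not from the lecture; the cross-check uses the standard library's `statistics` module, which also divides by n - 1.

```python
import math
import statistics

# Hypothetical measurements (not from the lecture), for illustration only.
samples = [4.1, 5.0, 5.6, 4.7, 5.3, 4.9]

n = len(samples)
mean = sum(samples) / n                                      # x-bar = (1/n) * sum(x_i)
variance = sum((x - mean) ** 2 for x in samples) / (n - 1)   # V_x, with the n - 1 denominator
sd = math.sqrt(variance)                                     # sd = sqrt(V_x)

# Cross-check against the standard library (same n - 1 definition).
assert math.isclose(variance, statistics.variance(samples))
assert math.isclose(sd, statistics.stdev(samples))

print(f"mean={mean:.3f}  variance={variance:.3f}  sd={sd:.3f}")
```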
Standard Error of the Mean (se):
$se = \sqrt{\frac{V_x}{n}}$

95% Confidence Interval:
$\bar{x} \pm 1.96\,(se)$

Estimating Allele Frequencies, Codominant Loci
◆ Measured allele frequency is the maximum likelihood estimator of the true frequency of the allele in the population (See Hedrick, pp 82-83 for derivation)
$p = \frac{N_{11} + \frac{1}{2} N_{12}}{N}$
◆ Expected number of observations of allele A1: E(Y) = np
  Ø Where n is number of samples
  Ø For diploid organisms, n = 2N, where N is number of individuals sampled
◆ Expected number of observations of allele A1 is analogous to the mean of a sample from a normal distribution
◆ Allele frequency can also be interpreted as an estimate of the mean

Allele Frequency Example
◆ Assume a population of Mountain Laurel (Kalmia latifolia) at Cooper’s Rock, WV
  Red buds: 5000 (A1A1)
  Pink buds: 3000 (A1A2)
  White buds: 2000 (A2A2)
◆ Phenotype is determined by a single, codominant locus: Anthocyanin
◆ What is the frequency of “red” alleles (A1), and “white” alleles (A2)?

Frequency of A1 = p:
$p = \frac{N_{11} + \frac{1}{2} N_{12}}{N} = \frac{2N_{11} + N_{12}}{2N}$

Frequency of A2 = q:
$q = \frac{N_{22} + \frac{1}{2} N_{12}}{N} = \frac{2N_{22} + N_{12}}{2N}$

Allele Frequencies are Distributed as Binomials
◆ Based on samples from a population
  Ø For two-allele system, each sample is like a “trial”
  Ø Does the individual contain Allele A1?
  Ø Remember, q = 1 - p, so only one parameter is estimated
◆ Binomials are variables that can be interpreted as the number of successes and failures in a series of trials
$P(Y = y) = \binom{n}{y} s^{y} f^{\,n-y}$, where s is the probability of a success, and f is the probability of a failure
  Ø $\binom{n}{y} = C^{n}_{y} = \frac{n!}{y!\,(n-y)!}$: number of ways of observing y positive results in n trials
  Ø $s^{y} f^{\,n-y}$: probability of observing y positive results in n trials once

Given the allele frequencies that you calculated earlier for Cooper’s Rock Kalmia latifolia, what is the probability of observing two “white” alleles in a sample of two plants?
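The allele frequency and binomial formulas above can be sketched in Python using the Cooper's Rock counts. The function names are hypothetical. One assumption is labeled in the code: the closing question is read here as "exactly two white alleles among the 2N = 4 alleles carried by two plants, with alleles drawn independently", which is only one possible reading and is not stated in the lecture.

```python
from math import comb

def allele_frequencies(n11, n12, n22):
    """Return (p, q) for a two-allele codominant locus from genotype counts.

    Each homozygote carries two copies of its allele and each heterozygote
    carries one copy of each, so p = (2*N11 + N12) / (2N).
    """
    n_individuals = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n_individuals)
    q = (2 * n22 + n12) / (2 * n_individuals)
    return p, q

def binomial_pmf(y, n, s):
    """P(Y = y) = C(n, y) * s^y * f^(n - y), with f = 1 - s."""
    return comb(n, y) * s ** y * (1 - s) ** (n - y)

# Cooper's Rock example from the lecture: red = A1A1, pink = A1A2, white = A2A2.
p, q = allele_frequencies(5000, 3000, 2000)

# ASSUMPTION (not from the lecture): two plants are treated as n = 4
# independent allele draws, and we ask for exactly y = 2 "white" (A2) alleles.
prob_two_white = binomial_pmf(2, 4, q)

print(f"p={p:.2f}  q={q:.2f}  P(2 white alleles of 4)={prob_two_white:.4f}")
```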
Variation in Allele Frequencies, Codominant Loci
◆ Binomial variance is pq or p(1-p)
◆ Variance in number of observations of A1: V(Y) = np(1-p)
◆ Variance in allele frequency estimates (codominant, diploid):
$V_p = \frac{p(1-p)}{2N}$
◆ Standard Error of allele frequency estimates:
$SE_p = \sqrt{\frac{p(1-p)}{2N}}$
◆ Notice that estimates get better as sample size increases
◆ Notice also that variance is maximum at intermediate allele frequencies
[Figure: Maximum variance as a function of allele frequency for a codominant locus; x-axis: p from 0 to 1, y-axis: p(1-p) from 0 to 0.3]

Why is variance highest at intermediate allele frequencies?
[Figure: two panels, p = 0.5 and p = 0.125]
If this were a target, how variable would your outcome be in each case (red versus white hits)?
Variance is constrained when value approaches limits (0 or 1)

What if there are more than 2 alleles?
◆ General formula for calculating allele frequencies in multiallelic system with codominant alleles:
$p_i = \frac{N_{ii} + \frac{1}{2}\sum_{j=1,\, j \neq i}^{n} N_{ij}}{N}$
◆ Variance and Standard Error of allele frequency estimates remain:
$V_{p_i} = \frac{p_i(1-p_i)}{2N}$
$SE_{p_i} = \sqrt{\frac{p_i(1-p_i)}{2N}}$

How do we estimate allele frequencies for dominant loci?
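The standard error and multiallelic formulas above can be sketched as follows. The function names, the two sample sizes, and the three-allele genotype counts are hypothetical choices for illustration; only p = 0.65 comes from the Cooper's Rock example.

```python
import math

def allele_freq_se(p, n_individuals):
    """SE_p = sqrt(p * (1 - p) / (2N)) for a codominant, diploid locus."""
    return math.sqrt(p * (1 - p) / (2 * n_individuals))

def multiallelic_frequencies(genotype_counts):
    """Allele frequencies from counts keyed by (allele_i, allele_j).

    A homozygote (i, i) adds its count twice to allele i; a heterozygote
    adds its count once to each allele. Dividing by 2N gives
    p_i = (N_ii + 0.5 * sum_j N_ij) / N.
    """
    n_individuals = sum(genotype_counts.values())
    copies = {}
    for (a, b), count in genotype_counts.items():
        copies[a] = copies.get(a, 0) + count
        copies[b] = copies.get(b, 0) + count
    return {allele: c / (2 * n_individuals) for allele, c in copies.items()}

p = 0.65                                  # A1 frequency from the lecture example
se_small = allele_freq_se(p, 20)          # hypothetical sample of 20 individuals
se_large = allele_freq_se(p, 10000)       # hypothetical sample of 10000 individuals
ci_small = (p - 1.96 * se_small, p + 1.96 * se_small)   # 95% confidence interval

# Hypothetical three-allele genotype counts (N = 20), not from the lecture.
counts = {
    ("A1", "A1"): 4, ("A1", "A2"): 6, ("A1", "A3"): 2,
    ("A2", "A2"): 3, ("A2", "A3"): 4, ("A3", "A3"): 1,
}
freqs = multiallelic_frequencies(counts)

print(f"SE (N=20)={se_small:.4f}  SE (N=10000)={se_large:.4f}")
print(f"95% CI (N=20)=({ci_small[0]:.3f}, {ci_small[1]:.3f})")
print(freqs)
```

The standard error shrinks with the square root of the sample size, which is why the larger sample gives a much tighter interval around the same estimate.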
[Figure: banding patterns for genotypes A1A1, A1A2, and A2A2 at a codominant locus and at a dominant locus]

Hardy-Weinberg Law
◆ Hardy and Weinberg came up with this independently, both in 1908
◆ After one generation of random mating, single-locus genotype frequencies can be represented by a binomial (with 2 alleles) or a multinomial function of allele frequencies
$(p + q)^2 = p^2 + 2pq + q^2$
  Ø $p^2$: Frequency of A1A1 (P)
  Ø $2pq$: Frequency of A1A2 (H)
  Ø $q^2$: Frequency of A2A2 (Q)

Hardy-Weinberg Equilibrium
◆ After one generation of random mating, genotype frequencies remain constant, as long as allele frequencies remain constant
◆ Provides a convenient Neutral Model to test for departures from assumptions
◆ Allows genotype frequencies to be represented by allele frequencies: simplification of calculations

New Notation
Genotype    AA   Aa   aa
Frequency   P    H    Q

Allele      A    a
Frequency   p    q

Hardy-Weinberg Assumptions
◆ Diploid
◆ Large population
◆ Random Mating: equal probability of mating among genotypes
◆ No mutation
◆ No gene flow
◆ Equal allele frequencies between sexes
◆ Nonoverlapping generations

Graphical Representation of Hardy-Weinberg Law
$(p + q)^2 = p^2 + 2pq + q^2 = 1$
[Figure: Relationship Between Allele Frequencies and Genotype Frequencies under Hardy-Weinberg]

Hardy-Weinberg Law and Probability
          A (p)       a (q)
A (p)     AA (p^2)    Aa (pq)
a (q)     aA (qp)     aa (q^2)
$p^2 + 2pq + q^2 = 1$

How does Hardy-Weinberg Work?
◆ Reproduction is a sampling process
◆ Example: Mountain Laurel at Cooper’s Rock
  Red Flowers: 5000 (A1A1)
  Pink Flowers: 3000 (A1A2)
  White Flowers: 2000 (A2A2)
  Frequency of A1 = p = 0.65
  Frequency of A2 = q = 0.35
What are expected numbers of phenotypes and genotypes in a sample of 20 trees?
Alleles: A1 = 26, A2 = 14
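The Hardy-Weinberg relationship can be sketched as a short Python function. The function name is hypothetical; p = 0.65 and the parental genotype proportions (0.5, 0.3, 0.2) come from the Cooper's Rock counts. The sketch assumes every Hardy-Weinberg assumption listed in the lecture holds.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies (P, H, Q) = (p^2, 2pq, q^2)."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q

# Parental genotype frequencies at Cooper's Rock: 5000, 3000, 2000 out of 10000.
parent_P, parent_H, parent_Q = 0.5, 0.3, 0.2
p = parent_P + parent_H / 2.0            # allele frequency of A1 among parents

# Genotype frequencies expected after one generation of random mating.
P, H, Q = hardy_weinberg(p)

# Allele frequency recovered from the offspring genotype frequencies.
p_next = P + H / 2.0

print(f"parents:   P={parent_P:.4f}  H={parent_H:.4f}  Q={parent_Q:.4f}  p={p:.2f}")
print(f"offspring: P={P:.4f}  H={H:.4f}  Q={Q:.4f}  p={p_next:.2f}")
```

In this sketch the genotype frequencies change between parents and offspring while the allele frequency does not, which is the sense in which equilibrium genotype proportions follow from allele frequencies alone.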
What are expected frequencies of alleles in pollen and ovules?
Genotypes: A1A1 = 10, A1A2 = 6, A2A2 = 4
Phenotypes: Red = 10, Pink = 6, White = 4
What will be the genotype and phenotype frequencies in the next generation?
What assumptions must we make?

What about a 3-Allele System?
◆ Alleles occur in gamete pool at same frequency as in adults
◆ Probability of two alleles coming together to form a zygote is the joint probability $P(A \cap B)$

[Figure: Pollen Gametes A1 (p), A2 (q), A3 (r) crossed with Ovule Gametes A1 (p), A2 (q), A3 (r). From Neal, D. 2004. Introduction to Population Biology.]
A1A1 = p^2    A1A2 = 2pq    A1A3 = 2pr
A2A2 = q^2    A2A3 = 2qr
A3A3 = r^2

◆ Equilibrium established with ONE GENERATION of random mating
◆ Genotype frequencies remain stable as long as allele frequencies remain stable
◆ Remember assumptions!
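The three-allele case can be sketched for any number of alleles by multiplying gamete frequencies, assuming random mating so that the pollen and ovule alleles are drawn independently. The function name and the allele frequencies (0.5, 0.3, 0.2) are hypothetical and are not from the lecture.

```python
from itertools import combinations_with_replacement

def hw_genotype_frequencies(allele_freqs):
    """Expected genotype frequencies under random mating.

    allele_freqs maps allele name -> frequency in the gamete pool.
    Homozygotes get p_i^2; heterozygotes get 2 * p_i * p_j because the
    two alleles can arrive via pollen or ovule in either order.
    """
    genotypes = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        joint = allele_freqs[a] * allele_freqs[b]
        genotypes[(a, b)] = joint if a == b else 2 * joint
    return genotypes

# Hypothetical gamete frequencies p, q, r (not from the lecture).
gametes = {"A1": 0.5, "A2": 0.3, "A3": 0.2}
genotypes = hw_genotype_frequencies(gametes)

# Recover the A1 frequency from the genotype frequencies:
# homozygote frequency plus half of each heterozygote containing A1.
p_next = genotypes[("A1", "A1")] + 0.5 * (
    genotypes[("A1", "A2")] + genotypes[("A1", "A3")]
)

for (a, b), freq in genotypes.items():
    print(f"{a}{b}: {freq:.2f}")
print(f"A1 frequency in next generation: {p_next:.2f}")
```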