Population Genetics Lab 2
BINOMIAL PROBABILITY & HARDY-WEINBERG EQUILIBRIUM

Last Week: Sample Point Method

Example: Use the Sample Point Method to find the probability of getting exactly two heads in three tosses of a balanced coin.

1. The sample space of this experiment is:

Outcome   Toss 1   Toss 2   Toss 3   Shorthand   Probability
1         Head     Head     Head     HHH         1/8
2         Head     Head     Tail     HHT         1/8
3         Head     Tail     Head     HTH         1/8
4         Tail     Head     Head     THH         1/8
5         Tail     Tail     Head     TTH         1/8
6         Tail     Head     Tail     THT         1/8
7         Head     Tail     Tail     HTT         1/8
8         Tail     Tail     Tail     TTT         1/8

2. Assuming that the coin is fair, each of these 8 outcomes has a probability of 1/8.
3. The probability of getting exactly two heads is the sum of the probabilities of outcomes 2, 3, and 4 (HHT, HTH, and THH), or 1/8 + 1/8 + 1/8 = 3/8 = 0.375.

Sample Point Method

Example: Find the probability of getting exactly 10 heads in 30 tosses of a balanced coin.

Total # of sample points = $2^{30}$ = 1,073,741,824

Listing every sample point is impractical here, so we need a way of accounting for all the possibilities.

Example: In drawing 3 M&Ms from an unlimited M&M bowl that is always 60% red and 40% green, what is P(2 green)?

$P(2G) = P(GGR) + P(GRG) + P(RGG)$
$P(2G) = (0.4)(0.4)(0.6) + (0.4)(0.6)(0.4) + (0.6)(0.4)(0.4)$
$P(2G) = 3(0.4)(0.4)(0.6) = 3(0.4)^{2}(0.6)$

If one green M&M is just as good as another:

$P(2G) = \binom{3}{2}(0.4)^{2}(0.6) = 0.288$

Binomial Probability Distribution

$P(Y = y) = \binom{n}{y} s^{y} f^{n-y}$

Where,
n = Total # of trials.
y = Total # of successes.
s = Probability of getting a success in a single trial.
f = Probability of getting a failure in a single trial (f = 1 - s).

$\binom{n}{y} = C^{n}_{y} = \frac{n!}{y!(n-y)!}$

Assumptions of Binomial Distribution

1. Trials are independent, finite in number, and conducted under the same conditions.
2. There are only two types of outcome (e.g., success and failure).
3. Outcomes are mutually exclusive and independent.
4. The probability of getting a success in a single trial remains constant throughout all the trials.
5. The probability of getting a failure in a single trial remains constant throughout all the trials.
6. The # of successes is a finite, non-negative integer between 0 and n.

Properties of Binomial Distribution

Mean or expected # of successes in n trials: $E(y) = ns$
Variance of y: $V(y) = nsf$
Standard deviation of y: $\sigma(y) = (nsf)^{1/2}$

Example: Find the probability of getting exactly 10 heads in 30 tosses of a balanced coin.

Solution:

$P(y) = \frac{n!}{y!(n-y)!} s^{y} f^{n-y}$

We know n = 30, y = 10, s = 0.5, f = 0.5.

$P(10) = \frac{30!}{10!(30-10)!} \times 0.5^{10} \times 0.5^{30-10}$
$= 30045015 \times 0.000976563 \times 9.53674 \times 10^{-7}$
$= 0.027982$

Example: Find the expected # of heads in 30 tosses of a balanced coin. Also calculate the variance.

Solution:

$E(Y) = ns = 30 \times 0.5 = 15$
$V(Y) = nsf = 30 \times 0.5 \times 0.5 = 7.5$

Problem 1 (10 minutes) (2 points)

An allozyme locus has three alleles, A1, A2, and A3, with frequencies 0.847, 0.133, and 0.020, respectively. If we sample 30 diploid individuals, what is the probability of:
• Not finding any copies of A2?
• Finding at least one copy of A2?
• GRADUATE STUDENTS ONLY: Finding fewer than 2 copies of A2?

Example: How many diploid individuals should be sampled to detect at least one copy of allele A2 from Problem 1 with probability of at least 0.95?

Solution: The calculation below uses s = 0.033 and f = 0.967 for the allele (Problem 1 lists A2 at 0.133). P(at least one copy) = 1 - P(no copies), where n is the number of alleles sampled:

$1 - \frac{n!}{0!\,n!} (0.033)^{0} (0.967)^{n} \geq 0.95$
$\frac{n!}{0!\,n!} (0.033)^{0} (0.967)^{n} \leq 0.05$
$\Rightarrow 0.967^{n} \leq 0.05$
$\Rightarrow n \ln 0.967 \leq \ln 0.05$
$\Rightarrow n \geq \frac{\ln 0.05}{\ln 0.967} = \frac{-2.9957}{-0.0336} = 89.2735$

The inequality reverses in the last step because ln 0.967 is negative.

Thus, to detect at least one copy of the allele with probability of 0.95, one would need to sample at least 90 alleles (i.e., at least 45 diploid individuals).

Problem 2 (15 minutes) (2 points)

The frequency of red-green color-blindness is 0.07 for men and 0.005 for women. You are designing a survey to determine the effect of color blindness on educational success. How many males and females would you have to sample to ensure that the probability of including at least one color-blind individual of each sex would be 0.90 or greater?
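The binomial worked examples above (two green M&Ms in three draws, 10 heads in 30 tosses, and the sample size needed to detect an allele) can be checked numerically. The following is a minimal Python sketch using only the standard library; the function names are illustrative and not part of the lab, and the allele frequency of 0.033 is the value used in the worked solution above.

```python
from math import ceil, comb, log


def binomial_pmf(y, n, s):
    """P(Y = y): probability of exactly y successes in n independent trials,
    each with success probability s (failure probability f = 1 - s)."""
    f = 1.0 - s
    return comb(n, y) * s**y * f**(n - y)


def binomial_mean_var(n, s):
    """Expected number of successes E(y) = n*s and variance V(y) = n*s*f."""
    return n * s, n * s * (1.0 - s)


def min_trials_for_detection(s, target):
    """Smallest n such that P(at least one success) = 1 - (1 - s)**n >= target."""
    return ceil(log(1.0 - target) / log(1.0 - s))


# Two green M&Ms in three draws, with P(green) = 0.4
p_2_green = binomial_pmf(2, 3, 0.4)

# Exactly 10 heads in 30 tosses of a balanced coin
p_10_heads = binomial_pmf(10, 30, 0.5)
mean_heads, var_heads = binomial_mean_var(30, 0.5)

# Alleles needed to see at least one copy with probability >= 0.95,
# assuming the allele frequency 0.033 used in the worked solution
n_alleles = min_trials_for_detection(0.033, 0.95)
n_diploids = ceil(n_alleles / 2)  # each diploid individual carries 2 alleles

print(f"P(2 green)  = {p_2_green:.3f}")
print(f"P(10 heads) = {p_10_heads:.6f}")
print(f"E(Y) = {mean_heads}, V(Y) = {var_heads}")
print(f"alleles needed = {n_alleles}, diploid individuals = {n_diploids}")
```

Rounding up with `ceil` reflects that a fractional number of alleles cannot be sampled, which is why 89.27 becomes 90 alleles in the worked solution.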
Estimation of allele frequency for a co-dominant locus

$p = \frac{N_{11} + \frac{1}{2} N_{12}}{N}$

$q = \frac{N_{22} + \frac{1}{2} N_{12}}{N}$

Where,
p = Frequency of allele A1
q = Frequency of allele A2
N11 = # of individuals with genotype A1A1
N12 = # of individuals with genotype A1A2
N22 = # of individuals with genotype A2A2
N = Total # of diploid individuals = N11 + N12 + N22

Estimation of Standard Error

$SE_{p} = \sqrt{\frac{p(1-p)}{2N}}$

$SE_{q} = \sqrt{\frac{q(1-q)}{2N}}$

Where,
p = Frequency of allele A1
q = Frequency of allele A2
SEp = Standard error for frequency of allele A1
SEq = Standard error for frequency of allele A2
N = Total # of diploid individuals = N11 + N12 + N22

Standard Deviation v. Standard Error

$SD = \sqrt{Var}$: measure of the dispersion of the data.
$SE = \sqrt{\frac{Var}{n}}$: measure of the dispersion of the estimated mean.

For approximately normally distributed data, we expect ~68% of the data to fall within 1 standard deviation of the mean.

Example: What are the allele frequencies of alleles A1 and A2, if the following genotypes have been observed in a sample of 50 diploid individuals?

Genotype   Count
A1A1       17
A1A2       23
A2A2       10

Solution: N11 = 17, N12 = 23, and N22 = 10

$p = \frac{N_{11} + \frac{1}{2} N_{12}}{N} = \frac{17 + 11.5}{50} = 0.57$

$SE_{p} = \sqrt{\frac{p(1-p)}{2N}} = \sqrt{\frac{0.57(1-0.57)}{100}} = \sqrt{\frac{0.57 \times 0.43}{100}} = \sqrt{0.002451} = 0.0495$

$q = 1 - p = 0.43$

$SE_{q} = \sqrt{\frac{q(1-q)}{2N}} = \sqrt{\frac{0.43(1-0.43)}{100}} = \sqrt{\frac{0.43 \times 0.57}{100}} = \sqrt{0.002451} = 0.0495$

Problem 3 (10 minutes) (2 pts)

Estimate the allele frequencies (include their respective standard errors) for alleles A1, A2, and A3 if the following genotypes have been observed in a sample of 200 individuals:

Genotype   Count
A1A1       19
A2A2       17
A3A3       14
A1A2       52
A1A3       57
A2A3       41

$p_{i} = \frac{N_{ii} + \frac{1}{2} \sum_{j=1}^{n} N_{ij}}{N}, \quad j \neq i$

$SE_{p_{i}} = \sqrt{\frac{p_{i}(1-p_{i})}{2N}}$

Problem 4 (10 minutes) (2 pts)

Tay-Sachs disease is an autosomal recessive genetic disorder causing the death of nerve cells in the brain due to the steady accumulation of gangliosides. Extensive genotyping has determined that approximately 1 in 30 of the 5 million Ashkenazi Jews within the United States is a carrier.

a) Assuming HWE and Mendelian inheritance of the disease, what is the frequency of the recessive allele in this population?
b) What is the SE of this estimate? (Assume 1,000 people were sampled.)
c) How many affected children would you expect to be born in this population?
d) What are the assumptions of these estimates?

Hypothesis Testing

Hypothesis: A tentative statement about a scientific problem that can be tested by further investigation.

1. Null Hypothesis (H0): There is no significant difference between observed and expected values.
2. Alternate Hypothesis (H1): There is a significant difference between observed and expected values.

Example:
H0 = Fertilized and unfertilized crops have equal yields
H1 = Fertilized and unfertilized crops do not have equal yields

Remember: In the final conclusion after the experiment, we either "Reject H0 in favor of H1" or "Fail to reject H0".

Type I error: Error due to rejection of a null hypothesis when it is actually true (false positive).

Level of significance (LOS) (α): Maximum probability allowed for committing a Type I error. At 5% LOS (α = 0.05), we accept that if we were to repeat the experiment many times when the null hypothesis is true, we would falsely reject it 5% of the time.

P-value: The probability, assuming H0 is true, of obtaining a result at least as extreme as the one observed. If the P-value is smaller than a particular value of α, then the result is significant at that level of significance.

Testing departure from HWE

In a randomly mating population, allele and genotype frequencies remain constant from generation to generation.

H0 = There is no significant difference between observed and expected genotype frequencies (i.e., the population is in HWE)
H1 = There is a significant difference between observed and expected genotype frequencies (i.e., the population is not in HWE)

HWE Assumptions

1. Random mating
2. No selection
   a. Equal numbers of offspring per parent
   b. All progeny equally fit
3. No mutation
4. Single, very large population
5. No migration

χ2 test

$\chi^{2} = \sum_{i=1}^{k} \frac{(O_{i} - E_{i})^{2}}{E_{i}}$

Reject H0 if $\chi^{2} > \chi^{2}_{df,\,\alpha}$

Where,
Oi = Observed count of genotype i
Ei = Expected count of genotype i
k = Number of genotypes
df = k - (# of parameters estimated) - 1

Example: A population of Mountain Laurel at Cooper's Rock State Forest has the following observed genotype counts:

Genotype   Observed number
A1A1       5000
A1A2       3000
A2A2       2000

Is this population in Hardy-Weinberg equilibrium?

$p = \frac{N_{11} + \frac{1}{2} N_{12}}{N} = \frac{5000 + 1500}{10000} = 0.65$

$q = 1 - p = 1 - 0.65 = 0.35$

Genotype   Expected frequency under HWE   Expected number under HWE
A1A1       p^2 = 0.65^2 = 0.4225          0.4225 × 10000 = 4225
A1A2       2pq = 0.455                    0.455 × 10000 = 4550
A2A2       q^2 = 0.1225                   0.1225 × 10000 = 1225

Genotype   Obs. # (O)   Exp. # (E)   (O-E)    (O-E)^2    (O-E)^2/E
A1A1       5000         4225         775      600625     142.1598
A1A2       3000         4550         -1550    2402500    528.022
A2A2       2000         1225         775      600625     490.3061
                                                χ2 =     1160.488

We estimated 1 parameter (p) from the data (3 genotypes). We do not count q as an estimated parameter because it is dependent on p (i.e., q = 1 - p).

df = 3 - 1 - 1 = 1

The critical value (table value) of χ2 at 1 df and at α = 0.05 is approx. 3.84.

Conclusion: Because the calculated value of χ2 (1160.49) is greater than the critical value (3.84), we reject the null hypothesis in favor of the alternative (the population is not in HWE).

Problem 5 (10 minutes) (2 pts)

Based on the observed genotype counts in Problem 3, test whether the population that had been sampled is in HWE. What are some possible explanations for the observed results?

Genotype   Count
A1A1       19
A2A2       17
A3A3       14
A1A2       52
A1A3       57
A2A3       41
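The two-allele worked examples above (allele frequencies and standard errors for the 17/23/10 sample, and the χ2 test for the Mountain Laurel counts) can be reproduced with a short script. This is a minimal Python sketch under stated assumptions: the function names are illustrative, it handles only a single locus with two alleles, and the critical value 3.84 is copied from the text rather than computed from the χ2 distribution.

```python
from math import sqrt


def allele_freqs(n11, n12, n22):
    """Estimate p (allele A1) and q (allele A2) from genotype counts
    at a co-dominant locus; also return N, the number of diploid individuals."""
    n = n11 + n12 + n22
    p = (n11 + 0.5 * n12) / n
    return p, 1.0 - p, n


def allele_se(freq, n):
    """Standard error of an allele frequency estimated from N diploid
    individuals (2N sampled alleles)."""
    return sqrt(freq * (1.0 - freq) / (2 * n))


def hwe_chi_square(n11, n12, n22):
    """Chi-square statistic and degrees of freedom for departure from HWE."""
    p, q, n = allele_freqs(n11, n12, n22)
    observed = [n11, n12, n22]
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = k - (# parameters estimated) - 1, with k = 3 genotypes and 1 parameter (p)
    df = len(observed) - 1 - 1
    return chi2, df, expected


# Critical value of chi-square at 1 df and alpha = 0.05, as given in the text
CRITICAL_1DF_ALPHA_005 = 3.84

# Allele frequency example: 17 A1A1, 23 A1A2, 10 A2A2
p, q, n = allele_freqs(17, 23, 10)
se_p = allele_se(p, n)
se_q = allele_se(q, n)
print(f"p = {p:.2f} (SE {se_p:.4f}), q = {q:.2f} (SE {se_q:.4f})")

# Mountain Laurel example: 5000 A1A1, 3000 A1A2, 2000 A2A2
chi2, df, expected = hwe_chi_square(5000, 3000, 2000)
reject_h0 = chi2 > CRITICAL_1DF_ALPHA_005
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

The standard errors of p and q come out equal because p(1 - p) and q(1 - q) are the same product when q = 1 - p.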