POPULATION GENETICS
Biology 107/207L, Winter 2005
Computer lab 1. The Hardy-Weinberg-Castle Equilibrium

Introduction

The Hardy-Weinberg principle states that allele frequencies at a polymorphic locus will not change from generation to generation, provided that a set of assumptions is met. The principle therefore represents the formal null model of no evolutionary change: one or more assumptions must be violated for allele frequencies to change over time in a population. Evaluating deviations from Hardy-Weinberg genotype proportions in natural populations thus has the potential to provide insights into the operation of specific microevolutionary processes (although considerable caution must be exercised!). In natural populations, departures from Hardy-Weinberg equilibrium may be caused by selection, nonrandom mating, random genetic drift, and/or migration (provided the populations that are mixing differ in allele frequencies).

Traditionally, deviations from Hardy-Weinberg equilibrium have been tested by computing a chi-square statistic that asymptotically follows a chi-square distribution with k(k-1)/2 df under the null hypothesis, where k is the observed number of alleles (Li and Horvitz 1953; see discussion in Hedrick (2005) p. 90-97). Alternatively, an exact test may be used (Haldane 1954); it is more accurate than the chi-square test when many alleles are present at low frequencies. More recently, more powerful forms of exact tests have been developed using a Markov chain method that avoids having to pool genotypic classes (Guo and Thompson 1992; Raymond and Rousset 1995).

The objective of this lab is to introduce you to one computer program (GENEPOP) that provides powerful tests for departures from Hardy-Weinberg equilibrium.
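
The traditional chi-square test mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of GENEPOP: the function names hw_chi_square and p_value_1df and the genotype counts in the usage lines are hypothetical, and the P-value shortcut applies only to the two-allele case (1 df).

```python
from itertools import combinations_with_replacement
from math import erfc, sqrt


def hw_chi_square(genotype_counts):
    """Chi-square goodness-of-fit statistic for Hardy-Weinberg proportions.

    genotype_counts maps an allele pair, e.g. ("A", "B"), to the number of
    individuals observed with that genotype at one locus.
    Returns (statistic, degrees_of_freedom) with df = k(k-1)/2.
    """
    # Treat ("B", "A") and ("A", "B") as the same unordered genotype.
    counts = {}
    for (a, b), n in genotype_counts.items():
        key = tuple(sorted((a, b)))
        counts[key] = counts.get(key, 0) + n

    n_ind = sum(counts.values())

    # Count allele copies: each individual contributes two.
    allele_counts = {}
    for (a, b), n in counts.items():
        allele_counts[a] = allele_counts.get(a, 0) + n
        allele_counts[b] = allele_counts.get(b, 0) + n
    freqs = {a: c / (2 * n_ind) for a, c in allele_counts.items()}

    # Sum (observed - expected)^2 / expected over every possible genotype.
    stat = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        p_exp = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        expected = n_ind * p_exp
        observed = counts.get((a, b), 0)
        stat += (observed - expected) ** 2 / expected

    k = len(freqs)
    return stat, k * (k - 1) // 2


def p_value_1df(stat):
    """Upper-tail probability of a chi-square variate with 1 df only."""
    return erfc(sqrt(stat / 2))


# Hypothetical counts for one two-allele locus (not from the cod datasets).
stat, df = hw_chi_square({("A", "A"): 30, ("A", "B"): 40, ("B", "B"): 30})
print(stat, df, p_value_1df(stat))
```

With more than two alleles the statistic and df are computed the same way, but the P-value requires the general chi-square distribution, which this sketch deliberately leaves out.
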
We will be analyzing three datasets composed of genotypes at 10 nuclear restriction fragment length polymorphism (RFLP) loci scored from populations of the Atlantic cod, Gadus morhua (L.), sampled throughout its geographic range (described below). We will perform exact tests on the three datasets and attempt to interpret any departures from Hardy-Weinberg expectations.

Download the GENEPOP program from the following web site: http://wbiomed.curtin.edu.au/genepop/

The datasets

The first dataset (“atlantic”) contains RFLP data for 10 anonymous nuclear gene regions from six population samples distributed over the entire North Atlantic Ocean. The six populations are Newfoundland (n = 428), Nova Scotia (n = 412), Iceland (n = 84), North Sea (n = 81), Balsfjord (n = 87), and the Barents Sea (n = 82). All samples were collected from spawning aggregations between 1991 and 1993 by otter trawls at depths between 100 and 300 m.

The second dataset (“baltic”) was collected from the Baltic Sea region in 1995 and 1996. Three populations were sampled: a northern Baltic Sea sample from near Turku, Finland (n = 241), a central sample collected 45 km off the Latvian coast (n = 276), and a southern sample captured 18 km east of Copenhagen, Denmark (n = 184). Only the northern and southern Baltic samples were collected from spawning grounds; the Latvian sample was obtained in the fall, when cod populations in the Baltic are migrating to their winter feeding grounds.

The third dataset (“brasdor”) contains a single population sample captured in the Bras d’Or Lakes region of Cape Breton Island, Nova Scotia, in September 1991 (n = 67). This population inhabits an unusual brackish water environment that experiences dramatic seasonal fluctuations in temperature and is believed to receive very few, if any, migrants from nearby oceanic populations.

Running the GENEPOP program

Open a Command Prompt window from the path “Start” – “Programs” – “Accessories” – “Command Prompt”. Change the directory to where the GENEPOP program has been installed. For example, if the program is in c:\Program Files\GENEPOP then type “cd \Program Files\GENEPOP”. Type “genepop” to start the program. A menu will appear that will prompt you for the file to open and the specific analyses to perform.

To open a new datafile, type “c” and you will be prompted for the new file. Type the name of the file (e.g., “atlantic”) and the data will be read by the program. You will then be asked if you wish to calculate the numbers of distinct alleles at each locus. After typing “yes” or “no” followed by a return, you will see the default MENU window.

Typing “1” will access a number of tests for Hardy-Weinberg equilibrium. We will restrict our testing to option “3”, which is the exact probability test. You will be asked whether you wish to perform exact tests wherever possible; select “yes”. You will then be prompted to set the Markov chain parameters. Select the default settings for dememorization number (1000), number of batches (50), and number of iterations per batch (1000).

The output file will have a “.P” extension (e.g., running the probability test on the file “atlantic” will create the output file “atlantic.P”). Open the output file with Word and change the font of the document to “courier” so the results can be viewed most easily. The output file will present the results of the tests (i) by locus, and (ii) by population. To save time, concern yourself only with the results by population. For each population, the P-value of the exact test will be presented along with its standard error. The null hypothesis being tested is that gametes are uniting at random, i.e., that there is random mating in the population.
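
To make the exact probability test more concrete, the following Python sketch handles the simplest case of one locus with two alleles by enumerating every possible heterozygote count for the observed allele counts. It is an illustration only and not GENEPOP's implementation; the function names het_distribution and hw_exact_p and the genotype counts in the usage line are hypothetical. The P-value is taken as the summed probability of all samples that are no more probable than the one observed.

```python
from math import exp, lgamma, log


def het_distribution(n_a, n_b):
    """Probability of each heterozygote count under random union of gametes.

    n_a and n_b are the numbers of copies of the two alleles in the sample
    (n_a + n_b must be even, i.e. twice the number of individuals).
    Returns a dict {heterozygote_count: probability}.
    """
    n = (n_a + n_b) // 2  # number of individuals

    def log_prob(het):
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        # n! / (hom_a! het! hom_b!) * 2^het * n_a! n_b! / (2n)!
        return (lgamma(n + 1) - lgamma(hom_a + 1) - lgamma(het + 1)
                - lgamma(hom_b + 1) + het * log(2)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # The heterozygote count must have the same parity as n_a.
    possible = range(n_a % 2, min(n_a, n_b) + 1, 2)
    return {het: exp(log_prob(het)) for het in possible}


def hw_exact_p(n_aa, n_ab, n_bb):
    """Exact probability-test P-value for a two-allele locus."""
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n_bb + n_ab
    probs = het_distribution(n_a, n_b)
    p_obs = probs[n_ab]
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))


# Hypothetical counts (not from the cod datasets): AA = 30, AB = 40, BB = 30.
print(hw_exact_p(30, 40, 30))
```

With many alleles the number of possible samples grows too quickly for this kind of enumeration, which is the situation the Markov chain method of Guo and Thompson (1992) is designed for.
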
If random mating does occur then the inbreeding coefficient (FIS) estimated from the data is expected to be zero. Estimates of the inbreeding coefficient are then shown for the methods of Weir and Cockerham (1984) (i.e., W&C) and Robertson and Hill (1984) (i.e., R&H). The final column presents the number of tables considered during the complete enumeration of the Markov chain.

The GENEPOP program can also generate tables listing observed and expected numbers of genotypes. These can be produced using Option 5, “Allele frequencies, various Fis and gene diversities” (the output file has a “.INF” extension). Comparisons of observed and expected genotypes can be useful in interpreting causes of deviations from HW proportions.

Procedure

1. Using the GENEPOP program, perform exact probability tests of Hardy-Weinberg equilibrium for each dataset.
2. Examine and tabulate the results of the exact probability tests for each population.
3. Compare the results of the exact test with the chi-square test (i.e., Fisher’s method listed below the tables of exact probabilities).
4. Evaluate the effect of increasing the length of the Markov chain (i.e., by increasing the number of batches and iterations per batch above the default settings of 50 and 1000, respectively).

Assignment

1. Present the results of the exact and chi-square tests for each population for each dataset. Construct tables presenting FIS values at each locus in each population (use the W&C estimates from GENEPOP). Note that a positive FIS value indicates a deficiency of heterozygotes compared to that expected under Hardy-Weinberg equilibrium. Conversely, a negative FIS value indicates an excess of heterozygotes.
2. Discuss the relative performance of each test. Does the exact probability test appear to be superior to the chi-square test by detecting more significant departures from Hardy-Weinberg expectations?
3. What can account for the results obtained for the Baltic Sea samples?
Can you think of a way to test this hypothesis from the data?
4. Provide an explanation for the departure from Hardy-Weinberg equilibrium detected in the Bras d’Or Lakes sample. How could you test your hypothesis?

References

Guo, S.W., and E.A. Thompson. 1992. Performing the exact test of Hardy-Weinberg proportions for multiple alleles. Biometrics 48: 361-372.
Haldane, J.B.S. 1954. An exact test for randomness of mating. J. Genet. 52: 631-635.
Hedrick, P.W. 2005. Genetics of Populations, 3rd Edition. Jones & Bartlett, Sudbury, MA.
Li, C.C., and D.G. Horvitz. 1953. Some methods of estimating the inbreeding coefficient. Am. J. Hum. Genet. 5: 107-117.
Raymond, M., and F. Rousset. 1995. GENEPOP (version 1.2): Population genetics software for exact tests and ecumenism. J. Heredity 86: 248-249.
Robertson, A., and W.G. Hill. 1984. Deviations from Hardy-Weinberg proportions: sampling variances and use in estimation of inbreeding coefficients. Genetics 107: 713-718.
Rousset, F., and M. Raymond. 1995. Testing heterozygote excess and deficiency. Genetics 140: 1413-1419.
Weir, B.S., and C.C. Cockerham. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38: 1358-1370.