POPULATION GENETICS
Biology 107/207L
Winter 2005
Computer lab 1. The Hardy-Weinberg-Castle Equilibrium
Introduction
The Hardy-Weinberg principle states that allele frequencies at a polymorphic locus will
not change from generation to generation provided a series of assumptions are met. Therefore,
the Hardy-Weinberg principle represents the formal null model of no evolutionary change – one
or more assumptions must be violated in order for allelic frequencies to change over time in a
population. Evaluating deviations from Hardy-Weinberg proportions of genotypes in natural
populations thus has the potential to provide insights into the operation of specific
microevolutionary processes (although considerable caution must be exercised!). In natural
populations, departures from Hardy-Weinberg equilibrium may be caused by selection, nonrandom mating, random genetic drift, and/or migration (provided the populations that are mixing
differ in allele frequencies). Traditionally, deviations from Hardy-Weinberg equilibrium have
been tested by computing a chi-square statistic that follows asymptotically a chi-square
distribution with k(k-1)/2 df under the null hypothesis, where k is the observed number of alleles
(Li and Horvitz 1953; see discussion in Hedrick (2005) p. 90-97). Alternatively, an exact test
may be used (Haldane 1954) that is more accurate than the chi-square test when many rare
alleles are present at low frequencies. Recently, more powerful forms of exact tests have been
developed using a Markov chain method that avoids having to pool genotypic classes (Guo and
Thompson 1992; Raymond and Rousset 1995). The objective of this lab will be to introduce you
to one computer program (GENEPOP) that provides powerful tests for departures from Hardy-Weinberg equilibrium. We will be analyzing three datasets composed of genotypes at 10 nuclear
restriction fragment length polymorphism (RFLP) loci scored from populations of the Atlantic
cod, Gadus morhua (L.), sampled throughout its geographic range (described below). We will
perform exact tests on the three datasets and attempt to interpret any departures from Hardy-Weinberg expectations.
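As a point of reference for the traditional chi-square test described above, the following is a minimal sketch for a single locus with two alleles, where k = 2 gives k(k-1)/2 = 1 df. The function name hw_chi_square and the genotype counts in the usage note are illustrative assumptions, not part of GENEPOP or of the cod datasets.

```python
import math

def hw_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test of Hardy-Weinberg proportions
    for one locus with two alleles, A and B (hypothetical helper).
    Assumes both alleles are present in the sample."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # estimated frequency of allele A
    q = 1.0 - p                          # estimated frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 df, the upper-tail chi-square probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, hw_chi_square(40, 20, 40) compares the observed counts with expected counts of 25, 50 and 25, and returns a large chi-square value with a very small P-value, reflecting a strong deficiency of heterozygotes.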
Download the GENEPOP program from the following web site:
http://wbiomed.curtin.edu.au/genepop/
The datasets
The first dataset (“atlantic”) contains RFLP data for ten anonymous nuclear gene regions
from six population samples distributed over the entire North Atlantic Ocean. The six
populations are Newfoundland (n = 428), Nova Scotia (n = 412), Iceland (n = 84), North Sea (n
= 81), Balsfjord (n = 87), and the Barents Sea (n = 82). All samples were collected over the
period of 1991-1993 by otter trawls at depths ranging between 100 and 300 m. All samples were
obtained from spawning aggregations.
The second dataset (“baltic”) was collected from the Baltic Sea region in 1995 and 1996.
Three populations were sampled - a northern Baltic Sea sample from near Turku, Finland (n =
241), a central population collected 45 km off the Latvian coast (n = 276), and a southern sample
captured 18 km east of Copenhagen, Denmark (n = 184). Only the northern and southern Baltic
samples were collected from spawning grounds - the Latvian sample was obtained in the fall
when cod populations in the Baltic are migrating to their winter feeding grounds.
The third dataset (“brasdor”) contains only a single population sample captured in the
Bras d’Or Lakes region of Cape Breton Island, Nova Scotia in September 1991 (n = 67). This
population inhabits an unusual brackish water environment that experiences dramatic seasonal
fluctuations in temperature and is believed to receive very few, if any, migrants from nearby
oceanic populations.
Running the GENEPOP program
Open up a Command Prompt window from the path “Start” – “Programs” –
“Accessories” – “Command Prompt”. Change the directory to where the GENEPOP program
has been installed. For example, if the program is in c:\Program Files\GENEPOP then type
“cd \Program Files\GENEPOP”. Type “genepop” to start the program. A menu will appear that will
prompt you for the file to open and the specific analyses to perform. To open a new datafile, type
“c” and you will be prompted for the new file. Type the name of the file (e.g., “atlantic”) and the
data will be read by the program. You will then be asked if you wish to calculate the numbers of
distinct alleles at each locus. After typing a “yes” or a “no” followed by a return, you will view
the default MENU window. Typing a “1” will access a number of tests for Hardy-Weinberg
equilibrium. We will restrict our testing to option “3” which is the exact probability test. You
will be asked whether you wish to perform exact tests wherever possible – select “yes”. You will
then be prompted for setting the Markov chain parameters. Again, select the default settings for
dememorization number (1000), number of batches (50), and number of iterations per batch
(1000). The output file will have a “.P” extension (e.g., running the probability test on the file
“atlantic” will create the output file “atlantic.P”). As before, open up the output file with Word
and change the font of the document to “courier” so the results will be most easily viewed. The
output file will present the results of the tests (i) by locus, and (ii) by population. To save time,
only concern yourself with the results by population.
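To make the idea of an exact probability test concrete, the sketch below enumerates every possible heterozygote count for a locus with two alleles, conditional on the observed allele counts, and sums the probabilities of all outcomes that are no more probable than the observed sample. This is only an illustration of complete enumeration for the two-allele case. It is not GENEPOP's code, and the function names are assumptions. For loci with many alleles the number of possible tables becomes too large to enumerate, which is where the Markov chain method is used instead.

```python
import math

def het_log_prob(n_ab, n_a, n):
    """Log probability of n_ab heterozygotes among n individuals, given
    n_a copies of allele A, under random union of gametes
    (hypothetical helper)."""
    n_b = 2 * n - n_a
    n_aa = (n_a - n_ab) // 2
    n_bb = (n_b - n_ab) // 2
    lg = math.lgamma                      # lgamma(x + 1) equals log(x!)
    return (lg(n + 1) - lg(n_aa + 1) - lg(n_ab + 1) - lg(n_bb + 1)
            + n_ab * math.log(2.0)
            + lg(n_a + 1) + lg(n_b + 1) - lg(2 * n + 1))

def hw_exact_test(n_aa, n_ab, n_bb):
    """Exact probability test of Hardy-Weinberg proportions for two
    alleles by complete enumeration (illustrative sketch)."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n - n_a
    # Possible heterozygote counts share the parity of the allele counts.
    het_counts = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {h: math.exp(het_log_prob(h, n_a, n)) for h in het_counts}
    p_obs = probs[n_ab]
    # P-value: total probability of outcomes no more likely than observed
    # (small tolerance guards against floating-point ties).
    return sum(p for p in probs.values() if p <= p_obs * (1.0 + 1e-9))
```

With two individuals that are both heterozygous, hw_exact_test(0, 2, 0) considers only two possible tables (zero or two heterozygotes), which shows how coarse the test becomes in very small samples.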
For each population, the P-value of the exact test will be presented along with its standard
error. The null hypothesis being tested is that gametes are uniting at random – i.e., that there is
random mating in the population. If random mating does occur then the inbreeding coefficient
(FIS) estimated from the data is expected to be zero. Estimates of the inbreeding coefficient are
then shown for the methods of Weir and Cockerham (1984) (i.e., W&C) and Robertson and Hill
(1984) (i.e., R&H). The final column presents the number of tables considered during the
complete enumeration of the Markov chain. The GENEPOP program can also generate tables
listing observed and expected numbers of genotypes. These can be produced using Option 5 –
“Allele frequencies, various Fis and gene diversities” (the output file has a “.INF” extension).
Comparisons of observed and expected genotypes can be useful in interpreting causes of
deviations from HW proportions.
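The relationship between heterozygote counts and the inbreeding coefficient can be sketched with the simplest estimator, FIS = 1 - Ho/He, where Ho is the observed and He the expected proportion of heterozygotes. This is a deliberately simplified illustration: it is not the Weir and Cockerham (1984) or the Robertson and Hill (1984) estimator reported by GENEPOP, so its values will generally differ somewhat from the program's output. The function name and input layout are assumptions.

```python
from collections import Counter

def fis_simple(genotype_counts):
    """Simple estimate of FIS as 1 - Ho/He (hypothetical helper).
    genotype_counts maps (allele1, allele2) tuples to numbers of
    individuals; at least two alleles must be present in the sample."""
    n = sum(genotype_counts.values())
    allele_counts = Counter()
    n_het = 0
    for (a1, a2), count in genotype_counts.items():
        allele_counts[a1] += count
        allele_counts[a2] += count
        if a1 != a2:
            n_het += count
    h_obs = n_het / n                                   # observed heterozygosity
    h_exp = 1.0 - sum((c / (2 * n)) ** 2                # expected under HW
                      for c in allele_counts.values())
    return 1.0 - h_obs / h_exp
```

A sample with fewer heterozygotes than expected, such as {("A", "A"): 40, ("A", "B"): 20, ("B", "B"): 40}, gives a positive value, while a sample with more heterozygotes than expected gives a negative one.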
Procedure
1. Using the GENEPOP program, perform exact probability tests of Hardy-Weinberg
equilibrium for each dataset.
2. Examine and tabulate the results of the exact probability tests for each population.
3. Compare the results of the exact test with the chi-square test (i.e., Fisher’s method listed
below the tables of exact probabilities).
4. Evaluate the effect of increasing the length of the Markov chain (i.e., by increasing the
number of batches and iterations per batch above the default settings of 50 and 1000,
respectively).
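Fisher's method, mentioned in step 3, combines the P-values from k independent tests (here, the loci within a population) into a single chi-square statistic, -2 times the sum of ln(P), with 2k df. The following sketch shows the calculation. The function name is an assumption, and every P-value must be greater than zero for the logarithm to be defined.

```python
import math

def fisher_combined(p_values):
    """Combine independent P-values with Fisher's method (illustrative
    sketch). Returns the chi-square statistic, its df, and the combined
    P-value. All P-values must be greater than zero."""
    k = len(p_values)
    chi2 = -2.0 * sum(math.log(p) for p in p_values)
    # For an even number of df (2k), the upper-tail chi-square probability
    # has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    half = chi2 / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return chi2, 2 * k, math.exp(-half) * total
```

With a single P-value the method returns that P-value unchanged, which is a quick check that the calculation is set up correctly.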
Assignment
1. Present the results of the exact and chi-square tests for each population for each dataset.
Construct tables presenting FIS values at each locus in each population (use the W&C estimates
from GENEPOP). Note that a positive FIS value indicates a deficiency of heterozygotes
compared to that expected under Hardy-Weinberg equilibrium. Conversely, a negative FIS value
indicates an excess of heterozygotes.
2. Discuss the relative performance of each test. Does the exact probability test appear to be
superior to the chi-square test by detecting more significant departures from Hardy-Weinberg
expectations?
3. What can account for the results obtained for the Baltic Sea samples? Can you think of a way
to test this hypothesis from the data?
4. Provide an explanation for the departure from Hardy-Weinberg equilibrium detected in the
Bras d’Or Lakes sample. How could you test your hypothesis?
References
Guo, S.W., and E.A. Thompson. 1992. Performing the exact test of Hardy-Weinberg
proportions for multiple alleles. Biometrics 48: 361-372.
Haldane, J.B.S. 1954. An exact test for randomness of mating. J. Genet. 52: 631-635.
Hedrick, P.W. 2005. Genetics of Populations, 3rd Edition. Jones & Bartlett, Sudbury, MA.
Li, C.C., and D.G. Horvitz. 1953. Some methods of estimating the inbreeding coefficient.
Am. J. Hum. Genet. 5: 107-117.
Raymond, M., and F. Rousset. 1995. GENEPOP (version 1.2): Population genetics software
for exact tests and ecumenism. J. Heredity 86: 248-249.
Robertson, A., and W.G. Hill. 1984. Deviations from Hardy-Weinberg proportions: sampling
variances and use in estimation of inbreeding coefficients. Genetics 107: 713-718.
Rousset, F., and M. Raymond. 1995. Testing heterozygote excess and deficiency. Genetics
140: 1413-1419.
Weir, B.S., and C.C. Cockerham. 1984. Estimating F-statistics for the analysis of population
structure. Evolution 38: 1358-1370.