Molecular Ecology Notes (2004) doi: 10.1046/j.1471-8286.2003.00581.x
Blackwell Science, Ltd

PROGRAM NOTE

Isolation by distance, based on microsatellite data, tested with spatial autocorrelation (SPAIDA) and assignment test (SPASSIGN)

SNÆBJÖRN PÁLSSON
Institute of Biology, University of Iceland, Sturlugata 7, 101 Reykjavik, Iceland

Abstract

SPASSIGN and SPAIDA are two small programs for detecting isolation by distance at microsatellite loci. The programs are written in C and are available for Linux and Windows systems at http://www.hi.is/~snaebj/programs.html. SPAIDA calculates two estimates of spatial autocorrelation, Moran’s I and Geary’s c, first assuming the infinite allele model and second assuming a stepwise mutational model. SPASSIGN calculates the probability of assigning an individual’s genotype to the location where it was sampled and compares it with the probabilities of assignment to other locations. Genetic distances among regions, based on the overall differences in likelihoods, are also calculated.

Keywords: assignment, distance, genetics, population, spatial autocorrelation

Received 30 September 2003; revision received 30 October 2003; accepted 20 November 2003

Isolation by distance (Wright 1943) arises because individuals, or pairs of populations, situated further apart mix less than those separated by shorter distances. This leads to a positive correlation between genetic and geographical distances, either within a continuously distributed species (e.g. Sokal & Jacquez 1991) or among populations with a discrete structure (Kimura & Weiss 1964). Microsatellites have in recent years become a popular marker for addressing questions in population genetics and demography. This interest has in turn led to the development of various methods for the analysis of such data.
Because microsatellites may carry information on past mutational events, reflected in differing numbers of repeats, statistics that incorporate this information under the stepwise mutational model (SMM) have been developed, such as $R_{ST}$ (Slatkin 1995). How well such methods, compared with those based on the infinite allele model (IAM), reflect the true demographic events has been the subject of several studies (see Hardy et al. 2003). Statistics based on the SMM may be preferable when, for example, heterozygosity is very high, blurring the signal of subdivision (Hedrick 1999).

Correspondence: Snæbjörn Pálsson. Fax: + 354 525 4069; E-mail: [email protected]
© 2004 Blackwell Publishing Ltd

Bertorelle & Barbujani (1995) developed the program AIDA to detect isolation by distance, based on the correlation in allele frequencies across geographical distances. In SPAIDA I extend this concept by adding the information obtained from differences in allelic sizes. Another class of analysis, assignment tests (e.g. Pritchard et al. 2000), which have gained increased attention lately, is based on the probabilities of a certain genotype being sampled at different locations, given the population frequencies of the alleles comprising the genotype. The program SPASSIGN tests how the assignment values depend on geographical distance.

The software binaries for Unix, Linux and Windows can be found at http://www.hi.is/~snaebj/programs.html. Source code is available on request. Previous uses of these programs can be found in Palsson (2000) and Goroposhnaya et al. (2001).

The input file is similar to the one used for GENEPOP (see Table 1). However, it should not contain any commas, and a space should be inserted between the two alleles carried by an individual at a single locus. Missing values are denoted by an allele size of 0. Geographical coordinates are given in the last two columns, either as selected coordinates on a map or as latitudes and longitudes (as degree.min).
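As an illustration only, one individual's line in this format could be read as sketched below. The sketch is written in Python rather than the C of the distributed programs. The function name `parse_individual`, the assumed field layout (a name, two allele sizes per locus, then two coordinates) and the sample line are assumptions drawn from the description above, not taken from the SPAIDA or SPASSIGN source.

```python
def parse_individual(line, n_loci):
    """Split one individual's line into (name, genotypes, coordinates).

    Assumed layout (hypothetical, inferred from the text): a name,
    two space-separated allele sizes per locus, then two coordinates.
    """
    fields = line.split()
    expected = 1 + 2 * n_loci + 2  # name + allele pairs + two coordinates
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")

    name = fields[0]
    sizes = [int(f) for f in fields[1:1 + 2 * n_loci]]
    # An allele size of 0 marks a missing value; store it as None.
    alleles = [s if s != 0 else None for s in sizes]
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, 2 * n_loci, 2)]

    x, y = (float(f) for f in fields[-2:])
    return name, genotypes, (x, y)
```

For a made-up line such as `"a1 12 14 0 0 34 36 1.5 2.0"` with three loci, the second locus would be returned as missing and the last two fields as the coordinate pair.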
Table 1 Example of an input file
Locus1 Locus2 Locus3 Pop Indiv1 Indiv2 Indiv3 Indiv4 Pop ind1 ind2 ind3 ind4
12 12 12 13 13 14 16 13 20 20 20 20 20 20 20 20 34 34 36 36 34 36 36 36 0 0 0 0 0 0 0 0 16 16 16 12 14 14 14 14 20 18 10 10 20 20 20 20 34 34 34 34 36 36 36 36 0 0 0 0 3 3 3 3

The SPAIDA program calculates two estimates of spatial autocorrelation, extended for microsatellite data, Moran’s I and Geary’s c (see Bertorelle & Barbujani 1995), first in the traditional way, assuming that the variation at the marker has been generated under the infinite allele model, and second assuming a stepwise mutational model. Moran’s I scales the covariance among alleles from individuals separated by a distance within a given distance class by the total variance. Values can therefore range from −1 to 1, from a negative to a positive relationship. Geary’s c scales the variance within a distance class by the total variance, resulting in values ranging from 0 upwards. When there is a positive correlation, there is little variation within the distance class and the c-value is close to 0. In the IAM the variances and covariances are based only on the frequencies of the different alleles, whereas in the SMM they are based on differences in the number of repeats.

The number of distance classes can be selected, and the classes can either be of equal length or have user-defined borders. Geographic distances can be log(x + 1)-transformed. The significance of the correlation for each distance class is evaluated by a permutation test. For a selected number of permutations, the geographical locations of individuals are assigned randomly and the statistics are calculated anew, providing a reference distribution for the observed test statistics.
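The two statistics and the permutation test can be sketched for a single numeric value per sample, for instance a repeat number under the SMM. What follows is the textbook form of Moran’s I and Geary’s c with binary weights for one distance class, written in Python. It is not the SPAIDA estimator, which works per locus and averages over loci. The function names `autocorrelation` and `permutation_p` and all parameter names are hypothetical.

```python
import math
import random


def autocorrelation(values, coords, lower, upper):
    """Moran's I and Geary's c over pairs with lower <= distance < upper."""
    n = len(values)
    mean = sum(values) / n
    total_ss = sum((v - mean) ** 2 for v in values)  # total sum of squares

    cross = 0.0    # sum of (x_i - mean)(x_j - mean) over pairs in the class
    sq_diff = 0.0  # sum of (x_i - x_j)^2 over pairs in the class
    n_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if lower <= d < upper:
                cross += (values[i] - mean) * (values[j] - mean)
                sq_diff += (values[i] - values[j]) ** 2
                n_pairs += 1

    if n_pairs == 0 or total_ss == 0:
        return None, None  # the statistics are undefined for this class
    morans_i = (n * cross) / (n_pairs * total_ss)
    gearys_c = ((n - 1) * sq_diff) / (2 * n_pairs * total_ss)
    return morans_i, gearys_c


def permutation_p(values, coords, lower, upper, n_perm=999, seed=0):
    """Observed Moran's I and the proportion of permutations exceeding it."""
    observed, _ = autocorrelation(values, coords, lower, upper)
    if observed is None:
        return None, None
    rng = random.Random(seed)
    shuffled = list(coords)
    larger = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign locations to individuals at random
        permuted, _ = autocorrelation(values, shuffled, lower, upper)
        if permuted > observed:
            larger += 1
    return observed, larger / n_perm
```

Calling `permutation_p(values, coords, 0.0, 2.0)` on two spatially separated clusters of similar values should give an I near 1 for the shortest distance class and a small proportion of permutations exceeding it. Shuffling the coordinates rather than the values keeps the number of pairs in each distance class fixed across permutations.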
Output: The geographic distance classes are printed out, and for each estimate the following is printed: the correlation for each locus (row) and distance class (column), followed by a line giving the proportion of permutations that gave a larger correlation than the observed one. Below the results for the individual loci are the weighted averages over all loci and their corresponding permutation values. In addition, the terms constituting the overall estimates can be printed out for further inspection. For both methods (IAM and SMM), both the covariances and the variances are given for each locus and for the average.

SPASSIGN performs an assignment test, that is, it calculates the probability of an individual being assigned to each of the populations sampled. Two methods are implemented: (i) a method developed by Paetkau et al. (1995); and (ii) the Bayesian method of Rannala & Mountain (1997). Both methods assume independent association of alleles, that is, Hardy–Weinberg and linkage equilibrium. For each pair of populations, the program calculates the proportion of individuals that are more likely to belong to the other population than to the one in which they were sampled, and it gives the weighted mean, $P_{ass}$, of the two. If an allele does not exist in one population, its frequency is assumed to be 1/(n + 1), where n is the sample size.

The program also calculates a distance based on the overall differences in likelihoods. Consider populations X and Y, with $n_x$ and $n_y$ individuals sampled, respectively.
If $L_{iXX}$ is defined as the likelihood of the genotype of individual i, sampled from population X, given that it originated in population X, and $L_{iXY}$, correspondingly, as the likelihood of the same genotype i, sampled in population X, given that it originated in population Y, then

$$D_{LR} = \left( \sum_{i=1}^{n_x} \log \frac{L_{iXX}}{L_{iXY}} + \sum_{i=1}^{n_y} \log \frac{L_{iYY}}{L_{iYX}} \right) \frac{1}{n_x + n_y}$$

Thus, $D_{LR}$ = 2 means that the genotypes of individuals from the two populations being compared are on average two orders of magnitude more likely to occur in the individuals’ own population than in the other population (Paetkau et al. 1997). I modified the equation of Paetkau et al. by weighting with the sample sizes ($n_x$ and $n_y$) when taking the average. The association between $P_{ass}$ and geographical distances, and between $D_{LR}$ and geographical distances, is tested with the Mantel test (Sokal & Rohlf 1995). Geographic distances can be log(x + 1)-transformed.

Output: Matrix of the average pairwise proportion (of the two populations compared) assigned to the other population. Matrix of pairwise geographical distances among populations. The probability of the observed association (Mantel test). Matrix of $D_{LR}$ distances and the corresponding probability of association. The individual log-likelihoods of the assignment of each individual to a given population can also be printed out.

Acknowledgements

I want to thank Perttu Seppa for discussions on this work, and Einar Árnason for comments. This work was supported by the Icelandic Research Council.

References

Bertorelle G, Barbujani G (1995) Analysis of DNA diversity by spatial autocorrelation. Genetics, 140, 811–819.
Goroposhnaya A, Seppa P, Pamilo P (2001) Social and genetic characteristics of geographically isolated populations in the ant Formica cinerea. Molecular Ecology, 10, 2807–2817.
Hardy OJ, Charbonnel N, Freville H, Heuertz M (2003) Microsatellite allele sizes: a simple test to assess their significance on genetic differentiation. Genetics, 163, 1467–1482.
Hedrick PW (1999) Perspective: Highly variable loci and their interpretation in evolution and conservation. Evolution, 53, 313–318.
Kimura M, Weiss GH (1964) The stepping stone model of population structure and the decrease of genetic correlation with distance. Genetics, 49, 561–576.
Paetkau D, Calvert W, Stirling I, Strobeck C (1995) Microsatellite analysis of population structure in Canadian polar bears. Molecular Ecology, 4, 347–354.
Paetkau D, Waits LP, Clarkson PL, Craighead L, Strobeck C (1997) An empirical evaluation of genetic distance statistics using microsatellite data from bear (Ursidae) populations. Genetics, 147, 1943–1957.
Palsson S (2000) Microsatellite variation in Daphnia pulex from both sides of the Baltic Sea. Molecular Ecology, 9, 1075–1088.
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics, 155, 945–959.
Rannala B, Mountain JL (1997) Detecting immigration by using multilocus genotypes. Proceedings of the National Academy of Sciences, 94, 9197–9201.
Slatkin M (1995) A measure of population subdivision based on microsatellite allele frequencies. Genetics, 139, 457–462.
Sokal RR, Jacquez GM (1991) Testing inferences about microevolutionary processes by means of spatial autocorrelation. Evolution, 45, 152–168.
Sokal RR, Rohlf FJ (1995) Biometry. W.H. Freeman, New York.
Wright S (1943) Isolation by distance. Genetics, 28, 114–138.