Molecular Ecology Notes (2004)
doi: 10.1046/j.1471-8286.2003.00581.x
PROGRAM NOTE
Blackwell Science, Ltd
Isolation by distance, based on microsatellite data, tested
with spatial autocorrelation (SPAIDA) and assignment test
(SPASSIGN)
SNÆBJÖRN PÁLSSON
Institute of Biology, University of Iceland, Sturlugata 7, 101 Reykjavik, Iceland
Abstract
SPASSIGN and SPAIDA are two small programs for detecting isolation by distance at microsatellite loci. The programs are written in C and are available for Linux and Windows systems
at http://www.hi.is/~snaebj/programs.html. SPAIDA calculates two estimates of spatial autocorrelation, Moran’s I and Geary’s c, first by assuming the infinite allele model, and second
by assuming a stepwise mutational model. SPASSIGN calculates the assignment probabilities
of an individual’s genotype to the location where it was sampled and compares them with the probabilities
of assignment to other locations. Genetic distances among regions, based on the overall differences in likelihoods, are also calculated.
Keywords: assignment, distance, genetics, population, spatial autocorrelation
Received 30 September 2003; revision received 30 October 2003; accepted 20 November 2003
Isolation by distance (Wright 1943) results from less
mixing among individuals, or pairs of populations, that
are situated further apart than among those that are
separated by shorter distances. This leads to a positive
correlation between genetic and geographical distances,
either within a continuously distributed species (e.g.
Sokal & Jacquez 1991) or among populations with a discrete
structure (Kimura & Weiss 1964).
Microsatellites have in recent years become a popular
marker of choice for addressing population genetic and demographic questions. This interest has led to the
development of various methods for the analysis of such
data. As microsatellites may carry information on past
mutational events, reflected in differences in the number of
repeats, statistics incorporating this information (stepwise mutational model, SMM)
have been developed, such as R_ST (Slatkin 1995). How well such methods, compared to the ones based on the infinite allele model (IAM),
reflect the true demographic events has been the subject
of several studies (see Hardy et al. 2003). Statistics based on
the SMM may be preferable when, for example, heterozygosity
is very high, blurring the signal of subdivision (Hedrick
1999). Bertorelle & Barbujani (1995) developed the program
AIDA to detect isolation by distance, based on the correlation in
allele frequencies across geographical distances. In SPAIDA
I extend this concept by adding the information obtained
from differences in allelic sizes.
Correspondence: Snæbjörn Pálsson. Fax: + 354 525 4069; E-mail:
[email protected]
© 2004 Blackwell Publishing Ltd
Another class of analysis, assignment tests (e.g. Pritchard
et al. 2000), which has gained increased attention lately, is
based on the probabilities of a certain genotype being sampled
at different locations, given the population frequencies
of the alleles comprising the genotype. The program SPASSIGN
tests how the assignment values depend on geographical
distance.
The software binaries for Unix, Linux and Windows can
be found at http://www.hi.is/~snaebj/programs.html.
Source code is available on request. Earlier applications of these
programs are described in Palsson (2000) and Goroposhnaya
et al. (2001).
The input file is similar to the one used for GENEPOP (see
Table 1). However, it should not contain any commas, and a
space should be inserted between the two alleles carried by an
individual at a single locus. Missing values are coded as an
allele size of 0. Geographical coordinates are given in the last
two columns, either as selected coordinates on a map or as
latitudes and longitudes (as degrees.minutes).
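The format described above can be read with a few lines of code. The following is a minimal sketch in Python, not part of SPAIDA or SPASSIGN: the function name `parse_input`, the returned data structure and the made-up records are illustrative assumptions, as is the reading that each individual line holds a name, two allele sizes per locus and two coordinates.

```python
def parse_input(text):
    """Parse a GENEPOP-like file: locus names, then 'Pop' blocks of individuals.

    Assumed line layout (an interpretation of the described format): name,
    two allele sizes per locus separated by spaces, then two coordinates.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    first_pop = next(i for i, ln in enumerate(lines) if ln.lower() == "pop")
    loci = lines[:first_pop]
    populations = []
    for ln in lines[first_pop:]:
        if ln.lower() == "pop":
            populations.append([])  # start a new population block
            continue
        fields = ln.split()
        name, values = fields[0], fields[1:]
        if len(values) != 2 * len(loci) + 2:
            raise ValueError(f"unexpected number of columns for {name}")
        alleles = [int(v) for v in values[:-2]]
        # An allele size of 0 marks a missing value; store it as None.
        genotype = [
            tuple(a if a != 0 else None for a in alleles[k:k + 2])
            for k in range(0, len(alleles), 2)
        ]
        xy = (float(values[-2]), float(values[-1]))
        populations[-1].append({"name": name, "genotype": genotype, "xy": xy})
    return loci, populations


# Made-up records, two loci, two populations.
example = """Locus1
Locus2
Pop
A 12 13 20 20 0 0
B 12 0 20 20 0 0
Pop
C 16 14 18 20 0 3
"""
loci, pops = parse_input(example)
```

Coordinates are kept as plain numbers here; converting degrees.minutes to decimal degrees would be a separate step.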
Table 1 Example of an input file

Locus1
Locus2
Locus3
Pop
Indiv1  12 13  20 20  34 34  0 0
Indiv2  12 14  20 20  34 36  0 0
Indiv3  12 16  20 20  36 36  0 0
Indiv4  13 13  20 20  36 36  0 0
Pop
ind1    16 14  20 20  34 36  0 3
ind2    16 14  18 20  34 36  0 3
ind3    16 14  10 20  34 36  0 3
ind4    12 14  10 20  34 36  0 3

The SPAIDA program calculates two estimates of spatial
autocorrelation, extended for microsatellite data, Moran’s
I and Geary’s c (see Bertorelle & Barbujani 1995), first in the
traditional way, assuming that the variation at the marker
has been generated by the infinite allele model, and second
assuming a stepwise mutational model. Moran’s I scales
the covariance among alleles from individuals separated
by a distance within a given distance class by the total variance. Values
can therefore range from −1 to 1, from a negative to a positive relationship. Geary’s c scales the variance within a
distance class by the total variance, resulting in values
ranging from 0 upwards. When there is a positive correlation,
there is little variation within the distance class and the c-value
is close to 0. In the IAM the variances and covariances
are based only on the frequencies of different alleles,
whereas in the SMM they are based
on differences in the number of repeats.
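To make the two statistics concrete, the sketch below computes the textbook forms of Moran’s I and Geary’s c for one distance class, using a single numerical value per sampled unit (for instance a repeat number, in the spirit of the SMM). It is an illustration under stated assumptions, not the SPAIDA estimator itself: the function name, the half-open class interval and the toy data are hypothetical, and the extension to allele frequencies under the IAM (Bertorelle & Barbujani 1995) is not shown.

```python
import math


def morans_i_and_gearys_c(values, coords, lower, upper):
    """Moran's I and Geary's c for pairs with lower <= distance < upper."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_ss = sum(d * d for d in dev)  # total sum of squares
    cross = 0.0    # sum of products of deviations over pairs in the class
    sq_diff = 0.0  # sum of squared differences over pairs in the class
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if lower <= d < upper:
                cross += dev[i] * dev[j]
                sq_diff += (values[i] - values[j]) ** 2
                pairs += 1
    if pairs == 0 or total_ss == 0:
        return None, None  # statistics undefined for this class
    moran = (n / pairs) * cross / total_ss
    geary = ((n - 1) / (2 * pairs)) * sq_diff / total_ss
    return moran, geary


# Toy data: two spatial clusters with identical values inside each cluster.
values = [10, 10, 20, 20]
coords = [(0, 0), (0, 1), (5, 0), (5, 1)]
near = morans_i_and_gearys_c(values, coords, 0, 2)   # (1.0, 0.0)
far = morans_i_and_gearys_c(values, coords, 2, 10)   # (-1.0, 1.5)
```

In the short distance class the values are identical within pairs, so I is 1 and c is 0; in the long distance class all pairs span the two clusters, giving a negative I and a c above 1.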
The number of distance classes can be selected; the classes are either
of equal length, or the border of each class can be defined by the user.
Geographic distances can be log(x + 1)-transformed.
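The two ways of defining distance classes can be sketched as follows. This is an illustrative Python fragment under assumptions of my own: the function name and arguments are hypothetical, equal-length classes are taken to span 0 to the largest distance, and user-defined borders are assumed to be upper borders given on the same (possibly transformed) scale as the distances.

```python
import bisect
import math


def distance_classes(distances, n_classes=None, borders=None, log_transform=False):
    """Return (upper_borders, class index for each distance)."""
    if log_transform:
        distances = [math.log(d + 1.0) for d in distances]
    if borders is None:
        # Equal-length classes spanning 0 .. largest distance.
        width = max(distances) / n_classes
        borders = [width * (k + 1) for k in range(n_classes)]
    # A distance falls in the first class whose upper border is >= the distance;
    # anything beyond the last border is placed in the last class.
    labels = [
        min(bisect.bisect_left(borders, d), len(borders) - 1)
        for d in distances
    ]
    return borders, labels


borders, labels = distance_classes([1, 2, 3, 4, 5, 6], n_classes=3)
# borders -> [2.0, 4.0, 6.0]; labels -> [0, 0, 1, 1, 2, 2]
```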
The significance of the correlation for each distance class
is evaluated by a permutation test. For a selected number
of permutations, the geographical locations of individuals are reassigned
at random and the statistics are calculated anew, providing
a reference distribution for the observed test statistics.
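The permutation scheme can be sketched generically. In the Python fragment below, locations are shuffled among individuals and the proportion of permutations giving a larger statistic than the observed one is reported. The function names, the default of 999 permutations and the stand-in statistic `neighbour_similarity` are assumptions made for illustration; any autocorrelation statistic with the same call signature could be plugged in.

```python
import random


def neighbour_similarity(values, coords, max_dist=1.5):
    """Stand-in statistic: minus the mean squared difference between neighbours."""
    diffs = [
        (values[i] - values[j]) ** 2
        for i in range(len(values))
        for j in range(i + 1, len(values))
        if ((coords[i][0] - coords[j][0]) ** 2
            + (coords[i][1] - coords[j][1]) ** 2) ** 0.5 <= max_dist
    ]
    return -sum(diffs) / len(diffs)


def permutation_test(values, coords, statistic, n_perm=999, seed=0):
    """Observed statistic and proportion of permutations giving a larger value."""
    rng = random.Random(seed)
    observed = statistic(values, coords)
    shuffled = list(coords)
    larger = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign locations to individuals at random
        if statistic(values, shuffled) > observed:
            larger += 1
    return observed, larger / n_perm


# Toy data: neighbours always share the same value.
values = [10, 10, 10, 20, 20, 20]
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
observed, proportion = permutation_test(values, coords, neighbour_similarity)
```

With these toy data the observed statistic is already at its maximum, so no permutation exceeds it and the reported proportion is 0.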
Output: The geographic distance classes are printed, and
for each estimate the following is given: the correlation for
each locus (row) and distance class (column), followed by
a line giving the proportion of permutations that gave a
larger correlation than the observed one. Below the results
for the individual loci are the weighted averages over all loci and their
corresponding permutation values.
In addition, the terms constituting the overall estimates
can be printed for further inspection. For both methods
(IAM and SMM) the covariances and the variances
are given for each locus and for the average.
SPASSIGN performs an assignment test, calculating the probability for
an individual to be assigned to each of the populations sampled. Two methods are implemented: (i) the method developed by Paetkau et al. (1995); and (ii) the Bayesian method
of Rannala & Mountain (1997). Both methods assume
independent association of alleles, that is, Hardy–Weinberg
and linkage equilibrium. For each pair of populations, the
program calculates the proportion of individuals that
are more likely to belong to the other population than to the
one in which they were sampled, and it gives the weighted
mean, P_ass, of the two proportions. If an allele does not occur in a
population, its frequency there is assumed to be 1/(n + 1),
where n is the sample size.
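A frequency-based genotype likelihood of this kind can be sketched as follows. Under Hardy–Weinberg and linkage equilibrium the likelihood of a multilocus genotype is the product over loci of p² for a homozygote and 2pq for a heterozygote, and an allele absent from a population is given frequency 1/(n + 1). The Python code is an illustration only: the function names and data layout are hypothetical, missing alleles (size 0) are not handled, no leave-one-out correction is applied, and the Bayesian method of Rannala & Mountain (1997) is not shown.

```python
import math
from collections import Counter


def allele_frequencies(genotypes):
    """Per-locus allele frequencies; genotypes[i][l] is the pair (a1, a2)."""
    n_loci = len(genotypes[0])
    freqs = []
    for locus in range(n_loci):
        counts = Counter(a for g in genotypes for a in g[locus])
        total = sum(counts.values())
        freqs.append({a: c / total for a, c in counts.items()})
    return freqs


def log10_likelihood(genotype, freqs, n_sampled):
    """log10 likelihood of a genotype given a population's allele frequencies."""
    absent = 1.0 / (n_sampled + 1)  # frequency used for alleles not observed
    logl = 0.0
    for (a1, a2), f in zip(genotype, freqs):
        p, q = f.get(a1, absent), f.get(a2, absent)
        logl += math.log10(p * p if a1 == a2 else 2 * p * q)
    return logl


# Toy data: one locus, two individuals per population.
pop_x = [[(12, 12)], [(12, 14)]]
pop_y = [[(16, 16)], [(16, 16)]]
fx, fy = allele_frequencies(pop_x), allele_frequencies(pop_y)
own = log10_likelihood(pop_x[1], fx, len(pop_x))    # log10(2 * 0.75 * 0.25)
other = log10_likelihood(pop_x[1], fy, len(pop_y))  # log10(2 * (1/3) * (1/3))
```

An individual counts towards the misassigned proportion of its population when the likelihood in the other population exceeds the likelihood in its own.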
The program also calculates a distance based on the overall differences in likelihoods. Consider populations X and
Y, with n_x and n_y individuals sampled, respectively. If L_{i,XX}
is defined as the likelihood of the genotype of individual i,
sampled from population X, given that it originated in
population X, and L_{i,XY}, correspondingly, as the likelihood
of the same genotype, sampled in population X, given that
it originated in population Y, then

\[
D_{LR} = \frac{1}{n_x + n_y} \left( \sum_{i=1}^{n_x} \log \frac{L_{i,XX}}{L_{i,XY}} + \sum_{i=1}^{n_y} \log \frac{L_{i,YY}}{L_{i,YX}} \right)
\]

Thus if D_LR = 2, the genotypes of individuals from the two populations being compared are
on average two orders of magnitude more likely to occur
in the individuals’ own population than in the other population (Paetkau et al. 1997). I modified the equation of
Paetkau et al. by weighting with the sample sizes (n_x and
n_y) when taking the average.
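As a worked illustration of the sample-size weighted distance, the sketch below sums the log-likelihood ratios over all individuals of both populations and divides by the total sample size. The function name and the toy numbers are hypothetical, and base-10 logarithms are assumed because the text interprets the result in orders of magnitude.

```python
def d_lr(logl_x_in_x, logl_x_in_y, logl_y_in_y, logl_y_in_x):
    """Sample-size weighted D_LR from per-individual log10 likelihoods.

    logl_x_in_y[i] is the log10 likelihood of individual i, sampled in X,
    given the allele frequencies of Y (and likewise for the other lists).
    """
    nx, ny = len(logl_x_in_x), len(logl_y_in_y)
    # log(L_own / L_other) = log L_own - log L_other for each individual
    sum_x = sum(own - other for own, other in zip(logl_x_in_x, logl_x_in_y))
    sum_y = sum(own - other for own, other in zip(logl_y_in_y, logl_y_in_x))
    return (sum_x + sum_y) / (nx + ny)


# Toy values: every genotype is 100 times more likely in its own population.
distance = d_lr([-1.0, -2.0], [-3.0, -4.0], [-1.0], [-3.0])  # 2.0
```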
The association between P_ass and geographical distance, and
between D_LR and geographical distance, is tested with
the Mantel test (Sokal & Rohlf 1995). Geographic distances
can be log(x + 1)-transformed.
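A Mantel test can be sketched as a permutation test on the correlation between the off-diagonal elements of two distance matrices, with the rows and columns of one matrix permuted together. The Python code below is a generic illustration, not the SPASSIGN implementation: the function names, the use of the Pearson correlation as the test statistic, the one-sided p-value and the toy matrices are all assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(a, b, n_perm=999, seed=0):
    """Matrix correlation and one-sided permutation p-value."""
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    ys = [b[i][j] for i, j in pairs]

    def corr(order):
        # Permute rows and columns of matrix a simultaneously.
        xs = [a[order[i]][order[j]] for i, j in pairs]
        return pearson(xs, ys)

    order = list(range(n))
    observed = corr(order)
    rng = random.Random(seed)
    as_large = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if corr(order) >= observed:
            as_large += 1
    # The observed arrangement is counted as one of the permutations.
    return observed, (as_large + 1) / (n_perm + 1)


# Toy data: four populations on a line; both matrices hold the same distances.
points = [0, 1, 3, 7]
geo = [[abs(p - q) for q in points] for p in points]
r, p_value = mantel_test(geo, geo)
```

Here the two matrices are identical, so the matrix correlation is 1 and only permutations that reproduce the original arrangement reach the observed value.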
Output: A matrix of the average pairwise proportions (of the
two populations compared) assigned to the other population; a matrix of pairwise geographical distances among
populations; the probability of the observed association
(Mantel test); and a matrix of D_LR distances with the corresponding
probability of association.
Individual log-likelihoods of the assignment of each individual to a given population can also be printed out.
Acknowledgements
I want to thank Perttu Seppa for discussions on this work, and
Einar Árnason for comments. This work was supported by the
Icelandic Research Council.
References
Bertorelle G, Barbujani G (1995) Analysis of DNA diversity by spatial autocorrelation. Genetics, 140, 811–819.
Goroposhnaya A, Seppa P, Pamilo P (2001) Social and genetic
characteristics of geographically isolated populations in the ant
Formica cinerea. Molecular Ecology, 10, 2807–2817.
Hardy OJ, Charbonnel N, Freville H, Heuertz M (2003) Microsatellite allele sizes: a simple test to assess their significance
on genetic differentiation. Genetics, 163, 1467–1482.
Hedrick PW (1999) Perspective: Highly variable loci and their
interpretation in evolution and conservation. Evolution, 53,
313–318.
Kimura M, Weiss GH (1964) The stepping stone model of population structure and the decrease of genetic correlation with
distance. Genetics, 49, 561–576.
Paetkau D, Calvert W, Stirling I, Strobeck C (1995) Microsatellite
analysis of population structure in Canadian polar bears. Molecular Ecology, 4, 347.
Paetkau D, Waits LP, Clarkson PL, Craighead L, Strobeck C (1997)
An empirical evaluation of genetic distance statistics using
microsatellite data from bear (Ursidae) populations. Genetics,
147, 1943–1957.
Palsson S (2000) Microsatellite variation in Daphnia pulex
from both sides of the Baltic Sea. Molecular Ecology, 9, 1075–
1088.
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population
structure using multilocus genotype data. Genetics, 155, 945–959.
Rannala B, Mountain JL (1997) Detecting immigration by using
multilocus genotypes. Proceedings of the National Academy of
Sciences, 94, 9197–9201.
Slatkin M (1995) A measure of population subdivision based on
microsatellite allele frequencies. Genetics, 139, 457–462.
Sokal RR, Jacquez GM (1991) Testing inferences about microevolutionary processes by means of spatial autocorrelation.
Evolution, 45, 152–168.
Sokal RR, Rohlf FJ (1995) Biometry. W.H. Freeman, New York.
Wright S (1943) Isolation by distance. Genetics, 28, 114–138.