30 April 2004
AMOVA – Analysis of MOlecular VAriance
To calculate AMOVA ΦST values we will use Arlequin and the Arlequin input file
generated by Convert. This file has a predefined structure with two groups for comparison:
Group 1 contains MEN, PLU, SANB, SIS, and SAND; Group 2 contains SIE, SLO, FRE, SAC,
and MOD. Defining a genetic structure is only required for AMOVA analyses. Also note that, if
we wanted to, we could perform additional genetic distance estimates with Arlequin: e.g., a count
of the number of different alleles between two haplotypes (a weighted FST of shared alleles, Weir
and Cockerham [1984], Michalakis and Excoffier [1996]), or the sum of the squared number of
repeat differences between two haplotypes (Slatkin's RST [1995], a linearized FST suited to the
stepwise mutation model that we think applies to microsatellite data).
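The two distance measures mentioned above can be illustrated with a minimal Python sketch. It is not Arlequin's implementation. The function names and the two haplotypes are hypothetical, and the haplotypes are assumed to be coded as repeat counts per locus.

```python
def allele_mismatch_count(hap_a, hap_b):
    """Count the loci at which two haplotypes carry different alleles."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same loci")
    return sum(1 for a, b in zip(hap_a, hap_b) if a != b)


def sum_squared_repeat_diff(hap_a, hap_b):
    """Sum over loci of the squared difference in repeat number."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same loci")
    return sum((a - b) ** 2 for a, b in zip(hap_a, hap_b))


# Hypothetical haplotypes: repeat counts at four microsatellite loci.
hap_1 = (12, 8, 15, 20)
hap_2 = (12, 10, 15, 17)

print(allele_mismatch_count(hap_1, hap_2))    # 2 loci differ
print(sum_squared_repeat_diff(hap_1, hap_2))  # (8-10)^2 + (20-17)^2 = 13
```

The first measure only asks whether alleles differ. The second also uses how far apart they are, which is why it needs repeat-number coding.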
Arlequin is an exploratory population genetics software environment able to handle large
samples of molecular data (RFLPs, DNA sequences, microsatellites), while retaining the
capacity to analyze conventional genetic data (standard multi-locus data or mere allele
frequency data). A variety of population genetics methods have been implemented at either the
intra-population or the inter-population level, and they can be selected and parameterized
through a graphical interface. Arlequin is designed to handle different types of molecular or
conventional (non-molecular or frequency-type) data. It can handle data presented either as
genotype frequencies or as haplotype frequencies, and it can treat markers as codominant or
recessive (with the definition of a single recessive allele per locus).
Molecular data can be analyzed by entering their definition (as DNA sequences, RFLP
haplotypes, microsatellite profiles, or multilocus haplotypes), or by entering a distance matrix
defining the relationships among the haplotypes. The data format is specified in an input file, and
this has already been done (but you can always look at Appendix II if you really want to see it).
Arlequin performs some of the basic population genetic functions listed below:
● Calculates classic population parameters, FST, Hardy-Weinberg exact tests, tests of
linkage disequilibrium
● Calculates Mantel tests (isolation by distance) if given a distance matrix
● Calculates pairwise genetic distances, exact tests of population differentiation,
assignment of genotypes, and AMOVA
1. Configuring the input file for the bear codominant data
Arlequin is a real pain in the ass because of the input file format, and the program is
notorious for being finicky about input file configuration. The 111-page Arlequin manual
gives inadequate instructions on how to configure your data file for input into Arlequin.
Luckily this has already been done, so we can move on to loading the data file. Note
that you can use the AMOVA procedure to test for the presence of genetic structure with
dominant data under certain conditions. However, you must assume that you have the
same mating pattern in all your population samples. If you are ready to make this
assumption, you can pretend you have RFLP markers and proceed with the AMOVA
analysis. In that case you are going to partition the genotypic variance, and not the
variance of allele frequencies as for co-dominant markers. Therefore, even though the
proportion of variance due to different levels will be quite informative, and its significance
will be meaningful, you should not try to compare the estimated F-statistics to those
inferred from co-dominant markers. This is because F-statistics refer to correlations of
genes for co-dominant markers, but in this case they would be equal to correlations of
genotypes.
2. Open Arlequin and click the Open Project tab.
Click Add to list… and then select the file named bearsAMOVA2grps.arp.
3. Select the Calculation Settings tab, then the + Genetic Structure folder.
You will see + AMOVA / MSN (Minimum Spanning Network), and you should
select that option. You will then be presented with a number of suboptions: performing
an AMOVA, computing pairwise distances, computing a distance matrix, a
Locus-by-Locus AMOVA, and an MSN among haplotypes. (If you want to see the output of the
MSN, you must load the #NEXUS block displayed in the results file into a
program that can view networks from Nexus files [e.g., TreeView].) Choose among
these and decide which you want to perform. When you’ve decided, press the Run tab.
Your results will be displayed in a web browser, and these include population pairwise
distances in addition to the AMOVA ΦST results. The basic interpretation of an AMOVA
is to consider how much of the genetic variation can be accounted for by the a priori
groups defined in the [[Structure]] part of the input file. If you want, you can use the
output of the unrooted network from Phylip to define your groupings, but that’s up to you.
4. Results are displayed in a web browser with a directory structure that allows the user to go
directly to the result of any particular analysis. AMOVA is like a hierarchical analysis of
variance in that it separates and tests tiers of genetic diversity:
● Diversity among groups of populations
● Diversity among the populations within groups
● Diversity among the individuals within a population
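The three tiers listed above can be made concrete with a rough Python sketch that splits a total sum of squared deviations (SSD) into those tiers from pairwise haplotype differences. It is a simplified illustration under stated assumptions, not Arlequin's procedure. The data and function names are hypothetical, the number of differing loci is used as the squared distance, and the variance components and permutation tests of a real AMOVA are not computed.

```python
def pairwise_diff(hap_a, hap_b):
    """Squared distance, taken here as the number of differing loci."""
    return sum(1 for a, b in zip(hap_a, hap_b) if a != b)


def ssd(haplotypes):
    """Sum of squared deviations: all ordered pairs divided by 2n."""
    n = len(haplotypes)
    if n == 0:
        return 0.0
    total = sum(pairwise_diff(a, b) for a in haplotypes for b in haplotypes)
    return total / (2.0 * n)


def partition_ssd(groups):
    """groups maps group name -> {population name -> list of haplotypes}."""
    all_haps = []
    within_groups = 0.0
    within_pops = 0.0
    for pops in groups.values():
        group_haps = []
        for haps in pops.values():
            within_pops += ssd(haps)
            group_haps.extend(haps)
        within_groups += ssd(group_haps)
        all_haps.extend(group_haps)
    total = ssd(all_haps)
    return {
        "among_groups": total - within_groups,
        "among_pops_within_groups": within_groups - within_pops,
        "within_pops": within_pops,
        "total": total,
    }


# Hypothetical two-locus haplotypes in two groups of two populations each.
example = {
    "group1": {"A": [(1, 1), (1, 1)], "B": [(1, 1), (1, 2)]},
    "group2": {"C": [(2, 2), (2, 2)], "D": [(2, 2), (2, 2)]},
}
for tier, value in partition_ssd(example).items():
    print(tier, value)
```

In this toy data set most of the total SSD falls among groups, which is the pattern you would expect when the a priori groups capture real genetic structure.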
5. A note on microsatellite data format for performing an AMOVA
The format depends on which kind of estimator you want to use for AMOVA. If you want to
compute an analogue of Slatkin's RST, then you need to provide your data coded in
terms of the absolute or relative number of microsatellite motif repetitions. This is needed to
compute the sum of the squared number of repeat differences between each pair of
microsatellite haplotypes. If your input is merely proportional to the length of the amplified
PCR product, this estimate will be flawed. In that case, compute an F-statistic, which does
not use information on the amount of difference between alleles at each microsatellite
locus.
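As an illustration of the recoding described above, the following Python sketch converts allele sizes in base pairs into relative repeat numbers for one locus. The function name, the example sizes, and the motif length of 2 bp are assumptions made for illustration only. Check the actual motif length of each locus before recoding real data.

```python
def to_relative_repeats(allele_sizes_bp, motif_length_bp):
    """Recode fragment lengths as repeat counts relative to the shortest allele."""
    smallest = min(allele_sizes_bp)
    repeats = []
    for size in allele_sizes_bp:
        offset = size - smallest
        if offset % motif_length_bp != 0:
            # A size that is not a whole number of motifs away from the
            # shortest allele cannot be recoded without further checking.
            raise ValueError(f"allele size {size} does not fit the motif length")
        repeats.append(offset // motif_length_bp)
    return repeats


# Illustrative allele sizes (bp) at one locus, assuming a dinucleotide motif.
sizes = [187, 189, 191, 195]
print(to_relative_repeats(sizes, 2))  # [0, 1, 2, 4]
```

Relative repeat numbers are enough for a sum of squared repeat differences, because only the differences between alleles enter the calculation.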
Appendix II. Arlequin input file format (abbreviated for samples). Note toward the end of the
file where the [[Structure]] format is displayed. This is the basis for population groupings and
comparisons in the AMOVA analysis. Structure can be arranged according to user-defined
groupings based on geography or some other justification.
[Profile]
Title = "bear data"
NbSamples = 10
GenotypicData = 1
LocusSeparator = WHITESPACE
GameticPhase = 0 # = unknown gametic phase
MissingData = '?'
CompDistMatrix = 1
DataType = STANDARD
[Data]
[[Samples]]
SampleName = "MEN"
SampleSize = 21
SampleData = {
MEN1    1  187  99 165 202 128 235 118 240
           187  99 165 204 130 235 120 240
.
.
.
MEN21   1  187  99 163 202 136 233 120 240
           187  99 163 202 136 233 120 240
}
SampleName = "PLU"
SampleSize = 22
SampleData = {
PLU1    1  187  99 163 208 120 229 130 244
           189  99 163 208 120 229 132 244
.
.
.
PLU22   1  187  99 163 208 120 227 130 244
           187  99 163 208 120 229 132 244
}
SampleName = "SANB"
SampleSize = 30
SampleData = {
SANB1   1  191 105 159 216 142 237 128 248
           191 107 161 216 144 237 130 248
.
.
.
SANB30  1  191 107 161 212 142 237 128 246
           193 107 163 214 144 239 128 246
}
SampleName = "SIS"
SampleSize = 11
SampleData = {
SIS1    1  195 101 161 210 120 223 122 242
           195 103 161 212 122 225 124 244
.
.
.
SIS11   1  189 101 159 208 122 225 128 246
           191 103 161 208 122 225 128 246
}
SampleName = "SAND"
SampleSize = 15
SampleData = {
SAND1   1  195 107 163 214 142 237 128 244
           195 107 163 214 144 239 130 246
.
.
.
SAND15  1  187  99 163 208 120 227 130 246
           187  99 163 208 120 229 130 246
}
SampleName = "SIE"
SampleSize = 20
SampleData = {
SIE1    1  193 105 159 212 142 237 128 246
           195 105 159 214 144 239 130 248
.
.
.
SIE20   1  191 105 161 216 142 237 130 246
           193 105 161 216 144 239 132 246
}
SampleName = "SLO"
SampleSize = 13
SampleData = {
SLO1    1  191 105 165 214 142 237 130 244
           193 105 165 216 144 239 132 246
.
.
.
SLO13   1  183 109 159 220 146 231 130 244
           185 109 159 220 148 233 130 244
}
SampleName = "FRE"
SampleSize = 28
SampleData = {
FRE1    1  187 113 157 218 138 231 130 244
           189 113 159 218 140 233 132 246
.
.
.
FRE28   1  189 111 157 218 140 231 130 244
           189 111 159 220 140 233 132 246
}
SampleName = "SAC"
SampleSize = 18
SampleData = {
SAC1    1  187 109 159 218 138 231 130 244
           187 109 159 218 138 233 132 246
.
.
.
SAC18   1  183 113 157 222 142 231 130 244
           183 113 157 222 142 231 132 246
}
SampleName = "MOD"
SampleSize = 30
SampleData = {
MOD1    1  189 101 161 208 124 223 122 242
           191 103 161 208 124 223 124 244
.
.
.
MOD30   1  189 101 161 212 122 225 126 244
           189 103 161 212 124 225 126 246
}
[[Structure]]
StructureName = "One Group"
NbGroups = 2
IndividualLevel = 1
#group1
Group = {
"MEN"
"PLU"
"SANB"
"SIS"
"SAND"
}
#group2
Group = {
"SIE"
"SLO"
"FRE"
"SAC"
"MOD"
}
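If you want to try a different grouping, a [[Structure]] block in the same layout as the listing above can be generated with a short Python helper. This is only a sketch: the function name is hypothetical, and it simply mimics the layout shown in this appendix rather than following the Arlequin manual's full specification.

```python
def format_structure(name, groups, individual_level=1):
    """Build a [[Structure]] block from a list of groups of sample names."""
    lines = [
        "[[Structure]]",
        f'StructureName = "{name}"',
        f"NbGroups = {len(groups)}",
        f"IndividualLevel = {individual_level}",
    ]
    for index, samples in enumerate(groups, start=1):
        lines.append(f"#group{index}")
        lines.append("Group = {")
        lines.extend(f'"{sample}"' for sample in samples)
        lines.append("}")
    return "\n".join(lines)


# The two groups used in this exercise.
bear_groups = [
    ["MEN", "PLU", "SANB", "SIS", "SAND"],
    ["SIE", "SLO", "FRE", "SAC", "MOD"],
]
print(format_structure("Two Groups", bear_groups))
```

The sample names in each group must match the SampleName entries in the [[Samples]] section exactly.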