30 April 2004

AMOVA – Analysis of MOlecular VAriance

To calculate AMOVA ΦST values we will use Arlequin and the Arlequin input file generated by Convert. This file has a predefined structure with two groups for comparison: Group 1 contains MEN, PLU, SANB, SIS, and SAND; Group 2 contains SIE, SLO, FRE, SAC, and MOD. Defining a genetic structure is only required for AMOVA analyses.

Also note that, if we wanted to, we could compute additional genetic distance estimates with Arlequin, e.g., a count of the number of different alleles between two haplotypes (a weighted FST of shared alleles; Weir and Cockerham [1984], Michalakis and Excoffier [1996]), or the sum of the squared number of repeat differences between two haplotypes (Slatkin’s RST [1995], a linearized FST suited for the stepwise mutation model that we think applies to microsatellite data).

Arlequin is an exploratory population genetics software environment able to handle large samples of molecular data (RFLPs, DNA sequences, microsatellites) while retaining the capacity to analyze conventional genetic data (standard multi-locus data or mere allele frequency data). A variety of population genetics methods have been implemented at either the intra-population or the inter-population level, and they can be selected and parameterized through a graphical interface.

Arlequin is designed to handle different types of molecular or conventional (non-molecular or frequency-type) data. It can handle data presented either as genotype frequencies or as haplotype frequencies, and it can treat codominant or recessive data (with a single recessive allele defined per locus). Molecular data can be analyzed by entering their definition (as DNA sequences, RFLP haplotypes, microsatellite profiles, or multilocus haplotypes), or by entering a distance matrix defining the relationships among the haplotypes.
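The two haplotype distances mentioned above (the number of different alleles, and the sum of squared repeat differences) can be illustrated with a minimal Python sketch. This is not Arlequin's own code; the function names and the example haplotypes are hypothetical, and the haplotypes are assumed to be coded as repeat counts with no missing data.

```python
from typing import Sequence


def count_different_alleles(hap_a: Sequence[int], hap_b: Sequence[int]) -> int:
    """Number of loci at which the two haplotypes carry different alleles."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same loci")
    return sum(1 for a, b in zip(hap_a, hap_b) if a != b)


def sum_squared_repeat_differences(hap_a: Sequence[int], hap_b: Sequence[int]) -> int:
    """Sum over loci of the squared difference in repeat number."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same loci")
    return sum((a - b) ** 2 for a, b in zip(hap_a, hap_b))


# Hypothetical haplotypes: repeat counts at four microsatellite loci (made-up values).
hap_1 = [12, 8, 15, 10]
hap_2 = [12, 10, 14, 10]

print(count_different_alleles(hap_1, hap_2))         # 2 loci differ
print(sum_squared_repeat_differences(hap_1, hap_2))  # 0 + 4 + 1 + 0 = 5
```

The first distance ignores how far apart two alleles are, while the second weights each locus by the squared gap in repeat number, which is why it only makes sense when alleles are coded as repeat counts.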
The data format is specified in an input file, and this has already been done (but you can always look at Appendix II if you really want to see it). Arlequin performs some of the basic population genetic functions listed below:

● Calculates classic population parameters, FST, Hardy-Weinberg exact tests, and tests of linkage disequilibrium
● Calculates Mantel tests (isolation by distance) if given a distance matrix
● Calculates pairwise genetic distances, exact tests of population differentiation, assignment of genotypes, and AMOVA

1. Configuring the input file of the bears codominant data

Arlequin is a real pain in the ass because of its input file format, and the program is notorious for being finicky about input file configuration. The 111-page Arlequin manual gives inadequate instructions on how to configure your data file for input. Luckily this has already been done, so we can move on to loading the data file.

Note that you can use the AMOVA procedure to test for the presence of genetic structure with dominant data under certain conditions. However, you must assume that the mating pattern is the same in all your population samples. If you are ready to make this assumption, you can pretend you have RFLP markers and proceed with the AMOVA analysis. In that case you are partitioning the genotypic variance, not the variance of allele frequencies as for co-dominant markers. Therefore, even though the proportion of variance due to the different levels will be quite informative, and its significance will be meaningful, you should not try to compare the estimated F-statistics to those inferred from co-dominant markers. This is because F-statistics refer to correlations of genes for co-dominant markers, but in this case they would be equal to correlations of genotypes.

2. Open Arlequin and click the Open Project tab. Click Add to list… and then select the file named bearsAMOVA2grps.arp.

3.
Select the Calculation Settings tab, then the + Genetic Structure folder. You will see + AMOVA / MSN (Minimum Spanning Network); select that option. You will then be presented with a number of suboptions: performing an AMOVA, computing pairwise distances, computing a distance matrix, a Locus-by-Locus AMOVA, and an MSN among haplotypes. (If you want to see the output of the MSN, you must load the #NEXUS block displayed in the results file into a program that can view networks from Nexus files, e.g., TreeView.) Choose which of these you want to perform, then press the Run tab. Your results will be displayed in a web browser, and they include population pairwise distances in addition to the AMOVA ΦST results. The basic interpretation of an AMOVA is to consider how much of the genetic variation can be accounted for by the a priori groups defined in the [[Structure]] part of the input file. If you want, you can use the output of the unrooted network from Phylip to define your groupings, but that’s up to you.

4. Results are displayed in a web browser with a directory structure that allows the user to go directly to the result of any particular analysis. AMOVA is like a hierarchical analysis of variance in that it separates and tests tiers of genetic diversity:

● Diversity among groups of populations
● Diversity among the populations within groups
● Diversity among the individuals within a population

5. A note on microsatellite data format for performing an AMOVA

This depends on which kind of estimator you want to use for AMOVA. If you want to compute an analogue of Slatkin's RST, then you need to provide your data coded as the absolute or relative number of microsatellite motif repetitions. This is needed to compute the sum of the squared number of repeat differences between each pair of microsatellite haplotypes.
If your input is proportional to the mere length of the amplified PCR product, this estimate will be flawed. In this case, compute an F-statistic, which does not use information on the amount of difference between alleles at each microsatellite locus.

Appendix II. Arlequin input file format (abbreviated for samples).

Note the [[Structure]] section toward the end of the file. This is the basis for population groupings and comparisons in the AMOVA analysis. Structure can be arranged according to user-defined groupings based on geography or some other justification.

[Profile]
Title = "bear data"
NbSamples = 10
GenotypicData = 1
LocusSeparator = WHITESPACE
GameticPhase = 0 # = unknown gametic phase
MissingData = '?'
CompDistMatrix = 1
DataType = STANDARD

[Data]
[[Samples]]

SampleName = "MEN"
SampleSize = 21
SampleData = {
MEN1 1 187 99 165 202 128 235 118 240
       187 99 165 204 130 235 120 240
. . .
MEN21 1 187 99 163 202 136 233 120 240
        187 99 163 202 136 233 120 240
}

SampleName = "PLU"
SampleSize = 22
SampleData = {
PLU1 1 187 99 163 208 120 229 130 244
       189 99 163 208 120 229 132 244
. . .
PLU22 1 187 99 163 208 120 227 130 244
        187 99 163 208 120 229 132 244
}

SampleName = "SANB"
SampleSize = 30
SampleData = {
SANB1 1 191 105 159 216 142 237 128 248
        191 107 161 216 144 237 130 248
. . .
SANB30 1 191 107 161 212 142 237 128 246
         193 107 163 214 144 239 128 246
}

SampleName = "SIS"
SampleSize = 11
SampleData = {
SIS1 1 195 101 161 210 120 223 122 242
       195 103 161 212 122 225 124 244
. . .
SIS11 1 189 101 159 208 122 225 128 246
        191 103 161 208 122 225 128 246
}

SampleName = "SAND"
SampleSize = 15
SampleData = {
SAND1 1 195 107 163 214 142 237 128 244
        195 107 163 214 144 239 130 246
. . .
SAND15 1 187 99 163 208 120 227 130 246
         187 99 163 208 120 229 130 246
}

SampleName = "SIE"
SampleSize = 20
SampleData = {
SIE1 1 193 105 159 212 142 237 128 246
       195 105 159 214 144 239 130 248
. . .
SIE20 1 191 105 161 216 142 237 130 246
        193 105 161 216 144 239 132 246
}

SampleName = "SLO"
SampleSize = 13
SampleData = {
SLO1 1 191 105 165 214 142 237 130 244
       193 105 165 216 144 239 132 246
. . .
SLO13 1 183 109 159 220 146 231 130 244
        185 109 159 220 148 233 130 244
}

SampleName = "FRE"
SampleSize = 28
SampleData = {
FRE1 1 187 113 157 218 138 231 130 244
       189 113 159 218 140 233 132 246
. . .
FRE28 1 189 111 157 218 140 231 130 244
        189 111 159 220 140 233 132 246
}

SampleName = "SAC"
SampleSize = 18
SampleData = {
SAC1 1 187 109 159 218 138 231 130 244
       187 109 159 218 138 233 132 246
. . .
SAC18 1 183 113 157 222 142 231 130 244
        183 113 157 222 142 231 132 246
}

SampleName = "MOD"
SampleSize = 30
SampleData = {
MOD1 1 189 101 161 208 124 223 122 242
       191 103 161 208 124 223 124 244
. . .
MOD30 1 189 101 161 212 122 225 126 244
        189 103 161 212 124 225 126 246
}

[[Structure]]
StructureName = "One Group"
NbGroups = 2
IndividualLevel = 1
#group1
Group = {
"MEN"
"PLU"
"SANB"
"SIS"
"SAND"
}
#group2
Group = {
"SIE"
"SLO"
"FRE"
"SAC"
"MOD"
}
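
Because the [[Structure]] section is the part you are most likely to edit when trying alternative groupings, a small script can regenerate it from a mapping of group labels to sample names. The following is a minimal Python sketch under stated assumptions: the function name `structure_block` is hypothetical, the keys simply mirror the ones shown in the appendix above, and the output should be checked against the Arlequin manual before being pasted into a project file.

```python
from typing import Dict, List


def structure_block(name: str, groups: Dict[str, List[str]],
                    individual_level: int = 1) -> str:
    """Return a [[Structure]] section as text, with one Group per entry in `groups`."""
    # Guard against a sample being assigned to more than one group.
    seen = set()
    for samples in groups.values():
        for sample in samples:
            if sample in seen:
                raise ValueError(f"sample {sample!r} is assigned to more than one group")
            seen.add(sample)

    lines = [
        "[[Structure]]",
        f'StructureName = "{name}"',
        f"NbGroups = {len(groups)}",
        f"IndividualLevel = {individual_level}",
    ]
    for label, samples in groups.items():
        lines.append(f"#{label}")  # comment line naming the group
        lines.append("Group = {")
        lines.extend(f'"{sample}"' for sample in samples)
        lines.append("}")
    return "\n".join(lines)


# The two groups used in this exercise, with the structure name from the appendix.
bear_groups = {
    "group1": ["MEN", "PLU", "SANB", "SIS", "SAND"],
    "group2": ["SIE", "SLO", "FRE", "SAC", "MOD"],
}
print(structure_block("One Group", bear_groups))
```

To test a different grouping (for example, one based on the Phylip network), change the `bear_groups` mapping and replace the [[Structure]] section of the .arp file with the printed text; the sample names must match the SampleName entries exactly.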