From which population does this individual come?
Statistical Package for Analyzing Mixtures (SPAM)

Alaska Department of Fish and Game. 2003. SPAM Version 3.7: Statistical Package for Analyzing Mixtures. Alaska Department of Fish and Game, Commercial Fisheries Division, Gene Conservation Lab. Available for download from www.genetics.cf.adfg.state.ak.us/software/spampage.php

More explicit details are provided in the very large handbook for the program.

How do I use SPAM?

1. Data Preparation

The program WhichRun can convert baseline and allele-frequency Genepop files (.txt) into a format usable by SPAM (.frq, .mix). The converted files must be saved individually with the correct prefix, and a baseline file must be created. It took us about 20 minutes to massage the splittail example files into working order for this computer class, and Excel had to be used to reformat the files converted by WhichRun.

The baseline file (*.bse, *.frq) contains the response (allele or phenotype) frequencies for each character (locus, isolocus, or phenotype). Baselines can be stored in a single large file or in separate files for individual populations. The frequencies can be either relative frequencies or actual counts. Counts are preferred because rounding relative frequencies can cause errors in the calculations. In a baseline file, the first line identifies the population with a # and the name, and is followed by the response frequencies. If relative frequencies are used, the decimal points delimit the values, so a space between responses is not needed.

The mixture file (*.mix) contains individual types (genotypes for genetic characters). The first portion of this file lists the characters used. The first line is “* character”, followed by the character id# and names, and ending with “* end”. The mixture data begin after a backslash (\), with each individual’s genotype on its own line and each character separated by a space. Alleles must total 2 for a locus, 4 for an isolocus, and 1 for a phenotype.

2. Building a Control File

Each control file has 8 sections.
The following table details the controls in each section. To turn each option in SPAM on or off, the program recognizes T, F, true, false, yes, no, on, off.

Section                        Purpose
*estimation or *simulation     Identifies type of analysis being done
*options                       Allows selection of performance and output
*parameters                    Specifies number of populations and characters, upper limit parameter, and tolerances for optimization search
*characters                    Defines each character's id#, type, and name
*populations                   Defines each population's id#, name, baseline file, and regional aggregations
*regions                       Permits aggregations to be labelled
*files                         Associates files for analysis with the *.ctl file
*run                           Denotes end of control file

I recommend using the default control file (found in the Control folder) and adjusting your settings once you are able to get the program to run. To simply run the default control file, make certain the file paths in the *files section are correct. For this lab (with the “Columbia” folder copied into your personal folder) they should look something like this.

*file path: C:\Documents and Settings\jaisrael\Desktop\ECL 290-SPAM\columbia\baseline
mixture: C:\Documents and Settings\jaisrael\Desktop\ECL 290-SPAM\columbia\mixture
output: C:\Documents and Settings\jaisrael\Desktop\ECL 290-SPAM\columbia\Output

3. Evaluating Results

Output File    Purpose
*.log          Lists steps in SPAM analysis and any errors
*.est          Contribution estimates with jackknife S.E.; normal and likelihood C.I.s; jackknife covariance and correlation matrices
*.sim          Contribution estimates with bootstrap S.D.; percentile C.I.; bootstrap covariance and correlation matrices
*.itr          Created for estimation analyses; describes the maximum likelihood search for diagnostic purposes and performance evaluation. NOT available for most simulations
*.bot          Mixture resampling goodness-of-fit test; VERY similar to *.sim
*.rsm          Outputs contribution estimates for every population in every resampling, for examining distributions or generating estimates for new reporting regions
*.bsl          Baseline frequencies of populations at each locus
*.cmx          Lists all unique types found in the mixture, with frequencies
*.gen          A matrix with unique types in rows and populations in columns, holding conditional probabilities based on baseline frequencies and H-W equilibrium
*.pop          A matrix with unique types in rows and populations in columns, holding conditional population probabilities from Bayes' Rule, calculated with the conditional genotype probabilities and contribution estimates

In lab, we can try:

1. Running the Columbia estimation and simulation. Check that the file names are correct in the control files. Look at regions vs. populations. Compare the known simulation contributions with the results of the simulation.

2. Running the splittail example and evaluating the limited results. Add contributions from known similar populations to evaluate contributions compared to GeneClass.
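
To see what the *.gen and *.pop matrices described in the output table contain, the two calculations can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SPAM's own code: the function names, the toy populations A and B, their frequencies, and the 50/50 contribution estimates are all made up for the example. It also assumes that each character's probability is the multinomial (Hardy-Weinberg) probability of the individual's allele counts, which covers 2 allele doses for a locus, 4 for an isolocus, and 1 for a phenotype.

```python
from math import factorial, prod


def character_prob(counts, freqs):
    """Multinomial (H-W) probability of one individual's allele counts at one character.

    counts: allele counts for the individual (sum 2 for a locus, 4 for an
            isolocus, 1 for a phenotype)
    freqs:  baseline relative frequencies of the same alleles in one population
    """
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # multinomial coefficient, kept as an exact integer
    return coef * prod(p ** c for p, c in zip(freqs, counts))


def genotype_prob(genotype, baseline):
    """Conditional probability of a multi-character type given one population
    (one cell of a *.gen-style matrix), assuming independent characters."""
    return prod(character_prob(c, f) for c, f in zip(genotype, baseline))


def population_posterior(genotype, baselines, contributions):
    """Bayes' Rule: P(pop | type) is proportional to contribution * P(type | pop)
    (one row of a *.pop-style matrix)."""
    joint = {pop: contributions[pop] * genotype_prob(genotype, baselines[pop])
             for pop in baselines}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("type has zero probability in every population")
    return {pop: value / total for pop, value in joint.items()}


# Hypothetical toy data: two populations, two diallelic loci.
baselines = {
    "A": [[0.9, 0.1], [0.5, 0.5]],
    "B": [[0.2, 0.8], [0.5, 0.5]],
}
contributions = {"A": 0.5, "B": 0.5}   # assumed contribution estimates
genotype = [[2, 0], [1, 1]]            # homozygote at locus 1, heterozygote at locus 2

posterior = population_posterior(genotype, baselines, contributions)
for pop, prob in posterior.items():
    print(pop, round(prob, 3))
```

With these toy numbers the individual is assigned to population A with a probability of about 0.95, because its locus 1 genotype is far more likely under A's baseline frequencies than under B's.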