From which population does this individual come?
Statistical Package for Analyzing Mixtures (SPAM)
Alaska Department of Fish and Game. 2003. SPAM Version 3.7: Statistical Package for
Analyzing Mixtures. Alaska Department of Fish and Game, Commercial Fisheries
Division, Gene Conservation Lab. Available for download from
www.genetics.cf.adfg.state.ak.us/software/spampage.php
The program's (very large) user manual provides more explicit details.
How do I use SPAM?
1. Data Preparation
The program WhichRun can convert Genepop files (.txt) of baseline and allele frequency
data into the formats SPAM uses (.frq, .mix). These files must be saved individually with
the correct prefix, and a baseline file must be created. It took us about 20 minutes to
massage the splittail example files into working shape for this computer class, and we
had to use Excel to reformat the files that WhichRun produced.
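Building count-based baselines by hand means tallying alleles at each locus. Below is a minimal sketch of that tally, not a replacement for WhichRun. It assumes Genepop-style genotype codes in which two fixed-width allele codes are run together (for example "100104" with 3-digit alleles) and all-zero alleles mark missing data. The function name `genepop_allele_counts` is hypothetical.

```python
from collections import Counter


def genepop_allele_counts(genotypes, digits=3):
    """Tally allele counts at one locus from Genepop-style genotype codes.

    Each code is two allele codes of `digits` digits run together,
    e.g. "100104" with digits=3. All-zero alleles are treated as missing.
    """
    counts = Counter()
    for code in genotypes:
        code = code.strip()
        if len(code) != 2 * digits or not code.isdigit():
            raise ValueError(f"unexpected genotype code: {code!r}")
        for allele in (code[:digits], code[digits:]):
            if int(allele) != 0:  # skip missing alleles such as "000"
                counts[allele] += 1
    return counts


# Four individuals at one locus; the third is missing data.
print(genepop_allele_counts(["100104", "104104", "000000", "100100"]))
```

Keeping raw counts here, rather than converting straight to proportions, matches the handout's advice that counts avoid rounding errors.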
The baseline file (*.bse, *.frq) contains the response (allele or phenotype)
frequencies for each character (locus, isolocus, or phenotype). It can be stored as a
single large file or as separate files for individual populations. The frequencies can be
either relative frequencies or actual counts. Counts are preferred because rounding can
introduce errors into the calculations. In a baseline file, the first line identifies the
population with a # followed by its name, and the response frequencies follow. If
relative frequencies are used, the decimal points delimit the values, so spaces between
responses are not needed.
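The sketch below writes one population's baseline entry from counts. It also shows why counts are safer than rounded relative frequencies. The layout is an assumption: one "# name" line followed by one line of space-separated counts per character. The function names `baseline_block` and `rounded_relative` are hypothetical, so check the SPAM manual before relying on this format.

```python
def baseline_block(name, counts_by_character):
    """Assumed layout: '# name', then one line of counts per character."""
    lines = [f"# {name}"]
    for counts in counts_by_character:
        lines.append(" ".join(str(c) for c in counts))
    return "\n".join(lines) + "\n"


def rounded_relative(counts, places=2):
    """Convert counts to relative frequencies rounded to `places` decimals."""
    total = sum(counts)
    return [round(c / total, places) for c in counts]


print(baseline_block("Pop A", [[12, 30, 8], [25, 25]]))

# Three equally common alleles: the rounded frequencies no longer sum to 1,
# which is the kind of error that storing counts avoids.
print(rounded_relative([1, 1, 1]))
```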
The mixture file (*.mix) contains the individual types (genotypes, for genetic
characters). The first portion of this file lists the characters used. The first line is
“* character”, followed by each character's id# and name, and the list ends with
“* end”. The mixture data begin after a backslash (\), with each individual's genotype on
its own line and the characters separated by spaces. Allele counts must total 2 for a
locus, 4 for an isolocus, and 1 for a phenotype.
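Here is a minimal sketch that assembles mixture-file text and enforces the allele totals described above (2 per locus, 4 per isolocus, 1 per phenotype). Two layout details are assumptions: each character's id# and name sit on one line, and each character's allele counts are written run together (for example "110"). The helper names are hypothetical, and so are the locus names in the example.

```python
# Allele totals the handout gives for each character type.
REQUIRED_TOTAL = {"locus": 2, "isolocus": 4, "phenotype": 1}


def check_individual(counts_by_character, character_types):
    """Raise ValueError if any character's allele counts have the wrong total."""
    if len(counts_by_character) != len(character_types):
        raise ValueError("wrong number of characters for this individual")
    for pos, (counts, ctype) in enumerate(zip(counts_by_character, character_types), start=1):
        expected = REQUIRED_TOTAL[ctype]
        if sum(counts) != expected:
            raise ValueError(f"character {pos} ({ctype}) totals {sum(counts)}, expected {expected}")


def mixture_text(characters, individuals):
    """Build *.mix text; characters is a list of (id, name, type) tuples.

    ASSUMPTION: each character is written as its allele counts run together.
    """
    lines = ["* character"]
    lines += [f"{cid} {name}" for cid, name, _ in characters]
    lines.append("* end")
    lines.append("\\")  # a single backslash marks the start of the mixture data
    types = [ctype for _, _, ctype in characters]
    for counts_by_character in individuals:
        check_individual(counts_by_character, types)
        fields = ["".join(str(c) for c in counts) for counts in counts_by_character]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


chars = [(1, "locusA", "locus"), (2, "locusB", "locus")]  # hypothetical loci
print(mixture_text(chars, [[[1, 1, 0], [0, 2]]]))
```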
2. Building a Control File
Each control file has 8 sections. The following table details the controls in each section.
To turn each option in SPAM on or off, use any of the values the program recognizes: T, F,
true, false, yes, no, on, off.

Section                      Purpose
*estimation or *simulation   Identifies the type of analysis being done
*options                     Allows selection of performance and output options
*parameters                  Specifies the number of populations and characters, the upper
                             limit parameter, and the tolerances for the optimization search
*characters                  Defines each character's id#, type, and name
*populations                 Defines each population's id#, name, baseline file, and
                             regional aggregations
*regions                     Permits aggregations to be labelled
*files                       Associates the files for analysis with the *.ctl file
*run                         Denotes the end of the control file
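The following sketch interprets on/off option values using the words the handout says SPAM recognizes: T, F, true, false, yes, no, on, off. Two details are assumptions, not documented SPAM behavior: the matching ignores case, and options are written as "name: value" lines. The option names in the example are hypothetical.

```python
# Values the handout says SPAM recognizes for switching options on or off.
# Case-insensitive matching is an assumption of this sketch.
TRUE_WORDS = {"t", "true", "yes", "on"}
FALSE_WORDS = {"f", "false", "no", "off"}


def parse_switch(value: str) -> bool:
    """Map an on/off value such as 'T', 'yes' or 'off' to a bool."""
    word = value.strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError(f"unrecognized on/off value: {value!r}")


def parse_options(lines):
    """Parse hypothetical 'name: value' option lines into a dict of booleans."""
    options = {}
    for line in lines:
        name, _, value = line.partition(":")
        options[name.strip()] = parse_switch(value)
    return options


print(parse_options(["option_a: yes", "option_b: F"]))
```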
I recommend starting from the default control file (found in the Control folder) and
adjusting your settings once you can get the program to run.
To run the default control file as is, make certain the file paths in the *files section
are correct. For this lab (with the “Columbia” folder copied into your personal folder),
they should look something like this:
*file
path: C:\Documents and Settings\jaisrael\Desktop\ECL 290-SPAM\columbia\baseline
mixture: C:\Documents and Settings\jaisrael\Desktop\ECL 290-SPAM\columbia\mixture
output: C:\Documents and Settings\jaisrael\Desktop\ECL 290-SPAM\columbia\Output
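Getting these paths wrong is an easy way for a run to fail. The sketch below regenerates the block above from a single project folder, assuming the same baseline, mixture and Output subfolder names. It uses Python's standard `pathlib.PureWindowsPath` so the backslash separators come out right on any machine. The function name `file_section` and the example folder are hypothetical.

```python
from pathlib import PureWindowsPath


def file_section(project_dir):
    """Build the file block for a copied project folder (assumed subfolder names)."""
    base = PureWindowsPath(project_dir)
    return "\n".join([
        "*file",
        f"path: {base / 'baseline'}",
        f"mixture: {base / 'mixture'}",
        f"output: {base / 'Output'}",
    ])


print(file_section(r"C:\Documents and Settings\student\Desktop\ECL 290-SPAM\columbia"))
```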
3. Evaluating Results
Output file   Purpose
*.log         Lists the steps in the SPAM analysis and any errors
*.est         Contribution estimates with jackknife S.E.; normal and likelihood C.I.s;
              jackknife covariance and correlation matrices
*.sim         Contribution estimates with bootstrap S.D.; percentile C.I.; bootstrap
              covariance and correlation matrices
*.itr         Created for estimation analyses; describes the maximum likelihood search
              for diagnostic purposes and performance evaluation. NOT available for
              most simulations
*.bot         Mixture resampling goodness-of-fit test; VERY similar to *.sim
*.rsm         Contribution estimates for every population in every resampling, for
              examining distributions or generating estimates for new reporting regions
*.bsl         Baseline frequencies of populations at each locus
*.cmx         Lists all unique types found in the mixture, with their frequencies
*.gen         Matrix of conditional genotype probabilities (unique types in rows,
              populations in columns) based on baseline frequencies and
              Hardy-Weinberg equilibrium
*.pop         Matrix of conditional population probabilities (unique types in rows,
              populations in columns) from Bayes' rule, calculated from the
              conditional genotype probabilities and the contribution estimates
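The *.gen and *.pop matrices can be reproduced on a toy scale. A genotype's probability in a population is p^2 for a homozygote and 2pq for a heterozygote. Multiplying across loci assumes independence among loci, which is an assumption of this sketch. Bayes' rule then weights those probabilities by the contribution estimates. The last function shows a standard EM iteration for the contributions, which is one common way to do a maximum likelihood search; SPAM's own optimizer may differ. All names and the two-population example are hypothetical.

```python
def genotype_prob(genotype, pop_freqs):
    """P(genotype | population) under Hardy-Weinberg, multiplied across loci.

    genotype: list of (allele1, allele2) tuples, one per locus.
    pop_freqs: list of {allele: relative frequency} dicts, one per locus.
    """
    prob = 1.0
    for (a1, a2), freqs in zip(genotype, pop_freqs):
        p1, p2 = freqs.get(a1, 0.0), freqs.get(a2, 0.0)
        prob *= p1 * p1 if a1 == a2 else 2.0 * p1 * p2  # p^2 or 2pq
    return prob


def population_posteriors(genotype, baselines, contributions):
    """Bayes' rule: P(pop j | genotype) is proportional to c_j * P(genotype | pop j)."""
    weighted = [c * genotype_prob(genotype, b) for c, b in zip(contributions, baselines)]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("genotype has zero probability in every population")
    return [w / total for w in weighted]


def em_contributions(mixture, baselines, iterations=200):
    """Standard EM for mixture proportions (not necessarily SPAM's optimizer)."""
    k = len(baselines)
    contributions = [1.0 / k] * k  # start from equal contributions
    for _ in range(iterations):
        posteriors = [population_posteriors(g, baselines, contributions) for g in mixture]
        # New contribution = average posterior membership across the mixture.
        contributions = [sum(p[j] for p in posteriors) / len(mixture) for j in range(k)]
    return contributions


# Two hypothetical populations at one locus with alleles A and B.
baselines = [[{"A": 0.8, "B": 0.2}], [{"A": 0.2, "B": 0.8}]]
AA, BB = [("A", "A")], [("B", "B")]
print(population_posteriors(AA, baselines, [0.5, 0.5]))  # one row of a *.pop-style matrix
print(em_contributions([AA, AA, AA, BB], baselines))
```

For this toy mixture the likelihood is maximized at a first-population contribution of 1.88/2.4 (about 0.783), and the EM iterations converge there.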
In lab, we can try:
1. Running the Columbia estimation and simulation. Check that your file names are correct
in the control files. Look at regions vs. populations. Compare the known contributions
used in the simulation with the simulation results.
2. Running the splittail example and evaluating its limited results. Add contributions from
known similar populations and compare the resulting contribution estimates with those
from GeneClass.
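For the first exercise, the sketch below pools population-level contributions into regions and reports estimated minus known contributions. The hypothetical example shows a typical pattern. Two similar populations in the same region can be confused with each other, which gives large population-level errors, while their regional total is still estimated well. All population and region names and all numbers are made up for illustration.

```python
def to_regions(contributions, region_of):
    """Sum population-level contributions into reporting regions."""
    totals = {}
    for pop, share in contributions.items():
        region = region_of[pop]
        totals[region] = totals.get(region, 0.0) + share
    return totals


def contribution_errors(known, estimated):
    """Estimated minus known contribution for each population or region."""
    return {group: estimated.get(group, 0.0) - share for group, share in known.items()}


# Hypothetical simulation: pop1 and pop2 are similar and share a region.
region_of = {"pop1": "upper", "pop2": "upper", "pop3": "lower"}
known = {"pop1": 0.5, "pop2": 0.2, "pop3": 0.3}
estimated = {"pop1": 0.35, "pop2": 0.35, "pop3": 0.3}

print(contribution_errors(known, estimated))
print(contribution_errors(to_regions(known, region_of), to_regions(estimated, region_of)))
```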