DNA Assembly with Gaps:
Simulating Sequence Evolution
Reed A. Cartwright
Department of Genetics
University of Georgia
Synopsis
Explain the importance of simulations.
Introduce Dawg, a new sequence simulation program.
Example usage of Dawg.
3.12.2005
RA Cartwright
[email protected] - http://scit.us/
Why Simulate Phylogenies?
Biologists use many techniques to reconstruct phylogenies based on biological data.
However, true phylogenies are unknown, except for a few instances.
How then can we test the accuracy of these reconstruction methods?
Use simulations.
Why Simulate Phylogenies?
Techniques are often based on certain models of evolution.
Simulating sequence evolution based on these models produces an ideal situation to test the techniques.
Using other models can test how robust a technique is.
Testing Procedure
1. Start with a “known” tree.
2. Simulate sequence sets based on the tree.
3. Estimate the trees of the simulated data.
4. Compare estimated trees to the original tree.

[Figure: the known tree (taxa A, B, C, D), simulated sequence sets, and the trees estimated from them.]
Sequence excerpts shown in the figure:
AATTCTTTGAGTTAA
AATTCTTTGAGTTAA
AATTCTTAAAGTTAA
AATTCTTAAAGTTAA

AAAAGATAAAGCAAA--A
GAAAGATAAAGCAAA--A
GAAAGATAAAGAAAAACA
GAAAGATAAAGAAAAACA
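Step 4 needs some way to score how far an estimated tree is from the known one. The slides do not name a measure; one common choice is the Robinson-Foulds distance, which counts the bipartitions (splits) present in one tree but not the other. A minimal sketch, assuming trees are written as nested tuples of taxon names (the representation and the function names `splits` and `rf_distance` are illustrative, not part of Dawg):

```python
def splits(tree):
    """Return the non-trivial bipartitions induced by the internal edges of a tree.

    `tree` is a nested tuple of taxon names, e.g. (("A", "B"), ("C", "D")).
    """
    clades = []

    def leaves(node):
        # A leaf is a taxon name; an internal node is a tuple of children.
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        clades.append(below)
        return below

    all_taxa = leaves(tree)
    result = set()
    for clade in clades:
        other = all_taxa - clade
        # Keep only splits with at least two taxa on each side.
        if len(clade) >= 2 and len(other) >= 2:
            result.add(frozenset([clade, other]))
    return result


def rf_distance(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


known = (("A", "B"), ("C", "D"))
estimated = (("A", "C"), ("B", "D"))
print(rf_distance(known, estimated))
```

A distance of zero means the estimated tree has the same unrooted topology as the known tree; branch lengths are ignored.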
Simulating Evolution
Proper simulation of molecular evolution should include both substitutions and indels.
However, existing programs either do not include indels or use an unjustified model of indel formation.
Dawg was created to address this gap.
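As a rough illustration of what simulating both event types involves, the sketch below evolves one sequence along a single branch. It is a toy model and not Dawg's algorithm: substitutions replace a base with any other base at equal probability, insertions and deletions are equally likely, indel lengths are geometric, and the names `evolve`, `gap_length`, and the `extend` probability are assumptions made for this example. It returns only the descendant sequence and does not record the alignment.

```python
import random

NUCLEOTIDES = "ACGT"


def gap_length(rng, extend=0.5):
    """Draw a geometric indel length (at least 1); `extend` is an assumed value."""
    n = 1
    while rng.random() < extend:
        n += 1
    return n


def evolve(seq, branch_length, indel_rate, rng):
    """Evolve `seq` along one branch.

    Each site substitutes at rate 1 and starts an indel at rate `indel_rate`,
    so `indel_rate` is the expected number of indels per substitution.
    """
    seq = list(seq)
    t = 0.0
    while seq:
        total_rate = len(seq) * (1.0 + indel_rate)
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t >= branch_length:
            break
        pos = rng.randrange(len(seq))
        if rng.random() < 1.0 / (1.0 + indel_rate):
            # Substitution: pick a different base.
            seq[pos] = rng.choice([n for n in NUCLEOTIDES if n != seq[pos]])
        elif rng.random() < 0.5:
            # Insertion of random bases before `pos`.
            seq[pos:pos] = [rng.choice(NUCLEOTIDES) for _ in range(gap_length(rng))]
        else:
            # Deletion starting at `pos` (truncated at the sequence end).
            del seq[pos:pos + gap_length(rng)]
    return "".join(seq)


print(evolve("ACGT" * 10, 0.1, 0.14, random.Random(1981)))
```

Setting `indel_rate` to zero reduces the sketch to a substitution-only simulator, which is the case the slide describes as incomplete.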
What is Dawg?
Dawg stands for “DNA Assembly with Gaps.”
A portable and robust program for simulating molecular evolution.
Development Website: http://scit.us/dawg/
Comparing Software
Feature                     Seq-Gen   Evolver   Rose   Dawg
Indels                      -         -         Yes    Yes
Indel Parameter Estimator   -         -         -      Yes
Recombination               -         -         -      Yes
Substitution                GTR       GTR       PAM    GTR
Rate Heterogeneity          Γ+I       Γ         Γ+I    Γ+I
Input Format                Switch    File      File   File
Unix                        Yes       Yes       Yes    Yes
Mac OS X                    Yes       Yes       Yes    Yes
Win32                       Yes       Yes       Yes    Yes
Parameters
Tree            phylogeny
TreeScale       coefficient to scale branch lengths by
Sequence        root sequences
Length          length of generated root sequences
Rates           rate of evolution of each root nucleotide
Model           model of evolution: GTR|JC|K2P|K3P|HKY|F81|F84|TN
Freqs           nucleotide (ACGT) frequencies
Params          parameters for the model of evolution
Width           block width for indels and recombination
Scale           block position scales
Gamma           coefficients of variance for rate heterogeneity
Alpha           shape parameters
Iota            proportions of invariant sites
GapModel        models of indel formation: NB|PL|US
Lambda          rates of indel formation
GapParams       parameter for the indel model
Reps            number of data sets to output
File            output file
Format          output format: Fasta|Nexus|Phylip|Clustal
GapSingleChar   output gaps as a single character
GapPlus         distinguish insertions from deletions in alignment
LowerCase       output sequences in lowercase
Translate       translate output sequences to amino acids
NexusCode       text or file to include between datasets in Nexus format
Seed            PRNG seed (integers)
Sample Input File
# example.dawg
Tree = ((AY727331:0.001359,AY727330:0.001359):0.084512,
(AY727327:0.006116,AY727326:0.006116):0.079756);
Model = "GTR"
Params = {1.08031, 2.45581, 0.44452,
1.09145, 4.06519, 1.00000}
Freqs = {0.353470, 0.143681, 0.178206, 0.324643}
Length = 300
Lambda = 0.143120
GapModel = "NB"
GapParams = {1, 0.753247}
Format = "Clustal"
File = "example.aln"
Seed = 1981
CLUSTAL multiple sequence alignment (Created by DAWG Version 1.0.0)

AY727326        TTCGAAAATATGTTAGTACTCAATATGAATTCTTTGAGTTAAAAAAGATAAAGCAAA--A
AY727327        TTCGAAAATATGTTAGTACTCAATATGAATTCTTTGAGTTAAGAAAGATAAAGCAAA--A
AY727330        TTCAAAAATATGCTAGGACTGAATATGAATTCTTAAAGTTAAGAAAGATAAAGAAAAACA
AY727331        TTCAAAAATATGCTAGGACTGAATATGAATTCTTAAAGTTAAGAAAGATAAAGAAAAACA

AY727326        ATACATAATGTGATTTCAATATTCCAATTACCTAACAATACGGCTATCAATTAAACGATT
AY727327        ATACATAATGTGATTTCAATATTCCAATTACCTAACAATACGGCTATCAATTAAACGATT
AY727330        GTACATAATGTAAA----TTATTGCAA---------AAAACGGCTAACAATTAGACGATT
AY727331        GTACATAATGTAAA----TTATTGCAA---------AAAACGGCTAACAATTAGACGATT

AY727326        TTAGGATTACACCGACAAATATTAGGCCGATATGAATTTAACATCATGTTGTATTTAGAT
AY727327        TTAGGATTACACCGACAAATATTAGGCCGATATGAATTTACCATCATGTTGTATTTAGAT
AY727330        TTAGGATTACGCTGACAAATATTAGGATGATATTAATTTA------TCTTGTATTTAGAT
AY727331        TTAGGATTACGCTGACAAATATTAGGATGATATTAATTTA------TCTTGTATTTAGAT

AY727326        GCTGTCTTTTATTAACATTCATCATTAAAT-TTGGAACCTTTTGCATTTAAGAAGTACAT
AY727327        GCTGTCTTTTATTAACATTCATCATTAAAT-TTGGAACCTTTTGTATTTAAGAAGTACAT
AY727330        GCTGTCTTTTATCAACATTCATCACTAGATATTGGAACCTATTGCATCTAAGAAGTACAT
AY727331        GCTGTCTTTTATCAACATTCATCACTAGATATTGGAACCTATTGCATCTAAGAAGTACAT

AY727326        GTTTAATAGTGTTTAAAA-TATATATGAAATTGATCATAAGGA---TCTATAAATGCGGT
AY727327        GTTTAATAGTGTTTATAA-TATATATGAAATTGATCGTAAGGA---TCTATAAATGCAGT
AY727330        GTTTAATAGGGTT-AAAACTATATATGAAGTCGATTATAAGGAATTTCTATAAATGTAGC
AY727331        GTTTAATAGGGTT-AAAACTATATATGAAGTCGATTATAAGGAATTTCTATAAATGTAGC

AY727326        TCTTCAATTTCTTG
AY727327        TCTTCAATTTCTTG
AY727330        TCTTCAATTTCCTA
AY727331        TCTTCAATTTCCTA
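The dashes in the alignment above mark where indels occurred. A quick way to summarise them is to list each maximal run of gap characters per sequence. The sketch below is only a descriptive tally under that assumption; it is not how Dawg or lambda.pl works, and the function names are made up for this example. A run shared by two sequences (as in AY727330 and AY727331) usually reflects a single event on their common branch, so run counts are not indel counts.

```python
import re


def gap_runs(row):
    """Return (start, length) for each maximal run of '-' in one aligned sequence."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", row)]


def count_gap_runs(alignment):
    """Map each sequence name to its number of gap runs."""
    return {name: len(gap_runs(row)) for name, row in alignment.items()}


# Second block of the alignment shown above.
block = {
    "AY727326": "ATACATAATGTGATTTCAATATTCCAATTACCTAACAATACGGCTATCAATTAAACGATT",
    "AY727330": "GTACATAATGTAAA----TTATTGCAA---------AAAACGGCTAACAATTAGACGATT",
}
for name, row in block.items():
    print(name, gap_runs(row))
```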
Estimating Indel Rate
Dawg would be of little benefit if biologists could not estimate parameters of indel formation from real data.
Dawg’s indel model allows such estimation, which is implemented in a Perl script, lambda.pl.
Example Usage: Confidence Interval of Indel Rate
I aligned the sequences of chloroplast trnK introns from two Hibiscus and two Prunus species.
Using PAUP*, I estimated the phylogeny and substitution parameters.
Using lambda.pl, I estimated the indel formation parameters.
Example Usage
From these estimated parameters of evolution, I constructed an input file for Dawg.
From the input file, Dawg produced a thousand simulated sequence sets.
The rate of indel formation was estimated for each simulated sequence set.
Results
The estimated rate of indel formation was 0.143120.
Bootstrapping gave a 95% CI of 0.078530 to 0.213560.
Biologically this is 8 to 21 indels per 100 substitutions.
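The slides do not say how the interval was read off the thousand estimates. One plausible reading is a percentile interval: sort the estimates and drop the lowest and highest 2.5%. The sketch below assumes that approach (the function name `percentile_interval` is illustrative) and also reproduces the conversion of the reported bounds to indels per 100 substitutions.

```python
def percentile_interval(estimates, level=0.95):
    """Return the (lower, upper) bounds that cut off equal tails of the sorted estimates."""
    ordered = sorted(estimates)
    n = len(ordered)
    # Number of values to discard in each tail, e.g. 25 of 1000 for a 95% interval.
    k = round((1.0 - level) / 2.0 * n)
    return ordered[k], ordered[n - 1 - k]


# Bounds reported on the slide, expressed per 100 substitutions.
reported = (0.078530, 0.213560)
per_100 = [round(rate * 100) for rate in reported]
print(per_100)  # [8, 21]
```

With a thousand estimates of the indel rate in a list, `percentile_interval(estimates)` would return the 26th smallest and 26th largest values as the interval bounds.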
Synopsis
Explain the importance of simulations.
Introduce Dawg, a new sequence simulation program.
Example usage of Dawg.
Thanks
Marjorie Asmussen
Wyatt Anderson
John Avise
Jim Hamrick
Ron Pulliam
Paul Schliekelman
Jeff Ross-Ibarra
Beth Dakin
Douglas Theobald
Yong-Kyu Kim