Multiple Sequence Alignments
Pair-wise Alignments
BLAST and FASTA first find short, high-scoring word matches (short fixed-length segments shared by the two sequences). These matches are used as starting points for alignments.
BLAST's default word size is 3 for proteins and 11 for nucleotides.
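The seeding step can be sketched in Python. The helper `word_hits` is hypothetical and simplified. It reports only exact matches of length-k words, which is closest to nucleotide BLAST. Protein BLAST also seeds on similar "neighborhood" words that score above a threshold, and both programs then extend each seed into a longer alignment.

```python
def word_hits(query: str, subject: str, k: int = 11) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a length-k word occurs in both sequences."""
    # Index every length-k word of the query by its start positions.
    index: dict[str, list[int]] = {}
    for i in range(len(query) - k + 1):
        index.setdefault(query[i:i + k], []).append(i)

    # Slide along the subject and look each of its words up in the index.
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


seq1 = "GATGAGTATGCTCGTACCTGTAATGTAGTGTATAGTACATGCA"
seq2 = "GAAGACTATGCTCGTACCTCTAATGTAGTACATGCA"  # bottom row of the example below, gaps removed
print(word_hits(seq1, seq2, k=11))  # [(6, 6), (7, 7), (8, 8), (32, 25)]
```

The first three hits lie on the same diagonal, so an extension step would treat them as one seed region. The last hit seeds the aligned tail TAGTACATGCA.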
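GATGAGTATGCTCGTACCTGTAATGTAGTGTATAGTACATGCA
                           --G--TA---CATGCA
GAAGACTATGCTCGTACCTCTAATGTAG-------TACATGCA
Affine gaps: the fragment GTACATGCA can be aligned with three short gaps (middle row) or with a single gap of 7 (bottom row). Both use 7 gap characters. An affine gap penalty charges more for opening a gap than for extending one, so the single long gap scores better.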
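An affine gap penalty can be sketched in Python as a cost of gap_open + gap_extend * length for each run of gap characters. Conventions vary between tools, and some charge gap_open for the first position and gap_extend only for the rest. The values 5 and 2 below are hypothetical and only illustrate the comparison with a linear penalty, using the two gapped rows from the example above.

```python
import re


def affine_gap_penalty(row: str, gap_open: int, gap_extend: int) -> int:
    """Sum gap_open + gap_extend * length over every run of '-' in an aligned row."""
    return sum(gap_open + gap_extend * len(run) for run in re.findall(r"-+", row))


def linear_gap_penalty(row: str, per_position: int) -> int:
    """Charge the same cost for every gap character, however the gaps are grouped."""
    return per_position * row.count("-")


one_gap = "GAAGACTATGCTCGTACCTCTAATGTAG-------TACATGCA"  # one gap of length 7
three_gaps = "--G--TA---CATGCA"                          # gaps of length 2, 2 and 3

print(affine_gap_penalty(one_gap, 5, 2), affine_gap_penalty(three_gaps, 5, 2))  # 19 29
print(linear_gap_penalty(one_gap, 2), linear_gap_penalty(three_gaps, 2))        # 14 14
```

With the linear penalty both rows cost the same. The affine penalty makes the single long gap cheaper, which reflects the idea that one insertion or deletion event can span several residues.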
Progressive Multiple Sequence Alignment
1. First, every pair of sequences is aligned, and the pairwise results are used to build a guide tree.
2. The guide tree is used to add sequences to the alignment progressively, beginning with the most closely related and continuing to the most distant.
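Step 1 can be sketched in Python with the textbook Needleman-Wunsch global alignment and a simple distance. This is an illustration and does not reproduce ClustalW's internals. The scores (match +1, mismatch -1, gap -2) are hypothetical, the gap penalty is linear rather than affine, and the distance is just the fraction of mismatched positions. `global_align` and `p_distance` are hypothetical names.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b) for one optimal alignment.
    """
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(row_a: str, row_b: str) -> float:
    """Fraction of mismatches among columns where neither aligned row has a gap."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


score, row_a, row_b = global_align("GATTACA", "GATACA")
print(score, row_a, row_b, p_distance(row_a, row_b))
```

Repeating this for every pair gives the distance matrix that a guide-tree method then clusters.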
Guide tree for sequences A, B, C, D, E, F and G; alignment order:
1. A with B ---> alignment AB
2. C with D ---> alignment CD
3. AB with CD ---> alignment ABCD
4. ABCD with E ---> ABCDE
5. ABCDE with F ---> ABCDEF
6. ABCDEF with G ---> ABCDEFG
Pros: Relatively fast
Cons: Alignment errors early in the progression will be carried throughout the entire process
Source: Instant Notes Bioinformatics
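The guide-tree step can be sketched with UPGMA, a simple average-linkage clustering method. It is swapped in here for illustration, because ClustalW itself builds its guide tree with neighbor-joining. The function repeatedly joins the two closest groups and records each join, which gives the order in which a progressive aligner would merge alignments. The name `upgma_join_order` and the distances in the usage example are hypothetical. The distances are chosen so that the joins reproduce the first four steps listed above. Labels are concatenated without a separator, so the readable output assumes single-letter names.

```python
from itertools import combinations


def upgma_join_order(names: list[str], dist: dict[frozenset, float]) -> list[str]:
    """Return the groups formed, in order, when clustering with UPGMA.

    dist maps frozenset({x, y}) to the distance between sequences x and y.
    """
    clusters = [(name,) for name in names]

    def cluster_distance(c1: tuple, c2: tuple) -> float:
        # UPGMA: average of all sequence-to-sequence distances between the two groups.
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    joins = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters (ties go to the first pair found).
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        merged = tuple(sorted(c1 + c2))
        clusters.append(merged)
        joins.append("".join(merged))
    return joins


# Hypothetical distances: A-B and C-D are close pairs, E is far from everything.
dist = {frozenset(("A", "B")): 0.1, frozenset(("C", "D")): 0.2}
for x, y in combinations("ABCDE", 2):
    dist.setdefault(frozenset((x, y)), 0.7 if "E" in (x, y) else 0.4)
print(upgma_join_order(list("ABCDE"), dist))  # ['AB', 'CD', 'ABCD', 'ABCDE']
```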
ClustalW
ClustalW is a Multiple Sequence Alignment (MSA) program for DNA or protein sequences.
It produces biologically meaningful multiple sequence alignments of divergent sequences.
You need to create gulo_aa.fa by saving these NP_ accessions in a file and running your script from last week:
NP_848862
NP_071556
NP_001123420
NP_001029215
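To check that gulo_aa.fa came out right before running ClustalW, a minimal FASTA reader can be sketched in Python. The name `read_fasta` is hypothetical. It assumes each header line starts with ">" and that the first word after ">" identifies the record. Whether that word is the bare accession depends on how last week's script writes the headers.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {first word of header: sequence}."""
    records: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record ID is the first whitespace-separated word after '>'.
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            # Sequence lines may be wrapped; join them into one string.
            records[current] += line
    return records


# Usage with the real file (not run here):
# with open("gulo_aa.fa") as fh:
#     seqs = read_fasta(fh.read())
# print(len(seqs), list(seqs))  # expect one record per accession
```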
sudo apt-get install clustalw phylip
clustalw -infile=gulo_aa.fa -type=protein
clustalw -infile=gulo_aa.aln -tree -outputtree=phylip
# look at the .ph file; this is a standard text (Newick) format for a phylogenetic tree
# now use PHYLIP to draw a diagram of your phylogenetic tree
phylip drawgram     # enter gulo_aa.ph when prompted; press S to change the tree style
phylip retree       # interactively rearrange or re-root the tree
Write a single Perl script that creates a phylogenetic tree:
1. Read in a file that is a single column of NCBI accessions.
2. Use BioPerl to create a file of FASTA-formatted sequences for the accessions.
3. Use PRANK to perform a multiple alignment of the sequences in the file from step 2.
4. Use R to create a .jpg image of the phylogenetic tree of your sequence alignment.
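Step 1 can be sketched in Python. The function name is hypothetical, and the assignment itself is written in Perl, where the same idea is a loop that chomps each line. Stripping whitespace also removes the "\r" left by files saved on Windows, which would otherwise become part of the accession.

```python
from typing import Iterable


def read_accessions(lines: Iterable[str]) -> list[str]:
    """Return one accession per non-blank line, with surrounding whitespace removed."""
    accessions = []
    for line in lines:
        acc = line.strip()  # drops '\n', '\r\n' and stray spaces
        if acc:              # ignore blank lines, e.g. a trailing empty line
            accessions.append(acc)
    return accessions


# Usage (not run here):
# with open("accessions_file") as fh:
#     accessions = read_accessions(fh)
```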
Perl pipeline script: accessions_file -> BioPerl -> seqs.fa -> PRANK -> alignment.dnd -> R -> .jpg image
PRANK
PRANK is a multiple sequence alignment application. It first performs pairwise alignments of the sequences to build an initial multiple alignment. It then generates a new guide tree from this first alignment and uses it to make a second, improved alignment.
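The second pass depends on distances measured on the first alignment rather than on separate pairwise alignments. A minimal Python sketch of that step follows. It assumes the rows of output.1.fas have already been read into a dict of equal-length aligned strings. PRANK's own distance estimate is not described here, so a simple p-distance (fraction of differing residues over gap-free columns) stands in for it. The function name `alignment_distances` and the toy rows are hypothetical.

```python
from itertools import combinations


def alignment_distances(rows: dict[str, str]) -> dict[frozenset, float]:
    """Pairwise p-distances computed from the rows of an existing multiple alignment."""
    dist = {}
    for (name1, row1), (name2, row2) in combinations(rows.items(), 2):
        # Use only columns where both sequences have a residue (no '-').
        columns = [(x, y) for x, y in zip(row1, row2) if x != "-" and y != "-"]
        mismatches = sum(x != y for x, y in columns)
        dist[frozenset((name1, name2))] = mismatches / len(columns)
    return dist


# Toy alignment rows (hypothetical):
rows = {"A": "AC-GT", "B": "ACTGA", "C": "GCTGA"}
for pair, d in alignment_distances(rows).items():
    print(sorted(pair), d)
```

Clustering these distances again gives the new guide tree for the second alignment.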
Search the web for the PRANK alignment download, then unpack and install it:
sudo apt-get install g++              #install g++ compiler
tar -xzf prank.src.100802.tgz
cd prank
sudo make
sudo cp prank /usr/local/bin          #copy prank to UNIX path
prank gulo_aa.fa                      #command to run prank
Output Files:
First alignment:
output.1.dnd   file for constructing graphical tree
output.1.fas   FASTA format of the alignment
output.1.xml   XML format of the alignment
Second (improved) alignment:
output.2.dnd   file for constructing graphical tree *
output.2.fas   FASTA format of the second alignment
output.2.xml   XML format of the second alignment
* use this file for generating the tree image
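The .dnd file is a tree written in Newick format. Nested parentheses group sequences, each leaf name comes first, and the number after each colon is a branch length. A minimal recursive parser, sketched in Python, shows that structure. The names `parse_newick` and `leaves` are hypothetical. The parser assumes unquoted labels without spaces and does not support bracketed comments. ape's read.tree in R handles the full format.

```python
def parse_newick(text: str) -> dict:
    """Parse a Newick tree into nested dicts with 'name', 'length' and 'children'."""
    text = "".join(text.split())  # drop whitespace; .dnd trees often span several lines
    pos = 0

    def read_until(stop_chars: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stop_chars:
            pos += 1
        return text[start:pos]

    def parse_node() -> dict:
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                       # consume '('
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1                   # consume ','
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                       # consume ')'
        name = read_until(":,();")         # optional label
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1                       # consume ':'
            length = float(read_until(",();"))
        return {"name": name, "length": length, "children": children}

    return parse_node()


def leaves(node: dict):
    """Yield (name, branch_length) for every leaf, left to right."""
    if not node["children"]:
        yield node["name"], node["length"]
    for child in node["children"]:
        yield from leaves(child)


tree = parse_newick("(\nA1:0.01629,\nA2:0.01780);")
print(list(leaves(tree)))  # [('A1', 0.01629), ('A2', 0.0178)]
```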
View and save your tree using R
First install the ape package from the R command prompt:
sudo R                     #start R from UNIX terminal
install.packages("ape")    #only needs to be done once in R
q()                        #quit R and open gedit
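library(ape)                           # load the ape package
my_tree <- read.tree("output.2.dnd")   # read the Newick tree written by PRANK
jpeg("gulo_tree.jpg")                  # send the next plot to a JPEG file
plot(my_tree)                          # draw the tree
dev.off()                              # close the JPEG device so the file is written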
Save these commands to a text file named commands_file.r and quit gedit.
R --save < commands_file.r    #input commands_file.r into R
eog gulo_tree.jpg             #view image from UNIX command line
Perl Pipeline Script Assignment
Using only one Perl script, create a 'pipeline' script that generates a phylogenetic tree of related sequences and saves it as a .jpg image.
Pseudocode:
1. Use BioPerl to get FASTA-formatted sequences
   * get the hominid D-loop accessions file: dloop_accessions.txt
   * pass the dloop_accessions.txt file name to the script as an argument
   * save the sequences to a file in FASTA format: dloop_accessions.fa
2. Use PRANK to create an alignment of the sequences in your FASTA file
3. Use R to create a phylogenetic tree image of your PRANK alignment
   * save your image using the file name of your accessions list: dloop_accessions.jpg
   * this means you will need to use Perl to write all the R commands to a file
   * hints:
     print R_FILE "jpeg(\"image.jpg\")\n";    #backslash \" to print "
     `R --save < commands_file.r`;            #use backticks ` to run UNIX commands within a Perl script
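The file-writing step can be sketched in Python for comparison. The function `r_commands` is hypothetical. It builds the same R commands used earlier for gulo_tree.jpg but names the image after the accessions file, as the assignment asks. Python's single-quoted strings let the R double quotes through without backslashes. In Perl you would print each line to R_FILE with \" escapes, as in the hint.

```python
from pathlib import Path


def r_commands(tree_file: str, accessions_file: str) -> str:
    """Return R code that plots tree_file into a .jpg named after accessions_file."""
    # dloop_accessions.txt -> dloop_accessions.jpg (any directory part is dropped)
    image = Path(accessions_file).with_suffix(".jpg").name
    lines = [
        "library(ape)",
        f'my_tree <- read.tree("{tree_file}")',
        f'jpeg("{image}")',
        "plot(my_tree)",
        "dev.off()",
    ]
    return "\n".join(lines) + "\n"


script = r_commands("output.2.dnd", "dloop_accessions.txt")
print(script)
# To use it, write the text to commands_file.r and run R on it:
# with open("commands_file.r", "w") as fh:
#     fh.write(script)
```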
Perl Pipeline Script
dloop_accessions.list:
   AF011222
   AF254446
-> BioPerl -> dloop_seqs.fa:
   >L-gulono..
   CTGTATG
   TAGACGT
-> PRANK -> output.2.dnd:
   (A1:0.01629,
    A2:0.01780)
-> write a file -> commands_file.r:
   library(ape)
   my_tree <- read
   attach(my_tree)
-> R -> dloop_accessions.jpg