Multiple Sequence Alignments

Pair-wise Alignments
BLAST and FASTA first find short, high-scoring matches ("words") that are used as starting points for building the alignments.
BLAST's default word size is 3 for proteins and 11 for nucleotides.
[Figure: the word GTACATGCA, shared by the two sequences below, serves as a starting point for aligning them]
GATGAGTATGCTCGTACCTGTAATGTAGTGTATAGTACATGCA
GAAGACTATGCTCGTACCTCTAATGTAGTACATGCA

Affine gaps
An affine gap penalty charges a larger cost to open a gap and a smaller cost for each further position the gap extends, so a single long gap is preferred over several short ones.

Progressive Multiple Sequence Alignment
1. First, pairwise alignments between the sequences are made and used to build a guide tree.
2. The guide tree is used to add sequences to the alignment progressively, beginning with the most closely related and continuing until the most distant.

Guide tree leaves: A B C D E F G
1. A with B ---> alignment AB
2. C with D ---> alignment CD
3. AB with CD ---> alignment ABCD
4. ABCD with E ---> alignment ABCDE
5. ABCDE with F ---> alignment ABCDEF
6. ABCDEF with G ---> alignment ABCDEFG

Pros: relatively fast.
Cons: alignment errors made early in the progression are carried through the entire process.
(Source: Instant Notes Bioinformatics)

ClustalW
ClustalW is a multiple sequence alignment (MSA) program for DNA or protein sequences. It produces biologically meaningful multiple sequence alignments of divergent sequences.
You need to create gulo_aa.fa by saving these NP_ accessions in a file and running your script from last week:
NP_848862
NP_071556
NP_001123420
NP_001029215

sudo apt-get install clustalw phylip
clustalw -infile=gulo_aa.fa -type=protein
clustalw -infile=gulo_aa.aln -tree -outputtree=phylip
# look at the .ph file; it is in Newick, a standard text format for phylogenetic trees
# now use phylip to draw a diagram of your phylogenetic tree
phylip drawgram
# enter gulo_aa.ph when prompted; press S to change the tree style
phylip retree

Create a single Perl script to create a phylogenetic tree:
1. Read in a file that is a single column of NCBI accessions.
2. Use BioPerl to create a file of FASTA-formatted sequences for the accessions.
3. Use PRANK to perform a multiple alignment of the sequences in the file from step 2.
4. Use R to create a .jpg image of the phylogenetic tree of your sequence alignment.
[Figure: Perl pipeline script: accessions_file ---> BioPerl ---> seqs.fa ---> PRANK ---> alignment.dnd ---> R ---> .jpg image]

PRANK
PRANK is a multiple sequence alignment program. It first performs pairwise alignments of the sequences and uses them to build a first multiple alignment. It then generates a new guide tree based on this first alignment and uses it to make a second, improved alignment.

Search the web for "prank alignment", then download, unpack, and install it:
sudo apt-get install g++         #install the g++ compiler
tar xzf prank.src.100802.tgz
cd prank
sudo make
sudo cp prank /usr/local/bin     #copy prank to a directory on the UNIX path
prank gulo_aa.fa                 #command to run prank

Output files:
First alignment:
output.1.dnd   file for constructing a graphical tree
output.1.fas   FASTA format of the alignment
output.1.xml   XML format of the alignment
Second (improved) alignment:
output.2.dnd   file for constructing a graphical tree *
output.2.fas   FASTA format of the second alignment
output.2.xml   XML format of the second alignment
* use output.2.dnd for generating the tree image

View and Save Your Tree Using R
First install the ape package from the R command prompt:
sudo R                     #start R from the UNIX terminal
install.packages("ape")    #only needs to be done once
q()                        #quit R
Then open gedit and enter these R commands:
library(ape)
my_tree <- read.tree("output.2.dnd")
jpeg("gulo_tree.jpg")
plot(my_tree)
dev.off()
Save these commands to a text file named commands_file.r and quit gedit. Then, in the UNIX terminal:
R --save < commands_file.r    #input commands_file.r into R
eog gulo_tree.jpg             #view the image from the UNIX command line

Perl Pipeline Script Assignment
Using only one Perl script, create a "pipeline" script that generates a phylogenetic tree of related sequences and saves it as a .jpg image.
Pseudocode:
1. Use BioPerl to get FASTA-formatted sequences
   * get the hominid D-loop accessions file
   * use the dloop_accessions.txt file name as an argument
   * save the sequences to a file in FASTA format: dloop_accessions.fa
2. Use PRANK to create an alignment of the sequences in your FASTA file
3. Use R to create a phylogenetic tree image of your PRANK alignment
   * save your image using the file name of your accessions list: dloop_accessions.jpg
   * this means you will need to use Perl to write all the R commands to a file
   * hints:
     print R_FILE "jpeg(\"image.jpg\")\n";   #backslash \" to print "
     `R --save < commands_file.r`;           #use backticks ` to run a UNIX command within a Perl script
[Figure: Perl pipeline script: accessions list (AF011222, AF254446, ...) ---> BioPerl ---> dloop_seqs.fa ---> PRANK ---> output.2.dnd, e.g. ( A1:0.01629, A2:0.01780) ---> Perl writes commands_file.r (library(ape), my_tree <- read...) ---> R ---> dloop_accessions.jpg]
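The word-seeding idea behind BLAST and FASTA in the pair-wise alignment notes can be sketched in a few lines of Python. This is a simplified illustration, not BLAST's actual implementation. It assumes exact word matches only, whereas real BLAST also scores neighbouring words for proteins and then extends each hit into a longer alignment. The helper names word_index and seed_hits are hypothetical. The sketch uses the two nucleotide sequences from the figure and the nucleotide word size of 11.

```python
from collections import defaultdict


def word_index(seq, k):
    """Map every length-k word in seq to the start positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def seed_hits(query, subject, k):
    """Return (word, query_pos, subject_pos) for every exact length-k word shared by both sequences."""
    subject_index = word_index(subject, k)
    hits = []
    for i in range(len(query) - k + 1):
        word = query[i:i + k]
        # .get() avoids inserting empty entries into the defaultdict
        for j in subject_index.get(word, []):
            hits.append((word, i, j))
    return hits


seq1 = "GATGAGTATGCTCGTACCTGTAATGTAGTGTATAGTACATGCA"
seq2 = "GAAGACTATGCTCGTACCTCTAATGTAGTACATGCA"

# Word size 11, the nucleotide default mentioned in the notes.
for word, i, j in seed_hits(seq1, seq2, 11):
    print(f"{word}  seq1[{i}]  seq2[{j}]  diagonal {i - j}")
```

Each hit is a (word, position in seq1, position in seq2) triple. Hits with the same offset between the two positions (the same diagonal) lie on one ungapped stretch. In this pair, the hits near the start and the hit at the end sit on different diagonals because of the gap between them.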
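To make the affine gap idea concrete, here is a small Python sketch. It assumes the common form penalty = gap_open + gap_extend * (length - 1), with illustrative values gap_open = 10 and gap_extend = 1 that are not taken from any particular program. The function names are hypothetical. The gapped row places the 7 extra bases of the longer sequence from the pair-wise figure opposite a single gap, which is one possible placement and not necessarily the optimal alignment.

```python
import re


def affine_gap_penalty(length, gap_open=10.0, gap_extend=1.0):
    """Penalty for one gap: gap_open for its first position, gap_extend for each further position."""
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


def gap_lengths(aligned_row):
    """Lengths of the runs of '-' in one row of a gapped alignment."""
    return [len(run) for run in re.findall(r"-+", aligned_row)]


def total_gap_penalty(aligned_row, gap_open=10.0, gap_extend=1.0):
    """Sum of affine penalties over every gap in the row."""
    return sum(affine_gap_penalty(n, gap_open, gap_extend) for n in gap_lengths(aligned_row))


# The 7 extra bases in the longer sequence are placed opposite one gap.
row1 = "GATGAGTATGCTCGTACCTGTAATGTAGTGTATAGTACATGCA"
row2 = "GAAGACTATGCTCGTACCTCTAATG" + "-" * 7 + "TAGTACATGCA"

one_long_gap = total_gap_penalty(row2)                    # one gap of length 7
seven_short_gaps = total_gap_penalty("A-A-A-A-A-A-A-A")   # seven gaps of length 1
print(one_long_gap, seven_short_gaps)
```

With these values, one gap of length 7 costs 10 + 6 = 16. Seven separate one-base gaps cost 7 * 10 = 70. This is why affine scoring favours a single long gap.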
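In the Progressive Multiple Sequence Alignment example, the merge order (A with B, C with D, AB with CD, and so on) is a post-order walk of the guide tree. The Python sketch below assumes the guide tree is already known and written by hand as nested pairs. Real programs such as ClustalW derive the tree from pairwise distances and align profiles at each merge. progressive_order is a hypothetical name, and the function only reports the order without aligning anything.

```python
def progressive_order(tree):
    """Walk a guide tree (nested 2-tuples of leaf names) in post-order.

    Returns the merge steps in the order a progressive aligner would perform
    them, plus the name of the final combined alignment. No alignment is done.
    """
    steps = []

    def visit(node):
        if isinstance(node, str):      # a leaf: a single sequence
            return node
        left, right = node
        a = visit(left)                # finish the left subtree first
        b = visit(right)               # then the right subtree
        steps.append(f"{a} with {b} ---> alignment {a + b}")
        return a + b

    final = visit(tree)
    return steps, final


# Hand-written guide tree matching the A ... G example.
ab = ("A", "B")
cd = ("C", "D")
guide_tree = ((((ab, cd), "E"), "F"), "G")

steps, final = progressive_order(guide_tree)
for number, step in enumerate(steps, start=1):
    print(f"{number}. {step}")
```

Because each sequence is added to an alignment that is already fixed, any mistake in an early merge stays in every later step. That is the "Cons" point in the notes.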
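The .ph file from ClustalW and the .dnd files from PRANK store trees as Newick text, for example ( A1:0.01629, A2:0.01780) from the pipeline figure. In Newick, parentheses group subtrees, commas separate siblings, and a number after a colon is a branch length. The Python sketch below is a minimal recursive-descent reader. It assumes plain labels only, with no quoted names, comments, or bootstrap syntax. It is not how ape's read.tree is implemented, and parse_newick and leaves are hypothetical names.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts.

    Each node is {"name": str, "length": float or None, "children": [nodes]}.
    Only plain labels and ':length' values are supported.
    """
    pos = 0

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def read_token():
        # Read characters up to the next structural symbol.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        return text[start:pos].strip()

    def parse_node():
        nonlocal pos
        skip_spaces()
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1                              # consume '('
            children.append(parse_node())
            while pos < len(text) and text[pos] == ",":
                pos += 1                          # consume ','
                children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                              # consume ')'
        name = read_token()
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1                              # consume ':'
            length = float(read_token())
        return {"name": name, "length": length, "children": children}

    tree = parse_node()
    skip_spaces()
    if pos < len(text) and text[pos] == ";":
        pos += 1
    return tree


def leaves(node):
    """List (name, branch_length) for every leaf, left to right."""
    if not node["children"]:
        return [(node["name"], node["length"])]
    result = []
    for child in node["children"]:
        result.extend(leaves(child))
    return result


# Tree fragment from the pipeline figure, closed with ';'.
tree = parse_newick("( A1:0.01629, A2:0.01780);")
print(leaves(tree))
```

For real files, the ape package's read.tree (as used in the R commands) is the tool to rely on. This sketch only shows what that text encodes.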
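Steps 1 and 3 of the Perl Pipeline Script Assignment require some file-name bookkeeping. The script must read a single-column accessions file, name the output files after it (dloop_accessions.fa, dloop_accessions.jpg), and write the R commands to a file. The Python sketch below shows only that bookkeeping so the logic is easy to check. It does not replace the required Perl script. It assumes PRANK's tree is output.2.dnd in the current directory. parse_accessions, derived_name, r_commands and write_r_commands are hypothetical names.

```python
import os


def parse_accessions(text):
    """Return the accessions from single-column text, ignoring blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def derived_name(accessions_path, extension):
    """Reuse the accessions file's base name, e.g. dloop_accessions.txt -> dloop_accessions.jpg."""
    base, _old_extension = os.path.splitext(os.path.basename(accessions_path))
    return base + extension


def r_commands(tree_file, image_file):
    """The R commands (using the ape package) that draw the tree and save it as a JPEG."""
    lines = [
        "library(ape)",
        f'my_tree <- read.tree("{tree_file}")',
        f'jpeg("{image_file}")',
        "plot(my_tree)",
        "dev.off()",
    ]
    return "\n".join(lines) + "\n"


def write_r_commands(accessions_path, tree_file="output.2.dnd", r_file="commands_file.r"):
    """Write the R script for this accessions file and return the image name it will create."""
    image_file = derived_name(accessions_path, ".jpg")
    with open(r_file, "w") as handle:
        handle.write(r_commands(tree_file, image_file))
    return image_file


print(parse_accessions("AF011222\nAF254446\n\n"))
print(r_commands("output.2.dnd", derived_name("dloop_accessions.txt", ".jpg")), end="")
```

In the Perl version, the same lines are printed to R_FILE with each inner double quote escaped as \", and the file is run with backticks, as the hints show.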