BLAST
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html
  What is BLAST? What is it good for?
    Basic Local Alignment Search Tool
    Given a query (DNA or protein), find “matches”
  What is a match? How do we judge a good one?
  Two kinds of alignment, or matches
    Global alignment (sequence to sequence)
    Local alignment (subsequence to subsequence)

Genome Revolution: COMPSCI 004G 8.1

Global Alignment
  Explained with words (see O’Reilly BLAST)
    Align ‘coelacanth’ and ‘pelican’
    Score +1 for match, -1 for mismatch, -1 for gap

      coelacanth      coelacanth
      p-elican--      -pelican--

  What are the scores of these alignments? What’s the best score?
    Needleman-Wunsch algorithm

Global Alignment
  Score table as initialized (rows: pelican, columns: coelacanth)

              C    O    E    L    A    C    A    N    T    H
         0   -1   -2   -3   -4   -5   -6   -7   -8   -9  -10
    P   -1
    E   -2
    L   -3
    I   -4
    C   -5
    A   -6
    N   -7

  Interior values visible on the slide (cell positions lost): -1, -2, -2, -2, -1, 0

Local Alignment
  Subsequence alignment rather than global
    Advantages? Tradeoffs?
  Score +1 for match, -1 for mismatch, -1 for gap

      (co)ELACAN(th)
       (p)ELICAN

  Smith-Waterman: initialize to zero, keep only positive scores,
  trace back from the highest score

Local Alignment
  Score table as initialized (rows: pelican, columns: coelacanth)

             C   O   E   L   A   C   A   N   T   H
         0   0   0   0   0   0   0   0   0   0   0
    P    0
    E    0
    L    0
    I    0
    C    0
    A    0
    N    0

  Interior values visible on the slide (cell positions lost): 1, 2, 1, 1, 4

Analysis
  How long does this algorithm take to execute?
    How do we measure the complexity/size?
    Time v. memory
  Do we need a different measure for gap, match, and mismatch?
    Just using +1 or -1 doesn’t provide domain-specific analysis
    In practice, use a scoring matrix; see the NCBI site

BLOSUM 62 scoring matrix
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=sef.figgrp.194
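
The two table-filling procedures named on the Global Alignment and Local Alignment slides can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function names `global_score` and `local_alignment` are hypothetical, and the scoring is the slides' toy scheme (+1 match, -1 mismatch, -1 gap) rather than a real scoring matrix such as BLOSUM 62.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: best score for aligning all of a with all of b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    # First column and first row hold the cost of leading gaps.
    for i in range(1, rows):
        H[i][0] = i * gap
    for j in range(1, cols):
        H[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H[-1][-1]


def local_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman: best-scoring pair of substrings, found by traceback."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # initialized to zero
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero: a bad prefix is simply discarded.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(global_score("pelican", "coelacanth"))     # 0
print(local_alignment("pelican", "coelacanth"))  # (4, 'elican', 'elacan')
```

Both functions fill a table with (len(a) + 1) x (len(b) + 1) cells, so time and memory grow with the product of the two sequence lengths, which is one way to approach the questions on the Analysis slide.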