Database search

Overview
1. FastA: suitable for protein sequence searching.
2. BLAST: suitable for DNA, RNA, and protein sequence searching.

FastA
History: FastA was developed by Lipman and Pearson in 1985 and was the first database search software. EBI provides a FastA service at http://www.ebi.ac.uk/Tools/fasta/
Idea: identify short substrings that match the target sequence.
Other commonly used software: http://www.ebi.ac.uk/Tools/sss/

Example
Protein sequence: EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP
Steps: select the database, input the sequence, set the parameters.
Results: one hit shows 100% identity; another shows 17/28 = 60.7% identity in a 28 aa overlap.

BLAST
The Basic Local Alignment Search Tool (BLAST) was developed by NCBI. BLAST finds regions of similarity between biological sequences.

Basic BLAST programs

Program   Query        Database     Description
blastn    Nucleotide   Nucleotide   Search a nucleotide database using a nucleotide query. Algorithms: blastn, megablast, discontiguous megablast
blastp    Protein      Protein      Search a protein database using a protein query. Algorithms: blastp, psi-blast, phi-blast, delta-blast
blastx    Nucleotide   Protein      Search a protein database using a translated nucleotide query
tblastn   Protein      Nucleotide   Search a translated nucleotide database using a protein query
tblastx   Nucleotide   Nucleotide   Search a translated nucleotide database using a translated nucleotide query

T: translation; n: nucleotide; p: protein; x: cross

BLASTALL
Amino acid query sequence:
   BLASTp: protein database
   TBLASTn: translated nucleotide database
DNA query sequence:
   BLASTn: nucleotide database
   BLASTx: translated query against a protein database
   TBLASTx: translated query against a translated nucleotide database

BLAST sources
1. NCBI:
   http://blast.ncbi.nlm.nih.gov/Blast.cgi/ (online version)
   ftp://ftp.ncbi.nih.gov/blast/ (standalone)
2. Other websites:
   http://life.zsu.edu.cn/blast/
   http://www.fruitfly.org/blast/
   http://www.mcgb.uestc.edu.cn/blast/blast.html
   …

Ways to run BLAST
1. Online: from a website
2. Standalone: download the software

Comparison: web server
Advantages:
1. Easy to use.
2. Kept up to date.
3. No database download is needed.
Disadvantages:
1. Not suitable for large data sets.
2. You cannot define your own database.

Web BLAST provided by NCBI
Blastn for nucleotide sequences, Blastp for protein sequences.
http://blast.ncbi.nlm.nih.gov/Blast.cgi

An example:
1. cctggcgataaccgtcttgtcggcggttgcgctgacgttgcgtcgtgatatcatcagggcAgaccggttacatccccctaa
2. gatcgaaaaacgcttgtgttaaaaatttgctaaattttgccaatttggtaaaacagttgcAtcacaacaggagatagcaat
Input: the first sequence, the second sequence, the sequence range, and the software (listed by similarity, from high to low).
Output: results are shown in a new window, with the results of the pairwise alignment, information on the two sequences, and the parameters selected.
Result for this pair: No significant similarity found.

BLAST (standalone version)
Why do we need the standalone version of BLAST?
1. Specific databases
2. Privacy
3. Batch processing

How to download BLAST
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release
blast-2.2.23-ia32-win32.exe
After unzipping, we get three folders:
   bin: all the exe files
   data: data for BLAST
   doc: readme

Formatting the database
We need to format the database for BLAST. First, save your database in FASTA format. Second, use formatdb, provided in the BLAST package, to format the database.
DOS command:
   formatdb -i sequence.fa -p T/F -o T/F -n db_name

An example
1. There are 13 proteins in the file "Delta.txt", which serves as the database.
2. One protein is selected as the query sequence and stored in the file "seq.txt".

1. Format Delta.txt:
   formatdb -i Delta.txt -p T
   Parameters:
   1. -i: database
   2. -p: T for protein, F for nucleotide
2. Search Delta.txt using BLAST:
   blastall -p blastp -d Delta.txt -i seq.txt -o out.txt
   Parameters:
   1. -p: program name: blastp, blastn, blastx, tblastn, tblastx
   2. -d: database name
   3. -i: query sequences
   4. -o: output file
3. To read about the other parameters, just type blastall.
4. Results:

   Sequences producing significant alignments:
   P83301|CXO_CONVE P69749|CXD6A_CONBU P69750|CXD6A_CONCN P24159|CXDB_CONTE P18511|CXDA_CONTE P60179|CXD66_CONAA P60513|CXD6A_CONER P69751|CXD6E_CONCT P69748|CXD6A_CONAI P69754|CXD6B_CONMA P69753|CXD6A_CONMA P69752|CXD6B_CONER P58913|CXD6A_CONPU P69756|CXD6D_CONMA P69755|CXD6C_CONMA Q9XZK5|CXSO6_CONST P69757|CXD6A_CONSE
   Score (bits): 69 20 18 18 17 17 16 14 14 13 12
   E Value: 1e-017 0.009 0.036 0.042 0.066 0.11 0.19 0.56 0.62 0.89 2.6

   >P83301|CXO_CONVE
   Length = 33
   Score = 69.3 bits (168), Expect = 1e-017, Method: Compositional matrix adjust.
   Identities = 33/33 (100%)
   Query: 1 EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP 33
            EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP
   Sbjct: 1 EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP 33

   >P69749|CXD6A_CONBU
   Length = 27
   Score = 20.0 bits (40), Expect = 0.009, Method: Compositional matrix adjust.
   Identities = 13/30 (43%), Gaps = 6/30 (20%)
   Query: 1 EDCIAVGQLCVFWNIGRP--CCSGLCVFAC 28
              C A  G  C      RP  CCS  C FAC
   Sbjct: 1 DECSAPGAFCLI----RPGLCCSEFCFFAC 26

5. Pairwise alignment:
   bl2seq -p blastp -i seq.txt -j 1.txt -o out.txt
   Parameters:
   1. -p: program name: blastp, blastn, ……
   2. -i: first sequence
   3. -j: second sequence
   4. -o: output file
   To read about the other parameters, just type bl2seq.
6. Databases can be downloaded from: ftp://ftp.ncbi.nih.gov/blast/db/
   Scoring matrices can be downloaded from: ftp://ftp.ncbi.nih.gov/blast/matrices/

PSI-BLAST
Position-specific iterative BLAST (PSI-BLAST).
Altschul et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17):3389-3402.
Target: proteins only.

PSI-BLAST refers to a feature of BLAST 2.0 in which a profile is automatically constructed from the first set of BLAST alignments. PSI-BLAST is similar to NCBI BLAST2 except that it uses position-specific scoring matrices derived during the search. The tool is used to detect distant evolutionary relationships.
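To make the idea of a position-specific scoring matrix concrete, here is a minimal Python sketch. It is not PSI-BLAST's actual procedure: the three aligned fragments are hypothetical, the background frequency is assumed uniform (1/20 per residue), and a flat pseudocount stands in for the sequence weighting and substitution-matrix-based pseudocounts that the real program uses. The function names build_pssm and score are illustrative only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds matrix (one dict per column) from equal-length sequences.

    Assumptions: uniform background frequencies and a flat pseudocount.
    Gap characters and unknown residues are ignored when counting.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score(pssm, segment):
    """Sum the per-position scores of an ungapped segment against the matrix."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Hypothetical aligned fragments, used only for illustration.
toy_alignment = ["CCSG", "CCSE", "CCTG"]
pssm = build_pssm(toy_alignment)
print(round(score(pssm, "CCSG"), 2), round(score(pssm, "AAAA"), 2))
```

A segment that agrees with the conserved columns scores higher than an unrelated one, which is the property that lets a profile pick up distant relatives that a single general-purpose substitution matrix can miss.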
Online sources:
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_psiblast.html
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://www.ebi.ac.uk/Tools/blastpgp/
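The Identities and Gaps figures in the blastall report for P69749|CXD6A_CONBU can be recomputed from the aligned pair. The Python sketch below assumes the gap positions implied by the spacing in that report (two gap columns in the query after "RP", four in the subject after "CLI"); the function name alignment_stats is illustrative and is not part of BLAST.

```python
def alignment_stats(query, subject, gap="-"):
    """Return (identities, gap_columns, alignment_length) for an aligned pair."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    # A column counts as an identity when both residues match and neither is a gap.
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != gap)
    # A column counts as a gap when either sequence has a gap character.
    gaps = sum(1 for q, s in zip(query, subject) if q == gap or s == gap)
    return identities, gaps, len(query)


# Aligned pair from the report, with gap columns written as "-" (assumed placement).
QUERY = "EDCIAVGQLCVFWNIGRP--CCSGLCVFAC"
SUBJECT = "DECSAPGAFCLI----RPGLCCSEFCFFAC"

identities, gaps, columns = alignment_stats(QUERY, SUBJECT)
print(f"Identities = {identities}/{columns} ({identities / columns:.0%})")
print(f"Gaps = {gaps}/{columns} ({gaps / columns:.0%})")
```

Note that the denominator is the number of alignment columns (30), not the length of either sequence, which is why a 27-residue subject can be reported with 13/30 identities.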