Database search
Overview:
1. FASTA: suitable for protein sequence searching
2. BLAST: suitable for DNA, RNA, and protein sequence searching
FASTA
History: FASTA was developed by Lipman and Pearson in 1985 and was the first database search software.
EBI provides a FASTA service, available at
http://www.ebi.ac.uk/Tools/fasta/
Idea: identify short substrings (words) that match between the query and the target sequence.
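The word-matching idea can be illustrated with a small sketch. This is not the FASTA program itself, only a toy version of its first step: index every word of length k in one sequence, look up the words of the other sequence, and count the hits on each diagonal. The function names and the choice k = 2 are assumptions made for this illustration.

```python
from collections import defaultdict


def index_words(seq, k):
    """Map every length-k word in seq to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def diagonal_hits(query, subject, k=2):
    """Count shared k-letter words on each diagonal (subject pos - query pos)."""
    table = index_words(query, k)
    hits = defaultdict(int)
    for j in range(len(subject) - k + 1):
        for i in table.get(subject[j:j + k], []):
            hits[j - i] += 1
    return dict(hits)


# Comparing the example protein with itself: the main diagonal (offset 0)
# collects one hit for every word in the sequence.
query = "EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP"
hits = diagonal_hits(query, query, k=2)
best = max(hits, key=hits.get)
print(best, hits[best])  # 0 32
```

A diagonal with many word hits marks a region where the two sequences are likely to align well, which is why this kind of lookup can serve as a fast first pass before a slower, exact alignment.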
Other commonly used search software:
http://www.ebi.ac.uk/Tools/sss/
Example: protein sequence:
EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP
[Screenshots: FASTA web form (select database, input sequence, parameters) and results page. Annotated results: 100% identity; 17/28 = 60.7% identity in a 28 aa overlap.]
BLAST
Basic Local Alignment Search Tool (BLAST).
BLAST was developed at NCBI.
BLAST finds regions of similarity between biological sequences.
Basic BLAST
Program   Query sequence   Database     Program description
blastn    Nucleotide       Nucleotide   Search a nucleotide database using a nucleotide query. Algorithms: blastn, megablast, discontiguous megablast
blastp    Protein          Protein      Search a protein database using a protein query. Algorithms: blastp, psi-blast, phi-blast, delta-blast
blastx    Nucleotide       Protein      Search a protein database using a translated nucleotide query
tblastn   Protein          Nucleotide   Search a translated nucleotide database using a protein query
tblastx   Nucleotide       Nucleotide   Search a translated nucleotide database using a translated nucleotide query
t: translated, n: nucleotide, p: protein, x: cross
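BLASTALL
[Diagram: query sequence type and the database each program searches]
Amino acid query sequence:
  BLASTp -> protein database
  TBLASTn -> translated nucleotide database
DNA query sequence:
  BLASTn -> nucleotide database
  BLASTx (translated query) -> protein database
  TBLASTx (translated query) -> translated nucleotide database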
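The program table can also be written as a small lookup, which makes it easy to see which programs fit a given query and database type. This is only a sketch of the table in code; the dictionary layout and the function name candidate_programs are assumptions for illustration and are not part of BLAST.

```python
# Each program with the type of query it accepts, the type of database it
# searches, and which side (if any) is translated before comparison.
PROGRAMS = {
    "blastn":  {"query": "nucleotide", "database": "nucleotide", "translated": ()},
    "blastp":  {"query": "protein",    "database": "protein",    "translated": ()},
    "blastx":  {"query": "nucleotide", "database": "protein",    "translated": ("query",)},
    "tblastn": {"query": "protein",    "database": "nucleotide", "translated": ("database",)},
    "tblastx": {"query": "nucleotide", "database": "nucleotide", "translated": ("query", "database")},
}


def candidate_programs(query_type, db_type):
    """Return the programs that accept this query type and database type."""
    return [
        name
        for name, spec in PROGRAMS.items()
        if spec["query"] == query_type and spec["database"] == db_type
    ]


print(candidate_programs("protein", "protein"))        # ['blastp']
print(candidate_programs("nucleotide", "protein"))     # ['blastx']
print(candidate_programs("nucleotide", "nucleotide"))  # ['blastn', 'tblastx']
```

A nucleotide query against a nucleotide database has two candidates: blastn compares the nucleotides directly, while tblastx translates both sides and compares them as proteins.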
BLAST sources
1. NCBI:
http://blast.ncbi.nlm.nih.gov/Blast.cgi/ (online version)
ftp://ftp.ncbi.nih.gov/blast/ (standalone)
2. Other websites:
http://life.zsu.edu.cn/blast/
http://www.fruitfly.org/blast/
http://www.mcgb.uestc.edu.cn/blast/blast.html
…
BLAST
1. Online: run from a website
2. Standalone: download the software
Comparison between them
Web server
Advantages:
1. Easy to use.
2. Always up to date.
3. No need to download databases.
Disadvantages:
1. Not suitable for large data sets.
2. You cannot define your own database.
Web BLAST provided by NCBI
blastn for nucleotide sequences
blastp for protein sequences
http://blast.ncbi.nlm.nih.gov/Blast.cgi
An example:
Sequence 1:
cctggcgataaccgtcttgtcggcggttgcgctgacgttgcgtcgtgatatcatcagggcAgaccggttacatccccctaa
Sequence 2:
gatcgaaaaacgcttgtgttaaaaatttgctaaattttgccaatttggtaaaacagttgcAtcacaacaggagatagcaat
[Screenshots: the web form with boxes for the first and the second sequence, an optional sequence range, the choice of software, and the option to show results in a new window. The results page shows the parameters selected, information on the two sequences, and the pairwise alignment results, ordered by similarity from high to low. For these two sequences the result is "No significant similarity found".]
BLAST (standalone version)
Why do we need the standalone version of BLAST?
1. Searching a specific database
2. Privacy
3. Batch processing
How to download BLAST
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release
blast-2.2.23-ia32-win32.exe
Unpacking the file gives three folders:
bin: all the exe files
data: data for BLAST
doc: readme
We need to format the database for BLAST.
First, save your database in FASTA format.
Second, use formatdb, provided in the BLAST package, to format the database.
DOS command:
formatdb -i sequence.fa -p T/F -o T/F -n db_name
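Before formatting, it can help to confirm that the database file really is in FASTA format (a ">" header line followed by sequence lines). The reader below is a minimal sketch for that check, not part of the BLAST package. The function name read_fasta and the second record are hypothetical; the first record is the example protein used in these slides.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Two records; the second one is made up and split over two lines.
example = """>P83301|CXO_CONVE
EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP
>hypothetical_second_record
ACDEFGHIK
LMNPQRSTVWY
""".splitlines()

records = list(read_fasta(example))
print(len(records), [len(seq) for _, seq in records])  # 2 [33, 20]
```

To check a real file, pass an open file object instead of the example lines, for instance `list(read_fasta(open("sequence.fa")))`.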
An example
1. The file "Delta.txt" contains 13 proteins and serves as the database.
2. One protein is selected as the query sequence and stored in the file "seq.txt".
1. Format Delta.txt:
formatdb -i Delta.txt -p T
Parameters:
1. -i: database (input file in FASTA format)
2. -p: T for protein, F for nucleotide
2. Search Delta.txt using BLAST:
blastall -p blastp -d Delta.txt -i seq.txt -o out.txt
Parameters:
1. -p: program name: blastp, blastn, blastx, tblastn, tblastx
2. -d: database name
3. -i: query sequences
4. -o: output file
3. To see the other parameters, just type blastall
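For batch processing, the two commands above can be assembled from a script. The sketch below only builds the argument lists, using the flags described in these slides (-i, -p, -d, -o). The helper names formatdb_cmd and blastall_cmd are assumptions for illustration, and the commands run only if the legacy formatdb and blastall executables are installed and on the PATH.

```python
import subprocess

ALLOWED_PROGRAMS = {"blastp", "blastn", "blastx", "tblastn", "tblastx"}


def formatdb_cmd(fasta_path, is_protein=True):
    """Argument list for formatting a FASTA file as a BLAST database."""
    return ["formatdb", "-i", fasta_path, "-p", "T" if is_protein else "F"]


def blastall_cmd(program, database, query, output):
    """Argument list for one blastall search."""
    if program not in ALLOWED_PROGRAMS:
        raise ValueError(f"unknown program: {program}")
    return ["blastall", "-p", program, "-d", database, "-i", query, "-o", output]


def run(cmd):
    """Run one command; raises CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)


# Build (but do not run) the commands from the example.
commands = [
    formatdb_cmd("Delta.txt", is_protein=True),
    blastall_cmd("blastp", "Delta.txt", "seq.txt", "out.txt"),
]
for cmd in commands:
    print(" ".join(cmd))
```

Passing the arguments as a list rather than a single string avoids quoting problems when file names contain spaces. Calling `run(cmd)` on each entry would execute the two steps in order.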
4. Results:
Sequences producing significant alignments:    Score (bits)    E Value
P83301|CXO_CONVE                               69              1e-017
P69749|CXD6A_CONBU                             20              0.009
P69750|CXD6A_CONCN                             18              0.036
P24159|CXDB_CONTE P18511|CXDA_CONTE            18              0.042
P60179|CXD66_CONAA                             17              0.066
P60513|CXD6A_CONER                             17              0.11
P69751|CXD6E_CONCT P69748|CXD6A_CONAI          16              0.19
P69754|CXD6B_CONMA P69753|CXD6A_CONMA          14              0.56
P69752|CXD6B_CONER P58913|CXD6A_CONPU          14              0.62
P69756|CXD6D_CONMA P69755|CXD6C_CONMA          13              0.89
Q9XZK5|CXSO6_CONST P69757|CXD6A_CONSE          12              2.6
>P83301|CXO_CONVE
Length = 33
Score = 69.3 bits (168), Expect = 1e-017, Method: Compositional matrix adjust.
Identities = 33/33 (100%)
Query: 1 EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP 33
         EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP
Sbjct: 1 EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP 33
>P69749|CXD6A_CONBU
Length = 27
Score = 20.0 bits (40), Expect = 0.009, Method: Compositional matrix adjust.
Identities = 13/30 (43%), Gaps = 6/30 (20%)
Query: 1 EDCIAVGQLCVFWNIGRP--CCSGLCVFAC 28
           C A G  C      RP  CCS  C FAC
Sbjct: 1 DECSAPGAFCLI----RPGLCCSEFCFFAC 26
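The "Identities" and "Gaps" figures in the second hit can be recomputed from the two aligned rows. The sketch below assumes the gapped rows shown in the code, which are a reconstruction: two gap positions in the query and four in the subject, placed so that they agree with the reported 13/30 identities and 6/30 gaps. The function name alignment_stats is hypothetical.

```python
def alignment_stats(query_row, sbjct_row, gap="-"):
    """Return (identities, gaps, alignment length) for two aligned rows."""
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(query_row, sbjct_row))
    identities = sum(1 for q, s in pairs if q == s and q != gap)
    gaps = sum(1 for q, s in pairs if q == gap or s == gap)
    return identities, gaps, len(pairs)


# Reconstructed gapped rows for the hit P69749|CXD6A_CONBU.
query_row = "EDCIAVGQLCVFWNIGRP--CCSGLCVFAC"
sbjct_row = "DECSAPGAFCLI----RPGLCCSEFCFFAC"

identities, gaps, length = alignment_stats(query_row, sbjct_row)
print(f"Identities = {identities}/{length} ({round(100 * identities / length)}%)")
print(f"Gaps = {gaps}/{length} ({round(100 * gaps / length)}%)")
# Identities = 13/30 (43%)
# Gaps = 6/30 (20%)
```

Note that the denominator is the alignment length including gap columns (30), not the length of either sequence (28 and 26 residues are aligned here).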
5. Pairwise alignment:
bl2seq -p blastp -i seq.txt -j 1.txt -o out.txt
Parameters:
1. -p: program name: blastp, blastn……
2. -i: first sequence
3. -j: second sequence
4. -o: output file
To see the other parameters, just type bl2seq
6. Databases can be downloaded from:
ftp://ftp.ncbi.nih.gov/blast/db/
Scoring matrices can be downloaded from:
ftp://ftp.ncbi.nih.gov/blast/matrices/
PSI-BLAST
Position-Specific Iterative BLAST (PSI-BLAST).
Altschul et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17):3389-3402.
Target: protein sequences only
Position-Specific Iterative BLAST (PSI-BLAST) refers to a feature of BLAST 2.0 in which a profile is automatically constructed from the first set of BLAST alignments. PSI-BLAST is similar to NCBI BLAST2 except that it uses position-specific scoring matrices derived during the search. The tool is used to detect distant evolutionary relationships.
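To give a feel for what a position-specific scoring matrix is, the sketch below builds a toy one from a small ungapped block of aligned sequences. It is a deliberate simplification and not PSI-BLAST's actual procedure: it uses a flat pseudocount and a uniform background frequency, and it applies no sequence weighting. The function names and the third aligned sequence are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append(
            {aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS}
        )
    return pssm


def score(pssm, segment):
    """Sum the column-specific scores for a segment of the same length."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Toy ungapped block; the third sequence is made up for the illustration.
aligned = ["CCSGLC", "CCSEFC", "CCSGFC"]
pssm = build_pssm(aligned)
print(score(pssm, "CCSGLC") > score(pssm, "AAAAAA"))  # True
```

Unlike a fixed substitution matrix, the score for a residue here depends on the column: cysteine scores high in the conserved columns and low elsewhere. In PSI-BLAST, a profile of this kind is rebuilt from the hits of each search round and used as the query for the next round.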
Online sources:
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_psiblast.html
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://www.ebi.ac.uk/Tools/blastpgp/