Download Blast Instructions

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Oracle Database wikipedia , lookup

Database wikipedia , lookup

Ingres (database) wikipedia , lookup

Concurrency control wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Clusterpoint wikipedia , lookup

ContactPoint wikipedia , lookup

Versant Object Database wikipedia , lookup

Transcript
Blastall
I. Running a Blast against a downloaded data base (example: Arabidopsis cDNA data base)
1. Download the database from the Tair web site.
2. Format the database using the command
formatdb -i dbname -p T -l name.logfile –o
here dbname is the name of the Tair database file
and name is an arbitrary label given to the logfile.
This command creates additional files associated with the
Database, e.g. .phr, .pin, .psd, psi, .psq, .logfile.
--the –p T indicates this is a protein database
This works when running blastx
An alternate formatting uses –p F when the file is not
A protein dababase, e.g. an EST database.
3. Run the blast:
blastall –p blastx –a 10 –i query –d /full/path/name –e 1e-10 –m 9 > outputfile
-a is number of results to report per sequence
-d is the name given in 2 with full path
-e is e-value
-m output format
e.g. name=TAIR10_pep_20101214_updated
4. I have downloaded the nr protein database to ~/Purple/nr/ at my Biology account.
To download it, go to ftp://ftp.ncbi.nih.gov/blast/db/
A list of links to databases will be shown. You want the ones that are "nr.XX.tar.gz", where
"xx" is 00, 01, 02, . . .
You need to download each of these because the entire database is broken up into several files.
You can use the wget command to do this: at the ftp site, right click on the link and choose
"copy link"
Then, in the unix directory you want to download the file to,issue the command
wget "URL" ,
where "URL" is the address of the link you just copied (in my experience, right clicking after
"wget" will put in the address)
Once the files have been downloaded, you need to unzip them. Use the command:
tar -xzf nr.YY.tar.gz
where "YY" is the appropriate number 01 or 02 or . . .
After you have unzipped the files, delete the .gz file using command
rm nr.YY.tar.gz
You are now ready to do the blast.
5. To run a blast against this database, first cd to ~/Purple/nr (or whatever directory holds the
nr database) and execute the following:
blastall –p blastx –a 10 –i /full/path/query –d nr –e 1e-10 –m 7 > outputfile
using, of course, whatever options one wants.
Note that if the query file is not in the database with the nr files, you'll need to
type the full path of the query file.
6. The blastall program information:
blastall 2.2.24 arguments:
-p Program Name [String]
-d Database [String]
default = nr
-i Query File [File In]
default = stdin
-e Expectation value (E) [Real]
default = 10.0
-m alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = query-anchored no identities and blunt ends,
6 = flat query-anchored, no identities and blunt ends,
7 = XML Blast output,
8 = tabular,
9 tabular with comment lines
10 ASN, text
11 ASN, binary [Integer]
default = 0
range from 0 to 11
-o BLAST report Output File [File Out] Optional
default = stdout
-F Filter query sequence (DUST with blastn, SEG with others) [String]
default = T
-G Cost to open a gap (-1 invokes default behavior) [Integer]
default = -1
-E Cost to extend a gap (-1 invokes default behavior) [Integer]
default = -1
-X X dropoff value for gapped alignment (in bits) (zero invokes default behavior)
blastn 30, megablast 20, tblastx 0, all others 15 [Integer]
default = 0
-I Show GI's in deflines [T/F]
default = F
-q Penalty for a nucleotide mismatch (blastn only) [Integer]
default = -3
-r Reward for a nucleotide match (blastn only) [Integer]
default = 1
-v Number of database sequences to show one-line descriptions for (V) [Integer]
default = 500
-b Number of database sequence to show alignments for (B) [Integer]
default = 250
-f Threshold for extending hits, default if zero
blastp 11, blastn 0, blastx 12, tblastn 13
tblastx 13, megablast 0 [Real]
default = 0
-g Perform gapped alignment (not available with tblastx) [T/F]
default = T
-Q Query Genetic code to use [Integer]
default = 1
-D DB Genetic code (for tblast[nx] only) [Integer]
default = 1
-a Number of processors to use [Integer]
default = 1
-O SeqAlign file [File Out] Optional
-J Believe the query defline [T/F]
default = F
-M Matrix [String]
default = BLOSUM62
-W Word size, default if zero (blastn 11, megablast 28, all others 3) [Integer]
default = 0
-z Effective length of the database (use zero for the real size) [Real]
default = 0
-K Number of best hits from a region to keep. Off by default.
If used a value of 100 is recommended. Very high values of -v or -b is also suggested [Integer]
default = 0
-P 0 for multiple hit, 1 for single hit (does not apply to blastn) [Integer]
default = 0
-Y Effective length of the search space (use zero for the real size) [Real]
default = 0
-S Query strands to search against database (for blast[nx], and tblastx)
3 is both, 1 is top, 2 is bottom [Integer]
default = 3
-T Produce HTML output [T/F]
default = F
-l Restrict search of database to list of GI's [String] Optional
-U Use lower case filtering of FASTA sequence [T/F] Optional
-y X dropoff value for ungapped extensions in bits (0.0 invokes default behavior)
blastn 20, megablast 10, all others 7 [Real]
default = 0.0
-Z X dropoff value for final gapped alignment in bits (0.0 invokes default behavior)
blastn/megablast 100, tblastx 0, all others 25 [Integer]
default = 0
-R PSI-TBLASTN checkpoint file [File In] Optional
-n MegaBlast search [T/F]
default = F
-L Location on query sequence [String] Optional
-A Multiple Hits window size, default if zero (blastn/megablast 0, all others 40 [Integer]
default = 0
-w Frame shift penalty (OOF algorithm for blastx) [Integer]
default = 0
-t Length of the largest intron allowed in a translated nucleotide sequence when linking multiple
distinct alignments. (0 invokes default behavior; a negative value disables linking.) [Integer]
default = 0
-B Number of concatenated queries, for blastn and tblastn [Integer] Optional
default = 0
-V Force use of the legacy BLAST engine [T/F] Optional
default = F
-C Use composition-based score adjustments for blastp or tblastn:
As first character:
D or d: default (equivalent to T)
0 or F or f: no composition-based statistics
2 or T or t: Composition-based score adjustments as in Bioinformatics 21:902-911,
1: Composition-based statistics as in NAR 29:2994-3005, 2001
2005, conditioned on sequence properties
3: Composition-based score adjustment as in Bioinformatics 21:902-911,
2005, unconditionally
For programs other than tblastn, must either be absent or be D, F or 0.
As second character, if first character is equivalent to 1, 2, or 3:
U or u: unified p-value combining alignment p-value and compositional p-value in round 1 only
[String]
default = D
-s Compute locally optimal Smith-Waterman alignments (This option is only
available for gapped tblastn.) [T/F]
default = F