BLAST and Database Searches
Mohammed Mehdi Rizvi
Molecular Lab Techniques 446
Background

BLAST is the Basic Local Alignment Search Tool.

It is an algorithm used for comparing sequences, such as amino acid
sequences of proteins, or the nucleotides of DNA sequences.

The program and algorithms were designed by David J. Lipman, Webb Miller,
Eugene Myers, Warren Gish and Stephen Altschul at NIH.

There are several DNA sequence databases, including GenBank, EMBL and DDBJ.

BLAST searches against these databases, but databases can be accessed,
searched and queried without using BLAST.

BLAST is used mainly for finding similarities between sequences by aligning them.
[Photos: Eugene Myers, Webb Miller, David J. Lipman, Stephen Altschul]
BLAST

We’ve used nucleotide BLAST in this class.

There are also protein-protein BLAST searches (blastp), as well as searches that
compare proteins against translated nucleotide sequences (tblastn, blastx).

BLAST is useful in primer design.

Before fast algorithms such as BLAST were developed, database searches
were incredibly time-consuming because they relied on full alignment procedures,
such as the Smith-Waterman algorithm.

Before BLAST, Lipman and William R. Pearson also developed FASTA, another
alignment software package, which left the legacy of the FASTA format, still
ubiquitous today.
How does it work?

The first step of the algorithm breaks the query into “words”.

The usual word length for DNA is 11 characters.

A long DNA sequence is broken down into overlapping 11-character “words”.

The words are compared against the sequence database.

A scoring matrix is used to score each match, giving the S value (the alignment score).
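
The steps so far can be sketched in a few lines of Python. This is a minimal illustration, not the BLAST implementation: the function names are hypothetical, seeds are found by exact word matches only, and the match/mismatch values stand in for a real scoring matrix.

```python
from collections import defaultdict

WORD_SIZE = 11  # usual DNA word length given above


def split_into_words(sequence, word_size=WORD_SIZE):
    """Return (position, word) pairs for every overlapping word."""
    return [
        (i, sequence[i:i + word_size])
        for i in range(len(sequence) - word_size + 1)
    ]


def index_words(subject, word_size=WORD_SIZE):
    """Map each word of a database (subject) sequence to its start positions."""
    index = defaultdict(list)
    for pos, word in split_into_words(subject, word_size):
        index[word].append(pos)
    return index


def find_seeds(query, subject, word_size=WORD_SIZE):
    """Return (query_pos, subject_pos) pairs where a query word occurs in the subject."""
    index = index_words(subject, word_size)
    seeds = []
    for q_pos, word in split_into_words(query, word_size):
        for s_pos in index.get(word, []):
            seeds.append((q_pos, s_pos))
    return seeds


def score_ungapped(query, subject, q_pos, s_pos, length, match=1, mismatch=-3):
    """Score an ungapped stretch; match/mismatch are assumed, illustrative values."""
    return sum(
        match if query[q_pos + k] == subject[s_pos + k] else mismatch
        for k in range(length)
    )


query = "ACGTACGTACGTTGCA"
subject = "TTTTACGTACGTACGTTGCAGGGG"
seeds = find_seeds(query, subject)
```

Each seed marks a place where the query and a database sequence share a word; a real search would then try to extend each seed in both directions and keep the extensions whose score S is high enough.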

Low-complexity regions are filtered out because they cause artificial
hits.
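
To show why low-complexity regions are a problem and how they can be masked, here is a rough sketch. It uses a sliding-window Shannon entropy as a stand-in; this is not the filter BLAST itself uses (BLAST uses DUST for nucleotides and SEG for proteins), and the window size and entropy threshold are arbitrary assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the character composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_low_complexity(sequence, window=12, min_entropy=1.0):
    """Replace every window whose entropy falls below min_entropy with 'N'."""
    masked = list(sequence)
    for i in range(len(sequence) - window + 1):
        if shannon_entropy(sequence[i:i + window]) < min_entropy:
            masked[i:i + window] = "N" * window
    return "".join(masked)
```

A run such as `AAAAAAAAAAAAAAAAAAAA` has an entropy of zero and is masked completely, so it can no longer produce word hits against every other poly-A stretch in the database.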
Alignment

Alignment is used to determine if a sequence is like another sequence,
uncovering identical or similar regions.

There are two alignment types: global and local.

A global alignment aligns one entire sequence against another entire sequence.

The output of a global alignment is an end-to-end comparison of the two sequences.

Local alignments reveal similar, conserved regions.

BLAST, as implied by the name, uses local alignments.
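
The difference between the two alignment types shows up clearly in their scores. The sketch below computes a global score (Needleman-Wunsch) and a local score (Smith-Waterman) with simple dynamic programming; the scoring values are assumptions chosen for illustration, and BLAST itself uses a faster heuristic rather than this full procedure.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman score: best-scoring pair of subsequences, never below zero."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


a = "AAAAACGTACGT"
b = "ACGTACGTCCCC"
print(global_score(a, b), local_score(a, b))
```

The two sequences share the region `ACGTACGT` but differ elsewhere, so the local score rewards the conserved region while the global score is dragged down by the unrelated ends.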
Interpreting BLAST output

A lower E value suggests a more significant match (the match is less likely to
have occurred by chance).

Query coverage: the percentage of the query sequence that is aligned.

The coloured bars in the graphical output show where a similarity has been
found; the colour indicates the degree of similarity.
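
The two numbers above can be illustrated with a short calculation. The E value follows the Karlin-Altschul form E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths and S is the score. The K and lambda values below are placeholders for illustration only (real values depend on the scoring system), and the function names are hypothetical.

```python
import math


def expect_value(score, query_len, db_len, k=0.1, lam=0.3):
    """E = K * m * n * exp(-lambda * S); k and lam are placeholder values."""
    return k * query_len * db_len * math.exp(-lam * score)


def query_coverage(query_len, aligned_ranges):
    """Percentage of query positions covered by at least one aligned range.

    aligned_ranges holds (start, end) pairs, 0-based with end exclusive.
    """
    covered = set()
    for start, end in aligned_ranges:
        covered.update(range(start, end))
    return 100.0 * len(covered) / query_len


weak = expect_value(score=30, query_len=100, db_len=1_000_000)
strong = expect_value(score=50, query_len=100, db_len=1_000_000)
coverage = query_coverage(100, [(0, 40), (30, 60)])
```

A higher score gives a smaller E value, and a larger database gives a larger one, which is why the same alignment can look less significant when searched against a bigger database. Overlapping aligned ranges are counted once when computing coverage.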
References

https://www.ndsu.edu/pubweb/~mcclean/plsc411/Blast-explanation-lectureand-overhead.pdf

http://blog.thegrandlocus.com/2014/06/once-upon-a-blast

https://blastalgorithm.com/

https://www.ncbi.nlm.nih.gov/

http://www.genebee.msu.su/blast/blast_faqs.html