DNA Sequence Analysis
David Shiuan
Department of Life Science
Institute of Biotechnology and
Interdisciplinary Program of Bioinformatics
National Dong Hwa University
DNA Sequence Analysis (I)



BLAST comparison
ORF (open reading frame) Finder
Promoter Search
- Promoter Prediction (BCM)
- EPD (Eukaryote Promoter Database)
- NNPP prokaryote promoter prediction (BCM)
- ProtScan (BIMAS)
DNA Sequence Analysis (II)





Sequence Alignment (ClustalW)
Tree Analysis (MEGA, PAUP, UPGMA)
Motif Prediction
Restriction Analysis (TCGA)
RNAFOLD (GCG)
Basic Local Alignment Search Tool

A sequence comparison algorithm,
optimized for speed, that searches
sequence databases for optimal local
alignments to a query.

Algorithm: a fixed procedure embodied in
a computer program.
Basic Local Alignment Search Tool

The initial search is done for a word of length
"W" that scores at least "T" when compared to
the query using a substitution matrix. Word hits
are then extended in either direction in an
attempt to generate an alignment with a score
exceeding the threshold of "S". The "T"
parameter dictates the speed and sensitivity of
the search: a lower "T" produces more word hits,
which raises sensitivity but slows the search.
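The word-hit and extension steps described above can be sketched in a few lines of Python. This is a simplified illustration, not the NCBI implementation: it uses a toy match/mismatch scorer on nucleotides instead of a substitution matrix, extends without gaps, and stops extending with an "X-drop" rule. The function names, the scoring values, and the `x_drop` parameter are all assumptions introduced for this example.

```python
from itertools import product

def score_pair(a, b, match=5, mismatch=-4):
    # Toy scorer (assumption); real protein searches use a matrix such as BLOSUM62.
    return match if a == b else mismatch

def neighborhood_words(query, w, t, alphabet="ACGT"):
    """Map every word of length w scoring at least t against a query word
    to the query positions it was derived from."""
    words = {}
    for qi in range(len(query) - w + 1):
        q_word = query[qi:qi + w]
        for candidate in product(alphabet, repeat=w):
            if sum(score_pair(a, b) for a, b in zip(q_word, candidate)) >= t:
                words.setdefault("".join(candidate), []).append(qi)
    return words

def extend_hit(query, subject, qi, si, w, x_drop):
    """Ungapped extension of a word hit in both directions. Extension stops
    once the running score falls more than x_drop below the best score."""
    best = run = sum(score_pair(query[qi + k], subject[si + k]) for k in range(w))
    right = k = 0
    while qi + w + k < len(query) and si + w + k < len(subject):
        run += score_pair(query[qi + w + k], subject[si + w + k])
        k += 1
        if run > best:
            best, right = run, k
        elif best - run > x_drop:
            break
    run, left, k = best, 0, 0
    while qi - 1 - k >= 0 and si - 1 - k >= 0:
        run += score_pair(query[qi - 1 - k], subject[si - 1 - k])
        k += 1
        if run > best:
            best, left = run, k
        elif best - run > x_drop:
            break
    # (score, query start, subject start, alignment length)
    return best, qi - left, si - left, w + left + right

def seed_and_extend(query, subject, w=3, t=15, s=25, x_drop=10):
    """Return extended hits whose score reaches the threshold s, best first."""
    words = neighborhood_words(query, w, t)
    hsps = set()
    for si in range(len(subject) - w + 1):
        for qi in words.get(subject[si:si + w], []):
            hsp = extend_hit(query, subject, qi, si, w, x_drop)
            if hsp[0] >= s:
                hsps.add(hsp)
    return sorted(hsps, reverse=True)

hits = seed_and_extend("ACGTACGTGG", "TTTTACGTACGTGGTTTT")
print(hits[0])  # best hit as (score, query start, subject start, length)
```

With these toy scores, `t=15` accepts only exact three-letter words; lowering `t` to 6 also admits words with one mismatch, so the word list grows and more candidate hits are extended.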
Calculating alignment scores
BLOSUM62 Substitution Scoring Matrix


The BLOSUM62 matrix shown here is a 20 x
20 matrix, in which every possible identity and
substitution is assigned a score based on the
observed frequencies of such occurrences in
alignments of related proteins.
Identities are assigned the most positive scores.
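As a small worked example of how such a matrix is used, the sketch below sums per-position scores over an ungapped alignment. Only a handful of BLOSUM62 entries are copied in by hand, so `BLOSUM62_SUBSET`, `pair_score`, and `alignment_score` are illustrative names, and the code would need the full 20 x 20 matrix for real sequences.

```python
# Hand-copied subset of BLOSUM62 entries (assumption: enough for this example only).
BLOSUM62_SUBSET = {
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4,
    ("K", "K"): 5, ("R", "R"): 5, ("D", "D"): 6, ("E", "E"): 5,
    ("C", "C"): 9, ("W", "W"): 11,
    ("I", "L"): 2, ("K", "R"): 2, ("D", "E"): 2,
}

def pair_score(a, b):
    """Look up a residue pair; the matrix is symmetric, so try both orders."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    if (b, a) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(b, a)]
    raise KeyError(f"pair {a}/{b} is not in the hand-copied subset")

def alignment_score(seq1, seq2):
    """Score an ungapped alignment of two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))

# W/W = 11, I/L = 2, L/L = 4, K/R = 2
print(alignment_score("WILK", "WLLR"))  # 19
```

The identities (W/W, L/L) contribute the largest scores, while the conservative substitutions I/L and K/R still score positively.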
The NCBI BLAST family of programs





- blastp: compares an amino acid query sequence against a protein sequence database
- blastn: compares a nucleotide query sequence against a nucleotide sequence database
- blastx: compares a nucleotide query sequence translated in all reading frames against a protein sequence database
- tblastn: compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
- tblastx: compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database
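The "all reading frames" translation used by blastx, tblastn, and tblastx can be illustrated with a short sketch. It assumes the standard genetic code and an unambiguous A/C/G/T sequence; the function names are chosen for this example and are not part of any BLAST program.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon,
    'X' marks a codon containing an unexpected character."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frame_translations(dna):
    """Translate three forward frames (+1..+3) and three reverse frames (-1..-3)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

for name, protein in six_frame_translations("ATGGCCTAA").items():
    print(name, protein)
```

For the nine-base input above, frame +1 reads ATG GCC TAA and gives `MA*`, while frame -1 reads the reverse complement TTA GGC CAT and gives `LGH`.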
Peptide Sequence Databases for BLAST Search

- nr: All non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF
- month: All new or revised GenBank CDS translations + PDB + SwissProt + PIR + PRF released in the last 30 days
- swissprot: Last major release of the SWISS-PROT protein sequence database (no updates)
Filtering of low-complexity segments
E-value for the score S

The expected number of HSPs with score
at least S is given by the formula
$E = K m n e^{-\lambda S}$
HSP: high-scoring segment pair
m and n: sequence lengths
K and $\lambda$: parameters
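The formula can be evaluated directly. In the sketch below the function name and the numeric values of K and lambda are assumptions chosen for illustration; in practice these parameters depend on the scoring system and are reported by the search program.

```python
import math

def expected_hsps(score, m, n, k, lam):
    """E = K * m * n * exp(-lambda * S): expected number of HSPs scoring at least S."""
    return k * m * n * math.exp(-lam * score)

# Illustrative values only (assumptions, not taken from the slides).
K, LAMBDA = 0.134, 0.318
m, n = 250, 50_000_000  # sequence lengths, here a query and a database

for s in (40, 60, 80):
    print(s, expected_hsps(s, m, n, K, LAMBDA))
```

Because S sits in the exponent, E falls off exponentially with the score: every increase of ln(2)/lambda in S halves the expected number of chance hits, while doubling either sequence length doubles it.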
Promoter Search




ProtScan (at BIMAS)
EPD (Eukaryote Promoter Database)
Promoter Prediction (BCM)
NNPP (Prokaryote Promoter Prediction at BCM)
About the neural network method




NNPP is a method that finds eukaryotic and
prokaryotic promoters in a DNA sequence.
It has been shown that multiple functional sites
in the primary DNA sequence are involved in the
polymerase binding process.
For eukaryotes, these elements include the
TATA-box and the transcription start site
("Initiator").
These promoter elements are present in various
combinations separated by various distances in
the sequence.
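To make the idea of elements "separated by various distances" concrete, the toy sketch below scans a sequence for two consensus motifs and reports pairs whose spacing falls within a window. This is a plain consensus scan, not the NNPP neural network. The consensus strings (`TATAAA` for the TATA-box, `YYANWYY` for the Initiator), the spacing window, and the function names are all assumptions made for the example.

```python
import re

# IUPAC nucleotide codes used in the consensus strings below.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]", "R": "[AG]", "W": "[AT]", "N": "[ACGT]",
}

def find_motif(seq, consensus):
    """Return start positions of all (possibly overlapping) consensus matches."""
    pattern = "(?=" + "".join(IUPAC[c] for c in consensus) + ")"
    return [m.start() for m in re.finditer(pattern, seq.upper())]

def paired_elements(seq, upstream="TATAAA", downstream="YYANWYY",
                    min_gap=20, max_gap=35):
    """Pair each upstream match with downstream matches whose start lies
    between min_gap and max_gap bases after the upstream start."""
    ups = find_motif(seq, upstream)
    downs = find_motif(seq, downstream)
    return [(u, d) for u in ups for d in downs if min_gap <= d - u <= max_gap]

seq = "GGGG" + "TATAAA" + "G" * 20 + "CCATTCC" + "GGGG"
print(paired_elements(seq))  # (TATA-box start, Initiator start) pairs
```

A fixed consensus with a fixed spacing window misses many real promoters, which is one motivation for methods that learn the elements and their spacing from data.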