MOLECULAR DATABASE
Biological databases are stores of biological information. The journal Nucleic Acids Research regularly publishes special issues on biological databases and maintains a list of such databases.
Protein Sequence Databases
In the field of bioinformatics, a sequence database is a type of biological database composed of a large collection of nucleic acid, protein, or other polymer sequences stored in digital form. The UniProt database is an example of a protein sequence database.
Sequence databases can be searched using a variety of methods. The most common use is probably searching for sequences similar to a target protein or gene whose sequence the user already knows. The BLAST program is a popular method of this type.
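As a rough illustration of similarity searching, the Python sketch below parses a few records in FASTA format (the plain-text layout in which databases such as UniProt are commonly distributed) and ranks them against a query by how many short substrings (k-mers) they share. The record names and sequences are made up, and the Jaccard k-mer score is a deliberately naive stand-in chosen for brevity; it is not how BLAST or FASTA score matches.

```python
from typing import Dict, List, Set, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {identifier: sequence} dictionary."""
    records: Dict[str, str] = {}
    current_id = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # Take the first whitespace-separated token of the header as the ID.
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        else:
            # Sequence lines may wrap; join them into one string.
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def kmer_set(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def naive_search(query: str, database: Dict[str, str], k: int = 3) -> List[Tuple[str, float]]:
    """Rank entries by Jaccard similarity between their k-mer sets and the query's."""
    q = kmer_set(query.upper(), k)
    hits = []
    for seq_id, seq in database.items():
        s = kmer_set(seq, k)
        union = q | s
        score = len(q & s) / len(union) if union else 0.0
        hits.append((seq_id, score))
    # Highest similarity first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy records, not real UniProt entries.
DB_TEXT = """>seqA toy example
MKTLLVAG
AVLLSG
>seqB toy example
MKTLLVAGAVLLAG
>seqC toy example
PQRSTWYPQRSTWY
"""

database = parse_fasta(DB_TEXT)
results = naive_search("MKTLLVAGAVLLSG", database)
for seq_id, score in results:
    print(f"{seq_id}\t{score:.2f}")
```

The query is identical to seqA, differs from seqB at one position and shares nothing with seqC, so the ranking follows that order. Scanning every entry like this is fine for a toy set but is exactly the cost that indexed heuristics such as BLAST are designed to avoid on large databases.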
BLAST
In bioinformatics, BLAST (Basic Local Alignment Search Tool) is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotide sequences of DNA. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences and to identify library sequences that resemble the query sequence above a certain threshold.
BLAST uses a heuristic method: it first locates short matches ("words") between the query and a database sequence, then extends them into longer local alignments. Finding these initial short matches is called seeding.
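The seed-and-extend idea can be sketched in a few lines. This is a simplified, hypothetical illustration for nucleotide sequences: it finds exact word matches of length w (the function and parameter names are assumptions, not BLAST's own), then extends each seed without gaps using a +1/-1 score and an X-drop stopping rule. Real protein BLAST additionally seeds on "neighborhood" words that score above a threshold under a substitution matrix, and gapped BLAST also performs gapped extension, so treat this only as a picture of the general strategy.

```python
from typing import Dict, List, Tuple


def find_seeds(query: str, subject: str, w: int) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a word of length w matches exactly."""
    # Index every word of the query by its start positions.
    index: Dict[str, List[int]] = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    # Scan the subject and look each of its words up in the index.
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend_ungapped(query: str, subject: str, qi: int, sj: int, w: int,
                    match: int = 1, mismatch: int = -1,
                    x_drop: int = 3) -> Tuple[int, int, int, int]:
    """Extend a seed in both directions without gaps, stopping once the running
    score falls more than x_drop below the best score seen so far.
    Returns (score, query_start, subject_start, length)."""
    best = score = w * match  # the seed word itself is an exact match
    best_right = steps = 0
    i, j = qi + w, sj + w
    while i < len(query) and j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        steps += 1
        if score > best:
            best, best_right = score, steps
        elif best - score > x_drop:
            break
        i, j = i + 1, j + 1
    # Continue leftwards, starting from the best right-hand extension.
    score = best
    best_left = steps = 0
    i, j = qi - 1, sj - 1
    while i >= 0 and j >= 0:
        score += match if query[i] == subject[j] else mismatch
        steps += 1
        if score > best:
            best, best_left = score, steps
        elif best - score > x_drop:
            break
        i, j = i - 1, j - 1
    return best, qi - best_left, sj - best_left, best_left + w + best_right


query = "ACGTTGCAAGT"
subject = "TTACGTTGCAAGTCC"
seeds = find_seeds(query, subject, w=4)
hsps = {extend_ungapped(query, subject, i, j, w=4) for i, j in seeds}
print(len(seeds), hsps)
```

Here the query occurs inside the subject, so all eight seeds lie on the same diagonal and each extends to the same 11-residue segment; collecting the results in a set collapses them to a single hit.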
FASTA
FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J.
Lipman and William R. Pearson in 1985.
FASTA is pronounced "fast A", and stands for "FAST-All", because it works with any alphabet.
FASTA takes a given nucleotide or amino acid sequence and searches a corresponding sequence database, using local sequence alignment to find similar database sequences.
The FASTA program follows a largely heuristic method, which contributes to its high speed of execution.
Like BLAST, FASTA can be used to infer functional and evolutionary relationships between sequences as
well as help identify members of gene families.
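Much of FASTA's speed comes from a lookup step performed before any full alignment: identical short words of length ktup shared by the query and a database sequence are found through a lookup table, and the hits are grouped by diagonal (the offset between their positions), since a diagonal dense with hits points to a promising region. The sketch below shows only that counting step, under simplifying assumptions; FASTA then rescores and joins the best regions and finishes with a banded Smith-Waterman alignment, which is omitted here. The function name and toy sequences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def ktup_diagonal_counts(query: str, subject: str, ktup: int = 2) -> Counter:
    """Count identical k-tuples on each diagonal, where diagonal = subject_pos - query_pos."""
    # Lookup table: k-tuple -> list of start positions in the query.
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)
    diagonals: Counter = Counter()
    for j in range(len(subject) - ktup + 1):
        # .get() avoids inserting empty lists into the defaultdict.
        for i in positions.get(subject[j:j + ktup], []):
            diagonals[j - i] += 1
    return diagonals


query = "AVLKPRSTW"
subject = "GGAVLKPRSTWGGKP"
counts = ktup_diagonal_counts(query, subject, ktup=2)
best_diagonal, hits = counts.most_common(1)[0]
print(dict(counts))
print(best_diagonal, hits)
```

The shared stretch puts eight hits on diagonal 2, while the stray "KP" at the end of the subject adds a single hit on diagonal 10; only the dense diagonal would be worth aligning.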
Protein Structure Databases
In biology, a protein structure database is a database built around the various experimentally determined protein structures. The aim of most protein structure databases is to organize and annotate the protein structures, giving the biological community access to the experimental data in a useful way. Data included in protein structure databases often include three-dimensional coordinates as well as experimental information, such as unit cell dimensions and angles for structures determined by X-ray crystallography. Although most entries (in this case, either proteins or specific structure determinations of a protein) also contain sequence information, and some databases even provide means for performing sequence-based queries, the primary attribute of a structure database is structural information. Sequence databases, by contrast, focus on sequence information and contain no structural information for the majority of entries.
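To make "three-dimensional coordinates" concrete: in the legacy PDB file format each atom is stored as a fixed-width ATOM record, with the x, y and z coordinates (in angstroms) in columns 31-38, 39-46 and 47-54. The sketch below parses those columns and measures the distance between two alpha carbons. The two records are made up and are generated by a hypothetical helper (atom_line) only to keep the example self-contained; for real entries, a maintained parser such as Biopython's Bio.PDB, which also reads the PDBx/mmCIF format, is the safer choice.

```python
import math
from typing import List, Tuple

Atom = Tuple[str, str, str, int, Tuple[float, float, float]]


def parse_atom_records(pdb_text: str) -> List[Atom]:
    """Read (atom name, residue name, chain, residue number, (x, y, z)) from ATOM
    records, using the fixed column layout of the legacy PDB format."""
    atoms: List[Atom] = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()      # columns 13-16
        res_name = line[17:20].strip()  # columns 18-20
        chain = line[21]                # column 22
        res_seq = int(line[22:26])      # columns 23-26
        x = float(line[30:38])          # columns 31-38
        y = float(line[38:46])          # columns 39-46
        z = float(line[46:54])          # columns 47-54
        atoms.append((name, res_name, chain, res_seq, (x, y, z)))
    return atoms


def atom_line(serial: int, name: str, res_name: str, chain: str, res_seq: int,
              x: float, y: float, z: float) -> str:
    """Hypothetical helper: write a minimal ATOM record (coordinates only)."""
    return (f"ATOM  {serial:>5}  {name:<3} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Two made-up alpha-carbon positions, not taken from a real entry.
pdb_text = "\n".join([
    atom_line(1, "CA", "ALA", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, "CA", "GLY", "A", 2, 3.8, 0.0, 0.0),
])
atoms = parse_atom_records(pdb_text)
ca = [coords for name, _, _, _, coords in atoms if name == "CA"]
print(f"CA-CA distance: {math.dist(ca[0], ca[1]):.2f} angstroms")
```

The 3.8 angstrom spacing was chosen because it is roughly the distance between consecutive alpha carbons in a protein chain.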
SEQUENCE ALIGNMENT
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to
identify regions of similarity that may be a consequence of functional, structural, or evolutionary
relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are
typically represented as rows within a matrix.
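The "rows within a matrix" representation maps directly onto a list of equal-length strings, where "-" marks a gap and each column holds the residues aligned at that position. The Python sketch below (function names are illustrative) reads such an alignment column by column and computes percent identity for a pair of rows. Percent identity has several definitions in use; this sketch, as an assumption, divides by the number of columns in which neither sequence has a gap.

```python
from typing import List, Tuple


def columns(rows: List[str]) -> List[Tuple[str, ...]]:
    """View an alignment (one row per sequence) column by column."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    return list(zip(*rows))


def percent_identity(row_a: str, row_b: str) -> float:
    """Identical residues as a percentage of columns where neither row has a gap."""
    pairs = [(a, b) for a, b in columns([row_a, row_b]) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


alignment = [
    "GATTACA",
    "GA-TGCA",
]
for column in columns(alignment):
    print(" ".join(column))
print(f"identity: {percent_identity(*alignment):.1f}%")
```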
1. Pairwise alignment:
Pairwise sequence alignment methods are used to find the best-matching piecewise (local) or global alignments of two query sequences. Pairwise alignments can only be used between two sequences at a time, but they are efficient to calculate and are often used for methods that do not require extreme precision. The three primary methods of producing pairwise alignments are:
i. dot-matrix methods,
ii. dynamic programming, and
iii. word methods.
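Dynamic programming can be illustrated with the Needleman-Wunsch algorithm for global alignment. The sketch below fills a table of best prefix-alignment scores and traces back one optimal alignment. The scoring values (match +1, mismatch -1, linear gap -2) are arbitrary assumptions for illustration; practical protein aligners use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment of a and b by dynamic programming with a linear gap penalty."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j].
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                              score[i - 1][j] + gap,                           # gap in b
                              score[i][j - 1] + gap)                           # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, row_a, row_b = needleman_wunsch("GATTACA", "GATCA")
print(best)
print(row_a)
print(row_b)
```

Changing the recurrence so that no cell drops below zero, and tracing back from the highest-scoring cell instead of the bottom-right corner, turns this into the Smith-Waterman local alignment algorithm. When several alignments tie for the best score, the order of the traceback checks decides which one is returned.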
2. Multiple sequence alignment:
Multiple sequence alignment is an extension of pairwise alignment to incorporate more than
two sequences at a time. Multiple alignment methods try to align all of the sequences in a given
query set. Multiple alignments are often used in identifying conserved sequence regions across
a group of sequences hypothesized to be evolutionarily related. Such conserved sequence
motifs can be used in conjunction with structural and mechanistic information to locate the
catalytic active sites of enzymes.
The methods of producing multiple sequence alignments are:
i. Dynamic programming
ii. Progressive methods
iii. Iterative methods
iv. Motif finding
v. Techniques inspired by computer science
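Once a multiple alignment has been built, conserved regions of the kind described above can be located column by column. The sketch below uses the simplest possible measure, the fraction of sequences sharing the most common residue in each column, and also derives a majority consensus string. The alignment is a made-up toy example and the threshold parameter is an assumption; published tools often use entropy-based or substitution-aware conservation scores instead.

```python
from collections import Counter
from typing import List, Tuple


def column_profile(msa: List[str]) -> List[Tuple[str, float]]:
    """For each alignment column, return (most common symbol, fraction of rows with it)."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("aligned rows must all have the same length")
    profile = []
    for column in zip(*msa):
        symbol, count = Counter(column).most_common(1)[0]
        profile.append((symbol, count / len(msa)))
    return profile


def conserved_columns(msa: List[str], threshold: float = 1.0) -> List[int]:
    """Indices of non-gap columns whose most common residue reaches the threshold."""
    return [i for i, (symbol, fraction) in enumerate(column_profile(msa))
            if symbol != "-" and fraction >= threshold]


msa = [
    "MKHGE-CA",
    "MRHGEACA",
    "MKHSE-CA",
]
consensus = "".join(symbol for symbol, _ in column_profile(msa))
print(consensus)
print(conserved_columns(msa))
```

With the default threshold of 1.0 only fully identical columns are reported; lowering it admits columns where most, but not all, sequences agree.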