Download Advancing research through protein sequence analysis

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Circular dichroism wikipedia , lookup

Protein wikipedia , lookup

Protein folding wikipedia , lookup

Bimolecular fluorescence complementation wikipedia , lookup

Protein design wikipedia , lookup

Cyclol wikipedia , lookup

List of types of proteins wikipedia , lookup

Protein domain wikipedia , lookup

Rosetta@home wikipedia , lookup

Structural alignment wikipedia , lookup

Protein purification wikipedia , lookup

Protein mass spectrometry wikipedia , lookup

Western blot wikipedia , lookup

Proteomics wikipedia , lookup

Intrinsically disordered proteins wikipedia , lookup

Protein–protein interaction wikipedia , lookup

Nuclear magnetic resonance spectroscopy of proteins wikipedia , lookup

Homology modeling wikipedia , lookup

Protein structure prediction wikipedia , lookup

Transcript
IBM
Life Sciences
division
Accelerate
results
using algorithms to analyze protein molecules
Advancing research through protein
sequence analysis
As a comprehensive resource
for protein sequence analysis,
PredictProtein incorporates several
cutting-edge tools for protein structure
and function prediction, including:
•
Algorithms providing prediction of:
• Binding/active sites
• Sub-cellular localization
• Domain boundaries
• Fold recognition
• Inter-residue contacts
Highlights
■ Predict the most and least useful
The problem with traditional
• Regions lacking regular structure
proteomics research is that lab
• Secondary structure
experiments are expensive and take
• Solvent accessibility
too long to produce verified results,
• Transmembrane, globular and
gene product targets using
even from a limited range of inputs. In
computational tools
the race to find answers, is it possible
to accelerate the research process
■ Discover new leads through
coiled coil regions
• Disulfide-bonds
•
Algorithms providing annotation of:
to cover more ground, predict the
• Multiple sequence alignments
improved interpretation of
most—and least—promising targets
• PROSITE sequence motifs
existing data
more accurately, and thus invest time
• Low-complexity regions
and money more successfully?
■ Decrease expense and improve
Deep dive into the protein pool
efficiency over traditional
Life beyond lab data
The algorithm at the core of
research methods
PredictProtein helps get the most out
PredictProtein annotates protein
of available lab data by extracting the
secondary structure. The latest
more reliable predictions from high-
additions to the package include:
from high-throughput analysis
throughput methods and extrapolating
data
•
from high-accuracy results. The
localization of a protein and the
system requires only protein sequence
likelihood of a protein-DNA binding. By
■ Gauge meaningful annotations
LOCtree: Predicts the subcellular
to push the experiment into new
incorporating a hierarchical ontology
territory and to analyze data aspects
of localization classes modeled onto
to a previously unattainable degree of
biological processing pathways as
accuracy. PredictProtein analyzes the
well as using structural information
protein’s properties, observing the way
from a secondary prediction, LOCtree
that it folds and operates—its unique
can deduce the protein’s cellular
function in the cell.
compartment location. LOCtree was
1
IBM Life Sciences division
extremely successful at learning
residues through whole-sequence
evolutionary similarities among sub-
computational mutagenesis studies.
cellular localization classes and was
•
•
MD: A neural-network based disorder
significantly more accurate than other
prediction method that finds proteins
traditional networks at predicting sub-
that are in a constant state of disorder—
cellular localization.
that is, those that might be antibodies
ISIS: A prediction algorithm for
or be involved in an antibody
annotating regions of proteins that
process--and annotates the part of
are likely to bind to other proteins.
the protein that is in the state of flux.
ISIS identifies interacting residues
It uses various orthogonal sources of
from sequence and determines the
information as input, particularly the
likelihood for attaching a molecule
outputs of different disorder predictors,
at a certain amino acid. Although the
and other properties that are correlated
method is developed using transient
with protein disorder including solvent
proteins—protein interfaces from
accessibility and residue flexibility. MD
complexes of experimentally known
is more accurate than its constituents
3D structures—it never explicitly uses
and other top prediction methods.
3D information. Instead, it combines
•
predicted structural features with
Backed by IBM and Biosof
evolutionary information. The
In collaboration with Biosof, which
strongest predictions of the method
markets the PredictProtein software,
reached over 90% accuracy in a cross-
IBM® provides secure infrastructure
validation experiment.
combined with cutting-edge hardware
SNAP: A prediction method for
to run these highly resource-intensive
annotating the functional effects of
algorithms.
© Copyright IBM Corporation 2008
IBM Global Services
Route 100
Somers, NY 10589
USA
Produced in the United States of America
07-08
All Rights Reserved.
IBM, the IBM logo, ibm.com and System x
are trademarks or registered trademarks of
International Business Machines Corporation
in the United States, other countries, or both.
If these and other IBM trademarked terms
are marked on their first occurrence in this
information with a trademark symbol (® or
™), these symbols indicate U.S. registered
or common law trademarks owned by IBM
at the time this information was published.
Such trademarks may also be registered or
common law trademarks in other countries.
A current list of IBM trademarks is available
on the Web at “Copyright and trademark
information” at ibm.com/legal/copytrade.
shtml
Cell Broadband Engine is a trademark of
Sony Computer Entertainment, Inc. in the
United States, other countries, or both and is
used under license therefrom.
Other company, product and service names
may be trademarks or service marks of
others. References in this publication to IBM
products and services do not imply that
IBM intends to make them available in all
countries in which IBM operates.
The IBM home page on the Internet can be
found at ibm.com
non-synonymous SNPs. SNAP needs
only sequence information as input, but
IBM offers the broadest portfolio
benefits from functional and structural
of high-performance computing
annotations if available. In a cross-
solutions optimized for the needs of
validation test on more than 80,000
proteomics with its System x™ clusters
mutants, SNAP identified 80% of the
and Cell Broadband Engine™-based
non-neutral (loss and gain of function)
architectures. Additionally, IBM Power
substitutions at 77% accuracy and 76%
technology clustered in large, dense
of the neutral (functionally unaltered)
SMPs can handle the most challenging
substitutions at 80% accuracy.
problems facing researchers and
SNAP predictions can be used in
scientists. Power technology is also
annotating functionally important
available in compact blade form factors.
2