Instructions for Gram-LocEN Server
Instructions for Gram-LocEN Web-server
Shibiao Wan∗,‡ and Man-Wai Mak‡
email: [email protected], [email protected]
∗ Princeton University, NJ, USA
‡ The Hong Kong Polytechnic University, Hong Kong SAR, China
Nov 2016
Contents
1 Significance of Subcellular Localization Prediction
2 Major organelles in Gram-positive and Gram-negative bacterial cells
3 Specific information about Gram-LocEN
4 Some Notes
1 Significance of Subcellular Localization Prediction
Proteins must be transported to the correct organelles of a cell and folded into the correct
3-D structures to perform their functions properly. Therefore, knowing the subcellular
localization of a protein is one step towards understanding its functions. Proteins can exist
in different locations within a cell, and some proteins can even reside at two or more
subcellular locations simultaneously, or move between them. Proteins with multiple locations
play important roles in metabolic processes that take place in more than one compartment.
As an essential topic in proteomics research and molecular cell biology, protein subcellular
localization is critically important for protein function annotation, drug target discovery,
and drug design. Efficient and reliable computational methods have been developed to
complement biological experiments such as fluorescence microscopy imaging.
2 Major organelles in Gram-positive and Gram-negative bacterial cells
Most of the biological activities performed by proteins occur in organelles. An organelle
is a cellular component or subcellular location within a cell that has specific functions.
In Gram-positive bacteria, proteins are usually located in the cell membrane, cell wall,
cytoplasm, or extracellular space. The cell membrane is a biological membrane that
separates the intracellular environment from the extracellular space; its basic function is
to protect the cell from its surroundings. The cell wall provides structural integrity to the
cell. In bacteria, the primary function of the cell wall is to protect the cell against internal
turgor pressure, which is caused by the much higher concentrations of proteins and other
molecules inside the cell than in its external environment. The cytoplasm takes up most of
the cell volume, and most cellular activities, such as cell division and metabolic pathways,
occur there. The extracellular space is the fluid-filled space outside the plasma membrane.
In Gram-negative bacteria, proteins can be located in eight subcellular locations: the cell
inner membrane, cell outer membrane, cytoplasm, extracellular space, fimbrium, flagellum,
nucleus, and periplasm. The inner cell membrane is the selectively permeable membrane
that separates the cytoplasm from the periplasm in prokaryotes with two membranes. The
outer cell membrane is the selectively permeable membrane that separates the bacterial
periplasm from the cell's surroundings. The cytoplasm and extracellular space are defined
above. The fimbrium is a hair-like, non-flagellar, polymeric filamentous appendage that
extends from the bacterial or archaeal cell surface, such as type 1 pili, P-pili, type IV pili,
or curli. The flagellum is a long, hair-like cell-surface appendage. The flagellar apparatus
consists of the flagellar filament made of polymerized flagellin (the propeller), the hook-like
structure near the cell surface (the universal joint), and the basal body (the engine), which
is a rod and a system of rings embedded in the cell envelope. The nucleus is a
membrane-enclosed organelle containing most of the genetic material of a cell. Its main
function is to control the activities of the cell by regulating gene expression. The periplasm
is the space between the inner and outer membranes in Gram-negative bacteria.
3 Specific information about Gram-LocEN
Gram-LocEN is an interpretable multi-label predictor that uses unified features to yield
sparse and interpretable solutions for large-scale prediction of both single-label and
multi-label proteins of different species, including Gram-positive and Gram-negative
bacteria. Given a query protein sequence of a particular species, a set of GO terms is
retrieved from a newly created compact database, namely ProSeq-GO. The frequencies
of GO-term occurrences are used to form frequency vectors with a dimensionality of more
than eight thousand. One-vs-rest EN-based (elastic-net-based) classifiers then select far
fewer GO terms. Subsequently, the dimension-reduced feature vectors are classified by a
multi-label EN classifier. Based on the selected essential GO terms, users of Gram-LocEN
can not only determine where a protein resides but also understand why it is located there.
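The GO-term selection step can be illustrated with a minimal pure-Python sketch of one-vs-rest elastic-net selection. It assumes the common elastic-net objective (the form used by, e.g., scikit-learn's `ElasticNet`), ±1 one-vs-rest targets, no intercept (for brevity), and hypothetical values for `alpha` and `l1_ratio`. The function names and toy data are illustrative only; this is not Gram-LocEN's actual implementation or hyperparameter setting.

```python
from typing import List, Sequence, Set


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value towards zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X: Sequence[Sequence[float]], y: Sequence[float],
                alpha: float, l1_ratio: float, n_sweeps: int = 100) -> List[float]:
    """Cyclic coordinate descent for
    (1/(2n))*||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    No intercept term, for brevity (an assumption of this sketch)."""
    n, d = len(X), len(X[0])
    lam1 = alpha * l1_ratio
    lam2 = alpha * (1.0 - l1_ratio)
    w = [0.0] * d
    residual = [float(v) for v in y]  # r = y - Xw, with w = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero GO-term column can never be selected
            # correlation of column j with the residual that excludes feature j
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam1) / (col_sq[j] + lam2)
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def one_vs_rest_select(X, label_sets, classes, alpha=0.1, l1_ratio=0.5, tol=1e-8):
    """Union of GO-term indices with a nonzero weight in any one-vs-rest elastic net."""
    selected: Set[int] = set()
    for location in classes:
        # +1 if the protein is annotated with this location, -1 otherwise;
        # a multi-label protein is positive for every location it carries
        y = [1.0 if location in labels else -1.0 for labels in label_sets]
        w = elastic_net(X, y, alpha, l1_ratio)
        selected.update(j for j, wj in enumerate(w) if abs(wj) > tol)
    return sorted(selected)


# Hypothetical toy data: 4 proteins x 4 GO-term frequencies.
X_TOY = [[2, 0, 0, 0], [3, 0, 0, 0], [0, 2, 0, 0], [0, 3, 0, 0]]
LABELS_TOY = [{"cytoplasm"}, {"cytoplasm"}, {"cell wall"}, {"cell wall"}]
CLASSES_TOY = ["cell wall", "cytoplasm"]

if __name__ == "__main__":
    print(one_vs_rest_select(X_TOY, LABELS_TOY, CLASSES_TOY))
```

On the toy data only GO-term columns 0 and 1 receive nonzero weights, so the two all-zero columns are discarded; raising `alpha` (e.g. `alpha=10.0, l1_ratio=1.0`) shrinks every weight to zero and selects nothing. The reduced vectors would then be passed to the multi-label classifier.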
Similar to our previous studies [1, 2, 3, 4, 5], and unlike earlier predictors that use the
Swiss-Prot and GOA databases [6, 7, 8, 9, 10, 11, 12], Gram-LocEN uses two newly created
compact databases, namely ProSeq and ProSeq-GO, for GO information transfer. ProSeq is
a sequence database in which each amino acid sequence has at least one GO term annotated
to it. ProSeq-GO comprises the GO terms annotated to the protein sequences in ProSeq. An
important property of ProSeq and ProSeq-GO is that they are much smaller than the
Swiss-Prot and GOA databases, respectively.
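To make the GO information transfer more concrete, here is a minimal sketch of turning retrieved GO terms into a term-frequency vector. It assumes the homologs of the query have already been found in ProSeq (e.g. by a sequence-similarity search, not shown) and represents ProSeq-GO as an in-memory dictionary. `PROSEQ_GO`, the sequence IDs, and the function names are hypothetical, and a real vocabulary would contain thousands of GO terms rather than three.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical, tiny stand-in for ProSeq-GO: ProSeq sequence ID -> annotated GO terms.
PROSEQ_GO: Dict[str, List[str]] = {
    "seqA": ["GO:0005737", "GO:0005886"],
    "seqB": ["GO:0005737"],
    "seqC": ["GO:0005576"],
}


def build_vocabulary(go_db: Dict[str, List[str]]) -> List[str]:
    """Assign one vector position per distinct GO term (sorted for reproducibility)."""
    return sorted({term for terms in go_db.values() for term in terms})


def go_frequency_vector(homolog_ids: Iterable[str],
                        go_db: Dict[str, List[str]],
                        vocabulary: List[str]) -> List[int]:
    """Count how often each vocabulary GO term is annotated to the query's homologs."""
    counts: Counter = Counter()
    for seq_id in homolog_ids:
        counts.update(go_db.get(seq_id, []))  # unknown IDs contribute nothing
    return [counts[term] for term in vocabulary]


if __name__ == "__main__":
    vocab = build_vocabulary(PROSEQ_GO)
    print(vocab)
    print(go_frequency_vector(["seqA", "seqB"], PROSEQ_GO, vocab))
```

For homologs `seqA` and `seqB`, the term GO:0005737 occurs twice and GO:0005886 once, giving the vector `[0, 2, 1]` over the sorted vocabulary.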
Gram-LocEN is designed to predict the subcellular localization of proteins from
Gram-positive and Gram-negative bacteria.
• For Gram-positive bacterial proteins, 4 subcellular locations can be predicted: (1) cell
membrane; (2) cell wall; (3) cytoplasm; and (4) extracellular space. The Gram-positive
predictor is not designed for proteins from other organisms. If this option is selected for
a protein that is not a Gram-positive bacterial protein, the prediction results are arbitrary
and meaningless.
• For Gram-negative bacterial proteins, 8 subcellular locations can be predicted: (1) cell
inner membrane; (2) cell outer membrane; (3) cytoplasm; (4) extracellular space;
(5) fimbrium; (6) flagellum; (7) nucleus; and (8) periplasm. The Gram-negative predictor
is not designed for proteins from other organisms. If this option is selected for a protein
that is not a Gram-negative bacterial protein, the prediction results are arbitrary and
meaningless.
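The two location sets can be captured in a small lookup keyed by the species option. The sketch below also shows one common way to turn per-location classifier scores into a multi-label prediction. The threshold rule, the option names `gram-positive`/`gram-negative`, and `decode_locations` are assumptions for illustration, not the decision scheme Gram-LocEN actually uses.

```python
from typing import Dict, List, Sequence

# Location labels per species option, in the order listed above.
LOCATIONS: Dict[str, List[str]] = {
    "gram-positive": ["cell membrane", "cell wall", "cytoplasm", "extracellular space"],
    "gram-negative": [
        "cell inner membrane", "cell outer membrane", "cytoplasm", "extracellular space",
        "fimbrium", "flagellum", "nucleus", "periplasm",
    ],
}


def decode_locations(species: str, scores: Sequence[float],
                     threshold: float = 0.0) -> List[str]:
    """Hypothetical multi-label decision: keep every location whose score exceeds
    `threshold`; if none does, fall back to the single highest-scoring location."""
    if species not in LOCATIONS:
        raise ValueError(f"unsupported species option: {species!r}")
    labels = LOCATIONS[species]
    if len(scores) != len(labels):
        raise ValueError(f"expected {len(labels)} scores for {species}, got {len(scores)}")
    predicted = [label for label, score in zip(labels, scores) if score > threshold]
    if not predicted:
        best = max(range(len(scores)), key=lambda k: scores[k])
        predicted = [labels[best]]
    return predicted


if __name__ == "__main__":
    print(decode_locations("gram-positive", [0.8, -0.2, 0.3, -0.9]))
    print(decode_locations("gram-positive", [-0.5, -0.1, -0.7, -0.9]))
```

The first call returns `['cell membrane', 'cytoplasm']` (a multi-location prediction) and the second falls back to `['cell wall']`. Nothing in the scores reveals whether the query really belongs to the selected species, which is why choosing the correct option matters.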
4 Some Notes
Please pay attention to the following notes on the Gram-LocEN web-server:
1. Gram-LocEN makes predictions for protein amino acid sequences provided in FASTA
format.
2. Users can optionally provide an email address to receive the prediction results. If
no email address (or a wrongly formatted one) is provided, the results are shown on
the webpage. If a valid email address is provided, the prediction results are sent by
email; for users' convenience, the results can still be downloaded from the webpage.
3. Gram-LocEN is an interpretable predictor: the interpretations for a query protein
(or a list of proteins) are presented as a formatted table in an HTML file, which can
also be downloaded.
4. Because Gram-LocEN uses two new compact databases, ProSeq and ProSeq-GO,
instead of Swiss-Prot and GOA, it consumes much less memory and makes predictions
faster.
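Because the server expects FASTA input, a small local check before submitting a batch of queries may be helpful. The following is an illustrative parser, not the server's own validation: the function names are hypothetical, and treating only the 20 standard amino-acid letters as valid is an assumption of this sketch.

```python
from typing import List, Optional, Tuple

# The 20 standard amino acids; which extra letters (e.g. X, B, Z) the server
# accepts is not stated, so they are merely reported here.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (header, sequence) records; sequence lines are joined
    and upper-cased."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def nonstandard_residues(sequence: str) -> List[str]:
    """Return the sorted distinct letters outside the 20 standard amino acids."""
    return sorted(set(sequence) - STANDARD_AA)


if __name__ == "__main__":
    query = ">query1 example\nMKTAY\nIAKQR\n\n>query2\nmsxl\n"
    for name, seq in parse_fasta(query):
        print(name, seq, nonstandard_residues(seq))
```

Here `query1` yields `MKTAYIAKQR` with no flagged letters, while `query2` yields `MSXL` with `X` reported as non-standard.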
References
[1] S. Wan, M. W. Mak, and S. Y. Kung, “Sparse regressions for predicting and interpreting subcellular localization of multi-label proteins,” BMC Bioinformatics, vol.
17, no. 97, 2016.
[2] S. Wan, M. W. Mak, and S. Y. Kung, “Mem-ADSVM: a two-layer multi-label
predictor for identifying multi-functional types of membrane proteins,” Journal of
Theoretical Biology, vol. 398, pp. 32–42, 2016.
[3] S. Wan, M. W. Mak, and S. Y. Kung, “Mem-mEN: predicting multi-functional types
of membrane proteins by interpretable elastic nets,” IEEE/ACM Transactions on
Computational Biology and Bioinformatics, vol. 13, pp. 706–718, 2016.
[4] S. Wan, M. W. Mak, and S. Y. Kung, “Ensemble linear neighborhood propagation for predicting subchloroplast localization of multi-location proteins,” Journal of
Proteome Research, to appear, pp. 1–8, 2016.
[5] S. Wan, M. W. Mak, and S. Y. Kung, “R3P-Loc: A compact multi-label predictor
using ridge regression and random projection for protein subcellular localization,”
Journal of Theoretical Biology, vol. 360, pp. 34–45, 2014.
[6] K. C. Chou, Z. C. Wu, and X. Xiao, “iLoc-Hum: using the accumulation-label scale
to predict subcellular locations of human proteins with both single and multiple
sites,” Molecular BioSystems, vol. 8, pp. 629–641, 2012.
[7] S. Wan, M. W. Mak, and S. Y. Kung, “mGOASVM: Multi-label protein subcellular
localization based on gene ontology and support vector machines,” BMC Bioinformatics, vol. 13, pp. 290, 2012.
[8] S. Wan, M. W. Mak, and S. Y. Kung, “GOASVM: A subcellular location predictor by
incorporating term-frequency gene ontology into the general form of Chou’s pseudo-amino acid composition,” Journal of Theoretical Biology, vol. 323, pp. 40–48, 2013.
[9] S. Wan, M. W. Mak, and S. Y. Kung, “HybridGO-Loc: Mining hybrid features
on gene ontology for predicting subcellular localization of multi-location proteins,”
PLoS ONE, vol. 9, no. 3, pp. e89545, 2014.
[10] S. Wan, M. W. Mak, and S. Y. Kung, “mPLR-Loc: An adaptive decision multi-label
classifier based on penalized logistic regression for protein subcellular localization
prediction,” Analytical Biochemistry, vol. 473, pp. 14–27, 2015.
[11] S. Wan and M. W. Mak, Machine learning for protein subcellular localization prediction, De Gruyter, 2015.
[12] S. Wan and M. W. Mak, “Predicting subcellular localization of multi-location proteins by improving support vector machines with an adaptive-decision scheme,” International Journal of Machine Learning and Cybernetics, pp. 1–13.