Instructions for Gram-LocEN Web-server

Shibiao Wan∗,‡ and Man-Wai Mak‡
Email: [email protected], [email protected]
∗ Princeton University, NJ, USA
‡ The Hong Kong Polytechnic University, Hong Kong SAR, China
Nov 2016

Contents
1 Significance of Subcellular Localization Prediction
2 Major organelles in Gram-positive and Gram-negative bacterial cells
3 Specific information about Gram-LocEN
4 Some Notes

1 Significance of Subcellular Localization Prediction

Proteins must be transported to the correct organelles of a cell and folded into the correct 3-D structures to perform their functions properly. Therefore, knowing a protein's subcellular localization is one step towards understanding its functions. Proteins can exist in different locations within a cell, and some proteins can even simultaneously reside at, or move between, two or more different subcellular locations. Proteins with multiple locations play important roles in metabolic processes that take place in more than one compartment. As an essential topic in proteomics research and molecular cell biology, protein subcellular localization is critically important for protein function annotation, drug target discovery, and drug design. Efficient and reliable computational methods have been developed to assist biological experiments such as fluorescence microscopy imaging.

2 Major organelles in Gram-positive and Gram-negative bacterial cells

Most of the biological activities performed by proteins occur in organelles. An organelle is a cellular component or subcellular location within a cell that has specific functions.

For Gram-positive bacteria species, proteins are usually located in the cell membrane, cell wall, cytoplasm and extracellular space. The cell membrane is a biological membrane that separates the intracellular environment from the extracellular space.
Its basic function is to protect the cell from its surroundings. The cell wall provides structural integrity to the cell. In bacteria, the primary function of the cell wall is to protect the cell from internal turgor pressure caused by the much higher concentrations of proteins and other molecules inside the cell compared to its external environment. The cytoplasm takes up most of the cell volume and is where most cellular activities, such as cell division and metabolic pathways, occur. The extracellular space is the space outside the plasma membrane, which is occupied by fluid.

For Gram-negative bacteria species, proteins are located in eight subcellular locations: cell inner membrane, cell outer membrane, cytoplasm, extracellular space, fimbrium, flagellum, nucleus and periplasm. The inner cell membrane is the selectively permeable membrane which separates the cytoplasm from the periplasm in prokaryotes with two membranes. The outer cell membrane is the selectively permeable membrane which separates the bacterial periplasm from the cell's surroundings. The cytoplasm and extracellular space are defined above. The fimbrium is a hair-like, non-flagellar, polymeric filamentous appendage that extends from the bacterial or archaeal cell surface, such as type 1 pili, P-pili, type IV pili or curli. The flagellum is a long hair-like cell surface appendage. The flagellar apparatus consists of the flagellar filament made of polymerized flagellin (the propeller), the hook-like structure near the cell surface (the universal joint) and the basal body (the engine), which is a rod and a system of rings embedded in the cell envelope. The nucleus is a membrane-enclosed organelle containing most of the genetic material of a cell. Its main function is to control the activities of the cell by regulating gene expression. The periplasm is the space between the inner and outer membranes in Gram-negative bacteria.
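Because a protein may reside in more than one of these locations, a prediction is naturally a set of locations rather than a single one. The following is a minimal sketch of how the two location sets listed above could be written as fixed label spaces with 0/1 indicator vectors. The names GRAM_POSITIVE_LOCATIONS, GRAM_NEGATIVE_LOCATIONS, encode_locations and decode_locations are hypothetical and are not taken from the Gram-LocEN implementation.

```python
# Label spaces as listed in Section 2 (the order is an arbitrary choice for this sketch).
GRAM_POSITIVE_LOCATIONS = (
    "cell membrane", "cell wall", "cytoplasm", "extracellular space",
)
GRAM_NEGATIVE_LOCATIONS = (
    "cell inner membrane", "cell outer membrane", "cytoplasm",
    "extracellular space", "fimbrium", "flagellum", "nucleus", "periplasm",
)


def encode_locations(locations, label_space):
    """Turn a set of location names into a 0/1 indicator vector over label_space."""
    unknown = set(locations) - set(label_space)
    if unknown:
        raise ValueError(f"locations not in this label space: {sorted(unknown)}")
    return [1 if loc in locations else 0 for loc in label_space]


def decode_locations(indicator, label_space):
    """Inverse of encode_locations: list the locations whose flag is set."""
    return [loc for loc, flag in zip(label_space, indicator) if flag]


# A hypothetical Gram-negative protein found in both the cytoplasm and the periplasm.
print(encode_locations({"cytoplasm", "periplasm"}, GRAM_NEGATIVE_LOCATIONS))
# prints [0, 0, 1, 0, 0, 0, 0, 1]
```

Keeping the two label spaces separate reflects the fact that a location such as periplasm only exists in the Gram-negative set, so the sketch rejects it when the Gram-positive space is used.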

3 Specific information about Gram-LocEN

Gram-LocEN is an interpretable multi-label predictor which uses unified features to yield sparse and interpretable solutions for large-scale prediction of both single-label and multi-label proteins of different species, including Gram-positive bacteria and Gram-negative bacteria. Given a query protein sequence in a particular species, a set of GO terms is retrieved from a newly created compact database, namely ProSeq-GO. The frequencies of GO occurrences are used to formulate frequency vectors with a dimensionality of more than eight thousand. By using one-vs-rest EN-based (elastic net-based) classifiers, far fewer GO terms are selected. Subsequently, the dimension-reduced feature vectors are classified by a multi-label EN classifier. Based on the selected essential GO terms, the user of Gram-LocEN can not only determine where a protein resides, but also explain why it is located there.

Similar to our previous studies [1, 2, 3, 4, 5], instead of using the Swiss-Prot and GOA databases as previous predictors did [6, 7, 8, 9, 10, 11, 12], Gram-LocEN uses two newly created compact databases, namely ProSeq and ProSeq-GO, for GO information transfer. The ProSeq database is a sequence database in which each amino acid sequence has at least one GO term annotated to it. The ProSeq-GO database comprises the GO terms annotated to the protein sequences in the ProSeq database. An important property of the ProSeq and ProSeq-GO databases is that they are much smaller than the Swiss-Prot and GOA databases, respectively.

Gram-LocEN is designed to predict the subcellular localization of proteins from Gram-positive and Gram-negative bacteria.

• For Gram-positive bacterial proteins, 4 subcellular locations can be predicted: (1) cell membrane; (2) cell wall; (3) cytoplasm; and (4) extracellular space.
  The predictor is not designed to predict the subcellular localization of non-Gram-positive bacterial proteins when Gram-positive bacterial proteins are selected as the prediction target. The prediction results for non-Gram-positive bacterial proteins in this case are therefore arbitrary and meaningless.

• For Gram-negative bacterial proteins, 8 subcellular locations can be predicted: (1) cell inner membrane; (2) cell outer membrane; (3) cytoplasm; (4) extracellular space; (5) fimbrium; (6) flagellum; (7) nucleus; and (8) periplasm.
  The predictor is not designed to predict the subcellular localization of non-Gram-negative bacterial proteins when Gram-negative bacterial proteins are selected as the prediction target. The prediction results for non-Gram-negative bacterial proteins in this case are therefore arbitrary and meaningless.

4 Some Notes

Please note the following when using the Gram-LocEN web-server:

1. Gram-LocEN makes predictions for amino acid sequences of proteins submitted in FASTA format.
2. Users can optionally provide an email address to receive the prediction results. If no email address (or a wrongly formatted one) is provided, the results are shown on the webpage. If a valid email address is provided, the prediction results are sent via email. For users' convenience, the prediction results can still be downloaded from the webpage even if they are sent by email.
3. Gram-LocEN is an interpretable predictor. The interpretations for a particular query protein (or a list of proteins) are presented in a formatted table in an HTML file, which can also be downloaded.
4. Because Gram-LocEN uses two new compact databases, namely ProSeq and ProSeq-GO, instead of Swiss-Prot and GOA, it consumes much less memory and predicts faster.

References

[1] S. Wan, M. W. Mak, and S. Y. Kung, “Sparse regressions for predicting and interpreting subcellular localization of multi-label proteins,” BMC Bioinformatics, vol. 17, no. 97, 2016.
[2] S. Wan, M. W. Mak, and S. Y. Kung, “Mem-ADSVM: a two-layer multi-label predictor for identifying multi-functional types of membrane proteins,” Journal of Theoretical Biology, vol. 398, pp. 32–42, 2016.
[3] S. Wan, M. W. Mak, and S. Y. Kung, “Mem-mEN: predicting multi-functional types of membrane proteins by interpretable elastic nets,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, pp. 706–718, 2016.
[4] S. Wan, M. W. Mak, and S. Y. Kung, “Ensemble linear neighborhood propagation for predicting subchloroplast localization of multi-location proteins,” Journal of Proteome Research, to appear, pp. 1–8, 2016.
[5] S. Wan, M. W. Mak, and S. Y. Kung, “R3P-Loc: A compact multi-label predictor using ridge regression and random projection for protein subcellular localization,” Journal of Theoretical Biology, vol. 360, pp. 34–45, 2014.
[6] K. C. Chou, Z. C. Wu, and X. Xiao, “iLoc-Hum: using the accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites,” Molecular BioSystems, vol. 8, pp. 629–641, 2012.
[7] S. Wan, M. W. Mak, and S. Y. Kung, “mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines,” BMC Bioinformatics, vol. 13, pp. 290, 2012.
[8] S. Wan, M. W. Mak, and S. Y. Kung, “GOASVM: A subcellular location predictor by incorporating term-frequency gene ontology into the general form of Chou’s pseudo-amino acid composition,” Journal of Theoretical Biology, vol. 323, pp. 40–48, 2013.
[9] S. Wan, M. W. Mak, and S. Y. Kung, “HybridGO-Loc: Mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins,” PLoS ONE, vol. 9, no. 3, pp. e89545, 2014.
[10] S. Wan, M. W. Mak, and S. Y. Kung, “mPLR-Loc: An adaptive decision multi-label classifier based on penalized logistic regression for protein subcellular localization prediction,” Analytical Biochemistry, vol. 473, pp. 14–27, 2015.
[11] S. Wan and M. W. Mak, Machine learning for protein subcellular localization prediction, De Gruyter, 2015.
[12] S. Wan and M. W. Mak, “Predicting subcellular localization of multi-location proteins by improving support vector machines with an adaptive-decision scheme,” International Journal of Machine Learning and Cybernetics, pp. 1–13.
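
As a rough illustration of the pipeline described in Section 3 (GO frequency vectors, one-vs-rest elastic-net classifiers, and a multi-label decision), the sketch below shows one way these steps could fit together in plain Python. It is not the Gram-LocEN implementation. The function names, the coordinate-descent solver, the +1/-1 targets, the decision rule (all locations with a positive score, otherwise the single best one), the four-term vocabulary and the training examples are all assumptions made for illustration. The real system retrieves GO terms from ProSeq-GO and works with more than eight thousand dimensions.

```python
from collections import Counter


def go_frequency_vector(go_terms, vocabulary):
    """Count how often each GO term of the vocabulary occurs for one protein."""
    counts = Counter(go_terms)
    return [float(counts[term]) for term in vocabulary]


def soft_threshold(value, threshold):
    """Shrink value towards zero; small values become exactly zero (sparsity)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent (no intercept) for the objective
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||^2."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = [float(t) for t in y]  # equals y - Xw because w starts at zero
    for _ in range(n_iter):
        for j in range(d):
            column = [row[j] for row in X]
            z = sum(c * c for c in column) / n
            if z == 0.0:
                continue  # GO term never seen in training: its weight stays zero
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(column, residual)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1.0 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(column, residual)]
                w[j] = new_w
    return w


def train_one_vs_rest(X, label_sets, locations, **en_params):
    """One elastic-net model per location, with target +1 (has the label) or -1."""
    return {
        loc: fit_elastic_net(
            X, [1.0 if loc in labels else -1.0 for labels in label_sets], **en_params
        )
        for loc in locations
    }


def predict_locations(x, models):
    """Return every location with a positive score, or the best one if none is positive."""
    scores = {loc: sum(wj * xj for wj, xj in zip(w, x)) for loc, w in models.items()}
    positive = {loc for loc, score in scores.items() if score > 0.0}
    return positive or {max(scores, key=scores.get)}


# Toy data: GO identifiers are treated as opaque strings; the annotations are made up.
vocabulary = ["GO:0005886", "GO:0005737", "GO:0005576", "GO:0003677"]
training = [
    (["GO:0005886", "GO:0005886"], {"cell membrane"}),
    (["GO:0005737", "GO:0003677"], {"cytoplasm"}),
    (["GO:0005886", "GO:0005737"], {"cell membrane", "cytoplasm"}),
    (["GO:0005576"], {"extracellular space"}),
]
locations = ["cell membrane", "cell wall", "cytoplasm", "extracellular space"]

X = [go_frequency_vector(terms, vocabulary) for terms, _ in training]
models = train_one_vs_rest(X, [labels for _, labels in training], locations)

query = go_frequency_vector(["GO:0005886", "GO:0005737"], vocabulary)
print("predicted locations:", sorted(predict_locations(query, models)))

# Interpretation: the GO terms with non-zero weight are the ones a model relies on.
for loc, w in models.items():
    selected = [(term, round(wj, 3)) for term, wj in zip(vocabulary, w) if wj != 0.0]
    print(loc, "->", selected)
```

The soft-thresholding step is what sets the weights of uninformative GO terms exactly to zero, so the surviving terms can be read as the explanation for a predicted location. The squared-norm term keeps the solution stable when GO terms are correlated.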