Protein Ontology (PRO)
Ontology for semantic integration of heterogeneous biological data
• OBO Foundry establishes rules and best practices for creating a suite of orthogonal, interoperable reference ontologies
• PRO is one of the first set of OBO Foundry ontologies

Protein Ontology (PRO)
• PRO in the OBO (Open Biological and Biomedical Ontologies) Foundry
  – Reference ontology for proteins
  – Captures knowledge about proteins to model biology
• Three sub-ontologies
  – Ontology for Protein Evolution (ProEvo): captures protein classes reflecting the evolutionary relatedness of proteins
  – Ontology for Protein Forms (ProForm): captures the different protein forms of a given gene locus arising from genetic variation, alternative splicing, proteolytic cleavage, and PTMs => “proteoforms”
  – Ontology for Protein Complexes (ProComp): captures distinct complexes as they exist in different species and defines them through their component proteins
Reference: Natale DA, Arighi CN, Barker WC, Blake JA, Bult CJ, Caudy M, Drabkin HJ, D'Eustachio P, Evsikov AV, Huang H, Nchoutmboube J, Roberts NV, Smith B, Zhang J, Wu CH. (2011) The Protein Ontology: a structured representation of protein forms and complexes. Nucleic Acids Res. 39, D539-545.

PRO Framework
• PRO (ProForm, ProEvo, ProComp) is aligned with other OBO Foundry ontologies under the umbrella of the Basic Formal Ontology (BFO)
• PRO terms are defined and annotated using other ontologies and resources, through defined relations or mappings where appropriate

Why PRO
• Provides a formalization that supports precise annotation of specific protein classes, forms, and complexes, allowing accurate data mapping, integration, and analysis
• Allows relationships to be specified between PRO and other ontologies, such as GO, SO (Sequence Ontology), PSI-MOD, ChEBI, and CL (Cell Ontology)
• Provides stable unique identifiers for distinct protein types
• Provides a formal structure to support computer-based reasoning based on homology and shared protein attributes, including “ortho-isoform” and “ortho-PTM”
• Represents protein forms and complexes in a biological/network context
[Figure: TGF-β Signaling Pathway]

Hierarchical View: smad2 protein forms & complexes
[Figure: hierarchy with levels Orthologous-Gene, Organism-Gene, Ortho-Isoform, Organism-Isoform, Ortho-PTM, Organism-PTM, Ortho-Complex, Organism-Complex]

Network View: smad2 protein forms & complexes
• Connecting protein forms and complexes with annotation => Modeling biology
[Figure: network with node types Orthologous-Gene, Ortho-Isoform, Ortho-Complex, Ortho-PTM, Organism-Complex, Organism-PTM]

PRO Workflow
Data Sources
• Manual annotation (curator, collaborator, user): SourceForge tracker; Race-PRO
• Semi-automated processing of external databases (e.g., UniProtKB, Reactome, MouseCyc, EcoCyc); coverage of 12 reference genomes in progress
Integration with text mining system
• RLIMS-P/eFIP
Distribution Files
• Ontology (OBO)
• Annotation (PAF)
• Mappings (exact; is_a)

PRO Dissemination
• PRO Website (http://www.proconsortium.org): searching, browsing, visualization, download
• PRO Views
  – Entry view
  – Table summary
  – OBO stanza, OWL
  – Ontology hierarchy
  – Cytoscape network
• PRO Link: Persistent URL
  – http://purl.obolibrary.org/obo/PR_xxxxxxxxx
  – http://purl.obolibrary.org/obo/PR_UniProtKB_xxxxxx
• OBO Foundry (http://www.obofoundry.org/)
• NCBO BioPortal (http://bioportal.bioontology.org/)
• EBI OLS (Ontology Lookup Service)

PRO Communities
• GO ontology: interfaces of GO/PRO complexes; GO definition (e.g., GO:0005109)
• GO annotation: precise annotation of protein forms in PomBase
• ChEBI: move protein terms to PRO
• Dendritic Cell Ontology: define cell types based on +/- protein types [PMID:19243617]
• Annotation Ontology for annotating scientific documents on the web [PMID:21624159]
• Brucellosis Ontology (IDOBRU), Infectious Disease Ontology (IDO) [PMID:22041276]
• PubChem: map protein targets to PRO
• Concept annotation in the CRAFT corpus [PMC3476437]
• Neuroscience Information Framework (NIF)
• Reactome, MouseCyc, EcoCyc/BioCyc, Center for Molecular Immunology (Duke)
• Alzforum
• Salivaomics KB/SALO (Saliva Ontology): saliva biomarkers
• Clinical flow cytometry, immunology (IEDB), infectious disease community
• Top-Down Proteomics Consortium
• Protein Data Bank

PRO Consortium Team
• Protein Information Resource (PIR) [Georgetown U & U Delaware]: Cathy Wu, Cecilia Arighi, Darren Natale, John Garavelli, Karen Ross, Natalia Roberts, Hongzhan Huang, Jian Zhang, Julie Cowart, Jia Ren
• The Jackson Lab – Mouse Genome Informatics (MGI): Judith Blake, Carol Bult, Harold Drabkin, Alexei Evsikov
• University at Buffalo-SUNY: Barry Smith, Alan Ruttenberg, Alexander Diehl
• NYU School of Medicine – Reactome: Peter D’Eustachio, Veronica Shamovsky
• AlzForum: Elizabeth Wu
• Top Down Proteomics Consortium: Neil Kelleher
• Protein Data Bank: Helen Berman, John Westbrook
Grants: 1R01GM080646-01, 3R01GM080646-04S2, 5R01GM080646-06, 3R01GM080646-07S1

iPTMnet Framework
• Data Mining: iProClass database for molecular and omics data integration
• Text Mining: RLIMS-P/eFIP system for knowledge extraction from literature
• Ontology: PRO for knowledge representation of PTM forms/complexes
• Web portal linking data and analysis/visualization tools for scientific queries (http://proteininformationresource.org/iPTMnet)

iPTM Network
• Kinase-substrate relationships & phosphorylated protein-protein interactions => Phosphorylation network
• Coupled with functional annotation and biological context => Hypothesis generation and discovery

PIR Team
• Protein Science Team: Darren Natale, Cecilia Arighi, Kati Laiho, CR Vinayaka, Lai-Su Yeh, John Garavelli, Qinghua Wang, Karen Ross, Shruti Yerramalla
• Informatics Team: Hongzhan Huang, Baris Suzek, Peter McGarvey, Leslie Arminski, Natalia Roberts, Chuming Chen, Yongxing Chen, Jing Zhang, Yuqi Wang
• Students and Interns
• Consortium Collaborators
  – UniProt: Apweiler/Bateman, Xenarios and EBI/SIB Teams
  – PRO: Smith (SUNY), Blake, Bult (MGI), D'Eustachio (Reactome)
  – BioCreative: Hirschman (MITRE), Valencia, Krallinger (CNIO), Lu, Wilbur (NCBI), Cohen (U Colorado)

CBCB Team
• PIR Team at UD
• Center Scientists/Staff: Shawn Polson (Core Coordinator), Susan Phipps (Administrative), Manabu Torii, Oana Tudor, Jennifer Wyffels
• Graduate Students: Yifan Peng, Luis Lopez, TianChuan Du, Ruoyao Du, Gang Li, Zhiwen Li (CIS), Modupe Adetunji, Julie Cowart, Erin Crowgy, Jia Ren, Irem Celen, Mengxi Lv (BINF)
• Collaborators
  – UD: Vijay Shanker, Keith Decker, Ben Carterette, Li Liao, Jingyi Yu, Hagit Shatkay (CIS), Ulhas Naik (BIO), Carl Schmidt, Larry Cogburn (ANFS), Karl Steiner (ECE), Terry Papoutsakis, Maciek Antoniewicz, Kelvin Lee (ChemE)
  – DHSA (Delaware Health Sciences Alliance) [UD, Nemours, CCHC, TJU]
  – NECC (North East Cyberinfrastructure Consortium)
  – BiND (Bioinformatics Network of Delaware)

Funding Support
• NIH/NHGRI & NIGMS: UniProt: A Centralized Protein Sequence and Function Resource
• NIH/NIGMS: PRO: Protein Ontology in OBO Foundry for Integration of Biomedical Knowledge
• NIH/NLM: Linking Text Mining and Data Mining for Biomedical Knowledge Discovery
• NIH/NCRR & NIGMS: Delaware INBRE: North East Cyberinfrastructure Consortium (NECC)
• NSF/DBI: Integrative Bioinformatics for Knowledge Discovery of PTM Networks
• NSF/DBI: BioCreative Workshops: Linking Text Mining with Ontology and Systems Biology
• NSF/DGE: IGERT: Systems Biology of Cells in Engineered Environments (SBE2)
• DOE: Experimental Systems-Biology Approaches for Clostridia-Based Bioenergy Production
• Delaware Health Sciences Alliance (DHSA): Linking Genotype to Phenotype
• Delaware Bioscience Center for Advanced Technology (CAT): Bioinformatics Optimization for Recombinant Protein Expression for Vaccines and Therapeutics
• UniDel Foundation: UD Center for Bioinformatics and Computational Biology
• NIH/NCRR & NIGMS: Delaware INBRE – Bioinformatics Core
• NSF: Delaware EPSCoR – Bioinformatics Core
NIH: National Institutes of Health; NSF: National Science Foundation; DOE: Department of Energy
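The PRO Workflow and PRO Dissemination slides mention distribution as an OBO-format ontology, is_a relationships, and persistent URLs of the form http://purl.obolibrary.org/obo/PR_xxxxxxxxx. As a rough illustration of how those pieces fit together, the Python sketch below reads [Term] stanzas from OBO-format text, walks is_a links to list a term's ancestors, and builds a PURL for each term. It assumes the usual OBO Foundry convention that an identifier such as PR:000000001 maps to .../obo/PR_000000001. The identifiers PR:900000001 to PR:900000003 and their names are hypothetical placeholders, not real PRO terms. The parser reads only the id, name, and is_a tags, not the full OBO syntax, and it does not cover the PR_UniProtKB_ URL form.

```python
# Minimal sketch, not PRO's own tooling: parse [Term] stanzas from OBO text,
# follow is_a links upward, and build OBO Foundry-style PURLs.
# ASSUMPTION: the PR:90000000x terms below are HYPOTHETICAL examples.

OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"

EXAMPLE_OBO = """\
format-version: 1.2

[Term]
id: PR:900000001
name: hypothetical gene-level class

[Term]
id: PR:900000002
name: hypothetical isoform
is_a: PR:900000001 ! hypothetical gene-level class

[Term]
id: PR:900000003
name: hypothetical phosphorylated form
is_a: PR:900000002 ! hypothetical isoform
"""


def parse_obo_terms(text: str) -> dict[str, dict]:
    """Map each [Term] id to {"name": ..., "is_a": [parent ids]}."""
    terms: dict[str, dict] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Start of a new stanza; only [Term] stanzas are kept.
            current = {"name": None, "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, or non-Term stanzas
        tag, value = line.split(":", 1)
        value = value.split(" ! ", 1)[0].strip()  # drop trailing "! comment"
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
    return terms


def ancestors(terms: dict[str, dict], term_id: str) -> list[str]:
    """Return every is_a ancestor of term_id, depth-first, each listed once."""
    found: list[str] = []
    stack = list(terms[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.append(parent)
            # Parents not defined in this text are listed but not expanded.
            stack.extend(terms.get(parent, {}).get("is_a", []))
    return found


def purl(term_id: str) -> str:
    """Build a PURL, assuming the PREFIX:LOCAL -> PREFIX_LOCAL convention."""
    prefix, local = term_id.split(":", 1)
    return f"{OBO_PURL_BASE}{prefix}_{local}"


if __name__ == "__main__":
    terms = parse_obo_terms(EXAMPLE_OBO)
    for ancestor in ancestors(terms, "PR:900000003"):
        print(terms[ancestor]["name"], purl(ancestor))
```

Running the script prints the two ancestors of the hypothetical phosphorylated form, each with its PURL. This hand-rolled parser only illustrates the idea; working with the actual PRO release would call for an established OBO/OWL parsing library.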