Protein Ontology (PRO)
Ontology for semantic integration of heterogeneous biological data
• OBO Foundry establishes rules and best practices to create a
suite of orthogonal interoperable reference ontologies
• PRO: a member of the first set of OBO Foundry ontologies
Protein Ontology (PRO)


PRO in OBO (Open Biological and Biomedical Ontologies) Foundry
– Reference Ontology for Proteins
– Capture knowledge about proteins to model biology
Three sub-ontologies
– Ontology for Protein Evolution (ProEvo): Captures protein classes
reflecting evolutionary relatedness of proteins
– Ontology for Protein Forms (ProForm): Captures different protein forms
of a given gene locus from genetic variations, alternative splicing,
proteolytic cleavage, PTMs => “proteoforms”
– Ontology for Protein Complexes (ProComp): Captures distinct complexes
as they exist in different species and defines them through their component proteins
The Protein Ontology: a structured representation of protein forms and complexes
Natale DA, Arighi CN, Barker WC, Blake JA, Bult CJ, Caudy M, Drabkin HJ, D'Eustachio P, Evsikov
AV, Huang H, Nchoutmboube J, Roberts NV, Smith B, Zhang J, Wu CH. (2011)
Nucleic Acids Res. 39, D539-545
PRO Framework


PRO (ProForm, ProEvo, ProComp) is aligned
with other OBO Foundry ontologies under the
umbrella of the Basic Formal Ontology (BFO)
PRO terms are defined/annotated using other
ontologies and resources via definition of
relations or mappings when appropriate
Why PRO




Provides formalization to support precise annotation of specific protein
classes/forms/complexes, allowing accurate data mapping, integration, analysis
Allows specification of relationships between PRO and other ontologies, such as
GO, SO (Sequence Ontology), PSI-MOD, ChEBI, CL (Cell Ontology)
Provides stable unique identifiers for distinct protein types
Provides a formal structure to support computer-based reasoning based on
homology and shared protein attributes, including “ortho-isoform,” “ortho-PTM”
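The reasoning over shared protein attributes described above can be pictured with a small sketch. It assumes a toy is_a hierarchy built from the level names used in this deck. The identifiers (such as `toy:ortho-isoform`) are hypothetical placeholders rather than real PRO accessions, and the parent-child arrangement is an illustrative assumption, not PRO's actual structure.

```python
# Toy is_a hierarchy. Identifiers and arrangement are hypothetical,
# chosen only to illustrate traversal; they are not real PRO terms.
IS_A = {
    "toy:organism-ptm": ["toy:ortho-ptm"],
    "toy:ortho-ptm": ["toy:ortho-isoform"],
    "toy:organism-isoform": ["toy:ortho-isoform"],
    "toy:ortho-isoform": ["toy:orthologous-gene"],
    "toy:organism-gene": ["toy:orthologous-gene"],
    "toy:orthologous-gene": [],
}


def ancestors(term, is_a=IS_A):
    """Return every class reachable from `term` by following is_a links."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def shared_classes(a, b, is_a=IS_A):
    """Classes two terms have in common, counting each term as its own class."""
    return (ancestors(a, is_a) | {a}) & (ancestors(b, is_a) | {b})
```

With this toy data, `shared_classes("toy:organism-ptm", "toy:organism-isoform")` yields the ortho-isoform and orthologous-gene classes, which is the kind of inference a formal hierarchy makes mechanical.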
Figure: Representation of protein forms & complexes in biological/network context (TGF-β Signaling Pathway)
Hierarchical View: smad2 protein forms & complexes
• Orthologous-Gene
• Organism-Gene
• Ortho-Isoform
• Organism-Isoform
• Ortho-PTM
• Organism-PTM
• Ortho-Complex
• Organism-Complex
Network View: smad2 protein forms & complexes
Connecting protein forms and complexes with annotation => Modeling biology
Figure labels: Orthologous-Gene, Ortho-Isoform, Ortho-Complex, Ortho-PTM, Organism-Complex, Organism-PTM
PRO Workflow
• Data Sources
– Manual annotation (curator, collaborator, user): SourceForge tracker; RACE-PRO
– Semi-automated processing of external databases (e.g., UniProtKB, Reactome, MouseCyc, EcoCyc); coverage of 12 reference genomes in progress
• Integration with text mining system
– RLIMS-P/eFIP
• Distribution Files
– Ontology (OBO)
– Annotation (PAF)
– Mappings (exact; is_a)
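As a rough illustration of working with the OBO distribution file, the sketch below reads `[Term]` stanzas from OBO-format text into dictionaries. The sample stanzas, their names, and the placeholder identifiers are invented for the example and do not come from the PRO release. The parser is deliberately minimal: it ignores the file header and other stanza types, and it strips trailing `! comment` text from every tag.

```python
# Hypothetical sample text in OBO flat-file style; not taken from a PRO release.
SAMPLE_OBO = """\
[Term]
id: PR:xxxxxxxxx
name: example protein form
is_a: PR:yyyyyyyyy ! example parent class

[Term]
id: PR:yyyyyyyyy
name: example parent class
"""


def parse_obo_terms(text):
    """Parse [Term] stanzas into a list of dicts mapping tag -> list of values."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif line and current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            value = value.split(" ! ", 1)[0]  # drop a trailing "! comment"
            current.setdefault(tag, []).append(value)
    return terms
```

Calling `parse_obo_terms(SAMPLE_OBO)` returns two term records, the first of which carries one `is_a` parent. For real files a maintained OBO library would be a safer choice than this hand-rolled reader.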
PRO Dissemination
• PRO Website (http://www.proconsortium.org)
– Searching, browsing, visualization, download
• PRO Views
– Entry view
– Table summary
– OBO stanza, OWL
– Ontology hierarchy
– Cytoscape network
• PRO Link: Persistent URL
– http://purl.obolibrary.org/obo/PR_xxxxxxxxx
– http://purl.obolibrary.org/obo/PR_UniProtKB_xxxxxx
• OBO Foundry (http://www.obofoundry.org/)
• NCBO BioPortal (http://bioportal.bioontology.org/)
• EBI OLS (Ontology Lookup Service)
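The two persistent URL patterns listed above can be generated mechanically from an identifier. The following is a minimal sketch that follows those patterns exactly as shown on the slide. The function names `pro_purl` and `uniprot_purl` are hypothetical helpers, and the `x` placeholders stand in for real accessions, which are not given here.

```python
PURL_BASE = "http://purl.obolibrary.org/obo/"


def pro_purl(pro_id):
    """Turn an identifier of the form 'PR:xxxxxxxxx' into its persistent URL."""
    prefix, sep, local = pro_id.partition(":")
    if prefix != "PR" or not sep or not local:
        raise ValueError(f"not a PRO identifier: {pro_id!r}")
    return f"{PURL_BASE}PR_{local}"


def uniprot_purl(accession):
    """Persistent URL for a UniProtKB-derived term, per the slide's second pattern."""
    if not accession:
        raise ValueError("empty accession")
    return f"{PURL_BASE}PR_UniProtKB_{accession}"
```

For instance, `pro_purl("PR:xxxxxxxxx")` gives `http://purl.obolibrary.org/obo/PR_xxxxxxxxx`, while an identifier from another ontology such as `GO:0005109` is rejected with a `ValueError`.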
PRO Communities
• GO ontology: Interfaces of GO/PRO complexes; GO definition (e.g., GO:0005109)
• GO annotation: precise annotation of protein forms in PomBase
• ChEBI: Move protein terms to PRO
• Dendritic Cell Ontology: Define cell types based on +/- protein types [PMID:19243617]
• Annotation Ontology for annotating scientific documents on the web [PMID:21624159]
• Brucellosis Ontology (IDOBRU), Infectious Disease Ontology (IDO) [PMID:22041276]
• PubChem: Map protein targets to PRO
• Concept annotation in CRAFT corpus [PMC3476437]
• Neuroscience Information Framework (NIF)
• Reactome, MouseCyc, EcoCyc/BioCyc, Center for Molecular Immunology (Duke)
• Alzforum
• Salivaomics KB/SALO (Saliva Ontology): Saliva Biomarkers
• Clinical flow cytometry, immunology (IEDB), infectious disease community
• Top-Down Proteomics Consortium
• Protein Data Bank
PRO Consortium Team
Protein Information Resource (PIR) [Georgetown U & U Delaware]
Cathy Wu, Cecilia Arighi, Darren Natale, John Garavelli, Karen Ross, Natalia
Roberts, Hongzhan Huang, Jian Zhang, Julie Cowart, Jia Ren
The Jackson Lab – Mouse Genome Informatics (MGI)
Judith Blake, Carol Bult, Harold Drabkin, Alexei Evsikov
University at Buffalo-SUNY
Barry Smith, Alan Ruttenberg, Alexander Diehl
NYU School of Medicine – Reactome
Peter D’Eustachio, Veronica Shamovsky
Alzforum
Elizabeth Wu
Top-Down Proteomics Consortium
Neil Kelleher
Protein Data Bank
Helen Berman, John Westbrook
1R01GM080646-01
3R01GM080646-04S2
5R01GM080646-06
3R01GM080646-07S1
iPTMnet Framework




Data Mining: iProClass database for molecular and omics data integration
Text Mining: RLIMS-P/eFIP system for knowledge extraction from literature
Ontology: PRO for knowledge representation of PTM forms/complexes
Web portal linking data and analysis/visualization tools for scientific queries
(http://proteininformationresource.org/iPTMnet)
iPTM Network
Kinase-substrate relationship & phosphorylated protein-protein interaction
=> Phosphorylation Network
Coupled with functional annotation and biological context
=> Hypothesis generation and discovery
PIR Team
– Protein Science Team: Darren Natale, Cecilia Arighi, Kati Laiho, CR Vinayaka, Lai-Su
Yeh, John Garavelli, Qinghua Wang, Karen Ross, Shruti Yerramalla
– Informatics Team: Hongzhan Huang, Baris Suzek, Peter McGarvey, Leslie Arminski,
Natalia Roberts, Chuming Chen, Yongxing Chen, Jing Zhang, Yuqi Wang
– Students and Interns
• Consortium Collaborators
– UniProt: Apweiler/Bateman, Xenarios and EBI/SIB Teams
– PRO: Smith (SUNY), Blake, Bult (MGI), D'Eustachio (Reactome)
– BioCreative: Hirschman (MITRE), Valencia, Krallinger (CNIO),
Lu, Wilbur (NCBI), Cohen (U Colorado)
CBCB Team
– PIR Team at UD
– Center Scientists/Staff: Shawn Polson (Core Coordinator), Susan Phipps (Administrative),
Manabu Torii, Oana Tudor, Jennifer Wyffels
– Graduate Students: Yifan Peng, Luis Lopez, TianChuan Du, Ruoyao Du, Gang Li, Zhiwen Li
(CIS), Modupe Adetunji, Julie Cowart, Erin Crowgy, Jia Ren, Irem Celen, Mengxi Lv (BINF)
Collaborators
– UD: Vijay Shanker, Keith Decker, Ben Carterette, Li Liao, Jingyi Yu, Hagit Shatkay (CIS), Ulhas
Naik (BIO), Carl Schmidt, Larry Cogburn (ANFS), Karl Steiner (ECE), Terry Papoutsakis,
Maciek Antoniewicz, Kelvin Lee (ChemE)
– DHSA (Delaware Health Sciences Alliance) [UD, Nemours, CCHC, TJU]
– NECC (North East Cyberinfrastructure Consortium)
– BiND (Bioinformatics Network of Delaware)
Funding Support
• NIH/NHGRI & NIGMS: UniProt: A Centralized Protein Sequence and Function Resource
• NIH/NIGMS: PRO: Protein Ontology in OBO Foundry for Integration of Biomedical Knowledge
• NIH/NLM: Linking Text Mining and Data Mining for Biomedical Knowledge Discovery
• NIH/NCRR & NIGMS: Delaware INBRE: North East Cyberinfrastructure Consortium (NECC)
• NSF/DBI: Integrative Bioinformatics for Knowledge Discovery of PTM Networks
• NSF/DBI: BioCreative Workshops: Linking Text Mining with Ontology and Systems Biology
• NSF/DGE: IGERT: Systems Biology of Cells in Engineered Environments (SBE2)
• DOE: Experimental Systems-Biology Approaches for Clostridia-Based Bioenergy Production
• Delaware Health Sciences Alliance (DHSA): Linking Genotype to Phenotype
• Delaware Bioscience Center for Advanced Technology (CAT): Bioinformatics Optimization for Recombinant Protein Expression for Vaccines and Therapeutics
• UniDel Foundation: UD Center for Bioinformatics and Computational Biology
• NIH/NCRR & NIGMS: Delaware INBRE – Bioinformatics Core
• NSF: Delaware EPSCoR – Bioinformatics Core
NIH: National Institutes of Health; NSF: National Science Foundation; DOE: Department of Energy