Download Biological ontologies for human functional annotation and

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Endogenous retrovirus wikipedia , lookup

Biosynthesis wikipedia , lookup

DNA supercoil wikipedia , lookup

Signal transduction wikipedia , lookup

Gene regulatory network wikipedia , lookup

Magnesium transporter wikipedia , lookup

Paracrine signalling wikipedia , lookup

Deoxyribozyme wikipedia , lookup

Transcriptional regulation wikipedia , lookup

Western blot wikipedia , lookup

Expression vector wikipedia , lookup

Interactome wikipedia , lookup

Metalloprotein wikipedia , lookup

Silencer (genetics) wikipedia , lookup

Protein purification wikipedia , lookup

Biochemistry wikipedia , lookup

Proteolysis wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Gene expression wikipedia , lookup

Protein–protein interaction wikipedia , lookup

Point mutation wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

Two-hybrid screening wikipedia , lookup

Transcript
From GENIA to BioTop –
Towards a Top-Level Ontology for
Biology
Stefan Schulz, Elena Beisswanger, Udo Hahn, Joachim
Wermter, Anand Kumar, Holger Stenzhorn
Freiburg University, Jena University, Saarland University
Germany
Need for Ontologies in Biology
Raw Data
Experimental
Data
Need for Ontologies in Biology
Raw Data
Experimental
Data
MEDLINE
Literature
Collection
Need for Ontologies in Biology
UniProt
Raw Data
Experimental
Data
Yeast
FlyBase
Databases
Mouse
MEDLINE
Literature
Collection
Need for Ontologies in Biology
UniProt
Raw Data
Experimental
Data
Yeast
Ontologies
FlyBase
Databases
Mouse
MEDLINE
Literature
Collection
Flood of Semi-Formal Bio-Ontologies
OBO (Open Biomedical Ontologies)
• About 60 ‘ontologies’
-
•
•
Gene Ontology (GO)
Sequence Ontology (SO)
Cell Ontology (CO)
Chemical entities of biological interest (ChEBI)
Anatomy of model organisms
… and others
Ontologies unconnected, upper level missing
OBO foundry project
-
Standards for bio-medical ontology development
Ontology integration
Ontological Layers
Upper
Ontology /
Top Ontology
Domain Upper
Ontology /
Top Domain Ontology
Domain Ontology
Ontological Layers
Upper
Ontology /
Top Ontology
Domain Upper
Ontology /
Top Domain Ontology
Domain Ontology
OBO (GO, SO, ChEBI, …)
Ontological Layers
Upper
Ontology /
Top Ontology
•
Domain Upper
Ontology /
Top Domain Ontology
Transcription
• DNA-dependent transcription
• antisense RNA transcription
• mRNA transcription
• rRNA transcription
• tRNA transcription
• …
(from Gene Ontology)
Domain Ontology
OBO (GO, SO, ChEBI, …)
Ontological Layers
DOLCE
BFO
Upper
Ontology /
Top Ontology
Domain Upper
Ontology /
Top Domain Ontology
Domain Ontology
OBO (GO, SO, ChEBI, …)
Ontological Layers
DOLCE
BFO
•Entity
• Continuant
• Dependent Continuant
•
Realizable Entity
•
Function
•
Role
• Independent Continuant
•
Object
•
Object Aggregate
• Occurrent
(from BFO)
Upper
Ontology /
Top Ontology
Domain Upper
Ontology /
Top Domain Ontology
Domain Ontology
OBO (GO, SO, ChEBI, …)
Ontological Layers
DOLCE
BFO
Upper
Ontology /
Top Ontology
Domain Upper
Ontology /
Top Domain Ontology
Domain Ontology
OBO (GO, SO, ChEBI, …)
Ontological Layers
DOLCE
BFO
Upper
Ontology /
Top Ontology
Domain Upper
Ontology /
Top Domain Ontology
Domain Ontology
OBO (GO, SO, ChEBI, …)
Ontological Layers
DOLCE
BFO
Upper
Ontology /
Top Ontology
Biology Domain:
• Organism
• Body Part
• Cell
• Cell Component
• Tissue
• Protein
• Nucleic Acid
• DNA
• RNA
• Biological Function
• Biological Process
Domain Upper
Ontology /
Top Domain Ontology
Domain Ontology
OBO (GO, SO, ChEBI, …)
Ontological Layers
DOLCE
BFO
Upper
Ontology /
Top Ontology
Simple Bio Upper
Ontology
Domain Upper
Ontology /
Top Domain Ontology
Domain Ontology
OBO (GO, SO, ChEBI, …)
GFO-Bio
GENIA
Ontological Layers
DOLCE
BFO
BioTop
Upper
Ontology /
Top Ontology
Simple Bio Upper
Ontology
Domain Upper
Ontology /
Top Domain Ontology
Domain Ontology
OBO (GO, SO, ChEBI, …)
GFO-Bio
GENIA
GENIA Ontology
“The GENIA ontology is intended to be a formal
model of cell signaling reactions in human. It is to
be used as a basis of thesauri and semantic
dictionaries for natural language processing
applications […]
Another use of the GENIA ontology is to provide a
basis for integrated view of multiple databases”
•
•
•
•
Developed at Tsujii Laboratory, University Tokyo
Taxonomy, 48 classes
Verbal definitions (“scope notes”)
Purpose: Semantic annotation of biological papers
Annotation of Biological Entities
We have shown that <PROTEIN_MOLECULE> interleukin-1 (IL-1)
</PROTEIN_MOLECULE > and <PROTEIN_MOLECULE> IL-2
</PROTEIN_MOLECULE> control <DNA_DOMAIN> IL-2 receptor alpha (IL-2R
alpha) gene </DNA_DOMAIN> transcription in <CELL_LINE> CD4-CD8- murine Tlymphocyte precursors </CELL_LINE>. Here we map the <DNA_FAMILY> cisacting elements </DNA_FAMILY> that mediate <PROTEIN_FAMILY> interleukin
</PROTEIN_FAMILY> responsiveness of the <DNA_DOMAIN> mouse IL-2R
alpha gene </DNA_FAMILY> using a <CELL_LINE> thymic lymphoma-derived
hybridoma (PC60) </CELL_LINE>. The transcriptional response of the
<DNA_DOMAIN> IL-2R alpha gene </DNA_DOMAIN> to stimulation by
<PROTEIN_MOLECULE> IL-1 </PROTEIN_MOLECULE> +
<PROTEIN_MOLECULE> IL-2 </PROTEIN_MOLECULE> is biphasic.
<PROTEIN_MOLECULE> IL-1 </PROTEIN_MOLECULE> induces a rapid, protein
synthesis-independent appearance of <RNA_MOLECULE> IL-2R alpha mRNA
</RNA_MOLECULE> that is blocked by inhibitors of <PROTEIN_MOLECULE>
NF-kappa B </PROTEIN_MOLECULE> activation.
Annotation of Biological Entities
We have shown that <PROTEIN_MOLECULE> interleukin-1 (IL-1)
</PROTEIN_MOLECULE > and <PROTEIN_MOLECULE> IL-2
</PROTEIN_MOLECULE> control <DNA_DOMAIN> IL-2 receptor alpha (IL-2R
alpha) gene </DNA_DOMAIN> transcription in <CELL_LINE> CD4-CD8- murine Tlymphocyte precursors </CELL_LINE>. Here we map the <DNA_FAMILY> cisacting elements </DNA_FAMILY> that mediate <PROTEIN_FAMILY> interleukin
</PROTEIN_FAMILY> responsiveness of the <DNA_DOMAIN> mouse IL-2R
alpha gene </DNA_FAMILY> using a <CELL_LINE> thymic lymphoma-derived
hybridoma (PC60) </CELL_LINE>. The transcriptional response of the
<DNA_DOMAIN> IL-2R alpha gene </DNA_DOMAIN> to stimulation by
<PROTEIN_MOLECULE> IL-1 </PROTEIN_MOLECULE> +
<PROTEIN_MOLECULE> IL-2 </PROTEIN_MOLECULE> is biphasic.
<PROTEIN_MOLECULE> IL-1 </PROTEIN_MOLECULE> induces a rapid, protein
synthesis-independent appearance of <RNA_MOLECULE> IL-2R alpha mRNA
</RNA_MOLECULE> that is blocked by inhibitors of <PROTEIN_MOLECULE>
NF-kappa B </PROTEIN_MOLECULE> activation.
Annotation of Biological Entities
We have shown that <PROTEIN_MOLECULE> interleukin-1 (IL-1)
</PROTEIN_MOLECULE > and <PROTEIN_MOLECULE> IL-2
</PROTEIN_MOLECULE> control <DNA_DOMAIN> IL-2 receptor alpha (IL-2R
alpha) gene </DNA_DOMAIN> transcription in <CELL_LINE> CD4-CD8- murine Tlymphocyte precursors </CELL_LINE>. Here we map the <DNA_FAMILY> cisacting elements </DNA_FAMILY> that mediate <PROTEIN_FAMILY> interleukin
</PROTEIN_FAMILY> responsiveness of the <DNA_DOMAIN> mouse IL-2R
alpha gene </DNA_FAMILY> using a <CELL_LINE> thymic lymphoma-derived
hybridoma (PC60) </CELL_LINE>. The transcriptional response of the
<DNA_DOMAIN> IL-2R alpha gene </DNA_DOMAIN> to stimulation by
<PROTEIN_MOLECULE> IL-1 </PROTEIN_MOLECULE> +
<PROTEIN_MOLECULE> IL-2 </PROTEIN_MOLECULE> is biphasic.
<PROTEIN_MOLECULE> IL-1 </PROTEIN_MOLECULE> induces a rapid, protein
synthesis-independent appearance of <RNA_MOLECULE> IL-2R alpha mRNA
</RNA_MOLECULE> that is blocked by inhibitors of <PROTEIN_MOLECULE>
NF-kappa B </PROTEIN_MOLECULE> activation.
Annotation of Biological Entities
We have shown that <PROTEIN_MOLECULE> interleukin-1 (IL-1)
</PROTEIN_MOLECULE > and <PROTEIN_MOLECULE> IL-2
</PROTEIN_MOLECULE> control <DNA_DOMAIN> IL-2 receptor alpha (IL-2R
alpha) gene </DNA_DOMAIN> transcription in <CELL_LINE> CD4-CD8- murine Tlymphocyte precursors </CELL_LINE>. Here we map the <DNA_FAMILY> cisacting elements </DNA_FAMILY> that mediate <PROTEIN_FAMILY> interleukin
</PROTEIN_FAMILY> responsiveness of the <DNA_DOMAIN> mouse IL-2R
alpha gene </DNA_FAMILY> using a <CELL_LINE> thymic lymphoma-derived
hybridoma (PC60) </CELL_LINE>. The transcriptional response of the
<DNA_DOMAIN> IL-2R alpha gene </DNA_DOMAIN> to stimulation by
<PROTEIN_MOLECULE> IL-1 </PROTEIN_MOLECULE> +
<PROTEIN_MOLECULE> IL-2 </PROTEIN_MOLECULE> is biphasic.
<PROTEIN_MOLECULE> IL-1 </PROTEIN_MOLECULE> induces a rapid, protein
synthesis-independent appearance of <RNA_MOLECULE> IL-2R alpha mRNA
</RNA_MOLECULE> that is blocked by inhibitors of <PROTEIN_MOLECULE>
NF-kappa B </PROTEIN_MOLECULE> activation.
GENIA Ontology
http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/
corpus/GENIAontology.owl
GENIA Scope Notes
•
Amino Acid Monomer:
“An amino acid monomer, e.g. tyrosine, serine, tyr, ser”
•
DNA:
“DNAs include DNA groups, families, molecules,
domains, and regions”
False Ontological Parent
GENIA: Source
“Sources are biological
locations where
substances are found …”
•
‘Organism’, ‘Tissue’, …
are objects and not
primarily sources!
•
‘Source’ is a role …
Misconception of Type
GENIA: Cell Type
“A cell type, e.g. TLymphocyte, T-cell,
astrocyte, fibroblast”
•
Are the instances of
‘Cell Type’ classes ?
•
Otherwise rename the
class as ‘Cell’!
Non-Conformant Naming Policy
GENIA: Amino Acid, GENIA: Protein
“Proteins include
protein groups,
families, molecules,
complexes, and
substructures.”
“An amino acid
molecule or the
compounds that
consist of amino
acids.”
•
Names suggest
single molecules
•
Don’t work against
biologists’ intuition!
BioTop
• Top level ontology for biology
• Ontologically founded
• Formal semantics
• Use in bio text mining
OBO Relations Ontology
foundational
spatial
temporal
participation
OBO Relations Ontology
foundational
spatial
temporal
participation
BioTop Relations
BioTop Relations
•
partOf
•
properPartOf
•
locatedIn
•
derivesFrom
•
hasParticipant
BioTop Relations
•
partOf
•
properPartOf
•
locatedIn
•
derivesFrom
•
hasParticipant
•
hasFunction (functionOf)
BioTop Relations
grainOf (hasGrain)
componentOf (hasComponent)
•
partOf
•
properPartOf
•
locatedIn
•
derivesFrom
•
hasParticipant
•
hasFunction (functionOf)
Compound hasComponent
•
•
hasComponent non-transitive subrelation of hasPart
Example: Definition of Protein
Oxygen
Carbon
NH2
Protein
•
Arginine
Compounds are determined by their components
Collective hasGrain
•
•
hasGrain non-transitive subrelation of hasPart
Example: “Collective of Cell hasGrain only Cell”
•
Collectives can gain or lose grains without
changing their identity
Full Definitions
•
•
•
Classes defined by necessary and sufficient conditions
Rationales
- Precise understanding of meaning
- Empowering the classifier for automated validation
processes
Required introduction of new
Nucleotide
classes, e.g. Phosphate, Ribose
Base
Necessary conditions of class ‘Nucleotide’
Phosphate
Ribose
Rearranged Classes
GENIA
BioTop
properPartOf
componentOf
hasComponent
BioTop Results
• BioTop.owl:
-
143 non-taxonomic relation instances
(seven relation types)
128 classes
• BioTopGenia.owl:
-
Imports GENIA
Links BioTop classes to GENIA classes (if possible)
Ontology Integration
BioTop
OBO Ontologies
Biological Process ↔ Biological ProcessGene Ontology
Protein Function
↔ Molecular FunctionGene Ontology
Cell Component
↔ Cellular ComponentGene Ontology
Cell
↔ CellCell Ontology and CellFMA
Atom
↔ AtomsChEBI
Organic Compound ↔ Organic Molecular EntitiesChEBI
Tissue
↔ TissueFMA
DNA, RNA
↔ DNASequence Ontology, RNASequence Ontology
Protein
↔ ProteinSequence Ontology
Conclusions
•
•
•
GENIA Ontology as de facto standard for bio text
mining but with major shortcomings
BioTop as a new proposal for an upper-level ontology
for biology
- principled
- formalized
- enhanced
Coupling to OBO ontologies
BioTop:
www.ifomis.org/biotop
Current Projects on Bio Text Mining at Jena University:
BOOTStrep: www.bootstrep.eu
Bootstrapping ontologies and terminologies
StemNet: www.stemnet.de
A knowledge management system for stem cell biology