Part III. The OBO Foundry Project:
Towards Scientific Standards and
Principles-Based Coordination in
Biomedical Ontology Development
1
High quality shared ontologies
build communities
NIH and FDA are moving to consolidate ontology-based standards for the communication
and processing of biomedical data.
caBIG / NECTAR / BIRN / BRIDG ...
2
http://obo.sourceforge.net
3
http://www.geneontology.org/
4
5
6
7
The Methodology of Annotations
GO employs scientific curators, who use
experimental observations reported in the
biomedical literature to link gene products with
GO terms in annotations.
This gene product exercises this function, in this
part of the cell, leading to these biological
processes
8
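The annotation pattern on the previous slide ("this gene product exercises this function, in this part of the cell, leading to these biological processes") can be pictured as a small record type. What follows is a minimal sketch under stated assumptions: the `Annotation` fields, the aspect codes, the gene product "geneX" and the "PMID:hypothetical-…" sources are all illustrative and are not the GO annotation file format. Term labels are used in place of GO identifiers for readability.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed aspect codes, one per GO ontology.
ASPECTS = {"F": "molecular function", "C": "cellular component", "P": "biological process"}


@dataclass(frozen=True)
class Annotation:
    """One curator assertion linking a gene product to a GO term (hypothetical minimal record)."""
    gene_product: str  # hypothetical gene product name
    term: str          # GO term label (identifiers omitted for brevity)
    aspect: str        # one of the keys of ASPECTS
    source: str        # literature reference supporting the assertion

    def __post_init__(self):
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect {self.aspect!r}")


def describe(annotations, gene_product):
    """Render the 'function, location, processes' pattern for one gene product."""
    by_aspect = defaultdict(list)
    for a in annotations:
        if a.gene_product == gene_product:
            by_aspect[a.aspect].append(a.term)
    return (
        f"{gene_product} exercises {', '.join(by_aspect['F']) or 'unknown function'}, "
        f"in {', '.join(by_aspect['C']) or 'an unknown location'}, "
        f"leading to {', '.join(by_aspect['P']) or 'unknown processes'}"
    )


annotations = [
    Annotation("geneX", "protein kinase activity", "F", "PMID:hypothetical-1"),
    Annotation("geneX", "nucleus", "C", "PMID:hypothetical-1"),
    Annotation("geneX", "cell cycle", "P", "PMID:hypothetical-2"),
]
print(describe(annotations, "geneX"))
```

Each new paper a curator reads adds rows to this list; when no existing term fits, the ontology itself is extended, which is the cycle described on the next slide.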
The Methodology of Annotations
This process of annotating literature leads to
improvements and extensions of the ontology,
which in turn leads to better annotations
This institutes a virtuous cycle of improvement in
the quality and reach of both future annotations
and the ontology itself.
Annotations + ontology taken together yield
a slowly growing computer-interpretable
map of biological reality.
9
RECALL: Alignment of GO and Cell ontologies will
permit the generation of consistent and complete
definitions
GO + Cell type = New Definition
Cell type (CL):
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
New Definition:
Osteoblast differentiation: Processes whereby an
osteoprogenitor cell or a cranial neural crest cell
acquires the specialized features of an osteoblast, a
bone-forming cell which secretes extracellular matrix.
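The CL entry on this slide uses the OBO flat-file "tag: value" layout, so the cross-ontology definition can be assembled mechanically from it. This is a minimal sketch, not how GO or CL actually generate definitions: the parser handles only this one stanza, and the `LABELS` mapping from the two develops_from identifiers to cell names is an assumption taken from the wording of the new definition, to be checked against the Cell Ontology itself.

```python
import re

STANZA = '''id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375'''

# Assumed labels for the develops_from targets (verify against the Cell Ontology).
LABELS = {"CL:0000008": "cranial neural crest cell", "CL:0000375": "osteoprogenitor cell"}


def parse_stanza(text):
    """Parse 'tag: value' lines; repeated tags accumulate in a list."""
    tags = {}
    for line in text.splitlines():
        tag, _, value = line.partition(": ")
        tags.setdefault(tag, []).append(value)
    return tags


def develops_from(tags):
    """Targets of 'relationship: develops_from X' lines, in file order."""
    return [v.split()[1] for v in tags.get("relationship", []) if v.split()[0] == "develops_from"]


def article(word):
    return "an" if word[0].lower() in "aeiou" else "a"


def differentiation_definition(tags, labels):
    """Combine the CL genus, name and develops_from links into a GO-style process definition."""
    def_text = re.match(r'"(.*)"', tags["def"][0]).group(1)  # drop the [MESH:...] xref
    genus = def_text.split(". ")[0]                         # first sentence only
    genus = genus[0].lower() + genus[1:]
    precursors = " or ".join(labels[i] for i in develops_from(tags))
    name = tags["name"][0]
    return (f"Processes whereby {article(precursors)} {precursors} acquires the specialized "
            f"features of {article(name)} {name}, {genus}.")


print(differentiation_definition(parse_stanza(STANZA), LABELS))
```

The point of the sketch is the one made on the slide: once the GO process term can refer to the CL term and its develops_from links, the definition follows from the two ontologies rather than being written by hand.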
The OBO Foundry
11
The OBO Foundry
A subset of OBO ontologies, whose developers have
agreed in advance to accept a common set of
principles designed to ensure
intelligibility to biologists (curators, annotators, users)
formal robustness
stability
compatibility
interoperability
support for logic-based reasoning
12
The OBO Foundry
Custodians
• Michael Ashburner (Cambridge)
• Suzanna Lewis (Berkeley)
• Barry Smith (Buffalo/Saarbrücken)
13
The OBO Foundry
A collaborative experiment
participants have agreed in advance to a
growing set of principles specifying best
practices in ontology development
designed to guarantee interoperability of
ontologies from the very start
14
The OBO Foundry
The developers of each ontology commit to
its maintenance in light of scientific advance,
and to soliciting community feedback for its
improvement.
They commit to working with other Foundry
members to ensure that, for any particular
domain, there is community convergence on
a single reference ontology.
15
The OBO Foundry
Initial Candidate Members of the OBO Foundry
GO Gene Ontology
CL Cell Ontology
SO Sequence Ontology
ChEBI Chemical Ontology
PATO Phenotype (Quality) Ontology
FuGO Functional Genomics Investigation Ontology
FMA Foundational Model of Anatomy
RO Relation Ontology
16
The OBO Foundry
Under development
Disease Ontology
Mammalian Phenotype Ontology
OBO-UBO / Ontology of Biomedical Reality
Organism (Species) Ontology
Plant Trait Ontology
Protein Ontology
RnaO RNA Ontology
NCI Thesaurus ????
17
The OBO Foundry
Considered for development
Environment Ontology
Behavior Ontology
Biomedical Image Ontology
Clinical Trial Ontology
18
The OBO Foundry
CRITERIA
The ontology is open and available to be used by
all.
The developers of the ontology agree in advance
to collaborate with developers of other OBO
Foundry ontologies where domains overlap.
The ontology is in, or can be instantiated in, a
common formal language.
19
The OBO Foundry
CRITERIA
The ontology possesses a unique identifier
space within OBO.
The ontology provider has procedures for
identifying distinct successive versions.
The ontology includes textual definitions for
all terms.
20
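Two of the criteria above, a unique identifier space and a textual definition for every term, can be checked mechanically; the others (openness, collaboration, versioning procedures) cannot. The checker below is a minimal sketch: the `Term` record, `check_criteria`, and the sample terms (including the "XX" prefix and the omitted parent label) are hypothetical, not part of any Foundry tooling.

```python
from dataclasses import dataclass


@dataclass
class Term:
    term_id: str
    name: str
    definition: str = ""


def check_criteria(prefix, terms):
    """Report violations of two Foundry criteria: one ID space, and a definition for every term."""
    problems, seen = [], set()
    for t in terms:
        if not t.term_id.startswith(prefix + ":"):
            problems.append(f"{t.term_id}: outside the {prefix} identifier space")
        if t.term_id in seen:
            problems.append(f"{t.term_id}: duplicate identifier")
        seen.add(t.term_id)
        if not t.definition.strip():
            problems.append(f"{t.term_id}: missing textual definition")
    return problems


terms = [
    Term("CL:0000062", "osteoblast", "A bone-forming cell which secretes an extracellular matrix."),
    Term("CL:0000055", "parent term (label omitted)"),
    Term("XX:0000001", "stray term", "Defined, but in the wrong ID space."),
]
for problem in check_criteria("CL", terms):
    print(problem)
```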
The OBO Foundry
CRITERIA
The ontology has a clearly specified and
clearly delineated content.
The ontology is well-documented.
The ontology has a plurality of
independent users.
21
The OBO Foundry
CRITERIA
The ontology uses relations which are
unambiguously defined following the
pattern of definitions laid down in the
OBO Relation Ontology.*
*Genome Biology 2005, 6:R46
22
The OBO Foundry
CRITERIA
Further criteria will be added over time in
order to bring about a gradual
improvement in the quality of the
ontologies in the Foundry
23
The OBO Foundry
Goal
Alignment of OBO Foundry ontologies
through a common system of formally
defined relations
to enable reasoning both within and
across ontologies
24
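Reasoning across ontologies becomes possible once they share formally defined relations. As one example, part_of is defined in the Relation Ontology as transitive, so chains of part_of links can be followed even when the links come from different ontologies. This is a minimal sketch: the edges are hypothetical illustrations (not quoted from CL or the FMA), and only transitivity is used, whereas a real reasoner would apply the full relation definitions.

```python
# Illustrative part_of edges, each tagged with the ontology it would come from.
# These tuples are hypothetical examples, not quoted from CL or the FMA.
EDGES = [
    ("CL", "podocyte", "renal corpuscle"),
    ("FMA", "renal corpuscle", "nephron"),
    ("FMA", "nephron", "kidney"),
    ("FMA", "kidney", "urinary system"),
]


def build_graph(edges):
    """Merge edges from all source ontologies into one child -> parents map."""
    graph = {}
    for _source, child, parent in edges:
        graph.setdefault(child, set()).add(parent)
    return graph


def part_of_ancestors(graph, term):
    """Everything `term` is transitively part of; the seen-set also guards against cycles."""
    seen, stack = set(), [term]
    while stack:
        for parent in graph.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


graph = build_graph(EDGES)
print(sorted(part_of_ancestors(graph, "podocyte")))
```

The inference that a podocyte is part of the kidney uses one CL edge and two FMA edges; it only goes through because both sides mean the same thing by part_of.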
The OBO Foundry
A reference ontology
is analogous to a scientific theory; it seeks
to optimize representational adequacy to
its subject matter to the maximal degree
that is compatible with the constraints of
computational usefulness.
25
The OBO Foundry
An application ontology
is comparable to an engineering artifact
such as a software tool. It is constructed for
a specific practical purpose.
Examples:
National Cancer Institute Thesaurus
FuGO Functional Genomics
Investigation Ontology
26
The OBO Foundry
Reference Ontology vs.
Application Ontology
Currently, application ontologies are often built
afresh for each new task, commonly introducing
not only idiosyncrasies of format or logic, but also
simplifications or distortions of their subject matters.
To solve this problem, application ontology
development should always take place against the
background of a formally robust reference ontology
framework.
27
The OBO Foundry
Reference Ontologies promote reusability of data
if data schemas are formulated using terms
drawn from a reference ontology used by
others, then the data will be to this degree
more accessible to others
28
The OBO Foundry
Advantages of the methodology of
shared coherently defined ontologies
• promotes quality assurance (better coding)
• guarantees automatic reasoning across
ontologies and across data at different
granularities
• makes links between ontologies explicit
• yields direct connection to temporally indexed
instance data
29
The OBO Foundry
Advantages of the methodology of
shared coherently defined ontologies
We know that high-quality ontologies can
help in creating better mappings e.g.
between human and model organism
phenotypes
S Zhang, O Bodenreider, “Alignment of Multiple
Ontologies of Anatomy: Deriving Indirect Mappings from
Direct Mappings to a Reference Ontology”, AMIA 2005
30
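The idea behind deriving indirect mappings from direct mappings to a reference ontology can be shown in its simplest form: two terms from different ontologies are paired when both map to the same reference term. This is a minimal sketch covering only exact one-to-one equivalences; the cited paper's method is more elaborate, and the "mouse:", "human:" and "REF:" identifiers are hypothetical.

```python
from collections import defaultdict

# Hypothetical direct mappings from two source ontologies to one reference ontology.
MOUSE_TO_REF = {"mouse:heart": "REF:heart", "mouse:kidney": "REF:kidney"}
HUMAN_TO_REF = {"human:heart": "REF:heart", "human:liver": "REF:liver"}


def indirect_mappings(direct_a, direct_b):
    """Pair terms of ontologies A and B that map directly to the same reference term."""
    by_ref = defaultdict(list)
    for term_b, ref in direct_b.items():
        by_ref[ref].append(term_b)
    return sorted((term_a, term_b)
                  for term_a, ref in direct_a.items()
                  for term_b in by_ref.get(ref, []))


print(indirect_mappings(MOUSE_TO_REF, HUMAN_TO_REF))
```

Neither source ontology needs to know about the other; each is mapped once, to the reference ontology, and the pairwise mappings fall out.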
The OBO Foundry
Reference Ontologies
are already being used to create technology
to aid literature search
http://www.gopubmed.org/
31
The OBO Foundry
Goal:
to create a family of gold standard reference
ontologies upon which terminologies developed
for specific applications can draw
32
The OBO Foundry
Goal:
to introduce the scientific method into ontology
development:
– all Foundry ontologies must be constantly updated
in light of scientific advance
– all Foundry ontology developers must work with all
other Foundry ontology developers in a spirit of
scientific collaboration
33
The OBO Foundry
Goal:
to replace the current policy of ad hoc creation
of new database schemas by each clinical
research group by providing reference
ontologies in terms of which database schemas
can be defined
34
The OBO Foundry
Goal:
to introduce some of the features of scientific
peer review into biomedical ontology
development
35
The OBO Foundry
Goal:
to create controlled vocabularies for use
by clinical trial banks, clinical guidelines
bodies, scientific journals, ...
36
The OBO Foundry
Goal:
to create an evolving map-like
representation of the entire domain of
biological reality
37
GO’s three ontologies
[Figure, slides 38-45: GO’s three ontologies (biological process, molecular function, cellular component) expanded step by step into a grid of ontologies ordered by granular level. Entries include: molecular function (GO), molecular process, cellular physiology, organism-level physiology; ChEBI, Sequence, RNA ...; cell (types), cellular anatomy (GO), anatomy (fly, fish, human...), species. Later builds split the grid into normal (functionings) and pathological (malfunctionings), adding pathophysiology (disease), pathoanatomy (fly, fish, human ...), phenotype, and investigation (FuGO).]
Judith Blake:
“The use of bio-ontologies … ensures
consistency of data curation, supports
extensive data integration, and enables
robust exchange of information between
heterogeneous informatics systems. …
ontologies … formally define relationships
between the concepts.”
46
"Gene Ontology: Tool for the
Unification of Biology"
an ontology "comprises a set of well-defined
terms with well-defined relationships"
(Ashburner et al., 2000, p. 27)
47
Low Hanging Fruit
Ontologies should include only those
relational assertions which hold universally
(= have the ALL-SOME form)
Often, order will matter here:
We can include
adult transformation_of child
but not
child transforms_into adult
48
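The ALL-SOME form can be tested directly against instance data: a relation holds between types A and B only if every instance of A stands in that relation to some instance of B. This minimal sketch models people at life stages as hypothetical "name@stage" strings; it shows why "adult transformation_of child" passes while "child transforms_into adult" fails.

```python
def holds_all_some(instances_a, related, instances_b):
    """ALL-SOME test: every instance of A is related to at least one instance of B."""
    return all(any((a, b) in related for b in instances_b) for a in instances_a)


# Hypothetical instance data: three children, one of whom (carol) dies before adulthood.
children = {"alice@child", "bob@child", "carol@child"}
adults = {"alice@adult", "bob@adult"}
transformation_of = {("alice@adult", "alice@child"), ("bob@adult", "bob@child")}
transforms_into = {(child, adult) for adult, child in transformation_of}

print(holds_all_some(adults, transformation_of, children))  # every adult was once a child
print(holds_all_some(children, transforms_into, adults))    # carol never becomes an adult
```

The asymmetry comes from the quantifiers, not from the data: one child who dies is enough to break the second assertion, while every adult, without exception, was once a child.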
The Gene Ontology
49
GO’s three ontologies
biological
processes
molecular
functions
cellular
components
50
When a gene is identified
three types of questions need to be
addressed:
1. Where is it located in the cell?
2. What functions does it have on the
molecular level?
3. To what biological processes do these
functions contribute?
51
Three granularities:
Cellular (for components)
Molecular (for functions)
Organ + organism (for processes)
52
GO has cells
but it does not include terms for molecules
or organisms within any of its three
ontologies
except e.g. GO:0018995 host
=Def. Any organism in which another
organism spends part or all of its life cycle
53
Are the relations between functions and
processes a matter of granularity?
Molecular activities are the ‘building blocks’
of biological processes?
But they are not allowed to be represented
in GO as parts of biological processes
54
GO’s three ontologies
biological
processes
molecular
functions
cellular
components
55
What does “function” mean?
an entity has a biological function if and
only if it is part of an organism and has a
disposition to act reliably in such a way
as to contribute to the organism’s
survival
the function is this disposition
56
Improved version
an entity has a biological function if
and only if it is part of an organism and
has a disposition to act reliably in such
a way as to contribute to the
organism’s realization of the canonical
life plan for an organism of that type
57
This canonical life plan might
include
canonical embryological development
canonical growth
canonical reproduction
canonical aging
canonical death
58
The function of the heart is to
pump blood
Not every activity (process) in an organism
is the exercise of a function – there are
malfunctionings
side-effects (heart beating)
accidents (external interference)
background stochastic activity
59
Kidney
60
Nephron
61
Functional Segments
62
Functions
63
Functions
This is a screwdriver
This is a good screwdriver
This is a broken screwdriver
This is a heart
This is a healthy heart
This is an unhealthy heart
64
Functions are associated with certain
characteristic process shapes
Screwdriver: rotates and simultaneously
moves forward, transferring
torque from hand and arm to screw
Heart: performs a contracting movement
inwards and an expanding movement
outwards
65
Not functioning at all
leads to death, modulo
internal factors:
plasticity
redundancy (2 kidneys)
criticality of the system involved
external factors:
prosthesis (dialysis machines, oxygen tent)
special environments
assistance from other organisms
66
What clinical medicine is for
to eliminate malfunctioning by fixing broken
body parts
(or to prevent the appearance of
malfunctioning by intervening e.g. at the
molecular level)
67
Hypothesis: there are no ‘bad’ functions
It is not the function of an oncogene to
cause cancer
Oncogenes were in every case proto-oncogenes with functions of their own
They become oncogenes because of bad
(non-prototypical) environments
68
Is there an exception for molecular
functions?
Does this apply only to functions on
biological levels of granularity
(= levels of granularity coarser than the
molecule)?
If pathology is the deviation from (normal)
functioning, does it make sense to talk of a
pathological molecule?
(Pathologically functioning molecule vs.
pathologically structured molecule)
69
Is there an exception for molecular
functions?
A molecular function is a propensity of a gene
product instance to perform actions on the
molecular level of granularity.
Hypothesis 1: these actions must be reliably
such as to contribute to biological processes.
Hypothesis 2: these actions must be reliably
such as to contribute to the organism’s
realization of the canonical life plan for an
organism of that type.
70
The Gene Ontology
is a canonical ontology – it represents only
what is normal in the realm of molecular
functioning
71
The GO is a canonical
representation
“The Gene Ontology is a computational
representation of the ways in which gene
products normally function in the biological
realm”
Nucl. Acids Res. 2006: 34.
72
The FMA is a canonical
representation
It is a computational representation of types
and relations between types deduced from
the qualitative observations of the normal
human body, which have been refined and
sanctioned by successive generations of
anatomists and presented in textbooks and
atlases of structural anatomy.
73
The importance of pathways
(successive causality)
Each stage in the history of a disease
presupposes the earlier stages
Therefore need to reason across time,
tracking the order of events in time, using
relations such as derives_from,
transformation_of ...
Need pathway ontologies on every level of
granularity
74
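Because each stage in a disease history presupposes the earlier ones, derives_from links fix a temporal order, and an inconsistent history (two stages each deriving from the other) can be detected. This minimal sketch uses the standard-library `graphlib` module (Python 3.9+); the stage names are hypothetical placeholders, not terms from any disease ontology.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical disease history: each stage maps to the earlier stages it derives_from.
DERIVES_FROM = {
    "stage C": {"stage B"},
    "stage B": {"stage A"},
}


def temporal_order(derives_from):
    """Order stages so that every stage comes after the stages it presupposes."""
    try:
        return list(TopologicalSorter(derives_from).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes found in the cycle.
        raise ValueError(f"inconsistent history, stages derive from each other: {err.args[1]}") from err


print(temporal_order(DERIVES_FROM))
```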
The importance of granularity
(simultaneous causality)
Networks are continuants
At any given time there are networks existing in the
organism at different levels of granularity
Changes in one cause simultaneous changes in all
the others
(Compare Gay-Lussac’s law: at constant volume, a rise in temperature
causes a simultaneous increase in pressure)
75
The Granularity Gulf
most existing data-sources are of fixed,
single granularity
many (all?) clinical phenomena cross
granularities
Therefore need to reason across
granularities
76
Good ontologies require:
consistent use of terms, supported by
logically coherent (non-circular) definitions,
in equivalent human-readable and
computable formats
coherent shared treatment of relations to
allow cascading inference both within and
between ontologies
77
Three fundamental dichotomies
• continuants vs. occurrents
• dependent vs. independent
• types vs. instances
78
ONTOLOGIES ARE
REPRESENTATIONS OF
TYPES
aka kinds, universals,
categories, species, genera,
...
79
Continuants (aka endurants)
have continuous existence in time
preserve their identity through change
exist in toto whenever they exist at all
Occurrents (aka processes)
have temporal parts
unfold themselves in successive phases
exist only in their phases
80
You are a continuant
Your life is an occurrent
You are 3-dimensional
Your life is 4-dimensional
81
Dependent entities
require independent continuants as their
bearers
There is no run without a runner
There is no grin without a cat
82
Dependent vs. independent
continuants
Independent continuants (organisms, cells,
molecules, environments)
Dependent continuants (qualities, shapes,
roles, propensities, functions)
83
All occurrents are dependent entities
They are dependent on those independent
continuants which are their participants
(agents, patients, media ...)
84
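The two dependence claims above (no grin without a cat; no run without a runner) can be encoded as construction-time checks. This minimal sketch uses Python dataclasses as a stand-in for ontology axioms; the class names follow the slides, but enforcing them as runtime type checks is an illustrative assumption, not how a top-level ontology such as BFO is formalized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndependentContinuant:
    """Organism, cell, molecule, environment ..."""
    name: str


@dataclass(frozen=True)
class DependentContinuant:
    """Quality, shape, role, propensity, function ...: needs a bearer."""
    name: str
    bearer: IndependentContinuant

    def __post_init__(self):
        if not isinstance(self.bearer, IndependentContinuant):
            raise TypeError(f"{self.name!r} needs an independent continuant as bearer")


@dataclass(frozen=True)
class Occurrent:
    """Process: depends on the independent continuants that participate in it."""
    name: str
    participants: tuple

    def __post_init__(self):
        if not self.participants:
            raise ValueError(f"{self.name!r} has no participants")
        for p in self.participants:
            if not isinstance(p, IndependentContinuant):
                raise TypeError(f"{self.name!r}: {p!r} is not an independent continuant")


cat = IndependentContinuant("cat")
grin = DependentContinuant("grin", cat)                      # the grin has its cat
run = Occurrent("run", (IndependentContinuant("runner"),))   # the run has its runner
```

Attempting `Occurrent("run", ())` raises `ValueError`, and `DependentContinuant("grin", None)` raises `TypeError`.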
Top-Level Ontology
Continuant
   Independent Continuant
   Dependent Continuant
Occurrent (always dependent on one or more independent continuants)
= A representation of top-level types
Continuant
   Independent Continuant: cell component
   Dependent Continuant: molecular function
Occurrent: biological process
Top-Level Ontology
Continuant
   Independent Continuant
   Dependent Continuant
      Function
Occurrent
   Functioning
   Side-Effect, Stochastic Process, ...
Top-Level Ontology
Continuant
   Independent Continuant
   Dependent Continuant
      Quality
      Function
   Spatial Region
Occurrent
   Functioning
   Side-Effect, Stochastic Process, ...
instances (in space and time)
Smith B, Ceusters W, Kumar A, Rosse C. On Carcinomas and
Other Pathological Entities, Comp Functional Genomics, Apr.
2006
90
everything here is an
independent continuant
91
Functions, etc.
Some dependent continuants are
realizable
expression of a gene
application of a therapy
course of a disease
execution of an algorithm
realization of a protocol
92
Functions vs Functionings
the function of your heart = to pump blood
in your body
this function is realized in processes of
pumping blood
not all functions are realized (consider the
function of this sperm ...)
93
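The distinction between a function and its functionings can be modeled as an object that exists independently of the processes that realize it. This minimal sketch assumes a simple `Function` record with a list of realizing processes; the bearers, the descriptions and the time-stamped process label are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """A realizable dependent continuant: it exists whether or not it is ever exercised."""
    bearer: str
    description: str
    realizations: list = field(default_factory=list)  # processes (functionings) realizing it

    def realize(self, process):
        self.realizations.append(process)

    @property
    def is_realized(self):
        return bool(self.realizations)


heart_function = Function("heart", "to pump blood")
heart_function.realize("pumping blood, 08:00-08:01")  # hypothetical process label
sperm_function = Function("sperm cell", "to fertilize an egg")  # may never be realized

print(heart_function.is_realized, sperm_function.is_realized)
```

The sperm cell's function is present from the start even though, in the vast majority of cases, it is never realized.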
Concepts
Biomedical ontology integration will never be
achieved through integration of meanings or
concepts
The problem is precisely that different user
communities use different concepts
Concepts are in your head and will change as
your understanding changes
94