Common Anatomy Reference Ontology Workshop
What an Ontology is For
Barry Smith
University at Buffalo
http://ontology.buffalo.edu/smith
1
we are accumulating huge amounts of data
• how do we know what data we have?
• how do I know what data you have?
• how do we know what data we don’t have?
• how do we make different sorts of data combinable?
2
3
where in the cell ?
what kind of process ?
what kind of biological end ?
we need semantic annotation of data
4
how to create broad-coverage semantic
annotation systems for biomedicine?
Semantic Web, Moby, wikis, UMLS, etc.
• let a million flowers (weeds) bloom
• and create integration via post hoc mappings
5
an alternative
for science
develop high quality annotation resources in a
collaborative, community effort
create an evolutionary path towards improvement
on the basis of common prospective standards
based on science
6
for science
science works out from a validated core, and strives
to isolate and resolve inconsistencies as it extends
outwards
we need to create a validated core
including ontologies corresponding to
the basic biomedical sciences in this
core
low hanging fruit
7
[Figure: fragment of the Foundational Model of Anatomy (FMA) hierarchy around the pleural sac. Node labels: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Component, Organ Subdivision, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Sac, Pleural Cavity, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Mesothelium of Pleura, Tissue]
but we need more
for science
where do we find scientifically validated information
linking gene products and other entities represented
in biochemical databases to semantically
meaningful terms pertaining to disease, anatomy,
development, histology in different model
organisms?
9
what makes
GO so wildly
successful ?
10
The methodology of annotations
science base: trained experts curating peer-reviewed literature
create an evolving set of standardized descriptions
used to annotate the entities represented in the
major biochemical databases
and thereby to integrate these databases
11
this leads to improvements and
extensions of the ontology
which in turn leads to better annotations
which leads to further improvement in the quality and reach
of both future annotations and the ontology itself
RESULT: a slowly growing computer-interpretable map of
biological reality within which major databases are
automatically integrated in semantically searchable form
12
Five bangs for your GO buck
cross-species database integration
cross-granularity database integration
through links to the things which are of biomedical
relevance
semantic searchability links people to software
human curated science base creates de facto gold
standard (benchmark for comparison)
13
but now
need to create a de jure standard:
improve the quality of the GO
establish common rules governing best practices
for creating ontologies and for using these in
annotations
apply these rules to create a complete suite of
orthogonal interoperable biomedical reference
ontologies
14
First step (2003)
a shared portal for (so far) 58 ontologies
(low regimentation)
http://obo.sourceforge.net
15
Second step (2004)
reform efforts initiated, e.g. linking GO to other
OBO ontologies to ensure orthogonality
GO + Cell type = New Definition
Cell type (CL):
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
New definition (GO):
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
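The cell-type record on this slide is written as tag-value lines, so the cross-references it carries (is_a, develops_from) can be read out mechanically. A minimal sketch in Python, assuming only the simple "tag: value" layout seen here; parse_stanza is a hypothetical helper written for this illustration, not part of any OBO tool, and it ignores the escapes, comments and modifiers of the full OBO format:

```python
from collections import defaultdict

# Tag-value lines copied from the slide (the long def: line is left out).
STANZA = """\
id: CL:0000062
name: osteoblast
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
"""


def parse_stanza(text):
    """Map each tag to the list of values it takes in the stanza."""
    term = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so identifiers like CL:0000062 survive.
        tag, _, value = line.partition(":")
        term[tag.strip()].append(value.strip())
    return dict(term)


term = parse_stanza(STANZA)
# Keep only the target identifiers of develops_from relationships.
develops_from = [v.split()[1] for v in term["relationship"]
                 if v.startswith("develops_from ")]
print(term["id"][0], "develops_from", develops_from)
```

Once the links are explicit in this way, a GO definition that mentions a cell type can point at the corresponding Cell Ontology identifier instead of repeating free text.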
16
Third step (2006)
The OBO Foundry
http://obofoundry.org/
17
18
The OBO Foundry
a family of interoperable gold standard
biomedical reference ontologies to serve the
annotation of inter alia
• scientific literature
• model organism databases
• clinical trial data
19
A prospective standard
designed to guarantee interoperability of ontologies from
the very start (contrast to: post hoc mapping)
12 initial candidate OBO ontologies – focused primarily on
basic science domains
several being constructed ab initio by influential consortia
who have the authority to impose their use on large parts of
the relevant communities.
20
undergoing rigorous reform:
GO Gene Ontology
ChEBI Chemical Ontology
CL Cell Ontology
FMA Foundational Model of Anatomy
PaTO Phenotype Quality Ontology
SO Sequence Ontology
new:
CARO Common Anatomy Reference Ontology
CTO Clinical Trial Ontology
FuGO Functional Genomics Investigation Ontology
PrO Protein Ontology
RnaO RNA Ontology
RO Relation Ontology
21
GO Gene Ontology
ChEBI Chemical Ontology
CL Cell Ontology
FMA Foundational Model of Anatomy
PaTO Phenotype Quality Ontology
SO Sequence Ontology
new:
CARO Common Anatomy Reference Ontology
CTO Clinical Trial Ontology
FuGO Functional Genomics Investigation Ontology (to be absorbed in new Ontology of Biomedical Investigations (OBI))
PrO Protein Ontology
RnaO RNA Ontology
RO Relation Ontology
22
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland ???
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
GRANULARITY \ RELATION TO TIME | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
24
all OBO Foundry developers have agreed to a
common set of evolving principles reflecting best
practice in ontology development designed to
ensure
• tight connection to the biomedical basic sciences
• compatibility
• interoperability, common relations
• formal robustness
• support for logic-based reasoning
25
PRINCIPLES
• The ontology is OPEN and available to be used by all.
• The ontology is in, or can be instantiated in, a COMMON FORMAL LANGUAGE.
• The developers of the ontology agree in advance to COLLABORATE with developers of other OBO Foundry ontologies where domains overlap.
26
PRINCIPLES
• UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
• ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
27
orthogonality of ontologies implies
additivity of annotations
for science
if we annotate a database or body of literature with
one high-quality biomedical ontology, we should be
able to add annotations from a second such
ontology without conflicts
science aims for consistency
because science aims for correctness
28
PRINCIPLES
• IDENTIFIERS: The ontology possesses a unique identifier space within OBO.
• VERSIONING: The ontology provider has procedures for identifying distinct successive versions to ensure BACKWARDS COMPATIBILITY with annotation resources already in common use.
• The ontology includes TEXTUAL DEFINITIONS and where possible equivalent formal definitions of its terms.
29
PRINCIPLES
• CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content.
• DOCUMENTATION: The ontology is well-documented.
• USERS: The ontology has a plurality of independent users.
30
PRINCIPLES
• COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.*
* Smith et al., Genome Biology 2005, 6:R46
31
OBO Relation Ontology
Foundational
is_a
part_of
Spatial
located_in
contained_in
adjacent_to
Temporal
transformation_of
derives_from
preceded_by
Participation
has_participant
has_agent
32
IT WILL GET HARDER
Further principles will be added over time in light of
lessons learned
BUT NOT EVERYONE NEEDS TO JOIN
The Foundry is not seeking to serve as a check on
flexibility or creativity
33
GOALS
• CREDIT for high quality ontology development work
• KUDOS for early adopters of high quality ontologies / terminologies, e.g. in reporting clinical trial results
34
GOALS
• to introduce some of the features of SCIENTIFIC PEER REVIEW into biomedical ontology development
• to provide a FRAMEWORK OF RULES to counteract the current policy of ad hoc creation
• if data-schemas are formulated using a single ontology system in widespread use this supports DATA REUSABILITY
35
A dichotomy
universals (types, kinds, classes)
vs.
instances (particulars, individuals)
36
Catalog vs. inventory
A | 515287 | DC3300 Dust Collector Fan
B | 521683 | Gilmer Belt
C | 521682 | Motor Drive Belt
37
An ontology is a representation of universals
38
An ontology is a
representation of universals
We learn about universals by looking at
scientific texts – which describe what is
general in reality
39
universals
substance
  organism
    animal
      mammal
        cat
          siamese (leaf class)
      frog
instances
rule of single inheritance
no diamonds:
[Diagram: A is_a1 B together with A is_a2 C, i.e. one class with two parents]
41
problems with multiple inheritance
[Diagram: A is_a1 B and A is_a2 C]
‘is_a’ no longer univocal
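Departures from single inheritance can be found mechanically once the is_a links are listed as (child, parent) pairs. A minimal sketch, assuming that representation; multiple_parents is a hypothetical helper, and the second parent 'pet' is invented purely to produce a diamond-style case:

```python
from collections import defaultdict


def multiple_parents(is_a_edges):
    """Return every class that has more than one asserted is_a parent."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


# A single-inheritance fragment using the classes from the slides.
tree = [("siamese", "cat"), ("cat", "mammal"),
        ("mammal", "animal"), ("frog", "animal")]

# Hypothetical extra assertion: 'cat' now has two parents.
with_second_parent = tree + [("cat", "pet")]

print(multiple_parents(tree))                # {}
print(multiple_parents(with_second_parent))  # {'cat': ['mammal', 'pet']}
```

Each class reported this way is a candidate for review: the second is_a link may in fact stand for a role, a part relation or some other reading of 'is_a'.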
42
‘is_a’ is pressed into service to mean
a variety of different things
shortfalls from single inheritance are often
clues to incorrect entry of terms and
relations
the resulting ambiguities make the rules for
correct entry difficult to communicate to
human curators
43
is_a overloading
serves as obstacle to integration with
neighboring ontologies
The success of ontology alignment
depends crucially on the degree to
which basic ontological relations such
as is_a and part_of can be relied on as
having the same meanings in the
different ontologies to be aligned.
44
What single inheritance costs
In some respects harder to build ontologies
harder to use ontologies to find terms
Solutions: normalization, GUIs
Recommendation: if building from scratch
use single inheritance
45
What single inheritance brings
Coherent hierarchies
Modularity
Statistical representativeness
Jointly exhaustive pairwise disjoint
classification
Coherent methodology for definitions
46
Aristotelian definitions
When A is_a B, the definition of ‘A’ has the form:
an A =def. a B which ...
a human being =def. an animal which is rational
Each definition reflects the position in the hierarchy to
which a defined term belongs.
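The pattern "an A =def. a B which ..." can be generated directly from the hierarchy, which is one way to see that each definition reflects the position of its term. A minimal sketch, assuming each term stores its single parent and its differentia; the Term class and the article helper are hypothetical names for this illustration:

```python
from dataclasses import dataclass
from typing import Optional


def article(noun: str) -> str:
    """Pick 'a' or 'an' by first letter (a rough heuristic, enough here)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


@dataclass
class Term:
    name: str
    parent: Optional["Term"] = None  # the B in "A is_a B"
    differentia: str = ""            # what marks out the As among the Bs

    def definition(self) -> str:
        if self.parent is None:
            return f"{self.name}: root term, no parent to define it from"
        return (f"{article(self.name)} {self.name} =def. "
                f"{article(self.parent.name)} {self.parent.name} "
                f"which {self.differentia}")


animal = Term("animal")
human = Term("human being", parent=animal, differentia="is rational")
print(human.definition())  # a human being =def. an animal which is rational
```

Because a term has exactly one parent field, the sketch only works under single inheritance; with two parents there would be no unique "a B" to start the definition from.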
47
FMA Examples
Cell =def. an anatomical structure
which consists of cytoplasm
surrounded by a plasma membrane
with or without a cell nucleus
Plasma membrane =def. a cell part that
surrounds the cytoplasm
48
Canonical ontologies
49
The FMA is a canonical representation
of types and relations between types
deduced from the qualitative
observations of the normal human
body, which have been refined and
sanctioned by successive generations
of anatomists and presented in
textbooks and atlases of structural
anatomy.
50
The GO is a canonical representation
“The Gene Ontology is a computational
representation of the ways in which
gene products normally function in the
biological realm”
Nucl. Acids Res. 2006: 34.
51
The Gene Ontology
is a canonical ontology – it represents
only what is normal in the realm of
molecular functioning
52
The core of the OBO Foundry
consists of canonical ontologies
(pathoanatomy, pathophysiology will
come later)
53
Three canonical ontologies
CARO
+ Ontology of Functions
+ Ontology of Developmental Processes
(part of GO Biological Process ontology?)
54
A second fundamental dichotomy
• universals vs. instances
• continuants vs. occurrents
55
Continuants (aka endurants)
– have continuous existence in time
– preserve their identity through change
Occurrents (aka processes)
– have temporal parts
– unfold themselves in successive
phases
56
You are a continuant
Your life is an occurrent
You are 3-dimensional
Your life is 4-dimensional
57
A third fundamental dichotomy
• types vs. instances
• continuants vs. occurrents
• dependent vs. independent
58
Dependent entities
require independent continuants as their
bearers
There is no grin without a cat
There is no quality without a bearer
There is no disease without an organism
59
All occurrents are dependent entities
They are dependent on those
independent continuants which are
their participants (agents, patients,
media ...)
There is no run without a runner
60
Dependent vs. independent
continuants
Independent continuants (organisms,
cells, molecules, environments)
Dependent continuants (qualities,
shapes, roles, propensities,
functions)
61
Top-Level Ontology
Continuant
  Independent Continuant
  Dependent Continuant
Occurrent (always dependent on one or more independent continuants)
62
The GO Top-Level Ontology
Continuant
  Independent Continuant: cell component
  Dependent Continuant: molecular function
Occurrent: biological process
63
Functions vs Functionings
the function of your heart = to pump blood
in your body
this function is realized in processes of
pumping blood
not all functions are realized (consider the
function of this sperm ...)
not all processes are functionings
64
Continuant
  Independent Continuant
  Dependent Continuant (Function)
Occurrent
  Functioning
  Stochastic process
  Incidental by-product
65
The OBO Relation Ontology
66
Part_of as a relation between
universals
heart part_of human being ?
human heart part_of human being ?
human testis part_of human being ?
human being has_part human testis ?
67
two kinds of parthood
1. between instances:
Mary’s heart part_of Mary
this nucleus part_of this cell
2. between universals
human heart part_of human
cell nucleus part_of cell
68
Definition of part_of as a relation
between universals
A part_of B =Def. all instances of A are
instance-level parts of some instance of B
human testis part_of adult human being
but not
adult human being has_part human testis
69
Continuants
– have continuous existence in time
– preserve their identity through change
Occurrents (aka processes)
– have temporal parts
– unfold themselves in successive
phases
70
part_of (for processes)
A part_of B =def.
For all x, if x instance_of A then there is
some y, y instance_of B and x part_of y
where ‘part_of’ is the instance-level part
relation
EVERY A IS PART OF SOME B
71
part_of (for continuants)
A part_of B =def.
For all x, t if x instance_of A at t then there is some
y, y instance_of B at t and x part_of y at t
where ‘part_of’ is the instance-level part relation
ALL-SOME STRUCTURE
72
part_of (for continuants)
A part_of B =def.
if an A exists at t then it is part_of some B at t
where ‘part_of’ is the instance-level part
relation
73
has_part (for continuants)
A has_part B =def.
if an A exists at t then there is some B which is
part of it at t
74
human testis part_of adult human being
but not
adult human being has_part human testis
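The all-some reading can be checked against instance data, which also shows why part_of can hold where has_part fails. A minimal sketch, assuming time-indexed facts stored as tuples; the individuals 'adam', 'eve' and 'testis1' and both function names are hypothetical:

```python
# (instance, type, time) facts and (part, whole, time) facts.
instance_of = {
    ("testis1", "human testis", "t"),
    ("adam", "adult human being", "t"),
    ("eve", "adult human being", "t"),
}
part_of_inst = {("testis1", "adam", "t")}


def holds_part_of(A, B):
    """A part_of B: every A at t is part of some B at t."""
    return all(
        any((x, y, t) in part_of_inst and (y, B, t) in instance_of
            for (y, _, _) in instance_of)
        for (x, typ, t) in instance_of if typ == A
    )


def holds_has_part(A, B):
    """A has_part B: every A at t has some B as part at t."""
    return all(
        any((y, x, t) in part_of_inst and (y, B, t) in instance_of
            for (y, _, _) in instance_of)
        for (x, typ, t) in instance_of if typ == A
    )


print(holds_part_of("human testis", "adult human being"))   # True
print(holds_has_part("adult human being", "human testis"))  # False
```

The second check fails because one adult human being in the data has no testis as part. Note that both functions return True vacuously when A has no instances; a real checker would need to decide how to treat that case.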
75
is_a (for processes)
A is_a B =def
For all x, if x instance_of A then x
instance_of B
cell division is_a biological process
76
is_a (for continuants)
A is_a B =def
For all x, t if x instance_of A at t then x
instance_of B at t
abnormal cell is_a cell
adult human is_a human
but not: adult is_a child
77
A part_of B, B part_of C ...
The all-some structure of the definitions
in the OBO-RO allows
cascading of inferences
(i) within ontologies
(ii) between ontologies
(iii) between ontologies and EHR
repositories of instance-data
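The cascade "A part_of B, B part_of C, so A part_of C" amounts to taking a transitive closure over the asserted links. A minimal sketch, assuming class-level part_of assertions stored as pairs; the 'nucleolus' link is a hypothetical assertion added only to give the chain a second step:

```python
def transitive_closure(edges):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


part_of = {
    ("nucleolus", "cell nucleus"),  # hypothetical assertion
    ("cell nucleus", "cell"),       # from the slides
}
inferred = transitive_closure(part_of) - part_of
print(inferred)  # {('nucleolus', 'cell')}
```

The inference is only sound because every link has the same all-some form; if some part_of links meant "some A is part of some B", the chain would not go through.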
78
OBO Relation Ontology
Foundational
is_a
part_of
Spatial
located_in
contained_in
adjacent_to
Temporal
transformation_of
derives_from
preceded_by
Participation
has_participant
has_agent
79
David Sutherland
For any structure x, I should be able to answer
the questions:
1. What is x (what type of thing is it)?
2. Where is x (what is it part of)?
3. What subtypes of x are there?
4. What parts does x have?
80
For any structure x, I should be able to answer the
questions:
1. What type of thing is x? Say: A
2. What types of things are As part of?
3. What types of things are As located in?
4. What subtypes of As are there?
5. What parts do As have?
For continuants: located_in = either part_of or
contained_in
81
David
The first 2 questions are important for navigating the
ontology
The second 2 questions are crucial to grouping
curations
If we are looking for phenotypes that affect hands,
we need to be able to deduce that a hand has
fingers and so add finger phenotypes to our hand
phenotype list.
I think that having 'has_part' relationships in the
ontology is key to achieving this.
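The grouping described here can be sketched as a walk down has_part links that collects the annotations found along the way. A minimal sketch, assuming has_part is stored as a mapping from a structure to its parts; the 'fingernail' link and all phenotype labels are hypothetical examples, not curated data:

```python
# Hypothetical has_part links and phenotype annotations.
has_part = {"hand": ["finger"], "finger": ["fingernail"]}
annotations = {
    "hand": ["broad hand"],
    "finger": ["short finger"],
    "fingernail": ["brittle nail"],
}


def parts_closure(structure):
    """The structure itself plus everything reachable through has_part."""
    seen, stack = [], [structure]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.append(s)
            stack.extend(has_part.get(s, []))
    return seen


def phenotypes_affecting(structure):
    """Phenotypes annotated to the structure or to any of its parts."""
    return [p for s in parts_closure(structure)
            for p in annotations.get(s, [])]


print(phenotypes_affecting("hand"))
# ['broad hand', 'short finger', 'brittle nail']
```

The result is only as reliable as the has_part assertions themselves, which is why it matters whether "hand has_part finger" holds for every hand.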
82
[Figure: fragment of the Foundational Model of Anatomy (FMA) hierarchy around the pleural sac, with node labels from Anatomical Structure and Anatomical Space down to Pleural Sac, Pleural Cavity, Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Mesothelium of Pleura and Tissue]
human uterus part_of human being
but not
human body has_part human uterus
84
Temporal relations
85
transformation_of
[Diagram: the same instance c instantiates C at t and C1 at t1, along a time axis]
86
transformation_of
A transformation_of B =Def.
Every instance of A was at some earlier
time an instance of B
adult transformation_of child
heart transformation_of heart-precursor
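The definition can be tested against the recorded histories of individual instances. A minimal sketch, assuming each instance carries a list of (time, type) stages; the individuals and times are hypothetical:

```python
# Hypothetical histories: instance -> list of (time, type) stages.
history = {
    "mary": [(0, "child"), (20, "adult")],
    "john": [(0, "child"), (25, "adult")],
    "tim": [(0, "child")],
}


def transformation_of(A, B, history):
    """True if every instance of A was, at some earlier time, an instance of B."""
    for stages in history.values():
        for t, typ in stages:
            if typ == A and not any(t0 < t and typ0 == B
                                    for t0, typ0 in stages):
                return False
    return True


print(transformation_of("adult", "child", history))  # True
print(transformation_of("child", "adult", history))  # False
```

The same instance persists through the change of type, which is what separates transformation_of from derives_from, where a new continuant comes into being.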
87
[Diagram: c instantiates C at t and C1 at t1: embryological development]
88
derives_from
[Diagram: instances c (of C) and c' (of C') exist at t; instance c1 (of C1) exists at t1; time axis]
zygote derives_from ovum
zygote derives_from sperm
89
two continuants fuse to form a
new continuant
[Diagram: fusion. c (of C) and c' (of C') at t; c1 (of C1) at t1]
90
one initial continuant is replaced by two
successor continuants
[Diagram: fission. c (of C) at t; c1 (of C1) and c2 (of C2) at t1]
91
is a relation combining transformation with fusion
and fission (extended from the binary cases) what
we are seeking in order to capture development
via CARO?
should this relation be called ‘derives_from’ or
‘develops_from’
92
one continuant detaches itself from an
initial continuant, which itself continues
to exist
[Diagram: budding. c (of C) at t and at t1; c1 (of C1) at t1]
93
one continuant absorbs a second continuant
while itself continuing to exist
[Diagram: capture. c (of C) at t and at t1; c' (of C') at t]
94
Principle of low hanging fruit
often one of two reciprocal relations
(e.g. part_of and has_part) will hold
universally
human testis part_of human body
but not
human body has_part human testis
95
Principle of low hanging fruit
nucleus adjacent_to cytoplasm
but not
cytoplasm adjacent_to nucleus
96
Principle of low hanging fruit
seminal vesicle adjacent_to urinary
bladder
but not:
urinary bladder adjacent_to
seminal vesicle
97
Top-Level Categories in the FMA
anatomical entity
  physical anatomical entity
    material physical anatomical entity
      anatomical structure
      body substance
    non-material physical anatomical entity
      body space
      boundary
  non-physical anatomical entity
    anatomical attribute
    anatomical relationship
98
Fiat vs. bona fide boundaries
99
Layers of the body’s surface
kidshealth.org/kid/body/skin_noSW.html
100
101
102
www.enel.ucalgary.ca/People/Mintchev/stomach.htm
anatomical entity
  physical anatomical entity
    material physical anatomical entity
      anatomical structure
      body substance
    non-material physical anatomical entity
      body space
      boundary
        fiat boundary
        bona fide boundary
  non-physical anatomical entity
    anatomical attribute
    anatomical relationship
103
fiat vs. bona fide boundaries
[Figure labels: fiat boundary in anatomical space; physical boundary]
104
105
www.enel.ucalgary.ca/People/Mintchev/stomach.htm
varieties of fiat boundaries
in anatomical structures
in body spaces
spatial vs. temporal (stages, pathways)
in instances
in the realm of universals
106
varieties of fiat boundaries
in anatomical structures
107
modes of connection
–attached_to (muscle to bone)
–synapsed_with (nerve to nerve, nerve
to muscle)
–continuous_with (= share a fiat
boundary)
108
a continuous_with b
= a and b are continuant instances
which share a fiat boundary
This relation on the instance level is
always symmetric:
if x continuous_with y , then y
continuous_with x
109
continuous_with
(relation between universals)
A continuous_with B =Def.
for all x, if x instance_of A then there is
some y such that y instance_of B and x
continuous_with y
110
continuous_with as a relation between
universals is not symmetric
Consider lymph node and lymphatic vessel:
– Each lymph node is continuous with
some lymphatic vessel, but there are
lymphatic vessels (e.g. lymphs and
lymphatic trunks) which are not
continuous with any lymph nodes
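The contrast between the symmetric instance-level relation and the non-symmetric relation between universals can be reproduced on a toy data set. A minimal sketch, assuming instance-level continuity is stored as unordered pairs; the individuals 'node1', 'vessel1' and 'trunk1' are hypothetical:

```python
# Hypothetical instances and their types.
instance_of = {
    "node1": "lymph node",
    "vessel1": "lymphatic vessel",
    "trunk1": "lymphatic vessel",  # a vessel not continuous with any node
}
# Unordered pairs make the instance-level relation symmetric by construction.
continuous = {frozenset({"node1", "vessel1"})}


def continuous_with(x, y):
    return frozenset({x, y}) in continuous


def type_continuous_with(A, B):
    """Every instance of A is continuous with some instance of B."""
    return all(
        any(typ_y == B and continuous_with(x, y)
            for y, typ_y in instance_of.items())
        for x, typ_x in instance_of.items() if typ_x == A
    )


print(continuous_with("node1", "vessel1"), continuous_with("vessel1", "node1"))
print(type_continuous_with("lymph node", "lymphatic vessel"))  # True
print(type_continuous_with("lymphatic vessel", "lymph node"))  # False
```

A single vessel with no adjoining node is enough to break the relation in one direction, since the definition quantifies over all instances of the first universal.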
111
wherever we have fiat boundaries
there is a certain indeterminacy in the
location of the boundary
where does the arm begin?
where does the head begin?
where does abnormal curvature of the
spine begin
112
do regions have this indeterminacy?
113
An ontology is a
representation of types
Each term in an ontology should be a singular
common noun
Cell, lung, ...
refer to instances in reality by referring to the
types which they instantiate
114
Problems with mass nouns
‘blood’
‘menstrual fluid’
115
Problems with ‘tissue’
a specific portion of cells (instance)
a specific portion of cells (type)
a specific portion of cells of a certain type (instance)
a specific portion of cells of a certain type (type)
an arbitrary portion of cells x 4 as above
all of the above IN the body
all of the above in the form of samples OUTSIDE the
body
a type of tissue, e.g. mesothelial tissue
116
Brenda Tissue Ontology
contains statements like: arm is-a limb (here everything a
tissue)
Auckland Anatomy Ontology
Classifies tissue into: Connective tissue, Epithelial tissue,
Glandular tissue, Muscle tissue, Nervous tissue;
proceeding further down the hierarchy we find not tissues but
SimpleTubularGland, SimpleAcinarGland, etc.
EndocrineGland is asserted to have two ‘instances’
EndocrineGland (!), and FollicularEndocrineGland.
ConnectiveTissue has ‘instances’: Left Humerus, Right Tibia, ...
117
Recommendation
avoid ‘tissue’ and all mass nouns
hypothesis: in every case where one
would want to use ‘portion of tissue’ in a
scientific anatomy we mean:
maximally connected portion of tissue,
and there is already a common noun for
a corresponding type (?)
118
119
CM
application (current and future) of Foundry
principles in GO
stages
application aspects of multiple inheritance:
pre- and post-coordination
120