What developers
need to know about
ontologies?
Barry Smith
http://ontology.buffalo.edu/smith
1
HL7 Watch (blog)
Microsoft Healthvault:
Allergic Episode is_a Health Record Item,
Health Record Item =def. A single piece of
data in a health record that is accessible
through the HealthVault service
2
Problem of ensuring sensible
cooperation in a massively
interdisciplinary community
concept
type
instance
model
representation
data
3
What do these mean?
‘conceptual data model’
‘semantic knowledge model’
‘reference information model’
4
You’re interested
in which genes
control heart
muscle
development
17,536 results
5
time
Defense response
Immune response
Response to stimulus
Toll regulated genes
JAK-STAT regulated genes
Microarray data
shows changed
expression of
thousands of genes.
Puparial adhesion
Molting cycle
hemocyanin
Amino acid catabolism
Lipid metobolism
How will you spot
the patterns?
Peptidase activity
Protein catabolism
Immune response
Immune response
Toll regulated genes
[Figure: clustered heat map of microarray expression data, attacked vs. control samples; clustering tree: Pearson]
6
Lab / pathology data
EHR data
Clinical trial data
Family history data
Medical image data
Microarray data
Model organism data
Flow cytometry
Mass spec
Genotype / SNP data
How will you find the data you
need?
7
Human
Mouse
Rat
Fish
Yeast
E. coli
How will you compare
the data? How will you
integrate the data?
8
The GO Idea
GlyProt
MouseEcotope
sphingolipid
transporter
activity
DiabetInGene
GluChem
annotation using common ontologies
yields integration of databases
GlyProt
MouseEcotope
Holliday junction
helicase complex
DiabetInGene
GluChem
• For this to work, ontologies cannot be
allowed to proliferate uncontrollably
• Rather, we need as far as possible non-overlapping ontology modules (OBO
Foundry)
• How should we build these modules in
such a way as to ensure glue-ability of
annotations?
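The idea that annotation with common ontology terms yields integration of databases can be sketched in a few lines of Python. This is only an illustration: the database names and the two GO terms come from the slides, while the record identifiers and the helper `index_by_term` are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: record id -> ontology terms used to annotate it.
glyprot = {
    "P1": ["sphingolipid transporter activity"],
    "P2": ["Holliday junction helicase complex"],
}
mouse_ecotope = {
    "M7": ["sphingolipid transporter activity"],
}

def index_by_term(databases):
    """Group records from every database under the shared term that annotates them."""
    index = defaultdict(list)
    for db_name, records in databases.items():
        for record_id, terms in records.items():
            for term in terms:
                index[term].append((db_name, record_id))
    return dict(index)

index = index_by_term({"GlyProt": glyprot, "MouseEcotope": mouse_ecotope})
# Records from different databases meet under the same term.
print(index["sphingolipid transporter activity"])
```

The join only works because both databases use the same term for the same type; two overlapping ontologies with different labels would need a mapping step first.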
Glue-ability / integration
• rests on the existence of a common
benchmark called ‘reality’
• the ontologies we want to glue together are
representations of what exists in the world
• not of what exists in the heads of different
groups of people
12
two kinds of annotations
13
names of types
14
names of instances
15
First basic distinction
type vs. instance
(science text vs. diary)
(human being vs. Tom Cruise)
16
For ontologies
it is generalizations that are
important = ontologies are
about types, kinds, universals
17
Ontology
types
Instances
18
Ontology = A Representation of types
19
An ontology is a representation
of types
We learn about types in reality from looking
at the results of scientific experiments in the
form of scientific theories
experiments relate to what is particular
science describes what is general
20
Inventory vs. Catalog
Two kinds of representational
artifact
Very roughly:
Databases represent instances
Ontologies represent types
21
Catalog vs. inventory
A
B
C
515287
521683
521682
DC3300 Dust Collector Fan
Gilmer Belt
Motor Drive Belt
22
Catalog vs. inventory
23
Catalog of types/Types
24
types
object
organism
animal
mammal
cat
siamese
frog
instances
25
Ontologies are here
26
or here
27
ontologies represent general
structures in reality (leg)
28
Ontologies do not represent
concepts in people’s heads
29
They represent types in reality
30
which provide the benchmark for
integration
31
Entity =def
anything which exists, including things and
processes, functions and qualities, beliefs
and actions, documents and software
(Levels 1, 2 and 3)
32
what are the kinds of entity?
33
First basic distinction
type vs. instance
(science text vs. diary)
(human being vs. Tom Cruise)
34
Ontology
Types
Instances
35
Ontology = A Representation of types
36
Domain =def
a portion of reality that forms the subject-matter of a single science or technology or
mode of study or administrative practice ...;
proteomics
HIV
epidemiology
37
Representation =def
an image, idea, map, picture, name or
description ... of some entity or entities.
38
Ontologies are representational
artifacts
comparable to science texts
and subject to the same sorts
of constraints (including
need for update)
39
Representational units =def
terms, icons, alphanumeric identifiers ...
which refer, or are intended to refer, to
entities
and which are minimal (atoms)
40
Composite representation =def
representation
(1) built out of representational units
which
(2) form a structure that mirrors, or is intended
to mirror, the entities in some domain
41
Analogue representations
no representational
units, no ‘atoms’
42
The Periodic Table
Periodic Table
43
Class =def
a maximal collection of particulars
determined by a general term
(‘cell’, ‘electron’, but also ‘restaurant in
Palo Alto’, ‘Italian’)
the class A
= the collection of all particulars x for
which ‘x is A’ is true
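Read operationally, this definition is a filter over particulars. A minimal Python sketch, assuming a hypothetical table of particulars and treating ‘x is A’ as a boolean test (the names `class_of` and `terms_by_particular` are illustrative, not from the slides):

```python
from typing import Callable, Iterable

def class_of(particulars: Iterable[str], is_a: Callable[[str], bool]) -> set[str]:
    """Return the maximal collection of particulars x for which 'x is A' is true."""
    return {x for x in particulars if is_a(x)}

# Hypothetical particulars, each tagged with the general terms that apply to it.
terms_by_particular = {
    "cell_17": {"cell"},
    "electron_3": {"electron"},
    "cafe_42": {"restaurant in Palo Alto"},
}

# The class 'cell' = all particulars x for which 'x is a cell' is true.
cells = class_of(terms_by_particular, lambda x: "cell" in terms_by_particular[x])
```

The class is fixed entirely by the general term and by what exists; nothing is left out (it is maximal) and nothing is added by stipulation.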
44
types vs. their extensions
types
{a,b,c,...}
collections of particulars
45
Extension =def
The extension of a type A is the class:
instance of the type A
(it is the class of A’s instances)
(the class of all entities to which the term ‘A’
applies)
46
Problem
The same general term can be used to refer
both to types and to collections of
particulars. Consider:
HIV is an infectious retrovirus
HIV is spreading very rapidly through Asia
47
types vs. classes
types
{c,d,e,...}
classes
48
types vs. classes
types
~ defined classes
49
types vs. classes
types
e.g. populations, ...
50
Defined class =def
a class defined by a general term which
does not designate a type
the class of all diabetic patients in
Leipzig on 4 June 1952
51
OWL is a good representation of
defined classes
• sibling of Finnish spy
• member of Abba aged > 50 years
• pizza with > 4 different toppings
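A defined class of this kind is picked out by a condition on instances rather than by a type. The following is a plain-Python sketch of the third example, not OWL itself; the `Pizza` record and the sample menu are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pizza:
    name: str
    toppings: frozenset

def has_more_than_four_toppings(pizza: Pizza) -> bool:
    """Membership condition for the defined class 'pizza with > 4 different toppings'."""
    return len(pizza.toppings) > 4

# Hypothetical instances.
menu = [
    Pizza("margherita", frozenset({"tomato", "mozzarella", "basil"})),
    Pizza("everything", frozenset({"tomato", "mozzarella", "ham", "olive", "onion"})),
]

defined_class = [p.name for p in menu if has_more_than_four_toppings(p)]
```

Using a `frozenset` makes "different toppings" automatic, since repeated toppings collapse to one.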
52
Terminology =def.
a representational artifact whose
representational units are natural language
terms (with IDs, synonyms, comments, etc.)
which are intended to designate types
together with defined classes, with no
particular attention to composite
representations
53
types, classes, concepts
types
defined classes
‘concepts’
?
54
types < defined classes < ‘concepts’
‘concepts’ which do not correspond to
defined classes:
‘Surgical or other procedure not carried out
because of patient's decision’
‘Congenital absent nipple’
because they do not correspond to anything
55
Gene Ontology: The Very Top
cellular component
molecular function
biological process
56
Gene Ontology: The Very Top
continuant
  cellular component
  molecular function
occurrent
  biological process
57
BFO: The Very Top
continuant
  independent continuant
    cellular component
  dependent continuant
    molecular function
occurrent
  biological processes
58
Basic Formal Ontology
continuant
  independent continuant
    organism
  dependent continuant
occurrent
59
Basic Formal Ontology
continuant
  independent continuant
    anatomical structure
  dependent continuant
occurrent
60
Continuants
• continue to exist through time,
preserving their identity while
undergoing different sorts of changes
• independent continuants – objects,
things, ...
• dependent continuants – qualities,
attributes, shapes, potentialities ...
61
Qualities
temperature
blood pressure
mass
...
are continuants
they exist through time while
undergoing changes
62
Qualities
temperature / blood pressure / mass ...
are dimensions of variation within the
structure of the entity; a quality is
something which can change while its
bearer remains one and the same
63
A Chart representing how
John’s temperature changes
65
John’s temperature
the temperature he has throughout his
entire life, cycles through different
determinate temperatures from one
time to the next
John’s temperature is a physiology
variable which, in thus changing,
exerts an influence on other physiology
variables through time
66
BFO: The Very Top
continuant
  independent continuant
  dependent continuant
    quality
      temperature
occurrent
67
Blinding Flash of the Obvious
types: independent continuant > organism; dependent continuant > quality > temperature
instances: John; John’s temperature
68
Blinding Flash of the Obvious
types: independent continuant > organism; dependent continuant > quality > temperature
instances: John; John’s temperature
69
Blinding Flash of the Obvious
types: temperature inheres_in organism
instances: John’s temperature inheres_in John
70
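types: temperature, with determinate types 37ºC, 37.1ºC, 37.2ºC, 37.3ºC, 37.4ºC, 37.5ºC
instances: John’s temperature
John’s temperature instantiates 37ºC at t1
John’s temperature instantiates 37.1ºC at t2
John’s temperature instantiates 37.2ºC at t3
John’s temperature instantiates 37.3ºC at t4
John’s temperature instantiates 37.4ºC at t5
John’s temperature instantiates 37.5ºC at t6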
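The point of this slide is that one and the same instance instantiates different determinate types at different times. A small Python sketch of that time-indexed instantiation, using the values from the slide; the class name `QualityInstance` and its fields are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class QualityInstance:
    """One particular quality that keeps its identity while its value changes."""
    label: str
    determinable: str                                  # type instantiated at all times
    determinates: dict = field(default_factory=dict)   # time -> determinate type

    def instantiates_at(self, t: str) -> str:
        """Return the determinate type this instance instantiates at time t."""
        return self.determinates[t]

johns_temperature = QualityInstance(
    label="John's temperature",
    determinable="temperature",
    determinates={
        "t1": "37ºC", "t2": "37.1ºC", "t3": "37.2ºC",
        "t4": "37.3ºC", "t5": "37.4ºC", "t6": "37.5ºC",
    },
)
```

There is a single `johns_temperature` object throughout; only the lookup by time varies, which mirrors the claim that a quality is a continuant.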
71
types: human, with subtypes embryo, fetus, neonate, infant, child, adult
instances: John
John instantiates embryo at t1
John instantiates fetus at t2
John instantiates neonate at t3
John instantiates infant at t4
John instantiates child at t5
John instantiates adult at t6
72
• lower level of types does not ‘carry
identity’ in OntoClean terms
• are threshold divisions (hence we do
not have sharp boundaries, and we
have a certain degree of choice, e.g. in
how many subtypes to distinguish,
though not in their ordering)
73
types: independent continuant > organism; dependent continuant > quality > temperature
instances: John; John’s temperature
74
types: independent continuant > organism; dependent continuant > quality > temperature; occurrent > process > course of temperature changes
instances: John; John’s temperature; John’s temperature history
75
types: independent continuant > organism; dependent continuant > quality > temperature; occurrent > process > life of an organism
instances: John; John’s temperature; John’s life
76
BFO/GO: The Very Top
continuant
  independent continuant
    cellular component
  dependent continuant
    molecular function
occurrent
  biological processes
77
BFO: The Very Top
continuant
  independent continuant
  dependent continuant
    quality
    function
    role
    disposition
occurrent
78
Function
- of liver: to store glycogen
- of birth canal: to enable transport
- of eye: to see
- of mitochondrion: to produce ATP
not optional; reflection of physical
makeup of bearer; can malfunction
79
Role
optional:
exists because the bearer is in
some special natural, social, or
institutional set of
circumstances in which the
bearer does not have to be
80
Role
- bearers can have more than one
role
person as student / as staff member
- roles often form systems of mutual
dependence
husband / wife
first in queue / last in queue
doctor / patient
host / pathogen
81
Role
of some chemical compound: to serve
as analyte in an experiment
of a dose of penicillin in this human
child: to treat a disease
of this bacteria in a primary host: to
cause infection
82
Qualities are categorical
features of reality – you just
have them
Functions, roles and dispositions
are potential features of reality:
they are realizable dependent
continuants, realized in certain
associated processes
83
types: independent continuant > portion of chemical compound; dependent continuant > role > drug role; occurrent > process > process of drug administration
instances: this portion of aspirin; role of this portion of aspirin; John’s taking this portion of aspirin
84
types: independent continuant > portion of chemical compound; dependent continuant > role > drug role; occurrent > process > process of drug administration
instances: this portion of aspirin; role of this portion of aspirin; John’s taking this portion of aspirin
inheres_in: role of this portion of aspirin inheres_in this portion of aspirin
realized_in: role of this portion of aspirin realized_in John’s taking this portion of aspirin
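The aspirin example can be written down as a handful of subject-relation-object triples. This is a sketch only: `inheres_in` and `realized_in` are the relations named on the slide, while `instance_of` and the helper `objects` are labels introduced here for illustration.

```python
# Triples for the aspirin example; 'instance_of' is an assumed relation name.
triples = [
    ("this portion of aspirin", "instance_of", "portion of chemical compound"),
    ("role of this portion of aspirin", "instance_of", "drug role"),
    ("John's taking this portion of aspirin", "instance_of", "process of drug administration"),
    ("role of this portion of aspirin", "inheres_in", "this portion of aspirin"),
    ("role of this portion of aspirin", "realized_in", "John's taking this portion of aspirin"),
]

def objects(subject: str, relation: str) -> list:
    """Return everything the subject is linked to by the given relation."""
    return [o for s, r, o in triples if s == subject and r == relation]

role = "role of this portion of aspirin"
bearer = objects(role, "inheres_in")
realization = objects(role, "realized_in")
```

The role is the hinge: it depends on its bearer (a continuant) and is realized in a process (an occurrent), so one dependent continuant links the two sides of the top-level split.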
85
Rows: GRANULARITY. Columns: RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT).
ORGAN AND ORGANISM
  INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  OCCURRENT: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  INDEPENDENT CONTINUANT: Cell (CL); Cellular Component (FMA, GO)
  DEPENDENT CONTINUANT: Cellular Function (GO)
MOLECULE
  INDEPENDENT CONTINUANT: Molecule (ChEBI, SO, RnaO, PrO)
  DEPENDENT CONTINUANT: Molecular Function (GO)
  OCCURRENT: Molecular Process (GO)
The Open Biomedical Ontologies (OBO) Foundry
86
• The Road to Convergence
All ontologies for each given domain (anatomy,
chemistry…) should be part of a single suite of
interoperable ontologies
should use a common top-level core
for subdomains with many variants, should
follow the strategy of canonical ontologies
with extensions
should require acceptance of common, tested
guidelines on all subscribing ontology
developers
87
Rows: GRANULARITY. Columns: RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT).
ORGAN AND ORGANISM
  INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  OCCURRENT: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
  INDEPENDENT CONTINUANT: Cell (CL); Cellular Component (FMA, GO)
  DEPENDENT CONTINUANT: Cellular Function (GO)
  OCCURRENT: Cellular Process (GO)
MOLECULE
  INDEPENDENT CONTINUANT: Molecule (ChEBI, SO, RnaO, PrO)
  DEPENDENT CONTINUANT: Molecular Function (GO)
  OCCURRENT: Molecular Process (GO)
initial OBO Foundry coverage, ontologies
automatically semantically coupled
88
Disposition (Internally-Grounded Realizable Entity)
disposition =def.
a realizable entity which if it ceases to
exist, then its bearer is physically
changed, and
whose realization occurs when this
bearer is in some special physical
circumstances, in virtue of the
bearer’s physical make-up
89
Function
• A Disposition (Internally-Grounded
Realizable Entity) that is designed or
selected for
90
OGMS
• Ontology for General Medical Science
http://code.google.com/p/ogms
91
Physical Disorder
– independent continuant
fiat object part
92
Big Picture
93
A disease is a disposition rooted in a
physical disorder in the organism and
realized in pathological processes.
etiological process
– produces
disorder
– bears
disposition
– realized_in
pathological process
– produces
abnormal bodily features
– recognized_as
signs & symptoms
– used_in
interpretive process
– produces
diagnosis
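The disease model on this slide is a chain of entities linked by named relations, running from etiological process to diagnosis. A minimal Python sketch of that chain follows; the entity and relation names are taken from the slide, while the list layout and the `walk` helper are assumptions for illustration.

```python
# Each entry: (subject, relation, object), in the order the slide's relations suggest.
chain = [
    ("etiological process", "produces", "disorder"),
    ("disorder", "bears", "disposition"),
    ("disposition", "realized_in", "pathological process"),
    ("pathological process", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "signs & symptoms"),
    ("signs & symptoms", "used_in", "interpretive process"),
    ("interpretive process", "produces", "diagnosis"),
]

def walk(start: str) -> list[str]:
    """Follow the chain from a starting entity, recording each relation crossed."""
    next_step = {s: (r, o) for s, r, o in chain}
    path, node = [start], start
    while node in next_step:
        relation, node = next_step[node]
        path.append(relation)
        path.append(node)
    return path

print(" -> ".join(walk("etiological process")))
```

Keeping the disease itself as the `disposition` node, distinct from both the disorder that bears it and the processes that realize it, is what the definition on the slide asks for.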
94
Elucidation of Primitive Terms
• ‘bodily feature’ - an abbreviation for a physical
component, a bodily quality, or a bodily process.
• disposition - an attribute describing the propensity to
initiate certain specific sorts of processes when
certain conditions are satisfied.
• clinically abnormal - some bodily feature that
– (1) is not part of the life plan for an organism of the
relevant type (unlike aging or pregnancy),
– (2) is causally linked to an elevated risk either of pain or
other feelings of illness, or of death or dysfunction, and
– (3) is such that the elevated risk exceeds a certain threshold
level.*
*Compare: baldness
95
Definitions - Foundational Terms
• Disorder =def. – A causally linked combination of
physical components that is clinically abnormal.
• Pathological Process =def. – A bodily process that is
a manifestation of a disorder and is clinically
abnormal.
• Disease =def. – A disposition (i) to undergo
pathological processes that (ii) exists in an organism
because of one or more disorders in that organism.
96
Dispositions and Predispositions
• All diseases are dispositions; not all dispositions are
diseases.
• A predisposition is a disposition.
• Predisposition to Disease of Type X =def. – A disposition
in an organism that constitutes an increased risk of the
organism’s subsequently developing the disease X.
• HNPCC is caused by a
– disorder (mutation) in a DNA mismatch repair gene that
– disposes to the acquisition of additional mutations from
defective DNA repair processes, and thus is a
– predisposition to the development of colon cancer.
97
Cirrhosis - environmental exposure
• Etiological process - phenobarbital-induced hepatic cell death
– produces
• Disorder - necrotic liver
– bears
• Disposition (disease) - cirrhosis
– realized_in
• Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death
– produces
• Abnormal bodily features
– recognized_as
• Symptoms - fatigue, anorexia
• Signs - jaundice, splenomegaly
• Symptoms & Signs
– used_in
• Interpretive process
– produces
• Hypothesis - rule out cirrhosis
– suggests
• Laboratory tests
– produces
• Test results - elevated liver enzymes in serum
– used_in
• Interpretive process
– produces
• Result - diagnosis that patient X has a disorder that bears the disease cirrhosis
98
Influenza - infectious
• Etiological process - infection of airway epithelial cells with influenza virus
– produces
• Disorder - viable cells with influenza virus
– bears
• Disposition (disease) - flu
– realized_in
• Pathological process - acute inflammation
– produces
• Abnormal bodily features
– recognized_as
• Symptoms - weakness, dizziness
• Signs - fever
• Symptoms & Signs
– used_in
• Interpretive process
– produces
• Hypothesis - rule out influenza
– suggests
• Laboratory tests
– produces
• Test results - elevated serum antibody titers
– used_in
• Interpretive process
– produces
• Result - diagnosis that patient X has a disorder that bears the disease flu
But the disorder also induces normal physiological processes (immune response) that can result in the elimination of the disorder (transient disease course).
99
Huntington’s Disease - genetic
• Etiological process - inheritance of >39 CAG repeats in the HTT gene
– produces
• Disorder - chromosome 4 with abnormal mHTT
– bears
• Disposition (disease) - Huntington’s disease
– realized_in
• Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum
– produces
• Abnormal bodily features
– recognized_as
• Symptoms - anxiety, depression
• Signs - difficulties in speaking and swallowing
• Symptoms & Signs
– used_in
• Interpretive process
– produces
• Hypothesis - rule out Huntington’s
– suggests
• Laboratory tests
– produces
• Test results - molecular detection of the HTT gene with >39 CAG repeats
– used_in
• Interpretive process
– produces
• Result - diagnosis that patient X has a disorder that bears the disease Huntington’s disease
100
HNPCC - genetic pre-disposition
• Etiological process - inheritance of a mutant mismatch repair gene
– produces
• Disorder - chromosome 3 with abnormal hMLH1
– bears
• Disposition (disease) - Lynch syndrome
– realized_in
• Pathological process - abnormal repair of DNA mismatches
– produces
• Disorder - mutations in proto-oncogenes and tumor suppressor genes with
microsatellite repeats (e.g. TGF-beta R2)
– bears
• Disposition (disease) - non-polyposis colon cancer
– realized_in
• Symptoms (including pain)
101
The OBO Foundry Initiative
102
A good solution to the data integration
problem must be:
• modular
• incremental
• bottom-up
• evidence-based
• revisable
• incorporate a strategy for motivating
potential developers and users
103
GO is amazingly successful
– but covers only three sorts
of biological entities:
– cellular components
– molecular functions
– biological processes
and does not provide representations
of disease-related phenomena
104
Rows: GRANULARITY. Columns: RELATION TO TIME (CONTINUANT: INDEPENDENT, DEPENDENT; OCCURRENT).
ORGAN AND ORGANISM
  INDEPENDENT CONTINUANT: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  DEPENDENT CONTINUANT: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  OCCURRENT: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  INDEPENDENT CONTINUANT: Cell (CL); Cellular Component (FMA, GO)
  DEPENDENT CONTINUANT: Cellular Function (GO)
MOLECULE
  INDEPENDENT CONTINUANT: Molecule (ChEBI, SO, RnaO, PrO)
  DEPENDENT CONTINUANT: Molecular Function (GO)
  OCCURRENT: Molecular Process (GO)
The Open Biomedical Ontologies (OBO) Foundry
105
OBO Foundry provides
• tested guidelines enabling new groups to
develop the ontologies they need in ways which
counteract forking and dispersion of effort
• an incremental bottom-up approach to
evidence-based terminology practices in
medicine that is rooted in basic biology
• automatic web-based linkage between medical
terminologies and biological knowledge
resources
• traffic laws and traffic police
106
the strategy
establish common rules governing best
practices for creating ontologies in
coordinated fashion, with an evidence-based pathway to incremental
improvement
107
The methodology of cross-products
compound terms in ontologies to be defined
as cross-products of simpler terms:
E.g., elevated blood glucose is a cross-product of
PATO: increased concentration with FMA: blood and
ChEBI: glucose.
= factoring out of ontologies into discipline-specific modules (orthogonality)
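A cross-product definition builds a compound term from simpler terms that each live in their own ontology. The sketch below encodes the slide's example in Python; the ontology prefixes and labels come from the slide, but the `Term` and `CrossProduct` classes and the slot names (`quality`, `location`, `substance`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    ontology: str   # e.g. "PATO", "FMA", "ChEBI"
    label: str

@dataclass(frozen=True)
class CrossProduct:
    """A compound term defined from simpler terms in separate ontologies."""
    label: str
    quality: Term
    location: Term
    substance: Term

    def ontologies(self) -> set[str]:
        """Return the ontology modules this definition draws on."""
        return {self.quality.ontology, self.location.ontology, self.substance.ontology}

elevated_blood_glucose = CrossProduct(
    label="elevated blood glucose",
    quality=Term("PATO", "increased concentration"),
    location=Term("FMA", "blood"),
    substance=Term("ChEBI", "glucose"),
)
```

Because the compound term stores references rather than copies, a revision to `blood` in FMA or `glucose` in ChEBI would carry over to every compound that uses it, which is the practical payoff of orthogonal modules.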
108
The methodology of cross-products
enforcing use of common relations in linking terms
drawn from Foundry ontologies serves
• to ensure that the ontologies are maintained and
revised in tandem
• logically defined relations serve to bind terms in
different ontologies together to create a network
109
CRITERIA
• openness
• common formal language
• collaborative development
• evidence-based maintenance
• identifiers
• versioning
• textual and formal definitions
110
Orthogonality = modularity
• one ontology for each domain
• no need for mappings (which are in
any case too expensive, too fragile,
too difficult to keep up-to-date as
mapped ontologies change)
• everyone knows where to look to
find out how to annotate each kind
of data
111
Ontologies and research groups
using BFO and RO
– OBO Foundry (60 biomedical ontologies, including
GO, OBI, Protein Ontology, Cell Ontology, IDO …)
– National Cancer Institute (BiomedGT)
– NIF (NIH Neuroscience Information Framework)
– Cleveland Clinic Semantic Database
– Siemens
– AstraZeneca
– EU (ACGT Cancer Ontology, RAPS, …)
112
Because the ontologies in the
Foundry
are built as orthogonal modules which form an
incrementally evolving network
• scientists are motivated to commit to
developing ontologies because they will need in
their own work ontologies that fit into this
network
• users are motivated by the assurance that the
ontologies they turn to are maintained by
experts
113
More benefits of orthogonality
• helps those new to ontology to find what they
need
• to find models of good practice
• ensures mutual consistency of ontologies
(trivially)
• and thereby ensures additivity of annotations
114
More benefits of orthogonality
• it rules out the sorts of simplification and
partiality which may be acceptable under
more pluralistic regimes
• thereby brings an obligation on the part of
ontology developers to commit to scientific
accuracy and domain-completeness
115
More criteria of a
successful standard
1. intelligibility to users (consistent use of terms
like ‘term’, ‘class’, ‘entity’, ‘object’ …)
2. track record of lessons learned (GO has 10
years of hard user testing)
3. lots of existing users (ontologies are like
telephone networks)
116
COMMON ARCHITECTURE
• The ontology uses relations which are unambiguously
defined following the pattern of definitions laid down
in the Basic Formal Ontology (BFO) including the
Relation Ontology (RO)
http://ifomis.org/bfo
http://www.obofoundry.org/ro/
117
OBO Foundry Modular Organization
top level: Basic Formal Ontology (BFO)
mid-level: Ontology for Biomedical Investigations (OBI); Information Artifact Ontology (IAO); Spatial Ontology (BSPO)
domain level: Anatomy Ontology (FMA*, CARO); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Environment Ontology (EnvO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Protein Ontology (PRO*); Infectious Disease Ontology (IDO*); Phenotypic Quality Ontology (PaTO); Biological Process Ontology (GO*); Molecular Function (GO*)
118
BFO:continuant
continuant
  independent continuant
    portion of material
    object
    fiat object part
    object aggregate
    object boundary
    site
  dependent continuant
    generically dependent continuant
      information artifact
    specifically dependent continuant
      quality
      realizable entity
        function
        role
        disposition
  spatial region
    0D-region
    1D-region
    2D-region
    3D-region
BFO:occurrent
occurrent
  processual entity
    process
    fiat process part
    process aggregate
    process boundary
    processual context
  spatiotemporal region
    scattered spatiotemporal region
    connected spatiotemporal region
      spatiotemporal instant
      spatiotemporal interval
  temporal region
    scattered temporal region
    connected temporal region
      temporal instant
      temporal interval
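A hierarchy like this is useful to software mainly because subsumption can be checked mechanically. The following Python sketch covers a fragment of the hierarchy as a child-to-parent map; the placement of each term is an assumption based on how the slide's terms are usually arranged in BFO, and `ancestors` and `is_a` are hypothetical helper names.

```python
# Child -> parent for a fragment of the hierarchy (assumed placement).
parent = {
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
    "specifically dependent continuant": "dependent continuant",
    "quality": "specifically dependent continuant",
    "realizable entity": "specifically dependent continuant",
    "function": "realizable entity",
    "role": "realizable entity",
    "disposition": "realizable entity",
    "processual entity": "occurrent",
    "process": "processual entity",
}

def ancestors(term: str) -> list[str]:
    """Return all supertypes of a term, nearest first."""
    found = []
    while term in parent:
        term = parent[term]
        found.append(term)
    return found

def is_a(child: str, ancestor: str) -> bool:
    """True if 'ancestor' subsumes 'child' in the hierarchy."""
    return ancestor in ancestors(child)
```

With this in place, a question such as "is a role a continuant?" becomes `is_a("role", "continuant")`, and the same check separates continuants from occurrents at the top level.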
Example: The Cell Ontology
OBO Foundry Modular Organization
top level: Basic Formal Ontology (BFO)
mid-level: Ontology for Biomedical Investigations (OBI); Information Artifact Ontology (IAO); Spatial Ontology (BSPO)
domain level: Anatomy Ontology (FMA*, CARO); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Environment Ontology (EnvO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Protein Ontology (PRO*); Infectious Disease Ontology (IDO*); Phenotypic Quality Ontology (PaTO); Biological Process Ontology (GO*); Molecular Function (GO*)
122