Towards an Ontological Treatment
of Disease and Diagnosis
Barry Smith
New York State Center of Excellence
in Bioinformatics and Life Sciences
University at Buffalo
http://ontology.buffalo.edu/smith
1
Anders Grimsmo, “Patients, diagnoses and processes in
general practice in the Nordic countries. An attempt to
make data from computerised medical records available for
comparable statistics”
Scandinavian Journal of Primary Health Care, 2001
 “The major obstacle to extracting more
epidemiological data from computerised
medical records is caused by information
in the databases not being uniquely linked
to episodes of care.”
2
What is to be linked with what?
What is information in the databases about?
To answer this question (to assign numbers to
discrete entities), we need a good ontology of the
care domain, including episodes of care on the one
hand and entities on the side of the patient on the
other.
3
and we need to take account of
context
– of multiple diseases
– of the patient’s style of life
– of the patient’s environment
– of specific aspects of the presentation
4
we do this by paying attention to
natural language
but the more we succeed in this, the
more difficult it is to aggregate the data
disease of UMLSitis
5
Buffalo Longitudinal Cancer Data
Even with the best of intentions, and even if
we just use one coding system, results are
not always what they seem
Problem of SNOMEDitis
with acknowledgements to NLM: 1R21LM009824-01A1
11
Why does SNOMED change so much?
12
SNOMED CT: Anaplasma marginale (organism)
13
infectious agent
is_a navigational concept
with acknowledgements to
Werner Ceusters
14
Why does SNOMED change so much?
• Problems with ‘concept’: no real coherence
as to what SNOMED is representing
21
Why does SNOMED change so much?
• No proper hierarchy (of more and less
general)
• Confusion of disorders (continuants) with
etiological and diagnostic processes
(occurrents) and of both with information
entities (‘findings’)
• Confusion of ‘disorders’ with ‘morphological
abnormalities’
22
SNOMED CT
128477000 Abscess (disorder)
44132006 Abscess (morphologic abnormality)
23
Epistemology and Combinatorial
Explosion
• Epistaxis/nosebleed
– Epistaxis (disorder)
– Nosebleed/epistaxis symptom (finding)
– On examination - epistaxis (disorder)
– Has nosebleeds - epistaxis (disorder)
– Evidence of recent epistaxis (finding)
from Bill Hogan
Epistemology and Combinatorial
Explosion
• Rash
– Cutaneous eruption (morphologic abnormality),
with synonym Rash
– Eruption of skin (disorder), with synonym Rash
– Complaining of a rash (finding)
– On examination - a rash (finding)
• Dry skin
– Dry skin (finding)
– Complaining of dry skin (finding)
– On examination - dry skin (finding)
– Dry skin dermatitis (disorder)
from Bill Hogan
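The aggregation problem on the two slides above can be sketched in a few lines of Python. This is only an illustration: the labels come from the epistaxis slide, but the records and the grouping table `SAME_PHENOMENON` are invented for the example and are not part of SNOMED CT.

```python
from collections import Counter

# Hypothetical coded records; each entry is the label a clinician selected.
# The labels are from the slide, the records themselves are made up.
records = [
    "Epistaxis (disorder)",
    "Nosebleed/epistaxis symptom (finding)",
    "On examination - epistaxis (disorder)",
    "Has nosebleeds - epistaxis (disorder)",
    "Evidence of recent epistaxis (finding)",
    "Epistaxis (disorder)",
]

# Assumption for this sketch: all five labels concern the same kind of event.
SAME_PHENOMENON = {label: "epistaxis" for label in set(records)}

def count_by_code(recs):
    """Count records per label, as a naive query over coded data would."""
    return Counter(recs)

def count_by_phenomenon(recs, grouping):
    """Count records per underlying phenomenon, using an explicit grouping."""
    return Counter(grouping[r] for r in recs)

per_code = count_by_code(records)
per_phenomenon = count_by_phenomenon(records, SAME_PHENOMENON)
```

Counting per label splits six nosebleed records across five codes, while counting per phenomenon needs a grouping that the coding system itself does not supply.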
An Alternative: Basic Formal Ontology
360 BC: Aristotle’s Metaphysics
1879: Invention of modern logic (Boole, Frege)
1920: The problem of the Unity of Science (Logical Positivism)
1940: Birth of computing (Turing)
30
Ontology Timeline
1970: AI, Robotics (J. McCarthy, P. Hayes)
1980: KIF: Knowledge Interchange Format
1990: Description Logics
2000: Semantic Web (OWL), Protégé
2007: National Center for Biomedical Ontology (NCBO) Bioportal
31
Uses of ‘ontology’ in PubMed abstracts
32
Biomedical Ontology in PubMed
[Figure: bar chart of yearly counts for 2000–2009, vertical axis 0–1000. Bar values in ascending order: 35, 37, 69, 143, 283, 412, 501, 618, 860, 900; the first three are labelled 2000, 2001 and 2002.]
By far the most successful: GO (Gene Ontology)
34
Ontology Timeline
1990: Human Genome Project
1999: The Gene Ontology (GO) – Model Organism Research
2005: The Open Biomedical Ontologies (OBO) Foundry
2010: Ontology for General Medical Science
35
The GO is a controlled vocabulary for
use in annotating data
 multi-species, multi-disciplinary, open
source
 contributing to the cumulativity of
scientific results obtained by distinct
research communities
 compare use of kilograms, meters,
seconds … in formulating
experimental results
36
NIH Mandates for Data Sharing
Organizations such as the NIH now require use of
common standards in a way that will ensure that
the results obtained through funded research are
more easily accessible to external groups.
ODR will be created in such a way that its use will
address the new NIH mandates. It will also be designed
to allow information presented in its terms
to be usable in satisfying other regulatory
purposes, such as submissions to FDA.
37
GO provides answers to three
types of questions:
for each gene product (protein ...)
 in what parts of the cell has it been
identified? Cell Constituent Ontology
 exercising what types of molecular
functions? Molecular Function Ontology
 with what types of biological processes?
Biological Process Ontology
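A minimal sketch of how annotations answer these three questions, assuming a flat list of annotation records. The gene product `ProteinX` and the `Annotation` class are hypothetical; the GO identifiers are given for illustration and should be checked against a current GO release.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    gene_product: str
    go_id: str
    go_label: str
    aspect: str  # "cellular_component", "molecular_function" or "biological_process"

# Hypothetical annotations for an invented gene product.
annotations = [
    Annotation("ProteinX", "GO:0005634", "nucleus", "cellular_component"),
    Annotation("ProteinX", "GO:0003677", "DNA binding", "molecular_function"),
    Annotation("ProteinX", "GO:0006355",
               "regulation of DNA-templated transcription", "biological_process"),
]

def answer(gene_product, aspect, anns):
    """Return the GO labels recorded for one gene product in one GO aspect."""
    return sorted(a.go_label for a in anns
                  if a.gene_product == gene_product and a.aspect == aspect)

where = answer("ProteinX", "cellular_component", annotations)
function = answer("ProteinX", "molecular_function", annotations)
process = answer("ProteinX", "biological_process", annotations)
```

Each question on the slide corresponds to filtering the same annotation set by a different aspect.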
38
39
40
[Figure: graph with part_of and subtype_of edges and gene product associations]
41
$100 mill. invested in literature
curation using GO
 over 11 million annotations relating gene
products described in the UniProt, Ensembl
and other databases to terms in the GO
 ontologies provide the basis for capturing
biological theories in computable form
 in contrast to terminologies and thesauri –
which focus on socially diverse uses of
language – the GO method focuses on
commonly shared results of basic biological
science
42
A new kind of biological research
based on analysis and comparison of the massive
quantities of annotations linking ontology terms to
raw data, including genomic data, clinical data,
public health data
What 10 years ago took multiple groups of
researchers months of data comparison effort, can
now be performed in milliseconds
43
The GO covers only generic (‘normal’)
biological entities of three sorts:
– cellular components
– molecular functions
– biological processes
It does not provide representations of
diseases, symptoms, genetic abnormalities …
How to extend the GO methodology to other
domains of biology and medicine?
45
The Open Biomedical Ontologies (OBO) Foundry
RELATION TO TIME: continuant (independent, dependent) and occurrent
GRANULARITY: organ and organism; cell and cellular component; molecule
Independent continuant
– Organ and organism: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Cell and cellular component: Cell (CL); Cellular Component (FMA, GO)
– Molecule: Molecule (ChEBI, SO, RnaO, PrO)
Dependent continuant
– Organ Function (FMP, CPRO)
– Phenotypic Quality (PaTO)
– Cellular Function (GO)
– Molecular Function (GO)
Occurrent
– Biological Process (GO)
– Molecular Process (GO)
46
OBO Foundry ontologies
all follow the same principles to ensure
interoperability
– GO Gene Ontology
– ChEBI Chemical Ontology
– PRO Protein Ontology
– CL Cell Ontology
– ...
– OGMS Ontology for General Medical
Science
47
Basic Formal Ontology:
GO at a high level
48
Basic Formal Ontology (BFO)
A simple top-level ontology to support
information integration in scientific research
No abstracta
Nothing propositional
Clear hierarchy
No overlap with domain ontologies
No confusion of ontology with epistemology
No confusion of terms with what terms
represent in reality
49
Basic Formal Ontology
Continuant
– Independent Continuant
– Dependent Continuant
Occurrent (Process, Event)
http://ifomis.uni-saarland.de/bfo/
50
BFO and the 3 Gene Ontologies (GO)
Continuant
– Independent Continuant: Cell Component
– Dependent Continuant: Molecular Function
Occurrent: Biological Process
Kumar A., Smith B, Borgelt C. Dependence relationships between Gene Ontology
terms based on TIGR gene product annotations. CompuTerm 2004, 31-38.
Bada M, Hunter L. Enrichment of OBO Ontologies. J Biomed Inform. 2006 Jul 26
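The alignment on this slide can be written down as a small lookup. The sketch below is an illustration only; the dictionary names and the helper `bfo_path` are invented here and are not part of BFO or GO tooling.

```python
# Parent links among the BFO categories named on the slide.
BFO_PARENT = {
    "continuant": None,
    "occurrent": None,
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
}

# Where each of the three GO ontologies sits under BFO, per the slide.
GO_TO_BFO = {
    "cellular component": "independent continuant",
    "molecular function": "dependent continuant",
    "biological process": "occurrent",
}

def bfo_path(go_branch):
    """Return the BFO categories above a GO branch, most specific first."""
    path = []
    node = GO_TO_BFO[go_branch]
    while node is not None:
        path.append(node)
        node = BFO_PARENT[node]
    return path

component_path = bfo_path("cellular component")
process_path = bfo_path("biological process")
```

Placing every domain ontology under the same few upper-level categories is what lets terms from different ontologies be compared at all.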
51
Users of BFO
NCI BiomedGT
SNOMED CT
Ontology for General Medical Science
(OGMS)
ACGT Clinical Genomics Trials on Cancer –
Master Ontology / Formbuilder (Case
Report Forms for Cancer Clinical Trials)
53
Users of BFO
MediCognos / Microsoft Healthvault
Cleveland Clinic Semantic Database in
Cardiothoracic Surgery
Major Histocompatibility Complex (MHC)
Ontology (NIAID)
Neuroscience Information Framework
Standard (NIFSTD) and Constituent
Ontologies
54
Users of BFO
Interdisciplinary Prostate Ontology (IPO)
Nanoparticle Ontology (NPO): Ontology for
Cancer Nanotechnology Research
Neural Electromagnetic Ontologies (NEMO)
ChemAxiom – Ontology for Chemistry
Ontology for Risks Against Patient Safety
(RAPS/REMINE) (EU FP7)
IDO Infectious Disease Ontology (NIAID)
55
Infectious Disease Ontology Consortium
• MITRE, Mount Sinai, UTSouthwestern –
Influenza
• IMBB/VectorBase – Vector borne diseases (A.
gambiae, A. aegypti, I. scapularis, C. pipiens, P.
humanus)
• Colorado State University – Dengue Fever
• Duke University – Tuberculosis, Staph. aureus
• Case Western Reserve – Infective Endocarditis
• University of Michigan – Brucellosis
56
The OBO Foundry
• GO Gene Ontology
• CL Cell Ontology
• SO Sequence Ontology
• ChEBI Chemical Ontology
• PATO Phenotype (Quality) Ontology
• FMA Foundational Model of Anatomy
• ChEBI Chemical Entities of Biological Interest
• CARO Common Anatomy Reference Ontology
• PRO Protein Ontology
• Infectious Disease Ontology
• Plant Ontology
• Environment Ontology
• Ontology for Biomedical Investigations
• RNA Ontology
57
RELATION TO TIME: continuant (independent, dependent) and occurrent
GRANULARITY: organ and organism; cell and cellular component; molecule
Independent continuant
– Organ and organism: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Cell and cellular component: Cell (CL); Cellular Component (FMA, GO)
– Molecule: Molecule (ChEBI, SO, RNAO, PRO)
Dependent continuant
– Organ Function (FMP, CPRO)
– Phenotypic Quality (PaTO)
– Cellular Function (GO)
– Molecular Function (GO)
Occurrent
– Organism-Level Process (GO)
– Cellular Process (GO)
– Molecular Process (GO)
rationale of OBO Foundry coverage
(homesteading principle)
59
OBO Foundry organized in terms of
Basic Formal Ontology
Methodology of downward population
Each Foundry ontology can be seen as an
extension of a single upper level ontology
(BFO)
60
Example: The Cell Ontology
Ontology for General Medical Science
BFO-based ontology for clinical medicine
Continuant
– Independent Continuant: Anatomical Component + Disorder
– Dependent Continuant: Disease + Bodily Quality
Occurrent: Pathological Process + Clinical Encounter
62
Continuant
– Independent Continuant
– Dependent Continuant: Quality, Disposition, ...
63
realization depends_on realizable
Continuant
– Independent Continuant: bearer
– Dependent Continuant: disposition
Occurrent: Process of realization
67
this particular case of redness (of a particular fly eye) instantiates the universal red
an instance of eye (in a particular fly) instantiates the universal eye
this particular case of redness (of a particular fly eye) depends_on an instance of eye (in a particular fly)
70
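red is_a color
eye is_a anatomical structure
the particular case of redness (of a particular fly eye) instantiates red
an instance of an eye (in a particular fly) instantiates eye
the particular case of redness (of a particular fly eye) depends on an instance of an eye (in a particular fly)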
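The three relations on this slide (is_a between universals, instantiates between a particular and a universal, depends_on between particulars) can be sketched as plain data. The class names `Universal` and `Particular` and the helper `instance_of` are assumptions made for this illustration, not an established API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Universal:
    name: str
    is_a: Optional["Universal"] = None  # parent universal, if any

@dataclass(frozen=True)
class Particular:
    label: str
    instantiates: Universal
    depends_on: Optional["Particular"] = None  # bearer of a dependent entity

def instance_of(p, u):
    """True if p instantiates u directly or via a chain of is_a links."""
    node = p.instantiates
    while node is not None:
        if node == u:
            return True
        node = node.is_a
    return False

color = Universal("color")
red = Universal("red", is_a=color)
anatomical_structure = Universal("anatomical structure")
eye = Universal("eye", is_a=anatomical_structure)

this_eye = Particular("an eye in a particular fly", eye)
this_redness = Particular("redness of that fly eye", red, depends_on=this_eye)
```

Here `this_redness` counts as an instance of `color` because red is_a color, and it cannot exist without `this_eye`, which is what depends_on records.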
71
portion of water: phase transitions
this portion of H2O
– instantiates portion of ice at t1
– instantiates portion of liquid water at t2
– instantiates portion of gas at t3
72
human (in nature, no sharp boundaries here)
John (exists continuously)
– instantiates embryo at t1
– instantiates fetus at t2
– instantiates neonate at t3
– instantiates infant at t4
– instantiates child at t5
– instantiates adult at t6
73
temperature (in nature, no sharp boundaries here)
John’s temperature (exists continuously)
– instantiates 37ºC at t1
– instantiates 37.1ºC at t2
– instantiates 37.2ºC at t3
– instantiates 37.3ºC at t4
– instantiates 37.4ºC at t5
– instantiates 37.5ºC at t6
74
coronary heart disease
John’s coronary heart disease (exists continuously)
– instantiates early lesions and small fibrous plaques at t1
– instantiates asymptomatic (‘silent’) infarction at t2
– instantiates surface disruption of plaque at t3
– instantiates unstable angina at t4
– instantiates stable angina at t5
(t1 to t5 are ordered along the time axis)
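The preceding slides all use the same pattern: one particular persists while the universal it instantiates changes with time. A minimal sketch, assuming time points are plain integers and using the values from the John example (the function names are invented for illustration):

```python
# (particular, universal, time index) triples taken from the "human" slide.
instantiations = [
    ("John", "embryo", 1),
    ("John", "fetus", 2),
    ("John", "neonate", 3),
    ("John", "infant", 4),
    ("John", "child", 5),
    ("John", "adult", 6),
]

def instantiated_at(particular, t, facts):
    """Universals that the particular instantiates at time index t."""
    return [u for (p, u, ti) in facts if p == particular and ti == t]

def history(particular, facts):
    """Universals instantiated by the particular, in temporal order."""
    ordered = sorted(facts, key=lambda fact: fact[2])
    return [u for (p, u, ti) in ordered if p == particular]

at_t3 = instantiated_at("John", 3, instantiations)
johns_history = history("John", instantiations)
```

The particular appears once as an identifier and many times in the triples, which mirrors the claim that John exists continuously while passing through successive phases.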
75
OGMS
Ontology for General Medical Science
http://code.google.com/p/ogms/
76
OGMS: The Big Picture
77
Disposition (potentiality)
A disposition is
a realizable entity which is such that, if it
ceases to exist, then its bearer is
physically changed,
whose realization occurs, in virtue of the
bearer’s physical make-up, when this
bearer is in some special physical
circumstances
89
Disorder
independent continuant
that is part of an organism
that deviates from the
canonical anatomy of
the organism
in a way that gives rise to
pathological processes
90
Disorder
serves as the bearer of a disposition to
pathological processes
A part of the body that typically gets larger
over time
91
Disease course
• the totality of all disease processes
through which a given disease instance
is realized.
• multiple disease courses will be
associated with the same disorder type,
for example in reflection of the
presence or absence of pharmaceutical
or other interventions, of differences in
environmental influence, and so forth.
The Big Picture
94
A disease is a disposition rooted in a
physical disorder in the organism and
realized in pathological processes.
etiological process
– produces
disorder
– bears
disposition
– realized_in
pathological process
– produces
abnormal bodily features
– recognized_as
signs & symptoms
– used_in
interpretive process
– produces
diagnosis
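The Big Picture can be read as a chain of typed links from etiology to diagnosis. The sketch below encodes the links shown on the slide as triples and walks them; the helper `walk` is invented for illustration and assumes, as a simplification, that each node has at most one outgoing link.

```python
# (subject, relation, object) triples as drawn on the slide.
BIG_PICTURE = [
    ("etiological process", "produces", "disorder"),
    ("disorder", "bears", "disposition"),
    ("disposition", "realized_in", "pathological process"),
    ("pathological process", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "signs & symptoms"),
    ("signs & symptoms", "used_in", "interpretive process"),
    ("interpretive process", "produces", "diagnosis"),
]

def walk(start, triples):
    """Follow the single outgoing link from each node until none is left."""
    next_step = {s: (r, o) for (s, r, o) in triples}
    path = [start]
    node = start
    while node in next_step:
        _relation, node = next_step[node]
        path.append(node)
    return path

chain = walk("etiological process", BIG_PICTURE)
```

The walk passes through the disposition (the disease itself) on its way from the disorder to the diagnosis, which is the distinction the definition above is drawing.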
95
Definitions - Foundational Terms

Disorder =def. – A causally linked combination of
physical components that is clinically abnormal.

Pathological Process =def. – A bodily process that is
a manifestation of a disorder and is clinically
abnormal.

Disease =def. – A disposition (i) to undergo
pathological processes that (ii) exists in an organism
because of one or more disorders in that organism.
97
Influenza - infectious
Etiological process - infection of airway epithelial cells with influenza virus
– produces
Disorder - viable cells with influenza virus
– bears
Disposition (disease) - flu
– realized_in
Pathological process - acute inflammation
– produces
Abnormal bodily features
– recognized_as
Symptoms - weakness, dizziness
Signs - fever
Symptoms & Signs
– used_in
Interpretive process
– produces
Hypothesis - rule out influenza
– suggests
Laboratory tests
– produces
Test results - elevated serum antibody titers
– used_in
Interpretive process
– produces
Result - diagnosis that patient X has a disorder that bears the disease flu
But the disorder also induces normal physiological processes (immune response) that can result in the elimination of the disorder (transient disease course).
98
Huntington’s Disease - genetic
Etiological process - inheritance of >39 CAG repeats in the HTT gene
– produces
Disorder - chromosome 4 with abnormal mHTT
– bears
Disposition (disease) - Huntington’s disease
– realized_in
Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum
– produces
Abnormal bodily features
– recognized_as
Symptoms - anxiety, depression
Signs - difficulties in speaking and swallowing
Symptoms & Signs
– used_in
Interpretive process
– produces
Hypothesis - rule out Huntington’s
– suggests
Laboratory tests
– produces
Test results - molecular detection of the HTT gene with >39 CAG repeats
– used_in
Interpretive process
– produces
Result - diagnosis that patient X has a disorder that bears the disease Huntington’s disease
99
HNPCC - genetic pre-disposition
Etiological process - inheritance of a mutant mismatch repair gene
– produces
Disorder - chromosome 3 with abnormal hMLH1
– bears
Disposition (disease) - Lynch syndrome
– realized_in
Pathological process - abnormal repair of DNA mismatches
– produces
Disorder - mutations in proto-oncogenes and tumor suppressor genes with microsatellite repeats (e.g. TGF-beta R2)
– bears
Disposition (disease) - non-polyposis colon cancer
– realized_in
Symptoms (including pain)
100
Dispositions and Predispositions
All diseases are dispositions; not all dispositions are diseases.
A predisposition is a disposition to acquire a disposition.
Predisposition to Disease of Type X =def. – A disposition in an organism that constitutes an increased risk of the organism’s subsequently developing the disease X.
HNPCC is caused by a disorder (mutation) in a DNA mismatch repair gene that disposes to the acquisition of additional mutations from defective DNA repair processes, and thus is a predisposition to the development of colon cancer.
101
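The idea that a predisposition is a disposition to acquire a further disposition can be sketched as a self-referencing record. The class `Disposition`, its field names and the helper `is_predisposition` are assumptions made for this illustration; the values are taken from the HNPCC slide.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Disposition:
    name: str
    realized_in: str                 # what the disposition is realized in
    is_disease: bool = False
    disposes_to: Optional["Disposition"] = None  # set for predispositions

def is_predisposition(d):
    """A predisposition is a disposition to acquire another disposition."""
    return d.disposes_to is not None

colon_cancer = Disposition(
    "non-polyposis colon cancer",
    realized_in="symptoms (including pain)",
    is_disease=True,
)
lynch_syndrome = Disposition(
    "Lynch syndrome",
    realized_in="abnormal repair of DNA mismatches",
    is_disease=True,
    disposes_to=colon_cancer,
)
```

In this encoding Lynch syndrome is both a disease and a predisposition, while the colon cancer it disposes to is a disease only.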
Cirrhosis - environmental exposure
Etiological process - phenobarbital-induced hepatic cell death
– produces
Disorder - necrotic liver
– bears
Disposition (disease) - cirrhosis
– realized_in
Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death
– produces
Abnormal bodily features
– recognized_as
Symptoms - fatigue, anorexia
Signs - jaundice, splenomegaly
Symptoms & Signs
– used_in
Interpretive process
– produces
Hypothesis - rule out cirrhosis
– suggests
Laboratory tests
– produces
Test results - elevated liver enzymes in serum
– used_in
Interpretive process
– produces
Result - diagnosis that patient X has a disorder that bears the disease cirrhosis
Systemic arterial hypertension
Etiological process – abnormal reabsorption of NaCl by the kidney
– produces
Disorder – abnormally large scattered molecular aggregate of salt in the blood
– bears
Disposition (disease) - hypertension
– realized_in
Pathological process – exertion of abnormal pressure against arterial wall
– produces
Abnormal bodily features
– recognized_as
Symptoms –
Signs – elevated blood pressure
Symptoms & Signs
– used_in
Interpretive process
– produces
Hypothesis - rule out hypertension
– suggests
Laboratory tests
– produces
Test results –
– used_in
Interpretive process
– produces
Result - diagnosis that patient X has a disorder that bears the disease hypertension
Type 2 Diabetes Mellitus
Etiological process –
– produces
Disorder – abnormal pancreatic beta cells and abnormal muscle/fat cells
– bears
Disposition (disease) – diabetes mellitus
– realized_in
Pathological processes – diminished insulin production, diminished muscle/fat uptake of glucose
– produces
Abnormal bodily features
– recognized_as
Symptoms – polydipsia, polyuria, polyphagia, blurred vision
Signs – elevated blood glucose and hemoglobin A1c
Symptoms & Signs
– used_in
Interpretive process
– produces
Hypothesis - rule out diabetes mellitus
– suggests
Laboratory tests – fasting serum blood glucose, oral glucose challenge test, and/or blood hemoglobin A1c
– produces
Test results –
– used_in
Interpretive process
– produces
Result - diagnosis that patient X has a disorder that bears the disease type 2 diabetes mellitus
Type 1 hypersensitivity to penicillin
Etiological process – sensitizing of mast cells and basophils during exposure to penicillin-class substance
– produces
Disorder – mast cells and basophils with epitope-specific IgE bound to Fc epsilon receptor I
– bears
Disposition (disease) – type I hypersensitivity
– realized_in
Pathological process – type I hypersensitivity reaction
– produces
Abnormal bodily features
– recognized_as
Symptoms – pruritus, shortness of breath
Signs – rash, urticaria, anaphylaxis
Symptoms & Signs
– used_in
Interpretive process
– produces
Hypothesis –
– suggests
Laboratory tests –
– produces
Test results – occasionally, skin testing
– used_in
Interpretive process
– produces
Result - diagnosis that patient X has a disorder that bears the disease type 1 hypersensitivity to penicillin
Next steps in OGMS
• classification of distinct types of disease
courses for instances of each disease
type
– in different typical environments
– with and without treatment
– with treatment plan that is or is not
realized by the patient
– where the disease exists in combination
with other diseases
Next steps in OGMS
• modify the Big Picture to take account of
differences between primary care and
specialist care
The Big Picture
108
Definitions - Clinical Evaluation Terms


Sign =def. – A bodily feature of a patient that is
observed in a physical examination and is deemed by
the clinician to be of clinical significance. (Objectively
observable features)
Symptom =def. – An experienced bodily feature of a
patient that is observed by and observable only by the
patient and is of the type that can be hypothesized by a
patient to be a realization of a disease. (A restricted
family of phenomena including pain, nausea, anger,
drowsiness, which are of their nature experienced in the
first person)
Symptoms are subjective. But this does not mean that there is
no objective fact of the matter whether a given symptom exists
109
Definition: Etiology

Etiological Process =def. – A process in an organism that
leads to a subsequent disorder.

Example: toxic chemical exposure resulting in a mutation in
the genomic DNA of a cell; infection of a human with a
pathogenic virus; inheritance of two defective copies of a
metabolic gene

The etiological process creates the physical basis of that
disposition to pathological processes which is the disease.
110
Definitions - Diagnosis

Clinical Picture =def. – A representation of a
clinical phenotype that is inferred from the
combination of laboratory, image and clinical
findings about a given patient.

Diagnosis =def. – A conclusion of an interpretive
process that has as input a clinical picture of a given
patient and as output an assertion to the effect that the
patient has a disease of such and such a type.
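These two definitions describe an input and an output of an interpretive process, which can be sketched as a function signature. Everything below is hypothetical: the class names, the field names and especially the toy rule inside `interpret`, which reuses the findings from the influenza slide purely for illustration and is not clinical guidance.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ClinicalPicture:
    patient: str
    laboratory_findings: tuple
    image_findings: tuple
    clinical_findings: tuple

@dataclass(frozen=True)
class Diagnosis:
    patient: str
    disease_type: str  # the asserted type of disease

def interpret(picture: ClinicalPicture) -> Optional[Diagnosis]:
    """Toy interpretive process: clinical picture in, diagnosis (or None) out."""
    # Invented rule for the sketch: fever plus elevated serum antibody titers.
    if ("fever" in picture.clinical_findings
            and "elevated serum antibody titers" in picture.laboratory_findings):
        return Diagnosis(picture.patient, "influenza")
    return None

picture = ClinicalPicture(
    patient="patient X",
    laboratory_findings=("elevated serum antibody titers",),
    image_findings=(),
    clinical_findings=("fever", "weakness"),
)
result = interpret(picture)
```

The diagnosis is a separate record about the patient; it is an assertion produced by interpretation, not the disease itself.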
111
Definitions - Qualities

Manifestation of a Disease =def. – A bodily feature of a
patient that is (a) a deviation from clinical normality that exists
in virtue of the realization of a disease and (b) is observable.





Observability includes observable through elicitation of response or
through the use of special instruments.
Preclinical Manifestation of a Disease =def. – A
manifestation of a disease that exists prior to its becoming
detectable in a clinical history taking or physical examination.
Clinical Manifestation of a Disease =def. – A manifestation
of a disease that is detectable in a clinical history taking or
physical examination.
Phenotype =def. – A (combination of) bodily feature(s) of an
organism determined by the interaction of its genetic make-up
and environment.
Clinical Phenotype =def. – A clinically abnormal phenotype.
112
For an ontology to succeed,
 potential users should be incentivized to use it,
 it should be populated using the terms that
they need and using definitions that conform
to their understanding of these terms
 it should be easily correctable in light of new
research discoveries
 it should enable the data annotated in its
terms to be easily integrated with legacy data
from related fields
 it should be easily extendable to new kinds of
data.
113
A new kind of Electronic Health
Record
resting on the use of the same (public
domain) ontologies in mapping proprietary
EHR vocabularies to yield patient data
annotated in consistent ways that support
 integrated care and continuity of care
 comparison and integration for diagnosis and
meta-analysis
 secondary uses for research
114