Biomedical Ontologies:
The State of the Art
Barry Smith and Werner Ceusters
MIE, Sarajevo, August 30
1
Part 1: Barry Smith
Ontologies are Representations of What is General in Reality
Part 2: Werner Ceusters
Referent Tracking: Pinning Ontologies to Instances in Reality
2
Uses of ‘ontology’ in PubMed abstracts
3
By far the most successful: GO (Gene Ontology)
4
You’re interested in which genes control heart muscle development
17,536 results
5
[Figure: microarray expression heatmap over time, attacked vs. control, clustered by Pearson correlation; gene clusters labelled Defense response, Immune response, Response to stimulus, Toll regulated genes, JAK-STAT regulated genes, Puparial adhesion, Molting cycle, hemocyanin, Amino acid catabolism, Lipid metabolism, Peptidase activity, Protein catabolism]
Microarray data shows changed expression of thousands of genes.
How will you spot the patterns?
6
You’re interested in which of your hospital’s patient data is relevant to understanding how genes control heart muscle development
7
Lab / pathology data
EHR data
Clinical trial data
Family history data
Medical imaging
Microarray data
Model organism data
Flow cytometry
Mass spec
Genotype / SNP data
How will you spot the patterns?
How will you find the data you need?
8
One strategy for bringing order into this huge conglomeration of data is through the use of Common Data Elements (CDEs)
• Discipline-specific (cancer, NIAID, …)
• Do not solve the problems of balkanization (data silos)
• Do not evolve gracefully as knowledge advances
• Support data cumulation, but do not readily support data integration and computation
9
An ontology is not a terminology
Existing term lists and CDEs
• built to serve specific data-processing needs
• in ad hoc ways
Ontologies
• designed from the start to ensure integratability and reusability of data
• by incorporating a common logical structure
10
How does the Gene Ontology work?
with thanks to Jane Lomax, Gene Ontology Consortium
11
GO provides a controlled system of representations for use in annotating data
• multi-species, multi-disciplinary, open source
• contributing to the cumulativity of scientific results obtained by distinct research communities
• compare the use of kilograms, meters, seconds … in formulating experimental results
12
13
Definitions
14
Gene products involved in cardiac muscle development in humans
15
GO provides answers to three types of questions for each gene product:
• in what parts of the cell has it been identified?
• exercising what types of molecular functions?
• with what types of biological processes?
When is a particular gene product involved
• in the course of normal development?
• in the process leading to abnormality?
With what functions is the gene product associated in other biological processes?
16
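The three questions correspond to GO's three aspects. As a minimal sketch, assuming annotations are held as (gene product, GO ID, aspect) tuples with the single-letter aspect codes C, F and P used in GO annotation files, the answers for one gene product can be grouped by aspect. The gene product names and the annotation set are hypothetical; the biological-process IDs are taken from the pain-related terms in this tutorial, and the other two IDs are GO's root terms used as placeholders.

```python
from collections import defaultdict

# Single-letter aspect codes as used in GO annotation files.
ASPECTS = {
    "C": "cellular component",   # in what parts of the cell?
    "F": "molecular function",   # exercising what molecular functions?
    "P": "biological process",   # with what biological processes?
}

def answers_by_aspect(annotations, gene_product):
    """Group the GO terms annotated to one gene product by GO aspect."""
    grouped = defaultdict(list)
    for product, go_id, aspect in annotations:
        if product == gene_product:
            grouped[ASPECTS[aspect]].append(go_id)
    return dict(grouped)

# Hypothetical annotation set, for illustration only.
annotations = [
    ("geneX", "GO:0019233", "P"),  # sensory perception of pain
    ("geneX", "GO:0048265", "P"),  # response to pain
    ("geneX", "GO:0005575", "C"),  # cellular_component root term (placeholder)
    ("geneY", "GO:0003674", "F"),  # molecular_function root term (placeholder)
]

print(answers_by_aspect(annotations, "geneX"))
# {'biological process': ['GO:0019233', 'GO:0048265'], 'cellular component': ['GO:0005575']}
```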
Some pain-related terms in GO
GO:0048265 response to pain
GO:0019233 sensory perception of pain
GO:0048266 behavioral response to pain
GO:0019234 sensory perception of fast pain
GO:0019235 sensory perception of slow pain
GO:0051930 regulation of sensory perception of pain
GO:0050967 detection of electrical stimulus during sensory perception of pain
GO:0050968 detection of chemical stimulus involved in sensory perception of pain
GO:0050966 detection of mechanical stimulus involved in sensory perception of pain
17
GO:0050968 detection of chemical stimulus involved in sensory perception of pain
18
GO provides a tool for algorithmic reasoning
19
Hierarchical view representing relations between represented types
20
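One simple kind of algorithmic reasoning the hierarchy supports is upward propagation of annotations: under GO's true path rule, a gene product annotated to a term is also implicitly annotated to all of that term's ancestors. The sketch below assumes a hand-written, simplified set of is_a and part_of links among the pain terms listed earlier; these links are illustrative assumptions and should be checked against a current GO release.

```python
# Hypothetical, simplified parent links among the pain terms listed earlier:
# child -> list of (relation, parent). Not taken from an actual GO release.
PARENTS = {
    "GO:0048266": [("is_a", "GO:0048265")],     # behavioral response to pain
    "GO:0019234": [("is_a", "GO:0019233")],     # fast pain
    "GO:0019235": [("is_a", "GO:0019233")],     # slow pain
    "GO:0050966": [("part_of", "GO:0019233")],  # mechanical stimulus detection
    "GO:0050967": [("part_of", "GO:0019233")],  # electrical stimulus detection
    "GO:0050968": [("part_of", "GO:0019233")],  # chemical stimulus detection
}

def ancestors(term, parents=PARENTS):
    """All terms reachable from `term` via is_a or part_of links (transitive closure)."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagate(direct_annotations, parents=PARENTS):
    """True path rule: annotation to a term implies annotation to every ancestor."""
    return {
        product: set(terms).union(*(ancestors(t, parents) for t in terms))
        for product, terms in direct_annotations.items()
    }

print(sorted(propagate({"geneX": ["GO:0050968"]})["geneX"]))
# ['GO:0019233', 'GO:0050968']
```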
GO allows a new kind of biological research, based on analysis and comparison of the massive quantities of annotations linking GO terms to gene products
21
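A common way to exploit such annotations is term enrichment analysis: asking whether a GO term annotates more gene products in a study set (for example, differentially expressed genes) than chance would predict. A minimal sketch using a one-sided hypergeometric test follows. It is a generic illustration, not the method of any study cited here, and the counts in the example are hypothetical; real analyses also correct for testing many terms at once.

```python
from math import comb

def enrichment_p_value(population, annotated, study, hits):
    """One-sided hypergeometric test for over-representation of one GO term.

    population: gene products in the background set
    annotated:  background gene products annotated to the term
    study:      size of the study set
    hits:       study-set gene products annotated to the term
    Returns P(X >= hits) when drawing `study` items without replacement.
    """
    if not (0 <= hits <= study <= population and 0 <= annotated <= population):
        raise ValueError("inconsistent counts")
    upper = min(annotated, study)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, study - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(population, study)

# Hypothetical counts: 10,000 background genes, 200 annotated to the term,
# a study set of 100 genes of which 10 carry the annotation (about 2 expected).
p = enrichment_p_value(10_000, 200, 100, 10)
print(f"p = {p:.2e}")
```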
One standard method
Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers using functional information captured by GO for given gene product types, and identified 189 as being mutated at significant frequency and thus as providing targets for diagnostic and therapeutic intervention.
Science. 2006 Oct 13;314(5797):268-74.
22
Uses of GO in studies of:
• Biomedical discovery acceleration, with applications to craniofacial development. PMID: 19325874
• Persistent changes in spinal cord gene expression after recovery from inflammatory hyperalgesia: a preliminary study on pain memory. PMID: 18366630
• Spinal cord transcriptional profile analysis reveals protein trafficking and RNA processing as prominent processes regulated by tactile allodynia. PMID: 17069981
• Immune system involvement in abdominal aortic aneurysms. PMID: 17634102
23
$100 mill. invested in literature curation using GO
over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO
experimental results reported in 52,000 scientific journal articles manually annotated by expert biologists using GO
ontologies provide the basis for capturing biological theories in computable form
24
GO is amazingly successful in overcoming problems of balkanization
but it covers only generic biological entities of three sorts:
– cellular components
– molecular functions
– biological processes
and it does not provide representations of diseases, symptoms, …
25
Extending the GO methodology to other domains of biology and medicine
26
Granularity | Independent continuant | Dependent continuant | Occurrent
Organ and organism | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
Cell and cellular component | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO); Phenotypic Quality (PaTO) | Biological Process (GO)
Molecule | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
The Open Biomedical Ontologies (OBO) Foundry
27
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic acid sequences | song.sf.net | Karen Eilbeck
28
OBO Foundry
recognized by NIH as a framework to address mandates for re-usability of data collected through federally funded research
see NIH PAR-07-425: Data Ontologies for Biomedical Research (R01)
29
The OBO Foundry
Initial Candidate Members
– GO Gene Ontology
– CL Cell Ontology
– SO Sequence Ontology
– PATO Phenotype (Quality) Ontology
– FMA Foundational Model of Anatomy
– ChEBI Chemical Entities of Biological Interest
– CARO Common Anatomy Reference Ontology
– PRO Protein Ontology
30
The OBO Foundry
Under development
– Disease Ontology
– Infectious Disease Ontology
– Mammalian Phenotype Ontology
– Plant Trait Ontology
– Environment Ontology
– Ontology for Biomedical Investigations
– Behavior Ontology
– RNA Ontology
– RO Relation Ontology
31
OBO Foundry is organized in terms of Basic Formal Ontology
Each Foundry ontology can be seen as an extension of a single upper-level ontology (BFO)
33
Basic Formal Ontology (BFO)
Continuant
   Independent Continuant
   Dependent Continuant
Occurrent (Process, Event)
http://ifomis.uni-saarland.de/bfo/
34
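The idea that each Foundry ontology extends a single upper-level ontology can be sketched in Python by treating extension as subclassing. This is an illustration only: BFO and the Foundry ontologies are published in OWL and OBO formats, not as Python classes, and the domain class names below are hypothetical stand-ins for a few entries in the OBO Foundry table.

```python
class Entity:
    """Root of this toy rendering of the BFO upper level."""

class Continuant(Entity):
    """Endures through time, preserving its identity through change."""

class IndependentContinuant(Continuant):
    """A bearer of qualities, e.g. an organism, a cell, a molecule."""

class DependentContinuant(Continuant):
    """Exists only in virtue of some bearer, e.g. a quality or a function."""
    def __init__(self, bearer: IndependentContinuant):
        self.bearer = bearer

class Occurrent(Entity):
    """A process or event: unfolds in time and has temporal parts."""

# Hypothetical domain classes, each placed under one BFO category,
# loosely mirroring entries in the OBO Foundry table (CL, ChEBI, GO).
class Cell(IndependentContinuant): pass
class Molecule(IndependentContinuant): pass
class MolecularFunction(DependentContinuant): pass
class BiologicalProcess(Occurrent): pass

def bfo_category(x: Entity) -> str:
    """Return which of the three BFO categories x falls under."""
    for category in (IndependentContinuant, DependentContinuant, Occurrent):
        if isinstance(x, category):
            return category.__name__
    raise TypeError(f"{x!r} is not classified under BFO")

print(bfo_category(Cell()))                         # IndependentContinuant
print(bfo_category(MolecularFunction(Molecule())))  # DependentContinuant
print(bfo_category(BiologicalProcess()))            # Occurrent
```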
Fundamental Dichotomy
• Continuants preserve their identity through change
vs.
• Occurrents (aka processes)
– have temporal parts
– unfold themselves in successive phases
– exist only in their phases
– have all their parts of necessity
35
Ontology and Referent Tracking
[Figure: at the level of types, Continuant (Independent Continuant, e.g. thing; Dependent Continuant, e.g. quality) and Occurrent (process, event); below them, their instances]
36
Granularity | Independent continuant | Dependent continuant | Occurrent
Organ and organism | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Organism-Level Process (GO)
Cell and cellular component | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO); Phenotypic Quality (PaTO) | Cellular Process (GO)
Molecule | Molecule (ChEBI, SO, RNAO, PRO) | Molecular Function (GO) | Molecular Process (GO)
rationale of OBO Foundry coverage (homesteading principle)
37
[Table: OBO Foundry coverage extended to complexes of organisms, with a further column for the environment]
Granularity | Independent continuant | Dependent continuant | Occurrent
Complex of organisms | Family, Community, Deme, Population | Population Phenotype | Population Process
Organ and organism | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
Cell and cellular component | Cell (CL); Cell Component (FMA, GO) | Cellular Function (GO); Phenotypic Quality (PaTO) | Biological Process (GO)
Molecule | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
38
The Gene Ontology (GO)
Continuant
   Independent Continuant: cell component
   Dependent Continuant: molecular function
Occurrent: biological process
Kumar A, Smith B, Borgelt C. Dependence relationships between Gene Ontology terms based on TIGR gene product annotations. CompuTerm 2004, 31-38.
Bada M, Hunter L. Enrichment of OBO Ontologies. J Biomed Inform. 2006 Jul 26.
39
Users of BFO
GO / OBO Foundry
NCI BiomedGT
SNOMED CT
ACGT Clinical Genomics Trials on Cancer – Master Ontology / Formbuilder (Case Report Forms for Cancer Clinical Trials)
Ontology for Risks Against Patient Safety (RAPS) (EU)
40
Users of BFO
MediCognos / Microsoft Healthvault
Cleveland Clinic Semantic Database in Cardiothoracic Surgery
Major Histocompatibility Complex (MHC) Ontology (NIAID)
Neuroscience Information Framework Standard (NIFSTD)
41
IDO Infectious Disease Ontology
• MITRE, Mount Sinai, UT Southwestern – Influenza
• IMBB/VectorBase – Vector-borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)
• Colorado State University – Dengue Fever
• Duke University – Tuberculosis, Staph. aureus
• Case Western Reserve – Infective Endocarditis
• University of Michigan – Brucellosis
42
Users of BFO
Interdisciplinary Prostate Ontology (IPO)
Nanoparticle Ontology (NPO): Ontology for Cancer Nanotechnology Research
Neural Electromagnetic Ontologies (NEMO): Ontology-based Tools for Representation and Integration of Event-related Brain Potentials
Ontology for General Medical Science
43
depends_on
[Figure: BFO types Continuant (Independent Continuant: thing; Dependent Continuant: quality) and Occurrent (process, event), with instances below]
quality depends on bearer
44
Specifically dependent continuants
• the quality of whiteness of this cheese
• your role as lecturer
• the disposition of this patient to experience diarrhea
45
depends_on
[Figure: BFO types Continuant (Independent Continuant: thing; Dependent Continuant: quality) and Occurrent (process), with instances below]
temperature depends on bearer
46
Realizable dependent continuants
plan
function
role
disposition
capability
tendency
continuants
47
Their realizations
execution
expression
exercise
realization
application
course
occurrents
48
Continuant
   Independent Continuant
   Dependent Continuant
      Non-realizable Dependent Continuant (quality)
      Realizable Dependent Continuant (function, role, disposition)
49
realization depends_on disposition
[Figure: Continuant (Independent Continuant: bearer; Dependent Continuant: disposition) and Occurrent (process of realization), with instances below]
50
Dependence
a is dependent on b =def. a is necessarily such that if b ceases to exist then a ceases to exist
51
Specifically Dependent Continuants
Specifically Dependent Continuant: if any bearer ceases to exist, then the quality or function ceases to exist
• Quality, Pattern: the color of my skin; my weight
• Realizable Dependent Continuant: the function of my heart to pump blood
52
Generically Dependent Continuants
Generically Dependent Continuant (copyability): if one bearer ceases to exist, then the entity can survive, because there are other bearers
• Information Object: the pdf file on my laptop
• Gene Sequence: the DNA (sequence) in this chromosome
53
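The difference between specific and generic dependence can be made concrete with a toy model of existence. In this sketch (an illustration, not part of BFO itself) a specifically dependent continuant has exactly one bearer and ceases when that bearer ceases, while a generically dependent continuant such as a file survives as long as at least one bearer of a copy remains. The entity names, including the backup drive, are hypothetical.

```python
class World:
    """Toy model of existential dependence between continuants."""

    def __init__(self):
        self.bearers = set()   # independent continuants that currently exist
        self.specific = {}     # specifically dependent continuant -> its one bearer
        self.generic = {}      # generically dependent continuant -> set of bearers

    def add_bearer(self, bearer):
        self.bearers.add(bearer)

    def add_specific(self, dependent, bearer):
        if bearer not in self.bearers:
            raise ValueError("a specifically dependent continuant needs an existing bearer")
        self.specific[dependent] = bearer

    def add_generic(self, dependent, bearers):
        bearers = set(bearers)
        if not bearers or not bearers <= self.bearers:
            raise ValueError("a generically dependent continuant needs existing bearers")
        self.generic[dependent] = bearers

    def destroy(self, bearer):
        """The bearer ceases to exist; propagate the consequences."""
        self.bearers.discard(bearer)
        # Specific dependence: anything depending on this bearer ceases too.
        self.specific = {d: b for d, b in self.specific.items() if b != bearer}
        # Generic dependence: survive while at least one other bearer remains.
        for dependent in list(self.generic):
            self.generic[dependent].discard(bearer)
            if not self.generic[dependent]:
                del self.generic[dependent]

    def exists(self, entity):
        return entity in self.bearers or entity in self.specific or entity in self.generic


w = World()
for b in ("my skin", "my laptop", "backup drive"):
    w.add_bearer(b)
w.add_specific("the color of my skin", "my skin")
w.add_generic("the pdf file", ["my laptop", "backup drive"])

w.destroy("my laptop")
print(w.exists("the pdf file"))          # True: another copy survives
w.destroy("my skin")
print(w.exists("the color of my skin"))  # False: its only bearer is gone
```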
Four distinct classificatory tasks
1. of people (patients, carriers, …)
2. of diseases (cases, instances, problems, …)
3. of courses of disease (symptoms, treatments, …)
4. of representations (records, observations, data, diagnoses, …)
ICD confuses 1. and 2.
HL7 and most standard terminologies confuse 2. and 4.
54
Four distinct BFO categories
1. person (patient, carrier, …)
– independent continuant
2. disease (case, instance, problem, …)
– specifically dependent continuant
3. course of disease (symptom, treatment…)
– occurrent
4. representation (record, datum, diagnosis…)
– generically dependent continuant
55
Four distinct BFO categories
1. people (patients, carriers, …)
– independent continuants
2. diseases (cases, instances, problems, …)
– dispositions
3. courses of disease (symptoms, treatments…)
– realizations of dispositions
4. representations (records, data, diagnoses…)
– generically dependent continuants
56
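Keeping the four categories apart has direct consequences for data modelling. A minimal sketch with Python dataclasses, assuming one type per BFO category: a diagnosis is a separate record that is about a patient, so it can be checked against, and can disagree with, the disease the patient actually bears. This is the distinction between 2. and 4. that, according to these slides, most standard terminologies blur. Field names and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Person:             # 1. independent continuant
    name: str

@dataclass(frozen=True)
class Disease:            # 2. disposition borne by a person
    disease_type: str
    bearer: Person

@dataclass(frozen=True)
class DiseaseCourse:      # 3. occurrent: a realization of the disposition
    realizes: Disease
    start: str
    end: Optional[str] = None

@dataclass(frozen=True)
class Diagnosis:          # 4. generically dependent continuant: a representation
    about: Person
    asserted_type: str
    author: str

def diagnosis_is_correct(diagnosis, diseases):
    """True if some disease in `diseases` matches what the diagnosis asserts."""
    return any(
        d.bearer == diagnosis.about and d.disease_type == diagnosis.asserted_type
        for d in diseases
    )

patient = Person("patient X")
flu = Disease("influenza", patient)
course = DiseaseCourse(realizes=flu, start="day 1")   # hypothetical time label
print(diagnosis_is_correct(Diagnosis(patient, "influenza", "clinician A"), {flu}))  # True
print(diagnosis_is_correct(Diagnosis(patient, "cirrhosis", "clinician A"), {flu}))  # False
```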
Big Picture (with thanks to Richard Scheuermann)
57
A disease is a disposition rooted in a physical disorder in the organism and realized in pathological processes.
etiological process
 produces
disorder
 bears
disposition
 realized_in
pathological process
 produces
abnormal bodily features
 recognized_as
signs & symptoms
 used_in
interpretive process
 produces
diagnosis
58
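The diagram can be read as a chain of typed links. A minimal sketch, assuming the chain is stored as (subject, relation, object) triples using the relation names from the diagram, shows how software could walk from an etiological process through to a diagnosis. The `walk` helper is hypothetical, and the cirrhosis labels come from the worked example in this tutorial.

```python
# The "Big Picture" chain as (subject, relation, object) triples.
CHAIN = [
    ("etiological process", "produces", "disorder"),
    ("disorder", "bears", "disposition"),
    ("disposition", "realized_in", "pathological process"),
    ("pathological process", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "signs & symptoms"),
    ("signs & symptoms", "used_in", "interpretive process"),
    ("interpretive process", "produces", "diagnosis"),
]

def walk(start, triples=CHAIN):
    """Follow the outgoing link from each node in turn; returns [(relation, node), ...]."""
    outgoing = {subject: (relation, obj) for subject, relation, obj in triples}
    path, node, seen = [], start, {start}
    while node in outgoing:
        relation, node = outgoing[node]
        if node in seen:          # stop rather than loop forever on a cycle
            break
        seen.add(node)
        path.append((relation, node))
    return path

# Labels from the cirrhosis example, keyed by the generic node they instantiate.
cirrhosis = {
    "etiological process": "phenobarbital-induced hepatic cell death",
    "disorder": "necrotic liver",
    "disposition": "cirrhosis",
}

start = "etiological process"
print(f"{start}: {cirrhosis[start]}")
for relation, node in walk(start):
    print(f"  {relation} -> {node}: {cirrhosis.get(node, '...')}")
```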
Elucidation of Primitive Terms
• ‘bodily feature’ - an abbreviation for a physical component, a bodily quality, or a bodily process.
• disposition - an attribute describing the propensity to initiate certain specific sorts of processes when certain conditions are satisfied.
• clinically abnormal - some bodily feature that
(1) is not part of the life plan for an organism of the relevant type (unlike aging or pregnancy),
(2) is causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and
(3) is such that the elevated risk exceeds a certain threshold level.*
*Compare: baldness
59
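The three conditions in the elucidation of clinically abnormal combine conjunctively. A minimal sketch, in which the risk measure and the threshold are hypothetical numeric stand-ins; the elucidation itself does not say how risk is quantified or where the threshold lies.

```python
def is_clinically_abnormal(part_of_life_plan: bool,
                           causally_linked_to_risk: bool,
                           elevated_risk: float,
                           threshold: float) -> bool:
    """All three conditions must hold:
    (1) the feature is not part of the life plan for an organism of the relevant type,
    (2) it is causally linked to an elevated risk of pain or other feelings of
        illness, or of death or dysfunction, and
    (3) that elevated risk exceeds the threshold level.
    The numeric risk and threshold are hypothetical stand-ins.
    """
    return (not part_of_life_plan) and causally_linked_to_risk and elevated_risk > threshold

# Hypothetical values, for illustration only.
print(is_clinically_abnormal(True, True, 0.4, 0.1))    # False: e.g. pregnancy is part of the life plan
print(is_clinically_abnormal(False, True, 0.4, 0.1))   # True
print(is_clinically_abnormal(False, True, 0.02, 0.1))  # False: risk below the threshold
```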
Definitions - Foundational Terms
• Disorder =def. – A causally linked combination of physical components that is clinically abnormal.
• Pathological Process =def. – A bodily process that is a manifestation of a disorder and is clinically abnormal.
• Disease =def. – A disposition (i) to undergo pathological processes that (ii) exists in an organism because of one or more disorders in that organism.
60
Dispositions and Predispositions
• All diseases are dispositions; not all dispositions are diseases.
• A predisposition is a disposition.
• Predisposition to Disease of Type X =def. – A disposition in an organism that constitutes an increased risk of the organism’s subsequently developing the disease X.
• HNPCC is caused by a disorder (mutation) in a DNA mismatch repair gene that disposes to the acquisition of additional mutations from defective DNA repair processes, and thus is a predisposition to the development of colon cancer.
61
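The definition speaks of an increased risk without fixing how risk is measured. One common epidemiological way to make this concrete is a risk ratio comparing carriers of a candidate predisposition with non-carriers, where a ratio above 1 indicates increased risk. The sketch below is illustrative only, and the cohort counts are hypothetical, not HNPCC data.

```python
from fractions import Fraction

def risk_ratio(carrier_cases, carriers, noncarrier_cases, noncarriers):
    """Risk of developing disease X among carriers divided by the risk among non-carriers.
    Raises ZeroDivisionError if no non-carrier develops the disease."""
    risk_carriers = Fraction(carrier_cases, carriers)
    risk_noncarriers = Fraction(noncarrier_cases, noncarriers)
    return risk_carriers / risk_noncarriers

# Hypothetical cohort: 30 of 100 carriers and 5 of 100 non-carriers develop disease X.
rr = risk_ratio(30, 100, 5, 100)
print(rr, rr > 1)   # 6 True
```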
Definitions - Clinical Evaluation Terms
• Sign =def. – A bodily feature of a patient that is observed in a physical examination and is deemed by the clinician to be of clinical significance. (Objectively observable features)
• Symptom =def. – An experienced bodily feature of a patient that is observed by and observable only by the patient and is of the type that can be hypothesized by a patient to be a realization of a disease. (A restricted family of phenomena including pain, nausea, anger, drowsiness, which are of their nature experienced in the first person)
Symptoms are subjective. But this does not mean that there is no objective fact of the matter whether a given symptom exists.
62
Cirrhosis - environmental exposure
Etiological process - phenobarbital-induced hepatic cell death
 produces
Disorder - necrotic liver
 bears
Disposition (disease) - cirrhosis
 realized_in
Pathological process - abnormal tissue
repair with cell proliferation and
fibrosis that exceed a certain
threshold; hypoxia-induced cell death
 produces
Abnormal bodily features
 recognized_as
Symptoms - fatigue, anorexia
Signs - jaundice, splenomegaly
Symptoms & Signs
 used_in
Interpretive process
 produces
Hypothesis - rule out cirrhosis
 suggests
Laboratory tests
 produces
Test results - elevated liver enzymes in
serum
 used_in
Interpretive process
 produces
Result - diagnosis that patient X has a
disorder that bears the disease
cirrhosis
63
Influenza - infectious
Etiological process - infection of
airway epithelial cells with influenza
virus
 produces
Disorder - viable cells with influenza
virus
 bears
Disposition (disease) - flu
 realized_in
Pathological process - acute
inflammation
 produces
Abnormal bodily features
 recognized_as
Symptoms - weakness, dizziness
Signs - fever
Symptoms & Signs
 used_in
Interpretive process
 produces
Hypothesis - rule out influenza
 suggests
Laboratory tests
 produces
Test results - elevated serum antibody titers
 used_in
Interpretive process
 produces
Result - diagnosis that patient X has a
disorder that bears the disease flu
But the disorder also induces normal physiological processes (immune response) that can result in the elimination of the disorder (transient disease course).
64
Huntington’s Disease - genetic
Etiological process - inheritance of
>39 CAG repeats in the HTT gene
 produces
Disorder - chromosome 4 with
abnormal mHTT
 bears
Disposition (disease) - Huntington’s
disease
 realized_in
Pathological process - accumulation of
mHTT protein fragments, abnormal
transcription regulation, neuronal cell
death in striatum
 produces
Abnormal bodily features
 recognized_as
Symptoms - anxiety, depression
Signs - difficulties in speaking and
swallowing
Symptoms & Signs
 used_in
Interpretive process
 produces
Hypothesis - rule out Huntington’s
 suggests
Laboratory tests
 produces
Test results - molecular detection of
the HTT gene with >39 CAG repeats
 used_in
Interpretive process
 produces
Result - diagnosis that patient X has a
disorder that bears the disease
Huntington’s disease
65
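The >39 CAG repeat criterion in this example is a counting rule, so a toy string scan can illustrate it: find the longest uninterrupted run of CAG triplets and compare it with the threshold stated in the example. This is purely illustrative; clinical testing does not work by scanning a text string, and the example sequences are made up.

```python
import re

# Threshold as stated in the Huntington's example: more than 39 CAG repeats.
REPEAT_THRESHOLD = 39

def longest_cag_run(sequence: str) -> int:
    """Number of repeats in the longest uninterrupted run of CAG triplets."""
    # CAG cannot overlap itself, so non-overlapping matches capture every maximal run.
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)

def exceeds_threshold(sequence: str) -> bool:
    return longest_cag_run(sequence) > REPEAT_THRESHOLD

normal_like = "ATG" + "CAG" * 20 + "CCG" * 5   # made-up sequence
expanded = "ATG" + "CAG" * 42 + "CCG" * 5      # made-up sequence
print(longest_cag_run(normal_like), exceeds_threshold(normal_like))  # 20 False
print(longest_cag_run(expanded), exceeds_threshold(expanded))        # 42 True
```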
HNPCC - genetic predisposition
Etiological process - inheritance of a mutant mismatch repair gene
 produces
Disorder - chromosome 3 with abnormal hMLH1
 bears
Disposition (disease) - Lynch syndrome
 realized_in
Pathological process - abnormal repair of DNA mismatches
 produces
Disorder - mutations in proto-oncogenes and tumor suppressor genes with
microsatellite repeats (e.g. TGF-beta R2)
 bears
Disposition (disease) - non-polyposis colon cancer
 realized_in
Symptoms (including pain)
66
Definition: Etiology
• Etiological Process =def. – A process in an organism that leads to a subsequent disorder.
• Examples: toxic chemical exposure resulting in a mutation in the genomic DNA of a cell; infection of a human with a pathogenic virus; inheritance of two defective copies of a metabolic gene
• The etiological process creates the physical basis of that disposition to pathological processes which is the disease.
67
Definitions - Diagnosis
• Clinical Picture =def. – A representation of a clinical phenotype that is inferred from the combination of laboratory, image and clinical findings about a given patient.
• Diagnosis =def. – A representation of a conclusion of an interpretive process that has as input a clinical picture of a given patient and as output an assertion to the effect that the patient has a disease of such and such a type.
68
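These definitions treat a diagnosis as the output of an interpretive process whose input is a clinical picture. A toy sketch of that input/output shape follows; the findings-to-disease criteria are hypothetical simplifications of the worked examples in this tutorial, not clinical decision rules.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClinicalPicture:
    """Representation of a patient's clinical phenotype inferred from findings."""
    patient: str
    findings: frozenset

@dataclass(frozen=True)
class Diagnosis:
    """Assertion that the patient has a disease of a given type."""
    patient: str
    disease_type: str

# Hypothetical criteria loosely based on the cirrhosis and influenza examples.
CRITERIA = {
    "cirrhosis": frozenset({"jaundice", "splenomegaly", "elevated liver enzymes in serum"}),
    "influenza": frozenset({"fever", "elevated serum antibody titers"}),
}

def interpret(picture: ClinicalPicture, criteria=CRITERIA) -> list:
    """Toy interpretive process: assert every disease type whose criteria are all met."""
    return [
        Diagnosis(picture.patient, disease_type)
        for disease_type, required in sorted(criteria.items())
        if required <= picture.findings
    ]

picture = ClinicalPicture("patient X", frozenset({"fever", "weakness", "elevated serum antibody titers"}))
print(interpret(picture))
# [Diagnosis(patient='patient X', disease_type='influenza')]
```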
Definitions - Qualities
• Manifestation of a Disease =def. – A bodily feature of a patient that is (a) a deviation from clinical normality that exists in virtue of the realization of a disease and (b) observable.
Observability includes being observable through elicitation of a response or through the use of special instruments.
• Preclinical Manifestation of a Disease =def. – A manifestation of a disease that exists prior to its becoming detectable in a clinical history taking or physical examination.
• Clinical Manifestation of a Disease =def. – A manifestation of a disease that is detectable in a clinical history taking or physical examination.
• Phenotype =def. – A (combination of) bodily feature(s) of an organism determined by the interaction of its genetic make-up and environment.
• Clinical Phenotype =def. – A clinically abnormal phenotype.
69