New York State
Center of Excellence in
Bioinformatics & Life Sciences
Biomedical Ontology in Buffalo
Part I: The Gene Ontology
Barry Smith and Werner Ceusters
Biomedical data is siloed
• Lab / pathology data
• Electronic Health Record data
• Clinical trial data
• Patient histories
• Medical imaging
• Microarray data
• Protein chip data
• Flow cytometry
• Genotype / SNP data
Biomedical data is siloed
Data in Pittsburgh
Data owned by Medicare
Data owned by the NIH
Data owned by HIV researchers
Data owned by the Cleveland Clinic
Data owned by regional health organizations
Data owned by mouse biologists
Data owned by Dr McFritz
 NIH mandates for data reusability
Department of Philosophy
135 Park Hall
University at Buffalo
Buffalo NY 14260
Ontology: An antidote to silos
promoting:
• information retrieval
• information consistency, and thus continuity and
cumulation
• information integration
• reasoning
Uses of ‘ontology’ in PubMed abstracts
By far the most successful:
GO (The Gene Ontology)
You’re interested in which genes control heart muscle development
17,536 results
Microarray data shows changed expression of thousands of genes.
How will you spot the patterns?
[Figure: gene expression heat map over time, control vs. attacked, clustered with a Pearson tree. Cluster labels: Defense response; Immune response; Response to stimulus; Toll regulated genes; JAK-STAT regulated genes; Puparial adhesion; Molting cycle; Hemocyanin; Amino acid catabolism; Lipid metabolism; Peptidase activity; Protein catabolism]
You’re interested in which of your hospital’s patient data is relevant to
understanding how genes control heart muscle development
Lab / pathology data
EHR data
Clinical trial data
Family history data
Medical imaging
Microarray data
Model organism data
Flow cytometry
Mass spec
Genotype / SNP data
How will you spot the patterns?
How will you find the data you
need?
GO provides a controlled system of 25,000
categories for use in annotating data
• multi-species (model organism research)
• multi-disciplinary
• open source
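As a minimal sketch of what annotating data against a controlled system of categories buys you, the example below indexes annotations by term ID so that gene products from different species can be retrieved with one query. The term IDs (`EX:0001`, `EX:0002`), gene product names, and annotations are all hypothetical placeholders, not real GO identifiers or curated data.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: term ID -> label.
# These are placeholders, not real GO identifiers.
TERMS = {
    "EX:0001": "cardiac muscle development",
    "EX:0002": "immune response",
}

# Hypothetical annotations: (species, gene product, term ID).
ANNOTATIONS = [
    ("human", "GeneA", "EX:0001"),
    ("mouse", "GeneB", "EX:0001"),
    ("fly", "GeneC", "EX:0002"),
    ("zebrafish", "GeneD", "EX:0001"),
]


def build_index(annotations):
    """Map each term ID to the (species, gene product) pairs annotated with it."""
    index = defaultdict(list)
    for species, product, term_id in annotations:
        # Reject free-text labels: only terms from the vocabulary are allowed.
        if term_id not in TERMS:
            raise ValueError(f"unknown term: {term_id}")
        index[term_id].append((species, product))
    return index


INDEX = build_index(ANNOTATIONS)


def products_for(term_id):
    """Return all (species, gene product) pairs annotated with the term, sorted."""
    return sorted(INDEX.get(term_id, []))


print(products_for("EX:0001"))
```

Because every data source uses the same term ID rather than its own wording, the query spans species and databases without any string matching.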
Definitions
Gene products involved in cardiac muscle
development in humans
The GO categorizations are organized in a
way which provides a tool for algorithmic
reasoning
Hierarchical view representing
relations between represented types
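One way to picture the algorithmic reasoning that a hierarchy supports is a subsumption query: a gene product annotated with a specific term is also returned when you ask for any more general term above it. The sketch below assumes a small, hypothetical parent map and hypothetical annotations; it is not the GO's actual structure or any GO tool's API.

```python
# Hypothetical hierarchy: term -> list of more general parent terms.
PARENTS = {
    "cardiac muscle development": ["muscle development", "heart development"],
    "muscle development": ["developmental process"],
    "heart development": ["organ development"],
    "organ development": ["developmental process"],
    "developmental process": [],
}

# Hypothetical annotations: gene product -> most specific term assigned.
ANNOTATED = {
    "GeneA": "cardiac muscle development",
    "GeneB": "heart development",
    "GeneC": "muscle development",
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def annotated_under(query):
    """Gene products annotated with the query term or any of its descendants."""
    return sorted(
        gene
        for gene, term in ANNOTATED.items()
        if term == query or query in ancestors(term)
    )


print(annotated_under("heart development"))
```

A term may have more than one parent, so the traversal tracks visited terms to avoid processing shared ancestors twice.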
$100 million invested in literature curation using GO
over 11 million annotations relating gene
products described in the UniProt, Ensembl
and other databases to terms in the GO
experimental results reported in 52,000
scientific journal articles manually annotated by
expert biologists using GO
One standard method
Sjöblom T, et al. analyzed 13,023 genes in 11 breast
and 11 colorectal cancers
using baseline functional information captured by
GO for given gene product types
identified 189 genes as being mutated at significant
frequency and thus as providing targets for
diagnostic and therapeutic intervention.
Science. 2006 Oct 13;314(5797):268-74.
Uses of GO in studies of:
• Persistent changes in spinal cord gene expression after
recovery from inflammatory hyperalgesia: a preliminary
study on pain memory. PMID: 18366630
• Spinal cord transcriptional profile analysis reveals protein
trafficking and RNA processing as prominent processes
regulated by tactile allodynia. PMID: 17069981
• Immune system involvement in abdominal aortic
aneurysms. PMID: 17634102
• Biomedical discovery acceleration, with applications to
craniofacial development. PMID: 19325874
Ontology in Buffalo
Part 2: Problems of Clinical Ontologies
Source of all data
Reality!
Ultimate goal
A digital copy of the world
Requirements for this digital copy
• R1: A faithful representation of reality
• R2 … of everything that is digitally registered,
what is generic → scientific theories
what is specific → what individual entities exist and how they relate
• R3 … which is computable, in order to …
… allow queries over the world’s past and present
… make predictions (diagnostic support, early warnings …)
… fill in gaps
… identify mistakes
...
… the ultimate crystal ball
The ‘binding’ wall
A cartoon of the world
“Better Information” must cover …
Patient-specific information
• EHR-EMR-ENR-…
• PHR
• Various modality-related databases
– Lab, imaging, …
Scientific “knowledge”
• Textbooks
• Classification systems
• Terminologies
• Ontologies
[Diagram labels on the slide: 1, 2, 3]
Key question
How to extend to clinical medicine the standard
of quality of the GO and other ontologies based
in biological science?
[Figure: NCI Thesaurus (April 2008), diagram label 2]
[Figure: NCI Thesaurus (April 2008), diagram label 2, annotated with “?”]
MeSH: some paths from top to Wolfram Syndrome
[Figure: MeSH hierarchy diagram, diagram label 2. Nodes shown:]
All MeSH Categories
Diseases Category
Nervous System Diseases
Male Urogenital Diseases
Eye Diseases
Cranial Nerve Diseases
Optic Nerve Diseases (shown twice)
Eye Diseases, Hereditary
Optic Atrophy
Female Urogenital Diseases and Pregnancy Complications
Female Urogenital Diseases
Neurodegenerative Diseases
Heredodegenerative Disorders, Nervous System
Urologic Diseases
Kidney Diseases
Optic Atrophies, Hereditary
Diabetes Insipidus
Wolfram Syndrome
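To make "some paths from top to Wolfram Syndrome" concrete, the sketch below enumerates every path from a root to a target in a hierarchy where a term may have several parents. The node names come from the slide, but the parent-child edges are an assumption reconstructed for illustration (the extracted slide lists only the nodes) and should be checked against MeSH itself before being relied on.

```python
# ASSUMED edges (parent -> children); node names are from the slide,
# the links between them are reconstructed for illustration only.
CHILDREN = {
    "Diseases Category": [
        "Nervous System Diseases",
        "Eye Diseases",
        "Male Urogenital Diseases",
        "Female Urogenital Diseases and Pregnancy Complications",
    ],
    "Nervous System Diseases": ["Cranial Nerve Diseases", "Neurodegenerative Diseases"],
    "Cranial Nerve Diseases": ["Optic Nerve Diseases"],
    "Eye Diseases": ["Optic Nerve Diseases", "Eye Diseases, Hereditary"],
    "Optic Nerve Diseases": ["Optic Atrophy"],
    "Optic Atrophy": ["Optic Atrophies, Hereditary"],
    "Eye Diseases, Hereditary": ["Optic Atrophies, Hereditary"],
    "Neurodegenerative Diseases": ["Heredodegenerative Disorders, Nervous System"],
    "Heredodegenerative Disorders, Nervous System": ["Optic Atrophies, Hereditary"],
    "Optic Atrophies, Hereditary": ["Wolfram Syndrome"],
    "Male Urogenital Diseases": ["Urologic Diseases"],
    "Female Urogenital Diseases and Pregnancy Complications": ["Female Urogenital Diseases"],
    "Female Urogenital Diseases": ["Urologic Diseases"],
    "Urologic Diseases": ["Kidney Diseases"],
    "Kidney Diseases": ["Diabetes Insipidus"],
    "Diabetes Insipidus": ["Wolfram Syndrome"],
}


def paths(node, target, children=CHILDREN):
    """Return every downward path from node to target as a list of node names."""
    if node == target:
        return [[node]]
    found = []
    for child in children.get(node, []):
        for tail in paths(child, target, children):
            found.append([node] + tail)
    return found


ALL_PATHS = paths("Diseases Category", "Wolfram Syndrome")
for path in ALL_PATHS:
    print(" > ".join(path))
```

Under these assumed edges the term is reached through both Male Urogenital Diseases and Female Urogenital Diseases, which is the kind of ambiguity that matters once the hierarchy is applied to an individual patient.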
What would it mean if used in the context of a patient?
[Figure: the MeSH hierarchy from All MeSH Categories down to Wolfram Syndrome, shown again with the same nodes, diagram label 3, and annotated with “???” (twice), “has” (twice) and “…”]
Biomedical Ontology in Buffalo
Part 3: What we do
The GO is amazingly successful in
overcoming silo problems
but it covers only generic biological entities of
three sorts:
– cellular components
– molecular functions
– biological processes
and it does not provide representations of diseases,
symptoms, …
The core of biomedical ontology in Buffalo
– extending the methodology of high quality
ontologies to other domains of biology and
medicine, and to EHRs and coding systems
– combining ontology with referent tracking
GRANULARITY / RELATION TO TIME | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
The Open Biomedical Ontologies (OBO) Foundry
National Center for Biomedical Ontology (NCBO)
NIH Roadmap Center for Biomedical Computing
Collaboration of:
Stanford Biomedical Informatics Research
Mayo Clinic
University at Buffalo
National Center for Ontological Research
(NCOR)
• Army Net-Centric Data Strategy Center of
Excellence
– Biometrics Ontology
– Command and Control Ontology
– Universal Core Semantic Layer
Current funded biomedical ontology projects
• Protein Ontology (PRO) (NIH/NIGMS)
• Infectious Disease Ontology (IDO) (NIH/NIAID)
• Realism-Based Versioning for Biomedical Ontologies
(SNOMED) (NIH/NLM)
• Ontology for Risks Against Patient Safety (RAPS) (EU)
• DSM Ontology (to support work on revision of the Diagnostic
and Statistical Manual of Mental Disorders)
• Cleveland Clinic Semantic Database in Cardiothoracic
Surgery
IDO Consortium
• MITRE, Mount Sinai, UT Southwestern – Influenza
• IMBB/VectorBase – Vector-borne diseases (A.
gambiae, A. aegypti, I. scapularis, C. pipiens, P.
humanus)
• Colorado State University – Dengue Fever
• Duke University – Tuberculosis
• Cleveland Clinic – Infective Endocarditis
• University of Michigan – Brucellosis
“Better Information” must cover …
Patient-specific information
• EHR-EMR-ENR-…
• PHR
• Various modality-related databases
– Lab, imaging, …
Scientific “knowledge”
• Textbooks
• Classification systems
• Terminologies
• Ontologies
[Diagram labels on the slide: 1, 2, 3]
Ontologies
Keeping track of what is general (diabetes,
malaria, nasal bone, nose …)
Referent tracking
Keeping track of what is particular (this particular
nasal bone, this particular fracture, this particular
swimming pool, this particular image …)
eyeGENE
Ontology for Risks Against Patient Safety
REMINE: RT-based adverse event analysis
IUI | Particular description | Properties
#1 | the patient who is treated | #1 member C1 since t2
#2 | #1’s treatment | #2 instance_of C3; #2 has_participant #1 since t2; #2 has_agent #3 since t2
#3 | the physician responsible for #2 | #3 member C4 since t2
#4 | #1’s arthrosis | #4 member C5 since t1
#5 | #1’s anti-inflammatory treatment | #5 part_of #2; #5 member C2 since t3
#6 | #1’s physiotherapy | #6 part_of #2
#7 | #1’s stomach | #7 member C6 since t2
#8 | #7’s structure integrity | #8 instance_of C8 since t0; #8 inheres_in #7 since t0
#9 | #1’s stomach ulcer | #9 part_of #7 since t3
#10 | coming into existence of #9 | #10 has_participant #9 at t3
#11 | change brought about by #9 | #11 has_agent #9 since t3; #11 has_participant #8 since t3; #11 instance_of C10 at t3
#12 | noticing the presence of #9 | #12 has_participant #9 at t3+x; #12 has_agent #3 at t3+x
#13 | cognitive representation in #3 about #9 | #13 is_about #9 since t3+x
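The properties column of the REMINE slide can be read as a set of time-stamped statements about uniquely identified particulars. The sketch below stores them as tuples and runs two simple queries. The `Fact` type and the helper functions are hypothetical names chosen for illustration, not part of any referent tracking system's API; the statements themselves are copied from the slide.

```python
from typing import NamedTuple, Optional


class Fact(NamedTuple):
    """One statement about a particular: subject, relation, object, time qualifier."""
    subject: str
    relation: str
    obj: str
    time: Optional[str]  # None where the slide gives no time qualifier


# Statements copied from the slide's Properties column.
FACTS = [
    Fact("#1", "member", "C1", "since t2"),
    Fact("#2", "instance_of", "C3", None),
    Fact("#2", "has_participant", "#1", "since t2"),
    Fact("#2", "has_agent", "#3", "since t2"),
    Fact("#3", "member", "C4", "since t2"),
    Fact("#4", "member", "C5", "since t1"),
    Fact("#5", "part_of", "#2", None),
    Fact("#5", "member", "C2", "since t3"),
    Fact("#6", "part_of", "#2", None),
    Fact("#7", "member", "C6", "since t2"),
    Fact("#8", "instance_of", "C8", "since t0"),
    Fact("#8", "inheres_in", "#7", "since t0"),
    Fact("#9", "part_of", "#7", "since t3"),
    Fact("#10", "has_participant", "#9", "at t3"),
    Fact("#11", "has_agent", "#9", "since t3"),
    Fact("#11", "has_participant", "#8", "since t3"),
    Fact("#11", "instance_of", "C10", "at t3"),
    Fact("#12", "has_participant", "#9", "at t3+x"),
    Fact("#12", "has_agent", "#3", "at t3+x"),
    Fact("#13", "is_about", "#9", "since t3+x"),
]


def involving(iui, facts=FACTS):
    """All statements in which the particular appears as subject or object."""
    return [f for f in facts if f.subject == iui or f.obj == iui]


def parts_of(iui, facts=FACTS):
    """IUIs of particulars recorded as part_of the given particular."""
    return sorted(f.subject for f in facts if f.relation == "part_of" and f.obj == iui)


# Everything recorded about the stomach ulcer (#9).
for fact in involving("#9"):
    print(fact)
print(parts_of("#2"))
```

Because each particular has its own identifier, a query about the ulcer (#9) pulls together its location, its onset, its effects, and who noticed it, without relying on free-text descriptions.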