The Future of Biomedical Informatics
Barry Smith
University at Buffalo
http://ontology.buffalo.edu/smith
1
1. Biomedical Informatics Needs Data
2. The Problem of Local Coding Schemes
3. NIH Policies for Data Reusability and the Growth of Clinical Research Consortia
4. Is SNOMED the Solution?
5. The Gene Ontology
6. The OBO Foundry
7. The National Center for Biomedical Ontology
8. Ontology in Buffalo
2
1. Biomedical Informatics Needs Data
3
Biomedical Informatics Needs Data
• Four sides of the equation of translational
medicine
• Biological data + clinical data
• Access + usability
4
Problems of gaining access to
clinical data
1. privacy, security, liability
2. incentives (value of data ...)
3. costs (training ...)
5
Making data (re-)usable
through standards
• Standards provide
– common structure and terminology
– single data source for review (less redundant
data)
• Standards allow
– use of common tools and techniques
– common training
– single validation of data
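The "single validation of data" point can be sketched as follows. This is a minimal illustration only: the vocabulary `SPECIMEN_TERMS`, the record fields, and the function name `validate_record` are hypothetical and do not come from any actual standard.

```python
# Hypothetical controlled vocabulary shared by all sites (illustrative only).
SPECIMEN_TERMS = {"blood", "serum", "urine", "tissue"}


def validate_record(record: dict) -> list[str]:
    """Return the problems found in one record; an empty list means valid."""
    problems = []
    specimen = record.get("specimen")
    if specimen is None:
        problems.append("missing field: specimen")
    elif specimen not in SPECIMEN_TERMS:
        problems.append(f"unknown specimen term: {specimen!r}")
    return problems


# Made-up records from different sites.
records = [
    {"id": 1, "specimen": "blood"},
    {"id": 2, "specimen": "Blut"},  # local, non-standard term
    {"id": 3},                      # field absent
]

# One validation routine serves every site that adopts the shared vocabulary.
report = {r["id"]: validate_record(r) for r in records}
```

Because every site codes against the same term list, the check is written once and reused, rather than once per local scheme.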
6
Problems with standards
• Not all standards are of equal quality
• Once a bad standard is set in stone you are
creating problems for your children and for
your children’s children
• Standards, especially bad standards, have
costs
7
2. The Problem of Local Coding Schemes
8
Multiple kinds of data in multiple
kinds of silos
Lab / pathology data
Clinical trial data, including regulatory data
Electronic Health Record data
Patient histories (free text)
Medical imaging
Microarray data
Protein chip data
Flow cytometry
Mass spectrometry data
Genotype / SNP data
Mouse data, fly data, chicken data ...
9
How to find your data?
How to find other people’s data?
How to reason with data when you find it?
How to work out what data you do not have?
How to understand the significance of your
own data from 3 years ago?
10
3. NIH Policies for Data Reusability and the Growth of Clinical Research Consortia
11
Sharing Research Data:
Investigators submitting an NIH
application seeking $500,000 or
more in direct costs in any single
year are expected to include a plan
for data sharing or state why this is
not possible
(http://grants.nih.gov/grants/policy/data_sharing).
12
Program Announcement (PA) Number: PAR-07-425
Title: Data Ontologies for
Biomedical Research (R01)
NIH Blueprint for Neuroscience Research, (http://neuroscienceblueprint.nih.gov/)
National Cancer Institute (NCI), (http://www.cancer.gov)
National Center for Research Resources (NCRR), (http://www.ncrr.nih.gov/)
National Eye Institute (NEI), (http://www.nei.nih.gov/)
National Heart, Lung, and Blood Institute (NHLBI), (http://www.nhlbi.nih.gov)
National Human Genome Research Institute (NHGRI), (http://www.genome.gov)
National Institute on Alcohol Abuse and Alcoholism (NIAAA), (http://www.niaaa.nih.gov/)
National Institute of Biomedical Imaging and Bioengineering (NIBIB), (http://www.nibib.nih.gov/)
National Institute of Child Health and Human Development (NICHD), (http://www.nichd.nih.gov/)
National Institute on Drug Abuse (NIDA), (http://www.nida.nih.gov/)
National Institute of Environmental Health Sciences (NIEHS), (http://www.niehs.nih.gov/)
National Institute of General Medical Sciences (NIGMS), (http://www.nigms.nih.gov/)
National Institute of Mental Health (NIMH), (http://www.nimh.nih.gov/)
National Institute of Neurological Disorders and Stroke (NINDS), (http://www.ninds.nih.gov/)
National Institute of Nursing Research (NINR), (http://www.ninr.nih.gov)
Release/Posted Date: August 3, 2007
Letters of Intent Receipt Date(s): December 18, 2007, August 18, 2008,
December 22, 2009, and August 21, 2009 for the four separate receipt
dates.
13
Purpose. Optimal use of informatics tools and
resources [data sets] depends upon explicit
understandings of concepts related to the data upon
which they compute. This is typically accomplished by
a tool or resource adopting a formal controlled
vocabulary and ontology ... that describes objects and
the relationships between those objects in a formal
way.
... this FOA solicits Research Project Grant (R01)
applications from institutions/organizations that
propose to develop an ontology that will make it
possible for software to understand how two or more
existing data sets relate to each other.
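How an ontology might let software relate two data sets can be sketched roughly as below. This is a minimal sketch under stated assumptions: the local codes, the "ONT:" identifiers, and the function name `shared_concepts` are all made up for illustration and are not part of the FOA or of any real ontology.

```python
# Hypothetical local coding schemes of two data sets, each mapped by hand
# to a shared ontology identifier (the "ONT:" IDs are invented).
LAB_A_TO_ONTOLOGY = {"HGB": "ONT:0001", "GLU": "ONT:0002"}
LAB_B_TO_ONTOLOGY = {"hemoglobin_g_dl": "ONT:0001", "wbc_count": "ONT:0003"}


def shared_concepts(map_a: dict, map_b: dict) -> dict:
    """Pair local codes from two data sets that denote the same ontology term."""
    # Invert the second mapping: ontology term -> local codes in data set B.
    by_term_b = {}
    for code, term in map_b.items():
        by_term_b.setdefault(term, []).append(code)
    # Keep only the codes of data set A whose term also occurs in data set B.
    return {
        code_a: by_term_b[term]
        for code_a, term in map_a.items()
        if term in by_term_b
    }


links = shared_concepts(LAB_A_TO_ONTOLOGY, LAB_B_TO_ONTOLOGY)
```

Neither data set needs to know the other's local codes; each is mapped once to the shared terms, and the correspondence falls out of the common identifiers.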
14
Currently, there is no convenient way to map the
knowledge that is contained in one data set to that in
another data set, primarily because of differences in
language and structure.
... in some areas there are emerging
standards. Examples include:
• the Unified Medical Language System (UMLS),
• the Gene Ontology, http://www.geneontology.org/,
• the work supported by the caBIG project
(https://cabig.nci.nih.gov/workspaces/VCDE/),
• ontologies listed at the Open Biomedical Ontology
web site (http://obo.sourceforge.net/).
15
This FOA will support limited awards, each of which
focuses on integrating information between two (or a
few very closely related) data sets in a single subject
domain. The hope is that the developed vocabularies
and ontologies will serve as nucleation points for
other researchers in the area to build upon by
adopting and extending the vocabularies and
ontologies developed under this FOA.
Applicants are expected to identify and adopt emerging standards (such
as those listed above) whenever possible. Applicants are also strongly
encouraged to federate their data under appropriate infrastructures
when possible. One potential infrastructure is provided by the
Biomedical Informatics Research Network (http://www.nbirn.net ). The
caBIG infrastructure (http://cabig.cancer.gov ) is another well
established infrastructure that researchers should consider.
16
NIH anticipates that once important data
sets in a topical area have been unified
that others in that area will adopt the
emerging standard.
The nucleation points should be able to interact with
each other, e.g. through the use of tools that are
made freely available to the research community,
such as those created by the National Center for
Biomedical Ontology (NCBO) (http://bioontology.org/)
or by caBIG
17
Another determinant of ontology acceptance
is the degree to which the ontology
conforms to best practices governing
ontology design and construction.
Criteria have been developed, and are undergoing
empirical validation, by the Vocabulary and Common Data
Element Work Group of caBIG. Other criteria have been
specified by the OBO Foundry (http://obofoundry.org/ ).
In this FOA, the applicant should specify the criteria with
which the ontology will conform and the reasons that those
criteria are relevant to the data sets being integrated by the
proposed ontology.
18
Growth of Clinical and
Translational Research Consortia
Examples:
• PharmGKB
• caBIG
• BIRN – Biomedical Informatics Research
Network
– BIRN Ontology Task Force
19
4. Is SNOMED the Solution?
20
medical records
SNOMED codes
21
The Systematized Nomenclature of
Medicine
• built by College of American Pathologists
• now maintained by International Health
Terminology Standards Development
Organisation
• access via Virginia Tech SNOMED CT®
Browser http://snomed.vetmed.vt.edu/
• (semi-) Open Source
22
SNOMED often includes nonperspicuous terms
FullySpecifiedName:
Coordination observable (observable entity)
FullySpecifiedName:
Coordination (observable entity)
23
and more:
Self-control behavior: aggression (observable
entity)
Physical activity target light exercise (finding)
is a type of physical activity finding (finding)
24
odd bunchings
European is a ethnic group
Other European in New Zealand (ethnic
group) is a ethnic group
Mixed ethnic census group is a ethnic group
Flathead is a ethnic group
25
Poor modular development
• No clear strategy for improvement
• Difficult to use for coding
• A tax on world health information
technology?
26
SNOMED embraces only some of
the multiple kinds of siloed data
Lab / pathology data
Electronic Health Record data
Patient histories
Clinical trial data, including regulatory data
Medical imaging
Microarray data
Protein chip data
Flow cytometry
Mass spectrometry data
Genotype / SNP data
Mouse data, fly data, chicken data ...
27
5. The Gene Ontology
28
29
The Gene Ontology
Open Source
Cross-Species
Impressive annotation resource
Impressive policies for maintenance
30
How to do Biology across the Genome?
[Slide background: amino-acid sequence beginning MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFES ...]
sequence of X chromosome in baker’s yeast
31
[Slide shows the amino-acid sequence at full length, beginning MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFES ... and ending ... RFDILLCRDSSREVGE]
32
what cellular component?
what molecular function?
what biological process?
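The three questions correspond to the three aspects of GO, and an annotation can be pictured as a small record linking a gene product to a term in one aspect. The sketch below is illustrative: the gene product "geneX" and the function `by_aspect` are hypothetical, and although the GO identifiers are actual GO terms, their assignment to "geneX" is invented for the example.

```python
from collections import defaultdict

# Root terms of the three GO aspects.
GO_ASPECT_ROOTS = {
    "cellular_component": "GO:0005575",
    "molecular_function": "GO:0003674",
    "biological_process": "GO:0008150",
}

# Annotations for one hypothetical gene product (assignments are illustrative).
annotations = [
    ("geneX", "cellular_component", "GO:0005634"),  # nucleus
    ("geneX", "molecular_function", "GO:0003677"),  # DNA binding
    ("geneX", "biological_process", "GO:0006355"),  # regulation of DNA-templated transcription
]


def by_aspect(rows):
    """Group annotation rows as product -> aspect -> list of GO term IDs."""
    grouped = defaultdict(lambda: defaultdict(list))
    for product, aspect, term in rows:
        if aspect not in GO_ASPECT_ROOTS:
            raise ValueError(f"unknown GO aspect: {aspect}")
        grouped[product][aspect].append(term)
    return grouped


answers = by_aspect(annotations)["geneX"]
```

Each of the slide's questions is then a lookup: `answers["cellular_component"]`, `answers["molecular_function"]`, `answers["biological_process"]`.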
33
A strategy for translational medicine
Sjöblom T, et al. analyzed 13,023 genes in 11
breast and 11 colorectal cancers.
Using functional information captured by GO
for given gene product types, they identified 189 as
being mutated at significant frequency and
thus as providing targets for diagnostic and
therapeutic intervention.
Science. 2006 Oct 13;314(5797):268-74.
34
GO widely used
Sjöblom T, et al. analyzed 13,023 genes in 11
breast and 11 colorectal cancers.
Using functional information captured by GO for
given gene product types, they identified 189 as being
mutated at significant frequencies and thus as
providing targets for diagnostic and therapeutic
intervention.
Science. 2006 Oct 13;314(5797):268-74.
35
Benefits of GO
1. links people to data
2. links data together
• across species (human, mouse, yeast, fly ...)
• across granularities (molecule, cell, organ,
organism, population)
3. links medicine to biological science
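The "links data together across species" benefit can be sketched as an index from a shared GO term to the gene products annotated with it in each species. This is a minimal sketch: the gene product names and their annotations are hypothetical, and `products_by_term` is an illustrative helper, not part of any GO tool.

```python
from collections import defaultdict

# (species, gene product, GO term); names and assignments are hypothetical.
annotations = [
    ("human", "geneH1", "GO:0003677"),  # DNA binding
    ("mouse", "geneM1", "GO:0003677"),
    ("yeast", "geneY1", "GO:0003677"),
    ("fly", "geneF1", "GO:0005634"),    # nucleus
]


def products_by_term(rows):
    """Index annotations as GO term -> species -> list of gene products."""
    index = defaultdict(dict)
    for species, product, term in rows:
        index[term].setdefault(species, []).append(product)
    return index


index = products_by_term(annotations)
# Species whose data become comparable because they share one term.
species_sharing_dna_binding = sorted(index["GO:0003677"])
```

The shared identifier is what does the linking: data from human, mouse, and yeast resources meet at the same term without any pairwise mapping between the resources.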
36
6. The OBO Foundry
37
2003
a shared portal for (so far) 58 ontologies
(low regimentation)
http://obo.sourceforge.net → NCBO BioPortal
38
39
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
40
RELATION TO TIME | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
GRANULARITY: ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
GRANULARITY: CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
GRANULARITY: MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
Building out from the original GO
41
OBO Foundry Coordinators
Ashburner (Cambridge)
Lewis (Berkeley)
Mungall (Berkeley)
Smith (Buffalo)
42
The goal
all biological (biomedical) research data
should cumulate to form a single,
algorithmically processible, whole
http://obofoundry.org
43
FOUNDRY CRITERIA
• The ontology is open and available to be used by all.
• The ontology is in, or can be instantiated in, a common formal language.
• The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.
44
CRITERIA
• UPDATE: The developers of each ontology
commit to its maintenance in light of scientific
advance, and to soliciting community feedback
for its improvement.
• ORTHOGONALITY: They commit to working with
other Foundry members to ensure that, for any
particular domain, there is community
convergence on a single controlled vocabulary.
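The orthogonality criterion lends itself to a simple mechanical check: no domain should be claimed by more than one ontology. The sketch below assumes a hypothetical registry of ontology names and domain labels; the names, the registry layout, and the function `overlapping_domains` are illustrative and are not an OBO Foundry tool.

```python
from collections import defaultdict

# Hypothetical registry: ontology name -> domains it claims to cover.
registry = {
    "OntologyA": {"cell types"},
    "OntologyB": {"anatomy", "cell types"},
    "OntologyC": {"sequence features"},
}


def overlapping_domains(reg: dict) -> dict:
    """Return each domain claimed by more than one ontology, with the claimants."""
    claimed = defaultdict(list)
    for name, domains in reg.items():
        for domain in domains:
            claimed[domain].append(name)
    return {d: sorted(names) for d, names in claimed.items() if len(names) > 1}


conflicts = overlapping_domains(registry)
```

A non-empty result marks the places where developers would need to collaborate to converge on one vocabulary.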
45
Consequences
• OBO Foundry is serving as a benchmark
for improvements in discipline-focused
terminology resources
• yielding calibration of existing
terminologies and data resources and
alignment of different views
46
Mature OBO Foundry ontologies
(now undergoing reform)
Cell Ontology (CL)
Chemical Entities of Biological Interest (ChEBI)
Foundational Model of Anatomy (FMA)
Gene Ontology (GO)
Phenotypic Quality Ontology (PaTO)
Relation Ontology (RO)
Sequence Ontology (SO)
47
Ontologies being built to satisfy Foundry
principles ab initio
Common Anatomy Reference Ontology
(CARO)
Ontology for Biomedical Investigations (OBI)
Protein Ontology (PRO)
RNA Ontology (RnaO)
Subcellular Anatomy Ontology (SAO)
48
Ontologies in planning phase
Biobank/Biorepository Ontology (BrO, part of OBI)
Environment Ontology (EnvO)
Immunology Ontology (ImmunO)
Infectious Disease Ontology (IDO)
49
7. The National Center for Biomedical Ontology
50
NCBO
National Center for Biomedical
Ontology (NIH Roadmap Center)
• Stanford Medical Informatics
• University of California, San Francisco Medical Center
• Berkeley Drosophila Genome Project
• Cambridge University Department of Genetics
• The Mayo Clinic
• University at Buffalo Department of Philosophy
51
8. Ontology in Buffalo
52
Ontology Research Group in CoE
Werner Ceusters
Louis Goldberg
Barry Smith
Robert Arp
Thomas Bittner
Maureen Donnelly
David Koepsell
Ron Rudnicki
Shahid Manzoor
53
Ontologies in Buffalo
Common Anatomy Reference Ontology (CARO)
Environment Ontology (EnvO)
Foundational Model of Anatomy (FMA)
Infectious Disease Ontology (IDO)
MS Ontology
Protein Ontology (PRO)
Relation Ontology (RO)
54
Ontologies planned
ICF Ontology
Food Ontology
Allergy Ontology
Vaccine Ontology
Ontology for Community-Based Medicine
Psychiatry Ontology
55