Translational Medicine
from a Semantic Web Perspective
Eric Neumann
W3C June 16, 2006
Drug Discovery
and Medicine
• Health
• Practice
• Safety
• Prevention
• Privacy
• Knowledge
Hygieia, G. Klimt
2
Data Expansion
Large Data Sets
Variables >> Samples
Many New
Data Types
Combine
Which Formats?
3
Where Information Advances are
Most Needed
• Supporting Innovative Applications in R&D
– Translational Medicine (Biomarkers)
– Molecular Mechanisms (Systems)
– Data Provenance, Rich Annotation
• Clinical Information
– eHealth Records, EDC, Clinical Submission Documents
– Safety Information, Pharmacovigilance, Adverse Events,
Biomarker data
• Standards
– Central Data Sources
• Genomics, Diseases, Chemistry, Toxicology
– MetaData
• Ontologies
• Vocabularies
4
Knowledge
“--is the human acquired capacity (both
potential and actual) to take effective
action in varied and uncertain
situations.”
How does this translate into using Information Systems
better in support of Innovation?
5
Drug Discovery Challenges
Knowledge ↔ Predictiveness
• Knowledge of Target Mechanisms
• Knowledge of Toxicity
• Knowledge of Patient-Drug Profiles
6
Current Challenges: Drug Discovery
• Business
– Costly, lengthy drug discovery process (12-14 years)
– Poor funding to find new uses for existing therapies (e.g. antibiotics)
– Insufficient economic drivers for certain disease areas
– Discovery and clinical trial design not well aligned with anticipating adverse
effect detection
• Post-launch surveillance is weak
• Science & Technology
– Counteracting the legacy of “Silos”
– How to break away from the DD “conveyor belt model” to the “Translation
model”
• gaining and sharing insights throughout the process
– The Benefit of New Targets for New Diseases
– How to best identify safety and efficacy issues early on, so that cost and failure
are reduced
• A D3 Knowledge-base: Druggability and Safety
7
The Big Picture -
Hard to understand from
just a few Points of View
8
9
Complete view tells a very different Story
10
Distributed Nature of R&D
Silos of Data…
11
Existing Web Data Throttles
the R&D Potential
R&D Scientist
Integrating
Data Manually
Static,
Untagged,
Disjoint
LIMS
Bioinformatics
Cheminformatics
Public Data Sources
12
Data Integration:
Biology Requirements
Papers
Disease
Proteins
Genes
Retention
Policy
Assays
Compounds
Audit
Trail
Curation
Ontology Experiment
Tools
13
Semantic Web Data Integration
R&D Scientist
Dynamic,
Linked,
Searchable
LIMS
Bioinformatics
Cheminformatics
14
Public Data Sources
Raw Data
MAGE ML
Decision
Support
GO
CDISC
BioPAX
Biomarker
Qualification
Translational
Research
Psi XML
ICH
ASN1.
XLS
SAS Tables
Target
Validation
Semantic Bridge
New
Applications
Safety
CSV
Toxicity
15
Key Technologies
Pharmaceuticals Use for
Exchanging Knowledge
16
New Regulatory Issues Confronting
Pharmaceuticals
Tox/Efficacy
ADME Optim
from Innovation or Stagnation, FDA Report March 2004
17
Key Functionality
• Ubiquity
– Same identifiers for anything from anywhere
• Discoverability
– Global search on any entity
• Interoperability
– => Application independence:
“Recombinant Data”
18
Additional Functionality
• Provenance
– Origin and history of data and annotations
• Scalability
– Over all potentially relevant data and content
• Authentication/Security
– Single user and team identity and granular data security
– Non-repudiation of authorship
– Encryption of graphs
– Policy Awareness
• Data Preservation
– Long-term persistence by minimizing API needs
19
Translational Research and Personalized
Medicine
Biomedical
Research
-Two significant areas of HCLS activity
- Span most areas of activity
Biological
Translational
Medicine
Clinical
Clinical
Research
Clinical
Practice
Research
Practice
Personalized
Medicine
20
HCLS Framework:
Biomedical Research
• Molecular, Cellular and Systems Biology/Physiology
– Organism as an integrated and interacting network of genes, proteins and
biochemical reactions
– Human body as a system of interacting organs
• Molecular Cell Biology/Genomic and Proteomic Research
– Gene Sequencing, Genotyping, Protein Structures
– Cell Signaling and other Pathways
• Biomarker Research
– Discovery of genes and gene products that can be used to measure disease
progression or impacts of drug
• Pharmaco-genomics
– Impact of genetic inheritance on
• Drug Discovery and Translational Research
– Use of preclinical research to identify promising drug candidates
21
HCLS Framework:
Clinical Research
• Clinical Trials
– Determination of efficacy, impact and safety of drugs for particular
diseases
• Pharmaco-vigilance/ADE Surveillance
– Monitoring of impacts of drugs on patients, especially safety and adverse
event related information
• Patient Cohort Identification and Management
– Identifying patient cohorts for drug trials is a challenging task
• Translational Research
– Test theories emerging from pre-clinical experimentation on disease
affected human subjects
• Development of EHRs/EMRs for both clinical research and
practice
– Currently EHRs/EMRs focussed on clinical workflow processes
– Re-using that information for clinical research and trials is a challenging
task
22
Translational Research
• Improve communication between basic and clinical science so
that more therapeutic insights may be derived from new
scientific ideas - and vice versa.
• Testing of theories emerging from preclinical experimentation
on disease-affected human subjects.
• Information obtained from preliminary human experimentation
can be used to refine our understanding of the biological
principles underpinning the heterogeneity of human disease
and polymorphism(s).
• http://www.translational-medicine.com/info/about
• Reference NIH Digital Roadmap activity
24
Personalized Medicine
• Propagation of insights from Genomic research into clinical practice
• Impact of new Molecular diagnostic tests hitting the market
– How can they be incorporated into clinical care?
– How does one update current clinical guidelines to incorporate the use of these
tests
– How can one enable novel clinical decision support?
• How can phenotypic characteristics and genomic markers be used to:
– Stratify patient populations
– “Personalize” clinical care
• Genetic test results as risk factors
• Therapeutic use of genomic markers
27
Ecosystem: Current State
Characterized by silos with uncoordinated
supply chains leading to inefficiencies in the system
Patients
National Institutes
Of Health
Patients,
Public
FDA
Pharmaceutical
Companies
Hospitals
Center for
Disease
Control
Payors
Universities,
Academic Medical
Centers (AMCs)
Biomedical Research
Clinical Practice
Clinical Research
Organizations (CROs)
Hospitals Doctors
Patients
Clinical Trials/Research
29
Patients
Clinical Practice
Ecosystem: Goal State
/* Need to expand this with Biomedical Research + Clinical Practice */
Biomedical Research
Clinical Practice
/* Need to expand this to include Healthcare and Biomedical Research
Players as well… Show an integrated picture with “continuous” information
flow */
30
Use Case Flow:
Drug Discovery and Development
Qualified
Targets
Lead
Generation
Lead
Optimization
Toxicity &
Safety
KD
Biomarkers
Molecular
Mechanisms
Pharmacogenomics
Clinical
Trials
32
Drug Discovery & Development Knowledge
Qualified
Targets
Molecular
Mechanisms
Lead
Generation
Toxicity &
Safety
Lead
Optimization
Pharmacogenomics
Biomarkers
Clinical
Trials
33
Launch
Semantic Web Drug DD Application Space
Therapeutics
Critical Path
Chem Lib
manufacturing
NDA
Production
Genomics
Clinical
Studies
HTS
eADME
Biology
Compound
Opt
DMPK
genes
35
Patent
informatics
Opportunities for Semantics in HealthCare
• Enhanced interoperability via:
– Semantic Tagging
– Grounding of concepts in Standardized Vocabularies
– Complex Definitions
• Semantics-based Observation Capture
• Inference on Diseases
– Phenotypes
– Genetics
– Mechanisms
• Semantics-based Clinical Decision Support
– Guided Data Interpretation
– Guided Ordering
• Semantics-based Knowledge Management
36
Data Semantics in the Life Sciences
Pathways,
Biomarkers
Publications
Publications + data
Image +
Text
Text
Categorical
Taxonomic
Data Items
Data Items
Text + data
items
Histology Profiling
Data Items
genomics
Systems
Biology
Complex
Objects with
Categorical/
Taxonomic
Data Items
Gene expression
Complex
Objects
Clinical Findings
Composite
Objects with
Embedded
“process”
Clinical trials
Unstructured
Data Types
Structured
and Complex Data Types
37
RDB => RDF
Virtualized RDF
39
Use-Case: COSA
Row Semantic <rdf:type Subject>
Column Semantic <rdf:type Gene>
Data Set
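The row and column semantics above can be sketched in a few lines, assuming a small in-memory table in which every row is typed as a Subject and every column as a Gene. The `ex:` names, the `ex:ofSubject`, `ex:ofGene` and `ex:value` properties, and the sample values are hypothetical and are not taken from the COSA data set.

```python
# Hypothetical expression table: rows are subjects, columns are genes.
table = {
    "subject1": {"GSK3B": 2.1, "APOA1": 0.4},
    "subject2": {"GSK3B": 1.7, "APOA1": 0.9},
}


def table_to_triples(table):
    """Turn a subject-by-gene table into (subject, predicate, object) triples."""
    triples = set()
    for row, cells in table.items():
        # Row semantic: every row is a Subject.
        triples.add((f"ex:{row}", "rdf:type", "ex:Subject"))
        for col, value in cells.items():
            # Column semantic: every column is a Gene.
            triples.add((f"ex:{col}", "rdf:type", "ex:Gene"))
            # Each cell becomes a measurement node linking subject and gene.
            cell = f"ex:{row}_{col}"
            triples.add((cell, "ex:ofSubject", f"ex:{row}"))
            triples.add((cell, "ex:ofGene", f"ex:{col}"))
            triples.add((cell, "ex:value", value))
    return triples


triples = table_to_triples(table)
```

Because the types are attached to rows and columns rather than to a file layout, the same statements could be produced from a spreadsheet, a CSV export, or a database view.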
42
Use-Case:
Experimental Design Definition
Treatment
W
Cultured
Cells
Control
Visible
Microscopy
Time
Points
Image
Analysis
Staining
Fluorescent
Microscopy
Treatment
Z
43
Case Study: Drug Safety
‘Safety Lenses’
• Lenses can “focus” data in specific ways
– Hepatotoxicity, genotoxicity, hERG, metabolites
• Can be “wrapped” around statistical tools
• Aggregate other papers and findings (knowledge) in context
with a particular project
• Align animal studies with clinical results
• Support special “Alert-channels” by regulators for each
different toxicity issue
• Integrate JIT information on newly published mechanisms of
actions
44
Example:
Knowledge
Aggregation
45
Courtesy of
BG-Medicine
Case Study: Omics
ApoA1 …
… is produced by the Liver
… is expressed less in Atherosclerotic Liver
… is correlated with DKK1
… is cited regarding Tangier’s disease
… has Tx Reg elements like HNFR1
Subject → Verb → Object
46
Scenario: Biomarker Qualification
• Biomarker Roles
– Disease
– Toxicity
– Efficacy
• Molecular and cytological markers
– Tissue-specific
– High content screening derived information
– Different sets associated with different predictive tools
• Statistical discrimination based on selected samples
– Predictive power
– Alternative cluster prediction algorithms
– Support qualifications from multiple studies (comparisons)
• Causal mechanisms
– Pathways
– Population variation
48
BioMarker Semantics
Disease
Pathways
+Samples
Biomarker Set
Significance
&
Strength
49
-Samples
Scenario: Toxicity
• Mechanisms
– Tissue-selective, Species-specific
– Pathways, Off-Targets
– Metabolites, PK sensitivity
• Evidence
– Literature
• Biomarkers
– In vitro assays (cell lines), Animal models, Clinical Phase 1
• Predictions
– Data Mining Patterns
– Computational Modeling
• Population Variation
– Drug Metabolism to toxic forms (CYP, SULT, UGT)
– Target interaction variability
– Potential vs. Demonstrated
• Working Solutions
– Chemical modifications
– Dosing, Reformulation
– Documented animal <=> human similarity and variation
50
Knowledge Mining using Semantic Web
“Gene Prioritization through
Data Fusion”
- Aerts et al., 2006, Nature Biotechnology
-Use of quantitative and
qualitative information for
statistical ranking.
-Can be used to identify novel
genes involved in diseases
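A minimal sketch of the idea of fusing several rankings into one follows. It deliberately swaps in a plain mean-rank aggregation, which is simpler than the order-statistics approach used by Aerts et al.; the gene names and the three source rankings are hypothetical.

```python
def fuse_rankings(rankings):
    """Combine per-source gene rankings into one list by mean rank.

    `rankings` maps a source name to a list of genes, best first.
    Only genes ranked by every source are fused.
    """
    common = set.intersection(*(set(r) for r in rankings.values()))
    mean_rank = {
        gene: sum(r.index(gene) for r in rankings.values()) / len(rankings)
        for gene in common
    }
    # Lower mean rank is better; ties are broken alphabetically.
    return sorted(common, key=lambda gene: (mean_rank[gene], gene))


# Hypothetical rankings from three independent data sources.
rankings = {
    "expression": ["geneA", "geneB", "geneC"],
    "literature": ["geneB", "geneA", "geneC"],
    "pathways": ["geneB", "geneC", "geneA"],
}

fused = fuse_rankings(rankings)
```

Here `geneB` comes out on top because two of the three sources rank it first, even though no single source is treated as authoritative.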
51
Case Study: BioPAX (Pathways)
<bp:PATHWAYSTEP rdf:ID="xDshToXGSK3bPathwayStep">
  <bp:next-step rdf:resource="#xGSK3bToBetaCateninPathwayStep"/>
  <bp:step-interactions>
    <bp:MODULATION rdf:ID="xDshToXGSK3b">
      <bp:left rdf:resource="#xDsh"/>
      <bp:right rdf:resource="#xGSK-3beta"/>
      <bp:participants rdf:resource="#xGSK-3beta"/>
      <bp:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dishevelled to GSK3beta</bp:name>
      <bp:direction rdf:datatype="http://www.w3.org/2001/XMLSchema#string">IRREVERSIBLE-LEFT-TO-RIGHT</bp:direction>
      <bp:control-type rdf:datatype="http://www.w3.org/2001/XMLSchema#string">INHIBITION</bp:control-type>
      <bp:participants rdf:resource="#xDsh"/>
    </bp:MODULATION>
  </bp:step-interactions>
</bp:PATHWAYSTEP>
52
Case Study: BioPAX (Pathways)
<bp:PATHWAYSTEP rdf:ID="xDshToXGSK3bPathwayStep">
  <bp:next-step rdf:resource="#xGSK3bToBetaCateninPathwayStep"/>
  <bp:step-interactions>
    <bp:MODULATION rdf:ID="xDshToXGSK3b">
      <bp:left rdf:resource="#xDsh"/>
      <bp:right rdf:resource="#xGSK-3beta"/>
      <bp:participants rdf:resource="#xGSK-3beta"/>
      <bp:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dishevelled to GSK3beta</bp:name>
      <bp:direction rdf:datatype="http://www.w3.org/2001/XMLSchema#string">IRREVERSIBLE-LEFT-TO-RIGHT</bp:direction>
      <bp:control-type rdf:datatype="http://www.w3.org/2001/XMLSchema#string">INHIBITION</bp:control-type>
      <bp:participants rdf:resource="#xDsh"/>
    </bp:MODULATION>
  </bp:step-interactions>
</bp:PATHWAYSTEP>
53
Case Study: BioPAX (Pathways)
Modulation
<bp:PATHWAYSTEP rdf:ID="xDshToXGSK3bPathwayStep">
  <bp:next-step rdf:resource="#xGSK3bToBetaCateninPathwayStep"/>
  <bp:step-interactions>
    <bp:MODULATION rdf:ID="xDshToXGSK3b">
      <bp:left rdf:resource="#xDsh"/>
      <bp:right rdf:resource="#xGSK-3beta"/>
      <bp:participants rdf:resource="#xGSK-3beta"/>
      <bp:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dishevelled to GSK3beta</bp:name>
      <bp:direction rdf:datatype="http://www.w3.org/2001/XMLSchema#string">IRREVERSIBLE-LEFT-TO-RIGHT</bp:direction>
      <bp:control-type rdf:datatype="http://www.w3.org/2001/XMLSchema#string">INHIBITION</bp:control-type>
      <drug:affectedBy rdf:resource="http://pharma.com/cmpd/CHIR99102"/>
      <bp:participants rdf:resource="#xDsh"/>
    </bp:MODULATION>
  </bp:step-interactions>
</bp:PATHWAYSTEP>
54
affectedBy
CHIR99102
Potential Linked Clinical Ontologies
Clinical Obs
Disease
Descriptions
SNOMED
Applications CDISC
ICD10
RCRIM
(HL7)
Clinical Trials Disease
Models
ontology
Mechanisms
Pathways
(BioPAX)
IRB
Tox
Extant ontologies
Genomics
Molecules
Under development
Bridge concept
55
Case Study: Drug Discovery
Dashboards
• Dashboards and Project Reports
• Next generation browsers for semantic
information via Semantic Lenses
• Renders OWL-RDF, XML, and HTML documents
• Lenses act as information aggregators and logic
style-sheets
add { ls:TheraTopic
hs:classView:TopicView
}
56
Drug Discovery Dashboard
http://www.w3.org/2005/04/swls/BioDash
Topic: GSK3beta Topic
Disease: DiabetesT2
Alt Dis: Alzheimers
Target: GSK3beta
Cmpd: SB44121
CE: DBP
Team: GSK3 Team
Person: John
Related Set
Path: WNT
57
Bridging Chemistry and Molecular Biology
Semantic Lenses: Different Views of the same
data
BioPax
Components
Target Model
urn:lsid:uniprot.org:uniprot:P49841
Apply Correspondence Rule:
if ?target.xref.lsid == ?bpx:prot.xref.lsid
then ?target.correspondsTo.?bpx:prot
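The correspondence rule above can be sketched as a simple join on the shared LSID. This is only an illustration under stated assumptions: the record layout, the `target:` and `bpx:` identifiers, and the second LSID are hypothetical; only the UniProt LSID is taken from the slide.

```python
from itertools import product

# Hypothetical records from a target model and a BioPAX pathway model.
targets = [
    {"id": "target:GSK3beta",
     "xref_lsid": "urn:lsid:uniprot.org:uniprot:P49841"},
]
biopax_proteins = [
    {"id": "bpx:xGSK-3beta",
     "xref_lsid": "urn:lsid:uniprot.org:uniprot:P49841"},
    {"id": "bpx:xDsh",
     "xref_lsid": "urn:lsid:example.org:protein:dsh"},  # made-up LSID
]


def correspondences(targets, proteins):
    """Emit a correspondsTo statement for every pair sharing an xref LSID."""
    return [
        (t["id"], "correspondsTo", p["id"])
        for t, p in product(targets, proteins)
        if t["xref_lsid"] == p["xref_lsid"]
    ]


links = correspondences(targets, biopax_proteins)
```

The two records are never declared to be the same object; they are only linked because they cite the same external reference.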
58
Bridging Chemistry and Molecular Biology
•Lenses can aggregate, accentuate,
or even analyze new result sets
• Behind the lens, the data can be
persistently stored as RDF-OWL
• Correspondence does not need
to mean “same descriptive
object”, but may mean objects
with identical references
59
Pathway Polymorphisms
•Merge directly onto
pathway graph
•Identify targets with
lowest chance of genetic
variance
Non-synonymous
polymorphisms
from db-SNP
•Predict parts of pathways
with highest functional
variability
•Map genetic influence to
potential pathway elements
•Select mechanisms of
action that are minimally
impacted by polymorphisms
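As a rough sketch of the selection step above, pathway elements can be ordered by how many non-synonymous polymorphisms map onto them. Using the raw count as a proxy for functional variability is an assumption made here for illustration, and the counts themselves are hypothetical rather than taken from dbSNP.

```python
# Hypothetical counts of non-synonymous SNPs mapped to each pathway element.
nonsyn_snps = {"GSK3beta": 1, "Dsh": 4, "BetaCatenin": 2}


def rank_by_variability(counts):
    """Order pathway elements from least to most polymorphic."""
    return sorted(counts, key=lambda name: (counts[name], name))


ranked = rank_by_variability(nonsyn_snps)
least_variable = ranked[0]   # candidate target with lowest genetic variance
most_variable = ranked[-1]   # pathway element with highest variability
```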
60
Knowledge Channels
<item rdf:about="http://www.connotea.org/user/hannahr/uri/48e905bdb66310af85ad2e8503628e01">
<title>High Mda-7 expression promotes malignant cell survival and p38 MAP kinase activation in chronic
lymphocytic leukemia.</title>
<link>http://www.connotea.org/user/hannahr/uri/48e905bdb66310af85ad2e8503628e01</link>
<description>Posted by hannahr to CLLSignalling&#x26;Processes on Thu Jan 19 2006</description>
<dc:creator>hannahr</dc:creator>
<dc:date>2006-01-19T11:24:03Z</dc:date>
<dc:subject>CLLSignalling&#x26;Processes</dc:subject>
<connotea:uri>
<dc:title>High Mda-7 expression promotes malignant cell survival and p38 MAP kinase activation
in chronic lymphocytic leukemia.</dc:title>
<dc:creator>A Sainz-Perez</dc:creator>
<dc:creator>H Gary-Gouy</dc:creator>
<dc:identifier>
<connotea:PubMedID>
<connotea:idValue>16408101</connotea:idValue>
<rdf:value>PMID: 16408101</rdf:value>
</connotea:PubMedID>
</dc:identifier>
<dc:date>2006-01-12</dc:date>
<prism:publicationName>Leukemia</prism:publicationName>
<prism:issn>0887-6924</prism:issn>
</connotea:uri>
</item>
61
Knowledge Channels
<item rdf:about="http://www.connotea.org/user/hannahr/uri/48e905bdb66310af85ad2e8503628e01">
<title>High Mda-7 expression promotes malignant cell survival and p38 MAP kinase activation in chronic lymphocytic
leukemia.</title>
<link>http://www.connotea.org/user/hannahr/uri/48e905bdb66310af85ad2e8503628e01</link>
<description>Posted by hannahr to CLLSignalling&#x26;Processes on Thu Jan 19 2006</description>
<dc:creator>hannahr</dc:creator>
<dc:date>2006-01-19T11:24:03Z</dc:date>
<dc:subject>CLLSignalling&#x26;Processes</dc:subject>
<kn:nugget rdf:resource="#N251">
<tn:expert>Giles Day </tn:expert>
<tn:topic>pf#P38</tn:topic>
<tn:kChannel>pf#Kinases</tn:kChannel >
<tn:comment>This paper suggests a mechanism for P38 protection of CLL B-cells</tn:comment >
</kn:nugget >
<connotea:uri>
<dc:title>High Mda-7 expression promotes malignant cell survival and p38 MAP kinase activation
in chronic lymphocytic leukemia.</dc:title>
<dc:creator>A Sainz-Perez</dc:creator>
<dc:creator>H Gary-Gouy</dc:creator>
<dc:identifier>
<connotea:PubMedID>
<connotea:idValue>16408101</connotea:idValue>
<rdf:value>PMID: 16408101</rdf:value>
</connotea:PubMedID>
</dc:identifier>
<dc:date>2006-01-12</dc:date>
<prism:publicationName>Leukemia</prism:publicationName>
<prism:issn>0887-6924</prism:issn>
</connotea:uri>
</item>
62
P38 paper
nugget
N251
expert
Giles Day
topic
pf#P38
kChannel
Pf#Kinases
63
GeneLogic GeneExpress Data
• Additional relations
and aspects can be
defined
Diseased
Tissue
Links to
OMIM (RDF)
65
Bar View of GeneExpress
66
ClinDash: Clinical Trials Browser
Subjects
•Values can be
normalized across all
measurables (rows)
Clinical Obs
•Samples can be
aligned to their
subjects using RDF
rules
Expression
Data
•Clustering can now be
done over all
measurables (rows)
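A minimal sketch of the row normalization mentioned above, assuming a z-score per measurable so that clinical observations and expression values land on a comparable scale before clustering. The row names and values are hypothetical, and z-scoring is one possible choice of normalization, not necessarily the one used in ClinDash.

```python
from statistics import mean, pstdev


def normalize_rows(rows):
    """Z-score each row (measurable) across its subjects."""
    normalized = {}
    for name, values in rows.items():
        mu, sigma = mean(values), pstdev(values)
        # A constant row carries no signal; map it to zeros.
        normalized[name] = [
            0.0 if sigma == 0 else (v - mu) / sigma for v in values
        ]
    return normalized


# Hypothetical measurables with very different units, one value per subject.
rows = {
    "clinical_obs_ALT": [20.0, 40.0, 60.0],
    "expression_GSK3B": [1.0, 2.0, 3.0],
}

normalized = normalize_rows(rows)
```

After normalization both rows have the same shape, so a clustering step can treat clinical and expression measurables uniformly.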
67
68
69
70
71
W3C Launches Semantic Web for HealthCare
and Life Sciences Interest Group
• Interest Group formally launched Nov 2005:
http://www.w3.org/2001/sw/hcls
• First Domain Group for W3C - “…take SW through its paces”
• An Open Scientific Forum for Discussing, Capturing, and Showcasing Best
Practices
• Recent life science members: Pfizer, Merck, Partners HealthCare,
Teranode, Cerebra, NIST, U Manchester, Stanford U, AlzForum
• SW Supporting Vendors: Oracle, IBM, HP, Siemens, AGFA
• Co-chairs: Dr. Tonya Hongsermeier (Partners HealthCare); Eric Neumann
(Teranode)
76
HCLS Objectives
• Share use cases, applications,
demonstrations, experiences
• Exposing collections
• Developing vocabularies
• Building / extending (where appropriate) core
vocabularies for data integration
77
HCLS Activities
• BioRDF - data + NLP as RDF
• BioONT - ontology coordination
• Scientific Publishing - evidence management
• Adaptive Clinical Protocols and Pathways
• Clinical Trials
78
BioRDF: NeuroCommons.org
The Neurocommons project, a collaboration between Science Commons and the
Teranode Corporation, is creating a free, public Semantic Web for neurological
research. The project has three distinct goals:
1. To demonstrate that scientific impact and innovation is directly related to the
freedom to legally reuse and technically transform scientific information.
2. To establish a legal and technical framework that increases the impact of
investment in neurological research in a public and clearly measurable manner.
3. To develop an open community of neuroscientists, funders of neurological
research, technologists, physicians, and patients to extend the Neurocommons
work in an open, collaborative, distributed manner.
79
BioRDF: Reagents
RDF resources that describe various kinds of
experimental reagents, starting with antibodies:
•Initial RDF that captures: Gene, the fact that this is an antibody,
various kinds of pages about the antibody, such as vendor
documentation, and any other properties that are explicitly captured
in the source material
•Work with the Ontology task force to identify appropriate
ontologies and vocabularies to use in the RDF.
•Write queries against the RDF to answer questions of the sort
posed on the Alzforum's
80
BioRDF: NCBI
• NCBI Data: URIs and as RDF
• Terminology Integration: NLM’s UMLS, MeSH
– SNOMED
• Olivier Bodenreider
81
BioRDF Neuro Tasks
• Aggregate facts and models around
Parkinson’s Disease
• BIRN / Human Brain Project
• SWAN: scientific annotations and evidence
• Use RDF and OWL to describe
– “Brain Connectivity”
– Neuronal data in SenseLab
82
What does RDF get you?
• Structure is not format-rigid (i.e. tree)
– Semantics not implicit in Syntax
– No new parsers need to be defined for new data
• Entities can be anywhere on the web (URI)
• Define semantics into graph structures (ontologies)
– Use rules to test data consistency and extract important relations
• Data can be merged into complete graphs
• Multiple ontologies supported
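The point about merging can be sketched with plain sets of triples, assuming both sources use the same URIs for shared entities. All `ex:` names and statements below are hypothetical.

```python
# Hypothetical statements held in two separate sources.
lims_graph = {
    ("ex:cmpd42", "ex:testedIn", "ex:assay7"),
}
public_graph = {
    ("ex:assay7", "ex:measures", "ex:GSK3beta"),
    ("ex:GSK3beta", "ex:partOf", "ex:WntPathway"),
}

# With shared identifiers, merging two graphs is a set union of triples.
merged = lims_graph | public_graph


def objects(graph, subject, predicate):
    """All objects of statements matching the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def targets_of(graph, compound):
    """Follow compound -> assay -> target across the graph."""
    return {
        target
        for assay in objects(graph, compound, "ex:testedIn")
        for target in objects(graph, assay, "ex:measures")
    }
```

Neither source alone links the compound to a target; the question only becomes answerable on the merged graph, and no new parser or schema mapping was needed to get there.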
89
RDF vs. XML example
Wang et al., Nature Biotechnology, Sept 2005
AGML
90
HUPML
RDF Stripe Mode
Node>Edge>Node
>Edge….
91
RDF Graph
92
gsk:KENPAL
    rdf:type :Compound ;
    dc:source <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14698171> ;
    :chemID "3820" ;
    :clogP "2.4" ;
    :kA "e-8" ;
    :mw "327.17" ;
    :ic50 [ rdf:type :IC50 ; :value "23" ; :units :nM ; :forTarget gsk:GSK3beta ] ;
    :chemStructure "C16H11BrN2O" ;
    rdfs:label "kenpaullone" ;
    :synonym "bromo-paullone" ;
    :smiles "C1C2=C(C3=CC=CC=C3NC1=O)NC4=C2C=C(C=C4)Br" ;
    :inChI "1/C16H11BrN2O/c17-9-5-6-14-11(7-9)12-8-15(20)18-13-4-2-1-3-10(13)16(12)19-14/h1-7,19H,8H2,(H,18,20)/f/h18H" ;
    :xref <http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=3820> .
94
Multiple Ontologies Used Together
Disease
OMIM
UMLS
Group
FOAF
Disease
Polymorphisms
SNP
Drug target
ontology
UniProt
Protein
BioPAX
Person
PubChem
Patent
ontology
Extant ontologies
Chemical
entity
95
Under development
Bridge concept
Case Studies
96
Case Study: NeuroCommons.org
• Public Data & Knowledge for CNS
• R&D Forum
• Available for industry and academia
• All based on Semantic Web Standards
97
99
HCLS Neuro Tasks
• Aggregate facts and models around
Parkinson’s Disease
• SWAN: scientific annotations and evidence
• Use RDF and OWL to describe
– Brain scans in The Whole Brain Atlas
– Neural entries in NCBI’s Entrez Gene Database
– “Brain Connectivity”
– Neuronal data in SenseLab
– Neurological Disease entries in OMIM
102
Conclusions:
Key Semantic Web Principles
• Plan for change
• Free data from the application that created it
• Lower reliance on overly complex Middleware
• The value in "as needed" data integration
• Big wins come from many little ones
• The power of links - network effect
• Open-world, open solutions are cost effective
• Importance of "Partial Understanding"
104
What is the Semantic Web ?
It’s Semantic
Webs
It’s Text
Extraction
It’s AI
It’s
Web 2.0
It’s Data
Tracking
It’s a Global
Conspiracy
• http://www.w3.org/2006/Talks/0125-hclsig-em/
106
It’s
Ontologies
W3C Roadmap
• Semantic Web foundation specifications
– RDF, RDF Schema and OWL are W3C
Recommendations as of Feb 2004
• Standardization work is underway in Query,
Best Practices and Rules
• Goal of moving from a Web of Documents to
a Web of Data
The Only Open and Web-based Data Integration Model
Game in Town
107
The Current Web
• What the computer sees:
“Dumb” links
• No semantics - <a href>
treated just like <bold>
• Minimal machine-processable information
108
The Semantic Web
• Machine-processable
semantic information
• Semantic context
published – making the
data more informative
to both humans and
machines
109
Google Graphs
Ranking Sites based on
Topology
Associate Word
frequencies with ranked
sites
110
The Technologies: RDF
• Resource Description Framework
• W3C standard for making statements of fact or
belief about data or concepts
• Descriptive statements are expressed as triples:
(Subject, Verb, Object)
– We call verb a “predicate” or a “property”
Subject
<Patient HB2122>
Property
<shows_sign>
111
Object
<Disease Pneumococcal_Meningitis>
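The triple on this slide can be written down and queried with nothing more than tuples. This is a sketch rather than a real RDF store: the second patient and the `match` helper are hypothetical additions for illustration.

```python
# Each statement is a (subject, property, object) triple.
triples = [
    ("Patient HB2122", "shows_sign", "Disease Pneumococcal_Meningitis"),
    # Hypothetical second statement, added only to make the query meaningful.
    ("Patient HB3001", "shows_sign", "Disease Pneumococcal_Meningitis"),
]


def match(triples, subject=None, prop=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (prop is None or p == prop)
        and (obj is None or o == obj)
    ]


# Which patients show signs of pneumococcal meningitis?
patients = [
    s for s, _, _ in match(
        triples, prop="shows_sign", obj="Disease Pneumococcal_Meningitis"
    )
]
```

The same pattern-with-wildcards idea is what a query language over RDF generalizes: any position in the triple can be fixed or left open.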
What RDF Gets You
Universal, semantic
connectivity supports
the construction of
elaborate structures.
112
Losing Connectedness in Tables
Fast Uptake and ease of use,
but loose binding to entities and terms
?
113
Casp2
Casp2
Colon
Endodermal
Data Integration?
• Querying Databases is not sufficient
• Data needs to include the Context of Local
Scientists
• Concepts and Vocabulary need to be
associated
• More about Sociology than Technology
Information → Knowledge
114
Standards- Why Not?
• Good when there’s a majority of agreement
• By vendors, for vendors?
• Mainly about Data Packing-- should be more
about Semantics (user-defined)
• API dominated (Time trapped)
• Ease and Expressivity
• Too often they’re Brittle and Slow to develop
• “They’re great, that’s why there are so many of
them”
115
Data Integration Enables Business
Integration: Efficiency and Innovation
• Searching
• Visualization
• Analysis
• Reporting
• Notification
• Navigation
116
Searching…
#1 way for finding information in
companies…
117