Yale Center for Medical Informatics (YCMI)
AlzPharm: an RDF Use Case for Semantic Web in Neuroscience
SeS2006 Workshop, Beijing, China (Sept. 3, 2006)
• Authors
– Hugo Y.K. Lam (CB&B Ph.D. Program)
– Kei Cheung (YCMI)
– Luis Marenco (YCMI)
– Perry Miller (YCMI)
– Nian Liu (YCMI)
– Chiquito Crasto (YCMI)
– Tim Clark (Mass. General Hospital, Harvard University)
– Yong Gao (Partners)
– June Kinoshita (AlzForum)
– Elizabeth Wu (AlzForum)
– Gwen Wong (AlzForum)
– Gordon Shepherd (Yale Neurobiology)
– Tom Morse (Yale Neurobiology)
– Susie Stephens (Oracle)
Overview
• Most neuroscience databases are neither integrated nor interoperable
• A single domain ontology is insufficient for integrating neuroscience data spanning multiple domains
• We present a Semantic Web approach to building an e-Neuroscience data integration framework, which uses RDF as a standard data model to facilitate the representation and integration of data
e-Neuroscience
• Involves developing tools, technologies, and infrastructure to support multidisciplinary, collaborative science enabled by the Internet
• Aims to address the data integration problem in neuroscience
• Fits the informatics-oriented goal of the Human Brain Project initiated by NIH
• Provides a better understanding of brain function by integrating different levels of brain data
Current Issues
• Registry (e.g., NDG)
– The keyword-based search approach suffers from poor specificity and sensitivity
– A centralized approach to registering resources may not be scalable
Current Issues (cont’d)
• No links between related databases
Semantic Web Approach
Representing and Integrating Data
Semantic Web
• Exposes the semantics of Web-accessible data in a standard, machine-readable way so that computer programs (or Web agents) can more easily interpret and integrate the data
• Components of Semantic Web technologies:
– Ontologies
– Ontology languages
– Semantic-Web-aware tools (e.g., databases)
RDF Data Modeling
• Uses the Oracle RDF Data Model (installed on a Linux server) to build a Semantic Web data warehouse integrating datasets extracted from two independently developed neuroscience databases:
– BrainPharm (a subdatabase of SenseLab)
– SWAN (Semantic Web Applications in Neuromedicine)
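To illustrate why a shared RDF data model eases integration of two independently developed databases, here is a minimal Python sketch (not the Oracle implementation) that treats each dataset as a set of (subject, predicate, object) triples. The namespaces, the predicates `hasTarget` and `mentionsDrug`, and all resources are hypothetical placeholders, not the actual BrainPharm or SWAN vocabularies.

```python
# Minimal sketch: RDF graphs modeled as sets of (subject, predicate, object) triples.
# All namespaces, predicates and resources are HYPOTHETICAL placeholders.

BP = "http://example.org/brainpharm#"  # hypothetical BrainPharm namespace
SW = "http://example.org/swan#"        # hypothetical SWAN namespace

brainpharm = {
    (BP + "drug1", BP + "hasTarget", BP + "target1"),
}
swan = {
    # A SWAN claim pointing at a BrainPharm resource via its shared URI.
    (SW + "claim1", SW + "mentionsDrug", BP + "drug1"),
}

# With URI-only nodes, merging two graphs is a set union of their triples;
# nodes carrying the same URI become the same resource automatically.
merged = brainpharm | swan

def objects(graph, subject, predicate):
    """Return every o such that (subject, predicate, o) is in graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}

def subjects(graph, predicate, obj):
    """Return every s such that (s, predicate, obj) is in graph."""
    return {s for s, p, o in graph if p == predicate and o == obj}

# Follow a link from SWAN data into BrainPharm data in the merged graph.
for drug in objects(merged, SW + "claim1", SW + "mentionsDrug"):
    print(drug, "->", objects(merged, drug, BP + "hasTarget"))
```

Plain set union suffices here only because every node is a URI; a full RDF merge must also rename blank nodes apart so that blank nodes from different graphs are not accidentally identified.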
RDF Data Modeling
• BrainPharm
– A database under development to support research on drugs for the treatment of different neurological disorders (http://senselab.med.yale.edu/senselab/BrainPharm/alzData.asp)
– Contains pharmacological agents that act on neuronal receptors and signal transduction pathways in the normal brain and in nervous system disorders such as Alzheimer’s Disease (AD)
– Enables searches for drug actions at the level of key molecular constituents, cell compartments, and individual cells
RDF Data Modeling
• SWAN (http://swan.mindinformatics.org/)
– A project to develop knowledge management tools and resources for Alzheimer’s Disease (AD) researchers, based on an ecosystem model of scientific discourse
– Uses an upper ontology whose components include scientists, experiments, publications, bibliographic databases, research collaborations, and scientific Web communities
– Implemented using Semantic Web technology
– Represents data in RDF format
– Currently stores a subset of data obtained from the Alzheimer Research Forum (http://www.alzforum.org)
Data Conversion & Loading
• Drug-related (chemical) information is extracted from BrainPharm
• SWAN hypotheses and publications are extracted from Alzforum
• SWAN data are already available in RDF format
• BrainPharm exports data in its own XML format, EDSP (Electronic DataSet Protocol)
• The EDSP/XML data are converted into the corresponding RDF/XML format using XSL Transformations (XSLT)
• Both the SWAN and BrainPharm RDF datasets are loaded into the Oracle RDF Data Model using its data loader program, which takes RDF data in N-Triples format
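The conversion step can be sketched as follows. The project used XSLT to produce RDF/XML, and the Oracle loader consumed N-Triples; this Python sketch swaps in a different technique, mapping a hypothetical EDSP-like XML record directly to N-Triples with the standard-library ElementTree parser. The element names (`record`, `field`), their attributes, and the base URI are assumptions, since the EDSP schema is not shown here.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL EDSP-like export; the real EDSP schema is not given in the slides.
EDSP_SAMPLE = """<dataset>
  <record id="drug1">
    <field name="name">Example drug</field>
    <field name="target">Example target</field>
  </record>
</dataset>"""

BASE = "http://example.org/brainpharm/"  # hypothetical namespace for minted URIs

def escape_literal(text):
    """Escape backslash, double quote, newline and CR for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def edsp_to_ntriples(xml_text):
    """Emit one N-Triples line per <field>, using the record id as the subject."""
    root = ET.fromstring(xml_text)
    lines = []
    for record in root.iter("record"):
        subject = f"<{BASE}{record.get('id')}>"
        for field in record.iter("field"):
            predicate = f"<{BASE}{field.get('name')}>"
            literal = escape_literal(field.text or "")
            lines.append(f'{subject} {predicate} "{literal}" .')
    return "\n".join(lines) + "\n"

print(edsp_to_ntriples(EDSP_SAMPLE), end="")
```

In a real mapping, values such as a drug target would more likely become URIs than plain literals, so that they can be joined with other datasets.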
RDF Based Queries
• Oracle has extended SQL to support an RDF query language, which allows users to query multiple RDF datasets
• The following two examples illustrate how such queries can retrieve and integrate data from BrainPharm and SWAN
RDF Based Queries
• Example One
– Target
• Query BrainPharm to group and count AD drugs by their molecular targets
– Result
• There are 2 groups of AD drugs.
• The first contains 5 drugs that act as acetylcholinesterase inhibitors.
• The second contains only 1 drug, an N-methyl-D-aspartic acid (NMDA) receptor antagonist.
RDF Based Queries
• Example One
– Query
RDF Based Queries
• Example One
– Remarks
• Current implementations of RDF query languages (RQL) in specialized RDF stores do not support aggregate functions (e.g., COUNT, SUM, and AVG) with “GROUP BY”
• The Oracle RDF query language supports such functions because it is a hybrid between RQL and SQL
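The hybrid idea behind Example One (triple-pattern matching followed by SQL aggregation) can be sketched with an in-memory SQLite table of triples. This is not Oracle's RDF schema or query syntax; the `triples` table, the predicates `treats` and `actsOn`, and the drug and target names are hypothetical and do not reproduce the BrainPharm results.

```python
import sqlite3

# Sketch of SQL aggregation over RDF-style triples in a relational table.
# Table layout, predicates and data are HYPOTHETICAL (not Oracle's RDF schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("drugA", "treats", "AD"), ("drugA", "actsOn", "targetX"),
        ("drugB", "treats", "AD"), ("drugB", "actsOn", "targetX"),
        ("drugC", "treats", "AD"), ("drugC", "actsOn", "targetY"),
        ("drugD", "treats", "PD"), ("drugD", "actsOn", "targetY"),
    ],
)

# The pattern { ?drug treats AD . ?drug actsOn ?target } becomes a self-join;
# GROUP BY / COUNT then aggregates over the bound ?target variable.
rows = conn.execute(
    """
    SELECT t2.o AS target, COUNT(DISTINCT t1.s) AS n_drugs
    FROM triples t1
    JOIN triples t2 ON t1.s = t2.s
    WHERE t1.p = 'treats' AND t1.o = 'AD' AND t2.p = 'actsOn'
    GROUP BY t2.o
    ORDER BY t2.o
    """
).fetchall()

for target, n in rows:
    print(target, n)
```

Each additional triple pattern adds one more self-join on the shared variable, which is why storing triples relationally makes standard SQL aggregates available.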
RDF Based Queries
• Example Two
– Target
• Retrieve the information (stored in BrainPharm) about the AD drug “Donepezil” and the publications (stored in SWAN) whose titles or abstracts contain the term “Donepezil” (case-insensitive)
• Demonstrate the use of RDF inferencing based on the parent-child (is-a) relationship between the Publication class (e.g., original articles retrieved from PubMed) and the ARFPublication class (e.g., PubMed articles that have been commented on by researchers/curators associated with Alzforum), as defined in the SWAN RDF Schema
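The case-insensitive keyword condition in Example Two can be sketched in Python as below. The record layout (`id`, `title`, `abstract`) and the sample publications are hypothetical; in the actual use case, the candidate publications come from the SWAN RDF data stored in Oracle.

```python
# Sketch of the Example Two keyword filter: keep publications whose title OR
# abstract contains a term, ignoring case. Records are HYPOTHETICAL sample data.
publications = [
    {"id": "pub1", "title": "Donepezil in mild AD", "abstract": ""},
    {"id": "pub2", "title": "Cholinesterase inhibitors",
     "abstract": "Effects of DONEPEZIL on cognition."},
    {"id": "pub3", "title": "Amyloid imaging", "abstract": "No drug trial."},
]

def mentions(pub, term):
    """Case-insensitive substring test over the title and abstract fields."""
    needle = term.casefold()
    return any(needle in (pub.get(field) or "").casefold()
               for field in ("title", "abstract"))

hits = [p["id"] for p in publications if mentions(p, "Donepezil")]
print(hits)  # ['pub1', 'pub2']
```

`str.casefold` is used instead of `lower` because it is Python's recommended normalization for caseless matching.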
RDF Based Queries
• Example Two
– Query (Partial)
RDF Based Queries
• Example Two
– Result
• With the is-a inference rule incorporated into the query, it finds a total of 19 publications linked to claims and/or hypotheses concerning the effect of Donepezil on AD treatment
• One of these 19 publications belongs to the ARFPublication class (i.e., it is ARF-commented)
• Given the PubMed ID of the commented publication, the user can retrieve the detailed comments through the Alzforum Web site
RDF Based Queries
• Example Two
– Remarks
• In the SWAN dataset, publications (e.g., those retrieved from PubMed) are treated as instances of the Publication class
• We define publications that have been commented on in Alzforum as instances of the ARFPublication class
• The Oracle RDF Data Model allows us to create rules based on the hierarchical relationships defined in the data’s RDFS, which enables us to find all publications, including instances of subclasses of Publication (e.g., the ARF publications)
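The kind of is-a inference described above can be sketched as forward chaining over two standard RDFS entailment rules: rdfs9 (type propagation) and rdfs11 (subclass transitivity). The class names Publication and ARFPublication come from the SWAN schema described above; the instance IDs are hypothetical, and this is not Oracle's rulebase implementation.

```python
# Forward-chaining sketch of two RDFS entailment rules:
#   rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
#   rdfs9:  (x type A), (A subClassOf B)       => (x type B)
# Class names follow the SWAN schema; instance IDs are HYPOTHETICAL.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

triples = {
    ("ARFPublication", SUBCLASS, "Publication"),
    ("pubmed:1", TYPE, "Publication"),
    ("pubmed:2", TYPE, "ARFPublication"),
}

def rdfs_closure(graph):
    """Apply rdfs9 and rdfs11 repeatedly until no new triples appear."""
    graph = set(graph)
    while True:
        sub = {(s, o) for s, p, o in graph if p == SUBCLASS}
        # rdfs11: chain subclass links.
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        # rdfs9: an instance of a subclass is also an instance of the superclass.
        new |= {(x, TYPE, b) for x, p, a in graph if p == TYPE
                for a2, b in sub if a == a2}
        if new <= graph:
            return graph
        graph |= new

inferred = rdfs_closure(triples)
publications = sorted(s for s, p, o in inferred if p == TYPE and o == "Publication")
print(publications)  # ['pubmed:1', 'pubmed:2']
```

Materializing the closure once lets a plain query for instances of Publication also return ARFPublication instances, which mirrors the behavior described in the remarks.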
[Diagram: AlzPharm (AlzForum, BrainPharm, Oracle/RDF, SWAN)]