From Web 1.0  Web 3.0:
Is RDF access to RDB enough?
Vipul Kashyap
Senior Medical Informatician, Clinical Informatics R&D
Partners Healthcare System
Martin Flanagan
CTO, InSilico Discovery
W3C Workshop on RDF Access to Relational Databases
October 26th, 2007
Outline
• Position
• Use Case Scenario
• Solution Approach
• A Generalized Framework for RDF Access
• Next Steps:
— Proposed Roadmap
— Research Topics
Position
There is a need for a generalized framework (format,
representation language, algebra?) for RDF access to:
(A) Relational Databases
(B) Tabular Data Sources, e.g., Excel Spreadsheets
(C) Web Services
Motivation:
(A) Large amounts of “tabular” data and an increasing number of
web services in Healthcare and the Life Sciences
(B) Learn from the relational database success story: Declarative
query language + Algebra + Opportunities for optimization
(C) Potential for providing incremental value, increasing the
adoption and acceptance of the Semantic Web.
Use Case Scenario:
Biological Explanations for Statistical Correlations
• What is the location of a given gene, e.g., CPNE1, on the human genome?
Data Repository: NCBI Entrez
Access Mechanism: Web Services
• For what gene(s) is a given SNP, e.g., rs6060535, in the upstream regulatory region?
Data Repository: RDBMS containing dbSNP and regulatory region data
Access Mechanism: JDBC/SQL
• What genes have been found to be "coexpressed" with CPNE1, and in what study?
Data Repository: Excel spreadsheet containing the co-expression patterns of various genes in various studies
Access Mechanism: .NET API, MS Office API
Solution Approach
• Ontology-based RDF query specification
• Mapping Framework
— Relational Databases
— Excel Spreadsheets
— Web Services
• Query Translations and Execution
Illustrations of a working system based on the Semantic Discovery
System by InSilico Discovery
(http://www.insilicodiscovery.com)
Ontology based RDF Query Specification
SPARQL Query Generated:
prefix example: <http://www.semanticdiscoverysystems.com/Example.owl#>
prefix ns: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select distinct ?v0 ?v1
where
{
?v0 ns:type example:gene .
?v0 example:has_gene_region ?v1 .
?v0 example:gname 'CPNE' .
}
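To make the shape of this query concrete, here is a minimal Python sketch of how its three triple patterns could be evaluated against an in-memory set of triples. It is not the Semantic Discovery System's implementation. The sample triples, the placeholder identifiers (gene_1, region_1, and so on), and the helpers is_var, match_pattern, and evaluate are all hypothetical.

```python
# Toy evaluation of a basic graph pattern over in-memory triples.
# All data and helper names are illustrative assumptions.

RDF_TYPE = "ns:type"

# Hypothetical sample triples: (subject, predicate, object).
TRIPLES = {
    ("gene_1", RDF_TYPE, "example:gene"),
    ("gene_1", "example:gname", "CPNE"),
    ("gene_1", "example:has_gene_region", "region_1"),
    ("gene_2", RDF_TYPE, "example:gene"),
    ("gene_2", "example:gname", "OTHER"),
    ("gene_2", "example:has_gene_region", "region_2"),
}


def is_var(term):
    """Variables are written SPARQL-style, with a leading '?'."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new_binding = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against its existing value.
            if new_binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None  # a constant in the pattern does not match
    return new_binding


def evaluate(patterns, triples):
    """Join all patterns by extending bindings one pattern at a time."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


QUERY = [
    ("?v0", RDF_TYPE, "example:gene"),
    ("?v0", "example:has_gene_region", "?v1"),
    ("?v0", "example:gname", "CPNE"),
]

# "select distinct ?v0 ?v1": project the two variables and deduplicate.
results = sorted({(b["?v0"], b["?v1"]) for b in evaluate(QUERY, TRIPLES)})
print(results)  # [('gene_1', 'region_1')]
```

The shared variable ?v0 is what ties the three patterns together: a binding survives only if every pattern can be satisfied with the same value for it.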
Mapping to Relational Databases
Mapping to Oracle Databases
Mapping to Gene Names Mediator Class
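The slide shows this mapping only as a screenshot, so the following is a rough sketch of the general idea rather than the actual tool: an ontology property is tied to a table and two columns, and a triple pattern is translated into SQL. An in-memory SQLite database stands in for Oracle. The gene_names table, its columns, the PROPERTY_MAP format, and the pattern_to_sql helper are assumptions made for illustration.

```python
import sqlite3

# Hypothetical mapping: ontology property -> (table, subject column, object column).
# This dictionary format is an illustration, not the Mediator Definition Language.
PROPERTY_MAP = {
    "example:gname": ("gene_names", "gene_id", "name"),
}


def pattern_to_sql(predicate, obj=None):
    """Translate one triple pattern (?s predicate obj) into parameterized SQL."""
    table, subject_col, object_col = PROPERTY_MAP[predicate]
    sql = f"SELECT {subject_col}, {object_col} FROM {table}"
    params = []
    if obj is not None:  # a constant object becomes a WHERE clause
        sql += f" WHERE {object_col} = ?"
        params.append(obj)
    return sql, params


# In-memory SQLite stands in for the Oracle database named on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene_names (gene_id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO gene_names VALUES (?, ?)",
    [("gene_1", "CPNE"), ("gene_2", "OTHER")],
)

sql, params = pattern_to_sql("example:gname", "CPNE")
rows = conn.execute(sql, params).fetchall()

# Re-express the rows as triples for the RDF side of the mapping.
triples = [(subject, "example:gname", name) for subject, name in rows]
print(sql)
print(triples)
```

Pushing the constant into the WHERE clause lets the database do the filtering, which is one of the optimization opportunities the relational model offers.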
Mapping to Web Services
Mapping to GetGenomeLocations in gene_regions Mediator class
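As a hedged illustration of what mapping a service operation into a mediator class might involve, the sketch below wraps a service call so that each returned record becomes a has_gene_region triple. A local stub replaces the remote service, so no network call is made. The record fields and the names get_genome_locations_stub and gene_regions_mediator are assumptions, not the real service's response format or the SDS API.

```python
# A local stub stands in for the remote web service; no network call is made.
# The record fields are assumptions, not the real service's response format.


def get_genome_locations_stub(gene_name):
    """Pretend web-service operation returning location records for a gene."""
    fake_responses = {
        "CPNE": [{"gene": "gene_1", "region": "region_1"}],
    }
    return fake_responses.get(gene_name, [])


def gene_regions_mediator(gene_name, service=get_genome_locations_stub):
    """Map each service record onto a has_gene_region triple."""
    return [
        (record["gene"], "example:has_gene_region", record["region"])
        for record in service(gene_name)
    ]


print(gene_regions_mediator("CPNE"))
print(gene_regions_mediator("UNKNOWN"))  # no records, so no triples
```

Passing the service in as a parameter keeps the mapping logic separate from the transport, so a real client could be substituted for the stub.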
Mapping to Excel Spreadsheets
Mapping to Spreadsheet Data
Mapping to Gene Names Mediator Class
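A spreadsheet mapping can be pictured as a column-to-property table applied row by row. The sketch below is only an approximation of that idea. CSV text stands in for an Excel worksheet, and the column names, the example:study property, and the rows_to_triples helper are hypothetical.

```python
import csv
import io

# CSV text stands in for a worksheet; the column names are hypothetical.
SHEET = """gene_id,gene_name,study
gene_1,CPNE,study_A
gene_2,OTHER,study_B
"""

# Hypothetical column-to-property mapping; `example:study` is an invented property.
COLUMN_MAP = {
    "gene_name": "example:gname",
    "study": "example:study",
}


def rows_to_triples(text, subject_column, column_map):
    """Turn each row into one triple per mapped, non-empty cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = row[subject_column]
        for column, prop in column_map.items():
            if row[column]:
                triples.append((subject, prop, row[column]))
    return triples


triples = rows_to_triples(SHEET, "gene_id", COLUMN_MAP)
for triple in triples:
    print(triple)
```

One column supplies the subject of every triple for the row, and each remaining mapped column contributes a property and value.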
Query Translation and Execution
This one SPARQL statement ‘joins’ data from NCBI, Excel, and Oracle: “who did what assay matching this sequence data …”
Translators
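The cross-source 'join' can be sketched as a natural join over variable bindings, once each translator has returned its share of the answer. This is a simplified illustration, not the SDS execution engine. The three binding lists are invented, and natural_join and federated_query are hypothetical helper names.

```python
# Each list stands for the output of one source-specific translator.
# The rows are invented; only the join logic is the point of the sketch.
from_oracle = [
    {"?gene": "gene_1", "?name": "CPNE"},
    {"?gene": "gene_2", "?name": "OTHER"},
]
from_ncbi = [{"?gene": "gene_1", "?region": "region_1"}]
from_excel = [
    {"?gene": "gene_1", "?study": "study_A"},
    {"?gene": "gene_3", "?study": "study_B"},
]


def natural_join(left, right):
    """Combine bindings that agree on every variable they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[var] == r_row[var] for var in shared):
                joined.append({**l_row, **r_row})
    return joined


def federated_query(*sources):
    """Join the binding sets from all sources, left to right."""
    result = sources[0]
    for source in sources[1:]:
        result = natural_join(result, source)
    return result


answers = federated_query(from_oracle, from_ncbi, from_excel)
print(answers)
```

Only gene_1 appears in all three sources, so it is the only binding that survives the join. A production system would also choose a join order and push filters down to each source.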
A Generalized Framework for RDF Access
Ontology Classes and Properties
Gene, GeneRegion
has_gene_region, gname
Mediator Framework Classes:
gene.mdl, gene_region.mdl, gene_names.mdl, …
RDB specific classes:
oracle.mdl
Web service specific classes:
ncbi.mdl, keg.mdl
Excel specific classes:
excel.mdl
The SDS Platform is based on the Mediator Definition Language (MDL)
work done by Val Tannen and his students at the University of Pennsylvania.
MDL was earlier implemented in the K3 system and was widely used in the pharmaceutical industry.
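The layering on this slide (ontology terms, mediator classes, source-specific classes) can be approximated in a few lines of Python. This sketch does not reproduce MDL syntax or the SDS class model. SourceAdapter, InMemoryAdapter, Mediator, REGISTRY, and lookup are hypothetical names, and the data is invented.

```python
from abc import ABC, abstractmethod


class SourceAdapter(ABC):
    """Source-specific layer (the role played by oracle.mdl, ncbi.mdl, excel.mdl)."""

    @abstractmethod
    def fetch(self, key):
        """Return raw (subject, value) pairs for a lookup key."""


class InMemoryAdapter(SourceAdapter):
    """Hypothetical adapter backed by a dict instead of a real source."""

    def __init__(self, data):
        self.data = data

    def fetch(self, key):
        return self.data.get(key, [])


class Mediator:
    """Mediator layer: ties one ontology property to one source adapter."""

    def __init__(self, prop, adapter):
        self.prop = prop
        self.adapter = adapter

    def triples(self, key):
        return [(s, self.prop, v) for s, v in self.adapter.fetch(key)]


# Ontology layer: each property is answered by whichever mediator is registered.
REGISTRY = {
    "example:gname": Mediator(
        "example:gname",
        InMemoryAdapter({"CPNE": [("gene_1", "CPNE")]}),
    ),
    "example:has_gene_region": Mediator(
        "example:has_gene_region",
        InMemoryAdapter({"CPNE": [("gene_1", "region_1")]}),
    ),
}


def lookup(prop, key):
    """Answer a property lookup without the caller knowing the source type."""
    return REGISTRY[prop].triples(key)


print(lookup("example:gname", "CPNE"))
print(lookup("example:has_gene_region", "CPNE"))
```

Because callers see only ontology properties, swapping a relational adapter for a web-service or spreadsheet adapter would not change the query side.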
Conclusions
• Need to think of various types of structured/semistructured/tabular data sources in a holistic manner:
— XML Documents (GRDDL Transforms)
— Relational Databases
— Web Services
— Excel Spreadsheets
— Other “Tabular” and “Tree” data sources
• Potential for providing value beyond relational databases
• Accelerate the transition to the Semantic Web
• Increase Adoption and Acceptance
Next Steps: Proposed Roadmap
[Diagram labels: RDF; Generalized Transformation Language; GRDDL; XML; Relational Algebra; Relational Databases; WSDL; Excel Spreadsheets]
Next Steps: Research
• Extension of Relational Algebra?
— XQuery
— RDF
— GRDDL Transformations
— WSDL
— Read-only Web Service Choreography/Composition
• What aspects of the above can be “webified”?
— Access Transformation Languages
— Mapping Languages: Is XQuery or RDF enough?
• Existing efforts in Mediator research
— E.g., Mediator Definition Language (MDL)