BioRDF Overview and Update
By Kei Cheung, Ph.D.
Yale Center for Medical Informatics
C-SHALS 2009, Boston, Massachusetts, February 25, 2009

BioRDF Objectives
• Enhance the HCLS KB
• Increase the value and use of the HCLS KB by identifying scientific use cases
• Work on a human-friendly user interface
• Document and publish findings to help accelerate and promote adoption of the Semantic Web

Participants
• Universities, pharmaceutical companies, start-up companies, government institutes, W3C, etc.

BioRDF Activities/Tasks
• Invited Talks
  – UMLS, NCBO, NIF, Biogateway, WikiNeuron, Gene Wiki, 3D Web Visualization, BioSIOC/aTag, VoID
• HCLS KB
  – Two instances of the HCLS KB have been created: DERI (Virtuoso) and Free University in Berlin (Allegro Graph)
  – Add receptors to the picture
• aTags
  – SenseLab and TCMGeneDIT
  – Neuroscience use case
  – Neurocommons
  – Matthias Samwald, Kei Cheung
• Query Federation
  – Kei Cheung, Rob Frost, Kingsley Idehen, Scott Marshall, Adrian Paschke, Eric Prud’hommeaux, Matthias Samwald, Jun Zhao

Brain: Neuron and Synapse
[Figure: courtesy of NIDA]

aTags
• Very simple, generic way of expressing biomedical statements
• A short snippet of text plus a list of ontology terms used for describing the text
• Using established vocabulary (SIOC, OBO ontologies)
• Encoded in RDFa (easy to embed in existing HTML-based systems)

aTags
• Transmitter T seems to activate receptor R
• Receptor R is expressed in brain region B
• Region B has strong axonal projections into brain region B2

aTags
• aTags will be created by
  – conversion of existing biomedical datasets
  – manual curation of data (highlight a text snippet in the browser and click on a del.icio.us-like bookmarklet)
• Design philosophy: simplicity and practicality
  – Use existing resources
  – Play along with existing systems (HTML content management, RDFa-enabled search engines)

Query Federation
• A Journey to Query Federation: from SPARQL Endpoint to Linked Data
• Application demo
  – Receptor explorer
  – Mismatch between Wikipedia and DBpedia
• Comparison of Triplestores
• Linked Data description and deployment
  – voiD
  – FeDeRate
  – URI

Receptor Explorer
• Receptor
• Genes involved in receptor
• Publications about gene
• Clinical trials referencing publications
[Figure: Receptor Explorer architecture. Labels: App; ESB/SOA; VectorC Semantic Service Bus; HCLS KB (DERI); DBpedia; map; linkedct.org; Bio2RDF; SenseLab Receptors; RDF; Entrez Gene; PubMed; Clinicaltrials.gov; Wikipedia; web sites. Copyright 2008 VectorC, LLC]

A Semantic Mismatch between Wikipedia and DBpedia
[Figure panels: Wikipedia; DBpedia]

Triplestore Comparison

  Features                    Virtuoso            Allegro Graph
  Class Hierarchy Inference   Yes                 Yes
  Linked Data Deployment      Built-in support    3rd party software (e.g., Pubby)

  Query Federation: Built-in support (Sesame and Oracle only). For other triplestores, a 3rd party middleware approach is required.

Linked Data Spaces (SPARQL against resource URIs)

Federated Query (FeDeRate)
[Figure: Federated Query; FeDeRate (Query Mediation); Local query 1: DBPedia (RDF); Local query 2: IUPHAR (SQL); Local query n]

Federation Scenario

  PREFIX db: <http://www.w3.org/2003/01/21-RDF-RDB-access/ns#SqlDB?properties=..%2Ftest%2F>
  PREFIX re: <http://receptor.example/re#>
  PREFIX dp: <http://receptor.example/dp#>
  SELECT ?abstract ?code ?ligand ?hum_seq_id ?chr ?refseq
  FROM NAMED db:IUPHAR.prop
  FROM NAMED db:DBPedia.rdf
  WHERE {
    # Get info from the (SQL) IUPHAR receptor tables.
    GRAPH db:IUPHAR.prop {
      ?r re:Code ?code .
      ?r re:Ligand ?ligand .
      ?r re:Human_nucleotide ?hum_seq_id
    }
    # Get info from (RDF) DBPedia.
    GRAPH db:DBPedia.rdf {
      ?p dp:chromosome ?chr .
      ?p dp:refseq ?refseq .
      ?p dp:symbol ?symbol .
      ?p db:abstract ?abstract
    }
  }

Example
• Join between IUPHAR & DBPedia (GABAB receptor)
[Figure panels: IUPHAR; DBPedia]

voiD: Vocabulary of Interlinked Datasets
• Motivation
  – Effective dataset selection
  – Efficient discovery of datasets, by search engines or data publishers
  – SPARQL query optimisation and query federation
• Two high-level concepts
  – Dataset: a dataset is published and maintained by a single provider and accessible on the Web through dereferenceable URIs or a SPARQL endpoint
  – Linkset: a subset of a void:Dataset that stores triples expressing the interlinking relationship between datasets
• voiD Vocabulary, http://rdfs.org/ns/void/html
• voiD User's Guide, http://rdfs.org/ns/void-guide

Biological Dataset in voiD Format

  :senselabontology a void:Dataset ;
      dcterms:title "SenseLab Neuron Ontology" ;
      dcterms:description "Neuroscience ontology derived from the SenseLab NeuronDB database." ;
      dcterms:license <> ; # TODO
      foaf:homepage <http://neuroweb.med.yale.edu/senselab/> ;
      void:exampleResource <http://purl.org/science/owl/sciencecommons/identified_by_pmid> ;
      void:exampleResource <http://purl.org/ycmi/senselab/neuron_ontology.owl#has_Receptor> ;
      void:exampleResource <http://purl.org/ycmi/senselab/neuron_ontology.owl#NMDA> ;
      dcterms:creator :senselab ; ## this organization can be further defined
      dcterms:source <http://purl.org/ycmi/senselab/neuron_ontology.owl#> ;
      dcterms:subject <http://purl.org/ycmi/senselab/neuron_ontology.owl#Receptor> ;
      dcterms:subject <http://dbpedia.org/resource/Receptor_(biochemistry)> ;
      dcterms:subject <http://dbpedia.org/resource/Neurotransmitter_receptor> ;
      dcterms:subject <http://dbpedia.org/resource/Sensory_receptor> ;
      dcterms:source <doi:10.1093/bib/bbm018> ;
      void:feature :owl ; ## this technical feature can be further defined
      void:sparqlEndpoint <http://hcls.deri.org:8080/> ;
      void:vocabulary <http://www.obofoundry.org/ro/ro.owl> .
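
The dataset-selection motivation for voiD can be illustrated with a small sketch. The Python below is not a voiD library and is not taken from the BioRDF work: VoidDataset and select_datasets are hypothetical names, the records are hand-written stand-ins for parsed voiD descriptions, and the second dataset is invented purely for contrast. The SenseLab values are copied from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VoidDataset:
    """Hypothetical in-memory stand-in for a void:Dataset description."""
    uri: str
    title: str                                          # dcterms:title
    subjects: List[str] = field(default_factory=list)   # dcterms:subject
    sparql_endpoint: Optional[str] = None               # void:sparqlEndpoint


def select_datasets(datasets, subject_uri):
    """Return the datasets about subject_uri that expose a SPARQL endpoint."""
    return [
        d for d in datasets
        if subject_uri in d.subjects and d.sparql_endpoint is not None
    ]


# Values taken from the SenseLab voiD description.
senselab = VoidDataset(
    uri=":senselabontology",
    title="SenseLab Neuron Ontology",
    subjects=[
        "http://purl.org/ycmi/senselab/neuron_ontology.owl#Receptor",
        "http://dbpedia.org/resource/Receptor_(biochemistry)",
        "http://dbpedia.org/resource/Neurotransmitter_receptor",
        "http://dbpedia.org/resource/Sensory_receptor",
    ],
    sparql_endpoint="http://hcls.deri.org:8080/",
)

# Invented dataset (assumption): same subject, but no SPARQL endpoint.
other = VoidDataset(
    uri=":exampledataset",
    title="Hypothetical dataset without an endpoint",
    subjects=["http://dbpedia.org/resource/Neurotransmitter_receptor"],
)

selected = select_datasets(
    [senselab, other],
    "http://dbpedia.org/resource/Neurotransmitter_receptor",
)
for d in selected:
    print(d.title, d.sparql_endpoint)
```

A federation engine could use this kind of filter to decide which endpoints are worth querying for a given subject before sending any SPARQL. In practice the records would be loaded from published voiD files and not written by hand.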
voiD Deployment
• Deploy a voiD file (in either Turtle, RDF/XML or RDFa format) onto the Web server
• Make it accessible to search engines, such as Sindice (http://sindice.com/)
• Publish a Semantic Sitemap file (sitemap.xml) on the server
  – “...... allows Data publishers to state where documents containing RDF data are located, and to advertise alternative means to access it ......” [1]
  – Use the datasetURI property in the sitemap.xml to point to the voiD description of a dataset, e.g., http://neuroweb.med.yale.edu/senselab/senselab-void.ttl#senselabontology
[1] http://sw.deri.org/2007/07/sitemapextension/

URI Issues
• Proliferation of synonymous URIs
  – http://dbpedia.org/resource/Dopamine_receptor
  – http://purl.org/ycmi/senselab/neuron_ontology.owl#Dopaminergic_Receptor
• Potential problems
  – Performance
  – Maintenance
• Possible solutions
  – Involvement of a nomenclature committee (e.g., IUPHAR) and a domain authority (e.g., Neuroscience Information Framework or NIF)
  – Persistent/permanent URI scheme (e.g., PURL), e.g., http://purl.org/nif/ontology/NIF-Molecule.owl#nifext_5832

Dereferenceable URIs
• A dereferenceable URI is a resource identification mechanism that uses the HTTP protocol to obtain a representation of the resource it identifies
• For Linked Data, the representation takes the form of an information resource that describes the resource that the URI identifies

Future Directions
• Submit a paper describing the query federation work to a journal or conference
• Continue and extend current tasks: Query Federation and aTag
• Add new tasks
  – e.g., semantic wiki, workflow, user interface, …
• Expand the HCLS KB (both instances)
  – e.g., new datasets such as UMLS
• Collaborate with other task forces
  – e.g., LODD (natural alternative use case, Faviki) and SWAN/SIOC

The End