Relational Databases to RDF (a.k.a. RDB2RDF)
Juan F. Sequeda
Dept. of Computer Science
University of Texas at Austin

I want RDF… but my data is in RDB!

Why RDB2RDF?
• Semantic Web
  – The Deep Web is 500 times bigger than the Static Web (2008)
  – Where do you think the majority of that data is stored?
  – If we want a Semantic Web, we need data to be on the web as RDF and interlinked!
• Where do you think this data is going to come from?
[Figure: many relational databases (RDB), several of them exposed through an RDB2RDF layer]

Why RDB2RDF?
• Data Integration
  – Do you know why RDF is cool?
    • Because it's a graph!
  – How do you link/integrate two different graphs?
    • Add edges between nodes, or merge nodes!

Real world scenario
• Boss: Find me clients that are based in cities that have a population of less than 1 million.
• You: ???

Clients
id  Name      c_id
10  ACME Inc  20
11  Foo Bars  21

Locations
c_id  city    state
20    Austin  TX
21    Dallas  TX

Real world scenario
• You: I found the population information… but it's in a different database. Can you add a column to the Locations table so I can insert the new data?
• DBA: NO!
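The RDF route avoids altering the Locations table. Following the Why RDB2RDF? slide, each database is treated as a graph, and the two graphs are then linked by adding edges or merging nodes. The sketch below shows this in plain Python. It is a minimal sketch, not an RDB2RDF implementation. Triples are ordinary tuples, and the IRIs and ex: predicates are taken from the graph figures in these slides. The rule that merges two location nodes when their city and state match is a hypothetical choice made for illustration.

```python
# Database 1: Clients(id, Name, c_id) and Locations(c_id, city, state)
CLIENTS = [(10, "ACME Inc", 20), (11, "Foo Bars", 21)]
LOCATIONS = [(20, "Austin", "TX"), (21, "Dallas", "TX")]
# Database 2: Location(id, city, state, pop)
LOCATION_POP = [(1, "Austin", "TX", 790390), (2, "Dallas", "TX", 1197816)]


def db1_triples():
    """Expose database 1 as (subject, predicate, object) triples."""
    triples = set()
    for cid, name, loc in CLIENTS:
        s = f"http://db1/client{cid}"
        triples |= {(s, "rdf:type", "ex:Client"), (s, "ex:name", name),
                    (s, "ex:basedIn", f"http://db1/loc{loc}")}
    for loc, city, state in LOCATIONS:
        s = f"http://db1/loc{loc}"
        triples |= {(s, "ex:city", city), (s, "ex:state", state)}
    return triples


def db2_triples():
    """Expose database 2 as triples; this is where ex:pop comes from."""
    triples = set()
    for lid, city, state, pop in LOCATION_POP:
        s = f"http://db2/loc{lid}"
        triples |= {(s, "ex:city", city), (s, "ex:state", state), (s, "ex:pop", pop)}
    return triples


def merge_locations(graph):
    """Merge nodes: rename each db1 location to the db2 location with the same
    (city, state). This matching rule is a hypothetical choice for the sketch."""
    def value(node, pred):
        return next(o for s, p, o in graph if s == node and p == pred)

    def place(node):
        return (value(node, "ex:city"), value(node, "ex:state"))

    db2 = {place(s): s for s, _, _ in graph if s.startswith("http://db2/loc")}
    rename = {s: db2[place(s)] for s, _, _ in graph
              if s.startswith("http://db1/loc") and place(s) in db2}
    return {(rename.get(s, s), p, rename.get(o, o)) for s, p, o in graph}


def small_city_clients(graph, limit=1_000_000):
    """Names of clients whose ex:basedIn node has an ex:pop below `limit`."""
    pop = {s: o for s, p, o in graph if p == "ex:pop"}
    names = {s: o for s, p, o in graph if p == "ex:name"}
    return sorted(names[s] for s, p, o in graph
                  if p == "ex:basedIn" and o in pop and pop[o] < limit)


unlinked = db1_triples() | db2_triples()
print(small_city_clients(unlinked))                   # [] : no edge reaches ex:pop
print(small_city_clients(merge_locations(unlinked)))  # ['ACME Inc']
```

Before the merge, no ex:basedIn edge reaches a node that carries ex:pop, so the query finds nothing. After the merge, only ACME Inc qualifies, because Dallas (1197816) is above the 1 million threshold. No table in either database was changed.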
Location (in the second database)
id  city    state  pop
1   Austin  TX     790390
2   Dallas  TX     1197816

[Figure, built up over several slides: both databases rendered as RDF graphs. From the first database (Clients and Locations), http://db1/client10 and http://db1/client11 have rdf:type ex:Client, ex:name "ACME Inc" / "Foo Bars", and ex:basedIn http://db1/loc20 / http://db1/loc21, whose ex:city and ex:state are Austin, TX and Dallas, TX. From the second database, http://db2/loc1 and http://db2/loc2 carry ex:city, ex:state and ex:pop (790390 and 1197816). In the final build, the matching location nodes are merged with http://db2/loc1 and http://db2/loc2, so each client is linked to a population.]

A bit of history
• Relational Databases on the Web.
  TimBL, 1998
• W3C Workshop on RDF Access to Relational Databases, October 2007
  – Report: http://www.w3.org/2007/03/RdfRDB/report
• W3C RDB2RDF Incubator Group, 2008-2009
  – Survey: http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf
• W3C RDB2RDF Working Group, 2009 – today
  – R2RML: RDB to RDF Mapping Language
  – A Direct Mapping of Relational Data to RDF

RDB and the Semantic Web
[Figure, built up over three slides: the Semantic Web stack (RDF, RDFS, OWL, RIF) placed next to the relational stack, pairing relational model / RDF, table definition / RDFS, constraints / OWL, triggers / RIF]

Overview

R2RML: RDB to RDF Mapping Language
• A language for expressing customized mappings from relational databases to RDF datasets
• Gives the developer precise control
  – You create the structure you want
  – You choose the target vocabulary
• No RDFS/OWL is created from the schema

R2RML Mapping
[Figure: RDB → R2RML (written manually) → RDF]

Direct Mapping
• Automatic transformation from a relational database to RDF
  – Click a button… voilà!
• Generates RDFS/OWL for the database schema
• If this doesn't get you where you want, use existing languages for mapping
  – Semantic Web community: RDF to RDF with RIF or SPARQL CONSTRUCT
  – Database community: create SQL views and directly map those

Direct Mapping
[Figure: RDB → Direct Mapping (automatic) → RDF, refined either with SQL views before mapping or with RIF/SPARQL CONSTRUCT afterwards]

Hybrid
• Instead of starting from a blank R2RML file:
  1) Direct Mapping
  2) Manual editing

Hybrid Mapping
[Figure: RDB → Direct Mapping expressed in R2RML → modified R2RML → RDF]

Materialized Triples
• Data is not dynamic
• Dump the RDB into RDF, then load it into a triplestore
• The RDF dump may not be consistent with the RDB

Materialized Triples
[Figure: RDB → dump → RDF, queried with SPARQL]

Virtual Triples
• Data is dynamic
• Need to query the RDB with SPARQL
• Translate SPARQL to SQL
  – Comparing the overall performance […] of the fastest rewriter with the fastest relational database shows an overhead for query rewriting of 106%.
    This is an indicator that there is still room for improving the rewriting algorithms [Bizer and Schultz 2009]
  – Current rdb2rdf systems are not capable of providing the query execution performance required [...] it is likely that with more work on query translation, suitable mechanisms for translating queries could be developed. These mechanisms should focus on exploiting the underlying database system's capabilities to optimize queries and process large quantities of structured data [Gray et al. 2009]
  – Ultrawrap solves this
• RDF data is consistent with RDB data

Virtual Triples
[Figure: SPARQL queries answered over the RDB through a mapping, with the RDF kept virtual]

RDB2RDF Space
[Figure: Direct Mapping, Hybrid, Custom Mapping; Materialized Triples, Virtual Triples]

Tuples to Triples
SID  NAME   AGE
1    Alice  25
2    Bob    26
[Figure: the row key (SID) becomes the SUBJECT, the column (AGE) the PREDICATE, and the cell value the OBJECT, e.g. http://ex.com/person1 http://ex.com/age 25]

Current Status of W3C RDB2RDF WG
• R2RML: RDB to RDF Mapping Language, Working Draft
  http://www.w3.org/TR/r2rml/
• A Direct Mapping of Relational Data to RDF, Working Draft
  http://www.w3.org/TR/rdb-direct-mapping/
• Last Call: Sept 1 (hopefully)

Implementations
• Ultrawrap
  – SPARQL and semantically equivalent SQL have equal execution time
  – Commercial databases
  – http://ribs.csres.utexas.edu/ultrawrap
• Spyder
  – Oracle and HSQLDB
  – http://www.revelytix.com/content/spyder
• Other non-standard RDB2RDF systems
  – D2R Server, Virtuoso, Triplify, …

Publicity
• International Semantic Web Conference
  – Oct 23 – 27 in Bonn, Germany
• Posters and Demos – August 15
• Consuming Linked Data Workshop – August 15
• Outrageous Ideas Track – Sept 5
• Semantic Web Challenge – Sept 30
• 2nd Linked Data-a-thon – Oct 1
• http://iswc2011.semanticweb.org/
Join the Facebook group SSSW2011

Thank You
Acknowledgments:
- RiBS @ UT Austin
- W3C RDB2RDF WG members
- David McNeil
- Revelytix
@juansequeda
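The Tuples to Triples slide and the difference between materialized and virtual triples can be illustrated together in a small Python sketch using the standard sqlite3 module. Several parts are assumptions:

• The table name person.
• The IRI scheme (http://ex.com/person<SID> for subjects, http://ex.com/<lowercased column> for predicates), generalized from the single example triple on the slide.
• The one-pattern translator pattern_to_sql, a hypothetical simplification.

This is not the exact IRI scheme of the W3C Direct Mapping. Nor is it how Ultrawrap or any other listed system rewrites full SPARQL queries.

```python
import sqlite3

# Hypothetical IRI scheme generalized from the "Tuples to Triples" slide:
# subject = http://ex.com/person<SID>, predicate = http://ex.com/<column>.
BASE = "http://ex.com/"
COLUMNS = ["NAME", "AGE"]  # non-key columns that become predicates


def make_db():
    """Create the slide's example table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (SID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER)")
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                     [(1, "Alice", 25), (2, "Bob", 26)])
    return conn


def subject(sid):
    return f"{BASE}person{sid}"


def predicate(column):
    return f"{BASE}{column.lower()}"


def materialize(conn):
    """Materialized triples: dump every cell of every row once (a snapshot)."""
    triples = set()
    for sid, *values in conn.execute(f"SELECT SID, {', '.join(COLUMNS)} FROM person"):
        for column, value in zip(COLUMNS, values):
            triples.add((subject(sid), predicate(column), value))
    return triples


def pattern_to_sql(pred_iri):
    """Virtual triples: rewrite the triple pattern (?s <pred_iri> ?o) into SQL.
    The column comes from a fixed whitelist, so no caller text reaches the SQL."""
    column = {predicate(c): c for c in COLUMNS}[pred_iri]  # KeyError if unmapped
    return f"SELECT SID, {column} FROM person"


def query_virtual(conn, pred_iri):
    """Answer the pattern against the live table on every call."""
    return {(subject(sid), pred_iri, value)
            for sid, value in conn.execute(pattern_to_sql(pred_iri))}


conn = make_db()
dump = materialize(conn)
age = predicate("AGE")
conn.execute("UPDATE person SET AGE = 27 WHERE SID = 2")
print(sorted(t for t in dump if t[1] == age))  # Bob is still 26 in the dump
print(sorted(query_virtual(conn, age)))        # Bob is 27 via the SQL rewrite
```

The dump is a snapshot, so the update to Bob's age is missing from it. The rewritten SQL reads the current row. This is the consistency trade-off listed on the Materialized Triples and Virtual Triples slides.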