RDF languages and storages
part 1 - expressiveness

Maciej Janik
Conrad Ibanez
CSCI 8350, Fall 2004

Outline
  Comparison of RDF languages
  RQL
  Sesame implementation
  SquishQL - basis for RDQL
  Redland store

Sesame
  Web-based architecture
  Persistent RDF store
    use of traditional DBMS
    use of dedicated RDF triple storage
  Database independent
  Scalable architecture
  Query engine that implements RQL

Sesame - architecture
  Written in Java
  Modules:
    HTTP/SOAP handler
    Admin module
    Query module
    Export module
    Repository Abstraction Layer
  Use of PostgreSQL

Sesame - modules
  Admin module
    incrementally add RDF/RDFS
    clearing repository
    schema operations
      recognise ‘type’, ‘subClassOf’, ‘subPropertyOf’
      consistency checking
      adding inferred facts to repository
  RDF Export module
    export RDF to standard XML-serialized format

Sesame - modules
  Query module
    query plan and optimizer similar to already known DB solutions
    query is translated to a set of simple RAL calls
    each leaf of the query plan can ‘evaluate itself’ and pull data from RAL
    data are returned as streams
    lack of optimization on storage level

Sesame - modules
  RAL - Repository Abstraction Layer
    makes Sesame storage independent
    API supports RDF Schema semantics (e.g. subsumption reasoning)
    can be stacked one on another
    interface oriented for persistent storage (DBMS, Object-Relational DB)
    data returned as streams
    can even use net-based RDF services (!)
  Due to poor performance, a cache was implemented as one of the RALs
    cache mainly for RDFS, as it needs code support in reasoning (subClassOf, ...)
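
The stacking of RAL layers described above, with a cache as one layer, can be illustrated with a minimal sketch. Sesame itself is written in Java; the Python below is only an illustration, and every name in it (ListStore, CachingLayer, get_statements, scans) is a hypothetical stand-in, not Sesame's actual API. It assumes a layer exposes a single triple-pattern call that returns results as a stream.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class ListStore:
    """Bottom layer: an in-memory list standing in for a DBMS (assumption)."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self.triples: List[Triple] = list(triples)
        self.scans = 0  # incremented each time a scan of the store starts

    def get_statements(self, s: Optional[str] = None, p: Optional[str] = None,
                       o: Optional[str] = None) -> Iterator[Triple]:
        self.scans += 1
        for triple in self.triples:
            # None acts as a wildcard for that position of the pattern.
            if all(q is None or q == v for q, v in zip((s, p, o), triple)):
                yield triple  # results are streamed one triple at a time


class CachingLayer:
    """Layer stacked on top of another layer with the same interface."""

    def __init__(self, inner) -> None:
        self.inner = inner
        self.cache: Dict[Pattern, List[Triple]] = {}

    def get_statements(self, s: Optional[str] = None, p: Optional[str] = None,
                       o: Optional[str] = None) -> Iterator[Triple]:
        key = (s, p, o)
        if key not in self.cache:
            # Cache miss: pull the whole stream from the layer below once.
            self.cache[key] = list(self.inner.get_statements(s, p, o))
        return iter(self.cache[key])


store = ListStore([
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:Dog", "rdfs:label", "dog"),
])
ral = CachingLayer(store)  # the cache is just another layer in the stack

first = list(ral.get_statements(p="rdfs:subClassOf"))
second = list(ral.get_statements(p="rdfs:subClassOf"))
print(len(first), store.scans)
```

Because both layers share one interface, the query side does not need to know whether it talks to the cache or to the store. The second subClassOf lookup is answered from the cache and never reaches the bottom layer, which is the effect the slide attributes to caching schema data.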

Sesame - issues
  Due to portability (RAL), cannot optimize for underlying data storage
  Incremental uploads (schema) are slow due to rebuilding the table in PostgreSQL
  Scaled up to 400,000 statements (RDF from WordNet)
    very loosely connected graph
    took 94 minutes (71 statements per second)
  Slow upload of new data due to lots of required database operations
  Queries run slowly due to the same issues

Redland, Rasqal, Raptor
  Storage for RDF triples - does not implement any language by itself
  This is the main module to include in an RDF manipulation system
  Implemented in pure C for portability
  Rich API makes it possible to build modules on top of it
  Rasqal - RDF query module
    RDQL
    SPARQL
  Raptor - a fast RDF parser

Redland
  Triple: Subject - Predicate - Object
  API enables retrieval of triples
  Highly optimized for performance
  Indexes
    SP2O - get target
    PO2S - get source
    SO2P - get relations between nodes
    P2SO - get nodes in relation
    S2P  - get relations for subject

Redland - RDF Model stores
  Memory based
    memory
      double-linked list
      small models
    hashes - memory
      basic indexes on triples
      memory native storage with BDB hashes, no persistence
  Persistent
    hashes - bdb
      hashes with BDB
      BDB hashes on disk
      native storage, scales to low millions of tuples
    3store
      triplestore from AKT project
      not well supported
    mysql
      uses MySQL DB

Redland - class diagram
  Efficient implementation of triple in memory
    use of pointers
    URI value separated
  Strict memory management - no leaks
  Abstraction of model to support different storages
  Fast parser / serializer

Redland
  API available in different languages
    C, C#, Java, Perl, Python, PHP, Ruby, Tcl
  API for manipulating triples, URI/literals, graphs
  Portable - can be built on most OSes
  Scalable to handle millions of triples when using persistent storage
    but indexing is very space-consuming
  Support for context and hierarchy of models

RDF languages and storages
part 2 - indexing semi-structured data

Maciej Janik
Conrad Ibanez
CSCI 8350, Fall 2004
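
As a recap of the Redland index names from part 1 (SP2O, PO2S, SO2P, P2SO, S2P), the sketch below shows one way such key-to-value indexes over triples could be kept as hash maps. Redland is implemented in C; this Python class, its name TripleIndex, and its method names are hypothetical and only illustrate what each index answers. It is not Redland's code or API.

```python
from collections import defaultdict


class TripleIndex:
    """Hypothetical hash-based triple indexes named after the slide."""

    def __init__(self):
        self.sp2o = defaultdict(set)  # (subject, predicate) -> objects
        self.po2s = defaultdict(set)  # (predicate, object) -> subjects
        self.so2p = defaultdict(set)  # (subject, object) -> predicates
        self.p2so = defaultdict(set)  # predicate -> (subject, object) pairs
        self.s2p = defaultdict(set)   # subject -> predicates

    def add(self, s, p, o):
        # Every triple is written into all five indexes.
        self.sp2o[(s, p)].add(o)
        self.po2s[(p, o)].add(s)
        self.so2p[(s, o)].add(p)
        self.p2so[p].add((s, o))
        self.s2p[s].add(p)

    # .get() is used so that lookups never create empty index entries.
    def targets(self, s, p):            # SP2O - get target
        return set(self.sp2o.get((s, p), ()))

    def sources(self, p, o):            # PO2S - get source
        return set(self.po2s.get((p, o), ()))

    def relations_between(self, s, o):  # SO2P - get relations between nodes
        return set(self.so2p.get((s, o), ()))

    def nodes_in_relation(self, p):     # P2SO - get nodes in relation
        return set(self.p2so.get(p, ()))

    def relations_for(self, s):         # S2P - get relations for subject
        return set(self.s2p.get(s, ()))


idx = TripleIndex()
idx.add("ex:Dog", "rdfs:subClassOf", "ex:Animal")
idx.add("ex:Cat", "rdfs:subClassOf", "ex:Animal")
idx.add("ex:Dog", "rdfs:label", "dog")

print(sorted(idx.sources("rdfs:subClassOf", "ex:Animal")))
print(sorted(idx.relations_for("ex:Dog")))
```

Each lookup is a single hash access, but every triple is stored once per index. That duplication is one plausible reading of the remark that indexing is very space-consuming.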