A Path-based Relational RDF Database

Authors: Akiyoshi Matono, Toshiyuki Amagasa, Masatoshi Yoshikawa,
Shunsuke Uemura
Semantic Web
• With the World Wide Web growing ever larger and more
complex, the Semantic Web has emerged as a vision of
the next generation of the Web. Compared with the
current Web, the Semantic Web makes human-to-machine
and machine-to-machine interactions more intelligent
by providing metadata of good quality and quantity
on Web resources.
RDF
• Resource Description Framework (RDF), the core of the
Semantic Web, describes its metadata and semantics.
As the Semantic Web becomes more widely used, the
storage and retrieval of RDF data have become
important.
• RDF is commonly used for large data sets, such as
ontologies or dictionaries. Processing such large data
with conventional RDF databases can cause problems.
RDF
• RDF Schema is a specification for defining
schematic information of RDF data. It lets
developers define a particular vocabulary for RDF
data and specify the kinds of objects.
• RDF data can be decomposed into statements, so it
can also be modeled as a directed graph, where
nodes and arcs represent resources and
relationships, respectively. It is composed of RDF
meta-schema data, RDF schema data, and RDF data,
and each group is an instance of the preceding one.
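The statement-to-graph view above can be sketched in a few lines of Python. The three statements and the `build_graph` helper are hypothetical illustrations, not data or code from the paper.

```python
from collections import defaultdict

# Hypothetical example statements (subject, predicate, object); not from the paper.
statements = [
    ("#picasso", "#paints", "#guernica"),
    ("#guernica", "#title", "Guernica"),
    ("#picasso", "#name", "Pablo Picasso"),
]

def build_graph(triples):
    """Return an adjacency map: subject node -> list of (arc label, object node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        # Each statement becomes one labeled arc from subject to object.
        graph[subject].append((predicate, obj))
    return dict(graph)

graph = build_graph(statements)
```

Each resource is a node, and each statement contributes one arc labeled with its predicate.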
The conventional approach
• Flat approach: store the statements flatly.
• Problems?
Any query that involves RDF schema information will not be handled
properly.
The conventional approach
• Creates relational tables for classes and properties,
storing resources according to their classes.
• Problems?
Makes no distinction between schema and data, so it
has problems when you perform a schema query
rather than an RDF data query.
The conventional approach
• Store the subject, predicate, and object as
keys in three tables. Using these keys,
we can retrieve the corresponding statements.
• Problems?
– Poor performance when processing
path-based queries.
– Join operations make the query string
longer.
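To make the join problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a single hypothetical `statement` table and leaves out the separate key tables, so it simplifies the approach described above. A path of two arcs already needs one self-join, and each further arc adds another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical simplified layout: one row per statement.
CREATE TABLE statement (subject TEXT, predicate TEXT, object TEXT);
INSERT INTO statement VALUES
  ('#picasso',  '#paints',  '#guernica'),
  ('#guernica', '#title',   'Guernica'),
  ('#rodin',    '#sculpts', '#thinker'),
  ('#thinker',  '#title',   'The Thinker');
""")

def titles_of_paintings(conn):
    """Follow #paints then #title; one self-join per extra arc in the path."""
    rows = conn.execute("""
        SELECT s2.object
        FROM statement AS s1
        JOIN statement AS s2 ON s2.subject = s1.object
        WHERE s1.predicate = '#paints'
          AND s2.predicate = '#title'
    """).fetchall()
    return [row[0] for row in rows]

titles = titles_of_paintings(conn)
```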
Sub graphs
• Graph CI, inheritance relationships between classes
• Graph PI, inheritance relationships between properties
• Graph T, a single-labeled directed acyclic graph
• Graph DR, domain (rdfs:domain) or range (rdfs:range) of
each property
• Graph G, consists of all the remaining statements not
included in the above sub graphs
• Separates RDF schema information from RDF instance data
• Simpler structures are easier to store
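The decomposition can be sketched as a split on the predicate of each statement. The predicate-to-graph mapping below is an assumption for illustration. The slides name rdfs:domain and rdfs:range only. Using rdfs:subClassOf for CI, rdfs:subPropertyOf for PI, and rdf:type for T is a guess at what those graphs hold, and `split_subgraphs` is a hypothetical helper.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumed mapping from predicate to sub graph; only the DR entries are
# stated explicitly in the slides.
SUBGRAPH_BY_PREDICATE = {
    RDFS + "subClassOf": "CI",
    RDFS + "subPropertyOf": "PI",
    RDF + "type": "T",
    RDFS + "domain": "DR",
    RDFS + "range": "DR",
}

def split_subgraphs(triples):
    """Partition (subject, predicate, object) statements into CI, PI, T, DR, G."""
    parts = {name: [] for name in ("CI", "PI", "T", "DR", "G")}
    for triple in triples:
        # Anything that is not a recognized schema predicate falls into G.
        parts[SUBGRAPH_BY_PREDICATE.get(triple[1], "G")].append(triple)
    return parts

# Hypothetical sample statements.
sample = [
    ("#Painter", RDFS + "subClassOf", "#Artist"),
    ("#paints", RDFS + "domain", "#Painter"),
    ("#picasso", RDF + "type", "#Painter"),
    ("#picasso", "#paints", "#guernica"),
]
parts = split_subgraphs(sample)
```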
Path expression
Store the arc paths of the graphs in a path table in the
relational database
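A minimal sketch of enumerating arc paths follows. It assumes an acyclic graph and only paths that start at a root node. It writes each expression with the most recent arc first and `<` as separator, a format inferred from the `'#title<#paints'` literal in the query slide. The `path_expressions` helper and the sample data are hypothetical.

```python
def path_expressions(triples):
    """Map each arc-path expression to the sorted nodes it reaches from a root.

    Assumes the graph is acyclic; a cycle would make the walk recurse forever.
    """
    children, has_incoming, nodes = {}, set(), set()
    for subject, predicate, obj in triples:
        children.setdefault(subject, []).append((predicate, obj))
        has_incoming.add(obj)
        nodes.update((subject, obj))
    roots = nodes - has_incoming  # nodes with no incoming arc
    result = {}

    def walk(node, labels):
        for label, child in children.get(node, []):
            path = [label] + labels  # newest arc first
            result.setdefault("<".join(path), set()).add(child)
            walk(child, path)

    for root in sorted(roots):
        walk(root, [])
    return {expr: sorted(reached) for expr, reached in result.items()}

# Hypothetical sample statements.
paths = path_expressions([
    ("#picasso", "#paints", "#guernica"),
    ("#guernica", "#title", "Guernica"),
    ("#picasso", "#name", "Pablo Picasso"),
])
```

Storing each expression once and pointing resources at it is what lets a path query become a string comparison instead of a chain of joins.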
Extended interval numbering scheme
• Add a virtual root if the graph has more than one root
node
• Add new node(s) for each node that is reachable
through multiple paths
• Each node is assigned (preorder, postorder, depth)
• v is an ancestor of u: pre(v) < pre(u) ∧ post(v) > post(u),
where v and u are nodes in the graph.
• v is a parent of u: v is an ancestor of u, and depth(u) -
depth(v) = 1
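The numbering rules above can be sketched with one depth-first traversal. This is an illustration under stated assumptions, not the paper's algorithm. The input is an acyclic `children` map, a node reachable through several paths is simply numbered once per visit (which stands in for adding new nodes), and all names are hypothetical.

```python
VIRTUAL_ROOT = "<virtual-root>"  # hypothetical label for the added root

def number_nodes(children, roots):
    """Assign (pre, post, depth) to every visit of a node in an acyclic graph."""
    if len(roots) > 1:
        # More than one root: hang all of them under a virtual root.
        children = {**children, VIRTUAL_ROOT: list(roots)}
        roots = [VIRTUAL_ROOT]
    rows = []
    counters = {"pre": 0, "post": 0}

    def visit(node, depth):
        counters["pre"] += 1
        pre = counters["pre"]
        for child in children.get(node, []):
            visit(child, depth + 1)
        counters["post"] += 1
        rows.append({"name": node, "pre": pre,
                     "post": counters["post"], "depth": depth})

    visit(roots[0], 0)
    return sorted(rows, key=lambda row: row["pre"])

def is_ancestor(v, u):
    return v["pre"] < u["pre"] and v["post"] > u["post"]

def is_parent(v, u):
    return is_ancestor(v, u) and u["depth"] - v["depth"] == 1

# Hypothetical tree: a -> b -> d, a -> c
a, b, d, c = number_nodes({"a": ["b", "c"], "b": ["d"]}, ["a"])
```

With these numbers, ancestor and parent tests need only integer comparisons on two rows, with no traversal of the graph.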
Algorithm
Relational database schema
Query processing
• Path query - Find the title of something painted by someone:
SELECT r.resourceName
FROM path AS p, resource AS r
WHERE p.pathID = r.pathID
AND p.pathexp = '#title<#paints'
• Schema query - Find the names of the classes that are the
direct super classes of
http://www.w3.org/2000/01/rdf-schema#Resource:
SELECT c1.className
FROM class AS c, class AS c1
WHERE c.pre < c1.pre
AND c.post > c1.post
AND c.depth = c1.depth - 1
AND c.className = 'http://www.w3.org/2000/01/rdf-schema#Resource'
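Both queries can be tried against a toy database with Python's built-in sqlite3 module. The table layouts below contain only the columns the two queries mention, and the rows (including the class names `#A`, `#B`, `#C` and their numbering) are hypothetical. The real schema in the paper may have more tables and columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Minimal assumed layouts: just the columns used by the two queries.
CREATE TABLE path     (pathID INTEGER PRIMARY KEY, pathexp TEXT);
CREATE TABLE resource (resourceName TEXT, pathID INTEGER);
CREATE TABLE class    (className TEXT, pre INTEGER, post INTEGER, depth INTEGER);

INSERT INTO path VALUES (1, '#paints'), (2, '#title<#paints');
INSERT INTO resource VALUES ('#guernica', 1), ('Guernica', 2);

-- Hypothetical numbering: Resource at the top, #A and #C one level below,
-- #B below #A.
INSERT INTO class VALUES
  ('http://www.w3.org/2000/01/rdf-schema#Resource', 1, 4, 0),
  ('#A', 2, 2, 1),
  ('#B', 3, 1, 2),
  ('#C', 4, 3, 1);
""")

# Path query: a single equality test on the stored path expression.
path_rows = conn.execute("""
    SELECT r.resourceName
    FROM path AS p, resource AS r
    WHERE p.pathID = r.pathID
      AND p.pathexp = '#title<#paints'
""").fetchall()

# Schema query: nodes numbered exactly one level beneath the named class.
schema_rows = conn.execute("""
    SELECT c1.className
    FROM class AS c, class AS c1
    WHERE c.pre < c1.pre
      AND c.post > c1.post
      AND c.depth = c1.depth - 1
      AND c.className = 'http://www.w3.org/2000/01/rdf-schema#Resource'
""").fetchall()

path_result = [row[0] for row in path_rows]
schema_result = sorted(row[0] for row in schema_rows)
```

The path query needs one join regardless of path length, while the schema query compares interval numbers instead of walking the hierarchy.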
Summary & Conclusion
• The main goal of the study is to improve
performance when retrieving RDF data. Path-based
querying of relational RDF data is efficient because
it reduces the number of joins. The approach works
for both RDF without schema and RDF with schema
data. The paper assumes that most RDF data is
acyclic. Another point to note is the extraction of
the graph into 5 sub graphs.
• Data is stored based on the 5 sub graphs. The
extended interval numbering scheme is used to
detect parent-child relationships, resulting in fast
retrieval of super classes and sub classes.
• It is mentioned that most queries on RDF data are
queries to detect sub graphs matching a given
graph, or queries to detect a set of nodes that can
be reached via a given path expression. So RDF
data can be handled more efficiently using
path-based queries.
Why Relational RDF…
• Because the flat and hash approaches do
not make any distinction between
schema information and resource
descriptions.
• The schema approach is able to process
RDF-based queries, but what about
schema-less RDF data? Also, there is
a big overhead in maintaining the
schema as it evolves.
• Hence, use a relational DB and store the
RDF data and schema in separate tables.
Conclusions:
As both RDF schema and RDF instance
data are stored in distinct
relational tables, we
1. Can handle schema-less RDF data.
2. Can process schema-based queries
(using the extended interval
numbering scheme).
3. Can process path-based expressions,
as the RDF data is stored in the
relational DB based on path
expressions.
• Also, the performance improves
dramatically as the length of the
path expression increases. Refer
to the graph on Page 6.
• Problems:
• Sub graph extraction, the assumption
of acyclic data, and no mention of
ETL for converting from a
conventional store. Not easy to
query (compared with SQL).