First International Workshop on Querying Graph Structured Data – GraphQ 2010
(in conj. with ADBIS 2010 – Novi Sad, Serbia, September 2010)
T-SPARQL: a TSQL2-like Temporal
Query Language for RDF
Fabio Grandi
Alma Mater Studiorum - Università degli Studi di Bologna
Introduction
 Some application fields require the maintenance
of past versions of an RDF graph (e.g. encoding a
domain ontology) after changes
 For instance, in the legal domain:


Ontologies evolve as a natural consequence
of the dynamics involved in normative systems
Agents must often deal with a past perspective
(e.g. a Court judging today on some fact committed in the past)
 Moreover, several time dimensions are usually important
for applications in such domains
GraphQ 2010 - F. Grandi - T-SPARQL: a TSQL2-like Temporal Query Language for RDF
Multi-temporal versioning
 Time dimensions of interest in the legal domain:

Validity time
is the time a norm is in force in the real world

Efficacy time
is the time a norm can be applied to a concrete case;
while such cases exist, the norm continues its efficacy
though no longer in force

Transaction time
is the time a norm is stored in the computer system

Publication time
is the time a norm is published in the Official Journal
Temporal RDF Data Models
 Temporal RDF data models have recently been
proposed; notable proposals include:
[Gutierrez, Hurtado & Vaisman, 2007]
[Pugliese, Udrea & Subrahmanian, 2008]
[Tappolet & Bernstein, 2009]
 Index structures (e.g. tGRIN and keyTree) have
been proposed for efficient processing of
temporal queries
 Interval timestamping of RDF triples is adopted
 A single time dimension (valid time) is usually
considered
Temporal SPARQL Extensions
 Temporal extensions of the SPARQL query
language for RDF have been proposed,
including:



extensions not based on a temporal data model
[Frasincar, Borsje & Levering, 2009]
extensions based on temporal logic
[Mateescu, Meriot & Rampacek, 2009]
extensions based on mapping to plain SPARQL
[Tappolet & Bernstein, 2009]
 Interval timestamping of RDF triples is adopted
 A single time dimension (valid time) is usually
considered
The TSQL2 Temporal Query Language
 A consensual temporal extension of the standard
database language SQL-92
 Defined by a design committee of 18 temporal
database experts chaired by Richard Snodgrass
 It represents the synthesis of more than a decade
of work in temporal query languages
 It was aimed at collecting the best features of the
previously proposed languages as to expressivity
and user-friendliness
 Specification published as a book in 1995
The T-SPARQL Proposal
 Based on the temporal data model presented in
F. Grandi, “Multi-temporal RDF Ontology versioning”,
IWOD Workshop, 2009:

multiple time dimensions are considered…

temporal-element timestamping is adopted…

… in order to preserve the scalability property
of triple storage technology
 Presenting the main features of the TSQL2 language

TSQL2-like temporal data types and operators

TSQL2-like temporal selection and projection facilities
The Multi-temporal RDF Database Model
 N-dimensional time domain:
$\mathcal{T} = T_1 \times T_2 \times \dots \times T_N$, with $T_i = [0, UC)_i$
 Multi-temporal RDF triple:
$(s, p, o \mid T)$, where
s is a subject
p is a predicate
o is an object
$T \subseteq \mathcal{T}$ is a timestamp
 Multi-temporal RDF database:
$\text{RDF-TDB} = \{\, (s, p, o \mid T) \mid T \subseteq \mathcal{T} \,\}$
Multi-temporal RDF Triples
 A temporal triple ( s,p,o | T )
assigns a temporal pertinence
to an RDF triple ( s,p,o )
 The non-temporal triple ( s,p,o )
is the value (or the contents)
of the temporal triple ( s,p,o | T )
 The temporal pertinence $T$
is a subset of the time domain $\mathcal{T}$,
represented by a temporal element
Temporal Elements
 A temporal element [Gadia 1988] is
a disjoint union of temporal intervals
 Multi-temporal intervals are obtained as the Cartesian
product of one interval for each temporal dimension

$T = \bigcup_{1 \le j \le m} I_j = \bigcup_{1 \le j \le m} [t_j^s, t_j^e)_1 \times [t_j^s, t_j^e)_2 \times \dots \times [t_j^s, t_j^e)_N$

$I_j \cap I_k = \emptyset$ for all $1 \le j < k \le m$
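As an illustrative sketch (not part of the original slides), the disjointness condition can be checked in Python. The representation is an assumption: a one-dimensional interval is a half-open pair of chronon numbers, a multi-temporal interval ("box") is a tuple with one such pair per dimension, and the open bound UC is modelled as infinity.

```python
from itertools import combinations

UC = float("inf")  # assumption: the open upper bound UC is modelled as infinity


def boxes_overlap(a, b):
    """Two boxes intersect only if their intervals overlap in every dimension."""
    return all(s1 < e2 and s2 < e1 for (s1, e1), (s2, e2) in zip(a, b))


def is_temporal_element(boxes):
    """Check that the boxes are pairwise disjoint, as a temporal element requires."""
    return not any(boxes_overlap(a, b) for a, b in combinations(boxes, 2))


# Two bitemporal boxes that touch along one dimension but do not overlap.
disjoint = [((0, 10), (0, UC)), ((10, UC), (0, 10))]
# Two boxes that share the region [5,10) x [0,5).
overlapping = [((0, 10), (0, 5)), ((5, 15), (0, 5))]
```

Here `is_temporal_element(disjoint)` holds, while `is_temporal_element(overlapping)` does not.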
Integrity Constraint
 No value-equivalent distinct triples exist:
$\forall\, (s, p, o \mid T),\ (s', p', o' \mid T') \in \text{RDF-TDB}$:
$s = s' \land p = p' \land o = o' \Rightarrow T = T'$
 The constraint is made possible by the adoption
of temporal element timestamping
 Temporal elements lead to space saving, whenever the
temporal pertinence of a triple is not a convex interval
Memory Saving with Temporal Elements
 For example, even with a monodimensional time domain,
the two value-equivalent triples with interval time-stamping ( t2 < t3 ):
( s,p,o | [t1, t2) ) and ( s,p,o | [t3, t4) )
can be merged into a single triple with element time-stamping:
( s,p,o | [t1, t2) U [t3, t4) )
where the same space is required for the timestamps in both cases
(i.e. the space needed by 4 time points) and the contents of the
triple is stored twice in the former case and only once in the latter
 Different triple versions are stored only once with a complex
timestamp instead of storing multiple copies (value-equivalent
triples) with a simple timestamp
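The merge described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: triples are plain tuples, time is one-dimensional, intervals are pairs of time points, and adjacent or overlapping intervals are not merged further.

```python
from collections import defaultdict


def coalesce(interval_triples):
    """Group value-equivalent triples so each (s, p, o) is stored once
    together with the list of all its intervals (its temporal element)."""
    elements = defaultdict(list)
    for s, p, o, interval in interval_triples:
        elements[(s, p, o)].append(interval)
    return {spo: sorted(intervals) for spo, intervals in elements.items()}


# Two value-equivalent triples with interval timestamping (t1..t4 = 1..4).
triples = [("s", "p", "o", (1, 2)), ("s", "p", "o", (3, 4))]
merged = coalesce(triples)
```

The contents (s, p, o) appear twice in `triples` and once in `merged`, while both forms store the same four time points.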
An Example
 The memory saving obtained with temporal elements
grows with the dimensionality of the time domain!
 The memory saving is also emphasized by the triple size
with respect to the timestamp size
 In very large RDF benchmark datasets, the average triple size
ranges from 80–140 bytes (DBpedia, UScensus, LUBM, BSBM)
to more than 600 bytes (UniProtKB)
 The timestamp (date+time) data size in SQL is 6–8 bytes
 In the example which follows we assume
a bitemporal domain (valid + transaction time)
Representation of the Evolution of a Triple
[Figure: bitemporal diagram of the evolution of a triple through the versions (s, p, o1), (s, p, o2) and (s, p, o3), introduced at t0, t1 and t2; both time axes are marked t0, t1, t2, UC]
With temporal intervals (5 triples needed)
( s, p, o1 | [t0,t1)x[t0,UC) )
( s, p, o1 | [t1,UC)x[t0,t1) )
( s, p, o2 | [t1,t2)x[t1,UC) )
( s, p, o2 | [t2,UC)x[t1,t2) )
( s, p, o3 | [t2,UC)x[t2,UC) )
With temporal elements (3 triples needed)
( s, p, o1 | [t0,t1)x[t0,UC) U [t1,UC)x[t0,t1) )
( s, p, o2 | [t1,t2)x[t1,UC) U [t2,UC)x[t1,t2) )
( s, p, o3 | [t2,UC)x[t2,UC) )
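The listing above follows a regular pattern, which the following Python sketch generalises to any number of versions. The generalisation is an assumption drawn from this single example: every superseded version contributes two bitemporal boxes and the current version one, giving 2n-1 interval-stamped triples against n element-stamped ones.

```python
UC = "UC"  # symbolic open upper bound, as in the listing


def version_boxes(objects, times):
    """Return (object, box) pairs, one per interval-stamped triple,
    for versions objects[k] introduced at times[k]."""
    pairs = []
    last = len(objects) - 1
    for k, obj in enumerate(objects):
        t_k = times[k]
        if k < last:
            t_next = times[k + 1]
            pairs.append((obj, ((t_k, t_next), (t_k, UC))))
            pairs.append((obj, ((t_next, UC), (t_k, t_next))))
        else:
            pairs.append((obj, ((t_k, UC), (t_k, UC))))
    return pairs


pairs = version_boxes(["o1", "o2", "o3"], ["t0", "t1", "t2"])
interval_triples = len(pairs)                     # one triple per box
element_triples = len({obj for obj, _ in pairs})  # one triple per version
```

For the three versions of the example this yields 5 interval-stamped triples and 3 element-stamped triples, matching the listing.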
Memory Saving Figures
 Percentage space saving with temporal element vs interval
timestamping. Avg. number of versions per triple in columns,
triple size in bytes in rows. We assume 8-byte timestamps.

Triple size (bytes) |     2 |     5 |     8 |    11
                 80 | 27.78 | 37.04 | 38.89 | 39.68
                120 | 29.41 | 39.22 | 41.18 | 42.02
                160 | 30.30 | 40.40 | 42.42 | 43.29
                200 | 30.86 | 41.15 | 43.21 | 44.09
 For instance, with 120-byte triples and 5 versions per triple
on average, we have a 39.22% space saving.
With 1 billion triples, this means an RDF-TDB size of
721 GB with temporal elements
1.14 TB with temporal intervals
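The slides do not state the formula behind these percentages. The Python sketch below is one reconstruction that reproduces them, under two assumptions inferred from the figures rather than taken from the source: a triple with n versions needs 2n-1 bitemporal boxes, and each box costs 16 bytes (8 bytes per time dimension). It is not meant to reproduce the absolute sizes quoted above.

```python
def saving(triple_bytes, versions, box_bytes=16):
    """Fractional space saving of element over interval timestamping.

    Assumptions (inferred, not stated in the slides): 2n-1 boxes for n
    versions, and 16 bytes per bitemporal box.
    """
    boxes = 2 * versions - 1
    # Interval timestamping: the triple contents are repeated for every box.
    interval_size = boxes * (triple_bytes + box_bytes)
    # Element timestamping: contents stored once per version, plus all boxes.
    element_size = versions * triple_bytes + boxes * box_bytes
    return 1 - element_size / interval_size


percent = round(100 * saving(120, 5), 2)  # 120-byte triples, 5 versions
```

With 120-byte triples and 5 versions, `percent` evaluates to 39.22, the figure quoted in the text.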
Outline of the T-SPARQL language
 Time representation (temporal datatypes)
 Temporal projection and selection
Time Representation
 Like in TSQL2, time is discrete with a minimal
system-dependent unit called chronon
 Three base temporal datatypes:
Datetime:
instantaneous event without duration,
conventionally represented as a chronon
Period:
set of consecutive chronons on the time axis,
characterized by two datetime-type boundaries
Interval:
pure duration, not anchored on the time axis,
represented by a multiple of the chronon
Temporal datatypes
 The datetime datatype corresponds to the
xs:dateTime XML Schema primitive datatype
examples:
"2010-01-01"^^xs:date
"2010-01-01T00:00:00.000+01:00"^^xs:dateTime
 The interval datatype corresponds to the
xs:duration XML Schema primitive datatype
examples:
"P2Y"^^xs:duration
"P1Y2M3DT5H20M30.123S"^^xs:duration
Temporal datatypes
 The period datatype requires the definition of a
new datatype as XML Schema extension:
xs:period
 with a new constructor:
fn:period($arg1 as xs:dateTime,
$arg2 as xs:dateTime) as xs:period
example:
"[2010-01-01,2010-01-31]"^^xs:period equiv. to
fn:period("2010-01-01"^^xs:date,
"2010-01-31"^^xs:date)
The xs:period datatype
 The xs:period datatype is assumed to be
compatible with the standard xs:gYearMonth
and xs:gYear datatypes:
"[2010-01-01,2010-01-31]"^^xs:period
= "2010-01"^^xs:gYearMonth
"[2009-01-01,2009-12-31]"^^xs:period
= "2009"^^xs:gYear

The xs:period datatype
 Two predefined functions can be used to extract
the left and right boundaries from xs:period data:
fn:begin($arg1 as xs:period) as xs:dateTime
fn:end($arg1 as xs:period) as xs:dateTime
examples:
fn:begin("[2010-01-01,2010-01-31]"^^xs:period)
= "2010-01-01"^^xs:date
fn:end("2009"^^xs:gYear)
= "2009-12-31"^^xs:date
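To make the intended behaviour concrete, here is a minimal Python sketch of the period literal and its boundary functions. The names `period`, `begin` and `end` are hypothetical stand-ins for the proposed `xs:period`, `fn:begin` and `fn:end`; the parser accepts only the three lexical forms used on these slides and assumes a one-day granularity.

```python
import calendar
from datetime import date


def period(text):
    """Parse '[YYYY-MM-DD,YYYY-MM-DD]', 'YYYY-MM' (gYearMonth) or 'YYYY' (gYear)
    into a (begin, end) pair of dates with both boundaries included."""
    if text.startswith("["):
        first, last = text.strip("[]").split(",")
        return date.fromisoformat(first.strip()), date.fromisoformat(last.strip())
    parts = [int(x) for x in text.split("-")]
    if len(parts) == 2:  # gYearMonth: the whole month
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    (year,) = parts  # gYear: the whole year
    return date(year, 1, 1), date(year, 12, 31)


def begin(p):
    """Left boundary of a period."""
    return p[0]


def end(p):
    """Right boundary of a period."""
    return p[1]
```

Under these assumptions `period("2010-01")` equals `period("[2010-01-01,2010-01-31]")`, and `end(period("2009"))` is 31 December 2009, mirroring the examples above.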

The xs:temporalElement datatype
 We also assume a new primitive
xs:temporalElement datatype to be defined
to represent temporal elements
 The constructor has a variable number of
xs:period-type arguments, example:
fn:temporalElement(
"[2008-06-01,2009-07-15]"^^xs:period,
"[2009-11-01,2010-02-21]"^^xs:period )
= "[2008-06-01,2009-07-15]+[2009-11-01,2010-02-21]"^^xs:temporalElement
Built-in functions for xs:temporalElement
 As in TSQL2, useful functions are available to
extract the first (last) period from an element:
fn:first($arg1 as xs:temporalElement)
as xs:period
fn:last($arg1 as xs:temporalElement)
as xs:period
 In order to extract the first (last) chronon of an
element, the fn:begin (fn:end) function can directly
be applied also to elements, that is:
fn:begin(T) = fn:begin(fn:first(T))
fn:end(T) = fn:end(fn:last(T))
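The two identities above can be illustrated with a small Python sketch. The representation is an assumption made for illustration: a temporal element is a list of (begin, end) date pairs kept in chronological order, and the function names are hypothetical stand-ins for the proposed `fn:` functions.

```python
from datetime import date


def temporal_element(*periods):
    """Build an element from (begin, end) periods, ordered by begin date."""
    return sorted(periods)


def first(element):
    """Earliest period of the element."""
    return element[0]


def last(element):
    """Latest period of the element."""
    return element[-1]


def begin(element):
    """First chronon of the element: the begin of its first period."""
    return first(element)[0]


def end(element):
    """Last chronon of the element: the end of its last period."""
    return last(element)[1]


# The element from the constructor example, with its periods given out of order.
T = temporal_element(
    (date(2009, 11, 1), date(2010, 2, 21)),
    (date(2008, 6, 1), date(2009, 7, 15)),
)
```

Here `begin(T)` is 1 June 2008 and `end(T)` is 21 February 2010.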
Temporal Projection
 Specifies which temporal pertinence has to be assigned
to the results of a T-SPARQL query
 The query result can be:

a temporal RDF graph consistent
with the underlying data model (timeslice query)

a regular, non-temporal RDF graph (snapshot query)

an arbitrary tuple set
 A TSQL2-like INTERSECT clause is available to assign the
right temporal pertinence to timeslice query results
Temporal Projection

Given a time point $t = (t_1, t_2, \dots, t_N) \in \mathcal{T}$
we define the RDF database snapshot valid at $t$ as
$\text{RDF-TDB}(t) = \{\, (s, p, o) \mid (s, p, o \mid T) \in \text{RDF-TDB} \land t \in T \,\}$

In T-SPARQL:
CONSTRUCT { ?s,?p,?o }
WHERE { TGRAPH < …myURI… > { ?s, ?p, ?o | ?t } .
FILTER ?t CONTAINS "(t1, t2,…, tN) " . }


Given a time period $I = I_1 \times I_2 \times \dots \times I_N \subseteq \mathcal{T}$
we define the RDF database timeslice valid in $I$ as
$\text{RDF-TDB}(I) = \{\, (s, p, o \mid T') \mid (s, p, o \mid T) \in \text{RDF-TDB} \land T' = T \cap I \ne \emptyset \,\}$
In T-SPARQL:
TCONSTRUCT { ?s,?p,?o |
INTERSECT( ?t, "(I1 x I2 x …x IN) " ) . }
WHERE { TGRAPH < …myURI… > { ?s, ?p, ?o | ?t } }
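The snapshot and timeslice definitions can be paraphrased in Python as follows. This is a sketch of the set semantics only, not of a T-SPARQL engine; the dictionary-based database, the box representation (one half-open interval per dimension) and the sample data are assumptions made for illustration.

```python
UC = float("inf")  # assumption: the open upper bound UC is modelled as infinity


def box_contains(box, point):
    """True if the point lies inside the box in every time dimension."""
    return all(start <= t < end for (start, end), t in zip(box, point))


def box_intersection(a, b):
    """Intersect two boxes dimension by dimension; None if the result is empty."""
    clipped = tuple((max(s1, s2), min(e1, e2)) for (s1, e1), (s2, e2) in zip(a, b))
    return clipped if all(s < e for s, e in clipped) else None


def snapshot(tdb, point):
    """Non-temporal triples whose temporal element contains the time point."""
    return {spo for spo, element in tdb.items()
            if any(box_contains(box, point) for box in element)}


def timeslice(tdb, window):
    """Temporal triples restricted to the window; empty intersections are dropped."""
    result = {}
    for spo, element in tdb.items():
        parts = [box_intersection(box, window) for box in element]
        parts = [part for part in parts if part is not None]
        if parts:
            result[spo] = parts
    return result


# Hypothetical bitemporal database: o1 is replaced by o2 at time 10.
tdb = {
    ("s", "p", "o1"): [((0, 10), (0, UC)), ((10, UC), (0, 10))],
    ("s", "p", "o2"): [((10, UC), (10, UC))],
}
```

For example, `snapshot(tdb, (5, 20))` contains only the o1 triple, and `timeslice(tdb, ((0, 5), (0, 5)))` keeps only o1 with its timestamp clipped to the window.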
Timestamp Variables
 Graph patterns to be used in the WHERE clause of
the SELECT statement are augmented with an
optional fourth position where matching with triple
timestamps can be specified, e.g.
_:e ex:Dept "Toys" | ?t
where the variable ?t binds to the timestamp of a
temporal triple whose (non-temporal) contents are:
_:e ex:Dept "Toys"
 i.e. the timestamp variable ?t represents the time an
employee denoted by the blank node _:e has been
working in the Toys department
Temporal Selection
 In the T-SPARQL FILTER clause, TSQL2-like temporal (binary
infix) predicates can be used to specify constraints over
timestamp variables, e.g.
FILTER ( VALID(?t) OVERLAPS
"[2010-01-01,2010-01-31]"^^xs:period
&& TRANSACTION(?t) CONTAINS
"2009-06-01"^^xs:date )
which only matches timestamps ?t whose valid time
component overlaps January 2010 and whose transaction time
component contains the June 1, 2009 time point
 i.e. the temporal triple whose timestamp is bound to ?t is
selected only if it is (even partially) valid in January 2010,
as of June 1, 2009.
Temporal Selection Operators
 The available comparison operators are the same as in TSQL2:

Operator     | Definition
A PRECEDES B | END(A) is earlier than BEGIN(B)
A = B        | A and B are identical
A MEETS B    | END(A) immediately precedes BEGIN(B)
A CONTAINS B | Each chronon in B is also contained in A

 They can be used to compare (monodimensional) temporal
elements, periods and time points; operands of different
types can also be compared (owing to reducibility to chronon sets)
 The user-friendly operators, whose definition is close to their
meaning in English, form a non-minimal but complete set,
equivalent to Allen’s algebra for intervals and time points
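Because every operand reduces to a set of chronons, the operators can be sketched directly on Python sets of integers. This is an illustration, not the T-SPARQL implementation. The `overlaps` function is an assumption: OVERLAPS is used in the FILTER examples but is not defined in the table, so it is taken here to mean "shares at least one chronon".

```python
def chronons(first, last):
    """Chronon set of a period with both boundaries included."""
    return set(range(first, last + 1))


def precedes(a, b):
    """END(A) is earlier than BEGIN(B)."""
    return max(a) < min(b)


def meets(a, b):
    """END(A) immediately precedes BEGIN(B)."""
    return max(a) + 1 == min(b)


def contains(a, b):
    """Each chronon in B is also contained in A."""
    return b <= a


def overlaps(a, b):
    """Assumed definition: A and B share at least one chronon."""
    return bool(a & b)


period_a = chronons(1, 3)
period_b = chronons(4, 6)
time_point = {5}  # a time point is a single-chronon set
element = chronons(1, 3) | chronons(8, 9)  # a temporal element with two periods
```

With these values `meets(period_a, period_b)` and `contains(period_b, time_point)` hold, and the same functions accept the two-period `element` without change.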
Query Examples

We assume ex: is a prefix referencing a namespace involving the
definition of employee data:
@prefix ex: <http://myExample.org/employee/> .

Sample employee data (temporal RDF graph):

_:emp1 rdf:type ex:emp
_:emp1 ex:Name "Ann"
_:emp1 ex:Salary "2200"^^xs:integer |
"[2009-06-01,2009-09-30]+[2009-06-01,UC]"^^xs:temporalElement

_:emp2 rdf:type ex:emp
_:emp2 ex:Name "Tom"
_:emp2 ex:Salary "2000"^^xs:integer |
"[2008-01-01,2008-12-31]"^^xs:temporalElement
_:emp2 ex:Salary "2200"^^xs:integer |
"[2009-01-01,UC]"^^xs:temporalElement
Query Example (1)
 A query involving both temporal selection and projection
(result not organized as a temporal RDF graph)

SELECT ?salary INTERSECT(?t,"[2007-01-01,2009-12-31]")
WHERE {
?emp rdf:type ex:emp ;
ex:Name "Tom" ;
ex:Salary ?salary | ?t .
FILTER ( VALID(?t) OVERLAPS
"[2007-01-01,2009-12-31]"^^xs:period ) . }
 The query retrieves Tom’s salary history from 2007 to 2009
 An implied conjunct
&& TRANSACTION(?t) CONTAINS fn:current-date()
is assumed in the FILTER clause to retrieve only current data
Query Example (2)
 A similar query can be used to retrieve the same data after a
database rollback to the beginning of 2008

SELECT ?salary INTERSECT(?t,"[2007-01-01,2009-12-31]")
WHERE {
?emp rdf:type ex:emp ;
ex:Name "Tom" ;
ex:Salary ?salary | ?t .
FILTER ( VALID(?t) OVERLAPS
"[2007-01-01,2009-12-31]"^^xs:period
&& TRANSACTION(?t) CONTAINS
"2008-01-01"^^xs:date ) . }
 The query retrieves Tom’s salary history from 2007 to 2009,
as of January 1, 2008
Query Example (4)
 A query performing a kind of temporal join, comparing
the durations of two validity periods

SELECT ?ename WHERE {
?emp1 rdf:type ex:emp ;
ex:Name "Ann" ;
ex:Salary ?salary | ?ts .
?emp2 rdf:type ex:emp ;
ex:Name ?ename ;
ex:Dept "Toys" | ?tt .
FILTER ( ?salary > "20000"^^xs:integer
&& xs:duration(VALID(?tt)) >
xs:duration(VALID(?ts)) ) . }
 The query retrieves the names of the employees (?emp2) who
have worked in the Toys department longer than Ann (?emp1)
has earned a salary above $20,000
Query Example (5)
 An optional modifier PERIOD can be specified in the
declaration of temporal variables
SELECT ?ename WHERE {
?emp1 rdf:type ex:emp ;
ex:Name ?ename ;
ex:Dept "Sales" | ?t .
FILTER
( xs:duration(VALID(?t)) > "P2Y"^^xs:duration )
. }
 This first query version retrieves the name of the
employees who worked in the Sales department for
more than two years (altogether)
Query Example (6)
 An optional modifier PERIOD can be specified in the
declaration of temporal variables
SELECT ?ename WHERE {
?emp1 rdf:type ex:emp ;
ex:Name ?ename ;
ex:Dept "Sales" | ?t PERIOD .
FILTER
( xs:duration(VALID(?t)) > "P2Y"^^xs:duration )
. }
 This second query version retrieves the name of the
employees who worked (continuously) in the Sales
department for a period longer than two years
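The difference between the two query versions can be sketched in Python. The assumptions are labelled here and in the code: the employment history is hypothetical, the element is stored as maximal (begin, end) periods, the chronon is one day, and "two years" is approximated as 730 days, whereas `P2Y` is a calendar duration.

```python
from datetime import date

TWO_YEARS_DAYS = 730  # assumption: rough stand-in for the calendar duration P2Y


def total_days(element):
    """Duration of the whole temporal element (query version without PERIOD)."""
    return sum((end - begin).days + 1 for begin, end in element)


def longest_period_days(element):
    """Duration of the longest single period (query version with PERIOD)."""
    return max((end - begin).days + 1 for begin, end in element)


# Hypothetical employee with two separate stays in the Sales department.
sales = [
    (date(2005, 1, 1), date(2006, 6, 30)),
    (date(2008, 1, 1), date(2009, 3, 31)),
]

matches_without_period = total_days(sales) > TWO_YEARS_DAYS
matches_with_period = longest_period_days(sales) > TWO_YEARS_DAYS
```

This employee is returned by the first query version (more than two years altogether) but not by the second, since neither stay alone exceeds two years.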
Query Example (7)
 The PERIOD modifier can also be used to reference
consecutive periods within the same data history
SELECT ?ename ?job WHERE {
?emp rdf:type ex:emp ;
ex:Name ?ename ;
ex:Job ?job | ?t1 PERIOD ;
ex:Job "Director" | ?t2 PERIOD ;
ex:Job ?job | ?t3 PERIOD .
FILTER ( VALID(?t1) MEETS VALID(?t2)
&& VALID(?t2) MEETS VALID(?t3) ) . }
 This query retrieves the name of the employees who
returned to their previous job (?job) after having been
directors for some time
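The chained MEETS conditions can be illustrated with a short Python sketch over one employee's job history. The history, the job names and the one-day chronon are assumptions; each entry stands for one maximal period, as the PERIOD modifier would bind it.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)  # assumption: the chronon is one day


def meets(p, q):
    """p MEETS q: the end of p immediately precedes the begin of q."""
    return p[1] + ONE_DAY == q[0]


def jobs_resumed_after_director(history):
    """history: (job, (begin, end)) pairs, one per maximal period.
    Returns the jobs held right before and right after a Director period."""
    resumed = set()
    for job1, t1 in history:
        for job2, t2 in history:
            if job2 != "Director" or not meets(t1, t2):
                continue
            for job3, t3 in history:
                if job3 == job1 and meets(t2, t3):
                    resumed.add(job1)
    return resumed


# Hypothetical history: Analyst, then Director, then Analyst again.
history = [
    ("Analyst", (date(2005, 1, 1), date(2006, 12, 31))),
    ("Director", (date(2007, 1, 1), date(2008, 6, 30))),
    ("Analyst", (date(2008, 7, 1), date(2010, 1, 31))),
]
```

For this history the function returns the single job "Analyst"; dropping the last entry yields an empty result, since the employee never returns to the earlier job.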
Conclusions and Future Work
 We presented T-SPARQL, a temporal SPARQL extension
supporting the temporal RDF database model we introduced in
[Grandi 2009], which employs triple timestamping with multi-dimensional
temporal elements
 T-SPARQL is equipped with the basic temporal constructs
introduced for the TSQL2 query language and works with an
extended set of the temporal datatypes, functions and operators
available in the SPARQL specification
 Future work will consider the design and implementation of a
prototype query engine supporting a T-SPARQL interface and the
adoption of suitable index and storage structures for efficiently
querying temporal RDF graphs