First International Workshop on Querying Graph Structured Data – GraphQ 2010
(in conjunction with ADBIS 2010 – Novi Sad, Serbia, September 2010)

T-SPARQL: a TSQL2-like Temporal Query Language for RDF
Fabio Grandi
Alma Mater Studiorum - Università degli Studi di Bologna

Introduction
* Some application fields require the maintenance of past versions of an RDF graph (e.g. one encoding a domain ontology) after changes.
* For instance, in the legal domain:
  * Ontologies evolve as a natural consequence of the dynamics involved in normative systems.
  * Agents must often deal with a past perspective (e.g. a Court judging today on some fact committed in the past).
* Moreover, several time dimensions are usually important for applications in such domains.

Multi-temporal versioning
Time dimensions of interest in the legal domain:
* Validity time is the time a norm is in force in the real world.
* Efficacy time is the time a norm can be applied to a concrete case; while such cases exist, the norm continues its efficacy though no longer in force.
* Transaction time is the time a norm is stored in the computer system.
* Publication time is the time a norm is published in the Official Journal.

Temporal RDF Data Models
* Temporal RDF data models have recently been proposed; notable proposals include:
  * [Gutierrez, Hurtado & Vaisman, 2007]
  * [Pugliese, Udrea & Subrahmanian, 2008]
  * [Tappolet & Bernstein, 2009]
* Index structures (e.g. tGRIN and keyTree) have been proposed for efficient processing of temporal queries.
* Interval timestamping of RDF triples is adopted.
* A single time dimension (valid time) is usually considered.

Temporal SPARQL Extensions
* Temporal extensions of the SPARQL query language for RDF have been proposed, including:
  * extensions not based on a temporal data model [Frasincar, Borsje & Levering, 2009]
  * extensions based on temporal logic [Mateescu, Meriot & Rampacek, 2009]
  * extensions based on mapping to plain SPARQL [Tappolet & Bernstein, 2009]
* Interval timestamping of RDF triples is adopted.
* A single time dimension (valid time) is usually considered.

The TSQL2 Temporal Query Language
* A consensual temporal extension of the standard database language SQL-92.
* Defined by a design committee of 18 temporal database experts chaired by Richard Snodgrass.
* It represents the synthesis of more than a decade of work on temporal query languages.
* It aimed to collect the best features of the previously proposed languages with respect to expressivity and user-friendliness.
* The specification was published as a book in 1995.

The T-SPARQL Proposal
* Based on the temporal data model presented in F. Grandi, “Multi-temporal RDF Ontology versioning”, IWOD Workshop, 2009:
  * multiple time dimensions are considered…
  * temporal-element timestamping is adopted…
  * … in order to preserve the scalability property of triple storage technology
* Presents the main features of the TSQL2 language:
  * TSQL2-like temporal data types and operators
  * TSQL2-like temporal selection and projection facilities

The Multi-temporal RDF Database Model
* N-dimensional time domain:
  $$\mathcal{T} = T_1 \times T_2 \times \dots \times T_N, \qquad T_i = [0, UC)_i$$
* Multi-temporal RDF triple: $(s, p, o \mid T)$, where
  * $s$ is a subject
  * $p$ is a predicate
  * $o$ is an object
  * $T \subseteq \mathcal{T}$ is a timestamp
* Multi-temporal RDF database:
  $$\text{RDF-TDB} = \{\, (s, p, o \mid T) \mid T \subseteq \mathcal{T} \,\}$$
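As a rough illustration of the model just defined, the following Python sketch represents a multi-temporal triple as a value plus a set of N-dimensional rectangles. The class name `TemporalTriple`, the method `holds_at`, the use of integer chronons and the choice of infinity as a stand-in for the open bound UC are all assumptions made for this example; none of them comes from the slides.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

UC = float("inf")  # assumed stand-in for the open upper bound UC

# One half-open interval [start, end) on a single time dimension.
Interval = Tuple[float, float]
# One N-dimensional rectangle: one interval per time dimension.
Rectangle = Tuple[Interval, ...]


@dataclass(frozen=True)
class TemporalTriple:
    """A multi-temporal triple (s, p, o | T); T is a set of rectangles."""
    s: str
    p: str
    o: str
    T: FrozenSet[Rectangle]

    def holds_at(self, point: Tuple[float, ...]) -> bool:
        """True if the N-dimensional time point lies inside the timestamp T."""
        return any(
            all(lo <= t < hi for t, (lo, hi) in zip(point, rect))
            for rect in self.T
        )


# Bitemporal example (valid time x transaction time) with made-up data.
triple = TemporalTriple(
    "ex:norm1", "ex:status", "in-force",
    frozenset({((0, 10), (0, UC)), ((10, UC), (0, 10))}),
)
```

Calling `triple.holds_at((5, 100))` asks whether the triple belongs to the snapshot at valid time 5 and transaction time 100.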

Multi-temporal RDF Triples
* A temporal triple $(s, p, o \mid T)$ assigns a temporal pertinence to an RDF triple $(s, p, o)$.
* The non-temporal triple $(s, p, o)$ is the value (or the contents) of the temporal triple $(s, p, o \mid T)$.
* The temporal pertinence $T$ is a subset of the time domain $\mathcal{T}$, represented by a temporal element.

Temporal Elements
* A temporal element [Gadia 1988] is a disjoint union of temporal intervals.
* Multi-temporal intervals are obtained as the Cartesian product of one interval for each temporal dimension:
  $$T = \bigcup_{1 \le j \le m} I_j = \bigcup_{1 \le j \le m} [t_j^s, t_j^e)_1 \times [t_j^s, t_j^e)_2 \times \dots \times [t_j^s, t_j^e)_N$$
  $$I_j \cap I_k = \emptyset \quad \text{for all } 1 \le j < k \le m$$

Integrity Constraint
* No value-equivalent distinct triples exist:
  $$\forall\, (s, p, o \mid T),\ (s', p', o' \mid T') \in \text{RDF-TDB}: \quad s = s' \wedge p = p' \wedge o = o' \;\Rightarrow\; T = T'$$
* The constraint is made possible by the adoption of temporal element timestamping.
* Temporal elements lead to space saving whenever the temporal pertinence of a triple is not a convex interval.

Memory Saving with Temporal Elements
* For example, even with a monodimensional time domain, the two value-equivalent triples with interval timestamping ($t_2 < t_3$):
    $(s, p, o \mid [t_1, t_2))$ and $(s, p, o \mid [t_3, t_4))$
  can be merged into a single triple with element timestamping:
    $(s, p, o \mid [t_1, t_2) \cup [t_3, t_4))$
  The same space is required for the timestamps in both cases (i.e. the space needed by 4 time points), but the contents of the triple are stored twice in the former case and only once in the latter.
* Each triple version is stored only once with a complex timestamp, instead of storing multiple copies (value-equivalent triples) with a simple timestamp.
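The merge described above can be sketched in Python for the monodimensional case. The function name `to_temporal_elements` and the tuple layout `(s, p, o, (start, end))` are assumptions made for this illustration; periods are half-open, as in the model.

```python
from collections import defaultdict


def to_temporal_elements(interval_triples):
    """Group interval-stamped triples (s, p, o, (start, end)) by value.

    Returns {(s, p, o): [(start, end), ...]} where each period list is sorted
    and touching or overlapping periods are merged, so each value is kept once.
    """
    grouped = defaultdict(list)
    for s, p, o, period in interval_triples:
        grouped[(s, p, o)].append(period)

    elements = {}
    for value, periods in grouped.items():
        merged = []
        for start, end in sorted(periods):
            if merged and start <= merged[-1][1]:
                # Half-open periods that touch or overlap collapse into one.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        elements[value] = merged
    return elements


# Two value-equivalent triples with t1 < t2 < t3 < t4 (here 1, 2, 3, 4).
rows = [("s", "p", "o", (1, 2)), ("s", "p", "o", (3, 4))]
elements = to_temporal_elements(rows)
```

Here `elements` holds a single entry for `("s", "p", "o")` whose timestamp is the two-period element, which is the space-saving effect the slide describes: four time points either way, but one copy of the triple contents instead of two.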

An Example
* The memory saving obtained with temporal elements grows with the dimensionality of the time domain!
* The memory saving also grows with the triple size relative to the timestamp size:
  * In very large RDF benchmark datasets, the average triple size ranges from 80-140 bytes (DBpedia, UScensus, LUBM, BSBM) to more than 600 bytes (UniProtKB).
  * The timestamp (date+time) data size in SQL is 6-8 bytes.
* In the example which follows we assume a bitemporal domain (valid + transaction time).

Representation of the Evolution of a Triple
[Figure: bitemporal diagram of a triple whose contents are (s, p, o1) from t0, (s, p, o2) from t1 and (s, p, o3) from t2; axis ticks t0, t1, t2, UC.]

With temporal intervals (5 triples needed):
    ( s, p, o1 | [t0,t1)x[t0,UC) )
    ( s, p, o1 | [t1,UC)x[t0,t1) )
    ( s, p, o2 | [t1,t2)x[t1,UC) )
    ( s, p, o2 | [t2,UC)x[t1,t2) )
    ( s, p, o3 | [t2,UC)x[t2,UC) )

With temporal elements (3 triples needed):
    ( s, p, o1 | [t0,t1)x[t0,UC) U [t1,UC)x[t0,t1) )
    ( s, p, o2 | [t1,t2)x[t1,UC) U [t2,UC)x[t1,t2) )
    ( s, p, o3 | [t2,UC)x[t2,UC) )

Memory Saving Figures
Percentage space saving with temporal-element vs interval timestamping. Average number of versions per triple in rows, triple size in bytes in columns. We assume 8-byte timestamps.

    Versions     80      120     160     200
       2        27.78   29.41   30.30   30.86
       5        37.04   39.22   40.40   41.15
       8        38.89   41.18   42.42   43.21
      11        39.68   42.02   43.29   44.09

For instance, with 120-byte triples and 5 versions per triple on average, we have a 39.22% space saving. With 1 billion triples, this means an RDF-TDB size of:
* 721 GB with temporal elements
* 1.14 TB with temporal intervals

Outline of the T-SPARQL language
* Time representation (temporal datatypes)
* Temporal projection and selection
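The percentages in the Memory Saving Figures table can be reproduced with a short calculation. The slides do not state the formula, so the sketch below rests on two assumptions: a triple with n versions needs 2n-1 bitemporal rectangles (as in the evolution example, where 3 versions need 5 intervals), and each rectangle costs 16 bytes of timestamp data. Both were chosen because they match the published figures; the function name `space_saving` is likewise hypothetical.

```python
def space_saving(triple_bytes: int, versions: int, rect_bytes: int = 16) -> float:
    """Percentage saved by element timestamping over interval timestamping.

    Assumption (not stated in the slides): n versions give 2n-1 rectangles,
    and each rectangle takes rect_bytes bytes of timestamp storage.
    """
    rectangles = 2 * versions - 1
    # Interval timestamping: one full copy of the triple per rectangle.
    with_intervals = rectangles * (triple_bytes + rect_bytes)
    # Element timestamping: one copy per version, plus all the rectangles.
    with_elements = versions * triple_bytes + rectangles * rect_bytes
    return 100.0 * (with_intervals - with_elements) / with_intervals


saving = round(space_saving(120, 5), 2)
```

Under these assumptions the saving simplifies to (n-1)·S / ((2n-1)·(S+16)) for triple size S, which tends to S / (2·(S+16)) as the number of versions grows, so it stays below 50%.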

Time Representation
* As in TSQL2, time is discrete, with a minimal system-dependent unit called the chronon.
* Three base temporal datatypes:
  * Datetime: an instantaneous event without duration, conventionally represented as a chronon
  * Period: a set of consecutive chronons on the time axis, characterized by two datetime-type boundaries
  * Interval: a pure duration, not anchored on the time axis, represented by a multiple of the chronon

Temporal datatypes
* The datetime datatype corresponds to the xs:dateTime XML Schema primitive datatype; examples:
    "2010-01-01"^^xs:date
    "2010-01-01T00:00:00.000+01:00"^^xs:dateTime
* The interval datatype corresponds to the xs:duration XML Schema primitive datatype; examples:
    "P2Y"^^xs:duration
    "P1Y2M3DT5H20M30.123S"^^xs:duration
* The period datatype requires the definition of a new datatype as an XML Schema extension, xs:period, with a new constructor:
    fn:period($arg1 as xs:dateTime, $arg2 as xs:dateTime) as xs:period
  Example:
    "[2010-01-01,2010-01-31]"^^xs:period
  is equivalent to
    fn:period("2010-01-01"^^xs:date, "2010-01-31"^^xs:date)

The xs:period datatype
* The xs:period datatype is assumed to be compatible with the standard xs:gYearMonth and xs:gYear datatypes:
    "[2010-01-01,2010-01-31]"^^xs:period = "2010-01"^^xs:gYearMonth
    "[2009-01-01,2009-12-31]"^^xs:period = "2009"^^xs:gYear
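The compatibility between xs:period and the xs:gYearMonth and xs:gYear datatypes amounts to expanding a year or a year-month into its first and last day. A minimal Python sketch follows; the function names are hypothetical, and a period is modelled simply as a pair of `datetime.date` values with day granularity, which is an assumption of this example.

```python
import calendar
from datetime import date


def period_from_gyear(value: str):
    """Expand a gYear lexical form such as "2009" into (first day, last day)."""
    year = int(value)
    return date(year, 1, 1), date(year, 12, 31)


def period_from_gyearmonth(value: str):
    """Expand a gYearMonth lexical form such as "2010-01" into (first day, last day)."""
    year, month = (int(part) for part in value.split("-"))
    # monthrange returns (weekday of day 1, number of days in the month).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


january_2010 = period_from_gyearmonth("2010-01")
year_2009 = period_from_gyear("2009")
```

The sketch ignores time zones and negative years, both of which the XML Schema lexical forms allow.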

The xs:period datatype
* Two predefined functions can be used to extract the left and right boundaries from xs:period data:
    fn:begin($arg1 as xs:period) as xs:dateTime
    fn:end($arg1 as xs:period) as xs:dateTime
* Examples:
    fn:begin("[2010-01-01, 2010-01-31]"^^xs:period) = "2010-01-01"^^xs:date
    fn:end("2009"^^xs:gYear) = "2009-12-31"^^xs:date

The xs:temporalElement datatype
* We also assume that a new primitive xs:temporalElement datatype is defined to represent temporal elements.
* The constructor takes a variable number of xs:period-type arguments; example:
    fn:temporalElement( "[2008-06-01,2009-07-15]"^^xs:period, "[2009-11-01,2010-02-21]"^^xs:period )
      = "[2008-06-01,2009-07-15]+[2009-11-01,2010-02-21]"^^xs:temporalElement

Built-in functions for xs:temporalElement
* As in TSQL2, functions are available to extract the first (last) period from an element:
    fn:first($arg1 as xs:temporalElement) as xs:period
    fn:last($arg1 as xs:temporalElement) as xs:period
* To extract the first (last) chronon of an element, the fn:begin (fn:end) function can also be applied directly to elements, that is:
    fn:begin(T) = fn:begin(fn:first(T))
    fn:end(T) = fn:end(fn:last(T))

Temporal Projection
* Specifies which temporal pertinence has to be assigned to the results of a T-SPARQL query.
* The query result can be:
  * a temporal RDF graph consistent with the underlying data model (timeslice query)
  * a regular, non-temporal RDF graph (snapshot query)
  * an arbitrary tuple set
* A TSQL2-like INTERSECT clause is available to assign the right temporal pertinence to timeslice query results.
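Going back to the built-in functions for xs:temporalElement (fn:first, fn:last, fn:begin, fn:end), their behaviour can be mimicked in a few lines of Python. This is only a sketch: the element is modelled as an unordered list of disjoint closed periods, each a pair of `datetime.date` values, and the Python function names are stand-ins for the fn: functions rather than an existing API.

```python
from datetime import date

# Assumed in-memory form of
# "[2008-06-01,2009-07-15]+[2009-11-01,2010-02-21]"^^xs:temporalElement
element = [
    (date(2009, 11, 1), date(2010, 2, 21)),
    (date(2008, 6, 1), date(2009, 7, 15)),
]


def first(elem):
    """Earliest period of the element (counterpart of fn:first)."""
    return min(elem)  # disjoint periods are ordered by their begin date


def last(elem):
    """Latest period of the element (counterpart of fn:last)."""
    return max(elem)


def begin(elem):
    """First chronon of the element: begin(T) = begin(first(T))."""
    return first(elem)[0]


def end(elem):
    """Last chronon of the element: end(T) = end(last(T))."""
    return last(elem)[1]
```

Because the periods of a temporal element are disjoint, comparing them by begin date is enough to find the first and the last one.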

Temporal Projection
* Given a time point $t = (t_1, t_2, \dots, t_N) \in \mathcal{T}$, we define the RDF database snapshot valid at $t$ as
  $$\text{RDF-TDB}(t) = \{\, (s, p, o) \mid (s, p, o \mid T) \in \text{RDF-TDB} \wedge t \in T \,\}$$
  In T-SPARQL:
    CONSTRUCT { ?s,?p,?o }
    WHERE { TGRAPH < …myURI… > { ?s, ?p, ?o | ?t } .
            FILTER ?t CONTAINS "(t1, t2,…, tN)" . }
* Given a time period $I = I_1 \times I_2 \times \dots \times I_N \subseteq \mathcal{T}$, we define the RDF database timeslice valid in $I$ as
  $$\text{RDF-TDB}(I) = \{\, (s, p, o \mid T') \mid (s, p, o \mid T) \in \text{RDF-TDB} \wedge T' = T \cap I \neq \emptyset \,\}$$
  In T-SPARQL:
    TCONSTRUCT { ?s,?p,?o | INTERSECT( ?t, "(I1 x I2 x …x IN)" ) . }
    WHERE { TGRAPH < …myURI… > { ?s, ?p, ?o | ?t } }

Timestamp Variables
* Graph patterns to be used in the WHERE clause of the SELECT statement are augmented with an optional fourth position where matching with triple timestamps can be specified, e.g.
    _:e ex:Dept "Toys" | ?t
  where the variable ?t binds to the timestamp of a temporal triple whose (non-temporal) contents are:
    _:e ex:Dept "Toys"
  i.e. the timestamp variable ?t represents the time during which the employee denoted by the blank node _:e has been working in the Toys department.

Temporal Selection
* In the T-SPARQL FILTER clause, TSQL2-like temporal (binary infix) predicates can be used to specify constraints over timestamp variables, e.g.
    FILTER ( VALID(?t) OVERLAPS "[2010-01-01,2010-12-31]"^^xs:period &&
             TRANSACTION(?t) CONTAINS "2009-06-01"^^xs:date )
  which only matches timestamps ?t whose valid time component overlaps the year 2010 and whose transaction time component contains the time point June 1, 2009;
  i.e. the temporal triple whose timestamp is bound to ?t is selected only if it is (even partially) valid in 2010, as of June 1, 2009.
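To make the FILTER example concrete, here is a hedged Python sketch of how the two predicates could be evaluated on a bitemporal timestamp. The timestamp is modelled as a list of (valid period, transaction period) rectangles with closed `datetime.date` bounds; `valid`, `transaction`, `overlaps` and `contains` are illustrative stand-ins for VALID, TRANSACTION, OVERLAPS and CONTAINS, `date.max` plays the role of UC, and `contains` is restricted to a time-point operand. All of these are assumptions of the example.

```python
from datetime import date

UC = date.max  # assumed stand-in for the open bound UC


def valid(timestamp):
    """Valid-time component: the valid period of every rectangle."""
    return [v for v, _ in timestamp]


def transaction(timestamp):
    """Transaction-time component: the transaction period of every rectangle."""
    return [x for _, x in timestamp]


def overlaps(periods, period):
    """True if some period shares at least one day with the given period."""
    lo, hi = period
    return any(b <= hi and lo <= e for b, e in periods)


def contains(periods, point):
    """True if the time point falls inside one of the periods."""
    return any(b <= point <= e for b, e in periods)


def selected(timestamp):
    """Mirror of the FILTER above: valid in 2010, as of June 1, 2009."""
    return (overlaps(valid(timestamp), (date(2010, 1, 1), date(2010, 12, 31)))
            and contains(transaction(timestamp), date(2009, 6, 1)))


# Made-up timestamp: valid from 2009-01-01 on, recorded on 2009-03-01.
ts = [((date(2009, 1, 1), UC), (date(2009, 3, 1), UC))]
keep = selected(ts)
```

The sketch tests the two components independently, exactly as the FILTER is written; a rectangle-by-rectangle check would be stricter for timestamps made of several rectangles.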

Temporal Selection Operators
The available comparison operators are the same as in TSQL2:

    Operator        Definition
    A PRECEDES B    END(A) is earlier than BEGIN(B)
    A = B           A and B are identical
    A MEETS B       END(A) immediately precedes BEGIN(B)
    A CONTAINS B    Each chronon in B is also contained in A

* They can be used to compare (monodimensional) temporal elements, periods and time points; operands of different types can also be compared (owing to their reducibility to chronon sets).
* The user-friendly operators, whose definition is close to their meaning in English, form a non-minimal but complete set, equivalent to Allen's algebra for intervals and time points.

Query Examples
We assume ex: is a prefix referencing a namespace involving the definition of employee data:
    @prefix ex: <http://myExample.org/employee/> .

Sample employee data (temporal RDF graph):
    _:emp1 rdf:type ex:emp
    _:emp1 ex:Name "Ann"
    _:emp1 ex:Salary "2200"^^xs:integer | "[2009-06-01,2009-09-30]+[2009-06-01,UC]"^^xs:temporalElement
    _:emp2 rdf:type ex:emp
    _:emp2 ex:Name "Tom"
    _:emp2 ex:Salary "2000"^^xs:integer | "[2008-01-01,2008-12-31]"^^xs:temporalElement
    _:emp2 ex:Salary "2200"^^xs:integer | "[2009-01-01,UC]"^^xs:temporalElement

Query Example (1)
A query involving both temporal selection and projection (result not organized as a temporal RDF graph):

    SELECT ?salary INTERSECT(?t,"[2007-01-01,2009-12-31]")
    WHERE { ?emp rdf:type ex:emp ;
                 ex:Name "Tom" ;
                 ex:Salary ?salary | ?t .
            FILTER ( VALID(?t) OVERLAPS "[2007-01-01,2009-12-31]"^^xs:period ) . }

* The query retrieves Tom's salary history from 2007 to 2009.
* An implied conjunct
    && TRANSACTION(?t) CONTAINS fn:current-date()
  is assumed in the FILTER clause to retrieve only current data.
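The effect of Query Example (1) on the sample data can be traced by hand with a small Python sketch. It treats Tom's salary timestamps as valid-time periods with closed `datetime.date` bounds, uses `date.max` in place of UC, and leaves out the implied transaction-time conjunct; the helper names `intersect` and `salary_history` are hypothetical.

```python
from datetime import date

UC = date.max  # assumed stand-in for the open bound UC

# Valid-time history of Tom's salary, copied from the sample data.
tom_salary = [
    (2000, (date(2008, 1, 1), date(2008, 12, 31))),
    (2200, (date(2009, 1, 1), UC)),
]


def intersect(period, window):
    """Intersection of two closed periods, or None when they are disjoint."""
    lo = max(period[0], window[0])
    hi = min(period[1], window[1])
    return (lo, hi) if lo <= hi else None


def salary_history(history, window):
    """Keep the salaries valid within the window, clipped to the window."""
    rows = []
    for salary, period in history:
        clipped = intersect(period, window)
        if clipped is not None:  # plays the role of the OVERLAPS filter
            rows.append((salary, clipped))
    return rows


window = (date(2007, 1, 1), date(2009, 12, 31))
result = salary_history(tom_salary, window)
```

The first salary period falls entirely inside the window and is returned unchanged, while the open-ended second period is clipped at the end of 2009, which is what the INTERSECT clause is meant to do.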

Query Example (2)
A similar query can be used to retrieve the same data after a database rollback to the beginning of 2008:

    SELECT ?salary INTERSECT(?t,"[2007-01-01,2009-12-31]")
    WHERE { ?emp rdf:type ex:emp ;
                 ex:Name "Tom" ;
                 ex:Salary ?salary | ?t .
            FILTER ( VALID(?t) OVERLAPS "[2007-01-01,2009-12-31]"^^xs:period &&
                     TRANSACTION(?t) CONTAINS "2008-01-01"^^xs:date ) . }

* The query retrieves Tom's salary history from 2007 to 2009, as of January 1, 2008.

Query Example (3)
A query involving a sort of temporal join, based on a comparison between the durations of two validity periods:

    SELECT ?ename
    WHERE { ?emp1 rdf:type ex:emp ;
                  ex:Name "Ann" ;
                  ex:Salary ?salary | ?ts .
            ?emp2 rdf:type ex:emp ;
                  ex:Name ?ename ;
                  ex:Dept "Toys" | ?tt .
            FILTER ( ?salary > "20000"^^xs:integer &&
                     xs:duration(VALID(?tt)) > xs:duration(VALID(?ts)) ) . }

* The query retrieves the names of the employees (?emp2) who have worked in the Toys department for longer than Ann (?emp1) has earned more than $20,000.

Query Example (4)
An optional modifier PERIOD can be specified in the declaration of temporal variables. Without the modifier:

    SELECT ?ename
    WHERE { ?emp1 rdf:type ex:emp ;
                  ex:Name ?ename ;
                  ex:Dept "Sales" | ?t .
            FILTER ( xs:duration(VALID(?t)) > "P2Y"^^xs:duration ) . }

* This first query version retrieves the names of the employees who worked in the Sales department for more than two years (altogether).

Query Example (5)
With the PERIOD modifier:

    SELECT ?ename
    WHERE { ?emp1 rdf:type ex:emp ;
                  ex:Name ?ename ;
                  ex:Dept "Sales" | ?t PERIOD .
            FILTER ( xs:duration(VALID(?t)) > "P2Y"^^xs:duration ) . }

* This second query version retrieves the names of the employees who worked (continuously) in the Sales department for a period longer than two years.

Query Example (6)
The PERIOD modifier can also be used to reference consecutive periods within the same data history:

    SELECT ?ename ?job
    WHERE { ?emp rdf:type ex:emp ;
                 ex:Name ?ename ;
                 ex:Job ?job | ?t1 PERIOD ;
                 ex:Job "Director" | ?t2 PERIOD ;
                 ex:Job ?job | ?t3 PERIOD .
            FILTER ( VALID(?t1) MEETS VALID(?t2) &&
                     VALID(?t2) MEETS VALID(?t3) ) . }

* This query retrieves the names of the employees who returned to their previous job (?job) after having been directors for some time.

Conclusions and Future Work
* We presented T-SPARQL, a temporal SPARQL extension supporting the temporal RDF database model we introduced in [Grandi 2009], which employs triple timestamping with multi-dimensional temporal elements.
* T-SPARQL is equipped with the basic temporal constructs introduced for the TSQL2 query language and works with an extended set of the temporal datatypes, functions and operators available in the SPARQL specification.
* Future work will consider the design and implementation of a prototype query engine supporting a T-SPARQL interface, and the adoption of suitable index and storage structures for efficiently querying temporal RDF graphs.