Download 1995-01-01:1996-12-31

ArchIS: An Efficient Transaction-Time Temporal Database System Built on Relational Databases and XML Fusheng Wang University of California, Los Angeles Motivation: Temporal Applications Most database applications are temporal in nature:  Financial applications  Record-keeping applications  Scheduling applications  Scientific applications Temporal Databases: the Reality  Over 40 temporal data models and query languages have been proposed in the past  A long struggle to get around the limitations of RDBMS  No DBMS vendors have moved aggressively to extend SQL with temporal support What’s Needed? A temporal database system that provides:  Expressive temporal representations and data models with minimal or no extension  Powerful languages for temporal queries with minimal or no extension  Indexing, clustering and query optimization techniques for efficient query support  Architectures that bring these together Outline  Motivation  Viewing Relation History in XML  Temporal Queries with XQuery  The ArchIS System  Performance Study  Database Compression  Conclusion Background: Publishing Relational Database as XML  Publishing relational DBs as XML as actual XML documents: SQL/XML as XML views: SilkRoute, XPeranto Viewing Relation History in XML  Our proposal: view the history of relational DBs as XML documents: Such history can be naturally represented in XML, without any extension to the data model Temporal queries can be expressed in XQuery as is—without any extension to the language Amenable for efficiently implementations Temporal Grouping in XML  Temporal data models can be classified as:  Temporally ungrouped  Temporally grouped  Temporally grouped data models have more expressive power and are more natural for users  It is difficult to fit temporally grouped models into RDBMS  Temporally grouped data model can be represented well in XML Example: Transaction-Time History of Tables Timestamped tuple snapshots (temporally ungrouped)  deptno DOB start end Engineer d01 1945-04-09 1995-01-01 1995-05-31 70000 Engineer d01 1945-04-09 1995-06-01 1995-09-30 10003 70000 Sr Engineer d02 1945-04-09 1995-10-01 1996-01-31 10003 70000 Tech Leader d02 1945-04-09 1996-02-01 1996-12-31 name empno salary title Bob 10003 60000 Bob 10003 Bob Bob Temporally grouped history of employees  name empno salary 60000 1995-01-01:1996-05-31 Bob Engineer 1995-01-01:1995-09-30 deptno 70000 1995-10-01:1996-01-31 Tech Leader 1995-06-01:1996-12-31 1996-02-01:1996-12-31 DOB d01 1995-01-01:1995-09-30 Sr Engineer d02 10003 1995-01-01:1996-12-31 1995-01-01:1996-12-31 title 1945-04-09 1995-10-01:1996-12-31 1995-01-01:1996-12-31 XML Representation of DB History <employees tstart="1995-01-01" tend="1996-12-31"> <employee tstart="1995-01-01" tend="1996-12-31"> <empno tstart="1995-01-01" tend="1996-12-31">10003</empno> <name tstart="1995-01-01" tend="1996-12-31">Bob</name> <salary tstart="1995-01-01" tend="1995-05-31">60000</salary> <salary tstart="1995-06-01" tend="1996-12-31">70000</salary> <title tstart="1995-01-01" tend="1995-09-30">Engineer</title> <title tstart="1995-10-01" tend="1996-01-31">Sr Engineer</title> <title tstart="1996-02-01" tend="1996-12-31">Tech Leader</title> <deptno tstart="1995-01-01" tend="1995-09-30">d01</deptno> <deptno tstart="1995-10-01" tend="1996-12-31">d02</deptno> <DOB tstart="1995-01-01" tend="1996-12-31">1945-04-09</DOB> </employee>  </employees> Advantages of XML Representations  The attribute value history is grouped, and can be queried directly  The H-document has a well-defined schema generated from the current table  The interval constraints are maintained in the updates Outline  Motivation  Viewing Relation History in XML  Temporal Queries with XQuery  The ArchIS System  Performance Study  Database Compression  Conclusion Temporal Queries with XQuery  XQuery: the coming standard query language for XML  With XQuery, we can specify temporal queries without any extension:  Temporal projection, snapshot queries, temporal joins, interval queries queries: A SINCE B, continuous periods, period containment  Complex Temporal Queries with XQuery  Temporal projection: retrieve the salary history of “Bob”: element salary_history { for $s in doc("employees.xml")/ employees/employee/[name=“Bob”]/salary return $s }  Snapshot queries: retrieve the departments on 1996-01-31: for $d in doc("depts.xml")/depts/dept [tstart(.) <= "1996-01-31" and tend(.) >= "1996-01-31"] let $n := $d/name[tstart(.)<="1996-01-31" and tend(.)>="1996-01-31"] let $m := $d/manager[tstart(.)<="1996-01-31" and tend(.)>= "1996-01-31"] return( element dept{$n,$m } ) Temporal Functions Shield the user from the low-level details used in representing time, e.g., “now”  Eliminate the need for the user to write complex functions, e.g., coalescing  Predefined functions:   Restructuring: coalese($l)  Period tmeets comparison : toverlaps, tprecedes, tcontains, tequals,  Duration and date/time: tstart($e), tend($e), timespan($e)  telement(Ts, Te): constructs an empty element element timestamped as tstart=Ts, tend=Te Support for ‘now’ ‘now’: no change until now  Internally, “end of time” values are used to denote ‘now’, e.g., 9999-12-31  Intervals are only accessed through built-in functions: tstart() returns the start of an interval, tend() returns  the end or CURRENT_DATE if it’s different from 9999-12-31  In the output, tend value can be:  “9999-12-31”  CURRENT_DATE by using rtend($e) that recursively replaces all the occurrence of 9999-12-31 with the current date,  “now”, using externalnow($e) that recursively replaces all the occurrence of \9999-12-31" with the string \now". Outline  Motivation  Viewing Relation History in XML  Temporal Queries with XQuery  The ArchIS System  Performance Study  Database Compression  Conclusion The ArchIS System  Two approaches are possible for storing and querying Hdocuments (H-views)  Native XML database approach: store H-documents directly into XML DB  XML-enabled RDBMS. Design issues include:  mapping (shredding) the XML views representing the Hdocuments into tables (H-tables)  translation  indexing,  of queries from the XML views to the H-tables clustering and query mapping techniques ArchIS: Archival Information System The ArchIS System: Architecture Current Database Relational Data SQL Queries Active Rules/ update logs Temporal XML Data H-views (H-documents) H-tables Temporal XML Queries A R C H I S H-tables  Assumptions  Each entity or relation has a unique key ( or composite keys) to identify it which will not change along the history. e.g., employee: empno  H-tables:  attribute history table: store history of each attribute  key table: built for the key  global relation table: record the history of relations  e.g.: current database:  employee(empno, title) name, sex, DOB, deptno, salary, H-tables current table employee (cont’d) H-tables relations(relationname, tstart, tend) global relation table empno key table employee_id(id, tstart, tend) name attribute history employee_name(id, name, tstart, tend) … table … salary employee_salary(id, salary, tstart, tend) title employee_title(id, title, tstart, tend) H-tables (cont’d)  Sample contents of employee_salary: ID ======= 100022 100022 100022 100022 100022 100023 ... SALARY ======= 58805 61118 65103 64712 65245 43162 TSTART ========== 02/04/1985 02/05/1986 02/05/1987 02/05/1988 02/04/1989 07/13/1988 TEND ========== 02/04/1986 02/04/1987 02/04/1988 02/03/1989 02/03/1990 07/13/1989 Updating Table Histories  Changes in the current database can be tracked with either update logs or triggers  DB2: triggers  ArchIS: update logs Query Mapping  General purpose query mapping: XPeranto  In ArchIS, we have well-defined mapping between H-documents (or H-views) and Htables  We map temporal XQuery queries into SQL, utilizing SQL/XML  SQL/XML is a new standard to map between RDBMS and XML  Both tag-binding and structure construction is pushed inside the relational engine, thus be very efficient SQL/XML Publishing Functions  XMLElement and XMLAttribute select XMLElement (Name "dept", XMLAttributes (tstart as "tstart", tend as "tend"), deptname) from dept where deptname = ‘Sales’ <dept tstart = "02/04/1985" tend = "12/31/9999"> Sales </dept>  XMLAgg select XMLElement (Name as "new_employees", XMLAttributes ("02/04/2003" as "Since") XMLAgg (XMLElement (Name as "employee", e.name)) from employee_name as e where e.tstart >= ‘02/04/2003’ <new_employees Since = "02/04/2003"> <employee>Bob</employee> <employee>Jack</employee> </new_employees> XQuery Mapping to SQL with SQL/XML  Temporal projection: retrieve the salary history of “Bob”: element salary_history { for $s in doc("employees.xml")/ employees/employee/[name=“Bob”]/salary return $s } select XMLElement (Name "salaryhistory", XMLAgg (XMLElement (Name as "salary", XMLAttributes (S.tstart as tstart, S.tend as "tend"), S.salary))) from employee_salary as S, employee_name as N where N.id = S.id and N.name = 'Bob' group by N.id XQuery Mapping to SQL with SQL/XML: Steps  Identification of variable range  Map variables in FOR/LET clause into underlying Htables  Generation of join conditions  There is a join condition any pair of distinct tuple variables: join them by ids  Translation of built in functions  Map built-in temporal functions in XQuery into functions in ArchIS  Output generation  use XMLElement and XMLAgg constructs Temporal Clustering and Indexing  Tuples in H-tables are stored in the order of updates, thus neither temporally clustered nor clustered by objects  Traditional indexes such as B+ Tree will not help on snapshot queries, and better temporal clustering is needed  For every segment, usefulness: U = Nlive/Nall  At the beginning, U =100%, and it decreases with updates  The minimum tolerable usefulness: Umin Segment-based Clustering Scheme Live Live All All All Segment 1 Segment 2 Segment 3 segstart1 segend1 segstart2 segend2 tstarttuple <= segendSEG tendtuple >= segstartSEG segstart3 segend3 Segment-based Clustering Scheme  Initially all tuples for an attribute history table are archived in a live segment SEGlive with usefulness U =100%. With updates, when U drops below Umin: 1. A new segment is allocated; 2. The interval of this segment is recorded in the table segment(segno, segstart, segend); 3. All tuples in SEGlive are copied into a new segment Si sorted by id; 4. All live tuples in SEGlive are copied into a new live segment SEGlive', and the old live segment is dropped; After that, the new segment SEGlive’ becomes the new starting segment for updates Segment-based Clustering Scheme (cont’d)  Sample segments: Segment1 (01/01/1985 - 10/17/1991): ID SALARY TSTART TEND 100002 40000 02/20/1988 02/19/1989 100002 42010 02/20/1989 02/19/1990 100002 42525 02/20/1990 02/19/1991 100002 42727 02/20/1991 12/31/9999 ... Segment2 (10/18/1991 - 07/08/1995): ID SALARY TSTART TEND 100002 42727 02/20/1991 02/19/1992 100002 45237 02/20/1992 02/18/1993 100002 46465 02/19/1993 02/18/1994 100002 47418 02/19/1994 02/18/1995 100002 47273 02/19/1995 12/31/9999 ... Advantages of Segment-based Clustering Scheme  The current live segment always has a high usefulness, assuring efficient updates;  Records are globally temporally clustered on segments;  For snapshot queries, only one segment is used; for interval queries, only segments involved are used;  Flexibility to control the number of redundant tuples in segments with Umin Storage Usage of Segment-based Clustering Relative storage size with different Umin 1.6 Storage Size Nseg <= N0/(1-Umin) 1/(1-Umin) Testing Data(Umin=0) 1.7 1.5 1.4 1.3 1.2 1.1 1.0 0.0 0.1 0.2 Umin NS 0.3 0.4 Query Performance on Temporal Data with Segment-based Clustering ArchIS without segment-based clustering ArchIS with segment-based clustering Queries: Point: Q1 Snapshot: Q2 Interval: Q5 History: Q3, Q4, Q6 5 Seconds 4 3 2 1 Q1 Q2 Q3 Q4 Q5 Q6 Outline  Motivation  Viewing Relation History in XML  Temporal Queries with XQuery  The ArchIS System  Performance Study  Database Compression  Conclusion Performance Study: Experimental Setup  Systems: Tamino, DB2, and ArchIS  ArchIS uses BerkeleyDB as its storage manager, and it builds on top of it a SQL query engine  Temporal data set: the history of 300,024 employees over 17 years  The simulation models real world salary increases, changes of titles, and changes of departments  The size of the XML data is 334MB  The single large XML document is cut into a collection of 15,000 small XML documents with around 25KB each  Machine: Pentium IV 2.4GHz PC with RedHat 8.0 Performance Study: Query Performance 100 DB2 ArchIS Tamino DB2 and ArchIS: with clustering Tamino: without clustering Seconds 10 1 0.1 Q1 Q2 Q3 Q4 Q5 Q6 snapshot query Q2 on ArchIS is 137 times faster than that on Tamino; interval query Q5 is 91 times faster; history Q6 is 25 times faster; Q4 4 times faster, and Q3 near 3 times faster. Tamino with clustering: snapshot Q2 is 3.3 times faster than without clustering ( still 41 times slower than archIS); interval query Q5 is 2.9 times faster than without clustering ( still 31 times slower than on ArchIS); history queries are much slower Storage Utilization Compression Ratio 1.5 1.0 0.5 0.0 DB2 ArchIS Tamino (with compression) Outline  Motivation  Viewing Relation History in XML  Temporal Queries with XQuery  The ArchIS System  Performance Study  Database Compression  Conclusion Database Compression  The disparity between CPU/memory and disk speeds is becoming larger and larger  Cost to read one IDE disk page: 14ms  Cost to uncompress one page: 1.1ms(500MHz CPU) 0.26ms(2.4GHz CPU)  Cost to retrieve one compressed page: 14ms + 0.26ms = 14.3ms  Cost to retrieve uncompressed pages (3.6 pages):  14ms x 3.6 = 50.4ms Page-based Compression: PageZIP  Traditional data compression tools: compress a file as a whole  PageZIP: page-based compression and uncompression at the granularity of a page  Based on gzip library: zlib  Benefit: save space; point, snapshot or interval queries only retrieve a small fraction of the history, and can be efficient PageZIP page 1 page 2 page 3 … Segment 1 … Segment n ID: 1001 - 1100 ID: 1100 - 1203 ID: 1203 - 1331 Storage Utilization with Compression For each attribute history table, we compress it as a sequence of pages and store each page as a BLOB in a RDBMS employee_salary (sid, salary, tstart, tend) => employee_salary_blob(pageno, startsid, endsid, pageblob) without compression 1.5 Compression ratio  with compression 1.0 0.5 0.0 Tamino DB2 ArchIS Query Performance with Compression DB2 without compression DB2 with compression Tamino Seconds 100 ArchIS without compression ArchIS with compression 10 1 Q1 Q2 Q3 Q4 Q5 Q6 Update Performance  For RDBMS, only the current segment is used for updates. For Tamino, current data and historical data are clustered together  Update an employee’s salary:   DB2: 0.29 seconds; Tamino: 1.2 seconds Assume that every employee gets updated once a year: about 1/260 of the total employee get updated every day on average  DB2: 1.52 seconds; Tamino: 15 seconds  In the worse case for segment-based archiving: 39 seconds for copying segments and 36 segments for compression: but only once Summary  We built a transaction time temporal database on RDBMS and XML, with:  XML to support temporally grouped (virtual) representations of the database history  XQuery to express powerful temporal queries on such views  temporal clustering for managing the actual historical data in a RDBMS  SQL/XML for executing the queries on the XML views as equivalent queries on the relational DB  compression as option for efficient storage  ArchIS provides a unified solution for a wide spectrum of temporal application problems Future Work  Friendly temporal query interfaces based on temporally grouped models  Other clustering and indexing techniques to be investigated  Other efficient data compression techniques proposed for XML data to be investigated  Apply the approach to valid-time DB and bitemporal DB  Apply the approach to OODBMS and semistructured data model

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download 1995-01-01:1996-12-31