Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Temporal Databases S. Srinivasa Rao April 12, 2007 [Part 1 based on Ch23 of C.J. Date (slides by Prof. Ghafoor, EE 562)] [Part 2 based on slides by Prof. Arge, I/O-algorithms] Outline • Part 1: Introduction to temporal databases • Part 2: Temporal index: Persistent B-tree and its applications 2 Introduction • Temporal database: a database that contains historical data as well as current data. – Note: ‘historical’ is a misleading term – temporal databases may contain data regarding the future as well as the past. • Extreme case: data is only inserted, never deleted from a temporal database (eg. vehicle position data in the ‘project’). • So far, we have studied the other extreme - i.e. ‘snapshot’ databases. • Distinguishing feature: the element of time. 3 Introduction • Temporal data: encoded representation of timestamped facts. – Each tuple must include at least one timestamp. – Problem:What about queries that produce results that are not temporal? i.e. result of query is outside the domain of (temporal) database. – eg. Get names of all people who have supplied something in the past. • Redefine temporal database: database that includes, but is not limited to, temporal data. 4 Motivation • Queries on time-varying data are difficult to express in SQL. • Temporal databases provide build-in support for recording and querying such information. • It is possible to use SQL to evaluate these queries, but performance is poor. 5 Motivation • Most applications manage temporal data. • If a temporal database is used for such data: – Schemas, including integrity constraints are simpler. – Queries are simpler • Application code is less complex – easier to understand – easier to produce – easier to maintain 6 Applications Most applications of database technology are temporal in nature: • Financial apps.: portfolio management, accounting & banking, stock market analysis, audit analysis • Record-keeping apps.: personnel, medical records, inventory management, legal records (commercial laws change frequently) • Data Warehousing: historical trends for analysis • Scheduling apps.: airline, car, hotel reservations and project management • Scientific apps.: weather monitoring, chemical process monitoring 7 Intervals • An interval [s,e] is a set of times from time s to time e. – Does interval [s,e] represent an infinite set? – Assumption: Timeline is a finite sequence of discrete, indivisible time quanta. • Time Quanta: smallest unit of time system can represent. • Timepoints/point: time unit considered indivisible for our purpose. • An interval is treated as a single type, not as pair of separate values. • Interval can be open/closed w.r.t. start point/end point. – eg. [d04,d10],[d04,d11),(d03,d10],(d03,d11) all represent the sequence of days from day4 to day10 inclusive. 8 Operators on Intervals • Temporal predicate operators: i1 = [s1,e1]; i2 = [s2,e2] i1 – i1 BEFORE i2 (e1<s2) – i1 MEETS i2 i1 (s2 = e1) – i1 EQUALS i2 (s1 = s2 AND e1 = e2) – i1 OVERLAPS i2 (s2 < s1 < e2 OR s1 < s2 < e1) i2 i2 i1 i2 i1 i2 9 Operators on Intervals i1 – i1 DURING i2 (s2 < s1 AND e2 > e1 ) – i1 STARTS i2 (s1 = s2 AND e1 < e2) – i1 FINISHES i2 (e1 = e2 AND s1 > s2) i2 i1 i2 i1 i2 • Additional operators: – i1 MERGES i2: – i1 CONTAINS i2: (i1 MEETS i2 OR i1 OVERLAPS i2) (i2 DURING i1) 10 Scalar and Relational Operators • DURATION(i) - returns the number of time points in i – eg. DURATION ([d03,d07]) returns 5 • i1 UNION i2 – returns [MIN(s1,s2),MAX(e1,e2) ] if (i1 MERGES i2) otherwise undefined • i1 INTERSECT i2 – returns [MAX(s1,s2),MIN(e1,e2)] if (i1 OVERLAPS i2) otherwise undefined 11 Aggregate Operators • EXPAND(X): Where X is a set. The output is also a set. Used to generate time quantum intervals. – The expanded form of X is the set of all intervals of the form [p,p] where p is a time point in some interval in X. • e.g.: – X1 = { [d01,d01],[d03,d05],[d04,d06] } – X2 = { [d01,dp1],[d03,d04],[d05,d05],[d05,d06] } – X3 = { [d01,d01],[d03,d03],[d04,d04],[d05,d05],[d06,d06] } – Then EXPAND(X1) = EXPAND(X2) = X3 12 Aggregate Operators • COLLAPSE(X): The collapsed form of X is the set Y of intervals of the same type such that – (a) X & Y have the same unfolded form. – (b) no two distinct members i1 and i2 of Y are such that (i1 MERGES i2) is true. • e.g.: – X1 = { [d01,d01],[d03,d05],[d04,d06] } – X2 = { [d01,d01],[d03,d04],[d05,d05],[d05,d06] } – X3 = { [d01,d01],[d03,d06] } – Then COLLAPSE (X1) = COLLAPSE (X2) = X3 13 Relation Operators Involving Intervals • PACK r on A: groups the relation r by all its attributes apart from A This is equivalent to WITH ( r GROUP {A} AS X ) AS R1 ( EXTEND R1 ADD COLLAPSE (X) AS Y ) {ALL BUT X } AS R2 : R2 UNGROUP Y • UNPACK r on A: Replace COLLAPSE with EXPAND in PACK. 14 Example Given two temporal relations: S: Supplier S# was under contract during the interval During SP: Supplier S# was able to supply part P# during the interval During SP S# P# During S1 P1 [d04,d10] S1 P7 [d05,d10] S1 P3 [d09,d10] S1 P5 [d06,d10] S S# During S1 [d04,d10] S2 P1 [d02,d04] S2 P9 [d03,d03] S2 P1 [d08,d10] S2 [d02,d04] S2 [d07,d10] S3 [d03,d10] S4 P2 [d06,d09] S4 [d04,d10] S4 P5 [d04,d08] S5 [d02,d10] S4 P7 [d05,d10] S2 P5 [d09,d10] S3 P1 [d08,d10] 15 Example 1 • Active supplier intervals: Get S#-DURING pairs for suppliers who have been able to supply at least one part during at least one interval of time, where DURING designates such an interval. • PACK SP {S#,DURING} ON DURING RESULT S# During SP S# P# During S1 P1 [d04,d10] S1 P7 [d05,d10] S1 P3 [d09,d10] S1 P5 [d06,d10] S2 P1 [d02,d04] S2 P9 [d03,d03] S2 P1 [d08,d10] S1 [d04,d10] S2 [d02,d04] S3 P1 [d08,d10] S2 [d08,d10] S4 P2 [d06,d09] S3 [d08,d10] S4 P5 [d04,d08] S4 [d04,d10] S4 P7 [d05,d10] S2 P5 [d09,d10] 16 Example 2 • Inactive (passive) supplier intervals: Get S#-DURING pairs for suppliers who have been unable to supply any parts at all during at least one interval of time, where DURING designates such an interval. • PACK ( ( UNPACK S {S#,DURING} ON DURING ) MINUS ( UNPACK SP {S#,DURING} ON DURING ) ) ON DURING RESULT S# During S2 [d07,d07] S3 [d03,d07] S5 [d02,d10] • Shorthand: U_MINUS 17 More Relational Operators • USING ( AList ) ◄ r1 op r2 ► is a shorthand for: PACK ( ( UNPACK r1 on (AList) ) op ( UNPACK r1 on (AList) ) ) ON (AList) Where op is either UNION, INTERSECT, MINUS or JOIN • Various comparison operators on relations are defined similarly. USING ( AList ) ◄ r1 rel-op r2 ► is equivalent to ( ( UNPACK r1 on (AList) ) rel-op ( UNPACK r1 on (AList) ) ) 18 Part 2 Persistent B-trees and applications 19 Persistent B-tree • In some applications we are interested in being able to access previous versions of data structure – Databases – Geometric data structures • Partial persistence: – Update the current version (getting a new version) – Query all versions • We would like to have partial persistent B-tree with – O(N/B) space – N is number of updates performed – O(log B N ) update – O(log B N T B) query in any version 20 Persistent B-tree • East way to make B-tree partial persistent – Copy structure at each operation – Maintain “version-access” structure (B-tree) update i i+1 i+2 i i+1 i+2 i+3 • Good O(log B N T B) query in any version, but – O(N/B) I/O update – O(N2/B) space 21 Persistent B-tree • Idea: Elements augmented with “existence interval” and stored in one structure • Persistent B-tree with parameter b: – Directed graph * Nodes contain elements augmented with existence interval * At any time t, nodes with elements alive at time t form B-tree with leaf and branching parameter b (i.e., each node/leaf has at least b/4 and at most b children/keys in them) – B-tree with leaf and branching parameter b on indegree 0 nodes If b=B: Query at any time t in O(log B N T B) I/Os 22 Persistent B-tree: Updates • Updates performed as in B-tree • To obtain linear space we maintain new-node invariant: – New node contains between 3 8 B and 7 8 B alive elements and no dead elements 1 4 1 8 B B 3 8 1 2 B 1 8 B 7 8 B B B 23 Persistent B-tree Insert • Search for relevant leaf u and insert new element • If u contains B+1 elements: Block overflow – Version split: Mark u dead and create new node u’ with x alive element – If x 7 8 B: Strong overflow – If x 3 8 B: Strong underflow – If 3 8 B x 7 8 B then recursively update parent(u): Delete (persistently) reference to u and insert reference to u’ 1 4 B 3 8 B 7 8 B B 1 4 B 3 8 B 7 8 B B 24 Persistent B-tree Insert • Strong overflow ( x 7 8 B) – Split u into u’ and u’’ with x 2 elements each ( 3 8 B x 2 – Recursively update parent(u): Delete reference to u and insert reference to v’ and v’’ 1 4 B 3 8 B 171 B 33B B BB 484 B 88 771 BB 884 B B83BB 1 2 B) 7 8 B B • Strong underflow ( x 3 8 B) – Merge x elements with y live elements obtained by version split on sibling ( 1 2 B x y 118 B) – If x y 7 8 B then (strong overflow) perform split into nodes with (x+y)/2 elements each ( 716 B ( x y) / 2 1116 B ) – Recursively update parent(u): Delete two insert one/two references 25 Persistent B-tree Delete • Search for relevant leaf u and mark element dead • If u contains x 1 4 B alive elements: Block underflow – Version split: Mark u dead and create new node u’ with x alive element – Strong underflow ( x 3 8 B ): Merge (version split) and possibly split (strong overflow) – Recursively update parent(u): Delete two references insert one or two references 1 4 1 8 B B 3 8 1 2 B 1 8 B 7 8 B B B 26 Persistent B-tree Insert Delete done Block underflow 0,0 Block overflow Version split done Version split Strong overflow Strong underflow Split Merge -1,+1 Strong overflow done -1,+2 1 4 1 8 B B 3 8 1 2 B 1 8 B 7 8 B B B done -2,+1 Split done -2,+2 27 Persistent B-tree Analysis • Update: O(log B N ) – Search and “rebalance” on one root-leaf path • Space: O(N/B) – At least 18 B updates in leaf in existence interval – When leaf u dies * At most two other nodes are created * At most one block over/underflow one level up (in parent(u)) 1 1 1 B B B – During N updates we create: 8 8 2 * O( N B) leaves 7 * O( N Bi ) nodes i levels up 1 B B B 83 B 8 4 O( N B i ) O( N B ) blocks i 28 Summary/Conclusion: Persistent B-tree • Persistent B-tree – Update current version – Query all versions • Efficient implementation obtained using existence intervals – Standard technique • During N operations – O(N/B) space – O(log B N ) update – O(log B N T B) query 29 Interval Management • Problem: – Maintain N intervals with unique endpoints dynamically such that stabbing query with point x can be answered efficiently x • As in (one-dimensional) B-tree case we are interested in – O( N B) space – O(log B N ) update – O(log B N T B) query 30 Interval Management: Static Solution • Sweep from left to right maintaining persistent B-tree – Insert interval when left endpoint is reached – Delete interval when right endpoint is reached x • Query x answered by reporting all intervals in B-tree at “time” x – O( N B) space – O(log B N T B) query – O( NB log B N ) construction using buffer technique • Dynamic with O(log 2B N ) insert bound using logarithmic method 31 Internal Memory Logarithmic Method Idea • Given (semi-dynamic) structure D on set V – O(log N) query, O(log N) delete, O(N log N) construction • Logarithmic method: – Partition V into subsets V0, V1, … Vlog N, |Vi| = 2i or |Vi| = 0 – Build Di on Vi .................................. * Delete: O(log N) 2 2 2 * Query: Query each Di O(log2 N) * Insert: Find first empty Di and construct Di out of 0 1 2 2 log N 1 ij10 2 j 2i elements in V0,V1, … Vi-1 – O(2i log 2i) construction O(log N) per moved element – Element moved O(log N) times O(log2 N ) amortized 32 External Logarithmic Method Idea • Decrease number of subsets Vi to logB N to get O(log 2B N ) query .................................. B0 B1 B2 B log B N • Problem: Since 1 ij10 B j Bi there are not enough elements in V0,V1, … Vi-1 to build Vi • Solution: We allow Vi to contain any number of elements Bi i i – Insert: Find first Di such that j 0 V j B and construct new Di from elements in V0,V1, … Vi * We move i 1 j 0 V j B i 1elements * If Di constructed in O((|Vi|/B)logB |Vi|) = O(Bi-1logB N) I/Os every moved element charged O(logB N) I/Os * Element moved O(logB N) times O(log 2B N ) amortized 33 External Logarithmic Method Idea • Given (semi-dynamic) linear space external data structure with – O(log B N T B) I/O query – O( NB log B N ) I/O construction (– O(log B N ) I/O delete) • Linear space dynamic data structure with – O(log2B N T B) I/O query – O(log 2B N ) I/O insert amortized (– O(log B N ) I/O delete) • Dynamic interval management – O(log2B N T B) I/O query – O(log 2B N ) I/O insert amortized x 34 Planar Point Location • Static problem: – Store planar subdivision with N segments on disk such that region containing query point q can be found I/O-efficiently • We concentrate on vertical ray shooting query – Segments can store regions it bounds – Segments do not have to form subdivision q • Dynamic problem: – Insert/delete segments (we will not discuss this) 35 Static Solution • Vertical line imposes above-below order on intersected segments • Sweep from left to right maintaining persistent B-tree on above-below order – Left endpoint: Insert segment – Right endpoint: Delete segment q • Query q answered by successor query on B-tree at time qx – O( N B) space – O(log B N T B) query 36 Static Solution • Note: Not all segments comparable! – Have to be careful about what we compare q • Problem: Routing elements in internal nodes of leaf oriented B-trees – Luckily we can modify persistent B-tree to use regular (live) elements as routing elements • However, buffer technique construction cannot be used • Only O( N log B N ) I/O construction algorithm • Cannot be made dynamic using logarithmic method 37 References • External Memory Geometric Data Structures Lecture notes by Lars Arge. – Section 1-4 • I/O-efficient Point Location using Persistent B-trees – Lars Arge, Andrew Danner and Sha-Mayn Teh 38