* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Temporal Databases
Survey
Document related concepts
Transcript
Temporal Databases
S. Srinivasa Rao
April 12, 2007
[Part 1 based on Ch23 of C.J. Date (slides by Prof. Ghafoor, EE 562)]
[Part 2 based on slides by Prof. Arge, I/O-algorithms]
Outline
• Part 1: Introduction to temporal databases
• Part 2: Temporal index: Persistent B-tree and its applications
2
Introduction
• Temporal database: a database that contains historical data as well
as current data.
– Note: ‘historical’ is a misleading term – temporal databases may contain
data regarding the future as well as the past.
• Extreme case: data is only inserted, never deleted from a temporal
database (eg. vehicle position data in the ‘project’).
• So far, we have studied the other extreme - i.e. ‘snapshot’ databases.
• Distinguishing feature: the element of time.
3
Introduction
• Temporal data: encoded representation of timestamped facts.
– Each tuple must include at least one timestamp.
– Problem:What about queries that produce results that are not
temporal? i.e. result of query is outside the domain of (temporal)
database.
– eg. Get names of all people who have supplied something in the
past.
• Redefine temporal database: database that includes, but is not
limited to, temporal data.
4
Motivation
• Queries on time-varying data are difficult to express in SQL.
• Temporal databases provide build-in support for recording and
querying such information.
• It is possible to use SQL to evaluate these queries, but performance
is poor.
5
Motivation
• Most applications manage temporal data.
• If a temporal database is used for such data:
– Schemas, including integrity constraints are simpler.
– Queries are simpler
• Application code is less complex
– easier to understand
– easier to produce
– easier to maintain
6
Applications
Most applications of database technology are temporal in nature:
• Financial apps.: portfolio management, accounting & banking, stock
market analysis, audit analysis
• Record-keeping apps.: personnel, medical records, inventory management,
legal records (commercial laws change frequently)
• Data Warehousing: historical trends for analysis
• Scheduling apps.: airline, car, hotel reservations and project management
• Scientific apps.: weather monitoring, chemical process monitoring
7
Intervals
• An interval [s,e] is a set of times from time s to time e.
– Does interval [s,e] represent an infinite set?
– Assumption: Timeline is a finite sequence of discrete, indivisible
time quanta.
• Time Quanta: smallest unit of time system can represent.
• Timepoints/point: time unit considered indivisible for our purpose.
• An interval is treated as a single type, not as pair of separate values.
• Interval can be open/closed w.r.t. start point/end point.
– eg. [d04,d10],[d04,d11),(d03,d10],(d03,d11)
all represent the sequence of days from day4 to day10 inclusive.
8
Operators on Intervals
• Temporal predicate operators:
i1 = [s1,e1]; i2 = [s2,e2]
i1
– i1 BEFORE i2
(e1<s2)
– i1 MEETS i2
i1
(s2 = e1)
– i1 EQUALS i2
(s1 = s2 AND e1 = e2)
– i1 OVERLAPS i2
(s2 < s1 < e2 OR s1 < s2 < e1)
i2
i2
i1
i2
i1
i2
9
Operators on Intervals
i1
– i1 DURING i2
(s2 < s1 AND e2 > e1 )
– i1 STARTS i2
(s1 = s2 AND e1 < e2)
– i1 FINISHES i2
(e1 = e2 AND s1 > s2)
i2
i1
i2
i1
i2
• Additional operators:
– i1 MERGES i2:
– i1 CONTAINS i2:
(i1 MEETS i2 OR i1 OVERLAPS i2)
(i2 DURING i1)
10
Scalar and Relational Operators
• DURATION(i) - returns the number of time points in i
– eg. DURATION ([d03,d07]) returns 5
• i1 UNION i2
– returns [MIN(s1,s2),MAX(e1,e2) ]
if (i1 MERGES i2)
otherwise undefined
• i1 INTERSECT i2
– returns [MAX(s1,s2),MIN(e1,e2)]
if (i1 OVERLAPS i2)
otherwise undefined
11
Aggregate Operators
• EXPAND(X):
Where X is a set. The output is also a set.
Used to generate time quantum intervals.
– The expanded form of X is the set of all intervals of the form [p,p]
where p is a time point in some interval in X.
• e.g.:
– X1 = { [d01,d01],[d03,d05],[d04,d06] }
– X2 = { [d01,dp1],[d03,d04],[d05,d05],[d05,d06] }
– X3 = { [d01,d01],[d03,d03],[d04,d04],[d05,d05],[d06,d06] }
– Then
EXPAND(X1) = EXPAND(X2) = X3
12
Aggregate Operators
• COLLAPSE(X):
The collapsed form of X is the set Y of intervals of the same type
such that
– (a) X & Y have the same unfolded form.
– (b) no two distinct members i1 and i2 of Y are such that
(i1
MERGES i2) is true.
• e.g.:
– X1 = { [d01,d01],[d03,d05],[d04,d06] }
– X2 = { [d01,d01],[d03,d04],[d05,d05],[d05,d06] }
– X3 = { [d01,d01],[d03,d06] }
– Then
COLLAPSE (X1) = COLLAPSE (X2) = X3
13
Relation Operators Involving
Intervals
• PACK r on A: groups the relation r by all its attributes apart from A
This is equivalent to
WITH ( r GROUP {A} AS X ) AS R1
( EXTEND R1 ADD COLLAPSE (X) AS Y )
{ALL BUT X } AS R2 :
R2 UNGROUP Y
• UNPACK r on A:
Replace COLLAPSE with EXPAND in PACK.
14
Example
Given two temporal relations:
S: Supplier S# was under contract
during the interval During
SP: Supplier S# was able to supply
part P# during the interval During
SP S# P# During
S1 P1 [d04,d10]
S1 P7 [d05,d10]
S1 P3 [d09,d10]
S1 P5 [d06,d10]
S S# During
S1
[d04,d10]
S2 P1 [d02,d04]
S2 P9 [d03,d03]
S2 P1 [d08,d10]
S2
[d02,d04]
S2
[d07,d10]
S3
[d03,d10]
S4 P2 [d06,d09]
S4
[d04,d10]
S4 P5 [d04,d08]
S5
[d02,d10]
S4 P7 [d05,d10]
S2 P5 [d09,d10]
S3 P1 [d08,d10]
15
Example 1
• Active supplier intervals: Get S#-DURING pairs for
suppliers who have been able to supply at least one
part during at least one interval of time, where
DURING designates such an interval.
• PACK SP {S#,DURING} ON DURING
RESULT S# During
SP S# P# During
S1 P1 [d04,d10]
S1 P7 [d05,d10]
S1 P3 [d09,d10]
S1 P5 [d06,d10]
S2 P1 [d02,d04]
S2 P9 [d03,d03]
S2 P1 [d08,d10]
S1
[d04,d10]
S2
[d02,d04]
S3 P1 [d08,d10]
S2
[d08,d10]
S4 P2 [d06,d09]
S3
[d08,d10]
S4 P5 [d04,d08]
S4
[d04,d10]
S4 P7 [d05,d10]
S2 P5 [d09,d10]
16
Example 2
• Inactive (passive) supplier intervals: Get S#-DURING pairs for
suppliers who have been unable to supply any parts at all during at
least one interval of time, where DURING designates such an
interval.
• PACK
( ( UNPACK S {S#,DURING} ON DURING )
MINUS
( UNPACK SP {S#,DURING} ON DURING ) )
ON DURING
RESULT S# During
S2
[d07,d07]
S3
[d03,d07]
S5
[d02,d10]
• Shorthand: U_MINUS
17
More Relational Operators
• USING ( AList ) ◄ r1 op r2 ► is a shorthand for:
PACK
( ( UNPACK r1 on (AList) ) op ( UNPACK r1 on (AList) ) )
ON (AList)
Where op is either UNION, INTERSECT, MINUS or JOIN
• Various comparison operators on relations are defined similarly.
USING ( AList ) ◄ r1 rel-op r2 ► is equivalent to
( ( UNPACK r1 on (AList) ) rel-op ( UNPACK r1 on (AList) ) )
18
Part 2
Persistent B-trees
and applications
19
Persistent B-tree
• In some applications we are interested in being able to access
previous versions of data structure
– Databases
– Geometric data structures
• Partial persistence:
– Update the current version (getting a new version)
– Query all versions
• We would like to have partial persistent B-tree with
– O(N/B) space – N is number of updates performed
– O(log B N ) update
– O(log B N T B) query in any version
20
Persistent B-tree
• East way to make B-tree partial persistent
– Copy structure at each operation
– Maintain “version-access” structure (B-tree)
update
i
i+1
i+2
i
i+1
i+2
i+3
• Good O(log B N T B) query in any version, but
– O(N/B) I/O update
– O(N2/B) space
21
Persistent B-tree
• Idea: Elements augmented with “existence interval” and stored in
one structure
• Persistent B-tree with parameter b:
– Directed graph
* Nodes contain elements augmented with existence interval
* At any time t, nodes with elements alive at time t form B-tree
with leaf and branching parameter b (i.e., each node/leaf has
at least b/4 and at most b children/keys in them)
– B-tree with leaf and branching parameter b on indegree 0 nodes
If b=B: Query at any time t in O(log B N T B) I/Os
22
Persistent B-tree: Updates
• Updates performed as in B-tree
• To obtain linear space we maintain new-node invariant:
– New node contains between 3 8 B and 7 8 B alive elements and no
dead elements
1
4
1
8
B
B
3
8
1
2
B
1
8
B
7
8
B
B
B
23
Persistent B-tree Insert
• Search for relevant leaf u and insert new element
• If u contains B+1 elements: Block overflow
– Version split:
Mark u dead and create new node u’ with x alive element
– If x 7 8 B: Strong overflow
– If x 3 8 B: Strong underflow
– If 3 8 B x 7 8 B then recursively update parent(u):
Delete (persistently) reference to u and insert reference to u’
1
4
B
3
8
B
7
8
B
B
1
4
B
3
8
B
7
8
B
B
24
Persistent B-tree Insert
• Strong overflow ( x 7 8 B)
– Split u into u’ and u’’ with x 2 elements each ( 3 8 B x 2
– Recursively update parent(u):
Delete reference to u and insert reference to v’ and v’’
1
4
B
3
8
B
171 B 33B
B BB
484 B 88
771
BB
884 B
B83BB
1
2 B)
7
8
B
B
• Strong underflow ( x 3 8 B)
– Merge x elements with y live elements obtained by version split on
sibling ( 1 2 B x y 118 B)
– If x y 7 8 B then (strong overflow) perform split into nodes
with (x+y)/2 elements each ( 716 B ( x y) / 2 1116 B )
– Recursively update parent(u): Delete two insert one/two references
25
Persistent B-tree Delete
• Search for relevant leaf u and mark element dead
• If u contains x 1 4 B alive elements: Block underflow
– Version split:
Mark u dead and create new node u’ with x alive element
– Strong underflow ( x 3 8 B ):
Merge (version split) and possibly split (strong overflow)
– Recursively update parent(u):
Delete two references insert one or two references
1
4
1
8
B
B
3
8
1
2
B
1
8
B
7
8
B
B
B
26
Persistent B-tree
Insert
Delete
done
Block underflow
0,0
Block overflow
Version split
done
Version split
Strong overflow
Strong underflow
Split
Merge
-1,+1
Strong overflow
done -1,+2
1
4
1
8
B
B
3
8
1
2
B
1
8
B
7
8
B
B
B
done
-2,+1
Split
done
-2,+2
27
Persistent B-tree Analysis
• Update: O(log B N )
– Search and “rebalance” on one root-leaf path
• Space: O(N/B)
– At least 18 B updates in leaf in existence interval
– When leaf u dies
* At most two other nodes are created
* At most one block over/underflow one level up (in parent(u))
1
1
1
B
B
B
– During N updates we create:
8
8
2
* O( N B) leaves
7
* O( N Bi ) nodes i levels up
1
B B
B 83 B
8
4
O( N B i ) O( N B ) blocks
i
28
Summary/Conclusion: Persistent B-tree
• Persistent B-tree
– Update current version
– Query all versions
• Efficient implementation obtained using existence intervals
– Standard technique
• During N operations
– O(N/B) space
– O(log B N ) update
– O(log B N T B) query
29
Interval Management
• Problem:
– Maintain N intervals with unique endpoints dynamically such
that stabbing query with point x can be answered efficiently
x
• As in (one-dimensional) B-tree case we are interested in
– O( N B) space
– O(log B N ) update
– O(log B N T B) query
30
Interval Management: Static Solution
• Sweep from left to right maintaining persistent B-tree
– Insert interval when left endpoint is reached
– Delete interval when right endpoint is reached
x
• Query x answered by reporting all intervals in B-tree at “time” x
– O( N B) space
– O(log B N T B) query
– O( NB log B N ) construction using buffer technique
• Dynamic with O(log 2B N ) insert bound using logarithmic method
31
Internal Memory Logarithmic Method Idea
• Given (semi-dynamic) structure D on set V
– O(log N) query, O(log N) delete, O(N log N) construction
• Logarithmic method:
– Partition V into subsets V0, V1, … Vlog N, |Vi| = 2i or |Vi| = 0
– Build Di on Vi
..................................
* Delete: O(log N)
2
2
2
* Query: Query each Di O(log2 N)
* Insert: Find first empty Di and construct Di out of
0
1
2
2 log N
1 ij10 2 j 2i elements in V0,V1, … Vi-1
– O(2i log 2i) construction O(log N) per moved element
– Element moved O(log N) times O(log2 N ) amortized
32
External Logarithmic Method Idea
• Decrease number of subsets Vi
to logB N to get O(log 2B N ) query
..................................
B0
B1
B2
B log B N
• Problem: Since 1 ij10 B j Bi there are not enough elements in
V0,V1, … Vi-1 to build Vi
• Solution: We allow Vi to contain any number of elements Bi
i
i
– Insert: Find first Di such that j 0 V j B and construct new
Di from elements in V0,V1, … Vi
* We move
i 1
j 0 V j B i 1elements
* If Di constructed in O((|Vi|/B)logB |Vi|) = O(Bi-1logB N) I/Os
every moved element charged O(logB N) I/Os
* Element moved O(logB N) times O(log 2B N ) amortized
33
External Logarithmic Method Idea
• Given (semi-dynamic) linear space external data structure with
– O(log B N T B) I/O query
– O( NB log B N ) I/O construction
(– O(log B N ) I/O delete)
• Linear space dynamic data structure with
– O(log2B N T B) I/O query
– O(log 2B N ) I/O insert amortized
(– O(log B N ) I/O delete)
• Dynamic interval management
– O(log2B N T B) I/O query
– O(log 2B N ) I/O insert amortized
x
34
Planar Point Location
• Static problem:
– Store planar subdivision with N segments on disk such that
region containing query point q can be found I/O-efficiently
• We concentrate on vertical ray shooting query
– Segments can store regions it bounds
– Segments do not have to form subdivision
q
• Dynamic problem:
– Insert/delete segments
(we will not discuss this)
35
Static Solution
• Vertical line imposes above-below order on intersected segments
• Sweep from left to right maintaining
persistent B-tree on above-below order
– Left endpoint: Insert segment
– Right endpoint: Delete segment
q
• Query q answered by successor query on B-tree at time qx
– O( N B) space
– O(log B N T B) query
36
Static Solution
• Note: Not all segments comparable!
– Have to be careful about what we compare
q
• Problem: Routing elements in internal nodes of leaf oriented B-trees
– Luckily we can modify persistent B-tree to use regular (live)
elements as routing elements
• However, buffer technique construction cannot be used
• Only O( N log B N ) I/O construction algorithm
• Cannot be made dynamic using logarithmic method
37
References
• External Memory Geometric Data Structures
Lecture notes by Lars Arge.
– Section 1-4
• I/O-efficient Point Location using Persistent B-trees
– Lars Arge, Andrew Danner and Sha-Mayn Teh
38