Indexing transaction time databases
Tolga Bozkaya, Meral Ozsoyoglu
presented by
Priyatham Pamu
1
Overview

The problem addressed:
- Indexing time intervals of temporal objects, assuming a
  record-based storage model with object versioning.

Object versioning
- Different versions of the same time-varying entity are kept linked
  to each other (by time-invariant key attributes in the relational
  model, or by pointers in the object-oriented data model).
2
Overview


We consider a transaction time database with two states:
- The past state contains the temporal data that was once valid
  but is no longer valid (i.e. historical data).
  - We suggest using an IB+-tree, an AD*-tree, or an R-tree, depending on
    the application requirements.
- The current state contains only the data that is valid at the current
  time.
  - We suggest using a variant of the B+-tree called the
    Modified B+-tree (MB+-tree) for indexing the current state.
The IB+-tree can also be used to index valid time intervals effectively.
3
Existing Temporal index structures






Time-index
AP-tree (Append-only tree)
TP-index (Time polygon index)
SR-tree (Segment R-tree)
Snapshot Index
‘windows’ indexing scheme
4
Temporal data model

A temporal data model can support two types of time:
- Transaction time data
  - Bounded by the time when the database was initiated and the
    current time point.
  - Data is never deleted from a transaction time database, and
    new data arrives in an append-only fashion.
- Valid time data
  - The validity of a temporal entity may extend into the future, so
    there is no upper bound.
  - Valid time databases are not append-only databases.
5
A Temporal Indexing Scheme




Assumptions
- We assume a discrete time model in our scheme.
- The time points are mapped to natural numbers, from 0
  (the time when the database is initiated) to the current time point,
  denoted by the variable now.
In a transaction time database, all object versions with transaction time
intervals of the form [a,now] (a<=now) belong to the current state of
the database, as they are currently valid.
All other versions, which have transaction time intervals of the form [a,b]
(a<=b and b<now), belong to the past state of the database, as their
validity ceased at some point in the past.
Transaction time databases are append-only databases where random
deletions or insertions do not take place.
6
Temporal Indexing Scheme contd…


We denote a temporal object version as a 2-tuple (I,V), where I is the
transaction time interval of the object version V.
The operations defined on a transaction time database are:

Deletion: If the current version v of an object o is to be deleted,
([ts,now],v) is removed from the current state and ([ts,td],v) is inserted
into the past state, where td is the time of deletion.
Insertion: For insertions, we create the initial version v of an
object o and insert ([ts,now],v) into the current state, where ts is the
time of insertion.
Updates: For updates, we create a new most recent version
([tu,now],v’) of the object o. This new version is inserted into the current
state, and the old version ([ts,tu-1],v) is migrated to the past state.
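The three operations above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the class name `TransactionTimeStore`, the use of a dictionary for the current state, and a plain list for the past state are all hypothetical and stand in for the index structures discussed on the following slides.

```python
from dataclasses import dataclass, field


@dataclass
class TransactionTimeStore:
    """Toy model (hypothetical): current state maps key -> (ts, value);
    the past state is an append-only list of ((ts, tf), key, value)."""
    current: dict = field(default_factory=dict)
    past: list = field(default_factory=list)

    def insert(self, key, value, ts):
        # ([ts, now], v) enters the current state.
        if key in self.current:
            raise KeyError(f"{key!r} already has a current version")
        self.current[key] = (ts, value)

    def delete(self, key, td):
        # ([ts, now], v) leaves the current state;
        # ([ts, td], v) is appended to the past state.
        ts, value = self.current.pop(key)
        self.past.append(((ts, td), key, value))

    def update(self, key, new_value, tu):
        # The old version is closed at tu - 1 and migrates to the past state;
        # the new version ([tu, now], v') becomes the current one.
        ts, value = self.current.pop(key)
        self.past.append(((ts, tu - 1), key, value))
        self.current[key] = (tu, new_value)
```

For example, inserting an object at time 3, updating it at time 7, and deleting it at time 9 leaves the current state empty and two closed versions, [3,6] and [7,9], in the past state.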
7
Figure for current state and past state.
8
Indexing the current state


Features
- Unlike the past-index, the current-index sees deletions as well as
  insertions.
  - Deletions occur when object versions migrate from the
    current state to the past state of the database.
  - Insertions occur when new object versions are inserted into
    the database.
For the current-index, only the starting points of the transaction time
intervals have to be indexed, because the finish points of all such
intervals are the same (i.e. now). Hence a 1-D index structure suffices
for indexing the current state.
9
Indexing the current state…..

Properties specific to the current state:

Insertions arrive ordered by starting time, while deletions are arbitrary.
We exploit this property to obtain a B+-like structure
(the MB+-tree) that is expected to have higher storage utilization, which
in turn affects the height of the structure and hence the search
efficiency.
10
MB+ -tree



The MB+-tree is simply a modified version of the B+-tree.
MB+-tree
- Insertions are done at the right end; deletions can be handled
  anywhere in the tree.
- The nodes along the rightmost path of the tree can have as few as
  one child, and the rightmost leaf can have as few as one key.
- The deletion algorithm is the same as in the B+-tree.
Depending on the distribution of the durations of the transaction time
intervals, this structure may provide considerable improvement over
the regular B+-tree in both efficiency and storage.
11
MB+-tree
12
Indexing the past state

Three different indexing tree structures are proposed.






Interval B+-trees (IB+-trees)
AD*-trees
One- and two-dimensional R-trees
These index structures meet the requirements of different
applications.
The past state has no dynamic deletions other than
vacuuming; it is append-only.
Since IB+-trees are similar to Interval-trees, we discuss Interval-trees
first.
13
Interval-trees




An Interval-tree is a balanced binary search tree (e.g. an AVL or
red-black tree) that is augmented to support operations on a dynamic
set of intervals.
A node x of an Interval-tree contains an interval (int[x]),
and the key of x is the starting point of that interval.
An inorder tree walk of the data structure lists the intervals
in sorted order by their starting points. In addition, each
node contains the maximum finish point of the intervals
stored in the subtree rooted at that node.
Insertions and deletions can be done in O(log n) time.
14
A balanced interval tree figure
15
INTERVAL-SEARCH
INTERVAL-SEARCH(T,I)
(For a given interval I=[is,if], find an interval that intersects I in
the Interval-tree T. left[x] and right[x] denote the left and right
child of a node x.)
(1) x = root(T)
(2) while x != NIL and I does not intersect the interval int[x] do
    (2.1) if left[x] != NIL and max[left[x]] >= is then x = left[x]
    (2.2) else x = right[x]
(3) Return x if it is not NIL.
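A minimal Python sketch of this search follows. It assumes closed intervals and, to stay short, uses a plain unbalanced binary search tree rather than an AVL or red-black tree; the names `Node`, `insert`, and `interval_search` are illustrative only.

```python
class Node:
    def __init__(self, start, finish):
        self.start, self.finish = start, finish
        self.max = finish          # maximum finish point in this subtree
        self.left = self.right = None


def insert(root, start, finish):
    """Unbalanced BST insert keyed on the starting point; maintains `max`."""
    if root is None:
        return Node(start, finish)
    if start < root.start:
        root.left = insert(root.left, start, finish)
    else:
        root.right = insert(root.right, start, finish)
    root.max = max(root.max, finish)
    return root


def interval_search(root, qs, qf):
    """Return a node whose interval intersects [qs, qf], or None."""
    x = root
    # Step 2: loop while x exists and its interval does not intersect I.
    while x is not None and not (x.start <= qf and qs <= x.finish):
        if x.left is not None and x.left.max >= qs:
            x = x.left     # step 2.1: the left subtree may still hold a match
        else:
            x = x.right    # step 2.2
    return x
```

The walk visits one node per level, so on a balanced tree it takes time proportional to the height.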
16
Interval B+-tree structure




The IB+-tree is a direct generalization of the Interval-tree to a multi-way
B+-tree structure. It is essentially a B+-tree on the starting points of
intervals, where each node is augmented with the same kind of
information as in binary Interval-trees.
Unlike in Interval-trees, internal nodes of IB+-trees do not store data
intervals. All data intervals are kept in the leaf nodes.
In an internal node, the number of children equals the number of keys.
Refer to the figure for the internal node structure.
17
Internal IB+ node structure figure.
18
INTERVAL-SEARCH for IB+-tree
INTERVAL-SEARCH(N,I)
(For a given search interval I=[is,if], find an interval that intersects I in the
Interval B+-tree T. N is a node of the Interval B+-tree, and the initial
call is INTERVAL-SEARCH(root(T),I).)
(We assume that N has k children if it is an internal node, or k data items if
it is a leaf node. For entry i of an internal node, ai is the key, mi is the
maximum finish point in the subtree, and ci is the child pointer.)
(1) If N is a leaf node, check whether any interval in N intersects I.
(2) Else, if N is an internal node:
    (2.1) i = 1
    (2.2) if I intersects [ai,mi] then INTERVAL-SEARCH(ci,I)
          else if i < k then i = i+1, goto 2.2
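The recursive search can be sketched in Python as below. This is an illustration under assumptions, not the authors' implementation: the node classes `Leaf` and `Internal`, the helper `bulk_build` (which packs a non-empty, sorted list of intervals bottom-up instead of performing B+-tree insertions), and the reading of each internal entry as (a_i, m_i, c_i) = (minimum start, maximum finish, child) are all hypothetical choices made to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    intervals: list            # (start, finish) pairs, sorted by start


@dataclass
class Internal:
    entries: list              # (a_i, m_i, child_i) triples, ordered by a_i


def intersects(s, f, qs, qf):
    return s <= qf and qs <= f


def summarize(node):
    """Return (minimum start, maximum finish) of the subtree rooted at node."""
    if isinstance(node, Leaf):
        return node.intervals[0][0], max(f for _, f in node.intervals)
    return node.entries[0][0], max(m for _, m, _ in node.entries)


def bulk_build(intervals, fanout):
    """Hypothetical helper: pack a non-empty interval list into a tree."""
    data = sorted(intervals)
    level = [Leaf(data[i:i + fanout]) for i in range(0, len(data), fanout)]
    while len(level) > 1:
        level = [
            Internal([(*summarize(c), c) for c in level[i:i + fanout]])
            for i in range(0, len(level), fanout)
        ]
    return level[0]


def interval_search(node, qs, qf):
    """Return one stored interval intersecting [qs, qf], or None."""
    if isinstance(node, Leaf):                       # step 1
        for s, f in node.intervals:
            if intersects(s, f, qs, qf):
                return (s, f)
        return None
    for a, m, child in node.entries:                 # steps 2.1 and 2.2
        if intersects(a, m, qs, qf):
            # Descend into the first child whose [a_i, m_i] intersects I.
            return interval_search(child, qs, qf)
    return None
```

Descending only into the first qualifying child is enough to find one match because the children are ordered by starting point: if that child holds no intersecting interval, the interval that gave it its maximum finish point must start after the query ends, and so must every interval in the later children.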
19
INTERVAL-SEARCH ……

Note that the INTERVAL-SEARCH algorithm returns one interval (if
at least one exists) that intersects the given search interval.

To obtain all the intersecting intervals, either:

- Use the links between the leaf nodes for a sequential search
  from that point on. This is efficient because the intervals in the leaf
  nodes are sorted by their starting points.
- Or follow all the child pointers that satisfy the condition in step 2.2 of
  the algorithm.
20
Insertions and Deletions in IB+tree



Insertion and deletion operations for IB+-trees are similar to those for
B+-trees, the only difference being a small overhead to maintain the
augmented information.
For every merge operation performed during insertion or deletion, the
maximum fields of all the nodes along the ancestral path have to be
updated accordingly.
The complexity of insertion and deletion operations for IB+-trees is still
O(log_k n) (the same as for B+-trees), where n is the number of leaf nodes
and k is the average fanout of a node in the tree.
21
IB+-tree figure.
22
Storage Utilization of IB+-tree
The storage utilization of the IB+-tree is similar to that of the B+-tree.
Let the size of a node (leaf or internal) be M bytes.
Let each pointer take p bytes and each key value take k bytes.
The maximum number of entries in a leaf node is $P_1 = M/(2k+p)$, where the
tuple identifier takes p bytes.
The maximum number of entries in an internal node is $P_i = M/(2k+p)$, which is
the same as $P_1$.
If there are N intervals to index, there will be $N/(P_1 \ln 2)$ leaf nodes, as
each node is $\ln 2$ full on average.
Number of internal nodes
$= \frac{N}{P_1 \ln 2}\left(\frac{1}{P_1 \ln 2} + \frac{1}{(P_1 \ln 2)^2} + \dots\right)$ ($h-1$ terms)
$\approx \frac{N}{P_1 \ln 2\,(P_1 \ln 2 - 1)}$.

23
Comparison with a regular B+ tree.

A regular B+-tree with no augmentation information takes slightly fewer
pages for internal nodes, as $P_i$ would be $(M+k)/(k+p)$
(the number of children is one more than the number of keys). As the fanout
increases, the difference between an IB+-tree and a B+-tree becomes insignificant.
Height comparison:
$h_{IB^+}/h_{B^+} = (\ln P_i^{B^+} + \ln\ln 2)/(\ln P_i^{IB^+} + \ln\ln 2)$,
which is very close to 1. For p=8, k=4, M=2048, $P_i^{IB^+}$ is 128 and
$P_i^{B^+}$ is 171. In this case, the height ratio is approximately 1.06.
Observation: The heights of a B+-tree and an IB+-tree built on the
same set of interval data will most likely be the same.
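The figures quoted for p=8, k=4, M=2048 can be checked with a few lines of Python. The function names are illustrative; the formulas are the ones stated on this slide and the previous one, with integer division standing in for the whole number of entries that fit in a node.

```python
import math


def ib_plus_fanout(M, k, p):
    # Each internal entry holds a key, a maximum finish point, and a pointer.
    return M // (2 * k + p)


def b_plus_fanout(M, k, p):
    # n child pointers and n - 1 keys must fit in M bytes.
    return (M + k) // (k + p)


def height_ratio(M, k, p):
    """h(IB+-tree) / h(B+-tree), assuming nodes are ln 2 full on average."""
    ln_ln2 = math.log(math.log(2))
    return (math.log(b_plus_fanout(M, k, p)) + ln_ln2) / (
        math.log(ib_plus_fanout(M, k, p)) + ln_ln2
    )


print(ib_plus_fanout(2048, 4, 8))            # 128
print(b_plus_fanout(2048, 4, 8))             # 171
print(round(height_ratio(2048, 4, 8), 2))    # 1.06
```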
24
AD-trees & AD*-trees

We propose a one-dimensional AD*-tree structure for indexing transaction
time intervals with respect to their finish points.

The AD*-tree is simply an augmented AD-tree.

The AD-tree is built on the finish points of the data intervals and is augmented
with minimum starting point information to obtain the AD*-tree.

Since AD*-trees and AD-trees are similar, we discuss only the features of the AD-tree.

Since insertions arrive ordered by finish point, each insertion is done at the
rightmost node.
25
AD-tree properties
26
AD-tree with k=4 after an insertion and a deletion
27
28
Deletion in AD-trees figure.
29
AD-tree vs MB+-tree
Lemma 1: The minimum number of keys in an AD-tree of height h with
order k (maximum fanout) is $(k-1)k^{h-2} + 2$, where k,h>=2.
Lemma 2: The minimum number of keys in an MB+-tree of height h with
order k (maximum fanout) is $\lfloor k/2 \rfloor k^{h-2} + 2$, where k,h>=2.
Theorem: The worst-case density of an AD-tree of height h with order k is
higher than that of an MB+-tree of the same height and the same order
(k>=3).
Experimental results have shown that the height of the AD-tree increases
more slowly than that of the MB+-tree when they index the same set of
keys and have the same parameters.
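The two lemmas can be compared numerically. The sketch below simply evaluates the stated bounds; the function names are illustrative.

```python
def min_keys_ad_tree(k, h):
    # Lemma 1: (k - 1) * k**(h - 2) + 2, for k, h >= 2.
    return (k - 1) * k ** (h - 2) + 2


def min_keys_mb_plus_tree(k, h):
    # Lemma 2: floor(k / 2) * k**(h - 2) + 2, for k, h >= 2.
    return (k // 2) * k ** (h - 2) + 2


# Example with order k = 4 and height h = 3.
print(min_keys_ad_tree(4, 3))        # 14
print(min_keys_mb_plus_tree(4, 3))   # 10
```

Since k-1 exceeds floor(k/2) for every k>=3, the AD-tree bound is strictly larger at every height, which is the content of the theorem; for k=2 the two bounds coincide.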
30
R-trees




1D R-tree
- Each internal node entry contains a minimum bounding interval
  (MBI) and a child pointer.
- The deletion, insertion, and search algorithms of general R-trees are
  not changed.
2D R-tree
- Each interval is mapped to a point in a 2D space whose
  dimensions are the starting point and the finish point.
- In each internal node, each entry contains a child pointer and an
  MBR that encloses the 2D points indexed in the
  corresponding subtree.
R-trees neither assume nor consider any ordering among the data
intervals, unlike the IB+-tree.
2D R-trees require more storage and perform worse for common
intersection queries.
31
Temporal relationships between intervals
32
Querying the past state ……






Queries on intervals employ different temporal operators.
Depending on the querying requirement, different operators can be
applied.
Operators well supported by the IB+-tree: after, met by, right overlaps,
left covered by, right covered by, right covers, equals, covered by.
Operators well supported by the AD*-tree: before, meets, left overlaps,
left covers, right covered by, equals, covered by, left covered by.
In the 1-D R-tree structure, every operator invokes either an intersection
or an inclusion search. All the operators are uniformly supported by R-trees.
For 2D R-trees, each of the operators corresponds to a 2D query region,
as shown in the figure.
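As one concrete case of such a query region, consider the plain intersection test. The sketch below is an assumption-laden illustration (closed intervals, hypothetical function names) and does not reproduce the regions of the figure: it only shows that, once an interval is mapped to the point (start, finish), intersection with a query interval [qs,qf] becomes a containment test in the region start <= qf and finish >= qs.

```python
def as_point(interval):
    """Map an interval to a 2D point: x = starting point, y = finish point."""
    start, finish = interval
    return (start, finish)


def in_intersection_region(point, qs, qf):
    # [x, y] intersects [qs, qf] exactly when x <= qf and y >= qs.
    x, y = point
    return x <= qf and y >= qs


intervals = [(0, 3), (5, 8), (6, 10), (15, 23)]
hits = [iv for iv in intervals if in_intersection_region(as_point(iv), 7, 9)]
print(hits)   # [(5, 8), (6, 10)]
```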
33
Temporal relationships in 2D space.
34
Experimental results. AD-trees
35
Experimental results ….

Fig a.


MB+-tree performs better than B+-tree when its height is smaller than the
height of the B+-tree. When they have the same height, B+-tree performs better
than MB+-tree.
Fig b.
B+-tree outperforms MB+-tree when they have the same height.

AD-tree outperforms both structures.
In terms of insertion performance, AD-tree and MB+-tree are almost the same,
while B+-tree performs worse than the other two.
When deletions are not strictly from the left-hand side, MB+-tree performs
better than B+-tree in all categories.

36
Experimental results IB+ & R-trees
37
Experimental results IB+ & R-trees


Insertion and deletion in IB+-trees are simpler and less costly than
in R-trees.
In Fig a.

For 2D R-trees, this operation is a window query, so it is not very efficient.
For IB+-trees, the starting points of all
qualifying intervals fall in the query interval, so they perform best.
In Fig b.

For 1D R-trees, all nodes whose MBIs include the finish point of the
query interval have to be retrieved. This becomes costly as the MBI size
increases.
The same holds for 2D R-trees.
38
Experimental results IB+ & R-trees
39
Experimental results IB+ & R-trees

In Fig a.



1D R-tree performed the best, with a 30-40% edge over IB+-tree and 30%
over 2D R-trees.
The sequential search method and the range search method of IB+-tree
performed very close to each other, with the sequential search method
slightly better than the range search method.
In Fig b.

1D R-tree performed the best.
The exponential distribution affects the way the augmented information in an
IB+-tree is utilized.
IB+-tree range search performed better than the sequential search method.
40
Indexing valid time databases


For valid time data, we need indexing structures that support
dynamic insertions and deletions.
IB+-trees can also support indexing on valid time intervals, including
intervals that extend to the current time and into the future.
We make the following assumption:
- Valid time intervals have an absolute (fixed) starting
  time point, and the finish point of a valid time interval is
  either an absolute time point or the current time variable
  now.
41
Indexing valid-time intervals
42