Download 04_VDB_encyc_cpt - NDSU Computer Science

VERTICAL DATABASE DESIGN
The concept of vertical data files is not new. Copeland et al (1985) presented
an attribute-level Decomposition Storage Model (DSM), similar to the Attribute
Transposed File model (ATF) (Batory, 1979), that stores each column of a
relational table in a separate table. DSM was shown to perform well. It uses
surrogate keys to map individual attributes together, and therefore requires a
surrogate key to be associated with each attribute of each record in the database.
Attribute-level vertical decomposition is also used in remotely sensed imagery,
e.g. Landsat Thematic Mapper imagery, where it is called Band Sequential (BSQ)
format. Beyond attribute-level decomposition, Wong et al (1985) presented the
Bit Transposed File model (BTF), which further partitions each column down to
the bit level and uses encoding methods to reduce storage space.
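The surrogate-key idea behind attribute-level decomposition can be illustrated with a minimal sketch. It is not the DSM implementation of Copeland et al; the function names `dsm_decompose` and `dsm_reassemble`, the use of the record position as the surrogate key, and the sample records are all assumptions made for illustration.

```python
def dsm_decompose(records, attributes):
    """Store each attribute as its own list of (surrogate, value) pairs.

    The surrogate key here is simply the record's position (an assumption).
    """
    return {
        attr: [(surrogate, record[i]) for surrogate, record in enumerate(records)]
        for i, attr in enumerate(attributes)
    }


def dsm_reassemble(store, attributes):
    """Join the per-attribute tables on the surrogate key to rebuild the records."""
    lookups = {attr: dict(store[attr]) for attr in attributes}
    surrogates = sorted(lookups[attributes[0]])
    return [tuple(lookups[attr][s] for attr in attributes) for s in surrogates]


# Hypothetical two-attribute table, used only to exercise the sketch.
records = [("alice", 30), ("bob", 25)]
store = dsm_decompose(records, ["name", "age"])
rebuilt = dsm_reassemble(store, ["name", "age"])
```

Every value carries its own surrogate key, which is the storage cost the paragraph above describes: one key per attribute per record.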
Because accessing files directly through an operating system is difficult, a
higher access layer, the database, was introduced. In most cases, databases are
stored horizontally, which suits data retrieval but not data mining. A vertical
database, on the other hand, can serve both data retrieval and data mining.
In vertical databases, data is stored vertically and processed horizontally
through fast, multi-operand logical operations such as AND, OR, XOR, and
complement. The predicate tree (P-tree) is one lossless vertical structure that
meets this requirement. P-trees can represent both numerical and categorical data
and have been used successfully in OLAP operations (Wang et al, 2003) and in
various data mining applications, including classification (Khan et al, 2002),
clustering (Denton et al, 2002), and association rule mining (Ding et al, 2002).
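To make "stored vertically, processed horizontally" concrete, the sketch below applies AND and complement to plain, uncompressed bit vectors in order to count the records holding a given value. It does not build a P-tree; the helper names `bit_and` and `complement` and the sample data are assumptions chosen for illustration.

```python
def bit_and(*vectors):
    """Multi-operand AND across bit vectors of equal length."""
    return [int(all(bits)) for bits in zip(*vectors)]


def complement(vector):
    """Flip every bit of a bit vector."""
    return [1 - bit for bit in vector]


# Hypothetical bit vectors for one 3-bit attribute over four records
# holding the values 5, 7, 2, 5 (one vector per bit position).
high = [1, 1, 0, 1]  # most significant bit of each value
mid = [0, 1, 1, 0]
low = [1, 1, 0, 1]   # least significant bit of each value

# Records whose value is 5 (binary 101): high AND (NOT mid) AND low.
mask = bit_and(high, complement(mid), low)
count = sum(mask)
```

The query never touches a whole record: it combines one bit vector per bit position and counts the 1-bits in the result.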
A vertical database consists of a set of P-trees rather than a set of relational
tables. To convert a relational table of horizontal records to a set of vertical
P-trees, the table is first projected into columns, one for each attribute,
retaining the original record order in each. Each attribute column is then further
decomposed into separate bit vectors, one for each bit position of the values in
that attribute. Figure 1 shows a relational table with three attributes, all of
which are numeric. Figure 2 shows the decomposition process from the relational
table R to a set of bit vectors.
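The two-step conversion described here (project into columns, then split each column into bit vectors) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, all attributes are assumed to be unsigned integers of the same bit width, and the sample table is made up rather than taken from Figure 1.

```python
def to_bit_vectors(column, width):
    """Split one attribute column into `width` bit vectors, most significant bit first."""
    return [
        [(value >> shift) & 1 for value in column]
        for shift in range(width - 1, -1, -1)
    ]


def decompose(table, width):
    """Project a table of records into columns, then into bit vectors per column."""
    columns = list(zip(*table))  # one column per attribute, record order preserved
    return [to_bit_vectors(column, width) for column in columns]


def reconstruct_column(bit_vectors):
    """Rebuild the attribute values from their bit vectors (the split is lossless)."""
    values = [0] * len(bit_vectors[0])
    for vector in bit_vectors:  # most significant bit first
        values = [(value << 1) | bit for value, bit in zip(values, vector)]
    return values


# Hypothetical table with three records and two 3-bit attributes.
table = [(5, 2), (7, 1), (2, 4)]
vectors = decompose(table, width=3)
first_column = reconstruct_column(vectors[0])
```

Because record order is retained in every bit vector, position i in each vector always refers to the same record, which is what allows the original values to be rebuilt.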
R (A1, A2, A3)
A2
5
2
7
7
2
4
3
1
2
3
2
2
5
7
2
3
7
2
2
5
5
1
1
4
Figure 1. A relational table R.