Download 04_VDB_encyc_cpt - NDSU Computer Science

A SYSTEM PROTOTYPE
As a proof of concept, a prototype system has been developed and tested
successfully for scalable data mining on top of the vertical database concept.
A multi-layered software framework approach was taken to design the
prototype. The system is formally named DataMIME™ (Serazi et al., 2004).
The layers of the system include the Data Mining Interface (DMI), the Data
Capture and Data Integration Interface (DCI/DII), the Data Mining Algorithm
(DMA) layer, and the Distributed P-tree Management Interface (DPMI). DMI
performs counting, the most important data mining operation that P-trees
provide, on basic P-trees, value P-trees, tuple P-trees, interval P-trees, and
cube P-trees. DMI also provides the P-tree algebra, which has four operations,
AND, OR, NOT (complement), and XOR, to implement pointwise logical operations
on P-trees for the DMA layer. DCI/DII allows users to capture data and
integrate it into the format the system requires (P-tree format). The DPMI
layer provides access, location, and concurrency transparency by hiding the
facts that data representations may differ, that resource access protocols may
vary, that resources may be located in different places, and that resources may
be shared by several competing users. The DMA layer contains a collection of
data mining tools, e.g., P-KNN (Khan et al., 2002), PINE (Perrizo et al., 2003),
P-BAYESIAN (Perera et al., 2002), P-SVM (Pan et al., 2004), and P-ARM (Ding et
al., 2002). Besides these core layers, the system provides a graphical user
interface that adds flexible user interaction with the system.
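
The four algebra operations and the counting step can be illustrated with a
minimal sketch. It is not the DataMIME API: the class name BitVector and its
methods are hypothetical, and the bits are held in a flat, uncompressed Python
integer rather than in a compressed P-tree. It only shows what pointwise AND,
OR, NOT, XOR, and counting compute on per-tuple bit vectors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitVector:
    """Uncompressed stand-in for a P-tree (hypothetical helper, not DataMIME)."""
    bits: int      # bit i is 1 when tuple i satisfies the predicate
    length: int    # number of tuples represented

    @classmethod
    def from_list(cls, flags):
        value = 0
        for i, flag in enumerate(flags):
            if flag:
                value |= 1 << i
        return cls(value, len(flags))

    def __and__(self, other):
        return BitVector(self.bits & other.bits, self.length)

    def __or__(self, other):
        return BitVector(self.bits | other.bits, self.length)

    def __xor__(self, other):
        return BitVector(self.bits ^ other.bits, self.length)

    def __invert__(self):
        # NOT (complement), restricted to the tuples actually represented
        mask = (1 << self.length) - 1
        return BitVector(~self.bits & mask, self.length)

    def count(self):
        """Number of 1 bits, i.e. how many tuples satisfy the predicate."""
        return bin(self.bits).count("1")


# Made-up example: two predicates evaluated over eight tuples.
a = BitVector.from_list([1, 1, 0, 0, 1, 0, 1, 1])
b = BitVector.from_list([1, 0, 0, 1, 1, 1, 0, 1])

print((a & b).count(), (a | b).count(), (a ^ b).count(), (~a).count())
```

Each operation touches whole machine words at a time instead of individual
rows, which is the property the DMA layer relies on.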
To understand how the vertical database concept affects the system, some key
concepts must be grasped. Unlike in a traditional database, data is not stored
in a horizontal, row-based format; rather, it is stored in a compressed vertical
P-tree format. The DPMI layer is responsible for storing and managing this
P-tree-based vertical data in the system. Efficient bitwise operations on
vertical data give data mining algorithms their scalability, and these
operations are provided through the DMI layer. Finally, this uniform, efficient
vertical data structure at the lowest layer can take advantage of the latest
hardware.
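
As a rough illustration of vertical storage, the sketch below splits one
integer attribute into bit slices, one bit vector per bit position, and then
derives the bit vector for a single attribute value by ANDing slices and their
complements. The function names bit_slices and value_mask and the sample column
are assumptions made for this example, and the slices are left uncompressed;
an actual P-tree would additionally compress each slice.

```python
def bit_slices(column, width):
    """Split an integer column into `width` vertical bit vectors.

    slices[j] is a Python int whose bit i equals bit j of column[i].
    """
    slices = [0] * width
    for i, value in enumerate(column):
        for j in range(width):
            if (value >> j) & 1:
                slices[j] |= 1 << i
    return slices


def value_mask(slices, value, n_rows):
    """Bit vector marking the rows whose attribute equals `value`."""
    all_rows = (1 << n_rows) - 1
    result = all_rows
    for j, s in enumerate(slices):
        # Use the slice where `value` has a 1 bit, its complement otherwise.
        result &= s if (value >> j) & 1 else (~s & all_rows)
    return result


column = [5, 3, 5, 7, 0, 5, 3, 2]        # made-up 3-bit attribute values
slices = bit_slices(column, width=3)      # vertical representation
mask = value_mask(slices, 5, len(column))
count = bin(mask).count("1")              # how many rows hold the value 5
print(count)
```

Once the slices exist, counting the rows with a given value needs only a few
bitwise operations over the slices and no pass over the original rows.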
CONCLUSION
Horizontal data structures have proven inefficient for data mining on very
large data sets because of the high cost of scanning. It is therefore important
to develop vertical data structures and algorithms to solve the scalability
issue. Various structures have been proposed, among which the P-tree is a very
promising vertical structure. This database model is not a set of indexes but a
collection of representations of the dataset itself. P-trees have shown great
performance in processing data containing large numbers of tuples because of the
fast logical AND operation without scanning (Ding et al., 2002). In general,
horizontal data organization is preferable for transactional data with intended
output as a relation, and vertical data structure is more appropriate for data
mining on very large data sets.
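
The contrast between the two organizations can be sketched on a toy relation.
The rows, attribute values, and function names below are invented for
illustration, and plain Python integers stand in for compressed P-trees. The
horizontal count scans every row for each query, while the vertical count
builds per-value bit vectors once and then answers each query with a single
AND followed by a bit count.

```python
rows = [
    ("red", "large"),
    ("blue", "small"),
    ("red", "small"),
    ("red", "large"),
    ("green", "large"),
    ("blue", "large"),
    ("red", "large"),
]


def horizontal_count(rows, color, size):
    """Row-based approach: every query scans all tuples."""
    return sum(1 for c, s in rows if c == color and s == size)


def build_value_vectors(rows, column_index):
    """One bit vector per distinct value; bit i is set when row i has it."""
    vectors = {}
    for i, row in enumerate(rows):
        key = row[column_index]
        vectors[key] = vectors.get(key, 0) | (1 << i)
    return vectors


# Built once, then reused by every query.
color_vectors = build_value_vectors(rows, 0)
size_vectors = build_value_vectors(rows, 1)


def vertical_count(color, size):
    """Vertical approach: AND two bit vectors, then count the 1 bits."""
    both = color_vectors.get(color, 0) & size_vectors.get(size, 0)
    return bin(both).count("1")


print(horizontal_count(rows, "red", "large"), vertical_count("red", "large"))
```

Both functions return the same counts; they differ in how much data each
query has to touch.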