Download Maintenance.of..................... Association.Rules.Using. Pre

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Oracle Database wikipedia , lookup

IMDb wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Commitment ordering wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Serializability wikipedia , lookup

Functional Database Model wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Relational model wikipedia , lookup

Database wikipedia , lookup

Clusterpoint wikipedia , lookup

Database model wikipedia , lookup

Concurrency control wikipedia , lookup

ContactPoint wikipedia , lookup

Transcript
 Hong & Wang
Chapter.III
Maintenance.of.....................
Association.Rules.Using.
Pre-Large.Itemsets
Tzung-Pei Hong, National University of Kaohsiung, Taiwan
Ching-Yao Wang, National Chiao-Tung University, Taiwan
Abstract
Developing an efficient mining algorithm that can incrementally maintain discovered
information as a database grows is quite important in the field of data mining. In the
past, we proposed an incremental mining algorithm for maintenance of association
rules as new transactions were inserted. Deletion of records in databases is, however, commonly seen in real-world applications. In this chapter, we first review the
maintenance of association rules from data insertion and then attempt to extend it
to solve the data deletion issue. The concept of pre-large itemsets is used to reduce
the need for rescanning the original database and to save maintenance costs. A
novel algorithm is proposed to maintain discovered association rules for deletion of
records. The proposed algorithm doesn’t need to rescan the original database until
a number of records have been deleted. If the database is large, then the number
of deleted records allowed will be large too. Therefore, as the database grows, our
proposed approach becomes increasingly efficient. This characteristic is especially
useful for real-world applications.
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.
Maintenance of Association Rules Using Pre-Large Itemsets
Introduction
Due to the increasing use of very large databases and data warehouses, mining useful
information and helpful knowledge from transactions is evolving into an important
research area. In the past, researchers usually assumed databases were static to
simplify data-mining problems. Thus, most proposed algorithms focus on batch
mining (Agrawal, Imielinksi, & Swami, 1993; Agrawal & Srikant, 1994; Agrawal &
Srikant, 1995; Agrawal, Srikant, & Vu, 1997; Han & Fu, 1995; Mannila, Toivonen,
& Verkamo, 1994; Park, Chen, & Yu, 1997; Srikant & Agrawal, 1995; Srikant &
Agrawal, 1996) and do not utilize previously mined patterns for later maintenance.
This may require considerable computation time to obtain the updated set of association rules or patterns (Cheung, Han, Ng, & Wong, 1996).
Researchers have recently developed efficient mining algorithms for maintaining
association rules and avoiding the above-mentioned shortcomings when new transactions are inserted. Examples include the FUP algorithm (Cheung et al., 1996),
the adaptive algorithm (Sarda & Srinivas, 1998), and the incremental updating
technique based on the concept of negative border (Feldman, Aumann, Amir, &
Mannila, 1997; Thomas, Bodagala, Alsabti, & Ranka, 1997). The common idea
among these approaches is that previously mined patterns are stored in advance for
later use. When new transactions are inserted, a large part of the final results can
be obtained by comparing the patterns mined from the newly inserted transactions
with the pre-stored mined knowledge. Only a small portion of the patterns need to
be re-processed against the entire database, thus saving much computation time.
Among these approaches, the FUP algorithms (Cheung et al., 1996) store the previously mined large itemsets for later maintenance. Some approaches utilize negative
borders (Feldman et al., 1997; Thomas et al., 1997) to enlarge the amount of prestored mined information, thus improving maintenance performance at the expense
of storage space. Furthermore, we proposed an incremental mining algorithm based
on the concept of pre-large itemsets for data insertion (Hong, Wang, & Tao, 2001).
The concept of pre-large itemsets is denoted as the set of itemsets having support
between a lower support threshold, which is smaller than the given minimum support, and an upper support threshold, which is equal to the given minimum support.
Therefore, using the pre-large itemsets to enlarge the amount of pre-stored mined
information can avoid rescanning the original database until the accumulative amount
of new transactions exceeds the safety bound at the expense of storage spaces. This
is because they act as a buffer to avoid the movements of itemset directly from small
to large and vice-versa during the incremental mining process.
In addition to record insertion, record deletion is also commonly seen in real-world
applications. For example, the records which were generated some years ago may
be moved to a magnetic tape. Developing efficient maintenance algorithms for
deletion of records is practical and necessary. In this chapter, we first review the
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission
of Idea Group Inc. is prohibited.
15 more pages are available in the full version of this document,
which may be purchased using the "Add to Cart" button on the
publisher's webpage:
www.igi-global.com/chapter/maintenance-association-rulesusing-pre/24229
Related Content
Spatial Network Databases
Michael Vassilakopoulos (2009). Handbook of Research on Innovations in Database
Technologies and Applications: Current and Future Trends (pp. 307-315).
www.irma-international.org/chapter/spatial-network-databases/20715/
Data Quality Assessment
Juliusz L. Kulikowski (2009). Handbook of Research on Innovations in Database
Technologies and Applications: Current and Future Trends (pp. 378-384).
www.irma-international.org/chapter/data-quality-assessment/20722/
Towards Autonomic Workload Management in DBMSs
Baoning Niu, Patrick Martin and Wendy Powley (2011). Theoretical and Practical
Advances in Information Systems Development: Emerging Trends and Approaches (pp.
154-173).
www.irma-international.org/chapter/towards-autonomic-workload-managementdbmss/52956/
Changing the Face of War through Telemedicine and Mobile E-Commerce
James A. Rodger, Parag C. Pendharkar and Mehdi Khosrow-Pour (2002). Advanced
Topics in Database Research, Volume 1 (pp. 267-292).
www.irma-international.org/chapter/changing-face-war-throughtelemedicine/4332/
Ensuring Customised Transactional Reliability of Composite Services
Sami Bhiri, Walid Gaaloul, Claude Godart, Olivier Perrin, Maciej Zaremba and Wassim
Derguech (2011). Journal of Database Management (pp. 64-92).
www.irma-international.org/article/ensuring-customised-transactional-reliabilitycomposite/52993/