Hardware-Enhanced Association Rule
Mining with Hashing and Pipelining
Abstract—Generally speaking, to implement Apriori-based association rule mining in
hardware, one has to load candidate itemsets and a database into the hardware. Since the
capacity of the hardware architecture is fixed, if the number of candidate itemsets or the
number of items in the database is larger than the hardware capacity, the items must be
loaded into the hardware in separate batches. The time complexity of the steps that load
candidate itemsets or database items into the hardware is proportional to the number of
candidate itemsets multiplied by the number of items in the database. A large number of
candidate itemsets combined with a large database therefore creates a performance
bottleneck. In this paper, we propose a HAsh-based and PiPelIned (abbreviated as HAPPI)
architecture for hardware-enhanced association rule mining. We apply the pipeline
methodology in the HAPPI architecture to compare itemsets with the database while
simultaneously collecting information that reduces both the number of candidate itemsets
and the number of items in the database. When the database is fed into the hardware,
candidate itemsets are compared with the items in the database to find frequent itemsets.
At the same time, trimming information is collected from each transaction. In addition,
itemsets are generated from transactions and hashed into a hash table. The trimming
information and the hash table enable us to reduce the number of items in the database and
the number of candidate itemsets. Therefore, we can effectively reduce the number of times
the database must be loaded into the hardware.
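The single pass described above can be modeled in software as follows. This is a minimal Python sketch, not the HAPPI hardware itself: three steps inside one loop stand in for the comparison, trimming, and hashing stages of the pipeline. The trimming rule (keep an item only if it occurs in at least k candidate k-itemsets matched in the same transaction), the hypothetical `bucket_of` hash function, and the bucket count are assumptions borrowed from the DHP (direct hashing and pruning) approach, not details given in this abstract. The trimming rule is safe only under the further assumption that every frequent k-itemset is among the candidates.

```python
from collections import Counter
from itertools import combinations

Itemset = tuple[int, ...]


def bucket_of(itemset: Itemset, num_buckets: int) -> int:
    """Hypothetical hash function: a polynomial hash over sorted item IDs."""
    h = 0
    for item in itemset:
        h = (h * 31 + item) % num_buckets
    return h


def one_pass(database: list[Itemset], candidates: list[Itemset], k: int,
             num_buckets: int) -> tuple[Counter, list[Itemset], list[int]]:
    """One database pass for candidate k-itemsets.

    Returns the support count of each candidate, the trimmed database,
    and hash-table bucket counts for (k+1)-itemsets.
    """
    support: Counter = Counter()
    buckets = [0] * num_buckets
    trimmed_db: list[Itemset] = []
    for transaction in database:
        items = set(transaction)
        occurrences: Counter = Counter()  # trimming information for this transaction
        # Comparison stage: test every candidate against the transaction.
        for cand in candidates:
            if items.issuperset(cand):
                support[cand] += 1
                occurrences.update(cand)  # each item of a matched candidate +1
        # Trimming (assumed DHP-style rule): each item of a frequent (k+1)-itemset
        # occurs in at least k matched candidate k-itemsets, so drop the others.
        kept = tuple(sorted(i for i in items if occurrences[i] >= k))
        if len(kept) < k + 1:
            continue  # too short to contain any (k+1)-itemset; drop the transaction
        trimmed_db.append(kept)
        # Hashing stage: every (k+1)-subset of the trimmed transaction
        # increments the counter of its bucket.
        for sub in combinations(kept, k + 1):
            buckets[bucket_of(sub, num_buckets)] += 1
    return support, trimmed_db, buckets


# Usage with a toy database and hypothetical candidate 2-itemsets.
D = [(1, 2, 3), (1, 2, 4), (1, 3, 4, 5), (2, 3, 4)]
C2 = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
support, trimmed, buckets = one_pass(D, C2, k=2, num_buckets=7)
print(trimmed)  # [(1, 2, 3), (2, 3, 4)]
```

In this toy run, two of the four transactions are dropped entirely and item 5 disappears, so the next pass streams less data and hashes fewer (k+1)-subsets. This illustrates how trimming can reduce what must be loaded again.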
INTRODUCTION
Data mining technology is now used in a wide variety of fields. Applications include the
analysis of customer transaction records, web site logs, credit card purchase information,
and call records, to name a few. Data mining results can provide business managers and
researchers with useful information, such as customer behavior. One of the most important
data mining applications is association rule mining [11], which can be described as
follows: Let $I = \{i_1, i_2, \ldots, i_n\}$ denote a set of items; let $D$ denote a set of
database transactions, where each transaction $T$ is a set of items such that $T \subseteq I$;
and let $X$ denote a set of items, called an itemset.
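To make these definitions concrete, the following minimal Python sketch represents each transaction as a frozenset and counts how many transactions in D contain an itemset X (its support). The notion of support and the `min_support` threshold are the standard ones from the association rule literature rather than definitions stated in this section, and the function names and toy data are hypothetical.

```python
Transaction = frozenset[str]


def support(itemset: set[str], database: list[Transaction]) -> int:
    """Count the transactions T in D that contain every item of X (X ⊆ T)."""
    return sum(1 for t in database if itemset <= t)


def is_frequent(itemset: set[str], database: list[Transaction],
                min_support: int) -> bool:
    """X is frequent if its support reaches the (hypothetical) threshold."""
    return support(itemset, database) >= min_support


I = {"bread", "milk", "eggs", "beer"}
D = [frozenset({"bread", "milk"}),
     frozenset({"milk", "eggs"}),
     frozenset({"bread", "milk", "eggs"}),
     frozenset({"beer"})]
assert all(t <= I for t in D)  # every transaction satisfies T ⊆ I
print(support({"bread", "milk"}, D))  # 2
```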
[Figure: Architecture diagram]
www.frontlinetechnologies.org
[email protected]
+91 7200247247
CONCLUSION
In this work, we have proposed the HAPPI architecture for hardware-enhanced association
rule mining. The bottleneck of Apriori-based hardware schemes stems from the number of
candidate itemsets and the size of the database. To solve this problem, we apply the
pipeline methodology in the HAPPI architecture to compare itemsets with the database while
simultaneously collecting information that reduces both the number of candidate itemsets
and the number of items in the database. HAPPI prunes infrequent items from the
transactions and gradually reduces the size of the database by means of the trimming
filter. In addition, HAPPI effectively eliminates infrequent candidate itemsets with the
help of the hash table filter.
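The hash table filter can be illustrated with a small Python sketch. It assumes, as in DHP-style hashing, that the table holds one counter per bucket and that every occurrence of a (k+1)-itemset increments its bucket, so a bucket count below the minimum support shows that every candidate hashed there is infrequent. Candidate generation uses the standard Apriori join-and-prune step; `bucket_of`, `apriori_gen`, `hash_filter`, and the bucket counts are hypothetical.

```python
from itertools import combinations

Itemset = tuple[int, ...]


def bucket_of(itemset: Itemset, num_buckets: int) -> int:
    """Hypothetical hash function: a polynomial hash over sorted item IDs."""
    h = 0
    for item in itemset:
        h = (h * 31 + item) % num_buckets
    return h


def apriori_gen(frequent_k: list[Itemset], k: int) -> list[Itemset]:
    """Standard Apriori join and prune: build (k+1)-candidates from sorted
    frequent k-itemsets that share their first k-1 items."""
    freq = set(frequent_k)
    ordered = sorted(freq)
    candidates = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a[:-1] != b[:-1]:
                break  # sorted order: no later itemset shares a's prefix
            cand = a + (b[-1],)
            # Prune: every k-subset of a candidate must itself be frequent.
            if all(sub in freq for sub in combinations(cand, k)):
                candidates.append(cand)
    return candidates


def hash_filter(candidates: list[Itemset], buckets: list[int],
                min_support: int) -> list[Itemset]:
    """Keep only candidates whose bucket count reaches min_support; under the
    assumed counting scheme a bucket count bounds the support of its itemsets."""
    n = len(buckets)
    return [c for c in candidates if buckets[bucket_of(c, n)] >= min_support]


frequent_2 = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
C3 = apriori_gen(frequent_2, k=2)   # [(1, 2, 3), (2, 3, 4)]
buckets = [0, 0, 0, 1, 3, 0, 0]     # hypothetical bucket counts
print(hash_filter(C3, buckets, min_support=2))  # [(1, 2, 3)]
```

Hash collisions can only keep an infrequent candidate alive, never discard a frequent one, so the surviving candidates still need an exact support count in the next pass.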
REFERENCES
1. R. Agarwal, C. Aggarwal, and V. Prasad, "A Tree Projection Algorithm for Generation of
Frequent Itemsets," J. Parallel and Distributed Computing, 2000.
2. R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. 20th
Int'l Conf. Very Large Databases (VLDB), 1994.
3. Z.K. Baker and V.K. Prasanna, "Efficient Hardware Data Mining with the Apriori
Algorithm on FPGAs," Proc. 13th Ann. IEEE Symp. Field-Programmable Custom Computing
Machines (FCCM), 2005.
4. Z.K. Baker and V.K. Prasanna, "An Architecture for Efficient Hardware Data Mining Using
Reconfigurable Computing Systems," Proc. 14th Ann. IEEE Symp. Field-Programmable Custom
Computing Machines (FCCM '06), pp. 67-75, Apr. 2006.
5. C. Besemann and A. Denton, "Integration of Profile Hidden Markov Model Output into
Association Rule Mining," Proc. 11th ACM SIGKDD Int'l Conf. Knowledge Discovery in Data
Mining (KDD '05), pp. 538-543, 2005.
6. C.W. Chen, J. Luo, and K.J. Parker, "Image Segmentation via Adaptive K-Mean Clustering
and Knowledge-Based Morphological Operations with Biomedical Applications," IEEE Trans.
Image Processing, vol. 7, no. 12, pp. 1673-1683, 1998.
7. S.M. Chung and C. Luo, "Parallel Mining of Maximal Frequent Itemsets from Databases,"
Proc. 15th IEEE Int'l Conf. Tools with Artificial Intelligence (ICTAI), 2003.
8. S. Cong, J. Han, J. Hoeflinger, and D. Padua, "A Sampling-Based Framework for Parallel
Data Mining," Proc. 10th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming
(PPoPP '05), June 2005.
9. M. Estlick, M. Leeser, J. Szymanski, and J. Theiler, "Algorithmic Transformations in the
Implementation of K-Means Clustering on Reconfigurable Hardware," Proc. Ninth Ann. IEEE
Symp. Field-Programmable Custom Computing Machines (FCCM), 2001.
10. M. Gokhale, J. Frigo, K. McCabe, J. Theiler, C. Wolinski, and D. Lavenier, "Experience
with a Hybrid Processor: K-Means Clustering," J. Supercomputing, pp. 131-148, 2003.