
Hardware-Enhanced Association Rule Mining with Hashing and Pipelining

Abstract—Generally speaking, to implement Apriori-based association rule mining in hardware, one has to load candidate itemsets and a database into the hardware. Because the capacity of the hardware architecture is fixed, if the number of candidate itemsets or the number of items in the database exceeds the hardware capacity, the items must be loaded into the hardware in separate batches. The time complexity of the steps that load candidate itemsets or database items into the hardware is proportional to the number of candidate itemsets multiplied by the number of items in the database. Too many candidate itemsets and a large database therefore create a performance bottleneck. In this paper, we propose a HAsh-based and PiPelIned (abbreviated as HAPPI) architecture for hardware-enhanced association rule mining. We apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and to collect useful information for reducing the number of candidate itemsets and the number of items in the database simultaneously. When the database is fed into the hardware, candidate itemsets are compared with the items in the database to find frequent itemsets. At the same time, trimming information is collected from each transaction. In addition, itemsets are generated from transactions and hashed into a hash table. The trimming information and the hash table enable us to reduce the number of items in the database and the number of candidate itemsets. Therefore, we can effectively reduce the number of times the database must be loaded into the hardware.

INTRODUCTION

Data mining technology is now used in a wide variety of fields. Applications include the analysis of customer transaction records, web site logs, credit card purchase information, and call records, to name a few. The results of data mining can provide useful information, such as customer behavior, for business managers and researchers.
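The abstract describes three things happening in one pass as each transaction streams through the pipeline: candidate itemsets are compared with the transaction (support counting), trimming information is collected, and itemsets generated from the transaction are hashed into a hash table that later filters candidates. The Python sketch below models such a pass in software under stated assumptions; it is not the HAPPI hardware design. The trimming rule (keep an item only if it occurs in at least k of the candidate k-itemsets matched in the transaction) is borrowed from the hash-based DHP algorithm, Python's built-in `hash()` stands in for the hardware hash function, and the names `one_pass`, `next_candidates`, and `mine` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_pass(db, candidates, k, num_buckets):
    """Stream every transaction once past the candidate k-itemsets (hypothetical helper).

    Returns the candidate support counts, the trimmed database for the next
    pass, and hash-bucket counts for (k+1)-itemsets.
    """
    support = Counter()
    buckets = [0] * num_buckets
    trimmed_db = []
    for t in db:
        items = set(t)
        # Support counting: compare each candidate k-itemset with the transaction.
        matched = [c for c in candidates if items.issuperset(c)]
        support.update(matched)
        # Trimming (DHP-style rule, an assumption): an item can belong to a frequent
        # (k+1)-itemset of this transaction only if it occurs in >= k matched candidates.
        hits = Counter(i for c in matched for i in c)
        kept = tuple(sorted(i for i in items if hits[i] >= k))
        if len(kept) <= k:
            continue  # cannot contain any (k+1)-itemset, so drop the transaction
        trimmed_db.append(kept)
        # Hash table: count every (k+1)-subset of the trimmed transaction.
        # Python's hash() stands in for the hardware hash function (assumption).
        for s in combinations(kept, k + 1):
            buckets[hash(s) % num_buckets] += 1
    return support, trimmed_db, buckets


def next_candidates(frequent_k, buckets, num_buckets, min_support):
    """Apriori join and prune, followed by the hash-table filter (hypothetical helper)."""
    if not frequent_k:
        return []
    k = len(frequent_k[0])
    frequent_set = set(frequent_k)
    out = []
    for a, b in combinations(sorted(frequent_k), 2):
        if a[:-1] != b[:-1]:
            continue  # join only itemsets sharing their first k-1 items
        c = a + (b[-1],)
        if not all(s in frequent_set for s in combinations(c, k)):
            continue  # Apriori property: every k-subset must be frequent
        # Trimming never removes the items of a frequent itemset, so a frequent
        # candidate is hashed from every transaction containing it; a bucket
        # below min_support therefore cannot hold a frequent itemset.
        if buckets[hash(c) % num_buckets] >= min_support:
            out.append(c)
    return out


def mine(transactions, min_support, num_buckets=101):
    """Return {itemset: support} for all frequent itemsets."""
    db = [tuple(sorted(set(t))) for t in transactions]
    candidates = sorted({(i,) for t in db for i in t})
    frequent, k = {}, 1
    while candidates:
        support, db, buckets = one_pass(db, candidates, k, num_buckets)
        level = [c for c in candidates if support[c] >= min_support]
        frequent.update((c, support[c]) for c in level)
        candidates = next_candidates(level, buckets, num_buckets, min_support)
        k += 1
    return frequent


if __name__ == "__main__":
    db = [(1, 3, 4), (2, 3, 5), (1, 2, 3, 5), (2, 5)]
    print(mine(db, min_support=2))
```

On the four-transaction example, the largest frequent itemset found is (2, 3, 5) with support 2. Setting `num_buckets=1` sends every (k+1)-itemset to the same bucket, so the hash filter prunes nothing, but the result is unchanged: collisions can only overestimate a bucket count, so the filter saves work without ever discarding a frequent itemset.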
One of the most important data mining applications is association rule mining [11], which can be described as follows: Let $I = \{i_1, i_2, \ldots, i_n\}$ denote a set of items; let $D$ denote a set of database transactions, where each transaction $T$ is a set of items such that $T \subseteq I$; and let $X$ denote a set of items, called an itemset.

[Figure: Architecture Diagram]

www.frontlinetechnologies.org +91 7200247247

CONCLUSION

In this work, we have proposed the HAPPI architecture for hardware-enhanced association rule mining. The bottleneck of Apriori-based hardware schemes is related to the number of candidate itemsets and the size of the database. To solve this problem, we apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and collect useful information to reduce the number of candidate itemsets and items in the database simultaneously. HAPPI can prune infrequent items in the transactions and gradually reduce the size of the database by utilizing the trimming filter. In addition, HAPPI can effectively eliminate infrequent candidate itemsets with the help of the hash table filter.

REFERENCES

1. R. Agarwal, C. Aggarwal, and V. Prasad, "A Tree Projection Algorithm for Generation of Frequent Itemsets," J. Parallel and Distributed Computing, 2000.
2. R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. 20th Int'l Conf. Very Large Data Bases (VLDB), 1994.
3. Z.K. Baker and V.K. Prasanna, "Efficient Hardware Data Mining with the Apriori Algorithm on FPGAs," Proc. 13th Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM), 2005.
4. Z.K. Baker and V.K. Prasanna, "An Architecture for Efficient Hardware Data Mining Using Reconfigurable Computing Systems," Proc. 14th Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM '06), pp. 67-75, Apr. 2006.
5. C. Besemann and A. Denton, "Integration of Profile Hidden Markov Model Output into Association Rule Mining," Proc. 11th ACM SIGKDD Int'l Conf. Knowledge Discovery in Data Mining (KDD '05), pp. 538-543, 2005.
6. C.W. Chen, J. Luo, and K.J. Parker, "Image Segmentation via Adaptive K-Mean Clustering and Knowledge-Based Morphological Operations with Biomedical Applications," IEEE Trans. Image Processing, vol. 7, no. 12, pp. 1673-1683, 1998.
7. S.M. Chung and C. Luo, "Parallel Mining of Maximal Frequent Itemsets from Databases," Proc. 15th IEEE Int'l Conf. Tools with Artificial Intelligence (ICTAI), 2003.
8. S. Cong, J. Han, J. Hoeflinger, and D. Padua, "A Sampling-Based Framework for Parallel Data Mining," Proc. 10th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP '05), June 2005.
9. M. Estlick, M. Leeser, J. Szymanski, and J. Theiler, "Algorithmic Transformations in the Implementation of K-Means Clustering on Reconfigurable Hardware," Proc. Ninth Ann. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM), 2001.
10. M. Gokhale, J. Frigo, K. McCabe, J. Theiler, C. Wolinski, and D. Lavenier, "Experience with a Hybrid Processor: K-Means Clustering," J. Supercomputing, pp. 131-148, 2003.
