Faculty of Computers and Information
Information Systems Department

"Association Rules Discovery in Databases Using Associative Memories"

Thesis Submitted in Partial Fulfillment of the Requirements for the Master's Degree in Computers and Information
Information Systems Department

Submitted by
Hebatallah Mohamed Nabil Yasein

Supervised by
Professor Dr. Ahmed Sharaf Eldin
Vice Dean of the Faculty of Computers and Information, Helwan University

Dr. M. Abd El-Fattah Belal
Assistant Prof., Computer Science Department, Faculty of Computers and Information, Helwan University

Dr. Sayed Abd El-Gaber
Lecturer Prof., Information Systems Department, Faculty of Computers and Information, Helwan University

2006

"Say: Indeed, my prayer, my rites, my living and my dying are for Allah, Lord of the worlds." Allah the Almighty has spoken the truth.

Dedicated, first of all, to ALLAH, and to my dear parents and husband.

Acknowledgments

First and foremost, I thank ALLAH (SWT) for granting me the patience and stamina to complete my work.

After ALLAH, all my thanks go to my beloved parents, who showed me the right path and gave me constant support and encouragement; I hope that some day I can repay them for everything they did for me.

I would like to express my deepest gratitude to Prof. Ahmed Sharaf Eldin, who supported me and guided me sincerely as a teacher and a father. I would also like to express my deepest and utmost gratitude to Assistant Prof. Mohamed Belal for suggesting the research topic, for his sincere help, and for encouraging me throughout my research. I would also like to thank Dr. Sayed Abd El-Gaber for his help and cooperation.

My deepest appreciation goes to my dear dean at Shrouk Academy, Prof. Ahmed Gabr, for his kind help and support as a true father. I also want to thank the members of my extended family for their encouragement, prayers, and love. To all my helpful friends, thank you very much for your cooperation.

Last but not least, I would like to express my special thanks to my loving husband, Mohamed, for his encouragement, help, and support.
Hebatallah Nabil

Contents

ABSTRACT
List of Figures and Tables

Chapter 1: Introduction
  1.1 Preface
  1.2 Problem Statement
  1.3 Research Objectives
  1.4 Research Techniques
  1.5 Thesis Outline

Chapter 2: Theoretical Background
  2.1 Concepts and Definitions
    2.1.1 Knowledge Discovery in Database (KDD)
    2.1.2 Data Mining (DM)
    2.1.3 Association Rules (AR)
    2.1.4 Training an Artificial Neural Network
  2.2 Association Rules Discovery Algorithms
    2.2.1 Apriori
    2.2.2 Apriori-TID
    2.2.3 Apriori-Hybrid
    2.2.4 Sampling
    2.2.5 Partition
    2.2.6 Dynamic Itemset Counting algorithm (DIC)
    2.2.7 Direct Hashing and Pruning algorithm (DHP)
    2.2.8 Frequent Pattern Growth algorithm (FP-Growth)
    2.2.9 Ready and Go algorithm (R&G)
    2.2.10 Dynamic and Direct Support algorithm (DDS)
  2.3 Problem Investigation

Chapter 3: The Proposed Model: Dynamic Support and Update (DSU)
  3.1 Data Structure in DSU
  3.2 Dynamic Support and Update (DSU)
  3.3 DSU Notation
  3.4 Outline of the DSU
  3.5 DSU Algorithm

Chapter 4: Simulation and Results
  4.1 DSU Tracing Example
  4.2 Numerical Experiments
    4.2.1 Design of the Experiment
    4.2.2 Experiment Output
  4.3 Results and Analysis
Chapter 5: Conclusions and Future Work
  5.1 Conclusions
  5.2 Recommendations for Future Work

References

Appendices
  Appendix I: Flowcharts
  Appendix II: Script

ABSTRACT

In the early nineties, the problem of association rule discovery arose, and since then many researchers have worked to improve existing techniques and to create new ones. Most algorithms for this problem depend on repeatedly scanning the entire database (DB) to count the support of the itemsets. This research seeks an optimal method for extracting association rules without rescanning the whole database for each new transaction. The method is also intended to be flexible enough to accept any change in the minimum support threshold without rescanning the entire DB or affecting the efficiency of the output. To this end, we introduce a new algorithm for dynamic association rule mining based on an Associative Memory Neural Network (AMNN) model.

List of Figures and Tables

Figure 2.1: Finding frequent itemsets in Apriori example
Figure 2.2: DIC Lattice
Figure 2.3: DIC scans 1.7 passes
Figure 2.4: Apriori scans 7 passes
Figure 2.5: The conditional FP-tree associated with the conditional node "C"
Figure 2.6: The FP-tree for all transactions
Figure 2.7: Transaction representation in DDS
Figure 3.1: DSU Block Diagram
Figure 3.2: DSU general process architecture for a new transaction
Figure 3.3: DSU General Architecture
Figure 3.4(a): DSU General Flowchart
Figure 3.4(b): Generate β(L) Flowchart
Figure 3.4(c): Generate N(L) Flowchart
Figure 3.4(d): Update Rules Flowchart
Figure 3.5(a): General module
Figure 3.5(b): Module of generating β at level L of association
Figure 3.5(c): Module of calculating the counting matrix N at level L of association
Figure 3.5(d): Module of updating extracted association rules
Figure 3.5(e): Computational module for hyper weight matrix
Figure 4.1: The relationship between the number of transactions and the consumed time
Figure 4.2: The relationship between the value of the support threshold and the consumed time
Figure 4.3: Apriori performance due to the change in the Sth value
Table 2.1: C'k structure in Apriori-TID algorithm
Table 2.2: Conditional FP-tree paths

References

[1] Sleem M. A. "Beyond Static Association Rules: A proposed Method for Dynamic Association Rules". M.Sc., University of Nottingham, School of Computer Science and Information Technology, Sept. 2002.
[2] Han J. and Kamber M. "Data Mining: Concepts and Techniques". New York: Morgan Kaufmann, 2001.
[3] Baixeries J., Casas G. and Balcazar J. L. "Frequent Sets, sequences and taxonomies: new, efficient algorithmic proposals". ALCOM Project (European Union ESPRIT LRT Project 20244), December 2000.
[4] Loo K. K., Yip C. L., Kao B. and Cheung D. W. "Exploiting the Duality of Maximal Frequent Itemsets and Minimal Infrequent Itemsets for I/O Efficient Association Rule Mining". Proceedings of the 11th International Conference on Database and Expert Systems Applications (DEXA'2000), Greenwich, September 2000.
[5] Gunopulos D., Khardon R., Mannila H., Saluja S., Toivonen H. and Sharma R. S. "Discovering All Most Specific Sentences". ACM Transactions on Database Systems, Vol. 28, No. 2, pp. 140–174, June 2003.
[6] Zheng Z., Kohavi R. and Mason L. "Real World Performance of Association Rule Algorithms". A short version of this paper is published in Proceedings of the Seventh ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY: ACM, 2001.
[7] Lin D. and Kedem Z. M. "Pincer-search: A new algorithm for discovering the maximum frequent set". In Proceedings of the 6th Conference on Extending Database Technology (EDBT), Valencia, Spain, March 1998.
[8] Agrawal R. and Srikant R. "Fast Algorithms for Mining Association Rules".
IBM Research Report RJ 9839, IBM Almaden Research Center. In Proc. 20th VLDB, Sept. 1994.
[9] Zaki M. J., Parthasarathy S., Ogihara M. and Li W. "New Algorithms for Fast Discovery of Association Rules". Technical Report 651, July 1997.
[10] Webb G. I. "Efficient search for association rules". KDD-2000, Boston, MA, August 2000.
[11] Hipp J., Guntzer U. and Nakhaeizadeh G. "Algorithms for Association Rule Mining: A General Survey and Comparison". SIGKDD Explorations, Vol. 2, Issue 1, pp. 63, Jul. 2000.
[12] Agrawal R., Imielinski T. and Swami A. "Mining association rules between sets of items in large databases". In Proc. of ACM SIGMOD Conf. on Management of Data, Washington, D.C., pp. 207–216, May 1993.
[13] Houtsma M. and Swami A. "Set-oriented mining of association rules". Research Report RJ 9567, IBM Almaden Research Center, San Jose, California, Oct. 1993.
[14] Park J. S., Chen M. S. and Yu P. S. "An Effective Hash-Based Algorithm for Mining Association Rules". Proceedings of the ACM SIGMOD International Conference on Management of Data, 1995.
[15] Brin S., Motwani R., Ullman J. and Tsur S. "Dynamic itemset counting and implication rules for market basket data". Int. Conf. Management of Data, pp. 255–264, ACM Press, 1997.
[16] Han J., Pei J. and Yin Y. "Mining Frequent Patterns without Candidate Generation". Proc. ACM-SIGMOD, Dallas, TX, May 2000.
[17] Park J. S., Chen M. S. and Yu P. S. "Efficient Data Mining for Path Traversal Patterns". IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No. 2, March/April 1998.
[18] Tveter D. "The Pattern Recognition Basis of Artificial Intelligence", 1998.
[19] Stergiou C. and Siganos D. "Neural Networks". http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html
[20] Wikipedia, version 1.2, November 2002. http://wikikpedia.org/wiki/Neural_Network