Faculty of Computers and Information
Information Systems Department
“Association Rules Discovery in Databases Using
Associative Memories”
Thesis Submitted in Partial Fulfillment of the Requirements for the Master's
Degree in Computers and Information
Information Systems Department
Submitted by
Hebatallah Mohamed Nabil Yasein
Supervised by
Professor Dr. Ahmed Sharaf Eldin
Vice Dean of the Faculty of Computers and Information, Helwan University
Dr. M. Abd El-Fattah Belal
Assistant Prof., Computer Science Department, Faculty of Computers and Information,
Helwan University
Dr. Sayed Abd El-Gaber
Lecturer Prof., Information Systems Department, Faculty of Computers and Information,
Helwan University
2006
"Say: Indeed, my prayer, my rites of sacrifice, my living, and my dying
are for Allah, Lord of the worlds."
Allah the Almighty has spoken the truth.
Dedicated to:
My dear parents, husband, and first of all
to ALLAH
Acknowledgments
First and foremost, I thank ALLAH (SWT) for granting me the
patience and stamina to complete my work. After ALLAH, all
my thanks go to my beloved parents, who showed me the right path
and gave me constant support and encouragement; I hope someday I
can repay them for everything they did for me.
I would like to express my deepest gratitude to Prof. Ahmed
Sharaf ElDin, who supported me and sincerely guided me as a
teacher and a father.
I would also like to express my deepest and utmost gratitude
to Ass. Prof. Mohamed Belal for suggesting the research topic,
for his deep and sincere help, and for encouraging me throughout
my research.
I would also like to thank Dr. Sayed Abd ElGaber for his help and
cooperation.
My deepest appreciation also goes to my dear dean at Shrouk
Academy, Prof. Ahmed Gabr, for his kind help and support as a
true father.
I also want to thank all the members of my big family for their
encouragement, prayers, and love. And to all my helpful friends,
thank you very much for your cooperation.
Last but not least, I would like to express my special
thanks to my loving husband, Mohamed, for his encouragement,
help, and support.
Hebatallah Nabil
Contents
ABSTRACT
List of Figures and Tables
Chapter 1: Introduction
1.1 Preface
1.2 Problem Statement
1.3 Research Objectives
1.4 Research Techniques
1.5 Thesis Outline
Chapter 2: Theoretical Background
2.1 Concepts and Definitions
2.1.1 Knowledge Discovery in Databases (KDD)
2.1.2 Data Mining (DM)
2.1.3 Association Rules (AR)
2.1.4 Training an Artificial Neural Network
2.2 Association Rules Discovery Algorithms
2.2.1 Apriori
2.2.2 Apriori-TID
2.2.3 Apriori-Hybrid
2.2.4 Sampling
2.2.5 Partition
2.2.6 Dynamic Itemset Counting algorithm (DIC)
2.2.7 Direct Hashing and Pruning algorithm (DHP)
2.2.8 Frequent Pattern Growth algorithm (FP-Growth)
2.2.9 Ready and Go algorithm (R&G)
2.2.10 Dynamic and Direct Support algorithm (DDS)
2.3 Problem Investigation
Chapter 3: The Proposed Model: Dynamic Support and Update (DSU)
3.1 Data Structure in DSU
3.2 Dynamic Support and Update (DSU)
3.3 DSU Notation
3.4 Outline of the DSU
3.5 DSU Algorithm
Chapter 4: Simulation and Results
4.1 DSU Tracing Example
4.2 Numerical Experiments
4.2.1 Design of the experiment
4.2.2 Experiment Output
4.3 Results and Analysis
Chapter 5: Conclusions and Future Work
5.1 Conclusions
5.2 Recommendations for Future Work
References
Appendices
Appendix I: Flowcharts
Appendix II: Script
ABSTRACT
In the early nineties, the problem of association rule discovery arose,
and since then many researchers have worked to improve existing
techniques and create new ones. Most algorithms for this problem
depend on repeatedly scanning the entire database (DB) to count the
support of the itemsets. This research seeks an optimal method for
extracting association rules without rescanning the whole database for
each new transaction. The method is also intended to be flexible
enough to accept any change in the minimum support threshold without
rescanning the entire DB or affecting the efficiency of the output.
To this end, we introduce a new algorithm for dynamic association
rule mining based on an Associative Memory Neural Network (AMNN) model.
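The goal stated in the abstract can be illustrated with a minimal Python sketch. It is not the DSU algorithm and does not use an associative memory; it only shows the general idea of keeping itemset counts up to date as each transaction arrives, so that a new transaction or a new minimum support threshold needs no rescan of earlier data. The class name, the `max_size` cap on itemset size, and the sample baskets are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


class IncrementalSupportCounter:
    """Illustrative only: keeps itemset counts current as transactions arrive."""

    def __init__(self, max_size=2):
        self.max_size = max_size   # assumed cap on the itemset size that is tracked
        self.counts = Counter()    # frozenset(itemset) -> transactions containing it
        self.n_transactions = 0

    def add_transaction(self, items):
        """Update the counts from one new transaction; older ones are never re-read."""
        unique = sorted(set(items))
        self.n_transactions += 1
        for size in range(1, self.max_size + 1):
            for combo in combinations(unique, size):
                self.counts[frozenset(combo)] += 1

    def frequent_itemsets(self, min_support):
        """Return itemsets whose support (fraction of transactions) meets min_support."""
        if self.n_transactions == 0:
            return {}
        return {
            itemset: count / self.n_transactions
            for itemset, count in self.counts.items()
            if count / self.n_transactions >= min_support
        }


# Hypothetical market-basket data.
counter = IncrementalSupportCounter(max_size=2)
for basket in (["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"]):
    counter.add_transaction(basket)

# Changing the threshold only re-filters the stored counts.
high = counter.frequent_itemsets(min_support=0.6)
low = counter.frequent_itemsets(min_support=0.3)
```

With these three baskets, the 0.6 threshold keeps five itemsets and the 0.3 threshold keeps six, since {milk, butter} appears in only one of the three transactions. The trade-off of this naive approach is memory: the number of stored itemsets grows quickly with transaction length, which is why the sketch caps the itemset size.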
List of Figures and Tables
Figure 2.1: Finding frequent itemsets in Apriori example
Figure 2.2: DIC Lattice
Figure 2.3: DIC scans 1.7 passes
Figure 2.4: Apriori scans 7 passes
Figure 2.5: The conditional FP-tree associated with the conditional node "C"
Figure 2.6: The FP-tree for all transactions
Figure 2.7: Transaction representation in DDS
Figure 3.1: DSU Block Diagram
Figure 3.2: DSU general process architecture for a new transaction
Figure 3.3: DSU General Architecture
Figure 3.4(a): DSU General Flowchart
Figure 3.4(b): Generate β(L) Flowchart
Figure 3.4(c): Generate N(L) Flowchart
Figure 3.4(d): Update Rules Flowchart
Figure 3.5(a): General Module
Figure 3.5(b): Module of generating β at level L of association
Figure 3.5(c): Module of calculating the counting matrix N at level L of association
Figure 3.5(d): Module of updating extracted association rules
Figure 3.5(e): Computational module for hyper weight matrix
Figure 4.1: The relationship between the number of transactions and the consumed time
Figure 4.2: The relationship between the value of the support threshold and the consumed time
Figure 4.3: Apriori performance due to the change in the Sth value
Table 2.1: C'k structure in Apriori-TID algorithm
Table 2.2: Conditional FP-tree paths