ISSN 23499842(Online), Volume 1, Special Issue 2(ICITET 15), March 2015
International Journal of Innovative Trends and Emerging Technologies
A SCRUTINY OF ASSOCIATION RULE MINING
ALGORITHMS
M. MADHANKUMAR (a) AND Dr. C. SURESHGANADHAS (b)
(a) Research Scholar, Department of Computer Science and Engineering, Manonmaiam Sundaranar University, Tamil Nadu, India
(b) Professor & Head, Department of Computer Science and Engineering, Vivekanandhan College of Engineering for Women, Elayampalayam, Thiruchencode, Tamil Nadu
[email protected]
ABSTRACT - Association Rule Mining (ARM) is one of the most popular data mining techniques. This paper provides a brief review of existing association rule mining techniques. Association rule mining is normally performed in two stages, frequent itemset generation and rule generation, for which many researchers have presented efficient algorithms. Association rule mining has been extensively studied in the data mining community. The intention of this review is to present a broad classification of existing association rule mining algorithms and to motivate further research.
Keywords – Data Mining, Association Rules, ARM, Algorithms.
I. INTRODUCTION
Association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using different measures of interestingness [1]. Based on the concept of strong rules, Rakesh Agrawal et al. [2] introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets.
Following the original definition by Agrawal et al. [2], the problem of association rule mining is defined as follows. Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of n binary attributes called items. Let $D = \{t_1, t_2, \ldots, t_m\}$ be a set of transactions called the database. Each transaction in D has a unique transaction ID and contains a subset of the items in I. A rule is defined as an implication of the form $X \Rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. The sets of items X and Y are called the antecedent (left-hand side, LHS) and the consequent (right-hand side, RHS) of the rule, respectively.
Association rules are usually required to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. Association rule generation is usually split into two separate steps:
1. First, minimum support is applied to find all frequent itemsets in a database.
2. Second, these frequent itemsets and the minimum confidence constraint are used to form rules.
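The two thresholds can be made concrete with a small sketch. It assumes the standard definitions, which the text above does not spell out: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule X ⇒ Y is the number of transactions containing both X and Y divided by the number containing X. The toy transactions and the function names are illustrative assumptions, not part of the reviewed work.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated conditional frequency of the consequent given the antecedent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = frozenset(antecedent) | frozenset(consequent)
    return count(both, transactions) / count(antecedent, transactions)


# Hypothetical market-basket data: each transaction is a set of items.
transactions = [frozenset(t) for t in (
    {"milk", "bread"},
    {"milk", "ham"},
    {"milk", "bread", "ham"},
    {"bread"},
    {"milk", "bread", "butter"},
)]

print(support({"milk", "bread"}, transactions))       # 3 of 5 transactions
print(confidence({"milk"}, {"bread"}, transactions))  # 3 of the 4 milk transactions
```

With a minimum support of 0.5 and a minimum confidence of 0.7, the rule {milk} ⇒ {bread} would be kept in this toy database, since it reaches a support of 0.6 and a confidence of 0.75.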
While the second step is straightforward, the first step needs more attention.
Many algorithms for generating association rules have been presented over time.
Some well-known algorithms are Apriori, Eclat and FP-Growth, but they do only half the job, since they are algorithms for mining frequent itemsets. A further step is needed afterwards to generate rules from the frequent itemsets found in a database.
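The split between frequent itemset mining and rule generation can be sketched as follows. This is a simplified level-wise search in the spirit of Apriori, written for readability rather than speed, and it should not be read as the published algorithm. The function names, the toy data and the thresholds are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([i]) for t in transactions for i in t}  # 1-item candidates
    result = {}
    k = 1
    while current:
        # One pass over the data counts every candidate of the current size.
        level = {}
        for c in current:
            supp = sum(1 for t in transactions if c <= t) / n
            if supp >= min_support:
                level[c] = supp
        result.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates ...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... and keep only those whose (k-1)-subsets are all frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return result


def generate_rules(itemsets, min_confidence):
    """Step 2: split each frequent itemset into antecedent => consequent."""
    rules = []
    for itemset, supp in itemsets.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so it is in `itemsets`.
                conf = supp / itemsets[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules


# Hypothetical data and thresholds.
data = [{"milk", "bread"}, {"milk", "ham"}, {"milk", "bread", "ham"},
        {"bread"}, {"milk", "bread", "butter"}]
itemsets = frequent_itemsets(data, min_support=0.5)
rules = generate_rules(itemsets, min_confidence=0.7)
for lhs, rhs, supp, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), round(supp, 2), round(conf, 2))
```

The second function only reads the supports collected by the first, which is why rule generation is the straightforward step: it needs no further pass over the database.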
I. ASSOCIATION RULE MINING APPROACHES
Association rule mining is a well-explored research area; this section discusses the basic and classic approaches. Table 1 presents the classification of the approaches for association rule mining.
Paper ID # IC15007
Table 1. Existing Association Rule Mining Techniques and Algorithms

| Author | Algorithms | Advantages |
|---|---|---|
| Agrawal et al. 1993 [3] | AIS Algorithm (Agrawal, Imielinski, Swami) | Agrawal et al. (1993) first proposed the pattern mining concept in the form of market basket analysis for finding associations between items bought in a market. It focuses on improving the quality of databases together with the necessary functionality to process decision support queries. |
| Agrawal and Srikant 1994 [4] | Apriori Algorithm | AIS is a straightforward approach that requires many passes over the database, generating many candidate itemsets and storing counters for each candidate, while most of them turn out not to be frequent. |
| Srikant and Agrawal 1996 [5] | Multiple Dimensional ARM | Multiple dimensional association rule mining discovers correlations between different predicates/attributes. Each attribute/predicate is called a dimension, such as age, occupation and buys. At the same time, multiple dimensional association rule mining concerns all types of data, such as boolean, categorical and numerical data. |
| Cheung et al. 1997 [6] | Maintaining of Association Rules | From the definition of data mining, the object of data mining is data stored in very large repositories. The giant amount of data poses the challenge of maintaining and updating the discovered rules while the data may change from time to time in different ways. The FUP (Fast UPdate) algorithm was introduced to deal with the insertion of new transaction data. |
| Han and Kamber 2000 [7] | Multiple Concept Level ARM | Multiple level association rule mining tries to mine strong association rules within and across different levels of abstraction. For example, besides the association rules between milk and ham, it can generalize those rules to a relation between drink and meat, and at the same time it can specify the relation between a certain brand of milk and ham. |
| Pei and Han 2000 [8] | Constraints based ARM | In order to improve the efficiency of existing mining algorithms, constraints are applied during the mining process to generate only those association rules that are interesting to users instead of all the association rules. |
| Han et al. 2000 [9] | FP-Tree (Frequent Pattern Tree) Algorithm | The frequent itemsets are generated with only two passes over the database and without any candidate generation process. By avoiding candidate generation and making fewer passes over the database, FP-Tree is an order of magnitude faster than the Apriori algorithm. |
| Das et al. 2001 [10] | Rapid Association Rule Mining (RARM) | RARM is claimed to be much faster than the FP-Tree algorithm, based on the experimental results shown in the original paper. By using the SOTrieIT structure, RARM can generate large 1-itemsets and 2-itemsets quickly without scanning the database a second time and without candidate generation. |
| John D. Holt and Soon M. Chung 2002 [11] | Inverted Hashing and Pruning (IHP) for mining association rules | A new algorithm named Inverted Hashing and Pruning (IHP) is proposed for mining association rules between items in transaction databases. The performance of the IHP algorithm was evaluated for various cases and compared with that of two well-known mining algorithms, the Apriori algorithm [Proc. 20th VLDB Conf., 1994, pp. 487–499] and the Direct Hashing and Pruning algorithm [IEEE Trans. on Knowledge Data Engrg. 9 (5) (1997) 813–825]. The IHP algorithm was shown to perform better for databases with long transactions. |
| Yi-Chung Hu et al. 2003 [12] | Fuzzy Grids Based Rules Mining Algorithm (FGBRMA) | A new algorithm named fuzzy grids based rules mining algorithm (FGBRMA) is proposed to generate fuzzy association rules from a relational database. The proposed algorithm consists of two phases: one to generate the large fuzzy grids, and the other to generate the fuzzy association rules. A numerical example is presented to illustrate the detailed process of finding the fuzzy association rules from a specified database, demonstrating the effectiveness of the proposed algorithm. |
| Feng-Hsu Wang and Hsiu-Mei Shao 2004 [13] | Hierarchical Bisecting Medoids Algorithm (HBM) | The authors proposed a new clustering method, called HBM (Hierarchical Bisecting Medoids Algorithm), to cluster users based on time-framed navigation sessions. The navigation sessions of the same group are analyzed using the association-mining method to establish a recommendation model for similar students in the future. Finally, an application of this recommendation method to an e-learning web site is presented, including plans of recommendation policies and a proposal of new efficiency measures. The effectiveness of the recommendation methods, with and without time-framed user clustering, is investigated and compared. |
| Yuh-Jiuan Tsay and Jiunn-Yann Chiang 2005 [14] | Cluster-Based Association Rule (CBAR) | The authors presented an efficient algorithm named cluster-based association rule (CBAR). The CBAR method creates cluster tables by scanning the database once and then clusters the transaction records into the k-th cluster table, where the length of a record is k. Moreover, the large itemsets are generated by contrasts with the partial cluster tables. This not only prunes considerable amounts of data, reducing the time needed to perform data scans and requiring fewer contrasts, but also ensures the correctness of the mined results. |
| Guoqing Chen et al. 2006 [15] | Gain based Association Rule Classification (GARC) | The authors presented a new approach for constructing a classifier, based on an extended association rule mining technique in the context of classification. The characteristics of this approach are threefold: first, applying the information gain measure to the generation of candidate itemsets; second, integrating the process of frequent itemset generation with the process of rule generation; third, incorporating strategies for avoiding rule redundancy and conflicts into the mining process. |
| Frans Coenen and Paul Leng 2007 [16] | Classification Association Rule Mining (CARM) | Classification Association Rule Mining (CARM) systems operate by applying an Association Rule Mining (ARM) method to obtain classification rules from a training set of previously classified data. The rules thus generated are influenced by the choice of ARM parameters employed by the algorithm (typically support and confidence threshold values). The cited paper examines the effect that this choice has on the predictive accuracy of CARM methods. |
| He Jiang et al. 2008 [17] | Weighted Association Rules (WARs) | Weighted association rule (WAR) mining is motivated by the fact that items differ in importance. Negative association rules (NARs) play important roles in decision-making. However, misleading rules occur and some rules are uninteresting when positive and negative weighted association rules (PNWARs) are discovered simultaneously. |
| Yuanyuan Zhao et al. 2009 [18] | Weighted Negative Association Rules (WNARs) | Negative association rules have become a focus in the field of data mining. Negative association rules are useful in market-basket analysis to identify products that conflict with each other or products that complement each other. Negative association rules often involve infrequent items. The experiment shows that the number of negative association rules from the infrequent items is larger than that from the frequent items. |
| Tongyan Li and Xingming Li 2010 [19] | Association Rules Mining based Alarm Correlation Analysis System (ARM-ACAS) | A novel Association Rules Mining based Alarm Correlation Analysis System (ARM-ACAS) is proposed to find interesting association rules between alarm events. In order to mine some infrequent but important items, ARM-ACAS first uses a neural network to classify the alarms into different levels. In addition, ARM-ACAS exploits an optimization technique with the weighted frequent pattern tree structure to improve the mining efficiency. |
| Weimin Ouyang et al. 2011 [20] | Mining Fuzzy Association Rules | They propose mining fuzzy association rules to address the first limitation. They put forward a discovery algorithm for mining both direct and indirect fuzzy association rules with multiple minimum supports to resolve these three limitations. |
| Idheba Mohamad Ali O. Swesi et al. 2012 [21] | Interesting Multiple Level Minimum Supports (IMLMS) Algorithm | A new approach (PNAR_IMLMS) is proposed for mining both negative and positive association rules from the interesting frequent and infrequent itemsets mined by the IMLMS model. The experimental results show that the PNAR_IMLMS model provides significantly better results than the previous model. |
| Anjana Gosain et al. 2013 [22] | Traditional Algorithm for Association Mining Rules | The traditional algorithms for mining association rules are built on binary-attribute databases, which has two limitations. Firstly, they cannot handle quantitative attributes; secondly, they treat each item with the same significance although different items may have different significance. |
| Yiyong Xiao et al. 2014 [23] | Variable Neighbourhood Search (VNS) Algorithm | A variable neighbourhood search (VNS) algorithm is developed to solve the problem with near-optimal solutions. Computational experiments are performed to test the VNS algorithm against a benchmark problem set. The results show that the VNS algorithm is an effective approach for solving the MTFWS problem, capable of discovering many large-one frequent itemsets with time-windows (FITW) with a larger time-coverage rate than the lower bounds, thus laying a good foundation for mining ARTW. |
| Vahid Beiranvand et al. 2014 [24] | Multi-Objective Particle Swarm Optimization Algorithm (MOPAR) | The paper deals with the numerical ARM problem from a multi-objective perspective by proposing a multi-objective particle swarm optimization algorithm (MOPAR) for numerical ARM that discovers numerical association rules (ARs) in a single step. To identify more efficient ARs, several objectives are defined in the proposed multi-objective optimization approach, including confidence, comprehensibility, and interestingness. Finally, the best ARs are extracted using Pareto optimality. |
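As a rough illustration of the two-pass construction that Table 1 attributes to the FP-Tree algorithm, the sketch below builds a prefix tree of frequent items: the first pass counts items, and the second inserts each transaction with its frequent items ordered by descending count. It covers tree construction only. The header table, the node links and the recursive mining of conditional trees that a full FP-Growth implementation needs are omitted. The class name, the function name and the toy data are assumptions made for this example.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, a count, and child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Pass 1: count in how many transactions each item occurs.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    # Pass 2: insert every transaction, keeping only frequent items and
    # ordering them by descending count (ties broken by item name).
    root = FPNode(None)
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1  # shared prefixes are compressed into one path
            node = child
    return root, frequent


# Hypothetical transactions; min_count is an absolute support threshold.
transactions = [["milk", "bread"], ["milk", "ham"],
                ["milk", "bread", "ham"], ["bread"]]
root, frequent = build_fp_tree(transactions, min_count=2)
for item, child in sorted(root.children.items()):
    print(item, child.count)
```

Because transactions that share a prefix of frequent items share a path, the tree is usually much smaller than the database, and no candidate itemsets are generated while it is built.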
II. CONCLUSION
In this paper we briefly reviewed the existing association rule mining algorithms used in data mining applications. This review should help researchers focus on the various issues of data mining. In future work, we will review the various classification algorithms and the significance of the evolutionary computing (genetic programming) approach in designing efficient classification algorithms for data mining.
This study was prepared for our new project titled secure multi party algorithm for mining of association rules.
III. REFERENCES
[1] Piatetsky-Shapiro, Gregory (1991), Discovery,
analysis, and presentation of strong rules, in
Piatetsky-Shapiro, Gregory; and Frawley, William J.;
eds., Knowledge Discovery in Databases, AAAI/MIT
Press, Cambridge, MA.
[2] Agrawal, R.; Imieliński, T.; Swami, A. (1993). "Mining association rules between sets of items in large databases". Proceedings of the 1993 ACM SIGMOD international conference on Management of data - SIGMOD '93. p. 207. doi:10.1145/170035.170072.
[3] Agrawal, R., Imielinski, T., and Swami, A. N.
1993. Mining association rules between sets of items
in large databases. In Proceedings of the 1993 ACM
SIGMOD International Conference on Management
of Data, P. Buneman and S. Jajodia, Eds.
Washington, D.C., 207-216.
[4] Agrawal, R. and Srikant, R. 1994. Fast algorithms
for mining association rules. In Proc. 20th Int. Conf.
Very Large Data Bases, VLDB, J. B. Bocca, M.
Jarke, and C. Zaniolo, Eds. Morgan Kaufmann,
487-499.
[5] Srikant, R. and Agrawal, R. 1996. Mining
quantitative association rules in large relational
tables. In Proceedings of the 1996 ACM SIGMOD
international conference on Management of data.
ACM Press, 1-12.
[6] Cheung, D. W.-L., Lee, S. D., and Kao, B. 1997.
A general incremental technique for maintaining
discovered association rules. In Database Systems for
Advanced Applications. 185-194.
[7] Han, J. and Kamber, M. 2000. Data Mining
Concepts and Techniques. Morgan Kaufmann.
[8] Pei, J. and Han, J. 2000. Can we push more
constraints into frequent pattern mining? In
Proceedings of the sixth ACM SIGKDD international
conference on Knowledge discovery and data mining.
ACM Press, 350-354.
[9] Han, J. and Pei, J. 2000. Mining frequent patterns
by pattern-growth: methodology and implications.
ACM SIGKDD Explorations Newsletter 2, 2, 14-20.
[10] Das, A., Ng, W.-K., and Woon, Y.-K. 2001.
Rapid association rule mining. In Proceedings of the
tenth international conference on Information and
knowledge management. ACM Press, 474 - 481.
[11] John D. Holt, Soon M. Chung, “Mining association rules using inverted hashing and pruning”, Information Processing Letters, Volume 83, Issue 4, 31 August 2002, Pages 211-220, Elsevier.
[12] Yi-Chung Hu, Ruey-Shun Chen, Gwo-Hshiung Tzeng, “Discovering fuzzy association rules using fuzzy partition methods”, Knowledge-Based Systems, Volume 16, Issue 3, April 2003, Pages 137-147, Elsevier.
[13] Feng-Hsu Wang, Hsiu-Mei Shao, “Effective personalized recommendation based on time-framed navigation clustering and association mining”, Expert Systems with Applications, Volume 27, Issue 3, October 2004, Pages 365-37, Elsevier.
[14] Yuh-Jiuan Tsay, Jiunn-Yann Chiang, “CBAR: an efficient method for mining association rules”, Knowledge-Based Systems, Volume 18, Issues 2–3, April 2005, Pages 99-10, Elsevier.
[15] Guoqing Chen, Hongyan Liu, Lan Yu, Qiang Wei, Xing Zhang, “A new approach to classification based on association rule mining”, Decision Support Systems, Volume 42, Issue 2, November 2006, Pages 674-68, Elsevier.
[16] Frans Coenen, Paul Leng, “The effect of threshold values on association rule based classification accuracy”, Data & Knowledge Engineering, Volume 60, Issue 2, February 2007, Pages 345-360, Elsevier.
[17] He Jiang; Yuanyuan Zhao; Xiangjun Dong, "Mining Positive and Negative Weighted Association Rules from Frequent Itemsets Based on Interest," Computational Intelligence and Design, ISCID '08. International Symposium on, vol. 2, pp. 242-245, 17-18 Oct. 2008.
[18] Yuanyuan Zhao; He Jiang; Runian Geng; Xiangjun Dong, "Mining Weighted Negative Association Rules Based on Correlation from Infrequent Items," Advanced Computer Control, ICACC '09. International Conference on, pp. 270-273, 22-24 Jan. 2009.
[19] Tongyan Li, Xingming Li, “Novel alarm correlation analysis system based on association rules mining in telecommunication networks”, Information Sciences, Volume 180, Issue 16, 15 August 2010, Pages 2960-2978, Elsevier.
[20] Weimin Ouyang; Qinhua Huang, "Mining direct and indirect fuzzy association rules with multiple minimum supports in large transaction databases," Fuzzy Systems and Knowledge Discovery (FSKD), Eighth International Conference on, vol. 2, pp. 947-951, 26-28 July 2011.
[21] Weimin Ouyang, “Mining Positive and Negative Fuzzy Association Rules with Multiple Minimum Supports”, International Conference on Systems and Informatics, 2012.
[22] Anjana Gosain and Maneela Bhugra, “A Comprehensive Survey of Association Rules On Quantitative Data In Data Mining”, IEEE Conference on Information and Communication Technologies, 2013.
[23] Yiyong Xiao, Yun Tian, Qiuhong Zhao, “Optimizing frequent time-window selection for association rules mining in a temporal database using a variable neighbourhood search”, Computers & Operations Research, Volume 52, Part B, December 2014, Pages 241-250, Elsevier.
[24] Vahid Beiranvand, Mohamad Mobasher-Kashani, Azuraliza Abu Bakar, “Multi-objective PSO algorithm for mining numerical association rules without a priori discretization”, Expert Systems with Applications, Volume 41, Issue 9, July 2014, Pages 4259-4273, Elsevier.