International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 2, Issue 12, December 2012)
A Review of Negative and Positive Association Rule Mining
with Multiple Constraints and Correlation Factor
Nikky Suryawanshi1, Susheel Jain2, Anurag Jain3
1,2,3 Computer Science Department, Radharaman Institute of Technology and Science, Bhopal (M.P.), India
Abstract— Negative and positive association rule mining extracts useful information from large databases. Negative and positive rules are generated from the interesting and non-interesting patterns of a database, and rules that violate the given threshold values, such as minimum support and minimum confidence, give rise to negative rules. The generation of association rules depends on algorithms such as the Apriori algorithm, the FP-growth algorithm and tree-based algorithms. In this paper we review different approaches for reducing negative rules in association rule mining. The reduction and removal of rules follows multiple constraints and the correlation factor of rule mining.
Keywords— association rule mining, PNAR, multiple constraints, correlation factor.
I. INTRODUCTION
Association rules provide a convenient and effective way to identify and represent certain dependencies between attributes in a database. Association rule mining consists of positive and negative association rule mining. In the traditional approach to finding association rules, one thinks only in terms of positive association rules, especially when determining the degree of support and confidence. Previous studies propose that negative association rules achieve high accuracy when handling massive unstructured data [1]. On the one hand, it is not easy to find interesting or useful frequent item sets [2]. Traditional techniques usually select simple criteria based only on positive frequent item sets, such as confidence, distance functions and rough sets, while ignoring the value of negative frequent item sets; this simple choice may affect mining accuracy. Recently there has been growing interest in developing techniques for mining association patterns without a support constraint, although the algorithms proposed there are limited to identifying pairs of similar columns.
The approaches presented in [3] employ a confidence-based pruning strategy instead of the support-based pruning adopted in traditional association rule mining. Support-free association mining discovers rules in patterns with high support, in cross-support patterns where items have widely differing support levels, and in patterns with low support. In fact, patterns with a high minimum support level are often obvious and well known; patterns at the cross-support level have extremely poor correlation; and patterns with low support often provide valuable new insights.
A. Positive and Negative Rules
A strong positive association is referred to as a positive relation between two item sets, while a negative relation implies a negative rule between the two item sets. However, strong negative associations reveal only the existence of negative rules in a hidden representation and do not provide the actual negative rules. Unlike existing mining techniques, the research in this paper extends traditional associations to include association rules of the forms A→¬B, ¬A→B and ¬A→¬B, which indicate negative associations between item sets. We call rules of the form A→B positive rules, and rules of the other forms negative rules [4]. While positive association rules are useful in decision-making, negative association rules also play important roles in decision-making.
For example, there are typically two types of trading behavior (insider trading and market manipulation) that impair fair and efficient trading in securities markets. The objective of a market surveillance team is to ensure a fair and efficient trading environment for all participants through an alert system. Negative association rules assist in determining which alerts can be ignored. Assume that each piece of evidence A, B, C and D can cause an alert of unfair trading X. Given the rules A→¬X and C→¬X, the team can decide that trading is fair when A or C occurs; in other words, alerts caused by A or C can be ignored.
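To make the rule forms concrete, the following small Python sketch (illustrative only; the evidence and alert names are hypothetical) encodes the surveillance example above: given the negative rules A→¬X and C→¬X, alerts triggered by evidence A or C are marked as ignorable.

# Illustrative sketch (hypothetical item names): negative rules A => not X and
# C => not X let the surveillance team mark alerts triggered by A or C as ignorable.
# A rule is stored as (antecedent_items, consequent_item, consequent_is_negated).

negative_rules = [
    ({"A"}, "X", True),   # A => not X
    ({"C"}, "X", True),   # C => not X
]

def alert_can_be_ignored(evidence, alert_item, rules):
    """True if some negative rule says alert_item is absent whenever the
    observed evidence items are present."""
    for antecedent, consequent, negated in rules:
        if negated and consequent == alert_item and antecedent <= evidence:
            return True
    return False

print(alert_can_be_ignored({"A"}, "X", negative_rules))       # True: ignore the alert
print(alert_can_be_ignored({"B", "D"}, "X", negative_rules))  # False: investigate further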
B. Multiple Constraints
A data mining process performed with specified constraints may generate only the interesting frequent patterns. Such mining can improve both the efficiency and the reliability of frequent pattern mining, and thus promotes constraint-based mining, in which user-specified constraints confine the search space and derive interesting patterns [5].
Regular expressions can be used to define the desired relationship between attributes. A pattern template is another form of regular expression; it specifies the required pattern to be mined. The types of constraint are defined as follows [6] (a small sketch after this list shows how such constraints can be applied as filters):
• Knowledge base constraint: specifies the desired form of the knowledge to be mined.
• Data constraint: specifies the set of relevant data or frequent item sets.
• Measure constraint: specifies threshold values; support, confidence and correlation depend on these thresholds.
• Hierarchy constraint: specifies the desired hierarchy level of the item sets; this hierarchy level is used to mine rules.
• Rule constraint: specifies the relationship among attributes and attribute values.
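The following short Python sketch is one possible realization of the idea, offered as an assumption rather than the constraint syntax of [5] or [6]: measure and data constraints are applied as simple predicates that filter candidate rules.

# Minimal sketch: user-specified constraints applied as simple filters over mined rules.
rules = [
    {"antecedent": {"tires"}, "consequent": {"auto service"}, "support": 0.12, "confidence": 0.98},
    {"antecedent": {"milk"},  "consequent": {"bread"},        "support": 0.02, "confidence": 0.40},
]

def measure_constraint(rule, min_sup=0.05, min_conf=0.7):
    # Measure constraint: thresholds on support and confidence.
    return rule["support"] >= min_sup and rule["confidence"] >= min_conf

def data_constraint(rule, relevant_items=frozenset({"tires", "auto service"})):
    # Data constraint: the rule may only mention user-specified relevant items.
    return (rule["antecedent"] | rule["consequent"]) <= relevant_items

filtered = [r for r in rules if measure_constraint(r) and data_constraint(r)]
print(filtered)   # keeps only the tires => auto service rule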
II. BASIC CONCEPT & TERMINOLOGY
Association rule mining, also known as pattern discovery in transaction databases, has been used for finding negative and positive rules. The basic concepts and terminology are defined in the following categories:
C. Correlation Factor
The standard measures, support and confidence, are useful for finding association rules in a data set. Support is used first to find the frequent item sets, exploiting its downward closure property to prune items and reduce the search space [7].
The negation of an item set A is represented by ¬A, which means the absence of the item set A. We call a rule of the form A⇒B a positive association rule, and rules of the other forms (A⇒¬B, ¬A⇒B and ¬A⇒¬B) negative association rules. The support and confidence of the negative association rules can be derived from those of the positive association rules [10]. In this paper the rule types are named as follows: Positive Rule (PR) = A⇒B; Consequent Negative Rule (CNR) = A⇒¬B; Antecedent Negative Rule (ANR) = ¬A⇒B; Antecedent and Consequent Negative Rule (ACNR) = ¬A⇒¬B. The support and confidence of the CNR, ANR and ACNR rules are given by formulas (1) to (6) below.
D. Support and Confidence
Given a set of transactions [8], where each transaction is a set of items, an association rule is an expression X⇒Y, where X and Y are sets of items. The meaning of such a rule is that transactions in the database which contain the items in X tend to also contain the items in Y. For example, if 98% of customers who purchase tires and auto accessories also buy automotive services, then 98% is called the confidence of the rule. The support of the rule X⇒Y is the percentage of transactions that contain both X and Y:
supp(X⇒Y) = count(X∪Y) / N,
where N is the number of transactions in the database. Confidence is defined as [9]:
conf(X⇒Y) = supp(X∪Y) / supp(X)
Association rule mining finds exactly those rules whose support and confidence are greater than the user-specified minimum support and minimum confidence:
minimum support ≥ predefined threshold
minimum confidence ≥ predefined threshold
1. Consequent Negative Rule (CNR):
supp(A⇒¬B) = supp(A) - supp(A∪B) ............ (1)
conf(A⇒¬B) = (supp(A) - supp(A∪B)) / supp(A) ............ (2)
2. Antecedent Negative Rule (ANR):
supp(¬A⇒B) = supp(B) - supp(A∪B) ............ (3)
conf(¬A⇒B) = (supp(B) - supp(A∪B)) / (1 - supp(A)) ............ (4)
3. Antecedent and Consequent Negative Rule (ACNR):
supp(¬A⇒¬B) = 1 - supp(A) - supp(B) + supp(A∪B) ............ (5)
conf(¬A⇒¬B) = (1 - supp(A) - supp(B) + supp(A∪B)) / (1 - supp(A)) ............ (6)
III. RELATED WORK
Reviewing positive and negative association rule mining over large databases shows that it can generate millions of insignificant rules, and researchers have developed many algorithms and techniques for determining association rules. The Apriori algorithm, also called the level-wise algorithm [10], is the most popular algorithm for finding all frequent item sets. The Apriori frequent item set discovery algorithm uses two methods at every step: candidate generation and a pruning strategy. It moves upward from the bottom level to the top, until no candidate set remains after pruning. Its defects are obvious: the database must be scanned many times, and the scale of the generated candidate sets is large.
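As a concrete illustration of the candidate-generation-and-pruning loop described above, the following minimal Python sketch (a simplified illustration, not the optimized algorithm of [10]) mines frequent item sets level by level.

from itertools import combinations

def apriori(transactions, min_sup):
    """Level-wise frequent item set mining: generate candidates of size k+1
    from frequent sets of size k, then prune by minimum support."""
    n = len(transactions)
    transactions = [set(t) for t in transactions]
    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    freq = {frozenset([i]) for i in items
            if sum(i in t for t in transactions) / n >= min_sup}
    all_freq, k = set(freq), 2
    while freq:
        # Candidate generation: unions of pairs of frequent (k-1)-item sets.
        candidates = {a | b for a in freq for b in freq if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in freq for s in combinations(c, k - 1))}
        # Support-based pruning (requires a full database scan per level).
        freq = {c for c in candidates
                if sum(c <= t for t in transactions) / n >= min_sup}
        all_freq |= freq
        k += 1
    return all_freq

print(apriori([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}], 0.5))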
Negative association rule discovery seeks rules of the three negative forms whose support and confidence are, respectively, greater than, or less than or equal to, the user-specified minsup and minconf thresholds. Such rules are referred to as interesting negative association rules.
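The support and confidence of the three negative forms can be computed directly from the positive counts using formulas (1) to (6) of Section II. The short Python sketch below (an illustrative helper with hypothetical function names, assuming 0 < supp(A) < 1) does exactly that for a small transaction database.

def supp(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def negative_rule_measures(A, B, transactions):
    """Support and confidence of A=>~B, ~A=>B and ~A=>~B, per formulas (1)-(6).
    Assumes 0 < supp(A) < 1 so the denominators are non-zero."""
    sA, sB, sAB = supp(A, transactions), supp(B, transactions), supp(A | B, transactions)
    return {
        "CNR":  (sA - sAB, (sA - sAB) / sA),                         # A => not B
        "ANR":  (sB - sAB, (sB - sAB) / (1 - sA)),                   # not A => B
        "ACNR": (1 - sA - sB + sAB, (1 - sA - sB + sAB) / (1 - sA)), # not A => not B
    }

db = [{"a", "b"}, {"a"}, {"b", "c"}, {"c"}]
print(negative_rule_measures({"a"}, {"b"}, db))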
WEI Yong-Qing et al. [11] define an improved Apriori algorithm that removes the defects of Apriori. By scanning the database only once, it counts each candidate item set that cannot meet the supporting degree, so as to improve the efficiency of the algorithm. The improved algorithm greatly reduces the number of database scans and does not use the joining and pruning operations. Whether Apriori-based mining can truly reflect the relationships in the data still needs to be evaluated; if the required association rule confidence level is very high (100%), the performance of the algorithm needs further modification.
IV. Correlation Coefficient (CRC)
The algorithm uses the correlation coefficient (CRC) between item sets to find negative association rules. The correlation coefficient between item sets A and B is defined as:
CRC(A, B) = … (7)
where A and B are item sets.
When CRC(A, B) = 1, A and B are independent.
When CRC(A, ¬B) < 1, A and B have a negative correlation.
When CRC(¬A, B) < 1, A and B have a negative correlation.
Because the modulus (absolute value) is used in equation (7), the CRC value does not exceed 1, which is a benefit in negative rule generation.
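The exact CRC definition in equation (7) is not reproduced here, so the Python sketch below uses the standard lift (interest) measure, supp(A∪B) / (supp(A)·supp(B)), as a stand-in correlation test; this is an assumption, not the CRC formula of the reviewed algorithm. Values near 1 indicate independence, and values below 1 indicate negative correlation between A and B.

def lift(A, B, transactions):
    """Lift (interest) measure, used here as a stand-in correlation test:
    1 means independence, < 1 negative correlation, > 1 positive correlation."""
    n = len(transactions)
    sA = sum(A <= t for t in transactions) / n
    sB = sum(B <= t for t in transactions) / n
    sAB = sum((A | B) <= t for t in transactions) / n
    return sAB / (sA * sB)

def rule_polarity(A, B, transactions):
    """Classify the candidate rule A => B by its correlation."""
    value = lift(A, B, transactions)
    return "positive" if value > 1 else ("independent" if value == 1 else "negative")

db = [{"a", "b"}, {"a"}, {"b"}, {"c"}]
print(rule_polarity({"a"}, {"b"}, db))   # lift = 0.25 / (0.5 * 0.5) = 1.0 -> independent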
A. B. M. Rezbaul Islam et al. [2] note that the main problem is the generation of candidate sets. Among the existing techniques, the frequent pattern growth (FP-growth) method is the most efficient and scalable approach. The FP-tree is a compressed representation of the transaction database, because only relevant items are used to construct the tree while irrelevant ones are pruned. The main drawbacks of FP-growth are that it generates a number of conditional FP-trees and that it is not suitable for incremental mining. A new algorithm called improved FP-tree mines all possible frequent item sets without generating candidate sets.
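For orientation, the following compact Python sketch (a textbook-style illustration, not the improved FP-tree of [2]) builds a basic FP-tree by inserting frequency-ordered transactions along shared prefixes; the compression comes from counting shared prefix paths instead of duplicating them.

from collections import Counter

class FPNode:
    def __init__(self, item, parent=None):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_fp_tree(transactions, min_sup_count):
    """Insert each transaction's frequent items, ordered by global frequency,
    into a prefix tree; shared prefixes are counted rather than duplicated."""
    freq = Counter(i for t in transactions for i in t)
    freq = {i: c for i, c in freq.items() if c >= min_sup_count}
    root = FPNode(None)
    for t in transactions:
        # Keep only frequent items, most frequent first (ties broken by name).
        ordered = sorted((i for i in t if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            node = node.children.setdefault(item, FPNode(item, node))
            node.count += 1
    return root

def dump(node, depth=0):
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        dump(child, depth + 1)

tree = build_fp_tree([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}], min_sup_count=2)
dump(tree)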
Xiufen Piao et al. [15] observe that older techniques based only on support and confidence may generate a large number of redundant rules, and therefore propose a new algorithm. The algorithm performs two functions: a correlation function and a dual-confidence function. The first function computes a correlation value, according to which association rules are divided into positive and negative rules; the second function uses dual confidence to prune useless negative rules and mine the useful negative rules.
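The exact dual-confidence conditions of [15] are not spelled out here, so the following Python sketch only illustrates the general idea under an assumption: a negative rule A⇒¬B is kept when conf(A⇒¬B) clears one threshold while the opposing positive confidence conf(A⇒B) stays below a second threshold.

def confidence(A, B, transactions):
    """conf(A => B) = supp(A u B) / supp(A)."""
    nA = sum(A <= t for t in transactions)
    nAB = sum((A | B) <= t for t in transactions)
    return nAB / nA if nA else 0.0

def keep_negative_rule(A, B, transactions, min_neg_conf=0.6, max_pos_conf=0.4):
    """Dual-confidence style check (illustrative thresholds, not those of [15]):
    conf(A => not B) must be high while conf(A => B) must be low."""
    pos = confidence(A, B, transactions)
    neg = 1.0 - pos          # conf(A => not B) = 1 - conf(A => B)
    return neg >= min_neg_conf and pos <= max_pos_conf

db = [{"a"}, {"a"}, {"a", "b"}, {"b"}]
print(keep_negative_rule({"a"}, {"b"}, db))   # conf(a=>b)=1/3, conf(a=>~b)=2/3 -> True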
Kundu, G. et al. [16] note that associative classifiers are more accurate than decision tree classifiers and have added a new direction to the ongoing research. In some cases, however, such a classifier is found to contain inaccurate rules. The idea is to put accurate negative rules in place of inaccurate positive rules, and to generate enough negative rules efficiently so that classification accuracy is enhanced. The associative classifier with negative rules (ACN) is a time-efficient method and can be used from a classification perspective.
R. Uday Kiran et al. [12] describe how the Apriori approach suffers from the "rare item problem": at a high minsup value rare item sets are missed, and at a low minsup value the number of frequent item sets explodes. A technique in which the minsup of each item is fixed, or is based on a support difference, reduces both the "missing rules" and the "rule explosion" problems. To improve the performance of extracting item sets involving rare items, an approach known as Multiple Minimum Support Apriori (MSApriori) was proposed; it was observed that MSApriori still suffers from the "rare item problem" if item supports vary widely.
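One formulation of per-item minimum supports from the multiple-minimum-support literature, used here only as an illustrative assumption rather than the exact rule of [12], is MIS(i) = max(supp(i) - SD, LS), where SD is a user-given support difference and LS a least minimum support. The Python sketch below computes such item-wise thresholds.

def item_minsups(transactions, support_difference=0.25, least_sup=0.05):
    """Per-item minimum supports: MIS(i) = max(supp(i) - SD, LS).
    (Illustrative formulation; frequent items get higher thresholds,
    rare items fall back to the least minimum support.)"""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    supp = {i: sum(i in t for t in transactions) / n for i in items}
    return {i: max(supp[i] - support_difference, least_sup) for i in items}

db = [{"bread", "milk"}, {"bread"}, {"bread", "caviar"}, {"milk"}]
print(item_minsups(db))
# Frequent items get a proportionally higher threshold; a rare item such as
# "caviar" falls back to the least minimum support, so rare rules are not lost.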
Sandeep Singh Rawat et al. [13] make an effort to extract rare association rules with multiple minimum supports. They explore probability and propose a multiple-minsup, Apriori-like approach called Probability Apriori Multiple Minimum Support to efficiently discover rare association rules. The main idea of this algorithm is to count the probability of each item and use it to generate the next set of candidate and frequent items without scanning the database every time.
In the traditional approach, finding negative association rules involves a large search space and high computing time. Li-Min Tsai et al. [14] propose an improved approach called the generalized negative association rule (GNAR) algorithm, arguing that negative rules are as important as positive rules. Negative association rules can help users quickly decide which rules are important instead of checking many rules. This method mainly decreases the huge computing cost of mining negative association rules and reduces most non-interesting negative rules.
Idheba Mohamad Ali O. et al. [4] note that an efficient positive and negative association rule algorithm may face obstacles such as rule generation, a high degree of irrelevant data, and the handling of a single minimum support value. A new algorithm, PNAR_IMLMS, has been proposed as a combination of PNAR and IMLMS; this algorithm mines both kinds of rules simultaneously.
IV. CONCLUSION AND FUTURE WORK
In this paper we presented a review of association rule mining with different approaches and discussed the advantages and demerits of rule mining. We also discussed the generation process of negative and positive association rule mining. End users of association rule mining tools encounter several well-known problems in practice. In future work we will optimize the negative rule generation technique using a genetic algorithm.
ACKNOWLEDGEMENT
Nikky Suryawanshi Rai is an M.Tech. scholar (Computer Science Engineering) at R.I.T.S. Bhopal, under R.G.T.U. Bhopal, India.
Susheel Jain is an Assistant Professor in the Computer Science department of R.I.T.S., Bhopal, M.P. He completed his M.Tech. in Software Engineering from Gautam Buddh Technical University, Lucknow, India.
Anurag Jain is H.O.D. of the Computer Science department of R.I.T.S., Bhopal, M.P. He completed his M.Tech. in Computer Science and Engineering from Barkatullah University, Bhopal, India.
REFERENCES
[1] Xianglong Liu, Bo Lang, Wei Yu, Janwu Luo and Lei Huang, "AUDR: An Advanced Unstructured Data Repository", 978-1-4577-0209-9, © 2011.
[2] A. B. M. Rezbaul Islam and Tae-Sun Chung, "An Improved Frequent Pattern Tree Based Association Rule Mining Technique", 978-1-4244-9224-4/11/$26.00, © IEEE 2011.
[3] H. Xiong, "Mining Strong Affinity Association Patterns in Data Sets with Skewed Support Distribution", Computing & Processing (Software & Hardware), 0-7695-1978-4, © IEEE 2003.
[4] Idheba Mohamad Ali O. Swesi, Azuraliza Abu Bakar and Anis Suhailis Abdul Kadir, "Mining Positive and Negative Association Rules from Interesting Frequent and Infrequent Itemsets", 978-1-4673-0024, © 2012.
[5] Xiaofeng Yuan, Hualong Xu and Shuhong Chen, "Improvement on the Constrained Association Rule Mining Algorithm of Separate", 1-4244-0682-X/06/$20.00, © IEEE 2006.
[6] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann, San Francisco, CA 94111, © 2006.
[7] LI Tong-yan and LI Xing-ming, "New Criterion for Mining Strong Association Rules in Unbalanced Events", Intelligent Information Hiding and Multimedia Signal Processing, 978-0-7695-3278-3/08/$25.00, © IEEE 2008.
[8] R. Srikant and R. Agrawal, "Mining Generalized Association Rules", IBM Almaden Research Center, San Jose, California, © 1995.
[9] XING Xue, Chen Yao and Wang Yan-en, "Study on Mining Theories of Association Rules and Its Application", 978-0-7695-3942-3/10/$26.00, © IEEE 2010.
[10] Arun K. Pujari, "Data Mining Techniques", Universities Press (India) Private Limited, 2001, 2009: 81-99.
[11] WEI Yong-qing, YANG Ren-hua and LIU Pei-yu, "An Improved Apriori Algorithm for Association Rules of Mining", 978-1-4244-3930-0/09/$25.00, © IEEE 2009.
[12] R. Uday Kiran and P. Krishna Reddy, "An Improved Multiple Minimum Support Based Approach to Mine Rare Association Rules", 978-1-4244-2765-9/09/$25.00, © IEEE 2009.
[13] Sandeep Singh Rawat and Lakshmi Rajamani, "Probability Apriori based Approach to Mine Rare Association Rules", Data Mining and Optimization (DMO), 28-29 June 2011, Selangor, Malaysia, 978-1-61284-212-7/11/$26.00, © IEEE 2011.
[14] Li-Min Tsai, Shu-Jing Lin and Don-Lin Yang, "Efficient Mining of Generalized Negative Association Rules", Granular Computing, 978-0-7695-4161-7/10/$26.00, © IEEE 2010.
[15] Xiufen Piao, Zhanlong Wang and Gang Liu, "Research on Mining Positive and Negative Association Rules Based on Dual Confidence", 978-1-4244-9954-0/11, © 2011.
[16] Kundu, G., Islam, M. M., Munir, S. and Bari, M. F., "An Associative Classifier with Negative Rules", 978-1-4244-2172-5, © 2008.