International Journal of Emerging Technology and Advanced Engineering, Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 2, Issue 12, December 2012)

A Review of Negative and Positive Association Rule Mining with Multiple Constraints and Correlation Factor

Nikky Suryawanshi, Susheel Jain, Anurag Jain
Computer Science Department, Radharaman Institute of Technology and Science, Bhopal (M.P.), India

Abstract: Negative and positive association rule mining extracts useful information from large databases. Negative and positive rules are generated from the interesting and non-interesting patterns of the database; rules that violate the given threshold values, such as minimum support and minimum confidence, give rise to negative rules. The generation of association rules depends on algorithms such as the Apriori algorithm, the FP-growth algorithm, and tree-based algorithms. In this paper we review different approaches for reducing the number of negative rules produced by association rule mining. The reduction and removal process follows multiple constraints and the correlation factor of rule mining.

Keywords: association rule mining, PNAR, multiple constraints, correlation factor.

I. INTRODUCTION

Association rules provide a convenient and effective way to identify and represent certain dependencies between attributes in a database. Association rule mining consists of positive and negative association rule mining. The traditional approach to finding association rules considers only positive rules, especially when determining the degree of support and confidence. Previous studies propose that negative association rules achieve high accuracy when handling massive unstructured data [1]. On the one hand, it is not easy to find interesting or useful frequent itemsets [2]. Traditional techniques select simple criteria based only on positive frequent itemsets, such as confidence, distance functions, and rough sets, while ignoring the value of negative frequent itemsets. This simple choice may affect mining accuracy. Recently there has been growing interest in developing techniques for mining association patterns without a support constraint, although the algorithms proposed in that line of work are limited to identifying pairs of similar columns. The approaches presented in [3] employ a confidence-based pruning strategy instead of the support-based pruning adopted in traditional association rule mining. Support-free mining discovers rules in patterns with high support, in cross-support patterns where items have widely differing support levels, and in patterns with low support. In fact, patterns with a high minimum support level are often obvious and well known; patterns at the cross-support level have extremely poor correlation; and patterns with low support often provide valuable new insights.

II. BASIC CONCEPTS & TERMINOLOGY

Association rule mining, also known as pattern discovery in transaction databases, has been applied to finding negative and positive rules. The basic concepts and terminology are defined in the following categories.

A. Positive and Negative Rules

A strong positive association refers to a positive relation between two itemsets, while a negative relation implies a negative rule between the two itemsets. However, strong negative associations only reveal the existence of negative rules in a hidden representation; they do not provide the actual negative rules. Unlike existing mining techniques, the research reviewed in this paper extends traditional associations to include rules of the forms A→¬B, ¬A→B, and ¬A→¬B, which indicate negative associations between itemsets. We call rules of the form A→B positive rules, and rules of the other forms negative rules [4]. While positive association rules are useful in decision-making, negative association rules also play an important role. For example, there are typically two types of trading behaviour (insider trading and market manipulation) that impair fair and efficient trading in securities markets. The objective of a market surveillance team is to ensure a fair and efficient trading environment for all participants through an alert system, and negative association rules assist in determining which alerts can be ignored. Assume that each piece of evidence A, B, C, and D can cause an alert of unfair trading X. If we have the rules A→¬X and C→¬X, the team can decide that trading is fair when A or C occurs; in other words, alerts caused by A or C can be ignored.

B. Multiple Constraints

A data mining process performed with specified constraints may generate interesting frequent patterns. Such mining can improve both the efficiency and the reliability of frequent pattern mining, which promotes constraint-based mining, where user-specified constraints confine the search space and derive interesting patterns [5]. Regular expressions can be used to define the desired relationship between attributes; a pattern template is another form of regular expression that specifies the required pattern to be mined. The types of constraints are defined as follows [6] (a short illustrative sketch is given after subsection C):

Knowledge base constraint: specifies the desired form of the knowledge to be mined.
Data constraint: specifies the set of relevant data or frequent itemsets.
Measure constraint: specifies the threshold values; support, confidence, and correlation depend on these thresholds.
Rule constraint: specifies the relationships among attributes and attribute values.
Hierarchy constraint: specifies the desired hierarchy level of the itemsets; this hierarchy level is used when mining rules.

C. Correlation Factor

The standard measures, support and confidence, are useful for finding association rules in a data set. Support is used first to find the frequent items, exploiting its downward-closure property to prune items and reduce the search space [7]. The negation of an itemset A is represented by ¬A, which means the absence of the itemset A. We call a rule of the form A⇒B a positive association rule, and rules of the other forms (A⇒¬B, ¬A⇒B, and ¬A⇒¬B) negative association rules. The support and confidence of the negative association rules can be derived from those of the positive association rules [10]. In this work the rule forms are named as follows:

Positive Rule (PR): A⇒B
Consequent Negative Rule (CNR): A⇒¬B
Antecedent Negative Rule (ANR): ¬A⇒B
Antecedent and Consequent Negative Rule (ACNR): ¬A⇒¬B

The support and confidence formulas for the CNR, ANR, and ACNR forms are given in the next subsection.
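To make the constraint categories of subsection B concrete, the following minimal Python sketch (illustrative only; the rule list, thresholds, field names, and item names are hypothetical, not taken from the reviewed papers) filters candidate rules by a data constraint, a measure constraint, and a rule constraint. Knowledge base and hierarchy constraints would be applied in the same way, as additional predicates.

# Minimal sketch of constraint-based filtering of mined rules (subsection B).
# The rule list, constraint values, and item names are illustrative only.
from dataclasses import dataclass

@dataclass
class Rule:
    antecedent: frozenset
    consequent: frozenset
    support: float
    confidence: float

candidate_rules = [
    Rule(frozenset({"tires"}), frozenset({"service"}), 0.60, 0.75),
    Rule(frozenset({"battery"}), frozenset({"tires"}), 0.20, 0.50),
    Rule(frozenset({"accessories"}), frozenset({"service"}), 0.40, 0.67),
]

# Data constraint: only rules built from a relevant set of items.
relevant_items = {"tires", "accessories", "service"}
# Measure constraint: user-specified support and confidence thresholds.
min_support, min_confidence = 0.30, 0.60
# Rule constraint: a required relationship among attributes,
# here "the consequent must contain the item 'service'".
required_consequent_item = "service"

def satisfies_constraints(rule):
    data_ok = (rule.antecedent | rule.consequent) <= relevant_items
    measure_ok = rule.support >= min_support and rule.confidence >= min_confidence
    rule_ok = required_consequent_item in rule.consequent
    return data_ok and measure_ok and rule_ok

interesting = [r for r in candidate_rules if satisfies_constraints(r)]
for r in interesting:
    print(set(r.antecedent), "=>", set(r.consequent), r.support, r.confidence)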
D. Support and Confidence

Given a set of transactions [8], where each transaction is a set of items, an association rule is an expression X⇒Y, where X and Y are sets of items. The meaning of such a rule is that transactions in the database which contain the items in X tend to also contain the items in Y. For example, 98% of customers who purchase tires and auto accessories also buy automotive services; here 98% is called the confidence of the rule. The support of the rule X⇒Y is the percentage of transactions that contain both X and Y:

Support(X⇒Y) = count(X ∪ Y) / N,

where N is the number of transactions in the database. Confidence is defined as [9]:

Confidence(X⇒Y) = Support(X ∪ Y) / Support(X).

Association rule mining finds exactly those rules whose support and confidence are greater than the user-specified minimum support and minimum confidence:

support ≥ predefined minimum support threshold,
confidence ≥ predefined minimum confidence threshold.

The support and confidence of the negative rule forms are given by the following formulas:

1. Consequent Negative Rule (CNR):
supp(A⇒¬B) = supp(A) − supp(A ∪ B) .......... (1)
conf(A⇒¬B) = (supp(A) − supp(A ∪ B)) / supp(A) .......... (2)

2. Antecedent Negative Rule (ANR):
supp(¬A⇒B) = supp(B) − supp(A ∪ B) .......... (3)
conf(¬A⇒B) = (supp(B) − supp(A ∪ B)) / (1 − supp(A)) .......... (4)

3. Antecedent and Consequent Negative Rule (ACNR):
supp(¬A⇒¬B) = 1 − supp(A) − supp(B) + supp(A ∪ B) .......... (5)
conf(¬A⇒¬B) = (1 − supp(A) − supp(B) + supp(A ∪ B)) / (1 − supp(A)) .......... (6)

E. Correlation Coefficient (CRC)

The correlation coefficient (CRC) between itemsets is used to find negative association rules. The CRC between itemsets A and B can be defined as:

CRC(A, B) = supp(A ∪ B) / (supp(A) · supp(B)) .......... (7)

When CRC(A, B) = 1, A and B are independent. When CRC(A, ¬B) < 1 or CRC(¬A, B) < 1, A and B have a negative correlation. By using Mod1 in Eq. (7), the CRC value does not exceed 1, which is a benefit in negative rule generation.

III. RELATED WORK

The review of positive and negative association rule mining shows that mining a large database can generate millions of insignificant rules, and researchers have developed many algorithms and techniques for determining association rules. The Apriori algorithm, also called the level-wise algorithm [10], is the most popular algorithm for finding all frequent itemsets. The Apriori frequent itemset discovery algorithm uses two methods (candidate generation and a pruning strategy) at every step, and it moves upward from the bottom until no candidate set remains after pruning. Its defects are obvious: the database must be scanned many times, and the scale of the generated candidate sets is large. Negative association rule discovery seeks rules of the three negative forms whose support and confidence are, respectively, greater than, less than, or equal to the user-specified minsup and minconf thresholds; such rules are referred to as interesting negative association rules.

WEI Yong-qing et al. [11] define an improved Apriori algorithm that removes the defects of Apriori. By scanning the database once, it counts and discards each candidate itemset that cannot meet the support threshold, so as to improve the efficiency of the algorithm. The improved algorithm greatly reduces the number of database scans without using the join and prune operations. Whether Apriori-style mining can truly reflect the relationships between data still needs to be evaluated; if the required association rule confidence level is high (100%), the performance of the algorithm needs further modification.
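As an illustration of the level-wise candidate-generation and pruning procedure described above, the following minimal Python sketch (a simplified illustration; the transaction list and minsup value are hypothetical, and none of the optimizations of the improved algorithm [11] are included) finds all frequent itemsets of a small transaction database.

# Minimal level-wise (Apriori-style) frequent-itemset mining sketch.
# The transaction list and minsup value are illustrative only.
from itertools import combinations

transactions = [
    {"tires", "accessories", "service"},
    {"tires", "accessories"},
    {"tires", "service"},
    {"accessories", "battery"},
    {"tires", "accessories", "service", "battery"},
]
minsup = 0.4  # minimum support threshold (fraction of transactions)

def support(itemset, db):
    # Fraction of transactions containing every item of `itemset`.
    return sum(1 for t in db if itemset <= t) / len(db)

def apriori(db, minsup):
    # Return all frequent itemsets, level by level.
    items = {frozenset([i]) for t in db for i in t}
    level = {s for s in items if support(s, db) >= minsup}  # frequent 1-itemsets
    frequent = set(level)
    k = 2
    while level:
        # Candidate generation: join frequent (k-1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Support counting requires another scan of the database.
        level = {c for c in candidates if support(c, db) >= minsup}
        frequent |= level
        k += 1
    return frequent

for itemset in sorted(apriori(transactions, minsup), key=len):
    print(set(itemset), round(support(itemset, transactions), 2))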
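Returning to the measures of Section II, the sketch below (again a minimal illustration; the transaction list and the itemsets A and B are hypothetical) computes the CNR, ANR, and ACNR support and confidence of Eqs. (1) to (6) and the correlation of Eq. (7) directly from itemset supports.

# Minimal sketch of the support/confidence measures of Eqs. (1)-(7).
# The transaction list and the itemsets A and B are illustrative only.
transactions = [
    {"tires", "accessories", "service"},
    {"tires", "accessories"},
    {"tires", "service"},
    {"accessories", "battery"},
    {"tires", "accessories", "service", "battery"},
]

def supp(itemset, db):
    # supp(X): fraction of transactions that contain every item of X.
    return sum(1 for t in db if itemset <= t) / len(db)

def negative_rule_measures(A, B, db):
    # Support and confidence of the CNR, ANR and ACNR forms (Eqs. 1-6).
    sA, sB, sAB = supp(A, db), supp(B, db), supp(A | B, db)
    return {
        "CNR":  (sA - sAB, (sA - sAB) / sA),               # A => not B
        "ANR":  (sB - sAB, (sB - sAB) / (1 - sA)),         # not A => B
        "ACNR": (1 - sA - sB + sAB,
                 (1 - sA - sB + sAB) / (1 - sA)),          # not A => not B
    }

def crc(A, B, db):
    # Correlation of Eq. (7): 1 means independent, below 1 means negative.
    return supp(A | B, db) / (supp(A, db) * supp(B, db))

A, B = {"tires"}, {"service"}
print("CRC(A, B) =", round(crc(A, B, transactions), 2))
for name, (s, c) in negative_rule_measures(A, B, transactions).items():
    print(name, "support =", round(s, 2), "confidence =", round(c, 2))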
A. B. M. Rezbaul Islam et al. [2] note that the main problem is the generation of candidate sets. Among the existing techniques, the frequent pattern growth (FP-growth) method is the most efficient and scalable approach. The FP-tree is a compressed representation of the transaction database, because only the relevant items are used to construct the tree and the irrelevant ones are pruned. The main drawbacks of FP-growth are that it generates a large number of conditional FP-trees and that it is not suitable for incremental mining. A new algorithm, called the improved FP-tree, mines all possible frequent itemsets without generating candidate sets. The algorithm first computes a correlation value and, according to this value, divides the association rules into positive and negative rules; it then uses dual confidence to reduce the useless negative rules and mine the useful ones.

Kundu, G. et al. [16] show that an accurate associative classifier can be better than a decision tree classifier, and such classifiers have added a new direction to the ongoing research. In some cases the classifier is found to contain incorrect rules; the idea is to put accurate negative rules in place of inaccurate positive rules and to generate a sufficient number of negative rules efficiently so that classification accuracy is enhanced. The associative classifier with negative rules (ACN) is a time-efficient method that can be used for classification.

R. Uday Kiran et al. [12] describe how the Apriori approach suffers from the "rare item problem": at a high minsup value rare itemsets are missed, and at a low minsup value the number of frequent itemsets explodes. A technique in which the minsup of each item is fixed, or based on a support difference, reduces both the missing-rule and the rule-explosion problems. To improve the performance of extracting itemsets involving rare items, an approach known as Multiple Minimum Support Apriori (MSApriori) was proposed; it was observed that MSApriori still suffers from the rare item problem when item supports vary widely.

Sandeep Singh Rawat et al. [13] make an effort to extract rare association rules with multiple minimum supports. They explore probability and propose a multiple-minsup, Apriori-like approach called Probability Apriori Multiple Minimum Support to efficiently discover rare association rules. The main idea of this algorithm is to count the probability of each item and use it to build the next generation of candidate and frequent items without scanning the database every time.

In the traditional approach, finding negative association rules encounters a large search space and high computing time. Li-Min Tsai et al. [14] propose an improved approach called the generalized negative association rule (GNAR) algorithm, arguing that negative rules are as important as positive rules. Negative association rules can help users quickly decide which rules are important instead of checking many rules. The method mainly decreases the huge computing cost of mining negative association rules and removes most non-interesting negative rules.

Idheba Mohamad Ali O. et al. [4] observe that an efficient positive and negative association rule algorithm must overcome several obstacles, such as rule generation, a high degree of irrelevant data, and the handling of a single minimum support value. A new algorithm, PNAR_IMLMS, has been proposed as a combination of PNAR and IMLMS; this algorithm mines both kinds of rules simultaneously.

Xiufen Piao et al. [15] note that the older techniques based on support and confidence may generate a large number of redundant rules, and therefore propose a new algorithm that performs two functions, correlation and dual confidence.

IV. CONCLUSION AND FUTURE WORK

In this paper we presented a review of association rule mining with different approaches and discussed the advantages and drawbacks of rule mining, as well as the generation process of negative and positive association rules. End users of association rule mining tools encounter several well-known problems in practice. In future work we intend to optimize the negative rule generation technique using a genetic algorithm.

ACKNOWLEDGEMENT

Nikky Suryawanshi Rai is an M.Tech. scholar (Computer Science Engineering) at R.I.T.S. Bhopal, under R.G.T.U. Bhopal, India. Susheel Jain is an Assistant Professor in the Computer Science Department of R.I.T.S., Bhopal, M.P.; he completed his M.Tech. in Software Engineering at Gautam Buddh Technical University, Lucknow, India. Anurag Jain is the H.O.D. of the Computer Science Department of R.I.T.S., Bhopal, M.P.; he completed his M.Tech. in Computer Science and Engineering at Barkatullah University, Bhopal, India.

REFERENCES

[1] Xianglong Liu, Bo Lang, Wei Yu, Janwu Luo, and Lei Huang, "AUDR: An Advanced Unstructured Data Repository," IEEE, 2011.
[2] A. B. M. Rezbaul Islam and Tae-Sun Chung, "An Improved Frequent Pattern Tree Based Association Rule Mining Technique," IEEE, 2011.
[3] H. Xiong, "Mining Strong Affinity Association Patterns in Data Sets with Skewed Support Distribution," IEEE, 2003.
[4] Idheba Mohamad Ali O. Swesi, Azuraliza Abu Bakar, and Anis Suhailis Abdul Kadir, "Mining Positive and Negative Association Rules from Interesting Frequent and Infrequent Itemsets," IEEE, 2012.
[5] Xiaofeng Yuan, Hualong Xu, and Shuhong Chen, "Improvement on the Constrained Association Rule Mining Algorithm of Separate," IEEE, 2006.
[6] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques," Morgan Kaufmann, San Francisco, CA, 2006.
[7] Li Tong-yan and Li Xing-ming, "New Criterion for Mining Strong Association Rules in Unbalanced Events," Intelligent Information Hiding and Multimedia Signal Processing, IEEE, 2008.
[8] R. Srikant and R. Agrawal, "Mining Generalized Association Rules," IBM Almaden Research Center, San Jose, California, 1995.
[9] Xing Xue, Chen Yao, and Wang Yan-en, "Study on Mining Theories of Association Rules and Its Application," IEEE, 2010.
[10] Arun K. Pujari, "Data Mining Techniques," Universities Press (India) Private Limited, 2001.
[11] Wei Yong-qing, Yang Ren-hua, and Liu Pei-yu, "An Improved Apriori Algorithm for Association Rules of Mining," IEEE, 2009.
[12] R. Uday Kiran and P. Krishna Reddy, "An Improved Multiple Minimum Support Based Approach to Mine Rare Association Rules," IEEE, 2009.
[13] Sandeep Singh Rawat and Lakshmi Rajamani, "Probability Apriori Based Approach to Mine Rare Association Rules," Data Mining and Optimization (DMO), Selangor, Malaysia, June 2011.
[14] Li-Min Tsai, Shu-Jing Lin, and Don-Lin Yang, "Efficient Mining of Generalized Negative Association Rules," Granular Computing, IEEE, 2010.
[15] Xiufen Piao, Zhanlong Wang, and Gang Liu, "Research on Mining Positive and Negative Association Rules Based on Dual Confidence," IEEE, 2011.
[16] G. Kundu, M. M. Islam, S. Munir, and M. F. Bari, "An Associative Classifier with Negative Rules," IEEE, 2008.