Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
International Journal of Computer Engineering and Applications, Volume V, Issue II, Feb.14 A SURVEY OF POST MINING OF ASSOCIATION RULES AND HIGH SPEED ASSOCIATION RULE MINING D.William Albert1,Dr.K.Fayaz 2, D.Veerabhadra Babu 3 1 Department of Computer Science & IT, Mahatma Gandhi University, Meghalaya, India 2 Department of Computer Science & Technology, SK University,Anantapuram, India 3 Department of Computer Science & IT, Mahatma Gandhi University, Meghalaya, India ABSTARCT: Association rule mining has widely been used to extract trends or patterns that satisfy given support and confidence. The trends are then transformed into business intelligence for expert decision making. However, association rule mining often results in huge number of rules. Users might feel it difficult to identify those rules that are of interest to them. Therefore post mining of association rules is essential to get rid of insignificant rules besides summarizing and visualizing the discovered rules. Recent innovations with respect to GPU (Graphics Processing Unit) technologies help leverage the high speed processing for association rule mining. This paper makes a systematic analysis of various post mining techniques required for discovering actionable knowledge from large number of association rules. It also throws light into the GPU and its usage in high speed association rule mining. We also evaluate the techniques based on the speed, accuracy, time and efficiency. Keywords –Data mining, association rule mining, post mining techniques, high speed association rule mining [1] INTRODUCTION Association rule mining has been around which one of the data mining techniques is widely used in real world applications. The rules obtained bestow the trends that can reveal business intelligence which can be used to make well informed business decisions. According to Boettcher et al. [1] association rule mining discovers the hidden relationships or correlations among the attributes. In this paper we analyzed methods for association rule mining, methods used for post mining of association rules and high speed association rule mining. Association rule mining which is meant for discovering trends or patterns in given data set has very important utility in the real world enterprises. It is often used to make expert decisions that lead to profit. Various methods or techniques used in association rule mining are explored in [2], [3], [4], [5], [6], [7], [8], [9], [10], [11] as presented in this paper. This paper also focuses on various methods or techniques used for post mining of association rules as explored in [12], [13], [14], [15], [16], [17], [18], [19] and [20]. Finally it throws light into the technological innovations that yield Graphics Processing Unit (GPU) and the usage of it for general purpose computing applications in order to achieve high speed association rule mining. The high speed association rule mining [21], [22], [23], [24] and [25] the architecture of GPU and the results of research pertaining to the high speed association rule mining are analyzed in this paper. 77 D.William Albert, Dr.K.Fayaz and D.Veerabhadra Babu A Survey of Post Mining of Association Rules and High Speed Association Rule Mining Our contributions in this paper include the analysis of association rule mining, its techniques; post mining of association rules and techniques used in the real world; architecture of GPU and its applications for high speed association rule mining. The remainder of this paper is structured as follows. Section II provides information about methods of association rule mining. Section III bestows the analysis of post mining of association rules. Section IV focuses on high speed association rule mining while section V concludes the paper. [2] METHODS OF ASSOCIATION RULE MINING Basically association rule mining in the discipline that discovers the hidden correlations among the attributes of a data source. Boukerche and Samarah [2] applied association rule mining on the huge amount of data generated by Wireless Sensor Networks (WSNs). The generated association rules provided the correlations among the sensors that help in making decisions. These researchers also association rule mining mechanisms that are based on coverage and in-network data reduction that leverage correlations among the sensors with respect to locations as well [3]. Association rule mining has been applied to various kinds of data. Shyu et al. [4] applied association rule mining technique on temporal-spatial data obtained from geospatial image databases. Ding et al. [5] applied association rule mining techniques on spatial data. Remote Sensed Imagery (RSI) database was used for their experiments. For compressed and lossless representation of such data, they used Peano Count Tree (PCT) structure. They produced comparatively better results when compared with existing algorithms such as Apriori and FP-growth. Nestorov and Jukić [6] applied ad hoc association rule mining to data warehouses as they are information rich. Their approach is comprehensive to discover association rules from transactional data as well as historical data which are represented in star schema. Artificial Intelligence (AI) was also used for mining association rules. Huang and Hu [7] applied AI techniques in association with conventional framework to mine association rules for expert decision making. The actual techniques used by them are neural networks and Self-Organization Map (SOM) besides rough set theory. Pang and Kasyanov [8] used SVM classification trees as input for mining association rules.SVM is a famous supervised learning model with underlying algorithms for learning. It is best used to discover patterns and used forregression analysis and classification.Thus they achieved better comprehensibility from the rules that are the result of SVM. They also worked out on the encoding and decoding of rules for knowledge extraction and usage of it in the real world applications. Couturier et al. [9] improved the presentation of knowledge represented in the form of association rules. They proposed a framework that integrates the data mining process for producing association rules and also visualized the association rules for human computer interaction (HCI). However, they did not integrate the tool with other clustering algorithms. Pan et al. [10] focused on association rule mining with knowledge of a domain expert. As domain experts are aware of the useful support and confidence required for accurate decision making, their technique has assumed 78 D.William Albert, Dr.K.Fayaz and D.Veerabhadra Babu International Journal of Computer Engineering and Applications, Volume V, Issue II, Feb.14 significance. Choi et al. [11] improved the general usage of association rules. They focused on prioritizing the association rules based on the business needs and the associated criteria. They achieved synergic effect of this and the capabilities of data analysis tools. Thus they could address quality, quantity and conflicts problems in association rule mining. Safaei [12] focused on generating fine-grained association rules that can provide comprehensive knowledge for better decision making. The researcher also worked on the post mining process in order to merge association rules that have been discovered in order to make them more meaningful and useful. Au and Chan [13] proposed a novel fuzzy data mining technique that makes use of discovered association rules and find the changes in association rules over time. In fact they mined fuzzy rules in order to discover changes. These researchers could predict well about the changes in the association rules in future based on their fuzzy rules analysis. Lee, Hong and Wang [14] proposed a framework for data mining that produces fuzzy multiple-level association rules from given dataset. Moreover the results are obtained based on the given multiple minimum supports with respect to itemsets and taxonomy. The algorithm they proposed proved to be simple and effective in generating fuzzy multiple-level association rules. Textual data can also be used for mining association rules. Latiri and Yahia [15] used taxonomy and formal concept analysis (FCA) for generating implicit association rules from textual data. Their approach has two phases. In the first phase they extract formal concepts from textual data explicitly. In the second phase, they use an algorithm in order to extract implicit association rules. Zhong, Li and Wu [16] presented a pattern discovery method that produced association rules as part of text mining. They make some sort of post mining on the generated patterns in order to update then and finding relevant, interesting information. Xie and Liu [17] focused on association rules and attribute implications as two kinds of knowledge. They proposed a framework that makes use of intent reducts and approximate intent reducts in order to discover the two kinds of knowledge. Cherfi, Napoli and Toussaint [18] applied association rule extraction method to textual data. They introduced two classification methods based on classical numerical approach and domain knowledge based approach. They combined the semantic techniques and data mining for post mining of association rules. They built a knowledge model that can be updated incrementally. Singh et al. [19] proposed an optimization algorithm that performs online data mining for generating association rules. They preferred graph to lattice for representing knowledge and generating rules. Mart´ınez-Ballesteros et al. [20] applied evolutionary algorithms for finding genegene interrelations. In fact, they extracted these relationships from already discovered quantitative association rules. Their approach also generated some unknown relationships that were considered as hypotheses for further research. Privacy preserving data mining has become very important recently. Giannotti et al. [21] performed experiments on transactional databases in order to mine association rules while preserving privacy of data. They devised an attack model that helped in ensuring the validity of the solution for privacy preserving data mining. Saygin et al. [22] introduced metrics that help in finding how the data mining technique used for mining association rules in actually has privacy preserving element. 79 D.William Albert, Dr.K.Fayaz and D.Veerabhadra Babu A Survey of Post Mining of Association Rules and High Speed Association Rule Mining [3] POST MINING OF ASSOCIATION RULES Thakkar, Mozafari and Zaniolo [23] proposed an architecture where mining process and post mining are combined into a workflow that provides high quality of services with data sources including the modern data stream management systems. They achieved the high level of mining with SWIM algorithm. Their framework supports many activities such as clustering, trend analysis, and summarization and so on. Continuous post mining process is kept part of the framework that improves quality of service besides reducing the execution time. Jiang and Chen [24] applied post mining of association rules in sensor network. They used post mining on association rules there were meant for sensor data estimation. In fact they focused on informative and non-redundant association rules based on the user interests. Thus the number of rules is reduced which results in improving quality of service. The results of the experiments made by these authors are presented in the following graphs. Figure 1 –Average and Maximum Estimation of Accuracy for Traffic Monitoring Application [24] Figure 2 –Running Time for Traffic Monitoring Application 80 D.William Albert, Dr.K.Fayaz and D.Veerabhadra Babu International Journal of Computer Engineering and Applications, Volume V, Issue II, Feb.14 Figure 3 – Memory Usage for Traffic Monitoring Application As can be seen in Figure 1, 2 and 3 it is evident that the proposed approach for post mining in [24] consumes less memory besides giving high accuracy. However, it takes relatively more time to complete the processing. Marinica and Guillet [25] explored ontologies for postminig of association rules in an interactive mode. Their framework for interactive post mining is as presented in Figure 1. Figure 4 - Illustrates Ontology Based Interactive Post Mining [25] As can be seen in Figure 4it is evident that association rules are extracted from database. Afterwards, the user knowledge which is in the form of ontology, rule schemas and operators is applied for post mining which results in filtered rules. The filtered rules are more accurate and can be used directly for decision making. Sulthana and Murugeswari [26] also focused on interactive post mining using ontologies. Their approach is named as “ARIPSO” which is meant for filtering or pruning the discovered rules using rule schema and ontologies. 81 D.William Albert, Dr.K.Fayaz and D.Veerabhadra Babu A Survey of Post Mining of Association Rules and High Speed Association Rule Mining [4] HIGH SPEED ASSOCIATION RULE MINING Owens et al. [27] described GPU (Graphics Processing Unit) as powerful, programmable and highly parallel processing unit for general purpose computing application for high speed processing. The very important characteristic of GPU is parallel computing which is required by the real world applications. The power of GPU out performs that of the conventional CPU (Central Processing Unit). The rapid growth in GPU computing has attracted many researchers in the field to focus on leveraging the power of GPU for general purpose applications. The architecture of modern GPU is as shown in Figure 5. Figure 5 – Illustrates the architecture of Modern GPU [27] As can be seen in Figure 5, it is evident that the GPU architecture is flexible for new hardware innovations besides providing parallel programming a boost. These are the two important characteristics that make GPU useful in real world applications. When GPU is used for data mining, for instance association rule mining, there exists high speed processing with respect to association rule mining. The CPU and GPU performance comparison is shown in Figure 6. Figure 6 – Performance of CPU vs. GPU [27] 82 D.William Albert, Dr.K.Fayaz and D.Veerabhadra Babu International Journal of Computer Engineering and Applications, Volume V, Issue II, Feb.14 As can be seen in Figure 6, the scan performance of graphics based GPU (OpenGL); directcompute GPU (CUDA) is presented. The results reveal that OpenGL and CUDA which are forms of GPU computing exhibit high performance. Daga et al. [28] analyzed the efficiency of using a fused CPU and GPU in order to leverage parallel computing. They used kernel benchmarks and micro benchmarks and actual applications in order to demonstrate the efficiency of fused approach. These authors made experiments with Nacate APU which is the fused form of both CPU and GPU, Radeon HD 5870, and Radeon HD 5450 in terms of size and bandwidth with respect to host to device and device to host. The results are as shown in Figure 7 and Figure 8. Figure 7 –Bandwidth Test (Host to Device) [28] As shown in Figure 7, it is evident that the size and bandwidth are represented by horizontal and vertical axes respectively. The results revealed that the fusion approach exhibits better performance. Figure 8 – Bandwidth Test (Device to Host) [28] As shown in Figure 8, it is evident that the size and bandwidth are represented by horizontal and vertical axes respectively. The results revealed that the fusion approach exhibits better performance. 83 D.William Albert, Dr.K.Fayaz and D.Veerabhadra Babu A Survey of Post Mining of Association Rules and High Speed Association Rule Mining [5] CONCLUSION In this paper we studied the association rule mining which is one of the techniques in data mining. We focused on post mining of association rules and high speed association rule mining as well. The post mining helps in reducing the number of discovered rules while the high speed association rule mining leverages the power of Graphics Processing Unit (GPU). Our analysis has resulted in certain important facts which are described here. Post mining when incorporated into the mining framework provides better performance when compared to the linear usage. Post mining could efficiently reduce the scope of rules and improved the decision making process. GPU helped to make use of recent innovations in computing to speed up the process of data mining. For instance, it can be used for high speed association rule mining. In fact GPU leveraged the speed of general purpose computing besides making the parallel computing possible. Another interesting fact is that the fusion of CPU and GPU in some architecture also proved to be effective in high speed processing required for mining association rules. 84 D.William Albert, Dr.K.Fayaz and D.Veerabhadra Babu International Journal of Computer Engineering and Applications, Volume V, Issue II, Feb.14 REFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] Mirko Boettcher, Georg Ruß, DetlefNauck and Rudolf Kruse. (2009). From Change Mining to Relevance Feedback: A Unified View on Assessing Rule Interestingness. IGI.p12-37. AzzedineBoukerche and SamerSamarah. (2007). An Efficient Data Extraction Mechanism for Mining Association Rules from Wireless Sensor Networks. IEEE.p3936-3941. AzzedineBoukerche and SamerSamarah.(2009). In-Network Data Reduction and CoverageBased Mechanisms for Generating Association Rules in Wireless Sensor Networks. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY.58 (8), p4426-4438. Chi-RenShyu, Matt Klaric, Grant Scott and Wannapa Kay Mahamaneerat.(2006). Knowledge Discovery by Mining Association Rules and Temporal-Spatial Information from Large-Scale Geospatial Image Databases. IEEE.p17-20. Qin Ding, Qiang Ding and William Perrizo. (2008). PARM—An Efficient Algorithm to Mine Association Rules From Spatial Data. IEEE.58 (6), p1513-1524. SvetlozarNestorov and NenadJukić.(2002). d-Hoc Association-Rule Mining within the Data Warehouse. IEEE.p1-10. ZHE AUANG and WN-QUAN HU. (2003). Applying AI Technology And Rough Set Theory To Mine Association Rules For Supporting Knowledge Management. IEEE.1820-1825. Shaoning Pang and NikKasabov. (2008). r-SVMT: Discovering the Knowledge of Association Rule over SVM Classification Trees. IEEE.p2486-2493. Olivier Couturier, TarekHamrouni, Sadok Ben Yahia and EngelbertMephuNguifo. (2007). A scalable association rule visualization towards displaying large amounts of knowledge. IEEE.p1-7. Haiwei Pan, Qilong Han, Guisheng Yin, Wei Zhang and Jianzhong Li. (2008).Association Rule Mining with Domain Knowledge Constraint.IEEE. p273-280. Duke Hyun Choia, ByeongSeokAhnb and SoungHie Kim. (2005). Prioritization of association rules in data mining: Multiple criteria decision approach. ELSEVIER.p867-878. MarjanehSafaei. (2009). Fine-Grained Association Rules toward Knowledge Discovery. IEEE.p1-4. Wai-Ho Au and Keith C.C. Chan. (2002).Fuzzy Data Mining for Discovering Changes in Association Rules over Time. IEEE.p890-895. Yeong-Chyi Lee, Tzung-Pei Hong and Tien-Chin Wang.(2006). Mining Fuzzy Multiple-level Association Rules under Multiple Minimum Supports.IEEE.p4112-4117. Ch. CherifLatiri and S. BenYahia. (2001). Generating Implicit Association Rules from Textual Data. IEEE.p137-143. NingZhong, Yuefeng Li and Sheng-Tang Wu.(2012). Effective Pattern Discovery for Text Mining. IEEE.24 (1), p30-44. ZhipengXie and Zongtian Liu. (2005). From Intent Reducts for Attribute Implications to Approximate Intent Reducts for Association Rules. IEEE.p1-8. HacèneCherfi, Amedeo NAPOLI and Yannick TOUSSAINT. (2009). A Conformity Measure using Background Knowledge for Association Rules: Application to Text Mining. ACM.p1-17. Archana Singh, MeghaChaudhary, Dr (Prof.) Ajay Rana and GauravDubey.(2006). Online Mining of data to Generate Association Rule Mining in Large Databases. IEEE.p126-131. M. Mart´ınez-Ballesteros, I. Nepomuceno-Chamorro, J.C. Riquelme. (2011). Inferring GeneGene Associations from Quantitative Association Rules. IEEE.p1241-1246. FoscaGiannotti, Laks V. S. Lakshmanan, Anna Monreale, Dino Pedreschi, and Hui (Wendy) Wang. (2013). Privacy-Preserving Mining of Association Rules From Outsourced Transaction Databases. IEEE.7 (3), p385-395. Y¨ucelSaygın, Vassilios S. Verykios and Ahmed K. Elmagarmid.(2002). Privacy Preserving Association Rule Mining. IEEE.p1-8. HetalThakkar, BarzanMozafari and Carlo Zaniolo.(n.d). Continuous Post-Mining of Association Rules in a Data Stream Management System.IEEE.p1-18. 85 D.William Albert, Dr.K.Fayaz and D.Veerabhadra Babu A Survey of Post Mining of Association Rules and High Speed Association Rule Mining [24] Nan Jiang and Zhiqiang Chen.(2010). Post Mining of Non-redundant Association Rules for Sensor Data Estimation. IEEE.p102-106. [25] Claudia Marinica and FabriceGuillet. (2010). Knowledge-Based Interactive Postmining of Association Rules Using Ontologies. IEEE.22 (6), p784-797. [26] A.RaziaSulthana and B.Murugeswari. (2011). ARIPSO : Association Rule Interactive Postmining Using Schemas And Ontologies. IEEE.p941-946. [27] John D. Owens, Mike Houston, David Luebke, Simon Green, John E. Stone, and James C. Phillips. (2008). GPU Computing . IEEE.96 (5), p879-899. [28] MayankDaga, Ashwin M. Aji and Wu-chunFeng.(n.d). On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing. IEEE.p1-9. 86 D.William Albert, Dr.K.Fayaz and D.Veerabhadra Babu