Download Association Rule Hiding by Heuristic Approach to Reduce Side

International Journal of Computer Applications (0975 – 8887) Volume 45– No.1, May 2012 Association Rule Hiding by Heuristic Approach to Reduce Side Effects & Hide Multiple R.H.S. Items Komal Shah Amit Thakkar Amit Ganatra U and P.U. Patel Department of Computer Engineering, Chandubhai S Patel Institute of Technology, Charotar University of Science and Technology, Changa, Gujarat, India Department of Information Technology, Chandubhai S Patel Institute of Technology, Charotar University of Science and Technology, Changa, Gujarat, India U and P.U. Patel Department of Computer Engineering, Chandubhai S Patel Institute of Technology, Charotar University of Science and Technology, Changa, Gujarat, India ABSTRACT Association rule mining is a powerful model of data mining used for finding hidden patterns in large databases. One of the great challenges of data mining is to protect the confidentiality of sensitive patterns when releasing database to third parties. Association rule hiding algorithms sanitize database such that certain sensitive association rules cannot be discovered through association rule mining techniques. In this study, we propose two algorithms, ADSRRC (Advanced Decrease Support of R.H.S. items of Rule Cluster) and RRLR (Remove and Reinsert L.H.S. of Rule), for hiding sensitive association rules. Both algorithms are developed to overcome limitations of existing rule hiding algorithm DSRRC (Decrease Support of R.H.S. items of Rule Cluster). Algorithm ADSRRC overcomes limitation of multiple sorting in database as well as it selects transaction to be modified based on different criteria than DSRRC algorithm. Algorithm RRLR overcomes limitation of hiding rules having multiple R.H.S. items. Experimental results show that both proposed algorithms outperform DSRRC in terms of side effects generated and data quality in most cases. General Terms Database, Data mining, Security. Keywords Association Rule Hiding, Data Mining, Privacy Preservation Data Mining. 1. INTRODUCTION Data mining technology aims to find useful patterns from large amount of data. These patterns represent knowledge and are expressed in decision trees, clusters or association rules. The knowledge discovered by various data mining techniques may contain private information about individual or business. Revelation of any private information may cause threat to security. For example, in medical database, it is useful to share information about diseases but at the same time it is required to preserve patient‟s identity. Here individual privacy must be maintained. Another example is market basket database which is used to analyze customer‟s purchasing behavior represented in terms of association rules. In market basket database, instead of data related to individuals, the sensitive information or knowledge derived from data is required to be protected. developed for modifying the original data in such that sensitive data and knowledge remains unrevealed even after the mining process. Association rule hiding is one of the privacy preservation techniques to hide sensitive association rules. All association rule hiding algorithm aims to minimally modify the original database such that no sensitive association rule is derived from it. In this paper we have proposed two association rule hiding algorithms, ADSRRC (Advanced Decrease Support of R.H.S. items of Rule Cluster) and RRLR (Remove and Reinsert L.H.S. of Rule), based on heuristic approach. Both algorithms are based on algorithm DSRRC proposed in [12]. Algorithm DSRRC depends on ordering of transactions for removing items from database. Also it requires sorting of database each time item is removed from database. Algorithm ADSRRC is proposed to overcome these limitations. Algorithm DSRRC cannot hide rule having multiple R.H.S. items. To overcome this limitation algorithm RRLR is proposed. 2. THEORETICAL BACKGROUND AND RELATED WORK Privacy is defined as “The right of an entity to be secure from unauthorized disclosure of sensible information that are contained in an electronic repository or that can be derived as aggregate and complex information from data stored in an electronic repository”[2]. PPDM is to conduct data mining operations under the condition of preserving data privacy [3]. PPDM investigates the side effects of data mining methods that originate from the penetration into the privacy of individuals and organizations [4]. The aim of PPDM algorithms is to extract relevant knowledge from large amounts of data while protecting at the same time sensitive information [1]. The aim of association rule hiding algorithms is to properly sanitize the original data so that any association rule mining algorithms that may be applied to the sanitized version of the data (i) will be incapable to uncover the sensitive rules under certain parameter settings, and (ii) will be able to mine all the non-sensitive rules that appeared in the original dataset (under the same or higher parameter settings) and no new rules generated. There are mainly 3 approaches for Association Rule Hiding (i) Heuristic Approach (ii) Border Based Approach (iii) Exact Approach [1]. In following, overview of these approaches is given in brief. Privacy preservation data mining (PPDM) considers problem of maintaining privacy in data mining. PPDM algorithms are 1 International Journal of Computer Applications (0975 – 8887) Volume 45– No.1, May 2012 2.1 Heuristic approach This approach involves efficient, fast and scalable algorithms that selectively sanitize a set of transactions from the original database to hide the sensitive association rules [1]. Various heuristic algorithms are based on mainly two techniques: Data distortion technique and blocking technique. Data distortion is done by the alteration of an attribute value by a new value. It changes 1‟s to 0‟s or vice versa in selected transactions to increase or decrease support or confidence of sensitive rule. Heuristic algorithms cannot give an optimal solution because of undesirable side effects to nonsensitive rules, e.g. lost rules and new rule. Algorithms proposed using heuristic approach can be divided into rule hiding and itemset hiding algorithms [6]. Five algorithms are proposed to hide sensitive information in database [6], among them three are rule hiding algorithms and two are itemset hiding algorithms. Later on itemset hiding algorithms to automatically hide sensitive rules without pre mining and selection of rules are proposed in [7] and [8]. These algorithms increase the support of L.H.S. of the rule or decrease the support of the R.H.S. of the rule by inserting and removing sensitive items from selected transactions respectively. An item set hiding algorithm which uses patterninversion tree is proposed in [9] to store related information so that only one scan of database is required. In [10] four heuristic algorithms are proposed which selects the sensitive transactions to sanitize based on degree of conflict and then removes items from selected transactions based on certain criteria like remove all items except item with highest frequency, remove item having smallest support, remove item with the maximum support or removes group of items sharing shame patterns. A rule hiding algorithm is proposed in [11], which correlates sensitive association rules and transactions by using a graph to effectively select the proper item for modification. Later on in [12], a rule hiding algorithm named DSRRC (Decrease Support of R.H.S. item of Rule Clusters) is proposed, which clusters the sensitive association rules based on R.H.S. of rules and hides as many as possible rules at a time by modifying fewer transactions. Because of less modification in database it helps maintaining data quality, but this algorithm cannot hide rules having multiple R.H.S. items and also it exhibits undesirable side effects. Blocking is the replacement of an existing value with a “?”. It inserts unknown values in the data to fuzzify the rules. In some applications where publishing wrong data is not acceptable, then unknown values may be inserted to blur the rules. In [13] two algorithms are built based on blocking for rule hiding. The first one focuses on hiding the rules by reducing the minimum support of the itemsets that generated these rules (i.e., generating itemsets). The second one focuses on reducing the minimum confidence of the rules. 2.2 Border based approach This approach hides sensitive association rule by modifying the borders in the lattice of the frequent and the infrequent itemsets of the original database. These algorithms use the theory of borders presented in [14]. The first frequent itemset hiding methodology that is based on the notion of the border is proposed in [15]. It maintains the quality of database by greedily selecting the modifications with minimal side effect. 2.3 Exact approach This approach contains nonheuristic algorithms which formulates the hiding process as a constraints satisfaction problem or an optimization problem which is solved by integer programming. These algorithms can provide optimal hiding solution with ideally no side effects. An exact algorithm for association rule hiding is proposed in [16] which tries to minimize the distance between the original database and its sanitized version. In [17] proposed an exact border based approach to achieve optimal solution as compared to previous approaches. The rest of this paper is organized as follows. In section 3, we discuss association rule mining strategy and problem of association rule hiding. We also discuss limitations of algorithm DSRRC. In section 4, a detailed description of proposed ADSRRC and example demonstrating ADSRRC is given. Section 5 contains detailed description of RRLR algorithm and example demonstrating this algorithm. In section 6 we analyze and discuss the performance results of proposed algorithms. Finally in section 7, we conclude by defining some future enhancements. 3. PROBLEM DEFINITION Let I = {i1,…., in} be a set of items. Let D be a set of transactions or database. Each transaction t Є D is an item set such that t is a proper subset of I. A transaction t supports X, a set of items in I, if X is a proper subset of t. Assume that the items in a transaction or an item set are sorted in lexicographic order. An association rule is an implication of the form X Y, where X and Y are subsets of I and X∩Y= Ø. The support of rule XY can be computed by the following equation: Support(XY) = |XY| / |D|, where |XY| denotes the number of transactions in the database that contains the itemset XY, and |D| denotes the number of the transactions in the database D. The confidence of rule is calculated by following equation: Confidence(XY) = |XY| / |X|, where |X| is number of transactions in database D that contains itemset X. A rule XY is strong if support(XY) ≥ min_support and confidence(XY) ≥ min_confidence, where min_support and min_confidence are two given minimum thresholds. Association rule mining algorithms scan the database of transactions and calculate the support and confidence of the rules and retrieve only those rules having support and confidence higher than the user specified minimum support and confidence threshold. Association rule hiding algorithms prevents the sensitive rules from being disclosed. The problem can be stated as follows: “Given a transactional database D, minimum confidence, minimum support and a set R of rules mined from database D. A subset RH of R is denoted as set of sensitive association rules which are to be hidden. The objective is to transform D into a database D‟ in such a way that no association rule in RH will be mined and all non sensitive rules in R could still be mined from D‟. The problem for finding an optimal sanitization to a database against association rule analysis has been proven to be NP-Hard [5]. For example, a sample transactional database D is shown in Table I. TID shows unique transaction number. Suppose MST and MCT are selected 3 and 75% respectively. The association rules satisfying MST and MCT, generated by apriori algorithm are ba, ad, da, bd, cd, dc, ec, ed, abd, bda, acd, ced, dec, and edc. Suppose the rules ba, bd and cd specified as sensitive and should be hidden in sanitized database. In order to hide an association rule, we can either decrease its support or its confidence to be smaller than pre-specified 2 International Journal of Computer Applications (0975 – 8887) Volume 45– No.1, May 2012 minimum support and minimum confidence threshold. To decrease the support of rule XY, we can decrease the support of corresponding large itemset XY. To decrease the confidence of rule XY, we can either increase support of X (i.e. L.H.S.) in transactions not supporting Y or we can decrease the support of Y (i.e. R.H.S.) in transactions supporting X and Y both. In algorithm DSRRC, authors decrease confidence of rule by removing R.H.S. of rule from selected transactions. To remove R.H.S. authors change value of an itemset from „1‟ to „0‟. They also use following concept of sensitivities specified in [12]. 1) Item Sensitivity is the frequency of data item exists in the number of the sensitive association rule containing this item. It is used to measure rule sensitivity. 2) Rule Sensitivity is the sum of the sensitivities of all items containing that association rule. Table II. Cluster and Transaction Sensitivities Cluster-1 RHS: d Rules:bd,cd Sensitivity :4 Item SensitiSensitiItem vity vity b c d 1 1 2 TID 1 2 3 4 5 6 7 8 9 4) Sensitive Transaction is the transaction in given database which contains sensitive item. 5) Transaction sensitivity is the sum of sensitivities of sensitive items contained in the transaction. Table I. Sample Database TID 1 2 3 4 5 6 7 8 9 Items abcde acd abdfg bcde abd cdefh abcg acde acdh b a 1 1 TID 1 2 3 4 5 6 7 8 9 Items abcde acd abdfg bcde abd cdefh abcg acde acdh C1 C2 4 2 3 3 2 4 3 2 3 - 2 3 3 - Table III. Sanitized Database by DSRRC Algorithm 3) Cluster Sensitivity is the sum of the sensitivities of all association rules in cluster. Cluster sensitivity defines the rule cluster which is most affecting to the privacy. They clusters the sensitive rules based on R.H.S. So in this example two clusters are made as shown in Table II. It also shows sensitivities of each item in cluster, sensitivity of each cluster and sensitivities of each transaction with respect to both clusters. C1 and C2 represent sensitivities of transaction with respect to cluster 1and 2 respectively. They first sort clusters in decreasing order of sensitivities and choose highest sensitive cluster first. Then transactions are sorted in decreasing order of their sensitivities for that cluster. Then highest sensitive transaction is selected for modification and removes the R.H.S. item of cluster from that transaction. Each time an item is modified, sensitivities are updated and transactions are sorted in decreasing order of their sensitivities. Thus for large database, it will affect the running time of algorithm. This process continues until all sensitive rules in each cluster are hidden. Final sanitized database is shown in Table III. In final sanitized database, all sensitive rules are successfully hidden (i.e. 0% Hiding Failure), almost 36% of the rules are missing (i.e. Misses Cost) and no artifactual rules generated and only one transaction is modified. Cluster-2 RHS: a Rules: ba Sensitivity :2 Items bc e acd abdfg bcde abd cdefh abcg acde acdh Table IV. Modified Database By Interchanging Rows Of Table I TID 1 2 3 4 5 6 7 8 9 Items bcde abd acd abcg acde abdfg cdefh abcde acdh Table V. Transaction Sensitivities and Final Sanitized Database TID 1 2 3 4 5 6 7 8 9 Items bcde abd acd abcg acde abdfg cdefh abcde acdh C1 4 3 3 3 3 3 4 3 C2 2 2 2 2 - TID 1 2 3 4 5 6 7 8 9 Items bce bd acd abcg acde abdfg cdefh abcde acdh It is observed that algorithm DSRRC is sensitive to the order of transaction in the database. For example if we interchange any two or more rows in database then it gives different result on same database. Consider original database shown in Table I and modified database by interchanging rows shown in Table IV. According to DSRRC algorithm two clusters are generated as shown in Table II. Then transactions are indexed 3 International Journal of Computer Applications (0975 – 8887) Volume 45– No.1, May 2012 as per sensitivities as shown in Table V. Because this algorithm selects transactions sequentially, for modified database, two transactions are updated. So it is analyzed that DSRRC is dependent on ordering of records in database. Ordering of rows greatly impacts the resultant modified database shown in Table V. In final sanitized database shown in Table V, there is 0% hiding failure and 36% Misses Cost and 27.28% of new rules are created as artifactual patterns. Also two transactions are modified in database. For same database, algorithm DSRRC exhibits different results if we simply interchange some of the transactions. Transaction selection method of DSRRC cannot select same transaction to modify for different instances of same database. Another limitation of algorithm DSRRC is that cannot hide rule having multiple RHS items. It hides only rules having single consequent. In next sections we are proposing two different algorithms ADSRRC and RRLR. ADSRRC overcomes the problem of transaction dependency by employing a different technique of transaction selection. Algorithm RRLR is extension of DSRRC such that it can hide rules multiple R.H.S. items. 4. PROPOSED ALGORITHM – ADSRRC ADSRRC is based on concept of sensitivities in given in [12]. Initially association rules are mined from the source database by using association rule mining algorithms e.g. Apriori algorithm. Then sensitive rules are specified from mined rules. Selected rules are clustered based on common R.H.S. item of the rules. Then transactions are indexed by sensitivities. In DSRRC transaction sensitivity is different for each cluster but ADSRRC calculates transaction sensitivity irrespective of clusters. It means for all clusters transaction sensitivity is same. After transaction indexing, ADSRRC sorts the transactions based on sensitivity. In DSRRC, each time item is removed, transactions are sorted. But in ADSRRC, first transactions are sorted in decreasing order of their sensitivity. Then transactions having same sensitivity are sorted in decreasing order of their length. When an item is removed from any transaction, sensitivity is not modified. Thus transactions are sorted only two times. Thus we are avoiding here multiple sorting of transactions which will significantly reduce execution time of algorithm for large database. After sorting process, rule hiding process, starts by selecting highest sensitive transaction for deleting R.H.S. item. If two transactions have same sensitivity then lengthiest transaction is chosen to be modified. This process continues until all the sensitive rules in all clusters are not hidden. Finally modified transactions are updated in original database and produced database is called sanitized database D‟ which ensures certain privacy for specified rules and maintains data quality. Algorithm ADSRRC INPUT: Source database D, Minimum Confidence Threshold (MCT), Minimum support threshold (MST). OUTPUT: The sanitized database D‟. 1. 2. 3. Begin Generate association rules. Selecting the Sensitive rule set RH with single antecedent and consequent e.g. xy. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. Clustering-based on common item in R.H.S. of the selected rules. Find sensitivity of each item in each cluster. Find the sensitivity of each rule in each cluster. Find the sensitivity of each cluster Find sensitivity of each item i∈ D. For each transaction t∈ D { Sensitivity of t = 0; For each item i ∈ t { Sensitivity of t = Sensitivity of t + Sensitivity of i } } Sort transactions in decreasing order of their sensitivity. If two or more transactions have same sensitivity then sort those in decreasing order of their length. Sort generated clusters in decreasing order of their sensitivity. For each cluster c∈ C { While(all the sensitive rules ∈ c are not hidden) { Take first transaction and delete common R.H.S. item from the transaction. If this transaction doesn‟t contain that item then select next sensitive transaction in order. For i = 1 to no. of rule RH∈c { Update support and confidence of the rule r ∈c. If(support of r < MST or confidence of r < MCT) { Remove Rule r from RH } } Take next transaction. } End while } End for Fig 1: Algorithm ADSRRC Fig. 1 shows proposed algorithm ADSRRC. Steps 1 to 3 generates association rule and selects sensitive association rules. Step 4 clusters sensitive rules based on common R.H.S. item. Then in step 5 we are finding sensitivities of each item in cluster. Rule sensitivity and cluster sensitivity is found in steps 6 and 7. In step 8 total sensitivity of each item is found. Step 9 to 16 describes the method to find sensitivities of each transaction. It is calculated by summing sensitivities of all items belongs to that transaction. In step 17, transactions are sorted in decreasing order of their sensitivity. If two or more transactions have same sensitivity then they are sorted in decreasing order of their length. From step 18 onwards, rule hiding process starts. This process is similar as in DSRRC. But it does not modify sensitivity of transaction at each removal of item and it does not sort transactions during rule hiding process. Process of rule hiding continues until all sensitive rules are hidden. Considering previous example of modified database, shown in Table IV, this algorithm also generates two clusters from selected sensitive rules, shown in Table II. As shown in step 8 of ADSRRC, sensitivity of each item is calculated. For example item „b‟ has sensitivity 1 in cluster 1 and sensitivity 1 4 International Journal of Computer Applications (0975 – 8887) Volume 45– No.1, May 2012 in cluster 2. So total sensitivity of item b is equals to 2. For transaction sensitivity, sensitivities of each item appearing in that transaction is added. Table VI shows sensitivity for all items in database as well for all transactions and it also shows transaction length. Now transaction with highest sensitivity, which is transaction having TID = 8, is chosen. According to algorithm ADSRRC, for first cluster, item „d‟ is removed from this transaction. Now all rules in cluster 1 are hidden. For second cluster again same transaction is chosen and item „a‟ is removed from it. This process continues until all sensitive rules in all clusters are hidden. Table VII shows final sanitized database. As compared to DSRRC, only one transaction is modified in final database and no new rule is generated. Table VI. Item and Transaction Sensitivity Item Sensitivity a b c d e f g h 1 2 1 2 0 0 0 0 TID Items 1 2 3 4 5 6 7 8 9 bcde abd acd abcg acde abdfg cdefh abcde acdh SensitiLength vity 5 5 4 4 4 4 3 6 4 4 3 3 4 4 5 5 5 4 and Deletion Process. It will not update the sensitivity of transactions during rule hiding. Hiding process starts from highest sensitive transaction and continues until all the sensitive rules are not hidden. Finally modified transactions are updated in original database and produced database is called sanitized database which ensures certain privacy for specified rules and maintains data quality. Algorithm RRLR INPUT: Source database D, Minimum Confidence Threshold (MCT), Minimum support threshold (MST). OUTPUT: The sanitized database D‟. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. Table VII. Final Sanitized Database TID 1 2 3 4 5 6 7 8 9 Items bcde abd acd abcg acde abdfg cdefh bce acdh 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 5. PROPOSED ALGORITHM – RRLR Concept of sensitivity is also used in RRLR algorithm. This algorithm hides sensitive association rules having multiple RHS items. In algorithm RRLR, to hide sensitive association rule we are decreasing support and confidence both. For rule xyz, support of rule can be decreased by decreasing support of large itemset „xyz‟. We are applying here LHS deletion process to decrease the support of large itemset „xyz‟. To decrease the confidence of rule xyz we are using LHS insertion process. In LHS insertion, confidence of rule can be decreased by inserting LHS of rule in transaction not supporting RHS of rule. Initially association rules are mined from the source database by using association rule mining algorithms e.g. Apriori algorithm. Then sensitive rules are specified from mined rules. Then transaction sensitivity and item sensitivity is found. Transactions are sorted in decreasing order of their sensitivity and length. These all steps are similar to ADSRRC algorithm except that in RRLR we are not creating clusters. After sorting process, rule hiding process hides all the sensitive rules in sorted transactions by using LHS insertion 24. 25. 26. 27. 28. 29. 30. 31. 32. Begin Generate association rules. Selecting the Sensitive rule set RH with single antecedent and multiple consequent e.g. xyz Find sensitivity of each item i∈ D. For each transaction t∈ D { Sensitivity of t = 0; For each item i ∈ t { Sensitivity of t = Sensitivity of t + Sensitivity of i } } Sort transactions in decreasing order of their sensitivity. If two or more transactions have same sensitivity then sort those in decreasing order of their length. Sort sensitive rules of RH in decreasing order of their confidence. While (all the sensitive rules are not hidden) { For i=1 to no. of rules ∈ RH { Select rule Ri (for example xyz) for hiding LHS Deletion Process For j = 1 to no. of transactions ∈ D { If (itemset xyz ∈ transaction tj) Remove LHS of rule Ri from transaction tj and start LHS Insertion process } LHS Insertion Process For k = j to no. of transactions ∈ D { If (RHS of rule (e.g. itemset „yz‟) does not belongs to transaction tk and LHS of rule (item „x‟) does not belongs to transaction tk ) Insert LHS of rule Ri in transaction tk and start modification of support and confidence } Modification of Support and Confidence Update support and confidence of all the rule belongs to RH If(support of Ri < MST or confidence of Ri < MCT) Remove Rule Ri from RH Else repeat LHS deletion and insertion }//End for } //End while Fig 2: Algorithm RRLR Fig. 2 shows the algorithm of RRLR. In steps 1 to11 we are mining rules, selecting sensitive rules, finding sensitivities of items and transactions as well as sorting transaction is done. Then LHS deletion and insertion process is specified, which is explained by example as following. 5 International Journal of Computer Applications (0975 – 8887) Volume 45– No.1, May 2012 Considering database shown in Table VIII . Following rules are mined from database under support = 50% and confidence = 70% : e c, hc, hd, dec, dhc, chd, hcd, ac, dc, adc, gb, ed, abc, ced, ecd, da, ad, cda, acd, ca, cd, bca, bc, dac, acd. Suppose rules hcd (confidence=100%), ecd (confidence =83%) and dac (confidence =70%) are sensitive and needs to be removed. First sensitivity of each item is calculated. For example item „d‟ is appearing in all sensitive rules. So its total sensitivity is calculated as 1+1+1 = 3. For transaction sensitivity, sensitivities of each item appearing in that transaction is added. Table IX shows item sensitivity and transactions sorted in decreasing order of sensitivity and length. Table VIII. Sample Database TID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Itemset acd acdh abcdefh abdfg bcde abc cdefh abcg bcefg acdg bfg abcde acdefh bg abcdh Table IX. Item and Transaction Sensitivities Item Sensit ivity a b c d e f g h 1 0 3 3 1 0 0 1 TID 3 13 7 12 15 2 5 10 1 4 9 8 6 11 14 Itemset Sensiti Length vity abcdefh acdefh cdefh abcde abcdh acdh bcde acdg acd abdfg bcefg abcg abc bfg bg Table X. Final Sanitized Database TID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Itemset acd acdh abcf abdefgh bcde abc cdefh abcg bcdefg acdg bfg abcde acdefh bg abcdh 9 9 8 8 8 8 7 7 7 4 4 4 4 0 0 7 6 5 5 5 4 4 4 3 5 5 4 3 3 2 According to algorithm RRLR all sensitive rules are sorted in decreasing order of their confidence. Then rule having highest confidence is chosen for hiding. So rule hcd is chosen first for hiding. Transaction having TID = 3 is most sensitive transaction. So it is chosen to be modified. To hide a rule two procedures are developed: LHS Deletion and LHS Insertion. LHS deletion process deletes LHS of sensitive rule from the selected transaction. So item „h‟ is deleted from transaction having TID = 3. So support of large itemset „hcd‟ is decreased. After deletion, LHS item is inserted in most sensitive transaction not having large itemset „cd‟ and no item „h‟. Thus transaction having TID = 4 is chosen and „h‟ is inserted into it. So confidence of hcd is decreased due to increase in support of „h‟ item. After deletion and insertion confidence and support of rule hcd is modified. The process of deletion and insertion continues until all rules are hidden in sensitive rule set. Table X shows final sanitized database. Now, if we mine association rules from final sanitized database, we can see that all of the sensitive rules are hidden and very few side effects produced. 6. ANALYSIS OF PROPOSED ALGORITHM For performance comparison, we have selected algorithm DSRRC because both proposed algorithms are based on this algorithm. We used algorithm DSRRC and ADSRRC to sanitize sample database as shown in Table IV and applied apriori algorithm on sanitized database produced by both algorithms to mine association rules. In our experiment, we selected 3 sensitive association rules as in example. The sample dataset has 14 association rules with support count ≥ 3 and confidence ≥ 75. Then we evaluated proposed algorithm with respect to following parameters: hiding failure (HF), misses cost (MC), artifactual patterns (AP), dissimilarity (DISS), and completeness etc. A detailed overview of these parameters is given in [18]. As shown in Table XI, performance of ADSRRC is better than DSRRC, in terms of number of sorting, artifactual patterns, percentage of transaction modified and completeness. To hide rules having multiple R.H.S. items, we have applied algorithm RRLR on database shown in Table VIII. Then we have applied apriori algorithm on sanitized database produced by RRLR algorithm to mine association rules. Results with respect to different parameters are shown in Table XI. It is shown that performance of RRLR is better than DSRRC, in terms of sorting, misses cost, artifactual patterns, dissimilarity, transaction modified and completeness. 6 International Journal of Computer Applications (0975 – 8887) Volume 45– No.1, May 2012 IEEE Transactions on Knowledge and Engineering, vol. 16, no. 4, pp. 434-447, 2004. Table XI. Comparative Analysis With DSRRC Algorithm Parameters No. of Sorting Hiding Failure Misses Cost Artifactual Patterns Dissimilarity Transaction Modified Completeness DSRRC > Two 0% 36.36% 27.28% 5.40% 22.22% 77.78% ADSRRC Two 0% 36.36% 0% 5.40% 11.11% 88.89% RRLR Two 0% 22.73% 0% 0% 20% 80% 7. CONCLUSION AND FUTURE EXTENSIONS In this paper, all existing approaches for association rule hiding are briefed. Then two algorithms are proposed based on item heuristic approach. Algorithm ADSRRC is modification of algorithm DSRRC such that side effects and time complexity both are reduced. Algorithm RRLR is proposed for hiding rules having multiple items in RHS. Both algorithms are analyzed with respect to hiding failure, misses cost, artifactual patterns, dissimilarity and completeness by taking suitable examples. In future algorithm ADSRRC can be extended for hiding rules which are not from transactional data base. A more general rule hiding algorithm can be proposed. In algorithm RRLR, clustering of association rule is not done. So in future it can be incorporated to it. Currently algorithm RRLR hides rules only having multiple RHS items. It can be extended to multiple LHS and multiple RHS items. 8. REFERENCES [1] ArisGkoulalas–Divanis; Vassilios S. Verykios “Association Rule Hiding For Data Mining” Springer, DOI 10.1007/978-1-4419-6569-1, Springer Science + Business Media, LLC 2010 [2] Elisa Bertino; Dan Lin; Wei Jiang,“ A Survey of Quantification of Privacy Preserving Data Mining Algorithms”, Springer, Pages: 183–205,2008 [3] RamakrishnanSrikant;”Privacy Preserving Data Mining: Challenges and Opportunities”, PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Springer-Verlag London, UK , 2002 [4] [5] [6] Ahmed K. Elmagarmid; Amit P. Sheth , “PrivacyPreserving Data Mining Models and Algorithms ADVANCES IN DATABASE SYSTEMS - Volume 34” M. Atallah, E. Bertino, A. Elmagarmid, M. Ibrahim, and V. S. Verykios “Disclosure limitation of sensitive rules,”.In Proc. of the 1999 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX’99), pp. 45–52, 1999. Vassilios S. Verykios, A.K. Elmagarmid, E. Bertino, Y. Saygin, and E. Dasseni, “Association Rule Hiding,” Data [7] Shyue-Liang Wang; Bhavesh Parikh,; Ayat Jafari, “Hiding informative association rule sets”, ELSEVIER, Expert Systems with Applications 33 (2007) 316– 323,2006 [8] Shyue-LiangWang ;Dipen Patel ;Ayat Jafari ;Tzung-Pei Hong, “Hiding collaborative recommendation association rules”, Published online: 30 January 2007, Springer Science+Business Media, LLC 2007 [9] Shyue-Liang Wang; Rajeev Maskey; Ayat Jafari; TzungPei Hong “ Efficient sanitization of informative association rules” ACM , Expert Systems with Applications: An International Journal, Volume 35, Issue 1-2, July, 2008 [10] R. M. Oliveira; Osmar R. Za¨_ane, “Privacy Preserving Frequent Itemset Mining”, IEEE International Conference on Data Mining Workshop on Privacy, Security, and Data Mining, Maebashi City, Japan. Conferences in Research and Practice in Information Technology, Vol. 14.2002 [11] Chih-Chia Weng; Shan-Tai Chen; Hung-Che Lo, “A Novel Algorithm for Completely Hiding Sensitive Association Rules”, IEEE Intelligent Systems Design and Applications, 2008.,vol 3, pp.202-208, 2008 [12] Modi, C.N.; Rao, U.P.; Patel, D.R., “Maintaining privacy and data quality in privacy preserving association rule mining”, IEEE 2008 Seventh International Conference on Machine Learning and Applications, pp 1-6, 2010 [13] Y.Saygin, V. S. Verykios, and C. Clifton, “Using Unknowns to Prevent Discovery of Association Rules,” ACM SIGMOD, vol.30(4), pp. 45–54, Dec. 2001 [14] H. Mannila and H. Toivonen, “Levelwise search and borders of theories in knowledge discovery,” Data Mining and Knowledge Discovery, vol.1(3), pp. 241– 258, Sep. 1997. [15] X. Sun and P. S. Yu. Hiding sensitive frequent itemsets by a border–based approach. Computing science and engineering, 1(1):74–94, 2007. [16] A. Gkoulalas-Divanis and V.S. Verykios, “An Integer Programming Approach for Frequent Itemset Hiding,” In Proc. ACM Conf. Information and Knowledge Management (CIKM ’06), Nov. 2006. Gkoulalas-Divanis and V.S. Verykios, “Exact Knowledge Hiding through Database Extension,” IEEE Transactions on Knowledge and Data Engineering, vol. 21(5), pp. 699–713, May 2009. [17] A. [18] Charu C. Aggarwal, Philip S. Yu, Privacy-Preserving Data Mining: Models and Algorithms. Springer Publishing Company Incorporated, 2008, pp. 267-286. 7

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Association Rule Hiding by Heuristic Approach to Reduce Side