International Journal of Innovations & Advancement in Computer Science
IJIACS ISSN 2347 – 8616 Volume 4, Special Issue May 2015

Analysis of Game Theoretic Approach in Data Mining Security

Asha Baby, Anoop Jose, Jisha C.T.
Assistant Professor, Dept. of CSE
Vimal Jyothi Engineering College, Chemperi

Abstract: The growing popularity and development of data mining technologies pose a serious threat to the security of individuals' sensitive information. Analysis of privacy-sensitive data in a multiparty environment often assumes that the parties are well-behaved and abide by the protocols: they compute whatever is needed, communicate correctly following the rules, and do not collude with other parties to expose a third party's sensitive data. This paper offers a more realistic formulation of the privacy-preserving data mining (PPDM) problem as a multi-party game in which each party tries to maximize its own objectives. It offers a game-theoretic framework for developing and analyzing new robust PPDM algorithms.

Keywords: Data Mining, Sensitive Information, Privacy-Preserving Data Mining, Anonymization, Provenance, Game Theory, Privacy Auction, Anti-Tracking, Nash Equilibrium

I. INTRODUCTION

The term "data mining" is often treated as a synonym for "knowledge discovery from data" (KDD), a term that highlights the goal of the mining process: discovering interesting patterns and knowledge. There are four different types of users in data mining:

a) Data Provider: the user who owns some data that are desired by the data mining task.
b) Data Collector: the user who collects data from data providers and then publishes the data to the data miner.
c) Data Miner: the user who performs data mining tasks on the data.
d) Decision Maker: the user who makes decisions based on the data mining results in order to achieve certain goals.

Fig 1. Data Mining Cycle

II. STEPS IN KNOWLEDGE DISCOVERY FROM DATA

To obtain useful knowledge from data, the following steps are performed iteratively:

Step 1: Data preprocessing: Basic operations include data selection (retrieving data relevant to the KDD task from the database), data cleaning (removing noise and inconsistent data, handling missing data fields, etc.) and data integration (combining data from multiple sources).
Step 2: Data transformation: The goal is to transform data into forms appropriate for the mining task, that is, to find useful features to represent the data. Feature selection and feature transformation are the basic operations.
Step 3: Data mining: This is the essential process in which intelligent methods are employed to extract data patterns (e.g. association rules, clusters, classification rules, etc.).
Step 4: Pattern evaluation and presentation: Basic operations include identifying the truly interesting patterns that represent knowledge, and presenting the mined knowledge in an easy-to-understand fashion.

Fig 2. Steps in KDD

Existing techniques for privacy-preserving data mining include data hiding (using micro-aggregation, perturbation, or anonymization), rule hiding, secure multi-party computation (SMC), and distributed data mining. Game theory has been used extensively in economics, finance, and security- or defense-related applications to devise policies and governing rules. However, the application of game theory to the privacy analysis of data mining algorithms in distributed scenarios is still at a nascent stage.

Fig 3. Data Transformations

An individual's privacy may be violated mainly for two reasons:
a) Unauthorized access to personal data.
b) Undesired discovery of additional information.

For instance, the U.S. retailer Target once received a complaint from a customer who was angry that Target had sent coupons for baby clothes to his teenage daughter. However, the daughter was in fact pregnant at that time, and Target had correctly inferred this by mining its customer data. This story shows that the conflict between data mining and privacy security does exist. To deal with the privacy issues in data mining, a sub-field known as privacy-preserving data mining (PPDM) has developed rapidly in recent years. The objective of PPDM is to safeguard sensitive information from unsolicited or unsanctioned disclosure while preserving the utility of the data.

III. PPDM (PRIVACY PRESERVING DATA MINING)

The objective of PPDM is to safeguard sensitive information. It involves a two-fold consideration:

a) Sensitive data should not be used directly.
b) Sensitive mining results whose disclosure would result in a privacy violation should be excluded.

IV. GAME THEORY IN DATA PRIVACY

The data mining activity is treated as a game played by multiple users. The essential elements of the game are players, actions, payoffs and information. If no player has an incentive to deviate from his strategy, a Nash equilibrium is obtained.

Consider private data that is collected and processed for publication. At the beginning of the game, the data user makes a price offer to the data collector. If the data collector accepts, he offers incentives to the data providers. The data is anonymized before it is sold to the data user.

How, then, is privacy preserved in distributed data mining? Here, distrustful parties may behave in the following ways:

a) Semi-honest adversary: follows the protocol but attempts to analyze others' private data.
b) Malicious adversary: deviates from the protocol and causes the computation to fail.
c) Collusion: a party colludes with others to expose the private data of a party that does not take part in the collusion.
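
As a rough illustration of how collusion can be analyzed as a game, the following Python sketch checks which combinations of "honest" and "collude" survive unilateral deviation. The payoff formula, the numeric values, the penalty term and all function names are assumptions made for illustration only; they are not taken from any specific PPDM protocol.

```python
from itertools import product

HONEST, COLLUDE = "honest", "collude"
ACTIONS = (HONEST, COLLUDE)

def payoff(i, profile, benefit=10.0, gain=4.0, loss=3.0, penalty=0.0):
    """Hypothetical payoff of party i under the given action profile.

    benefit: value of the mining result, earned by every party
    gain:    extra value a party extracts by colluding
    loss:    privacy loss suffered for each *other* party that colludes
    penalty: cost imposed on a colluding party (assumed mechanism)
    """
    others_colluding = sum(
        1 for j, action in enumerate(profile) if j != i and action == COLLUDE
    )
    value = benefit - loss * others_colluding
    if profile[i] == COLLUDE:
        value += gain - penalty
    return value

def is_nash_equilibrium(profile, **params):
    """True if no single party can raise its payoff by changing only its own action."""
    for i, current in enumerate(profile):
        for alternative in ACTIONS:
            if alternative == current:
                continue
            deviated = profile[:i] + (alternative,) + profile[i + 1:]
            if payoff(i, deviated, **params) > payoff(i, profile, **params):
                return False
    return True

def pure_equilibria(n_parties, **params):
    """Enumerate every action profile and keep the pure-strategy Nash equilibria."""
    return [
        profile
        for profile in product(ACTIONS, repeat=n_parties)
        if is_nash_equilibrium(profile, **params)
    ]

if __name__ == "__main__":
    print(pure_equilibria(3, penalty=0.0))
    print(pure_equilibria(3, penalty=6.0))
```

Under these assumed numbers, with no penalty the only equilibrium is for all three parties to collude. Once the penalty exceeds the gain from colluding, the profile in which every party stays honest becomes the only equilibrium.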

PPDARM (privacy-preserving distributed association rule mining) is a technique designed to prevent such collusion. In such a situation, where no party deviates from its strategy, a Nash equilibrium is obtained.

When participating in a data mining activity, each user has his own view of the benefit he may obtain and the (privacy) cost he has to pay. For example, a company can profit from the knowledge mined from customers' data, but it may need to pay a high price for data containing sensitive information; a customer can get monetary incentives or better services by providing personal data to the company, but he also has to consider the potential privacy risks. Generally, a user acts in the way that brings him the most benefit, and one user's action may affect other users' interests. Therefore, it is natural to treat the data mining activity as a game played by multiple users, and to apply game-theoretic approaches to analyze the interactions among different users.

Table 1: Roles of Data Provider, Data Collector and Data User along with their actions in Game Theory

Information is modeled using the concept of an information set, which represents a player's knowledge about the values of different variables in the game. The outcome of the game is a set of elements picked from the values of actions, payoffs, and other variables after the game is played out.

Game theory provides a formal approach to modeling situations in which a group of agents have to choose optimum actions while considering the mutual effects of other agents' decisions. The essential elements of a game are players, actions, payoffs, and information. Players have actions that they can perform at designated times in the game. As a result of the performed actions, players receive payoffs. The payoff to each player depends on both the player's own action and the other players' actions.
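
The elements above (players, actions, payoffs) can be written down concretely for the company and customer example. The sketch below is a minimal illustration in Python: the two actions per player and every payoff value are hypothetical, chosen only to show how one might search for pairs of actions from which neither player gains by deviating alone (pure-strategy Nash equilibria).

```python
# Hypothetical actions for the two players.
COMPANY_ACTIONS = ("high_incentive", "low_incentive")
CUSTOMER_ACTIONS = ("share_data", "withhold_data")

# Assumed payoff table:
# (company_action, customer_action) -> (company_payoff, customer_payoff)
PAYOFFS = {
    ("high_incentive", "share_data"): (3, 2),
    ("high_incentive", "withhold_data"): (0, 0),
    ("low_incentive", "share_data"): (5, -1),
    ("low_incentive", "withhold_data"): (0, 0),
}

def best_responses_company(customer_action):
    """Company actions that maximize the company's payoff against a fixed customer action."""
    best = max(PAYOFFS[(a, customer_action)][0] for a in COMPANY_ACTIONS)
    return {a for a in COMPANY_ACTIONS if PAYOFFS[(a, customer_action)][0] == best}

def best_responses_customer(company_action):
    """Customer actions that maximize the customer's payoff against a fixed company action."""
    best = max(PAYOFFS[(company_action, a)][1] for a in CUSTOMER_ACTIONS)
    return {a for a in CUSTOMER_ACTIONS if PAYOFFS[(company_action, a)][1] == best}

def pure_nash_equilibria():
    """Action pairs in which each player's action is a best response to the other's."""
    return [
        (c, u)
        for c in COMPANY_ACTIONS
        for u in CUSTOMER_ACTIONS
        if c in best_responses_company(u) and u in best_responses_customer(c)
    ]

if __name__ == "__main__":
    print(pure_nash_equilibria())
```

With these assumed payoffs, the only pure-strategy equilibrium is (low_incentive, withhold_data): the company would prefer to lower the incentive once data is shared, so the customer withholds. Different payoff values would lead to different equilibria, which is why the choice of payoffs matters when modeling a real scenario.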

A player is called rational if he acts in such a way as to maximize his payoff. A player's strategy is a rule that tells him which action to choose at each instant of the game, given his information set. A strategy profile is an ordered set consisting of one strategy for each of the players in the game. An equilibrium is a strategy profile consisting of a best strategy for each of the players in the game. The most important equilibrium concept for the majority of games is the Nash equilibrium: a strategy profile is a Nash equilibrium if no player has an incentive to deviate from his strategy, given that the other players do not deviate. Game theory has been successfully applied to various fields, such as economics, political science, and computer science.

The strategies chosen by a party at any step change the local state of that party. The entire play of the game by player i can therefore be viewed as a process of traversing a game tree in which each tree node represents the local state described by player i's initial state and the messages communicated with other nodes. Each run r represents a path through the tree ending at a leaf node.

V. CONCLUSION

The data provider's objective is to effectively control the amount of sensitive data revealed to others. The data collector's objective is to release data to data miners without disclosing the data providers' identities. The data miner's objective is to obtain correct data mining results that do not contain sensitive information. The decision maker's objective is to judge the credibility of the data correctly. Real-world problems should be formalized in a more realistic way, so that the game-theoretic analysis can have more practical implications.