International Journal of Innovations & Advancement in Computer Science
IJIACS
ISSN 2347 – 8616
Volume 4, Special Issue
May 2015
Analysis of Game Theoretic Approach in Data Mining Security
Asha Baby, Anoop Jose, Jisha C.T.
Assistant Professor, Dept. of CSE
Vimal Jyothi Engineering College, Chemperi
Abstract: The growing popularity and development of data mining technologies pose a serious threat
to the security of individuals' sensitive information. Analysis of privacy-sensitive data in a
multi-party environment often assumes that the parties are well-behaved and abide by the protocols:
they compute whatever is needed, communicate correctly following the rules, and do not collude
with other parties to expose a third party's sensitive data. This paper offers a more realistic
formulation of the privacy-preserving data mining (PPDM) problem as a multi-party game in which
each party tries to maximize its own objectives. It offers a game-theoretic framework for developing
and analyzing new robust PPDM algorithms.
Keywords: Data Mining, Sensitive Information, Privacy-Preserving Data Mining, Anonymization,
Provenance, Game Theory, Privacy Auction, Anti-Tracking, Nash Equilibrium
I. INTRODUCTION
The term "data mining" is often treated as a synonym for another term, "knowledge discovery from
data" (KDD), which highlights the goal of the mining process. It deals with discovering interesting
patterns and knowledge.
There are four different types of users in data mining:
a) Data Provider: the user who owns some data that are desired by the data mining task.
b) Data Collector: the user who collects data from data providers and then publishes the data to
the data miner.
c) Data Miner: the user who performs data mining tasks on the data.
d) Decision Maker: the user who makes decisions based on the data mining results in order to
achieve certain goals.
Fig 1. Data Mining Cycle
II. STEPS IN KNOWLEDGE DISCOVERY FROM DATA
To obtain useful knowledge from data, the following steps are performed in an iterative way:
Step 1: Data preprocessing: Basic operations include data selection (to retrieve data relevant to the
KDD task from the database), data cleaning (to remove noise and inconsistent data, to handle the
missing data fields, etc.) and data integration (to combine data from multiple sources).
Step 2: Data transformation: The goal is to transform data into forms appropriate for the mining task,
that is, to find useful features to represent the data. Feature selection and feature transformation are
basic operations.
Step 3: Data mining: This is the essential step, in which intelligent methods are employed to extract
data patterns (e.g. association rules, clusters, classification rules).
Step 4: Pattern evaluation and presentation: Basic operations include identifying the truly interesting
patterns which represent knowledge, and presenting the mined knowledge in an easy-to-understand
fashion.
Fig 2. Steps in KDD
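The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the transactions, the function names (preprocess, transform, mine, evaluate) and the support threshold are assumptions made for the example, and counting item pairs stands in for a real mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw transactions from two sources; None marks a missing field.
source_a = [["milk", "bread", None], ["milk", "diapers"]]
source_b = [["bread", "milk"], ["Bread ", "diapers", "milk"]]


def preprocess(*sources):
    """Step 1: integrate the sources, then clean missing and inconsistent values."""
    integrated = [row for source in sources for row in source]
    return [[item.strip().lower() for item in row if item is not None]
            for row in integrated]


def transform(rows):
    """Step 2: represent each transaction as a set of items (the features)."""
    return [frozenset(row) for row in rows]


def mine(transactions):
    """Step 3: count how often each pair of items occurs together."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(sorted(items), 2))
    return counts


def evaluate(counts, n_transactions, min_support=0.5):
    """Step 4: keep pairs whose support meets the threshold, most frequent first."""
    kept = {pair: c / n_transactions for pair, c in counts.items()
            if c / n_transactions >= min_support}
    return sorted(kept.items(), key=lambda kv: (-kv[1], kv[0]))


transactions = transform(preprocess(source_a, source_b))
patterns = evaluate(mine(transactions), len(transactions))
for pair, support in patterns:
    print(pair, support)
```

In practice the steps are iterated: the evaluation in Step 4 may send the analyst back to Step 1 or Step 2 with a different selection or feature representation.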
Existing techniques for privacy-preserving data mining include data hiding using micro-aggregation,
perturbation, or anonymization; rule hiding; secure multi-party computation (SMC); and distributed
data mining. Game theory has been used extensively in economics, finance, and security- or
defense-related applications to derive policies and governing rules. However, the application of
game theory to the privacy analysis of data mining algorithms in distributed scenarios is still at a
nascent stage.
Fig 3. Data Transformations
An individual's privacy may be violated mainly for two reasons:
a) Unauthorized access to personal data.
b) Undesired discovery of additional information.
For instance, the U.S. retailer Target once received a complaint from a customer who was angry that
Target had sent coupons for baby clothes to his teenage daughter. However, the daughter was in fact
pregnant at the time, and Target had correctly inferred this by mining its customer data. This
story shows that the conflict between data mining and privacy does exist. To deal with the
privacy issues in data mining, a sub-field of data mining referred to as privacy-preserving
data mining (PPDM) has developed rapidly in recent years. The objective of PPDM is to
safeguard sensitive information from unsolicited or unsanctioned disclosure while preserving
the utility of the data.
III. PPDM (PRIVACY PRESERVING DATA MINING)
The objective of PPDM is to safeguard sensitive information. This involves a two-fold consideration:
a) Sensitive raw data should not be used directly for mining.
b) Sensitive mining results whose disclosure would result in a privacy violation should be excluded.
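The two considerations can be illustrated with a minimal Python sketch. Everything here is hypothetical: the records, the choice of which attributes count as sensitive, and the keyword test used to flag a sensitive result are assumptions made for the example, not a method taken from the paper.

```python
# Hypothetical records; which attributes and results count as sensitive is an
# assumption made for this example only.
SENSITIVE_ATTRIBUTES = {"name", "diagnosis"}
SENSITIVE_RESULT_KEYWORDS = {"pregnan"}

records = [
    {"name": "Alice", "age_band": "20-29", "diagnosis": "flu", "basket": "vitamins"},
    {"name": "Bob", "age_band": "30-39", "diagnosis": "none", "basket": "bread"},
]


def strip_sensitive(rows):
    """(a) Drop sensitive attributes before the data reach the mining step."""
    return [{k: v for k, v in row.items() if k not in SENSITIVE_ATTRIBUTES}
            for row in rows]


def filter_results(rules):
    """(b) Exclude mined rules whose disclosure would violate privacy."""
    return [r for r in rules
            if not any(word in r.lower() for word in SENSITIVE_RESULT_KEYWORDS)]


mining_input = strip_sensitive(records)
mined_rules = ["vitamins -> likely pregnant", "bread -> milk"]  # hypothetical output
released_rules = filter_results(mined_rules)
print(mining_input)
print(released_rules)
```

Dropping columns is the simplest possible reading of (a); the remaining attributes may still identify a person in combination, which is why techniques such as anonymization and perturbation go further.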
IV. GAME THEORY IN DATA PRIVACY
The data mining activity is treated as a game played by multiple users. The essential elements of
the game are players, actions, payoffs and information. If no player has an incentive to deviate
from his strategy, a Nash equilibrium is obtained.
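The equilibrium condition can be checked mechanically for a small game. The sketch below assumes a hypothetical two-party protocol in which each party either follows the protocol ("honest") or deviates ("cheat"); the payoff numbers are invented for illustration, and only pure strategies are considered.

```python
from itertools import product

# Hypothetical payoffs for a two-party data mining protocol.
# payoffs[(a1, a2)] = (payoff to party 1, payoff to party 2)
ACTIONS = ("honest", "cheat")
payoffs = {
    ("honest", "honest"): (3, 3),
    ("honest", "cheat"): (0, 4),
    ("cheat", "honest"): (4, 0),
    ("cheat", "cheat"): (1, 1),
}


def is_nash(profile, payoffs, actions=ACTIONS):
    """Return True if no player gains by deviating alone from `profile`."""
    for player in range(len(profile)):
        current = payoffs[profile][player]
        for alternative in actions:
            deviated = list(profile)
            deviated[player] = alternative
            if payoffs[tuple(deviated)][player] > current:
                return False
    return True


equilibria = [p for p in product(ACTIONS, repeat=2) if is_nash(p, payoffs)]
print(equilibria)
```

With these assumed payoffs the only equilibrium is the one in which both parties cheat, which illustrates why a protocol cannot simply assume well-behaved parties.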
When private data are collected and processed for publication, the data user makes a price offer to
the data collector at the beginning of the game. If the collector accepts, he offers incentives to
the data providers. The data are anonymized before being sold to the data user.
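The sequence of moves in this pricing game can be analyzed by backward induction. The following Python sketch is a toy model under stated assumptions, not the paper's formulation: the privacy costs, the per-record value, the integer price and incentive grids, and the function names are all hypothetical, and the anonymization step is omitted.

```python
# All numbers are hypothetical and chosen only to keep the example small.
PRIVACY_COSTS = [1, 2, 4]      # each provider's cost of revealing a record
VALUE_PER_RECORD = 10          # data user's benefit per record bought
PRICES = range(0, 11)          # per-record prices the data user may offer
INCENTIVES = range(0, 11)      # per-record incentives the collector may offer


def providers_joining(incentive):
    """A provider sells his record only if the incentive covers his privacy cost."""
    return sum(1 for cost in PRIVACY_COSTS if incentive >= cost)


def collector_best_response(price):
    """Collector picks the profit-maximizing incentive (lowest one on ties);
    a non-positive profit means the offer is rejected."""
    best = max(INCENTIVES, key=lambda t: providers_joining(t) * (price - t))
    profit = providers_joining(best) * (price - best)
    if profit <= 0:
        return None, 0, 0  # offer rejected: no incentive, no records, no profit
    return best, providers_joining(best), profit


def user_best_offer():
    """Data user anticipates the collector's response to every possible price."""
    def user_payoff(price):
        _, records, _ = collector_best_response(price)
        return records * (VALUE_PER_RECORD - price)
    return max(PRICES, key=user_payoff)


price = user_best_offer()
incentive, records, collector_profit = collector_best_response(price)
print(price, incentive, records, collector_profit)
```

The design choice is to solve the last move first: providers react to the incentive, the collector reacts to the price, and the data user chooses the price knowing both reactions.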
How, then, is privacy preserved in distributed data mining? Here, distrustful parties may behave in
the following ways:
a) Semi-honest adversary: attempts to analyze others' private data.
b) Malicious adversary: deviates from the protocol and causes the computation to fail.
c) Collusion: a party colludes with others to expose the data of a party that does not participate
in the collusion.
Privacy-preserving distributed association rule mining (PPDARM) is a technique for preventing
collusion. In such a situation, a Nash equilibrium is obtained.
When participating in a data mining activity, each user weighs the benefit he may obtain against
the (privacy) cost he has to pay. For example, a company can profit from the knowledge mined from
customers' data, but it may need to pay a high price for data containing sensitive information; a
customer can obtain monetary incentives or better services by providing personal data to the
company, but he has to consider the potential privacy risks. Generally, a user acts in the way that
brings him the most benefit, and one user's action may affect other users' interests. It is
therefore natural to treat the data mining activity as a game played by multiple users, and to
apply game-theoretic approaches to analyze the interactions among different users.
Table 1: Roles of Data Provider, Data Collector and Data User along with their actions in Game Theory
Information is modeled using the concept of an information set, which represents a player's
knowledge about the values of different variables in the game. The outcome of the game is a set of
elements picked from the values of actions, payoffs, and other variables after the game is played out.
Game theory provides a formal approach to modeling situations in which a group of agents have to
choose optimum actions while considering the mutual effects of other agents' decisions.
The essential elements of a game are players, actions, payoffs, and information. Players have actions
that they can perform at designated times in the game. As a result of the performed actions, players
receive payoffs. The payoff to each player depends on both the player's own action and the other
players' actions.
A player is called rational if he acts in such a way as to maximize his payoff. A player's
strategy is a rule that tells him which action to choose at each instant of the game, given his
information set. A strategy profile is an ordered set consisting of one strategy for each of the
players in the game. An equilibrium is a strategy profile consisting of a best strategy for each of
the players in the game.
The most important equilibrium concept for the majority of games is the Nash equilibrium. A strategy
profile is a Nash equilibrium if no player has an incentive to deviate from his strategy, given that
the other players do not deviate. Game theory has been successfully applied to various fields, such
as economics, political science, and computer science. The strategies chosen by a party at any step
change the local state of the party. The entire play of the game by player i can therefore be viewed
as a process of traversing a game tree in which each tree node represents the local state described
by player i's initial state and the messages communicated with other nodes. Each run r represents a
path through the tree ending at a leaf node.
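The game-tree view can be made concrete with a short Python sketch that enumerates every run. The tree below is a hypothetical example for a single player i: the action names, state labels and payoffs are assumptions made for illustration only.

```python
# A hypothetical game tree for player i: each node holds the local state reached
# after a message exchange, and each leaf carries player i's payoff.
tree = {
    "state": "initial",
    "children": {
        "send_true_data": {
            "state": "shared",
            "children": {
                "peer_cooperates": {"state": "result_computed", "payoff": 3},
                "peer_deviates": {"state": "data_exposed", "payoff": -2},
            },
        },
        "withhold_data": {"state": "aborted", "payoff": 0},
    },
}


def runs(node, path=()):
    """Yield every run: the actions on a root-to-leaf path, plus the leaf payoff."""
    children = node.get("children")
    if not children:
        yield path, node["payoff"]
        return
    for action, child in children.items():
        yield from runs(child, path + (action,))


all_runs = list(runs(tree))
for actions, payoff in all_runs:
    print(" -> ".join(actions), payoff)
```

Each tuple of actions corresponds to one run r, and comparing the payoffs at the leaves is what lets a rational player choose among strategies.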
V. CONCLUSION
The data provider's objective is to effectively control the amount of sensitive data revealed to
others. The data collector's objective is to release data to data miners without disclosing the
data providers' identities. The data miner's objective is to obtain correct data mining results
that do not contain sensitive information. The decision maker's objective is to judge the
credibility of the data correctly. Real-world problems should be formalized in a more realistic way
so that game-theoretic analysis can have more practical implications.