Anonymizing Collections of Tree Structured Data

ABSTRACT:
Collections of real-world data usually have implicit or explicit structural relations. For example, databases link records through foreign keys, and XML documents express associations between values through their syntax. Until now, privacy preservation has focused either on data with a very simple structure, e.g. relational tables, or on data with a very complex structure, e.g. social network graphs, and has ignored the intermediate cases, which are the most frequent in practice. This work focuses on tree structured data. Such data arise in many applications, not only where the syntax reflects the structure directly (e.g. XML documents) but also where it does not. A characteristic case is a database in which information about a single person is scattered across different tables that are associated through foreign keys. The paper defines k^(m,n)-anonymity, which protects against identity disclosure, and proposes a greedy anonymization heuristic that can sanitize large datasets. The algorithm and the quality of the anonymization are evaluated experimentally.

EXISTING SYSTEM:
Classification is a fundamental problem in data analysis. Training a classifier requires access to a large collection of data. Releasing person-specific data, such as customer data or patient records, may threaten individuals' privacy. Even after explicit identifying information such as Name and SSN is removed, released records can still be linked back to identities by matching some combination of non-identifying attributes such as {Sex, Zip, Birthdate}. A useful defence against such linking attacks, called k-anonymization, generalizes the linking attributes so that at least k released records match each value combination of the linking attributes. Previous work attempted to find an optimal k-anonymization that minimizes some data distortion metric.
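As a rough illustration of the k-anonymity condition described above (every value combination of the linking attributes must be shared by at least k released records), the following sketch counts group sizes over a small table. The class name KAnonymityCheck, its methods, the sample rows and the generalization used (Zip to a prefix, Birthdate to a year) are all assumptions made for this example and are not taken from the existing system. The code avoids lambdas and the diamond operator so that it stays compatible with the JDK 1.6 listed in the system specification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KAnonymityCheck {

    /** Counts how many records share each combination of linking-attribute values. */
    public static Map<List<String>, Integer> groupSizes(List<String[]> records, int[] linkingColumns) {
        Map<List<String>, Integer> sizes = new HashMap<List<String>, Integer>();
        for (String[] record : records) {
            // The key is the projection of the record onto the linking columns.
            List<String> key = new ArrayList<String>();
            for (int column : linkingColumns) {
                key.add(record[column]);
            }
            Integer seen = sizes.get(key);
            sizes.put(key, seen == null ? 1 : seen + 1);
        }
        return sizes;
    }

    /** True if every value combination of the linking attributes occurs in at least k records. */
    public static boolean isKAnonymous(List<String[]> records, int[] linkingColumns, int k) {
        for (int size : groupSizes(records, linkingColumns).values()) {
            if (size < k) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Columns: Sex, Zip, Birthdate, Diagnosis (made-up sample rows).
        List<String[]> raw = Arrays.asList(
            new String[] {"F", "13053", "1980-02-11", "flu"},
            new String[] {"F", "13068", "1980-07-30", "asthma"},
            new String[] {"M", "14850", "1975-01-05", "flu"},
            new String[] {"M", "14853", "1975-12-19", "diabetes"});

        // The same rows after generalizing Zip to a prefix and Birthdate to a year.
        List<String[]> generalized = Arrays.asList(
            new String[] {"F", "130**", "1980", "flu"},
            new String[] {"F", "130**", "1980", "asthma"},
            new String[] {"M", "148**", "1975", "flu"},
            new String[] {"M", "148**", "1975", "diabetes"});

        int[] linking = {0, 1, 2}; // Sex, Zip, Birthdate

        System.out.println(isKAnonymous(raw, linking, 2));         // prints false
        System.out.println(isKAnonymous(generalized, linking, 2)); // prints true
    }
}
```

In the raw rows every {Sex, Zip, Birthdate} combination is unique, so each record can be singled out. After generalization each combination covers two records, which satisfies the condition for k = 2 at the cost of coarser values.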
The existing system argues that minimizing the distortion of the training data is not relevant to the classification goal, which requires extracting the structure of prediction on "future" data. It proposes a k-anonymization solution for classification whose goal is to find a k-anonymization that preserves the classification structure, even if it is not optimal in the sense of minimizing data distortion. Intensive experiments were conducted to evaluate the impact of anonymization on the classification of future data. Experiments on real-life data show that the quality of classification can be preserved even for highly restrictive anonymity requirements.

PROPOSED SYSTEM:
The proposed system focuses on tree structured data. Such data arise in many applications, not only where the syntax reflects the structure directly (e.g. XML documents) but also where it does not. A characteristic case is a database in which information about a single person is scattered across different tables that are associated through foreign keys. The paper defines k^(m,n)-anonymity, which protects against identity disclosure, and proposes a greedy anonymization heuristic that can sanitize large datasets. The algorithm and the quality of the anonymization are evaluated experimentally.

SYSTEM SPECIFICATION

Hardware Requirements:
• System : Pentium IV 3.5 GHz
• Hard Disk : 40 GB
• Monitor : 14" Colour Monitor
• Mouse : Optical Mouse
• RAM : 1 GB

Software Requirements:
• Operating System : Windows XP, Windows 7 or Windows 8
• Coding Language : Java (AWT, Swing, Networking)
• Database : MySQL / MS Access
• Documentation : MS Office
• IDE : Eclipse Galileo
• Development Kit : JDK 1.6
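To make the characteristic case from the PROPOSED SYSTEM section concrete (one person's information scattered across tables linked by foreign keys), the sketch below assembles one tree record per person from two flat tables. The schema (a visits table and a prescriptions table), the class TreeRecordSketch and all sample values are hypothetical and serve only as an illustration. It is not the paper's data model or its anonymization heuristic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TreeRecordSketch {

    /** One node of a tree structured record: a value plus its child nodes. */
    public static class Node {
        final String label;
        final List<Node> children = new ArrayList<Node>();

        public Node(String label) {
            this.label = label;
        }

        public Node add(Node child) {
            children.add(child);
            return this;
        }

        /** Renders the subtree as label(child,child,...). */
        @Override
        public String toString() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder(label).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    /**
     * Builds one tree per person from two flat tables (hypothetical schema).
     * visits rows: {visitId, personId, hospital}; prescriptions rows: {visitId, drug}.
     * The root carries no identifier, only the generic label "person".
     */
    public static Map<String, Node> buildTrees(List<String[]> visits, List<String[]> prescriptions) {
        Map<String, Node> personRoots = new LinkedHashMap<String, Node>();
        Map<String, Node> visitNodes = new LinkedHashMap<String, Node>();
        for (String[] v : visits) {
            Node root = personRoots.get(v[1]);
            if (root == null) {
                root = new Node("person");
                personRoots.put(v[1], root);
            }
            Node visit = new Node(v[2]);
            root.add(visit);
            visitNodes.put(v[0], visit);
        }
        for (String[] p : prescriptions) {
            Node visit = visitNodes.get(p[0]);
            if (visit != null) { // rows with a dangling foreign key are skipped
                visit.add(new Node(p[1]));
            }
        }
        return personRoots;
    }

    public static void main(String[] args) {
        List<String[]> visits = Arrays.asList(
            new String[] {"v1", "p1", "CityHospital"},
            new String[] {"v2", "p1", "EyeClinic"},
            new String[] {"v3", "p2", "CityHospital"});
        List<String[]> prescriptions = Arrays.asList(
            new String[] {"v1", "insulin"},
            new String[] {"v1", "aspirin"},
            new String[] {"v3", "aspirin"});

        Map<String, Node> trees = buildTrees(visits, prescriptions);
        System.out.println(trees.get("p1")); // prints person(CityHospital(insulin,aspirin),EyeClinic)
        System.out.println(trees.get("p2")); // prints person(CityHospital(aspirin))
    }
}
```

Even with the person identifier dropped from the root, both the values and the shape of a tree (which drug hangs under which visit) may help an attacker single out a record. This is why anonymizing each table separately is not sufficient for this kind of data.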