Anonymizing Collections of Tree Structured Data
ABSTRACT:
Collections of real-world data usually have implicit or explicit structural relations.
For example, databases link records through foreign keys, and XML documents
express associations between different values through syntax. Privacy preservation,
until now, has focused either on data with a very simple structure, e.g. relational
tables, or on data with very complex structure e.g. social network graphs, but has
ignored intermediate cases, which are the most frequent in practice. In this work,
we focus on tree structured data. Such data stem from various applications, even
when the structure is not directly reflected in the syntax, e.g. XML documents. A
characteristic case is a database where information about a single person is
scattered amongst different tables that are associated through foreign keys. The
paper defines k^(m,n)-anonymity, which provides protection against identity
disclosure, and proposes a greedy anonymization heuristic that is able to sanitize
large datasets. The algorithm and the quality of the anonymization are evaluated
experimentally.
EXISTING SYSTEM:
• Classification is a fundamental problem in data analysis. Training a classifier
requires access to a large collection of data. Releasing person-specific data,
such as customer data or patient records, may pose a threat to individuals'
privacy.
• Even after removing explicit identifying information such as Name and
SSN, it is still possible to link released records back to their identities by
matching some combination of non-identifying attributes such as {Sex, Zip,
Birthdate}.
• A useful approach to combat such linking attacks, called k-anonymization, is
to anonymize the linking attributes so that at least k released records match
each value combination of the linking attributes. Previous work attempted to
find an optimal k-anonymization that minimizes some data distortion metric.
• The existing system argues that minimizing the distortion to the training data
is not relevant to the classification goal, which requires extracting the
structure of prediction on the "future" data. It proposes a k-anonymization
solution for classification. The goal is to find a k-anonymization, not
necessarily optimal in the sense of minimizing data distortion, that preserves
the classification structure.
• The existing system was evaluated with intensive experiments on the impact of
anonymization on the classification of future data. Experiments on real-life
data show that the quality of classification can be preserved even for highly
restrictive anonymity requirements.
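The linking attack and the k-anonymity condition described above can be illustrated with a minimal sketch. It assumes each record is an array of the three linking attributes {Sex, Zip, Birthdate}; the class name, method names, sample rows, and the generalization rule (keep three zip digits, keep only the birth year) are hypothetical choices for illustration and are not taken from the existing system.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; all names and sample data are hypothetical.
class KAnonymityCheck {

    // Builds a grouping key from the linking attributes {Sex, Zip, Birthdate}.
    static String key(String[] record) {
        return record[0] + "|" + record[1] + "|" + record[2];
    }

    // Counts how many released records share each value combination.
    static Map<String, Integer> groupSizes(List<String[]> records) {
        Map<String, Integer> sizes = new HashMap<String, Integer>();
        for (String[] r : records) {
            Integer old = sizes.get(key(r));
            sizes.put(key(r), old == null ? 1 : old + 1);
        }
        return sizes;
    }

    // True when every value combination is matched by at least k records.
    static boolean isKAnonymous(List<String[]> records, int k) {
        for (int size : groupSizes(records).values()) {
            if (size < k) {
                return false;
            }
        }
        return true;
    }

    // One possible generalization (an assumption, not the system's method):
    // keep the first three zip digits and only the year of birth.
    static List<String[]> generalize(List<String[]> records) {
        List<String[]> out = new ArrayList<String[]>();
        for (String[] r : records) {
            String zip = r[1].substring(0, 3) + "**";
            String year = r[2].substring(0, 4);
            out.add(new String[] { r[0], zip, year });
        }
        return out;
    }

    // Made-up rows in the order {Sex, Zip, Birthdate}.
    static List<String[]> sample() {
        List<String[]> rows = new ArrayList<String[]>();
        rows.add(new String[] { "F", "53715", "1985-03-12" });
        rows.add(new String[] { "F", "53710", "1985-11-02" });
        rows.add(new String[] { "M", "53703", "1979-06-30" });
        rows.add(new String[] { "M", "53706", "1979-01-15" });
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> raw = sample();
        System.out.println(isKAnonymous(raw, 2));             // false
        System.out.println(isKAnonymous(generalize(raw), 2)); // true
    }
}
```

In the raw rows every combination is unique, so each record could be linked back to one person. After generalization each combination is shared by two records, which satisfies k = 2 at the cost of some data distortion.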
PROPOSED SYSTEM:
The proposed system focuses on tree structured data. Such data stem
from various applications, even when the structure is not directly reflected in the
syntax, e.g. XML documents. A characteristic case is a database where information
about a single person is scattered amongst different tables that are associated
through foreign keys. The paper defines k^(m,n)-anonymity, which provides
protection against identity disclosure, and proposes a greedy anonymization
heuristic that is able to sanitize large datasets. The algorithm and the quality of the
anonymization are evaluated experimentally.
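To make the notion of a tree structured record concrete, the following sketch assembles one person's record from two tables linked through foreign keys. It only illustrates the shape of the data, not the anonymization heuristic itself, whose details are not given here. The table layouts (visits and drugs), the column order, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; table and column names are hypothetical.
class TreeRecordSketch {

    // A node of the tree: a label plus its child nodes.
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<Node>();

        Node(String label) {
            this.label = label;
        }

        // Number of nodes in the subtree rooted at this node.
        int size() {
            int n = 1;
            for (Node c : children) {
                n += c.size();
            }
            return n;
        }

        // Renders the subtree with nested parentheses, e.g. a(b,c).
        String render() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder(label).append("(");
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) {
                    sb.append(",");
                }
                sb.append(children.get(i).render());
            }
            return sb.append(")").toString();
        }
    }

    // Rows of a visits table: {visitId, personId, diagnosis}.
    static final String[][] VISITS = {
        { "v1", "p1", "flu" },
        { "v2", "p1", "asthma" },
        { "v3", "p2", "flu" }
    };

    // Rows of a drugs table: {visitId, drug}.
    static final String[][] DRUGS = {
        { "v1", "oseltamivir" },
        { "v2", "salbutamol" },
        { "v2", "prednisone" },
        { "v3", "paracetamol" }
    };

    // Follows the foreign keys person -> visit -> drug to build one tree.
    static Node buildRecord(String personId, String[][] visits, String[][] drugs) {
        Node root = new Node("person");
        for (String[] v : visits) {
            if (!v[1].equals(personId)) {
                continue; // visit belongs to another person
            }
            Node visit = new Node(v[2]);
            for (String[] d : drugs) {
                if (d[0].equals(v[0])) {
                    visit.children.add(new Node(d[1]));
                }
            }
            root.children.add(visit);
        }
        return root;
    }

    public static void main(String[] args) {
        Node record = buildRecord("p1", VISITS, DRUGS);
        // prints person(flu(oseltamivir),asthma(salbutamol,prednisone))
        System.out.println(record.render());
    }
}
```

Each person's information, scattered over several tables, becomes a single tree whose values and parent-child relations are what an anonymization method for tree structured data has to protect.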
SYSTEM SPECIFICATION
Hardware Requirements:
• System: Pentium IV 3.5 GHz
• Hard Disk: 40 GB
• Monitor: 14" Colour Monitor
• Mouse: Optical Mouse
• RAM: 1 GB
Software Requirements:
• Operating System: Windows XP, Windows 7, or Windows 8
• Coding Language: Java (AWT, Swing, Networking)
• Database: MySQL / MS Access
• Documentation: MS Office
• IDE: Eclipse Galileo
• Development Kit: JDK 1.6