Abstract - Compassion Software Solutions

Project Name: Incremental Detection of Inconsistencies in Distributed Data
Technology: Java, SQL
Domain: Data Mining
Abstract
This paper investigates incremental detection of errors in distributed data. Given a distributed
database D, a set Σ of conditional functional dependencies (CFDs), the set V of violations of the
CFDs in D, and updates ΔD to D, the problem is to find, with minimum data shipment, the changes
ΔV to V in response to ΔD. The need for this study is evident, since real-life data is often
dirty, distributed and frequently updated, and it is often prohibitively expensive to recompute
the entire set of violations whenever D is updated. We show that the incremental detection
problem is NP-complete for a database D that is partitioned either vertically or horizontally,
even when Σ and D are fixed.
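The notions of a CFD and of its violation set V can be made concrete with a small Java sketch (Java 16 or later, for records) that computes V for one CFD over a single, non-distributed relation, i.e., the batch check that an incremental algorithm avoids rerunning in full. The representation is an assumption made for illustration: a tuple is a map from attribute names to string values keyed by an integer id, a pattern tuple lists only its constants (any attribute it omits is the wildcard "_"), and the names `CfdViolations`, `Cfd` and `violations` are hypothetical. The example CFD, stating that for UK tuples (CC = 44) the zip code determines the street, is likewise only illustrative.

```java
import java.util.*;

/**
 * Sketch of batch CFD violation detection over one (non-distributed) relation.
 * Assumed representation: a tuple is a map from attribute name to value,
 * keyed by an integer tuple id; "_" is the wildcard in pattern tuples.
 */
final class CfdViolations {
    static final String WILDCARD = "_";

    /** A CFD (X -> A, tp): lhs = X, rhs = A, pattern = tp (missing entries mean "_"). */
    record Cfd(List<String> lhs, String rhs, Map<String, String> pattern) {
        String patternOf(String attr) {
            return pattern.getOrDefault(attr, WILDCARD);
        }

        /** True if tuple t matches the pattern tuple on every attribute of X. */
        boolean matchesLhs(Map<String, String> t) {
            for (String attr : lhs) {
                String p = patternOf(attr);
                if (!p.equals(WILDCARD) && !p.equals(t.get(attr))) return false;
            }
            return true;
        }
    }

    /** Ids of all tuples in db that take part in a violation of cfd. */
    static Set<Integer> violations(Cfd cfd, Map<Integer, Map<String, String>> db) {
        Set<Integer> v = new TreeSet<>();
        String rhsPattern = cfd.patternOf(cfd.rhs());
        // X-value -> (A-value -> tuple ids); only used when A's pattern is "_".
        Map<List<String>, Map<String, List<Integer>>> groups = new HashMap<>();
        for (Map.Entry<Integer, Map<String, String>> e : db.entrySet()) {
            Map<String, String> t = e.getValue();
            if (!cfd.matchesLhs(t)) continue;
            String a = t.get(cfd.rhs());
            if (!rhsPattern.equals(WILDCARD)) {
                // Constant A: a single tuple violates if its A-value differs.
                if (!rhsPattern.equals(a)) v.add(e.getKey());
                continue;
            }
            List<String> x = new ArrayList<>();
            for (String attr : cfd.lhs()) x.add(t.get(attr));
            groups.computeIfAbsent(x, k -> new HashMap<>())
                  .computeIfAbsent(a, k -> new ArrayList<>())
                  .add(e.getKey());
        }
        // Wildcard A: tuples that agree on X but carry two or more A-values.
        for (Map<String, List<Integer>> byA : groups.values()) {
            if (byA.size() > 1) byA.values().forEach(v::addAll);
        }
        return v;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, String>> db = new HashMap<>();
        db.put(1, Map.of("CC", "44", "zip", "EH4 8LE", "street", "Mayfield"));
        db.put(2, Map.of("CC", "44", "zip", "EH4 8LE", "street", "Crichton"));
        db.put(3, Map.of("CC", "01", "zip", "07974", "street", "Tree Ave"));
        db.put(4, Map.of("CC", "01", "zip", "07974", "street", "Elm St"));
        // Hypothetical CFD: for UK tuples (CC = 44), zip determines street.
        Cfd phi = new Cfd(List.of("CC", "zip"), "street", Map.of("CC", "44"));
        System.out.println(violations(phi, db)); // prints [1, 2]
    }
}
```

A CFD whose right-hand side is a constant can be violated by a single tuple, whereas one whose right-hand side is the wildcard is violated only by two or more tuples that agree on X; the latter is why, once D is partitioned, checking a CFD may require shipping data between sites.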
Nevertheless, we show that the problem is bounded: there exist algorithms to detect errors whose
computational cost and data shipment are both linear in the size of ΔD and ΔV, independent of
the size of the database D. We provide such incremental algorithms for vertically partitioned data
and for horizontally partitioned data, and show that the algorithms are optimal. We further propose
optimization techniques that reduce data shipment for the incremental algorithm over vertical
partitions. We verify experimentally, using real-life data on Amazon Elastic Compute Cloud
(EC2), that our algorithms substantially outperform their batch counterparts.
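For horizontally partitioned data, the incremental idea can be sketched as follows for a single CFD X → A whose right-hand side is the wildcard. This is a simplified illustration under stated assumptions, not the algorithm of the paper: sites are simulated in one process, the caller is assumed to have already projected each inserted or deleted tuple onto (X, A) and checked it against the pattern tuple, the shipment minimization discussed above is not attempted, and the class and method names (`HorizontalIncremental`, `Site`, `insert`, `delete`) are hypothetical. On each update, every site contributes at most two distinct A-values for the affected X-value, and tuple ids are collected only for tuples that enter or leave V, so apart from that constant-size per-site summary the data exchanged depends on ΔD and ΔV rather than on the size of D.

```java
import java.util.*;

/**
 * Sketch of incremental violation maintenance for ONE CFD X -> A whose A-pattern
 * is "_", over horizontally partitioned data. Hypothetical simplification: the
 * caller has already projected each updated tuple onto (X, A) and checked it
 * against the pattern tuple; sites are simulated in one process.
 */
final class HorizontalIncremental {
    /** One horizontal fragment: X-value -> A-value -> ids of local tuples. */
    static final class Site {
        final Map<List<String>, Map<String, Set<Integer>>> index = new HashMap<>();

        /** Constant-size summary a site would ship: at most two distinct A-values. */
        Set<String> distinctA(List<String> x) {
            Set<String> out = new HashSet<>();
            for (String a : index.getOrDefault(x, Map.of()).keySet()) {
                out.add(a);
                if (out.size() == 2) break;
            }
            return out;
        }

        /** Local ids with X-value x; shipped only when these tuples enter or leave V. */
        Set<Integer> ids(List<String> x) {
            Set<Integer> out = new HashSet<>();
            index.getOrDefault(x, Map.of()).values().forEach(out::addAll);
            return out;
        }
    }

    final List<Site> sites = new ArrayList<>();
    final Set<Integer> violations = new TreeSet<>(); // V for the single CFD

    HorizontalIncremental(int numSites) {
        for (int i = 0; i < numSites; i++) sites.add(new Site());
    }

    private Set<String> globalDistinctA(List<String> x) {
        Set<String> s = new HashSet<>();
        for (Site site : sites) s.addAll(site.distinctA(x));
        return s; // size >= 2 iff the X-group violates the CFD
    }

    private Set<Integer> globalIds(List<String> x) {
        Set<Integer> s = new TreeSet<>();
        for (Site site : sites) s.addAll(site.ids(x));
        return s;
    }

    /** Inserts tuple id with values (x, a) at site i; returns the ids added to V. */
    Set<Integer> insert(int i, int id, List<String> x, String a) {
        Set<String> before = globalDistinctA(x);
        sites.get(i).index.computeIfAbsent(x, k -> new HashMap<>())
                .computeIfAbsent(a, k -> new HashSet<>()).add(id);
        Set<Integer> added = new TreeSet<>();
        if (before.size() >= 2) {
            added.add(id);               // group already violates: only the new tuple joins V
        } else if (before.size() == 1 && !before.contains(a)) {
            added.addAll(globalIds(x));  // group starts violating: all of its tuples join V
        }
        violations.addAll(added);
        return added;
    }

    /** Deletes tuple id with values (x, a) from site i; returns the ids removed from V. */
    Set<Integer> delete(int i, int id, List<String> x, String a) {
        Map<String, Set<Integer>> group = sites.get(i).index.get(x);
        if (group == null || !group.getOrDefault(a, Set.of()).contains(id)) return Set.of();
        group.get(a).remove(id);
        if (group.get(a).isEmpty()) group.remove(a);
        if (group.isEmpty()) sites.get(i).index.remove(x);
        Set<Integer> removed = new TreeSet<>();
        if (violations.remove(id)) {
            removed.add(id);
            if (globalDistinctA(x).size() < 2) { // group no longer violates
                Set<Integer> rest = globalIds(x);
                violations.removeAll(rest);
                removed.addAll(rest);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        HorizontalIncremental h = new HorizontalIncremental(2);
        List<String> x = List.of("44", "EH4 8LE");
        System.out.println(h.insert(0, 1, x, "Mayfield")); // prints []
        System.out.println(h.insert(1, 2, x, "Crichton")); // prints [1, 2]
        System.out.println(h.insert(0, 3, x, "Mayfield")); // prints [3]
        System.out.println(h.delete(1, 2, x, "Crichton")); // prints [1, 2, 3]
    }
}
```

Two distinct A-values per site are enough because an X-group violates the CFD exactly when it contains at least two distinct A-values; deletions reuse the same summary to decide whether the remaining tuples of the group leave V. Vertically partitioned data is not covered, since there the attributes of X and A may reside at different sites.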
Index Terms—Incremental algorithms, distributed data, conditional functional
dependencies, error detection.
Contact Us:
F-303, Second Floor,
Megacenter, Magarpatta Chowk,
Pune-Solapur Road, Hadapsar, Pune
9260528020 / 020 66200913
Mail us [email protected]
www.compassionsoftwares.com