Download Poster PKDD07 - University of California, Riverside

C. Lucchese (University of Venice, Italy), M. Vlachos (IBM Research), D. Rajan (IBM Research), P.S. Yu (University of Chicago)
Objective: Ownership seal with Mining Guarantees
The trajectories are modified imperceptibly, but their neighboring objects are not distorted.
Embed a stamp so that we can claim ownership of the data.
The output of database and data mining operations is the same as on the original data.
[Figure labels: NN Search, Clustering, …, Classification; Final Destination]
Applications: Database Search
The watermark does not change the nearest neighbor.
✓ Search operations remain the same
– outsource data to a mining company
– maintain principal rights of the dataset
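[Figure: object x, its nearest neighbor NN(x), and other objects y1, y2]
We want to retain the nearest neighbor of each object.
Determine the maximum watermark embedding power p that maintains the NN for all objects, i.e. for every object x and every other object y ≠ NN(x):
$D_p(x, \mathrm{NN}(x)) < D_p(x, y)$
Here $D_p$ is the distance measured after embedding the watermark with power p.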
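The nearest-neighbor condition can be checked directly on a small dataset. The following is a minimal sketch, assuming trajectories are equal-length lists of (x, y) points and the distance is Euclidean; the function names are hypothetical and not taken from the poster.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length lists of (x, y) points."""
    return math.sqrt(sum((ax - bx) ** 2 + (ay - by) ** 2
                         for (ax, ay), (bx, by) in zip(a, b)))

def nearest_neighbor(dataset, i, dist=euclidean):
    """Index of the nearest neighbor of dataset[i], excluding i itself."""
    return min((j for j in range(len(dataset)) if j != i),
               key=lambda j: dist(dataset[i], dataset[j]))

def nn_preserved(original, watermarked, dist=euclidean):
    """True if every object keeps the same nearest neighbor after watermarking."""
    return all(nearest_neighbor(original, i, dist) ==
               nearest_neighbor(watermarked, i, dist)
               for i in range(len(original)))
```

A watermarked dataset is acceptable in this sense when `nn_preserved(original, watermarked)` returns `True`.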
Applications: Classification Preservation
A dataset of time-series/trajectories with class labels is turned into a modified dataset that includes the watermark.
Objective: Distort the data imperceptibly so that class labels are maintained.
[Figure: objects labeled Class A and Class B before and after watermarking; one outcome is marked Acceptable, the other Unacceptable]
Applications: Clustering Preservation
✓ Results of clustering remain the same
– geodesic distances will remain the same
– hierarchical clustering will not be affected
[Figure: hierarchical clustering example with leaves: Gray-necked Owl Monkey Female, Gray-necked Owl Monkey Male, Orangutan juvenile, Mandrill male, Red Howler Monkey Male, Mantled Howler Monkey, Orangutan2 male, Mandrill2 male, Juvenile Baboon, De Brazza Monkey Juvenile Male, De Brazza Monkey Male, Common Chimpanzee male, Common Chimpanzee Male 2]
The secret key is embedded in a domain resilient to common trajectory transformations: the frequency domain.
Additive Embedding in Magnitudes
– original data → ft → phase and magnitude
– phase: kept the same
– magnitude: modified by adding the watermark, scaled by p (embedding power)
– same phase + watermarked magnitudes → ift → watermarked data
Example watermark: w = [-1 1 -1 -1 1 1]
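The embedding pipeline (ft, modify magnitudes, keep phases, ift) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the trajectory is treated as the complex sequence x + iy, the new magnitude is taken to be `magnitude + p * w[j]`, negative results are clamped at zero, and the names `dft`, `idft`, `embed`, and `positions` are hypothetical. A naive O(n²) transform is used to keep the example dependency-free.

```python
import cmath

def dft(z):
    """Naive discrete Fourier transform of a list of complex numbers."""
    n = len(z)
    return [sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(Z):
    """Inverse of dft()."""
    n = len(Z)
    return [sum(Z[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def embed(points, w, p, positions):
    """Add p * w[j] to the magnitude of coefficient positions[j]; keep every phase."""
    Z = dft([complex(x, y) for x, y in points])
    for wj, k in zip(w, positions):
        magnitude, phase = cmath.polar(Z[k])
        # Clamp at zero so a magnitude never becomes negative (an assumption).
        Z[k] = cmath.rect(max(magnitude + p * wj, 0.0), phase)
    return [(c.real, c.imag) for c in idft(Z)]

# Example with the watermark shown on the poster and made-up trajectory data.
trajectory = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5), (3.0, 1.0),
              (4.0, 2.5), (5.0, 2.0), (6.0, 3.5), (7.0, 3.0)]
w = [-1, 1, -1, -1, 1, 1]
marked = embed(trajectory, w, p=0.05, positions=[1, 2, 3, 4, 5, 6])
```

With `p=0` the function returns the input trajectory up to floating-point error, so p directly controls how visible the watermark is.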
The techniques are also applicable to image shapes (shapes can be treated as trajectories).
[Figure: Red Howler Monkey Male (Alouatta seniculus seniculus); Orangutan skull; extracted shape; conversion of the skull shape into a two-dimensional sequence]
Embed the key in the k most important coefficients
Secret information is hidden in some of the frequency components.
Select the frequency coefficients that best describe the shape of the trajectory. One can select either the highest-energy coefficients or the low-frequency coefficients. Because these coefficients carry the shape, removing the watermark becomes more difficult without destroying the important trajectory characteristics.
[Figure: a trajectory (axes X, Y) reconstructed from 2, 4, 8, 16, 32, and 64 coefficients]
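Both selection strategies can be sketched in a few lines. The code below is an illustration only: it assumes the trajectory is the complex sequence x + iy, skips the DC coefficient (index 0) on the assumption that it only encodes the trajectory's offset, and uses hypothetical function names.

```python
import cmath

def dft(z):
    """Naive discrete Fourier transform of a list of complex numbers."""
    n = len(z)
    return [sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def highest_energy_positions(points, k):
    """Indices of the k non-DC coefficients with the largest magnitude."""
    Z = dft([complex(x, y) for x, y in points])
    # Index 0 is skipped: it is assumed to carry only the offset of the trajectory.
    return sorted(range(1, len(Z)), key=lambda i: abs(Z[i]), reverse=True)[:k]

def low_frequency_positions(n, k):
    """Indices of the k lowest non-DC frequencies of a length-n complex DFT.

    For a complex sequence, index i and index n - i are the positive and
    negative frequency of the same absolute value, so both count as "low".
    """
    return sorted(range(1, n), key=lambda i: (min(i, n - i), i))[:k]
```

The returned indices would play the role of the coefficient positions into which the key is embedded.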
The key is detected very efficiently even when it is inserted with low embedding power.
Detection of the embedded key is virtually perfect.
– watermarked data → ft → phase and magnitude
– the magnitudes are correlated with the watermark, e.g. w = [-1 1 -1 -1 1 1]
– the correlation is compared against a threshold
Better Detection (semi-blind): remove the ‘background noise’ bias before the embedding and during the detection.
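A correlation detector of this kind can be sketched as below. The poster does not define the bias term, so the code makes an assumption: the owner stores one scalar, the correlation between the key and the original magnitudes, and subtracts it at detection time. The function names and the example numbers are hypothetical.

```python
def correlation(magnitudes, w):
    """Average of magnitude * key entry over the key positions."""
    return sum(m * wj for m, wj in zip(magnitudes, w)) / len(w)

def detect(suspect_magnitudes, w, threshold, bias=0.0):
    """Return (decision, score); the key is declared present if score > threshold."""
    score = correlation(suspect_magnitudes, w) - bias
    return score > threshold, score

# Made-up magnitudes at the key positions, before and after additive embedding.
w = [-1, 1, -1, -1, 1, 1]
p = 0.4
original = [5.0, 3.0, 4.0, 2.0, 6.0, 1.0]
watermarked = [m + p * wj for m, wj in zip(original, w)]

bias = correlation(original, w)          # stored by the owner (assumption)
found, score = detect(watermarked, w, threshold=p / 2, bias=bias)
```

Under the additive model with key entries of ±1, the bias-corrected score of the watermarked data is approximately p, while the score of the unmarked data is approximately zero, which is why a threshold between the two separates them.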
Example of using our technique for spanning tree preservation
[Figure: MST before watermarking; MST after watermarking]
The proposed fast algorithm prunes a significant amount of the search space.
Finding the maximum embedding power
[Figure: object x, its nearest neighbor NN(x), and other objects y, z]
For each power p, we need to examine how many times the following violation occurs:
$D_p(x, \mathrm{NN}(x)) > D_p(x, y)$
To do so, express the distance as a function of the embedding power of the key.
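One way to parametrize the distance by p is sketched below. It is an assumption-laden illustration, not the poster's fast algorithm: objects are given as lists of complex Fourier coefficients, every object receives the same key, and embedding adds `p * w[j]` along the existing phase of each selected coefficient (valid while p stays below the magnitudes). Under that model the squared distance is a quadratic `a + b*p + c*p**2`, so each pair needs its coefficients only once. By Parseval's theorem the frequency-domain distance differs from the time-domain one only by a constant factor, which does not change any ordering. All names are hypothetical.

```python
import cmath

def unit(c):
    """Unit-length complex number with the same phase as c."""
    return cmath.rect(1.0, cmath.phase(c))

def quad_coeffs(X, Y, w, positions):
    """Coefficients (a, b, c) of the squared distance a + b*p + c*p**2."""
    a = sum(abs(xk - yk) ** 2 for xk, yk in zip(X, Y))
    b = c = 0.0
    for wj, k in zip(w, positions):
        d = X[k] - Y[k]                # coefficient difference before embedding
        u = unit(X[k]) - unit(Y[k])    # how that difference moves per unit of p * wj
        b += 2 * wj * (d * u.conjugate()).real
        c += (wj ** 2) * abs(u) ** 2
    return a, b, c

def dist2(X, Y, w, positions, p):
    """Squared distance between X and Y after embedding with power p."""
    a, b, c = quad_coeffs(X, Y, w, positions)
    return a + b * p + c * p * p

def count_violations(spectra, w, positions, p):
    """Count pairs (x, y) where y becomes closer to x than the original NN(x)."""
    violations = 0
    for i, X in enumerate(spectra):
        others = [j for j in range(len(spectra)) if j != i]
        nn = min(others, key=lambda j: dist2(X, spectra[j], w, positions, 0.0))
        d_nn = dist2(X, spectra[nn], w, positions, p)
        violations += sum(1 for j in others
                          if j != nn and d_nn > dist2(X, spectra[j], w, positions, p))
    return violations

def max_power(spectra, w, positions, candidates):
    """Largest candidate p reached, scanning upward, before any violation occurs."""
    best = 0.0
    for p in sorted(candidates):
        if count_violations(spectra, w, positions, p) > 0:
            break
        best = p
    return best
```

This is still an exhaustive scan over candidate powers; it only illustrates the parametrized distance and the violation count, not the pruning that makes the proposed search fast.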
Running Time
Our approach can embed the hidden information more than 300 times faster than the brute-force approach. The fast search techniques find the same result as the exhaustive search, but are 2-3 orders of magnitude faster.
The efficient key embedding and detection allow for effective key recovery even under attacks
✓ Geometric attacks: perfect detection under translation/rotation/scaling attacks
✓ Gaussian noise attack: has to destroy the data in order to be effective
✓ Decimation attack: can be perfectly withstood
✓ Data reduction attack: not effective, even when pruning 50% of the dataset
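The resilience to geometric attacks has a simple explanation when the trajectory is treated as the complex sequence x + iy (an assumption of this sketch): translation changes only the DC coefficient, rotation changes only phases, and uniform scaling multiplies every magnitude by the same factor. The short check below illustrates this with hypothetical helper names and made-up data; it is not the authors' detector.

```python
import cmath

def dft_magnitudes(points):
    """Magnitudes of the naive DFT of a trajectory given as (x, y) points."""
    z = [complex(x, y) for x, y in points]
    n = len(z)
    return [abs(sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def transform(points, angle=0.0, shift=(0.0, 0.0), scale=1.0):
    """Scale and rotate about the origin, then translate by shift."""
    factor = cmath.rect(scale, angle)
    moved = [complex(x, y) * factor + complex(*shift) for x, y in points]
    return [(c.real, c.imag) for c in moved]

trajectory = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5), (3.0, 1.0), (4.0, 2.5), (5.0, 2.0)]
before = dft_magnitudes(trajectory)
after = dft_magnitudes(transform(trajectory, angle=0.7, shift=(3.0, -2.0)))

# Every non-DC magnitude is unchanged by rotation and translation.
unchanged = all(abs(m1 - m2) < 1e-9 for m1, m2 in zip(before[1:], after[1:]))
```

Because scaling multiplies all magnitudes by one common factor, a detector would have to normalize for it; that step is not shown here.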