Deterministic Length Reduction:
Fast Convolution in Sparse Data
and Applications
Written by:
Amihood Amir, Oren Kapah and Ely Porat
Motivation – Point Set Matching
 Integer 1-D Point Set Matching:
 T: (t1,t2,…,tn)
 P: (p1,p2,…,pm)
 Where ti and pi are integers.
 Let N = tn and M = pm (the maximal indices).
 Time: O(nm), O(N·log(M))
Motivation – Point Set Matching
 2-D Point Set Matching – Searching in Music:
 T: (i1,j1),(i2,j2),…,(in,jn)
 P: (i1,j1),(i2,j2),…,(im,jm)
 Dimension Reduction: (i,j) → i·N + j
[Figure: Pattern and Text point sets]
Motivation – Generalized Case
 The generalized case of these problems is the
d-Dimensional sparse wildcard matching problem.
 Problem Definition: Given a d-Dimensional text T with
zeros and non-zeros, and a d-Dimensional pattern P
with wildcards and non-zeros, find all the locations
where P matches T.
 Applications: d-Dimensional point set matching,
searching in music, protein activity research, etc.
Length Reduction
 Goal: Given two vectors V1 & V2, obtain two vectors
V’1 & V’2 of size O(n1) such that all non-zeros in V1
and V2 appear as singletons in V’1 and V’2, respectively,
while maintaining the distance property
(n1 is the number of non-zeros in V1).
 The Distance Property: If V’2[f(0)] is aligned with V’1[f(i)],
then V’2[f(j)] will be aligned with V’1[f(i + j)].
 Using the reduced size vectors, matching can be done in
time O(n1·log(n1)) using convolutions.
Example: Length Reduction
The vectors are given as sets of pairs: (index, value).
V1: (0, 5), (6, 2), (13, 3), (19, 1)
V2: (0, 2), (7, 3)
Length Reduction Function: mod(5)
V’1: 5 2 0 3 1
V’2: 2 0 3 0 0
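The example above can be reproduced with a few lines of Python. This is only a minimal sketch: the names `reduce_mod` and `distance_property_holds` are made up for illustration, and summing the values that fall into the same slot is an assumption (no collision occurs in this example, so the choice does not matter here).

```python
def reduce_mod(pairs, q):
    """Fold a sparse vector, given as (index, value) pairs, into q slots."""
    reduced = [0] * q
    for index, value in pairs:
        # Assumption: colliding values are summed; the slides do not specify this.
        reduced[index % q] += value
    return reduced


def distance_property_holds(i, j, q):
    """With f(x) = x mod q, slot f(i) shifted cyclically by f(j) lands on f(i + j)."""
    return (i % q + j % q) % q == (i + j) % q


V1 = [(0, 5), (6, 2), (13, 3), (19, 1)]
V2 = [(0, 2), (7, 3)]

print(reduce_mod(V1, 5))  # [5, 2, 0, 3, 1]
print(reduce_mod(V2, 5))  # [2, 0, 3, 0, 0]
```

The alignment is cyclic: index 6 of V1 and index 7 of V2 sit in slots 1 and 2, and their sum 13 sits in slot 3 = (1 + 2) mod 5.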
The Randomized Algorithm
(Cole & Hariharan – STOC02)
 Idea: Find a set of log(n) short vectors in which, with
high probability, each non-zero in V appears as
a singleton in at least one of the vectors.
 Hash functions: (ax mod(q))mod(s), where q is a
large prime number and s is O(n).
 If s is c·n, then the probability of a non-zero
appearing as a multiple is bounded by a constant.
 Using log(n) different hash functions reduces the
failure probability exponentially.
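A small sketch of the hashing idea, assuming the hash family (a·x mod q) mod s stated above. The helper names (`make_hash`, `singletons_under`, `covered`, `random_hashes`) and the concrete parameters q = 101, s = 7 are illustrative choices, not taken from the cited paper.

```python
import random


def make_hash(a, q, s):
    """Return the hash x -> (a*x mod q) mod s."""
    return lambda x: (a * x % q) % s


def singletons_under(indices, h):
    """Indices that land alone in their bucket under hash h."""
    buckets = {}
    for x in indices:
        buckets.setdefault(h(x), []).append(x)
    return {b[0] for b in buckets.values() if len(b) == 1}


def covered(indices, hashes):
    """Indices that are singletons under at least one of the hashes."""
    result = set()
    for h in hashes:
        result |= singletons_under(indices, h)
    return result


def random_hashes(count, q, s, seed=0):
    """Draw `count` multipliers a uniformly from [1, q-1] (seeded for repeatability)."""
    rng = random.Random(seed)
    return [make_hash(rng.randrange(1, q), q, s) for _ in range(count)]


indices = [0, 6, 13, 19]
h1 = make_hash(1, 101, 7)    # 6 and 13 collide in bucket 6
h2 = make_hash(10, 101, 7)   # all four indices are singletons
print(sorted(singletons_under(indices, h1)))  # [0, 19]
print(sorted(covered(indices, [h1, h2])))     # [0, 6, 13, 19]
```

With randomly drawn multipliers, `covered` may still miss some indices; that is the first source of error in a randomized scheme of this kind.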
The Randomized Algorithm
Sources of Errors
1. Some non-zeros may appear only as multiples in every
   vector of the set.
2. The non-zero from the text that was aligned with
   the non-zero from the pattern may have come from a
   different index (false matches).
3. This algorithm was created for matching, but in
   convolution each non-zero should be calculated
   only once.
Deterministic Length Reduction
 Our Goal: Find a set of log(n) hash functions,
which will ensure that each non-zero appears as
a singleton at least once.
 Finding the hash functions is done in a preprocessing
step based on V1.
 The algorithm distinguishes between 2 cases:
   N1 is polynomial in n1.
   N1 is exponential in n1.
The Polynomial case: N < n^c
 Let q be a prime number of size O(n), and mod(q) be
the suggested hash function.
 Let i,j be the indices of two non-zeros, and let
dij = |i − j| be the distance between them.
 Observation: If i and j are mapped into the same
location, it means that q divides dij.
 Observation: There are at most c prime numbers of
size O(n) which divide dij, since dij < n^c and the
product of more than c primes, each at least n, exceeds n^c.
 Corollary: A non-zero can appear as a multiple in at
most c·n prime numbers, since it has fewer than n
distances to the other non-zeros.
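The two observations can be checked numerically. The sketch below uses hypothetical helper names (`same_slot`, `colliding_primes`) and a hand-picked list of small primes; it only illustrates that a collision under mod(q) is the same thing as q dividing the distance between the two indices.

```python
def same_slot(i, j, q):
    """True when indices i and j are mapped to the same location by mod(q)."""
    return i % q == j % q


def colliding_primes(index, indices, primes):
    """Primes q under which `index` appears as a multiple, i.e. q divides
    the distance from `index` to some other index."""
    return [q for q in primes
            if any(j != index and (index - j) % q == 0 for j in indices)]


indices = [0, 6, 13, 19]
primes = [5, 7, 11, 13]

print(colliding_primes(6, indices, primes))  # [7, 13]
print(colliding_primes(0, indices, primes))  # [13]
```

Index 6 collides under 7 (distance 7 to index 13) and under 13 (distance 13 to index 19); no other prime in the list divides any of its distances.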
Choosing Prime Numbers
 Test 2c·n prime numbers (of size O(n·log n)), and build
the following table:
 Each column represents a non-zero (n columns).
 Each row represents a prime number (2c·n rows).
 An entry is 1 if the non-zero appears as a singleton
under that prime, and 0 if it appears as a multiple.
 Reminder: Each non-zero can appear as a multiple at
most c·n times.
 Corollary: The table is at least half full with ones.
       NZ1  NZ2  NZ3  NZ4  NZ5
P1      1    0    0    1    0
P2      1    1    0    1    0
P3      0    0    0    0    1
P4      1    0    1    0    1
P5      0    1    1    1    0
P6      0    1    0    0    1
P7      1    0    1    1    0
P8      0    0    1    1    1
P9      1    1    0    0    0
P10     0    1    1    0    1
P11     0    1    0    1    1
P12     1    0    1    0    0
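A table of this kind can be computed directly from the indices of the non-zeros. The sketch below is an illustration under assumptions: `singleton_table` is a hypothetical name, and the indices and primes are small hand-picked values rather than the ones behind the table on the slide.

```python
def singleton_table(indices, primes):
    """One row per prime, one column per non-zero index.
    An entry is 1 when the index is alone in its slot modulo that prime."""
    table = []
    for q in primes:
        occupancy = {}
        for i in indices:
            occupancy[i % q] = occupancy.get(i % q, 0) + 1
        table.append([1 if occupancy[i % q] == 1 else 0 for i in indices])
    return table


indices = [0, 6, 13, 19]
primes = [5, 7, 11]
for q, row in zip(primes, singleton_table(indices, primes)):
    print(q, row)
# 5 [1, 1, 1, 1]
# 7 [1, 0, 0, 1]
# 11 [1, 1, 1, 1]
```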
Choosing Prime Numbers: Cont.
1. Select a prime number which generates a row that is
   at least half full (for example, P2).
2. Delete the row and all the columns in which there
   was a 1 in the deleted row.
3. Repeat steps 1 and 2 until the whole table is deleted.
Selected Primes: P2, P4
Time: O(n^2)
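The selection steps can be sketched as a greedy cover over the 12 × 5 table from the previous slide. One assumption is made here: instead of taking any row that is at least half full, the sketch takes the row covering the most remaining columns (ties go to the earliest row). When at least half of the remaining entries are ones, such a row is itself at least half full, so this is one valid way to carry out step 1. The name `greedy_select` is hypothetical.

```python
TABLE = {
    "P1":  [1, 0, 0, 1, 0], "P2":  [1, 1, 0, 1, 0], "P3":  [0, 0, 0, 0, 1],
    "P4":  [1, 0, 1, 0, 1], "P5":  [0, 1, 1, 1, 0], "P6":  [0, 1, 0, 0, 1],
    "P7":  [1, 0, 1, 1, 0], "P8":  [0, 0, 1, 1, 1], "P9":  [1, 1, 0, 0, 0],
    "P10": [0, 1, 1, 0, 1], "P11": [0, 1, 0, 1, 1], "P12": [1, 0, 1, 0, 0],
}


def greedy_select(table):
    """Pick rows until every column has a 1 in some picked row."""
    rows = dict(table)
    remaining = set(range(len(next(iter(rows.values())))))
    chosen = []
    while remaining:
        # Row with the most ones among the remaining columns (first one on ties).
        name = max(rows, key=lambda r: sum(rows[r][c] for c in remaining))
        covered = {c for c in remaining if rows[name][c]}
        if not covered:
            raise ValueError("some non-zero is never a singleton")
        chosen.append(name)
        remaining -= covered   # delete the covered columns
        del rows[name]         # delete the selected row
    return chosen


print(greedy_select(TABLE))  # ['P2', 'P4']
```

P2 covers NZ1, NZ2 and NZ4; P4 then covers the two remaining columns, NZ3 and NZ5.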
The Exponential Case: N < 2^n
 Idea: Reduce the length of the vector to polynomial
and continue with the previous algorithm.
 Any distance dij can be divided by at most n prime
numbers.
 There are at most n^2 different distances.
 Corollary: There are at most n^3 prime numbers which
generate multiples.
The Reduction Algorithm
1. Choose a prime number q of size O(n^4).
2. Create the reduced size vector using the mod(q)
   hash function.
3. Repeat steps 1 & 2 if a multiple was created.
4. Duplicate the obtained vector (create a vector of size
   2q), to allow further reduction of the vector.
Time: O(n^4)
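Steps 1 to 3 of the reduction can be sketched as follows. Two assumptions are made for illustration: candidate primes are tried in increasing order starting just above n^4 (the slides only say "of size O(n^4)"), and primality is tested by plain trial division. Step 4 (duplicating the vector) is not shown. The names `is_prime` and `reduce_to_polynomial` are hypothetical.

```python
def is_prime(m):
    """Trial-division primality test; adequate for candidates around n^4."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def reduce_to_polynomial(indices):
    """Find a prime q > n^4 under which all (distinct) indices stay distinct,
    and return q together with the reduced indices."""
    n = len(indices)
    q = n ** 4
    while True:
        q += 1
        if is_prime(q) and len({i % q for i in indices}) == n:
            return q, [i % q for i in indices]


# Indices spread over an exponentially large range.
indices = [0, 2**40 + 3, 2**50 + 7, 2**60 + 1]
q, reduced = reduce_to_polynomial(indices)
print(q, reduced)
```

After this step the largest index is below q, which is polynomial in n, so the polynomial-case procedure can take over.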
The Convolution Algorithm
1. For each prime number Pi:
   1. Create the reduced size vectors V’1,i & V’2,i using the
      indices of the non-zeros and perform shift matching.
   2. Create the reduced size vectors V’1,i & V’2,i using 1’s
      instead of the non-zeros and perform convolution.
   3. Create the reduced size vectors V’1,i & V’2,i using the
      values of the non-zeros and perform convolution.
   4. Zero the values of the non-zeros that appeared as singletons.
2. For all indices where shift matching was found:
   1. Sum the results of the 1’s convolutions.
   2. If the result is n2, then sum the results of the values
      convolutions and report the result.
Time: O(n·log^3(n))
Example
V1: (0, 5), (5, 2), (13, 3), (20, 1)
V2: (0, 2), (8, 3)
Prime Numbers: 5,7
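Prime 5:
V’1,1 (indices): 0 0 0 13 0
V’1,1 (ones):    0 0 0 1 0
V’1,1 (values):  0 0 0 3 0
V’2,1 (indices): ‘0’ 0 0 8 0
V’2,1 (ones):    1 0 0 1 0
V’2,1 (values):  2 0 0 3 0
Results: (5, 1, 9), (13, 1, 6)
Prime 7:
V’1,2 (indices): ‘0’ 0 0 0 0 5 0
V’1,2 (values):  5 0 0 0 0 2 0
V’2,2 (indices): ‘0’ 8 0 0 0 0 0
V’2,2 (values):  2 3 0 0 0 0 0
Results: (0, 1, 10), (5, 1, 4)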
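The bookkeeping of this example can be traced with a short sketch. It is not the algorithm from the slides: a brute-force loop over pairs of singletons stands in for the shift matching and the FFT-based convolutions, so it does not have the stated running time. The result triples are read here as (location, sum of the 1's products, sum of the value products), only non-negative locations are kept, and singleton status is decided on the original indices; all three are assumptions, and the function names are hypothetical.

```python
from collections import defaultdict


def singletons(pairs, q, skip=()):
    """Entries (index, value) that are alone in their slot modulo q,
    leaving out indices already handled under an earlier prime."""
    slots = defaultdict(list)
    for index, value in pairs:
        slots[index % q].append((index, value))
    return [e[0] for e in slots.values() if len(e) == 1 and e[0][0] not in skip]


def prime_triples(v1, v2, q, done):
    """Per-prime results as (location, count, value sum), plus the V1 indices used."""
    s1, s2 = singletons(v1, q, skip=done), singletons(v2, q)
    counts, sums = defaultdict(int), defaultdict(int)
    for i1, x1 in s1:
        for i2, x2 in s2:
            loc = i1 - i2          # location at which V2 is aligned against V1
            if loc >= 0:
                counts[loc] += 1
                sums[loc] += x1 * x2
    return sorted((loc, counts[loc], sums[loc]) for loc in counts), {i for i, _ in s1}


def sparse_match_convolution(v1, v2, primes):
    """Report value sums at locations where all n2 non-zeros of V2 were matched."""
    counts, sums, done = defaultdict(int), defaultdict(int), set()
    for q in primes:
        triples, used = prime_triples(v1, v2, q, done)
        for loc, c, s in triples:
            counts[loc] += c
            sums[loc] += s
        done |= used               # "zero" the V1 non-zeros seen as singletons
    return {loc: sums[loc] for loc in counts if counts[loc] == len(v2)}


V1 = [(0, 5), (5, 2), (13, 3), (20, 1)]
V2 = [(0, 2), (8, 3)]
print(prime_triples(V1, V2, 5, set())[0])        # [(5, 1, 9), (13, 1, 6)]
print(prime_triples(V1, V2, 7, {13})[0])         # [(0, 1, 10), (5, 1, 4)]
print(sparse_match_convolution(V1, V2, [5, 7]))  # {5: 13}
```

Location 5 is the only one whose count reaches n2 = 2, and its value 9 + 4 = 13 equals V1[5]·V2[0] + V1[13]·V2[8]. Index 20 is never a singleton under 5 or 7, so the two primes in the example do not cover every non-zero of V1.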
Conclusions and Open Problems
 A deterministic algorithm for length reduction and fast
convolution was presented.
   Preprocessing time: O(n^2) – Polynomial case, O(n^4) –
    Exponential case.
   Running time: O(n·log^2(n))
 Open problems:
 Can the preprocessing time be reduced?
 Can the size of the vectors be reduced?
 Can the number of vectors be reduced?
Thank You!
Questions?