Optimal Bounds for Johnson-Lindenstrauss Transforms and
Streaming Problems with Sub-Constant Error
T.S. Jayram
David Woodruff
IBM Almaden
Data Stream Model
• Have a stream of m updates to an n-dimensional vector v
– “add x to coordinate i”
– Insertion model → all updates x are positive
– Turnstile model → x can be positive or negative
– Stream length and update magnitudes are at most poly(n)
• Estimate statistics of v
– # of distinct elements F_0
– L_p-norm |v|_p = (Σ_i |v_i|^p)^{1/p}
– entropy
– and so on
• Goal: output a (1+ε)-approximation with limited memory
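To pin down the model, here is a minimal Python sketch of a turnstile stream and the statistics listed above. It stores v explicitly rather than in limited memory, and the function name and update format are our own illustration, not from the paper.

```python
from collections import defaultdict

# A tiny illustration of the turnstile model. This is NOT a small-space
# algorithm (it stores v explicitly); it only pins down the statistics
# that the streaming algorithms are estimating.
def stats_from_stream(updates, n, p=2):
    v = defaultdict(float)
    for i, x in updates:  # each update: "add x to coordinate i"
        assert 0 <= i < n
        v[i] += x         # x may be negative in the turnstile model
    f0 = sum(1 for val in v.values() if val != 0)             # distinct elements
    lp = sum(abs(val) ** p for val in v.values()) ** (1 / p)  # L_p-norm
    return f0, lp

# Coordinate 0 is updated and then cancelled, so only coordinate 2 survives.
print(stats_from_stream([(0, 3.0), (2, -1.0), (0, -3.0)], n=5))  # (1, 1.0)
```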
Lots of “Optimal” Papers
• Lots of “optimal” results
– “An optimal algorithm for the distinct elements problem” [KNW]
– “Fast moment estimation in optimal space” [KNPW]
– “A near-optimal algorithm for estimating entropy of a stream” [CCM]
– “Optimal approximations of the frequency moments of data streams” [IW]
– “A near-optimal algorithm for L1-difference” [NW]
– “Optimal space lower bounds for all frequency moments” [W]
• This paper
– Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error
What Is Optimal?
• F_0 = # of non-zero entries in v
• “For a stream of indices in {1, …, n}, our algorithm computes a (1+ε)-approximation using an optimal O(ε^{-2} + log n) bits of space with 2/3 success probability… This probability can be amplified by independent repetition.”
• If we want high probability, say 1 - 1/n, this increases the space by a multiplicative log n factor (see the amplification sketch below)
• So “optimal” algorithms are only optimal among algorithms with constant success probability
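For concreteness, a sketch of that amplification step: run a constant-success-probability estimator O(log 1/δ) times independently and take the median. The stand-in estimator `noisy_f0` and the constant 48 are illustrative assumptions, not from the paper.

```python
import math
import random
import statistics

rng = random.Random(0)

def noisy_f0(true_f0, eps=0.1):
    """Hypothetical estimator: (1+eps)-accurate with probability 2/3."""
    if rng.random() < 2 / 3:
        return true_f0 * rng.uniform(1 - eps, 1 + eps)
    return true_f0 * rng.uniform(0, 10)  # arbitrary garbage on failure

def amplified(true_f0, delta=1e-3):
    # The median of k runs fails only if half the runs fail, so by a
    # Chernoff bound k = O(log 1/delta) suffices; 48 is a safe constant.
    k = math.ceil(48 * math.log(1 / delta))
    return statistics.median(noisy_f0(true_f0) for _ in range(k))

print(amplified(1000))  # within ~10% of 1000 with probability 1 - delta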
Can We Improve the Lower Bounds?
x ∈ {0,1}^{ε^{-2}}, y ∈ {0,1}^{ε^{-2}}
Gap-Hamming: either Δ(x,y) > ½ + ε or Δ(x,y) < ½ - ε
Lower bound of Ω(ε^{-2}) with 1/3 error probability
But upper bound of ε^{-2} with 0 error probability
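The zero-error upper bound is simply “Alice sends all of x” (ε^{-2} bits), after which Bob can decide the promise exactly; a small sketch with illustrative parameters:

```python
def gap_hamming(x, y):
    """Decide the Gap-Hamming promise once Alice has sent x in full:
    the normalized distance is either > 1/2 + eps or < 1/2 - eps."""
    frac = sum(a != b for a, b in zip(x, y)) / len(x)
    return frac > 0.5  # always correct under the promise

eps = 0.25
x = [0, 1, 0, 1] * 4  # length eps^-2 = 16
y = [1, 0, 1, 0] * 4  # normalized distance 1 > 1/2 + eps
print(gap_hamming(x, y))  # True
```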
Our Results
Streaming Results
• Independent repetition is optimal!
• Estimating the L_p-norm in the turnstile model up to 1+ε w.p. 1-δ
– Ω(ε^{-2} log n log 1/δ) bits for any p
– [KNW] get O(ε^{-2} log n log 1/δ) for 0 ≤ p ≤ 2
• Estimating F_0 in the insertion model up to 1+ε w.p. 1-δ
– Ω(ε^{-2} log 1/δ + log n) bits
– [KNW] get O(ε^{-2} log 1/δ) for ε^{-2} > log n
• Estimating entropy in the turnstile model up to 1+ε w.p. 1-δ
– Ω(ε^{-2} log n log 1/δ) bits
– Improves the Ω(ε^{-2} log n) bound of [KNW]
Johnson-Lindenstrauss Transforms
• Let A be a random matrix so that with probability 1-δ, for any fixed q ∈ R^d:
|Aq|_2 = (1 ± ε) |q|_2
• [JL] A can be a (ε^{-2} log 1/δ) × d matrix
– Gaussians or sign variables work
• [Alon] A needs to have Ω((ε^{-2} log 1/δ) / log 1/ε) rows
• Our result: A needs to have Ω(ε^{-2} log 1/δ) rows
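A minimal sketch of such a transform using the sign-variable construction mentioned above; the leading constant in the row count (here 1) is illustrative, since the slides state only asymptotics.

```python
import numpy as np

def jl_matrix(eps, delta, d, rng):
    """Random sign JL matrix with about eps^-2 * log(1/delta) rows.

    The leading constant (here 1) is illustrative; the slides only
    state the asymptotic row count.
    """
    k = int(np.ceil(eps**-2 * np.log(1 / delta)))
    # Entries are +-1/sqrt(k), so E[|Aq|_2^2] = |q|_2^2 for any fixed q.
    return rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)

rng = np.random.default_rng(0)
eps, delta, d = 0.1, 0.01, 10_000
A = jl_matrix(eps, delta, d, rng)
q = rng.standard_normal(d)
print(np.linalg.norm(A @ q) / np.linalg.norm(q))  # within roughly 1 +- eps
```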
Communication Complexity Separation
Alice holds x, Bob holds y, and they want to compute f(x,y) ∈ {0,1}
D_{ρ,1/3}(f) = communication of the best 1-way deterministic protocol that errs w.p. 1/3 on distribution ρ
[KNR]: R||_{1/3}(f) = max over product distributions μ × λ of D_{μ×λ,1/3}(f)
Communication Complexity Separation
f(x,y) ∈ {0,1}
VC-dimension: maximum number r of columns for which all 2^r rows occur in the communication matrix restricted to these columns
[KNR]: R||_{1/3}(f) = Θ(VC-dimension(f))
Our result: there exist f and g with VC-dimension k, but:
R||_δ(f) = Θ(k log 1/δ) while R||_δ(g) = Θ(k)
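To make the definition concrete, here is a brute-force VC-dimension computation for a small 0/1 communication matrix; the function and example matrix are our own illustration.

```python
from itertools import combinations
import numpy as np

def vc_dimension(M):
    """Brute-force VC-dimension of a 0/1 communication matrix M:
    the largest r such that some r columns are shattered, i.e. all
    2^r patterns appear among the rows restricted to those columns."""
    _, n_cols = M.shape
    best = 0
    for r in range(1, n_cols + 1):
        shattered = any(
            len({tuple(row) for row in M[:, list(cols)]}) == 2 ** r
            for cols in combinations(range(n_cols), r)
        )
        if not shattered:  # if no r-set is shattered, no (r+1)-set is either
            break
        best = r
    return best

# All four 2-bit patterns occur as rows, so both columns are shattered.
M = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(vc_dimension(M))  # 2
```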
Our Techniques
Lopsided Set Intersection (LSI)
U = 1/ε² · 1/δ
Alice has S ⊆ {1, 2, …, U} with |S| = 1/ε²; Bob has T ⊆ {1, 2, …, U} with |T| = 1/δ
Is S ∩ T = ∅?
- Alice cannot describe S with o(ε^{-2} log U) bits
- If S and T are uniform then with constant probability, S ∩ T = ∅
- R||_{1/3}(LSI) ≥ D_{uniform,1/3}(LSI) = Ω(ε^{-2} log 1/δ)
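A quick Monte Carlo check of the middle claim, with illustrative parameters: the expected overlap is |S||T|/U = 1, so S and T are disjoint with probability roughly e^{-1}.

```python
import random

# Monte Carlo check of the disjointness claim: with |S| = 1/eps^2,
# |T| = 1/delta and U = 1/(eps^2 delta), the expected overlap |S||T|/U
# is 1, so Pr[S and T disjoint] is about e^-1. Parameters illustrative.
eps, delta = 0.1, 0.01
s_size, t_size = int(eps**-2), int(1 / delta)
U = s_size * t_size
rng = random.Random(0)
trials, disjoint = 2000, 0
for _ in range(trials):
    S = set(rng.sample(range(U), s_size))
    T = set(rng.sample(range(U), t_size))
    disjoint += not (S & T)
print(disjoint / trials)  # around exp(-1), i.e. ~0.37
```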
Lopsided Set Intersection (LSI2)
U = 1/ε² · 1/δ
Alice has S ⊆ {1, 2, …, U} with |S| = 1/ε²; Bob has T ⊆ {1, 2, …, U} with |T| = 1
Is S ∩ T = ∅?
- R||_{δ/3}(LSI2) ≥ R||_{1/3}(LSI) = Ω(ε^{-2} log 1/δ)
- Union bound over the 1/δ elements of Bob’s set in the LSI instance
Low Error Inner Product
U = 1/ε² · 1/δ
Alice has x ∈ {0, ε}^U with |x|_2 = 1; Bob has y ∈ {0, 1}^U with |y|_2 = 1
Does <x, y> = 0?
- Estimating <x, y> up to ε w.p. 1-δ solves LSI2 w.p. 1-δ
- R||_δ(inner-product_ε) = Ω(ε^{-2} log 1/δ)
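A sketch of this encoding (parameter values illustrative): S becomes a {0, ε}-vector of unit norm and j a standard basis vector, so <x, y> = ε exactly when j ∈ S.

```python
import numpy as np

# LSI2 -> inner product encoding from this slide: Alice's set S becomes
# x with x_k = eps for k in S (so |x|_2 = 1), and Bob's element j becomes
# the standard basis vector y = e_j. Then <x, y> = eps exactly when j is
# in S, so an additive estimate of <x, y> decides LSI2.
eps, delta = 0.1, 0.01
U = int(eps**-2 / delta)
rng = np.random.default_rng(1)
S = rng.choice(U, size=int(eps**-2), replace=False)
j = int(S[0])  # an intersecting instance: j in S

x = np.zeros(U); x[S] = eps
y = np.zeros(U); y[j] = 1.0
print(np.linalg.norm(x), x @ y)  # 1.0 and eps (0 if j were outside S)
```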
L2-estimation_ε
U = 1/ε² · 1/δ
Alice has x ∈ {0, ε}^U with |x|_2 = 1; Bob has y ∈ {0, 1}^U with |y|_2 = 1
What is |x-y|_2?
- |x-y|_2² = |x|_2² + |y|_2² - 2<x, y> = 2 - 2<x, y>
- Estimating |x-y|_2 up to a (1+Θ(ε)) factor solves inner-product_ε
- So R||_δ(L2-estimation_ε) = Ω(ε^{-2} log 1/δ)
- The log 1/δ factor is new, but we want an Ω(ε^{-2} log n log 1/δ) lower bound
- Can use a known trick to get an extra log n factor
Augmented Lopsided Set Intersection (ALSI2)
Universe [U] = [1/ε² · 1/δ]
Alice has S_1, …, S_r ⊆ [U], all i: |S_i| = 1/ε²
Bob has j ∈ [U], i* ∈ {1, 2, …, r}, and S_{i*+1}, …, S_r
Is j ∈ S_{i*}?
R||_{1/3}(ALSI2) = Ω(r ε^{-2} log 1/δ)
Reduction of ALSI2 to L2-estimation_ε
- Set r = Θ(log n)
- Encode each S_i as a vector x_i and j as y_{i*}, as in inner-product_ε; Alice forms x by placing copy i at scale 10^i
- Bob knows S_{i*+1}, …, S_r, so he can cancel the scales above i*:
y - x = 10^{i*} · y_{i*} - Σ_{i=1}^{i*} 10^i · x_i
- |y-x|_2 is dominated by 10^{i*} |y_{i*} - x_{i*}|_2
- R||_δ(L2-estimation_ε) = Ω(ε^{-2} log n log 1/δ)
- Streaming space ≥ R||_δ(L2-estimation_ε)
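A small numerical sketch of the scale trick, with illustrative constants; Bob cancels the scales above i*, so the remaining difference is dominated by the scale-i* term.

```python
import numpy as np

# Multi-scale embedding sketch: encode each set S_i as a {0, eps}-vector
# x_i and combine them as x = sum_i 10^i * x_i. Bob, who knows the sets
# S_{i*+1}, ..., S_r, reproduces those scales and adds 10^{i*} * y_{i*},
# so |y - x|_2 is dominated by 10^{i*} |y_{i*} - x_{i*}|_2.
eps, delta, r = 0.25, 0.1, 4
U = int(eps**-2 / delta)
rng = np.random.default_rng(2)

xs = []
for _ in range(r):
    xi = np.zeros(U)
    xi[rng.choice(U, size=int(eps**-2), replace=False)] = eps
    xs.append(xi)

i_star = 2
j = int(np.flatnonzero(xs[i_star - 1])[0])  # a yes-instance: j in S_{i*}
y_istar = np.zeros(U); y_istar[j] = 1.0

x = sum(10**i * xs[i - 1] for i in range(1, r + 1))
y = sum(10**i * xs[i - 1] for i in range(i_star + 1, r + 1)) + 10**i_star * y_istar

top = 10**i_star * np.linalg.norm(y_istar - xs[i_star - 1])
print(np.linalg.norm(y - x), top)  # close: the scale-i* term dominates
```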
Lower Bounds for Johnson-Lindenstrauss
Alice has x ∈ {-n^{O(1)}, …, n^{O(1)}}^t; Bob has y ∈ {-n^{O(1)}, …, n^{O(1)}}^t
Use public randomness to agree on a JL matrix A; Alice sends Ax, Bob computes Ay and |A(x-y)|_2
- Can estimate |x-y|_2 up to 1+ε w.p. 1-δ
- #rows(A) = Ω(r ε^{-2} log 1/δ / log n) = Ω(ε^{-2} log 1/δ) for r = Θ(log n)
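A sketch of this protocol, with illustrative dimensions and a sign matrix standing in for a generic JL matrix; the shared seed plays the role of public randomness.

```python
import numpy as np

# Protocol behind the JL lower bound: both players derive A from shared
# randomness (a common seed), Alice sends Ax, and Bob estimates |x-y|_2
# from |Ax - Ay|_2 = |A(x-y)|_2. Dimensions are illustrative.
def agree_on_A(seed, k, t):
    rng = np.random.default_rng(seed)  # public randomness
    return rng.choice([-1.0, 1.0], size=(k, t)) / np.sqrt(k)

t, k, seed = 5000, 900, 42
rng = np.random.default_rng(7)
x = rng.integers(-100, 101, size=t).astype(float)  # Alice's input
y = rng.integers(-100, 101, size=t).astype(float)  # Bob's input

A = agree_on_A(seed, k, t)
message = A @ x                  # what Alice transmits
estimate = np.linalg.norm(message - A @ y)
print(estimate / np.linalg.norm(x - y))  # close to 1
```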
Low-Error Hamming Distance
Universe = [n]; Δ(x,y) = Hamming distance between x ∈ {0,1}^n and y ∈ {0,1}^n
• R||_δ(Δ(x,y)_ε) = Ω(ε^{-2} log 1/δ log n)
• Reduction to ALSI2
• Gap-Hamming to LSI2 reductions with low error
• Implies our lower bounds for estimating
– any L_p-norm
– distinct elements
– entropy
Conclusions
• Prove the first streaming space lower bounds that depend on the error probability δ
– Optimal for L_p-norms, distinct elements
– Improves lower bound for entropy
– Optimal dimensionality bound for JL transforms
• Adds several twists to augmented indexing proofs
– Augmented indexing with a small set in a large domain
– Proof builds upon lopsided set disjointness lower bounds
– Uses multiple Gap-Hamming to Indexing reductions that
handle low error
ALSI2 to Hamming Distance
Alice has S_1, …, S_r ⊆ [1/ε² · 1/δ], all i: |S_i| = 1/ε²; Bob has j ∈ [U], i* ∈ {1, 2, …, r}, and S_{i*+1}, …, S_r
- Embed multiple copies by duplicating coordinates at different scales
- Let t = 1/ε² · log 1/δ
- Use the public coin to generate t random strings b_1, …, b_t ∈ {0,1}^U
- Alice sets x_i = majority_{k ∈ S_i} b_{i,k}
- Bob sets y_i = b_{i,j}
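A sketch of the majority trick for a single set S, with illustrative parameters; the theory needs only t = ε^{-2} log 1/δ strings, but the demo uses more samples so the gap is visible empirically.

```python
import math
import random

# Public-coin majority trick for one set S: with random strings
# b_i in {0,1}^U, Alice sets x_i = majority of b_i over S and Bob sets
# y_i = b_i[j]. If j is in S the bits are positively correlated, so the
# normalized Hamming distance falls noticeably below 1/2; otherwise it
# concentrates at 1/2. Constants and parameters are illustrative.
rng = random.Random(3)
eps, delta = 0.2, 0.05
s_size = int(eps**-2)
U = int(eps**-2 / delta)
t_theory = int(eps**-2 * math.log(1 / delta))  # what the reduction needs
t = 2000  # >> t_theory, only to make the empirical gap visible

S = rng.sample(range(U), s_size)
j_in, j_out = S[0], next(k for k in range(U) if k not in S)

def normalized_distance(j):
    dist = 0
    for _ in range(t):
        b = [rng.randrange(2) for _ in range(U)]
        x_bit = int(2 * sum(b[k] for k in S) > s_size)  # majority over S
        dist += x_bit != b[j]
    return dist / t

print(normalized_distance(j_in), normalized_distance(j_out))  # ~0.42 vs ~0.50
```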