Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error
T.S. Jayram, David Woodruff (IBM Almaden)

Data Stream Model
• Have a stream of m updates to an n-dimensional vector v
  – "add x to coordinate i"
  – Insertion model: all updates x are positive
  – Turnstile model: x can be positive or negative
  – Stream length and updates are < poly(n)
• Estimate statistics of v
  – number of distinct elements F0
  – Lp-norm |v|p = (Σi |vi|^p)^(1/p)
  – entropy, and so on
• Goal: output a (1+ε)-approximation with limited memory

Lots of "Optimal" Papers
• Lots of "optimal" results:
  – "An optimal algorithm for the distinct elements problem" [KNW]
  – "Fast moment estimation in optimal space" [KNPW]
  – "A near-optimal algorithm for estimating entropy of a stream" [CCM]
  – "Optimal approximations of the frequency moments of data streams" [IW]
  – "A near-optimal algorithm for L1-difference" [NW]
  – "Optimal space lower bounds for all frequency moments" [W]
• This paper: Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error

What Is Optimal?
• F0 = number of non-zero entries in v
• "For a stream of indices in {1, …, n}, our algorithm computes a (1+ε)-approximation using an optimal O(ε⁻² + log n) bits of space with 2/3 success probability… This probability can be amplified by independent repetition."
• If we want high probability, say 1 - 1/n, this increases the space by a multiplicative log n factor
• So "optimal" algorithms are only optimal among algorithms with constant success probability

Can We Improve the Lower Bounds?
• x ∈ {0,1}^(1/ε²), y ∈ {0,1}^(1/ε²)
• Gap-Hamming: either Δ(x,y) > ½ + ε or Δ(x,y) < ½ - ε
• Lower bound of Ω(ε⁻²) with 1/3 error probability
• But an upper bound of ε⁻² bits with 0 error probability
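The "amplified by independent repetition" step quoted above is the standard median trick: run O(log 1/δ) independent copies of a constant-success estimator and output the median. A minimal simulation of that idea; `noisy_estimator` is a hypothetical stand-in for a streaming sketch, not the actual F0 algorithm:

```python
import math
import random
import statistics

def noisy_estimator(true_value, eps, rng):
    """Stand-in for a streaming sketch: returns a (1+eps)-approximation
    of true_value with probability 2/3, an uncontrolled value otherwise."""
    if rng.random() < 2 / 3:
        return true_value * (1 + rng.uniform(-eps, eps))
    return true_value * rng.uniform(0, 2)  # failure case: arbitrary output

def amplified_estimator(true_value, eps, delta, rng):
    """Median of O(log 1/delta) independent repetitions: the median is bad
    only if at least half the copies fail, which by a Chernoff bound
    happens with probability < delta."""
    k = math.ceil(48 * math.log(1 / delta))  # constant not optimized
    return statistics.median(
        noisy_estimator(true_value, eps, rng) for _ in range(k)
    )
```

Each repetition multiplies the space, which is exactly the multiplicative log n overhead the previous slide complains about when δ = 1/n.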
Our Results

Streaming Results
• Independent repetition is optimal!
• Estimating the Lp-norm in the turnstile model up to 1+ε w.p. 1-δ:
  – Ω(ε⁻² log n log 1/δ) bits for any p
  – [KNW] get O(ε⁻² log n log 1/δ) for 0 ≤ p ≤ 2
• Estimating F0 in the insertion model up to 1+ε w.p. 1-δ:
  – Ω(ε⁻² log 1/δ + log n) bits
  – [KNW] get O(ε⁻² log 1/δ) for ε⁻² > log n
• Estimating entropy in the turnstile model up to 1+ε w.p. 1-δ:
  – Ω(ε⁻² log n log 1/δ) bits
  – Improves the Ω(ε⁻² log n) bound of [KNW]

Johnson-Lindenstrauss Transforms
• Let A be a random matrix such that with probability 1-δ, for any fixed q ∈ R^d: |Aq|₂ = (1 ± ε)|q|₂
• [JL]: A can be a (1/ε² · log 1/δ) × d matrix
  – Gaussian or random-sign entries work
• [Alon]: A needs Ω((1/ε² · log 1/δ) / log(1/ε)) rows
• Our result: A needs Ω(1/ε² · log 1/δ) rows

Communication Complexity Separation
• f(x,y) ∈ {0,1}; Alice holds x, Bob holds y
• Dρ,1/3(f) = communication of the best one-way deterministic protocol that errs w.p. 1/3 on distribution ρ
• [KNR]: R||1/3(f) = max over product distributions μ × λ of Dμ×λ,1/3(f)
• VC-dimension: the maximum number r of columns for which all 2^r rows occur in the communication matrix restricted to these columns
• [KNR]: R||1/3(f) = Θ(VC-dimension(f))
• Our result: there exist f and g, both with VC-dimension k, but R||δ(f) = Θ(k log 1/δ) while R||δ(g) = Θ(k)

Our Techniques

Lopsided Set Intersection (LSI)
• Universe size U = 1/ε² · 1/δ
• Alice has S ⊆ {1, 2, …, U} with |S| = 1/ε²; Bob has T ⊆ {1, 2, …, U} with |T| = 1/δ
• Question: is S ∩ T = ∅?
• Alice cannot describe S with o(ε⁻² log U) bits
• If S and T are uniform, then with constant probability S ∩ T = ∅
• R||1/3(LSI) ≥ Duniform,1/3(LSI) = Ω(ε⁻² log 1/δ)

Lopsided Set Intersection (LSI2)
• Same universe U = 1/ε² · 1/δ; now |S| = 1/ε² and |T| = 1
• R||δ/3(LSI2) ≥ R||1/3(LSI) = Ω(ε⁻² log 1/δ)
• Follows by a union bound over the set elements in an LSI instance

Low-Error Inner Product
• U = 1/ε² · 1/δ; x ∈ {0, ε}^U with |x|₂ = 1; y ∈ {0,1}^U with |y|₂ = 1
• Question: does ⟨x,y⟩ = 0?
• Estimating ⟨x,y⟩ up to ε w.p. 1-δ solves LSI2 w.p. 1-δ
• R||δ(inner-productε) = Ω(ε⁻² log 1/δ)
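The [JL] upper bound mentioned above, a k × d matrix of random signs with k = O(ε⁻² log 1/δ) rows that preserves |q|₂, is easy to sanity-check numerically. A sketch assuming NumPy; the constant 6 is an arbitrary illustrative choice, not a value from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 10_000                                # ambient dimension
eps, delta = 0.1, 0.01
k = int(6 * np.log(1 / delta) / eps**2)   # O(eps^-2 log 1/delta) rows

# Random sign matrix, scaled so that E[|Aq|_2^2] = |q|_2^2
A = rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)

q = rng.standard_normal(d)
ratio = np.linalg.norm(A @ q) / np.linalg.norm(q)  # should be 1 ± eps
```

Our lower bound says the number of rows cannot be pushed below Ω(ε⁻² log 1/δ), closing the log(1/ε) gap left by [Alon].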
L2-estimationε
• U = 1/ε² · 1/δ; x ∈ {0, ε}^U with |x|₂ = 1; y ∈ {0,1}^U with |y|₂ = 1
• What is |x-y|₂?
• |x-y|₂² = |x|₂² + |y|₂² - 2⟨x,y⟩ = 2 - 2⟨x,y⟩
• Estimating |x-y|₂ up to a (1+Θ(ε)) factor solves inner-productε
• So R||δ(L2-estimationε) = Ω(ε⁻² log 1/δ)
• The log 1/δ factor is new, but we want an Ω(ε⁻² log n log 1/δ) lower bound
• Can use a known trick to get an extra log n factor

Augmented Lopsided Set Intersection (ALSI2)
• Universe [U] = [1/ε² · 1/δ]
• Alice has S1, …, Sr ⊆ [U] with |Si| = 1/ε² for all i
• Bob has j ∈ [U], i* ∈ {1, 2, …, r}, and Si*+1, …, Sr
• Question: is j ∈ Si*?
• R||1/3(ALSI2) = Ω(r ε⁻² log 1/δ)

Reduction of ALSI2 to L2-estimationε
• Set r = Θ(log n)
• Encode each Si as a vector xi, and j as a vector yi*
• Alice sets x = Σ_{i=1}^{r} 10^i · xi; Bob, knowing Si*+1, …, Sr, sets y = 10^{i*} · yi* + Σ_{i=i*+1}^{r} 10^i · xi
• Then y - x = 10^{i*} · yi* - Σ_{i=1}^{i*} 10^i · xi, and |y-x|₂ is dominated by 10^{i*} |yi* - xi*|₂
• R||δ(L2-estimationε) = Ω(ε⁻² log n log 1/δ)
• Streaming space ≥ R||δ(L2-estimationε)

Lower Bounds for Johnson-Lindenstrauss
• x ∈ {-n^O(1), …, n^O(1)}^t, y ∈ {-n^O(1), …, n^O(1)}^t
• Use public randomness to agree on a JL matrix A; Alice sends Ax, Bob computes |A(x-y)|₂ from Ax and Ay
• This estimates |x-y|₂ up to 1+ε w.p. 1-δ
• Each coordinate of Ax takes O(log n) bits, so #rows(A) = Ω(r ε⁻² log 1/δ / log n) = Ω(ε⁻² log 1/δ) for r = Θ(log n)

Low-Error Hamming Distance
• Universe = [n]; x, y ∈ {0,1}^n; Δ(x,y) = Hamming distance between x and y
• R||δ(Δ(x,y)ε) = Ω(ε⁻² log 1/δ log n)
• Via reduction from ALSI2
• Uses Gap-Hamming to LSI2 reductions with low error
• Implies our lower bounds for estimating any Lp-norm, distinct elements, and entropy

Conclusions
• Prove the first streaming space lower bounds that depend on the probability of error δ
  – Optimal for Lp-norms, distinct elements
  – Improves the lower bound for entropy
  – Optimal dimensionality bound for JL transforms
• Adds several twists to augmented indexing proofs
  – Augmented indexing with a small set in a large domain
  – Proof builds upon lopsided set disjointness lower bounds
  – Uses multiple Gap-Hamming to Indexing reductions that handle low error

ALSI2 to Hamming Distance
• S1, …, Sr ⊆ [1/ε² · 1/δ] with |Si| = 1/ε² for all i; Bob has j ∈ [U], i* ∈ {1, 2, …, r}, and Si*+1, …, Sr
• Embed multiple copies by duplicating coordinates at different scales
• Let t = 1/ε² · log 1/δ
• Use the public coin to generate t random strings
b1, …, bt ∈ {0,1}^U
• Alice sets xi = majority_{k ∈ Si} bi,k
• Bob sets yi = bi,j
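The majority encoding above can be simulated to see the bias it creates: when j ∈ S, Bob's bit bi,j participates in Alice's majority vote, so xi and yi agree with probability ½ + Θ(ε); when j ∉ S they are independent and agree with probability ½, which is exactly the gap a Gap-Hamming instance detects. A sketch for a single copy, with illustrative parameter values that are not from the paper:

```python
import random

rng = random.Random(0)

U = 500             # universe size (1/eps^2 * 1/delta in the proof)
S = set(range(25))  # Alice's set, |S| = 1/eps^2 with eps = 1/5
t = 5000            # number of public random strings

# Public coin: t random strings b_1, ..., b_t in {0,1}^U
b = [[rng.randint(0, 1) for _ in range(U)] for _ in range(t)]

def majority_bit(string, S):
    """Majority of the bits of `string` indexed by S (|S| odd, so no ties)."""
    return 1 if 2 * sum(string[k] for k in S) > len(S) else 0

# Alice's vector: one majority bit per public string
x = [majority_bit(b[i], S) for i in range(t)]

def agreement(j):
    """Fraction of coordinates where Alice's x agrees with Bob's y for element j."""
    return sum(x[i] == b[i][j] for i in range(t)) / t
```

With this setup, `agreement(j)` for an element j ∈ S comes out noticeably above ½, while for j ∉ S it stays near ½, so estimating the Hamming distance between x and Bob's vector decides membership.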