Corruption and Recovery: Efficient Locally Decodable Codes
David Woodruff, IBM Almaden

Locally Decodable Codes
• A binary (q, δ, ε)-LDC is an encoding C: {0,1}^n -> {0,1}^m for which there is a machine A with access to a noisy version y of C(x)
• For all x ∈ {0,1}^n, if Δ(y, C(x)) ≤ δm, then for all k ∈ [n], Pr[A^y(k) = x_k] ≥ ½ + ε (probability over A's coins)
• A always queries at most q coordinates of y
• C is called linear if C is a linear transformation

Locally Decodable Codes
• There is a tradeoff between message length n, encoding length m, number of queries q, fraction of corrupted bits δ, and recovery probability ½ + ε. We focus on the popular case when q is constant.
• For q = 1, LDCs do not exist (Katz, Trevisan)
• For q = 2, the best known LDC is linear and Hadamard-based, and achieves m = exp(δn), which is optimal for linear codes (Obata, improving upon Goldreich, Karloff, Schulman, Trevisan)
• For q = 3, the best known construction has m = exp(exp(log^{1/2} n · log log n)), assuming ε and δ are constant (Efremenko, improving upon Yekhanin)
• All known constructions of LDCs are linear

Main Result
We give a black box transformation from a linear LDC into a non-linear LDC with a better dependence on δ and ε. This can yield significant improvements in applications in which δ and ε may be flexible (e.g., small constants or sub-constant).

Theorem: Given a family of (q, δ, ½ - βδ)-LDCs of length m(n), where q is a constant, β > 0 is a constant, and δ < 1/(2β), there is a family of non-linear (q, Θ(δ), ε)-LDCs of length poly(1/(ε+δ)) · m(max(ε, δ)δn).

The Hadamard code is a (2, δ, ½ - 2δ)-LDC with m(n) = 2^n. We get a family of (2, Θ(δ), ε)-LDCs of length poly(1/(ε+δ)) · exp(max(ε, δ)δn).
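To make the Hadamard example above concrete, here is a minimal sketch of the Hadamard-based 2-query LDC: codeword position v stores the inner product <v, x> mod 2, and the decoder recovers x_k by XORing two positions that differ in e_k. The function names are my own, not from the talk.

```python
import random

def hadamard_encode(x):
    """Codeword position v (for every v in {0,1}^n) holds <v, x> mod 2."""
    n = len(x)
    code = []
    for v in range(1 << n):
        bit = 0
        for i in range(n):
            if (v >> i) & 1:
                bit ^= x[i]
        code.append(bit)
    return code

def hadamard_decode_bit(y, n, k, rng=random):
    """2-query local decoder for x_k: query a random position a together
    with a XOR e_k; their XOR equals <e_k, x> = x_k if neither bit of the
    received word y was corrupted."""
    a = rng.randrange(1 << n)
    return y[a] ^ y[a ^ (1 << k)]
```

Each of the two queries is individually uniform over the codeword, so with δm corrupted positions a union bound gives success probability at least 1 - 2δ = ½ + (½ - 2δ), matching the (2, δ, ½ - 2δ) parameters; the length is m = 2^n = exp(n).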
This separates linear and non-linear LDCs, as there is an exp(δn) lower bound for linear 2-query LDCs (answering a question posed by Kerenidis and de Wolf). It also improves the exponent (in terms of δ and ε) for constant q-query LDCs, replacing occurrences of n in known constructions with max(δ, ε)δn.

Additional Results
• Our result gives a 2-query LDC with m(n) = exp(max(ε, δ)δn). The known lower bound for general (non-linear) 2-query LDCs is m(n) ≥ exp(ε²δn) (Kerenidis, de Wolf)
• What is the optimal dependence on ε and δ?
• We improve the lower bound to m(n) ≥ exp(max(ε, δ)δn) when the decoder is a matching sum decoder (this generalizes the perfectly smooth decoder of Trevisan; all known decoders have this property)
• Assumption: the decoder has partial matchings M_1, …, M_n of edges on the vertices {1, …, m}. Given k ∈ [n], the decoder chooses a uniformly random edge {a, b} in M_k and outputs y_a ⊕ y_b (recall that y is the received word). Note that the encoding need not be linear.

Additional Results
• Our lower bound technique also yields concrete improvements in the best known 3-query LDCs for constant δ and ε
• The best known 3-query LDC has m(n) = exp(exp(log^{1/2} n · log log n)), but has recovery probability < ½ as soon as δ > 1/12
• We get a 3-query LDC of length m(n) = exp(exp(log^{1/2} n · log log n)) for any δ < 1/6
• Our LDC, as well as Efremenko's, has a matching sum decoder, and there is no 3-query LDC with a matching sum decoder for δ > 1/6

Techniques
Theorem: Given a family of (q, δ, ½ - βδ)-LDCs of length m(n), where q is a constant, β > 0 is a constant, and δ < 1/(2β), there is a family of non-linear (q, Θ(δ), ε)-LDCs of length poly(1/(ε+δ)) · m(max(ε, δ)δn).

Take x ∈ {0,1}^n and partition the n coordinates into n/r blocks B_1, …, B_{n/r}, each containing r = Θ((ε+δ)^{-2}) coordinates. Compute z_j = majority(x_i | i ∈ B_j), and encode z_1, …, z_{n/r} with a (q, δ, ε)-LDC C. If k ∈ B_j, then Pr_{x ∈ {0,1}^n}[x_k = z_j] ≥ ½ + 3q(ε+δ).
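The block-majority step above can be sketched as follows; the function name and the assumption that r is odd (so every majority is well-defined) are my own choices.

```python
def block_majorities(x, r):
    """Partition the n coordinates of x into n/r blocks of r consecutive
    coordinates and output each block's majority bit z_j; these n/r bits
    are what gets encoded by the inner (q, delta, eps)-LDC."""
    assert len(x) % r == 0 and r % 2 == 1  # odd r: no majority ties
    return [int(2 * sum(x[j:j + r]) > r) for j in range(0, len(x), r)]
```

For a uniformly random x, a fixed coordinate x_k agrees with its block's majority with probability ½ + Θ(1/√r), so choosing r = Θ((ε+δ)^{-2}) makes the agreement probability ½ + Θ(ε+δ), which is the ½ + 3q(ε+δ) bound above up to the constant.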
Choose s_1, …, s_t ∈ {0,1}^n uniformly at random, apply the above procedure to each of x ⊕ s_1, x ⊕ s_2, …, x ⊕ s_t, and take the concatenation. The shifts s_1, …, s_t are chosen by the probabilistic method so that for all x ∈ {0,1}^n and all k ∈ [n], if k ∈ B_j, then Pr_{i ∈ [t]}[(x ⊕ s_i)_k = majority{(x ⊕ s_i)_l | l ∈ B_j}] ≥ ½ + 2q(ε+δ). The length of the encoding is t · m(n/r) = poly(1/(ε+δ)) · m(n(ε+δ)²).

Techniques
• To decode, choose a random i ∈ [t] and decode from the q positions in the (corrupted) encoding of x ⊕ s_i.
• There are two sources of error:
1. An adversarial δ fraction of bits flipped in the encoding.
2. Sometimes (x ⊕ s_i)_k is unequal to the majority of the bits in the corresponding block.
• If the error sources were independent, the decoder's success probability would be Pr_i[(x ⊕ s_i)_k = majority{(x ⊕ s_i)_l | l ∈ B_j}] · (1 - qδ) ≥ (½ + 2qε + 2qδ)(1 - qδ) > ½ + ε.
• They are not independent, though: the adversary can first decode to recover x, then guess a k ∈ [n], then corrupt exactly those encodings of x ⊕ s_i for which (x ⊕ s_i)_k equals the majority of the bits in the corresponding block. However, with probability at least 1 - qδ, no queried position is corrupted, so by a union bound, Pr[decode correctly] ≥ (½ + 2qε + 2qδ) - qδ > ½ + ε.

Techniques
• We now have a (q, Θ(δ), ε)-LDC, but the length is poly(1/(ε+δ)) · m(n(ε+δ)²).
• If q = 2, this gives poly(1/(ε+δ)) · exp(n(ε+δ)²). However, there is a linear 2-query LDC with length poly(1/(ε+δ)) · exp(δn).
• If ε > δ, our LDC might be longer. We can handle ε > δ as follows:
• Break the message x ∈ {0,1}^n into Θ(ε/δ) groups, each of size Θ((δ/ε)n).
• Encode each group using the above procedure, and concatenate the encodings. The length is m' = poly(1/(ε+δ)) · exp((δ/ε)n(ε+δ)²) = poly(1/(ε+δ)) · exp(εδn).
• Inside each group, the adversary can corrupt up to δm' positions, which, since the group has Θ(δ/ε) · m' positions, is an ε fraction of positions. Thus Pr[decode correctly] ≥ (½ + 2qε + 2qδ) - qε > ½ + ε, as desired.

Recap
X_1, X_2, …, X_n: break into d = max(1, Θ(ε/δ)) groups, Group 1, Group 2, …, Group d. For each group:
1.
Break into (n/d)(ε+δ)² blocks, each of size 1/(ε+δ)².
2. Compute the majority bit of each block, repeating for several different shifts x ⊕ s_i for random s_i ∈ {0,1}^n.
3. Encode the majority bits using an existing q-query LDC.

Lower Bound Techniques
• How good is our upper bound? For 2 queries we achieve m(n) = exp(max(ε, δ)δn).
• Kerenidis and de Wolf show m(n) ≥ exp(ε²δn) for 2-query non-linear LDCs (recall that for 2-query linear LDCs, the bound is m(n) ≥ exp(δn)).
• We improve the bound to a tight m(n) = exp(max(ε, δ)δn) under the assumption that the decoder has partial matchings M_1, …, M_n of edges on the vertices {1, …, m}. Given k ∈ [n], the decoder chooses a uniformly random edge {a, b} in M_k and outputs y_a ⊕ y_b (recall that y is the received word).
• Any linear LDC can be assumed to be in this form after minor modifications. Our LDC also has this form (so all known LDCs have this form).

Lower Bound Techniques
• Intuition: fix a matching M_k. For each edge e = {a, b} in M_k, look at the probability p_{k,e} that C(x)_a ⊕ C(x)_b = x_k for a random x ∈ {0,1}^n.
• Let q_k be the probability that the decoder succeeds, assuming no bits are flipped by the adversary, over a random x ∈ {0,1}^n.
• By our assumptions, q_k = Σ_{e ∈ M_k} p_{k,e} / |M_k|.
• (Correctness) q_k ≥ ½ + ε.
• (Restricted decoder) q_k ≥ ½ + δm/|M_k|. Otherwise there is a fixed x ∈ {0,1}^n with fewer than |M_k|/2 + δm edges that can be used for recovering x_k, and the adversary can flip one endpoint of exactly δm edges in M_k.
• Main "Average-case LDC Lemma": suppose we have matchings M_k with sizes c_k m such that for all k, q_k ≥ ½ + r_k. Let r = Σ_{k=1}^n r_k / n and c = Σ_{k=1}^n c_k / n. Then m ≥ exp(ncr²).

Lower Bound Techniques
• Main Lemma (restated): suppose we have matchings M_k with sizes c_k m such that for all k, q_k ≥ ½ + r_k. Let r = Σ_{k=1}^n r_k / n and c = Σ_{k=1}^n c_k / n. Then m ≥ exp(ncr²).
• Our claims imply r_k ≥ max(ε, δm/|M_k|) = max(ε, δ/c_k).
• One can show that exp(ncr²) is then minimized at exp(max(ε, δ)δn).
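The matching sum decoder assumed in the lower bound is easy to sketch directly. For the Hadamard code, M_k can be taken to be the perfect matching pairing each position a with a ⊕ e_k; this packaging of the definition is mine.

```python
import random

def hadamard_matchings(n):
    """For the Hadamard code of length m = 2^n, M_k pairs each position a
    with a XOR e_k, giving a perfect matching of size m/2 for every k."""
    m = 1 << n
    matchings = []
    for k in range(n):
        ek = 1 << k
        matchings.append([(a, a ^ ek) for a in range(m) if a < a ^ ek])
    return matchings

def matching_sum_decode(y, matchings, k, rng=random):
    """Matching sum decoder: choose a uniformly random edge {a, b} in M_k
    and output y_a XOR y_b on the received word y."""
    a, b = rng.choice(matchings[k])
    return y[a] ^ y[b]
```

On an uncorrupted Hadamard codeword every edge {a, b} in M_k satisfies y_a ⊕ y_b = x_k, so p_{k,e} = 1 and q_k = 1 here; the lower bound argument studies how small the q_k can be for general (possibly non-linear) encodings.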
• The proof of the main lemma generalizes earlier quantum information theory arguments of Kerenidis and de Wolf.

Conclusions
• Gave a black box transformation from a linear LDC into a non-linear LDC with a better dependence on δ and ε.
– Separates linear and non-linear 2-query LDCs
– Yields 3-query LDCs with the best known dependence on δ and ε
• Gave a tight lower bound for 2-query LDCs with matching sum decoders.
• Extended the range of δ for which 3-query LDCs are non-trivial from δ < 1/12 to δ < 1/6.
• General question: how are the parameters of linear and non-linear LDCs related?

Additional Perspective
• To prove the main lemma in our lower bound, we need various transformations between LDCs.
• One such transformation yields concrete improvements in the best known 3-query LDCs for constant δ and ε.
• The best known 3-query LDC has m(n) = exp(exp(log^{1/2} n · log log n)), but has recovery probability < ½ as soon as δ > 1/12.
• We get a 3-query LDC of length m(n) = exp(exp(log^{1/2} n · log log n)) with non-trivial recovery probability for any δ < 1/6 (and we preserve linearity).
• Our LDC, as well as Efremenko's, has a matching sum decoder, and there is no 3-query LDC with a matching sum decoder for δ > 1/6.

An LDC Transformation
• Take Efremenko's linear 3-query LDC of length m(n). Identify the codeword positions with linear forms v_j, so that the j-th position computes <v_j, x>.
• For all k ∈ [n], there is a matching M_k of triples {a, b, c} of codeword positions such that v_a ⊕ v_b ⊕ v_c = e_k, with |M_k| ≥ φm for a constant φ > 0.
• Consider a new LDC formed by taking each ordered multiset S = {a_1, …, a_p} of positions, of size p = O(ln 1/φ), and creating the entry a_1 ⊕ a_2 ⊕ … ⊕ a_p.
• The new codeword length is m(n)^p, which is still exp(exp(log^{1/2} n · log log n)).
• With high probability, for any multiset S and any k, there exists an a ∈ S for which (a, b) is in M_k for some b, and so one can consider the edge (S, T), where T is the multiset formed by removing a from S and inserting b.
• We boost the size of the matchings, and this gives a better dependence on δ. There are a few minor issues to ensure that the matchings are well-defined.
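The re-indexing step of the transformation above can be sketched as follows; this covers only the p-wise XOR of codeword entries, not the matching-boosting analysis, and the function name is mine.

```python
from itertools import product

def boost_codeword(codeword, p):
    """New code indexed by ordered p-tuples (a_1, ..., a_p) of old
    positions; each entry is the XOR of the p old entries. A linear code
    stays linear, and the length grows from m to m^p."""
    m = len(codeword)
    return [sum(codeword[a] for a in tup) & 1
            for tup in product(range(m), repeat=p)]
```

If the old positions compute linear forms v_j, then the tuple (a_1, …, a_p) computes <v_{a_1} ⊕ … ⊕ v_{a_p}, x>, and swapping a single element a of a tuple S for b changes the computed form by v_a ⊕ v_b; the talk builds the boosted matchings out of such swaps.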