Corruption and Recovery-Efficient Locally Decodable Codes
David Woodruff
IBM Almaden
Locally Decodable Codes
• A binary (q, δ, ε)-LDC is an encoding
C: {0,1}^n -> {0,1}^m
for which there is a machine A with access to a noisy version y of C(x)
• ∀ x ∈ {0,1}^n, if Δ(y, C(x)) ≤ δm, then ∀ k ∈ [n],
Pr[A^y(k) = x_k] ≥ ½ + ε (probability over A’s coins)
• A always queries at most q coordinates of y
• C is called linear if C is a linear transformation
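As a concrete instance of this definition, here is a minimal Python sketch (illustrative, not from the talk) of the Hadamard code with its standard 2-query local decoder:

```python
import random

def int_to_bits(a, n):
    """Little-endian bit vector of the integer a."""
    return [(a >> i) & 1 for i in range(n)]

def inner(a, x):
    """Inner product mod 2 of two bit vectors."""
    return sum(ai & xi for ai, xi in zip(a, x)) % 2

def hadamard_encode(x):
    """Hadamard code: the codeword has m = 2^n positions;
    position a holds <a, x> mod 2."""
    n = len(x)
    return [inner(int_to_bits(a, n), x) for a in range(2 ** n)]

def decode_bit(y, k, n):
    """2-query local decoder for x_k: pick a uniformly random a,
    query y at positions a and a xor e_k, and output the XOR.
    If a delta fraction of y is corrupted, each query hits a bad
    position with probability at most delta, so the decoder is
    correct with probability at least 1 - 2*delta."""
    a = random.randrange(2 ** n)
    b = a ^ (1 << k)
    return y[a] ^ y[b]
```

On an uncorrupted codeword the decoder recovers every x_k with certainty, matching the (2, δ, ½−2δ) parameters discussed below.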
Locally Decodable Codes
• Tradeoff between message length n, encoding length m, number of
queries q, fraction of corrupted bits δ, and recovery probability ½ + ε.
We focus on the popular case when q is constant
• For q = 1, LDCs do not exist (Katz, Trevisan)
• For q = 2, the known LDC is linear and Hadamard-based, and achieves
m = exp(δn), which is optimal for linear codes (Obata, improving upon
Goldreich, Karloff, Schulman, Trevisan)
• For q = 3, the known construction has m = exp(exp(log^{1/2} n · log log n)),
assuming ε and δ are constant (Efremenko, improving upon Yekhanin)
• All known constructions of LDCs are linear
Main Result
We give a black box transformation from a linear LDC into a non-linear
LDC with a better dependence on δ and ε.
Can yield significant improvements in applications in which δ and ε may
be flexible (e.g., small constants or sub-constant).
Theorem: Given a family of (q, δ, ½−βδ)-LDCs of length m(n), where q is a
constant, β > 0 is a constant, and δ < 1/(2β), there is a family of non-linear
(q, Θ(δ), ε)-LDCs of length poly(1/(ε+δ)) · m(max(ε,δ)δn).
The Hadamard code is a (2, δ, ½−2δ)-LDC with m(n) = 2^n. We get a family of
(2, Θ(δ), ε)-LDCs of length poly(1/(ε+δ)) · exp(max(ε,δ)δn).
This separates linear and non-linear LDCs, as there is an exp(δn) lower
bound for linear 2-query LDCs (answering a question posed by
Kerenidis and de Wolf).
It improves the exponent (in terms of δ, ε) for constant q-query LDCs,
replacing occurrences of n in known constructions with max(δ,ε)δn.
Additional Results
• Our result gives a 2-query LDC with m(n) = exp(max(ε,δ)δn). The known
lower bound for general (non-linear) 2-query LDCs is m(n) ≥ exp(ε²δn)
(Kerenidis, de Wolf)
• What is the optimal dependence on ε, δ?
• We improve the lower bound to m(n) ≥ exp(max(ε,δ)δn) when the
decoder is a matching sum decoder (this generalizes the perfectly
smooth decoder of Trevisan; all known decoders have this property)
• Assumption: the decoder has partial matchings M_1, …, M_n of edges on
vertices {1, …, m}. Given k ∈ [n], the decoder chooses a uniformly
random edge {a, b} ∈ M_k and outputs y_a ⊕ y_b (recall that y is the
received word). Note that the encoding need not be linear.
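A matching sum decoder can be sketched directly from this assumption. The following Python illustration is hypothetical (not from the talk); the toy matchings are those of the Hadamard code for n = 2, where M_k pairs each codeword position a with a xor e_k:

```python
import random

def matching_sum_decode(y, matchings, k):
    """Matching sum decoder: choose a uniformly random edge {a, b}
    from the partial matching M_k and output y_a xor y_b."""
    a, b = random.choice(matchings[k])
    return y[a] ^ y[b]

# Toy instance: Hadamard code for n = 2. Positions are indexed by
# a in {0, 1, 2, 3}; position a holds <a, x> mod 2.
matchings = [
    [(0, 1), (2, 3)],  # M_0: flip bit 0 of the position index
    [(0, 2), (1, 3)],  # M_1: flip bit 1 of the position index
]
x = [1, 0]
y = [0, 1, 0, 1]  # y_a = <a, x> mod 2 for a = 0..3
```

On this uncorrupted y, every edge of M_k yields x_k, so the decoder is always correct; corruption of the received word degrades each edge independently.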
Additional Results
• Our lower bound technique also yields concrete improvements in the
best known 3-query LDCs for constant δ and ε
• The best known 3-query LDC has m(n) = exp(exp(log^{1/2} n · log log n)),
but has recovery probability < ½ as soon as δ > 1/12
• We get a 3-query LDC of length m(n) = exp(exp(log^{1/2} n · log log n))
for any δ < 1/6
• Our LDC, as well as Efremenko’s, has a matching sum decoder, and
there is no 3-query LDC with a matching sum decoder with δ > 1/6
Techniques
Theorem: Given a family of (q, δ, ½−βδ)-LDCs of length m(n), where q is a
constant, β > 0 is a constant, and δ < 1/(2β), there is a family of non-linear
(q, Θ(δ), ε)-LDCs of length poly(1/(ε+δ)) · m(max(ε,δ)δn).
Take x ∈ {0,1}^n and partition the n coordinates into n/r blocks B_1, …, B_{n/r},
each containing r = Θ((ε+δ)^{-2}) coordinates. Compute z_j = majority(x_i | i ∈ B_j),
and encode z_1, …, z_{n/r} with a (q, δ, ε)-LDC C.
If k ∈ B_j, then Pr_{x ∈ {0,1}^n}[x_k = z_j] ≥ ½ + 3q(ε+δ).
Choose s_1, …, s_t ∈ {0,1}^n uniformly at random, apply the above procedure to
each of x ⊕ s_1, x ⊕ s_2, …, x ⊕ s_t, and take the concatenation.
s_1, …, s_t are chosen by the probabilistic method so that ∀ x ∈ {0,1}^n and ∀ k ∈ [n],
if k ∈ B_j, then Pr_{i ∈ [t]}[(x ⊕ s_i)_k = majority{(x ⊕ s_i)_l | l ∈ B_j}] ≥ ½ + 2q(ε+δ).
The length of the encoding is t · m(n/r) = poly(1/(ε+δ)) · m(n(ε+δ)²).
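The encoding procedure above can be sketched in Python as follows (a minimal illustration, not the talk's actual code; `inner_encode` is a hypothetical stand-in for the assumed (q, δ, ε)-LDC):

```python
def block_majorities(x, r):
    """Partition x into blocks of r coordinates and take each block's
    majority bit; the result z has n/r coordinates (ties go to 0)."""
    return [1 if 2 * sum(x[j:j + r]) > r else 0
            for j in range(0, len(x), r)]

def shifted_encoding(x, shifts, r, inner_encode):
    """For each shift s_i, compute the block majorities of x xor s_i,
    encode them with the inner LDC, and concatenate the encodings."""
    out = []
    for s in shifts:
        xs = [xi ^ si for xi, si in zip(x, s)]
        out.extend(inner_encode(block_majorities(xs, r)))
    return out
```

In the actual construction the shifts s_1, …, s_t are fixed once and for all by the probabilistic method, not sampled at encoding time.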
Techniques
• To decode, choose a random i ∈ [t]. Decode from the q positions in the
(corrupted) encoding of x ⊕ s_i.
• Two sources of error:
1. Adversarial δ fraction of bits flipped in encoding
2. Sometimes (x ⊕ s_i)_k unequal to majority of bits in corresponding block
• If the error sources were independent, the decoder’s success probability
would be Pr_i[(x ⊕ s_i)_k = majority{(x ⊕ s_i)_l | l ∈ B_j}] · (1−qδ)
≥ (½ + 2qε + 2qδ)(1−qδ) > ½ + ε
• Not independent though, as the adversary can first decode to recover x,
guess a k ∈ [n], then corrupt exactly those encodings of x ⊕ s_i for which
(x ⊕ s_i)_k equals the majority of bits in the corresponding block. However,
with probability at least 1−qδ, no queried position is corrupted. By
a union bound, Pr[decode correctly] ≥ (½ + 2qε + 2qδ) − qδ > ½ + ε
Techniques
• We have a (q, Θ(δ), ε)-LDC, but the length is poly(1/(ε+δ)) · m(n(ε+δ)²).
• If q = 2, this gives poly(1/(ε+δ)) · exp(n(ε+δ)²). However, there is a linear
2-query LDC with length poly(1/(ε+δ)) · exp(δn).
• If ε > δ, our LDC might be longer. We can handle ε > δ as follows:
• Break the message x ∈ {0,1}^n into Θ(ε/δ) groups, each of size Θ((δ/ε)n).
• Encode each group using the above procedure, and concatenate the
encodings. The length is m’ = poly(1/(ε+δ)) · exp((δ/ε)n(ε+δ)²) =
poly(1/(ε+δ)) · exp(εδn).
• Inside each group, the adversary can corrupt up to δm’ positions, which,
since the group has Θ((δ/ε)m’) positions, is an ε fraction of positions. Thus
Pr[decode correctly] ≥ (½ + 2qε + 2qδ) − qε > ½ + ε, as desired.
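The grouping step for ε > δ amounts to a simple partition of the message; a hypothetical Python sketch (the exact constants inside the Θ(·) are not specified in the talk, so `round(eps / delta)` here is an illustrative choice):

```python
def split_into_groups(x, eps, delta):
    """Split x into d = Theta(eps/delta) groups of Theta((delta/eps) * n)
    coordinates each; each group is then encoded independently and the
    encodings are concatenated."""
    n = len(x)
    d = max(1, round(eps / delta))  # number of groups (at least one)
    size = -(-n // d)  # ceiling division: coordinates per group
    return [x[i:i + size] for i in range(0, n, size)]
```

When ε ≤ δ this degenerates to a single group, recovering the earlier construction.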
Recap
X_1 X_2 … X_n
Break into d = max(1, Θ(ε/δ)) groups: Group 1, Group 2, …, Group d
For each group,
1. Break into (n/d)(ε+δ)² blocks, each of size 1/(ε+δ)².
2. Compute the majority bit of each block.
3. Encode the majority bits using an existing q-query LDC.
Repeat for several different shifts x ⊕ s_i for random s_i ∈ {0,1}^n.
Lower Bound Techniques
• How good is our upper bound?
• For 2 queries we achieve m(n) = exp(max(ε,δ)δn)
• Kerenidis and de Wolf show m(n) ≥ exp(ε²δn) for 2-query non-linear LDCs
(recall that for 2-query linear LDCs, the bound is m(n) ≥ exp(δn)).
• We improve the bound to a tight m(n) = exp(max(ε,δ)δn) under the
assumption that the decoder has partial matchings M_1, …, M_n of edges on
vertices {1, …, m}. Given k ∈ [n], the decoder chooses a uniformly random
edge {a, b} ∈ M_k and outputs y_a ⊕ y_b (recall that y is the received word).
• Any linear LDC can be assumed to be in this form after minor modifications.
Our LDC also has this form (so all known LDCs have this form).
Lower Bound Techniques
• Intuition: fix a matching M_k. For each edge e = {a, b} ∈ M_k, look at the
probability p_{k,e} that C(x)_a ⊕ C(x)_b = x_k for a random x ∈ {0,1}^n
• Let q_k be the probability that the decoder succeeds, assuming no bits are
flipped by the adversary, over a random x ∈ {0,1}^n
• By our assumptions, q_k = Σ_{e ∈ M_k} p_{k,e} / |M_k|
• (correctness) q_k ≥ ½ + ε
• (restricted decoder) q_k ≥ ½ + δm/|M_k|. Otherwise there is a fixed x ∈ {0,1}^n
that has fewer than |M_k|/2 + δm edges that can be used for recovering x_k. The
adversary can flip one endpoint of exactly δm edges in M_k
• Main “Average-case LDC Lemma”: suppose we have matchings M_k with
sizes c_k·m such that for all k, q_k ≥ ½ + r_k. Let r = Σ_{k=1}^n r_k/n and
c = Σ_{k=1}^n c_k/n. Then m ≥ exp(ncr²).
Lower Bound Techniques
• Main Lemma: suppose we have matchings M_k with sizes c_k·m
such that for all k, q_k ≥ ½ + r_k. Let r = Σ_{k=1}^n r_k/n and
c = Σ_{k=1}^n c_k/n. Then m ≥ exp(ncr²).
• Our claims imply r_k ≥ max(ε, δm/|M_k|) = max(ε, δ/c_k).
• Can show that exp(ncr²) is minimized at exp(max(ε,δ)δn).
• Proof of the main lemma generalizes earlier quantum
information theory arguments of Kerenidis and de Wolf.
Conclusions
• Gave a black box transformation from a linear LDC into
a non-linear LDC with a better dependence on δ and ε.
– Separates linear and non-linear 2-query LDCs
– Yields 3-query LDCs with best known dependence
on δ and ε
• Gave a tight lower bound for 2-query LDCs with
matching sum decoders.
• Extended the range of δ for which 3-query LDCs
become non-trivial from δ < 1/12 to δ < 1/6.
• General question: how are the parameters of linear
and non-linear LDCs related?
Additional Perspective
• To prove the main lemma in our lower bound, we need various
transformations between LDCs.
• One such transformation yields concrete improvements in the best
known 3-query LDCs for constant δ and ε
• The best known 3-query LDC has m(n) = exp(exp(log^{1/2} n · log log n)),
but has recovery probability < ½ as soon as δ > 1/12
• We get a 3-query LDC of length m(n) = exp(exp(log^{1/2} n · log log n))
with non-trivial recovery probability for any δ < 1/6 (and we preserve
linearity)
• Our LDC, as well as Efremenko’s, has a matching sum decoder, and
there is no 3-query LDC with a matching sum decoder with δ > 1/6
An LDC Transformation
• Take Efremenko’s linear 3-query LDC of length m(n). Identify the codeword
positions with linear forms v_j, so that the j-th position computes <v_j, x>
• For all k ∈ [n], there is a matching M_k of triples {a, b, c} of codeword
positions so that v_a ⊕ v_b ⊕ v_c = e_k, with |M_k| ≥ φm for a constant φ > 0
• Consider a new LDC formed by taking each ordered multiset S = {a_1, …, a_p}
of size p = O(ln 1/φ), and creating the entry C(x)_{a_1} ⊕ C(x)_{a_2} ⊕ … ⊕ C(x)_{a_p}
• The new codeword length is m(n)^p, which is still exp(exp(log^{1/2} n · log log n))
• With high probability, for any multiset S and any k, there exists an a ∈ S for
which (a, b) ∈ M_k for some b, and so one can consider the edge (S, T),
where T is the multiset formed by removing a from S and inserting b.
• We boost the size of the matchings, and this gives a better dependence on δ.
There are a few minor issues to ensure the matchings are well-defined.