Locally Decodable
Codes
Uri Nadav
Contents
• What is a Locally Decodable Code (LDC)?
• Constructions
• Lower Bounds
• Reduction from Private Information Retrieval (PIR) to LDC
Minimum Distance
A code C has minimum distance δ if every x≠y satisfies d(C(x),C(y)) ≥ δ
• The error-correction problem is solvable for fewer than δ/2 errors
• The error-detection problem is solvable for fewer than δ errors
[Diagram: balls of radius δ/2 around each codeword]
Error-correction
[Diagram: input x → Encoding → codeword C(x) → errors (worst-case error assumption) → corrupted codeword y → Decoding of bit i → decoded bit x[i]]
Query Complexity
• The number of indices the decoder is allowed to read from the (corrupted) codeword
• Decoding can trivially be done with query complexity |C(x)| (read the entire codeword)
• We are interested in constant query complexity
Adversarial Model
We can view the error model as an adversary that chooses positions to destroy, and has access to the decoding/encoding scheme (but not to its random coins).
The adversary is allowed to insert at most δm errors.
Why not decode in blocks?
The adversary is worst case, so it can destroy more than a δ fraction of some blocks and less of others.
[Diagram: 'nice' errors spread evenly over the blocks vs. the worst case, with many errors in the same block]
Ideal Code
C : {0,1}^n → {0,1}^m with:
• Constant information rate: n/m > c
• Resilience against a constant fraction of errors (linear minimum distance)
• Efficient decoding (constant query complexity)
No such code!
Definition of LDC
C : {0,1}^n → Σ^m is a (q,δ,ε) locally decodable code if there exists a probabilistic algorithm A such that:
for every x ∈ {0,1}^n, every y ∈ Σ^m with distance d(y,C(x)) < δm, and every i ∈ {1,..,n},
Pr[ A(y,i) = x_i ] > 1/2 + ε
• The probability is over the coin tosses of A
• A reads at most q indices of y (of its choice); A has oracle access to y
• Queries are not allowed to be adaptive
• A must be probabilistic if q < m
Example: Hadamard Code
• Hadamard is a (2, δ, 1/2−2δ) LDC
• Construction: the codeword has one entry per a ∈ {0,1}^n, namely C(x)_a = ⟨x,a⟩ (inner product mod 2)
• Relative minimum distance 1/2
[Diagram: source word x_1 x_2 … x_n encoded to the codeword ⟨x,1⟩, ⟨x,2⟩, …, ⟨x,2^n−1⟩]
Example: Hadamard Code
Reconstruction of x_i, with 2 queries:
• Pick a ∈_R {0,1}^n uniformly at random
• Let e_i = (0,…,0,1,0,…,0), with 1 in the i'th entry
• Reconstruction formula: x_i = ⟨x,a⟩ + ⟨x,a+e_i⟩
[Diagram: the decoder reads positions a and a+e_i of the codeword]
If less than a δ fraction of errors occurred, each of the two (individually uniform) queries hits a corrupted position with probability at most δ, so the reconstruction probability is at least 1−2δ.
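A minimal Python sketch of this encoder and its 2-query local decoder (function names are illustrative; a position a is read as an n-bit vector):

```python
import random

def hadamard_encode(x):
    """Codeword entry a is the inner product <x,a> mod 2, for every a in {0,1}^n."""
    n = len(x)
    return [sum(x[i] & (a >> i) & 1 for i in range(n)) % 2 for a in range(2 ** n)]

def hadamard_decode_bit(y, i, n):
    """Recover x[i] from a (possibly corrupted) codeword y with 2 queries."""
    a = random.randrange(2 ** n)   # uniformly random position
    b = a ^ (1 << i)               # position a + e_i (flip the i'th coordinate)
    return (y[a] + y[b]) % 2       # <x,a> + <x,a+e_i> = x_i when both are intact
```

Flipping fewer than a δ fraction of the entries of y leaves each of the two queries corrupted with probability at most δ, matching the 1−2δ bound above.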
Another Construction…
• View x as a √n × √n matrix
• The codeword has one entry per pair of subsets A ⊆ {1,…,√n}, B ⊆ {1,…,√n}:
C(x)_{A,B} = ⊕_{i∈A, j∈B} x_{i,j}
(codeword length 2^√n × 2^√n)
Reconstruction of bit x_{i,j} queries the four entries
1) A, B
2) A⊕{i}, B
3) A, B⊕{j}
4) A⊕{i}, B⊕{j}
and XORs the answers; every other x_{u,v} cancels. Each of queries 1-4 is uniformly distributed, so the probability of correct decoding is at least 1−4δ.
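A runnable sketch of this construction for small r = √n (helper names are illustrative; subsets are frozensets, and ⊕ on sets is symmetric difference):

```python
import random
from itertools import combinations

def all_subsets(r):
    return [frozenset(s) for k in range(r + 1) for s in combinations(range(r), k)]

def encode(x, r):
    """Codeword entry for (A, B) is the XOR of x[i][j] over i in A, j in B."""
    return {(A, B): sum(x[i][j] for i in A for j in B) % 2
            for A in all_subsets(r) for B in all_subsets(r)}

def decode_bit(codeword, i, j, r):
    """XOR the 4 entries (A,B), (A⊕{i},B), (A,B⊕{j}), (A⊕{i},B⊕{j})."""
    A = frozenset(k for k in range(r) if random.random() < 0.5)
    B = frozenset(k for k in range(r) if random.random() < 0.5)
    ans = 0
    for a, b in [(A, B), (A ^ {i}, B), (A, B ^ {j}), (A ^ {i}, B ^ {j})]:
        ans ^= codeword[(a, b)]   # every x[u][v] with (u,v) != (i,j) cancels
    return ans
```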
Generalization…
• View x as an n^(1/k) × ⋯ × n^(1/k) cube (k dimensions)
• The codeword has one entry per k-tuple of subsets A_1,…,A_k ⊆ {1,…,n^(1/k)}, so m = 2^(k·n^(1/k))
• Decoding takes 2^k queries
Smoothly Decodable Code
C : {0,1}^n → Σ^m is a (q,c,ε) smoothly decodable code if there exists a probabilistic algorithm A such that:
1. For every x ∈ {0,1}^n and i ∈ {1,..,n}, Pr[ A(C(x),i) = x_i ] > 1/2 + ε
   The probability is over the coin tosses of A; A has access to a non-corrupted codeword
2. A reads at most q indices of C(x) (of its choice); queries are not allowed to be adaptive
3. For every i ∈ {1,..,n} and j ∈ {1,..,m}, Pr[ A(·,i) reads j ] ≤ c/m
   The event is: A reads index j of C(x) to reconstruct index i
LDC is also Smooth Code
Claim: Every (q,δ,ε) LDC is a (q,q/δ,ε) smooth code.
Intuition: if the code is resilient against a linear number of errors, then no index of the codeword can be queried too often (or else the adversary would corrupt it).
Proof: LDC is Smooth
Let A be a reconstruction algorithm for a (q,δ,ε) LDC, and let
S_i = {j | Pr[A queries j] > q/δm}
be the set of indices read 'too often'.
There are at most q queries, so the probabilities Pr[A queries j] sum to at most q over all j; thus |S_i| < δm.
Proof: LDC is Smooth
A' uses A as a black box and returns whatever A returns as x_i.
A' gives A oracle access to a corrupted codeword C(x)', answering queries to indices in S_i with 0:
[C(x)']_j = 0 if j ∈ S_i, and C(x)_j otherwise
A reconstructs x_i with probability at least 1/2 + ε, because there are at most |S_i| < δm errors.
A' is a (q,q/δ,ε) smooth decoding algorithm.
Proof: LDC is Smooth
[Diagram: what A wants is C(x); what A gets is C(x)', in which the indices that A reads too often have been fixed arbitrarily to 0 by A']
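A minimal sketch of the wrapper A' (hypothetical names; the decoder A is assumed to take an oracle and an index):

```python
def smooth_decoder(A, y, i, S_i):
    """A': run the LDC decoder A, but answer any query in S_i with 0."""
    def oracle(j):
        return 0 if j in S_i else y[j]   # simulates oracle access to C(x)'
    return A(oracle, i)                  # return whatever A returns as x_i
```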
Smooth Code is LDC
• A bit can be reconstructed using q uniformly distributed queries, with ε advantage, when there are no errors
• With probability at least 1−qδ, all the queries land on non-corrupted indices, so the decoder still succeeds with probability at least 1/2 + ε − qδ
Remember: the adversary does not know the decoding procedure's random coins.
Lower Bounds
• Nonexistence for q = 1 [KT]
• Superlinear length for q ≥ 2 [KT]
• Exponential length for linear codes, q = 2 [Goldreich et al.]
• Exponential length for every code, q = 2 [Kerenidis, de Wolf] (using quantum arguments)
Information Theory basics
• Entropy: H(x) = −∑_i Pr[x=i] log Pr[x=i]
• Mutual information: I(x,y) = H(x) − H(x|y)
Information Theory cont…
• The entropy of multiple variables is at most the sum of their entropies (with equality when the variables are mutually independent): H(x_1 x_2 … x_n) ≤ ∑ H(x_i)
• The highest entropy is that of a uniformly distributed random variable.
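A small numeric sanity check of the binary entropy quantity used in the [KT] result below (values are illustrative):

```python
from math import log2

def H(probs):
    """Shannon entropy of a distribution given as a list of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

eps = 0.1
h = H([0.5 + eps, 0.5 - eps])  # binary entropy H(1/2 + eps), about 0.971 < 1
print(1 - h)                   # ~0.029 bits of information per input bit,
                               # so log|R| >= (1 - h) * n in the theorem below
```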
IT result from [KT]
Thm: Let C : {0,1}^n → R be a function. Assume there exists an algorithm A s.t.
∀i ∈ [n], Pr_x[A(C(x),i) = x_i] > 1/2 + ε
(probability taken over the random coins of A and uniform x). Then
log(|R|) ≥ (1 − H(1/2 + ε))n
Proof
• I(x, C(x)) ≤ H(C(x)) ≤ log(|R|)
• I(x, C(x)) = H(x) − H(x | C(x)) ≥ H(x) − ∑_{i=1}^{n} H(x_i | C(x)) ≥ (1 − H(1/2 + ε))n
(each x_i is predicted from C(x) with probability > 1/2 + ε, so by Fano's inequality H(x_i | C(x)) < H(1/2 + ε), while H(x) = n for uniform x)
Combined: log(|R|) ≥ (1 − H(1/2 + ε))n
Single query (q=1)
Claim: If C : {0,1}^n → Σ^m is (1,δ,ε) locally decodable, then:
n ≤ log|Σ| / (δ(1 − H(1/2 + ε)))
Since the bound does not grow with m, no such family of codes exists!
Good Index
Index j is said to be ‘good’ for i, if
Pr[A(C(x),i)=xi |A reads j] > ½ + ε
Single query (q=1)
By the definition of LDC,
1/2 + ε < Pr_x[A(C(x),i) = x_i] = ∑_{j∈{1..m}} Pr_x[A(C(x),i) = x_i | A(·,i) reads j] · Pr[A(·,i) reads j]
(conditional probability, summing over disjoint events).
So there exists at least a single j_1 which is good for i.
Perturbation Vector
Def: The perturbation vector Δ_{j1,j2,…} takes values uniformly distributed over Σ in positions j1, j2, …, and 0 otherwise.
It destroys the specified indices in the most unpredictable way.
[Diagram: a vector that is 0 everywhere except uniformly random symbols of Σ in positions j1 and j2]
Adding perturbation
A is resilient against at least 1 error, so
1/2 + ε < Pr_x[A(C(x)⊕Δ_{j1}, i) = x_i] = ∑_{j∈{1..m}} Pr_x[A(C(x)⊕Δ_{j1}, i) = x_i | A(·,i) reads j] · Pr[A(·,i) reads j]
So there exists at least one index j_2 good for i.
j_2 ≠ j_1, because j_1 cannot be good: its value is now uniformly random and carries no information about x_i.
Single query (q=1)
A is resilient against δm errors, so repeating the argument:
∑_{j∈[m]} Pr_x[A(C(x)⊕Δ_{j1,j2,…,j_δm}, i) = x_i | A(·,i) reads j] · Pr[A(·,i) reads j] > 1/2 + ε
So there are at least δm indices of the codeword good for every i.
By the pigeonhole principle, there exists an index j' in {1..m} good for δn indices.
Single query (q=1)
Think of C(x) projected on j' as a function of the δn input bits for which j' is good. The range is Σ, and each of these bits can be reconstructed with probability > 1/2 + ε. Thus, by the IT result:
δn < log(|Σ|) / (1 − H(1/2 + ε))
Case q≥2
m = Ω(n^(q/(q−1)))
Constant-query reconstruction procedures are impossible for codes having constant rate!
Case q≥2 Proof Sketch
• An LDC C is also smooth
• A q-query smooth codeword has a small enough subset of indices that still encodes a linear amount of information
• So, by the IT result, m^((q−1)/q) = Ω(n)
Applications?
• Better locally decodable codes have applications to PIR
• Applications to the practice of fault-tolerant data storage/transmission?
What about Locally Encodable?
• A 'respectable code' is resilient against Ω(m) errors (a constant fraction).
• We expect each bit of the input to influence many bits of the encoding.
Otherwise, there exists an input bit that influences less than a 1/n fraction of the encoding, and the adversary can corrupt every position it influences.
Open Issues
• Adaptive vs. non-adaptive queries: one can guess the first q−1 answers, succeeding with probability |Σ|^−(q−1)
• Closing the gap between the upper and lower bounds
Logarithmic number of queries
• View the message as a polynomial p:F^k → F of degree d (F is a field, |F| ≫ d)
• Encode the message by evaluating p at all |F|^k points
• To encode an n-bit message, one can have |F|^k polynomial in n, with d and k around polylog(n)
To reconstruct p(x)
• Pick a random line in F^k passing through x
• Evaluate p at d+1 points of the line
• By interpolation, find the degree-d univariate polynomial that agrees with p on the line
• Use the interpolated polynomial to estimate p(x)
• The algorithm reads p at d+1 points, each uniformly distributed
[Diagram: the line through x with query points x+y, x+2y, …, x+(d+1)y]
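A runnable sketch of this reconstruction over a small prime field (the field size, degree, and oracle interface are illustrative assumptions):

```python
import random

P = 97  # prime field size, assumed |F| >> d
D = 4   # degree of the message polynomial
K = 2   # number of variables

def reconstruct(eval_p, x):
    """Estimate p(x) by interpolating p along a random line through x."""
    y = [random.randrange(P) for _ in range(K)]  # random direction of the line
    ts = random.sample(range(1, P), D + 1)       # d+1 distinct nonzero parameters
    vals = [eval_p([(xi + t * yi) % P for xi, yi in zip(x, y)]) for t in ts]
    # Lagrange interpolation of the restriction q(t) = p(x + t*y) at t = 0
    total = 0
    for j, tj in enumerate(ts):
        num, den = 1, 1
        for l, tl in enumerate(ts):
            if l != j:
                num = num * (-tl) % P
                den = den * (tj - tl) % P
        total = (total + vals[j] * num * pow(den, P - 2, P)) % P  # den^-1 mod P
    return total
```

Each query point x + t·y with t ≠ 0 and uniform y is itself uniformly distributed in F^k, which gives the smoothness claimed in the last bullet.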
Private Information Retrieval (PIR)
• Query a public database without revealing the queried record.
• Example: a broker needs to query the NASDAQ database about a stock, but doesn't want anyone to know which stock he is interested in.
PIR
A k-server, one-round PIR scheme for a database of length n consists of:
• k query functions Q_1, …, Q_k : [n] × {0,1}^l_rnd → {0,1}^l_q
• k answer functions A_1, …, A_k : {0,1}^n × {0,1}^l_q → {0,1}^l_a
• a reconstruction function R : [n] × {0,1}^l_rnd × ({0,1}^l_a)^k → {0,1}
PIR – definition
These functions should satisfy:
Correctness: ∀x ∈ {0,1}^n, i ∈ [n], r ∈ {0,1}^l_rnd,
R(i, r, A_1(x, Q_1(i,r)), …, A_k(x, Q_k(i,r))) = x_i
Privacy: for every i, j ∈ [n], s ∈ [k], and q ∈ {0,1}^l_q,
Pr_r[Q_s(i,r) = q] = Pr_r[Q_s(j,r) = q]
Simple Construction of PIR
• 2 servers, one round
• Each server holds the bits x_1, …, x_n.
• To request bit i, choose a uniformly random subset A of [n].
• Send the first server A.
• Send the second server A⊕{i} (add i to A if it is not there, remove it if it is).
• Each server returns the XOR of the bits at the indices in the request it received.
• XOR the two answers to obtain x_i.
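A minimal Python sketch of this 2-server scheme (function names are illustrative; the two servers are assumed not to collude):

```python
import random
from functools import reduce

def server_answer(db, subset):
    """Each server XORs the database bits at the requested indices."""
    return reduce(lambda acc, j: acc ^ db[j], subset, 0)

def retrieve(db, i):
    """Recover db[i]; each server alone sees only a uniform subset of [n]."""
    n = len(db)
    A = {j for j in range(n) if random.random() < 0.5}  # uniform subset of [n]
    B = A ^ {i}                                         # A with i toggled
    return server_answer(db, A) ^ server_answer(db, B)  # all other bits cancel
```

Privacy holds because both A and A⊕{i} are uniformly distributed subsets, so neither request depends on i; correctness holds because every index other than i appears in both requests or in neither.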
Lower Bounds On Communication Complexity
• To achieve privacy with a single server, an n-bit message is needed.
• (This is not far from the one-round 2-server scheme we suggested, whose requests are also n bits long.)
Reduction from PIR to LDC
• The codeword is a concatenation of all possible answers from both servers
• The decoding procedure consists of 2 queries to this codeword (one answer per server)