Introduction to information complexity
Mark Braverman
Princeton University
June 30, 2013
Part I: Information theory
• Information theory, in its modern form, was introduced in the 1940s to study the problem of transmitting data over physical channels.
[Diagram: Alice sends data to Bob over a communication channel.]
Quantifying “information”
• Information is measured in bits.
• The basic notion is Shannon’s entropy.
• The entropy of a random variable is the
(typical) number of bits needed to remove
the uncertainty of the variable.
• For a discrete variable:
𝐻(𝑋) ≔ ∑_𝑥 Pr[𝑋 = 𝑥] · log(1/Pr[𝑋 = 𝑥])
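As a quick illustration (a minimal Python sketch, not part of the original slides; the function name entropy and the example distributions are ours), the definition can be evaluated directly from a table of probabilities:

```python
# Shannon entropy of a discrete random variable, given as a dict
# mapping each outcome x to Pr[X = x].
from math import log2

def entropy(dist):
    # H(X) = sum_x Pr[X = x] * log(1 / Pr[X = x]), measured in bits.
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

print(entropy({"heads": 0.5, "tails": 0.5}))  # 1.0 bit
print(entropy({1: 1/3, 2: 1/3, 3: 1/3}))      # log2(3) ~ 1.585 bits (uniform on 3 values)
print(entropy({"const": 1.0}))                # 0.0 bits (a constant)
```

The outputs preview the properties on the next slide: a constant has zero entropy, and a uniform variable on a finite set 𝑆 has entropy log |𝑆|.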
Shannon’s entropy
• Important examples and properties:
– If 𝑋 = 𝑥 is a constant, then 𝐻(𝑋) = 0.
– If 𝑋 is uniform on a finite set 𝑆 of possible values, then 𝐻(𝑋) = log |𝑆|.
– If 𝑋 is supported on at most 𝑛 values, then 𝐻(𝑋) ≤ log 𝑛.
– If 𝑌 is a random variable determined by 𝑋, then 𝐻(𝑌) ≤ 𝐻(𝑋).
Conditional entropy
• For two (potentially correlated) variables
𝑋, 𝑌, the conditional entropy of 𝑋 given 𝑌 is
the amount of uncertainty left in 𝑋 given 𝑌:
𝐻(𝑋|𝑌) ≔ 𝐸_{𝑦∼𝑌}[𝐻(𝑋|𝑌 = 𝑦)].
• One can show 𝐻(𝑋𝑌) = 𝐻(𝑌) + 𝐻(𝑋|𝑌).
• This important fact is known as the chain
rule.
• If 𝑋 ⊥ 𝑌, then
𝐻(𝑋𝑌) = 𝐻(𝑋) + 𝐻(𝑌|𝑋) = 𝐻(𝑋) + 𝐻(𝑌).
Example
• 𝑋 = (𝐵1, 𝐵2, 𝐵3)
• 𝑌 = (𝐵1 ⊕ 𝐵2, 𝐵2 ⊕ 𝐵4, 𝐵3 ⊕ 𝐵4, 𝐵5)
• where 𝐵1, 𝐵2, 𝐵3, 𝐵4, 𝐵5 ∈𝑈 {0,1} are independent uniform bits.
• Then
– 𝐻(𝑋) = 3; 𝐻(𝑌) = 4; 𝐻(𝑋𝑌) = 5;
– 𝐻(𝑋|𝑌) = 1 = 𝐻(𝑋𝑌) − 𝐻(𝑌);
– 𝐻(𝑌|𝑋) = 2 = 𝐻(𝑋𝑌) − 𝐻(𝑋).
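These numbers can be verified by brute force; here is a sketch (ours, not from the slides) that enumerates all 2^5 settings of the bits and computes the empirical entropies:

```python
# Verify H(X) = 3, H(Y) = 4, H(XY) = 5 for X = (B1,B2,B3),
# Y = (B1^B2, B2^B4, B3^B4, B5) with independent uniform bits B1..B5.
from collections import Counter
from itertools import product
from math import log2

def entropy(samples):
    # Entropy of the empirical distribution of a list of equally likely outcomes.
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())

outcomes = []
for b1, b2, b3, b4, b5 in product((0, 1), repeat=5):
    x = (b1, b2, b3)
    y = (b1 ^ b2, b2 ^ b4, b3 ^ b4, b5)
    outcomes.append((x, y))

print(entropy([x for x, _ in outcomes]),      # H(X)  = 3.0
      entropy([y for _, y in outcomes]),      # H(Y)  = 4.0
      entropy(outcomes))                      # H(XY) = 5.0
print(entropy(outcomes) - entropy([y for _, y in outcomes]),   # H(X|Y) = 1.0
      entropy(outcomes) - entropy([x for x, _ in outcomes]))   # H(Y|X) = 2.0
```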
Mutual information
• 𝑋 = (𝐵1, 𝐵2, 𝐵3)
• 𝑌 = (𝐵1 ⊕ 𝐵2, 𝐵2 ⊕ 𝐵4, 𝐵3 ⊕ 𝐵4, 𝐵5)
[Venn diagram: the 𝐻(𝑋) and 𝐻(𝑌) circles overlap in 𝐼(𝑋; 𝑌) (e.g. 𝐵1 ⊕ 𝐵2, 𝐵2 ⊕ 𝐵3); the parts outside the overlap are 𝐻(𝑋|𝑌) (e.g. 𝐵1) and 𝐻(𝑌|𝑋) (e.g. 𝐵4, 𝐵5).]
Mutual information
• The mutual information is defined as
𝐼(𝑋; 𝑌) = 𝐻(𝑋) − 𝐻(𝑋|𝑌) = 𝐻(𝑌) − 𝐻(𝑌|𝑋)
• “By how much does knowing 𝑋 reduce the entropy of 𝑌?”
• Always non-negative: 𝐼(𝑋; 𝑌) ≥ 0.
• Conditional mutual information:
𝐼(𝑋; 𝑌|𝑍) ≔ 𝐻(𝑋|𝑍) − 𝐻(𝑋|𝑌𝑍)
• Chain rule for mutual information (checked numerically in the sketch below):
𝐼(𝑋𝑌; 𝑍) = 𝐼(𝑋; 𝑍) + 𝐼(𝑌; 𝑍|𝑋)
• Simple intuitive interpretation.
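These identities are easy to sanity-check numerically. A small sketch (ours; the helper names are arbitrary) verifies the chain rule 𝐼(𝑋𝑌; 𝑍) = 𝐼(𝑋; 𝑍) + 𝐼(𝑌; 𝑍|𝑋) on random joint distributions, using 𝐼(𝐴;𝐵) = 𝐻(𝐴) + 𝐻(𝐵) − 𝐻(𝐴𝐵) and 𝐼(𝐴;𝐵|𝐶) = 𝐻(𝐴𝐶) + 𝐻(𝐵𝐶) − 𝐻(𝐴𝐵𝐶) − 𝐻(𝐶):

```python
# Check the chain rule I(XY;Z) = I(X;Z) + I(Y;Z|X) on random joint distributions.
import random
from collections import defaultdict
from math import log2

def H(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, coords):
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in coords)] += p
    return out

def I(joint, a, b):          # I(A;B) = H(A) + H(B) - H(AB)
    return H(marginal(joint, a)) + H(marginal(joint, b)) - H(marginal(joint, a + b))

def I_cond(joint, a, b, c):  # I(A;B|C) = H(AC) + H(BC) - H(ABC) - H(C)
    return (H(marginal(joint, a + c)) + H(marginal(joint, b + c))
            - H(marginal(joint, a + b + c)) - H(marginal(joint, c)))

random.seed(0)
for _ in range(5):
    # A random joint distribution over (X, Y, Z), each taking values in {0, 1, 2}.
    weights = {(x, y, z): random.random()
               for x in range(3) for y in range(3) for z in range(3)}
    total = sum(weights.values())
    joint = {k: w / total for k, w in weights.items()}
    lhs = I(joint, (0, 1), (2,))                                  # I(XY; Z)
    rhs = I(joint, (0,), (2,)) + I_cond(joint, (1,), (2,), (0,))  # I(X;Z) + I(Y;Z|X)
    print(round(lhs, 10), round(rhs, 10))                         # equal on every run
```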
Information Theory
• The reason Information Theory is so important for communication is that information-theoretic quantities readily operationalize.
• Can attach operational meaning to Shannon’s entropy: 𝐻(𝑋) ≈ “the cost of transmitting 𝑋”.
• Let 𝐶(𝑋) be the (expected) cost of transmitting a sample of 𝑋.
𝐻(𝑋) = 𝐶(𝑋)?
• Not quite.
• Let trit 𝑇 ∈𝑈 {1,2,3}.
• 𝐶(𝑇) = 5/3 ≈ 1.67 (e.g. encode the three values 1, 2, 3 as 0, 10, 11).
• 𝐻(𝑇) = log 3 ≈ 1.58.
• It is always the case that 𝐶(𝑋) ≥ 𝐻(𝑋).
But 𝐻(𝑋) and 𝐶(𝑋) are close
• Huffman coding: 𝐶(𝑋) ≤ 𝐻(𝑋) + 1.
• This is a compression result: “a low-information message is turned into a short one”.
• Therefore: 𝐻(𝑋) ≤ 𝐶(𝑋) ≤ 𝐻(𝑋) + 1.
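To make the bound concrete, here is a minimal Huffman-coding sketch (ours, not from the slides) that reproduces the trit example from the previous slide:

```python
# Optimal prefix-free (Huffman) code lengths and the resulting expected cost C(X).
import heapq
from math import log2

def huffman_lengths(dist):
    # Standard Huffman construction: repeatedly merge the two least likely groups;
    # every merge adds one bit to the codewords of the merged outcomes.
    heap = [(p, i, [x]) for i, (x, p) in enumerate(dist.items())]
    heapq.heapify(heap)
    lengths = {x: 0 for x in dist}
    tiebreak = len(heap)
    while len(heap) > 1:
        p1, _, xs1 = heapq.heappop(heap)
        p2, _, xs2 = heapq.heappop(heap)
        for x in xs1 + xs2:
            lengths[x] += 1
        heapq.heappush(heap, (p1 + p2, tiebreak, xs1 + xs2))
        tiebreak += 1
    return lengths

def expected_cost(dist):
    lengths = huffman_lengths(dist)
    return sum(p * lengths[x] for x, p in dist.items())

def entropy(dist):
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

trit = {1: 1/3, 2: 1/3, 3: 1/3}
print(entropy(trit))        # H(T) = log2(3) ~ 1.58
print(expected_cost(trit))  # C(T) = 5/3 ~ 1.67, within the "+1" slack of H(T)
```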
Shannon’s noiseless coding
• The cost of communicating many copies of 𝑋
scales as 𝐻(𝑋).
• Shannon’s source coding theorem:
– Let 𝐶(𝑋^𝑛) be the cost of transmitting 𝑛 independent copies of 𝑋. Then the amortized transmission cost is
lim_{𝑛→∞} 𝐶(𝑋^𝑛)/𝑛 = 𝐻(𝑋).
• This equation gives 𝐻(𝑋) operational
meaning.
𝐻(𝑋) operationalized
[Diagram: a stream 𝑋1, …, 𝑋𝑛, … is sent over the communication channel at a cost of 𝐻(𝑋) per copy to transmit the 𝑋’s.]
𝐻(𝑋) is nicer than 𝐶(𝑋)
• 𝐻(𝑋) is additive for independent variables.
• Let 𝑇1, 𝑇2 ∈𝑈 {1,2,3} be independent trits.
• 𝐻(𝑇1𝑇2) = log 9 = 2 log 3.
• 𝐶(𝑇1𝑇2) = 29/9 < 𝐶(𝑇1) + 𝐶(𝑇2) = 2 × 5/3 = 30/9 (see the sketch below).
• Works well with concepts such as channel
capacity.
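The source coding theorem and the trit computation above can be combined into one experiment. The sketch below (ours, using the same Huffman construction as before) codes blocks of 𝑛 independent trits: the per-copy cost is squeezed between 𝐻(𝑇) = log 3 and 𝐻(𝑇) + 1/𝑛, and already at 𝑛 = 2 coding the pair jointly (29/9 bits) beats coding the trits separately (30/9 bits):

```python
# Per-copy Huffman cost of blocks of n independent uniform trits.
import heapq
from itertools import product
from math import log2

def huffman_expected_cost(dist):
    heap = [(p, i, [x]) for i, (x, p) in enumerate(dist.items())]
    heapq.heapify(heap)
    lengths = {x: 0 for x in dist}
    tiebreak = len(heap)
    while len(heap) > 1:
        p1, _, xs1 = heapq.heappop(heap)
        p2, _, xs2 = heapq.heappop(heap)
        for x in xs1 + xs2:         # every merge adds one bit to these codewords
            lengths[x] += 1
        heapq.heappush(heap, (p1 + p2, tiebreak, xs1 + xs2))
        tiebreak += 1
    return sum(p * lengths[x] for x, p in dist.items())

print("H(T) =", log2(3))            # ~ 1.585
for n in range(1, 7):
    block = {xs: (1/3) ** n for xs in product((1, 2, 3), repeat=n)}
    print(n, huffman_expected_cost(block) / n)  # 5/3, 29/18, ... -> log2(3)
```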
Operationalizing other quantities
• Conditional entropy 𝐻(𝑋|𝑌):
• (cf. Slepian-Wolf Theorem).
[Diagram: Alice holds 𝑋1, …, 𝑋𝑛, … and Bob holds 𝑌1, …, 𝑌𝑛, …; transmitting the 𝑋’s to Bob over the communication channel costs 𝐻(𝑋|𝑌) per copy.]
Operationalizing other quantities
• Mutual information 𝐼(𝑋; 𝑌):
[Diagram: Alice holds 𝑋1, …, 𝑋𝑛, …; sampling the correlated 𝑌1, …, 𝑌𝑛, … over the communication channel costs 𝐼(𝑋; 𝑌) per copy.]
Information theory and entropy
• Allows us to formalize intuitive notions.
• Operationalized in the context of one-way
transmission and related problems.
• Has nice properties (additivity, chain rule…)
• Next, we discuss extensions to more
interesting communication scenarios.
Communication complexity
• Focus on the two-party randomized setting.
• Alice (A) and Bob (B) implement a functionality 𝐹(𝑋, 𝑌), e.g. 𝐹(𝑋, 𝑌) = “𝑋 = 𝑌?”.
[Diagram: Alice holds 𝑋, Bob holds 𝑌; using shared randomness R they must output 𝐹(𝑋, 𝑌).]
Communication complexity
Goal: implement a functionality 𝐹(𝑋, 𝑌).
A protocol 𝜋(𝑋, 𝑌) computing 𝐹(𝑋, 𝑌):
[Diagram: using shared randomness R, Alice sends m1(X,R), Bob replies m2(Y,m1,R), Alice sends m3(X,m1,m2,R), and so on, until the output F(X,Y) is produced.]
Communication cost = # of bits exchanged.
Communication complexity
• Numerous applications/potential applications.
• Considerably more difficult to obtain lower bounds than in the transmission setting (still much easier than in other models of computation!).
Communication complexity
• (Distributional) communication complexity with input distribution 𝜇 and error 𝜀: 𝐶𝐶(𝐹, 𝜇, 𝜀). Error ≤ 𝜀 w.r.t. 𝜇.
• (Randomized/worst-case) communication complexity: 𝐶𝐶(𝐹, 𝜀). Error ≤ 𝜀 on all inputs.
• Yao’s minimax:
𝐶𝐶(𝐹, 𝜀) = max_𝜇 𝐶𝐶(𝐹, 𝜇, 𝜀).
Examples
• 𝑋, 𝑌 ∈ {0,1}^𝑛.
• Equality 𝐸𝑄(𝑋, 𝑌) ≔ 1_{𝑋=𝑌}.
• 𝐶𝐶(𝐸𝑄, 𝜀) ≈ log(1/𝜀).
• 𝐶𝐶(𝐸𝑄, 0) ≈ 𝑛.
Equality
• 𝐹 is “𝑋 = 𝑌?”.
• 𝜇 is a distribution where w.p. ½ 𝑋 = 𝑌 and w.p. ½ (𝑋, 𝑌) are random.
[Protocol diagram: Alice sends MD5(X) (128 bits); Bob replies with “X = Y?” (1 bit).]
• Error?
• Shows that 𝐶𝐶(𝐸𝑄, 𝜇, 2^(-129)) ≤ 129.
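A runnable version of this protocol, as a sketch under our own assumptions: the 128-bit MD5 hash is replaced by a 64-bit hash keyed by the shared randomness R, so the protocol costs 64 + 1 = 65 bits and errs with probability about 2^-64 when X ≠ Y; all function names are ours.

```python
# Randomized equality protocol: Alice sends a short hash of X, Bob answers "X = Y?".
import hashlib
import random

N_BITS = 64

def shared_hash(r, x):
    # A hash of x keyed by the shared randomness r, truncated to N_BITS bits.
    digest = hashlib.sha256(r + x).digest()
    return int.from_bytes(digest, "big") >> (256 - N_BITS)

def equality_protocol(x, y, r):
    alice_msg = shared_hash(r, x)                     # Alice -> Bob: 64 bits
    bob_answer = int(alice_msg == shared_hash(r, y))  # Bob -> Alice: 1 bit
    return bob_answer

r = random.getrandbits(128).to_bytes(16, "big")       # shared randomness R
print(equality_protocol(b"foo", b"foo", r))  # 1 (always correct when X = Y)
print(equality_protocol(b"foo", b"bar", r))  # 0, except with probability ~2^-64
```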
Examples
• 𝑋, 𝑌 ∈ {0,1}^𝑛.
• Inner product 𝐼𝑃(𝑋, 𝑌) ≔ ∑_𝑖 𝑋𝑖 ⋅ 𝑌𝑖 (mod 2).
• 𝐶𝐶(𝐼𝑃, 0) = 𝑛 − 𝑜(𝑛).
In fact, using information complexity:
• 𝐶𝐶(𝐼𝑃, 𝜀) = 𝑛 − 𝑜_𝜀(𝑛).
Information complexity
• Information complexity 𝐼𝐶(𝐹, 𝜀) is to communication complexity 𝐶𝐶(𝐹, 𝜀) as Shannon’s entropy 𝐻(𝑋) is to transmission cost 𝐶(𝑋).
Information complexity
• The smallest amount of information Alice
and Bob need to exchange to solve 𝐹.
• How is information measured?
• Communication cost of a protocol?
– Number of bits exchanged.
• Information cost of a protocol?
– Amount of information revealed.
Basic definition 1: The
information cost of a protocol
• Prior distribution: 𝑋, 𝑌 ∼ 𝜇.
[Diagram: Alice (input 𝑋) and Bob (input 𝑌) run the protocol 𝜋, producing the transcript Π.]
𝐼𝐶(𝜋, 𝜇) = 𝐼(Π; 𝑌|𝑋) + 𝐼(Π; 𝑋|𝑌)
what Alice learns about Y + what Bob learns about X
Example
• 𝐹 is “𝑋 = 𝑌?”.
• 𝜇 is a distribution where w.p. ½ 𝑋 = 𝑌 and w.p. ½ (𝑋, 𝑌) are random.
[Protocol diagram: Alice sends MD5(X) (128 bits); Bob replies with “X = Y?” (1 bit).]
𝐼𝐶(𝜋, 𝜇) = 𝐼(Π; 𝑌|𝑋) + 𝐼(Π; 𝑋|𝑌) ≈ 1 + 65 = 66 bits
what Alice learns about Y + what Bob learns about X
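The same quantities can be computed exactly for a protocol with no hashing. In the sketch below (ours; all names are illustrative) Alice sends 𝑋 in the clear and Bob replies with the answer bit, under the prior above with 𝑛-bit inputs; the information cost comes out to roughly 1 bit learned by Alice plus about 𝑛/2 + 1 bits learned by Bob, the same shape as the ≈ 1 + 65 estimate above.

```python
# Brute-force information cost IC(pi, mu) = I(Pi;Y|X) + I(Pi;X|Y)
# for a deterministic protocol pi and an explicit prior mu on (X, Y).
from collections import defaultdict
from math import log2

def H(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, coords):
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in coords)] += p
    return out

def cond_mi(joint, a, b, c):
    # I(A;B|C) = H(AC) + H(BC) - H(ABC) - H(C)
    return (H(marginal(joint, a + c)) + H(marginal(joint, b + c))
            - H(marginal(joint, a + b + c)) - H(marginal(joint, c)))

def information_cost(prior, protocol):
    joint = defaultdict(float)               # distribution of (X, Y, transcript)
    for (x, y), p in prior.items():
        joint[(x, y, protocol(x, y))] += p
    # coordinates: 0 = X, 1 = Y, 2 = transcript Pi
    return cond_mi(joint, (2,), (1,), (0,)) + cond_mi(joint, (2,), (0,), (1,))

n = 8                                        # input length (small enough to enumerate)

def send_x_protocol(x, y):
    return (x, int(x == y))                  # Alice sends x; Bob sends the answer bit

# Prior mu: w.p. 1/2, X = Y uniform; w.p. 1/2, (X, Y) independent uniform.
prior = defaultdict(float)
for x in range(2 ** n):
    prior[(x, x)] += 0.5 / 2 ** n
    for y in range(2 ** n):
        prior[(x, y)] += 0.5 / 4 ** n

print(information_cost(prior, send_x_protocol))  # ~ n/2 + 2 (about 5.98 for n = 8)
```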
Prior 𝜇 matters a lot for
information cost!
• If 𝜇 = 1_{(𝑥,𝑦)} is a singleton, then
𝐼𝐶(𝜋, 𝜇) = 0.
Example
• 𝐹 is “𝑋 = 𝑌?”.
• 𝜇 is a distribution where (𝑋, 𝑌) are just uniformly random.
[Protocol diagram: Alice sends MD5(X) (128 bits); Bob replies with “X = Y?” (1 bit).]
𝐼𝐶(𝜋, 𝜇) = 𝐼(Π; 𝑌|𝑋) + 𝐼(Π; 𝑋|𝑌) ≈ 0 + 128 = 128 bits
what Alice learns about Y + what Bob learns about X
Basic definition 2: Information
complexity
• Communication complexity:
𝐶𝐶(𝐹, 𝜇, 𝜀) ≔ min over protocols 𝜋 computing 𝐹 with error ≤ 𝜀 of the communication cost of 𝜋.
• Analogously:
𝐼𝐶(𝐹, 𝜇, 𝜀) ≔ inf over protocols 𝜋 computing 𝐹 with error ≤ 𝜀 of 𝐼𝐶(𝜋, 𝜇).
• (Here an inf, rather than a min, is needed!)
Prior-free information complexity
• Using minimax can get rid of the prior.
• For communication, we had:
𝐶𝐶(𝐹, 𝜀) = max_𝜇 𝐶𝐶(𝐹, 𝜇, 𝜀).
• For information:
𝐼𝐶(𝐹, 𝜀) ≔ inf over protocols 𝜋 computing 𝐹 with error ≤ 𝜀 of max_𝜇 𝐼𝐶(𝜋, 𝜇).
Operationalizing IC: Information
equals amortized communication
• Recall [Shannon]: lim_{𝑛→∞} 𝐶(𝑋^𝑛)/𝑛 = 𝐻(𝑋).
• Turns out [B.-Rao’11]:
lim_{𝑛→∞} 𝐶𝐶(𝐹^𝑛, 𝜇^𝑛, 𝜀)/𝑛 = 𝐼𝐶(𝐹, 𝜇, 𝜀), for 𝜀 > 0.
[Error 𝜀 allowed on each copy]
• For 𝜀 = 0: lim_{𝑛→∞} 𝐶𝐶(𝐹^𝑛, 𝜇^𝑛, 0+)/𝑛 = 𝐼𝐶(𝐹, 𝜇, 0).
• [lim_{𝑛→∞} 𝐶𝐶(𝐹^𝑛, 𝜇^𝑛, 0)/𝑛 is an interesting open problem.]
Entropy vs. Information Complexity
| | Entropy | IC |
| Additive? | Yes | Yes |
| Operationalized | lim_{𝑛→∞} 𝐶(𝑋^𝑛)/𝑛 | lim_{𝑛→∞} 𝐶𝐶(𝐹^𝑛, 𝜇^𝑛, 𝜀)/𝑛 |
| Compression? | Huffman: 𝐶(𝑋) ≤ 𝐻(𝑋) + 1 | ???! |
Can interactive communication
be compressed?
• Is it true that 𝐶𝐶(𝐹, 𝜇, 𝜀) ≤ 𝐼𝐶(𝐹, 𝜇, 𝜀) + 𝑂(1)?
• Less ambitiously: 𝐶𝐶(𝐹, 𝜇, 𝑂(𝜀)) = 𝑂(𝐼𝐶(𝐹, 𝜇, 𝜀))?
• (Almost) equivalently: Given a protocol 𝜋 with 𝐼𝐶(𝜋, 𝜇) = 𝐼, can Alice and Bob simulate 𝜋 using 𝑂(𝐼) communication?
• Not known in general…
Applications
• Information = amortized communication means that to understand the amortized cost of a problem it is enough to understand its information complexity.
Example: the disjointness function
• 𝑋, 𝑌 are subsets of {1, …, 𝑛}.
• Alice gets 𝑋, Bob gets 𝑌.
• Need to determine whether 𝑋 ∩ 𝑌 = ∅.
• In binary notation, need to compute ¬⋁_{𝑖=1}^{𝑛} (𝑋𝑖 ∧ 𝑌𝑖).
• An operator on 𝑛 copies of the 2-bit AND function.
Set intersection
• 𝑋, 𝑌 are subsets of {1, …, 𝑛}.
• Alice gets 𝑋, Bob gets 𝑌.
• Want to compute 𝑋 ∩ 𝑌.
• This is just 𝑛 copies of the 2-bit AND.
• Understanding the information complexity of AND gives tight bounds on both problems!
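On bitmask inputs, the two reductions above are one-liners (a sketch, ours):

```python
# X and Y encoded as n-bit masks: Disj is the NOR of the n ANDs; Int is the bitwise AND.
def disjointness(x: int, y: int) -> int:
    return int(x & y == 0)          # 1 iff no coordinate has X_i = Y_i = 1

def set_intersection(x: int, y: int) -> int:
    return x & y                    # n independent copies of the 2-bit AND

print(disjointness(0b1010, 0b0101))           # 1 (disjoint)
print(bin(set_intersection(0b1010, 0b0110)))  # 0b10
```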
Exact communication bounds
[B.-Garg-Pankratov-Weinstein’13]
• 𝐶𝐶(𝐼𝑛𝑡_𝑛, 0+) ≥ 𝑛 (trivial).
• 𝐶𝐶(𝐷𝑖𝑠𝑗_𝑛, 0+) = Ω(𝑛) [Kalyanasundaram-Schnitger’87, Razborov’92].
New:
• 𝐶𝐶(𝐼𝑛𝑡_𝑛, 0+) ≈ 1.4922 𝑛 ± 𝑜(𝑛).
• 𝐶𝐶(𝐷𝑖𝑠𝑗_𝑛, 0+) ≈ 0.4827 𝑛 ± 𝑜(𝑛).
Small set disjointness
• 𝑋, 𝑌 are subsets of {1, …, 𝑛}, |𝑋|, |𝑌| ≤ 𝑘.
• Alice gets 𝑋, Bob gets 𝑌.
• Need to determine whether 𝑋 ∩ 𝑌 = ∅.
• Trivial: 𝐶𝐶(𝐷𝑖𝑠𝑗_𝑛^𝑘, 0+) = 𝑂(𝑘 log 𝑛).
• [Hastad-Wigderson’07]: 𝐶𝐶(𝐷𝑖𝑠𝑗_𝑛^𝑘, 0+) = Θ(𝑘).
• [BGPW’13]: 𝐶𝐶(𝐷𝑖𝑠𝑗_𝑛^𝑘, 0+) = (2 log_2 𝑒) 𝑘 ± 𝑜(𝑘).
Open problem: Computability of IC
• Given the truth table of 𝐹(𝑋, 𝑌), 𝜇 and 𝜀, compute 𝐼𝐶(𝐹, 𝜇, 𝜀).
• Via 𝐼𝐶(𝐹, 𝜇, 𝜀) = lim_{𝑛→∞} 𝐶𝐶(𝐹^𝑛, 𝜇^𝑛, 𝜀)/𝑛 one can compute a sequence of upper bounds.
• But the rate of convergence as a function of 𝑛 is unknown.
Open problem: Computability of IC
• Can compute the 𝑟-round information complexity 𝐼𝐶_𝑟(𝐹, 𝜇, 𝜀) of 𝐹.
• But the rate of convergence as a function of 𝑟 is unknown.
• Conjecture:
𝐼𝐶_𝑟(𝐹, 𝜇, 𝜀) − 𝐼𝐶(𝐹, 𝜇, 𝜀) = 𝑂_{𝐹,𝜇,𝜀}(1/𝑟^2).
• This is the relationship for the two-bit AND.
Thank You!