Introduction to information complexity
Mark Braverman, Princeton University
June 30, 2013

Part I: Information theory
• Information theory, in its modern form, was introduced in the 1940s to study the problem of transmitting data over physical channels.
[Figure: Alice and Bob connected by a communication channel.]

Quantifying "information"
• Information is measured in bits.
• The basic notion is Shannon's entropy.
• The entropy of a random variable is the (typical) number of bits needed to remove the uncertainty of the variable.
• For a discrete variable: H(X) := ∑_x Pr[X = x] · log(1/Pr[X = x]).

Shannon's entropy
• Important examples and properties:
– If X = x is a constant, then H(X) = 0.
– If X is uniform on a finite set S of possible values, then H(X) = log |S|.
– If X is supported on at most n values, then H(X) ≤ log n.
– If Y is a random variable determined by X, then H(Y) ≤ H(X).

Conditional entropy
• For two (potentially correlated) variables X, Y, the conditional entropy of X given Y is the amount of uncertainty left in X given Y: H(X|Y) := E_{y~Y}[H(X | Y = y)].
• One can show H(XY) = H(Y) + H(X|Y).
• This important fact is known as the chain rule.
• If X ⊥ Y, then H(XY) = H(X) + H(Y|X) = H(X) + H(Y).

Example
• X = (B1, B2, B3)
• Y = (B1 ⊕ B2, B2 ⊕ B4, B3 ⊕ B4, B5)
• where B1, B2, B3, B4, B5 ∈_U {0,1} are independent uniform bits. Then:
– H(X) = 3; H(Y) = 4; H(XY) = 5;
– H(X|Y) = 1 = H(XY) − H(Y);
– H(Y|X) = 2 = H(XY) − H(X).
(A short numerical check of these values is given in the code sketch below.)

Mutual information
[Figure: Venn diagram for the example above, showing H(X) and H(Y), their overlap I(X;Y), and the remainders H(X|Y) and H(Y|X).]

Mutual information
• The mutual information is defined as I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X).
• "By how much does knowing X reduce the entropy of Y?"
• Always non-negative: I(X;Y) ≥ 0.
• Conditional mutual information: I(X;Y|Z) := H(X|Z) − H(X|YZ).
• Chain rule for mutual information: I(XY;Z) = I(X;Z) + I(Y;Z|X).
• Simple intuitive interpretation.

Information Theory
• The reason Information Theory is so important for communication is that information-theoretic quantities readily operationalize.
• Can attach operational meaning to Shannon's entropy: H(X) ≈ "the cost of transmitting X".
• Let C(X) be the (expected) cost of transmitting a sample of X.

H(X) = C(X)?
• Not quite.
• Let the trit T ∈_U {1, 2, 3}.
• C(T) = 5/3 ≈ 1.67, using the prefix code 1 → 0, 2 → 10, 3 → 11.
• H(T) = log 3 ≈ 1.58.
• It is always the case that C(X) ≥ H(X).

But H(X) and C(X) are close
• Huffman's coding: C(X) ≤ H(X) + 1.
• This is a compression result: "an uninformative message turned into a short one".
• Therefore: H(X) ≤ C(X) ≤ H(X) + 1.

Shannon's noiseless coding
• The cost of communicating many copies of X scales as H(X).
• Shannon's source coding theorem:
– Let C(X^n) be the cost of transmitting n independent copies of X. Then the amortized transmission cost lim_{n→∞} C(X^n)/n = H(X).
• This equation gives H(X) operational meaning.

H(X) operationalized
[Figure: a stream X1, …, Xn, … sent over a communication channel at H(X) bits per copy.]

H(X) is nicer than C(X)
• H(X) is additive for independent variables.
• Let T1, T2 ∈_U {1, 2, 3} be independent trits.
• H(T1 T2) = log 9 = 2 log 3.
• C(T1 T2) = 29/9 < C(T1) + C(T2) = 2 × 5/3 = 30/9.
• Works well with concepts such as channel capacity.

Operationalizing other quantities
• Conditional entropy H(X|Y) (cf. Slepian-Wolf Theorem):
[Figure: Alice holds X1, …, Xn, …, Bob holds Y1, …, Yn, …; H(X|Y) bits per copy suffice to transmit the X's.]
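The entropy values in the Example slide above can be checked by direct enumeration. The following minimal Python sketch is not part of the original slides; the helper H and the variable names are ours. It lists all 32 equally likely assignments of B1, …, B5 and computes the entropies from the resulting joint distribution of (X, Y).

```python
from itertools import product
from collections import Counter
from math import log2

def H(counts):
    """Shannon entropy (in bits) of a distribution given as a dict of counts."""
    total = sum(counts.values())
    return sum((c / total) * log2(total / c) for c in counts.values())

# All 32 equally likely assignments of the independent uniform bits B1..B5.
joint = Counter()
for b1, b2, b3, b4, b5 in product((0, 1), repeat=5):
    x = (b1, b2, b3)
    y = (b1 ^ b2, b2 ^ b4, b3 ^ b4, b5)
    joint[(x, y)] += 1

H_X = H(Counter(x for x, _ in joint.elements()))
H_Y = H(Counter(y for _, y in joint.elements()))
H_XY = H(joint)

print(H_X, H_Y, H_XY)            # 3.0 4.0 5.0
print(H_XY - H_Y, H_XY - H_X)    # H(X|Y) = 1.0, H(Y|X) = 2.0
print(H_X + H_Y - H_XY)          # I(X;Y) = 2.0
```

The output confirms H(X) = 3, H(Y) = 4, H(XY) = 5, and hence H(X|Y) = 1, H(Y|X) = 2, and I(X;Y) = 2, as claimed in the example.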
Operationalizing other quantities
• Mutual information I(X;Y):
[Figure: Alice holds X1, …, Xn, …, Bob holds Y1, …, Yn, …; I(X;Y) bits per copy suffice to sample the Y's.]

Information theory and entropy
• Allows us to formalize intuitive notions.
• Operationalized in the context of one-way transmission and related problems.
• Has nice properties (additivity, chain rule, …).
• Next, we discuss extensions to more interesting communication scenarios.

Communication complexity
• Focus on the two-party randomized setting.
• Alice and Bob, sharing randomness R, implement a functionality F(X,Y), e.g. F(X,Y) = "X = Y?".
[Figure: Alice holds X, Bob holds Y; using the shared randomness R they communicate and output F(X,Y).]

Communication complexity
• Goal: implement a functionality F(X,Y).
• A protocol π(X,Y) computing F(X,Y) exchanges messages m1(X,R), m2(Y,m1,R), m3(X,m1,m2,R), … using the shared randomness R.
• Communication cost = number of bits exchanged.

Communication complexity
• Numerous applications/potential applications.
• Considerably more difficult to obtain lower bounds than for transmission (still much easier than for other models of computation!).

Communication complexity
• (Distributional) communication complexity with input distribution μ and error ε: CC(F, μ, ε). Error ≤ ε w.r.t. μ.
• (Randomized/worst-case) communication complexity: CC(F, ε). Error ≤ ε on all inputs.
• Yao's minimax: CC(F, ε) = max_μ CC(F, μ, ε).

Examples
• X, Y ∈ {0,1}^n.
• Equality EQ(X,Y) := 1_{X=Y}.
• CC(EQ, ε) ≈ log(1/ε).
• CC(EQ, 0) ≈ n.

Equality
• F is "X = Y?".
• μ is a distribution where w.p. ½ X = Y and w.p. ½ (X,Y) are random.
• Protocol: Alice sends MD5(X) [128 bits]; Bob replies "X = Y?" [1 bit]. (A code sketch of this protocol is given below.)
• Error? Only on a hash collision when X ≠ Y.
• Shows that CC(EQ, μ, 2^{-129}) ≤ 129.

Examples
• X, Y ∈ {0,1}^n.
• Inner product IP(X,Y) := ∑_i X_i · Y_i (mod 2).
• CC(IP, 0) = n − o(n).
• In fact, using information complexity: CC(IP, ε) = n − o_ε(n).

Information complexity
• Information complexity IC(F, ε) is to communication complexity CC(F, ε) as Shannon's entropy H(X) is to transmission cost C(X).

Information complexity
• The smallest amount of information Alice and Bob need to exchange to solve F.
• How is information measured?
• Communication cost of a protocol?
– Number of bits exchanged.
• Information cost of a protocol?
– Amount of information revealed.

Basic definition 1: The information cost of a protocol
• Prior distribution: (X, Y) ∼ μ.
• The protocol π produces a transcript Π.
• IC(π, μ) = I(Π; Y | X) + I(Π; X | Y)
= what Alice learns about Y + what Bob learns about X.

Example
• F is "X = Y?".
• μ is a distribution where w.p. ½ X = Y and w.p. ½ (X,Y) are random.
• Protocol: Alice sends MD5(X) [128 bits]; Bob replies "X = Y?" [1 bit].
• IC(π, μ) = I(Π; Y | X) + I(Π; X | Y) ≈ 1 + 65 = 66 bits
(what Alice learns about Y + what Bob learns about X).

Prior μ matters a lot for information cost!
• If μ = 1_{(x,y)} is a singleton distribution, then IC(π, μ) = 0.

Example
• F is "X = Y?".
• μ is a distribution where (X, Y) are just uniformly random.
• Protocol: Alice sends MD5(X) [128 bits]; Bob replies "X = Y?" [1 bit].
• IC(π, μ) = I(Π; Y | X) + I(Π; X | Y) ≈ 0 + 128 = 128 bits
(what Alice learns about Y + what Bob learns about X).

Basic definition 2: Information complexity
• Communication complexity: CC(F, μ, ε) := min over protocols π computing F with error ≤ ε of the communication cost of π.
• Analogously: IC(F, μ, ε) := inf over protocols π computing F with error ≤ ε of IC(π, μ).
• (An infimum, rather than a minimum, is needed here.)

Prior-free information complexity
• Using minimax we can get rid of the prior.
• For communication, we had: CC(F, ε) = max_μ CC(F, μ, ε).
• For information: IC(F, ε) := inf over protocols π computing F with error ≤ ε of max_μ IC(π, μ).
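As a concrete illustration of the Equality protocol used in the examples above, here is a minimal Python sketch. It is not from the original slides: the function names alice_message and bob_reply are ours, and MD5 is used only to mirror the slides; a true shared-randomness protocol would hash with a function chosen using the shared randomness R.

```python
import hashlib

def alice_message(x: bytes) -> bytes:
    """Alice's only message: the 128-bit MD5 hash of her input X (128 bits sent)."""
    return hashlib.md5(x).digest()

def bob_reply(y: bytes, hash_of_x: bytes) -> int:
    """Bob's 1-bit reply: 1 if his input Y hashes to the same value as X, else 0."""
    return int(hashlib.md5(y).digest() == hash_of_x)

# Total communication: 128 + 1 = 129 bits.
# The protocol errs only when X != Y but MD5(X) == MD5(Y) (a hash collision).
x = b"example input"
print(bob_reply(b"example input", alice_message(x)))       # 1: inputs are equal
print(bob_reply(b"a different input", alice_message(x)))   # 0: inputs differ
```

On the mixed prior μ above (X = Y with probability ½), the transcript of this exchange reveals roughly 1 bit to Alice and roughly 65 bits to Bob, which is the ≈ 66-bit information cost quoted in the example; on the uniform prior it reveals ≈ 128 bits.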
Operationalizing IC: Information equals amortized communication
• Recall [Shannon]: lim_{n→∞} C(X^n)/n = H(X).
• Turns out [B.-Rao'11]: lim_{n→∞} CC(F^n, μ^n, ε)/n = IC(F, μ, ε), for ε > 0. [Error ε allowed on each copy.]
• For ε = 0: lim_{n→∞} CC(F^n, μ^n, 0+)/n = IC(F, μ, 0).
• [lim_{n→∞} CC(F^n, μ^n, 0)/n is an interesting open problem.]

Entropy vs. Information Complexity

                    Entropy                        IC
Additive?           Yes                            Yes
Operationalized     lim_{n→∞} C(X^n)/n             lim_{n→∞} CC(F^n, μ^n, ε)/n
Compression?        Huffman: C(X) ≤ H(X) + 1       ???!

Can interactive communication be compressed?
• Is it true that CC(F, μ, ε) ≤ IC(F, μ, ε) + O(1)?
• Less ambitiously: CC(F, μ, O(ε)) = O(IC(F, μ, ε))?
• (Almost) equivalently: given a protocol π with IC(π, μ) = I, can Alice and Bob simulate π using O(I) communication?
• Not known in general…

Applications
• Information = amortized communication means that to understand the amortized cost of a problem, it is enough to understand its information complexity.

Example: the disjointness function
• X, Y are subsets of {1, …, n}.
• Alice gets X, Bob gets Y.
• Need to determine whether X ∩ Y = ∅.
• In binary notation, need to compute ⋁_{i=1}^{n} (X_i ∧ Y_i).
• An operator on n copies of the 2-bit AND function.

Set intersection
• X, Y are subsets of {1, …, n}.
• Alice gets X, Bob gets Y.
• Want to compute X ∩ Y.
• This is just n copies of the 2-bit AND.
• Understanding the information complexity of AND gives tight bounds on both problems!

Exact communication bounds [B.-Garg-Pankratov-Weinstein'13]
• CC(Int_n, 0+) ≥ n (trivial).
• CC(Disj_n, 0+) = Ω(n) [Kalyanasundaram-Schnitger'87, Razborov'92].
New:
• CC(Int_n, 0+) ≈ 1.4922 n ± o(n).
• CC(Disj_n, 0+) ≈ 0.4827 n ± o(n).

Small set disjointness
• X, Y are subsets of {1, …, n}, with |X|, |Y| ≤ k.
• Alice gets X, Bob gets Y.
• Need to determine whether X ∩ Y = ∅.
• Trivial: CC(Disj_n^k, 0+) = O(k log n).
• [Håstad-Wigderson'07]: CC(Disj_n^k, 0+) = Θ(k).
• [BGPW'13]: CC(Disj_n^k, 0+) = (2 log₂ e) k ± o(k).

Open problem: Computability of IC
• Given the truth table of F(X,Y), μ, and ε, compute IC(F, μ, ε).
• Via IC(F, μ, ε) = lim_{n→∞} CC(F^n, μ^n, ε)/n one can compute a sequence of upper bounds.
• But the rate of convergence as a function of n is unknown.

Open problem: Computability of IC
• Can compute the r-round information complexity IC_r(F, μ, ε) of F.
• But the rate of convergence as a function of r is unknown.
• Conjecture: IC_r(F, μ, ε) − IC(F, μ, ε) = O_{F,μ,ε}(1/r²).
• This is the relationship for the two-bit AND.

Thank You!