Introduction to information complexity
Mark Braverman, Princeton University, June 30, 2013

Part I: Information theory
• Information theory, in its modern form, was introduced in the 1940s to study the problem of transmitting data over physical channels.
[Diagram: Alice → communication channel → Bob.]

Quantifying "information"
• Information is measured in bits.
• The basic notion is Shannon's entropy.
• The entropy of a random variable is the (typical) number of bits needed to remove the uncertainty of the variable.
• For a discrete variable: H(X) := Σ_x Pr[X = x] log(1/Pr[X = x]).

Shannon's entropy
• Important examples and properties:
– If X = x is a constant, then H(X) = 0.
– If X is uniform on a finite set S of possible values, then H(X) = log |S|.
– If X is supported on at most n values, then H(X) ≤ log n.
– If Y is a random variable determined by X, then H(Y) ≤ H(X).

Conditional entropy
• For two (potentially correlated) variables X, Y, the conditional entropy of X given Y is the amount of uncertainty left in X given Y: H(X|Y) := E_{y~Y} [H(X | Y = y)].
• One can show H(XY) = H(Y) + H(X|Y).
• This important fact is known as the chain rule.
• If X ⊥ Y, then H(XY) = H(X) + H(Y|X) = H(X) + H(Y).

Example
• X = (B1, B2, B3)
• Y = (B1⊕B2, B2⊕B4, B3⊕B4, B5)
• where B1, B2, B3, B4, B5 ∈_U {0,1}. Then
– H(X) = 3; H(Y) = 4; H(XY) = 5;
– H(X|Y) = 1 = H(XY) − H(Y);
– H(Y|X) = 2 = H(XY) − H(X).

Mutual information
[Venn diagram for the example above: H(X) and H(Y) drawn as overlapping regions; the overlap is I(X;Y), and the non-overlapping parts are H(X|Y) and H(Y|X).]

Mutual information
• The mutual information is defined as I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X).
• "By how much does knowing X reduce the entropy of Y?"
• Always non-negative: I(X;Y) ≥ 0.
• Conditional mutual information: I(X;Y|Z) := H(X|Z) − H(X|YZ).
• Chain rule for mutual information: I(XY;Z) = I(X;Z) + I(Y;Z|X).
• Simple intuitive interpretation.

Information Theory
• The reason Information Theory is so important for communication is that information-theoretic quantities readily operationalize.
• Can attach operational meaning to Shannon's entropy: H(X) ≈ "the cost of transmitting X".
• Let C(X) be the (expected) cost of transmitting a sample of X.

H(X) = C(X)?
• Not quite.
• Let the trit T ∈_U {1, 2, 3}.
• C(T) = 5/3 ≈ 1.67 (encode 1 → 0, 2 → 10, 3 → 11).
• H(T) = log 3 ≈ 1.58.
• It is always the case that C(X) ≥ H(X).

But H(X) and C(X) are close
• Huffman coding: C(X) ≤ H(X) + 1.
• This is a compression result: a message carrying little information is turned into a short one.
• Therefore: H(X) ≤ C(X) ≤ H(X) + 1.

Shannon's noiseless coding
• The cost of communicating many copies of X scales as H(X).
• Shannon's source coding theorem: let C(X^n) be the cost of transmitting n independent copies of X. Then the amortized transmission cost is lim_{n→∞} C(X^n)/n = H(X).
• This equation gives H(X) operational meaning.

H(X) operationalized
[Diagram: a stream X1, …, Xn, … is sent over the communication channel at a cost of H(X) per copy.]

H(X) is nicer than C(X)
• H(X) is additive for independent variables.
• Let T1, T2 ∈_U {1, 2, 3} be independent trits.
• H(T1 T2) = log 9 = 2 log 3.
• C(T1 T2) = 29/9 < C(T1) + C(T2) = 2 × 5/3 = 30/9.
• Works well with concepts such as channel capacity.
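The entropy example above (X = (B1, B2, B3), Y = (B1⊕B2, B2⊕B4, B3⊕B4, B5)) is small enough to check by brute force. Here is a minimal sketch, not part of the original slides, that enumerates the 32 equally likely settings of the uniform bits and recomputes H(X) = 3, H(Y) = 4, H(XY) = 5, the conditional entropies, and I(X;Y); the helper name `entropy` is just illustrative.

```python
from itertools import product
from math import log2
from collections import Counter

def entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of `samples`,
    where each listed sample is equally likely."""
    counts = Counter(samples)
    n = len(samples)
    return sum((c / n) * log2(n / c) for c in counts.values())

# Enumerate all 2^5 equally likely settings of the uniform bits B1..B5.
X, Y, XY = [], [], []
for b1, b2, b3, b4, b5 in product((0, 1), repeat=5):
    x = (b1, b2, b3)
    y = (b1 ^ b2, b2 ^ b4, b3 ^ b4, b5)
    X.append(x); Y.append(y); XY.append((x, y))

H_X, H_Y, H_XY = entropy(X), entropy(Y), entropy(XY)
print(H_X, H_Y, H_XY)      # 3.0 4.0 5.0
print(H_XY - H_Y)          # H(X|Y) = 1.0
print(H_XY - H_X)          # H(Y|X) = 2.0
print(H_X + H_Y - H_XY)    # I(X;Y) = 2.0
```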
Operationalizing other quantities
• Conditional entropy H(X|Y) (cf. the Slepian-Wolf theorem):
[Diagram: streams X1, …, Xn, … and Y1, …, Yn, …; transmitting the X's over the communication channel costs H(X|Y) per copy.]

Operationalizing other quantities
• Mutual information I(X;Y):
[Diagram: streams X1, …, Xn, … and Y1, …, Yn, …; sampling the Y's over the communication channel costs I(X;Y) per copy.]

Information theory and entropy
• Allows us to formalize intuitive notions.
• Operationalized in the context of one-way transmission and related problems.
• Has nice properties (additivity, chain rule, …).
• Next, we discuss extensions to more interesting communication scenarios.

Communication complexity
• Focus on the two-party randomized setting.
[Diagram: Alice holds X, Bob holds Y, and they share randomness R; A & B implement a functionality F(X,Y), e.g. F(X,Y) = "X = Y?".]

Communication complexity
• Goal: implement a functionality F(X,Y).
• A protocol π(X,Y) computing F(X,Y): using the shared randomness R, Alice sends m1(X,R), Bob sends m2(Y,m1,R), Alice sends m3(X,m1,m2,R), and so on.
• Communication cost = number of bits exchanged.

Communication complexity
• Numerous applications/potential applications.
• Considerably more difficult to obtain lower bounds for than for transmission (still much easier than for other models of computation!).

Communication complexity
• (Distributional) communication complexity with input distribution μ and error ε: CC(F, μ, ε). Error ≤ ε w.r.t. μ.
• (Randomized/worst-case) communication complexity: CC(F, ε). Error ≤ ε on all inputs.
• Yao's minimax: CC(F, ε) = max_μ CC(F, μ, ε).

Examples
• X, Y ∈ {0,1}^n.
• Equality EQ(X,Y) := 1_{X=Y}.
• CC(EQ, ε) ≈ log(1/ε).
• CC(EQ, 0) ≈ n.

Equality
• F is "X = Y?".
• μ is a distribution where w.p. ½ X = Y and w.p. ½ (X,Y) are random.
[Diagram: Alice sends MD5(X) (128 bits); Bob replies with "X = Y?" (1 bit).]
• Error?
• Shows that CC(EQ, μ, 2^−129) ≤ 129.

Examples
• X, Y ∈ {0,1}^n.
• Inner product IP(X,Y) := Σ_i X_i · Y_i (mod 2).
• CC(IP, 0) = n − o(n).
• In fact, using information complexity: CC(IP, ε) = n − o_ε(n).

Information complexity
• Information complexity IC(F, ε) is to communication complexity CC(F, ε) as Shannon's entropy H(X) is to transmission cost C(X).

Information complexity
• The smallest amount of information Alice and Bob need to exchange to solve F.
• How is information measured?
• Communication cost of a protocol? – Number of bits exchanged.
• Information cost of a protocol? – Amount of information revealed.

Basic definition 1: The information cost of a protocol
• Prior distribution: (X,Y) ~ μ.
[Diagram: Alice with X and Bob with Y run the protocol π, producing the transcript Π.]
• IC(π, μ) = I(Π;Y|X) + I(Π;X|Y)
= (what Alice learns about Y) + (what Bob learns about X).

Example
• F is "X = Y?".
• μ is a distribution where w.p. ½ X = Y and w.p. ½ (X,Y) are random.
[Diagram: Alice sends MD5(X) (128 bits); Bob replies with "X = Y?" (1 bit).]
• IC(π, μ) = I(Π;Y|X) + I(Π;X|Y) ≈ 1 + 65 = 66 bits
= (what Alice learns about Y) + (what Bob learns about X).

Prior μ matters a lot for information cost!
• If μ = 1_{(x,y)} is a singleton, then IC(π, μ) = 0.

Example
• F is "X = Y?".
• μ is a distribution where (X,Y) are just uniformly random.
[Diagram: Alice sends MD5(X) (128 bits); Bob replies with "X = Y?" (1 bit).]
• IC(π, μ) = I(Π;Y|X) + I(Π;X|Y) ≈ 0 + 128 = 128 bits
= (what Alice learns about Y) + (what Bob learns about X).

Basic definition 2: Information complexity
• Communication complexity: CC(F, μ, ε) := min over protocols π computing F with error ≤ ε of the communication cost of π.
• Analogously: IC(F, μ, ε) := inf over protocols π computing F with error ≤ ε of IC(π, μ).
• (Here an infimum, rather than a minimum, is needed.)

Prior-free information complexity
• Using a minimax, one can get rid of the prior.
• For communication, we had: CC(F, ε) = max_μ CC(F, μ, ε).
• For information: IC(F, ε) := inf over protocols π computing F with error ≤ ε of max_μ IC(π, μ).
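For any fixed protocol and prior, the information cost IC(π, μ) = I(Π;Y|X) + I(Π;X|Y) from Basic definition 1 can be evaluated mechanically from the joint distribution of (X, Y, Π). Below is a minimal sketch, not from the talk, assuming a deterministic protocol given as a transcript function; the name `information_cost` and the toy protocol at the end are illustrative. The MD5 examples above amount to this kind of computation, with the hash and Bob's reply playing the role of Π.

```python
from collections import defaultdict
from math import log2

def information_cost(mu, transcript):
    """IC(pi, mu) = I(Pi; Y | X) + I(Pi; X | Y) for a deterministic protocol
    whose transcript is Pi = transcript(x, y), under a prior mu given as a
    dict mapping (x, y) -> probability."""
    # Joint distribution of (X, Y, Pi).
    joint = defaultdict(float)
    for (x, y), p in mu.items():
        joint[(x, y, transcript(x, y))] += p

    def H(project):
        # Entropy of a projection (marginal) of the joint distribution.
        dist = defaultdict(float)
        for (x, y, t), p in joint.items():
            dist[project(x, y, t)] += p
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    H_X   = H(lambda x, y, t: x)
    H_Y   = H(lambda x, y, t: y)
    H_XY  = H(lambda x, y, t: (x, y))
    H_XT  = H(lambda x, y, t: (x, t))
    H_YT  = H(lambda x, y, t: (y, t))
    H_XYT = H(lambda x, y, t: (x, y, t))
    # I(Pi; Y | X) = H(Pi,X) + H(X,Y) - H(X) - H(Pi,X,Y), and symmetrically.
    what_alice_learns = H_XT + H_XY - H_X - H_XYT   # I(Pi; Y | X)
    what_bob_learns   = H_YT + H_XY - H_Y - H_XYT   # I(Pi; X | Y)
    return what_alice_learns + what_bob_learns

# Toy check: Alice announces her whole input; on independent uniform bits,
# Bob learns exactly 1 bit about X and Alice learns nothing about Y.
mu = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(information_cost(mu, lambda x, y: x))   # 1.0
```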
Operationalizing IC: Information equals amortized communication
• Recall [Shannon]: lim_{n→∞} C(X^n)/n = H(X).
• Turns out [B.-Rao'11]: lim_{n→∞} CC(F^n, μ^n, ε)/n = IC(F, μ, ε), for ε > 0. [Error ε allowed on each copy.]
• For ε = 0: lim_{n→∞} CC(F^n, μ^n, 0+)/n = IC(F, μ, 0).
• [lim_{n→∞} CC(F^n, μ^n, 0)/n is an interesting open problem.]

Entropy vs. Information Complexity
• Additive? Entropy: yes. IC: yes.
• Operationalized: Entropy: lim_{n→∞} C(X^n)/n. IC: lim_{n→∞} CC(F^n, μ^n, ε)/n.
• Compression? Entropy: Huffman: C(X) ≤ H(X) + 1. IC: ???!

Can interactive communication be compressed?
• Is it true that CC(F, μ, ε) ≤ IC(F, μ, ε) + O(1)?
• Less ambitiously: CC(F, μ, O(ε)) = O(IC(F, μ, ε))?
• (Almost) equivalently: given a protocol π with IC(π, μ) = I, can Alice and Bob simulate π using O(I) communication?
• Not known in general…

Applications
• Information = amortized communication means that to understand the amortized cost of a problem, it is enough to understand its information complexity.

Example: the disjointness function
• X, Y are subsets of {1, …, n}.
• Alice gets X, Bob gets Y.
• Need to determine whether X ∩ Y = ∅.
• In binary notation, need to compute ⋁_{i=1}^{n} (X_i ∧ Y_i) (the sets are disjoint iff this is 0).
• An operator on n copies of the 2-bit AND function.

Set intersection
• X, Y are subsets of {1, …, n}.
• Alice gets X, Bob gets Y.
• Want to compute X ∩ Y.
• This is just n copies of the 2-bit AND.
• Understanding the information complexity of AND gives tight bounds on both problems!

Exact communication bounds [B.-Garg-Pankratov-Weinstein'13]
• CC(Int_n, 0+) ≥ n (trivial).
• CC(Disj_n, 0+) = Ω(n) [Kalyanasundaram-Schnitger'87, Razborov'92].
New:
• CC(Int_n, 0+) ≈ 1.4922 n ± o(n).
• CC(Disj_n, 0+) ≈ 0.4827 n ± o(n).

Small set disjointness
• X, Y are subsets of {1, …, n}, with |X|, |Y| ≤ k.
• Alice gets X, Bob gets Y.
• Need to determine whether X ∩ Y = ∅.
• Trivial: CC(Disj_n^k, 0+) = O(k log n).
• [Hastad-Wigderson'07]: CC(Disj_n^k, 0+) = Θ(k).
• [BGPW'13]: CC(Disj_n^k, 0+) = (2 log_2 e) k ± o(k) ≈ 2.885 k.

Open problem: Computability of IC
• Given the truth table of F(X,Y), μ, and ε, compute IC(F, μ, ε).
• Via IC(F, μ, ε) = lim_{n→∞} CC(F^n, μ^n, ε)/n one can compute a sequence of upper bounds.
• But the rate of convergence as a function of n is unknown.

Open problem: Computability of IC
• One can compute the r-round information complexity IC_r(F, μ, ε) of F.
• But the rate of convergence as a function of r is unknown.
• Conjecture: IC_r(F, μ, ε) − IC(F, μ, ε) = O_{F,μ,ε}(1/r²).
• This is the relationship for the two-bit AND.

Thank You!
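As a contrast with the computability question above, here is a sketch (not from the talk) of the analogous computation that is straightforward: the zero-error deterministic communication complexity of a small truth table can be found by exhaustive search over protocol trees, where at each node one party splits her remaining inputs by sending a bit and every leaf must be a monochromatic rectangle. The difficulty with IC(F, μ, ε) is that the infimum ranges over randomized protocols of unbounded length, so no such finite search applies directly. The function name `deterministic_cc` and the brute-force structure are illustrative assumptions.

```python
from functools import lru_cache
from itertools import combinations

def deterministic_cc(F, xs, ys):
    """Worst-case number of bits exchanged by an optimal deterministic
    protocol tree computing F(x, y) exactly on the input rectangle xs x ys."""

    def splits(items):
        # All ways to send one bit: bipartitions into two non-empty parts.
        for r in range(1, len(items)):
            for part in combinations(items, r):
                yield part, tuple(i for i in items if i not in part)

    @lru_cache(maxsize=None)
    def cost(rows, cols):
        if len({F(x, y) for x in rows for y in cols}) == 1:
            return 0                      # monochromatic rectangle: done
        best = float("inf")
        for a, b in splits(rows):         # Alice speaks one bit
            best = min(best, 1 + max(cost(a, cols), cost(b, cols)))
        for a, b in splits(cols):         # Bob speaks one bit
            best = min(best, 1 + max(cost(rows, a), cost(rows, b)))
        return best

    return cost(tuple(xs), tuple(ys))

print(deterministic_cc(lambda x, y: x & y, (0, 1), (0, 1)))             # 2-bit AND: 2
print(deterministic_cc(lambda x, y: int(x == y), range(4), range(4)))  # EQ on 4 values: 3
```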