ETH Zürich, Dept. Computer Science, Spring Semester 2015
Information theory, Exercise 4 (Solution), 1 April 2015

4.1 Limit vs AEP

Let $X_1, X_2, X_3, \dots, X_n, \dots$ be an infinite sequence of i.i.d. random variables, each with probability distribution $p(x)$, $x \in \mathcal{X}$. That is, $\Pr(X_i = x) = p(x)$ and $\Pr(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = \prod_{i=1}^{n} p(x_i)$.

a) Find
$$\lim_{n\to\infty} p(X_1, \dots, X_n)^{1/n}.$$
Hint: use the AEP.

b) Find
$$\lim_{n\to\infty} \frac{1}{n} \log p(X_1, \dots, X_n).$$

Recap: The weak law of large numbers states that the sample average converges in probability to the expected value:
$$\lim_{n\to\infty} \Pr\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - E_{p(x)}[X] \right| > \varepsilon \right) = 0.$$

Since $X$ is a random variable, $\log_2 p(X)$ is also a random variable. We therefore have:
$$\lim_{n\to\infty} \Pr\left( \left| \frac{1}{n} \underbrace{\sum_{i=1}^{n} \log_2 p(X_i)}_{=\,\log_2 \prod_{i=1}^{n} p(X_i)\,=\,\log_2 p(X^n)} - E_{p(X)}[\log_2 p(X)] \right| > \varepsilon \right) = 0.$$

Since $E_{p(X)}[\log_2 p(X)] = -H(X)$, this gives:
$$\lim_{n\to\infty} \Pr\left( \left| -\frac{1}{n} \log_2 p(X^n) - H(X) \right| > \varepsilon \right) = 0,$$
or equivalently:

(b) $$\frac{1}{n} \log_2 p(X^n) \xrightarrow{p} -H(X)$$ when $n \to \infty$,

or equivalently, because $y \mapsto 2^y$ is continuous:

(a) $$p(X^n)^{1/n} \xrightarrow{p} 2^{-H(X)}$$ when $n \to \infty$.

c) Let $f(x)$ be a function from $\mathcal{X}$ to the interval $(0, 1]$. Find the limit
$$\lim_{n\to\infty} \left[ \prod_{i=1}^{n} f(X_i) \right]^{1/n}.$$

The function $f$ is the mapping $f : \mathcal{X} \to (0, 1]$, so $\log_2 f(X)$ is again a random variable and the weak law of large numbers applies to it. Since:
$$\lim_{n\to\infty} \Pr\left( \left| \frac{1}{n} \sum_{i=1}^{n} \log_2 f(X_i) - E_{p(X)}[\log_2 f(X)] \right| > \varepsilon \right) = 0,$$
we have:
$$\frac{1}{n} \log_2 \prod_{i=1}^{n} f(X_i) \xrightarrow{p} E_{p(X)}[\log_2 f(X)],$$
or equivalently
$$\left( \prod_{i=1}^{n} f(X_i) \right)^{1/n} \xrightarrow{p} 2^{E_{p(X)}[\log_2 f(X)]}.$$

4.2 Errorless coding with AEP

Let $U_1, U_2, \dots, U_n$ be independent, identically distributed random variables that take their values in the set $\mathcal{U} = \{1, \dots, k\}$, with probability distribution $p(i)$, $i \in \mathcal{U}$. Recall that in class, we divided all sequences in $\mathcal{U}^n$ into two sets: the typical set $A_{n,\varepsilon}$ and its complement, see Figure 1. We talked about a source coding scheme that only coded the typical sequences in $A_{n,\varepsilon}$ and declared an error for sequences which are not in $A_{n,\varepsilon}$. This scheme has a vanishing error probability (by making $\varepsilon$ small and $n$ large).
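The vanishing error probability of the typical-set scheme can be checked numerically. The following is a minimal sketch, assuming a hypothetical binary source with $\Pr(U = 1) = 0.3$ and $\varepsilon = 0.05$ (neither value comes from the exercise). For a binary source, all sequences with the same number of ones have the same probability, so the probability of the typical set can be summed exactly over that count. The function names are illustrative only.

```python
import math


def entropy_bits(q):
    """Binary entropy H(U) in bits for Pr(U = 1) = q, with 0 < q < 1."""
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def typical_set_probability(n, q, eps):
    """Exact probability that an i.i.d. Bernoulli(q) sequence of length n is typical."""
    h = entropy_bits(q)
    total = 0.0
    for ones in range(n + 1):
        # Every sequence with this many ones has the same probability.
        log2_p = ones * math.log2(q) + (n - ones) * math.log2(1 - q)
        # Typicality test: | -(1/n) log2 p(u^n) - H(U) | <= eps.
        if abs(-log2_p / n - h) <= eps:
            # Number of such sequences, kept in the log domain to avoid overflow.
            ln_count = (math.lgamma(n + 1) - math.lgamma(ones + 1)
                        - math.lgamma(n - ones + 1))
            total += math.exp(ln_count + log2_p * math.log(2))
    return total


# Assumed example parameters (q = 0.3, eps = 0.05), chosen for illustration.
for n in (10, 100, 1000):
    print(n, typical_set_probability(n, 0.3, 0.05))
```

The printed probabilities should move towards 1 as $n$ grows, so the probability of declaring an error, $1 - p(A_{n,\varepsilon})$, shrinks accordingly.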
[Figure 1: Typical sets and source coding. The set $\mathcal{U}^n$ is split into the typical set $A_{n,\varepsilon}$ and the atypical sequences.]

a) Fix $n$, $\varepsilon$, show that $n(H(U) + \varepsilon)$ bits are enough to encode the typical sequences in $A_{n,\varepsilon}$ in a uniquely decodable way. We call this code $C_1$ (note that $C_1$ is a fixed-length code for the typical sequences inside $A_{n,\varepsilon}$).

$U \in \{1, \dots, k\}$. The set of typical sequences is defined as:
$$A_{n,\varepsilon} = \left\{ u^n \in \mathcal{U}^n : p(u^n) \in \left[ 2^{-n(H(U)+\varepsilon)},\ 2^{-n(H(U)-\varepsilon)} \right] \right\}.$$

In order to determine the number of bits, we first bound the number of elements in the typical set, $|A_{n,\varepsilon}|$:
$$1 = \sum_{u^n \in \mathcal{U}^n} p(u^n) \overset{(a)}{\ge} \sum_{u^n \in A_{n,\varepsilon}} p(u^n) \overset{(b)}{\ge} \sum_{u^n \in A_{n,\varepsilon}} 2^{-n(H(U)+\varepsilon)} = 2^{-n(H(U)+\varepsilon)}\, |A_{n,\varepsilon}|,$$
where in (a) we sum over a smaller set (i.e. $A_{n,\varepsilon} \subseteq \mathcal{U}^n$) and in (b) we use the lower bound on $p(u^n)$ from the definition of the typical set. Therefore, the typical set contains at most $2^{n(H(U)+\varepsilon)}$ elements (i.e. $|A_{n,\varepsilon}| \le 2^{n(H(U)+\varepsilon)}$). Assigning each typical sequence a distinct binary index of fixed length, $n(H(U) + \varepsilon)$ bits (rounded up to an integer) therefore suffice to encode $A_{n,\varepsilon}$, and the code is uniquely decodable because distinct sequences receive distinct indices.

b) We refer to the sequences which are not inside $A_{n,\varepsilon}$ as the atypical sequences. Show that $n\lceil \log_2(k) \rceil$ bits are enough to encode the atypical sequences in a uniquely decodable way. We call this fixed-length code $C_2$. (Hint: how many sequences can there be in total?)

$|\mathcal{U}^n| = k^n$, so $n\lceil \log_2 k \rceil$ bits suffice to encode every sequence of the source, for example by spending $\lceil \log_2 k \rceil$ bits on each symbol. The atypical sequences are defined as:
$$B_{n,\varepsilon} = \mathcal{U}^n \setminus A_{n,\varepsilon}.$$
$n\lceil \log_2 k \rceil$ bits therefore suffice to encode $B_{n,\varepsilon}$, since $|B_{n,\varepsilon}| \le |\mathcal{U}^n| = k^n \le 2^{n\lceil \log_2 k \rceil}$.

In this exercise, we intend to design a code for $(U_1, \dots, U_n)$ which is errorless (i.e., it has zero error probability). The scheme is as follows: we add an initial "flag" bit to indicate whether the sequence $(u_1, u_2, \dots, u_n)$ is in $A_{n,\varepsilon}$ or not. We then use the two fixed-length codes (introduced in parts (a) and (b)) to encode sequences in the typical set and the atypical set separately. More precisely, let $C_1$, $C_2$ be the fixed-length codes for typical and atypical sequences; then we use $(0, C_1)$ to encode the typical sequences, and $(1, C_2)$ to encode the rest.
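The flag-bit scheme can be sketched for a toy source by enumerating all of $\mathcal{U}^n$. This is an illustration only, feasible for very small $n$ and $k$: the function name `build_flag_code`, the example distribution, and the choice to index typical sequences in enumeration order are assumptions, not part of the exercise. The sketch also assumes $k \ge 2$ and rounds the length of $C_1$ up to an integer.

```python
import itertools
import math


def build_flag_code(p, n, eps):
    """Return (encode, decode, typical) for the flag-bit code on sequences of length n.

    p maps each symbol to its probability; assumes at least two symbols.
    """
    symbols = sorted(p)
    entropy = -sum(q * math.log2(q) for q in p.values())

    # Typical set: | -(1/n) log2 p(u^n) - H(U) | <= eps.
    typical = [
        u for u in itertools.product(symbols, repeat=n)
        if abs(-sum(math.log2(p[s]) for s in u) / n - entropy) <= eps
    ]
    index_of = {u: i for i, u in enumerate(typical)}

    len_c1 = math.ceil(n * (entropy + eps))            # length of C1 in bits
    bits_per_symbol = math.ceil(math.log2(len(symbols)))  # C2 uses n * this many bits

    def encode(u):
        u = tuple(u)
        if u in index_of:
            # Flag 0, then the fixed-length index of the typical sequence.
            return "0" + format(index_of[u], f"0{len_c1}b")
        # Flag 1, then each symbol written with ceil(log2 k) bits.
        return "1" + "".join(
            format(symbols.index(s), f"0{bits_per_symbol}b") for s in u
        )

    def decode(bits):
        flag, payload = bits[0], bits[1:]
        if flag == "0":
            return typical[int(payload, 2)]
        chunks = [payload[i:i + bits_per_symbol]
                  for i in range(0, len(payload), bits_per_symbol)]
        return tuple(symbols[int(c, 2)] for c in chunks)

    return encode, decode, typical


# Assumed toy source, not taken from the exercise.
encode, decode, typical = build_flag_code({1: 0.7, 2: 0.2, 3: 0.1}, n=5, eps=0.2)
word = encode((1, 1, 1, 2, 2))
print(word, decode(word))
```

Because the flag bit fixes the length of the remaining codeword, a decoder reading a stream of concatenated codewords always knows where each one ends.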
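c) Explain why the code is uniquely decodable.

The first bit tells the decoder whether the rest of the codeword belongs to $C_1$ or to $C_2$. Both are fixed-length codes that assign distinct codewords to distinct sequences, so once the flag is read, the length of the codeword and the encoded sequence are determined without ambiguity.

d) Let $\bar{L}$ be the average codeword length for this scheme. Show that
$$\bar{L} \le n(H(U) + \varepsilon)\, p(A_{n,\varepsilon}) + n\lceil \log(k) \rceil \left(1 - p(A_{n,\varepsilon})\right) + 1.$$

Write $B_{n,\varepsilon} = \mathcal{U}^n \setminus A_{n,\varepsilon}$ for the atypical set and $L_{B}$ for the codeword length of $C_2$. The expected code length is given by:
$$\bar{L} = \underbrace{\sum_{u^n \in A_{n,\varepsilon}} p(u^n)\, n(H(U) + \varepsilon)}_{\text{expected \# bits for typical set}} + \underbrace{\sum_{u^n \in B_{n,\varepsilon}} p(u^n)\, L_{B}}_{\text{expected \# bits for atypical set}} + \underbrace{1}_{\text{flag bit}} \overset{(a)}{\le} n(H(U) + \varepsilon)\, p(A_{n,\varepsilon}) + n\lceil \log k \rceil \left(1 - p(A_{n,\varepsilon})\right) + 1, \qquad (1)$$
where in (a) we use the result from 4.2b: $L_{B} \le n\lceil \log k \rceil$, together with $p(B_{n,\varepsilon}) = 1 - p(A_{n,\varepsilon})$.

e) Show that
$$\lim_{\varepsilon \to 0} \lim_{n \to \infty} \frac{\bar{L}}{n} = H(U).$$

As $n \to \infty$, the atypical sequences contain almost no probability mass: by the AEP, $p(B_{n,\varepsilon}) \le \varepsilon$ for $n$ sufficiently large, and in fact $p(B_{n,\varepsilon}) \to 0$, so $p(A_{n,\varepsilon}) \to 1$. Dividing equation (1) by $n$ and dropping the non-negative atypical and flag terms for the lower bound gives
$$(H(U) + \varepsilon)\, p(A_{n,\varepsilon}) \le \frac{\bar{L}}{n} \le (H(U) + \varepsilon)\, p(A_{n,\varepsilon}) + \lceil \log k \rceil \left(1 - p(A_{n,\varepsilon})\right) + \frac{1}{n}.$$
Both sides converge to $H(U) + \varepsilon$ when $n \to \infty$, so effectively only the typical sequences contribute to the expected length:
$$\lim_{n \to \infty} \frac{\bar{L}}{n} = H(U) + \varepsilon,$$
and letting $\varepsilon \to 0$ yields
$$\lim_{\varepsilon \to 0} \lim_{n \to \infty} \frac{\bar{L}}{n} = H(U).$$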
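The behaviour of the bound in equation (1) can be explored numerically. The sketch below assumes a hypothetical binary source ($k = 2$, so $\lceil \log_2 k \rceil = 1$) with $\Pr(U = 1) = 0.3$; these values and the function names are illustrative and not part of the exercise. It evaluates the right-hand side of (1) divided by $n$, using the exact probability of the typical set.

```python
import math


def entropy_bits(q):
    """Binary entropy H(U) in bits for Pr(U = 1) = q, with 0 < q < 1."""
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def typical_set_probability(n, q, eps):
    """Exact p(A_{n,eps}) for an i.i.d. Bernoulli(q) source."""
    h = entropy_bits(q)
    total = 0.0
    for ones in range(n + 1):
        log2_p = ones * math.log2(q) + (n - ones) * math.log2(1 - q)
        if abs(-log2_p / n - h) <= eps:
            # Binomial coefficient in the log domain to avoid overflow.
            ln_count = (math.lgamma(n + 1) - math.lgamma(ones + 1)
                        - math.lgamma(n - ones + 1))
            total += math.exp(ln_count + log2_p * math.log(2))
    return total


def length_bound_per_symbol(n, q, eps):
    """Right-hand side of equation (1) divided by n, for a binary source (k = 2)."""
    p_typical = typical_set_probability(n, q, eps)
    bits_per_symbol_c2 = 1  # ceil(log2 k) with k = 2
    return ((entropy_bits(q) + eps) * p_typical
            + bits_per_symbol_c2 * (1 - p_typical)
            + 1 / n)


# Assumed example: q = 0.3, eps = 0.05.
print("H(U) =", entropy_bits(0.3))
for n in (10, 100, 1000, 5000):
    print(n, length_bound_per_symbol(n, 0.3, 0.05))
```

For fixed $\varepsilon$ the printed values should approach $H(U) + \varepsilon$ as $n$ grows; repeating the experiment with a smaller $\varepsilon$ (and a correspondingly larger $n$) moves the limit towards $H(U)$, in line with the order of the two limits in part e).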