The Capacity-Cost Function of a Hard-Constrained Channel

Abstract: In this paper we consider hard-constrained costly channels. Using the thermodynamic formalism, we prove that the capacity-cost function of such a channel is strictly convex, together with some related properties. We also obtain estimates for the variation of the cost of the sequences generated by optimum codes.

0.1 Introduction

A hard-constrained channel transmits certain sequences without error and rejects all others. If a cost is assigned to each sequence, we obtain the hard-constrained costly channel. This channel was studied by Khayrallah and Neuhoff in [KN], who show how to calculate its capacity-cost function under the assumption that the channel is finite-state; they also give methods for constructing optimum codes.

In the present paper we prove that the capacity-cost function of a hard-constrained channel is strictly convex, along with some related properties. We treat the general case: the channel is not required to be finite-state, and very general cost functions are admitted. At the end of the paper we also prove some estimates concerning the variation of the cost of sequences generated by optimum codes. The techniques we use to prove these facts come from the thermodynamic formalism.

The hard-constrained channel is used to model magnetic recording; in this case the constraints are due to physical limitations of the storage system. It can also be used for Shannon's telegraph channel, where the constraints are dictated by the transmission scheme. Sometimes it is useful to charge a cost for the use of an undesirable sequence: the idea is to accept some of these sequences, but not too many of them. This is done by limiting the mean cost of the sequences. Some examples of practical applications of these channels are presented in [KN].

The capacity-cost function C(ρ) is the maximum code rate achievable with average cost at most ρ.
This function is only meaningful in a certain interval [ρ_min, ρ_max], characterized by

ρ_min = sup{ρ | C(ρ) = 0},    ρ_max = inf{ρ | C(ρ) = C_max}.

The function C(ρ) has some well-known properties: it is continuous, strictly increasing and convex ∩ on the interval [ρ_min, ρ_max] (see appendix). In this paper we prove that C(ρ) is in fact a strictly convex ∩, analytic function of ρ on the interval (ρ_min, ρ_max]. Moreover, we show that

dC/dρ(ρ_min) = +∞   and   dC/dρ(ρ_max) = 0.

The techniques used in this paper also allow us to establish some facts concerning the cost of the sequences generated by optimum codes. One of them is that the cost satisfies an asymptotic normal law. We also show how the capacity-cost function can be used to estimate the asymptotic probability that the average cost differs from its limit ρ.

In section 2 we define the capacity-cost function precisely. In sections 3 and 4 we prove the results about C(ρ) mentioned above; these sections are the core of the article. In section 5 the convergence to the normal law is studied, and in section 6 we establish the relation between the capacity-cost function and large deviations of the cost.

0.2 The Capacity-Cost Function

In this section we define the capacity-cost function of a hard-constrained channel and collect some of its properties. Let Σ denote the space of sequences in d symbols,

Σ = {x = (x_0, x_1, x_2, ...) | x_i ∈ {1, ..., d}},

and let S be the shift,

S(x_0, x_1, x_2, ...) = (x_1, x_2, ...).

A noiseless constrained channel is a subset Π of Σ invariant under S. We shall assume that Π is irreducible and aperiodic ([A], [KN]). This is a natural hypothesis: if it does not hold, Π can be divided into several channels with these properties. The cost is a Hölder-continuous function k : Π → R_+ ([B]). The average cost k_n is given by

k_n(x) = (1/n) Σ_{j=0}^{n-1} k(S^j(x)).

We say that k is homologous to a constant if, for each x ∈ Σ,

lim_{n→∞} k_n(x) = K,

where the limit K is independent of x ∈ Σ.
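The definitions above can be illustrated directly. The sketch below uses finite tuples in place of infinite sequences and a simple two-symbol cost (1 when the current symbol is 2, else 0), chosen for illustration; the function names are ours.

```python
def shift(x):
    """The shift S: S(x0, x1, x2, ...) = (x1, x2, ...)."""
    return x[1:]

def k(x):
    """A cost depending only on the first symbol:
    k(x) = 1 if x0 == 2, else 0 (chosen for illustration)."""
    return 1.0 if x[0] == 2 else 0.0

def average_cost(x, n):
    """k_n(x) = (1/n) * sum_{j=0}^{n-1} k(S^j(x))."""
    total, y = 0.0, x
    for _ in range(n):
        total += k(y)
        y = shift(y)
    return total / n

x = (1, 2, 2, 1, 2, 1, 1, 2, 1, 2)
print(average_cost(x, 10))  # fraction of 2's in the first 10 symbols: 0.5
```

Since this cost depends only on the current symbol, k_n is simply the empirical frequency of the symbol 2, and k is homologous to a constant exactly when that frequency has the same limit for every admissible sequence.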
Given ρ ≥ 0, the capacity C(ρ) is defined by

C(ρ) = max{H(p) | p ∈ M(Π), ∫ k dp ≤ ρ},

where M(Π) denotes the set of shift-invariant probability measures with support in Π. Let

ρ_min = sup{ρ | C(ρ) = 0}   and   ρ_max = inf{ρ | C(ρ) = C_max},

where C_max = max_{p∈M(Π)} H(p). Evidently C(ρ) = 0 if ρ < ρ_min, and C(ρ) = C_max if ρ ≥ ρ_max. We also observe that if k is homologous to a constant, then ρ_min = ρ_max. In [E] it is proved that the capacity-cost function C(ρ) is continuous, strictly increasing and convex ∩ on the interval [ρ_min, ρ_max].

Example 1: Take Π to be the full shift Σ on 2 symbols and let k : Σ → R_+ be given by

k(x_0, x_1, ...) = 0 if x_0 = 1,   k(x_0, x_1, ...) = 1 if x_0 = 2.

This channel is memoryless, so we can restrict the maximization to i.i.d. sources when computing the capacity-cost function [E]. We then have

C(ρ) = max{H(p) | p ∈ M(Π), ∫ k dp ≤ ρ, p i.i.d.}.

Every i.i.d. source is characterized by a number in the interval [0, 1] representing P{x_0 = 2}. We shall also denote this number by p, hoping that this causes no confusion for the reader. It is clear that ∫ k dp = p. Denote by h(p) the entropy function

h(p) = −p log p − (1 − p) log(1 − p).

Then

C(ρ) = max_{p∈[0,1]} {h(p) | p ≤ ρ}.

So we have ρ_min = 0, ρ_max = 1/2, and C(ρ) = h(ρ) for ρ ∈ [0, 1/2].

0.3 The Topological Pressure

Given µ, the topological pressure P(µ) is given by

P(µ) = sup_{p∈M(Π)} {H(p) + µ ∫_Π k dp}.

This function is central in the thermodynamic formalism, and we shall use it here to derive our results. Ruelle's theorem states some important facts about it:

(1) P is an analytic function of µ.

(2) There exists a unique probability measure p* = p*(µ) on Π such that

P(µ) = H(p*) + µ ∫_Π k dp*.

(3) The derivative of P is given by

dP/dµ(µ) = ∫_Π k dp*.

(4) The second derivative of P is given by

d²P/dµ²(µ) = σ²(µ),

where

σ²(µ) = lim_{n→∞} (1/n) ∫_Π ( Σ_{j=0}^{n-1} k(S^j(x)) − n ∫_Π k dp* )² dp*.

Moreover, if k is not homologous to a constant, then σ²(µ) > 0 for every µ ∈ R.
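For the memoryless channel of Example 1, where the supremum defining P(µ) runs over Bernoulli(p) sources, the pressure can be approximated by a grid search over p, and item (3) of Ruelle's theorem (dP/dµ = ∫ k dp*) can be checked by a central finite difference. A minimal sketch; the function names and the grid approximation are ours.

```python
import math

def h(p):
    """Binary entropy h(p) = -p log p - (1 - p) log(1 - p), natural log."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def pressure(mu, grid=100000):
    """P(mu) = sup_p { h(p) + mu * p }, approximated over a grid on [0, 1]."""
    return max(h(i / grid) + mu * i / grid for i in range(grid + 1))

def maximizer(mu, grid=100000):
    """The maximizing p* of h(p) + mu * p; analytically p* = 1/(1 + e^{-mu})."""
    best = max(range(grid + 1), key=lambda i: h(i / grid) + mu * i / grid)
    return best / grid

mu = -1.0
p_star = maximizer(mu)                  # should be near 1/(1+e) ~ 0.2689
eps = 1e-4
dP = (pressure(mu + eps) - pressure(mu - eps)) / (2 * eps)
print(p_star, dP)  # Ruelle (3): dP/dmu equals the mean cost under p*(mu)
```

At µ = 0 the cost term disappears and `pressure(0.0)` returns the unconstrained value log 2, the capacity C_max of the full 2-shift.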
In this case the topological pressure P is a strictly convex ∪ function of µ. Example 1 (cont.): Returning to example 1, we have that P (µ) = sup {h(p) + µp} p∈[0,1] In order to find the maximum of the above expression we look for p satisfying dh (p) = −µ dp 0.3. THE TOPOLOGICAL PRESSURE Since 5 dh 1−p (p) = log( ) dp p we obtain p∗ = 1 1 + e−µ Therefore P (µ) = log(1 + eµ ) − µ We obtain also expressions for dP 1 =− dµ 1 + eµ and d2 P eµ = dµ2 (1 + eµ )2 In next lemma, we show that for µ ≤ 0, the probability measure that leads to the pressure also leads to the capacity. Lemma 1: R Given µ ≤ 0, take ρ = Π kdp∗ . Then C(ρ) = H(p∗ ) Proof: R If p ∈ M (Π), p 6= p∗ , with Π kdp ≤ ρ then Z H(p) + µ Z Π kdp < H(p∗ ) + µ Π kdp∗ and hence H(p) < H(p∗ ) Therefore Z H(p∗ ) = sup{H(p)| p Π kdp ≤ ρ} = C(ρ) Example 1 (cont.): In example 1, given µ ≤ 0, take p∗ = 1+e1−µ . Then ρ = p∗ and by the above lemma, C(ρ) = h(ρ) This is in accordance with our previous calculus of C(ρ), since for µ ≤ 0, ρ ∈ (0, 21 ]. 6 Remark: For each µ, we can obtain the value of P (µ) as the logarithm of the dominant eigenvalue λ = λ(µ) of a linear positive operator L in the space of continuous functions on Π . The operator L is called the Ruelle-PerronFrobenius operator. The adjoint operator La defined on the space of measures on Π has also λ(µ) as its dominant eigenvalue. If we denote by ν the eigenvector of La associated to λ normalized to be aR probability and by g the eigenvector of L associated to λ normalized by gdv = 1, then the probability p∗ is equal to gν . For more details, see [PP]. In case of a finite-state channel, the operator L reduces to a positive matrix B (see [KN]). Example 2: Consider the costly finite-state channel defined by the graph of figure1. It is a (1, 3) hard channel that admits also sequences of 1’s, but with a unit cost for each pair together (see [KN]). 
In this example,

B = ( e^µ  1  1  1 )
    (  1   0  0  0 )
    (  0   1  0  0 )
    (  0   0  1  0 )

We can compute P(µ) as the logarithm of the dominant eigenvalue λ of this matrix. The probability vector p* is obtained as a product of the normalized left and right eigenvectors associated with λ. The corresponding value of ρ is given by ∫_Π k dp*, which in this case reduces to the first component of p*. For example, for µ = 1.0885 one obtains ρ = 0.09 and C(ρ) = 0.8181 (see [KN]).

0.4 The Legendre Transform

Assume that k is not homologous to a constant. Let

ρ_− = lim_{µ→−∞} dP/dµ(µ),
ρ_0 = dP/dµ(0),
ρ_+ = lim_{µ→+∞} dP/dµ(µ).

Denote by L(P)(ρ) the Legendre transform of P(µ), defined by

L(P)(ρ) = inf_µ {P(µ) − µρ}

for ρ ∈ (ρ_−, ρ_+). Since P is strictly convex, L(P) is well defined and there exists a unique µ* ∈ R such that

L(P)(ρ) = P(µ*) − µ*ρ   and   dP/dµ(µ*) = ρ.

Lemma 2: Take ρ ∈ (ρ_−, ρ_0). Then L(P)(ρ) = C(ρ).

Proof: We have L(P)(ρ) = P(µ*) − µ*ρ. By Ruelle's theorem, there exists a unique p* such that

P(µ*) = H(p*) + µ* ∫ k dp*   and   dP/dµ(µ*) = ∫ k dp* = ρ.

Therefore L(P)(ρ) = H(p*), which implies, by Lemma 1, that L(P)(ρ) = C(ρ). □

Lemma 3: We have ρ_− = ρ_min and ρ_0 = ρ_max.

Proof: Since C(ρ) > 0 for ρ > ρ_−, we have ρ_min ≤ ρ_−. And by the convexity ∩ of C and the fact that dC/dρ(ρ) → +∞ as ρ ↘ ρ_min, we conclude that ρ_− ≤ ρ_min. Hence ρ_− = ρ_min. At µ = 0 we have C(ρ_0) = P(0) = C_max, and therefore ρ_0 ≥ ρ_max. If µ < 0, then ρ < ρ_0 and dC/dρ(ρ) > 0; this implies that ρ_0 ≤ ρ_max. Therefore ρ_0 = ρ_max. □

Example 1 (cont.): In Example 1 we have ρ_min = ρ_− = 0 and ρ_max = ρ_0 = 1/2. Also, ρ_+ = 1.

Example 2 (cont.): Letting µ → −∞, we obtain ρ_min = 0 and C(0) = log(1.4656). Setting µ = 0, we obtain ρ_max = 0.2938 and C_max = log(1.9276).

Theorem: Given ρ ∈ (ρ_min, ρ_max], take µ = µ(ρ) such that dP/dµ(µ) = ρ.
Then µ is an analytic function of ρ and the following formulas hold:

C(ρ) = P(µ) − ρµ,
dC/dρ(ρ) = −µ,
d²C/dρ²(ρ) = −( d²P/dµ²(µ) )^{−1}.

Proof: Given ρ ∈ (ρ_min, ρ_max], let µ = µ(ρ) be defined implicitly by the formula

dP/dµ(µ) = ρ.

Since d²P/dµ²(µ) = σ²(µ) > 0, the implicit function theorem shows that µ(ρ) is analytic, with

dµ/dρ(ρ) = ( d²P/dµ²(µ) )^{−1}.

And since C(ρ) = P(µ(ρ)) − ρµ(ρ), we have

dC/dρ(ρ) = dP/dµ(µ) dµ/dρ(ρ) − µ(ρ) − ρ dµ/dρ(ρ) = −µ(ρ),

because dP/dµ(µ) = ρ. Therefore

d²C/dρ²(ρ) = −dµ/dρ(ρ) = −( d²P/dµ²(µ) )^{−1}. □

Corollary: The capacity-cost function C(ρ) is analytic on the interval (ρ_min, ρ_max] and strictly convex ∩ on the interval [ρ_min, ρ_max]. Moreover,

dC/dρ(ρ_max) = 0   and   lim_{ρ↘ρ_min} dC/dρ(ρ) = +∞.

Proof: Since C(ρ) = P(µ(ρ)) − ρµ(ρ) with µ(ρ) analytic, C(ρ) is also analytic. By the formula

d²C/dρ²(ρ) = −( d²P/dµ²(µ) )^{−1}

we conclude that d²C/dρ²(ρ) < 0. Therefore C(ρ) is convex ∩ on the interval (ρ_min, ρ_max], and by the continuity of C(ρ) on [ρ_min, ρ_max] we conclude that C(ρ) is in fact convex ∩ on the whole interval [ρ_min, ρ_max]. Finally, the equation dC/dρ(ρ) = −µ(ρ) implies that dC/dρ(ρ_max) = 0 and lim_{ρ↘ρ_min} dC/dρ(ρ) = +∞. □

0.5 Central Limit Theorem

If we assume that the source generates symbols independently and with equal probabilities, and we use an optimum code, then the corresponding sequences in Π are distributed according to p*. The cost k has mean ρ and asymptotic variance given by ([PP])

σ²(µ) = lim_{n→∞} (1/n) ∫_Π ( Σ_{j=0}^{n-1} k(S^j(x)) − nρ )² dp*.

This formula can be simplified to ([L])

σ²(µ) = ∫_Π (k(x) − ρ)² dp* + 2 Σ_{i=1}^∞ ∫_Π (k(x) − ρ)(k(S^i(x)) − ρ) dp*.

In this case the central limit theorem is valid [PP]. This means that

(1/√n) Σ_{j=0}^{n-1} ( k(S^j(x)) − ρ )

converges in distribution to the normal law N(0, σ²).

Example 1 (cont.): In Example 1 it is clear that, for every i ≥ 1, k(S^i(x)) is independent of k(x).
Therefore

∫_Π (k(x) − ρ)(k(S^i(x)) − ρ) dp* = 0,

and

∫_Π (k(x) − ρ)² dp* = p(1 − p)² + (1 − p)(0 − p)².

Hence

σ²(µ) = p(1 − p).

Example 2 (cont.): The value of σ²(µ) for µ = 1.0885 can be computed in the same way.

0.6 Large Deviations

As in the last section, assume that the source generates symbols independently and with equal probabilities. If we consider an optimum code, the corresponding sequences in Π are distributed according to p*. We know that

lim_{n→∞} (1/n) Σ_{j=0}^{n-1} k(S^j(x)) = ∫_Π k dp* = ρ*.

In this section we study the probability of deviations of (1/n) Σ_{j=0}^{n-1} k(S^j(x)) from its limit ρ*. In the thermodynamic formalism, this is the subject of large deviations. Let

p*_n(A) = p*( (1/n) Σ_{j=0}^{n-1} k(S^j(x)) ∈ A ).

Then, by the formula above,

lim_{n→∞} p*_n(A) = 1 if ρ* ∈ A,   lim_{n→∞} p*_n(A) = 0 if ρ* ∉ A.

In the theory of large deviations, the following fact holds [L]:

lim_{n→∞} (1/n) log p*_n(A) = sup_{ρ∈A} I(ρ),

where I(ρ) is the Legendre transform of µ ↦ P(µ + µ*) − P(µ*), that is,

I(ρ) = inf_µ {P(µ + µ*) − P(µ*) − µρ}.

Lemma 4: For µ* ≤ 0, let ρ* = dP/dµ(µ*). Then

I(ρ) = C(ρ) − C(ρ*) − (ρ − ρ*) dC/dρ(ρ*).

Proof:

I(ρ) = inf_µ {P(µ + µ*) − P(µ*) − µρ}
     = inf_µ {P(µ + µ*) − (µ + µ*)ρ} − P(µ*) + µ*ρ
     = L{P}(ρ) − {P(µ*) − µ*ρ*} + (ρ − ρ*)µ*
     = L{P}(ρ) − L{P}(ρ*) + (ρ − ρ*)µ*.

By Lemma 2, L{P}(ρ) = C(ρ) and µ* = −dC/dρ(ρ*). Therefore

I(ρ) = C(ρ) − C(ρ*) − (ρ − ρ*) dC/dρ(ρ*). □

Remarks: (1) Observe that if ρ* ∈ A, then

lim_{n→∞} (1/n) log p*_n(A) = 0.

(2) Observe that the greater the value of σ²(µ), the less convex ∩ the function C(ρ) is. This makes the rate function I(ρ) less negative, and hence the probability of large deviations of the cost increases.
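For Example 1 the rate function of the lemma can be written in closed form: with C = h, elementary algebra shows that I(ρ) equals minus the Kullback-Leibler divergence between the Bernoulli laws of parameters ρ and ρ*. The sketch below (helper names ours) compares (1/n) log p*_n([a, 1]), computed from exact binomial tails, with sup_{ρ≥a} I(ρ) = I(a) for a > ρ*.

```python
import math

def h(p):
    """Binary entropy, natural log."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log(p) - (1 - p) * math.log(1 - p)

def rate(rho, rho_star):
    """I(rho) = C(rho) - C(rho*) - (rho - rho*) dC/drho(rho*), with C = h on
    [0, 1/2]; this equals -D(rho || rho*) <= 0, the negative KL divergence."""
    dC = math.log((1 - rho_star) / rho_star)   # h'(rho*)
    return h(rho) - h(rho_star) - (rho - rho_star) * dC

def log_tail(n, p, a):
    """(1/n) log P(k_n >= a) for i.i.d. Bernoulli(p) costs (exact binomial tail)."""
    kmin = math.ceil(a * n - 1e-9)             # guard against float round-off
    logs = [math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(p) + (n - j) * math.log(1 - p)
            for j in range(kmin, n + 1)]
    m = max(logs)                              # log-sum-exp for stability
    return (m + math.log(sum(math.exp(v - m) for v in logs))) / n

rho_star, a = 0.25, 0.4
print(rate(a, rho_star))                 # ~ -0.0541
for n in (100, 400, 1600):
    print(n, log_tail(n, rho_star, a))   # approaches rate(a, rho_star) as n grows
```

In agreement with remark (2), moving ρ* toward 1/2 (larger variance p(1 − p)) flattens h and makes I(a) less negative, so the tail probability decays more slowly.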