CHAPTER 1
An Introduction to Codes

Basic Definitions

The concept of a string is fundamental to the subjects of information and coding theory. Let A = {a1, a2, . . . , an} be a finite nonempty set, which we refer to as an alphabet. A string over A is simply a finite sequence of elements of A. Strings will be denoted by boldface letters, such as x and y. If x = x1x2 · · · xk is a string over A, then each xi in x is called an element of x. The length of a string x, denoted by Len(x), is the number of elements in x.

A code is nothing more than a set of strings over a certain alphabet. Of course, codes are generally used to encode messages.

Definition 1.1. Let A = {a1, a2, . . . , ar} be a set of r elements, which we call a code alphabet and whose elements are called code symbols. An r-ary code over A is a subset C of the set of all strings over A. The number r is called the radix of the code. The elements of C are called codewords, and the number of codewords in C is called the size of C.

When A = Z2 and A = Z3, codes over A are referred to as binary codes and ternary codes, respectively.

Definition 1.2. Let S = {s1, s2, . . . , sn} be a finite set which we refer to as a source alphabet. The elements of S are called source symbols, and the number of source symbols in S is called the size of S. Let C be a code. An encoding function is a bijective function f : S → C from S to C. We refer to the ordered pair (C, f) as an encoding scheme for S.

Definition 1.3. If all the codewords in a code C have the same length, we say that C is a fixed length code, or block code. Any encoding scheme that uses a fixed length code will be referred to as a fixed length encoding scheme. If C contains codewords of different lengths, we say that C is a variable length code. Any encoding scheme that uses a variable length code will be referred to as a variable length encoding scheme.

Fixed length codes have advantages and disadvantages over variable length codes. One advantage is that they never require a special symbol to separate the source symbols in the message being coded. Perhaps the main disadvantage of fixed length codes is that source symbols that are used frequently have codes as long as source symbols that are used infrequently. On the other hand, variable length codes, which can encode frequently used source symbols using shorter codewords, can save a great deal of time and space.

Uniquely Decipherable Codes

Definition 1.4. A code C over an alphabet A is uniquely decipherable if no two different sequences of codewords in C represent the same string over A. In symbols, if c1c2 · · · cm = d1d2 · · · dn for ci, dj ∈ C, then m = n and ci = di for all i = 1, . . . , n.

The following theorem, proved by McMillan in 1956, provides some information about the codeword lengths of a uniquely decipherable code.

Theorem 1.5 (McMillan's Theorem). Let C = {c1, c2, . . . , cn} be a uniquely decipherable r-ary code and let li = Len(ci). Then its codeword lengths l1, l2, . . . , ln must satisfy
$$\sum_{i=1}^{n} \frac{1}{r^{l_i}} \le 1.$$

Remark 1.6. Consider the binary code C = {0, 11, 100, 110}. Its codeword lengths 1, 2, 3 and 3 satisfy Kraft's Inequality, but it is not uniquely decipherable. Hence McMillan's Theorem cannot tell us when a particular code is uniquely decipherable, but only when it is not.
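As a quick illustration of Theorem 1.5 and Remark 1.6, here is a minimal Python sketch (not part of the original text; the function name and the use of exact rational arithmetic are our own choices) that computes the sum appearing in McMillan's Theorem for a given code.

```python
from fractions import Fraction

def kraft_sum(code, r=2):
    """Compute the sum of 1/r^Len(c) over the codewords of an r-ary code."""
    return sum(Fraction(1, r ** len(c)) for c in code)

# The code of Remark 1.6: the sum is 1/2 + 1/4 + 1/8 + 1/8 = 1, so
# McMillan's Theorem does not rule out unique decipherability, yet the
# string 110 parses both as the codeword 110 and as 11 followed by 0.
print(kraft_sum(["0", "11", "100", "110"]))   # 1
```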
Instantaneous Codes

Definition 1.7. A code is said to be instantaneous if each codeword in any string of codewords can be decoded (reading from left to right) as soon as it is received.

If a code is instantaneous, then it is also uniquely decipherable. However, there exist codes that are uniquely decipherable but not instantaneous.

Definition 1.8. A code is said to have the prefix property if no codeword is a prefix of any other codeword, that is, if whenever c = x1x2 · · · xn is a codeword, then x1x2 · · · xk is not a codeword for 1 ≤ k < n.

Given a code C, it is easy to determine whether or not it has the prefix property. It is only necessary to compare each codeword with all codewords of greater length to see if it is a prefix. The importance of the prefix property comes from the following proposition.

Proposition 1.9. A code is instantaneous if and only if it has the prefix property.

Now we come to a theorem, published by L. G. Kraft in 1949, which gives a simple criterion for determining whether or not there is an instantaneous code with given codeword lengths.

Theorem 1.10 (Kraft's Theorem). There exists an instantaneous r-ary code C with codeword lengths l1, . . . , ln if and only if these lengths satisfy Kraft's inequality,
$$\sum_{i=1}^{n} \frac{1}{r^{l_i}} \le 1.$$

Remark 1.11. Again we should point out that, as in Remark 1.6, Kraft's Theorem does not say that any code whose codeword lengths satisfy Kraft's inequality must be instantaneous. However, we can always construct an instantaneous code with these codeword lengths.

Definition 1.12. An instantaneous code C is said to be maximal instantaneous if C is not contained in any strictly larger instantaneous code.

Corollary 1.13. Let C be an instantaneous r-ary code with codeword lengths l1, . . . , ln. Then C is maximal instantaneous if and only if these lengths satisfy
$$\sum_{i=1}^{n} \frac{1}{r^{l_i}} = 1.$$

McMillan's Theorem and Kraft's Theorem together tell us something interesting about the relationship between uniquely decipherable codes and instantaneous codes. We have the following useful result.

Corollary 1.14. If a uniquely decipherable code exists with codeword lengths l1, . . . , ln, then an instantaneous code must also exist with these same codeword lengths.

Our interest in Corollary 1.14 will come later, when we turn to questions related to codeword lengths. For it tells us that we lose nothing by considering only instantaneous codes rather than all uniquely decipherable codes.
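Remark 1.11 asserts that an instantaneous code can always be built from lengths satisfying Kraft's inequality. The following Python sketch (our own illustration, with assumed function names) uses the standard construction: sort the lengths, and take as the i-th codeword the first l_i digits of the r-ary expansion of the partial Kraft sum over the earlier codewords.

```python
def instantaneous_code(lengths, r=2):
    """Build an r-ary instantaneous (prefix) code with the given codeword
    lengths, assumed to satisfy Kraft's inequality (Theorem 1.10)."""
    lengths = sorted(lengths)
    codewords = []
    for i, l in enumerate(lengths):
        # v = (sum of 1/r^l_j for j < i) scaled by r^l; an integer, as l_j <= l
        v = sum(r ** (l - lj) for lj in lengths[:i])
        digits = []
        for _ in range(l):                 # write v in base r with l digits
            digits.append(str(v % r))
            v //= r
        codewords.append("".join(reversed(digits)))
    return codewords

print(instantaneous_code([1, 2, 3, 3]))    # ['0', '10', '110', '111']
```

For the lengths 1, 2, 3, 3 of Remark 1.6, this produces {0, 10, 110, 111}, which has the prefix property, in contrast to the code {0, 11, 100, 110}.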
Exercises

(1) What is the minimum possible length for a binary block code containing n codewords?
(2) How many encoding functions are possible from the source alphabet S = {a, b, c} to the code C = {00, 01, 11}? List them.
(3) How many r-ary codes are there with maximum codeword length n over an alphabet A? What is this number for r = 2 and n = 5?
(4) Which of the following codes
C1 = {0, 01, 011, 0111, 01111, 11111} and C2 = {0, 10, 1101, 1110, 1011, 110110}
are uniquely decipherable?
(5) Is it possible to construct a uniquely decipherable code over the alphabet {0, 1, . . . , 9} with nine codewords of length 1, nine codewords of length 2, ten codewords of length 3 and ten codewords of length 4?
(6) For a given binary code C = {0, 10, 11}, let N(k) be the total number of sequences of codewords that contain exactly k bits. For instance, we have N(3) = 5. Show that in this case N(k) = N(k − 1) + 2N(k − 2), for all k ≥ 3.
(7) Suppose that we want an instantaneous binary code that contains the codewords 0, 10 and 1100. How many additional codewords of length 6 could be added to this code?
(8) Suppose that C is a maximal instantaneous code with maximum codeword length m. Show that C must contain at least two codewords of maximum length m.

CHAPTER 2
Noiseless Coding

Optimal Encoding Schemes

In order to achieve unique decipherability, McMillan's Theorem tells us that we must allow reasonably long codewords. Unfortunately, this tends to reduce the efficiency of a code. On the other hand, it is often the case that not all source symbols occur with the same frequency within a given class of messages. When no errors can occur in the transmission of data, it makes sense to assign the longer codewords to the less frequently used source symbols, thereby improving the efficiency of the code.

Definition 2.1. An information source is an ordered pair I = (S, P), where S = {s1, . . . , sn} is a source alphabet and P is a probability law that assigns to each source symbol si of S a probability P(si). The sequence P(s1), . . . , P(sn) is the probability distribution for I.

For noiseless coding, the measure of efficiency of an encoding scheme is its average codeword length.

Definition 2.2. The average codeword length of an encoding scheme (C, f) for an information source I = (S, P), where S = {s1, . . . , sn}, is defined by
$$\sum_{i=1}^{n} \mathrm{Len}(f(s_i))\, \mathcal{P}(s_i).$$

We should emphasize that the average codeword length of an encoding scheme is not the same as the average codeword length of a code, since the former also depends on the probability distribution. It is clear that the average codeword length of an encoding scheme is not affected by the nature of the source symbols themselves. Hence, for the purposes of measuring average codeword length, we may assume that the codewords are assigned directly to the probabilities. Accordingly, we may speak of an encoding scheme (c1, . . . , cn) for the probability distribution (p1, . . . , pn). With this in mind, the average codeword length of an encoding scheme C = (c1, . . . , cn) is
$$\mathrm{AveLen}(C) = \sum_{i=1}^{n} p_i\, \mathrm{Len}(c_i).$$

Let (C1, f1) and (C2, f2) be two encoding schemes for the information source I such that the corresponding codes have the same radix. We say that (C1, f1) is more efficient than (C2, f2) if AveLen(C1) < AveLen(C2). We should point out that it makes sense to compare the average codeword lengths of different encoding schemes only when the corresponding codes have the same radix. For in general, the larger the radix, the shorter we can make the average codeword length.

We will use the notation MinAveLenr(p1, . . . , pn) to denote the minimum average codeword length among all r-ary instantaneous encoding schemes for the probability distribution (p1, . . . , pn).

Definition 2.3. An optimal r-ary encoding scheme for a probability distribution (p1, . . . , pn) is an r-ary instantaneous encoding scheme (c1, . . . , cn) for which
AveLen(c1, . . . , cn) = MinAveLenr(p1, . . . , pn).

Note that optimal encoding schemes are, by definition, instantaneous. By virtue of Corollary 1.14, this minimum is also the minimum over all uniquely decipherable schemes. Hence, we may restrict attention to instantaneous codes.
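Computing the average codeword length of an encoding scheme is a one-line calculation; the sketch below (our own illustration) pairs codewords with probabilities exactly as in the formula for AveLen(C).

```python
def ave_len(codewords, probs):
    """AveLen(C) = sum of p_i * Len(c_i) over the encoding scheme."""
    return sum(p * len(c) for c, p in zip(codewords, probs))

# The scheme ('0', '10', '110', '111') for (0.4, 0.3, 0.2, 0.1):
print(ave_len(["0", "10", "110", "111"], [0.4, 0.3, 0.2, 0.1]))   # 1.9
```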
Huffman Encoding

In 1952, D. A. Huffman published a method for constructing optimal encoding schemes. This method is now known as Huffman encoding. Since we are dealing with r-ary codes, we may as well assume that the code alphabet is {1, 2, . . . , r}.

Lemma 2.4. Let P = (p1, . . . , pn) be a probability distribution with p1 ≥ p2 ≥ · · · ≥ pn. Then there exists an optimal r-ary encoding scheme C = (c1, . . . , cn) for P that has exactly s codewords of maximum length, of the form d1, d2, . . . , ds, where s is uniquely determined by the conditions s ≡ n (mod r − 1) and 2 ≤ s ≤ r. As a result, for such probability distributions, we have
$$\mathrm{MinAveLen}_r(p_1, \ldots, p_n) = \mathrm{MinAveLen}_r(p_1, \ldots, p_{n-s}, q) + q,$$
where $q = \sum_{i=n-s+1}^{n} p_i$.

With Lemma 2.4 we can present Huffman's algorithm.

Theorem 2.5. The following algorithm H produces r-ary optimal encoding schemes C for probability distributions P:
(1) If P = (p1, . . . , pn), where n ≤ r, then let C = (1, . . . , n).
(2) If P = (p1, . . . , pn), where n > r, then
(a) Reorder P if necessary so that p1 ≥ p2 ≥ · · · ≥ pn.
(b) Let Q = (p1, . . . , pn−s, q), where $q = \sum_{i=n-s+1}^{n} p_i$ and s is uniquely determined by the conditions s ≡ n (mod r − 1) and 2 ≤ s ≤ r.
(c) Perform the algorithm H on Q, obtaining an encoding scheme D = (c1, . . . , cn−s, d).
(d) Let C = (c1, . . . , cn−s, d1, d2, . . . , ds).
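For the binary case r = 2 (where s is always 2), algorithm H amounts to repeatedly merging the two smallest probabilities. The Python sketch below (our own heap-based rendering of this binary case, not the text's recursive formulation) builds the merge tree and reads codewords off its branches.

```python
import heapq
from itertools import count

def huffman(probs):
    """Binary Huffman encoding (algorithm H with r = 2, so s = 2).
    Returns the codewords in the order the probabilities were given."""
    tie = count()                        # tie-breaker so trees are never compared
    heap = [(p, next(tie), i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)  # merge the two smallest probabilities,
        p2, _, t2 = heapq.heappop(heap)  # as in step (b) of the algorithm
        heapq.heappush(heap, (p1 + p2, next(tie), (t1, t2)))
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):      # internal node: branch on 0 and 1
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                            # leaf: a source symbol index
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return [codes[i] for i in range(len(probs))]

print(huffman([0.4, 0.3, 0.2, 0.1]))     # ['0', '10', '111', '110']
```

The resulting scheme has average codeword length 0.4·1 + 0.3·2 + 0.2·3 + 0.1·3 = 1.9, matching the scheme used in the example above.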
Entropy of a Source

The information obtained from a source symbol should have the property that the less likely a source symbol is to occur, the more information we obtain from an occurrence of that symbol, and conversely. Because the information obtained from a source symbol is not a function of the symbol itself, but rather of the symbol's probability of occurrence p, we use the notation I(p) to denote the information obtained from a source symbol with probability of occurrence p.

Definition 2.6. For a source alphabet S, the r-ary information Ir(p) obtained from a source symbol s ∈ S with probability of occurrence p is given by
$$I_r(p) = \log_r \frac{1}{p}.$$

Ir(p) can be characterized by the fact that it is the only continuous function on (0, 1] with the property that Ir(pq) = Ir(p) + Ir(q) and Ir(1/r) = 1.

Definition 2.7. Let P = {p1, . . . , pn} be a probability distribution. The r-ary entropy of the distribution P is
$$H_r(\mathcal{P}) = \sum_{i=1}^{n} p_i\, I_r(p_i) = \sum_{i=1}^{n} p_i \log_r \frac{1}{p_i}.$$
(When pi = 0 we set pi logr(1/pi) = 0.) If I = (S, P) is an information source with probability distribution P = {p1, . . . , pn}, then we refer to Hr(I) = Hr(P) as the entropy of the source I.

The quantity Hr(I) is the average information obtained from a single sample of I. It seems reasonable to say that sampling from a source of r equally likely symbols gives an amount of information equal to one r-ary unit. For instance, if S = {0, 1} with P(0) = 1/2 and P(1) = 1/2, then each sample gives us one binary unit of information (or one bit of information). We mention that many books on information theory restrict attention to binary entropy and use the notation H(p1, . . . , pn) for binary entropy.

To establish the main properties of entropy, we begin with a lemma which can be easily derived from the fact that ln x ≤ x − 1 for all x > 0, with equality only when x = 1.

Lemma 2.8. Let P = {p1, . . . , pn} be a probability distribution. Let Q = {q1, . . . , qn} have the property that 0 ≤ qi ≤ 1 for all i and $\sum_{i=1}^{n} q_i \le 1$. Then
$$\sum_{i=1}^{n} p_i \log_r \frac{1}{p_i} \le \sum_{i=1}^{n} p_i \log_r \frac{1}{q_i}.$$
(We set 0 · logr(1/0) = 0 and p logr(1/0) = +∞, for p > 0.) Furthermore, equality holds if and only if pi = qi for all i.

With Lemma 2.8 at our disposal, we can get the range of the entropy function.

Theorem 2.9. For an information source I = (S, P) of size n (i.e. |S| = n), the entropy satisfies 0 ≤ Hr(P) ≤ logr n. Furthermore, Hr(P) = logr n if and only if the source has a uniform distribution (i.e. all of the source symbols are equally likely to occur), and Hr(P) = 0 if and only if one of the source symbols has probability 1 of occurring.

Theorem 2.9 confirms the fact that, on average, the most information is obtained from sources for which each source symbol is equally likely to occur.

The Noiseless Coding Theorem

As we know, the entropy H(I) of an information source I is the amount of information contained in the source. Further, since an instantaneous encoding scheme for I captures the information in the source, it is reasonable to believe that the average codeword length of such a code must be at least as large as the entropy. This is exactly what the Noiseless Coding Theorem says.

Theorem 2.10 (The Noiseless Coding Theorem). For any probability distribution P = (p1, . . . , pn), we have
$$H_r(p_1, \ldots, p_n) \le \mathrm{MinAveLen}_r(p_1, \ldots, p_n) < H_r(p_1, \ldots, p_n) + 1.$$

Notice that the condition for equality in Theorem 2.10 is that li = −logr pi, which means that logr pi must be an integer. Since this is not often the case, we cannot often expect equality. In general, if we choose the integers li to satisfy
$$\log_r \frac{1}{p_i} \le l_i < \log_r \frac{1}{p_i} + 1,$$
for all i, then, by Kraft's Theorem, there is an instantaneous encoding scheme with these codeword lengths. An encoding scheme constructed by this method is referred to as a Shannon-Fano encoding scheme. However, this method does not, in general, give the smallest possible average codeword length.

The Noiseless Coding Theorem determines MinAveLenr(p1, . . . , pn) to within 1 r-ary unit, but this may still be too much for some purposes. Fortunately, there is a way to improve upon this, based on the following idea.

Definition 2.11. Let S = {x1, . . . , xn} with probability distribution P(xi) = pi, for all i. The k-th extension of I = (S, P) is I^k = (S^k, P^k), where S^k is the set of all strings of length k over S and P^k is the probability distribution defined for x = x1x2 · · · xk ∈ S^k by P^k(x) = P(x1) · · · P(xk).

The entropy of an extension I^k is related to the entropy of I in a very simple way. It seems intuitively clear that, since we get k times as much information from a string of length k as from a single symbol, the entropy of I^k should be k times the entropy of I. The following lemma confirms this.

Lemma 2.12. Let I be an information source and let I^k be its k-th extension. Then Hr(I^k) = kHr(I).

Applying the Noiseless Coding Theorem to the extension I^k and using Lemma 2.12 gives the final version of the Noiseless Coding Theorem.

Theorem 2.13. Let P be a probability distribution and let P^k be its k-th extension. Then
$$H_r(\mathcal{P}) \le \frac{\mathrm{MinAveLen}_r(S^k)}{k} < H_r(\mathcal{P}) + \frac{1}{k}.$$

Since each codeword in an encoding of the k-th extension S^k encodes k source symbols from S, the quantity MinAveLenr(S^k)/k is the minimum average codeword length per source symbol of S, taken over all uniquely decipherable r-ary encodings of S^k. Theorem 2.13 says that, by encoding a sufficiently long extension of I, we may make the minimum average codeword length per source symbol of S as close to the entropy Hr(P) as desired. The penalty for doing so is that, since |S^k| = |S|^k, the number of codewords required to encode the k-th extension S^k grows exceedingly large as k gets large.
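The entropy of Definition 2.7 and the Shannon-Fano lengths described above are both immediate to compute; here is a small sketch (our own illustration, with assumed function names) implementing the two formulas.

```python
import math

def entropy(probs, r=2):
    """H_r(P) = sum of p * log_r(1/p), with the convention 0 * log(1/0) = 0."""
    return sum(p * math.log(1 / p, r) for p in probs if p > 0)

def shannon_fano_lengths(probs, r=2):
    """Codeword lengths l_i = ceil(log_r(1/p_i)); they satisfy Kraft's
    inequality, so an instantaneous code with these lengths exists."""
    return [math.ceil(math.log(1 / p, r)) for p in probs]

probs = [0.4, 0.3, 0.2, 0.1]
print(entropy(probs))                  # about 1.846
print(shannon_fano_lengths(probs))     # [2, 2, 3, 4]
```

For this distribution the Shannon-Fano scheme has average length 2.4, while the Huffman scheme achieves 1.9; both lie in the interval [Hr(P), Hr(P) + 1) promised by Theorem 2.10.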
Exercise

(1) Let P = (0.3, 0.1, 0.1, 0.1, 0.1, 0.06, 0.05, 0.05, 0.05, 0.04, 0.03, 0.02). Find the Huffman encodings of P for the given radix r, with r = 2, 3, 4.
(2) Determine possible probability distributions that have (00, 01, 10, 11) and (0, 10, 110, 111) as binary Huffman encodings.
(3) Determine all possible ternary Huffman encodings of sizes 5 and 6.
(4) Let C be a binary Huffman encoding. Prove that C is maximal instantaneous.
(5) Let C be a binary Huffman encoding for the uniform probability distribution P = (1/n, . . . , 1/n) and suppose that Len(ci) = li for i = 1, . . . , n. Let m = max_i{li}.
(a) Show that C has the minimum total codeword length $\sum_{i=1}^{n} l_i$ among all instantaneous encodings.
(b) Show that there exist two codewords c and d in C such that Len(c) = Len(d) = m, and c and d differ only in their last positions.
(c) Show that m − 1 ≤ li ≤ m for i = 1, . . . , n.
(d) Let n = α2^k, where 1 < α ≤ 2. Let u be the number of codewords of length m − 1 and let v be the number of codewords of length m. Determine u, v and m in terms of α and k.
(e) Find MinAveLen2(P).
(6) Prove the following properties of entropy.
(a) Let {p1, . . . , pn, q1, . . . , qm} be a probability distribution. If p = p1 + · · · + pn, then
$$H_r(p_1, \ldots, p_n, q_1, \ldots, q_m) = H_r(p, 1-p) + p H_r\!\Big(\frac{p_1}{p}, \ldots, \frac{p_n}{p}\Big) + (1-p) H_r\!\Big(\frac{q_1}{1-p}, \ldots, \frac{q_m}{1-p}\Big).$$
(b) Let P = {p1, . . . , pn} and Q = {q1, . . . , qn} be two probability distributions. For 0 ≤ t ≤ 1, we have
$$H_r(tp_1 + (1-t)q_1, \ldots, tp_n + (1-t)q_n) \ge t H_r(p_1, \ldots, p_n) + (1-t) H_r(q_1, \ldots, q_n).$$
(c) Let P = {p1, . . . , pn} be a probability distribution. Suppose that ε is a positive real number such that p1 − ε > p2 + ε ≥ 0. Thus, {p1 − ε, p2 + ε, p3, . . . , pn} is also a probability distribution. Show that
Hr(p1, . . . , pn) < Hr(p1 − ε, p2 + ε, p3, . . . , pn).
(7) Let S = {0, 1}. In order to guarantee that the average codeword length per source symbol of S is at most 0.01 greater than the entropy of S, which extension of S should we encode? How many codewords would we need?
(8) Let I be an information source and let I^2 be its second extension. Is the second extension of I^2 equal to the fourth extension of I?
(9) Show that the Noiseless Coding Theorem is best possible by showing that for any ε > 0, there is a probability distribution P = {p1, . . . , pn} for which
MinAveLenr(p1, . . . , pn) − Hr(p1, . . . , pn) ≥ 1 − ε.

CHAPTER 3
Noisy Coding

Communications Channels

In the previous chapter, we discussed the question of how to most efficiently encode source information for transmission over a noiseless channel, where we did not need to be concerned about correcting errors. Now we are ready to consider the question of how to encode source data efficiently and, at the same time, minimize the probability of uncorrected errors when transmitting over a noisy channel.

Definition 3.1. A communications channel consists of a finite input alphabet I = {x1, . . . , xs} and output alphabet O = {y1, . . . , yt}, and a set of forward channel probabilities or transition probabilities Pf(yj | xi), satisfying
$$\sum_{j=1}^{t} P_f(y_j \mid x_i) = 1, \quad \text{for all } i = 1, \ldots, s.$$

Intuitively, we think of Pf(yj | xi) as the probability that yj is received, given that xi is sent through the channel. It is important not to confuse the forward channel probability Pf(yj | xi) with the so-called backward channel probability Pb(xi | yj). In the forward probabilities, we assume a certain input symbol was sent. In the backward probabilities, we assume a certain output symbol is received.

Example 3.2. The noiseless channel, which we discussed in the previous chapter, has the same input and output alphabet I = O = {x1, . . . , xs} and channel probabilities
$$P_f(x_i \mid x_j) = \begin{cases} 1 & i = j, \\ 0 & \text{otherwise.} \end{cases}$$
Example 3.3. A communications channel is called symmetric if it has the same input and output alphabet I = O = {x1, . . . , xs} and channel probabilities
Pf(xi | xi) = Pf(xj | xj) and Pf(xi | xj) = Pf(xj | xi), for all i, j = 1, . . . , s.
Perhaps the most important memoryless channel is the binary symmetric channel, which has I = O = {0, 1} and channel probabilities
Pf(1 | 0) = Pf(0 | 1) = p and Pf(0 | 0) = Pf(1 | 1) = 1 − p.
Thus, the probability of a symbol error, also called the crossover probability, is p.

Example 3.4. Another important memoryless channel is the binary erasure channel, which has input alphabet I = {0, 1}, output alphabet O = {0, ?, 1} and channel probabilities
Pf(1 | 0) = Pf(0 | 1) = q, Pf(? | 0) = Pf(? | 1) = p and Pf(0 | 0) = Pf(1 | 1) = 1 − p − q.

We will deal only with channels that have no memory, in the following sense.

Definition 3.5. A communications channel is said to be memoryless if for any string c = c1 · · · cn over I and any string d = d1 · · · dn over O, the probability that d is received, given that c is sent, is
$$P_f(\mathbf{d} \mid \mathbf{c}) = \prod_{i=1}^{n} P_f(d_i \mid c_i).$$
We will also refer to the probabilities Pf(d | c) as forward channel probabilities.

We use the term memoryless because the probability that an output symbol di is received depends only on the current input ci, and not on previous inputs.

Decision Rules

A decision rule for C is a partial function f from the set of output strings to the set of codewords C. The process of applying a decision rule is referred to as decoding. The word "partial" refers to the fact that f may not be defined for all output strings. The intention is that, if an output string d is received and f(d) ∈ C is defined, then the decision rule decides that f(d) is the codeword that was sent, or else declares a decoding error.

Our goal is to find a decision rule that maximizes the probability of correct decoding. The probability of correct decoding can be expressed in a variety of ways. Conditioning on the codeword sent gives
$$P(\text{correct decoding}) = \sum_{c \in C} \sum_{d \in B_c} P_f(d \mid c)\, P_i(c),$$
where Bc = {d | f(d) = c} and Pi(c) is the probability that c is sent through the channel. The probabilities {Pi(c) | c ∈ C} form the so-called input distribution for the channel. Conditioning instead on the string received gives
$$P(\text{correct decoding}) = \sum_{d} P_b(f(d) \mid d)\, P_o(d),$$
where Po(d) is the probability that d is received through the channel; the probabilities Po(d) are called the output distribution for the channel.

The probability of correct decoding can be maximized by choosing a decision rule that maximizes each of the conditional probabilities Pb(f(d) | d).

Definition 3.6. Any decision rule f for which f(d) has the property that
$$P_b(f(d) \mid d) = \max_{c \in C} P_b(c \mid d),$$
for every possible received string d, is called an ideal observer.

Proposition 3.7. An ideal observer decision rule maximizes the probability of correct decoding of received strings among all decision rules.

We remark that an ideal observer decision rule depends on the input distribution because
$$P_b(c \mid d) = \frac{P_f(d \mid c)\, P_i(c)}{\sum_{c' \in C} P_f(d \mid c')\, P_i(c')}.$$
In the case that the input distribution is uniform, i.e. Pi(c) = 1/|C|, we have
$$P_b(c \mid d) = \frac{P_f(d \mid c)}{\sum_{c' \in C} P_f(d \mid c')}.$$
Now the denominator on the right is a sum of forward channel probabilities and thus depends only on the communications channel. Thus, maximizing Pb(c | d) is equivalent to maximizing Pf(d | c).
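Before giving this rule a name, here is a minimal Python sketch (our own illustration, with assumed function names) of decoding by maximizing the forward probability over a memoryless binary symmetric channel.

```python
def bsc_forward(d, c, p):
    """P_f(d | c) for a memoryless binary symmetric channel with
    crossover probability p (Example 3.3 and Definition 3.5)."""
    errors = sum(di != ci for di, ci in zip(d, c))
    return p ** errors * (1 - p) ** (len(c) - errors)

def ml_decode(d, code, p):
    """Decode d as the codeword c maximizing P_f(d | c)."""
    return max(code, key=lambda c: bsc_forward(d, c, p))

print(ml_decode("010", ["000", "111"], p=0.1))   # '000'
```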
This leads to the following definition and proposition.

Definition 3.8. Any decision rule f for which f(d) maximizes the forward channel probabilities, that is, for which
$$P_f(d \mid f(d)) = \max_{c \in C} P_f(d \mid c),$$
for every possible received string d, is called a maximum likelihood decision rule.

Proposition 3.9. For the uniform input distribution, an ideal observer is the same as a maximum likelihood decision rule.

Conditional Entropy and Channel Capacity

In general, knowing the value of the output of a channel will have an effect on our information about the input. This leads us to make the following definition.

Definition 3.10. Consider a communications channel with input alphabet I and output alphabet O. The r-ary conditional entropy of I, given y ∈ O, is defined by
$$H_r(I \mid y) = \sum_{x \in I} P_b(x \mid y) \log_r \frac{1}{P_b(x \mid y)}.$$
The r-ary conditional entropy of I, given O, is the average conditional entropy defined by
$$H_r(I \mid O) = \sum_{y \in O} H_r(I \mid y)\, P_o(y).$$

Note that Hr(I | O) measures the amount of information remaining in I after sampling O, and so it can be interpreted as the loss of information about I caused by the channel.

Conditional entropy can also be defined for strings.

Definition 3.11. Let C be a code over the input alphabet I and let D be the set of output strings over the output alphabet O. The r-ary conditional entropy of C, given d = y1 · · · ym ∈ D, is defined by
$$H_r(C \mid d) = \sum_{c \in C} P_b(c \mid d) \log_r \frac{1}{P_b(c \mid d)}.$$
The r-ary conditional entropy of C, given D, is defined by
$$H_r(C \mid D) = \sum_{d \in D} H_r(C \mid d)\, P_o(d).$$

The quantity Ir(I, O) = Hr(I) − Hr(I | O) is the amount of information in I minus the amount of information still in I after knowing O. In other words, Ir(I, O) is the amount of information about I that gets through the channel.

Definition 3.12. The r-ary mutual information of I and O is defined by
$$I_r(I, O) = H_r(I) - H_r(I \mid O) = \sum_{x \in I} P_i(x) \log_r \frac{1}{P_i(x)} - H_r(I \mid O).$$

Notice that the quantity Ir(I, O) depends upon the input distribution of I as well as the forward channel probabilities Pf(y | x). We are now ready to define the concept of the capacity of a communications channel. This concept plays a key role in the main results of information theory.

Definition 3.13. The capacity of a communications channel is the maximum mutual information Ir(I, O), taken over all input distributions of I.

Proposition 3.14. Consider a symmetric channel with input and output alphabet I of size r. Then the capacity of this symmetric channel is
$$1 - \sum_{y \in I} P_f(y \mid x) \log_r \frac{1}{P_f(y \mid x)},$$
for any x ∈ I. Furthermore, the capacity is achieved by the uniform input distribution.

Corollary 3.15. The capacity of the binary symmetric channel with crossover probability p is
$$1 + p \log_2 p + (1 - p) \log_2 (1 - p).$$
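Corollary 3.15 translates directly into code; the sketch below (our own illustration) evaluates the formula, with the convention that the p log p terms vanish at p = 0 and p = 1.

```python
import math

def bsc_capacity(p):
    """Capacity of the binary symmetric channel (Corollary 3.15)."""
    if p in (0, 1):
        return 1.0                       # a noiseless (or bit-flipping) channel
    return 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

print(bsc_capacity(0.0))   # 1.0
print(bsc_capacity(0.5))   # 0.0: pure noise lets no information through
```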
The Noisy Coding Theorem

It is sometimes said that there are two main results in information theory. One is the Noiseless Coding Theorem, which we discussed in the previous chapter, and the other is the so-called Noisy Coding Theorem. Before we can state the Noisy Coding Theorem formally, we need to discuss in detail the notion of rate of transmission.

Let us suppose that the source information is in the form of strings of length k over the input alphabet I of size r, and that the r-ary block code C consists of codewords of fixed length n over I. Now, since the channel must transmit n code symbols in order to send k source symbols, the rate of transmission is R = k/n source symbols per code symbol. Further, since there are r^k possible source strings, the code must have size at least r^k in order to accommodate all of these strings. Assuming that |C| = r^k, we have k = logr |C| and hence R = logr |C|/n. Thus we have the following.

Definition 3.16. An r-ary block code C of length n and size |C| is called an (n, |C|)-code. The number
$$R(C) = \frac{\log_r |C|}{n}$$
is called the rate of C.

Now we can state the Noisy Coding Theorem. Let ⌈x⌉ denote the smallest integer greater than or equal to x.

Theorem 3.17 (The Noisy Coding Theorem). Consider a memoryless communications channel with capacity C. For any positive number R < C, there exists a sequence Cn of r-ary block codes and corresponding decision rules fn with the following properties.
(1) Cn is an (n, ⌈r^{nR}⌉)-code. Thus, Cn has length n and rate at least R.
(2) The probability of decoding error of fn approaches 0 as n → ∞.

Roughly speaking, the Noisy Coding Theorem says that, if we choose any transmission rate below the capacity of the channel, there exists a code that can transmit at that rate and yet maintain a probability of decoding error below any predefined limit. The price we pay for this efficient encoding is that the code length n may be extremely large. Furthermore, the known proofs of this theorem tell us only that such codes must exist; they do not show us how to actually find them.
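The two quantities just introduced, the rate R(C) of Definition 3.16 and the code size ⌈r^{nR}⌉ appearing in Theorem 3.17, are easy to evaluate; the following sketch (our own illustration) does both.

```python
import math

def rate(code, r=2):
    """R(C) = log_r(|C|) / n for an r-ary block code (Definition 3.16)."""
    n = len(next(iter(code)))
    return math.log(len(code), r) / n

def noisy_coding_size(n, R, r=2):
    """Size ceil(r^(nR)) of the length-n code promised by Theorem 3.17."""
    return math.ceil(r ** (n * R))

print(rate({"0000", "1111"}))          # 0.25
print(noisy_coding_size(10, 0.25))     # 6
```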
Exercise

(1) Consider a channel whose input alphabet is the set of all integers between −n and n and whose output is the square of the input. Determine the forward channel probabilities of this channel.
(2) Suppose that codewords from the code {0000, 1111} are being sent over a binary symmetric channel (cf. Example 3.3) with crossover probability p = 0.01. Use the maximum likelihood decision rule to decode the received strings 0000, 0010 and 1010.
(3) Let C be the block code consisting of all 8 binary strings of length 3. Denote the input codeword by i1i2i3 and the received string by o1o2o3. Let B.S.C. denote a binary symmetric channel with crossover probability p = 0.001. Consider the following different channels.
(a) The first channel works as follows: send i1 through the B.S.C. to get o1 and, no matter what i2 and i3 are, choose o2 and o3 randomly.
(b) The second channel works as follows: send i1 through the B.S.C. to get o1, send i2 through the B.S.C. to get o2 and send i3 through the B.S.C. to get o3.
(c) The third channel works as follows: choose o1 = o2 = o3 to be the majority bit among i1, i2 and i3.
Compute the probability of correct decoding for each of these channels, assuming a uniform input distribution. Which channel is best?
(4) Show that for a symmetric channel with uniform input distribution, the output distribution is also uniform.
(5) Let I and O be the input and output alphabets of a noiseless communications channel. Show that Hr(I | O) = 0.
(6) Let I and O be the input and output alphabets of a communications channel with forward channel probabilities {Pf(y | x) | x ∈ I, y ∈ O}. Suppose that {Pi(x) | x ∈ I} is the input distribution and {Po(y) | y ∈ O} is the output distribution for the channel.
(a) Show that the backward channel probability for x ∈ I and y ∈ O is
$$P_b(x \mid y) = \frac{P_f(y \mid x)\, P_i(x)}{P_o(y)}.$$
(b) Show that for an r-ary symmetric channel,
$$I_r(I, O) = \sum_{y \in O} P_o(y) \log_r \frac{1}{P_o(y)} - \sum_{y \in O} P_f(y \mid x) \log_r \frac{1}{P_f(y \mid x)},$$
for any x ∈ I.
(7) Consider the special case of a binary erasure channel (cf. Example 3.4) which has input alphabet I = {0, 1}, output alphabet O = {0, ?, 1} and channel probabilities
Pf(1 | 0) = Pf(0 | 1) = 0, Pf(? | 0) = Pf(? | 1) = p and Pf(0 | 0) = Pf(1 | 1) = 1 − p.
Calculate the mutual information I2(I, O) in terms of the input probability Pi(0) = p0. Then determine the capacity of the channel, and an input probability that achieves that capacity.

CHAPTER 4
General Remarks on Codes

Nearest Neighbor Decoding

In general, the problem of finding good codes is a very difficult one. However, by making certain assumptions about the channel, we can at least give the problem a highly intuitive flavor. We begin with a definition.

Definition 4.1. Let x = x1x2 · · · xn and y = y1y2 · · · yn be strings of the same length n over the same alphabet A. The Hamming distance d(x, y) between x and y is the number of positions in which xi ≠ yi.

For instance, if x = 10112 and y = 20110, then d(x, y) = 2.

The following result says that Hamming distance is a metric.

Proposition 4.2. Let A^n be the set of all strings of length n over the alphabet A. Then the Hamming distance function d : A^n × A^n → N satisfies the following properties. For all x, y and z in A^n,
(1) d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y;
(2) d(x, y) = d(y, x);
(3) d(x, y) ≤ d(x, z) + d(z, y).
In other words, (A^n, d) is a metric space.

Suppose that C is a block code of length n over A. The codewords that are closest to a given received string x are referred to as nearest neighbor codewords. Nearest neighbor decoding, or minimum distance decoding, is the decision rule that decodes a received string as a nearest neighbor codeword. When there is more than one nearest neighbor codeword, we will refer to this situation as a tie. In some cases, we may wish to choose randomly from among the candidates. In other cases, it might be more desirable simply to admit a decoding error. The term complete decoding refers to the case where all received strings are decoded, and the term incomplete decoding refers to the case where we prefer occasionally to simply admit an error, rather than always decode.

There are many channels for which maximum likelihood decoding takes the intuitive form of nearest neighbor decoding. For instance, the r-ary symmetric channel with forward channel probabilities
$$P_f(x_i \mid x_j) = \begin{cases} 1 - p & \text{if } i = j, \\ \dfrac{p}{r-1} & \text{otherwise,} \end{cases}$$
has this property, for p < 1/2.

In implementing nearest neighbor decoding, the following concepts are useful.

Definition 4.3. Let C be a block code with at least two codewords. The minimum distance of C is defined to be
d(C) = min{d(c, d) | c, d ∈ C, c ≠ d}.
An (n, M, d)-code is a block code of size M, length n and minimum distance d. The numbers n, M and d are called the parameters of the code.

Since d(c, d) ≥ 1 for c ≠ d, the minimum distance of a code must be at least 1.

Perfect Codes

Definition 4.4. Let x be a string in A^n, where |A| = r, and let ρ > 0. The sphere in A^n with center x and radius ρ is the set
$$S_r^n(\mathbf{x}, \rho) = \{\mathbf{y} \in A^n \mid d(\mathbf{x}, \mathbf{y}) \le \rho\}.$$
The volume V_r^n(ρ) of the sphere S_r^n(x, ρ) is the number of elements in S_r^n(x, ρ). This volume is independent of the center and is given by
$$V_r^n(\rho) = \sum_{k=0}^{\lfloor \rho \rfloor} \binom{n}{k} (r-1)^k,$$
where ⌊ρ⌋ denotes the greatest integer smaller than or equal to ρ.
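The three quantities just defined, Hamming distance, minimum distance, and sphere volume, are straightforward to compute; the sketch below (our own illustration) implements the formulas of Definitions 4.1, 4.3 and 4.4.

```python
from itertools import combinations
from math import comb, floor

def hamming(x, y):
    """Hamming distance d(x, y) (Definition 4.1)."""
    return sum(a != b for a, b in zip(x, y))

def min_distance(code):
    """d(C): minimum distance over all pairs of distinct codewords."""
    return min(hamming(c, d) for c, d in combinations(code, 2))

def volume(n, rho, r=2):
    """V_r^n(rho): the number of strings in a sphere of radius rho."""
    return sum(comb(n, k) * (r - 1) ** k for k in range(floor(rho) + 1))

print(hamming("10112", "20110"))                           # 2
print(min_distance(["00000", "11100", "00111", "11011"]))  # 3
print(volume(4, 1, r=2))                                   # 5
```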
We can determine the minimum distance of a code C by simply increasing the radius t of the spheres centered at each codeword of C until just before two spheres become "tangent" (which will happen when d(C) = 2t + 2), or just before two spheres "overlap" (which will happen when d(C) = 2t + 1).

Definition 4.5. Let C ⊆ A^n be a code. The packing radius of C is the largest integer ρ for which the spheres S_r^n(c, ρ) centered at each codeword c are disjoint. The covering radius of C is the smallest integer ρ′ for which the spheres S_r^n(c, ρ′) centered at each codeword c cover A^n. We will denote the packing radius of C by pr(C) and the covering radius by cr(C).

Proposition 4.6. The packing radius of an (n, M, d)-code C is pr(C) = ⌊(d − 1)/2⌋.

The following concept plays a major role in coding theory.

Definition 4.7. An r-ary (n, M, d)-code C is perfect if pr(C) = cr(C).

In words, a code C ⊆ A^n is perfect if there exists a number ρ for which the spheres S_r^n(c, ρ) centered at each codeword c are disjoint and cover A^n. The size of a perfect code is uniquely determined by the length and the minimum distance. The following result is known as the sphere-packing condition.

Proposition 4.8. Let C be an r-ary (n, M, d)-code. Then C is perfect if and only if d = 2v + 1 is odd and
$$M \cdot V_r^n(v) = M \cdot \sum_{k=0}^{v} \binom{n}{k} (r-1)^k = r^n.$$

It is important to emphasize that the existence of numbers n, M and d = 2v + 1 for which the sphere-packing condition holds does not mean that there is a perfect code with these parameters. The problem of determining all perfect codes has not yet been solved. However, a great deal is known about perfect codes over alphabets whose size is a power of a prime.

Error Detection and Error Correction

Let u be a positive integer. If u errors occur in the transmission of a codeword, we will say that an error of size u has occurred. It is possible that so many errors occurred as to change the codeword into another codeword, so that we cannot detect whether any error has occurred at all.

Definition 4.9. A code C is u-error-detecting if, whenever an error of size at most u but at least one has occurred, the resulting string is not a codeword. A code C is exactly u-error-detecting if it is u-error-detecting but not (u + 1)-error-detecting.

The next theorem is essentially just a restatement of the definition of u-error-detecting in terms of minimum distance.

Theorem 4.10. A code C is u-error-detecting if and only if d(C) ≥ u + 1. In particular, C is exactly u-error-detecting if and only if d(C) = u + 1.

Definition 4.11. Let v be a positive integer. A code C is v-error-correcting if nearest neighbor decoding is able to correct v or fewer errors, assuming that if a tie occurs in the decoding process, a decoding error is reported. A code is exactly v-error-correcting if it is v-error-correcting but not (v + 1)-error-correcting.

It should be kept in mind that, as long as the received word is not a codeword, nearest neighbor decoding will decode it as some codeword, but the receiver has no way of knowing whether that codeword is the one that was actually sent. We know only that, under a v-error-correcting code, if no more than v errors were introduced, then nearest neighbor decoding will produce the codeword that was sent.

Theorem 4.12. A code is v-error-correcting if and only if d(C) ≥ 2v + 1. In particular, C is exactly v-error-correcting if and only if d(C) = 2v + 1 or d(C) = 2v + 2.
Corollary 4.13. A code C with minimum distance d is exactly ⌊(d − 1)/2⌋-error-correcting.

The following result is a consequence of Proposition 4.6 and Theorem 4.12. It shows the connection between error correction and pr(C).

Corollary 4.14. Assuming that ties are always reported as errors, a code C is exactly v-error-correcting if and only if pr(C) = v.

Example 4.15. The r-ary repetition code of length n is
Rep_r(n) = {00 · · · 0, 11 · · · 1, . . . , (r − 1)(r − 1) · · · (r − 1)},
consisting of r codewords, each of length n. The r-ary repetition code of length n can detect up to n − 1 errors in transmission, and so it is exactly (n − 1)-error-detecting. Furthermore, it is exactly ⌊(n − 1)/2⌋-error-correcting.

Suppose that a code C has minimum distance d. If we use C for error detection only, it can detect up to d − 1 errors. On the other hand, if we want C to also correct errors whenever possible, then it can correct up to ⌊(d − 1)/2⌋ errors, but it may no longer be able to detect a situation where more than ⌊(d − 1)/2⌋ but fewer than d errors have occurred. For if more than ⌊(d − 1)/2⌋ errors are made, nearest neighbor decoding might "correct" the received word to the wrong codeword, and thus the errors will go undetected.

We consider the following strategy: Let v be a positive integer. If a string x is received, the closest codeword c to x is at a distance of at most v, and there is only one such codeword, then decode x as c. If there is more than one codeword at minimum distance to x, or if the closest codeword has distance greater than v, then simply declare an error.

Definition 4.16. A code C is simultaneously v-error-correcting and u-error-detecting if, whenever at least one but at most v errors are made, the strategy described above corrects these errors, and whenever at least v + 1 but at most v + u errors are made, the strategy above simply reports an error.

Theorem 4.17. A code C is simultaneously v-error-correcting and u-error-detecting if and only if d(C) ≥ 2v + u + 1.

Given any code C, it may be possible to add new codewords to it at no cost to its minimum distance. This leads us to make the following definition.

Definition 4.18. An (n, M, d)-code is said to be maximal if it is not contained in any larger code with the same minimum distance, that is, if it is not contained in any (n, M + 1, d)-code.

Thus, an (n, M, d)-code C is maximal if and only if, for all strings x ∈ A^n, there is a codeword c ∈ C with the property that d(x, c) < d.

Proposition 4.19. For the binary symmetric channel with crossover probability p, using minimum distance decoding, the probability of a decoding error for a maximal (n, M, d)-code satisfies
$$\sum_{k=d}^{n} \binom{n}{k} p^k (1-p)^{n-k} \le P(\text{decoding error}) \le 1 - \sum_{k=0}^{\lfloor (d-1)/2 \rfloor} \binom{n}{k} p^k (1-p)^{n-k}.$$
Furthermore, for a non-maximal code, the upper bound still holds, but the lower bound may not.

Making New Codes from Old Codes

There are several useful techniques that can be used to obtain new codes from old codes. In the following, we always suppose that our codes are over the alphabet A = Zr = Z/rZ.

Extending a Code. The process of adding one or more additional positions to all the codewords in a code, thereby increasing the length of the code, is referred to as extending the code. The most common way to extend a code is by adding an overall parity check, which is done as follows. If C is an r-ary (n, M, d)-code over Zr, we define the extended code C̄ by
$$\overline{C} = \Big\{ c_1 c_2 \cdots c_n c_{n+1} \;\Big|\; c_1 c_2 \cdots c_n \in C \text{ and } \sum_{k=1}^{n+1} c_k \equiv 0 \ (\mathrm{mod}\ r) \Big\}.$$
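As a quick sketch of this construction (our own illustration, with codewords represented as digit strings), the parity digit is simply the negative of the codeword's digit sum modulo r:

```python
def extend(code, r=2):
    """Extend a code over Z_r by an overall parity check digit, so that
    each extended codeword sums to 0 mod r."""
    return [c + str(-sum(int(x) for x in c) % r) for c in code]

print(extend(["00", "01", "10", "11"]))   # ['000', '011', '101', '110']
```

In this binary example the minimum distance rises from 1 to 2, as described next.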
If the extended code C̄ is an (n̄, M̄, d̄)-code, then n̄ = n + 1, M̄ = M and d̄ = d or d + 1. We remark that for a binary (n, M, d)-code C, the minimum distance of C̄ depends on the parity of d. In particular, since all of the codewords in C̄ have even sum, the minimum distance of C̄ is even. It follows that if d is even then d(C̄) = d, and if d is odd then d(C̄) = d + 1. Moreover, since ⌊(d(C̄) − 1)/2⌋ = ⌊(d(C) − 1)/2⌋, the error-correcting capabilities of the code do not increase.

Puncturing a Code. The opposite process to extending a code is puncturing a code, in which one or more positions are removed from the codewords. If C is an r-ary (n, M, d)-code and d ≥ 2, then the code C* obtained by puncturing C once has parameters n* = n − 1, M* = M and d* = d or d − 1.

For binary codes, the processes of extending and puncturing can be used to prove the following useful result.

Lemma 4.20. A binary (n, M, 2v + 1)-code exists if and only if a binary (n + 1, M, 2v + 2)-code exists.

Shortening a Code. Shortening a code refers to the process of keeping only those codewords in a code that have a given symbol in a given position, and then deleting that position. If C is an (n, M, d)-code, then a shortened code has length n − 1 and minimum distance at least d. In fact, shortening a code can result in a substantial increase in the minimum distance, but shortening a code does result in a code with smaller size. The shortened code formed by taking the codewords with an s in the i-th position is referred to as the cross-section xi = s. We will have many occasions to use cross-sections in the sequel.

Augmenting a Code. Augmenting a code simply means adding additional strings to the code. A common way to augment a binary code C is to include the complements of each codeword in C, where the complement of a binary codeword c is the string obtained from c by interchanging all 0's and 1's. Let us denote the complement of c by c^c and the set of all complements of the codewords in C by C^c. It is easy to check that if x, y ∈ Z_2^n, then d(x, y^c) = n − d(x, y).

Proposition 4.21. Let C be a binary (n, M, d)-code. Suppose that d′ is the maximum distance between codewords in C. Then d(C ∪ C^c) = min{d, n − d′}.

The Direct Sum Construction. If C1 is an r-ary (n1, M1, d1)-code and C2 is an r-ary (n2, M2, d2)-code, the direct sum C1 ⊙ C2 is the code
C1 ⊙ C2 = {cd | c ∈ C1, d ∈ C2}.
Clearly, C1 ⊙ C2 has parameters n = n1 + n2, M = M1M2 and d = min{d1, d2}.

The u(u + v) Construction. A much more useful construction than the direct sum is the following. If C1 is an r-ary (n, M1, d1)-code and C2 is an r-ary (n, M2, d2)-code, then we define a code C1 ⊕ C2 by
C1 ⊕ C2 = {c(c + d) | c ∈ C1, d ∈ C2}.
Certainly, the length of C1 ⊕ C2 is 2n and the size is M1M2. As for the minimum distance, consider two distinct codewords x = c1(c1 + d1) and y = c2(c2 + d2). If d1 = d2, then d(x, y) ≥ 2d1. On the other hand, if d1 ≠ d2, then d(x, y) ≥ d2. Since equality can hold in both cases, we get the following result.

Lemma 4.22. Let C1 be an r-ary (n, M1, d1)-code and C2 be an r-ary (n, M2, d2)-code. Then C1 ⊕ C2 is a (2n, M1M2, d′)-code, where d′ = min{2d1, d2}.

Equivalence of Codes. There are various definitions of equivalence of codes in the literature. We will adopt the following definition.

Definition 4.23. Two r-ary (n, M)-codes C1 and C2 are equivalent if there exists a permutation σ of the n positions and permutations π1, . . . , πn of the code alphabet for which
c1c2 · · · cn ∈ C1 if and only if π1(c_{σ(1)})π2(c_{σ(2)}) · · · πn(c_{σ(n)}) ∈ C2.
In particular, any r-ary code over Zr is equivalent to a code that contains the zero codeword 0 = 00 · · · 0. Furthermore, equivalent codes have the same length, size and minimum distance.

The Main Coding Theory Problem

A good r-ary (n, M, d)-code should have a relatively large size, so that it can be used to encode a large number of source messages, and it should have a relatively large minimum distance, so that it can be used to correct a large number of errors. Not surprisingly, these goals are conflicting.

For given values of n and d, it is customary to let Ar(n, d) denote the largest possible size M for which there exists an r-ary (n, M, d)-code. Any r-ary (n, M, d)-code with M = Ar(n, d) is called an optimal code. The numbers Ar(n, d) play a central role in coding theory, and much effort has been expended in attempting to determine their values. In fact, determining the values of Ar(n, d) has come to be known as the main coding theory problem. Note that in order to show that Ar(n, d) = M, it is enough to show that Ar(n, d) ≤ M and then find a specific r-ary (n, M)-code C for which d(C) ≥ d, which shows that Ar(n, d) ≥ Ar(n, d(C)) ≥ M.

Example 4.24. Let C be a binary (4, M, 3)-code. Without loss of generality, we may assume that C contains the zero codeword 0 = 0000. Since d(c, 0) ≥ 3 for any other codeword c in C, this leaves five possibilities for additional codewords in C, namely: 1110, 1101, 1011, 0111, 1111. But no pair of these is at distance 3 or more apart, and so only one can be included in C. Hence A2(4, 3) = 2.

Example 4.25. Let C be a binary (5, M, 3)-code. Consider the cross-section C0 defined by x1 = 0. We know that C0 has minimum distance d0, where 4 ≥ d0 ≥ 3, and since A2(4, 3) = A2(4, 4) = 2, it follows that C0 has size M0 ≤ 2. Similarly, the cross-section C1 defined by x1 = 1 has size M1 ≤ 2. Thus M = M0 + M1 ≤ 4 and hence A2(5, 3) ≤ 4. On the other hand, the code C = {00000, 11100, 00111, 11011} has minimum distance d(C) = 3, and so A2(5, 3) = 4.

The approach used in Example 4.25 will not go very far in determining values of A2(n, d). In fact, very few actual values of A2(n, d) are known. For instance, we only know that 72 ≤ A2(10, 3) ≤ 79.

Let us now turn to the establishment of some general results about the numbers Ar(n, d).

Proposition 4.26. For any n ≥ 1,
(1) Ar(n, d) ≤ r^n for all 1 ≤ d ≤ n.
(2) Ar(n, 1) = r^n.
(3) Ar(n, n) = r.

Let C be an optimal r-ary (n, M, d)-code. By the pigeonhole principle, one of the cross-sections x1 = i of C must contain at least M/r codewords, and so we have the following.

Proposition 4.27. For any n ≥ 2, Ar(n, d) ≤ rAr(n − 1, d).

According to Lemma 4.20, a binary (n, M, 2v + 1)-code exists if and only if a binary (n + 1, M, 2v + 2)-code exists. Hence, we immediately have the following.

Proposition 4.28. If d > 0 is even, then A2(n, d) = A2(n − 1, d − 1).

Thus, for binary codes, it is enough to determine A2(n, d) for all odd values of d.
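Although the text does not discuss it, a simple computational way to obtain lower bounds on Ar(n, d) is a greedy (lexicographic) search, sketched below under our own naming; it recovers the values of Examples 4.24 and 4.25 for small parameters.

```python
from itertools import product

def greedy_code(n, d, r=2):
    """Scan Z_r^n in lexicographic order, keeping every string at distance
    >= d from all strings kept so far. The size of the resulting code is
    a lower bound on A_r(n, d)."""
    code = []
    for cand in product(range(r), repeat=n):
        if all(sum(a != b for a, b in zip(cand, c)) >= d for c in code):
            code.append(cand)
    return code

print(len(greedy_code(4, 3)))   # 2, matching A_2(4, 3) = 2
print(len(greedy_code(5, 3)))   # 4, matching A_2(5, 3) = 4
```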
Since |Zr | = r , it implies that rn ≤ Vrn (d − 1) · M . We arrive at the following result, called the sphere-covering bound for Ar (n, d). Theorem 4.29 (The sphere-covering bound for Ar (n, d)). If Vrn (ρ) denotes the volume of a sphere of radius ρ in Znr , then rn ≤ Ar (n, d). Vrn (d − 1) The sphere-covering bound is a lower bound for Ar (n, d). We can derive an upper bound for Ar (n, d) by similar methods. In particular, let C = {c1 , . . . , cM } be an optimal (n, M, d)S n n code. Since pr(C) = b d−1 c and M i=1 Sr (ci , pr(C)) ⊆ Zr , we have the sphere-packing bound 2 for Ar (n, d). Theorem 4.30 (sphere-packing bound for Ar (n, d)). If Vrn (ρ) denotes the volume of a sphere of radius ρ in Znr , then rn Ar (n, d) ≤ n d−1 . Vr (b 2 c) The sphere-packing bound is not the only useful upper bound on the values of Ar (n, d). We consider two additional bounds. Let C be an (n, M, d)-code. If we remove the last d − 1 positions from each codeword in C, the resulting shortened codewords must all be distinct. Since the length of the shortened codewords is n − d − 1, we have the following. Theorem 4.31 (The Singleton bound). Ar (n, d) ≤ rn−d+1 . Example 4.32. According to the Singleton bound, Ar (4, 3) ≤ r2 . On the other hand, the sphere-packing bound is Ar (4, 3) ≤ r4 /(4r − 3). Thus, for r ≥ 4. the Singleton bound is much more better than the sphere-packing bound. Let C be an r-ary (n, M, P d)-code and consider the sum of the distance between codewords, which is given by S = c,d∈C d(c, d). Since the minimum distance of C is d, we have S ≥ M (M − 1)d. On the other hand, suppose that the number of j’s in the i-th position of 24 4. GENERAL REMARKS ON CODES all codewords in C is kij , where j = 0, . . . , r − 1. Then the i-th position contributes a total of r−1 r−1 X X M2 2 kij (M − kij ) = M − kij2 ≤ M 2 − r j=0 j=0 to S, since the last sum above is smallest when kij = M/r. Since there are n positions, we have M (M − 1)d ≤ S ≤ nM 2 (1 − 1/r). Solving for M gives the following result. Theorem 4.33 (The Plotkin Bound). If n < dr/(r − 1), then dr Ar (n, d) ≤ . dr − nr + n The Plotkin bound can easily be refined a bit when r = 2. Theorem 4.34. (The Plotkin Bound for Binary Code). (1) If d is even and n < 2d, then d A2 (n, d) ≤ 2b c 2d − n and for n = 2d, A2 (2d, d) ≤ 4d. (2) If d is odd and n < 2d + 1, then d+1 A2 (n, d) ≤ 2b c 2d + 1 − n and for n = 2d + 1, A2 (2d + 1, d) ≤ 4d + 4. The Plotkin bound applies only when the minimum distance d is rather large. It seems superior to the sphere-packing bound. Example 4.35. The Plotkin bound can also be used, in conjunction with Proposition 4.27, to give an upper bound when d ≤ n(r − 1)/r. For example, We have A2 (13, 5) = 23 A2 (10.5) ≤ 96. Exercise (1) Consider the code C consisting of all strings in Zn2 that have an even number of 1s. What is the length, size, and minimum distance of C? (2) Let c, d ∈ An and consider the sets S = {x ∈ An | d(x, c) < d(x, d)} and T = {x ∈ An | d(x, c) > d(x, d)}. Show that |S| = |T |. (3) Construct an explicit example to illustrate that simultaneous error detection and correction can reduce the error detecting capabilities of a code. (4) Estimate the probability of a decoding error using the binary repetition code of length 5 under a binary symmetric channel with crossover probability p = 0.001. (5) Dose a binary (8, 4, 5)-code exist? Justify your answer. (6) Let C be an r-ary (n, M, d)-code over the alphabet Zr . Show that, as long as d < n, then for some position i, there is a cross-section that has minimum distance d. 
Exercise

(1) Consider the code C consisting of all strings in Z_2^n that have an even number of 1s. What is the length, size, and minimum distance of C?
(2) Let c, d ∈ A^n and consider the sets S = {x ∈ A^n | d(x, c) < d(x, d)} and T = {x ∈ A^n | d(x, c) > d(x, d)}. Show that |S| = |T|.
(3) Construct an explicit example to illustrate that simultaneous error detection and correction can reduce the error-detecting capabilities of a code.
(4) Estimate the probability of a decoding error using the binary repetition code of length 5 under a binary symmetric channel with crossover probability p = 0.001.
(5) Does a binary (8, 4, 5)-code exist? Justify your answer.
(6) Let C be an r-ary (n, M, d)-code over the alphabet Zr. Show that, as long as d < n, for some position i there is a cross-section that has minimum distance d. What can happen if d = n?
(7) Suppose that C is an (n, M, d)-code. Show that C is a cross-section of a larger code with parameters (n + 1, M + 2, 1).
(8) Let C1 = {c1c2c3c4 | c1 + c2 + c3 + c4 ≡ 0 (mod 2)} be the code over Z2.
(a) What are the parameters of C1?
(b) Construct C2 = C1 ⊕ Rep2(4). What are the parameters of C2?
(c) What are the parameters of C3 = C2 ⊕ Rep2(8)?
(d) What are the parameters of C4 = C3 ⊕ Rep2(16)?
(e) Show that we can construct a binary (2^m, 2^{m+1}, 2^{m−1})-code in this fashion.
(9) If C is a code over Zp and C̄ is the code obtained by adding an overall parity check, what is the relation between the minimum distances of C and C̄?
(10) Verify that A2(6, 5) = 2, A2(7, 5) = 2 and A2(8, 5) = 4.
(11) Let C be an (n, M, d)-code.
(a) If C is not maximal, is it always possible to add codewords to C until the resulting code is maximal?
(b) If C is not optimal, is it always possible to add codewords to C until the resulting code is optimal?
(c) Give an example of a code that is maximal but not optimal.
(12) Is there a binary (8, 29, 3)-code? Explain.
(13) Show that Ar(r + 1, 5) ≤ 2r^{r−2}/(r − 1).
(14) Compare the Singleton, Plotkin and sphere-packing upper bounds for A2(9, 5).
(15) Let C be a perfect binary (n, M, 7)-code. Use the sphere-packing condition to show that n = 7 or n = 23.

CHAPTER 5
Linear Codes

Finite Fields

Finite fields play a major role in coding theory, and so it is important to gain a solid understanding of the structure of such fields.

Let K and F be fields. If K is an extension of F, we write K/F. In this case, K is also a vector space over F. If the dimension of K over F is finite, we say that K is a finite extension of F and denote this dimension by [K : F]. It is easy to check that if F is a finite field and K is a finite extension of F with d = [K : F], then K is a finite field with |K| = |F|^d.

If R is a ring and there exists a positive integer n for which
$$n \cdot a = \underbrace{a + \cdots + a}_{n \text{ times}} = 0$$
for all a ∈ R, then the smallest such n is called the characteristic of R and is denoted by char(R). If no such positive integer n exists, we say that R has characteristic 0.

In a field of characteristic 0, the positive integers 1, 2, . . . are all distinct, and so a finite field must have nonzero characteristic. Suppose that the characteristic of a finite field F is n. If n = uv where 1 < u, v < n, then (u · 1)(v · 1) = 0, implying u · 1 = 0 or v · 1 = 0. In either case, we have a contradiction to the fact that n is the smallest positive integer such that n · 1 = 0. Thus, n must be a prime number.

Lemma 5.1. If F is a finite field, then F has prime characteristic. Furthermore, if char(F) = p, then F has p^n elements, for some positive integer n.

From now on, p will represent a prime number and q will represent a prime power. The following result is a key reason why the theory of finite fields has its characteristic flavor.

Lemma 5.2. If F is a finite field of characteristic p, then
$$(\alpha + \beta)^{p^n} = \alpha^{p^n} + \beta^{p^n},$$
for any positive integer n and for all α, β ∈ F.

According to the definition, the set F* of nonzero elements of a field F forms a group under multiplication. If |F| = q, then |F*| = q − 1, and since the order of every element in a group divides the order of the group, we have α^{q−1} = 1 for all α ∈ F*. In other words, every element of F is a root of the polynomial fq(x) = x^q − x.
But since this polynomial has at most q roots, we see that F is the set of all roots of fq(x) and is therefore also the splitting field for fq(x).

Lemma 5.3. If F is a finite field of q elements, then F is both the set of all roots of fq(x) = x^q − x and the splitting field for fq(x).

Since any two splitting fields for the same polynomial are isomorphic, Lemma 5.3 tells us that any two finite fields of the same size are isomorphic. We will denote a finite field of size q by Fq. It remains now to determine whether or not there is a finite field of size q for every prime power q = p^n. Let K be the splitting field for fq(x) = x^q − x and let R be the set of roots of fq(x). If α, β ∈ R with β ≠ 0, then by Lemma 5.2, α + β and αβ^{−1} are also in R. Thus, R is a subfield of K, which implies that R = K. Let us summarize our results.

Theorem 5.4. All finite fields have size q = p^n, for some prime p. On the other hand, for every q = p^n, there exists a unique (up to isomorphism) field of size q.

Our goal now is to describe the subfields of a finite field. Suppose that K is a field of size p^n and let d | n. It is not hard to show that p^d − 1 | p^n − 1, and so x^{p^d} − x | x^{p^n} − x. Hence f_{p^d}(x) = x^{p^d} − x splits into linear factors over K. In other words, K contains a subfield of size p^d.

Theorem 5.5. Let K be a finite field of size p^n. Then K has exactly one subfield of size p^d for each d | n. Furthermore, this accounts for all of the subfields of K.

For a finite field, the multiplicative group K* could not have a simpler structure: it is cyclic. Recall that if G is a cyclic group of order n, then G contains exactly φ(d) elements of each order d dividing n, where φ is Euler's phi function. This gives the formula
$$\sum_{d \mid n} \phi(d) = n.$$
Now, suppose that |F*| = q − 1 and α is an element of F* of order d. Thus, d | q − 1. Consider the cyclic subgroup ⟨α⟩ generated by α. Every element of ⟨α⟩ has order dividing d and so is a root of the polynomial x^d − 1. But this polynomial can have at most d roots in F, and so ⟨α⟩ is the set of all roots of x^d − 1. In particular, all of the elements of F of order d must lie in ⟨α⟩. However, in ⟨α⟩, there are exactly φ(d) elements of order d. Hence, letting ψ(d) denote the number of elements of F of order d, we have ψ(d) = φ(d) or ψ(d) = 0. Since
$$\sum_{d \mid q-1} \psi(d) = |F^*| = q - 1 = \sum_{d \mid q-1} \phi(d),$$
and ψ(d) ≤ φ(d) for each d, it follows that ψ(d) = φ(d) for every d | q − 1. We have the following result.

Theorem 5.6. If F is a finite field of q elements, then F contains exactly φ(d) elements of order d, for each d | q − 1. In particular, the multiplicative group F* of nonzero elements of F is cyclic.
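Theorem 5.6 is easy to verify computationally for a prime field F_p = Z_p; the following sketch (our own illustration) tabulates the multiplicative orders of the elements of F_p*.

```python
def element_orders(p):
    """Multiplicative orders of the elements of F_p* (p prime)."""
    orders = {}
    for g in range(1, p):
        k, x = 1, g
        while x != 1:          # order of g: least k with g^k = 1 mod p
            x = x * g % p
            k += 1
        orders[g] = k
    return orders

# In F_7*, Theorem 5.6 predicts phi(d) elements of order d for each d | 6:
# one of order 1, one of order 2, two of order 3, and two of order 6.
print(element_orders(7))   # {1: 1, 2: 3, 3: 6, 4: 3, 5: 6, 6: 2}
```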
Let C be an [n, k]-code with a basis B = {b1, . . . , bk}. If
\[ \mathbf{b}_1 = b_{11} b_{12} \cdots b_{1n}, \quad \mathbf{b}_2 = b_{21} b_{22} \cdots b_{2n}, \quad \ldots, \quad \mathbf{b}_k = b_{k1} b_{k2} \cdots b_{kn}, \]
then the k × n matrix
\[ G = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & & & \vdots \\ b_{k1} & b_{k2} & \cdots & b_{kn} \end{pmatrix}, \]
whose rows are the codewords in B, is called a generator matrix for C.
If C is a q-ary linear [n, k]-code with generator matrix G, then the codewords in C are precisely the elements of the row space of G. Put another way, C = {x · G | x ∈ F_q^k}. Since performing elementary row operations does not change the row space of a matrix, any matrix that is row equivalent to G is also a generator matrix for C. On the other hand, interchanging two columns of G gives us a generator matrix for a code which is equivalent to C. A generator matrix of the form G = (I_k | M_{k,n−k}) (where I_k is the identity matrix of size k × k and M_{k,n−k} is a matrix of size k × (n − k)) is said to be in left standard form. In view of the previous remarks, every linear code is equivalent to a linear code which has a generator matrix in left standard form. When a k × n generator matrix is in left standard form, both the encoding and decoding processes become very simple.
Example 5.9. As we will see later, the matrix
\[ G = \begin{pmatrix} 1&0&0&0&0&1&1 \\ 0&1&0&0&1&0&1 \\ 0&0&1&0&1&1&0 \\ 0&0&0&1&1&1&1 \end{pmatrix} \]
is a generator matrix for the Hamming code H2(3). The Hamming code H2(3) can encode source words from F_2^4 as follows:
\[ \mathbf{x} \cdot G = (x_1, x_2, x_3, x_4)\,G = (x_1, x_2, x_3, x_4,\ x_2+x_3+x_4,\ x_1+x_3+x_4,\ x_1+x_2+x_4). \]
Since G is in left standard form, the original source message appears as the first k symbols of its codeword.

The Dual of a Linear Code
We have seen several ways of constructing new codes from old ones. Now, we describe another method (perhaps the most important one for linear codes).
Definition 5.10. Let x = x1 x2 · · · xn and y = y1 y2 · · · yn be strings in F_q^n. The inner product of x and y, denoted by x · y, is the element of Fq defined by
\[ \mathbf{x} \cdot \mathbf{y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n, \]
where the sums and products are taken in Fq.
For any set S ⊆ F_q^n, we let S⊥ denote the set of all strings in F_q^n that are orthogonal to every string in S. Thus,
\[ S^\perp = \{ \mathbf{x} \in \mathbb{F}_q^n \mid \mathbf{s} \cdot \mathbf{x} = 0 \text{ for all } \mathbf{s} \in S \}. \]
This set is called the orthogonal complement of S.
Lemma 5.11. For any subset S of F_q^n, the set S⊥ is a linear code.
From Lemma 5.11, we have the following definition.
Definition 5.12. The orthogonal complement C⊥ of any code C is a linear code called the dual code of C.
We may apply some basic linear algebra to get the following results, which give some of the most basic properties of dual codes.
Proposition 5.13. Let C be a linear [n, k]-code over Fq, with generator matrix G.
(1) C⊥ is the set of all strings that are orthogonal to every row of G. In symbols, C⊥ = {x ∈ F_q^n | x · G^t = 0}, where G^t is the transpose of G.
(2) C⊥ is a linear [n, n − k]-code. In other words, dim(C⊥) = n − dim(C).
(3) We have (C⊥)⊥ = C.
We should remark that the properties of the dual of a linear code over a finite field can be quite different from those of the dual space of a vector space over the real numbers. For instance, if W is a subspace of a finite dimensional real vector space V, then W⊥ ∩ W = {0}, since no nonzero real vector is orthogonal to itself. This is not always the case for linear codes over finite fields, however. In fact, as the next example illustrates, we can even have C⊥ = C.
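Since the definition of S⊥ is completely concrete, the dual of a small code can be computed by brute force. The following is a minimal Python sketch (not from the text; all names are illustrative), run here on the [4, 2]-code used in the next example.

    from itertools import product

    # Brute-force sketch: compute S-perp over F_2 by testing every string
    # in F_2^n against every string in S. Illustrative only.
    def dual(S, n):
        return [x for x in product((0, 1), repeat=n)
                if all(sum(a*b for a, b in zip(x, s)) % 2 == 0 for s in S)]

    C = [(0,0,0,0), (1,1,0,0), (0,0,1,1), (1,1,1,1)]
    print(dual(C, 4))   # returns the same four strings, so C equals C-perp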
Example 5.14. For the binary [4, 2]-code C = {0000, 1100, 0011, 1111}, we have C ⊆ C⊥, and since C⊥ is also a [4, 2]-code, we get C = C⊥.
Definition 5.15. A linear code C is said to be self-orthogonal if C ⊆ C⊥. A linear code C for which C = C⊥ is said to be self-dual.
It is easy to check that a linear code C with generator matrix G is self-orthogonal if and only if the rows of G are orthogonal to themselves and to each other. Note that a linear [n, k]-code is self-dual if and only if it is self-orthogonal and k = n/2.
By Proposition 5.13 (1), we can describe the dual code as the set of solutions to certain equations. The system of equations x · G^t = 0 is called the parity check equations for the code C⊥. A string x = x1 x2 · · · xn ∈ F_q^n is in the dual code C⊥ if and only if its components x1, . . . , xn satisfy the parity check equations for C⊥.
Definition 5.16. A parity check matrix for a linear q-ary [n, k]-code C is a matrix P with the property that
\[ C = \{ \mathbf{x} \in \mathbb{F}_q^n \mid \mathbf{x} \cdot P^t = \mathbf{0} \}. \]
Note that, unlike a generator matrix, we make no requirement that the rows of P be linearly independent. Of course, parity check matrices in which the rows are linearly independent are smaller and therefore more efficient than other parity check matrices. Any linear code C has a parity check matrix. In particular, a generator matrix for the dual code C⊥ is a parity check matrix for C.
We now have two convenient ways to define a linear code C: by giving a generator matrix or by giving a parity check matrix. One of the advantages of a generator matrix in left standard form is that such a description makes it easy to encode and decode source messages. Another advantage is that it is easy to construct a parity check matrix from a generator matrix that is in left standard form. Let G = (I_k | B) be a generator matrix for C. Consider P = (−B^t | I_{n−k}). Then
\[ G P^t = (I_k \mid B) \begin{pmatrix} -B \\ I_{n-k} \end{pmatrix} = -B + B = O, \]
where O is the k × (n − k) zero matrix. Hence, the rows of P are orthogonal to the rows of G and, since rank(P) = n − k = dim(C⊥), we deduce that P is a generator matrix for the dual code C⊥. We have the following.
Proposition 5.17. The matrix G = (I_k | B) is a generator matrix for an [n, k]-code C if and only if the matrix P = (−B^t | I_{n−k}) is a parity check matrix for C.
Example 5.18. The code H2(3) in Example 5.9 has parity check matrix
\[ P = \begin{pmatrix} 0&1&1&1&1&0&0 \\ 1&0&1&1&0&1&0 \\ 1&1&0&1&0&0&1 \end{pmatrix}. \]
In this case, the parity check equations are
\[ x_2 + x_3 + x_4 + x_5 = 0, \qquad x_1 + x_3 + x_4 + x_6 = 0, \qquad x_1 + x_2 + x_4 + x_7 = 0. \]
A matrix of the form A = (M | I_k) is said to be in right standard form. By Proposition 5.17, it is easy to go back and forth between generator matrices in left standard form and parity check matrices in right standard form. The use of parity check matrices that are in right standard form also has some interesting features. For instance, the code H2(3) in Example 5.18 has a parity check matrix in right standard form. A string x = x1 x2 · · · x7 is in H2(3) if and only if
\[ x_5 = x_2 + x_3 + x_4, \qquad x_6 = x_1 + x_3 + x_4, \qquad x_7 = x_1 + x_2 + x_4. \]
This description of H2(3) is very pleasant, for we can easily generate codewords from it by just picking values for x1, x2, x3 and x4 and substituting, or we can easily determine whether or not a given string is a codeword.

The Minimum Distance of a Linear Code
In order to determine the minimum distance of an arbitrary code C of size M, we need to check each of the M(M − 1)/2 distances d(c, d) between codewords.
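For concreteness, here is a minimal Python sketch (illustrative, not from the text) of this naive pairwise computation.

    from itertools import combinations

    # Naive minimum distance: check all M(M-1)/2 pairs of codewords.
    def hamming(x, y):
        return sum(a != b for a, b in zip(x, y))

    def min_distance(C):
        return min(hamming(x, y) for x, y in combinations(C, 2))

    C = ["0000", "1011", "0110", "1101"]   # a small binary [4, 2]-code
    print(min_distance(C))                 # -> 2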
For linear codes, we can greatly simplify the task.
Definition 5.19. The weight w(x) of a string x ∈ F_q^n is defined to be the number of nonzero positions in x. The weight of a code C, denoted by w(C), is the minimum weight of all nonzero codewords in C.
Lemma 5.20. d(x, y) = w(x − y) for all strings x, y in F_q^n.
Since for a linear code C we have that c, d ∈ C implies c − d ∈ C, Lemma 5.20 gives the following.
Proposition 5.21. If C is a linear code, then d(C) = w(C).
It is important to emphasize that Proposition 5.21 holds only for codes which are additive subgroups of F_q^n.
As we have said, a linear code C can be described either by giving a generator matrix G or a parity check matrix P. Both methods have advantages. For instance, it is easier to generate all codewords in C from G. On the other hand, to use P to generate all codewords in C requires solving a system of linear equations. However, it is easier to determine whether or not a given string is in C by using P. Furthermore, there does not seem to be a simple, direct method for determining the minimum weight of a linear code from a generator matrix. However, the following result shows that it is easy to do so from a parity check matrix.
Proposition 5.22. Let P be a parity check matrix for a linear code C. Then the minimum distance of C is the smallest integer r for which there are r linearly dependent columns in P.
Recall that the sphere-covering lower bound on A_r(n, d) is given by
\[ \frac{r^n}{V_r^n(d-1)} \le A_r(n, d). \]
It happens that we can improve upon this bound, in some cases, by considering linear codes and using Proposition 5.22.
Theorem 5.23 (Gilbert-Varshamov bound). There exists a q-ary linear [n, k, d]-code if
\[ q^k < \frac{q^n}{V_q^{n-1}(d-2)}. \]
Thus, if q^k is the largest power of q satisfying this inequality, we have A_q(n, d) ≥ q^k.
The inequality displayed in Theorem 5.23 is known as the Gilbert-Varshamov inequality. The following example shows that the Gilbert-Varshamov bound can be better than the sphere-covering bound.
Example 5.24. The sphere-covering bound says that A2(5, 3) ≥ 2. On the other hand, the Gilbert-Varshamov bound says that there exists a binary linear (5, 2^k, 3)-code if 2^k < 32/5, and so we may take k = 2, showing that there is a binary linear (5, 4, 3)-code, whence A2(5, 3) ≥ 4.

Correcting Errors in a Linear Code
Nearest neighbor decoding involves finding a codeword closest to the received string. There are better methods for decoding with linear codes. Let us recall a few simple facts about quotient spaces. If W is a subspace of a vector space V over K, the quotient space of V modulo W is defined by
\[ V/W = \{ v + W \mid v \in V \}. \]
The set v + W = {v + w | w ∈ W} is called a coset of W. The quotient space is also a vector space over K, where λ(v + W) = λv + W and (v + W) + (v′ + W) = (v + v′) + W for all λ ∈ K and v, v′ ∈ V. Recall that v + W = v′ + W if and only if v − v′ ∈ W.
Now let us suppose that a string x ∈ F_q^n is received. Nearest neighbor decoding requires that we decode x as a codeword c for which x − c has smallest weight. But as c ranges over a linear code C, x − c ranges over the coset x + C. Hence, nearest neighbor decoding requires that we decode x as the codeword c = x − f, where f is a string in x + C of smallest weight.
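As a minimal illustration, the following Python sketch (not from the text) decodes by searching the coset x + C for a smallest-weight string, using the binary code that appears in the next example.

    # Coset-based nearest neighbor decoding over F_2: decode x as x - f,
    # where f is a smallest-weight string in the coset x + C (sketch only).
    def add(u, v):
        return tuple((a + b) % 2 for a, b in zip(u, v))

    def decode(x, C):
        f = min((add(x, c) for c in C), key=sum)   # smallest weight in x + C
        return add(x, f)                           # over F_2, x - f = x + f

    C = [(0,0,0,0), (1,0,1,1), (0,1,1,0), (1,1,0,1)]
    print(decode((1,0,1,0), C))   # -> (1, 0, 1, 1)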
Let C be a q-ary linear [n, k]-code. The process can be described in terms of a so-called standard array for C:
\[
\begin{array}{ccccc}
\mathbf{0} & \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_{q^k-1} \\
\mathbf{f}_2 & \mathbf{f}_2 + \mathbf{c}_1 & \mathbf{f}_2 + \mathbf{c}_2 & \cdots & \mathbf{f}_2 + \mathbf{c}_{q^k-1} \\
\mathbf{f}_3 & \mathbf{f}_3 + \mathbf{c}_1 & \mathbf{f}_3 + \mathbf{c}_2 & \cdots & \mathbf{f}_3 + \mathbf{c}_{q^k-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathbf{f}_{q^{n-k}} & \mathbf{f}_{q^{n-k}} + \mathbf{c}_1 & \mathbf{f}_{q^{n-k}} + \mathbf{c}_2 & \cdots & \mathbf{f}_{q^{n-k}} + \mathbf{c}_{q^k-1}
\end{array}
\]
The first row of the array consists of the codewords in C. To form the second row, we choose a string f2 of smallest weight that is not in the first row and add it to each codeword of the first row. This forms the coset f2 + C. In general, the i-th row of the array is formed by choosing a string fi of smallest weight that is not yet in the array and adding it to each codeword of the first row, to form the coset fi + C. The elements fi are called the coset leaders of the array. The following basic facts about standard arrays will be used repeatedly.
Lemma 5.25. Let C be a q-ary linear [n, k]-code with standard array A.
(1) Every string in F_q^n appears exactly once in A.
(2) The number of rows of A is q^{n−k}.
(3) Two strings x and y in F_q^n lie in the same coset (row) of A if and only if their difference x − y is in C.
(4) The coset leader has minimum weight among all strings in its coset.
Example 5.26. A standard array for the binary [4, 2]-code C = {0000, 1011, 0110, 1101} is
\[
\begin{array}{cccc}
0000 & 1011 & 0110 & 1101 \\
1000 & 0011 & 1110 & 0101 \\
0100 & 1111 & 0010 & 1001 \\
0001 & 1010 & 0111 & 1100
\end{array}
\]
We remark that standard arrays are not unique. For instance, in the array of the previous example, we could have chosen 0010 to be the coset leader for the third row of the array.
We now come to the purpose of standard arrays, which is to implement nearest neighbor decoding.
Proposition 5.27. Let C be a q-ary [n, k]-code with standard array A. For any string x in F_q^n, the codeword c that lies at the top of the column containing x is a nearest neighbor codeword to x.
Notice that the difference x − c between the received string x and the nearest neighbor interpretation c at the top of the column containing x is the coset leader for the coset containing x. This coset leader is called the error string. Nearest neighbor ties are always decided when using a standard array. Thus, standard array decoding is complete decoding.
Recall that if C is a linear [n, k, d]-code, then it is v-error-correcting, where v = ⌊(d − 1)/2⌋. Put another way, any errors that result in an error string of weight v or less are corrected. It follows that the coset leaders of any standard array for C must include all strings of weight v or less.
One of the advantages of parity check matrices is that they can be used for an efficient implementation of nearest neighbor decoding.
Definition 5.28. Let P be a parity check matrix for a linear code C ⊆ F_q^n. The syndrome S(x) of a string x ∈ F_q^n is the product x · P^t.
We remark that the syndrome function S has the properties that S(x + y) = S(x) + S(y) and S(λx) = λS(x). Note also that the parity check equation x · P^t = 0 is equivalent to S(x) = 0, and so x ∈ C if and only if S(x) = 0. The main importance of the syndrome comes from the following lemma.
Lemma 5.29. Let C be a linear code. Two strings x and y are in the same coset of any standard array for C if and only if they have the same syndrome.
Recall that under nearest neighbor decoding, the error string e in a received word x is the coset leader of the coset containing x and that the nearest neighbor codeword is c = x − e. But the syndrome of x is equal to the syndrome of e and, since the syndromes of the coset leaders are all distinct, we can find e simply by comparing the syndrome of x to the syndromes of the coset leaders. Now, nearest neighbor decoding can be implemented by the following simple algorithm.
(1) Compute the syndrome S(x) of the received string x.
(2) Compare it with the list of syndromes of the coset leaders {fi}. If S(x) = S(fi), then fi is the error string and c = x − fi is a nearest neighbor codeword.
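The following Python sketch implements this algorithm for the [4, 2]-code of Example 5.26; the parity check matrix P was worked out by hand and is illustrative only.

    from itertools import product

    # Syndrome decoding sketch: build a table mapping each syndrome to a
    # smallest-weight coset leader, then decode as x minus the leader.
    P = [(1,1,1,0), (1,0,0,1)]            # one row of P per tuple

    def syndrome(x):
        return tuple(sum(a*b for a, b in zip(x, row)) % 2 for row in P)

    table = {}                            # syndrome -> coset leader
    for x in product((0,1), repeat=4):
        s = syndrome(x)
        if s not in table or sum(x) < sum(table[s]):
            table[s] = x

    def decode(x):
        e = table[syndrome(x)]            # the error string
        return tuple((a - b) % 2 for a, b in zip(x, e))

    print(decode((1,0,1,0)))              # -> (1, 0, 1, 1)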
Thus, we need only a list of the coset leaders and their syndromes, which we refer to as a syndrome table for C:
\[
\begin{array}{cc}
\text{coset leader} & \text{syndrome} \\
\hline
\mathbf{0} & \mathbf{0} \\
\mathbf{f}_2 & S(\mathbf{f}_2) \\
\mathbf{f}_3 & S(\mathbf{f}_3) \\
\vdots & \vdots \\
\mathbf{f}_{q^{n-k}} & S(\mathbf{f}_{q^{n-k}})
\end{array}
\]
This process is referred to as syndrome decoding.
Note that a standard array for a q-ary [n, k]-code C has q^{n−k} rows. If P is a parity check matrix for C and P has linearly independent rows, then it has size (n − k) × n and therefore each syndrome x · P^t is an element of F_q^{n−k}. Since |F_q^{n−k}| = q^{n−k}, we conclude that the set of syndromes is precisely the entire space F_q^{n−k}.
Let C be a linear code. We have seen that syndrome decoding will result in the correct codeword if and only if the error made in transmission is one of the coset leaders. Assuming a channel wherein the probability that a code symbol is changed to any other code symbol is p, if we let w_i be the number of coset leaders that have weight i, then the probability of correct decoding is
\[ P(\text{correct decoding}) = \sum_{i=0}^{n} w_i\, p^i (1-p)^{n-i}. \]
In general, the problem of determining the number w_i of coset leaders of weight i is quite difficult. However, in the case of perfect linear [n, k, d]-codes, we can easily determine these numbers. By using the result in Exercise 11, we have w_i = \binom{n}{i} for 0 ≤ i ≤ ⌊(d−1)/2⌋ and w_i = 0 for i > ⌊(d−1)/2⌋.
An error in the transmission of a codeword c will go undetected if and only if the error string is a nonzero codeword. Hence, if A_i denotes the number of codewords of weight i, then for the channel wherein the probability that a code symbol is changed to any other code symbol is p, the probability of an undetected error is
\[ P(\text{undetected error}) = \sum_{i=1}^{n} A_i\, p^i (1-p)^{n-i}. \]
In defining a communications channel, we included the requirement that symbol errors be independent of time. While such assumptions make life a lot simpler, they are not always realistic. This leads us to the concept of a burst error.
Definition 5.30. A burst in F_q^n of length b is a string in F_q^n whose nonzero coordinates are confined to b consecutive positions, the first and last of which must be nonzero.
For example, the string 0001100100 in F_2^{10} is a burst of length 5. Note that not all of the coordinates between the first and last 1s need be nonzero. Note also that if a linear code is to correct any burst of length b or less, then no such burst can be a codeword. The following lemma will be useful.
Lemma 5.31. Let C be a linear [n, k]-code over Fq. If C contains no bursts of length b or less, then k ≤ n − b.
We have seen that the more errors we expect a code to detect or correct, the smaller must be the size of the code. The situation for burst error detection is settled by the following result.
Proposition 5.32. If a linear [n, k]-code C can detect all burst errors of length b or less, then k ≤ n − b. Furthermore, there is a linear [n, n − b]-code that will detect all burst errors of length b or less.
Now let us consider burst correction.
Proposition 5.33. If a linear [n, k]-code C can correct all burst errors of length b or less, using nearest neighbor decoding, then k ≤ n − 2b.
If a code can correct any burst error of length b or less, then no two such bursts can lie in the same coset of a standard array of C.
Thus, by counting the total number of bursts of length b or less, we get a lower bound on the number of cosets of C, and hence an upper bound on the dimension of C.
Proposition 5.34. If a linear [n, k]-code C over Fq can correct all burst errors of length b or less, using nearest neighbor decoding, then
\[ k \le n - b + 1 - \log_q\bigl[(n - b + 1)(q - 1) + 1\bigr]. \]
Finally, we introduce a procedure referred to as majority logic decoding. This procedure often provides a simple method for decoding a linear code.
Definition 5.35. A system of parity check equations for a linear code is said to be orthogonal with respect to the variable x_i provided x_i appears in every equation of the system, but every other variable appears in exactly one equation.
Now suppose a system of parity check equations is orthogonal with respect to x_i and suppose that a single error occurs in transmission. If the error is in the i-th position, then x_i is incorrect, but all other x_j are correct. Hence, each equation will be unsatisfied. On the other hand, if the error is in any position other than the i-th position, then exactly one of the equations will be unsatisfied. Thus, the number of unsatisfied equations will tell us whether or not the i-th position in the received string is correct (assuming a single error).
More generally, suppose we have r parity check equations which are orthogonal with respect to the variable x_i. Suppose further that t ≤ r/2 errors have occurred in transmission. If one of the errors is in the i-th position, then at most t − 1 of the equations can be corrected by the remaining errors, and so at least r − (t − 1) ≥ r/2 + 1 equations will be unsatisfied. On the other hand, if the i-th position does not suffer an error, then at most t ≤ r/2 equations will be unsatisfied. Therefore, the i-th position in the received string is in error if and only if the majority of the equations are unsatisfied. This is majority logic decoding.
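A minimal Python sketch of this majority vote follows (illustrative; the three equations used below are the parity checks of the repetition code Rep(4), which are orthogonal with respect to x1).

    # Majority logic decision for one position: the position is declared
    # to be in error when a majority of its orthogonal checks fail.
    def position_in_error(x, equations):
        # each equation is a tuple of 0-based positions whose sum must be 0
        unsatisfied = sum(sum(x[j] for j in eq) % 2 != 0 for eq in equations)
        return unsatisfied > len(equations) // 2

    x = (1, 0, 0, 0)                   # received word with an error in x1
    eqs = [(0, 1), (0, 2), (0, 3)]     # x1+x2 = x1+x3 = x1+x4 = 0
    print(position_in_error(x, eqs))   # -> True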
Exercise
(1) Let F be an arbitrary field. Prove that if F* is cyclic, then F must be a finite field.
(2) Is the code E_n consisting of all codewords in F_2^n with even weight a linear code? If so, give a basis, state the dimension and find the minimum weight.
(3) Prove that for a binary linear code C, either all of the codewords have even weight or else exactly half of the codewords have even weight.
(4) Write out all of the codewords for the ternary code with generator matrix
\[ \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 2 \end{pmatrix} \]
and find the parameters of the code. Show that it is perfect.
(5) Let C be a binary linear code. Let C^c be the set of complements (c.f. Chapter 4) of codewords in C. Let 1 = 1 · · · 1 ∈ F_2^n.
(a) Show that if 1 ∈ C then C^c = C.
(b) Is C^c also a linear code?
(c) Show that C ∪ C^c is a linear code.
(6) Let C be a linear code and let C̄ (c.f. Chapter 4) be the extended code defined by adding an overall parity check to C.
(a) Show that C̄ is also a linear code.
(b) If P is the parity check matrix for C, what is the parity check matrix for C̄?
(7) Prove that a binary self-dual [n, n/2]-code exists for all positive even integers n.
(8) Let G be a generator matrix for a q-ary linear code C. Show that C is self-dual if and only if distinct rows of G are orthogonal and each row of G has weight divisible by q.
(9) Show that there is no binary linear [90, 78, 5]-code.
(10) Let 1 = 1 · · · 1 ∈ F_2^n.
(a) Show that if C is a binary self-orthogonal code, then all codewords in C have even weight and 1 ∈ C⊥.
(b) Suppose that n is odd. Show that if C is a binary self-orthogonal [n, (n − 1)/2]-code, then C⊥ is generated by any basis for C together with the string 1.
(11) Let C be a linear [n, k, d]-code with standard array A. Show that C is perfect if and only if the coset leaders of A are precisely the strings of weight ⌊(d − 1)/2⌋ or less.
(12) Let A and B be mutually orthogonal subsets of F_q^n, that is, a · b = 0 for all a ∈ A and b ∈ B. Suppose furthermore that |A| = q^k and |B| > q^{n−k−1}. Show that A is a linear code.

CHAPTER 6
Some Linear Codes

Maximum Distance Separable Codes
For fixed n and k, we may ask for the largest minimum distance d among all linear [n, k]-codes. This problem has a very simple answer and leads to some fascinating theory. The Singleton bound (Theorem 4.31) or Proposition 5.22 implies the following.
Lemma 6.1. For a linear [n, k]-code, we must have d ≤ n − k + 1.
Definition 6.2. A linear [n, k]-code with minimum distance d = n − k + 1 is called a maximum distance separable code, or an MDS code.
It is not hard to see that q-ary MDS codes exist with parameters [n, n, 1], [n, 1, n] and [n, n − 1, 2]. These codes are referred to as the trivial MDS codes. Thus, any nontrivial MDS [n, k]-code must satisfy 2 ≤ k ≤ n − 2.
Proposition 5.22 says that a linear code has minimum distance d if and only if any d − 1 columns of a parity check matrix are linearly independent but some d columns are linearly dependent. Thus we have the following.
Lemma 6.3. Let C be a linear [n, k]-code with parity check matrix P. Then C is MDS if and only if any n − k columns of P are linearly independent.
If we choose the parity check matrix P of C with the property that the rows of P are linearly independent, then P is a generator matrix for the dual code C⊥. We can characterize MDS codes in terms of their generator matrices.
Proposition 6.4. Let C be a linear [n, k]-code with generator matrix G. Then C is MDS if and only if C⊥ is MDS. Furthermore, C is MDS if and only if any k columns of G are linearly independent.
Here is another beautiful characterization of MDS codes.
Proposition 6.5. Let C be an [n, k]-code with generator matrix G = (I_k | M) in left standard form. Then C is an MDS code if and only if every square submatrix of M is nonsingular.
The support of a vector x ∈ F_q^n is the set of all coordinate positions where x is nonzero. Our next result characterizes MDS codes in yet another way.
Proposition 6.6. A linear [n, k, d]-code C is an MDS code if and only if given any d coordinate positions, there is a codeword whose support is precisely these positions.
Since MDS codes are very special, it is not surprising that the existence of such a code puts strong constraints on the possible values of the parameters of the code.
Lemma 6.7. There are no nontrivial MDS [n, k]-codes for which 1 ≤ k ≤ n − q.
By applying Lemma 6.7 to the dual code C⊥, we get the dual result.
Corollary 6.8. There are no nontrivial MDS [n, k]-codes for which q ≤ k ≤ n.
Lemma 6.7 and Corollary 6.8 can be restated as the following.
Proposition 6.9. If a nontrivial MDS [n, k]-code exists, then n − q + 1 ≤ k ≤ q − 1.
This proposition rules out nontrivial binary MDS codes.
Corollary 6.10. The only binary MDS codes are the trivial codes.
One of the most important problems related to MDS codes is the following. Given k and q, find the largest value of n for which there exists a q-ary MDS [n, k]-code. Let us denote this value of n by m(k, q). According to Proposition 6.9, m(k, q) ≤ k + q − 1.
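As a quick illustration of the column criterion in Proposition 6.4, the following Python sketch (specialized to k = 2, and using the ternary generator matrix from Exercise (4) above) checks that every pair of columns is linearly independent over F3, so the code is MDS with d = n − k + 1 = 3.

    from itertools import combinations

    # MDS test sketch for a 2 x n generator matrix over F_q: every k = 2
    # columns are independent iff every 2 x 2 minor is nonzero mod q.
    q = 3
    G = [[1, 0, 1, 1],
         [0, 1, 1, 2]]

    def is_mds(G):
        n = len(G[0])
        return all((G[0][i]*G[1][j] - G[0][j]*G[1][i]) % q != 0
                   for i, j in combinations(range(n), 2))

    print(is_mds(G))   # -> True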
It is not difficult to construct a family of MDS codes. Let α1, . . . , αu be nonzero elements from a field. The Vandermonde matrix based on these elements is
\[
V(\alpha_1, \dots, \alpha_u) =
\begin{pmatrix}
1 & 1 & \cdots & 1 \\
\alpha_1 & \alpha_2 & \cdots & \alpha_u \\
\alpha_1^2 & \alpha_2^2 & \cdots & \alpha_u^2 \\
\vdots & \vdots & & \vdots \\
\alpha_1^{u-1} & \alpha_2^{u-1} & \cdots & \alpha_u^{u-1}
\end{pmatrix}.
\]
Lemma 6.11. The determinant of the Vandermonde matrix is
\[ \det[V(\alpha_1, \dots, \alpha_u)] = \prod_{1 \le i < j \le u} (\alpha_j - \alpha_i). \]
Now let Fq = {0, α1, . . . , α_{q−1}} and consider the following (q − k + 1) × (q + 1) matrix, obtained from a Vandermonde matrix by adding two additional columns:
\[
H_1 =
\begin{pmatrix}
1 & \cdots & 1 & 1 & 0 \\
\alpha_1 & \cdots & \alpha_{q-1} & 0 & 0 \\
\alpha_1^2 & \cdots & \alpha_{q-1}^2 & 0 & 0 \\
\vdots & & \vdots & \vdots & \vdots \\
\alpha_1^{q-k} & \cdots & \alpha_{q-1}^{q-k} & 0 & 1
\end{pmatrix},
\]
where 1 ≤ k ≤ q. Using Lemma 6.11, any q − k + 1 columns of H1 form a nonsingular matrix. Therefore, we have the following.
Proposition 6.12. For 1 ≤ k ≤ q, the matrix H1 is a parity check matrix of a q-ary MDS [q + 1, k]-code.
Notice that we cannot, in general, add additional columns to the matrix H1 and expect a parity check matrix for an MDS code. For instance, consider the matrix
\[
H_2 =
\begin{pmatrix}
1 & \cdots & 1 & 1 & 0 & 0 \\
\alpha_1 & \cdots & \alpha_{q-1} & 0 & 1 & 0 \\
\alpha_1^2 & \cdots & \alpha_{q-1}^2 & 0 & 0 & 1
\end{pmatrix}.
\]
Choosing 2 columns among the first q − 1 columns along with the (q + 1)-th column, we get
\[
\begin{pmatrix}
1 & 1 & 0 \\
\alpha_i & \alpha_j & 1 \\
\alpha_i^2 & \alpha_j^2 & 0
\end{pmatrix}.
\]
This matrix has determinant α_i^2 − α_j^2. In order for this to be nonzero for every choice of distinct α_i and α_j, the field Fq must have the property that if α ≠ β then α^2 ≠ β^2. This says that the characteristic of Fq must be 2, that is, q must be a power of 2. We have the following.
Proposition 6.13. For q = 2^m, the matrix H2 is the parity check matrix of a q-ary MDS [q + 2, q − 1]-code.
Taking into account the dual codes, Propositions 6.12 and 6.13 give the following.
Corollary 6.14. For 1 ≤ k ≤ q, there exist q-ary MDS [q + 1, k]-codes and [q + 1, q − k + 1]-codes. For q = 2^m, there exist q-ary MDS [q + 2, q − 1]-codes and [q + 2, 3]-codes.
We remark that for k ≥ 3 and q odd, we can improve upon this slightly. Thus, for a nontrivial q-ary MDS [n, k]-code, with k ≥ 3 and q odd, we have n ≤ q + k − 2. At this point, we have gathered enough information to determine the value of m(3, q). In fact, we have
\[
m(3, q) =
\begin{cases}
q + 1 & \text{if } q \text{ is odd}, \\
q + 2 & \text{if } q \text{ is even}.
\end{cases}
\]
It has been conjectured that, except for the case of k = 3 (and, dually, k = q − 1) with q even, if there exists a nontrivial MDS [n, k]-code then m(k, q) = q + 1.

Hamming Codes
The Hamming codes Hq(h) are probably the most famous of all error-correcting codes. They are perfect, linear codes that decode in a very elegant manner. For a given code alphabet Fq, we can construct a parity check matrix P with h rows and with the maximum possible number of columns such that no two of its columns are linearly dependent but some set of three columns is linearly dependent. First, pick any nonzero column v1 in F_q^h. Then pick any nonzero column v2 in F_q^h \ {αv1 | α ≠ 0}. We continue in this way, picking a nonzero column and discarding all nonzero scalar multiples of the chosen columns, until every nonzero column has been either chosen or discarded. The result is a parity check matrix with (q^h − 1)/(q − 1) columns and with the properties we want. The resulting matrix, known as a Hamming matrix of order h, has the following property.
Theorem 6.15. The Hamming matrix of order h is a parity check matrix of a q-ary linear [n, k, 3]-code with parameters
\[ n = \frac{q^h - 1}{q - 1}, \qquad k = n - h, \qquad d = 3. \]
This code Hq(h) is known as a q-ary Hamming code of order h. It is an exactly single-error-correcting perfect code. Notice that the choice of columns is not unique, and so there are many different Hamming matrices and Hamming codes with a given set of parameters; one concrete choice is sketched below.
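The column-selection procedure is easy to carry out mechanically. The following Python sketch (illustrative, not from the text) keeps, from each set of nonzero scalar multiples in F_q^h, the representative whose first nonzero entry is 1, as in the construction of H3(3) that follows.

    from itertools import product

    # Hamming matrix columns: one representative of each set of nonzero
    # scalar multiples in F_q^h, namely the vector whose first nonzero
    # entry is 1 (q prime here, so arithmetic mod q is the field).
    def hamming_columns(q, h):
        cols = []
        for v in product(range(q), repeat=h):
            if any(v) and next(a for a in v if a) == 1:
                cols.append(v)
        return cols

    print(len(hamming_columns(3, 3)))   # -> 13 = (3^3 - 1)/(3 - 1)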
However, any Hamming matrix can be obtained from any other with the same parameters by permuting the columns and multiplying some columns by nonzero scalars. Hence any two Hamming codes of the same size are equivalent.
Example 6.16. The binary case is by far the most common, where H2(h) is a binary linear [2^h − 1, 2^h − 1 − h, 3]-code. For instance, the parity check matrix for the binary Hamming code H2(3) is
\[
H_2(3) =
\begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 1
\end{pmatrix}.
\]
Notice that the i-th column of H2(3) is simply the binary representation of i. Now, if a single error occurs in transmission in the i-th position, resulting in the error vector e_i, the syndrome of the received word is e_i H2(3)^t, which is just the i-th column of H2(3) written as a row.
The previous example leads to the following.
Proposition 6.17. If a codeword from the binary Hamming code H2(h) suffers a single error, resulting in the received string x, then the syndrome S(x) = x H2(h)^t is the binary representation of the position in x of the error.
In the non-binary case, we can do almost as well by choosing the columns of the parity check matrix Hq(h) in increasing size as q-ary numbers, but for which the first nonzero entry in each column is 1. For instance, the parity check matrix for H3(3) is
\[
H_3(3) =
\begin{pmatrix}
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & 2 & 2 & 2 \\
1 & 0 & 1 & 2 & 0 & 1 & 2 & 0 & 1 & 2 & 0 & 1 & 2
\end{pmatrix}.
\]
Now, if an error occurs in the i-th position, the error will have the form αe_i for some nonzero scalar α. Hence, the syndrome is α e_i Hq(h)^t, which is α times the i-th column of Hq(h) written as a row. Because of the way Hq(h) was constructed, we see that α is the first nonzero entry in the syndrome. Multiplying the syndrome by α^{−1} will give us the i-th column of Hq(h), telling us the position of the error.
Since the Hamming codes have some special properties, it is not surprising that their dual codes also have special properties. We will restrict attention to binary codes. The dual of the binary Hamming code H2(h) is called the simplex code S(h). Since the rows of the parity check matrix H2(h) for H2(h) are linearly independent, H2(h) is a generator matrix for S(h). The simplex code S(h) is a [2^h − 1, h]-code.
To determine the distance properties of the simplex codes, we observe that the generator matrix H2(h + 1) can be obtained from two copies of the matrix H2(h) as follows:
\[
H_2(h+1) =
\begin{pmatrix}
0 \cdots 0 & 1 & 1 \cdots 1 \\
H_2(h) & \mathbf{0} & H_2(h)
\end{pmatrix},
\]
where the middle entry of the bottom row denotes a zero column. Now, any codeword c ∈ S(h + 1) is a sum of some of the rows of H2(h + 1). Hence, c = x α y, where x is a sum of rows of H2(h) and is therefore a codeword in S(h), α is either 0 or 1, and y is equal to x or x^c (the complement of x), depending upon whether or not the first row of H2(h + 1) is included in the sum. These cases are summarized in the following theorem, which completely describes the simplex codes.
Theorem 6.18. The simplex code S(h) can be described as follows.
(1) S(2) = {000, 011, 101, 110}.
(2) For any integer h ≥ 2, S(h + 1) = {x0x | x ∈ S(h)} ∪ {x1x^c | x ∈ S(h)}.
Furthermore, d(c, d) = 2^{h−1} for every pair of distinct codewords c and d in S(h).
Theorem 6.18 explains why the codes S(h) are referred to as simplex codes: the line segments connecting the codewords form a regular simplex.

Golay Codes
There are a total of four Golay codes: two binary codes and two ternary codes. We will define these codes by giving generator matrices, as did Marcel Golay in 1949.
The Binary Golay Code G24.
The binary Golay code G24 is the [24, 12]-code whose generator matrix has the form G = [I12 | A], where
\[
A =
\begin{pmatrix}
0&1&1&1&1&1&1&1&1&1&1&1 \\
1&1&1&0&1&1&1&0&0&0&1&0 \\
1&1&0&1&1&1&0&0&0&1&0&1 \\
1&0&1&1&1&0&0&0&1&0&1&1 \\
1&1&1&1&0&0&0&1&0&1&1&0 \\
1&1&1&0&0&0&1&0&1&1&0&1 \\
1&1&0&0&0&1&0&1&1&0&1&1 \\
1&0&0&0&1&0&1&1&0&1&1&1 \\
1&0&0&1&0&1&1&0&1&1&1&0 \\
1&0&1&0&1&1&0&1&1&1&0&0 \\
1&1&0&1&1&0&1&1&1&0&0&0 \\
1&0&1&1&0&1&1&1&0&0&0&1
\end{pmatrix}.
\]
We will show that G24 has minimum weight 8.
It is straightforward to check that if r and s are rows of G, then r · s = 0. Hence G24 ⊆ G24⊥. Since G24 and G24⊥ both have dimension 12, they must be equal. By Proposition 5.17, the matrix [A^t | I12] is a parity check matrix for G24. Since G24 is self-dual and A is a symmetric matrix, we have the following.
Lemma 6.19. Let [I12 | A] be the generator matrix for the Golay code G24 in left standard form. Then the matrix [A | I12] is also a generator matrix for G24.
We take advantage of the two generator matrices G1 = [I12 | A] and G2 = [A | I12] for G24. Suppose that c ∈ G24 and write c = xy. We have that w(x) ≥ 1 and w(y) ≥ 1. Suppose that w(c) = 4. If w(x) = 1, then c must be a row of G1 and, similarly, if w(y) = 1, then c must be a row of G2. None of these has weight 4. This leaves only the possibility w(x) = w(y) = 2, which can be ruled out by checking that no sum of any two rows of G1 has weight 4. Hence, there is no codeword in G24 of weight 4. However, if r and s are rows of G1, then w(r + s) ≡ w(r) + w(s) (mod 4). This implies that the weight of every codeword in G24 is divisible by 4. We can now state the following.
Theorem 6.20. The binary Golay code G24 is a [24, 12, 8]-code.
Since G24 is a [24, 12, 8]-code, syndrome decoding would require that we construct 2^{24}/2^{12} = 4096 syndromes. On the other hand, using the structure of G24, we can considerably reduce the work involved in decoding.
Since G24 is self-dual, the matrices G1 = [I12 | A] and G2 = [A | I12] are both parity check matrices for G24. Suppose that 3 or fewer errors occur in the transmission of a codeword, and let x be the received string and e the error string. Let us write e = fg. We can compute the syndromes of the received string using both parity check matrices as follows:
\[ S_1(\mathbf{x}) = \mathbf{x} G_1^t = \mathbf{f} + \mathbf{g}A \qquad \text{and} \qquad S_2(\mathbf{x}) = \mathbf{x} G_2^t = \mathbf{f}A + \mathbf{g}. \]
Now let us examine the possibilities.
(1) If w(f) = 0, then e = 0g with g = S2(x), and w(S1(x)) ≥ 5, w(S2(x)) ≤ 3.
(2) If w(g) = 0, then e = f0 with f = S1(x), and w(S1(x)) ≤ 3, w(S2(x)) ≥ 5.
(3) If w(f) ≥ 1 and w(g) ≥ 1, then w(S1(x)) ≥ 5 and w(S2(x)) ≥ 5.
Thus, if either syndrome has weight at most 3, we can easily recover the error string e. If w(S1(x)) and w(S2(x)) are both greater than 3, we know that one of the following holds.
(a) w(f) = 1 and w(g) = 1 or 2: We have f = e_i, where e_i is the string with 1 in the i-th position and zeros elsewhere. Consider
\[ \mathbf{y}_u = (\mathbf{x} + \mathbf{e}_u \mathbf{0}) G_2^t = (\mathbf{e}_i \mathbf{g} + \mathbf{e}_u \mathbf{0}) G_2^t = \mathbf{e}_i A + \mathbf{g} + \mathbf{e}_u A. \]
Then w(y_u) = 1 or 2 precisely when u = i; otherwise, w(y_u) ≥ 4. Thus, we can determine both the error position i and the second half g by looking at the 12 strings y_1, . . . , y_12.
(b) w(f) = 2 and w(g) = 1: We have f = e_i + e_j for some i ≠ j. Consider
\[ \mathbf{y}_u = (\mathbf{x} + \mathbf{e}_u \mathbf{0}) G_2^t = \mathbf{e}_i A + \mathbf{e}_j A + \mathbf{g} + \mathbf{e}_u A. \]
Then w(y_u) ≥ 4 for all u = 1, . . . , 12. In this case, we use a similar computation using G_1^t. Because g = e_λ for some λ, we have
\[ \mathbf{z}_u = (\mathbf{x} + \mathbf{0} \mathbf{e}_u) G_1^t = \mathbf{f} + \mathbf{e}_\lambda A + \mathbf{e}_u A, \]
which has weight w(z_u) = 2 if u = λ and weight w(z_u) ≥ 5 for u ≠ λ.
Thus we may easily pick out f = z_λ and the error position λ by looking at the 12 strings z_1, . . . , z_12.
In summary, if at most three errors occur, then we can decode correctly by computing at most the 26 syndromes
\[ \mathbf{x}G_1^t,\ \mathbf{x}G_2^t,\ (\mathbf{x} + \mathbf{e}_1\mathbf{0})G_2^t, \ldots, (\mathbf{x} + \mathbf{e}_{12}\mathbf{0})G_2^t,\ (\mathbf{x} + \mathbf{0}\mathbf{e}_1)G_1^t, \ldots, (\mathbf{x} + \mathbf{0}\mathbf{e}_{12})G_1^t. \]
The Binary Golay Code G23. The binary Golay code G23 is obtained by puncturing the code G24 in its last coordinate position. (We remark that puncturing the code G24 in any of its coordinate positions leads to an equivalent code.) The resulting punctured code has length 23 and, since the distance between codewords in G24 is greater than 1, all of the punctured codewords are distinct, so G23 has the same size as G24. It is clear that puncturing a code cannot increase the minimum distance nor decrease it by more than 1, and so d(G23) = 7 or 8. But the parameters [23, 12, 7] satisfy the sphere-packing condition, and so d(G23) = 7.
Theorem 6.21. The binary Golay code G23 is a perfect binary [23, 12, 7]-code.
We will see that the code G23 can also be defined as a cyclic code, and this leads to efficient decoding procedures for G23.
The Ternary Golay Codes. The ternary Golay code G12 is the code with generator matrix G = [I6 | B], where
\[
B =
\begin{pmatrix}
0&1&1&1&1&1 \\
1&0&1&2&2&1 \\
1&1&0&1&2&2 \\
1&2&1&0&1&2 \\
1&2&2&1&0&1 \\
1&1&2&2&1&0
\end{pmatrix}.
\]
As with the binary Golay code, the ternary Golay code G12 is self-dual, and it is also generated by the matrix [B | I6]. We can also construct the ternary Golay code G11 by puncturing G12 in its last coordinate position.
Theorem 6.22. The ternary Golay code G12 is a [12, 6, 6]-code and the ternary Golay code G11 is a perfect [11, 6, 5]-code.
Many coding theorists contributed to establishing the uniqueness of the Golay codes. Their results can be summarized by saying that any code (linear or nonlinear) that has the parameters of a Golay code is equivalent to a Golay code.
We also mention another remarkable result concerning the existence of perfect codes. As we have seen, the codes consisting of a single codeword, the entire space and the repetition codes are all perfect. These are referred to as the trivial perfect codes.
Theorem 6.23. For alphabets of prime power size, all nontrivial perfect codes C have the parameters of either a Hamming code or a Golay code. Furthermore,
(1) if C has the parameters of a Golay code, then it is equivalent to that Golay code;
(2) if C is linear and has the parameters of a Hamming code, then it is equivalent to that Hamming code; however, there are nonlinear perfect codes with the Hamming parameters.
Moreover, over any alphabet, the only nontrivial t-error-correcting perfect code with t ≥ 3 is the binary Golay code G23.
Notice that there are some gaps in Theorem 6.23. With regard to alphabets of prime power size, it is not known how many nonequivalent, nonlinear perfect codes there are with Hamming parameters. In 1962, Vasil'ev discovered a family of such codes, which we discuss in the exercises. More generally, it is still not known whether there are perfect double-error-correcting codes over any alphabet whose size is not a power of a prime. (It is conjectured that there are none.) The issue of how many nonequivalent single-error-correcting perfect codes may exist seems to be extremely difficult.
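The perfection claims above are easy to verify numerically. The following Python sketch (illustrative) checks the sphere-packing condition M · V(t) = q^n for the parameters of G23 and G11, where V(t) is the volume of a Hamming ball of radius t.

    from math import comb

    # Volume of a Hamming ball of radius t in F_q^n.
    def ball(n, t, q):
        return sum(comb(n, i) * (q - 1)**i for i in range(t + 1))

    print(2**12 * ball(23, 3, 2) == 2**23)   # G23: [23, 12, 7], t = 3
    print(3**6 * ball(11, 2, 3) == 3**11)    # G11: [11, 6, 5],  t = 2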
Reed-Muller Codes
Reed-Muller codes are one of the oldest families of codes and have been widely used in applications. For each positive integer m and each integer r satisfying 0 ≤ r ≤ m, the r-th order Reed-Muller code R(r, m) is a binary linear [n, k, d]-code with parameters
\[ n = 2^m, \qquad k = 1 + \binom{m}{1} + \cdots + \binom{m}{r}, \qquad d = 2^{m-r}. \]
At first, we restrict attention to the first order Reed-Muller codes R(m), which are binary linear [2^m, m + 1, 2^{m−1}]-codes.
Definition 6.24. The Reed-Muller codes R(m) are binary codes defined for all integers m ≥ 1 as follows.
(1) R(1) = Z_2^2 = {00, 01, 10, 11}.
(2) For m ≥ 1, R(m + 1) = {uu | u ∈ R(m)} ∪ {uu^c | u ∈ R(m)}.
In words, the codewords in R(m + 1) are formed by juxtaposing each codeword in R(m) with itself and with its complement.
To demonstrate the virtues of an inductive definition, note that R(1) is a linear [2, 2, 1]-code in which every codeword except 00 and 11 has weight 2^0. We can easily extend this statement to the other Reed-Muller codes by induction.
Theorem 6.25. For m ≥ 1, the Reed-Muller code R(m) is a linear [2^m, m + 1, 2^{m−1}]-code for which every codeword except 0 and 1 has weight 2^{m−1}.
The inductive definition of R(m) also allows us to define generator matrices for these codes. If R_m is a generator matrix for R(m), then a generator matrix for R(m + 1) is
\[
R_{m+1} =
\begin{pmatrix}
0 \cdots 0 & 1 \cdots 1 \\
R_m & R_m
\end{pmatrix}.
\]
We can describe the generator matrices R_m directly, both in terms of their rows and their columns. The first row of R_m consists of a block of 2^{m−1} 0s followed by 2^{m−1} 1s:
\[ \underbrace{0 \cdots 0}_{2^{m-1}} \underbrace{1 \cdots 1}_{2^{m-1}} \]
The next row of R_m consists of alternating blocks of 0s and 1s of length 2^{m−2}:
\[ \underbrace{0 \cdots 0}_{2^{m-2}} \underbrace{1 \cdots 1}_{2^{m-2}} \underbrace{0 \cdots 0}_{2^{m-2}} \underbrace{1 \cdots 1}_{2^{m-2}} \]
In general, the i-th row of R_m consists of alternating blocks of 0s and 1s of length 2^{m−i}. The last row of R_m is a row of all 1s. The columns of R_m can be described as follows. Excluding the last row of R_m, the columns of R_m consist of all possible binary strings of length m, which, read from the top down as binary numbers, are 0, 1, . . . , 2^m − 1, in this order.
It is interesting to compare the characteristics of the Reed-Muller codes with those of the Hamming codes. For approximately the same codeword length, the code size of a Reed-Muller code is significantly smaller than that of a Hamming code. With Hamming codes, we pay for the large code size with a minimum distance of only 3. For the Reed-Muller codes, the relatively large minimum distance grows along with the code size.
Since R(m) is a [2^m, m + 1, 2^{m−1}]-code, it is capable of correcting 2^{m−2} − 1 errors. However, a standard array for R(m) has 2^{2^m − m − 1} rows. Thus, decoding using a syndrome table is time consuming, even for small values of m. We will describe a special type of majority logic decoding, called Reed decoding, that applies to Reed-Muller codes.
Let R_m be the generator matrix for R(m) defined above and denote the rows of R_m by r_1, . . . , r_{m+1}. Then for a codeword c = c_1 · · · c_n, we have
\[ \mathbf{c} = \alpha_1 \mathbf{r}_1 + \cdots + \alpha_m \mathbf{r}_m + \alpha_{m+1} \mathbf{r}_{m+1}, \]
for some scalars α_i ∈ F2. Fixing i, we would like to find strings x_i ∈ F_2^n such that x_i · r_j = δ_{ij}. Suppose that r_j = r_{j1} · · · r_{jn}. We have that (e_u + e_v) · r_j = r_{ju} + r_{jv}. Therefore, if e_u + e_v is to be our candidate for x_i, then we must have r_{ju} = r_{jv} if j ≠ i and r_{iu} ≠ r_{iv}. Thus, for each row i, we want a pair of columns that are identical except in their i-th row. We refer to such a pair of columns as a good pair for the i-th row. Note that the last row of R_m consists of all 1s, so the last row has no good pair.
On the other hand, the last row will never give any trouble in finding good pairs for the other rows, and so, for now, we can simply ignore the last row. In fact, let R′_m be the matrix obtained from R_m by removing the last row. The columns of R′_m consist of the binary representations of the numbers 0, 1, . . . , 2^m − 1, in this order. Hence, if r_{ju} = r_{jv} for j ≠ i and r_{iu} = 0, r_{iv} = 1, then u and v must be exactly 2^{m−i} apart. In particular, there are exactly 2^{m−1} good pairs for each row.
Now imagine that a codeword c = α_1 r_1 + · · · + α_m r_m + α_{m+1} r_{m+1} is sent. Using the 2^{m−1} good pairs for row i, we get 2^{m−1} expressions for α_i (for i ≤ m). Specifically, if c = c_1 · · · c_n and columns u and v are a good pair for row i, then
\[ \alpha_i = (\mathbf{e}_u + \mathbf{e}_v) \cdot \mathbf{c} = c_u + c_v. \]
Each of these 2^{m−1} expressions for α_i involves different positions in the codeword c. Thus, if no more than 2^{m−2} − 1 errors occur, then at most 2^{m−2} − 1 of the coordinates c_j are incorrect, and so at most 2^{m−2} − 1 of the expressions for α_i are incorrect. This means that at least 2^{m−1} − (2^{m−2} − 1) = 2^{m−2} + 1 of these expressions give the correct value of α_i. It follows that we can get the correct value of α_i by computing the 2^{m−1} expressions for α_i and taking the majority value.
The final step is to obtain the coefficient α_{m+1}. If at most 2^{m−2} − 1 errors have occurred in receiving x, then the error string e = x − c has weight at most 2^{m−2} − 1. Letting d = α_1 r_1 + · · · + α_m r_m, we have
\[ \mathbf{x} - \mathbf{d} = \alpha_{m+1} \mathbf{r}_{m+1} + \mathbf{e} = \alpha_{m+1} \mathbf{1} + \mathbf{e}. \]
There are two possibilities: if α_{m+1} = 0, then e = x − d, and if α_{m+1} = 1, then e = (x − d)^c. Thus, if w(x − d) ≤ 2^{m−2} − 1, we decode α_{m+1} as 0, and if w((x − d)^c) ≤ 2^{m−2} − 1, then we decode α_{m+1} as 1.
Example 6.26. Suppose that a codeword from R(3) is sent and the received string is x = 11011100. Consider the generator matrix
\[
R_3 =
\begin{pmatrix}
0&0&0&0&1&1&1&1 \\
0&0&1&1&0&0&1&1 \\
0&1&0&1&0&1&0&1 \\
1&1&1&1&1&1&1&1
\end{pmatrix}.
\]
For row 1 we have 2^{3−1} = 4, so the good pairs are (1, 5), (2, 6), (3, 7), (4, 8). For row 2 we have 2^{3−2} = 2, so the good pairs are (1, 3), (2, 4), (5, 7), (6, 8). For row 3 we have 2^{3−3} = 1, so the good pairs are (1, 2), (3, 4), (5, 6), (7, 8). Thus, if c = c_1 · · · c_8 = α_1 r_1 + α_2 r_2 + α_3 r_3 + α_4 r_4, the expressions for α_1 are
\[ \alpha_1 = c_1 + c_5 = c_2 + c_6 = c_3 + c_7 = c_4 + c_8. \]
The majority logic decision is α_1 = 0 and, similarly, α_2 = 1 and α_3 = 0. Thus,
\[ \mathbf{x} - (\alpha_1 \mathbf{r}_1 + \alpha_2 \mathbf{r}_2 + \alpha_3 \mathbf{r}_3) = 11011100 - \mathbf{r}_2 = 11101111. \]
Since the complement of this string has weight 1 ≤ 2^{3−2} − 1, we decode α_4 as 1. It follows that the codeword sent is 11001100.
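The majority votes in Example 6.26 can be reproduced with a few lines of Python (a sketch of Reed decoding for R(3) only; all names are illustrative).

    # Reed decoding sketch for R(3): recover alpha_i by majority vote over
    # the good pairs for row i (0-based column indices).
    x = [1, 1, 0, 1, 1, 1, 0, 0]                    # received string 11011100

    good_pairs = {1: [(0,4), (1,5), (2,6), (3,7)],  # columns 2^(3-i) apart
                  2: [(0,2), (1,3), (4,6), (5,7)],
                  3: [(0,1), (2,3), (4,5), (6,7)]}

    for i in (1, 2, 3):
        votes = [(x[u] + x[v]) % 2 for u, v in good_pairs[i]]
        alpha = 1 if sum(votes) > len(votes) // 2 else 0
        print("alpha_%d = %d" % (i, alpha))         # -> 0, 1, 0 as in the text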
Now we introduce the higher order Reed-Muller codes R(r, m). In order to introduce these codes, we begin with a discussion of Boolean functions and Boolean polynomials.
Definition 6.27. A Boolean function of m variables x_1, . . . , x_m is a function f(x_1, . . . , x_m) from F_2^m to F2.
(1) A Boolean monomial in m variables x_1, . . . , x_m of degree s is an expression of the form
\[ g(x_1, \dots, x_m) = x_{i_1} x_{i_2} \cdots x_{i_s}, \qquad 1 \le i_1 < i_2 < \cdots < i_s \le m. \]
(2) A Boolean polynomial in m variables x_1, . . . , x_m is a linear combination of Boolean monomials in these variables with coefficients in F2. The degree of a Boolean polynomial g is the largest of the degrees of the Boolean monomials that form g.
The set B_m of all Boolean functions of m variables forms a vector space of size 2^{2^m} over F2. The set B_m of all Boolean polynomials in m variables is a vector space over F2, as is the set B_{m,r} of all Boolean polynomials in m variables of degree at most r. Since there are \binom{m}{s} distinct Boolean monomials of degree s in m variables, the total number of distinct Boolean monomials is
\[ 1 + \binom{m}{1} + \cdots + \binom{m}{m} = 2^m, \]
and the total number of distinct Boolean polynomials in m variables is 2^{2^m}. This also happens to be the total number of Boolean functions of m variables, which is no mere coincidence.
Proposition 6.28. For every Boolean function f(x_1, . . . , x_m) in B_m, there is a unique Boolean polynomial g(x_1, . . . , x_m) in B_m for which
\[ f(\alpha_1, \dots, \alpha_m) = g(\alpha_1, \dots, \alpha_m) \quad \text{for all } (\alpha_1, \dots, \alpha_m) \in \mathbb{F}_2^m. \]
If we always agree to list the variables in the same order, we obtain a one-to-one correspondence between Boolean functions f ∈ B_m and binary strings a_f of length 2^m.
Example 6.29. Suppose that f ∈ B_3 is given by the table
\[
\begin{array}{ccc|c}
x_1 & x_2 & x_3 & f \\
\hline
0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 \\
1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 \\
1 & 1 & 1 & 1
\end{array}
\]
We obtain the binary string a_f = 01100011. Using a convenient abuse of notation and writing a binary string in place of the corresponding polynomial, we have
\[
\begin{aligned}
01100011 &= 0110 + x_1(0011 - 0110) = 0110 + x_1(0101) \\
&= 01 + x_2(10 - 01) + x_1(01 + x_2(01 - 01)) \\
&= 01 + x_2(11) + x_1(01 + x_2(00)) \\
&= x_3 + x_2(1) + x_1(x_3 + x_2(0)) \\
&= x_3 + x_2 + x_1 x_3.
\end{aligned}
\]
Definition 6.30. Let 0 ≤ r ≤ m. The r-th order Reed-Muller code R(r, m) is the set of all binary strings a_g of length n = 2^m associated with the Boolean polynomials g ∈ B_{m,r}.
Example 6.31. (1) The 0-th order Reed-Muller code R(0, m) consists of the binary strings associated with the constant polynomials 0 and 1, that is, R(0, m) = Rep(2^m). At the other extreme, the m-th order Reed-Muller code R(m, m) consists of all binary strings of length 2^m.
(2) The first order Reed-Muller code of length n = 2^2 is the set of all binary strings associated with the Boolean polynomials p of the form α_0 + α_1 x_1 + α_2 x_2, where α_i = 0 or 1. Thus, we can list the codewords in R(1, 2) as follows:
\[
\begin{array}{cc}
\text{Polynomial} & \text{Codeword} \\
\hline
0 & 0000 \\
x_1 & 0011 \\
x_2 & 0101 \\
x_1 + x_2 & 0110 \\
1 & 1111 \\
1 + x_1 & 1100 \\
1 + x_2 & 1010 \\
1 + x_1 + x_2 & 1001
\end{array}
\]
The Reed-Muller codes can be obtained using the u(u + v)-construction. Recall that if C1 is an (n, M1, d1)-code and C2 is an (n, M2, d2)-code, then the u(u + v)-construction yields a code C1 ⊕ C2 given by
\[ C_1 \oplus C_2 = \{ \mathbf{c}(\mathbf{c} + \mathbf{d}) \mid \mathbf{c} \in C_1,\ \mathbf{d} \in C_2 \}, \]
which is a (2n, M1 M2, d)-code with d = min{2d1, d2}.
Suppose that 0 < r < m and consider a codeword a_f ∈ R(r, m), where f ∈ B_{m,r}. We can factor the variable x_1 from those terms in which it appears and write f in the form
\[ f(x_1, \dots, x_m) = x_1 g(x_2, \dots, x_m) + h(x_2, \dots, x_m), \]
where g ∈ B_{m−1,r−1} and h ∈ B_{m−1,r}. Let a_g ∈ R(r − 1, m − 1) and a_h ∈ R(r, m − 1) be the binary strings corresponding to the polynomials g and h, respectively. The string corresponding to x_1 g(x_2, . . . , x_m) is 0a_g and, if we think of h as a Boolean polynomial in the m variables x_1, . . . , x_m, then the string corresponding to h is a_h a_h. Hence, the string corresponding to f is
\[ a_f = \mathbf{0}a_g + a_h a_h = a_h(a_h + a_g). \]
Theorem 6.32. For the Reed-Muller codes R(r, m), we have
(1) R(0, m) = Rep(2^m),
(2) R(m, m) = F_2^n, where n = 2^m,
(3) for 0 < r < m, R(r, m) = R(r, m − 1) ⊕ R(r − 1, m − 1), where ⊕ denotes the u(u + v)-construction.
In particular, R(r, m) has minimum distance 2^{m−r}.
Corollary 6.33. For r < m, R(r, m) contains codewords of even weight only.
Let a_f ∈ R(r, m) and a_g ∈ R(m − r − 1, m) with f ∈ B_{m,r} and g ∈ B_{m,m−r−1}. Observe that a_f · a_g ≡ w(a_{fg}) (mod 2). Since deg(fg) ≤ deg(f) + deg(g) ≤ m − 1, we have a_{fg} ∈ R(m − 1, m).
According to Corollary 6.33, w(a_{fg}) is even, which implies that a_f · a_g = 0.
Theorem 6.34. For 0 < r < m − 1, we have R(r, m)⊥ = R(m − r − 1, m).

Exercise
(1) Let C be a q-ary MDS [n, k]-code and let A_w be the number of codewords in C of weight w. Show that
\[ A_d = (q - 1)\binom{n}{n - k + 1}. \]
(2) We remarked in Theorem 6.23 that if C is a linear code with the same parameters as the Hamming code H2(h), then C is equivalent to H2(h). We now construct a binary nonlinear code V(h) with the same parameters as the Hamming code H2(h). Let n = 2^h − 1 and let f : H2(h) → Z2 be a nonlinear function with f(0) = 0. Let π : Z_2^n → Z2 be the function defined by
\[
\pi(\mathbf{x}) =
\begin{cases}
0 & w(\mathbf{x}) \equiv 0 \pmod 2, \\
1 & \text{otherwise}.
\end{cases}
\]
Now let
\[ \mathcal{V}(h) = \{ \mathbf{x}(\mathbf{x} + \mathbf{c})(\pi(\mathbf{x}) + f(\mathbf{c})) \mid \mathbf{x} \in \mathbb{Z}_2^n,\ \mathbf{c} \in H_2(h) \}. \]
Show that V(h) is a binary [2^{h+1} − 1, 2^{h+1} − h − 2, 3]-code. Show also that V(h) is nonlinear.
(3) In this exercise, we define and discuss the Nordstrom-Robinson code. This code has the interesting property that it has strictly larger minimum distance than any linear code with the same length and size.
(a) Let G = [I12 | A] be the generator matrix of G24. Show that, by permuting columns and using elementary row operations, the matrix G can be brought to the form
\[
G' = \left(\begin{array}{cccccccc|c}
1&0&0&0&0&0&0&1 & \\
0&1&0&0&0&0&0&1 & \\
0&0&1&0&0&0&0&1 & \\
0&0&0&1&0&0&0&1 & * \\
0&0&0&0&1&0&0&1 & \\
0&0&0&0&0&1&0&1 & \\
0&0&0&0&0&0&1&1 & \\
\hline
0&0&0&0&0&0&0&0 & \\
0&0&0&0&0&0&0&0 & \\
0&0&0&0&0&0&0&0 & * \\
0&0&0&0&0&0&0&0 & \\
0&0&0&0&0&0&0&0 &
\end{array}\right),
\]
where the asterisks represent some values (only the first eight columns are shown explicitly).
(b) Let C be the code generated by the generator matrix G′. Show that there are 8 × 2^5 = 256 codewords in C whose first eight coordinates are one of
10000001, 01000001, 00100001, 00010001, 00001001, 00000101, 00000011, 00000000.
(c) The Nordstrom-Robinson code N is the code whose codewords are obtained from the 256 codewords obtained above by deleting the first eight coordinate positions. Show that N is a (16, 256, 6)-code.
(d) Show that there is no linear (16, 256, 6)-code.
(4) Assuming the Reed-Muller code R(4) is used, decode the received word 0111011011100010.
(5) Find the Boolean polynomial corresponding to the binary string 1101111000011001.
(6) Show that for any Boolean function f(x_1, . . . , x_{m−1}), the function x_m + f(x_1, . . . , x_{m−1}) takes on the values 0 and 1 equally often.
(7) Find an expression for a generator matrix for R(r, m) in terms of generator matrices for R(r, m − 1) and R(r − 1, m − 1).

CHAPTER 7
Cyclic Codes

Basic Definitions
Definition 7.1. The right cyclic shift of a string x = x_1 · · · x_{n−1} x_n is the string x_n x_1 · · · x_{n−1} obtained by shifting each element to the right one position, wrapping the last element around to the first position. A linear code C is cyclic if whenever c ∈ C, then the right cyclic shift of c is also in C.
As an immediate consequence of this definition, if C is a cyclic code and c ∈ C, then the string obtained by shifting the elements of c any number of positions with wrapping is also a codeword in C.
Example 7.2. The binary code D = {0000, 1001, 0110, 1111} is not cyclic, since shifting 1001 gives 1100, which is not in D. However, D is equivalent to the cyclic code C = {0000, 1010, 0101, 1111}.
To get a better understanding of cyclic codes, it pays to think of strings as polynomials. In particular, to each string c = c_0 c_1 · · · c_{n−1}, we associate the polynomial
\[ c_0 + c_1 x + c_2 x^2 + \cdots + c_{n-1} x^{n-1}. \]
Note that addition and scalar multiplication of strings correspond to the analogous operations for polynomials.
Thus, we may think of a linear code C of length n over Fq as a subspace of the space P_n(Fq) of polynomials of degree less than n with coefficients in Fq.
We can express the process of performing a right cyclic shift in terms of operations on polynomials. Notice that multiplying a codeword p(x) = c_0 + c_1 x + · · · + c_{n−1} x^{n−1} by x gives
\[ x p(x) = c_0 x + c_1 x^2 + \cdots + c_{n-1} x^n, \]
which has some resemblance to a right cyclic shift, and indeed would be a right cyclic shift if we replaced x^n by x^0 = 1.
Let R_n(Fq) = Fq[x]/(x^n − 1). Recall that R_n(Fq) is the set of all polynomials over Fq of degree less than n. Addition in R_n(Fq) is the usual addition of polynomials, and multiplication is ordinary multiplication of polynomials followed by dividing by x^n − 1 and keeping only the remainder. Note that taking the product modulo x^n − 1 is very easy, since we simply take the ordinary product and then replace x^n by 1. As an example, in R_4(F2),
\[ (x^3 + x^2 + 1)(x^2 + 1) = x^5 + x^4 + x^3 + 1 = 1 \cdot x + 1 + x^3 + 1 = x^3 + x. \]
It is also important to note that, since x^n − 1 is not irreducible in Fq[x], the product of nonzero polynomials may equal the zero polynomial.
We can now think of a linear code C over Fq as a subspace of the vector space R_n(Fq). In addition, if p(x) ∈ C, then the right cyclic shift of p(x) is the polynomial x p(x). In general, applying k right cyclic shifts is equivalent to multiplying p(x) by x^k.
Lemma 7.3. A linear code C ⊆ R_n(Fq) is cyclic if and only if p(x) ∈ C implies that f(x)p(x) ∈ C for any f(x) ∈ R_n(Fq).
In the language of abstract algebra, the set R_n(Fq), together with the operations of addition, scalar multiplication and multiplication modulo x^n − 1, is an algebra over Fq. Any subset C of R_n(Fq) that is a vector subspace and also has the property described in Lemma 7.3 is called an ideal of R_n(Fq). That is, the cyclic codes in R_n(Fq) are precisely the ideals of R_n(Fq). An ideal C of R_n(Fq) is called a principal ideal if there exists a polynomial g(x) ∈ C such that
\[ C = \langle g(x) \rangle = \{ f(x)g(x) \mid f(x) \in R_n(\mathbb{F}_q) \}. \]
The following theorem, in the language of abstract algebra, says that every ideal of the ring R_n(Fq) is principal.
Theorem 7.4. Let C be a cyclic code in R_n(Fq). Then there is a unique polynomial g(x) in C that is both monic and has the smallest degree among all nonzero polynomials in C. Moreover, C = ⟨g(x)⟩.
The unique polynomial mentioned in Theorem 7.4 is called the generator polynomial of C.
Example 7.5. For the binary cyclic code C = {0, 1 + x^2, x + x^2, 1 + x}, we have
\[ 0 = 0 \cdot (1 + x), \quad x + x^2 = x \cdot (1 + x), \quad 1 + x^2 = x^2 \cdot (1 + x), \quad 1 + x = 1 \cdot (1 + x), \]
and so C = ⟨1 + x⟩. Since 1 + x has minimum degree in C, it is the generator polynomial for C. Notice also that
\[ 0 = 0 \cdot (1 + x^2), \quad 1 + x^2 = 1 \cdot (1 + x^2), \quad x + x^2 = x^2 \cdot (1 + x^2), \quad 1 + x = x \cdot (1 + x^2), \]
and so C is also generated by the polynomial 1 + x^2. However, since 1 + x^2 does not have minimum degree in C, it is not the generator polynomial for C.
It is very easy to characterize those polynomials that are generator polynomials.
Proposition 7.6. A monic polynomial p(x) ∈ R_n(Fq) is the generator polynomial of a cyclic code in R_n(Fq) if and only if it divides x^n − 1.
Proposition 7.6 is very important, for it tells us that there is precisely one cyclic code in R_n(Fq) for each factor of x^n − 1. Thus, we can find all cyclic codes in R_n(Fq) by factoring x^n − 1. We have seen that if g(x) is the generator polynomial for a cyclic code C, then C consists of all polynomial multiples of g(x).
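Arithmetic in R_n(F2) is easy to simulate. The following Python sketch (illustrative) represents polynomials as coefficient tuples of length n and reproduces the code of Example 7.5 as the set of multiples of its generator polynomial.

    from itertools import product

    # Multiplication in R_n(F_2): exponents wrap around modulo n because
    # x^n is identified with 1.
    def mult(p, q, n):
        r = [0] * n
        for i, a in enumerate(p):
            for j, b in enumerate(q):
                if a and b:
                    r[(i + j) % n] ^= 1
        return tuple(r)

    g = (1, 1, 0)                               # g(x) = 1 + x in R_3(F_2)
    C = {mult(f, g, 3) for f in product((0, 1), repeat=3)}
    print(sorted(C))   # {000, 011, 101, 110} = {0, x+x^2, 1+x^2, 1+x}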
We can easily obtain a basis for C from g(x).
Theorem 7.7. Let g(x) = g_0 + g_1 x + · · · + g_k x^k be the generator polynomial of a nonzero cyclic code C in R_n(Fq).
(1) C has basis B = {g(x), xg(x), . . . , x^{n−k−1} g(x)}.
(2) C has dimension n − deg(g(x)). In fact, C = {r(x)g(x) | deg(r(x)) < n − k}.
(3) C has generator matrix
\[
G =
\begin{pmatrix}
g_0 & g_1 & \cdots & g_k & 0 & \cdots & \cdots & 0 \\
0 & g_0 & g_1 & \cdots & g_k & 0 & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & & \vdots \\
0 & \cdots & 0 & g_0 & g_1 & \cdots & \cdots & g_k
\end{pmatrix},
\]
whose n − k rows each consist of a right cyclic shift of the row above.
We have seen that the generator polynomial g(x) of a cyclic code C ⊆ R_n(Fq) divides x^n − 1. Hence, we can write x^n − 1 = h(x)g(x), where h(x) ∈ R_n(Fq). The polynomial h(x), which has degree equal to the dimension of C, is referred to as the check polynomial of C. Since the generator polynomial is unique, so is the check polynomial. The following theorem shows why the check polynomial is important.
Theorem 7.8. Let h(x) = h_0 + h_1 x + · · · + h_{n−k} x^{n−k} be the check polynomial of a cyclic code C ⊆ R_n(Fq).
(1) The code C can be described by C = {p(x) ∈ R_n(Fq) | p(x)h(x) = 0}.
(2) A parity check matrix for C is given by
\[
P =
\begin{pmatrix}
h_{n-k} & \cdots & h_1 & h_0 & 0 & \cdots & \cdots & 0 \\
0 & h_{n-k} & \cdots & h_1 & h_0 & 0 & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & & \vdots \\
0 & \cdots & 0 & h_{n-k} & \cdots & \cdots & h_1 & h_0
\end{pmatrix}.
\]
(3) The dual code C⊥ is the cyclic code of dimension k with generator polynomial
\[ h^\perp(x) = h_0^{-1}\bigl(h_0 x^{n-k} + h_1 x^{n-k-1} + \cdots + h_{n-k}\bigr). \]
Example 7.9. Because x^9 − 1 factors over F2 into irreducible factors as
\[ x^9 - 1 = (x - 1)(x^2 + x + 1)(x^6 + x^3 + 1), \]
the code C = ⟨x^6 + x^3 + 1⟩ has check polynomial h(x) = (x − 1)(x^2 + x + 1) = x^3 + 1. Hence, C has parity check matrix
\[
P =
\begin{pmatrix}
1&0&0&1&0&0&0&0&0 \\
0&1&0&0&1&0&0&0&0 \\
0&0&1&0&0&1&0&0&0 \\
0&0&0&1&0&0&1&0&0 \\
0&0&0&0&1&0&0&1&0 \\
0&0&0&0&0&1&0&0&1
\end{pmatrix}.
\]

The Zeros of a Cyclic Code
If we have convenient access to the roots of the polynomial x^n − 1, then it is possible to characterize the cyclic codes in R_n(Fq) in a slightly different way than through the generator polynomial. Let x^n − 1 = \prod_i m_i(x) be the factorization of x^n − 1 into monic irreducible factors over Fq. If α is a root of m_i(x) in some extension field of Fq, then m_i(x) is the minimal polynomial of α over Fq. Thus, for any f(x) ∈ Fq[x], we have f(α) = 0 if and only if f(x) = h(x)m_i(x) for some h(x) ∈ Fq[x]. In particular, if we consider f(x) ∈ R_n(Fq), then f(α) = 0 if and only if f(x) ∈ ⟨m_i(x)⟩.
Now, since x^n − 1 has no multiple roots, if g(x) | x^n − 1 then g(x) = m_1(x) · · · m_t(x) is a product of distinct irreducible factors of x^n − 1. If α_i is a root of m_i(x) for i = 1, . . . , t, then
\[ \langle g(x) \rangle = \{ f(x) \in R_n(\mathbb{F}_q) \mid f(\alpha_i) = 0,\ i = 1, \dots, t \}. \]
Definition 7.10. The roots of the generator polynomial of a cyclic code are called the zeros of the code.
The representation of a cyclic code by its zeros can be used to show that some Hamming codes are cyclic codes. The binary case is the easiest, so let us consider it first. Let n = 2^r − 1. By Theorem 5.6, the set of all n distinct roots of x^n − 1 over F2 is the multiplicative cyclic group F*_{2^r} of nonzero elements of the field of 2^r elements containing F2. An element that generates F*_{2^r} is called a primitive field element of F_{2^r}. Suppose that β is a primitive field element of F_{2^r}. Consider the code
\[ C = \{ f(x) \in R_n(\mathbb{F}_2) \mid f(\beta) = 0 \}. \]
As mentioned above, C is the binary cyclic code whose generator polynomial g(x) is an irreducible factor of x^n − 1, namely the minimal polynomial of β.
Since β generates F*_{2^r}, we have F_2(β) = F_{2^r}, and since the degree of F_{2^r} over F_2 is r, we have deg(g(x)) = r. Hence, C is an [n, n − r]-code. If there were a polynomial f(x) = x^i + x^j ∈ R_n(F_2) with f(β) = 0, where 0 ≤ i < j < n, then we would have β^{j−i} = 1, which contradicts the assumption that β is primitive. Hence, the minimum distance of C is at least 3, which implies that it must be equal to 3 (by the sphere-packing condition). Therefore, C is a linear code with the same parameters as the Hamming code H_2(r) and hence is equivalent to H_2(r) by Theorem 6.23.

Example 7.11. Consider the Hamming code H_2(4). In this case, n = 2^4 − 1 = 15 and the splitting field for x^n − 1 is F_16. Consider the irreducible polynomial g(x) = x^4 + x + 1 over F_2 and suppose β is a root of g(x). Then we have that β^15 = 1, but β^3 ≠ 1 and β^5 ≠ 1. Hence β is a primitive field element of F_16. We conclude that H_2(4) is equivalent to the cyclic code generated in R_15(F_2) by g(x) = x^4 + x + 1.

In general, not every q-ary Hamming code is equivalent to a cyclic code. For instance, we can write down all ternary cyclic codes of length 4 and check that none of them has minimum distance 3; hence the ternary Hamming code H_3(2) is not equivalent to a cyclic code. However, we have the following result.

Proposition 7.12. Let n = (q^r − 1)/(q − 1) and assume that gcd(r, q − 1) = 1. Then the q-ary Hamming code H_q(r) is equivalent to a cyclic code.

The Idempotent Generator of a Cyclic Code

We have seen that a complete list of all cyclic codes in R_n(F_q) can be obtained from a factorization of x^n − 1 into monic irreducible factors over F_q. However, factoring x^n − 1 is not an easy task in general. In this section, we explore another approach to describing cyclic codes, involving a different type of generating polynomial than the generator polynomial.

Definition 7.13. A polynomial e(x) ∈ R_n(F_q) is said to be idempotent in R_n(F_q) if e(x)^2 = e(x).

Example 7.14. The polynomial x^3 + x^5 + x^6 is an idempotent in R_7(F_2) because

(x^3 + x^5 + x^6)^2 = x^6 + x^10 + x^12 ≡ x^6 + x^3 + x^5 (mod x^7 − 1).

Let C be a cyclic code in R_n(F_q) with generator polynomial g(x) and check polynomial h(x). Since x^n − 1 has no multiple roots, g(x) and h(x) are relatively prime, and so there exist polynomials a(x) and b(x) for which a(x)g(x) + b(x)h(x) = 1. Let e(x) = a(x)g(x) ∈ C. Then for any p(x) ∈ C, we have e(x)p(x) = (1 − b(x)h(x))p(x) = p(x), since p(x)h(x) = 0. In other words, e(x) is the unique identity in C, and hence an idempotent in R_n(F_q). Moreover, e(x) generates C, since every polynomial in C is a multiple of e(x).

Proposition 7.15. The polynomial e(x) is the unique polynomial in C that is both idempotent and generates C.

We will refer to the polynomial e(x) as the generating idempotent of C. We can also compute the generator polynomial g(x) from the generating idempotent e(x). In fact,

gcd[e(x), x^n − 1] = gcd[a(x)g(x), h(x)g(x)],

because x^n − 1 = g(x)h(x) and e(x) ≡ a(x)g(x) (mod x^n − 1). But a(x) and h(x) are relatively prime, and so gcd[e(x), x^n − 1] = g(x). From this, we have the following interesting relationship between the generator polynomial and the generating idempotent.

Proposition 7.16. Let C be a cyclic code in R_n(F_q) with generator polynomial g(x) and generating idempotent e(x).
(1) If α is a root of x^n − 1, then g(α) = 0 if and only if e(α) = 0.
(2) Suppose that f(x) is an idempotent in R_n(F_q) with the property that if α is a root of x^n − 1, then g(α) = 0 if and only if f(α) = 0. Then f(x) = e(x).
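All of the objects above are easy to compute with directly. As a quick mechanical check, the following Python sketch (the helper names shift_matrix and mul_mod are ours, with mul_mod repeated from the earlier sketch for self-containment) builds the generator and parity check matrices of Example 7.9 via Theorems 7.7 and 7.8, and verifies the idempotent of Example 7.14.

    # Polynomials are coefficient lists [c_0, ..., c_{n-1}], as before.

    def shift_matrix(coeffs, n, rows):
        """Rows are successive right shifts of coeffs, per Theorems 7.7 and 7.8."""
        return [[0] * i + coeffs + [0] * (n - len(coeffs) - i) for i in range(rows)]

    def mul_mod(p, q, n):
        """Product in R_n(F_2): multiply, then replace x^k by x^(k mod n)."""
        prod = [0] * n
        for i, a in enumerate(p):
            for j, b in enumerate(q):
                prod[(i + j) % n] ^= a & b
        return prod

    # Example 7.9: n = 9, g(x) = x^6 + x^3 + 1, h(x) = x^3 + 1.
    n, g, h = 9, [1, 0, 0, 1, 0, 0, 1], [1, 0, 0, 1]
    G = shift_matrix(g, n, n - (len(g) - 1))      # 3 rows: g(x), xg(x), x^2 g(x)
    P = shift_matrix(h[::-1], n, len(g) - 1)      # 6 rows: h(x) reversed, then shifted
    # Each row of G is a codeword, hence orthogonal mod 2 to every row of P.
    assert all(sum(a & b for a, b in zip(gr, pr)) % 2 == 0 for gr in G for pr in P)

    # Example 7.14: e(x) = x^3 + x^5 + x^6 is idempotent in R_7(F_2).
    e = [0, 0, 0, 1, 0, 1, 1]
    assert mul_mod(e, e, 7) == e

A polynomial Euclidean algorithm over F_q would similarly recover the generator polynomial from a generating idempotent as gcd[e(x), x^n − 1], as described above.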
Let C be a cyclic code with generating idempotent e(x) and check polynomial h(x). Since

h(x)(1 − e(x)) ≡ h(x)(1 − a(x)g(x)) ≡ h(x) (mod x^n − 1),

we see that 1 − e(x) is the identity in ⟨h(x)⟩. Hence the cyclic code ⟨h(x)⟩ has generating idempotent 1 − e(x). Similarly, we have the following result that relates the generating idempotents of a cyclic code and its dual.

Theorem 7.17. Let C be a cyclic code in R_n(F_q) with generating idempotent e(x). Then the dual code C^⊥ has generating idempotent 1 − e(x^{n−1}) ∈ R_n(F_q).

Encoding and Decoding with a Cyclic Code

There are two rather straightforward ways to encode message strings using a cyclic code: one is systematic and the other is nonsystematic. Let C = ⟨g(x)⟩ be a q-ary cyclic [n, n − r]-code, where deg(g(x)) = r. Thus, C is capable of encoding q-ary messages of length n − r.

We consider the nonsystematic method first. Given a source string a_0 a_1 · · · a_{n−r−1}, we form the message polynomial

a(x) = a_0 + a_1 x + · · · + a_{n−r−1} x^{n−r−1}.

This polynomial is encoded as the product c(x) = a(x)g(x).

To obtain a systematic encoder, we form the message polynomial

b(x) = a_0 x^{n−1} + a_1 x^{n−2} + · · · + a_{n−r−1} x^r.

Notice that b(x) has no terms of degree less than r. Next, we divide b(x) by g(x),

b(x) = h(x)g(x) + r(x), where deg(r(x)) < r

(here h(x) denotes simply the quotient, not the check polynomial), and send the codeword c(x) = b(x) − r(x).

Definition 7.18. A q-ary (n, q^k)-code is called systematic if there are k positions i_1, i_2, . . . , i_k with the property that, by restricting the codewords to these positions, we get all of the q^k possible q-ary strings of length k. The set {i_1, i_2, . . . , i_k} is called an information set and the codeword symbols in these positions are called information symbols.

Since b(x) and r(x) above have no terms of the same degree, this encoder is systematic. In fact, reading the terms from highest degree to lowest degree, we see that the first n − r positions are information symbols.

Example 7.19. Consider the binary cyclic [7, 4]-code generated by the polynomial g(x) = x^3 + x + 1, and consider the message 1001. Using the systematic encoder, we have b(x) = x^6 + x^3, and since

x^6 + x^3 = (x^3 + x)(x^3 + x + 1) + (x^2 + x),

the encoded message is c(x) = (x^6 + x^3) − (x^2 + x) = x^6 + x^3 + x^2 + x.

Since a cyclic code is a linear code, we can decode using the polynomial form of syndrome decoding. Let C be a cyclic code. If c(x) ∈ C is the codeword sent and u(x) is the received polynomial, then e(x) = u(x) − c(x) is the error polynomial. The weight of a polynomial is the number of its nonzero coefficients.

Definition 7.20. Let C = ⟨g(x)⟩ be a cyclic [n, n − r]-code with generator polynomial g(x). The syndrome polynomial of a polynomial u(x), denoted by syn(u(x)), is the remainder upon dividing u(x) by g(x), that is,

u(x) = h(x)g(x) + syn(u(x)), where deg(syn(u(x))) < deg(g(x)).

This definition of syndrome polynomial coincides with the definition of syndrome given for a parity check matrix of a linear code. As expected, a received polynomial u(x) is a codeword if and only if its syndrome polynomial is the zero polynomial. Also, two polynomials have the same syndrome polynomial if and only if they lie in the same coset of C. Thus, the polynomial form of syndrome decoding is analogous to the vector form.

Example 7.21. The binary cyclic [7, 4]-code generated by the polynomial g(x) = x^3 + x + 1 is single-error-correcting.
The coset leaders and corresponding syndrome polynomials are

coset leader    syndrome
0               0
1               1
x               x
x^2             x^2
x^3             x + 1
x^4             x^2 + x
x^5             x^2 + x + 1
x^6             x^2 + 1

If, for example, the polynomial u(x) = x^6 + x + 1 is received, we compute its syndrome polynomial from

x^6 + x + 1 = (x^3 + x + 1)(x^3 + x + 1) + (x^2 + x).

Since syn(u(x)) = x^2 + x, the corresponding coset leader is e(x) = x^4, and so we decode u(x) as c(x) = u(x) − e(x) = (x^6 + x + 1) − x^4 = x^6 + x^4 + x + 1.

The main practical difficulty with syndrome decoding is that the coset leader syndrome table might become quite large. However, we can take advantage of the fact that the code in question is cyclic, as follows. Let us denote the polynomial obtained from p(x) by performing s right cyclic shifts by p^(s)(x). Suppose that u(x) = c(x) + e(x), where c(x) ∈ C is the codeword sent and u(x) is the received polynomial. There must exist some s for which the cyclic shift e^(s)(x) of the nonzero error polynomial e(x) has nonzero coefficient of x^{n−1}. Since u^(s)(x) = c^(s)(x) + e^(s)(x), we have syn(u^(s)(x)) = syn(e^(s)(x)). Hence, we only need those rows of the coset leader syndrome table that contain coset leaders of degree n − 1. Let us illustrate this process by an example.

Example 7.22. Consider the binary cyclic [7, 4]-code generated by the polynomial g(x) = x^3 + x + 1. We only need a one-row table: syn(x^6) = x^2 + 1. Suppose that we receive u(x) = x^6 + x + 1. Since syn(u(x)) = x^2 + x is not in the table, we shift u(x) and compute its syndrome, which is syn(u^(1)(x)) = x^2 + x + 1. Again this is not in the table, so we shift again; computing the syndrome now gives syn(u^(2)(x)) = x^2 + 1. Since this syndrome is in the table, we deduce that e^(2)(x) = x^6, and hence e(x) = x^4.

Let us take a closer look at the relationship between the unknown error polynomial and the known syndrome polynomial. Suppose that u(x) = c(x) + e(x), where c(x) ∈ C = ⟨g(x)⟩ is the codeword sent and u(x) is the received polynomial. Since u(x) = h(x)g(x) + syn(u(x)), we have that e(x) − syn(u(x)) ∈ C. Suppose that C is a v-error-correcting code and that at most v errors have occurred in the transmission. Suppose further that syn(u(x)) has weight at most v. Then e(x) − syn(u(x)) is a codeword of weight at most 2v, which is less than the minimum weight of C, and so must be the zero codeword. Hence, we have the following.

Lemma 7.23. Let C be a v-error-correcting cyclic code, and suppose that at most v errors have occurred in the transmission. If the syndrome of the received polynomial u(x) has weight at most v, then the error polynomial is equal to syn(u(x)).

Of course, we may not be lucky enough to encounter a syndrome polynomial of weight at most v. However, if the syndrome polynomial of a cyclic shift of u(x) has weight at most v, then it is almost as easy to obtain the error polynomial from this syndrome polynomial. Suppose that the syndrome polynomial of the cyclic shift u^(s)(x) of u(x) has weight at most v. Then since u^(s)(x) = c^(s)(x) + e^(s)(x), Lemma 7.23 gives e^(s)(x) = syn(u^(s)(x)), and so the error polynomial e(x) can easily be recovered from syn(u^(s)(x)) by shifting an additional n − s places. This strategy is known as error trapping.

Example 7.24. Consider the binary cyclic [7, 4]-code generated by the polynomial g(x) = x^3 + x + 1. Suppose that we receive u(x) = x^6 + x + 1. We have syn(u^(1)(x)) = x^2 + x + 1, syn(u^(2)(x)) = x^2 + 1 and syn(u^(3)(x)) = 1, and hence e^(3)(x) = 1. This implies that e(x) = x^4, just as before.
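Both the systematic encoder and the error-trapping decoder are short computations with the division algorithm. The following Python sketch (the function names poly_rem, encode_systematic and trap_error are ours, not from the text, and polynomials are again coefficient lists [c_0, . . . , c_{n−1}]) reproduces Examples 7.19 and 7.24.

    def poly_rem(b, g):
        """Remainder of b(x) divided by g(x) over F_2."""
        b = b[:]
        for i in range(len(b) - 1, len(g) - 2, -1):   # clear coefficients down to x^(deg g)
            if b[i]:
                for j, gj in enumerate(g):
                    b[i - (len(g) - 1) + j] ^= gj
        return b[:len(g) - 1]

    def encode_systematic(msg, g, n):
        """Place message symbols in the top n - r positions, then subtract the remainder."""
        b = [0] * n
        for i, a in enumerate(msg):                   # b(x) = a_0 x^(n-1) + a_1 x^(n-2) + ...
            b[n - 1 - i] = a
        rem = poly_rem(b, g)
        return rem + b[len(rem):]                     # c(x) = b(x) - r(x); minus is plus in F_2

    def trap_error(u, g, n, v=1):
        """Shift u(x) until the syndrome has weight <= v (Lemma 7.23), then shift back."""
        for s in range(n):
            shifted = (u[-s:] + u[:-s]) if s else u[:]    # u^(s)(x): s right cyclic shifts
            syn = poly_rem(shifted, g)
            if sum(syn) <= v:
                e = syn + [0] * (n - len(syn))
                return e[s:] + e[:s]                      # n - s further shifts recover e(x)
        return None                                       # error not trapped

    g = [1, 1, 0, 1]                               # g(x) = x^3 + x + 1
    print(encode_systematic([1, 0, 0, 1], g, 7))   # [0, 1, 1, 1, 0, 0, 1]: x + x^2 + x^3 + x^6
    print(trap_error([1, 1, 0, 0, 0, 0, 1], g, 7)) # [0, 0, 0, 0, 1, 0, 0]: e(x) = x^4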
Let C be a v-error-correcting cyclic [n, n − r]-code. If v or fewer errors occur, and if they are confined to r consecutive positions (including wrap-around), then there must exist some s for which the cyclic shift u^(s)(x) of the received polynomial u(x) has its errors confined to the r coefficients of x^0, x^1, . . . , x^{r−1}. Thus, u^(s)(x) = c^(s)(x) + e^(s)(x) with deg(e^(s)(x)) < r. Hence, since c^(s)(x) is a codeword, we have syn(u^(s)(x)) = e^(s)(x). This says that error trapping can correct any v errors that happen to fall within r consecutive positions, including wrap-around.

The result above does not say that any burst of length r or less can be corrected. In fact, this is not possible because, according to Proposition ??, if a cyclic [n, n − r]-code C can correct all burst errors of length b or less, then we must have 2b ≤ r. However, if b(x) ∈ C is a nonzero burst of length r or less, then by performing cyclic shifts of b(x), we obtain a nonzero codeword in C with degree less than r (the degree of the generator polynomial), which is impossible. Hence, we have the following.

Proposition 7.25. A cyclic [n, n − r]-code C contains no nonzero bursts of length r or less. Hence, it can detect any burst error of length r or less.

Exercises

(1) Let C_1 = ⟨g_1⟩ and C_2 = ⟨g_2⟩ be two q-ary cyclic codes of length n.
(a) Show that C_1 ⊆ C_2 if and only if g_2(x) | g_1(x).
(b) Show that C_1 ∩ C_2 is also a cyclic code.
(c) Let C_1 + C_2 = {c_1 + c_2 | c_1 ∈ C_1, c_2 ∈ C_2}. Show that C_1 + C_2 is also a linear code.

(2) Let E_n be the set of even weight strings in F_2^n.
(a) Show that E_n is a cyclic code and E_n = ⟨x − 1⟩.
(b) Let C = ⟨g(x)⟩ be a binary cyclic code of length n. Show that w(c) is even for all c ∈ C if and only if x − 1 | g(x).

(3) Let g(x) be the generator polynomial of a binary cyclic [n, n − r]-code C. Suppose that C contains at least one codeword of odd weight.
(a) Show that the set E of all codewords in C of even weight is a cyclic code. What is the generator polynomial of E?
(b) Prove that ∑_{i=0}^{n−1} x^i ∈ C.

(4) Let C_1 and C_2 be two cyclic codes in R_n(F_q) with generating idempotents e_1(x) and e_2(x), respectively.
(a) Show that C_1 ⊆ C_2 if and only if e_1(x)e_2(x) = e_1(x) in R_n(F_q).
(b) Show that C_1 ∩ C_2 has generating idempotent e_1(x)e_2(x) in R_n(F_q).
(c) Show that C_1 + C_2 has generating idempotent e_1(x) + e_2(x) − e_1(x)e_2(x) in R_n(F_q).

(5) Show that any set of k consecutive positions in a cyclic [n, k]-code is an information set.

(6) Let g(x) be the generator polynomial of a binary cyclic [n, n − r]-code C.
(a) Let s_i(x) be the remainder obtained by dividing x^{r+i} by g(x). Show that the polynomials x^{r+i} − s_i(x), for i = 0, 1, . . . , n − r − 1, form a basis for C.
(b) Find the generator matrix for C obtained by using the basis in part (6a), and find a corresponding parity check matrix H.
(c) Suppose that u(x) = u_0 + u_1 x + · · · + u_{n−1} x^{n−1} is a received polynomial. How is the syndrome polynomial syn(u(x)) related to the syndrome (u_0, u_1, . . . , u_{n−1})H^t?

(7) Let C be a cyclic [n, k]-code with generator polynomial g(x) and let c_i = c_{i,1} c_{i,2} · · · c_{i,n} be codewords in C, for i = 1, . . . , s. We may interleave these codewords by juxtaposing the first position in each codeword, followed by the second position in each codeword, and so on, to obtain the string

c_{1,1} c_{2,1} · · · c_{s,1} c_{1,2} c_{2,2} · · · c_{s,2} · · · c_{1,n} c_{2,n} · · · c_{s,n}

Let us denote by C^(s) the set of all strings formed in this way from all possible choices of s codewords in C (taken in all possible orders). A small code sketch of this construction follows the exercises.
(a) Show that C^(s) is a cyclic [ns, ks]-code with generator polynomial g(x^s).
(b) Suppose that C is capable of correcting bursts of length b or less. Show that C^(s) is capable of correcting burst errors of length bs or less.
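To make the interleaving construction of Exercise (7) concrete, here is a small Python sketch (the helper name interleave is ours, and codewords are coefficient lists as in the earlier sketches):

    def interleave(codewords):
        """Juxtapose the first symbols of each codeword, then the second symbols, and so on."""
        return [c[i] for i in range(len(codewords[0])) for c in codewords]

    # Two codewords of the binary cyclic [7, 4]-code generated by g(x) = x^3 + x + 1:
    c1 = [1, 1, 0, 1, 0, 0, 0]      # g(x) = 1 + x + x^3
    c2 = [0, 1, 1, 0, 1, 0, 0]      # x g(x) = x + x^2 + x^4
    print(interleave([c1, c2]))
    # [1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0]  ->  a codeword of C^(2), the cyclic
    # [14, 8]-code with generator polynomial g(x^2) = 1 + x^2 + x^6, as part (a) predicts.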