Hatice Boylan and Nils-Peter Skoruppa

Coding Theory
Lecture Notes
Version: August 1, 2016

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Licence (CC BY-NC-ND 4.0). For details see http://creativecommons.org/licenses/by-nc-nd/4.0/.

© Hatice Boylan and Nils Skoruppa 2016

Contents

1 Fundamentals of Coding Theory
    1 What is coding theory
    2 Basic Notions
    3 Shannon's theorem
    4 Examples of codes
    5 Bounds
    6 Manin's theorem
2 Infinite Families of Linear Codes
    7 Reed-Solomon Codes
    8 Reed-Muller codes
    9 Cyclic codes
    10 Quadratic residue codes
3 Symmetry and duality
    11 Weight enumerators
    12 MacWilliams' Identity
    13 Duality
Appendix
    14 Solutions to selected exercises

List of Figures

1.1 \(H_a(x)\) for a = 2, 3, 4, 23
1.2 The Fano plane
1.3 The icosahedron
1.4 For q = 2 and n = 256, plot of the Hamming, Singleton, Griesmer, Gilbert-Varshamov and Plotkin bounds in red, green, blue, gray and purple, respectively.
    (We plotted the points \((d/n, R)\), where R is the maximal (respectively minimal, for Gilbert-Varshamov) rate admitted by the respective bound.)
2.1 The 528 32-ary Reed-Solomon codes in the δ,R-plane
2.2 The sets \(RM_2\) (red), \(RM_{16}\) (green), \(RM_{32}\) (blue) for r = 1, 2, ..., 10 in the δ,R-plane. The “Mariner” code \(RM_2(5, 2)\) is encircled
2.3 Lattice of binary cyclic codes of length 7. The divisors of \(x^7 - 1\) are 1, \(x + 1\), \(x^3 + x + 1\), \(x^3 + x^2 + 1\), \(x^4 + x^2 + x + 1\), \(x^4 + x^3 + x^2 + 1\), \(x^6 + x^5 + x^4 + x^3 + x^2 + x + 1\), \(x^7 + 1\)

Preface

These lecture notes grew out of courses on coding theory which the second author gave during the past 10 years at the University of Siegen, and a course given by the first author in 2015, when she was visiting Siegen with a Diesterweg stipend.

Hatice Boylan and Nils Skoruppa, Siegen, July 2016

Chapter 1
Fundamentals of Coding Theory

1 What is coding theory

In coding theory we meet the following scenario. A source emits information and a receiver tries to log this information. Typically, the information is broken up into atomic parts like letters from an alphabet, and information consists of words, i.e. sequences of letters. The problem is that the information might be disturbed by imperfect transport media, resulting in incidental changes of letters.

Real-life examples are the transmission of bits via radio signals for transmitting pictures from deep space to earth, e.g. pictures taken by a Mars robot. A more everyday example is the transmission of bits via radio signals for digital TV. The source could also be the sequence of bits engraved in an audio or video disk, and the transmission is then the reading by the laser of the CD reader: small vibrations of the device or scratches on the disk cause transmission errors.

Example 1.1. A source emits 0s and 1s, say, with equal probability.
Let p be the probability that an error occurs, i.e. that a 0 or 1 arrives as a 1 or 0 at the receiver. If p is very small we might decide to accept these errors, and if p is almost 1 we might also decide not to care, since we simply interpret 1 as 0 and vice versa, which again reduces the error probability to a negligible quantity. If the error probability is exactly 1/2 we cannot do anything but ask the engineers to study the problem of improving the transmission. However, if p is, say, only a bit smaller than 1/2 and we need a more reliable transmission, coding comes into play.

The natural idea is to fix a natural number n and, if we want to transmit the bit b, to send the sequence bb...b of length n. In other words, we encode b into a sequence of n-many bs. The receiver must, of course, be informed of this convention. He will then decode according to the principle of Maximum Likelihood Decoding: if he receives a sequence s of length n, he interprets it as a 0 if s contains more 0s than 1s, and as a 1 otherwise. In other words, he interprets s as a 0 if s more closely resembles a sequence of n-many 0s, and otherwise as 1. Here we assume for simplicity that n is odd, so that a word of length n can never contain an equal number of 0s and 1s.

What is now the probability of missing the right message? If we send a sequence of n-many 0s, then receiving instead any word with \(r \ge \frac{n+1}{2}\) many 1s would result in an error. The probability of receiving a given word of this kind is \(p^r(1-p)^{n-r}\), and there are \(\binom{n}{r}\) such words. The error probability is therefore now
\[ P_n = \sum_{r=(n+1)/2}^{n} \binom{n}{r} p^r (1-p)^{n-r}. \]
It is not hard to show (see below) that \(\lim_{n\to\infty} P_n = 0\). Therefore, our repetition code can improve a bad transmission to one as good as we want, provided the transmission error probability p for bits is strictly less than 1/2.

What makes the repetition code so efficient is the fact that its two code words are very different.
In fact they differ at all n places. However, there is a price to pay. Assume that you want to transmit a video of size 1 GB through a channel which has an error probability p = 0.1 when transmitting bits. This is certainly not acceptable, since it would mean that 10 percent of the received video consists of flickering garbage. We might like to transmit the video via the repetition code of length n. The first values of the sequence \(P_n\) are

P_1 = 1.000000e-01, P_3 = 2.800000e-02, P_5 = 8.560000e-03, P_7 = 2.728000e-03, P_9 = 8.909200e-04,
P_11 = 2.957061e-04, P_13 = 9.928549e-05, P_15 = 3.362489e-05, P_17 = 1.146444e-05, P_19 = 3.929882e-06.

For transmission errors below 0.1 percent we would have to choose n = 9, which would mean that we would have to transmit 9 GB for a video of only 1 GB. In this sense the repetition code seems very inefficient. What makes it so inefficient is that there are only two possible pieces of information, i.e. two code words, to transmit, but they have length n. In other words, there is only one bit of information for every n transmitted bits.

We would like to keep our idea but search for better codes. For example, for our case of transmitting a video we might try to find, for some (possibly large) number n, a subset C of the set \(\{0, 1\}^n\) of all sequences of length n of digits 0 or 1 which satisfies the following two properties:

1. Every two distinct sequences in C should differ in as many places as possible. In other words, the quantity
\[ d(C) = \min\{h(v, w) : v, w \in C,\ v \ne w\} \]
should be very large, where h(v, w) denotes the number of places where v and w differ.

2. The quotient
\[ R(C) = \frac{\log_2(|C|)}{n} \]
should be large as well.

The number \(\log_2(|C|)\) is the quantity of information (measured in bits) which is contained in every transmission of a sequence in C, i.e. in every transmission of n bits.
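To reproduce the values of \(P_n\) listed above, one can evaluate the binomial sum directly. The following Python sketch does this for p = 0.1; the function names (`repetition_error`, `majority_decode`, `chebyshev_bound`) are our own illustrative choices, not part of the notes. It also checks numerically that \(P_n\) stays below the bound \(p(1-p)/((\lambda-p)^2 n)\) with λ = 1/2 that appears in the proof of Proposition 1.2 below.

```python
from math import comb

def repetition_error(n: int, p: float) -> float:
    """P_n: probability that majority decoding of the binary repetition code
    of odd length n fails, i.e. that at least (n + 1)/2 of the n bits flip."""
    if n % 2 == 0:
        raise ValueError("n is assumed to be odd, so that no ties occur")
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r)
               for r in range((n + 1) // 2, n + 1))

def majority_decode(word: str) -> str:
    """Maximum Likelihood Decoding for the repetition code when p < 1/2:
    return the bit occurring more often in the received word."""
    return "1" if word.count("1") > word.count("0") else "0"

def chebyshev_bound(n: int, p: float, lam: float = 0.5) -> float:
    """The bound p(1-p) / ((lam - p)^2 n) from the proof of Proposition 1.2."""
    return p * (1 - p) / ((lam - p) ** 2 * n)

p = 0.1
for n in range(1, 20, 2):
    print(f"P_{n} = {repetition_error(n, p):.6e}")

# Smallest odd length whose failure probability is below 0.1 percent.
n_min = next(n for n in range(1, 101, 2) if repetition_error(n, p) < 1e-3)
print(n_min)                      # 9
print(majority_decode("01101"))   # 1
```

Running it should print the ten values of \(P_n\) above, followed by the length 9 needed for an error rate below 0.1 percent and the decoded bit of the received word 01101.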
The ratio R(C) has therefore to be interpreted as the amount of information per transmitted bit. We would then cut our video into pieces of length k, where \(k = \lfloor \log_2(|C|) \rfloor\), map these pieces via a function (preferably designed by an engineer) to the sequences in C, send the encoded words and decode them at the other end of the line using Maximum Likelihood Decoding. Maximum Likelihood Decoding will yield good results if d is very large, i.e. if the code words differ as much as possible. We shall see later (Shannon's Theorem) that there are codes C which have R(C) as close as desired to a quantity called channel capacity (which depends on p), and the probability of a transmission error in a code word as low as desired. Of course, the length n might be very long, which might cause engineering problems like an increased time needed for encoding or decoding.

We stress an important property of the repetition code which we discussed above. Namely, it can correct \(\frac{n-1}{2}\) errors. This means the following: if the sent code word and the received one do not differ in more than \(\frac{n-1}{2}\) places, Maximum Likelihood Decoding will return the right code word, i.e. it will correct the errors. In general we shall mostly be interested in such error-correcting codes. However, in some situations one might only be interested in detecting errors, not necessarily correcting them. Examples of such codes are the International Standard Book Numbers ISBN-10 and ISBN-13. Here every published book is associated with a unique identifier. In the case of ISBN-10 this is a word \(d_1 d_2 \cdots d_{10}\) of length 10 with letters from the alphabet 0, 1, ..., 9, X. The procedure of this association is not important to us (but see here for details). What is important for us is that it is guaranteed that the sum
\[ N := d_1 + 2d_2 + 3d_3 + \cdots + 10 d_{10} \]
is always divisible by 11 (where the symbol X is interpreted as the number 10).
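The divisibility condition on N is easy to test mechanically. Here is a minimal Python sketch with hypothetical helper names; hyphens are ignored and, following the alphabet described above, X is accepted at any position (real ISBN-10s use X only as the final check letter). The sample 0-306-40615-2 is used only because its weighted sum, 165, is divisible by 11. The last part confirms by brute force that every single wrong letter is detected.

```python
def isbn10_weighted_sum(isbn: str) -> int:
    """N = d1 + 2*d2 + ... + 10*d10, reading the letter X as 10 (hyphens ignored)."""
    letters = [ch for ch in isbn if ch != "-"]
    if len(letters) != 10:
        raise ValueError("an ISBN-10 consists of exactly 10 letters")
    digits = [10 if ch in "Xx" else int(ch) for ch in letters]
    return sum(i * d for i, d in enumerate(digits, start=1))

def isbn10_is_valid(isbn: str) -> bool:
    """The check condition: N must be divisible by 11."""
    return isbn10_weighted_sum(isbn) % 11 == 0

sample = "0-306-40615-2"
print(isbn10_weighted_sum(sample), isbn10_is_valid(sample))  # 165 True

# Replace each letter in turn by every other letter of the alphabet 0..9, X
# and check that the corrupted word always fails the test.
plain = sample.replace("-", "")
alphabet = "0123456789X"
single_errors_detected = all(
    not isbn10_is_valid(plain[:i] + a + plain[i + 1:])
    for i in range(10) for a in alphabet if a != plain[i]
)
print(single_errors_detected)  # True
```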
By elementary number theory the following happens: if exactly one letter is wrongly transmitted, then N is no longer divisible by 11. Indeed, if \(d_i\) is replaced by \(d_i' \ne d_i\), then N changes by \(i(d_i' - d_i)\); since 11 is prime and divides neither i nor \(d_i' - d_i\) (both are nonzero and lie strictly between −11 and 11), it does not divide this change. In other words, we can detect one error. However, there is no means to correct this error (unless we are told at which place the error occurs). We shall come back to this later, when we recall some elementary number theory.

A property of the binomial distribution

We prove the statement that the sequence of the \(P_n\) in the above example tends to 0. In fact, this can be obtained from Chebyshev's inequality applied to a sequence of random variables \(X_n\), where \(P(X_n = r) = \binom{n}{r} p^r (1-p)^{n-r}\), i.e. where \(X_n\) follows the binomial distribution with parameters n and p. This distribution describes the number of successes in a sequence of n independent trials, where the probability of success in a single trial is p. However, it is also possible to give a short direct proof avoiding the indicated concepts. For p < 1/2 we can choose λ = 1/2 in the proposition, and we obtain the claimed statement \(P_n \to 0\).

Proposition 1.2. For every 0 ≤ p ≤ 1 and every λ > p, one has
\[ \lim_{n\to\infty} \sum_{r \ge \lambda n} \binom{n}{r} p^r (1-p)^{n-r} = 0. \]

Proof. It is clear that
\[ \sum_{r \ge \lambda n} \binom{n}{r} p^r (1-p)^{n-r} \le \sum_{r=0}^{n} \left(\frac{r - np}{(\lambda - p)n}\right)^2 \binom{n}{r} p^r (1-p)^{n-r}, \]
since, for r ≥ λn, we have \(1 \le \frac{r - np}{(\lambda - p)n}\). But the right-hand side equals
\[ \frac{1}{(\lambda-p)^2 n^2} \left(\frac{d^2}{dt^2} - 2np\frac{d}{dt} + n^2p^2\right) \bigl(pe^t + 1 - p\bigr)^n \Big|_{t=0} = \frac{p(1-p)}{(\lambda-p)^2 n}, \]
which tends to 0.

Exercises

1.1. Find all subsets C of \(\{0, 1\}^5\) up to isomorphism, and compute d(C) and R(C) for each. (Two subsets are called isomorphic if one can be obtained from the other by a fixed permutation of the places of the other's sequences.)

1.2. Which book possesses the ISBN-10 “3540641∗35”? (First of all you have to find the 8th digit.)

2 Basic Notions

Let A be a finite set, henceforth called the alphabet, and fix a positive integer n. The elements of the Cartesian product \(A^n\) are called words over A of length n.
For two words v and w in \(A^n\) we define their Hamming distance as

h(v, w) = the number of places where v and w differ.

A subset C of \(A^n\) is called a code of length n.

As we saw in the first section, there are two quantities which are important to measure the efficiency of a code. The first one is its minimal distance
\[ d(C) := \min\{h(c_1, c_2) : c_1, c_2 \in C,\ c_1 \ne c_2\}. \]
The larger d(C), the more errors C can detect or even correct. Indeed, one has the following.

Theorem 2.1. A code with minimal distance d can correct via Maximum Likelihood Decoding up to \(\lfloor \frac{d-1}{2} \rfloor\) errors, and it can detect up to d − 1 errors.

Proof. Indeed, let c be a code word, let w be a word which we receive for c, and assume that w does not contain more than \(\lfloor \frac{d-1}{2} \rfloor\) errors. If c' is another code word, then
\[ h(c', w) \ge h(c, c') - h(c, w) \ge d - \frac{d-1}{2} = \frac{d-1}{2} + 1 > h(c, w) \]
(see Exercise 2.1 for the validity of the triangle inequality). Therefore, Maximum Likelihood Decoding would replace w by c, i.e. it decodes w correctly. If w differs from c in at least one but not more than d − 1 places, then w cannot be a code word and will hence be detected as erroneous, since two different code words have distance strictly greater than d − 1.

The second one is its information rate (or simply rate)
\[ R := \frac{\log_{|A|}(|C|)}{n} = \frac{\log |C|}{\log |A^n|}. \]
Here \(\log_a(x)\) denotes the logarithm of x to the base a (i.e. the number y such that \(a^y = x\)). One should think of it as follows. A set with N (= |C|) elements can describe (can be associated injectively to) sequences of k letters from an alphabet with a = |A| letters, where k is not larger than \(\log_a N\), since we need \(a^k \le N\). Thus the information provided by such a set is “k letters”. On the other hand, since \(C \subseteq A^n\), every element of C is communicated via a word of length n. Thus the rate of information provided by C is k/n.

Example 2.2. The repetition code of length n over A consists of the |A| words in \(A^n\) whose letters are all the same.
Here the minimal distance d equals n, which is the theoretically possible maximum for a code of length n. However, its rate is
\[ R = \frac{1}{n}, \]
which tends to zero for increasing n.

Strictly speaking, the given formula for the information rate is only well-defined if C is non-empty, and similarly, the formula for the minimal distance is well-defined only if C has at least two elements. In the following we tacitly assume that |C| ≥ 2 whenever this is necessary to give sense to a formula.

Very often A will be an Abelian group. In this case we can consider the Hamming weight of a word v in \(A^n\):

h(v) = number of places of v different from 0.

Clearly, h(v) = h(v, 0), where 0 denotes the neutral element in \(A^n\). Moreover, for a code C in \(A^n\) which is a subgroup one has
\[ d(C) = \min_{0 \ne v \in C} h(v). \]
Indeed, the sets \(\{h(v, w) : v, w \in C,\ v \ne w\}\) and \(\{h(v) : 0 \ne v \in C\}\) are the same (since, for v and w in C, we have h(v, w) = h(v − w), and v − w is in C, and h(v) = h(v, 0) for all v in C).

Even more often, A will be a finite field, and then more reasonably denoted by F. In this case \(F^n\) is a vector space over F. If F is a prime field, i.e. a field whose cardinality is a prime, every subgroup C of \(F^n\) is a sub-vector space of \(F^n\). If k is the dimension of C, and q denotes the number of elements of F, we have \(|C| = q^k\) (see below). For the rate of C we therefore have the simple formula
\[ R(C) = \frac{\dim_F C}{n}. \]
Subspaces of \(F^n\) are called linear codes of length n over F. In fact, we shall mostly be concerned with linear codes.

We shall later repeat the basics of the theory of finite fields. However, in many parts of the course we only need the finite field \(\mathbb{F}_2\) with two elements. For those knowing a bit of algebra or number theory, it suffices to recall that \(\mathbb{F}_2 = \mathbb{Z}/2\mathbb{Z}\). Otherwise, as usual in algebra, call the elements of the field \(\mathbb{F}_2\) in question 0 (for the additive neutral element) and 1 (for the multiplicative neutral element).
The multiplication is easily understood by thinking of 0 and 1 as “False” and “True”; the multiplication is then the logical “and”. Similarly, the addition corresponds to the logical “xor”, also known as “exclusive or”.

Cardinality of vector spaces over finite fields

Proposition 2.3. Let C be a finite-dimensional vector space over the finite field F. Then
\[ |C| = |F|^{\dim_F C}. \]

Proof. Let \(v_1, \dots, v_k\) be a basis of C. Then every element of C can be written in one and only one way as a linear combination \(a_1 v_1 + \cdots + a_k v_k\) with \(a_j\) in F. For each \(a_j\) we have |F| many choices, which results in \(|F|^k\) different linear combinations, i.e. elements of C.

Finite fields

If F is a finite field, its cardinality is a prime power \(q = p^n\). Conversely, for every prime power q there is one and, up to isomorphism, only one finite field with q elements. The finite fields can be constructed as follows. If p is a prime, then \(\mathbb{F}_p := \mathbb{Z}/p\mathbb{Z}\) is a field with p elements. Here \(\mathbb{Z}/p\mathbb{Z}\) is the quotient of the ring \(\mathbb{Z}\) by the ideal \(p\mathbb{Z}\). The elements of \(\mathbb{Z}/p\mathbb{Z}\) are the cosets \([r]_p := r + p\mathbb{Z}\), where r is an integer with 0 ≤ r < p. The addition and multiplication of two such cosets are given by \([r]_p + [s]_p = [t]_p\) and \([r]_p \cdot [s]_p = [u]_p\), where t and u are the remainders of division of r + s and r · s by p.

Similarly, if \(q = p^n\) is a prime power with n ≥ 2, then a field with q elements can be obtained as follows. Let \(\mathbb{F}_p[x]\) be the ring of polynomials with coefficients in the field \(\mathbb{F}_p\). Choose an irreducible polynomial f in \(\mathbb{F}_p[x]\) of degree n (such polynomials always exist). That f is irreducible means that f cannot be written as a product of two nonconstant polynomials in \(\mathbb{F}_p[x]\). Finally, the quotient \(\mathbb{F}_q := \mathbb{F}_p[x]/f\mathbb{F}_p[x]\) is a field with q elements. As before, the elements of \(\mathbb{F}_p[x]/f\mathbb{F}_p[x]\) are the cosets \([r]_f := r + f\mathbb{F}_p[x]\), where r runs through all polynomials in \(\mathbb{F}_p[x]\) whose degree is ≤ n − 1. Note that two cosets \([g_1]_f\) and \([g_2]_f\) are equal if and only if \(g_1 - g_2\) is divisible by f.
And as before, addition and multiplication of cosets are defined as \([r]_f + [s]_f = [t]_f\) and \([r]_f \cdot [s]_f = [u]_f\), where t and u are the (normalized) remainders of division of r + s and r · s by the polynomial f.

The field \(\mathbb{F}_q\) which we just defined depends a priori on the choice of f. In general there is more than one irreducible polynomial of degree n. For example, the polynomials \(f_1 := x^2 + 1\) and \(f_2 := x^2 + x - 1\) in \(\mathbb{F}_3[x]\) are both irreducible. However, it is a fact that all fields with a given number \(q = p^n\) of elements are isomorphic. An isomorphism \(\mathbb{F}_3[x]/f_1\mathbb{F}_3[x] \to \mathbb{F}_3[x]/f_2\mathbb{F}_3[x]\) is given by the map \([r]_{f_1} \mapsto [\tilde r]_{f_2}\), where \(\tilde r\) is the (normalized) remainder of \(r(x - 1)\) after division by \(f_2\).

A finite field F with \(q = p^n\) elements can be viewed as a vector space over \(\mathbb{F}_p\) when we define the scalar multiplication of elements \([r]_p\) of \(\mathbb{F}_p\) and λ of F by setting \([r]_p \cdot \lambda\) to be the r-fold sum of λ. It is a fact that F contains an element α such that \(1, \alpha, \alpha^2, \dots, \alpha^{n-1}\) is a basis of F as a vector space over \(\mathbb{F}_p\). This follows, for example, easily from the fact that \(F^* = F \setminus \{0\}\) is a cyclic group with respect to multiplication. Thus every element of F can be written in a unique way as a linear combination \(u_0 + u_1\alpha + u_2\alpha^2 + \cdots + u_{n-1}\alpha^{n-1}\) with elements \(u_j\) from \(\mathbb{F}_p\). If we take for F the field \(\mathbb{F}_q = \mathbb{F}_p[x]/f\mathbb{F}_p[x]\), then one can choose \(\alpha = [x]_f\). The fact that \(\alpha^n\) is a linear combination of \(1, \alpha, \alpha^2, \dots, \alpha^{n-1}\) translates into the fact that there is a unique normalized polynomial f of degree n in \(\mathbb{F}_p[x]\) such that f(α) = 0 (where normalized means that f is of the form \(x^n\) + terms of lower degree). The multiplication of two linear combinations \(u_0 + u_1\alpha + u_2\alpha^2 + \cdots + u_{n-1}\alpha^{n-1}\) is then done by applying the distributive law and using that \(\alpha^i \cdot \alpha^j = r(\alpha)\), where r is the remainder of \(x^{i+j}\) after division by f. The polynomial f is called the minimal polynomial of α. The mentioned facts about finite fields and their proofs can be found in most textbooks on algebra.
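The construction of \(\mathbb{F}_q\) as \(\mathbb{F}_p[x]/f\mathbb{F}_p[x]\) can be made concrete in a few lines. The Python sketch below (all helper names are our own) represents a coset by the coefficient tuple of its remainder of degree ≤ n − 1, lowest degree first, and reduces products modulo the monic polynomial f. It then checks exhaustively, for the two models of the field with 9 elements built from \(f_1 = x^2 + 1\) and \(f_2 = x^2 + x - 1\), that substituting x − 1 for x gives a ring isomorphism, while substituting x + 1 does not preserve multiplication.

```python
from itertools import product

def poly_add(a, b, p):
    """Sum of two residues (coefficient tuples, lowest degree first) mod p."""
    return tuple((x + y) % p for x, y in zip(a, b))

def poly_mulmod(a, b, f, p):
    """Product of residues a, b in F_p[x]/(f); f is monic, given as its
    coefficient list [f_0, ..., f_{n-1}, 1]."""
    n = len(f) - 1
    prod = [0] * (2 * n - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    # Reduce from the top: x^k = x^(k-n) * x^n and x^n = -(f_0 + ... + f_{n-1} x^(n-1)).
    for k in range(len(prod) - 1, n - 1, -1):
        c, prod[k] = prod[k], 0
        for t in range(n):
            prod[k - n + t] = (prod[k - n + t] - c * f[t]) % p
    return tuple(prod[:n])

def substitute(r, g, f, p):
    """Residue of r(g(x)) modulo f, evaluated by Horner's rule."""
    n = len(f) - 1
    result = (0,) * n
    for coeff in reversed(r):
        result = poly_add(poly_mulmod(result, g, f, p), (coeff,) + (0,) * (n - 1), p)
    return result

P = 3
F1 = [1, 0, 1]  # x^2 + 1
F2 = [2, 1, 1]  # x^2 + x - 1 = x^2 + x + 2 over F_3
ELEMENTS = list(product(range(P), repeat=2))  # residues c_0 + c_1 x

def is_isomorphism(g):
    """Check that [r]_{f1} -> [r(g)]_{f2} is a bijective ring homomorphism."""
    phi = {e: substitute(e, g, F2, P) for e in ELEMENTS}
    if len(set(phi.values())) != len(ELEMENTS):
        return False
    return all(
        phi[poly_add(a, b, P)] == poly_add(phi[a], phi[b], P)
        and phi[poly_mulmod(a, b, F1, P)] == poly_mulmod(phi[a], phi[b], F2, P)
        for a in ELEMENTS for b in ELEMENTS
    )

print(is_isomorphism((2, 1)))  # x -> x - 1 = x + 2: True
print(is_isomorphism((1, 1)))  # x -> x + 1: False
```

Since any substitution is additive, the decisive test is multiplicativity, which holds exactly when the image of x is a root of \(f_1\) in the second model.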
For these facts about finite fields, the reader might also consult Wikipedia.

A notion which will occur repeatedly is the ball of radius r around a word v in \(A^n\):
\[ B_r(v) := \{w \in A^n : h(v, w) \le r\}. \]
The number of words inside \(B_r(v)\) is
\[ V_a(n, r) := |B_r(v)| = \sum_{0 \le i \le r} \binom{n}{i} (a-1)^i, \]
where a = |A| (see Exercise 2.2 below). Note that this number is independent of v.

We want to introduce a measure for the “rate of uncertainty of information” transmitted through a channel which uses an alphabet with a ≥ 2 letters and transmits every letter with error probability p. If we use words of length n, we expect on average np errors per word. But then the received word lies in the ball of radius np around the sent one, i.e. it is one amongst \(V_a(n, pn)\) many. The information provided by these many words, measured again in “number of letters”, is \(\log_a(V_a(n, pn))\). The rate of uncertainty in this case is hence \(\frac{\log_a V_a(n, pn)}{n}\). We therefore define, for 0 ≤ p ≤ 1, the base-a entropy function
\[ H_a(p) := \lim_{n\to\infty} \frac{\log_a V_a(n, pn)}{n}. \]
By what we have seen, this is a sensible quantity to measure the “rate of uncertainty of information” for a base-a channel of error probability p.

Theorem 2.4. For any a ≥ 2 and \(0 \le p \le 1 - \frac{1}{a}\), the limit defining \(H_a(p)\) exists. Its value equals
\[ H_a(p) = p\log_a(a-1) - p\log_a p - (1-p)\log_a(1-p) \]
(where we understand \(H_a(0) = 0\)).

Note that \(H_a(x)\) increases continuously from 0 to 1 on the interval \([0, 1 - \frac{1}{a}]\). Its graphs for a = 2, 3, 4, 23 are shown in Figure 1.1.

Proof. Set \(k = \lfloor pn \rfloor\). We observe that \(\binom{n}{k}(a-1)^k\) is the largest of the terms in the formula for \(V_a(n, pn)\) (see Exercise 2.3). We conclude
\[ \binom{n}{k}(a-1)^k \le V_a(n, pn) \le (1+k)\binom{n}{k}(a-1)^k. \]
Since \(\frac{1}{n}\log(1+k) \to 0\), it suffices to consider
\[ \frac{1}{n}\log\left(\binom{n}{k}(a-1)^k\right) = \frac{1}{n}\bigl(\log n! - \log k! - \log(n-k)! + k\log(a-1)\bigr), \]
which, by Stirling's formula \(\log n! = n\log n - n + O(\log n)\) and \(k = pn + O(1)\), equals
\[ \log n - p\log(pn) - (1-p)\log((1-p)n) + p\log(a-1) + o(1) = H_a(p)\log a + o(1). \]
The theorem is now obvious.

Figure 1.1: \(H_a(x)\) for a = 2, 3, 4, 23

Exercises

2.1.
Show that \(A^n\) equipped with the Hamming distance is a metric space.

2.2. Prove the given formula for the number of words contained in the ball \(B_r(v) \subset A^n\).

2.3. In the notation and under the assumptions of Theorem 2.4, prove that for \(i \le k := \lfloor pn \rfloor\) one has \(\binom{n}{i}(a-1)^i \le \binom{n}{k}(a-1)^k\). (This inequality was used in the proof of Theorem 2.4.)

2.4. What happens to the limit defining \(H_a(p)\) for \(1 - \frac{1}{a} \le p \le 1\)? Does the limit exist? Can you determine its value?

2.5. For any prime p, determine the number of normalized irreducible polynomials of degree 2 in \(\mathbb{F}_p[x]\).

3 Shannon's theorem

Assume that we transmit letters of an alphabet A with an error probability p. Let C be a code of length n over A. The events that we want to study are modeled by the set \(E_C\) of pairs (c, m), where c is in C and m is a word of length n over A. Such a pair corresponds to the event that we transmit c and receive m. We assume that the probability that this event occurs is
\[ P_C(c, m) = \frac{1}{|C|}\left(\frac{p}{a-1}\right)^{h}(1-p)^{n-h}, \]
where h = h(c, m), i.e. h equals the number of places where c and m differ, and where a = |A|. Thus, the probability that a letter is transmitted wrongly is p, and then every letter different from the sent one is received with equal probability. Moreover, we are assuming that in our transmissions every code word in C occurs with the same probability. The probability \(P_C(S)\) that an event lies in a subset S of the event space \(E_C\) is then \(\sum_{e \in S} P_C(e)\). It is an easy exercise to see that indeed \(P_C(E_C) = 1\).

We apply the principle of Maximum Likelihood Decoding to decode a received word m. This means that we search for the closest c in C (with respect to the Hamming distance). If the minimum of h(c, m) is attained by exactly one code word c, we decode m as c.
Otherwise we throw an error (or, in practice and if necessary, decode m as a once and for all fixed code word, or as the first one amongst all code words attaining the minimal distance to m with respect to some ordering). The probability of a transmission error is hence \(P_C(E_C')\), where
\[ E_C' = \{(c, m) \in E_C : \exists\, c' \in C \setminus \{c\} : h(c', m) \le h(c, m)\}. \]

Theorem 3.1 (Noisy-Channel Coding Theorem). Assume \(0 \le p < (a-1)/a\). Let R be a real number such that \(0 < R < 1 - H_a(p)\). Then
\[ \mu_n := \frac{1}{\alpha(R, n)} \sum_{\substack{C \subseteq A^n \\ R(C) = \lfloor nR \rfloor / n}} P_C(E_C') \to 0 \]
for \(n \to \infty\). Here \(\alpha(R, n)\) denotes the number of codes C of length n over A with \(R(C) = \frac{\lfloor nR \rfloor}{n}\).

The interpretation of the theorem is clear. For any given R within the given bounds and any given ε > 0, there exists, for all sufficiently large n, a code of length n over A with transmission error probability less than ε and rate greater than R − ε. It is intuitively clear that the sum of the information rate of a code with probability of a transmission error close to 0 and the rate of uncertainty of information of the channel (i.e. \(H_a(p)\)) cannot be greater than 1, which is indeed the assumption of the theorem. The magical quantity \(1 - H_a(p)\) is called the channel capacity (of a transmission channel for an alphabet with a letters and with error probability p).

Proof. Let C be a code of length n. Fix a radius r and let \(D_C\) be the set of events (c, m) in \(E_C\) such that h(c, m) ≤ r and such that c is the only code word satisfying this inequality. Clearly, any (c, m) in \(D_C\) will be decoded correctly by Maximum Likelihood Decoding. Accordingly, the complement \(E_C''\) of \(D_C\) in \(E_C\) contains \(E_C'\), and so
\[ P_C(E_C') \le P_C(E_C''). \]
Let f(v, w) = 1 if h(v, w) ≤ r and f(v, w) = 0 otherwise, and, for an event (c, w) in \(E_C\), set
\[ g_C(c, w) = 1 - f(c, w) + \sum_{c' \in C \setminus \{c\}} f(c', w). \]
Then \(g_C(c, w) \ge 1\) on \(E_C''\) and \(g_C(c, w) = 0\) otherwise. Therefore
\[ P_C(E_C') \le \sum_{(c,w) \in E_C} g_C(c, w)\, P_C(c, w). \]
Rewriting this inequality in terms of the f(c, w) yields
\[ P_C(E_C') \le P_C(h > r) + \sum_{w \in A^n} \sum_{\substack{c, c' \in C \\ c \ne c'}} f(c', w)\, P_C(c, w), \]
where h is the Hamming distance and \(P_C(h > r)\) denotes the probability that an event (c, w) in \(E_C\) satisfies h(c, w) > r. We shall see in a moment that, for any given ε > 0, we can choose r for any sufficiently large n (independently of C) such that \(P_C(h > r) \le \varepsilon\). With such an r, and averaging over all C of length n with \(R(C) = \frac{\lfloor nR \rfloor}{n}\) (i.e. \(|C| = a^{\lfloor nR \rfloor}\)), we obtain for \(\mu_n\) the estimate \(\mu_n \le \varepsilon + \mu_n'\), where, for any r, we have
\[ \mu_n' = \sum_{w \in A^n} \sum_{\substack{c, c' \in A^n \\ c \ne c'}} f(c', w)\, \frac{1}{a^{\lfloor nR \rfloor}} \left(\frac{p}{a-1}\right)^{h(c,w)} (1-p)^{n-h(c,w)}\, \mathcal{A}_C\bigl(\chi_C(c')\chi_C(c)\bigr). \]
Here \(\mathcal{A}_C\) denotes the average over C and \(\chi_C\) the characteristic function of C. We estimate \(\mu_n'\) from above. For this we note
\[ \mathcal{A}_C\bigl(\chi_C(c')\chi_C(c)\bigr) = \frac{\#\{C \subseteq A^n : c, c' \in C,\ |C| = a^{\lfloor nR \rfloor}\}}{\alpha(R, n)}. \]
Set \(k = \lfloor nR \rfloor\). Then \(\alpha(R, n) = \binom{a^n}{a^k}\). The number of C with \(|C| = a^k\) and \(c, c' \in C\) equals the number of subsets of \(A^n \setminus \{c, c'\}\) of cardinality \(a^k - 2\), i.e. it equals \(\binom{a^n - 2}{a^k - 2}\). We insert these values into the expression for \(\mu_n'\), and we drop the condition c ≠ c' in the sum over c, c', so that this sum becomes two independent sums over c and over c', respectively. The sum of f(c', w) taken over c' equals \(|B_r(w)| = V_a(n, r)\), and the sum over c equals 1, so that the remaining sum over w contributes the factor \(a^n\). The contribution \(\mu_n'\) can therefore be estimated from above by
\[ \mu_n' \le \frac{\binom{a^n-2}{a^k-2}}{\binom{a^n}{a^k}} \cdot \frac{a^n}{a^k}\, V_a(n, r) = \frac{a^k - 1}{a^n - 1}\, V_a(n, r). \]
Choose now λ with \(p < \lambda < (a-1)/a\) such that still \(R < 1 - H_a(\lambda)\) (which is possible since \(1 - H_a(x)\) is continuous), and choose r = λn. Taking the base-a logarithm of the right-hand side of the last inequality and dividing by n yields
\[ \frac{\log_a \mu_n'}{n} \le \frac{\log_a(a^{\lfloor nR \rfloor} - 1)}{n} - \frac{\log_a(a^n - 1)}{n} + \frac{\log_a V_a(n, \lambda n)}{n}. \]
For \(n \to \infty\) this tends to \(\beta := R - 1 + H_a(\lambda)\). By the choice of λ we have β < 0. We conclude that, for any β' with β < β' < 0 and all sufficiently large n, \(\mu_n' \le a^{\beta' n}\). In particular, we see that \(\lim_{n\to\infty} \mu_n' = 0\).
It remains to prove the claim about the terms \(P_C(h > \lambda n)\). The mean value of the Hamming distance on \(E_C\) is E = np, and the variance equals \(\sigma^2 = np(1-p)\). By Chebyshev's inequality we therefore have, for any given ε > 0,
\[ P_C\Bigl(h > np + \sqrt{np(1-p)/\varepsilon}\Bigr) \le P_C\bigl(|h - E| > \sigma/\sqrt{\varepsilon}\bigr) \le \varepsilon. \]
But for sufficiently large n we have \(\lambda n \ge np + \sqrt{np(1-p)/\varepsilon}\). The claim of the theorem is now obvious.

Chebyshev's Inequality

We recall here Chebyshev's inequality. To avoid introducing unnecessary concepts from advanced probability theory, we confine ourselves to the case of a finite set E and a probability measure P on the set of its subsets. In other words, to every e in E is associated a number \(0 \le p_e \le 1\) such that \(\sum_{e \in E} p_e = 1\). The measure P(S) of a subset S of E is given by \(\sum_{e \in S} p_e\). Let h be a real- or complex-valued function on E (which, in the jargon of probability theory, would be called a random variable). The mean value E (or expectation value) and the variance \(\sigma^2\) of h are defined as
\[ E = \sum_{e \in E} h(e)\, p_e, \qquad \sigma^2 = \sum_{e \in E} |h(e) - E|^2\, p_e. \]

Proposition 3.2 (Chebyshev's Inequality). In the preceding notation one has, for any real k > 0,
\[ P(|h - E| \ge k\sigma) \le \frac{1}{k^2}. \]

For the simple proof of Chebyshev's Inequality we refer to Wikipedia.

Exercises

3.1. Prove that the mean value and the variance of the Hamming distance on \(E_C\) with respect to the probability measure \(P_C\) equal np and np(1 − p), respectively.

3.2. For a given w in \(\mathbb{F}_2^3\), compute the mean value of the random variable \(C \mapsto \chi_C(w)\) on the set of all 2-dimensional subspaces, where we assume that every subspace occurs with equal probability.

4 Examples of codes

Before we proceed to study more systematically how to produce codes with good minimal distance d and good information rate R, we review some classical codes. In fact, all codes in this section will be binary and almost all linear. Other examples will come in later sections.
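For codes as small as the ones reviewed in this section, the minimal distance d(C) and the rate R(C) can simply be computed by brute force over all pairs of code words. Here is a minimal Python sketch, with hypothetical helper names, representing binary words as tuples of 0s and 1s. It also appends a parity bit to every word of a code (the bit making the number of 1s even), which, for the linear codes tried here, raises the minimal distance by one exactly when it is odd.

```python
from itertools import combinations, product
from math import log

def hamming_distance(v, w):
    """Number of places where the words v and w differ."""
    return sum(a != b for a, b in zip(v, w))

def minimal_distance(code):
    """d(C): minimum over all pairs of distinct code words (needs |C| >= 2)."""
    return min(hamming_distance(v, w) for v, w in combinations(code, 2))

def rate(code, n, q=2):
    """R(C) = log_q |C| / n."""
    return log(len(code), q) / n

def extend_by_parity(code):
    """Append to each binary word the bit making its number of 1s even."""
    return [tuple(c) + (sum(c) % 2,) for c in code]

# Example 4.1: the two-out-of-five code
two_of_five = [c for c in product((0, 1), repeat=5) if sum(c) == 2]
print(len(two_of_five), minimal_distance(two_of_five), round(rate(two_of_five, 5), 6))
# 10 2 0.664386

# Parity extension of three linear codes of length 3
full = list(product((0, 1), repeat=3))        # all of F_2^3, d = 1
even = [c for c in full if sum(c) % 2 == 0]   # even-weight code, d = 2
rep3 = [(0, 0, 0), (1, 1, 1)]                 # repetition code, d = 3
for code in (full, even, rep3):
    print(minimal_distance(code), minimal_distance(extend_by_parity(code)))
# 1 2
# 2 2
# 3 4
```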
If we have any binary linear code C of length n, we can produce a new code \(\bar C\) by appending to each code word c a parity bit, namely the bit \(c_{n+1}\) in {0, 1} which has the same parity as h(c), i.e. as the number of 1s in c. If n is large this reduces the rate only slightly: if C has rate \(R = \frac{k}{n}\), then \(\bar C\) has rate \(\frac{k}{n+1}\). However, if C is linear and has minimal distance d, then \(\bar C\) has minimal distance d + 1 if d is odd, and the same minimal distance d if d is even (since the minimal distance of a binary linear code is the minimal number of 1s occurring in a nonzero code word). Thus C and \(\bar C\) correct the same number of errors, namely \(\lfloor \frac{d-1}{2} \rfloor\), but \(\bar C\) can detect one more error if d is odd.

Example 4.1 (Two-out-of-five code). This is the code consisting of all words of length 5 over {0, 1} which possess exactly two 1s. There are exactly \(10 = \binom{5}{2}\) code words, which might represent e.g. the digits 0, 1, ..., 9. This code is not linear. Its rate is \(\frac{\log_2 10}{5} = 0.664385\ldots\). It can obviously detect one error (since changing a 0 to 1 or vice versa will yield a word with one or three 1s). It can also detect three or five errors (since changing a code word at an odd number of places is the same as adding a word with an odd number of 1s, and hence changes the parity of the number of 1s). However, in general it does not detect two or four errors. Moreover, if one error occurs, we do in general not know where; so this code does not correct errors. Its minimal distance equals 2 (since all code words have an even number of 1s, any two of them have even distance, and e.g. 11000 and 10100 have distance 2).

Example 4.2 (Hamming code and extended Hamming code). Maybe the first error-correcting code which was applied as such is the Hamming code H(7, 4). This is a linear subspace of dimension 4 in \(\mathbb{F}_2^7\). Its rate is therefore 4/7. Its minimal distance is rather large for such a small code, namely 3. It can therefore correct one error (see Theorem 2.1).
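The Hamming code can be generated from a control matrix in a few lines. The Python sketch below assumes one common choice, not fixed in the notes: the 7 × 3 control matrix whose i-th row is the binary expansion of i, so that a word c is a code word exactly when the XOR of the positions i with \(c_i = 1\) vanishes. With this choice, if a single bit of a code word is flipped, the product of the received word with the control matrix (its syndrome) is the position of the flipped bit, which gives a simple single-error corrector. The names `syndrome` and `correct_one_error` are illustrative.

```python
from itertools import product
from math import log2

def syndrome(c):
    """XOR of the positions i (1..7) with c_i = 1.  This is c times the 7 x 3
    control matrix whose i-th row is the binary expansion of i."""
    s = 0
    for i, bit in enumerate(c, start=1):
        if bit:
            s ^= i
    return s

HAMMING = [c for c in product((0, 1), repeat=7) if syndrome(c) == 0]
min_weight = min(sum(c) for c in HAMMING if any(c))
print(len(HAMMING), round(log2(len(HAMMING)) / 7, 4), min_weight)  # 16 0.5714 3

def correct_one_error(w):
    """A single flipped bit at position j produces the syndrome j."""
    s = syndrome(w)
    if s == 0:
        return tuple(w)
    fixed = list(w)
    fixed[s - 1] ^= 1
    return tuple(fixed)

# Extended Hamming code H(8, 4): append a parity bit to every code word.
EXTENDED = [c + (sum(c) % 2,) for c in HAMMING]
print(min(sum(c) for c in EXTENDED if any(c)))  # 4

c = HAMMING[5]
received = list(c)
received[2] ^= 1                                # one transmission error
print(correct_one_error(received) == c)         # True
```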
The Hamming code is suitable for channels with low error probability, for example in ECC memory, which is used as RAM in critical servers. It is amusing to read the story which led Richard Hamming to find this code.

There are several ways to describe the Hamming code. First of all, as a 4-dimensional code over \(\mathbb{F}_2\) it has \(2^4 = 16\) code words (see Proposition 2.3), and we could simply list them all. This is very likely not very instructive. We can also write down a basis for it, i.e. a list of 4 vectors of length 7 which span it. Again this is not very instructive, in particular since such a basis is not unique. We can also describe it by giving 3 linearly independent vectors of length 7 which are perpendicular to the 16 code words of the Hamming code with respect to the natural scalar product on \(\mathbb{F}_2^7\); the code words are then exactly the vectors perpendicular to the given 3. One can combine these three vectors into a 7 × 3 matrix, and the Hamming code is then the left kernel of this matrix. Such matrices are called control matrices of the code in question (since multiplying a word from the right by such a matrix confirms that it is a code word if the result is the zero vector). A fourth method is to read the code words as characteristic functions of subsets of a set with 7 elements. Namely, fix a set \(\{P_1, P_2, \dots, P_7\}\) with seven elements. A code word \(c_1 c_2 \cdots c_7\) then corresponds to the subset \(\{P_i : c_i = 1\}\).

It is a truly beautiful fact that the 16 subsets of the Hamming code carry an additional structure which makes them such a distinguished collection. Namely, if we mark the 7 points \(P_i\) and connect every three of them by a “line” if they form a set corresponding to a code word with exactly three 1s, we obtain the following figure

Figure 1.2: The Fano plane

(the circle in the middle also has to be considered as a “line”). This figure is also known as the Fano plane or the projective plane over the field with 2 elements.
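The incidence properties of the Fano plane, and the perfectness of the Hamming code discussed below, can be verified by brute force. The Python sketch assumes one common choice of control matrix (the i-th of its seven rows is the binary expansion of i) and labels the point \(P_i\) by the integer i; this labelling and the helper names are ours, not the notes'. Lines are the supports of the code words of weight 3.

```python
from itertools import combinations, product

def syndrome(c):
    """XOR of the positions i (1..7) with c_i = 1; zero exactly on code words."""
    s = 0
    for i, bit in enumerate(c, start=1):
        if bit:
            s ^= i
    return s

def support(c):
    """The subset {P_i : c_i = 1}, with the point P_i labelled by i."""
    return frozenset(i for i, bit in enumerate(c, start=1) if bit)

CODE = [c for c in product((0, 1), repeat=7) if syndrome(c) == 0]
POINTS = frozenset(range(1, 8))
LINES = [support(c) for c in CODE if sum(c) == 3]

print(len(LINES))  # 7
pairs_on_one_line = all(sum({P, Q} <= L for L in LINES) == 1
                        for P, Q in combinations(POINTS, 2))
lines_meet_once = all(len(L & M) == 1 for L, M in combinations(LINES, 2))
three_lines_per_point = all(sum(P in L for L in LINES) == 3 for P in POINTS)

# The 16 code words: 7 lines, their 7 complements, the empty and the full set.
expected = set(LINES) | {POINTS - L for L in LINES} | {frozenset(), POINTS}
print({support(c) for c in CODE} == expected)  # True

# Perfect code: every word of F_2^7 lies in exactly one ball of radius 1.
perfect = all(
    sum(sum(a != b for a, b in zip(c, w)) <= 1 for c in CODE) == 1
    for w in product((0, 1), repeat=7))
print(pairs_on_one_line, lines_meet_once, three_lines_per_point, perfect)
```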
We see exactly 7 points and 7 lines. Every 2 points lie on exactly one line, and every 2 lines intersect in exactly one point. Every line contains exactly 3 points, and through every point pass exactly 3 lines. The 16 codewords of the Hamming code correspond to the 7 lines, the 7 complements of the lines, the empty set and the full set.

Let \(w, w'\) be the words corresponding to subsets \(S_1, S_2\). Their Hamming distance \(h(w, w')\) equals the cardinality \(|S_1 \triangle S_2|\) of the symmetric difference \(S_1 \triangle S_2 = (S_1 \setminus S_2) \cup (S_2 \setminus S_1)\). Two different lines share exactly one point, so the Hamming distance between the codewords of two different lines is 4. Continuing this line of reasoning, it is easy to verify that the minimal distance of \(H(7,4)\) is indeed 3 (see Exercise 1).

However, it is even easier to apply the criterion of Section 2. It states that the minimal distance of a linear code is the smallest number of 1s occurring in a codeword different from the zero word. The lines clearly correspond to the codewords of minimal Hamming weight, which is therefore 3.

The Hamming code \(H(7,4)\) possesses another striking property. The ball \(B_1(c)\) of radius 1 around a codeword \(c\) contains \(\sum_{i\le 1}\binom{7}{i} = 8\) points. Any two such balls around different codewords \(c, c'\) are disjoint: if \(w\) lay in both, then \(3 \le h(c,c') \le h(c,w) + h(c',w)\), so one of the terms on the right would be larger than 1. The number of codewords times the number of points in a ball of radius 1 equals \(16 \cdot 8 = 2^7\). Hence the balls of radius 1 around the codewords partition the space \(\mathbb{F}_2^7\). A code with such a property is called a perfect code.

We extend the Hamming code \(H(7,4)\) to the extended Hamming code \(H(8,4)\) by adding a parity bit. The extended code has rate \(\frac{1}{2}\), and its minimal distance increases to 4.

The projective n-space over a finite field

Let \(F\) be a field.
The set \(\mathbb{P}^n(F)\) of 1-dimensional subspaces of \(F^{n+1}\) is called the projective n-space over \(F\). For \(n = 1\) and \(n = 2\) it is simply called the projective line and the projective plane over \(F\), respectively. The projective space \(\mathbb{P}^n(F)\) carries interesting additional structure, and it has a very intuitive geometric meaning. The latter we do not pursue here but refer to this reference.

For the former, let \(f(x_0, \dots, x_n)\) be a homogeneous polynomial with coefficients in \(F\). It is meaningful to talk of the subset \(N(f)\) of all points \(P\) in \(\mathbb{P}^n(F)\) such that \(f(P) = 0\). Indeed, let \(w\) be a basis of the one-dimensional space \(P\). Then we can evaluate \(f\) at \(w\), and the property \(f(w) = 0\) does not depend on the choice of \(w\). If we choose another nonzero \(w'\) in \(P\), then \(w' = aw\) for some \(a \ne 0\) in \(F\). Since \(f\) is homogeneous, \(f(w') = a^d f(w)\), where \(d\) is the degree of \(f\).

If \(f\) is linear, i.e. has degree 1, then \(N(f)\) is called a hyperplane in \(\mathbb{P}^n(F)\), or simply a line if \(n = 2\). The projective plane over a finite field \(F\) with \(q\) elements consists of \(q^2 + q + 1\) points (see Exercise 2 below). Each line has \(q + 1\) points, and every point lies on exactly \(q + 1\) lines. If we sketch the points and lines in \(\mathbb{P}^2(F)\) for \(F = \mathbb{F}_2\), we rediscover the Fano plane.

Another description of codes: (n,k)-systems

An (n,k)-system over the finite field \(F\) is a pair \((V, S)\) consisting of:

- a \(k\)-dimensional vector space \(V\) over \(F\);
- a family \(S = \{P_i\}_{1\le i\le n}\) of \(n\) points in \(V\) such that \(S\) is not contained in any hyperplane in \(V\) (i.e. the vectors \(P_i\) generate \(V\)).

Note that clearly \(n \ge k\). An (n,k)-system describes a code of length \(n\) and dimension \(k\) over \(F\), namely
\[ C := \{(\varphi(P_1), \varphi(P_2), \dots, \varphi(P_n)) : \varphi \in V^*\}, \]
where \(V^*\) denotes the dual space of \(V\) (i.e. the space of linear maps from \(V\) to \(F\)).

Proposition 4.3. One has
\[ d(C) = n - \max\{\#S\cap H : H \subseteq V \text{ hyperplane}\}, \]
where \(\#S\cap H\) is the number of indices \(1 \le i \le n\) such that \(P_i \in H\). If the \(P_i\) are pairwise different, this is the number of \(P_i\) contained in \(H\).

Proof. Every hyperplane \(H\) is the kernel of a nonzero \(\varphi\) in \(V^*\) and vice versa. Moreover, \(\#S\cap H\) equals the number of zeros in \((\varphi(P_1), \varphi(P_2), \dots, \varphi(P_n))\), i.e.
\[ h(\varphi(P_1), \varphi(P_2), \dots, \varphi(P_n)) = n - \#S\cap H. \]
The proposition is now obvious.

Note that every linear code \(C\) of length \(n\) and dimension \(k\) over \(F\) can be obtained from an (n,k)-system. Indeed, let \(G\) be a generator matrix of \(C\) (i.e. the rows of \(G\) form a basis of \(C\)), and let \(P_i\) (\(1\le i\le n\)) be its \(k\times 1\) columns. Then \((F^{k\times 1}, \{P_i\}_{1\le i\le n})\) is an (n,k)-system, and \(C\) is the code associated to it by the preceding construction. Here \(F^{k\times 1}\) is the vector space of column vectors of length \(k\) with entries from \(F\).

Example 4.4 (The Golay and extended Golay code). The Golay code \(G_{23}\) is a binary linear code of length 23, dimension 12 (hence rate \(\frac{12}{23}\)) and minimal weight 7. Later, when we study cyclic codes, we shall see a natural (or rather conceptual) construction.

The extended Golay code \(G_{24} = \overline{G_{23}}\) is obtained by adding a parity bit to \(G_{23}\). Here we confine ourselves to describing a basis for \(G_{24}\). The code \(G_{23}\) is then obtained by erasing the last digit (or, for a given \(i\), the \(i\)th digit) from the codewords of \(G_{24}\).

The icosahedron possesses 12 vertices, 30 edges and 20 faces.

Figure 1.3: The icosahedron

Let \(A\) be the adjacency matrix of the icosahedron. That is, number the vertices and set \(A = (a_{ij})\), where \(a_{ij} = 1\) if the \(i\)th and \(j\)th vertices are joined by an edge, and \(a_{ij} = 0\) otherwise. Let \(B\) be the complement of \(A\), i.e. replace in \(A\) each 0 by 1 and vice versa. Then the rows of the matrix \((1|B)\), where \(1\) is the \(12 \times 12\) identity matrix, form a basis for \(G_{24}\).

This is admittedly not a very intuitive definition of the extended Golay code. But at least one can read off the matrix \((1|B)\) from the picture of the icosahedron and investigate \(G_{24}\) numerically.
A matrix like \((1|B)\), i.e. a matrix whose rows form a basis for a given linear code \(C\), is called a generator matrix of \(C\).

We described the Golay codes \(G_{24}\) and \(G_{23}\) here only up to some ambiguities. The adjacency matrix depends on the ordering of the vertices. We also obtain a priori different codes when we choose different \(i\)th places in the words of \(G_{24}\) for discarding. However, all these codes are isomorphic, i.e. they are the same up to simultaneous permutations of the places of the codewords.

In the icosahedron every vertex is joined by an edge to exactly 5 other vertices. Thus the adjacency matrix contains exactly five 1s in every row, and the complement \(B\) contains exactly \(7 = 12 - 5\) many 1s in every row. So the vectors of the given basis of \(G_{24}\) possess exactly eight 1s.

It turns out that every vector of length 24 with exactly five 1s can be converted into a codeword by adding three 1s, and in only one way. In other words, interpret again words in \(\mathbb{F}_2^{24}\) as subsets of a set \(X\) with 24 elements. Let \(\mathcal{S}\) be the collection of 8-element subsets of \(X\) which correspond to codewords of \(G_{24}\). Then every subset of \(X\) with five elements is contained in exactly one member of \(\mathcal{S}\). A system \(\mathcal{S}\) of subsets of \(X\) with this property is called a Steiner system \(S(5,8,24)\). The Steiner system provided by the vectors of Hamming weight 8 in \(G_{24}\) is called the Witt design.

There are \(\binom{24}{5} = 42504\) 5-subsets in \(X\), and every 8-subset contains exactly \(\binom{8}{5} = 56\) 5-subsets. Hence the total number of codewords of weight 8 is \(\binom{24}{5}/\binom{8}{5} = 759\).

The code \(G_{23}\) consists of \(2^{12}\) words. The minimal distance of \(G_{23}\) is 7, so the balls of radius 3 around its codewords are pairwise disjoint. Each such ball contains
\[ \binom{23}{0} + \binom{23}{1} + \binom{23}{2} + \binom{23}{3} = 2048 \]
words. Therefore \(|G_{23}| \cdot V_2(23,3) = 2^{12} \cdot 2^{11} = 2^{23}\). We deduce that the balls of radius 3 around the codewords partition \(\mathbb{F}_2^{23}\), i.e. \(G_{23}\) is perfect.
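Exercise 4.1 asks for a numerical check of \(G_{24}\). Below is a minimal sketch in plain Python 3 that builds \((1|B)\) directly from the icosahedron and enumerates all \(2^{12}\) codewords. It assumes the usual coordinates for the icosahedron: the cyclic permutations of \((0, \pm 1, \pm\varphi)\), with \(\varphi\) the golden ratio and edges of length 2. A different vertex numbering only permutes coordinates.

The sketch checks four things:

- the weight distribution, which for the extended Golay code is 1, 759, 2576, 759, 1 in weights 0, 8, 12, 16, 24;
- the Steiner property of the 759 words of weight 8;
- the minimal distance 7 of \(G_{23}\);
- the sphere-packing count \(2^{12} \cdot 2048 = 2^{23}\).

```python
from collections import Counter
from itertools import combinations, product
from math import comb, sqrt

# Assumption: vertices are the cyclic permutations of (0, +-1, +-phi);
# two vertices are joined by an edge iff their distance is 2.
phi = (1 + sqrt(5)) / 2
vertices = []
for s, t in product((1, -1), repeat=2):
    vertices += [(0, s, t * phi), (s, t * phi, 0), (t * phi, 0, s)]


def is_edge(u, v):
    return abs(sum((a - b) ** 2 for a, b in zip(u, v)) - 4) < 1e-9


# Adjacency matrix A and its complement B (0 <-> 1, so B has 1s on the diagonal).
A = [[int(i != j and is_edge(vertices[i], vertices[j])) for j in range(12)]
     for i in range(12)]
B = [[1 - a for a in row] for row in A]

# Rows of the generator matrix (1|B), each stored as a 24-bit integer:
# bit 23 is the first coordinate, bit 0 the last one.
rows = [int("".join(str(int(i == j)) for j in range(12)) + "".join(map(str, B[i])), 2)
        for i in range(12)]

# All 2^12 codewords of G24: F_2-linear combinations, i.e. XORs of rows.
codewords = []
for coeffs in product((0, 1), repeat=12):
    word = 0
    for c, r in zip(coeffs, rows):
        if c:
            word ^= r
    codewords.append(word)

weights = Counter(bin(w).count("1") for w in codewords)

# Weight-8 words should form a Steiner system S(5, 8, 24):
# every 5-subset of the 24 places lies in exactly one of them.
covered = Counter()
for w in codewords:
    if bin(w).count("1") == 8:
        support = [i for i in range(24) if (w >> i) & 1]
        covered.update(combinations(support, 5))

# G23: delete the last coordinate (bit 0), then check distance and perfectness.
g23 = {w >> 1 for w in codewords}
d23 = min(bin(w).count("1") for w in g23 if w)
ball = sum(comb(23, i) for i in range(4))  # V_2(23, 3)

print(sorted(weights.items()))
print(len(covered) == comb(24, 5), set(covered.values()))
print(len(g23), d23, ball, len(g23) * ball == 2 ** 23)
```

Storing codewords as integers keeps the enumeration to simple XORs and bit counts, which is fast enough for all 4096 words.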
The extended Golay code was implemented in the technical equipment of Voyager 1 and 2 for their mission in deep space, more specifically, for transmitting color images from Jupiter and Saturn (see for details).

We end this section with examples of several error-detecting, but not error-correcting, codes. We include them here because we meet them in everyday life.

Example 4.5 (ISBN 10). Recall the 10-digit International Standard Book Number code discussed in the first section. We identify its alphabet \(\{0, 1, \dots, 9, X\}\) with the elements of the field \(\mathbb{F}_{11} = \{[0]_{11}, [1]_{11}, \dots, [9]_{11}, [10]_{11}\}\). This code then becomes a linear code over \(\mathbb{F}_{11}\) of length 10, namely
\[ \mathrm{ISBN10} = \Big\{c_1c_2\cdots c_{10} \in \mathbb{F}_{11}^{10} : \sum_{j=1}^{10} j\cdot c_j = 0\Big\}. \]
As the kernel of a nonzero linear functional on \(\mathbb{F}_{11}^{10}\), the code ISBN10 is a hyperplane in \(\mathbb{F}_{11}^{10}\), i.e. a subspace of dimension 9.

The entry at the \(k\)th place of a codeword \(c_1c_2\cdots c_{10}\) is always a function of the other places:
\[ c_k = -[k]_{11}^{-1} \sum_{\substack{j=1\\ j\ne k}}^{10} j\cdot c_j. \]
Thus, if we change a codeword at one place, it is no longer a codeword. One error will therefore be detected, and it can be corrected if we know the place where it occurred. On the other hand, it is easy to change a codeword at two places and again obtain a valid codeword, using again the last formula. Summarizing, we have \(d(\mathrm{ISBN10}) = 2\) and \(R(\mathrm{ISBN10}) = \frac{9}{10}\).

Example 4.6 (ISBN 13/EAN 13). The 13-digit ISBN code is identical to the International Article Number code, also known as the EAN 13 barcode. It is the subgroup of \((\mathbb{Z}/10\mathbb{Z})^{13}\) defined as
\[ \mathrm{ISBN13} = \Big\{c_1c_2\cdots c_{13} \in (\mathbb{Z}/10\mathbb{Z})^{13} : \sum_{\substack{j=1\\ j \text{ odd}}}^{13} c_j + 3\sum_{\substack{j=1\\ j \text{ even}}}^{13} c_j = 0\Big\}. \]
Here we use the ring \(\mathbb{Z}/10\mathbb{Z}\) of residue classes modulo 10 (see below). This code is the kernel of the group homomorphism from \((\mathbb{Z}/10\mathbb{Z})^{13}\) onto \(\mathbb{Z}/10\mathbb{Z}\) given by
\[ w_1w_2\cdots w_{13} \mapsto \sum_{\substack{j=1\\ j \text{ odd}}}^{13} w_j + 3\sum_{\substack{j=1\\ j \text{ even}}}^{13} w_j. \]
Since this map is surjective, its kernel has cardinality \(10^{12}\).
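The two check-digit conditions of Examples 4.5 and 4.6 translate directly into code. Below is a minimal Python sketch; the helper names are ours, and the sample strings are simply words that satisfy the respective check equations. It computes the ISBN 10 check digit from the formula for \(c_k\) with \(k = 10\). It also confirms that every change at a single place is detected in both codes.

Finally, it shows that raising \(c_1\) and \(c_{10}\) by 1 each gives another valid ISBN 10. The weighted sum changes by \(1 + 10 = 11 = 0\) in \(\mathbb{F}_{11}\), in line with \(d(\mathrm{ISBN10}) = 2\).

```python
def isbn10_is_valid(isbn):
    """ISBN10 membership: sum_{j=1}^{10} j * c_j == 0 in F_11 ('X' stands for 10)."""
    digits = [10 if ch in "Xx" else int(ch) for ch in isbn if ch not in "- "]
    return len(digits) == 10 and sum(j * c for j, c in enumerate(digits, 1)) % 11 == 0


def isbn10_check_digit(first_nine):
    """c_10 = -[10]^{-1} * sum_{j<10} j * c_j in F_11 (pow(.., -1, ..) needs Python 3.8+)."""
    s = sum(j * int(ch) for j, ch in enumerate(first_nine, 1))
    c10 = (-pow(10, -1, 11) * s) % 11
    return "X" if c10 == 10 else str(c10)


def isbn13_is_valid(code):
    """ISBN13/EAN13 membership: weight 1 at odd places, 3 at even places, sum == 0 mod 10."""
    digits = [int(ch) for ch in code if ch.isdigit()]
    total = sum(c if j % 2 == 1 else 3 * c for j, c in enumerate(digits, 1))
    return len(digits) == 13 and total % 10 == 0


def single_errors(word, alphabet):
    """All words that differ from `word` in exactly one place."""
    for i, old in enumerate(word):
        for new in alphabet:
            if new != old:
                yield word[:i] + new + word[i + 1:]


isbn10 = "030640615" + isbn10_check_digit("030640615")
isbn13 = "9780306406157"
print(isbn10, isbn10_is_valid(isbn10), isbn13_is_valid(isbn13))  # 0306406152 True True
# Every single-place change is detected (both lines print False) ...
print(any(isbn10_is_valid(w) for w in single_errors(isbn10, "0123456789X")))
print(any(isbn13_is_valid(w) for w in single_errors(isbn13, "0123456789")))
# ... but two changes can produce another codeword.
print(isbn10_is_valid("1306406153"))  # True
```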
As with the ISBN 10 check digits, one sees easily that the minimal distance of the ISBN 13 check digit code is 2.

Example 4.7 (IBAN). The International Bank Account Number consists of up to 34 characters:

- the first two are upper case Latin letters indicating the country code, like DE or GB;
- then come two digits from the set \(\{0, 1, \dots, 9\}\), called check digits;
- finally come up to 30 characters from the 36-letter alphabet \(\{0, 1, \dots, 9, A, B, \dots, Z\}\). How many there are is country specific. In Germany this is essentially the “old” Bankleitzahl followed by the actual account number, suitably padded with 0s.

Such a string of characters is a valid IBAN if it passes the following test:

1. Take the given string and move the first four symbols to the end.
2. Replace all letters \(A, B, \dots, Z\) by \(10, 11, \dots, 35\), respectively.
3. Interpret the resulting string of digits as a number in base 10.
4. The string passes if the remainder of this number upon division by 97 is 1.

The German IBAN has exactly 22 characters. It consists of the letters DE, the two check digits, the 8 digits of the Bankleitzahl, and the account number, padded on the left with 0s to exactly 10 digits. Thus the set of valid German IBANs can be identified with the code
\[ \mathrm{IBAN}_{DE} = \Big\{1314\,c_1c_0\,b_{23}b_{22}\cdots b_6 \in \{0, 1, \dots, 9\}^{24} : \sum_{j=6}^{23} b_j\cdot 10^j + 131400 + 10\cdot c_1 + c_0 \equiv 1 \bmod 97\Big\}. \]
Here 1314 is the replacement of the characters DE. Since 97 is a prime number and 10 is relatively prime to 97, it follows, as for the ISBN 10 code, that \(\mathrm{IBAN}_{DE}\) can detect one error. It cannot correct it unless we know the place where the error occurred.

The ring Z/mZ of residue classes modulo m

The set \(\mathbb{Z}/m\mathbb{Z}\) and the addition and multiplication of its elements are defined as in the case that \(m\) is a prime number. However, in contrast to the prime number case, a nonzero element does not always have a multiplicative inverse.
In fact, \([r]_m\) has a multiplicative inverse if and only if \(r\) and \(m\) are relatively prime, i.e. when the greatest common divisor \(\gcd(r,m)\) of \(r\) and \(m\) is 1.

For two integers \(r\) and \(s\) we write \(r \equiv s \bmod m\) if \(r\) and \(s\) leave the same remainder upon division by \(m\), i.e. if \([r]_m = [s]_m\). It is easily verified that \(r \equiv s \bmod m\) if and only if \(m\) divides \(r - s\).

The subset of multiplicatively invertible elements forms a group with respect to multiplication, denoted by \((\mathbb{Z}/m\mathbb{Z})^*\). In fact, for every ring \(R\) the set of multiplicatively invertible elements forms a group with respect to multiplication. It is denoted by \(R^*\) and called the group of units of \(R\).

The cardinality of \((\mathbb{Z}/m\mathbb{Z})^*\) equals the number of integers \(0 \le r < m\) with \(\gcd(r,m) = 1\). This number is usually denoted by \(\varphi(m)\), and the map \(m \mapsto \varphi(m)\) is known as Euler's phi function. Formulas for it can be found in almost any textbook on elementary number theory. For a prime power \(p^n\) one obviously has \(\varphi(p^n) = p^n - p^{n-1}\). Indeed, the number of remainders modulo \(p^n\) not divisible by \(p\) equals the number of all remainders minus the number of remainders divisible by \(p\). As a consequence of the Chinese remainder theorem one has
\[ \varphi(m) = \prod_{p^n \,\|\, m} \left(p^n - p^{n-1}\right). \]
Here the product is taken over all prime powers which divide \(m\) exactly, i.e. which divide \(m\) such that \(m/p^n\) is no longer divisible by \(p\).

Control matrices and Hamming weight

It is sometimes easy to read off the minimal weight from the control matrix of a linear code. Namely, one has the following proposition.

Proposition 4.8. Let \(C \ne \{0\}\) be a linear code over the field \(F\), and let \(K\) be a control matrix of \(C\), so that \(C\) is the left kernel of \(K\). Then
\[ d(C) = \min\{r : K \text{ possesses } r \text{ linearly dependent rows}\}. \]

Note that the set over which we take the minimum is in any case not empty. Since \(C\) contains nonzero vectors, the rows of \(K\) are linearly dependent. We leave the easy proof of the proposition as an exercise.

As an example consider the Hamming code. A control matrix is
\[ K = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 1&1&0\\ 0&0&1\\ 1&0&1\\ 0&1&1\\ 1&1&1 \end{pmatrix}. \]
The matrix has full rank, since the 1st, 2nd and 4th rows form the unit matrix. Hence its left kernel has dimension \(7 - 3 = 4\). The rows are nonzero and pairwise distinct, so no one or two of them are linearly dependent, whereas the 1st, 2nd and 3rd rows sum to zero. From the proposition we deduce \(d(C) = 3\).

Exercises

4.1. Verify, using e.g. Sage, that \(G_{24}\) indeed has minimal distance 8.

4.2. For a field \(F\) with \(q\) elements let \(G_n^k(F)\) be the set of \(k\)-dimensional subspaces of \(F^n\). Show that \(|G_n^k(F)|\) equals the Gaussian binomial coefficient \(\binom{n}{k}_q\), i.e.
\[ |G_n^k(F)| = \binom{n}{k}_q = \frac{[q]_n}{[q]_k\,[q]_{n-k}}. \]
Here, for any \(q\) and any nonnegative integer \(n\), we use \([q]_n = (q^n - 1)(q^{n-1} - 1)\cdots(q - 1)\), with the convention \([q]_0 = 1\).

Hint: The cardinality in question equals the number of sequences of \(k\) linearly independent vectors in \(F^n\) divided by \(|GL(k,F)|\). Next, ask yourself how many nonzero vectors exist in \(F^n\). If \(w\) is such a vector, how many vectors exist in \(F^n \setminus \{a\cdot w : a \in F\}\)? And so on.

4.3. Prove Proposition 6.

4.4. For a code \(C\) with generator matrix \(G\), let \((V,S)\) be the (n,k)-system derived from \(G\) as described in the last paragraph of the add-on “Another description of codes: (n,k)-systems” above. Prove that \((V,S)\) is indeed an (n,k)-system, and that \(C\) equals the code associated to this system.

5 Bounds

It is plausible that there must be a trade-off between rate and minimal distance. A code with a high rate should have a small minimal distance, and a code with a large minimal distance should have few codewords, i.e. a small rate.

For later use it is convenient to introduce some vocabulary. We call a code an \((n,N,d)_q\)-code if it has length \(n\) over an alphabet with \(q\) letters, cardinality \(N\) and minimal distance \(d\). An \([n,k,d]_q\)-code is a linear code over the field with \(q\) elements of length \(n\), dimension \(k\) and minimal distance \(d\).

The first four theorems of this section translate the qualitative statement of the last paragraph into precise quantitative forms.
These theorems give, in particular, a first feeling for which parameter triples \((n,N,d)_q\) of length, cardinality and minimal distance are possible for codes over alphabets with \(q\) elements. Moreover, their proofs teach us certain techniques to obtain such bounds.

Clearly, for every \(d \le n\) there exists a code of length \(n\) and minimal distance \(d\) over an alphabet with \(q\) letters. An example is the code \(\{(a, \dots, a, b, \dots, b), (b, b, \dots, b)\}\), where \(a \ne b\) are any two letters of the given alphabet and the first word has \(a\) at the first \(d\) places followed by \(b\)s. However, how large can such a code be? We set
\[ A_q(n,d) = \max\{N : \text{an } (n,N,d)_q\text{-code exists}\}. \]
The first three theorems can be read as upper bounds for \(A_q(n,d)\). The fifth theorem, the Gilbert-Varshamov bound, gives a lower bound.

Theorem 5.1 (Hamming bound). Let \(C\) be a code of length \(n\) over an alphabet with \(q\) letters, of information rate \(R\) and with minimal distance \(d\). Then
\[ R + \frac{\log_q V_q\big(n, \lfloor\frac{d-1}{2}\rfloor\big)}{n} \le 1. \]

Proof. By the triangle inequality the balls of radius \(t := \lfloor\frac{d-1}{2}\rfloor\) around the codewords are pairwise disjoint. Therefore
\[ |C| \cdot V_q(n,t) \le q^n, \]
since \(q^n\) is the number of all possible words of length \(n\) over an alphabet with \(q\) letters. Taking the base-\(q\) logarithm yields the claimed inequality.

We call a code of length \(n\) over an alphabet \(A\) with \(q\) letters perfect if the inequality of the theorem becomes an equality. Equivalently, the balls of radius \(\lfloor\frac{d-1}{2}\rfloor\) around the codewords partition \(A^n\). Recall from Examples 4.2 and 4.4 that the Hamming code \(H(7,4)\) and the Golay code \(G_{23}\) are perfect codes. Their rates and minimal distances are \(\frac{4}{7}, 3\) and \(\frac{12}{23}, 7\), respectively.

Theorem 5.2 (Singleton bound). Let \(C\) be a code of length \(n\) over an alphabet with \(q\) letters, of information rate \(R\) and with minimal distance \(d\). Then
\[ R + \frac{d-1}{n} \le 1. \]

Proof. Consider the map \(c \mapsto c'\), where \(c'\) is obtained from \(c\) by deleting the first \(d - 1\) letters. It is injective, since two distinct codewords differ in at least \(d\) places. The image of this map is a code \(C'\) of length \(n - d + 1\), and thus contains at most \(q^{n-d+1}\) words. Therefore \(|C| = |C'| \le q^{n-d+1}\), and taking the base-\(q\) logarithm yields the claimed inequality.

Theorem 5.3 (Plotkin bound). Let \(C\) be a code of length \(n\) over an alphabet with \(q\) letters, of information rate \(R\) and with minimal distance \(d\). Then, for \(\frac{d}{n} > 1 - \frac{1}{q}\), one has
\[ R \le \frac{1}{n}\log_q \frac{d}{d - n\big(1 - \frac{1}{q}\big)}. \]

Proof. Let \(N = |C|\). For a letter \(a\) in the alphabet \(A\) of the code, let \(m_i(a)\) be the number of codewords of \(C\) which have \(a\) at the \(i\)th place. The number of ordered pairs of distinct codewords in \(C\) is \(N(N-1)\). We therefore have
\[ N(N-1)d \le \sum_{\substack{x,y\in C\\ x\ne y}} h(x,y) = \sum_{i=1}^{n}\sum_{a\in A} m_i(a)\big(N - m_i(a)\big). \]
The first inequality follows since \(d \le h(x,y)\) for \(x \ne y\). The formula on the right is obtained by summing over all places \(i\) and counting, for each place \(i\), the ordered pairs of codewords which differ at this place.

To estimate the sums on the right further, we note first of all that \(\sum_{a\in A} m_i(a) = N\). Furthermore, by the Cauchy-Schwarz inequality we have, for each \(i\),
\[ q\sum_{a\in A} m_i(a)^2 \ge \Big(\sum_{a\in A} m_i(a)\Big)^2 = N^2. \]
(Apply the Cauchy-Schwarz inequality to the \(q\)-vectors \((m_i(a))_{a\in A}\) and \((1, 1, \dots, 1)\).) We therefore obtain
\[ N(N-1)d \le n\Big(1 - \frac{1}{q}\Big)N^2, \quad\text{i.e.}\quad N\Big(d - n\Big(1 - \frac{1}{q}\Big)\Big) \le d. \]
The theorem is now obvious.

The next bound is a bound for linear codes.

Theorem 5.4 (Griesmer bound). Let \(C\) be a linear code of length \(n\) over a field \(F\) with \(q\) elements, of rate \(\frac{k}{n}\) and with minimal distance \(d\). Then
\[ \sum_{i=1}^{k} \Big\lceil \frac{d}{q^{i-1}} \Big\rceil \le n. \]

Proof. For positive integers \(k\) and \(d\), let \(N(k,d)\) be the minimal length of a linear code over \(F\) of dimension \(k\) and minimal distance \(d\). We show
\[ N(k,d) \ge N\big(k-1, \lceil d/q \rceil\big) + d. \]
Applying this inequality repeatedly implies the claimed bound, namely
\[ N(k,d) \ge N\big(k-1, \lceil d/q \rceil\big) + d \ge N\big(k-2, \lceil d/q^2 \rceil\big) + \lceil d/q \rceil + d \ge \dots \ge N\big(1, \lceil d/q^{k-1} \rceil\big) + \sum_{i=1}^{k-1}\Big\lceil \frac{d}{q^{i-1}} \Big\rceil = \sum_{i=1}^{k}\Big\lceil \frac{d}{q^{i-1}} \Big\rceil. \]
Here one also uses \(\big\lceil \lceil d/q^i \rceil / q \big\rceil = \lceil d/q^{i+1} \rceil\) and \(N(1,e) = e\).
For showing the first inequality, let \(C\) be an \([n,k,d]_q\)-code where \(n := N(k,d)\). We can assume that \(C\) contains a vector \(e\) consisting of \(d\) many 1s followed by 0s. (To arrange this, permute the places of all codewords simultaneously and multiply a given place of all codewords by a suitable nonzero element of \(F\).) Let \(D\) be a complement in \(C\) of the subspace spanned by \(e\), i.e. \(C = F\cdot e \oplus D\). Finally, let \(C'\) be obtained from \(D\) by deleting the first \(d\) places.

We claim that \(C'\) is an \([n - d, k - 1, d']_q\)-code with \(d' \ge \lceil d/q \rceil\). Granting this, delete successively suitable places of the codewords in \(C'\). This shortens \(C'\) to an \([n - d - s, k - 1, \lceil d/q \rceil]_q\)-code for some \(s\) (see Exercise 2), which proves the inequality.

We prove the claim on \(C'\). The code \(C'\) obviously has length \(n - d\).

The map which deletes the first \(d\) places is injective on \(D\). Otherwise there would be a nonzero codeword in \(D\) which has only 0s after the first \(d\) places. Subtracting a suitable multiple of \(e\) from it would then yield a nonzero codeword in \(C\) of weight \(< d\). Hence \(C'\) has dimension \(k - 1\).

Finally, let \(d'\) be the minimal weight of \(C'\), and take a nonzero codeword \(c\) in \(D\). Among the first \(d\) places of \(c\) there must be at least \(\lceil d/q \rceil\) which have the same entry, say \(a_0\). Otherwise every element \(a\) of \(F\) would occur only \(n_a < d/q\) many times amongst the first \(d\) places, and we would have \(d = \sum_{a\in F} n_a < \sum_{a\in F} d/q = d\). Now let \(c'\) denote the codeword in \(C'\) obtained from \(c\) by deleting the first \(d\) places. Then
\[ d - \lceil d/q \rceil + h(c') \ge h(c - a_0 e) \ge d. \]
It follows that \(d' \ge \lceil d/q \rceil\).

The technique of the proof which derived \(C'\) from \(C\) is sometimes known as constructing a residual code of \(C\).

As said at the beginning, the first three theorems can be read as upper bounds for \(A_q(n,d)\). Indeed, rewritten in terms of these numbers they state
\[ \frac{\log_q A_q(n,d)}{n} \le 1 - \frac{\log_q V_q\big(n, \lfloor\frac{d-1}{2}\rfloor\big)}{n}, \]
\[ \frac{\log_q A_q(n,d)}{n} \le 1 - \frac{d-1}{n}, \]
\[ \frac{\log_q A_q(n,d)}{n} \le \frac{1}{n}\log_q \frac{d}{d - n\big(1 - \frac{1}{q}\big)} \qquad \Big(\frac{d}{n} > 1 - \frac{1}{q}\Big). \]
The following is a lower bound.
Theorem 5.5 (Gilbert-Varshamov bound). For any positive integers \(d \le n\), one has
\[ 1 - \frac{\log_q V_q(n, d-1)}{n} \le \frac{\log_q A_q(n,d)}{n}. \]

Proof. Let \(N = A_q(n,d)\) and let \(C\) be an \((n,N,d)_q\)-code. Then there is no word \(w\) in \(A^n \setminus C\) which has distance \(\ge d\) to all codewords. Otherwise we could adjoin \(w\) to \(C\) and still keep the minimal distance, contradicting the maximality of \(N\). Therefore the balls of radius \(d - 1\) around the codewords cover all of \(A^n\). In particular,
\[ N \cdot V_q(n, d-1) \ge q^n. \]
Taking the base-\(q\) logarithm proves the theorem.

Figure 1.4: For \(q = 2\) and \(n = 256\), plot of the Hamming, Singleton, Griesmer, Gilbert-Varshamov and Plotkin bounds in red, green, blue, gray and purple, respectively. (We plotted the points \((\frac{d}{n}, R)\), where \(R\) is the maximal rate admitted by the respective bound, or the minimal rate for Gilbert-Varshamov.)

Exercises

5.1. Prove the identity
\[ \Big\lceil \frac{\lceil d/q^i \rceil}{q} \Big\rceil = \Big\lceil \frac{d}{q^{i+1}} \Big\rceil, \]
which we used in the proof of the Griesmer bound.

5.2. Let \(C\) be an \([n,k,d]_q\)-code, and assume \(d \ge 2\). Show that there is a place \(i\) with the following property: the code \(C'\) obtained from \(C\) by deleting the \(i\)th place of all codewords in \(C\) is an \([n - 1, k, d - 1]_q\)-code.

5.3. By a suitable adaptation of the proof of the Gilbert-Varshamov bound, prove the following. For a given field with \(q\) elements and given \(d \le n\), there also exists a linear \([n,k,d]_q\)-code such that
\[ n - \log_q V_q(n, d-1) \le k. \]

6 Manin’s theorem

For comparing codes \(C\) of different length it is useful to introduce the relative minimal distance
\[ \delta(C) := \frac{d(C)}{n}, \]
where \(n\) denotes the length of \(C\).

Let \(W_q(n)\) be the set of all points \((\delta, R)\) in the plane for which there exists a code of length \(n\) over an alphabet with \(q\) letters with minimal distance \(\delta n\) and information rate \(R\). This set lies inside the square \(0 \le \delta, R \le 1\). We are mainly interested in the maximal points of this set with respect to the componentwise partial ordering, i.e.
the ordering for which \((\delta, R) \le (\delta', R')\) if and only if \(\delta \le \delta'\) and \(R \le R'\). Namely, for a maximal point \((\delta, R)\) one has
\[ R = \max\{R' : \text{there exists an } (n, R', n\delta)_q\text{-code}\} \]
and also
\[ \delta = \max\{\delta' : \text{there exists an } (n, R, n\delta')_q\text{-code}\}. \]
In other words, whichever we fix, \(\delta\) or \(R\), the maximal points answer the question for the best available pair \((\delta, R)\).

However, at the moment it seems to be impossible to describe the set \(W_q(n)\), or even only its maximal points, precisely unless \(n\) is very small. The number of codes of length \(n\) over an alphabet with \(q\) letters equals the number of subsets of a set with \(q^n\) elements, which is \(2^{q^n}\). Even for \(q = 2\) and, say, \(n = 5\) there are \(2^{32} \approx 4 \cdot 10^9\) such codes. Computing the minimal distance for each of them would hit the border of what is currently possible. (One can, however, do much better by searching only for codes up to “Hamming-distance preserving isomorphism”, and only for codes which are maximal in the sense that adding another word decreases the minimal distance.)

It is already interesting enough to take \(q\) a prime power and to consider the sets \(V_q(n)\) of points \((\delta, R)\) which correspond to linear codes over \(\mathbb{F}_q\) of length \(n\). The number of these codes is
\[ N_q(n) := \sum_{k=0}^{n} \frac{[q]_n}{[q]_k\,[q]_{n-k}} \]
(see Exercise 2 in Section 4). For \(q = 2\) and \(n = 1, 2, \dots, 11\), the values are 2, 5, 16, 67, 374, 2825, 29212, 417199, 8283458, 229755605, 8933488744. Again, for \(n = 11\) one already has \(N_2(n) \approx 8.9 \cdot 10^9\) linear subspaces in \(\mathbb{F}_2^{11}\), which starts to run out of the range of feasible computations.

A more promising approach is to consider the set
\[ V_q := \bigcup_{n\ge 1} V_q(n), \]
and then, to “smoothen” it, the set \(U_q\) of its limit points. Recall that these are those points \(x\) in the plane for which every open neighborhood contains a point of \(V_q\) different from \(x\). Here one has the following theorem.

Theorem 6.1 (Manin). The set \(U_q\) of limit points of \(V_q\) is of the form
\[ U_q = \{(\delta, R) \in [0,1]^2 : 0 \le R \le a_q(\delta)\}, \]
where \(a_q : [0,1] \to [0,1]\) is a continuous, decreasing function, equal to 0 on \([1 - \frac{1}{q}, 1]\).
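The values \(N_2(n)\) quoted before Theorem 6.1 can be reproduced from the Gaussian binomial formula of Exercise 4.2. Below is a minimal Python sketch; the function names are ours. It adds a brute-force cross-check for very small \(n\), which literally enumerates the subsets of \(\mathbb{F}_2^n\) that contain 0 and are closed under addition. Over \(\mathbb{F}_2\) these are exactly the subspaces. The brute force already visits \(2^{2^n}\) subsets, which illustrates why exhaustive search stops being feasible so quickly.

```python
def q_bracket(q, n):
    """[q]_n = (q^n - 1)(q^(n-1) - 1) ... (q - 1), with [q]_0 = 1."""
    result = 1
    for i in range(1, n + 1):
        result *= q ** i - 1
    return result


def gaussian_binomial(n, k, q):
    """Number of k-dimensional subspaces of F_q^n (Exercise 4.2)."""
    return q_bracket(q, n) // (q_bracket(q, k) * q_bracket(q, n - k))


def number_of_linear_codes(n, q):
    """N_q(n): sum of the Gaussian binomials over all dimensions k."""
    return sum(gaussian_binomial(n, k, q) for k in range(n + 1))


def brute_force_count(n):
    """Count subsets of F_2^n (vectors encoded as n-bit integers) that contain 0 and
    are closed under addition (XOR); over F_2 these are exactly the subspaces."""
    count = 0
    for mask in range(2 ** (2 ** n)):
        members = [v for v in range(2 ** n) if (mask >> v) & 1]
        if mask & 1 and all((mask >> (u ^ v)) & 1 for u in members for v in members):
            count += 1
    return count


print([number_of_linear_codes(n, 2) for n in range(1, 12)])
print([brute_force_count(n) for n in range(1, 4)])  # [2, 5, 16]
```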
For the proof we introduce two simple procedures for “shortening” a code.

Lemma 6.2. Let \(C\) be an \([n,k,d]_q\)-code. Then:

- for every \(0 \le l < k\), there exist \([n - l, k - l, d]_q\)-codes;
- for every \(l < d\), there exists an \([n, k, d - l]_q\)-code;
- for every \(0 \le l < \min(k, d)\), there exist \([n, k - l, d - l]_q\)-codes.

Proof. For the first statement, choose \(l\) places where a codeword of weight \(d\) has zero coordinates. This is possible since by the Singleton bound we have \(k + d \le n + 1\), so such a codeword has \(n - d \ge k - 1 \ge l\) zero coordinates. Let \(C'\) be the subspace of vectors in \(C\) having vanishing coordinates at these positions. Its dimension equals \(k - r\), where \(r\) is the dimension of the image of the map projecting \(C\) onto the fixed \(l\) coordinates. Since \(r \le l\), \(C'\) has dimension \(\ge k - l\). Its minimal weight is clearly \(d\), since it contains the chosen codeword. Now delete the \(l\) chosen places and pass to a \((k - l)\)-dimensional subspace containing the chosen codeword. This gives an \([n - l, k - l, d]_q\)-code.

For the second statement, choose \(l\) places where a codeword of minimal weight has nonzero coordinates. Then the code \(C'\) obtained from \(C\) by replacing these coordinates in every codeword by 0 is an \([n, k, d - l]_q\)-code.

For the third statement, apply the second construction to a \((k - l)\)-dimensional subspace of \(C\) containing a codeword of minimal weight \(d\). This provides an \([n, k - l, d - l]_q\)-code.

Proof of Theorem 6.1. We follow essentially the original argument of Manin. Let \(\mathcal{A}\) be the pencil of lines in the \((\delta, R)\)-plane through \((0, 1)\), and let \(\mathcal{B}\) be the pencil of lines \(\delta - R = \text{const}\). For a point \((\delta_0, R_0)\):

- let \(A(\delta_0, R_0)\) be the line from \(\mathcal{A}\) through this point, and \(sA(\delta_0, R_0)\) the segment on this line from \((\delta_0, R_0)\) down to \((\frac{\delta_0}{1 - R_0}, 0)\);
- let \(B(\delta_0, R_0)\) be the line in \(\mathcal{B}\) through \((\delta_0, R_0)\), and \(sB(\delta_0, R_0)\) the segment from \((\delta_0, R_0)\) down to \((\delta_0 - R_0, 0)\).

We shall show below that, for every \((\delta_0, R_0)\) in \(U_q\), the segments \(sA(\delta_0, R_0)\) and \(sB(\delta_0, R_0)\) are contained in \(U_q\). This is the essential step of the proof. Indeed, for \(0 \le \delta \le 1\), set
\[ a_q(\delta) := \sup\{R : (\delta, R) \in U_q\}. \]
Note that the segment \(R = 0\), \(0 \le \delta \le 1\), lies in \(U_q\). Its points are limits of the points corresponding to the one-dimensional codes spanned by \((1, \dots, 1, 0, \dots, 0)\). So the sets whose suprema we take are indeed nonempty. Note furthermore that \((\delta, a_q(\delta))\) is in \(U_q\) for each \(\delta\), since \(U_q\), being a set of limit points, is closed. Therefore the segments \(sA(\delta, a_q(\delta))\) and \(sB(\delta, a_q(\delta))\) are contained in \(U_q\) too.

Now let \(0 \le x < y \le 1\). Then \((x, a_q(x))\) lies to the “left” of \(B(y, a_q(y))\). This is because \(a_q(x)\) is at least the \(R\)-coordinate of the point where the segment \(sB(y, a_q(y)) \subseteq U_q\) meets the line \(\delta = x\). Similarly, \((y, a_q(y))\) lies to the “right” of \(A(x, a_q(x))\). But then, for fixed \(x\), the “freedom” of \((y, a_q(y))\) is restricted to a segment on the line \(\delta = y\). This segment lies between the intersection points of this line with \(A(x, a_q(x))\) and \(B(x, a_q(x))\). Since this freedom approaches 0 as \(y\) tends to \(x\), we see that \(a_q(\delta)\) is continuous (a simple sketch makes this argument clear).

It is clear that \(a_q(0) = 1\). Indeed, \(V_q\) contains all points \((\frac{1}{n}, 1)\) (\(n\ge 1\)), which correspond to the trivial codes of length \(n\) containing all words of length \(n\).

The Plotkin bound (see the preceding section) implies that \(a_q(\delta) = 0\) for \(1 - \frac{1}{q} \le \delta \le 1\). Namely, if \(\delta\) is in this range, then there exists a sequence of codes \(C_n\) of type \([n,k,d]_q\) with \((\frac{d}{n}, \frac{k}{n}) \to (\delta, a_q(\delta))\) as \(n\) tends to infinity. But by the Plotkin bound we have
\[ \frac{k}{n} \le \frac{1}{n}\log_q \frac{d/n}{d/n - (1 - 1/q)} \to 0. \]

Finally, suppose \(R_0 \le a_q(\delta_0)\); we show that \((\delta_0, R_0)\) is in \(U_q\). The function \(\delta \mapsto \lambda(\delta) := \delta - (\delta_0 - R_0)\) is increasing. The continuous function \(a_q(\delta) - (\delta - (\delta_0 - R_0))\) is nonnegative at \(\delta = \delta_0\) and negative at \(\delta = 1\). Hence the line \(\delta - R = \delta_0 - R_0\) cuts the graph of \(a_q\) at some point \((\delta_1, a_q(\delta_1))\) with \(\delta_1 \ge \delta_0\). Then \((\delta_0, R_0)\) lies on the segment \(sB(\delta_1, a_q(\delta_1)) \subseteq U_q\).

For proving the claim, we note that for every \([n,k,d]_q\)-code \(C\) the set \(V_q\) contains the points
\[ A(C) := \Big\{\Big(\frac{d}{n-l}, \frac{k-l}{n-l}\Big) : 0 \le l < k\Big\}, \]
as follows from the lemma. These points all lie on the segment \(sA(\frac{d}{n}, \frac{k}{n})\).
Let \((\delta_0, R_0)\) be a point in \(U_q\), and let \(\{C_n\}\) be a sequence of codes with \((\delta(C_n), R(C_n))\) approaching \((\delta_0, R_0)\) and whose lengths tend to infinity. Then the sets \(A(C_n)\) approach and densely fill \(sA(\delta_0, R_0)\). A similar argument applies to the pencil \(\mathcal{B}\), using the shortening of \([n,k,d]_q\)-codes to \([n, k - l, d - l]_q\)-codes.

For showing that \(a_q(\delta)\) is decreasing, one may use the pencil of lines \(R = \text{const}\). Using the shortening of \([n,k,d]_q\)-codes to \([n, k, d - l]_q\)-codes for \(l < d\), one shows the following: for every \(\delta\), \(U_q\) contains the horizontal segment \(\{(\delta', a_q(\delta)) : 0 \le \delta' \le \delta\}\). In particular, if \(\delta' < \delta\), then \((\delta', a_q(\delta))\) is in \(U_q\). So \(a_q(\delta)\) is in the set whose supremum equals \(a_q(\delta')\), i.e. \(a_q(\delta) \le a_q(\delta')\).

One does not know much about \(a_q(\delta)\) except various bounds, which one can derive from bounds like the ones of the last section. We do this now. These bounds are obtained by letting the length \(n\) tend to infinity, which is why they are also called asymptotic bounds; we already used this technique in the proof.

Theorem 6.3. The function \(a_q(\delta)\) of the preceding theorem satisfies the following bounds:
\[ a_q(\delta) \le 1 - \delta \qquad \text{(Asymptotic Singleton bound)}, \]
\[ a_q(\delta) \le 1 - H_q\big(\tfrac{1}{2}\delta\big) \qquad \text{(Asymptotic Hamming bound)}, \]
\[ 1 - H_q(\delta) \le a_q(\delta) \qquad \text{(Asymptotic Gilbert-Varshamov bound)}. \]

Proof. The Singleton bound for codes of length \(n\) states that \(V_q(n)\) is contained in the (finite) set
\[ S_q(n) := \Big\{(\delta, R) \in \big([0,1] \cap \tfrac{1}{n}\mathbb{Z}\big)^2 : R + \delta \le 1 + \tfrac{1}{n}\Big\}. \]
Therefore \(V_q\) is contained in the union \(S_q\) of all \(S_q(n)\), and \(U_q\) is contained in the set \(\widehat{S}_q\) of limit points of \(S_q\). But \(\widehat{S}_q\) is contained in the set of all \((\delta, R) \in [0,1]^2\) such that \(R \le 1 - \delta\), which implies the asymptotic Singleton bound.

The asymptotic Hamming bound follows similarly by considering the sets
\[ H_q(n) := \Big\{(\delta, R) \in \big([0,1] \cap \tfrac{1}{n}\mathbb{Z}\big)^2 : R + \tfrac{1}{n}\log_q V_q\big(n, \lfloor(\delta n - 1)/2\rfloor\big) \le 1\Big\} \]
instead of \(S_q(n)\). Moreover, we use that \(\frac{1}{n}\log_q V_q(n, \delta n/2)\) tends to \(H_q(\delta/2)\) (see Theorem ).

Finally, for a given \(\delta\), choose a sequence of \([n_i, k_i, d_i]_q\)-codes such that \(\frac{d_i}{n_i} \to \delta\).
We can assume that, for each \(i\), the dimension \(k_i\) is maximal. By the Gilbert-Varshamov bound for linear codes (see Exercise 3 in Section 5) we have
\[ \frac{k_i}{n_i} \ge 1 - \frac{1}{n_i}\log_q V_q(n_i, d_i - 1). \]
Since the \(\frac{k_i}{n_i}\) lie in the closed interval \([0,1]\), there is a convergent subsequence. We can therefore assume that \(\frac{k_i}{n_i} \to R\) for some \(R\). If \(\delta\) is irrational, the set of rational numbers which are equal to at least one of the \(\frac{d_i}{n_i}\) cannot be finite. Therefore \((\delta, R)\) is a limit point of \(V_q\), and so \(a_q(\delta) \ge R\).

But \(R\), as the limit of the \(\frac{k_i}{n_i}\), is greater than or equal to the limit of the right-hand side of the Gilbert-Varshamov bound. By the theorem in Section 2, this limit equals \(1 - H_q(\delta)\). Since \(a_q\) and \(H_q\) are continuous, the asymptotic Gilbert-Varshamov bound is now obvious.

Chapter 2 Infinite Families of Linear Codes

7 Reed-Solomon Codes

Recall that the Singleton bound for a linear \([n,k,d]_q\)-code \(C\) states \(k + d \le n + 1\). Suppose we have equality, i.e. \(k + d = n + 1\); in other words, \(C\) is an \([n, k, n + 1 - k]_q\)-code. Then we call \(C\) a Maximum Distance Separable (MDS) code. In this section we introduce an infinite family of MDS codes.

Let \(F\) be a finite field with \(q\) elements, and fix a vector \(a\) of length \(n\) with pairwise different entries \(a_j\) in \(F\). Set
\[ \mathrm{Ev}_{a,k} : F[x]_{<k} \to F^n, \quad f \mapsto \big(f(a_1), f(a_2), \dots, f(a_n)\big). \]
Here \(F[x]_{<k}\) denotes the \(F\)-subspace of polynomials of degree \(< k\) in \(F[x]\). Its dimension equals \(k\); as a basis one might take, for instance, the polynomials \(1, x, \dots, x^{k-1}\).

For \(n \ge k\) the evaluation map \(\mathrm{Ev}_{a,k}\) is injective. Indeed, a nonzero polynomial of degree \(l < k\) has at most \(l < n\) zeros, and we assume that the \(a_j\) are pairwise different. The image \(\mathrm{RS}_q(a,k)\) of \(\mathrm{Ev}_{a,k}\) is a linear code over \(F\) of length \(n\), called the Reed-Solomon code of degree \(k - 1\) associated to \(a\). Note that for such a code to exist we need \(n \le |F|\), since we assume that the entries of \(a\) are pairwise different.
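The evaluation map \(\mathrm{Ev}_{a,k}\) is easy to implement, and Theorem 7.1 (stated next) can be checked by brute force for small parameters. Below is a minimal Python sketch. For simplicity it assumes a prime field \(\mathbb{F}_7\), so that field arithmetic is arithmetic modulo 7, and evaluation points \(a = (1, 2, \dots, 6)\); the function names are ours. A message \((m_0, \dots, m_{k-1})\) is read as the polynomial \(m_0 + m_1 x + \dots + m_{k-1}x^{k-1}\).

```python
from itertools import product


def rs_encode(message, points, p):
    """Ev_{a,k}: evaluate f = m_0 + m_1 x + ... + m_{k-1} x^(k-1) at each a_j, mod p."""
    def f(x):
        value = 0
        for coeff in reversed(message):  # Horner's rule, highest coefficient first
            value = (value * x + coeff) % p
        return value
    return tuple(f(a) for a in points)


def rs_code(points, k, p):
    """All codewords of RS_p(a, k), i.e. the image of Ev_{a,k} on F_p[x]_{<k}."""
    return {rs_encode(m, points, p) for m in product(range(p), repeat=k)}


def minimal_weight(code):
    return min(sum(1 for v in c if v) for c in code if any(c))


p = 7                    # assumption: q = p prime, so F_q is Z/pZ
a = (1, 2, 3, 4, 5, 6)   # n = 6 pairwise different evaluation points
for k in range(1, 5):
    code = rs_code(a, k, p)
    # |code| == p^k confirms injectivity; the last two numbers agree (MDS).
    print(k, len(code) == p ** k, minimal_weight(code), len(a) - k + 1)
```

Over a non-prime field such as \(\mathbb{F}_{256}\), the arithmetic modulo \(p\) would have to be replaced by genuine finite-field arithmetic.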
In particular, there are only finitely many Reed-Solomon codes over a given field F.

Theorem 7.1. A Reed-Solomon code \(\mathrm{RS}_q(a, k)\) of length n ≥ k over a field with q elements is an \([n, k, n-k+1]_q\)-code.

Proof. The only non-obvious statement concerns the minimal distance. For this, note that for a nonzero f of degree < k
\[ h\big(f(a_1), f(a_2), \dots, f(a_n)\big) = n - \#\{i : f(a_i) = 0\} \tag{2.1} \]
\[ \ge n - \deg(f) \ge n - k + 1. \tag{2.2} \]
For \(f = \prod_{i=1}^{k-1}(x - a_i)\) we have equality here.

As a consequence, a Reed-Solomon code with n ≥ k attains the Singleton bound and is therefore an MDS code. Note also that this does not contradict Manin's theorem. For a given F the set of Reed-Solomon codes is finite, and the associated set
\[ \Big\{ \Big(1 - \frac{k-1}{n}, \frac{k}{n}\Big) : 1 \le k \le n \le |F| \Big\} \]
of the δ,R-plane therefore has no limit points.

Figure 2.1: The 528 32-ary Reed-Solomon codes in the δ,R-plane

QR codes
Reed-Solomon codes and codes derived from them are used for error correction in data streams, for example in the transmission of audio and video streams; see for concrete examples. The reason is that Reed-Solomon codes can be used over fields with many elements. A bit stream is, for example, partitioned into bytes, and each byte represents an element of the field \(\mathbb{F}_{256}\). Burst errors, which typically occur in data streams, then lead to only a few errors in a code word over \(\mathbb{F}_{256}\). For example, a burst of 32 wrong bits starting at a byte boundary leads to 4 successive errors. These could be corrected by an \(\mathrm{RS}_{256}(a, 8)\) code over \(\mathbb{F}_{256}\) with an a of length 16. QR codes are a typical setting with burst errors: part of the paper on which they are printed might be missing, or a company might print its logo onto the QR code. To compensate for this, QR codes use Reed-Solomon error correction.

Exercises
7.1. What is the kernel of the map
\[ \mathrm{Ev} : F[x] \to F^q, \quad f \mapsto \big(f(a_1), f(a_2), \dots, f(a_q)\big), \]
where q = |F| and \(F = \{a_1, a_2, \dots, a_q\}\)?
(Hint: The requested kernel is in fact an ideal of F[x], and as such a principal ideal.)
7.2. What can one say about the dimension of \(\mathrm{RS}_q(a, k)\) for k strictly greater than the length of a?

8 Reed-Muller codes

We can formally generalize the construction of the previous section in a straightforward way by considering multivariate polynomials. However, a first difficulty is the study of the kernels of the evaluation maps. For Reed-Solomon codes these maps are injective if the number of points at which we evaluate a polynomial is larger than its degree. The reason is that a nonzero polynomial of degree l has at most l zeros. So the first task is to study the zeros of multivariate polynomials over a finite field.

For this, fix a finite field F. For a polynomial f in \(F[x_1, \dots, x_r]\) let
\[ N(f) := \{ x \in F^r : f(x) = 0 \} \]
be the set of zeros of f in \(F^r\).

Theorem 8.1 (Schwartz-Zippel lemma). If f is not the zero polynomial, then \(\#N(f) \le l \cdot |F|^{r-1}\), where l denotes the degree of f.

Proof. We can assume that l < |F|, since otherwise the inequality is trivial. We follow here the short proof by Dana Moshkovitz. Write f = g + h, where g is homogeneous of degree l and h contains only monomials of degree strictly less than l. By the subsequent lemma we can find a y in \(F^r\) such that g(y) ≠ 0. Since g is homogeneous, y ≠ 0. The space \(F^r\) can therefore be partitioned into \(|F|^{r-1}\) lines of the form \(L_x = \{x + ty : t \in F\}\). The restriction p(t) := f(x + ty) of f to \(L_x\) is a polynomial in t of degree l of the form \(g(y)t^l +\) lower terms. Therefore it has at most l zeros. The theorem is now obvious.

Lemma 8.2. If g is a nonzero polynomial in r variables of degree l < |F|, then there exists a y in \(F^r\) such that g(y) ≠ 0.
Note that the assumption l < |F| is not superfluous. The polynomial \(x^q - x\), where q = |F|, is nonzero, but \(a^q = a\) for every a in F (since \(F^*\) is a group of order q − 1, so that \(a^{q-1} = 1\) for every a in \(F^*\)). Note also that the bound of Theorem 8.1 is sharp. If L is a nonzero linear form in r variables, then N(L) consists of \(|F|^{r-1}\) points. Choosing l (< |F|) pairwise different elements \(a_j\) of F, the polynomial \(\prod_j (a_j + L)\) has degree l and exactly \(l \cdot |F|^{r-1}\) zeros.

Proof of Lemma 8.2. There exists an a in F such that \(g_1(x_1, \dots, x_{r-1}) := g(x_1, \dots, x_{r-1}, a)\) is not identically zero. Otherwise \(x_r - a\) would divide g for all a in F, contradicting the assumption that the degree of g is strictly less than |F|. Applying the same argument to the polynomial \(g_1\) yields a b in F such that \(g_2(x_1, \dots, x_{r-2}) := g_1(x_1, \dots, x_{r-2}, b)\) is not the zero polynomial. Continuing in this way, we finally find a \(y = (\dots, b, a)\) in \(F^r\) such that g(y) ≠ 0.

We can now proceed as in the previous section. Choose \(n \le |F|^r\) and a vector a of length n whose entries \(a_i\) are pairwise different points in \(F^r\). The map
\[ \mathrm{Ev}_{a,k} : F[x_1, \dots, x_r]_{<k} \to F^n, \quad f \mapsto \big(f(a_1), \dots, f(a_n)\big) \]
is then injective if k ≤ |F| and \((k-1)|F|^{r-1} < n\), since a nonzero f of degree < k cannot vanish at more than \((k-1)|F|^{r-1}\) points (by Theorem 8.1). Here the subscript "< k" denotes the subspace of polynomials in r variables whose degree is strictly less than k. Note that the dimension of \(F[x_1, \dots, x_r]_{<k}\) equals \(\binom{r+k-1}{k-1}\) (see below). Moreover, if f is nonzero, then the vector \(\mathrm{Ev}_{a,k}(f)\) has at most \((k-1)|F|^{r-1}\) zero entries, that is, its Hamming weight is at least \(n - (k-1)|F|^{r-1}\). Furthermore, as we saw, there is a polynomial attaining this bound. We can summarize:

Theorem 8.3. Assume \(1 \le k \le |F|\) and \((k-1)|F|^{r-1} < n\). The image of \(\mathrm{Ev}_{a,k}\) is then an \(\big[n, \binom{r+k-1}{r}, n - (k-1)|F|^{r-1}\big]_{|F|}\) code.

If one chooses \(n = q^r\) and \(1 \le k \le q\), where q = |F|, the assumptions of the theorem are fulfilled.
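For tiny parameters the code of Theorem 8.3 with \(n = q^r\) can be generated by brute force. The sketch below assumes that q = p is prime (arithmetic modulo p) and that 1 ≤ k ≤ p. The name rm_parameters is a hypothetical helper. The sketch evaluates all monomials of total degree < k at all points of \(\mathbb{F}_p^r\) and enumerates every linear combination. It prints the resulting length, dimension and minimal distance next to the values predicted by the theorem. The case p = 2, r = 5, k = 2 is the Mariner code discussed below.

```python
from itertools import product
from math import comb

def rm_parameters(p, r, k):
    """Return (n, dim, d) for the evaluation code of polynomials of degree < k on all of F_p^r.

    Brute force, meant only for tiny parameters; assumes p prime and 1 <= k <= p.
    """
    points = list(product(range(p), repeat=r))
    # exponent vectors (i_1, ..., i_r) with i_1 + ... + i_r <= k - 1
    monomials = [e for e in product(range(k), repeat=r) if sum(e) <= k - 1]
    # one row per monomial: its values at all points (a generator matrix)
    rows = []
    for e in monomials:
        row = []
        for pt in points:
            v = 1
            for x, i in zip(pt, e):
                v = v * pow(x, i, p) % p
            row.append(v)
        rows.append(row)
    codewords = set()
    for coeffs in product(range(p), repeat=len(rows)):
        word = tuple(sum(c * rw[j] for c, rw in zip(coeffs, rows)) % p
                     for j in range(len(points)))
        codewords.add(word)
    assert len(codewords) == p ** len(monomials)  # Ev_{a,k} is injective
    d = min(sum(1 for x in w if x) for w in codewords if any(w))
    return len(points), len(monomials), d

for p, r, k in [(2, 5, 2), (3, 2, 2), (3, 2, 3)]:
    # brute-force parameters next to the prediction of Theorem 8.3 for n = p^r
    predicted = (p ** r, comb(r + k - 1, r), p ** (r - 1) * (p - k + 1))
    print(rm_parameters(p, r, k), predicted)  # e.g. (32, 6, 16) (32, 6, 16) for the Mariner code
```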
The code obtained by this choice is denoted by \(\mathrm{RM}_q(r, k)\) and called a Reed-Muller code. By the last theorem, \(\mathrm{RM}_q(r, k)\) is a code of type
\[ \Big[ q^r, \binom{r+k-1}{r}, q^r\Big(1 - \frac{k-1}{q}\Big) \Big]_q. \]
There are infinitely many Reed-Muller codes \(\mathrm{RM}_q(r, k)\) over a given field with q elements. Let \(\mathrm{RM}_q\) be the set of all pairs \((\delta(C), R(C))\), where C runs through the Reed-Muller codes over F. In other words,
\[ \mathrm{RM}_q = \Big\{ \Big(1 - \frac{k-1}{q}, \binom{r+k-1}{r}\Big/ q^r\Big) : 1 \le k \le q,\ r \ge 1 \Big\}. \]
It is easy to see that, for fixed k, the sequence of rates \(\binom{r+k-1}{k-1}/q^r\) tends to 0 as r increases. Therefore the limit points of \(\mathrm{RM}_q\) are the points \((\frac{1}{q}, 0), (\frac{2}{q}, 0), \dots, (1, 0)\).

The Mariner mission
The Mariner mission used the code \(\mathrm{RM}_2(5, 2)\), which is a binary \([32, 6, 16]_2\) code and can therefore correct up to 7 errors.

The number of polynomials of degree l in r variables
Theorem 8.4. The dimension of the space \(F[x_1, \dots, x_r]_{\le l}\) of polynomials of degree ≤ l equals \(\binom{r+l}{r}\).

Proof. As a basis for the F-vector space \(F[x_1, \dots, x_r]_{\le l}\) one can take the monomials \(x_1^{i_1} x_2^{i_2} \cdots x_r^{i_r}\) with \(i_1 + \cdots + i_r \le l\). Their number equals the number of solutions of \(i_1 + \cdots + i_r \le l\) in non-negative integers. This is the same as the number ν(l) of solutions of \(i_1 + \cdots + i_r + i_{r+1} = l\). But
\[ \sum_{l \ge 0} \nu(l) x^l = \sum_{i_1 \ge 0} \cdots \sum_{i_{r+1} \ge 0} x^{i_1 + \cdots + i_{r+1}} = \frac{1}{(1-x)^{r+1}} = \sum_{l \ge 0} (-1)^l \binom{-r-1}{l} x^l, \]
whence \(\nu(l) = (-1)^l \binom{-r-1}{l} = \binom{r+l}{l} = \binom{r+l}{r}\).

Figure 2.2: The sets \(\mathrm{RM}_2\) (red), \(\mathrm{RM}_{16}\) (green), \(\mathrm{RM}_{32}\) (blue) for r = 1, 2, ..., 10 in the δ,R-plane. The "Mariner" code \(\mathrm{RM}_2(5, 2)\) is encircled.

9 Cyclic codes

There is apparently no way to understand, for a given field F, all linear subspaces of \(F^n\) from the point of view of coding theory. It is therefore natural to look first at more distinguished spaces. Whatever the sense in which these spaces are distinguished, the hope is that they are also interesting from the point of view of coding theory.
Distinguished spaces are, for example, those which are symmetric with respect to transformations respecting the Hamming weight. Transformations respecting the Hamming weight include, in particular, permutations of the places of a word. So it is natural to study linear codes over a given field F which are invariant under certain permutations. To be more precise, recall that the symmetric group \(S_n\) of permutations of the set {1, 2, ..., n} acts on \(F^n\) via
\[ (s^{-1}, w_1 w_2 \cdots w_n) \mapsto w_{s(1)} w_{s(2)} \cdots w_{s(n)}. \]
We call a linear code in \(F^n\) cyclic if it is invariant under the subgroup \(\langle (1, 2, 3, \dots, n) \rangle\) generated by the permutation (1, 2, 3, ..., n), which maps 1 to 2, 2 to 3, etc., and finally n to 1. In other words, a code C in \(F^n\) is cyclic if for every code word \(c_1 c_2 c_3 \cdots c_n\) in C, the word \(c_n c_1 c_2 \cdots c_{n-1}\) is also in C.

There is a useful, more algebraic characterization of cyclic codes. For this we identify \(F^n\) with the ring \(R_n := F[x]/(x^n - 1)\) via the isomorphism of F-vector spaces
\[ c_0 c_1 \cdots c_{n-1} \mapsto [c_0 + c_1 x + c_2 x^2 + \cdots + c_{n-1} x^{n-1}]_{x^n - 1}. \]
The notations are as in 2. Note, however, that \(R_n\), for n ≥ 2, is not a field, since \(x^n - 1\) is not irreducible (it is divisible by x − 1). In general, \(R_n\) is only a ring. More important for us is that \(R_n\) is an F[x]-module via \((f, [g]_{x^n-1}) \mapsto [f \cdot g]_{x^n-1}\). The action of the permutation s = (1, 2, 3, ..., n) on words in \(F^n\) then corresponds to multiplication of residue classes by x. Indeed,
\[ x.[c_0 + c_1 x + c_2 x^2 + \cdots + c_{n-1} x^{n-1}]_{x^n-1} = [c_0 x + c_1 x^2 + c_2 x^3 + \cdots + c_{n-2} x^{n-1} + c_{n-1}]_{x^n-1} \]
(since \(x^n \equiv 1 \bmod x^n - 1\)), and the class on the right corresponds to the word \(c_{n-1} c_0 c_1 \cdots c_{n-2}\). Therefore a code C (considered as a subset of \(R_n\)) is a cyclic code if and only if it is an F[x]-submodule of \(R_n\). Every F[x]-submodule of \(R_n\) is of the form \(C_n(g) := F[x].[g]_{x^n-1}\) for a normalized divisor g of \(x^n - 1\) (see below).
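The identification of words with residue classes can be explored over \(\mathbb{F}_2\) with a few lines of Python. In the sketch below, whose helper names are hypothetical, a polynomial over \(\mathbb{F}_2\) is stored as an integer whose bit i is the coefficient of \(x^i\). The sketch does the following:

- It checks that multiplication by x in \(R_7\) is the cyclic shift.
- It finds all divisors g of \(x^7 - 1\) by trial division.
- It verifies that each \(C_7(g)\) is closed under the shift and has \(2^{7 - \deg g}\) elements, in line with Theorem 9.1 further on.
- It compares the number of divisors with the count \(2^N\) of Theorem 9.3.

```python
from math import gcd

def pmul(a, b):
    """Product of two F_2-polynomials stored as integers (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def pmod(a, m):
    """Remainder of a after division by m over F_2."""
    while a and a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a

def shift(word, n):
    """Cyclic shift c_0 c_1 ... c_{n-1} -> c_{n-1} c_0 ... c_{n-2} (bit i holds c_i)."""
    return ((word << 1) | (word >> (n - 1))) & ((1 << n) - 1)

n = 7
modulus = (1 << n) | 1  # x^7 - 1 = x^7 + 1 over F_2

# multiplication by x in R_n is the cyclic shift
assert all(pmod(pmul(0b10, w), modulus) == shift(w, n) for w in range(1 << n))

# all divisors of x^7 - 1 (over F_2 every nonzero polynomial is normalized)
divisors = [g for g in range(1, 1 << (n + 1)) if pmod(modulus, g) == 0]

for g in divisors:
    code = {pmod(pmul(f, g), modulus) for f in range(1 << n)}  # C_n(g)
    assert len(code) == 2 ** (n - (g.bit_length() - 1))      # dimension n - deg(g)
    assert all(shift(c, n) in code for c in code)            # C_n(g) is cyclic

def order(q, l):
    """Smallest f >= 1 with q^f = 1 modulo l."""
    f = 1
    while pow(q, f, l) != 1 % l:
        f += 1
    return f

def euler_phi(l):
    return sum(1 for a in range(1, l + 1) if gcd(a, l) == 1)

# Theorem 9.3 with q = 2 (n = 7 is odd, hence prime to q)
N = sum(euler_phi(l) // order(2, l) for l in range(1, n + 1) if n % l == 0)
print(len(divisors), 2 ** N)  # 8 8
```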
We therefore have a one-to-one correspondence
\[ \{\text{cyclic codes of length } n \text{ over } F\} \leftrightarrow \{ g \in F[x] : g \mid (x^n - 1),\ g \text{ normalized} \}. \]
For the cyclic code \(C_n(g)\), the polynomial g is called the generator polynomial. The polynomial
\[ g^* := \frac{x^n - 1}{g} \]
is called the control polynomial. The reason for the latter name is that a word corresponding to the residue class \([f]_{x^n-1}\) is a codeword in \(C_n(g)\) if and only if \(g^*.[f]_{x^n-1} = 0\) (see Exercise 1).

Theorem 9.1. The dimension of a cyclic code of length n with generator polynomial g and control polynomial \(g^*\) equals \(n - \deg(g) = \deg(g^*)\).

Proof. The canonical maps define an exact sequence
\[ 0 \to C_n(g) \to R_n \to R_n/R_n.[g] \to 0 \]
(where we suppress the subscript \(x^n - 1\)). It follows that \(\dim C_n(g) = \dim R_n - \dim R_n/R_n.[g]\). We know that \(\dim R_n = n\). For computing the second dimension, note that the map \(f + F[x]g \mapsto [f] + R_n[g]\) defines an isomorphism \(F[x]/F[x].g \to R_n/R_n.[g]\). The space on the left has dimension deg(g). The theorem is now obvious.

As a basis for the cyclic code C of length n with generator polynomial g of degree l one can take \([g], [xg], [x^2 g], \dots, [x^{n-l-1} g]\). This remark is useful for setting up a generator or control matrix for C, and for computing the minimal distance of C. Apparently there is currently neither a closed formula nor a good algorithm for computing the minimal distance of a general cyclic code. It has been announced that tables of the minimal distances of all binary cyclic codes of length n ≤ 1000 were computed. In any case, in the next section we shall study a certain subclass of cyclic codes for which we can at least give lower bounds for their minimal distances.

Example 9.2 (The Golay code \(G_{23}\)). The polynomial \(x^{23} - 1\) factors over \(\mathbb{F}_2\) into a product of three irreducible polynomials:
\[ x^{23} - 1 = (x - 1) \cdot (x^{11} + x^9 + x^7 + x^6 + x^5 + x + 1) \cdot (x^{11} + x^{10} + x^6 + x^5 + x^4 + x^2 + 1). \]
Set \(g := x^{11} + x^{10} + x^6 + x^5 + x^4 + x^2 + 1\). The code \(C_{23}(g)\) has dimension 12.
Its minimal distance is 7 (see Exercise 2). In fact, this is the Golay code introduced in 4.4.

The cyclic codes over a given field F and of a given length n form a lattice with respect to inclusion: the intersection and the sum of two cyclic codes are again cyclic codes. For two divisors \(g_1\) and \(g_2\) of \(x^n - 1\) one has \(F[x].[g_1] \subseteq F[x].[g_2]\) if and only if \(g_1\) is a multiple of \(g_2\) (see Theorem). The maximal non-trivial codes therefore correspond to the irreducible factors of \(x^n - 1\).

Figure 2.3: Lattice of binary cyclic codes of length 7. The divisors of \(x^7 - 1\) are 1, x + 1, \(x^3 + x + 1\), \(x^3 + x^2 + 1\), \(x^4 + x^2 + x + 1\), \(x^4 + x^3 + x^2 + 1\), \(x^6 + x^5 + x^4 + x^3 + x^2 + x + 1\), \(x^7 + 1\).

The number of cyclic codes over F of length n equals the number \(\sigma_F(x^n - 1)\) of normalized divisors of \(x^n - 1\). If n is relatively prime to |F|, then \(x^n - 1\) and its derivative \(n x^{n-1}\) are relatively prime, and hence \(x^n - 1\) has no multiple irreducible factor (see Addon 9). In this case \(\sigma_F(x^n - 1) = 2^N\), where N denotes the number of normalized irreducible polynomials dividing \(x^n - 1\). The number N can be easily computed.

Theorem 9.3. Let F be a field with q elements. If n and q are relatively prime, the number of cyclic codes over F of length n equals \(2^N\), where
\[ N = \sum_{l \mid n} \frac{\varphi(l)}{\mathrm{ord}_l(q)}. \]
Here φ is Euler's phi-function. For any l, \(\mathrm{ord}_l(q)\) denotes the smallest positive integer f such that \(q^f \equiv 1 \bmod l\).

Proof. We use some arguments from the Galois theory of finite fields without further explanation; less experienced readers may wish to skip this proof. Let d be the smallest positive integer such that \(q^d \equiv 1 \bmod n\), and let \(Q = q^d\). We can then assume that \(\mathbb{F}_Q\) contains F. Moreover, the polynomial \(x^n - 1\) has n different roots in \(\mathbb{F}_Q\). This is true since n divides the order Q − 1 of the cyclic group \(\mathbb{F}_Q^*\), which therefore possesses exactly n solutions of the equation \(a^n = 1\). Let S be the set of these solutions. The Galois group G of \(\mathbb{F}_Q\) over F is cyclic of order d, generated by \(\phi : a \mapsto a^q\).
The normalized prime polynomials in F[x] dividing \(x^n - 1\) are in one-to-one correspondence with the G-orbits of S (the orbit O corresponding to the polynomial \(\prod_{a \in O}(x - a)\)). The number of (normalized) divisors of \(x^n - 1\) in F[x] therefore equals \(2^{\#(G \backslash S)}\). To compute the Galois orbits of S, decompose S as \(S = \bigcup_{l \mid n} S(l)\), where S(l) denotes the set of elements of order l in S. The number of elements in S(l) equals φ(l). Clearly every S(l) is invariant under G. The stabilizer of an element a in S(l) is the subgroup generated by \(\phi^f\), where \(f = \mathrm{ord}_l(q)\). The orbit of any element in S(l) therefore consists of f elements, and S(l) decomposes into φ(l)/f orbits. The theorem is now obvious.

Table 2.1: Base 2 logarithm of the number of cyclic codes of odd length n = 32i + j over the field with two elements in the ith row and jth column

If n and |F| are not relatively prime, then \(x^n - 1\) has multiple irreducible factors (see Addon 9). For example, over \(\mathbb{F}_2\) and for n a power of 2 we have \(x^n - 1 = (x + 1)^n\) (as one sees by successively applying \(y^2 - 1 = (y + 1)^2\)). In this case the lattice of cyclic codes of length n is a totally ordered set with n + 1 elements.

Modules and Ideals
A module M over a ring R (or simply R-module M) is an (additively written) abelian group M together with an "action of the ring R on M". By the latter we mean a map \(R \times M \to M\), \((a, m) \mapsto a.m\), which "commutes with all kinds of operations", i.e. which satisfies
\[ (a + b).m = a.m + b.m, \quad (a \cdot b).m = a.(b.m), \quad 0_R.m = 0_M, \quad 1_R.m = m, \quad a.(m + m') = a.m + a.m' \]
for all a, b ∈ R and m, m' ∈ M. An R-submodule S of M is a subgroup of M such that \(R.S\ (= \{a.s : a \in R, s \in S\}) \subseteq S\). Clearly, the restriction of the action map \(R \times M \to M\) to \(R \times S\) then defines the structure of an R-module on S. If S is a submodule of M, we can define the quotient module M/S. This is the quotient group M/S in the sense of group theory (i.e.
the set of cosets \(m + S := \{m + s : s \in S\}\), where m runs through M, equipped with the addition \((m + S) + (m' + S) := (m + m') + S\)) together with the action defined by \(a.(m + S) := a.m + S\). It is easy to check that M/S indeed becomes an R-module in this way. We may consider R itself as an R-module by taking as action the multiplication map \((a, b) \mapsto a \cdot b\) in the ring R. The R-submodules of R are then called ideals of R. An ideal I is therefore a subgroup of R such that \(R \cdot I \subseteq I\). The quotient module R/I is then not only a module over R, but even a ring in its own right if we take as multiplication the map \((a + I, b + I) \mapsto a \cdot b + I\). (One has to verify, of course, that this map is well-defined, i.e. does not depend on the choice of the representatives a and b of the cosets in question.) Special ideals are the principal ideals (a) = Ra = aR.

Taking the quotient of modules and, in particular, the quotient of rings, is one of the most basic constructions in mathematics. We saw it already in Section 1.4 (The ring \(\mathbb{Z}/m\mathbb{Z}\) of residue classes modulo m) and Section 1.2 (Finite Fields). The set of real numbers itself is nothing else but \(\mathbb{R} = C/N\), where C is the ring of all Cauchy sequences in \(\mathbb{Q}\) (with element-wise addition and multiplication) and N is the ideal of sequences in \(\mathbb{Q}\) which converge to 0. The notation π = 3.14159265359... is nothing else but a shorthand for the coset
\[ \pi = \Big(3, \frac{31}{10}, \frac{314}{100}, \dots\Big) + N. \]

Arithmetic in F[x]
For a field F, the ring F[x] has much in common with the ring of integers \(\mathbb{Z}\). This is due to the fact that both are Euclidean rings. In a Euclidean ring every nonzero element a which is not a unit has a prime factorization \(a = p_1 \cdots p_r\). Here the sequence of prime elements \(\{p_j\}_j\) is unique up to permutation of the elements and up to multiplication of the primes \(p_j\) by units. A prime element p is a nonzero element which is not a unit and which has the property that a · b is a multiple of p only if a or b is so.
The prime elements in \(\mathbb{Z}\) are the numbers ±p, where p is a prime number. In F[x] the prime elements p are the irreducible polynomials, i.e. those polynomials of positive degree which cannot be decomposed as a product of two polynomials in F[x] of degree strictly less than the degree of p. A unit of a ring R is an element u for which there exists an element v with u · v = 1. The units form a group with respect to the ring multiplication. The units of F[x] are the nonzero elements of F. The units of \(\mathbb{Z}\) are the integers ±1. The prime elements in \(\mathbb{C}[x]\) are (up to multiplication by units) exactly the polynomials of the form x − a, where a runs through \(\mathbb{C}\). In fact, by the Fundamental Theorem of Algebra every complex polynomial f can be written (up to a unit) as a product of polynomials x − a, where a runs through the zeros of f (taking multiplicities into account). The prime elements of \(\mathbb{R}[x]\) are (up to units) the linear polynomials and the quadratic ones which have no real roots. The number of prime elements of F[x] for a finite field F was discussed in Section ??: Exercise 5.

A question which occurs sometimes is when a polynomial has a multiple prime factor. The answer is given by the following theorem.

Theorem 9.4. A polynomial f in F[x] is divisible by the square of a prime polynomial r if and only if f and f' have a common prime factor.

Proof. If \(f = a_n x^n + a_{n-1} x^{n-1} + \cdots\), the derivative f' of f is defined as \(f' = n a_n x^{n-1} + (n-1) a_{n-1} x^{n-2} + \cdots\). The key identity for proving the theorem is the product rule \((rg)' = r'g + rg'\). Moreover, an irreducible r does not divide r'. Otherwise r' = 0, since for r' ≠ 0 one has deg(r) > deg(r'). But r' = 0 is only possible if F has positive characteristic p (for the fields considered here, F is then a finite field, say \(F = \mathbb{F}_{p^n}\)) and \(r(x) = h(x^p)\) for a suitable polynomial h. Every element of a finite field of characteristic p is a p-th power. Using that \(p \mid \binom{p}{k}\) for \(1 \le k \le p - 1\), we can therefore write \(h(x^p) = s(x)^p\) for a suitable polynomial s, contradicting the fact that r is irreducible. Now assume that a prime polynomial r divides f, say f = rg. The product rule then implies that r | f' if and only if r | r'g. But r | r'g is equivalent to r | g, since r does not divide r'. The theorem now follows.

A particularly interesting situation can occur over finite fields, as we saw in the preceding proof: the derivative of a polynomial g can be the zero polynomial. But then \(g(x) = h(x^p) = s(x)^p\) for suitable polynomials h and s.

Another property of Euclidean rings R is that every ideal I is principal, i.e. every ideal is of the form R · a for some element a in R (which is uniquely determined by I up to multiplication by a unit). Thus every ideal in \(\mathbb{Z}\) is of the form \(\mathbb{Z}m = m\mathbb{Z} = (m)\). Therefore there are no other quotients of \(\mathbb{Z}\) than \(\mathbb{Z}/m\mathbb{Z}\), and similarly for F[x]. For an arbitrary ring one defines the gcd of two ideals I and J as their sum \(I + J = \{a + b : a \in I, b \in J\}\), which is again an ideal. If we consider the ring of integers and the ideals \(\mathbb{Z}m\) and \(\mathbb{Z}n\), then \(\mathbb{Z}m + \mathbb{Z}n = \mathbb{Z}g\), where g is the greatest common divisor of m and n. Similarly, for a polynomial ring F[x], the gcd of two ideals (f) and (g) equals the ideal F[x] gcd(f, g). Here gcd(f, g) is the normalized polynomial of largest degree dividing f and g, or 0 if f = g = 0. In particular we find:

Theorem 9.5 (Bézout's theorem). For any given polynomials f and g over a field F, there exist polynomials h and k in F[x] such that gcd(f, g) = fh + gk.

Ideals of \(F[x]/(x^n - 1)\)
The following theorem is true for any (not necessarily finite) field F.

Theorem 9.6. Every ideal of \(R_n = F[x]/(x^n - 1)\) is of the form \(R_n \cdot [g]_{x^n-1}\), where g is a normalized divisor of \(x^n - 1\). For divisors g and h of \(x^n - 1\), one has \(R_n \cdot [g]_{x^n-1} \subseteq R_n \cdot [h]_{x^n-1}\) if and only if g is a multiple of h.

Proof. Let κ be the canonical map \(f \mapsto [f]_{x^n-1}\). The map \(I \mapsto \kappa(I)\) defines a bijection between the ideals of F[x] which contain \(F[x] \cdot (x^n - 1)\) and the ideals of \(R_n\). But any ideal I of F[x] is of the form \(F[x] \cdot f\) for some f, and then I contains \(F[x] \cdot (x^n - 1)\) if and only if \(f \mid (x^n - 1)\). From this the theorem is obvious.

Exercises
9.1. Prove that a word corresponding to the residue class \([f]_{x^n-1}\) is a codeword in \(C_n(g)\) if and only if \(g^*.[f]_{x^n-1} = 0\).
9.2. Compute the minimal distance of the binary code \(C_{23}(x^{11} + x^{10} + x^6 + x^5 + x^4 + x^2 + 1)\).
9.3. Deduce from Theorem 9.3 a formula for the number of cyclic codes of length n over a given field for arbitrary n.

10 Quadratic residue codes

There is a particularly interesting subclass of cyclic codes, namely the quadratic residue codes. They are interesting because the Hamming code H(7, 4) and the Golay code \(G_{23}\) are members of this class, and because we can prove a lower bound for the minimal distances of these codes.

We continue with the notations of the previous section. In particular, as before, we identify \(F^n\), for a given finite field F, with the ring \(R_n = F[x]/(x^n - 1)\). The Hamming weight h(c) of an element \(c = [f] = [f]_{x^n-1}\) in \(R_n\) then equals the number of nonzero coefficients of the remainder of f after division by \(x^n - 1\). The ring \(R_n\) can also be considered as a vector space over F (in fact, it is an F-algebra). In particular, for every polynomial f in F[x] the expression f([x]) is meaningful and equal to [f].

Proposition 10.1. For any \(c_1, c_2\) in \(R_n\), one has \(h(c_1 c_2) \le h(c_1)\, h(c_2)\).

Proof. Let \(c_1 = [f]\) and \(c_2 = [g]\), where f and g are polynomials of degree < n. Let d and e be the weights of f and g. Then \(f = a_1 x^{m_1} + a_2 x^{m_2} + \cdots + a_d x^{m_d}\) and \(g = b_1 x^{n_1} + b_2 x^{n_2} + \cdots + b_e x^{n_e}\) for suitable \(a_i, b_j\) in F and non-negative integers \(m_i, n_j\). Therefore
\[ h([f][g]) = h\Big(\Big[\sum_{i,j} a_i b_j x^{m_i + n_j}\Big]\Big) \le \sum_{i,j} h([x^{m_i + n_j}]) = de. \]
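Proposition 10.1 can be spot-checked numerically. The following sketch is an illustration, not a proof, and its function names are hypothetical. It stores elements of \(R_n\) over a prime field as coefficient lists of length n. It multiplies random sparse elements of \(R_{11}\) over \(\mathbb{F}_3\) and checks \(h(c_1 c_2) \le h(c_1) h(c_2)\). The helper power_map realizes \([f] \mapsto [f(x^r)]\) for r prime to n, the map used in the next paragraph, and confirms that it preserves the Hamming weight.

```python
import random

def mul_mod(f, g, p):
    """Product [f][g] in F_p[x]/(x^n - 1); f and g are coefficient lists of length n."""
    n = len(f)
    h = [0] * n
    for i, a in enumerate(f):
        if a:
            for j, b in enumerate(g):
                h[(i + j) % n] = (h[(i + j) % n] + a * b) % p  # uses x^n = 1
    return h

def weight(f):
    """Hamming weight: number of nonzero coefficients."""
    return sum(1 for a in f if a)

def power_map(f, r):
    """[f] -> [f(x^r)] for r prime to n: moves the coefficient of x^i to x^(r*i mod n)."""
    n = len(f)
    h = [0] * n
    for i, a in enumerate(f):
        h[(r * i) % n] = a
    return h

random.seed(1)
p, n = 3, 11
for _ in range(1000):
    # sparse random words, so that h(f) h(g) < n happens often
    f = [random.choice([0, 0, 0, 1, 2]) for _ in range(n)]
    g = [random.choice([0, 0, 0, 1, 2]) for _ in range(n)]
    assert weight(mul_mod(f, g, p)) <= weight(f) * weight(g)  # Proposition 10.1
    assert weight(power_map(f, 2)) == weight(f)               # 2 is a unit modulo 11
print("all samples satisfy Proposition 10.1")
```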
The group \((\mathbb{Z}/n\mathbb{Z})^*\) of primitive residue classes acts on the ring \(R_n\): the action of the residue class u on [f] is defined by \([f]^u := f([x]^u)\). Here \([x]^u\) is a shorthand for \([x]^r = [x^r]\) with any r from the class u. Since \([x]^n = 1\), this does not depend on the choice of r. For the same reason, the definition of \([f]^u\) does not depend on the choice of f in [f]. The action of \((\mathbb{Z}/n\mathbb{Z})^*\) is isometric, i.e. for any given u in \((\mathbb{Z}/n\mathbb{Z})^*\) the map \([f] \mapsto [f]^u\) defines an isometry with respect to the Hamming distance. Indeed, the map \([f] \mapsto [f]^u\), translated back to words in \(F^n\), is nothing else but a permutation of the entries. This permutation maps i to the remainder of ri upon division by n (for any r in u). In particular, the action of \((\mathbb{Z}/n\mathbb{Z})^*\) permutes the cyclic codes of length n and preserves dimension and minimal distance. In Exercise 2 we study the action of \((\mathbb{Z}/n\mathbb{Z})^*\) on cyclic codes of length n more closely.

We assume from now on that \(F = \mathbb{F}_p\) for a prime p, and that l is a prime number different from p such that p is a square modulo l. Although we do not need it, we mention that the condition "p is a square modulo l" is equivalent to the statement that l lies in certain residue classes modulo 4p (see Addon 10). For example, in the important case p = 2, the assumption that 2 is a square modulo the odd prime l is equivalent to \( l\equiv \pm1 \bmod 8\). The first primes for which 2 is a square are therefore 7, 17, 23, 31, 41, 47, 71, 73, 79, ....

The assumption that p is a square modulo l guarantees the existence of quadratic residue codes of length l over \(\mathbb{F}_p\). We now describe the generator polynomials of these codes. For this let
\[ Q := \{ 1 \le j \le l - 1 : j \text{ is a square modulo } l \}, \qquad N := \{ 1 \le j \le l - 1 : j \text{ is not a square modulo } l \}. \]
Set
\[ \rho_l := c + \sum_{j \in Q} x^j, \qquad q_l := \gcd(\rho_l, x^l - 1), \]
and define \(\nu_l\) and \(n_l\) similarly, replacing Q by N. The element c of \(\mathbb{F}_p\) is chosen as follows. If p = 2 we set \(c = \frac{1+l}{2}\). If p is odd, we choose \(c = \frac{1+s}{2}\), where \( s^2 = (-1)^{\frac{l-1}{2}} l \).
Note that such an s indeed exists. Namely, by quadratic reciprocity (see below) and since p is a square modulo l, we know that \((-1)^{\frac{l-1}{2}} l\) is a square modulo p. The quadratic residue codes of length l over \(\mathbb{F}_p\) are the \(R_l\)-ideals generated by \([q_l]\) and \([n_l]\), respectively. We write \(\mathrm{QR}_l\) for the quadratic residue code with generator polynomial \(q_l\). As we shall see below (Lemma 10.3), the two quadratic residue codes are mapped to each other by the action of a non-square in \(\mathbb{F}_l^*\). Therefore, from the point of view of coding theory, it does not matter a priori which one we consider. The reader might have noticed that, for p ≠ 2, the constant \(c = \frac{1+s}{2}\) was not unambiguously defined: instead of the square root s of \((-1)^{\frac{l-1}{2}} l\) we could also take −s. However, it does not matter which root we choose, since the other choice would simply exchange \([q_l]\) and \([n_l]\) (see Exercise 3).

Proposition 10.2. Let \(p^h \equiv 1 \bmod l\). There exist an l-th root of unity ζ ≠ 1 in \(\mathbb{F}_{p^h}\) and a quadratic non-residue u modulo l such that
\[ q_l = \prod_{j=1}^{\frac{l-1}{2}} (x - \zeta^{j^2}), \qquad n_l = \prod_{j=1}^{\frac{l-1}{2}} (x - \zeta^{u j^2}). \]
We leave the proof as an exercise to the reader (Exercise 4). As an immediate consequence we obtain:

Lemma 10.3.
1. For a in \(\mathbb{F}_l^*\), one has \(([q_l])^a = ([q_l])\) if a is a square in \(\mathbb{F}_l\), and \(([q_l])^a = ([n_l])\) otherwise.
2. One has \(q_l n_l = x^{l-1} + x^{l-2} + \cdots + 1\).

Proof. Part 2 is clear from the proposition, since \(\zeta^j\) (1 ≤ j ≤ l − 1) runs through all roots of
\[ \frac{x^l - 1}{x - 1} = x^{l-1} + x^{l-2} + \cdots + 1. \]
For part 1, note that \(([q_l])\) consists of all [f] where f vanishes at \(\zeta^{j^2}\) for all \(1 \le j \le \frac{l-1}{2}\). Similarly, \(([n_l])\) consists of all [f] where f vanishes at \(\zeta^{u j^2}\) for all \(1 \le j \le \frac{l-1}{2}\). If the integer r represents a, then \([q_l]^a = [q_l(x^r)]\), and \(q_l(x^r)\) vanishes at \(\zeta^{r j^2}\) \((1 \le j \le \frac{l-1}{2})\). Thus \(([q_l])^a \subseteq ([q_l])\) if a is a square in \(\mathbb{F}_l\), and \(([q_l])^a \subseteq ([n_l])\) otherwise. Since \(\dim([q_l])^a = \dim([q_l]) = \dim([n_l]) = \frac{l+1}{2}\) (see Theorem 9.1), the claim is now obvious.
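The definitions of \(q_l\) and \(n_l\) can be evaluated directly for p = 2. In the sketch below the helper names are hypothetical, and polynomials over \(\mathbb{F}_2\) are integers whose bit i is the coefficient of \(x^i\). The sketch lists the primes l < 80 with \(l \equiv \pm 1 \bmod 8\). It computes \(q_l = \gcd(\rho_l, x^l - 1)\) and \(n_l = \gcd(\nu_l, x^l - 1)\) with the Euclidean algorithm. It then checks that \(\deg q_l = \frac{l-1}{2}\) and \(q_l n_l = x^{l-1} + \cdots + 1\), as in Lemma 10.3.

```python
def pmul(a, b):
    """Product of F_2-polynomials encoded as integers (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def pmod(a, m):
    """Remainder of a after division by m over F_2."""
    while a and a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a

def pgcd(a, b):
    """gcd over F_2 (every nonzero F_2-polynomial is normalized)."""
    while b:
        a, b = b, pmod(a, b)
    return a

def qr_generators(l):
    """(q_l, n_l) for a prime l = +-1 mod 8, following the definitions above with p = 2."""
    squares = {j * j % l for j in range(1, l)}                       # the set Q
    c = ((1 + l) // 2) % 2                                           # c = (1 + l)/2 in F_2
    rho = c | sum(1 << j for j in squares)                           # c + sum_{j in Q} x^j
    nu = c | sum(1 << j for j in range(1, l) if j not in squares)    # c + sum_{j in N} x^j
    modulus = (1 << l) | 1                                           # x^l - 1 over F_2
    return pgcd(modulus, rho), pgcd(modulus, nu)

primes = [l for l in range(3, 80) if all(l % d for d in range(2, l)) and l % 8 in (1, 7)]
print(primes)  # [7, 17, 23, 31, 41, 47, 71, 73, 79]

for l in primes[:4]:
    q, nl = qr_generators(l)
    assert q.bit_length() - 1 == (l - 1) // 2   # deg q_l = (l-1)/2, so dim QR_l = (l+1)/2
    assert pmul(q, nl) == (1 << l) - 1          # Lemma 10.3: q_l n_l = x^(l-1) + ... + 1
    print(l, bin(q))                            # for l = 7: 0b1011, i.e. x^3 + x + 1
```

For l = 7 this gives \(x^3 + x + 1\), the generator polynomial of a Hamming code of length 7. For l = 23 it gives one of the two degree-11 factors of Example 9.2.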
We come to the main result of this section.

Theorem 10.4 (Square root bound). The quadratic residue codes over \(\mathbb{F}_p\) are \([l, \frac{l+1}{2}, d]_p\)-codes, where \(d \ge \sqrt{l}\).

Proof. Clearly the quadratic residue codes have length l and dimension \(\frac{l+1}{2}\), where the latter follows from Theorem 9.1. Both quadratic residue codes are mapped to each other by an isometry, so it suffices to prove the bound for d for the code \(\mathrm{QR}_l\). Let [f] be an element of \(\mathrm{QR}_l\), and let a be a non-square in \(\mathbb{F}_l\). The element \([f]^a\) is in \(([q_l])^a = ([n_l])\). Therefore \([f] \cdot [f]^a\) is in \(([q_l])([n_l]) = R_l [x^{l-1} + x^{l-2} + \cdots + 1]\), where the latter identity follows from Lemma 10.3. But the latter code is one-dimensional, as we saw in Theorem 9.1. It follows that \([f] \cdot [f]^a = t \cdot [x^{l-1} + x^{l-2} + \cdots + 1]\) for a suitable t in F. But then \(h([f] \cdot [f]^a) = l\), provided t ≠ 0. It is easily seen that t = 0 is equivalent to f(1) = 0. It is a rather non-obvious fact that indeed f(1) ≠ 0 for an element [f] of minimal weight in \(\mathrm{QR}_l\), as we shall see in the next theorem. On the other hand, Proposition 10.1 together with \(h([f]^a) = h([f])\) implies \(h([f] \cdot [f]^a) \le h([f])^2\). The theorem is now obvious.

Supplement (to Theorem 10.4). If \(l \equiv -1 \bmod 4\), then the minimal distance d of the quadratic residue codes of length l satisfies
\[ d \ge \frac{1}{2} + \sqrt{l - 1}. \]

Proof. If \(l \equiv -1 \bmod 4\), then −1 is a quadratic non-residue modulo l (see Quadratic reciprocity below). In this case we can take a = −1 in the preceding proof, and then \([f]^a = f([x]^{-1})\). But \(h([f] \cdot f([x]^{-1})) \le h([f])^2 - h([f]) + 1\) (see Exercise 1). From this we deduce, as in the preceding proof, that the minimal distance d of \(\mathrm{QR}_l\) satisfies \(d^2 - d + 1 \ge l\), i.e. \((d - \frac{1}{2})^2 \ge l - \frac{3}{4}\). This implies the claimed lower bound.

The first binary quadratic residue codes
For a quadratic residue code of length l over \(\mathbb{F}_2\) one can prove that its minimal distance d is odd. Even more, \(d \equiv 3 \bmod 4\) if \(l \equiv -1 \bmod 8\).
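For the first binary quadratic residue codes, the square root bound can be compared with the true minimal distance by exhaustive search. The sketch below is exponential in the dimension and only meant for small l; its helper names are hypothetical. It recomputes \(q_l\) over \(\mathbb{F}_2\) and runs through all nonzero codewords of \(\mathrm{QR}_l\) in Gray-code order, so that consecutive codewords differ by one basis vector. It then prints:

- l;
- the square root bound \(\lceil\sqrt{l}\rceil\);
- for \(l \equiv -1 \bmod 4\), the smallest d with \(d^2 - d + 1 \ge l\) from the supplement;
- the minimal distance.

```python
from math import isqrt

def pmod(a, m):
    """Remainder of a after division by m over F_2 (bit i = coefficient of x^i)."""
    while a and a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a

def pgcd(a, b):
    """gcd over F_2; the result is automatically normalized."""
    while b:
        a, b = b, pmod(a, b)
    return a

def qr_min_distance(l):
    """Minimal distance of the binary code QR_l (l prime, l = +-1 mod 8), by exhaustive search."""
    squares = {j * j % l for j in range(1, l)}
    c = ((1 + l) // 2) % 2
    rho = c | sum(1 << j for j in squares)       # rho_l = c + sum_{j in Q} x^j
    q = pgcd((1 << l) | 1, rho)                   # q_l = gcd(rho_l, x^l - 1)
    k = l - (q.bit_length() - 1)                  # dim QR_l = l - deg q_l
    basis = [q << j for j in range(k)]            # [x^j q_l]; degrees stay below l
    best, word = l, 0
    for i in range(1, 1 << k):                    # Gray code: one basis vector changes per step
        word ^= basis[(i & -i).bit_length() - 1]
        best = min(best, bin(word).count("1"))
    return best

for l in (7, 17, 23, 31):
    srb = isqrt(l - 1) + 1                        # ceil(sqrt(l)), the square root bound
    isrb = None
    if l % 4 == 3:                                # supplement: d^2 - d + 1 >= l
        isrb = next(d for d in range(1, l + 1) if d * d - d + 1 >= l)
    print(l, srb, isrb, qr_min_distance(l))
```

For l = 7 the square root bound 3 is attained. For l = 23 the search returns 7, while the bounds only give 5 and 6.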
The following table lists, for the first binary quadratic residue codes of length l, the following quantities:

- the square root bound (SRB);
- the improved square root bound of the supplement (ISRB), if it is better;
- the corrected improved square root bound (CISRB), if the ISRB was even;
- the double-corrected improved square root bound (DCISRB), if l is −1 modulo 8 and the CISRB is congruent to 1 modulo 4;
- the true minimal distance d.

The parity extension \(\overline{C}\) of a code C over a field F is obtained by appending to each codeword of C the sum of its entries. As we shall see in the proof of the next theorem, the parity extensions of the quadratic residue codes of length l have very interesting properties. In fact, the proof hints at much more to be discovered.

Theorem 10.5. Let \(\mathrm{QR}^1_l\) be the subcode of \(\mathrm{QR}_l\) consisting of all [f] in \(\mathrm{QR}_l\) with f(1) = 0, and let d be the minimal weight of \(\mathrm{QR}_l\). Then the minimal weight of \(\mathrm{QR}^1_l\) equals d + 1.

Proof. For the proof we consider the \(\mathbb{F}_p\)-vector space \(\mathbb{F}_p^{\mathbb{P}^1(\mathbb{F}_l)}\) of all maps from \(\mathbb{P}^1(\mathbb{F}_l)\) into \(\mathbb{F}_p\). As a basis we can take the maps \(e_P\) \((P \in \mathbb{P}^1(\mathbb{F}_l))\), where \(e_P(Q)\) equals 1 if P = Q and 0 otherwise. We identify a code C of length l over \(\mathbb{F}_p\) with the subspace
\[ \Big\{ \sum_{j=0}^{l-1} c_j e_{[\tilde{j}:1]} : c_0 c_1 \dots c_{l-1} \in C \Big\} \]
(where \(\tilde{j}\) denotes the residue class of j in \(\mathbb{F}_l\)). The parity extension of C is then the code
\[ \overline{C} = \Big\{ c + e_{[1:0]} \sum_{j=0}^{l-1} c_j : c \in C \Big\}. \]
With these identifications, \(\mathrm{QR}^1_l\) equals the subcode of all c in \(\overline{\mathrm{QR}_l}\) such that \(c([1:0]) = 0\). We use without proof the theorem of Gleason and Prange, which states that the natural action of \(\mathrm{SL}(2, \mathbb{F}_l)\) on \(\mathbb{F}_p^{\mathbb{P}^1(\mathbb{F}_l)}\) leaves \(\overline{\mathrm{QR}_l}\) invariant. The natural action is defined as the linear continuation of
\[ \Big( \begin{pmatrix} a & b \\ c & d \end{pmatrix}, e_{[x:y]} \Big) \mapsto e_{[ax+by : cx+dy]}. \]
It is easy to see that \(\mathrm{SL}(2, \mathbb{F}_l)\) acts transitively on \(\mathbb{P}^1(\mathbb{F}_l)\). Now let c be a codeword of minimal Hamming weight, say e, in \(\mathrm{QR}^1_l\).
Transform c with a suitable A in \(\mathrm{SL}(2, \mathbb{F}_l)\) into a word c' with \(c'([1:0]) \ne 0\), and then delete the entry at [1:0]. This gives a codeword in \(\mathrm{QR}_l\) of Hamming weight e − 1 whose entries have a nonzero sum. Therefore e − 1 ≥ d, i.e. e ≥ d + 1. Conversely, take a codeword c of weight d in \(\mathrm{QR}_l\). Its parity extension \(\overline{c}\) must have weight d + 1, since otherwise c would be in \(\mathrm{QR}^1_l\), which only contains codewords of weight ≥ d + 1. Permuting again, we find a codeword in \(\mathrm{QR}^1_l\) of weight d + 1.

The ternary Golay code
The quadratic residue code of length 11 over \(\mathbb{F}_3\) is known as the ternary Golay code. Its generator polynomial is \(g = x^5 + 2x^3 + x^2 + 2x + 2\). As a basis one might take \([g \cdot x^j]_{x^{11}-1}\) for j = 0, 1, ..., 5. The Hamming weight of the basis elements is 5. Searching through all \(3^6 = 729\) codewords shows that 5 is also the minimal distance of this code. It is therefore an \([11, 6, 5]_3\)-code. The ternary Golay code possesses a number of extraordinary properties.

Quadratic reciprocity
Let l be an odd prime number. We write \((\mathbb{F}_l^*)^2\) for the subgroup consisting of the squares \(a^2\) of the elements of \(\mathbb{F}_l^*\). The kernel of the squaring map \(sq : a \mapsto a^2\) is {±1}. Accordingly, \(\#(\mathbb{F}_l^*)^2 = [\mathbb{F}_l^* : \{\pm 1\}] = \frac{l-1}{2}\). In particular, the group \(\mathbb{F}_l^*/(\mathbb{F}_l^*)^2\) has two elements, and hence it possesses exactly one non-trivial character (i.e. homomorphism) into the group {±1}. This character is usually considered as a character on \(\mathbb{F}_l^*\) and denoted by \(\big(\frac{\cdot}{l}\big)\). Thus
\[ \Big(\frac{a}{l}\Big) = \begin{cases} +1 & \text{if } a \text{ is a square in } \mathbb{F}_l^*, \\ -1 & \text{otherwise.} \end{cases} \]
One usually sets, in addition, \(\big(\frac{0}{l}\big) = 0\). If x is an integer, one writes \(\big(\frac{x}{l}\big)\) for \(\big(\frac{x \cdot 1_{\mathbb{F}_l}}{l}\big)\).

If −1 is a square, say \(-1 = a^2\), then \((-1)^{\frac{l-1}{2}} = +1\), since \(a^{l-1} = 1\) by Fermat's theorem. Thus, if −1 is a square, then \(l \equiv 1 \bmod 4\). Conversely, if l − 1 is divisible by 4, we have
\[ \Big(1 \cdot 2 \cdots \frac{l-1}{2}\Big)^2 \equiv -1 \bmod l. \]
This follows from Wilson's theorem, which states that \((l-1)! \equiv -1 \bmod l\) for any odd prime l, together with
\[ (l-1)! = 1 \cdot 2 \cdots \frac{l-1}{2} \cdot \frac{l+1}{2} \cdots (l-1) \equiv (-1)^{\frac{l-1}{2}} \Big(1 \cdot 2 \cdots \frac{l-1}{2}\Big)^2 \bmod l. \]
One can summarize this by saying that \(\big(\frac{-4}{l}\big) = +1\) if and only if \(l \equiv 1 \bmod 4\). In this way we see that a property of a number modulo l is expressed as a property of l modulo this number.
( This remarkable fact is valid in +1 if a is a square in Fl , much more generality and called the a l = −1 otherwise. Quadratic Reciprocity Law. Theorem 10.6 (Quadratic reciOne usually sets in addition 0l =0, procity). For any odd prime numx andif x is an integer, one uses l bers l and p one has x·1Fl for . ( l l −1 if l and p are congruent −1 modulo 4, p If −1 is a square, say, −1 = a2 , l · p = l−1 +1 otherwise. then (−1) 2 = +1 since by Fermat’s theorem al−1 = 1. Thus if −1 is a square, then l ≡ 1 mod 4. Vice Moreover, 2l = +1 if and only if versa, if l − 1 is divisible by 4, we l ≡ ±1 mod 8. Exercises 10.1. Prove that for any polynomial f in F [x], one has h [f ]f ([x]1 ) h([f ])2 h([f ]) + 1. 10.2. Assume that n and |F | are relatively prime. Prove that, for every two non-zero minimal cyclic codes C and C 0 of length n over F which have the same dimension, there exists a u in F∗l such that C 0 = C u . 10.3. In the notations of this section let l 6= 2 and define X ρ†l := c† + xj , j∈Q where c† = 1−s 2 . Show that gcd(ρ†l , xl − 1) = nl . 10.4. Prove Proposition 2. 48 CHAPTER 2. INFINITE FAMILIES OF LINEAR CODES Chapter 3 Symmetry and duality 11 Weight enumerators Given a code C it is a natural question to ask for the number of codewords of a given Hamming weight. As it is custom in combinatorics we combine these informations in a generating function. More precisely, we define the weight enumerator of C by X wC (x) = xh(c) ∈ Z[x]. c∈C Note that the kth coefficient of the weight enumerator is the number of c in C such that h(c) = k. We saw already before several examples of codes which are somehow the same since they only differ by a permutation of the places in their codewords. More formally we call two codes C and C 0 of the same length n over a field F n equivalent if there exists an automorphism α of the F -vector space F which preserves the Hamming distance (.i.e. satisfies h α(v), α(w) = h(v, w) for all v and w in F n ) such that α(C) = C 0 . 
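The definition of \(w_C\) translates directly into a few lines of code. The Python sketch below is illustrative only: it assumes that the \([7,4]\) Hamming code is realised as the binary cyclic code generated by \(1 + x + x^3\) (an equivalent code, not the projective-plane construction of Example 4.2), and the helper names are hypothetical. It computes the coefficient lists of \(w_C\) for this code and for its parity extension.

```python
from collections import Counter
from itertools import product


def weight_enumerator(codewords):
    """Coefficient list of w_C(x): entry k counts the codewords of Hamming weight k."""
    codewords = list(codewords)
    n = len(codewords[0])
    counts = Counter(sum(1 for x in c if x != 0) for c in codewords)
    return [counts.get(k, 0) for k in range(n + 1)]


def binary_polynomial_code(gen, n):
    """All words u(x) * gen(x) over F_2 with deg u < n - deg gen, as 0/1 tuples of length n.

    If gen divides x^n - 1 this is the binary cyclic code generated by gen.
    """
    k = n - (len(gen) - 1)
    words = []
    for u in product((0, 1), repeat=k):
        w = [0] * n
        for i, ui in enumerate(u):
            if ui:
                for j, gj in enumerate(gen):
                    w[i + j] ^= gj  # addition in F_2
        words.append(tuple(w))
    return words


# Assumption: H(7,4) realised as the cyclic code generated by 1 + x + x^3
# (coefficients from x^0 upwards); every binary [7,4,3] code has the same w_C.
hamming = binary_polynomial_code([1, 1, 0, 1], 7)
extended = [c + (sum(c) % 2,) for c in hamming]  # append a parity bit

w_H = weight_enumerator(hamming)
w_ext = weight_enumerator(extended)
print(w_H)    # [1, 0, 0, 7, 7, 0, 0, 1]
print(w_ext)  # [1, 0, 0, 0, 14, 0, 0, 0, 1]
```

Both lists read the same backwards, which is the palindromic symmetry coming from the all-ones word.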
It is clear that equivalent codes have the same weight enumerator. The isometries of \(F^n\) with respect to the Hamming distance, i.e. the isomorphisms of the \(F\)-vector space \(F^n\) preserving the Hamming distance, form a group, which can be explicitly described. Obvious isometries are "permutation of the places of a word" and "multiplication of each place of a word by a nonzero scalar from \(F\)". In fact, there are no others in the sense that the group of isometries of \(F^n\) is generated by these isometries (see Exercise 11.1).

Example 11.1 (Weight enumerators of Hamming and Golay code). If we read off the Hamming code from the projective plane over \(\mathbb{F}_2\) as explained in Example 4.2, we find that it consists of 1 codeword of weight 0, 7 codewords of weight 3 (which correspond to the lines of the projective plane), 7 codewords of weight 4 (which correspond to the complements of the lines), and 1 codeword of weight 7 corresponding to the projective plane itself. Therefore
\[ w_{H(7,4)} = 1 + 7x^3 + 7x^4 + x^7. \]
If we append a parity bit to each codeword of the Hamming code then only the codewords of odd weight become codewords of increased weight in the extended code. Therefore
\[ w_{\overline{H(7,4)}} = 1 + 14x^4 + x^8. \]
For the Golay code we find (using e.g. Exercise 9.2)
\[ w_{G_{23}} = 1 + 253x^7 + 506x^8 + 1288x^{11} + 1288x^{12} + 506x^{15} + 253x^{16} + x^{23}. \]
And as before we deduce from this for the extended Golay code
\[ w_{G_{24}} = 1 + 759x^8 + 2576x^{12} + 759x^{16} + x^{24}. \]
We see that the weight enumerators of the example are palindromic, i.e. satisfy \(x^{\deg(w)} w(1/x) = w(x)\). This is in fact easy to understand: each of the codes in the example possesses the word \(\mathbf{1}\) consisting only of 1s, and adding \(\mathbf{1}\) defines a bijection between the codewords of weight \(k\) and those of weight \(n - k\) (where \(n\) is the length of the code).

Exercises

11.1. Show that the group of isometries of \(F^n\) with respect to the Hamming distance equals \(S \ltimes M\), where \(S\) is the group of isometries of the form
\[ p_\sigma : w_1 w_2 \cdots w_n \mapsto w_{\sigma(1)} w_{\sigma(2)} \cdots w_{\sigma(n)} \]
(\(\sigma\) an element of the symmetric group on \(n\) letters), and where \(M\) is the group of isometries of the form
\[ m_a : w \mapsto aw := (a_1 w_1, a_2 w_2, \ldots, a_n w_n) \]
(\(a = (a_1, a_2, \ldots, a_n)\) an element of \((F^*)^n\)).

12 MacWilliams' Identity

Weight enumerators and their deeper properties are shadows of a richer theory behind them, namely the theory of Weil representations. We do not need the full theory of these representations here, but consider only a small part, streamlined to applications in coding theory.

For this let \(F\) be a finite field with \(q\) elements. We consider the polynomial ring over \(\mathbb{C}\) in \(q\) variables \(x_a\) indexed by the elements \(a\) of \(F\). To a code \(C\) over \(F\) of length \(n\) we associate its generalized weight enumerator
\[ W_C = \sum_{c = c_1 c_2 \cdots c_n \in C} x_{c_1} x_{c_2} \cdots x_{c_n} \in \mathbb{Z}[\{x_a\}_{a \in F}]. \]
This is the generating function for the following problem: for given non-negative integers \(k_a\) (\(a \in F\)) with \(\sum_{a \in F} k_a = n\), find the number \(A(\{k_a\}_{a \in F})\) of codewords in \(C\) which, for each \(a\), have exactly \(k_a\) entries equal to \(a\). In other words,
\[ W_C = \sum_{\substack{k_{a_1}, \ldots, k_{a_q} \geq 0 \\ k_{a_1} + \cdots + k_{a_q} = n}} A(k_{a_1}, \ldots, k_{a_q})\, x_{a_1}^{k_{a_1}} \cdots x_{a_q}^{k_{a_q}}, \]
where \(a_1, \ldots, a_q\) is a fixed enumeration of \(F\). Note also that
\[ W_C(x_0 := 1,\ x_a := x \ (a \in F^*)) = w_C \]
(i.e. replacing \(x_0\) by 1 and all other \(x_a\) by \(x\) yields the usual weight enumerator). If \(F = \mathbb{F}_2\) the generalized enumerator \(W_C\) does not carry more information than \(w_C\). Namely, we have \(W_C(x_0, x_1) = x_0^n\, w_C(x_1/x_0)\).

For the following we fix a non-trivial group homomorphism \(\chi: F^+ \to \mathbb{C}^*\). Here \(F^+\) indicates the field \(F\) viewed as an abelian group with respect to addition. Note that \(\sum_{a \in F} \chi(ab)\) equals 0 for \(b \neq 0\) and equals \(|F|\) otherwise, as follows from the fact that \(\chi(ab) = 1\) for all \(a\) is only possible for \(b = 0\) (see Exercise 12.2).
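The generalized weight enumerator is easy to tabulate by machine for small codes. The following Python sketch, which is not part of the notes, enumerates the ternary Golay code of Section 10 from the generator polynomial \(g = x^5 + 2x^3 + x^2 + 2x + 2\) given there, counts codewords by their composition \((k_0, k_1, k_2)\), and then performs the specialization \(x_0 := 1\), \(x_a := x\) to recover \(w_C\). In passing it reproduces the search over all \(3^6 = 729\) codewords mentioned there. The function names are illustrative.

```python
from collections import Counter
from itertools import product

Q = 3   # the field F_3 with elements 0, 1, 2
N = 11  # code length

# Generator polynomial g = x^5 + 2x^3 + x^2 + 2x + 2, coefficients from x^0 upwards.
G = [2, 2, 1, 2, 0, 1]


def ternary_golay_codewords():
    """All 3^6 words sum_j u_j * x^j * g (j = 0..5); degrees stay below 11, so no reduction."""
    words = []
    for u in product(range(Q), repeat=N - (len(G) - 1)):
        w = [0] * N
        for j, uj in enumerate(u):
            for i, gi in enumerate(G):
                w[i + j] = (w[i + j] + uj * gi) % Q
        words.append(tuple(w))
    return words


def generalized_weight_enumerator(words):
    """Map (k_0, k_1, k_2) -> number of codewords having exactly k_a entries equal to a."""
    return Counter(tuple(w.count(a) for a in range(Q)) for w in words)


def specialize(W):
    """Set x_0 := 1 and x_a := x (a != 0): returns the coefficient list of w_C(x)."""
    w = [0] * (N + 1)
    for ks, count in W.items():
        w[N - ks[0]] += count  # Hamming weight = number of non-zero entries
    return w


words = ternary_golay_codewords()
W = generalized_weight_enumerator(words)
w = specialize(W)
print(len(words), min(k for k in range(1, N + 1) if w[k]))  # 729 5
```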
We define two linear operators \(T\) and \(S\) on the subspace \(H_1\) of linear forms in \(\mathbb{C}[\{x_a\}_{a \in F}]\):
\[ T x_a = \psi(a)\, x_a, \qquad S x_a = \frac{\sigma(F)}{\sqrt{q}} \sum_{b \in F} \chi(-ab)\, x_b, \]
where
\[ \sigma(F) = \frac{1}{\sqrt{q}} \sum_{a \in F} \psi(a)^{-1}. \]
The operator \(T\) is not needed in this section, and for the moment the reader may ignore it and the following definition of \(\psi\). However, it will play an important role in Section 3.4. We use \(\psi\) for a map \(\psi: F \to \mathbb{C}^*\) such that
1. \(\frac{\psi(a+b)}{\psi(a)\psi(b)} = \chi(ab)\),
2. \(\psi(a)^2 = \chi(a^2)\).
If \(|F| = p^k\) for some odd prime \(p\) we can take \(\psi(a) = \chi(a^2)^{\frac{1-p}{2}}\). In fact, this is the only choice, as we see by taking the \(\frac{1-p}{2}\)th power on both sides of 2. (For verifying these two statements you want to use that \(\chi(a)^p = \chi(pa) = 1\).) However, if \(p = 2\) the proof of the existence of a \(\psi\) satisfying 1 and 2 is more subtle, and in general there is more than one choice (see Addon 12 for more details).

It is not difficult to show that \(T\) and \(S\) are invertible as operators on \(H_1\). For \(T\) this is obvious. For \(S\) one can show that \(S^2 x_a = \sigma(F)^2 x_{-a}\) (see Exercise 12.1), and one uses that \(\sigma(F)^8 = 1\).

Let \(G\) be the subgroup of \(GL(H_1)\) generated by the linear operators \(S\) and \(T\). Though we shall not make use of it in this section, the following is noteworthy. It can be shown that, for odd \(q\), the assignment
\[ \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \mapsto T, \qquad \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \mapsto S \]
can be extended to a homomorphism from \(SL(2, F)\) onto \(G\). (This extension is unique since \(SL(2, F)\) is generated by these two matrices.) Thus \(G\) is a homomorphic image of a quotient of \(SL(2, F)\). If \(q\) is a power of 2 the situation is slightly more complicated. Here \(G\) turns out to be a homomorphic image of a central extension of \(SL(2, F)\) of order two.

We extend the action of \(G\) on \(H_1\) to an action of \(G\) on \(\mathbb{C}[\{x_a\}_{a \in F}]\) by setting, for \(A\) in \(G\) and a polynomial \(f\),
\[ (A.f)(\{x_a\}) = f(\{A.x_a\}). \]
Later we shall be interested in codes whose weight enumerators are invariant under \(G\).
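For a prime field the defining conditions of \(\psi\) can be checked numerically. The Python sketch below works under assumptions of its own: \(F = \mathbb{F}_p\) with \(p\) an odd prime, the character \(\chi(a) = e^{2\pi i a/p}\) (one particular non-trivial character), and \(\psi(a) = \chi(a^2)^{(1-p)/2}\). It tests conditions 1 and 2 and evaluates \(\sigma(F)\), whose eighth power should equal 1.

```python
import cmath


def chi(a, p):
    """The non-trivial additive character a -> exp(2 pi i a / p) of F_p (our choice)."""
    return cmath.exp(2j * cmath.pi * (a % p) / p)


def psi(a, p):
    """psi(a) = chi(a^2)^((1-p)/2) = chi(a^2 * (1-p)/2); (1-p)/2 is an integer for odd p."""
    return chi(a * a * ((1 - p) // 2), p)


def check_psi(p, tol=1e-9):
    """Return (whether conditions 1 and 2 hold on F_p, sigma(F_p))."""
    ok = True
    for a in range(p):
        for b in range(p):
            lhs = psi((a + b) % p, p) / (psi(a, p) * psi(b, p))
            ok = ok and abs(lhs - chi(a * b, p)) < tol          # condition 1
        ok = ok and abs(psi(a, p) ** 2 - chi(a * a, p)) < tol   # condition 2
    sigma = sum(1 / psi(a, p) for a in range(p)) / cmath.sqrt(p)
    return ok, sigma


for p in (3, 5, 7, 11, 13):
    ok, sigma = check_psi(p)
    print(p, ok, abs(sigma ** 8 - 1) < 1e-9)
```

The square \(a^2\) in the exponent is what makes condition 1 work: \((a+b)^2 - a^2 - b^2 = 2ab\), and the factor \(\frac{1-p}{2}\) acts as \(\frac12\) modulo \(p\).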
For the moment we confine ourselves to the statement that \(S\) maps the generalized weight enumerator of a linear code to that of its dual code, up to multiplication by a constant. For this we introduce the notion of the dual code. Let \(C\) be a linear code over \(F\) of length \(n\). The dual code \(C^\perp\) of \(C\) is the subspace of words \(w = w_1 w_2 \cdots w_n\) in \(F^n\) such that
\[ w \cdot c := \sum_{i=1}^n w_i c_i \]
equals 0 for all \(c = c_1 c_2 \cdots c_n\) in \(C\).

Theorem 12.1. For any linear code \(C\) over \(F\) of length \(n\), one has
\[ S W_C = \frac{\sigma(F)^n\, |C|}{\sqrt{q}^{\,n}}\, W_{C^\perp}. \]

Proof. One has
\[ S W_C = W_C\Big(\Big\{\frac{\sigma(F)}{\sqrt{q}} \sum_{b \in F} \chi(-ab)\, x_b\Big\}_{a \in F}\Big) = \frac{\sigma(F)^n}{\sqrt{q}^{\,n}} \sum_{c \in C} \sum_{w \in F^n} \chi(-c \cdot w)\, x_{w_1} x_{w_2} \cdots x_{w_n}. \]
But \(\sum_{c \in C} \chi(-c \cdot w) = 0\) unless \(w \in C^\perp\), when the sum equals \(|C|\). The theorem is now obvious.

As a consequence of the foregoing theorem we obtain a remarkable identity due to Jessie MacWilliams, which expresses the (usual) weight enumerator of a dual code in terms of the weight enumerator of the given code.

Theorem 12.2 (MacWilliams' Identity). For any linear code \(C\) of length \(n\) over a field \(F\) with \(q\) elements, one has
\[ w_{C^\perp}(x) = |C|^{-1}\, w_C\!\left(\frac{1-x}{1+(q-1)x}\right) (1 + (q-1)x)^n. \]

Proof. Indeed, setting \(x_0 = 1\) and \(x_b = x\) (\(b \in F^*\)), the sum \(\sum_{b \in F} \chi(-ab)\, x_b\) becomes \(1 - x\) for \(a \neq 0\) and \(1 + (q-1)x\) for \(a = 0\). We therefore obtain
\[ (S W_C)(x_0 := 1,\ x_b := x \ (b \in F^*)) = \frac{\sigma(F)^n}{\sqrt{q}^{\,n}} \sum_{c \in C} (1 + (q-1)x)^{n-h(c)} (1-x)^{h(c)}. \]
Comparing this to the formula of Theorem 12.1 we recognize MacWilliams' Identity.

If \(F = \mathbb{F}_2\), MacWilliams' Identity assumes an especially attractive form if written in terms of \(W_C(x_0, x_1)\). Here it becomes
\[ |C^\perp|^{-1/2}\, W_{C^\perp}(x_0, x_1) = |C|^{-1/2}\, W_C\!\left((x_0, x_1) \cdot \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\right). \]
(We used here \(|C| \cdot |C^\perp| = 2^n\), where \(n\) is the length of \(C\), as we shall prove in the next section.)

Example 12.3 (Weight enumerators of the duals of Hamming and Golay code). The weight enumerators of the Hamming and Golay code were computed in Example 11.1.
For the weight enumerators of the dual codes we find by MacWilliams' Identity:
\[ w_{G_{23}^\perp} = 2^{-12}\, w_{G_{23}}\!\left(\frac{1-x}{1+x}\right)(1+x)^{23} = 253x^{16} + 1288x^{12} + 506x^8 + 1, \]
\[ w_{H(7,4)^\perp} = 2^{-4}\, w_{H(7,4)}\!\left(\frac{1-x}{1+x}\right)(1+x)^7 = 7x^4 + 1. \]
In particular, we see that the minimal weights increased, but at the price that the information rate dropped below \(\frac12\).

The existence of characters of degree two

The purpose of this section is the proof of the following statement.

Theorem 12.4. Let \(\chi: M \times M \to \mathbb{C}^*\) be a symmetric \(\mathbb{Z}\)-bilinear non-degenerate map on an (additive) finite abelian group \(M\). Then there exists a map \(\psi: M \to \mathbb{C}^*\) such that, for all \(a, b\) in \(M\), one has
1. \(\frac{\psi(a+b)}{\psi(a)\psi(b)} = \chi(a, b)\),
2. \(\psi(a)^2 = \chi(a, a)\).

The term non-degenerate means that the application \(b \mapsto \chi(*, b)\) defines an isomorphism of groups \(M \to \mathrm{Hom}(M, \mathbb{C}^*)\). Note that it suffices to verify that \(b \mapsto \chi(*, b)\) is injective, since by general theorems on the dual groups of abelian groups, \(M\) and the group of characters \(\mathrm{Hom}(M, \mathbb{C}^*)\) on \(M\) have the same cardinality (they are even isomorphic). In the text we applied this theorem in the situation \(M = F^+\) and \(\chi(a, b) = \chi(ab)\) for a non-trivial character \(\chi\) of \(F^+\).

Note that 1. and 2. imply \(\psi(na) = \psi(a)^{n^2}\) for all integers \(n\) (as follows inductively, proceeding like
\[ \psi((n+1)a) = \psi(na)\,\psi(a)\,\chi(na, a) = \psi(a)^{n^2+1} \chi(a, a)^n = \psi(a)^{n^2+1} \psi(a)^{2n} = \psi(a)^{(n+1)^2}, \]
where the first identity is 1., the second the induction hypothesis, and the third is 2.).

Proof. For proving the theorem let \([M, \chi]\) be the group consisting of all pairs \((a, s)\) (\(a \in M\), \(s \in \mu_e\)) with the composition law
\[ (a, s) \cdot (a', s') = (a + a', s s' \chi(a, a')). \]
Here \(e\) denotes the exponent of \(M\) and \(\mu_e\) the subgroup of \(e\)th roots of unity in \(\mathbb{C}^*\). We leave it to the skeptical reader to verify that this indeed defines a group. Since \([M, \chi]\) is finite and abelian we can extend the character \((0, s) \mapsto s^{-1}\) on the subgroup \(\{(0, s) : s \in \mu_e\}\) to a character \(\hat\psi\) on all of \([M, \chi]\). Let \(\psi(a) := \hat\psi((a, 1))\). It is easy to see that \(\psi\) satisfies 1. However, 2. is not necessarily satisfied. But we can modify \(\psi\) by multiplying it with any character of \(M\), and 1. is still satisfied for the modified \(\psi\). We show that we can modify \(\psi\) in this way so as to fulfill 2. For this consider the map \(\gamma\) on \(M\) defined by
\[ \gamma(a) := \frac{\psi(a)^2}{\chi(a, a)}. \]
From 1. it follows that \(\gamma\) is a character of \(M\). A simple calculation shows that \(\gamma\) is trivial on the subgroup \(M[2]\) of elements \(a\) of \(M\) such that \(2a = 0\). Namely, we have
\[ \frac{\psi(a)^2}{\chi(a, a)} = \frac{\hat\psi((a, 1))^2}{\chi(a, a)} = \frac{\hat\psi((2a, \chi(a, a)))}{\chi(a, a)} = \frac{\psi(2a)}{\chi(a, a)^2}, \]
which equals 1 if \(2a = 0\). But by general duality theory for abelian groups, the subgroup of characters of \(M\) which are trivial on \(M[2]\) equals the subgroup of squares of characters on \(M\). Therefore \(\gamma = \delta^2\) for a suitable character \(\delta\) on \(M\), and by the non-degeneracy of \(\chi\), \(\delta = \chi(*, c)\) for some \(c\) in \(M\). In other words,
\[ \frac{\psi(a)^2}{\chi(a, a)} = \chi(a, c)^2 \]
for all \(a\) in \(M\), from which we recognize that 2. is fulfilled with \(\psi\) replaced by \(\psi/\chi(*, c)\). This proves the theorem.

Exercises

12.1. Show that the operator \(S\) has finite order and is hence invertible. (You can use without proof that \(\sigma(F)^8 = 1\).)

12.2. Show that \(\chi(ab) = 1\) for all \(a \in F\) only for \(b = 0\). Deduce from this that
\[ \sum_{a \in F} \chi(ab) = \begin{cases} |F| & \text{if } b = 0, \\ 0 & \text{otherwise.} \end{cases} \]

12.3. Compute \(\sigma(F)\) for \(|F|\) equal to a 2-power, 3-power, or 5-power. What do you expect for an arbitrary \(F\)?

13 Duality

In coding theory we are naturally interested in principles for the construction of codes which allow us to control the information rate and the minimal distance. One of these constructions is taking the dual of a given code. To explain this, fix a finite field \(F\) and recall the scalar product
\[ v \cdot w := v_1 w_1 + \cdots + v_n w_n \]
of two words \(v = v_1 \cdots v_n\) and \(w = w_1 \cdots w_n\) in \(F^n\). Note that the scalar product takes values in \(F\). It is non-degenerate, which means that \(v \cdot w = 0\) for all \(w\) in \(F^n\) is only possible for \(v = 0\).
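With the scalar product at hand, the dual of a small binary code can simply be computed by brute force and compared with MacWilliams' Identity (Theorem 12.2). The Python sketch below does this for the Hamming code, assuming as generator rows the cyclic shifts of \(1 + x + x^3\) (an equivalent realisation of \(H(7,4)\)). It also checks \(|C| \cdot |C^\perp| = 2^n\) and the self-duality of the extended Hamming code. Helper names are illustrative and the brute force is only meant for small \(n\).

```python
from fractions import Fraction
from itertools import product
from math import comb


def span_binary(rows):
    """All F_2-linear combinations of the given 0/1 rows, sorted."""
    n = len(rows[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(rows)):
        words.add(tuple(sum(c * r[j] for c, r in zip(coeffs, rows)) % 2 for j in range(n)))
    return sorted(words)


def dual_code(code, n):
    """Brute force: all w in F_2^n with w . c = 0 for every c in the code."""
    return [w for w in product((0, 1), repeat=n)
            if all(sum(wi * ci for wi, ci in zip(w, c)) % 2 == 0 for c in code)]


def weight_distribution(code, n):
    dist = [0] * (n + 1)
    for c in code:
        dist[sum(c)] += 1
    return dist


def macwilliams_binary(dist):
    """Coefficients of |C|^{-1} * sum_i A_i (1 - x)^i (1 + x)^(n - i)  (Theorem 12.2, q = 2)."""
    n = len(dist) - 1
    out = [Fraction(0)] * (n + 1)
    for i, a_i in enumerate(dist):
        for s in range(i + 1):          # term of (1 - x)^i
            for t in range(n - i + 1):  # term of (1 + x)^(n - i)
                out[s + t] += a_i * (-1) ** s * comb(i, s) * comb(n - i, t)
    return [c / sum(dist) for c in out]


# Assumption: generator rows of H(7,4) = cyclic shifts of 1 + x + x^3.
rows = [(1, 1, 0, 1, 0, 0, 0), (0, 1, 1, 0, 1, 0, 0),
        (0, 0, 1, 1, 0, 1, 0), (0, 0, 0, 1, 1, 0, 1)]
C = span_binary(rows)
D = dual_code(C, 7)
print(len(C) * len(D) == 2 ** 7)   # |C| |C^perp| = 2^n
print(weight_distribution(D, 7))   # [1, 0, 0, 0, 7, 0, 0, 0], i.e. 1 + 7x^4
print(macwilliams_binary(weight_distribution(C, 7)) == weight_distribution(D, 7))

ext = sorted(c + (sum(c) % 2,) for c in C)  # the extended Hamming code
print(dual_code(ext, 8) == ext)              # self-dual
```

The transform in `macwilliams_binary` only needs the weight distribution of \(C\), which is exactly the point of MacWilliams' Identity; the brute-force dual serves as an independent check.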
For a (not necessarily linear) code \(C\) in \(F^n\) we define its dual \(C^\perp\) by
\[ C^\perp := \{w \in F^n : w \cdot c = 0 \text{ for all } c \in C\}. \]
It is clear that \(C^\perp\) is a linear code, even if \(C\) is not, and that \(C^\perp\) equals the dual of the linear space \(\langle C \rangle\) generated by \(C\). Because of this we are mainly interested in studying the duals of linear codes. Concerning the dimension of the dual of a given linear code we have the following.

Proposition 13.1. Let \(C\) be a linear code of length \(n\) over \(F\). Then \(\dim C + \dim C^\perp = n\).

Proof. Let \(c_1, \ldots, c_k\) be a basis of \(C\) and consider the map
\[ L: F^n \to F^k, \quad w \mapsto (w \cdot c_1, \ldots, w \cdot c_k) = wM, \]
where \(M\) is the matrix whose columns are the \(c_i\). Note that the rank of \(M\) is \(k\) (since the \(c_i\) are linearly independent), so that \(L\) is surjective. One of the main theorems about linear maps in linear algebra states \(\dim \ker L + \dim \operatorname{im} L = n\). But as we saw, \(\operatorname{im} L = F^k\), and obviously \(\ker L = C^\perp\). This proves the proposition.

The minimal distance of \(C^\perp\) is not simply a function of the minimal distance of \(C\); we need knowledge about the distribution of weights in \(C\). The relevant formula here is MacWilliams' Identity from the last section:
\[ w_{C^\perp}(x) = |C|^{-1}\, w_C\!\left(\frac{1-x}{1+(q-1)x}\right)(1 + (q-1)x)^n. \]
It is clear that we can read off the minimal distance of \(C^\perp\) from this formula. Let us suppose for the moment, for simplicity, that \(F = \mathbb{F}_2\), i.e. \(q = 2\), and that \(C\) is a code of length \(n\) and dimension \(k\) over \(\mathbb{F}_2\). Then
\[ d(C^\perp) = \operatorname{ord}_{x=0}\left( w_C\!\left(\frac{1-x}{1+x}\right)(1+x)^n - 2^k \right) = \operatorname{ord}_{x=1}\left( w_C(x) - 2^{k-n}(1+x)^n \right), \]
where \(\operatorname{ord}_{x=a} f(x)\) denotes the vanishing order of the function \(f(x)\) at \(x = a\). (The second identity follows on replacing \(x\) by \(\frac{1-x}{1+x}\).)

The reader should watch out that, for a given linear code \(C\), the codes \(C\) and \(C^\perp\) might have non-zero intersection; it is not at all true in general that \(C\) and \(C^\perp\) form a direct sum. On the contrary, very often the most interesting codes are those which are self-dual, i.e. those codes such that \(C = C^\perp\).
A necessary condition for the existence of such a code is of course that its length \(n\) is even, and then its dimension must be \(\frac{n}{2}\), as follows from the preceding proposition.

Example 13.2 (The extended Hamming code). The extended Hamming code \(\overline{H(7,4)}\) is an \([8, 4, 4]_2\)-code. Its weight enumerator equals \(1 + 14x^4 + x^8\), from which we see that the weight of each of its codewords is divisible by 4. As we shall see in a moment, this implies that \(C = \overline{H(7,4)}\) is contained in \(C^\perp\). Using the above proposition, this implies that \(C\) is in fact self-dual.

We call a binary code even if the weights of its codewords are all even, and we call it doubly even if the weights of its codewords are all divisible by 4. We have:

Proposition 13.3. Let \(C\) be a binary linear code of length \(n\). Then one has:
1. If \(C\) is self-dual, then \(C\) is even.
2. If \(C\) is doubly even, then \(C \subseteq C^\perp\).

Note that there are indeed codes which are self-dual but not doubly even. An example is the repetition code \(C := \{00, 11\}\).

Proof. For 1. note that, for a self-dual code, every codeword satisfies \(c \cdot c = 0\). But \(c \cdot c\) equals the number of 1s in \(c\), viewed as an element of \(\mathbb{F}_2\), which implies that this number is even. 2. is an immediate consequence of the identity
\[ \tfrac{1}{2}\big(h(c + c') - h(c) - h(c')\big) \cdot 1_{\mathbb{F}_2} = c \cdot c'. \]
Here \(1_{\mathbb{F}_2}\) is the non-zero element of \(\mathbb{F}_2\). Note that the number on the left is an integer, and it is even if \(C\) is doubly even.

We complement our theory of cyclic codes of Section 2.3 by the following.

Theorem 13.4. Let \(C\) be a cyclic code of length \(n\) with generator polynomial \(g\) and control polynomial \(h\).
1. Then \(C^\perp\) is cyclic with generator and control polynomials \(h^\sharp\) and \(g^\sharp\). (Here, for a polynomial \(f\) of degree \(l\), we use \(f^\sharp\) for the reciprocal polynomial, i.e. the polynomial \(f^\sharp(x) = x^l f(1/x)\).)
2. The code \(C^\perp\) is equivalent to the code generated by \(h\).

Proof. It is obvious that \(C^\perp\) is cyclic, and that \(g^\sharp\) and \(h^\sharp\) are both divisors of \(x^n - 1\). For computing the generator polynomial of \(C^\perp\) let \(g = \sum_i g_i x^i\) and \(h = \sum_j h_j x^j\).
Then
\[ x^n - 1 = gh = \sum_l x^l \sum_{i=0}^{l} g_i h_{l-i}. \]
Comparing the coefficients of \(x^{n-1}\) on both sides gives
\[ 0 = (g_0, g_1, \ldots, g_{n-1}) \cdot (h_{n-1}, h_{n-2}, \ldots, h_0). \]
But the word on the right represents the coefficients of \(h^\sharp\) up to a cyclic shift, and since \(gh \equiv 0 \bmod x^n - 1\), the same computation shows that this word is orthogonal to every cyclic shift of \(g\). It follows that \(h^\sharp\) is in \(C^\perp\), and therefore that the code \(C_1\) with generator polynomial \(h^\sharp\) is contained in \(C^\perp\). But both codes have the same dimension, as follows from \(\dim C_1 = n - \deg h^\sharp\) (see Proposition 9.1; note \(h(0) \neq 0\), so \(\deg h^\sharp = \deg h\)) and \(\dim C^\perp = n - \dim C = n - \deg h\) (see Proposition 13.1 and again Theorem 1 in Section 2.3). Clearly \(h^\sharp g^\sharp = (gh)^\sharp = 1 - x^n\), so that \(g^\sharp\) is, up to sign, the control polynomial of \(C^\perp\). For 2. we use the application \(f \mapsto f^\sharp\), which defines an isometric map between \(C^\perp\) and the code with generator polynomial \(h\).

Finally, we also determine the dual codes of the Reed-Solomon codes studied in Section 7. For this we have to introduce some notation. Let \(a = a_1 \cdots a_n\) be a vector of length \(n\) with pairwise different entries \(a_i\) from \(F\). We set
\[ g_a(x) := \prod_{i=1}^n (x - a_i)^{-1} \in F(x), \]
where \(F(x)\) is the field of rational functions in the variable \(x\). Moreover, for any polynomial \(f\), we set
\[ \operatorname{Res}_{a_i}(f g_a) := \big[f(x) g_a(x) \cdot (x - a_i)\big]_{x = a_i} = f(a_i) \prod_{\substack{j=1 \\ j \neq i}}^n (a_i - a_j)^{-1}. \]

Lemma 13.5. For \(f\) in \(F[x]_{<n-1}\), one has
\[ \sum_{i=1}^n \operatorname{Res}_{a_i}(f g_a) = 0. \]

We shall postpone the proof for a moment and state our main result about the duals of Reed-Solomon codes.

Theorem 13.6. Let \(C = RS_q(a, k)\) be a Reed-Solomon code of length \(n \geq k\).
1. Then \(C^\perp\) equals the image of the map
\[ R: F[x]_{<n-k} \to F^n, \quad f \mapsto \big(\operatorname{Res}_{a_1}(f g_a), \ldots, \operatorname{Res}_{a_n}(f g_a)\big). \]
2. \(C^\perp\) is equivalent to the Reed-Solomon code \(RS_q(a, n-k)\).

Proof. For 1. we show, first of all, that the image of \(R\) is contained in \(C^\perp\). Indeed, let \(h\) be in \(F[x]_{<k}\) and \(f\) in \(F[x]_{<n-k}\); then \(hf\) has degree \(\leq n - 2\), and from the lemma we obtain
\[ \sum_{i=1}^n h(a_i) \operatorname{Res}_{a_i}(f g_a) = \sum_{i=1}^n \operatorname{Res}_{a_i}(h f g_a) = 0. \]
The map \(R\) is injective since \(R(f) = 0\) implies \(f(a_i) = 0\) for all \(a_i\), and, since \(f\) has degree \(< n\), therefore \(f = 0\).
Hence the image of \(R\) has dimension \(n - k\), which equals \(n - \dim C\) (see Theorem 7.1) and is therefore the dimension of \(C^\perp\).

2. follows on calculating
\[ R(f) = (f(a_1)\alpha_1, \ldots, f(a_n)\alpha_n), \quad \text{where we use} \quad \alpha_i = \prod_{\substack{j=1 \\ j \neq i}}^n (a_i - a_j)^{-1}. \]
It follows that \(C^\perp\) equals the image of \(RS_q(a, n-k)\) under the map \((w_1, \ldots, w_n) \mapsto (\alpha_1 w_1, \ldots, \alpha_n w_n)\), which is an isometry of the Hamming distance.

Proof of Lemma 13.5. For a polynomial \(f\) consider the determinant
\[ D_f := \det \begin{pmatrix} 1 & a_1 & a_1^2 & \cdots & a_1^{n-2} & f(a_1) \\ 1 & a_2 & a_2^2 & \cdots & a_2^{n-2} & f(a_2) \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & a_n & a_n^2 & \cdots & a_n^{n-2} & f(a_n) \end{pmatrix}. \]
If \(f\) has degree \(\leq n - 2\), then the last column is a linear combination of the first \(n - 1\) columns, and hence \(D_f = 0\). On the other hand, expanding the determinant along the last column (Laplace expansion), we find
\[ D_f = \sum_{r=1}^n (-1)^{n+r} f(a_r) \cdot V_r, \]
where \(V_r\) equals the Vandermonde determinant of the numbers \(a_1, \ldots, a_{r-1}, a_{r+1}, \ldots, a_n\). In other words,
\[ V_r = \prod_{\substack{1 \leq i < j \leq n \\ i, j \neq r}} (a_j - a_i). \]
But we can write
\[ V_r = (-1)^{n-r} \prod_{1 \leq i < j \leq n} (a_j - a_i) \Big/ \prod_{\substack{i=1 \\ i \neq r}}^n (a_r - a_i). \]
It follows that
\[ D_f = \prod_{1 \leq i < j \leq n} (a_j - a_i) \sum_{r=1}^n \operatorname{Res}_{a_r}(f g_a), \]
which implies the lemma, since the \(a_i\) are pairwise different and hence the Vandermonde product does not vanish.

Exercises

13.1. Let \(C\) be an \([n, k, n+1-k]_q\)-code such that \(C^\perp\) is an \([n, n-k, k+1]_q\)-code (i.e. such that \(C\) and \(C^\perp\) are both MDS codes; see Section 7). Show that
\[ w_C(x) = 1 + \sum_{i=0}^{k-1} \binom{n}{i} (q^{k-i} - 1)(1 - x)^i x^{n-i}. \]

13.2. Deduce from Theorem 13.4 that the parity extension of any quadratic residue code is self-dual.

Appendix

14 Solutions to selected exercises

1.1 A straightforward approach, in the sense of "without much theory", is to use a computer algebra system. We use Sage, write the few lines of code needed to solve the exercise, and apply it here to \(\{0,1\}^3\) instead of \(\{0,1\}^5\) to reduce the output (and the computation time).
import itertools

def min_dist(C):
    # computes the minimum distance of a code C
    pairs = Subsets(C, 2)
    d = infinity
    for w, v in pairs:
        d = min(d, sum(1 for i in range(len(w)) if w[i] != v[i]))
    return d

F = [0, 1]
W = {tuple(w) for w in itertools.product(F, repeat=3)}
# for the exercise itself use repeat=5
S = Subsets(W)
tbl = {}
for C in S:
    p = (min_dist(C), C.cardinality())
    if p in tbl:
        tbl[p].append(C)
    else:
        tbl[p] = [C]
ll = sorted(tbl.keys()); ll

1.2 We need that
\[ 3 + 2 \cdot 5 + 3 \cdot 4 + 4 \cdot 0 + 5 \cdot 6 + 6 \cdot 4 + 7 \cdot 1 + 8 \cdot {*} + 9 \cdot 3 + 10 \cdot 5 = 163 + 8 \cdot {*} \]
is divisible by 11, which is the case only if \(* = 3\). Looking up the resulting ISBN-10 number 3540641335 in https://de.nicebooks.com reveals the book.

2.1 The only non-trivial property to check is the triangle inequality \(h(v, w) \leq h(v, t) + h(t, w)\). But, indeed, if \(v\) and \(w\) differ at the \(i\)th place then at least one of the pairs \(v, t\) and \(t, w\) also differs at the same place.

2.2 There are \(\binom{n}{i}\) choices for \(i\) places among \(n\) where a word can differ from a given one, and at each of these \(i\) places we have \(a - 1\) choices for a letter different from the letter at the corresponding place of the given word. Thus there are in total \(\binom{n}{i}(a-1)^i\) words of length \(n\) having Hamming distance \(i\) to a given word. The claimed formula is now immediate.

2.3 It suffices to prove that, for any \(1 \leq i \leq \frac{a-1}{a} n\), one has
\[ \binom{n}{i-1}(a-1)^{i-1} \leq \binom{n}{i}(a-1)^i. \]
This inequality is equivalent to
\[ \frac{n-i+1}{i}(a-1) \geq 1, \]
which is indeed true since the left hand side equals
\[ -(a-1) + \frac{n+1}{i/(a-1)} \geq -(a-1) + \frac{n+1}{n/a} = 1 + \frac{a}{n}. \]

2.4 We have, for \(p \geq 1 - \frac{1}{a}\),
\[ \sum_{i \leq n(1-\frac1a)} \binom{n}{i}(a-1)^i \leq \sum_{i \leq np} \binom{n}{i}(a-1)^i \leq \sum_{i \leq n} \binom{n}{i}(a-1)^i = a^n. \]
Taking \(\log_a\) and dividing by \(n\), and using that by Theorem 4 the left hand side tends to 1, whereas the right hand side equals 1, we conclude that the limit of the middle term exists and equals 1.

2.5 There are in total \(p^2\) normalized polynomials of degree 2 in \(\mathbb{F}_p[x]\).
To find the irreducible ones, we have to remove those which are of the form \(f^2\) or \(f \cdot g\) with distinct normalized polynomials \(f\) and \(g\) of degree 1, which are \(p\) and \(\binom{p}{2}\) many, respectively. Thus there remain \(p^2 - p - \binom{p}{2} = p(p-1)/2\) irreducible polynomials of degree 2.

The number of irreducible polynomials over \(\mathbb{F}_p\)

We fix a prime power \(q\). Denote by \(N_q(l)\) the number of normalized irreducible polynomials of degree \(l\) over \(\mathbb{F}_q\). It is not hard to prove the following.

Proposition 14.1.
\[ N_q(l) = \frac{1}{l} \sum_{d \mid l} \mu(l/d)\, q^d. \]

Proof. Every normalized polynomial can be factored in a unique way into a product of powers of normalized irreducible polynomials. From this it follows that
\[ \sum_f q^{-s \deg(f)} = \prod_g \big(1 - q^{-s \deg(g)}\big)^{-1}, \]
where \(f\) runs through all normalized and \(g\) through all normalized irreducible polynomials in \(\mathbb{F}_q[x]\). The number of normalized polynomials of degree \(n\) is \(q^n\), so the left hand side of the last formula equals \(1/(1 - q^{1-s})\). Thus the last formula can be rewritten as
\[ 1 - q^{1-s} = \prod_{n \geq 1} \big(1 - q^{-sn}\big)^{N_q(n)}. \]
Taking the logarithm on both sides, expanding in powers of \(q^{-s}\) and comparing coefficients, we find \(q^l = \sum_{d \mid l} d\, N_q(d)\), and Möbius inversion yields the claimed formula.

For a prime power \(l^t\) (\(l\) prime, \(t \geq 1\)) we have
\[ N_q(l^t) = \frac{q^{l^t} - q^{l^{t-1}}}{l^t}. \]
We conclude that \(N_q(n) \geq 1\) for all \(n\). We see that
\[ \lim_{l \to \infty} \frac{N_q(l)}{q^l / l} = 1. \]
In other words, for a given large degree \(l\), a randomly chosen normalized polynomial of degree \(l\) is irreducible with probability close to \(1/l\). For the first primes, the first values of the sequence \(N_p(l)\) are

3.1 The mean value of \(h\) on \(E_C\) is
\[ E(h) = \sum_{c \in C,\, w \in A^n} h(c, w)\, P_C(c, w) = \frac{1}{|C|} \sum_{r \geq 0} r\, a(r) \left(\frac{p}{a-1}\right)^r (1-p)^{n-r}, \]
where \(a(r)\) is the number of pairs \((c, w)\) in \(C \times A^n\) with \(h(c, w) = r\). There are \(|C|\) possibilities to pick \(c\), and then, for a given \(c\), there are \(\binom{n}{r}\) choices of \(r\) places which can be changed in \((a-1)^r\) ways to yield a \(w\) with \(h(c, w) = r\). Thus
\[ a(r) = |C| \binom{n}{r} (a-1)^r, \]
and inserting this into the last formula for \(E(h)\) gives
\[ E(h) = \sum_{r \geq 0} r \binom{n}{r} p^r (1-p)^{n-r} = \frac{d}{dt} (1 - p + e^t p)^n \Big|_{t=0} = np. \]
Before computing the variance of \(h\) we note that \(\sigma^2(h) = E(h^2) - E(h)^2\), as is immediate from the defining formula for the variance by multiplying out \(|h(e) - E(h)|^2\). By a similar computation as before we obtain
\[ E(h^2) = \frac{d^2}{dt^2} (1 - p + e^t p)^n \Big|_{t=0} = n(n-1)p^2 + np, \]
which implies indeed \(\sigma^2(h) = np(1-p)\).

3.2 If \(w = 0\), then \(w\) is contained in every subspace, and hence the mean value is 1. So assume that \(w \neq 0\). Then every \(v \notin \{0, w\}\) defines the 2-dimensional subspace \(\langle v, w \rangle\) containing \(w\), and every such subspace occurs exactly two times when \(v\) runs through \(\mathbb{F}_2^3 \setminus \{0, w\}\) (since \(v\) and \(v + w\) define the same subspace). Thus the requested mean value is \((2^3 - 2)/(2 g_{32})\), where \(g_{32}\) is the number of 2-dimensional subspaces of \(\mathbb{F}_2^3\). But this number is the same as \(g_{31}\), the number of 1-dimensional subspaces of \(\mathbb{F}_2^3\) (as follows from the fact that the map \(V \mapsto V^*\) defines a bijection between the set of two-dimensional subspaces of \(\mathbb{F}_2^3\) and the 1-dimensional subspaces of the dual space \((\mathbb{F}_2^3)^*\), which is isomorphic to \(\mathbb{F}_2^3\); here \(S^*\) is the subspace of linear forms vanishing on \(S\)). Obviously, \(g_{31} = 2^3 - 1 = 7\), so that the desired mean value is \(\frac{3}{7}\). A more general approach is to count \(k\)-dimensional subspaces containing a given subspace \(S\) by identifying them with subspaces of \(V/S\) of dimension \(k - s\), where \(s\) is the dimension of \(S\).

4.1 Solution by Robert Stark:

# SageMath
def min_dist(C):
    pairs = Subsets(C, 2)
    d = infinity
    for w, v in pairs:
        d = min(d, sum(1 for i in range(len(w)) if w[i] != v[i]))
    return d

idMat = matrix.identity(12)  # 12x12 identity matrix
adjIcoMat = graphs.IcosahedralGraph().adjacency_matrix()  # adjacency matrix of the icosahedral graph

def compl(x):
    if x == 1:
        return 0
    else:
        return 1

adjIcoMat = adjIcoMat.apply_map(compl)  # complement
genMat = idMat.augment(adjIcoMat)  # generator matrix

# generate all codewords
codewords = []
for w in list(span(idMat, GF(2))):
    codewords.append(w * genMat)
for c in codewords:
    c.set_immutable()  # fix for min_dist function (Set needs hashable vectors)
print("Minimal distance of G24:", min_dist(Set(codewords)))

4.2 For \(k \leq n\), let \(\mathrm{Reg}_{k,n}(F)\) be the set of \(k \times n\) matrices over \(F\) of full rank. The application \(M \mapsto\) (space generated by the rows of \(M\)) defines a surjective map \(S: \mathrm{Reg}_{k,n}(F) \to G_{kn}(F)\). Moreover \(S(M) = S(M')\) if and only if there exists a \(g\) in \(GL(k, F)\) such that \(M = gM'\). Note that \(gM = M\) implies \(g = 1\) since by assumption \(M\) possesses a \(k \times k\) submatrix in \(GL(k, F)\). Therefore the preimage of a subspace in \(G_{kn}(F)\) under the map \(S\) comprises exactly \(N := |GL(k, F)|\) elements. We conclude that
\[ |G_{kn}(F)| = \frac{|\mathrm{Reg}_{k,n}(F)|}{|GL(k, F)|}. \]
We have
\[ |\mathrm{Reg}_{k,n}(F)| = (q^n - 1)(q^n - q)(q^n - q^2) \cdots (q^n - q^{k-1}). \]
Namely, for the first row of an \(M\) in \(\mathrm{Reg}_{k,n}(F)\) we have \(q^n - 1\) choices (any vector except 0). For the second row we have \(q^n - q\) choices (any vector except the ones in the subspace spanned by the first row), for the third \(q^n - q^2\) (any vector except the ones in the subspace spanned by the first two rows), etc. Furthermore
\[ |GL(k, F)| = |\mathrm{Reg}_{k,k}(F)| = (q^k - 1)(q^k - q)(q^k - q^2) \cdots (q^k - q^{k-1}). \]
Combining the last three formulas proves the claimed formula.

4.3 Let \(h_1, \ldots, h_n\) denote the columns of the control matrix \(K\). Let \(s\) be minimal such that there are columns \(h_{i_1}, \ldots, h_{i_s}\) (\(i_1 < i_2 < \cdots < i_s\)) which are linearly dependent, i.e. such that there exists a non-zero codeword \(c\) with \(c_i = 0\) for all \(i\) outside \(\{i_1, \ldots, i_s\}\). Note also that all \(c_{i_j}\) are non-zero by the minimality of \(s\).
Thus \(c\) has Hamming weight \(s\), and we conclude \(d(C) \leq s\). Vice versa, let \(c\) in \(C\) be of minimal weight \(d = d(C)\). If \(i_1 < \cdots < i_d\) are the places where \(c\) has non-zero entries, then the columns \(h_{i_1}, \ldots, h_{i_d}\) are linearly dependent. Therefore \(d \geq s\).

4.4 For proving that \((V, S)\) is indeed an \((n, k)\)-system, we need to show that the columns of the generator matrix \(G\) span a \(k\)-dimensional space. But this is clear since the rank of \(G\) is \(k\) (since the \(k\) rows form a basis of \(C\), and since "row rank = column rank"). Next, we need to show that \(C\) equals the set of vectors \(\big(\varphi(P_1), \ldots, \varphi(P_n)\big)\), where \(P_i\) denotes the \(i\)th column of \(G\), and where \(\varphi\) runs through the space of linear forms \(F^{k \times 1} \to F\). But application of such a \(\varphi\) equals left multiplication by a fixed vector \(x\) in \(F^k\), and then \(\big(\varphi(P_1), \ldots, \varphi(P_n)\big) = xG\). The claim is now obvious.

5.1 Let \(a\) and \(b\) be two positive integers. We have
\[ \frac{d}{a} = \left\lceil \frac{d}{a} \right\rceil - \varepsilon \]
for some \(0 \leq \varepsilon < 1\). Set \(m := \left\lceil \lceil d/a \rceil / b \right\rceil\). Since \(\lceil d/a \rceil / b\) is in \(\frac{1}{b}\mathbb{Z}\), we have
\[ \frac{\lceil d/a \rceil}{b} = m - \frac{k}{b} \]
for some integer \(0 \leq k < b\). It follows that
\[ \frac{d}{ab} = m - \frac{k + \varepsilon}{b}. \]
Since \(k + \varepsilon < b\) we find \(\left\lceil \frac{d}{ab} \right\rceil = m\), which proves the claim.

5.2 Choose a codeword \(c\) of Hamming weight \(d\) and choose \(i\) so that \(c\) has a non-zero entry at the \(i\)th place. Let \(C'\) be the code obtained from \(C\) by deleting the \(i\)th place of the codewords in \(C\). Clearly \(C'\) has length \(n - 1\). Since deleting a place can lower the Hamming weight of a word by at most 1, the code \(C'\) has minimal weight \(\geq d - 1\). In fact we have equality since deleting the \(i\)th place of \(c\) yields a word of weight \(d - 1\). Finally, the map \(C \to C'\) which deletes the \(i\)th place is injective since every non-zero codeword in \(C\) is mapped to a word of Hamming weight \(\geq d - 1\), which is hence a non-zero word (for \(d \geq 2\)).

5.3 Let \(k\) be the maximal dimension for a code of length \(n\) and minimal distance \(d\) over the given field, and let \(C\) be an \([n, k, d]_q\) code. (Note that codes of length \(n\) and minimal distance \(1 \leq d \leq n\) exist, e.g. \(\langle 11\ldots100\ldots0 \rangle\) with \(d\) many 1s, so that it is in fact justified to talk of a maximal one amongst these.) Then there is no word \(w\) in \(A^n \setminus C\) which has distance \(\geq d\) to all codewords, since otherwise, for such a word \(w\), the subspace \(C \oplus \langle w \rangle\) would still have minimal distance \(d\) (as follows using \(h(aw, c) = h(w, a^{-1}c)\) for \(c\) in \(C\) and non-zero scalars \(a\)). Therefore the balls of radius \(d - 1\) around the codewords cover all of \(A^n\). In particular,
\[ q^k \cdot V_q(n, d-1) \geq q^n. \]
Taking the base-\(q\) logarithm proves the theorem.

7.1 \(I := \ker(\mathrm{EV})\) is a principal ideal (since, as the kernel of a linear map, it is a subspace of \(F[x]\), since, for any polynomial in \(I\), any multiple is obviously also in \(I\), and since any ideal in \(F[x]\) is a principal ideal). Thus \(I = F[x] \cdot g\) for some monic polynomial \(g\). Since \(g(a) = 0\) for all \(a\) in \(F\), the polynomial \(g\) has at least \(q := |F|\) zeros, and hence its degree is \(\geq q\). But \(x^q - x\) has all elements of \(F\) as zeros (since \(a^{q-1} = 1\) for all \(a\) in the multiplicative group \(F^*\)), is therefore contained in \(I\), hence a multiple of \(g\), whence \(g = x^q - x\). Thus \(I = F[x] \cdot (x^q - x)\).

7.2 As in the previous exercise one shows that the map
\[ \mathrm{Ev}_a: F[x] \to F^n, \quad f \mapsto \big(f(a_1), \ldots, f(a_n)\big) \]
has kernel \(F[x] \cdot g\), where \(g = \prod_{j=1}^n (x - a_j)\). Assume \(k > n\). We conclude
\[ \ker(\mathrm{Ev}_{a,k}) = \ker(\mathrm{Ev}_a) \cap F[x]_{<k} = F[x]_{<k-n} \cdot g, \]
in particular \(\dim \ker(\mathrm{Ev}_{a,k}) = k - n\). But then
\[ \dim \operatorname{image}(\mathrm{Ev}_{a,k}) = \dim F[x]_{<k} - \dim \ker(\mathrm{Ev}_{a,k}) = n. \]
It follows that \(RS_q(a, k)\), for \(k > n\), is an \([n, n, 1]_q\)-code.

9.1 If \([f]\) is in \(C_n(g) = F[x].[g]\), then \([f] = [gh]\) for some polynomial \(h\). But then \(g^*[f] = [g^* g h] = [(x^n - 1)h] = 0\) since \([x^n - 1] = 0\). Vice versa, if \(g^*[f] = 0\), then \(g^* f\) is a multiple of \(x^n - 1\), i.e. \(f\) is a multiple of \(\frac{x^n - 1}{g^*} = g\), and therefore \([f]\) is in \(C_n(g)\).

9.2 A little script in Sage solves the problem.
    # Coefficients (ascending powers of x) of a generator polynomial
    # of the binary [23, 12] Golay code
    l = []
    g = [1,0,1,0,1,1,1,0,0,0,1,1]
    # Row i of the 12 x 23 generator matrix is x^i * g(x), i.e. g shifted by i places
    for i in range(12):
        l += i*[0] + g + (11-i)*[0]
    A = matrix(GF(2), 12, 23, l)
    G23 = A.row_space(); G23
    # Hamming weight of a vector
    h = lambda v: sum(1 for i in range(len(v)) if v[i] != 0)
    # d[w] counts the codewords of weight w
    d = dict((i, 0) for i in range(24))
    for v in G23:
        d[h(v)] += 1
    d

The result is {0: 1, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 253, 8: 506, 9: 0, 10: 0, 11: 1288, 12: 1288, 13: 0, 14: 0, 15: 506, 16: 253, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 1}

9.3 Let $p$ be the prime dividing $q = |F|$ and write $n = p^k m$ with $m$ not divisible by $p$. Then we have in $F[x]$

$$x^n - 1 = x^{m p^k} - 1 = (x^m - 1)^{p^k}.$$

As shown in the proof of Theorem 3, the polynomial $x^m - 1$ factors into $N$ pairwise different normalized prime polynomials, where $N$ is given by Theorem 3. But then the number of normalized divisors of $x^n - 1$ equals

$$\sigma_F(x^n - 1) = (p^k + 1)^N = (p^k + 1)^{\sum_{l \mid m} \varphi(l)/\operatorname{ord}_l(q)}.$$

11.1 Let $\{e_i\}$ be the standard basis of $F^n$ (where $e_i$ has a 1 at the $i$th place and zeros at all other places). Let $\alpha$ be an isometry, i.e. a bijective linear map from $F^n$ onto itself such that $h(w) = h(\alpha(w))$ for all $w$ in $F^n$. In particular, $h(\alpha(e_i)) = 1$, which means that $\alpha(e_i) = a_i e_{i'}$ for a suitable $a_i$ in $F^*$ and a suitable index $i'$. Since $\alpha$ is bijective, the map $i \mapsto i'$ must be a permutation $\sigma$. It follows that $\alpha = p_\sigma \circ m_a$, where $a = (a_1, \dots, a_n)$. In other words, $\alpha$ is in $S \ltimes M$. That every map from $S \ltimes M$ is an isometry is obvious. The sign "$\ltimes$" indicates that $S \cap M = \{1\}$ and that $M$ is normalized by $S$. This is also obvious.

12.1 Applying $S$ twice we find

$$S^2 x_a = \frac{\sigma(F)^2}{q} \sum_{b, c \in F} \chi(-ab - bc)\, x_c.$$

But $\sum_{b \in F} \chi(-(a + c) b) = 0$ unless $c = -a$, in which case the sum equals $q$. Therefore $S^2 x_a = \sigma(F)^2 x_{-a}$, and hence $S^4 x_a = \sigma(F)^4 x_a$. It follows that $S^4 = 1$ if $\sigma(F)^4 = 1$, and $S^8 = 1$ otherwise.

12.2 Note, first of all, that there exists a $c$ in $F$ such that $\chi(c) \ne 1$ (since $\chi$ is non-trivial). Therefore, if $b \ne 0$, then $\chi(ab) \ne 1$ for $a = c/b$. The sum equals $|F|$ if $b = 0$, since then every term equals 1.
If $b \ne 0$, we choose an $a_0$ such that $\chi(a_0 b) \ne 1$. Replacing $a$ by $a + a_0$ in the summation, we see that our sum remains unchanged when multiplied by $\chi(a_0 b) \ne 1$, which is only possible if it is 0. {# 12.3 We note, first of all, that $\mathrm{tr} : F \to \mathbb{F}_p$ is $\mathbb{F}_p$-linear and non-trivial (see the preceding exercise). Therefore it assumes each value in $\mathbb{F}_p$ exactly $|F|/p$ times. We can therefore write for $p = 2$

$$\sigma(F) = |F|^{-1/2}\, \frac{|F|}{2} \sum_{x \in \mathbb{F}_2} e_4(-x^2),$$

and similarly for $p = 2$ but with $e_p(x^2/2)$ replaced by $e_4(x^2)^{-1}$. For $p = 2$ the sum on the right-hand side consists of only two terms and yields

$$\sigma(F) = \frac{1 - i}{\sqrt{2}}. \qquad$$ #}
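The character-sum identity behind 12.1 and 12.2 can be checked numerically on a small example. The following Python sketch assumes $F = \mathbb{F}_p$ for a prime $p$ and the additive character $\chi(x) = e^{2\pi i x/p}$; this is an illustrative choice, since the exercises allow any finite field and any non-trivial additive character. The function names `chi`, `char_sum` and `s2_coeff` are hypothetical helpers, not part of any library.

```python
import cmath


def chi(x: int, p: int) -> complex:
    """Non-trivial additive character of F_p (assumed choice): exp(2*pi*i*x/p)."""
    return cmath.exp(2j * cmath.pi * (x % p) / p)


def char_sum(b: int, p: int) -> complex:
    """Sum over a in F_p of chi(a*b).

    Up to floating-point error this is p if b = 0 mod p and 0 otherwise
    (the statement of Exercise 12.2 for F = F_p).
    """
    return sum(chi(a * b, p) for a in range(p))


def s2_coeff(a: int, c: int, p: int) -> complex:
    """(1/p) * sum over b in F_p of chi(-(a+c)*b).

    This is the coefficient of x_c in S^2 x_a divided by sigma(F)^2
    (Exercise 12.1); it is 1 if c = -a mod p and 0 otherwise.
    """
    return sum(chi(-(a + c) * b, p) for b in range(p)) / p


if __name__ == "__main__":
    p = 7
    for b in range(p):
        s = char_sum(b, p)
        # Format to 6 decimals so that floating-point noise of order 1e-15 is hidden.
        print(f"b = {b}: sum = {s.real:.6f} + {s.imag:.6f}i")
```

For $p = 7$ the loop shows the value 7 for $b = 0$ and, up to rounding, 0 for $b = 1, \dots, 6$. Likewise `s2_coeff(a, c, p)` is 1 exactly when $c \equiv -a \pmod p$, matching $S^2 x_a = \sigma(F)^2 x_{-a}$.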