Hatice Boylan and Nils-Peter Skoruppa
Coding Theory
Lecture Notes
Version: August 1, 2016
This work is licensed under the
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0
International Licence. (CC BY-NC-ND 4.0)
For details see http://creativecommons.org/licenses/by-nc-nd/4.0/.
© Hatice Boylan and Nils Skoruppa 2016
Contents

1 Fundamentals of Coding Theory
   1 What is coding theory
   2 Basic Notions
   3 Shannon's theorem
   4 Examples of codes
   5 Bounds
   6 Manin's theorem

2 Infinite Families of Linear Codes
   7 Reed-Solomon Codes
   8 Reed-Muller codes
   9 Cyclic codes
   10 Quadratic residue codes

3 Symmetry and duality
   11 Weight enumerators
   12 MacWilliams' Identity
   13 Duality

Appendix
   14 Solutions to selected exercises
List of Figures

1.1 Ha(x) for a = 2, 3, 4, 23
1.2 The Fano plane
1.3 The icosahedron
1.4 For q = 2 and n = 256, plot of the Hamming, Singleton, Griesmer, Gilbert-Varshamov and Plotkin bounds in red, green, blue, gray and purple, respectively. (We plotted the points (d/n, R), where R is the maximal (respectively minimal, for Gilbert-Varshamov) rate admitted by the respective bound.)
2.1 The 528 32-ary Reed-Solomon codes in the (δ, R)-plane
2.2 The sets RM2 (red), RM16 (green), RM32 (blue) for r = 1, 2, ..., 10 in the (δ, R)-plane. The "Mariner" code RM2(5, 2) is encircled
2.3 Lattice of binary cyclic codes of length 7. The divisors of x^7 − 1 are 1, x + 1, x^3 + x + 1, x^3 + x^2 + 1, x^4 + x^2 + x + 1, x^4 + x^3 + x^2 + 1, x^6 + x^5 + x^4 + x^3 + x^2 + x + 1, x^7 + 1
Preface
These lecture notes grew out of courses on coding theory which the second author gave during the past 10 years at the University of Siegen, and out of a course given by the first author in 2015, when she was visiting Siegen with a Diesterweg stipend.
Hatice Boylan and Nils Skoruppa, Siegen, July 2016
Chapter 1
Fundamentals of Coding Theory

1 What is coding theory
In coding theory we meet the following scenario. A source emits information and a receiver tries to log this information. Typically, the information is broken up into atomic parts like letters from an alphabet, and a piece of information consists of words, i.e. sequences of letters. The problem is that the information might be disturbed by imperfect transport media, resulting in incidental changes of letters.
Real-life examples are the transmission of bits via radio signals for transmitting pictures from deep space to earth, e.g. pictures taken by a Mars robot, or, as a more everyday example, the transmission of bits via radio signals for digital TV. The source could also be the sequence of bits engraved in an audio or video disk, where the transmission is the reading of the disk by the laser of the CD reader: little vibrations of the device or scratches on the disk cause transmission errors.
Example 1.1. A source emits 0s and 1s, say, at equal probability. Let p be the probability that an error occurs, i.e. that a 0 or 1 arrives as a 1 or 0 at the receiver. If p is very small we might decide to accept these errors, and if p is almost 1 we might also decide not to care, since we can simply interpret 1 as 0 and vice versa, which again reduces the error probability to a negligible quantity. If the error probability is exactly 1/2 we cannot do anything but ask the engineers to study the problem of improving the transmission. However, if p is, say, only a bit smaller than 1/2 and we need a more reliable transmission, coding comes into play.
The natural idea is to fix a natural number n and, if we want to transmit the bit b, to send the sequence bb...b of length n. In other words, we encode b into a sequence of n many bs. The receiver must, of course, be informed of this convention. He will then decode according to the principle of Maximum Likelihood Decoding: if he receives a sequence s of length n, he interprets it as a 0 if the word s contains more 0s than 1s, and vice versa. In other words, he interprets s as a 0 if s more closely resembles a sequence of n many 0s, and otherwise as a 1. Here we assume for simplicity that n is odd, so that a word of length n can never contain an equal number of 0s and 1s.
What is now the probability of missing the right message? If we send a sequence of n many 0s, then receiving instead any word with r ≥ (n+1)/2 many 1s would result in an error. The probability of receiving a given word of this kind is p^r (1−p)^(n−r), and there are $\binom{n}{r}$ such words. The error probability is therefore now
\[
P_n = \sum_{r=\frac{n+1}{2}}^{n} \binom{n}{r} p^r (1-p)^{n-r}.
\]
It is not hard to show (see below) that lim_{n→∞} P_n = 0. Therefore, our repetition code can improve a bad transmission to one as good as we want, provided the transmission error p for bits is strictly less than 1/2.
What makes the repetition code so efficient is the fact that its two code words are very different: in fact, they differ at all n places. However, there is a price to pay. Assume that you want to transmit a video of size 1 GB through a channel which has an error probability p = 0.1 when transmitting bits. This is certainly not acceptable, since it would mean that 10 percent of the received video consists of flickering garbage. We might therefore transmit the video via the repetition code of length n. The first values of the sequence Pn are

P1 = 1.000000e-01, P3 = 2.800000e-02, P5 = 8.560000e-03,
P7 = 2.728000e-03, P9 = 8.909200e-04, P11 = 2.957061e-04,
P13 = 9.928549e-05, P15 = 3.362489e-05, P17 = 1.146444e-05,
P19 = 3.929882e-06.

To keep the transmission error below 0.1 percent we would have to choose n = 9, which means transmitting 9 GB for a video no bigger than 1 GB. In this sense the repetition code is very inefficient. What makes it so inefficient is that there are only two possible messages, i.e. two code words to transmit, but they have length n. In other words, there is only one bit of information for every n transmitted bits.
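The listed values of Pn can be checked numerically. The following is a small sketch (the function name and the choice p = 0.1 are ours) evaluating the sum that defines Pn for odd n:

```python
from math import comb

def repetition_error_probability(n, p):
    # Probability that Maximum Likelihood Decoding fails for the
    # repetition code of odd length n: at least (n+1)/2 of the n
    # transmitted bits must be flipped by the channel.
    return sum(comb(n, r) * p**r * (1 - p)**(n - r)
               for r in range((n + 1) // 2, n + 1))

for n in (1, 3, 5, 7, 9):
    print(n, repetition_error_probability(n, 0.1))
```

For p = 0.1 this reproduces the table above, e.g. P3 = 0.028 and P9 ≈ 8.9092e-04.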
We would like to insist on our idea but search for better codes. For example, for our case of transmitting a video we might try to find, for some (possibly big) number n, a subset C of the set {0,1}^n of all sequences of length n of the digits 0 and 1 which satisfies the following two properties:
1. Every two distinct sequences in C should differ in as many places as possible. In other words, the quantity
\[
d(C) = \min\{h(v, w) : v, w \in C,\ v \ne w\}
\]
should be very large, where h(v, w) denotes the number of places where v and w differ.
2. The quotient
\[
R(C) = \frac{\log_2(|C|)}{n}
\]
should be large as well.
The number log_2(|C|) is the quantity of information (measured in bits) which is contained in every transmission of a sequence in C, i.e. in every transmission of n bits. The ratio R(C) is therefore to be interpreted as the ratio of information per bit of transmission. We would then cut our video into sequences of length k, where k = ⌊log_2(|C|)⌋, and map these pieces via a function (preferably designed by an engineer) to the sequences in C, send the encoded words, and decode them at the other end of the line using Maximum Likelihood Decoding. The Maximum Likelihood Decoding will yield good results if d(C) is very large, i.e. if the code words differ as much as possible. We shall see later (Shannon's Theorem) that there are codes C which have R(C) as close as desired to a quantity called the channel capacity (which depends on p), and the probability of a transmission error in a code word as low as desired. Of course, the length n might be very large, which might cause engineering problems like an increased time needed for encoding or decoding.
We stress an important property of the repetition code which we discussed above: namely, it can correct (n−1)/2 errors. This means the following: if the sent code word and the received one do not differ at more than (n−1)/2 places, the Maximum Likelihood Decoding will return the right code word, i.e. it will correct the errors. In general we shall mostly be interested in such error-correcting codes.
However, in some situations one might only be interested in detecting errors, not necessarily correcting them. Examples of such codes are the International Standard Book Numbers ISBN-10 and ISBN-13. Here to every published book is associated a unique identifier. In the case of ISBN-10 this is a word d1 d2 ··· d10 of length 10 with letters from the alphabet 0, 1, ..., 9, X. The procedure of this association is not important to us (but see here for details). What is important for us is that it is guaranteed that the sum
\[
N := d_1 + 2 d_2 + 3 d_3 + \cdots + 10 d_{10}
\]
is always divisible by 11 (where the symbol X is interpreted as the number 10). By elementary number theory the following happens: if exactly one letter is wrongly transmitted then N is no longer divisible by 11. In other words, we can detect one error. However, there is no means to correct this error (except if we were told at which place the error occurred). We shall come back to this later, when we recall some elementary number theory.
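The divisibility test just described is easy to implement. Here is a sketch (function name ours) that checks the weighted sum N modulo 11, treating the symbol X as the value 10; the sample ISBNs are well-known published ones used only for illustration:

```python
def isbn10_valid(isbn):
    # N = d1 + 2*d2 + ... + 10*d10 must be divisible by 11;
    # the letter X stands for the value 10.
    digits = [10 if ch == 'X' else int(ch) for ch in isbn]
    n = sum(i * d for i, d in enumerate(digits, start=1))
    return n % 11 == 0

print(isbn10_valid("0306406152"))  # a valid ISBN-10: prints True
print(isbn10_valid("0306406153"))  # one changed digit is detected: False
```

Changing any single letter changes N by a nonzero multiple of a number coprime to 11, so the check always catches one error, exactly as stated above.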
A property of the binomial distribution
We prove the statement that the sequence of the P_n in the above example tends to 0. In fact, this can be obtained from Chebyshev's inequality applied to a sequence of random variables X_n, where $P(X_n = r) = \binom{n}{r} p^r (1-p)^{n-r}$, i.e. where X_n follows the binomial distribution with parameters n and p. This distribution measures the probability of successes in a sequence of n independent trials where the probability of success in a single trial is p. However, it is also possible to give a short direct proof avoiding the indicated concepts. For p < 1/2 we can choose λ = 1/2 in the proposition below, and we obtain the claimed statement P_n → 0.
Proposition 1.2. For every 0 ≤ p ≤ 1 and every λ > p, one has
\[
\lim_{n\to\infty} \sum_{r \ge \lambda n} \binom{n}{r} p^r (1-p)^{n-r} = 0.
\]
Proof. It is clear that
\[
\sum_{r \ge \lambda n} \binom{n}{r} p^r (1-p)^{n-r}
\le \sum_{r=0}^{n} \binom{n}{r} p^r (1-p)^{n-r}
\left(\frac{r-np}{(\lambda-p)n}\right)^2,
\]
since, for r ≥ λn, we have 1 ≤ (r − np)/((λ − p)n). But the right hand side equals
\[
\frac{1}{(\lambda-p)^2 n^2}
\left(\frac{d^2}{dt^2} - 2np\,\frac{d}{dt} + n^2p^2\right)
\big(p e^t + 1 - p\big)^n \Big|_{t=0}
= \frac{p(1-p)}{(\lambda-p)^2 n},
\]
which tends to 0.
Exercises
1.1. Find all subsets C of {0,1}^5 up to isomorphism, and compute d(C) and R(C) for each. (Two subsets are called isomorphic if one can be obtained from the other by a fixed permutation of the places of the other's sequences.)
1.2. Which book possesses the ISBN-10 "3540641∗35"? (First of all you have to find the 8th digit.)
2 Basic Notions
Let A be a finite set, henceforth called the alphabet, and fix a positive integer n. The elements of the Cartesian product A^n are called words over A of length n. For two words v and w in A^n we define their Hamming distance as
\[
h(v, w) = \text{the number of places where } v \text{ and } w \text{ differ}.
\]
A subset C of A^n is called a code of length n. As we saw in the first section, there are two quantities which are important to measure the efficiency of a code. The first one is its minimal distance:
\[
d(C) := \min\{h(c_1, c_2) : c_1, c_2 \in C,\ c_1 \ne c_2\}.
\]
The larger d(C), the more errors C can detect or even correct. Indeed, one has the following.
Theorem 2.1. A code with minimal distance d can correct via Maximum Likelihood Decoding up to ⌊(d−1)/2⌋ errors, and it can detect up to d − 1 errors.

Proof. Indeed, let c be a code word, let w be a word which we receive for c, and assume that w does not contain more than ⌊(d−1)/2⌋ errors. If c′ is another code word then
\[
h(c', w) \ge h(c, c') - h(c, w)
\ge d - \left\lfloor\frac{d-1}{2}\right\rfloor
\ge \left\lfloor\frac{d-1}{2}\right\rfloor + 1 > h(c, w)
\]
(see Exercise 2.1 for the validity of the triangle inequality). Therefore, Maximum Likelihood Decoding would replace w by c, i.e. decodes w correctly. If w differs from c in at least one but not more than d − 1 places, then w cannot be a code word and will hence be detected as erroneous, since two different code words have distance strictly greater than d − 1.
The second one is its information rate (or simply rate)
\[
R := \frac{\log_{|A|}(|C|)}{n} = \frac{\log |C|}{\log |A^n|}.
\]
Here log_a(x) denotes the logarithm of x to the base a (i.e. the number y such that a^y = x). One should think of it as follows. A set with N (= |C|) elements can describe (can be associated injectively to) sequences of k letters, where k is not larger than log_a N, since we need a^k ≤ N. Thus the information provided by such a set is "k letters". On the other hand, since C ⊆ A^n, every element of C is communicated via a word of length n. Thus the rate of information provided by C is k/n.
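Both quantities can be computed directly for small codes. A sketch (function names ours; the two sample codes are the repetition code of length 5 and the two-out-of-five code discussed later in these notes):

```python
from itertools import product
from math import log2

def hamming(v, w):
    # Hamming distance: number of places where v and w differ.
    return sum(x != y for x, y in zip(v, w))

def min_distance(code):
    return min(hamming(v, w) for v in code for w in code if v != w)

def rate(code, n, a=2):
    # R(C) = log_a(|C|) / n
    return log2(len(code)) / (n * log2(a))

repetition = [(0,) * 5, (1,) * 5]
two_out_of_five = [c for c in product((0, 1), repeat=5) if sum(c) == 2]
print(min_distance(repetition), rate(repetition, 5))  # 5 0.2
```

The repetition code has the maximal possible d but a tiny rate; the two-out-of-five code has rate log_2(10)/5 ≈ 0.664 but minimal distance only 2.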
Example 2.2. The repetition code of length n over A consists of the |A| words in A^n whose letters are all the same. Here the minimal distance d equals n, which is the theoretically possible maximum for a code of length n. However, its rate is
\[
R = \frac{1}{n},
\]
which tends to zero for increasing n.
Strictly speaking, the given formula for the information rate is only well-defined if C is non-empty, and similarly, the formula for the minimal distance is well-defined only if C has at least two elements. In the following we tacitly assume that |C| ≥ 2 whenever it is necessary to give sense to a formula.
Very often A will be an Abelian group. In this case we can consider the Hamming weight of a word v in A^n:
\[
h(v) = \text{number of places of } v \text{ different from } 0.
\]
Clearly, h(v) = h(v, 0), where 0 denotes the neutral element in A^n. Moreover, for a code C in A^n which is a subgroup one has
\[
d(C) = \min_{0 \ne v \in C} h(v).
\]
Indeed, the sets {h(v, w) : v, w ∈ C, v ≠ w} and {h(v) : 0 ≠ v ∈ C} are the same (since, for v and w in C, we have h(v, w) = h(v − w), and v − w is in C, and h(v) = h(v, 0) for all v in C).
Even more often, A will be a finite field, and is then more reasonably denoted by F. In this case F^n is a vector space over F. If F is a prime field, i.e. a field whose cardinality is a prime, every subgroup C of F^n is a sub-vector space of F^n. If k is the dimension of C, and q denotes the number of elements of F, we have |C| = q^k (see below). For the rate of C we therefore have the simple formula
\[
R(C) = \frac{\dim_F C}{n}.
\]
Subspaces of F^n are called linear codes of length n over F. In fact, we shall mostly be concerned with linear codes.
We shall later review the basics of the theory of finite fields. However, in many parts of the course we only need the finite field F_2 with two elements. For those knowing a bit of algebra or number theory, it suffices to recall that F_2 = Z/2Z. Otherwise, as usual in algebra, call the elements of the field F_2 in question 0 (the additive neutral element) and 1 (the multiplicative neutral element). The multiplication is easily understood by thinking of 0 and 1 as "False" and "True"; then the multiplication is the logical "and". Similarly, the addition corresponds to the logical "xor", also known as the "exclusive or".
Cardinality of vector spaces over finite fields
Proposition 2.3. Let C be a finite-dimensional vector space over the finite field F. Then
\[
|C| = |F|^{\dim_F C}.
\]
Proof. Let v_1, ..., v_k be a basis of C. Then every element of C can be written in one and only one way as a linear combination a_1 v_1 + ··· + a_k v_k with a_j in F. For each a_j we have |F| many choices, which results in |F|^k different linear combinations, i.e. elements of C.
Finite fields
If F is a finite field its cardinality is a prime power q = p^n. Vice versa, for every prime power q there is one and, up to isomorphism, only one finite field with q elements. The finite fields can be constructed as follows. If p is a prime then F_p := Z/pZ is a field with p elements. Here Z/pZ is the quotient of the ring Z by the ideal pZ. The elements of Z/pZ are the cosets [r]_p := r + pZ, where r is an integer 0 ≤ r < p. The addition and multiplication of two such cosets is given by [r]_p + [s]_p = [t]_p and [r]_p · [s]_p = [u]_p, where t and u are the remainders of division of r + s and r · s by p.

Similarly, if q = p^n is a prime power with n ≥ 2, then a field with q elements can be obtained as follows. Let F_p[x] be the ring of polynomials with coefficients in the field F_p. Choose an irreducible polynomial f in F_p[x] of degree n (such polynomials always exist). That f is irreducible means that f cannot be written as a product of two nonconstant polynomials in F_p[x]. Finally, the quotient F_q := F_p[x]/f F_p[x] is a field with q elements. As before, the elements of F_p[x]/f F_p[x] are the cosets [r]_f := r + f F_p[x], where r runs through all polynomials in F_p[x] whose degree is ≤ n − 1. Note that two cosets [g_1]_f and [g_2]_f are equal if and only if g_1 − g_2 is divisible by f. And as before, addition and multiplication of cosets is defined as [r]_f + [s]_f = [t]_f and [r]_f · [s]_f = [u]_f, where t and u are the (normalized) remainders of division of r + s and r · s by the polynomial f.

The field F_q which we just defined depends a priori on the choice of f. In general there is more than one irreducible polynomial of degree n. For example, the polynomials f_1 := x^2 + 1 and f_2 := x^2 + x − 1 in F_3[x] are both irreducible. However, it is a fact that all fields with a given number q = p^n of elements are isomorphic. An isomorphism F_p[x]/f_1 F_p[x] → F_p[x]/f_2 F_p[x] is given by the map [r]_{f_1} ↦ [r̃]_{f_2}, where r̃ is the (normalized) remainder of r(x + 2) after division by f_2 (indeed, (x + 2)^2 + 1 ≡ f_2 modulo 3).

A finite field F with q = p^n elements can be viewed as a vector space over F_p when we define the scalar multiplication of an element [r]_p of F_p and an element λ of F by letting [r]_p · λ be the r-fold sum of λ. It is a fact that F contains an element α such that 1, α, α^2, ..., α^{n−1} is a basis of F as a vector space over F_p. This follows easily, for example, from the fact that F* = F \ {0} is a cyclic group with respect to multiplication. Thus every element of F can be written in a unique way as a linear combination u_0 + u_1 α + u_2 α^2 + ··· + u_{n−1} α^{n−1} with elements u_j from F_p. If we take for F the field F_q = F_p[x]/f F_p[x], then one can choose α = [x]_f. The fact that α^n is a linear combination of 1, α, α^2, ..., α^{n−1} translates into the fact that there is a unique normalized polynomial f in F_p[x] of degree n such that f(α) = 0 (where normalized means that f is of the form x^d + terms of lower degree). The multiplication of two linear combinations u_0 + u_1 α + ··· + u_{n−1} α^{n−1} is then done by applying the distributive law and using that α^i · α^j = r(α), where r is the remainder of x^{i+j} after division by f. The polynomial f is called the minimal polynomial of α.

The mentioned facts about finite fields and their proofs can be found in most textbooks on algebra. The reader might also look them up in Wikipedia.
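The quotient construction can be made concrete in a few lines. As a sketch (representation and names ours) we build F9 = F3[x]/(x^2 + 1) from the irreducible polynomial f_1 = x^2 + 1 mentioned above: an element a + b·α, with α the coset of x, is stored as the pair (a, b), and α^2 is replaced by −1 = 2 during multiplication.

```python
# F9 = F3[x]/(x^2+1): elements a + b*alpha stored as pairs (a, b),
# where alpha = [x] satisfies alpha^2 = -1 = 2 in F3.
def add(u, v):
    return ((u[0] + v[0]) % 3, (u[1] + v[1]) % 3)

def mul(u, v):
    # (a + b*alpha)(c + d*alpha) = (ac + 2bd) + (ad + bc)*alpha,
    # using alpha^2 = 2.
    (a, b), (c, d) = u, v
    return ((a * c + 2 * b * d) % 3, (a * d + b * c) % 3)

elements = [(a, b) for a in range(3) for b in range(3)]
# Field check: every nonzero element has a multiplicative inverse.
assert all(any(mul(u, v) == (1, 0) for v in elements)
           for u in elements if u != (0, 0))
```

The final assertion is exactly what distinguishes the choice of an irreducible f: with a reducible polynomial the quotient ring would contain zero divisors and the check would fail.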
A notion which will occur repeatedly is the ball of radius r around a word v in A^n:
\[
B_r(v) := \{w \in A^n : h(v, w) \le r\}.
\]
The number of words inside B_r(v) is
\[
V_a(n, r) := |B_r(v)| = \sum_{i=0}^{r} \binom{n}{i} (a-1)^i
\]
(see Exercise 2.2 below). Note that this number is independent of v.
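The formula for V_a(n, r) is easy to evaluate; a sketch (function name ours):

```python
from math import comb

def ball_volume(a, n, r):
    # |B_r(v)|: choose i <= r places to change, and at each chosen
    # place one of the a-1 letters different from that of v.
    return sum(comb(n, i) * (a - 1) ** i for i in range(r + 1))

print(ball_volume(2, 7, 1))  # 1 + 7 = 8
```

The value V_2(7, 1) = 8 will reappear with the Hamming code H(7, 4): its 16 balls of radius 1 contain 16 · 8 = 128 = 2^7 words, i.e. they tile F_2^7.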
We want to introduce a measure for the "rate of uncertainty of information" transmitted through a channel which uses an alphabet with a ≥ 2 letters and transmits every letter with error probability p. If we use words of length n we expect on average np errors per word. But then the received word lies in the ball of radius np around the sent one, i.e. it is one amongst V_a(n, pn) many. The information provided by these many words, measured again in "number of letters", is log_a(V_a(n, pn)). The rate of uncertainty in this case is hence log_a(V_a(n, pn))/n. We therefore define, for 0 ≤ p ≤ 1, the base-a entropy function
\[
H_a(p) := \lim_{n\to\infty} \frac{\log_a V_a(n, pn)}{n}.
\]
By what we have seen, this is a sensible quantity to measure the "rate of uncertainty of information" for a base-a channel of error probability p.
Theorem 2.4. For any a ≥ 2 and 0 ≤ p ≤ 1 − 1/a, the limit defining H_a(p) exists. Its value equals
\[
H_a(p) = p \log_a(a-1) - p \log_a p - (1-p) \log_a(1-p)
\]
(where we understand H_a(0) = 0).

Note that H_a(x) increases continuously from 0 to 1 on this interval. Its graphs for a = 2, 3, 4, 23 are shown in Figure 1.1.
Proof. Set k = ⌊pn⌋. We observe that $\binom{n}{k}(a-1)^k$ is the largest of the terms in the formula for V_a(n, pn). We conclude
\[
\binom{n}{k} (a-1)^k \le V_a(n, pn) \le (1+k) \binom{n}{k} (a-1)^k.
\]
Furthermore,
\[
\frac{1}{n} \log\Big(\binom{n}{k} (a-1)^k\Big)
= \frac{1}{n} \big( \log n! - \log k! - \log (n-k)! + k \log(a-1) \big),
\]
which, by Stirling's formula log n! = n log n − n + O(log n) and k = pn + O(1), equals
\[
\log n - p \log(pn) - (1-p) \log\big((1-p)n\big) + p \log(a-1) + o(1)
= H_a(p) \log a + o(1).
\]
The theorem is now obvious.
Figure 1.1: Ha (x) for a = 2, 3, 4, 23
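The closed formula of Theorem 2.4 is easily evaluated or plotted; a sketch (function name ours):

```python
from math import log

def entropy(a, p):
    # H_a(p) = p*log_a(a-1) - p*log_a(p) - (1-p)*log_a(1-p),
    # with H_a(0) understood as 0.
    if p == 0:
        return 0.0
    return (p * log(a - 1, a) - p * log(p, a)
            - (1 - p) * log(1 - p, a))

print(entropy(2, 0.5))  # H_2(1/2) = 1, the maximal uncertainty
```

For a = 2 the maximum 1 is attained at p = 1 − 1/a = 1/2, matching the leftmost graph in Figure 1.1.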
Exercises
2.1. Show that A^n equipped with the Hamming distance is a metric space.
2.2. Prove the given formula for the number of words contained in the ball B_r(v) ⊂ A^n.
2.3. In the notations and under the assumptions of Theorem 2.4 prove that, for i ≤ k := ⌊pn⌋, one has $\binom{n}{i}(a-1)^i \le \binom{n}{k}(a-1)^k$. (This inequality was used in the proof of Theorem 2.4.)
2.4. What happens to the limit defining H_a(p) for 1 − 1/a ≤ p ≤ 1? Does the limit exist? Can you determine its value?
2.5. For any prime p, determine the number of normalized irreducible polynomials of degree 2 in F_p[x].
3 Shannon's theorem
Assume that we transmit letters of an alphabet A with an error probability p. Let C be a code of length n over A. The events that we want to study are modeled by the set E_C of pairs (c, m), where c is in C and m is a word of length n over A. Such a pair corresponds to the event that we transmit c and receive m. We assume that the probability that this event occurs is
\[
P_C(c, m) = \frac{1}{|C|} \Big(\frac{p}{a-1}\Big)^{h} (1-p)^{n-h},
\]
where h = h(c, m), i.e. h equals the number of places where c and m differ, and where a = |A|. Thus, the probability that a letter is transmitted wrongly is p, and then every letter different from the sent one is received with equal probability. Moreover, we are assuming that in our transmissions every code word in C occurs with the same probability. The probability P_C(S) that an event lies in a subset S of the event space E_C is then $\sum_{e \in S} P_C(e)$. It is an easy exercise to see that indeed P_C(E_C) = 1.
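The "easy exercise" can at least be confirmed numerically for a small code; a sketch (names ours) with A = {0, 1} and the repetition code of length 3:

```python
from itertools import product
from math import isclose

def P(c, m, p, a, size_C):
    # P_C(c, m) = (p/(a-1))^h * (1-p)^(n-h) / |C|, with h = h(c, m).
    h = sum(x != y for x, y in zip(c, m))
    return (p / (a - 1)) ** h * (1 - p) ** (len(c) - h) / size_C

code = [(0, 0, 0), (1, 1, 1)]
total = sum(P(c, m, 0.1, 2, len(code))
            for c in code for m in product((0, 1), repeat=3))
assert isclose(total, 1.0)  # P_C(E_C) = 1
```

The inner sum over m is 1 for each fixed c (binomial theorem), and averaging over the |C| equally likely code words preserves this.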
We apply the principle of Maximum Likelihood Decoding to decode a received word m. This means that we search for the closest c in C (with respect to the Hamming distance). If the minimum h(c, m) is attained by exactly one code word c, we decode m as c. Otherwise we throw an error (or, in practice and if necessary, decode m as a once and for all fixed code word, or as the first one amongst all code words attaining the minimal distance to m with respect to some ordering).
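Maximum Likelihood Decoding as just described can be sketched as follows (names ours; a tie returns None, corresponding to the "throw an error" branch):

```python
def hamming(v, w):
    return sum(x != y for x, y in zip(v, w))

def ml_decode(m, code):
    # Decode m as the unique nearest code word, or report a tie as None.
    ranked = sorted(code, key=lambda c: hamming(c, m))
    if len(ranked) > 1 and hamming(ranked[0], m) == hamming(ranked[1], m):
        return None
    return ranked[0]

print(ml_decode((1, 1, 0), [(0, 0, 0), (1, 1, 1)]))  # (1, 1, 1)
```

For the repetition code of odd length this is exactly the majority vote of Example 1.1.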
The probability of a transmission error is hence P_C(\mathcal{E}_C), where
\[
\mathcal{E}_C = \{(c, m) \in E_C : \exists\, c' \in C,\ c' \ne c,\ h(c', m) \le h(c, m)\}.
\]
Theorem 3.1 (Noisy-Channel Coding Theorem). Assume 0 ≤ p < (a − 1)/a. Let R be a real number such that 0 < R < 1 − H_a(p). Then
\[
\mu_n := \frac{1}{\alpha(R, n)}
\sum_{\substack{C \subseteq A^n \\ R(C) = \lfloor nR \rfloor / n}}
P_C(\mathcal{E}_C) \to 0
\]
for n → ∞. Here α(R, n) denotes the number of codes C of length n over A with R(C) = ⌊nR⌋/n.

The interpretation of the theorem is clear. For any given R within the given bounds and any given ε > 0, there exists for all sufficiently large n a code of length n over A with transmission error probability less than ε and rate greater than R − ε. It is intuitively clear that the sum of the information rate of a code with probability of a transmission error close to 0 and the rate of uncertainty of information of the channel (i.e. H_a(p)) cannot be greater than 1, which is indeed the assumption of the theorem.
The magical quantity 1 − Ha (p) is called the channel capacity (of a transmission/channel for an alphabet with a letters and with error probability p).
Proof. Let C be a code of length n. Fix a radius r and let D_C be the set of events (c, m) in E_C such that h(c, m) ≤ r and such that c is the only code word satisfying this inequality. Clearly, any (c, m) in D_C will be decoded correctly by the Maximum Likelihood Decoding. Accordingly, the complement E′_C of D_C contains \mathcal{E}_C, and so
\[
P_C(\mathcal{E}_C) \le P_C(E'_C).
\]
Let f(v, w) = 1 if h(v, w) ≤ r and f(v, w) = 0 otherwise, and, for an event (c, w) in E_C, set
\[
g_C(c, w) = 1 - f(c, w) + \sum_{c' \in C \setminus \{c\}} f(c', w).
\]
Then g_C(c, w) ≥ 1 on E′_C and g_C(c, w) = 0 otherwise. Therefore
\[
P_C(E'_C) \le \sum_{(c,w) \in E_C} g_C(c, w)\, P_C(c, w).
\]
Rewriting this inequality in terms of the f(c, w) yields
\[
P_C(E'_C) \le P_C(h > r)
+ \sum_{w \in A^n} \sum_{\substack{c, c' \in C \\ c \ne c'}} f(c', w)\, P_C(c, w),
\]
where h is the Hamming distance and P_C(h > r) denotes the probability that an event (c, w) in E_C satisfies h(c, w) > r.
We shall see in a moment that, for any given ε > 0, we can choose r, for any sufficiently large n and independently of C, such that P_C(h > r) ≤ ε. With such an r, averaging over all C of length n with R(C) = ⌊nR⌋/n (i.e. |C| = a^{⌊nR⌋}), we obtain for µ_n the estimate
\[
\mu_n \le \varepsilon + \mu'_n,
\]
where, for any r, we have
\[
\mu'_n = \sum_{w \in A^n} \sum_{\substack{c, c' \in A^n \\ c \ne c'}}
f(c', w)\, \frac{1}{a^{\lfloor nR \rfloor}}
\Big(\frac{p}{a-1}\Big)^{h(c,w)} (1-p)^{n-h(c,w)}\,
A_C\big[\chi_C(c')\chi_C(c)\big].
\]
Here A_C denotes the average over C and χ_C the characteristic function of C. We estimate µ′_n from above. For this we note
\[
A_C\big[\chi_C(c')\chi_C(c)\big]
= \frac{\#\{C \subseteq A^n : c, c' \in C,\ |C| = a^{\lfloor nR \rfloor}\}}{\alpha(R, n)}.
\]
Set k = ⌊nR⌋. Then α(R, n) = \binom{a^n}{a^k}. The number of C with |C| = a^k and c, c′ ∈ C equals the number of subsets of A^n \ {c, c′} of cardinality a^k − 2, i.e. it equals \binom{a^n - 2}{a^k - 2}. We insert these values into the expression for µ′_n, and we drop the condition c ≠ c′ in the sum over the c, c′, so that this sum becomes two independent sums over c and over c′, respectively. The sum of f(c′, w) taken over c′ equals |B_r(w)| = V_a(n, r), and the sum over c equals 1. The contribution µ′_n can therefore be estimated from above by
\[
\mu'_n \le \frac{a^n}{a^k}\,
\frac{\binom{a^n-2}{a^k-2}}{\binom{a^n}{a^k}}\, V_a(n, r)
= \frac{a^k - 1}{a^n - 1}\, V_a(n, r).
\]
Choose now λ > p such that still R < 1 − H_a(λ) (which is possible since 1 − H_a(x) is continuous), and choose r = λn. Taking the base-a logarithm of the right hand side of the last inequality and dividing by n yields
\[
\frac{\log_a \mu'_n}{n}
\le \frac{\log_a\big(a^{\lfloor nR \rfloor} - 1\big)}{n}
- \frac{\log_a(a^n - 1)}{n}
+ \frac{\log_a V_a(n, \lambda n)}{n}.
\]
For n → ∞ this tends to β := R − 1 + H_a(λ). By the choice of R we have β < 0. We conclude that, for sufficiently large n,
\[
\mu'_n \le e^{\beta' n}
\]
for some β ≤ β′ < 0. In particular, we see that lim_{n→∞} µ′_n = 0.

It remains to prove the claim about the terms P_C(h > λn). The mean value of the Hamming distance on E_C is E = np and the variance equals σ² = np(1 − p). By Chebyshev's inequality we therefore have, for any given ε > 0, that
\[
P_C\big(h > np + \sqrt{np(1-p)/\varepsilon}\big)
\le P_C\big(|h - E| \ge \sigma/\sqrt{\varepsilon}\big) \le \varepsilon.
\]
But for sufficiently large n we have λn ≥ np + \sqrt{np(1-p)/\varepsilon}. The claim of the theorem is now obvious.
Chebyshev's Inequality
We recall here Chebyshev's inequality. To avoid introducing unnecessary concepts from advanced probability theory, we confine ourselves to the case of a finite set E and a probability measure P on the domain of its subsets. In other words, to every e in E is associated a number 0 ≤ p_e ≤ 1 such that $\sum_{e \in E} p_e = 1$. The measure P(S) for a subset S of E is given by $\sum_{e \in S} p_e$. Let h be a real or complex valued function on E (which, in the jargon of probability theory, would be called a random variable). The mean value E (or expectation value) and the variance σ² of h are defined as
\[
E = \sum_{e \in E} h(e)\, p_e, \qquad
\sigma^2 = \sum_{e \in E} |h(e) - E|^2\, p_e.
\]
Proposition 3.2 (Chebyshev's Inequality). In the preceding notations one has, for any real k > 0,
\[
P(|h - E| \ge k\sigma) \le \frac{1}{k^2}.
\]
For the simple proof of Chebyshev's Inequality we refer to Wikipedia.
Exercises
3.1. Prove that the mean value and the variance of the Hamming distance on EC
with respect to the probability measure PC equal np and np(1 − p), respectively.
3.2. For a given w in F_2^3, compute the mean value of the random variable C ↦ χ_C(w) on the set of all 2-dimensional subspaces, where we assume that every subspace occurs with equal probability.
4 Examples of codes
Before we proceed to study more systematically how to produce codes with good
minimal distance d and with good information rate R we review some classical
codes. In fact, all codes in this section will be binary and almost all linear.
Other examples will come in later sections.
If we have any binary linear code C of length n, we can produce a new code C̄ by appending to each code word c a parity bit, which is that bit c_{n+1} in {0, 1} which has the same parity as h(c), i.e. as the number of 1s in c. If n is large this reduces the rate only slightly: if C has rate R = k/n then C̄ has rate k/(n+1). However, if C is linear and has minimal distance d, then C̄ has minimal distance d + 1 if d is odd, and has the same minimal distance d if d is even (since the minimal distance of a binary linear code is the minimal number of 1s occurring in a nonzero code word). Thus C and C̄ correct the same number of errors, namely ⌊(d−1)/2⌋, but C̄ can detect one more error if d is odd.
Example 4.1 (Two-out-of-five code). This is the code consisting of all words of length 5 over {0, 1} which possess exactly two 1s. There are exactly 10 = $\binom{5}{2}$ code words, which might represent e.g. the digits 0, 1, ..., 9. This code is not linear. Its rate is log_2(10)/5 = 0.664385.... It can obviously detect one error (since changing a 0 to 1 or vice versa yields a word with one or three 1s). It can also detect three or five errors (since changing a code word at an odd number of places is the same as adding a word with an odd number of 1s, and hence changes the parity of the sum of letters). However, it does not detect two or four errors. Moreover, if one error occurs, we do in general not know where; so this code does not correct errors. Its minimal distance equals 2 (since all code words have an even sum of letters).
Example 4.2 (Hamming code and extended Hamming code). Maybe the first error-correcting code which was applied as such is the Hamming code H(7, 4). This is a linear subspace of dimension 4 in F_2^7. Its rate is therefore 4/7. Its minimal distance is rather large for such a small code, namely 3. It can therefore correct one error (see Theorem 2.1). It is suitable for channels with low error probability, like for example in ECC memory, which is used as RAM in critical servers. It is amusing to read the story which led Richard Hamming to find this code.
There are several ways to describe the Hamming code. First of all, as a 4-dimensional code over F_2 it has 2^4 = 16 code words (see Proposition 2.3), and we could simply list them all. This is very likely not very instructive. We can also write down a basis for it, i.e. a list of 4 vectors of length 7 which span it. Again this is not very instructive, in particular since such a basis is not unique. We can also describe it by giving 3 linearly independent vectors of length 7 which are perpendicular to the 16 code words of the Hamming code with respect to the natural scalar product on F_2^7; the code words are then exactly the vectors perpendicular to the given 3. One can combine these three vectors into a 7 × 3 matrix, and the Hamming code is then the left kernel of this matrix. Such matrices are called control matrices of the code in question (since multiplying a code word by such a matrix from the right confirms that it is indeed a code word if the result is the zero vector).
A fourth method is to read the code words as characteristic functions of
subsets of a set with 7 elements. Namely, fix a set {P1 , P2 , . . . , P7 } with seven
elements. A code word c1 , c2 . . .7 corresponds then to the subset {Pi : ci = 1}.
CHAPTER 1. FUNDAMENTALS OF CODING THEORY
It is a truly beautiful fact that the 16 subsets of the Hamming code carry an additional structure which makes them such a distinguished collection. Namely, if we mark the 7 points P_i and connect three of them by a “line” whenever they form the set corresponding to a code word with exactly three 1s, we obtain the following figure
Figure 1.2: The Fano plane
(the circle in the middle has also to be considered as a “line”). This figure
is also known as the Fano plane or the projective plane over the field with 2
elements.
We see exactly 7 points and 7 lines, every 2 points lie on exactly one line,
and every 2 lines intersect in exactly one point. Every line contains exactly 3
points, and through every point pass exactly 3 lines.
The 16 code words of the Hamming code correspond to the 7 lines, the 7 complements of the lines, the empty set and the full set. Note that the Hamming distance h(w, w′) of any two words corresponding to subsets S_1, S_2 equals the cardinality |S_1 △ S_2| of the symmetric difference S_1 △ S_2 = (S_1 \ S_2) ∪ (S_2 \ S_1). Therefore, the Hamming distance of two different lines of the Hamming code is 4. Continuing this line of reasoning, it is easy to verify that the minimal distance of H(7, 4) is indeed 3 (see Exercise 1). However, it is even easier to apply the criterion of Section 2 which states that the minimal distance of a linear code is the smallest number of 1s occurring in a codeword different from the zero word. It is immediately clear that the lines correspond to the codewords with minimal Hamming weight, which is then 3.
The Hamming code H(7, 4) possesses another striking property. Namely, the ball B_1(c) around a code word c contains Σ_{i≤1} (7 choose i) = 8 points, and any two such balls around two different codewords are disjoint (since 3 ≤ h(c, c′) ≤ h(c, w) + h(c′, w), so that one of the terms on the right is larger than 1). Since the number of code words times the number of points in a ball of radius 1 equals 16 · 8 = 2^7, we see that the balls of radius 1 around the codewords partition the space F_2^7. A code with such a property is called a perfect code.
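One can verify these claims directly. The sketch below builds H(7, 4) as the left kernel of a control matrix whose ith row is the binary expansion of i (one common choice, an assumption of ours; any control matrix with the same row space gives an equivalent code):

```python
from itertools import product

# Control matrix K (7 x 3): the i-th row is the binary expansion of i.
K = [(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]

# The Hamming code is the left kernel of K over F_2.
code = [c for c in product((0, 1), repeat=7)
        if all(sum(c[i] * K[i][j] for i in range(7)) % 2 == 0 for j in range(3))]

assert len(code) == 2**4                    # dimension 4
d = min(sum(c) for c in code if any(c))     # minimal distance = minimal weight
assert d == 3
# Perfect: the 16 disjoint balls of radius 1, each containing 1 + 7 = 8
# words, cover all 2^7 = 128 words of F_2^7.
assert len(code) * 8 == 2**7
```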
We extend the Hamming code H(7, 4) to the extended Hamming code H(8, 4) by adding a parity bit. The extended code has rate 1/2. The minimal distance increases to 4.
The projective n-space over a finite field

Let F be a field. The set P^n(F) of 1-dimensional subspaces of F^{n+1} is called the projective n-space over F, or simply the projective line and the projective plane over F if n = 1 or n = 2, respectively.
The projective space P^n(F) carries interesting additional structure and it has a very intuitive geometrical meaning. The latter we do not pursue here but hint to this reference. For the former, note that it is meaningful to talk, for a given homogeneous polynomial f(x_0, ..., x_n) with coefficients in F, of the subset N(f) of all points P in P^n(F) such that f(P) = 0. Indeed, let w be a basis of the one-dimensional space P. Then we can evaluate f at w, and the property f(w) = 0 does not depend on the choice of w: if we choose another nonzero w′ in P, then w′ = aw for some a ≠ 0 in F, and f(w′) = a^d f(w), where d is the degree of f, since f is homogeneous. If f is linear, i.e. has degree 1, then N(f) is called a hyperplane in P^n(F), or simply a line if n = 2.
The projective plane over a finite field F with q elements consists of q^2 + q + 1 points (see Exercise 2 below). Each line has q + 1 points, and every point lies on exactly q + 1 lines. If we sketch the points and lines in P^2(F) we rediscover, for F = F_2, the Fano plane.
Another description of codes: (n,k)-systems

An (n, k)-system over the finite field F is a pair (V, S) of a k-dimensional vector space V over F and a family S = {P_i}_{1≤i≤n} of n points in V, such that S is not contained in any hyperplane of V (i.e. the P_i generate V). Note that clearly n ≥ k.
An (n, k)-system describes a code of length n and dimension k over F, namely

C := { (φ(P_1), φ(P_2), ..., φ(P_n)) : φ ∈ V* },

where V* denotes the dual space of V (i.e. the space of linear maps from V to F).

Proposition 4.3. One has

d(C) = n − max{ #S∩H : H ⊆ V hyperplane },

where #S∩H is the number of 1 ≤ i ≤ n such that P_i ∈ H (i.e. the number of P_i contained in H if the P_i are pairwise different).

Proof. Every hyperplane H is the kernel of a nonzero φ in V* and vice versa, and #S∩H equals the number of zeros in (φ(P_1), φ(P_2), ..., φ(P_n)), i.e.

h( (φ(P_1), φ(P_2), ..., φ(P_n)) ) = n − #S∩H.

The proposition is now obvious.

Note that every linear code of length n and dimension k over F can be obtained from an (n, k)-system. Indeed, let G be a generator matrix of C (i.e. the rows of G form a basis of C), and let P_i (1 ≤ i ≤ n) be its columns. Then (F^{k×1}, {P_i}_{1≤i≤n}) is an (n, k)-system and C is the code associated to it by the preceding construction. (Here F^{k×1} is the vector space of column vectors of length k with entries from F.)
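Proposition 4.3 can be checked mechanically on a toy example. The generator matrix below is our own ad-hoc choice over F_2; the points P_i are its columns, and every functional on F_2^k has the form x ↦ x·v for some v:

```python
from itertools import product

G = [(1, 0, 1, 1),
     (0, 1, 1, 0)]                     # a made-up generator matrix: k = 2, n = 4
k, n = len(G), len(G[0])
P = [tuple(G[i][j] for i in range(k)) for j in range(n)]   # columns of G

def phi(v, p):
    """The functional x -> x.v of the dual space, evaluated at p (over F_2)."""
    return sum(a * b for a, b in zip(v, p)) % 2

# The code of the (n, k)-system: all words (phi(P_1), ..., phi(P_n)).
code = {tuple(phi(v, p) for p in P) for v in product((0, 1), repeat=k)}
d = min(sum(c) for c in code if any(c))

# Hyperplanes are kernels of nonzero functionals; count the P_i they contain.
best = max(sum(phi(v, p) == 0 for p in P)
           for v in product((0, 1), repeat=k) if any(v))
assert d == n - best                   # Proposition 4.3; both sides equal 2 here
```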
Example 4.4 (The Golay and extended Golay code). The Golay code G_23 is a binary linear code of length 23, rate 12/23 (hence dimension 12) and minimal weight 7. Later, when we shall study cyclic codes, we shall see a natural (or rather conceptual) construction. The extended Golay code G_24 is obtained by adding a parity bit to G_23. Here we confine ourselves to describe a basis for G_24. The code G_23 is then obtained by erasing the last digit (or, for a given i, the ith digit) from G_24.
The icosahedron possesses 12 vertices, 30 edges and 20 faces.
Figure 1.3: The icosahedron
Let A be the adjacency matrix of the icosahedron, i.e. number the vertices and set A = (a_ij), where a_ij = 1 if the ith and jth vertex are joined by an edge, and a_ij = 0 otherwise. Finally, let B be the complement of A, i.e. replace in A every 0 by 1 and vice versa. Then the rows of the matrix (1|B), where 1 is the 12 × 12 identity matrix, form a basis for G_24. This is admittedly not a very intuitive definition of the extended Golay code, but at least one can read off the matrix (1|B) from the picture of the icosahedron and investigate G_24 numerically. A matrix like (1|B), i.e. a matrix whose rows form a basis for a given linear code C, is called a generator matrix of C.
We described here the Golay codes G_24 and G_23 up to some ambiguities: the adjacency matrix used depends on the ordering of the vertices, and we obtain a priori different codes when we choose different ith places in the words of G_24 for discarding. However, all these different codes are isomorphic, i.e. they are the same up to simultaneous permutations of the places of the code words.
In the icosahedron every vertex is joined by an edge to exactly 5 other vertices. Thus, the adjacency matrix contains in every row exactly five 1s and the complement B contains in every row exactly 7 = 12 − 5 many 1s. So the vectors of the given basis of G_24 possess exactly eight 1s. It turns out that every vector of length 24 with exactly five 1s can be converted into a codeword by adding three 1s, and that in exactly one way. In other words, if we interpret again words in F_2^24 as subsets of a set X with 24 elements, then the collection S of subsets corresponding to code words of G_24 with 8 elements has the following property: for every subset of X with five elements there exists exactly one subset in S containing it. A system S
of subsets of X with this property is called a Steiner system S(5, 8, 24). The Steiner system provided by the vectors of Hamming weight 8 in G_24 is called the Witt design. Since there are (24 choose 5) = 42504 5-subsets in X, and every 8-subset contains exactly (8 choose 5) = 56 5-subsets, the total number of codewords of weight 8 is (24 choose 5)/(8 choose 5) = 759.
The code G_23 consists of 2^12 words, and the balls of radius 3 around each codeword are pairwise disjoint (since the minimal distance of G_23 is 7). Each such ball contains (23 choose 0) + (23 choose 1) + (23 choose 2) + (23 choose 3) = 2048 words. Therefore |G_23| · V_2(23, 3) = 2^12 · 2^11 = 2^23, from which we deduce that the balls of radius 3 around the codewords partition F_2^23, i.e. G_23 is perfect.
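The counting arguments for the Witt design and for the perfectness of G_23 are easy to confirm numerically:

```python
from math import comb

assert comb(24, 5) == 42504                 # 5-subsets of a 24-element set
assert comb(8, 5) == 56                     # 5-subsets inside one 8-subset
assert comb(24, 5) // comb(8, 5) == 759     # weight-8 codewords of G_24

# G_23 is perfect: 2^12 codewords, balls of radius 3 of volume V_2(23, 3).
ball = sum(comb(23, i) for i in range(4))
assert ball == 2048
assert 2**12 * ball == 2**23
```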
The extended Golay code was implemented in the technical equipment of
Voyager 1 and 2 for their mission in deep space, more specifically, for transmitting color images from Jupiter and Saturn (see for details).
We end this section with examples of several error-detecting, but not error-correcting codes. We include them here because we meet them in everyday life.
Example 4.5 (ISBN 10). We identify the alphabet {0, 1, ..., 9, X} of the 10-digit International Standard Book Number code which we discussed in the first section with the elements of the field F_11 = {[0]_11, [1]_11, ..., [10]_11}. Then this code becomes a linear code over F_11 of length 10, namely,

ISBN10 = { c_1 c_2 ... c_10 ∈ F_11^10 : Σ_{j=1}^{10} j · c_j = 0 }.

As kernel of a non-zero functional on F_11^10 the code ISBN10 is a hyperplane in F_11^10, i.e. a subspace of dimension 9. The entry at the kth place of a codeword c_1 c_2 ... c_10 is always a function of the other places:

c_k = −[k]_11^{−1} Σ_{j=1, j≠k}^{10} j · c_j.

Thus, if we change a code word at one place it is no longer a code word. One error will therefore be detected (and can be corrected if we know the place where it occurs). On the other hand, it is easy to change a codeword at two places and again obtain a valid codeword (using again the last formula). Summarizing, we have d(ISBN10) = 2 and R(ISBN10) = 9/10.
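The check-digit arithmetic translates directly into code; a sketch (the 9-digit prefix below is made up):

```python
def isbn10_valid(s):
    """Valid iff sum_{j=1}^{10} j*c_j = 0 in F_11; 'X' stands for 10."""
    digits = [10 if ch == 'X' else int(ch) for ch in s]
    return sum(j * c for j, c in enumerate(digits, start=1)) % 11 == 0

def isbn10_check_digit(first9):
    # c_10 has coefficient 10 = -1 in F_11, so c_10 = sum_{j<10} j*c_j mod 11.
    s = sum(j * int(ch) for j, ch in enumerate(first9, start=1)) % 11
    return 'X' if s == 10 else str(s)

prefix = '123456789'                      # hypothetical first nine digits
full = prefix + isbn10_check_digit(prefix)
assert isbn10_valid(full)
assert not isbn10_valid('2' + full[1:])   # one error is detected
```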
Example 4.6 (ISBN 13/EAN 13). The ISBN 13-digit code, which is identical to the International Article Number code (also known as EAN 13 barcode), is a subgroup of (Z/10Z)^13 defined as

ISBN13 = { c_1 c_2 ... c_13 ∈ (Z/10Z)^13 : Σ_{j odd} c_j + 3 Σ_{j even} c_j = 0 },

where the first sum runs over the odd 1 ≤ j ≤ 13 and the second over the even 1 ≤ j ≤ 13. Here we use the ring Z/10Z of residue classes modulo 10 (see below). This code is the kernel of the group homomorphism from (Z/10Z)^13 onto Z/10Z given by

w_1 w_2 ... w_13 ↦ Σ_{j odd} w_j + 3 Σ_{j even} w_j.
18
CHAPTER 1. FUNDAMENTALS OF CODING THEORY
Since this map is surjective its kernel has cardinality 10^12. As with the ISBN 10 check digit one sees easily that the minimal distance of the ISBN 13 check digit code is 2.
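The same computation for the ISBN 13/EAN 13 checksum, with a made-up 12-digit prefix:

```python
def ean13_valid(s):
    """Valid iff (sum over odd places) + 3*(sum over even places) = 0 mod 10."""
    digits = [int(ch) for ch in s]
    odd = sum(digits[0::2])            # places 1, 3, ..., 13
    even = sum(digits[1::2])           # places 2, 4, ..., 12
    return (odd + 3 * even) % 10 == 0

def ean13_check_digit(first12):
    # c_13 sits at an odd place, so it enters the sum with coefficient 1.
    digits = [int(ch) for ch in first12]
    s = sum(digits[0::2]) + 3 * sum(digits[1::2])
    return str(-s % 10)

prefix = '978123456789'                # hypothetical first twelve digits
word = prefix + ean13_check_digit(prefix)
assert len(word) == 13 and ean13_valid(word)
```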
Example 4.7 (IBAN). The International Bank Account Number consists of up to 34 characters: the first two are upper case Latin alphabet characters indicating the country code like DE or GB etc.; then come two digits from the set {0, 1, ..., 9} (called check digits), and finally up to a maximum of 30 letters from the 36-letter alphabet {0, 1, ..., 9, A, B, ..., Z}. How many of these there are is country specific. In Germany this is essentially the “old” Bankleitzahl followed by the proper account number suitably padded with 0s. Such a string of characters is a valid IBAN number if the following is true: take the given string, move the first four symbols to the end, and replace the letters A, B, ..., Z by 10, 11, ..., 35, respectively. Interpret the resulting string of digits as a number in base 10. If the remainder upon division by 97 is 1 the given number passes the test.
The German IBAN consists of the letters DE followed by the two check digits, followed by the 8 digits of the Bankleitzahl, followed by the account number which is prepadded by 0s so as to comprise exactly 10 digits; it has exactly 22 characters. Thus the set of valid German IBAN numbers can be identified with the code

IBAN_DE = { 1314 c_1 c_0 b_23 b_22 ... b_6 ∈ {0, 1, ..., 9}^24 : Σ_{j=6}^{23} b_j · 10^j + 131400 + 10 · c_1 + c_0 ≡ 1 mod 97 }    (1.1)

(note that 1314 is the replacement of the characters DE). Since 97 is a prime number and 10 is relatively prime to 97, it follows similarly to the ISBN 10 code that IBAN_DE can detect one error, but cannot correct it unless we know the place where the error occurred.
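The mod-97 test is a one-liner once letters are mapped to numbers; the sketch below also computes check digits for a made-up Bankleitzahl and account number (all bank data hypothetical):

```python
def iban_valid(iban):
    """Move the first four characters to the end, map A..Z to 10..35,
    and test for remainder 1 modulo 97."""
    rearranged = iban[4:] + iban[:4]
    digits = ''.join(str(int(ch, 36)) for ch in rearranged)  # '0'->0, ..., 'Z'->35
    return int(digits) % 97 == 1

def de_check_digits(blz8, account10):
    # With check digits 00 the rearranged number ends in '131400';
    # choosing c = 98 - (remainder) makes the full number = 1 mod 97.
    n = int(blz8 + account10 + '131400')
    return f'{98 - n % 97:02d}'

blz, acct = '10000000', '0123456789'      # made-up bank data
iban = 'DE' + de_check_digits(blz, acct) + blz + acct
assert len(iban) == 22 and iban_valid(iban)
```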
The ring Z/mZ of residue classes modulo m

The set Z/mZ and the addition and multiplication of elements in Z/mZ are defined as in the case that m is a prime number. However, in contrast to the prime number case, a non-zero element does not always have a multiplicative inverse. In fact, [r]_m has a multiplicative inverse if and only if r and m are relatively prime, i.e. when the greatest common divisor gcd(r, m) of r and m is 1. For two integers r and s we write r ≡ s mod m if r and s leave the same remainder upon division by m, i.e. if [r]_m = [s]_m. It is easily verified that r ≡ s mod m if and only if m divides r − s.
The subset of multiplicatively invertible elements forms a group with respect to multiplication, which is denoted by (Z/mZ)*. In fact, for every ring R the set of multiplicatively invertible elements forms a group with respect to multiplication, denoted by R*, called the group of units of R.
The cardinality of (Z/mZ)* equals the number of integers 0 ≤ r < m with gcd(r, m) = 1. This number is usually denoted by φ(m), and the map m ↦ φ(m) is known as Euler’s phi-function. Formulas for it can be found in almost any textbook on elementary number theory. For a prime power p^n one has obviously φ(p^n) = p^n − p^{n−1} (i.e. the number of remainders modulo p^n not divisible by p equals the number of all remainders minus the number of remainders divisible by p). As a consequence of the Chinese remainder theorem one has

φ(m) = Π_{p^n ∥ m} (p^n − p^{n−1}),

where the product is taken over all prime powers which divide m exactly, i.e. which divide m such that m/p^n is no longer divisible by p.
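The product formula translates directly into a small function (trial division suffices for the sizes at hand):

```python
from math import gcd

def phi(m):
    """Euler's phi-function via the prime powers dividing m exactly."""
    result, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            pn = 1
            while m % p == 0:
                m //= p
                pn *= p
            result *= pn - pn // p         # p^n - p^(n-1)
        p += 1
    if m > 1:                              # one remaining prime factor
        result *= m - 1
    return result

assert phi(12) == 4                        # units mod 12: 1, 5, 7, 11
assert phi(97) == 96                       # 97 is prime
# Agrees with the definition as |(Z/mZ)*| for small m:
assert all(phi(m) == sum(gcd(r, m) == 1 for r in range(m)) for m in range(1, 200))
```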
Control matrices and Hamming weight

It is sometimes easy to read off the minimal weight from the control matrix of a linear code. Namely, one has the following proposition:

Proposition 4.8. Let C ≠ {0} be a linear code over the field F. If K denotes a control matrix of C, then

d(C) = min{ r : K possesses r linearly dependent rows }.

Note that the set of which we take the minimum is in any case not empty. Namely, since C contains nonzero vectors, the rows of K are linearly dependent.
We leave the easy proof of the proposition as an exercise. As an example consider the Hamming code. A control matrix is

1 0 0
0 1 0
1 1 0
0 0 1
1 0 1
0 1 1
1 1 1

It is immediate that the matrix has full rank (since the 1st, 2nd and 4th rows form the unit matrix), so that from the proposition we deduce d(C) = 3.
Exercises
4.1. Verify, using e.g. Sage, that G24 has indeed minimal distance 8.
4.2. For a field F with q elements let G_n^k(F) be the set of k-dimensional subspaces of F^n. Show that |G_n^k(F)| equals the Gaussian binomial coefficient (n choose k)_q, i.e.

|G_n^k(F)| = (n choose k)_q = [q]_n / ([q]_k [q]_{n−k}),

where, for any q and any nonnegative integer n, we use [q]_n = (q^n − 1)(q^{n−1} − 1) ... (q − 1) (with the convention [q]_0 = 1).
(Hint: The cardinality in question equals the number of sequences of k linearly independent vectors in F^n divided by |GL(k, F)|. Next, ask yourself how many nonzero vectors do exist in F^n; if w is such a vector, how many nonzero vectors do exist in F^n \ {a · w : a ∈ F}; ... ?)
4.3. Prove Proposition 4.8.
4.4. For a code C with generator matrix G, let (V, S) be the (n, k)-system derived from G as described in the last paragraph of the addon “Another description of codes: (n,k)-systems” above. Prove that (V, S) is indeed an (n, k)-system, and that C equals the code associated to this system.
5 Bounds
It is plausible that there must be a trade-off between rate and minimal distance. A code with a high rate should have small minimal distance, and a code with a large minimal distance should not have many codewords, i.e. should have a small rate. For later it is useful to introduce some vocabulary. We call a code an (n, N, d)_q-code if it is of length n over an alphabet with q letters and has cardinality N and minimal distance d. An [n, k, d]_q-code is a linear code of length n over the field with q elements of dimension k and minimal distance d.
The first four theorems of this section translate the qualitative statement of the last paragraph into precise quantitative forms. These theorems give, in particular, a first feeling for what parameter triples (n, N, d)_q of length, cardinality and minimal distance are possible for codes over alphabets with q elements. Moreover, their proofs teach us certain techniques to obtain such bounds.
Clearly, for every d ≤ n there exists a code of length n and minimal distance d over an alphabet with q letters (e.g. the code {(a, ..., a, b, ..., b), (b, b, ..., b)}, where a ≠ b are any two letters of the given alphabet and the first word has a's at the first d places followed by b's). However, how large can such a code be? We set

A_q(n, d) = max{ N : an (n, N, d)_q-code exists }.

The first three theorems can be read as upper bounds for A_q(n, d). The fifth theorem, the Gilbert-Varshamov bound, gives a lower bound.
Theorem 5.1 (Hamming bound). Let C be a code of length n over an alphabet with q letters of information rate R and with minimal distance d. Then

R + log_q V_q(n, ⌊(d−1)/2⌋)/n ≤ 1.

Proof. Indeed, by the triangle inequality the balls of radius t := ⌊(d−1)/2⌋ around the codewords are pairwise disjoint. Therefore

|C| · V_q(n, t) ≤ q^n,

since q^n is the number of all possible words of length n over an alphabet with q letters. Taking the base-q logarithm yields the claimed inequality.
We call a code of length n over an alphabet A with q letters perfect if the inequality of the theorem becomes an equality, i.e. if the balls of radius ⌊(d−1)/2⌋ around the code words partition A^n. Recall from Examples 4.2 and 4.4 that the Hamming code H(7, 4) and the Golay code G_23, whose rates and minimal distances are 4/7, 3 and 12/23, 7, respectively, are perfect codes.
Theorem 5.2 (Singleton bound). Let C be a code of length n over an alphabet with q letters of information rate R and with minimal distance d. Then

R + (d − 1)/n ≤ 1.
Proof. The application c ↦ c′, where c′ is obtained from c by deleting the first d − 1 letters, is injective, since two codewords differ in at least d places. The image of this application is a code C′ of length n − d + 1, and thus contains at most q^{n−d+1} codewords. Therefore

|C| ≤ |C′| ≤ q^{n−d+1},

and taking the base-q logarithm yields the claimed inequality.
Theorem 5.3 (Plotkin bound). Let C be a code of length n over an alphabet with q letters of information rate R and with minimal distance d. Then, for d/n > 1 − 1/q, one has

R ≤ (1/n) log_q ( d / (d − n(1 − 1/q)) ).

Proof. Let N = |C|. For a letter a in the alphabet A of the code, let m_i(a) be the number of codewords of C which have a at the ith place. The number of ordered pairs in C with different entries is N(N − 1). We therefore have

N(N − 1)d ≤ Σ_{x,y∈C, x≠y} h(x, y) = Σ_{i=1}^{n} Σ_{a∈A} m_i(a)(N − m_i(a)).

The first inequality follows since d ≤ h(x, y) for x ≠ y. The formula on the right is obtained by summing over all places i, and by counting, for each place i, the pairs of codewords which differ at this place. For further estimating the sums on the right we note, first of all, that Σ_{a∈A} m_i(a) = N. Furthermore, by the Cauchy-Schwarz inequality we have, for each i,

q Σ_{a∈A} m_i(a)^2 ≥ ( Σ_{a∈A} m_i(a) )^2 = N^2.

(Apply the Cauchy-Schwarz inequality to the q-vectors (m_i(a))_{a∈A} and (1, 1, ..., 1).) We therefore obtain

N(N − 1)d ≤ n(1 − 1/q)N^2,

i.e.

N( d − n(1 − 1/q) ) ≤ d.

The theorem is now obvious.
The next bound is a bound for linear codes.
Theorem 5.4 (Griesmer bound). Let C be a linear code of length n over a field F with q elements of rate k/n and with minimal distance d. Then

Σ_{i=1}^{k} ⌈d/q^{i−1}⌉ ≤ n.

Proof. For positive integers k and d, let N(k, d) be the minimal length of a linear code over F of dimension k and minimal distance d. We show

N(k, d) ≥ N(k − 1, ⌈d/q⌉) + d.

Applying this inequality repeatedly implies the claimed bound, namely

N(k, d) ≥ N(k − 1, ⌈d/q⌉) + d
        ≥ N(k − 2, ⌈d/q^2⌉) + ⌈d/q⌉ + d
        ...
        ≥ N(1, ⌈d/q^{k−1}⌉) + Σ_{i=1}^{k−1} ⌈d/q^{i−1}⌉ = Σ_{i=1}^{k} ⌈d/q^{i−1}⌉

(where one also uses ⌈⌈d/q^i⌉/q⌉ = ⌈d/q^{i+1}⌉).
For showing the first inequality let C be an [n, k, d]_q-code where n := N(k, d). We can assume (by permuting all codewords simultaneously and multiplying a given place of all codewords by a suitable nonzero element of F) that C contains a vector e consisting of d many 1s followed by 0s. Let D be a complement in C of the subspace spanned by e, i.e. C = F·e ⊕ D. Finally, let C′ be obtained from D by deleting the first d places. We claim that C′ is an [n − d, k − 1, d′]_q-code, where d′ ≥ ⌈d/q⌉. Deleting successively suitable places of the codewords in C′ we can shorten C′ to an [n − d − s, k − 1, ⌈d/q⌉]_q-code for some s (see Exercise 2), which proves the inequality.
We prove the claim on C′. The code C′ has obviously length n − d. Furthermore, it is clear that the application which deletes the first d places is injective on D (since otherwise there would be a nonzero codeword in D which has only 0s after the first d places, so that adding a suitable multiple of e to it would yield a nonzero codeword in C of weight < d). Hence C′ has dimension k − 1.
Finally, let d′ be the minimal distance of C′. If we take a codeword c in C there must be among the first d places at least ⌈d/q⌉ which have the same entry, say a_0 (since, if every element a of F occurred only n_a < d/q many times amongst the first d places we would have d = Σ_{a∈F} n_a < Σ_{a∈F} d/q = d). But then, if c is in D and c′ denotes the codeword in C′ obtained from c by deleting the first d places, we have

(d − ⌈d/q⌉) + h(c′) ≥ h(c − a_0 e) ≥ d.

It follows d′ ≥ ⌈d/q⌉.
The technique of the proof which derives C′ from C is sometimes known as constructing a residual code of C.
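The Griesmer bound is easy to evaluate; for the Hamming code [7, 4, 3]_2 it is attained with equality, and for the extended Golay code [24, 12, 8]_2 it leaves a gap of one:

```python
def ceil_div(a, b):
    """Ceiling of a/b for positive integers, without floating point."""
    return -(-a // b)

def griesmer(q, k, d):
    """The left hand side sum_{i=1}^{k} ceil(d / q^(i-1)) of the bound."""
    return sum(ceil_div(d, q**(i - 1)) for i in range(1, k + 1))

assert griesmer(2, 4, 3) == 3 + 2 + 1 + 1 == 7     # Hamming [7,4,3]_2: tight
assert griesmer(2, 12, 8) == 23                    # extended Golay: 23 <= 24
```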
As said at the beginning, the first three theorems can be read as upper bounds for A_q(n, d). Indeed, rewritten in terms of these numbers they state

log_q A_q(n, d)/n ≤ 1 − log_q V_q(n, ⌊(d−1)/2⌋)/n,
log_q A_q(n, d)/n ≤ 1 − (d − 1)/n,
log_q A_q(n, d)/n ≤ (1/n) log_q ( d / (d − n(1 − 1/q)) )    (for d/n > 1 − 1/q).

The following is a lower bound.
Theorem 5.5 (Gilbert-Varshamov bound). For any positive integers d ≤ n, one has

1 − log_q V_q(n, d − 1)/n ≤ log_q A_q(n, d)/n.

Proof. Let N = A_q(n, d) and let C be an (n, N, d)_q-code. Then there is no word w in A^n \ C which has distance ≥ d to all code words (since otherwise we could adjoin w to C, thereby still keeping the minimal distance, which contradicts the maximality of N). Therefore the balls of radius d − 1 around the code words cover all of A^n. In particular,

N · V_q(n, d − 1) ≥ q^n.

Taking the base-q logarithm proves the theorem.
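The bounds of this section are easy to compare numerically for concrete parameters; a sketch for binary codes (the parameters n, d below are chosen arbitrarily):

```python
from math import comb, log2

def V(n, r):
    """Volume of a Hamming ball of radius r in F_2^n."""
    return sum(comb(n, i) for i in range(r + 1))

def hamming_upper(n, d):
    return 1 - log2(V(n, (d - 1) // 2)) / n

def singleton_upper(n, d):
    return 1 - (d - 1) / n

def gv_lower(n, d):
    return 1 - log2(V(n, d - 1)) / n

n, d = 256, 33
# The Gilbert-Varshamov lower bound never exceeds the upper bounds.
assert gv_lower(n, d) <= hamming_upper(n, d) <= 1
assert gv_lower(n, d) <= singleton_upper(n, d)
```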
Figure 1.4: For q = 2 and n = 256, plot of the Hamming, Singleton, Griesmer, Gilbert-Varshamov and Plotkin bounds in red, green, blue, gray and purple, respectively. (We plotted the points (d/n, R), where R is the maximal (respectively minimal, for Gilbert-Varshamov) rate admitted by the respective bound.)
Exercises
5.1. Prove the inequality

⌈⌈d/q^i⌉/q⌉ = ⌈d/q^{i+1}⌉,

which we used in the proof of the Griesmer bound.
5.2. Let C be an [n, k, d]_q-code, and assume d ≥ 2. Show that there is a place i, so that the code C′ obtained from C by deleting the ith place of all codewords in C is an [n − 1, k, d − 1]_q-code.
5.3. By a suitable adaptation of the proof of the Gilbert-Varshamov bound, prove that, for a given field with q elements and given d ≤ n, there exists also a linear [n, k, d]_q-code such that n − log_q V_q(n, d − 1) ≤ k.
6 Manin’s theorem
For comparing codes C of different length it is useful to introduce the relative minimal distance

δ(C) := d(C)/n,

where n denotes the length of C. Let W_q(n) be the set of all points (δ, R) in the plane for which there exists a code of length n over an alphabet with q letters with minimal distance δn and information rate R. This set lies inside the rectangle 0 ≤ δ, R ≤ 1. We are mainly interested in the maximal points of this set with respect to the componentwise partial ordering, i.e. the ordering for which (δ, R) ≤ (δ′, R′) if and only if δ ≤ δ′ and R ≤ R′. Namely, for a maximal point (δ, R) one has

R = max{ R′ : there exists a code of length n with rate R′ and minimal distance nδ },

and also

δ = max{ δ′ : there exists a code of length n with rate R and minimal distance nδ′ }.

In other words, whatever we fix, δ or R, the maximal points answer the question for the best available pair δ, R. However, at the moment it seems to be impossible to describe the set W_q(n) or even only its maximal points precisely unless n is very small. The number of codes of length n over an alphabet A with q letters equals the number of subsets of A^n, which is 2^{q^n}. Even for q = 2 and, say, n = 5 there are 2^32 ≈ 4 · 10^9 such codes, and computing for each of them the minimal distance would hit the border of what is currently possible. (One can, however, do much better by searching only for codes up to “Hamming-distance preserving isomorphism” and which are maximal in the sense that adding another word decreases the minimal distance.)
It is already interesting enough to consider prime powers for q and to consider the sets V_q(n) of points (δ, R) which correspond to linear codes over F_q of length n. The number of these codes is

N_q(n) := Σ_{k=0}^{n} [q]_n / ([q]_k [q]_{n−k})

(see Problem 2 in Section 4). For q = 2, the first values are

2, 5, 16, 67, 374, 2825, 29212, 417199, 8283458, 229755605, 8933488744.

Again, for n = 11 one has already N_2(n) ≈ 8.9 · 10^9 linear subspaces in F_2^11, which starts to run out of the range of feasible computations.
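The quoted values are reproduced by the formula of Problem 2 in Section 4:

```python
def bracket(q, n):
    """[q]_n = (q^n - 1)(q^(n-1) - 1) ... (q - 1), with [q]_0 = 1."""
    prod = 1
    for i in range(1, n + 1):
        prod *= q**i - 1
    return prod

def num_linear_codes(q, n):
    """Total number of subspaces of F_q^n, summed over all dimensions k."""
    return sum(bracket(q, n) // (bracket(q, k) * bracket(q, n - k))
               for k in range(n + 1))

assert [num_linear_codes(2, n) for n in range(1, 6)] == [2, 5, 16, 67, 374]
```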
A more promising approach is to consider the set
Vq :=
[
Vq (n),
n≥1
and then, to “smoothen” it, the set Uq of its limit points. Recall that these are
those points x in the plane, for which every open neighborhood contains a point
of Vq different from x. Here one has the following theorem.
Theorem 6.1 (Manin). The set U_q of limit points of V_q is of the form

U_q = { (δ, R) ∈ [0, 1]^2 : 0 ≤ R ≤ a_q(δ) },

where a_q : [0, 1] → [0, 1] is a continuous, decreasing function, equal to 0 on [1 − 1/q, 1].
For the proof we introduce two simple procedures for “shortening” a code.

Lemma 6.2. Let C be an [n, k, d]_q-code. Then, for every 0 ≤ l < k, there exist [n − l, k − l, d]_q-codes; for every l < d, there exists an [n, k, d − l]_q-code; and, for every 0 ≤ l < k, d, there exist [n, k − l, d − l]_q-codes.

Proof. For proving the first statement choose l places where a codeword of weight d has zero coordinates (which is possible since by the Singleton bound we have k + d ≤ n + 1). The subspace C′ of vectors in C having vanishing coordinates at these positions has dimension ≥ k − l (since its dimension equals k − r, where r is the dimension of the image of the map projecting C onto the fixed l coordinates, so that, in particular, r ≤ l). Its minimal distance is clearly d. The existence of [n − l, k − l, d]_q-codes is now obvious.
For the second statement choose l places where a codeword of minimal weight has nonzero coordinates. Then the code C′ obtained from C by replacing these coordinates in every codeword by 0 is an [n, k, d − l]_q-code. Any subspace of dimension k − l of C′ containing the modified codeword of shortest Hamming weight provides an [n, k − l, d − l]_q-code.
Proof. For proving the theorem we follow essentially the original argument of Manin.
Let A be the pencil of lines in the (δ, R)-plane through (0, 1), and let B be the pencil of lines δ − R = const. For a point (δ_0, R_0) let A(δ_0, R_0) be the line from A through this point, and let sA(δ_0, R_0) be the segment on this line from (δ_0, R_0) down to (δ_0/(1 − R_0), 0). Similarly, let B(δ_0, R_0) be the line in B through (δ_0, R_0), and sB(δ_0, R_0) the segment from (δ_0, R_0) down to (δ_0 − R_0, 0).
We shall show below that, for every (δ_0, R_0) in U_q, the segments sA(δ_0, R_0) and sB(δ_0, R_0) are contained in U_q. This is the essential step to prove the theorem.
Indeed, for 0 ≤ δ ≤ 1, set

a_q(δ) := sup{ R : (δ, R) ∈ U_q }.

Note that the line R = 0 lies in U_q (as limit points of the codes spanned by the single word (1, ..., 1, 0, ..., 0) with d many 1s), so that the sets whose suprema we take are indeed nonempty. Note furthermore that (δ, a_q(δ)) is in U_q for each δ (since the latter, being a set of limit points, is obviously closed). Therefore the segments sA(δ, a_q(δ)) and sB(δ, a_q(δ)) are contained in U_q too. If 0 ≤ x < y ≤ 1 then (x, a_q(x)) lies to the “left” of B(y, a_q(y)) (since a_q(x) is greater than or equal to the R-coordinate of the intersection point of the segment sB(y, a_q(y)) ⊆ U_q with the line δ = x), and similarly, (y, a_q(y)) lies to the “right” of A(x, a_q(x)). But then, for fixed x, the “freedom” of (y, a_q(y)) is restricted to the segment on the line δ = y between the intersection points of this line with A(x, a_q(x)) and B(x, a_q(x)). Since this freedom approaches 0 as y tends to x, we see that a_q(δ) is continuous (a simple sketch makes this argument clear).
It is clear that a_q(0) = 1, since V_q contains all points (1/n, 1) (n ≥ 1) (which correspond to the trivial codes of length n containing all words of length n). The Plotkin bound (see the preceding section) implies that a_q(δ) = 0 for 1 − 1/q ≤ δ ≤ 1. Namely, if δ is in this range then there exists a sequence of codes C_n of type [n, k, d]_q with (d/n, k/n) → (δ, a_q(δ)) as n tends to infinity. But by the Plotkin bound we have

k/n ≤ (1/n) log_q ( (d/n) / (d/n − (1 − 1/q)) ) → 0.

Finally, if R_0 ≤ a_q(δ_0) then (δ_0, R_0) is in U_q, since the line δ − R = δ_0 − R_0 cuts the graph of a_q at some point: the function δ ↦ δ − (δ_0 − R_0) is increasing, and the continuous function a_q(δ) − (δ − (δ_0 − R_0)) is nonnegative at δ = δ_0 and negative at δ = 1.
For proving the claim we note that, for every [n, k, d]_q-code C, the set V_q contains the points

A(C) := { ( d/(n−l), (k−l)/(n−l) ) : 0 ≤ l < k }

(as follows from the Lemma). These points all lie on the segment sA(d/n, k/n). If (δ_0, R_0) is a point in U_q and {C_n} a sequence of codes with (δ(C_n), R(C_n)) approaching (δ_0, R_0) and whose lengths tend to infinity, then the sets A(C_n) approach and densely fill sA(δ_0, R_0). A similar argument applies to the pencil B (using the shortening of [n, k, d]_q-codes to [n, k − l, d − l]_q-codes).
For showing that a_q(δ) decreases one may use the pencil of lines R = const. Using the shortening of [n, k, d]_q-codes to [n, k, d − l]_q-codes for l < d one shows that U_q contains, for every δ, the segment {(δ′, a_q(δ)) : 0 ≤ δ′ ≤ δ}. In particular, if δ′ < δ then (δ′, a_q(δ)) is in U_q, so that a_q(δ) is in the set whose supremum equals a_q(δ′).
One does not know much about a_q(δ) except various bounds which one can derive from bounds like the ones of the last section, as we will do now. These bounds are obtained by letting the length n tend to infinity (for which reason they are also named asymptotic bounds), a technique which we already used in the proof.
Theorem 6.3. The function a_q(δ) of the preceding theorem satisfies the following bounds:

a_q(δ) ≤ 1 − δ    (Asymptotic Singleton bound),
a_q(δ) ≤ 1 − H_q(δ/2)    (Asymptotic Hamming bound),
1 − H_q(δ) ≤ a_q(δ)    (Asymptotic Gilbert-Varshamov bound).
Proof. The Singleton bound for codes of length n states that V_q(n) is contained in the (finite) set

S_q(n) := { (δ, R) ∈ ([0, 1] ∩ (1/n)Z)^2 : R + δ ≤ 1 + 1/n }.

Therefore V_q is contained in the union S_q of all S_q(n), and U_q is contained in the set Ŝ_q of limit points of S_q. But Ŝ_q is contained in the set of all (δ, R) ∈ [0, 1]^2 such that R ≤ 1 − δ, which implies the asymptotic Singleton bound.
The asymptotic Hamming bound follows similarly by considering the sets

H_q(n) := { (δ, R) ∈ ([0, 1] ∩ (1/n)Z)^2 : R + (1/n) log_q V_q(n, (δn − 1)/2) ≤ 1 }

instead of S_q(n). Moreover, we use that (1/n) log_q V_q(n, δn/2) tends to H_q(δ/2) (see Theorem ).
Finally, for a given δ, choose a sequence of [ni, ki, di]q-codes such that di/ni →
δ. We can assume that, for each i, the dimension ki is maximal. By the Gilbert-Varshamov bound for linear codes (see Exercise 3 in Section 4) we have

ki/ni ≥ 1 − (1/ni) logq Vq(ni, di − 1).

Since the ki/ni are in the closed interval [0, 1] there is a convergent subsequence;
we can therefore assume that ki/ni → R for some R. If δ is irrational, the set
of rational numbers which are equal to at least one of the di/ni cannot be finite.
Therefore, (δ, R) is a limit point of Vq, and so aq(δ) ≥ R. But R, as limit of
the ki/ni, is greater than or equal to the limit of the right-hand side of the
Gilbert-Varshamov bound, which, by the theorem in Section 2, equals 1 − Hq(δ).
Since aq and Hq are continuous, the asymptotic Gilbert-Varshamov bound is
now obvious.
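The three bounds can also be compared numerically. The following sketch (plain Python; the function name and the sampling grid are our own choices, and H is the q-ary entropy function Hq of Section 2) checks that the Gilbert-Varshamov lower bound 1 − Hq(δ) indeed stays below the Singleton upper bound 1 − δ on 0 < δ < 1 − 1/q:

```python
import math

def H(q: int, x: float) -> float:
    """q-ary entropy: H_q(x) = x log_q(q-1) - x log_q(x) - (1-x) log_q(1-x)."""
    if x == 0:
        return 0.0
    return (x * math.log(q - 1, q)
            - x * math.log(x, q)
            - (1 - x) * math.log(1 - x, q))

for q in (2, 3, 4, 23):
    for i in range(1, 50):
        delta = i * (1 - 1 / q) / 50          # sample points in (0, 1 - 1/q)
        # GV lower bound must not exceed the Singleton upper bound
        assert 1 - H(q, delta) <= 1 - delta + 1e-12
```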
Chapter 2
Infinite Families of Linear
Codes
7 Reed-Solomon Codes
Recall that the Singleton bound for a linear [n, k, d]q code C states k + d ≤ n + 1.
If we have equality, i.e. if k + d = n + 1, or, in other words, if C is an [n, k, n + 1 − k]q
code, then we call C a Maximum Distance Separable (MDS) code. In this
section we introduce an infinite family of MDS codes.
Let F be a finite field with q elements, fix a vector a of length n with pairwise
different entries aj in F, and set

Eva,k : F[x]<k → F^n,   f ↦ (f(a1), f(a2), . . . , f(an)).

Here F[x]<k denotes the F-sub-vectorspace of polynomials of degree < k in
F[x]. Note that its dimension equals k. As basis one might take for instance
the polynomials 1, x, . . . , x^{k−1}. For n ≥ k the evaluation map Eva,k is injective,
since a nonzero polynomial of degree l < k has at most l < n zeros and since
we assume that the aj are pairwise different. The image RSq(a, k) of Eva,k is a
linear code over F of length n, called the Reed-Solomon code of degree k − 1 associated
to a. Note that for such a code to exist we need

n ≤ |F|

(since we assume that the entries of a are pairwise different). In particular,
there are only finitely many Reed-Solomon codes over a given field F.
Theorem 7.1. A Reed-Solomon code RSq(a, k) of length n ≥ k over a field
with q elements is an [n, k, n − k + 1]q-code.
Proof. The only non-obvious statement is the minimal distance. For this note that

h((f(a1), f(a2), . . . , f(an))) = n − #{i : f(ai) = 0} ≥ n − deg(f) ≥ n − k + 1.   (2.1)

For f = ∏_{i=1}^{k−1} (x − ai) we have equality here.
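A Reed-Solomon code can be written down directly from the definition of Eva,k. The following sketch (Python; the small parameters p = 7, n = 6, k = 3 are a hypothetical example of our own, not from the text) enumerates all codewords and confirms the minimal distance n − k + 1 by brute force:

```python
from itertools import product

p = 7                      # the field F_p, p prime (illustrative choice)
a = [0, 1, 2, 3, 4, 5]     # n = 6 pairwise different evaluation points
k = 3                      # polynomials of degree < k

def evaluate(coeffs, x):   # Horner evaluation of c0 + c1 x + ... mod p
    r = 0
    for c in reversed(coeffs):
        r = (r * x + c) % p
    return r

# all codewords Ev_{a,k}(f) for f running through F_p[x]_{<k}
code = [tuple(evaluate(f, x) for x in a) for f in product(range(p), repeat=k)]

weights = [sum(c != 0 for c in w) for w in code if any(w)]
assert min(weights) == len(a) - k + 1   # d = n - k + 1: the code is MDS
```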
As a consequence we see that a Reed-Solomon code with n ≥ k reaches the
Singleton bound, and is therefore an MDS code. Note also that this is not in
contradiction to Manin's theorem, since, for a given F, the set of Reed-Solomon
codes is finite and the associated set

{ (1 − (k − 1)/n, k/n) : 1 ≤ k ≤ n ≤ |F| }

of the δ, R-plane therefore has no limit points.
Figure 2.1: The 528 32-ary Reed-Solomon codes in the δ, R-plane
QR codes

Reed-Solomon and derived codes are used for error correction in data
streams occurring for example in the transmission of audio and video
streams; see for concrete examples. The reason is that Reed-Solomon
codes can be used over a field with many elements. A bit stream is, for
example, partitioned into bytes, and each byte represents an element of the
field F256. Burst errors, which typically occur in data streams, then lead
only to a few errors in a code word over F256. A sequence of 32 wrong
bits, for example, leads to 4 successive byte errors, which could be
corrected by an RS256(a, 8) code over F256 with an a of length 16.

QR codes are typical examples for having burst errors. For example,
part of the paper which they are printed on might be missing, or a
company prints its logo onto the QR code. To compensate for this
they use Reed-Solomon error correction.
Exercises
7.1. What is the kernel of the map

Ev : F[x] → F^q,   f ↦ (f(a1), f(a2), . . . , f(aq)),

where q = |F| and F = {a1, a2, . . . , aq}? (Hint: The requested kernel is in fact
an ideal of F[x], and as such a principal ideal.)
7.2. What can one say about the dimension of RSq(a, k) for k strictly greater
than the length of a?
8 Reed-Muller codes
We can formally generalize the construction of the previous section in a
straightforward way and consider multivariate polynomials. However, a first
difficulty is the study of the kernels of the evaluation maps. For Reed-Solomon
codes they are injective if one assumes that the number of points at which we
evaluate a polynomial is larger than its degree. The reason for this is that a
polynomial of degree l has at most l zeros. So the first thing to do is to study
the zeros of multivariate polynomials over a finite field.
For this let us fix a finite field F . For a polynomial f in F [x1 , . . . , xr ] let
N (f ) := {x ∈ F r : f (x) = 0}
be the set of zeros of f in F r .
Theorem 8.1 (Schwartz-Zippel lemma). If f is not the zero polynomial, one
has
#N (f ) ≤ l · |F |r−1 ,
where l denotes the degree of f .
Proof. We can assume that l < |F | since otherwise the inequality is trivial. We
follow here the short proof by Dana Moshkovitz. For this write f = g + h,
where g is homogeneous of degree l and h contains only monomials of degree
strictly less than l. By the subsequent lemma we can find a y in F r such
that g(y) 6= 0; since g is homogeneous we have y 6= 0. The space F r can be
partitioned into |F |r−1 many lines of the form Lx = {x + ty : t ∈ F }. The
restriction p(t) := f (x + ty) of f onto Lx is a polynomial of degree l of the
form g(y)tl + lower terms. Therefore it has ≤ l zeros. The theorem is now
obvious.
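The Schwartz-Zippel bound can be checked by brute force in a small case. A minimal Python sketch, for a sample polynomial of degree 2 in two variables over F3 (the polynomial is our own illustrative choice):

```python
from itertools import product

q = 3                                   # the field F_3
F = range(q)

def f(x, y):                            # sample polynomial xy + x + 1, degree l = 2
    return (x * y + x + 1) % q

zeros = sum(1 for x, y in product(F, F) if f(x, y) == 0)
l, r = 2, 2
assert zeros <= l * q ** (r - 1)        # Schwartz-Zippel: #N(f) <= l * |F|^(r-1)
```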
Lemma 8.2. If g is a nonzero polynomial in r variables and of degree l < |F |,
there exists a y such that g(y) 6= 0.
Note that the assumption l < |F | is not superfluous: the polynomial xq − x,
where q = |F |, is nonzero, but for every a in F one has aq = a (since F ∗ is a
group of order q − 1, so that aq−1 = 1 for every a in F ∗ ). Note also that the
bound is sharp: if L is a nonzero linear form in r variables then N (L) consists
of |F |r−1 points.
Choosing l (< |F|) pairwise different elements aj of F, the
polynomial ∏_j (aj + L) has degree l and exactly l · |F|^{r−1} zeros.
Proof. There exists an a in F such that

g1(x1, . . . , xr−1) := g(x1, . . . , xr−1, a)

is not identically zero, since otherwise xr − a would divide g for all a in F,
contradicting the assumption that the degree of g is strictly less than |F|.
Applying the same argument to the polynomial g1 yields a b in F such that
g2(x1, . . . , xr−2) := g1(x1, . . . , xr−2, b) is not the zero polynomial. Continuing
in this way we finally find a y = (. . . , b, a) in F^r such that g(y) ≠ 0.
We can now proceed as in the previous section. Choose n ≤ |F|^r and a
vector a of length n whose entries ai are pairwise different points in F^r. The
map

Eva,k : F[x1, . . . , xr]<k → F^n,   f ↦ (f(a1), . . . , f(an))

is then injective if k ≤ |F| and (k − 1)|F|^{r−1} < n, since a nonzero f of degree
< k cannot vanish in more than (k − 1)|F|^{r−1} points (by the theorem). The
subscript "< k" denotes the subspace of polynomials in r variables whose degree
is strictly less than k. Note that the dimension of F[x1, . . . , xr]<k equals C(r+k−1, k−1)
(see below). Moreover, if f is nonzero then the vector Eva,k(f) has at most (k −
1)|F|^{r−1} zeros, that means its Hamming weight is ≥ n − (k − 1)|F|^{r−1}. Moreover,
as we saw, there is a polynomial attaining this bound. We can summarize:

Theorem 8.3. Assume 1 ≤ k ≤ |F| and (k − 1)|F|^{r−1} < n. The image of
Eva,k is then an [n, C(r+k−1, r), n − (k − 1)|F|^{r−1}]_{|F|} code.
If one chooses n = q^r and 1 ≤ k ≤ q, where q = |F|, the assumptions of
the theorem are fulfilled. The resulting code is then denoted by RMq(r, k) and
called a Reed-Muller code. By the last theorem we find that RMq(r, k) is a
code of type

[ q^r, C(r+k−1, r), q^r (1 − (k−1)/q) ]_q.
There are infinitely many Reed-Muller codes RMq(r, k) over a given field
with q elements. Let RMq be the set of all pairs (δ(C), R(C)), where C runs
through the Reed-Muller codes over F. In other words,

RMq = { ( 1 − (k−1)/q, C(r+k−1, r)/q^r ) : 1 ≤ k ≤ q, r ≥ 1 }.

It is easy to see that, for a fixed k, the sequence of rates C(r+k−1, k−1)/q^r tends to 0
as r increases. Therefore, the limit points of RMq are the points (1/q, 0), (2/q, 0),
. . . , (1, 0).
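The parameters of the Mariner code mentioned in the sidebar below follow directly from the code type just displayed. A short Python check (variable names are our own):

```python
from math import comb

q, r, k = 2, 5, 2                 # RM_2(5, 2)
n = q ** r                        # length q^r = 32
kdim = comb(r + k - 1, r)         # dimension C(6, 5) = 6
d = q ** (r - 1) * (q - (k - 1))  # q^r (1 - (k-1)/q) = 16, in integer arithmetic
assert (n, kdim, d) == (32, 6, 16)
```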
The Mariner mission

The Mariner mission used the code RM2(5, 2), which is a binary
[32, 6, 16]2 code, and can therefore correct up to 7 errors.
The number of polynomials of degree l in r variables

Theorem 8.4. The dimension of the space F[x1, . . . , xr]≤l of polynomials
of degree ≤ l equals C(r+l, r).

Proof. As basis for the F-vector space F[x1, . . . , xr]≤l one can take
the monomials x1^{i1} x2^{i2} · · · xr^{ir} with i1 + · · · + ir ≤ l. Their number
equals the number of solutions of i1 + · · · + ir ≤ l in non-negative integers,
Figure 2.2: The set RM2 (red), RM16 (green), RM32 (blue) for r =
1, 2, . . . , 10 in the δ, R-plane. The “Mariner” code RM2 (5, 2) is encircled
9 Cyclic codes
Since there is apparently no way to understand, for a given field F, all linear subspaces of F^n from the point of view of coding theory, it is natural to look first
for more distinguished spaces. In whatever sense these spaces might be distinguished, the hope is that they are also interesting in the sense of coding theory.
Distinguished spaces are for example those which are symmetric with respect to
transformations respecting the Hamming weight. Transformations respecting
the Hamming weight are in particular permutations of the places of a word.
So it is natural to study linear codes over a given field F which
are invariant under certain permutations. To be more precise, recall that the
symmetric group Sn of permutations of the set {1, 2, . . . , n} acts on F^n via

(s^{−1}, w1 w2 · · · wn) ↦ w_{s(1)} w_{s(2)} · · · w_{s(n)}.
We call a linear code in F^n cyclic if it is invariant under the subgroup ⟨(1, 2, 3, . . . , n)⟩
generated by the permutation (1, 2, 3, . . . , n), which maps 1 to 2, 2 to 3 etc. and
finally n to 1. In other words, a code C in F^n is cyclic if for all code words
c1 c2 c3 · · · cn in C, the word cn c1 c2 · · · cn−1 is also in C.
There is a useful, more algebraic characterization of cyclic codes. For this
we identify F n with the ring
Rn := F [x]/(xn − 1)
via the isomorphism of F -vector spaces
c0 c1 · · · cn−1 7→ [c0 + c1 x + c2 x2 + · · · + cn−1 xn−1 ]xn −1 .
The notations are as in 2. Note, however, that Rn , for n ≥ 2, is not a field since
xn − 1 is not irreducible. In general, Rn is a ring. More important for us is that
Rn is an F [x]-module via
(f, [g]xn −1 ) 7→ [f · g]xn −1 .
The action of the permutation s = (1, 2, 3, . . . , n) on words in F n corresponds
then to multiplication of residue classes by x. Indeed
x.[c0 + c1 x + c2 x2 + · · · + cn−1 xn−1 ]xn −1
= [c0 x + c1 x2 + c2 x3 + · · · + cn−2 xn−1 + cn−1 ]xn −1
(since xn ≡ 1 mod xn − 1), and the class on the right corresponds to the word
cn−1 c0 c1 · · · cn−2. Therefore a code C (considered as a subset of Rn) is a cyclic
code if and only if it is an F[x]-submodule of Rn.
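The correspondence between the cyclic shift and multiplication by x can be spelled out on a concrete word. A minimal Python sketch (the word is an arbitrary example of length n = 7 over F2):

```python
n, q = 7, 2
w = [1, 0, 1, 1, 0, 0, 0]            # word c0 c1 ... c6, i.e. 1 + x^2 + x^3

# cyclic shift c0 ... c_{n-1}  ->  c_{n-1} c0 ... c_{n-2}
shifted = [w[-1]] + w[:-1]

# multiplication by x in F_2[x]/(x^n - 1): raise each exponent by one mod n
times_x = [0] * n
for i, c in enumerate(w):
    times_x[(i + 1) % n] = (times_x[(i + 1) % n] + c) % q

assert shifted == times_x            # the two operations agree
```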
Every F[x]-submodule of Rn is of the form
Cn (g) := F [x].[g]xn −1
for a normalized divisor g of xn − 1 (see below). We therefore have a one-to-one
correspondence
{cyclic codes of length n over F } ↔ {g ∈ F [x] : g|(xn − 1), g norm.} .
For the cyclic code Cn (g), the polynomial g is called the generator polynomial.
The polynomial

g∗ := (x^n − 1)/g

is called the control polynomial. The reason for the latter naming is that a word
corresponding to the residue class [f ]xn −1 is a codeword in Cn (g) if and only if
g ∗ .[f ]xn −1 = 0 (see Exercise 1).
Theorem 9.1. The dimension of a cyclic code of length n with generator and
control polynomial g and g ∗ equals n − deg(g) = deg(g ∗ ).
Proof. The canonical map defines an exact sequence

0 → Cn(g) → Rn → Rn/Rn.[g] → 0

(where we suppress the subscript x^n − 1). It follows that

dim Cn(g) = dim Rn − dim Rn/Rn.[g].

We know that dim Rn = n. For computing the second dimension we note that
the application f + F[x].g ↦ [f] + Rn.[g] defines an isomorphism

F[x]/F[x].g → Rn/Rn.[g].

The space on the left has dimension deg(g). The theorem is now obvious.
As a basis for the cyclic code C of length n with generator polynomial g
of degree l one can take [g], [xg], [x2 g], ..., [xn−l−1 g]. This remark is useful for
setting up a generator or control matrix for C, and for computing the minimal
distance for C.
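This remark can be turned directly into a computation. The following Python sketch builds the basis [g], [xg], . . . for the generator polynomial g = x³ + x + 1 of length 7 over F2 and finds the minimal distance by enumerating all codewords (a brute-force illustration, not an efficient algorithm):

```python
from itertools import product

n = 7
g = [1, 1, 0, 1]                     # x^3 + x + 1 over F_2, a divisor of x^7 - 1

# rows [x^i g] for i = 0, ..., n - deg(g) - 1 form a basis of C_n(g)
rows = [[0] * i + g + [0] * (n - len(g) - i) for i in range(n - len(g) + 1)]

# brute-force minimal distance over all 2^4 F_2-linear combinations
dists = []
for coeffs in product([0, 1], repeat=len(rows)):
    word = [sum(c * r[j] for c, r in zip(coeffs, rows)) % 2 for j in range(n)]
    if any(word):
        dists.append(sum(word))
assert len(rows) == 4 and min(dists) == 3   # the [7,4,3] Hamming code
```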
Apparently there is currently neither a closed formula nor a good algorithm
for computing the minimal distance of a general cyclic code. It was announced
that tables of the minimal distances of all binary cyclic codes of length up to
n ≤ 1000 have been computed. Anyway, in the next section we shall study a
certain subclass of cyclic codes for which we can give at least lower bounds for
their minimal distances.
Example 9.2 (The Golay code G23 ). The polynomial x23 − 1 factors over F2
into a product of three irreducible polynomials
x23 − 1 = (x − 1)·(x11 + x9 + x7 + x6 + x5 + x + 1)
·(x11 + x10 + x6 + x5 + x4 + x2 + 1).
Set g := x11 + x10 + x6 + x5 + x4 + x2 + 1. The code C23 (g) has dimension
12. Its minimal distance is 7 (see Exercise 2). In fact, this is the Golay code
introduced in 4.4.
The cyclic codes over a given field F and a given length n form a lattice with
respect to inclusion: the intersection and the sum of two cyclic codes are again cyclic codes.
For two divisors g1 and g2 of x^n − 1 one has F[x].[g1] ⊆ F[x].[g2] if and only if
g1 is a multiple of g2 (see Theorem). The maximal non-trivial codes correspond
therefore to the irreducible factors of x^n − 1.
Figure 2.3: Lattice of binary cyclic codes of length 7. The divisors of
x7 − 1 are 1, x + 1, x3 + x + 1, x3 + x2 + 1, x4 + x2 + x + 1, x4 + x3 + x2 +
1, x6 + x5 + x4 + x3 + x2 + x + 1, x7 + 1
The number of cyclic codes over F of length n equals the number σF (xn − 1)
of normalized divisors of xn − 1. If n is relatively prime to |F |, then xn − 1
and its derivative nxn−1 are relatively prime, and hence xn − 1 has no multiple
irreducible factor (see Addon 9). In this case σF (xn −1) = 2N , where N denotes
the number of irreducible normalized polynomials dividing xn − 1. The number
N can be easily computed.
Theorem 9.3. Let F be a field with q elements. If n and q are relatively prime,
the number of cyclic codes over F of length n equals 2^N, where

N = ∑_{l|n} φ(l)/ord_l(q).

Here φ is Euler's phi-function, and ord_l(q), for any l, denotes the smallest
positive integer f such that q^f ≡ 1 mod l.
Proof. We use some arguments from the Galois theory of finite fields without further
explanation. The less experienced reader might wish to skip this proof.

Let d be the smallest positive integer such that q^d ≡ 1 mod n, and let Q = q^d.
We can then assume that FQ contains F. Moreover, the polynomial x^n − 1 has
n different roots in FQ. The latter is true since n divides the order Q − 1 of the
cyclic group F∗Q, which therefore possesses exactly n solutions of the equation
a^n = 1. Let S be the set of these solutions. The Galois group G of FQ over F is
cyclic of order d, generated by φ : a ↦ a^q. The normalized prime polynomials
in F[x] dividing x^n − 1 are in one-to-one correspondence with the G-orbits of S
(the orbit O corresponding to the polynomial ∏_{a∈O} (x − a)). The number of
(normalized) divisors in F[x] of x^n − 1 equals therefore

2^{#G\S}.

For computing the Galois orbits of S, decompose S as S = ⋃_{l|n} S(l), where
S(l) denotes the set of elements of order l in S. The number of elements in S(l) equals
φ(l). Clearly every S(l) is invariant under G. The stabilizer of an element a
in S(l) is generated by φ^f, where f = ord_l(q). The orbit of any element in S(l) consists
therefore of f elements, and S(l) decomposes into φ(l)/f orbits. The theorem
is now obvious.
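The formula of Theorem 9.3 is easy to evaluate. A Python sketch (pure standard library; the helper names are ours) that reproduces the count 2³ = 8 of binary cyclic codes of length 7 shown in Figure 2.3:

```python
from math import gcd

def phi(m):                       # Euler's phi-function, by brute force
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)

def ord_mod(q, l):                # smallest f >= 1 with q^f = 1 mod l
    if l == 1:
        return 1
    f, x = 1, q % l
    while x != 1:
        x = (x * q) % l
        f += 1
    return f

def num_cyclic_codes(n, q):       # 2^N with N = sum over l | n of phi(l)/ord_l(q)
    N = sum(phi(l) // ord_mod(q, l) for l in range(1, n + 1) if n % l == 0)
    return 2 ** N

# n = 7, q = 2: x^7 - 1 has 3 irreducible factors, hence 8 cyclic codes
assert num_cyclic_codes(7, 2) == 8
```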
??? to do ???
Table 2.1: Base 2 logarithm of the number of cyclic codes of odd length n =
32i + j over the field with two elements in the ith row and jth column
If n and |F| are not relatively prime, then x^n − 1 contains multiple irreducible
factors (see Addon 9). For example over F2 and for a 2-power n we have

x^n − 1 = (x + 1)^n

(as one sees by applying successively y^2 − 1 = (y + 1)^2). In this case the lattice
of cyclic codes of length n is a totally ordered set of length n + 1.
Modules and Ideals
A module M over a ring R (or simply R-module M ) is an (additively
written) abelian group M and an
“action of the ring R on M ”. By
the latter we mean a map
R × M → M,
(a, m) 7→ a.m
which “commutes with all kind of
operations”, i.e. which satisfies (a +
b).m = a.m + b.m, (a · b).m =
a.(b.m), 0R .m = 0M , 1R .m = m,
a.(m + m0 ) = a.m + a.m0 for all
a, b ∈ R, m, m0 ∈ M .
An R-submodule S of M is a subgroup of M such that R.S(= {a.s :
a ∈ R, s ∈ S}) ⊆ S. It is clear that
the restriction of the action map in
R × M to R × S defines then the
structure of a module over R on S.
If S is a submodule of M we can define the quotient module M/S. This
is the quotient group M/S in the
sense of group theory (i.e. the set
of cosets m + S := {m + s : s ∈ S},
where m runs through M , equipped
with the addition (m + S) + (m0 +
S) := (m + m0 ) + S) together with
the action defined by a.(m + S) :=
a.m + S. It is easy to check that
M/S becomes in this way indeed an
R-module.
We may consider R itself as an
R-module by taking as action the
multiplication map (a, b) ↦ a · b in
the ring R. The R-submodules of
R are then called ideals of R. An
ideal I is therefore a subgroup of R
such that R · I ⊆ I. The quotient
module R/I is then not only a module over R, but even more a ring
by itself if we take as multiplication
the map (a + I, b + I) ↦ a · b + I
(one has to verify, of course, that
this map is well-defined, i.e. does not
depend on the choice of representatives a and b of the cosets in question). Special ideals are the principal
ideals (a) = Ra = aR.
Taking the quotient of modules and,
in particular, the quotient of rings, is
one of the most basic constructions
in mathematics. We saw it already
in Section 1.4 (the ring Z/mZ of residue classes modulo
m) and Section 1.2 (finite fields).
The set of real numbers itself is nothing else but R = C/N, where C is
the ring of all Cauchy sequences in
Q (with element-wise addition and
multiplication), and where N is the
ideal of sequences in Q which converge to 0. The notation π =
3.14159265359 . . . is nothing else but a
shorthand for the coset

π = (3, 31/10, 314/100, . . .) + N.
Arithmetic in F [x]
For a field F the ring F [x] has much
in common with the ring of integers Z. This is due to the fact
that both are Euclidean rings. In
an Euclidean ring every nonzero element a has a prime factorization
a = p1 · · · pr , where the sequence of
prime elements {pj }j is unique up
to permutation of the elements and
up to multiplication of the primes pj
by units. A prime element p is a
nonzero element with the property
that a · b is a multiple of p only if either a or b is so. The prime elements
in Z are the numbers ±p, where p is
a prime number. In F [x] the prime
elements p are the irreducible polynomials, i.e. those polynomials which
cannot be decomposed as product of
two polynomials in F [x] of degree
strictly less than the degree of p. A
unit of a ring R is an element u such
that there exists also an element v
such that u · v = 1. The units form a
group with respect to the ring multiplication. The units of F [x] are the
nonzero elements of F . The units of
Z are the integers ±1.
The prime elements in C[x] are (up
to multiplication by units) exactly
the polynomials of the form x − a,
where a runs through C. In fact, by
the Fundamental Theorem of Algebra every complex polynomial can
be written (up to a unit) as a product
of polynomials x − a, where a runs
through the zeros of f (taking multiplicities into account). The prime elements of R[x] are the linear polynomials and the quadratic ones which
have no real roots. The number of
prime elements of F [x] for a finite
field F was discussed in Section ??:
Exercise 5.
A question which occurs sometimes
is: when does a polynomial have a
multiple prime factor? The answer
is given by the following theorem.

Theorem 9.4. A polynomial f in
F[x] is divisible by the square of a
prime polynomial r if and only if f
and f′ have a common prime factor.

Proof. If f = a_n x^n + a_{n−1} x^{n−1} + · · · ,
the derivative f′ of f is defined as
f′ = n a_n x^{n−1} + (n − 1) a_{n−1} x^{n−2} + · · · .
The key identity for proving the
theorem is the product rule

(rg)′ = r′g + rg′.

Moreover, an irreducible r does not
divide r′. Otherwise r′ = 0 (since,
for r′ ≠ 0, one has deg(r) > deg(r′)).
But r′ = 0 is only possible if F is a finite field, say F = F_{p^n}, and r(x) =
h(x^p) for a suitable polynomial h.
But then h(x^p) = h(x)^p (which follows from p | C(p, k) for 1 ≤ k ≤ p − 1),
contradicting the fact that r is irreducible.

Assume now that a prime polynomial r
divides f, say f = rg. Then the
product rule implies that r | f′ if and
only if r | r′g. But r | r′g is equivalent
to r | g (since r does not divide r′).
The theorem now follows.

A particularly interesting situation
can occur over finite fields, as we saw
in the preceding proof. Here it can
happen that the derivative of a polynomial g is the zero polynomial. But
then g(x) = h(x^p) = h(x)^p for a
suitable polynomial h.

Another property of Euclidean rings
R is that every ideal I is principal,
i.e. every ideal is of the form R · a
for some element a in R (which is
uniquely determined by I up to multiplication by a unit). Thus, every
ideal in Z is of the form Zm = mZ =
(m), and therefore there are no other
quotients of Z than Z/mZ, and similarly for F[x].

For an arbitrary ring one defines the
gcd of two ideals I and J as their
sum I + J = {a + b : a ∈ I, b ∈ J}
(which is again an ideal). If we consider the ring of integers and the ideals
Zm and Zn, then Zm + Zn = Zg,
where g is the greatest common divisor of m and n. Similarly, for a
polynomial ring F[x], the gcd of two
ideals (f) and (g) equals the ideal
F[x] · gcd(f, g), where gcd(f, g) is the
normalized polynomial of largest degree dividing f and g, or 0 if f =
g = 0. In particular we find

Theorem 9.5 (Bézout's theorem).
For any given polynomials f and g
over a field F, there exist polynomials h and k in F[x] such that

gcd(f, g) = f h + g k.
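Bézout's theorem is constructive: the extended Euclidean algorithm produces the cofactors h and k. The following Python sketch implements it for coefficient lists over Fp (here p = 2; the representation and helper names are our own) and checks the identity gcd(f, g) = fh + gk on the example f = x⁴ + x² + x, g = x⁷ + 1, whose gcd is x³ + x + 1:

```python
p = 2  # work in F_2[x]; polynomials are coefficient lists, lowest degree first

def trim(f):
    while f and f[-1] == 0: f.pop()
    return f

def polydivmod(f, g):                       # division with remainder in F_p[x]
    f, quot = f[:], [0] * max(len(f) - len(g) + 1, 0)
    inv = pow(g[-1], p - 2, p)              # inverse of leading coeff (p prime)
    while len(f) >= len(g) and any(f):
        d, c = len(f) - len(g), (f[-1] * inv) % p
        quot[d] = c
        for i, gi in enumerate(g):
            f[i + d] = (f[i + d] - c * gi) % p
        trim(f)
    return quot, f

def polymul(f, g):
    r = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            r[i + j] = (r[i + j] + fi * gj) % p
    return trim(r)

def polyadd(f, g):
    r = [((f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)) % p
         for i in range(max(len(f), len(g)))]
    return trim(r)

def ext_gcd(f, g):                          # returns (d, h, k) with d = f*h + g*k
    if not any(g):
        return f, [1], []
    quot, r = polydivmod(f, g)
    d, h, k = ext_gcd(g, r)
    return d, k, polyadd(h, polymul([(p - c) % p for c in quot], k))

f, g = [0, 1, 1, 0, 1], [1, 0, 0, 0, 0, 0, 0, 1]   # x^4+x^2+x and x^7+1
d, h, k = ext_gcd(f, g)
assert d == [1, 1, 0, 1]                            # gcd is x^3 + x + 1
assert polyadd(polymul(f, h), polymul(g, k)) == d   # Bezout identity
```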
Ideals of F[x]/(x^n − 1)

The following theorem is true for any (not necessarily finite) field F.

Theorem 9.6. Every ideal of Rn = F[x]/(x^n − 1) is of the form Rn · [g]_{x^n−1},
where g is a normalized divisor of x^n − 1. For divisors g and h of x^n − 1,
one has Rn · [g]_{x^n−1} ⊆ Rn · [h]_{x^n−1} if and only if g is a multiple of h.

Proof. Let κ be the canonical map f ↦ [f]_{x^n−1}. The application I ↦ κ(I)
defines a bijection between the ideals of F[x] which contain F[x] · (x^n − 1) and
the ideals of Rn. But any ideal I of F[x] is of the form F[x] · f for some f, and
then I contains F[x] · (x^n − 1) if and only if f | (x^n − 1). From this the theorem
is obvious.
Exercises
9.1. Prove that a word corresponding to the residue class [f]_{x^n−1} is a
codeword in Cn(g) if and only if g∗.[f]_{x^n−1} = 0.
9.2. Compute the minimal distance of the binary code
C23 (x11 + x10 + x6 + x5 + x4 + x2 + 1).
9.3. Deduce from Theorem 9.3 a formula for the number of cyclic codes of length
n over a given field for an arbitrary n.
10 Quadratic residue codes
There is a subclass of cyclic codes which is particularly interesting, namely
the quadratic residue codes. They are interesting because the Hamming code
H(7, 4) and the Golay code G23 are members of this class, and because we can
prove a lower bound for the minimal distances of these codes.
We continue with the notations of the previous section. In particular, as before,
we identify F^n, for a given finite field F, with the ring Rn = F[x]/(x^n − 1).
The Hamming weight h(c) of an element c = [f] = [f]_{x^n−1} in Rn equals
then the number of nonzero coefficients of the remainder of f after division by
x^n − 1. The ring Rn can also be considered as a vector space over F (in fact, it
is an F-algebra). In particular, for every polynomial f in F[x] the expression f([x])
is meaningful and equal to [f].
Proposition 10.1. For any c1 , c2 in Rn , one has
h(c1 c2 ) ≤ h(c1 ) h(c2 ).
Proof. Let c1 = [f], c2 = [g], where f and g are polynomials of degree < n. Let d
and e be the weights of c1 and c2. Then f = a1 x^{m1} + a2 x^{m2} + · · · + ad x^{md}
and g = b1 x^{n1} + b2 x^{n2} + · · · + be x^{ne} for suitable ai, bj in F and non-negative
integers mi, nj. Therefore

h([f][g]) = h([ ∑_{i,j} ai bj x^{mi+nj} ]) ≤ ∑_{i,j} h([x^{mi+nj}]) = de.
The group (Z/nZ)∗ of primitive residue classes acts on the ring Rn: the
action of the residue class u on [f] is defined by

[f]^u := f([x]^u).

Here [x]^u is a shorthand for [x]^r = [x^r] with a (any) r from the class u. Since
[x]^n = 1 this does not depend on the choice of r. For the same reason, the
definition of [f]^u does not depend on the choice of f in [f]. The action of
(Z/nZ)∗ is isometric, i.e. for any given u in (Z/nZ)∗ the map [f] ↦ [f]^u defines
an isometry with respect to the Hamming distance. Indeed, the map [f] ↦ [f]^u,
translated back to words in F^n, is nothing else but a permutation of the entries,
by the permutation which maps i to the remainder of ri after division by n (for
any r in u). In particular, the action of (Z/nZ)∗ permutes the cyclic codes of
length n and preserves dimension and minimal distance. In Exercise 2 we study
the action of (Z/nZ)∗ on cyclic codes of length n more closely.
We assume from now on that F = Fp for a prime p, and that l is a prime
number different from p such that p is a square modulo l. Though we do not
need it, we mention that the condition that p be a square modulo l is equivalent
to the statement that l lies in certain residue classes modulo 4p (see Addon 10).
For example, for the important case p = 2, the assumption that 2 is a square
modulo the odd prime l is equivalent to l ≡ ±1 mod 8. The first primes for
which 2 is a square are therefore 7, 17, 23, 31, 41, 47, 71, 73, 79, . . . .
The assumption that p is a square modulo l guarantees the existence of quadratic
residue codes of length l over Fp. We describe the generator polynomials of these
codes. For this let

Q := {1 ≤ j ≤ l − 1 : j is a square mod l},
N := {1 ≤ j ≤ l − 1 : j is not a square mod l}.

Set

ρl := c + ∑_{j∈Q} x^j,   ql := gcd(ρl, x^l − 1),

and define similarly νl and nl replacing Q by N. The element c of Fp is chosen
as follows. If p = 2 we set c = (1 + l)/2. If p is odd, we choose c = (1 + s)/2,
where s² = (−1)^{(l−1)/2} l. Note that such an s exists indeed. Namely, by quadratic
reciprocity (see below) and since p is a square mod l, we know that (−1)^{(l−1)/2} l
is a square mod p.
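The construction of ql can be carried out by machine. The following Python sketch (representing F2[x] polynomials as bitmasks, an encoding of our own choosing) computes q7 = gcd(ρ7, x⁷ − 1) over F2 and recovers x³ + x + 1, the generator of the [7, 4, 3] Hamming code mentioned at the beginning of this section:

```python
# F_2[x] polynomials as bitmasks: bit i is the coefficient of x^i
def deg(f): return f.bit_length() - 1

def mod(f, g):                    # remainder of f after division by g in F_2[x]
    while f and deg(f) >= deg(g):
        f ^= g << (deg(f) - deg(g))
    return f

def gcd(f, g):
    while g:
        f, g = g, mod(f, g)
    return f

l = 7                                           # 2 is a square mod 7 (7 = -1 mod 8)
Q = sorted({(j * j) % l for j in range(1, l)})  # squares mod 7: 1, 2, 4
c = (1 + l) // 2 % 2                            # c = (1+l)/2 as an element of F_2
rho = c | sum(1 << j for j in Q)                # rho_7 = x + x^2 + x^4
q7 = gcd(rho, (1 << l) | 1)                     # gcd(rho_7, x^7 - 1) over F_2
assert q7 == 0b1011                             # x^3 + x + 1: the Hamming code
```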
The quadratic residue codes of length l over Fp are the Rl-ideals generated
by [ql] and [nl], respectively. We use QRl for the quadratic residue code with
generator polynomial ql.

The two quadratic residue codes are mapped to each other by the action of a
non-square in F∗l, as we shall see in the subsequent proposition. Therefore, from
the point of view of coding theory it is a priori not important which one we
consider. The reader might have noticed that, for p ≠ 2, the constant c = (1 + s)/2
was not unambiguously defined, since instead of the square root s of (−1)^{(l−1)/2} l
we could also take −s. However, it does not matter which root we choose, since
another choice would simply exchange [ql] and [nl] (see Exercise 2).
Proposition 10.2. Let p^h ≡ 1 mod l. There exists an l-th root of unity ζ ≠ 1
in F_{p^h} and a quadratic non-residue u modulo l such that

ql = ∏_{j=1}^{(l−1)/2} (x − ζ^{j²}),   nl = ∏_{j=1}^{(l−1)/2} (x − ζ^{uj²}).
We leave the proof as an exercise to the reader (Exercise 3). As an immediate
consequence we obtain

Lemma 10.3. 1. For a in F∗l, one has ([ql])^a = ([ql]) if a is a square in Fl,
and one has ([ql])^a = ([nl]) otherwise.

2. One has ql nl = x^{l−1} + x^{l−2} + · · · + 1.

Proof. 2. is clear from the proposition, since ζ^j (1 ≤ j ≤ l − 1) runs through
all roots of

ql nl = x^{l−1} + x^{l−2} + · · · + 1 = (x^l − 1)/(x − 1).

For 1. note that ([ql]) consists of all [f] where f vanishes at ζ^{j²} for all
1 ≤ j ≤ (l−1)/2, and that ([nl]) consists of all [f] where f vanishes at ζ^{uj²} for
all 1 ≤ j ≤ (l−1)/2. If the integer r represents a, then [ql]^a = [ql(x^r)], and ql(x^r)
vanishes at ζ^{rj²} (1 ≤ j ≤ (l−1)/2). Thus ([ql])^a ⊆ ([ql]) if a is a square in Fl,
and ([ql])^a ⊆ ([nl]) otherwise. Since

dim ([ql])^a = dim ([ql]) = dim ([nl]) = (l + 1)/2

(see 9.1) the claim is now obvious.
We come to the main result of this section.

Theorem 10.4 (Square root bound). The quadratic residue codes over Fp are
[l, (l+1)/2, d]p-codes, where

d ≥ √l.

Proof. It is clear that the quadratic residue codes have length l and dimension (l+1)/2,
where the latter follows from Theorem 9.1.

It suffices to prove the bound for d for the code QRl, since both quadratic
residue codes are mapped to each other by an isometry. Let [f] be a nonzero element
of QRl, and let a be a non-square in Fl. The element [f]^a is in ([ql])^a = ([nl]).
Therefore [f] · ([f]^a) is in ([ql]) ([nl]) = Rl · [x^{l−1} + x^{l−2} + · · · + 1] (where the latter
identity follows from Lemma 10.3). But the latter code is one-dimensional, as
we saw in Theorem 9.1. It follows that [f] · ([f]^a) = t · [x^{l−1} + x^{l−2} + · · · + 1] for a
suitable t in F. But then

h([f] · ([f]^a)) = l,

provided t ≠ 0. But t = 0 is easily seen to be equivalent to f(1) = 0. It is a
rather non-obvious fact that indeed f(1) ≠ 0 for an element [f] of minimal
weight in QRl, as we shall see in the next theorem.

On the other hand, the preceding proposition implies

h([f] · ([f]^a)) ≤ h([f])².

The theorem is now obvious.
Supplement (to Theorem 10.4). If l ≡ −1 mod 4 then the minimal distance d
of the quadratic residue codes of length l satisfies

d ≥ 1/2 + √(l − 1).

Proof. If l ≡ −1 mod 4, then −1 is a quadratic non-residue modulo l (see
Quadratic reciprocity below). In this case we can take in the preceding proof
a = −1, and then [f]^a = f([x]^{−1}). But

h([f] · f([x]^{−1})) ≤ h([f])² − h([f]) + 1

(see Exercise 1). From this we deduce as in the preceding proof that the minimal
distance d of QRl satisfies d² − d + 1 ≥ l, and therefore the claimed lower
bound.
The first binary quadratic residue codes

For a quadratic residue code of length l over F2 one can prove that
its minimal distance d is odd, and, even more, that d ≡ 3 mod 4 if l ≡
−1 mod 8. The following table lists for the first binary quadratic residue
codes of length l the square root bound (SRB), the improved
square root bound of the supplement (ISRB) if it is better, the corrected
improved square root bound (CISRB) if the ISRB was even,
and the double-corrected improved square root bound (DCISRB) if l is
−1 modulo 8 and the CISRB is congruent 1 modulo 4, and the true minimal
distance d.
46
CHAPTER 2. INFINITE FAMILIES OF LINEAR CODES
The parity extension C̄ of a code C over a field F is obtained by appending to
each codeword from C the sum of the entries of the codeword. As we shall see
in the proof of the next theorem, the parity extensions of the quadratic residue
codes of length l have very interesting properties. The proof points in fact to
much more to discover.

Theorem 10.5. Let QR1l be the subcode of QRl consisting of all [f] in QRl with
f(1) = 0, and let d be the minimal weight of QRl. Then the minimal weight of
QR1l equals d + 1.
Proof. For the proof we consider the Fp-vector space F_p^{P¹(F_l)} of all maps from P¹(F_l) into Fp. As a basis we can take the maps e_P (P ∈ P¹(F_l)), where e_P(Q) equals 1 if P = Q and 0 otherwise. We identify a code C of length l over Fp with the subspace
{ Σ_{j=0}^{l−1} c_j e_{[j̃:1]} : c_0 c_1 ... c_{l−1} ∈ C }
(where j̃ denotes the residue class of j in F_l). The parity extension of C is then the code
C̄ = { c + e_{[1:0]} Σ_{j=0}^{l−1} c_j : c ∈ C }.
Using these identifications, QR1l equals the subcode of all c in QRl such that c̄([1 : 0]) = 0.
We use without proof the Theorem of Gleason and Prange, which states that the natural action of SL(2, F_l) on F_p^{P¹(F_l)} leaves the parity extension Q̄Rl invariant. The natural action is defined as the linear continuation of
(a b; c d) : e_{[x:y]} ↦ e_{[ax+by : cx+dy]}.
It is easy to see that SL(2, F_l) acts transitively on P¹(F_l).
Now let c be a codeword of minimal Hamming weight, say e, in QR1l. Transforming c with a suitable A in SL(2, F_l) into a c′ with c′(∞) ≠ 0, and then deleting the last entry, we find a codeword in QRl, whose entries sum up to something non-zero, of Hamming weight e − 1. Therefore, e ≥ d + 1. Vice versa, if we take a codeword c of weight d in QRl, then its parity extension c̄ must have weight d + 1 (since otherwise c would be in QR1l, which only contains codewords of weight ≥ d + 1), and then permuting again we find a codeword in QR1l of weight d + 1.
The ternary Golay code
The quadratic residue code of length 11 over F3 is known as the ternary Golay code. The generator polynomial is
g = x^5 + 2x^3 + x^2 + 2x + 2.
As basis one might take the residue classes [g · x^j] modulo x^11 − 1 for j = 0, 1, ..., 5. The Hamming weight of the basis elements is 5. If one searches through all 3^6 = 729 codewords one sees that 5 is also the minimal distance of this code. It is therefore an [11, 6, 5]3-code.
The ternary Golay code possesses a
number of extraordinary properties.
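The search through all 729 codewords mentioned above takes a fraction of a second; a sketch of it in plain Python:

```python
from itertools import product

# generator polynomial g = x^5 + 2x^3 + x^2 + 2x + 2, coefficients low degree first
g = [2, 2, 1, 2, 0, 1]
n, q, k = 11, 3, 6
codewords = []
for f in product(range(q), repeat=k):   # all message polynomials f of degree < 6
    c = [0] * n
    for i, fi in enumerate(f):          # c = f*g; deg(f*g) <= 10 < 11, no reduction needed
        for j, gj in enumerate(g):
            c[i + j] = (c[i + j] + fi * gj) % q
    codewords.append(c)
d = min(sum(1 for x in c if x) for c in codewords if any(c))
print(len(codewords), d)  # 729 5
```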
10. QUADRATIC RESIDUE CODES
Quadratic reciprocity
Let l be an odd prime number. We use (F*_l)^2 for the subgroup consisting of the squares a^2 of the elements of F*_l. The kernel of the squaring map sq : a ↦ a^2 consists of {±1}. Accordingly,
#(F*_l)^2 = [F*_l : {±1}] = (l − 1)/2.
The group F*_l/(F*_l)^2 has in particular two elements, and hence it possesses exactly one non-trivial character (i.e. homomorphism) into the group {±1}. This character is usually considered as a character on F_l and denoted by (·/l). Thus
(a/l) = +1 if a is a square in F*_l, and −1 otherwise.
One usually sets in addition (0/l) = 0, and if x is an integer, one uses (x/l) for (x·1_{F_l}/l).
If −1 is a square, say −1 = a^2, then (−1)^{(l−1)/2} = +1 since by Fermat's theorem a^{l−1} = 1. Thus if −1 is a square, then l ≡ 1 mod 4. Vice versa, if l − 1 is divisible by 4, we have
(1 · 2 ··· (l−1)/2)^2 ≡ −1 mod l,
as follows from Wilson's theorem (which states that, for any odd prime, (l − 1)! ≡ −1 mod l) and using that
(l − 1)! = 1 · 2 ··· (l−1)/2 · (l+1)/2 · (l+3)/2 ··· (l − 1) ≡ (−1)^{(l−1)/2} (1 · 2 ··· (l−1)/2)^2 mod l.
One can summarize this in the form that (−1/l) = +1 if and only if l ≡ 1 mod 4. In this way we see that a property of a number modulo l is expressed as a property of l modulo this number. This remarkable fact is valid in much more generality and is called the Quadratic Reciprocity Law.

Theorem 10.6 (Quadratic reciprocity). For any odd prime numbers l and p one has
(l/p) · (p/l) = −1 if l and p are congruent −1 modulo 4, and +1 otherwise.
Moreover, (2/l) = +1 if and only if l ≡ ±1 mod 8.
Exercises
10.1. Prove that for any polynomial f in F[x], one has
h([f] · f([x]^{−1})) ≤ h([f])^2 − h([f]) + 1.
10.2. Assume that n and |F | are relatively prime. Prove that, for every two
non-zero minimal cyclic codes C and C 0 of length n over F which have the same
dimension, there exists a u in F∗l such that C 0 = C u .
10.3. In the notations of this section let l ≠ 2 and define
ρ†_l := c† + Σ_{j∈Q} x^j,
where c† = (1 − s)/2. Show that gcd(ρ†_l, x^l − 1) = n_l.
10.4. Prove Proposition 2.
Chapter 3
Symmetry and duality
11
Weight enumerators
Given a code C it is a natural question to ask for the number of codewords of a given Hamming weight. As is customary in combinatorics we collect this information in a generating function. More precisely, we define the weight enumerator of C by
wC(x) = Σ_{c∈C} x^{h(c)} ∈ Z[x].
Note that the kth coefficient of the weight enumerator is the number of c in C such that h(c) = k.
We saw already before several examples of codes which are somehow the same since they only differ by a permutation of the places in their codewords. More formally we call two codes C and C′ of the same length n over a field F equivalent if there exists an automorphism α of the F-vector space F^n which preserves the Hamming distance (i.e. satisfies h(α(v), α(w)) = h(v, w) for all v and w in F^n) such that α(C) = C′. It is clear that equivalent codes have the same weight enumerator.
The isometries of F^n with respect to the Hamming distance, i.e. the isomorphisms of the F-vector space F^n preserving the Hamming distance, form a group, which can be explicitly described. Obvious isometries are "permutation of the places of a word" and "multiplication of a word by a nonzero scalar from F". In fact, there are no others in the sense that the group of isometries of a given F^n is generated by these isometries (see Exercise 1).
Example 11.1 (Weight enumerators of Hamming and Golay code). If we read
off the Hamming code from the projective plane over F2 as explained in Example
4.2 we find that it consists of 1 codeword of weight 0, 7 codewords of weight 3
(which correspond to the lines of the projective plane), 7 codewords of weight 4
(which correspond to the complements of the lines), and 1 codeword of weight
7 corresponding to the projective plane itself. Therefore
wH(7,4)(x) = 1 + 7x^3 + 7x^4 + x^7.
If we append a parity bit to each codeword of the Hamming code then only the codewords of odd weight become codewords of increased weight in the extended code. Therefore, for the extended code,
wH̄(7,4)(x) = 1 + 14x^4 + x^8.
For the Golay code we find (using e.g. Exercise 3 in Section 9)
wG23(x) = 1 + 253x^7 + 506x^8 + 1288x^11 + 1288x^12 + 506x^15 + 253x^16 + x^23.
And as before we deduce from this for the extended Golay code
wG24(x) = 1 + 759x^8 + 2576x^12 + 759x^16 + x^24.
We see that the weight enumerators of the example are palindromic, i.e. satisfy
x^{deg(w)} w(1/x) = w(x).
This is in fact easy to understand: each of the codes in the example possesses the word 1 consisting only of 1s, and adding 1 defines a bijection between the codewords of weight k and those of weight n − k (where n is the length of the code).
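Weight enumerators of small codes are easy to compute by brute force. The sketch below, in plain Python, uses one standard generator matrix of the [7, 4] Hamming code (an assumed choice; any equivalent code gives the same enumerator) and also checks the palindromic property:

```python
from itertools import product

G = [  # a standard generator matrix of the [7,4] Hamming code (assumed choice)
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(m):  # message m of length 4 -> codeword m*G over F_2
    return [sum(mi * gi for mi, gi in zip(m, col)) % 2 for col in zip(*G)]

weights = [0] * 8   # weights[k] = number of codewords of Hamming weight k
for m in product((0, 1), repeat=4):
    weights[sum(encode(m))] += 1
print(weights)                    # [1, 0, 0, 7, 7, 0, 0, 1]
assert weights == weights[::-1]   # palindromic: the all-ones word is a codeword
```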
Exercises
11.1. Show that the group of isometries of F^n with respect to the Hamming distance equals S ⋉ M, where S is the group of isometries of the form
p_σ : w_1 w_2 ··· w_n ↦ w_{σ(1)} w_{σ(2)} ··· w_{σ(n)}
(σ an element of the symmetric group on n letters), and where M is the group of isometries of the form
m_a : w ↦ aw := (a_1 w_1, a_2 w_2, ..., a_n w_n)
(a = (a_1, a_2, ..., a_n) an element of (F*)^n).
12
MacWilliams' Identity
Weight enumerators and their deeper properties are shadows of a richer theory behind them, namely the theory of Weil representations. We do not have to consider here the full theory of these representations but only a small part, streamlined to applications in coding theory.
For this let F be a finite field with q elements. We consider the polynomial ring over C in q variables x_a indexed by the elements a of F. To a code C over F of length n we associate its generalized weight enumerator
WC = Σ_{c=c_1 c_2 ··· c_n ∈ C} x_{c_1} x_{c_2} ··· x_{c_n} ∈ Z[{x_a}_{a∈F}].
This is the generating function for the following problem: for given non-negative integers k_a (a ∈ F) with Σ_{a∈F} k_a = n, find the number of codewords a({k_a}_{a∈F}) in C which, for each a, have exactly k_a entries equal to a. In other words,
WC = Σ a(k_{a_1}, ..., k_{a_q}) x_{a_1}^{k_{a_1}} ··· x_{a_q}^{k_{a_q}},
where the sum is over all k_{a_1}, ..., k_{a_q} ≥ 0 with k_{a_1} + ··· + k_{a_q} = n, and where a_1, ..., a_q is a fixed enumeration of F. Note also that
WC(x_0//1, x_a//x (a ∈ F*)) = wC
(i.e. replacing x_0 by 1 and all other x_a by x yields the usual weight enumerator). If F = F2 the generalized enumerator WC does not carry more information than wC. Namely, we have
WC(x_0, x_1) = x_0^n wC(x_1/x_0).
For the following we fix a non-trivial group homomorphism
χ : F^+ → C*.
Here F^+ indicates the field F viewed as an abelian group with respect to addition. Note that Σ_{a∈F} χ(ab) equals 0 for b ≠ 0 and equals |F| otherwise, as follows from the fact that χ(ab) = 1 for all a is only possible for b = 0 (see Exercise 2).
We define two linear operators T and S on the subspace H_1 of linear forms in C[{x_a}_{a∈F}]:
T x_a = ψ(a) x_a,
S x_a = (σ(F)/√q) Σ_{b∈F} χ(−ab) x_b,
where
σ(F) = (1/√q) Σ_{a∈F} ψ(a)^{−1}.
The operator T is not needed in this section and for the moment the reader may ignore it and the following definition of ψ. However, it will play an important role in Section 3.4. We use ψ for a map ψ : F → C* such that
1. ψ(a+b)/(ψ(a)ψ(b)) = χ(ab),
2. ψ(a)^2 = χ(a^2).
If |F| = p^k for some odd prime p we can take ψ(a) = χ(a^2)^{(1−p)/2}. In fact, this is the only choice, as we see by taking the (1−p)/2-th power on both sides of 2. (For verifying these two statements you want to use that χ(a)^p = χ(pa) = 1.) However, if p = 2 the proof of the existence of a ψ satisfying 1 and 2 is more subtle and there is in general more than one choice (see the addon to this section for more details).
It is not difficult to show that T and S are invertible as operators on H_1. For T this is obvious. For S one can show that S^2 x_a = σ(F)^2 x_{−a} (see Exercise 1), and one uses that σ(F)^8 = 1. Let G be the subgroup of GL(H_1) generated by the linear operators S and T.
Though, in this section, we shall not make use of it, the following is noteworthy. It can be shown that, for odd q, the application (1 1; 0 1) ↦ T and (0 −1; 1 0) ↦ S can be extended to a homomorphism from SL(2, F) onto G. (This extension is unique since SL(2, F) is generated by (1 1; 0 1) and (0 −1; 1 0).) Thus G is a homomorphic image of a quotient of SL(2, F). If q is a 2-power the situation is slightly more complicated. Here G turns out to be the homomorphic image of a central extension of SL(2, F) of order two.
We extend the action of G on H1 to an action of G on C [{xa }a∈F ] by setting,
for A in G and a polynomial f :
(A.f ) ({xa }) = f ({A.xa }) .
Later we shall be interested in codes whose weight enumerators are invariant
under G. For the moment we confine ourselves to the statement that S leaves
weight enumerators invariant (up to multiplication by a constant).
For this we introduce the notion of dual code. Let C be a linear code over F of length n. The dual code C⊥ of C is the subspace of words w = w_1 w_2 ··· w_n in F^n such that
w · c := Σ_{i=1}^n w_i c_i
equals 0 for all c = c_1 c_2 ··· c_n in C.
Theorem 12.1. For any linear code C over F of length n, one has
S WC = (σ(F)^n |C| / √q^n) W_{C⊥}.
Proof. One has
S WC = WC({ (σ(F)/√q) Σ_{b∈F} χ(−ab) x_b }_{a∈F})
     = (σ(F)^n/√q^n) Σ_{c∈C} Σ_{w∈F^n} χ(−c · w) x_{w_1} x_{w_2} ··· x_{w_n}.
But Σ_{c∈C} χ(−c · w) = 0 unless w ∈ C⊥, when the sum equals |C|. The theorem is now obvious.
As a consequence of the foregoing theorem we obtain a remarkable identity due to Jessie MacWilliams which expresses the (usual) weight enumerator of a dual code in terms of the enumerator of the given code.

Theorem 12.2 (MacWilliams' Identity). For any code of length n over a field F, one has
w_{C⊥}(x) = |C|^{−1} wC((1 − x)/(1 + (q − 1)x)) · (1 + (q − 1)x)^n.
Proof. Indeed, setting x_0 = 1 and x_b = x (b ∈ F*), the sum Σ_{b∈F} χ(−ab) x_b becomes 1 − x for a ≠ 0 and 1 + (q − 1)x for a = 0. We therefore obtain
(S WC)(x_0//1, x_b//x (b ∈ F*)) = (σ(F)^n/√q^n) Σ_{c∈C} (1 + (q − 1)x)^{n−h(c)} (1 − x)^{h(c)}.
Comparing this to the formula of Theorem 12.1 we recognize MacWilliams' Identity.
If F = F2, MacWilliams' Identity assumes an especially attractive form if written in terms of WC(x_0, x_1). Here it becomes
|C⊥|^{−1/2} W_{C⊥}(x_0, x_1) = |C|^{−1/2} WC((x_0, x_1) · (1/√2)(1 1; 1 −1)).
(We used here |C| · |C⊥| = 2^n, where n is the length of C, as we shall prove in the next section.)
Example 12.3 (Weight enumerators of the duals of Hamming and Golay code). The weight enumerators of the Hamming and Golay code were computed in Example 11.1. For the weight enumerators of the dual codes we find by MacWilliams' Identity:
wG23⊥(x) = 2^{−12} (1 + x)^23 wG23((1 − x)/(1 + x)) = 1 + 506x^8 + 1288x^12 + 253x^16,
wH(7,4)⊥(x) = 2^{−4} (1 + x)^7 wH(7,4)((1 − x)/(1 + x)) = 1 + 7x^4.
In particular, we see that the minimal weights increased, but at the price that the information rate dropped below 1/2.
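These evaluations can be reproduced with a few lines of elementary polynomial arithmetic; a sketch in plain Python that expands |C|^{−1} Σ_k A_k (1 − x)^k (1 + x)^{n−k} for the Hamming code, starting from its weight distribution:

```python
def pmul(p, q):  # product of polynomials given as coefficient lists
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def ppow(p, m):  # m-th power of a polynomial
    r = [1]
    for _ in range(m):
        r = pmul(r, p)
    return r

A = [1, 0, 0, 7, 7, 0, 0, 1]   # weight distribution of the [7,4] Hamming code
n = 7
dual = [0] * (n + 1)
for k, Ak in enumerate(A):     # |C|^{-1} * sum_k A_k (1-x)^k (1+x)^{n-k}
    term = pmul(ppow([1, -1], k), ppow([1, 1], n - k))
    dual = [d + Ak * t for d, t in zip(dual, term)]
dual = [d // 16 for d in dual]
print(dual)  # [1, 0, 0, 0, 7, 0, 0, 0], i.e. 1 + 7x^4
```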
The existence of characters of degree two
The purpose of this addon is the proof of the following statement.

Theorem 12.4. Let χ : M × M → C* be a symmetric Z-bilinear non-degenerate map on an (additive) finite abelian group M. Then there exists a map ψ : M → C* such that, for all a, b in M, one has
1. ψ(a+b)/(ψ(a)ψ(b)) = χ(a, b),
2. ψ(a)^2 = χ(a, a).

The term non-degenerate means that the application b ↦ χ(∗, b) defines an isomorphism of groups M → Hom(M, C*). Note that it suffices to verify that b ↦ χ(∗, b) is injective, since by general theorems on the dual groups of abelian groups M and the group of characters Hom(M, C*) on M have the same cardinality (are even isomorphic).
In the text we applied this theorem in the situation where M = F^+ and χ(a, b) equals χ(ab) for a non-trivial character of F^+.
Note that 1. and 2. imply
ψ(na) = ψ(a)^{n^2}
for all integers n, as follows inductively, proceeding like
ψ((n + 1)a) = ψ(na) ψ(a) χ(na, a) = ψ(a)^{n^2} ψ(a) χ(a, a)^n = ψ(a)^{n^2} ψ(a) ψ(a)^{2n} = ψ(a)^{(n+1)^2},
where the first identity is 1., the second the induction hypothesis, and the third is 2.

Proof. For proving the theorem let [M, χ] be the group consisting of all pairs (a, s) (a ∈ M, s ∈ μ_e) with the composition law
(a, s) · (a′, s′) = (a + a′, ss′ χ(a, a′)).
Here e denotes the exponent of M and μ_e the subgroup of eth roots of unity in C*. We leave it to the skeptical reader to verify that this defines indeed a group. Since [M, χ] is finite and abelian we can extend the character (0, s) ↦ s^{−1} on the subgroup {(0, s) : s ∈ μ_e} to a character ψ̂ on all of [M, χ]. Let ψ(a) := ψ̂((a, 1)). It is easy to see that ψ satisfies 1. However, 2. is not necessarily satisfied. But we can modify ψ by multiplying with any character of M, and 1. is still satisfied for the modified ψ. We show that we can modify ψ in this way so as to fulfill 2.
For this consider the map γ on M defined by
γ(a) := ψ(a)^2 / χ(a, a).
From 1. it follows that γ is a character of M. A simple calculation shows that γ is trivial on the subgroup M[2] of elements a of M such that 2a = 0. Namely, we have
ψ(a)^2/χ(a, a) = ψ̂((a, 1))^2/χ(a, a) = ψ̂((2a, χ(a, a)))/χ(a, a) = ψ(2a)/χ(a, a)^2,
which equals 1 if 2a = 0. But by general duality theory for abelian groups the subgroup of characters of M which are trivial on M[2] equals the subgroup of squares of characters on M. Therefore γ = δ^2 for a suitable character δ on M, and, by assumption, δ = χ(∗, c) for some c in M. In other words,
ψ(a)^2/χ(a, a) = χ(a, c)^2
for all a in M, from which we recognize that 2. is fulfilled with ψ replaced by ψ/χ(∗, c). This proves the theorem.
Exercises
12.1. Show that the operator S has finite order and is hence invertible. (You can use without proof that σ(F)^8 = 1.)
12.2. Show that χ(ab) = 1 for all a ∈ F holds only for b = 0. Deduce from this that
Σ_{a∈F} χ(ab) = |F| if b = 0, and = 0 otherwise.
12.3. Compute σ(F ) for |F | equal to a 2-power, 3-power, or 5-power. What do
you expect for an arbitrary F ?
13
Duality
In coding theory we are naturally interested in principles for the construction of codes which allow one to control the information rate and the minimal distance. One of these constructions is taking the dual of a given code.
For explaining this fix a finite field F and recall the scalar product
v · w := v_1 w_1 + ··· + v_n w_n
of two words v = v_1 ··· v_n and w = w_1 ··· w_n in F^n. Note that the scalar product takes values in F. It is non-degenerate, which means that v · w = 0 for all w in F^n is only possible for v = 0. For a (not necessarily linear) code C in F^n we define its dual C⊥ by
C⊥ := {w ∈ F^n : w · c = 0 for all c ∈ C}.
It is clear that C⊥ is a linear code, even if C is not, and that C⊥ equals the dual of the linear space ⟨C⟩ generated by C. Because of this we are mainly interested in studying the duals of linear codes.
Concerning the dimension of the dual of a given linear code we have the
following.
Proposition 13.1. Let C be a linear code of length n over F . Then
dim C + dim C ⊥ = n.
Proof. Let c_1, ..., c_k be a basis of C and consider the map
L : F^n → F^k,  w ↦ (w · c_1, ..., w · c_k) = wM,
where M is the matrix whose columns are the c_i. Note that the rank of M is k (since the c_i are linearly independent), so that L is surjective. One of the main theorems about linear maps in linear algebra states
dim ker L + dim im L = n.
But, as we saw, im L = F^k, and obviously ker L = C⊥. This proves the proposition.
The minimal distance of C⊥ is not simply a function of the minimal distance of C; we also need knowledge about the distribution of weights in C. The relevant formula here is MacWilliams' Identity from the last section:
w_{C⊥}(x) = |C|^{−1} wC((1 − x)/(1 + (q − 1)x)) (1 + (q − 1)x)^n.
It is clear that we can read off from this formula the minimal distance of C⊥. Let us suppose for the moment for simplicity F = F2, i.e. q = 2, and that C is a code of length n and dimension k over F2. Then
d(C⊥) = ord_{x=0} ( wC((1 − x)/(1 + x)) (1 + x)^n − 2^k ) = ord_{x=1} ( wC(x) − 2^{k−n}(1 + x)^n ),
where ord_{x=a} f(x) denotes the vanishing order of the function f(x) at x = a. (The second identity follows on replacing x by (1 − x)/(1 + x).)
The reader should watch out that, for a given linear code, C and C⊥ might have non-zero intersection; it is not at all true in general that C and C⊥ form a direct sum. On the contrary, very often the most interesting codes are those which are self-dual, i.e. those codes such that C = C⊥. A necessary condition for the existence of such a code is of course that its length is even, and then its dimension must be n/2 as follows from the preceding proposition.

Example 13.2 (The extended Hamming code). The extended Hamming code H(7, 4) is an [8, 4, 4]2-code. Its weight enumerator equals 1 + 14x^4 + x^8, from which we see that the weight of each of its codewords is divisible by 4. As we shall see in a moment this implies that C is contained in C⊥. On using the above proposition this implies that C is in fact self-dual.
We call a binary code even if the weights of its codewords are all even, and we call it doubly even if the weights of its codewords are all divisible by 4. We have:

Proposition 13.3. Let C be a binary linear code of length n. Then one has:
1. If C is self-dual, then C is even.
2. If C is doubly even, then C is contained in C⊥.
Note that there are indeed codes which are self-dual but not doubly even. An example is the repetition code C := {00, 11}.

Proof. For 1. note that, for a self-dual code, every codeword satisfies c · c = 0. But c · c equals the number of 1s in c, viewed as an element of F2, which implies that this number is even if c · c = 0.
2. is an immediate consequence of the identity
(1/2)(h(c + c′) − h(c) − h(c′)) · 1_{F2} = c · c′.
Here 1_{F2} is the non-zero element of F2. Note that the number on the left is an integer.
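For the extended Hamming code of the example, both properties can be confirmed by brute force; a sketch in plain Python (the generator matrix of the [7, 4] Hamming code is an assumed standard choice):

```python
from itertools import product

G = [  # a standard generator matrix of the [7,4] Hamming code (assumed choice)
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]
Gext = [row + [sum(row) % 2] for row in G]   # append a parity bit to each basis row
code = {tuple(sum(mi * gi for mi, gi in zip(m, col)) % 2 for col in zip(*Gext))
        for m in product((0, 1), repeat=4)}
assert all(sum(c) % 4 == 0 for c in code)    # doubly even
assert all(sum(x * y for x, y in zip(c, cc)) % 2 == 0
           for c in code for cc in code)     # C is contained in C⊥
# dim C = 4 = 8/2, so by Proposition 13.1 in fact C = C⊥
```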
We complement our theory of cyclic codes of Section 2.3 by the following.

Theorem 13.4. Let C be a cyclic code of length n with generator polynomial g and control polynomial h.
1. Then C⊥ is cyclic with generator and control polynomials h♯ and g♯. (Here, for a polynomial f of degree l, we use f♯ for the reciprocal polynomial, i.e. the polynomial f♯(x) = x^l f(1/x).)
2. The code C⊥ is equivalent to the code generated by h.
Proof. It is obvious that C⊥ is cyclic, and that g♯ and h♯ are both divisors of x^n − 1. For computing the generator polynomial of C⊥ let g = Σ_i g_i x^i and h = Σ_j h_j x^j. Then
x^n − 1 = gh = Σ_l x^l Σ_{i=0}^{l} g_i h_{l−i}.
Comparing the coefficients of x^{n−1} on both sides gives
0 = (g_0, g_1, ..., g_{n−1}) · (h_{n−1}, h_{n−2}, ..., h_0).
But the word on the right represents the coefficients of h♯. It follows that h♯ is in C⊥, and therefore that the code C_1 with generator polynomial h♯ is contained in C⊥. But both codes have the same dimension (as follows from dim C_1 = n − deg h♯ (see Proposition 9.1) and dim C⊥ = n − dim C = n − deg h (see Proposition 1 and again Theorem 1 in Section 2.3)). Clearly h♯ g♯ = x^n − 1 (up to a non-zero constant factor), so that g♯ is the control polynomial of C⊥.
For 2. we use the application f ↦ f♯, which defines an isometric map between C⊥ and the code with generator polynomial h.
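Part 1 can be checked directly for the cyclic [7, 4] Hamming code, taking g = x^3 + x + 1 and h = (x^7 − 1)/g = x^4 + x^2 + x + 1, so h♯ = x^4 + x^3 + x^2 + 1. These concrete polynomials are an assumption of the sketch below (plain Python), not taken from the text:

```python
from itertools import product

def cyclic_code(g, n):
    """Binary cyclic code of length n spanned by the shifts of g (coeffs, low degree first)."""
    k = n - (len(g) - 1)
    code = set()
    for f in product((0, 1), repeat=k):
        c = [0] * n
        for i, fi in enumerate(f):
            for j, gj in enumerate(g):
                c[(i + j) % n] ^= fi & gj    # addition in F_2 is xor
        code.add(tuple(c))
    return code

n = 7
C = cyclic_code([1, 1, 0, 1], n)       # g  = x^3 + x + 1
D = cyclic_code([1, 0, 1, 1, 1], n)    # h♯ = x^4 + x^3 + x^2 + 1
assert all(sum(a * b for a, b in zip(c, d)) % 2 == 0 for c in C for d in D)
assert len(C) * len(D) == 2 ** n       # dimensions add up to n, so D = C⊥
```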
Finally, we also determine the dual codes of the Reed-Solomon codes studied in Section 7. For this we have to introduce some notation. Let a = a_1 ··· a_n be a vector of length n with pairwise different entries a_i from F. We set
g_a(x) := Π_{i=1}^n (x − a_i)^{−1} ∈ F(x),
where F(x) is the field of rational functions in the variable x. Moreover, for any polynomial f, we set
Res_{a_i}(f g_a) := [f(x) g_a(x) · (x − a_i)]_{x=a_i} = f(a_i) Π_{j=1, j≠i}^{n} (a_i − a_j)^{−1}.
Lemma 13.5. For f in F[x]_{<n−1}, one has
Σ_{i=1}^n Res_{a_i}(f g_a) = 0.
We shall postpone the proof for a moment and state our main result about the duals of Reed-Solomon codes.
Theorem 13.6. Let C = RSq(a, k) be a Reed-Solomon code of length n ≥ k.
1. Then C⊥ equals the image of the map
R : F[x]_{<n−k} → F^n,  f ↦ (Res_{a_1}(f g_a), ..., Res_{a_n}(f g_a)).
2. C⊥ is equivalent to the Reed-Solomon code RSq(a, n − k).
Proof. For 1. we show, first of all, that the image of R is contained in C⊥. Indeed, let h be in F[x]_{<k} and f in F[x]_{<n−k}; then hf has degree ≤ n − 2, and from the lemma we obtain
Σ_{i=1}^n h(a_i) Res_{a_i}(f g_a) = Σ_{i=1}^n Res_{a_i}(hf g_a) = 0.
The map R is injective since R(f) = 0 implies f(a_i) = 0 for all a_i, and, since f has degree < n, therefore f = 0. Hence the image of R has dimension n − k, which equals n − dim C (see Theorem 7.1) and is therefore the dimension of C⊥.
2. follows on calculating
R(f) = (f(a_1)α_1, ..., f(a_n)α_n),
where we use
α_i = Π_{j=1, j≠i}^{n} (a_i − a_j)^{−1}.
It follows that C⊥ equals the image of RSq(a, n − k) under the map (w_1, ..., w_n) ↦ (α_1 w_1, ..., α_n w_n), which is an isometry of the Hamming distance.
Proof of Lemma 13.5. For a polynomial f consider the determinant
D_f := det
| 1  a_1  a_1^2  ···  a_1^{n−2}  f(a_1) |
| 1  a_2  a_2^2  ···  a_2^{n−2}  f(a_2) |
| ⋮   ⋮    ⋮           ⋮          ⋮    |
| 1  a_n  a_n^2  ···  a_n^{n−2}  f(a_n) |
If f has degree ≤ n − 2, then the last column is a linear combination of the first n − 1 columns, and hence D_f = 0.
On the other hand, expanding the determinant along the last column (Laplace expansion), we find
D_f = Σ_{r=1}^n (−1)^{n+r} f(a_r) · V_r,
where V_r equals the Vandermonde determinant of the numbers a_1, ..., a_{r−1}, a_{r+1}, ..., a_n. In other words,
V_r = Π_{1≤i<j≤n, i,j≠r} (a_j − a_i).
But we can write
V_r = (−1)^{n−r} Π_{1≤i<j≤n} (a_j − a_i) · Π_{i=1, i≠r}^{n} (a_r − a_i)^{−1}.
It follows that
D_f = ( Π_{1≤i<j≤n} (a_j − a_i) ) Σ_{i=1}^n Res_{a_i}(f g_a),
which implies the lemma.
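Since Σ_i Res_{a_i}(f g_a) = 0 is an algebraic identity, it can be spot-checked over Q with exact rational arithmetic; a minimal sketch in plain Python (the sample points and the polynomial are chosen arbitrarily):

```python
from fractions import Fraction
from math import prod

a = [Fraction(x) for x in (0, 1, 2, 5, 7)]    # n = 5 pairwise different points
f = lambda x: 3 * x**3 - x + 2                # degree 3 <= n - 2

# total = sum of f(a_i) / prod_{j != i} (a_i - a_j)
total = sum(f(ai) / prod(ai - aj for j, aj in enumerate(a) if j != i)
            for i, ai in enumerate(a))
print(total)  # 0
```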
Exercises
13.1. Let C be an [n, k, n+1−k]_q code such that C⊥ is an [n, n−k, k+1]_q-code (i.e. such that C and C⊥ are both MDS codes; see Section 7). Show that
wC(x) = 1 + Σ_{i=0}^{k−1} \binom{n}{i} (q^{k−i} − 1)(1 − x)^i x^{n−i}.
13.2. Deduce from Theorem 13.4. that the parity extension of any quadratic
residue code is self-dual.
Appendix
14
Solutions to selected exercises
1.1
A straightforward approach in the sense of "without much theory" is to use a computer algebra system. We use Sage, write the few lines of code to solve the exercise, and apply it here to {0, 1}^3 instead of {0, 1}^5 to reduce the output (and the computation time).

def min_dist(C):
    # computes the minimal distance of a code C
    pairs = Subsets(C, 2)
    d = infinity
    for w, v in pairs:
        d = min(d, sum(1 for i in range(len(w)) if w[i] != v[i]))
    return d

from itertools import product
W = {w for w in product((0, 1), repeat=3)}
# in fact, repeat=5 for the exercise
S = Subsets(W)
tbl = {}
for C in S:
    p = (min_dist(C), C.cardinality())
    if p in tbl:
        tbl[p].append(C)
    else:
        tbl[p] = [C]
sorted(tbl.keys())
1.2
We need that
3 + 2 · 5 + 3 · 4 + 4 · 0 + 5 · 6 + 6 · 4 + 7 · 1 + 8 · ∗ + 9 · 3 + 10 · 5 = 163 + 8 · ∗
is divisible by 11, which is the case only if ∗ = 3. Looking up the resulting ISBN
10 number 3540641335 in https://de.nicebooks.com reveals the book.
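The computation generalizes to any missing position; a minimal sketch in plain Python (the helper name is made up) that solves the check equation Σ_{i=1}^{10} i · d_i ≡ 0 (mod 11) for one unknown digit:

```python
def solve_missing(digits, pos):
    """Recover the digit at 1-based position pos of an ISBN-10 from the
    check equation sum_i i*d_i ≡ 0 (mod 11); digits[pos-1] is ignored."""
    s = sum(i * d for i, d in enumerate(digits, start=1) if i != pos)
    return (-s) * pow(pos, -1, 11) % 11   # 11 is prime, so pos is invertible mod 11

print(solve_missing([3, 5, 4, 0, 6, 4, 1, 0, 3, 5], 8))  # 3
```

(The modular inverse via `pow(pos, -1, 11)` needs Python 3.8 or later.)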
2.1
The only non-trivial property to check is the triangle inequality
h(v, w) ≤ h(v, t) + h(t, w).
But, indeed, if v and w differ at the ith place, then at least one of the pairs v, t and t, w also differs at that place.
2.2
There are \binom{n}{i} choices for i places among n where a word can differ from a given one, and at each of these i places we have a − 1 choices for a letter different from the letter at the corresponding place of the given word. Thus there are in total \binom{n}{i} (a − 1)^i words of length n having Hamming distance i to a given word. The claimed formula is now immediate.
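The resulting ball-volume formula Σ_{i≤r} \binom{n}{i} (a − 1)^i is easy to confirm by brute force for small parameters; a sketch in plain Python (n = 4, a = 3, r = 2 chosen arbitrarily):

```python
from itertools import product
from math import comb

def ball_size(n, a, r):
    """Number of words within Hamming distance r of a fixed word."""
    return sum(comb(n, i) * (a - 1) ** i for i in range(r + 1))

n, a, r = 4, 3, 2
center = (0,) * n
count = sum(1 for w in product(range(a), repeat=n)
            if sum(x != y for x, y in zip(w, center)) <= r)
assert count == ball_size(n, a, r)
print(count)  # 1 + 4*2 + 6*4 = 33
```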
2.3
It suffices to prove that, for any 0 ≤ i ≤ (a−1)n/a, one has
\binom{n}{i−1} (a − 1)^{i−1} ≤ \binom{n}{i} (a − 1)^i.
This inequality is equivalent to
((n − i + 1)/i)(a − 1) ≥ 1,
which is indeed true since the left hand side equals
−(a − 1) + (n + 1)/(i/(a − 1)) ≥ −(a − 1) + (n + 1)/(n/a) = 1 + a/n.
2.4
We have, for p ≥ 1 − 1/a,
Σ_{i ≤ n(1−1/a)} \binom{n}{i} (a − 1)^i ≤ Σ_{i ≤ np} \binom{n}{i} (a − 1)^i ≤ Σ_{i ≤ n} \binom{n}{i} (a − 1)^i = a^n.
Taking log_a and dividing by n, and using that by Theorem 4 the left hand side tends to 1, whereas the right hand side equals 1, we conclude that the limit of the middle term exists and equals 1.
2.5
There are in total p^2 normalized polynomials of degree 2 in F_p[x]. To find the irreducible ones, we have to suppress those which are of the form f^2 or f · g with normalized polynomials f ≠ g of degree 1, which are p and \binom{p}{2} many, respectively. Thus there remain p^2 − p − \binom{p}{2} = p(p − 1)/2 irreducible polynomials of degree 2.
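The count is easy to confirm by brute force, using that a monic polynomial of degree 2 over a field is irreducible exactly when it has no root; a sketch in plain Python for p = 5:

```python
from itertools import product

def irreducible_quadratic(b, c, p):
    """x^2 + b*x + c is irreducible over F_p iff it has no root in F_p."""
    return all((x * x + b * x + c) % p != 0 for x in range(p))

p = 5
count = sum(1 for b, c in product(range(p), repeat=2) if irreducible_quadratic(b, c, p))
assert count == p * (p - 1) // 2
print(count)  # 10
```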
The number of irreducible polynomials over Fp
We fix a prime power q. Denote by N_q(l) the number of normalized irreducible polynomials of degree l over F_q. It is not hard to prove the following.

Proposition 14.1. N_q(l) = (1/l) Σ_{d|l} μ(l/d) q^d.

Proof. Every normalized polynomial can be factored in a unique way into a product of powers of normalized irreducible polynomials. From this it follows that
Σ_f q^{−s deg(f)} = Π_g (1 − q^{−s deg(g)})^{−1},
where f runs through all normalized and g through all normalized irreducible polynomials in F_q[x]. The number of normalized polynomials of degree n is q^n, so the left hand side of the last formula equals 1/(1 − q^{1−s}). Thus the last formula can be rewritten as
1 − q^{1−s} = Π_{n≥1} (1 − q^{−sn})^{N_q(n)}.
Taking the logarithm on both sides, expanding in powers of q^{−s} and comparing coefficients, we find the claimed formula.

For a prime power l^t we have in particular
N_q(l^t) = (q^{l^t} − q^{l^{t−1}})/l^t.
We conclude that N_q(n) ≥ 1 for all n. We also see that
lim_{l→∞} l N_q(l)/q^l = 1.
In other words, for a given large degree l an arbitrary normalized polynomial of degree l will be irreducible with probability close to 1/l.
3.1
The mean value of h on E_C is
E(h) = Σ_{c∈C, w∈A^n} h(c, w) P_C(c, w) = (1/|C|) Σ_{r≥0} r a(r) (p/(a−1))^r (1 − p)^{n−r},
where a(r) is the number of pairs (c, w) in C × A^n with h(c, w) = r. There are |C| possibilities to pick c, and then, for a given c, there are \binom{n}{r} places which can be changed in (a − 1)^r ways to yield a w with h(c, w) = r. Thus
a(r) = |C| \binom{n}{r} (a − 1)^r,
and inserting this into the last formula for E(h) gives
E(h) = Σ_{r≥0} \binom{n}{r} r p^r (1 − p)^{n−r} = [d/dt (1 − p + e^t p)^n]_{t=0} = np.
Before computing the variance of h we note that
σ^2(h) = E(h^2) − E(h)^2,
as is immediate from the defining formula for the variance by multiplying out |h(e) − E(h)|^2. By a similar computation as before we obtain
E(h^2) = [d^2/dt^2 (1 − p + e^t p)^n]_{t=0} = n(n − 1)p^2 + np,
which implies indeed σ^2(h) = np(1 − p).
3.2
If w = 0, then w is contained in every subspace, and hence the mean value is 1. So assume that w ≠ 0. Then every v ≠ 0, w defines the 2-dimensional subspace ⟨v, w⟩ containing w, and every such subspace occurs exactly two times when v runs through F_2^3 \ {0, w} (since v and v + w define the same subspace). Thus the requested mean value is (2^3 − 2)/g, where g is the number of 2-dimensional subspaces of F_2^3. But this number is the same as the number of 1-dimensional subspaces of F_2^3 (as follows from the fact that the map V ↦ V* defines a bijection between the set of 2-dimensional subspaces of F_2^3 and the 1-dimensional subspaces of the dual space (F_2^3)*, which is isomorphic to F_2^3; here S* is the subspace of linear forms vanishing on S). This number is obviously 2^3 − 1, so that the desired mean value is 6/7.
A more general approach is to count k-dimensional subspaces containing a given subspace S by identifying them with subspaces of V/S of dimension k − s, where s is the dimension of S.
4.1
Solution by Robert Stark:

# SAGEmath
def min_dist(C):
    pairs = Subsets(C, 2)
    d = infinity
    for w, v in pairs:
        d = min(d, sum(1 for i in range(len(w)) if w[i] != v[i]))
    return d

idMat = matrix.identity(12)  # 12x12 identity matrix
adjIcoMat = graphs.IcosahedralGraph().adjacency_matrix()  # adjacency matrix of the icosahedral graph

def compl(x):
    return 0 if x == 1 else 1

adjIcoMat = adjIcoMat.apply_map(compl)  # complement
genMat = idMat.augment(adjIcoMat)  # generator matrix

# generate all codewords
codewords = []
for w in list(span(idMat, GF(2))):
    codewords.append(w * genMat)
for c in codewords:
    c.set_immutable()  # fix for min_dist function
print("Minimal distance of G24:", min_dist(Set(codewords)))
4.2
For k ≤ n, let Reg_{k,n}(F) be the set of k × n matrices over F of full rank. The application
M ↦ (space generated by the rows of M)
defines a surjective map S : Reg_{k,n}(F) → G_k^n(F). Moreover S(M) = S(M′) if and only if there exists a g in GL(k, F) such that M = gM′. Note that gM = M implies g = 1, since by assumption M possesses a submatrix in GL(k, F). Therefore the preimage of a subspace in G_k^n(F) under the map S comprises exactly N := |GL(k, F)| elements. We conclude that
|G_k^n(F)| = |Reg_{k,n}(F)| / |GL(k, F)|.
We have
|Reg_{k,n}(F)| = (q^n − 1)(q^n − q)(q^n − q^2) ··· (q^n − q^{k−1}).
Namely, for the first row of an M in Reg_{k,n}(F) we have q^n − 1 choices (any vector except 0). For the second row we have q^n − q choices (any vector except the ones in the subspace spanned by the first row), for the third q^n − q^2 (any vector except the ones in the subspace spanned by the first two rows), etc. Furthermore
|GL(k, F)| = |Reg_{k,k}(F)| = (q^k − 1)(q^k − q)(q^k − q^2) ··· (q^k − q^{k−1}).
Combining the last three formulas proves the claimed formula.
4.3
Let h_1, ..., h_n denote the columns of the control matrix K. Let s be minimal so that there are columns h_{i_1}, ..., h_{i_s} (i_1 < i_2 < ··· < i_s) which are linearly dependent, i.e. such that there exists a codeword c such that c_i = 0 for all i outside {i_1, ..., i_s}. Note also that all c_{i_j} are nonzero by the minimality of s. Thus c has Hamming weight s, and we conclude d(C) ≤ s.
Vice versa, let c be in C with minimal weight d = d(C). If i_1 < ··· < i_d are the places where c has nonzero entries, then the columns h_{i_1}, ..., h_{i_d} are linearly dependent. Therefore d ≥ s.
4.4
For proving that V, S is indeed an (n, k)-system, we need to show that the columns of the generator matrix span a k-dimensional space. But this is clear since the rank of G is k (since the k rows form a basis of C, and since "row rank = column rank").
Next, we need to show that C equals the set of vectors (φ(P_1), ..., φ(P_n)), where P_i denotes the ith column of G, and where φ runs through the space of linear forms F^{k×1} → F. But application of such a φ equals left multiplication by a fixed vector x in F^k, and then
(φ(P_1), ..., φ(P_n)) = xG.
The claim is now obvious.
5.1
Let \(a\) and \(b\) be two positive integers. We have
\[ \frac{d}{a} = \Big\lceil \frac{d}{a} \Big\rceil - \varepsilon \]
for some \(0 \le \varepsilon < 1\). Set \(m := \big\lceil \lceil d/a \rceil / b \big\rceil\). Since \(\lceil d/a \rceil / b\) is in \(\frac{1}{b}\mathbb{Z}\), we have
\[ \frac{\lceil d/a \rceil}{b} = m - \frac{k}{b} \]
for some integer \(0 \le k < b\). It follows
\[ \frac{d}{ab} = m - \frac{k + \varepsilon}{b}. \]
Since \(k + \varepsilon < b\) we find \(\big\lceil \frac{d}{ab} \big\rceil = m\), which proves the claim.
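The identity \(\lceil \lceil d/a \rceil / b \rceil = \lceil d/(ab) \rceil\) proved above is easy to check exhaustively for small parameters; a minimal sketch in Python using exact integer ceilings:

```python
def iceil(p, q):
    """Ceiling of p/q for positive integers, in exact integer arithmetic."""
    return -(-p // q)

# exhaustive check of the identity for small positive a, b, d
ok = all(iceil(iceil(d, a), b) == iceil(d, a * b)
         for a in range(1, 20)
         for b in range(1, 20)
         for d in range(1, 200))
print(ok)  # True
```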
5.2
Choose a codeword \(c\) of Hamming weight \(d\) and choose \(i\) so that \(c\) has a nonzero entry at the \(i\)th place. Let \(C'\) be the code obtained from \(C\) by deleting the \(i\)th place of each codeword in \(C\). Clearly \(C'\) has length \(n - 1\). Since deleting a place can lower the Hamming weight of a word by at most 1, the code \(C'\) has minimal weight \(\ge d - 1\). In fact we have equality, since deleting the \(i\)th place of \(c\) yields a word of weight \(d - 1\). Finally, the map \(C \to C'\) which deletes the \(i\)th place is injective, since every nonzero codeword in \(C\) is mapped to a word of Hamming weight \(\ge d - 1\), which is hence a nonzero word.
14. SOLUTIONS TO SELECTED EXERCISES
67
5.3
Let \(k\) be the maximal dimension for a code of length \(n\) and minimal distance \(d\) over the given field, and let \(C\) be an \([n, k, d]_q\) code. (Note that codes of length \(n\) and minimal distance \(1 \le d \le n\) exist, e.g. \(\langle (1 1 \dots 1 0 0 \dots 0) \rangle\) with \(d\) many 1s, so that it is in fact justified to talk of a maximal one amongst these.) Then there is no word \(w\) in \(A^n \setminus C\) which has distance \(\ge d\) to all code words, since otherwise, for such a word \(w\), the subspace \(C \oplus \langle w \rangle\) would still have minimal distance \(d\) (as follows using \(h(aw, c) = h(w, a^{-1}c)\) for \(c\) in \(C\) and scalars \(a \ne 0\)). Therefore the balls of radius \(d - 1\) around the code words cover all of \(A^n\). In particular,
\[ q^k \cdot V_q(n, d-1) \ge q^n. \]
Taking the base-q logarithm proves the theorem.
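Numerically, the bound reads \(k \ge n - \log_q V_q(n, d-1)\). A small Python sketch (the function name V is ad hoc) evaluates it for binary codes of length 10 and minimal distance 3:

```python
from math import comb, log

def V(q, n, r):
    """Number of words in a Hamming ball of radius r in A^n, where |A| = q."""
    return sum(comb(n, i) * (q - 1)**i for i in range(r + 1))

q, n, d = 2, 10, 3
ball = V(q, n, d - 1)    # 1 + 10 + 45 = 56 words in a ball of radius 2
print(ball)
print(n - log(ball, q))  # any maximal code satisfies k >= this value
```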
7.1
\(I := \ker(\mathrm{EV})\) is a principal ideal: as the kernel of a linear map it is a sub-vector space of \(F[x]\); for any polynomial in \(I\), any multiple is obviously also in \(I\); and any ideal in \(F[x]\) is a principal ideal. Thus \(I = F[x] \cdot g\) for some monic polynomial \(g\). Since \(g(a) = 0\) for all \(a\) in \(F\), the polynomial \(g\) has at least \(q := |F|\) zeros, and hence its degree is \(\ge q\). But \(x^q - x\) has all elements of \(F\) as zeros (since \(a^{q-1} = 1\) for all \(a\) in the multiplicative group \(F^*\)), is therefore contained in \(I\), hence a multiple of \(g\), whence \(g = x^q - x\). Thus
\[ I = F[x] \cdot (x^q - x). \]
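For a prime field the key fact \(a^{q-1} = 1\) for \(a \ne 0\) (and hence \(a^q = a\) for all \(a\)) is Fermat's little theorem, easy to confirm in Python:

```python
p = 7  # a prime, so F = Z/pZ is a field with q = p elements

# every a in F is a zero of x^q - x
print(all((a**p - a) % p == 0 for a in range(p)))  # True
```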
7.2
As in the previous exercise one shows that the map
\[ \mathrm{Ev}_a : F[x] \to F^n, \quad f \mapsto \big(f(a_1), \dots, f(a_n)\big) \]
has kernel \(F[x] \cdot g\), where \(g = \prod_{j=1}^{n} (x - a_j)\).

Assume \(k > n\). We conclude
\[ \ker(\mathrm{Ev}_{a,k}) = \ker(\mathrm{Ev}_a) \cap F[x]_{<k} = F[x]_{<k-n} \cdot g, \]
in particular \(\dim \ker(\mathrm{Ev}_{a,k}) = k - n\). But then
\[ \dim \mathrm{image}(\mathrm{Ev}_{a,k}) = \dim F[x]_{<k} - \dim \ker(\mathrm{Ev}_{a,k}) = n. \]
It follows that \(\mathrm{RS}_q(a, k)\), for \(k > n\), is an \([n, n, 1]_q\)-code.
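For a prime field one can confirm the conclusion directly: with \(k > n\) the evaluation map is surjective, so the code fills all of \(F^n\). A plain-Python sketch over \(\mathbb F_5\) (the parameters and evaluation points are chosen arbitrarily for this check):

```python
from itertools import product

p, n, k = 5, 3, 4   # F = F_5, n = 3 distinct evaluation points, k > n
a = (0, 1, 2)

def ev(f):
    """Evaluate the polynomial with coefficient tuple f at the points a."""
    return tuple(sum(c * x**j for j, c in enumerate(f)) % p for x in a)

codewords = {ev(f) for f in product(range(p), repeat=k)}
print(len(codewords) == p**n)  # True: the image is all of F^n, an [n, n, 1] code
```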
9.1
If \([f]\) is in \(C_n(g) = F[x] \cdot [g]\), then \([f] = [gh]\) for some polynomial \(h\). But then \(g^*[f] = [g^* g h] = [(x^n - 1)h] = 0\), since \([x^n - 1] = 0\). Vice versa, if \(g^*[f] = 0\), then \(g^* f\) is a multiple of \(x^n - 1\), i.e. \(f\) is a multiple of \(\frac{x^n - 1}{g^*} = g\), and therefore \([f]\) is in \(C_n(g)\).
9.2
A little script in Sage solves the problem.
l = []
g = [1,0,1,0,1,1,1,0,0,0,1,1]
for i in range(12):
    l += i*[0] + g + (11-i)*[0]
A = matrix(GF(2),12,23,l)
G23 = A.row_space(); G23
h = lambda v: sum( 1 for i in range(len(v)) if v[i] != 0)
d = dict( (i,0) for i in range(24))
for v in G23:
    d[h(v)] += 1
d
The result is

{0: 1, 7: 253, 8: 506, 11: 1288, 12: 1288,
 15: 506, 16: 253, 23: 1}

(weights not listed occur 0 times). In particular the minimal distance is 7.
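The same computation can be redone in plain Python with bitmasks (no Sage needed): the \(2^{12}\) codewords are exactly the XOR-combinations of the twelve shifted copies of g.

```python
from collections import Counter

g = [1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1]  # coefficients of the generator polynomial
# row i is x^i * g, i.e. g shifted by i places, encoded as a 23-bit integer
rows = [sum(bit << (i + j) for j, bit in enumerate(g)) for i in range(12)]

weights = Counter()
for m in range(1 << 12):        # every subset of rows gives one codeword
    c = 0
    for i in range(12):
        if m >> i & 1:
            c ^= rows[i]
    weights[bin(c).count("1")] += 1

print(sorted(weights.items()))
# [(0, 1), (7, 253), (8, 506), (11, 1288), (12, 1288), (15, 506), (16, 253), (23, 1)]
```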
9.3
Let \(p\) be the prime dividing \(q = |F|\) and write \(n = p^k m\) with \(m\) not divisible by \(p\). Then we have in \(F[x]\)
\[ x^n - 1 = x^{mp^k} - 1 = (x^m - 1)^{p^k}. \]
As shown in the proof of Theorem 3, the polynomial \(x^m - 1\) factors into \(N\) many pairwise different normalized prime polynomials, where \(N\) is given by Theorem 3. But then the number of normalized divisors of \(x^n - 1\) equals
\[ \sigma_F(x^n - 1) = (p^k + 1)^N = (p^k + 1)^{\sum_{l \mid m} \varphi(l)/\mathrm{ord}_l(q)}. \]
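The number \(N\) from Theorem 3 can be cross-checked in Python: the irreducible factors of \(x^m - 1\) over \(\mathbb F_q\) correspond to the \(q\)-cyclotomic cosets modulo \(m\) (assuming \(\gcd(q, m) = 1\); the function names here are ad hoc).

```python
from math import gcd

def phi(l):
    """Euler's totient, by direct count."""
    return sum(1 for t in range(1, l + 1) if gcd(t, l) == 1)

def ord_mod(q, l):
    """Multiplicative order of q modulo l (requires gcd(q, l) = 1)."""
    if l == 1:
        return 1
    t, x = 1, q % l
    while x != 1:
        x = x * q % l
        t += 1
    return t

def N_formula(q, m):
    return sum(phi(l) // ord_mod(q, l) for l in range(1, m + 1) if m % l == 0)

def N_cosets(q, m):
    """Count the q-cyclotomic cosets {s, sq, sq^2, ...} modulo m."""
    seen, count = set(), 0
    for s in range(m):
        if s not in seen:
            count += 1
            t = s
            while t not in seen:
                seen.add(t)
                t = t * q % m
    return count

print(all(N_formula(q, m) == N_cosets(q, m)
          for q, m in [(2, 7), (2, 23), (3, 8), (5, 12)]))  # True
print(N_formula(2, 23))  # x^23 - 1 has 3 irreducible factors over F_2
```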
11.1
Let \(\{e_i\}\) be the standard basis of \(F^n\) (where \(e_i\) consists of a 1 at the \(i\)th place and zeros at all other places). Let \(\alpha\) be an isometry, i.e. a bijective linear map from \(F^n\) onto itself such that \(h(w) = h(\alpha(w))\) for all \(w\) in \(F^n\). In particular, \(h(\alpha(e_i)) = 1\), which means that \(\alpha(e_i) = a_i e_{i'}\) for a suitable \(a_i\) in \(F^*\) and a suitable \(i'\). Since \(\alpha\) is bijective, the map \(i \mapsto i'\) must be a permutation \(\sigma\). It follows that \(\alpha = p_\sigma \circ m_a\), where \(a = (a_1, \dots, a_n)\). In other words, \(\alpha\) is in \(S \ltimes M\).

That every map from \(S \ltimes M\) is an isometry is obvious. The sign "\(\ltimes\)" indicates that \(S \cap M = \{1\}\) and that \(M\) is normalized by \(S\). This is also obvious.
12.1
Applying \(S\) two times we find
\[ S^2 x_a = \frac{\sigma(F)^2}{q} \sum_{b, c \in F} \chi(-ab - bc)\, x_c. \]
But \(\sum_{b \in F} \chi(-(a + c)b) = 0\) unless \(c = -a\), in which case the sum equals \(q\). Therefore
\[ S^2 x_a = \sigma(F)^2 x_{-a}. \]
It follows that \(S^4 = 1\) if \(\sigma(F)^4 = 1\), and \(S^8 = 1\) otherwise.
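The key step \(\frac{1}{q}\sum_{b} \chi(-(a+c)b) = \delta_{c,-a}\) can be verified numerically: stripped of the factor \(\sigma(F)^2\), the matrix of \(S\) squares to the permutation \(x_a \mapsto x_{-a}\). A sketch for \(F = \mathbb F_5\) with \(\chi(a) = e^{2\pi i a/5}\):

```python
import cmath

p = 5
w = cmath.exp(2j * cmath.pi / p)   # chi(a) = w**a

# matrix of x_a -> p^{-1/2} sum_b chi(-ab) x_b, i.e. S without the sigma(F) factor
M = [[w ** (-(a * b) % p) / p ** 0.5 for a in range(p)] for b in range(p)]

M2 = [[sum(M[i][k] * M[k][j] for k in range(p)) for j in range(p)]
      for i in range(p)]

# M^2 should send x_a to x_{-a}
ok = all(abs(M2[c][a] - (1 if c == (-a) % p else 0)) < 1e-9
         for a in range(p) for c in range(p))
print(ok)  # True
```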
12.2
Note, first of all, that there exists a \(c\) in \(F\) such that \(\chi(c) \ne 1\) (since \(\chi\) is non-trivial). Therefore, if \(b \ne 0\), then \(\chi(ab) \ne 1\) for \(a = c/b\). The sum equals \(|F|\) if \(b = 0\), since then every term equals 1. If \(b \ne 0\) we choose an \(a_0\) such that \(\chi(a_0 b) \ne 1\). Replacing \(a\) by \(a + a_0\) in the summation we see that our sum remains unchanged if multiplied by \(\chi(a_0 b) \ne 1\), which is only possible if it is 0.
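For \(F = \mathbb F_p\) with \(\chi(a) = e^{2\pi i a/p}\) this is easy to see numerically (a sketch; \(p = 7\) is chosen arbitrarily):

```python
import cmath

p = 7

def chi(a):
    """The non-trivial additive character a -> e^{2 pi i a / p} on F_p."""
    return cmath.exp(2j * cmath.pi * a / p)

def char_sum(b):
    return sum(chi(a * b) for a in range(p))

print(abs(char_sum(0) - p) < 1e-9)                        # b = 0: sum equals |F|
print(all(abs(char_sum(b)) < 1e-9 for b in range(1, p)))  # b != 0: sum vanishes
```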
12.3
We note, first of all, that \(\mathrm{tr} : F \to \mathbb{F}_p\) is \(\mathbb{F}_p\)-linear and non-trivial (see the preceding exercise). Therefore it assumes each value in \(\mathbb{F}_p\) exactly \(|F|/p\) times. We can therefore write for \(p = 2\)
\[ \sigma(F) = |F|^{-1/2}\, \frac{|F|}{2} \sum_{x \in \mathbb{F}_2} e_4(-x^2), \]
and similarly for \(p \ne 2\), but with \(e_4(-x^2)\) replaced by \(e_p(x^2/2)\).
For \(p = 2\) the sum on the right hand side consists of only two terms and yields
\[ \sigma(F) = \frac{1 - i}{\sqrt{2}}. \]