CHAPTER 1
An Introduction to Codes
Basic Definitions
The concept of a string is fundamental to the subjects of information and coding theory.
Let A = {a1 , a2 , . . . , an } be a finite nonempty set which we refer to as an alphabet. A string
over A is simply a finite sequence of elements of A. Strings will be denoted by boldface
letters, such as x and y. If x = x1 x2 · · · xk is a string over A, then each xi in x is called an
element of x. The length of a string x, denoted by Len(x), is the number of elements in the
string x.
A code is nothing more than a set of strings over a certain alphabet. Of course, codes
are generally used to encode messages.
Definition 1.1. Let A = {a1 , a2 , . . . , ar } be a set of r elements, which we call a code
alphabet and whose elements are called code symbols. An r-ary code over A is a subset C of
the set of all strings over A. The number r is called the radix of the code. The elements of C
are called codewords and the number of codewords of C is called the size of C. When A = Z2
and A = Z3 , codes over A are referred to as binary codes and ternary codes, respectively.
Definition 1.2. Let S = {s1 , s2 , . . . , sn } be a finite set which we refer to as a source
alphabet. The elements of S are called source symbols and the number of source symbols in
S is called the size of S. Let C be a code. An encoding function is a bijective function
f : S → C. We refer to the ordered pair (C, f ) as an encoding scheme for S.
Definition 1.3. If all the codewords in a code C have the same length, we say that C is
a fixed length code, or block code. Any encoding scheme that uses a fixed length code will be
referred to as a fixed length encoding scheme. If C contains codewords of different lengths,
we say that C is a variable length code. Any encoding scheme that uses a variable length
code will be referred to as a variable length encoding scheme.
Fixed length codes have advantages and disadvantages over variable length codes. One
advantage is that they never require a special symbol to separate the source symbols in
the message being coded. Perhaps the main disadvantage of fixed length codes is that
source symbols that are used frequently have codes as long as source symbols that are used
infrequently. On the other hand, variable length codes, which can encode frequently used
source symbols using shorter codewords, can save a great deal of time and space.
Uniquely Decipherable Codes
Definition 1.4. A code C over an alphabet A is uniquely decipherable if no two different
sequences of codewords in C represent the same string over A. In symbols, if
c1 c2 · · · cm = d1 d2 · · · dn
for ci , dj ∈ C, then m = n and ci = di for all i = 1, . . . , n.
The following theorem, proved by McMillan in 1956, provides some information about
the codeword lengths of a uniquely decipherable code.
Theorem 1.5 (McMillan’s Theorem). Let C = {c1 , c2 , . . . , cn } be a uniquely decipherable
r-ary code and let li = Len(ci ). Then its codeword lengths l1 , l2 , . . . , ln must satisfy
  ∑_{i=1}^{n} 1/r^{li} ≤ 1.
Remark 1.6. Consider the binary code C = {0, 11, 100, 110}. Its codeword lengths
1, 2, 3 and 3 satisfy Kraft’s Inequality, but it is not uniquely decipherable. Hence McMillan’s
Theorem cannot tell us when a particular code is uniquely decipherable, but only when it is
not.
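For a quick numerical check of the inequality, here is a small Python sketch (an illustration added to this transcript, not part of the original text; the function name is ours), using exact rational arithmetic so that no rounding occurs:

    from fractions import Fraction

    def kraft_sum(lengths, r):
        """Sum of 1/r**l over the codeword lengths, as an exact fraction."""
        return sum(Fraction(1, r ** l) for l in lengths)

    # The code of Remark 1.6 has lengths 1, 2, 3, 3 in radix r = 2:
    print(kraft_sum([1, 2, 3, 3], 2))   # prints 1, so the inequality holds,
                                        # yet the code is not uniquely decipherable

As the remark stresses, a sum of at most 1 is only a necessary condition for unique decipherability, never a proof of it.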
Instantaneous Codes
Definition 1.7. A code is said to be instantaneous if each codeword in any string of
codewords can be decoded (reading from left to right) as soon as it is received.
If a code is instantaneous, then it is also uniquely decipherable. However, there exist
codes that are uniquely decipherable but not instantaneous.
Definition 1.8. A code is said to have the prefix property if no codeword is a prefix of
any other codeword, that is, if whenever c = x1 x2 · · · xn is a codeword, then x1 x2 · · · xk is
not a codeword for 1 ≤ k < n.
Given a code C, it is easy to determine whether or not it has the prefix property. It is
only necessary to compare each codeword with all codewords of greater length to see if it is
a prefix. The importance of the prefix property comes from the following proposition.
Proposition 1.9. A code is instantaneous if and only if it has the prefix property.
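The pairwise comparison described above is a few lines of Python (an illustrative sketch; the function name is ours):

    def has_prefix_property(code):
        """True if no codeword is a proper prefix of another codeword."""
        return not any(c != d and d.startswith(c) for c in code for d in code)

    print(has_prefix_property(["0", "10", "110", "111"]))  # True: instantaneous
    print(has_prefix_property(["0", "11", "100", "110"]))  # False: 11 is a prefix of 110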
We now come to a theorem, published by L. G. Kraft in 1949, which gives a simple criterion
to determine whether or not there is an instantaneous code with given codeword lengths.
Theorem 1.10 (Kraft’s Theorem). There exists an instantaneous r-ary code C with
codeword lengths l1 , . . . , ln , if and only if these lengths satisfy Kraft’s inequality,
  ∑_{i=1}^{n} 1/r^{li} ≤ 1.
Remark 1.11. Again we should point out that as in Remark 1.6, Kraft’s Theorem does
not say that any code whose codeword lengths satisfy Kraft’s inequality must be instantaneous. However, we can construct an instantaneous code with these codeword lengths.
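One standard construction (sketched below in Python; an illustration added here, not part of the original text) assigns to each length, taken in increasing order, the r-ary expansion of a running counter, left-shifting the counter whenever the length increases. The resulting code is prefix-free whenever the lengths satisfy Kraft's inequality:

    def instantaneous_code(lengths, r=2):
        """Build an r-ary prefix-free code with the given codeword lengths,
        assuming the lengths satisfy Kraft's inequality."""
        codewords, value, prev = [], 0, 0
        for l in sorted(lengths):
            value *= r ** (l - prev)          # shift the counter to length l
            digits, v = [], value
            for _ in range(l):                # l-digit r-ary expansion of value
                digits.append(str(v % r))
                v //= r
            codewords.append("".join(reversed(digits)))
            value += 1
            prev = l
        return codewords

    print(instantaneous_code([1, 2, 3, 3]))   # ['0', '10', '110', '111']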
Definition 1.12. An instantaneous code C is said to be maximal instantaneous if C is
not contained in any strictly larger instantaneous code.
Corollary 1.13. Let C be an instantaneous r-ary code with codeword lengths l1 , . . . , ln .
Then C is maximal instantaneous if and only if these lengths satisfy
  ∑_{i=1}^{n} 1/r^{li} = 1.
McMillan’s Theorem and Kraft’s Theorem together tell us something interesting about
the relationship between uniquely decipherable codes and instantaneous codes. We have the
following useful result.
Corollary 1.14. If a uniquely decipherable code exists with codeword lengths l1 , . . . , ln ,
then an instantaneous code must also exist with these same codeword lengths.
Our interest in Corollary 1.14 will come later, when we turn to questions related to
codeword lengths. For it tells us that we lose nothing by considering only instantaneous
codes rather than all uniquely decipherable codes.
Exercises
(1) What is the minimum possible length for a binary block code containing n codewords?
(2) How many encoding functions are possible from the source alphabet S = {a, b, c} to the
code C = {00, 01, 11}? List them.
(3) How many r-ary codes are there with maximum codeword length n over an alphabet A?
What is this number for r = 2 and n = 5?
(4) Which of the following codes
    C1 = {0, 01, 011, 0111, 01111, 11111} and C2 = {0, 10, 1101, 1110, 1011, 110110}
    are uniquely decipherable?
(5) Is it possible to construct a uniquely decipherable code over the alphabet {0, 1, . . . , 9}
    with nine codewords of length 1, nine codewords of length 2, ten codewords of length 3
    and ten codewords of length 4?
(6) For a given binary code C = {0, 10, 11}, let N (k) be the total number of sequences of
    codewords that contain exactly k bits. For instance, we have N (3) = 5. Show that in
    this case N (k) = N (k − 1) + 2N (k − 2), for all k ≥ 3.
(7) Suppose that we want an instantaneous binary code that contains the codewords 0, 10
    and 1100. How many additional codewords of length 6 could be added to this code?
(8) Suppose that C is a maximal instantaneous code with maximum codeword length m.
    Show that C must contain at least two codewords of maximum length m.
CHAPTER 2
Noiseless Coding
Optimal Encoding Schemes
In order to achieve unique decipherability, McMillan’s Theorem tells us that we must
allow reasonably long codewords. Unfortunately, this tends to reduce the efficiency of a
code. On the other hand, it is often the case that not all source symbols occur with the same
frequency within a given class of messages. When no errors can occur in the transmission
of data, it makes sense to assign the longer codewords to the less frequently used source
symbols, thereby improving the efficiency of the code.
Definition 2.1. An information source is an ordered pair I = (S, P), where S =
{s1 , . . . , sn } is a source alphabet and P is a probability law that assigns to each source symbol
si of S a probability P(si ). The sequence P(s1 ), . . . , P(sn ) is the probability distribution for
I.
For noiseless coding, the measure of efficiency of an encoding scheme is its average
codeword length.
Definition 2.2. The average codeword length of an encoding scheme (C, f ) for an information source I = (S, P), where S = {s1 , . . . , sn }, is defined by
  ∑_{i=1}^{n} Len(f (si )) P(si ).
We should emphasize the fact that the average codeword length of an encoding scheme
is not the same as the average codeword length of a code, since the former depends also on
the probability distribution.
It is clear that the average codeword length of an encoding scheme is not affected by
the nature of the source symbols themselves. Hence, for the purposes of measuring average
codeword length, we may assume that the codewords are assigned directly to the probabilities. Accordingly, we may speak of an encoding scheme (c1 , . . . , cn ) for the probability
distribution (p1 , . . . , pn ). With this in mind, the average codeword length of an encoding
scheme C = (c1 , . . . , cn ) is
  AveLen(C) = ∑_{i=1}^{n} pi Len(ci ).
Let (C1 , f1 ) and (C2 , f2 ) be two encoding schemes of the information source I such that
the corresponding codes have the same radix. We say that (C1 , f1 ) is more efficient than
(C2 , f2 ), if AveLen(C1 ) < AveLen(C2 ). We should point out that it makes sense to compare
the average codeword lengths of different encoding schemes only when the corresponding
codes have the same radix. For in general the larger the radix, the shorter we can make the
average codeword length.
We will use the notation MinAveLenr (p1 , . . . , pn ) to denote the minimum average codeword length among all r-ary instantaneous encoding schemes for the probability distribution
(p1 , . . . , pn ).
Definition 2.3. An optimal r-ary encoding scheme for a probability distribution (p1 , . . . , pn )
is an r-ary instantaneous encoding scheme (c1 , . . . , cn ) for which
AveLen(c1 , . . . , cn ) = MinAveLenr (p1 , . . . , pn ).
Note that optimal encoding schemes are, by definition, instantaneous. By virtue of Corollary 1.14, this minimum is also the minimum over all uniquely decipherable encoding schemes. Hence, we may
restrict attention to instantaneous codes.
Huffman Encoding
In 1952 D. A. Huffman published a method for constructing optimal encoding schemes.
This method is now known as Huffman encoding.
Since we are dealing with r-ary codes, we may as well assume that the code alphabet is
{1, 2, . . . , r}.
Lemma 2.4. Let P = (p1 , . . . , pn ) be a probability distribution, with p1 ≥ p2 ≥ · · · ≥ pn .
Then there exists an optimal r-ary encoding scheme C = (c1 , . . . , cn ) for P that has exactly
s codewords of maximum length of the form d1, d2, . . . , ds, where s is uniquely determined
by the conditions s ≡ n (mod r − 1) and 2 ≤ s ≤ r.
As a result, for such probability distributions, we have
  MinAveLenr (p1 , . . . , pn ) = MinAveLenr (p1 , . . . , pn−s , q) + q,
where q = ∑_{i=n−s+1}^{n} pi .
With Lemma 2.4 in hand, we can now present Huffman's algorithm.
Theorem 2.5. The following algorithm H produces r-ary optimal encoding schemes C
for probability distributions P:
(1) If P = (p1 , . . . , pn ), where n ≤ r, then let C = (1, . . . , n).
(2) If P = (p1 , . . . , pn ), where n > r, then
(a) Reorder P if necessary so that p1 ≥ p2 ≥ · · · ≥ pn .
(b) Let Q = (p1 , . . . , pn−s , q), where q = ∑_{i=n−s+1}^{n} pi and s is uniquely determined
by the conditions s ≡ n (mod r − 1) and 2 ≤ s ≤ r.
(c) Perform the algorithm H on Q, obtaining an encoding scheme D = (c1 , . . . , cn−s , d).
(d) Let C = (c1 , . . . , cn−s , d1, d2, . . . , ds).
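The following Python sketch follows algorithm H directly (a recursive illustration of the method, not optimized code; we index the code alphabet from 0 rather than 1, and the function name is ours):

    def huffman(probs, r=2):
        """r-ary Huffman codewords for probs, as a dict index -> codeword."""
        items = sorted(enumerate(probs), key=lambda t: -t[1])   # descending
        n = len(items)
        if n <= r:
            return {i: str(k) for k, (i, _) in enumerate(items)}
        s = 2 + (n - 2) % (r - 1)        # s = n (mod r-1) with 2 <= s <= r
        merged = items[n - s:]           # the s least likely probabilities
        reduced = [p for _, p in items[:n - s]] + [sum(p for _, p in merged)]
        sub = huffman(reduced, r)        # scheme for the reduced distribution
        code = {i: sub[k] for k, (i, _) in enumerate(items[:n - s])}
        d = sub[n - s]                   # codeword assigned to the merged symbol
        for k, (i, _) in enumerate(merged):
            code[i] = d + str(k)         # extend d by one code symbol each
        return code

    print(huffman([0.4, 0.2, 0.2, 0.1, 0.1]))
    # {0: '1', 1: '01', 2: '000', 3: '0010', 4: '0011'}, average length 2.2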
Entropy of a Source
Intuitively, the information obtained from a source symbol should have the property that the
less likely a source symbol is to occur, the more information we obtain from an occurrence of
that symbol, and conversely. Because the information obtained from a source symbol is not
a function of the symbol itself, but rather of the symbol’s probability of occurrence p, we use
the notation I(p) to denote the information obtained from a source symbol with probability
of occurrence p.
Definition 2.6. For a source alphabet S, the r-ary information Ir (p) obtained from a
source symbol s ∈ S with probability of occurrence p, is given by
  Ir (p) = logr (1/p).
Ir (p) can be characterized by the fact that it is the only continuous function on (0, 1]
with the property that Ir (pq) = Ir (p) + Ir (q) and Ir (1/r) = 1.
Definition 2.7. Let P = {p1 , . . . , pn } be a probability distribution. The r-ary entropy
of the distribution P is
  Hr (P) = ∑_{i=1}^{n} pi Ir (pi ) = ∑_{i=1}^{n} pi logr (1/pi ).
(When pi = 0 we set pi logr (1/pi ) = 0.) If I = (S, P) is an information source, with probability
distribution P = {p1 , . . . , pn }, then we refer to Hr (I) = Hr (P) as the entropy of the source
I.
The quantity Hr (I) is the average information obtained from a single sample of I. It
seems reasonable to say that sampling from I with equal probability gives an amount of
information equal to one r-ary unit. For instance, if S = {0, 1} with P(0) = 1/2 and
P(1) = 1/2, then it gives us one binary unit of information (or one bit of information). We
mention that many books on information theory restrict attention to binary entropy and use
the notation H(p1 , . . . , pn ) for binary entropy.
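In Python, the entropy of a distribution is a one-line computation (a sketch for illustration; math.log(x, r) is the base-r logarithm):

    from math import log

    def entropy(probs, r=2):
        """H_r(P) = sum of p * log_r(1/p), with the convention 0*log(1/0) = 0."""
        return sum(p * log(1 / p, r) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))   # 1.0: one bit per sample
    print(entropy([0.9, 0.1]))   # about 0.469: a biased source is less informative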
To derive the main properties of entropy, we begin with a lemma which can be easily
derived from the fact that ln x ≤ x − 1 for all x > 0, with equality only when x = 1.
Lemma 2.8. Let P = {p1 , . . . , pn } be a probability distribution and let Q = {q1 , . . . , qn }
have the property that 0 ≤ qi ≤ 1 for all i, and ∑_{i=1}^{n} qi ≤ 1. Then
  ∑_{i=1}^{n} pi logr (1/pi ) ≤ ∑_{i=1}^{n} pi logr (1/qi ).
(We set 0 · logr (1/0) = 0 and p · logr (1/0) = +∞, for p > 0.)
Furthermore, equality holds if and only if pi = qi for all i.
With Lemma 2.8 at our disposal, we can determine the range of the entropy function.
Theorem 2.9. For an information source I = (S, P) of size n (i.e. |S| = n), the entropy
satisfies
0 ≤ Hr (P) ≤ logr n.
Furthermore, Hr (P) = logr n if and only if the source has a uniform distribution (i.e. all
of the source symbols are equally likely to occur), and Hr (P) = 0 if and only if one of the
source symbols has probability 1 of occurring.
Theorem 2.9 confirms the fact that, on the average, the most information is obtained
from sources for which each source symbol is equally likely to occur.
The Noiseless Coding Theorem
As we know, the entropy H(I) of an information source I is the amount of information
contained in the source. Further, since an instantaneous encoding scheme for I captures
the information in the source, it is reasonable to believe that the average codeword length
of such a code must be at least as large as the entropy. In fact, this is what the Noiseless
Coding Theorem says.
Theorem 2.10 (The Noiseless Coding Theorem). For any probability distribution P =
(p1 , . . . , pn ), we have
Hr (p1 , . . . , pn ) ≤ MinAveLenr (p1 , . . . , pn ) < Hr (p1 , . . . , pn ) + 1.
Notice that the condition for equality in Theorem 2.10 is that li = − logr pi , which means
that logr pi is an integer. Since this is not often the case, we cannot often expect equality.
In general, if we choose the integer li to satisfy
  logr (1/pi ) ≤ li < logr (1/pi ) + 1,
for all i, then, by Kraft's Theorem, there is an instantaneous encoding with these codeword
lengths. An encoding scheme constructed by this method is referred to as a Shannon-Fano
encoding scheme. However, this method does not, in general, give the smallest possible
average codeword length.
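A sketch of the length computation in Python (illustrative; beware floating-point edge cases when logr (1/p) is exactly an integer):

    from math import ceil, log

    def shannon_fano_lengths(probs, r=2):
        """Lengths l = ceil(log_r(1/p)); they always satisfy Kraft's inequality."""
        return [ceil(log(1 / p, r)) for p in probs]

    p = [0.4, 0.3, 0.2, 0.1]
    ls = shannon_fano_lengths(p)
    print(ls)                                      # [2, 2, 3, 4]
    print(sum(pi * li for pi, li in zip(p, ls)))   # 2.4, versus 1.9 for Huffman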
The Noiseless Coding Theorem determines MinAveLenr (p1 , . . . , pn ) to within 1 r-ary unit,
but this may still be too much for some purposes. Fortunately, there is a way to improve
upon this, based on the following idea.
Definition 2.11. Let S = {x1 , . . . , xn } with probability distribution P(xi ) = pi , for all
i. The k-th extension of I = (S, P) is I k = (S k , P k ), where S k is the set of all strings of
length k over S and P k is the probability distribution defined for x = x1 x2 · · · xk ∈ S k by
P k (x) = P(x1 ) · · · P(xk ).
The entropy of an extension I k is related to the entropy of I in a very simple way.
It seems intuitively clear that, since we get k times as much information from a string of
length k as from a single symbol, the entropy of I k should be k times the entropy of I. The
following lemma confirms this.
Lemma 2.12. Let I be an information source and let I k be its k-th extension. Then
Hr (I k ) = kHr (I).
Applying the Noiseless Coding Theorem to the extension I k and using Lemma 2.12 gives
the final version of the Noiseless Coding Theorem.
Theorem 2.13. Let P be a probability distribution and let P k be its k-th extension. Then
  Hr (P) ≤ MinAveLenr (S k )/k < Hr (P) + 1/k.
Since each codeword in the k-th extension S k encodes k source symbols from S, the
quantity
  MinAveLenr (S k )/k
is the minimum average codeword length per source symbol of S, taken over all uniquely
decipherable r-ary encodings of S k . Theorem 2.13 says that, by encoding a sufficiently long
extension of I, we may make the minimum average codeword length per source symbol of S
as close to the entropy Hr (P) as desired. The penalty for doing so is that, since |S k | = |S|k ,
the number of codewords required to encode the k-th extension S k grows exceedingly large
as k gets large.
Exercise
(1) Let P = (0.3, 0.1, 0.1, 0.1, 0.1, 0.06, 0.05, 0.05, 0.05, 0.04, 0.03, 0.02). Find the Huffman encodings of P for the given radix r, with r = 2, 3, 4.
(2) Determine possible probability distributions that have (00, 01, 10, 11) and (0, 10, 110, 111)
as binary Huffman encodings.
(3) Determine all possible ternary Huffman encodings of sizes 5 and 6.
(4) Let C be a binary Huffman encoding. Prove that C is maximal instantaneous.
(5) Let C be a binary Huffman encoding for the uniform probability distribution P =
    (1/n, . . . , 1/n) and suppose that Len(ci ) = li for i = 1, . . . , n. Let m = maxi {li }.
    (a) Show that C has the minimum total codeword length ∑_{i=1}^{n} li among all instantaneous
        encodings.
(b) Show that there exist two codewords c and d in C such that Len(c) = Len(d) = m,
and c and d differ only in their last positions.
(c) Show that m − 1 ≤ li ≤ m for i = 1, . . . , n.
    (d) Let n = α2^k , where 1 < α ≤ 2. Let u be the number of codewords of length m − 1
        and let v be the number of codewords of length m. Determine u, v and m in terms
        of α and k.
(e) Find MinAveLen2 (P).
(6) Prove the following properties of entropy.
    (a) Let {p1 , . . . , pn , q1 , . . . , qm } be a probability distribution. If p = p1 + · · · + pn , then
        Hr (p1 , . . . , pn , q1 , . . . , qm ) = Hr (p, 1 − p) + p Hr (p1 /p, . . . , pn /p) + (1 − p) Hr (q1 /(1 − p), . . . , qm /(1 − p)).
(b) Let P = {p1 , . . . , pn } and Q = {q1 , . . . , qn } be two probability distributions. For
0 ≤ t ≤ 1, we have
Hr (tp1 + (1 − t)q1 , . . . , tpn + (1 − t)qn ) ≥ tHr (p1 , . . . , pn ) + (1 − t)Hr (q1 , . . . , qn ).
(c) Let P = {p1 , . . . , pn } be a probability distribution. Suppose that ε is a positive real
number such that p1 − ε > p2 + ε ≥ 0. Thus, {p1 − ε, p2 + ε, p3 , . . . , pn } is also a
probability distribution. Show that
Hr (p1 , . . . , pn ) < Hr (p1 − ε, p2 + ε, p3 , . . . , pn ).
(7) Let S = {0, 1}. In order to guarantee that the average codeword length per source
symbol of S is at most 0.01 greater than the entropy of S, which extension of S should
we encode? How many codewords would we need?
(8) Let I be an information source and let I 2 be its second extension. Is the second extension
    of I 2 equal to the fourth extension of I?
(9) Show that the Noiseless Coding Theorem is best possible by showing that for any ε > 0,
    there is a probability distribution P = {p1 , . . . , pn } for which MinAveLenr (p1 , . . . , pn ) −
    Hr (p1 , . . . , pn ) ≥ 1 − ε.
CHAPTER 3
Noisy Coding
Communications Channels
In the previous chapter, we discussed the question of how to most efficiently encode source
information for transmission over a noiseless channel, where we did not need to be concerned
about correcting errors. Now we are ready to consider the question of how to encode source
data efficiently and, at the same time, minimize the probability of uncorrected errors when
transmitting over a noisy channel.
Definition 3.1. A communications channel consists of a finite input alphabet I =
{x1 , . . . , xs }, a finite output alphabet O = {y1 , . . . , yt }, and a set of forward channel probabilities or transition probabilities Pf (yj | xi ), satisfying ∑_{j=1}^{t} Pf (yj | xi ) = 1, for all i = 1, . . . , s.
Intuitively, we think of Pf (yj | xi ) as the probability that yj is received, given that xi is
sent through the channel. It is important not to confuse the forward channel probability
Pf (yj | xi ) with the so-called backward channel probability Pb (xi | yj ). In the forward probabilities, we assume a certain input symbol was sent. In the backward probabilities, we
assume a certain output symbol is received.
Example 3.2. The noiseless channel, which we discussed in the previous chapter, has the
same input and output alphabet I = O = {x1 , . . . , xs } and channel probabilities
  Pf (xi | xj ) = 1 if i = j, and 0 otherwise.
Example 3.3. A communications channel is called symmetric if it has the same input
and output alphabet I = O = {x1 , . . . , xs } and channel probabilities Pf (xi | xi ) = Pf (xj | xj )
and Pf (xi | xj ) = Pf (xj | xi ), for all i, j = 1, . . . , s. Perhaps the most important memoryless
channel is the binary symmetric channel, which has I = O = {0, 1} and channel probabilities
Pf (1 | 0) = Pf (0 | 1) = p and Pf (0 | 0) = Pf (1 | 1) = 1 − p. Thus, the probability of a symbol
error, also called the crossover probability, is p.
Example 3.4. Another important memoryless channel is the binary erasure channel,
which has input alphabet I = {0, 1}, output alphabet O = {0, ?, 1} and channel probabilities
Pf (1 | 0) = Pf (0 | 1) = q, Pf (? | 0) = Pf (? | 1) = p and Pf (0 | 0) = Pf (1 | 1) = 1 − p − q.
We will deal only with channels that have no memory, in the following sense.
Definition 3.5. A communications channel is said to be memoryless if, for any input
string c = c1 · · · cn and output string d = d1 · · · dn , the probability that d is received, given
that c is sent, is
  Pf (d | c) = ∏_{i=1}^{n} Pf (di | ci ).
We will also refer to the probabilities Pf (d | c) as forward channel probabilities.
We use the term memoryless because the probability that an output symbol di is
received depends only on the current input ci , and not on previous inputs.
Decision Rules
A decision rule for a code C is a partial function f from the set of output strings to the set
of codewords C. The process of applying a decision rule is referred to as decoding. The word
“partial” refers to the fact that f may not be defined for all output strings. The intention
is that, if an output string d is received and f (d) is defined, then the decision rule decides
that f (d) is the codeword that was sent; otherwise it declares a decoding error.
Our goal is to find a decision rule that maximizes the probability of correct decoding.
The probability of correct decoding can be expressed in a variety of ways.
Conditioning on the codeword sent gives
  P(correct decoding) = ∑_{c∈C} ∑_{d∈Bc} Pf (d | c)Pi (c),
where Bc = {d|f (d) = c} and Pi (c) is the probability that c is sent through the channel.
The probabilities {Pi (c)| c ∈ C} form the so-called input distribution for the channel.
Conditioning instead on the string received gives
  P(correct decoding) = ∑_{d} Pb (f (d) | d)Po (d),
where Po (d) is the probability that d is received through the channel and is called the output
distribution for the channel.
The probability of correct decoding can be maximized by choosing the decision rule that
maximizes each of the conditional probabilities Pb (f (d) | d).
Definition 3.6. Any decision rule f for which f (d) has the property that
  Pb (f (d) | d) = max_{c∈C} Pb (c | d),
for every possible received string d, is called an ideal observer.
Proposition 3.7. An ideal observer decision rule maximizes the probability of the correct
decoding of received strings among all decision rules.
We remark that an ideal observer decision rule depends on the input distribution because
  Pb (c | d) = Pf (d | c)Pi (c) / ∑_{c′∈C} Pf (d | c′ )Pi (c′ ).
For the case that the input probability distribution is uniform, i.e. Pi (c) = 1/|C|, we have
  Pb (c | d) = Pf (d | c) / ∑_{c′∈C} Pf (d | c′ ).
Now the denominator on the right is a sum of forward channel probabilities and thus depends
only on the communications channel. Thus, maximizing Pb (c | d) is equivalent to maximizing
Pf (d | c). This leads to the following definition and proposition.
Definition 3.8. Any decision rule f for which f (d) maximizes the forward channel
probabilities, that is, for which
  Pf (d | f (d)) = max_{c∈C} Pf (d | c),
for every possible received string d, is called a maximum likelihood decision rule.
Proposition 3.9. For the uniform input distribution, an ideal observer is the same as
a maximum likelihood decision rule.
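A maximum likelihood decoder for a memoryless channel simply multiplies forward probabilities along the string and takes the best codeword. A minimal Python sketch (the function names and the pf signature are our own, for illustration):

    def ml_decode(received, code, pf):
        """Return the codeword c maximizing the product of pf(out, in)."""
        def likelihood(c):
            prob = 1.0
            for out, inp in zip(received, c):
                prob *= pf(out, inp)
            return prob
        return max(code, key=likelihood)

    # Binary symmetric channel with crossover probability p = 0.01:
    def bsc(out, inp, p=0.01):
        return 1 - p if out == inp else p

    print(ml_decode("0010", ["0000", "1111"], bsc))   # '0000'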
Conditional Entropy and Channel Capacity
In general, knowing the value of the output of a channel will have an effect on our
information about the input. This leads us to make the following definition.
Definition 3.10. Consider a communications channel with the input alphabet I and
the output alphabet O. The r-ary conditional entropy of I, given y ∈ O, is defined by
  Hr (I | y) = ∑_{x∈I} Pb (x | y) logr (1/Pb (x | y)).
The r-ary conditional entropy of I, given O, is the average conditional entropy defined by
  Hr (I | O) = ∑_{y∈O} Hr (I | y)Po (y).
Note that Hr (I | O) measures the amount of information remaining in I, after sampling
O, and so it can be interpreted as the loss of information about I caused by the channel.
Conditional entropy can also be defined for strings.
Definition 3.11. Let C be a code over the input alphabet I and D be the set of
output strings over the output alphabet O. The r-ary conditional entropy of C, given that
d = y1 · · · ym ∈ D, is defined by
  Hr (C | d) = ∑_{c∈C} Pb (c | d) logr (1/Pb (c | d)).
The r-ary conditional entropy of C, given D, is defined by
  Hr (C | D) = ∑_{d∈D} Hr (C | d)Po (d).
The quantity Ir (I, O) = Hr (I) − Hr (I | O) is the amount of information in I minus the
amount of information still in I after knowing O. In other words, Ir (I, O) is the amount of
information about I that gets through the channel.
Definition 3.12. The r-ary mutual information of I and O is defined by
  Ir (I, O) = Hr (I) − Hr (I | O) = ∑_{x∈I} Pi (x) logr (1/Pi (x)) − Hr (I | O).
Notice that the quantity Ir (I, O) depends upon the input distribution of I as well as the
forward channel probabilities Pf (y | x).
We are now ready to define the concept of the capacity of a communications channel.
This concept plays a key role in the main results of information theory.
Definition 3.13. The capacity of a communications channel is the maximum mutual
information Ir (I, O), taken over all input distributions of I.
Proposition 3.14. Consider a symmetric channel with input alphabet and output alphabet I of size r. Then the capacity of this symmetric channel is
  1 − ∑_{y∈I} Pf (y | x) logr (1/Pf (y | x)),
for any x ∈ I. Furthermore, the capacity is achieved by the uniform input distribution.
Corollary 3.15. The capacity of the binary symmetric channel with crossover probability p is
1 + p log2 p + (1 − p) log2 (1 − p).
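Numerically (a small Python check of the formula above; the endpoints p = 0 and p = 1, where the convention 0 · log 0 = 0 applies, are handled separately):

    from math import log2

    def bsc_capacity(p):
        """1 + p*log2(p) + (1-p)*log2(1-p), with capacity 1 at p = 0 or 1."""
        if p in (0.0, 1.0):
            return 1.0
        return 1 + p * log2(p) + (1 - p) * log2(1 - p)

    print(bsc_capacity(0.0))   # 1.0: a noiseless channel
    print(bsc_capacity(0.5))   # 0.0: the output is independent of the input
    print(bsc_capacity(0.1))   # about 0.531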
The Noisy Coding Theorem
It is sometimes said that there are two main results in information theory. One is the
Noiseless Coding Theorem, which we discussed in previous chapter, and the other is the
so-called Noisy Coding Theorem.
Before we can state the Noisy Coding Theorem formally, we need to discuss in detail the
notion of rate of transmission. Let us suppose that the source information is in the form
of strings of length k, over the input alphabet I of size r and that the r-ary block code
C consists of codewords of fixed length n over I. Now, since the channel must transmit n
code symbols in order to send k source symbols, the rate of transmission is R = k/n source
symbols per code symbol. Further, since there are r^k possible source strings, the code must
have size at least r^k in order to accommodate all of these strings. Assuming that |C| = r^k ,
we have k = logr |C| and hence R = logr |C|/n. Thus we have the following.
Definition 3.16. An r-ary block code C of length n and size |C| is called an (n, |C|)-code.
The number
  R(C) = logr |C| / n
is called the rate of C.
Now we can state the Noisy Coding Theorem. Let ⌈x⌉ denote the smallest integer greater
than or equal to x.
Theorem 3.17 (The Noisy Coding Theorem). Consider a memoryless communications
channel with capacity C. For any positive number R < C, there exists a sequence Cn of r-ary
block codes and corresponding decision rules fn with the following properties.
(1) Cn is an (n, ⌈r^{nR}⌉)-code. Thus, Cn has length n and rate at least R.
(2) The probability of decoding error of fn approaches 0 as n → ∞.
Roughly speaking, the Noisy Coding Theorem says that, if we choose any transmission
rate below the capacity of the channel, there exists a code that can transmit at that rate
and yet maintain a probability of decoding error below some predefined limit.
The price we pay for this efficient encoding is that the code length n may be extremely
large. Furthermore, the known proofs of this theorem tell us only that such a code must
exist, but do not show us how to actually find these codes.
Exercise
(1) Consider a channel whose input alphabet is the set of all integers between −n and n and
whose output is the square of the input. Determine the forward channel probabilities
of this channel.
(2) Suppose that codewords from the code {0000, 1111} are being sent over a binary symmetric channel (c.f. Example 3.3) with crossover probability p = 0.01. Use the maximum
likelihood decision rule to decode the received strings 0000, 0010 and 1010.
(3) Let C be a block code consisting of all 8 binary strings of length 3. Denote the input codeword by i1 i2 i3 and the received string by o1 o2 o3 . Let B.S.C. denote a binary symmetric
channel with crossover probability p = 0.001. Consider the following different channels.
(a) The first channel works as follows: send i1 through the B.S.C. to get o1 and no
matter what i2 and i3 are, choose o2 and o3 randomly.
(b) The second channel works as follows: send i1 through the B.S.C. to get o1 , send i2
through the B.S.C. to get o2 and send i3 through the B.S.C. to get o3 .
(c) The third channel works as follows: choose o1 = o2 = o3 to be the majority bit
among i1 , i2 and i3 .
Compute the probability of correct decoding for each of these channels, assuming a
uniform input distribution. Which channel is best?
(4) Show that for a symmetric channel with uniform input distribution, the output distribution is also uniform.
(5) Let I and O be the input alphabet and the output alphabet of a noiseless communications
channel. Show that Hr (I | O) = 0.
(6) Let I and O be the input alphabet and the output alphabet of a communications channel
with forward channel probabilities {Pf (y | x) | x ∈ I, y ∈ O}. Suppose that {Pi (x) | x ∈
I} is the input distribution and {Po (y) | y ∈ O} is the output distribution for the channel.
(a) Show that the backward channel probability for x ∈ I and y ∈ O is
  Pb (x | y) = Pf (y | x)Pi (x) / Po (y).
(b) Show that for an r-ary symmetric channel,
  Ir (I, O) = ∑_{y∈O} Po (y) logr (1/Po (y)) − ∑_{y∈O} Pf (y | x) logr (1/Pf (y | x)),
for any x ∈ I.
(7) Consider the special case of a binary erasure channel (c.f. Example 3.4), which has
input alphabet I = {0, 1}, output alphabet O = {0, ?, 1} and channel probabilities
Pf (1 | 0) = Pf (0 | 1) = 0, Pf (? | 0) = Pf (? | 1) = p and Pf (0 | 0) = Pf (1 | 1) = 1 − p.
Calculate the mutual information I2 (I, O) in terms of the input probability Pi (0) = p0 .
Then determine the capacity of the channel, and an input probability that achieves that
capacity.
CHAPTER 4
General Remarks on Codes
Nearest Neighbor Decoding
In general the problem of finding good codes is a very difficult one. However, by making
certain assumptions about the channel, we can at least give the problem a highly intuitive
flavor. We begin with a definition.
Definition 4.1. Let x = x1 x2 · · · xn and y = y1 y2 · · · yn be strings of the same length n
over the same alphabet A. The Hamming distance d(x, y) between x and y is the number
of positions in which xi ≠ yi .
For instance, if x = 10112 and y = 20110, then d(x, y) = 2. The following result says
that Hamming distance is a metric.
Proposition 4.2. Let An be the set of all strings of length n over the alphabet A. Then
the Hamming distance function d : An × An → N satisfies the following properties. For all
x, y and z in An ,
(1) d(x, y) ≥ 0 and d(x, y) = 0 if and only if x = y;
(2) d(x, y) = d(y, x);
(3) d(x, y) ≤ d(x, z) + d(z, y).
In other words, (An , d) is a metric space.
Suppose that C is a block code of length n over A. The codewords that are closest to a
given received string x are referred to as nearest neighbor codewords. The nearest neighbor
decoding or minimum distance decoding is the decision rule that decodes a received string
as a nearest neighbor codeword. When there is more than one nearest neighbor codeword,
we will refer to this situation as a tie. In some cases, we may wish to choose randomly from
among the candidates. In other cases, it might be more desirable simply to admit a decoding
error. The term complete decoding refers to the case where all received strings are decoded
and the term incomplete decoding refers to the case where we prefer occasionally to simply
admit an error, rather than always decode.
There are many channels for which maximum likelihood decoding takes the intuitive
form of nearest neighbor decoding. For instance, the r-ary symmetric channel with forward
channel probabilities
  Pf (xi | xj ) = 1 − p if i = j, and p/(r − 1) otherwise,
has this property, for p < 1/2.
In implementing nearest neighbor decoding, the following concepts are useful.
Definition 4.3. Let C be a block code with at least two codewords. The minimum
distance of C is defined to be
d(C) = min{d(c, d) | c, d ∈ C, c ≠ d}.
An (n, M, d)-code is a block code of size M , length n and minimum distance d. The
numbers n, M and d are called the parameters of the code.
Since d(c, d) ≥ 1 for c ≠ d, the minimum distance of a code must be at least 1.
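For small codes, both the distance function and the minimum distance can be computed by brute force; a Python sketch (illustrative only, using the code from Example 4.25 below):

    from itertools import combinations

    def hamming(x, y):
        """Number of positions in which the strings x and y differ."""
        return sum(a != b for a, b in zip(x, y))

    def min_distance(code):
        """d(C): the minimum distance over all pairs of distinct codewords."""
        return min(hamming(c, d) for c, d in combinations(code, 2))

    print(hamming("10112", "20110"))                           # 2
    print(min_distance(["00000", "11100", "00111", "11011"]))  # 3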
Perfect Code
Definition 4.4. Let x be a string in An , where |A| = r, and let ρ > 0. The sphere in
An with center x and radius ρ is the set
  Srn (x, ρ) = {y ∈ An | d(x, y) ≤ ρ}.
The volume Vrn (ρ) of the sphere Srn (x, ρ) is the number of elements in Srn (x, ρ).
This volume is independent of the center and is given by
  Vrn (ρ) = ∑_{k=0}^{⌊ρ⌋} \binom{n}{k} (r − 1)^k ,
where ⌊ρ⌋ denotes the greatest integer less than or equal to ρ.
We can determine the minimum distance of a code C by simply increasing the radius t of
the spheres centered at each codeword of C until just before two spheres become “tangent”
(which will happen when d(C) = 2t + 2), or just before two spheres “overlap” (which will
happen when d(C) = 2t + 1).
Definition 4.5. Let C ⊆ An be a code. The packing radius of C is the largest integer ρ
for which the spheres Srn (c, ρ) centered at each codeword c are disjoint. The covering radius
of C is the smallest integer ρ′ for which the spheres Srn (c, ρ′ ) centered at each codeword c
cover An . We will denote the packing radius of C by pr(C) and the covering radius by cr(C).
Proposition 4.6. The packing radius of an (n, M, d)-code C is pr(C) = ⌊(d − 1)/2⌋.
The following concept plays a major role in coding theory.
Definition 4.7. An r-ary (n, M, d)-code C is perfect if pr(C) = cr(C).
In words, a code C ⊆ An is perfect if there exists a number ρ for which the spheres
Srn (c, ρ) centered at each codeword c are disjoint and cover An .
The size of a perfect code is uniquely determined by the length and the minimum distance.
The following result is known as the sphere-packing condition.
Proposition 4.8. Let C be an r-ary (n, M, d)-code. Then C is perfect if and only if
d = 2v + 1 is odd and
  M · Vrn (v) = M · ∑_{k=0}^{v} \binom{n}{k} (r − 1)^k = r^n .
It is important to emphasize that the existence of numbers n, M and d = 2v + 1 for
which the sphere-packing condition holds does not mean that there is a perfect code with
these parameters. The problem of determining all perfect codes has not yet been solved.
However, a great deal is known about perfect codes over alphabets whose size is a power of
a prime.
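The sphere-packing condition itself is easy to test by machine. A Python sketch (for illustration; the two printed parameter sets are those of the binary Hamming (7, 16, 3)-code and the binary Golay (23, 4096, 7)-code, both of which do correspond to genuine perfect codes):

    from math import comb

    def sphere_volume(n, rad, r=2):
        """V_r^n(rad) = sum over k <= rad of binom(n, k) * (r-1)**k."""
        return sum(comb(n, k) * (r - 1) ** k for k in range(rad + 1))

    def sphere_packing_condition(n, M, d, r=2):
        """True if d = 2v + 1 is odd and M * V_r^n(v) = r**n."""
        return d % 2 == 1 and M * sphere_volume(n, (d - 1) // 2, r) == r ** n

    print(sphere_packing_condition(7, 16, 3))      # True
    print(sphere_packing_condition(23, 4096, 7))   # True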
Error Detection and Error Correction
Let u be a positive integer. If u errors occur in the transmission of a codeword, we will
say that an error of size u has occurred. It is possible for so many errors to occur that the
codeword is changed into another codeword, in which case we cannot detect that any error
has occurred at all.
Definition 4.9. A code C is u-error-detecting if, whenever an error of size at least one
and at most u has occurred, the resulting string is not a codeword. A code C is exactly
u-error-detecting if it is u-error-detecting but not (u + 1)-error-detecting.
The next theorem is essentially just a restatement of the definition of u-error-detecting
in terms of minimum distance.
Theorem 4.10. A code C is u-error-detecting if and only if d(C) ≥ u + 1. In particular,
C is exactly u-error-detecting if and only if d(C) = u + 1.
Definition 4.11. Let v be a positive integer. A code C is v-error-correcting if nearest
neighbor decoding is able to correct v or fewer errors, assuming that if a tie occurs in the
decoding process, a decoding error is reported. A code is exactly v-error-correcting if it is
v-error-correcting but not (v + 1)-error-correcting.
It should be kept in mind that, as long as the received word is not a codeword, nearest
neighbor decoding will decode it as some codeword, but the receiver has no way of knowing
whether that codeword is the one that was actually sent. We know only that, under a
v-error-correcting code, if no more than v errors were introduced, then nearest neighbor
decoding will produce the codeword that was sent.
Theorem 4.12. A code is v-error-correcting if and only if d(C) ≥ 2v + 1. In particular,
C is exactly v-error-correcting if and only if d(C) = 2v + 1 or d(C) = 2v + 2.
Corollary 4.13. A code C has d(C) = d if and only if it is exactly ⌊(d − 1)/2⌋-error-correcting.
The following result is a consequence of Proposition 4.6 and Theorem 4.12. It shows the
connection between error correction and pr(C).
Corollary 4.14. Assuming that ties are always reported as errors, a code C is exactly
v-error-correcting if and only if pr(C) = v.
Example 4.15. The r-ary repetition code of length n is
Repr (n) = {00 · · · 0, 11 · · · 1, . . . , (r − 1)(r − 1) · · · (r − 1)},
consisting of r codewords, each of length n. The r-ary repetition code of length n can detect
up to n − 1 errors in transmission, and so it is exactly (n − 1)-error-detecting. Furthermore,
it is exactly ⌊(n − 1)/2⌋-error-correcting.
Suppose that a code C has minimum distance d. If we use C for error detecting only,
it can detect up to d − 1 errors. On the other hand, if we want C to also correct errors
whenever possible, then it can correct up to ⌊(d − 1)/2⌋ errors, but may no longer be able to
detect a situation where more than ⌊(d − 1)/2⌋ but fewer than d errors have occurred. For if
more than ⌊(d − 1)/2⌋ errors are made, nearest neighbor decoding might “correct” the received
word to the wrong codeword and thus the errors will go undetected.
We consider the following strategy: Let v be a positive integer. If a string x is received
and if the closest codeword c to x is at a distance of at most v, and there is only one such
codeword, then decode x as c. If there is more than one codeword at minimum distance to
x or if the closest codeword has distance greater than v, then simply declare an error.
Definition 4.16. A code C is simultaneously v-error-correcting and u-error-detecting if,
whenever at least one but at most v errors are made, the strategy described above will correct
these errors, and whenever at least v + 1 but at most v + u errors are made, the strategy
above simply reports an error.
Theorem 4.17. A code C is simultaneously v-error-correcting and u-error-detecting if
and only if d(C) ≥ 2v + u + 1.
It is intuitively clear that, given any code C, we may be able to add new codewords to
it at no cost to its minimum distance, but only up to a point. This leads us to make the following definition.
Definition 4.18. An (n, M, d)-code is said to be maximal if it is not contained in any
larger code with the same minimum distance, that is, if it is not contained in any (n, M + 1, d)-code.
Thus an (n, M, d)-code C is maximal if and only if, for all strings x ∈ An , there is a
codeword c ∈ C with the property that d(x, c) < d.
Proposition 4.19. For the binary symmetric channel with crossover probability p using
minimum distance decoding, the probability of a decoding error for a maximal (n, M, d)-code
satisfies
  ∑_{k=d}^{n} \binom{n}{k} p^k (1 − p)^{n−k} ≤ P(decode error) ≤ 1 − ∑_{k=0}^{⌊(d−1)/2⌋} \binom{n}{k} p^k (1 − p)^{n−k} .
Furthermore, for a non-maximal code, the upper bound still holds, but the lower bound may
not.
Making New Codes from Old Codes
There are several useful techniques that can be used to obtain new codes from old codes.
In the following, we always suppose that our codes are over the alphabet A = Zr = Z/rZ.
Extending a Code. The process of adding one or more additional positions to all the
codewords in a code, thereby increasing the length of the code, is referred to as extending the
code. The most common way to extend a code is by adding an overall parity check, which is
done as follows. If C is an r-ary (n, M, d)-code over Zr , we define the extended code C̄ by
  C̄ = {c1 c2 · · · cn cn+1 | c1 c2 · · · cn ∈ C and ∑_{k=1}^{n+1} ck ≡ 0 (mod r)}.
If C̄ is an (n̄, M̄, d̄)-code, then n̄ = n + 1, M̄ = M and d̄ = d or d + 1.
We remark that for a binary (n, M, d)-code C, the minimum distance of C̄ depends on
the parity of d. In particular, since all of the codewords in C̄ have even sum, the minimum
distance of C̄ is even. It follows that if d is even then d(C̄) = d and if d is odd then
d(C̄) = d + 1. Moreover, since ⌊(d(C̄) − 1)/2⌋ = ⌊(d(C) − 1)/2⌋, the error-correcting capabilities of the
code do not increase.
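A sketch of the parity-check extension in Python (codewords as tuples over Zr ; illustrative only):

    def extend(code, r=2):
        """Append an overall parity check so each coordinate sum is 0 mod r."""
        return [c + (-sum(c) % r,) for c in code]

    C = [(0, 0, 0), (1, 1, 1)]   # the binary repetition code, d = 3
    print(extend(C))             # [(0, 0, 0, 0), (1, 1, 1, 1)], so d becomes 4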
Puncturing a Code. The opposite process to extending a code is puncturing a code, in
which one or more positions are removed from the codewords. If C is an r-ary (n, M, d)-code
and if d ≥ 2, then the code C ∗ obtained by puncturing C once has parameters n∗ = n − 1,
M ∗ = M and d∗ = d or d − 1.
For binary codes, the process of extending and puncturing can be used to prove the
following useful result.
Lemma 4.20. A binary (n, M, 2v + 1)-code exists if and only if a binary (n + 1, M, 2v + 2)-code exists.
Shortening a Code. Shortening a code refers to the process of keeping only those
codewords in a code that have a given symbol in a given position, and then deleting that
position. If C is an (n, M, d)-code then a shortened code has length n − 1 and minimum
distance at least d. In fact, shortening a code can result in a substantial increase in the
minimum distance, but shortening a code does result in a code with smaller size.
The shortened code formed by taking codewords with an s in the i-th position is referred
to as the cross-section xi = s. We will have many occasions to use cross-sections in the
sequel.
Augmenting a Code. Augmenting a code simply means adding additional strings
to the code. A common way to augment a binary code C is to include the complements of
each codeword in C, where the complement of a binary codeword c is the string obtained
from c by interchanging all 0’s and 1’s.
Let us denote the complement of c by cc and denote the set of all complements of the
codewords in C by C c . It is easy to check that if x, y ∈ Zn2 , then d(x, yc ) = n − d(x, y).
Proposition 4.21. Let C be a binary (n, M, d)-code. Suppose that d′ is the maximum
distance between codewords in C. Then d(C ∪ C c ) = min{d, n − d′ }.
The Direct Sum Construction. If C1 is an r-ary (n1 , M1 , d1 )-code and C2 is an r-ary
(n2 , M2 , d2 )-code, the direct sum C1 ⊙ C2 is the code
  C1 ⊙ C2 = {cd | c ∈ C1 , d ∈ C2 }.
Clearly, C1 ⊙ C2 has parameters n = n1 + n2 , M = M1 M2 and d = min{d1 , d2 }.
The u(u + v) Construction. A much more useful construction than the direct sum is
the following. If C1 is an r-ary (n, M1 , d1 )-code and C2 is an r-ary (n, M2 , d2 )-code, then we
define a code C1 ⊕ C2 by
C1 ⊕ C2 = {c(c + d) | c ∈ C1 , d ∈ C2 }.
Certainly, the length of C1 ⊕ C2 is 2n and the size is M1 M2 . As for the minimum distance,
consider two distinct codewords x = c1 (c1 + d1 ) and y = c2 (c2 + d2 ). If d1 = d2 , then
d(x, y) ≥ 2d1 . On the other hand, if d1 ≠ d2 , then d(x, y) ≥ d2 . Since equality can hold in
both cases, we get the following result.
Lemma 4.22. Let C1 be an r-ary (n, M1 , d1 )-code and C2 be an r-ary (n, M2 , d2 )-code.
Then C1 ⊕ C2 is a (2n, M1 M2 , d′ )-code, where d′ = min{2d1 , d2 }.
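A Python sketch of the construction (illustrative; starting from the even-weight (4, 8, 2)-code and the repetition (4, 2, 4)-code, it produces a binary (8, 16, 4)-code):

    def u_u_plus_v(C1, C2, r=2):
        """{c(c+d) | c in C1, d in C2}, with the sum taken mod r componentwise."""
        return [c + tuple((x + y) % r for x, y in zip(c, d))
                for c in C1 for d in C2]

    C1 = [(a, b, c, (a + b + c) % 2)            # even-weight (4, 8, 2)-code
          for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    C2 = [(0, 0, 0, 0), (1, 1, 1, 1)]           # repetition (4, 2, 4)-code
    C = u_u_plus_v(C1, C2)
    print(len(C), len(C[0]))                    # 16 codewords of length 8
    # minimum distance: min{2*2, 4} = 4, by Lemma 4.22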
Equivalence of Codes. There are various definitions of equivalence of codes in the
literature. We will adopt the following definitions.
Definition 4.23. Two r-ary (n, M )-codes C1 and C2 are equivalent if there exists a
permutation σ of the n positions and permutations π1 , . . . , πn of the code alphabet for which
c1 c2 · · · cn ∈ C1 if and only if π1 (cσ(1) )π2 (cσ(2) ) · · · πn (cσ(n) ) ∈ C2 .
In particular, any r-ary code over Zr is equivalent to a code that contains the zero codeword 0 = 00 · · · 0. Furthermore, equivalent codes have the same length, size and minimum
distance.
The Main Coding Theory Problem
A good r-ary (n, M, d)-code should have a relatively large size so that it can be used
to encode a large number of source messages and it should have a relatively large minimum
distance so that it can be used to correct a large number of errors. Not surprisingly, these
goals are conflicting.
For given values of n and d, it is customary to let Ar (n, d) denote the largest possible size
M for which there exists an r-ary (n, M, d)-code. Any r-ary (n, M, d)-code with M = Ar (n, d)
is called an optimal code. The numbers Ar (n, d) play a central role in coding theory and
much effort has been expended in attempting to determine their values. In fact, determining
the values of Ar (n, d) has come to be known as the main coding theory problem.
Note that in order to show that Ar (n, d) = M , it is enough to show that Ar (n, d) ≤ M
and then find a specific r-ary (n, M )-code C for which d(C) ≥ d, which shows that Ar (n, d) ≥
Ar (n, d(C)) ≥ M .
Example 4.24. Let C be a binary (4, M, 3)-code. Without loss of generality, we may
assume that C contains the zero codeword 0 = 0000. Since d(c, 0) ≥ 3 for any other
codeword c in C, every other codeword must contain at least three 1s. This leaves five
possibilities for additional codewords in C, namely:
1110, 1101, 1011, 0111, 1111.
But no two of these are at distance 3 or more from each other, and so only one can be
included in C. Hence A2 (4, 3) = 2.
Example 4.25. Let C be a binary (5, M, 3)-code. Consider the cross-section C0 defined
by x1 = 0. We know that C0 has minimum distance d0 where 4 ≥ d0 ≥ 3 and since
A2 (4, 3) = A2 (4, 4) = 2, it follows that C0 has size M0 ≤ 2. Similarly, the cross-section
C1 defined by x1 = 1 has size M1 ≤ 2. Thus M = M0 + M1 ≤ 4 and hence
A2 (5, 3) ≤ 4. On the other hand, the code C = {00000, 11100, 00111, 11011} has minimum
distance d(C) = 3 and so A2 (5, 3) = 4.
The approach used in Example 4.25 will not go very far in determining values of A2 (n, d).
In fact, very few actual values of A2 (n, d) are known. For instance, we only know that
72 ≤ A2 (10, 3) ≤ 79.
Let us now turn to the establishment of some general results about the numbers Ar (n, d).
Proposition 4.26. For any n ≥ 1,
(1) Ar (n, d) ≤ rn for all 1 ≤ d ≤ n.
(2) Ar (n, 1) = rn .
(3) Ar (n, n) = r.
Let C be an optimal r-ary (n, M, d)-code. By use of the pigeon-hole principle, one of
the cross-sections x1 = i of C must contain at least M/r codewords, and so we have the
following.
Proposition 4.27. For any n ≥ 2, Ar (n, d) ≤ rAr (n − 1, d).
According to Lemma 4.20, a binary (n, M, 2v + 1)-code exists if and only if a binary
(n + 1, M, 2v + 2)-code exists. Hence, we immediately have the following.
Proposition 4.28. If d > 0 is even, then A2 (n, d) = A2 (n − 1, d − 1).
Thus, for binary codes, it is enough to determine A2 (n, d) for all odd values of d.
Let us now turn our attention to some upper and lower bounds on the numbers Ar (n, d)
that arise from considering spheres in Znr .
Let C = {c1 , . . . , cM } be an optimal r-ary (n, M, d)-code over Zr . Thus M = Ar (n, d).
Because C has maximal size, there can be no string in Znr whose distance from every codeword
in C is at least d. In symbols, Znr ⊆ ∪_{i=1}^{M} Srn (ci , d − 1). Since |Znr | = r^n , this implies that
r^n ≤ Vrn (d − 1) · M . We arrive at the following result, called the sphere-covering bound for
Ar (n, d).
Theorem 4.29 (The sphere-covering bound for Ar (n, d)). If Vrn (ρ) denotes the volume
of a sphere of radius ρ in Znr , then
  r^n / Vrn (d − 1) ≤ Ar (n, d).
The sphere-covering bound is a lower bound for Ar (n, d). We can derive an upper bound
for Ar (n, d) by similar methods. In particular, let C = {c1 , . . . , cM } be an optimal (n, M, d)-code.
Since pr(C) = ⌊(d − 1)/2⌋ and ∪_{i=1}^{M} Srn (ci , pr(C)) ⊆ Znr , we have the sphere-packing bound
for Ar (n, d).
Theorem 4.30 (The sphere-packing bound for Ar (n, d)). If Vrn (ρ) denotes the volume of a
sphere of radius ρ in Znr , then
  Ar (n, d) ≤ r^n / Vrn (⌊(d − 1)/2⌋).
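Both sphere bounds are easy to evaluate; a Python sketch (illustrative) applied to A2 (10, 3), for which the text quoted above the known bounds 72 ≤ A2 (10, 3) ≤ 79:

    from math import comb

    def sphere_volume(n, rad, r=2):
        return sum(comb(n, k) * (r - 1) ** k for k in range(rad + 1))

    def sphere_covering_lower(n, d, r=2):
        """ceil(r^n / V_r^n(d-1)) <= A_r(n, d)."""
        return -(-r ** n // sphere_volume(n, d - 1, r))

    def sphere_packing_upper(n, d, r=2):
        """A_r(n, d) <= floor(r^n / V_r^n(floor((d-1)/2)))."""
        return r ** n // sphere_volume(n, (d - 1) // 2, r)

    print(sphere_covering_lower(10, 3), sphere_packing_upper(10, 3))  # 19 93

As the output shows, the sphere bounds 19 ≤ A2 (10, 3) ≤ 93 are much weaker than the best known bounds.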
The sphere-packing bound is not the only useful upper bound on the values of Ar (n, d).
We consider two additional bounds.
Let C be an (n, M, d)-code. If we remove the last d − 1 positions from each codeword in
C, the resulting shortened codewords must all be distinct. Since the length of the shortened
codewords is n − d + 1, we have the following.
Theorem 4.31 (The Singleton bound).
Ar (n, d) ≤ r^{n−d+1} .
Example 4.32. According to the Singleton bound, Ar (4, 3) ≤ r^2 . On the other hand,
the sphere-packing bound gives Ar (4, 3) ≤ r^4 /(4r − 3). Thus, for r ≥ 4, the Singleton bound is
much better than the sphere-packing bound.
Let C be an r-ary (n, M, d)-code and consider the sum of the distances between codewords,
which is given by S = ∑_{c,d∈C} d(c, d). Since the minimum distance of C is d, we have
S ≥ M (M − 1)d. On the other hand, suppose that the number of j’s in the i-th position of
all codewords in C is kij , where j = 0, . . . , r − 1. Then the i-th position contributes a total
of
  ∑_{j=0}^{r−1} kij (M − kij ) = M^2 − ∑_{j=0}^{r−1} kij^2 ≤ M^2 − M^2/r
to S, since the last sum above is smallest when kij = M/r. Since there are n positions, we
have M (M − 1)d ≤ S ≤ nM^2 (1 − 1/r). Solving for M gives the following result.
Theorem 4.33 (The Plotkin Bound). If n < dr/(r − 1), then
  Ar (n, d) ≤ dr/(dr − nr + n).
The Plotkin bound can easily be refined a bit when r = 2.
Theorem 4.34 (The Plotkin Bound for Binary Codes).
(1) If d is even and n < 2d, then
  A2 (n, d) ≤ 2⌊d/(2d − n)⌋,
and for n = 2d, A2 (2d, d) ≤ 4d.
(2) If d is odd and n < 2d + 1, then
  A2 (n, d) ≤ 2⌊(d + 1)/(2d + 1 − n)⌋,
and for n = 2d + 1, A2 (2d + 1, d) ≤ 4d + 4.
The Plotkin bound applies only when the minimum distance d is rather large relative to n,
but in that range it is generally superior to the sphere-packing bound.
Example 4.35. The Plotkin bound can also be used, in conjunction with Proposition
4.27, to give an upper bound when d ≤ n(r − 1)/r. For example, we have A2 (13, 5) ≤
2^3 A2 (10, 5) ≤ 96.
Exercise
(1) Consider the code C consisting of all strings in Zn2 that have an even number of 1s. What
is the length, size, and minimum distance of C?
(2) Let c, d ∈ An and consider the sets S = {x ∈ An | d(x, c) < d(x, d)} and T = {x ∈
An | d(x, c) > d(x, d)}. Show that |S| = |T |.
(3) Construct an explicit example to illustrate that simultaneous error detection and correction can reduce the error detecting capabilities of a code.
(4) Estimate the probability of a decoding error using the binary repetition code of length
5 under a binary symmetric channel with crossover probability p = 0.001.
(5) Does a binary (8, 4, 5)-code exist? Justify your answer.
(6) Let C be an r-ary (n, M, d)-code over the alphabet Zr . Show that, as long as d < n,
there is, for some position i, a cross-section that has minimum distance d. What
can happen if d = n?
(7) Suppose that C is an (n, M, d)-code. Show that C is a cross-section of a larger code with
parameters (n + 1, M + 2, 1).
(8) Let C1 = {c1 c2 c3 c4 | c1 + c2 + c3 + c4 ≡ 0 (mod 2)} be the code over Z2 .
(a) What are the parameters of C1 ?
    (b) Construct C2 = C1 ⊕ Rep2 (4). What are the parameters of C2 ?
    (c) What are the parameters of C3 = C2 ⊕ Rep2 (8)?
    (d) What are the parameters of C4 = C3 ⊕ Rep2 (16)?
    (e) Show that we can construct a binary (2^m , 2^{m+1} , 2^{m−1} )-code in this fashion.
(9) If C is a code over Zp and C̄ is the code obtained by adding an overall parity check,
    what is the relation between the minimum distances of C and C̄?
(10) Verify that A2 (6, 5) = 2, A2 (7, 5) = 2 and A2 (8, 5) = 4.
(11) Let C be an (n, M, d)-code.
    (a) If C is not maximal, is it always possible to add codewords to C until the resulting
        code is maximal?
    (b) If C is not optimal, is it always possible to add codewords to C until the resulting
        code is optimal?
    (c) Give an example of a code that is maximal but not optimal.
(12) Is there a binary (8, 29, 3)-code? Explain.
(13) Show that Ar (r + 1, 5) ≤ 2r^{r−2} /(r − 1).
(14) Compare the Singleton, Plotkin and sphere-packing upper bounds for A2 (9, 5).
(15) Let C be a perfect binary (n, M, 7)-code. Use the sphere-packing condition to show that
     n = 7 or n = 23.
CHAPTER 5
Linear Codes
Finite Fields
Finite fields play a major role in coding theory and so it is important to gain a solid
understanding of the structure of such fields.
Let K and F be fields. If K is an extension of F , we write K/F . In this case, K is
also a vector space over F . If the dimension of K over F is finite, we say that K is a finite
extension of F and denote this dimension by [K : F ]. It is easy to check that if F is a finite
field and K is a finite extension of F with d = [K : F ], then K is a finite field such that
|K| = |F |^d .
If R is a ring and if there exists a positive integer n for which
  n · a = a + a + · · · + a (n times) = 0
for all a ∈ R, then the smallest such n is called the characteristic of R and is denoted by
char(R). If no such positive integer n exists, we say that R has characteristic 0.
In a field of characteristic 0, the positive integers 1, 2, . . . , are all distinct, and so a finite
field must have nonzero characteristic. Suppose that the characteristic of a finite field F is
n. If n = uv where 1 < u, v < n, then (u · 1)(v · 1) = 0 implying u · 1 = 0 or v · 1 = 0. In
either case, we have a contradiction to the fact that n is the smallest positive integer such
that n · 1 = 0. Thus, n must be a prime number.
Lemma 5.1. If F is a finite field, then F has prime characteristic. Furthermore, if
char(F ) = p, then F has p^n elements, for some positive integer n.
From now on, p will represent a prime number and q will represent a prime power.
The following result is a key reason why the theory of finite fields has its characteristic
flavor.
Lemma 5.2. If F is a finite field of characteristic p, then
  (α + β)^{p^n} = α^{p^n} + β^{p^n} ,
for any positive integer n and for all α, β ∈ F .
According to the definition, the set F ∗ of nonzero elements of a field F forms a group
under multiplication. If |F | = q, then |F ∗ | = q − 1 and since the order of every element in a
group divides the order of the group, we have α^{q−1} = 1 for all α ∈ F ∗ . In other words, every
element of F is a root of the polynomial fq (x) = x^q − x. But since this polynomial has at
most q roots, we see that F is the set of all roots of fq (x) and therefore is also the splitting
field for fq (x).
Lemma 5.3. If F is a finite field of q elements, then F is both the set of all roots of
fq (x) = x^q − x and the splitting field for fq (x).
Since any two splitting fields for the same polynomial are isomorphic, Lemma 5.3 tells
us that any two finite fields of the same size are isomorphic. We will denote a finite field of size q
by Fq .
It remains now to determine whether or not there is a finite field of size q for every prime
power q = p^n . Let K be the splitting field for fq (x) = x^q − x and let R be the set of roots
of fq (x). If α, β ∈ R with β ≠ 0, then by Lemma 5.2, α + β and αβ^{−1} are also in R. Thus, R is a
subfield of K, which implies that R = K. Let us summarize our results.
Theorem 5.4. All finite fields have size q = pn , for some prime p. On the other hand,
for every q = pn , there exists a unique (up to isomorphism) field of size q.
Our goal now is to describe the subfields of a finite field. Suppose that K is a field of size p^n and let d | n. It is not hard to show that p^d − 1 | p^n − 1 and so x^{p^d} − x | x^{p^n} − x. Hence f_{p^d}(x) = x^{p^d} − x splits into linear factors over K. In other words, K contains a subfield of size p^d.

Theorem 5.5. Let K be a finite field of size p^n. Then K has exactly one subfield of size p^d for each d | n. Furthermore, this accounts for all of the subfields of K.
For a finite field, the multiplicative group K∗ could not have a simpler structure: it is cyclic. Recall that if G is a cyclic group of order n, then G contains exactly φ(d) elements of each order d dividing n, where φ is Euler's phi function. This gives the formula

∑_{d | n} φ(d) = n.
Now, suppose that |F∗| = q − 1 and α is an element of F∗ of order d. Thus, d | q − 1. Consider the cyclic subgroup ⟨α⟩ generated by α. Every element of ⟨α⟩ has order dividing d and so is a root of the polynomial x^d − 1. But this polynomial can have at most d roots in F and so ⟨α⟩ is the set of all roots of x^d − 1. In particular, all of the elements of F of order d must lie in ⟨α⟩. However, in ⟨α⟩, there are exactly φ(d) elements of order d. Hence, letting ψ(d) denote the number of elements of F∗ of order d, we have ψ(d) = φ(d) or ψ(d) = 0, and

∑_{d | q−1} ψ(d) = |F∗| = q − 1 = ∑_{d | q−1} φ(d).

Since ψ(d) ≤ φ(d) for each d, this forces ψ(d) = φ(d) for every d | q − 1.
We have the following result.
Theorem 5.6. If F is a finite field of q elements, then F contains exactly φ(d) elements
of order d, for each d | q − 1. In particular, the multiplicative group F ∗ of nonzero elements
of F is cyclic.
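Theorem 5.6 guarantees that a generator of F∗ exists; for a prime field it can be found by brute force, as in the following Python sketch (the helper names are ours):

    # Sketch: find a generator (primitive element) of the cyclic group Z_p*.
    def order(a, p):
        k, x = 1, a % p
        while x != 1:
            x, k = x * a % p, k + 1
        return k

    def primitive_element(p):
        # phi(p - 1) elements of order p - 1 exist, so the search succeeds
        return next(a for a in range(2, p) if order(a, p) == p - 1)

    print(primitive_element(7))   # prints 3, since 3 generates F_7*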
Basic Definitions
The set F_q^n of all n-tuples whose components belong to Fq is a vector space over Fq of
dimension n. We will write the vector (x1 , x2 , . . . , xn ) in the form x1 x2 · · · xn .
We can now define the most important and most studied type of code.
Definition 5.7. A code C ⊆ F_q^n that is also a subspace of F_q^n is called a linear code. If C has dimension k and minimum distance d(C) = d, then C is an [n, k, d]-code. When we do not care to emphasize the minimum distance d, we use the notation [n, k]-code. The numbers n, k and d are called the parameters of the linear code.
Note that a linear code C, being a subspace of F_q^n, must contain the zero codeword 0 = 0 · · · 0. Note also that a q-ary linear [n, k, d]-code is an (n, q^k, d)-code.
Since a linear code is a vector space, we can describe it by giving a basis. It is customary
to arrange the basis vectors as rows of a matrix.
Definition 5.8. Let C be an [n, k]-code with a basis B = {b1, . . . , bk}. If

b1 = b11 b12 · · · b1n
b2 = b21 b22 · · · b2n
...
bk = bk1 bk2 · · · bkn

then the k × n matrix

G = [ b11 b12 · · · b1n
      b21 b22 · · · b2n
      ...
      bk1 bk2 · · · bkn ]

whose rows are the codewords in B, is called a generator matrix for C.
If C is a q-ary linear [n, k]-code with generator matrix G, then the codewords in C are precisely the elements of the row space of G. Put another way, C = {x · G | x ∈ F_q^k}. Since performing elementary row operations does not change the row space of a matrix, any matrix that is row equivalent to G is also a generator matrix for C. On the other hand, interchanging two columns of G gives us a generator matrix for a code which is equivalent to C.
A generator matrix of the form G = (Ik | Mk,n−k ) (where Ik is the identity matrix of size
k × k and Mk,n−k is a matrix of size k × (n − k)), is said to be in left standard form. In
view of the previous remarks, every linear code is equivalent to a linear code which has a
generator matrix in standard form. When a k × n generator matrix is in left standard form,
it makes both encoding and decoding processes very simple.
Example 5.9. As we will see later, the matrix

G = [ 1 0 0 0 0 1 1
      0 1 0 0 1 0 1
      0 0 1 0 1 1 0
      0 0 0 1 1 1 1 ]

is a generator matrix for the Hamming code H2(3). The Hamming code H2(3) can encode source words from F_2^4 as follows:

x · G = (x1, x2, x3, x4) G = (x1, x2, x3, x4, x2 + x3 + x4, x1 + x3 + x4, x1 + x2 + x4).

Since G is in left standard form, the original source message appears as the first k symbols of its codeword.
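In code, the encoding x ↦ x · G is just a vector-matrix product over F2; a minimal Python sketch using the matrix above (the function name is ours):

    # Sketch: encode a source word by x * G over F_2 (all sums taken mod 2).
    G = [[1, 0, 0, 0, 0, 1, 1],
         [0, 1, 0, 0, 1, 0, 1],
         [0, 0, 1, 0, 1, 1, 0],
         [0, 0, 0, 1, 1, 1, 1]]

    def encode(x, G):
        return [sum(xi * row[j] for xi, row in zip(x, G)) % 2
                for j in range(len(G[0]))]

    print(encode([1, 0, 1, 1], G))   # [1, 0, 1, 1, 0, 1, 0]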
The Dual of a Linear Code
We have seen several ways of constructing new codes from old ones. Now, we describe
another method (perhaps the most important one for linear codes).
Definition 5.10. Let x = x1 x2 · · · xn and y = y1 y2 · · · yn be strings in Fnq . The inner
product of x and y, denoted by x · y, is the element of Fq defined by
x · y = x1 y1 + x2 y2 + · · · + xn yn
where the sum and product are taken in Fq .
For any set S ⊆ F_q^n, we let S⊥ denote the set of all strings in F_q^n that are orthogonal to every string in S. Thus,
S ⊥ = {x ∈ Fnq | s · x = 0, ∀ s ∈ S}.
This set is called the orthogonal complement of S.
Lemma 5.11. For any subset S in Fnq , the set S ⊥ is a linear code.
From Lemma 5.11, we have the following definition.
Definition 5.12. The orthogonal complement C ⊥ of any code C is a linear code called
the dual code of C.
We may apply some basic linear algebra to get the following results which give some of
the most basic properties of dual codes.
Proposition 5.13. Let C be a linear [n, k]-code over Fq , with generator matrix G.
(1) C ⊥ is the set of all strings that are orthogonal to every row of G. In symbols,
C ⊥ = {x ∈ Fnq | x · Gt = 0}.
(where Gt is the transpose of G)
(2) C ⊥ is a linear [n, n − k]-code. In other words,
dim(C ⊥ ) = n − dim(C).
(3) We have (C ⊥ )⊥ = C.
We should remark that the properties of the dual of a linear code over a finite field can be quite different from those of the dual space of a vector space over the real numbers. For instance, if W is a subspace of a finite dimensional real vector space V, then W⊥ ∩ W = {0}, since no nonzero vector is orthogonal to itself. This is not always the case for linear codes over finite fields, however. In fact, as the next example illustrates, we can even have C⊥ = C.
Example 5.14. For the binary [4, 2]-code C = {0000, 1100, 0011, 1111}, we have C ⊆ C⊥, and since C⊥ is also a [4, 2]-code, we get C = C⊥.
Definition 5.15. A linear code C is said to be self-orthogonal if C ⊆ C ⊥ . A linear code
C for which C = C ⊥ is said to be self-dual.
It is easy to check that a linear code C with generator matrix G is self-orthogonal if and
only if the rows of G are orthogonal to themselves and to each other. Note that a linear
[n, k]-code is self-dual if and only if it is self-orthogonal and k = n/2.
By Proposition 5.13 (1), we can describe the dual code as the solution set of a system of equations: the system x · G^t = 0 is called the parity check equations for the code C⊥. A string x = x1 x2 · · · xn ∈ F_q^n is in the dual code C⊥ if and only if its components x1, . . . , xn satisfy the parity check equations for C⊥.
Definition 5.16. A parity check matrix for a linear q-ary [n, k]-code C is a matrix P
with the property that
C = {x ∈ Fnq | x · P t = 0}.
Note that, unlike a generator matrix, we make no requirement that the rows of P be linearly independent. Of course, parity check matrices whose rows are linearly independent are smaller, and therefore more efficient, than other parity check matrices.
Any linear code C has a parity check matrix. In particular, a generator matrix for the
dual code C ⊥ is a parity check matrix for C. We have now two convenient ways to define a
linear code C: by giving a generator matrix or by giving a parity check matrix.
One of the advantages of a generator matrix in left standard form is that such a description makes it easy to encode and decode source messages. Another advantage is that it is
easy to construct a parity check matrix from a generator matrix that is in left standard form.
Let G = (I_k | B) be a generator matrix for C. Consider P = (−B^t | I_{n−k}). Then

G P^t = (I_k | B) [ −B
                    I_{n−k} ] = −B + B = O,

where O is the k × (n − k) zero matrix. Hence, the rows of P are orthogonal to the rows of G and since rank(P) = n − k = dim(C⊥), we deduce that P is a generator matrix for the dual code C⊥. We have the following.
Proposition 5.17. The matrix G = (Ik | B) is a generator matrix for an [n, k]-code C
if and only if the matrix P = (−B t | In−k ) is a parity check matrix for C.
Example 5.18. The code H2(3) in Example 5.9 has parity check matrix

P = [ 0 1 1 1 1 0 0
      1 0 1 1 0 1 0
      1 1 0 1 0 0 1 ].

In this case, the parity check equations are

x2 + x3 + x4 + x5 = 0
x1 + x3 + x4 + x6 = 0
x1 + x2 + x4 + x7 = 0.
A matrix of the form A = (M | Ik ) is said to be in right standard form. By Proposition
5.17, it is easy to go back and forth between generator matrices in left standard form and
parity check matrices in right standard form.
The use of parity check matrices that are in right standard form also has some interesting
features. For instance, the code H2 (3) in Example 5.18 has parity check matrix in right
standard form. A string x = x1 x2 · · · x7 is in H2 (3) if and only if
x5 = x2 + x3 + x4
x6 = x1 + x3 + x4 .
x7 = x1 + x2 + x4
This description of H2 (3) is very pleasant, for we can easily generate codewords from it by
just picking values for x1 , x2 , x3 and x4 and substituting, or we can easily determine whether
or not a given string is a codeword.
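A Python sketch of this process, generating all 16 codewords of H2(3) by substituting into the equations above:

    # Sketch: choose x1..x4 freely, fill in the check symbols x5, x6, x7.
    from itertools import product

    codewords = []
    for x1, x2, x3, x4 in product((0, 1), repeat=4):
        x5 = (x2 + x3 + x4) % 2
        x6 = (x1 + x3 + x4) % 2
        x7 = (x1 + x2 + x4) % 2
        codewords.append((x1, x2, x3, x4, x5, x6, x7))

    assert len(codewords) == 16   # H2(3) has 2^4 codewords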
The Minimum Distance of a Linear Code
In order to determine the minimum distance for an arbitrary code C of size M , we need
to check each of the M (M − 1)/2 distance d(c, d) between codewords. For linear codes, we
can greatly simplify the task.
Definition 5.19. The weight w(x) of a string x ∈ Fnq is defined to be the number of
nonzero positions in x. The weight of a code C, denoted by w(C), is the minimum weight
of all nonzero codewords in C.
Lemma 5.20. d(x, y) = w(x − y) for all strings x, y in Fnq .
Since for a linear code C, we have that c, d ∈ C implies c − d ∈ C, by Lemma 5.20, we
have the following.
Proposition 5.21. If C is a linear code, then d(C) = w(C).
It is important to emphasize that Proposition 5.21 holds only for codes which are additive
subgroups of Fnq .
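Computationally, Proposition 5.21 replaces the pairwise-distance search with a single pass over the code; a Python sketch (assuming the code is given as a list of tuples over Z_q):

    # Sketch: for a linear code, d(C) = w(C) = min weight of a nonzero word.
    def min_distance_linear(C):
        return min(sum(1 for x in c if x != 0)   # the weight w(c)
                   for c in C if any(c))

    # e.g. min_distance_linear(codewords) == 3 for the H2(3) list built above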
As we have said, a linear code C can be described either by giving a generator matrix
G or a parity check matrix P. Both methods have advantages. For instance, it is easier to
generate all codewords in C from G. On the other hand, to use P to generate all codewords
in C requires solving a system of linear equations. However, it is easier to determine whether
or not a given string is in C by using P . Furthermore, there does not seem to be a simple,
direct method for determining the minimum weight of a linear code from a generator matrix.
However, the following result shows that it is easy to do so from a parity check matrix.
Proposition 5.22. Let P be a parity check matrix for a linear code C. Then the minimum distance of C is the smallest integer r for which there are r linearly dependent columns
in P .
Recall that the sphere-covering lower bound on A_r(n, d) is given by

r^n / V_r^n(d − 1) ≤ A_r(n, d).
It happens that we can improve upon this bound, in some cases, by considering linear codes
and using Proposition 5.22.
Theorem 5.23 (Gilbert-Varshamov bound). There exists a q-ary linear [n, k, d]-code if

q^k < q^n / V_q^{n−1}(d − 2).

Thus, if q^k is the largest power of q satisfying this inequality, we have Aq(n, d) ≥ q^k.
The inequality displayed in Theorem 5.23 is known as the Gilbert-Varshamov inequality.
The following example will show that the Gilbert-Varshamov bound is better than the
sphere-covering bound.
Example 5.24. The sphere-covering bound says that A2(5, 3) ≥ 2. On the other hand, the Gilbert-Varshamov bound says that there exists a binary linear (5, 2^k, 3)-code if 2^k < 32/5, and so we may take k = 2, showing that there is a binary linear (5, 4, 3)-code, whence A2(5, 3) ≥ 4.
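The Gilbert-Varshamov computation is easy to automate; a Python sketch (the function names are ours):

    # Sketch: largest k with q^k * V_q^{n-1}(d - 2) < q^n, per Theorem 5.23.
    from math import comb

    def volume(q, m, t):
        # V_q^m(t) = number of strings within distance t of a fixed string
        return sum(comb(m, i) * (q - 1) ** i for i in range(t + 1))

    def gv_dimension(q, n, d):
        k = 0
        while q ** (k + 1) * volume(q, n - 1, d - 2) < q ** n:
            k += 1
        return k

    print(gv_dimension(2, 5, 3))   # 2, so A2(5, 3) >= 2^2 = 4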
Correcting Errors in a Linear Code
Nearest neighbor decoding involves finding a codeword closest to the received string. There are better methods for decoding with linear codes.
Let us recall a few simple facts about quotient spaces. If W is a subspace of a vector space V over K, the quotient space of V modulo W is defined by V/W = {v + W | v ∈ V}. The set v + W = {v + w | w ∈ W} is called a coset of W. The quotient space is also a vector space over K, where λ(v + W) = λv + W and (v + W) + (v′ + W) = (v + v′) + W for all λ ∈ K and v, v′ ∈ V. Recall that v + W = v′ + W if and only if v − v′ ∈ W.
Now let us suppose that a string x ∈ F_q^n is received. Nearest neighbor decoding requires that we decode x as a codeword c for which x − c has smallest weight. But as c ranges over a linear code C, x − c ranges over the coset x + C. Hence, nearest neighbor decoding requires that we decode x as the codeword c = x − f, where f is a string in x + C of smallest weight.
Let C be a q-ary linear [n, k]-code. The process can be described in terms of a so-called standard array for C:

0            c1                 c2                 · · ·   c_{q^k}
f2           f2 + c1            f2 + c2            · · ·   f2 + c_{q^k}
f3           f3 + c1            f3 + c2            · · ·   f3 + c_{q^k}
...          ...                ...                        ...
f_{q^{n−k}}  f_{q^{n−k}} + c1   f_{q^{n−k}} + c2   · · ·   f_{q^{n−k}} + c_{q^k}
The first row of the array consists of the codewords in C. To form the second row, we choose a string f2 of smallest weight that is not in the first row and add it to each codeword of the first row. This forms the coset f2 + C. In general, the i-th row of the array is formed by choosing a string fi of smallest weight that is not yet in the array and adding it to each codeword of the first row, to form the coset fi + C. The elements fi are called the coset leaders of the array.
The following basic facts about standard arrays will be used repeatedly.
Lemma 5.25. Let C be a q-ary linear [n, k]-code with standard array A.
(1) Every string in F_q^n appears exactly once in A.
(2) The number of rows of A is q^{n−k}.
(3) Two strings x and y in F_q^n lie in the same coset (row) of A if and only if their difference x − y is in C.
(4) Each coset leader has minimum weight among all strings in its coset.
Example 5.26. A standard array for the binary [4, 2]-code C = {0000, 1011, 0110, 1101} is

0000 1011 0110 1101
1000 0011 1110 0101
0100 1111 0010 1001
0001 1010 0111 1100
We remark that standard arrays are not unique. For instance, in the array of the previous
example, we could have chosen 0010 to be the coset leader for the third row of the array.
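A standard array for a small binary code can be built exactly as described, greedily taking minimum-weight leaders; a Python sketch using the code of Example 5.26 (the function name is ours):

    # Sketch: build a standard array, choosing leaders in order of weight.
    from itertools import product

    def standard_array(C, n):
        rows, placed = [list(C)], set(C)
        for f in sorted(product((0, 1), repeat=n), key=sum):
            if f not in placed:
                row = [tuple((a + b) % 2 for a, b in zip(f, c)) for c in C]
                rows.append(row)
                placed.update(row)
        return rows

    C = [(0,0,0,0), (1,0,1,1), (0,1,1,0), (1,1,0,1)]
    for row in standard_array(C, 4):
        print(row)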
We now come to the purpose of standard arrays, which is to implement nearest neighbor
decoding.
Proposition 5.27. Let C be a q-ary [n, k]-code with standard array A. For any string
x in Fnq , the codeword c that lies at the top of the column containing x is a nearest neighbor
codeword to x.
Notice that the difference x − c between the received string x and the nearest neighbor
interpretation c at the top of the column containing x, is the coset leader for the coset
containing x. This coset leader is called the error string.
Nearest neighbor ties are always decided when using a standard array. Thus, standard array decoding is complete decoding. Recall that if C is a linear [n, k, d]-code, then it is v-error-correcting, where v = ⌊(d − 1)/2⌋. Put another way, any errors that result in an error string of weight v or less are corrected. It follows that the coset leaders of any standard array for C must include all strings of weight v or less.
One of the advantages of parity check matrices is that they can be used for efficient implementation of nearest neighbor decoding.
Definition 5.28. Let P be a parity check matrix for a linear code C ⊆ F_q^n. The syndrome S(x) of a string x ∈ F_q^n is the product x · P^t.
We remark that the syndrome function S has the properties that S(x + y) = S(x) + S(y) and S(λx) = λS(x). Note also that the parity check equation x · P^t = 0 is equivalent to S(x) = 0, and so x ∈ C if and only if S(x) = 0.
The main importance of the syndrome comes from the following lemma.
Lemma 5.29. Let C be a linear code. Two strings x and y are in the same coset of any
standard array for C if and only if they have the same syndrome.
Recall that under nearest neighbor decoding, the error string e in a received word x is the coset leader of the coset containing x, and the nearest neighbor codeword is c = x − e. But the syndrome of x is equal to the syndrome of e, and since the syndromes of the coset leaders are all distinct, we can find e simply by comparing the syndrome of x to the syndromes of the coset leaders.
Now, nearest neighbor decoding can be implemented by the following simple algorithm.
(1) Compute the syndrome S(x) of the received string x.
(2) Compare it with the list of syndromes of the coset leaders {fi }. If S(x) = S(fi ),
then fi is the error string and c = x − fi is a nearest neighbor codeword.
Thus, we need only a list of coset leaders and their syndromes, which we refer to as a syndrome table for C:

coset leader    syndrome
0               0
f2              S(f2)
f3              S(f3)
...             ...
f_{q^{n−k}}     S(f_{q^{n−k}})

This process is referred to as syndrome decoding.

Note that a standard array for a q-ary [n, k]-code C has q^{n−k} rows. If P is a parity check matrix for C and P has linearly independent rows, then it has size (n − k) × n and therefore each syndrome x · P^t is an element of F_q^{n−k}. Since |F_q^{n−k}| = q^{n−k}, we conclude that the set of syndromes is precisely the entire space F_q^{n−k}.
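Syndrome decoding is then a table lookup; a Python sketch for the binary case (the leaders table, mapping syndromes to coset leaders, is assumed to have been built from a standard array as above):

    # Sketch: decode x by subtracting the coset leader with x's syndrome.
    def syndrome(x, P):
        # x * P^t over F_2, one entry per row of P
        return tuple(sum(a * b for a, b in zip(x, row)) % 2 for row in P)

    def decode(x, P, leaders):
        e = leaders[syndrome(x, P)]                       # the error string
        return tuple((a - b) % 2 for a, b in zip(x, e))   # c = x - e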
Let C be a linear code. We have seen that syndrome decoding will result in the correct codeword if and only if the error made in transmission is one of the coset leaders. Assuming a channel wherein the probability that a code symbol is changed to any other code symbol is p, if we let w_i be the number of coset leaders that have weight i, then the probability of correct decoding is

P(correct decoding) = ∑_{i=0}^{n} w_i p^i (1 − p)^{n−i}.

In general, the problem of determining the number w_i of coset leaders of weight i is quite difficult. However, in the case of perfect linear [n, k, d]-codes, we can easily determine these numbers. By using the result in Exercise 11, we have w_i = C(n, i) (the binomial coefficient) for 0 ≤ i ≤ ⌊(d − 1)/2⌋ and w_i = 0 for i > ⌊(d − 1)/2⌋.
An error in the transmission of a codeword c will go undetected if and only if the error string is a nonzero codeword. Hence, if A_i denotes the number of codewords of weight i, for the channel wherein the probability that a code symbol is changed to any other code symbol is p, the probability of an undetected error is

P(undetected error) = ∑_{i=1}^{n} A_i p^i (1 − p)^{n−i}.
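Both probabilities are easy to evaluate; a Python sketch for the binary [4, 2]-code of Example 5.26, whose coset leaders have weights 0, 1, 1, 1 and whose weight distribution is A_0, . . . , A_4 = 1, 0, 1, 2, 0 (the function names are ours):

    # Sketch: the two probabilities above, for a given symbol error rate p.
    def p_correct(p, leader_weights, n):
        return sum(p ** w * (1 - p) ** (n - w) for w in leader_weights)

    def p_undetected(p, A, n):
        return sum(A[i] * p ** i * (1 - p) ** (n - i) for i in range(1, n + 1))

    print(p_correct(0.01, [0, 1, 1, 1], 4))
    print(p_undetected(0.01, [1, 0, 1, 2, 0], 4))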
In defining a communications channel, we included the requirement that symbol errors be independent of time. While these assumptions make life a lot simpler, they are not always realistic. This leads us to the concept of a burst error.
Definition 5.30. A burst in F_q^n of length b is a string in F_q^n whose nonzero coordinates are confined to b consecutive positions, the first and last of which must be nonzero.
For example, the string 0001100100 in F_2^10 is a burst of length 5. Note that not all of the coordinates between the first and last 1s need be nonzero.
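A one-line check of the definition in Python (the function name is ours):

    # Sketch: burst length = span from first to last nonzero coordinate.
    def burst_length(x):
        nz = [i for i, xi in enumerate(x) if xi != 0]
        return nz[-1] - nz[0] + 1 if nz else 0

    assert burst_length([0, 0, 0, 1, 1, 0, 0, 1, 0, 0]) == 5   # as above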
Note that if a linear code is to correct any burst of length b or less, then no such burst
can be a codeword. The following lemma will be useful.
Lemma 5.31. Let C be a linear [n, k]-code over Fq . If C contains no bursts of length b
or less, then k ≤ n − b.
We have seen that the more errors we expect a code to detect or correct, the smaller must
be the size of the code. This situation for burst error detection is settled by the following
result.
Proposition 5.32. If a linear [n, k]-code C can detect all burst errors of length b or less,
then k ≤ n − b. Furthermore, there is a linear [n, n − b]-code that will detect all burst errors
of length b or less.
Now let us consider burst correction.
Proposition 5.33. If a linear [n, k]-code C can correct all burst errors of length b or
less, using nearest neighbor decoding, then k ≤ n − 2b.
If a code can correct any burst error of length b or less, then no two such bursts can lie in the same coset of a standard array of C. Thus by counting the total number of bursts of length b or less, we get a lower bound on the number of cosets of C, and hence an upper bound on the dimension of C.
Proposition 5.34. If a linear [n, k]-code C over Fq can correct all burst errors of length
b or less, using nearest neighbor decoding, then
k ≤ n − b + 1 − logq [(n − b + 1)(q − 1) + 1].
Finally, we introduce a procedure referred to as majority logic decoding. This procedure
often provides a simple method for decoding a linear code.
Definition 5.35. A system of parity check equations for a linear code is said to be
orthogonal with respect to the variable xi provided xi appears in every equation of the
system, but all other variables appear in exactly one equation.
Now suppose a system of parity check equations is orthogonal with respect to xi and suppose that a single error occurs in transmission. If the error is in the i-th position, then xi is incorrect, but all other xj are correct. Hence, every equation will be unsatisfied. On the other hand, if the error is in any position other than the i-th position, then exactly one of the equations will be unsatisfied. Thus, the number of unsatisfied equations will tell us whether or not the i-th position in the received string is correct (assuming a single error). More generally, suppose we have r parity check equations which are orthogonal with respect to the variable xi. Suppose further that t ≤ r/2 errors have occurred in transmission. If one of the errors is in the i-th position, then at most t − 1 of the equations can be satisfied with the help of the remaining errors, and so at least r − (t − 1) ≥ r/2 + 1 equations will be unsatisfied. On the other hand, if the i-th position does not suffer an error, then at most t ≤ r/2 equations will be unsatisfied. Therefore, the i-th position in the received string is in error if and only if the majority of the equations are unsatisfied. This is majority logic decoding.
Exercise
(1) Let F be an arbitrary field. Prove that if F ∗ is cyclic, then F must be a finite field.
(2) Is the code En consisting of all codewords in Fn2 with even weight a linear code? If so,
give a basis, state the dimension and find the minimum weight.
(3) Prove that for a binary linear code C, either all of the codewords have even weight or
else exactly half of the codewords have even weight.
(4) Write out all of the codewords for the ternary code with generator matrix
    ( 1 0 1 1 )
    ( 0 1 1 2 )

    and find the parameters of the code. Show that it is perfect.
(5) Let C be a binary linear code. Let C^c be the set of complements (c.f. Chapter 4) of codewords in C. Let 1 = 1 · · · 1 ∈ F_2^n.
    (a) Show that if 1 ∈ C then C^c = C.
    (b) Is C^c also a linear code?
    (c) Show that C ∪ C^c is a linear code.
(6) Let C be a linear code and let C̄ (c.f. Chapter 4) be the extended code defined by adding an overall parity check to C.
    (a) Show that C̄ is also a linear code.
    (b) If P is the parity check matrix for C, what is the parity check matrix for C̄?
(7) Prove that a binary self-dual [n, n/2]-code exists for all positive even integers n.
(8) Let G be a generator matrix for a q-ary linear code C. Show that C is self-dual if and only if distinct rows of G are orthogonal and each row of G has weight divisible by q.
(9) Show that there is no binary linear [90, 78, 5]-code.
(10) Let 1 = 1 · · · 1 ∈ F_2^n.
    (a) Show that if C is a binary self-orthogonal code, then all codewords in C have even weight and 1 ∈ C⊥.
    (b) Suppose that n is odd. Show that if C is a binary self-orthogonal [n, (n − 1)/2]-code, then C⊥ is generated by any basis for C together with the string 1.
(11) Let C be a linear [n, k, d]-code with standard array A. Show that C is perfect if and only if the coset leaders of A are precisely the strings of weight ⌊(d − 1)/2⌋ or less.
(12) Let A and B be mutually orthogonal subsets of F_q^n, that is, a · b = 0 for all a ∈ A and b ∈ B. Suppose furthermore that |A| = q^k and |B| > q^{n−k−1}. Show that A is a linear code.
CHAPTER 6
Some Linear Codes
Maximum Distance Separable Codes
For fixed n and k, we may ask for the largest minimum distance d among all linear [n, k]-codes. This problem has a very simple answer and leads to some fascinating theory. The Singleton bound (Theorem 4.31) or Proposition 5.22 implies the following.
Lemma 6.1. For a linear [n, k]-code, we must have d ≤ n − k + 1.
Definition 6.2. A linear [n, k]-code with minimum distance d = n − k + 1 is called a
maximum distance separable code or an MDS code.
It is not hard to see that q-ary MDS codes exist with parameters [n, n, 1], [n, 1, n] and [n, n − 1, 2]. These codes are referred to as the trivial MDS codes. Thus, any nontrivial MDS [n, k]-code must satisfy 2 ≤ k ≤ n − 2.
Proposition 5.22 says that a linear code has minimum distance d if and only if any d − 1 columns of a parity check matrix are linearly independent but some d columns are linearly dependent. Thus we have the following.
Lemma 6.3. Let C be a linear [n, k]-code with parity check matrix P . Then C is MDS if
and only if any n − k columns of P are linearly independent.
If we choose the parity check matrix P of C so that the rows of P are linearly independent, then P is a generator matrix for the dual code C⊥. We can characterize MDS codes in terms of their generator matrices.
Proposition 6.4. Let C be a linear [n, k]-code with generator matrix G. Then C is MDS if and only if C⊥ is MDS. Furthermore, we have that C is MDS if and only if any k columns of G are linearly independent.
Here is another beautiful characterization of MDS codes.
Proposition 6.5. Let C be an [n, k]-code with generator matrix G = (I_k | M) in left standard form. Then C is an MDS code if and only if every square submatrix of M is nonsingular.
The support of a vector x ∈ Fnq is the set of all coordinate positions where x is nonzero.
Our next result characterizes MDS codes in yet another way.
Proposition 6.6. A linear [n, k, d]-code C is an MDS code if and only if given any d
coordinate positions, there is a codeword whose support is precisely these positions.
Since MDS codes are very special, it is not surprising that the existence of such a code
puts strong constraints on the possible values of the parameters of the code.
Lemma 6.7. There are no nontrivial MDS [n, k]-codes for which 1 ≤ k ≤ n − q.
By applying Lemma 6.7 to the dual code C ⊥ , we get the dual result.
Corollary 6.8. There are no nontrivial MDS [n, k]-codes for which q ≤ k ≤ n.
Lemma 6.7 and Corollary 6.8 can be restated as the following.
Proposition 6.9. If a nontrivial MDS [n, k]-code exists, then n − q + 1 ≤ k ≤ q − 1.
This proposition rules out nontrivial binary MDS codes.
Corollary 6.10. The only binary MDS codes are the trivial codes.
One of the most important problems related to MDS codes is the following. Given k and
q, find the largest value of n for which there exists a q-ary MDS [n, k]-code. Let us denote
this value of n by m(k, q). According to Proposition 6.9, m(k, q) ≤ k + q − 1.
It is not difficult to construct a family of MDS codes. Let α1, . . . , αu be nonzero elements from a field. The Vandermonde matrix based on these elements is

V(α1, . . . , αu) = [ 1          1          · · ·  1
                     α1         α2         · · ·  αu
                     α1^2       α2^2       · · ·  αu^2
                     ...        ...               ...
                     α1^{u−1}   α2^{u−1}   · · ·  αu^{u−1} ]

Lemma 6.11. The determinant of the Vandermonde matrix is

det[V(α1, . . . , αu)] = ∏_{1 ≤ i < j ≤ u} (αj − αi).
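Since the formula is an identity over the integers, it also holds modulo a prime; the following Python sketch checks Lemma 6.11 over Z_7 by comparing a direct Leibniz expansion with the product formula (all helper names are ours):

    # Sketch: compare det(V) with the product of differences, mod a prime p.
    from itertools import permutations

    def det_mod(M, p):
        n, total = len(M), 0
        for perm in permutations(range(n)):
            inv = sum(perm[i] > perm[j]
                      for i in range(n) for j in range(i + 1, n))
            term = (-1) ** inv
            for i in range(n):
                term *= M[i][perm[i]]
            total += term
        return total % p

    p, alphas = 7, [1, 2, 3, 4]
    V = [[pow(a, i, p) for a in alphas] for i in range(len(alphas))]
    prod = 1
    for i in range(len(alphas)):
        for j in range(i + 1, len(alphas)):
            prod = prod * (alphas[j] - alphas[i]) % p
    assert det_mod(V, p) == prod % p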
Now let Fq = {0, α1, . . . , α_{q−1}} and consider the following (q − k + 1) × (q + 1) matrix obtained from a Vandermonde matrix by adding two additional columns:

H1 = [ 1          · · ·  1              1  0
       α1         · · ·  α_{q−1}        0  0
       α1^2       · · ·  α_{q−1}^2      0  0
       ...               ...            .  .
       α1^{q−k}   · · ·  α_{q−1}^{q−k}  0  1 ]

where 1 ≤ k ≤ q. Using Lemma 6.11, any q − k + 1 columns of H1 form a nonsingular matrix. Therefore, we have the following.

Proposition 6.12. For 1 ≤ k ≤ q, the matrix H1 is a parity check matrix of a q-ary MDS [q + 1, k]-code.
Notice that we cannot, in general, add additional columns to the matrix H1 and expect to get a parity check matrix for an MDS code. For instance, consider the matrix

H2 = [ 1     · · ·  1            1  0  0
       α1    · · ·  α_{q−1}      0  1  0
       α1^2  · · ·  α_{q−1}^2    0  0  1 ].

Choosing 2 columns among the first q − 1 columns along with the (q + 1)-th column, we get

[ 1     1     0
  αi    αj    1
  αi^2  αj^2  0 ].
This matrix has determinant αi^2 − αj^2. In order for this matrix to be nonsingular for every choice of distinct αi and αj, the field Fq must have the property that if α ≠ β then α^2 ≠ β^2. This says that the characteristic of Fq must be 2, that is, q must be a power of 2. We have the following.
Proposition 6.13. For q = 2m , the matrix H2 is the parity check matrix of a q-ary
MDS [q + 2, q − 1]-code.
Taking into account the dual codes, Propositions 6.12 and 6.13 give the following.

Corollary 6.14. For 1 ≤ k ≤ q, there exist q-ary MDS [q + 1, k]-codes and [q + 1, q − k + 1]-codes. For q = 2^m, there exist q-ary MDS [q + 2, q − 1]-codes and [q + 2, 3]-codes.
We remark that for k ≥ 3 and q odd, we can improve upon this slightly. Thus, for a nontrivial q-ary MDS [n, k]-code, with k ≥ 3 and q odd, we have n ≤ q + k − 2. At this point, we have gathered enough information to determine the value of m(3, q). In fact, we have

m(3, q) = q + 1 if q is odd, and m(3, q) = q + 2 otherwise.

It has been conjectured that, except for the case k = 3 and q even, if there exists a nontrivial MDS [n, k]-code, then m(k, q) = q + 1.
Hamming Codes
The Hamming codes Hq(h) are probably the most famous of all error-correcting codes. They are perfect, linear codes that decode in a very elegant manner.
For a given code alphabet Fq, we can construct a parity check matrix P with h rows and with the maximum possible number of columns such that no two of its columns are linearly dependent but some set of three columns is linearly dependent. First, pick any nonzero column v1 in F_q^h. Then pick any nonzero column v2 in F_q^h \ {αv1 | α ≠ 0}. We continue to pick nonzero columns in this way, at each step discarding all nonzero scalar multiples of the chosen columns, until every nonzero column has been either chosen or discarded. The result is a parity check matrix with (q^h − 1)/(q − 1) columns and with the properties we want.
The resulting matrix, known as a Hamming matrix of order h, has the following property.
Theorem 6.15. The Hamming matrix of order h is a parity check matrix of a q-ary linear [n, k, 3]-code with parameters

n = (q^h − 1)/(q − 1),   k = n − h,   d = 3.

This code Hq(h) is known as a q-ary Hamming code of order h. It is an exactly single-error-correcting perfect code.
Notice that the choice of columns is not unique and so there are many different Hamming
matrices and Hamming codes with a given set of parameters. However, any Hamming matrix
can be obtained from any other with the same parameters by permuting the columns and
multiplying some columns by nonzero scalars. Hence any two Hamming codes of the same
size are equivalent.
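The column-selection procedure is easy to carry out by machine; a Python sketch for prime q, keeping, from each set of parallel columns, the one whose first nonzero entry is 1 (the function name is ours):

    # Sketch: the columns of a Hamming matrix of order h over F_q, q prime.
    from itertools import product

    def hamming_matrix(q, h):
        cols = [v for v in product(range(q), repeat=h)
                if any(v) and v[next(i for i, x in enumerate(v) if x)] == 1]
        return [[c[i] for c in cols] for i in range(h)]   # h x n matrix

    P = hamming_matrix(2, 3)
    assert len(P[0]) == (2 ** 3 - 1) // (2 - 1)   # n = 7 columns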
Example 6.16. The binary case is by far the most common, where H2(h) is a binary linear [2^h − 1, 2^h − 1 − h, 3]-code. For instance, the parity check matrix for the binary Hamming code H2(3) is

H2(3) = [ 0 0 0 1 1 1 1
          0 1 1 0 0 1 1
          1 0 1 0 1 0 1 ]

Notice that the i-th column of H2(3) is simply the binary representation of i. Now, if a single error occurs in transmission in the i-th position, resulting in the error vector ei, the syndrome of the received word is ei H2(3)^t, which is just the i-th column of H2(3) written as a row.
The previous example leads to the following.
Proposition 6.17. If a codeword from the binary Hamming code H2(h) suffers a single error, resulting in the received string x, then the syndrome S(x) = x H2(h)^t is the binary representation of the position in x of the error.
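A Python sketch of this decoder, using the parity check matrix of Example 6.16 so that column i is the binary representation of i (the function name is ours):

    # Sketch: single-error correction in H2(3) via Proposition 6.17.
    H = [[0, 0, 0, 1, 1, 1, 1],
         [0, 1, 1, 0, 0, 1, 1],
         [1, 0, 1, 0, 1, 0, 1]]

    def correct(x):
        s = [sum(a * b for a, b in zip(x, row)) % 2 for row in H]
        pos = 4 * s[0] + 2 * s[1] + s[2]   # the syndrome as a binary number
        if pos:
            x = list(x)
            x[pos - 1] ^= 1                # flip the erroneous bit
        return x

    r = [1, 0, 0, 1, 0, 1, 0]              # codeword 1011010 with an error
    assert correct(r) == [1, 0, 1, 1, 0, 1, 0]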
In the non-binary case, we can do almost as well by choosing the columns of the parity check matrix Hq(h) in increasing size as q-ary numbers, but for which the first nonzero entry in each column is 1. For instance, the parity check matrix for H3(3) is

H3(3) = [ 0 0 0 0 1 1 1 1 1 1 1 1 1
          0 1 1 1 0 0 0 1 1 1 2 2 2
          1 0 1 2 0 1 2 0 1 2 0 1 2 ]

Now, if an error occurs in the i-th position, the error will have the form αei for some nonzero scalar α. Hence, the syndrome is α ei Hq(h)^t, which is α times the i-th column of Hq(h) written as a row. Because of the way Hq(h) was constructed, we see that α is the first nonzero entry in the syndrome. Multiplying the syndrome by α^{−1} will give us the i-th column of Hq(h), telling us the position of the error.
Since the Hamming codes have some special properties, it is not surprising that their dual codes also have special properties. We will restrict attention to binary codes. The dual of the binary Hamming code H2(h) is called the simplex code S(h). Since the rows of the parity check matrix H2(h) for H2(h) are linearly independent, H2(h) is a generator matrix for S(h).
The simplex code S(h) is a [2^h − 1, h]-code. To determine the distance properties of the simplex codes, we observe that the generator matrix H2(h + 1) can be obtained from two copies of the matrix H2(h) as follows:

H2(h + 1) = [ 0 · · · 0  |  1  |  1 · · · 1
                H2(h)    |  0  |   H2(h)   ]

where the 0 in the middle of the second block row denotes a column of h zeros.
Now, any codeword c ∈ S(h + 1) is a sum of some of the rows of H2(h + 1). Hence, c = x α y, where x is a sum of rows of H2(h) and is therefore a codeword in S(h), α is either 0 or 1, and y is equal to x or x^c (the complement of x), depending upon whether or not the first row of H2(h + 1) is included in the sum. These cases are summarized in the following theorem, which completely describes the simplex codes.
Theorem 6.18. The simplex code S(h) can be described as follows.
(1) S(2) = {000, 011, 101, 110}.
(2) For any integer h ≥ 2,

S(h + 1) = {x 0 x | x ∈ S(h)} ∪ {x 1 x^c | x ∈ S(h)}.

Furthermore, d(c, d) = 2^{h−1} for every pair of distinct codewords c and d in S(h).
Theorem 6.18 explains why the codes S(h) are referred to as simplex codes. The line
segments connecting the codewords form a regular simplex.
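The recursion in Theorem 6.18 translates directly into code; a Python sketch that also confirms the constant-distance property for S(3) (the function name is ours):

    # Sketch: generate S(h) recursively and check all distances are 2^(h-1).
    def simplex(h):
        if h == 2:
            return [(0,0,0), (0,1,1), (1,0,1), (1,1,0)]
        S = simplex(h - 1)
        return ([x + (0,) + x for x in S] +
                [x + (1,) + tuple(1 - b for b in x) for x in S])

    S3 = simplex(3)
    assert all(sum(a != b for a, b in zip(c, d)) == 4
               for c in S3 for d in S3 if c != d)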
Golay Codes
There are a total of four Golay codes: two binary codes and two ternary codes. We will define these codes by giving generator matrices, as did Marcel Golay in 1949.
The Binary Golay Code G24. The binary Golay code G24 is a [24, 12]-code whose generator matrix has the form G = [I12 | A], where

A = [ 0 1 1 1 1 1 1 1 1 1 1 1
      1 1 1 0 1 1 1 0 0 0 1 0
      1 1 0 1 1 1 0 0 0 1 0 1
      1 0 1 1 1 0 0 0 1 0 1 1
      1 1 1 1 0 0 0 1 0 1 1 0
      1 1 1 0 0 0 1 0 1 1 0 1
      1 1 0 0 0 1 0 1 1 0 1 1
      1 0 0 0 1 0 1 1 0 1 1 1
      1 0 0 1 0 1 1 0 1 1 1 0
      1 0 1 0 1 1 0 1 1 1 0 0
      1 1 0 1 1 0 1 1 1 0 0 0
      1 0 1 1 0 1 1 1 0 0 0 1 ]
We will show that G24 has minimum weight 8.
It is straightforward to check that, if r and s are rows of G, then r · s = 0. Hence G24 ⊆ G24⊥. Since G24 and G24⊥ both have dimension 12, they must be equal. By Proposition 5.17, we have that the matrix [A^t | I12] is a parity check matrix for G24. Since G24 is self-dual and A is a symmetric matrix, we have the following.
Lemma 6.19. Let [I12 | A] be the generator matrix for the Golay code G24 in left standard
form. Then the matrix [A | I12 ] is also a generator matrix for G24 .
We take advantage of the two generator matrices G1 = [I12 | A] and G2 = [A | I12] for G24. Suppose that c ∈ G24 is nonzero and write c = xy, where x and y are the first and last twelve coordinates. We have that w(x) ≥ 1 and w(y) ≥ 1. Suppose that w(c) = 4. If w(x) = 1, then c must be a row of G1, and similarly, if w(y) = 1, then c must be a row of G2. None of these has weight 4. This leaves only the possibility w(x) = w(y) = 2, which can be ruled out by checking that no sum of any two rows of G1 has weight 4. Hence, there is no codeword in G24 of weight 4. However, if r and s are rows of G1, then w(r + s) ≡ w(r) + w(s) (mod 4). This implies that the weight of every codeword in G24 is divisible by 4. We can now state the following.
Theorem 6.20. The binary Golay code G24 is a [24, 12, 8]-code.
Since G24 is a [24, 12, 8]-code, syndrome decoding would require that we construct 2^24/2^12 = 4096 syndromes. On the other hand, using the structure of G24, we can considerably reduce the work involved in decoding.
Since G24 is self-dual, the matrices G1 = [I12 | A] and G2 = [A | I12] are both parity check matrices for G24. Suppose that 3 or fewer errors occur in the transmission of a codeword, and let x be the received string and e be the error string. Let us write e = fg. We can compute the syndromes of the received string using both parity check matrices as follows: S1(x) = x G1^t = f + gA and S2(x) = x G2^t = fA + g. Now let us examine the possibilities.
(1) If w(f) = 0, then e = 0g = 0 S2(x) and w(S1(x)) ≥ 5, w(S2(x)) ≤ 3.
(2) If w(g) = 0, then e = f0 = S1(x) 0 and w(S1(x)) ≤ 3, w(S2(x)) ≥ 5.
(3) If w(f) ≥ 1 and w(g) ≥ 1, then w(S1(x)) ≥ 5 and w(S2(x)) ≥ 5.
Thus, if either syndrome has weight at most 3, we can easily recover the error string e. If w(S1(x)) and w(S2(x)) are both greater than 3, we know that one of the following holds.
(a) w(f) = 1 and w(g) = 1 or 2: We have f = ei, where ei is the string with 1 in the i-th position and zeros elsewhere. Consider yu = (x + eu 0) G2^t = (ei g + eu 0) G2^t = ei A + g + eu A. Then w(yu) = 1 or 2 precisely when u = i; otherwise, w(yu) ≥ 4. Thus, we can determine both the error position i and the second half g by looking at the 12 strings y1, . . . , y12.
(b) w(f) = 2 and w(g) = 1: We have f = ei + ej for some i ≠ j. Consider yu = (x + eu 0) G2^t = ei A + ej A + g + eu A. Then w(yu) ≥ 4 for all u = 1, . . . , 12. In this case, we use a similar computation using G1^t. Because g = eλ for some λ, we have zu = (x + 0 eu) G1^t = f + eλ A + eu A, which has weight w(zu) = 2 if u = λ and weight w(zu) ≥ 5 for u ≠ λ. Thus we may easily pick out f = zλ and the error position λ by looking at the 12 strings z1, . . . , z12.
In summary, if at most three errors occur, then we can decode correctly by computing at most the 26 syndromes

x G1^t, x G2^t, (x + e1 0) G2^t, . . . , (x + e12 0) G2^t, (x + 0 e1) G1^t, . . . , (x + 0 e12) G1^t.
The Binary Golay Code G23 . The binary Golay code G23 is obtained by puncturing
the code G24 in its last coordinate position. (We remark that puncturing the code G24 in
any of its coordinate positions will lead to an equivalent code.) The resulting punctured
code has length 23 and since the distance between codewords in G24 is greater than 1, all
of the punctured codewords are distinct, so G23 has the same size as G24 . It is clear that
puncturing a code cannot increase the minimum distance nor decrease it by more than 1
and so d(G23 ) = 7 or 8. But the parameters [23, 12, 7] satisfy the sphere-packing condition
and so d(G23 ) = 7.
Theorem 6.21. The binary Golay code G23 is a perfect binary [23, 12, 7]-code.
We will see that the code G23 can also be defined as a cyclic code, and this leads to
efficient decoding procedures for G23 .
The Ternary Golay Codes. The ternary Golay code G12 is the code with generator matrix G = [I6 | B] where

B = [ 0 1 1 1 1 1
      1 0 1 2 2 1
      1 1 0 1 2 2
      1 2 1 0 1 2
      1 2 2 1 0 1
      1 1 2 2 1 0 ]
As with the binary Golay code, the ternary Golay code G12 is self-dual and it is also
generated by the matrix [B | I6 ]. We can also construct the ternary Golay code G11 by
puncturing G12 in its last coordinate position.
Theorem 6.22. The ternary Golay code G12 is a [12, 6, 6]-code and the ternary Golay
code G11 is a perfect [11, 6, 5]-code.
Many coding theorists have established uniqueness results for the Golay codes. Their results can be summarized by saying that any code (linear or nonlinear) that has the parameters of a Golay code is equivalent to a Golay code.
We also mention another remarkable result concerning the existence of perfect codes. As we have seen, the codes consisting of a single codeword, the entire space, and the repetition codes are all perfect. These are referred to as the trivial perfect codes.
Theorem 6.23. For alphabets of prime power size, all nontrivial perfect codes C have
the parameters of either a Hamming code or a Golay code. Furthermore,
(1) if C has the parameters of a Golay code, then it is equivalent to that Golay code.
(2) if C is linear and has the parameters of a Hamming code, then it is equivalent to
that Hamming code. However, there are nonlinear perfect codes with the Hamming
parameters.
However, over any alphabet, the only nontrivial t-error-correcting perfect code with t ≥ 3 is
the binary Golay code G23 .
Notice that there are some gaps in Theorem 6.23. With regard to alphabets of prime power size, it is not known how many nonequivalent, nonlinear perfect codes there are with the Hamming parameters. In 1962, Vasil'ev discovered a family of such codes, which we discuss in the exercises. More generally, it is still not known whether there are perfect double-error-correcting codes over any alphabet whose size is not a power of a prime. (It is conjectured that there are none.) The issue of how many nonequivalent single-error-correcting perfect codes may exist seems to be extremely difficult.
Reed-Muller Codes
Reed-Muller codes are one of the oldest families of codes and have been widely used in
applications. For each positive integer m and each integer r satisfying 0 ≤ r ≤ m, the r-th
order Reed-Muller code R(r, m) is a binary linear [n, k, d]-code with parameters
n = 2^m,   k = 1 + C(m, 1) + · · · + C(m, r),   d = 2^{m−r},

where C(m, i) denotes the binomial coefficient.
At first, we restrict attention to the first order Reed-Muller codes R(m), which are binary linear [2^m, m + 1, 2^{m−1}]-codes.
Definition 6.24. The Reed-Muller codes R(m) are binary codes defined for all integers m ≥ 1 as follows.
(1) R(1) = Z_2^2 = {00, 01, 10, 11}.
(2) For m ≥ 1, R(m + 1) = {uu | u ∈ R(m)} ∪ {uu^c | u ∈ R(m)}.
In words, the codewords in R(m + 1) are formed by juxtaposing each codeword in R(m)
with itself and with its complement.
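A Python sketch of Definition 6.24, with codewords as tuples and complements taken bitwise (the function name is ours):

    # Sketch: build R(m) recursively by juxtaposition, per Definition 6.24.
    def reed_muller(m):
        if m == 1:
            return [(0, 0), (0, 1), (1, 0), (1, 1)]
        R = reed_muller(m - 1)
        return ([u + u for u in R] +
                [u + tuple(1 - b for b in u) for u in R])

    R3 = reed_muller(3)
    assert len(R3) == 2 ** 4 and len(R3[0]) == 2 ** 3   # a [8, 4] code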
To demonstrate the virtues of an inductive definition, note that R(1) is a linear [2, 2, 1]-code in which every codeword except 00 and 11 has weight 2^0 = 1. We can easily extend this statement to the other Reed-Muller codes by induction.
Theorem 6.25. For m ≥ 1, the Reed-Muller code R(m) is a linear [2^m, m + 1, 2^{m−1}]-code for which every codeword except 0 and 1 has weight 2^{m−1}.
The inductive definition of R(m) also allows us to define generator matrices for these codes. If Rm is a generator matrix for R(m), then a generator matrix for R(m + 1) is

R_{m+1} = [ 0 · · · 0   1 · · · 1
              Rm          Rm      ]
We can describe the generator matrices Rm directly, both in terms of their rows and their
columns.
The first row of Rm consists of a block of 2^{m−1} 0s followed by a block of 2^{m−1} 1s:

0 · · · 0 1 · · · 1    (blocks of length 2^{m−1})

The next row of Rm consists of alternating blocks of 0s and 1s of length 2^{m−2}:

0 · · · 0 1 · · · 1 0 · · · 0 1 · · · 1    (blocks of length 2^{m−2})

In general, the i-th row of Rm consists of alternating blocks of 0s and 1s of length 2^{m−i}. The last row of Rm is a row of all 1s.
The columns of Rm can be described as follows. Excluding the last row of Rm, the columns of Rm consist of all possible binary strings of length m, which, read from the top down as binary numbers, are 0, 1, . . . , 2^m − 1, in this order.
It is interesting to compare the characteristics of the Reed-Muller codes with those of the Hamming codes. For approximately the same codeword length, the size of a Reed-Muller code is significantly smaller than that of a Hamming code. With Hamming codes, we pay for the large code size with a minimum distance of only 3. For the Reed-Muller codes, the relatively large minimum distance grows along with the code size.
Since R(m) is a [2^m, m + 1, 2^{m−1}]-code, it is capable of correcting 2^{m−2} − 1 errors. However, a standard array for R(m) has 2^{2^m − m − 1} rows. Thus, decoding using a syndrome table is time consuming, even for small values of m.
We will describe a special type of majority logic decoding, called Reed decoding, that applies to Reed-Muller codes. Let Rm be the generator matrix for R(m) defined above and denote the rows of Rm by r1, . . . , r_{m+1}. Then for a codeword c = c1 · · · cn, we have

c = α1 r1 + · · · + αm rm + α_{m+1} r_{m+1},

for some scalars αi ∈ F2. Fixing i, we would like to find strings xi ∈ F_2^n such that xi · rj = δij. Suppose that rj = rj1 · · · rjn. We have that (eu + ev) · rj = rju + rjv. Therefore, if eu + ev is to be our candidate for xi, then we must have rju = rjv if j ≠ i and riu ≠ riv. Thus, for each row i, we want a pair of columns that are identical except in their i-th row. We refer to such a pair of columns as a good pair for the i-th row. Note that the last row of Rm consists of all 1s, so the last row has no good pair. On the other hand, the last row will never give any trouble in finding good pairs for the other rows, and so, for now, we can simply ignore the last row.
In fact, let R′m be the matrix obtained from Rm by removing the last row. The columns of R′m consist of the binary representations of the numbers 0, 1, . . . , 2^m − 1, in this order. Hence if rju = rjv for j ≠ i and riu = 0, riv = 1, then u and v must be a distance 2^{m−i} apart. In particular, there are exactly 2^{m−1} good pairs for each row.
Now imagine that a codeword

c = α1 r1 + · · · + αm rm + α_{m+1} r_{m+1}

is sent. Using the 2^{m−1} good pairs for row i, we get 2^{m−1} expressions for αi (for i ≤ m). Specifically, if c = c1 · · · cn and columns u and v form a good pair for row i, then αi = (eu + ev) · c = cu + cv. Each of these 2^{m−1} expressions for αi involves different positions in the codeword c. Thus, if no more than 2^{m−2} − 1 errors occur, then at most 2^{m−2} − 1 of the coordinates cj are incorrect, and so at most 2^{m−2} − 1 of the expressions for αi are incorrect. This means that at least 2^{m−1} − (2^{m−2} − 1) = 2^{m−2} + 1 of these expressions give the correct value of αi. It follows that we can get the correct value of αi by computing the 2^{m−1} expressions for αi and taking the majority value.
The final step is to obtain the coefficient α_{m+1}. If at most 2^{m−2} − 1 errors have occurred in receiving x, then the error string e = x − c has weight at most 2^{m−2} − 1. Letting d = α1 r1 + · · · + αm rm, we have x − d = α_{m+1} r_{m+1} + e = α_{m+1} 1 + e. There are two possibilities. If α_{m+1} = 0, then e = x − d, and if α_{m+1} = 1, then e = (x − d)^c. Thus, if w(x − d) ≤ 2^{m−2} − 1, we decode α_{m+1} as 0, and if w((x − d)^c) ≤ 2^{m−2} − 1, we decode α_{m+1} as 1.
Example 6.26. Suppose that a codeword from R(3) is sent and the received string is x = 11011100. Consider the generator matrix

R3 = [ 0 0 0 0 1 1 1 1
       0 0 1 1 0 0 1 1
       0 1 0 1 0 1 0 1
       1 1 1 1 1 1 1 1 ]

For row 1 we have 2^{3−1} = 4, so the good pairs are (1, 5), (2, 6), (3, 7), (4, 8).
For row 2 we have 2^{3−2} = 2, so the good pairs are (1, 3), (2, 4), (5, 7), (6, 8).
For row 3 we have 2^{3−3} = 1, so the good pairs are (1, 2), (3, 4), (5, 6), (7, 8).
Thus, if

c = c1 · · · c8 = α1 r1 + α2 r2 + α3 r3 + α4 r4,

the expressions for α1 are

α1 = c1 + c5 = c2 + c6 = c3 + c7 = c4 + c8.

The majority logic decision is α1 = 0 and similarly, α2 = 1 and α3 = 0. Thus,

x − (α1 r1 + α2 r2 + α3 r3) = 11011100 − r2 = 11101111.

Since the complement of this string has weight 1 ≤ 2^{3−2} − 1, we decode α4 as 1. It follows that the codeword sent is 11001100.
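The whole Reed decoding procedure for a first order code fits in a few lines; a Python sketch with 0-based positions, where x is the received string of length 2^m (the function name is ours):

    # Sketch: Reed (majority logic) decoding for the first order code R(m).
    def reed_decode(x, m):
        n = len(x)
        coeffs = []
        for i in range(1, m + 1):
            step = 2 ** (m - i)            # good pairs lie this far apart
            votes = [x[u] ^ x[u + step]
                     for u in range(n) if (u // step) % 2 == 0]
            coeffs.append(1 if 2 * sum(votes) > len(votes) else 0)
        # subtract the decoded part alpha_1 r_1 + ... + alpha_m r_m,
        # then decide alpha_{m+1} by the weight of the residual
        rows = [[(j // 2 ** (m - i)) % 2 for j in range(n)]
                for i in range(1, m + 1)]
        d = [sum(a * r[j] for a, r in zip(coeffs, rows)) % 2 for j in range(n)]
        residual = [a ^ b for a, b in zip(x, d)]
        coeffs.append(1 if 2 * sum(residual) > n else 0)
        return coeffs

    print(reed_decode([1, 1, 0, 1, 1, 1, 0, 0], 3))   # [0, 1, 0, 1]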
Now, we introduce the higher order Reed-Muller codes R(r, m). In order to introduce these codes, we begin with a discussion of Boolean functions and Boolean polynomials.

Definition 6.27. A Boolean function of m variables x1, . . . , xm is a function f(x1, . . . , xm) from F_2^m to F2.
(1) A Boolean monomial in m variables x1 , . . . , xm of degree s is an expression of the
form
g(x1 , . . . , xm ) = xi1 xi2 · · · xis ,
1 ≤ i1 < i2 < · · · < is ≤ m.
(2) A Boolean polynomial in m variables x1 , . . . , xm is a linear combination of Boolean
monomials in these variables with coefficients in F2 . The degree of a Boolean polynomial g is the largest of the degrees of the Boolean monomials that form g.
The set Bm of all Boolean functions of m variables forms a vector space of size 2^{2^m} over F2. The set of all Boolean polynomials in m variables is also a vector space over F2, as is the set Bm,r of all Boolean polynomials in m variables of degree at most r.
Since there are C(m, s) distinct Boolean monomials of degree s in m variables, the total number of distinct Boolean monomials is

1 + C(m, 1) + · · · + C(m, m) = 2^m,

and the total number of distinct Boolean polynomials in m variables is 2^{2^m}. This also happens to be the total number of Boolean functions of m variables, which is no mere coincidence.
Proposition 6.28. For every Boolean function f(x1, . . . , xm) in Bm, there is a unique Boolean polynomial g(x1, . . . , xm) for which f(α1, . . . , αm) = g(α1, . . . , αm) for all (α1, . . . , αm) ∈ F_2^m.

If we always agree to list the variables in the same order, we obtain a one-to-one correspondence between Boolean functions f ∈ Bm and binary strings af of length 2^m.
Example 6.29. Suppose that f ∈ B3 is given by the following table.

x1 x2 x3 | f
 0  0  0 | 0
 0  0  1 | 1
 0  1  0 | 1
 0  1  1 | 0
 1  0  0 | 0
 1  0  1 | 0
 1  1  0 | 1
 1  1  1 | 1

We obtain the binary string af = 01100011. Using a convenient abuse of notation and writing a binary string in place of the corresponding polynomial, we have

01100011 = 0110 + x1 (0011 − 0110)
         = 0110 + x1 (0101)
         = 01 + x2 (10 − 01) + x1 (01 + x2 (01 − 01))
         = 01 + x2 (11) + x1 (01 + x2 (00))
         = x3 + x2 (1) + x1 (x3 + x2 (0))
         = x3 + x2 + x1 x3
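The halving procedure of this example is a simple recursion on the string af; a Python sketch representing a polynomial as a set of monomials, each monomial a tuple of variable names (the function name is ours):

    # Sketch: recover the Boolean polynomial via f = f0 + x1(f1 - f0).
    def to_polynomial(a, names):
        if len(a) == 1:
            return {()} if a[0] else set()
        half = len(a) // 2
        f0, f1 = a[:half], a[half:]
        diff = tuple(u ^ v for u, v in zip(f0, f1))
        p0 = to_polynomial(f0, names[1:])
        p1 = to_polynomial(diff, names[1:])
        return p0 ^ {(names[0],) + mono for mono in p1}  # XOR of monomials

    print(to_polynomial((0, 1, 1, 0, 0, 0, 1, 1), ('x1', 'x2', 'x3')))
    # {('x2',), ('x3',), ('x1', 'x3')}, i.e. x2 + x3 + x1*x3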
Definition 6.30. Let 0 ≤ r ≤ m. The r-th order Reed-Muller code R(r, m) is the set of all binary strings ag of length n = 2^m associated with the Boolean polynomials g ∈ Bm,r.
Example 6.31.
(1) The 0-th order Reed-Muller code R(0, m) consists of the binary strings associated with the constant polynomials 0 and 1, that is, R(0, m) = Rep(2^m). On the other extreme, the m-th order Reed-Muller code R(m, m) consists of all binary strings of length 2^m.
(2) The first order Reed-Muller code of length n = 2^2 is the set of all binary strings associated with the Boolean polynomials p of the form α0 + α1 x1 + α2 x2, where αi = 0 or 1. Thus, we can list the codewords in R(1, 2) as follows:

Polynomial     Codeword
0              0000
x1             0011
x2             0101
x1 + x2        0110
1              1111
1 + x1         1100
1 + x2         1010
1 + x1 + x2    1001
The Reed-Muller codes can be obtained using the u(u + v)-construction. Recall that if C1 is an (n, M1, d1)-code and C2 is an (n, M2, d2)-code, then the u(u + v)-construction yields a code C1 ⊕ C2 defined by

C1 ⊕ C2 = {c(c + d) | c ∈ C1, d ∈ C2},

which is a (2n, M1 M2, d)-code with d = min{2d1, d2}.
Suppose that 0 < r < m and consider a codeword af ∈ R(r, m), where f ∈ Bm,r. We can factor the variable x1 from those terms in which it appears and write f in the form

f(x1, . . . , xm) = x1 g(x2, . . . , xm) + h(x2, . . . , xm),

where g ∈ B_{m−1,r−1} and h ∈ B_{m−1,r}. Let ag ∈ R(r − 1, m − 1) and ah ∈ R(r, m − 1) be the binary strings corresponding to the polynomials g and h, respectively. The string corresponding to x1 g(x2, . . . , xm) is 0ag, and if we think of h as a Boolean polynomial in m variables x1, . . . , xm, then the string corresponding to h is ah ah. Hence, the string corresponding to f is

af = 0ag + ah ah = ah (ah + ag).
Theorem 6.32. For the Reed-Muller codes R(r, m), we have
(1) R(0, m) = Rep(2^m),
(2) R(m, m) = F_2^n, where n = 2^m,
(3) for 0 < r < m,

R(r, m) = R(r, m − 1) ⊕ R(r − 1, m − 1),

where ⊕ denotes the u(u + v)-construction. In particular, R(r, m) has minimum distance 2^{m−r}.
Corollary 6.33. For r < m, R(r, m) contains codewords of even weight only.
Let af ∈ R(r, m) and ag ∈ R(m − r − 1, m) with f ∈ Bm,r and g ∈ B_{m,m−r−1}. Observe that af · ag ≡ w(a_{fg}) (mod 2). Since deg(fg) ≤ deg(f) + deg(g) ≤ m − 1, we have a_{fg} ∈ R(m − 1, m). According to Corollary 6.33, w(a_{fg}) is even, which implies that af · ag = 0.
Theorem 6.34. For 0 < r < m − 1, we have
R(r, m)⊥ = R(m − r − 1, m).
Exercise
(1) Let C be a q-ary MDS [n, k]-code and let Aw be the number of codewords in C of weight w. Show that

A_d = (q − 1) C(n, n − k + 1).
(2) We remarked in Theorem 6.23 that if C is a linear code with the same parameters as the Hamming code H2(h), then C is equivalent to H2(h). We now construct a binary nonlinear code V(h) with the same parameters as a binary Hamming code.
    Let n = 2^h − 1 and let f : H2(h) → Z2 be a nonlinear function with f(0) = 0. Let π : Z_2^n → Z2 be the function defined by

    π(x) = 0 if w(x) ≡ 0 (mod 2), and π(x) = 1 otherwise.

    Now let

    V(h) = {x (x + c) (π(x) + f(c)) | x ∈ Z_2^n, c ∈ H2(h)}.

    Show that V(h) is a binary [2^{h+1} − 1, 2^{h+1} − h − 2, 3]-code. Show also that V(h) is nonlinear.
(3) In this exercise, we define and discuss the Nordstrom-Robinson code. This code has the interesting property that it has strictly larger minimum distance than any linear code with the same length and size.
    (a) Let G = [I12 | A] be the generator matrix of G24. Show that, by permuting columns and using elementary row operations, the matrix G can be brought to the form

    G0 = [ 1 0 0 0 0 0 0 1 | *
           0 1 0 0 0 0 0 1 | *
           0 0 1 0 0 0 0 1 | *
           0 0 0 1 0 0 0 1 | *
           0 0 0 0 1 0 0 1 | *
           0 0 0 0 0 1 0 1 | *
           0 0 0 0 0 0 1 1 | *
           0 0 0 0 0 0 0 0 | *
           0 0 0 0 0 0 0 0 | *
           0 0 0 0 0 0 0 0 | *
           0 0 0 0 0 0 0 0 | *
           0 0 0 0 0 0 0 0 | * ]

    where the asterisks represent some values in the last 16 columns.
    (b) Let C be the code generated by the generator matrix G0. Show that there are 8 × 2^5 = 256 codewords in C whose first eight coordinates are one of

    10000001, 01000001, 00100001, 00010001,
    00001001, 00000101, 00000011, 00000000.
    (c) The Nordstrom-Robinson code N is the code whose codewords are obtained from the 256 codewords obtained above by deleting the first eight coordinate positions. Show that N is a (16, 256, 6)-code.
    (d) Show that there is no linear (16, 256, 6)-code.
(4) Assuming the Reed-Muller code R(4) is used, decode the received word 0111011011100010.
(5) Find the Boolean polynomial corresponding to the binary string 1101111000011001.
(6) Show that for any Boolean function f(x1, . . . , x_{m−1}), the function xm + f(x1, . . . , x_{m−1}) takes on the values 0 and 1 equally often.
(7) Find an expression for a generator matrix for R(r, m) in terms of generator matrices for R(r, m − 1) and R(r − 1, m − 1).
CHAPTER 7
Cyclic Codes
Basic Definitions
Definition 7.1. The right cyclic shift of a string x = x1 · · · xn−1 xn is the string xn x1 · · · xn−1
obtained by shifting each element to the right one position, wrapping the last element around
to the first position.
A linear code C is cyclic if whenever c ∈ C then the right cyclic shift of c is also in C.
As an immediate consequence of this definition, if C is a cyclic code and c ∈ C, then the
string obtained by shifting the elements of c any number of positions with wrapping is also
a codeword in C.
Example 7.2. The binary code D = {0000, 1001, 0110, 1111} is not cyclic, since shifting 1001 gives 1100, which is not in D. However, D is equivalent to a cyclic code C =
{0000, 1010, 0101, 1111}.
To get a better understanding of cyclic codes, it pays to think of strings as polynomials. In particular, to each string c = c0 c1 · · · c_{n−1}, we associate the polynomial c0 + c1 x + c2 x^2 + · · · + c_{n−1} x^{n−1}. Note that addition and scalar multiplication of strings correspond to the analogous operations for polynomials. Thus, we may think of a linear code C of length n over Fq as a subspace of the space Pn(Fq) of polynomials of degree less than n with coefficients in Fq.
We can express the process of performing a right cyclic shift in terms of operations on polynomials. Notice that multiplying a codeword p(x) = c0 + c1 x + · · · + c_{n−1} x^{n−1} by x gives x p(x) = c0 x + c1 x^2 + · · · + c_{n−1} x^n, which has some resemblance to a right cyclic shift, and indeed would be a right cyclic shift if we replaced x^n by x^0 = 1.
Let Rn(Fq) = Fq[x]/(x^n − 1). Recall that Rn(Fq) is the set of all polynomials over Fq of degree less than n. Addition in Rn(Fq) is the usual addition of polynomials, and multiplication is ordinary multiplication of polynomials followed by division by x^n − 1, keeping only the remainder. Note that taking the product modulo x^n − 1 is very easy, since we simply take the ordinary product and then replace x^n by 1. As an example, in R4(F2),

(x^3 + x^2 + 1)(x^2 + 1) = x^5 + x^4 + x^3 + 1 = x · 1 + 1 + x^3 + 1 = x^3 + x.

It is also important to note that, since x^n − 1 is not irreducible in Fq[x], the product of nonzero polynomials may equal the zero polynomial.
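Multiplication in Rn(Fq) amounts to convolving coefficient lists with exponents reduced mod n; a Python sketch reproducing the computation above (the function name is ours):

    # Sketch: multiply in R_n(F_q); index i holds the coefficient of x^i.
    def mult_mod(a, b, n, q):
        c = [0] * n
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                c[(i + j) % n] = (c[(i + j) % n] + ai * bj) % q
        return c

    # (x^3 + x^2 + 1)(x^2 + 1) in R_4(F_2):
    print(mult_mod([1, 0, 1, 1], [1, 0, 1, 0], 4, 2))   # [0, 1, 0, 1] = x + x^3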
We can now think of a linear code C over Fq as a subspace of the vector space Rn (Fq ). In
addition, if p(x) ∈ C, then the right cyclic shift of p(x) is the polynomial xp(x). In general,
applying k right cyclic shifts is equivalent to multiplying p(x) by xk .
Lemma 7.3. A linear code C ⊆ Rn (Fq ) is cyclic if and only if p(x) ∈ C implies that
f (x)p(x) ∈ C for any f (x) ∈ Rn (Fq ).
In the language of abstract algebra, the set Rn (Fq ), together with the operations of
addition, scalar multiplication and multiplication modulo xn − 1 is an algebra over Fq . Any
subset C of Rn (Fq ) that is a vector subspace and also has the property described in Lemma
7.3 is called an ideal of Rn (Fq ). That is, the cyclic codes in Rn (Fq ) are precisely the ideals
of Rn (Fq ).
An ideal C of Rn (Fq ) is called a principal ideal if there exists a polynomial g(x) ∈ C such that
C = ⟨g(x)⟩ = {f (x)g(x) | f (x) ∈ Rn (Fq )}.
The following theorem, in the language of abstract algebra, says that every ideal of Rn (Fq ) is principal, that is, Rn (Fq ) is a principal ideal ring. (It is not a domain, since it has zero divisors.)
Theorem 7.4. Let C be a cyclic code in Rn (Fq ). Then there is a unique polynomial g(x)
in C that is both monic and has the smallest degree among all nonzero polynomials in C.
Moreover, C = ⟨g(x)⟩.
The unique polynomial mentioned in Theorem 7.4 is called the generator polynomial of
C.
Example 7.5. For the binary cyclic code C = {0, 1 + x2 , x + x2 , 1 + x}, we have
0 = 0 · (1 + x)
x + x2 = x · (1 + x)
1 + x2 = x2 · (1 + x)
1 + x = 1 · (1 + x)
and so C = ⟨1 + x⟩. Since 1 + x has minimum degree in C, it is the generator polynomial
for C. Notice also that
0 = 0 · (1 + x)
1 + x2 = 1 · (1 + x2 )
x + x2 = x2 · (1 + x2 )
1 + x = x · (1 + x2 )
and so C is also generated by the polynomial 1 + x2 . However, since 1 + x2 does not have
minimum degree in C, it is not the generator polynomial for C.
It is very easy to characterize those polynomials that are generator polynomials.
Proposition 7.6. A monic polynomial p(x) ∈ Rn (Fq ) is the generator polynomial of a
cyclic code in Rn (Fq ) if and only if it divides xn − 1.
Proposition 7.6 is very important, for it tells us that there is precisely one cyclic code in Rn (Fq ) for each monic divisor of xn − 1. Thus, we can find all cyclic codes in Rn (Fq ) by factoring xn − 1.
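Assuming access to a computer algebra system such as sympy, this enumeration can be carried out directly. The sketch below (the function name is ours, and we assume gcd(n, q) = 1 so that xn − 1 is squarefree) lists one generator polynomial per monic divisor of xn − 1:

```python
from functools import reduce
from itertools import combinations
from sympy import Poly, factor_list, symbols

x = symbols('x')

def cyclic_code_generators(n, q=2):
    # One cyclic code in R_n(F_q) per monic divisor of x^n - 1, i.e. per
    # subset of its irreducible factors (assumed distinct: gcd(n, q) = 1).
    _, factors = factor_list(x**n - 1, x, modulus=q)
    irred = [Poly(f, x, modulus=q) for f, _ in factors]
    one = Poly(1, x, modulus=q)
    return [reduce(lambda a, b: a * b, subset, one)
            for r in range(len(irred) + 1)
            for subset in combinations(irred, r)]

# The 2^3 = 8 cyclic codes of length 9 over F_2 (x^9 - 1 has 3 factors):
for g in cyclic_code_generators(9):
    print(g.as_expr())
```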
We have seen that if g(x) is the generator polynomial for a cyclic code C, then C consists
of all polynomial multiples of g(x). We can easily obtain a basis for C from g(x).
Theorem 7.7. Let g(x) = g0 + g1 x + · · · + gk xk be the generator polynomial of a nonzero
cyclic code C in Rn (Fq ).
(1) C has basis
B = {g(x), xg(x), . . . , xn−k−1 g(x)}
(2) C has dimension n − deg(g(x)). In fact,
C = {r(x)g(x) | deg(r(x)) < n − k}.
(3) C has generator matrix
$$G = \begin{pmatrix}
g_0 & g_1 & \cdots & g_k & 0 & 0 & \cdots & 0 \\
0 & g_0 & g_1 & \cdots & g_k & 0 & \cdots & 0 \\
0 & 0 & g_0 & g_1 & \cdots & g_k & \cdots & 0 \\
\vdots & \vdots & & \ddots & \ddots & & \ddots & \vdots \\
0 & 0 & \cdots & 0 & g_0 & g_1 & \cdots & g_k
\end{pmatrix}$$
whose n − k rows each consist of a right cyclic shift of the row above.
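As a quick illustration, here is a sketch (using numpy; the helper name is ours) that builds G by placing the coefficients of g(x) in the first row and shifting:

```python
import numpy as np

def generator_matrix(g, n, q=2):
    # g = [g0, g1, ..., gk]: coefficients of the generator polynomial.
    # Row i carries x^i g(x); there are n - k rows (Theorem 7.7).
    k = len(g) - 1
    G = np.zeros((n - k, n), dtype=int)
    for i in range(n - k):
        G[i, i:i + k + 1] = g
    return G % q

# g(x) = 1 + x + x^3 generates a binary cyclic [7, 4]-code.
print(generator_matrix([1, 1, 0, 1], 7))
```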
We have seen that the generator polynomial g(x) of a cyclic code C ⊆ Rn (Fq ) divides xn − 1. Hence, we can write xn − 1 = h(x)g(x), where h(x) ∈ Rn (Fq ). The polynomial h(x), which has degree equal to the dimension of C, is referred to as the check polynomial of C. Since the generator polynomial is unique, so is the check polynomial. The following theorem shows why the check polynomial is important.
Theorem 7.8. Let h(x) = h0 + h1 x + · · · + hn−k xn−k be the check polynomial of a cyclic code C ⊆ Rn (Fq ).
(1) The code C can be described by
C = {p(x) ∈ Rn (Fq ) | p(x)h(x) = 0}.
(2) The parity check matrix for C is given by
$$P = \begin{pmatrix}
h_{n-k} & \cdots & h_1 & h_0 & 0 & 0 & \cdots & 0 \\
0 & h_{n-k} & \cdots & h_1 & h_0 & 0 & \cdots & 0 \\
0 & 0 & h_{n-k} & \cdots & h_1 & h_0 & \cdots & 0 \\
\vdots & \vdots & & \ddots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & h_{n-k} & \cdots & h_1 & h_0
\end{pmatrix}$$
(3) The dual code C ⊥ is the cyclic code of dimension k with generator polynomial
$$h^{\perp}(x) = h_0^{-1}\left(h_0 x^{n-k} + h_1 x^{n-k-1} + \cdots + h_{n-k}\right).$$
Example 7.9. Because x9 − 1 factors over F2 into irreducible factors as follows:
x9 − 1 = (x − 1)(x2 + x + 1)(x6 + x3 + 1),
the code C = ⟨x6 + x3 + 1⟩ has check polynomial h(x) = (x − 1)(x2 + x + 1) = x3 + 1. Hence, C has parity check matrix
$$P = \begin{pmatrix}
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}$$
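The analogous sketch for Theorem 7.8 (again with numpy; the helper name is ours) builds P from the reversed coefficients of h(x) and reproduces the matrix of Example 7.9:

```python
import numpy as np

def parity_check_matrix(h, n, q=2):
    # h = [h0, h1, ..., h_{n-k}]: coefficients of the check polynomial.
    # The k rows are right cyclic shifts of the reversed coefficients.
    rev = h[::-1]                     # (h_{n-k}, ..., h1, h0)
    k = n - (len(h) - 1)              # deg g(x) = n - deg h(x)
    P = np.zeros((k, n), dtype=int)
    for i in range(k):
        P[i, i:i + len(rev)] = rev
    return P % q

# h(x) = 1 + x^3 for C = <x^6 + x^3 + 1> in R_9(F_2), as in Example 7.9.
print(parity_check_matrix([1, 0, 0, 1], 9))
```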
The Zeros of a Cyclic Code
If we have convenient access to the roots of the polynomial xn − 1, then it is possible to characterize the cyclic codes in Rn (Fq ) in a slightly different way than through the generator polynomial. Let $x^n - 1 = \prod_i m_i(x)$ be the factorization of xn − 1 into monic irreducible factors over Fq . If α is a root of mi (x) in some extension field of Fq , then mi (x) is the minimal polynomial of α over Fq . Thus, for any f (x) ∈ Fq [x], we have f (α) = 0 if and only if f (x) = h(x)mi (x) for some h(x) ∈ Fq [x]. In particular, if we consider f (x) ∈ Rn (Fq ), then f (α) = 0 if and only if f (x) ∈ ⟨mi (x)⟩.
Now, since xn − 1 has no multiple roots, if g(x) | xn − 1 then g(x) = m1 (x) · · · mt (x) is a
product of distinct irreducible factors of xn − 1. If αi is a root of mi (x) for i = 1, . . . , t, then
⟨g(x)⟩ = {f (x) ∈ Rn (Fq ) | f (αi ) = 0, i = 1, . . . , t}.
Definition 7.10. The roots of the generator polynomial of a cyclic code are called the
zeros of the code.
The representation of a cyclic code by its zeros can be used to show that some Hamming codes are cyclic. The binary case is the easiest, so let us consider it first.
Let n = 2^r − 1. By Theorem 5.6, the set of all n distinct roots of xn − 1 over F2 is the multiplicative cyclic group F∗2r , the set of nonzero elements of the field of 2^r elements containing F2 . An element that generates F∗2r is called a primitive field element of F2r .
Suppose that β is a primitive field element of F2r . Consider the code C = {f (x) ∈
Rn (F2 ) | f (β) = 0}. As mentioned above, C is the binary cyclic code whose generator polynomial g(x) is the minimal polynomial of β, an irreducible factor of xn − 1. Since the degree of F2r over F2 is r, we have that deg(g(x)) = r. Hence, C is an [n, n − r]-code. If there were a polynomial f (x) = xi + xj ∈ Rn (F2 ) such that f (β) = 0, where 0 ≤ i < j < n, then we would have β j−i = 1, contradicting the assumption that β is primitive. Hence, the minimum
distance of C is at least 3, which implies that it must be equal to 3 (by the sphere-packing
condition). Therefore, C is a linear code with the same parameters as the Hamming code
H2 (r) and hence equivalent to H2 (r) by Theorem 6.23.
Example 7.11. Consider the Hamming code H2 (4). In this case, n = 2^4 − 1 = 15 and
the splitting field for xn − 1 is F16 . Consider the irreducible polynomial g(x) = x4 + x + 1
over F2 and suppose β is a root of g(x). Then we have that β 15 = 1, but β 3 ≠ 1 and β 5 ≠ 1.
Hence β is a primitive field element of F16 . We conclude that H2 (4) is equivalent to the
cyclic code generated in R15 (F2 ) by g(x) = x4 + x + 1.
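A quick machine check of this primitivity claim, with elements of F16 encoded as bit masks (this encoding and the helper gf_mul are ours, not from the text):

```python
# Powers of beta = x modulo g(x) = x^4 + x + 1 over F_2, with polynomials
# encoded as bit masks (bit i <-> x^i); g is the mask 0b10011.
def gf_mul(a, b, g=0b10011, deg=4):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> deg & 1:      # reduce by g when the degree reaches deg
            a ^= g
    return r

beta, p, order = 0b10, 0b10, 1
while p != 1:
    p = gf_mul(p, beta)
    order += 1
print(order)                  # 15, so beta generates all of F_16*
```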
In general, not every q-ary Hamming code is equivalent to a cyclic code. For instance,
we can write down all ternary cyclic codes of length 4 and find out that none of these codes
has minimum distance 3. We see that the ternary Hamming code H3 (2) is not equivalent to
a cyclic code. However, we have the following result.
Proposition 7.12. Let n = (q r − 1)/(q − 1) and assume that gcd(r, q − 1) = 1. Then
the q-ary Hamming code Hq (r) is equivalent to a cyclic code.
The Idempotent Generator of a Cyclic Code
We have seen that a complete list of all cyclic codes in Rn (Fq ) can be obtained from a
factorization of xn − 1 into monic irreducible factors over Fq . However, factoring xn − 1 is
not an easy task in general. In this section, we explore another approach to describe cyclic
codes, involving a different type of generating polynomial than the generator polynomial.
Definition 7.13. A polynomial e(x) ∈ Rn (Fq ) is said to be idempotent in Rn (Fq ) if
e(x)2 = e(x).
Example 7.14. The polynomial x3 + x5 + x6 is an idempotent in R7 (F2 ) because (x3 +
x5 + x6 )2 = x6 + x10 + x12 ≡ x6 + x3 + x5 (mod x7 − 1).
Let C be a cyclic code in Rn (Fq ) with generator polynomial g(x) and check polynomial
h(x). Since xn − 1 has no multiple roots, g(x) and h(x) are relatively prime and so there
exist polynomials a(x) and b(x) for which a(x)g(x)+b(x)h(x) = 1. Let e(x) = a(x)g(x) ∈ C.
Then for any p(x) ∈ C, we have that e(x)p(x) = p(x), since p(x)h(x) = 0. In other words, e(x) is the unique identity in C, and hence an idempotent in Rn (Fq ). Moreover, e(x) generates C, since every polynomial in C is a multiple of e(x).
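Assuming sympy is available, this Bezout construction can be carried out directly. The following sketch computes e(x) for the binary [7, 4]-code with g(x) = x3 + x + 1 and checks the two properties just mentioned:

```python
from sympy import Poly, symbols

x = symbols('x')
n = 7
m = Poly(x**n - 1, x, modulus=2)        # x^7 - 1 over F_2
g = Poly(x**3 + x + 1, x, modulus=2)    # generator polynomial
h = m.div(g)[0]                         # check polynomial h(x)
a, b, d = g.gcdex(h)                    # a*g + b*h = d = 1, as g, h coprime
e = (a * g) % m                         # generating idempotent e(x)
assert (e * e) % m == e                 # e(x)^2 = e(x) in R_7(F_2)
assert (e * g) % m == g                 # e(x) acts as the identity on C
print(e.as_expr())                      # x**4 + x**2 + x
```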
Proposition 7.15. e(x) is the unique polynomial in C that is both idempotent and
generates C.
We will refer to the polynomial e(x) as the generating idempotent of C. We can also compute the generator polynomial g(x) from the generating idempotent e(x). In fact, gcd[e(x), xn − 1] = gcd[a(x)g(x), h(x)g(x)], because xn − 1 = g(x)h(x) and e(x) ≡ a(x)g(x) (mod xn − 1). But a(x) and h(x) are relatively prime and so gcd[e(x), xn − 1] = g(x). From this, we
have the following interesting relationship between the generator polynomial and generating
idempotent.
Proposition 7.16. Let C be a cyclic code in Rn (Fq ) with generator polynomial g(x) and
generating idempotent e(x).
(1) If α is a root of xn − 1, then g(α) = 0 if and only if e(α) = 0.
(2) Suppose that f (x) is an idempotent in Rn (Fq ) with the property that if α is a root
of xn − 1, then g(α) = 0 if and only if f (α) = 0. Then f (x) = e(x).
Let C be a cyclic code with generating idempotent e(x) and check polynomial h(x). Since
h(x)(1 − e(x)) ≡ h(x)(1 − a(x)g(x)) ≡ h(x)
(mod xn − 1),
we see that 1 − e(x) is the identity in hh(x)i. Hence the cyclic code hh(x)i has generating
idempotent 1 − e(x). Similarly, we have the following result that relates the generating
idempotents of a cyclic code and its dual.
Theorem 7.17. Let C be a cyclic code in Rn (Fq ) with generating idempotent e(x). Then
the dual code C ⊥ has generating idempotent 1 − e(xn−1 ) ∈ Rn (Fq ).
Encoding and Decoding with a Cyclic Code
There are two rather straightforward ways to encode message strings using a cyclic code.
One is systematic and the other one is nonsystematic.
Let C = ⟨g(x)⟩ be a q-ary cyclic [n, n − r]-code, where deg(g(x)) = r. Thus, C is capable
of encoding q-ary messages of length n − r. We consider the nonsystematic method first.
Given a source string a0 a1 · · · an−r−1 , we form the message polynomial
a(x) = a0 + a1 x + · · · + an−r−1 xn−r−1 .
This polynomial is encoded as the product c(x) = a(x)g(x).
To obtain a systematic encoder, we form the message polynomial
b(x) = a0 xn−1 + a1 xn−2 + · · · + an−r−1 xr .
Notice that b(x) has no terms of degree less than r. Next, we divide b(x) by g(x), b(x) =
h(x)g(x) + r(x), where deg(r(x)) < r and send the codeword c(x) = b(x) − r(x).
Definition 7.18. A q-ary (n, q k )-code is called systematic if there are k positions i1 , i2 , . . . , ik
with the property that, by restricting the codewords to these positions, we get all of the q k
possible q-ary strings of length k. The set {i1 , i2 , . . . , ik } is called an information set and the
codeword symbols in these positions are called information symbols.
Since b(x) and r(x) above have no terms of the same degree, this encoder is systematic.
In fact, reading the terms from highest degree to lowest degree, we see that the first n − r
positions are information symbols.
Example 7.19. Consider the binary cyclic [7, 4]-code generated by the polynomial g(x) =
x3 + x + 1. Consider the message 1001. Using the systematic encoder, we have b(x) = x6 + x3
and since
x6 + x3 = (x3 + x)(x3 + x + 1) + (x2 + x),
the encoded message is c(x) = (x6 + x3 ) − (x2 + x) = x6 + x3 + x2 + x.
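Both encoders are easy to express with coefficient lists. The sketch below (binary case only; all helper names are ours) reproduces the computation of Example 7.19:

```python
def poly_mod(u, g):
    # Remainder of u(x) divided by g(x) over F_2 (coefficient lists).
    u = u[:]
    for i in range(len(u) - 1, len(g) - 2, -1):
        if u[i]:
            for j, gj in enumerate(g):
                u[i - len(g) + 1 + j] ^= gj
    return u[:len(g) - 1]

def encode_nonsystematic(a, g, n):
    c = [0] * n                      # c(x) = a(x)g(x)
    for i, ai in enumerate(a):
        for j, gj in enumerate(g):
            c[i + j] ^= ai & gj
    return c

def encode_systematic(a, g, n):
    r = len(g) - 1
    b = [0] * n                      # b(x) = a0 x^(n-1) + ... + a_{n-r-1} x^r
    for i, ai in enumerate(a):
        b[n - 1 - i] = ai
    rem = poly_mod(b, g)             # b(x) = h(x)g(x) + rem(x)
    return [b[i] ^ (rem[i] if i < r else 0) for i in range(n)]

g = [1, 1, 0, 1]                     # g(x) = 1 + x + x^3
print(encode_systematic([1, 0, 0, 1], g, 7))   # x + x^2 + x^3 + x^6
```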
Since a cyclic code is a linear code, we can decode using the polynomial form of syndrome
decoding. Let C be a cyclic code. If c(x) ∈ C is the codeword sent and u(x) is the received
polynomial, then e(x) = u(x) − c(x) is the error polynomial. The weight of a polynomial is
the number of nonzero coefficients.
Definition 7.20. Let C = ⟨g(x)⟩ be a cyclic [n, n − r]-code with generator polynomial
g(x). The syndrome polynomial of a polynomial u(x), denoted by syn(u(x)), is the remainder
upon dividing u(x) by g(x), that is,
u(x) = h(x)g(x) + syn(u(x)),
deg(syn(u(x))) < deg(g(x)).
This definition of syndrome polynomial coincides with the definition of syndrome given
for a parity check matrix of a linear code. As expected, a received polynomial u(x) is a codeword if and only if its syndrome polynomial is the zero polynomial. Also, two polynomials have the same syndrome polynomial if and only if they lie in the same coset of C. Thus, the polynomial form of syndrome decoding
is analogous to the vector form.
Example 7.21. The binary cyclic [7, 4]-code generated by the polynomial g(x) = x3 +
x + 1 is single-error-correcting. The coset leaders and corresponding syndrome polynomials
are
coset leader    syndrome
0               0
1               1
x               x
x2              x2
x3              x + 1
x4              x2 + x
x5              x2 + x + 1
x6              x2 + 1
If, for example, the polynomial u(x) = x6 + x + 1 is received, we compute its syndrome
polynomial
x6 + x + 1 = (x3 + x + 1)(x3 + x + 1) + (x2 + x).
Since syn(u(x)) = x2 + x, its coset leader is e(x) = x4 , and so we decode u(x) as
c(x) = u(x) − e(x) = (x6 + x + 1) − x4 = x6 + x4 + x + 1.
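The table-lookup decoding just illustrated can be sketched as follows (binary case, helper names ours); for the single-error-correcting code above, the nonzero coset leaders are simply the monomials xi:

```python
def poly_mod(u, g):
    # Remainder of u(x) divided by g(x) over F_2, as in the encoder sketch.
    u = u[:]
    for i in range(len(u) - 1, len(g) - 2, -1):
        if u[i]:
            for j, gj in enumerate(g):
                u[i - len(g) + 1 + j] ^= gj
    return u[:len(g) - 1]

def syndrome_decode(u, g, n):
    syn = poly_mod(u, g)
    if not any(syn):
        return u                          # already a codeword
    table = {tuple(poly_mod([int(j == i) for j in range(n)], g)): i
             for i in range(n)}           # syndrome of each leader x^i
    c = u[:]
    c[table[tuple(syn)]] ^= 1             # subtract the leader e(x) = x^i
    return c

g = [1, 1, 0, 1]                          # g(x) = 1 + x + x^3
u = [1, 1, 0, 0, 0, 0, 1]                 # u(x) = x^6 + x + 1
print(syndrome_decode(u, g, 7))           # x^6 + x^4 + x + 1, as in the text
```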
The main practical difficulty with syndrome decoding is that the coset leader syndrome table might become quite long. However, we can take advantage of the fact that the code
in question is cyclic as follows.
Let us denote the polynomial obtained from p(x) by performing s cyclic shifts by p(s) (x).
Suppose that u(x) = c(x) + e(x), where c(x) ∈ C is the codeword sent and u(x) is the
received polynomial. There must exist some s for which the cyclic shift e(s) (x) of the error
polynomial e(x) has nonzero coefficient of xn−1 . Since u(s) (x) = c(s) (x) + e(s) (x), we have
syn(u(s) (x)) = syn(e(s) (x)). Hence, we only need those rows of the coset leader syndrome
table that contain coset leaders of degree n − 1. Let us illustrate this process by an example.
Example 7.22. Consider the binary cyclic [7, 4]-code generated by the polynomial g(x) =
x3 + x + 1. We only need a one-row table: syn(x6 ) = x2 + 1. Suppose that we receive u(x) = x6 + x + 1. Since syn(u(x)) = x2 + x is not in the table, we shift u(x) and compute its syndrome, which is syn(u(1) (x)) = x2 + x + 1. Again this is not in the table, so we shift again; computing the syndrome gives syn(u(2) (x)) = x2 + 1. Since this syndrome is in the
table, we deduce that e(2) (x) = x6 , and hence e(x) = x4 .
Let us take a closer look at the relationship between the unknown error polynomial and
the known syndrome polynomial. Suppose that u(x) = c(x) + e(x), where c(x) ∈ C = hg(x)i
is the codeword sent and u(x) is the received polynomial. Since u(x) = h(x)g(x)+syn(u(x)),
we have that e(x) − syn(u(x)) ∈ C. Suppose that C is a v-error-correcting code, and suppose that at most v errors have occurred in the transmission. Suppose further that syn(u(x)) has weight at most v. Then e(x) − syn(u(x)) is a codeword of weight at most 2v, which is less than the minimum weight of C, and so must be the zero codeword. Hence, we have the
following.
Lemma 7.23. Let C be a v-error-correcting cyclic code, and suppose that at most v errors
have occurred in the transmission. If the syndrome of the received polynomial u(x) has weight
at most v, then the error polynomial is equal to syn(u(x)).
Of course, we may not be lucky enough to encounter a syndrome polynomial of weight at
most v. However, if the syndrome polynomial of a cyclic shift of u(x) has weight at most
v, then it is almost as easy to obtain the error polynomial from this syndrome polynomial.
Suppose that the syndrome polynomial of the cyclic shift u(s) (x) of u(x) has weight at most
v. Then since u(s) (x) = c(s) (x) + e(s) (x), Lemma 7.23 gives e(s) (x) = syn(u(s) (x)) and so the
error polynomial e(x) can be easily recovered from syn(u(s) (x)) by shifting an additional n − s
places. This strategy is known as error trapping.
Example 7.24. Consider the binary cyclic [7, 4]-code generated by the polynomial g(x) = x3 + x + 1. Suppose that we receive u(x) = x6 + x + 1. We have syn(u(1) (x)) = x2 + x + 1, syn(u(2) (x)) = x2 + 1 and syn(u(3) (x)) = 1, and hence e(3) (x) = 1. This implies that e(x) = x4 , just as before.
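Error trapping is equally short to sketch (again binary, helper names ours): shift until the syndrome has weight at most v, read off the shifted error, and shift back:

```python
def poly_mod(u, g):
    # Remainder of u(x) divided by g(x) over F_2, as in the earlier sketches.
    u = u[:]
    for i in range(len(u) - 1, len(g) - 2, -1):
        if u[i]:
            for j, gj in enumerate(g):
                u[i - len(g) + 1 + j] ^= gj
    return u[:len(g) - 1]

def trap_decode(u, g, n, v=1):
    for s in range(n):
        shifted = u[-s:] + u[:-s]        # s right cyclic shifts of u
        syn = poly_mod(shifted, g)
        if sum(syn) <= v:                # trapped: e^(s)(x) = syn(u^(s)(x))
            e_s = syn + [0] * (n - len(syn))
            e = e_s[s:] + e_s[:s]        # undo the s shifts
            return [a ^ b for a, b in zip(u, e)]
    return None                          # errors not confined to a trap

g = [1, 1, 0, 1]                         # g(x) = 1 + x + x^3
u = [1, 1, 0, 0, 0, 0, 1]                # u(x) = x^6 + x + 1, as in Ex. 7.24
print(trap_decode(u, g, 7))              # decodes with e(x) = x^4
```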
Let C be a v-error-correcting cyclic [n, n − r]-code. If v or fewer errors occur, and if they
are confined to r consecutive positions, including wrap around, then there must exist some s
for which the cyclic shift u(s) (x) of the received polynomial u(x) has its errors confined to the
r coefficients of x0 , x1 , . . . , xr−1 . Thus, u(s) (x) = c(s) (x)+e(s) (x) with deg(e(s) (x)) < r. Hence,
since c(s) (x) is a codeword, we have syn(u(s) (x)) = e(s) (x). This says that error trapping
can correct any v errors that happen to fall within r consecutive positions, including wrap
around.
The result above does not say that any burst of length r or less can be corrected. In fact, this is not possible, because, according to Proposition ??, if a cyclic [n, n − r]-code C can correct all burst errors of length b or less, then we must have 2b ≤ r. However, if b(x) ∈ C were a nonzero burst of length r or less, then by performing cyclic shifts of b(x), we would obtain a nonzero codeword in C with degree less than r (the degree of the generator polynomial), which is impossible. Hence, we have the following.
Proposition 7.25. A cyclic [n, n − r]-code C contains no nonzero bursts of length r or less. Hence, it can detect any burst error of length r or less.
Exercise
(1) Let C1 = ⟨g1 ⟩ and C2 = ⟨g2 ⟩ be two q-ary cyclic codes of length n.
(a) Show that C1 ⊆ C2 if and only if g2 (x) | g1 (x).
(b) Show that C1 ∩ C2 is also a cyclic code.
(c) Let C1 + C2 = {c1 + c2 | c1 ∈ C1 , c2 ∈ C2 }. Show that C1 + C2 is also a linear code.
(2) Let En be the set of even weight strings in Fn2 .
(a) Show that En is a cyclic code and En = ⟨x − 1⟩.
(b) Let C = hg(x)i be a binary cyclic code of length n. Show that w(c) is even for all
c ∈ C if and only if x − 1 | g(x).
(3) Let g(x) be the generator polynomial of a binary cyclic [n, n − r]-code C. Suppose that C contains at least one codeword of odd weight.
(a) Show that the set E of all codewords in C of even weight is a cyclic code. What is the generator polynomial of E?
(b) Prove that $\sum_{i=0}^{n-1} x^i \in C$.
(4) Let C1 and C2 be two cyclic codes in Rn (Fq ) with generating idempotents e1 (x) and e2 (x),
respectively.
(a) Show that C1 ⊆ C2 if and only if e1 (x)e2 (x) = e1 (x) in Rn (Fq ).
(b) Show that C1 ∩ C2 has generating idempotent e1 (x)e2 (x) in Rn (Fq ).
(c) Show that C1 + C2 has generating idempotent e1 (x) + e2 (x) − e1 (x)e2 (x) in Rn (Fq ).
(5) Show that any set of k consecutive positions in a cyclic [n, k]-code is an information set.
(6) Let g(x) be the generator polynomial of a binary cyclic [n, n − r]-code C.
(a) Let si (x) be the remainder obtained by dividing xr+i by g(x). Show that the polynomials xr+i − si (x), for i = 0, 1, . . . , n − r − 1, form a basis for C.
(b) Find the generator matrix for C by using the basis in part (6a), and find a corresponding parity check matrix H.
(c) Suppose that u(x) = u0 + u1 x + · · · + un−1 xn−1 is a received polynomial. How is the syndrome polynomial syn(u(x)) related to the syndrome (u0 , u1 , . . . , un−1 )H t ?
(7) Let C be a cyclic [n, k]-code with generator polynomial g(x) and let ci = ci,1 ci,2 · · · ci,n be codewords in C, for i = 1, . . . , s. We may interleave these codewords by juxtaposing the first position in each codeword, followed by the second position in each codeword, and so on, to obtain the string
c1,1 c2,1 · · · cs,1 c1,2 c2,2 · · · cs,2 · · · c1,n c2,n · · · cs,n
Let us denote by C (s) the set of all strings formed in this way from all possible choices of s codewords in C (taken in all possible orders).
(a) Show that C (s) is a cyclic [ns, ks]-code with generator polynomial g(xs ).
(b) Suppose that C is capable of correcting burst errors of length b or less. Show that C (s) is capable of correcting burst errors of length bs or less.