One checks that
(5.16)   I(X; Y) = ∑_{x∈X, y∈Y} P_{XY}(x, y) log( P_{XY}(x, y) / (P_X(x)·P_Y(y)) ),
and so in particular I(X; Y) = I(Y; X) (which can also be seen by noting that I(X; Y) = H(X) + H(Y) − H(XY)).
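As a quick numerical illustration, the following Python sketch takes an arbitrary example joint distribution P_XY (the numbers are made up for illustration) and computes I(X; Y) both via (5.16) and as H(X) + H(Y) − H(XY).

import math

# A small example joint distribution P_XY over {0,1} x {0,1}.
P_XY = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.15, (1, 1): 0.35}

# Marginal distributions P_X and P_Y.
P_X = {x: sum(p for (a, _), p in P_XY.items() if a == x) for x in (0, 1)}
P_Y = {y: sum(p for (_, b), p in P_XY.items() if b == y) for y in (0, 1)}

def H(dist):
    """Shannon entropy (in bits) of a distribution given as a dict of probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Mutual information via (5.16).
I_def = sum(p * math.log2(p / (P_X[x] * P_Y[y])) for (x, y), p in P_XY.items() if p > 0)

# Mutual information via I(X;Y) = H(X) + H(Y) - H(XY).
I_ent = H(P_X) + H(P_Y) - H(P_XY)

assert abs(I_def - I_ent) < 1e-12
print(I_def)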
6 Shannon’s Model of Communication
Shannon studied the case where a discrete memoryless channel C from X to Y is given. Such a channel can be specified by providing, for every x ∈ X, a probability distribution P^(x)(y) over the values y ∈ Y which are received when x is input into the channel. We will often write P_{Y|X=x}(y) or P_{Y|X}(y, x) for this probability distribution.⁹ Furthermore, if x ∈ X is given, we write C(x) to denote a random variable distributed as given by P_{Y|X=x}(y). If values x^n = (x_1, ..., x_n) ∈ X^n are given, we analogously write C^(n)(x^n).
Definition 6.1. The capacity Cap(C) of a channel C is defined as
(6.1)   Cap(C) := sup_{P_X, Y=C(X)} I(X; Y),
where the supremum is over all probability distributions P_X over X; X is picked according to P_X, and Y is obtained by sending X through the channel C.
Shannon proved that in n uses of channel C, roughly n · Cap(C) bits can be transmitted. We will provide more exact statements and some proofs in the next two
sections.
6.1 The Channel Coding Theorem
We first give the theorem which states that we can transmit information over a
channel at any rate below the channel capacity.
⁹ This is a slight abuse of notation: when a probability distribution P_{XY}(x, y) is given, we also write P_{Y|X=x}(y) to denote the conditional probability distribution. Note that when only a channel is provided, this is not such a conditional distribution (as there is no distribution over X). However, if P_{XY} is a probability distribution over X × Y which is achieved by picking x according to some distribution and then sending it over a channel, the two objects almost coincide.
Theorem 6.2 (Shannon’s Channel Coding Theorem). Let C be a discrete memoryless channel from X to Y, let ε > 0, and let R = Cap(C) − ε. Then, for every n > n_0(C, ε) and every m ≤ nR, there exist functions Enc : {0,1}^m → X^n and Dec : Y^n → {0,1}^m such that for every w ∈ {0,1}^m we have
(6.2)   Pr_{C^(n)}[Dec(C^(n)(Enc(w))) ≠ w] ≤ exp(−Ω(ε²n/L²)),
where L := log(|X|·|Y|).
We omit the proof of Theorem 6.2, but prove the following related result. It shows that a random linear code over {0, 1} works well if the channel achieves capacity on the uniform distribution.
Exercise 6.3. Let C be the binary symmetric channel with error probability p, i.e., the channel from {0, 1} to {0, 1} which, on input x, outputs x with probability 1 − p and 1 − x with probability p.
Prove that the capacity of the channel is 1 − h(p).
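As a numerical sanity check for this exercise, one can compare the closed form 1 − h(p) with a direct maximization of I(X; Y) over input distributions (q, 1 − q); the following Python sketch does this for one arbitrarily chosen value of p.

import math

def h(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information_bsc(q, p):
    """I(X;Y) for a BSC with error probability p and input distribution (q, 1-q)."""
    py1 = q * (1 - p) + (1 - q) * p      # P[Y = 1]
    return h(py1) - h(p)                 # I(X;Y) = H(Y) - H(Y|X), and H(Y|X) = h(p)

p = 0.11
closed_form = 1 - h(p)
grid_max = max(mutual_information_bsc(q / 1000, p) for q in range(1001))
print(closed_form, grid_max)             # the two values agree; the maximum is at q = 1/2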
Theorem 6.4. Let C be a discrete memoryless channel from X = {0,1} to Y which achieves the capacity on the uniform input. Let ε > 0, and let R = Cap(C) − ε. Then, for every n > n_0(C, ε) and m ≤ nR, there exists a family of functions Dec_G : Y^n → {0,1}^m such that for all w ∈ {0,1}^m \ {0^m} we have
(6.3)   Pr_{G, C^n}[Dec_G(C^(n)(G·w)) ≠ w] ≤ exp(−Ω(ε²n/L²)),
where L := log(|Y|), and G ∈ {0,1}^{n×m} is a uniform random matrix.
Note that the probability in (6.3) is also over the choice of G, which makes it a
weaker statement: we only show that for every w a uniform random G will suffice,
but one would like to show that some G works for all w.
Proof. We stay a bit more general than the theorem for a moment and consider an
arbitrary channel. Let P_X be a distribution which achieves capacity, and consider the distribution P_{XY}(x, y) = P_X(x)·P_{Y|X=x}(y) which one obtains by first picking X according to P_X and then setting Y = C(X). Furthermore, let P_{X^n Y^n} be the distribution over n independent such copies.
For a parameter δ > 0, we then define the “typical set”
(6.4)   A_δ^n := { (x^n, y^n) : h_{X^n Y^n}(x^n, y^n) ∈ n(H(XY) ± δ) ∧ h_{X^n}(x^n) ∈ n(H(X) ± δ) ∧ h_{Y^n}(y^n) ∈ n(H(Y) ± δ) }.
We have the following claims:
(1) Pick (x^n, y^n) according to P_{X^n Y^n}. Then, with probability at least 1 − 2^{−Ω(nδ²/L²)} we have (x^n, y^n) ∈ A_δ^n.
(2) Pick x^n according to P_{X^n} and y^n independently according to P_{Y^n}. Then, the probability that (x^n, y^n) ∈ A_δ^n is at most 2^{−n·Cap(C)+3nδ}.
Claim (1) follows immediately from Theorem 5.8.
To see claim (2), we note that A_δ^n has at most 2^{n(H(XY)+δ)} elements, because each element has pointwise entropy at most n(H(XY)+δ), and so probability at least 2^{−n(H(XY)+δ)}. Because of the two remaining conditions on elements of A_δ^n, the probability that a fixed pair (x^n, y^n) is picked in the process of claim (2) is at most 2^{−n(H(X)−δ)−n(H(Y)−δ)}. Thus, overall we have a probability of at most 2^{n(H(XY)+δ)−n(H(X)−δ)−n(H(Y)−δ)} = 2^{−nI(X;Y)+3nδ} = 2^{−n·Cap(C)+3nδ}, since P_X achieves capacity; this gives the result.
We now come to the proof of the theorem.
The decoder Dec_G works as follows: it enumerates all messages w* ≠ 0^m and checks whether (Gw*, y^n) is in A_δ^n. If there is exactly one w* for which this holds, it outputs this w*; otherwise it outputs a special symbol ‘fail’.
By Claim (1) above, the probability (over the choice of G and the randomness of the channel) that for the correct message w the pair (Gw, y^n) is typical is 1 − exp(−Ω(nδ²/L²)); note that since w ≠ 0^m, the codeword Gw is uniformly distributed, so Claim (1) indeed applies. Furthermore, for any fixed nonzero w* ≠ w the vectors Gw* and Gw are independent and uniform, so Gw* is independent of y^n; hence, by Claim (2) and the union bound, the probability that for some other nonzero message the pair (Gw*, y^n) is typical is at most 2^m · 2^{−nI(X;Y)+3nδ} ≤ 2^{−nε+3nδ}. Thus, if we choose δ = ε/4 we get the result.
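The argument can be illustrated with a small simulation. The following Python sketch (toy parameters, chosen arbitrarily) encodes with a uniformly random matrix G over F_2 and decodes the output of a binary symmetric channel; for the binary symmetric channel with uniform input, the typicality check of the decoder amounts to the fraction of positions where Gw* and y^n disagree being close to p (the sketch uses the interval p ± δ).

import random

def rand_matrix(n, m):
    """A uniformly random n x m matrix over F_2, as a list of rows."""
    return [[random.randint(0, 1) for _ in range(m)] for _ in range(n)]

def encode(G, w):
    """Compute G * w over F_2 (w is a list of m bits)."""
    return [sum(g * b for g, b in zip(row, w)) % 2 for row in G]

def bsc(x, p):
    """Send the bit string x through a binary symmetric channel with flip probability p."""
    return [b ^ (random.random() < p) for b in x]

def typical_decode(G, y, p, delta, m):
    """Output the unique nonzero w* for which (G*w*, y) looks typical, else None.

    For the BSC with uniform input, the typicality check is captured here by the
    fraction of positions where G*w* and y disagree lying in [p - delta, p + delta].
    """
    n = len(y)
    candidates = []
    for idx in range(1, 2 ** m):                      # enumerate all nonzero messages
        w = [(idx >> j) & 1 for j in range(m)]
        flips = sum(a != b for a, b in zip(encode(G, w), y)) / n
        if p - delta <= flips <= p + delta:
            candidates.append(w)
    return candidates[0] if len(candidates) == 1 else None

# Toy parameters: m is far below n * Cap(C), so decoding succeeds with good probability.
n, m, p, delta = 200, 6, 0.05, 0.05
G = rand_matrix(n, m)
w = [1, 0, 1, 1, 0, 1]
print(typical_decode(G, bsc(encode(G, w), p), p, delta, m) == w)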
Exercise 6.5. Assume that C is the binary symmetric channel, and strengthen Theorem 6.4 for this case, showing that some fixed G works for all w, including 0^m.
Exercise 6.6. Prove Theorem 6.2. What is the smallest possible minimum distance such a code can have?
Exercise 6.7. Let C ⊆ {0,1}^n be some code. For r ∈ ℕ, let C^(r) ⊆ {0,1}^{nr} be the code which is obtained by concatenating r codewords of C. Formally, C^(r) := {(c_1‖c_2‖...‖c_r) : c_1 ∈ C ∧ ··· ∧ c_r ∈ C}.
Let C be a channel from {0,1} to Y which achieves capacity on the uniform distribution, and fix ε > 0. Use Theorem 6.4 and the above construction for an appropriate r to show that there exists a family of polynomial-time encodable and decodable codes C ⊆ {0,1}^n with rate Cap(C) − ε and error probability at most 1/n (or any other polynomial in n).
Exercise 6.8. Let C ⊆ {0,1}^n be a code with ∆(C) ≥ n/4 and |C| ≤ 2^{nε²/1000}. Prove that there is a decoder which finds the sent codeword with overwhelming probability, when a codeword from C is sent over a binary symmetric channel which flips each bit with probability (1 − ε)/2.
6.2 The Converse to the Channel Coding Theorem
For completeness we add a proof that rates above capacity cannot be achieved.
Theorem 6.9 (Shannon’s Converse Channel Coding Theorem). Let Z be a discrete memoryless channel from X to Y, let ε > 0, and let R = Cap(Z) + ε. Then, for every m ≥ nR and every pair of functions Enc : {0,1}^m → X^n and Dec : Y^n → {0,1}^m we have
(6.5)   Pr_{w←{0,1}^m, Z^(n)}[Dec(Z^(n)(Enc(w))) ≠ w] ≥ (εn − 1)/m.
Proof sketch. We assume that the encoding function is injective; intuitively this is no problem, but formally it loses generality, and one should do more work, which we omit. We pick the message w uniformly in {0,1}^m. We then let x^(n) := Enc(w), and y^(n) the result of sending x^(n) through the channel. We see that
(6.6)   H(W|Y^(n)) = H(W) − I(W; Y^(n))
                   = m − I(X^(n); Y^(n))
                   = m − H(Y^(n)) + H(Y^(n) | X^(n))
                   = m − H(Y^(n)) + ∑_{i=1}^{n} H(Y_i | Y_1, ..., Y_{i−1}, X^(n))
                   = m − H(Y^(n)) + ∑_{i=1}^{n} H(Y_i | X_i)
                   ≥ m − ∑_{i=1}^{n} I(X_i; Y_i) ≥ nR − n·Cap(Z) ≥ εn.
Now, define
(6.7)   E := 1 if Dec(Y^(n)) ≠ W, and E := 0 otherwise.
We see that H(W|Y^(n)) ≤ H(EW|Y^(n)) = H(E|Y^(n)) + H(W|E, Y^(n)) ≤ 1 + H(W|E, Y^(n)). Moreover, since W = Dec(Y^(n)) is determined by Y^(n) whenever E = 0, we have H(W|E, Y^(n)) = Pr[E = 1]·H(W|Y^(n), E = 1) ≤ Pr[E = 1]·m. Together this gives Pr[E = 1] ≥ (H(W|Y^(n)) − 1)/m. Combining this with (6.6) gives the result.
7 Concatenated Codes
In order to explain concatenated codes, we suppose for a moment that a code is
given by an encoding function C : Σ^k → Σ^n.
Figure 5: Concatenation of codes. (The outer code C_out maps k_out symbols from Σ to n_out symbols from Σ; each of the n_out symbols, viewed as a string of k_in bits, is then encoded by the inner code C_in into a block of n_in bits, giving n_out blocks of n_in bits.)
We then assume we have two codes. First, an outer code C_out : Σ^{k_out} → Σ^{n_out}. Furthermore, we are given an inner code C_in : {0,1}^{k_in} → {0,1}^{n_in}, where the assumption that the inner code is binary is just for simplicity. We further assume that Σ = {0,1}^{k_in}.
We can then naturally get a “concatenated code” C_out ◦ C_in ⊆ {0,1}^{n_out·n_in} (writing C_out ◦ C_in and not C_in ◦ C_out seems to be standard), which is given by a function D : Σ^{k_out} → {0,1}^{n_out·n_in} as follows: first, map the message (σ_1, ..., σ_{k_out}) to (τ_1, ..., τ_{n_out}) = C_out(σ_1, ..., σ_{k_out}). Then, map each τ_i to a bitstring C_in(τ_i) of length n_in, which gives the final codeword (C_in(τ_1)‖...‖C_in(τ_{n_out})).
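As an illustration of the function D, here is a small Python sketch; the outer and inner encoders below are placeholders (a trivial repetition code and a parity-bit code) standing in for, e.g., a Reed-Solomon outer code and a good binary inner code.

def concatenated_encode(message, outer_encode, inner_encode):
    """Encode with C_out ∘ C_in: apply the outer code over Σ, then the inner code symbol-wise.

    message      : list of k_out symbols from Σ (each symbol a k_in-bit string)
    outer_encode : maps k_out symbols to n_out symbols over Σ
    inner_encode : maps one k_in-bit symbol to an n_in-bit string
    """
    outer_codeword = outer_encode(message)                    # (τ_1, ..., τ_{n_out})
    return "".join(inner_encode(t) for t in outer_codeword)   # C_in(τ_1) ∥ ... ∥ C_in(τ_{n_out})

# Tiny example: a repetition outer code over Σ = {0,1}^2 and a parity-bit inner code.
outer = lambda msg: msg + msg                                 # n_out = 2 * k_out
inner = lambda sym: sym + str(sum(map(int, sym)) % 2)         # append a parity bit: n_in = 3
print(concatenated_encode(["10", "01"], outer, inner))        # -> "101011101011"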
A graphical explanation is given in Figure 5. Concatenated codes were first studied by Forney in his 1965 PhD thesis.
Lemma 7.1. If the outer code C_out has minimum distance d_out and the inner code C_in has minimum distance d_in, then C_out ◦ C_in has minimum distance at least d_out · d_in.
Exercise 7.2. Argue that this bound is not always tight.
7.1 Asymptotically Good Codes
We say that a family of codes {C_n} with C_n ⊆ F_2^n is an “efficient code of relative distance δ” if it has relative minimum distance at least δ, there exists a polynomial time encoding, and there exists a polynomial time decoding procedure which decodes errors up to relative distance at least δ/2.
Theorem 7.3 (Zyablov Codes). For every rate r < 1 and n > n_0(r) there exists an efficient linear code C ⊆ F_2^n with dimension k > r·n and relative distance at least δ(r) > 0.
For every δ < 1/2 and every n > n_1(δ) there exists an efficient linear code C ⊆ F_2^n with dimension k > r(δ)·n, where r(δ) > 0, and relative distance at least δ.
Proof. We use a concatenated code. As outer code, we use a Reed-Solomon code over F_{2^m} for large enough m with relative distance δ_out < 1. As an inner code, we use a linear code as given by Theorem 2.2 with dimension m, with relative distance δ_in < 1/2.
The length of the outer code is maximal, i.e., n_out = 2^m. If the dimension is k_out, the relative distance is δ_out = 1 − k_out/n_out + 1/n_out ≈ 1 − k_out/n_out.
By Theorem 2.2, the inner code encodes messages of length m into codewords of length roughly m/(1 − h(δ_in)).
Thus, the total length of the code is n_out · m/(1 − h(δ_in)), and it can encode binary messages of length k_out · m. This means that the rate of the code is roughly (1 − δ_out)(1 − h(δ_in)). Furthermore, the relative distance is at least δ_in·δ_out.
For a given overall relative distance δ ∈ [0, 1/2), one thus looks for δ_out ∈ [0, 1) and δ_in ∈ [0, 1/2) which maximize (1 − δ_out)(1 − h(δ_in)) subject to the constraint that δ_in·δ_out = δ.
The result is plotted in Figure 4 and called “Zyablov-Radius”. Note that both
results are implied by the curve (and can be easily seen from the optimization).
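This optimization is easy to carry out numerically; the following Python sketch uses a crude grid search over δ_in, with δ_out = δ/δ_in forced by the constraint.

import math

def h(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def zyablov_rate(delta, steps=10_000):
    """Maximize (1 - delta_out)(1 - h(delta_in)) subject to delta_in * delta_out = delta,
    i.e., search over delta_in in (delta, 1/2) with delta_out = delta / delta_in."""
    best = 0.0
    for i in range(1, steps):
        d_in = delta + (0.5 - delta) * i / steps      # candidate inner relative distance
        d_out = delta / d_in                          # forced by the constraint
        best = max(best, (1 - d_out) * (1 - h(d_in)))
    return best

for delta in (0.05, 0.1, 0.2):
    print(delta, round(zyablov_rate(delta), 4))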
Finally, we note that encoding and decoding up to relative distance (δ_in/2)·(δ_out/2) can be done in polynomial time (using Theorem 2.4, exhaustive search for decoding, and Theorem 4.8).
7.2 Constructions with Varying Inner Codes
In general, there is no need to use the same inner code Cin for each symbol of the
outer code. There are two constructions we want to quickly mention which use this
idea.
The Wozencraft ensemble is a family of codes which have the property that most
of the codes in the family achieve the Gilbert-Varshamov bound. In his 1972 paper
“A class of constructive asymptotically good algebraic codes”, Justesen proposed
to use this family for the inner codes. Then, each position of the outer codeword is
encoded using a different member of the family. For some rates, this construction
then achieves the same parameters as the construction from the previous section,
but without the need for exhaustive search.
The second construction is due to Thommesen. He shows that one can use the Reed-Solomon code together with a new random linear code in each position and with high probability achieve the Gilbert-Varshamov bound (the Reed-Solomon code is used with different parameters than in the following construction).
7.3 Capacity Achieving Concatenated Codes
We discuss this as an exercise.
Exercise 7.4. Let C be a binary symmetric channel (as in Exercise 6.3). Fix 0 < ε < 1. Let C_out be the Reed-Solomon code over an alphabet of size 2^m with rate 1 − ε and length 2^m. Furthermore, we pick for each coordinate of the Reed-Solomon code a random linear code over {0,1}^m with rate Cap(C) · (1 − ε).
Consider the decoder which first treats each instance of the inner code individually, decoding each received word to the closest codeword. After this, the decoder performs Berlekamp-Welch decoding on the outer code.
• Prove that for any codeword, the probability (over the choice of the linear codes and the randomness of the channel) that more than an ε/2 fraction of the symbols of the Reed-Solomon codeword are received incorrectly is exponentially small (in the overall message length).
• Argue that the decoder is polynomial time.
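As an illustration, here is a Python sketch of the two-stage decoder described in the exercise; the inner codebooks and the outer decoder are placeholders to be supplied (e.g., a Berlekamp-Welch decoder for the outer code).

def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def two_stage_decode(received_blocks, inner_codebooks, outer_decode):
    """Decode a concatenated codeword: nearest-codeword decoding per inner block,
    then one call to the outer (Reed-Solomon) decoder on the resulting symbols.

    received_blocks : list of n_out received bit strings (one per inner block)
    inner_codebooks : for each block, a dict {symbol: inner codeword} for that block's code
    outer_decode    : the outer decoder, e.g. Berlekamp-Welch, assumed given
    """
    symbols = []
    for block, codebook in zip(received_blocks, inner_codebooks):
        # decode each block to the symbol whose inner encoding is closest in Hamming distance
        symbols.append(min(codebook, key=lambda s: hamming(codebook[s], block)))
    return outer_decode(symbols)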