Data Coding and Compression.
Solution of the First Exam and Second Test
Warning: there were 3 different versions of the exam, with different positions (a, b, or c) of the correct answers in
Part I, and slightly different solutions to the problems in Part II.
Department of Electrical and Computer Engineering, IST
Name:
Number:
January 14, 2013
NOTES: 1. Exam (3 hours): everything. Second test (90 minutes): Part I, questions 11–20; Part II, problem 3.
2. Part I (exam): correct answer = 0.5 points; wrong answer = −0.25 points.
3. Part I (second test): correct answer = 1 point; wrong answer = −0.5 points.
4. Possibly useful facts: log2 3 ≃ 1.585; log2 (5) ≃ 2.322; log2 (7) ≃ 2.807; log10 (2) ≃ 0.30
Part I
1. Let X, Y, Z, T ∈ {1, 2, ..., 6} be four random variables representing the outcome of tossing four independent fair
dice and A = X + Y + Z + T their sum; then,
a) H(A) < 4 log2 (6) bits/symbol;
b) H(A) = 4 log2 (6) bits/symbol;
c) H(A) > 4 log2 (6) bits/symbol.
Solution: The random variable A can take 21 different values, A ∈ {4, 5, ..., 24}, obviously with a non-uniform
probability distribution (for example, P[A = 4] = P[A = 24] = (1/6)⁴, while P[A = 5] = 4 (1/6)⁴). Thus,
H(A) < log2 21 bits/symbol. Now, log2 21 = log2 7 + log2 3 ≃ 2.807 + 1.585 = 4.392, while 4 log2 6 = 4(log2 2 +
log2 3) ≃ 4(1 + 1.585) > 8. In conclusion, H(A) < 4 log2 6.
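As a quick numerical sanity check of this argument (not part of the original exam), the following Python sketch enumerates all 6⁴ equally likely outcomes of the four dice and evaluates H(A) directly; the variable names are just illustrative.

from itertools import product
from collections import Counter
from math import log2

# Distribution of the sum of four fair dice: all 6^4 outcomes are equally likely.
counts = Counter(sum(dice) for dice in product(range(1, 7), repeat=4))
total = 6 ** 4
H_A = -sum((c / total) * log2(c / total) for c in counts.values())

print(len(counts))   # 21 possible values, A in {4, ..., 24}
print(H_A)           # strictly below log2(21) ≈ 4.39
print(4 * log2(6))   # ≈ 10.34, so H(A) < 4 log2(6)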
2. Let A be the random variable defined in the previous question and U = A2 another variable; then,
a) H(U ) < H(A);
b) H(U ) = H(A);
c) H(U ) > H(A).
Solution: Since A only takes positive values, U = A2 is an injective function of A, thus H(U ) = H(A).
3. Let X ∈ {a, b, c, d, e} be a random variable with probabilities P[X = a] = P[X = b] = P[X = c] = 1/6 and
P[X = d] = P[X = e] = 1/4. Then,
a) H(X) < 2 bits/symbol;
b) H(X) = 2 bits/symbol;
c) H(X) > 2 bits/symbol.
Solution 1: Directly from the definition of entropy, H(X) = 3 (1/6) log2 6 + 2 (1/4) log2 4. Using log2 6 =
log2 2 + log2 3 ≃ 2.585 and log2 4 = 2, we have H(X) ≃ (1/2)(2.585 + 2) = 4.585/2 > 2.
Solution 2: Using the grouping axiom: H(X) = H(1/2, 1/2) + (1/2)H(1/3, 1/3, 1/3) + (1/2)H(1/2, 1/2) =
1 + (1/2) log2 3 + (1/2) log2 2 = 1 + (1/2)(1 + log2 3) > 2.
4. Consider the random variables X ∈ {−2, −1, 0, 1, 2}, with uniform distribution, and Y ∈ {−1, 1}, also with
uniform distribution. Let Z = X ∗ Y ; then,
a) I(Z; X) = log2 (5) − 4/5 bits/symbol;
b) I(Z; X) = log2 (5) − 1 bits/symbol;
c) I(Z; X) = log2 (5) − 5/4 bits/symbol.
Solution: First, since P[X = x] = 1/5, for any x ∈ {−2, −1, 0, 1, 2}, and P[Y = −1] = P[Y = 1] = 1/2 (both
uniform distributions), we have P[Z = −2] = P[X = −2]P[Y = 1] + P[X = 2]P[Y = −1] = 1/10 + 1/10 = 1/5;
similarly for P[Z = −1] = P[Z = 1] = P[Z = 2] = 1/5; finally, P[Z = 0] = P[X = 0]P[Y = 1] + P[X = 0]P[Y =
−1] = 1/5. Consequently, H(Z) = log2 5 bits/symbol.
To compute H(Z|X), notice that H(Z|X = 0) = 0, because if X = 0 also Z = 0, independently of Y .
Furthermore, H(Z|X = x) = 1 bit/symbol, for x ∈ {−2, −1, 1, 2}, because for each of these values of x,
Z = x ∗ Y ∈ {x, −x}, with probabilities {1/2, 1/2}. Thus,
H(Z|X) = ∑_{x=−2}^{2} H(Z|X = x) P[X = x] = 4 · (1)(1/5) + (0)(1/5) = 4/5.
Finally, I(Z; X) = H(Z) − H(Z|X) = log2 5 − 4/5.
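A minimal numerical sketch (again, not from the exam) that rebuilds the joint distribution of (X, Z) and evaluates the mutual information directly:

from collections import defaultdict
from math import log2

# Joint distribution of (X, Z) with Z = X*Y, X uniform on {-2,...,2}, Y uniform on {-1,1}.
joint = defaultdict(float)
for x in (-2, -1, 0, 1, 2):
    for y in (-1, 1):
        joint[(x, x * y)] += (1 / 5) * (1 / 2)

px, pz = defaultdict(float), defaultdict(float)
for (x, z), p in joint.items():
    px[x] += p
    pz[z] += p

I = sum(p * log2(p / (px[x] * pz[z])) for (x, z), p in joint.items() if p > 0)
print(I, log2(5) - 4 / 5)   # both ≈ 1.52 bits/symbol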
5. Consider now that the random variables X and Y defined in the previous question are not independent, but
verify the following: (X > 0) ⇒ (Y = 1); (X < 0) ⇒ (Y = −1); if X = 0, then Y = 1, with probability 1/2.
Then, the entropy H(Z) is equal to
a) 1/5 bit/symbol;
b) 2/5 bit/symbol;
c) none of the values above.
Solution: Notice that, since Y = 1 whenever X > 0 and Y = −1 whenever X < 0, the product Z = X ∗ Y is never
negative, that is, it can only take values in {0, 1, 2}. Then,
P[Z = 2] = P[X = 2] P[Y = 1|X = 2] + P[X = −2] P[Y = −1|X = −2] = (1/5)(1) + (1/5)(1) = 2/5,
P[Z = 1] = P[X = 1] P[Y = 1|X = 1] + P[X = −1] P[Y = −1|X = −1] = (1/5)(1) + (1/5)(1) = 2/5,
P[Z = 0] = P[X = 0] = 1/5.
Thus, H(Z) = 2((2/5) log2 (5/2)) + (1/5) log2 5 = log2 5 − 4/5, which is not equal to 1/5 nor 2/5.
6. Consider the source X ∈ {a, b, c, d}, with probabilities satisfying P[X = a] > P[X = b] > P[X = c] > P[X = d].
The ternary code {C(a) = 0, C(b) = 10, C(c) = 11, C(d) = 12}
a) is optimal for this source;
b) may or not be optimal for this source, depending on the values of the probabilities;
c) is not optimal for this source.
Solution: Independently of the probabilities, we could build a better (of course, still instantaneous) ternary
code: {C(a) = 0, C(b) = 1, C(c) = 21, C(d) = 22}.
7. Consider a memoryless source X ∈ {a, b, c}, with uniform distribution, generating 2000 symbols/second. An
optimal binary code for the second order extension of this source produces
a) less than 3300 bits/second;
b) exactly 3300 bits/second;
c) more than 3300 bits/second.
Solution: Designing a Huffman code for the second order extension of this source (which produces 9 possible
pairs {(a, a), (a, b), ..., (c, c)}, all with probability 1/9), we obtain a code with expected code-length equal to 29/9
bits/pair-of-symbols. If the source generates 2000 symbols/second, it generates 1000 pairs of symbols per second,
thus the code produces 1000 ∗ 29/9 ≃ 3222 bits/second.
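A short sketch (not part of the exam) that builds a binary Huffman code for the 9 equiprobable pairs and checks the expected length of 29/9 bits per pair; the helper huffman_lengths is a hypothetical name and only tracks codeword lengths, which is all that matters here.

import heapq

def huffman_lengths(probs):
    # Each merge of two subtrees adds one bit to every leaf inside them.
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, ids1 = heapq.heappop(heap)
        p2, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, ids1 + ids2))
    return lengths

probs = [1 / 9] * 9
L = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
print(L, 29 / 9)   # ≈ 3.22 bits per pair of symbols
print(1000 * L)    # ≈ 3222 bits/second at 1000 pairs/second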
8. Consider a Markovian source X = (X1 , X2 , ..., Xt , ...), with Xt ∈ {1, 2, 3}, with the following transition matrix:


P = [ 1/2   0   1/2
       0   1/2  1/2
      1/3  1/3  1/3 ].
The expected length of the optimal coding scheme for this source
a) is less than the conditional entropy rate of the source;
b) is equal to the conditional entropy rate of the source;
c) is larger than the conditional entropy rate of the source.
Solution: The expected length of the optimal coding scheme for this source is the weighted (by the stationary
distribution) average of the expected lengths of the optimal codes for each conditional distribution (rows of
the transition matrix). The conditional entropy rate is the weighted (by the stationary distribution) average
of the entropies of the conditional distributions (the rows of the transition matrix). In two of the rows, all
probabilities are powers of 2, thus the corresponding expected lengths and entropies are equal. In one of the rows,
the probabilities are not powers of two, thus the expected code-length is larger than the corresponding entropy.
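The row-by-row comparison can also be checked numerically; the sketch below (not from the exam) computes, for each row, the entropy of the conditional distribution and the expected length of a Huffman code designed for that row (with at most three symbols, the two least probable codewords simply get one extra bit).

from math import log2

rows = [[1/2, 0, 1/2],
        [0, 1/2, 1/2],
        [1/3, 1/3, 1/3]]

def entropy(p):
    return -sum(q * log2(q) for q in p if q > 0)

def huffman_expected_length(p):
    # Valid for at most 3 non-zero probabilities: the two smallest get length 2.
    q = sorted(v for v in p if v > 0)
    return 1.0 if len(q) <= 2 else 2 * (q[0] + q[1]) + q[2]

for row in rows:
    print(entropy(row), huffman_expected_length(row))
# First two rows: entropy and expected length both equal 1 bit.
# Third row: entropy log2(3) ≈ 1.585 < expected length 5/3 ≈ 1.667.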
9. Consider a Markovian source X = (X1 , X2 , ..., Xt , ...), with Xt ∈ {1, 2, 3}, with the following transition matrix:


P = [ 1/2  1/2   0
       0   1/3  2/3
      1/2   0   1/2 ].
The expected length of the optimal coding scheme for this source is
a) > 1 bit/symbol;
b) = 1 bit/symbol;
c) < 1 bit/symbol
Solution: Since, in each row, only two symbols have non-zero probability, the corresponding codes will have only
two codewords, {0, 1}, thus with expected length equal to 1 bit/symbol.
10. Consider a memoryless discrete source X ∈ {a, b, c, d}, with probabilities P[X = a] = 0.6, P[X = b] = 0.2, P[X =
c] = P[X = d] = 0.1. The second bit of the arithmetic code for the sequence that starts with ab...
a) is 0;
b) is 1;
c) depends on the following symbols.
Solution: Since the first symbol to encode is “a”, the first interval is [0, 0.6[; this interval is then split into
4 intervals, with proportions 0.6, 0.2, 0.1, 0.1, that is: [0, 0.36[, [0.36, 0.48[, [0.48, 0.54[, [0.54, 0.6[. Since the
second symbol is “b”, the chosen interval is [0.36, 0.48[. At this point, we know that the first bit is necessarily
0 (the interval is all to the left of 1/2 = 0.5) and the second bit is necessarily 1 (the interval is all to the right
of 1/4 = 0.25). That is, C(ab...) = 01...
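A minimal sketch of the interval-refinement step (not from the exam; the symbol order a, b, c, d and the variable names are assumptions) that reproduces the intervals above:

probs = {"a": 0.6, "b": 0.2, "c": 0.1, "d": 0.1}

low, high = 0.0, 1.0
for s in "ab":
    width = high - low
    cum = 0.0
    for sym, p in probs.items():
        if sym == s:
            # Narrow the current interval to the sub-interval of symbol s.
            low, high = low + cum * width, low + (cum + p) * width
            break
        cum += p
    print(f"after '{s}': [{low:.2f}, {high:.2f}[")

# [0.36, 0.48[ lies inside [0, 0.5[ (first bit 0) and inside [0.25, 0.5[
# (second bit 1), so the code starts with 01 whatever comes next.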
11. Which of the following binary words corresponds to the Elias delta code for the natural number 17?
a) 001010000;
b) 0010110001;
c) 001010001.
Solution: Option (a) is obviously false: 17 is an odd number, thus the last bit of its binary representation
(17₁₀ = 10001₂) must be 1. Now, 10001 has 5 bits, thus we need the Elias gamma code for 5, which is its binary
representation, 101, preceded by a number of zeros equal to the length of this representation minus 1, that is,
Cγ(5) = 00101. Then, Cδ(17) is obtained by concatenating Cγ(5) with 10001 after deleting the first 1 of 10001,
because the binary representation of any natural number begins with a 1. Finally, Cδ(17) = 001010001.
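A short sketch of Elias gamma/delta encoders (not part of the exam), just to confirm the codeword:

def elias_gamma(n: int) -> str:
    b = format(n, "b")                  # binary representation of n
    return "0" * (len(b) - 1) + b       # len(b)-1 zeros, then n in binary

def elias_delta(n: int) -> str:
    b = format(n, "b")
    return elias_gamma(len(b)) + b[1:]  # gamma(length), then n without its leading 1

print(elias_gamma(5))    # 00101
print(elias_delta(17))   # 001010001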
12. Which of the following sequences cannot correspond to the Lempel-Ziv-Welch coding of any sequence from the
alphabet {a, b, c}, assuming that the dictionary indices start at 1?
a) 1111;
b) 1234;
c) 1211.
Solution: If we decode 1111 using an LZW decoder (with alphabet {a, b, c} and assuming that the dictionary
indices start at 1), we obtain the sequence aaaa. Now, if we code aaaa using LZW, we obtain 141, which means
that 1111 is not the LZW code for aaaa.
We can check that the other two options are valid LZW codes. If we decode 1234, we obtain abcab; if we code
this sequence using LZW, we obtain 1234, confirming that this is the correct LZW codeword for abcab. Decoding
1211, we obtain abaa; if we code this sequence using LZW, we obtain 1211, confirming that this is the correct
LZW codeword for abaa.
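The round-trip argument can be reproduced with a small LZW sketch (not from the exam) over the alphabet {a, b, c}, with dictionary indices starting at 1:

def lzw_encode(text, alphabet="abc"):
    dic = {s: i + 1 for i, s in enumerate(alphabet)}
    w, out = "", []
    for ch in text:
        if w + ch in dic:
            w += ch
        else:
            out.append(dic[w])
            dic[w + ch] = len(dic) + 1   # new dictionary entry
            w = ch
    out.append(dic[w])
    return out

def lzw_decode(codes, alphabet="abc"):
    dic = {i + 1: s for i, s in enumerate(alphabet)}
    w = dic[codes[0]]
    out = [w]
    for k in codes[1:]:
        entry = dic[k] if k in dic else w + w[0]   # index not yet in the dictionary
        out.append(entry)
        dic[len(dic) + 1] = w + entry[0]
        w = entry
    return "".join(out)

for codes in ([1, 1, 1, 1], [1, 2, 3, 4], [1, 2, 1, 1]):
    decoded = lzw_decode(codes)
    print(codes, decoded, lzw_encode(decoded))
# 1111 decodes to aaaa, but aaaa encodes back to [1, 4, 1], so 1111 is not a
# valid LZW output; 1234 and 1211 both round-trip to themselves.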
13. Knowing that the differential entropy of the real-valued random variable X is equal to log2 (10), what is the
differential entropy of the real-valued random variable Y = −2 X?
a) 2 log2 (5);
b) 2 log2 (10);
c) log2 (20).
Solution: Using the relation h(AX) = h(X) + log2 |A| with h(X) = log2 (10) and A = −2, we obtain h(Y) =
h(−2 X) = log2 10 + log2 2 = log2 (20).
14. Knowing that the differential entropy of the real-valued random variable X is equal to log2 (6), what is the
differential entropy of the real-valued random variable Y = X − 1?
a) log2 (6) − 1;
b) log2 (5);
c) log2 (6).
Solution: Adding or subtracting a constant from a real random variable just changes its mean; the differential
entropy doesn’t change when the mean is changed.
15. Consider two real-valued random variables X and Y, mutually independent, and the variable Z = X + Y. Then,
it is necessarily true that
a) h(Z) ≤ h(X);
b) h(Z) ≥ h(X);
c) h(Z) = h(X) + h(Y ).
Solution: This result is perfectly analogous to the one that applies to discrete variables. A simple way to think
about this question is using two independent Gaussian variables, both with zero mean (the entropies do not
depend on the means) and unit variance, say X ∼ N (0, 1) and Y ∼ N (0, 1). Then, h(X) = h(Y ) = 12 log(2 πe).
Since X and Y are independent, the variance of the sum is the sum of the variances, thus Z ∼ N (0, 2) and
h(Z) = 21 log(2 πe 2) > 12 log(2 πe) = h(X). Option (c) is false, because 12 log(2 πe 2) ̸= 12 log(2 πe) + 21 log(2 πe).
16. Consider a source X ∈ [−1, 1] with probability density function fX (x) = (x + 1)/2, connected to a non-uniform
quantizer with the following four regions: R0 = [−1, 0], R1 =]0, 1/2], R2 =]1/2, 3/4] and R3 =]3/4, 1]. The
optimal representative of each of these regions
a) is always located to the left of the center of the region;
b) is always located to the right of the center of the region;
c) none of the previous answers.
Solution: Since the probability density function, in any quantization region, is a monotonically increasing
function, the optimal representative (its center of mass) is located to the right of its center.
17. Consider that the source defined in the previous question (16) is connected to an 8-bit uniform quantizer. The
entropy of the quantizer output
a) less than 8 bits/symbol;
b) equal to 8 bits/symbol;
c) larger than 8 bits/symbol.
Solution: The output of the quantizer is a discrete variable, with 2⁸ = 256 possible values, with probabilities
equal to the probabilities that X falls in each of the regions. Since the size of the regions is constant (the
quantizer is uniform), but the probability density function is not uniform, these probabilities are not equal to
each other, thus the entropy is less than 8 = log2 256 bits/symbol.
18. Consider again the source defined in question 16, now connected to a 2-bit quantizer, with the following codebook:
{y0 = −3/4, y1 = −1/4, y2 = 1/4, y3 = 3/4}. Then, the point that separates region R1 from region R2 , in the
optimal quantizer for this codebook,
a) is exactly at zero;
b) is located to the right of zero;
c) is located to the left of zero.
Solution: The optimal separation between pairs of consecutive codebook elements is the corresponding midpoint, independently of the probability density function. The midpoint between y1 = −1/4 and y2 = 1/4 is
obviously 0.
19. Consider again the source defined in question 16, now connected to a 2-bit uniform quantizer (regions R0 =
[−1, −1/2[, R1 = [−1/2, 0[, R2 = [0, 1/2[ and R3 = [1/2, 1]). The resulting mean squared error is
a) less than 1/48;
b) equal to 1/48;
c) larger than 1/48.
Solution: The high resolution approximation (HRA) would give exactly 1/48 = ∆²/12, with ∆ = 1/2. This
would be the correct value if the density were uniform, which it is not. To see whether the exact value is larger or
smaller than the HRA predicts, simply observe that, in the regions where the density is non-uniform, it is more
concentrated around the center of mass, thus the mean squared error is smaller.
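A hedged numerical check (not from the exam), taking the representatives to be the centroids as in the reasoning above: compute the exact MSE of the four-region uniform quantizer for f(x) = (x + 1)/2 and compare it with the HRA value 1/48.

def f(x):
    return (x + 1) / 2

def integrate(g, a, b, n=20_000):
    # Plain midpoint-rule numerical integration of g over [a, b].
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

mse = 0.0
for a, b in [(-1.0, -0.5), (-0.5, 0.0), (0.0, 0.5), (0.5, 1.0)]:
    mass = integrate(f, a, b)
    centroid = integrate(lambda x: x * f(x), a, b) / mass
    mse += integrate(lambda x, c=centroid: (x - c) ** 2 * f(x), a, b)

print(mse, 1 / 48)   # exact MSE ≈ 0.0201, slightly below 1/48 ≈ 0.0208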
20. Consider a pair of real-valued random variables X = [X1 , X2 ] ∈ [−1, 1]², mutually independent, and both with
uniform probability density function in the interval [−1, 1]. Consider the optimal 10-bit vector quantizer (1024
regions) for X; the regions of this quantizer
a) are squares;
b) are rectangles (but not squares);
c) none of the previous answers.
Solution: The optimal partition of the plane is not obtained with squares or rectangles, but with hexagons.
Part II
Notes: present only the final results, not the intermediate calculations that lead to them; present the results in
exact form (for example, H(U ) = 2 log2 (3) bits/symbol) or in numerical form with two decimal places (for example,
H(U ) = 3.17 bits/symbol).
Problem 1
Consider a discrete memoryless source X ∈ {1, 2}, with probability P[X = 1] = 1/3. Consider another source Y
that produces symbols according to Y = X ∗ N, where N ∈ {1, 2} is a random variable, independent from X, with
probability P[N = 1] = 1/2.
a) Compute the following quantities: H(X), H(Y ), H(X, Y ), H(Y |X), and I(X; Y ).
Solution: The following is the table of joint probabilities of the pair (X, Y )
P(X, Y)     y = 1    y = 2    y = 4
x = 1        1/6      1/6       0
x = 2         0       1/3      1/3
The marginal for X (which is given in the problem) leads to
H(X) = (1/3) log2 3 + (2/3) log2 (3/2) = log2 3 − 2/3 ≃ 1.585 − 0.666 ≃ 0.92 bits/symbol.
Variable Y takes values in {1, 2, 4}, with probabilities: P[Y = 1] = P[X = 1] P[N = 1] = (1/3)(1/2) = 1/6;
P[Y = 4] = P[X = 2] P[N = 2] = (2/3)(1/2) = 1/3; P[Y = 2] = 1 − (1/6) − (1/3) = 1/2; thus,
H(Y) = (1/6) log2 6 + (1/3) log2 3 + (1/2) log2 2 = (1/2) log2 3 + 2/3 ≃ 1.46 bits/symbol.
The joint entropy is obtained directly from the table above:
H(X, Y) = (2/6) log2 6 + (2/3) log2 3 = log2 3 + 1/3 ≃ 1.92 bits/symbol.
Using Bayes law for entropies, H(X, Y ) = H(Y |X) + H(X),
H(Y |X) = H(X, Y ) − H(X) = 1 bit/symbol
Finally, we compute the mutual information
I(X; Y) = H(Y) − H(Y|X) = (1/2) log2 3 − 1/3 ≃ 0.46 bits/symbol.
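A quick numerical check (not part of the exam) of these five quantities, computed directly from the joint table above:

from math import log2

joint = {(1, 1): 1/6, (1, 2): 1/6, (2, 2): 1/3, (2, 4): 1/3}

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

px = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (1, 2)}
py = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (1, 2, 4)}

H_X = H(px.values())        # ≈ 0.92 bits/symbol
H_Y = H(py.values())        # ≈ 1.46 bits/symbol
H_XY = H(joint.values())    # ≈ 1.92 bits/symbol
H_Y_given_X = H_XY - H_X    # = 1 bit/symbol
I_XY = H_Y - H_Y_given_X    # ≈ 0.46 bits/symbol
print(H_X, H_Y, H_XY, H_Y_given_X, I_XY)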
b) Compute the expected code-length of the optimal binary codes for the sources X and Y .
Solution: Variable X only takes two different values, thus L[C opt (X)] = 1 bit/symbol.
Variable Y takes values in {1, 2, 4}, with probabilities 1/6, 1/2, 1/3, respectively, thus an optimal code is
{C(1) = 10, C(2) = 0, C(4) = 11}, with L[C opt (Y )] = 2(1/6) + (1/2) + 2(1/3) = 3/2 = 1.5 bit/symbol.
c) Compute the expected code-length of the optimal binary joint code for pair (X, Y ) (that is, the optimal code for
the joint probability distribution of (X, Y )).
Solution: The pair (X, Y ) takes 4 possible configurations {(1, 1), (1, 2), (2, 2), (2, 4)} with probabilities 1/6, 1/6,
1/3, 1/3, respectively. A possible Huffman code for this distribution is {C(1, 1) = 00, C(1, 2) = 01, C(2, 2) =
10, C(2, 4) = 11}, with L[C opt (X, Y )] = 2 bits/symbol.
d) Compute the expected code-length of the optimal ternary code for the second order extension of source X
(expressed in trits per symbol of source X).
Solution: The second order extension of source X takes values in {(1, 1), (1, 2), (2, 1), (2, 2)}, with probabilities
1/9, 2/9, 2/9, and 4/9, respectively. Since there is an even number of configurations, we append a dummy
symbol “*” and design a Huffman code: {C(1, 1) = 01, C(1, 2) = 02, C(2, 1) = 1, C(2, 2) = 2, C(∗) = 00}.
After dropping the word for the dummy symbol, the code has an expected length equal to 2(1/9) + 2(2/9) +
1(2/9) + 1(4/9) = 4/3 trits/pair-of-symbols. Finally, in terms of symbols (not pairs of symbols),
L2[C2opt(X)] = (1/2)(4/3) = 2/3 ≃ 0.67 trits/symbol.
Problem 2
Consider a first-order Markovian source with alphabet {a, b, c} and the following transition matrix:
P(Xt | Xt−1)    Xt−1 = a    Xt−1 = b    Xt−1 = c
Xt = a             1/2         1/4         1/4
Xt = b             1/4         1/2         1/4
Xt = c             1/4         1/4         1/2
1. Determine the stationary distribution for this source.
Solution: Since each row is a circular permutation of the others (and contains all possible such permutations)
no symbol is on average more probable than the others, thus
p∞ = [1/3, 1/3, 1/3]ᵀ
(the answer without the transpose is also accepted as correct). Of course, it is also possible to compute the
eigenvector of the transpose of the transition matrix associated with the unit eigenvalue.
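As an alternative to the eigenvector computation, a small sketch (not from the exam) that finds the stationary distribution by simply iterating p ← pP:

P = [[1/2, 1/4, 1/4],
     [1/4, 1/2, 1/4],
     [1/4, 1/4, 1/2]]

p = [1.0, 0.0, 0.0]   # any initial distribution converges here
for _ in range(100):
    p = [sum(p[i] * P[i][j] for i in range(3)) for j in range(3)]

print(p)              # ≈ [1/3, 1/3, 1/3]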
2. Knowing that the probability of the sequence abcc is twice the probability of the sequence bcaa and three times
larger than that of sequence cabb, that is, P[abcc] = 2P[bcaa] and P[abcc] = 3P[cabb], find the initial distribution.
Solution: Let the initial distribution be denoted as p1 = [pa , pb , pc ]ᵀ. Notice that
P[abcc] = pa P (Xt = b|Xt−1 = a) P (Xt = c|Xt−1 = b) P (Xt = c|Xt−1 = c) = pa (1/4)(1/4)(1/2) = pa (1/32)
P[bcaa] = pb (1/4)(1/4)(1/2) = pb (1/32)
P[cabb] = pc (1/32)
We thus have the following system:

pa = 2 pb,    pa = 3 pc,    pa + pb + pc = 1,
the solution of which is pa = (6/11), pb = (3/11), pc = (2/11).
3. Compute the conditional entropy rate of this source and the expected length of the optimal binary coding scheme
for this source
Solution:
H′(X) = 3/2 = 1.5 bits/symbol (simple, because all the probabilities are powers of 2).
L[Copt(X)] = H′(X) = 1.5 bits/symbol (simple, because all the probabilities are powers of 2).
4. Consider that this source is coded with the fixed code {C(a) = 0, C(b) = 10, C(c) = 11}; compute the resulting
expected code-length:
L[C(X)] = (1/3)(1·(1/2) + 2·(1/4) + 2·(1/4)) + (1/3)(1·(1/4) + 2·(1/2) + 2·(1/4)) + (1/3)(1·(1/4) + 2·(1/4) + 2·(1/2)) = 5/3 ≃ 1.67 bits/symbol
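A quick numerical check (not part of the exam) of this average, weighting each row of the transition matrix by the stationary distribution and using the code lengths 1, 2, 2 for a, b, c:

P = [[1/2, 1/4, 1/4],
     [1/4, 1/2, 1/4],
     [1/4, 1/4, 1/2]]
lengths = [1, 2, 2]           # C(a) = 0, C(b) = 10, C(c) = 11
stationary = [1/3, 1/3, 1/3]

L = sum(pi * sum(p * l for p, l in zip(row, lengths))
        for pi, row in zip(stationary, P))
print(L, 5/3)                 # both ≈ 1.67 bits/symbol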
Problem 3
Consider a real-valued random variable X, with probability density function
fX(x) = A + A x,   for x ∈ [−1, 0[;
fX(x) = A,         for x ∈ [0, 1];
fX(x) = 0,         for x ∉ [−1, 1].
1. Find A such that fX is a valid probability density function.
Solution: ∫_{−1}^{1} fX(x) dx = A/2 + A (area of a triangle of height A and width 1, plus a rectangle of height A
and width 1). Equating A/2 + A = 1 leads to A = 2/3.
2. Consider X connected to the input of a uniform 2-bit quantizer (that is, with the following four regions: R0 =
[−1, −1/2], R1 =]−1/2, 0], R2 =]0, 1/2] and R3 =]1/2, 1]). Compute the corresponding optimal representatives.
Solution:
y0 = ∫_{−1}^{−1/2} x (A + A x) dx / ∫_{−1}^{−1/2} (A + A x) dx = −2/3 ≃ −0.67
y1 = ∫_{−1/2}^{0} x (A + A x) dx / ∫_{−1/2}^{0} (A + A x) dx = −2/9 ≃ −0.22
y2 = (0 + 1/2)/2 = 1/4 = 0.25 (the density is uniform in this region)
y3 = (1/2 + 1)/2 = 3/4 = 0.75 (the density is uniform in this region)
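The same centroids can be obtained symbolically; below is a small sketch (not from the exam, and assuming the sympy package is available) that evaluates each centroid as a ratio of integrals:

from sympy import symbols, integrate, Rational

x = symbols("x")
A = Rational(2, 3)
f_left = A + A * x    # density on [-1, 0[
f_right = A           # density on [0, 1]

def centroid(f, a, b):
    # Center of mass of the density f on the interval [a, b].
    return integrate(x * f, (x, a, b)) / integrate(f, (x, a, b))

print(centroid(f_left, -1, Rational(-1, 2)))   # -2/3
print(centroid(f_left, Rational(-1, 2), 0))    # -2/9
print(centroid(f_right, 0, Rational(1, 2)))    # 1/4
print(centroid(f_right, Rational(1, 2), 1))    # 3/4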
3. Consider X connected to the input of a uniform 1-bit quantizer (that is, with the following 2 regions: R0 = [−1, 0]
and R1 =]0, 1]). Compute the exact value of the mean squared error (MSE) and its high-resolution approximation.
Solution: The optimal representative of R0 is y0 = −(1/3) and that of R1 is y1 = 1/2.
MSE = ∫_{−1}^{0} (x + 1/3)² (A + A x) dx + ∫_{0}^{1} (x − 1/2)² A dx = A (1/36 + 1/12) = (2/3)(1/9) = 2/27 ≃ 0.074

MSE (high-resolution approximation) = ∆²/12 = 1/12 ≃ 0.083
4. Consider that X is connected to the input of a non-uniform 2-bit quantizer, with the following four regions:
R0 = [−1, 0], R1 =]0, 1/3], R2 =]1/3, 2/3] and R3 =]2/3, 1]. Compute the entropy of the output of the encoder
of this quantizer, E : [−1, 1] → {0, 1, 2, 3} (which is the same as that of the quantized variable Q(X)).
Solution: Define pi = P[X ∈ Ri ]. Then,
p0 = A/2 = 1/3 (area of a triangle of height A and width 1)
p1 = p2 = p3 = A/3 = 2/9 (areas of rectangles of height A and width 1/3)
H[E(X)] = H[Q(X)] = (1/3) log2 3 + (2/3) log2 (9/2) = (5/3) log2 3 − 2/3 ≃ 1.97 bits/symbol
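A one-line numerical check (not part of the exam) of this entropy:

from math import log2

probs = [1/3, 2/9, 2/9, 2/9]              # p0, p1, p2, p3 from above
print(-sum(p * log2(p) for p in probs))   # ≈ 1.97 bits/symbol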
5. Consider that X is connected to the input of a 2-bit quantizer (clearly non-optimal), with the following codebook: C = {y0 = −3/4, y1 = −1/2, y2 = −1/4, y3 = 1/2}. The four regions can be written as R0 = [−1, a], R1 =
]a, b], R2 =]b, c], R3 =]c, 1]. Find the values of a, b and c that minimize the MSE, given codebook C.
Solution: Given the codebook, the optimal regions do not depend on the probability density function:
a = (y0 + y1)/2 = −5/8
b = (y1 + y2)/2 = −3/8
c = (y2 + y3)/2 = 1/8