Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Comp 411 Computer Orginization
Fall 2007
Problem Set #1 Solutions
Problem 1
a) If all nucleic acids are equally likely, a 3-nucleic acid sequence can be represented with 6 bits, as
such:
64
log2 ( 4∗4∗4
1 ) = log2 ( 1 ) = 6
Given that there are 20 amino acids and 1 stop code, the minimal number of bits to encode a
single item in a protein chain is:
log2 ( 21
1 ) ≈ 4.39
The numbers do not agree. Reasons why should include something along the lines of less
information being stored in amino acid representation, the smaller set size of the amino acids,
or such.
b) Since there are 64 codons, and 61 are amino acids. Since the only information given is that the
codon represents an amino acid, the bits conveyed are:
64
) ≈ 0.07
log2 ( 61
Similarly for the 3 stop codes:
log2 ( 64
3 ) ≈ 4.42
Since there are 6 possible codons for Serine:
log2 ( 64
6 ) ≈ 3.42
c) There are 37 codons that contain the T nucleotide and 64 possible codons. Thus, the bits
conveyed are:
64
log2 ( 37
) ≈ 0.79
To figure out how many bits of information are added by knowing that the codon is a stop code,
subtract the original amount of information conveyed from the new amount of information
conveyed.
64
64
log2 ( ) − log2 ( ) ≈ 4.41 − 0.79 ≈ 3.62
3
37
You could also simply use log2 ( 37
3 ) to find the amount of additional information.
1
d) There are 20 amino acids and 3 bases:
log2 ( 20
3 ) ≈ 2.74
There are 64 possible codons and 10 of them encode to bases:
64
log2 ( 10
) ≈ 2.68
There are 2 ways to encode Lysine, so:
64
log2 ( 64
2 ) − log2 ( 10 ) ≈ 5 − 2.68 ≈ 2.32
Again, you could also simply evaluate log2 ( 10
3 ) to find the amount of additional information.
e) Across the set of 64 codons, there are 32 available transitions. Thus, let us consider each position
in a codon and its influence on the final value. For the right-most position, there is only one
transition that changes the resulting amino acid: Isoleucine ↔ Methionine. For the middle
position, all 32 of the possible 32 transitions change the protein. For the right-most position,
30 of the 32 transitions change the protein (TTA ↔ CTA and TTG ↔ CTG do not). Thus,
the number of bits conveyed is:
96
log2 ( 32+32+32
1+32+30 ) = log2 ( 63 ) ≈ 0.61
f ) In the case of Glycine, since the first 2 nuclic acids are all that is needed to identify the resulting
amino acid, no bits of information are conveyed in the last nucleic acid.
P
g) Using the entropy formula given in the lecture − i pi log2 (pi ), the entropy of the codes is:
−(0.24 ∗ log2 (0.24) + 0.14 ∗ log2 (0.14) + 0.12 ∗ log2 (0.12) + 0.5 ∗ log2 (0.5))
≈ −(0.24(−2.06) + 0.14(−2.84) + 0.12(−3.06) + 0.5(−1)) ≈ −(−0.49 − 0.4 − 0.38 − 0.5) ≈ 1.77
The bits wasted using a fixed length scheme are:
2 − 1.77 ≈ 0.23
h) The string 0011011101010 can be decoded as follows (only the last acid in the codon is shown):
0 |{z}
0 |{z}
110 |{z}
111 |{z}
0 |{z}
10 |{z}
10 |{z}
10
|{z}
C C A T C G G G
i) The expected length is:
1000(0.5 ∗ 1 + 0.24 ∗ 2 + 0.14 ∗ 3 + 0.12 ∗ 3) = 1760
Since GGT and GGA have an encoded length of 3, the worst case is:
1000(3) = 3000
Answers will vary, but generally when compared to 1000 ∗ log2 ( 14 ) = 2000, the worst case of
3000 (a 50% increase) seems very poor.
2
Problem 2
a) There are 32 positions in a 32 bit word and each position can hold 1 of 2 possible values, thus:
232 = 4294967296
b) 0 is represented as:
0000 0000 0000 0000 0000 0000 0000 00002
The most positive integer of 2147483647 (from
0111 1111 1111 1111 1111 1111 1111 11112
232
2
The most negative value of -2147483648 (from
1000 0000 0000 0000 0000 0000 0000 00002
232
2 )
− 1) is represented as:
is represented as:
Negating the most negative number is as such:
initially → 1000 0000 0000 0000 0000 0000 0000 00002 (-2147483648)
complement → 0111 1111 1111 1111 1111 1111 1111 11112 (2147483647)
add 1→ 1000 0000 0000 0000 0000 0000 0000 00002 (-2147483648)
c) See d)
d)
1. 0x0000019B16
2. FFFF000016
3. BADC541116
4. ABADFACE16
5. FFFFFFFF16
e) All numbers are in their base 2 form, followed by their base 10 form.
1.
010100 (20)
+001010 (10)
011110 (30)
2.
010011 (19)
+110001 (-15)
000100 (4)
3.
001111 (15)
+101101 (-19)
111100 (-4)
3
4.
011011 (27)
+110111 (-9)
010010 (18)
5.
110111 (-9)
+101111 (-17)
100110 (-26)
6.
010101 (21)
+101011 (-21)
000000 (0)
7.
011111 (31)
+001100 (12)
101011 (-21)
Answers will vary, but they should mention that a 1 was carried into the most significant
bit, causing the number to ’become’ negative.
f ) Answers will vary. It would be nice to have something that describes how a number added to its
complement is a very large number and adding 1 causes it to overflow to 0.
Problem 3
a)
1. 1+1+1+0+1+1+0+1+1+1+1+0+1+1+0+1+1 = 13 (has error)
2. 1+1+0+1+1+1+1+0+1+0+1+0+1+1+0+1+1 = 12 (no errors)
3. 1+0+1+1+1+1+1+0+1+1+1+0+1+1+1+1+0 = 13(has error)
4. 0+0+0+0+0+0+0+0+0+0+0+0+0+0+0+0+0 = 0 (no errors)
b) By summing the ’1’s in each row and column, it can be determined if any errors are the data
block.
1. Error at 1,1
Corrected matrix:
0011
0110
0011
011
2. No errors
4
3. Error at 2,2
Corrected matrix:
000
101
10
4. Error in row 1 parity bit
Corrected matrix:
0110
1001
0110
100
c) Answers will vary. Should include mention of 2 errors in a single row or column.
d) 13 is the only index that is present in p0 , p2 , and p3 . Thus the bit at index 13 has the error.
Since p0 , p2 , and p3 marked the error, e0 = 1, e1 = 0, e2 = 1, e3 = 1, and e4 = 0.
The binary representation of e0 , e1 , e2 , e3 , and e4 is 01101. Note that e0 represents the least
significant bit.
e) Consider that the index with the error is 13 and the binary representation of checked error bits
is 01101. The error bits (01101) encode the index of the error (13) in binary form.
f ) Answers will vary, but they should mention that certain combinations of double bit errors are
not detectable. Solutions may include such things as a parity bit that checks the other parity
bits.
Problem 4
a) Since there are 10 possibilities:
log2 (
10
) ≈ 3.32
1
b) f (d) encodes as follows:
1
2
f (d) =
4
8
d=0
1≤d≤2
3≤d≤5
d≥6
The probabilities of f (d) is:
p(f (d)) =
5
1
10
2
10
3
10
4
10
1
2
4
8
In the case of 1, the information about d is:
log2 (
10
) ≈ 3.32
1
log2 (
10
) ≈ 1.32
4
For 8, the information conveyed is:
c) The average using the formula
P
i
pi log2 ( p1i ) is:
1
10
1
5
3
10
4
10
∗ log2 ( ) + ∗ log2 ( ) +
∗ log2 ( ) +
∗ log2 ( )
10
1
5
1
10
3
10
4
≈
1
1
3
4
(3.32) + (2.32) + (1.74) + (1.32)
10
5
10
10
≈ 1.85
d) Since step 2 always operates on a pair of elements, it will repeat until there is a single element
left in S. Thus, it will iterate n − 1 times.
e) Answers will vary. Should be similar to this:
Element
1
2
4
8
Encoding
011
010
00
1
1
HHH
H
6
4
10
10 (8)
H
H
H3
3
(4)
10
10
HH
2
1
10 (2)
10 (1)
Probability
1
10
2
10
3
10
5
10
6