



Solutions to Problem Set #1 for the COMP 411 Computer Organization course, covering nucleic acid representation, codon representation, and error detection in binary matrices. Each problem includes calculations and explanations.




a) If all nucleic acids are equally likely, a sequence of 3 nucleic acids can be represented with 6 bits:
$\log_2\left(\frac{4 \cdot 4 \cdot 4}{1}\right) = \log_2\left(\frac{64}{1}\right) = 6$
Given that there are 20 amino acids and 1 stop code, the minimal number of bits to encode a single item in a protein chain is:
$\log_2\left(\frac{21}{1}\right) \approx 4.39$
The numbers do not agree. Explanations should say something along the lines of the amino acid representation carrying less information: the 64 codons map onto a smaller set of only 21 outcomes, so several codons encode the same item and the 6-bit codon representation is redundant.
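As a quick numerical check of part a), the sketch below enumerates every sequence of 3 nucleic acids and evaluates the $\log_2(N/M)$ information formula used throughout these solutions. The helper name `info_bits` is introduced here for illustration only.

```python
from itertools import product
from math import log2

def info_bits(n_choices: int, n_remaining: int = 1) -> float:
    """Bits conveyed when n_choices equally likely options narrow to n_remaining."""
    return log2(n_choices / n_remaining)

# Every codon is a sequence of 3 nucleic acids drawn from {A, C, G, T}.
codons = ["".join(c) for c in product("ACGT", repeat=3)]
codon_bits = info_bits(len(codons))   # log2(64 / 1) = 6
protein_bits = info_bits(20 + 1)      # 20 amino acids + 1 stop code

print(len(codons), codon_bits, round(protein_bits, 2))
```

Running it prints `64 6.0 4.39`, matching the two values above.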
b) Of the 64 codons, 61 encode amino acids. Since the only information given is that the codon represents an amino acid, the bits conveyed are:
$\log_2\left(\frac{64}{61}\right) \approx 0.07$
Similarly, for the 3 stop codes:
$\log_2\left(\frac{64}{3}\right) \approx 4.42$
Since there are 6 possible codons for Serine:
$\log_2\left(\frac{64}{6}\right) \approx 3.42$
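The codon counts in part b) can be checked against the standard genetic code. This sketch builds a codon table from the NCBI standard-code amino-acid string, in which codons are listed in T, C, A, G order for each position and `*` marks a stop code. The names `AMINO` and `CODE` are illustrative.

```python
from itertools import product
from math import log2

# Standard genetic code, one letter per codon, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

n_amino = sum(aa != "*" for aa in CODE.values())   # codons that encode an amino acid
n_stop = sum(aa == "*" for aa in CODE.values())    # stop codons
n_serine = sum(aa == "S" for aa in CODE.values())  # codons for Serine (S)

for label, m in [("amino acid", n_amino), ("stop", n_stop), ("Serine", n_serine)]:
    print(f"{label}: {m} codons, {log2(len(CODE) / m):.2f} bits")
```

The printed values (0.07, 4.42, and 3.42 bits) agree with the hand calculation.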
c) There are 37 codons that contain the T nucleotide, out of 64 possible codons. Thus, the bits conveyed are:
$\log_2\left(\frac{64}{37}\right) \approx 0.79$
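To figure out how many bits of information are added by knowing that the codon is a stop code, subtract the information conveyed by the original fact (the codon contains T) from the information conveyed by the new fact (the codon is a stop code). This works because all 3 stop codes contain T:
$\log_2\left(\frac{64}{3}\right) - \log_2\left(\frac{64}{37}\right) \approx 3.62$
You could also simply use $\log_2\left(\frac{37}{3}\right)$ to find the amount of additional information.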
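A small check of part c): the sketch confirms that the stop codes are a subset of the codons containing T, which is what makes the subtraction valid, and that both routes give the same number. The stop-code set used is the standard one (TAA, TAG, TGA).

```python
from itertools import product
from math import log2

codons = ["".join(c) for c in product("ACGT", repeat=3)]
with_t = [c for c in codons if "T" in c]   # 64 - 3**3 = 37 codons contain T
stops = {"TAA", "TAG", "TGA"}               # the 3 standard stop codes

assert stops <= set(with_t)  # every stop code contains T

bits_t = log2(len(codons) / len(with_t))     # about 0.79 bits
bits_stop = log2(len(codons) / len(stops))   # about 4.42 bits
extra = bits_stop - bits_t                   # additional information
direct = log2(len(with_t) / len(stops))      # log2(37 / 3)

print(len(with_t), round(bits_t, 2), round(extra, 2), round(direct, 2))
```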
d) There are 20 amino acids, 3 of which are bases:
$\log_2\left(\frac{20}{3}\right) \approx 2.74$
There are 64 possible codons, and 10 of them encode bases:
$\log_2\left(\frac{64}{10}\right) \approx 2.68$
There are 2 ways to encode Lysine, so:
$\log_2\left(\frac{64}{2}\right) - \log_2\left(\frac{64}{10}\right) \approx 5 - 2.68 \approx 2.32$
Again, you could also simply evaluate $\log_2\left(\frac{10}{2}\right)$ to find the amount of additional information.
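The counts for part d) can be checked the same way. This sketch assumes the 3 "bases" are the basic amino acids Lysine (K), Arginine (R), and Histidine (H), an interpretation that matches the count of 10 codons; the codon table is the standard genetic code in NCBI order.

```python
from itertools import product
from math import log2

AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

BASIC = {"K", "R", "H"}  # assumed: Lysine, Arginine, Histidine
basic_codons = [c for c, aa in CODE.items() if aa in BASIC]
lysine_codons = [c for c, aa in CODE.items() if aa == "K"]

print(round(log2(20 / len(BASIC)), 2))                                      # amino-acid level
print(len(basic_codons), round(log2(64 / len(basic_codons)), 2))            # codon level
print(len(lysine_codons), round(log2(len(basic_codons) / len(lysine_codons)), 2))  # extra bits
```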
e) For each position in a codon, there are 32 available transitions across the set of 64 codons (each codon pairs with exactly one other codon by a transition at that position). Thus, let us consider each position in a codon and its influence on the final value. For the right-most position, only 2 transitions change the result: Isoleucine ↔ Methionine (ATA ↔ ATG) and stop code ↔ Tryptophan (TGA ↔ TGG). For the middle position, 31 of the 32 transitions change the protein (TAA ↔ TGA does not, since both are stop codes). For the left-most position, 30 of the 32 transitions change the protein (TTA ↔ CTA and TTG ↔ CTG do not). Thus, the number of bits conveyed is:
$\log_2\left(\frac{32+32+32}{2+31+30}\right) = \log_2\left(\frac{96}{63}\right) \approx 0.61$
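The per-position counts in part e) are easy to get wrong by hand, so here is a sketch that enumerates them. It assumes the usual definition of a transition (A ↔ G or C ↔ T) and the standard genetic code in NCBI order; `changing_transitions` is a hypothetical helper name.

```python
from itertools import product
from math import log2

AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}  # purine <-> purine, pyrimidine <-> pyrimidine

def changing_transitions(pos: int) -> tuple[int, int]:
    """Return (changing, total) transition pairs at codon position pos (0 = left-most)."""
    changing = total = 0
    for codon, aa in CODE.items():
        mutant = codon[:pos] + TRANSITION[codon[pos]] + codon[pos + 1:]
        if codon < mutant:  # count each unordered pair once
            total += 1
            changing += aa != CODE[mutant]  # stop codes ('*') count as an outcome too
    return changing, total

counts = [changing_transitions(p) for p in range(3)]
changed = sum(c for c, _ in counts)
total = sum(t for _, t in counts)
print(counts, round(log2(total / changed), 2))
```

It prints `[(30, 32), (31, 32), (2, 32)] 0.61`: 63 of the 96 transitions change the result.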
f) In the case of Glycine, since the first 2 nucleic acids (GG) are all that is needed to identify the resulting amino acid, no bits of information are conveyed by the last nucleic acid: $\log_2\left(\frac{4}{4}\right) = 0$.
g) Using the entropy formula given in lecture, $H = -\sum_i p_i \log_2(p_i)$, the entropy of the codes is:
$-\left(0.24 \log_2 0.24 + 0.14 \log_2 0.14 + 0.12 \log_2 0.12 + 0.5 \log_2 0.5\right) \approx -\left(0.24(-2.06) + 0.14(-2.84) + 0.12(-3.06) + 0.5(-1)\right) \approx -(-0.49 - 0.40 - 0.37 - 0.50) \approx 1.76$
The bits wasted using a fixed-length scheme are:
$2 - 1.76 \approx 0.24$
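The entropy in part g) can be evaluated directly. This sketch uses the four probabilities listed above and compares against a fixed-length code of $\log_2 4 = 2$ bits per symbol.

```python
from math import log2

# Probabilities of the four codes from part g)
probs = [0.24, 0.14, 0.12, 0.5]
assert abs(sum(probs) - 1) < 1e-9  # a valid distribution

entropy = -sum(p * log2(p) for p in probs)
wasted = 2 - entropy  # a fixed-length code spends log2(4) = 2 bits per symbol
print(round(entropy, 2), round(wasted, 2))
```

The unrounded entropy is about 1.758 bits, so it sits just below the 1.76-bit average length of the variable-length code in part i), as the entropy bound requires.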
h) The string 0011011101010 can be decoded as follows (only the last nucleic acid of each codon is shown):
$\underbrace{0}_{\text{C}}\;\cdots$
i) The expected length is:
$1000\,(0.5 \cdot 1 + 0.24 \cdot 2 + 0.14 \cdot 3 + 0.12 \cdot 3) = 1760$ bits
Since GGT and GGA have an encoded length of 3, the worst case is:
$1000 \cdot 3 = 3000$ bits
Answers will vary, but generally, compared to the $1000 \cdot \log_2\left(\frac{4}{1}\right) = 2000$ bits of a fixed-length encoding, the worst case of 3000 bits (a 50% increase) seems very poor.
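The expected and worst-case lengths in part i) follow from the code lengths 1, 2, 3, 3. The sketch below assumes the variable-length code was built with Huffman's algorithm (those lengths are consistent with it). The symbol names `x1` to `x4` are placeholders, since the mapping of probabilities to nucleic acids is not shown here.

```python
import heapq
from itertools import count

def huffman_lengths(probs: dict[str, float]) -> dict[str, int]:
    """Return the Huffman code length of each symbol."""
    tie = count()  # tie-breaker so heapq never has to compare the symbol lists
    heap = [(p, next(tie), [sym]) for sym, p in probs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:  # each merge adds one bit to every member's code
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), syms1 + syms2))
    return lengths

probs = {"x1": 0.5, "x2": 0.24, "x3": 0.14, "x4": 0.12}  # placeholder symbol names
lengths = huffman_lengths(probs)
n = 1000  # number of encoded symbols
expected = n * sum(p * lengths[s] for s, p in probs.items())
worst = n * max(lengths.values())
print(lengths, round(expected), worst)
```

It prints `{'x1': 1, 'x2': 2, 'x3': 3, 'x4': 3} 1760 3000`.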
Answers will vary, but they should mention that a 1 was carried into the most significant bit, causing the number to ’become’ negative.
f) Answers will vary. Ideally, they describe how a number added to its bitwise complement gives all 1s (the largest unsigned value the bits can hold), and adding 1 causes it to overflow to 0.
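Both two's complement answers above can be illustrated with fixed-width arithmetic. This sketch assumes an 8-bit word (masking with `0xFF` plays the role of the hardware dropping the carry out), and the operand values are arbitrary examples.

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0xFF: keep only the low 8 bits, as 8-bit hardware would

def to_signed(x: int) -> int:
    """Interpret an 8-bit pattern as a two's complement value."""
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x

x = 0b0010_1101                     # 45
complement = ~x & MASK              # bitwise complement: 0b1101_0010
all_ones = (x + complement) & MASK  # x + ~x is all 1s (0xFF)
wrapped = (all_ones + 1) & MASK     # adding 1 overflows to 0
negated = (complement + 1) & MASK   # hence ~x + 1 behaves as -x

# Overflow: adding two positive numbers carries a 1 into the sign bit
overflow = (100 + 100) & MASK
print(hex(all_ones), wrapped, to_signed(negated), to_signed(overflow))
```

It prints `0xff 0 -45 -56`: the sum 100 + 100 "becomes" negative once a 1 lands in the most significant bit.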
a)
b) By summing the 1s in each row and column, it can be determined whether there are any errors in the data block.
c) Answers will vary, but should mention the case of 2 errors in a single row or column.
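The row/column check in parts b) and c) can be sketched as follows. This assumes even parity, one stored parity bit per row and per column, and a hypothetical 4x4 data block. A single error shows up as one failing row crossing one failing column, while a second error in the same row cancels that row's check.

```python
def check_parity(block: list[list[int]], row_par: list[int], col_par: list[int]):
    """Return the lists of rows and columns whose even-parity check fails."""
    bad_rows = [r for r, row in enumerate(block) if sum(row) % 2 != row_par[r]]
    bad_cols = [c for c in range(len(block[0]))
                if sum(row[c] for row in block) % 2 != col_par[c]]
    return bad_rows, bad_cols

data = [
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]  # hypothetical 4x4 data block
row_par = [sum(row) % 2 for row in data]
col_par = [sum(col) % 2 for col in zip(*data)]

received = [row[:] for row in data]
received[1][2] ^= 1                                   # one flipped bit
single = check_parity(received, row_par, col_par)     # ([1], [2]): located
received[1][3] ^= 1                                   # a second error in the same row
double = check_parity(received, row_par, col_par)     # ([], [2, 3]): row check passes
print(single, double)
```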
d) 13 is the only index covered by $p_0$, $p_2$, and $p_3$ but by neither $p_1$ nor $p_4$. Thus, the bit at index 13 has the error.
Since $p_0$, $p_2$, and $p_3$ flagged the error, $e_0 = 1$, $e_1 = 0$, $e_2 = 1$, $e_3 = 1$, and $e_4 = 0$.
Written as $e_4 e_3 e_2 e_1 e_0$, the error bits are 01101; note that $e_0$ is the least significant bit.
e) The index with the error is 13, and the checked error bits read 01101. The error bits therefore encode the index of the error in binary: $01101_2 = 13$.
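The observation in part e) is the core of the Hamming-style check. This sketch assumes that indices start at 1, that parity group $p_k$ covers every index whose binary representation has bit $k$ set, that parity bits sit at indices $2^k$, and that even parity is used. The word length of 21 (16 data bits) and the data values are hypothetical.

```python
def syndrome(bits: list[int], n_checks: int = 5) -> int:
    """Return e_{n-1}...e_0 as an integer: e_k = 1 if parity group p_k fails.

    bits[i] is the bit at index i (index 0 is unused).
    """
    s = 0
    for k in range(n_checks):
        mask = 1 << k
        parity = 0
        for i in range(1, len(bits)):
            if i & mask:  # index i belongs to group p_k
                parity ^= bits[i]
        if parity:  # odd count under even parity: p_k flags an error
            s |= mask
    return s

def encode(data: list[int], n: int = 21) -> list[int]:
    """Place data bits at non-power-of-two indices, then set each p_k at index 2**k."""
    bits = [0] * (n + 1)
    data_iter = iter(data)
    for i in range(1, n + 1):
        if i & (i - 1):  # not a power of two: a data position
            bits[i] = next(data_iter)
    for k in range(5):
        p = 1 << k
        bits[p] = sum(bits[i] for i in range(1, n + 1) if i & p and i != p) % 2
    return bits

word = encode([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1])  # 16 data bits
clean = syndrome(word)   # 0: every group passes
word[13] ^= 1            # flip the bit at index 13
s = syndrome(word)
print(clean, format(s, "05b"), s)
```

It prints `0 01101 13`: the failing groups $p_0$, $p_2$, $p_3$ spell out the erroneous index.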
f) Answers will vary, but they should mention that certain combinations of double-bit errors are not detectable. Solutions may include something like an extra parity bit that checks the other parity bits.
a) Since there are 10 possibilities:
$\log_2\left(\frac{10}{1}\right) \approx 3.32$
b) f(d) encodes as follows:
$f(d) = \begin{cases} 1 & d = 0 \\ 2 & 1 \le d \le 2 \\ 4 & 3 \le d \le 5 \\ 8 & d \ge 6 \end{cases}$
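The probabilities of the values of f(d) are:
$p(f(d)) = \begin{cases} \frac{1}{10} & f(d) = 1 \\ \frac{2}{10} & f(d) = 2 \\ \frac{3}{10} & f(d) = 4 \\ \frac{4}{10} & f(d) = 8 \end{cases}$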
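The distribution of f(d) can be tabulated directly from its definition. This sketch assumes, as in part a), that the 10 possibilities d = 0, ..., 9 are equally likely.

```python
from collections import Counter
from math import log2

def f(d: int) -> int:
    """Encoding from part b)."""
    if d == 0:
        return 1
    if d <= 2:
        return 2
    if d <= 5:
        return 4
    return 8

# Assumes the 10 possibilities d = 0..9 are equally likely.
counts = Counter(f(d) for d in range(10))
for value, c in sorted(counts.items()):
    print(f"P(f(d) = {value}) = {c}/10")

print(round(log2(10), 2))  # bits for one of 10 equally likely choices, part a)
```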