



The main points are: Quantifying Information, Encoding, Fixed-Length Encodings, Encoding Numbers, Signed Integers, Data Compression, Variable-Length Encodings, Error Detection and Correction, Hamming Distance
Typology: Slides




L01 - Basics of Information 9
2/3/
(Claude Shannon, 1948)
(well, actually, are both outcomes equally probable?)
Information is measured in bits (binary digits) = number of 0/1’s required to encode choice(s)
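As a rough illustration of measuring information in bits, the sketch below uses the rule stated in the summary of these slides: narrowing N equally probable choices down to M conveys log2(N/M) bits. The function name `bits_of_information` and the sample inputs (a coin flip, 36 outcomes, a card's suit) are assumptions chosen for illustration, not examples taken from the slides.

```python
import math


def bits_of_information(n_choices: int, m_remaining: int = 1) -> float:
    """Bits gained when n equally probable choices are narrowed down to m."""
    if not 0 < m_remaining <= n_choices:
        raise ValueError("need 0 < m_remaining <= n_choices")
    return math.log2(n_choices / m_remaining)


# Hypothetical examples (not from the slides):
print(bits_of_information(2))       # one of 2 equally likely outcomes -> 1.0
print(bits_of_information(36))      # one of 36 equally likely outcomes
print(bits_of_information(52, 13))  # 52 cards narrowed down to the 13 of one suit
```

The formula only applies when the choices really are equally probable, which is the caveat the parenthetical question on the slide raises.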
6.004 – Spring 2009
ex. ~86 English characters =
{A-Z (26), a-z (26), 0-9 (10), punctuation (11), math (9), financial (4)}
7-bit ASCII (American Standard Code for Information Interchange)
ex. Decimal digits 10 = {0,1,2,3,4,5,6,7,8,9}
4-bit BCD (binary coded decimal): $\log_2 10 = 3.322 < 4$ bits
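The two examples above can be checked with a small sketch: a fixed-length code needs the smallest whole number of bits that gives every choice its own code word. The helper names `fixed_length_bits` and `to_bcd` are hypothetical, and the BCD packing shown (one 4-bit group per decimal digit) is the usual convention rather than something spelled out on the slide.

```python
def fixed_length_bits(n_choices: int) -> int:
    """Smallest whole number of bits giving each of n choices a distinct code."""
    if n_choices < 1:
        raise ValueError("need at least one choice")
    # (n - 1).bit_length() equals ceil(log2(n)) without floating-point rounding.
    return max(1, (n_choices - 1).bit_length())


def to_bcd(number: int) -> str:
    """Encode a non-negative integer as 4-bit binary coded decimal groups."""
    return " ".join(format(int(digit), "04b") for digit in str(number))


print(fixed_length_bits(10))  # decimal digits -> 4
print(fixed_length_bits(86))  # ~86 English characters -> 7
print(to_bcd(2009))           # -> 0010 0000 0000 1001
```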
It is straightforward to encode positive integers as a sequence of bits. Each bit is assigned a weight. Ordered from right to left, these weights are increasing powers of 2. The value of an n-bit number encoded in this fashion is given by the following formula:

$v = \sum_{i=0}^{n-1} 2^i b_i$

where $b_i$ is the bit with weight $2^i$.

Bit weights for a 12-bit number, from left to right:
$2^{11}\ 2^{10}\ 2^9\ 2^8\ 2^7\ 2^6\ 2^5\ 2^4\ 2^3\ 2^2\ 2^1\ 2^0$

Oftentimes we will find it convenient to cluster groups of bits together for a more compact notation. Two popular groupings are clusters of 3 bits and 4 bits.

Octal - base 8 (example: 03720)
000 - 0    100 - 4
001 - 1    101 - 5
010 - 2    110 - 6
011 - 3    111 - 7

Hexadecimal - base 16 (example: 0x7d0)
0000 - 0    1000 - 8
0001 - 1    1001 - 9
0010 - 2    1010 - a
0011 - 3    1011 - b
0100 - 4    1100 - c
0101 - 5    1101 - d
0110 - 6    1110 - e
0111 - 7    1111 - f
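A minimal sketch of the weighted-sum rule for unsigned integers, together with the 3-bit (octal) and 4-bit (hexadecimal) groupings. The 12-bit pattern used here is an assumption: it is what you get by expanding the octal example 03720 digit by digit, since the bit string itself did not survive in the slide text. `value_of_bits` is a hypothetical helper name.

```python
def value_of_bits(bits: str) -> int:
    """Sum 2**i * b_i, where b_i is the bit i places from the right."""
    return sum(2**i * int(b) for i, b in enumerate(reversed(bits)))


bits = "011111010000"  # assumed: octal digits 3, 7, 2, 0 written as 3-bit groups
value = value_of_bits(bits)

print(value)       # -> 2000
print(oct(value))  # -> 0o3720 (Python's spelling of octal 03720)
print(hex(value))  # -> 0x7d0
```

Python's built-in `int(bits, 2)` computes the same value; the explicit sum is written out only to mirror the formula.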
N-bit 2’s complement: the bit weights, from left to right, are $-2^{N-1},\ 2^{N-2},\ \ldots,\ 2^3,\ 2^2,\ 2^1,\ 2^0$. The leftmost bit is the “sign bit”.
Range: $-2^{N-1}$ to $2^{N-1} - 1$

8-bit 2’s complement example:
$11010110 = -2^7 + 2^6 + 2^4 + 2^2 + 2^1 = -128 + 64 + 16 + 4 + 2 = -42$

If we use a two’s complement representation for signed integers, the same binary addition mod $2^n$ procedure will work for adding positive and negative numbers (don’t need separate subtraction rules). The same procedure will also handle unsigned numbers!

By moving the implicit location of the “decimal” point, we can represent fractions too:
$1101.0110 = -2^3 + 2^2 + 2^0 + 2^{-2} + 2^{-3} = -8 + 4 + 1 + 0.25 + 0.125 = -2.625$
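The sketch below works through the two's complement ideas above under a few assumptions: the 8-bit pattern `11010110` is the one whose set bits carry the exponents 7, 6, 4, 2 and 1 listed in the example, and the function names are hypothetical. It interprets a bit string as a signed value (optionally with fraction bits) and shows that plain addition mod 2^n handles negative operands.

```python
def twos_complement_value(bits: str, fraction_bits: int = 0) -> float:
    """Interpret bits as two's complement with `fraction_bits` bits after the point."""
    n = len(bits)
    unsigned = int(bits, 2)
    # The leftmost bit has weight -2**(n-1), so subtract 2**n when it is set.
    signed = unsigned - (1 << n) if bits[0] == "1" else unsigned
    return signed / (1 << fraction_bits)


def add_mod_2n(a_bits: str, b_bits: str) -> str:
    """Add two equal-width bit strings, discarding any carry out of the top bit."""
    n = len(a_bits)
    total = (int(a_bits, 2) + int(b_bits, 2)) % (1 << n)
    return format(total, f"0{n}b")


print(twos_complement_value("11010110"))     # -> -42.0
print(twos_complement_value("11010110", 4))  # 1101.0110 -> -2.625
print(add_mod_2n("11010110", "00110010"))    # -42 + 50 -> 00001000 (8)
```

The same `add_mod_2n` routine also gives correct results when both operands are read as unsigned numbers, as long as the true sum fits in n bits.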
Information in choice_i with probability $p_i$: $\log_2(1/p_i)$ bits

choice_i    p_i     log2(1/p_i)
“A”         1/3     1.58 bits
“B”         1/2     1 bit
“C”         1/12    3.58 bits
“D”         1/12    3.58 bits

Average information
= (.333)(1.58) + (.5)(1) + (2)(.083)(3.58)
= 1.626 bits

Can we find an encoding where transmitting 1000 choices is close to 1626 bits on the average? Using two bits for each choice = 2000 bits.
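The average above can be reproduced with a short sketch. It assumes the probabilities 1/3, 1/2, 1/12 and 1/12 for A, B, C and D, which are the values implied by the factors .333, .5 and .083 in the calculation; the function names are hypothetical.

```python
import math

# Assumed probabilities, matching the factors used in the slide's arithmetic.
PROBABILITIES = {"A": 1 / 3, "B": 1 / 2, "C": 1 / 12, "D": 1 / 12}


def information_bits(p: float) -> float:
    """Information conveyed by a choice that occurs with probability p."""
    return math.log2(1 / p)


def average_information(probabilities: dict) -> float:
    """Probability-weighted average of the per-choice information."""
    return sum(p * information_bits(p) for p in probabilities.values())


for symbol, p in PROBABILITIES.items():
    print(symbol, round(information_bits(p), 2))

print(round(average_information(PROBABILITIES), 3))  # about 1.626
```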
(David Huffman, MIT 1950)

choice_i    p_i     encoding
“A”         1/3     11
“B”         1/2     0
“C”         1/12    100
“D”         1/12    101

Huffman Decoding Tree: from the root, branch 0 leads to B and branch 1 to a subtree; in that subtree, branch 1 leads to A and branch 0 to a second subtree, whose branches 0 and 1 lead to C and D.

Example: 010011011101 decodes as B C A B A D

Average encoding length
= (.333)(2) + (.5)(1) + (2)(.083)(3)
= 1.667 bits

Transmitting 1000 choices takes an average of 1667 bits… better but not optimal.

To get a more efficient encoding (closer to information content) we need to encode sequences of choices, not just each choice individually. This is the approach taken by most file compression algorithms…
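A minimal sketch of decoding with the variable-length code in the table above. Because no code word is a prefix of another, the decoder can emit a symbol as soon as the bits read so far match a code word. The probabilities 1/3, 1/2, 1/12, 1/12 are assumed from the factors in the slide's arithmetic, and `encode`/`decode` are hypothetical names; this does not construct the code, it only uses the one given.

```python
CODE = {"A": "11", "B": "0", "C": "100", "D": "101"}
PROBABILITIES = {"A": 1 / 3, "B": 1 / 2, "C": 1 / 12, "D": 1 / 12}  # assumed


def encode(message: str) -> str:
    """Concatenate the code word for each symbol."""
    return "".join(CODE[symbol] for symbol in message)


def decode(bits: str) -> str:
    """Read bits left to right, emitting a symbol whenever a code word completes."""
    lookup = {word: symbol for symbol, word in CODE.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            symbols.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(symbols)


print(decode("010011011101"))  # -> BCABAD

# Probability-weighted code word length, in bits per choice (5/3).
expected_length = sum(PROBABILITIES[s] * len(CODE[s]) for s in CODE)
print(expected_length)
```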
Key:
A84b!*m9@+M(p
“Outside of a dog, a book is man’s best friend. Inside of a dog, it’s too dark to read…”
-Groucho Marx
Ideal: No redundant info – Only unpredictable bits transmitted. Result appears random!
LOSSLESS: can ‘uncompress’, get back original.
Figure by MIT OpenCourseWare.
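To make the lossless round trip concrete, here is a small sketch using Python's standard `zlib` module. This is a swapped-in illustration: the slides do not name a particular compressor, and repeating the quote 20 times is an assumption made only so that the input contains obvious redundancy.

```python
import zlib

quote = (
    "Outside of a dog, a book is man's best friend. "
    "Inside of a dog, it's too dark to read. "
)
original = (quote * 20).encode("utf-8")  # repeated to create redundancy

compressed = zlib.compress(original, 9)  # 9 = highest compression level
restored = zlib.decompress(compressed)

print(len(original), len(compressed))
print(restored == original)  # lossless: the original comes back exactly
```

The compressed bytes look far less regular than the input text, in line with the observation that an ideal result appears random.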
(Richard Hamming, 1950)
HAMMING DISTANCE: The number of digit positions in which the corresponding digits of two encodings of the same length are different.
The Hamming distance between a valid binary code word and the same code word with a single-bit error is 1.
The problem with our simple encoding is that the two valid code words (“0” and “1”) also have a Hamming distance of 1. So a single-bit error changes a valid code word into another valid code word…
Figure: the code words for “heads” and “tails”, separated by a single-bit error.
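The definition of Hamming distance translates almost directly into code. The sketch below is illustrative only; `hamming_distance` is a hypothetical helper name, and the longer sample words are assumptions.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length encodings differ."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("0", "1"))        # the simple one-bit encoding -> 1
print(hamming_distance("1101", "1001"))  # a word and a single-bit error of it -> 1
print(hamming_distance("000", "111"))    # -> 3
```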
Figure: the code words for “heads” and “tails” with a parity bit added; a single-bit error no longer produces a valid code word.
We can add single-bit error detection to any length code word by adding a parity bit chosen to guarantee the Hamming distance between any two valid code words is at least 2. In the diagram above, we’re using “even parity” where the added bit is chosen to make the total number of 1’s in the code word even.
Can we correct detected errors? Not yet…
If D is the minimum Hamming distance between code words, we can detect up to (D-1)-bit errors.
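A minimal sketch of even parity, assuming the parity bit is appended at the right end of the word (the slides do not say where it goes). The function names are hypothetical. The last example shows the limit of a distance-2 code: two flipped bits go unnoticed.

```python
def add_even_parity(word: str) -> str:
    """Append a bit that makes the total number of 1s even."""
    parity = word.count("1") % 2
    return word + str(parity)


def parity_ok(code_word: str) -> bool:
    """A received word passes the check if it contains an even number of 1s."""
    return code_word.count("1") % 2 == 0


print(add_even_parity("0"), add_even_parity("1"))  # -> 00 11
print(parity_ok("11"))     # no error -> True
print(parity_ok("10"))     # single-bit error detected -> False
print(parity_ok("01111"))  # 10111 with two bits flipped: undetected -> True
```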
Figure: the eight 3-bit words 000, 001, 010, 011, 100, 101, 110, 111. The valid code words for “heads” and “tails” are 000 and 111; every other word is a single-bit error away from exactly one of them.
By increasing the Hamming distance between valid code words to 3, we guarantee that the sets of words produced by single-bit errors don’t overlap. So if we detect an error, we can perform error correction since we can tell what the valid code was before the error happened.
If D is the minimum Hamming distance between code words, we can correct up to $\lfloor (D-1)/2 \rfloor$-bit errors.
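Error correction with a distance-3 code can be sketched as nearest-code-word decoding: pick the valid code word closest in Hamming distance to what was received. The code words 000 and 111 are assumed to be the two valid words in the figure, and the function names are hypothetical.

```python
from itertools import combinations

CODE_WORDS = ["000", "111"]  # assumed valid code words for "heads" and "tails"


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length encodings differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(words: list) -> int:
    """Smallest Hamming distance between any two distinct code words."""
    return min(hamming_distance(a, b) for a, b in combinations(words, 2))


def correct(received: str, words: list) -> str:
    """Return the valid code word nearest to the received word."""
    return min(words, key=lambda word: hamming_distance(word, received))


d = minimum_distance(CODE_WORDS)
print(d, (d - 1) // 2)             # distance 3 -> corrects 1-bit errors
print(correct("010", CODE_WORDS))  # -> 000
print(correct("101", CODE_WORDS))  # -> 111
```

With two flipped bits (for example 000 received as 110) the nearest code word is the wrong one, which is why this code corrects only single-bit errors.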
The right choice of codes can solve hard problems
Information resolves uncertainty
Choices equally probable:
    N choices down to M: $\log_2(N/M)$ bits of information
    use fixed-length encodings
    encoding numbers: 2’s complement signed integers
Choices not equally probable:
    choice_i with probability $p_i$: $\log_2(1/p_i)$ bits of information
    average number of bits = $\sum_i p_i \log_2(1/p_i)$
    use variable-length encodings
To detect D-bit errors: Hamming distance > D
To correct D-bit errors: Hamming distance > 2D
Next time:
encoding information electrically
the digital abstraction
combinational devices
Hand in Information Sheets!