Quantifying Information

L01 - Basics of Information 9

6.004 – Spring 2009 2/3/09

(Claude Shannon, 1948)

Suppose you're faced with N equally probable choices, and I give you a fact that narrows it down to M choices. Then I've given you

log2(N/M) bits of information

Examples:

- information in one coin flip: log2(2/1) = 1 bit
- roll of 2 dice: log2(36/1) = 5.2 bits
- outcome of a Red Sox game: 1 bit
  (well, actually, are both outcomes equally probable?)
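The examples above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `information_bits` and the playing-card case are illustrative assumptions.

```python
import math

def information_bits(n_choices: int, m_remaining: int = 1) -> float:
    """Bits gained when N equally probable choices are narrowed down to M."""
    if not (0 < m_remaining <= n_choices):
        raise ValueError("need 0 < M <= N")
    return math.log2(n_choices / m_remaining)

coin = information_bits(2)            # one coin flip
dice = information_bits(36)           # roll of 2 dice, 36 ordered outcomes
suit = information_bits(52, 13)       # hypothetical: learning a card's suit

print(coin, round(dice, 1), suit)
```

Narrowing 52 cards down to the 13 of one suit leaves M = 13, so the fact "it is a heart" carries log2(52/13) = 2 bits under the equal-probability assumption.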

Information is measured in bits (binary digits) = number of 0/1's required to encode choice(s)


Encoding

Encoding describes the process of assigning representations to information.

- Choosing an appropriate and efficient encoding is a real engineering challenge
- Impacts design at many levels
  - Mechanism (devices, # of components used)
  - Efficiency (bits used)
  - Reliability (noise)
  - Security (encryption)

Next lecture: encoding a bit. What about longer messages?


Fixed-length encodings

If all choices are equally likely (or we have no reason to expect otherwise), then a fixed-length code is often used. Such a code will use at least enough bits to represent the information content.

ex. ~86 English characters = {A-Z (26), a-z (26), 0-9 (10), punctuation (11), math (9), financial (4)}

7-bit ASCII (American Standard Code for Information Interchange): $\log_2(86) = 6.426 < 7$ bits

ex. Decimal digits 10 = {0,1,2,3,4,5,6,7,8,9}

4-bit BCD (binary coded decimal): $\log_2(10) = 3.322 < 4$ bits
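The bit counts for ASCII and BCD follow from rounding the information content up to a whole number of bits. A small sketch, assuming a hypothetical helper name `fixed_length_bits`; it uses integer arithmetic rather than floating-point logarithms so the result is exact:

```python
def fixed_length_bits(n_choices: int) -> int:
    """Smallest number of bits that can give each of N choices a distinct code.

    Equal to ceil(log2(N)), computed exactly with integer bit_length().
    """
    if n_choices < 1:
        raise ValueError("need at least one choice")
    return (n_choices - 1).bit_length()

print(fixed_length_bits(86))  # characters in the ASCII example
print(fixed_length_bits(10))  # decimal digits in the BCD example
```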


Encoding numbers





It is straightforward to encode positive integers as a sequence of bits. Each bit is assigned a weight. Ordered from right to left, these weights are increasing powers of 2. The value of an n-bit number encoded in this fashion is given by the following formula:

$v = \sum_{i=0}^{n-1} 2^i b_i$

weight: 2^11 2^10 2^9 2^8 2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0
bit:    0    1    1   1   1   1   0   1   0   0   0   0

011111010000 (base 2) = 2000 (base 10)

Oftentimes we will find it convenient to cluster groups of bits together for a more compact notation. Two popular groupings are clusters of 3 bits and 4 bits.

Octal - base 8: 03720 (bits grouped as 011 111 010 000)

000 - 0
001 - 1
010 - 2
011 - 3
100 - 4
101 - 5
110 - 6
111 - 7

Hexadecimal - base 16: 0x7d0 (bits grouped as 0111 1101 0000)

0000 - 0    1000 - 8
0001 - 1    1001 - 9
0010 - 2    1010 - a
0011 - 3    1011 - b
0100 - 4    1100 - c
0101 - 5    1101 - d
0110 - 6    1110 - e
0111 - 7    1111 - f
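The weighted-sum rule for unsigned binary numbers can be written out directly. This is an illustrative sketch; the name `bits_to_value` is an assumption, and the bit string is taken to be written most significant bit first, as on the slide.

```python
def bits_to_value(bits: str) -> int:
    """Value of an unsigned binary string: the sum of 2**i * b_i over all bits."""
    if not bits or any(ch not in "01" for ch in bits):
        raise ValueError("expected a non-empty string of 0s and 1s")
    n = len(bits)
    # bits[0] is the leftmost bit, so bit i (weight 2**i) sits at index n-1-i.
    return sum((2 ** i) * int(bits[n - 1 - i]) for i in range(n))

v = bits_to_value("011111010000")
print(v)        # decimal value
print(oct(v))   # Python writes octal with a 0o prefix instead of a leading 0
print(hex(v))   # hexadecimal with the 0x prefix
```

Python's built-in `int("011111010000", 2)` gives the same value; the explicit sum is shown only to mirror the formula.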


Signed integers: 2’s complement

When choices aren’t equally probable

When the choices have different probabilities ($p_i$), you get more information when learning of an unlikely choice than when learning of a likely choice.

Information from choice $i$ = $\log_2(1/p_i)$ bits

Average information from a choice = $\sum_i p_i \log_2(1/p_i)$
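Both quantities are easy to evaluate for a concrete distribution. The sketch below is illustrative only; the function names and the sample probabilities are assumptions, not values from the slides.

```python
import math

def information(p: float) -> float:
    """Bits of information from learning a choice that has probability p."""
    if not (0 < p <= 1):
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1 / p)

def average_information(probs: list[float]) -> float:
    """Sum of p_i * log2(1/p_i): the average information per choice."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]     # assumed: four equally likely choices
skewed = [0.5, 0.25, 0.125, 0.125]     # assumed: unequal probabilities

print(information(0.125))              # the unlikely choice is worth more bits
print(average_information(uniform))
print(average_information(skewed))
```

With equal probabilities the average reduces to log2(N), matching the equally-probable case; the skewed distribution averages fewer bits per choice.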

Example

Variable-length encodings


Use shorter bit sequences for high-probability choices, longer sequences for less probable choices.
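As a sketch of this idea, the code below uses a hypothetical four-symbol code in which no code word is a prefix of another, so a bit stream can be decoded unambiguously. The probabilities and the code table are assumptions chosen for illustration and are not taken from the surviving slide text.

```python
from fractions import Fraction

# Assumed probabilities and an assumed prefix-free code (shorter = more likely).
PROBS = {"A": Fraction(1, 3), "B": Fraction(1, 2),
         "C": Fraction(1, 12), "D": Fraction(1, 12)}
CODE = {"B": "0", "A": "10", "C": "110", "D": "111"}

def encode(message: str) -> str:
    """Concatenate the code word of each symbol."""
    return "".join(CODE[symbol] for symbol in message)

def decode(bits: str) -> str:
    """Read bits left to right, emitting a symbol whenever a code word matches."""
    reverse = {word: symbol for symbol, word in CODE.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("bit stream ends in the middle of a code word")
    return "".join(out)

# Expected code length in bits per symbol, compared with 2 bits for fixed length.
expected_length = sum(PROBS[s] * len(CODE[s]) for s in CODE)
print(encode("BAD"), decode(encode("BAD")), expected_length)
```

Under these assumed probabilities the average is 5/3 bits per symbol, less than the 2 bits a fixed-length code needs for four choices.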

Data Compression

Re-encoding to remove redundant information: match data rate to actual information content.
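A quick way to see redundancy being removed is to run a general-purpose compressor over highly repetitive data. The sketch uses Python's standard `zlib` module (DEFLATE), which is a stand-in chosen for convenience and not a method named in the slides; the input data is an assumption.

```python
import zlib

# Assumed input: 10,000 bytes that carry very little information.
redundant = b"AB" * 5000

compressed = zlib.compress(redundant)
restored = zlib.decompress(compressed)

print(len(redundant), len(compressed))
print(restored == redundant)   # re-encoding is lossless
```

The compressed form is far shorter because the original spends many bits on content a receiver could predict.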

Hamming Distance

Error Detection

What we need is an encoding where a single-bit error doesn't produce another valid code word.
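One simple encoding with this property appends an even-parity bit to each code word, so every pair of valid code words differs in at least two positions. The sketch below is an illustration under that assumption; parity is a swapped-in example, and the function names are hypothetical.

```python
from itertools import combinations

def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))

def add_even_parity(bits: str) -> str:
    """Append one bit so that the total number of 1s is even."""
    return bits + str(bits.count("1") % 2)

def parity_ok(word: str) -> bool:
    """A received word is accepted only if it has an even number of 1s."""
    return word.count("1") % 2 == 0

codewords = [add_even_parity(b) for b in ("00", "01", "10", "11")]
min_distance = min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

corrupted = "111"   # code word "011" with its first bit flipped
print(codewords, min_distance, parity_ok(corrupted))
```

Because the minimum distance is 2, any single-bit error lands on a word with odd parity and is detected, although it cannot be corrected.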

Error Correction

Summary
