Compression Algorithms: Uniform & Variable Length Encoding, Huffman, Dictionary, LZ78 & LZ | Study notes Algorithms and Programming

CS231 Algorithms Handout # 31

Prof. Lyn Turbak November 20, 2001

Wellesley College

Compression

The Big Picture

We want to be able to store and retrieve data, as well as communicate it to others. In general,

this requires encoding the data and decoding the encoded data:

data --encode--> storage medium / communications network --decode--> data

For the purpose of this lecture, we observe the following constraints:

• We consider only digital storage and communication media, in which all information is

expressed via a sequence of discrete values (in the simplest case, 0 and 1). This contrasts

with analog media modeled by continuously varying waveforms.

• We consider only lossless storage/transmission in which all information in the original data

must be preserved. I.e., for all d, decode(encode(d)) = d. This contrasts with lossy

approaches that may lose some information (common with images, such as JPEG format,

where the data is already an approximation of an analog waveform).

• We consider only noiseless storage/transmission in which no errors are introduced between

the encoding and decoding phases. In practice, error-correction strategies must be applied to

handle errors introduced by real-world “noise”.
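
The lossless constraint, decode(encode(d)) = d, can be checked mechanically. The sketch below is only an illustration: it uses Python's standard zlib module as a stand-in encoder, which is an assumption and not a method discussed in this handout, and the helper name is_lossless_on is hypothetical.

```python
import zlib


def encode(data: bytes) -> bytes:
    # Stand-in lossless encoder (assumption: zlib, chosen only for illustration).
    return zlib.compress(data)


def decode(encoded: bytes) -> bytes:
    # Inverse of encode: recovers the original bytes exactly.
    return zlib.decompress(encoded)


def is_lossless_on(samples) -> bool:
    # Check the round-trip property decode(encode(d)) == d on each sample.
    return all(decode(encode(d)) == d for d in samples)


samples = [b"", b"a", b"abracadabra" * 20, bytes(range(256))]
print(is_lossless_on(samples))
```

A check like this can only test the property on particular inputs; the constraint itself is stated for all d.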

Uniform-Length Encoding of Textual Data

For textual data, it is common to encode each character as an 8-bit byte using a uniform-length

encoding known as ASCII. Each byte can be written as a decimal integer in the range [0 .. 255].

Below is a table showing ASCII values in the range [0 .. 127] and their associated characters:

0:^@ 1:^A 2:^B 3:^C 4:^D 5:^E 6:^F 7:^G

8:^H 9:\t 10:\n 11:^K 12:^L 13:^M 14:^N 15:^O

16:^P 17:^Q 18:^R 19:^S 20:^T 21:^U 22:^V 23:^W

24:^X 25:^Y 26:^Z 27:^[ 28:^\ 29:^] 30:^^ 31:^_

32:(space) 33:! 34:" 35:# 36:$ 37:% 38:& 39:'

40:( 41:) 42:* 43:+ 44:, 45:- 46:. 47:/

48:0 49:1 50:2 51:3 52:4 53:5 54:6 55:7

56:8 57:9 58:: 59:; 60:< 61:= 62:> 63:?

64:@ 65:A 66:B 67:C 68:D 69:E 70:F 71:G

72:H 73:I 74:J 75:K 76:L 77:M 78:N 79:O

80:P 81:Q 82:R 83:S 84:T 85:U 86:V 87:W

88:X 89:Y 90:Z 91:[ 92:\ 93:] 94:^ 95:_

96:` 97:a 98:b 99:c 100:d 101:e 102:f 103:g

104:h 105:i 106:j 107:k 108:l 109:m 110:n 111:o

112:p 113:q 114:r 115:s 116:t 117:u 118:v 119:w

120:x 121:y 122:z 123:{ 124:| 125:} 126:~ 127:^?

For a letter or symbol σ, the notation ^σ stands for the character specified by pressing the Control

key and the σ key at the same time. The notation \t stands for the tab character, and \n for the

newline character. Integers in the range [128 .. 255] correspond to other special characters.
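
As a minimal sketch of the uniform-length encoding described above, the following Python code writes each character as exactly 8 bits and decodes by reading the bits back in groups of 8. The function names encode_uniform, decode_uniform, and caret_notation are hypothetical, chosen for illustration only. The last function reproduces the table's ^σ notation by flipping the bit with value 64 in the character code.

```python
def encode_uniform(text: str) -> str:
    """Encode each character as a fixed-width 8-bit group (a string of 0s and 1s)."""
    groups = []
    for ch in text:
        code = ord(ch)
        if code > 255:
            raise ValueError(f"character {ch!r} does not fit in one byte")
        groups.append(format(code, "08b"))  # zero-padded to exactly 8 bits
    return "".join(groups)


def decode_uniform(bits: str) -> str:
    """Decode by cutting the bit string into 8-bit groups; no separators are needed."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))


def caret_notation(code: int) -> str:
    """Return the ^σ notation for a control code (0..31 or 127)."""
    if not (0 <= code < 32 or code == 127):
        raise ValueError("not a control character code")
    return "^" + chr(code ^ 64)  # e.g. 1 -> 'A', 27 -> '[', 127 -> '?'


print(encode_uniform("Hi"))        # 0100100001101001
print(decode_uniform(encode_uniform("Hi")))
print(caret_notation(27))          # ^[
```

Because every character occupies the same number of bits, the decoder knows where each character begins without any extra markers; the cost is that common and rare characters take the same space.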
