Understanding Compression and Huffman Codes: A Detailed Guide

An in-depth exploration of compression techniques, focusing on Huffman codes. Learn about the benefits of compression, sources of compressibility, types of compression, and the effectiveness of lossless and lossy compression. Discover the principles of Huffman coding and its advantages over other methods.

Uploaded on 02/13/2009

koofers-user-fh7

1
CMSC 132:
Object-Oriented Programming II
Compression & Huffman Codes
Department of Computer Science
University of Maryland, College Park
2
Overview
Compression
  Examples
  Sources
  Types
  Effectiveness
Huffman Code
  Properties
  Huffman tree (encoding)
  Decoding

3

Compression

Definition
  Reduce size of data (number of bits needed to represent data)

Benefits
  Reduce storage needed
  Reduce transmission cost / latency / bandwidth

4

Compression Examples

Tools
  winzip, pkzip, compress, gzip

Formats
  Images: .jpg, .gif
  Audio: .wav (CD), .mp3, .wma, .aac
  Video: mpeg1 (LD, VCD), mpeg2 (DVD), mpeg4 (Divx)
  General: .zip, .gz

7

Effectiveness of Compression

Metrics
  Bits per byte (8 bits)
    2 bits / byte ⇒ ¼ original size
    8 bits / byte ⇒ no compression
  Percentage
    75% compression ⇒ ¼ original size
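
As a rough illustration of the two metrics above, the following Python sketch converts an original and a compressed size into bits per byte and percentage compression. The function names are hypothetical, chosen only for this example.

```python
def bits_per_byte(original_bytes, compressed_bytes):
    """Average number of compressed bits spent on each original byte."""
    return 8 * compressed_bytes / original_bytes


def percent_compression(original_bytes, compressed_bytes):
    """Percentage by which the data shrank."""
    return 100 * (1 - compressed_bytes / original_bytes)


# A file compressed to 1/4 of its original size:
print(bits_per_byte(1000, 250))        # 2.0
print(percent_compression(1000, 250))  # 75.0
```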

8

Effectiveness of Compression

Depends on data
  Random data ⇒ hard
    Example: 1001110100 ⇒ ?
  Organized data ⇒ easy
    Example: 1111111111 ⇒ 1 × 10

Corollary
  No universally best compression algorithm
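
One way to see the dependence on data is to run a general-purpose compressor on both kinds of input. The sketch below uses Python's standard `zlib` module (an implementation of DEFLATE, the method used by gzip); the input size and the seed are arbitrary choices made for this example.

```python
import random
import zlib

N = 10_000
rng = random.Random(0)  # fixed seed so the run is repeatable

# "Random" data: every byte drawn independently and uniformly.
random_data = bytes(rng.getrandbits(8) for _ in range(N))
# "Organized" data: one symbol repeated N times.
organized_data = b"1" * N

random_size = len(zlib.compress(random_data))
organized_size = len(zlib.compress(organized_data))

print("random:   ", N, "->", random_size, "bytes")
print("organized:", N, "->", organized_size, "bytes")
```

The random input should come out about as large as it went in (or slightly larger, because of format overhead), while the repeated input shrinks to a small fraction of its size.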

9

Effectiveness of Compression

Lossless compression is not guaranteed
  Pigeonhole principle
    Reduce size 1 bit ⇒ can only store ½ of data
    Example: 000, 001, 010, 011, 100, 101, 110, 111 ⇒ 00, 01, 10, 11
  If compression is always possible (alternative view)
    Compress file (reduce size by 1 bit)
    Recompress output
    Repeat (until we can store data with 0 bits)
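
The counting argument can be checked directly for the 3-bit example. In this sketch, `drop_last_bit` is a hypothetical stand-in for a "compressor" that always saves one bit; any other mapping from 3-bit to 2-bit strings would run into the same collisions.

```python
from itertools import product


def bit_strings(n):
    """All bit strings of length n."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def drop_last_bit(s):
    """Hypothetical compressor: always produces an output one bit shorter."""
    return s[:-1]


inputs = bit_strings(3)   # 8 possible inputs
outputs = bit_strings(2)  # only 4 possible outputs
print(len(inputs), len(outputs))  # 8 4

# Group the inputs by the output they are mapped to.
collisions = {}
for s in inputs:
    collisions.setdefault(drop_last_bit(s), []).append(s)

# Every output is shared by two inputs, so decoding cannot be unique.
print(collisions)
```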

10

Lossless Compression Techniques

LZW (Lempel-Ziv-Welch) compression
  Build pattern dictionary
  Replace patterns with index into dictionary

Run length encoding
  Find & compress repetitive sequences

Huffman code
  Use variable length codes based on frequency
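
Of the three techniques, run length encoding is the simplest to sketch. The version below is a minimal illustration, not a file format: it represents the output as a list of (symbol, run length) pairs, and the function names are hypothetical.

```python
from itertools import groupby


def rle_encode(data):
    """Collapse each run of identical symbols into a (symbol, run length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]


def rle_decode(pairs):
    """Expand (symbol, run length) pairs back into the original string."""
    return "".join(symbol * length for symbol, length in pairs)


print(rle_encode("1111111111"))  # [('1', 10)]
print(rle_encode("1001110100"))  # six short runs, so nothing is gained
```

A long run collapses to a single pair, while an input with many short runs produces more pairs than it saves.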

13

Huffman Code Data Structures

Binary (Huffman) tree
  Represents Huffman code
  Edge ⇒ code (0 or 1)
  Leaf ⇒ symbol
  Path to leaf ⇒ encoding
  Example: A = "11", H = "10", C = "0"

Priority queue
  To efficiently build binary tree

[Figure: Huffman tree with leaves A, C, H]
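
A minimal sketch of the tree structure, assuming the left edge is labeled 0 and the right edge 1 (a convention chosen for this example). The `Node` class and the `codes` helper are hypothetical names. Reading off the root-to-leaf paths reproduces the example encoding A = "11", H = "10", C = "0".

```python
class Node:
    def __init__(self, symbol=None, left=None, right=None):
        self.symbol = symbol  # set only on leaves
        self.left = left      # child reached by edge 0 (assumed convention)
        self.right = right    # child reached by edge 1 (assumed convention)

    def is_leaf(self):
        return self.left is None and self.right is None


def codes(node, path=""):
    """Map each symbol to the edge labels on its root-to-leaf path."""
    if node.is_leaf():
        return {node.symbol: path}
    table = {}
    table.update(codes(node.left, path + "0"))
    table.update(codes(node.right, path + "1"))
    return table


# Tree matching the example: C = "0", H = "10", A = "11"
tree = Node(left=Node("C"), right=Node(left=Node("H"), right=Node("A")))
print(codes(tree))  # {'C': '0', 'H': '10', 'A': '11'}
```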

14

Huffman Code Algorithm Overview

Encoding
  Calculate frequency of symbols in file
  Create binary tree representing "best" encoding
  Use binary tree to encode compressed file
    For each symbol, output path from root to leaf
    Size of encoding = length of path
  Save binary tree
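
The first step, counting symbol frequencies, can be sketched with `collections.Counter`. The input string here is hypothetical, constructed so that its counts match the frequencies used in the tree construction slides (A = 3, C = 5, E = 8, H = 2, I = 7).

```python
from collections import Counter

# Hypothetical input: 3 A's, 5 C's, 8 E's, 2 H's, 7 I's
text = "A" * 3 + "C" * 5 + "E" * 8 + "H" * 2 + "I" * 7

freq = Counter(text)
print(freq.most_common())
# [('E', 8), ('I', 7), ('C', 5), ('A', 3), ('H', 2)]
```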

15

Huffman Code – Creating Tree

Algorithm
  Place each symbol in leaf
    Weight of leaf = symbol frequency
  Select two trees L and R (initially leaves)
    Such that L, R have lowest frequencies in tree
  Create new (internal) node
    Left child ⇒ L
    Right child ⇒ R
    New frequency ⇒ frequency( L ) + frequency( R )
  Repeat until all nodes merged into one tree
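
The algorithm above can be sketched with Python's `heapq` module serving as the priority queue. This is an illustrative version under a few assumptions: trees are nested tuples rather than node objects, the input is non-empty, and ties are broken by insertion order, so the resulting 0/1 labels may differ from the tree drawn in the slides even though the code lengths agree.

```python
import heapq
from itertools import count


def build_huffman_tree(freq):
    """Return a nested-tuple tree: a leaf is a symbol, an internal node is (left, right)."""
    tie = count()  # unique tie-breaker so the heap never compares trees directly
    heap = [(weight, next(tie), symbol) for symbol, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w_left, _, left = heapq.heappop(heap)    # lowest frequency
        w_right, _, right = heapq.heappop(heap)  # second lowest
        heapq.heappush(heap, (w_left + w_right, next(tie), (left, right)))
    return heap[0][2]


def code_lengths(tree, depth=0):
    """Depth of each leaf, which equals the length of that symbol's code."""
    if not isinstance(tree, tuple):
        return {tree: depth}
    left, right = tree
    lengths = code_lengths(left, depth + 1)
    lengths.update(code_lengths(right, depth + 1))
    return lengths


freq = {"A": 3, "C": 5, "E": 8, "H": 2, "I": 7}
tree = build_huffman_tree(freq)
print(code_lengths(tree))  # A and H get 3 bits; C, E, and I get 2 bits
```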

16

Huffman Tree Construction 1

Symbol:     A  C  E  H  I
Frequency:  3  5  8  2  7

19

Huffman Tree Construction 4

[Figure: Huffman tree under construction for symbols A, C, E, H, I]

20

Huffman Tree Construction 5

[Figure: completed Huffman tree over A, C, E, H, I]

E = 01
I = 00
C = 10
A = 111
H = 110

21

Huffman Coding Example

Huffman code
  E = 01
  I = 00
  C = 10
  A = 111
  H = 110

Input
  ACE

Output
  (111)(10)(01) = 1111001
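
With the code table from this example, encoding is a lookup and a concatenation. The sketch below hard-codes the table as a dictionary; the names `CODE` and `encode` are hypothetical.

```python
# Code table from the example above.
CODE = {"E": "01", "I": "00", "C": "10", "A": "111", "H": "110"}


def encode(text, code=CODE):
    """Concatenate the code word of each input symbol."""
    return "".join(code[symbol] for symbol in text)


print(encode("ACE"))  # 1111001
```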

22

Huffman Code Algorithm Overview

Decoding
  Read compressed file & binary tree
  Use binary tree to decode file
    Follow path from root to leaf
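
Decoding can be sketched as a walk over the tree: follow one edge per input bit, emit a symbol on reaching a leaf, then restart at the root. In this illustrative version the tree is rebuilt from the example code table (E = 01, I = 00, C = 10, A = 111, H = 110) as nested dictionaries; that representation and the function names are assumptions made for the example.

```python
CODE = {"E": "01", "I": "00", "C": "10", "A": "111", "H": "110"}


def build_tree(code):
    """Rebuild the tree as nested dicts: edge label -> subtree; leaves are symbols."""
    root = {}
    for symbol, bits in code.items():
        node = root
        for bit in bits[:-1]:
            node = node.setdefault(bit, {})
        node[bits[-1]] = symbol
    return root


def decode(bits, root):
    """Follow edges from the root; emit a symbol at each leaf and restart."""
    out = []
    node = root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = root
    return "".join(out)


tree = build_tree(CODE)
print(decode("1111001", tree))  # ACE
```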

25

Huffman Decoding 3

[Figure: Huffman tree over A, C, E, H, I]

Input: 1111001
Decoded so far: A

26

Huffman Decoding 4

[Figure: Huffman tree over A, C, E, H, I]

Input: 1111001
Decoded so far: A

27

Huffman Decoding 5

[Figure: Huffman tree over A, C, E, H, I]

Input: 1111001
Decoded so far: AC

28

Huffman Decoding 6

[Figure: Huffman tree over A, C, E, H, I]

Input: 1111001
Decoded so far: AC

31

Huffman Code Properties

Greedy algorithm
  Chooses best local solution at each step
  Combines 2 trees with lowest frequency

Still yields overall best solution
  Optimal prefix code
  Based on statistical frequency

Better compression possible (depends on data)
  Using other approaches (e.g., pattern dictionary)
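
Two of these properties can be checked on the example code (E = 01, I = 00, C = 10, A = 111, H = 110) with frequencies A = 3, C = 5, E = 8, H = 2, I = 7. The sketch below verifies that no code word is a prefix of another and compares the total encoded size against a fixed-length code; the 3-bit fixed-length baseline is an assumption made for the comparison (5 symbols need at least 3 bits each).

```python
CODE = {"E": "01", "I": "00", "C": "10", "A": "111", "H": "110"}
FREQ = {"A": 3, "C": 5, "E": 8, "H": 2, "I": 7}


def is_prefix_free(code):
    """True if no code word is a prefix of a different code word."""
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)


# Total bits = sum over symbols of frequency * code length.
huffman_bits = sum(FREQ[s] * len(CODE[s]) for s in FREQ)
# Baseline: every one of the 25 symbols stored with 3 bits.
fixed_bits = sum(FREQ.values()) * 3

print(is_prefix_free(CODE), huffman_bits, fixed_bits)  # True 55 75
```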