Compression and Huffman Codes for Object Oriented Programming II | CMSC 132, Study notes of Computer Science

Material Type: Notes; Professor: Padua-Perez; Class: OBJECT-ORIENTED PROG II; Subject: Computer Science; University: University of Maryland; Term: Spring 2006;

Uploaded on 07/30/2009

Compression & Huffman Codes
Nelson Padua-Perez
William Pugh
Department of Computer Science
University of Maryland, College Park

Compression

Definition

Reduce size of data

(number of bits needed to represent data)

Benefits

Reduce storage needed

Reduce transmission cost / latency / bandwidth

Sources of Compressibility

Redundancy

Recognize repeating patterns

Exploit using

Dictionary

Variable length encoding

Human perception

Less sensitive to some information

Can discard less important data

Types of Compression

Lossless

Preserves all information

Exploits redundancy in data

Applied to general data

Lossy

May lose some information

Exploits redundancy & human perception

Applied to audio, image, video

Effectiveness of Compression

Depends on data

Random data ⇒ hard

Example: 1001110100 ⇒ ?

Organized data ⇒ easy

Example: 1111111111 ⇒ 1 × 10

Corollary

No universally best compression algorithm
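A minimal sketch of why the two example strings behave so differently, using run-length encoding as one illustrative scheme (the slide only hints at it with "1 × 10"). The function name `run_length_encode` is a made-up helper, not something from the course.

```python
from itertools import groupby

def run_length_encode(bits):
    """Collapse each run of identical characters into a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]

organized = run_length_encode("1111111111")
random_like = run_length_encode("1001110100")

print(organized)         # a single run: [('1', 10)]
print(len(random_like))  # 6 short runs for only 10 bits, so nothing is saved
```

The organized string collapses to one pair, while the random-looking string needs six pairs to describe ten bits, which is larger than the input.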

Effectiveness of Compression

Lossless Compression is not guaranteed

Pigeonhole principle

Reduce size by 1 bit ⇒ can only represent ½ of the possible data

Example

000, 001, 010, 011, 100, 101, 110, 111 ⇒ 00, 01, 10, 11

If compression is always possible (alternative view)

Compress file (reduce size by 1 bit)

Recompress output

Repeat (until we can store data with 0 bits)
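The counting behind the pigeonhole argument can be checked directly. This is a small sketch, and the helper names are assumptions chosen for illustration.

```python
def count_strings_of_length(n):
    """Number of distinct bit strings with exactly n bits."""
    return 2 ** n

def count_strings_shorter_than(n):
    """Number of distinct bit strings with 0 .. n-1 bits (2^n - 1 in total)."""
    return sum(2 ** k for k in range(n))

n = 3
inputs = count_strings_of_length(n)        # 8 possible 3-bit inputs
outputs = count_strings_of_length(n - 1)   # only 4 possible 2-bit outputs
shorter = count_strings_shorter_than(n)    # 7 outputs even if any shorter length is allowed

print(inputs, outputs, shorter)  # 8 4 7
```

Even when every shorter length is allowed, there are only 7 possible outputs for 8 inputs, so at least two inputs would have to share an output and could not both be recovered.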

Huffman Code

Approach

Variable length encoding of symbols

Exploit statistical frequency of symbols

Efficient when symbol probabilities vary widely

Principle

Use fewer bits to represent frequent symbols

Use more bits to represent infrequent symbols

A A B A

Huffman Code Example

Symbol              Dog      Cat      Bird     Fish
Frequency           1/8      1/4      1/2      1/8
Original encoding   2 bits   2 bits   2 bits   2 bits
Huffman encoding    110      10       0        111
                    3 bits   2 bits   1 bit    3 bits

Expected size

Original ⇒ 1/8×2 + 1/4×2 + 1/2×2 + 1/8×2 = 2 bits / symbol

Huffman ⇒ 1/8×3 + 1/4×2 + 1/2×1 + 1/8×3 = 1.75 bits / symbol
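The expected-size arithmetic can be reproduced with exact fractions. This sketch assumes the symbols pair with the frequencies and Huffman codes in the order Dog, Cat, Bird, Fish; the function name `expected_bits` is hypothetical.

```python
from fractions import Fraction

# Assumed pairing of symbols, frequencies, and codes (Dog, Cat, Bird, Fish).
frequency = {
    "Dog": Fraction(1, 8),
    "Cat": Fraction(1, 4),
    "Bird": Fraction(1, 2),
    "Fish": Fraction(1, 8),
}
huffman = {"Dog": "110", "Cat": "10", "Bird": "0", "Fish": "111"}

def expected_bits(freq, code_length):
    """Average bits per symbol: sum of frequency times code length."""
    return sum(freq[s] * code_length[s] for s in freq)

fixed = expected_bits(frequency, {s: 2 for s in frequency})
variable = expected_bits(frequency, {s: len(c) for s, c in huffman.items()})

print(fixed, variable)  # 2 7/4
```

The result 7/4 is the 1.75 bits / symbol from the slide, a saving of a quarter bit per symbol over the fixed 2-bit code.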

Huffman Code Algorithm Overview

Encoding

Calculate frequency of symbols in file

Create binary tree representing “best” encoding

Use binary tree to encode compressed file

For each symbol, output path from root to leaf

Size of encoding = length of path

Save binary tree
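The encoding steps above can be sketched on a hand-built tree. The nested-tuple tree layout, the sample symbol list, and the helper name `paths` are all assumptions made for illustration; a left edge is taken to mean 0 and a right edge 1.

```python
from collections import Counter

# A leaf is a symbol string; an internal node is a (left, right) pair.
tree = ("Bird", ("Cat", ("Dog", "Fish")))

def paths(node, prefix=""):
    """Map each leaf symbol to its root-to-leaf path (left = 0, right = 1)."""
    if isinstance(node, str):
        return {node: prefix}
    left, right = node
    table = paths(left, prefix + "0")
    table.update(paths(right, prefix + "1"))
    return table

symbols = ["Bird", "Cat", "Bird", "Dog", "Bird", "Fish", "Cat", "Bird"]
counts = Counter(symbols)                      # step 1: symbol frequencies
codes = paths(tree)                            # step 3: path per symbol
encoded = "".join(codes[s] for s in symbols)   # output one path per symbol

print(codes)         # {'Bird': '0', 'Cat': '10', 'Dog': '110', 'Fish': '111'}
print(len(encoded))  # 14 bits for 8 symbols
```

The code length of each symbol equals the depth of its leaf, so the most frequent symbol sits closest to the root.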

Huffman Code – Creating Tree

Algorithm

Place each symbol in leaf

Weight of leaf = symbol frequency

Select two trees L and R (initially leaves)

Such that L and R have the two lowest frequencies among the remaining trees

Create new (internal) node

Left child ⇒ L

Right child ⇒ R

New frequency ⇒ frequency( L ) + frequency( R )

Repeat until all nodes merged into one tree
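One way to sketch the tree-building loop is with a priority queue. The use of `heapq`, the tuple-based node layout, the tie-breaking counter, and the sample weights are assumptions for illustration; the notes do not prescribe a data structure.

```python
import heapq
from itertools import count

def build_huffman_tree(frequencies):
    """Repeatedly merge the two lowest-weight trees until one tree remains."""
    tiebreak = count()  # unique counter so trees themselves are never compared
    heap = [(weight, next(tiebreak), symbol) for symbol, weight in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        weight_l, _, left = heapq.heappop(heap)    # lowest frequency  -> L
        weight_r, _, right = heapq.heappop(heap)   # next lowest       -> R
        heapq.heappush(heap, (weight_l + weight_r, next(tiebreak), (left, right)))
    return heap[0][2]

def code_table(node, prefix=""):
    """Read codes off the tree: left edge = 0, right edge = 1."""
    if not isinstance(node, tuple):
        return {node: prefix or "0"}  # lone-symbol input still gets one bit
    left, right = node
    return {**code_table(left, prefix + "0"), **code_table(right, prefix + "1")}

# Hypothetical weights in eighths, with string symbols as leaves.
tree = build_huffman_tree({"Dog": 1, "Cat": 2, "Bird": 4, "Fish": 1})
codes = code_table(tree)
print(codes)
```

Different tie-breaking rules can produce different trees, but every tree built this way gives the same total encoded length for the given weights.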

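Huffman Tree Construction 2

[Figure: tree under construction over symbols A, C, E, H, I]

Huffman Tree Construction 3

[Figure: tree under construction over symbols A, C, E, H, I]

Huffman Tree Construction 5

[Figure: completed tree over symbols A, C, E, H, I]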

E = 01

I = 00

C = 10

A = 111

H = 110

Huffman Coding Example

Huffman code

E = 01

I = 00

C = 10

A = 111

H = 110

Input

ACE

Output

(111)(10)(01) = 1111001
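The ACE example can be checked with a short sketch that applies the code table from the slide. The `encode` and `decode` helpers are hypothetical names; decoding is not covered in the preview text and is included only to show that no code is a prefix of another, so the bit string can be split unambiguously.

```python
CODES = {"E": "01", "I": "00", "C": "10", "A": "111", "H": "110"}

def encode(text, codes):
    """Concatenate the code of each symbol."""
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Read bits left to right, emitting a symbol whenever a full code is seen."""
    by_code = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in by_code:
            out.append(by_code[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)

bits = encode("ACE", CODES)
print(bits)                 # 1111001
print(decode(bits, CODES))  # ACE
```

Seven bits are used for three symbols here, compared with nine bits for a fixed 3-bit code over a five-symbol alphabet.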