Understanding Compression and Huffman Codes: A Guide to Data Compression Techniques - Prof, Study notes of Computer Science

An overview of data compression, covering what compression is, its benefits, and the use of Huffman codes. Topics include the effectiveness of compression, the difference between lossless and lossy compression, and the construction of Huffman trees. This resource is aimed at students in computer science and related fields.

Uploaded on 02/13/2009

CMSC 132:
Object-Oriented Programming II
Compression & Huffman Codes
Department of Computer Science
University of Maryland, College Park

Overview

Compression

Examples

Sources

Types

Effectiveness

Huffman Code

Properties

Huffman tree (encoding)

Decoding

Compression Examples

Formats

General

.zip, .rar

Images

.jpg, .gif

Audio

.mp3, .wma

Video

.mpg, .mov

Sources of Compressibility

Redundancy

Recognize repeating patterns

Exploit using

Dictionary

Variable length encoding

Human perception

Less sensitive to some information

Can discard less important data

Effectiveness of Compression

Metrics

Bits per byte (8 bits)

2 bits / byte ⇒ ¼ original size

8 bits / byte ⇒ no compression

Percentage

75% compression ⇒ ¼ original size
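
The two metrics above can be computed directly from file sizes. The following is a minimal sketch; the function names and the sample sizes are illustrative assumptions, not part of the original slides.

```python
def bits_per_byte(original_size: int, compressed_size: int) -> float:
    """Average number of compressed bits used per original 8-bit byte."""
    return 8 * compressed_size / original_size


def percent_compression(original_size: int, compressed_size: int) -> float:
    """Percentage of the original size that compression removed."""
    return 100 * (1 - compressed_size / original_size)


# A file shrunk to a quarter of its size (sizes are made-up values).
print(bits_per_byte(1000, 250))        # 2.0
print(percent_compression(1000, 250))  # 75.0
```

Both numbers describe the same result: 2 bits per byte and 75% compression each mean the output is ¼ of the original size.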

Effectiveness of Compression

Depends on data

Random data ⇒ hard

Example: 1001110100 ⇒ ?

Organized data ⇒ easy

Example: 1111111111 ⇒ 1 × 10

Corollary

No universally best compression algorithm
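
One way to see why the two example strings behave so differently is to collapse each into runs of identical bits. This is only a sketch: the helper name `run_length_encode` is an assumption, and run-length encoding is used here purely as an illustration of how much structure each string has.

```python
from itertools import groupby
from typing import List, Tuple


def run_length_encode(bits: str) -> List[Tuple[str, int]]:
    """Collapse each run of identical symbols into a (symbol, run length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]


organized = run_length_encode("1111111111")
scattered = run_length_encode("1001110100")

print(organized)       # [('1', 10)]
print(len(scattered))  # 6
```

The organized string becomes a single pair, matching the "1 × 10" shorthand. The irregular string needs six pairs to describe ten bits, so this scheme would make it larger rather than smaller.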

Lossless Compression Techniques

LZW (Lempel-Ziv-Welch) compression

Build pattern dictionary

Replace patterns with index into dictionary

Run length encoding

Find & compress repetitive sequences

Huffman code

Use variable length codes based on frequency
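
The dictionary idea behind LZW can be sketched as follows. This is a simplified illustration, not a production codec: it assumes every input character has a code point below 256, emits plain integer indices rather than packed bits, and lets the dictionary grow without limit. The function names are assumptions.

```python
from typing import List


def lzw_compress(text: str) -> List[int]:
    """Encode text as a list of dictionary indices (basic LZW, no bit packing)."""
    # Start with one entry per single character (assumes code points 0..255).
    dictionary = {chr(i): i for i in range(256)}
    current = ""
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                      # keep extending a known pattern
        else:
            output.append(dictionary[current])       # emit index of longest match
            dictionary[candidate] = len(dictionary)  # learn the new pattern
            current = ch
    if current:
        output.append(dictionary[current])
    return output


def lzw_decompress(codes: List[int]) -> str:
    """Rebuild the text by regenerating the same dictionary while decoding."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # The code refers to the entry that is being defined at this step.
            entry = previous + previous[0]
        pieces.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(pieces)


codes = lzw_compress("ABABABA")
print(codes)                  # [65, 66, 256, 258]
print(lzw_decompress(codes))  # ABABABA
```

Seven characters are replaced by four indices because the repeated patterns "AB" and "ABA" are added to the dictionary and then referenced by index.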

Huffman Code

Approach

Variable length encoding of symbols

Exploit statistical frequency of symbols

Efficient when symbol probabilities vary widely

Principle

Use fewer bits to represent frequent symbols

Use more bits to represent infrequent symbols

A A B A


Huffman Code Data Structures

Binary (Huffman) tree

Represents Huffman code

Edge ⇒ code (0 or 1)

Leaf ⇒ symbol

Path to leaf ⇒ encoding

Example

A = "11", H = "10", C = "0"

Priority queue

To efficiently build binary tree

[Figure: Huffman tree with leaves A, C, and H]
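
The relationship between tree and code can be sketched with a nested tuple standing in for the binary tree. The tree shape below is inferred from the example codes (C on the root's 0 edge, H and A under the 1 edge); the tuple representation and the name `codes_from_tree` are assumptions made for illustration.

```python
def codes_from_tree(tree, prefix=""):
    """Map each leaf symbol to the 0/1 edge labels on its path from the root."""
    if isinstance(tree, str):      # leaf: a single symbol
        return {tree: prefix}
    left, right = tree             # internal node: (0-edge subtree, 1-edge subtree)
    codes = codes_from_tree(left, prefix + "0")
    codes.update(codes_from_tree(right, prefix + "1"))
    return codes


# Tree shape inferred from the example: C = "0", H = "10", A = "11".
EXAMPLE_TREE = ("C", ("H", "A"))

codes = codes_from_tree(EXAMPLE_TREE)
print(codes)  # {'C': '0', 'H': '10', 'A': '11'}
```

Because symbols appear only at leaves, no code word is a prefix of another, so a concatenated bit string can be split back into symbols without separators.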

Huffman Code Algorithm Overview

Encoding

  1. Calculate frequency of symbols in file
  2. Create binary tree representing "best" encoding
  3. Use binary tree to encode compressed file

For each symbol, output path from root to leaf

Size of encoding = length of path

  4. Save binary tree
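
The four steps above can be sketched with a priority queue from the Python standard library. This is a minimal sketch under stated assumptions: the input is a non-empty string of single-character symbols, the tree is a nested tuple, and the output is a string of "0"/"1" characters rather than packed bits. All function names are illustrative, and the sample string with its symbol counts (A 3, C 5, E 8, H 2, I 7) is made up for the demonstration.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(frequencies):
    """Repeatedly merge the two lowest-frequency trees until one remains."""
    tie_breaker = count()  # unique number so tuples never compare the trees
    heap = [(freq, next(tie_breaker), symbol)
            for symbol, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        freq_a, _, tree_a = heapq.heappop(heap)
        freq_b, _, tree_b = heapq.heappop(heap)
        merged = (tree_a, tree_b)
        heapq.heappush(heap, (freq_a + freq_b, next(tie_breaker), merged))
    return heap[0][2]


def assign_codes(tree, prefix=""):
    """Read each symbol's code off the path from the root to its leaf."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}  # a one-symbol input still needs one bit
    left, right = tree
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes


def huffman_encode(text):
    frequencies = Counter(text)                   # step 1: symbol frequencies
    tree = build_huffman_tree(frequencies)        # step 2: build the tree
    codes = assign_codes(tree)
    bits = "".join(codes[ch] for ch in text)      # step 3: encode each symbol
    return bits, tree                             # step 4: keep the tree


# Made-up sample: A x3, C x5, E x8, H x2, I x7 (25 symbols).
sample = "A" * 3 + "C" * 5 + "E" * 8 + "H" * 2 + "I" * 7
bits, tree = huffman_encode(sample)
print(len(bits))        # 55
print(3 * len(sample))  # 75 bits with a fixed 3-bit code
```

The exact 0/1 labels depend on how ties are broken, but the code lengths do not: with these counts the frequent symbols C, E, and I get 2 bits and the rare symbols A and H get 3. The tree is returned alongside the bits because a decoder cannot interpret the output without it.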

Huffman Tree Construction 1

[Figure: trees for the symbols A, C, E, H, I, with a callout marking the 2 trees with lowest frequency]

Huffman Tree Construction 2

[Figure: partially merged trees over the symbols A, C, E, H, I, with a callout marking the 2 trees with lowest frequency]

Huffman Tree Construction 4

[Figure: partially merged trees over the symbols A, C, E, H, I, with a callout marking the 2 trees with lowest frequency]

Huffman Tree Construction 5

[Figure: final Huffman tree with leaves A, C, E, H, I]

Huffman code for each leaf

E = 01

I = 00

C = 10

A = 111

H = 110
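
Decoding with this code table can be sketched by rebuilding the tree from the code words and walking it one bit at a time. The nested-dict representation and the names `tree_from_codes` and `decode` are assumptions for illustration; the code table itself is the one listed above.

```python
CODES = {"E": "01", "I": "00", "C": "10", "A": "111", "H": "110"}


def tree_from_codes(codes):
    """Rebuild the binary tree as nested dicts keyed by edge label '0' or '1'."""
    root = {}
    for symbol, word in codes.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = symbol  # the last edge leads to the leaf
    return root


def decode(bits, root):
    """Follow one edge per bit; at a leaf, emit the symbol and restart at the root."""
    decoded = []
    node = root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            decoded.append(node)
            node = root
    if node is not root:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(decoded)


root = tree_from_codes(CODES)
print(decode("1111011001", root))  # ACHE
```

The bit string splits as 111 | 10 | 110 | 01 without any separators, because every code word ends at a leaf and so none is a prefix of another.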