Compression and Huffman Codes: Understanding Data Compression Techniques, Study notes of Computer Science

An overview of data compression, focusing on the concepts of Huffman codes. Learn about the benefits of compression, different types, and the effectiveness of lossless and lossy methods. Discover the principles of Huffman coding and its advantages in handling varying symbol probabilities.

Typology: Study notes

Pre 2010

Uploaded on 02/13/2009

koofers-user-qlu
Compression & Huffman Codes
CMSC 132
Department of Computer Science
University of Maryland, College Park

Overview

Compression

Compression Examples

Types of Compression

Effectiveness of Compression

Huffman Code

Huffman Code Properties

Compression Examples

Tools

WinZip, PKZIP, compress, gzip

Formats

Images

.jpg, .gif

Audio

.mp3, .wav (.wav is usually uncompressed PCM)

Video

MPEG-1 (VCD), MPEG-2 (DVD), MPEG-4 (DivX)

General

.zip, .gz

Sources of Compressibility

Redundancy

Recognize repeating patterns

Exploit using a dictionary or variable length encoding

Human perception

Less sensitive to some information

Can discard less important data

Effectiveness of Compression

Metrics

Bits per byte (8 bits)

2 bits / byte: ¼ original size

8 bits / byte: no compression

Percentage

75% compression: ¼ original size
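As a quick check of the arithmetic above, both metrics can be computed from the original and compressed sizes. This is a minimal Python sketch; the function name `compression_metrics` is hypothetical.

```python
from __future__ import annotations


def compression_metrics(original_size: int, compressed_size: int) -> tuple[float, float]:
    """Return (bits per original byte, percent compression) for sizes given in bytes."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    ratio = compressed_size / original_size   # fraction of the original size that remains
    bits_per_byte = 8 * ratio                 # each original 8-bit byte now costs this many bits
    percent = 100 * (1 - ratio)               # share of the original size that was removed
    return bits_per_byte, percent


print(compression_metrics(1000, 250))   # (2.0, 75.0): 2 bits / byte, ¼ original size
print(compression_metrics(1000, 1000))  # (8.0, 0.0): no compression
```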

Effectiveness of Compression

Depends on data

Random data: hard

Example: 1001110100

Organized data: easy

Example: 1111111111


Corollary

No universally best compression algorithm
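The contrast between random and organized data is easy to observe with a general-purpose compressor. The sketch below assumes Python's standard `zlib` module (DEFLATE, the method behind gzip and .zip files); the helper `compressed_size` and the fixed seed are illustrative choices, and exact byte counts depend on the zlib version.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes after DEFLATE compression at the highest level (9)."""
    return len(zlib.compress(data, 9))


rng = random.Random(132)  # fixed seed so repeated runs see the same "random" bytes
random_data = bytes(rng.getrandbits(8) for _ in range(10_000))
organized_data = b"1" * 10_000  # one symbol repeated, like 1111111111

print("random:   ", len(random_data), "->", compressed_size(random_data))
print("organized:", len(organized_data), "->", compressed_size(organized_data))
```

The random bytes usually come out no smaller (often slightly larger, because of format overhead), while the repeated byte shrinks to a tiny fraction of its size.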

Lossless Compression Techniques

LZW (Lempel-Ziv-Welch) compression

Build pattern dictionary

Replace patterns with index into dictionary

Run length encoding

Find & compress repetitive sequences

Huffman code

Use variable length codes based on frequency
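A minimal Python sketch of the first two techniques follows; the function names are hypothetical. The run-length encoder collapses each run into a (character, count) pair. The LZW encoder assumes 8-bit characters (code points below 256) so that the initial dictionary covers every single character.

```python
from __future__ import annotations


def run_length_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of a repeated character into (character, run_length)."""
    runs: list[tuple[str, int]] = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((ch, 1))               # start a new run
    return runs


def lzw_encode(text: str) -> list[int]:
    """LZW: grow a dictionary of seen patterns and emit dictionary indices."""
    # Assumption: every character has a code point below 256.
    dictionary = {chr(i): i for i in range(256)}
    current = ""
    output: list[int] = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                          # keep extending the match
        else:
            output.append(dictionary[current])           # emit index of longest match
            dictionary[candidate] = len(dictionary)      # learn the new pattern
            current = ch
    if current:
        output.append(dictionary[current])
    return output


print(run_length_encode("1111111111"))
print(lzw_encode("TOBEORNOTTOBEORTOBEORNOT"))
```

On the two example strings from the effectiveness slide, run-length encoding shrinks 1111111111 to a single pair but turns 1001110100 into six pairs, which illustrates why the same technique can help or hurt depending on the data.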

Huffman Code

Approach

Variable length encoding of symbols

Exploit statistical frequency of symbols

Efficient when symbol probabilities vary widely

Principle

Use fewer bits to represent frequent symbols

Use more bits to represent infrequent symbols

Example: A A B A A A A B (frequent A, infrequent B)
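To see the principle in numbers, the sketch counts the bits needed for one message under a fixed-length code and under a variable-length one. The message and the fixed 2-bit table are hypothetical; the variable-length table reuses the A = “11”, H = “10”, C = “0” code from the notes' tree example.

```python
from __future__ import annotations


def encoded_bits(message: str, code: dict[str, str]) -> int:
    """Total number of bits needed to encode message with a symbol -> bit-string table."""
    return sum(len(code[symbol]) for symbol in message)


message = "CCCCCCAH"                            # hypothetical: C frequent, A and H rare
fixed = {"A": "00", "C": "01", "H": "10"}       # hypothetical fixed-length code, 2 bits each
variable = {"A": "11", "H": "10", "C": "0"}     # frequent C gets the 1-bit code

print(encoded_bits(message, fixed))     # 16
print(encoded_bits(message, variable))  # 10
```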

Huffman Code Data Structures

Binary (Huffman) tree

Represents Huffman code

Edge: code (0 or 1)

Leaf: symbol

Path to leaf: encoding

Example

A = “11”, H = “10”, C = “0”

Priority queue

To efficiently build binary tree
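One way to sketch the tree in Python, assuming a leaf is a one-character string and an internal node is a (left, right) tuple, with 0 on the left edge and 1 on the right. This representation and the helper names are hypothetical; the tree below encodes the example code A = “11”, H = “10”, C = “0”.

```python
# Hypothetical representation: leaf = str, internal node = (left, right) tuple.
TREE = ("C", ("H", "A"))


def code_table(tree, path=""):
    """Map each leaf symbol to its root-to-leaf path, which is its encoding."""
    if isinstance(tree, str):
        return {tree: path or "0"}        # a single-symbol tree still needs one bit
    left, right = tree
    table = code_table(left, path + "0")  # left edge contributes a 0
    table.update(code_table(right, path + "1"))  # right edge contributes a 1
    return table


def decode(bits, tree):
    """Follow one edge per bit from the root; emit a symbol at each leaf and restart."""
    out = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):
            out.append(node)
            node = tree
    return "".join(out)


print(code_table(TREE))        # {'C': '0', 'H': '10', 'A': '11'}
print(decode("110100", TREE))  # ACHC
```

Because no code is a prefix of another (every symbol sits at a leaf), the decoder never needs separators between codes.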


Huffman Code Algorithm Overview

Encoding

Calculate frequency of symbols in file

Create binary tree representing “best” encoding

Use binary tree to encode compressed file

For each symbol, output path from root to leaf

Size of encoding = length of path

Save binary tree
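The encoding steps above can be sketched end to end as follows. This is an illustrative Python version, not the course's implementation: it uses the standard `heapq` module as the priority queue and `collections.Counter` for the frequency count, and it assumes non-empty input text. Returning the tree stands in for the "save binary tree" step, since the decoder needs it.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(text):
    """Build a Huffman tree: leaves are symbols, internal nodes are (left, right) tuples."""
    freq = Counter(text)                     # step 1: frequency of each symbol
    tiebreak = count()                       # unique ints so the heap never compares trees
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)                      # priority queue keyed on frequency
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    return heap[0][2]


def code_table(tree, path=""):
    """Each symbol's code is its root-to-leaf path (0 = left edge, 1 = right edge)."""
    if isinstance(tree, str):
        return {tree: path or "0"}           # a one-symbol input still needs one bit
    left, right = tree
    table = code_table(left, path + "0")
    table.update(code_table(right, path + "1"))
    return table


def huffman_encode(text):
    """Return the encoded bit string and the tree needed to decode it."""
    if not text:
        raise ValueError("text must be non-empty")
    tree = build_huffman_tree(text)
    codes = code_table(tree)
    bits = "".join(codes[sym] for sym in text)  # output each symbol's path
    return bits, tree                           # the tree must be saved with the bits


bits, tree = huffman_encode("abracadabra")
print(len(bits))  # 23, versus 33 bits for a fixed 3-bit code over the 5 symbols
```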

Huffman Tree Construction

(Figures for construction steps 1, 2, 4 and 5 not preserved; they show a Huffman tree being built over the symbols A, C, E, H, I.)
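The construction figures are not preserved, so the frequencies below are hypothetical. The sketch repeats the usual Huffman step (remove the two lowest-weight subtrees from the priority queue, merge them, reinsert the result) and prints each merge so the stages can be followed. The function name `build_with_trace` is hypothetical.

```python
import heapq
from itertools import count


def build_with_trace(freqs):
    """Huffman construction that prints every merge of the two lightest subtrees."""
    order = count()                          # tie-breaker: earlier insertions win ties
    heap = [(f, next(order), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    step = 1
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)      # lightest subtree
        f2, _, t2 = heapq.heappop(heap)      # second lightest subtree
        print(f"step {step}: merge {t1} ({f1}) + {t2} ({f2}) -> weight {f1 + f2}")
        heapq.heappush(heap, (f1 + f2, next(order), (t1, t2)))
        step += 1
    return heap[0][2]


# Hypothetical frequencies; the counts used in the original figures are not preserved.
FREQS = {"A": 3, "C": 5, "E": 8, "H": 2, "I": 7}
tree = build_with_trace(FREQS)
print(tree)  # (('C', ('H', 'A')), ('I', 'E'))
```

With these weights the resulting codes are C = 00, H = 010, A = 011, I = 10, E = 11, for 55 bits in total. A different tie-breaking rule can produce a different tree with the same total cost.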