Data Compressor - Data Structures - Lecture Slides, Slides of Data Structures and Algorithms

Some concepts of Data Structures are: Abstract, Balance Factor, Complete Binary Tree, Dynamically, Storage, Implementation, Sequential Search, Advanced Data Structures, Graph Coloring Two, Insertion Sort. Main points of this lecture are: Data Compressor, Encoding and Decoding, Huffman, Compression, Files and Messages, Wasting, Smallest Number, Arbitrary Piece, Frequency, Short Bit Strings.

Typology: Slides

2012/2013

Uploaded on 04/30/2013

dinpal
Data Compressor---Huffman
Encoding and Decoding


Huffman Encoding

  • Compression
    • Typically, in files and messages:
      • Each character requires 1 byte or 8 bits
      • Already wasting 1 bit for most purposes, since standard ASCII needs only 7 bits!
  • Question
    • What’s the smallest number of bits that can be used to store an arbitrary piece of text?
  • Idea
    • Find the frequency of occurrence of each character
    • Encode frequent characters with short bit strings
    • Encode rarer characters with longer bit strings
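The first step of the idea, counting how often each character occurs, can be sketched in Python as below, next to the one-byte-per-character baseline it is meant to beat. This is a minimal sketch; the helper names `char_frequencies` and `fixed_byte_cost` are hypothetical and not taken from the slides.

```python
from collections import Counter


def char_frequencies(text: str) -> Counter:
    """Count the occurrences of each character in the text."""
    return Counter(text)


def fixed_byte_cost(text: str) -> int:
    """Bits needed when every character is stored in one 8-bit byte."""
    return 8 * len(text)


if __name__ == "__main__":
    sample = "abracadabra"
    # Frequent characters ('a') are the ones worth giving short codes.
    print(char_frequencies(sample).most_common())
    print(fixed_byte_cost(sample), "bits with 1 byte per character")
```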

Huffman's Algorithm

• Repeatedly merges trees - maintains a forest
• Tree weight - the sum of its leaves' frequencies
• For C characters to code, start with C single-node trees
• Select the two trees, $T_1$ and $T_2$, of smallest weight and merge them
• $C - 1$ merge operations in total
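A minimal Python sketch of the merging loop described above, assuming a `{character: frequency}` dictionary as input and using the standard-library `heapq` module as the priority queue that holds the forest (the slides do not say which structure to use). `Node` and `huffman_tree` are hypothetical names.

```python
import heapq
from itertools import count


class Node:
    """A Huffman tree node; leaves carry a character, internal nodes do not."""

    def __init__(self, weight, char=None, left=None, right=None):
        self.weight = weight  # sum of the leaf frequencies below this node
        self.char = char
        self.left = left
        self.right = right


def huffman_tree(freqs):
    """Build a Huffman tree from {char: frequency}; return (root, merges).

    Assumes at least one character.
    """
    tiebreak = count()  # unique counter so heapq never compares Node objects
    # Start with C single-node trees in a min-heap keyed by weight.
    forest = [(w, next(tiebreak), Node(w, ch)) for ch, w in freqs.items()]
    heapq.heapify(forest)
    merges = 0
    while len(forest) > 1:
        # Select the two trees of smallest weight ...
        w1, _, t1 = heapq.heappop(forest)
        w2, _, t2 = heapq.heappop(forest)
        # ... and merge them under a new root whose weight is their sum.
        merged = Node(w1 + w2, left=t1, right=t2)
        heapq.heappush(forest, (merged.weight, next(tiebreak), merged))
        merges += 1
    return forest[0][2], merges
```

With six characters the loop performs exactly five merges, matching the C - 1 count, and the root's weight is the total of all frequencies.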

Huffman Encoding

  • Encoding
    • Use a tree
      • Inefficient in practice
    • Use a direct-addressed lookup table

• Finding the optimal encoding

  • Smallest number of bits to represent arbitrary text

[Figure: example code tree; recoverable leaf labels include A (code 010), E (code 00), B, N, S and T]

  • A divide-and-conquer approach might have us asking which characters should appear in the left and right subtrees and trying to build the tree from the top down.

  • A greedy approach places our n characters in n sub-trees and starts by combining the two least-weight nodes into a tree whose root node is assigned the sum of the two leaf node weights as its weight.
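The greedy merge also yields the total cost directly: every merge pushes all leaves under the new root one level deeper, so summing the merged weights gives the total number of encoded bits. A minimal sketch, assuming the input is a list of character frequencies; `greedy_merge_cost` is a hypothetical name.

```python
import heapq


def greedy_merge_cost(freqs):
    """Total encoded bits of a Huffman code, computed from weights alone.

    Each merge creates a root whose weight is the sum of the two smallest
    weights; every leaf below that root gains one bit of depth, so adding
    up all merged weights gives sum(frequency * code length).
    """
    heap = list(freqs)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total
```

For example, frequencies 45, 13, 12, 16, 9 and 5 (100 characters) give 224 bits, compared with 300 bits for a 3-bit fixed-length code.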

Standard Coding Scheme

Binary Tree Representation

• For a character set of C characters, the standard fixed-length code needs $\lceil \log_2 C \rceil$ bits per character
• A fixed-length code can be represented by a binary tree where characters are stored only in leaf nodes - a binary trie
• Each character's path - start at the root, follow the branches, and record 0 for each left branch and 1 for each right branch
• An optimal code is always a full tree - all nodes are either leaves or have two children
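The fixed-length baseline can be sketched as follows. It computes $\lceil \log_2 C \rceil$ with integer arithmetic and numbers the characters in binary, so each code is also the root-to-leaf path (0 = left, 1 = right) in a binary trie of that depth. When C is not a power of two that trie is not full, which is one reason a fixed-length code can be improved. The function names are hypothetical.

```python
def fixed_length_bits(c):
    """Bits per character for a fixed-length code over c characters."""
    if c < 1:
        raise ValueError("need at least one character")
    return (c - 1).bit_length()  # exact integer ceil(log2 c), no float rounding


def fixed_length_codes(chars):
    """Assign each character a distinct code of the same width.

    Code i is i written in binary, zero-padded to the common width; read left
    to right it is the path from the root of the trie to that character.
    """
    # Use at least one bit so a one-character alphabet still gets a code.
    width = max(1, fixed_length_bits(len(chars)))
    return {ch: format(i, f"0{width}b") for i, ch in enumerate(chars)}
```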

Improved Binary Trie

Prefix Code

• The fixed-length character code, which places characters only at the leaves, guarantees that any bit sequence can be decoded unambiguously
• Prefix code - character codes may have varying lengths as long as no character's code is a prefix of another character's code
• That means that characters can only be in leaves

Optimal Prefix Code Tree

Optimal Prefix Code Cost
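A standard way to state this cost, assumed here rather than taken from the slide, uses the frequency $f(c)$ of each character $c$ in the character set $C$ and its depth $d_T(c)$ in the code tree $T$, which equals its code length. $I(T)$ denotes the internal nodes of $T$ and $w(v)$ the weight of node $v$, the sum of the leaf frequencies below it:

```latex
B(T) = \sum_{c \in C} f(c)\, d_T(c) = \sum_{v \in I(T)} w(v)
```

The second form holds because each leaf contributes its frequency once to every internal node on its path from the root. An optimal prefix code tree is one that minimizes $B(T)$ over all full binary trees whose leaves are the characters of $C$.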