Data Compression - Data Structures - Lecture Slides, Slides of Data Structures and Algorithms

Some concepts in Data Structures are Abstract, Balance Factor, Complete Binary Tree, Dynamically, Storage, Implementation, Sequential Search, Advanced Data Structures, Graph Coloring Two, Insertion Sort. Main points of this lecture are: Data Compression, Lossless and Lossy Compression, File Size, Time and Space, Data Compression Algorithms, Statistical Analysis, Frequency, Representing the Data, Examples in Computers

Typology: Slides

2012/2013

Uploaded on 04/30/2013

dinpal

Data Compression

What is Data Compression?

  • Compression is either lossless or lossy; either way, file size is reduced
  • This saves both time and space (both at a premium)
  • Data compression algorithms are more successful if they are based on statistical analysis of the frequency of the data and the accuracy needed to represent the data
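The "statistical analysis of the frequency of the data" mentioned above can be illustrated with a minimal sketch. The function name `symbol_frequencies` and the sample string are assumptions made for this example, not part of the lecture.

```python
from collections import Counter

def symbol_frequencies(text):
    """Return (symbol, relative frequency) pairs, most common first."""
    counts = Counter(text)
    total = sum(counts.values())
    return [(sym, n / total) for sym, n in counts.most_common()]

# Hypothetical sample text, chosen only for illustration.
sample = "this is an example of a huffman tree"
for sym, freq in symbol_frequencies(sample)[:5]:
    print(repr(sym), round(freq, 3))
```

Symbols that occur often are the ones worth giving short codes, which is the idea the Huffman slides build on.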

What is a Greedy Algorithm?

  • Solve a problem in stages
  • Make a locally optimal decision at each stage
  • The algorithm is good if the local optimum is equal to the global optimum

Examples of Greedy Algorithms

  • Dijkstra, Prim, Kruskal
  • Bin Packing problem
  • Huffman Code

David Huffman

  • Paper published in 1952
  • “A Method for the Construction of Minimum Redundancy Codes”
  • What we call “Data Compression” is what he termed “Minimum Redundancy”

ASCII Code

  • 128 characters, including punctuation
  • log₂ 128 = 7 bits
  • 1 byte = 8 bits
  • All characters are stored as 8 bits
  • “Fixed-Length Encoding”
  • “Etaoin Shrdlu”: the most common letters in English, in approximate order of frequency
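The arithmetic on this slide can be checked with a short sketch. The helper name `fixed_length_encode` is an assumption made for illustration; it simply writes each ASCII character as an 8-bit string.

```python
import math

# 128 distinct characters need log2(128) = 7 bits to tell apart.
bits_needed = math.ceil(math.log2(128))

def fixed_length_encode(text, width=8):
    """Encode each ASCII character as a `width`-bit binary string."""
    return "".join(format(ord(ch), f"0{width}b") for ch in text)

encoded = fixed_length_encode("etaoin")
print(bits_needed)    # 7
print(len(encoded))   # 48, i.e. 6 characters * 8 bits each
```

Every character costs the same number of bits here regardless of how often it occurs, which is what a variable-length code improves on.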

Huffman Algorithm (English)

    1. Maintain a forest of trees
    2. Weight of a tree = sum of the frequencies of its leaves
    3. For 0 to N-
    • Select the two smallest-weight trees
    • Form a new tree with those two as its subtrees

Huffman Algorithm (Technical)

  • n ← |C|
  • Q ← C
  • For i ← 1 to n – 1
    • Do z ← AllocateNode()
    • x ← left[z] ← ExtractMin(Q)
    • y ← right[z] ← ExtractMin(Q)
    • f[z] ← f[x] + f[y]
    • Insert(Q, z)
  • Return ExtractMin(Q)
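The pseudocode above can be sketched in Python with the standard `heapq` module standing in for the priority queue Q. This is one possible rendering under stated assumptions, not the lecture's own implementation: the function name `huffman_codes`, the tuple representation of internal nodes, and the example frequencies are all chosen for illustration, and symbols are assumed to be non-tuple values in a non-empty alphabet.

```python
import heapq
from itertools import count

def huffman_codes(freq):
    """Build a prefix code from a non-empty {symbol: frequency} mapping."""
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    # Q <- C: one single-leaf tree per symbol, keyed by frequency
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    for _ in range(len(freq) - 1):                 # n - 1 merges
        fx, _, x = heapq.heappop(heap)             # x <- ExtractMin(Q)
        fy, _, y = heapq.heappop(heap)             # y <- ExtractMin(Q)
        # z is an internal node (x, y) with f[z] = f[x] + f[y]
        heapq.heappush(heap, (fx + fy, next(tie), (x, y)))
    root = heap[0][2]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):                # internal node: recurse
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                      # leaf: record its code
            codes[node] = prefix or "0"            # lone symbol still gets 1 bit
    walk(root, "")
    return codes

# Hypothetical frequencies, used only to exercise the sketch.
example = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
for sym, code in sorted(huffman_codes(example).items()):
    print(sym, code)
```

The exact bit patterns depend on which subtree is labelled 0 and which 1, but the code lengths do not: with these frequencies the most common symbol gets a 1-bit code and the two rarest get 4-bit codes.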

Step 0: seven single-node trees

(q) (w) (e) (r) (t) (y) (u)

Step 1: merge (y) and (e), the two lightest trees. Remaining weights: q = 10, w = 20, r = 15, t = 25, u = 16

Step 2: merge the (y, e) tree with (q)

Step 3: merge (r) and (u) into a tree of weight 31

Step 4: merge the ((y, e), q) tree with (w)

Step 5: merge (t) with the (r, u) tree into a tree of weight 56

Step 6: merge the two remaining trees into the final tree

                 ( )
               /     \
            ( )       ( )56
           /   \      /   \
        ( )    (w)  (t)   ( )31
       /   \              /   \
     ( )   (q)          (r)   (u)
    /   \
  (y)   (e)
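The merge order in the steps above can be replayed with a small trace. The weights for q, w, r, t and u are the ones shown on the slides; the weights for y and e do not appear there, so the values 4 and 5 below are assumptions picked only so that y and e are the two lightest symbols. The function name `merge_trace` is likewise hypothetical.

```python
import heapq

# q, w, r, t, u: weights shown on the slides.
# y, e: NOT shown on the slides; assumed values for illustration only.
weights = {"q": 10, "w": 20, "r": 15, "t": 25, "u": 16, "y": 4, "e": 5}

def merge_trace(weights):
    """Return the (left, right, combined_weight) merges in the order made."""
    heap = [(w, name) for name, w in weights.items()]
    heapq.heapify(heap)
    trace = []
    while len(heap) > 1:
        w1, a = heapq.heappop(heap)    # lightest tree
        w2, b = heapq.heappop(heap)    # second-lightest tree
        trace.append((a, b, w1 + w2))
        heapq.heappush(heap, (w1 + w2, f"({a},{b})"))
    return trace

for step, (a, b, w) in enumerate(merge_trace(weights), start=1):
    print(f"Step {step}: merge {a} and {b} -> weight {w}")
```

Under these assumed weights the trace reproduces the intermediate weights 31 (r with u) and 56 (t with the r-u tree) that appear on the slides.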

Proof: part 1

  • Lemma:
    • Let C be an alphabet in which each character c in C has frequency f[c]
    • Let x and y be two characters in C having lowest frequencies
    • There exists an optimal prefix code for C in which the codes for x and y have the same length and differ only in the last bit

Proof: part 2

  • Lemma:
    • Let T be a full binary tree representing an optimal prefix code over an alphabet C
    • Let z be the parent of two leaves x and y, with f[z] = f[x] + f[y]
    • Then T′ = T – {x, y} represents an optimal prefix code for C′ = (C – {x, y}) ∪ {z}

Lengths of Encoding Set

                  root
                 /    \
               ( )     8
              /   \
            ( )    7
           /   \
         ( )    6
        /   \
      ( )    5
     /   \
   ( )    4
  /   \
( )    3
/  \
1    2

Length of set is: 7+7+6+5+4+3+2+1 = 35 bits

This is the tree you get when the symbol probabilities differ as much as possible.
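The 35-bit total can be reproduced with a short sketch that tracks only code lengths. It assumes the eight probabilities halve at each level (1/2, 1/4, ..., 1/128, 1/128), written below as integer weights; the function name `code_lengths` is hypothetical.

```python
import heapq

def code_lengths(weights):
    """Return the Huffman code length of each symbol, indexed like `weights`."""
    # Heap entries: (weight, tie-breaker, indices of the leaves in this tree)
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    tie = len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:          # every leaf in the merged trees sinks one level
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths

skewed = [1, 1, 2, 4, 8, 16, 32, 64]   # assumed: probabilities * 128
uniform = [1] * 8                      # eight equally likely symbols

print(code_lengths(skewed), sum(code_lengths(skewed)))     # [7, 7, 6, 5, 4, 3, 2, 1] 35
print(code_lengths(uniform), sum(code_lengths(uniform)))   # [3, 3, 3, 3, 3, 3, 3, 3] 24
```

The skewed tree has a larger total code length than the balanced one, but its long codes belong to very rare symbols, which is why the expected length per character still comes out lower.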

Expected Value / character

  • In example 1:
  • 8 * (1/2^3 * 3) = 3 bits
  • In example 2:
  • 2 * (1/2^7 * 7) + (1/2^6 * 6) + (1/2^5 * 5) + (1/2^4 * 4) + (1/2^3 * 3) + (1/2^2 * 2) + (1/2^1 * 1) = 1.98 bits
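Both expected values can be checked exactly with Python's `fractions` module. The sketch below assumes, as the sums above do, that a symbol with a k-bit code has probability 1/2^k; the names `expected_bits`, `example1` and `example2` are chosen for illustration.

```python
from fractions import Fraction

def expected_bits(code):
    """Expected code length: sum of probability * length over all symbols."""
    return sum(p * length for p, length in code)

# Example 1: eight equally likely symbols, each with a 3-bit code.
example1 = [(Fraction(1, 8), 3)] * 8

# Example 2: lengths 7, 7, 6, 5, 4, 3, 2, 1, each with probability 2^-length.
example2 = [(Fraction(1, 2**7), 7)] + [
    (Fraction(1, 2**k), k) for k in range(7, 0, -1)
]

print(float(expected_bits(example1)))   # 3.0
print(float(expected_bits(example2)))   # 1.984375
```

The exact value for example 2 is 127/64, which rounds to the 1.98 bits quoted above.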