



This lecture was delivered by Dr. Ameet Shashank at B R Ambedkar National Institute of Technology. It relates to the Data Representation and Algorithm Design course. Its main points are: Huffman, Codes, Data, Compression, Prefix, Optimal, Binary, Trees, False, Start
Typology: Slides




These lecture slides are supplied by Mathijs de Weerd.
Q. Given a text that uses 32 symbols (26 different letters, space, and some punctuation characters), how can we encode this text in bits?
A. We can encode 2^5 = 32 different symbols using a fixed length of 5 bits per symbol. This is called fixed-length encoding.
Q. Some symbols (e, t, a, o, i, n) are used far more often than others. How can we use this to shorten our encoding?
A. Encode these characters with fewer bits, and the others with more bits.
Q. How do we know when the next symbol begins?
A. Use a separation symbol (like the pause in Morse code), or rule out ambiguity by ensuring that no code is a prefix of another one.
Ex. c(a) = 01, c(b) = 010, c(e) = 1. What is 0101? It can be read as 01|01 = "aa" or as 010|1 = "be"; the code is ambiguous because c(a) is a prefix of c(b).
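A brute-force check makes the ambiguity of the example concrete. This is a minimal sketch (the helper `all_parses` and the dict representation of the code are our own, not from the slides): it returns every way to split a bit string into codewords, so more than one result means the string is ambiguous. It tries all splits, so it can take exponential time and is only meant for tiny inputs.

```python
def all_parses(bits: str, code: dict[str, str]) -> list[str]:
    """Return every way to split `bits` into codewords of `code`."""
    if bits == "":
        return [""]  # the empty string has exactly one parse: no letters
    results = []
    for letter, word in code.items():
        if bits.startswith(word):
            # Decode this codeword first, then parse the remainder recursively.
            for rest in all_parses(bits[len(word):], code):
                results.append(letter + rest)
    return results


code = {"a": "01", "b": "010", "e": "1"}
print(all_parses("0101", code))  # ['aa', 'be']
```
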
Definition. A prefix code for a set S is a function c that maps each x ∈ S to a string of 0s and 1s in such a way that for x, y ∈ S, x ≠ y, c(x) is not a prefix of c(y).
Ex. c(a) = 11, c(e) = 01, c(k) = 001, c(l) = 10, c(u) = 000.
Q. What is the meaning of 1001000001?
A. “leuk” (10|01|000|001 = l, e, u, k).
Suppose the frequencies in a text of 1G letters are known: f_a = 0.4, f_e = 0.2, f_k = 0.2, f_l = 0.1, f_u = 0.1.
Q. What is the size of the encoded text?
A. 2f_a + 2f_e + 3f_k + 2f_l + 3f_u = 2.3 bits per letter, so 2.3G bits in total.
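A sketch of decoding with a prefix code, assuming the code is given as a Python dict from letter to codeword (the names `is_prefix_code` and `decode` are hypothetical). Because no codeword is a prefix of another, the decoder can emit a letter as soon as the bits read so far form a complete codeword, with no lookahead.

```python
def is_prefix_code(code: dict[str, str]) -> bool:
    """True if no codeword is a prefix of the codeword of a different letter."""
    words = list(code.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )


def decode(bits: str, code: dict[str, str]) -> str:
    """Decode left to right, emitting a letter as soon as a codeword is complete."""
    inverse = {word: letter for letter, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix property: no other codeword extends `current`
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError(f"trailing bits {current!r} do not form a codeword")
    return "".join(out)


code = {"a": "11", "e": "01", "k": "001", "l": "10", "u": "000"}
assert is_prefix_code(code)
print(decode("1001000001", code))  # leuk
```
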
Definition. The average bits per letter (ABL) of a prefix code c is the sum over all symbols of the symbol's frequency times the number of bits in its encoding:
$$\mathrm{ABL}(c) = \sum_{x \in S} f_x \cdot |c(x)|$$
We would like to find a prefix code that has the lowest possible average bits per letter. Suppose we model a code as a binary tree…
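The ABL formula can be evaluated directly. A small sketch (the helper `abl` is hypothetical) using exact fractions and the example code a = 11, e = 01, k = 001, l = 10, u = 000, assuming f_u = 0.1 so that the frequencies sum to 1. The result is in bits per letter, so a text of 1G letters encodes to ABL × 1G bits.

```python
from fractions import Fraction


def abl(code: dict[str, str], freq: dict[str, Fraction]) -> Fraction:
    """Average bits per letter: sum over x in S of f_x * |c(x)|."""
    return sum((freq[x] * len(code[x]) for x in code), Fraction(0))


code = {"a": "11", "e": "01", "k": "001", "l": "10", "u": "000"}
freq = {"a": Fraction(4, 10), "e": Fraction(2, 10), "k": Fraction(2, 10),
        "l": Fraction(1, 10), "u": Fraction(1, 10)}
print(abl(code, freq))  # 23/10
```
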
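Ex. c(a) = 11, c(e) = 01, c(k) = 001, c(l) = 10, c(u) = 000.
Q. What does the tree of a prefix code look like?
A. Only the leaves have a label.
Pf. An encoding of x is a prefix of an encoding of y if and only if the path of x is a prefix of the path of y; so a labelled internal node would make its codeword a prefix of the codewords below it.
[Figure: code tree for this example with edges labelled 0 and 1; leaves u = 000, k = 001, e = 01, l = 10, a = 11.]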
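The correspondence between prefix codes and trees can be sketched by inserting each codeword as a path from the root, with 0 and 1 selecting the child. In this sketch (our own nested-dict representation; `build_tree` and `leaf_depths` are hypothetical names) a codeword whose path passes through, or ends at or above, an already labelled node is rejected, which is exactly the prefix condition. The depth of each leaf equals the length of its codeword.

```python
def build_tree(code: dict[str, str]) -> dict:
    """Build the code tree as nested dicts keyed by '0'/'1'; a leaf is {"symbol": x}."""
    root: dict = {}
    for symbol, word in code.items():
        node = root
        for bit in word:
            if "symbol" in node:  # path runs through a labelled leaf
                raise ValueError(f"{symbol!r}: another codeword is a prefix of it")
            node = node.setdefault(bit, {})
        if node:  # node is already labelled or has children below it
            raise ValueError(f"{symbol!r}: its codeword is a prefix of another one")
        node["symbol"] = symbol
    return root


def leaf_depths(node: dict, depth: int = 0) -> dict[str, int]:
    """Map each leaf label to its depth, i.e. its codeword length."""
    if "symbol" in node:
        return {node["symbol"]: depth}
    depths: dict[str, int] = {}
    for bit in ("0", "1"):
        if bit in node:
            depths.update(leaf_depths(node[bit], depth + 1))
    return depths


tree = build_tree({"a": "11", "e": "01", "k": "001", "l": "10", "u": "000"})
print(leaf_depths(tree))  # {'u': 3, 'k': 3, 'e': 2, 'l': 2, 'a': 2}
```
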
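Definition. A tree is full if every node that is not a leaf has two children.
Claim. The binary tree corresponding to the optimal prefix code is full.
Pf. (by contradiction) Suppose T is the binary tree of an optimal prefix code and is not full. This means there is a node u with only one child v.
Case 1: u is the root; delete u and use v as the root.
Case 2: u is not the root; let w be the parent of u, delete u and make v a child of w.
In both cases every leaf below v moves up one level, so its codeword gets shorter and the ABL decreases (assuming positive frequencies), contradicting the optimality of T.
[Figure: nodes u, v, and w for Case 2.]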
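The exchange step of the proof can be sketched on a tree stored as nested tuples: a leaf is a letter, an internal node is a pair (left, right), and `None` marks a missing child (this representation and the names `contract` and `depths` are our own). Splicing out a one-child node u, so that its child v takes u's place, covers both cases of the proof and moves every leaf below v one level up.

```python
from typing import Union

# A tree is a leaf label (str) or a pair (left, right); None marks a missing child.
Tree = Union[str, tuple]


def contract(t: Tree) -> Tree:
    """Splice out every internal node that has exactly one child."""
    if isinstance(t, str):
        return t
    left, right = t
    if left is None:   # u has only a right child v: v replaces u
        return contract(right)
    if right is None:  # u has only a left child v: v replaces u
        return contract(left)
    return (contract(left), contract(right))


def depths(t: Tree, d: int = 0) -> dict[str, int]:
    """Depth of every leaf, i.e. the length of its codeword."""
    if isinstance(t, str):
        return {t: d}
    out: dict[str, int] = {}
    for child in t:
        if child is not None:
            out.update(depths(child, d + 1))
    return out


# Not full: the node reached by bit 1 has only a 0-child.
t = ("a", (("b", "c"), None))
print(depths(t))            # {'a': 1, 'b': 3, 'c': 3}
print(depths(contract(t)))  # {'a': 1, 'b': 2, 'c': 2}
```
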
Q. Where in the tree of an optimal prefix code should letters with a high frequency be placed?
A. Near the top.
Greedy template. Create the tree top-down: split S into two sets S_1 and S_2 with (almost) equal total frequencies, then recursively build trees for S_1 and S_2. [Shannon-Fano, 1949]
Example: f_a = 0.32, f_e = 0.25, f_k = 0.20, f_l = 0.18, f_u = 0.05.
[Figure: two candidate code trees for these five letters, with leaves annotated by frequency.]
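A sketch of the top-down Shannon-Fano template, assuming the common variant that sorts letters by decreasing frequency and splits the sorted list at the cut where the two halves' totals are closest; other ways of choosing the (almost) equal split can give different trees. Frequencies are scaled to integers (percent) to avoid floating-point ties, and `shannon_fano` is a hypothetical name.

```python
def shannon_fano(freq: dict[str, int]) -> dict[str, str]:
    """Top-down Shannon-Fano codes (sorted-list variant)."""
    letters = sorted(freq, key=lambda x: freq[x], reverse=True)
    codes = {x: "" for x in letters}

    def split(group: list[str]) -> None:
        if len(group) < 2:
            return  # a single letter is a leaf
        total = sum(freq[x] for x in group)
        best_cut, best_gap, running = 1, None, 0
        for cut in range(1, len(group)):
            running += freq[group[cut - 1]]  # total of group[:cut]
            gap = abs(2 * running - total)   # |total(S_1) - total(S_2)|
            if best_gap is None or gap < best_gap:
                best_cut, best_gap = cut, gap
        s1, s2 = group[:best_cut], group[best_cut:]
        for x in s1:
            codes[x] += "0"  # S_1 goes under the 0-edge
        for x in s2:
            codes[x] += "1"  # S_2 goes under the 1-edge
        split(s1)
        split(s2)

    split(letters)
    return codes


print(shannon_fano({"a": 32, "e": 25, "k": 20, "l": 18, "u": 5}))
# {'a': '00', 'e': '01', 'k': '10', 'l': '110', 'u': '111'}
```
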
Observation. Lowest-frequency items should be at the lowest level in the tree of an optimal prefix code.
Observation. For n > 1, the lowest level always contains at least two leaves.
Observation. The order in which items appear in a level does not matter.
Claim. There is an optimal prefix code with tree T* where the two lowest-frequency letters are assigned to leaves that are siblings in T*.
Greedy template. [Huffman, 1952] Create the tree bottom-up. Make two leaves for the two lowest-frequency letters y and z. Recursively build the tree for the rest, using a meta-letter for yz.
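The bottom-up template translates into a short recursive sketch: find the two lowest-frequency letters with a linear scan, merge them into a meta-letter whose frequency is the sum, and recurse on the smaller alphabet, with |S| = 2 as the base case. Instead of attaching y and z to the leaf ω after the recursive call, here the meta-letter carries its subtree (y, z) along, which yields the same tree. Subtrees are nested tuples, and `huffman_recursive` and `codewords` are our own names; the linear scans make this a quadratic-time version.

```python
def huffman_recursive(freq: dict[str, int]) -> object:
    """Return the Huffman tree: a letter, or a pair (left, right) of subtrees."""
    if len(freq) < 2:
        raise ValueError("need at least two letters")

    def build(items: list) -> object:
        # items: (frequency, subtree) pairs, one per (meta-)letter in S
        if len(items) == 2:  # |S| = 2: root with two leaves
            return (items[0][1], items[1][1])
        items = list(items)  # work on a copy of S
        i = min(range(len(items)), key=lambda k: items[k][0])
        fy, y = items.pop(i)  # lowest-frequency letter y
        j = min(range(len(items)), key=lambda k: items[k][0])
        fz, z = items.pop(j)  # next-lowest letter z
        # meta-letter omega with f_omega = f_y + f_z; its subtree has children y, z
        return build(items + [(fy + fz, (y, z))])

    return build([(f, letter) for letter, f in freq.items()])


def codewords(tree: object, prefix: str = "") -> dict[str, str]:
    """Read codewords off the tree: left edge = 0, right edge = 1."""
    if isinstance(tree, str):
        return {tree: prefix}
    left, right = tree
    return {**codewords(left, prefix + "0"), **codewords(right, prefix + "1")}


codes = codewords(huffman_recursive({"a": 32, "e": 25, "k": 20, "l": 18, "u": 5}))
print(sorted(codes.items()))
# [('a', '11'), ('e', '10'), ('k', '00'), ('l', '011'), ('u', '010')]
```
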
Huffman(S) {
  if |S| = 2 {
    return tree with root and 2 leaves
  } else {
    let y and z be the lowest-frequency letters in S
    S' = S
    remove y and z from S'
    insert new letter ω in S' with f_ω = f_y + f_z
    T' = Huffman(S')
    T = add two children y and z to leaf ω from T'
    return T
  }
}
Q. What is the time complexity?
A. T(n) = T(n-1) + O(n), so O(n^2).
Q. How can we find the lowest-frequency letters efficiently?
A. Use a priority queue for S: T(n) = T(n-1) + O(log n), so O(n log n).
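With a priority queue, the same greedy merges run in O(n log n) time. A minimal sketch using Python's `heapq` module (the function name `huffman` is our own): the loop does n - 1 merges with O(log n) heap work each, and a running counter stored in each heap entry ensures that ties in frequency never force Python to compare two subtrees.

```python
import heapq
from itertools import count


def huffman(freq: dict[str, int]) -> dict[str, str]:
    """Huffman codes using a binary heap as the priority queue for S."""
    if len(freq) < 2:
        raise ValueError("need at least two letters")
    tiebreak = count()  # unique counter so the heap never compares subtrees
    heap = [(f, next(tiebreak), letter) for letter, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fy, _, y = heapq.heappop(heap)  # lowest frequency
        fz, _, z = heapq.heappop(heap)  # second lowest
        heapq.heappush(heap, (fy + fz, next(tiebreak), (y, z)))  # meta-letter
    _, _, tree = heap[0]

    codes: dict[str, str] = {}
    stack = [(tree, "")]
    while stack:  # walk the tree: left edge = 0, right edge = 1
        node, prefix = stack.pop()
        if isinstance(node, str):
            codes[node] = prefix
        else:
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
    return codes


print(sorted(huffman({"a": 32, "e": 25, "k": 20, "l": 18, "u": 5}).items()))
# [('a', '11'), ('e', '10'), ('k', '00'), ('l', '011'), ('u', '010')]
```
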
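Claim. The Huffman code for S achieves the minimum ABL of any prefix code.
Pf. By induction, based on the optimality of T' (y and z removed, ω added) (see next page).
Claim. ABL(T') = ABL(T) - f_ω.
Pf.
$$
\begin{aligned}
\mathrm{ABL}(T) &= \sum_{x \in S} f_x \cdot \mathrm{depth}_T(x) \\
&= f_y \cdot \mathrm{depth}_T(y) + f_z \cdot \mathrm{depth}_T(z) + \sum_{x \in S,\, x \neq y,z} f_x \cdot \mathrm{depth}_T(x) \\
&= (f_y + f_z) \cdot (1 + \mathrm{depth}_T(\omega)) + \sum_{x \in S,\, x \neq y,z} f_x \cdot \mathrm{depth}_T(x) \\
&= f_\omega \cdot (1 + \mathrm{depth}_T(\omega)) + \sum_{x \in S,\, x \neq y,z} f_x \cdot \mathrm{depth}_T(x) \\
&= f_\omega + \sum_{x \in S'} f_x \cdot \mathrm{depth}_{T'}(x) \\
&= f_\omega + \mathrm{ABL}(T')
\end{aligned}
$$
The fifth step uses depth_{T'}(ω) = depth_T(ω) and depth_{T'}(x) = depth_T(x) for every other x ∈ S'.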
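The identity ABL(T') = ABL(T) - f_ω can be checked numerically. This sketch uses the earlier example code a = 11, e = 01, k = 001, l = 10, u = 000, assuming frequencies 0.4, 0.2, 0.2, 0.1, 0.1, and merges the sibling leaves k and u into a meta-letter ω at depth 2. The derivation only uses that y and z are siblings, so this choice of pair suffices for the check; the names `weighted_depth` and `"omega"` are our own.

```python
from fractions import Fraction


def weighted_depth(depth: dict[str, int], freq: dict[str, Fraction]) -> Fraction:
    """ABL(T) = sum over leaves x of f_x * depth_T(x)."""
    return sum((freq[x] * depth[x] for x in depth), Fraction(0))


# T: leaf depths of the code a=11, e=01, k=001, l=10, u=000.
freq = {"a": Fraction(4, 10), "e": Fraction(2, 10), "k": Fraction(2, 10),
        "l": Fraction(1, 10), "u": Fraction(1, 10)}
depth_T = {"a": 2, "e": 2, "k": 3, "l": 2, "u": 3}

# T': replace the sibling leaves y = k and z = u by their parent, the meta-letter omega.
f_omega = freq["k"] + freq["u"]
freq_prime = {x: f for x, f in freq.items() if x not in ("k", "u")} | {"omega": f_omega}
depth_T_prime = {x: d for x, d in depth_T.items() if x not in ("k", "u")} | {"omega": 2}

abl_T = weighted_depth(depth_T, freq)
abl_T_prime = weighted_depth(depth_T_prime, freq_prime)
assert abl_T_prime == abl_T - f_omega
print(abl_T, abl_T_prime, f_omega)  # 23/10 2 3/10
```
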