Dept. CSE, UT Arlington, CSE5311 Design and Analysis of Algorithms
CSE 5311 Design and Analysis of Algorithms
Lecture 17: Greedy Algorithms: Huffman Coding
Junzhou Huang, Ph.D., Department of Computer Science and Engineering
Data Compression
- Suppose we have a 1,000,000,000-character (1G) data file that we wish to include in an email.
- Suppose the file contains only the 26 letters {a,…,z}.
- Suppose each letter $a$ in {a,…,z} occurs with frequency $f_a$.
- Suppose we encode each letter with a binary code.
- If we use a fixed-length code, we need 5 bits per character, since $2^4 = 16 < 26 \le 32 = 2^5$.
- The resulting message length is $5(f_a + f_b + \cdots + f_z)$ bits, i.e. 5G bits when the $f$'s count occurrences in the 1G-character file.
- Can we do better?
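A minimal sketch of the fixed-length arithmetic, assuming the codeword width is the smallest $w$ with $2^w \ge$ alphabet size. The function names `fixed_length_bits` and `fixed_length_message_bits` are hypothetical, chosen for illustration.

```python
def fixed_length_bits(alphabet_size: int) -> int:
    """Smallest width w with 2**w >= alphabet_size (at least 1 bit).

    (n - 1).bit_length() gives this exactly, without floating-point log2.
    """
    return max(1, (alphabet_size - 1).bit_length())

def fixed_length_message_bits(num_chars: int, alphabet_size: int) -> int:
    """Total bits when every character uses the same fixed-width codeword."""
    return num_chars * fixed_length_bits(alphabet_size)

print(fixed_length_bits(26))                 # 5
print(fixed_length_message_bits(10**9, 26))  # 5000000000
```

The same helper gives 3 bits for the 6-letter alphabet used next.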
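Data Compression: A Smaller Example
- Suppose the file has only 6 letters {a,b,c,d,e,f} with the frequencies below (as fractions of the 1G characters).
- Fixed-length code: 3 bits per letter, so 3G = 3,000,000,000 bits.
- Variable-length code: .45·1 + .13·3 + .12·3 + .16·3 + .09·4 + .05·4 = 2.24 bits per letter on average, so 2.24G bits.

Letter                   a     b     c     d     e     f
Frequency               .45   .13   .12   .16   .09   .05
Fixed length (bits)      3     3     3     3     3     3
Variable length (bits)   1     3     3     3     4     4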
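The totals for this example can be recomputed as follows. This is a sketch that stores the frequencies as integer percentages so the arithmetic is exact; `total_bits` and the dictionary names are hypothetical.

```python
# Frequencies in percent (occurrences per 100 characters) from the 6-letter example.
FREQ_PCT = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
# Codeword lengths: variable-length code vs. a 3-bit fixed-length code.
VAR_LEN = {"a": 1, "b": 3, "c": 3, "d": 3, "e": 4, "f": 4}
FIXED_LEN = {c: 3 for c in FREQ_PCT}

def total_bits(num_chars: int, freq_pct: dict, lengths: dict) -> int:
    """Encoded length: sum over letters of (occurrences * codeword length)."""
    bits_per_100_chars = sum(freq_pct[c] * lengths[c] for c in freq_pct)
    return num_chars * bits_per_100_chars // 100

N = 10**9  # 1G characters
print(total_bits(N, FREQ_PCT, FIXED_LEN))  # 3000000000
print(total_bits(N, FREQ_PCT, VAR_LEN))    # 2240000000
```

The variable-length code saves about 25% here ((3 - 2.24) / 3 ≈ 0.253).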
How to decode?
- At first it is not obvious how decoding works, but it is possible if we use prefix codes: codes in which no codeword is a prefix of another codeword.
Prefix codes
- A message can be decoded uniquely.
- Follow the tree from the root until reaching a leaf, output that leaf's letter, and then repeat from the root.
- Draw a few more trees and produce their codes.
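The tree-walking decoder described above can be sketched as follows, using the code a: 0, b: 101, c: 100, d: 111, e: 1101, f: 1100 listed under "Some Properties". The `Node` class and the `build_tree`/`decode` names are hypothetical; `build_tree` also rejects codes that are not prefix-free.

```python
from typing import Dict, Optional

class Node:
    """Code-tree node: a leaf carries a symbol; an internal node has children."""
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}  # keys "0" (left) and "1" (right)
        self.symbol: Optional[str] = None

def build_tree(code: Dict[str, str]) -> Node:
    """Insert each codeword as a root-to-leaf path; raise if not a prefix code."""
    root = Node()
    for symbol, word in code.items():
        node = root
        for bit in word:
            if node.symbol is not None:  # an existing codeword is a prefix of this one
                raise ValueError(f"codeword {word!r} for {symbol!r} is not prefix-free")
            node = node.children.setdefault(bit, Node())
        if node.symbol is not None or node.children:  # duplicate, or this one is a prefix
            raise ValueError(f"codeword {word!r} for {symbol!r} is not prefix-free")
        node.symbol = symbol
    return root

def decode(bits: str, root: Node) -> str:
    """Walk from the root bit by bit; at each leaf emit its letter and restart."""
    out = []
    node = root
    for bit in bits:
        node = node.children[bit]  # follow the edge labelled by this bit
        if node.symbol is not None:
            out.append(node.symbol)
            node = root
    if node is not root:
        raise ValueError("input ended in the middle of a codeword")
    return "".join(out)

CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
print(decode("001011101", build_tree(CODE)))  # aabe
```

Because no codeword is a prefix of another, the first leaf reached is always the only possible match, which is why the decoding is unique.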
Some Properties
- Prefix codes allow easy decoding
- Given a: 0, b: 101, c: 100, d: 111, e: 1101, f: 1100
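- Decode 001011101 from left to right: 0|01011101 → a|0|1011101 → a|a|101|1101 → a|a|b|1101 → a|a|b|e, so the message is aabe
- The tree of an optimal code must be a full binary tree (a tree where every internal node has two children); otherwise a node with a single child could be removed, shortening some codewords
- For $|C|$ leaves (one per letter of the alphabet $C$) there are $|C| - 1$ internal nodes
- The number of bits required to encode a file is $B(T) = \sum_{c \in C} f(c)\, d_T(c)$, where $f(c)$ is the frequency of $c$ and $d_T(c)$ is the depth of $c$'s leaf in tree $T$, which equals the length of $c$'s codeword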
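The three tree properties above can be checked directly on the example code. This is a sketch assuming the code is prefix-free, so that the internal nodes are exactly the proper prefixes of the codewords; the helper names are hypothetical.

```python
from typing import Dict, Set

# The example code and its frequencies (occurrences per 100 characters).
CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
FREQ_PCT = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}

def internal_nodes(code: Dict[str, str]) -> Set[str]:
    """Internal nodes, named by their root path ("" is the root).

    For a prefix code these are exactly the proper prefixes of the codewords.
    """
    return {word[:i] for word in code.values() for i in range(len(word))}

def is_full(code: Dict[str, str]) -> bool:
    """True if every internal node has both a 0-child and a 1-child."""
    internal = internal_nodes(code)
    nodes = internal | set(code.values())
    return all(p + "0" in nodes and p + "1" in nodes for p in internal)

def cost(freq: Dict[str, int], code: Dict[str, str]) -> int:
    """B(T) = sum of f(c) * d_T(c); a leaf's depth is its codeword length."""
    return sum(freq[c] * len(code[c]) for c in code)

print(is_full(CODE))                             # True
print(len(internal_nodes(CODE)), len(CODE) - 1)  # 5 5
print(cost(FREQ_PCT, CODE))                      # 224, i.e. 2.24 bits per character
```

For contrast, the code {a: 00, b: 01, c: 10} fails `is_full` because node 1 has only a 0-child; replacing c's codeword with 1 keeps it a prefix code and lowers the cost.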
Greedy Algorithms
- Many optimization problems can be solved using a greedy approach.
- The basic principle is that locally optimal decisions may be used to build an optimal solution.
- However, the greedy approach does not lead to an overall optimal solution for every problem.
- The key is knowing which problems will work with this approach and which will not.
- We will study
- The problem of generating Huffman codes
Greedy algorithms
- A greedy algorithm always makes the choice that looks best at the moment.
- My everyday examples: driving in Los Angeles, NY, or Boston for that matter; playing cards; investing in stocks; choosing a university.
- The hope: a locally optimal choice will lead to a globally optimal solution.
- For some problems, it works.
- Greedy algorithms tend to be easier to code
Building the Encoding Tree
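The figures for building the encoding tree did not survive extraction. As a stand-in, here is a minimal sketch of the standard greedy construction behind Huffman coding: keep the subtrees in a min-priority queue and repeatedly merge the two with the lowest frequencies. It assumes a non-empty alphabet with integer frequencies; `huffman_tree`, `codewords`, and the tuple-based tree representation are hypothetical choices for illustration, and ties between equal frequencies may be broken differently elsewhere.

```python
import heapq
from itertools import count
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf symbol or a (left, right) pair

def huffman_tree(freq: Dict[str, int]) -> Tree:
    """Greedy step: repeatedly merge the two least-frequent subtrees."""
    tie = count()  # tie-breaker so heapq never has to compare two trees
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:  # |C| - 1 merges, one per internal node
        f1, _, t1 = heapq.heappop(heap)  # lowest frequency
        f2, _, t2 = heapq.heappop(heap)  # second lowest
        heapq.heappush(heap, (f1 + f2, next(tie), (t1, t2)))
    return heap[0][2]

def codewords(tree: Tree, prefix: str = "") -> Dict[str, str]:
    """Read the code off the tree: left edge = 0, right edge = 1."""
    if isinstance(tree, str):  # leaf
        return {tree: prefix or "0"}  # a one-letter alphabet still needs 1 bit
    left, right = tree
    return {**codewords(left, prefix + "0"), **codewords(right, prefix + "1")}

FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = codewords(huffman_tree(FREQ))
print(sorted(code.items()))
print(sum(FREQ[c] * len(code[c]) for c in FREQ))  # 224
```

On the 6-letter example the merges are f+e (14), c+b (25), 14+d (30), 25+30 (55), and a+55 (100), which reproduces the code a: 0, b: 101, c: 100, d: 111, e: 1101, f: 1100. With a binary heap each of the $|C| - 1$ merges costs $O(\log |C|)$.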
Correctness of Huffman’s Algorithm
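Lemma 16.
- Let x and y be the two characters with the lowest frequencies. Then there exists an optimal tree in which x and y appear as sibling leaves of maximum depth.
- Let T be an optimal tree, and let a and b be sibling leaves of maximum depth in T. Swapping x with a gives T'; then swapping y with b gives T''.
- Without loss of generality, assume $f[a] \le f[b]$ and $f[x] \le f[y]$. Since x and y have the two lowest frequencies, $f[x] \le f[a]$ and $f[y] \le f[b]$.
- The cost difference between T and T' is
$$\begin{aligned}
B(T) - B(T') &= \sum_{c \in C} f[c]\, d_T(c) - \sum_{c \in C} f[c]\, d_{T'}(c) \\
&= f[x]\, d_T(x) + f[a]\, d_T(a) - f[x]\, d_{T'}(x) - f[a]\, d_{T'}(a) \\
&= f[x]\, d_T(x) + f[a]\, d_T(a) - f[x]\, d_T(a) - f[a]\, d_T(x) \\
&= (f[a] - f[x])\,(d_T(a) - d_T(x)) \ge 0,
\end{aligned}$$
because $f[a] - f[x] \ge 0$ (x has the lowest frequency) and $d_T(a) - d_T(x) \ge 0$ (a has maximum depth). In the same way, swapping y and b does not increase the cost.
- Since each swap does not increase the cost, $B(T'') \le B(T)$. But T is optimal, so $B(T) \le B(T'')$, and hence $B(T'') = B(T)$.
- Therefore T'' is an optimal tree in which x and y appear as sibling leaves of maximum depth.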
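The identity $B(T) - B(T') = (f[a] - f[x])(d_T(a) - d_T(x))$ holds for any tree, not only optimal ones, so it can be checked numerically on a deliberately poor tree. This sketch uses the 6-letter frequencies and a hypothetical tree with codeword lengths 1, 3, 3, 3, 4, 4 that places the rare letter f at depth 1; `cost` and `swap_leaves` are illustrative names. Note that the proof's a and b are roles here (the deepest siblings), not the letters a and b.

```python
from typing import Dict

# Frequencies per 100 characters from the 6-letter example.
FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}

def cost(freq: Dict[str, int], depth: Dict[str, int]) -> int:
    """B(T) = sum over c of f[c] * d_T(c)."""
    return sum(freq[c] * depth[c] for c in freq)

def swap_leaves(depth: Dict[str, int], u: str, v: str) -> Dict[str, int]:
    """Leaf depths after exchanging the letters at leaves u and v (same shape)."""
    swapped = dict(depth)
    swapped[u], swapped[v] = depth[v], depth[u]
    return swapped

# Hypothetical full tree T: letters d and a are the sibling leaves at depth 4.
T = {"f": 1, "e": 3, "b": 3, "c": 3, "d": 4, "a": 4}
x, y = "f", "e"            # the two lowest-frequency letters
deep_a, deep_b = "d", "a"  # roles a and b from the proof, with f[d] <= f[a]

T1 = swap_leaves(T, x, deep_a)    # T'
T2 = swap_leaves(T1, y, deep_b)   # T''
gain1 = cost(FREQ, T) - cost(FREQ, T1)
gain2 = cost(FREQ, T1) - cost(FREQ, T2)

# Each gain matches the closed form from the proof and is non-negative.
assert gain1 == (FREQ[deep_a] - FREQ[x]) * (T[deep_a] - T[x])
assert gain2 == (FREQ[deep_b] - FREQ[y]) * (T1[deep_b] - T1[y])
assert gain1 >= 0 and gain2 >= 0
print(cost(FREQ, T), cost(FREQ, T1), cost(FREQ, T2))  # 351 318 282
```

After both swaps, f and e sit together at maximum depth 4, as the lemma requires. Because this T was not optimal to begin with, its cost strictly decreases rather than staying the same.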