

This PDF contains code for the Huffman encoding algorithm.


Huffman invented a greedy algorithm that constructs an optimal prefix code called a Huffman code. In the pseudocode that follows (Algorithm 1), we assume that C is a set of n characters and that each character c ∈ C is an object with an attribute c.freq giving its frequency. The algorithm builds the tree T corresponding to the optimal code in a bottom-up manner. It begins with a set of |C| leaves and performs a sequence of |C| − 1 merging operations to create the final tree. The algorithm uses a min-priority queue Q, keyed on the freq attribute, to identify the two least-frequent objects to merge together. When we merge two objects, the result is a new object whose frequency is the sum of the frequencies of the two objects that were merged.
Algorithm 1 Huffman(C)
 1: n := |C|;
 2: Q := C;
 3: for i := 1 to n − 1 do
 4:     allocate a new node z
 5:     z.left := x := Extract-Min(Q);
 6:     z.right := y := Extract-Min(Q);
 7:     z.freq := x.freq + y.freq;
 8:     Insert(Q, z);
 9: end for
10: return Extract-Min(Q); {return the root of the tree}
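Algorithm 1 can be sketched in Python roughly as follows. This is a minimal illustration, not part of the original notes: it assumes the standard-library heapq module as the min-priority queue, a non-empty frequency table, and hypothetical names (Node, huffman, codewords) chosen only for this example. A counter is used as a tie-breaker so the heap never has to compare Node objects directly, and the left/right edge labels 0 and 1 are likewise an assumption of the sketch.

```python
import heapq
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional


@dataclass
class Node:
    freq: int
    char: Optional[str] = None      # set for leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def huffman(freqs: Dict[str, int]) -> Node:
    """Build a Huffman tree from a non-empty {character: frequency} table."""
    tie = count()  # tie-breaker: entries are (freq, sequence number, node)
    q = [(f, next(tie), Node(f, ch)) for ch, f in freqs.items()]
    heapq.heapify(q)                          # Q := C
    for _ in range(len(q) - 1):               # n - 1 merges
        x = heapq.heappop(q)[2]               # x := Extract-Min(Q)
        y = heapq.heappop(q)[2]               # y := Extract-Min(Q)
        z = Node(x.freq + y.freq, left=x, right=y)
        heapq.heappush(q, (z.freq, next(tie), z))   # Insert(Q, z)
    return heapq.heappop(q)[2]                # root of the tree


def codewords(root: Node) -> Dict[str, str]:
    """Read codewords off the tree: left edge = '0', right edge = '1'."""
    codes: Dict[str, str] = {}

    def walk(node: Node, prefix: str) -> None:
        if node.char is not None:             # leaf
            codes[node.char] = prefix or "0"  # single-symbol alphabet edge case
            return
        walk(node.left, prefix + "0")
        walk(node.right, prefix + "1")

    walk(root, "")
    return codes


if __name__ == "__main__":
    table = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
    codes = codewords(huffman(table))
    for ch in sorted(codes):
        print(ch, codes[ch])
```

With the frequencies from Table 1 and this particular left/right convention, the sketch yields the same variable-length codewords as the table. A different tie-breaking or child ordering would give a different code of the same cost.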
                            a     b     c     d     e     f
Frequency (in thousands)    45    13    12    16    9     5
Fixed-length codeword       000   001   010   011   100   101
Variable-length codeword    0     101   100   111   1101  1100
Table 1: A character-coding problem. A data file of 100,000 characters contains only the characters a–f, with the frequencies indicated. If we assign each character a 3-bit codeword, we can encode the file in 300,000 bits. Using the variable-length code shown, we can encode the file in only 224,000 bits.
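The two totals in the caption can be checked with a short calculation: the encoded size is the sum, over all characters, of frequency times codeword length. The sketch below hard-codes the values from Table 1. The function names encoded_bits and is_prefix_free are hypothetical helpers introduced only for this example.

```python
# Values copied from Table 1 (frequencies are in thousands of characters).
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
fixed = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "100", "f": "101"}
variable = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def encoded_bits(freq_thousands, code):
    """Total file size in bits: sum of frequency * codeword length."""
    return 1000 * sum(freq_thousands[ch] * len(code[ch]) for ch in freq_thousands)


def is_prefix_free(code):
    """True if no codeword is a prefix of another.

    After sorting, any codeword that is a prefix of another is also a
    prefix of its immediate successor, so adjacent pairs suffice.
    """
    words = sorted(code.values())
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))


if __name__ == "__main__":
    print(encoded_bits(freq, fixed))      # 300000
    print(encoded_bits(freq, variable))   # 224000
    print(is_prefix_free(variable))       # True
```

For the variable-length code the sum is (45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4) · 1000 = 224,000 bits, a saving of roughly 25% over the fixed-length code.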
Figure 1: The steps of Huffman's algorithm for the frequencies given in Table 1.
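The sequence of merges for these frequencies can also be traced in a few lines of code. The sketch below tracks only the frequencies, not the tree shape, and again assumes Python's heapq as the min-priority queue. The name merge_trace is hypothetical.

```python
import heapq


def merge_trace(freqs):
    """Return one (x, y, x + y) tuple per merge, in the order performed."""
    heap = list(freqs.values())
    heapq.heapify(heap)
    steps = []
    while len(heap) > 1:
        x = heapq.heappop(heap)        # least frequent
        y = heapq.heappop(heap)        # second least frequent
        heapq.heappush(heap, x + y)    # merged node goes back in the queue
        steps.append((x, y, x + y))
    return steps


if __name__ == "__main__":
    table = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
    for x, y, z in merge_trace(table):
        print(f"merge {x} + {y} -> {z}")
```

For the six frequencies of Table 1 this performs five merges: 5 + 9 = 14, 12 + 13 = 25, 14 + 16 = 30, 25 + 30 = 55, and finally 45 + 55 = 100, the total frequency at the root.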
For our example, Huffman's algorithm proceeds as shown in Figure 1. Since the alphabet contains 6 letters, the initial queue size is n = 6, and 5 merge steps build the tree. The final tree represents the optimal prefix code. The codeword for a letter is the sequence of edge labels on the simple path from the root to the letter.
Line 2 initializes the min-priority queue Q with the characters in C. The for loop in lines 3–9 repeatedly extracts the two nodes x and y of lowest frequency from the queue, replacing them in the queue with a new node z representing their merger. The frequency of z is computed as the sum of the frequencies of x and y in line 7. The node z has x as its left child and y as its right child. (This order is arbitrary; switching the left and right child of any node yields a different code of the same cost.) After n − 1 mergers, line 10 returns the one node left in the queue, which is the root of the code tree.
To analyze the running time of Huffman's algorithm, we assume that Q is implemented as a binary min-heap (see [1, Chapter 6]). For a set C of n characters, we can initialize Q in line 2 in O(n) time using the Build-Min-