Huffman Encoding, Correctness - Design and Analysis - Study Notes, Study notes of Digital Systems Design

Huffman Encoding Correctness, Optimal prefix code tree T, Swap x and b in tree prefix tree T, Activity Selection, Maximum depth in the tree, Claim and Proof are the key points in this study notes file.

Typology: Study notes

2011/2012

Uploaded on 11/03/2012

ankitay
Lecture No. 25

7.2.2 Huffman Encoding: Correctness

Huffman algorithm uses a greedy approach to generate a prefix code T that minimizes the expected length B(T) of the encoded string. In other words, Huffman algorithm generates an optimum prefix code. The question that remains is why the algorithm is correct. Recall that the cost of any encoding tree T is

B(T) = Σ p(x) · d(x), summed over all characters x,

where p(x) is the probability of character x and d(x) is the depth of its leaf in T.

Our approach to proving the correctness of Huffman encoding is to show that any tree that differs from the one constructed by Huffman algorithm can be converted into one that is equal to Huffman’s tree without increasing its cost. Note that the binary tree constructed by Huffman algorithm is a full binary tree, that is, every internal node has exactly two children.
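As a concrete reading of the cost B(T), the sketch below computes the expected code length from a table of code words, using the fact that the depth of a leaf equals the length of its code word. The probabilities, the two codes, and the helper names `is_prefix_code` and `expected_length` are illustrative assumptions, not taken from the notes.

```python
def is_prefix_code(code):
    """True if no code word is a prefix of a different character's code word."""
    words = list(code.values())
    return not any(
        i != j and w.startswith(v)
        for i, v in enumerate(words)
        for j, w in enumerate(words)
    )


def expected_length(code, probs):
    """B(T) = sum of p(x) * d(x); d(x) equals the length of x's code word."""
    return sum(probs[ch] * len(code[ch]) for ch in probs)


# Illustrative data (assumed for this sketch).
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = {"a": "0", "b": "10", "c": "110", "d": "111"}   # variable-length code
fixed = {"a": "00", "b": "01", "c": "10", "d": "11"}   # fixed-length code

print(is_prefix_code(code), expected_length(code, probs))    # True 1.75
print(is_prefix_code(fixed), expected_length(fixed, probs))  # True 2.0
```

Both codes are prefix codes, but the variable-length one has the smaller cost for these probabilities, which is the quantity Huffman algorithm minimizes.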

Claim: Consider the two characters x and y with the smallest probabilities. Then there is an optimal code tree in which these two characters are siblings at the maximum depth in the tree.

Proof: Let T be any optimal prefix code tree with two siblings b and c at the maximum depth of the tree. Such a tree is shown in Figure 7.2. Assume without loss of generality that

p(b) ≤ p(c) and p(x) ≤ p(y)

Figure 7.2: Optimal prefix code tree T

Since x and y have the two smallest probabilities (by assumption), it follows that

p(x) ≤ p(b) and p(y) ≤ p(c)

Since b and c are at the deepest level of the tree, we know that

d(b) ≥ d(x) and d(c) ≥ d(y) (d is the depth)

Thus we have

p(b) - p(x) ≥ 0 and d(b) - d(x) ≥ 0

Hence their product is non-negative. That is,

(p(b) - p(x)) · (d(b) - d(x)) ≥ 0

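Now swap the positions of x and b in the tree, resulting in a new tree T′.

Figure 7.3: Swapping x and b in prefix code tree T

After the swap, x sits at depth d(b) and b sits at depth d(x); no other leaf moves. The cost of T′ is therefore

B(T′) = B(T) - p(x)d(x) - p(b)d(b) + p(x)d(b) + p(b)d(x)
      = B(T) - (p(b) - p(x)) · (d(b) - d(x))
      ≤ B(T)

Since T is optimal, T′ is optimal as well. Swapping y and c in T′ in the same way gives a tree T′′ with B(T′′) ≤ B(T′). The final tree T′′ is therefore optimal and satisfies the claim we made earlier: x and y, the two characters with the smallest probabilities, are siblings at the maximum depth in the tree.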
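The exchange step can be checked numerically. The sketch below represents a tree only by its leaf depths and verifies that swapping x with b changes the cost by exactly (p(b) - p(x)) · (d(b) - d(x)), which follows directly from the cost definition, and likewise for y and c. The weights, the depths, and the helper names `cost` and `swap` are hypothetical choices for illustration; integer weights (counts out of 100) stand in for probabilities so the arithmetic is exact.

```python
def cost(weight, depth):
    """B(T) = sum over characters of weight * leaf depth."""
    return sum(weight[ch] * depth[ch] for ch in weight)


def swap(depth, u, v):
    """Return the leaf depths after exchanging the positions of u and v."""
    new_depth = dict(depth)
    new_depth[u], new_depth[v] = depth[v], depth[u]
    return new_depth


# Hypothetical example: x and y have the two smallest weights.
weight = {"x": 5, "y": 10, "b": 20, "c": 25, "e": 40}
# Leaf depths of the tree (e, (x, (y, (b, c)))); b and c are the deepest siblings.
depth = {"e": 1, "x": 2, "y": 3, "b": 4, "c": 4}

t1 = swap(depth, "x", "b")   # plays the role of T'
t2 = swap(t1, "y", "c")      # plays the role of T''

print(cost(weight, depth), cost(weight, t1), cost(weight, t2))  # 260 230 215
```

The starting tree here is not optimal, so each swap strictly lowers the cost. In the proof T is optimal, so the same inequality can only hold with equality, and the swapped tree is again optimal.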

The claim we just proved asserts that the first step of Huffman algorithm is the proper one to perform (the greedy step). The complete proof of correctness for Huffman algorithm follows by induction on n.

Claim: Huffman algorithm produces the optimal prefix code tree.

Proof: The proof is by induction on n, the number of characters. For the basis case, n = 1, the tree consists of a single leaf node, which is obviously optimal. Assume the algorithm produces an optimal tree for any set of n - 1 characters. We want to show it is true with exactly n characters.

Suppose we have exactly n characters. By the previous claim, there is an optimal tree in which the two characters x and y with the lowest probabilities are siblings at the lowest level of the tree. Remove x and y and replace them with a new character z whose probability is p(z) = p(x) + p(y). Thus n - 1 characters remain.

Consider any prefix code tree T made with this new set of n - 1 characters. We can convert T into a prefix code tree T′ for the original set of n characters by replacing the leaf z with an internal node whose children are x and y. This essentially undoes the operation in which x and y were removed and replaced by z. The cost of the new tree T′ is

B(T′) = B(T) - p(z)d(z) + p(x)[d(z) + 1] + p(y)[d(z) + 1]
      = B(T) - (p(x) + p(y))d(z) + (p(x) + p(y))[d(z) + 1]
      = B(T) + (p(x) + p(y))[d(z) + 1 - d(z)]
      = B(T) + p(x) + p(y)
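The identity B(T′) = B(T) + p(x) + p(y) can be checked on small trees. In the sketch below a tree is a nested tuple whose leaves are character names; this representation, the integer weights, and the helpers `cost` and `expand` are assumptions made for illustration only.

```python
def cost(tree, weight, depth=0):
    """B(T): sum of weight * depth over all leaves of a nested-tuple tree."""
    if isinstance(tree, tuple):
        return sum(cost(child, weight, depth + 1) for child in tree)
    return weight[tree] * depth


def expand(tree, z, x, y):
    """Replace the leaf z by an internal node with children x and y."""
    if isinstance(tree, tuple):
        return tuple(expand(child, z, x, y) for child in tree)
    return (x, y) if tree == z else tree


# Hypothetical weights with weight[z] = weight[x] + weight[y].
weight = {"x": 5, "y": 10, "z": 15, "b": 20, "c": 25, "e": 40}

# Three differently shaped trees on the reduced character set {z, b, c, e}.
trees = [
    ("e", ("z", ("b", "c"))),
    (("z", "b"), ("c", "e")),
    ((("z", "e"), "b"), "c"),
]

for T in trees:
    T_prime = expand(T, "z", "x", "y")
    # The increase is always weight[x] + weight[y] = 15, whatever the shape of T.
    print(cost(T, weight), cost(T_prime, weight))
```

The first tree, for instance, goes from cost 205 to 220. The increase is the same for every shape of T.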

The cost changes, but the change depends in no way on the structure of the tree T (T is for n - 1 characters). Therefore, to minimize the cost of the final tree T′, we need to build the tree T on n - 1 characters optimally. By induction, this is exactly what Huffman algorithm does. Thus the final tree is optimal.
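The induction can be mirrored in code. The sketch below computes the cost of a Huffman tree by repeatedly merging the two smallest weights and adding p(x) + p(y) at each merge, which is exactly the cost increase derived above. It then compares the result with an exhaustive search over all full binary trees. The function names `huffman_cost` and `best_cost` and the sample weights are assumptions for illustration; the exhaustive search is only practical for a handful of characters.

```python
import heapq
from itertools import combinations


def huffman_cost(weights):
    """Cost of a Huffman tree, accumulated one greedy merge at a time."""
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        x = heapq.heappop(heap)       # smallest weight
        y = heapq.heappop(heap)       # second smallest weight
        total += x + y                # B(T') = B(T) + p(x) + p(y)
        heapq.heappush(heap, x + y)   # the merged character z
    return total


def best_cost(weights):
    """Minimum cost over all full binary trees, found by exhaustive search."""
    items = tuple(weights)
    if len(items) == 1:
        return 0
    first, rest = items[0], items[1:]
    best = None
    # Split the leaves into two non-empty groups; `first` always goes left
    # so that mirror-image splits are not counted twice.
    for k in range(len(rest)):
        for picked in combinations(range(len(rest)), k):
            left = (first,) + tuple(rest[i] for i in picked)
            right = tuple(rest[i] for i in range(len(rest)) if i not in picked)
            # Hanging both subtrees under a root pushes every leaf one level down.
            c = sum(items) + best_cost(left) + best_cost(right)
            if best is None or c < best:
                best = c
    return best


weights = [5, 10, 20, 25, 40]   # hypothetical counts out of 100
print(huffman_cost(weights), best_cost(weights))  # 210 210
```

A heap is used here so that each greedy step finds the two smallest weights in logarithmic time; any other way of extracting the two minima would give the same cost.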

7.3 Activity Selection

Activity scheduling is a simple scheduling problem for which the greedy algorithm approach provides an optimal solution. We are given a set S = {a_1, a_2, ..., a_n} of n activities that are to be scheduled to use some resource. Each activity a_i must start at a given start time s_i and end at a given finish time f_i.

An example is a number of lectures that are to be given in a single lecture hall. The start and end times have been set up in advance, and the lectures are to be scheduled. There is only one resource (e.g., the lecture hall). Some start and finish times may overlap, so not all requests can be honored. We say that two activities a_i and a_j are non-interfering if their start-finish intervals do not overlap, i.e., (s_i, f_i) ∩ (s_j, f_j) = ∅.

The activity selection problem is to select a maximum-size set of mutually non-interfering activities for use of the resource. So how do we schedule the largest number of activities on the resource? Intuitively, we do not like long activities because they occupy the resource and keep us from honoring other requests. This suggests the greedy strategy: repeatedly select the activity with the smallest duration (f_i - s_i) and schedule it, provided that it does not interfere with any previously scheduled activities. Unfortunately, this turns out to be non-optimal.
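A small counterexample shows why the shortest-duration strategy fails. The sketch below runs that greedy rule on three hypothetical activities and compares the result with the true optimum found by trying all subsets. The activity times and the function names are assumptions for illustration; intervals are treated as open, so an activity may start exactly when another finishes.

```python
from itertools import combinations


def interfere(a, b):
    """Open intervals (s, f) overlap when each starts before the other finishes."""
    return a[0] < b[1] and b[0] < a[1]


def shortest_first(activities):
    """Greedy rule from the text: take the shortest activity that still fits."""
    chosen = []
    for act in sorted(activities, key=lambda a: a[1] - a[0]):
        if all(not interfere(act, c) for c in chosen):
            chosen.append(act)
    return chosen


def optimum_size(activities):
    """Largest number of mutually non-interfering activities (brute force)."""
    for k in range(len(activities), 0, -1):
        for subset in combinations(activities, k):
            if all(not interfere(a, b) for a, b in combinations(subset, 2)):
                return k
    return 0


# Hypothetical (start, finish) pairs: the short middle activity blocks both long ones.
activities = [(0, 5), (4, 7), (6, 11)]

print(shortest_first(activities))  # [(4, 7)]
print(optimum_size(activities))    # 2
```

The greedy rule schedules only the activity (4, 7), while (0, 5) and (6, 11) can be scheduled together.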