































Coding Preliminaries

Code: a mapping f from source messages to code words.

Source message (alphabet A) ---- f ----> code words (alphabet B)
alphanumeric symbols, |A| = N            binary symbols

A code is distinct if the mapping f is one-to-one. Codes are classified as:
- Block-to-Block (ASCII, EBCDIC)
- Block-to-Variable or VLC (variable length code) (Huffman)
- Variable-to-Block (Arithmetic)
- Variable-to-Variable (LZ family)

Average code length: Let $l_j$ denote the length of the binary code assigned to some symbol $a_j$ with probability $p_j$; then the average code length is given by

$$\bar{l} = \sum_{j=1}^{n} p_j l_j$$
Prefix Code: A code is said to have the prefix property if no code word or bit pattern is a prefix of another code word.

UD (Uniquely decodable): Let $S_1 = (a_1, a_2, \dots, a_n)$ and $S_2 = (b_1, b_2, \dots, b_m)$ be two sequences of letters from the source alphabet A. The code is uniquely decodable if

$$f(a_1) \cdot f(a_2) \cdots f(a_n) = f(b_1) \cdot f(b_2) \cdots f(b_m)$$

implies that $a_1, a_2, \dots, a_n$ is identically equal to $b_1, b_2, \dots, b_m$. That is, $a_1 = b_1$, $a_2 = b_2$, etc., and $n = m$.
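The definition of unique decodability can be tested mechanically. The sketch below uses the Sardinas-Patterson test, a standard procedure that is not described in these notes; the function names are illustrative. It repeatedly collects "dangling suffixes" and reports a failure as soon as one of them is itself a code word.

```python
def dangling_suffixes(prefixes, words):
    """Non-empty remainders w[len(p):] for every p that is a proper prefix of w."""
    return {w[len(p):] for p in prefixes for w in words
            if w != p and w.startswith(p)}


def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test on a list of code words given as bit strings."""
    code = set(codewords)
    if len(code) != len(codewords):
        return False  # two symbols share a code word, so f is not one-to-one
    current = dangling_suffixes(code, code)
    seen = set()
    while current:
        if current & code:
            return False  # a dangling suffix is a code word: two parsings exist
        key = frozenset(current)
        if key in seen:
            return True   # the suffix sets repeat without ever hitting a code word
        seen.add(key)
        current = (dangling_suffixes(code, current)
                   | dangling_suffixes(current, code))
    return True


# Code B and Code D from the example codes tabulated in this section.
code_b = ["0", "1", "00", "01", "10", "11", "000", "001"]
code_d = ["0", "011", "1010", "1011", "10000", "10001", "10010", "10011"]
print(is_uniquely_decodable(code_b), is_uniquely_decodable(code_d))
```

The loop always terminates because every dangling suffix is a suffix of some code word, so only finitely many suffix sets can occur.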
Example Codes: 8 symbols $a_1, a_2, \dots, a_8$, with their probabilities and six candidate codes.

a_i   p(a_i)   Code A   Code B   Code C   Code D   Code E     Code F
a1    0.40     000      0        010      0        0          1
a2    0.15     001      1        011      011      01         001
a3    0.15     010      00       00       1010     011        011
a4    0.10     011      01       100      1011     0111       010
a5    0.10     100      10       101      10000    01111      0001
a6    0.05     101      11       110      10001    011111     00001
a7    0.04     110      000      1110     10010    0111111    000001
a8    0.01     111      001      1111     10011    01111111   000000
Avg. length    3        1.5      2.9      2.85     2.71       2.
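The average lengths in the table, and which of the codes have the prefix property, can be recomputed directly from the definitions. The following is a small sketch; the dictionary layout and function names are illustrative choices, and the code words are copied from the table above.

```python
probs = [0.40, 0.15, 0.15, 0.10, 0.10, 0.05, 0.04, 0.01]

codes = {
    "A": ["000", "001", "010", "011", "100", "101", "110", "111"],
    "B": ["0", "1", "00", "01", "10", "11", "000", "001"],
    "C": ["010", "011", "00", "100", "101", "110", "1110", "1111"],
    "D": ["0", "011", "1010", "1011", "10000", "10001", "10010", "10011"],
    "E": ["0", "01", "011", "0111", "01111", "011111", "0111111", "01111111"],
    "F": ["1", "001", "011", "010", "0001", "00001", "000001", "000000"],
}


def average_length(probs, codewords):
    """Average code length: sum over symbols of p_j * l_j."""
    return sum(p * len(c) for p, c in zip(probs, codewords))


def is_prefix_code(codewords):
    """True if no code word is a prefix of a different code word."""
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(codewords)
        for j, b in enumerate(codewords)
    )


for name, words in codes.items():
    print(name, round(average_length(probs, words), 2), is_prefix_code(words))
```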
- Code A: violates Morse's principle (frequent symbols should get short code words); not efficient, but instantaneously decodable.
- Code B: not uniquely decodable.
- Code C: prefix code that violates Morse's principle.
- Code D: UD but not prefix.
- Code E: not instantaneously decodable (needs look-ahead to decode).
- Code F: UD, ID, prefix, and obeys Morse's principle.

Note: Write $K(C) = \sum_{i=1}^{N} 2^{-l_i}$. In the expansion of $[K(C)]^n$, the exponent $l_{i_1} + l_{i_2} + \dots + l_{i_n}$ is simply the sum of the lengths of $n$ code words, whose minimum value is $n$ (if all the code words were of length 1). The maximum value of the exponent is $nl$, where $l = \max(l_i)$. Therefore, we can write the summation as

$$[K(C)]^n = \sum_{k=n}^{nl} A_k 2^{-k}$$

where $A_k$ is the number of combinations of $n$ codewords that have a combined length of $k$.
Example to illustrate the proof

$l_1 = 1,\ l_2 = 2,\ l_3 = 2,\ n = 3$ (Note N is 5, not 3)

$l = \max(1, 2, 2) = 2$, $nl = 3 \times 2 = 6$

$$\left[\sum_{i=1}^{3} 2^{-l_i}\right]^3 = (2^{-1} + 2^{-2} + 2^{-2})(2^{-1} + 2^{-2} + 2^{-2})(2^{-1} + 2^{-2} + 2^{-2})$$

$$= \sum_{k=n}^{nl} A_k 2^{-k} = 2^{-3} + 6 \cdot 2^{-4} + 12 \cdot 2^{-5} + 8 \cdot 2^{-6}$$

$A_3 = 1,\ A_4 = 6,\ A_5 = 12,\ A_6 = 8$

The 27 combinations, written as the lengths of the three chosen code words:

111   112*  112*
121*  122   122
121*  122   122
211*  212   212
221   222   222
221   222   222
211*  212   212
221   222   222
221   222   222
The example illustrates how the sizes of $A_k$ are determined. The combinations marked with *, for example, contribute to the coefficient of $2^{-4}$, and there are 6 of them, so $A_4 = 6$, and so on. The number of possible binary sequences of length $k$ is $2^k$. If the code is uniquely decodable, then each sequence can represent one and only one sequence of code words. Therefore, the number of possible combinations of code words whose combined length is $k$ cannot be greater than $2^k$. Thus

$$A_k \le 2^k$$
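The counts $A_k$ in the example can be reproduced by brute-force enumeration of all ordered selections of $n$ code word lengths. This is only a sketch for small inputs; `combination_counts` is an illustrative name.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def combination_counts(lengths, n):
    """A_k: number of ordered n-tuples of code words whose lengths add up to k."""
    return Counter(sum(combo) for combo in product(lengths, repeat=n))


counts = combination_counts([1, 2, 2], 3)

# Sum of A_k * 2^(-k), which equals [K(C)]^n for these lengths.
kraft_power = sum(Fraction(a, 2 ** k) for k, a in counts.items())

for k in sorted(counts):
    print(k, counts[k], counts[k] <= 2 ** k)
print(kraft_power)
```

Exact fractions are used so that the comparison with $[K(C)]^n$ is not affected by floating-point rounding.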
$$[K(C)]^n = \sum_{k=n}^{nl} A_k 2^{-k} \le \sum_{k=n}^{nl} 2^{k} \cdot 2^{-k} = \sum_{k=n}^{nl} 1 = nl - n + 1 = n(l-1) + 1$$

If $K(C)$ is greater than 1, $[K(C)]^n$ grows exponentially with $n$, but $n(l-1) + 1$ grows only linearly with $n$, so the inequality would fail for large $n$. Hence $K(C) \le 1$, or

$$\sum_{i=1}^{N} 2^{-l_i} \le 1$$
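As a quick numerical check of the inequality, the Kraft sum can be evaluated for a list of code word lengths. The sketch below assumes only the formula above; `kraft_sum` and its optional alphabet size `d` are illustrative names. A sum greater than 1 rules out unique decodability.

```python
from fractions import Fraction


def kraft_sum(lengths, d=2):
    """Return the sum of d^(-l) over all code word lengths l, as an exact fraction."""
    return sum(Fraction(1, d ** l) for l in lengths)


# Lengths of Code A, Code B and Code D from the example codes in these notes.
lengths_a = [3] * 8
lengths_b = [1, 1, 2, 2, 2, 2, 3, 3]
lengths_d = [1, 3, 4, 4, 5, 5, 5, 5]

for name, lengths in [("A", lengths_a), ("B", lengths_b), ("D", lengths_d)]:
    k = kraft_sum(lengths)
    print(name, k, k <= 1)
```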
The converse of Theorem 1 is also true, as given in Theorem 2.

Theorem 2: Given a set of integers $l_1, l_2, \dots, l_N$ such that $\sum_{i=1}^{N} 2^{-l_i} \le 1$, we can find a prefix code with codeword lengths $l_1, l_2, \dots, l_N$.
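See the proof in Khalid Sayood, p. 33. An example illustrating the proof is given below.

Proof: Given the lengths satisfying the stated property, we will construct a prefix code. Assume $l_1 \le l_2 \le \dots \le l_N$. In the example, $N = 5$ and

$$l_1 = 1,\ l_2 = 2,\ l_3 = 3,\ l_4 = 4,\ l_5 = 4, \qquad \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \tfrac{1}{16} + \tfrac{1}{16} = 1 \le 1$$

Define a sequence of numbers $w_1, w_2, \dots, w_N$ as follows:

$$w_1 = 0, \qquad w_j = \sum_{i=1}^{j-1} 2^{\,l_j - l_i} \quad \text{for } j > 1.$$

The binary representation of $w_j$ for $j > 1$ would take $\lceil \log_2 (w_j + 1) \rceil$ bits. We will use these binary representations to construct a prefix code. Note that the number of bits in the binary representation of $w_j$ is less than or equal to $l_j$. This is obviously true for $w_1$. For $j > 1$,

$$\log_2 (w_j + 1) = \log_2\!\left[\sum_{i=1}^{j-1} 2^{\,l_j - l_i} + 1\right] = \log_2\!\left[2^{\,l_j} \sum_{i=1}^{j-1} 2^{-l_i} + 2^{\,l_j} 2^{-l_j}\right] = l_j + \log_2\!\left[\sum_{i=1}^{j} 2^{-l_i}\right] \le l_j$$

where the last step uses $\sum_{i=1}^{j} 2^{-l_i} \le 1$.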
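The numbers $w_j$ can be turned into code words in a few lines. The sketch below makes one assumption that is not shown in this section: code word $j$ is taken to be $w_j$ written with exactly $l_j$ bits, with leading zeros added where the binary representation is shorter. The function name is illustrative.

```python
from fractions import Fraction


def prefix_code_from_lengths(lengths):
    """Build code words from w_1 = 0 and w_j = sum over i < j of 2^(l_j - l_i)."""
    lengths = sorted(lengths)
    if sum(Fraction(1, 2 ** l) for l in lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    codewords = []
    for j, lj in enumerate(lengths):
        # lengths[:j] holds the shorter-or-equal lengths l_i, so lj - li >= 0.
        w = sum(2 ** (lj - li) for li in lengths[:j])
        # Assumption: write w with exactly lj bits, padding with leading zeros.
        codewords.append(format(w, f"0{lj}b"))
    return codewords


print(prefix_code_from_lengths([1, 2, 3, 4, 4]))
```

With this padding, $w_j / 2^{l_j}$ is the running Kraft sum of the earlier code words, which is why no earlier code word can be a prefix of a later one.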
Note the proof assumes only UD property but not the prefix property. But the resulting code has the prefix property. If we know that the code is prefix then a much simpler proof exists.
We prove the theorem by using a binary tree embedding technique. Every prefix code can be represented in the paths of a binary tree. For a prefix code with codeword lengths $l_1, l_2, \dots, l_N$ we show that

$$\sum_{i=1}^{N} 2^{-l_i} \le 1$$

Embed each codeword as a path in a binary tree of depth $L$, where $L$ is at least the largest codeword length. This embedding assigns to each codeword of length $l_j$ a node on level $l_j$ to serve as the terminal node. Then prune the entire sub-tree below that node, wiping out $2^{L - l_j}$ nodes on level $L$. Since we cannot prune from a level-$L$ tree more than the $2^L$ nodes that were there to start with, we must have

$$\sum_{i=1}^{N} 2^{L - l_i} \le 2^L$$

Dividing by $2^L$, we get

$$\sum_{i=1}^{N} 2^{-l_i} \le 1$$

which is the Kraft Inequality.
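The proof of the converse is more interesting: if the lengths $l_1, l_2, \dots, l_N$ satisfy the Kraft inequality, then there is a prefix code with these lengths.

Proof: For each level $l$ we must show that after we have successfully embedded all words with lengths $l_j < l$, enough nodes at level $l$ remain un-pruned so that we can embed a codeword there for each $j$ such that $l_j = l$. That is,

$$2^l - \sum_{j:\, l_j < l} 2^{\,l - l_j} \ge \left|\{ j : l_j = l \}\right| \qquad \dots (1)$$

The right hand side is simply the number of nodes with $l = l_j$. But

$$\left|\{ j : l_j = l \}\right| = \sum_{j:\, l_j = l} 1 = \sum_{j:\, l_j = l} 2^{0} = \sum_{j:\, l_j = l} 2^{\,l - l_j}$$

so (1) becomes

$$2^l - \sum_{j:\, l_j < l} 2^{\,l - l_j} \ge \sum_{j:\, l_j = l} 2^{\,l - l_j}$$

Or

$$\sum_{j:\, l_j < l} 2^{\,l - l_j} + \sum_{j:\, l_j = l} 2^{\,l - l_j} = \sum_{j:\, l_j \le l} 2^{\,l - l_j} \le 2^l$$

Dividing both sides by $2^l$, we have

$$\sum_{j:\, l_j \le l} 2^{-l_j} \le 1$$

Since we have

$$\sum_{j:\, l_j \le l} 2^{-l_j} \le \sum_{\text{all } j} 2^{-l_j}$$

and the lengths satisfy the Kraft inequality, we must have

$$\sum_{j:\, l_j \le l} 2^{-l_j} \le \sum_{\text{all } j} 2^{-l_j} \le 1$$

so condition (1) holds at every level.

(All derivations above are valid for a d-ary tree. Put $d^{-l_j}$ to replace $2^{-l_j}$.)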
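Condition (1) can be checked level by level for a concrete set of lengths. The sketch below simply evaluates both sides of (1) for a binary tree; the function name and the returned dictionary layout are illustrative.

```python
def free_nodes_per_level(lengths):
    """For each level l, return (un-pruned nodes at level l, words of length l).

    The first entry is 2^l minus the nodes wiped out at level l by all words
    with l_j < l; the second is the number of words that need a node there.
    """
    result = {}
    for level in range(1, max(lengths) + 1):
        pruned = sum(2 ** (level - lj) for lj in lengths if lj < level)
        needed = sum(1 for lj in lengths if lj == level)
        result[level] = (2 ** level - pruned, needed)
    return result


def embedding_possible(lengths):
    """True if condition (1) holds at every level."""
    return all(free >= needed
               for free, needed in free_nodes_per_level(lengths).values())


print(free_nodes_per_level([1, 2, 3, 4, 4]))
print(embedding_possible([1, 2, 3, 4, 4]), embedding_possible([1, 1, 2]))
```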
Examples of Prefix Code:

Fibonacci Code

Express the rank x in a weighted number system where the Fibonacci numbers are the weights. Then x is encoded as the reversed Fibonacci representation followed by a binary '1'.

N    Representation (weights 21 13 8 5 3 2 1)    Code
1                1                               11
2              1 0                               011
3            1 0 0                               0011
4            1 0 1                               1011
5          1 0 0 0                               00011
6          1 0 0 1                               10011
7          1 0 1 0                               01011
8        1 0 0 0 0                               000011
16     1 0 0 1 0 0                               0010011
32   1 0 1 0 1 0 0                               00101011
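The encoding rule above can be sketched as follows, assuming the weights 1, 2, 3, 5, 8, ... and a greedy choice of the largest weight that still fits (which never selects two adjacent weights). The function names are illustrative.

```python
def fibonacci_encode(n):
    """Fibonacci code of a positive integer n, returned as a bit string."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    weights = [1, 2]
    while weights[-1] <= n:
        weights.append(weights[-1] + weights[-2])
    bits = []          # representation, most significant weight first
    remaining = n
    for w in reversed(weights):
        if w <= remaining:
            bits.append("1")
            remaining -= w
        elif bits:     # skip leading zeros before the first used weight
            bits.append("0")
    # Reverse the representation and append the terminating '1'.
    return "".join(reversed(bits)) + "1"


def fibonacci_decode(codeword):
    """Inverse of fibonacci_encode for a single code word."""
    a, b = 1, 2
    total = 0
    for bit in codeword[:-1]:   # the final '1' is only a terminator
        if bit == "1":
            total += a
        a, b = b, a + b
    return total


for n in (1, 2, 3, 4, 5, 6, 7, 8, 16, 32):
    print(n, fibonacci_encode(n))
```

Because the representation never uses two adjacent weights, the pattern 11 appears only at the end of a code word, which is what makes the code a prefix code.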
Shannon-Fano Code:

i    p_i     Code
1    0.25    10
2    0.2     11
3    0.15    000
4    0.15    001
5    0.1     010
6    0.1     0110
7    0.05    0111

Average length = 2.7 bit/symbol, Entropy = 2.67 bit: very good.

i    p_i     Code
1    0.25    11
2    0.25    10
3    0.125   011
4    0.125   010
5    0.125   001
6    0.125   000

Average length = 2.5 bit/symbol, Entropy = 2.5 bit: perfect code!

The method produces the best result if the splits are perfect, which happens when the probabilities are of the form $2^{-k}$ and $\sum 2^{-k} = 1$. This property is also true for Huffman code.
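A minimal sketch of the top-down splitting is given below. It assumes the probabilities are already sorted in decreasing order and that each group is cut where the two halves are as close to equal in probability as possible; which half receives the bit 0 is an arbitrary convention here, so the code words may differ from the tables above while the lengths agree.

```python
def shannon_fano(probs):
    """Return Shannon-Fano code words for probabilities sorted in decreasing order."""
    codes = [""] * len(probs)

    def split(lo, hi):
        # Work on the index range [lo, hi); a single symbol needs no more bits.
        if hi - lo < 2:
            return
        total = sum(probs[lo:hi])
        running = 0.0
        best_cut, best_diff = lo + 1, float("inf")
        for cut in range(lo + 1, hi):
            running += probs[cut - 1]
            diff = abs(total - 2 * running)   # |upper half - lower half|
            if diff < best_diff:
                best_cut, best_diff = cut, diff
        for i in range(lo, best_cut):
            codes[i] += "0"
        for i in range(best_cut, hi):
            codes[i] += "1"
        split(lo, best_cut)
        split(best_cut, hi)

    split(0, len(probs))
    return codes


probs_1 = [0.25, 0.2, 0.15, 0.15, 0.1, 0.1, 0.05]
probs_2 = [0.25, 0.25, 0.125, 0.125, 0.125, 0.125]
for probs in (probs_1, probs_2):
    words = shannon_fano(probs)
    print(words, sum(p * len(w) for p, w in zip(probs, words)))
```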
- Shannon-Fano is top-down. If you draw a binary tree, the symbols nearest to the root get codes assigned to them first.
- Huffman is bottom-up. It starts assigning codes from the leaf nodes.
Huffman invented this code as a graduate student at MIT and managed to skip the final exam as a reward!
Same offer: If you come up with an original idea in this course worth publishing in a reputable journal, you may skip the final exam.
Variance of the code length:

$$v = \sum_{i} p_i \,(l_i - \bar{l})^2$$

$\bar{l} = 2.2$, $v = 1.36$
Optimality of Huffman Code

Theorem 5: Huffman code is a minimum average length ($\bar{l}$) binary prefix code.

Lemma 1: If $p(a_1) \ge p(a_2)$, then it must be that $l_1 \le l_2$ for the code to have minimum average code length $\bar{l}$.

Proof:

$$\bar{l} = \sum_{i=1}^{n} p_i l_i = p(a_1) l_1 + p(a_2) l_2 + \sum_{i=3}^{n} p_i l_i = p(a_1) l_1 + p(a_2) l_2 + Q$$

For the sake of contradiction, assume $l_1 > l_2$. Then we can exchange the codes for $a_1$ and $a_2$, giving the modified average length

$$\bar{l}^{*} = p(a_1) l_2 + p(a_2) l_1 + Q$$

Therefore,

$$\bar{l} - \bar{l}^{*} = p(a_1)(l_1 - l_2) + p(a_2)(l_2 - l_1) = p(a_1)(l_1 - l_2) - p(a_2)(l_1 - l_2) = C\,[p(a_1) - p(a_2)], \qquad C = l_1 - l_2 > 0$$

Thus $\bar{l} > \bar{l}^{*}$ whenever $p(a_1) > p(a_2)$, so $\bar{l}$ is not a minimum, a contradiction. (If $p(a_1) = p(a_2)$, the exchange leaves $\bar{l}$ unchanged, so $l_1 \le l_2$ can be assumed without loss of optimality.)
Lemma 2: A minimum average length ($\bar{l}$) binary code has at least two codes of maximum length $l_M$.

Proof: Let $C = (C_1, C_2, \dots, C_M)$ be a minimum-$\bar{l}$ binary prefix code, such that $p_1 \ge p_2 \ge \dots \ge p_M$. Let $l_M$ be the length of the code $C_M$ of the least likely source symbol. So, the leaf node $C_M$ sits at the deepest level of the binary tree. It cannot be a lone node at that level because, if it were, we could replace it by its ancestor on the previous level. Since shuffling the code words to nodes on any fixed level does not affect $\bar{l}$, we may assume that $C_{M-1}$ and $C_M$ stem from the same ancestor, with, say, $C_{M-1}$ ending in 0 and $C_M$ ending in 1. That is, we put these two leaf nodes on consecutive positions of the Huffman tree.

Let us redefine $l_{M-1}$ to be the depth of the node that is the common ancestor of $C_{M-1}$ and $C_M$, while letting each $l_j$ for $1 \le j \le M-2$ retain the original meaning. This converts the problem to constructing a binary tree with $M-1$ terminal nodes so as to minimize

$$\bar{l} = \sum_{j=1}^{M-2} p_j l_j + (p_{M-1} + p_M)\,[\,l_{M-1} + 1\,]$$

Now, define modified probabilities as

$$p^{*}_{M-1} = p_{M-1} + p_M, \qquad p^{*}_j = p_j,\quad 1 \le j \le M-2$$

Then

$$\bar{l} = \sum_{j=1}^{M-1} p^{*}_j l_j + p^{*}_{M-1}$$

But $p^{*}_{M-1}$ is a constant of the problem and does not affect how we construct the tree. This has converted our original problem to that of finding a tree with $M-1$ terminal nodes that is optimum for the probabilities $\{p^{*}_j,\ 1 \le j \le M-1\}$. This, in turn, can be reduced to an $(M-2)$ node problem by assigning the code words corresponding to the smallest two of the modified probabilities to a pair of terminal nodes that share a common immediate ancestor. But that is precisely what the next merge operation in the Huffman algorithm does! Iterating this argument $M-1$ times establishes that the Huffman algorithm produces minimum average length prefix binary codes.
This argument is also valid for d-ary codes!
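The repeated merge of the two smallest probabilities used in the argument above can be sketched with a priority queue. This is an illustrative implementation for the binary case, not the notes' own code; it returns only the code word lengths, and integer weights are used so that ties are exact.

```python
import heapq
from itertools import count


def huffman_lengths(weights):
    """Return binary Huffman code word lengths for the given symbol weights."""
    if len(weights) == 1:
        return [1]
    tie = count()  # tie-breaker so the heap never compares the symbol lists
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, group1 = heapq.heappop(heap)   # smallest weight
        w2, _, group2 = heapq.heappop(heap)   # second smallest weight
        for i in group1 + group2:
            lengths[i] += 1                   # merged symbols move one level deeper
        heapq.heappush(heap, (w1 + w2, next(tie), group1 + group2))
    return lengths


# Weights in percent for the first Shannon-Fano example in these notes.
weights = [25, 20, 15, 15, 10, 10, 5]
lengths = huffman_lengths(weights)
print(lengths, sum(w * l for w, l in zip(weights, lengths)) / sum(weights))
```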
Theorem 6: The entropy $H$ of $\{p_j,\ 1 \le j \le n\}$ satisfies $0 \le H \le \log n$.

Lower Bound for average length

$$\bar{l} - H = \sum_i p_i l_i + \sum_i p_i \log_2 p_i = \sum_i p_i \left(\log_2 2^{\,l_i} + \log_2 p_i\right) = \sum_i p_i \log_2\!\left(p_i 2^{\,l_i}\right)$$

Let $x = p_i 2^{\,l_i}$. Using the relation $\log_2 x \ge \log_2 e \left(1 - \frac{1}{x}\right)$, we then have

$$\log_2\!\left(p_i 2^{\,l_i}\right) \ge \log_2 e \left(1 - \frac{1}{p_i 2^{\,l_i}}\right)$$

Thus,

$$\bar{l} - H \ge \log_2 e \sum_i p_i \left(1 - \frac{2^{-l_i}}{p_i}\right) = \log_2 e \left[\sum_i p_i - \sum_i 2^{-l_i}\right] = \log_2 e \,[\,1 - C\,]$$

where $C = \sum_i 2^{-l_i} \le 1$ (by the Kraft inequality). Thus $\bar{l} - H \ge 0$. Equality holds when $x = 1$, that is, when $p_i = 2^{-l_i}$.

Thus $\bar{l} \ge H$. The average code length for any binary prefix code is at least as large as the entropy of the source. [The above derivation is also true for a d-ary prefix code. Replace $2^{-l_i}$ by $d^{-l_i}$ and $\log_2 e$ by $\log_d e$.]
Upper Bound

See Sayood, pp. 46-51. (Reading Assignment)

Theorem 7: $H(S) \le \bar{l} < H(S) + 1$
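Both bounds can be checked numerically for the two Shannon-Fano examples given earlier in these notes. The sketch below only evaluates the entropy and the average length; the function names are illustrative.

```python
from math import log2


def entropy(probs):
    """Entropy in bits: minus the sum of p * log2(p) over nonzero probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)


def average_length(probs, lengths):
    """Average code length: the sum of p_i * l_i."""
    return sum(p * l for p, l in zip(probs, lengths))


examples = [
    ([0.25, 0.2, 0.15, 0.15, 0.1, 0.1, 0.05], [2, 2, 3, 3, 3, 4, 4]),
    ([0.25, 0.25, 0.125, 0.125, 0.125, 0.125], [2, 2, 3, 3, 3, 3]),
]
for probs, lengths in examples:
    h = entropy(probs)
    avg = average_length(probs, lengths)
    # A tiny tolerance absorbs floating-point rounding when avg equals h.
    print(round(h, 3), round(avg, 3), h <= avg + 1e-12 and avg < h + 1)
```

In the second example every probability is a power of 1/2 and $p_i = 2^{-l_i}$, which is the equality case of the lower bound.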
Huffman code has some major disadvantages when the alphabet size is large; for example, word-based Huffman needs to code each word of a large English dictionary.
As we know, if there are $n-1$ internal nodes, we can create $2^{n-1}$ new Huffman codes by re-labeling (at each internal node there are two choices of labeling with 0 and 1). So, we should have $2^{n-1}$ Huffman codes. But let us create the codes as 00x, 10x, 01, and 11, where x = 0 or 1. Let A = 00, B = 10, C = 01, D = 11. The codes are Ax, Bx, C, D. Any permutation of A, B, C, D will lead to a valid Huffman code. There are 4! permutations and each x has two possible values, hence a total of 96 Huffman codes! (Actually 94 if we do the enumeration.) This means that there are Huffman codes that cannot be generated by the Huffman tree. Canonical Huffman code is one such Huffman code.

a 000
b 001
c 010
The algorithm to generate the codes seems very straightforward as described above in the code generation steps. If the first number using $l$ bits is somehow figured out for the code group of length $l$, then we know the remaining codes in this group are consecutive numbers. Let first(l) denote the first number in the code group of length $l$. For encoding purposes we only need first(l) for values of $l$ equal to $l_1, l_2, \dots, l_{max}$, which are the lengths of the codes. But we will compute first(l) for all values of $l$ in the range $1 \le l \le l_{max}$ since, as we will see later, we will need this for the purpose of decoding. Let numl(l) denote the number of codes of length $l$, $1 \le l \le l_{max}$. The computation of first(l) is given by the two-line code:

first(l_max) := 0;
for l := l_max - 1 down to 1 do first(l) := ⌈(first(l+1) + numl(l+1)) / 2⌉;
Given the lengths as (6,6,6,6,6,6,6,5,5,5,5,5,3,3,3,3), we have

l          1  2  3  4  5  6
numl(l)    0  0  4  0  5  7

first(6) = 0
first(5) = ⌈(0 + 7)/2⌉ = 4
first(4) = ⌈(4 + 5)/2⌉ = 5
first(3) = ⌈(5 + 0)/2⌉ = 3
first(2) = ⌈(3 + 4)/2⌉ = 4
first(1) = ⌈(4 + 0)/2⌉ = 2

Therefore,

l          1  2  3  4  5  6
numl(l)    0  0  4  0  5  7
first(l)   2  4  3  5  4  0
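The two-line computation of first(l) translates directly into Python. In this sketch the ceiling of a division by 2 is written with integer arithmetic as `(x + 1) // 2`; the function name and the list-based return values are illustrative.

```python
def first_codes(lengths):
    """Return (first, numl) as lists indexed by l - 1 for l = 1 .. l_max."""
    lmax = max(lengths)
    numl = [0] * (lmax + 2)           # numl[l] = number of codes of length l
    for l in lengths:
        numl[l] += 1
    first = [0] * (lmax + 1)          # first[lmax] stays 0; index 0 is unused
    for l in range(lmax - 1, 0, -1):
        # Ceiling of (first(l+1) + numl(l+1)) / 2 using integer arithmetic.
        first[l] = (first[l + 1] + numl[l + 1] + 1) // 2
    return first[1:], numl[1:lmax + 1]


lengths = [6] * 7 + [5] * 5 + [3] * 4
first, numl = first_codes(lengths)
print(numl)
print(first)
```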
Only the values of first(l) for the lengths that actually occur (here l = 3, 5 and 6) are needed for encoding. Given the array first(l), the algorithm's steps can now be followed to obtain the canonical codes. The expression ⌈(first(l+1) + numl(l+1)) / 2⌉ guarantees that the resulting code is a prefix code. If first(l+1) = $n_1$ and numl(l+1) = $N$, then $(n_1 + N)/2$ is $n_1 + N$ right shifted by one bit, and the ceiling operation adds a 1 to it if $n_1 + N$ is an odd number. Convince yourself that this implies the prefix property.
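One way to convince yourself is to generate the codes and test the prefix property directly. The sketch below assumes, as stated earlier, that the codes within a length group are consecutive integers starting at first(l), each written with exactly l bits; the function names are illustrative.

```python
from collections import Counter


def canonical_codes(lengths):
    """Assign canonical code words, in the order the lengths are given."""
    lmax = max(lengths)
    numl = Counter(lengths)               # missing lengths count as 0
    first = {lmax: 0}
    for l in range(lmax - 1, 0, -1):
        first[l] = (first[l + 1] + numl[l + 1] + 1) // 2   # ceiling of the half
    next_code = dict(first)
    codes = []
    for l in lengths:
        codes.append(format(next_code[l], f"0{l}b"))       # l-bit binary number
        next_code[l] += 1                                  # consecutive numbers
    return codes


def is_prefix_free(codes):
    """True if no code word is a prefix of a different code word."""
    return not any(i != j and b.startswith(a)
                   for i, a in enumerate(codes) for j, b in enumerate(codes))


codes = canonical_codes([6] * 7 + [5] * 5 + [3] * 4)
print(codes, is_prefix_free(codes))
```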
Decoding

Canonical Huffman is very useful when the alphabet is large but fast decoding is necessary. The code is stored in consecutive memory addresses, along with the symbols.
Let’s do the example (5,5,5,5,3,2,2,2) again