



A set of slides from a lecture in the ECE-8873 Data Compression and Modeling course at Georgia Tech, Spring 2004. The lecture covers lossless compression techniques, specifically arithmetic coding and dictionary methods. Arithmetic coding is a lossless compression method that represents an entire symbol sequence by a single tag in the interval [0, 1), computed from cumulative probabilities. Dictionary methods, such as LZ77 and LZ78, use a dictionary to encode repeated patterns in the data. The lecture also discusses the efficiency and comparison of Huffman and arithmetic codes.




Spring 2004
ECE-8873 B. H. Juang
Copyright 2004
Lecture #4, Slide #
Generating and Deciphering the Tag
Let $l^{(n)}$ and $u^{(n)}$ be the lower and upper bounds of the tag associated with a sequence of $n$ symbols, $\mathbf{x} = (x_1, x_2, \ldots, x_n)$.
These bounds can be computed via the following recursion:
$$l^{(n)} = l^{(n-1)} + \left(u^{(n-1)} - l^{(n-1)}\right) F_X(x_n - 1)$$
$$u^{(n)} = l^{(n-1)} + \left(u^{(n-1)} - l^{(n-1)}\right) F_X(x_n)$$
with $l^{(0)} = 0$ and $u^{(0)} = 1$.
The same recursion is used to convert a tag back into the sequence of symbols. The cumulative probabilities $F_X(x)$ must be known.
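The recursion for the tag bounds can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-symbol model (probabilities 1/2, 1/4, 1/4), the example sequence, and the names `cumulative` and `tag_bounds` are hypothetical and do not come from the slides. Exact fractions are used so that no rounding obscures the interval arithmetic.

```python
from fractions import Fraction

def cumulative(probs):
    """Return [F_X(0), F_X(1), ..., F_X(m)] with F_X(0) = 0."""
    F = [Fraction(0)]
    for p in probs:
        F.append(F[-1] + Fraction(p))
    return F

def tag_bounds(seq, probs):
    """Apply the l/u recursion to a sequence of symbols numbered 1..m."""
    F = cumulative(probs)
    low, high = Fraction(0), Fraction(1)   # l^(0) = 0, u^(0) = 1
    for x in seq:
        width = high - low                 # u^(n-1) - l^(n-1)
        high = low + width * F[x]          # u^(n), computed from the old lower bound
        low = low + width * F[x - 1]       # l^(n)
    return low, high

# Hypothetical model, not taken from the slides.
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
low, high = tag_bounds([1, 3, 2], probs)
print(low, high)  # prints 7/16 15/32
```

The final interval has width 1/32, which equals the product of the three symbol probabilities. Any number inside it, for example the midpoint 29/64, can serve as the tag.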
The Deciphering Algorithm
1. Initialize $l^{(0)} = 0$ and $u^{(0)} = 1$.
2. For each $k$, find $t = (\text{tag} - l^{(k-1)}) / (u^{(k-1)} - l^{(k-1)})$.
3. Find the value of $x_k$ for which $F_X(x_k - 1) \le t < F_X(x_k)$.
4. Update $l^{(k)}$ and $u^{(k)}$.
5. Repeat steps 2-4 until the entire sequence is found, $k = 1, 2, \ldots, n$.
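The deciphering steps can be sketched as follows. The model (probabilities 1/2, 1/4, 1/4), the tag value 29/64, and the function name `decode_tag` are assumptions made for illustration, not values from the slides; the sequence length n is assumed to be known to the decoder.

```python
from fractions import Fraction

def decode_tag(tag, n, probs):
    """Recover n symbols (numbered 1..m) from a tag in [0, 1)."""
    F = [Fraction(0)]                      # F[k] = F_X(k), with F_X(0) = 0
    for p in probs:
        F.append(F[-1] + Fraction(p))
    low, high = Fraction(0), Fraction(1)   # step 1
    symbols = []
    for _ in range(n):
        width = high - low
        t = (tag - low) / width            # step 2: rescale the tag
        # step 3: find x_k with F_X(x_k - 1) <= t < F_X(x_k)
        x = next(k for k in range(1, len(F)) if F[k - 1] <= t < F[k])
        symbols.append(x)
        high = low + width * F[x]          # step 4: update the bounds
        low = low + width * F[x - 1]
    return symbols

# Hypothetical model and tag, not taken from the slides.
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
print(decode_tag(Fraction(29, 64), 3, probs))  # prints [1, 3, 2]
```

The decoder mirrors the encoder exactly: it needs the same cumulative probabilities and performs the same bound updates, which is why both sides stay in step.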
Another Way to Look at Arithmetic Code
[Figure: the interval [0, 1) divided among the symbols a1, a2, a3 and subdivided again among the two-symbol sequences a1a1, a1a2, ..., a3a3; a tag value of 0.49 is marked.]
Use binary format for the tag, but truncate it to $l(\mathbf{x}) = \lceil \log \frac{1}{P(\mathbf{x})} \rceil + 1$ bits.
The truncation changes the tag value but does not move it out of the interval the tag was originally in, so decodability is maintained.
Binary Code Assignment
Symbol   $F_X$   $\bar{T}_X$   In Binary   $\lceil \log \frac{1}{P(x)} \rceil + 1$   Code
1        ...     ...           ...         2                                         01
2        ...     ...           ...         3                                         101
3        ...     ...           ...         4                                         1101
4        ...     ...           ...         4                                         1111
Each tag in [0, 1) is represented by a binary fractional number and truncated to $l(x) = \lceil \log \frac{1}{P(x)} \rceil + 1$ bits.
This binary code is uniquely decodable because the truncated tag always remains in $[F_X(x-1), F_X(x))$ and it is a prefix code.
Average code length: $H(X) \le l_A < H(X) + \frac{2}{m}$, where $m$ is the block length.
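A small sketch of the code assignment, under two assumptions that are not stated in the surviving slide text: the tag of each symbol is taken as the midpoint of its interval, $\bar{T}_X(x) = F_X(x-1) + P(x)/2$, and the symbol probabilities are 1/2, 1/4, 1/8, 1/8 (chosen because they are consistent with the code lengths 2, 3, 4, 4 in the table). The function name `binary_codes` is hypothetical.

```python
from fractions import Fraction

def binary_codes(probs):
    """Truncate each midpoint tag to ceil(log2(1/P(x))) + 1 bits."""
    codes = {}
    lower = Fraction(0)                    # F_X(x - 1)
    for symbol, p in enumerate(probs, start=1):
        tag = lower + p / 2                # assumed midpoint tag
        k = 0                              # k = ceil(log2(1/p)), found exactly
        while Fraction(1, 2 ** k) > p:
            k += 1
        length = k + 1
        bits, frac = [], tag
        for _ in range(length):            # binary expansion, truncated
            frac *= 2
            bit = int(frac >= 1)
            bits.append(str(bit))
            frac -= bit
        codes[symbol] = "".join(bits)
        lower += p
    return codes

# Assumed probabilities, consistent with the code lengths 2, 3, 4, 4.
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
print(binary_codes(probs))  # prints {1: '01', 2: '101', 3: '1101', 4: '1111'}
```

With these assumed probabilities the sketch yields the same codewords as the Code column, and no codeword is a prefix of another.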
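LZ77 Example
Input sequence: c a b r a c a d a b r a r r a r r a d
Each match is coded as a triple [o, l, C(c)]: offset o, match length l, and the codeword C(c) of the next symbol c.
Triples shown on the slide: [0,0,C(d)], [7,4,C(r)], [2,1,C(r)], [3,5,C(d)]
Parsing shown in the figure: c a b r a c a d | a b r a | r | r a r r a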
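A minimal LZ77 sketch for the example sequence. Several things here are assumptions rather than content of the slide: a search buffer of 7 symbols, a look-ahead buffer of 6 symbols, the first 7 symbols already sitting in the search buffer, longest-match selection with ties going to the smallest offset, and the function names. Triples are written as plain tuples (offset, length, next symbol) instead of codewords.

```python
def lz77_encode(data, start=0, search_size=7, lookahead_size=6):
    """Encode data[start:] as (offset, length, next_symbol) triples."""
    triples = []
    pos = start
    while pos < len(data):
        # Leave room for the explicit next symbol after the match.
        max_len = min(lookahead_size - 1, len(data) - pos - 1)
        best_offset, best_len = 0, 0
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            # The match may run past pos into the look-ahead buffer.
            while (length < max_len
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_offset, best_len = offset, length
        triples.append((best_offset, best_len, data[pos + best_len]))
        pos += best_len + 1
    return triples

def lz77_decode(triples, prefix=""):
    """Rebuild the text; prefix is the part both sides already share."""
    out = list(prefix)
    for offset, length, symbol in triples:
        for _ in range(length):
            out.append(out[-offset])       # copy one symbol at a time
        out.append(symbol)
    return "".join(out)

text = "cabracadabrarrarrad"
triples = lz77_encode(text, start=7)
print(triples)  # prints [(0, 0, 'd'), (7, 4, 'r'), (3, 5, 'd')]
```

Under these assumptions the sketch reproduces three of the triples listed on the slide. The match of length 5 at offset 3 extends into the look-ahead buffer, which is why the decoder copies symbol by symbol.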
LZ77
PKZip, Zip, Lharc, PNG, gzip, and ARJ
the original triple.
LZ78
Input sequence: wabba_wabba_wabba_wabba_woo_
Dictionary (the initial codebook is empty; index 0 means no match):
Index   Entry   Encoded output
1       w       <0,C(w)>
2       a       <0,C(a)>
3       b       <0,C(b)>
4       ba      <3,C(a)>
5       _       <0,C(_)>
6       wa      <1,C(a)>
7       bb      <3,C(b)>
8       a_      <2,C(_)>
9       wab     <6,C(b)>
10      ba_     <4,C(_)>
11      wabb    <9,C(b)>
12      a_w     <8,C(w)>
13      o       <0,C(o)>
14      o_      <13,C(_)>
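The dictionary construction can be sketched as a short LZ78 encoder. The function name `lz78_encode` and the use of plain (index, symbol) tuples in place of codewords are choices made for this illustration only.

```python
def lz78_encode(text):
    """Return (index, symbol) pairs and the dictionary that was built."""
    dictionary = {}                        # phrase -> index, initially empty
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                   # keep extending the match
        else:
            # Index 0 stands for "no match" (empty phrase).
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:                             # input ended inside a known phrase
        output.append((dictionary[phrase], ""))
    return output, dictionary

pairs, dictionary = lz78_encode("wabba_wabba_wabba_wabba_woo_")
for index, (entry, pair) in enumerate(zip(dictionary, pairs), start=1):
    print(index, entry, pair)
```

For this input the sketch emits 14 pairs, beginning with (0, 'w'), (0, 'a'), (0, 'b'), (3, 'a') and ending with (13, '_'), and the dictionary grows by exactly one entry per pair.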
The Lempel-Ziv-Welch (LZW) Algorithm
Input sequence: wabba_wabba_wabba_wabba_woo_woo_woo
Initial primed dictionary:
index   entry
1       _
2       a
3       b
4       o
5       w
Entries added during encoding:
index   entry
6       wa
7       ab
8       bb
9       ba
10      a_
11      _w
12      wab
13      bba
14      a_w
15      wabb
16      ba_
17      _wa
18      abb
19      ba_w
20      wo
21      oo
22      o_
23      _wo
24      oo_
25      _woo
Encoded sequence: 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23
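A minimal LZW encoder sketch for the same input. The function name `lzw_encode` is hypothetical, and the sketch assumes every input symbol already appears in the primed dictionary.

```python
def lzw_encode(text, alphabet):
    """Return the list of emitted indices and the final dictionary."""
    # Primed dictionary: single symbols numbered from 1.
    dictionary = {s: i for i, s in enumerate(alphabet, start=1)}
    codes = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                   # longest known pattern so far
        else:
            codes.append(dictionary[phrase])
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ch                    # restart from the unmatched symbol
    if phrase:
        codes.append(dictionary[phrase])   # flush the last pattern
    return codes, dictionary

codes, dictionary = lzw_encode("wabba_wabba_wabba_wabba_woo_woo_woo", "_abow")
print(codes)
print(dictionary["_woo"])  # prints 25
```

Run on the slide's input, the sketch emits the 20 indices listed in the encoded sequence and then one more index, 4, when the final `o` is flushed at the end of the input.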
Predictive Coding
Structure in Information
Again, work on (or find) the structure first.
At times, information structure is recursive.
$\{P(x_i \mid x_{i-1}, x_{i-2}, \ldots)\}$
$x_i$: (1 2 5 7 1 3 0 -5 -3 -1 1 -2 -7 -4 -2 1 3 4)
$y_i = x_i - x_{i-1} - 2$: (1 -1 1 0 -8 0 -5 -7 0 0 0 -5 -7 1 0 1 0 -1)
$\Delta_i = x_i - \tilde{x}_i = x_i - f(x_{i-1}, x_{i-2}, \ldots) = x_i - f(H_i)$
General Block Diagram
[Block diagram with the signals $X_i$, $\tilde{X}_i$, $\Delta_i$, $\hat{\Delta}_i$, and $\hat{X}_i$; one block is labeled "(& Decoder)".]
$\Delta_i = x_i - \tilde{x}_i = x_i - f(\hat{x}_{i-1}, \hat{x}_{i-2}, \ldots) = x_i - f(H_i)$
$\hat{\Delta}_i = \beta(\alpha(\Delta_i)) = \Delta_i$ in case of lossless coding
$\hat{x}_i = \tilde{x}_i + \hat{\Delta}_i$
But the notion of prediction can also be applied to probability measures: foresee a change in distribution based on what has already been observed.
Run Length Encoding
Simple but still useful.
Binary (e.g., FAX): Instead of sending 1's and 0's (mostly 1's for white with 0's for black), send the lengths of the runs of white.
Often used for data with many repeated symbols, e.g., FAX and quantized transform coefficients (DCT, wavelet).
Can combine with other methods, e.g., JPEG does lossless coding (Huffman or arithmetic) of combined quantizer levels/run lengths.
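A minimal run-length sketch for a binary string. It stores (symbol, run length) pairs rather than only the white-run lengths used in FAX, which is a simplification chosen for this illustration; the function names are hypothetical.

```python
from itertools import groupby

def rle_encode(symbols):
    """Collapse each run of equal symbols into (symbol, run_length)."""
    return [(s, len(list(run))) for s, run in groupby(symbols)]

def rle_decode(runs):
    """Expand (symbol, run_length) pairs back into a string."""
    return "".join(s * n for s, n in runs)

line = "1111100111"
runs = rle_encode(line)
print(runs)  # prints [('1', 5), ('0', 2), ('1', 3)]
```

For a strictly binary source the symbols alternate, so in practice only the run lengths need to be sent once the first symbol is agreed on, and those lengths can then be entropy coded.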