Lecture 4: Lossless Compression - Arithmetic Code & Dictionary Techniques, Study notes of Electrical and Electronics Engineering

Georgia Institute of Technology - Main Campus Electrical and Electronics Engineering

A set of slides from a lecture in the ECE 8873 Data Compression and Modeling course at Georgia Tech, Spring 2004. The lecture covers lossless compression techniques, specifically arithmetic coding and dictionary methods. Arithmetic coding is a lossless method that maps a symbol sequence to a tag, a real number in the unit interval determined by the symbol probabilities. Dictionary methods, such as LZ77 and LZ78, encode repeated patterns in the data as references to a dictionary. The lecture also compares the efficiency of Huffman and arithmetic codes.

Typology: Study notes

Pre 2010

Uploaded on 08/05/2009

koofers-user-a36 🇺🇸


ECE 8873

Data Compression and Modeling

Lecture 4:

Lossless Compression – Arithmetic Code and

Dictionary Techniques

School of Electrical and Computer Engineering

Georgia Institute of Technology

Spring, 2004

Arithmetic Coding

• A skewed source often has low entropy.

• A source with a small alphabet also has low entropy in most practical situations.

• Huffman codes are inefficient for skewed sources; their deviation from optimality depends on $P_{\max}$, the maximum of the symbol probabilities.

• Extended Huffman code may improve efficiency, but implementation requires a large table.

• Is it possible to sequentially construct only the part of the table that is needed?
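To make the gap concrete, the following minimal sketch compares the entropy of a skewed source with the average length of its Huffman code. The distribution (0.95, 0.02, 0.03) and the helper names are illustrative assumptions, not taken from the slides.

```python
import heapq
from math import log2


def entropy(probs):
    """First-order entropy in bits per symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)


def huffman_lengths(probs):
    """Codeword length of each symbol in a binary Huffman code."""
    # Heap items: (probability, tie-breaker, symbol indices in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every merge adds one bit to the merged symbols
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


probs = [0.95, 0.02, 0.03]  # assumed skewed source, for illustration only
H = entropy(probs)
lengths = huffman_lengths(probs)
avg_len = sum(p * n for p, n in zip(probs, lengths))
print(f"entropy = {H:.3f} bits, Huffman average = {avg_len:.2f} bits")
```

With these assumed probabilities the Huffman code spends about 1.05 bits per symbol against an entropy of roughly 0.335 bits, since no symbol can receive fewer than one bit.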

Arithmetic Code

• Key mechanism:

– Code k-tuples of input symbols.

– For each k-tuple input, generate a tag, a real number in (0,1), according to the probabilities in the alphabet; the tag can be generated sequentially.

– Represent each tag in binary code with length commensurate (inversely) with the symbol’s probability; the binary code can be truncated without compromising the decodability due to the tag structure.

Let $A = \{a_1, a_2, \ldots, a_m\}$ and $X(a_i) = i$, $P(X = i) = P(a_i)$, and

$$F_X(i) = \sum_{k=1}^{i} P(X = k).$$

Mapping a Symbol Sequence into a Tag

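Let $A = \{a_1, a_2, a_3\}$ and $P(a_1) = 0.7$, $P(a_2) = 0.1$, $P(a_3) = 0.2$.

For sequence $(a_1, a_2, a_3, a_3)$, $T[(a_1, a_2, a_3, a_3)] = 0.5586$.

[Figure: successive subdivision of the tag interval; boundaries at each step]

Start:      0        0.7      0.8      1.0
After a1:   0        0.49     0.56     0.7
After a2:   0.49     0.539    0.546    0.56
After a3:   0.546    0.5558   0.5572   0.56
After a3:   0.5572            0.56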


Spring 2004

ECE-8873 B. H. Juang

Lecture #4, Slide #


Generating and Deciphering the Tag

Let $l^{(n)}$ and $u^{(n)}$ be the lower and upper bounds of the tag associated with a sequence of $n$ symbols, $\mathbf{x} = (x_1, x_2, \ldots, x_n)$.

These bounds can be computed via the following recursion:

$$l^{(n)} = l^{(n-1)} + \left(u^{(n-1)} - l^{(n-1)}\right) F_X(x_n - 1)$$

$$u^{(n)} = l^{(n-1)} + \left(u^{(n-1)} - l^{(n-1)}\right) F_X(x_n)$$

with $l^{(0)} = 0$ and $u^{(0)} = 1$.

The same recursion is used to convert a tag back into the sequence of symbols.

Need to know the cumulative probabilities, $F(x)$.
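The interval recursion can be sketched in a few lines of Python. This is a minimal illustration that assumes symbols are numbered 1..m, that `cdf[i]` holds the cumulative probability F_X(i) with `cdf[0] = 0`, and that the midpoint of the final interval serves as the tag; the function name is hypothetical, and the probabilities 0.7, 0.1, 0.2 are those of the lecture's tag-mapping example.

```python
def tag_interval(seq, cdf):
    """Return (lower, upper) bounds of the tag interval for a symbol sequence."""
    low, high = 0.0, 1.0
    for x in seq:
        width = high - low
        # Both bounds are computed from the previous lower bound.
        low, high = low + width * cdf[x - 1], low + width * cdf[x]
    return low, high


cdf = [0.0, 0.7, 0.8, 1.0]   # F_X(0..3) for P = (0.7, 0.1, 0.2)
low, high = tag_interval([1, 2, 3, 3], cdf)
tag = (low + high) / 2       # assumption: midpoint of the interval is the tag
print(low, high, tag)
```

For the sequence (a1, a2, a3, a3) this gives the interval [0.5572, 0.56) and the tag 0.5586, up to floating-point rounding.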


The Deciphering Algorithm

1. Initialize $l^{(0)} = 0$ and $u^{(0)} = 1$;

2. For each $k$, $k = 1, 2, \ldots, n$, find $t^* = (\text{tag} - l^{(k-1)}) / (u^{(k-1)} - l^{(k-1)})$;

3. Find the value of $x_k$ for which $F_X(x_k - 1) \le t^* < F_X(x_k)$;

4. Update $l^{(k)}$ and $u^{(k)}$;

5. Repeat steps 2-4 until the entire sequence is found.
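The deciphering steps can be sketched as follows. This is a minimal illustration under stated assumptions: symbols are numbered 1..m, `cdf[i]` holds F_X(i) with `cdf[0] = 0`, the sequence length n is known to the decoder, and the function name is hypothetical. The tag 0.5586 and the probabilities 0.7, 0.1, 0.2 come from the lecture's tag-mapping example.

```python
def decode_tag(tag, n, cdf):
    """Recover n symbols (numbered from 1) from a tag in [0, 1)."""
    low, high = 0.0, 1.0                     # step 1
    symbols = []
    for _ in range(n):
        t = (tag - low) / (high - low)       # step 2: rescale tag to [0, 1)
        # Step 3: find x with F(x - 1) <= t < F(x).
        x = next(i for i in range(1, len(cdf)) if cdf[i - 1] <= t < cdf[i])
        symbols.append(x)
        width = high - low                   # step 4: update the bounds
        low, high = low + width * cdf[x - 1], low + width * cdf[x]
    return symbols


cdf = [0.0, 0.7, 0.8, 1.0]
decoded = decode_tag(0.5586, 4, cdf)
print(decoded)
```

The decoder mirrors the encoder's interval updates, which is why both sides must share the same cumulative probabilities.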


Another Way to Look at Arithmetic Code

[Figure: tree of successive subdivisions of the tag range by the symbols a1, a2, a3; the value 0.49 appears as a label in the figure.]

Use binary format for the tag, but truncate it to $l(x) = \left\lceil \log \frac{1}{P(x)} \right\rceil + 1$ bits.

The truncation, while it causes deviation from the original tag value, will not move the value out of the range the tag was originally in, thereby maintaining the decodability.


Binary Code Assignment

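Symbol   $\lceil \log \frac{1}{P(x)} \rceil + 1$   Code
1        2                                         01
2        3                                         101
3        4                                         1101
4        4                                         1111

[The $F_X$, $\bar{T}_X$, and "In Binary" columns of the original table did not survive extraction.]

Each tag in [0,1) is represented by a binary fractional number and truncated to $l(x) = \left\lceil \log \frac{1}{P(x)} \right\rceil + 1$ bits.

This binary code is uniquely decodable because the truncated tag always remains in $[F_X(x-1), F_X(x))$ and it is a prefix code.

Average code length, for block length $m$:

$$H(X) \le l_A < H(X) + \frac{2}{m}$$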
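The code assignment can be reproduced with a short sketch. The probabilities (1/2, 1/4, 1/8, 1/8) are an assumption, chosen because they are consistent with the code lengths 2, 3, 4, 4 in the table; the function names are hypothetical. Each codeword is the binary expansion of the midpoint of the symbol's interval, truncated to ⌈log2(1/P(x))⌉ + 1 bits.

```python
from math import ceil, log2


def truncated_tag_codes(probs):
    """Binary codewords from truncated interval midpoints, one per symbol."""
    codes = []
    low = 0.0
    for p in probs:
        midpoint = low + p / 2                  # tag of the one-symbol sequence
        length = ceil(log2(1 / p)) + 1          # bits kept after truncation
        value = int(midpoint * 2 ** length)     # first `length` fractional bits
        codes.append(format(value, "0{}b".format(length)))
        low += p
    return codes


def is_prefix_free(codes):
    """True if no codeword is a prefix of a different codeword."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


probs = [0.5, 0.25, 0.125, 0.125]   # assumed probabilities, see lead-in
codes = truncated_tag_codes(probs)
print(codes, is_prefix_free(codes))
```

Under the assumed probabilities the sketch yields the codewords 01, 101, 1101, 1111, and the prefix check confirms that no codeword begins another.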


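LZ77 Example

Sequence: c a b r a c a d a b r a r r a r r a d

[Figure: sliding window over the sequence, with the offset o and the match length l of each match marked. Triples shown in the figure, written as [offset, length, C(next symbol)], with the text fragments printed next to them:]

[ 0,0,C(d) ]
[ 7,4,C(r) ]   a b r a
[ 2,1,C(r) ]   r
[ 3,5,C(d) ]   r a r r a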
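Decoding shows how such triples are interpreted. The sketch below is a minimal illustration that assumes the window already holds "cabraca" when decoding starts and uses the triples [0,0,C(d)], [7,4,C(r)], and [3,5,C(d)] from the example; the function name and the tuple representation are hypothetical.

```python
def lz77_decode(triples, window=""):
    """Rebuild text from (offset, length, next_symbol) triples."""
    out = list(window)
    for offset, length, symbol in triples:
        start = len(out) - offset
        for k in range(length):
            # Copy one symbol at a time so a match may overlap the text
            # being produced (offset smaller than length).
            out.append(out[start + k])
        out.append(symbol)
    return "".join(out)


triples = [(0, 0, "d"), (7, 4, "r"), (3, 5, "d")]
decoded = lz77_decode(triples, window="cabraca")
print(decoded)
```

The last triple has an offset of 3 but a length of 5, so the copy runs past the end of the existing text and reuses symbols it has just written.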


LZ77

• Asymptotic optimality: approaches entropy.

• Recurrence of codewords happens in recent memory.

• Variations:

– Encode the triple with variable length code; PKZip, Zip, Lharc, PNG, gzip, and ARJ

– Improved buffer search algorithm; hash table, …

– Use a flag bit in the case of no match instead of the original triple.


LZ78

• Avoid performance dependency on the buffer length as in LZ77.

• Codebook may grow unbounded unless constrained.

Input sequence: wabba_wabba_wabba_wabba_woo_

Dictionary, starting from an empty initial codebook:

Index   Entry   Encoded output
1       w       <0,C(w)>
2       a       <0,C(a)>
3       b       <0,C(b)>
4       ba      <3,C(a)>
5       _       <0,C(_)>
6       wa      <1,C(a)>
7       bb      <3,C(b)>
8       a_      <2,C(_)>
9       wab     <6,C(b)>
10      ba_     <4,C(_)>
11      wabb    <9,C(b)>
12      a_w     <8,C(w)>
13      o       <0,C(o)>
14      o_      <13,C(_)>
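The parsing behind this example can be sketched as a small encoder. It is a minimal illustration, assuming an initially empty dictionary with 1-based indices and index 0 meaning "no match"; the function name and the (index, symbol) tuple form are hypothetical.

```python
def lz78_encode(text):
    """Return the list of (index, symbol) pairs and the final dictionary."""
    dictionary = {}          # pattern -> 1-based index
    output = []
    pattern = ""
    for ch in text:
        if pattern + ch in dictionary:
            pattern += ch    # keep extending the current match
        else:
            # Emit the index of the longest match (0 if none) plus the new symbol.
            output.append((dictionary.get(pattern, 0), ch))
            dictionary[pattern + ch] = len(dictionary) + 1
            pattern = ""
    if pattern:
        # Input ended inside a match: emit its index with an empty symbol.
        output.append((dictionary[pattern], ""))
    return output, dictionary


pairs, dictionary = lz78_encode("wabba_wabba_wabba_wabba_woo_")
for index, (code, entry) in enumerate(zip(pairs, dictionary), start=1):
    print(index, entry, code)
```

Each emitted pair adds exactly one dictionary entry, which is why the codebook keeps growing unless it is capped or reset.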


The Lempel-Ziv-Welch (LZW) Algorithm

• 2nd element in code not transmitted. Only index is sent.

• If p is in dictionary but pa is not, augment the dictionary with pa.

Input sequence: wabba_wabba_wabba_wabba_woo_woo_woo

Initial primed dictionary:

Index   Entry
1       _
2       a
3       b
4       o
5       w

Dictionary built during encoding:

Index   Entry   Index   Entry
1       _       14      a_w
2       a       15      wabb
3       b       16      ba_
4       o       17      _wa
5       w       18      abb
6       wa      19      ba_w
7       ab      20      wo
8       bb      21      oo
9       ba      22      o_
10      a_      23      _wo
11      _w      24      oo_
12      wab     25      _woo
13      bba

Encoded sequence: 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23
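The dictionary rule above can be sketched as a small encoder. It is a minimal illustration, assuming the primed dictionary lists the alphabet in the order _, a, b, o, w with 1-based indices; the function name is hypothetical. Note that the sketch also flushes the pattern still pending when the input ends, so it emits one more index (4, for the final "o") than the 20 indices listed in the encoded sequence.

```python
def lzw_encode(text, alphabet):
    """Return the list of emitted indices and the final dictionary."""
    dictionary = {sym: idx for idx, sym in enumerate(alphabet, start=1)}
    output = []
    p = ""
    for a in text:
        if p + a in dictionary:
            p += a                                   # pa is known: keep growing p
        else:
            output.append(dictionary[p])             # send only the index of p
            dictionary[p + a] = len(dictionary) + 1  # augment dictionary with pa
            p = a
    if p:
        output.append(dictionary[p])                 # flush the pending pattern
    return output, dictionary


codes, dictionary = lzw_encode("wabba_wabba_wabba_wabba_woo_woo_woo", "_abow")
print(codes)
```

Because the decoder can rebuild the same dictionary from the indices alone, the second element of the LZ78 pair never needs to be sent.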


Predictive Coding

Symbols or letters in the sequence may have recursive dependency.

Conventional variable length coding may not fully capture this type of redundancy.

Instead of coding each symbol in a memoryless fashion (even when sophisticated parsing is involved), predict the symbol based on information that the decoder (also) possesses and can use, obtain the discrepancy between the input and the predicted one, and code such discrepancy. (Recall the structure of information and representation of it.)

Decoder combines the decoded discrepancy with its version of the predicted result to recover the original.

Cleary & Witten (1984) – predictive coding with partial match


Structure in Information

• Again, work on (or find) the structure first.

• At times, information structure is recursive.

$x_i$: (1 2 5 7 1 3 0 -5 -3 -1 1 -2 -7 -4 -2 1 3 4)

$y_i = x_i - x_{i-1} - 2$: (1 -1 1 0 -8 0 -5 -7 0 0 0 -5 -7 1 0 1 0 -1)

• Two options to transform the coding task:

– “Predict” current symbol based on past symbols; code residual

$$\Delta_i = x_i - \tilde{x}_i = x_i - f(x_{i-1}, x_{i-2}, \ldots) = x_i - f(H_i)$$

– “Adapt” probability distribution based on past symbols; i.e., use $\{P(x_i \mid x_{i-1}, x_{i-2}, \ldots)\}$
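The "predict and code the residual" option can be sketched with the first-order rule y_i = x_i - x_{i-1} - 2 applied to the first sequence above. This is a minimal illustration; passing the first sample through unchanged is an assumption (the rule needs a previous sample), and the function names are hypothetical.

```python
def residuals(x, offset=2):
    """Residual sequence y_i = x_i - x_{i-1} - offset; y_1 = x_1 (assumed)."""
    y = [x[0]]
    for prev, cur in zip(x, x[1:]):
        y.append(cur - prev - offset)
    return y


def reconstruct(y, offset=2):
    """Invert residuals(): the decoder adds its prediction to each residual."""
    x = [y[0]]
    for d in y[1:]:
        x.append(x[-1] + d + offset)
    return x


x = [1, 2, 5, 7, 1, 3, 0, -5, -3, -1, 1, -2, -7, -4, -2, 1, 3, 4]
y = residuals(x)
print(y)
print(len(set(x)), "distinct values before,", len(set(y)), "after")
```

The residual sequence uses far fewer distinct values than the original, so a subsequent entropy coder sees a more skewed distribution, and the decoder recovers the original exactly by running the same predictor.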


General Block Diagram

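[Figure: block diagram with the blocks Encoder (& Decoder) and Predictor, and the signals $X_i$, $\tilde{X}_i$, $\Delta_i$, $\hat{\Delta}_i$, $\hat{X}_i$.]

$$\Delta_i = x_i - \tilde{x}_i = x_i - f(\hat{x}_{i-1}, \hat{x}_{i-2}, \ldots) = x_i - f(H_i)$$

$$\hat{\Delta}_i = \beta(\alpha(\Delta_i)), \qquad \hat{x}_i = \tilde{x}_i + \hat{\Delta}_i$$

$\hat{\Delta}_i = \Delta_i$ in case of lossless coding.

But, the notion of prediction can also be applied to probability measures – foresee change in distribution based on what is or are already observed.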


Run Length Encoding

Simple but still useful.

Binary (e.g., FAX): Instead of sending 1's and 0's (mostly 1's for white with 0's for black), send length of runs of white.

Often used for data with many repeated symbols, e.g., FAX and quantized transform coefficients (DCT, wavelet).

Can combine with other methods, e.g., JPEG does lossless coding (Huffman or arithmetic) of combined quantizer levels/run lengths.
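A binary run-length coder can be sketched in a few lines. This is a minimal illustration, not the FAX standard: it assumes 1 means white, sends alternating run lengths starting with a white run, and inserts a zero-length run when the line starts with black; the function names are hypothetical.

```python
from itertools import groupby


def run_lengths(bits, first=1):
    """Alternating run lengths of a binary sequence, starting with `first`."""
    runs = [(value, len(list(group))) for value, group in groupby(bits)]
    lengths = [n for _, n in runs]
    if runs and runs[0][0] != first:
        lengths.insert(0, 0)   # zero-length run keeps the alternation aligned
    return lengths


def expand(lengths, first=1):
    """Inverse of run_lengths()."""
    bits, value = [], first
    for n in lengths:
        bits.extend([value] * n)
        value = 1 - value      # runs alternate between white and black
    return bits


line = [1] * 5 + [0] * 2 + [1] * 8 + [0]
encoded = run_lengths(line)
print(encoded)
```

The run lengths themselves are usually passed on to an entropy coder, which is the kind of combination the JPEG remark above refers to.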