A Tutorial of The Context Tree Weighting Method: Basic Properties | EE 80653, Papers of Electrical and Electronics Engineering

Material Type: Paper; Class: Information Theory; Subject: Electrical Engineering; University: Notre Dame; Term: Fall 2005;


Uploaded on 02/24/2010

A Tutorial of The Context Tree Weighting Method: Basic Properties
Zijun Wu
November 9, 2005


Abstract

In this tutorial, we give an overview of the Context Tree Weighting Method. We confine our discussion to binary bounded memory tree sources and describe a sequential universal data compression procedure that achieves a desirable coding distribution for tree sources with unknown model and unknown parameters. The computational and storage complexity of the procedure are both linear in the length of the source sequence.

1. Introduction

In our class, we learned Huffman codes. For small source alphabets, however, Huffman coding is efficient only if we encode long blocks of source symbols, and because the size of the code grows exponentially with the block length, Huffman coding is not well suited to this situation. It is therefore desirable to have an efficient coding procedure that works for long blocks of source symbols; arithmetic coding achieves this goal. When no a priori knowledge of the source probability distribution is available, one uses a universal coding algorithm. For finite memory source models, Weinberger, Ziv and Lempel developed a sequential algorithm for universal coding that involves an artificial parameter K. To avoid having to choose such a parameter, Willems, Shtarkov and Tjalkens proposed a weighting method. This method estimates the probability of the source sequence given the past symbols and then combines this estimate with arithmetic coding. We outline the basics of the method in this tutorial.

2. Binary Bounded Memory Tree Sources

A binary tree source generates a sequence $x_{-\infty}^{\infty}$ of digits assuming values in the alphabet $\{0,1\}$. We denote by $x_m^n$ the sequence $x_m x_{m+1} \cdots x_n$ and allow $m$ and $n$ to be infinitely large. For $n < m$ the sequence is empty, denoted by $\phi$.

The statistical behavior of a binary finite memory tree source can be described by means of a suffix set $S$. This suffix set is a collection of binary strings $s(k)$, with $k = 1, 2, \ldots, |S|$. We require it to be proper and complete. Properness of the suffix set implies that no string in $S$ is a suffix of any other string in $S$. Completeness guarantees that each semi-infinite sequence (string) $\cdots x_{n-2} x_{n-1} x_n$ has a suffix that belongs to $S$. This suffix is unique since $S$ is proper.

A bounded memory tree source has a suffix set $S$ that satisfies $l(s) \le D$ for all $s \in S$. We say that the source has memory not larger than $D$.

We define a suffix function $\beta_S(\cdot)$, which maps semi-infinite sequences onto their unique suffix $s$ in $S$.
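To make properness, completeness and the suffix function concrete, here is a small Python sketch that is not part of the original tutorial. It assumes strings are written with the most recent symbol last, so the context $\cdots x_{n-1} x_n$ ends with $x_n$; the helper names `is_proper`, `is_complete` and `beta` are hypothetical. Because every $s \in S$ has $l(s) \le D$, checking completeness on the $2^D$ strings of length $D$ is enough.

```python
from itertools import product

def is_proper(S):
    """True if no string in S is a suffix of another (distinct) string in S."""
    return all(not t.endswith(s) for s in S for t in S if s != t)

def is_complete(S, D):
    """True if every context of length D has a suffix in S.

    Since l(s) <= D for all s in S, the 2**D strings of length D
    cover every semi-infinite sequence.
    """
    contexts = ("".join(bits) for bits in product("01", repeat=D))
    return all(any(c.endswith(s) for s in S) for c in contexts)

def beta(S, context):
    """Suffix function beta_S: the unique s in S that is a suffix of context."""
    matches = [s for s in S if context.endswith(s)]
    if len(matches) != 1:
        raise ValueError("S is not proper and complete for this context")
    return matches[0]

S = {"00", "10", "1"}                      # example suffix set, D = 2
print(is_proper(S), is_complete(S, 2))     # True True
print(beta(S, "0110"))                     # 10
print(beta(S, "0101"))                     # 1
```

For instance, $\{0, 10\}$ is not proper (0 is a suffix of 10), and $\{00, 1\}$ is not complete for $D = 2$ (the context 10 has no suffix in it).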

Definition 1: The actual next-symbol probabilities for a bounded memory tree source with suffix set $S$ and parameter vector $\Theta_S = \{\theta_s : s \in S\}$ are
$$P_a(X_t = 1 \mid x_{t-D}^{t-1}, S, \Theta_S) = 1 - P_a(X_t = 0 \mid x_{t-D}^{t-1}, S, \Theta_S) = \theta_{\beta_S(x_{t-D}^{t-1})} \quad \text{for all } t.$$

The actual block probabilities are now products of actual next-symbol probabilities, i.e.,
$$P_a(X_1^t = x_1^t \mid x_{1-D}^0, S, \Theta_S) = \prod_{\tau=1}^{t} P_a(X_\tau = x_\tau \mid x_{\tau-D}^{\tau-1}, S, \Theta_S).$$
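The product formula can be evaluated directly. The following Python sketch uses the same string convention as before (most recent symbol last) and stores $\theta_s$, the probability of a 1 in state $s$, in a dictionary `theta`; `block_prob` and `next_symbol_prob` are hypothetical helpers, and exact `Fraction` arithmetic is used only so that the result can be checked by hand.

```python
from fractions import Fraction

def next_symbol_prob(S, theta, context, symbol):
    """P_a(X_t = symbol | context, S, Theta_S) with theta[s] = P(1 | s)."""
    s = next(s for s in S if context.endswith(s))   # s = beta_S(context)
    return theta[s] if symbol == "1" else 1 - theta[s]

def block_prob(S, theta, past, x, D):
    """P_a(x_1^t | x_{1-D}^0, S, Theta_S) as a product of next-symbol probabilities.

    past holds x_{1-D} ... x_0 (length D) and x holds x_1 ... x_t.
    """
    history = past + x
    p = Fraction(1)
    for t in range(len(x)):
        context = history[t:t + D]                  # x_{tau-D}^{tau-1}
        p *= next_symbol_prob(S, theta, context, history[t + D])
    return p

S = {"00", "10", "1"}
theta = {"00": Fraction(1, 2), "10": Fraction(1, 4), "1": Fraction(3, 4)}
print(block_prob(S, theta, "00", "101", 2))         # 1/32
```

With $x_{-1}x_0 = 00$ and $x_1^3 = 101$, the contexts are 00, 01 and 10, whose suffixes in $S$ are 00, 1 and 10, so the block probability is $\theta_{00}(1-\theta_1)\theta_{10} = \frac{1}{2}\cdot\frac{1}{4}\cdot\frac{1}{4} = \frac{1}{32}$.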

All sources with the same suffix set are said to have the same model. The set of all tree models having memory not larger than $D$ is called the model class $\mathcal{C}_D$. A model in this class can be specified with a natural code that encodes the suffix set $S$ recursively. The code of $S$ is the code of the empty string $\lambda$. The code of a string $s$ is void if $l(s) = D$; otherwise, it is 0 if $s \in S$, and 1 followed by the codes of the strings $0s$ and $1s$ if $s \notin S$. If we use this natural code, the number of bits needed to specify a model $S \in \mathcal{C}_D$ is equal to $\Gamma_D(S)$, where

Definition 2: $\Gamma_D(S)$, the cost of a model $S$ with respect to the model class $\mathcal{C}_D$, is defined as
$$\Gamma_D(S) = |S| - 1 + |\{s : s \in S,\ l(s) \ne D\}|,$$
where it is assumed that $S \in \mathcal{C}_D$.
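The recursive natural code and the cost $\Gamma_D(S)$ can be compared in a few lines. In this hypothetical Python sketch, the children of node $s$ are formed by prepending a symbol (giving $0s$ and $1s$), as in the text, and `natural_code` and `model_cost` are illustrative names. For $S = \{00, 10, 1\}$ and $D = 2$ the code is 110, three bits, which equals $\Gamma_2(S) = 3 - 1 + 1$.

```python
def natural_code(S, D, s=""):
    """Natural code of suffix set S, starting from the empty string lambda.

    A node at depth D contributes no bits; otherwise emit '0' if s is in S,
    or '1' followed by the codes of the children 0s and 1s.
    """
    if len(s) == D:
        return ""
    if s in S:
        return "0"
    return "1" + natural_code(S, D, "0" + s) + natural_code(S, D, "1" + s)

def model_cost(S, D):
    """Gamma_D(S) = |S| - 1 + |{s in S : l(s) != D}| (Definition 2)."""
    return len(S) - 1 + sum(1 for s in S if len(s) != D)

S = {"00", "10", "1"}
code = natural_code(S, 2)
print(code, len(code), model_cost(S, 2))   # 110 3 3
```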

So far, we have proposed a model for the source. We already know that arithmetic coding is a good sequential method, in the sense that its individual coding redundancy is no more than 2 bits. Our task now is to find a distribution estimate that bridges the model and the arithmetic coding method. In the following, we define the coding redundancy, describe the redundancy upper bound of arithmetic coding, and then present a weighting method as a distribution estimate.

3. Codes and Redundancy

We assume that both the encoder and the decoder have access to the past source symbols $x_{1-D}^0 = x_{1-D} \cdots x_{-1} x_0$, so that the suffix that determines the probability distribution of the first source symbols is implicitly available to them. We denote the functional relationship between the source sequence and the codeword by $c^L(x_1^T \mid x_{1-D}^0)$. The length of the codeword, in binary digits, is denoted by $L(x_1^T \mid x_{1-D}^0)$. We restrict ourselves to prefix codes here.

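The codeword lengths $L(x_1^T \mid x_{1-D}^0)$ determine the individual redundancies.

Definition 3: The individual redundancy $\rho(x_1^T \mid x_{1-D}^0, S, \Theta_S)$ of a sequence $x_1^T$ given the past symbols $x_{1-D}^0$, with respect to a source with model $S \in \mathcal{C}_D$ and parameter vector $\Theta_S$, is defined as
$$\rho(x_1^T \mid x_{1-D}^0, S, \Theta_S) = L(x_1^T \mid x_{1-D}^0) - \log_2 \frac{1}{P_a(x_1^T \mid x_{1-D}^0, S, \Theta_S)}.$$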

This estimator has properties that are listed in the lemma that follows.

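Lemma 1: The K-T probability estimator $P_e(a,b)$

  1. can be computed sequentially, i.e., $P_e(0,0) = 1$, and for $a \ge 0$ and $b \ge 0$,
  $$P_e(a+1,b) = \frac{a+\frac{1}{2}}{a+b+1}\,P_e(a,b) \quad\text{and}\quad P_e(a,b+1) = \frac{b+\frac{1}{2}}{a+b+1}\,P_e(a,b);$$

  2. satisfies, for $a + b \ge 1$, the following inequality:
  $$P_e(a,b) \ge \frac{1}{2\sqrt{a+b}} \left(\frac{a}{a+b}\right)^{a} \left(\frac{b}{a+b}\right)^{b}.$$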
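Both parts of Lemma 1 can be checked numerically. The Python sketch below is our own illustration (the function names are hypothetical): it builds $P_e(a,b)$ symbol by symbol with the sequential update, using exact fractions, and evaluates the right-hand side of the inequality in floating point.

```python
from fractions import Fraction
from math import sqrt

def kt_sequential(seq):
    """K-T estimate P_e(a, b) of a binary string via the updates of Lemma 1.

    a and b count the zeros and ones seen so far.
    """
    a = b = 0
    p = Fraction(1)                                    # P_e(0, 0) = 1
    for sym in seq:
        if sym == "0":
            p *= Fraction(2 * a + 1, 2 * (a + b + 1))  # (a + 1/2) / (a + b + 1)
            a += 1
        else:
            p *= Fraction(2 * b + 1, 2 * (a + b + 1))  # (b + 1/2) / (a + b + 1)
            b += 1
    return p

def kt_lower_bound(a, b):
    """Right-hand side of the inequality in Lemma 1 (requires a + b >= 1)."""
    n = a + b
    return (a / n) ** a * (b / n) ** b / (2 * sqrt(n))

print(kt_sequential("01"), kt_sequential("10"), kt_sequential("00"))  # 1/8 1/8 3/8
```

Note that $P_e$ depends only on the counts $a$ and $b$, not on the order of the symbols: 01 and 10 both give 1/8. For $a + b = 1$ the inequality holds with equality, since both sides equal 1/2.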

6. Coding for an Unknown Tree Source

A. Definition of the Context-Tree Weighting Method

Consider the case where we have to compress a sequence which is (supposed to be) generated by a tree source whose suffix set $S \in \mathcal{C}_D$ and parameter vector $\Theta_S$ are unknown to the encoder and the decoder. We will define a weighted coding distribution for this situation.

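Definition 5: The context tree $\mathcal{T}_D$ is a set of nodes labeled $s$, where $s$ is a (binary) string with length $l(s)$ such that $0 \le l(s) \le D$. Each node $s \in \mathcal{T}_D$ with $l(s) < D$ "splits up" into two nodes, $0s$ and $1s$. To each node $s$ belong the counts $a_s$ and $b_s$, the numbers of zeros and ones in the source sequence that were immediately preceded by $s$. The counts must satisfy $a_{0s} + a_{1s} = a_s$ and $b_{0s} + b_{1s} = b_s$.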

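Definition 6: To each node $s \in \mathcal{T}_D$, we assign a weighted probability $P_w^s$, which is defined as
$$P_w^s = \begin{cases} \dfrac{P_e(a_s, b_s) + P_w^{0s}\,P_w^{1s}}{2} & \text{for } 0 \le l(s) < D, \\[2mm] P_e(a_s, b_s) & \text{for } l(s) = D. \end{cases} \tag{3}$$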

The context tree together with the weighted probabilities of the nodes is called a weighted

context tree.
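The recursion (3) can be evaluated in a batch fashion once the counts $(a_s, b_s)$ of every node are known. The Python sketch below is an illustration under stated assumptions, not the tutorial's implementation: contexts are strings with the most recent symbol last, `past` holds $x_{1-D}^0$, $P_e$ is obtained from the sequential rule of Lemma 1, and `kt` and `weighted_root` are hypothetical names. A node whose counts are both zero has $P_w^s = 1$ (as do all its descendants), so its subtree is not expanded.

```python
from collections import defaultdict
from fractions import Fraction

def kt(a, b):
    """P_e(a, b) via the sequential rule of Lemma 1 (a zeros, then b ones)."""
    p = Fraction(1)
    for i in range(a):
        p *= Fraction(2 * i + 1, 2 * (i + 1))
    for j in range(b):
        p *= Fraction(2 * j + 1, 2 * (a + j + 1))
    return p

def weighted_root(x, past, D):
    """Collect the counts (a_s, b_s) for x, then evaluate (3) at the root."""
    counts = defaultdict(lambda: [0, 0])         # node s -> [a_s, b_s]
    history = past + x                           # past = x_{1-D} ... x_0
    for t in range(len(x)):
        context, sym = history[t:t + D], history[t + D]
        for d in range(D + 1):                   # the D + 1 nodes on the path
            counts[context[D - d:]][int(sym)] += 1

    def pw(s):
        a, b = counts.get(s, (0, 0))
        if len(s) == D or a + b == 0:            # leaf, or unvisited subtree
            return kt(a, b)
        return (kt(a, b) + pw("0" + s) * pw("1" + s)) / 2

    return pw("")

print(weighted_root("10", "0", 1))   # 3/16
```

For $D = 1$, $x_0 = 0$ and $x_1^2 = 10$, the root has counts $(1,1)$ and the leaves $0$ and $1$ have counts $(0,1)$ and $(1,0)$, so $P_w^\lambda = \frac{1}{2}\left(\frac{1}{8} + \frac{1}{2}\cdot\frac{1}{2}\right) = \frac{3}{16}$. With $D = 0$ the tree is a single leaf and $P_w^\lambda$ reduces to the K-T estimate of the whole sequence.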

We define our weighted coding distribution as
$$P_c(x_1^t \mid x_{1-D}^0) := P_w^\lambda(x_1^t \mid x_{1-D}^0) \quad \text{for all } x_1^t \in \{0,1\}^t,\ t = 0, 1, \ldots, T, \tag{4}$$
where $\lambda$ is the root node of the context tree $\mathcal{T}_D$.

We can check that the weighted coding distribution satisfies (1).
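Condition (1) is not reproduced in this excerpt. Assuming it is the usual requirement on a sequential coding distribution, namely $P_c(\phi \mid x_{1-D}^0) = 1$ and $P_c(x_1^{t-1} \mid x_{1-D}^0) = P_c(x_1^{t-1}0 \mid x_{1-D}^0) + P_c(x_1^{t-1}1 \mid x_{1-D}^0)$, the following hypothetical Python sketch checks it exhaustively for short sequences, with exact fractions so that equality can be tested directly. The function `p_c` computes (4) by evaluating (3) recursively.

```python
from fractions import Fraction
from itertools import product

def kt(a, b):
    """P_e(a, b) via the sequential rule of Lemma 1 (a zeros, then b ones)."""
    p = Fraction(1)
    for i in range(a):
        p *= Fraction(2 * i + 1, 2 * (i + 1))
    for j in range(b):
        p *= Fraction(2 * j + 1, 2 * (a + j + 1))
    return p

def p_c(x, past, D):
    """P_c(x_1^t | x_{1-D}^0) = P_w^lambda as in (4); past has length D."""
    h = past + x

    def counts(s):
        # zeros and ones of x whose D preceding symbols end with s
        syms = [h[t + D] for t in range(len(x)) if h[t:t + D].endswith(s)]
        return syms.count("0"), syms.count("1")

    def pw(s):
        a, b = counts(s)
        if len(s) == D or a + b == 0:
            return kt(a, b)
        return (kt(a, b) + pw("0" + s) * pw("1" + s)) / 2

    return pw("")

# Assumed form of (1): P_c(empty) = 1, and the two one-symbol extensions
# of any x_1^{t-1} add up to P_c(x_1^{t-1}).
D, past = 2, "01"
assert p_c("", past, D) == 1
for T in range(5):
    for bits in product("01", repeat=T):
        x = "".join(bits)
        assert p_c(x + "0", past, D) + p_c(x + "1", past, D) == p_c(x, past, D)
```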

B. An Upper Bound on the Redundancy

Definition 7: Let
$$\gamma(z) = \begin{cases} z & \text{for } 0 \le z < 1, \\ \frac{1}{2}\log_2 z + 1 & \text{for } z \ge 1. \end{cases}$$

The basic result concerning the context-tree weighting technique can be stated now.

Theorem 2: The individual redundancies with respect to any source with model $S \in \mathcal{C}_D$ and parameter vector $\Theta_S$ are upper-bounded by
$$\rho(x_1^T \mid x_{1-D}^0, S, \Theta_S) < \Gamma_D(S) + |S|\,\gamma\!\left(\frac{T}{|S|}\right) + 2$$
for all $x_1^T \in \{0,1\}^T$ and for any sequence of past symbols $x_{1-D}^0$. The three terms on the right-hand side of the inequality represent upper bounds on the model redundancy, the parameter redundancy and the coding redundancy, respectively.
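As a worked example, the Python sketch below evaluates the right-hand side of Theorem 2. Its inputs are $|S|$, $\Gamma_D(S)$ and $T$; the function names are hypothetical, and logarithms are taken to base 2. For $S = \{00, 10, 1\}$ with $D = 2$ and $T = 3072$, the bound is $3 + 3\gamma(1024) + 2 = 3 + 18 + 2 = 23$ bits.

```python
from math import log2

def gamma(z):
    """gamma(z) of Definition 7 (logarithm to base 2)."""
    return z if z < 1 else 0.5 * log2(z) + 1

def redundancy_bound(num_leaves, cost, T):
    """Right-hand side of Theorem 2: Gamma_D(S) + |S| gamma(T/|S|) + 2.

    num_leaves = |S|, cost = Gamma_D(S), T = length of x_1^T.
    """
    return cost + num_leaves * gamma(T / num_leaves) + 2

# S = {00, 10, 1}, D = 2: |S| = 3 and Gamma_2(S) = 3.
print(redundancy_bound(3, 3, 3072))   # 23.0 (3 model + 18 parameter + 2 coding bits)
```

Since $\gamma(z)$ grows like $\frac{1}{2}\log_2 z$, the parameter term grows roughly like $\frac{|S|}{2}\log_2 T$, so the redundancy per source symbol vanishes as $T$ grows.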

Corollary: Using the coding distribution in (4), the codeword lengths $L(x_1^T \mid x_{1-D}^0)$ are upper-bounded by
$$L(x_1^T \mid x_{1-D}^0) < \min_{S \in \mathcal{C}_D} \left( \min_{\Theta_S} \log_2 \frac{1}{P_a(x_1^T \mid x_{1-D}^0, S, \Theta_S)} + \Gamma_D(S) + |S|\,\gamma\!\left(\frac{T}{|S|}\right) \right) + 2.$$
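The Corollary can be explored numerically for $D = 2$, where $\mathcal{C}_2$ contains exactly five models. The Python sketch below is an illustration with hypothetical names: it compares the ideal codeword length $\log_2 1/P_c(x_1^T \mid x_{1-D}^0)$ with the minimum in the Corollary, leaving out the final 2 bits that account for the arithmetic coder. It relies on two facts stated here without proof: the minimum over $\Theta_S$ is attained at $\theta_s = b_s/(a_s+b_s)$, and the ideal length already satisfies the bound without those 2 bits (this follows from (3) and Lemma 1).

```python
from fractions import Fraction
from math import log2

def kt(a, b):
    """P_e(a, b) via the sequential rule of Lemma 1."""
    p = Fraction(1)
    for i in range(a):
        p *= Fraction(2 * i + 1, 2 * (i + 1))
    for j in range(b):
        p *= Fraction(2 * j + 1, 2 * (a + j + 1))
    return p

def counts(x, past, D, s):
    """(a_s, b_s): zeros and ones of x whose D preceding symbols end with s."""
    h = past + x
    syms = [h[t + D] for t in range(len(x)) if h[t:t + D].endswith(s)]
    return syms.count("0"), syms.count("1")

def p_w(x, past, D, s=""):
    """Weighted probability P_w^s from (3); P_c is its value at the root."""
    a, b = counts(x, past, D, s)
    if len(s) == D or a + b == 0:
        return kt(a, b)
    return (kt(a, b) + p_w(x, past, D, "0" + s) * p_w(x, past, D, "1" + s)) / 2

def gamma(z):
    return z if z < 1 else 0.5 * log2(z) + 1

def ml_codelength(x, past, D, S):
    """min over Theta_S of log2 1/P_a, attained at theta_s = b_s / (a_s + b_s)."""
    total = 0.0
    for s in S:
        a, b = counts(x, past, D, s)
        for k in (a, b):
            if k:
                total += k * log2((a + b) / k)
    return total

def corollary_bound(x, past, D, models):
    """The minimum over S in the Corollary, without the final +2 bits."""
    T = len(x)
    return min(ml_codelength(x, past, D, S)
               + len(S) - 1 + sum(len(s) != D for s in S)    # Gamma_D(S)
               + len(S) * gamma(T / len(S))
               for S in models)

C2 = [{""}, {"0", "1"}, {"00", "10", "1"}, {"0", "01", "11"},
      {"00", "10", "01", "11"}]                   # all models in C_2
x, past = "0110110110110110", "00"
ideal = -log2(float(p_w(x, past, 2)))             # ideal length log2 1/P_c
assert ideal <= corollary_bound(x, past, 2, C2) + 1e-9
print(ideal, corollary_bound(x, past, 2, C2))
```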

C. Implementation of the Context-Tree Weighting Method

  1. Encoding and Decoding: We assume that a node $s \in \mathcal{T}_D$ contains the pair $(a_s, b_s)$, the estimated probability $P_e(a_s, b_s)$ and the weighted probability $P_w^s$. When a node is created, the counts $a_s$ and $b_s$ are set to 0 and the probabilities $P_e(a_s, b_s)$ and $P_w^s$ are set to 1. We then use the update scheme indicated by (2) and (3) to obtain the coding distribution. The encoding and decoding procedures then follow with the Elias algorithm.

  2. Complexity Issues: For each symbol $x_t$ we have to visit $D+1$ nodes, some of which have to be created first. It follows that the total number of allocated nodes cannot be more than $T(D+1)$, which makes the storage complexity at most linear in $T$. Note also that the number of nodes cannot be more than $2^{D+1} - 1$, the total number of nodes in $\mathcal{T}_D$.

The computational complexity, i.e., the number of additions, multiplications and divisions, is proportional to the number of nodes that are visited, which is $T(D+1)$. Therefore, this complexity is also linear in $T$.
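The implementation in items 1 and 2 can be sketched as a small Python class. This is our own illustration, not the tutorial's code: we assume that (2), which is not shown in this excerpt, is the sequential K-T update of Lemma 1, the class and method names are hypothetical, and the Elias (arithmetic) coder itself is omitted. The method `update` returns the conditional probability $P_w^\lambda(x_1^t)/P_w^\lambda(x_1^{t-1})$ that would be passed to that coder. Each call visits exactly $D+1$ nodes, deepest first, and the final assertions check the node counts stated above.

```python
import random
from fractions import Fraction

class ContextTree:
    """Sequential weighted context tree (hypothetical sketch).

    Node s -> [a_s, b_s, P_e(a_s, b_s), P_w^s]; a node is created as [0, 0, 1, 1].
    """

    def __init__(self, D, past):
        self.D = D
        self.history = past            # x_{1-D} ... x_0, most recent symbol last
        self.nodes = {}
        self.visits = 0                # total number of node visits

    def weighted_prob(self, s=""):
        """P_w^s, or 1 for a node that has not been created yet."""
        return self.nodes[s][3] if s in self.nodes else Fraction(1)

    def update(self, sym):
        """Process x_t = sym; return P_c(x_t | x_1^{t-1}, x_{1-D}^0)."""
        before = self.weighted_prob()
        context = self.history[len(self.history) - self.D:]
        for d in range(self.D, -1, -1):              # D + 1 nodes, deepest first
            s = context[len(context) - d:]
            a, b, pe, _ = self.nodes.get(s, [0, 0, Fraction(1), Fraction(1)])
            if sym == "0":                            # K-T update (Lemma 1)
                pe *= Fraction(2 * a + 1, 2 * (a + b + 1))
                a += 1
            else:
                pe *= Fraction(2 * b + 1, 2 * (a + b + 1))
                b += 1
            if d == self.D:                           # l(s) = D in (3)
                pw = pe
            else:                                     # l(s) < D in (3)
                pw = (pe + self.weighted_prob("0" + s) * self.weighted_prob("1" + s)) / 2
            self.nodes[s] = [a, b, pe, pw]
            self.visits += 1
        self.history += sym
        return self.weighted_prob() / before

tree = ContextTree(1, "0")
print(tree.update("1"), tree.update("0"), tree.weighted_prob())   # 1/2 3/8 3/16

random.seed(1)
D, T = 3, 200
big = ContextTree(D, "0" * D)
for _ in range(T):
    big.update(random.choice("01"))
assert big.visits == T * (D + 1)
assert len(big.nodes) <= min(T * (D + 1), 2 ** (D + 1) - 1)
```

The product of the returned conditional probabilities telescopes to $P_w^\lambda$ of the whole sequence, which is the coding distribution (4).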