



Material Type: Paper; Class: Information Theory; Subject: Electrical Engineering; University: Notre Dame; Term: Fall 2005;




Zijun Wu
November 9, 2005
Abstract
This tutorial gives an overview of the context-tree weighting method.
We confine our discussion to binary bounded memory tree sources and describe a sequential
universal data compression procedure that achieves a desirable coding distribution for tree
sources with unknown model and unknown parameters. The computational and storage
complexity of the procedure are both linear in the source sequence length.
1. Introduction
In our class, we learned Huffman codes. For small source alphabets, though, Huffman coding
is efficient only if we use long blocks of source symbols, and its codebook grows
exponentially with the block length. It is therefore desirable to have an efficient coding
procedure that works for long blocks of source symbols. Arithmetic coding achieves this goal.
When no a priori knowledge of the source probability distribution is available, one uses a
universal coding algorithm. For the finite memory source model, Weinberger, Lempel and Ziv
developed a sequential algorithm for universal coding, which involves an artificial
parameter K. To avoid choosing this parameter, Willems, Shtarkov and Tjalkens proposed a
weighting method. This method estimates the probability of the source sequence from the past
symbols, and the estimate then drives an arithmetic coder. We outline the basics of the
method in this tutorial.
2. Binary Bounded Memory Tree Sources
A binary tree source generates a sequence $x_{-\infty}^{\infty}$ of digits assuming values
in the alphabet {0,1}. We denote by $x_m^n$ the sequence $x_m x_{m+1} \cdots x_n$ and allow
m and n to be infinitely large.
The statistical behavior of a binary finite memory tree source can be described by means
of a suffix set S. This suffix set is a collection of binary strings s(k), with k=1,2,…,|S|. We
require it to be proper and complete. Properness of the suffix set implies that no string in S is
a suffix of any other string in S. Completeness guarantees that each semi-infinite sequence
(string) $\cdots x_{n-2} x_{n-1} x_n$ has a suffix that belongs to S. This suffix is unique since S is proper.
A bounded memory tree source has a suffix set S that satisfies $l(s) \le D$ for all $s \in S$. We
say that the source has memory not larger than D.
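To each suffix s in S there corresponds a parameter $\theta_s \in [0,1]$, the probability
that the next source symbol is 1 when the past symbols end in s. Together these parameters
form the parameter vector $\Theta_S$.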
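The properness and completeness conditions can be checked mechanically. The following is a minimal sketch, assuming contexts and suffixes are represented as Python strings of "0" and "1" with the most recent symbol rightmost; the function names and the example suffix set are illustrative and not part of the original text.

```python
from itertools import product

def is_proper(S):
    """No string in S is a suffix of another string in S."""
    return not any(s != t and t.endswith(s) for s in S for t in S)

def is_complete(S, D):
    """Every length-D context ends in some member of S.

    This finite check is sufficient only when every string in S has
    length at most D (the bounded memory assumption).
    """
    return all(
        any("".join(ctx).endswith(s) for s in S)
        for ctx in product("01", repeat=D)
    )

def suffix_of(context, S):
    """Return the unique member of S that is a suffix of the context."""
    matches = [s for s in S if context.endswith(s)]
    assert len(matches) == 1, "S must be proper and complete"
    return matches[0]

S = {"00", "10", "1"}          # illustrative suffix set with memory 2
print(is_proper(S), is_complete(S, 3), suffix_of("110", S))
```

For a set that is not proper, such as {"0", "10", "1"}, `is_proper` returns False because "0" is a suffix of "10".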
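Definition 1 : The actual next-symbol probabilities for a bounded memory tree source
with suffix set S and parameter vector $\Theta_S$ are
$$P_a(X_t = 1 \mid x_{t-D}^{t-1}, S, \Theta_S) = 1 - P_a(X_t = 0 \mid x_{t-D}^{t-1}, S, \Theta_S) = \theta_s,$$
where s is the unique suffix of $x_{t-D}^{t-1}$ in S and $\theta_s$ is the corresponding component of $\Theta_S$.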
The actual block probabilities are now products of actual next-symbol probabilities, i.e.
$$P_a(X_1^t = x_1^t \mid x_{1-D}^0, S, \Theta_S) = \prod_{\tau=1}^{t} P_a(X_\tau = x_\tau \mid x_{\tau-D}^{\tau-1}, S, \Theta_S).$$
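As an illustration of the product formula, the sketch below multiplies next-symbol probabilities along a sequence. It assumes a hypothetical dictionary `theta` that maps each suffix in S to the probability that the next symbol is 1; the suffix set, the parameter values, the past symbols and the sequence are all chosen only for this example.

```python
from fractions import Fraction

def block_probability(x, past, theta):
    """Actual block probability of x given the past symbols.

    theta maps each suffix s in S to P(next symbol = 1 | suffix s).
    Strings are written with the most recent symbol rightmost.
    """
    history, prob = past, Fraction(1)
    for symbol in x:
        # The suffix set is proper and complete, so exactly one s matches.
        s = next(s for s in theta if history.endswith(s))
        prob *= theta[s] if symbol == "1" else 1 - theta[s]
        history += symbol
    return prob

# Illustrative parameters for the suffix set {00, 10, 1}.
theta = {"1": Fraction(1, 10), "10": Fraction(3, 10), "00": Fraction(1, 2)}
print(block_probability("0110100", "010", theta))   # 11907/2000000
```

Exact fractions are used so that the product 0.7 · 0.5 · 0.1 · 0.9 · 0.3 · 0.9 · 0.7 = 0.0059535 is reproduced without rounding.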
All sources with the same suffix set are said to have the same model. The set of all tree
models having memory not larger than D is called the model class $\mathcal{C}_D$. It is possible to specify
a model in this model class by a natural code that encodes the suffix set S recursively. The
code of S is the code of the empty string λ. The code of a string s is empty if l(s) = D;
otherwise, it is 0 if $s \in S$ and 1 followed by the codes of the strings 0s and 1s if $s \notin S$. If we use
this natural code, the number of bits needed to specify a model $S \in \mathcal{C}_D$ is equal
to $\Gamma_D(S)$, defined as follows.
Definition 2 : $\Gamma_D(S)$, the cost of a model S with respect to model class $\mathcal{C}_D$, is defined as
$$\Gamma_D(S) = |S| - 1 + |\{s : s \in S,\ l(s) \ne D\}|,$$
where it is assumed that $S \in \mathcal{C}_D$.
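The recursive natural code and the cost formula can be compared directly. This is a minimal sketch under the assumption that suffixes are Python strings with the most recent symbol rightmost, so that the strings 0s and 1s are formed by prepending a symbol; the function names are illustrative.

```python
def natural_code(S, D):
    """Recursively encode a proper and complete suffix set S with memory <= D."""
    def code(s):
        if len(s) == D:        # strings at depth D need no bits
            return ""
        if s in S:             # s is a suffix of the model
            return "0"
        # s is an internal string: emit 1, then encode 0s and 1s
        return "1" + code("0" + s) + code("1" + s)
    return code("")            # start from the empty string

def model_cost(S, D):
    """Gamma_D(S) = |S| - 1 + number of suffixes whose length differs from D."""
    return len(S) - 1 + sum(1 for s in S if len(s) != D)

S = {"00", "10", "1"}
for D in (2, 3):
    bits = natural_code(S, D)
    print(D, bits, len(bits), model_cost(S, D))
# prints "2 110 3 3" and then "3 11000 5 5"
```

In both cases the length of the natural code equals the cost given by Definition 2.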
So far, we have proposed a model for the source. We also know that arithmetic coding
is a good sequential method in the sense that its individual coding redundancy is no more than
2 bits. Our task now is to find a distribution estimate that bridges the model and the
arithmetic coder. In the following, we define the coding redundancy, describe the
redundancy upper bound of arithmetic coding, and then present a weighting method as a
distribution estimate.
3. Codes and Redundancy
We assume that both the encoder and the decoder have access to the past source symbols
$x_{1-D}^0 = x_{1-D} \cdots x_{-1} x_0$, so that the suffix that determines the probability distribution
of the first source symbols is implicitly available to them. We denote the functional relationship
between source sequence and codeword by $c^L(x_1^T \mid x_{1-D}^0)$. The length of the codeword, in
binary digits, is denoted by $L(x_1^T \mid x_{1-D}^0)$. We restrict ourselves to prefix codes here.
The codeword lengths $L(x_1^T \mid x_{1-D}^0)$ determine the individual redundancies.
Definition 3 : The individual redundancy $\rho(x_1^T \mid x_{1-D}^0, S, \Theta_S)$ of a sequence $x_1^T$ given
the past symbols $x_{1-D}^0$, with respect to a source with model $S \in \mathcal{C}_D$ and parameter vector $\Theta_S$, is
defined as
$$\rho(x_1^T \mid x_{1-D}^0, S, \Theta_S) = L(x_1^T \mid x_{1-D}^0) - \log_2 \frac{1}{P_a(x_1^T \mid x_{1-D}^0, S, \Theta_S)}.$$
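The K-T estimator $P_e(a, b)$, which assigns a probability to a sequence containing a zeros
and b ones, has the properties listed in the lemma that follows.
Lemma 1 : The K-T probability estimator $P_e(a, b)$ can be computed sequentially, i.e.
$P_e(0, 0) = 1$ and, for $a \ge 0$ and $b \ge 0$,
$$P_e(a+1, b) = \frac{a + 1/2}{a + b + 1} \cdot P_e(a, b) \quad \text{and} \quad P_e(a, b+1) = \frac{b + 1/2}{a + b + 1} \cdot P_e(a, b),$$
and it satisfies, for $a + b \ge 1$, the inequality
$$P_e(a, b) \ge \frac{1}{2} \cdot \frac{1}{\sqrt{a+b}} \left(\frac{a}{a+b}\right)^{a} \left(\frac{b}{a+b}\right)^{b}.$$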
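The sequential form of the K-T estimator is easy to evaluate exactly. The sketch below is one possible way to do so with Python's `fractions` module; the function names are illustrative, and the lower-bound helper uses floating point, so the comparison allows a small tolerance.

```python
from fractions import Fraction
from math import sqrt

def kt_estimate(a, b):
    """P_e(a, b) by the sequential recursion, starting from P_e(0, 0) = 1."""
    p = Fraction(1)
    zeros = ones = 0
    for _ in range(a):   # append a zero: factor (zeros + 1/2) / (zeros + ones + 1)
        p *= Fraction(2 * zeros + 1, 2 * (zeros + ones + 1))
        zeros += 1
    for _ in range(b):   # append a one: factor (ones + 1/2) / (zeros + ones + 1)
        p *= Fraction(2 * ones + 1, 2 * (zeros + ones + 1))
        ones += 1
    return p

def kt_lower_bound(a, b):
    """Right-hand side of the inequality in the lemma, valid for a + b >= 1."""
    n = a + b
    return 0.5 / sqrt(n) * (a / n) ** a * (b / n) ** b

print(kt_estimate(1, 1), kt_estimate(2, 1), kt_estimate(4, 3))
# 1/8 1/16 5/2048
print(float(kt_estimate(4, 3)) >= kt_lower_bound(4, 3))   # True
```

The order in which zeros and ones are appended does not change the result, since the estimate depends only on the counts a and b.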
6. Coding for an Unknown Tree Source
A. Definition of the Context-Tree Weighting Method
Consider the case where we have to compress a sequence which is (supposed to be)
generated by a tree source, whose suffix set $S \in \mathcal{C}_D$ and parameter vector $\Theta_S$ are unknown to
the encoder and the decoder. We will define a weighted coding distribution for this situation.
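Definition 5 : The context tree $\mathcal{T}_D$ is a set of nodes labeled s, where s is a (binary) string
with length l(s) such that $0 \le l(s) \le D$. Each node $s \in \mathcal{T}_D$ with l(s) < D “splits up” into
two nodes, 0s and 1s. Each node s carries counts $a_s \ge 0$ and $b_s \ge 0$, the numbers of zeros
and ones that have followed the context s so far. The counts must satisfy
$a_{0s} + a_{1s} = a_s$ and $b_{0s} + b_{1s} = b_s$.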
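Definition 6 : To each node $s \in \mathcal{T}_D$, we assign a weighted probability $P_w^s$ which is
defined as:
$$P_w^s = \begin{cases} \frac{1}{2} P_e(a_s, b_s) + \frac{1}{2} P_w^{0s} P_w^{1s} & \text{for } 0 \le l(s) < D, \\ P_e(a_s, b_s) & \text{for } l(s) = D. \end{cases} \quad (3)$$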
The context tree together with the weighted probabilities of the nodes is called a weighted
context tree.
We define our weighted coding distribution as
$$P_c(x_1^t \mid x_{1-D}^0) = P_w^{\lambda}(x_1^t \mid x_{1-D}^0) \quad \text{for all } x_1^t \in \{0,1\}^t,\ t = 0, 1, \ldots, T, \quad (4)$$
where λ is the root node of the context tree $\mathcal{T}_D$.
We can check that the weighted coding distribution satisfies (1).
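To make the weighting recursion concrete, the following sketch evaluates the root probability $P_w^{\lambda}$ directly from the counts, without any sequential updating. It assumes sequences are Python strings with the most recent symbol rightmost; the function names and the example input (past symbols 010, sequence 0110100, D = 3) are chosen for illustration.

```python
from fractions import Fraction

def kt(a, b):
    """K-T estimate P_e(a, b) for a zeros and b ones."""
    p, n = Fraction(1), 0
    for count in (a, b):
        for i in range(count):
            p *= Fraction(2 * i + 1, 2 * (n + 1))
            n += 1
    return p

def weighted_probability(x, past, D):
    """Root weighted probability of x given the D past symbols."""
    assert len(past) == D
    full = past + x
    counts = {}                               # node label s -> [a_s, b_s]
    for t in range(D, len(full)):
        for depth in range(D + 1):
            s = full[t - depth:t]             # the depth symbols preceding position t
            counts.setdefault(s, [0, 0])[int(full[t])] += 1

    def pw(s):
        a, b = counts.get(s, (0, 0))
        if len(s) == D:
            return kt(a, b)
        if a + b == 0:                        # unvisited subtree: all probabilities are 1
            return Fraction(1)
        return (kt(a, b) + pw("0" + s) * pw("1" + s)) / 2

    return pw("")

print(weighted_probability("0110100", "010", 3))   # 95/32768
```

In this example the root counts are (4, 3), so the unweighted estimate is $P_e(4,3) = 5/2048$, while the weighted value 95/32768 also accounts for the deeper contexts.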
B. An Upper Bound on the Redundancy
Definition 7 : Let
$$\gamma(z) = \begin{cases} z & \text{for } 0 \le z < 1, \\ \frac{1}{2} \log_2 z + 1 & \text{for } z \ge 1. \end{cases}$$
The basic result concerning the context-tree weighting technique can be stated now.
Theorem 2 : The individual redundancies with respect to any source with model $S \in \mathcal{C}_D$
and parameter vector $\Theta_S$ are upper-bounded by
$$\rho(x_1^T \mid x_{1-D}^0, S, \Theta_S) < \Gamma_D(S) + |S| \, \gamma\!\left(\frac{T}{|S|}\right) + 2$$
for all $x_1^T \in \{0,1\}^T$, for any sequence of past symbols $x_{1-D}^0$. The three terms on the
right-hand side of the inequality are upper bounds on the model redundancy, the parameter
redundancy and the coding redundancy, respectively.
Corollary : Using the coding distribution in (4), the codeword lengths $L(x_1^T \mid x_{1-D}^0)$ are
upper-bounded by
$$L(x_1^T \mid x_{1-D}^0) < \min_{S \in \mathcal{C}_D} \left( \min_{\Theta_S} \log_2 \frac{1}{P_a(x_1^T \mid x_{1-D}^0, S, \Theta_S)} + \Gamma_D(S) + |S| \, \gamma\!\left(\frac{T}{|S|}\right) \right) + 2.$$
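To get a feeling for the size of the three redundancy terms, the bound can be evaluated numerically. The sketch below assumes the bound has the form model cost plus parameter term plus 2 bits, with logarithms to base 2; the function names and the example suffix set are illustrative.

```python
from math import log2

def gamma(z):
    """gamma(z) = z for 0 <= z < 1, and (1/2) log2(z) + 1 for z >= 1."""
    return z if z < 1 else 0.5 * log2(z) + 1

def redundancy_bound(S, D, T):
    """Model cost + parameter redundancy bound + 2 bits of coding redundancy."""
    model = len(S) - 1 + sum(1 for s in S if len(s) != D)
    parameter = len(S) * gamma(T / len(S))
    return model + parameter + 2

S = {"00", "10", "1"}                 # illustrative model with |S| = 3
for T in (12, 1000, 100000):
    print(T, redundancy_bound(S, 3, T))
```

For T = 12 the bound is 5 + 3 · 2 + 2 = 13 bits. The parameter term grows only logarithmically in T, so the redundancy per source symbol vanishes as the sequence gets longer.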
C. Implementation of the Context-Tree Weighting Method
For each node s of the context tree, the encoder and the decoder store the counts $a_s$ and $b_s$, the
estimated probability $P_e(a_s, b_s)$ and the weighted probability $P_w^s$. When a node is created,
the counts $a_s$ and $b_s$ are set to 0 and the probabilities $P_e(a_s, b_s)$ and $P_w^s$ are set to 1. We then use
the update scheme indicated by (2) and (3) to obtain the coding distribution. Encoding
and decoding then proceed with the Elias algorithm.
Processing one source symbol visits the D + 1 nodes on the path from the root to a node at
depth D, and some of these nodes may
have to be created first. From this it follows that the total number of allocated nodes cannot be
more than T(D + 1). This makes the storage complexity not more than linear in T. Note also
that the number of nodes cannot be more than $2^{D+1} - 1$, the total number of nodes in $\mathcal{T}_D$.
The computational complexity, i.e. the number of additions, multiplications, and
divisions, is proportional to the number of nodes that are visited, which is T(D + 1).
Therefore, this complexity is also linear in T.
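The storage and update scheme described above can be sketched as follows. This is not a complete compressor: the arithmetic (Elias) coder is omitted and only the root probability is tracked. The class and function names are hypothetical, nodes are kept in a dictionary keyed by their label, exact fractions replace the finite-precision arithmetic a practical implementation would need, and D is assumed to be at least 1.

```python
from fractions import Fraction

class Node:
    """Per-node storage: counts a, b and probabilities P_e, P_w."""
    def __init__(self):
        self.a = self.b = 0
        self.pe = self.pw = Fraction(1)   # a new node starts with probabilities 1

def ctw_update(nodes, context, symbol, D):
    """Update the D + 1 nodes on the context path; return the new root P_w."""
    for depth in range(D, -1, -1):        # deepest node first, root last
        s = context[len(context) - depth:]
        if s not in nodes:
            nodes[s] = Node()
        node = nodes[s]
        # Sequential K-T update: factor (count + 1/2) / (a + b + 1).
        count = node.a if symbol == "0" else node.b
        node.pe *= Fraction(2 * count + 1, 2 * (node.a + node.b + 1))
        if symbol == "0":
            node.a += 1
        else:
            node.b += 1
        if depth == D:
            node.pw = node.pe
        else:                             # children that do not exist yet count as 1
            p0 = nodes["0" + s].pw if "0" + s in nodes else Fraction(1)
            p1 = nodes["1" + s].pw if "1" + s in nodes else Fraction(1)
            node.pw = (node.pe + p0 * p1) / 2
    return nodes[""].pw

def ctw_sequence_probability(x, past, D):
    """Run the updates over x; return the root probability and node count."""
    nodes, history, root = {}, past, Fraction(1)
    for symbol in x:
        root = ctw_update(nodes, history[len(history) - D:], symbol, D)
        history += symbol
    return root, len(nodes)

prob, allocated = ctw_sequence_probability("0110100", "010", 3)
print(prob, allocated)                    # 95/32768 13
```

In this example 13 nodes are allocated, well below the limit T(D + 1) = 28. The ratio of the root probability after and before an update is the conditional probability that would be handed to the arithmetic coder for that symbol.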