A Tutorial of The Context Tree Weighting Method: Basic Properties

Zijun Wu

November 9, 2005

Abstract

This tutorial gives an overview of the Context Tree Weighting method. We confine our discussion to binary bounded memory tree sources and describe a sequential universal data compression procedure that achieves a desirable coding distribution for tree sources with unknown model and unknown parameters. The computational and storage complexity of the proposed procedure are both linear in the source sequence length.

1. Introduction

In class, we studied Huffman codes. For small source alphabets, however, Huffman coding is efficient only if we encode long blocks of source symbols, and its code table grows exponentially with the block length. It is therefore desirable to have a coding procedure that remains efficient for long blocks of source symbols. Arithmetic coding achieves this goal. When no a priori knowledge of the source probability distribution is available, one uses a universal coding algorithm. For the finite memory source model, Weinberger, Ziv and Lempel developed a sequential algorithm for universal coding, which involves an artificial parameter K. To avoid having to choose this parameter, Willems, Shtarkov and Tjalkens proposed a weighting method. This method estimates the source probability given the past symbols and then combines the estimate with arithmetic coding. We outline the basics of the method in this tutorial.

2. Binary Bounded Memory Tree Sources

A binary tree source generates a sequence $x_{-\infty}^{\infty}$ of digits assuming values in the alphabet $\{0,1\}$. We denote by $x_m^n$ the sequence $x_m x_{m+1} \cdots x_n$ and allow $m$ and $n$ to be infinitely large. For $n<m$ the sequence $x_m^n$ is empty, denoted by $\phi$.

The statistical behavior of a binary finite memory tree source can be described by means of a suffix set $S$. This suffix set is a collection of binary strings $s(k)$, with $k = 1, 2, \ldots, |S|$. We require it to be proper and complete. Properness of the suffix set implies that no string in $S$ is a suffix of any other string in $S$. Completeness guarantees that each semi-infinite sequence (string) $\cdots x_{n-2} x_{n-1} x_n$ has a suffix that belongs to $S$. This suffix is unique since $S$ is proper.
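The two requirements on a suffix set can be checked mechanically. The following is a minimal sketch, not part of the method itself: it assumes strings are written with the most recent symbol last, and the function names `is_proper` and `is_complete` are our own illustrative choices. Completeness is tested on all binary strings whose length equals the longest string in the set, which suffices because no string in the set is longer than that.

```python
from itertools import product


def is_proper(S):
    """True if no string in S is a suffix of a different string in S."""
    return not any(a != b and a.endswith(b) for a in S for b in S)


def is_complete(S):
    """True if every sufficiently long binary past has a suffix in S."""
    if not S:
        return False  # an empty set cannot cover any sequence
    depth = max(len(s) for s in S)
    # Checking all pasts of length `depth` is enough, since no s is longer.
    return all(
        any("".join(bits).endswith(s) for s in S)
        for bits in product("01", repeat=depth)
    )


if __name__ == "__main__":
    for S in (["00", "10", "1"], ["0", "10"], ["00", "1"]):
        print(S, "proper:", is_proper(S), "complete:", is_complete(S))
```

For instance, the set {00, 10, 1} passes both checks, {0, 10} is not proper because 0 is a suffix of 10, and {00, 1} is not complete because a past ending in 10 has no suffix in the set.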

A bounded memory tree source has a suffix set $S$ that satisfies $l(s) \le D$ for all $s \in S$, where $l(s)$ denotes the length of the string $s$. We say that the source has memory not larger than $D$.

We define a suffix function $\beta_S(\cdot)$, which maps semi-infinite sequences onto their unique suffix $s$ in $S$.
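To make the suffix function concrete, here is a small sketch. It assumes the past is given as a finite string of at least $D$ symbols with the most recent symbol last; the name `suffix_of` and the example set are hypothetical and chosen only for illustration.

```python
def suffix_of(past, S):
    """Return the unique string in S that is a suffix of `past`.

    Plays the role of the suffix function for a finite stretch of the past.
    Raises ValueError if S is not proper and complete for this past.
    """
    matches = [s for s in S if past.endswith(s)]
    if len(matches) != 1:
        raise ValueError("expected exactly one matching suffix, got %d" % len(matches))
    return matches[0]


if __name__ == "__main__":
    S = ["00", "10", "1"]  # illustrative suffix set with memory at most 2
    for past in ("0100", "0110", "0111"):
        print(past, "->", suffix_of(past, S))
```

Because the set is proper and complete, exactly one string matches for every sufficiently long past, so the lookup never has to break ties.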

Definition 1: The actual next-symbol probabilities for a bounded memory tree source

$\theta_{\beta_S(x_{t-D}^{t-1})}$ for all $t$.
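As an illustration of how a tree source produces its output, the sketch below draws symbols one at a time. It assumes that the parameter $\theta_s$ attached to suffix $s$ is the probability that the next symbol is 1, which is our reading of the definition; the function name `generate`, the seed argument, and the parameter values are hypothetical.

```python
import random


def generate(theta, past, n, seed=0):
    """Draw n symbols from a tree source.

    theta maps each suffix s to theta_s, assumed here to be P(next symbol = 1).
    past is the initial context, most recent symbol last, and must be long
    enough to have a suffix in the set.
    """
    rng = random.Random(seed)
    seq = past
    for _ in range(n):
        # Look up the suffix of the current past that belongs to the set.
        s = next(s for s in theta if seq.endswith(s))
        seq += "1" if rng.random() < theta[s] else "0"
    return seq[len(past):]


if __name__ == "__main__":
    theta = {"00": 0.5, "10": 0.3, "1": 0.1}  # illustrative values only
    print(generate(theta, past="00", n=20))
```

Only the last $D$ symbols of the past influence each draw, which is what bounded memory means in practice.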

code of $S$ is the code of the empty string $\lambda$. The code of a string $s$ is void if $l(s) = D$;

$\rho(x_1^T \mid x_{1-D}^0, S, \Theta_S)$ of a sequence

$x_1^t \in \{0,1\}^t$, $t = 0, 1, \cdots, T$, where $\lambda$ is the root

$$\rho(x_1^T \mid x_{1-D}^0, S, \Theta_S) < \Gamma_D(S) + |S|\,\gamma\!\left(\frac{T}{|S|}\right) + 2 \quad \text{for all } x_1^T \in \{0,1\}^T.$$