Glen G. Langdon, Jr.

An Introduction to Arithmetic Coding

Arithmetic coding is a data compression technique that encodes data (the data string) by creating a code string which represents a fractional value on the number line between 0 and 1. The coding algorithm is symbolwise recursive; i.e., it operates upon and encodes (decodes) one data symbol per iteration or recursion. On each recursion, the algorithm successively partitions an interval of the number line between 0 and 1, and retains one of the partitions as the new interval. Thus, the algorithm successively deals with smaller intervals, and the code string, viewed as a magnitude, lies in each of the nested intervals. The data string is recovered by using magnitude comparisons on the code string to recreate how the encoder must have successively partitioned and retained each nested subinterval. Arithmetic coding differs considerably from the more familiar compression coding techniques, such as prefix (Huffman) codes. Also, it should not be confused with error control coding, whose object is to detect and correct errors in computer operations. This paper presents the key notions of arithmetic compression coding by means of simple examples.

1. Introduction

Arithmetic coding maps a string of data (source) symbols to a code string in such a way that the original data can be recovered from the code string. The encoding and decoding algorithms perform arithmetic operations on the code string. One recursion of the algorithm handles one data symbol. Arithmetic coding is actually a family of codes which share the property of treating the code string as a magnitude. For a brief history of the development of arithmetic coding, refer to Appendix 1.

Compression systems

The notion of compression systems captures the idea that data may be transformed into something which is encoded, then transmitted to a destination, then transformed back into the original data. Any data compression approach, whether employing arithmetic coding, Huffman codes, or any other coding technique, has a model which makes some assumptions about the data and the events encoded.

The code itself can be independent of the model. Some systems which compress waveforms (e.g., digitized speech) may predict the next value and encode the error. In this model the error and not the actual data is encoded. Typically, at the encoder side of a compression system, the data to be compressed feed a model unit. The model determines 1) the event(s) to be encoded, and 2) the estimate of the relative frequency (probability) of the events. The encoder accepts the event and some indication of its relative frequency and generates the code string.

A simple model is the memoryless model, where the data symbols themselves are encoded according to a single code. Another model is the first-order Markov model, which uses the previous symbol as the context for the current symbol.

Consider, for example, compressing English sentences. If the data symbol (in this case, a letter) “q” is the previous letter, we would expect the next letter to be “u.” The first-order Markov model is a dependent model; we have a different expectation for each symbol (or in the example, each letter), depending on the context. The context is, in a sense, a state governed by the past sequence of symbols. The purpose of a context is to provide a probability distribution, or statistics, for encoding (decoding) the next symbol.

Corresponding to the symbols are statistics. To simplify the discussion, consider a single-context model, i.e., the memoryless model. Data compression results from encoding the more-frequent symbols with short code-string length increases, and encoding the less-frequent events with long code-length increases. Let $c_i$ denote the number of occurrences of the ith symbol in a data string. For the memoryless model and a given code, let $\ell_i$ denote the length (in bits) of the code-string increase associated

© Copyright 1984 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.

IBM J. RES. DEVELOP. VOL. 28 NO. 2 MARCH 1984    135    GLEN G. LANGDON, JR.

Table 1 Example Huffman code.

Symbol   Codeword      Probability p   Cumulative
         (in binary)                   probability P
a        0             .100            .000
b        10            .010            .100
c        110           .001            .110
d        111           .001            .111

[Figure: Encoder. The encoder accepts the events to be encoded and generates the code string.]

with symbol i. The code-string length corresponding to the data string is obtained by replacing each data symbol with its associated length and summing the lengths:

$$\sum_i c_i \ell_i .$$

If $\ell_i$ is large for data symbols of high relative frequency (large values of $c_i$), the given code will almost surely fail to achieve compression. The wrong statistics (a popular symbol with a long length $\ell_i$) lead to a code string which may have more bits than the original data. For compression it is imperative to closely approximate the relative frequency of the more-frequent events. Denote the relative frequency of symbol i as $p_i$, where $p_i = c_i/N$, and N is the total number of symbols in the data string. If we use a fixed frequency for each data symbol value, the best we can compress (according to our given model) is to assign length $\ell_i$ as $-\log p_i$. Here, logarithms are taken to the base 2 and the unit of length is the bit. Knowing the ideal length for each symbol, we calculate the ideal code length for the given data string and memoryless model by replacing each instance of symbol i in the data string by length value $-\log p_i$, and summing the lengths, which gives $-\sum_i c_i \log p_i$ bits.
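
A small numerical check of these definitions, written as a sketch: it counts the occurrences $c_i$ in a data string, computes the code-string length $\sum_i c_i \ell_i$ for the codeword lengths of Table 1, and computes the ideal length obtained by charging $-\log_2 p_i$ bits per occurrence with $p_i = c_i/N$. The function names and the example string "aabc" are our own choices.

```python
import math
from collections import Counter

# Codeword lengths l_i of the Huffman code in Table 1.
TABLE1_LENGTHS = {"a": 1, "b": 2, "c": 3, "d": 3}


def code_length(data, lengths):
    """Code-string length sum_i c_i * l_i for a fixed code."""
    counts = Counter(data)  # c_i: occurrences of symbol i
    return sum(c * lengths[s] for s, c in counts.items())


def ideal_length(data):
    """Ideal memoryless length: each occurrence of i costs -log2(c_i / N) bits."""
    counts = Counter(data)
    n = len(data)
    return sum(-c * math.log2(c / n) for c in counts.values())


data = "aabc"
print(code_length(data, TABLE1_LENGTHS))  # 7  (1 + 1 + 2 + 3 bits)
print(ideal_length(data))                 # 6.0
```

For "aabc" the Table 1 code spends one bit more than the ideal, because the empirical frequencies (1/2, 1/4, 1/4) differ from the 1/2, 1/4, 1/8 that the codeword lengths assume.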

Let us now review the components of a compression system: the model structure for contexts and events, the statistics unit for estimation of the event statistics, and the encoder.

Model structure

In practice, the model is a finite-state machine which operates successively on each data symbol and determines the current event to be encoded and its context (i.e., which relative frequency distribution applies to the current event). Often, each event is the data symbol itself, but the structure can define other events from which the data string could be reconstructed. For example, one could define an event such as the run length of a succession of repeated symbols, i.e., the number of times the current symbol repeats itself.
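
As an illustration of a model whose events are not the data symbols themselves, the sketch below maps a data string to (symbol, run length) events and reconstructs the string from them. The function names are hypothetical, and a real model would additionally attach a context to each event.

```python
from itertools import groupby


def to_run_events(data):
    """Map a data string to (symbol, run length) events."""
    return [(sym, len(list(run))) for sym, run in groupby(data)]


def from_run_events(events):
    """Reconstruct the data string from its run-length events."""
    return "".join(sym * length for sym, length in events)


events = to_run_events("wwwwbbw")
print(events)  # [('w', 4), ('b', 2), ('w', 1)]
assert from_run_events(events) == "wwwwbbw"
```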

Statistics estimation

The estimation method computes the relative frequency distribution used for each context. The computation may be performed beforehand, or may be performed during the encoding process, typically by a counting technique. For Huffman codes, the event statistics are predetermined by the length of the event’s codeword.
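
One possible counting technique, sketched for a first-order Markov model: the previous symbol is the context, and each context keeps its own symbol counts, from which a probability estimate is read before each symbol and updated after it. Starting every count at 1 (so that no symbol is ever assigned probability 0) is our assumption, and the class and method names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


class ContextCounts:
    """Adaptive symbol counts kept separately for each context."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        # Assumption: every count starts at 1, so no symbol gets probability 0.
        self.counts = defaultdict(lambda: {s: 1 for s in self.alphabet})

    def probability(self, context, symbol):
        """Current estimate of P(symbol | context) from the counts."""
        table = self.counts[context]
        return Fraction(table[symbol], sum(table.values()))

    def update(self, context, symbol):
        """Count one more occurrence of `symbol` in `context`."""
        self.counts[context][symbol] += 1


# First-order Markov model: the previous symbol is the context.
model = ContextCounts("qu")
prev = None  # no previous symbol before the first one
for sym in "ququ":
    # An encoder would code `sym` with model.probability(prev, sym) here;
    # a decoder makes the same updates in the same (FIFO) order.
    model.update(prev, sym)
    prev = sym

print(model.probability("q", "u"))  # 3/4: "u" has followed "q" twice
```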

The notions of model structure and statistics are important because they completely determine the available compression. Consider applications where the compression model is complex, i.e., has several contexts and a need to adapt to the data string statistics. Due to the flexibility of arithmetic coding, for such applications the “compression problem” is equivalent to the “modeling problem.”

Desirable properties of a coding method

We now list some properties for which arithmetic coding is amply suited.

For most applications, we desire the first-in first-out (FIFO) property: Events are decoded in the same order as they are encoded. FIFO coding allows for adapting to the statistics of the data string. With last-in first-out (LIFO) coding, the last event encoded is the first event decoded, so adapting is difficult.

We desire no more than a small storage buffer at the encoder. Once events are encoded, we do not want the encoding of subsequent events to alter what has already been generated.

The encoding algorithm should be capable of accepting successive events from different probability distributions. Arithmetic coding has this capability. Moreover, the code acts directly on the probabilities, and can adapt “on the fly” to changing statistics. Traditional Huffman codes require the design of a different codeword set for different statistics.

An initial view of Huffman and arithmetic codes

We progress to a very simple arithmetic code by first using a prefix (Huffman) code as an example. Our purpose is to introduce the basic notions of arithmetic codes in a very simple setting.

Consider a four-symbol alphabet, for which the relative frequencies 1/2, 1/4, 1/8, and 1/8 call for respective codeword lengths of 1, 2, 3, and 3. Let us order the alphabet {a, b, c, d} according to relative frequency, and use the code of Table 1. The probability column has the binary fraction associated with the probability corresponding to the assigned length.
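
The prefix property is what allows a decoder to match codewords greedily, starting from the first bit. The sketch below encodes with the Table 1 codewords and decodes by growing a bit buffer until it equals some codeword; the names are ours.

```python
# Codewords of the Huffman code in Table 1.
TABLE1_CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def prefix_encode(data, code):
    """Substitute each data symbol by its codeword."""
    return "".join(code[sym] for sym in data)


def prefix_decode(bits, code):
    """Decode by matching bits against codewords, starting with the first bit."""
    lookup = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # prefix property: the first match is the only one
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("code string ends inside a codeword")
    return "".join(out)


bits = prefix_encode("aabc", TABLE1_CODE)
print(bits)                              # 0010110
print(prefix_decode(bits, TABLE1_CODE))  # aabc
```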

The encoding for the data string “a a b c” is 0.0.10.110, where “.” is used as a delimiter to show the substitution of the codeword for the symbol. The code also has the prefix property (no codeword is the prefix of another). Decoding is performed by a matching or comparison process starting with the first bit of the code string. For decoding code string


[Figure 3 Subdivision of unit interval for arithmetic code of Table 2 and data string “a a b . . . .”]

Table 2 Arithmetic code example.

Symbol   Cumulative      Symbol          Length
         probability P   probability p
d        .000            .001            3
b        .001            .010            2
a        .011            .100            1
c        .111            .001            3

W of the current interval and the cumulative probability $P_i$ for the symbol i being encoded:

$$\text{New } C = \text{Current } C + (A \times P_i).$$

For example, after encoding “a a,” the current code point C is 0 and the interval width A is .01. For “a a b,” the new code point is .001, determined as 0 (current code point C), plus the product (.01) × (.100). The factor on the left is the width A of the current interval, and the factor on the right is the cumulative probability P for symbol “b”; see the “Cumulative probability” column of Table 1.

New interval width A

The width A of the current interval is the product of the probabilities of the data symbols encoded so far. Thus, the new interval width is

$$\text{New } A = \text{Current } A \times p_i ,$$

where the current symbol is i. For example, after encoding “a a b,” the interval width is (.1) × (.1) × (.01), which is .0001.

In summary, we can systematically calculate the next interval from the leftmost point C and width A of the current interval, given the probability p and cumulative probability P values in Table 1 for the symbol to be encoded. The two operations (new code point and new width) thus form a double recursion. This double recursion is central to arithmetic coding, and this particular version is characteristic of the class of FIFO arithmetic codes which use the symbol probabilities directly.
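
The double recursion can be run directly. The sketch below uses exact rational arithmetic (Python's `fractions.Fraction`) with the p and P columns of Table 1; the function and table names are ours. It reproduces C = .001 and A = .0001 for “a a b,” and for “a a b c” the final code point has the same bits as the Huffman code string 0.0.10.110.

```python
from fractions import Fraction

# (symbol probability p, cumulative probability P) from Table 1.
TABLE1 = {
    "a": (Fraction(1, 2), Fraction(0)),
    "b": (Fraction(1, 4), Fraction(1, 2)),
    "c": (Fraction(1, 8), Fraction(3, 4)),
    "d": (Fraction(1, 8), Fraction(7, 8)),
}


def encode(data, table):
    """Double recursion: New C = C + A * P_i and New A = A * p_i."""
    C, A = Fraction(0), Fraction(1)  # start from the unit interval [0, 1)
    for sym in data:
        p, P = table[sym]
        C, A = C + A * P, A * p
    return C, A


def to_binary(x, digits):
    """Binary expansion of x in [0, 1), truncated to `digits` places."""
    bits = ""
    for _ in range(digits):
        x *= 2
        bits += "1" if x >= 1 else "0"
        x -= int(x)
    return "." + bits


C, A = encode("aab", TABLE1)
print(to_binary(C, 3), to_binary(A, 4))  # .001 .0001
C, A = encode("aabc", TABLE1)
print(to_binary(C, 7))  # .0010110, the bits of the Huffman string 0.0.10.110
```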

The Huffman code of Table 1 corresponds to a special integer-length arithmetic code. With arithmetic codes we can rearrange the symbols and forsake the notion of a k-bit codeword for a symbol corresponding to a probability of $2^{-k}$. We retain the important technique of the double recursion. Consider the arrangement of Table 2. The “codeword” corresponds to the cumulative probability P of the preceding symbols in the ordering.

The subdivision of the unit interval for Table 2, and for the data string “a a b,” is shown in Figure 3. In this example, we retain Points 1 and 2 of the previous example, but no longer have the prefix property of Huffman codes. Compare Figs. 2 and 3 to see that the interval widths are the same but the locations of the intervals have been changed in Fig. 3 to conform with the new ordering in Table 2.

Let us again code the string “a a b c.” This example reinforces the double recursion operations, where the new values become the current values for the next recursion. It is helpful to understand the arithmetic provided here, using the “picture” of Fig. 3 for motivation.

The first “a” symbol yields the code point .011 and interval [.011, .111), as follows:

First symbol (a)

C: New code point C = 0 + 1 × (.011) = .011.
(Current code point plus current width A times P.)

A: New interval width A = 1 × (.1) = .1.
(Current width A times probability p.)

In the arithmetic coding literature, we have called the value A × P added to the old code point C the augend.

The second “a” identifies subinterval [.1001, .1101).

Second symbol (a)

C: New code point = .011 + .1 × (.011) = .1001.

    .011      (current code point)
  + .0011     (current width A times P, or augend)
    .1001     (new code point)

A: New interval width A = .1 × (.1) = .01.
(Current width A times probability p.)

Now the remaining interval is one-fourth the width of the unit interval.

For the third symbol, “b,” we repeat the recursion.

Third symbol (b)

C: New code point = .1001 + .01 × (.001) = .10011.

    .1001     (current code point C)
  + .00001    (current width A times P, or augend)
    .10011    (new code point)

A: New interval width A = .01 × (.01) = .0001.
(Current width A times probability p.)

Thus, following the coding of “a a b,” the interval is [.10011, .10101).


To handle the fourth letter, “c,” we continue as follows.

Fourth symbol (c)

C: New code point = .10011 + .0001 × (.111) = .1010011.

    .10011      (current code point)
  + .0000111    (current width A times P, or augend)
    .1010011    (new code point)

A: New interval width A = .0001 × (.001) = .0000001.
(Current width A times probability p.)
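
The four recursions above can be replayed mechanically with the p and P columns of Table 2. This sketch records each augend A × P as it goes, so the final code point can also be checked as the sum of the augends; it uses exact fractions, and the names are ours rather than the paper's.

```python
from fractions import Fraction

# (symbol probability p, cumulative probability P) from Table 2.
TABLE2 = {
    "d": (Fraction(1, 8), Fraction(0)),
    "b": (Fraction(1, 4), Fraction(1, 8)),
    "a": (Fraction(1, 2), Fraction(3, 8)),
    "c": (Fraction(1, 8), Fraction(7, 8)),
}


def to_binary(x, digits):
    """Binary expansion of x in [0, 1), truncated to `digits` places."""
    bits = ""
    for _ in range(digits):
        x *= 2
        bits += "1" if x >= 1 else "0"
        x -= int(x)
    return "." + bits


def encode_trace(data, table):
    """Run the double recursion and return (C, A, list of augends)."""
    C, A, augends = Fraction(0), Fraction(1), []
    for sym in data:
        p, P = table[sym]
        augends.append(A * P)  # the augend added to the code point
        C, A = C + A * P, A * p
    return C, A, augends


C, A, augends = encode_trace("aabc", TABLE2)
print([to_binary(x, 7) for x in augends])
# ['.0110000', '.0011000', '.0000100', '.0000111']
print(to_binary(C, 7), to_binary(A, 7))  # .1010011 .0000001
assert C == sum(augends)
```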

Carry-over problem

The encoding of the fourth symbol exemplifies a small problem, called the carry-over problem. After encoding symbols “a,” “a,” and “b,” each having codewords of lengths 1, 1, and 2, respectively, in Table 1, the first four bits of an encoded string using Huffman coding would not change. However, in this arithmetic code, the encoding of symbol “c” changed the value of the third code-string bit. (The first three bits changed from .100 to .101.) The change was prompted by a carry-over, because we are basically adding quantities to the code string. We discuss carry-over control later on in the paper.
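
The carry can be reproduced with the two code points from the example, .10011 (after “a a b”) and .1010011 (after “a a b c”), written as exact fractions. Comparing their leading bits shows that adding the augend for “c” altered a bit that a Huffman encoder would already have emitted. The helper name is ours.

```python
from fractions import Fraction


def to_binary(x, digits):
    """Binary expansion of x in [0, 1), truncated to `digits` places."""
    bits = ""
    for _ in range(digits):
        x *= 2
        bits += "1" if x >= 1 else "0"
        x -= int(x)
    return "." + bits


before = Fraction(19, 32)                  # .10011, code point after "a a b"
augend = Fraction(1, 16) * Fraction(7, 8)  # A x P for "c": .0001 x .111
after = before + augend                    # .1010011, code point after "a a b c"

print(to_binary(before, 7), to_binary(after, 7))  # .1001100 .1010011
# The carry out of the lower bits changed the third bit from 0 to 1.
print(to_binary(before, 3), "->", to_binary(after, 3))  # .100 -> .101
```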

Code-string termination

Following encoding of “a a b c,” the current interval is [.1010011, .1010100). If we were to terminate the code string at this point (no more data symbols to handle), any value equal to or greater than .1010011, but less than .1010100, would serve to identify the interval.
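
Choosing the terminating value is a small search: any v with C ≤ v < C + A identifies the interval, and one reasonable choice is the value with the fewest bits. The sketch below finds it by trying successively longer binary fractions; this selection rule is our illustration, not a termination procedure given in the text.

```python
import math
from fractions import Fraction


def shortest_code(C, A):
    """Fewest-bit binary fraction v with C <= v < C + A, as a string."""
    k = 1
    while True:
        m = math.ceil(C * 2 ** k)  # smallest k-bit value m / 2**k that is >= C
        if Fraction(m, 2 ** k) < C + A:
            return "." + format(m, f"0{k}b")
        k += 1


print(shortest_code(Fraction(83, 128), Fraction(1, 128)))  # .1010011 ("a a b c")
print(shortest_code(Fraction(19, 32), Fraction(1, 16)))    # .101 ("a a b")
```

For “a a b c” the interval is only 2^-7 wide, so all seven bits are needed; for the wider interval [.10011, .10101) left after “a a b,” three bits suffice.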

Let us overview the example. In our creation of code string .1010011, we in effect added properly scaled cumulative probabilities P, called augends, to the code string. For the width recursion on A, the interval widths are, fortuitously, negative integral powers of two, which can be represented as floating point numbers with one bit of precision. Multiplication by a negative integral power of two may be performed by a shift right. The code string for “a a b c” is the result of the following sum of augends, which displays the scaling by a right shift:

    .011
    .0011
    .00001
  + .0000111
    .1010011

Decoding

Let us retain code string .1010011 and decode it. Basically, the code string tells the decoder what the encoder did. In a sense, the decoder recursively “undoes” the encoder’s recursion. If, for the first data symbol, the encoder had encoded a “b,” then (referring to the cumulative probability P column of Table 2), the code-string value would be at least .001 but less than .011. For encoding an “a,” the code-string value would be at least .011 but less than .111. Therefore, the first symbol of the data string must be “a” because code string .1010011 lies in [.011, .111), which is a’s subinterval. We can summarize this step as follows.

Step 1: Decoder C comparison. Examine the code string and determine the interval in which it lies. Decode the symbol corresponding to that interval.

Since the second subinterval code point was obtained at the encoder by adding something to .011, we can prepare to decode the second symbol by subtracting .011 from the code string: .1010011 - .011 = .0100011. We then have Step 2.

Step 2: Decoder C readjust. Subtract from the code string the augend value of the code point for the decoded symbol.

Also, since the values for the second subinterval were adjusted by multiplying by .1 in the encoder A recursion, we can “undo” that multiplication by multiplying the remaining value of the code string by 2. Our code string is now .100011. In summary, we have Step 3.

Step 3: Decoder C scaling. Rescale the code C for direct comparison with P by undoing the multiplication for the value A.

Now we can decode the second symbol from the adjusted code string .100011 by dealing directly with the values in Table 2 and repeating Decoder Steps 1, 2, and 3.

Decoder Step 1. Table 2 identifies “a” as the second data symbol, because the adjusted code string is greater than .011 (codeword for “a”) but less than .111 (codeword for “c”).

Decoder Step 2. Repeating the operation of subtracting .011, we obtain

.100011 - .011 = .001011.

Decoder Step 3. Symbol “a” causes multiplication by .1 at the encoder, so the rescaled code string is obtained by doubling the result of Decoder Step 2:

.001011 becomes .01011.

The third symbol is decoded as follows.

Decoder Step 1. Referring to Table 2 to decode the third symbol, we see that .01011 is equal to or greater than .001 (codeword for “b”) but less than .011 (codeword for “a”), and symbol “b” is decoded.

Decoder Step 2. Subtracting out .001, we have .00111:

.01011 - .001 = .00111.

Decoder Step 3. Symbol “b” caused the encoder to multiply by .01, which is undone by rescaling with a 2-bit shift:

.00111 becomes .111.

To decode the fourth and last symbol, Decoder Step 1 is sufficient. The fourth symbol is decoded as “c,” whose code point corresponds to the remaining code string.
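
Decoder Steps 1 to 3 become a short loop over Table 2. The sketch below uses exact fractions and, because the code string alone does not say how many symbols were sent, takes the symbol count as an extra argument; that is our assumption about termination, and the names are ours.

```python
from fractions import Fraction

# (symbol, symbol probability p, cumulative probability P) from Table 2.
TABLE2 = [
    ("d", Fraction(1, 8), Fraction(0)),
    ("b", Fraction(1, 4), Fraction(1, 8)),
    ("a", Fraction(1, 2), Fraction(3, 8)),
    ("c", Fraction(1, 8), Fraction(7, 8)),
]


def decode(code, n_symbols, table):
    """Undo the encoder's double recursion one symbol at a time."""
    out = []
    for _ in range(n_symbols):
        # Step 1 (comparison): find the subinterval [P, P + p) holding the code.
        sym, p, P = next(row for row in table if row[2] <= code < row[2] + row[1])
        out.append(sym)
        # Step 2 (readjust): subtract the augend for the decoded symbol.
        code -= P
        # Step 3 (scaling): undo the encoder's multiplication of A by p.
        code /= p
    return "".join(out)


print(decode(Fraction(83, 128), 4, TABLE2))  # aabc
```

Any value in [.1010011, .1010100) decodes the same way, which is why the encoder may terminate with any value in the final interval.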


The encoder operates on variable MIN, whose values are T (true) and F (false). The name MIN denotes “more probable in.” If MIN is true (T), the event to be encoded is the more probable, and if MIN is false (F), the event to be encoded is the less probable. The decoder result is binary variable MOUT, of values T and F, where MOUT means “more probable out.” Similarly, at the decoder side, output value MOUT is true (T) only when the decoded event is the more probable.

In practice, data to be encoded are not conveniently provided to us as the “more” or “less” probable values. Binary data usually represent bits from the real world. Here, we leave to a statistics unit the determination of event values T or F.

Consider, for example, a black and white image of two-valued pels (picture elements) which has a primarily white background. For these data we associate the instances of a white pel value to the “more probable” value (T) and a black pel value to the “less probable” value (F). The statistics unit would thus have an internal variable, MVAL, indicating that white maps to T. On the other hand, if we had an image with a black background, the mapping of values black and white would be respectively to values T and F (MVAL is black). In a more complex model, if the same black and white image had areas of white background interspersed with neighborhoods of black, the mapping of pel values black/white to event values F and T could change dynamically in accordance with the context (neighborhood) of the pel location. In a black context, the black pel would be value T, whereas in the context of a white neighborhood the black pel would be value F.
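
A sketch of the statistics-unit mapping just described, with pels written as "W" and "B". The neighborhood rule used here to pick MVAL (a majority vote over already-coded neighbors, white on a tie) is a hypothetical stand-in for whatever context a real model uses; only the idea that MIN is T exactly when the pel equals MVAL comes from the text.

```python
def mval_from_context(neighbors, default="W"):
    """Guess the more probable pel value (MVAL) from neighboring pels.

    Hypothetical rule: majority vote, falling back to `default` on a tie.
    """
    blacks = neighbors.count("B")
    whites = neighbors.count("W")
    if blacks == whites:
        return default
    return "B" if blacks > whites else "W"


def to_min(pel, mval):
    """Event value MIN: T if the pel equals the more probable value, else F."""
    return "T" if pel == mval else "F"


def from_mout(mout, mval):
    """Decoder side: recover the pel value from MOUT and the same MVAL."""
    other = "B" if mval == "W" else "W"
    return mval if mout == "T" else other


print(to_min("B", mval_from_context(["W", "W", "W"])))  # F in a white context
print(to_min("B", mval_from_context(["B", "B", "W"])))  # T in a black context
```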

The statistics unit must determine the additional information as to by how much value T is more popular than value F. The BAC coder requires us to estimate the relative ratio of F to the nearest power of 2; does F appear 1 in 2, or 1 in 4, or 1 in 8, etc., or 1 in 4096? We therefore have 12 distinct coding parameters SK, called skew, of respective index 1 through 12, to indicate the relative frequency of value F. In a crude sense, we select one of 12 “codes” for each event to be encoded or decoded. By approximating to 12 skew values, instead of using a continuum of values, the maximum loss in coding efficiency is less than 4 percent of the original file size at probabilities falling between skew numbers 1 and 2. The loss at higher skew numbers is even less; see [2].
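
Selecting a skew amounts to rounding the estimated frequency of F to a power of two, 1 in $2^{SK}$. The sketch below does this from running counts by rounding $-\log_2$ of the F frequency and clamping to 1 through 12; the log-domain rounding rule and the function name are our assumptions, since the text does not specify how the nearest power of 2 is chosen.

```python
import math


def estimate_skew(count_f, count_total, max_skew=12):
    """Skew SK such that F is taken to appear about 1 in 2**SK events.

    Assumptions: F is the less probable value (count_f <= count_total / 2),
    and "nearest power of 2" means rounding -log2 of the F frequency.
    """
    if count_f == 0:
        return max_skew  # F not seen yet: use the most skewed code
    sk = round(-math.log2(count_f / count_total))
    return min(max(sk, 1), max_skew)


print(estimate_skew(1, 2))      # 1: F appears 1 in 2
print(estimate_skew(30, 100))   # 2: 0.3 is closest (in log2) to 1 in 4
print(estimate_skew(1, 10000))  # 12: capped at 1 in 4096
```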

In what follows, our concern is how to code binary events after the relative frequencies have been estimated.

The basic encoding process

The double recursion introduced in conjunction with Table 2 appears in the BAC algorithm as a recursion on variables C (for code point) and A (for available space). The BAC algorithm is initialized with the code space as the unit interval [0, 1) from value 0.0 to value 1.0, with C = 0.0 and A = 1.0.


The BAC coder successively splits the width or size of the available code space A, or current interval, into two subintervals. The left subinterval is associated with F and the right subinterval with T. Variables C and A jointly describe the current interval as, respectively, the leftmost point and the width. As with the initial code space, the current interval is closed on the left and open on the right: [C, C + A).

In the BAC, not all interval widths are integral negative powers of two. For example, where p of event F is 1/4, the other probability for T must be 3/4. For the width associated with T of 3/4 (.11 in binary), we have more than one bit of precision. The product of probabilities greater than 1/2 can lead to a growing precision problem. We solve the problem by representing space A with a floating point number to a fixed precision. We introduce variable E for the exponent, which controls the “data handling” and “shifting” aspect of the algorithm. We represent variable A in floating point with the most significant bit of A in bit position E to the right of the binary point. Thus the leading 1-bit of the binary representation of A has value $2^{-E}$. For example, if A = 0.001011, E has value 3, $2^{-E}$ is 0.001, and A scaled by $2^{E}$ is 1.011.
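
The exponent E is the number of leading 0s after the binary point of A, and scaling A by $2^{E}$ leaves a value between 1 and 2. The sketch below computes both for an exact fraction; the function name is ours, and A is assumed to lie in (0, 1].

```python
from fractions import Fraction


def exponent_and_mantissa(A):
    """Return (E, M) with A = M * 2**-E and 1 <= M < 2, for 0 < A <= 1.

    E is the number of leading 0s after the binary point of A.
    """
    if not 0 < A <= 1:
        raise ValueError("A must lie in (0, 1]")
    E = 0
    while A * 2 ** E < 1:
        E += 1
    return E, A * 2 ** E


E, M = exponent_and_mantissa(Fraction(11, 64))  # A = 0.001011 in binary
print(E, M)  # 3 11/8   (11/8 is 1.011 in binary)
print(exponent_and_mantissa(Fraction(7, 8))[0])  # 1: A = 0.111 has one leading 0
```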

In the encoder A recursion of the example of Table 2, the width is determined by a multiplication. In the simple BAC algorithm, the smaller width is determined by the value SK, as in Eq. (1), which follows. The other width is the difference between the current width and the smaller width, as in Eq. (2), which follows. No multiplication is needed.

The current interval is split according to the skew value SK as follows. If SK is 1, the interval is split nearly in half, and if SK is 12, the interval is split with a very small subinterval assigned to F. Note that we roughly subdivide the current interval in a way which corresponds to the relative frequency of each event. Let W(F) and W(T) be respectively the subinterval widths assigned to F and to T. Specifically,

$$W(F) = 2^{-(E+SK)}, \qquad (1)$$

with the remainder of the interval width A assigned to T:

$$W(T) = A - 2^{-(E+SK)}. \qquad (2)$$

We can summarize the handling of an event (value T or F) in the BAC algorithm in three steps. The first and second steps correspond to the A and C recursions described earlier. The third step is a “data handling” or scaling step which we have ignored in the earlier examples. Let s denote the string of data symbols already encoded, and let notation C(s), A(s), and E(s) respectively denote the values of variables C, A, and E following the encoding of the data string. Now, after handling the next event T, let the new values of variables C, A, and E be respectively denoted C(s,T), A(s,T), and E(s,T).

Steps for encoding of next event

Table 3 Example encoding: refining the interval.

Event     MIN       SK       E            W(F)        C           A
          (value)   (skew)   (A's lead    (F width)   (least      (interval
                             0s)                      point)      width)
Initial   -         -        0            -           0.000000    1.000000
1         T         3        0            .001        0.001000    0.111000
2         T         1        1            .01         0.011000    0.101000
3         F         1        1            .01         0.011000    0.010000
4         T         1        2            .001        0.100000    0.001000

[Figure 5 Interval splitting: subdivision for Event 1, Table 3.]

[Figure 6 Interval splitting: subdivision for Event 3, Table 3. (a) Current interval at end of Event 2 and subdivision point. (b) Current interval following encoding of Event 3.]

Step 1. Given skew SK and E (the leading 0s of A), subdivide the current width as in Eqs. (1) and (2).

Step 2. Given the event value (T or F), C, W(T), and W(F), describe the new interval:
If T: $C(s,T) = C(s) + W(F)$ and $A(s,T) = W(T)$. (3a)
If F: $C(s,F) = C(s)$ and $A(s,F) = W(F)$. (3b)

Step 3. Given the new value of A, determine the new value of E:
If T: if $A(s,T) < 2^{-E(s)}$, then $E(s,T) = E(s) + 1$; otherwise $E(s,T) = E(s)$.
If F: $E(s,F) = E(s) + SK$.

We continue the discussion by an example, where we encode the four-event string T, T, F, T under respective skews 3, 1, 1, 1. The encoding is described by Table 3, and the following description accompanies this table.

For Event 1, SK is 3 and E is 0. For Step 1, the width associated with the value F, W(F), is $2^{-3}$ or 0.001. W(T) is what is left over, or 1.000 - 0.001 = 0.111. See Figure 5. Relative to Step 2, Eq. (3), the subdivision point is C + W(F) or 0 + .001 = .001. Since the binary value is T and the relative frequency of the T event is equal to or greater than 1/2, we keep the larger (rightmost) subinterval. Referring to Fig. 5, we see that the new values of C and A which describe the interval are now C(T) = 0.001 and A(T) = W(T) = 0.111. For Step 3, we note that A has developed a leading 0, so E = 1.

For Event 2 of Table 3, the skew SK is 1 and E is now 1, so W(F) is $2^{-(1+1)}$ or 0.01. W(T) is thus 0.111 - 0.010 = 0.101. The subdivision point of the current interval is C + W(F), or 0.011. Again, the event value is T, so we keep the rightmost part. The new value of C is the subdivision point 0.011, and the new value of A is W(T) or 0.101. The leading 1-bit position of A has not changed, so E is still 1.

For Event 3 of Table 3, see Figure 6, which displays the current interval [.011, 1) of width .101. The smaller width W(F) is $2^{-(1+1)}$ or .01. We add this value to C to obtain C + W(F), or subdivision point .101. See Fig. 6(a). Referring to Event 3 of Table 3, the value to encode is F, so we must now keep the left side of the subdivision. By keeping the F subinterval, the value of C remains at .011 and A becomes W(F) or 0.01. Available width A has a new leading 0, so E becomes 2. The resulting interval is shown in Fig. 6(b).
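The three steps can be checked against Table 3 with a short Python sketch. It is a minimal illustration under stated assumptions, not the paper's implementation: it takes Eqs. (1) and (2) to be W(F) = 2^-(E+SK) and W(T) = A - W(F), which is what the worked example uses, and it computes with exact fractions rather than fixed-width registers. The function name `refine_interval` is hypothetical.

```python
from fractions import Fraction


def refine_interval(events):
    """Encode (value, skew) pairs by interval refinement, Steps 1 to 3.

    value is True for T and False for F; skew is the SK for that event.
    Returns the (C, A, E) triple after each event.
    """
    c, a, e = Fraction(0), Fraction(1), 0   # initial interval [0, 1), E = 0
    trace = []
    for value, sk in events:
        # Step 1: subdivide the current width (assumed form of Eqs. (1), (2)).
        w_f = Fraction(1, 2 ** (e + sk))    # W(F) = 2^-(E+SK)
        w_t = a - w_f                       # W(T) is what is left over
        # Step 2: keep one subinterval, Eqs. (3a) and (3b).
        if value:
            c, a = c + w_f, w_t             # T: C moves up to the subdivision point
            # Step 3 (T): E grows by 1 if A gained a leading 0.
            if a < Fraction(1, 2 ** e):
                e += 1
        else:
            a = w_f                         # F: C is unchanged
            e += sk                         # Step 3 (F): E grows by SK
        trace.append((c, a, e))
    return trace


# Event string T, T, F, T under skews 3, 1, 1, 1 (Table 3).
for c, a, e in refine_interval([(True, 3), (True, 1), (False, 1), (True, 1)]):
    print(c, a, e)
```

Run on the example, the sketch ends with C = 1/2 and A = 1/8, that is, C = 0.100000 and A = 0.001000 in binary, the last row of Table 3. It also reports E = 3 after Event 4, the leading-0 count that a fifth event would use.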

Separation of data handling from the arithmetic

Arithmetic codes generate the code string by adding a summand (called augend in the arithmetic coding literature) to the current code string and possibly shifting the result. The summation operation creates a problem called the carry-over problem. We can, in the course of code generation, shift out a long string of 1s from the coding process. An addition could propagate a carry into the long string of 1s, changing the values of higher-order bits until a 0 is converted to a 1, stopping the carry chain.
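The carry chain can be made concrete with a small sketch: adding 1 at the low end of a code string that ends in a long run of 1s changes every bit of the run. The helper `add_one_at` is hypothetical and works on a Python list of bits.

```python
def add_one_at(bits, pos):
    """Add 1 at bit position pos (0 = most significant) of a code string.

    bits is a list of 0/1 ints, modified in place. The carry ripples toward
    the front through 1s until a 0 absorbs it. Returns how many bit
    positions the carry chain touched.
    """
    touched = 0
    for i in range(pos, -1, -1):
        touched += 1
        if bits[i] == 0:
            bits[i] = 1          # a 0 becomes 1 and stops the carry chain
            return touched
        bits[i] = 0              # 1 + carry = 10: keep 0, carry onward
    raise OverflowError("carry propagated out of the code string")


code = [0, 1, 1, 1, 1, 1, 1, 1]   # a long run of 1s already shifted out
print(add_one_at(code, 7), code)  # every bit of the run flips
```

If the run of 1s had already left the encoder, the flipped bits could no longer be corrected, which is why the carry must be handled separately from the arithmetic.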

In this section we show how the arithmetic using A and C can be separated from the carry-over handling and data-

GLEN G. LANGDON. JR IBM J. RES. DEVELOP. VOL. 28 NO. 2 MARCH 1984

Table 4 Example encoding with normalization.

Event    MIN  SK  Q    C      A      Normalization
Initial  -    -   -    0.000  1.000  -
1        T    3   0    0.010  1.110  Yes
2        T    1   0    0.110  1.010  No
3        F    1   00   1.100  1.000  F-shift of SK
4        T    1   010  0.000  1.000  Yes


$Q,C \leftarrow Q,C + 2^{-SK}$, (4a)

$A \leftarrow A - 2^{-SK}$. (4b)

If the result in A is equal to or greater than 1.000, we are done with the double recursion. If the result in A is less than 1.000, a normalization shift is needed. We shift Q and C left as a pair, and shift left A. Let "shl" denote shift left one bit, and "$\mathrm{shl}^2$" denote a shift left of two bits, etc. If A is less than 1.000, then

$Q,C \leftarrow \mathrm{shl}\; Q,C,0$, (5a)

$A \leftarrow \mathrm{shl}\; A,0$. (5b)

In the above, ",0" denotes "0-fill" (the vacated positions are filled with 0s).

If the symbol to be encoded is F, Fig. 8 shows that the action to perform is relatively simple:

$Q,C \leftarrow \mathrm{shl}^{SK}\; Q,C,0$, (6a)

$A \leftarrow 1.0$. (6b)

We use the same example as in Table 3, redone as shown in Table 4. Columns Q, C, and A show the result of applying the MIN and SK values of that step. The first row is the initialization.

Event 1, with C and A as initialized, encodes value T with an SK of 3. The arithmetic result for Eq. (4a) is C = 0.000 + 0.001, and for Eq. (4b) is A = 1.000 - .001 = 0.111. Since 0.111 is less than 1.0, we must apply Eq. (5) to normalize. Following the normalization shift, Q is now 0, C is 0.010, and A is 1.110.

Event 2 encodes value T with a skew of 1. We perform the operations of Eq. (4a), resulting in C of 0.110 as follows:

    0.010   (old C)
  + 0.100   ($2^{-1}$)
    0.110   (new C)

Equation (4b) gives 1.010 for A:

    1.110   (old A)
  - 0.100   ($2^{-1}$)
    1.010   (new A)

Since the register A result is greater than 1.0, the normalization shift of Eq. (5) is not needed.

Event 3 encodes value F at skew 1. The algorithm for encoding an F is Eq. (6). The value F is encoded by shifting the Q,C pair left SK bits and reinitializing A to 1.000. Summarizing Event 3, an F of skew 1 is a one-bit shift left for Q,C, so Q is 00 and C is 1.100. Equation (6b) reinitializes A to 1.000.

Event 4 illustrates a carry-over. Event 4 encodes value T with a skew of 1. Following Eq. (4a), adding $2^{-1}$ (0.100) to C (1.100) results in 10.000, where the leading 1 indicates the carry-out from the C register. This carry propagates to Q by activating encoder output signal ADD+1, and this carry-over operation converts Q from 00 to 01. Meanwhile, for Eq. (4b), $2^{-1}$ subtracts from register A, leaving 0.100, so the normalization shift of Eq. (5) is needed. Q now becomes 010. The value of the code string is 0100000, which is the same result as in Table 3, as expected.
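Eqs. (4) through (6) can be sketched with integer registers. This is a minimal sketch under stated assumptions, not the paper's hardware: registers hold multiples of 2^-3 as in Tables 4 and 5, C has one integer bit so it holds values below 10.000 (binary), a carry out of C is added into Q directly (standing in for the ADD+1 signal), and SK must lie between 1 and the number of fractional bits. The name `bac_encode` is hypothetical.

```python
def bac_encode(events, frac_bits=3):
    """Normalized binary arithmetic encoder following Eqs. (4)-(6).

    events: list of (value, sk) pairs, value True for T and False for F,
    with 1 <= sk <= frac_bits. Registers hold integers in units of
    2**-frac_bits. Returns (Q, C, A); Q is the code-string FIFO as a list
    of bits, and C still holds frac_bits + 1 bits not yet shifted into Q.
    """
    one = 1 << frac_bits            # the register value 1.000
    c_mask = (2 << frac_bits) - 1   # C: one integer bit plus frac_bits bits
    q, c, a = [], 0, one            # initialization: C = 0.000, A = 1.000

    def shift_qc(n):
        # shl^n Q,C,0: the integer bit of C moves into Q, 0-fill on the right
        nonlocal c
        for _ in range(n):
            q.append(c >> frac_bits)
            c = (c << 1) & c_mask

    for value, sk in events:
        if value:                   # T: Eqs. (4a) and (4b)
            c += one >> sk          # Q,C <- Q,C + 2^-SK
            if c > c_mask:          # carry out of C: add 1 into Q (ADD+1)
                c &= c_mask
                i = len(q) - 1
                while i >= 0 and q[i] == 1:
                    q[i] = 0        # carry ripples through trailing 1s of Q
                    i -= 1
                if i < 0:
                    raise OverflowError("carry out of the code string")
                q[i] = 1
            a -= one >> sk          # A <- A - 2^-SK
            if a < one:             # Eq. (5): SK >= 1 keeps A >= 0.100, so one shift
                shift_qc(1)
                a <<= 1
        else:                       # F: Eqs. (6a) and (6b)
            shift_qc(sk)
            a = one
    return q, c, a


q, c, a = bac_encode([(True, 3), (True, 1), (False, 1), (True, 1)])
print("".join(map(str, q)) + format(c, "04b"))   # 0100000
```

For the example, the sketch returns Q = 010, C = 0.000 and A = 1.000, matching the last row of Table 4. Keeping Q as a Python list makes the carry into Q trivial here; a FIFO whose front bits have already been released cannot absorb a carry so easily.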

Carry-over control

Arithmetic coding ensures that no future value of C can exceed the current value of C + A. Consequently, once a carry-over has propagated into a given code-string position in Q, no other carry-over will reach the same code-string position. In the above example, the second bit of the code string received a carry-over. The algorithm ensures that this same bit position (second from the beginning) cannot receive another carry-over during the succeeding encoding operations. This observation leads to a method for controlling the carry called bit-stuffing [3]. At the encoder side, if 16 1s in a row are emitted, the buffer can insert (stuff) a 0. If a carry-over arrives, this 0 converts to a 1 and blocks the carry from reaching the string of 16 1s. Therefore a bit-stuff permits the block with the 16 1s to be transmitted. At the decoder side, if the decoder encounters 16 1s in a row, the decoder buffer removes and examines the stuffed bit. If the stuffed bit value is 1, the carry is propagated inside the decoder.
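Bit-stuffing can be sketched as a pair of list transformations. This is a minimal sketch in which several details are assumptions: the stuffed bit is placed immediately after each run of 16 1s, a carry-over that reaches it turns it into a 1, and the decoder, on reading a stuffed 1, adds the carry back into the run. The names `stuff` and `unstuff` are hypothetical, and a real implementation would work on a streaming buffer rather than whole lists.

```python
def stuff(bits, run=16):
    """Encoder side: after every `run` consecutive 1s, insert (stuff) a 0.

    A later carry-over that reaches this position turns the stuffed 0 into
    a 1 instead of rippling into the run of 1s already released.
    """
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b == 1 else 0
        if ones == run:
            out.append(0)        # the stuffed bit
            ones = 0
    return out


def unstuff(bits, run=16):
    """Decoder side: after `run` consecutive 1s, remove and examine the next bit.

    A stuffed bit of 1 means a carry-over arrived; it is added back into the
    decoded string, where it ripples through the run of 1s.
    """
    out, ones, i = [], 0, 0
    while i < len(bits):
        b = bits[i]
        i += 1
        out.append(b)
        ones = ones + 1 if b == 1 else 0
        if ones == run and i < len(bits):
            stuffed, ones = bits[i], 0
            i += 1
            if stuffed == 1:     # propagate the carry inside the decoder
                j = len(out) - 1
                while j >= 0 and out[j] == 1:
                    out[j] = 0
                    j -= 1
                if j < 0:
                    raise OverflowError("carry out of the code string")
                out[j] = 1
    return out
```

With no carry, `unstuff(stuff(bits))` returns the original bits; if a carry has flipped a stuffed 0 to 1, `unstuff` turns the preceding run of 1s into 0s and adds 1 to the bit in front of it.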

Code-string termination

When the encoding is done, the C register may be shifted into the Q FIFO store. However, after the last event has been coded, we remain with interval [C, C + A). If we know the length of the data string, then we know when we have decoded the last event. Therefore, any code string whose magnitude lies in [C, C + A) decodes to the original data string. In the present case, we can simply truncate the trailing 0s. The truncation process leaves "01" as the code string, with the convention that the decoder inserts as many trailing 0s as it takes to decode four data bits.

In the general case, any code string equal to or greater than 0100000 (the smallest value in the current interval) and strictly less than C + A = 0100000 + 0001000 = 0101000 suffices. Our shortest selection remains 01.
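Choosing the shortest code string can be automated. The sketch below, with the hypothetical name `shortest_code`, treats the final interval as n-bit integers in the Q,C representation and looks for the fewest leading bits that, padded with trailing 0s as the decoder is assumed to do, land in [C, C + A).

```python
def shortest_code(c, a, n):
    """Shortest bit string that, padded with trailing 0s to n bits,
    has a value in [c, c + a).

    c and a are n-bit integers holding the final interval, e.g. the
    concatenation Q,C and the matching width from the example.
    """
    for k in range(n + 1):
        step = 1 << (n - k)          # weight of the last kept bit
        v = -(-c // step) * step     # smallest multiple of step that is >= c
        if v < c + a:
            return format(v >> (n - k), "b").zfill(k) if k else ""
    raise ValueError("empty interval: a must be positive")


print(shortest_code(0b0100000, 0b0001000, 7))
```

With C = 0100000 and A = 0001000 (n = 7), it returns 01, the selection above. The search always succeeds by k = n, since C itself lies in the interval.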


Decoding process

The decoding part of the BAC algorithm is shown in Figure 9. Consider decoding the example code string of Table 4. We demonstrate the decoding process with the aid of Table 5. Register A is initialized to 1.000 and C is initialized to the first four bits out of the FIFO buffer Q. Since Q only has 2 bits (01), we insert 0s as needed. C is initialized to 0.100. The description of each event that follows accompanies Fig. 9 and one line of Table 5.

Event 1 To decode the first event we need the value of SK, which is 3. We subtract $2^{-3}$ from C as an intermediate result called CBUF. CBUF is 0.011, which is greater than 0, so the resulting bit is T. So $2^{-3}$ is subtracted from A, giving 0.111, and the contents of CBUF are transferred into C. Register A is less than 1.0, so a normalization shift is needed. C and A are now 0.110 and 1.110.

Event 2 Now we obtain the second value of SK, which is 1. Subtracting $2^{-1}$, or 0.100, from C gives a CBUF value of 0.010, which is positive. Therefore the result is T, and 0.010 is the new value of C. Subtracting 0.100 from register A gives 1.010, so no normalization is needed.

Event 3 For the third event, SK is again 1, so we again subtract 0.100 from C (which is now 0.010). The result CBUF is negative, so the event is F. We do not load C from CBUF, but simply shift C one position left, and reinitialize A. C is now 0.100 and A is 1.000.

Event 4 The fourth SK is 1, and subtracting 0.100 from C leaves 0.000 for CBUF. The result is not negative, so the event is T. To continue the algorithm, we subtract $2^{-1}$ from A, discover that the result 0.100 is less than 1, and do a normalization shift of C and A. A is now 1.000 and decoding is complete.

Note that column A and the Normalization columns of Table 4 (encoder) and Table 5 (decoder) are identical. The A register contents always follow the same sequence of values for the decode phase as for the encode phase.
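The decoding steps can be sketched in the same register style. This is a minimal sketch assuming registers with three fractional bits, C loaded with the first four code-string bits as in the example, and the skews supplied directly as a list; in practice the decoder obtains each SK from the same model as the encoder. `bac_decode` is a hypothetical name.

```python
def bac_decode(code, skews, frac_bits=3):
    """Decoder for the normalized binary arithmetic code; returns a list
    of True (T) / False (F) events.

    code: the code string as '0'/'1' characters; missing bits are read as 0.
    skews: the SK of each event (assumed known to the decoder).
    """
    one = 1 << frac_bits              # the register value 1.000
    bits = iter(code)

    def next_bit():
        return int(next(bits, "0"))   # insert trailing 0s as needed

    c = 0
    for _ in range(frac_bits + 1):    # C <- first frac_bits + 1 bits of Q
        c = (c << 1) | next_bit()
    a = one                           # A <- 1.000
    events = []
    for sk in skews:
        cbuf = c - (one >> sk)        # CBUF <- C - 2^-SK
        if cbuf >= 0:                 # not negative: the event is T
            events.append(True)
            c = cbuf
            a -= one >> sk
            if a < one:               # normalization shift of C and A
                c = (c << 1) | next_bit()
                a <<= 1
        else:                         # negative: the event is F
            events.append(False)
            for _ in range(sk):       # shift C left SK bits
                c = (c << 1) | next_bit()
            a = one                   # reinitialize A
    return events


print(bac_decode("01", [3, 1, 1, 1]))
```

Decoding the two-bit code string 01 with skews 3, 1, 1, 1 returns T, T, F, T, and the intermediate values of C, A, and CBUF follow Table 5.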

Framework for prefix codes and arithmetic codes

We can apply the code-string-tree representation of the coding operations to arithmetic codes. However, unlike prefix codes, in arithmetic coding the encoding of a given symbol may result in a code space described by continuations of more than one leaf of the tree. We illustrate the point by showing the example of Event 1 of Table 4 in the form of the code-string tree of Figure 10.

The smallest subinterval at the current depth is a single leaf. With a maximum skew SK of 3, we identify value .001 (that is, $2^{-3}$) with a single leaf at the current depth. With a maximum SK of 3, the value of A can range from 1.000 to 1.110. At the same current depth where $2^{-3}$ is one leaf, the subset of code-string


Figure 9 Flowchart for BAC algorithm decoder.

Figure 10 Code-string tree for Event 1, Table 4: (a) Initial tree, with A = 1.000 and the subdivision point for SK = 3. (b) Following encoding of Event 1: C = 0.001 before normalization; after normalization, A = 1.110 and C = 0.010.

Table 5 Example decoding.

Event    SK  C (after)  A (after)  CBUF                  MOUT  Normalization
Initial  -   0.100      1.000      -                     -     -
1        3   0.110      1.110      0.011                 T     Yes
2        1   0.010      1.010      0.010                 T     No
3        1   0.100      1.000      1.110 (i.e., -0.010)  F     F-shift of SK
4        1   0.000      1.000      0.000                 T     Yes


4 and 5. Pasco [5], who is responsible for the first FIFO arithmetic code, solved the precision problem by truncating the product of Eq. (8) to the same precision as A(s). In terms of the leaves at the current depth of the code-string tree, truncation in Eq. (8) allows gaps to form in the code space. A gap is a leaf of the current interval which is not assigned to any of the symbols. No data string maps to a continuation of such unassigned leaves, so some code space is wasted. Martin [6] and Jones [7], independently of Pasco and of each other, discovered practical versions of Eqs. (7) and (8) which partition the code space and have no gaps. They do this by ensuring that $C(s,k+1) = C(s,k) + A(s,k)$.

Decodability

Codes are not uniquely decodable if two data strings map into the same code space. In arithmetic codes, if the subdivision of the interval yields an overlap, more than one data string can map to the overlapped subinterval. Overlap occurs when, in subdividing interval A(s), the sum of the subinterval widths exceeds A(s) itself. Thus, some continuations of data string s may map to the subinterval whose least point is C(s + 1). We define string s + 1 to be the data string of the same length as s which is next in the lexical ordering. If s is already the last string in the ordering, i.e., string "n n ... n," then C(s + 1) is taken as 1.000. The general decodability criterion is thus

$C(s,n) + A(s,n) \le C(s+1)$, (9)

where s,n denotes symbol n concatenated to string s. For arithmetic codes which do not explicitly use value A(s,n), it may be calculated as $C(s,n,n,n,\ldots) - C(s,n)$. In cases where Eq. (9) is violated, the interval of overlap is $[C(s+1),\, C(s,n) + A(s,n))$.

Similarly, a gap is formed between the mappings of data strings s and s + 1 if

$C(s+1) - C(s) > A(s,1) + \cdots + A(s,n)$. (10)

A gap does not affect decodability; it simply means that the code space is not fully utilized.
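The overlap and gap conditions can be checked numerically. This is a minimal sketch under stated assumptions: a P-based subdivision A(s,k) = A(s) times p_k, with each product truncated to a fixed number of bits (the kind of truncation attributed to Pasco above). Letting the last symbol absorb the remainder is just one simple way to close the gap and is not claimed to be Martin's or Jones's construction. When the subintervals are laid out contiguously from C(s) and C(s + 1) = C(s) + A(s), comparing the sum of the widths with A(s) is equivalent to Eqs. (9) and (10). The names `subdivide` and `classify` are hypothetical.

```python
from fractions import Fraction


def subdivide(a, probs, bits, close_gap=False):
    """Split width a into A(s,k) ~ a * p_k, each product truncated to
    `bits` fractional bits. With close_gap=True the last symbol receives
    whatever width remains, so the widths sum exactly to a."""
    unit = Fraction(1, 2 ** bits)
    widths = [(a * p) // unit * unit for p in probs]   # truncated products
    if close_gap:
        widths[-1] = a - sum(widths[:-1])
    return widths


def classify(a, widths):
    """Compare the subinterval widths with A(s): 'overlap' breaks
    decodability, 'gap' wastes code space, 'exact' partitions A(s)."""
    total = sum(widths)
    if total > a:
        return "overlap"
    return "gap" if total < a else "exact"


a = Fraction(7, 8)                       # A(s) = 0.111
p = [Fraction(1, 3), Fraction(2, 3)]
print(subdivide(a, p, 4), classify(a, subdivide(a, p, 4)))
```

With A(s) = 0.111 and probabilities 1/3 and 2/3 truncated to four bits, the widths 1/4 and 9/16 leave one leaf of width 1/16 unassigned, a gap; with `close_gap=True` the widths become 1/4 and 5/8 and partition A(s) exactly.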

P-based arithmetic codes

For P-based arithmetic codes, the code space is represented as a number A which is subdivided by a multiplication in proportion to the relative frequencies of the symbols.

The decodability criterion for P-based codes is given by

$A(s) \ge A(s,1) + A(s,2) + \cdots + A(s,n)$. (11)

If this equation is met with equality for all s, then the algorithm leaves no gaps.

L-based arithmetic codes

The L-based arithmetic codes represent the width of the code space A(s) as a value $2^{-[Y(s)+X(s)]}$, where Y(s) is an integer and X(s) is a rational fraction with integer denominator M. In fact, the length of the code string for s, denoted L(s), is Y(s) + X(s). Here, Y(s) corresponds to E(s) of the example of Table 3. [When the code string is terminated following the encoding of the last symbol, the code-string length of C(s) is within a few bits of L(s).] For L-based arithmetic codes, D(s,k) is determined as the product

$D(s,k) = D(X(s),k) \times 2^{-Y(s)}$, (12)

so that

$C(s,k) = C(s) + D(s,k)$.

Since there are M distinct values of X(s), i.e., 0, 1/M, 2/M, ..., (M - 1)/M, and n symbols, we precalculate an M-by-n table of values D(X,k). [Actually, for symbol k = 1, the value of D is 0, so only M-by-(n - 1) values need be stored.] In Eq. (12), multiplication by $2^{-Y(s)}$ is simply a shift.

Corresponding to the relative frequency estimates $p_k$, there are instead length approximations $\ell_k \approx -\log_2 p_k$. These lengths must satisfy the generalized Kraft inequality [8]. Following encoding of symbol k, the internal variables are updated:

$L(s,k) = Y(s) + X(s) + \ell_k$,

where again L(s,k) is broken into integer part Y(s,k) and fraction part X(s,k). The decodability criterion of Eq. (11) is also the decodability criterion for L-based codes if A(s,k) is as defined in Eq. (13).

For k < n define

$A(s,k) = B(s,k+1) - B(s,k)$,

and for n,

$A(s,n) = B(s,n,n) + B(s,n,n,n) + \cdots$. (13)

Applications

Langdon and Rissanen [3] applied a FIFO P-based binary arithmetic code to the encoding of black and white images, using as a context the value of neighboring pels already encoded. This work introduced the notion of the binary coding parameter called skew. A binary code is particularly useful for black and white images because the contexts employed for successive pels may have different relative frequencies. Traditional run-length codes such as Golomb's [4] are only designed for a particular relative frequency for the repeating symbol. In [3], a method to dynamically adapt to the pel statistics is described. Particularly simple adaptation techniques exist for determining the skew number.
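The text does not spell out an adaptation rule here, so the following is only an illustration of the idea, with a hypothetical rounding rule and a hypothetical cap `max_skew`: since the F subinterval receives roughly $2^{-SK}$ of the current width, SK can be chosen so that $2^{-SK}$ approximates the observed relative frequency of the less probable value. Calling the function with running counts for each context makes the skew adapt as the counts change.

```python
import math


def skew_from_counts(n_total, n_less_probable, max_skew=12):
    """Choose an integer skew SK so that 2**-SK approximates the relative
    frequency of the less probable value, estimated from running counts.

    Assumes 0 <= n_less_probable <= n_total / 2 and n_total > 0.
    """
    if n_less_probable == 0:
        return max_skew                  # nothing seen yet: use the largest skew
    sk = round(math.log2(n_total / n_less_probable))
    return max(1, min(max_skew, sk))     # SK is at least 1 and at most max_skew


print(skew_from_counts(8, 1))            # less probable value seen 1 time in 8
```

Rounding the logarithm is the simplest choice; a coder tuned for compression might instead pick the SK that minimizes the expected code length for the estimated frequency.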

Arithmetic coding can also be applied to file compression [5-7, 9, 10]. In [9], the first-order contexts are also determined adaptively. A linearized binary tree can be used to store the skew numbers required to encode the decomposed 8-bit byte as eight encodings, each of a binary event.
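Decomposing a byte into eight binary events can be sketched as a walk down a binary tree. The heap-style layout below (root at index 1, children of node i at 2i and 2i + 1) is an assumption about what a linearized binary tree looks like; it yields 255 internal nodes, each of which could index the skew number for its binary event. `byte_events` is a hypothetical name.

```python
def byte_events(byte):
    """Decompose an 8-bit byte into eight binary events, most significant
    bit first. Each event is paired with the index of its node in a
    linearized binary tree (root = 1, children of node i are 2i and 2i+1),
    which could serve as an index into a table of 255 skew numbers.
    """
    node, events = 1, []
    for shift in range(7, -1, -1):
        bit = (byte >> shift) & 1
        events.append((node, bit))       # encode `bit` with the skew at `node`
        node = 2 * node + bit            # descend to the left (0) or right (1) child
    return events


for node, bit in byte_events(0b10110010):
    print(node, bit)
```

Because each node is reached only by the bits already encoded in the byte, every binary event is coded under a context made of the preceding bits of the same byte.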


Arithmetic codes have been employed to design codes for a constrained channel. A constrained channel has an alphabet, e.g., {0,1}, where not all strings in {0,1}* are allowed. The Manchester code, where clocking and data information are recorded together, is an example of a constrained channel code used in magnetic recording channels. Shannon [11] studied the constrained channel, defined the constraints in terms of allowed transitions in a state table, and determined the probabilities for the allowed transitions which are needed in order to map the maximum amount of information to a set of allowed strings. Guazzo [12] showed that mapping data strings to channel strings is analogous to the decompression operation, and recovering the data string from the channel string corresponds to the compression operation. Martin et al. [1] showed that the dual of decodability for compression coding is representability for constrained channel coding. To be representable, the encoding operation can have overlaps but cannot have a gap. Some interesting L-based codes for constrained channels are derived in [13].

Guazzo [12], in addition to suggesting the application to constrained-channel codes, contributed a code which subdivides the code space according to the given probabilities. However, the subdivision method is quite crude compared to Pasco's [5].

4. Comments

Arithmetic codes can achieve compression as close to the ideal as desired, for given statistics. In addition, the P-based FIFO arithmetic codes which accept statistics directly facilitate dynamic adaptation to the statistics in one pass of the data [3]. A good binary code is important, as Shannon and others have noted, because other alphabets can be converted to binary form by decomposition.

Of overriding importance to compression now is the modeling of the data. Rissanen and Langdon [14] have studied a framework for the encoding of data strings and have assigned a cost to a model based on the coding parameters required. Different modeling approaches may be compared. They showed that blocking to form larger alphabets results in the same model entropy at the same cost in coding parameters as a symbolwise approach. In general, the compression system designer seeks ways to give up a small percentage of the ideal compression in order to simplify the implementation. The existence of an efficient coding technique now places the emphasis on efficient context selection and parameter-reduction techniques [14].

Acknowledgments

Most of the author’s work in this field was done jointly with J. Rissanen, and this debt is obvious. I also owe a debt to Joan Mitchell, who has made several contributions to arithmetic coding [15]. If this paper is more accessible to the general reader, it is due to Joan’s gracious and patient encouragement. I thank Gerry Goertzel for his insights in explaining the operation of arithmetic coding. I have also benefited from the encouragement of Janet Kelly, Nigel Martin, Stephen Todd, Ron Arps, and Murali Varanasi.

Appendix 1: Origins of arithmetic coding

The first step toward arithmetic coding was taken by Shannon [11], who observed in a 1948 paper that messages N symbols long could be encoded by first sorting the messages in order of their probabilities and then sending the cumulative probability of the preceding messages in the ordering. The code string was a binary fraction and was decoded by magnitude comparison. The next step was taken by Peter Elias in an unpublished result; Abramson [16] described Elias’ improvement in 1963 in a note at the end of a chapter. Elias observed that Shannon’s scheme worked without sorting the messages, and that the cumulative probability of a message of N symbols could be recursively calculated from individual symbol probabilities and the cumulative probability of the message of N - 1 symbols. Elias’ code was studied by Jelinek [17]. The codes of Shannon and Elias suffered from a serious problem: as the message increased in length, the arithmetic involved required increasing precision. By using fixed-width arithmetic units for these codes, the time to encode each symbol is increased linearly with the length of the code string.

Meanwhile, another approach to coding was having a similar problem with precision. In 1972, Schalkwijk [18] studied coding from the standpoint of providing an index to the encoded string within a set of possible strings. As symbols were added to the string, the index increased in size. This is a last-in-first-out (LIFO) code, because the last symbol encoded was the first symbol decoded. Cover [19] made improvements to this scheme, which is now called enumerative coding. These codes suffered from the same precision problem.

Both Shannon’s code and the Schalkwijk-Cover code can be viewed as a mapping of strings to a number, forming two branches of pre-arithmetic codes, called FIFO (first-in-first-out) and LIFO. Both branches use a double recursion, and both have a precision problem. Rissanen [8] alleviated the precision problem by suitable approximations in designing a LIFO arithmetic code. Code strings of any length could be generated with a fixed calculation time per data symbol using fixed-precision arithmetic.

Pasco [5] discovered a FIFO arithmetic code, discussed earlier, which controlled the precision problem by essentially the same idea proposed by Rissanen. In Pasco’s work, the code string was kept in computer memory until the last symbol was encoded. This strategy allowed a carry-over to be propagated over a long carry chain. Pasco [5] also conjectured on the family of arithmetic codes based on their mechanization.
