Coding and Compression in Digital Image Processing | CSCE 472, Study notes of Digital Signal Processing

Material Type: Notes; Professor: Reichenbach; Class: Digital Image Processing; Subject: Computer Science and Engineering ; University: University of Nebraska - Lincoln; Term: Unknown 2008;

Uploaded on 08/30/2009

Chapter 1

Coding and Compression

As described in Chapter ??, the signals from scenes, which are continuous, must be converted to discrete values — a process referred to as quantization. Many image acquisition systems use a linear scale for quantization, but other scales can be used, e.g., a log scale. In any case, the scale used in quantization will have an ordinal sequence of discrete values.

Coding is the operation of representing each quantized value as a bit pattern. Many image acquisition systems use straightforward binary representations. For example, a system with a scale of values 0–255 would code the values with binary codes as:

| Value | Code |
|-------|----------|
| 0 | 00000000 |
| 1 | 00000001 |
| 2 | 00000010 |
| 3 | 00000011 |
| ... | ... |
| 254 | 11111110 |
| 255 | 11111111 |
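The fixed-length coding in the table can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and an 8-bit scale is assumed by default.

```python
def binary_code(value: int, bits: int = 8) -> str:
    """Return the fixed-length binary code word for a quantized value."""
    if not 0 <= value < 2 ** bits:
        raise ValueError(f"value {value} does not fit in {bits} bits")
    # format(v, "08b") pads with leading zeros to exactly 8 binary digits
    return format(value, f"0{bits}b")


def binary_decode(code: str) -> int:
    """Recover the quantized value from its fixed-length code word."""
    return int(code, 2)


for v in (0, 1, 2, 3, 254, 255):
    print(v, binary_code(v))
```

Every value costs the same number of bits, which is exactly the property the compression methods below try to avoid.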

Costs and limitations on storage and communication motivate the development of methods for image compression. Data compression is the operation of reducing the number of bits required to represent a sequence of values. Access to pixels in a compressed image (e.g., for viewing or processing) is typically inefficient, so images are decompressed to a more usable format before other operations are performed. The processes of compression and decompression are pictured in Figure ??.

A central issue in compression is whether the result after compression and decompression is identical to the original image. If the result is identical to the source, then the compression scheme is called lossless. If the result is not identical to the source, then the scheme is called lossy. Superficially, it seems highly desirable to use lossless compression, but lossy compression schemes often can achieve an order-of-magnitude saving in representation overhead and still produce an image that is very similar to the source.


1.1 Huffman Coding

Straightforward binary representations use the same number of bits for each quantized value. A more efficient scheme can be developed based on the observation that some values occur more frequently than other values. If a code with fewer bits is used for the values that occur more frequently, then the image can be coded in fewer bits (even if longer codes are required for values that occur less frequently).

Huffman coding is one of the simplest schemes based on this approach. A Huffman code can be constructed as follows:

  1. Compute the frequency distribution function (or histogram) of the image (or of an ensemble of images).
  2. For each value of the quantization scale, create a tree consisting of a single root node with no children. The root node stores the quantization value and the occurrence frequency of that value. The set of all trees is the forest.
  3. While there are two or more trees in the forest, repeat:

     (a) Select the two trees with the smallest associated frequencies and remove them from the forest. (If more than two trees tie for the smallest frequency, any choice among them is acceptable.)

     (b) Join the two selected trees as children of a new root node (one is the ‘0’ child, one is the ‘1’ child, and either assignment is acceptable). Set the associated frequency of the new root node to the sum of the frequencies of the two child nodes. (A parent node does not have an associated quantization value.) Add the new tree to the forest.

  4. Generate a table with a row for each leaf in the resulting tree, where each row contains the quantization value of the leaf and the path from the root to that leaf (as a string of ‘0’s and ‘1’s).
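The construction steps can be sketched in Python with a priority queue holding the forest. This is a minimal illustration, not the notes' own implementation: the function name is hypothetical, trees are represented as nested tuples, and ties are broken by insertion order (any tie-breaking rule is acceptable, as step 3(a) says).

```python
import heapq
from itertools import count


def huffman_code(hist):
    """Build a Huffman code table {value: bit string} from a histogram.

    hist[l] is the number of pixels with quantized value l.
    """
    tiebreak = count()  # unique counter so heapq never compares two trees
    # Step 2: one single-node tree per quantization value.
    # A tree is either a value (leaf) or a (zero_child, one_child) tuple.
    forest = [(freq, next(tiebreak), value) for value, freq in enumerate(hist)]
    heapq.heapify(forest)
    # Step 3: repeatedly join the two lowest-frequency trees.
    while len(forest) > 1:
        f0, _, t0 = heapq.heappop(forest)
        f1, _, t1 = heapq.heappop(forest)
        heapq.heappush(forest, (f0 + f1, next(tiebreak), (t0, t1)))
    # Step 4: read each leaf's root-to-leaf path as its code word.
    table = {}
    stack = [(forest[0][2], "")]
    while stack:
        tree, path = stack.pop()
        if isinstance(tree, tuple):
            stack.append((tree[0], path + "0"))
            stack.append((tree[1], path + "1"))
        else:
            table[tree] = path or "0"  # a lone value still needs one bit
    return table


hist = [5, 9, 16, 21, 9, 3, 1, 0]
table = huffman_code(hist)
print(sum(hist[v] * len(code) for v, code in table.items()))  # 160
```

Different tie-breaking choices can give different code words, but the total number of bits for the image is the same for every Huffman tree built from the same histogram.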

Example. Consider an image with M × N = 64 pixels and a quantization scale with 8 values. With straightforward binary coding, each value would be coded with three bits: ‘000’ for 0, ‘001’ for 1, ..., ‘111’ for 7. With 64 pixels and three bits per pixel, straightforward binary coding would require 192 bits.

A Huffman code can be generated based on the frequency distribution function:

| l | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|------|---|---|----|----|---|---|---|---|
| H[l] | 5 | 9 | 16 | 21 | 9 | 3 | 1 | 0 |
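For this histogram the size of the Huffman-coded image can be checked without building the code table. The Python sketch below (the helper name is hypothetical) relies on the fact that each merge in step 3(b) lengthens the code word of every pixel under the new root by one bit, so the total code length is the sum of all merged frequencies. It assumes the histogram has at least two entries.

```python
import heapq


def huffman_total_bits(hist):
    """Total bits needed to Huffman-code all pixels counted in hist.

    Each merge adds one bit to the code word of every pixel under the
    new root, so the total length is the sum of all merged weights.
    """
    heap = list(hist)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


hist = [5, 9, 16, 21, 9, 3, 1, 0]
pixels = sum(hist)                         # M x N = 64
fixed = pixels * 3                         # 3 bits per pixel for 8 values
print(fixed, huffman_total_bits(hist))     # 192 160
```

So the Huffman code needs 160 bits (2.5 bits/pixel) instead of 192.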

The tree illustrated in Figure ?? captures the information in the frequency distribution function. Because the algorithm for producing the tree is indeterminate with respect to which smallest values are combined and with respect to which is the ‘0’ and ‘1’ child, other


The entropy of the image is the lower bound on the storage (in bits per pixel) that can be achieved by Huffman coding. This bound is achieved if and only if

$$\forall l \text{ such that } H[l] > 0, \quad -\lg\left(\frac{H[l]}{MN}\right) \text{ is a whole number.}$$

In practice, this is seldom true, but Huffman coding always achieves compression that is within one bit per pixel of the entropy and is typically much closer to the entropy than one bit. Other compression schemes, such as arithmetic coding, have been designed to reduce the inefficiency caused by fractional log probabilities and to achieve compression arbitrarily close to the entropy.
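The entropy bound and the whole-number condition can be checked numerically. In this Python sketch the function names are hypothetical; the first histogram is the 64-pixel example above, and the second is a made-up histogram whose probabilities are all powers of 1/2.

```python
import math


def entropy_bits(hist):
    """First-order entropy in bits per pixel: -sum P[l] lg P[l]."""
    total = sum(hist)
    return -sum(h / total * math.log2(h / total) for h in hist if h > 0)


def huffman_meets_entropy(hist):
    """True if -lg(H[l]/MN) is a whole number for every occurring value."""
    total = sum(hist)
    return all(math.log2(total / h).is_integer() for h in hist if h > 0)


example = [5, 9, 16, 21, 9, 3, 1, 0]
print(round(entropy_bits(example), 2), huffman_meets_entropy(example))  # 2.41 False
dyadic = [32, 16, 8, 8]
print(entropy_bits(dyadic), huffman_meets_entropy(dyadic))              # 1.75 True
```

For the example image the Huffman code's 2.5 bits/pixel is within one bit of the entropy of about 2.41 bits/pixel. For the dyadic histogram a Huffman code uses code words of 1, 2, 3 and 3 bits, which is exactly 1.75 bits/pixel.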

The table used for Huffman coding must be available during both coding and decoding. If the table used by the coder is not known to the decoder, it must be included with the image. If the table must be included with the image, the effective compression ratio is diminished.

1.2 Block Huffman Coding

Pixel-by-pixel Huffman coding does not take advantage of the spatial correlations that typically exist in images. These correlations can be used to code images more efficiently. Typically, inter-pixel correlation is largest between neighboring pixels. Block Huffman coding is based on the expectation that nearby pixels are correlated and can be more efficiently encoded as a block than as separate values.

Block Huffman coding is based on the multi-dimensional frequency distribution function where each dimension corresponds to one of the pixels in the block. For example, the blocks can be formed from sequential pairs of pixels (i.e., pixels 0 and 1, pixels 2 and 3, etc.), in which case the frequency distribution is a function of the pair of values. Consider, for instance, an M × N = 64 image with only four possible values and the 4 × 4 frequency distribution function of its block values:

| p[m, n] \ p[m, n + 1] | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 3 | 2 | 1 | 1 |
| 1 | 2 | 4 | 2 | 1 |
| 2 | 1 | 2 | 4 | 2 |
| 3 | 1 | 1 | 2 | 3 |

This second-order block frequency function indicates that there are 3 blocks with both pixels 0, 2 blocks with the first pixel 0 and the following pixel 1, etc.

With straight binary coding, this image requires 64 pixels × 2 bits/pixel = 128 bits. First-order Huffman coding (based on a distribution of 14, 18, 18, 14) still requires 2 bits/pixel and so 128 bits for the image, which is close to the first-order entropy of about 1.99 bits/pixel. For this image, block Huffman coding requires 123 bits. The second-order entropy is about 3.83 bits/block, which is about 1.91 bits/pixel and 122.5 bits/image. The storage for block Huffman coding is bounded below by the higher-order entropy and is no more than one bit per block larger than the higher-order entropy.
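The figures for this example can be reproduced from the 4 × 4 block table. The Python sketch below is self-contained and uses hypothetical helper names; it derives the first-order histogram from the pairs (each block contributes both of its pixels), then compares entropies and Huffman sizes at both orders.

```python
import heapq
import math

# pairs[a][b] = number of blocks with p[m, n] = a and p[m, n + 1] = b
pairs = [
    [3, 2, 1, 1],
    [2, 4, 2, 1],
    [1, 2, 4, 2],
    [1, 1, 2, 3],
]


def entropy_bits(counts):
    """Entropy in bits per symbol of a histogram of counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def huffman_total_bits(counts):
    """Total Huffman-coded length: the sum of all merged weights."""
    heap = list(counts)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


# First-order histogram: value v appears as the first or the second pixel.
first = [sum(pairs[v]) + sum(row[v] for row in pairs) for v in range(4)]
blocks = [c for row in pairs for c in row]

print(first)                          # [14, 18, 18, 14]
print(round(entropy_bits(first), 2))  # 1.99 (bits/pixel)
print(huffman_total_bits(first))      # 128 (bits)
print(round(entropy_bits(blocks), 2)) # 3.83 (bits/block)
print(huffman_total_bits(blocks))     # 123 (bits, for 32 blocks)
```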

The size of the table used for coding grows exponentially with the size of the block. For example, with a block size of 4 and 256 values per pixel, the code table can have up to 256^4 = 2^32 (more than 4 billion) entries. The size of the table is a major consideration for block Huffman coding.

1.3 Run-Length Encoding

Run-length encoding is another approach to compression that makes use of redundancy between consecutive pixels, in particular sequences of pixels that have the same value. Sequences of pixels with the same value are especially common in binary images such as scanned documents and facsimile transmissions. Grayscale images (and color images) can be decomposed into bit planes, which can be run-length encoded. For bit planes, run-length encoding typically is very effective in the more significant bit planes, in which successive pixels are highly correlated, but not very effective in the least significant bit planes.
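The difference between bit planes can be seen by extracting them and counting runs. This Python sketch uses hypothetical helper names and a synthetic, slowly varying ramp of 8-bit values chosen for illustration, not data from the notes.

```python
def bit_plane(pixels, k):
    """Return bit plane k (0 = least significant) of a sequence of pixels."""
    return [(p >> k) & 1 for p in pixels]


def count_runs(bits):
    """Number of runs (maximal groups of equal consecutive bits)."""
    return 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b) if bits else 0


# A slowly varying ramp: the high planes have a few long runs,
# while the least significant plane alternates at every pixel.
row = list(range(100, 140))
for k in (7, 4, 0):
    print(k, count_runs(bit_plane(row, k)))  # 7 2, then 4 3, then 0 40
```

Fewer, longer runs mean fewer run lengths to store, which is why the more significant planes compress well.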

Specific methods for run-length encoding vary according to the statistical properties of the data. In a simple scheme, the value and the number of pixels having the value are recorded. For example, 50 zeros, followed by 12 ones, followed by 15 zeros, etc., could be coded simply as 0, 50, 1, 12, 0, 15, and so on. In a binary image, there are only two possible values and the runs alternate, so the next value is known to be the other symbol, i.e., each run of zeros is followed by a run of ones and vice versa. Then, with the convention that the first length is for a run of zeros, these runs could be coded without the value as just the run lengths 50, 12, 15, and so on. If the image began with a run of ones, the first run length (for zeros) would be 0.
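The zeros-first convention for binary images can be sketched in Python with `itertools.groupby`. The function names are hypothetical; the input is assumed to be a list of 0s and 1s.

```python
from itertools import groupby


def rle_binary(bits):
    """Run lengths of a binary sequence, alternating zeros and ones.

    By convention the first run length counts zeros, so a sequence that
    starts with 1 gets a leading run length of 0.
    """
    runs = [len(list(group)) for _, group in groupby(bits)]
    if bits and bits[0] == 1:
        runs.insert(0, 0)
    return runs


def unrle_binary(runs):
    """Invert rle_binary: expand run lengths, starting with zeros."""
    bits = []
    for i, length in enumerate(runs):
        bits.extend([i % 2] * length)  # even positions are zeros, odd are ones
    return bits


seq = [0] * 50 + [1] * 12 + [0] * 15
print(rle_binary(seq))                     # [50, 12, 15]
print(rle_binary([1, 1, 0]))               # [0, 2, 1]
print(unrle_binary(rle_binary(seq)) == seq)  # True
```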

The run length typically is represented in a fixed-length integer, e.g., 8 bits per run length. If a run length is larger than can be represented in the allowed number of bits, it may be necessary to record more than one run length to represent the length of the run. For example, a run of 400 zeros could be represented using 8-bit run lengths as a run of 255 zeros, followed by a run of 0 ones, followed by a run of 145 zeros.
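Splitting over-long runs can be sketched as a pass over the run-length list. In this Python sketch the function name is hypothetical and the run lengths are assumed to alternate between the two symbols; inserting a zero-length run of the other symbol keeps that alternation intact.

```python
def split_runs(runs, bits=8):
    """Rewrite run lengths so that each fits in a fixed number of bits.

    A run longer than the maximum is split by inserting a zero-length run
    of the other symbol, as in the 400-zeros example.
    """
    max_len = 2 ** bits - 1  # 255 for 8-bit run lengths
    out = []
    for length in runs:
        while length > max_len:
            out.extend([max_len, 0])  # full run, then an empty run of the other symbol
            length -= max_len
        out.append(length)
    return out


print(split_runs([400]))  # [255, 0, 145]
```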

If an image has long runs, run-length encoding can be very effective, but if the image does not have long runs, run-length encoding may increase the number of bits required. A common situation is that one symbol (e.g., zero or one) is very common and the other occurs infrequently. One statistical model for run-length encoding assumes that successive values are independent. In that case, the run lengths for a given symbol follow a geometric distribution:

$$P[l] = p^{l}(1 - p), \quad l = 0, 1, 2, \ldots$$

where p is the probability of that symbol.


Run-length encoding yields a sequence of run lengths. If some run lengths are more common than others, the sequence of run lengths can be further compressed with Huffman or arithmetic coding.

1.4 Predictive Coding

Predictive coding attempts to remove redundancy by using prior pixel values to predict the next pixel value:

$$p_c[n] = p[n] - f\left(p[n'],\ \forall n' < n\right)$$

where $p_c[n]$ is the difference (or prediction error) between the pixel value $p[n]$ and the value generated by the predictor $f$. For each pixel, the predictor $f$ has access only to previously encoded pixels. Then, only what can’t be predicted, i.e., $p_c[n]$, must be coded. Removing redundancy reduces the entropy, which allows more efficient coding, e.g., with Huffman or arithmetic coding.

A simple scheme uses the value of the current pixel to predict the next pixel:

$$f\left(p[n'],\ \forall n' < n\right) \equiv \begin{cases} c & n = 0 \\ p[n-1] & n > 0 \end{cases} \tag{1.1}$$

where c is the initial predictor value used to predict the first pixel. Given the sequence of pixel values:

0, 2, 3, 3, 4, 5, 7, 5, ...

the predictor in Equation 1.1 with c = 0 yields the sequence:

0, 0, 2, 3, 3, 4, 5, 7, ...

and prediction errors:

0, 2, 1, 0, 1, 1, 2, −2, ...

computed as (0 − 0), (2 − 0), (3 − 2), (3 − 3), (4 − 3), (5 − 4), (7 − 5), (5 − 7), and so on.
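The encoder side of Equation 1.1 can be sketched in Python as follows. The function names are hypothetical; the pixel values are the eight listed in the example.

```python
def predict_prev(pixels, c=0):
    """Predictions from Equation 1.1: c for the first pixel, else the previous pixel."""
    return [c] + list(pixels[:-1]) if pixels else []


def prediction_errors(pixels, c=0):
    """p_c[n] = p[n] - f(...), using the previous-pixel predictor."""
    return [p - f for p, f in zip(pixels, predict_prev(pixels, c))]


p = [0, 2, 3, 3, 4, 5, 7, 5]
print(predict_prev(p))       # [0, 0, 2, 3, 3, 4, 5, 7]
print(prediction_errors(p))  # [0, 2, 1, 0, 1, 1, 2, -2]
```

The errors cluster around small values, so they have a more peaked histogram than the pixels themselves.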

To decode the pixel values from the prediction errors, the predicted value (from the same predictor with the same initial predictor value) is added to the prediction error:

$$p[n] = p_c[n] + f\left(p[n'],\ \forall n' < n\right).$$


For the example sequence, this recovers the original sequence, computed as (0 + 0), (2 + 0), (1 + 2), (0 + 3), (1 + 3), (1 + 4), (2 + 5), (−2 + 7), and so on.
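The decoder can be sketched in Python as a running reconstruction. The function name is hypothetical. Each decoded pixel becomes the prediction for the next one, so the decoder uses exactly the values the encoder used.

```python
def reconstruct(errors, c=0):
    """Invert previous-pixel prediction: p[n] = p_c[n] + f(...)."""
    pixels = []
    prediction = c  # Equation 1.1 predicts c for n = 0
    for e in errors:
        value = e + prediction
        pixels.append(value)
        prediction = value  # next prediction is the pixel just decoded
    return pixels


print(reconstruct([0, 2, 1, 0, 1, 1, 2, -2]))  # [0, 2, 3, 3, 4, 5, 7, 5]
```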

A linear predictor that uses a weighted sum of neighboring pixels can be optimized with respect to the mean-square prediction error. Let:

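$$f\left(p[m', n'],\ \forall (m', n') \text{ preceding } (m, n) \text{ in raster order}\right) \equiv \begin{cases} c & \text{if } m = 0 \text{ or } n = 0 \\ \operatorname{int}\left(a_1\, p[m, n-1] + a_2\, p[m-1, n]\right) & \text{otherwise} \end{cases}$$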

where $a_1$ and $a_2$ are the weights on the pixels to the left and above. Then, ignoring the rounding by int, the expected mean-square prediction error $\gamma_{p_c}^2$ is:

$$
\begin{aligned}
E\left\{(p_c[m,n])^2\right\}
&= E\Big\{(p[m,n])^2 - 2a_1\,p[m,n]\,p[m,n-1] - 2a_2\,p[m,n]\,p[m-1,n] \\
&\qquad + (a_1\,p[m,n-1])^2 + (a_2\,p[m-1,n])^2 + 2a_1a_2\,p[m,n-1]\,p[m-1,n]\Big\} \\
&= E\left\{(p[m,n])^2\right\} - 2a_1\,E\{p[m,n]\,p[m,n-1]\} - 2a_2\,E\{p[m,n]\,p[m-1,n]\} \\
&\qquad + a_1^2\,E\left\{(p[m,n-1])^2\right\} + a_2^2\,E\left\{(p[m-1,n])^2\right\} + 2a_1a_2\,E\{p[m,n-1]\,p[m-1,n]\}
\end{aligned}
$$

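If the autocorrelation function of the image is known (assuming the image is wide-sense stationary, so that each expectation depends only on the offset between the two pixels), then the expected mean-square error is:

$$\gamma_{p_c}^2 = R_p[0,0] - 2a_1 R_p[0,-1] - 2a_2 R_p[-1,0] + a_1^2 R_p[0,0] + a_2^2 R_p[0,0] + 2a_1a_2 R_p[-1,1]$$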

Then, the optimal values for the weights can be determined by setting the partial derivatives with respect to each weight equal to zero:

$$\frac{\partial \gamma_{p_c}^2}{\partial a_1} = -2R_p[0,-1] + 2a_1 R_p[0,0] + 2a_2 R_p[-1,1] = 0$$

$$\frac{\partial \gamma_{p_c}^2}{\partial a_2} = -2R_p[-1,0] + 2a_2 R_p[0,0] + 2a_1 R_p[-1,1] = 0$$

and solving the resulting linear equations for the unknowns $a_1$ and $a_2$:

$$a_1 = \frac{R_p[0,0]\,R_p[0,-1] - R_p[-1,0]\,R_p[-1,1]}{(R_p[0,0])^2 - (R_p[-1,1])^2}$$

$$a_2 = \frac{R_p[0,0]\,R_p[-1,0] - R_p[0,-1]\,R_p[-1,1]}{(R_p[0,0])^2 - (R_p[-1,1])^2}$$
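A minimal Python sketch of the weight formulas and the two-dimensional predictor follows. The function names and the autocorrelation values 1.0, 0.9, 0.8, 0.7 are made up for illustration. Python's `int()`, which truncates toward zero, stands in for the int in the predictor; the notes do not say which rounding rule is intended. Decoding visits pixels in raster order, so the left and upper neighbors are always reconstructed before they are needed.

```python
def optimal_weights(r00, r_left, r_up, r_diag):
    """Solve the two normal equations for a1 and a2.

    r00 = R_p[0, 0], r_left = R_p[0, -1], r_up = R_p[-1, 0], r_diag = R_p[-1, 1].
    """
    det = r00 ** 2 - r_diag ** 2
    if det == 0:
        raise ValueError("normal equations are singular")
    a1 = (r00 * r_left - r_up * r_diag) / det
    a2 = (r00 * r_up - r_left * r_diag) / det
    return a1, a2


def predict(img, m, n, a1, a2, c):
    """c on the first row/column, else int(a1 * left + a2 * above)."""
    if m == 0 or n == 0:
        return c
    return int(a1 * img[m][n - 1] + a2 * img[m - 1][n])  # int() truncates toward zero


def encode(img, a1, a2, c=0):
    """Prediction errors p_c[m, n] for the whole image."""
    return [[img[m][n] - predict(img, m, n, a1, a2, c)
             for n in range(len(img[0]))] for m in range(len(img))]


def decode(err, a1, a2, c=0):
    """Rebuild the image in raster order from its prediction errors."""
    rows, cols = len(err), len(err[0])
    img = [[0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            img[m][n] = err[m][n] + predict(img, m, n, a1, a2, c)
    return img


a1, a2 = optimal_weights(1.0, 0.9, 0.8, 0.7)
print(round(a1, 4), round(a2, 4))  # 0.6667 0.3333
```

Because the decoder applies the same predictor to the same previously decoded values, `decode(encode(img, a1, a2), a1, a2)` returns the original image exactly; only the choice of weights affects how small the errors are.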

1.5 Transform Coding