




Material Type: Notes; Professor: Reichenbach; Class: Digital Image Processing; Subject: Computer Science and Engineering ; University: University of Nebraska - Lincoln; Term: Unknown 2008;





As described in Chapter ??, the signals from scenes, which are continuous, must be converted to discrete values — a process referred to as quantization. Many image acquisition systems use a linear scale for quantization, but other scales can be used, e.g., a log scale. In any case, the scale used in quantization will have an ordinal sequence of discrete values.
Coding is the operation of representing each quantized value as a bit pattern. Many image acquisition systems use straightforward binary representations. For example, a system with a scale of values 0–255 would code the values with binary codes as:

Value   Code
0       00000000
1       00000001
2       00000010
3       00000011
...
254     11111110
255     11111111
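A minimal Python sketch of straightforward fixed-length binary coding, assuming an 8-bit scale. The helper names are illustrative; the built-in `format` with a zero-padded `b` specifier produces exactly the codes in the table above.

```python
def fixed_length_code(value, bits=8):
    """Straightforward binary code for a quantized value, as a bit string."""
    if not 0 <= value < 2 ** bits:
        raise ValueError("value outside the quantization scale")
    return format(value, f"0{bits}b")  # zero-padded base 2, e.g. 3 -> '00000011'


def fixed_length_decode(code):
    """Invert fixed_length_code by parsing the bit string as an unsigned integer."""
    return int(code, 2)


codes = [fixed_length_code(v) for v in (0, 1, 2, 3, 254, 255)]
print(codes)  # ['00000000', '00000001', '00000010', '00000011', '11111110', '11111111']
```

Every value uses the same number of bits, which is what the variable-length schemes below improve on.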
Costs and limitations on storage and communication motivate the development of methods for image compression. Data compression is the operation of reducing the number of bits required to represent a sequence of values. Access to pixels in a compressed image (e.g., for viewing or processing) is typically inefficient, so images are decompressed to a more usable format before other operations are performed. The processes of compression and decompression are pictured in Figure ??.
A central issue in compression is whether the result after compression and decompression is identical to the original image. If the result is identical to the source, then the compression scheme is called lossless. If the result is not identical to the source, then the scheme is called lossy. Superficially, it seems highly desirable to use lossless compression, but lossy compression schemes often can achieve an order-of-magnitude saving in representation overhead and still produce an image that is very similar to the source.
Straightforward binary representations use the same number of bits for each quantized value. A more efficient scheme can be developed based on the observation that some values occur more frequently than other values. If a code with fewer bits is used for the values that occur more frequently, then the image can be coded in fewer bits (even if longer codes are required for values that occur less frequently).
Huffman coding is one of the simplest schemes based on this approach. A Huffman code can be constructed as follows:

1. Create a forest of single-node trees, one for each quantization value, with the frequency of that value as the tree's associated frequency.
2. Repeat until the forest contains a single tree:
   (a) Select the two trees with the smallest associated frequency and remove them from the forest of trees. (If more than two trees tie for the smallest frequency, any choice is acceptable.)
   (b) Join the two selected trees as children of a new root node (one is the ‘0’ child, one is the ‘1’ child, and either permutation is acceptable). Set the associated frequency of the new root node to the sum of the frequencies of the two child nodes. (A parent node will not have an associated quantization value.) Add the new tree to the forest.
3. The code for each quantization value is the sequence of ‘0’ and ‘1’ labels on the path from the root to the leaf holding that value.
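The construction steps above can be sketched in Python as follows. This is an illustrative implementation, not the notes' own: it assumes the frequencies arrive as a dict, uses `heapq` to find the two smallest trees, breaks ties by insertion order, represents internal nodes as `(zero_child, one_child)` tuples (so quantization values must not themselves be tuples), and skips values with zero frequency.

```python
import heapq
from itertools import count


def huffman_code(freqs):
    """Build a Huffman code from a mapping {quantization value: frequency}.

    Values with zero frequency are skipped (assumption: they never occur,
    so they need no code). Returns {value: bit string}.
    """
    tiebreak = count()  # unique sequence numbers so heapq never compares trees
    # Initial forest: one single-node tree (a leaf holding the value) per value.
    forest = [(f, next(tiebreak), v) for v, f in freqs.items() if f > 0]
    if not forest:
        return {}
    if len(forest) == 1:  # degenerate case: give the only value a 1-bit code
        return {forest[0][2]: "0"}
    heapq.heapify(forest)
    while len(forest) > 1:
        # (a) remove the two trees with the smallest associated frequencies
        f0, _, zero_child = heapq.heappop(forest)
        f1, _, one_child = heapq.heappop(forest)
        # (b) join them under a new root whose frequency is their sum
        heapq.heappush(forest, (f0 + f1, next(tiebreak), (zero_child, one_child)))
    codes = {}

    def assign(tree, prefix):
        if isinstance(tree, tuple):  # internal node: (0-child, 1-child)
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:                        # leaf: a quantization value
            codes[tree] = prefix

    assign(forest[0][2], "")
    return codes


H = [5, 9, 16, 21, 9, 3, 1, 0]  # histogram H[l] of the 8-value example
codes = huffman_code(dict(enumerate(H)))
total_bits = sum(H[v] * len(code) for v, code in codes.items())
print(total_bits)  # 159
```

Applied to the histogram of the 8-value example, this sketch gives a total of 159 bits, compared with 192 bits for 3-bit fixed-length codes. How ties are broken can change which code each value receives, but not the total, because every Huffman code for a given distribution has the same minimal total length.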
Example. Consider an image with M × N = 64 pixels and a quantization scale with 8 values. With straightforward binary coding, each value would be coded with three bits: ‘000’ for 0, ‘001’ for 1, ..., ‘111’ for 7. With 64 pixels and three bits per pixel, straightforward binary coding would require 192 bits.
A Huffman code can be generated based on the frequency distribution function:

l       0   1   2   3   4   5   6   7
H[l]    5   9  16  21   9   3   1   0
The tree illustrated in Figure ?? captures the information in the frequency distribution function. Because the algorithm for producing the tree is indeterminate with respect to which smallest values are combined and with respect to which is the ‘0’ and ‘1’ child, other trees, and therefore other codes, can be produced from the same frequency distribution.
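The entropy of the image is the lower bound on the storage that can be achieved by Huffman coding. This bound is achieved if and only if

$$\forall l \text{ with } H[l] > 0, \quad -\lg\left(\frac{H[l]}{MN}\right) \text{ is a whole number,}$$

that is, if every nonzero probability H[l]/(MN) is a power of 1/2.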
In practice, this is seldom true, but Huffman coding always achieves compression within one bit per pixel of the entropy, and it is typically much closer than that. Other compression schemes, such as arithmetic coding, have been designed to reduce the inefficiency caused by fractional log probabilities and can achieve compression arbitrarily close to the entropy.
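To make the bound concrete, the following sketch (helper names are hypothetical) computes the first-order entropy of a histogram in bits and tests the whole-number condition with exact integer arithmetic, which sidesteps floating-point round-off in `math.log2`.

```python
import math


def entropy_bits_per_pixel(hist):
    """First-order entropy -sum P[l] lg P[l], with P[l] = H[l] / (M N), in bits/pixel.
    Values with H[l] = 0 are skipped (0 lg 0 is taken to be 0)."""
    total = sum(hist)
    return -sum(h / total * math.log2(h / total) for h in hist if h > 0)


def huffman_meets_entropy(hist):
    """True if -lg(H[l] / (M N)) is a whole number for every value that occurs,
    i.e. every nonzero H[l] / (M N) is an exact power of 1/2 (integer check)."""
    total = sum(hist)
    for h in hist:
        if h == 0:
            continue
        if total % h != 0:
            return False
        ratio = total // h              # must equal 2**k for some k >= 0
        if (ratio & (ratio - 1)) != 0:  # nonzero means not a power of two
            return False
    return True


H = [5, 9, 16, 21, 9, 3, 1, 0]  # histogram from the 8-value example
print(round(entropy_bits_per_pixel(H) * sum(H), 1))  # 154.3 (bits for the image)
print(huffman_meets_entropy(H))                      # False
print(huffman_meets_entropy([32, 16, 8, 8]))         # True: probabilities 1/2, 1/4, 1/8, 1/8
```

For a histogram such as 32, 16, 8, 8 the condition holds, and a Huffman code with lengths 1, 2, 3, 3 uses exactly the 112-bit entropy.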
The table used for Huffman coding must be available during both coding and decoding. If the table used by the coder is not known to the decoder, it must be included with the image, which diminishes the effective compression ratio.
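A small sketch of why coder and decoder must share the table: the decoder can only split the bit stream into code words if it knows exactly which bit strings the coder used. The table here is a hypothetical prefix-free code for four values, not one derived in the notes.

```python
def encode(values, table):
    """Concatenate the code of each value; table maps value -> bit string."""
    return "".join(table[v] for v in values)


def decode(bits, table):
    """Decode with the same prefix-free table the coder used."""
    inverse = {code: value for value, code in table.items()}
    values, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # prefix-free: the first match ends a code word
            values.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return values


# Hypothetical prefix-free table for a 4-value scale (not taken from the notes).
TABLE = {0: "0", 1: "10", 2: "110", 3: "111"}
pixels = [0, 0, 1, 3, 2, 0]
bits = encode(pixels, TABLE)
print(bits, decode(bits, TABLE))  # 00101111100 [0, 0, 1, 3, 2, 0]
```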
Pixel-by-pixel Huffman coding does not take advantage of the spatial correlations that typically exist in images. These correlations can be used to more efficiently code images. Typically, inter-pixel correlation is largest between neighboring pixels. Block Huffman coding is based on the expectation that nearby pixels are correlated and can be more efficiently encoded as a block than as separate values.
Block Huffman coding is based on the multi-dimensional frequency distribution function, where each dimension corresponds to one of the pixels in the block. For example, the blocks can be formed from sequential pairs of pixels (i.e., pixels 0 and 1, pixels 2 and 3, etc.), in which case the frequency distribution is a function of the pair of values. For example, consider an M × N = 64 image with only four possible values and the 4 × 4 frequency distribution function of its block values:
p[m, n] \ p[m, n + 1]    0   1   2   3
0                        3   2   1   1
1                        2   4   2   1
2                        1   2   4   2
3                        1   1   2   3
This second-order block frequency function indicates that there are 3 blocks with both pixels 0, 2 blocks with the first pixel 0 and the following pixel 1, etc.
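Forming the blocks and counting them can be sketched as below, assuming the pixels are given as a flat sequence in raster order with an even length and that pairs are taken as in the text (pixels 0 and 1, pixels 2 and 3, ...). The function name and the pixel values are made up for illustration.

```python
def pair_histogram(pixels, levels):
    """Second-order block histogram for blocks of sequential pixel pairs:
    hist[a][b] counts blocks whose first pixel is a and second pixel is b."""
    if len(pixels) % 2:
        raise ValueError("expected an even number of pixels")
    hist = [[0] * levels for _ in range(levels)]
    for first, second in zip(pixels[0::2], pixels[1::2]):
        hist[first][second] += 1
    return hist


row = [0, 0, 0, 1, 1, 1, 2, 3, 0, 0, 3, 3]  # hypothetical pixel values, 4 levels
hist = pair_histogram(row, 4)
print(hist)  # [[2, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]]
```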
With straightforward binary coding, this image requires 64 pixels × 2 bits/pixel = 128 bits. First-order Huffman coding (based on a distribution of 14, 18, 18, 14) still requires 2 bits/pixel, and so 128 bits for the image; the first-order entropy is about 1.99 bits/pixel (about 127.3 bits/image), so coding pixels individually gains almost nothing. For this image, block Huffman coding requires 123 bits. The second-order entropy is about 3.83 bits/block, which is about 1.91 bits/pixel and 122.5 bits/image. The storage for block Huffman coding is bounded below by the higher-order entropy and is no more than one bit per block larger than the higher-order entropy.
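These figures can be checked with a short sketch that starts from the 4 × 4 block table. It relies on one property of the construction: the total Huffman-coded length equals the sum of the frequencies of all the new roots created while merging, because each merge adds one bit to every code below the new root. The helper names are hypothetical.

```python
import heapq
import math

# Second-order block frequencies from the 4 x 4 table:
# row index = p[m, n] (first pixel), column index = p[m, n + 1] (second pixel).
BLOCKS = [[3, 2, 1, 1],
          [2, 4, 2, 1],
          [1, 2, 4, 2],
          [1, 1, 2, 3]]


def entropy_bits(counts):
    """Entropy in bits per symbol of a list of frequencies (zeros skipped)."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def huffman_total_bits(counts):
    """Total Huffman-coded length in bits: the sum of the frequencies of all new
    roots created while repeatedly merging the two smallest trees."""
    heap = [c for c in counts if c > 0]
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged  # one more bit for every symbol under the new root
        heapq.heappush(heap, merged)
    return total


block_counts = [c for row in BLOCKS for c in row]  # 16 block values, 32 blocks
# Each value's first-order count: occurrences as first pixel (row sum)
# plus occurrences as second pixel (column sum).
first_order = [sum(BLOCKS[v]) + sum(row[v] for row in BLOCKS) for v in range(4)]

print(first_order)                                # [14, 18, 18, 14]
print(huffman_total_bits(first_order))            # 128
print(round(entropy_bits(first_order) * 64, 1))   # 127.3
print(huffman_total_bits(block_counts))           # 123
print(round(entropy_bits(block_counts) * 32, 1))  # 122.5
```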
The size of the table used for coding grows exponentially with the size of the block. For example, with a block size of 4 and 256 values per pixel, the code table can have more than 4 billion entries (256^4 = 4,294,967,296). The size of the table is a major consideration for block Huffman coding.
Run-length encoding is another approach to compression that makes use of redundancy between consecutive pixels, in particular sequences of pixels that have the same value. Sequences of pixels with the same value are especially common in binary images such as scanned documents and facsimile transmissions. Grayscale images (and color images) can be decomposed into bit planes, which can be run-length encoded. For bit planes, run-length encoding typically is very effective in the more significant bit planes, in which successive pixels are highly correlated, but not very effective in the least significant bit planes.
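The contrast between significant and insignificant bit planes can be illustrated with a hypothetical 8-bit row that increases smoothly from 0 to 255. This ramp is an idealized case chosen for clarity; in real images the least significant planes usually look closer to noise. The function names are illustrative.

```python
def bit_plane(values, k):
    """Bit plane k (0 = least significant) of a sequence of non-negative pixel values."""
    return [(v >> k) & 1 for v in values]


def count_runs(bits):
    """Number of maximal runs of identical values in a sequence."""
    if not bits:
        return 0
    return 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)


ramp = list(range(256))  # hypothetical smooth 8-bit row: 0, 1, ..., 255
runs_per_plane = [count_runs(bit_plane(ramp, k)) for k in range(8)]
print(runs_per_plane)  # [256, 128, 64, 32, 16, 8, 4, 2]
```

The most significant plane needs only two runs, while the least significant plane changes at every pixel, so run-length coding it would cost more than storing the bits directly.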
Specific methods for run-length encoding vary according to the statistical properties of the data. In a simple scheme, the value and the number of pixels having the value are recorded. For example, 50 zeros, followed by 12 ones, followed by 15 zeros, etc., could be coded simply as 0, 50, 1, 12, 0, 15, .... In a binary image, there are only two possible values and the runs alternate, so the next value is known to be the other symbol, i.e., each run of zeros is followed by a run of ones and vice versa. Then, with the convention that the first length is for a run of zeros, these runs could be coded without the values as just the run lengths 50, 12, 15, .... If the image began with a run of ones, the first run length (for zeros) would be 0.
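A sketch of the binary convention described above, where only run lengths are stored and the first length always counts zeros. It assumes the input contains only 0s and 1s; the function names are illustrative.

```python
def binary_run_lengths(bits):
    """Run lengths of a binary sequence, alternating zeros and ones.
    By convention the first length counts zeros, so it is 0 when the
    sequence starts with a one."""
    runs = []
    current, length = 0, 0  # start by counting a run of zeros
    for b in bits:
        if b == current:
            length += 1
        else:
            runs.append(length)  # close the current run
            current, length = b, 1
    runs.append(length)
    return runs


def binary_from_run_lengths(runs):
    """Invert binary_run_lengths: expand alternating runs starting with zeros."""
    bits = []
    for i, length in enumerate(runs):
        bits.extend([i % 2] * length)  # even index: zeros, odd index: ones
    return bits


print(binary_run_lengths([0] * 50 + [1] * 12 + [0] * 15))  # [50, 12, 15]
print(binary_run_lengths([1, 1, 0]))                       # [0, 2, 1]
```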
The run length typically is represented as a fixed-length integer, e.g., 8 bits per run length. If a run is longer than can be represented in the allowed number of bits, it may be necessary to record more than one run length to represent it. For example, a run of 400 zeros could be represented using 8-bit run lengths as a run of 255 zeros, followed by a run of 0 ones, followed by a run of 145 zeros.
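Splitting over-long runs can be sketched as a post-processing step on the run lengths, assuming the zeros-first alternating convention and a maximum of 255 (the largest unsigned 8-bit value). Inserting a zero-length run of the other symbol keeps the alternation intact, as in the 400-zero example. The function name is hypothetical.

```python
def limit_run_lengths(runs, max_len=255):
    """Rewrite alternating run lengths (zeros first) so that none exceeds max_len.
    A run that is too long is split by inserting a zero-length run of the other
    symbol, which preserves the alternation of zeros and ones."""
    out = []
    for length in runs:
        while length > max_len:
            out.extend([max_len, 0])  # max_len of this symbol, then 0 of the other
            length -= max_len
        out.append(length)
    return out


print(limit_run_lengths([400]))      # [255, 0, 145]
print(limit_run_lengths([400, 12]))  # [255, 0, 145, 12]
```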
If an image has long runs, run-length encoding can be very effective, but if the image does not have long runs, run-length encoding may increase the number of bits required. A common situation is that one symbol (e.g., zero or one) is very common and the other occurs infrequently. One statistical model for run-length encoding assumes that successive values are independent. In that case, the run lengths for a given symbol follow a geometric distribution:

$$P[l] = p^{l}(1 - p), \quad l = 0, 1, 2, \ldots$$

where p is the probability that a pixel has the given symbol.
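The geometric model can be checked against simulated data. This sketch assumes independent pixels in which the common symbol 0 occurs with probability p = 0.9 (a hypothetical value), collects the lengths of the zero runs that end at each 1, and compares them with P[l]; under this model the mean run length is p/(1 − p) = 9.

```python
import random


def geometric_pmf(l, p):
    """P[l] = p**l * (1 - p): probability of a run of exactly l copies of a
    symbol with per-pixel probability p, assuming independent pixels."""
    return p ** l * (1 - p)


def zero_runs(bits):
    """Lengths of the runs of 0s ended by each 1 (a run may have length 0).
    A trailing run with no terminating 1 is dropped."""
    runs, length = [], 0
    for b in bits:
        if b == 0:
            length += 1
        else:
            runs.append(length)
            length = 0
    return runs


random.seed(1)  # fixed seed so the simulation is repeatable
p = 0.9         # hypothetical probability of the common symbol 0
bits = [0 if random.random() < p else 1 for _ in range(200_000)]
runs = zero_runs(bits)

for l in range(4):
    observed = runs.count(l) / len(runs)
    print(l, round(observed, 3), round(geometric_pmf(l, p), 3))
print(sum(runs) / len(runs), p / (1 - p))  # mean run length vs. p / (1 - p)
```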
Run-length encoding yields a sequence of run lengths. If some run lengths are more common than others, the sequence of run lengths can be further compressed with Huffman or arithmetic coding.
Predictive coding attempts to remove redundancy by using prior pixel values to predict the next pixel value:
$$p_c[n] = p[n] - f\left(\forall p[n'],\ n' < n\right)$$
where $p_c$ is the difference (or prediction error) between the pixel value $p[n]$ and the value generated by the predictor f. For each pixel, the predictor f has access only to previously encoded pixels. Then only what cannot be predicted, i.e., $p_c[n]$, must be coded. Removing redundancy reduces the entropy, which allows more efficient coding, e.g., with Huffman or arithmetic coding.
A simple scheme uses the value of the current pixel to predict the next pixel:
$$f\left(\forall p[n'],\ n' < n\right) \equiv \begin{cases} c & n = 0 \\ p[n-1] & n > 0 \end{cases} \tag{1.1}$$
where c is the initial predictor value used to predict the first pixel. Given the sequence of pixel values:

0, 2, 3, 3, 4, 5, 7, 5, ...

the predictor in Equation 1.1 with c = 0 yields the sequence of predicted values:

0, 0, 2, 3, 3, 4, 5, 7, ...

and the prediction errors:

0, 2, 1, 0, 1, 1, 2, −2, ...

computed as (0 − 0), (2 − 0), (3 − 2), (3 − 3), (4 − 3), (5 − 4), (7 − 5), (5 − 7), ....
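The previous-pixel predictor of Equation 1.1 and the resulting prediction errors can be sketched in a few lines; the function name is hypothetical, and the input reproduces the example sequence.

```python
def previous_pixel_errors(pixels, c=0):
    """Prediction errors p_c[n] = p[n] - f(...) for the previous-pixel predictor:
    predict c for n = 0 and p[n - 1] for n > 0."""
    errors = []
    prediction = c
    for value in pixels:
        errors.append(value - prediction)
        prediction = value  # the next pixel is predicted by this one
    return errors


print(previous_pixel_errors([0, 2, 3, 3, 4, 5, 7, 5]))  # [0, 2, 1, 0, 1, 1, 2, -2]
```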
To decode the pixel values from the prediction errors, the predicted value (from the same predictor with the same initial predictor value) is added to the prediction error:
$$p[n] = p_c[n] + f\left(\forall p[n'],\ n' < n\right).$$
For the example sequence, this recovers the original values, computed as (0 + 0), (2 + 0), (1 + 2), (0 + 3), (1 + 3), (1 + 4), (2 + 5), (−2 + 7), ....
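Decoding mirrors the coder: each decoded value becomes the prediction for the next one, so the decoder must use the same predictor and the same initial value c. A minimal sketch, with a hypothetical function name:

```python
def previous_pixel_decode(errors, c=0):
    """Recover p[n] = p_c[n] + f(...) using the same previous-pixel predictor
    and the same initial value c as the coder."""
    pixels = []
    prediction = c
    for e in errors:
        value = e + prediction
        pixels.append(value)
        prediction = value  # the decoded value predicts the next pixel
    return pixels


print(previous_pixel_decode([0, 2, 1, 0, 1, 1, 2, -2]))  # [0, 2, 3, 3, 4, 5, 7, 5]
```

With a different initial value, e.g. c = 10, every decoded pixel would be off by 10, which is why c must be agreed on in advance.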
A linear predictor that uses a weighted sum of neighboring pixels can be optimized with respect to the mean-square prediction error. Let:
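$$f\left(\forall p[m', n'],\ m' < m \text{ or } (m' = m,\ n' < n)\right) \equiv \begin{cases} c & \text{if } m = 0 \text{ or } n = 0 \\ \operatorname{int}\left(a_1\, p[m, n-1] + a_2\, p[m-1, n]\right) & \text{otherwise} \end{cases}$$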
where $a_1$ and $a_2$ are the weights on the pixels to the left and above. Then, ignoring the rounding by int, the expected mean-square prediction error $\gamma_{p_c}^2$ is:

$$\begin{aligned}
\gamma_{p_c}^2 &= E\left\{(p_c[m,n])^2\right\} \\
&= E\Big\{(p[m,n])^2 - 2a_1\, p[m,n]\, p[m,n-1] - 2a_2\, p[m,n]\, p[m-1,n] \\
&\qquad + (a_1\, p[m,n-1])^2 + (a_2\, p[m-1,n])^2 + 2a_1 a_2\, p[m,n-1]\, p[m-1,n]\Big\} \\
&= E\left\{(p[m,n])^2\right\} - 2a_1 E\left\{p[m,n]\, p[m,n-1]\right\} - 2a_2 E\left\{p[m,n]\, p[m-1,n]\right\} \\
&\qquad + a_1^2 E\left\{(p[m,n-1])^2\right\} + a_2^2 E\left\{(p[m-1,n])^2\right\} + 2a_1 a_2 E\left\{p[m,n-1]\, p[m-1,n]\right\}
\end{aligned}$$
If the image is stationary and its autocorrelation function $R_p[k, l] = E\{p[m, n]\, p[m + k, n + l]\}$ is known, then the expected mean-square error is:
$$\gamma_{p_c}^2 = R_p[0,0] - 2a_1 R_p[0,-1] - 2a_2 R_p[-1,0] + a_1^2 R_p[0,0] + a_2^2 R_p[0,0] + 2a_1 a_2 R_p[-1,1]$$
Then, the optimal values for the weights can be determined by setting the partial derivatives with respect to each weight equal to zero:
$$\frac{\partial \gamma_{p_c}^2}{\partial a_1} = -2R_p[0,-1] + 2a_1 R_p[0,0] + 2a_2 R_p[-1,1] = 0$$

$$\frac{\partial \gamma_{p_c}^2}{\partial a_2} = -2R_p[-1,0] + 2a_2 R_p[0,0] + 2a_1 R_p[-1,1] = 0$$
and solving the resulting linear equations for the unknowns $a_1$ and $a_2$:
$$a_1 = \frac{R_p[0,0]\, R_p[0,-1] - R_p[-1,0]\, R_p[-1,1]}{(R_p[0,0])^2 - (R_p[-1,1])^2}$$

$$a_2 = \frac{R_p[0,0]\, R_p[-1,0] - R_p[0,-1]\, R_p[-1,1]}{(R_p[0,0])^2 - (R_p[-1,1])^2}$$
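Putting the pieces together, the following sketch estimates the autocorrelation values from an image, computes $a_1$ and $a_2$ with the closed-form expressions, and applies the predictor. Several choices are assumptions rather than part of the notes: $R_p$ is estimated by averaging products over all positions where both pixels lie inside the image, the image is a small made-up array, and Python's `int()` (which truncates toward zero) stands in for int. All function names are hypothetical.

```python
IMG = [[10, 12, 13, 15],  # hypothetical 4 x 4 image, not from the notes
       [11, 13, 15, 16],
       [12, 14, 16, 18],
       [13, 15, 17, 20]]


def autocorrelation(img, k, l):
    """Estimate R_p[k, l] = E{p[m, n] p[m + k, n + l]} by averaging over every
    position where both pixels are inside the image (assumes stationarity)."""
    rows, cols = len(img), len(img[0])
    total, count = 0.0, 0
    for m in range(rows):
        for n in range(cols):
            if 0 <= m + k < rows and 0 <= n + l < cols:
                total += img[m][n] * img[m + k][n + l]
                count += 1
    return total / count


def optimal_weights(img):
    """Closed-form a_1 (left weight) and a_2 (above weight)."""
    r00 = autocorrelation(img, 0, 0)    # R_p[0, 0]
    r0m1 = autocorrelation(img, 0, -1)  # R_p[0, -1]
    rm10 = autocorrelation(img, -1, 0)  # R_p[-1, 0]
    rm11 = autocorrelation(img, -1, 1)  # R_p[-1, 1]
    det = r00 ** 2 - rm11 ** 2          # assumed nonzero
    a1 = (r00 * r0m1 - rm10 * rm11) / det
    a2 = (r00 * rm10 - r0m1 * rm11) / det
    return a1, a2


def prediction_errors(img, a1, a2, c=0):
    """p_c[m, n] = p[m, n] - f(...): predict c on the first row and column,
    otherwise int(a1 * p[m, n - 1] + a2 * p[m - 1, n])."""
    errors = []
    for m, row in enumerate(img):
        err_row = []
        for n, value in enumerate(row):
            if m == 0 or n == 0:
                pred = c
            else:
                pred = int(a1 * img[m][n - 1] + a2 * img[m - 1][n])
            err_row.append(value - pred)
        errors.append(err_row)
    return errors


a1, a2 = optimal_weights(IMG)
errors = prediction_errors(IMG, a1, a2)
print(round(a1, 2), round(a2, 2))
print(errors)
```

The weights satisfy the two zero-derivative equations by construction; with real images the estimated $R_p$ values, and hence the weights, depend on how the border positions are handled.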