
















Material Type: Notes; Class: Computer Science 0; Subject: Computer Science; University: West Virginia University; Term: Unknown 1989;
Typology: Study notes

















Image Model
An image of a surface is a 2-D function of the light intensity falling on the surface.
For a given spatial location $(x, y)$ on the surface, the image $f(x, y)$ is formed from a combination of the illumination on the point and the surface reflectance at that point:
$$f(x, y) = i(x, y)\, s(x, y)$$
where $i(x, y)$ is the illumination component and $s(x, y)$ is the component due to surface reflectance.
Typically, $0 < i(x, y) < \infty$ and $0 < s(x, y) < 1$, thus $0 < f(x, y) < \infty$.
In practice, $f$ is bounded within certain limits, called the grey-scale, such that $L_{min} \le f(x, y) \le L_{max}$. The scale is further shifted such that $L_{min} = 0$ and $L_{max} = L$, with $f(x, y) = 0$ corresponding to black and $f(x, y) = L$ corresponding to white.
The image formed in this way is called a monochrome ( or grey-scale) image , as it considers only the intensity values.
Sampling and Quantization
To convert to digital form, the image is discretized - both in its spatial co-ordinates, and its amplitude values.
Discretization of the spatial coordinates (x, y) is called image sampling.
Discretization of the amplitude values f(x, y) is called grey-level quantization.
The result of the digitization process is a digital image.
We can then represent the image as an N x M array of equally-spaced samples:
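$$f(x, y) = \begin{bmatrix} f(0,0) & f(0,1) & \cdots & f(0,M-1) \\ f(1,0) & f(1,1) & \cdots & f(1,M-1) \\ \vdots & \vdots & \ddots & \vdots \\ f(N-1,0) & f(N-1,1) & \cdots & f(N-1,M-1) \end{bmatrix}$$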
Each element in the array is called a picture element, or pixel.
The number of quantization levels used on the amplitude value determines the number of grey levels used to represent the
image. Typical examples are 2, 128, 256, 1024, etc.
If G is the number of grey levels, such that $G = 2^g$, then the number of bits needed to represent the digital image will be:
$$b = M \cdot N \cdot g$$
The product N·M is called the image size, measured in pixels.
Notice that N and M in turn depend on the step sizes $(\Delta_n, \Delta_m)$ used to sample the spatial coordinates. The overall resolution of the image depends on N, M and g. The higher the values, the better the resolution, but the larger the data size.
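As a quick check of the storage formula b = M·N·g, the following minimal sketch computes the number of bits for a given image size and number of grey levels. The function name `image_bits` is only illustrative and does not come from the notes.

```python
def image_bits(n_rows, n_cols, grey_levels):
    """Return b = M * N * g for an N x M image with G = 2**g grey levels."""
    g = grey_levels.bit_length() - 1       # g such that 2**g == grey_levels
    if 2 ** g != grey_levels:
        raise ValueError("grey_levels must be a power of two")
    return n_rows * n_cols * g

print(image_bits(512, 512, 256))  # 512 * 512 * 8 = 2097152 bits
```

Halving g from 8 to 4 halves the storage, while halving both N and M cuts it to a quarter.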
<< See attached images on the effect of resolution - and compare with the above results>>
Basic Pixel Relationships
Pixel Neighbours
A pixel p1 is a neighbour of another pixel p2 if their spatial coordinates (x1, y1) and (x2, y2) differ by at most one unit in each direction. The following types of neighbours are often used in image processing:
♦ Horizontal neighbours
♦ Vertical neighbours
♦ Diagonal neighbours
♦ 4-neighbours – comprising vertical and horizontal neighbours
♦ 8-neighbours – comprising 4-neighbours and diagonal neighbours
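The neighbour sets can be written down directly from the pixel coordinates. This is a minimal sketch with illustrative function names; it does not check whether the returned coordinates fall inside the image, so a caller would have to discard positions outside the array.

```python
def neighbours_4(x, y):
    """Horizontal and vertical neighbours of (x, y)."""
    return [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]

def neighbours_diagonal(x, y):
    """Diagonal neighbours of (x, y)."""
    return [(x - 1, y - 1), (x - 1, y + 1), (x + 1, y - 1), (x + 1, y + 1)]

def neighbours_8(x, y):
    """4-neighbours together with the diagonal neighbours."""
    return neighbours_4(x, y) + neighbours_diagonal(x, y)
```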
Pixel Connectivity
For two pixels to be connected they must be neighbours and they must be similar based on certain criteria. A simple
similarity criterion is equality. Another is maximum difference – i.e. the pixel values may not necessarily be equal, but they
cannot differ by more than a specified threshold. Basic types of connectivity used include:
♦ 4-connectivity
♦ 8-connectivity
♦ m-connectivity (mixed connectivity)
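A connectivity test combines the two conditions given above: adjacency and similarity. The sketch below assumes the image is a list of rows indexed as `image[x][y]` and uses the maximum-difference criterion, where a threshold of 0 reduces to equality. It covers 4- and 8-connectivity only; m-connectivity is not handled. The function and parameter names are illustrative.

```python
def are_connected(image, p1, p2, threshold=0, use_8=False):
    """True if p1 and p2 are neighbours and their values differ by <= threshold."""
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = abs(x1 - x2), abs(y1 - y2)
    if use_8:
        adjacent = max(dx, dy) == 1        # 8-neighbour test
    else:
        adjacent = dx + dy == 1            # 4-neighbour test
    similar = abs(image[x1][y1] - image[x2][y2]) <= threshold
    return adjacent and similar
```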
Arithmetic and Logic Operations
We can perform the four basic arithmetic operations on pixels:
♦ Addition: p1 + p2 – used in image averaging
♦ Subtraction: p1 - p2 – used in image motion analysis and background removal
♦ Multiplication: p1 * p2 – used in colour and image shading operations
♦ Division: p1 / p2 – used in colour processing
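Two of these operations are sketched below for whole images, applied pixel by pixel. The images are assumed to be equally sized lists of rows; the function names are illustrative.

```python
def average_images(images):
    """Pixel-wise mean of several equally sized images (addition, then scaling)."""
    count = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / count for c in range(cols)]
            for r in range(rows)]

def subtract_images(a, b):
    """Pixel-wise difference a - b, e.g. current frame minus background."""
    return [[pa - pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]
```

In the subtraction case, pixels that did not change between the two images come out as 0, which is why the operation is useful for motion analysis and background removal.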
Colour Models
The colour models used in image processing generally represent colour information in a 3-D coordinate system, whereby
each colour can be represented as a single point in the 3-D space.
Various models have been proposed, and are currently in use. Which model to use is often dependent on the application.
In image processing, the three models that are often used are the RGB, YIQ, and HSI.
RGB Representation
In the RGB representation, each image is represented with three independent image planes, one for each of the three primary
colours R, G, or B.
When the R,G,B values are normalized to unit values, the RGB model can be represented as a unit cube.
The vertices of the cube represent different colours. For instance, the colour at the origin (0,0,0) is taken to be black, while
the colour at the furthest point (1,1,1) is taken to be white. The line from (0,0,0) to (1,1,1) thus defines the grey-scale
representation.
Sometimes, the normalized R,G,B values are also used in image processing:
$$r = \frac{R}{R+G+B}, \qquad g = \frac{G}{R+G+B}, \qquad b = \frac{B}{R+G+B}$$
Clearly, r + g + b = 1.
The normalized values are also called the trichromatic coefficients.
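The trichromatic coefficients divide each component by the sum R + G + B. The sketch below uses an illustrative function name and raises an error for black, where the sum is zero and the coefficients are undefined.

```python
def trichromatic(R, G, B):
    """Return (r, g, b) with r + g + b = 1."""
    total = R + G + B
    if total == 0:
        raise ValueError("undefined for black (R = G = B = 0)")
    return R / total, G / total, B / total

print(trichromatic(50, 100, 50))  # (0.25, 0.5, 0.25)
```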
The RGB model is used mainly in colour monitors and colour cameras.
One advantage of the RGB representation is that most of the other colour models can be obtained by simple linear
transformation on the RGB values. The problem is that the intensity component is not de-coupled from the chromatic
components.
YIQ Representation
This is used mainly in TV broadcast, and in image transmission. The YIQ de-couples the colour components (I,Q) from the
achromatic component, (Y), which represents the monochrome (grey-level) information. It is thus compatible with
monochrome TV standards.
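The decoupling can be observed with the `colorsys` module from the Python standard library, which provides an RGB to YIQ conversion for components in the range 0 to 1. The module uses its own rounded coefficients, which are not taken from these notes. For a grey input the two chromatic components come out as (numerically) zero, and Y carries all the information.

```python
import colorsys

# A mid-grey pixel: all of the information ends up in Y.
y, i, q = colorsys.rgb_to_yiq(0.5, 0.5, 0.5)

# A saturated red pixel: Y is its monochrome brightness, I and Q carry the colour.
y_red, i_red, q_red = colorsys.rgb_to_yiq(1.0, 0.0, 0.0)
```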
HSI (Hue, Saturation, and Intensity) Representation
The main motivation for using the HSI representation is that it is more closely related to the way humans perceive colour.
H – hue is the colour attribute that identifies the pure colour (e.g. red, yellow, green)
S – saturation indicates how saturated the colour is, i.e. to what extent the colour is diluted by white light
I – intensity is the overall brightness
The HSI components are defined as follows:
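$$I = \frac{1}{3}(R + G + B)$$
$$S = 1 - \frac{3}{R + G + B}\,\min(R, G, B)$$
$$H = \cos^{-1}\left\{ \frac{\frac{1}{2}\left[(R - G) + (R - B)\right]}{\left[(R - G)^2 + (R - B)(G - B)\right]^{1/2}} \right\}$$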
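A minimal sketch of an RGB to HSI conversion follows, using the usual textbook form I = (R + G + B)/3, S = 1 - 3·min(R, G, B)/(R + G + B), and H as the inverse cosine of ½[(R-G) + (R-B)] over the square root of (R-G)² + (R-B)(G-B). Two points are assumptions of this sketch rather than statements from the notes: the hue is replaced by 360° minus the angle when B > G (the usual correction, since the inverse cosine only covers 0° to 180°), and hue is returned as 0 for black and grey, where it is undefined.

```python
import math

def rgb_to_hsi(R, G, B):
    """Return (H in degrees, S, I) for R, G, B in [0, 1]."""
    total = R + G + B
    I = total / 3.0
    if total == 0:
        return 0.0, 0.0, 0.0                 # black: H and S undefined, use 0
    S = 1.0 - 3.0 * min(R, G, B) / total
    num = 0.5 * ((R - G) + (R - B))
    den = math.sqrt(max(0.0, (R - G) ** 2 + (R - B) * (G - B)))
    if den == 0:
        return 0.0, S, I                     # grey: H undefined, use 0
    ratio = max(-1.0, min(1.0, num / den))   # guard against rounding error
    theta = math.degrees(math.acos(ratio))
    H = 360.0 - theta if B > G else theta    # assumed correction for B > G
    return H, S, I
```

With this convention pure red, green and blue map to hues of about 0°, 120° and 240° respectively.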
Image Processing Operations
Image processing operations can generally be represented as follows:
$$f_o(x, y) = T[f(x, y)]$$
where $f(x, y)$ is the input image, $T$ is the image processing operator, and $f_o(x, y)$ is the output image. Notice that some operators can act on more than one input image at the same time.
Two broad types of processing can be defined depending on the extent of the operation. These are point-processing and
neighbourhood-based processing.
Using a filtering mask centred at, say, $(x_o, y_o)$ (see diagrams), the result of the filtering operation will be given by:
$$f_o(x_o, y_o) = \sum_{i=0}^{n-1} w_i\, f(x_i, y_i)$$
where n is the number of neighbours, and $w_i$ is the weight at the i-th position in the mask.
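The weighted sum over the mask can be sketched as below for a square mask of odd size. The image is assumed to be a list of rows, and border pixels where the mask does not fit are simply copied, which is one choice among several and not something the notes prescribe. Here the sum runs over every mask position, including the centre pixel. The names `apply_mask` and `box` are illustrative.

```python
def apply_mask(image, mask):
    """Replace each interior pixel by the weighted sum of the pixels under the mask."""
    rows, cols = len(image), len(image[0])
    k = len(mask) // 2                        # mask half-width
    out = [list(row) for row in image]        # borders are copied unchanged
    for x in range(k, rows - k):
        for y in range(k, cols - k):
            out[x][y] = sum(mask[i + k][j + k] * image[x + i][y + j]
                            for i in range(-k, k + 1)
                            for j in range(-k, k + 1))
    return out

# A 3x3 mask with equal positive weights that sum to 1.
box = [[1 / 9] * 3 for _ in range(3)]
```

Applying `box` to an image spreads an isolated bright pixel over its neighbourhood, which is the blurring effect of a low pass mask.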
Basic Filter shapes
Low Pass Filters
A low pass filter eliminates the high frequency components (edges, sharp details) while passing the low frequency components. Thus the overall effect is a blurring of the input image.
From the shape of the filter response, we can observe that all coefficients of the mask will have to be positive.
Using the above masks, the result at a given point will simply be the average of its neighbours. This operation is therefore
also called neighbourhood averaging. As the size of the neighbourhood increases, the averaging effect increases.
Median Filter
Here, rather than using the direct average, the median of the neighbouring pixels is used. The effect is that sharp edges are preserved, while noise is removed.
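A minimal 3x3 median filter is sketched below, assuming the image is a list of rows and leaving border pixels unchanged (a simplification of this sketch). The window includes the centre pixel.

```python
import statistics

def median_filter(image):
    """Replace each interior pixel by the median of its 3x3 neighbourhood."""
    rows, cols = len(image), len(image[0])
    out = [list(row) for row in image]        # borders are copied unchanged
    for x in range(1, rows - 1):
        for y in range(1, cols - 1):
            window = [image[x + i][y + j] for i in (-1, 0, 1) for j in (-1, 0, 1)]
            out[x][y] = statistics.median(window)
    return out
```

An isolated noise spike (one value of 255 surrounded by 10s) is removed completely, whereas neighbourhood averaging would only smear it over the window.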
Sharpening Filters
These emphasize the fine details or sharp edges in the images. Thus, the high frequency components are passed, while the
low frequency components are eliminated.
From the impulse response of the sharpening filter, we can expect positive values for the weights $w_i$ near the mask centre, and negative values as we move away from the centre.
The high pass filters form the basis of most edge detection algorithms. The major difference is in the weights assigned to
points in the mask.
Examples of masks used in edge detection:
Notice that in all cases, the sum of the weights is always equal to 0.
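The zero-sum property can be checked on two widely used textbook masks, a Laplacian mask and the horizontal Sobel mask. These are given here as common examples and are not necessarily the masks shown in the original notes. A zero sum means the response is 0 over any region of constant grey level, so only changes in intensity produce an output.

```python
LAPLACIAN = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

def mask_sum(mask):
    """Sum of all weights in the mask."""
    return sum(sum(row) for row in mask)

def response(window, mask):
    """Weighted sum of a 3x3 image window under the mask."""
    return sum(w * p
               for mask_row, pixel_row in zip(mask, window)
               for w, p in zip(mask_row, pixel_row))

flat = [[7, 7, 7]] * 3          # constant region
edge = [[0, 0, 9]] * 3          # vertical edge on the right
```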
Other issues of interest:
♦ Image transforms
♦ Image segmentation
♦ Description and representation of segmented images
♦ Motion analysis
Spatial Redundancy
Spatial redundancy usually occurs as a result of the natural similarity between neighbouring points on an object surface.
This usually manifests as correlation between nearby pixels, i.e. as a degree of order (rather than disorder) in the image. It is independent of the coding redundancy.
Correlation between pixels implies possible predictability of the pixels using information from some other pixels.
Spatial redundancy can be exploited by
♦ representing the images in terms of pixel differences, rather than explicit pixel values
♦ using run-length pairs, for instance along a given path in the image
Temporal Redundancy
This is similar to spatial redundancy, but correlation is now with respect to temporal (rather than spatial) neighbourhoods.
Used mainly for video and audio coding.
Psychovisual Redundancy
Perception is often subjective rather than quantitative. Some items of information are treated as less important than others by the human visual system (HVS). Compression schemes exploit this limitation of the HVS by placing more emphasis on those aspects of the data that a human will perceive.
♦ This is usually achieved by use of quantization
♦ Results in lossy compression, since the process is not reversible
♦ This is important in image and video compression
Psycho-acoustic Redundancy
Similar to psychovisual redundancy, except that here the consideration is with respect to the human auditory system. This is more important for audio compression.
Measures of Compression Performance
♦ Compression ratio
♦ Distortion - quantitative measures, e.g. mean square error (MSE), PSNR
♦ Perceptual quality - subjective rating scales, just-noticeable thresholds, etc.
♦ Coding complexity
♦ Decoding complexity
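The quantitative measures in this list can be sketched as follows. The definitions used are the common ones: compression ratio as original size over compressed size, MSE as the mean of the squared sample differences, and PSNR as 10·log10(peak² / MSE). The peak value of 255 assumes 8-bit data, which is an assumption of this sketch.

```python
import math

def compression_ratio(original_bits, compressed_bits):
    """Ratio of the original size to the compressed size."""
    return original_bits / compressed_bits

def mse(original, reconstructed):
    """Mean square error between two equally long sequences of samples."""
    n = len(original)
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / n

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for a perfect reconstruction."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)
```

For a lossless scheme the MSE is 0 and the PSNR is infinite, so only the compression ratio and the complexity measures remain informative.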
Compression Models
The compression process can be described by use of the encoder-decoder model. The encoder performs the compression, and
each encoding stage removes (or facilitates the removal of) one form of redundancy. The decoder performs the inverse
process of decompression.
Two Types of Compression
Lossless compression
♦ Reconstructed data is an exact replica of the original
♦ Provides low compression ratios
♦ Used in applications such as data processing, law, medicine, etc.
Lossy compression
♦ Reconstructed data is only an approximation of the original
♦ Possible information loss
♦ Provides avenue for huge compression ratios
♦ Applications include TV broadcasting, VoD, data storage
Lossless Compression
Since the primary source of information loss is the quantization stage, for lossless compression there will be no quantization
and dequantization stages. The basic issues then are:
♦ Representational problem - the transformation stage which exposes the redundancies in the data
♦ Elimination of the exposed redundancy - the encoding stage
Lossless Compression Schemes
Variable Length Coding
This reduces the coding redundancy in the data representation by representing the most probable symbols (e.g. grey-levels) with the shortest possible codewords. One popular technique for VLC is Huffman coding.
The Huffman code is said to be optimal in the sense that it produces the smallest possible number of code symbols per source symbol, if the source symbols are coded one at a time.
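A compact sketch of Huffman code construction is given below. It repeatedly merges the two least probable groups of symbols, prefixing the codes in one group with 0 and in the other with 1. The function name and the use of symbol counts from the input sequence as probabilities are choices of this sketch. The exact bit patterns depend on how ties are broken, but the code lengths are what matter.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return {symbol: bit string} built from the symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                        # degenerate case: one symbol only
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)       # two least probable groups
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

codes = huffman_code("aaaabbc")
```

For the input `"aaaabbc"` the most frequent symbol `a` receives a 1-bit code and `b` and `c` receive 2-bit codes, for a total of 10 bits instead of the 14 bits needed with a fixed 2-bit code.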
Pattern Substitution
The basic idea here is to replace a data segment with a code word. This results in a form of table look-up. For compression to be achieved, each code word in the table should be shorter than the data segment it replaces. This is used mainly in situations where there are many repeating patterns.
Examples are:
♦ Repetition suppression, e.g. 245999999990 can be represented as 2459@
♦ Dictionary look-up
♦ Constant-area coding (used especially for binary images): identify large areas of contiguous 1's or 0's, and code those areas with special codes
Run Length Encoding RLE
This represents the data stream as a sequence of runs, each given by a symbol and the number of times it repeats consecutively.
The 245999999990 above could then be coded as (2,1)(4,1)(5,1)(9,8)(0,1).
The results from the RLE could be compressed further by passing them as the input to, say, a variable length coding scheme.
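A run-length encoder and decoder can be sketched in a few lines with `itertools.groupby` from the standard library, producing (symbol, run length) pairs in the same form as the example above. The function names are illustrative.

```python
from itertools import groupby

def rle_encode(data):
    """Return a list of (symbol, run length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]

def rle_decode(pairs):
    """Rebuild the original string from (symbol, run length) pairs."""
    return "".join(symbol * count for symbol, count in pairs)

print(rle_encode("245999999990"))
# [('2', 1), ('4', 1), ('5', 1), ('9', 8), ('0', 1)]
```

Note that the runs of length 1 cost more than the symbols they replace, so RLE only pays off when long runs dominate.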
Predictive Coding
Predictive coding predicts the value at a given data point based on the preceding data points, and codes only the difference
between the predicted value and the actual value at the data point. Thus, only the prediction error (sometimes called the new
information ) is coded.
This is used to reduce the spatio-temporal redundancies that may exist between nearby data points, for example pixels in an image.
For an m-th order predictor, the predicted value is given by:
$$\hat{f}_n = \sum_{i=1}^{m} \alpha_i f_{n-i}$$
where $\alpha_i$ is a weighting factor.
For DPCM, the current data point is assumed to have the same value as the preceding data point. That is,
$$\hat{f}_n = f_{n-1}$$
Thus
$$e_n = f_n - \hat{f}_n = f_n - f_{n-1}$$
Notice that the transformation stage here corresponds to the mapping of the original data into prediction error. The amount
of compression is thus proportional to the entropy reduction produced by the mapping.
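The previous-sample predictor can be sketched as follows. The first sample has no predecessor, so this sketch predicts it from 0, which means it is sent unchanged; that choice is an assumption and not part of the notes.

```python
def dpcm_encode(samples):
    """Return prediction errors e_n = f_n - f_(n-1), with f_(-1) taken as 0."""
    errors, previous = [], 0
    for f in samples:
        errors.append(f - previous)
        previous = f
    return errors

def dpcm_decode(errors):
    """Rebuild the samples by accumulating the prediction errors."""
    samples, previous = [], 0
    for e in errors:
        previous += e
        samples.append(previous)
    return samples

print(dpcm_encode([100, 102, 103, 103, 101]))  # [100, 2, 1, 0, -2]
```

After the first value, the errors are small numbers clustered around 0, which have lower entropy than the original samples and are therefore cheaper to code with a variable length code.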
Lossy Compression
Basic motivation
♦ Huge compression ratios
♦ For some applications, exact reconstruction is not necessary, and thus accuracy can be compromised
Sources of Error
♦ Quantization process
♦ Truncation/rounding errors
Basic Methodology
Simply insert a quantization stage between the transformation and encoding stages. In general, the transformation stage
would have exposed some form of redundancy in the original data. The quantization stage then exploits the exposed
redundancies by quantizing the transform coefficients (or prediction errors) rather than the original data.
Quantization
♦ Process of converting a continuous signal into a discrete signal
♦ Basic procedure in digitization
♦ Many-to-one mapping, and thus comes with an inherent loss of information
Uniform Quantization
Here, the same quantization step size is used for all quantization levels.
Let R be the dynamic range of the data, i.e. the total range of the input values, and L the number of quantization levels. Then the quantization step size is just:
$$\Delta = \frac{R}{L}$$
Quantization is then performed by mapping an input value $s$ to the level $s_i$:
$$s_i = i \quad \text{if } (i - 1)\Delta \le s < i\Delta, \qquad i = 1, 2, \ldots, L$$
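The mapping can be sketched as below, taking the step size as the dynamic range divided by the number of levels. Two details are assumptions of this sketch: the input range is allowed to start at an arbitrary value `low` instead of 0, and an input equal to the upper limit is clamped into the top level.

```python
def uniform_quantize(s, low, high, levels):
    """Return the level index i in 1..levels with (i-1)*delta <= s - low < i*delta."""
    delta = (high - low) / levels            # step size: dynamic range / levels
    i = int((s - low) // delta) + 1
    return min(max(i, 1), levels)            # clamp values at the range limits

print(uniform_quantize(64, 0, 256, 4))  # 2, since 64 <= 64 < 128
```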
Non Uniform Quantization
♦ The quantization step size is not the same for all quantization levels.
♦ For an image, some regions will have coarse quantization, while others will be fine-grained.
♦ Generally, finer quantization is used in rapidly changing areas, while slowly varying areas can use coarser quantization.
Optimal Quantizers
These aim at deriving the best values for the quantization step size for each quantization step. Usually, this is done based on
the probability distribution function of the input data.
Transform Coding
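For the discrete cosine transform (DCT) of an N x N subimage, the transform kernel is:
$$g(x, y, u, v) = \alpha(u)\,\alpha(v)\,\cos\left[\frac{(2x + 1)u\pi}{2N}\right]\cos\left[\frac{(2y + 1)v\pi}{2N}\right]$$
where
$$\alpha(a) = \begin{cases} \sqrt{1/N} & a = 0 \\ \sqrt{2/N} & a = 1, 2, \ldots, N - 1 \end{cases}$$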
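A direct, unoptimised evaluation of the 2-D DCT of an N x N block is sketched below, using the cosine kernel with the normalisation α(0) = sqrt(1/N) and α(a) = sqrt(2/N) otherwise. It takes on the order of N⁴ operations, so it is meant only to illustrate the formula; practical coders use fast algorithms. The function names are illustrative.

```python
import math

def alpha(a, N):
    """Normalisation factor of the DCT kernel."""
    return math.sqrt(1.0 / N) if a == 0 else math.sqrt(2.0 / N)

def dct_2d(block):
    """2-D DCT of a square block given as a list of rows."""
    N = len(block)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u, N) * alpha(v, N) * total
    return out
```

For a constant block all of the energy ends up in the single coefficient at (0, 0) and the remaining coefficients are zero up to rounding error, which is a simple instance of the energy packing property.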
Choice of Transform
♦ Decorrelation ability
♦ Energy packing ability
♦ Computational complexity
♦ Allowable error
♦ Application dependent
Generally, the DCT provides a better information packing ability than the DFT. But the KLT is the optimal transform in
terms of decorrelation and energy packing ability. The KLT however requires more computation. Thus, the DCT is used in
most transform image coding applications.
Effect of Sub-Image Sizes
♦ Generally, error decreases with increasing subimage size
♦ Also, achievable compression increases with increasing subimage size
♦ However, computational complexity also increases with subimage size
<< see attached figure>>
For the DCT, at subimage sizes greater than 16x16, the differences in error and compression tend to become insignificant.
An 8x8 subimage size has been used as the standard in most existing systems.
Quantization and Coefficient Selection
The quantization values can be chosen to effectively ignore certain coefficients, or to reduce their importance. The choice of
coefficients is done using two basic methods:
♦ Zonal method - select coefficients with overall maximum variance across the image
♦ Threshold method - select coefficients with maximum magnitudes. Those with values below a given threshold are ignored
Quantization is performed by using a quantization table Q, based on the simple formula:
$$\hat{T}(u, v) = \mathrm{round}\left[\frac{T(u, v)}{Q(u, v)}\right]$$
Thus, $\hat{T}(u, v) = 0$ whenever $Q(u, v) > 2\,|T(u, v)|$.
Inverse quantization (dequantization):
$$\dot{T}(u, v) = \hat{T}(u, v)\, Q(u, v)$$
Clearly, if $\hat{T}(u, v) = 0$, the original value of $T(u, v)$ cannot be recovered. Hence, by increasing the values in Q, we can obtain more compression, but at the expense of more error.
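The quantization and dequantization steps can be sketched as element-wise operations on a coefficient block T and a table Q of the same size. The numbers in the example are made up for illustration. Note that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding rule of a particular coder.

```python
def quantize(T, Q):
    """Element-wise round(T / Q)."""
    return [[round(t / q) for t, q in zip(t_row, q_row)]
            for t_row, q_row in zip(T, Q)]

def dequantize(T_hat, Q):
    """Element-wise T_hat * Q; an approximation of the original T."""
    return [[t * q for t, q in zip(t_row, q_row)]
            for t_row, q_row in zip(T_hat, Q)]

T = [[100, 7], [-33, 3]]        # hypothetical transform coefficients
Q = [[16, 16], [10, 10]]        # hypothetical quantization table
T_hat = quantize(T, Q)          # [[6, 0], [-3, 0]]
T_rec = dequantize(T_hat, Q)    # [[96, 0], [-30, 0]]
```

The small coefficients 7 and 3 are quantized to 0 and lost entirely, while the larger ones are recovered only approximately (96 for 100, -30 for -33).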
Some compression schemes also use a hybrid approach, by combining say DCT and predictive coding.
Encoding Stage
After quantization, we may have several runs of zeros or repetitions, which can be coded using any of the methods for
lossless encoding, such as the RLE, or Huffman codes.
Extended coding system
♦ lossy
♦ higher compression
♦ better precision
♦ progressive coding
Independent coding system
♦ lossless
♦ completely reversible
♦ uses simple predictive coding
♦ does not involve DCT or quantization
JPEG Coding Procedure
♦ Divide the image into 8x8 blocks
♦ Process blocks from left-to-right, top-to-bottom
♦ Level shift the image pixel values: $f_s(x, y) = f(x, y) - 2^{n-1}$, where $2^n$ is the number of grey levels.
♦ Compute the 2D DCT of each block
♦ Quantize the coefficients using the quantization table. Different quantization tables are used for the colour and intensity components
♦ Form a 1-D sequence of the coefficients using the zigzag representation
♦ Encode the resulting 1-D sequence (uses Huffman coding). Different Huffman tables are also used for the luminance and chrominance components.
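Two of the steps in this procedure, the level shift and the zigzag ordering, are sketched below. The level shift subtracts 2^(n-1) from every pixel, where 2^n is the number of grey levels. The zigzag scan visits the anti-diagonals of the block in turn, alternating direction, so that low-frequency coefficients come first. The function names are illustrative, and the DCT, quantization and Huffman stages are not included.

```python
def level_shift(block, n_bits=8):
    """Subtract 2**(n_bits - 1) from every pixel of the block."""
    offset = 2 ** (n_bits - 1)
    return [[p - offset for p in row] for row in block]

def zigzag_indices(N=8):
    """Return (row, col) pairs of an N x N block in zigzag order."""
    order = []
    for d in range(2 * N - 1):               # d = row + col of each anti-diagonal
        cells = [(r, d - r) for r in range(N) if 0 <= d - r < N]
        # Even diagonals run from bottom-left to top-right, odd ones the other way.
        order.extend(reversed(cells) if d % 2 == 0 else cells)
    return order

def zigzag(block):
    """Flatten a square block into a 1-D list in zigzag order."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]
```

After quantization, most of the non-zero coefficients sit near the start of the zigzag sequence, so the tail tends to be a long run of zeros that is cheap to encode.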
The DC coefficients are coded relative to previous DC's using differential coding. That is, for the DC's only the prediction
error between the current DC and the DC value from the preceding block is coded. Thus, the coding tables are based on ( DC
difference - category ) pairs, where categories indicate the range (or size) of the DC difference.
For AC coefficients, variable length coding is used, based on tables of stipulated output symbols for given pairs of coefficient value and number of preceding zeros, sometimes called (runlength, category) pairs.
Progressive Encoding In JPEG
Uses
♦ Browsing, search and retrieval
♦ Network support for prioritization - i.e. lower-resolution information is given higher priority, and thus sent out first to speed up browsing
♦ Lower resolutions can typically be coded with less precision without degrading the visual quality. Hence, they require less bandwidth
Methods used for progressive encoding
Spectral selection
This is based on the zig-zag ordering. At the i-th scan, only information from the i-th diagonal is sent:
DC
AC 1 + AC 2
…
AC 63
Usually, the DC and the first two AC coefficients are sent at the same time.
This is conceptually simple. The distortion introduced depends on the cut-off, i.e. the number of diagonals scanned.
Successive Approximation
Send information from all frequencies at the same time, starting from the MSB to the LSB. Usually, the first 4 MSBs are sent
at the first scan. Then the remaining bits are sent one at a time.
This is generally slower than spectral selection, but produces a relatively constant distortion across all the spatial frequencies.
Some techniques try to combine spectral selection with successive approximation.
Hierarchical Encoding in JPEG
Generally used where the source image is at a higher resolution than the display device. Hierarchical encoding basically uses a pyramidal image representation to encode images at multiple resolutions. The resolutions differ by a factor of 2 vertically and horizontally. Lower resolution images can be accessed without decompressing the entire image. Each resolution is coded as an ordinary image.