Introduction to Image Processing - Computer Science | CS T101, Study notes of Computer Science

Material Type: Notes; Class: Computer Science 0; Subject: Computer Science; University: West Virginia University; Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 07/31/2009

NOTE 04 L12: BRIEF INTRODUCTION TO IMAGE PROCESSING

Image Model

An image of a surface is a 2-D function of the light intensity falling on the surface.

For a given spatial location (x, y) on the surface, the image f(x, y) is formed from a combination of the illumination on the point and the surface reflectance at that point:

$$f(x, y) = i(x, y)\, s(x, y)$$

where i(x, y) is the illumination component and s(x, y) is a component due to surface reflectance.

Typically, $0 < i(x, y) < \infty$ and $0 < s(x, y) < 1$, thus $0 < f(x, y) < \infty$.

Practically, f is bounded within certain limits, called the grey-scale, such that $L_{\min} \le f(x, y) \le L_{\max}$. The scale is further shifted such that $L_{\min} = 0$ and $L_{\max} = L$, with values of f(x, y) = 0 corresponding to black, and values of f(x, y) = L corresponding to white.

The image formed in this way is called a monochrome (or grey-scale) image, as it considers only the intensity values.

Sampling and Quantization

To convert to digital form, the image is discretized - both in its spatial co-ordinates, and its amplitude values.

Discretization of the spatial coordinates (x, y) is called image sampling.

Discretization of the amplitude values f(x, y) is called grey-level quantization.

The result of the digitization process is a digital image.

We can then represent the image as an N x M array of equally-spaced samples:

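$$f(x, y) = \begin{bmatrix} f(0,0) & f(0,1) & \cdots & f(0,M-1) \\ f(1,0) & f(1,1) & \cdots & f(1,M-1) \\ \vdots & \vdots & & \vdots \\ f(N-1,0) & f(N-1,1) & \cdots & f(N-1,M-1) \end{bmatrix}$$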

Each element in the array is called a picture element, or pixel.

The number of quantization levels used on the amplitude value determines the number of grey levels used to represent the

image. Typical examples are 2, 128, 256, 1024, etc.

If G is the number of grey levels, such that $G = 2^g$, then the number of bits needed to represent the digital image will be:

$$b = N \times M \times g$$

The product N × M is called the image size, measured in pixels.

Notice that N and M in turn depend on the quantization step sizes ($\Delta_n$, $\Delta_m$) used on the spatial coordinates. The overall resolution of the image depends on N, M and g. The higher the values, the better the resolution, but the larger the data size.
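As a small worked example of the storage formula, the sketch below computes the number of bits for an image of a given size and number of grey levels. The function names are illustrative only, and rounding g up when G is not an exact power of two is an assumption.

```python
def bits_per_pixel(grey_levels: int) -> int:
    """Return the smallest g with 2**g >= grey_levels (at least 1 bit)."""
    return max(1, (grey_levels - 1).bit_length())


def image_bits(n_rows: int, n_cols: int, grey_levels: int) -> int:
    """Number of bits b = N * M * g needed for an N x M image."""
    return n_rows * n_cols * bits_per_pixel(grey_levels)


if __name__ == "__main__":
    # A 512 x 512 image with 256 grey levels (g = 8).
    total = image_bits(512, 512, 256)
    print(total, "bits =", total // 8, "bytes")
```

Doubling both N and M quadruples the storage, while adding one bit per pixel only increases it linearly.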

<< See attached images on the effect of resolution - and compare with the above results>>

Basic Pixel Relationships

Pixel Neighbours

A pixel p1 is a neighbour of another pixel p2 if their spatial coordinates (x1, y1) and (x2, y2) are not more than a unit distance apart. The following types of neighbours are often used in image processing:

♦ Horizontal neighbours
♦ Vertical neighbours
♦ Diagonal neighbours
♦ 4-neighbours – comprising vertical and horizontal neighbours
♦ 8-neighbours – comprising 4-neighbours and diagonal neighbours
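The neighbour types listed above can be sketched as coordinate offsets around a pixel. The function names and the optional bounds check are illustrative assumptions, not part of the notes.

```python
def neighbours_4(x, y):
    """Horizontal and vertical neighbours of (x, y)."""
    return [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]


def neighbours_diagonal(x, y):
    """The four diagonal neighbours of (x, y)."""
    return [(x - 1, y - 1), (x - 1, y + 1), (x + 1, y - 1), (x + 1, y + 1)]


def neighbours_8(x, y):
    """4-neighbours together with the diagonal neighbours."""
    return neighbours_4(x, y) + neighbours_diagonal(x, y)


def inside(points, n_rows, n_cols):
    """Keep only the coordinates that fall inside an n_rows x n_cols image."""
    return [(x, y) for (x, y) in points if 0 <= x < n_rows and 0 <= y < n_cols]


if __name__ == "__main__":
    print(neighbours_8(1, 1))
    print(inside(neighbours_4(0, 0), 3, 3))  # a corner pixel has fewer neighbours
```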

Pixel Connectivity

For two pixels to be connected they must be neighbours and they must be similar based on certain criteria. A simple

similarity criterion is equality. Another is maximum difference – i.e. the pixel values may not necessarily be equal, but they

cannot differ by more than a specified threshold. Basic types of connectivity used include:

♦ 4-connectivity
♦ 8-connectivity
♦ m-connectivity (mixed connectivity)

Arithmetic and Logic Operations

We can perform the four basic arithmetic operations on pixels:

♦ Addition: p1 + p2 – used in image averaging
♦ Subtraction: p1 - p2 – used in image motion analysis and background removal
♦ Multiplication: p1 * p2 – used in colour and image shading operations
♦ Division: p1 / p2 – used in colour processing

Colour Models

The colour models used in image processing generally represent colour information in a 3-D coordinate system, whereby

each colour can be represented as a single point in the 3-D space.

Various models have been proposed, and are currently in use. Which model to use is often dependent on the application.

In image processing, the three models that are often used are the RGB, YIQ, and HSI.

RGB Representation

In the RGB representation, each image is represented with three independent image planes, one for each of the three primary

colours R, G, or B.

When the R,G,B values are normalized to unit values, the RGB model can be represented as a unit cube.

The vertices of the cube represent different colours. For instance, the colour at the origin (0,0,0) is taken to be black, while the colour at the furthest point (1,1,1) is taken to be white. The line from (0,0,0) to (1,1,1) thus defines the grey-scale representation.

Sometimes, the normalized R, G, B values are also used in image processing:

$$r = \frac{R}{R + G + B}, \qquad g = \frac{G}{R + G + B}, \qquad b = \frac{B}{R + G + B}$$

Obviously, r + g + b = 1.

The normalized values are also called the trichromatic coefficients.

The RGB has found applications mainly in colour monitors, and colour cameras.

One advantage of the RGB representation is that most of the other colour models can be obtained by simple linear

transformation on the RGB values. The problem is that the intensity component is not de-coupled from the chromatic

components.

YIQ Representation

This is used mainly in TV broadcast, and in image transmission. The YIQ de-couples the colour components (I,Q) from the

achromatic component, (Y), which represents the monochrome (grey-level) information. It is thus compatible with

monochrome TV standards.

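$$\begin{bmatrix} Y \\ I \\ Q \end{bmatrix} = A \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$

where A is a fixed 3 × 3 matrix of transformation coefficients.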
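A minimal sketch of the RGB-to-YIQ conversion as a 3 × 3 linear transformation. The coefficient values below are the commonly quoted approximate NTSC figures and are an assumption here, since they are not taken from these notes; the function name is also illustrative.

```python
# Commonly quoted approximate NTSC coefficients (assumed, not from the notes).
RGB_TO_YIQ = (
    (0.299, 0.587, 0.114),    # Y: achromatic (grey-level) component
    (0.596, -0.275, -0.321),  # I: first colour component
    (0.212, -0.523, 0.311),   # Q: second colour component
)


def rgb_to_yiq(r, g, b):
    """Apply the linear transformation to one RGB triple (values in 0..1)."""
    return tuple(cr * r + cg * g + cb * b for cr, cg, cb in RGB_TO_YIQ)


if __name__ == "__main__":
    # For a grey pixel (R = G = B) the colour components vanish
    # and only Y carries information.
    print(rgb_to_yiq(0.5, 0.5, 0.5))
```

With these values the I and Q rows each sum to zero, which is why a monochrome receiver can use Y alone.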

HSI (Hue, Saturation, and Intensity) Representation

The main motivation for using the HSI representation is that it is more closely related to the way humans perceive colour.

H - hue is the colour attribute, indicating the dominant pure colour

S - saturation indicates how saturated the colour is – i.e. to what extent the colour is diluted by white light

I – intensity

The HSI components are defined as follows:

$$I = \frac{1}{3}(R + G + B)$$

$$S = 1 - \frac{\min(R, G, B)}{I}$$

$$H = \cos^{-1}\left\{ \frac{\frac{1}{2}\left[(R - G) + (R - B)\right]}{\left[(R - G)^2 + (R - B)(G - B)\right]^{1/2}} \right\}$$
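The HSI definitions can be sketched in code as follows. Two details are assumptions that do not appear in the notes: the hue is replaced by 360° minus the computed angle when B > G (the usual convention, since the inverse cosine only returns angles up to 180°), and the hue and saturation are set to 0 where they are undefined (grey or black pixels).

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert one RGB triple (values in 0..1) to (hue in degrees, saturation, intensity)."""
    intensity = (r + g + b) / 3.0
    if intensity == 0:
        return 0.0, 0.0, 0.0  # black: hue and saturation undefined, set to 0 (assumption)

    saturation = 1.0 - min(r, g, b) / intensity

    numerator = 0.5 * ((r - g) + (r - b))
    # max() guards against tiny negative values caused by floating-point rounding.
    denominator = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if denominator == 0:
        hue = 0.0  # grey: hue undefined, set to 0 (assumption)
    else:
        ratio = max(-1.0, min(1.0, numerator / denominator))
        hue = math.degrees(math.acos(ratio))
        if b > g:
            hue = 360.0 - hue  # usual convention for the lower half of the colour circle
    return hue, saturation, intensity


if __name__ == "__main__":
    for name, rgb in [("red", (1, 0, 0)), ("green", (0, 1, 0)), ("blue", (0, 0, 1))]:
        print(name, rgb_to_hsi(*rgb))
```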

Image Processing Operations

Image processing operations can generally be represented as follows:

f(x, y) → T(x, y) → fo(x, y)

where f(x, y) is the input image, T(x, y) is the image processing operator, and fo(x, y) is the output image. Notice that some operators can act on more than one input image at the same time.

Two broad types of processing can be defined depending on the extent of the operation. These are point-processing and

neighbourhood-based processing.

Using a filtering mask centred at, say, $(x_o, y_o)$ (see diagrams), the result of the filtering operation will be given by:

$$f_o(x_o, y_o) = \sum_{i=0}^{n-1} w_i\, f(x_i, y_i)$$

where n is the number of neighbours, and $w_i$ is the weight at the i-th position in the mask.
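The weighted sum over the mask can be sketched as below for a square mask with an odd side length. Leaving the border pixels unchanged is only one of several possible conventions and is an assumption here, as are the function and variable names.

```python
def apply_mask(image, mask):
    """Apply a square mask (odd side length) to a 2-D list of pixel values.

    Each output pixel is the sum of w_i * f(x_i, y_i) over the mask positions.
    Border pixels, where the mask does not fit, are copied unchanged (assumption).
    """
    rows, cols = len(image), len(image[0])
    k = len(mask) // 2
    out = [row[:] for row in image]
    for y in range(k, rows - k):
        for x in range(k, cols - k):
            acc = 0.0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    acc += mask[dy + k][dx + k] * image[y + dy][x + dx]
            out[y][x] = acc
    return out


if __name__ == "__main__":
    # 3 x 3 averaging mask: every weight is 1/9.
    averaging = [[1 / 9] * 3 for _ in range(3)]
    image = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
    print(apply_mask(image, averaging))
```

The same function can be used with any other set of weights, for example a mask whose weights sum to 0.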

Basic Filter shapes

Low Pass Filters

This eliminates the high frequency components (edges, sharp details) while passing the low frequency components. Thus the overall effect is a blurring of the input image.

From the shape of the filter response, we can observe that all coefficients of the mask will have to be positive.

Using the above masks, the result at a given point will simply be the average of its neighbours. This operation is therefore

also called neighbourhood averaging. As the size of the neighbourhood increases, the averaging effect increases.

Median Filter

Here, rather than using the direct average, the median of the neighbouring pixels is used. The effect is that the sharp edges are preserved, while noise is removed.
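A minimal sketch of a median filter over a square window, to illustrate both effects. The window includes the centre pixel and border pixels are left unchanged; both choices are assumptions, as is the function name.

```python
from statistics import median


def median_filter(image, size=3):
    """Replace each interior pixel by the median of its size x size window."""
    k = size // 2
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(k, rows - k):
        for x in range(k, cols - k):
            window = [
                image[y + dy][x + dx]
                for dy in range(-k, k + 1)
                for dx in range(-k, k + 1)
            ]
            out[y][x] = median(window)
    return out


if __name__ == "__main__":
    noisy = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]  # isolated noise spike
    print(median_filter(noisy))
    edge = [[10, 10, 10, 200, 200]] * 3  # vertical step edge
    print(median_filter(edge))
```

In the first case the isolated spike is removed; in the second the step edge stays in place, whereas neighbourhood averaging would smear it.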


Sharpening Filters

These emphasize the fine details or sharp edges in the images. Thus, the high frequency components are passed, while the

low frequency components are eliminated.

From the impulse response of the sharpening filter, we can expect to have positive values for the weights $w_i$ near the mask centre, and negative values as we move away from the origin.

The high pass filters form the basis of most edge detection algorithms. The major difference is in the weights assigned to

points in the mask.

Examples of masks used in edge detection:

Notice that in all cases, the sum of the weights is always equal to 0.

Other issues of interest:

♦ Image transforms
♦ Image segmentation
♦ Description and representation of segmented images
♦ Motion analysis

Spatial Redundancy

Spatial redundancy usually occurs as a result of the natural similarity between neighbouring points on an object surface.

This usually manifests in the form of correlation between nearby pixels, dis-order/order in the image. This is independent of

the coding redundancy.

Correlation between pixels implies possible predictability of the pixels using information from some other pixels.

Spatial redundancy can be exploited by

♦ representing the images in terms of pixel differences, rather than explicit pixel values.
♦ using run-length pairs, for instance along a given path in the image.

Temporal Redundancy

This is similar to spatial redundancy, but correlation is now with respect to temporal (rather than spatial) neighbourhoods.

Used mainly for video and audio coding.

Psychovisual Redundancy

Perception is often subjective rather than quantitative. Some information items are treated as less important than others by the

human visual system, HVS. Psychovisual redundancy exploits this limitation of the HVS by laying more emphasis on those

aspects of the data that the human will perceive.

♦ This is usually achieved by use of quantization
♦ Results in lossy compression, since the process is not reversible.
♦ This is important in image and video compression

Psycho-acoustic Redundancy

Similar to psycho-visual redundancy, only that here, consideration is with respect to the human auditory system. This is more

important for audio compression

Measures of Compression Performance

♦ Compression ratio
♦ Distortion - quantitative measures, e.g. mean square error (MSE), PSNR
♦ Perceptual quality - subjective rating scales, just-noticeable thresholds, etc.
♦ Coding complexity
♦ Decoding complexity
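The quantitative measures in the list above can be sketched as follows, using their usual definitions: compression ratio as original size over compressed size, MSE as the mean of the squared pixel differences, and PSNR as 10·log10(peak² / MSE). The peak value of 255 assumes 8-bit grey levels, and the function names are illustrative.

```python
import math


def compression_ratio(original_bits, compressed_bits):
    """Size of the original data divided by the size of the compressed data."""
    return original_bits / compressed_bits


def mse(original, reconstructed):
    """Mean square error between two equal-length sequences of pixel values."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the two inputs are identical."""
    error = mse(original, reconstructed)
    if error == 0:
        return float("inf")
    return 10.0 * math.log10(peak ** 2 / error)


if __name__ == "__main__":
    original = [52, 55, 61, 66]
    decoded = [50, 56, 60, 68]
    print(compression_ratio(800, 100), mse(original, decoded), psnr(original, decoded))
```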

Compression Models

The compression process can be described by use of the encoder-decoder model. The encoder performs the compression, and

each encoding stage removes (or facilitates the removal of) one form of redundancy. The decoder performs the inverse

process of decompression.

Two Types of Compression

Lossless compression

♦ Reconstructed data is an exact replica of the original
♦ Provides low compression ratios
♦ Used in applications such as data processing, law, medicine, etc.

Lossy compression

♦ Reconstructed data is only an approximation of the original
♦ Possible information loss
♦ Provides avenue for huge compression ratios
♦ Applications include TV broadcasting, VoD, data storage

Lossless Compression

Since the primary source of information loss is the quantization stage, for lossless compression there will be no quantization

and dequantization stages. The basic issues then are:

♦ Representational problem - the transformation stage which exposes the redundancies in the data
♦ Elimination of the exposed redundancy - the encoding stage

Lossless Compression Schemes

Variable Length Coding

This reduces the coding redundancy in the data representation, by representing the most probable symbols (e.g. grey-levels) with the shortest possible codewords. One popular technique for VLC is Huffman coding.

The Huffman code is said to be optimal in the sense that it produces the smallest possible average number of code symbols per source symbol, if the source symbols are coded one at a time.
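As an illustration of variable length coding, the sketch below builds a Huffman code table from symbol frequencies using a priority queue. It is one straightforward way of constructing the code, with illustrative names, and not necessarily the construction intended in these notes.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each distinct symbol to a binary codeword string."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}

    # Heap entries: (count, tie-breaker id, partial code table for that subtree).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing their codewords.
        count1, _, codes1 = heapq.heappop(heap)
        count2, _, codes2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes1.items()}
        merged.update({sym: "1" + code for sym, code in codes2.items()})
        heapq.heappush(heap, (count1 + count2, next_id, merged))
        next_id += 1

    return heap[0][2]


if __name__ == "__main__":
    grey_levels = [0, 0, 0, 0, 128, 128, 255]
    table = huffman_code(grey_levels)
    encoded = "".join(table[g] for g in grey_levels)
    print(table, len(encoded), "bits")
```

The most frequent grey level receives the shortest codeword, so the encoded stream is shorter than a fixed-length code for the same data.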

Pattern Substitution

The basic idea here is to replace a data segment with a code word. This thus results in a form of table look-up. For compression to be achieved, the code word stored in the table should be shorter than the data segment it replaces. This is used mainly in situations where there are many repeating patterns.

Examples are:

♦ Repetition suppression, e.g. 245999999990 can be represented as 2459@
♦ Dictionary look-up
♦ Constant-area coding (used especially for binary images): identify large areas with contiguous 1's or 0's, and code the areas with special codes

Run Length Encoding RLE

This represents the data stream as (symbol, run length) pairs, where the run length is the number of consecutive repetitions of the symbol.

The 245999999990 above could then be coded as (2,1)(4,1)(5,1)(9,8)(0,1).

The results from the RLE could further be compressed by passing them as the input to say a variable length coding scheme.
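A minimal run-length encoder and decoder for a string of symbols, reproducing the example above. The function names are illustrative.

```python
from itertools import groupby


def rle_encode(data):
    """Return a list of (symbol, run length) pairs for a string of symbols."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]


def rle_decode(pairs):
    """Rebuild the original string from (symbol, run length) pairs."""
    return "".join(symbol * count for symbol, count in pairs)


if __name__ == "__main__":
    pairs = rle_encode("245999999990")
    print(pairs)
    print(rle_decode(pairs))
```

Note that RLE only pays off when runs are long: the three isolated symbols at the start each cost a full pair.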

Predictive Coding

Predictive coding predicts the value at a given data point based on the preceding data points, and codes only the difference

between the predicted value and the actual value at the data point. Thus, only the prediction error (sometimes called the new

information ) is coded.

This is used to reduce the spatio-temporal redundancies that may exist between nearby data points, for example, pixels in an image.

For an m-th order predictor, the predicted value is given by:

$$\hat{f}_n = \sum_{i=1}^{m} \alpha_i f_{n-i}$$

where $\alpha_i$ is a weighting factor.

In its simplest form, DPCM assumes that the current data point has the same value as the preceding data point. That is,

$$\hat{f}_n = f_{n-1}$$

Thus

$$e_n = f_n - \hat{f}_n = f_n - f_{n-1}$$

Notice that the transformation stage here corresponds to the mapping of the original data into prediction error. The amount

of compression is thus proportional to the entropy reduction produced by the mapping.
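The previous-sample predictor can be sketched as a lossless encoder and decoder pair. Using 0 as the prediction for the very first sample is an assumption, as are the function names.

```python
def dpcm_encode(samples, initial=0):
    """Return the prediction errors e_n = f_n - f_(n-1)."""
    errors = []
    previous = initial  # prediction for the first sample (assumption)
    for sample in samples:
        errors.append(sample - previous)
        previous = sample
    return errors


def dpcm_decode(errors, initial=0):
    """Rebuild the samples by adding each error to the previous reconstructed value."""
    samples = []
    previous = initial
    for error in errors:
        previous = previous + error
        samples.append(previous)
    return samples


if __name__ == "__main__":
    row = [100, 102, 101, 101, 105]
    errors = dpcm_encode(row)
    print(errors)
    print(dpcm_decode(errors))
```

For slowly varying data the errors cluster around 0, so they have lower entropy than the original values and suit a variable length code.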

Lossy Compression

Basic motivation

♦ Huge compression ratios
♦ For some applications, exact reconstruction is not necessary, and thus accuracy can be compromised

Sources of Error

♦ Quantization process
♦ Truncation/rounding errors

Basic Methodology

Simply insert a quantization stage between the transformation and encoding stages. In general, the transformation stage

would have exposed some form of redundancy in the original data. The quantization stage then exploits the exposed

redundancies by quantizing the transform coefficients (or prediction errors) rather than the original data.

Quantization

♦ Process of converting a continuous signal into a discrete signal.
♦ Basic procedure in digitization
♦ Many-to-one mapping, and thus comes with an inherent loss of information.

Uniform Quantization

Here, the same quantization step size is used for all quantization levels.

Let R be the dynamic range of the data, i.e. the total range of the input values, and L the number of quantization levels. Then, the quantization step size is just:

$$\Delta = \frac{R}{L}$$

Then, the quantization is performed by mapping an input value s to $s_i$:

$$s_i = i \quad \text{if } (i - 1)\Delta \le s < i\Delta, \qquad i = 1, 2, \ldots, L$$
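The uniform quantization rule can be sketched as below. The input is assumed to be measured from the bottom of the dynamic range, so that 0 ≤ s < R, and values at or beyond the top of the range are clamped to level L; both points are assumptions, as is the function name.

```python
def uniform_quantize(s, dynamic_range, levels):
    """Map s (0 <= s < dynamic_range) to a level index i in 1..levels."""
    delta = dynamic_range / levels  # quantization step size
    index = int(s // delta) + 1     # (i - 1) * delta <= s < i * delta
    return min(max(index, 1), levels)


if __name__ == "__main__":
    # 256 input values mapped onto 8 levels: the step size is 32.
    for value in (0, 31, 32, 200, 255):
        print(value, "->", uniform_quantize(value, 256, 8))
```

All inputs inside one step map to the same index, which is the many-to-one mapping responsible for the loss of information.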

Non Uniform Quantization

♦ The quantization step size is not the same for all quantization levels.
♦ For an image, some regions will have coarse quantization, while others will be fine-grained.
♦ Generally more fine-grained quantization is used in rapidly changing areas, while slowly varying areas can use more coarse quantization.

Optimal Quantizers

These aim at deriving the best values for the quantization step size for each quantization step. Usually, this is done based on

the probability distribution function of the input data.

Transform Coding

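For the 2-D discrete cosine transform (DCT), the transform kernel is:

$$g(x, y, u, v) = \alpha(u)\,\alpha(v)\cos\left[\frac{(2x + 1)u\pi}{2N}\right]\cos\left[\frac{(2y + 1)v\pi}{2M}\right]$$

where

$$\alpha(a) = \begin{cases} \sqrt{1/N} & \text{for } a = 0 \\ \sqrt{2/N} & \text{for } a = 1, 2, \ldots, N - 1 \end{cases}$$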
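A direct, unoptimized sketch of the 2-D DCT of a square N × N block, written straight from the cosine kernel with the usual normalization factors. It assumes N = M and is meant only to show the computation; practical coders use fast algorithms. The function names are illustrative.

```python
import math


def alpha(a, n):
    """Normalization factor: sqrt(1/N) for a = 0, sqrt(2/N) otherwise."""
    return math.sqrt(1.0 / n) if a == 0 else math.sqrt(2.0 / n)


def dct2(block):
    """2-D DCT of a square block given as a list of N lists of N values."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    )
            out[u][v] = alpha(u, n) * alpha(v, n) * total
    return out


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]  # constant block: no spatial detail
    coefficients = dct2(flat)
    print(round(coefficients[0][0], 6))
```

For a constant block all the energy ends up in the single DC coefficient, which is the energy packing behaviour that transform coding relies on.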

Choice of Transform

♦ Decorrelation ability
♦ Energy packing ability
♦ Computational complexity
♦ Allowable error
♦ Application dependent

Generally, the DCT provides a better information packing ability than the DFT. But the KLT is the optimal transform in

terms of decorrelation and energy packing ability. The KLT however requires more computation. Thus, the DCT is used in

most transform image coding applications.

Effect of Sub-Image Sizes

♦ Generally error decreases with increasing sub-image size
♦ Also, achievable compression increases with increasing sub-image size
♦ However, computational complexity also increases with sub-image size.

<< see attached figure>>

For the DCT, at sub-image sizes greater than 16x16, the differences in error and compression tend to become insignificant.

The 8x8 sub-image size has been used as the standard in most existing systems.

Quantization and Coefficient Selection

The quantization values can be chosen to effectively ignore certain coefficients, or to reduce their importance. The choice of

coefficients is done using two basic methods:

♦ Zonal method - select coefficients with overall maximum variance across the image
♦ Threshold method - select coefficients with maximum magnitudes. Those with values below a given threshold are ignored

Quantization is performed by using a quantization table Q, based on the simple formula:

$$\hat{T}(u, v) = \mathrm{round}\left[\frac{T(u, v)}{Q(u, v)}\right]$$

Thus, $\hat{T}(u, v) = 0$ whenever $Q(u, v) > 2\,|T(u, v)|$.

Inverse quantization (dequantization):

$$\dot{T}(u, v) = \hat{T}(u, v)\, Q(u, v)$$

Clearly, if $\hat{T}(u, v) = 0$, the original value of T(u, v) cannot be recovered. Hence, by increasing the values in Q, we can obtain more compression, but at the expense of more error.
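A short sketch of coefficient quantization and dequantization for single values. Rounding halves upwards is an assumption about what "round" means here, and the sample values are made up for illustration.

```python
import math


def quantize(t, q):
    """Quantized coefficient: T divided by Q, rounded to the nearest integer."""
    return int(math.floor(t / q + 0.5))  # halves round upwards (assumption)


def dequantize(t_hat, q):
    """Approximate reconstruction of the coefficient from its quantized value."""
    return t_hat * q


if __name__ == "__main__":
    q = 16
    for t in (100, 7, -20):
        t_hat = quantize(t, q)
        print(t, "->", t_hat, "->", dequantize(t_hat, q))
```

The coefficient 7 is smaller than Q/2, so it is quantized to 0 and cannot be recovered; larger entries in Q zero out more coefficients in the same way.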

Some compression schemes also use a hybrid approach, by combining say DCT and predictive coding.

Encoding Stage

After quantization, we may have several runs of zeros or repetitions, which can be coded using any of the methods for

lossless encoding, such as the RLE, or Huffman codes.

Extended coding system

♦ lossy
♦ higher compression
♦ better precision
♦ progressive coding

Independent coding system

♦ lossless
♦ completely reversible
♦ uses simple predictive coding
♦ does not involve DCT or quantization

JPEG Coding Procedure

♦ Divide the image into 8x8 blocks

♦ Process blocks from left-to-right, top-to-bottom

♦ Level shift the image pixel values: $f_s(x, y) = f(x, y) - 2^{n-1}$, where $2^n$ is the number of grey levels.

♦ Compute the 2D DCT of each block

♦ Quantize the coefficients using the quantization table. Different quantization tables are used for the colour and intensity components

♦ Form a 1-D sequence of the coefficients using the zigzag representation

♦ Encode the resulting 1-D sequence (uses Huffman coding). Different Huffman tables are also used for the luminance and chrominance components.
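Two of the steps above, the level shift and the zigzag ordering, can be sketched as follows. The traversal starts at the DC position and first moves to its right-hand neighbour, which is the usual JPEG convention; the function names are illustrative.

```python
def level_shift(block, bits=8):
    """Subtract 2**(bits - 1) from every pixel, centring the values around 0."""
    offset = 2 ** (bits - 1)
    return [[pixel - offset for pixel in row] for row in block]


def zigzag_indices(n=8):
    """Return the (row, column) positions of an n x n block in zigzag order."""
    order = []
    for d in range(2 * n - 1):  # d = row + column indexes one anti-diagonal
        diagonal = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        if d % 2 == 0:
            diagonal.reverse()  # even diagonals run from bottom-left to top-right
        order.extend(diagonal)
    return order


def zigzag_scan(block):
    """Flatten a square block into a 1-D sequence in zigzag order."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]


if __name__ == "__main__":
    print(level_shift([[0, 128, 255]]))
    print(zigzag_scan([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

After quantization the high-frequency coefficients at the end of the zigzag sequence are mostly zero, which produces the long runs that the encoding stage exploits.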

The DC coefficients are coded relative to previous DCs using differential coding. That is, for the DCs, only the prediction error between the current DC and the DC value from the preceding block is coded. Thus, the coding tables are based on (DC difference, category) pairs, where categories indicate the range (or size) of the DC difference.

For AC coefficients, a variable length code is used, based on tables of stipulated output symbols for given pairs of coefficient value and number of preceding zeros, sometimes called (runlength, category) pairs.
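The differential coding of the DC coefficients can be sketched as below. The category is taken as the number of bits needed to represent the magnitude of the difference, and the predictor for the first block is taken as 0; the function names and sample DC values are illustrative assumptions.

```python
def dc_differences(dc_values, initial=0):
    """Difference between each block's DC value and that of the preceding block."""
    differences = []
    previous = initial  # predictor for the first block (assumption)
    for dc in dc_values:
        differences.append(dc - previous)
        previous = dc
    return differences


def category(difference):
    """Size category: number of bits needed for the magnitude of the difference."""
    return abs(difference).bit_length()


if __name__ == "__main__":
    dcs = [52, 55, 50]  # made-up DC values of three consecutive blocks
    for diff in dc_differences(dcs):
        print((diff, category(diff)))
```

Because neighbouring blocks usually have similar average intensity, most differences fall in the small categories, which can be given short codewords.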

Progressive Encoding In JPEG

Uses

♦ Browsing, search and retrieval
♦ Network support for prioritization - i.e. lower resolution information is given higher priority, and thus sent out first to enhance speedy browsing
♦ Lower resolution information can typically be coded with less precision without degrading the visual quality. Hence, it will require less bandwidth

Methods used for progressive encoding

Spectral selection

This is based on the zig-zag ordering. At the i-th scan, only information from the i-th diagonal is sent:

DC, AC1 + AC2, ..., AC63

Usually, the DC and the first two AC coefficients are sent at the same time.

This is conceptually simple. The distortion introduced depends on the cut-off, i.e. the number of diagonals scanned.

Successive Approximation

Send information from all frequencies at the same time, starting from the MSB to the LSB. Usually, the first 4 MSBs are sent

at the first scan. Then the remaining bits are sent one at a time.

This is generally slower than spectral selection, but produces a relatively constant distortion across all the spatial frequencies.

Some techniques try to combine spectral selection with successive approximation.

Hierarchical Encoding in JPEG

Generally used where the source image is at a higher resolution than the display device. Hierarchical encoding basically uses a pyramidal image representation to encode images at multiple resolutions. The resolutions differ by a factor of 2 vertically and horizontally. Lower resolution images can be accessed without decompressing the entire image. Each resolution is coded as an ordinary image.