
Department of Electronic and Computer Engineering

FINAL DEGREE PROJECT

June 2011

IMAGE STITCHING

Studies: Telecommunications Engineering

Author: Oscar Soler Cubero

Supervisors: Dr. Sean McGrath, Dr. Colin Flanagan


CONTENTS

Abstract
Resum
Resumen
1. Introduction
1.1. General Introduction
1.2. Image
1.2.1. Still Image
1.2.2. Moving Image
2. Image Processing
2.1. Point Operators
2.2. Neighbourhood Operators
2.2.1. Linear Filtering
2.2.2. Non-linear Filtering
2.3. Image Pyramids and Wavelets
2.3.1. Pyramids
2.3.2. Wavelets
2.4. Geometric Transformations
3. Stitching Process
3.1. History
3.2. Image Registration
3.2.1. Direct (Pixel-Based) Registration
3.2.2. Feature-Based Registration
3.3. Image Calibration
3.3.1. Bundle Adjustment
3.3.2. Parallax Removal
3.4. Image Blending
3.4.1. Compositing Surface
3.4.2. Pixel/Seam Selection and Weighting
3.4.3. Blending
4. Stitching Issues
4.1. Algorithms
4.1.1. Scale Invariant Feature Transform
4.1.2. Random Sample Consensus
4.1.3. Kd-Tree Nearest Neighbour Search
4.2. Common Problems
4.2.1. Exposure
4.2.2. Lens Distortion
4.2.3. Parallax Error
4.2.4. Ghosting
4.2.5. Overlapping
4.2.6. HFOV and Image Resolution
4.3. Software
4.3.1. Panorama Tools
4.3.2. Hugin
4.3.3. PTGui
4.3.4. AutoStitch and AutopanoPro
4.3.5. Microsoft Image Composite Editor
4.3.6. CleVR
5. Stitching Application
5.1. Smart Tour UL project
5.2. Panorama Stitching Formats
5.3. Script
5.3.1. Package and Libraries
5.3.2. Panorama_Script.sh
5.4. Results
5.4.1. Original Image Re-Stitched
5.4.2. Camera Phone
5.4.3. University of Limerick
6. Conclusions
6.1. Future Work Development
7. References
Appendix A. Panorama_Script.sh
Appendix B. Manual Pages
Appendix C. Project Files Examples
C.1. Project File for autooptimiser
C.2. Project File for nona


ABSTRACT

Image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image processing techniques involve treating the image as a two-dimensional signal and applying standard signal processing techniques to it. Specifically, image stitching comprises several stages to render two or more overlapping images into a seamless stitched image, from the detection of features to blending into a final image. In this process, the Scale Invariant Feature Transform (SIFT) algorithm can be applied to perform the control point detection and matching step, owing to its good properties.

Creating an automatic and effective end-to-end stitching process requires analysing the different methods available for each of the stitching stages. Several commercial and online software tools can perform the stitching process, offering diverse options in different situations. This analysis involves the creation of a script that deals with images and project data files. Once the whole script is generated, the stitching process can be executed automatically, yielding good quality results in the final composite image.


RESUM

Image processing is any type of signal processing in which the input is an image, such as a photograph or video frame, and the output can be an image or a set of characteristics and parameters related to the image. Many image processing techniques involve treating the image as a two-dimensional signal, to which standard signal processing techniques are applied. Specifically, image stitching comprises different stages to join two or more overlapping images into a seamless image, from the detection of keypoints in the images to their blending in the final image. In this process, the Scale Invariant Feature Transform (SIFT) algorithm can be applied to carry out the detection and selection of correspondences between images, owing to its good qualities.

Developing a complete, automatic and effective stitching process involves analysing different methods for the image stitching stages. Several commercial and free software packages are able to carry out the stitching process, offering different alternatives in various situations. This analysis involves the creation of a command script that works with the images and with the data files of the generated project. Once this script is created, the image stitching process can be executed automatically, allowing quality results in the final image.


RESUMEN

Image processing is any type of signal processing in which the input is an image, such as a photograph or video frame; the output can be an image or a set of characteristics and parameters related to the image. Many image processing techniques involve treating the image as a two-dimensional signal, to which standard signal processing techniques are applied. Specifically, image stitching comprises different stages to join two or more overlapping images into a seamless image, from the detection of keypoints in the images to their blending in the final image. In this process, the Scale Invariant Feature Transform (SIFT) algorithm can be applied to carry out the detection and selection of correspondences between images, owing to its good qualities.

Developing a complete, automatic and effective stitching process involves analysing different methods for the image stitching stages. Several commercial and free software packages are able to carry out the stitching process, offering different alternatives in various situations. This analysis involves the creation of a command script that works with the images and with the data files of the generated project. Once this script is created, the image stitching process can be executed automatically, allowing quality results in the final image.


1. INTRODUCTION

1.1. General Introduction

Signal processing is an area of electrical engineering and applied mathematics that operates on and analyzes signals, in either discrete or continuous time, performing useful operations on them. Common signals include sound, images, time-varying measurement values, sensor data, control system signals, telecommunication transmission signals and many others. These signals are analog or digital electrical representations of time-varying or space-varying physical magnitudes.

Three types of signal processing can be distinguished, depending on the kind of signal used: analog signal processing for non-digitized signals, as in radio, telephone, radar and television systems; discrete-time signal processing for sampled signals that are defined only at discrete points in time; and digital signal processing, the processing of digitised discrete-time sampled signals, performed by computers or specialized digital signal processors.

Digital signal processing usually aims to measure, filter and/or compress continuous real-world analog signals. Its first step is to convert the signal from analog to digital form, sampling it using an analog-to-digital converter, which turns the analog signal into a stream of numbers. Since the required output signal is often another analog signal, a digital-to-analog converter is also needed. Digital signal processing offers many advantages over analog processing in many applications, such as error detection and correction in transmission as well as data compression. It includes subfields such as audio and speech signal processing, sonar and radar signal processing, spectral estimation, statistical signal processing, digital image processing, signal processing for communications, and many others.

This report specifically analyzes the image processing subcategory of signal processing, usually referred to as digital image processing. Image processing is a type of signal processing where the input is an image, such as a photograph or video frame, and the output can be an image or a set of parameters related to the image. Most image processing techniques involve treating images as two-dimensional signals and applying standard signal processing techniques. Specifically, digital image processing uses a wide range of computer algorithms to perform image processing on digital images, avoiding problems such as the build-up of noise and signal distortion during processing. Medical and microscope image processing, face and feature detection, computer vision and image stitching are some of the different applications in the field of image processing.


1.2. Image

Before describing image processing and the image stitching process, it is necessary to understand the basic objects being worked with: images. A distinction can be made between still images (digital images) and moving images (digital video). In this section, the characteristics and different formats of digital images are explained first. After that, considering video stitching as the next step beyond image stitching, a brief introduction to digital video is presented.

1.2.1. Still Image

In the fields of engineering and computer science, still images that can be manipulated by computers are required. For this reason, a numeric representation of a two-dimensional static image, known as a digital image, is used.

Firstly, to obtain a digital image from an analog one, a digitization process is performed by devices such as scanners or digital cameras. After that, the digital image is ready to be processed. There are different digital image formats to work with: bitmap (raster) and vector. Both formats can often be combined in one image.

Figure 1.1. Raster/Bitmap vs. Vector Image

Raster Image

A raster graphic image, or bitmap, is composed of a series of points, called pixels, that contain colour information. Bitmap images depend on the resolution, containing a fixed number of pixels. Each pixel has a specific location and colour value, which makes the pixel the basic information unit of the image. The pixels are distributed in a grid of cells, where each cell is a pixel, and together they build the whole image. When the


grid is modified, it affects the distribution, number and colour information of every pixel, and

therefore, the final image.

The image resolution is the number of pixels shown per unit of printed length in an image, normally measured in pixels per inch. The quality of a bitmap image is determined at the moment of its creation, so its resolution cannot be increased without modifying the image, normally deforming it and losing definition. This resolution is proportionally related to the file size; more pixels imply more bits.

From the previous paragraph, it is easy to see that size is an important factor to consider in an image. The way to manage the resolution and size of images is through the different compression techniques, which try to reduce the file volume with algorithms. The different raster formats manage to reduce the image size without modifying the number of pixels. There are two types of compression: lossless techniques, which preserve all the image detail and colour information, and lossy techniques, which discard part of it to achieve smaller files.

Some common bitmaps formats are:

GIF – Graphics Interchange Format: works with two compression methods, CLUT (Colour Look-Up Table) and LZW (explained below). It is one of the most widely used formats, especially for web images, because it offers more possibilities and higher compression than others. It is suitable for images with up to 256 colours.

RLE – Run Length Encoding: a lossless compression technique that registers a single colour value for a run of pixels with the same colour. This technique is exploited in bitmap images with large areas of equal colour, saving a lot of space while keeping the quality.

LZW – Lempel-Ziv-Welch: Similar operation to RLE.

JPEG – Joint Photographic Experts Group: one of the most suitable formats for images with more than 256 colours, appropriate for compressing colour photos and web images. Although the compression can be high, the visible losses are not very significant. JPEG saves all the colour information in millions of colours without creating a large file. The Discrete Cosine Transform (DCT) is the compression technique used by JPEG. It is based on dividing the information into two parts, the colour data on one hand and the brightness data on the other, which are also compressed separately. For this reason, JPEG is not suitable for images with high colour contrast or for text combined with images. Because it is a lossy compression format, it is recommended to perform the JPEG conversion as a last step, after all the required modifications of the image have been made.


PNG – Portable Network Graphics: has become increasingly important in recent years. It allows lossless compression, merging any image edge perfectly with the background. Unlike GIF, it cannot store animated images, and files are larger than JPEG.

BMP – BitMaP: the Windows format, very popular, but its compression is poor compared with formats such as JPEG.

PSD – PhotoShop Document: the native format of Adobe Photoshop, widely used because Photoshop is one of the graphically most powerful photography programs.

TIFF – Tag Image File Format: accepted by almost all editing and imaging applications. It offers many possibilities on both Mac and PC.

Vector Image

Vector images, or object-oriented vector graphics, are made of vectors: mathematically defined objects. The most important vector elements are Bézier curves, which are represented mathematically. Each vector is defined by a series of points with handles that control the shape of the line created between them. The curve is completely defined by its nodes, or anchor points, and their handles; moving the handles produces the wanted curve.

Figure 1.2. Different Bézier Curves.

These Bézier lines and curves are quite manageable, offering many possibilities thanks to their plasticity. These characteristics make vector images the ideal medium in the field of graphic design, for example in the creation of drawings or logos. The versatility of the curves makes them useful for working with text, modifying and deforming letters without limit.

Because they use mathematical coordinates to create images, vector formats allow effectively infinite image resolution. If an image is enlarged or reduced, its appearance does not change, either on screen or in print; the image keeps its shapes and colours. This is the main drawback found in bitmap images.

Some of the most popular and widely used vector graphics formats are:

CDR – Corel DRaw: Format generated by the program with the same name.

AI – Adobe Illustrator: Characteristics similar to Corel DRaw.

EPS – Encapsulated PostScript: a very adaptable format. It is one of the best formats for importing into most design software.

WMF – Windows MetaFile: a format developed by Microsoft, especially suited to working with Microsoft programs.

1.2.2. Moving Image

A moving image is typically a movie (film), or video, including digital video.

Specifically, digital video is composed of a series of orthogonal bitmap digital images displayed in rapid succession at a constant rate. In the context of video, these images are called frames, and the rate at which they are displayed is typically measured in frames per second (FPS).

There are two different formats for capturing the images: interlaced and progressive scan. An interlaced scan captures the image in groups of alternate lines, first the odd lines and then the even lines, repeating alternately. In contrast, a progressive scan captures every

image individually, with all scan lines being captured at the same moment in time. Thus,

interlaced video samples the scene motion twice as often as progressive video does, for the same number of frames per second.

Digital video can be copied without loss of quality, and many compression and encoding formats are used, such as Windows Media, MPEG-2, MPEG-4 or AVC. MPEG-4 and Windows Media are probably the most widely used on the internet, while MPEG-2 is almost exclusively used for DVD, giving a good quality image with minimum size.


2. IMAGE PROCESSING

Now that we have seen how images are formed, it is time to take a look at the image processing stage, which pre-processes the image and converts it into a form suitable for further

analysis. This chapter reviews standard image processing operators and transforms that

map pixel values from one image to another.

2.1. Point Operators

The point operators or processes are the simplest kind of image processing transforms, where each output pixel's value depends only on the corresponding input pixel value. This can be denoted as

g(x) = h(f(x)),    (2.1)

a function that takes one or more input images f(x) and produces an output image g(x). For sampled images, the domain consists of a finite number of pixel locations, and the values become g(i, j) = h(f(i, j)). Two commonly used point operators are multiplication and addition with a constant,

g(x) = a f(x) + b,    (2.2)

where a and b are said to control contrast and brightness, respectively.

Multiplicative gain is a linear operation that obeys the superposition principle,

h(f_0 + f_1) = h(f_0) + h(f_1).    (2.3)

Another commonly used two-input operator is the linear blend operator,

g(x) = (1 - \alpha) f_0(x) + \alpha f_1(x),    (2.4)

used to perform a temporal cross-dissolve between two images or videos.

One highly used non-linear transform applied before further processing is gamma correction, which is used to remove the non-linear mapping between input radiance and quantized pixel values:

g(x) = [f(x)]^{1/\gamma}.    (2.5)
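To make these operators concrete, the following minimal sketch implements the gain/bias, linear blend and gamma correction operators of equations (2.2), (2.4) and (2.5) in Python with NumPy; the parameter values are illustrative choices, not taken from this report:

import numpy as np

def gain_bias(f, a=1.2, b=10.0):
    # g(x) = a*f(x) + b: a controls contrast, b controls brightness (2.2)
    return np.clip(a * f.astype(np.float32) + b, 0, 255).astype(np.uint8)

def linear_blend(f0, f1, alpha=0.5):
    # g(x) = (1 - alpha)*f0(x) + alpha*f1(x): a temporal cross-dissolve (2.4)
    g = (1.0 - alpha) * f0.astype(np.float32) + alpha * f1.astype(np.float32)
    return g.astype(np.uint8)

def gamma_correction(f, gamma=2.2):
    # g(x) = f(x)^(1/gamma), applied on intensities normalised to [0, 1] (2.5)
    return (255.0 * (f / 255.0) ** (1.0 / gamma)).astype(np.uint8)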


Moreover, there are the colour transforms: adding the same value to each colour

channel not only increases the apparent intensity of each pixel, it can also affect the pixel’s

hue and saturation. This colour balancing can be performed either by multiplying each

channel with a different scale factor or by more complex processes.

The automatic way to determine the best values of the brightness and gain controls described above is to plot a histogram of the individual colour channels and luminance

values. From this distribution, we can compute relevant statistics such as the minimum,

maximum and average intensity values. One common solution is to perform histogram

equalization, to find an intensity mapping function such that the resulting histogram is flat.

The trick to finding such a mapping is the same as that used to generate random samples from a probability density function, which is to first compute the cumulative distribution function. Integrating the distribution h(I) gives the cumulative distribution (or percentile) c(I),

c(I) = \frac{1}{N} \sum_{i=0}^{I} h(i) = c(I-1) + \frac{1}{N} h(I),    (2.6)

from which the final value each pixel should take can be determined (N is the number of pixels in the image). When working with eight-bit pixel values, the I and c axes are rescaled to [0, 255].
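A minimal NumPy sketch of this procedure, mapping each pixel through the rescaled cumulative distribution of equation (2.6), might look as follows (assuming an eight-bit greyscale image):

import numpy as np

def equalize_histogram(img):
    # h(I): counts of each of the 256 intensity levels
    h, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
    # c(I): cumulative distribution, normalised so that c(255) = 1
    c = np.cumsum(h) / img.size
    # rescale the c axis to [0, 255] and use it as a lookup table
    lut = np.round(255 * c).astype(np.uint8)
    return lut[img]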


Figure 2.1. Histogram analysis and equalization: (a) original image; (b) colour channel and intensity

histograms; (c) cumulative distribution functions; (d) equalization functions; (e) full histogram

equalization.


While global histogram equalization can be useful, for some images it might be preferable to apply different equalizations in different regions. One technique is to recompute the histogram for every non-overlapping M×M block and then interpolate the transfer functions as one moves between blocks. This method is known as locally adaptive histogram equalization, and is used in a variety of other applications, including the construction of SIFT (Scale Invariant Feature Transform) feature descriptors.
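OpenCV ships a contrast-limited variant of this idea (CLAHE); a short sketch, with an assumed input file name and illustrative parameter values:

import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical image
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # 8x8 blocks
equalized = clahe.apply(gray)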

2.2. Neighbourhood Operators

Locally adaptive histogram equalization is an example of a neighbourhood or local

operator, which uses a collection of pixel values in the surrounding area of a given pixel to

determine its final output value. In addition, neighbourhood operators can be used to filter

images in order to add soft blur, sharpen details, accentuate edges, or remove noise. There

are linear filtering operators that involve weighted combinations of pixels in small

neighbourhoods, and non-linear filtering operators such as median or bilateral filters and

distance transforms.

2.2.1. Linear Filtering

The most commonly used type of neighbourhood operator is the linear filter, in which an output pixel's value is determined as a weighted sum of input pixel values,

g(i, j) = \sum_{k,l} f(i + k, j + l) \, h(k, l).    (2.7)

The entries in the mask h(k, l), or kernel, are often called the filter coefficients. Another common and more compactly notated variant is the convolution operator,

g = f * h,    (2.8)

where h is called the impulse response function. Both are linear shift invariant (LSI) operators, which obey both the superposition principle,

h \circ (f_0 + f_1) = h \circ f_0 + h \circ f_1,    (2.9)

and the shift invariance principle,

g(i, j) = f(i + k, j + l) \;\Leftrightarrow\; (h \circ g)(i, j) = (h \circ f)(i + k, j + l).    (2.10)


The simplest filter to implement is the moving average or box filter, which simply

averages the pixel values in a KxK window. It is used as a pre-processing stage to edge

extraction and interest point detection algorithms.
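As an illustration of equation (2.7), the sketch below builds a K×K box kernel and applies it with OpenCV; the kernel size and file name are arbitrary choices:

import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical image
K = 5
kernel = np.ones((K, K), np.float32) / (K * K)  # KxK averaging mask h(k, l)
smoothed = cv2.filter2D(img, -1, kernel)  # -1 keeps the input bit depth
# cv2.blur(img, (K, K)) is an equivalent built-in box filter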

2.2.2. Non-linear Filtering

Linear filters can perform a wide variety of image transformations; however, non-linear filters can sometimes perform even better. One of these filters, the median filter, selects the median value from each pixel's neighbourhood and is able to filter away bad pixels such as shot noise. Another case is the bilateral filter, which rejects pixels whose values differ too much from the central pixel value; the output pixel value is then a weighted combination of the remaining neighbouring pixel values.
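Both non-linear filters are available in OpenCV; a brief sketch with illustrative parameter values:

import cv2

img = cv2.imread("noisy.jpg")  # hypothetical noisy image
# median of each 5x5 neighbourhood; good at removing isolated bad pixels
median = cv2.medianBlur(img, 5)
# neighbourhood diameter 9, colour sigma 75, spatial sigma 75: pixels whose
# values differ strongly from the centre contribute little to the output
bilateral = cv2.bilateralFilter(img, 9, 75, 75)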

Other examples of neighbourhood operators include the semi-global operators that compute distance transforms. The distance transform is useful for quickly precomputing the distance to a curve or set of points using a two-pass raster algorithm, and is defined as

D(i, j) = \min_{k,l \,:\, b(k,l) = 0} d(i - k, j - l),    (2.11)

where d is the distance metric between pixel offsets, which can be the Manhattan distance or the Euclidean distance. It has many applications, including binary image alignment, feathering in image stitching and blending, and nearest point alignment.

Finally, another practical semi-global image operation is finding connected

components, defined as regions of adjacent pixels that have the same value or label.

Connected components are used in a variety of applications, such as finding letters in

scanned documents or finding objects in images.
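Both semi-global operations are likewise available in OpenCV; a sketch on an assumed binary mask:

import cv2

mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)  # hypothetical mask
_, binary = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)

# Euclidean distance from every non-zero pixel to the nearest zero pixel,
# as in equation (2.11) with d the Euclidean metric
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)

# label regions of adjacent pixels sharing the same non-zero value
num_labels, labels = cv2.connectedComponents(binary)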

2.3. Image Pyramids and Wavelets

Neighbourhood operators can be cascaded to form image pyramids and wavelets, for

analyzing images at a variety of resolutions and for accelerating certain operations. There

are two possibilities for changing image resolution: interpolation and decimation.

In order to interpolate (or upsample) an image to a higher resolution, it is necessary to select some interpolation mask with which to convolve the image. On the other hand, there is decimation (or downsampling), which is required to reduce the resolution: the image is first convolved with a low-pass filter (to avoid aliasing) and then every r-th sample is kept.


2.3.1. Pyramids

With both techniques mentioned before, a complete image pyramid can be built, which can be used to accelerate coarse-to-fine search algorithms, to look for objects at different scales, and to perform multi-resolution blending operations. The best known and most widely used is the Laplacian pyramid. To construct it, the original image is first blurred and subsampled by a factor of two and stored in the next level of the pyramid. The lower-resolution image is then interpolated back up to obtain a reconstructed low-pass version, which is subtracted from the original to yield the band-pass "Laplacian" image, stored away for further processing. The resulting pyramid has the perfect reconstruction property: the Laplacian images, together with the coarsest low-pass level, are sufficient to exactly reconstruct the original image.

One of the most engaging applications of the Laplacian pyramid is the creation of blended composite images. The idea is that low-frequency colour variations between the images are blended smoothly, while the higher-frequency textures of each one are blended over a narrower region to avoid ghosting effects when two textures are overlaid. This is particularly useful in image stitching and compositing applications, where the exposures may vary between different images.
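A minimal sketch of Laplacian pyramid construction and perfect reconstruction using OpenCV's pyrDown/pyrUp; for blending, the per-level bands of two images would be combined with a mask pyramid before reconstruction. The number of levels is an arbitrary choice:

import cv2

def laplacian_pyramid(img, levels=4):
    # Gaussian pyramid: repeatedly blur and subsample by a factor of two
    gauss = [img.astype("float32")]
    for _ in range(levels):
        gauss.append(cv2.pyrDown(gauss[-1]))
    # band-pass "Laplacian" images: level minus upsampled coarser level
    lap = []
    for fine, coarse in zip(gauss[:-1], gauss[1:]):
        up = cv2.pyrUp(coarse, dstsize=(fine.shape[1], fine.shape[0]))
        lap.append(fine - up)
    lap.append(gauss[-1])  # keep the coarsest low-pass level
    return lap

def reconstruct(lap):
    # perfect reconstruction: upsample and add the bands back in
    img = lap[-1]
    for band in reversed(lap[:-1]):
        img = cv2.pyrUp(img, dstsize=(band.shape[1], band.shape[0])) + band
    return img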


Figure 2.2. Laplacian pyramid in image blending: (a) regular splice of original images (b) pyramid

blend.

2.3.2. Wavelets

An alternative to pyramids is the use of wavelet decompositions. Wavelets are filters

that localize a signal in both space and frequency and are defined over a hierarchy of scales.

Wavelets provide a smooth way to decompose a signal into frequency components without

blocking and are closely related to pyramids.



Figure 2.3. Multiresolution pyramids: (a) pyramid with half-octave sampling; (b) wavelet pyramid,

where each wavelet level stores 3/4 of the original pixels, so that the total number of wavelet

coefficients and original pixels is the same.

The main difference between pyramids and wavelets is that traditional pyramids are

overcomplete, using more pixels than the original image to represent the decomposition,

whereas wavelets keep the size of the decomposition the same as the image, providing a

tight frame.

2.4. Geometric Transformations

After seeing how to change the resolution of an image in general, geometric

transformations are introduced as another important class of global operators. These perform

more general transformations, such as image rotations or general warps. In contrast to the point operators or processes, these functions transform the domain of the image, x' = h(x), and not its range. Among the different geometric transformations, the one that most concerns image stitching is the parametric 2D transformation, where the behaviour of the transformation is controlled by a small number of parameters.

Figure 2.4. Basic 2D geometric image transformations.


The examples of transformations shown in Figure 2.4 are based on the formulas reproduced in the next table, where I is the identity matrix, R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} an orthonormal rotation matrix, s an arbitrary scale factor, A an arbitrary 2×3 matrix, and \tilde{H} an arbitrary 3×3 matrix.

Transformation                   Matrix               Preserves
translation                      [ I | t ]_{2×3}      orientation
rigid (rotation + translation)   [ R | t ]_{2×3}      lengths
similarity (scaled rotation)     [ sR | t ]_{2×3}     angles
affine                           [ A ]_{2×3}          parallelism
projective                       [ \tilde{H} ]_{3×3}  straight lines

Table 2.1. Hierarchy of 2D coordinate transformations. Each transformation also preserves the properties listed in the rows below it.

The process of computing the values in the new image g(x') is called inverse warping. Each pixel in the destination image g(x') is sampled from the original image f(x). The procedure for creating the new image is the following: for every pixel x' in g(x'), the source location x = ĥ(x') is first computed, and then f(x) is resampled at location x and copied to g(x'). This is illustrated in the next figure,


Figure 2.5. Inverse warping algorithm: (a) a pixel sampled from its corresponding location; (b) detail

of the source and destination pixel locations.

where ĥ(x') is often simply computed as the inverse of h(x). Since ĥ(x') is defined for all pixels in g(x'), there are no holes in the resulting image.
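The sketch below applies a parametric 2D transformation with OpenCV; warpAffine internally inverts the given matrix and resamples the source at ĥ(x') for every destination pixel, i.e. it performs inverse warping, so the result has no holes. The rotation angle and file name are illustrative:

import cv2

img = cv2.imread("input.jpg")  # hypothetical image
h, w = img.shape[:2]

# 2x3 similarity matrix: a 15 degree rotation about the image centre
M = cv2.getRotationMatrix2D((w / 2, h / 2), 15, 1.0)

# each destination pixel x' is filled by resampling the source at h^-1(x')
rotated = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_LINEAR)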


3. STITCHING PROCESS

Algorithms for aligning images and stitching them into seamless photo-mosaics are

among the oldest and most widely used in computer vision. Image stitching is the process of

combining multiple images with overlapping fields of view to produce high-resolution photo-

mosaics used for today’s digital maps and satellite photos. Image stitching algorithms can

create wide-angle panoramas, and they also come bundled with most digital cameras.

From the moment the pictures are taken until the creation of the stitched image, there are different processes to follow, starting with the detection of points or features in the individual images and ending with image merging. The image stitching process can be classified into three main modules: registration, calibration and blending.

In this chapter, a short history of image stitching is first given to set the context, followed by the different stages of the stitching process, each of which is described and covered in detail.

3.1. History

Image stitching originated in the photographic community, where more manually

intensive methods based on surveyed ground control points or manually registered tie points

have long been used to register aerial photos into large-scale photo-mosaics. One of the key

advances in this community was the development of bundle adjustment algorithms, which

could simultaneously solve for the locations of all of the camera positions, thus yielding

globally consistent solutions. Another recurring problem in creating photo-mosaics is the

elimination of visible seams, for which a variety of techniques have been developed over the

years.

In film photography, special cameras were developed in the 1990s to take ultra-wide

angle panoramas, often by exposing the film through a vertical slit as the camera rotated on

its axis. In the mid-1990s, image alignment techniques started being applied to the

construction of wide-angle seamless panoramas from regular hand-held cameras. More

recent work in this area has addressed the need to compute globally consistent alignments

to remove ghosting due to parallax error and object movement, and to deal with varying

exposures. These techniques have spawned a large number of commercial stitching

products.


3.2. Image Registration

Image registration involves detecting and matching pixels or features in a set of images, and then estimating the correct alignments relating the various pairs or groups of images. Before images can be registered and aligned, the relationships between the pixel coordinates of one image and another need to be established, which is done with the parametric motion models shown in the previous chapter. Depending on the technique used for registration and alignment, two different methods can be considered: pixel-based and feature-based. Because the stitching script developed in chapter five relies on it, more importance is given to the feature-based registration method, and it is explained in more detail.

3.2.1. Direct (Pixel-Based) Registration

This approach consists of warping the images relative to each other and looking at how much the pixels agree, using pixel-to-pixel matching; it is often called the direct method. To use this method, a suitable error metric must first be chosen to compare the images. After that, a suitable search technique must be devised, the simplest of which is a full search. Alternatively, hierarchical methods and Fourier transforms can be used to accelerate the process.

The simplest way to establish an alignment between two images is to warp one image to the other. Given a template image I_0(x) sampled at discrete pixel locations x_i = (x_i, y_i), the goal is to find where it is located in image I_1(x). A least-squares solution is to find the minimum of the sum of squared differences (SSD) function

E_{SSD}(u) = \sum_i [I_1(x_i + u) - I_0(x_i)]^2,    (3.1)

where u = (u, v) is the displacement. The above error metric can be made more robust to outliers by replacing the squared error terms with a robust function \rho(),

E_{SRD}(u) = \sum_i \rho(I_1(x_i + u) - I_0(x_i)),    (3.2)

which grows less quickly than the quadratic function associated with least squares. One possibility for \rho() is a smoothly varying function that is quadratic for small values but grows more slowly away from the origin, called the Geman-McClure function,

\rho_{GM}(x) = \frac{x^2}{1 + x^2/a^2},    (3.3)


where a is a constant outlier threshold.

When stitching a mosaic, unwanted foreground objects have to be discounted, and some of the pixels being compared may lie outside the original image boundaries. The error metric then becomes the weighted SSD function,

E_{WSSD}(u) = \sum_i w_0(x_i) \, w_1(x_i + u) \, [I_1(x_i + u) - I_0(x_i)]^2,    (3.4)

where the weighting functions w_0 and w_1 are zero outside the valid ranges of the images.

Often, the two images to stitch were not taken with the same exposure. The bias and gain model is a simple model of linear intensity variation between the two images,

I_1(x + u) = (1 + \alpha) I_0(x) + \beta \;\Rightarrow\; E_{BG}(u) = \sum_i [I_1(x_i + u) - (1 + \alpha) I_0(x_i) - \beta]^2,    (3.5)

with \beta and \alpha as the bias and gain respectively. An alternative to taking intensity differences is to perform cross-correlation, maximizing the product of the two images,

E_{CC}(u) = \sum_i I_0(x_i) \, I_1(x_i + u).    (3.6)

To accelerate these search processes, hierarchical motion estimation is often used, where an image pyramid (discussed in the previous chapter) is first constructed and a search over a smaller number of discrete pixels is first performed at the coarser levels. The motion estimate from one level of the pyramid then initializes a smaller local search at the next finer level.
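As a concrete illustration of the direct method, this NumPy sketch performs the simplest full search, minimising the SSD error of equation (3.1) over a small displacement range; the ±16 pixel window is an arbitrary choice, and in practice the search would be embedded in the coarse-to-fine pyramid scheme just described:

import numpy as np

def ssd_full_search(I0, I1, max_disp=16):
    # brute-force search for the displacement u = (u, v) minimising E_SSD
    h, w = I0.shape
    best, best_u = np.inf, (0, 0)
    for v in range(-max_disp, max_disp + 1):
        for u in range(-max_disp, max_disp + 1):
            # compare only the region where the two images overlap;
            # the mean normalises for the varying overlap size
            a = I0[max(0, -v):h - max(0, v), max(0, -u):w - max(0, u)]
            b = I1[max(0, v):h + min(0, v), max(0, u):w + min(0, u)]
            e = np.mean((b.astype(np.float64) - a) ** 2)
            if e < best:
                best, best_u = e, (u, v)
    return best_u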

This estimation is not sufficient if the search range corresponds to a significant fraction of the larger image, and a Fourier-based approach may be preferable. Fourier-based alignment relies on the fact that the Fourier transform of a shifted signal has the same magnitude as the original but a linearly varying phase,

\mathcal{F}\{I_1(x + u)\} = \mathcal{F}\{I_1(x)\} \, e^{-2\pi j \, u \cdot f} = \hat{I}_1(f) \, e^{-2\pi j \, u \cdot f}.    (3.7)

Consequently, to efficiently evaluate the error metrics above over the range of all possible values of u, the Fourier transforms of both images are taken, the two transforms are combined, and the inverse transform of the result is computed. The Fast Fourier Transform algorithm can be significantly faster than a full search when the full range of image overlaps is considered.
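A compact NumPy sketch of this idea is phase correlation, which keeps only the phase of the cross-power spectrum and reads the displacement off the peak of its inverse transform (OpenCV offers a similar routine, cv2.phaseCorrelate):

import numpy as np

def phase_correlation(I0, I1):
    F0, F1 = np.fft.fft2(I0), np.fft.fft2(I1)
    cross = F0 * np.conj(F1)
    cross /= np.abs(cross) + 1e-12  # discard magnitude, keep the phase
    corr = np.real(np.fft.ifft2(cross))
    v, u = np.unravel_index(np.argmax(corr), corr.shape)
    # peaks beyond the midpoint wrap around to negative displacements
    if v > I0.shape[0] // 2:
        v -= I0.shape[0]
    if u > I0.shape[1] // 2:
        u -= I0.shape[1]
    return u, v  # sign convention depends on which image is the reference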


Finally, to get sub-pixel precision in the alignment, incremental refinement methods based on Taylor series expansions and parametric motion models should be used, but going into the extended approximation computations is beyond the goals of this report.

3.2.2. Feature-Based Registration

The other possible approach to image registration is to first extract distinctive features from each image, then match these features to establish a global correspondence, and finally estimate the geometric transformation between the images. This kind of approach has gained popularity more recently for image stitching applications, and it is the one used in the development of the script in chapter five.

In this subsection, the feature detection (extraction) stage is explained first, where each image is searched for locations that are likely to match well in other images. At the feature description stage, each region around a detected keypoint location is converted into a more compact and stable (invariant) descriptor that can be matched against other descriptors. The feature matching stage efficiently searches for likely matching candidates in other images. Finally, the last step estimates the motion parameters that best register the images.

There are two main approaches to finding feature points and their correspondences.

The first is to independently detect features in all the images under consideration and then

match features based on their local appearance. The second is to find features in one image

that can be accurately tracked using a local search technique, such as correlation or least

squares. The first approach is more suitable when a large amount of motion or appearance

change is expected, e.g., in stitching together panoramas, while the second is more suitable

when images are taken from nearby viewpoints or in rapid succession. The first approach is explained here, owing to its relevance to panorama creation.

Feature Detectors

The first kind of features that may be noticed are specific locations in the images, such

as mountain peaks, building corners, doorways, or interestingly shaped patches of snow.

These kinds of localized feature are often called keypoint features or control points and are

often described by the appearance of patches of pixels surrounding the point location.


Figure 3.1. Image pairs with three extracted patches below. Some patches can be localized with

higher accuracy than others.

Normally, texture-less patches are nearly impossible to localize, while patches with

large contrast changes (or gradients) are easier to localize, although it is only possible to

align patches along the direction normal to the edge direction. Patches with gradients in at

least two different orientations are the easiest to localize.

These intuitions can be formalized by comparing two image patches using a weighted summed square difference,

E_{WSSD}(u) = \sum_i w(x_i) [I_1(x_i + u) - I_0(x_i)]^2,    (3.8)

where I_0 and I_1 are the two images being compared, u the displacement vector, w(x) a spatially varying weighting function, and the summation is over all the pixels in the patch. Moreover, to know how stable this metric is with respect to small variations in position \Delta u, an image patch can be compared against itself, which is known as the auto-correlation function,

E_{AC}(\Delta u) = \sum_i w(x_i) [I_0(x_i + \Delta u) - I_0(x_i)]^2 \approx \Delta u^T A \, \Delta u,    (3.9)

approximated with a Taylor series expansion of the image function, where A is the auto-correlation matrix, written as

A = w * \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}.

Sometimes, feature detectors can lead to an uneven distribution of control points across the image. One solution is to use adaptive non-maximal suppression (ANMS), which only detects features whose response value is significantly greater than that of all of their neighbours within a certain radius.
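In practice, keypoints based on this auto-correlation criterion can be obtained directly from OpenCV; goodFeaturesToTrack keeps points where the matrix A has two large eigenvalues (gradients in at least two orientations), and its minDistance parameter spreads the points out, a simpler stand-in for ANMS. The parameter values here are illustrative:

import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical image
# up to 500 corners, quality threshold at 1% of the best response,
# and at least 10 pixels between accepted points
corners = cv2.goodFeaturesToTrack(gray, 500, 0.01, 10)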


In many situations, detecting features at the finest stable scale possible may not be

appropriate. The solution is to extract features at a variety of scales, for example by performing the same operations at multiple resolutions in a pyramid and then matching features at the same level. In addition, in-plane image rotation needs to be handled, by estimating a dominant orientation at each detected control point. Once its local orientation

and scale have been estimated, a scaled and oriented patch around the detected keypoint

can be extracted and used to form a feature descriptor. This approach is suitable when the

images being matched do not suffer large scale changes and is called multi-scale oriented

patches (MOPS).

Figure 3.2. Multi-scale oriented patches (MOPS) extracted at three pyramid levels. The boxes show the feature orientation at different scales.

Feature Descriptors

After control points have been detected, their local appearance in most cases changes in orientation and scale, sometimes even undergoes affine deformations, and usually varies from image to image. A feature descriptor is created by first computing the gradient magnitude (the local contrast changes) and orientation at each image sample point in a region around the control point location. These keypoint descriptors are created to make the detected feature points more invariant to such changes, while still discriminating between different patches. There are a few descriptors which can be used to improve on the keypoints found.


Figure 3.3. Representation of the computation of a feature descriptor from image gradients.

Simple normalized intensity patches perform reasonably well and are easy to implement. The multi-scale oriented patches (MOPS) are sampled using a coarser level of the image pyramid to avoid aliasing. To compensate for affine photometric variations, patch intensities are re-scaled by normalization.

Scale invariant feature transform (SIFT) features are formed by computing the gradient at each pixel in a 16x16 window around the detected control point, using the appropriate level of the pyramid at which the keypoint was detected. The gradient magnitudes are then downweighted by a Gaussian fall-off function in order to reduce the influence of gradients far from the centre, as these are more affected by small misregistrations. This algorithm is explained at length in chapter four of this report.

Other ways to compute descriptors are inspired by SIFT, for example using principal component analysis (PCA-SIFT), or using box filters to approximate the derivatives and integrals used in SIFT, a method called SURF (Speeded-Up Robust Features). Another popular variant of SIFT is the gradient location-orientation histogram (GLOH), which usually has the best overall performance.
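In recent OpenCV builds, SIFT detection and description are available directly; a minimal sketch with an assumed input image:

import cv2

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical image
sift = cv2.SIFT_create()
# each keypoint carries a location, scale and dominant orientation;
# descriptors is an N x 128 array of gradient-histogram vectors
keypoints, descriptors = sift.detectAndCompute(img, None)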

Feature Matching

Once the features and their descriptors have been extracted from two or more

images, the next step is to establish some preliminary feature matches between these

images. There are two main issues. The first is to select a matching strategy that determines which correspondences are

passed on to the next stage for further processing. The second is to devise data structures

and algorithms to perform this matching as quickly as possible. This process depends on the

context in which the matching is being performed. For image stitching, two images are given
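To round off the pipeline described in this section (detection, description, matching and motion estimation), the following sketch detects SIFT features in two overlapping images, matches their descriptors with the conventional nearest-neighbour ratio test, and robustly estimates a homography with RANSAC; both techniques are covered in chapter four. The file names and the 0.75 ratio threshold are illustrative assumptions:

import cv2
import numpy as np

img0 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical pair
img1 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp0, des0 = sift.detectAndCompute(img0, None)
kp1, des1 = sift.detectAndCompute(img1, None)

# match each descriptor to its two nearest neighbours and keep only
# matches that are clearly better than the runner-up (ratio test)
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des0, des1, k=2)
        if m.distance < 0.75 * n.distance]

# robustly estimate the geometric transformation with RANSAC,
# using a 5-pixel reprojection threshold to reject outliers
src = np.float32([kp0[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)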
