























































































Image processing and computer vision project report includes the whole research on panorama stitching.
























































































ABSTRACT
Image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output may be either an image or a set of characteristics or parameters related to the image. Most image processing techniques treat the image as a two-dimensional signal and apply standard signal processing techniques to it. Specifically, image stitching comprises several stages that render two or more overlapping images into a seamless stitched image, from the detection of features to blending into a final image. In this process, the Scale Invariant Feature Transform (SIFT) algorithm can be applied to the step of detecting and matching control points, owing to its good properties.
Creating an automatic and effective end-to-end stitching process requires analysing different methods for each stitching stage. Several commercial and online software tools are available to perform the stitching process, offering diverse options for different situations. This analysis involves the creation of a script that handles images and project data files. Once the whole script has been written, the stitching process can run automatically and gives good-quality results in the final composite image.
SUMMARY
Image processing is any kind of signal processing in which the input is an image, such as a photograph or video frame, and the output can be an image or a set of characteristics and parameters related to the image. Many image processing techniques treat the image as a two-dimensional signal, and standard signal processing techniques are therefore applied. Specifically, image stitching comprises different stages to join two or more overlapping images into a seamless image, from the detection of key points in the images to their blending in the final image. In this process, the Scale Invariant Feature Transform (SIFT) algorithm can be applied to the stage of detecting and selecting correspondences between images, owing to its good properties.
Developing a complete, automatic and effective stitching process requires analysing different methods for the stages of image stitching. Several commercial and free software packages can carry out the stitching process, offering different alternatives in various situations. This analysis involves creating a sequence of commands that works with the images and with the data files of the generated project. Once this sequence has been created, the image stitching process can run automatically, giving good-quality results in the final image.
1. INTRODUCTION
Signal processing is an area of electrical engineering and applied mathematics that operates on and analyzes signals, in either discrete or continuous time, in order to perform useful operations on them. Common signals include sound, images, time-varying measurement values, sensor data, control system signals, telecommunication transmission signals and many others. These signals are analog or digital electrical representations of time-varying or spatially varying physical quantities.
Three types of signal processing can be distinguished, depending on the kind of signal involved: analog signal processing, for signals that have not been digitised, as in radio, telephone, radar and television systems; discrete-time signal processing, for sampled signals that are defined only at discrete points in time; and digital signal processing, the processing of digitised discrete-time sampled signals by computers or specialized digital signal processors.
Digital signal processing usually aims to measure, filter and/or compress continuous real-world analog signals. Its first step is to convert the signal from analog to digital form by sampling it with an analog-to-digital converter, which turns the analog signal into a stream of numbers. However, the required output is often another analog signal, which calls for a digital-to-analog converter. Digital signal processing offers many advantages over analog processing in many applications, such as error detection and correction in transmission as well as data compression. Its subfields include audio and speech signal processing, sonar and radar signal processing, spectral estimation, statistical signal processing, digital image processing, signal processing for communications, and many others.
In this report, the subcategory of signal processing analyzed is image processing, a term that usually refers to digital image processing. Image processing is a type of signal processing where the input is an image, such as a photograph or video frame, and the output can be an image or a set of parameters related to the image. Most image processing techniques treat images as two-dimensional signals and apply standard signal processing techniques to them. Specifically, digital image processing uses a wide range of computer algorithms to process digital images, avoiding problems such as the build-up of noise and signal distortion during processing. Medical and microscope image processing, face and feature detection, computer vision and image stitching are some of the applications in the field of image processing.
Before describing image processing and the image stitching process, it is necessary to understand the basic objects they work with: images. A distinction can be made between still images (digital images) and moving images (digital video). In this section, the characteristics and different formats of digital images are explained first. After that, considering video stitching as the next step beyond image stitching, a brief introduction to digital video is presented.
1.2.1. Still Image
In engineering and computer science, a kind of still image that can be manipulated by computers is required. For this reason, a numeric representation of a two-dimensional static image, known as a digital image, is used.
First, to obtain a digital image from an analog image, digitization is performed in devices such as scanners or digital cameras. After that, the digital image is ready to be processed. There are two kinds of digital image format: bitmap (raster) and vector. Often, both formats can be combined in one image.
Figure 1.1. Raster/Bitmap vs. Vector Image
Raster Image
A raster graphics image, or bitmap, is composed of a series of points, called pixels, that contain colour information. Bitmap images are resolution dependent, since they contain a fixed number of pixels. Each pixel has a specific location and colour value, which makes the pixel the basic unit of information of the image. The pixels are arranged in a grid of cells, where each cell is a pixel, and together they build the whole image. When the
PNG – Portable Network Graphics: has become increasingly important in recent years. It allows lossless compression and supports transparency, so that image edges merge cleanly with the background. Unlike GIF, it does not support animation, and its files are larger than JPEG files.
BMP – BitMaP: is the Windows format, very popular, but its compression is poor compared with formats such as JPEG.
PSD – PhotoShop Document: is the native format of Adobe Photoshop, widely used because Photoshop is one of the most powerful photo-editing programs.
TIFF – Tagged Image File Format: is supported by almost all image editing applications. It offers many options on both Mac and PC.
Vector Image
Vector images, or object-oriented vector graphics, are made of vectors, which are mathematically defined objects. The most important vector elements are Bézier curves. Each vector is defined by a series of points that have handles to control the shape of the line created between them. The curve is completely defined by its nodes, or anchor points, and the handles. The desired curve is obtained by moving the handles.
Figure 1.2. Different Bézier Curves.
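As an illustration of how anchor points and handles determine a curve, the following minimal sketch evaluates a cubic Bézier curve. The cubic form and the function names `cubic_bezier` and `sample_curve` are assumptions made for this example; the text does not fix the degree of the curves it describes.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1].

    p0 and p3 are the anchor points (nodes); p1 and p2 are the handles,
    which pull the curve towards them without lying on it in general.
    """
    s = 1.0 - t
    # Bernstein weights of a cubic curve; they always sum to 1.
    w0, w1, w2, w3 = s**3, 3 * s**2 * t, 3 * s * t**2, t**3
    x = w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0]
    y = w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1]
    return (x, y)


def sample_curve(p0, p1, p2, p3, n=5):
    """Return n samples along the curve, evenly spaced in t (n >= 2)."""
    return [cubic_bezier(p0, p1, p2, p3, i / (n - 1)) for i in range(n)]
```

The curve starts at the first anchor and ends at the last one, while moving either handle changes the shape in between. Because the curve is computed from coordinates rather than stored as pixels, it can be sampled at any density.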
These Bézier lines and curves are easy to work with and, thanks to their plasticity, offer many possibilities. These characteristics make vector images ideal for graphic design, for example in the creation of drawings or logos. The versatility of the curves also makes them useful for working with text, since letters can be modified and deformed without limit.
Because they use mathematical coordinates to create images, vector formats allow unlimited image resolution. If an image is enlarged or reduced, its appearance does not change, either on screen or in print. The image keeps its shapes and colours. The lack of this property is the main drawback of bitmap images.
Some of the most popular and used vector graphics formats are:
CDR – Corel DRaw: Format generated by the program with the same name.
AI – Adobe Illustrator: Characteristics similar to Corel DRaw.
EPS – Encapsulated PostScript: Very adaptable format. It is one of the best formats for importing into most design software.
WMF – Windows MetaFile: Format developed by Microsoft, and especially suited to work with Microsoft programs.
1.2.2. Moving Image
A moving image is typically a movie (film) or video, including digital video. Specifically, digital video consists of a series of orthogonal bitmap digital images displayed in rapid succession at a constant rate. In the context of video these images are called frames, and the rate at which they are displayed is typically measured in frames per second (FPS).
There are two different ways to capture the images: interlaced and progressive scan. Interlaced scan captures the image in groups of alternate lines, first the odd lines and then the even lines, and repeats this continuously. In contrast, progressive scan captures every image individually, with all scan lines being captured at the same moment in time. Thus, interlaced video samples the scene motion twice as often as progressive video does, for the same number of frames per second.
Digital video can be copied without loss of quality, and many compression and encoding formats are used, such as Windows Media, MPEG2, MPEG4 or AVC. MPEG4 and Windows Media are probably the most widely used on the internet, while MPEG2 is used almost exclusively for DVD, giving good image quality with a small size.
Moreover, there are colour transforms. Adding the same value to each colour channel not only increases the apparent intensity of each pixel, it can also affect the pixel's hue and saturation. Colour balancing can be performed either by multiplying each channel by a different scale factor or by more complex processes.
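The effect of these two operations can be sketched on a single pixel. The example below is only illustrative: it assumes RGB components in [0, 1], measures saturation in the HSV sense using Python's standard `colorsys` module, and uses the hypothetical helper names `add_offset`, `apply_gains` and `saturation`.

```python
import colorsys


def add_offset(rgb, offset):
    """Add the same offset to every channel, clipping to [0, 1]."""
    return tuple(min(1.0, max(0.0, c + offset)) for c in rgb)


def apply_gains(rgb, gains):
    """Colour balance: multiply each channel by its own scale factor."""
    return tuple(min(1.0, max(0.0, c * g)) for c, g in zip(rgb, gains))


def saturation(rgb):
    """HSV saturation of an RGB triple with components in [0, 1]."""
    return colorsys.rgb_to_hsv(*rgb)[1]
```

For a reddish pixel such as `(0.6, 0.3, 0.3)`, `add_offset(pixel, 0.2)` makes the pixel brighter but also less saturated, while `apply_gains(pixel, (1.0, 2.0, 2.0))` scales the green and blue channels until the pixel becomes a neutral grey.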
One way to determine the best values of the brightness and gain controls described earlier automatically is to plot a histogram of the individual colour channels and luminance values. From this distribution, relevant statistics such as the minimum, maximum and average intensity values can be computed. One common solution is to perform histogram equalization, that is, to find an intensity mapping function such that the resulting histogram is flat.
The trick to finding such a mapping is the same one used to generate random samples from a probability density function, which is to first compute the cumulative distribution function. Integrating the distribution $h(I)$ gives the cumulative distribution (or percentile) $c(I)$,
$$c(I) = \frac{1}{N} \sum_{i=0}^{I} h(i) = c(I-1) + \frac{1}{N} h(I), \tag{2.6}$$
from which the final value that each pixel should take can be determined ($N$ is the number of pixels in the image). When working with eight-bit pixel values, the $I$ and $c$ axes are rescaled to the range [0, 255].
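The cumulative distribution of equation (2.6) can be turned into a lookup table in a few lines. The following is a minimal sketch, assuming a single-channel image stored as a flat list of integer intensities; the function name `equalize` and the rounding of the rescaled values are choices made for this example.

```python
def equalize(pixels, levels=256):
    """Histogram-equalize a flat list of integer intensities in [0, levels)."""
    n = len(pixels)

    # h(I): number of pixels with intensity I.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # c(I) = c(I - 1) + h(I) / N, the running form of equation (2.6).
    cdf = []
    running = 0.0
    for count in hist:
        running += count / n
        cdf.append(running)

    # Rescale c from [0, 1] to the output intensity range and use it as a
    # lookup table giving the new value of every pixel.
    lut = [round((levels - 1) * c) for c in cdf]
    return [lut[p] for p in pixels]
```

For example, the low-contrast input `[100, 100, 101, 101, 102, 102, 103, 103]` is spread over the full eight-bit range, while the ordering of the intensities is preserved.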
Figure 2.1. Histogram analysis and equalization: (a) original image; (b) colour channel and intensity histograms; (c) cumulative distribution functions; (d) equalization functions; (e) full histogram equalization.
While global histogram equalization can be useful, for some images it may be preferable to apply different equalizations in different regions. One technique is to compute the histogram for every non-overlapping M×M block and then interpolate the transfer functions when moving between blocks. This method is known as locally adaptive histogram equalization, and it is used in a variety of other applications, including the construction of SIFT (Scale Invariant Feature Transform) feature descriptors.
Locally adaptive histogram equalization is an example of a neighbourhood or local operator, which uses a collection of pixel values in the surrounding area of a given pixel to determine its final output value. In addition, neighbourhood operators can be used to filter images in order to add soft blur, sharpen details, accentuate edges, or remove noise. There are linear filtering operators, which involve weighted combinations of pixels in small neighbourhoods, and non-linear filtering operators such as median or bilateral filters and distance transforms.
2.2.1. Linear Filtering
The most commonly used type of neighbourhood operator is the linear filter, in which an output pixel's value is determined as a weighted sum of input pixel values,
$$g(i, j) = \sum_{k,l} f(i + k, j + l)\, h(k, l). \tag{2.7}$$
The entries in the mask, or kernel, $h(k, l)$ are often called the filter coefficients. A common variant, in which the sign of the offsets in $f$ is reversed, is the convolution operator, written compactly as
$$g = f * h, \tag{2.8}$$
and $h$ is called the impulse response function. Both are linear shift invariant (LSI) operators, which obey both the superposition principle,
$$h \circ (f_0 + f_1) = h \circ f_0 + h \circ f_1 \tag{2.9}$$
and the shift invariance principle,
$$g(i, j) = f(i + k, j + l) \;\Leftrightarrow\; (h \circ g)(i, j) = (h \circ f)(i + k, j + l). \tag{2.10}$$
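The weighted sum of equation (2.7) and the superposition principle can be checked numerically. The sketch below uses plain nested lists; taking the kernel offsets relative to the kernel centre and treating pixels outside the image as zero are assumptions of this example, as are the names `correlate` and `add_images`.

```python
def correlate(f, h):
    """Apply kernel h to image f following equation (2.7).

    f and h are lists of rows, and h must have odd dimensions. Offsets are
    measured from the kernel centre, and pixels outside f count as zero.
    """
    rows, cols = len(f), len(f[0])
    kr, kc = len(h) // 2, len(h[0]) // 2
    g = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for dk in range(-kr, kr + 1):        # offset k in equation (2.7)
                for dl in range(-kc, kc + 1):    # offset l in equation (2.7)
                    if 0 <= i + dk < rows and 0 <= j + dl < cols:
                        total += f[i + dk][j + dl] * h[dk + kr][dl + kc]
            g[i][j] = total
    return g


def add_images(a, b):
    """Pixel-wise sum of two images of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

Filtering the sum of two images with `correlate` gives the same result as summing the two filtered images, which is the superposition property stated above.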
2.3.1. Pyramids
With both techniques mentioned before, a complete image pyramid can be built, which can be used to accelerate coarse-to-fine search algorithms, to look for objects at different scales, and to perform multi-resolution blending operations. The best known and most widely used is the Laplacian pyramid. To construct the pyramid, the original image is first blurred and subsampled by a factor of two and stored in the next level of the pyramid. To compute the Laplacian levels, a lower-resolution image is interpolated to obtain a reconstructed low-pass version of the original, which is then subtracted from the original to yield the band-pass “Laplacian” image, stored away for further processing. The resulting pyramid has perfect reconstruction: the Laplacian images together with the coarsest low-pass image are sufficient to exactly reconstruct the original image.
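The construction and the perfect-reconstruction property can be sketched on a one-dimensional signal, which keeps the example short; images apply the same steps along both axes. The [1, 2, 1]/4 blur kernel, the linear interpolation and all function names are assumptions of this sketch rather than the filters used in the report.

```python
def blur_and_subsample(signal):
    """Blur with a [1, 2, 1] / 4 kernel, then keep every second sample."""
    n = len(signal)
    blurred = []
    for i in range(n):
        left = signal[max(i - 1, 0)]        # replicate the border samples
        right = signal[min(i + 1, n - 1)]
        blurred.append((left + 2 * signal[i] + right) / 4)
    return blurred[::2]


def interpolate(coarse, size):
    """Upsample by two with linear interpolation back to `size` samples."""
    last = len(coarse) - 1
    fine = []
    for i in range(size):
        lo = coarse[min(i // 2, last)]
        hi = coarse[min(i // 2 + i % 2, last)]
        fine.append((lo + hi) / 2)
    return fine


def build_laplacian_pyramid(signal, levels):
    """Return (band_pass_levels, coarsest_low_pass)."""
    bands = []
    current = list(signal)
    for _ in range(levels):
        coarse = blur_and_subsample(current)
        predicted = interpolate(coarse, len(current))
        # Band-pass "Laplacian" level: what the low-pass version misses.
        bands.append([c - p for c, p in zip(current, predicted)])
        current = coarse
    return bands, current


def reconstruct(bands, coarsest):
    """Invert build_laplacian_pyramid by adding the bands back in."""
    current = coarsest
    for band in reversed(bands):
        predicted = interpolate(current, len(band))
        current = [p + b for p, b in zip(predicted, band)]
    return current
```

Because each band stores exactly the difference between a level and its interpolated low-pass prediction, `reconstruct` recovers the input regardless of which blur and interpolation filters are chosen.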
One of the most engaging applications of the Laplacian pyramid is the creation of a blended composite image. The idea is that low-frequency colour variations between the images are smoothly blended, while the higher-frequency textures in each one are blended more quickly to avoid ghosting effects when two textures are overlaid. This is particularly useful in image stitching and compositing applications, where the exposures may vary between different images.
Figure 2.2. Laplacian pyramid in image blending: (a) regular splice of original images; (b) pyramid blend.
2.3.2. Wavelets
An alternative to pyramids is the use of wavelet decompositions. Wavelets are filters that localize a signal in both space and frequency and are defined over a hierarchy of scales. Wavelets provide a smooth way to decompose a signal into frequency components without blocking and are closely related to pyramids.
Figure 2.3. Multiresolution pyramids: (a) pyramid with half-octave sampling; (b) wavelet pyramid, where each wavelet level stores 3/4 of the original pixels, so that the total number of wavelet coefficients and original pixels is the same.
The main difference between pyramids and wavelets is that traditional pyramids are overcomplete, using more pixels than the original image to represent the decomposition, whereas wavelets keep the size of the decomposition the same as the image, providing a tight frame.
Now that changing the resolution of an image has been covered, geometric transformations are introduced as another important class of global operators. These perform more general transformations, such as image rotations or general warps. In contrast to point operators, which transform the range of the image, these functions transform its domain, $g(\mathbf{x}') = f(\mathbf{h}(\mathbf{x}))$. Among the different geometric transformations, the one most relevant to image stitching is the parametric 2D transformation, where the behaviour of the transformation is controlled by a small number of parameters.
Figure 2.4. Basic 2D geometric image transformations.
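As a small example of a parametric 2D transformation, the sketch below maps pixel coordinates with a similarity transform controlled by four parameters (scale, rotation angle and a two-component translation). The choice of a similarity transform and the names `similarity_matrix` and `transform_point` are assumptions made for illustration only.

```python
import math


def similarity_matrix(scale, angle_deg, tx, ty):
    """2x3 matrix for a scaled rotation followed by a translation."""
    a = math.radians(angle_deg)
    c, s = scale * math.cos(a), scale * math.sin(a)
    return [[c, -s, tx],
            [s, c, ty]]


def transform_point(matrix, point):
    """Map (x, y) to (x', y') with a 2x3 parametric transformation."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
```

Setting the scale to 1 and the angle to 0 reduces the transform to a pure translation, while replacing the 2x3 matrix with six independent entries would give a general affine transform.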
3. STITCHING PROCESS
Algorithms for aligning images and stitching them into seamless photo-mosaics are among the oldest and most widely used in computer vision. Image stitching is the process of combining multiple images with overlapping fields of view to produce high-resolution photo-mosaics such as those used for today's digital maps and satellite photos. Image stitching algorithms can create wide-angle panoramas, and they also come bundled with most digital cameras.
Between the moment the pictures are taken and the creation of the stitched image, there are different processes to follow, starting with the detection of points or features in the individual images and ending with image merging. The image stitching processes can be classified into three main modules: registration, optimization and blending.
In this chapter, a short history of image stitching is first given as context, followed by the different stages of the stitching process, each described in detail.
Image stitching originated in the photogrammetry community, where more manually intensive methods based on surveyed ground control points or manually registered tie points have long been used to register aerial photos into large-scale photo-mosaics. One of the key advances in this community was the development of bundle adjustment algorithms, which could simultaneously solve for the locations of all of the camera positions, thus yielding globally consistent solutions. Another recurring problem in creating photo-mosaics is the elimination of visible seams, for which a variety of techniques have been developed over the years.
In film photography, special cameras were developed in the 1990s to take ultra-wide-angle panoramas, often by exposing the film through a vertical slit as the camera rotated on its axis. In the mid-1990s, image alignment techniques started being applied to the construction of wide-angle seamless panoramas from regular hand-held cameras. More recent work in this area has addressed the need to compute globally consistent alignments, to remove ghosting due to parallax error and object movement, and to deal with varying exposures. These techniques have spawned a large number of commercial stitching products.
Image registration involves detecting and matching pixels or features in a set of images. After that, the correct alignments relating various pairs or groups of images are estimated. Before images can be registered and aligned, the relationships between the pixel coordinates of one image and another must be established, which is done with the parametric motion models shown in the previous chapter. Depending on the technique used in registration and alignment, two different methods can be considered: the pixel-based method and the feature-based method. Because the stitching script developed in chapter five relies on it, the feature-based registration method is given more importance and is explained in more detail.
3.2.1. Direct (Pixel-Based) Registration
This approach consists of warping the images relative to each other and looking at how much the pixels agree, using pixel-to-pixel matching. It is often called the direct method. To use this method, a suitable error metric must first be chosen to compare the images. After that, a suitable search technique must be devised, the simplest being a full search. Alternatively, hierarchical techniques and Fourier transforms can be used to accelerate the process.
The simplest way to establish an alignment between two images is to shift one image relative to the other. Given a template image $I_0(\mathbf{x})$ sampled at discrete pixel locations $\mathbf{x}_i = (x_i, y_i)$, the goal is to find where it is located in image $I_1(\mathbf{x})$. A least-squares solution is to find the minimum of the sum of squared differences (SSD) function
$$E_{SSD}(\mathbf{u}) = \sum_i \left[ I_1(\mathbf{x}_i + \mathbf{u}) - I_0(\mathbf{x}_i) \right]^2, \tag{3.1}$$
where $\mathbf{u} = (u, v)$ is the displacement.
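A full search over the SSD error of equation (3.1) can be sketched as follows. The example assumes integer displacements, greyscale images stored as nested lists, and a template that must lie completely inside the second image; the function names `ssd` and `full_search` are hypothetical.

```python
def ssd(template, image, u, v):
    """E_SSD of equation (3.1) for the integer displacement (u, v)."""
    total = 0
    for y, row in enumerate(template):
        for x, value in enumerate(row):
            diff = image[y + v][x + u] - value
            total += diff * diff
    return total


def full_search(template, image):
    """Try every displacement that keeps the template inside the image."""
    max_v = len(image) - len(template)
    max_u = len(image[0]) - len(template[0])
    candidates = [(u, v) for v in range(max_v + 1) for u in range(max_u + 1)]
    # The best alignment is the displacement with the smallest SSD error.
    return min(candidates, key=lambda d: ssd(template, image, d[0], d[1]))
```

The cost of this search grows with the number of candidate displacements times the template size, which is why hierarchical and Fourier-based techniques are used to accelerate it in practice.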
The above error metric can be made more robust to outliers by replacing the squared error terms with a robust function $\rho(\cdot)$,
$$E_{SRD}(\mathbf{u}) = \sum_i \rho\left( I_1(\mathbf{x}_i + \mathbf{u}) - I_0(\mathbf{x}_i) \right), \tag{3.2}$$
that grows less quickly than the quadratic function associated with least squares. One possible choice of $\rho(\cdot)$ is a smoothly varying function that is quadratic for small values but grows more slowly away from the origin. It is called the Geman-McClure function,
$$\rho_{GM}(x) = \frac{x^2}{1 + x^2/a^2}, \tag{3.3}$$
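The behaviour of this robust function can be compared with the quadratic penalty in a few lines. The sketch assumes the usual form of the Geman-McClure function, $x^2 / (1 + x^2/a^2)$, with a constant `a` that sets the scale beyond which residuals are treated as outliers; the names `geman_mcclure` and `robust_error` are hypothetical.

```python
def geman_mcclure(x, a=1.0):
    """Robust penalty x^2 / (1 + x^2 / a^2); `a` sets the outlier scale."""
    return (x * x) / (1.0 + (x * x) / (a * a))


def robust_error(residuals, a=1.0):
    """Sum of robust penalties over the residuals I1(x_i + u) - I0(x_i)."""
    return sum(geman_mcclure(r, a) for r in residuals)
```

For residuals much smaller than `a` the penalty is close to the squared error, while for large residuals it saturates below `a * a`, so a single badly mismatched pixel cannot dominate the total error.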