Image Stitching Using Structure Deformation

Jiaya Jia, Member, IEEE, and Chi-Keung Tang, Senior Member, IEEE Computer Society

Abstract—The aim of this paper is to achieve seamless image stitching without producing visual artifact caused by severe intensity discrepancy and structure misalignment, given that the input images are roughly aligned or globally registered. Our new approach is based on structure deformation and propagation for achieving the overall consistency in image structure and intensity. The new stitching algorithm, which has found applications in image compositing, image blending, and intensity correction, consists of the following main processes. Depending on the compatibility and distinctiveness of the 2D features detected in the image plane, single or double optimal partitions are computed subject to the constraints of intensity coherence and structure continuity. Afterwards, specific 1D features are detected along the computed optimal partitions from which a set of sparse deformation vectors is derived to encode 1D feature matching between the partitions. These sparse deformation cues are robustly propagated into the input images by solving the associated minimization problem in gradient domain, thus providing a uniform framework for the simultaneous alignment of image structure and intensity. We present results in general image compositing and blending in order to show the effectiveness of our method in producing seamless stitching results from complex input images.

Index Terms—Image stitching, structure deformation, image alignment.

1 INTRODUCTION

Techniques in image stitching or blending have been widely applied to generating a natural image composite given a set of globally registered images with limited overlapped region [22], [36]. For image mosaicing applications, global registration is performed based on a variety of predefined camera motion models [33]. For applications aiming to create special effect by engrafting image objects [1], [6], input images are initially registered by manual dragging or assuming static camera configuration. In all these situations, even a small misalignment may cause local intensity or structure inconsistency and produce visual artifacts.

In order to obtain satisfactory results in image stitching, a natural transition from one image to another is required, where both structure and intensity should be aligned or matched within, or possibly beyond, the overlapped area.

In this paper, we address the general problem of image stitching in the presence of severe structure and intensity discrepancy and propose a novel technique to simultaneously and globally eliminate misalignment in structure and intensity between the overlapped images.

Previous techniques in image stitching [6], [22], [36] optimize a blending function that minimizes the intensity difference in the vicinity of the overlapped area. There is, however, no guarantee that, after intensity alignment, image features or structures will also be aligned. Structure misalignment causes image ghosting or blurring artifact, where a salient edge fades out as it enters the overlapped area and fades in just a few pixels away but in a shifted position. To align image features, nonparametric and patch-based techniques have been recently proposed in texture synthesis based on texture deformation [13], [37]. To synthesize a natural texture image, the detected features are matched and deformed inside the overlapped texture samples. Therefore, local structure across patch boundaries can be maintained after synthesis. However, these techniques fail to handle input images with significant color or intensity inconsistency. Moreover, complex global structures and detailed patterns typical of natural images will significantly increase the ambiguity in their 2D patch matching process.

In our experiments, we observe that a successful image-stitching algorithm should not only create a smooth transition within the overlapped region but also preserve the following properties, which are in general agreement with our visual perception:

• Structure preservation. The stitched image should neither break existing salient structures nor create new ones. A counterexample is shown in Fig. 1a, where the edge of the tower is broken in the overlapped region due to structure misalignment, causing an obvious ghosting artifact.

• Intensity alignment. Human eyes are sensitive to large intensity changes. Unbalanced contrast beyond the overlapped area of a stitched image can be perceptually magnified. An example is shown in Fig. 1b. Although the structure is well aligned and the color transition is smooth within the overlapped area, the unnatural color transition from left to right reveals the unmatched intensities inherent in the input images.

• Image context consideration. Last but not least, the context information of objects in the input images should be taken into account during the stitching process. For instance, in Fig. 1c, when the images are stitched in the way shown at the bottom, the horizontal shift of the second input image widens the windows that straddle the overlapped area. Fig. 1d shows the ground truth, where the windows are of uniform size.

To address the above issues, we propose a general approach in image stitching based on structure alignment and deformation propagation in natural images. Unlike previous deformation techniques that aim to align medical images [3] or textures [13], our method does not assume camera motion or deformation models. Colors and structures may vary significantly across the images, a situation that is problematic for many conventional methods.

In our approach, we reduce structure matching from 2D to 1D, where salient feature detection and matching can be performed more robustly. This is achieved by computing one or two optimal partitions between each pair of the overlapped images, along which structure deformation is performed. In order to achieve smooth and natural deformation, we represent the structure alignment by using feature deformation vectors and propagate them from the optimal partitions toward other pixels in the input images. Such structure deformation propagation is performed in the gradient domain, which globally reduces intensity discrepancy among images.

The rest of our paper is organized as follows: Section 2 reviews related work. Then, in Section 3, we present our algorithms for computing one or two optimal partitions, depending on the compatibility of the detected image features. The preliminary version of our algorithm [18] computes one partition that satisfies the smoothness and alignment constraints. This paper generalizes the notion to searching for two matchable partitions so that the deformation produces more reasonable and meaningful results. In Section 4, the main results from different applications using our method are shown, together with comparisons with previous methods. Finally, we discuss our method and conclude the paper in Section 5.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 30, NO. 4, APRIL 2008

J. Jia is with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong. E-mail: [email protected].

C.-K. Tang is with the Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong. E-mail: [email protected].

Manuscript received 18 Oct. 2006; revised 7 June 2007; accepted 11 June 2007; published online 21 June 2007. Recommended for acceptance by C. Taylor. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPAMI-0735-1006. Digital Object Identifier no. 10.1109/TPAMI.2007.70729. 0162-8828/08/$25.00 © 2008 IEEE. Published by the IEEE Computer Society.

2 RELATED WORK

Our image stitching aligns not only image intensity but also image structure while preserving the inherent object context. In this section, we review related work in image stitching and structure deformation.

2.1 Image Stitching

Many image registration methods have been developed in recent years. In the presence of significant intensity difference, color blending with the use of a weighting mask over the overlapped area is commonly adopted for generating a smooth intensity transition. For instance, the video mosaics algorithm proposed in [32] estimates the homography matrix for aligning two overlapped images. To reduce visible artifacts and local misalignment, the overlapped regions are blended using a bilinear weighting function. In [36], a feather-based algorithm is proposed, which uses averaging and interpolation functions to reduce intensity difference. Unnatural transition, however, is still inevitable since only local operations inside the overlapped regions are performed. Burt and Adelson [6] use a multiresolution spline to perform blending. All these methods blend images only locally in the overlapped areas to move from one lighting environment to another. The local alignment method proposed in [34] performs deghosting, which works well in many situations. However, it requires the recovery of the true 3D ray directions, making it difficult to handle occlusions. Color or intensity difference among images may also make the method susceptible to local minima. A general review of image alignment and stitching can be found in [33].

Recently, methods in distinctive feature detection and matching have undergone rapid development. Representative methods include the Harris corner detector [15], the scale-invariant SIFT [23], and affine-invariant feature detectors [25]. By employing robust feature matching, automatic panorama recognition based on RANSAC is proposed in [4], where multiband blending is introduced to reduce the blurring effect by assigning blending weights to different frequency bands. Sand and Teller [30] match video frames using detected features. One of their goals is to find the best matching frames in different videos. The video matching algorithm cannot be directly applied to general image stitching.

Methods using an optimal seam are proposed to composite natural or texture images [1], [11], [20]. These methods first compute the color difference in the overlapped area between the two input images. Then, dynamic programming [11] or Graph Cuts [20] is used to compute an optimal partition that produces the least color difference between the two textures/images. In [9], partitions among different motions are also computed. Optimal seam methods do not explicitly consider image features, so an ideal partition that does not intersect salient structures may not be found. Another problem of the optimal seam is the possible ambiguity associated with the placement of the seam, as shown in Fig. 1c. The stitched


Fig. 1. Typical visual artifact in image stitching. The top row shows the input images to be stitched, whereas the bottom shows the stitching results. (a) One example where the structures of the input images are not correctly aligned. The ghosting artifact is apparent. (b) Color inconsistency in the stitching result. Although the color transition is smooth within the overlapped region, unbalanced intensities are still visible. (c) Object context in image stitching. Given the two input images in the top, the result shown in the bottom is considered visually satisfactory from the point of view of color and structure consistency. However, the windows have the wrong size in the overlapped region. (d) Ground truth, showing that image context, in this case, the uniformity of the window size, should be considered in the stitching process.

minimizing an energy function. We associate each matched 1D feature pair with a corresponding deformation vector and produce a sparse matching set in the image plane.

• Deformation propagation. The computed deformation vectors are propagated from the detected 1D features toward all other pixels in the target image to produce a smooth transition in structure and intensity. A deformed gradient map is computed, followed by the reconstruction of the color image result.

Fig. 3 shows a simple illustration where two objects, with color and structure discrepancy, as shown in Figs. 3a and 3b, are to be stitched. Fig. 3c shows the result using our approach in the color image space. Even without considering color blending, our structure deformation can already align the overlapped elliptical structures. Fig. 3d shows that when our method is applied in the gradient domain, after image reconstruction, structures are smoothly aligned while color difference between the input images is reduced.

Here, we also show the results obtained using other methods. More comparisons on natural images are given in Section 4. Fig. 3e shows the result using the optimal seam method [11]. Because of the salient structure misalignment, no matter how the seam is constructed, the edge break-up cannot be avoided. Fig. 3f shows the feathering result, which still contains evident structure and color misalignment. Fig. 3g demonstrates the stitching result generated from the optimal seam in the gradient domain. Without the explicit feature alignment, visual artifacts due to edge discontinuity and color inconsistency are apparent. Fig. 3h is the result produced by our implementation of the texture deformation method [37], which is designed for synthesizing texture images. Without the consideration of intensity dissimilarity, the result is unsatisfactory.

In the following, we describe in detail each computation step overviewed in Fig. 2.
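The effect of reconstructing in the gradient domain can be illustrated with a 1D toy. The sketch below is not the propagation scheme of this paper: it performs no deformation, and a plain cumulative sum stands in for the 2D minimization. The function names and the sample signals are assumptions chosen for illustration. It only shows why integrating stitched gradients removes a constant intensity offset that direct copying leaves as a visible jump.

```python
def stitch_naive(left, right, seam):
    """Copy samples directly: left of the seam from `left`, the rest from `right`."""
    return left[:seam] + right[seam:]


def stitch_gradient_domain(left, right, seam):
    """Stitch finite differences instead of samples, then integrate.

    Gradient i (between samples i and i + 1) comes from `left` when
    i < seam and from `right` otherwise. Integration starts at left[0],
    so a constant offset in `right` never enters the result.
    """
    grads = []
    for i in range(len(left) - 1):
        src = left if i < seam else right
        grads.append(src[i + 1] - src[i])
    out = [left[0]]
    for g in grads:
        out.append(out[-1] + g)
    return out


# Two registered 1D "images" of the same edge; `right` is 20 levels brighter.
left = [10, 10, 10, 20, 20, 20]
right = [30, 30, 30, 40, 40, 40]
naive = stitch_naive(left, right, 2)
smooth = stitch_gradient_domain(left, right, 2)
```

In this toy the naive result jumps by 20 levels at the seam, while the gradient-domain result keeps only the true edge.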

3.1 Optimal Partition Computation

In image stitching, directly matching and aligning all pixels within the overlapped region implies a quadratic search space in the total number of pixels. The large search space causes ambiguities in the matching process and makes the general alignment process difficult.

In image registration or morphing [14], [29], a sparse set of matched points or lines is either manually drawn by the user or computed from the images. Afterward, interpolation or approximation using, for example, thin-plate splines or radial basis functions, is applied to all other pixels to smoothly propagate the matching. This approach is limited in deforming general images, because the underlying transformation may not coincide with the adopted model or function. For instance, if the unknown distortion within the overlapped regions is nonlinear, there may not exist any registration model that adequately describes the transformation. As a result, even if a large set of points is correctly matched, it is still difficult to appropriately propagate the matching information to all other nondistinctive features such as edges and uniform textures. Fig. 4 shows one simple example in which, although the feature (corner) points are correctly registered (highlighted using the black crosses in the two input images on the left), there still exists an apparent structure misalignment in the stitching result (shown in the rightmost image), simply because the transformation model does not match the unknown local geometry between the edges of the two input images.

In this section, we shall describe our approach in computing an optimal partition, or two matchable partitions, to robustly align selected 1D features along them. In our method, even if the transformation model between the images is unknown, we are still able to compute robust matching for all the necessary pixels in order to produce seamless image stitching.

Fig. 3. Toy example. (a) and (b) are two input images. The overlapped regions are indicated by the dashed box. Color and structure misalignment between the images are present. (c) Our result from image space stitching. The edges are aligned. (d) Our result from gradient space stitching, where the structures are properly connected and the color inconsistency is globally corrected. (e) Result using the optimal seam method in [11]. The structure misalignment is inevitable. (f) Result obtained using feathering. (g) Result obtained using the optimal seam operated in the gradient domain. (h) Result from the method in [37]. The warping method cannot produce smooth transition in the presence of significant intensity differences.

Fig. 4. Image matching using distinctive features. The two images on the left are to be stitched. The correctly matched feature (corner) point pairs are marked in the two images using "+" within the overlapped area. The rightmost image shows the registration result. Although the feature points are aligned well, the edges delimited by the matched feature points are misaligned because of the noticeable difference in local geometry inadequately handled by, for instance, thin-plate spline registration. Without the exact knowledge of the image transformation, 2D deformation is problematic even in this simple example.

One of two techniques, referred to as a single optimal partition (SOP) or a double optimal partition (DOP), is employed, depending on the compatibility and distinctiveness of the feature points detected in the overlapped regions of the two images.

Single Optimal Partition (SOP). There are two situations where an SOP will be applied to minimize structure misalignment. One is that the overlapped area between the input images contains only textureless regions, where not enough distinctive 2D features can be reliably matched. The other is that the input images capture different scenes and inherently contain no matchable features. The latter situation happens in unconventional image compositing, for which we shall show examples in Section 4. Here, we formulate the partitioning problem as a labeling problem and adopt the Graph Cuts method to find an optimal solution. We define the gradient alignment cost S(p, q) between any adjacent pixels p and q as the sum of the computed values in the red, green, blue (rgb) color channels:

$$S(p, q) = \sum_{r,g,b} \big( (1 - \beta) S_m + \beta S_d \big), \tag{1}$$

where $S_m$ and $S_d$ are two costs measuring the gradient smoothness and similarity between the neighboring pixels, which will be defined shortly. $\beta$ is a weight used to balance the relative influence of the two costs, which is set to 0.3 in our experiments. Before the above computation, we assume that both images have already been smoothed by Gaussian filtering. $S_m$ is defined as

$$S_m(p, q) = \|\nabla I_S(p)\| + \|\nabla I_S(q)\| + \|\nabla I_T(p)\| + \|\nabla I_T(q)\|,$$

where $\|\nabla \cdot\|$ denotes the norm of the gradient for each pixel. $I_S$ and $I_T$ are the input images with the overlapped area. Thus, $S_m$ takes gradient smoothness into account, which effectively prevents the partition from breaking object edges in both input images. $S_d$ is defined as

$$S_d(p, q) = \|\nabla_x I_S(p) - \nabla_x I_T(p)\| + \|\nabla_x I_S(q) - \nabla_x I_T(q)\| + \|\nabla_y I_S(p) - \nabla_y I_T(p)\| + \|\nabla_y I_S(q) - \nabla_y I_T(q)\|, \tag{2}$$

where each term above represents the gradient-level similarity at the same pixel location in the overlapped area. $S_d$ penalizes pixel dissimilarity in the gradient domain. $S(p, q)$, combining $S_m$ and $S_d$, enables the Graph Cuts method to produce a good initial partition with maximum continuity in the gradient domain.

Fig. 5 illustrates that the cut divides $\Omega$ into $\Omega_s$ and $\Omega_t$, which generates two new regions $I_s$ and $I_t$ (colored in gray and dark green, respectively) to be stitched. The set of pixels along the optimal cut in images $I_S$ and $I_T$ define the optimal partitions, denoted by $\partial I'_s$ and $\partial I'_t$, respectively. $\partial I'_s$ and $\partial I'_t$ exactly overlap. The set of boundary pixels of $I_t$ excluding $\partial I'_t$ is denoted by $\partial I_t - \partial I'_t$. $\partial I'_t$ and $\partial I_t - \partial I'_t$ are, respectively, indicated using orange and blue curves, shown in Fig. 5b.
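The cost in (1) and (2) can be sketched for a single pair of adjacent pixels as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: gradients are approximated by forward differences, the Gaussian pre-smoothing is omitted, images are plain nested lists with one 2D list per color channel, and the names `gradients`, `alignment_cost`, and `beta` are hypothetical. A practical version would compute the gradient maps once and reuse them for every edge of the graph.

```python
import math


def gradients(channel):
    """Forward-difference gradients (gx, gy) of a 2D list; zero on the far border."""
    h, w = len(channel), len(channel[0])
    gx = [[channel[y][x + 1] - channel[y][x] if x + 1 < w else 0.0
           for x in range(w)] for y in range(h)]
    gy = [[channel[y + 1][x] - channel[y][x] if y + 1 < h else 0.0
           for x in range(w)] for y in range(h)]
    return gx, gy


def alignment_cost(src, tgt, p, q, beta=0.3):
    """S(p, q) summed over channels.

    src, tgt: lists of 2D channels covering the same overlapped area.
    p, q: adjacent pixels given as (row, column).
    """
    total = 0.0
    for chan_s, chan_t in zip(src, tgt):
        sgx, sgy = gradients(chan_s)
        tgx, tgy = gradients(chan_t)
        s_m = 0.0  # gradient smoothness: gradient magnitudes in both images
        s_d = 0.0  # gradient similarity: per-axis gradient differences
        for (y, x) in (p, q):
            s_m += math.hypot(sgx[y][x], sgy[y][x])
            s_m += math.hypot(tgx[y][x], tgy[y][x])
            s_d += abs(sgx[y][x] - tgx[y][x]) + abs(sgy[y][x] - tgy[y][x])
        total += (1.0 - beta) * s_m + beta * s_d
    return total


# One-channel example: a horizontal ramp against a flat image.
ramp = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
flat = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
cost_on_edge = alignment_cost([ramp], [flat], (0, 0), (0, 1))
cost_on_flat = alignment_cost([flat], [flat], (0, 0), (0, 1))
```

The cost is zero where both images are flat and grows where either image has strong gradients or the two gradient fields disagree, which is what steers the cut toward smooth, similar regions.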

Compared with the optimal seam methods proposed in [11], [20], our new cost function takes into account both gradient smoothness and gradient similarity. Our partition favors smooth areas in both images, which effectively reduces structure complexity along the partition and in turn reduces the matching ambiguity in the subsequent steps. Comparison of our stitching results with those produced by other optimal seam methods is presented in Section 4.

Double optimal partitions (DOP) with feature analysis. If there are sufficient distinctive 2D features in the overlapped area and the two images are matchable, we employ the features as a context constraint in computing optimal partitions. The corresponding partitions in the two images may not exactly overlap due to geometric distortion or transformation. We therefore propose to compute two distinct, matchable partitions, one in each input image. Without assuming known distortion among the images, we first compute the optimal partition $\partial I'_s$ in image $I_S$. Then, a set of constraints with respect to image color, feature, and structure is employed in generating a matchable $\partial I'_t$ in image $I_T$.

Our unified framework consists of several steps. They are listed in Table 1 and will be further motivated and described in the following sections. Later, we will show in Fig. 9 a running example with the intermediate results produced in different steps. As in the SOP computation, the input images $I_S$ and $I_T$ are assumed to have been smoothed by Gaussian filtering.

3.1.1 Computing Partition $\partial I'_s$

The color deviation along the optimal partition $\partial I'_s$ should be small in order to make the following matching and deformation between the two partitions simple and robust. Accordingly, in the computation, we minimize the color differences of pixels along the partition. Denoting these pixels $\{I_s(0), I_s(1), \ldots, I_s(n-1)\}$ in sequence order, where $I_s(i) \in N(I_s(i+1))$, $n$ is the total number of the pixels, and $N(\cdot)$ is the set of the four nearest neighbors, we propose to minimize the following objective function:

$$f_s = \sum_{0 \le i < n-1} \|\delta(i, s)\|^2, \tag{3}$$

where

$$\delta(i, s) = \begin{cases} 0 & \|I_s(i) - I_s(i+1)\| < \epsilon \\ I_s(i) - I_s(i+1) & \|I_s(i) - I_s(i+1)\| \ge \epsilon. \end{cases} \tag{4}$$

$\epsilon$ is a predefined threshold to suppress the small intensity change along the partition and to encourage the partition to pass through smooth regions. This minimization problem can be solved using dynamic programming by traversing all


Fig. 5. Single optimal partition (SOP). (a) Images $I_S$ and $I_T$ are overlapped. The SOP divides the overlapped region $\Omega$ into $\Omega_s$ and $\Omega_t$. (b) The partitioned images are $I_s$ and $I_t$. The partition boundaries $\partial I'_s$ and $\partial I'_t$ (shown in orange) exactly overlap. They are respectively formed by the pixels in $I_S$ and $I_T$ along the cut. The blue boundary of $I_t$ is $\partial I_t - \partial I'_t$.
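To make the objective in (3) and (4) concrete, the sketch below evaluates it for one candidate partition given as an ordered list of RGB colors. It only scores a fixed path and does not perform the dynamic programming search. The names `partition_cost` and `eps` are assumptions introduced here, with `eps` standing for the predefined threshold.

```python
import math


def partition_cost(colors, eps):
    """Sum of squared color jumps along a path, ignoring jumps below eps.

    colors: sequence of (r, g, b) tuples sampled in order along the partition.
    eps: threshold below which a color change is treated as zero.
    """
    total = 0.0
    for a, b in zip(colors, colors[1:]):
        jump = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        if jump >= eps:
            total += jump * jump
    return total


# A path with one small change (suppressed) and one strong change (counted).
path = [(10, 10, 10), (11, 10, 10), (11, 10, 10), (14, 14, 10)]
cost = partition_cost(path, eps=2.0)
```

In this example only the last step, with a color jump of norm 5, contributes to the cost, so a path through smooth regions scores lower than one that crosses an edge.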

TABLE 1. Double Optimal Partitioning: Steps

fold over. One example of triangulation (in)compatibility is shown in Fig. 8. Unfortunately, the problem of determining whether two point sets are compatible is believed to be NP-hard [2]. It is shown in [31] that if Steiner points (extra points) are allowed, any two sets of $N$ points may be made compatible by adding $O(N^2)$ points.

In our method, it is not possible to add more points since our working set consists of the most similar features, which are already computed and matched. Fortunately, it is possible to remove some of them to achieve compatibility. In the following, we define the triangulation in $I_S$ as $T_S = T(V_S, E_S)$, where $V_S = \{F_S^{m_1}, \ldots, F_S^{m_k}\}$, and $E_S$ is an edge subset of connected vertex pairs $\{F_S^i, F_S^j\}$, and propose an algorithm shown in Table 2 to compute compatible triangulations.

In essence, we compute two compatible triangulations by incrementally removing problematic points. We use Delaunay triangulation [12] because it maximizes the minimum angle and makes triangle shapes more uniform. This is important in partition computation and distance ratio measurement. In practice, our triangulation algorithm converges rapidly, thanks to the initial rough alignment of the images: the matched features are not far apart in the overlapped regions. In all our experiments, the number of iterations is always less than 8. We show in Figs. 9d and 9e the computed compatible triangulations given the input images shown in Figs. 9a and 9b.
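One way to picture the removal of problematic points is an orientation test on corresponding triangles. The sketch below is an assumption-laden illustration and not the algorithm of Table 2, which is not reproduced in this text: it takes a triangulation of the source points as given, transfers the same connectivity to the matched target points, and reports triangles whose signed area changes sign, which is a necessary sign of fold-over rather than a full compatibility proof. The helper names are hypothetical, and re-triangulating after a removal (for example with a Delaunay routine) is left out.

```python
from collections import Counter


def signed_area(a, b, c):
    """Signed area of triangle abc; positive when counter-clockwise."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def flipped_triangles(pts_s, pts_t, triangles):
    """Triangles whose orientation differs (or degenerates) between the point sets.

    pts_s, pts_t: matched feature positions, pts_s[i] corresponds to pts_t[i].
    triangles: index triples describing the triangulation of pts_s.
    """
    flipped = []
    for tri in triangles:
        i, j, k = tri
        area_s = signed_area(pts_s[i], pts_s[j], pts_s[k])
        area_t = signed_area(pts_t[i], pts_t[j], pts_t[k])
        if area_s * area_t <= 0.0:
            flipped.append(tri)
    return flipped


def worst_vertex(flipped):
    """Vertex index shared by the most flipped triangles, or None if none flipped."""
    if not flipped:
        return None
    counts = Counter(v for tri in flipped for v in tri)
    return counts.most_common(1)[0][0]


# A unit square split into two triangles; in the target, vertex 1 crosses the diagonal.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
moved = [(0.0, 0.0), (0.5, 2.0), (1.0, 1.0), (0.0, 1.0)]
tris = [(0, 1, 2), (0, 2, 3)]
bad = flipped_triangles(square, moved, tris)
```

A loop built on these helpers would drop the vertex returned by `worst_vertex`, triangulate the remaining source points again, and repeat until no triangle flips.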


Fig. 8. Triangulation compatibility. (a) Input triangulation. (b) Incompatible triangulation with (a) since some edges cross each other. (c) A compatible triangulation with (a).

Fig. 9. A running example for demonstrating the computation of DOPs. (a) and (b) are $I_S$ and $I_T$, respectively. (c) is the initial blending showing structure misalignment. (d) and (e) are the compatible triangulations, respectively, computed in the overlapped area of $I_S$ and $I_T$. (f) and (g) show the optimal partitions, respectively, computed in the two input images, which are highlighted in green. They pass between the detected 2D features. (h) and (i) show the detected 1D features along the partitions using red crosses. By feature matching, deformation propagation, and image reconstruction, we produce the seamless stitching result shown in (j).

3.1.3 One-Dimensional Feature Detection along the Partitions

In the image plane, distinctive 2D feature points represent corners or edge joints. Similarly, along 1D partitions, there also exist features indicating an abrupt change in intensity, as shown in Fig. 10 using red crosses. These 1D features are the most noticeable ones when they are not well aligned during the partition matching process. In this section, we propose an algorithm that detects 1D features along $\partial I'_s$ by robustly estimating the strongest gradients along the partition. A similar algorithm will be used to compute matchable 1D features along $\partial I'_t$ in later sections. The detailed steps are described in Table 3. In the last step, we assign each detected 1D feature point a direction (polarity) to represent the gradient projected onto the partition, which takes either a negative or a positive value, as illustrated in Fig. 11. The 1D features $f_s^k$ detected along $\partial I'_s$ are ordered. For the running example, Fig. 9h shows the 1D features detected along the partition $\partial I'_s$ using red crosses, given the input image $I_S$ in Fig. 9a.
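The following sketch illustrates the idea of detecting signed 1D features as local maxima of gradient strength along a partition. It is not the procedure of Table 3, whose steps are not reproduced here: it assumes the intensities have already been sampled at ordered positions along the partition, it uses plain finite differences, and the names `detect_1d_features` and `min_strength` are hypothetical.

```python
def detect_1d_features(intensity, min_strength=10.0):
    """Return (index, polarity, strength) for each local maximum of |gradient|.

    intensity: pixel intensities sampled in order along the partition.
    polarity:  +1 or -1, the sign of the gradient projected onto the
               direction in which the partition is traversed.
    """
    n = len(intensity)
    if n < 3:
        return []
    # Central differences in the interior, one-sided differences at the ends.
    grad = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        grad.append((intensity[hi] - intensity[lo]) / (hi - lo))
    features = []
    for i in range(1, n - 1):
        strength = abs(grad[i])
        # Strict test on the left so that a plateau yields a single feature.
        if (strength >= min_strength
                and strength > abs(grad[i - 1])
                and strength >= abs(grad[i + 1])):
            features.append((i, 1 if grad[i] > 0 else -1, strength))
    return features


# Hypothetical profile: one rising edge followed by one falling edge.
profile = [10, 10, 10, 50, 90, 90, 90, 40, 40, 40]
features = detect_1d_features(profile)
```

The returned list keeps the features in traversal order, which is the ordering that the later monotonic matching step relies on.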

3.1.4 Computing Partition $\partial I'_t$

Taking the output from all the previous steps, we have obtained a set of constraints that characterize the partition $\partial I'_t$. We summarize these constraints as follows:

• Smoothness constraint. We have computed the partition in $I_S$ by minimizing $f_s = \sum_{0 \le i < n} \lVert [\ldots](i, s) \rVert$. Likewise, a matchable $\partial I'_t$ should also satisfy the smoothness constraint.

• Triangulation constraint. We have triangulated the overlapped regions in the two images into $T_s$ and $T_t$, which satisfy the compatibility requirement, and obtained the subset of triangles in $T_s$ intersected by $\partial I'_s$. The triangulation constraint favors a matchable $\partial I'_t$ that intersects the corresponding triangles in $T_t$. One example is shown in Fig. 12, where $\partial I'_s$ and $\partial I'_t$ each intersect three corresponding triangles.

• Structure constraint. In our 1D feature detection along $\partial I'_s$, a set of feature points with signed feature strength is detected. These features characterize the inherent structure of $\partial I'_s$, which $\partial I'_t$ should match exactly. Specifically, the 1D features detected along $\partial I'_t$ should be distributed in a similar way to those along $\partial I'_s$.

624 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 30, NO. 4, APRIL 2008

Fig. 10. One-dimensional features along the partition. (a) The input image shown in Fig. 7. The blue line is the partition. The red crosses indicate the 1D features detected by using gradient strength along the partition. (b) Plot of pixel intensity and gradient strength along the partition. The local maxima of the gradient strength are highlighted using red crosses. They map to the 1D features along the partition in (a).

TABLE 3

1D Feature Detection Along a Partition

Fig. 11. Direction of 1D features along the optimal partition. The light green curve is $\partial I'_s$. $p$ and $q$ are the curve features since they are salient structure points, shown in the magnified view on the right. We assign the direction of $\partial I'_s$, shown as the black dashed arrow on the right. The gradient directions of $p$ and $q$ are illustrated by the purple arrows. The corresponding gradient direction projected onto the curve is negative for $p$ and positive for $q$.

Fig. 12. Triangulation constraint. The two input images are triangulated with compatible triangulations using five points. Since the optimal partition in $I_S$ intersects three triangles, as shown on the left, $\partial I'_t$ should also intersect the corresponding three triangles. The corresponding ratios (for instance, $s(0)$ and $t(0)$ shown in the figure) by which the partitions divide the edges should also be similar.

TABLE 2

Computing Compatible Triangulations

order. One example is shown in Fig. 13. This problem can be formulated as

$$E' = \min \sum_{0 \le i < m} \bigl(g_t(i) - g_s(k_i)\bigr)^2 \quad \text{s.t.} \quad 0 \le k_0 < k_1 < \cdots < k_{m-1} < n. \tag{10}$$

We propose a dynamic programming algorithm to solve (10). Before the description, we denote

$$E'_{a,b} = \min \sum_{0 \le i < a} \bigl(g_t(i) - g_s(k_i)\bigr)^2 \quad \text{s.t.} \quad 0 \le k_0 < \cdots < k_{a-1} < b,$$

which is the minimum energy in matching the first $a$ features in $\partial I'_t$ to the first $b$ features in $\partial I'_s$ in the respective queues, where $b \ge a$. Therefore, $E'_{m,n} = E'$. In the top-down approach, we have

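$$
\begin{aligned}
E'_{m,n} &= \min_{m-1 \le i < n} \Bigl\{ E'_{m-1,i} + \min_{i \le j < n} \bigl(g_t(m-1) - g_s(j)\bigr)^2 \Bigr\}, \\
E'_{2,3} &= \min_{1 \le i < 3} \Bigl\{ E'_{1,i} + \min_{i \le j < 3} \bigl(g_t(1) - g_s(j)\bigr)^2 \Bigr\}, \\
E'_{2,2} &= \bigl(g_t(0) - g_s(0)\bigr)^2 + \bigl(g_t(1) - g_s(1)\bigr)^2, \\
E'_{1,1} &= \bigl(g_t(0) - g_s(0)\bigr)^2.
\end{aligned}
$$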

This implies that dynamic programming with array storage can efficiently solve the matching problem. We have now constructed a feature mapping that is injective but not surjective, since unmatched features may remain along one partition. There are two possible ways to handle them: leaving them unmatched, or matching them with the nearest similar features on the other partition without letting feature mappings cross each other. The first solution may still cause structure misalignment in the final stitched image, whereas the second produces a surjective mapping and makes 1D features merge during the transformation. We employ the second solution in our approach.
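The monotonic matching problem (10) can be sketched with a bottom-up table. The code below is a minimal illustration, not the top-down formulation given above: it uses the equivalent recurrence $D[a][b] = \min(D[a][b-1],\; D[a-1][b-1] + (g_t(a-1) - g_s(b-1))^2)$, and it assumes that $g_t$ and $g_s$ are lists of signed strengths of the ordered 1D features. The function name `match_features` is hypothetical.

```python
def match_features(g_t, g_s):
    """Match each feature in g_t to a distinct feature in g_s in monotonic order.

    Returns (energy, ks), where ks[i] is the index in g_s matched to g_t[i]
    and ks is strictly increasing, so no two mappings cross.
    """
    m, n = len(g_t), len(g_s)
    if m > n:
        raise ValueError("g_t must not be longer than g_s")
    inf = float("inf")
    # cost[a][b]: minimum energy of matching the first a features of g_t
    # to a subset of the first b features of g_s.
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    for b in range(n + 1):
        cost[0][b] = 0.0
    for a in range(1, m + 1):
        for b in range(a, n + 1):
            skip = cost[a][b - 1]  # leave g_s[b-1] unmatched
            take = cost[a - 1][b - 1] + (g_t[a - 1] - g_s[b - 1]) ** 2
            cost[a][b] = min(skip, take)
    # Backtrack to recover the matched indices k_0 < k_1 < ... < k_{m-1}.
    ks = []
    a, b = m, n
    while a > 0:
        if cost[a][b] == cost[a][b - 1]:
            b -= 1
        else:
            ks.append(b - 1)
            a -= 1
            b -= 1
    ks.reverse()
    return cost[m][n], ks


# Hypothetical signed strengths: the middle feature of g_s has no counterpart.
energy, ks = match_features([5.0, -3.0], [5.0, 1.0, -3.0])
```

The table has $(m+1)(n+1)$ entries and each is filled in constant time. Features of `g_s` that do not appear in `ks` are the unmatched ones discussed above.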

3.3 Deformation Propagation

For each matched 1D feature pair $(f_t^{k_1}, f_s^{k_2})$ along the two partitions, we construct a deformation vector

$$V(f_t^{k_1}) = \bigl\{ V_x(f_t^{k_1}),\; V_y(f_t^{k_1}),\; V_{\|\nabla\|}(f_t^{k_1}) \bigr\},$$

where $V_x$ and $V_y$ are the $x$ and $y$ components of the vector pointing from $f_t^{k_1}$ to $f_s^{k_2}$ (the matched feature pixels) in the image plane, as shown on the right in Fig. 14. $V_{\|\nabla\|}$ measures the strength difference in the gradient map and is computed by

$$V_{\|\nabla\|}(f_t^{k_1}) = \|\nabla I_t(f_t^{k_1})\| - \|\nabla I_s(f_s^{k_2})\|. \tag{11}$$

Therefore, the deformation vector consists of both the geometric difference in the image plane and the strength difference in the gradient map between the matched 1D features. For clarity of depiction, in the following, we describe our structure deformation method by matching features from $I_t$ to $I_s$. The analogous cases, in which $I_s$ is deformed to match $I_t$ or both $I_s$ and $I_t$ are deformed toward the mean of their respective feature points, are handled similarly. To smoothly propagate deformation vectors from the sparse 1D features along $\partial I'_t$ to some or all of the other pixels in image $I_t$, we define the deformation area $S_t$ to represent the region affected by the deformation propagation. We list two configurations for producing seamless results as follows:

1. Full propagation, where $S_t = I_t$, to globally diffuse the deformation.
2. Partial propagation, where $S_t = [\ldots]_t$, to locally deform the image.

They are used alternatively in our method in different situations. Taking Fig. 15 as an example, the yellow regions in Figs. 15a and 15b are the deformation areas, where $S_t = [\ldots]_t$ and $S_t = I_t$, respectively. The sparse 1D feature points are illustrated using small red squares with associated deformation vectors $V$. In Fig. 15a, $\partial S'_t$ is the boundary separating $S_t$ and $I_t - S_t$. Thus, in order not to propagate the deformation outside of $S_t$, we set the deformation vectors to $\mathbf{0} = \{0, 0, 0\}$ for all pixels along $\partial S'_t$. For Fig. 15b, $S_t = I_t$, so $\partial S'_t$ is the outermost boundary, colored in dark purple. In both cases, we have


Fig. 14. After 1D feature matching, similar 1D features (pairs of black and blue dots) along the two partitions in the images should be transformed to align the two partitions. One example is shown in the zoom-in view on the right, where features in $I_t$ and $I_s$ are aligned according to the direction $(V_x, V_y)$, as indicated by the dashed arrow in the image plane.

Fig. 15. Boundary condition for deformation propagation. (a) and (b) show the cases $S_t = [\ldots]_t$ and $S_t = I_t$, respectively. The small red squares are the 1D features whose deformation vectors are marked by arrows. Along $\partial S'_t$, we set the deformation vector to $\mathbf{0}$ to avoid unnecessary deformation in $I_t - S_t$.

Fig. 13. Feature matching in queues of different lengths. In this example, $\partial I'_t$ has $m$ detected features in the queue, whereas $\partial I'_s$ has $n$ features, with $n > m$ as shown. Therefore, there is no one-to-one mapping between the two queues. We propose to match each feature in $\partial I'_t$ to a distinct feature in $\partial I'_s$ in monotonic order. A crossed feature mapping, such as the one highlighted in red, is not allowed.

$$V(p) = \mathbf{0} \quad \forall p \in \partial S'_t.$$

Given the sparsely assigned deformation vectors for the features and for the pixels along $\partial S'_t$, we smoothly propagate the deformation inside image $I_t$ by solving the minimization problem

$$V^{*} = \arg\min_{V} \int_{p \in S_t} \|\nabla V\|^2 \, dp \tag{12}$$

by using conjugate gradients. After the optimization, each pixel in $S_t$ is associated with a deformation vector. Finally, using the propagated deformation vectors in $S_t$, we perform an inverse mapping with bilinear interpolation in the gradient domain in $S_t$ to construct the deformed gradient map. The final image is obtained by solving the Poisson equations on the deformed gradient map.

To summarize this section, our double-optimal-partition method incorporates the three stitching properties described in Section 1. Specifically, structure connectivity is preserved in our stitching process by introducing the structure constraint in Section 3.1.4 and the structure deformation in Section 3.3. Intensity alignment is achieved by the operations in the gradient domain: after solving the Poisson equations, global color consistency is obtained. The image context is considered by incorporating the feature points in computing $\partial I'_t$. The sparsely matched features offer the necessary image topology information, and the triangulation constraint in Section 3.1.4 requires that the partitions be in relatively similar positions in the aligned images.
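To make the propagation step of Section 3.3 concrete, the sketch below diffuses one scalar channel of the deformation (for example, $V_x$) over a small grid. It rests on several assumptions that are not taken from the paper: $S_t$ is a rectangle whose border plays the role of $\partial S'_t$, the minimizer of (12) is computed from its discrete Euler-Lagrange condition (each unknown equals the mean of its four neighbors), and the solver is a plain matrix-free conjugate gradient loop. The function name `propagate` and its parameters are hypothetical.

```python
def propagate(height, width, fixed, max_iters=1000, tol=1e-20):
    """Diffuse one scalar channel of the deformation over a height x width grid.

    fixed maps (row, col) -> known value at the sparse feature pixels.
    Border pixels stand in for the boundary of S_t and are clamped to 0.
    """
    known = {(r, c): 0.0
             for r in range(height) for c in range(width)
             if r in (0, height - 1) or c in (0, width - 1)}
    known.update(fixed)
    free = [(r, c) for r in range(height) for c in range(width)
            if (r, c) not in known]
    index = {p: i for i, p in enumerate(free)}

    def neighbors(p):
        r, c = p
        return ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))

    def apply_laplacian(x):
        # (A x)_p = 4 x_p - sum over free neighbors; A is symmetric positive definite.
        return [4.0 * x[i] - sum(x[index[q]] for q in neighbors(p) if q in index)
                for i, p in enumerate(free)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Right-hand side: contributions of the known (boundary and feature) neighbors.
    rhs = [sum(known.get(q, 0.0) for q in neighbors(p)) for p in free]
    x = [0.0] * len(free)
    residual = rhs[:]
    direction = residual[:]
    rs_old = dot(residual, residual)
    for _ in range(max_iters):
        if rs_old <= tol:
            break
        a_dir = apply_laplacian(direction)
        alpha = rs_old / dot(direction, a_dir)
        x = [xi + alpha * di for xi, di in zip(x, direction)]
        residual = [ri - alpha * ai for ri, ai in zip(residual, a_dir)]
        rs_new = dot(residual, residual)
        direction = [ri + (rs_new / rs_old) * di
                     for ri, di in zip(residual, direction)]
        rs_old = rs_new

    field = [[known.get((r, c), 0.0) for c in range(width)] for r in range(height)]
    for (r, c), value in zip(free, x):
        field[r][c] = value
    return field


# Hypothetical example: a single feature with unit displacement at the grid center.
field = propagate(5, 5, {(2, 2): 1.0})
```

Running the same routine once per component ($V_x$, $V_y$, and $V_{\|\nabla\|}$) would give a full deformation vector at every pixel. A non-rectangular $S_t$ would only change which pixels are placed in `known`.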

4 RESULTS

In this section, we show that our method is capable of generating natural image stitching results for a variety of scenes. A comparison with other methods, using our implementation of them, is also given.

4.1 Image Stitching Using Single Optimal Partition (SOP)

We first show some image stitching examples using SOP. In Figs. 16a and 16b, we show two overlapped images of a sunset scene. Precise alignment is difficult because of the local displacement and the small overlapped region shown inside the green boxes in the figure. Fig. 16c is the feathering result obtained from initial alignment, where the ghosting artifact is significant because of intensity discrepancy between the images. In Figs. 16d, 16e, 16f, 16g, 16h, and 16i, we compare our result with those generated using previous methods. Fig. 16d is the result obtained by the optimal seam method operated in gradient domain. Although the color discrepancy is alleviated, it does not help in solving the problem of


Fig. 16. Sunset. (a) and (b) are the two registered images. The green boxes indicate the overlapped area. (c) Feathering result from the initial alignment. The ghosting artifact is evident because of local structure and intensity misalignment. (d), (e), (f), (g), (h), and (i) are zoom-in views of the results generated by the following methods. (d) The optimal seam method operated in the gradient domain, with the seam highlighted in red. (e) GIST1 [22]. (f) Structure deformation in [37]. (g) Feature matching in [13]. (h) Our method using SOP. (i) Magnified view of our result, where we set $S_t = I_t$. (j) The result from the optimal seam method. (k) Our image stitching result, where intensity and structures are globally aligned.

placed to achieve maximum alignment. Using our automatic 1D feature detection, matching, and deformation, structure misalignment is eliminated in our result, as shown in Fig. 20f. Magnified views are shown in Fig. 20e.

5 CONCLUSION AND DISCUSSION

In this paper, we have proposed a novel image stitching approach based on image deformation, where the overlapped regions may contain significant intensity inconsistency and structure misalignment. Instead of generating a precise alignment, considering all salient features in a 2D image plane, or using a predefined model in image alignment, we propose to match only the necessary features along one or two optimal partitions and use them to construct a sparse set of deformation vectors. This reduces the misalignment problem caused by complex 2D structure and color, especially for input images with significant mismatches in the overlapped area. From the sparse 1D features detected along the partition(s), we propagate the deformation into the target image smoothly. Structure deformation and color correction are simultaneously achieved within the same framework operating in


Fig. 18. DOPs. (a) and (b) are input images $I_S$ and $I_T$, respectively. Because of image distortion, the initial alignment of the two images introduces visual artifacts, as shown in (c). We apply DOP. The steps are shown in (d) to (j). (d) shows the computed $\partial I'_s$. (e) illustrates the detected 1D features along $\partial I'_s$, denoted by crosses. To compute a matchable partition $\partial I'_t$, matched 2D features are first detected in the overlapped area (f), followed by constructing compatible triangulations in the two images in (g). Combining all constraints, the partition $\partial I'_t$ is computed, as shown on the right in (h). (i) shows the detected 1D features along $\partial I'_t$. (j) shows the deformation vectors computed on the corresponding 1D features. (k) shows the result computed using the optimal seam method [11], where structure mismatching is apparent. (l) is our result after structure deformation. Both structure and color are well aligned.

the image gradient domain. In experiments, we also observe that commonly used methods such as blending or optimal seam cannot always produce seamless results. Our method, when applied to image composition, can automatically search for matchable features and align them by deformation. This largely relieves users of carefully and manually matching structures along the optimal boundaries.

Our method provides a principled and effective way to address the general problem of natural image stitching. SOP or DOP is adopted in different situations to achieve seamless stitching. Generally speaking, SOP is applicable when very few feature points are found in the overlapped regions, whereas the DOP method is appropriate for more general scenes. Due to the complexity of natural images, the following factors may influence our stitching quality. First, in our experiments, when DOP is applied, we need to employ existing feature detectors and descriptors [24] to match features. The SIFT detector is not invariant to affine transformation and partial occlusion and may produce erroneous matching results in difficult images. We show one example in Figs. 21a, 21b, 21c, and 21d, where the input images contain very complex structures (the canyon) and textures (the plant). In this example, the features cannot be matched well. The magnified regions in Fig. 21d illustrate the misaligned structures. Second, because our method stitches images in the image gradient domain, it is not guaranteed to produce the best visual effect if the source and target images are very different in appearance. One example is shown in Figs. 21e, 21f, and 21g, where we composite part of the chimney in Fig. 21f onto the image in Fig. 21e. Although the main structures are aligned well, the stitching result is not visually natural because the textures of the two chimneys are very dissimilar. In the future, we shall investigate other image matching criteria to handle the above difficult examples. Moreover, extending this method to videos and multiple images is another possible direction.


Fig. 19. Brushes. (a) Input image. (b) Part of the lower brush is copied and pasted onto the upper one, as shown inside the yellow region. (c) Feathering result. (d) Optimal seam result. (e) GIST1 result [22]. (f) Result of structure deformation in [37]. (g) Result by direct Poisson blending [27]. (h) Our result. The complexity in feature matching is reduced to 1D, allowing the sparse deformation vectors to be robustly propagated into the interior of the image to enforce the necessary structure continuity and smoothness.

Fig. 20. Bust. (a) and (b) show the source and target images, respectively. (c) shows the part of the bust in (a) that is to be composited onto the target image. In traditional image editing, a user needs to carefully align the source object and the target image in order to produce a seamless result. (d) shows that, in this example, even with careful manual alignment and optimal seam computation followed by Poisson blending, the result still contains artifacts because of structure discontinuity. Magnified views are shown in (e). (f) shows our result, where all pertinent structures are seamlessly aligned.

Fig. 21. Two difficult examples. (a) and (b) are two input images to be stitched. They contain complex structures and features. (c) shows our stitching result. (d) The magnified region still contains errors. (e) and (f) show two input images. We graft the chimney in the red rectangle onto (e). (g) shows our result. Although the structure is aligned well, it does not look natural.