Stereo Matching and Optical Flow Estimation Techniques - Prof. Brian Potetz, Study notes of Electrical and Electronics Engineering

Notes on stereo matching algorithms and their taxonomy, and on techniques for optical flow estimation. Stereo matching algorithms find correspondences between two images of the same scene, while optical flow estimation recovers the apparent motion of objects in a video sequence. Topics include the matching costs, aggregation methods, and optimization techniques used in stereo matching, as well as the optical flow constraint equation and methods for computing optical flow.

Typology: Study notes

Pre 2010

Uploaded on 03/19/2009

koofers-user-7no
EECS [course number illegible]: Computer Vision
Brian Potetz
Fall [year illegible]
Lecture [number illegible]: Optical Flow
Suggested Reading
Motion:
• Shapiro & Stockman, Chapter 9
Dynamic Programming
N points have N! possible correspondences.
BUT, we might assume the ordering of points is the same in the left & right eyes.
Now, solve for the best match of p4 given the match of p3, and so on.
[Figure: dynamic-programming matrix of pairwise matching costs between a left scanline (intensities a, c, d, e, f, g, k) and a right scanline (intensities a, b, c, f, g, h, i, j, k), with the selected path marked by M, L, and R.]
Figure 2: Stereo matching using dynamic programming. For each pair of corresponding scanlines, a minimizing path through the matrix of all pairwise matching costs is selected. Lowercase letters (a–k) symbolize the intensities along each scanline. Uppercase letters represent the selected path through the matrix. Matches are indicated by M, while partially occluded points (which have a fixed cost) are indicated by L and R, corresponding to points only visible in the left and right image, respectively. Usually, only a limited disparity range is considered, which is 0–4 in the figure (indicated by the non-shaded squares). Note that this diagram shows an “unskewed” x–d slice through the DSI.
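The ordering constraint turns scanline matching into a shortest-path problem through the cost matrix. Below is a minimal Python sketch, not the implementation behind Figure 2. It fills the matrix cell by cell and then backtracks the cheapest path. The squared-difference matching cost, the fixed occlusion cost `occ_cost`, and the name `dp_scanline` are our own illustrative choices. Unlike the figure, the sketch does not restrict the disparity range.

```python
def dp_scanline(left, right, occ_cost=1.0):
    """Align two scanlines by dynamic programming under the ordering constraint.

    Returns (total_cost, path), where path is a string of moves:
    'M' = match, 'L' = pixel visible only in the left scanline,
    'R' = pixel visible only in the right scanline.
    """
    n, m = len(left), len(right)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of left[:i] with right[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:  # match left[i-1] with right[j-1]
                diff = left[i - 1] - right[j - 1]
                candidates.append((cost[i - 1][j - 1] + diff * diff, "M"))
            if i > 0:  # left[i-1] is occluded in the right image
                candidates.append((cost[i - 1][j] + occ_cost, "L"))
            if j > 0:  # right[j-1] is occluded in the left image
                candidates.append((cost[i][j - 1] + occ_cost, "R"))
            cost[i][j], move[i][j] = min(candidates)
    # Backtrack from the bottom-right corner to recover the minimizing path.
    path = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        path.append(step)
        if step == "M":
            i, j = i - 1, j - 1
        elif step == "L":
            i -= 1
        else:
            j -= 1
    return cost[n][m], "".join(reversed(path))
```

For example, `dp_scanline([1, 2, 3], [1, 9, 2, 3])` returns `(1.0, "MRMM")`. The extra value 9 is visible only in the right scanline, so paying one occlusion cost is cheaper than forcing a match.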
[…] integer disparities (exceptions include continuous optimization techniques such as optic flow [11] or splines [112]). For applications such as robot navigation or people tracking, these may be perfectly adequate. However, for image-based rendering, such quantized maps lead to very unappealing view synthesis results (the scene appears to be made up of many thin shearing layers). To remedy this situation, many algorithms apply a sub-pixel refinement stage after the initial discrete correspondence stage. (An alternative is to simply start with more discrete disparity levels.)
Sub-pixel disparity estimates can be computed in a variety of ways, including iterative gradient descent and fitting a curve to the matching costs at discrete disparity levels [93, 71, 122, 77, 60]. This provides an easy way to increase the resolution of a stereo algorithm with little additional computation. However, to work well, the intensities being matched must vary smoothly, and the regions over which these estimates are computed must be on the same (correct) surface.
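A common form of curve fitting fits a parabola through the costs at d − 1, d, and d + 1 and takes its minimum. Below is a minimal Python sketch. It assumes a cost list indexed by integer disparity, and the function names are hypothetical.

```python
def parabola_offset(c_prev, c_best, c_next):
    """Sub-pixel offset of the minimum of a parabola through three costs.

    c_prev, c_best, c_next are the matching costs at disparities d-1, d, d+1.
    The result lies in [-0.5, 0.5] when c_best is the smallest of the three.
    """
    curvature = c_prev - 2.0 * c_best + c_next
    if curvature <= 0.0:
        return 0.0  # flat or non-convex: no reliable refinement
    return 0.5 * (c_prev - c_next) / curvature


def refine_disparity(costs):
    """Return the integer argmin of `costs`, refined to sub-pixel precision."""
    d = min(range(len(costs)), key=costs.__getitem__)
    if 0 < d < len(costs) - 1:
        return d + parabola_offset(costs[d - 1], costs[d], costs[d + 1])
    return float(d)  # cannot fit a parabola at the ends of the disparity range
```

With costs sampled from (d − 2.3)², the refined estimate comes out at approximately 2.3. On real matching costs, the fit is only as good as the smoothness and same-surface conditions above allow.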
Recently, some questions have been raised about the advisability of fitting correlation curves to integer-sampled matching costs [105]. This situation may even be worse when sampling-insensitive dissimilarity measures are used [12]. We investigate this issue in Section 6.4 below.
Besides sub-pixel computations, there are of course other ways of post-processing the computed disparities. Occluded areas can be detected using cross-checking (comparing left-to-right and right-to-left disparity maps) [29, 42]. A median filter can be applied to “clean up” spurious mismatches, and holes due to occlusion can be filled by surface fitting or by distributing neighboring disparity estimates [13, 96]. In our implementation we are not performing such clean-up steps since we want to measure the performance of the raw algorithm components.
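Cross-checking and median filtering can be sketched on a single scanline. This is a minimal Python illustration, not the procedure of [29, 42] or [13, 96]. Two details are assumptions: the sign convention (left pixel x matches right pixel x + d, as in the E_data term used later in these notes) and the function names.

```python
def cross_check(d_left, d_right, tol=0):
    """Flag pixels whose left-to-right and right-to-left disparities agree.

    Assumed convention: left pixel x matches right pixel x + d_left[x], and
    right pixel x' matches left pixel x' - d_right[x'], so a consistent pair
    stores the same disparity value in both maps.
    """
    width = len(d_left)
    valid = []
    for x, d in enumerate(d_left):
        xr = x + d  # matching column in the right scanline
        # Out-of-range matches fail; otherwise compare the two disparities.
        valid.append(0 <= xr < width and abs(d_right[xr] - d) <= tol)
    return valid


def median3(row):
    """3-tap median filter along a scanline; the two end pixels are copied."""
    out = list(row)
    for x in range(1, len(row) - 1):
        out[x] = sorted(row[x - 1:x + 2])[1]
    return out
```

Here `cross_check([1, 1, 2, 0], [0, 1, 1, 5])` marks only the first two pixels as consistent. The third match falls outside the row, and the fourth disagrees with the right-to-left map. Pixels that fail the check are candidates for occlusion filling.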
3.5. Other methods
Not all dense two-frame stereo correspondence algorithms can be described in terms of our basic taxonomy and representations. Here we briefly mention some additional algorithms and representations that are not covered by our framework.
The algorithms described in this paper first enumerate all possible matches at all possible disparities, then select the best set of matches in some way. This is a useful approach when a large amount of ambiguity may exist in the computed disparities. An alternative approach is to use methods inspired by classic (infinitesimal) optic flow computation. Here, images are successively warped and motion estimates incrementally updated until a satisfactory registration is achieved. These techniques are most often implemented within a coarse-to-fine hierarchical refinement framework [90, 11, 8, 112].
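The warp-and-update loop described above can be illustrated in one dimension. This minimal Python sketch estimates a single global shift. At each step it warps the second signal by the current estimate and applies a Gauss-Newton update derived from the linearized brightness constraint. It omits the coarse-to-fine pyramid and treats the signals as continuous functions. It is not the method of any cited paper, and `estimate_shift` and its parameters are hypothetical names.

```python
import math


def estimate_shift(f, g, xs, iterations=20, h=1e-3):
    """Estimate a global shift s such that g(x) = f(x - s), by iterative warping.

    Each iteration warps g by the current estimate, linearizes the residual
    g(x + s + delta) - f(x) around delta = 0, and adds the least-squares delta.
    """
    s = 0.0
    for _ in range(iterations):
        num = 0.0
        den = 0.0
        for x in xs:
            warped = g(x + s)  # g warped back by the current estimate
            grad = (g(x + s + h) - g(x + s - h)) / (2.0 * h)  # spatial derivative
            residual = warped - f(x)
            num += grad * residual
            den += grad * grad
        if den == 0.0:
            break  # no texture: the shift cannot be observed
        s -= num / den
    return s


if __name__ == "__main__":
    def f(x):
        return math.sin(0.2 * x)

    def g(x):
        return f(x - 3.0)  # f moved 3 units to the right

    print(estimate_shift(f, g, range(64)))
```

Starting from zero, the estimate for this sinusoid converges to the true shift of 3 within a few iterations. The first, un-warped step alone stops noticeably short of it.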
A univalued representation of the disparity map is also not essential. Multi-valued representations, which can represent several depth values along each line of sight, have been extensively studied recently, especially for large multi-view data sets. Many of these techniques use a voxel-based representation to encode the reconstructed colors and spatial occupancies or opacities [113, 101, 67, 34, 33, 24]. Another way to represent a scene with more complexity is to use multiple layers, each of which can be represented by a plane plus residual parallax [5, 14, 117]. Finally, deformable surfaces of various kinds have also been used to perform 3D shape reconstruction from multiple images [120, 121, 43, 38].
3.6. Summary of methods
Table 1 gives a summary of some representative stereo matching algorithms and their corresponding taxonomy, i.e., the matching cost, aggregation, and optimization techniques used by each. The methods are grouped to contrast different matching costs (top), aggregation methods (middle), and optimization techniques (third section), while the last section lists some papers outside the framework. As can be seen from this table, quite a large subset of the possible algorithm design space has been explored over the years, albeit not very systematically.
4. Implementation
We have developed a stand-alone, portable C++ implementation of several stereo algorithms. The implementation is closely tied to the taxonomy presented in Section 3 and currently includes window-based algorithms, diffusion algorithms […]
Modern Stereo Algorithms
• Often use color-based adaptive windows
Adaptive metric of error between image patch L and R:
$$\frac{\sum_{\Delta\vec{x}} w\big(L(\vec{x}), L(\vec{x}+\Delta\vec{x})\big)\, w\big(R(\vec{x}), R(\vec{x}+\Delta\vec{x})\big)\, d\big(L(\vec{x}+\Delta\vec{x}), R(\vec{x}+\Delta\vec{x})\big)}{\sum_{\Delta\vec{x}} w\big(L(\vec{x}), L(\vec{x}+\Delta\vec{x})\big)\, w\big(R(\vec{x}), R(\vec{x}+\Delta\vec{x})\big)}, \qquad \vec{x} = (x, y)$$
    • w(L(x), L(x + Δx)) and w(R(x), R(x + Δx)): similarity in color of central pixel x and pixel (x + Δx)
    • d(L(x + Δx), R(x + Δx)): (dis)similarity in color of pixel (x + Δx) in the left & right images
    • Denominator: sum of all weights
• Clean up results using a Global Energy Function
Disparity, as a function of (x, y):
$$d : \mathbb{R}^2 \to \mathbb{R}$$
i.e. Minimize:
$$E(d) = E_{data}(d) + \lambda E_{smooth}(d)$$
$$E_{data}(d) = \sum_{x,y} C\big(I_{left}(x, y),\, I_{right}(x + d(x, y), y)\big)$$
$$E_{smooth}(d) = \sum_{x,y} \phi\big(d(x + 1, y) - d(x, y)\big) + \sum_{x,y} \phi\big(d(x, y + 1) - d(x, y)\big)$$
where φ(Δd) is the penalty applied to the disparity difference Δd between neighboring pixels.
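The Python sketch below evaluates E(d) for a given integer disparity map. The notes leave C and φ unspecified. Here C is assumed to be the absolute intensity difference and φ a truncated linear penalty, and matches that fall outside the right image are clamped to its border. The sketch only evaluates the energy. Minimizing it is a separate optimization step that is not shown, and `stereo_energy` is a hypothetical name.

```python
def stereo_energy(left, right, disp, lam=1.0, trunc=2.0):
    """Evaluate E(d) = E_data(d) + lam * E_smooth(d) for a disparity map.

    left, right: images as lists of rows of intensities.
    disp: integer disparity map d(x, y) with the same shape as `left`.
    Assumptions (not fixed by the notes): C(a, b) = |a - b|,
    phi(dd) = min(|dd|, trunc), and out-of-range matches are clamped.
    """
    h, w = len(left), len(left[0])

    def phi(dd):
        return min(abs(dd), trunc)

    e_data = 0.0
    for y in range(h):
        for x in range(w):
            xr = min(max(x + disp[y][x], 0), w - 1)  # column x + d(x, y), clamped
            e_data += abs(left[y][x] - right[y][xr])
    e_smooth = 0.0
    for y in range(h):
        for x in range(w - 1):
            e_smooth += phi(disp[y][x + 1] - disp[y][x])  # horizontal neighbors
    for y in range(h - 1):
        for x in range(w):
            e_smooth += phi(disp[y + 1][x] - disp[y][x])  # vertical neighbors
    return e_data + lam * e_smooth
```

Take left = [[1, 2, 3, 4]] and right = [[0, 1, 2, 3]]. A constant disparity of 1 gives E = 1.0, with a single mismatch at the image border. A constant disparity of 0 gives E = 4.0.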
Motion Field
  • Image velocity of a point moving in the scene
  Perspective projection:
  $$\mathbf{r}_i = f' \frac{\mathbf{r}_0}{\mathbf{r}_0 \cdot \hat{\mathbf{z}}}$$
  Scene point velocity:
  $$\mathbf{v}_0 = \frac{\partial \mathbf{r}_0}{\partial t}$$
  Image velocity (motion field):
  $$\mathbf{v}_i = \frac{\partial \mathbf{r}_i}{\partial t} = f' \frac{(\mathbf{r}_0 \cdot \hat{\mathbf{z}})\,\mathbf{v}_0 - (\mathbf{v}_0 \cdot \hat{\mathbf{z}})\,\mathbf{r}_0}{(\mathbf{r}_0 \cdot \hat{\mathbf{z}})^2}$$

Optical Flow
  • Motion of brightness pattern in the image
  • Ideally, optical flow = motion field
  [Figure: optical flow vs. motion field. Examples: motion field exists but no optical flow; no motion field but shading changes.]

Optical Flow Constraint Equation
  • Assume brightness of patch remains same in both images:
  $$I_{t+\delta t}(x + u\,\delta t,\; y + v\,\delta t) = I_t(x, y)$$
  Optical flow: velocities (u, v). Displacement: (δx, δy) = (u δt, v δt).

Aperture Problem
  [Figure slides: the aperture problem]

Optical Flow Constraint Equation
  Expanding the brightness-constancy assumption to first order and letting δt → 0 gives
  $$\nabla I_t(x, y) \cdot (u, v) = -\frac{\partial}{\partial t} I_t(x, y)$$
  • Constraint equation: (u, v) must lie on a straight line
  • We can compute ∇I_t and ∂I_t/∂t using gradient operators!
  • But, (u, v) cannot be found uniquely with this constraint!

Computing Optical Flow
  • Formulate the error in the optical flow constraint
  • We need additional constraints!
  • Smoothness constraint (as in shape from shading and stereo): usually the motion field varies smoothly in the image, so penalize departure from smoothness
  • Find (u, v) at each image point that MINIMIZES the combined error, with a weighting factor balancing the two terms

Discrete Optical Flow Algorithm
  • Consider an image pixel
  • Departure from smoothness constraint at that pixel
  • Error in the optical flow constraint equation at that pixel
  • We seek the set of (u, v) values, over all pixels, that minimizes the total error. NOTE: each pixel's (u, v) shows up in more than one term
  • Differentiating w.r.t. each pixel's u and v and setting to zero gives the update rule, in which ū and v̄ are the averages of u and v around the pixel

Example: Optical Flow Result
Low Texture Region - Bad
  • gradients have small magnitude
Edges - so-so (aperture problem)
  • large gradients, all the same
High Textured Region - Good
  • gradients are different, large magnitudes

Revisiting the Small Motion Assumption
  • Is this motion small enough?
  • Probably not: it's much larger than one pixel (2nd-order terms dominate)
  • How might we solve this problem?
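The breakdown of the small-motion assumption can be checked numerically. This minimal Python sketch computes a single linearized estimate of a 1D displacement from the constraint I_x u + I_t = 0. Here I_x is the spatial derivative and I_t the temporal difference, and the estimate is a least-squares solution over all samples, with no warping and no iteration. The sinusoidal test signal, the displacements, and the name `one_step_flow` are illustrative assumptions.

```python
import math


def one_step_flow(f, g, xs, h=1e-3):
    """Single linearized estimate of u, where g(x) = f(x - u).

    Minimizes sum_x (I_x * u + I_t)^2 in closed form, with I_x = f'(x) and
    I_t = g(x) - f(x): one gradient-based step, no warping, no iteration.
    """
    num = 0.0
    den = 0.0
    for x in xs:
        ix = (f(x + h) - f(x - h)) / (2.0 * h)  # spatial derivative I_x
        it = g(x) - f(x)                        # temporal difference I_t
        num += ix * it
        den += ix * ix
    return -num / den


if __name__ == "__main__":
    def f(x):
        return math.sin(0.2 * x)

    for true_u in (0.2, 6.0):
        estimate = one_step_flow(f, lambda x: f(x - true_u), range(64))
        print(true_u, estimate)
```

For a 0.2-sample shift, the estimate lands very close to 0.2. For a 6-sample shift, it falls well short of 6, because the first-order Taylor expansion no longer describes the change in brightness.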