

CS 7495 Computer Vision
Problem Set 2 - Stereo Matching
Georgia Tech Fall 2002

Zhou, Zhen Hao (Howard)
Thursday, September 12, 2002

Problem Set 2 - Stereo Matching

In this assignment you will implement and test a stereo matching algorithm based on dynamic programming (DP) and the SSD cost function. You may find the supplemental readings on stereo listed on the course web site to be helpful, but they are not required. In this problem set it may be helpful to use some of the image processing functions in the OpenCV library.

Grading: There are two parts which have equal weight and I expect them each to take about a week to complete. You have the option of turning in your answers for part 1 early (by Sept. 17 or later). If you choose to do this, we will give you solution code for part 1 which you can build on in doing part 2 (and we will grade your part 1 earlier). I encourage everyone to do this, but it is not required.

1 Matching Cost Computation

A fundamental issue in stereo matching is selecting the matching cost function which measures the similarity between two N by N windows of pixels. We desire a robust measure which can be implemented efficiently.

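(a) The normalized correlation cost function $NC(I_L, I_R)$ can be viewed as an inner product between two vectors of pixels. Prove that if $I_L$ and $I_R$ are two image patches related by $I_R = aI_L + b$ for any constants $a$ and $b$, then $NC(I_L, I_R) = 1$.

A: Let $n$ be the common dimension of the vectors $\mathbf{w}_L$ and $\mathbf{w}_R$, each of which contains all the pixel intensities of the corresponding image patch:

$$\mathbf{w}_R = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}, \qquad \mathbf{w}_L = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.$$

By the affine transformation on image intensity relating the two patches, $I_R = aI_L + b$, we have

$$\mathbf{w}_R = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} av_1 + b \\ av_2 + b \\ \vdots \\ av_n + b \end{bmatrix} = a\mathbf{w}_L + \mathbf{b}, \qquad \text{where } \mathbf{b} = \begin{bmatrix} b \\ b \\ \vdots \\ b \end{bmatrix}.$$

Let $\bar{\mathbf{w}}$ denote the vector whose entries all equal the mean of the entries of $\mathbf{w}$. Averaging the entries of $\mathbf{w}_R = a\mathbf{w}_L + \mathbf{b}$ gives $\bar{\mathbf{w}}_R = a\bar{\mathbf{w}}_L + \mathbf{b}$. Hence

$$\mathbf{w}_R - \bar{\mathbf{w}}_R = (a\mathbf{w}_L + \mathbf{b}) - (a\bar{\mathbf{w}}_L + \mathbf{b}) = a(\mathbf{w}_L - \bar{\mathbf{w}}_L).$$

Therefore, assuming $a > 0$ (so that $\|a(\mathbf{w}_L - \bar{\mathbf{w}}_L)\| = a\|\mathbf{w}_L - \bar{\mathbf{w}}_L\|$), we conclude

$$NC(I_L, I_R) = C(d) = \frac{(\mathbf{w}_R - \bar{\mathbf{w}}_R)}{\|\mathbf{w}_R - \bar{\mathbf{w}}_R\|} \cdot \frac{(\mathbf{w}_L - \bar{\mathbf{w}}_L)}{\|\mathbf{w}_L - \bar{\mathbf{w}}_L\|} = \frac{a(\mathbf{w}_L - \bar{\mathbf{w}}_L)}{a\|\mathbf{w}_L - \bar{\mathbf{w}}_L\|} \cdot \frac{(\mathbf{w}_L - \bar{\mathbf{w}}_L)}{\|\mathbf{w}_L - \bar{\mathbf{w}}_L\|} = \frac{\|\mathbf{w}_L - \bar{\mathbf{w}}_L\|^2}{\|\mathbf{w}_L - \bar{\mathbf{w}}_L\|^2} = 1.$$

For $a < 0$ the same computation gives $NC = -1$, and for $a = 0$ the right patch is constant and NC is undefined, so the claim "for any constants $a$ and $b$" implicitly requires $a > 0$ (and a non-constant left patch).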
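A quick numerical check of the result in (a) can be written in plain Python. This is a sketch under assumptions: patches are flattened into ordinary lists, and `normalized_correlation` is a hypothetical helper written for illustration (not an OpenCV function). The second call also shows the sign caveat for $a < 0$.

```python
import math

def normalized_correlation(w_left, w_right):
    """NC of two equal-length pixel vectors: the inner product of the
    mean-subtracted vectors divided by the product of their norms."""
    n = len(w_left)
    mean_l = sum(w_left) / n
    mean_r = sum(w_right) / n
    zl = [v - mean_l for v in w_left]
    zr = [u - mean_r for u in w_right]
    norm_l = math.sqrt(sum(v * v for v in zl))
    norm_r = math.sqrt(sum(u * u for u in zr))
    if norm_l == 0 or norm_r == 0:
        raise ValueError("NC is undefined for a constant patch")
    return sum(v * u for v, u in zip(zl, zr)) / (norm_l * norm_r)

# A 3x3 patch flattened into a vector of n = 9 intensities.
w_l = [12, 40, 33, 7, 90, 55, 61, 18, 24]
# a = 2.5 > 0: prints a value equal to 1.0 up to floating-point rounding.
print(normalized_correlation(w_l, [2.5 * v + 30 for v in w_l]))
# a = -2.0 < 0: prints a value equal to -1.0 up to floating-point rounding.
print(normalized_correlation(w_l, [-2.0 * v + 100 for v in w_l]))
```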

(b) Imagine a very simple stereo algorithm in which each window in the left image is matched independently by computing the match scores for D different discrete values of disparity and then picking the best match. (Unlike the DP approach, this method does not enforce consistency across the set of matches.) For simplicity, assume that each pixel is matched (i.e. occlusions are not allowed). What is the complexity of the total computational cost for matching a scanline using NC, as a function of the window size N, the number of disparity levels D, and the width of the image M?

A: The total computational cost is $O(MDN^2)$: each of the $M$ pixels on the scanline is compared at $D$ disparities, and each NC evaluation touches all $N^2$ pixels of the two windows. The following pseudocode demonstrates the computational complexity:

    for all p ∈ {p_1, p_2, ..., p_M : all points on the scanline} do
        for each of the D disparity levels d do
            compute NC for w_L and w_R, where w_L, w_R ∈ R^(N^2)
        end for
    end for
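To make the $O(MDN^2)$ count concrete, here is a small winner-take-all matcher in plain Python. It is a sketch under assumptions, not the author's code: images are lists of rows, N is odd, coordinates are clamped at the image border, the left window at $x - d$ is paired with the right window at $x$ (following the SDIFF definition in part (c)), and `nc` and `match_scanline_nc` are hypothetical helpers. The `pixels_touched` counter comes out to exactly M * D * N^2 because each of the M * D comparisons reads a full N by N window.

```python
import math

def nc(wl, wr):
    """Normalized correlation of two equal-length pixel lists.
    Returns 0.0 for a constant window, where NC is undefined (an assumption)."""
    n = len(wl)
    ml, mr = sum(wl) / n, sum(wr) / n
    zl = [v - ml for v in wl]
    zr = [u - mr for u in wr]
    denom = math.sqrt(sum(v * v for v in zl) * sum(u * u for u in zr))
    return sum(v * u for v, u in zip(zl, zr)) / denom if denom else 0.0

def match_scanline_nc(left, right, y, N, D):
    """Winner-take-all matching of scanline y, one pixel at a time.

    For every pixel x and disparity d in 0..D-1, the N x N window at
    (x - d, y) in the left image is compared with the window at (x, y)
    in the right image. Returns (disparities, pixels_touched)."""
    H, M = len(right), len(right[0])
    h = N // 2

    def window(img, cx, cy):
        # Clamp at the border so every window holds exactly N*N pixels.
        return [img[min(max(cy + dy, 0), H - 1)][min(max(cx + dx, 0), M - 1)]
                for dy in range(-h, h + 1) for dx in range(-h, h + 1)]

    disparities, pixels_touched = [], 0
    for x in range(M):                       # M pixels on the scanline
        best_d, best_score = 0, -math.inf
        wr = window(right, x, y)
        for d in range(D):                   # D disparity levels
            wl = window(left, x - d, y)
            pixels_touched += len(wl)        # N*N left-window pixels per NC
            score = nc(wl, wr)
            if score > best_score:           # keep the highest correlation
                best_d, best_score = d, score
        disparities.append(best_d)
    return disparities, pixels_touched
```

For example, with M = 12, D = 4, and N = 3 the counter reports 12 * 4 * 9 = 432.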

(c) An alternative to NC is a raw (un-normalized) SSD cost function, which computes the sum of squared intensity differences between two windows using the original image pixels. One potential advantage of this cost function is that it can be computed very efficiently. In this case, we can divide the matching cost computation into two phases: computation and aggregation.

In the computation phase, a set of squared difference images is computed by shifting the entire left image by a constant disparity level $d$ and computing, for each pixel individually, the squared intensity difference with respect to the right image. This gives a set of $D$ squared difference images:

$$SDIFF(x, y, d) = [I_L(x - d, y) - I_R(x, y)]^2.$$

In the aggregation phase, each squared difference image is convolved with an $N$ by $N$ box filter, yielding at each location $(x, y)$ the sum of squared differences of pixels within an $N$ by $N$ neighborhood. Therefore $SSD(x, y, d) = SDIFF(x, y, d) ** B_N(x, y)$, where $B_N(x, y)$ is the box filter centered at $(x, y)$ (equal to 1 inside an $N$ by $N$ square and 0 outside), and $**$ denotes 2-D convolution. What is the complexity of scanline matching in this case (analogous to (b) above)? Compare and contrast the two cost functions.

Note: The box filter convolution is separable and can be implemented very efficiently as two separate moving sums in x and y. Consider aggregating in the x direction first: given the sum at $(x, y)$ over an interval of $N$ pixels, the sum at $(x + 1, y)$ can be obtained by subtracting the leftmost old pixel and adding the rightmost new pixel. In other words,

$$SSD_X(x + 1, y, d) = SSD_X(x, y, d) - SDIFF(x - n, y, d) + SDIFF(x + n + 1, y, d) \qquad (1)$$

where $n = \frac{N - 1}{2}$, so that the window $[x - n, x + n]$ contains exactly $N$ pixels. SSD is formed in a similar manner from $SSD_X$ by summing up its columns.
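The two phases and Eq. (1) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the author's or OpenCV's code: images are lists of rows, N is odd, left pixels shifted past the image border are clamped to column 0, and pixels outside the image count as zero during aggregation. `sdiff_volume`, `running_box_sum_1d`, and `ssd_volume` are hypothetical names. Apart from the initial window sum, each aggregated value costs one addition and one subtraction per direction, independent of N.

```python
def sdiff_volume(left, right, D):
    """SDIFF(x, y, d) = [IL(x - d, y) - IR(x, y)]^2 for d = 0..D-1.
    Left pixels with x - d < 0 are clamped to column 0 (an assumption)."""
    H, W = len(right), len(right[0])
    return [[[(left[y][max(x - d, 0)] - right[y][x]) ** 2 for x in range(W)]
             for y in range(H)]
            for d in range(D)]

def running_box_sum_1d(row, n):
    """Sum over [x - n, x + n] at every x, updated as in Eq. (1):
    add the pixel entering on the right, subtract the one leaving on
    the left. Pixels outside the row count as zero."""
    W = len(row)

    def at(i):
        return row[i] if 0 <= i < W else 0

    out = [0] * W
    s = sum(at(i) for i in range(-n, n + 1))   # initial window at x = 0
    for x in range(W):
        out[x] = s
        s += at(x + n + 1) - at(x - n)          # slide the window to x + 1
    return out

def ssd_volume(left, right, D, N):
    """SSD(x, y, d): SDIFF aggregated with a separable N x N box filter."""
    n = (N - 1) // 2                            # half-width, N assumed odd
    result = []
    for img in sdiff_volume(left, right, D):
        # Aggregate along x first (SSD_X) ...
        ssd_x = [running_box_sum_1d(row, n) for row in img]
        # ... then along y, running the same 1-D sum down each column.
        cols = [running_box_sum_1d(list(col), n) for col in zip(*ssd_x)]
        result.append([list(r) for r in zip(*cols)])
    return result
```

Comparing `ssd_volume` against a direct double loop over each N by N window on a small random image is an easy way to validate the running-sum bookkeeping.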

(f) Compute the SSD images for the following two stereo pairs: [sawtooth-left.bmp, sawtooth-right.bmp] and [tsukuba-left.bmp, tsukuba-right.bmp]. The sawtooth image is a "synthetic" image with 3 disparity levels, while the Tsukuba image is a staged scene from Ohta's lab at U. Tsukuba (both of these pairs were taken from the Middlebury Stereo Vision Page and are described in the paper by Scharstein and Szeliski). What range of disparity values is present in each of these stereo pairs?

A: In the sawtooth pair, there are 3 disparity levels: the top left portion of the image has $d = 8$, the top right $d = 4$, and the bottom portion $d = 14$ or $15$.

In the Tsukuba pair, my program finds disparities ranging from 2 to 15.
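One simple way to read a disparity range off a set of SSD images, assumed here for illustration rather than taken from the author's program, is a winner-take-all argmin at every pixel followed by the minimum and maximum of the resulting map. The names `wta_disparity_map` and `disparity_range` are hypothetical. On real images isolated mismatches can widen the range, so the result is only a rough estimate.

```python
def wta_disparity_map(ssd, d_min=0):
    """Winner-take-all: at each pixel keep the disparity with the lowest SSD.

    ssd[k][y][x] is the aggregated SSD for disparity d_min + k."""
    D, H, W = len(ssd), len(ssd[0]), len(ssd[0][0])
    return [[d_min + min(range(D), key=lambda k: ssd[k][y][x])
             for x in range(W)]
            for y in range(H)]

def disparity_range(disp_map):
    """Smallest and largest disparity present in a disparity map."""
    values = [d for row in disp_map for d in row]
    return min(values), max(values)

# Tiny hand-made SSD volume: 3 disparities, 2 x 2 pixels.
ssd = [[[5, 1], [9, 9]],
       [[2, 3], [0, 9]],
       [[7, 0], [4, 1]]]
dmap = wta_disparity_map(ssd)
print(dmap, disparity_range(dmap))   # [[1, 2], [1, 2]] (1, 2)
```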

2 Stereo Using Dynamic Programming

For this part you will implement a dynamic programming algorithm to match two images, one scan-line at a time. You will exploit the ordering constraint that we discussed in class. The class presentation of DP involved an M by M lattice which encoded all possible matches between pixels, given the ordering constraint. In practice, we would prefer not to have to consider all possible matches. For example, most real scenes have a restricted range of depths. By restricting the number of disparity values which have to be considered, we can save both computation and storage.
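To make the saving concrete (a rough count, assuming one stored cost per lattice node): the full lattice has a node for every pair $(x_L, x_R)$, while restricting the disparity to $D$ levels keeps only a diagonal band of width $D$.

```latex
\underbrace{M \cdot M}_{\text{full lattice}}
\;\longrightarrow\;
\underbrace{\approx M \cdot D}_{\text{disparity band}},
\qquad
\frac{MD}{M^2} = \frac{D}{M}
```

With hypothetical values $M = 400$ and $D = 20$, that is roughly 8,000 nodes instead of 160,000, or 5% of the lattice.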

(a) With respect to the virtual image, we write the stereo equations as $x_L = f\frac{X}{Z}$ and $x_R = f\frac{X - B}{Z}$ and define the disparity $d = x_L - x_R$. Show that $d \geq 0$. What implication does this have for the search lattice in the dynamic programming approach to matching? In the following DP lattices, which paths correspond to valid stereo pairs? Sketch the depth profile of a possible scene in each valid case. By depth profile we mean the depths in the scene for a single scanline, corresponding to an X-Z slice through some 3-D surface. (Numerical accuracy is not required here. A piecewise planar scene with the right qualitative structure as far as occlusions and disocclusions are concerned is all that is necessary.) Number the key points in the depth profile and their corresponding locations on the DP lattice. Most stereo algorithms consider a range of discrete disparity values, including $d = 0$. What is the meaning of zero disparity in light of $d > 0$?

A: By definition and the setup of the stereo equations, $f > 0$ and $B > 0$, and every visible point lies in front of the cameras, so $Z > 0$. Hence

$$d = x_L - x_R = f\frac{X}{Z} - f\frac{X - B}{Z} = f\frac{B}{Z} > 0.$$
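The derivation can be checked numerically: the disparity does not depend on $X$ and shrinks toward 0 as $Z$ grows. This is a sketch with a hypothetical camera (focal length 500 pixels, baseline 0.1 in scene units); `disparity` is an illustrative helper that simply evaluates the two stereo equations above.

```python
def disparity(f, B, X, Z):
    """d = xL - xR with xL = f X / Z and xR = f (X - B) / Z."""
    x_left = f * X / Z
    x_right = f * (X - B) / Z
    return x_left - x_right

# Hypothetical camera: f = 500 pixels, B = 0.1.
# Each printed row repeats one value, f * B / Z, for all three X (up to rounding),
# and that value decreases toward 0 as Z increases.
for Z in (1.0, 2.0, 10.0, 1000.0):
    print(Z, [round(disparity(500, 0.1, X, Z), 6) for X in (-1.0, 0.0, 3.0)])
```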

The implication is that the resulting path cannot cross the diagonal of the search lattice in the DP approach. Therefore, in the following DP lattices, paths (a) and (c) are valid, as sketched below. Zero disparity means that pixels at the same position in the two images are matched; since $d = fB/Z$, this corresponds to the limit $Z \to \infty$, i.e. a point at infinity.

[Figure: three DP lattices (a), (b), (c), each with Left Scanline and Right Scanline axes, with key points a0 to a5 and c0 to c5 marked for the valid cases (a) and (c).]

c. Suppose the predecessor state indicates that, WLOG, pixel $x_L = i$ is occluded and the next pixel $x_L = i + 1$ should be matched to $x_R = x_L - d = i - d$ in the right image. The next state L indicates another occlusion at $x_L = i + 1$; however, with the disparity $d$, the next target to match in the right image moves to $x_R = i + 1 - d$, and we lose pixel $x_R = i - d$, which is neither matched nor disoccluded. Hence, transition (j) should not be included in the list.

d. Suppose that at the L state in (k), $x_L = i$ (occluded) and $d = j$. Since $x_L = i$ is occluded, we are ready to move to $x_L = i + 1$ and try to match it with $x_R = i - j$. If these two points do not match, we are left with two choices: mark this as another occlusion or as a disocclusion. In practice, an occlusion is more likely here than a disocclusion, since it is uncommon for a disocclusion to follow immediately after an occlusion; therefore, by convention, we choose occlusion. Another argument is that at this point we cannot conclude that $x_R = i - j$ is visible only in the right image, since it may match the next $x_L$; therefore the transition from L to R is not used, hence (k).

(e) Pseudocode: DSI forward

    ;; initialization
    for all (x, d) do
        cost(x, d, M) ← SSD(x, d)
        cost(x, d, L) ← occlusion cost
        cost(x, d, R) ← occlusion cost
    end for
    DSI cost(W − 1, 0, M)

Pseudocode: DSI cost(x, d, s)

    ;; memoize results so that each state (x, d, s) is evaluated only once
    ;; initial condition
    if x = 0 then
        DSI ptr(x, d, s) ← NULL
        return cost(0, 0, M)
    else
        best ← min over all valid bp_i(x, d, s) of DSI cost(bp_i(x, d, s))
        DSI ptr(x, d, s) ← the bp_i(x, d, s) that attains best
        return cost(x, d, s) + best
    end if
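The recursion above is easiest to sanity-check on a small case. The sketch below is not the author's (x, d, s) state machine or its transition list; it is a standard three-move scanline DP (match, left occlusion, right occlusion) on the $(x_L, x_R)$ lattice, restricted to the band $0 \le x_L - x_R \le$ `max_disp` discussed at the start of Part 2, and filled bottom-up instead of by memoized recursion. `scanline_dp`, `match_cost`, `occlusion_cost`, and `max_disp` are hypothetical names, and a per-pixel squared difference stands in for the aggregated SSD.

```python
import math

def scanline_dp(match_cost, occlusion_cost, max_disp):
    """Three-move scanline DP on the (x_L, x_R) lattice.

    match_cost[i][j] is the cost of matching left pixel i to right pixel j.
    Node (i, j) means the first i left and first j right pixels are
    explained; only nodes with 0 <= i - j <= max_disp are allowed.
    Returns (total_cost, disparities); disparities[i] is None for a left
    pixel that is occluded (visible in the left image only)."""
    M = len(match_cost)
    INF = math.inf
    C = [[INF] * (M + 1) for _ in range(M + 1)]
    move = [[None] * (M + 1) for _ in range(M + 1)]
    C[0][0] = 0.0
    for i in range(M + 1):
        for j in range(M + 1):
            if (i, j) == (0, 0) or not 0 <= i - j <= max_disp:
                continue                      # start node or outside the band
            candidates = []
            if i > 0 and j > 0:               # match left i-1 with right j-1
                candidates.append((C[i - 1][j - 1] + match_cost[i - 1][j - 1], "M"))
            if i > 0:                         # left pixel i-1 occluded
                candidates.append((C[i - 1][j] + occlusion_cost, "L"))
            if j > 0:                         # right pixel j-1 seen only on the right
                candidates.append((C[i][j - 1] + occlusion_cost, "R"))
            C[i][j], move[i][j] = min(candidates)
    # Backtrack from (M, M) to (0, 0), recording matched disparities.
    disparities = [None] * M
    i = j = M
    while (i, j) != (0, 0):
        step = move[i][j]
        if step == "M":
            disparities[i - 1] = i - j        # (i - 1) - (j - 1)
            i, j = i - 1, j - 1
        elif step == "L":
            i -= 1
        else:
            j -= 1
    return C[M][M], disparities

right = [1, 5, 2, 8, 3, 7, 4, 6]
left = [9, 9] + right[:-2]                    # left[x] = right[x - 2] for x >= 2
costs = [[(a - b) ** 2 for b in right] for a in left]
total, disp = scanline_dp(costs, occlusion_cost=10, max_disp=3)
print(total, disp)   # 40.0 [None, None, 2, 2, 2, 2, 2, 2]
```

The table sweep visits each of the (M + 1)^2 nodes once; storing only the band of width `max_disp + 1` would give the memory saving described at the start of Part 2.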

(g) We can set all match costs and occlusion costs on the path to 20, and all other costs to the maximum achievable value. Hence the path will definitely be chosen.

(i) In the sawtooth image, my implementation cannot fully recover the teeth, especially the second and third counting from the right edge. I suspect that this error is caused by the white regions inside the black teeth: this feature was amplified during the SSD aggregation and masked the tooth feature. Overall, the disparity image is a success, since for most of the image it is very similar to the ground truth. For the Tsukuba image, the result as a whole also looks a lot like the ground truth. However, some small features are missing, such as the arm of the lamp and the camera. There is also some noise in the corners, which I suspect comes from our restriction to the state (0, 0, M).

A disparity range of 4 to 18 for sawtooth and 0 to 14 for Tsukuba gave better results, with the occlusion cost (OC) set between 40000 and 50000. If OC is set too low, noise starts to appear around edges; on the other hand, if it is set too high, the image is overly smoothed and stereo features become hard to spot.