Acoustic Structure from Motion
Tiffany A. Huang
May 2016
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213
CMU-RI-TR-16-08
Thesis Committee
Prof. Michael Kaess, Chair
Prof. David Wettergreen
Sanjiban Choudhury

Contents

  • Abstract
  • 1 Introduction
    • 1.1 Motivation
    • 1.2 Problem Statement
  • 2 Related Work
    • 2.1 Localization and 3D Reconstruction from Imaging Sonar
    • 2.2 Data Association for Imaging Sonar
  • 3 Sonar Geometry
    • 3.1 Cartesian to Polar Transformation
    • 3.2 Sonar and Odometry Models
    • 3.3 Arc Reprojections
  • 4 Acoustic Structure from Motion
    • 4.1 Pose Graph Formulation
    • 4.2 Nonlinear Least-Squares
    • 4.3 Existence of Solution
    • 4.4 Relative Parameterization
    • 4.5 Degenerate Cases
  • 5 Automatic Data Association
    • 5.1 Data Association Challenges
    • 5.2 Incremental Data Association Algorithm
  • 6 Experimental Results
    • 6.1 ASFM Optimization and 3D Reconstruction
      • 6.1.1 Simulation Setup
      • 6.1.2 Simulation Results
    • 6.2 Relative Parameterization
    • 6.3 Data Association
      • 6.3.1 Simulation Setup
      • 6.3.2 Simulation Results
    • 6.4 Imaging Sonar Sequence
      • 6.4.1 Imaging Sonar Experimental Setup
      • 6.4.2 3D Reconstruction with Automatic Data Association

Abstract

Although the ocean spans most of the Earth’s surface, our ability to explore and perform tasks underwater is still limited by high costs and slow, inefficient 3D mapping and localization techniques. Due to the short propagation range of light underwater, imaging sonar or forward-looking sonar (FLS) is commonly used for autonomous underwater vehicle (AUV) navigation and perception. A FLS provides bearing and range information to a target, but the elevation of the target is unknown within the sensor’s field of view. Hence, current state-of-the-art techniques commonly make a flat surface (planar) assumption so that the FLS data can be used for navigation. Towards expanding the possibilities of underwater operations, a novel approach, entitled acoustic structure from motion (ASFM), is presented for recovering 3D scene structure from multiple 2D sonar images, while at the same time localizing the sonar. Unlike other methods, ASFM does not require a flat surface assumption and is capable of utilizing information from many frames, as opposed to pairwise methods that can only gather information from two frames at once. The optimization of several sonar readings of the same scene from different poses, the acoustic equivalent of bundle adjustment, and automatic data association are formulated and evaluated on both simulated data and real FLS sonar data.

1 Introduction

1.1 Motivation

Mapping and state estimation have been widely explored for autonomous vehicles that operate on land and in the air. However, for an environment that spans the majority of our planet Earth, surprisingly little progress has been made towards the same autonomous abilities underwater. For instance, the rift valley of the Mid-Atlantic Ridge, an underwater mountain range and one of the largest geographical features in the world, was not explored by humans until 1973, four years after the first humans landed on the Moon [15]! Currently, most underwater tasks are performed by human divers or remotely operated vehicles (ROVs). Autonomous underwater vehicles (AUVs) open the door to exciting new possibilities for underwater exploration such as venturing into areas too dangerous for human divers or exploring large areas much faster and more efficiently. Furthermore, AUVs have the potential to eliminate the tedium and high costs of ROV missions.

More specifically, in this work we focus on the problem of simultaneous localization and mapping (SLAM) for AUVs, or building a map and pinpointing the vehicle’s location without any prior knowledge about the environment. One particular challenge underwater is the necessity of non-conventional sensors such as sonar. Due to the turbidity of some water environments as well as the short propagation range of light in water, more common and well-studied sensors such as cameras and LIDAR do not work well underwater. As for localization, GPS cannot be used since radio waves do not travel well in water.

Feature extraction and data association, or finding which measurements from different views correspond to the same object, make up the first part of the SLAM problem. Once feature correspondences are known, the constraints can then be fed into the second part, the optimization, to find the maximum likelihood set of robot poses and landmark positions. Data association is crucial because incorrect correspondences can drastically degrade the quality of the resulting map and trajectory. An erroneous data association will pull the poses and landmarks in the optimization out of their correct positions in an attempt to satisfy incorrect constraints. Thus, it is important for the data association algorithm to be as accurate and robust as possible.

Due to the challenges mentioned above, SLAM algorithms using sonar have not been well-studied or developed for general underwater environments. Towards real-time autonomous navigation and creating a faster and more accurate 3D map with sonar, we introduce the concept of acoustic structure from motion (ASFM),

looking sonar sensors are available (e.g. Blueview 3DFLS), but they are both more expensive and slower to image a given volume (because of the low speed of sound in water), requiring up to 4 seconds for a single sweep at a short 6 m range, and more time for larger ranges. Thus, for many applications, it is advantageous to apply a 3D reconstruction technique with a FLS rather than utilize a 3D sonar directly. We explore the optimization and automatic data association of ASFM. Most sections will be split into these two parts to discuss the methodology and results behind each component.

1.2 Problem Statement

In order to explore and work in the watery environments that cover the vast majority of our planet, it is necessary to have a reliable way to image and map general scenes underwater. While several methods exist to process sonar images, most require a planar assumption about the scene. How can we extend our understanding of sonar images to encompass general scenes for 3D mapping and AUV navigation? Our work makes the following contributions:

  1. We present a novel method, acoustic structure from motion, for localization and general-scene 3D mapping underwater using sonar.
  2. We present a novel automatic data association algorithm for sonar images.
  3. We demonstrate our ability to localize the sonar and recover 3D structure from simulation and real data sequences.

2 Related Work

2.1 Localization and 3D Reconstruction from Imaging Sonar

Various other works have explored different ways to localize the AUV from sonar images, but most current methods require a planar scene assumption. Johannsson et al. [12] and Hover et al. [9] extract points with high gradients from the sonar image and cluster the points to use as features. Next, a normal distribution transform algorithm is applied to serve as a model for image registration. The entire trajectory of the AUV is put into a pose-graph smoothing algorithm, and the optimized trajectory shows significant improvements over dead reckoning from the Doppler Velocity Log (DVL). However, to solve the ambiguity in elevation of the points presented by sonar, the points are assumed to lie on a plane that is level with the vehicle. This planar assumption works well for the non-complex areas of a ship hull, the main application of their work, but induces large errors for many other environments. ASFM does not require this assumption, making it useful for a wider range of applications. Hurtos et al. [11] explore a different approach, using Fourier-based techniques instead of feature points for registration. However, the authors primarily focus on applications in 2D mapping, so do not address 3D geometry in detail.

To recover 3D geometry of a scene using imaging sonar, most techniques employ a pairwise registration approach. Babaee et al. [4] use a stereo imaging system composed of one sonar and one optical camera where the centers of the two sensors’ coordinate systems and their axes align. The trajectory of the stereo system is calculated using opti-acoustic bundle adjustment. Assalih [1] once again exploits the stereo idea, but instead uses two imaging sonars placed one on top of the other. In contrast, ASFM requires only one sensor and water turbidity is not an issue because no optical cameras are involved. Our work is more similar to Brahim et al. [5] where point-based features are used with evolutionary algorithms to recover 3D geometry from pairs of sonar frames. Unlike Assalih and Brahim, however, ASFM is capable of using information from multiple viewpoints as opposed to only pairs of images. Multiple viewpoints add more information and can further constrain the problem to result in more accurate reconstruction than pairwise matching. Aykin and Negahdaripour [3] relax the planar assumption for pairwise matching of sonar frames but still assume a locally planar surface in order to include shadow information. They show improvements over Johannsson [12] by instead applying a Gaussian distribution transform to the images. Negahdaripour [16] extends this

3 Sonar Geometry

Figure 3.1: Imaging sonar geometry. Any 3D point along the dashed red elevation arc will appear as the same image point in the xy plane. Bearing angle ψ and range r are measured, but the elevation angle θ is lost in the projection process.

Before discussing ASFM further, it is important to understand the information provided in a FLS sonar image. The imaging sonar sends out an acoustic ping and measures the intensity of acoustic waves reflected back from objects inside of a frustum defined by the sonar’s bearing field of view (FOV) (deg), elevation FOV (deg), and minimum and maximum range (m). The returns from one ping are put together to form an intensity image, where each pixel represents a bearing and range bin, discretized per the specifications of the sonar. As seen in Fig. 3.1, the sonar only provides partial information about a feature (bearing ψ and range r ) and does not provide its elevation angle θ. In a 1-D array of receivers, which is typical for a FLS, the difference between the time it takes for one receiver to detect a signal and another receiver to detect the same signal denotes the bearing of the feature. The range is determined by the time of flight of the sound wave. The elevation of the point is lost, as all points along an elevation arc will collapse to the same pixel in the sonar image. Since one dimension of the feature is missing, one sonar image is not sufficient to recover 3D geometry.
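The elevation ambiguity can be checked numerically with a short sketch. It assumes the spherical convention of the backprojection function in Eq. 4.4 (bearing ψ in the xy plane, elevation θ out of it) and a measurement model that keeps only range and bearing; the function names and sample values are illustrative and not taken from the thesis.

```python
import math

def arc_point(r, psi, theta):
    """3D point in the sonar frame at range r, bearing psi, elevation theta (radians)."""
    return (r * math.cos(psi) * math.cos(theta),
            r * math.sin(psi) * math.cos(theta),
            r * math.sin(theta))

def project_to_sonar(x, y, z):
    """Assumed measurement model: keep range and bearing, drop elevation."""
    r = math.sqrt(x * x + y * y + z * z)
    psi = math.atan2(y, x)
    return r, psi

# Sample one elevation arc (hypothetical +/- 7 degree aperture).
r_true, psi_true = 4.0, math.radians(10.0)
measurements = [project_to_sonar(*arc_point(r_true, psi_true, math.radians(t)))
                for t in (-7.0, -3.5, 0.0, 3.5, 7.0)]

# Every point on the arc yields the same (range, bearing) pair.
for r, psi in measurements:
    print(round(r, 6), round(math.degrees(psi), 6))
```

Because all five points collapse to one measurement, a single image cannot tell them apart, which is why several views are needed.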

3.1 Cartesian to Polar Transformation

In all of the imaging sonar data sequences we use for our experiments, features are extracted from the Cartesian sonar image. The original polar (bearing/range) image returned by the sonar is converted to a Cartesian sonar image by finding and solving an analytic function that describes the mapping between the pixels of a Cartesian image of a given width in the sonar frame and the corresponding pixels for the polar image in the sonar frame. This mapping is also used to convert features in the Cartesian image to bearing/range measurements. All points in the sonar field of view are projected along a circular arc onto the plane of the sonar, so points returned by the sonar can lie anywhere along an arc spanning the vertical aperture of the sensor. Note that this projection implies that in the sonar point of view, all points have zs = 0. The following equations describe the mapping between the Cartesian image coordinates ( u, v ) and the polar image bearing and range bin ( nb, nr ).

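$$\gamma = \frac{w}{2\, r_{max} \sin\left(\frac{\psi_{max}}{2}\right)} \tag{3.1}$$

$$x_s = \frac{u - \frac{w}{2}}{\gamma} \tag{3.2}$$

$$y_s = r_{max} - \frac{v}{\gamma} \tag{3.3}$$

$$r = \sqrt{x_s^2 + y_s^2} \tag{3.4}$$

$$\psi = \frac{180}{\pi}\, \operatorname{atan2}(x_s, y_s) \tag{3.5}$$

$$n_r = \frac{N_r\,(r - r_{min})}{r_{max} - r_{min}} \tag{3.6}$$

$$n_b = M_4(N_b, \psi) \tag{3.7}$$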

where $\gamma$ is a constant, $w$ is the width of the Cartesian image in pixels, $r_{min}$ is the minimum range of the sonar, $r_{max}$ is the maximum range of the sonar, $N_r$ is the number of range bins, $\psi_{max}$ is the bearing field of view of the sonar, and $M_4(N_b, \psi)$ is a third-order polynomial (with 4 coefficients determined by the number of bearing bins ($N_b$)) given by the sonar manufacturer that accounts for lens distortion. In our experiments with the Sound Metrics DIDSON 300 m sonar, $w = 200$ pixels, $r_{min} = 0.75$ m, $r_{max} = 5.25$ m, $N_r = 512$, $N_b = 96$, and $\psi_{max} = 28.8°$. The bearing and range bins are then converted to bearing and range measure-
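As a rough illustration, the mapping from a Cartesian pixel to range, bearing, and range bin (Eqs. 3.1 to 3.6) can be sketched as follows. The minus signs in Eqs. 3.2, 3.3, and 3.6 and the factor 180/π in Eq. 3.5 are read from the geometry and are assumptions here; the function name `pixel_to_polar` is hypothetical. The bearing-bin polynomial $M_4$ is left out because its manufacturer-supplied coefficients are not given in the text.

```python
import math

def pixel_to_polar(u, v, w=200, r_min=0.75, r_max=5.25,
                   n_range_bins=512, psi_max_deg=28.8):
    """Map Cartesian image pixel (u, v) to sonar-plane coordinates, range,
    bearing (degrees), and fractional range bin. Defaults are the DIDSON
    values quoted in the text."""
    # Pixels per meter: the image width spans the chord at maximum range.
    gamma = w / (2.0 * r_max * math.sin(math.radians(psi_max_deg) / 2.0))
    x_s = (u - w / 2.0) / gamma          # lateral offset from the image center
    y_s = r_max - v / gamma              # row 0 is assumed to lie at maximum range
    r = math.hypot(x_s, y_s)
    psi_deg = math.degrees(math.atan2(x_s, y_s))
    n_r = n_range_bins * (r - r_min) / (r_max - r_min)
    return x_s, y_s, r, psi_deg, n_r

# Center column of the top row: zero bearing at maximum range.
center_top = pixel_to_polar(100, 0)
print(center_top)
```

Under these assumptions the top center pixel lands at zero bearing, maximum range, and the last range bin, which is a quick sanity check on the sign conventions.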

3.3 Arc Reprojections

[Figure 3.2 panels, each showing Pose 2 with axes X (m) and Y (m): (a) Pitch −90°, (b) Forward x, (c) Roll 45°]

Figure 3.2: Elevation arc reprojections (green points) for (a) −90° pitch, (b) forward x, and (c) 45° roll motion. The magenta diamond is the true measurement of the 3D point in the current pose.

that simple. The examples shown in Fig. 3.2 demonstrate several different possible geometries resulting from the elevation arc of one point reprojected into another sonar image. For −90° pitch (accompanied by forward x and upward z motion so that the sonar FOVs would overlap), one could imagine that instead of the sonar rotating, the elevation arc pitches −90° in the viewpoint of the new sonar frame. Consequently, the elevation arc becomes a distribution of 3D points that looks like a hill at similar bearing but different ranges. Mapped onto the 2D sonar image, this looks like a nearly vertical line, as evidenced by Fig. 3.2a. For the forward x motion case, the top half of the elevation arc above 0 elevation will map to the same points as the bottom half of the elevation arc, so we see a small vertical line, which should be proportional to the arc’s curvature. In this example (Fig. 3.2b) the curvature was quite small, resulting in a very short vertical line. Finally, for the 45° roll example, one could once again imagine that instead of the sonar rotating, the elevation arc rolls 45° in the viewpoint of the new sonar frame. The resulting arc (Fig. 3.2c) is now a horizontal arc instead of a vertical arc, and it is shorter than the original arc. The new horizontal arc would be the same length as the original arc if we had rolled 90° instead. From these examples, it is clear that the reprojection of one point into another sonar image does not result in a simple geometry that can be easily exploited. The elevation arc reprojections can appear as many different geometries depending on the motion between sonar poses.
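Two of these cases can be reproduced with a small sketch that samples an elevation arc in the first sonar frame and reprojects it into a second frame. It assumes the spherical convention of Eq. 4.4, a range/bearing measurement model, and roll defined as a rotation about the sonar x axis; the aperture, range, bearing, and motion values are made up for illustration.

```python
import math

def arc_point(r, psi, theta):
    """Point on the elevation arc in frame 1 (angles in radians)."""
    return (r * math.cos(psi) * math.cos(theta),
            r * math.sin(psi) * math.cos(theta),
            r * math.sin(theta))

def to_second_frame(p, roll=0.0, t=(0.0, 0.0, 0.0)):
    """Express p in frame 2, which is frame 1 translated by t and rolled about x."""
    x, y, z = p[0] - t[0], p[1] - t[1], p[2] - t[2]
    c, s = math.cos(roll), math.sin(roll)
    # Multiply by the transpose of the roll rotation matrix.
    return (x, c * y + s * z, -s * y + c * z)

def project(p):
    """Assumed measurement model: (range, bearing)."""
    return (math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2), math.atan2(p[1], p[0]))

def reproject_arc(r, psi, thetas, roll=0.0, t=(0.0, 0.0, 0.0)):
    return [project(to_second_frame(arc_point(r, psi, th), roll, t)) for th in thetas]

thetas = [math.radians(d) for d in (-7.0, -3.5, 0.0, 3.5, 7.0)]  # hypothetical aperture
forward = reproject_arc(4.0, math.radians(10.0), thetas, t=(1.0, 0.0, 0.0))
rolled = reproject_arc(4.0, math.radians(10.0), thetas, roll=math.radians(45.0))

for name, arc in (("forward x", forward), ("roll 45 deg", rolled)):
    print(name, [(round(r, 4), round(math.degrees(b), 3)) for r, b in arc])
```

In this sketch the forward-x arc is symmetric about zero elevation (the ±θ samples coincide), while the rolled arc keeps a constant range and spreads out in bearing, in line with the description above.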

4 Acoustic Structure from Motion

ASFM is inspired by a related problem in computer vision called structure from motion (SFM), which uses multiple camera images of a scene to recover 3D geometry as well as camera locations [8]. Much of the high-level formulation of the two problems is similar because, like sonar images, camera images only give 2D information about the scene. However, a critical difference between the two sensors highlights the novelty and challenges of ASFM. Cameras provide elevation and bearing of a feature, but not the depth, while, as mentioned before, sonars provide bearing and depth, but not elevation. This difference implies that new sensor models, parameterizations, and degenerate cases will have to be explored before ASFM can be used successfully.

4.1 Pose Graph Formulation

Figure 4.1: Factor graph representation of the acoustic structure from motion problem. Variable nodes consist of the underwater vehicle poses xi and the point features lj. The black dots represent factor nodes, which are derived from odometry measurements ui and feature observations mk. The unary factor p represents a prior on the first pose that defines the reference frame.

We represent the ASFM problem as a factor graph [13] (Fig. 4.1). A factor graph is a bipartite graph with two node types: variable nodes that represent the poses xi and landmarks lj to be estimated, and factor nodes that represent odometry ui and point feature sonar measurements mk. An edge in the factor graph connects one factor node with one variable node. Here, almost all factors are binary, i.e. they connect only two variables. Only one factor, p, is unary, and it is a prior that defines a reference frame, eliminating otherwise unconstrained degrees of freedom. The factor graph was chosen because it captures the underlying dependence structure of the ASFM estimation problem. Since the measurements ui and mk are known, they are represented as factors of the joint probability over the unknowns, the variable nodes xi and lj. In fact, the goal of ASFM is to find the maximum
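The structure of such a graph can be sketched with plain Python containers. This is only a bookkeeping illustration under assumed names (`build_factor_graph` and the observation list are hypothetical), not the representation used in the thesis or in any particular library.

```python
def build_factor_graph(n_poses, observations):
    """Return a list of (factor_name, [variable_names]).

    observations: list of (pose_index, landmark_index) pairs, one per sonar
    point measurement. Each (factor, variable) pair is one graph edge.
    """
    factors = [("p", ["x0"])]                       # unary prior on the first pose
    for i in range(1, n_poses):                     # odometry between consecutive poses
        factors.append((f"u{i}", [f"x{i - 1}", f"x{i}"]))
    for k, (pose_idx, lm_idx) in enumerate(observations, start=1):
        factors.append((f"m{k}", [f"x{pose_idx}", f"l{lm_idx}"]))
    return factors

# Three poses observing two landmarks (made-up association).
graph = build_factor_graph(3, [(0, 1), (1, 1), (1, 2), (2, 2)])
unary = [name for name, variables in graph if len(variables) == 1]
n_edges = sum(len(variables) for _, variables in graph)
print(unary, n_edges)
```

Only the prior touches a single variable; every odometry and measurement factor touches exactly two.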

4.2 Nonlinear Least-Squares

Here we have made use of the monotonicity of the logarithm function. We find an initial estimate for the feature points by backprojection of the sonar measurements. We use the first observation of each feature, consisting of a range r and bearing ψ measurement. We apply the backprojection function

$$\begin{bmatrix} x_s \\ y_s \\ z_s \end{bmatrix} = r \begin{bmatrix} \cos\psi \cos\theta \\ \sin\psi \cos\theta \\ \sin\theta \end{bmatrix} \tag{4.4}$$

where we set the unknown elevation angle $\theta$ to 0. The sonar pose $x_i$ is then used to convert the point from sonar Cartesian coordinates $(x_s, y_s, z_s)$ to world Cartesian coordinates $(x_g, y_g, z_g)$, which serve as initial guesses for the 3D position of the features.

Starting from this initial estimate, the nonlinear least-squares problem is solved by iterative linearization. For nonlinear measurement functions, nonlinear optimization methods such as Gauss-Newton or the Levenberg-Marquardt algorithm solve a succession of linear approximations in order to approach the minimum. At each iteration of the nonlinear solver, we linearize around the current estimate $\Theta$ to get a new, linear least-squares problem in $\Delta$

$$\operatorname*{argmin}_{\Delta} \left\| A\Delta - b \right\|^2, \tag{4.5}$$

where $A \in \mathbb{R}^{U \times V}$ is the measurement Jacobian consisting of $U = 6N + 2M$ measurement rows, and $\Delta$ is a $V$-dimensional vector, where $V = 6N + 3M$. Note that each odometry measurement has 6 degrees of freedom (DOF) and each sonar measurement has 2, while each vehicle pose has 6 DOF and each landmark has 3 DOF. Note that the covariances $\Sigma_i$, which represent covariances such as $\Lambda_i$ and $\Xi_k$ in Eq. 4.3, have been absorbed into the corresponding block rows of $A$, making use of

$$\|e\|^2_{\Sigma} = e^T \Sigma^{-1} e = e^T \Sigma^{-\frac{T}{2}} \Sigma^{-\frac{1}{2}} e = \left\| \Sigma^{-\frac{1}{2}} e \right\|^2. \tag{4.6}$$
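For a diagonal covariance, the identity in Eq. 4.6 reduces to dividing each residual component by its standard deviation. The sketch below checks this on made-up numbers; the residual vector and variances are purely illustrative.

```python
import math

def mahalanobis_sq(residual, variances):
    """Squared Mahalanobis norm for a diagonal covariance."""
    return sum(e * e / var for e, var in zip(residual, variances))

def whiten(residual, variances):
    """Scale each component by the inverse standard deviation."""
    return [e / math.sqrt(var) for e, var in zip(residual, variances)]

residual = [1.0, 2.0]       # hypothetical measurement residual
variances = [4.0, 0.25]     # hypothetical diagonal of the covariance

lhs = mahalanobis_sq(residual, variances)
rhs = sum(w * w for w in whiten(residual, variances))
print(lhs, rhs)
```

Both sides evaluate to the same number, which is what allows the covariances to be folded into the rows of the Jacobian and right-hand side before solving an ordinary least-squares problem.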

Once $\Delta$ is found, the new estimate is given by $\Theta \oplus \Delta$, which is then used as the linearization point in the next iteration of the nonlinear optimization. The operator $\oplus$ is often simple addition, but for overparametrized quantities such as 3D rotations, an exponential map is used instead to locally obtain a minimal representation. The minimum of the linear system $A\Delta - b$ is obtained by Cholesky factorization. By setting the derivative in $\Delta$ to zero we obtain the normal equations $A^T A \Delta = A^T b$. Cholesky factorization yields $A^T A = R^T R$, and a forward and back-substitution on $R^T y = A^T b$ and $R\Delta = y$ first recovers $y$, then the actual solution, the update $\Delta$.
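The linear solve described here (normal equations, Cholesky factor, forward substitution, then back-substitution) can be sketched in plain Python on a small dense system. This is a toy illustration with hypothetical function names and a made-up 3×2 system; a real ASFM problem would use a sparse solver.

```python
import math

def cholesky_upper(S):
    """Upper-triangular R with R^T R = S, for symmetric positive-definite S."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = S[i][j] - sum(R[k][i] * R[k][j] for k in range(i))
            R[i][j] = math.sqrt(s) if i == j else s / R[i][i]
    return R

def solve_normal_equations(A, b):
    """Minimize ||A d - b||^2 via A^T A d = A^T b and a Cholesky factor."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    R = cholesky_upper(AtA)
    # Forward substitution on R^T y = A^T b (R^T is lower triangular).
    y = [0.0] * n
    for i in range(n):
        y[i] = (Atb[i] - sum(R[k][i] * y[k] for k in range(i))) / R[i][i]
    # Back-substitution on R d = y.
    d = [0.0] * n
    for i in reversed(range(n)):
        d[i] = (y[i] - sum(R[i][k] * d[k] for k in range(i + 1, n))) / R[i][i]
    return d

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # made-up Jacobian
b = [1.0, 2.0, 3.0]                        # made-up right-hand side
delta = solve_normal_equations(A, b)
print(delta)
```

In a Gauss-Newton loop this update would be applied to the current estimate and the system relinearized until the update becomes negligible.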

4.3 Existence of Solution

We discuss under which conditions the system of equations is solvable by analyzing the number of feature points that need to be observed to fully constrain the system. Let $N$ be the number of poses, and $M$ be the number of points to reconstruct. For every pose, there are 6 unknowns (x, y, z, yaw, pitch, roll) and for every point there are 3 unknowns (x, y, z). The first pose is fixed using a prior, so there are 0 degrees of freedom for the first pose. In the case where all features are visible from each pose, there are $2N$ equations for each point, and the system is fully constrained iff:

$$6(N - 1) + 3M \le 2MN \tag{4.7}$$

Since we are not restricted to pairs of sonar views, our simulated examples in later sections use information from 3 sonar viewpoints. From Eq. 4.7 we see that for 3 sonar views, a minimum of 4 points are needed to fully constrain the estimation problem. In our real sonar data experiments, features from 5 poses are used; thus, a minimum of 4 points are needed to make 3D reconstruction possible.
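The point counts quoted above follow from rearranging Eq. 4.7 to $M \ge 6(N-1)/(2N-3)$, which holds for $N \ge 2$. A small helper (the name `min_points` is hypothetical) makes the arithmetic explicit:

```python
def min_points(n_poses):
    """Smallest integer M with 6(N - 1) + 3M <= 2MN, assuming every point
    is observed from every pose."""
    if n_poses < 2:
        raise ValueError("at least two poses are needed")
    numerator = 6 * (n_poses - 1)
    denominator = 2 * n_poses - 3
    return -(-numerator // denominator)   # integer ceiling division

for n in (2, 3, 5):
    print(n, min_points(n))
```

For 3 views the bound is exactly 4; for 5 views it is 24/7 ≈ 3.43, which rounds up to 4 as well.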

4.4 Relative Parameterization

[Figure 4.2 graph nodes: prior p; poses x0, x1, ..., xn-1, xn; landmarks l1, l2; odometry factors u1, ..., un; measurement factors m1, m2, m3, m4]

Figure 4.2: The SLAM factor graph using a relative parameterization. All of the landmark measurements are represented relative to the first sonar pose that has seen that landmark.

Depending on the shape of the optimization function and the quality of the initial estimate, the nonlinear optimization can take a long time to converge. In ASFM, this problem is exacerbated by complicated posterior densities created from the parameterization of the landmarks in Cartesian coordinates. As the elevation of a landmark is being optimized, the landmark must move along an elevation arc, which is nonlinear. Additionally, three Cartesian coordinates have to be changed each time the landmark is moved. A similar issue exists in optical SFM, and to improve convergence properties, homogeneous coordinates were introduced as a solution. Along the same lines, we explore an alternative parameterization of

5 Automatic Data Association

5.1 Data Association Challenges

Unlike camera images, sonar images are much less intuitive to understand and interpret. An example is given in Fig. 5.1. Assume the AUV is imaging a stair-like structure underwater and we have manually picked out some point features that intuition would lead us to believe are stable, like the corners along one edge of the stairs. Since the vertical axis of the image denotes distance along the viewing axis of the sonar and the blue feature appears to be closest to the sonar, the blue point appears as the bottom-most feature in the sonar image. The next closest point to the sonar looks to be the red feature, then the green, then the purple. Note that just looking at the final ordering of the feature points in the sonar image does not give a helpful indication of the true 3D structure. From only the sonar image, it would be almost impossible to tell that in 3D, the blue feature point is in fact between the green and the purple feature points.

To confuse data association further, moving the sonar angle changes the ordering of the feature points because the distance between the features and the sonar changes. Therefore, in (b) of Fig. 5.1, the sonar moves and the resulting sonar image contains a different ordering of feature points. In this case, the sonar moves closer to the red feature point and further from the blue feature point. Consequently, the blue and red features switch places in the second image. Without knowing the exact motion of the AUV, even manually assigning feature correspondences becomes difficult. It would be very challenging to correctly associate the blue feature point in the first image to the blue feature point in the second image.
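The reordering effect can be illustrated with a toy computation that sorts features by their range from two sonar positions. The coordinates below are invented for the sketch and are not the scene of Fig. 5.1; only translation is modeled.

```python
import math

# Hypothetical 3D feature positions in a world frame (meters).
features = {
    "blue": (2.0, 0.0, -1.0),
    "red": (2.5, 0.0, 0.0),
    "green": (3.0, 0.0, 0.5),
    "purple": (3.5, 0.0, 1.0),
}

def range_order(sonar_position):
    """Feature names sorted by distance from the sonar, nearest first."""
    return sorted(features, key=lambda name: math.dist(features[name], sonar_position))

order_a = range_order((0.0, 0.0, 0.0))   # first sonar position
order_b = range_order((0.3, 0.0, 0.8))   # sonar moved forward and up
print(order_a)
print(order_b)
```

With these made-up numbers the blue and red features swap places along the range axis after the move, even though nothing in the scene changed, which is the kind of ambiguity a data association method has to cope with.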

Figure 5.1: Data association for imaging sonar presents several challenges. First, sonar images are very non-intuitive to interpret. The representation of structure in the image does not follow the visual image projection that we are familiar with in camera images. This can be seen in (a) where the order of the colored points does not agree with our intuition based on visual imagery. Second, different angles of sonar viewing could produce similar images but with different correspondences between feature points and real 3D points. The difference between (a) and (b) serves as an example. This complication makes even manual data association difficult.