Mathematical Methods for Computer
Vision, Robotics, and Graphics
Course notes for CS 205A, Fall 2013
Justin Solomon
Department of Computer Science
Stanford University


  • I Preliminaries
  • 0 Mathematics Review
    • 0.1 Preliminaries: Numbers and Sets
    • 0.2 Vector Spaces
      • 0.2.1 Defining Vector Spaces
      • 0.2.2 Span, Linear Independence, and Bases
      • 0.2.3 Our Focus: $\mathbb{R}^n$
    • 0.3 Linearity
      • 0.3.1 Matrices
      • 0.3.2 Scalars, Vectors, and Matrices
      • 0.3.3 Model Problem: $A\vec{x} = \vec{b}$
    • 0.4 Non-Linearity: Differential Calculus
      • 0.4.1 Differentiation
      • 0.4.2 Optimization
    • 0.5 Problems
  • 1 Numerics and Error Analysis
    • 1.1 Storing Numbers with Fractional Parts
      • 1.1.1 Fixed Point Representations
      • 1.1.2 Floating Point Representations
      • 1.1.3 More Exotic Options
    • 1.2 Understanding Error
      • 1.2.1 Classifying Error
      • 1.2.2 Conditioning, Stability, and Accuracy
    • 1.3 Practical Aspects
      • 1.3.1 Larger-Scale Example: Summation
    • 1.4 Problems
  • II Linear Algebra
  • 2 Linear Systems and the LU Decomposition
    • 2.1 Solvability of Linear Systems
    • 2.2 Ad-Hoc Solution Strategies
    • 2.3 Encoding Row Operations
      • 2.3.1 Permutation
      • 2.3.2 Row Scaling
      • 2.3.3 Elimination
    • 2.4 Gaussian Elimination
      • 2.4.1 Forward Substitution
      • 2.4.2 Back Substitution
      • 2.4.3 Analysis of Gaussian Elimination
    • 2.5 LU Factorization
      • 2.5.1 Constructing the Factorization
      • 2.5.2 Implementing LU
    • 2.6 Problems
  • 3 Designing and Analyzing Linear Systems
    • 3.1 Solution of Square Systems
      • 3.1.1 Regression
      • 3.1.2 Least Squares
      • 3.1.3 Additional Examples
    • 3.2 Special Properties of Linear Systems
      • 3.2.1 Positive Definite Matrices and the Cholesky Factorization
      • 3.2.2 Sparsity
    • 3.3 Sensitivity Analysis
      • 3.3.1 Matrix and Vector Norms
      • 3.3.2 Condition Numbers
    • 3.4 Problems
  • 4 Column Spaces and QR
    • 4.1 The Structure of the Normal Equations
    • 4.2 Orthogonality
      • 4.2.1 Strategy for Non-Orthogonal Matrices
    • 4.3 Gram-Schmidt Orthogonalization
      • 4.3.1 Projections
      • 4.3.2 Gram-Schmidt Orthogonalization
    • 4.4 Householder Transformations
    • 4.5 Reduced QR Factorization
    • 4.6 Problems
  • 5 Eigenvectors
    • 5.1 Motivation
      • 5.1.1 Statistics
      • 5.1.2 Differential Equations
    • 5.2 Spectral Embedding
    • 5.3 Properties of Eigenvectors
      • 5.3.1 Symmetric and Positive Definite Matrices
      • 5.3.2 Specialized Properties
    • 5.4 Computing Eigenvalues
      • 5.4.1 Power Iteration
      • 5.4.2 Inverse Iteration
      • 5.4.3 Shifting
      • 5.4.4 Finding Multiple Eigenvalues
    • 5.5 Sensitivity and Conditioning
    • 5.6 Problems
  • 6 Singular Value Decomposition
    • 6.1 Deriving the SVD
      • 6.1.1 Computing the SVD
    • 6.2 Applications of the SVD
      • 6.2.1 Solving Linear Systems and the Pseudoinverse
      • 6.2.2 Decomposition into Outer Products and Low-Rank Approximations
      • 6.2.3 Matrix Norms
      • 6.2.4 The Procrustes Problem and Alignment
      • 6.2.5 Principal Components Analysis (PCA)
    • 6.3 Problems
  • III Nonlinear Techniques
  • 7 Nonlinear Systems
    • 7.1 Single-Variable Problems
      • 7.1.1 Characterizing Problems
      • 7.1.2 Continuity and Bisection
      • 7.1.3 Analysis of Root-Finding
      • 7.1.4 Fixed Point Iteration
      • 7.1.5 Newton’s Method
      • 7.1.6 Secant Method
      • 7.1.7 Hybrid Techniques
      • 7.1.8 Single-Variable Case: Summary
    • 7.2 Multivariable Problems
      • 7.2.1 Newton’s Method
      • 7.2.2 Making Newton Faster: Quasi-Newton and Broyden
    • 7.3 Conditioning
    • 7.4 Problems
  • 8 Unconstrained Optimization
    • 8.1 Unconstrained Optimization: Motivation
    • 8.2 Optimality
      • 8.2.1 Differential Optimality
      • 8.2.2 Optimality via Function Properties
    • 8.3 One-Dimensional Strategies
      • 8.3.1 Newton’s Method
      • 8.3.2 Golden Section Search
    • 8.4 Multivariable Strategies
      • 8.4.1 Gradient Descent
      • 8.4.2 Newton’s Method
      • 8.4.3 Optimization without Derivatives: BFGS
    • 8.5 Problems
  • 9 Constrained Optimization
    • 9.1 Motivation
    • 9.2 Theory of Constrained Optimization
    • 9.3 Optimization Algorithms
      • 9.3.1 Sequential Quadratic Programming (SQP)
      • 9.3.2 Barrier Methods
    • 9.4 Convex Programming
    • 9.5 Problems
  • 10 Iterative Linear Solvers
    • 10.1 Gradient Descent
      • 10.1.1 Deriving the Iterative Scheme
      • 10.1.2 Convergence
    • 10.2 Conjugate Gradients
      • 10.2.1 Motivation
      • 10.2.2 Suboptimality of Gradient Descent
      • 10.2.3 Generating A-Conjugate Directions
      • 10.2.4 Formulating the Conjugate Gradients Algorithm
      • 10.2.5 Convergence and Stopping Conditions
    • 10.3 Preconditioning
      • 10.3.1 CG with Preconditioning
      • 10.3.2 Common Preconditioners
    • 10.4 Other Iterative Schemes
    • 10.5 Problems
  • IV Functions, Derivatives, and Integrals
  • 11 Interpolation
    • 11.1 Interpolation in a Single Variable
      • 11.1.1 Polynomial Interpolation
      • 11.1.2 Alternative Bases
      • 11.1.3 Piecewise Interpolation
      • 11.1.4 Gaussian Processes and Kriging
    • 11.2 Multivariable Interpolation
    • 11.3 Theory of Interpolation
      • 11.3.1 Linear Algebra of Functions
      • 11.3.2 Approximation via Piecewise Polynomials
    • 11.4 Problems
  • 12 Numerical Integration and Differentiation
    • 12.1 Motivation
    • 12.2 Quadrature
      • 12.2.1 Interpolatory Quadrature
      • 12.2.2 Quadrature Rules
      • 12.2.3 Newton-Cotes Quadrature
      • 12.2.4 Gaussian Quadrature
      • 12.2.5 Adaptive Quadrature
      • 12.2.6 Multiple Variables
      • 12.2.7 Conditioning
    • 12.3 Differentiation
      • 12.3.1 Differentiating Basis Functions
      • 12.3.2 Finite Differences
      • 12.3.3 Choosing the Step Size
      • 12.3.4 Integrated Quantities
    • 12.4 Problems
  • 13 Ordinary Differential Equations
    • 13.1 Motivation
    • 13.2 Theory of ODEs
      • 13.2.1 Basic Notions
      • 13.2.2 Existence and Uniqueness
      • 13.2.3 Model Equations
    • 13.3 Time-Stepping Schemes
      • 13.3.1 Forward Euler
      • 13.3.2 Backward Euler
      • 13.3.3 Trapezoidal Method
      • 13.3.4 Runge-Kutta Methods
      • 13.3.5 Exponential Integrators
    • 13.4 Multivalue Methods
      • 13.4.1 Newmark Schemes
      • 13.4.2 Staggered Grid
    • 13.5 To Do
    • 13.6 Problems
  • 14 Partial Differential Equations
    • 14.1 Motivation
    • 14.2 Basic definitions
    • 14.3 Model Equations
      • 14.3.1 Elliptic PDEs
      • 14.3.2 Parabolic PDEs
      • 14.3.3 Hyperbolic PDEs
    • 14.4 Derivatives as Operators
    • 14.5 Solving PDEs Numerically
      • 14.5.1 Solving Elliptic Equations
      • 14.5.2 Solving Parabolic and Hyperbolic Equations
    • 14.6 Method of Finite Elements
    • 14.7 Examples in Practice
      • 14.7.1 Gradient Domain Image Processing
      • 14.7.2 Edge-Preserving Filtering
      • 14.7.3 Grid-Based Fluids
    • 14.8 To Do
    • 14.9 Problems

Chapter 0

Mathematics Review

In this chapter we will review relevant notions from linear algebra and multivariable calculus that will figure into our discussion of computational techniques. It is intended as a review of background material with a bias toward ideas and interpretations commonly encountered in practice; the chapter can safely be skipped or used as a reference by students with a stronger background in mathematics.

0.1 Preliminaries: Numbers and Sets

Rather than considering algebraic (and at times philosophical) discussions like “What is a number?,” we will rely on intuition and mathematical common sense to define a few sets:

  • The natural numbers $\mathbb{N} = \{1, 2, 3, \ldots\}$
  • The integers $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$
  • The rational numbers $\mathbb{Q} = \{a/b : a, b \in \mathbb{Z},\ b \neq 0\}$^1
  • The real numbers $\mathbb{R}$, encompassing $\mathbb{Q}$ as well as irrational numbers like $\pi$ and $\sqrt{2}$
  • The complex numbers $\mathbb{C} = \{a + bi : a, b \in \mathbb{R}\}$, where we think of $i$ as satisfying $i = \sqrt{-1}$

It is worth acknowledging that our definition of $\mathbb{R}$ is far from rigorous. The construction of the real numbers can be an important topic for practitioners of cryptography techniques that make use of alternative number systems, but these intricacies are irrelevant for the discussion at hand.

As with any other sets, $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ can be manipulated using generic operations to generate new sets of numbers. In particular, recall that we can define the “Euclidean product” (more commonly called the Cartesian product) of two sets $A$ and $B$ as $A \times B = \{(a, b) : a \in A \text{ and } b \in B\}$.

We can take powers of sets by writing

$$A^n = \underbrace{A \times A \times \cdots \times A}_{n \text{ times}}$$
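For finite sets, products and powers can be enumerated directly. The following is a minimal sketch in Python; the sets `A` and `B` are small made-up examples chosen only for illustration.

```python
from itertools import product

# Two small, hypothetical finite sets used only for illustration.
A = {1, 2}
B = {"x", "y"}

# A x B = {(a, b) : a in A and b in B}
A_times_B = set(product(A, B))

# A^n = A x A x ... x A (n times); here n = 3.
A_cubed = set(product(A, repeat=3))

print(sorted(A_times_B))
print(len(A_cubed))  # |A|^3 = 2^3 = 8
```

Note that the number of elements of $A^n$ is $|A|^n$, which is why the last line prints 8.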

^1 This is the first of many times that we will use the notation $\{A : B\}$; the braces should denote a set and the colon can be read as “such that.” For instance, the definition of $\mathbb{Q}$ can be read as “the set of fractions $a/b$ such that $a$ and $b$ are integers” (with $b$ nonzero). As a second example, we could write $\mathbb{N} = \{n \in \mathbb{Z} : n > 0\}$.

0.2.2 Span, Linear Independence, and Bases

Suppose we start with vectors $\vec{v}_1, \ldots, \vec{v}_k \in V$ for a vector space $V$. By Definition 0.1, we have two ways to start with these vectors and construct new elements of $V$: addition and scalar multiplication. The idea of span is that it describes all of the vectors you can reach via these two operations:

Definition 0.2 (Span). The span of a set S ⊆ V of vectors is the set

$$\mathrm{span}\, S \equiv \{a_1 \vec{v}_1 + \cdots + a_k \vec{v}_k : k \geq 0,\ \vec{v}_i \in S \text{ for all } i, \text{ and } a_i \in \mathbb{R} \text{ for all } i\}.$$

Notice that span S is a subspace of V, that is, a subset of V that is in itself a vector space. We can provide a few examples:

Example 0.3 (Mixology). The typical “well” at a cocktail bar contains at least four ingredients at the bartender’s disposal: vodka, tequila, orange juice, and grenadine. Assuming we have this simple well, we can represent drinks as points in $\mathbb{R}^4$, with one slot for each ingredient. For instance, a typical “tequila sunrise” can be represented using the point (0, 1.5, 6, 0.75), representing amounts of vodka, tequila, orange juice, and grenadine (in ounces), respectively. The set of drinks that can be made with the typical well is contained in

span {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)},

that is, all combinations of the four basic ingredients. A bartender looking to save time, however, might notice that many drinks have the same orange juice to grenadine ratio and mix the bottles. The new simplified well may be easier for pouring but can make fundamentally fewer drinks:

span {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 6, 0.75)}
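To see concretely what "fundamentally fewer drinks" means, the following Python sketch tests whether a drink lies in the span of the simplified well. The function name `simplified_well_coefficients` and the second drink are made up for illustration; exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

def simplified_well_coefficients(drink):
    """Coefficients (c1, c2, c3) such that
    drink = c1*(1,0,0,0) + c2*(0,1,0,0) + c3*(0,0,6,0.75),
    or None if the drink is not in the span of the simplified well."""
    vodka, tequila, oj, grenadine = (Fraction(x) for x in drink)
    c3 = oj / 6                      # the mixed bottle is the only source of OJ
    if grenadine != Fraction(3, 4) * c3:
        return None                  # the grenadine amount is forced by c3
    return (vodka, tequila, c3)

tequila_sunrise = (0, 1.5, 6, 0.75)
extra_grenadine = (0, 1.5, 6, 2)     # hypothetical drink, for illustration only

print(simplified_well_coefficients(tequila_sunrise))
print(simplified_well_coefficients(extra_grenadine))  # None
```

The tequila sunrise is still reachable (1.5 parts tequila plus one part of the mixed bottle), but any drink with a different orange juice to grenadine ratio is not.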

Example 0.4 (Polynomials). Define $p_k(x) \equiv x^k$. Then, it is easy to see that

$$\mathbb{R}[x] = \mathrm{span}\, \{p_k : k \geq 0\}.$$

Make sure you understand notation well enough to see why this is the case.

Adding another item to a set of vectors does not always increase the size of its span. For instance, in R^2 it is clearly the case that

span {(1, 0), (0, 1)} = span {(1, 0), (0, 1), (1, 1)}.

In this case, we say that the set {(1, 0), (0, 1), (1, 1)} is linearly dependent:

Definition 0.3 (Linear dependence). We provide three equivalent definitions. A set S ⊆ V of vectors is linearly dependent if:

  1. One of the elements of S can be written as a linear combination of the other elements, or S contains zero.
  2. There exists a non-empty linear combination of elements $\vec{v}_k \in S$ yielding $\sum_{k=1}^m c_k \vec{v}_k = \vec{0}$ where $c_k \neq 0$ for all $k$.
  3. There exists $\vec{v} \in S$ such that $\mathrm{span}\, S = \mathrm{span}\,(S \setminus \{\vec{v}\})$. That is, we can remove a vector from $S$ without affecting its span.
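For a finite set of vectors in $\mathbb{R}^n$, linear dependence can be tested numerically. The sketch below is one possible approach, not a method prescribed by these notes: it computes the dimension of the span by Gaussian elimination on exact fractions and compares it with the number of vectors. The helper names `rank` and `is_linearly_dependent` are our own.

```python
from fractions import Fraction

def rank(vectors):
    """Dimension of the span of `vectors`, via Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry here.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate this column from every row below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_dependent(vectors):
    """A finite list is dependent when its span has fewer dimensions than vectors."""
    return rank(vectors) < len(vectors)

print(is_linearly_dependent([(1, 0), (0, 1), (1, 1)]))  # True
print(is_linearly_dependent([(1, 0), (0, 1)]))          # False
```

In the spirit of the third characterization, removing (1, 1) leaves the rank at 2, so the span is unaffected.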

If $S$ is not linearly dependent, then we say it is linearly independent. Providing proof or informal evidence that each definition is equivalent to its counterparts (in an “if and only if” fashion) is a worthwhile exercise for students less comfortable with notation and abstract mathematics.

The concept of linear dependence leads to an idea of “redundancy” in a set of vectors. In this sense, it is natural to ask how large a set we can choose before adding another vector cannot possibly increase the span. In particular, suppose we have a linearly independent set $S \subseteq V$, and now we choose an additional vector $\vec{v} \in V$. Adding $\vec{v}$ to $S$ leads to one of two possible outcomes:

  1. The span of $S \cup \{\vec{v}\}$ is larger than the span of $S$.
  2. Adding $\vec{v}$ to $S$ has no effect on the span.

The dimension of $V$ is nothing more than the maximal number of times we can get outcome 1, add $\vec{v}$ to $S$, and repeat.

Definition 0.4 (Dimension and basis). The dimension of $V$ is the maximal size $|S|$ of a linearly independent set $S \subset V$ such that $\mathrm{span}\, S = V$. Any set $S$ satisfying this property is called a basis for $V$.

Example 0.5 ($\mathbb{R}^n$). The standard basis for $\mathbb{R}^n$ is the set of vectors of the form

$$\vec{e}_k \equiv (\underbrace{0, \ldots, 0}_{k-1 \text{ slots}}, 1, \underbrace{0, \ldots, 0}_{n-k \text{ slots}}).$$

That is, $\vec{e}_k$ has all zeros except for a single one in the $k$-th slot. It is clear that these vectors are linearly independent and form a basis; for example in $\mathbb{R}^3$ any vector $(a, b, c)$ can be written as $a\vec{e}_1 + b\vec{e}_2 + c\vec{e}_3$. Thus, the dimension of $\mathbb{R}^n$ is $n$, as we would expect.

Example 0.6 (Polynomials). It is clear that the set $\{1, x, x^2, x^3, \ldots\}$ is a linearly independent set of polynomials spanning $\mathbb{R}[x]$. Notice that this set is infinitely large, and thus the dimension of $\mathbb{R}[x]$ is $\infty$.
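The standard basis of Example 0.5 is easy to construct in code. In this small sketch the helper names `standard_basis_vector` and `combine`, as well as the coefficients, are illustrative choices rather than anything defined in the notes.

```python
def standard_basis_vector(k, n):
    """e_k in R^n: all zeros except a single one in the k-th slot (1-indexed)."""
    return tuple(1 if i == k else 0 for i in range(1, n + 1))

def combine(coeffs, vectors):
    """Linear combination sum_k coeffs[k] * vectors[k]."""
    n = len(vectors[0])
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(n))

e = [standard_basis_vector(k, 3) for k in (1, 2, 3)]
a, b, c = 2, -1, 5  # arbitrary illustrative coefficients

# (a, b, c) = a*e_1 + b*e_2 + c*e_3
print(combine((a, b, c), e))  # (2, -1, 5)
```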

0.2.3 Our Focus: $\mathbb{R}^n$

Of particular importance for our purposes is the vector space $\mathbb{R}^n$, the so-called n-dimensional Euclidean space. This is nothing more than the set of coordinate axes encountered in high school math classes:

  • $\mathbb{R}^1 \equiv \mathbb{R}$ is the number line
  • $\mathbb{R}^2$ is the two-dimensional plane with coordinates $(x, y)$
  • $\mathbb{R}^3$ represents three-dimensional space with coordinates $(x, y, z)$

Nearly all methods in this course will deal with transformations and functions on $\mathbb{R}^n$. For convenience, we usually write vectors in $\mathbb{R}^n$ in “column form,” as follows:

$$(a_1, \ldots, a_n) \equiv \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$

Aside 0.1. There are many theoretical questions to ponder here, some of which we will address in future chapters when they are more motivated:

  • Do all vector spaces admit dot products or similar structures?
  • Do all finite-dimensional vector spaces admit dot products?
  • What might be a reasonable dot product between elements of R [x]?

Intrigued students can consult texts on real and functional analysis.

0.3 Linearity

A function between vector spaces that preserves structure is known as a linear function:

Definition 0.7 (Linearity). Suppose $V$ and $V'$ are vector spaces. Then, $L : V \to V'$ is linear if it satisfies the following two criteria for all $\vec{v}, \vec{v}_1, \vec{v}_2 \in V$ and $c \in \mathbb{R}$:

  • $L$ preserves sums: $L[\vec{v}_1 + \vec{v}_2] = L[\vec{v}_1] + L[\vec{v}_2]$
  • $L$ preserves scalar products: $L[c\vec{v}] = cL[\vec{v}]$

It is easy to generate linear maps between vector spaces, as we can see in the following examples:

Example 0.8 (Linearity in $\mathbb{R}^n$). The following map $f : \mathbb{R}^2 \to \mathbb{R}^3$ is linear:

$$f(x, y) = (3x, 2x + y, -y)$$

We can check linearity as follows:

  • Sum preservation:

$$\begin{aligned} f(x_1 + x_2, y_1 + y_2) &= (3(x_1 + x_2),\ 2(x_1 + x_2) + (y_1 + y_2),\ -(y_1 + y_2)) \\ &= (3x_1,\ 2x_1 + y_1,\ -y_1) + (3x_2,\ 2x_2 + y_2,\ -y_2) \\ &= f(x_1, y_1) + f(x_2, y_2) \end{aligned}$$

  • Scalar product preservation:

$$f(cx, cy) = (3cx,\ 2cx + cy,\ -cy) = c\,(3x,\ 2x + y,\ -y) = c f(x, y)$$

Contrastingly, $g(x, y) \equiv xy^2$ is not linear. For instance, $g(1, 1) = 1$ but $g(2, 2) = 8 \neq 2 \cdot g(1, 1)$, so this form does not preserve scalar products.
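The two criteria can be spot-checked numerically. The sketch below evaluates both maps at a few arbitrarily chosen sample points; passing such checks does not prove linearity, but a single failure, as for $g$, does disprove it. The helpers `add` and `scale` are our own.

```python
def f(x, y):
    return (3 * x, 2 * x + y, -y)

def g(x, y):
    return x * y ** 2

def add(u, v):
    """Componentwise sum of two tuples."""
    return tuple(a + b for a, b in zip(u, v))

def scale(c, u):
    """Scalar multiple of a tuple."""
    return tuple(c * a for a in u)

# Arbitrary sample points and scalar, chosen only for illustration.
(x1, y1), (x2, y2), c = (1, 2), (4, -3), 7

sum_ok = f(x1 + x2, y1 + y2) == add(f(x1, y1), f(x2, y2))
scale_ok = f(c * x1, c * y1) == scale(c, f(x1, y1))
g_scale_ok = g(2 * 1, 2 * 1) == 2 * g(1, 1)

print(sum_ok, scale_ok, g_scale_ok)  # True True False
```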

Example 0.9 (Integration). The following “functional” $L$ from $\mathbb{R}[x]$ to $\mathbb{R}$ is linear:

$$L[p(x)] \equiv \int_0^1 p(x)\,dx.$$

This somewhat more abstract example maps polynomials p(x) to real numbers L[p(x)]. For example, we can write

$$L[3x^2 + x - 1] = \int_0^1 (3x^2 + x - 1)\,dx = \frac{1}{2}.$$

Linearity comes from the following well-known facts from calculus:

$$\int_0^1 c \cdot f(x)\,dx = c \int_0^1 f(x)\,dx$$

$$\int_0^1 [f(x) + g(x)]\,dx = \int_0^1 f(x)\,dx + \int_0^1 g(x)\,dx$$
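As a small illustration of Example 0.9, a polynomial can be stored as its list of coefficients, and $L$ then has a closed form because $\int_0^1 x^k\,dx = 1/(k+1)$. This coefficient-list representation, the helper names, and the second polynomial `q` are assumptions made for the sketch.

```python
from fractions import Fraction
from itertools import zip_longest

def L(coeffs):
    """Integral over [0, 1] of sum_k coeffs[k] * x**k, computed exactly."""
    return sum(Fraction(c) / (k + 1) for k, c in enumerate(coeffs))

def add_poly(p, q):
    """Coefficients of p + q, padding the shorter list with zeros."""
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]

def scale_poly(c, p):
    """Coefficients of c * p."""
    return [c * a for a in p]

p = [-1, 1, 3]     # 3x^2 + x - 1, lowest degree first
q = [2, 0, 0, 4]   # 4x^3 + 2, a hypothetical second polynomial

print(L(p))  # 1/2
print(L(add_poly(p, q)) == L(p) + L(q))   # sum preservation
print(L(scale_poly(5, p)) == 5 * L(p))    # scalar product preservation
```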

We can write a particularly nice form for linear maps on $\mathbb{R}^n$. Recall that the vector $\vec{a} = (a_1, \ldots, a_n)$ is equal to the sum $\sum_k a_k \vec{e}_k$, where $\vec{e}_k$ is the $k$-th standard basis vector. Then, if $L$ is linear we know:

$$\begin{aligned} L[\vec{a}] &= L\left[\sum_k a_k \vec{e}_k\right] && \text{for the standard basis } \vec{e}_k \\ &= \sum_k L[a_k \vec{e}_k] && \text{by sum preservation} \\ &= \sum_k a_k L[\vec{e}_k] && \text{by scalar product preservation} \end{aligned}$$

This derivation shows the following important fact: $L$ is completely determined by its action on the standard basis vectors $\vec{e}_k$. That is, for any vector $\vec{a} \in \mathbb{R}^n$, we can use the sum above to determine $L[\vec{a}]$ by linearly combining $L[\vec{e}_1], \ldots, L[\vec{e}_n]$.

Example 0.10 (Expanding a linear map). Recall the map in Example 0.8 given by $f(x, y) = (3x, 2x + y, -y)$. We have $f(\vec{e}_1) = f(1, 0) = (3, 2, 0)$ and $f(\vec{e}_2) = f(0, 1) = (0, 1, -1)$. Thus, the formula above shows:

$$f(x, y) = x f(\vec{e}_1) + y f(\vec{e}_2) = x \begin{pmatrix} 3 \\ 2 \\ 0 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}$$

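The expansion in Example 0.10 can be reproduced in a few lines: evaluate $f$ on the standard basis once, then rebuild $f(x, y)$ from those two images alone. The name `f_from_basis` and the test point are illustrative.

```python
def f(x, y):
    return (3 * x, 2 * x + y, -y)

# Images of the standard basis vectors e_1 = (1, 0) and e_2 = (0, 1).
f_e1 = f(1, 0)   # (3, 2, 0)
f_e2 = f(0, 1)   # (0, 1, -1)

def f_from_basis(x, y):
    """Rebuild f(x, y) as x * f(e_1) + y * f(e_2), using only the two images."""
    return tuple(x * u + y * v for u, v in zip(f_e1, f_e2))

print(f_from_basis(2, -5) == f(2, -5))  # True
```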
0.3.1 Matrices

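The expansion of linear maps above suggests one of many contexts in which it is useful to store multiple vectors in the same structure. More generally, say we have $n$ vectors $\vec{v}_1, \ldots, \vec{v}_n \in \mathbb{R}^m$. We can write each as a column vector:

$$\vec{v}_1 = \begin{pmatrix} v_{11} \\ v_{21} \\ \vdots \\ v_{m1} \end{pmatrix},\quad \vec{v}_2 = \begin{pmatrix} v_{12} \\ v_{22} \\ \vdots \\ v_{m2} \end{pmatrix},\quad \cdots,\quad \vec{v}_n = \begin{pmatrix} v_{1n} \\ v_{2n} \\ \vdots \\ v_{mn} \end{pmatrix}$$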

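Example 0.14 (Mixology). Continuing Example 0.3, suppose we make a tequila sunrise and a second concoction with equal parts of the two liquors in our simplified well. To find out how much of the basic ingredients are contained in each order, we could combine the recipes for each column-wise and use matrix multiplication. Below, the rows of the first matrix and of the result correspond to vodka, tequila, OJ, and grenadine; the columns of the first matrix correspond to Wells 1, 2, and 3; and the columns of the second matrix and of the result correspond to Drinks 1 and 2:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 6 \\ 0 & 0 & 0.75 \end{pmatrix} \begin{pmatrix} 0 & 0.75 \\ 1.5 & 0.75 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 0.75 \\ 1.5 & 0.75 \\ 6 & 12 \\ 0.75 & 1.5 \end{pmatrix}$$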
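The product in Example 0.14 can be checked with a plain-Python matrix multiplication. This is a minimal sketch using nested lists; the function name `matmul` is our own, and a numerical library would normally be used instead.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B, as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Rows: vodka, tequila, OJ, grenadine. Columns: wells 1, 2, 3.
well = [[1, 0, 0],
        [0, 1, 0],
        [0, 0, 6],
        [0, 0, 0.75]]

# Rows: wells 1, 2, 3. Columns: drinks 1, 2.
drinks = [[0,   0.75],
          [1.5, 0.75],
          [1,   2]]

for row in matmul(well, drinks):
    print(row)
```

Each column of the result lists the ounces of vodka, tequila, orange juice, and grenadine in the corresponding drink.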

In general, we will use capital letters to represent matrices, like $A \in \mathbb{R}^{m \times n}$. We will use the notation $A_{ij} \in \mathbb{R}$ to denote the element of $A$ at row $i$ and column $j$.

0.3.2 Scalars, Vectors, and Matrices

It comes as no surprise that we can write a scalar as a $1 \times 1$ matrix $c \in \mathbb{R}^{1 \times 1}$. Similarly, as we already suggested in §0.2.3, if we write vectors in $\mathbb{R}^n$ in column form, they can be considered $n \times 1$ matrices $\vec{v} \in \mathbb{R}^{n \times 1}$. Notice that matrix-vector products can be interpreted easily in this context; for example, if $A \in \mathbb{R}^{m \times n}$, $\vec{x} \in \mathbb{R}^n$, and $\vec{b} \in \mathbb{R}^m$, then we can write expressions like

$$\underbrace{A}_{m \times n}\ \underbrace{\vec{x}}_{n \times 1} = \underbrace{\vec{b}}_{m \times 1}$$

We will introduce one additional operator on matrices that is useful in this context:

Definition 0.8 (Transpose). The transpose of a matrix $A \in \mathbb{R}^{m \times n}$ is a matrix $A^\top \in \mathbb{R}^{n \times m}$ with elements $(A^\top)_{ij} = A_{ji}$.

Example 0.15 (Transposition). The transpose of the matrix

A =

is given by

$A^\top =$

Geometrically, we can think of transposition as flipping a matrix on its diagonal.

This unified treatment of scalars, vectors, and matrices combined with operations like transposition and multiplication can lead to slick derivations of well-known identities. For instance, we can compute the dot products of vectors $\vec{a}, \vec{b} \in \mathbb{R}^n$ by making the following series of steps:

$$\vec{a} \cdot \vec{b} = \sum_{k=1}^n a_k b_k = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} = \vec{a}^\top \vec{b}$$

Many important identities from linear algebra can be derived by chaining together these operations with a few rules:

$$(A^\top)^\top = A$$

$$(A + B)^\top = A^\top + B^\top$$

$$(AB)^\top = B^\top A^\top$$
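These rules, together with the expression $\vec{a} \cdot \vec{b} = \vec{a}^\top \vec{b}$, can be sanity-checked on small examples. In the sketch below, matrices are nested lists, the helper functions are our own, and all entries are arbitrary values chosen for illustration.

```python
def transpose(A):
    """Flip a matrix on its diagonal: (A^T)_ij = A_ji."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product of nested-list matrices with compatible shapes."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def add(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2], [3, 4], [5, 6]]     # 3 x 2
C = [[0, 1], [-1, 2], [4, 0]]    # 3 x 2, same shape as A
B = [[1, 0, 2], [-1, 3, 1]]      # 2 x 3

print(transpose(transpose(A)) == A)
print(transpose(add(A, C)) == add(transpose(A), transpose(C)))
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))

# Dot product as a 1 x 1 matrix: a^T b, with a and b stored as columns.
a = [[1], [2], [3]]
b = [[4], [-5], [6]]
print(matmul(transpose(a), b))   # [[12]]
```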

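Example 0.16 (Residual norm). Suppose we have a matrix $A$ and two vectors $\vec{x}$ and $\vec{b}$. If we wish to know how well $A\vec{x}$ approximates $\vec{b}$, we might define a residual $\vec{r} \equiv \vec{b} - A\vec{x}$; this residual is zero exactly when $A\vec{x} = \vec{b}$. Otherwise, we might use the norm $\|\vec{r}\|_2$ as a proxy for the relationship between $A\vec{x}$ and $\vec{b}$. We can use the identities above to simplify:

$$\begin{aligned} \|\vec{r}\|_2^2 &= \|\vec{b} - A\vec{x}\|_2^2 \\ &= (\vec{b} - A\vec{x}) \cdot (\vec{b} - A\vec{x}) && \text{as explained in §0.2} \\ &= (\vec{b} - A\vec{x})^\top (\vec{b} - A\vec{x}) && \text{by our expression for the dot product above} \\ &= (\vec{b}^\top - \vec{x}^\top A^\top)(\vec{b} - A\vec{x}) && \text{by properties of transposition} \\ &= \vec{b}^\top \vec{b} - \vec{b}^\top A\vec{x} - \vec{x}^\top A^\top \vec{b} + \vec{x}^\top A^\top A\vec{x} && \text{after multiplication} \end{aligned}$$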

All four terms on the right hand side are scalars, or equivalently $1 \times 1$ matrices. Scalars thought of as matrices trivially enjoy one additional nice property $c^\top = c$, since there is nothing to transpose! Thus, we can write

$$\vec{x}^\top A^\top \vec{b} = (\vec{x}^\top A^\top \vec{b})^\top = \vec{b}^\top A\vec{x}$$

This allows us to simplify our expression even more:

$$\|\vec{r}\|_2^2 = \vec{b}^\top \vec{b} - 2\vec{b}^\top A\vec{x} + \vec{x}^\top A^\top A\vec{x} = \|A\vec{x}\|_2^2 - 2\vec{b}^\top A\vec{x} + \|\vec{b}\|_2^2$$

We could have derived this expression using dot product identities, but the intermediate steps above will prove useful in our later discussion.
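The expanded form of the residual norm can be verified on a small example. In this sketch the matrix `A` and the vectors `x` and `b` are arbitrary integer values chosen for illustration, and `matvec` and `dot` are our own helpers.

```python
def matvec(A, x):
    """Matrix-vector product A x for a nested-list matrix and a list vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    """Dot product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))

A = [[1, 2], [3, 4], [5, 6]]
x = [1, -1]
b = [2, 0, 1]

Ax = matvec(A, x)
r = [bi - axi for bi, axi in zip(b, Ax)]   # residual r = b - A x

direct = dot(r, r)                                      # ||r||^2 computed directly
expanded = dot(Ax, Ax) - 2 * dot(b, Ax) + dot(b, b)     # ||Ax||^2 - 2 b^T A x + ||b||^2

print(direct, expanded)  # 14 14
```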