Second Order Optimization Methods: An Overview, Exercises of Econometrics and Mathematical Economics

An introduction to optimization methods, focusing on second-order methods. It covers the Taylor expansion, the Newton method, the Gauss-Newton method, quasi-Newton methods, and their applications in unconstrained and constrained optimization. The document also discusses the advantages of second-order methods over gradient-based methods and the computational issues involved.

Typology: Exercises

2018/2019

Uploaded on 10/29/2019

Introduction to
Optimization
Second Order Optimization Methods
Marc Toussaint
U Stuttgart

Planned Outline

  • Gradient-based optimization (1st order methods)
    • plain grad., steepest descent, conjugate grad., Rprop, stochastic grad.
    • adaptive stepsize heuristics
  • Constrained Optimization
    • squared penalties, augmented Lagrangian, log barrier
    • Lagrangian, KKT conditions, Lagrange dual, log barrier ↔ approx. KKT
  • 2nd order methods
    • Newton, Gauss-Newton, Quasi-Newton, (L)BFGS
    • constrained case, primal-dual Newton
  • Special convex cases
    • Linear Programming, (sequential) Quadratic Programming
    • Simplex algorithm
    • relation to relaxed discrete optimization
  • Black box optimization (“0th order methods”)
    • blackbox stochastic search
    • Markov Chain Monte Carlo methods
    • evolutionary algorithms

Why can 2nd order optimization be better than gradient?

  • Better direction:

[Figure: comparison of Plain Gradient, Conjugate Gradient, and 2nd Order]

  • Better stepsize:
    • a full step jumps directly to the minimum of the local squared approx.
    • often this is already a good heuristic
    • additional stepsize reduction and damping are straightforward
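
The "local squared approximation" can be made explicit with a second-order Taylor expansion. The following worked equation is a standard derivation added for illustration; it assumes that the Hessian ∇^2 f(x) is positive definite, and ∆ denotes the step:

```latex
f(x + \Delta) \;\approx\; f(x) + \nabla f(x)^\top \Delta
  + \tfrac{1}{2}\, \Delta^\top \nabla^2 f(x)\, \Delta

% Setting the derivative with respect to \Delta to zero:
\nabla f(x) + \nabla^2 f(x)\, \Delta = 0
\quad\Rightarrow\quad
\Delta = -\nabla^2 f(x)^{-1}\, \nabla f(x)
```

A full step x ← x + ∆ therefore lands exactly on the minimum of the quadratic model, which is why α = 1 is a natural default stepsize.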

Outline: 2nd order method

  • Newton
  • Gauss-Newton
  • Quasi-Newton
  • BFGS, (L)BFGS
  • Their application to constrained problems

Newton method

  • For finding roots (zero points) of f(x):

$$x \leftarrow x - \frac{f(x)}{f'(x)}$$

  • For finding optima of f(x) in 1D:

$$x \leftarrow x - \frac{f'(x)}{f''(x)}$$

  • For x ∈ R^n:

$$x \leftarrow x - \nabla^2 f(x)^{-1}\, \nabla f(x)$$
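
As a minimal sketch of the 1D update rules above: the helper names `newton_root` and `newton_optimum`, the tolerances, and the two test functions are assumptions chosen for illustration, not part of the slides.

```python
def newton_root(f, df, x, tol=1e-12, max_iter=50):
    """Find a root of f by iterating x <- x - f(x) / f'(x)."""
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x


def newton_optimum(df, ddf, x, tol=1e-12, max_iter=50):
    """Find a stationary point of f: the root update applied to f'."""
    return newton_root(df, ddf, x, tol, max_iter)


# Root of f(x) = x^2 - 2, starting at x = 1 (converges to sqrt(2)).
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)

# Minimum of f(x) = x - ln(x): f'(x) = 1 - 1/x, f''(x) = 1/x^2, minimum at x = 1.
x_min = newton_optimum(lambda x: 1.0 - 1.0 / x, lambda x: 1.0 / (x * x), 0.5)
```

Finding an optimum is the same iteration as finding a root, applied to the derivative, which is why the second helper simply delegates to the first.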

Newton method with adaptive stepsize α

Input: initial x ∈ R^n, functions f(x), ∇f(x), ∇^2 f(x), tolerance θ
Output: x
 1: initialize stepsize α = 1 and damping λ = 10^{-10}
 2: repeat
 3:     compute ∆ to solve (∇^2 f(x) + λI) ∆ = −∇f(x)
 4:     repeat   // “line search”
 5:         y ← x + α∆
 6:         if f(y) ≤ f(x) then   // step is accepted
 7:             x ← y
 8:             α ← α^{0.5}   // increase stepsize towards α = 1
 9:         else   // step is rejected
10:             α ← 0.1 α   // decrease stepsize
11:         end if
12:     until step accepted or (in bad case) α||∆||∞ < θ/1000
13: until ||∆||∞ < θ

  • Notes:
    • Line 3 computes the Newton step ∆ = −∇^2 f(x)^{-1} ∇f(x); use the special LAPACK routine dposv to solve Ax = b (using Cholesky decomposition)
    • λ is called damping; it makes the parabola more “steep” around the current x. For λ → ∞, ∆ becomes colinear with −∇f(x) but |∆| → 0
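
The algorithm above can be sketched in NumPy as follows. This is an illustrative sketch, not the author's implementation: the function name `newton_adaptive`, the `max_iter` safeguard, and the test function f(x) = log(2 cosh x1) + x2^2 are assumptions, and the linear system is solved with NumPy's Cholesky factorization rather than a direct call to dposv.

```python
import numpy as np


def newton_adaptive(f, grad, hess, x, theta=1e-8, lam=1e-10, max_iter=100):
    """Damped Newton method with the accept/reject stepsize rule."""
    alpha = 1.0
    for _ in range(max_iter):
        # Solve (H + lam*I) delta = -grad via Cholesky
        # (assumes the damped Hessian is positive definite).
        A = hess(x) + lam * np.eye(x.size)
        L = np.linalg.cholesky(A)
        delta = -np.linalg.solve(L.T, np.linalg.solve(L, grad(x)))
        while True:  # "line search"
            y = x + alpha * delta
            if f(y) <= f(x):            # step is accepted
                x = y
                alpha = alpha ** 0.5    # increase stepsize towards 1
                break
            alpha *= 0.1                # step is rejected: decrease stepsize
            if alpha * np.max(np.abs(delta)) < theta / 1000:
                break                   # bad case: give up on this direction
        if np.max(np.abs(delta)) < theta:
            break
    return x


# Illustrative test problem (an assumption): the full Newton step from
# x1 = 2 overshoots badly, so the first step is rejected and alpha shrinks.
f = lambda x: np.logaddexp(x[0], -x[0]) + x[1] ** 2      # log(2 cosh x1) + x2^2
grad = lambda x: np.array([np.tanh(x[0]), 2.0 * x[1]])
hess = lambda x: np.diag([1.0 - np.tanh(x[0]) ** 2, 2.0])

x_star = newton_adaptive(f, grad, hess, np.array([2.0, 1.0]))
```

The minimum of this test function is at the origin; the example is chosen so that the stepsize adaptation is actually exercised.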

Computational issues

  • Let
    • C_f be the computational cost of evaluating f(x) only
    • C_eval be the computational cost of evaluating f(x), ∇f(x), ∇^2 f(x)
    • C_∆ be the computational cost of solving (∇^2 f(x) + λI) ∆ = −∇f(x)
  • If C_eval ≫ C_f → proper line search instead of stepsize adaptation
    If C_∆ ≫ C_f → proper line search instead of stepsize adaptation
  • However, in many applications (in robotics at least) C_eval ≈ C_f ≫ C_∆
  • Often, ∇^2 f(x) is banded (non-zero around the diagonal only) → Ax = b becomes super fast using dpbsv (Dynamic Programming)

(If ∇^2 f (x) is a “tree”: Dynamic Programming on the “Junction Tree”)
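
To illustrate why banded structure makes the solve cheap, here is a sketch of the simplest case: a symmetric tridiagonal system solved in O(n) with a forward sweep and back substitution (the Thomas algorithm). This is not the LAPACK routine dpbsv itself; the function name `solve_tridiagonal` and the test matrix are assumptions for illustration.

```python
import numpy as np


def solve_tridiagonal(diag, off, b):
    """Solve A x = b for symmetric tridiagonal A in O(n) operations.

    diag: main diagonal (length n >= 2); off: sub/super-diagonal (length n-1).
    No pivoting is done, so A is assumed positive definite, as a damped
    Hessian would be.
    """
    n = len(diag)
    c = np.zeros(n - 1)   # modified super-diagonal
    d = np.zeros(n)       # modified right-hand side
    c[0] = off[0] / diag[0]
    d[0] = b[0] / diag[0]
    for i in range(1, n):                       # forward sweep
        denom = diag[i] - off[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = off[i] / denom
        d[i] = (b[i] - off[i - 1] * d[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Compare with a dense O(n^3) solve on a small positive definite example.
n = 6
diag = np.full(n, 2.0)
off = np.full(n - 1, -1.0)
b = np.arange(1.0, n + 1.0)
A = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)

x_banded = solve_tridiagonal(diag, off, b)
x_dense = np.linalg.solve(A, b)
```

Each unknown is eliminated using only its neighbour, which is the sense in which the banded solve resembles dynamic programming along the diagonal.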

Demo

Gauss-Newton method

  • The gradient and Hessian of f(x) become

$$f(x) = \phi(x)^\top \phi(x)$$
$$\nabla f(x) = 2\, \nabla\phi(x)^\top \phi(x)$$
$$\nabla^2 f(x) = 2\, \nabla\phi(x)^\top \nabla\phi(x) + 2\, \phi(x)^\top \nabla^2\phi(x)$$

The Gauss-Newton method is the Newton method for f(x) = φ(x)^⊤ φ(x), approximating ∇^2 φ(x) ≈ 0

The approximate Hessian 2∇φ(x)^⊤ ∇φ(x) is always positive semidefinite!

  • In the Newton algorithm, replace line 3 by

3: compute ∆ to solve (∇φ(x)^⊤ ∇φ(x) + λI) ∆ = −∇φ(x)^⊤ φ(x)
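
A minimal sketch of the Gauss-Newton step, assuming full steps without line search. The function name `gauss_newton`, the tolerances, and the test problem (the Rosenbrock function written as a sum of squares) are assumptions for illustration.

```python
import numpy as np


def gauss_newton(phi, jac, x, theta=1e-10, lam=1e-10, max_iter=100):
    """Minimize f(x) = phi(x)^T phi(x) using full Gauss-Newton steps."""
    for _ in range(max_iter):
        J, r = jac(x), phi(x)
        # (J^T J + lam*I) delta = -J^T r; the factor 2 of the gradient and
        # of the approximate Hessian cancels on both sides.
        delta = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -J.T @ r)
        x = x + delta
        if np.max(np.abs(delta)) < theta:
            break
    return x


# Rosenbrock as least squares: f(x) = (x1 - 1)^2 + 100 (x2 - x1^2)^2
phi = lambda x: np.array([x[0] - 1.0, 10.0 * (x[1] - x[0] ** 2)])
jac = lambda x: np.array([[1.0, 0.0], [-20.0 * x[0], 10.0]])

x_gn = gauss_newton(phi, jac, np.array([-1.2, 1.0]))
```

Only φ and its Jacobian are needed; no second derivatives of φ are ever evaluated, which is the practical appeal of the approximation ∇^2 φ(x) ≈ 0.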

Quasi-Newton methods

Basic example

  • We have already seen two data points (x_1, ∇f(x_1)) and (x_2, ∇f(x_2)). How can we estimate ∇^2 f(x)?
  • In 1D:

$$\nabla^2 f(x) \approx \frac{\nabla f(x_2) - \nabla f(x_1)}{x_2 - x_1}$$

  • In R^n: let y = ∇f(x_2) − ∇f(x_1), ∆x = x_2 − x_1

$$\nabla^2 f(x)\, \Delta x \overset{!}{=} y \qquad\qquad \Delta x \overset{!}{=} \nabla^2 f(x)^{-1}\, y$$

$$\nabla^2 f(x) = \frac{y\, y^\top}{y^\top \Delta x} \qquad\qquad \nabla^2 f(x)^{-1} = \frac{\Delta x\, \Delta x^\top}{\Delta x^\top y}$$

Convince yourself that the last line solves the desired relations. [Left: how to update ∇^2 f(x). Right: how to update ∇^2 f(x)^{-1} directly.]
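
The "convince yourself" step can also be checked numerically. In this sketch the vectors `dx` and `y` are arbitrary illustration values (an assumption), chosen so that y^⊤∆x ≠ 0.

```python
import numpy as np

dx = np.array([1.0, 2.0, -1.0])   # Delta x = x2 - x1 (arbitrary values)
y = np.array([2.0, 1.0, 0.5])     # gradient difference (arbitrary values)

H = np.outer(y, y) / (y @ dx)          # rank-one estimate of the Hessian
H_inv = np.outer(dx, dx) / (dx @ y)    # rank-one estimate of its inverse

secant_ok = np.allclose(H @ dx, y)         # Hessian estimate maps dx to y
inverse_ok = np.allclose(H_inv @ y, dx)    # inverse estimate maps y to dx
```

Both estimates have rank one, so each satisfies its own relation but `H_inv` is not the matrix inverse of `H`; a practical method has to combine such an update with an earlier estimate.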

BFGS

  • Broyden-Fletcher-Goldfarb-Shanno (BFGS) method:

Input: initial x ∈ R^n, functions f(x), ∇f(x), tolerance θ
Output: x
 1: initialize H^{-1} = I_n
 2: repeat
 3:     compute ∆ = −H^{-1} ∇f(x)
 4:     perform a line search min_α f(x + α∆)
 5:     ∆ ← α∆
 6:     y ← ∇f(x + ∆) − ∇f(x)
 7:     x ← x + ∆
 8:     update H^{-1} ← (I − y∆^⊤/(∆^⊤y))^⊤ H^{-1} (I − y∆^⊤/(∆^⊤y)) + ∆∆^⊤/(∆^⊤y)
 9: until ||∆||∞ < θ
  • Notes:
    • The last term ∆∆^⊤/(∆^⊤y) (blue on the slide) is the H^{-1}-update as on the previous slide
    • The factors (I − y∆^⊤/(∆^⊤y)) around H^{-1} (red on the slide) “delete” previous H^{-1}-components
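
A compact NumPy sketch of the BFGS loop follows. Several details are assumptions rather than part of the slides: the function name `bfgs`, the backtracking (Armijo) rule used as a simple stand-in for the line search in step 4, the safeguard that skips the update when ∆^⊤y is not positive, and the quadratic test problem.

```python
import numpy as np


def bfgs(f, grad, x, theta=1e-8, max_iter=200):
    """BFGS with a direct update of the inverse Hessian estimate H_inv."""
    I = np.eye(x.size)
    H_inv = np.eye(x.size)
    for _ in range(max_iter):
        g = grad(x)
        delta = -H_inv @ g
        # Backtracking line search (stand-in for min_alpha f(x + alpha*delta)).
        alpha = 1.0
        while (f(x + alpha * delta) > f(x) + 1e-4 * alpha * (g @ delta)
               and alpha > 1e-12):
            alpha *= 0.5
        delta = alpha * delta
        y = grad(x + delta) - g
        x = x + delta
        dy = delta @ y
        if dy > 1e-12:  # curvature safeguard: otherwise keep the old H_inv
            V = I - np.outer(y, delta) / dy
            H_inv = V.T @ H_inv @ V + np.outer(delta, delta) / dy
        if np.max(np.abs(delta)) < theta:
            break
    return x


# Convex quadratic test problem: f(x) = 0.5 x^T A x - b^T x, minimum at A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x_bfgs = bfgs(f, grad, np.zeros(2))
```

Note that only gradients are evaluated; the Hessian A is never passed to the optimizer, which is the point of a quasi-Newton method.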

2nd Order Methods for Constrained Optimization

  • No changes at all for
    • log barrier
    • augmented Lagrangian
    • squared penalties

Directly use (Gauss-)Newton/BFGS → will boost performance of these constrained optimization methods!
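
As a small sketch of the squared-penalty case: the toy problem below (minimize ||x − c||^2 subject to a^⊤x = 1), the penalty weight `mu`, and all variable names are assumptions for illustration. The constraint is folded into the objective as mu · h(x)^2 and plain Newton steps are applied to the result.

```python
import numpy as np

c = np.array([1.0, 1.0])   # objective: ||x - c||^2
a = np.array([1.0, 1.0])   # equality constraint: h(x) = a^T x - 1 = 0
mu = 1e3                   # squared-penalty weight (hypothetical value)


def grad(x):
    """Gradient of ||x - c||^2 + mu * h(x)^2."""
    return 2.0 * (x - c) + 2.0 * mu * a * (a @ x - 1.0)


def hess(x):
    """Hessian of the penalized objective (constant, since it is quadratic)."""
    return 2.0 * np.eye(2) + 2.0 * mu * np.outer(a, a)


x_pen = np.zeros(2)
for _ in range(5):
    x_pen = x_pen + np.linalg.solve(hess(x_pen), -grad(x_pen))  # Newton step

violation = a @ x_pen - 1.0   # small but non-zero for finite mu
```

The result lies close to the constrained optimum (0.5, 0.5), with a small remaining constraint violation that shrinks as `mu` grows; the Newton machinery itself is unchanged.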