Nonlinear Optimization: Amoeba, Conjugate Gradient, Quasi-Newton, Newton-Raphson, Slides of Advanced Algorithms

An overview of nonlinear optimization algorithms, focusing on Amoeba (Nelder-Mead), Conjugate Gradient, Quasi-Newton (Davidon-Fletcher-Powell and Broyden-Fletcher-Goldfarb-Shanno), and Newton-Raphson methods. It covers their strategies, advantages, and disadvantages, as well as choosing among them. The document also includes examples of Amoeba steps and termination criteria.

Typology: Slides

2020/2021

Uploaded on 06/11/2021

myfuture
myfuture 🇺🇸

4.4

(18)

258 documents

1 / 18

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1.204 Lecture 22
Unconstrained nonlinear optimization:
Amoeba
Amoeba
BFGS
Linear programming: Glpk
Multiple optimum values
From Press
Heuristics to deal with multiple optima:
Start at many initial points. Choose best of optima found.
Find local optimum. Take a step away from it and search again.
Simulated annealing takes ‘random’ steps repeatedly
1
X1X2
A
B
C
D
E
F
G
XY
Z
Figure by MIT OpenCourseWare.
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12

Partial preview of the text

Download Nonlinear Optimization: Amoeba, Conjugate Gradient, Quasi-Newton, Newton-Raphson and more Slides Advanced Algorithms in PDF only on Docsity!

1.204 Lecture 22

Unconstrained nonlinear optimization:

AmoebaAmoeba

BFGS

Linear programming: Glpk

Multiple optimum values

From Press

Heuristics to deal with multiple optima:

  • Start at many initial points. Choose best of optima found.
  • Find local optimum. Take a step away from it and search again.
  • Simulated annealing takes ‘random’ steps repeatedly

X 1 X 2

A

B

C

D

E

F

G

X Y

Z

Figure by MIT OpenCourseWare.

t t t t t t

Nonlinear optimization

  • Unconstrained nonlinear optimization algorithms

generalllly use ththe same strategy as unconstt t t rainedi d

  • Select a descent direction
  • Use a one dimensional line search to set step size
  • Step, and iterate until convergence
  • Constrained optimization used the constraints to

limit the maximum step size

  • UUnconstraiinedd optiimiizatiion must sellect maxiimum step size
  • Step size is problem-specific and must be tuned
  • Memory requirements are rarely a problem
  • Convergence, accuracy and speed are the issues

Family of nonlinear algorithms

  • Amoeba (Nelder-Mead) method
    • Solves nonlinear optimization problem directlySolves nonlinear optimization problem directly
    • Requires no derivatives or line search
    • Adapts its step size based on change in function value
  • Conjugate gradient and quasi-Newton methods
    • Require function, first derivatives and line search*
    • Line search step size adapts as algorithm proceeds
  • NewtonNewton -Raphson method (last lecture)Raphson method (last lecture)
    • Used to solve nonlinear optimization problems by solving set of first order conditions
    • Uses step size dx that makes f(x+dx)= 0. Little control.
      • ‘Globally convergent’ Newton variant has smaller step size
    • Needs first and second derivatives (and function)
  • -

Amoeba steps

  • Simplex is volume defined by n+1 points in n

dimensions

  • In 3In 3 -D, it is a tetrahedron or pyramidD it is a tetrahedron or pyramid
  • Select a starting guess at point P 0
  • Set other points of simplex as Pi= P 0 + ∆ei
  • Take a series of steps
  • Most steps move point of simplex where function is highest (“highest point”): “reflection”
  • Conserve volume of simplexConserve volume of simplex - > avoid degeneracy> avoid degeneracy
  • Where function is flat, method expands simplex to take larger steps: “expansion and reflection”
  • When it reaches a “valley floor”, simplex “contracts” itself in transverse direction and tries to ooze down the valley
  • If trying to pass through “eye of needle” it “contracts itself in all directions” around its lowest (best) point From Press et al

Amoeba steps

From Press et al

Simplex at beginning of step

High (^) Low

Reflection Reflection and expansion

Contraction Multiple contraction

Possible outcomes for a step in the downhill simplex method

(a) (b)

(c) (d)

Figure by MIT OpenCourseWare.

t t

Amoeba pseudocode: minimization

  • Start at initial guess
    • Determine which point is highest by looping over simplex points and evaluating function at each
    • If difference between highest and lowest is small, return
  • Otherwise ooze (iterate):
    • Reflect by factor= -1 through face of simplex from high point
      • If this is better than low point, reflect/expand by factor of 2
      • If this is worse than second highest, contract by 2 in this directionIf this is worse than second highest, contract by 2 in this direction
      • If this is worst point, contract in all directions around lowest point
    • Select new face based on new high point and reflect again
    • Terminate if difference between highest and lowest points is small
      • Or terminate at maximum iterations allowed

Amoeba termination criteria

  • All nonlinear optimization algorithm termination

criteria are difficult

  • TTermiinate when step si h ize iis small (ll (~10 10 -8^8 with dith doublbles))
  • Change in function value is small (~10-14^ with doubles)
  • Termination can occur at a local minimum
  • Restart amoeba at a minimum it found
  • Reset initial simplex using Pi= P 0 + ∆ei ,
  • Don’t use simplex that existed at termination
  • Code is in download:Code is in download:

public Amoeba(double ftol) public double[] minimize(double[] point, double del, MathFunction3 mf3) // And other optional methods

  • This is very easy; code ran first time on demand model
    • Experiment with starting point and del, but it’s usually easy

Quasi-Newton methods

Split problem into two minimization problems, each with a “quadratic envelope” You must understand your problem well enough to do this

Newton method

Full Newton step

If the minimum region is narrow or twisty, it is hard to get down into it. Full Newton steps give little control. (Amoeba just munches along)

Quasi-Newton methods

Quasi Newton steps BFGS takes a fraction of the Newton step, so that f(x) decreases at some minimum rate proportional to the average decrease

High order derivatives

We may be tempted to use higher derivatives to fit our nonlinear functions. More terms in the Taylor series expansion mean a higher degree polynomial fit, but using higher order derivatives is difficult because they are “stiff”

This is the challenge of nonlinear methods: use only low order derivatives to explore complex surfaces. There aren’t many general ways to do this. Solutions are problem-specific but use BFGS or other general algorithm: Understand problem regions; have good starting point; understand surface

High order derivatives

We may be tempted to use higher derivatives to fit our nonlinear functions. More terms in the Taylor series expansion mean a higher degree polynomial fit, but using higher order derivatives is difficult because they are “stiff”

This is the challenge of nonlinear methods: use only low degree derivatives to explore complex surfaces. There aren’t many general ways to do this. Solutions are problem-specific but use BFGS or other general algorithm: Understand problem regions; have good starting point; understand surface

BFGS

x x

xx

f

x

x

f

ij

i i j

f(x) f(P)

TakeapointPasorigin.ApproximatefusingTaylorseries:

2

i

b f

c f P

where

c b x x A x

x xx

P

i i^ j i j

i^2 ,

[ ]

f A x b

xx

f

A

b f

P i j

ij

P

AistheHessianmatrix.Gradientoffis :

2

BFGS, p.

lim

BFGSconstructsasequenceofmatricesH such that:

1

i

i H^ i =^ A

Tofindaminimum,findazeroofthegradientoff(x):

f(x) f(x) ( ) ( )

Atpoint x,usingaTaylorseriesagain

i

i

i i

i i i i

i i

f x f x A x x

x x f x x x A x x

−>∞

Newton'smethodsets f(x) 0 tofindthenextpoint:

1

x − xi =− A ⋅∇ f x i

BFGS, p.

  • Newton’s method, far from the minimum, canNewton s method, far from the minimum, can

project us to points x where f(x) is greater than

the current value

  • Large steps based on a quadratic approximation will not necessarily lead to improvements in ill-behaved function
  • BFGS does a line search along the Newton direction
  • It finds a point at which the objective has decreased
  • This point is used to update the Hessian (which is a matrix of second derivatives)
  • The Hessian is based on the previous Hessian and a set of correction terms based on ▼f , x and xi
  • See Numerical Recipes for BFGS updating formula
    • Proofs are difficult

BFGS lnsrch() pseudocode

  • Takes point, function and gradient value at point,

and direction ((line)) to search alongg as inpput

  • Finds new point with lower function value
    • Finding minimum along line requires too much computation
    • Finding any point whose f is less than current isn’t good enough either; we may take too many steps
    • We find a point where the average decrease from the current point is a fraction of the gradient projectioncurrent point is a fraction of the gradient projection
    • We require a minimum step size so we don’t stall
    • We use a quadratic interpolation of f along the line to choose the first new point
    • We then use a cubic interpolation, now that we have 3 points ( f(0), f(1) and f(m)) for remaining points

BFGS code

  • First code set
    • BFGS: constructor, dfppmin()(), lnsrch()()
    • MathFunction4: interface with func(), df() [gradient]
    • DemandModelBFGS: Almost same as in Newton
    • DemandModelBFGSTest: Almost same as in Newton
  • Second code set
    • Same BFGS, MathFunction4, DemandModelBFGSTest
    • DemandModelBFGS df() method computes gradient numerically, without analytic expressions for derivatives
  • Both BFGS implementations converge quickly on

same logit model coefficients as Newton

  • Code is subtle, especially interaction between dfpmin() and lnsrch().

BFGSTest

public class DemandModelBFGSTest { public static void main(String[] args) { double[][] x= { {1, 52.9 - 4.4}, {1,{ , 4.1 - 28.5},}, //// Etc. };}; double[] y= {0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1}; // Almost identical to DemandModel from last lecture // Implements MathFunction4, with func() and df() DemandModelBFGS d= new DemandModelBFGS(x, y); double[] beta= {0, 0}; // Initial guess double log0= d.func2(beta); doubledouble gtol=gtol 1E-14;1E 14; BFGS b= new BFGS(d, gtol); double logB= b.dfpmin(beta); double[][] jacobian= d.jacobian(beta); d.print(log0, logB, beta, jacobian); } } // Output identical to last lecture example. 10 iterations

Nonlinear method performance

Method Iterations Function evaluations

Newton 6 6n^2 = 24

BFGS 10 or 11 10n= 20

Amoeba 116 116

This example is illustrative only, based on our logit test case. n is the number of variables ((coefficients in our test case)). As n increases, amoeba performance is relatively more competitive

Amoeba and BFGS have better convergence and precision than Newton

Linear programming

From GLPK manual

© Andrew Makhorin. All rights reserved.

This content is excluded from our Creative Commons license.

For more information, see http://ocw.mit.edu/fairuse

Linear programming standard form

Formulate constraints as a set of equations and a set of bounded variables.

This is easy to translate to the simplex tableau used for the computations.

© Andrew Makhorin. All rights reserved.

This content is excluded from our Creative Commons license.

For more information, see http://ocw.mit.edu/fairuse

LP program, p.

import org.gnu.glpk.*; // Glpk starts numbering at 1, not 0 public class LP { publicpublic staticstatic voidvoid main(String[]main(String[] args)args) {{ int[] ia= new int[1+1000]; // Row index of coefficient int[] ja= new int[1+1000]; // Col index of coefficient double[] ar= new double[1+1000]; // Coefficient double z, x1, x2, x3; // Obj value, unknowns GlpkSolver solver = new GlpkSolver(); solver.setProbName("sample"); solver.setObjDir(GlpkSolver.LPX_MAX); // Maximization sollver.addRddRows(3);(3) solver.setRowName(1, "p"); solver.setRowBnds(1, GlpkSolver.LPX_UP, 0.0, 100.0); solver.setRowName(2, "q"); solver.setRowBnds(2, GlpkSolver.LPX_UP, 0.0, 600.0); solver.setRowName(3, "r"); solver.setRowBnds(3, GlpkSolver.LPX_UP, 0.0, 300.0);

LP program, p.

solver.addCols(3); solver.setColName(1, "x1"); solver.setColBnds(1, GlpkSolver.LPX_LO, 0.0, 0.0); solversolver .setObjCoef(1,setObjCoef(1 10 10.0); 0); solver.setColName(2, "x2"); solver.setColBnds(2, GlpkSolver.LPX_LO, 0.0, 0.0); solver.setObjCoef(2, 6.0); solver.setColName(3, "x3"); solver.setColBnds(3, GlpkSolver.LPX_LO, 0.0, 0.0); solver.setObjCoef(3, 4.0); ia[1] = 1; ja[1] = 1; ar[1] = 1.0; /* a[1,1] = 1 / ia[2]ia[2] = 1;1; ja[2]ja[2] = 2;2; ar[2] =ar[2] 1 1.0; / 0; /* a[1a[1,2] = 2] 11 // ia[3] = 1; ja[3] = 3; ar[3] = 1.0; /* a[1,3] = 1 / ia[4] = 2; ja[4] = 1; ar[4] = 10.0; / a[2,1] = 10 / ia[5] = 2; ja[5] = 2; ar[5] = 4.0; / a[2,2] = 4 / ia[6] = 2; ja[6] = 3; ar[6] = 5.0; / a[2,3] = 5 / ia[7] = 3; ja[7] = 1; ar[7] = 2.0; / a[3,1] = 2 / ia[8] = 3; ja[8] = 2; ar[8] = 2.0; / a[3,2] = 2 / ia[9] = 3; ja[9] = 3; ar[9] = 6.0; / a[3,3] = 6 */