Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Nonlinear Optimization: Amoeba, Conjugate Gradient, Quasi-Newton, Newton-Raphson, Slides of Advanced Algorithms

Massachusetts Institute of Technology (MIT)Advanced Algorithms

An overview of nonlinear optimization algorithms, focusing on Amoeba (Nelder-Mead), Conjugate Gradient, Quasi-Newton (Davidon-Fletcher-Powell and Broyden-Fletcher-Goldfarb-Shanno), and Newton-Raphson methods. It covers their strategies, advantages, and disadvantages, as well as choosing among them. The document also includes examples of Amoeba steps and termination criteria.

Typology: Slides

2020/2021

Uploaded on 06/11/2021

myfuture 🇺🇸

4.4

(18)

258 documents

1 / 18

This page cannot be seen from the preview

Don't miss anything!

1.204 Lecture 22

Unconstrained nonlinear optimization:

Amoeba

BFGS

Linear programming: Glpk

Multiple optimum values

From Press

Heuristics to deal with multiple optima:

• Start at many initial points. Choose best of optima found.

• Find local optimum. Take a step away from it and search again.

• Simulated annealing takes ‘random’ steps repeatedly

X1X2

Figure by MIT OpenCourseWare.

Discover Slides of Advanced Algorithms Massachusetts Institute of Technology (MIT)

Partial preview of the text

Download Nonlinear Optimization: Amoeba, Conjugate Gradient, Quasi-Newton, Newton-Raphson and more Slides Advanced Algorithms in PDF only on Docsity!

1.204 Lecture 22

Unconstrained nonlinear optimization:

AmoebaAmoeba

BFGS

Linear programming: Glpk

Multiple optimum values

From Press

Heuristics to deal with multiple optima:

Start at many initial points. Choose best of optima found.
Find local optimum. Take a step away from it and search again.
Simulated annealing takes ‘random’ steps repeatedly

X 1 X 2

A

B

C

D

E

F

G

X Y

Z

Figure by MIT OpenCourseWare.

t t t t t t

Nonlinear optimization

Unconstrained nonlinear optimization algorithms

generalllly use ththe same strategy as unconstt t t rainedi d

Select a descent direction
Use a one dimensional line search to set step size
Step, and iterate until convergence
Constrained optimization used the constraints to

limit the maximum step size

UUnconstraiinedd optiimiizatiion must sellect maxiimum step size
Step size is problem-specific and must be tuned
Memory requirements are rarely a problem
Convergence, accuracy and speed are the issues

Family of nonlinear algorithms

Amoeba (Nelder-Mead) method
- Solves nonlinear optimization problem directlySolves nonlinear optimization problem directly
- Requires no derivatives or line search
- Adapts its step size based on change in function value
Conjugate gradient and quasi-Newton methods
- Require function, first derivatives and line search*
- Line search step size adapts as algorithm proceeds
NewtonNewton -Raphson method (last lecture)Raphson method (last lecture)
- Used to solve nonlinear optimization problems by solving set of first order conditions
- Uses step size dx that makes f(x+dx)= 0. Little control.
  - ‘Globally convergent’ Newton variant has smaller step size
- Needs first and second derivatives (and function)

Amoeba steps

Simplex is volume defined by n+1 points in n

dimensions

In 3In 3 -D, it is a tetrahedron or pyramidD it is a tetrahedron or pyramid
Select a starting guess at point P 0
Set other points of simplex as Pi= P 0 + ∆ei
Take a series of steps
Most steps move point of simplex where function is highest (“highest point”): “reflection”
Conserve volume of simplexConserve volume of simplex - > avoid degeneracy> avoid degeneracy
Where function is flat, method expands simplex to take larger steps: “expansion and reflection”
When it reaches a “valley floor”, simplex “contracts” itself in transverse direction and tries to ooze down the valley
If trying to pass through “eye of needle” it “contracts itself in all directions” around its lowest (best) point From Press et al

Amoeba steps

From Press et al

Simplex at beginning of step

High (^) Low

Reflection Reflection and expansion

Contraction Multiple contraction

Possible outcomes for a step in the downhill simplex method

(a) (b)

Figure by MIT OpenCourseWare.

t t

Amoeba pseudocode: minimization

Start at initial guess
- Determine which point is highest by looping over simplex points and evaluating function at each
- If difference between highest and lowest is small, return
Otherwise ooze (iterate):
- Reflect by factor= -1 through face of simplex from high point
  - If this is better than low point, reflect/expand by factor of 2
  - If this is worse than second highest, contract by 2 in this directionIf this is worse than second highest, contract by 2 in this direction
  - If this is worst point, contract in all directions around lowest point
- Select new face based on new high point and reflect again
- Terminate if difference between highest and lowest points is small
  - Or terminate at maximum iterations allowed

Amoeba termination criteria

All nonlinear optimization algorithm termination

criteria are difficult

TTermiinate when step si h ize iis small (ll (~10 10 -8^8 with dith doublbles))
Change in function value is small (~10-14^ with doubles)
Termination can occur at a local minimum
Restart amoeba at a minimum it found
Reset initial simplex using Pi= P 0 + ∆ei ,
Don’t use simplex that existed at termination
• Code is in download:Code is in download:

public Amoeba(double ftol) public double[] minimize(double[] point, double del, MathFunction3 mf3) // And other optional methods

This is very easy; code ran first time on demand model
- Experiment with starting point and del, but it’s usually easy

Quasi-Newton methods

Split problem into two minimization problems, each with a “quadratic envelope” You must understand your problem well enough to do this

Newton method

Full Newton step

If the minimum region is narrow or twisty, it is hard to get down into it. Full Newton steps give little control. (Amoeba just munches along)

Quasi-Newton methods

Quasi Newton steps BFGS takes a fraction of the Newton step, so that f(x) decreases at some minimum rate proportional to the average decrease

High order derivatives

We may be tempted to use higher derivatives to fit our nonlinear functions. More terms in the Taylor series expansion mean a higher degree polynomial fit, but using higher order derivatives is difficult because they are “stiff”

This is the challenge of nonlinear methods: use only low order derivatives to explore complex surfaces. There aren’t many general ways to do this. Solutions are problem-specific but use BFGS or other general algorithm: Understand problem regions; have good starting point; understand surface

High order derivatives

This is the challenge of nonlinear methods: use only low degree derivatives to explore complex surfaces. There aren’t many general ways to do this. Solutions are problem-specific but use BFGS or other general algorithm: Understand problem regions; have good starting point; understand surface

BFGS

x x

xx

f

x

f

i i j

f(x) f(P)

TakeapointPasorigin.ApproximatefusingTaylorseries:

b f

c f P

where

c b x x A x

x xx

i i^ j i j

i^2 ,

[ ]

f A x b

xx

f

A

b f

P i j

AistheHessianmatrix.Gradientoffis :

BFGS, p.

lim

BFGSconstructsasequenceofmatricesH such that:

i H^ i =^ A

−

∞

Tofindaminimum,findazeroofthegradientoff(x):

f(x) f(x) ( ) ( )

Atpoint x,usingaTaylorseriesagain

i i

i i i i

i i

f x f x A x x

x x f x x x A x x

−>∞

Newton'smethodsets f(x) 0 tofindthenextpoint:

x − xi =− A ⋅∇ f x i

−

BFGS, p.

Newton’s method, far from the minimum, canNewton s method, far from the minimum, can

project us to points x where f(x) is greater than

the current value

Large steps based on a quadratic approximation will not necessarily lead to improvements in ill-behaved function
BFGS does a line search along the Newton direction
It finds a point at which the objective has decreased
This point is used to update the Hessian (which is a matrix of second derivatives)
The Hessian is based on the previous Hessian and a set of correction terms based on ▼f , x and xi
See Numerical Recipes for BFGS updating formula
- Proofs are difficult

BFGS lnsrch() pseudocode

Takes point, function and gradient value at point,

and direction ((line)) to search alongg as inpput

Finds new point with lower function value
- Finding minimum along line requires too much computation
- Finding any point whose f is less than current isn’t good enough either; we may take too many steps
- We find a point where the average decrease from the current point is a fraction of the gradient projectioncurrent point is a fraction of the gradient projection
- We require a minimum step size so we don’t stall
- We use a quadratic interpolation of f along the line to choose the first new point
- We then use a cubic interpolation, now that we have 3 points ( f(0), f(1) and f(m)) for remaining points

BFGS code

First code set
- BFGS: constructor, dfppmin()(), lnsrch()()
- MathFunction4: interface with func(), df() [gradient]
- DemandModelBFGS: Almost same as in Newton
- DemandModelBFGSTest: Almost same as in Newton
Second code set
- Same BFGS, MathFunction4, DemandModelBFGSTest
- DemandModelBFGS df() method computes gradient numerically, without analytic expressions for derivatives
Both BFGS implementations converge quickly on

same logit model coefficients as Newton

Code is subtle, especially interaction between dfpmin() and lnsrch().

BFGSTest

public class DemandModelBFGSTest { public static void main(String[] args) { double[][] x= { {1, 52.9 - 4.4}, {1,{ , 4.1 - 28.5},}, //// Etc. };}; double[] y= {0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1}; // Almost identical to DemandModel from last lecture // Implements MathFunction4, with func() and df() DemandModelBFGS d= new DemandModelBFGS(x, y); double[] beta= {0, 0}; // Initial guess double log0= d.func2(beta); doubledouble gtol=gtol 1E-14;1E 14; BFGS b= new BFGS(d, gtol); double logB= b.dfpmin(beta); double[][] jacobian= d.jacobian(beta); d.print(log0, logB, beta, jacobian); } } // Output identical to last lecture example. 10 iterations

Nonlinear method performance

Method Iterations Function evaluations

Newton 6 6n^2 = 24

BFGS 10 or 11 10n= 20

Amoeba 116 116

This example is illustrative only, based on our logit test case. n is the number of variables ((coefficients in our test case)). As n increases, amoeba performance is relatively more competitive

Amoeba and BFGS have better convergence and precision than Newton

Linear programming

From GLPK manual

This content is excluded from our Creative Commons license.

For more information, see http://ocw.mit.edu/fairuse

Linear programming standard form

Formulate constraints as a set of equations and a set of bounded variables.

This is easy to translate to the simplex tableau used for the computations.

This content is excluded from our Creative Commons license.

For more information, see http://ocw.mit.edu/fairuse

LP program, p.

import org.gnu.glpk.*; // Glpk starts numbering at 1, not 0 public class LP { publicpublic staticstatic voidvoid main(String[]main(String[] args)args) {{ int[] ia= new int[1+1000]; // Row index of coefficient int[] ja= new int[1+1000]; // Col index of coefficient double[] ar= new double[1+1000]; // Coefficient double z, x1, x2, x3; // Obj value, unknowns GlpkSolver solver = new GlpkSolver(); solver.setProbName("sample"); solver.setObjDir(GlpkSolver.LPX_MAX); // Maximization sollver.addRddRows(3);(3) solver.setRowName(1, "p"); solver.setRowBnds(1, GlpkSolver.LPX_UP, 0.0, 100.0); solver.setRowName(2, "q"); solver.setRowBnds(2, GlpkSolver.LPX_UP, 0.0, 600.0); solver.setRowName(3, "r"); solver.setRowBnds(3, GlpkSolver.LPX_UP, 0.0, 300.0);

LP program, p.

solver.addCols(3); solver.setColName(1, "x1"); solver.setColBnds(1, GlpkSolver.LPX_LO, 0.0, 0.0); solversolver .setObjCoef(1,setObjCoef(1 10 10.0); 0); solver.setColName(2, "x2"); solver.setColBnds(2, GlpkSolver.LPX_LO, 0.0, 0.0); solver.setObjCoef(2, 6.0); solver.setColName(3, "x3"); solver.setColBnds(3, GlpkSolver.LPX_LO, 0.0, 0.0); solver.setObjCoef(3, 4.0); ia[1] = 1; ja[1] = 1; ar[1] = 1.0; /* a[1,1] = 1 / ia[2]ia[2] = 1;1; ja[2]ja[2] = 2;2; ar[2] =ar[2] 1 1.0; / 0; /* a[1a[1,2] = 2] 11 // ia[3] = 1; ja[3] = 3; ar[3] = 1.0; /* a[1,3] = 1 / ia[4] = 2; ja[4] = 1; ar[4] = 10.0; / a[2,1] = 10 / ia[5] = 2; ja[5] = 2; ar[5] = 4.0; / a[2,2] = 4 / ia[6] = 2; ja[6] = 3; ar[6] = 5.0; / a[2,3] = 5 / ia[7] = 3; ja[7] = 1; ar[7] = 2.0; / a[3,1] = 2 / ia[8] = 3; ja[8] = 2; ar[8] = 2.0; / a[3,2] = 2 / ia[9] = 3; ja[9] = 3; ar[9] = 6.0; / a[3,3] = 6 */

Nonlinear Optimization: Amoeba, Conjugate Gradient, Quasi-Newton, Newton-Raphson, Slides of Advanced Algorithms

Related documents

Partial preview of the text

Download Nonlinear Optimization: Amoeba, Conjugate Gradient, Quasi-Newton, Newton-Raphson and more Slides Advanced Algorithms in PDF only on Docsity!

1.204 Lecture 22

Unconstrained nonlinear optimization:

AmoebaAmoeba

BFGS

Linear programming: Glpk

X 1 X 2

A

B

C

D

E

F

G

X Y

Z

generalllly use ththe same strategy as unconstt t t rainedi d

limit the maximum step size

dimensions

criteria are difficult

High order derivatives

BFGS

x x

xx

f

x

x

f

f(x) f(P)

TakeapointPasorigin.ApproximatefusingTaylorseries:

b f

c f P

where

c b x x A x

x xx

i^2 ,

[ ]

f A x b

xx

f

A

b f

AistheHessianmatrix.Gradientoffis :

lim

BFGSconstructsasequenceofmatricesH such that:

i H^ i =^ A

Tofindaminimum,findazeroofthegradientoff(x):

f(x) f(x) ( ) ( )

Atpoint x,usingaTaylorseriesagain

f x f x A x x

x x f x x x A x x

Newton'smethodsets f(x) 0 tofindthenextpoint:

x − xi =− A ⋅∇ f x i

project us to points x where f(x) is greater than

the current value

and direction ((line)) to search alongg as inpput

same logit model coefficients as Newton

Method Iterations Function evaluations

Newton 6 6n^2 = 24

BFGS 10 or 11 10n= 20

Amoeba 116 116

© Andrew Makhorin. All rights reserved.

This content is excluded from our Creative Commons license.

For more information, see http://ocw.mit.edu/fairuse

Formulate constraints as a set of equations and a set of bounded variables.

This is easy to translate to the simplex tableau used for the computations.

© Andrew Makhorin. All rights reserved.

This content is excluded from our Creative Commons license.

For more information, see http://ocw.mit.edu/fairuse