Line Search Methods - Numerical Optimization - Lecture Notes | MATH 693A, Study notes of Mathematics

Material Type: Notes; Subject: Mathematics; University: San Diego State University; Term: Fall 2009;

Typology: Study notes

Pre 2010

Uploaded on 03/28/2010

koofers-user-qco
koofers-user-qco 🇺🇸

10 documents

1 / 24

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Line Search Methods
Step Length Selection
Numerical Optimization
Lecture Notes #6
Line Search Methods: Step Length Selection
Peter Blomgren,
Department of Mathematics and Statistics
Dynamical Systems Group
Computational Sciences Research Center
San Diego State University
San Diego, CA 92182-7720
http://terminus.sdsu.edu/
Fall 2009
Peter Blomgren, h[email protected]iLine Search Methods: Step Length Selection (1/24)
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18

Partial preview of the text

Download Line Search Methods - Numerical Optimization - Lecture Notes | MATH 693A and more Study notes Mathematics in PDF only on Docsity!

Line Search Methods Step Length Selection

Numerical Optimization

Lecture Notes # — Line Search Methods: Step Length Selection

Peter Blomgren, 〈[email protected]〉 Department of Mathematics and Statistics Dynamical Systems Group Computational Sciences Research Center San Diego State University San Diego, CA 92182- http://terminus.sdsu.edu/

Fall 2009

Line Search Methods Step Length Selection

Outline

(^1) Line Search Methods Recap Step Length Selection

(^2) Step Length Selection Interpolation The Initial Step LS — Strong Wolfe Conditions

Line Search Methods Step Length Selection Recap Step Length Selection

Quick Recap: Last Time

Further, Quasi-Newton methods, where the search direction is ˜pQN k = −B k− 1 ∇f (˜xk ), exhibit super-linear convergence as long as the matrix sequence {Bk } converges to the Hessian ∇^2 f (˜x∗) in the search direction ˜pk :

lim k→∞

‖(Bk − ∇^2 f (˜x∗))˜pk ‖ ‖p˜k ‖

Coordinate Descent Methods: Slower than Steepest descent. Useful of coordinates are decoupled and/or computation of the gradient is not possible or too expensive.

Line Search Methods Step Length Selection Recap Step Length Selection

Unconstrained Optimization — In the Line Search “Universe”

Global Optimization Problem

Local Strategies

Line Search Algorithms

Search Direction Step Length

Sufficient Descent Conditions

Convergence: Global

Convergence: Local Rate

Line Search Methods Step Length Selection Recap Step Length Selection

Step Length Selection: Assumptions

We must assume that p˜k is a descent direction, i.e. that Φ′(0) < 0 — thus all our steps will be in the positive direction. When the objective f is quadratic f (˜x) = 12 ˜xT^ Q˜x + ˜bT^ ˜x + c, the optimal step can be found explicitly

αk = − ∇f (˜xk )T^ ˜pk ˜pTk Q˜pk

For general nonlinear f we must use an iterative scheme to find the step length αk. How the line search is performed impacts the robustness and efficiency of the overall optimization method.

Line Search Methods Step Length Selection Recap Step Length Selection

Step Length Selection: Classification

It is natural to classify line search methods based on how many derivatives they need: 0 Methods based on^ function values only^ tend to be inef- ficient, since they need to narrow the minimizer to a small interval.

1 Gradient information makes it easier to determine if a certain step is good — i.e. it satisfies a sufficient reduction condition.

1 Methods requiring more than one derivate are quite rare; in order to compute the second derivative the full Hessian ∇^2 f (˜xk ) is needed, this is usually too high a cost.

Line Search Methods Step Length Selection

Interpolation The Initial StepLS — Strong Wolfe Conditions

Step Length Selection: Interpolation 1 of 7

First we note that the Armijo condition can be written in terms of Φ as Φ(αk ) ≤ Φ(0) + c 1 αk Φ′(0) where c 1 ∼ 10 −^4 in practice. This is stronger (but not much stronger) that requiring descent. Our new algorithms will be efficient in the sense that the gradient ∇f (˜xk ) is computed as few times as possible. If the initial step length α 0 satisfies the Armijo condition, then we accept α 0 as the step length and terminate the search. — As we get close to the solution this will happen more and more often (for Newton and quasi-Newton methods with α 0 = 1.) Otherwise, we search for an acceptable step length in [0, α 0 ]...

Line Search Methods Step Length Selection

Interpolation The Initial StepLS — Strong Wolfe Conditions

Step Length Selection: Interpolation 2 of 7

At this stage we have computed 3 pieces of information: Φ(0), Φ′(0), and Φ(α 0 ) we use this information to build a quadratic model Φq (α):

Φq (α) =

[

Φ(α 0 ) − Φ(0) − α 0 Φ′(0) α^20

]

α^2 + Φ′(0)α + Φ(0)

Note Φq (0) = Φ(0), Φq (α 0 ) = Φ(α 0 ), Φ′ q (0) = Φ′(0) We set Φ′ q (α) = 0 to find the minimum of the model — our next α to try...

Φ′ q (α) = 2α

[

Φ(α 0 ) − Φ(0) − α 0 Φ′(0) α^20

]

Line Search Methods Step Length Selection

Interpolation The Initial StepLS — Strong Wolfe Conditions

Step Length Selection: Interpolation 4 of 7

The next iterate (α 2 ) is now the minimizer of Φc (α) which lies in [0, α 1 ], it is given as one of the roots of the quadratic equation

Φ′ c (α) = 3aα^2 + 2bα + Φ′(0) = 0

it is... α 2 =

−b +

b^2 − 3 aΦ′(0) 3 a In the extremely rare cases that α 2 does not satisfy the Armijo condition Φ(α 2 ) ≤ Φ(0) + c 1 α 2 Φ′(0) we create a new cubic model interpolating

Φ(0), Φ′(0), Φ(α 1 ), and Φ(α 2 )

i.e. Φ(0), Φ′(0) and the two most recent α’s.

Line Search Methods Step Length Selection

Interpolation The Initial StepLS — Strong Wolfe Conditions

Step Length Selection: Interpolation 5 of 7

At this point we must introduce to following safeguards to guarantee that we make sufficient progress: If |αk+1 − αk | < ǫ 1 or |αk+1| < ǫ 2 then αk+1 = αk / 2

The algorithm described assumes that computing the derivative is significantly more expensive than computing function values. However it is often, but not always, possible to compute the directional derivative (or a good estimate thereof) with minimal extra cost. In those cases we build the cubic interpolant so that it interpolates

Φ(αk ), Φ′(αk ), Φ(αk− 1 ), and Φ′(αk− 1 )

this is a Hermite Polynomial of degree 3 (see Math 541.)

Line Search Methods Step Length Selection

Interpolation The Initial StepLS — Strong Wolfe Conditions

Step Length Selection: Interpolation 7 of 7

The minimizer of H 3 (α) in [αk− 1 , αk ] is either at one of the end points, or else in the interior (given setting H 3 ′(α) = 0). The interior point is given by

αk+1 = αk − (αk − αk− 1 )

[ Φ′(αk ) + d 2 − d 1 Φ′(αk ) − Φ′(αk− 1 ) + 2d 2

]

where

d 1 = Φ′(αk− 1 ) + Φ′(αk ) − 3

[ Φ(αk− 1 ) − Φ(αk ) αk− 1 − αk

]

d 2 =

√ d 12 − Φ′(αk− 1 )Φ′(αk )

Either αk+1 is accepted as the step length, or the search process continues... Cubic interpolation gives quadratic convergence in the step length selection algorithm.

Line Search Methods Step Length Selection

Interpolation The Initial Step LS — Strong Wolfe Conditions

Step Length Selection: The Initial Step 1 of 2

For Newton and quasi-Newton methods, the search vector ˜pk contains an intrinsic sense of scale (being formed from the local descent, and curvature information), hence the initial trial step length should always be α 0 = 1, otherwise we break the quadratic respective super-linear convergence properties. For other search directions, such as steepest descent and conjugate gradient (to be described later) directions which do not have a sense of scale, other method must be used to select a good first trial step: Strategy #1: Assume that the rate of change in the current iteration will be the same as in the previous iteration, select α 0 :

α[ 0 k ]= α[k−1]^ p˜Tk− 1 ∇f (˜xk− 1 ) ˜pTk ∇f (˜xk ) .

Line Search Methods Step Length Selection

Interpolation The Initial Step LS — Strong Wolfe Conditions

Line Search for the Strong Wolfe Conditions 1 of 6

Algorithm: LS/Strong Wolfe Conditions

  1. Set α 0 = 0, choose α 1 > 0, αmax, c 1 , and c 2 , i = 1
  2. while( TRUE )
  3. Compute Φ(αi )
  4. if (Φ(αi ) > Φ(0) + c 1 αi Φ′(0)) or (Φ(αi ) ≥ Φ(αi− 1 ) and i > 1)
  5. α∗ = zoom(αi− 1 , αi ), and terminate search
  6. Compute Φ′(αi )
  7. if |Φ′(αi )| ≤ −c 2 Φ′(0)
  8. α∗ = αi , and terminate search
  9. if Φ′(αi ) ≥ 0
  10. α∗ = zoom(αi , αi− 1 ), and terminate search
  11. Choose αi+1 ∈ [αi , αmax]
  12. i = i + 1
  13. end

Line Search Methods Step Length Selection

Interpolation The Initial Step LS — Strong Wolfe Conditions

Line Search for the Strong Wolfe Conditions 2 of 6

In the first stage of the algorithm, either an acceptable step length, or a range [αi , αi+1] containing an acceptable step length is identified — none of the conditions 04, 07, 09 are satisfied so the step length is increased

If in the first stage we identified a range, the second stage invokes a function zoom which will identify an acceptable step from the interval. Notes: 04 establishes that αi is too long a step, thus α∗ must be in the range [αi− 1 , αi ]. If 07 holds, then both the strong Wolfe conditions hold (since not(04) must also hold. Finally, if 09 holds then the step is too large (since we are going uphill at this point.)