Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Line Search Methods - Numerical Optimization - Lecture Notes | MATH 693A, Study notes of Mathematics

San Diego State University (SDSU)Mathematics

Material Type: Notes; Subject: Mathematics; University: San Diego State University; Term: Fall 2009;

Typology: Study notes

Pre 2010

Uploaded on 03/28/2010

koofers-user-qco 🇺🇸

10 documents

1 / 24

This page cannot be seen from the preview

Don't miss anything!

Line Search Methods

Step Length Selection

Numerical Optimization

Lecture Notes #6

— Line Search Methods: Step Length Selection

Peter Blomgren,

h[email protected]i

Department of Mathematics and Statistics

Dynamical Systems Group

Computational Sciences Research Center

San Diego State University

San Diego, CA 92182-7720

http://terminus.sdsu.edu/

Fall 2009

Peter Blomgren, h[email protected]iLine Search Methods: Step Length Selection — (1/24)

Discover Study notes of Mathematics San Diego State University (SDSU)

Partial preview of the text

Download Line Search Methods - Numerical Optimization - Lecture Notes | MATH 693A and more Study notes Mathematics in PDF only on Docsity!

Line Search Methods Step Length Selection

Numerical Optimization

Lecture Notes # — Line Search Methods: Step Length Selection

Peter Blomgren, 〈[email protected]〉 Department of Mathematics and Statistics Dynamical Systems Group Computational Sciences Research Center San Diego State University San Diego, CA 92182- http://terminus.sdsu.edu/

Fall 2009

Line Search Methods Step Length Selection

Outline

(^1) Line Search Methods Recap Step Length Selection

(^2) Step Length Selection Interpolation The Initial Step LS — Strong Wolfe Conditions

Line Search Methods Step Length Selection Recap Step Length Selection

Quick Recap: Last Time

Further, Quasi-Newton methods, where the search direction is ˜pQN k = −B k− 1 ∇f (˜xk ), exhibit super-linear convergence as long as the matrix sequence {Bk } converges to the Hessian ∇^2 f (˜x∗) in the search direction ˜pk :

lim k→∞

‖(Bk − ∇^2 f (˜x∗))˜pk ‖ ‖p˜k ‖

Coordinate Descent Methods: Slower than Steepest descent. Useful of coordinates are decoupled and/or computation of the gradient is not possible or too expensive.

Line Search Methods Step Length Selection Recap Step Length Selection

Unconstrained Optimization — In the Line Search “Universe”

Global Optimization Problem

Local Strategies

Line Search Algorithms

Search Direction Step Length

Sufficient Descent Conditions

Convergence: Global

Convergence: Local Rate

Line Search Methods Step Length Selection Recap Step Length Selection

Step Length Selection: Assumptions

We must assume that p˜k is a descent direction, i.e. that Φ′(0) < 0 — thus all our steps will be in the positive direction. When the objective f is quadratic f (˜x) = 12 ˜xT^ Q˜x + ˜bT^ ˜x + c, the optimal step can be found explicitly

αk = − ∇f (˜xk )T^ ˜pk ˜pTk Q˜pk

For general nonlinear f we must use an iterative scheme to find the step length αk. How the line search is performed impacts the robustness and efficiency of the overall optimization method.

Line Search Methods Step Length Selection Recap Step Length Selection

Step Length Selection: Classification

It is natural to classify line search methods based on how many derivatives they need: 0 Methods based on^ function values only^ tend to be inef- ficient, since they need to narrow the minimizer to a small interval.

1 Gradient information makes it easier to determine if a certain step is good — i.e. it satisfies a sufficient reduction condition.

1 Methods requiring more than one derivate are quite rare; in order to compute the second derivative the full Hessian ∇^2 f (˜xk ) is needed, this is usually too high a cost.

Line Search Methods Step Length Selection

Interpolation The Initial StepLS — Strong Wolfe Conditions

Step Length Selection: Interpolation 1 of 7

First we note that the Armijo condition can be written in terms of Φ as Φ(αk ) ≤ Φ(0) + c 1 αk Φ′(0) where c 1 ∼ 10 −^4 in practice. This is stronger (but not much stronger) that requiring descent. Our new algorithms will be efficient in the sense that the gradient ∇f (˜xk ) is computed as few times as possible. If the initial step length α 0 satisfies the Armijo condition, then we accept α 0 as the step length and terminate the search. — As we get close to the solution this will happen more and more often (for Newton and quasi-Newton methods with α 0 = 1.) Otherwise, we search for an acceptable step length in [0, α 0 ]...

Line Search Methods Step Length Selection

Interpolation The Initial StepLS — Strong Wolfe Conditions

Step Length Selection: Interpolation 2 of 7

At this stage we have computed 3 pieces of information: Φ(0), Φ′(0), and Φ(α 0 ) we use this information to build a quadratic model Φq (α):

Φq (α) =

[

Φ(α 0 ) − Φ(0) − α 0 Φ′(0) α^20

]

α^2 + Φ′(0)α + Φ(0)

Note Φq (0) = Φ(0), Φq (α 0 ) = Φ(α 0 ), Φ′ q (0) = Φ′(0) We set Φ′ q (α) = 0 to find the minimum of the model — our next α to try...

Φ′ q (α) = 2α

[

Φ(α 0 ) − Φ(0) − α 0 Φ′(0) α^20

]

Line Search Methods Step Length Selection

Interpolation The Initial StepLS — Strong Wolfe Conditions

Step Length Selection: Interpolation 4 of 7

The next iterate (α 2 ) is now the minimizer of Φc (α) which lies in [0, α 1 ], it is given as one of the roots of the quadratic equation

Φ′ c (α) = 3aα^2 + 2bα + Φ′(0) = 0

it is... α 2 =

−b +

b^2 − 3 aΦ′(0) 3 a In the extremely rare cases that α 2 does not satisfy the Armijo condition Φ(α 2 ) ≤ Φ(0) + c 1 α 2 Φ′(0) we create a new cubic model interpolating

Φ(0), Φ′(0), Φ(α 1 ), and Φ(α 2 )

i.e. Φ(0), Φ′(0) and the two most recent α’s.

Line Search Methods Step Length Selection

Interpolation The Initial StepLS — Strong Wolfe Conditions

Step Length Selection: Interpolation 5 of 7

At this point we must introduce to following safeguards to guarantee that we make sufficient progress: If |αk+1 − αk | < ǫ 1 or |αk+1| < ǫ 2 then αk+1 = αk / 2

The algorithm described assumes that computing the derivative is significantly more expensive than computing function values. However it is often, but not always, possible to compute the directional derivative (or a good estimate thereof) with minimal extra cost. In those cases we build the cubic interpolant so that it interpolates

Φ(αk ), Φ′(αk ), Φ(αk− 1 ), and Φ′(αk− 1 )

this is a Hermite Polynomial of degree 3 (see Math 541.)

Line Search Methods Step Length Selection

Interpolation The Initial StepLS — Strong Wolfe Conditions

Step Length Selection: Interpolation 7 of 7

The minimizer of H 3 (α) in [αk− 1 , αk ] is either at one of the end points, or else in the interior (given setting H 3 ′(α) = 0). The interior point is given by

αk+1 = αk − (αk − αk− 1 )

[ Φ′(αk ) + d 2 − d 1 Φ′(αk ) − Φ′(αk− 1 ) + 2d 2

]

where

d 1 = Φ′(αk− 1 ) + Φ′(αk ) − 3

[ Φ(αk− 1 ) − Φ(αk ) αk− 1 − αk

]

d 2 =

√ d 12 − Φ′(αk− 1 )Φ′(αk )

Either αk+1 is accepted as the step length, or the search process continues... Cubic interpolation gives quadratic convergence in the step length selection algorithm.

Line Search Methods Step Length Selection

Interpolation The Initial Step LS — Strong Wolfe Conditions

Step Length Selection: The Initial Step 1 of 2

For Newton and quasi-Newton methods, the search vector ˜pk contains an intrinsic sense of scale (being formed from the local descent, and curvature information), hence the initial trial step length should always be α 0 = 1, otherwise we break the quadratic respective super-linear convergence properties. For other search directions, such as steepest descent and conjugate gradient (to be described later) directions which do not have a sense of scale, other method must be used to select a good first trial step: Strategy #1: Assume that the rate of change in the current iteration will be the same as in the previous iteration, select α 0 :

α[ 0 k ]= α[k−1]^ p˜Tk− 1 ∇f (˜xk− 1 ) ˜pTk ∇f (˜xk ) .

Line Search Methods Step Length Selection

Interpolation The Initial Step LS — Strong Wolfe Conditions

Line Search for the Strong Wolfe Conditions 1 of 6

Algorithm: LS/Strong Wolfe Conditions

Set α 0 = 0, choose α 1 > 0, αmax, c 1 , and c 2 , i = 1
while( TRUE )
Compute Φ(αi )
if (Φ(αi ) > Φ(0) + c 1 αi Φ′(0)) or (Φ(αi ) ≥ Φ(αi− 1 ) and i > 1)
α∗ = zoom(αi− 1 , αi ), and terminate search
Compute Φ′(αi )
if |Φ′(αi )| ≤ −c 2 Φ′(0)
α∗ = αi , and terminate search
if Φ′(αi ) ≥ 0
α∗ = zoom(αi , αi− 1 ), and terminate search
Choose αi+1 ∈ [αi , αmax]
i = i + 1
end

Line Search Methods Step Length Selection

Interpolation The Initial Step LS — Strong Wolfe Conditions

Line Search for the Strong Wolfe Conditions 2 of 6

In the first stage of the algorithm, either an acceptable step length, or a range [αi , αi+1] containing an acceptable step length is identified — none of the conditions 04, 07, 09 are satisfied so the step length is increased

If in the first stage we identified a range, the second stage invokes a function zoom which will identify an acceptable step from the interval. Notes: 04 establishes that αi is too long a step, thus α∗ must be in the range [αi− 1 , αi ]. If 07 holds, then both the strong Wolfe conditions hold (since not(04) must also hold. Finally, if 09 holds then the step is too large (since we are going uphill at this point.)

Line Search Methods - Numerical Optimization - Lecture Notes | MATH 693A, Study notes of Mathematics

Related documents

Partial preview of the text

Download Line Search Methods - Numerical Optimization - Lecture Notes | MATH 693A and more Study notes Mathematics in PDF only on Docsity!

Numerical Optimization

[

]

[

]