
















Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Material Type: Notes; Subject: Mathematics; University: San Diego State University; Term: Fall 2009;
Typology: Study notes
1 / 24
This page cannot be seen from the preview
Don't miss anything!

















Line Search Methods Step Length Selection
Lecture Notes # — Line Search Methods: Step Length Selection
Peter Blomgren, 〈[email protected]〉 Department of Mathematics and Statistics Dynamical Systems Group Computational Sciences Research Center San Diego State University San Diego, CA 92182- http://terminus.sdsu.edu/
Fall 2009
Line Search Methods Step Length Selection
Outline
(^1) Line Search Methods Recap Step Length Selection
(^2) Step Length Selection Interpolation The Initial Step LS — Strong Wolfe Conditions
Line Search Methods Step Length Selection Recap Step Length Selection
Quick Recap: Last Time
Further, Quasi-Newton methods, where the search direction is ˜pQN k = −B k− 1 ∇f (˜xk ), exhibit super-linear convergence as long as the matrix sequence {Bk } converges to the Hessian ∇^2 f (˜x∗) in the search direction ˜pk :
lim k→∞
‖(Bk − ∇^2 f (˜x∗))˜pk ‖ ‖p˜k ‖
Coordinate Descent Methods: Slower than Steepest descent. Useful of coordinates are decoupled and/or computation of the gradient is not possible or too expensive.
Line Search Methods Step Length Selection Recap Step Length Selection
Unconstrained Optimization — In the Line Search “Universe”
Global Optimization Problem
Local Strategies
Line Search Algorithms
Search Direction Step Length
Sufficient Descent Conditions
Convergence: Global
Convergence: Local Rate
Line Search Methods Step Length Selection Recap Step Length Selection
Step Length Selection: Assumptions
We must assume that p˜k is a descent direction, i.e. that Φ′(0) < 0 — thus all our steps will be in the positive direction. When the objective f is quadratic f (˜x) = 12 ˜xT^ Q˜x + ˜bT^ ˜x + c, the optimal step can be found explicitly
αk = − ∇f (˜xk )T^ ˜pk ˜pTk Q˜pk
For general nonlinear f we must use an iterative scheme to find the step length αk. How the line search is performed impacts the robustness and efficiency of the overall optimization method.
Line Search Methods Step Length Selection Recap Step Length Selection
Step Length Selection: Classification
It is natural to classify line search methods based on how many derivatives they need: 0 Methods based on^ function values only^ tend to be inef- ficient, since they need to narrow the minimizer to a small interval.
1 Gradient information makes it easier to determine if a certain step is good — i.e. it satisfies a sufficient reduction condition.
1 Methods requiring more than one derivate are quite rare; in order to compute the second derivative the full Hessian ∇^2 f (˜xk ) is needed, this is usually too high a cost.
Line Search Methods Step Length Selection
Interpolation The Initial StepLS — Strong Wolfe Conditions
Step Length Selection: Interpolation 1 of 7
First we note that the Armijo condition can be written in terms of Φ as Φ(αk ) ≤ Φ(0) + c 1 αk Φ′(0) where c 1 ∼ 10 −^4 in practice. This is stronger (but not much stronger) that requiring descent. Our new algorithms will be efficient in the sense that the gradient ∇f (˜xk ) is computed as few times as possible. If the initial step length α 0 satisfies the Armijo condition, then we accept α 0 as the step length and terminate the search. — As we get close to the solution this will happen more and more often (for Newton and quasi-Newton methods with α 0 = 1.) Otherwise, we search for an acceptable step length in [0, α 0 ]...
Line Search Methods Step Length Selection
Interpolation The Initial StepLS — Strong Wolfe Conditions
Step Length Selection: Interpolation 2 of 7
At this stage we have computed 3 pieces of information: Φ(0), Φ′(0), and Φ(α 0 ) we use this information to build a quadratic model Φq (α):
Φq (α) =
Φ(α 0 ) − Φ(0) − α 0 Φ′(0) α^20
α^2 + Φ′(0)α + Φ(0)
Note Φq (0) = Φ(0), Φq (α 0 ) = Φ(α 0 ), Φ′ q (0) = Φ′(0) We set Φ′ q (α) = 0 to find the minimum of the model — our next α to try...
Φ′ q (α) = 2α
Φ(α 0 ) − Φ(0) − α 0 Φ′(0) α^20
Line Search Methods Step Length Selection
Interpolation The Initial StepLS — Strong Wolfe Conditions
Step Length Selection: Interpolation 4 of 7
The next iterate (α 2 ) is now the minimizer of Φc (α) which lies in [0, α 1 ], it is given as one of the roots of the quadratic equation
Φ′ c (α) = 3aα^2 + 2bα + Φ′(0) = 0
it is... α 2 =
−b +
b^2 − 3 aΦ′(0) 3 a In the extremely rare cases that α 2 does not satisfy the Armijo condition Φ(α 2 ) ≤ Φ(0) + c 1 α 2 Φ′(0) we create a new cubic model interpolating
Φ(0), Φ′(0), Φ(α 1 ), and Φ(α 2 )
i.e. Φ(0), Φ′(0) and the two most recent α’s.
Line Search Methods Step Length Selection
Interpolation The Initial StepLS — Strong Wolfe Conditions
Step Length Selection: Interpolation 5 of 7
At this point we must introduce to following safeguards to guarantee that we make sufficient progress: If |αk+1 − αk | < ǫ 1 or |αk+1| < ǫ 2 then αk+1 = αk / 2
The algorithm described assumes that computing the derivative is significantly more expensive than computing function values. However it is often, but not always, possible to compute the directional derivative (or a good estimate thereof) with minimal extra cost. In those cases we build the cubic interpolant so that it interpolates
Φ(αk ), Φ′(αk ), Φ(αk− 1 ), and Φ′(αk− 1 )
this is a Hermite Polynomial of degree 3 (see Math 541.)
Line Search Methods Step Length Selection
Interpolation The Initial StepLS — Strong Wolfe Conditions
Step Length Selection: Interpolation 7 of 7
The minimizer of H 3 (α) in [αk− 1 , αk ] is either at one of the end points, or else in the interior (given setting H 3 ′(α) = 0). The interior point is given by
αk+1 = αk − (αk − αk− 1 )
[ Φ′(αk ) + d 2 − d 1 Φ′(αk ) − Φ′(αk− 1 ) + 2d 2
]
where
d 1 = Φ′(αk− 1 ) + Φ′(αk ) − 3
[ Φ(αk− 1 ) − Φ(αk ) αk− 1 − αk
]
d 2 =
√ d 12 − Φ′(αk− 1 )Φ′(αk )
Either αk+1 is accepted as the step length, or the search process continues... Cubic interpolation gives quadratic convergence in the step length selection algorithm.
Line Search Methods Step Length Selection
Interpolation The Initial Step LS — Strong Wolfe Conditions
Step Length Selection: The Initial Step 1 of 2
For Newton and quasi-Newton methods, the search vector ˜pk contains an intrinsic sense of scale (being formed from the local descent, and curvature information), hence the initial trial step length should always be α 0 = 1, otherwise we break the quadratic respective super-linear convergence properties. For other search directions, such as steepest descent and conjugate gradient (to be described later) directions which do not have a sense of scale, other method must be used to select a good first trial step: Strategy #1: Assume that the rate of change in the current iteration will be the same as in the previous iteration, select α 0 :
α[ 0 k ]= α[k−1]^ p˜Tk− 1 ∇f (˜xk− 1 ) ˜pTk ∇f (˜xk ) .
Line Search Methods Step Length Selection
Interpolation The Initial Step LS — Strong Wolfe Conditions
Line Search for the Strong Wolfe Conditions 1 of 6
Algorithm: LS/Strong Wolfe Conditions
Line Search Methods Step Length Selection
Interpolation The Initial Step LS — Strong Wolfe Conditions
Line Search for the Strong Wolfe Conditions 2 of 6
In the first stage of the algorithm, either an acceptable step length, or a range [αi , αi+1] containing an acceptable step length is identified — none of the conditions 04, 07, 09 are satisfied so the step length is increased
If in the first stage we identified a range, the second stage invokes a function zoom which will identify an acceptable step from the interval. Notes: 04 establishes that αi is too long a step, thus α∗ must be in the range [αi− 1 , αi ]. If 07 holds, then both the strong Wolfe conditions hold (since not(04) must also hold. Finally, if 09 holds then the step is too large (since we are going uphill at this point.)