Dogleg and Steihaug Method, Lecture Notes - Mathematics -, Study notes of Mathematical Methods

Variants of Trust-Region Methods, Choice of Model Functions, Solving the trust -region subproblem

Typology: Study notes

2010/2011

Uploaded on 09/09/2011

C12.1B: CONTINUOUS OPTIMISATION
LECTURE 7: THE DOGLEG AND STEIHAUG METHODS
RAPHAEL HAUSER
MATHEMATICAL INSTITUTE, UNIVERSITY OF OXFORD

1. Variants of Trust-Region Methods. The generic trust-region method we introduced in Lecture 6 is a fairly general algorithmic framework:

(i) Although we made a specific choice for defining and updating the trust region $R_k$, other choices are possible, for example by considering balls in the norms $\|\cdot\|_1$ or $\|\cdot\|_\infty$. We will not pursue this matter further.
(ii) There is freedom in the choice of the model function $m_k$. We chose to investigate only quadratic model functions whose linear part coincides with the first-order Taylor approximation of $f$, but this leaves many possibilities for choosing the matrix $B_k$. We discuss this issue in Section 2 below.
(iii) The point $y_{k+1}$ should be obtained via an approximate solution of the trust-region subproblem

\[ \min_{y \in R_k} m_k(y). \tag{1.1} \]

Theorem 1.2 of Lecture 6 shows that it is desirable to choose an approximate computation that uses the Cauchy point as a benchmark, but other than that there is complete freedom in choosing a method for this computation. Two of the most widely used methods in this context are the dogleg method of Section 3.1 and Steihaug’s method of Section 3.2.

2. Choice of the model function. Let us discuss a few methods for choosing the matrix $B_k$ that determines the model function

\[ m_k(x) = f(x_k) + \nabla f(x_k)^T (x - x_k) + \frac{1}{2} (x - x_k)^T B_k (x - x_k). \]
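As a minimal sketch of the model function above, the following Python snippet builds $m_k$ from $f(x_k)$, $\nabla f(x_k)$ and $B_k$. The helper name `quadratic_model` and the test function $f(x) = x_0^2 + 2x_1^2$ are assumptions made purely for illustration; they do not come from the lecture.

```python
import numpy as np

def quadratic_model(f_k, g_k, B_k, x_k):
    """Return m_k as a callable, given f(x_k), the gradient g_k and the matrix B_k."""
    def m_k(x):
        s = x - x_k  # displacement from the current iterate
        return f_k + g_k @ s + 0.5 * s @ (B_k @ s)
    return m_k

# Hypothetical example: f(x) = x0^2 + 2*x1^2 at x_k = (1, 1).
x_k = np.array([1.0, 1.0])
f_k = 3.0                                   # f(x_k)
g_k = np.array([2.0, 4.0])                  # gradient of f at x_k
B_k = np.array([[2.0, 0.0], [0.0, 4.0]])    # here the exact Hessian of f
m_k = quadratic_model(f_k, g_k, B_k, x_k)
```

Because this particular $f$ is itself quadratic and $B_k$ is its exact Hessian, the model reproduces $f$ everywhere; for a general $f$ the two only agree to second order near $x_k$.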

2.1. Trust-Region Newton Methods. If the problem dimension is not too large, the choice

\[ B_k = D^2 f(x_k) \]

is reasonable and leads to a model function $m_k$ that is simply the second-order Taylor approximation of the objective function $f$ around the current iterate $x_k$. Methods based on this choice of model function are called trust-region Newton methods.

It is important to understand that trust-region Newton methods are not simply the Newton-Raphson method with an additional step-size restriction. In fact, trust-region Newton methods overcome most of the unwanted aspects of the dynamical behaviour of the Newton-Raphson method while retaining all its advantages with regard to convergence speed:

(i) In the neighbourhood of a saddle point or a local maximiser $x^*$ of $f$, the Newton-Raphson method is attracted to $x^*$. This is unwanted, because $x^*$ is a spurious solution of the minimisation problem $\min f(x)$. Trust-region Newton methods are not attracted to such solutions because the trust-region framework ensures that the sequence $(f(x_k))_{\mathbb{N}}$ is decreasing.
(ii) The Newton-Raphson update $x_{k+1} = x_k - D^2 f(x_k)^{-1} \nabla f(x_k)$ is not defined when the Hessian $D^2 f(x_k)$ is singular. However, the trust-region subproblem (1.1) is still well defined and $y_{k+1}$ can be computed.
(iii) Even in situations where the Newton-Raphson update $x_{k+1}$ is well defined and $f(x_{k+1}) < f(x_k)$, $y_{k+1}$ may still differ from $x_{k+1}$ because $x_{k+1}$ can lie outside the trust region $R_k$.
(iv) When $x_k$ enters a sufficiently small neighbourhood of a local minimiser $x^*$ of $f$ where $D^2 f(x^*) \succ 0$, the updates $x_{k+1}, \dots$ generated by trust-region Newton methods start coinciding with those produced by the Newton-Raphson method. The two approaches therefore have the same asymptotic convergence rate, which is Q-quadratic.
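Point (i) can be observed on a one-dimensional toy problem. The sketch below is an illustration under assumptions not taken from the lecture: the function $f(x) = x^3/3 - x$ (local maximiser at $x = -1$, local minimiser at $x = 1$), the starting point $x_0 = -0.5$ and the helper `newton_raphson` are all hypothetical choices.

```python
def newton_raphson(df, d2f, x0, iters=20):
    """Plain Newton-Raphson iteration on the stationarity equation df(x) = 0."""
    x = x0
    for _ in range(iters):
        x = x - df(x) / d2f(x)  # no safeguard: any stationary point can attract
    return x

# Hypothetical test function with a local maximiser at x = -1.
f = lambda x: x**3 / 3.0 - x
df = lambda x: x**2 - 1.0
d2f = lambda x: 2.0 * x

x0 = -0.5
x_end = newton_raphson(df, d2f, x0)
print(x_end, f(x0), f(x_end))  # the iterates approach -1 and f has increased
```

Started at $x_0 = -0.5$, the iteration converges to the maximiser and the objective value goes up, which is exactly the behaviour that a method enforcing decrease of $f(x_k)$ rules out.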

2.2. Quasi-Newton Trust-Region Methods. When the problem dimension $n$ is large, the natural choice for the model function $m_k$ is to use quasi-Newton updates for the approximate Hessians $B_k$. The only algorithmic difference from quasi-Newton line-search methods is that $x_{k+1}$ is now obtained by approximately solving the trust-region subproblem (1.1) rather than by a line search. Again, these methods are qualitatively different from the corresponding quasi-Newton line-search methods:

(i) When $B_k$ is updated using the SR1 rule (see Lecture 4), it is not guaranteed to be positive definite for all $k$. This poses a problem for the SR1 line-search method, which depends on solving the linear system $B_k d_k = -\nabla f(x_k)$, because $d_k$ may fail to be a descent direction or $B_k$ may be nearly singular. In contrast, the SR1 trust-region method is not affected by this, because the trust-region subproblem (1.1) is still well defined and an approximate minimiser $y_{k+1}$ can be obtained via Steihaug's method (see Section 3.2).
(ii) When $x_k$ enters a sufficiently small neighbourhood of a local minimiser $x^*$ of $f$, the output sequences produced by quasi-Newton trust-region methods and their line-search counterparts again start coinciding, and the asymptotic convergence rate is Q-superlinear for both approaches.

Note: (i) shows that while there are good reasons to prefer BFGS to SR1 updates in line-search methods, there is no such obvious choice when it comes to quasi-Newton trust-region methods. In fact, when the approximate solver of the trust-region subproblem does not require $B_k$ to be positive definite, SR1 updates are preferable because they are allowed to become indefinite and can model the true Hessian $D^2 f(x_k)$ better. Moreover, they are cheaper to evaluate.
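To make the remark about indefiniteness concrete, here is a small sketch of the SR1 update. The formula itself is not stated in these notes (it belongs to Lecture 4); the version below is the standard symmetric rank-one formula $B_+ = B + \frac{(y - Bs)(y - Bs)^T}{(y - Bs)^T s}$ with $s = x_{k+1} - x_k$ and $y = \nabla f(x_{k+1}) - \nabla f(x_k)$. The skipping threshold `r` and the numerical data are assumptions chosen only for illustration.

```python
import numpy as np

def sr1_update(B, s, y, r=1e-8):
    """Symmetric rank-one update of B; skipped when the denominator is tiny."""
    v = y - B @ s
    denom = v @ s
    # Common safeguard (an assumption here): skip the update if it is ill-defined.
    if abs(denom) <= r * np.linalg.norm(s) * np.linalg.norm(v):
        return B
    return B + np.outer(v, v) / denom

# Hypothetical data: the gradient change y reveals negative curvature along s.
B = np.eye(2)
s = np.array([1.0, 0.0])
y = np.array([-1.0, 0.0])
B_new = sr1_update(B, s, y)
print(B_new)                      # the updated matrix
print(np.linalg.eigvalsh(B_new))  # one eigenvalue is negative
```

The updated matrix satisfies the secant equation $B_+ s = y$ and is indefinite, so a line-search method could not safely solve $B_+ d = -\nabla f$ with it, whereas a trust-region subproblem with this $B_+$ remains well defined.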

3. Solving the Trust-Region Subproblem. In this section we will discuss two of the most widely used methods for computing an approximate minimiser $y_{k+1}$ of the trust-region subproblem (1.1).

3.1. The Dogleg Method. This method is very simple and cheap to compute, but it works only when $B_k$ is positive definite. Therefore, when this approach is used in connection with quasi-Newton trust-region methods, BFGS updates for $B_k$ are a good choice, but SR1 updates are not. The method is motivated as follows: consider the exact solution of the trust-region subproblem as a function of the trust-region radius,

\[ x(\Delta) = \arg\min_{\{x \in \mathbb{R}^n : \|x - x_k\| \le \Delta\}} m_k(x). \tag{3.1} \]

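[Figure 3.2: the trust region $R_k$ around $x_k$, the direction $-\nabla f(x_k)$, and the points $y^u_k$, $y^{qn}_k$ and $y_{k+1}$.]

Fig. 3.2. The dogleg path in the case where $y_{k+1}$ lies on the second section of the leg.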

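[...] derivatives at $\Delta = 0$ are collinear:

\[ \lim_{\Delta \to 0+} \frac{x(\Delta) - x_k}{\Delta} = -\frac{\nabla f(x_k)}{\|\nabla f(x_k)\|} \;\parallel\; -\frac{\|\nabla f(x_k)\|^2}{\nabla f(x_k)^T B_k \nabla f(x_k)} \nabla f(x_k) = \lim_{\tau \to 0+} \frac{y(\tau) - y(0)}{\tau}. \]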

Proof. See Problem Set 4.

Parts (i) and (ii) of the Lemma show that the dogleg minimiser $y_{k+1}$ is easy to compute: if $y^{qn}_k \in R_k$ then $y_{k+1} = y^{qn}_k$, and otherwise $y_{k+1}$ is the unique intersection point of the dogleg path with the boundary of $R_k$; see Figures 3.1 and 3.2. The dogleg calculation of $y_{k+1}$ can thus be summed up as follows:

Algorithm 3.2 (Dogleg).
    compute $y^u_k$ as in (3.3)
    if $\|y^u_k - x_k\| \ge \Delta_k$
        stop with $y_{k+1} = x_k + \frac{\Delta_k}{\|y^u_k - x_k\|} (y^u_k - x_k)$    (*)
    compute $y^{qn}_k$ as in (3.2)
    if $\|y^{qn}_k - x_k\| \le \Delta_k$
        stop with $y_{k+1} = y^{qn}_k$
    else begin
        find $\tau^*$ s.t. $\|y^u_k + \tau^* (y^{qn}_k - y^u_k) - x_k\| = \Delta_k$
        stop with $y_{k+1} = y^u_k + \tau^* (y^{qn}_k - y^u_k)$
    end

If the algorithm stops in (*) then the dogleg minimiser lies on the first part of the leg and equals the Cauchy point. Otherwise the dogleg minimiser lies on the second part of the leg and is better than the Cauchy point. Therefore, we have

\[ m_k(y_{k+1}) \le m_k(y^c_k) \]

in both cases, and Theorem 3.1 of Lecture Note 6 can be applied.
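The dogleg steps above can be sketched in Python as follows. Definitions (3.2) and (3.3) are not part of this excerpt, so the code assumes the usual forms $y^{qn}_k = x_k - B_k^{-1} \nabla f(x_k)$ and $y^u_k = x_k - \frac{\|\nabla f(x_k)\|^2}{\nabla f(x_k)^T B_k \nabla f(x_k)} \nabla f(x_k)$. The function name `dogleg_step`, the use of the Euclidean norm and the test data are likewise illustrative assumptions. The value $\tau^*$ is obtained as the positive root of the quadratic equation $\|p_u + \tau (p_{qn} - p_u)\|^2 = \Delta_k^2$.

```python
import numpy as np

def dogleg_step(x_k, g, B, delta):
    """One dogleg step. Assumes g != 0 and B symmetric positive definite."""
    # Assumed form of (3.3): minimiser of m_k along the steepest-descent ray.
    p_u = -(g @ g) / (g @ (B @ g)) * g
    norm_u = np.linalg.norm(p_u)
    if norm_u >= delta:
        return x_k + (delta / norm_u) * p_u          # case (*): first leg
    # Assumed form of (3.2): the quasi-Newton point.
    p_qn = -np.linalg.solve(B, g)
    if np.linalg.norm(p_qn) <= delta:
        return x_k + p_qn
    # Second leg: solve ||p_u + tau*d||^2 = delta^2 for the root tau in (0, 1).
    d = p_qn - p_u
    a = d @ d
    b = 2.0 * (p_u @ d)
    c = p_u @ p_u - delta**2                         # negative, as ||p_u|| < delta
    tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return x_k + p_u + tau * d

# Hypothetical data for illustration.
x0 = np.zeros(2)
g = np.array([1.0, 1.0])
B = np.diag([1.0, 10.0])
y_first = dogleg_step(x0, g, B, 0.1)   # stops in (*)
y_second = dogleg_step(x0, g, B, 0.5)  # boundary point on the second leg
y_full = dogleg_step(x0, g, B, 2.0)    # quasi-Newton point inside the region
```

For the two small radii the returned point lies on the boundary of the trust region, and for the large radius it coincides with the quasi-Newton point.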

3.2. Steihaug's Method. This is the most widely used method for the approximate solution of the trust-region subproblem. The method works for quadratic models $m_k$ defined by an arbitrary symmetric $B_k$. Positive definiteness is therefore not required and SR1 updates can be used for $B_k$.

One of the strengths of the dogleg method is that it takes the quasi-Newton step $y_{k+1} = y^{qn}_k$ when $y^{qn}_k$ lies in the trust region. If $B_k$ converges to $D^2 f(x^*) \succ 0$ as $x_k$ approaches a strict local minimiser $x^*$ of $f$, this allows $(x_k)_{\mathbb{N}}$ to converge Q-superlinearly. Steihaug's method is designed to inherit this desirable property. However, when $B_k$ is not positive definite, it is not necessarily desirable to move to $y^{qn}_k$ because $m_k(y^{qn}_k)$ might be larger than $m_k(x_k) = f(x_k)$. Steihaug's method overcomes this problem as follows:

  • Draw the polygon traced by the iterates $x_k = z_0, z_1, \dots, z_j, \dots$ obtained by applying the conjugate gradient algorithm to the minimisation of the quadratic function $m_k(x)$ for as long as the updates are defined, i.e., as long as $d_j^T B_k d_j > 0$.
  • This terminates in the quasi-Newton point $z_n = y^{qn}_k$, unless $d_j^T B_k d_j \le 0$ for some $j$. In the second case, continue to draw the polygon from $z_j$ to infinity along $d_j$, as $m_k$ can be pushed to $-\infty$ along that path.
  • Minimise $m_k$ along this polygon and select $y_{k+1}$ as the minimiser.

The polygon is constructed so that $m_k(z)$ decreases along its path, while Theorem 3.4 below shows that $\|z - x_k\|$ increases. Therefore, if the polygon ends at $z_n \in R_k$ then $y_{k+1} = z_n$, and otherwise $y_{k+1}$ is the unique point where the polygon crosses the boundary $\partial R_k$ of the trust region. Stated more formally, Steihaug's method proceeds as follows:

Algorithm 3.3 (Steihaug).
S0  Initialisation: choose tolerance parameter $\epsilon > 0$
    set $z_0 = x_k$, $d_0 = -\nabla m_k(x_k)$
S1  For $j = 0, \dots, n - 1$ repeat
        if $d_j^T B_k d_j \le 0$
        begin
            find $\tau^* \ge 0$ s.t. $\|z_j + \tau^* d_j - x_k\| = \Delta_k$
            stop with $y_{k+1} = z_j + \tau^* d_j$
        end
        else begin
            find $\tau_j := \arg\min_{\tau \ge 0} m_k(z_j + \tau d_j)$
            set $z_{j+1} := z_j + \tau_j d_j$
            if $\|z_{j+1} - x_k\| \ge \Delta_k$
            begin
                find $\tau^* \ge 0$ s.t. $\|z_j + \tau^* d_j - x_k\| = \Delta_k$
                stop with $y_{k+1} = z_j + \tau^* d_j$
            end
            if $\|\nabla m_k(z_{j+1})\| \le \epsilon$
                stop with $y_{k+1} = z_{j+1}$    (*)
            else
                compute $d_{j+1} = -\nabla m_k(z_{j+1}) + \frac{\|\nabla m_k(z_{j+1})\|^2}{\|\nabla m_k(z_j)\|^2} d_j$
        end
    end
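A minimal Python sketch of Algorithm 3.3 is given below. It works with the displacement $p = z_j - x_k$ and uses $\nabla m_k(z) = \nabla f(x_k) + B_k (z - x_k)$, which follows from the definition of $m_k$. The names `steihaug_step` and `boundary_tau`, the Euclidean norm, the initial check for a vanishing gradient and the test data are assumptions added for illustration and are not part of the algorithm as stated.

```python
import numpy as np

def boundary_tau(p, d, delta):
    """Root tau >= 0 of ||p + tau*d|| = delta, assuming ||p|| < delta and d != 0."""
    a = d @ d
    b = 2.0 * (p @ d)
    c = p @ p - delta**2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def steihaug_step(x_k, g, B, delta, eps=1e-10):
    """Approximate minimiser of m_k over the ball of radius delta around x_k."""
    p = np.zeros(g.size)           # p = z_j - x_k, so z_0 = x_k
    r = g.copy()                   # r = grad m_k(z_j) = g + B p
    if np.linalg.norm(r) <= eps:   # added guard: x_k is already stationary
        return x_k
    d = -r
    for _ in range(g.size):
        curv = d @ (B @ d)
        if curv <= 0.0:            # non-positive curvature: follow d to the boundary
            return x_k + p + boundary_tau(p, d, delta) * d
        tau = -(r @ d) / curv      # exact minimiser of m_k along z_j + tau*d
        p_next = p + tau * d
        if np.linalg.norm(p_next) >= delta:
            return x_k + p + boundary_tau(p, d, delta) * d
        r_next = r + tau * (B @ d)
        if np.linalg.norm(r_next) <= eps:
            return x_k + p_next    # case (*)
        d = -r_next + (r_next @ r_next) / (r @ r) * d
        p, r = p_next, r_next
    return x_k + p

# Hypothetical data for illustration.
x0 = np.zeros(2)
g = np.array([1.0, 1.0])
B = np.diag([1.0, 10.0])
y_inner = steihaug_step(x0, g, B, 2.0)    # conjugate gradients reach the quasi-Newton point
y_bound = steihaug_step(x0, g, B, 0.1)    # first segment already leaves the region
y_indef = steihaug_step(x0, np.array([0.0, 1.0]), np.diag([1.0, -1.0]), 0.5)
```

In the last call $B_k$ is indefinite and the first search direction has negative curvature, so the step runs along $d_0$ until it hits the boundary of the trust region.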