



Variants of Trust-Region Methods, Choice of Model Functions, Solving the Trust-Region Subproblem




RAPHAEL HAUSER MATHEMATICAL INSTITUTE, UNIVERSITY OF OXFORD
(i) Although we made a specific choice for defining and updating the trust region $R_k$, other choices are possible, for example balls in the norms $\|\cdot\|_1$ or $\|\cdot\|_\infty$. We will not pursue this matter further.
(ii) There is freedom in the choice of the model function $m_k$. We chose to investigate only quadratic model functions whose linear part coincides with the first-order Taylor approximation of $f$, but this leaves many possibilities for choosing the matrix $B_k$. We discuss this issue in Section 2 below.
(iii) The point $y_{k+1}$ should be obtained via an approximate solution of the trust-region subproblem
$$\min_{y \in R_k} m_k(y). \tag{1.1}$$
Theorem 1.2 of Lecture 6 shows that it is desirable to choose an approximate computation that uses the Cauchy point as a benchmark, but other than that there is complete freedom in choosing a method for this computation. Two of the most widely used methods in this context are the dogleg method of Section 3.1 and Steihaug’s method of Section 3.2.
$$m_k(x) = f(x_k) + \nabla f(x_k)^T (x - x_k) + \frac{1}{2}(x - x_k)^T B_k (x - x_k).$$
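As a small illustration of the quadratic model above, the following sketch evaluates $m_k$ for vectors stored as plain Python lists. The function name `quadratic_model` and its argument layout are hypothetical choices made for this example only; the test function $f(x) = x_1^2 + 2x_2^2$ is likewise an assumption. Because that $f$ is itself quadratic, the model with $B_k = D^2 f(x_k)$ should reproduce it exactly.

```python
def quadratic_model(f_xk, grad, B, xk, x):
    """Evaluate m_k(x) = f(x_k) + grad^T (x - x_k) + 0.5 (x - x_k)^T B (x - x_k).

    f_xk : value f(x_k)
    grad : gradient of f at x_k (list of floats)
    B    : symmetric matrix B_k (list of rows)
    xk,x : current iterate and evaluation point (lists of floats)
    """
    s = [xi - xki for xi, xki in zip(x, xk)]                    # s = x - x_k
    linear = sum(gi * si for gi, si in zip(grad, s))            # grad^T s
    Bs = [sum(bij * sj for bij, sj in zip(row, s)) for row in B]
    quad = 0.5 * sum(si * bsi for si, bsi in zip(s, Bs))        # 0.5 s^T B s
    return f_xk + linear + quad


# Illustrative data (assumed): f(x) = x1^2 + 2*x2^2 at x_k = (1, 1).
xk = [1.0, 1.0]
f_xk = 3.0
grad = [2.0, 4.0]
hessian = [[2.0, 0.0], [0.0, 4.0]]
value_at_origin = quadratic_model(f_xk, grad, hessian, xk, [0.0, 0.0])
```

With these inputs the model value at the origin agrees with $f(0, 0) = 0$, as expected for a quadratic objective.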
2.1. Trust-Region Newton Methods. If the problem dimension is not too large, the choice
$$B_k = D^2 f(x_k)$$
is reasonable and leads to a model function $m_k$ that is simply the second-order Taylor approximation of the objective function $f$ around the current iterate $x_k$. Methods based on this choice of model function are called trust-region Newton methods. It is important to understand that trust-region Newton methods are not simply the Newton-Raphson method with an additional step-size restriction. In fact, trust-region Newton methods overcome most of the unwanted aspects of the dynamical behaviour of the Newton-Raphson method while retaining all its advantages with regard to convergence speed:
(i) In the neighbourhood of a saddle point or a local maximiser $x^*$ of $f$, the Newton-Raphson method is attracted to $x^*$. This is unwanted, because $x^*$ is a spurious solution of the minimisation problem $\min f(x)$. Trust-region Newton methods are not attracted to such solutions because the trust-region framework ensures that the sequence $(f(x_k))_{\mathbb{N}}$ is decreasing.
(ii) The Newton-Raphson update $x_{k+1} = x_k - D^2 f(x_k)^{-1} \nabla f(x_k)$ is not defined when the Hessian $D^2 f(x_k)$ is singular. However, the trust-region subproblem (1.1) is still well-defined and $y_{k+1}$ can be computed.
(iii) Even in situations where the Newton-Raphson update $x_{k+1}$ is well-defined and $f(x_{k+1}) < f(x_k)$, $y_{k+1}$ may still differ from $x_{k+1}$ because $x_{k+1}$ can lie outside the trust region $R_k$.
(iv) When $x_k$ enters a sufficiently small neighbourhood of a local minimiser $x^*$ of $f$ where $D^2 f(x^*) \succ 0$, the updates $x_{k+1}, \dots$ generated by trust-region Newton methods start coinciding with those produced by the Newton-Raphson method. The two approaches therefore have the same asymptotic convergence rate, which is Q-quadratic.
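Points (i) and (ii) can be seen on a one-dimensional toy problem. The sketch below is only an illustration under stated assumptions: the objective $f(x) = x^3 - 3x$ (local maximiser at $x = -1$, local minimiser at $x = 1$), the starting point, the radius, and the helper names `newton_raphson` and `tr_step_1d` are all chosen for this example and do not come from the notes. In one dimension the trust-region subproblem can be solved exactly by inspection, which is what `tr_step_1d` does.

```python
def f(x):
    return x**3 - 3.0 * x        # assumed toy objective


def df(x):
    return 3.0 * x**2 - 3.0      # first derivative


def d2f(x):
    return 6.0 * x               # second derivative (the 1-D "Hessian")


def newton_raphson(x, iterations=20):
    """Plain Newton-Raphson iteration on f'(x) = 0, without safeguards."""
    for _ in range(iterations):
        h = d2f(x)
        if h == 0.0:
            raise ZeroDivisionError("singular Hessian: update undefined")
        x = x - df(x) / h
    return x


def tr_step_1d(g, h, delta):
    """Exact minimiser of g*s + 0.5*h*s**2 over |s| <= delta."""
    if h > 0.0 and abs(g / h) <= delta:
        return -g / h                      # Newton step lies inside the region
    return -delta if g > 0.0 else delta    # otherwise the minimiser is on the boundary


x0 = -0.5
x_newton = newton_raphson(x0)                  # drifts to the maximiser x = -1
step = tr_step_1d(df(x0), d2f(x0), delta=0.5)  # moves downhill instead
```

Starting from $x_0 = -0.5$, where the second derivative is negative, the Newton-Raphson iterates approach the local maximiser and increase $f$, whereas a single trust-region step moves to the boundary of the region in the descent direction and decreases $f$. The subproblem also remains solvable when `h` is zero, which is the situation described in (ii).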
2.2. Quasi-Newton Trust-Region Methods. When the problem dimension $n$ is large, the natural choice for the model function $m_k$ is to use quasi-Newton updates for the approximate Hessians $B_k$. The only difference from quasi-Newton line-search methods is that $x_{k+1}$ is now obtained by approximately solving the trust-region subproblem (1.1) rather than by a line search. Again, these methods are qualitatively different from the corresponding quasi-Newton line-search methods:
(i) When $B_k$ is updated using the SR1 rule (see Lecture 4), it is not guaranteed to be positive definite for all $k$. This poses a problem for the SR1 line-search method, which depends on solving the linear system $B_k d_k = -\nabla f(x_k)$, because $d_k$ may fail to be a descent direction or $B_k$ may be nearly singular. In contrast, the SR1 trust-region method is not affected by this, because the trust-region subproblem (1.1) is still well defined and an approximate minimiser $y_{k+1}$ can be obtained via Steihaug's method (see Section 3.2).
(ii) When $x_k$ enters a sufficiently small neighbourhood of a local minimiser $x^*$ of $f$, the output sequences produced by quasi-Newton trust-region methods and their line-search counterparts again start coinciding, and the asymptotic convergence rate is Q-superlinear for both approaches.
Note: (i) shows that while there are good reasons to prefer BFGS to SR1 updates in line-search methods, there is no such obvious choice when it comes to quasi-Newton trust-region methods. In fact, when the approximate solver of the trust-region subproblem does not require $B_k$ to be positive definite, SR1 updates are preferable because they are allowed to become indefinite and can therefore model the true Hessian $D^2 f(x_k)$ better. Moreover, they are cheaper to evaluate.
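To make the remark about indefiniteness concrete, here is a minimal sketch of an SR1 update. The formula $B^+ = B + \frac{r r^T}{r^T s}$ with $r = y - Bs$ is the standard SR1 rule from the literature; it is not restated in this section, so treat its form here as an assumption about what Lecture 4 contains. The function name `sr1_update`, the skipping threshold `r_tol`, and the data are hypothetical.

```python
def sr1_update(B, s, y, r_tol=1e-8):
    """Return the SR1-updated matrix B + r r^T / (r^T s), where r = y - B s.

    s : step x_{k+1} - x_k
    y : gradient difference grad f(x_{k+1}) - grad f(x_k)
    The update is skipped when |r^T s| is tiny (an assumed, commonly used safeguard).
    """
    n = len(s)
    Bs = [sum(bij * sj for bij, sj in zip(row, s)) for row in B]
    r = [yi - bsi for yi, bsi in zip(y, Bs)]
    denom = sum(ri * si for ri, si in zip(r, s))
    norm_r = sum(ri * ri for ri in r) ** 0.5
    norm_s = sum(si * si for si in s) ** 0.5
    if abs(denom) <= r_tol * norm_r * norm_s:
        return [row[:] for row in B]           # skip: denominator too small
    return [[B[i][j] + r[i] * r[j] / denom for j in range(n)] for i in range(n)]


# Assumed data: the gradient difference signals negative curvature along s.
B0 = [[1.0, 0.0], [0.0, 1.0]]
s = [1.0, 0.0]
y = [-1.0, 0.0]
B1 = sr1_update(B0, s, y)
```

In this example the updated matrix has a negative diagonal entry in the direction of `s`, so it is indefinite, and it satisfies the secant condition `B1 s = y`. A BFGS update would not be able to represent this negative curvature.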
3.1. The Dogleg Method. This method is very simple and cheap to compute, but it works only when $B_k$ is positive definite. Therefore, when this approach is used in connection with quasi-Newton trust-region methods, BFGS updates for $B_k$ are a good choice, but SR1 updates are not. The method is motivated as follows: consider the exact solution of the trust-region subproblem as a function of the trust-region radius,
$$x(\Delta) = \arg\min_{\{x \in \mathbb{R}^n \,:\, \|x - x_k\| \le \Delta\}} m_k(x). \tag{3.1}$$
[Figure: labels $-\nabla f(x_k)$, $y_k^u$, $x_k$, $R_k$, $y_k^{qn}$, $y_{k+1}$]
Fig. 3.2. The dogleg path in the case where $y_{k+1}$ lies on the second section of the leg.
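[...] derivatives at $\Delta = 0$ are collinear:
$$\lim_{\Delta \to 0^+} \frac{x(\Delta) - x_k}{\Delta} = -\frac{\nabla f(x_k)}{\|\nabla f(x_k)\|} \;\parallel\; -\frac{\|\nabla f(x_k)\|^2}{\nabla f(x_k)^T B_k \nabla f(x_k)}\,\nabla f(x_k) = \lim_{\tau \to 0^+} \frac{y(\tau) - y(0)}{\tau}.$$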
Proof. See Problem Set 4.
Parts i) and ii) of the Lemma show that the dogleg minimiser $y_{k+1}$ is easy to compute: if $y_k^{qn} \in R_k$ then $y_{k+1} = y_k^{qn}$, and otherwise $y_{k+1}$ is the unique intersection point of the dogleg path with the boundary of $R_k$; see Figures 3.1 and 3.2. The dogleg calculation of $y_{k+1}$ can thus be summed up as follows:
Algorithm 3.2 (Dogleg).
  compute $y_k^u$ as in (3.3)
  if $\|y_k^u - x_k\| \ge \Delta_k$
    stop with $y_{k+1} = x_k + \frac{\Delta_k}{\|y_k^u - x_k\|}(y_k^u - x_k)$   (*)
  compute $y_k^{qn}$ as in (3.2)
  if $\|y_k^{qn} - x_k\| \le \Delta_k$
    stop with $y_{k+1} = y_k^{qn}$
  else begin
    find $\tau^*$ s.t. $\|y_k^u + \tau^*(y_k^{qn} - y_k^u) - x_k\| = \Delta_k$
    stop with $y_{k+1} = y_k^u + \tau^*(y_k^{qn} - y_k^u)$
  end
If the algorithm stops in (*) then the dogleg minimiser lies on the first part of the leg and equals the Cauchy point $y_k^c$. Otherwise the dogleg minimiser lies on the second part of the leg and is better than the Cauchy point. Therefore, we have
$$m_k(y_{k+1}) \le m_k(y_k^c)$$
in both cases, and Theorem 3.1 of Lecture Note 6 can be applied.
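The dogleg computation can be sketched in a few lines of Python. This is an illustration under assumptions rather than the notes' own code: equations (3.2) and (3.3) are not visible in this section, so the sketch assumes the usual definitions $y_k^{qn} = x_k - B_k^{-1}\nabla f(x_k)$ and $y_k^u = x_k - \frac{\|\nabla f(x_k)\|^2}{\nabla f(x_k)^T B_k \nabla f(x_k)}\nabla f(x_k)$. All steps are expressed relative to $x_k$, the quasi-Newton step `p_qn` is supplied by the caller (who solves $B_k p = -\nabla f(x_k)$), and the name `dogleg_step` is hypothetical. The gradient is assumed to be nonzero and $B_k$ positive definite.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(B, v):
    return [dot(row, v) for row in B]


def norm(v):
    return math.sqrt(dot(v, v))


def dogleg_step(g, B, p_qn, delta):
    """Return the dogleg step y_{k+1} - x_k for gradient g, matrix B, radius delta.

    p_qn is the quasi-Newton step -B^{-1} g, computed by the caller.
    """
    # Unconstrained minimiser of the model along -g (assumed form of (3.3)).
    alpha = dot(g, g) / dot(g, matvec(B, g))
    p_u = [-alpha * gi for gi in g]
    if norm(p_u) >= delta:
        scale = delta / norm(p_u)              # case (*): scaled steepest descent
        return [scale * pi for pi in p_u]
    if norm(p_qn) <= delta:
        return list(p_qn)                      # quasi-Newton point is feasible
    # Second leg: solve ||p_u + tau * (p_qn - p_u)|| = delta for tau in (0, 1).
    d = [q - u for q, u in zip(p_qn, p_u)]
    a = dot(d, d)
    b = 2.0 * dot(p_u, d)
    c = dot(p_u, p_u) - delta**2               # negative, since ||p_u|| < delta
    tau = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return [u + tau * di for u, di in zip(p_u, d)]


# Assumed data: diagonal B, so the quasi-Newton step is available by hand.
g = [1.0, 1.0]
B = [[1.0, 0.0], [0.0, 10.0]]
p_qn = [-1.0, -0.1]
p_small = dogleg_step(g, B, p_qn, delta=0.1)   # first leg, case (*)
p_mid = dogleg_step(g, B, p_qn, delta=0.5)     # second leg
p_large = dogleg_step(g, B, p_qn, delta=2.0)   # full quasi-Newton step
```

The quadratic for `tau` always has exactly one positive root in this branch, because the start of the second leg lies strictly inside the trust region and its end lies outside.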
3.2. Steihaug's Method. This is the most widely used method for the approximate solution of the trust-region subproblem. The method works for quadratic models $m_k$ defined by an arbitrary symmetric $B_k$. Positive definiteness is therefore not required and SR1 updates can be used for $B_k$.
One of the strengths of the dogleg method is that it takes the quasi-Newton step $y_{k+1} = y_k^{qn}$ when $y_k^{qn}$ lies in the trust region. If $B_k$ converges to $D^2 f(x^*) \succ 0$ as $x_k$ approaches a strict local minimiser $x^*$ of $f$, this allows $(x_k)_{\mathbb{N}}$ to converge Q-superlinearly. Steihaug's method is designed to inherit this desirable property. However, when $B_k$ is not positive definite, it is not necessarily desirable to move to $y_k^{qn}$ because $m_k(y_k^{qn})$ might be larger than $m_k(x_k) = f(x_k)$. Steihaug's method overcomes this problem as follows:
Algorithm 3.3 (Steihaug).
S0 Initialisation: choose tolerance parameter $\epsilon > 0$
   set $z_0 = x_k$, $d_0 = -\nabla m_k(x_k)$
S1 For $j = 0, \dots, n-1$ repeat
     if $d_j^T B_k d_j \le 0$ begin
       find $\tau^* \ge 0$ s.t. $\|z_j + \tau^* d_j - x_k\| = \Delta_k$
       stop with $y_{k+1} = z_j + \tau^* d_j$
     end
     else begin
       find $\tau_j := \arg\min_{\tau \ge 0} m_k(z_j + \tau d_j)$
       set $z_{j+1} := z_j + \tau_j d_j$
       if $\|z_{j+1} - x_k\| \ge \Delta_k$ begin
         find $\tau^* \ge 0$ s.t. $\|z_j + \tau^* d_j - x_k\| = \Delta_k$
         stop with $y_{k+1} = z_j + \tau^* d_j$
       end
       if $\|\nabla m_k(z_{j+1})\| \le \epsilon$
         stop with $y_{k+1} = z_{j+1}$   (*)
       else
         compute $d_{j+1} = -\nabla m_k(z_{j+1}) + \frac{\|\nabla m_k(z_{j+1})\|^2}{\|\nabla m_k(z_j)\|^2}\, d_j$
     end
   end
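A compact Python sketch of these steps may help when reading the pseudocode. It is an illustration under assumptions, not a reference implementation: iterates are stored as offsets `p = z_j - x_k`, so that the model gradient is `g + B p`; the function name `steihaug_step`, the helper names, the early exit for a gradient already below the tolerance, and the choice to return the last iterate if the loop runs to completion are assumptions added for this example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(B, v):
    return [dot(row, v) for row in B]


def norm(v):
    return math.sqrt(dot(v, v))


def to_boundary(p, d, delta):
    """Return tau >= 0 with ||p + tau * d|| = delta, assuming ||p|| <= delta, d != 0."""
    a, b, c = dot(d, d), 2.0 * dot(p, d), dot(p, p) - delta**2
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def steihaug_step(g, B, delta, eps=1e-10):
    """Approximate minimiser of g^T p + 0.5 p^T B p over ||p|| <= delta."""
    n = len(g)
    p = [0.0] * n                  # z_0 = x_k
    r = list(g)                    # model gradient at z_0
    if norm(r) <= eps:             # assumed early exit, not in the pseudocode
        return p
    d = [-ri for ri in r]          # d_0 = -grad m_k(x_k)
    for _ in range(n):
        Bd = matvec(B, d)
        curvature = dot(d, Bd)
        if curvature <= 0.0:       # nonpositive curvature: go to the boundary
            tau = to_boundary(p, d, delta)
            return [pi + tau * di for pi, di in zip(p, d)]
        tau = -dot(r, d) / curvature               # exact line minimiser along d
        p_new = [pi + tau * di for pi, di in zip(p, d)]
        if norm(p_new) >= delta:   # step leaves the region: stop on the boundary
            tau = to_boundary(p, d, delta)
            return [pi + tau * di for pi, di in zip(p, d)]
        r_new = [ri + tau * bdi for ri, bdi in zip(r, Bd)]
        if norm(r_new) <= eps:     # case (*): model gradient small enough
            return p_new
        beta = dot(r_new, r_new) / dot(r, r)
        d = [-rn + beta * di for rn, di in zip(r_new, d)]
        p, r = p_new, r_new
    return p                       # assumed behaviour after n iterations


# Assumed data: an indefinite B, which the dogleg method could not handle.
p_indef = steihaug_step([1.0, 1.0], [[1.0, 0.0], [0.0, -1.0]], delta=1.0)
# Positive definite B with a large radius: conjugate gradients run to the minimiser.
p_pd = steihaug_step([1.0, 1.0], [[1.0, 0.0], [0.0, 10.0]], delta=2.0)
```

In the indefinite example the first search direction already has zero curvature, so the step ends on the trust-region boundary. In the positive definite example with a large radius, the iteration behaves like the conjugate gradient method and recovers the quasi-Newton step.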