







RAPHAEL HAUSER MATHEMATICAL INSTITUTE, UNIVERSITY OF OXFORD
$$\min_{\alpha > 0} f(x_k + \alpha d_k),$$
where $d_k$ is a descent direction. Thus, in effect, in each iteration one replaces the $n$-dimensional minimisation problem
$$\min_{x \in \mathbb{R}^n} f(x) \qquad (1.1)$$
by a simpler one-dimensional minimisation problem. Line-search methods are widely used in practical optimisation codes, but this is not the only useful principle for constructing iterative minimisation algorithms. Trust region methods constitute a second fundamental class of algorithms. In this approach (1.1) is again replaced by a sequence of easier problems, but instead of reducing the problem dimension, simplicity is achieved by replacing $f$ with a degree-2 polynomial. Conceptually, the idea can be described as follows:
$$x_{k+1} \approx \arg\min_{x \in R_k} m_k(x). \qquad (1.2)$$
It may seem surprising that we propose to replace the unconstrained optimisation problem (1.1) by the constrained trust region subproblem (1.2), as constraints introduce additional difficulties. However, this is worthwhile because (1.2) need only be solved approximately, and this can be done efficiently when
$$m_k(x) = f(x_k) + \nabla f(x_k)^T (x - x_k) + \frac{1}{2} (x - x_k)^T B_k (x - x_k) \qquad (1.3)$$
is a quadratic function and the trust region $R_k$ is chosen judiciously (see Lecture 7). The linear part of (1.3) coincides with the first-order Taylor approximation of $f$ around $x_k$, so $m_k(x)$ will be a good local approximation of $f(x)$ if $B_k \approx D^2 f(x_k)$. To make the method work, we will therefore have to worry about how to update $B_k$ cheaply. But note that the quasi-Newton Hessian approximations discussed in Lecture 5 are perfect for this job!
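As an illustration of (1.3), the following Python sketch evaluates the quadratic model from $f(x_k)$, $\nabla f(x_k)$ and $B_k$. It assumes the usual factor $\frac12$ on the quadratic term; the function name `quadratic_model` and the test function are hypothetical. When $f$ is itself quadratic and $B_k$ is its exact Hessian, the model reproduces $f$ exactly, which the usage lines illustrate.

```python
import numpy as np


def quadratic_model(f_xk, grad_xk, B_k, x_k, x):
    """Evaluate m_k(x) = f(x_k) + grad^T (x - x_k) + 0.5 (x - x_k)^T B_k (x - x_k)."""
    s = np.asarray(x, dtype=float) - np.asarray(x_k, dtype=float)  # step from x_k
    return float(f_xk + grad_xk @ s + 0.5 * s @ B_k @ s)


def f(x):
    """Hypothetical test function x_1^2 + 3 x_2^2 with Hessian diag(2, 6)."""
    return x[0] ** 2 + 3.0 * x[1] ** 2


x_k = np.array([1.0, 1.0])
grad_xk = np.array([2.0 * x_k[0], 6.0 * x_k[1]])  # gradient of f at x_k
B_k = np.diag([2.0, 6.0])                          # exact Hessian, so the model is exact
x = np.array([0.5, -0.2])
m_val = quadratic_model(f(x_k), grad_xk, B_k, x_k, x)  # equals f(x) = 0.37
```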
1.1. Accepting and Rejecting Updates. Let $y_{k+1}$ be the approximate minimiser of the trust region subproblem (1.2). In principle, this is the point we would like to select as our next iterate $x_{k+1}$. However, $y_{k+1}$ is computed on the basis of the model function $m_k$, and it could happen that moving to $y_{k+1}$ leads to an increase rather than a decrease of the true objective function $f$. Trust region methods therefore accept $y_{k+1}$ only if the decrease achieved in $f$ is at least a fixed proportion of the decrease "promised" by $m_k$,
$$x_{k+1} = \begin{cases} y_{k+1} & \text{if } \dfrac{f(x_k) - f(y_{k+1})}{m_k(x_k) - m_k(y_{k+1})} > \eta, \\ x_k & \text{otherwise,} \end{cases} \qquad (1.4)$$
where $\eta \in (0, 1/4)$ is fixed. Note that rejecting the update does not imply that the algorithm will stall, because we can still shrink the trust region so that $y_{k+2} \neq y_{k+1}$.
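The acceptance test (1.4) amounts to computing the ratio of actual to predicted decrease and comparing it with $\eta$. A minimal Python sketch, assuming the predicted decrease $m_k(x_k) - m_k(y_{k+1})$ is positive; the helper names are hypothetical, and raising an error when it is not is a design choice of this sketch rather than something the notes prescribe.

```python
def reduction_ratio(f_xk, f_y, m_xk, m_y):
    """Actual over predicted decrease: (f(x_k) - f(y_{k+1})) / (m_k(x_k) - m_k(y_{k+1}))."""
    predicted = m_xk - m_y
    if predicted <= 0.0:
        # The ratio carries no information if the model promises no decrease.
        raise ValueError("model predicts no decrease")
    return (f_xk - f_y) / predicted


def next_iterate(x_k, y_next, rho, eta):
    """Acceptance rule (1.4): move to y_{k+1} only if rho > eta, else stay at x_k."""
    return y_next if rho > eta else x_k


# The model promised a decrease of 2, but f only fell by 1.
rho = reduction_ratio(f_xk=4.0, f_y=3.0, m_xk=4.0, m_y=2.0)
```

Here $\rho = 1/2$, so the step is accepted for every admissible $\eta \in (0, 1/4)$.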
1.2. Updating the Trust Region. The easiest way to define a trust region $R_k$ is to choose the closed ball of radius $\Delta_k$ around $x_k$ in some norm $\|\cdot\|$,
$$R_k = \{x \in \mathbb{R}^n : \|x - x_k\| \le \Delta_k\}.$$
For simplicity, we will assume that $\|\cdot\|$ is the Euclidean norm. $\Delta_k$ is called the trust region radius. In order to define a new trust region $R_{k+1}$ around $x_{k+1}$, it suffices to fix a rule for selecting $\Delta_{k+1}$. The following rule is a popular choice, where $y_{k+1}$ is as in Section 1.1,
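$$\Delta_{k+1} = \begin{cases} \Delta_k / 4 & \text{if } \dfrac{f(x_k) - f(y_{k+1})}{m_k(x_k) - m_k(y_{k+1})} < \dfrac{1}{4}, \\ \min(2\Delta_k, \Delta_{\max}) & \text{if } \dfrac{f(x_k) - f(y_{k+1})}{m_k(x_k) - m_k(y_{k+1})} > \dfrac{3}{4}, \\ \Delta_k & \text{otherwise.} \end{cases}$$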
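The rule is designed so that $\Delta_k$ never exceeds $\Delta_{\max}$, and it is motivated by comparing the objective function decrease $f(x_k) - f(y_{k+1})$ with the decrease $m_k(x_k) - m_k(y_{k+1})$ "promised" by the model function: if the actual decrease is less than a quarter of the promised one, the model is deemed unreliable and the radius is reduced; if it exceeds three quarters, the model is trusted and the radius is doubled (up to $\Delta_{\max}$); otherwise the radius is left unchanged.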
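The radius rule transcribes directly into code. In this sketch `update_radius` is a hypothetical name, and `rho` stands for the same ratio $\frac{f(x_k) - f(y_{k+1})}{m_k(x_k) - m_k(y_{k+1})}$ used in the acceptance test (1.4).

```python
def update_radius(delta_k, rho, delta_max):
    """Radius rule of Section 1.2; rho is the actual/predicted decrease ratio."""
    if rho < 0.25:
        return delta_k / 4.0                  # poor agreement: shrink the region
    if rho > 0.75:
        return min(2.0 * delta_k, delta_max)  # good agreement: expand, capped at delta_max
    return delta_k                            # otherwise keep the radius
```

Because the only growth step is capped at `delta_max`, a starting radius below `delta_max` can never exceed it.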
1.3. The Algorithm. We have now assembled the necessary elements to formulate a generic trust region algorithm:
Algorithm 1.1 (Generic Trust Region Method).
S0 Choose $\Delta_{\max} > 0$, $\Delta_0 \in (0, \Delta_{\max})$, $\eta \in (0, 1/4)$, $x_0 \in \mathbb{R}^n$, $B_0$, $\epsilon > 0$.
S1 While $\|\nabla f(x_k)\| \ge \epsilon$ repeat:
   Compute $y_{k+1}$ as the approximate minimiser of (1.2).
   Determine $x_{k+1}$ via (1.4).
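A minimal runnable sketch of Algorithm 1.1 in Python follows. It fills in two choices that the excerpt leaves open, so treat them as assumptions: the subproblem (1.2) is solved approximately by the standard Cauchy point (the minimiser of $m_k$ along $-\nabla f(x_k)$ within the ball, which we assume is what (2.1) refers to), and $B_k$ is the exact Hessian supplied by the caller rather than a quasi-Newton approximation. The radius update follows Section 1.2; `max_iter` is a safeguard added here, and all names are hypothetical.

```python
import numpy as np


def cauchy_point(x_k, g, B, delta):
    """Approximate solution of (1.2): minimise the model along -g within the ball."""
    g_norm = np.linalg.norm(g)
    gBg = float(g @ B @ g)
    tau = 1.0 if gBg <= 0.0 else min(1.0, g_norm ** 3 / (delta * gBg))
    return x_k - tau * (delta / g_norm) * g


def trust_region(f, grad, hess, x0, delta0=1.0, delta_max=10.0, eta=0.1,
                 eps=1e-6, max_iter=10_000):
    """Generic trust region loop in the spirit of Algorithm 1.1 (Cauchy-point variant)."""
    if not (0.0 < delta0 < delta_max and 0.0 < eta < 0.25):
        raise ValueError("need 0 < delta0 < delta_max and 0 < eta < 1/4")
    x = np.asarray(x0, dtype=float)
    delta = delta0
    for _ in range(max_iter):                    # safeguard, not part of Algorithm 1.1
        g = grad(x)
        if np.linalg.norm(g) < eps:              # stopping test of S1
            break
        B = hess(x)                              # assumption: exact Hessian as B_k
        y = cauchy_point(x, g, B, delta)
        s = y - x
        predicted = -(g @ s + 0.5 * s @ B @ s)   # m_k(x_k) - m_k(y_{k+1}) > 0
        rho = (f(x) - f(y)) / predicted
        if rho > eta:                            # acceptance rule (1.4)
            x = y
        if rho < 0.25:                           # radius update of Section 1.2
            delta = delta / 4.0
        elif rho > 0.75:
            delta = min(2.0 * delta, delta_max)
    return x


def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def grad(x):
    return np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])


def hess(x):
    return np.array([[2.0, 0.0], [0.0, 20.0]])


x_star = trust_region(f, grad, hess, [0.0, 0.0])  # minimiser is (1, -2)
```

With only Cauchy steps the method behaves like a safeguarded steepest descent, so convergence is linear; better subproblem solvers plug into the same loop.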
Claim 1 follows from Proposition 3.2 below; for Claim 2 see Problem Set 3. It follows from these two claims that
$$\lim_{k \to \infty} f(x_k) = f(x_0) + \sum_{k=0}^{\infty} \bigl( f(x_{k+1}) - f(x_k) \bigr) = -\infty,$$
since (1.4) guarantees that the series on the right hand side contains only nonpositive terms.
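The identity above is a telescoping sum. Written out for a finite horizon $K$, it reads as follows; since (1.4) makes every term nonpositive, the partial sums are nonincreasing in $K$, so the limit exists in $[-\infty, f(x_0)]$ and equals the infinite series.

$$f(x_K) = f(x_0) + \sum_{k=0}^{K-1} \bigl( f(x_{k+1}) - f(x_k) \bigr), \qquad f(x_{K+1}) - f(x_K) \le 0 \quad \text{for all } K \ge 0.$$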
We now set out to show the validity of Claim 1. Intuitively it is clear that when $\|\nabla f(x_k)\|$ is bounded below and $\Delta_k$ becomes sufficiently small, then $f(y_{k+1}) - f(x_k) \approx m_k(y_{k+1}) - m_k(x_k)$ should hold. Indeed, in Lemma 3.5 below we will show that $\|\nabla f(x_k)\| \ge \epsilon$ and $\Delta_k < 2\epsilon/(7\beta)$ imply
$$\frac{f(y_{k+1}) - f(x_k)}{m_k(y_{k+1}) - m_k(x_k)}$$
Claim 1 then follows immediately from the following result:
Proposition 3.2. There are at most $\left\lfloor \log_4 \dfrac{\Delta_{\max}}{2\epsilon/(7\beta)} \right\rfloor$ rejected updates between successive accepted updates.
Proof. Suppose to the contrary that all updates $y_{k+1}$ for $k = k_0, k_0 + 1, \dots, k_0 + \left\lceil \log_4 \dfrac{\Delta_{\max}}{2\epsilon/(7\beta)} \right\rceil =: k_1$ are rejected. Then
$$\Delta_{k_1} = \Delta_{k_0} \, 4^{-(k_1 - k_0)} \le \frac{2\epsilon}{7\beta}$$
and (3.1) contradicts our assumption that $y_{k_1+1}$ is rejected.
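To get a feel for the bound in Proposition 3.2, the sketch below evaluates it for arbitrary illustrative values $\Delta_{\max} = 1$, $\beta = 10$, $\epsilon = 0.1$, and checks that the corresponding ceiling number of divisions by 4 brings the radius below $2\epsilon/(7\beta)$, as in the proof. The function names are hypothetical.

```python
import math


def rejection_bound(delta_max, beta, eps):
    """floor(log_4(delta_max / (2 eps / (7 beta)))) from Proposition 3.2, clipped at 0."""
    threshold = 2.0 * eps / (7.0 * beta)  # radius below which Lemma 3.5 applies
    # Floating-point log may misround when the ratio is an exact power of 4.
    return max(0, math.floor(math.log(delta_max / threshold, 4)))


def shrunk_radius(delta, n):
    """Radius after n consecutive rejections, each dividing it by 4."""
    return delta * 4.0 ** (-n)


delta_max, beta, eps = 1.0, 10.0, 0.1
bound = rejection_bound(delta_max, beta, eps)  # log_4(350) is about 4.23, so 4
n = math.ceil(math.log(delta_max / (2.0 * eps / (7.0 * beta)), 4))  # 5 here
```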
It remains to prove (3.1). We divide the argument into several lemmas.
Lemma 3.3. Let $\|\nabla f(x_k)\| \ge \epsilon$ and $\Delta_k < \epsilon/\beta$. Then
$$y^c_k = x_k - \frac{\Delta_k}{\|\nabla f(x_k)\|} \nabla f(x_k). \qquad (3.2)$$
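Proof. If $\nabla f(x_k)^T B_k \nabla f(x_k) \le 0$, then (3.2) holds because of (2.1). So we may assume that $\nabla f(x_k)^T B_k \nabla f(x_k) > 0$, and then
$$\Delta_k < \frac{\epsilon}{\beta} \le \frac{\|\nabla f(x_k)\|}{\beta} = \frac{\|\nabla f(x_k)\|^3}{\beta \|\nabla f(x_k)\|^2} \le \frac{\|\nabla f(x_k)\|^3}{\nabla f(x_k)^T B_k \nabla f(x_k)}.$$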
But this implies that
$$\frac{\Delta_k}{\|\nabla f(x_k)\|} < \frac{\nabla f(x_k)^T \nabla f(x_k)}{\nabla f(x_k)^T B_k \nabla f(x_k)}.$$
The result now follows from (2.1).
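The Cauchy point formula (2.1) lies outside this excerpt. The sketch below assumes it is the standard one, $y^c_k = x_k - \tau_k \frac{\Delta_k}{\|\nabla f(x_k)\|} \nabla f(x_k)$ with $\tau_k = 1$ if $\nabla f(x_k)^T B_k \nabla f(x_k) \le 0$ and $\tau_k = \min\bigl(1, \|\nabla f(x_k)\|^3 / (\Delta_k \nabla f(x_k)^T B_k \nabla f(x_k))\bigr)$ otherwise, which matches the case distinction in the proof. It also assumes that $\beta$ bounds the spectral norm of $B_k$. Under these assumptions it checks Lemma 3.3 on one random instance: for $\Delta_k < \|\nabla f(x_k)\|/\beta$ the Cauchy point lies on the boundary of the trust region.

```python
import numpy as np


def cauchy_point(x_k, g, B, delta):
    """Standard Cauchy point: minimise the model along -g within ||s|| <= delta."""
    g_norm = np.linalg.norm(g)
    gBg = float(g @ B @ g)
    if gBg <= 0.0:
        tau = 1.0  # model decreases all the way to the boundary
    else:
        tau = min(1.0, g_norm ** 3 / (delta * gBg))
    return x_k - tau * (delta / g_norm) * g


rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = 0.5 * (A + A.T)               # symmetric, possibly indefinite B_k
beta = np.linalg.norm(B, 2)       # assumed: beta bounds ||B_k||_2
x_k = rng.standard_normal(n)
g = rng.standard_normal(n)        # plays the role of grad f(x_k)
eps = np.linalg.norm(g)           # take eps = ||grad f(x_k)||
delta = 0.9 * eps / beta          # hypothesis of Lemma 3.3
y_c = cauchy_point(x_k, g, B, delta)
boundary_point = x_k - (delta / np.linalg.norm(g)) * g  # right-hand side of (3.2)
```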
Lemma 3.4. Let $\|\nabla f(x_k)\| \ge \epsilon$ and $\Delta_k < \epsilon/(2\beta)$. Then
$$\nabla f(x_k)^T (y_{k+1} - x_k) \le -\frac{\Delta_k \|\nabla f(x_k)\|}{2}.$$
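Proof. The relation $\Delta_k < \epsilon/(2\beta) \le \|\nabla f(x_k)\|/(2\beta)$ implies that
$$-\Delta_k \|\nabla f(x_k)\| + \Delta_k^2 \beta \le -\frac{\Delta_k \|\nabla f(x_k)\|}{2}. \qquad (3.3)$$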
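Moreover, by Lemma 3.3, $\Delta_k < \epsilon/(2\beta) < \epsilon/\beta$ implies $y^c_k = x_k - \frac{\Delta_k}{\|\nabla f(x_k)\|} \nabla f(x_k)$, and hence
$$m_k(y^c_k) = f(x_k) - \Delta_k \|\nabla f(x_k)\| + \frac{\Delta_k^2}{2} \, \frac{\nabla f(x_k)^T B_k \nabla f(x_k)}{\|\nabla f(x_k)\|^2}. \qquad (3.4)$$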
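For completeness, the closed form for $m_k(y^c_k)$ follows by substituting $x - x_k = -\frac{\Delta_k}{\|\nabla f(x_k)\|} \nabla f(x_k)$ into (1.3), with the factor $\frac12$ on the quadratic term, and using $\nabla f(x_k)^T \nabla f(x_k) = \|\nabla f(x_k)\|^2$:

$$m_k(y^c_k) = f(x_k) - \frac{\Delta_k}{\|\nabla f(x_k)\|} \nabla f(x_k)^T \nabla f(x_k) + \frac{1}{2} \, \frac{\Delta_k^2}{\|\nabla f(x_k)\|^2} \nabla f(x_k)^T B_k \nabla f(x_k) = f(x_k) - \Delta_k \|\nabla f(x_k)\| + \frac{\Delta_k^2}{2} \, \frac{\nabla f(x_k)^T B_k \nabla f(x_k)}{\|\nabla f(x_k)\|^2}.$$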
The assumption $m_k(y_{k+1}) \le m_k(y^c_k)$ from Theorem 3.1 implies
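$$f(x_k) + \nabla f(x_k)^T (y_{k+1} - x_k) + \frac{1}{2} (y_{k+1} - x_k)^T B_k (y_{k+1} - x_k) \overset{(3.4)}{\le} f(x_k) - \Delta_k \|\nabla f(x_k)\| + \frac{\Delta_k^2}{2} \, \frac{\nabla f(x_k)^T B_k \nabla f(x_k)}{\|\nabla f(x_k)\|^2},$$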
so that
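$$\begin{aligned} \nabla f(x_k)^T (y_{k+1} - x_k) &\le -\Delta_k \|\nabla f(x_k)\| + \frac{\Delta_k^2}{2} \, \frac{\nabla f(x_k)^T B_k \nabla f(x_k)}{\|\nabla f(x_k)\|^2} - \frac{1}{2} (y_{k+1} - x_k)^T B_k (y_{k+1} - x_k) \\ &\le -\Delta_k \|\nabla f(x_k)\| + \Delta_k^2 \beta \overset{(3.3)}{\le} -\frac{\Delta_k \|\nabla f(x_k)\|}{2}. \end{aligned}$$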
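Lemma 3.4 can be probed numerically. The sketch below assumes, as before, that $\beta$ bounds $\|B_k\|_2$; it takes $\epsilon = \|\nabla f(x_k)\|$, uses the boundary point of (3.2) as $y^c_k$, samples points $y$ uniformly in the trust region, keeps those with $m_k(y) \le m_k(y^c_k)$, and returns the largest value of $\nabla f(x_k)^T (y - x_k) + \Delta_k \|\nabla f(x_k)\|/2$, which the lemma says is nonpositive. This is a sanity check on random instances, not a proof; `check_lemma_3_4` is a hypothetical name.

```python
import numpy as np


def check_lemma_3_4(n=3, trials=20_000, seed=1):
    """Return max over sampled admissible y of grad^T (y - x_k) + delta * ||grad|| / 2."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    B = 0.5 * (A + A.T)                    # symmetric, possibly indefinite B_k
    beta = np.linalg.norm(B, 2)            # assumed: beta bounds ||B_k||_2
    g = rng.standard_normal(n)             # plays the role of grad f(x_k)
    g_norm = np.linalg.norm(g)
    delta = 0.99 * g_norm / (2.0 * beta)   # Delta_k < eps / (2 beta) with eps = ||g||

    def model_change(s):
        """m_k(x_k + s) - f(x_k) for a step s."""
        return g @ s + 0.5 * s @ B @ s

    s_c = -(delta / g_norm) * g            # Cauchy step, on the boundary by Lemma 3.3
    worst = g @ s_c + 0.5 * delta * g_norm
    for _ in range(trials):
        d = rng.standard_normal(n)
        # Uniform sample from the ball of radius delta around x_k.
        s = d / np.linalg.norm(d) * delta * rng.uniform() ** (1.0 / n)
        if model_change(s) <= model_change(s_c):
            worst = max(worst, g @ s + 0.5 * delta * g_norm)
    return worst
```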
Lemma 3.5. Let $\|\nabla f(x_k)\| \ge \epsilon$ and $\Delta_k < 2\epsilon/(7\beta)$. Then
$$\frac{f(y_{k+1}) - f(x_k)}{m_k(y_{k+1}) - m_k(x_k)}$$
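Proof. We have
$$\Delta_k < \frac{2\epsilon}{7\beta} \le \frac{2\|\nabla f(x_k)\|}{7\beta} \;\Rightarrow\; \beta\Delta_k < \frac{\|\nabla f(x_k)\|}{4} + \frac{\beta\Delta_k}{8} \;\Rightarrow\; \frac{\beta\Delta_k}{\|\nabla f(x_k)\| + \frac{1}{2}\beta\Delta_k} < \frac{1}{4}$$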
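$$\frac{\frac{1}{2}\|\nabla f(x_k)\|\Delta_k - \frac{1}{2}\beta\Delta_k^2}{\|\nabla f(x_k)\|\Delta_k + \frac{1}{2}\beta\Delta_k^2} = \frac{1}{2} - \frac{3}{4} \cdot \frac{\beta\Delta_k}{\|\nabla f(x_k)\| + \frac{1}{2}\beta\Delta_k}.$$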
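The first implication in the proof is elementary algebra. Spelled out, multiplying $\Delta_k < 2\|\nabla f(x_k)\|/(7\beta)$ by $7\beta/8$ gives

$$\frac{7}{8}\beta\Delta_k < \frac{\|\nabla f(x_k)\|}{4} \iff \beta\Delta_k < \frac{\|\nabla f(x_k)\|}{4} + \frac{\beta\Delta_k}{8} = \frac{1}{4}\Bigl(\|\nabla f(x_k)\| + \frac{1}{2}\beta\Delta_k\Bigr) \iff \frac{\beta\Delta_k}{\|\nabla f(x_k)\| + \frac{1}{2}\beta\Delta_k} < \frac{1}{4}.$$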
On the other hand, since $\Delta_k < 2\epsilon/(7\beta) < \epsilon/(2\beta)$, Lemma 3.4 shows that
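$$\begin{aligned} 0 < m_k(x_k) - m_k(y_{k+1}) &= \nabla f(x_k)^T (x_k - y_{k+1}) - \frac{1}{2} (y_{k+1} - x_k)^T B_k (y_{k+1} - x_k) \\ &\le \nabla f(x_k)^T (x_k - y_{k+1}) + \frac{1}{2}\beta\Delta_k^2 \le \|\nabla f(x_k)\|\Delta_k + \frac{1}{2}\beta\Delta_k^2. \end{aligned}$$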
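The two-sided bound on the predicted decrease can likewise be checked numerically. Under the assumptions used earlier ($\beta$ bounds $\|B_k\|_2$, $\epsilon = \|\nabla f(x_k)\|$, and $y_{k+1}$ taken to be the boundary Cauchy point, which is only one admissible choice), the sketch evaluates $m_k(x_k) - m_k(y_{k+1})$ on random instances with $\Delta_k < 2\epsilon/(7\beta)$ and confirms it lies in $(0, \|\nabla f(x_k)\|\Delta_k + \frac12\beta\Delta_k^2]$. The helper name is hypothetical.

```python
import numpy as np


def predicted_decrease_in_bounds(n=4, seed=0):
    """True if 0 < m_k(x_k) - m_k(y_{k+1}) <= ||g|| Delta_k + beta Delta_k^2 / 2."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    B = 0.5 * (A + A.T)                          # symmetric, possibly indefinite B_k
    beta = np.linalg.norm(B, 2)                  # assumed: beta bounds ||B_k||_2
    g = rng.standard_normal(n)                   # plays the role of grad f(x_k)
    g_norm = np.linalg.norm(g)
    delta = 0.99 * 2.0 * g_norm / (7.0 * beta)   # Delta_k < 2 eps / (7 beta)
    s = -(delta / g_norm) * g                    # y_{k+1} - x_k for the boundary Cauchy point
    pred = -(g @ s + 0.5 * s @ B @ s)            # m_k(x_k) - m_k(y_{k+1})
    upper = g_norm * delta + 0.5 * beta * delta ** 2
    return bool(0.0 < pred <= upper)
```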