Trust Region Method, Lecture Notes - Mathematics -, Study notes of Mathematical Methods

Trust Region method, Updating trust region, Cauchy point, Global convergence

Typology: Study notes

2010/2011

Uploaded on 09/09/2011

C12.1B: CONTINUOUS OPTIMISATION
LECTURE 6: TRUST REGION METHODS
RAPHAEL HAUSER
MATHEMATICAL INSTITUTE, UNIVERSITY OF OXFORD

1. Trust Region Methods. All unconstrained optimisation methods we discussed so far in this course are based on line-searches

$$\min_{\alpha > 0} f(x_k + \alpha d_k),$$

where $d_k$ is a descent direction. Thus, in effect, in each iteration one replaces the $n$-dimensional minimisation problem

$$\min_{x \in \mathbb{R}^n} f(x) \tag{1.1}$$

by a simpler one-dimensional minimisation problem. Line-search methods are widely used in practical optimisation codes, but this is not the only useful principle for constructing iterative minimisation algorithms. Trust region methods constitute a second fundamental class of algorithms. In this approach (1.1) is again replaced by a sequence of easier problems, but instead of reducing the problem dimension the simplicity is achieved by replacing $f$ with a degree 2 polynomial. Conceptually, the idea can be described as follows:

  • In iteration $k$, replace $f(x)$ by a locally valid quadratic model function $m_k(x)$ (recall that we already encountered this idea in the context of quasi-Newton methods).
  • Choose a neighbourhood $R_k$ of the current iterate $x_k$ in which $m_k(x)$ can be trusted to approximate $f$ well (we do not care about how well $m_k$ approximates $f$ outside $R_k$).
  • The next iterate $x_{k+1}$ is found by approximately minimising the model function over the trust region,

$$x_{k+1} \approx \arg\min_{x \in R_k} m_k(x). \tag{1.2}$$

It may seem surprising that we propose to replace the unconstrained optimisation problem (1.1) by the constrained trust region subproblem (1.2), as constraints introduce additional difficulties. However, this is worthwhile doing because (1.2) need only be approximately solved, and this can be done efficiently when

$$m_k(x) = f(x_k) + \nabla f(x_k)^T (x - x_k) + \frac{1}{2} (x - x_k)^T B_k (x - x_k) \tag{1.3}$$

is a quadratic function and the trust region $R_k$ is chosen judiciously, see Lecture 7. The linear part of (1.3) coincides with the first order Taylor approximation of $f$ around $x_k$, so that $m_k(x)$ will be a good local approximation of $f(x)$ if $B_k \approx D^2 f(x_k)$. To make the method work, we will thus have to worry about how to update $B_k$ cheaply. But note that the quasi-Newton Hessian approximations discussed in Lecture 5 are perfect for this job!
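To make the local nature of (1.3) concrete, the following minimal sketch builds the model around one point and compares it with the true function nearby and far away. The test function $f(x) = x_1^4 + x_2^2$, the helper name `quadratic_model`, and the use of the exact Hessian as $B_k$ are illustrative assumptions, not part of the notes.

```python
import numpy as np

def quadratic_model(f_k, g_k, B_k, x_k):
    """Return m_k from (1.3) as a callable (hypothetical helper)."""
    def m_k(x):
        s = np.asarray(x, dtype=float) - x_k
        return f_k + g_k @ s + 0.5 * s @ B_k @ s
    return m_k

# Assumed test problem: f(x) = x_1^4 + x_2^2 with its exact derivatives.
def f(x):
    return x[0] ** 4 + x[1] ** 2

def grad_f(x):
    return np.array([4.0 * x[0] ** 3, 2.0 * x[1]])

def hess_f(x):
    return np.array([[12.0 * x[0] ** 2, 0.0], [0.0, 2.0]])

x_k = np.array([1.0, 1.0])
m_k = quadratic_model(f(x_k), grad_f(x_k), hess_f(x_k), x_k)  # B_k = exact Hessian

near = x_k + np.array([0.01, -0.01])   # well inside a small trust region
far = x_k + np.array([1.0, -1.0])      # far outside it
err_near = abs(f(near) - m_k(near))
err_far = abs(f(far) - m_k(far))
print(err_near, err_far)
```

The model error is tiny close to $x_k$ and large far from it, which is the reason for restricting the minimisation of $m_k$ to a neighbourhood $R_k$.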

1.1. Accepting and Rejecting Updates. Let $y_{k+1}$ be the approximate minimiser of the trust region subproblem (1.2). In principle, this is the point we would like to select as our next iterate $x_{k+1}$. However, $y_{k+1}$ is computed on the basis of the model function $m_k$, and it could happen that moving to $y_{k+1}$ leads to an increase rather than a decrease of the true objective function $f$. Trust-region methods therefore accept $y_{k+1}$ only if the decrease achieved in $f$ is at least a fixed proportion of the decrease "promised" by $m_k$,

$$x_{k+1} = \begin{cases} y_{k+1} & \text{if } \dfrac{f(x_k) - f(y_{k+1})}{m_k(x_k) - m_k(y_{k+1})} > \eta, \\[2ex] x_k & \text{otherwise,} \end{cases} \tag{1.4}$$

where $\eta \in (0, 1/4)$ is fixed. Note that rejecting the update does not imply that the algorithm will stall, because we can still shrink the trust region so that $y_{k+2} \neq y_{k+1}$.
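The acceptance test (1.4) can be sketched in a few lines. The function name `accept_step`, the default $\eta = 0.1$, and the guard against a non-positive predicted decrease are assumptions made for illustration.

```python
def accept_step(f_x, f_y, m_x, m_y, eta=0.1):
    """Acceptance test (1.4): compare actual and predicted decrease.

    f_x, f_y: objective values at x_k and y_{k+1}.
    m_x, m_y: model values at x_k and y_{k+1}.
    """
    predicted = m_x - m_y
    if predicted <= 0.0:
        # Assumption: a candidate for which the model promises no decrease is rejected.
        return False
    actual = f_x - f_y
    return actual / predicted > eta

print(accept_step(1.0, 0.5, 1.0, 0.4))   # f decreased by 0.5, model promised 0.6
print(accept_step(1.0, 1.2, 1.0, 0.4))   # f increased although the model promised 0.6
```

The first candidate is accepted and the second is rejected, since in the second case the true objective went up.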

1.2. Updating the Trust Region. The easiest way to define a trust region $R_k$ is to choose the closed ball of radius $\Delta_k$ around $x_k$ in some norm $\|\cdot\|$,

$$R_k = \{x \in \mathbb{R}^n : \|x - x_k\| \le \Delta_k\}.$$

For simplicity, we will assume that $\|\cdot\|$ is the Euclidean norm. $\Delta_k$ is called the trust region radius. In order to define a new trust region $R_{k+1}$ around $x_{k+1}$, it suffices to fix a rule on how to select $\Delta_{k+1}$. The following rule is a popular choice, where $y_{k+1}$ is as in Section 1.1,

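$$\Delta_{k+1} = \begin{cases} \dfrac{\Delta_k}{4} & \text{if } \dfrac{f(x_k) - f(y_{k+1})}{m_k(x_k) - m_k(y_{k+1})} < \dfrac{1}{4}, \\[2ex] \min(2\Delta_k, \Delta_{\max}) & \text{if } \dfrac{f(x_k) - f(y_{k+1})}{m_k(x_k) - m_k(y_{k+1})} > \dfrac{3}{4}, \\[2ex] \Delta_k & \text{otherwise.} \end{cases}$$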

The rule is designed so that $\Delta_k$ never exceeds $\Delta_{\max}$, and it is motivated by comparing the objective function decrease $f(x_k) - f(y_{k+1})$ with the decrease $m_k(x_k) - m_k(y_{k+1})$ "promised" by the model function:

  • If the actual decrease was below our expectations, this indicates that $m_k$ should be regarded as a more local model than before. We thus find a reasonable $\Delta_{k+1}$ by shrinking $\Delta_k$.
  • If the actual decrease was above our expectations, we feel confident to expand the trust region by selecting $\Delta_{k+1}$ as an expansion of $\Delta_k$.
  • If there is neither reason for gloom nor euphoria, we stick to the previous value $\Delta_{k+1} = \Delta_k$.
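The three cases of the radius rule translate directly into code. In this sketch `rho` stands for the ratio of actual to predicted decrease, and the function name `update_radius` is a hypothetical choice.

```python
def update_radius(rho, delta_k, delta_max):
    """Radius rule of Section 1.2; rho is the actual/predicted decrease ratio."""
    if rho < 0.25:
        return delta_k / 4.0                   # model was too optimistic: shrink
    if rho > 0.75:
        return min(2.0 * delta_k, delta_max)   # model was reliable: expand, capped
    return delta_k                             # otherwise keep the radius

for rho in (0.1, 0.5, 0.9):
    print(rho, update_radius(rho, 1.0, 5.0))
```

The cap in the second case is what keeps every radius at or below $\Delta_{\max}$.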

1.3. The Algorithm. We have now assembled the necessary elements to formulate a generic trust region algorithm:

Algorithm 1.1 (Generic Trust Region Method).
S0 Choose $\Delta_{\max} > 0$, $\Delta_0 \in (0, \Delta_{\max})$, $\eta \in (0, 1/4)$, $x_0 \in \mathbb{R}^n$, $B_0$, $\epsilon > 0$.
S1 While $\|\nabla f(x_k)\| \ge \epsilon$ repeat
     Compute $y_{k+1}$ as the approximate minimiser of (1.2).
     Determine $x_{k+1}$ via (1.4).
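A minimal sketch of how these pieces might fit together is given below. Several choices are assumptions rather than part of the notes: the subproblem is solved only to the accuracy of the Cauchy point in its standard textbook form, $B_k$ is taken to be the exact Hessian instead of a quasi-Newton approximation, the radius is updated with the rule of Section 1.2, and the test function $f(x) = e^{x_1} - x_1 + x_2^2$ is purely illustrative.

```python
import numpy as np

def cauchy_point(x, g, B, delta):
    """Minimiser of the model along -g within the ball of radius delta
    (standard textbook form, assumed here)."""
    g_norm = np.linalg.norm(g)
    curvature = g @ B @ g
    if curvature <= 0.0:
        tau = 1.0
    else:
        tau = min(1.0, g_norm ** 3 / (delta * curvature))
    return x - tau * (delta / g_norm) * g

def trust_region(f, grad, hess, x0, delta0=1.0, delta_max=10.0,
                 eta=0.1, eps=1e-5, max_iter=500):
    x = np.asarray(x0, dtype=float)
    delta = delta0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            break
        B = hess(x)                                # assumption: exact Hessian as B_k
        y = cauchy_point(x, g, B, delta)
        s = y - x
        predicted = -(g @ s + 0.5 * s @ B @ s)     # m_k(x_k) - m_k(y_{k+1}) > 0
        rho = (f(x) - f(y)) / predicted
        if rho > eta:                              # acceptance test (1.4)
            x = y
        if rho < 0.25:                             # radius rule of Section 1.2
            delta = delta / 4.0
        elif rho > 0.75:
            delta = min(2.0 * delta, delta_max)
    return x

# Assumed test problem with minimiser (0, 0).
def f(x):
    return np.exp(x[0]) - x[0] + x[1] ** 2

def grad_f(x):
    return np.array([np.exp(x[0]) - 1.0, 2.0 * x[1]])

def hess_f(x):
    return np.array([[np.exp(x[0]), 0.0], [0.0, 2.0]])

x_star = trust_region(f, grad_f, hess_f, [2.0, 1.0])
print(x_star)
```

Because the Cauchy point only moves along the negative gradient, this variant behaves much like steepest descent; better approximate solvers for (1.2) would replace `cauchy_point` without changing the outer loop.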

Claim 1 follows from Proposition 3.2 below; for Claim 2 see Problem Set 3. It follows from these two claims that

$$\lim_{k \to \infty} f(x_k) = f(x_0) + \sum_{k=0}^{\infty} \bigl( f(x_{k+1}) - f(x_k) \bigr) = -\infty,$$

since (1.4) guarantees that the series on the right hand side contains only nonpositive terms.

We now set out to show the validity of Claim 1. Intuitively it is clear that when $\|\nabla f(x_k)\|$ is bounded below and $\Delta_k$ becomes sufficiently small, then $f(y_{k+1}) - f(x_k) \approx m_k(y_{k+1}) - m_k(x_k)$ should hold. Indeed, in Lemma 3.5 below we will show that $\|\nabla f(x_k)\| \ge \epsilon$ and $\Delta_k < 2\epsilon/(7\beta)$ imply

$$\frac{f(y_{k+1}) - f(x_k)}{m_k(y_{k+1}) - m_k(x_k)} > \frac{1}{4} > \eta. \tag{3.1}$$

Claim 1 then follows immediately from the following result:

Proposition 3.2. There are at most $\left\lfloor \log_4 \frac{7\beta\Delta_{\max}}{2\epsilon} \right\rfloor$ rejected updates between successive accepted updates.

Proof. Suppose to the contrary that all updates $y_{k+1}$ for $k = k_0, k_0 + 1, \dots, k_0 + \left\lceil \log_4 \frac{7\beta\Delta_{\max}}{2\epsilon} \right\rceil =: k_1$ are rejected. Then

$$\Delta_{k_1} = \Delta_{k_0}\, 4^{-(k_1 - k_0)} \le \frac{2\epsilon}{7\beta},$$

and (3.1) contradicts our assumption that $y_{k_1+1}$ is rejected.
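For a feeling of how small this bound is in practice, the following sketch evaluates it for sample values and counts how many quarterings bring $\Delta_{\max}$ down to at most $2\epsilon/(7\beta)$. The numbers $\Delta_{\max} = 10$, $\epsilon = 10^{-3}$, $\beta = 1$ and both function names are illustrative assumptions.

```python
import math

def max_rejections(delta_max, eps, beta):
    """Bound of Proposition 3.2: floor(log_4(7 * beta * delta_max / (2 * eps)))."""
    return math.floor(math.log(7.0 * beta * delta_max / (2.0 * eps), 4))

def quarterings_until_small(delta_max, eps, beta):
    """Number of divisions by 4 needed until the radius is at most 2*eps/(7*beta)."""
    delta, count = delta_max, 0
    while delta > 2.0 * eps / (7.0 * beta):
        delta /= 4.0
        count += 1
    return count

bound = max_rejections(10.0, 1e-3, 1.0)
shrinks = quarterings_until_small(10.0, 1e-3, 1.0)
print(bound, shrinks)
```

The second count corresponds to the ceiling expression used in the proof: after that many consecutive rejections the radius is small enough for (3.1) to apply.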

It remains to prove (3.1). We divide the argument into several lemmas.

Lemma 3.3. Let $\|\nabla f(x_k)\| \ge \epsilon$ and $\Delta_k < \epsilon/\beta$. Then

$$y_k^c = x_k - \frac{\Delta_k}{\|\nabla f(x_k)\|} \nabla f(x_k). \tag{3.2}$$

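Proof. If $\nabla f(x_k)^T B_k \nabla f(x_k) \le 0$ then (3.2) holds because of (2.1). So, we may assume that $\nabla f(x_k)^T B_k \nabla f(x_k) > 0$, and then

$$\Delta_k < \frac{\epsilon}{\beta} \le \frac{\|\nabla f(x_k)\|}{\beta} = \frac{\|\nabla f(x_k)\|^3}{\beta \|\nabla f(x_k)\|^2} \le \frac{\|\nabla f(x_k)\|^3}{\nabla f(x_k)^T B_k \nabla f(x_k)}.$$

But this implies that

$$\frac{\Delta_k}{\|\nabla f(x_k)\|} < \frac{\nabla f(x_k)^T \nabla f(x_k)}{\nabla f(x_k)^T B_k \nabla f(x_k)}.$$

The result now follows from (2.1).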
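Lemma 3.3 says that for a small enough radius the Cauchy point lies on the boundary of the trust region. The sketch below checks this numerically. It assumes the standard textbook form of the Cauchy point for (2.1), takes $\beta$ to be the spectral norm of $B_k$ and $\epsilon = \|\nabla f(x_k)\|$, and uses made-up data for the gradient and $B_k$.

```python
import numpy as np

def cauchy_point(x, g, B, delta):
    """Standard Cauchy point (assumed form of (2.1))."""
    g_norm = np.linalg.norm(g)
    curvature = g @ B @ g
    tau = 1.0 if curvature <= 0.0 else min(1.0, g_norm ** 3 / (delta * curvature))
    return x - tau * (delta / g_norm) * g

x = np.zeros(2)
g = np.array([3.0, 4.0])         # illustrative gradient
B = np.diag([2.0, 1.0])          # illustrative model Hessian
beta = np.linalg.norm(B, 2)      # spectral norm of B
eps = np.linalg.norm(g)          # lower bound on the gradient norm

small = 0.9 * eps / beta         # satisfies delta < eps / beta
large = 2.0 * eps / beta         # violates the hypothesis of the lemma
step_small = np.linalg.norm(cauchy_point(x, g, B, small) - x)
step_large = np.linalg.norm(cauchy_point(x, g, B, large) - x)
print(step_small, small)
print(step_large, large)
```

With the small radius the step length equals the radius, as in (3.2); with the larger radius the model minimiser along the negative gradient falls strictly inside the ball.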

Lemma 3.4. Let $\|\nabla f(x_k)\| \ge \epsilon$ and $\Delta_k < \epsilon/(2\beta)$. Then

$$\nabla f(x_k)^T (y_{k+1} - x_k) \le -\frac{\Delta_k \|\nabla f(x_k)\|}{2}.$$

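Proof. The relation $\Delta_k < \frac{\epsilon}{2\beta} \le \frac{\|\nabla f(x_k)\|}{2\beta}$ implies that

$$-\Delta_k \|\nabla f(x_k)\| + \Delta_k^2 \beta \le -\frac{\Delta_k \|\nabla f(x_k)\|}{2}. \tag{3.3}$$

Moreover, by Lemma 3.3, $\Delta_k < \frac{\epsilon}{2\beta} < \frac{\epsilon}{\beta}$ implies $y_k^c = x_k - \frac{\Delta_k}{\|\nabla f(x_k)\|} \nabla f(x_k)$, and hence,

$$m_k(y_k^c) = f(x_k) - \Delta_k \|\nabla f(x_k)\| + \frac{\Delta_k^2}{2} \, \frac{\nabla f(x_k)^T B_k \nabla f(x_k)}{\|\nabla f(x_k)\|^2}. \tag{3.4}$$

The assumption $m_k(y_{k+1}) \le m_k(y_k^c)$ from Theorem 3.1 implies

$$f(x_k) + \nabla f(x_k)^T (y_{k+1} - x_k) + \frac{1}{2} (y_{k+1} - x_k)^T B_k (y_{k+1} - x_k) \overset{(3.4)}{\le} f(x_k) - \Delta_k \|\nabla f(x_k)\| + \frac{\Delta_k^2}{2} \, \frac{\nabla f(x_k)^T B_k \nabla f(x_k)}{\|\nabla f(x_k)\|^2},$$

so that

$$\begin{aligned} \nabla f(x_k)^T (y_{k+1} - x_k) &\le -\Delta_k \|\nabla f(x_k)\| + \frac{\Delta_k^2}{2} \, \frac{\nabla f(x_k)^T B_k \nabla f(x_k)}{\|\nabla f(x_k)\|^2} - \frac{1}{2} (y_{k+1} - x_k)^T B_k (y_{k+1} - x_k) \\ &\le -\Delta_k \|\nabla f(x_k)\| + \Delta_k^2 \beta \overset{(3.3)}{\le} -\frac{\Delta_k \|\nabla f(x_k)\|}{2}. \end{aligned}$$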

Lemma 3.5. Let $\|\nabla f(x_k)\| \ge \epsilon$ and $\Delta_k < 2\epsilon/(7\beta)$. Then

$$\frac{f(y_{k+1}) - f(x_k)}{m_k(y_{k+1}) - m_k(x_k)} > \frac{1}{4}.$$

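Proof. We have

$$\Delta_k < \frac{2\epsilon}{7\beta} \le \frac{2\|\nabla f(x_k)\|}{7\beta} \;\Rightarrow\; \beta\Delta_k < \frac{\|\nabla f(x_k)\|}{4} + \frac{\beta\Delta_k}{8} \;\Rightarrow\; \frac{\beta\Delta_k}{\|\nabla f(x_k)\| + \frac{1}{2}\beta\Delta_k} < \frac{1}{4}$$

$$\Rightarrow\; \frac{\frac{1}{2}\|\nabla f(x_k)\|\Delta_k - \frac{1}{2}\beta\Delta_k^2}{\|\nabla f(x_k)\|\Delta_k + \frac{1}{2}\beta\Delta_k^2} = \frac{1}{2} - \frac{3}{4} \cdot \frac{\beta\Delta_k}{\|\nabla f(x_k)\| + \frac{1}{2}\beta\Delta_k} > \frac{1}{2} - \frac{3}{16} > \frac{1}{4}.$$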

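On the other hand, since $\Delta_k < 2\epsilon/(7\beta) < \epsilon/(2\beta)$, Lemma 3.3 shows that

$$\begin{aligned} 0 < m_k(x_k) - m_k(y_{k+1}) &= \nabla f(x_k)^T (x_k - y_{k+1}) - \frac{1}{2} (y_{k+1} - x_k)^T B_k (y_{k+1} - x_k) \\ &\le \nabla f(x_k)^T (x_k - y_{k+1}) + \frac{1}{2}\beta\Delta_k^2 \le \|\nabla f(x_k)\|\Delta_k + \frac{1}{2}\beta\Delta_k^2. \end{aligned}$$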