



Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
The penalty function method
Typology: Study notes
1 / 5
This page cannot be seen from the preview
Don't miss anything!




RAPHAEL HAUSER MATHEMATICAL INSTITUTE, UNIVERSITY OF OXFORD
(NLP) min x∈Rn^ f (x)
s.t. gE (x) = 0, gI (x) ≥ 0.
Two central ideas underly all of the algorithms we will consider:
1.1. Merit Functions. Starting from a current iterate x, we aim at finding a new update x+ that brings us closer towards the achievement of two conflicting goals: reducing the objective function as much as possible, and satisfying the constraints. The two goals can be combined by minimising a merit function which depends both on the the objective function and on the residuals measuring the constraint violation,
rE (x) := gE (x) rI (x) := (−gI (x))+,
where
(−gj (x))+ :=
−gj (x) if − gj (x) > 0 , 0 if − gj (x) ≤ 0
is the “positive part” of −gj (j ∈ I).
Example 1.1. The penalty function method that will be further analysed below is based on the merit function
Q(x, μ) = f (x) +
2 μ
i∈E∪I
˜g^2 i (x), (1.1)
where μ > 0 is a parameter and
˜gi =
gi (i ∈ E), min(gi, 0) (i ∈ I).
Note that Q(x, μ) has continuous first but not second derivatives at points where one or several of the inequality constraints are active.
1.2. Homotopy Idea. The second term of the merit function forces the con- straint violation to be small when Q(x, μ) is minimised over x. We are not guaranteed that the constraints are exactly satisfied when μ is held fixed, but we can penalise constraint violation more strongly by choosing a smaller μ. This leads to the idea of a homotopy or continuation method which is based on reducing μ dynamically and using the following idea for the outermost iterative loop:
Given a current iterate x and a value of the homotopy parameter μ such that x is an approximate minimiser of the unconstrained problem
min y∈Rn^ Q(y, μ), (1.2)
reduce μ to a value μ+ < μ and – starting from x – apply one or several steps of an iterative algorithm for the minimisation of
min y∈Rn^
Q(y, μ+),
until an approximate minimiser x+ of this problem is reached.
Thus, the continuation approach replaces the constrained problem (NLP) by a sequence of unconstrained problems (1.2) for which we already studied solution meth- ods.
Algorithm 2.1 (QPen). S0 Initialisation choose x 0 ∈ Rn^ % (not necessarily feasible) choose (μk)N 0 ց 0 % (homotopy parameters) choose (ǫk)N 0 ց 0 % (tolerance parameters) S1 For k = 0, 1 , 2 ,... repeat y[0]^ := xk, l := 0 until ‖∇xQ(y[l], μk)‖ ≤ ǫk repeat find y[l+1]^ such that Q(y[l+1], μk) < Q(y[l], μk) % (using unconstrained minimisation method) l ← l + 1 end xk+1 := y[l] end
The choice of the sequences (μ)N 0 and (ǫ)N 0 affects the convergence speed of the method in a crucial way. We will now show that if (xk)N 0 converges then the limit point is usually a KKT points and hence a sensible candidate for a local minimiser of (NLP):
Theorem 2.2. Let f and gi be C^1 functions for all i ∈ E ∪ I, let x∗^ be an accumulation point of the sequence of iterates (xk)N 0 generated by Algorithm QPen, and let (kl)N 0 ⊆ (k)N 0 be such that liml→∞ xkl = x∗. Let us furthermore assume that that the set of gradients {∇gi(x∗) : i ∈ V(x∗)} is linearly independent, where V(x∗) = E ∪ {j ∈ I : gj (x∗) ≤ 0 } is the index set of active, violated and equality constraints. For i ∈ E ∪ I let
λ [k] i =^ −^
˜gi(xk+1) μk
On the other hand, since the LICQ holds at x∗, we have limk→∞ ∇gi(xk) = ∇gi(x∗) 6 = 0 for all (i ∈ E ∪ A(x∗)), and hence,
ϕki → ϕ∗ i , (i ∈ E ∪ A(x∗)),
where ϕki , ϕ∗ i : Rn^ → R are the unique linear functionals such that
ϕki (gj (xk)), ϕ∗ i (gj (x∗)) = δij :=
1 if i = j, 0 if i 6 = j.
Thus, for all ε > 0 there exists kε ∈ N such that
‖ϕki (w) − ϕ∗ i (w)‖ < ε ∀ k ≥ kε, ‖w‖ ≤ 2 ‖∇f (x∗)‖, i ∈ E ∪ A(x∗).
Furthermore, we may choose ε smaller than ‖ϕ∗ i ‖ × ‖∇f (x∗)‖ for all i ∈ E ∪ A(x∗), and by (2.4) we may choose kε large enough so that
∥ ∥∥∇f (x∗) + ∑ j∈E∪I
g ˜j (xk+1) μk
∇gj (xk+1)
∥∥ < ε ‖ϕ∗ i ‖
Therefore, for all k ≥ kε,
∥ ∥ ∥
j∈E∪I
˜gj (xk+1) μk
∇gj (xk+1)
ε ‖ϕ∗ i ‖
and hence,
∥ ∥ ∥ϕk i+
j∈E∪I
˜gj (xk+1) μk
∇gj (xk+1)
− ϕ∗ i (∇f (x∗))
∥ϕk i+
j∈E∪I
˜gj (xk+1) μk
∇gj (xk+1)
− ϕ∗ i
j∈E∪I
˜gj (xk+1) μk
∇gj (xk+1)
∥ϕ∗ i
j∈E∪I
g˜j (xk+1) μk
∇gj (xk+1)
− ϕ∗ i (∇f (x∗))
≤ ε + ‖ϕ∗ i ‖ ×
ε ‖ϕ∗ i ‖ < 2 ε.
This shows that
lim k→∞
λ[ ik ]= − lim k→∞
˜gi(xk+1) μk
= lim k→∞
ϕk i+
j∈E∪I
˜gj (xk+1) μk
∇gj (xk+1)
= ϕ∗ i (∇f (x∗)) =: λ∗ i
exists for all (i ∈ E ∪ A(x∗)) and
∇f (x∗) −
i∈E∪I
λ∗ i ∇gi(x∗) = 0. (2.6)
iv) (2.6) is the first of the KKT equations. Moreover, we have already established that x∗^ is feasible and that λ∗ j = 0 for j ∈ I \ A(x∗), showing complementarity. It only remains to check that λ∗ j ≥ 0 for (j ∈ A(x∗)). If gj (xk+1) ≤ 0 occurs infinitely often, then clearly λj ≥ 0. On the other hand, if j ∈ A(x∗) and gj (xk+1) > 0 for all k
sufficiently large, then ˜gj (xk+1) = 0 and λ [k] j = 0 for all^ k^ large, and this implies that λ∗ i = 0.
2.1. A Few Computational Issues. It follows from the fact that the approx- imate Lagrange multipliers λ[ ik ]converge that
g˜i(xk+1) = O(μk)
for all (i ∈ V(x∗)). This shows that μk has to be reduced to the order of precision by which we want the final result to satisfy the constraints. The Augmented Lagrangian Method which we will discuss in Lecture 14 performs much better. The Hessian of the merit function can easily be computed as
D^2 xxQ(x, μ) =D^2 f (x) +
i∈E∪I
˜gi(x) μ
D^2 gi(x) +
μ
i∈V(x)
∇gi(x)∇g iT (x)
= C(x) +
μ
AT(x)A(x)
where AT(x) is the matrix with columns {∇gi(x) : i ∈ V(x)}. Although D^2 xxQ(x, μ) is discontinuous on the boundary of the feasible domain, it can be argued that this is usually inconsequential in algorithms. When D xx^2 Q(x, μ) is used for the minimisation of Q(y, μk) in the innermost loop of Algorithm QPen, the computations can become very ill-conditioned. For example, solving the Newton equations
D^2 xxQ(y[l], μk)dl = −∇xQ(y[l], μk) (2.7)
directly can lead to large errors as the condition number of the matrix
C(y[l]) +
μk
AT(y[l])A(y[l])
is of order O(μ− k 1 ). In this particular example, it is better to introduce a new dummy variable ξl, and to reformulate (2.7) as follows,
( C(y[l]) AT(y[l]) A(y[l]) −μkI
dl ξl
−∇xQ(y[l], μk) 0
Indeed, if (dl, ξl)T^ satisfies (2.8) then dl solves (2.7): μ− k 1 Adl = ξl and −∇xQ = Cdl + ATξl = Cdl + μ− k 1 ATAdl. The advantage of this method is that the system (2.8) is usually well-conditioned and the numerical results of high precision. Similar tricks can be applied when a quasi-Newton method is used instead of the Newton- Raphson method.