Lagrangian Duality: A Generalization of Linear Programming Duality, Study notes of Mathematical Methods

Study notes on Lagrangian duality, a theory that generalizes linear programming duality. The notes cover the KKT conditions, the Lagrangian function, and the relationship between the primal and dual problems, and include an example of applying Lagrangian duality to a quadratic programming problem.

Typology: Study notes

2010/2011

Uploaded on 09/09/2011

C12.1B: CONTINUOUS OPTIMISATION
LECTURE 12: LAGRANGIAN DUALITY AND CONVEX
PROGRAMMING
RAPHAEL HAUSER
MATHEMATICAL INSTITUTE, UNIVERSITY OF OXFORD


1. Reformulating the KKT Conditions. The topic of this lecture is Lagrangian duality, a generalisation of the LP duality theory we studied in the exercises relating to Lecture 8. As a by-product of this analysis we also find that constrained convex optimisation problems admit first order optimality conditions that are both necessary and sufficient. This generalises our results for unconstrained convex optimisation from Lecture 1. In all that follows we consider the constrained optimisation problem

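$$\text{(NLP)}\qquad \min_{x} f(x) \quad \text{s.t.}\quad g_I(x) \ge 0,\quad g_E(x) = 0,$$

where $g_I$ is a vector of inequality constraints and $g_E$ a vector of equality constraints. The associated KKT conditions are

$$
\begin{aligned}
\nabla f(x^*) - g_I'(x^*)^{\mathsf T} u^* - g_E'(x^*)^{\mathsf T} v^* &= 0, &&(1.1)\\
g_I(x^*) &\ge 0, &&(1.2)\\
g_E(x^*) &= 0, &&(1.3)\\
u^*_j\, g_j(x^*) &= 0 \quad (j \in I), &&(1.4)\\
u^* &\ge 0. &&(1.5)
\end{aligned}
$$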

To motivate Lagrangian duality, we will reformulate the KKT conditions (1.1)– (1.5) in slightly more abstract form. To do this, we want to extend the Lagrangian as follows:

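$$
L:\mathbb{R}^n\times\mathbb{R}^p\times\mathbb{R}^q\to\mathbb{R}\cup\{\pm\infty\},\qquad
(x,u,v)\mapsto
\begin{cases}
f(x) - u^{\mathsf T} g_I(x) - v^{\mathsf T} g_E(x), & \text{if } x\in\operatorname{dom}(f),\ u\ge 0,\\
+\infty, & \text{if } x\notin\operatorname{dom}(f),\ u\ge 0,\\
-\infty, & \text{if } u\not\ge 0.
\end{cases}
$$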

This definition of the Lagrangian is slightly more general than the one we encountered previously, but the extra generality serves mainly to simplify notation and does not entail a conceptual change:

(i) We account for the possibility that $f$ might not be defined on all of $\mathbb{R}^n$. Our extension of $L$ is compatible with extending $f$ by setting $f(x) = +\infty$ for all $x\notin\operatorname{dom} f$. Since (NLP) is a minimisation problem, this automatically restricts the search for optimal solutions to $\operatorname{dom} f$.

(ii) We define $L$ to be $-\infty$ whenever the vector $u$ of Lagrange multipliers associated with the inequality constraints fails to be nonnegative, as it should be. Again, this convention frees us from having to record in the notation that $u$ is really constrained to the nonnegative orthant.
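The case distinction in this definition can be mirrored directly in code. The following is a minimal Python sketch, assuming vectors are plain lists of floats and that the caller supplies $f$, $g_I$, $g_E$ and a membership test for $\operatorname{dom}(f)$; the function name `lagrangian`, its parameter `in_dom` and the toy problem are hypothetical, chosen for illustration only.

```python
import math


def lagrangian(f, g_I, g_E, x, u, v, in_dom=lambda x: True):
    """Extended-value Lagrangian L(x, u, v) with the +inf / -inf conventions.

    f, g_I, g_E : callables; g_I and g_E return lists of constraint values.
    in_dom      : hypothetical membership test for dom(f).
    """
    # Convention (ii): a negative multiplier for an inequality gives -inf,
    # regardless of x.
    if any(uj < 0 for uj in u):
        return -math.inf
    # Convention (i): outside dom(f) (with u >= 0) the Lagrangian is +inf.
    if not in_dom(x):
        return math.inf
    # Ordinary Lagrangian f(x) - u^T g_I(x) - v^T g_E(x).
    return (f(x)
            - sum(uj * gj for uj, gj in zip(u, g_I(x)))
            - sum(vi * gi for vi, gi in zip(v, g_E(x))))


# Hypothetical toy data: f(x) = -log(x_1) with dom(f) = (0, inf),
# one inequality x_1 - 1 >= 0 and no equality constraints.
f = lambda x: -math.log(x[0])
g_I = lambda x: [x[0] - 1.0]
g_E = lambda x: []
dom = lambda x: x[0] > 0

print(lagrangian(f, g_I, g_E, [2.0], [0.5], [], dom))   # finite value
print(lagrangian(f, g_I, g_E, [-1.0], [0.5], [], dom))  # inf
print(lagrangian(f, g_I, g_E, [2.0], [-0.5], [], dom))  # -inf
```

Checking the sign of $u$ before the domain test reproduces the order of the cases in the definition: $-\infty$ takes precedence whenever $u\not\ge 0$.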

Lemma 1.1. The KKT conditions (1.1)–(1.5) are equivalent to the following set of equations and inequalities,

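$$
\begin{aligned}
\nabla_x L(x^*,u^*,v^*) &= 0, &&(1.6)\\
\nabla_u L(x^*,u^*,v^*) &\le 0, &&(1.7)\\
\nabla_v L(x^*,u^*,v^*) &= 0, &&(1.8)\\
{u^*}^{\mathsf T}\nabla_u L(x^*,u^*,v^*) &= 0, &&(1.9)\\
u^* &\ge 0, &&(1.10)
\end{aligned}
$$

where $\nabla_x L = (D_x L)^{\mathsf T}$ is the gradient with respect to $x$, and likewise $\nabla_u L$ and $\nabla_v L$ are the gradients with respect to $u$ and $v$.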

Proof. Since $\nabla_x L = \nabla f - g_I'^{\mathsf T} u - g_E'^{\mathsf T} v$, (1.6) is just a reformulation of (1.1), and (1.10) is identical to (1.5). Note that $\nabla_u L = -g_I$ and $\nabla_v L = -g_E$. Therefore, (1.2) is equivalent to $\nabla_u L(x^*,u^*,v^*) = -g_I(x^*) \le 0$, which is (1.7). Likewise, (1.3) is equivalent to $\nabla_v L(x^*,u^*,v^*) = -g_E(x^*) = 0$, which is (1.8). Finally, (1.4) and $\nabla_u L = -g_I$ imply

$${u^*}^{\mathsf T}\nabla_u L(x^*,u^*,v^*) = -\sum_{i\in I} u^*_i\, g_i(x^*) = 0,$$

which is (1.9). Conversely, (1.7), (1.10) and (1.9) imply that $\sum_{i\in I} u^*_i\, g_i(x^*)$ is a sum of nonnegative summands that adds up to zero; hence all the summands must be zero, which shows (1.4).
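To see Lemma 1.1 at work, the sketch below evaluates the residuals of (1.6)–(1.10) for a small hypothetical problem, min $(x_1-2)^2 + x_2^2$ subject to $g_I(x) = x_1 + x_2 - 3 \ge 0$ and $g_E(x) = x_1 - x_2 = 0$ (so one inequality and one equality). The gradients are coded by hand using $\nabla_u L = -g_I$ and $\nabla_v L = -g_E$ from the proof; the candidate KKT point $x^* = (1.5, 1.5)$, $u^* = 1$, $v^* = -2$ was worked out by hand for this illustration only.

```python
def kkt_residuals(x, u, v):
    """Residuals of (1.6)-(1.10) for the hypothetical toy problem
        min (x1 - 2)^2 + x2^2  s.t.  x1 + x2 - 3 >= 0,  x1 - x2 = 0.
    All residuals are zero exactly at a KKT point."""
    x1, x2 = x
    # grad_x L = grad f - g_I'(x)^T u - g_E'(x)^T v
    grad_x = (2 * (x1 - 2) - u - v, 2 * x2 - u + v)
    grad_u = 3 - x1 - x2      # = -g_I(x)
    grad_v = x2 - x1          # = -g_E(x)
    return {
        "(1.6) stationarity": max(abs(c) for c in grad_x),
        "(1.7) grad_u L <= 0": max(grad_u, 0.0),
        "(1.8) grad_v L = 0": abs(grad_v),
        "(1.9) complementarity": abs(u * grad_u),
        "(1.10) u >= 0": max(-u, 0.0),
    }


print(kkt_residuals((1.5, 1.5), 1.0, -2.0))  # all residuals zero
print(kkt_residuals((2.0, 2.0), 0.0, 0.0))   # only stationarity fails
```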

Our reformulation of the KKT conditions in terms of the Lagrangian provides the following deeper interpretation:

Lemma 1.2 (KKT and Saddle Points). (i) Equation (1.6) is the first order necessary condition for $x^*$ to be a minimiser of the unconstrained problem

$$\min_{x\in\mathbb{R}^n} L(x,u^*,v^*), \qquad (1.11)$$

where $u^*$ and $v^*$ are regarded as fixed parameters.

(ii) Equations (1.7)–(1.10) are the first order necessary optimality conditions for the problem

$$\max_{(u,v)\in\mathbb{R}^p\times\mathbb{R}^q} L(x^*,u,v), \qquad (1.12)$$

where $x^*$ is regarded as a fixed parameter, and where $p = |I|$ and $q = |E|$.

Proof. (i) Problem (1.11) is unconstrained, so (i) is immediate.

(ii) The objective function of problem (1.12) takes the value $-\infty$ for $u\not\ge 0$ and finite values when $u\ge 0$. Therefore, (1.12) is equivalent to the constrained optimisation problem

$$\min_{(u,v)\in\mathbb{R}^p\times\mathbb{R}^q} -L(x^*,u,v) \quad\text{s.t.}\quad u\ge 0.$$

The LICQ holds at all feasible points because the constraint gradients are the coordinate unit vectors $\{e_1,\dots,e_p\}$ corresponding to the variables of $u$, and these are

The natural question to ask is: what is the relation between (NLP), (P) and (D)? The following theorem shows that (P) and (NLP) are equivalent; later we will see that for convex problems satisfying a regularity assumption, (P) and (D) have the same optimal value, that is, the max and the min may be interchanged.

Theorem 2.1 (Lagrangian Primal). (P) and (NLP) are equivalent problems.

Proof. If $x$ is feasible for (NLP), then $g_I(x)\ge 0$ and $g_E(x)=0$. This implies

$$L(x,u,v) = \begin{cases} f(x) - u^{\mathsf T} g_I(x) - v^{\mathsf T} g_E(x) = f(x) - u^{\mathsf T} g_I(x) \le f(x), & \text{if } u\ge 0,\\ -\infty, & \text{if } u\not\ge 0.\end{cases}$$

Therefore, for feasible $x$ the objective function of (P) takes the value

$$\max_{(u,v)} L(x,u,v) = L(x,0,v) = f(x).$$

On the other hand, if $x$ is infeasible for (NLP), then

• either there exists an index $j\in I$ such that $g_j(x) < 0$, and then we can choose $u_j = M > 0$,
• or there exists an index $i\in E$ such that $g_i(x)\ne 0$, and then we can choose $v_i = -\operatorname{sgn}(g_i(x))\,M$.

In both cases, we can set all remaining entries of $u$ and $v$ to zero; the only nonzero multiplier term of $L$ is then $M|g_j(x)|$ or $M|g_i(x)|$, respectively, and hence

$$L(x,u,v) \xrightarrow{\;M\to\infty\;} +\infty.$$

This shows that for infeasible $x$ the objective function of (P) is

$$\max_{(u,v)} L(x,u,v) = +\infty.$$

In summary, we find that

$$\max_{(u,v)} L(x,u,v) = \begin{cases} f(x), & \text{if } g_I(x)\ge 0,\ g_E(x)=0,\\ +\infty, & \text{otherwise},\end{cases}$$

which shows that minimising $x\mapsto \max_{(u,v)} L(x,u,v)$ over $\mathbb{R}^n$ is the same as minimising $f(x)$ over the feasible domain of (NLP).
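The multiplier choice in the proof can be replayed numerically. The sketch below uses a hypothetical toy problem, min $(x_1-2)^2+x_2^2$ subject to $x_1+x_2-3\ge 0$ and $x_1-x_2=0$ (with $\operatorname{dom} f = \mathbb{R}^2$), evaluates $L$ at the multipliers the proof constructs for growing $M$, and shows the value staying at $f(x)$ for a feasible point while growing without bound for infeasible points. The helper names are illustrative only.

```python
import math


def L(x, u, v):
    """Finite branch (u >= 0) of the Lagrangian of the toy problem."""
    x1, x2 = x
    return (x1 - 2) ** 2 + x2 ** 2 - u * (x1 + x2 - 3) - v * (x1 - x2)


def proof_multipliers(x, M):
    """Multipliers chosen as in the proof of Theorem 2.1 (M > 0)."""
    g_I = x[0] + x[1] - 3
    g_E = x[0] - x[1]
    if g_I < 0:                      # violated inequality: u_j = M
        return M, 0.0
    if g_E != 0:                     # violated equality: v_i = -sgn(g_i(x)) M
        return 0.0, -math.copysign(M, g_E)
    return 0.0, 0.0                  # feasible: u = 0 attains the max f(x)


for x in [(2.0, 2.0), (0.0, 0.0), (3.0, 1.0)]:
    values = [L(x, *proof_multipliers(x, M)) for M in (1.0, 10.0, 100.0)]
    print(x, values)
# (2.0, 2.0) stays at f(x) = 4.0; the two infeasible points grow linearly in M.
```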

2.1. The Interpretation of the Dual. The interpretation of the Lagrangian dual (D) is less straightforward. The following example shows that in the case where (P) is a linear programming problem, (D) is the usual LP dual. The example also shows that convex quadratic programming problems have a convex quadratic dual. Finally, the example highlights that if (P) is not a convex problem, then (D) might not yield any useful information at all.

Example 2.2. Consider the problem

$$\min_{x\in\mathbb{R}^n}\ \tfrac12 x^{\mathsf T} B x + c^{\mathsf T} x \quad\text{s.t.}\quad Ax = b,\ x\ge 0, \qquad (2.1)$$

where $B$ is a symmetric $n\times n$ matrix, $c\in\mathbb{R}^n$, $A$ is a $q\times n$ matrix and $b\in\mathbb{R}^q$.

Problems of the form (2.1) are called quadratic programming (QP) problems. We have $g_I(x) = x$, $p = n$ and $g_E(x) = Ax - b$. The Lagrangian of this problem is

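$$L(x,u,v) = \begin{cases} \tfrac12 x^{\mathsf T} B x + (c - u - A^{\mathsf T} v)^{\mathsf T} x + b^{\mathsf T} v, & \text{if } u\ge 0,\\ -\infty, & \text{otherwise}.\end{cases}$$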

Note that

$$\max_{(u,v)} L(x,u,v) = \begin{cases} f(x), & \text{if } Ax = b,\ x\ge 0,\\ +\infty, & \text{otherwise}.\end{cases}$$

Therefore, (P) is clearly equivalent to (2.1), as predicted by Theorem 2.1. Let us now derive the dual of (2.1). We distinguish three cases.

Case 1: Let $B = 0$. Then (2.1) is an LP problem in standard primal form,

$$\text{(P)}\qquad \min_x\ c^{\mathsf T} x \quad\text{s.t.}\quad Ax = b,\ x\ge 0.$$

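In this case $L$ is affine in $x$ for $u\ge 0$, and an affine function is bounded below only if its linear part vanishes. Hence we have

$$\min_x L(x,u,v) = \begin{cases} b^{\mathsf T} v, & \text{if } c - u - A^{\mathsf T} v = 0,\ u\ge 0,\\ -\infty, & \text{if } c - u - A^{\mathsf T} v \ne 0,\ u\ge 0,\\ -\infty, & \text{if } u\not\ge 0,\end{cases}$$

or in other words,

$$\min_x L(x,u,v) = \begin{cases} b^{\mathsf T} v, & \text{if } A^{\mathsf T} v\le c,\ u = c - A^{\mathsf T} v,\\ -\infty, & \text{otherwise}.\end{cases}$$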

Therefore, the dual Lagrangian problem is

$$\text{(D)}\qquad \max_v\ b^{\mathsf T} v \quad\text{s.t.}\quad A^{\mathsf T} v\le c.$$

Note that this is the usual LP dual of (P).
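Case 1 can be checked on a tiny instance. The sketch below is a hypothetical example, not taken from the notes: it encodes the dual function $\min_x L(x,u,v)$ from the case analysis above in plain Python and evaluates it on the LP $\min\ x_1 + 2x_2 + 3x_3$ subject to $x_1+x_2+x_3 = 1$, $x\ge 0$, whose optimal value $1$ is attained at $x = (1,0,0)$. Taking $u = c - A^{\mathsf T}v$ for a dual feasible $v$ gives $b^{\mathsf T}v$, which never exceeds the primal objective.

```python
import math


def lp_dual_function(A, b, c, u, v):
    """min_x L(x, u, v) for Case 1 (B = 0), following the case split above."""
    if any(uk < 0 for uk in u):
        return -math.inf
    # L is affine in x with slope r = c - u - A^T v; bounded below only if r = 0.
    r = [c[k] - u[k] - sum(A[i][k] * v[i] for i in range(len(b)))
         for k in range(len(c))]
    if any(rk != 0 for rk in r):
        return -math.inf
    return sum(bi * vi for bi, vi in zip(b, v))


A = [[1.0, 1.0, 1.0]]
b = [1.0]
c = [1.0, 2.0, 3.0]

for v in ([0.5], [1.0]):                 # both satisfy A^T v <= c
    u = [c[k] - sum(A[i][k] * v[i] for i in range(len(b))) for k in range(len(c))]
    print(v, lp_dual_function(A, b, c, u, v))
# v = [1.0] attains the primal optimal value c^T x = 1 at x = (1, 0, 0).
print(lp_dual_function(A, b, c, [0.0, 0.0, 0.0], [1.0]))   # -inf: slope is nonzero
```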

Case 2: Let $B\succ 0$. If $u\ge 0$, then $x\mapsto L(x,u,v)$ is a smooth convex function. The unconstrained minimisers of convex functions are exactly their stationary points, characterised by $\nabla_x L(x,u,v) = 0$, or

$$Bx = A^{\mathsf T} v + u - c, \qquad (2.2)$$
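Under the assumption $B\succ 0$, (2.2) has the unique solution $x = B^{-1}(A^{\mathsf T}v + u - c)$, and substituting it back into $L$ evaluates the dual function. The plain-Python sketch below does exactly that with a small hand-written elimination routine `solve_spd` (an illustrative helper, safe without pivoting only because $B$ is assumed positive definite). The test instance, $B = 2I$, $c = 0$ and the single constraint $x_1 + x_2 = 1$, is hypothetical; its primal optimum is $x = (0.5, 0.5)$ with value $0.5$.

```python
import math


def solve_spd(B, w):
    """Solve B x = w by Gaussian elimination without pivoting.
    Adequate here because B is assumed symmetric positive definite."""
    n = len(w)
    M = [[float(a) for a in row] + [float(wi)] for row, wi in zip(B, w)]
    for k in range(n):
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def qp_dual_function(B, A, b, c, u, v):
    """min_x L(x, u, v) for Case 2 (B positive definite); returns (value, minimiser)."""
    if any(uk < 0 for uk in u):
        return -math.inf, None
    n = len(c)
    # w = A^T v + u - c, so that (2.2) reads B x = w
    w = [sum(A[i][k] * v[i] for i in range(len(b))) + u[k] - c[k] for k in range(n)]
    x = solve_spd(B, w)
    quad = 0.5 * sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n))
    # L = 1/2 x^T B x + (c - u - A^T v)^T x + b^T v, and c - u - A^T v = -w
    value = quad - sum(wk * xk for wk, xk in zip(w, x)) + sum(bi * vi for bi, vi in zip(b, v))
    return value, x


B = [[2.0, 0.0], [0.0, 2.0]]
A = [[1.0, 1.0]]
b = [1.0]
c = [0.0, 0.0]
print(qp_dual_function(B, A, b, c, u=[0.0, 0.0], v=[1.0]))  # (0.5, [0.5, 0.5])
```

For this instance the multipliers $u = 0$, $v = 1$ close the duality gap: the dual value equals the primal optimal value $0.5$, and the minimiser returned is the primal solution.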

3. Convex Programming. Weak Lagrangian duality is as far as the LP duality theory extends to nonconvex problems. To extend the theory further, we need to assume that (NLP) is convex, that is, $f$ is convex while $g_j$ $(j\in I)$ and $g_i, -g_i$ $(i\in E)$ are concave, so that the feasible domain $F$ is convex. We have seen in Lecture 1 that the requirement that both $g_i$ and $-g_i$ be concave implies that $g_i$ is a linear functional plus a constant (an affine function). Thus, only linear equality constraints appear in convex programming problems! A convex programming problem is thus of the form

$$\text{(CP)}\qquad \min_x f(x) \quad\text{s.t.}\quad Ax = b,\quad x\in K = \{z\in\mathbb{R}^n : g_j(z)\ge 0\ (j\in I)\},$$

where $A\in\mathbb{R}^{m\times n}$ is a matrix which can always be chosen so that its row vectors $\nabla g_i^{\mathsf T}$ $(i\in E)$ are linearly independent (otherwise we can eliminate a few of them or detect infeasibility), and where $K$ is a convex set. The Lagrangian of a convex optimisation problem has nice convexity properties itself:

(i) For a fixed $(u^*,v^*)\in\mathbb{R}^p_+\times\mathbb{R}^q$ the function

$$x\mapsto L(x,u^*,v^*) = f(x) + \sum_{j\in I} u^*_j\,(-g_j(x)) + \sum_{i\in E} v^*_i\,(-g_i(x))$$

is a sum of the convex functions $f$, $-u^*_j g_j$ $(j\in I)$ and $-v^*_i g_i$ $(i\in E)$ (each $-u^*_j g_j$ is convex because $u^*_j\ge 0$ and $g_j$ is concave, and each $-v^*_i g_i$ is affine). Therefore, by the results of Lecture 1, $x\mapsto L(x,u^*,v^*)$ is globally convex!

(ii) For a fixed $x^*\in\mathbb{R}^n$ the function $(u,v)\mapsto L(x^*,u,v)$ is affine (linear plus a constant) on $\mathbb{R}^p_+\times\mathbb{R}^q$. Furthermore, it takes the value $-\infty$ when $u\not\ge 0$, and this is consistent with our definition of concavity for so-called proper functions as introduced in Lecture 1. Thus, $(u,v)\mapsto L(x^*,u,v)$ is globally concave!
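Properties (i) and (ii) can be spot-checked numerically. The sketch below uses a hypothetical convex toy problem (min $(x_1-2)^2+x_2^2$ subject to $x_1+x_2-3\ge 0$ and $x_1-x_2=0$), samples random points with Python's `random` module, and tests the midpoint inequality in $x$ for fixed multipliers and exact affinity in $(u,v)$ on $u\ge 0$ for fixed $x$, up to a floating-point tolerance `tol` chosen for illustration. A passing check is evidence, not a proof.

```python
import random


def L(x, u, v):
    """Finite branch (u >= 0) of the Lagrangian of the toy convex problem."""
    x1, x2 = x
    return (x1 - 2) ** 2 + x2 ** 2 - u * (x1 + x2 - 3) - v * (x1 - x2)


def midpoint(a, b):
    return tuple((ai + bi) / 2 for ai, bi in zip(a, b))


def check_convex_concave(trials=1000, tol=1e-9, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        # (i) convexity in x for fixed (u, v) with u >= 0
        u, v = rng.uniform(0, 5), rng.uniform(-5, 5)
        a = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        b = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        if L(midpoint(a, b), u, v) > (L(a, u, v) + L(b, u, v)) / 2 + tol:
            return False
        # (ii) affinity in (u, v) on u >= 0 for fixed x
        x = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        p = (rng.uniform(0, 5), rng.uniform(-5, 5))
        q = (rng.uniform(0, 5), rng.uniform(-5, 5))
        if abs(L(x, *midpoint(p, q)) - (L(x, *p) + L(x, *q)) / 2) > tol:
            return False
    return True


print(check_convex_concave())  # True
```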

3.1. Exact Characterisation of Convex Optimality. It now turns out that, just as in unconstrained optimisation, first order optimality conditions are all we need when (NLP) is a convex problem:

Theorem 3.1 (Sufficient Optimality Conditions for Convex Programming). Let (NLP) be a convex problem in which the objective and constraint functions are at least once continuously differentiable. Let $(x^*,u^*,v^*)$ be a point that satisfies the KKT conditions (1.6)–(1.10). Then $x^*$ is a global minimiser of (NLP).

Proof. The condition $\nabla_x L(x^*,u^*,v^*) = 0$ implies that $x^*$ is a global minimiser of the convex unconstrained function $x\mapsto L(x,u^*,v^*)$. Moreover, (1.7) and (1.8) show that $x^*$ is itself feasible. For all $x$ feasible for (NLP), we have $g_I(x)\ge 0$ and $g_E(x) = 0$. Since $u^*\ge 0$, we therefore have

$$f(x) \ge f(x) - {u^*}^{\mathsf T} g_I(x) - {v^*}^{\mathsf T} g_E(x) = L(x,u^*,v^*) \ge L(x^*,u^*,v^*) = f(x^*),$$

where the last equality follows from the complementarity condition (1.9) together with (1.8).
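The chain of inequalities in the proof can be traced on a hypothetical toy problem, min $(x_1-2)^2+x_2^2$ subject to $x_1+x_2-3\ge 0$ and $x_1-x_2=0$, whose KKT point $x^*=(1.5,1.5)$, $u^*=1$, $v^*=-2$ was found by hand for this illustration. The sketch samples feasible points $x = (t,t)$ with $t\ge 1.5$ and checks $f(x)\ge L(x,u^*,v^*)\ge L(x^*,u^*,v^*) = f(x^*)$, allowing a small rounding tolerance.

```python
def f(x):
    return (x[0] - 2) ** 2 + x[1] ** 2


def L(x, u, v):
    return f(x) - u * (x[0] + x[1] - 3) - v * (x[0] - x[1])


x_star, u_star, v_star = (1.5, 1.5), 1.0, -2.0
f_star = f(x_star)
assert L(x_star, u_star, v_star) == f_star        # complementarity: L(x*, u*, v*) = f(x*)

# Feasible points of the toy problem lie on the ray x1 = x2 = t with t >= 1.5.
for k in range(50):
    t = 1.5 + 0.25 * k
    x = (t, t)
    # Each step of the proof's chain of inequalities (small tolerance for rounding).
    assert f(x) >= L(x, u_star, v_star) - 1e-12
    assert L(x, u_star, v_star) >= L(x_star, u_star, v_star) - 1e-12
print("f(x*) =", f_star)   # f(x*) = 2.5
```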

What about constraint qualifications? Where have they disappeared to? It is important to realise that Theorem 3.1 only says that the KKT conditions are sufficient optimality conditions for convex programming, not that they are necessary. Of course, the KKT conditions also become necessary when the LICQ or the more general MFCQ is satisfied. For convex problems it is convenient to replace the MFCQ by an equivalent criterion that is easier to check:

Definition 3.2 (Slater Constraint Qualification). The convex programming problem (CP) satisfies the Slater constraint qualification (SCQ) if $A$ has full row rank and $K^\circ\cap F$ is nonempty, in other words, there exists a point $x\in\mathbb{R}^n$ such that $g_E(x) = 0$ and $g_I(x) > 0$.
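For a concrete (CP), checking the SCQ amounts to confirming that $A$ has full row rank and exhibiting one strictly feasible point. The sketch below does both for a hypothetical instance with $A = (1, -1)$, $b = 0$ and $g_I(x) = x_1 + x_2 - 3$; the rank routine is a plain Gaussian elimination with partial pivoting written for illustration, and the tolerance `tol` is an assumption.

```python
def matrix_rank(A, tol=1e-12):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    M = [[float(a) for a in row] for row in A]
    rows = len(M)
    cols = len(M[0]) if M else 0
    rank = 0
    for col in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) <= tol:
            continue                      # no usable pivot in this column
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, rows):
            factor = M[r][col] / M[rank][col]
            for k in range(col, cols):
                M[r][k] -= factor * M[rank][k]
        rank += 1
    return rank


def is_slater_point(A, b, g_I, x, tol=1e-12):
    """True if A has full row rank, A x = b (up to tol) and g_I(x) > 0 strictly."""
    full_row_rank = matrix_rank(A, tol) == len(A)
    equalities = all(abs(sum(aij * xj for aij, xj in zip(row, x)) - bi) <= tol
                     for row, bi in zip(A, b))
    strict = all(gj > 0 for gj in g_I(x))
    return full_row_rank and equalities and strict


A = [[1.0, -1.0]]
b = [0.0]
g_I = lambda x: [x[0] + x[1] - 3]
print(is_slater_point(A, b, g_I, (2.0, 2.0)))   # True: strictly feasible
print(is_slater_point(A, b, g_I, (1.5, 1.5)))   # False: inequality is active
```

Since the SCQ only asks for the existence of one such point, a single `True` suffices to certify it, whereas a `False` at one candidate proves nothing.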

Corollary 3.3 (Exact Characterisation of Optimality for Convex Programming). If (CP) satisfies the SCQ, then the KKT conditions are an exact characterisation of optimality.

Proof. This follows immediately from Theorem 3.1 and the necessary first order optimality conditions for nonlinear programming.

3.2. Strong Duality for Convex Programming. In the exercises we saw that strong LP duality was a direct consequence of necessary and sufficient optimality conditions. Now that we have generalised these conditions to convex programming, strong duality extends as well:

Theorem 3.4 (Strong Lagrangian Duality). Let (CP) be a convex programming problem for which the SCQ holds and such that an optimal solution $x^*$ exists. Then (D) has an optimal solution $(u^*,v^*)$, and the primal and dual objective function values at $x^*$ and $(u^*,v^*)$ coincide.

Proof. Because of the SCQ, there exists a vector $(u^*,v^*)\in\mathbb{R}^p_+\times\mathbb{R}^q$ such that $(x^*,u^*,v^*)$ satisfies the KKT conditions. Since $x^*$ is feasible, we have

$$L(x^*,u,v) = f(x^*) - u^{\mathsf T} g_I(x^*) - v^{\mathsf T} g_E(x^*) = f(x^*) - u^{\mathsf T} g_I(x^*) \le f(x^*) = L(x^*,u^*,v^*)$$

for all $(u,v)\in\mathbb{R}^p_+\times\mathbb{R}^q$, where the last equality follows from the complementarity requirement (1.9) in the KKT conditions. Since $L(x^*,u,v) = -\infty$ for $u\not\ge 0$, this shows that

$$L(x^*,u^*,v^*) = \max_{(u,v)} L(x^*,u,v).$$

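On the other hand, (1.6) and the convexity of $x\mapsto L(x,u^*,v^*)$ imply that

$$L(x^*,u^*,v^*) = \min_x L(x,u^*,v^*).$$

Hence the dual objective at $(u^*,v^*)$ equals $L(x^*,u^*,v^*) = f(x^*)$, the primal optimal value. By weak duality no dual feasible point can attain a larger value, so $(u^*,v^*)$ is optimal for (D) and the two optimal values coincide.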
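Theorem 3.4 can be observed on a hypothetical toy problem, min $(x_1-2)^2+x_2^2$ subject to $x_1+x_2-3\ge 0$ and $x_1-x_2=0$, which satisfies the SCQ (for instance at $x=(2,2)$). For $u\ge 0$ the inner minimisation in (D) is solved in closed form from $\nabla_x L = 0$, namely $x_1 = 2 + (u+v)/2$ and $x_2 = (u-v)/2$; these formulas were derived by hand for this sketch. Evaluating the dual function on a grid shows that it never exceeds the primal optimal value $2.5$ and attains it at $(u,v)=(1,-2)$.

```python
def f(x):
    return (x[0] - 2) ** 2 + x[1] ** 2


def L(x, u, v):
    return f(x) - u * (x[0] + x[1] - 3) - v * (x[0] - x[1])


def dual_function(u, v):
    """theta(u, v) = min_x L(x, u, v) for u >= 0, via the stationarity condition."""
    x = (2 + (u + v) / 2, (u - v) / 2)
    return L(x, u, v)


primal_opt = f((1.5, 1.5))                     # optimal value of (CP): 2.5
grid = [(0.25 * i, -4 + 0.25 * j) for i in range(17) for j in range(17)]
best = max(grid, key=lambda uv: dual_function(*uv))

# Weak duality: every dual value is a lower bound on the primal optimum.
assert all(dual_function(u, v) <= primal_opt + 1e-12 for u, v in grid)
print(best, dual_function(*best), primal_opt)   # (1.0, -2.0) 2.5 2.5
```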