




These notes present Lagrangian duality, a theory that generalizes linear programming duality. They cover the KKT conditions, the Lagrangian function, and the relationship between the primal and dual problems, and include an example of applying Lagrangian duality to a quadratic programming problem.





RAPHAEL HAUSER MATHEMATICAL INSTITUTE, UNIVERSITY OF OXFORD
\[
\text{(NLP)} \qquad \min\; f(x) \quad \text{s.t.} \quad g_I(x) \ge 0, \quad g_E(x) = 0,
\]
where gI is a vector of inequality constraints and gE a vector of equality constraints. The associated KKT conditions are
\begin{align}
\nabla f(x^*) - g_I'(x^*)^T u^* - g_E'(x^*)^T v^* &= 0, \tag{1.1}\\
g_I(x^*) &\ge 0, \tag{1.2}\\
g_E(x^*) &= 0, \tag{1.3}\\
u_j^* \, g_j(x^*) &= 0 \quad (j \in I), \tag{1.4}\\
u^* &\ge 0. \tag{1.5}
\end{align}
To motivate Lagrangian duality, we will reformulate the KKT conditions (1.1)– (1.5) in slightly more abstract form. To do this, we want to extend the Lagrangian as follows:
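\[
L : \mathbb{R}^n \times \mathbb{R}^p \times \mathbb{R}^q \to \mathbb{R},
\qquad
(x, u, v) \mapsto
\begin{cases}
f(x) - u^T g_I(x) - v^T g_E(x) & \text{if } x \in \operatorname{dom}(f),\ u \ge 0,\\
+\infty & \text{if } x \notin \operatorname{dom}(f),\ u \ge 0,\\
-\infty & \text{if } u \not\ge 0.
\end{cases}
\]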
This definition of the Lagrangian is a bit more general than the one we encountered previously, but this is mainly interesting for the purposes of simplifying notation and does not really entail a conceptual change:
(i) We account for the possibility that $f$ might not be defined on all of $\mathbb{R}^n$. Our extension of $L$ is compatible with extending $f$ by setting $f(x) = +\infty$ for all $x \notin \operatorname{dom}(f)$. Since (NLP) is a minimisation problem, this automatically forces the search for optimal solutions to be restricted to $\operatorname{dom}(f)$.
(ii) We define $L$ to be $-\infty$ when the vector of Lagrange multipliers associated with the inequality constraints is not nonnegative, as it should be. Again, this convention allows us not to worry notationally about the fact that, really, $u$ is constrained to the nonnegative orthant.
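The two infinity conventions can be made concrete with a minimal Python sketch. The toy objective and constraints, and the names `extended_lagrangian` and `in_dom_f`, are assumptions introduced for illustration only; they do not appear in the notes.

```python
import math

def extended_lagrangian(f, g_I, g_E, in_dom_f, x, u, v):
    """Evaluate the extended Lagrangian L(x, u, v) with its +/- infinity conventions."""
    if any(uj < 0 for uj in u):      # u is outside the nonnegative orthant
        return -math.inf
    if not in_dom_f(x):              # x is outside dom(f), and u >= 0
        return math.inf
    return (f(x)
            - sum(uj * gj for uj, gj in zip(u, g_I(x)))
            - sum(vi * gi for vi, gi in zip(v, g_E(x))))

# Hypothetical toy problem: f(x) = -log(x1) + x2^2 with dom(f) = {x1 > 0},
# one inequality constraint x2 >= 0 and one equality constraint x1 + x2 - 2 = 0.
f = lambda x: -math.log(x[0]) + x[1] ** 2
g_I = lambda x: [x[1]]
g_E = lambda x: [x[0] + x[1] - 2.0]
in_dom_f = lambda x: x[0] > 0

value_inside = extended_lagrangian(f, g_I, g_E, in_dom_f, [1.0, 1.0], [0.5], [3.0])
value_outside = extended_lagrangian(f, g_I, g_E, in_dom_f, [-1.0, 1.0], [0.5], [3.0])
value_bad_u = extended_lagrangian(f, g_I, g_E, in_dom_f, [1.0, 1.0], [-0.5], [3.0])
```

The check on `u` comes first because the definition assigns $-\infty$ whenever $u \not\ge 0$, regardless of whether $x$ lies in the domain of $f$.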
Lemma 1.1. The KKT conditions (1.1)–(1.5) are equivalent to the following set of equations and inequalities,
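\begin{align}
\nabla_x L(x^*, u^*, v^*) &= 0, \tag{1.6}\\
\nabla_u L(x^*, u^*, v^*) &\le 0, \tag{1.7}\\
\nabla_v L(x^*, u^*, v^*) &= 0, \tag{1.8}\\
u^{*T} \nabla_u L(x^*, u^*, v^*) &= 0, \tag{1.9}\\
u^* &\ge 0, \tag{1.10}
\end{align}
where $\nabla_x L = (D_x L)^T$ is the gradient with respect to $x$, and likewise $\nabla_u L$ and $\nabla_v L$ are the gradients with respect to $u$ and $v$.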
Proof. (1.6) is just a reformulation of (1.1). Note that $\nabla_u L = -g_I$ and $\nabla_v L = -g_E$. Therefore, (1.2) is equivalent to $\nabla_u L(x^*, u^*, v^*) = -g_I(x^*) \le 0$, which is (1.7). Likewise, (1.3) is equivalent to $\nabla_v L(x^*, u^*, v^*) = -g_E(x^*) = 0$, which is (1.8). Finally, (1.4) and $\nabla_u L = -g_I$ imply
\[
u^{*T} \nabla_u L(x^*, u^*, v^*) = -\sum_{i \in I} u_i^* \, g_i(x^*) = 0,
\]
which is (1.9). On the other hand, (1.7), (1.10) and (1.9) imply that $\sum_{i \in I} u_i^* \, g_i(x^*)$ is a sum of nonnegative summands that adds to zero, and hence all the summands must be zero, which shows (1.4).
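As a small illustration of conditions (1.6)–(1.10), the following Python sketch checks them at two candidate points of a toy problem. The problem, the candidate points and the function names are hypothetical choices made for this sketch, with the gradients of the Lagrangian worked out by hand.

```python
# Hypothetical toy problem (not from the notes):
#   min x1^2 + x2^2  s.t.  g_I(x) = x1 - 1 >= 0,  g_E(x) = x1 + x2 - 1 = 0.
# Its Lagrangian is L(x, u, v) = x1^2 + x2^2 - u*(x1 - 1) - v*(x1 + x2 - 1).

def lagrangian_gradients(x, u, v):
    """Return (grad_x L, grad_u L, grad_v L) for the toy problem."""
    x1, x2 = x
    grad_x = (2 * x1 - u - v, 2 * x2 - v)
    grad_u = -(x1 - 1)           # equals -g_I(x)
    grad_v = -(x1 + x2 - 1)      # equals -g_E(x)
    return grad_x, grad_u, grad_v

def satisfies_kkt(x, u, v, tol=1e-12):
    """Check the KKT conditions in the Lagrangian form (1.6)-(1.10)."""
    grad_x, grad_u, grad_v = lagrangian_gradients(x, u, v)
    return (all(abs(c) <= tol for c in grad_x)   # (1.6) stationarity in x
            and grad_u <= tol                    # (1.7) primal inequality
            and abs(grad_v) <= tol               # (1.8) primal equality
            and abs(u * grad_u) <= tol           # (1.9) complementarity
            and u >= 0)                          # (1.10) multiplier sign

kkt_at_candidate = satisfies_kkt((1.0, 0.0), 2.0, 0.0)
kkt_at_infeasible = satisfies_kkt((0.5, 0.5), 0.0, 1.0)
```

The second point is stationary in $x$ and satisfies the equality constraint, but it violates (1.7) because $g_I(x) = -0.5 < 0$.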
Our reformulation of the KKT conditions in terms of the Lagrangian provides the following deeper interpretation:
Lemma 1.2 (KKT and Saddle Points).
(i) Equation (1.6) is the first order necessary condition for $x^*$ to be a minimiser of the unconstrained problem
\[
\min_{x \in \mathbb{R}^n} L(x, u^*, v^*), \tag{1.11}
\]
where $u^*$ and $v^*$ are regarded as a set of fixed parameters.
(ii) Equations (1.7)–(1.10) are the first order necessary optimality conditions for the problem
\[
\max_{(u,v) \in \mathbb{R}^p \times \mathbb{R}^q} L(x^*, u, v), \tag{1.12}
\]
where $x^*$ is considered as a set of fixed parameters, and where $p = |I|$ and $q = |E|$.
Proof. (i) (1.11) is an unconstrained problem. Therefore, (i) is immediate.
(ii) The objective function of problem (1.12) takes the value $-\infty$ for $u \not\ge 0$ and finite values when $u \ge 0$. Therefore, (1.12) is equivalent to the constrained optimisation problem
\[
\min_{(u,v) \in \mathbb{R}^p \times \mathbb{R}^q} -L(x^*, u, v) \quad \text{s.t.} \quad u \ge 0.
\]
The LICQ holds at all feasible points because the constraint gradients are the coordinate unit vectors $\{e_1, \dots, e_p\}$ corresponding to the variables of $u$, and these are linearly independent.
The natural question to ask is: what is the relation between (NLP), (P) and (D)? The following Theorem shows that (P) and (NLP) are equivalent, and later we will see that for convex problems (P) and (D) are equivalent under regularity assumptions, that is, the max and min may be interchanged.
Theorem 2.1 (Lagrangian Primal). (P) and (NLP) are equivalent problems.
Proof. If x is feasible (for (NLP)) then we have gI (x) ≥ 0 and gE (x) = 0. This implies
\[
L(x, u, v) =
\begin{cases}
f(x) - u^T g_I(x) - v^T g_E(x) = f(x) - u^T g_I(x) \le f(x) & \text{if } u \ge 0,\\
-\infty & \text{if } u \not\ge 0.
\end{cases}
\]
Therefore, for feasible $x$ the objective function of (P) takes the value
\[
\max_{(u,v)} L(x, u, v) = L(x, 0, v) = f(x).
\]
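On the other hand, if $x$ is infeasible (for (NLP)) then either $g_j(x) < 0$ for some $j \in I$, in which case we set $u_j = M$ with $M > 0$, or $g_i(x) \ne 0$ for some $i \in E$, in which case we set $v_i = -M \operatorname{sign}(g_i(x))$. In both cases, we can set all remaining entries of $u$ and $v$ to zero, and then
\[
L(x, u, v) \xrightarrow{\;M \to \infty\;} +\infty.
\]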
This shows that for infeasible $x$ the objective function of (P) is
\[
\max_{(u,v)} L(x, u, v) = +\infty.
\]
In summary, we find that
\[
\max_{(u,v)} L(x, u, v) =
\begin{cases}
f(x) & \text{if } g_I(x) \ge 0,\ g_E(x) = 0,\\
+\infty & \text{otherwise,}
\end{cases}
\]
which shows that minimising $x \mapsto \max_{(u,v)} L(x, u, v)$ over $\mathbb{R}^n$ is the same as minimising $f(x)$ over the feasible domain of (NLP).
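The behaviour of the inner maximisation can be observed numerically. The sketch below, written for a hypothetical toy problem with one inequality and one equality constraint, evaluates $L$ along a family of multipliers scaled by a growing constant $M$; all names and the problem data are assumptions of this illustration.

```python
import math

# Hypothetical toy problem (not from the notes):
#   f(x) = x1^2 + x2^2,  g_I(x) = x1 - 1 >= 0,  g_E(x) = x1 + x2 - 1 = 0.
def f(x):
    return x[0] ** 2 + x[1] ** 2

def g_I(x):
    return x[0] - 1.0

def g_E(x):
    return x[0] + x[1] - 1.0

def L(x, u, v):
    """Lagrangian with the convention L = -inf when u < 0."""
    if u < 0:
        return -math.inf
    return f(x) - u * g_I(x) - v * g_E(x)

def multipliers_for(x, M):
    """Pick (u, v) of size M that exploits a violated constraint, if there is one."""
    if g_I(x) < 0:
        return M, 0.0
    if g_E(x) != 0:
        return 0.0, -M * math.copysign(1.0, g_E(x))
    return 0.0, 0.0              # x is feasible: u = 0 attains the maximum f(x)

def inner_values(x, Ms=(1.0, 1e3, 1e6)):
    """Evaluate L at x along the multiplier family for growing M."""
    return [L(x, *multipliers_for(x, M)) for M in Ms]

feasible = inner_values((2.0, -1.0))         # stays at f(x) = 5
violates_inequality = inner_values((0.0, 1.0))
violates_equality = inner_values((2.0, 0.0))
```

For the feasible point the values stay at $f(x)$, while for both infeasible points they grow linearly in $M$, in line with the case distinction in the summary formula.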
2.1. The Interpretation of the Dual. The interpretation of the Lagrangian dual (D) is less straightforward. The following example shows that in the case where (P) is a linear programming problem, (D) is the usual LP dual. The example also shows that convex quadratic programming problems have a convex quadratic dual. And finally, the example highlights that if (P) is not a convex problem then (D) might not yield any useful information at all.
Example 2.2. Consider the problem
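\[
\min_{x \in \mathbb{R}^n} \tfrac{1}{2} x^T B x + c^T x \quad \text{s.t.} \quad Ax = b,\ x \ge 0, \tag{2.1}
\]
where $B$ is a symmetric $n \times n$ matrix, $c \in \mathbb{R}^n$, $A$ is a $q \times n$ matrix and $b \in \mathbb{R}^q$.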
Problems of the form (2.1) are called quadratic programming (QP) problems. We have $g_I(x) = x$, $p = n$ and $g_E(x) = Ax - b$. The Lagrangian of this problem is
\[
L(x, u, v) =
\begin{cases}
\tfrac{1}{2} x^T B x + (c - u - A^T v)^T x + b^T v & \text{if } u \ge 0,\\
-\infty & \text{otherwise.}
\end{cases}
\]
Note that
\[
\max_{(u,v)} L(x, u, v) =
\begin{cases}
f(x) & \text{if } Ax = b,\ x \ge 0,\\
+\infty & \text{otherwise.}
\end{cases}
\]
Therefore, (P) is clearly equivalent to (2.1), as predicted by Theorem 2.1. Let us now derive the dual of (2.1). We distinguish three cases.
Case 1: Let B = 0. Then (2.1) is an LP problem in standard primal form,
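\[
\text{(P)} \qquad \min\; c^T x \quad \text{s.t.} \quad Ax = b,\ x \ge 0.
\]
In this case we have
\[
\min_x L(x, u, v) =
\begin{cases}
b^T v & \text{if } c - u - A^T v = 0,\ u \ge 0,\\
-\infty & \text{if } c - u - A^T v \ne 0,\ u \ge 0,\\
-\infty & \text{if } u \not\ge 0,
\end{cases}
\]
or in other words,
\[
\min_x L(x, u, v) =
\begin{cases}
b^T v & \text{if } A^T v \le c,\ u = c - A^T v,\\
-\infty & \text{otherwise.}
\end{cases}
\]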
Therefore, the dual Lagrangian problem is
\[
\text{(D)} \qquad \max\; b^T v \quad \text{s.t.} \quad A^T v \le c.
\]
Note that this is the usual LP dual of (P).
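The primal and dual pair from Case 1 can be compared on a tiny instance. The following Python sketch uses made-up LP data and a plain grid scan in place of an LP solver, so it only illustrates the relationship between the two optimal values under those assumptions.

```python
# Hypothetical LP data (not from the notes): min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0.
c = (1.0, 2.0)
A = ((1.0, 1.0),)    # a single equality constraint, so q = 1 and n = 2
b = (1.0,)

def primal_objective(x):
    """c^T x"""
    return sum(cj * xj for cj, xj in zip(c, x))

def dual_objective(v):
    """b^T v"""
    return sum(bi * vi for bi, vi in zip(b, v))

def dual_feasible(v):
    """Check A^T v <= c componentwise."""
    return all(sum(A[i][j] * v[i] for i in range(len(b))) <= c[j]
               for j in range(len(c)))

# Primal feasible points are x = (t, 1 - t) with 0 <= t <= 1; scan a grid of them.
primal_values = [primal_objective((k / 100, 1 - k / 100)) for k in range(101)]

# Scan a grid of dual candidates v in [-3, 3] and keep the feasible ones.
dual_values = [dual_objective((k / 100,)) for k in range(-300, 301)
               if dual_feasible((k / 100,))]

primal_best = min(primal_values)   # attained at x = (1, 0)
dual_best = max(dual_values)       # attained at v = 1
```

On this instance every dual feasible value lies below every primal feasible value, and the two best values coincide, as LP duality predicts.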
Case 2: Let $B \succ 0$. If $u \ge 0$ then $x \mapsto L(x, u, v)$ is a smooth convex function. The unconstrained minimisers of convex functions are exactly their stationary points, characterised by $\nabla_x L(x, u, v) = 0$ or
\[
Bx = A^T v + u - c, \tag{2.2}
\]
\[
\text{s.t.} \quad Ax = b, \quad x \in K = \{ z \in \mathbb{R}^n : g_j(z) \ge 0 \ (j \in I) \},
\]
where $A \in \mathbb{R}^{m \times n}$ is a matrix which can always be chosen so that its row vectors $\nabla g_i^T$ $(i \in E)$ are linearly independent (otherwise we can eliminate a few of them or detect infeasibility), and where $K$ is a convex set. The Lagrangian of a convex optimisation problem has nice convexity properties itself:
(i) For a fixed $(u^*, v^*) \in \mathbb{R}^p_+ \times \mathbb{R}^q$ the function
\[
x \mapsto L(x, u^*, v^*) = f(x) + \sum_{j \in I} u_j^* \,(-g_j(x)) + \sum_{i \in E} v_i^* \,(-g_i(x))
\]
is a sum of the convex functions $f$, $-u_j^* g_j$ $(j \in I)$ and $-v_i^* g_i$ $(i \in E)$. Therefore, by the results of Lecture 1, $x \mapsto L(x, u^*, v^*)$ is globally convex!
(ii) For a fixed $x^* \in \mathbb{R}^n$ the function $(u, v) \mapsto L(x^*, u, v)$ is affine (linear plus a constant) on $\mathbb{R}^p_+ \times \mathbb{R}^q$. Furthermore, it takes the value $-\infty$ when $u \not\ge 0$, and this is consistent with our definition of concavity for so-called proper functions as introduced in Lecture 1. Thus, $(u, v) \mapsto L(x^*, u, v)$ is globally concave!
3.1. Exact Characterisation of Convex Optimality. It now turns out that
Theorem 3.1 (Sufficient Optimality Conditions for Convex Programming). Let (NLP) be a convex problem in which the objective and constraint functions are at least once continuously differentiable. Let $(x^*, u^*, v^*)$ be a point that satisfies the KKT conditions (1.6)–(1.10). Then $x^*$ is a global minimiser of (NLP).
Proof. The condition $\nabla_x L(x^*, u^*, v^*) = 0$ implies that $x^*$ is a global minimiser of the convex unconstrained function $x \mapsto L(x, u^*, v^*)$. For all $x$ feasible (for (NLP)), we have $g_I(x) \ge 0$ and $g_E(x) = 0$. Since $u^* \ge 0$ we therefore have
\[
f(x) \ge f(x) - u^{*T} g_I(x) - v^{*T} g_E(x) = L(x, u^*, v^*) \ge L(x^*, u^*, v^*) = f(x^*),
\]
where the last equality follows from the complementarity condition (1.9) together with $g_E(x^*) = 0$ from (1.8).
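The chain of inequalities in this proof can be checked numerically on a small convex instance. The sketch below assumes a hypothetical toy problem whose KKT point was computed by hand; the sampled feasible points and all names are illustrative choices rather than part of the notes.

```python
# Hypothetical convex toy problem (not from the notes):
#   min x1^2 + x2^2  s.t.  g_I(x) = x1 - 1 >= 0,  g_E(x) = x1 + x2 - 1 = 0.
def f(x):
    return x[0] ** 2 + x[1] ** 2

def g_I(x):
    return x[0] - 1.0

def g_E(x):
    return x[0] + x[1] - 1.0

def L(x, u, v):
    """Lagrangian for u >= 0."""
    return f(x) - u * g_I(x) - v * g_E(x)

# KKT point found by hand: grad f(x*) = (2, 0) = u* * (1, 0) + v* * (1, 1).
x_star, u_star, v_star = (1.0, 0.0), 2.0, 0.0

# Feasible points satisfy x1 >= 1 and x2 = 1 - x1; sample x1 = 1 + k/10.
feasible_points = [(1.0 + k / 10, -(k / 10)) for k in range(51)]

tol = 1e-12
chain_holds = all(
    f(x) >= L(x, u_star, v_star) >= L(x_star, u_star, v_star) - tol
    for x in feasible_points
)
value_at_kkt_point = L(x_star, u_star, v_star)
```

Here $L(x, u^*, v^*) = (x_1 - 1)^2 + x_2^2 + 1$, so the middle term of the chain is bounded below by $1 = f(x^*)$ for every $x$, not only for the sampled points.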
What about constraint qualifications? Where have they disappeared to? It is important to realise that Theorem 3.1 only says that the KKT conditions are sufficient optimality conditions for convex programming, but not necessary conditions. Of course, the KKT conditions also become necessary when the LICQ or the more general MFCQ is satisfied. For convex problems it is convenient to reformulate the MFCQ by an equivalent criterion that is easier to check:
Definition 3.2 (Slater Constraint Qualification). The convex programming problem (CP) satisfies the Slater constraint qualification (SCQ) if $A$ has full row rank and $K^\circ \cap F$ is nonempty, in other words, there exists a point $x \in \mathbb{R}^n$ such that $g_E(x) = 0$ and $g_I(x) > 0$.
Corollary 3.3 (Exact Characterisation of Optimality for Convex Programming). If (CP) satisfies the SCQ then the KKT conditions are an exact characterisation of optimality.
Proof. This follows immediately from Theorem 3.1 and the necessary first order optimality conditions for nonlinear programming.
3.2. Strong Duality for Convex Programming. In the exercises we saw that strong LP duality was a direct consequence of necessary and sufficient optimality conditions. Now that we have a generalisation of this result, strong duality extends also:
Theorem 3.4 (Strong Lagrangian Duality). Let (CP) be a convex programming problem for which the SCQ holds and such that an optimal solution $x^*$ exists. Then (D) has an optimal solution $(u^*, v^*)$ and the primal and dual objective function values at $x^*$ and $(u^*, v^*)$ coincide.
Proof. Because of the SCQ, there exists a vector $(u^*, v^*) \in \mathbb{R}^p_+ \times \mathbb{R}^q$ such that $(x^*, u^*, v^*)$ satisfies the KKT conditions. Since $x^*$ is feasible, we have
\[
L(x^*, u, v) = f(x^*) - u^T g_I(x^*) - v^T g_E(x^*) = f(x^*) - u^T g_I(x^*) \le f(x^*) = L(x^*, u^*, v^*)
\]
for all $(u, v) \in \mathbb{R}^p_+ \times \mathbb{R}^q$, where the last equality follows from the complementarity requirement (1.9) in the KKT conditions. Since $L(x^*, u, v) = -\infty$ for $u \not\ge 0$, this shows that
\[
L(x^*, u^*, v^*) = \max_{(u,v)} L(x^*, u, v).
\]
On the other hand, (1.6) and the convexity of $x \mapsto L(x, u^*, v^*)$ imply that
\[
L(x^*, u^*, v^*) = \min_x L(x, u^*, v^*).
\]
The result now follows from weak duality.
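Strong duality can be seen in action on a one-dimensional convex problem that satisfies the Slater condition. The following Python sketch assumes a hypothetical toy problem, uses the hand-derived minimiser of the Lagrangian in $x$, and replaces the dual maximisation by a grid scan, so it is an illustration rather than a general method.

```python
# Hypothetical toy problem (not from the notes): min x^2  s.t.  x - 1 >= 0.
# The point x = 2 is strictly feasible, so the Slater condition holds.
def L(x, u):
    """Lagrangian for u >= 0."""
    return x ** 2 - u * (x - 1.0)

def dual_function(u):
    """q(u) = min_x L(x, u); L is a convex quadratic in x, minimised at x = u / 2."""
    return L(u / 2.0, u)

# Scan multipliers u in [0, 4] on a grid with spacing 0.01.
u_grid = [k / 100 for k in range(401)]
dual_best_u = max(u_grid, key=dual_function)
dual_best = dual_function(dual_best_u)

# Primal optimum: the feasible point closest to the unconstrained minimiser 0.
x_star = 1.0
primal_best = x_star ** 2
```

In closed form $q(u) = u - u^2/4$, which is maximised at $u^* = 2$ with $q(u^*) = 1 = f(x^*)$, so the primal and dual optimal values coincide on this instance.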