

















Dynamic Programming, Knapsack Problems, Lagrangian Relaxation, Solving the Lagrangian Dual, Choosing a Lagrangian Dual
Fig. 10.1. The uncapacitated lot-sizing problem (production $x$, storage $s$, demand $d$).
10.1. Uncapacitated Lot-Sizing. The uncapacitated lot-sizing problem (ULS) concerns the following situation: A factory produces a single type of product in periods $1, \dots, n$. Production in period $t$ incurs a fixed cost $f_t$, and each unit produced costs $p_t$. Earlier production can be kept in storage for later sale at a unit storage cost of $h_t$ during period $t$. Initial storage is empty. Client demand in period $t$ is $d_t$. See Figure 10.1 for an illustration. How should we plan production so as to satisfy all demands and minimise total costs?
In Example 7 we found the BP model
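$$
\begin{aligned}
\min_{x,y,s}\quad & \sum_{t=1}^{n} p_t x_t + \sum_{t=1}^{n} f_t y_t + \sum_{t=1}^{n-1} h_t s_t \\
\text{s.t.}\quad & s_{t-1} + x_t = d_t + s_t && (t = 1, \dots, n), \\
& x_t \le M y_t && (t = 1, \dots, n), \\
& s_0 = s_n = 0, \\
& s \in \mathbb{R}^{n+1}_+, \quad x \in \mathbb{R}^{n}_+, \quad y \in \mathbb{B}^{n},
\end{aligned}
$$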
where $M$ is some a priori bound on the total consumption $\sum_{t=1}^{n} d_t$. ULS can also be interpreted as a min-cost network-flow problem with fixed charge as follows: Consider the directed graph $G = (V, E)$ with $V = \{0, 1, \dots, t, \dots, n\}$ and $E = \{(0, t) : t = 1, \dots, n\} \cup \{(t, t+1) : t = 1, \dots, n-1\}$. Node $0$ is a source with production $D = \sum_{i=1}^{n} d_i$. Node $t$ has production $-d_t$ $(t = 1, \dots, n)$. See Figure 10.2 for an illustration. Now let $x_t$ be the flow along $(0, t)$ for $t = 1, \dots, n$ at a cost $f_t + p_t x_t$ if $x_t > 0$, and zero otherwise. Let further $s_t$ be the flow along $(t, t+1)$ for $t = 1, \dots, n-1$ at a cost $h_t s_t$. The problem is to find a feasible flow that minimises the costs.
Fig. 10.2. The lot-sizing problem as a network flow (source node with production $D$, nodes $1, \dots, 4$ with productions $-d_1, \dots, -d_4$).

Fig. 10.3. There exists an optimal solution that lives on a tree (arc flows $d_1 + d_2$, $d_3 + d_4$, $d_2$ and $d_4$).
Next, we will argue that we only need to look for optimal lot-sizing solutions that occupy a tree in the directed graph constructed above. We do not claim that all optimal solutions have this structure, but there is always an optimal solution that has it, so it suffices to search among solutions of this kind. The special structure of such a solution makes it possible to find it with dynamic programming.
Proposition 10.1. ULS has an optimal solution $(x, s)$ with the following special structure, illustrated in Figure 10.3:

i) $s_{t-1} x_t = 0$ for $t = 1, \dots, n$, that is, production occurs only when the stock is empty,
ii) if $x_t > 0$ then $x_t = \sum_{i=t}^{t+k} d_i$ for some $k \ge 0$, that is, if production takes place during period $t$ then the amount satisfies exactly the demands of periods $t, \dots, t+k$.

Proof. Let $(s, x, y)$ be an optimal solution occupying a minimal set of arcs. The result follows immediately from the no-wastage constraints

$$s_{t-1} + x_t = d_t + s_t$$

if we can show that the set of arcs on which $(x, s) > 0$ is a tree in the flow-graph $G$ constructed above. To prove this, it suffices to show that if

$$x_\tau, x_\theta > 0, \qquad x_{\tau+1}, \dots, x_{\theta-1} = 0 \tag{10.1}$$
which allows us to express the decision variables $s_t$ as a linear function of the decision vectors $x, y$.

Lemma 10.2. Relation (10.2) holds for $t = 1, \dots, n$.

Proof. The no-wastage constraint at time 1,

$$x_1 = d_1 + s_1, \tag{10.3}$$

yields $s_1 = x_1 - d_1$, showing that (10.2) holds for $t = 1$. Assuming that (10.2) holds for $t$, the no-wastage constraint at time $t + 1$,

$$s_t + x_{t+1} = d_{t+1} + s_{t+1},$$

implies

$$s_{t+1} = s_t + x_{t+1} - d_{t+1} = \sum_{i=1}^{t} (x_i - d_i) + (x_{t+1} - d_{t+1}) = \sum_{i=1}^{t+1} (x_i - d_i).$$

In eliminating the variables $s$, note that the constraint $s_0 = 0$ was already built into (10.3). The constraint $s_n = 0$ on the other hand is superfluous, since a production schedule with $s_n > 0$ can clearly not be optimal.
Next, let us write the objective function in terms of the variables $x, y$. We have

$$
\begin{aligned}
g(s, x, y) &= \sum_{t=1}^{n} p_t x_t + \sum_{t=1}^{n} f_t y_t + \sum_{t=1}^{n-1} h_t \sum_{i=1}^{t} (x_i - d_i) \\
&= \sum_{t=1}^{n} c_t x_t + \sum_{t=1}^{n} f_t y_t - \sum_{t=1}^{n-1} h_t d_{1t} =: \ell(x, y),
\end{aligned}
$$

where we used the notation $c_t := p_t + \sum_{i=t}^{n-1} h_i$ and $d_{it} := \sum_{j=i}^{t} d_j$. Since the term $-\sum_{t=1}^{n-1} h_t d_{1t}$ is just an additive constant, the minimiser of $\ell(x, y)$ is the same as the minimiser of

$$\sum_{t=1}^{n} c_t x_t + \sum_{t=1}^{n} f_t y_t.$$
We now claim that the following problem is equivalent to the ULS problem, in the sense that both have the same optimiser $(x, y)$:

$$
\begin{aligned}
(P)\qquad H(n) = \min_{x,y}\quad & \sum_{t=1}^{n} c_t x_t + \sum_{t=1}^{n} f_t y_t \\
\text{s.t.}\quad & x_t \le M y_t \quad (t = 1, \dots, n), \\
& x \in \mathbb{R}^{n}_+, \quad y \in \mathbb{B}^{n},
\end{aligned}
$$

$$\forall\, t_1 < t_2:\quad x_{t_1} > 0 = x_{t_1+1} = \dots = x_{t_2-1} < x_{t_2} \;\Rightarrow\; x_{t_1} = d_{t_1} + \dots + d_{t_2-1}. \tag{10.4}$$

We have already discussed why omitting the constraints $s_0 = s_n = 0$ is harmless, and why the constraints (10.4) can be imposed without loss of generality. In addition, we have dropped the constraints $s_t \ge 0$, since these are automatically satisfied when $x$ satisfies (10.4).
Note that (P) is no longer an IP formulation, but it has the advantage that lower-dimensional analogues can naturally be formulated by focussing on production and consumption during periods $1, \dots, k$ only (in other words, by replacing $n$ by $k$),

$$
\begin{aligned}
(P_k)\qquad H(k) = \min_{x,y}\quad & \sum_{t=1}^{k} c_t x_t + \sum_{t=1}^{k} f_t y_t \\
\text{s.t.}\quad & x_t \le M y_t \quad (t = 1, \dots, k), \\
& x \in \mathbb{R}^{k}_+, \quad y \in \mathbb{B}^{k}, \\
& \forall\, t_1 < t_2:\ x_{t_1} > 0 = x_{t_1+1} = \dots = x_{t_2-1} < x_{t_2} \;\Rightarrow\; x_{t_1} = d_{t_1} + \dots + d_{t_2-1}.
\end{aligned}
$$
Lemma 10.3. [Key Observation] Let $(x, y)$ be an optimal solution of (P) that satisfies the structure described in Proposition 10.1, and let $t = \max\{\theta : y_\theta = 1\}$ be the last period during which production occurs in the schedule $(x, y)$. Then

i) $x_t = d_{tn}$, and
ii) $H(n) = H(t-1) + f_t + c_t d_{tn}$ (the "Bellman principle").

Proof. Part i) is an immediate consequence of the tree structure on which $(s, x, y)$ lives. Furthermore, since $s_{t-1} = 0$, the production schedule for periods $1, \dots, t-1$ is completely independent of the production schedule for periods $t, \dots, n$. Therefore, $\bigl((x_1, \dots, x_{t-1}), (y_1, \dots, y_{t-1})\bigr)$ must be a minimiser of $(P_{t-1})$, and this shows ii).
Algorithm 10.4. [Dynamic Programming for ULS]
$\bigl(H(t-1) + f_t + c_t d_{tk}\bigr)$.

(Here we exploit Lemma 10.3.ii).)

$H(t-1) + f_t + c_t d_{t(\tau_\ell - 1)}$

ii) $y_{\tau_{\ell+1}} = 1$,
iii) $x_{\tau_{\ell+1}} = d_{\tau_{\ell+1},\, \tau_\ell - 1}$,
iv) increment $\ell$ by 1.
Theorem 10.5. Algorithm 10.4 solves the ULS problem (P) in $O(n^2)$ time.

Proof. The correctness of Algorithm 10.4 is a direct consequence of Lemma 10.3. Since there are $n$ iterations, each costing $O(n)$ time because up to $n$ values have to be compared with one another, the algorithm terminates in $O(n^2)$ time.
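The recursion of Lemma 10.3 can be sketched in Python as follows. This is a minimal illustration, not the notes' own listing: the function name `solve_uls`, the 0-based list layout, and the test data are assumptions made here, and all demands are assumed positive. The sketch computes $H(k) = \min_{1 \le t \le k} \{H(t-1) + f_t + c_t d_{tk}\}$ with $H(0) = 0$, backtracks through the last production periods, and subtracts the constant $\sum_{t=1}^{n-1} h_t d_{1t}$ to recover the true cost.

```python
def solve_uls(p, f, h, d):
    """Dynamic programme for uncapacitated lot-sizing (sketch).

    p, f, d: lists of length n (unit cost, fixed cost, demand per period).
    h: list of length n - 1 (unit storage cost during periods 1..n-1).
    Returns (total_cost, x), where x is the production plan.
    """
    n = len(d)
    # Modified unit costs c_t = p_t + h_t + ... + h_{n-1}; c[t] holds c_{t+1}.
    c = [0] * n
    tail = 0
    for t in range(n - 1, -1, -1):
        if t < n - 1:
            tail += h[t]
        c[t] = p[t] + tail
    # cum[k] = d_1 + ... + d_k, so that d_{tk} = cum[k] - cum[t - 1].
    cum = [0] * (n + 1)
    for k in range(1, n + 1):
        cum[k] = cum[k - 1] + d[k - 1]

    H = [0] * (n + 1)     # H[k] = optimal value of (P_k), with H[0] = 0
    last = [0] * (n + 1)  # last[k] = last production period in (P_k)
    for k in range(1, n + 1):
        best_val, best_t = None, None
        for t in range(1, k + 1):
            val = H[t - 1] + f[t - 1] + c[t - 1] * (cum[k] - cum[t - 1])
            if best_val is None or val < best_val:
                best_val, best_t = val, t
        H[k], last[k] = best_val, best_t

    # Backtrack: the last production period t covers demands t..k.
    x = [0] * n
    k = n
    while k > 0:
        t = last[k]
        x[t - 1] = cum[k] - cum[t - 1]
        k = t - 1

    # Undo the additive constant dropped when s was eliminated.
    constant = sum(h[t - 1] * cum[t] for t in range(1, n))
    return H[n] - constant, x


# Hypothetical 3-period instance (not taken from the notes).
cost, plan = solve_uls(p=[1, 1, 1], f=[10, 3, 10], h=[1, 1], d=[5, 5, 5])
print(cost, plan)  # 33 [5, 10, 0]
```

The double loop over $k$ and $t$ reflects the $O(n^2)$ bound of Theorem 10.5; the prefix sums make each evaluation of $d_{tk}$ an $O(1)$ operation.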
knapsack problems can be solved via dynamic programming. Before we discuss the general case we consider the situation where x is a binary vector, that is, a single object is available of each type of item that can be packed into the knapsack.
Let thus $a \in \mathbb{N}^n$, $b \in \mathbb{N}$, and $c \in \mathbb{R}^n$ be given vectors. We shall see that the 0-1 knapsack problem

$$
\begin{aligned}
(P)\qquad \max_{x}\quad & \sum_{j=1}^{n} c_j x_j \\
\text{s.t.}\quad & \sum_{j=1}^{n} a_j x_j \le b, \\
& x \in \mathbb{B}^{n}
\end{aligned}
$$

can be solved in $O(nb)$ time. Note that it takes only $O(\log b)$ bits to represent $b$ in computer memory, so a running time of $O(nb)$ is actually exponentially large as a function of the input size. Nonetheless, for a moderate size of $b$, this approach is reasonable, as the complexity is only exponential in the size of $b$, and not in the size of $c_i$ or $a_i$.
To set up a recursive hierarchy of binary knapsack problems, consider

$$
\begin{aligned}
(P_r(\lambda))\qquad f_r(\lambda) = \max\quad & \sum_{j=1}^{r} c_j x_j \\
\text{s.t.}\quad & \sum_{j=1}^{r} a_j x_j \le \lambda, \\
& x \in \mathbb{B}^{r}.
\end{aligned}
$$

Then $z = f_n(b)$ gives us the optimal value of our original knapsack problem. Furthermore, the problems $(P_r(\lambda))$ are also binary knapsack problems but of smaller size. To arrive at a recursive relationship between the values $f_r(\lambda)$, we distinguish two cases:

$$f_r(\lambda) = f_{r-1}(\lambda).$$

How do we know in which case we are? Just compare the two function values we would get for $f_r(\lambda)$ and pick the one that produces the larger value! Thus, we arrive at the recursion

$$f_r(\lambda) = \max\bigl\{ f_{r-1}(\lambda),\ c_r + f_{r-1}(\lambda - a_r) \bigr\}.$$

To initialise the recursion, we set the obvious boundary values

$$f_r(0) = 0 \quad (r = 1, \dots, n), \qquad f_0(\lambda) = 0 \quad (\lambda = 0, \dots, b).$$
Algorithm 10.6. [Binary knapsack with integer coefficients]

Initialisation: Set $f_r(0) = 0$ $(r = 1, \dots, n)$, $f_0(\lambda) = 0$ $(\lambda = 0, \dots, b)$.

Forward Recursion: For $r = 1, \dots, n$, repeat
    $f_r(\lambda) = f_{r-1}(\lambda)$, $(\lambda = 0, \dots, a_r - 1)$,
    $f_r(\lambda) = \max\bigl\{ f_{r-1}(\lambda),\ c_r + f_{r-1}(\lambda - a_r) \bigr\}$, $(\lambda = a_r, \dots, b)$.
end.

Backward Recursion: Set $\lambda = b$, $r = n$, $x = 0$. While $r, \lambda > 0$, repeat
    if $\lambda \ge a_r$ and $f_r(\lambda) = c_r + f_{r-1}(\lambda - a_r)$
        $x_r = 1$, $\lambda = \lambda - a_r$
    end
    $r = r - 1$
end.
There are $O(nb)$ inner loops both for the forward and backward recursion, each costing $O(1)$ time. Therefore, the complexity of this algorithm is $O(nb)$. The computations of the backtracking procedure can be avoided altogether by keeping track of the indicator variables

$$
p_r(\lambda) =
\begin{cases}
0 & \text{if } f_r(\lambda) = f_{r-1}(\lambda), \\
1 & \text{if } f_r(\lambda) = c_r + f_{r-1}(\lambda - a_r)
\end{cases}
$$

during the forward iteration. This comes at the obvious expense of increasing the memory needs because the values of $p_r(\lambda)$ need to be stored.
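The forward and backward recursions of Algorithm 10.6 can be sketched in Python as follows. The function name `knapsack_01` and the table layout are choices made for this illustration; the weights $a_j$ are assumed to be positive integers. The backward sweep checks $\lambda \ge a_r$ before looking up $f_{r-1}(\lambda - a_r)$, so that the table is never indexed with a negative capacity.

```python
def knapsack_01(c, a, b):
    """0-1 knapsack by dynamic programming in O(n * b) time (sketch).

    c: values, a: positive integer weights, b: integer capacity.
    Returns (optimal value, x) with x a 0/1 list.
    """
    n = len(c)
    # f[r][lam] = best value using items 1..r with capacity lam;
    # row 0 and column 0 hold the boundary values 0.
    f = [[0] * (b + 1) for _ in range(n + 1)]
    for r in range(1, n + 1):
        cr, ar = c[r - 1], a[r - 1]
        for lam in range(b + 1):
            f[r][lam] = f[r - 1][lam]                 # item r left out
            if lam >= ar:                             # item r fits
                f[r][lam] = max(f[r][lam], cr + f[r - 1][lam - ar])

    # Backward recursion: recover one optimal packing.
    x = [0] * n
    lam = b
    for r in range(n, 0, -1):
        cr, ar = c[r - 1], a[r - 1]
        if lam >= ar and f[r][lam] == cr + f[r - 1][lam - ar]:
            x[r - 1] = 1
            lam -= ar
    return f[n][b], x


# Data of the form used in the notes' 0-1 example:
# max 10 x1 + 7 x2 + 25 x3 + 24 x4  s.t.  2 x1 + x2 + 6 x3 + 5 x4 <= 7.
print(knapsack_01([10, 7, 25, 24], [2, 1, 6, 5], 7))  # (34, [1, 0, 0, 1])
```

Storing the full table costs $O(nb)$ memory; keeping only the previous row would suffice for the optimal value alone, but the backward sweep needs all rows.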
Example 34. Let us apply Algorithm 10.6 to the 0-1 knapsack problem
$$
\begin{aligned}
\max\quad & 10 x_1 + 7 x_2 + 25 x_3 + 24 x_4 \\
\text{s.t.}\quad & 2 x_1 + x_2 + 6 x_3 + 5 x_4 \le 7, \\
& x \in \mathbb{B}^{4}.
\end{aligned}
$$
Clearly, our original knapsack problem is $(P_n(b))$, so that its optimal value is given by $g_n(b)$. Furthermore, the following boundary values are obvious,

$$g_r(0) = 0 \quad (r = 0, \dots, n), \qquad g_0(\lambda) = 0 \quad (\lambda = 0, \dots, b).$$
Note that at most $\lfloor \lambda / a_r \rfloor$ copies of item $r$ fit into a knapsack of volume $\lambda$. Using the Bellman principle, and distinguishing the cases

$$x_r^* = 0, \dots, \left\lfloor \frac{\lambda}{a_r} \right\rfloor,$$

we thus find the recursion formula

$$g_r(\lambda) = \max\left\{ t c_r + g_{r-1}(\lambda - t a_r) \;:\; t = 0, \dots, \left\lfloor \frac{\lambda}{a_r} \right\rfloor \right\}.$$

However, each of these iterations takes $O(\lfloor b / a_j \rfloor)$ to compute, leading to an algorithm with an overall complexity of

$$O\Bigl( n b \max_j \lfloor b / a_j \rfloor \Bigr) = O(n b^2).$$
We can do better than this: Observe that if $x_r^* = 0$, then the vector $(x_1^*, \dots, x_{r-1}^*)$ must be optimal for the problem $(P_{r-1}(\lambda))$, so that $g_r(\lambda) = g_{r-1}(\lambda)$. If on the other hand $x_r^* \ge 1$, then the vector $(x_1^*, \dots, x_{r-1}^*, x_r^* - 1)$ must be optimal for $(P_r(\lambda - a_r))$, so that $g_r(\lambda) = c_r + g_r(\lambda - a_r)$. Therefore, we can use the recursion

$$g_r(\lambda) = \max\bigl\{ g_{r-1}(\lambda),\ c_r + g_r(\lambda - a_r) \bigr\},$$

which takes $O(1)$ time to compute. We need to compute $O(nb)$ such recursive steps, so we obtain an $O(nb)$-time algorithm:
Algorithm 10.7. [Integer knapsack with integer coefficients]

Initialisation: Set $g_r(0) = 0$ $(r = 1, \dots, n)$; $g_0(\lambda) = 0$ $(\lambda = 0, \dots, b)$.

Forward recursion: for $r = 1, \dots, n$, repeat
    $g_r(\lambda) = g_{r-1}(\lambda)$, $(\lambda = 0, \dots, a_r - 1)$,
    $g_r(\lambda) = \max\bigl\{ g_{r-1}(\lambda),\ c_r + g_r(\lambda - a_r) \bigr\}$, $(\lambda = a_r, \dots, b)$.
end.

Backward Recursion: Set $\lambda = b$, $r = n$, $x = 0$; while $\lambda, r > 0$, repeat
    if $g_r(\lambda) = g_{r-1}(\lambda)$
        $r = r - 1$,
    elseif $g_r(\lambda) = c_r + g_r(\lambda - a_r)$
        $x_r = x_r + 1$, $\lambda = \lambda - a_r$,
    end.

For the purposes of the backward sweep, it is again convenient to collect the values of the following variables,

$$
p_r(\lambda) =
\begin{cases}
1 & \text{if } g_r(\lambda) = c_r + g_r(\lambda - a_r), \\
0 & \text{otherwise.}
\end{cases}
$$

To back-track, we have to start checking $p_n(b)$: If $p_n(b) = 1$ then $x_n^* \ge 1$ and we must check $p_n(b - a_n)$ to see whether $x_n^* \ge 2$, etc. If on the other hand $p_n(b) = 0$, then $x_n^* = 0$ and we must check $p_{n-1}(b)$ to see whether $x_{n-1}^* \ge 1$, etc.
Example 35. Consider the integer knapsack problem
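$$
\begin{aligned}
\max\quad & 7 x_1 + 9 x_2 + 2 x_3 + 15 x_4 \\
\text{s.t.}\quad & 3 x_1 + 4 x_2 + x_3 + 7 x_4 \le 10, \\
& x \in \mathbb{Z}^{4}_+.
\end{aligned}
$$

Applying Algorithm 10.7, we find the following table of values for $g_r(\lambda)$, $p_r(\lambda)$,

$$
\begin{array}{r|rrrr|rrrr}
\lambda & g_1 & g_2 & g_3 & g_4 & p_1 & p_2 & p_3 & p_4 \\
\hline
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 2 & 2 & 0 & 0 & 1 & 0 \\
2 & 0 & 0 & 4 & 4 & 0 & 0 & 1 & 0 \\
3 & 7 & 7 & 7 & 7 & 1 & 0 & 0 & 0 \\
4 & 7 & 9 & 9 & 9 & 1 & 1 & 0 & 0 \\
5 & 7 & 9 & 11 & 11 & 1 & 1 & 1 & 0 \\
6 & 14 & 14 & 14 & 14 & 1 & 0 & 0 & 0 \\
7 & 14 & 16 & 16 & 16 & 1 & 1 & 1 & 0 \\
8 & 14 & 18 & 18 & 18 & 1 & 1 & 0 & 0 \\
9 & 21 & 21 & 21 & 21 & 1 & 0 & 0 & 0 \\
10 & 21 & 23 & 23 & 23 & 1 & 1 & 0 & 0
\end{array}
$$

Back-tracking:

We have found that $x^* = (2, 1, 0, 0)$ is an optimal solution.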
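Algorithm 10.7 can be sketched in Python along the same lines. The function name `knapsack_integer` is an assumption of this illustration, and the weights are assumed to be positive integers so that the backward sweep terminates. The only change relative to the binary case is that the "take" branch looks up row $r$ again, $g_r(\lambda - a_r)$, which is what allows several copies of the same item.

```python
def knapsack_integer(c, a, b):
    """Integer (unbounded) knapsack in O(n * b) time (sketch).

    c: values, a: positive integer weights, b: integer capacity.
    Returns (optimal value, x, g) where x counts the copies of each item
    and g is the full table of values g[r][lam].
    """
    n = len(c)
    g = [[0] * (b + 1) for _ in range(n + 1)]
    for r in range(1, n + 1):
        cr, ar = c[r - 1], a[r - 1]
        for lam in range(b + 1):
            g[r][lam] = g[r - 1][lam]                     # no copy of item r
            if lam >= ar:                                 # one more copy of item r
                g[r][lam] = max(g[r][lam], cr + g[r][lam - ar])

    # Backward recursion, checking the "skip" branch first as in Algorithm 10.7.
    x = [0] * n
    lam, r = b, n
    while lam > 0 and r > 0:
        if g[r][lam] == g[r - 1][lam]:
            r -= 1
        else:
            # Here g[r][lam] = c_r + g[r][lam - a_r] necessarily holds.
            x[r - 1] += 1
            lam -= a[r - 1]
    return g[n][b], x, g


# Data of Example 35.
value, x, g = knapsack_integer([7, 9, 2, 15], [3, 4, 1, 7], 10)
print(value, x)  # 23 [2, 1, 0, 0]
```

When several packings are optimal, the order in which the two branches are tested decides which one is returned; testing the "skip" branch first reproduces the solution stated in Example 35.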
Knapsack problems with integer coefficients can also be reformulated as a longest (or shortest) path problem. Consider again the problem
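$$
\begin{aligned}
\max\quad & \sum_{j=1}^{n} c_j x_j \\
\text{s.t.}\quad & \sum_{j=1}^{n} a_j x_j \le b, \\
& x \in \mathbb{Z}^{n}_+
\end{aligned}
$$

with $b, a_j \in \mathbb{N}$ $(j = 1, \dots, n)$, and construct an acyclic digraph $D = (V, A)$ with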
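$$
\begin{aligned}
(IP)\qquad z = \max\quad & c^T x \\
\text{s.t.}\quad & Ax \le a, \\
& Dx \le d, \\
& x \in \mathbb{Z}^{n}_+,
\end{aligned}
$$

where $Ax \le a$ is a benign set of constraints (e.g., totally unimodular) in the sense that

$$
\begin{aligned}
\max\quad & c^T x \\
\text{s.t.}\quad & Ax \le a, \\
& x \in \mathbb{Z}^{n}_+
\end{aligned}
$$

would be easy to solve, and where $Dx \le d$ is a set of $m$ malicious constraints that render (IP) intractable (e.g., connectivity constraints in the TSP).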
For such problems we will now derive a family of relaxations that can generate stronger bounds than LP relaxations. Consequently, branch-and-bound systems built around these relaxations are usually more efficient than an LP based approach. We start by writing (IP) in the slightly more general form
$$
\begin{aligned}
(IP)\qquad z = \max\quad & c^T x \\
\text{s.t.}\quad & Dx \mathrel{@} d, \\
& x \in X,
\end{aligned}
$$

where $@$ can stand either for "$\le$" or "$=$", and where $X$ is a feasible set of "benign" type. Let us write $I$ for the set of indices that correspond to inequality constraints among the system $Dx \mathrel{@} d$, and $E$ for the set of indices that correspond to equality constraints.
Definition 11.1. A Lagrangian relaxation of (IP) is a problem of the form

$$(IP(u))\qquad z(u) = \max\{ c^T x + u^T (d - Dx) : x \in X \},$$

where $u \in \mathbb{R}^m$ is a fixed vector of Lagrange multipliers, and where $u_i \ge 0$ for $i \in I$.

Proposition 11.2. (IP(u)) is a relaxation of (IP).

Proof. Firstly, the feasible region of (IP(u)) contains that of (IP),

$$X \supseteq \{ x \in X : Dx \mathrel{@} d \}.$$

Secondly, for all $x$ feasible for (IP) the objective function of (IP(u)) is at least as large as that of (IP),

$$c^T x + u^T (d - Dx) = c^T x + \sum_{i \in I} u_i (d_i - D_{i,:} x) \ge c^T x,$$

where we have used that for feasible $x$, $d_i - D_{i,:} x \ge 0$ for all $i \in I$ and $d_i - D_{i,:} x = 0$ for all $i \in E$.
Obviously, there exist infinitely many Lagrangian relaxations (IP(u)) to choose from. So how should we choose the vector $u$? Since (IP(u)) is a relaxation of (IP), its optimal solution yields an upper bound $z(u)$ on the optimal objective value $z$ of (IP). It is therefore natural to seek the least upper bound of this kind. This leads to the Lagrangian dual problem

$$w_{LD} = \min\{ z(u) : u \in \mathbb{R}^m,\ u_i \ge 0\ \forall\, i \in I \}.$$

As with any relaxation technique, Lagrangian relaxation can sometimes yield an optimal solution to the original problem (IP):
Proposition 11.3. If $x(u)$ is optimal for (IP(u)) and satisfies $Dx(u) \mathrel{@} d$ and

$$(Dx(u))_i = d_i \quad \forall\, i \in I \text{ s.t. } u_i > 0, \tag{11.1}$$

then $x(u)$ is optimal for (IP).

Proof.

$$z \overset{\text{Prop. 11.2}}{\le} z(u) = c^T x(u) + u^T (d - Dx(u)) \overset{(11.1)}{=} c^T x(u) \le z.$$
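Propositions 11.2 and 11.3 can be checked numerically on a toy instance. The sketch below is an illustration under assumptions made here: the data `c`, `D`, `d` and the helper `lagrangian_value` are hypothetical, $X = \{0,1\}^3$ is small enough to enumerate, and the single constraint $Dx \le d$ is dualised with a multiplier $u \ge 0$. Enumeration stands in for the "easy" subproblem over $X$; it is not meant as a practical solution method.

```python
from itertools import product


def lagrangian_value(c, D, d, u, X):
    """Evaluate z(u) = max{c^T x + u^T (d - D x) : x in X} by enumeration."""
    best_val, best_x = None, None
    for x in X:
        slack = [d_i - sum(D_ij * x_j for D_ij, x_j in zip(row, x))
                 for row, d_i in zip(D, d)]
        val = (sum(c_j * x_j for c_j, x_j in zip(c, x))
               + sum(u_i * s_i for u_i, s_i in zip(u, slack)))
        if best_val is None or val > best_val:
            best_val, best_x = val, x
    return best_val, best_x


# Hypothetical instance: max 5 x1 + 4 x2 + 3 x3  s.t.  2 x1 + 3 x2 + x3 <= 3.
c = [5, 4, 3]
D = [[2, 3, 1]]
d = [3]
X = list(product([0, 1], repeat=3))

# Optimal value z of (IP), again by enumeration.
z = max(sum(c_j * x_j for c_j, x_j in zip(c, x))
        for x in X
        if all(sum(D_ij * x_j for D_ij, x_j in zip(row, x)) <= d_i
               for row, d_i in zip(D, d)))

for u in ([0], [1], [1.5]):
    z_u, x_u = lagrangian_value(c, D, d, u, X)
    print(u, z_u, x_u, z_u >= z)   # every z(u) is an upper bound on z
```

For $u = 1.5$ the maximiser $x(u) = (1, 0, 1)$ satisfies the dualised constraint with equality, so condition (11.1) holds and Proposition 11.3 certifies that it is optimal for (IP); indeed $z(1.5) = z = 8$ on this instance.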
11.1. Lagrangian Relaxation of the UFL. Lagrangian relaxation is a particularly powerful approach in the context of the uncapacitated facility location problem we encountered in Section 1,

$$
\begin{aligned}
(IP)\qquad z = \max\quad & \sum_{i \in M} \sum_{j \in N} c_{ij} x_{ij} - \sum_{j \in N} f_j y_j \\
\text{s.t.}\quad & \sum_{j \in N} x_{ij} = 1 && (i \in M), \\
& x_{ij} - y_j \le 0 && (i \in M,\ j \in N), \\
& x \in \mathbb{R}_+^{|M| \times |N|}, \quad y \in \mathbb{B}^{|N|},
\end{aligned}
$$

where $M$ is the set of customer locations, $N$ is the set of potential facility locations, $f_j$ are the fixed costs for opening facility $j$, and where we replaced the original servicing costs $c_{ij}$ with $-c_{ij}$ to turn the problem into a maximisation problem.
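"Dualising" the demand constraints $\sum_{j \in N} x_{ij} = 1$, we find the Lagrangian relaxation

$$
\begin{aligned}
(IP(u))\qquad z(u) = \max\quad & \sum_{i \in M} \sum_{j \in N} (c_{ij} - u_i) x_{ij} - \sum_{j \in N} f_j y_j + \sum_{i \in M} u_i \\
\text{s.t.}\quad & x_{ij} - y_j \le 0 \quad (i \in M,\ j \in N), \\
& x \in \mathbb{R}_+^{|M| \times |N|}, \quad y \in \mathbb{B}^{|N|}.
\end{aligned}
$$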
where $\delta(S, S^c)$ is the set of edges in $E$ with one end point in $S$ and the other in $S^c := V \setminus S$. Since $\delta(S, S^c) = \delta(S^c, S)$, we now have

$$|S| - \sum_{e \in E(S)} x_e = \frac{1}{2} \sum_{e \in \delta(S, S^c)} x_e = |S^c| - \sum_{e \in E(S^c)} x_e,$$

and hence,

$$\sum_{e \in E(S)} x_e \le |S| - 1 \quad\Leftrightarrow\quad \sum_{e \in E(S^c)} x_e \le |S^c| - 1.$$
Eliminating all subtour elimination constraints for which $1 \in S$, dualising all degree constraints

$$\sum_{e \in \delta(i)} x_e = 2$$

except for $i = 1$, and introducing the (redundant) constraint

$$\sum_{e \in E} x_e = n$$

obtained by summing all degree constraints, we arrive at the following Lagrangian relaxation of the STSP,

$$
\begin{aligned}
(IP(u))\qquad z(u) = \min\quad & \sum_{e = (ij) \in E} (c_e - u_i - u_j) x_e + 2 \sum_{i \in V} u_i \\
\text{s.t.}\quad & \sum_{e \in \delta(1)} x_e = 2, \\
& \sum_{e \in E(S)} x_e \le |S| - 1 \quad \forall\, S \subset V \text{ s.t. } 2 \le |S| \le |V| - 1,\ 1 \notin S, \\
& \sum_{e \in E} x_e = n, \\
& x \in \mathbb{B}^{|E|}.
\end{aligned}
$$
Definition 11.5. A 1-tree in an undirected graph G = (V, E) is a subgraph that consists of the union of two edges adjacent to node 1 and a spanning tree on the remaining nodes.
Lemma 11.6. $x \in \mathbb{B}^{|E|}$ is feasible for (IP(u)) if and only if the subgraph $G_x := (V, \{e \in E : x_e = 1\})$ is a 1-tree in $G = (V, E)$.

Proof. The constraint $\sum_{e \in \delta(1)} x_e = 2$ guarantees that exactly two edges are incident to node 1 in the subgraph $G_x := (V, \{e \in E : x_e = 1\})$. The constraints $\sum_{e \in E(S)} x_e \le |S| - 1$ guarantee that when node 1 is removed, then there is no cycle left in the subgraph $G_x$. The constraint $\sum_{e \in E} x_e = n$ guarantees that there are

$$n - 2 = |V \setminus \{1\}| - 1 \tag{11.2}$$

edges in $G_x \setminus \delta(1)$, and since $G_x \setminus \delta(1)$ does not contain any cycles, (11.2) shows that it must be a tree, and hence, $G_x$ is a 1-tree. On the other hand, it is straightforward to check that the incidence vector $x$ of a 1-tree satisfies the constraints.
Combining the insight of Lemma 11.6 with the results of Section 8, it follows that (IP(u)) can be solved via a greedy approach: Since (IP(u)) is a minimisation problem, build a minimum weight spanning tree on the nodes $\{2, \dots, n\}$ with respect to the edge weights $c_e - u_i - u_j$, and add the two cheapest edges incident to node 1.
Example 36. Let us now look at a numerical example and consider the STSP on 5 nodes with edge cost matrix
$[c_e] =$

Note that the Lagrange multipliers $u$ are unrestricted, as we have dualised equality constraints. Therefore,

$u =$

is a legitimate choice. Writing $\bar{c}_{ij} := c_{ij} - u_i - u_j$, we obtain the revised edge cost matrix

$[\bar{c}_e] =$

To build a minimum weight 1-tree, we

i) find a minimum weight spanning tree on the nodes $\{2, 3, 4, 5\}$,
ii) and add two edges of minimum weight incident to 1.
In an earlier section we have seen that the greedy algorithm solves part i): Order the edges according to increasing edge weight and add them in this order as long as they don't create a cycle. Our edges are ordered as follows,

$$(4, 5),\ (2, 3),\ (3, 4),\ (2, 4),\ (3, 5),\ (2, 5).$$

We choose $(4, 5), (2, 3), (3, 4)$ in this order and then stop because adding any of the other edges would create a cycle. Further, part ii) is trivial to solve, as $(1, 2)$ and $(1, 5)$ are the cheapest edges incident to node 1. Therefore, $\{(1, 2), (1, 5), (4, 5), (2, 3), (3, 4)\}$ is an optimal 1-tree. The corresponding incidence vector is actually STSP-feasible and corresponds to a Hamiltonian tour

$$1 \to 2 \to 3 \to 4 \to 5 \to 1.$$

By Proposition 11.3, it follows that this is an optimal STSP-tour.
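The two greedy steps can be sketched in Python as follows. Since the cost matrices of Example 36 are not reproduced above, the weights below are hypothetical; they are chosen only so that the edges come out in the same order as in the example. The function name `min_one_tree` and the dictionary layout are likewise assumptions of this illustration. Step i) is Kruskal's algorithm with a small union-find structure, step ii) picks the two cheapest edges at node 1.

```python
def min_one_tree(n, cost):
    """Minimum weight 1-tree on nodes 1..n (sketch).

    cost maps an edge (i, j) with i < j to its (revised) weight.
    Returns (list of edges, total weight).
    """
    parent = list(range(n + 1))         # union-find over the nodes

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # i) Kruskal on the nodes 2..n: scan edges by increasing weight and
    #    keep an edge whenever it joins two different components.
    edges = []
    for w, (i, j) in sorted((w, e) for e, w in cost.items() if 1 not in e):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.append((i, j))

    # ii) Add the two cheapest edges incident to node 1.
    at_one = sorted((w, e) for e, w in cost.items() if 1 in e)
    edges += [e for _, e in at_one[:2]]
    return edges, sum(cost[e] for e in edges)


# Hypothetical revised weights on 5 nodes (not the matrix of Example 36).
cost = {(4, 5): 1, (2, 3): 2, (3, 4): 3, (2, 4): 4, (3, 5): 5, (2, 5): 6,
        (1, 2): 1, (1, 5): 2, (1, 3): 7, (1, 4): 8}
edges, weight = min_one_tree(5, cost)
print(edges, weight)  # [(4, 5), (2, 3), (3, 4), (1, 2), (1, 5)] 9

# A 1-tree is a Hamiltonian tour exactly when every node has degree 2.
degree = {v: sum(v in e for e in edges) for v in range(1, 6)}
print(all(k == 2 for k in degree.values()))  # True
```

With these weights the 1-tree is the tour $1 \to 2 \to 3 \to 4 \to 5 \to 1$, mirroring the situation in Example 36 where the relaxation happens to return an STSP-feasible solution.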
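dual, we obtain

$$
\begin{aligned}
w_{LD} = \max_{\mu \in \mathbb{R}^T}\quad & \sum_{t=1}^{T} \mu_t \bigl( c^T x^{[t]} \bigr) \\
\text{s.t.}\quad & \sum_{t=1}^{T} \mu_t \bigl( D x^{[t]} - d \bigr) \le 0, \\
& \sum_{t=1}^{T} \mu_t = 1, \\
& \mu \ge 0.
\end{aligned}
$$

Since

$$\left\{ \sum_{t=1}^{T} \mu_t x^{[t]} : \mu \ge 0,\ \sum_{t=1}^{T} \mu_t = 1 \right\} = \operatorname{conv}\bigl( \{ x^{[1]}, \dots, x^{[T]} \} \bigr)$$

is the convex hull of $X$, we thus arrive at the following result: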
Theorem 11.7.

$$
\begin{aligned}
w_{LD} = \max\quad & c^T x \\
\text{s.t.}\quad & Dx \le d, \\
& x \in \operatorname{conv}(X).
\end{aligned}
$$

We remark that this result still holds true for arbitrary feasible sets of the form

$$X = \{ x \in \mathbb{Z}^n_+ : Ax \le b \},$$

not just in the finite case.

Corollary 11.8. i) If $\{ x \in \mathbb{R}^n_+ : Ax \le b \}$ is an ideal formulation of $X$, then

$$\operatorname{conv}(X) = \{ x \in \mathbb{R}^n : Ax \le b,\ x \ge 0 \},$$

and hence, $w_{LD}$ coincides with the bound produced by the LP relaxation of (IP),

$$w_{LD} = \max\{ c^T x : Dx \le d,\ Ax \le b,\ x \ge 0 \}.$$

ii) If $\{ x \in \mathbb{R}^n_+ : Ax \le b \}$ is not an ideal formulation of $X$ then

$$\operatorname{conv}(X) \subset \{ x \in \mathbb{R}^n_+ : Ax \le b \}$$

and $w_{LD}$ is generally a tighter bound than the one given by the LP relaxation of (IP).
Let us remark that in situation ii) one is rewarded with a stronger bound, but this comes at a price since the fact that $\{ x \in \mathbb{R}^n_+ : Ax \le b \}$ is not an ideal formulation of $X$ means that (IP(u)) may not be that easy to solve. Situation i) is not rare, because $\max\{ c^T x : x \in X \}$ is an easy problem mainly when $\{ x \in \mathbb{R}^n_+ : Ax \le b \}$ is an ideal formulation of $X$. Solving the Lagrangian dual is nonetheless interesting in this case, as solving the LP relaxation directly may be too costly. For example, in Section 11 we found that a Lagrangian relaxation of the STSP is given by

$$
\begin{aligned}
(IP(u))\qquad z(u) = \min\quad & \sum_{e = (ij) \in E} (c_e - u_i - u_j) x_e + 2 \sum_{i \in V} u_i \\
\text{s.t.}\quad & \sum_{e \in \delta(1)} x_e = 2, \\
& \sum_{e \in E(S)} x_e \le |S| - 1 \quad \forall\, S \subset V \text{ s.t. } 2 \le |S| \le |V| - 1,\ 1 \notin S, \\
& \sum_{e \in E} x_e = n, \\
& x \in \mathbb{B}^{|E|}.
\end{aligned}
$$

Solving the LP relaxation directly would be difficult, as it contains an exponential number of constraints. However, solving (IP(u)) via the greedy algorithm is simple, and we will next see that when (IP(u)) can be solved efficiently, then the Lagrangian dual can in general also be solved reasonably efficiently using the subgradient algorithm.
The above discussion revealed that $z(u)$ is a piecewise linear function

$$z(u) = \max_{t = 1, \dots, T} \bigl\{ c^T x^{[t]} + u^T (d - D x^{[t]}) \bigr\}. \tag{11.3}$$

Furthermore, this function is convex, since the linear functions $u \mapsto c^T x^{[t]} + u^T (d - D x^{[t]})$ are convex and the pointwise maximum of a set of convex functions is again convex. Note however that $z(u)$ is not differentiable at the "breakpoints" where the maximum in (11.3) is achieved by more than one index, and that to solve (LD) we need to minimise $z(u)$ over $u \ge 0$. Convex nondifferentiable functions can often be minimised reasonably well by the subgradient algorithm. Before we can explain the algorithm, we need to understand the notion of a subgradient.
Lemma 11.9. Let $f : \mathbb{R}^m \to \mathbb{R}$ be a convex function with gradient $\gamma = \nabla f(u)$ at $u \in \mathbb{R}^m$. Then the first order Taylor approximation satisfies

$$f(u) + \gamma^T (v - u) \le f(v)$$

for all $v \in \mathbb{R}^m$.

Proof. By definition, $f$ is convex if for all $u, v \in \mathbb{R}^m$ and $\lambda \in [0, 1]$,

$$f(\lambda v + (1 - \lambda) u) \le \lambda f(v) + (1 - \lambda) f(u).$$

Therefore,

$$f(u) + \frac{f(u + \lambda (v - u)) - f(u)}{\lambda} \le f(v),$$