Primal Dual, Lecture Notes - Mathematics -, Study notes of Mathematical Methods

Primal-Dual path-following for linear programming

Typology: Study notes

2010/2011

Uploaded on 09/09/2011

luber-1
luber-1 🇬🇧

4.8

(12)

293 documents

1 / 10

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
C12.1B: CONTINUOUS OPTIMISATION
LECTURE 16: PRIMAL-DUAL PATH-FOLLOWING FOR LINEAR
PROGRAMMING
RAPHAEL HAUSER
MATHEMATICAL INSTITUTE, UNIVERSITY OF OXFORD
1. Interior-Point Methods for Linear Programming. The barrier method
we studied in Lecture 15 is an example of a so-called interior-point method: all inter-
mediate solutions are in the interior of the admissible domain. Although originally
devised for nonlinear programming, it was discovered in the mid 1980-ies that such
algorithms can also solve linear programming problems very efficiently.
Crucially, these algorithms can be designed so that they run in polynomial time,
that is, an upper bound on the number of bit operations performed by the algorithm
until completion can be given as a polynomial in the bit-length of the input data
(A, b, c) of a LP instance
(P) min
xRncTx
s.t. Ax =b,
x0.
The simplex algorithm on the other hand is known to take a number of bit oper-
ations that is exponential in the bit-length of (A, b, c) on certain LP instances. The
emergence of polynomial-time nterior-point methods for LP has therefore been hailed
as a great success, and it has sparked a huge research effort into improving these
methods and extending them to other convex problem classes.
Today, interior-point methods are often the best approach to solving very large
scale LP problems, and they are the method of choice for other important classes
of convex programming. We cannot enter the discussion of methods that underlie
industrial-strength software here, but we will use this lecture to outline the design
and analysis of the simplest case of such an algorithm.
2. Perturbations of LP problems. Recall the primal and dual linear pro-
gramming problems in standard form:
(P) min
xRncTx
s.t. Ax =b,
x0.
(D) max
yRmbTy
s.t. ATy+s=c,
s0.
For the purposes of the developing a lean theory, we will make the following reg-
ularity assumptions:
Definition 2.1. We say that (P) and (D) satisfy the standard LP regularity
assumption if the following conditions are met:
1
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Primal Dual, Lecture Notes - Mathematics - and more Study notes Mathematical Methods in PDF only on Docsity!

C12.1B: CONTINUOUS OPTIMISATION

LECTURE 16: PRIMAL-DUAL PATH-FOLLOWING FOR LINEAR

PROGRAMMING

RAPHAEL HAUSER MATHEMATICAL INSTITUTE, UNIVERSITY OF OXFORD

  1. Interior-Point Methods for Linear Programming. The barrier method we studied in Lecture 15 is an example of a so-called interior-point method: all inter- mediate solutions are in the interior of the admissible domain. Although originally devised for nonlinear programming, it was discovered in the mid 1980-ies that such algorithms can also solve linear programming problems very efficiently. Crucially, these algorithms can be designed so that they run in polynomial time, that is, an upper bound on the number of bit operations performed by the algorithm until completion can be given as a polynomial in the bit-length of the input data (A, b, c) of a LP instance

(P) min x∈Rn^

cTx

s.t. Ax = b, x ≥ 0.

The simplex algorithm on the other hand is known to take a number of bit oper- ations that is exponential in the bit-length of (A, b, c) on certain LP instances. The emergence of polynomial-time nterior-point methods for LP has therefore been hailed as a great success, and it has sparked a huge research effort into improving these methods and extending them to other convex problem classes. Today, interior-point methods are often the best approach to solving very large scale LP problems, and they are the method of choice for other important classes of convex programming. We cannot enter the discussion of methods that underlie industrial-strength software here, but we will use this lecture to outline the design and analysis of the simplest case of such an algorithm.

  1. Perturbations of LP problems. Recall the primal and dual linear pro- gramming problems in standard form:

(P) min x∈Rn cTx

s.t. Ax = b, x ≥ 0.

(D) max y∈Rm

bTy

s.t. ATy + s = c, s ≥ 0.

For the purposes of the developing a lean theory, we will make the following reg- ularity assumptions:

Definition 2.1. We say that (P) and (D) satisfy the standard LP regularity assumption if the following conditions are met:

i) A has linearly independent row vectors, that is, rank(A) = m, ii) (P) is strictly feasible, that is, there exists a point x ∈ Rn^ such that Ax = b and x > 0 componentwise, iii) (D) is strictly feasible, that is, there exist points (y, s) ∈ Rm^ × Rn^ such that ATy + s = c and s > 0 componentwise.

Note that these regularity assumptions are nothing else but the Slater constraint qualification both for (P) and (D). Whenever the standard LP regularity assumption does not hold, we can preprocess the input data A, b, c of the problems (P),(D) and obtain equivalent problems (P’),(D’) that satisfy the assumption. Thus, there is no loss of generality in assuming that (P) and (D) are regular. The following notation will subsequently be used for the primal, dual and primal- dual feasible domains:

FP = {x : Ax = b, x ≥ 0 }, FD = {(y, s) : ATy + s = c, s ≥ 0 }, F P◦ = {x : Ax = b, x > 0 }, F D◦ = {(y, s) : ATy + s = c, s > 0 }, F◦=F P◦ × F D◦. For μ > 0 we consider the following perturbations of (P) and (D):

(P)μ min x∈Rn cTx + μf (x)

s.t. Ax = b x > 0.

(D)μ max y∈Rm bTy − μf (s)

s.t. ATy + s = c, s > 0.

In both problems

f : Rn ++ → R

x 7 → −

∑^ n

j=

log(xj )

is the logarithmic barrier function. In Problem Set 6 we studied the duality/optimality theory of problems (P)μ and (D)μ and found the following result:

Theorem 2.2. Let (P),(D) satisfy the standard LP regularity assumption. Then x(μ) ∈ Rn^ and

y(μ), s(μ)

∈ Rm^ × Rn^ are optimal for (P)μ and (D)μ respectively if and only if the following system holds true:

ATy + s = c Ax = b XSe = μe x, s > 0 ,

Now note that since ˆx, ˆs > 0, there exist vectors l, u > 0 such that

ˆsTx + μf (x) ≤ sˆT^ xˆ + μf (ˆx) ⇒ l ≤ x ≤ u.

Therefore, (P)μ is equivalent to

(P’)μ min ˆsTx + μf (x) s.t. Ax = b, l ≤ x ≤ u.

Since {x : Ax = b, l ≤ x ≤ u} is a compact subset of Rn ++ and ˆsTx + μf (x) is contin- uous, the minimum of (P’)μ is attained.

Definition 3.2. For μ > 0 let us write

x(μ), y(μ), s(μ)

for the unique solution of the central path equations (2.1). Then the set {x(μ) : μ > 0 } is called the primal central path, {(y(μ), s(μ) : μ > 0 } is the dual central path, and {(x(μ), y(μ), s(μ)) : μ > 0 } is the primal-dual central path.

The central paths are smooth curves leading to a primal-dual optimal pair of solutions for the original problems (P),(D):

Theorem 3.3. The map

μ 7 →

x(μ), y(μ), s(μ)

is continuously differentiable. Furthermore, there exist x∗^ and (y∗, s∗) which are op- timal solutions to (P) and (D) respectively such that

lim μ↓ 0

x(μ), y(μ), s(μ)

= (x∗, y∗, s∗).

  1. A primal-dual interior-point algorithm. The critical step of our primal- dual interior-point algorithm will be the following: given an approximate solution (x, y, s) to the central path equations (2.1), find a better approximation. An obvious approach is to apply Newton’s method to find a zero of the map  

x y s

ATy + s − c Ax − b XSe − μe

that is, we solve the system  

0 AT^ I

A 0 0

S 0 X

∆x ∆y ∆s

ATy + s − c Ax − b XSe − μe

for (∆x, ∆y, ∆s) and set  

x+ y+ s+

x y s

∆x ∆y ∆s

Note that we have neglected the positivity constraints x, s > 0 of the central path equations. We could enforce these by taking a damped Newton step α(∆x, ∆y, ∆s)T rather than the full step (∆x, ∆y, ∆s)T^ when this takes (¯x+, y¯+, ¯s+) outside of the domain x, s > 0, just as we did in our barrier methods for nonlinear programming. However, a nice feature of our algorithm will be that this issue is dealt with automat- ically through the notion of centrality developed below. We will now describe and analyse an algorithm that iterates over points that satisfy the constraints

ATy + s = c Ax = b x, s > 0

but not necessarily the equation XSe = μe. This requires a starting point (x, y, s) that satisfies (4.2). This issue can be dealt with via a phase I type auxiliary problem. Thus, we may simply assume that such a point is available.

4.1. Centrality and the Duality Gap. In order to be able to assure that the iterates of our algorithm stay well inside the domain x, s > 0, we need a measure of centrality, or of “nearness” to the central path.

Definition 4.1. For all ω = (x, y, s) ∈ F◦^ =

(x, y, s) : ATy + s = c, Ax = b, x, s > 0

we define

μ(ω) :=

∑n j=1 xj^ sj n

Recall that LP duality showed that any feasible solution of (P) yields an upper bound on the optimal solution of (D), and any feasible solution of (D) yields a lower bound on the optimal solution of (P).

Definition 4.2. Let x and (y, s) be primal and dual feasible points. The duality gap associated with these solutions is defined as cTx − bTy.

Strong LP duality shows that the duality gap becomes zero at a primal-dual optimal point ω∗^ = (x∗, y∗, s∗). The number μ(ω) is useful in monitoring the progress of an algorithm because it is proportional to the duality gap: if ω = (x, y, s) is primal- dual feasible, then

cTx − bTy = xT(c − ATy) = xTs = nμ(ω).

It is thus reasonable to fix a number σ ∈ (0, 1) and to set μ = σμ(ω) in the system (4.1). That is to say, we are aiming to reduce the duality gap by a constant factor in each iteration. Another interesting observation is that ω = (x, y, s) ∈ F◦^ lies on the primal-dual central path if and only if XSe = μ(ω)e. This can be used to define a neighbourhood of the central path:

Definition 4.3. For θ ∈ (0, 1), let

N 2 (θ) :=

ω = (x, y, s) ∈ F◦^ : ‖XSe − μ(ω)e‖ 2 ≤ θμ(ω)

ω 0

ω 1

ω∗

ωμ

F◦

N 2 (θ)

Fig. 4.2. Choosing σ too small and aiming at a point ωμ too far away makes the update leave the narrow neighbourhood N 2 (θ).

Algorithm 4.4 (SPF). S0 Choose θ, δ ∈ (0, 1) be such that

θ^2 + δ^2 23 /^2 (1 − θ)

δ √ n

θ.

Set σ := 1 − √δn and choose ω 0 = (x 0 , y 0 , s 0 ) ∈ N 2 (θ). S1 For k = 0, 1 ,... repeat solve (4.3) with ω = ωk for ∆ω := (∆x, ∆y, ∆s) compute ωk+1 = ωk + ∆ω end

  1. Convergence Analysis of Algorithm SPF. Let us now analyse the con- vergence of the primal-dual short-step path-following method SPF. Our main theorem shows that θ and σ were chosen so that the iterates never leave the narrow neighbour- hood N 2 (θ) of the central path, and that the duality gap shrinks at a geometric rate:

Theorem 5.1. The sequence (ωk)N generated by Algorithm SPF satisfies ωk ∈ N 2 (θ) for all k ∈ N, and

μ(ωk) =

δ √ n

)k μ(ω 0 ).

An immediate consequence of Theorem 5.1 is that it takes only logarithmically many iterations to reduce the duality gap below a desired threshold ǫ > 0:

Corollary 5.2. After at most k = O

n log n×μ ǫ(ω^0 )

iterations Algorithm SPF produces a point ωk = (xk, yk, sk) ∈ F◦^ such that

cTxk − bTyk ≤ ǫ.

Theorem 5.1 readily follows from the following result:

Lemma 5.3. Let ω = (x, y, s) ∈ N 2 (θ) and let ω+ = (x+, y+, s+) = ω + ∆ω, where ∆ω = (∆x, ∆y, ∆s) solves the system  

0 AT^ I

A 0 0

S 0 X

∆x ∆y ∆s

XSe − σμ(ω)e

with σ = 1 − √δn and θ, δ chosen as in the initialisation step of Algorithm SPF. Then

ω+ ∈ N 2 (θ) and μ(ω+) = σμ(ω).

Proof. Let μ+ =

1 − √δn

μ(ω). We claim that the following three relations hold true:

μ+ =

eTX+S+e n

‖X+S+e − μ+e‖ ≤ θμ+, (5.3) x+, s+ > 0. (5.4)

Clearly, these relations imply that the lemma holds true.

We will establish the validity of Claims (5.2), (5.3), and (5.4) separately after proving the following two technical lemmas:

Lemma 5.4. ∆xT∆s = 0.

Proof. This follows readily from the first two blocks of equations in (5.1).

Lemma 5.5. Let u, v ∈ Rn^ be such that uTv ≥ 0 and let U = Diag(u), V = Diag(v). Then

‖U V e‖ ≤

‖u + v‖^2 23 /^2

Proof. First note that for all α, β ∈ R such that αβ ≥ 0 it is the case that

|αβ| ≤ (α + β)^2 4

Consider the index sets I := {j : uj vj ≥ 0 } and J := {j : uj vj < 0 }. Using this notation we can write uTv =

j∈I |uj^ vj^ | −^

j∈J |uj^ vj^ | ≥^ 0. Therefore,

‖U V e‖^2 =

j∈I

(uj vj )^2 +

j∈J

(uj vj )^2 = ‖uvI ‖^22 + ‖uvJ ‖^22

≤ ‖uvI ‖^21 + ‖uvJ ‖^21 ≤ 2 ‖uvI ‖^21 ,

where uvI is a vector of length |I| with components {uj vj : j ∈ I} and similarly for uvJ , and where we used the well-known fact that ‖ · ‖ 2 ≤ ‖ · ‖ 1. Finally, this proves that

‖U V e‖ ≤

2 ‖uvI ‖ 1

(5.5) ≤

j∈I

(uj + vj )^2 ≤

‖u + v‖^2 23 /^2

shows that XSe − μe ⊥ e.

Lemma 5.8. Claim (5.4) is true.

Proof. We begin with the case θ ≤ 1 /2 which is easier to understand. Relation (5.3) shows that (x+)j (s+)j ≥ (1 − θ)μ+ > 0 for all j. So, if (x+)j < 0 for some j then (s+)j is negative too, and then

∆xj ∆sj ≥ (x+)j (s+)j > (1 − θ)μ+. (5.12)

On the other hand,

∆xj ∆sj ≤ ‖∆X∆Se‖

(5.3),(5.7) ≤ θμ+. (5.13)

The combination of (5.12) and (5.13) yields

(1 − θ)μ+ ≤ ∆xj ∆sj < θμ+

which implies the contradiction 2θ > 1 and proves the claim. If we do not assume θ ≤ 1 /2, we can proceed as follows: let

ω(α) :=

x(α), y(α), s(α)

:= (x, y, s) + α(∆x, ∆y, ∆s).

Proceeding as in the proof of (5.2) and (5.3), one can establish that for all α ∈ [0, 1],

μ

ω(α)

eTX(α)S(α)e n

αδ √ n

μ(ω),

and ‖X(α)S(α)e − μ(ω(α))e‖ ≤ θμ(ω(α)). Thus, xj (α)sj (α) ≥ (1 − θ)μ(ω(α)) for all α ∈ [0, 1]. This implies xj (α), sj (α) > 0 for all α ∈ [0, 1] by continuity of x(α), s(α) and by the fact that (1 − θ)μ(ω(α)) > 0. In particular, x+ = x(1) > 0 and s+ = s(1) > 0, which proves the claim.