






Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Primal-Dual path-following for linear programming
Typology: Study notes
1 / 10
This page cannot be seen from the preview
Don't miss anything!







RAPHAEL HAUSER MATHEMATICAL INSTITUTE, UNIVERSITY OF OXFORD
(P) min x∈Rn^
cTx
s.t. Ax = b, x ≥ 0.
The simplex algorithm on the other hand is known to take a number of bit oper- ations that is exponential in the bit-length of (A, b, c) on certain LP instances. The emergence of polynomial-time nterior-point methods for LP has therefore been hailed as a great success, and it has sparked a huge research effort into improving these methods and extending them to other convex problem classes. Today, interior-point methods are often the best approach to solving very large scale LP problems, and they are the method of choice for other important classes of convex programming. We cannot enter the discussion of methods that underlie industrial-strength software here, but we will use this lecture to outline the design and analysis of the simplest case of such an algorithm.
(P) min x∈Rn cTx
s.t. Ax = b, x ≥ 0.
(D) max y∈Rm
bTy
s.t. ATy + s = c, s ≥ 0.
For the purposes of the developing a lean theory, we will make the following reg- ularity assumptions:
Definition 2.1. We say that (P) and (D) satisfy the standard LP regularity assumption if the following conditions are met:
i) A has linearly independent row vectors, that is, rank(A) = m, ii) (P) is strictly feasible, that is, there exists a point x ∈ Rn^ such that Ax = b and x > 0 componentwise, iii) (D) is strictly feasible, that is, there exist points (y, s) ∈ Rm^ × Rn^ such that ATy + s = c and s > 0 componentwise.
Note that these regularity assumptions are nothing else but the Slater constraint qualification both for (P) and (D). Whenever the standard LP regularity assumption does not hold, we can preprocess the input data A, b, c of the problems (P),(D) and obtain equivalent problems (P’),(D’) that satisfy the assumption. Thus, there is no loss of generality in assuming that (P) and (D) are regular. The following notation will subsequently be used for the primal, dual and primal- dual feasible domains:
FP = {x : Ax = b, x ≥ 0 }, FD = {(y, s) : ATy + s = c, s ≥ 0 }, F P◦ = {x : Ax = b, x > 0 }, F D◦ = {(y, s) : ATy + s = c, s > 0 }, F◦=F P◦ × F D◦. For μ > 0 we consider the following perturbations of (P) and (D):
(P)μ min x∈Rn cTx + μf (x)
s.t. Ax = b x > 0.
(D)μ max y∈Rm bTy − μf (s)
s.t. ATy + s = c, s > 0.
In both problems
f : Rn ++ → R
x 7 → −
∑^ n
j=
log(xj )
is the logarithmic barrier function. In Problem Set 6 we studied the duality/optimality theory of problems (P)μ and (D)μ and found the following result:
Theorem 2.2. Let (P),(D) satisfy the standard LP regularity assumption. Then x(μ) ∈ Rn^ and
y(μ), s(μ)
∈ Rm^ × Rn^ are optimal for (P)μ and (D)μ respectively if and only if the following system holds true:
ATy + s = c Ax = b XSe = μe x, s > 0 ,
Now note that since ˆx, ˆs > 0, there exist vectors l, u > 0 such that
ˆsTx + μf (x) ≤ sˆT^ xˆ + μf (ˆx) ⇒ l ≤ x ≤ u.
Therefore, (P)μ is equivalent to
(P’)μ min ˆsTx + μf (x) s.t. Ax = b, l ≤ x ≤ u.
Since {x : Ax = b, l ≤ x ≤ u} is a compact subset of Rn ++ and ˆsTx + μf (x) is contin- uous, the minimum of (P’)μ is attained.
Definition 3.2. For μ > 0 let us write
x(μ), y(μ), s(μ)
for the unique solution of the central path equations (2.1). Then the set {x(μ) : μ > 0 } is called the primal central path, {(y(μ), s(μ) : μ > 0 } is the dual central path, and {(x(μ), y(μ), s(μ)) : μ > 0 } is the primal-dual central path.
The central paths are smooth curves leading to a primal-dual optimal pair of solutions for the original problems (P),(D):
Theorem 3.3. The map
μ 7 →
x(μ), y(μ), s(μ)
is continuously differentiable. Furthermore, there exist x∗^ and (y∗, s∗) which are op- timal solutions to (P) and (D) respectively such that
lim μ↓ 0
x(μ), y(μ), s(μ)
= (x∗, y∗, s∗).
x y s
ATy + s − c Ax − b XSe − μe
that is, we solve the system
∆x ∆y ∆s
ATy + s − c Ax − b XSe − μe
for (∆x, ∆y, ∆s) and set
x+ y+ s+
x y s
∆x ∆y ∆s
Note that we have neglected the positivity constraints x, s > 0 of the central path equations. We could enforce these by taking a damped Newton step α(∆x, ∆y, ∆s)T rather than the full step (∆x, ∆y, ∆s)T^ when this takes (¯x+, y¯+, ¯s+) outside of the domain x, s > 0, just as we did in our barrier methods for nonlinear programming. However, a nice feature of our algorithm will be that this issue is dealt with automat- ically through the notion of centrality developed below. We will now describe and analyse an algorithm that iterates over points that satisfy the constraints
ATy + s = c Ax = b x, s > 0
but not necessarily the equation XSe = μe. This requires a starting point (x, y, s) that satisfies (4.2). This issue can be dealt with via a phase I type auxiliary problem. Thus, we may simply assume that such a point is available.
4.1. Centrality and the Duality Gap. In order to be able to assure that the iterates of our algorithm stay well inside the domain x, s > 0, we need a measure of centrality, or of “nearness” to the central path.
Definition 4.1. For all ω = (x, y, s) ∈ F◦^ =
(x, y, s) : ATy + s = c, Ax = b, x, s > 0
we define
μ(ω) :=
∑n j=1 xj^ sj n
Recall that LP duality showed that any feasible solution of (P) yields an upper bound on the optimal solution of (D), and any feasible solution of (D) yields a lower bound on the optimal solution of (P).
Definition 4.2. Let x and (y, s) be primal and dual feasible points. The duality gap associated with these solutions is defined as cTx − bTy.
Strong LP duality shows that the duality gap becomes zero at a primal-dual optimal point ω∗^ = (x∗, y∗, s∗). The number μ(ω) is useful in monitoring the progress of an algorithm because it is proportional to the duality gap: if ω = (x, y, s) is primal- dual feasible, then
cTx − bTy = xT(c − ATy) = xTs = nμ(ω).
It is thus reasonable to fix a number σ ∈ (0, 1) and to set μ = σμ(ω) in the system (4.1). That is to say, we are aiming to reduce the duality gap by a constant factor in each iteration. Another interesting observation is that ω = (x, y, s) ∈ F◦^ lies on the primal-dual central path if and only if XSe = μ(ω)e. This can be used to define a neighbourhood of the central path:
Definition 4.3. For θ ∈ (0, 1), let
N 2 (θ) :=
ω = (x, y, s) ∈ F◦^ : ‖XSe − μ(ω)e‖ 2 ≤ θμ(ω)
ω 0
ω 1
ω∗
ωμ
F◦
N 2 (θ)
Fig. 4.2. Choosing σ too small and aiming at a point ωμ too far away makes the update leave the narrow neighbourhood N 2 (θ).
Algorithm 4.4 (SPF). S0 Choose θ, δ ∈ (0, 1) be such that
θ^2 + δ^2 23 /^2 (1 − θ)
δ √ n
θ.
Set σ := 1 − √δn and choose ω 0 = (x 0 , y 0 , s 0 ) ∈ N 2 (θ). S1 For k = 0, 1 ,... repeat solve (4.3) with ω = ωk for ∆ω := (∆x, ∆y, ∆s) compute ωk+1 = ωk + ∆ω end
Theorem 5.1. The sequence (ωk)N generated by Algorithm SPF satisfies ωk ∈ N 2 (θ) for all k ∈ N, and
μ(ωk) =
δ √ n
)k μ(ω 0 ).
An immediate consequence of Theorem 5.1 is that it takes only logarithmically many iterations to reduce the duality gap below a desired threshold ǫ > 0:
Corollary 5.2. After at most k = O
n log n×μ ǫ(ω^0 )
iterations Algorithm SPF produces a point ωk = (xk, yk, sk) ∈ F◦^ such that
cTxk − bTyk ≤ ǫ.
Theorem 5.1 readily follows from the following result:
Lemma 5.3. Let ω = (x, y, s) ∈ N 2 (θ) and let ω+ = (x+, y+, s+) = ω + ∆ω, where ∆ω = (∆x, ∆y, ∆s) solves the system
∆x ∆y ∆s
XSe − σμ(ω)e
with σ = 1 − √δn and θ, δ chosen as in the initialisation step of Algorithm SPF. Then
ω+ ∈ N 2 (θ) and μ(ω+) = σμ(ω).
Proof. Let μ+ =
1 − √δn
μ(ω). We claim that the following three relations hold true:
μ+ =
eTX+S+e n
‖X+S+e − μ+e‖ ≤ θμ+, (5.3) x+, s+ > 0. (5.4)
Clearly, these relations imply that the lemma holds true.
We will establish the validity of Claims (5.2), (5.3), and (5.4) separately after proving the following two technical lemmas:
Lemma 5.4. ∆xT∆s = 0.
Proof. This follows readily from the first two blocks of equations in (5.1).
Lemma 5.5. Let u, v ∈ Rn^ be such that uTv ≥ 0 and let U = Diag(u), V = Diag(v). Then
‖U V e‖ ≤
‖u + v‖^2 23 /^2
Proof. First note that for all α, β ∈ R such that αβ ≥ 0 it is the case that
|αβ| ≤ (α + β)^2 4
Consider the index sets I := {j : uj vj ≥ 0 } and J := {j : uj vj < 0 }. Using this notation we can write uTv =
j∈I |uj^ vj^ | −^
j∈J |uj^ vj^ | ≥^ 0. Therefore,
‖U V e‖^2 =
j∈I
(uj vj )^2 +
j∈J
(uj vj )^2 = ‖uvI ‖^22 + ‖uvJ ‖^22
≤ ‖uvI ‖^21 + ‖uvJ ‖^21 ≤ 2 ‖uvI ‖^21 ,
where uvI is a vector of length |I| with components {uj vj : j ∈ I} and similarly for uvJ , and where we used the well-known fact that ‖ · ‖ 2 ≤ ‖ · ‖ 1. Finally, this proves that
‖U V e‖ ≤
2 ‖uvI ‖ 1
(5.5) ≤
j∈I
(uj + vj )^2 ≤
‖u + v‖^2 23 /^2
shows that XSe − μe ⊥ e.
Lemma 5.8. Claim (5.4) is true.
Proof. We begin with the case θ ≤ 1 /2 which is easier to understand. Relation (5.3) shows that (x+)j (s+)j ≥ (1 − θ)μ+ > 0 for all j. So, if (x+)j < 0 for some j then (s+)j is negative too, and then
∆xj ∆sj ≥ (x+)j (s+)j > (1 − θ)μ+. (5.12)
On the other hand,
∆xj ∆sj ≤ ‖∆X∆Se‖
(5.3),(5.7) ≤ θμ+. (5.13)
The combination of (5.12) and (5.13) yields
(1 − θ)μ+ ≤ ∆xj ∆sj < θμ+
which implies the contradiction 2θ > 1 and proves the claim. If we do not assume θ ≤ 1 /2, we can proceed as follows: let
ω(α) :=
x(α), y(α), s(α)
:= (x, y, s) + α(∆x, ∆y, ∆s).
Proceeding as in the proof of (5.2) and (5.3), one can establish that for all α ∈ [0, 1],
μ
ω(α)
eTX(α)S(α)e n
αδ √ n
μ(ω),
and ‖X(α)S(α)e − μ(ω(α))e‖ ≤ θμ(ω(α)). Thus, xj (α)sj (α) ≥ (1 − θ)μ(ω(α)) for all α ∈ [0, 1]. This implies xj (α), sj (α) > 0 for all α ∈ [0, 1] by continuity of x(α), s(α) and by the fact that (1 − θ)μ(ω(α)) > 0. In particular, x+ = x(1) > 0 and s+ = s(1) > 0, which proves the claim.