









An in-depth analysis of the analytic center cutting-plane method (ACCPM) for non-differentiable convex optimization. It explains the algorithm, proves its convergence, and presents numerical examples. ACCPM is an iterative method that, at each step, computes the analytic center of the current localization polyhedron, queries a cutting-plane oracle at that point, and adds the returned cutting plane to the polyhedron.
Typology: Study notes










1 Analytic center cutting-plane method
2 Computing the analytic center
3 Pruning constraints
4 Lower bound and stopping criterion
5 Convergence proof
6 Numerical examples
6.1 Basic ACCPM
6.2 ACCPM with constraint dropping
6.3 Epigraph ACCPM
In these notes we describe in more detail the analytic center cutting-plane method (ACCPM) for non-differentiable convex optimization, prove its convergence, and give some numerical examples. ACCPM was developed by Goffin and Vial [GV93] and analyzed by Nesterov [Nes95] and Atkinson and Vaidya [AV95].

These notes assume a basic familiarity with convex optimization (see [BV04]), cutting-plane methods (see the EE364b notes Localization and Cutting-Plane Methods), and subgradients (see the EE364b notes Subgradients).
1 Analytic center cutting-plane method
The basic ACCPM algorithm is:
Analytic center cutting-plane method (ACCPM)
given an initial polyhedron $P_0$ known to contain $X$.
$k := 0$.
repeat
    Compute $x^{(k+1)}$, the analytic center of $P_k$.
    Query the cutting-plane oracle at $x^{(k+1)}$.
    If the oracle determines that $x^{(k+1)} \in X$, quit.
    Else, add the returned cutting-plane inequality to $P$.
        $P_{k+1} := P_k \cap \{z \mid a^T z \le b\}$
    If $P_{k+1} = \emptyset$, quit.
    $k := k + 1$.
There are several variations on this basic algorithm. For example, at each step we can add multiple cuts, instead of just one. We can also prune or drop constraints, for example, after computing the analytic center of $P_k$. Later we will see a simple but non-heuristic stopping criterion. We can construct a cutting-plane $a^T z \le b$ at $x^{(k)}$, for the standard convex problem

\[
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \le 0, \quad i = 1, \ldots, m,
\end{array}
\]

as follows. If $x^{(k)}$ violates the $i$th constraint, i.e., $f_i(x^{(k)}) > 0$, we can take

\[
a = g_i, \qquad b = g_i^T x^{(k)} - f_i(x^{(k)}), \tag{1}
\]

where $g_i \in \partial f_i(x^{(k)})$. If $x^{(k)}$ is feasible, we can take

\[
a = g_0, \qquad b = g_0^T x^{(k)} - f_0(x^{(k)}) + f_{\rm best}^{(k)}, \tag{2}
\]

where $g_0 \in \partial f_0(x^{(k)})$, and $f_{\rm best}^{(k)}$ is the best (lowest) objective value encountered for a feasible iterate.
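The cut construction in (1) and (2) can be sketched as a small oracle. This is a minimal illustration, not the implementation used for the experiments: the function name `cutting_plane`, its argument layout (callables for values and subgradients), and the choice to return the updated best value are all assumptions made here.

```python
import numpy as np

def cutting_plane(x, f0, g0, constraints, f_best):
    """Return (a, b, f_best) for the cut a^T z <= b at the query point x.

    f0, g0      : callables giving the objective value and one subgradient
    constraints : list of (f_i, g_i) callable pairs for f_i(x) <= 0
    f_best      : best feasible objective value so far (np.inf if none yet)
    """
    # Feasibility cut (1): use the first violated constraint.
    for f_i, g_i in constraints:
        val = f_i(x)
        if val > 0:
            a = g_i(x)
            return a, a @ x - val, f_best
    # Objective cut (2): x is feasible, so update f_best first.
    val = f0(x)
    f_best = min(f_best, val)
    a = g0(x)
    return a, a @ x - val + f_best, f_best

# Hypothetical example: minimize ||x||_2^2 subject to x_1 - 1 <= 0.
f0 = lambda x: float(x @ x)
g0 = lambda x: 2.0 * x
cons = [(lambda x: x[0] - 1.0, lambda x: np.array([1.0, 0.0]))]

a1, b1, fb = cutting_plane(np.array([2.0, 0.0]), f0, g0, cons, np.inf)
a2, b2, fb = cutting_plane(np.array([0.5, 0.5]), f0, g0, cons, fb)
a3, b3, fb = cutting_plane(np.array([1.0, 1.0]), f0, g0, cons, fb)
```

In this example the first query is infeasible and yields the cut $z_1 \le 1$. The second is feasible and sets the best value, so its cut is neutral (it passes through the query point). The third has a worse objective value than the best one, so its cut is deep (the query point violates it strictly).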
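where $g = -\mathbf{diag}(1/y_i)\mathbf{1}$ is the gradient of the objective. We also define $r$ to be $(r_d, r_p)$. The Newton step at a point $(x, y, \nu)$ is defined by the system of linear equations

\[
\begin{bmatrix} 0 & 0 & A^T \\ 0 & H & I \\ A & I & 0 \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta y \\ \Delta \nu \end{bmatrix}
= - \begin{bmatrix} r_d \\ r_p \end{bmatrix},
\]

where $H = \mathbf{diag}(1/y_i^2)$ is the Hessian of the objective. We can solve this system by block elimination (see [BV04, §10.4]), using the expressions

\[
\begin{array}{rcl}
\Delta x &=& (A^T H A)^{-1}(A^T g - A^T H r_p), \\
\Delta y &=& -A \Delta x - r_p, \\
\Delta \nu &=& -H \Delta y - g - \nu.
\end{array} \tag{6}
\]

(Substituting the second and third expressions into the first block row, $A^T \Delta\nu = -A^T \nu$, gives $A^T H A \Delta x = A^T g - A^T H r_p$. When $r_p = 0$ this reduces to the usual Newton step for the log barrier.)

We can compute $\Delta x$ from the first equation in several ways. We can, for example, form $A^T H A$, then compute its Cholesky factorization, then carry out backward and forward substitution. Another option is to compute $\Delta x$ by solving the least-squares problem

\[
\Delta x = \mathop{\rm argmin}_z \left\| H^{1/2} A z + H^{1/2} r_p - H^{-1/2} g \right\|_2 .
\]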
The infeasible start Newton method is:
Infeasible start Newton method.
given starting point $x$, $y \succ 0$, tolerance $\epsilon > 0$, $\alpha \in (0, 1/2)$, $\beta \in (0, 1)$.
$\nu := 0$.
Compute residuals from (5).
repeat
This method works quite well, unless the polyhedron is empty (or, in practice, very small), in which case the algorithm does not converge. To guard against this, we fix a maximum number of iterations. Typical parameter values are β = 0.5, α = 0.01, with maximum iterations set to 50.
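A centering routine along these lines can be sketched as follows. This is only a sketch under stated assumptions: it follows the general infeasible start Newton pattern of [BV04, §10.3] (backtracking on the residual norm) applied to minimizing $-\sum_i \log y_i$ subject to $y = b - Ax$. The function name, the residual ordering, the dense KKT solve (instead of block elimination), and the default tolerance are choices made here, and $A$ is assumed to have full column rank.

```python
import numpy as np

def analytic_center(A, b, x0, alpha=0.01, beta=0.5, tol=1e-8, max_iter=50):
    """Sketch: minimize -sum(log y) subject to y = b - A x.

    Returns (x, converged). Assumes A (m x n) has full column rank.
    """
    m, n = A.shape
    x = np.asarray(x0, dtype=float).copy()
    slack = b - A @ x
    y = np.where(slack > 0, slack, 1.0)   # any y > 0 is an allowed start
    nu = np.zeros(m)

    def residual(x, y, nu):
        g = -1.0 / y                      # gradient of the objective in y
        return np.concatenate([A.T @ nu, g + nu, y + A @ x - b])

    for _ in range(max_iter):
        r = residual(x, y, nu)
        r_norm = np.linalg.norm(r)
        if r_norm <= tol:
            return x, True
        H = np.diag(1.0 / y**2)           # Hessian of the objective in y
        # Assumed KKT matrix for the unknowns (dx, dy, dnu).
        K = np.block([
            [np.zeros((n, n)), np.zeros((n, m)), A.T],
            [np.zeros((m, n)), H,                np.eye(m)],
            [A,                np.eye(m),        np.zeros((m, m))],
        ])
        step = np.linalg.solve(K, -r)
        dx, dy, dnu = step[:n], step[n:n + m], step[n + m:]
        t = 1.0
        while np.any(y + t * dy <= 0):    # keep y strictly positive
            t *= beta
        while (np.linalg.norm(residual(x + t * dx, y + t * dy, nu + t * dnu))
               > (1 - alpha * t) * r_norm) and t > 1e-12:
            t *= beta                     # backtrack on the residual norm
        x, y, nu = x + t * dx, y + t * dy, nu + t * dnu
    return x, bool(np.linalg.norm(residual(x, y, nu)) <= tol)

# Interval 0 <= x <= 1 with the lower bound counted twice: center is 2/3.
A1 = np.array([[1.0], [-1.0], [-1.0]])
b1 = np.array([1.0, 0.0, 0.0])
x1, ok1 = analytic_center(A1, b1, np.array([0.5]))

# Box |x_i| <= 1 in R^2: by symmetry the center is the origin.
A2 = np.vstack([np.eye(2), -np.eye(2)])
b2 = np.ones(4)
x2, ok2 = analytic_center(A2, b2, np.array([0.3, -0.5]))
```

The iteration cap plays the role of the guard described above: when the polyhedron is empty or very small the loop simply stops and reports that it did not converge.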
3 Pruning constraints
There is a simple method for ranking the relevance of the inequalities $a_i^T x \le b_i$, $i = 1, \ldots, m$, that define a polyhedron $P$, once we have computed the analytic center $x^*$. Let

\[
H = \nabla^2 \Phi(x^*) = \sum_{i=1}^m (b_i - a_i^T x^*)^{-2} a_i a_i^T.
\]

Then the ellipsoid

\[
E_{\rm in} = \{z \mid (z - x^*)^T H (z - x^*) \le 1\}
\]

lies inside $P$, and the ellipsoid

\[
E_{\rm out} = \{z \mid (z - x^*)^T H (z - x^*) \le m^2\},
\]

which is $E_{\rm in}$ scaled by a factor $m$ about its center, contains $P$. Thus the ellipsoid $E_{\rm in}$ at least grossly (within a factor $m$) approximates the shape of $P$. This suggests the (ir)relevance measure

\[
\eta_i = \frac{b_i - a_i^T x^*}{\|H^{-1/2} a_i\|_2} = \frac{b_i - a_i^T x^*}{\sqrt{a_i^T H^{-1} a_i}}
\]
for the inequality $a_i^T x \le b_i$. This factor is always at least one; if it is $m$ or larger, then the inequality is certainly redundant.

These factors (which are easily computed from the computations involved in the Newton method) can be used to decide which constraints to drop or prune. We simply drop constraints with large values of $\eta_i$; we keep constraints with smaller values. One typical scheme is to keep some fixed number $N$ of constraints, where $N$ is usually chosen to be between $3n$ and $5n$. When this is done, the computational effort per iteration (i.e., centering) does not grow as ACCPM proceeds, as it does when no pruning is done.
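The pruning rule can be sketched in a few lines. The names `relevance` and `prune` are hypothetical, and the sketch assumes the supplied point really is the analytic center of $\{x \mid Ax \le b\}$ (the bounds on $\eta_i$ quoted above only hold there).

```python
import numpy as np

def relevance(A, b, x_star):
    """Irrelevance measure eta_i = (b_i - a_i^T x*) / sqrt(a_i^T H^{-1} a_i)."""
    s = b - A @ x_star                        # slacks at the analytic center
    H = A.T @ (A / s[:, None] ** 2)           # sum_i a_i a_i^T / s_i^2
    Hinv_At = np.linalg.solve(H, A.T)         # columns are H^{-1} a_i
    q = np.einsum('ij,ji->i', A, Hinv_At)     # a_i^T H^{-1} a_i
    return s / np.sqrt(q)

def prune(A, b, x_star, N):
    """Keep the N constraints with the smallest eta_i (original order kept)."""
    eta = relevance(A, b, x_star)
    keep = np.sort(np.argsort(eta, kind="stable")[:N])
    return A[keep], b[keep], keep

# Box |x_i| <= 1 plus two clearly redundant constraints |x_1| <= 10.
# By symmetry the analytic center is the origin.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0],
              [1.0, 0.0], [-1.0, 0.0]])
b = np.array([1.0, 1.0, 1.0, 1.0, 10.0, 10.0])
eta = relevance(A, b, np.zeros(2))
A_kept, b_kept, keep = prune(A, b, np.zeros(2), N=4)
```

Here the two loose constraints have $\eta_i$ larger than $m = 6$, so they are certified redundant, and keeping $N = 4$ constraints retains exactly the box.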
4 Lower bound and stopping criterion
In §7 of the notes Localization and Cutting-Plane Methods we described a general method for constructing a lower bound on the optimal value $p^\star$ of the convex problem

\[
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_1(x) \le 0, \quad Cx \preceq d,
\end{array}
\]

assuming we have evaluated the value and a subgradient of its objective and constraint functions at some points. For notational simplicity we lump multiple constraints into one by forming the maximum of the constraint functions. We will re-order the iterates so that at $x^{(1)}, \ldots, x^{(q)}$, we have evaluated the value and a subgradient of the objective function $f_0$. This gives us the piecewise-linear underestimator $\hat f_0$ of $f_0$, defined as

\[
\hat f_0(z) = \max_{i=1,\ldots,q} \left( f_0(x^{(i)}) + g_0^{(i)T}(z - x^{(i)}) \right) \le f_0(z).
\]
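The underestimator is easy to evaluate from stored oracle data. A minimal sketch, assuming the query points, values, and subgradients are kept as arrays (the function name and the $\ell_1$-norm test function are illustrative choices, not from the notes):

```python
import numpy as np

def underestimator(z, points, values, subgrads):
    """Evaluate max_i ( f0(x_i) + g_i^T (z - x_i) ) at the point z."""
    X = np.asarray(points, dtype=float)      # shape (q, n): query points
    f = np.asarray(values, dtype=float)      # shape (q,):   f0 at each point
    G = np.asarray(subgrads, dtype=float)    # shape (q, n): one subgradient each
    z = np.asarray(z, dtype=float)
    return np.max(f + np.einsum('ij,ij->i', G, z - X))

# Illustration with f0(x) = ||x||_1, whose subgradient at x (no zero
# entries) is sign(x).
pts = np.array([[1.0, -2.0], [-1.0, 0.5], [2.0, 2.0]])
vals = np.abs(pts).sum(axis=1)
subs = np.sign(pts)

lower_at_origin = underestimator([0.0, 0.0], pts, vals, subs)
lower_at_sample = underestimator(pts[0], pts, vals, subs)
```

The underestimator is tight at every stored query point and never exceeds $f_0$ elsewhere, which is what makes it usable for a lower bound.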
Using these values of $\lambda$ and $\mu$, we conclude that

\[
p^\star \ge l^{(k+1)},
\]

where

\[
l^{(k+1)} = \sum_{i=1}^{q} \lambda_i \left( f_0(x^{(i)}) - g_0^{(i)T} x^{(i)} \right) + \sum_{i=q+1}^{k} \lambda_i \left( f_1(x^{(i)}) - g_1^{(i)T} x^{(i)} \right) - d^T \mu.
\]

Let $l_{\rm best}^{(k)}$ be the best lower bound found after $k$ iterations. The ACCPM algorithm can be stopped once the gap $f_{\rm best}^{(k)} - l_{\rm best}^{(k)}$ is less than a desired value $\epsilon > 0$. This guarantees that $x^{(k)}$ is, at most, $\epsilon$-suboptimal.
5 Convergence proof
In this section we give a convergence proof for ACCPM, adapted from Ye [Ye97, Chap. 6]. We take the initial polyhedron as the unit box, centered at the origin, with unit length sides, i.e., the initial set of linear inequalities is
\[
-(1/2)\mathbf{1} \preceq z \preceq (1/2)\mathbf{1},
\]

so the first analytic center is $x^{(1)} = 0$. We assume the target set $X$ contains a ball with radius $r < 1/2$, and show that the number of iterations is no more than a constant times $n^2/r^2$. Assuming the algorithm has not terminated, the set of inequalities after $k$ iterations is

\[
-(1/2)\mathbf{1} \preceq z \preceq (1/2)\mathbf{1}, \qquad a_i^T z \le b_i, \quad i = 1, \ldots, k. \tag{12}
\]

We assume the cuts are neutral, so $b_i = a_i^T x^{(i)}$ for $i = 1, \ldots, k$. Without loss of generality we normalize the vectors $a_i$ so that $\|a_i\|_2 = 1$. We will let $\phi_k : \mathbf{R}^n \to \mathbf{R}$ be the logarithmic barrier function associated with the inequalities (12),

\[
\phi_k(z) = -\sum_{i=1}^{n} \log(1/2 + z_i) - \sum_{i=1}^{n} \log(1/2 - z_i) - \sum_{i=1}^{k} \log(b_i - a_i^T z).
\]

The iterate $x^{(k+1)}$ is the minimizer of this logarithmic barrier function. Since the algorithm has not terminated, the polyhedron $P_k$ defined by (12) still contains the target set $X$, and hence also a ball with radius $r$ and (unknown) center $x_c$. We have $(-1/2 + r)\mathbf{1} \preceq x_c \preceq (1/2 - r)\mathbf{1}$, and the slacks of the inequalities $a_i^T z \le b_i$ evaluated at $x_c$ also exceed $r$:

\[
b_i - \sup_{\|v\|_2 \le 1} a_i^T (x_c + r v) = b_i - a_i^T x_c - r \|a_i\|_2 = b_i - a_i^T x_c - r \ge 0.
\]

Therefore $\phi_k(x_c) \le -(2n + k) \log r$ and, since $x^{(k)}$ is the minimizer of $\phi_k$,

\[
\phi_k(x^{(k)}) = \inf_z \phi_k(z) \le \phi_k(x_c) \le (2n + k) \log(1/r). \tag{13}
\]
We can also derive a lower bound on $\phi_k(x^{(k)})$ by noting that the functions $\phi_j$ are self-concordant for $j = 1, \ldots, k$. Using the inequality (9.48), [BV04, p.502], we have

\[
\phi_j(x) \ge \phi_j(x^{(j)}) + \sqrt{(x - x^{(j)})^T H_j (x - x^{(j)})} - \log\left(1 + \sqrt{(x - x^{(j)})^T H_j (x - x^{(j)})}\right)
\]

for all $x \in \mathbf{dom}\, \phi_j$, where $H_j$ is the Hessian of $\phi_j$ at $x^{(j)}$. If we apply this inequality to $\phi_{k-1}$ we obtain

\[
\begin{array}{rcl}
\phi_k(x^{(k)}) &=& \inf_x \phi_k(x) \\
&=& \inf_x \left( \phi_{k-1}(x) - \log(-a_k^T (x - x^{(k-1)})) \right) \\
&\ge& \inf_v \left( \phi_{k-1}(x^{(k-1)}) + \sqrt{v^T H_{k-1} v} - \log\left(1 + \sqrt{v^T H_{k-1} v}\right) - \log(-a_k^T v) \right).
\end{array}
\]
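By setting the gradient of the righthand side equal to zero, we find that it is minimized at

\[
\hat v = -\frac{1 + \sqrt{5}}{2 \sqrt{a_k^T H_{k-1}^{-1} a_k}} \, H_{k-1}^{-1} a_k,
\]

which yields

\[
\begin{array}{rcl}
\phi_k(x^{(k)}) &\ge& \phi_{k-1}(x^{(k-1)}) + \sqrt{\hat v^T H_{k-1} \hat v} - \log\left(1 + \sqrt{\hat v^T H_{k-1} \hat v}\right) - \log(-a_k^T \hat v) \\[1ex]
&=& \phi_{k-1}(x^{(k-1)}) + 0.1744 - \frac{1}{2} \log(a_k^T H_{k-1}^{-1} a_k) \\[1ex]
&\ge& 0.1744\,k - \frac{1}{2} \sum_{i=1}^{k} \log(a_i^T H_{i-1}^{-1} a_i) + 2n \log 2 \\[1ex]
&\ge& 0.1744\,k - \frac{k}{2} \log\left( \frac{1}{k} \sum_{i=1}^{k} a_i^T H_{i-1}^{-1} a_i \right) + 2n \log 2,
\end{array}
\]

because $\phi_0(x^{(0)}) = 2n \log 2$. (The last step uses concavity of the logarithm.) We can further bound the second term on the righthand side by noting that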
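\[
H_i = 4\,\mathbf{diag}(\mathbf{1} + 2x^{(i)})^{-2} + 4\,\mathbf{diag}(\mathbf{1} - 2x^{(i)})^{-2} + \sum_{j=1}^{i} \frac{1}{(b_j - a_j^T x^{(i)})^2} a_j a_j^T \succeq I + \frac{1}{n} \sum_{j=1}^{i} a_j a_j^T,
\]

because $-(1/2)\mathbf{1} \prec x^{(i)} \prec (1/2)\mathbf{1}$ and

\[
b_i - a_i^T x^{(k)} = a_i^T (x^{(i-1)} - x^{(k)}) \le \|a_i\|_2 \|x^{(i-1)} - x^{(k)}\|_2 \le \sqrt{n}.
\]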
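The constant $0.1744$ in the chain of inequalities above can be checked numerically. Restricting the minimization over $v$ to the direction $-H_{k-1}^{-1} a_k$ and writing $t = \sqrt{v^T H_{k-1} v}$ leaves the scalar function $t - \log(1 + t) - \log t$ (plus a term that does not depend on $t$), whose stationarity condition is $t^2 = t + 1$. The short check below is a sketch of that arithmetic; the variable names are illustrative.

```python
import math

# Positive root of t**2 = t + 1, i.e. the stationary point of
# h(t) = t - log(1 + t) - log(t) for t > 0.
t = (1 + math.sqrt(5)) / 2

# Derivative of h at t; it should vanish at the stationary point.
stationarity = 1 - 1 / (1 + t) - 1 / t

# Value of h at the stationary point.
constant = t - math.log(1 + t) - math.log(t)
print(round(constant, 4))
```

Running this prints 0.1744, matching the constant used in the bound.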
Define $B_0 = I$ and $B_i = I + (1/n) \sum_{j=1}^{i} a_j a_j^T$ for $i \ge 1$. Then

\[
n \log(1 + k/n^2) = n \log(\mathbf{Tr}\, B_k / n) \ge \log\det B_k
\]
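The trace identity and the inequality $n \log(\mathbf{Tr}\, B_k / n) \ge \log\det B_k$ (the arithmetic-geometric mean inequality applied to the eigenvalues of $B_k$) can be spot-checked on random unit vectors. The dimensions and the random seed below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed
n, k = 5, 40                         # arbitrary small sizes

# Random cut normals, normalized so that ||a_j||_2 = 1.
a = rng.standard_normal((k, n))
a /= np.linalg.norm(a, axis=1, keepdims=True)

# B_k = I + (1/n) * sum_j a_j a_j^T
B = np.eye(n) + (a.T @ a) / n

lhs = n * np.log(np.trace(B) / n)    # n log(Tr B_k / n)
sign, logdet = np.linalg.slogdet(B)  # log det B_k, computed stably
```

Because each $a_j$ has unit norm, $\mathbf{Tr}\, B_k = n + k/n$, so `lhs` equals $n \log(1 + k/n^2)$ regardless of the directions drawn.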
6 Numerical examples
We consider the problem of minimizing a piecewise linear function:
\[
\text{minimize} \quad f(x) = \max_{i=1,\ldots,m} (a_i^T x + b_i),
\]

with variable $x \in \mathbf{R}^n$. The particular problem instance we use to illustrate the different methods has $n = 20$ variables and $m = 100$ terms, with problem data $a_i$ and $b_i$ generated from a unit normal distribution. Its optimal value (which is easily computed via linear programming) is $f^\star \approx 1.1$.
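A comparable instance, its oracle, and the LP used to obtain the optimal value can be sketched as follows. This is not the instance from the experiments: the random generator and seed are assumptions, so the optimal value will differ from the one quoted above. The LP is the standard epigraph form, minimize $t$ subject to $a_i^T x + b_i \le t$, solved here with SciPy's `linprog`; the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def pwl_value_and_subgradient(A, b, x):
    """f(x) = max_i (a_i^T x + b_i) and one subgradient (an active a_i)."""
    vals = A @ x + b
    j = int(np.argmax(vals))
    return vals[j], A[j]

def pwl_optimal_value(A, b):
    """Solve: minimize t subject to A x + b <= t 1, as an LP in (x, t)."""
    m, n = A.shape
    c = np.zeros(n + 1)
    c[-1] = 1.0                                   # objective picks out t
    A_ub = np.hstack([A, -np.ones((m, 1))])       # A x - t 1 <= -b
    res = linprog(c, A_ub=A_ub, b_ub=-b,
                  bounds=[(None, None)] * (n + 1))  # all variables free
    return res.fun, res.x[:n]

rng = np.random.default_rng(0)                    # arbitrary seed
n, m = 20, 100
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

f_star, x_star = pwl_optimal_value(A, b)
f_at_opt, g_at_opt = pwl_value_and_subgradient(A, b, x_star)
```

The variable bounds must be set explicitly because `linprog` defaults to nonnegative variables. The returned subgradient is exactly what the objective cut (2) needs at a feasible query point.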
We use the basic ACCPM algorithm described in §1, with the infeasible start Newton method used to carry out the centering steps. We take $P_0$ to be the unit box $\{z \mid \|z\|_\infty \le 1\}$. We keep track of $f_{\rm best}$, the best objective value found, and use this to generate deep objective cuts. Figure 1 shows convergence of $f^{(k)} - f^\star$, which is nearly linear (on a semi-log scale).

Figure 2 shows the convergence of the true suboptimality $f_{\rm best}^{(k)} - f^\star$ (which is not available as the algorithm is running), along with the upper bound on suboptimality $f_{\rm best}^{(k)} - l_{\rm best}^{(k)}$ (which is available as the algorithm runs).

Figure 3 shows $f_{\rm best}^{(k)} - f^\star$ versus the cumulative number of Newton steps required by the infeasible start Newton method in the centering steps. This plot shows that around 10 Newton steps are needed, on average, to carry out the centering. We can see that as the algorithm progresses (and $P_k$ gets small), there is a small increase in the number of Newton steps required to achieve the same factor increase in accuracy.
We illustrate ACCPM with constraint dropping, keeping at most $N = 3n$ constraints, using the constraint dropping scheme described in §3. Figure 4 shows convergence of $f^{(k)} - f^\star$ with and without constraint dropping. The plots show that keeping only $3n$ constraints has almost no effect on the progress of the algorithm, as measured in iterations. At $k = 200$ iterations, the pruned polyhedron has 60 constraints, whereas the unpruned polyhedron has 240 constraints. The number of flops per Newton step is, to first order, $m_k n^2$, where $m_k$ is the number of constraints at iteration $k$, so the total flop count of iteration $k$ can be estimated as $N_k m_k n^2$, where $N_k$ is the number of Newton steps required in iteration $k$. Figure 5 shows the convergence of $f_{\rm best}^{(k)} - f^\star$ versus the (estimated) cumulative flop count.
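The flop estimate $N_k m_k n^2$ can be accumulated with a few lines of code. The sketch below assumes one cut is added per iteration to an initial box with $2n$ constraints, so that $m_k = 2n + k$ without dropping (240 at $k = 200$ for $n = 20$) and $m_k = \min(2n + k, 3n)$ with dropping. The constant 10 Newton steps per iteration is only a stand-in for measured step counts, and the function name is hypothetical.

```python
import numpy as np

def cumulative_flops(newton_steps, n, keep=None):
    """Running total of the estimate N_k * m_k * n^2 over iterations k = 1, 2, ...

    newton_steps : sequence of N_k, the Newton steps used in iteration k
    n            : number of variables
    keep         : cap on the number of constraints (None means no dropping)
    """
    newton_steps = np.asarray(newton_steps, dtype=float)
    k = np.arange(1, len(newton_steps) + 1)
    m_k = 2 * n + k                      # assumed: 2n box constraints + k cuts
    if keep is not None:
        m_k = np.minimum(m_k, keep)      # constraint dropping caps m_k
    return np.cumsum(newton_steps * m_k * n**2)

n = 20
steps = [10] * 200                       # placeholder: 10 Newton steps each
flops_all = cumulative_flops(steps, n)
flops_pruned = cumulative_flops(steps, n, keep=3 * n)
ratio = flops_all[-1] / flops_pruned[-1]
```

Under these assumptions, dropping constraints reduces the estimated flop total over 200 iterations by a factor of a bit more than two, and the gap keeps widening as the unpruned polyhedron grows.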
Figure 1: The value of $f^{(k)} - f^\star$ versus iteration number $k$, for the basic ACCPM.
Figure 2: The value of $f_{\rm best}^{(k)} - f^\star$ (in blue) and the value of $f_{\rm best}^{(k)} - l_{\rm best}^{(k)}$ (in red) versus iteration number $k$ for the basic ACCPM.
Figure 5: The value of $f_{\rm best}^{(k)} - f^\star$ versus estimated cumulative flop count (in units of $10^6$ flops) in the case where all constraints are kept (blue) and only $3n$ constraints are kept (red).
Figure 6 shows convergence of $f^{(k)} - f^\star$ for the epigraph ACCPM. (See §6 of the EE364b notes Localization and Cutting-Plane Methods for details.) Epigraph ACCPM requires only 50 iterations to reach the same accuracy that was reached by basic ACCPM in 200 iterations. The convergence of $f_{\rm best}^{(k)} - f^\star$ versus the cumulative number of Newton steps is shown in figure 7. We see that in epigraph ACCPM the average number of Newton steps per iteration is a bit higher than for basic ACCPM, but a substantial advantage remains.
Figure 6: The value of $f^{(k)} - f^\star$ versus iteration number $k$, for epigraph ACCPM.
Figure 7: The value of $f_{\rm best}^{(k)} - f^\star$ versus the cumulative number of Newton steps, for epigraph ACCPM.