Analytic Center Cutting-Plane Method for Convex Optimization, Study notes of Convex Optimization

An in-depth treatment of the analytic center cutting-plane method (ACCPM) for non-differentiable convex optimization. It explains the algorithm, proves its convergence, and presents numerical examples. ACCPM is an iterative method for solving a convex optimization problem: at each step it computes the analytic center of the current localization polyhedron, queries a cutting-plane oracle at that point, and adds the returned cutting plane to the polyhedron.


Analytic Center Cutting-Plane Method
S. Boyd, L. Vandenberghe, and J. Skaf
February 6, 2007
Contents
1 Analytic center cutting-plane method
2 Computing the analytic center
3 Pruning constraints
4 Lower bound and stopping criterion
5 Convergence proof
6 Numerical examples
6.1 Basic ACCPM
6.2 ACCPM with constraint dropping
6.3 Epigraph ACCPM

In these notes we describe in more detail the analytic center cutting-plane method (ACCPM) for non-differentiable convex optimization, prove its convergence, and give some numerical examples. ACCPM was developed by Goffin and Vial [GV93] and analyzed by Nesterov [Nes95] and Atkinson and Vaidya [AV95].

These notes assume a basic familiarity with convex optimization (see [BV04]), cutting-plane methods (see the EE364b notes Localization and Cutting-Plane Methods), and subgradients (see the EE364b notes Subgradients).

1 Analytic center cutting-plane method

The basic ACCPM algorithm is:

Analytic center cutting-plane method (ACCPM)

given an initial polyhedron $P_0$ known to contain $X$.
k := 0.
repeat
    Compute $x^{(k+1)}$, the analytic center of $P_k$.
    Query the cutting-plane oracle at $x^{(k+1)}$.
    If the oracle determines that $x^{(k+1)} \in X$, quit.
    Else, add the returned cutting-plane inequality to $P$:
        $P_{k+1} := P_k \cap \{z \mid a^T z \le b\}$.
    If $P_{k+1} = \emptyset$, quit.
    k := k + 1.
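The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `accpm`, `center_1d`, and `oracle` are hypothetical, the demo is one-dimensional so that the analytic center can be found by simple bisection on the derivative of the log barrier, and the target set is taken to be a small interval around 0.3.

```python
def center_1d(cons, tol=1e-12):
    """Analytic center of {x | a*x <= b for (a, b) in cons} on the real line.

    Hypothetical helper for this 1-D demo: bisection on the derivative of the
    log barrier -sum(log(b - a*x)), which is increasing on the interval.
    """
    lo = max(b / a for a, b in cons if a < 0)   # lower end of the interval
    hi = min(b / a for a, b in cons if a > 0)   # upper end of the interval
    if lo >= hi:
        return None                             # empty (or degenerate) polyhedron
    left, right = lo, hi
    while right - left > tol:
        mid = 0.5 * (left + right)
        slope = sum(a / (b - a * mid) for a, b in cons)
        if slope > 0:
            right = mid                         # minimizer lies to the left
        else:
            left = mid                          # minimizer lies to the right
    return 0.5 * (left + right)

def accpm(cons, oracle, center, max_iter=100):
    """Basic ACCPM loop. `oracle(x)` returns None if x is in the target set,
    otherwise a cut (a, b) meaning a*z <= b for every z in the target set.
    Returns the last query point and the number of oracle calls."""
    cons = list(cons)
    x = None
    for k in range(max_iter):
        x_new = center(cons)
        if x_new is None:                       # polyhedron is empty: quit
            return x, k
        x = x_new
        cut = oracle(x)
        if cut is None:                         # x is in the target set: quit
            return x, k + 1
        cons.append(cut)                        # P_{k+1} := P_k with the new cut
    return x, max_iter

def oracle(x, target=0.3, radius=1e-3):
    """Neutral-cut oracle for the target set {z | |z - target| <= radius}."""
    if abs(x - target) <= radius:
        return None
    g = 1.0 if x > target else -1.0             # subgradient of |z - target| at x
    return (g, g * x)                           # neutral cut g*z <= g*x

P0 = [(1.0, 1.0), (-1.0, 1.0)]                  # the interval -1 <= x <= 1
x_found, calls = accpm(P0, oracle, center_1d, max_iter=200)
```

Passing the centering routine and the oracle as arguments keeps the loop itself independent of how the analytic center is computed; in higher dimensions the bisection helper would be replaced by a Newton method.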

There are several variations on this basic algorithm. For example, at each step we can add multiple cuts, instead of just one. We can also prune or drop constraints, for example, after computing the analytic center of $P_k$. Later we will see a simple but non-heuristic stopping criterion.

We can construct a cutting-plane $a^T z \le b$ at $x^{(k)}$, for the standard convex problem

$$\begin{array}{ll} \text{minimize} & f_0(x) \\ \text{subject to} & f_i(x) \le 0, \quad i = 1, \ldots, m, \end{array}$$

as follows. If $x^{(k)}$ violates the $i$th constraint, i.e., $f_i(x^{(k)}) > 0$, we can take

$$a = g_i, \qquad b = g_i^T x^{(k)} - f_i(x^{(k)}), \qquad (1)$$

where $g_i \in \partial f_i(x^{(k)})$. If $x^{(k)}$ is feasible, we can take

$$a = g_0, \qquad b = g_0^T x^{(k)} - f_0(x^{(k)}) + f_{\rm best}^{(k)}, \qquad (2)$$

where $g_0 \in \partial f_0(x^{(k)})$, and $f_{\rm best}^{(k)}$ is the best (lowest) objective value encountered for a feasible iterate.
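As a rough illustration of (1) and (2), the following sketch builds the cut for a given query point. The function name `make_cut`, its signature, and the small example problem (minimize $\|x\|_1$ subject to $1 - x_1 \le 0$) are assumptions made for this example only.

```python
import numpy as np

def make_cut(x, f0, g0, constraints, f_best):
    """Return (a, b, f_best) describing the cut a @ z <= b generated at x.

    f0, g0      : objective value and a subgradient, as callables of x
    constraints : list of (fi, gi) callables for the constraints fi(x) <= 0
    f_best      : best feasible objective value so far (np.inf if none)
    """
    for fi, gi in constraints:
        val = fi(x)
        if val > 0:                          # x violates this constraint: cut (1)
            g = gi(x)
            return g, g @ x - val, f_best
    f_best = min(f_best, f0(x))              # x is feasible: update the best value
    g = g0(x)
    return g, g @ x - f0(x) + f_best, f_best # deep objective cut (2)

# Hypothetical example: minimize ||x||_1 subject to 1 - x_1 <= 0.
f0 = lambda x: np.sum(np.abs(x))
g0 = lambda x: np.sign(x)                    # a subgradient of the l1 norm
cons = [(lambda x: 1.0 - x[0], lambda x: np.array([-1.0, 0.0]))]

a1, b1, fb1 = make_cut(np.array([0.0, 0.0]), f0, g0, cons, np.inf)  # infeasible point
a2, b2, fb2 = make_cut(np.array([2.0, 1.0]), f0, g0, cons, np.inf)  # feasible point
```

At the infeasible point the cut is $-z_1 \le -1$, which simply restates the violated constraint; at the feasible point the cut is $z_1 + z_2 \le 3$, which every point with objective value at most 3 satisfies.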

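where $g = -\operatorname{diag}(1/y_i)\mathbf{1}$ is the gradient of the objective. We also define $r$ to be $(r_d, r_p)$. The Newton step at a point $(x, y, \nu)$ is defined by the system of linear equations

$$\begin{bmatrix} 0 & 0 & A^T \\ 0 & H & I \\ A & I & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \\ \Delta \nu \end{bmatrix} = -\begin{bmatrix} r_d \\ r_p \end{bmatrix},$$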

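where $H = \operatorname{diag}(1/y_i^2)$ is the Hessian of the objective. We can solve this system by block elimination (see [BV04, §10.4]), using the expressions

$$\begin{aligned} \Delta x &= (A^T H A)^{-1}(A^T g - A^T H r_p), \\ \Delta y &= -A \Delta x - r_p, \qquad\qquad (6) \\ \Delta \nu &= -H \Delta y - g - \nu. \end{aligned}$$

(As a check, when $r_p = 0$, i.e., $y = b - Ax$, the first expression is the ordinary Newton step for the log barrier $-\sum_i \log(b_i - a_i^T x)$, whose gradient is $-A^T g$ and whose Hessian is $A^T H A$.)

We can compute $\Delta x$ from the first equation in several ways. We can, for example, form $A^T H A$, then compute its Cholesky factorization, then carry out backward and forward substitution. Another option is to compute $\Delta x$ by solving the least-squares problem

$$\Delta x = \mathop{\rm argmin}_z \left\| H^{1/2} A z + H^{1/2} r_p - H^{-1/2} g \right\|_2.$$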
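The three routes to the Newton step (solving the full block system, block elimination, and least squares) can be compared numerically. The sketch below does this on random data. The residual definitions $r_p = y + Ax - b$ and $r_d = (A^T\nu,\ g + \nu)$ are assumptions here, chosen to be consistent with the expressions for $\Delta y$ and $\Delta\nu$; the step $\Delta x$ is obtained by eliminating $\Delta\nu$ and $\Delta y$ from the first block row, which gives $A^T H A\,\Delta x = A^T g - A^T H r_p$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)
y = rng.uniform(0.5, 2.0, size=m)        # any y > 0 (not necessarily b - A x)
nu = rng.standard_normal(m)

g = -1.0 / y                             # gradient of -sum(log y)
H = np.diag(1.0 / y**2)                  # Hessian of -sum(log y)
r_p = y + A @ x - b                      # primal residual (assumed form)
r_d = np.concatenate([A.T @ nu, g + nu]) # dual residual (assumed form)

# Route 1: solve the full block system in the unknowns (dx, dy, dnu).
K = np.block([
    [np.zeros((n, n)), np.zeros((n, m)), A.T],
    [np.zeros((m, n)), H,                np.eye(m)],
    [A,                np.eye(m),        np.zeros((m, m))],
])
step = np.linalg.solve(K, -np.concatenate([r_d, r_p]))
dx_kkt, dy_kkt, dnu_kkt = step[:n], step[n:n + m], step[n + m:]

# Route 2: block elimination.
dx = np.linalg.solve(A.T @ H @ A, A.T @ g - A.T @ H @ r_p)
dy = -A @ dx - r_p
dnu = -H @ dy - g - nu

# Route 3: least squares, minimizing || H^{1/2} A z + H^{1/2} r_p - H^{-1/2} g ||_2.
Hh = 1.0 / y                             # diagonal of H^{1/2}
dx_ls, *_ = np.linalg.lstsq(Hh[:, None] * A, g / Hh - Hh * r_p, rcond=None)
```

In practice one would not form the dense matrix `K`; it is built here only to confirm that the eliminated expressions solve the same linear system.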

The infeasible start Newton method is:

Infeasible start Newton method.

given starting point $x$, $y \succ 0$, tolerance $\epsilon > 0$, $\alpha \in (0, 1/2)$, $\beta \in (0, 1)$.
$\nu := 0$.
Compute residuals from (5).
repeat
  1. Compute Newton step $(\Delta x, \Delta y, \Delta\nu)$ using (6).
  2. Backtracking line search on $\|r\|_2$.
     $t := 1$.
     while $y + t\Delta y \not\succ 0$, $t := \beta t$.
     while $\|r(x + t\Delta x, y + t\Delta y, \nu + t\Delta\nu)\|_2 > (1 - \alpha t)\|r(x, y, \nu)\|_2$, $t := \beta t$.
  3. Update. $x := x + t\Delta x$, $y := y + t\Delta y$, $\nu := \nu + t\Delta\nu$.
until $y = b - Ax$ and $\|r(x, y, \nu)\|_2 \le \epsilon$.

This method works quite well, unless the polyhedron is empty (or, in practice, very small), in which case the algorithm does not converge. To guard against this, we fix a maximum number of iterations. Typical parameter values are β = 0.5, α = 0.01, with maximum iterations set to 50.
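A minimal NumPy sketch of the method, with the typical parameter values above as defaults, might look as follows. Several details are assumptions rather than part of the notes: the function name `analytic_center`, the residual definitions $r_d = (A^T\nu,\ g + \nu)$ and $r_p = y + Ax - b$, the initialization of $y$ (the slack where it is positive, 1 otherwise), and the use of $\|r\|_2 \le \epsilon$ alone as the stopping test (since $r$ contains $r_p$, this implies $y = b - Ax$ up to the tolerance).

```python
import numpy as np

def analytic_center(A, b, x0, eps=1e-8, alpha=0.01, beta=0.5, max_iter=50):
    """Infeasible start Newton method for: minimize -sum(log y) s.t. y = b - A x.
    Returns (x, converged)."""
    m, n = A.shape
    x = np.asarray(x0, dtype=float).copy()
    s = b - A @ x
    y = np.where(s > 0, s, 1.0)           # assumed initialization of y > 0
    nu = np.zeros(m)

    def residual(x, y, nu):
        # Stacked residual r = (r_d, r_p) under the assumed definitions.
        return np.concatenate([A.T @ nu, -1.0 / y + nu, y + A @ x - b])

    for _ in range(max_iter):
        r = residual(x, y, nu)
        if np.linalg.norm(r) <= eps:      # small r implies y = b - A x up to eps
            return x, True
        g, h = -1.0 / y, 1.0 / y**2       # gradient and diagonal of the Hessian
        r_p = r[n + m:]
        # Newton step by block elimination.
        dx = np.linalg.solve(A.T @ (h[:, None] * A), A.T @ (g - h * r_p))
        dy = -A @ dx - r_p
        dnu = -h * dy - g - nu
        # Backtracking line search on the residual norm.
        t = 1.0
        while np.any(y + t * dy <= 0):    # keep y strictly positive
            t *= beta
        while (np.linalg.norm(residual(x + t * dx, y + t * dy, nu + t * dnu))
               > (1 - alpha * t) * np.linalg.norm(r)):
            t *= beta
        x, y, nu = x + t * dx, y + t * dy, nu + t * dnu
    return x, False                       # hit the iteration cap (e.g. empty polyhedron)

# Usage: the box -1 <= x <= 1 in R^2, started from a point outside the box.
A_box = np.vstack([np.eye(2), -np.eye(2)])
b_box = np.ones(4)
x_c, ok = analytic_center(A_box, b_box, [3.0, 3.0])
```

The returned flag lets the caller distinguish a converged center from a run that was cut off by the iteration limit.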

3 Pruning constraints

There is a simple method for ranking the relevance of the inequalities $a_i^T x \le b_i$, $i = 1, \ldots, m$, that define a polyhedron $P$, once we have computed the analytic center $x^*$. Let

$$H = \nabla^2 \Phi(x^*) = \sum_{i=1}^m (b_i - a_i^T x^*)^{-2} a_i a_i^T.$$

Then the ellipsoid

$$E_{\rm in} = \{ z \mid (z - x^*)^T H (z - x^*) \le 1 \}$$

lies inside $P$, and the ellipsoid

$$E_{\rm out} = \{ z \mid (z - x^*)^T H (z - x^*) \le m^2 \},$$

which is $E_{\rm in}$ scaled by a factor $m$ about its center, contains $P$. Thus the ellipsoid $E_{\rm in}$ at least grossly (within a factor $m$) approximates the shape of $P$. This suggests the (ir)relevance measure

$$\eta_i = \frac{b_i - a_i^T x^*}{\|H^{-1/2} a_i\|_2} = \frac{b_i - a_i^T x^*}{\sqrt{a_i^T H^{-1} a_i}}$$

for the inequality $a_i^T x \le b_i$. This factor is always at least one; if it is $m$ or larger, then the inequality is certainly redundant.

These factors (which are easily computed from the computations involved in the Newton method) can be used to decide which constraints to drop or prune. We simply drop constraints with the large values of $\eta_i$; we keep constraints with smaller values. One typical scheme is to keep some fixed number $N$ of constraints, where $N$ is usually chosen to be between $3n$ and $5n$. When this is done, the computational effort per iteration (i.e., centering) does not grow as ACCPM proceeds, as it does when no pruning is done.
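The relevance measure and the keep-the-$N$-smallest rule can be sketched as below. The function names `relevance` and `prune` are hypothetical, and the example polyhedron (a box in $\mathbf{R}^2$ plus two symmetric, clearly redundant inequalities, so that the analytic center is the origin by symmetry) is made up for illustration.

```python
import numpy as np

def relevance(A, b, x_star):
    """Return eta_i = (b_i - a_i^T x*) / sqrt(a_i^T H^{-1} a_i) for each row of A."""
    s = b - A @ x_star                          # slacks at the analytic center
    H = A.T @ (A / s[:, None]**2)               # Hessian of the log barrier at x*
    Hinv_At = np.linalg.solve(H, A.T)           # columns are H^{-1} a_i
    return s / np.sqrt(np.sum(A.T * Hinv_At, axis=0))

def prune(A, b, x_star, N):
    """Keep the N constraints with the smallest eta_i (a hypothetical helper)."""
    keep = np.sort(np.argsort(relevance(A, b, x_star))[:N])
    return A[keep], b[keep]

# Box |x_1| <= 1, |x_2| <= 1, plus the redundant pair |x_1| <= 10.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0],
              [1.0, 0.0], [-1.0, 0.0]])
b = np.array([1.0, 1.0, 1.0, 1.0, 10.0, 10.0])
x_star = np.zeros(2)                            # analytic center, by symmetry

eta = relevance(A, b, x_star)
A_kept, b_kept = prune(A, b, x_star, N=4)
```

In this example the two redundant inequalities have $\eta_i$ larger than $m = 6$, so they are the ones dropped when only four constraints are kept.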

4 Lower bound and stopping criterion

In §7 of the notes Localization and Cutting-Plane Methods we described a general method for constructing a lower bound on the optimal value $p^\star$ of the convex problem

$$\begin{array}{ll} \text{minimize} & f_0(x) \\ \text{subject to} & f_1(x) \le 0, \quad Cx \preceq d, \end{array}$$

assuming we have evaluated the value and a subgradient of its objective and constraint functions at some points. For notational simplicity we lump multiple constraints into one by forming the maximum of the constraint functions.

We will re-order the iterates so that at $x^{(1)}, \ldots, x^{(q)}$, we have evaluated the value and a subgradient of the objective function $f_0$. This gives us the piecewise-linear underestimator $\hat f_0$ of $f_0$, defined as

$$\hat f_0(z) = \max_{i=1,\ldots,q} \left( f_0(x^{(i)}) + g_0^{(i)T}(z - x^{(i)}) \right) \le f_0(z).$$
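The piecewise-linear underestimator is simple to form from stored values and subgradients. The sketch below uses a hypothetical helper `pwl_underestimator` and, purely as an example, the convex function $f_0(x) = \|x\|_2^2$ sampled at a few random points.

```python
import numpy as np

def pwl_underestimator(points, values, subgrads):
    """Return f_hat(z) = max_i ( values[i] + subgrads[i] @ (z - points[i]) )."""
    X, v, G = np.asarray(points), np.asarray(values), np.asarray(subgrads)
    offsets = v - np.sum(G * X, axis=1)         # f0(x_i) - g_i^T x_i
    return lambda z: np.max(G @ z + offsets)

f0 = lambda x: np.sum(x**2)                     # example convex objective
grad_f0 = lambda x: 2 * x                       # its gradient (the only subgradient)

rng = np.random.default_rng(1)
pts = rng.standard_normal((5, 3))               # points where f0 was evaluated
f_hat = pwl_underestimator(pts, [f0(p) for p in pts], [grad_f0(p) for p in pts])

z = np.array([0.2, -1.0, 0.5])
gap = f0(z) - f_hat(z)                          # nonnegative by convexity
```

The underestimator is tight at each sampled point and lies below $f_0$ everywhere else, which is what makes it usable for lower bounds.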

Using these values of $\lambda$ and $\mu$, we conclude that

$$p^\star \ge l^{(k+1)},$$

where

$$l^{(k+1)} = \sum_{i=1}^q \lambda_i \left( f_0(x^{(i)}) - g_0^{(i)T} x^{(i)} \right) + \sum_{i=q+1}^k \lambda_i \left( f_1(x^{(i)}) - g_1^{(i)T} x^{(i)} \right) - d^T \mu.$$

Let $l_{\rm best}^{(k)}$ be the best lower bound found after $k$ iterations. The ACCPM algorithm can be stopped once the gap $f_{\rm best}^{(k)} - l_{\rm best}^{(k)}$ is less than a desired value $\epsilon > 0$. This guarantees that the best feasible iterate found so far is, at most, $\epsilon$-suboptimal.

5 Convergence proof

In this section we give a convergence proof for ACCPM, adapted from Ye [Ye97, Chap. 6]. We take the initial polyhedron as the unit box, centered at the origin, with unit length sides, i.e., the initial set of linear inequalities is

$$-(1/2)\mathbf{1} \preceq z \preceq (1/2)\mathbf{1},$$

so the first analytic center is $x^{(0)} = 0$. (Throughout this section the iterates are indexed from zero: $x^{(k)}$ denotes the analytic center of the polyhedron obtained after $k$ cuts.) We assume the target set $X$ contains a ball with radius $r < 1/2$, and show that the number of iterations is no more than a constant times $n^2/r^2$.

Assuming the algorithm has not terminated, the set of inequalities after $k$ iterations is

$$-(1/2)\mathbf{1} \preceq z \preceq (1/2)\mathbf{1}, \qquad a_i^T z \le b_i, \quad i = 1, \ldots, k. \qquad (12)$$

We assume the cuts are neutral, so $b_i = a_i^T x^{(i-1)}$ for $i = 1, \ldots, k$. Without loss of generality we normalize the vectors $a_i$ so that $\|a_i\|_2 = 1$. We will let $\phi_k : \mathbf{R}^n \to \mathbf{R}$ be the logarithmic barrier function associated with the inequalities (12),

$$\phi_k(z) = -\sum_{i=1}^n \log(1/2 + z_i) - \sum_{i=1}^n \log(1/2 - z_i) - \sum_{i=1}^k \log(b_i - a_i^T z).$$

The iterate $x^{(k)}$ is the minimizer of this logarithmic barrier function. Since the algorithm has not terminated, the polyhedron $P_k$ defined by (12) still contains the target set $X$, and hence also a ball with radius $r$ and (unknown) center $x_c$. We have $(-1/2 + r)\mathbf{1} \preceq x_c \preceq (1/2 - r)\mathbf{1}$, and the slacks of the inequalities $a_i^T z \le b_i$ evaluated at $x_c$ are also at least $r$:

$$b_i - \sup_{\|v\|_2 \le 1} a_i^T (x_c + r v) = b_i - a_i^T x_c - r\|a_i\|_2 = b_i - a_i^T x_c - r \ge 0.$$

Therefore $\phi_k(x_c) \le -(2n + k)\log r$ and, since $x^{(k)}$ is the minimizer of $\phi_k$,

$$\phi_k(x^{(k)}) = \inf_z \phi_k(z) \le \phi_k(x_c) \le (2n + k)\log(1/r). \qquad (13)$$

We can also derive a lower bound on $\phi_k(x^{(k)})$ by noting that the functions $\phi_j$ are self-concordant for $j = 1, \ldots, k$. Using the inequality (9.48), [BV04, p.502], we have

$$\phi_j(x) \ge \phi_j(x^{(j)}) + \sqrt{(x - x^{(j)})^T H_j (x - x^{(j)})} - \log\left(1 + \sqrt{(x - x^{(j)})^T H_j (x - x^{(j)})}\right)$$

for all $x \in \operatorname{dom} \phi_j$, where $H_j$ is the Hessian of $\phi_j$ at $x^{(j)}$. If we apply this inequality to $\phi_{k-1}$ we obtain

$$\begin{aligned}
\phi_k(x^{(k)}) &= \inf_x \phi_k(x) \\
&= \inf_x \left( \phi_{k-1}(x) - \log(-a_k^T (x - x^{(k-1)})) \right) \\
&\ge \inf_v \left( \phi_{k-1}(x^{(k-1)}) + \sqrt{v^T H_{k-1} v} - \log\left(1 + \sqrt{v^T H_{k-1} v}\right) - \log(-a_k^T v) \right).
\end{aligned}$$

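By setting the gradient of the righthand side equal to zero, we find that it is minimized at

$$\hat v = -\frac{(1 + \sqrt{5})/2}{\sqrt{a_k^T H_{k-1}^{-1} a_k}} \, H_{k-1}^{-1} a_k,$$

which yields

$$\begin{aligned}
\phi_k(x^{(k)}) &\ge \phi_{k-1}(x^{(k-1)}) + \sqrt{\hat v^T H_{k-1} \hat v} - \log\left(1 + \sqrt{\hat v^T H_{k-1} \hat v}\right) - \log(-a_k^T \hat v) \\
&= \phi_{k-1}(x^{(k-1)}) + 0.1744 - \frac{1}{2} \log(a_k^T H_{k-1}^{-1} a_k) \\
&\ge 0.1744 k - \frac{1}{2} \sum_{i=1}^k \log(a_i^T H_{i-1}^{-1} a_i) + 2n \log 2 \\
&\ge 0.1744 k - \frac{k}{2} \log\left( \frac{1}{k} \sum_{i=1}^k a_i^T H_{i-1}^{-1} a_i \right) + 2n \log 2 \\
&\ge -\frac{k}{2} \log\left( \frac{1}{k} \sum_{i=1}^k a_i^T H_{i-1}^{-1} a_i \right) + 2n \log 2 \qquad (14)
\end{aligned}$$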

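because $\phi_0(x^{(0)}) = 2n \log 2$. We can further bound the second term on the righthand side by noting that

$$H_i = 4 \operatorname{diag}(\mathbf{1} + 2x^{(i)})^{-2} + 4 \operatorname{diag}(\mathbf{1} - 2x^{(i)})^{-2} + \sum_{j=1}^i \frac{1}{(b_j - a_j^T x^{(i)})^2} a_j a_j^T \succeq I + \frac{1}{n} \sum_{j=1}^i a_j a_j^T$$

because $-(1/2)\mathbf{1} \prec x^{(i)} \prec (1/2)\mathbf{1}$ and

$$b_i - a_i^T x^{(k)} = a_i^T (x^{(i-1)} - x^{(k)}) \le \|a_i\|_2 \|x^{(i-1)} - x^{(k)}\|_2 \le \sqrt{n}.$$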
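The constant 0.1744 in the chain leading to (14) can be checked numerically. For a fixed value of $s = \sqrt{v^T H_{k-1} v}$, the term $-\log(-a_k^T v)$ is smallest when $v$ points along $-H_{k-1}^{-1} a_k$, which gives $-a_k^T v = s \sqrt{a_k^T H_{k-1}^{-1} a_k}$. What remains is the scalar function $s - \log(1 + s) - \log s$, whose derivative vanishes when $s^2 = s + 1$. The short sketch below (the function name `h` is just for this example) evaluates it at that root.

```python
import math

def h(s):
    """Value of s - log(1 + s) - log(s), the part of the bound that depends on s."""
    return s - math.log(1 + s) - math.log(s)

s_hat = (1 + math.sqrt(5)) / 2        # positive root of s**2 = s + 1
const = h(s_hat)
print(round(const, 4))                # 0.1744
```

Evaluating `h` slightly to either side of `s_hat` gives larger values, consistent with `s_hat` being the minimizer.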

Define $B_0 = I$ and $B_i = I + (1/n) \sum_{j=1}^i a_j a_j^T$ for $i \ge 1$. Then

$$n \log(1 + k/n^2) = n \log(\operatorname{Tr} B_k / n) \ge \log\det B_k$$
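The equality and inequality above are easy to confirm numerically. The equality holds because each $a_j$ has unit norm, so $\operatorname{Tr} B_k = n + k/n$; the inequality is the arithmetic-geometric mean inequality applied to the eigenvalues of $B_k$. The sketch below checks both for random unit vectors; the dimensions and the seed are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 40
a = rng.standard_normal((k, n))
a /= np.linalg.norm(a, axis=1, keepdims=True)     # unit-norm cut normals a_j

B_k = np.eye(n) + (a.T @ a) / n                   # B_k = I + (1/n) sum_j a_j a_j^T
lhs = n * np.log(1 + k / n**2)
trace_bound = n * np.log(np.trace(B_k) / n)
sign, logdet = np.linalg.slogdet(B_k)             # log det computed stably
```

Using `slogdet` rather than `log(det(...))` avoids overflow or underflow of the determinant when $n$ is large.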

6 Numerical examples

We consider the problem of minimizing a piecewise linear function:

$$\text{minimize} \quad f(x) = \max_{i=1,\ldots,m} (a_i^T x + b_i),$$

with variable $x \in \mathbf{R}^n$. The particular problem instance we use to illustrate the different methods has $n = 20$ variables and $m = 100$ terms, with problem data $a_i$ and $b_i$ generated from a unit normal distribution. Its optimal value (which is easily computed via linear programming) is $f^\star \approx 1.1$.
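A problem instance of this kind, together with the value and subgradient evaluation a cutting-plane oracle needs, can be set up as follows. This is a sketch only: the seed is arbitrary, so the data (and hence the optimal value) will not match the instance used in these notes, and the name `f_and_subgrad` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed; not the instance in the notes
n, m = 20, 100
A = rng.standard_normal((m, n))         # rows are a_i^T
b = rng.standard_normal(m)

def f_and_subgrad(x):
    """Return f(x) = max_i (a_i^T x + b_i) and one subgradient (an active a_i)."""
    vals = A @ x + b
    i = int(np.argmax(vals))
    return vals[i], A[i]

x = np.zeros(n)
fx, g = f_and_subgrad(x)                # at x = 0 the value is max_i b_i
```

Any $a_i$ that attains the maximum is a valid subgradient, so taking the first maximizer returned by `argmax` is sufficient for generating cuts.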

6.1 Basic ACCPM

We use the basic ACCPM algorithm described in §1, with the infeasible start Newton method used to carry out the centering steps. We take $P_0$ to be the unit box $\{z \mid \|z\|_\infty \le 1\}$. We keep track of $f_{\rm best}$, the best objective value found, and use this to generate deep objective cuts. Figure 1 shows convergence of $f^{(k)} - f^\star$, which is nearly linear (on a semi-log scale).

Figure 2 shows the convergence of the true suboptimality $f_{\rm best}^{(k)} - f^\star$ (which is not available as the algorithm is running), along with the upper bound on suboptimality $f_{\rm best}^{(k)} - l_{\rm best}^{(k)}$ (which is available as the algorithm runs).

Figure 3 shows $f_{\rm best}^{(k)} - f^\star$ versus the cumulative number of Newton steps required by the infeasible start Newton method in the centering steps. This plot shows that around 10 Newton steps are needed, on average, to carry out the centering. We can see that as the algorithm progresses (and $P_k$ gets small), there is a small increase in the number of Newton steps required to achieve the same factor increase in accuracy.

6.2 ACCPM with constraint dropping

We illustrate ACCPM with constraint dropping, keeping at most $N = 3n$ constraints, using the constraint dropping scheme described in §3. Figure 4 shows convergence of $f^{(k)} - f^\star$ with and without constraint dropping. The plots show that keeping only $3n$ constraints has almost no effect on the progress of the algorithm, as measured in iterations. At $k = 200$ iterations, the pruned polyhedron has 60 constraints, whereas the unpruned polyhedron has 240 constraints. The number of flops per Newton step is, to first order, $m_k n^2$, where $m_k$ is the number of constraints at iteration $k$, so the total flop count of iteration $k$ can be estimated as $N_k m_k n^2$, where $N_k$ is the number of Newton steps required in iteration $k$. Figure 5 shows the convergence of $f_{\rm best}^{(k)} - f^\star$ versus the (estimated) cumulative flop count.
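The flop estimate is simple bookkeeping, sketched below. The helper names are hypothetical, the polyhedron is assumed to start from the $2n$ box inequalities and gain one cut per iteration, and the constant count of 10 Newton steps per iteration is only a stand-in for the measured values $N_k$.

```python
def constraint_counts(n, k, keep=None):
    """Number of constraints after k cuts, starting from a box with 2n inequalities.
    If `keep` is given, the polyhedron is pruned to at most `keep` constraints."""
    m_k = 2 * n + k
    return m_k if keep is None else min(m_k, keep)

def cumulative_flops(n, newton_steps, keep=None):
    """Running total of N_k * m_k * n^2 over iterations k = 1, 2, ..."""
    total, out = 0, []
    for k, N_k in enumerate(newton_steps, start=1):
        total += N_k * constraint_counts(n, k, keep) * n**2
        out.append(total)
    return out

n = 20
steps = [10] * 200                       # assumption: 10 Newton steps per iteration
full = cumulative_flops(n, steps)        # no constraint dropping
pruned = cumulative_flops(n, steps, keep=3 * n)
```

With $n = 20$ this reproduces the constraint counts quoted above (240 without dropping and 60 with dropping at $k = 200$), and under the constant-step assumption the cumulative estimate with dropping is well under half of the estimate without it.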

Figure 1: The value of $f^{(k)} - f^\star$ versus iteration number $k$, for the basic ACCPM.

Figure 2: The value of $f_{\rm best}^{(k)} - f^\star$ (in blue) and the value of $f_{\rm best}^{(k)} - l_{\rm best}^{(k)}$ (in red) versus iteration number $k$ for the basic ACCPM.

Figure 5: The value of $f_{\rm best}^{(k)} - f^\star$ versus estimated cumulative flop count (horizontal axis in units of $10^6$ flops) in the case where all constraints are kept (blue) and only $3n$ constraints are kept (red).

6.3 Epigraph ACCPM

Figure 6 shows convergence of $f^{(k)} - f^\star$ for the epigraph ACCPM. (See §6 of the EE364b notes Localization and Cutting-Plane Methods for details.) Epigraph ACCPM requires only 50 iterations to reach the same accuracy that was reached by basic ACCPM in 200 iterations.

The convergence of $f_{\rm best}^{(k)} - f^\star$ versus the cumulative number of Newton steps is shown in Figure 7. We see that in epigraph ACCPM the average number of Newton steps per iteration is a bit higher than for basic ACCPM, but a substantial advantage remains.

Figure 6: The value of $f^{(k)} - f^\star$ versus iteration number $k$, for epigraph ACCPM.

Figure 7: The value of $f_{\rm best}^{(k)} - f^\star$ versus the cumulative number of Newton steps, for epigraph ACCPM.