Localization and Cutting Plane Methods-Optimization Techniques-Lecture Notes, Study notes of Convex Optimization

Prof. Asit Maddukuri provided this handout for Convex Optimization course at Aligarh Muslim University. Its main points are: Localization, Cutting, Plane, Methods, Unconstrained, Minimization, Feasibility, MVE, Bisection, Center

Typology: Study notes

2011/2012

Uploaded on 07/15/2012

sagar
Localization and Cutting-Plane Methods
S. Boyd and L. Vandenberghe
January 29, 2007
Contents
1 Cutting-planes
2 Finding cutting-planes
2.1 Unconstrained minimization
2.2 Feasibility problem
2.3 Inequality constrained problem
3 Localization algorithms
3.1 Basic cutting-plane and localization algorithm
3.2 Measuring uncertainty and progress
3.3 Choosing the query point
4 Some specific cutting-plane methods
4.1 Bisection method on R
4.2 Center of gravity method
4.3 MVE cutting-plane method
4.4 Chebyshev center cutting-plane method
4.5 Analytic center cutting-plane method
5 Extensions
5.1 Multiple cuts
5.2 Dropping or pruning constraints
5.3 Nonlinear cuts
6 Epigraph cutting-plane method
7 Lower bounds and stopping criteria

In these notes we describe a class of methods for solving general convex and quasiconvex optimization problems, based on the use of cutting-planes, which are hyperplanes that separate the current point from the optimal points. These methods, called cutting-plane methods or localization methods, are quite different from interior-point methods, such as the barrier method or primal-dual interior-point method described in [BV04, §11]. Cutting-plane methods are usually less efficient for problems to which interior-point methods apply, but they have a number of advantages that can make them an attractive choice in certain situations.

  • Cutting-plane methods do not require differentiability of the objective and constraint functions, and can directly handle quasiconvex as well as convex problems. Each iteration requires the computation of a subgradient of the objective or constraint functions.
  • Cutting-plane methods can exploit certain types of structure in large and complex problems. A cutting-plane method that exploits structure can be faster than a general-purpose interior-point method for the same problem.
  • Cutting-plane methods do not require evaluation of the objective and all the constraint functions at each iteration. (In contrast, interior-point methods require evaluating all the objective and constraint functions, as well as their first and second derivatives.) This can make cutting-plane methods useful for problems with a very large number of constraints.
  • Cutting-plane methods can be used to decompose problems into smaller problems that can be solved sequentially or in parallel.

To apply these methods to nondifferentiable problems, you need to know about subgradients, which are described in a separate set of notes. More details of the analytic center cutting-plane method are given in another separate set of notes.

1 Cutting-planes

The goal of cutting-plane and localization methods is to find a point in a convex set $X \subseteq \mathbf{R}^n$, which we call the target set, or, in some cases, to determine that $X$ is empty. In an optimization problem, the target set $X$ can be taken as the set of optimal (or $\epsilon$-suboptimal) points for the problem, and our goal is to find an optimal (or $\epsilon$-suboptimal) point for the optimization problem. We do not have direct access to any description of the target set $X$ (such as the objective and constraint functions in an underlying optimization problem) except through an oracle. When we query the oracle at a point $x \in \mathbf{R}^n$, the oracle returns the following information to us: it either tells us that $x \in X$ (in which case we are done), or it returns a separating hyperplane between $x$ and $X$, i.e., $a \neq 0$ and $b$ such that

$$a^T z \le b \ \text{ for } z \in X, \qquad a^T x \ge b.$$

Figure 2: Left: a neutral cut for the point $x$ and target set $X$. Here, the query point $x$ is on the boundary of the excluded halfspace. Right: a deep cut for the point $x$ and target set $X$.

2.1 Unconstrained minimization

We first consider the unconstrained optimization problem

$$\text{minimize} \quad f_0(x), \tag{1}$$

where $f_0$ is convex. To find a cutting-plane for this problem, at the point $x$, we proceed as follows. We find a subgradient $g \in \partial f_0(x)$. (If $f_0$ is differentiable at $x$, then our only choice is $g = \nabla f_0(x)$.) If $g = 0$, then $x$ is optimal, i.e., in the target set $X$. So we assume that $g \neq 0$. Recall that for all $z$ we have

$$f_0(z) \ge f_0(x) + g^T (z - x)$$

(indeed, this is the definition of a subgradient). Therefore if $z$ satisfies $g^T (z - x) > 0$, then it also satisfies $f_0(z) > f_0(x)$, and so cannot be optimal (i.e., in $X$). In other words, we have

$$g^T (z - x) \le 0 \ \text{ for } z \in X,$$

and $g^T (z - x) = 0$ for $z = x$. This shows that

$$g^T (z - x) \le 0$$

is a (neutral) cutting-plane for (1) at $x$. This cutting-plane has a simple interpretation: in our search for an optimal point, we can remove the halfspace $\{z \mid g^T (z - x) > 0\}$ from consideration because all points in it have an objective value larger than that of the point $x$, and therefore cannot be optimal. This is illustrated in figure 3. We can generate a deep cut if we know a number $\bar f$ that satisfies

$$f_0(x) > \bar f \ge f^\star,$$

Figure 3: The curves show the level sets of a convex function $f_0$. In this example the optimal set $X$ is a singleton, the minimizer of $f_0$. The hyperplane given by $g^T (z - x) = 0$, with $g \in \partial f_0(x)$, separates the point $x$ (which lies on the hyperplane) from the optimal set $X$, hence defines a (neutral) cutting-plane. All points in the unshaded halfspace can be 'cut' since in that halfspace we have $f_0(z) \ge f_0(x)$.

where $f^\star = \inf_x f_0(x)$ is the optimal value of the problem (1). In this case we know that any optimal point $x^\star$ must satisfy

$$\bar f \ge f^\star \ge f_0(x) + g^T (x^\star - x),$$

so we have the deep cut

$$g^T (z - x) + f_0(x) - \bar f \le 0.$$

When the problem (1) is quasiconvex (i.e., when $f_0$ is quasiconvex), we can find a cutting-plane at $x$ by finding a nonzero quasigradient of $f_0$ at $x$. Essentially by definition, the inequality $g^T (z - x) \le 0$ is a cutting-plane when $g$ is a nonzero quasigradient of $f_0$ at $x$.
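The neutral and deep cuts for problem (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `objective_cut`, the test function $f_0(x) = |x_1| + 2|x_2|$, and the value `f_bar = 1.0` are hypothetical and not part of the notes. A cut is returned as a pair `(a, b)` meaning $a^T z \le b$.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def objective_cut(
    f0: Callable[[Vector], float],
    subgrad: Callable[[Vector], Vector],
    x: Vector,
    f_bar: Optional[float] = None,
) -> Optional[Tuple[Vector, float]]:
    """Return (a, b) describing the cut a^T z <= b, or None if x is optimal.

    Neutral cut: g^T (z - x) <= 0, i.e. a = g, b = g^T x.
    Deep cut (needs f_bar with f0(x) > f_bar >= f_star):
    g^T (z - x) + f0(x) - f_bar <= 0, i.e. b = g^T x - (f0(x) - f_bar).
    """
    g = subgrad(x)
    if all(gi == 0.0 for gi in g):
        return None  # zero subgradient: x minimizes f0
    b = sum(gi * xi for gi, xi in zip(g, x))
    if f_bar is not None and f0(x) > f_bar:
        b -= f0(x) - f_bar  # shift the hyperplane toward the optimal set
    return g, b


# Hypothetical test function: f0(x) = |x1| + 2|x2|, minimized at the origin.
def f0(x: Vector) -> float:
    return abs(x[0]) + 2.0 * abs(x[1])


def subgrad(x: Vector) -> Vector:
    # sign(0) = 0 is a valid subgradient of |.| at 0
    sign = lambda t: (t > 0) - (t < 0)
    return [float(sign(x[0])), 2.0 * sign(x[1])]


x = [3.0, -1.0]
a_neutral, b_neutral = objective_cut(f0, subgrad, x)
a_deep, b_deep = objective_cut(f0, subgrad, x, f_bar=1.0)
```

Here the neutral cut passes through the query point, while the deep cut uses the same normal vector with a smaller right-hand side, so the query point lies strictly inside the excluded halfspace.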

2.2 Feasibility problem

We consider the feasibility problem

$$\begin{array}{ll} \text{find} & x \\ \text{subject to} & f_i(x) \le 0, \quad i = 1, \ldots, m, \end{array}$$

where $f_i$ are convex. Here the target set $X$ is the feasible set. To find a cutting-plane for this problem at the point $x$ we proceed as follows. If $x$ is feasible, i.e., satisfies $f_i(x) \le 0$ for $i = 1, \ldots, m$, then $x \in X$. Now suppose $x$ is not feasible.

where $f_0, \ldots, f_m$ are convex. As above, the target set $X$ is the optimal set. Given the query point $x$, we first check for feasibility. If $x$ is not feasible, then we can construct a cut as

$$f_j(x) + g_j^T (z - x) \le 0, \tag{3}$$

where $f_j(x) > 0$ (i.e., $j$ is the index of any violated constraint) and $g_j \in \partial f_j(x)$. This defines a cutting-plane for the problem (2) since any optimal point must satisfy the $j$th inequality, and therefore the linear inequality (3). The cut (3) is called a feasibility cut for the problem (2), since we are cutting away a halfspace of points known to be infeasible (since they violate the $j$th constraint). Now suppose that the query point $x$ is feasible. Find a $g_0 \in \partial f_0(x)$. If $g_0 = 0$, then $x$ is optimal and we are done. So we assume that $g_0 \neq 0$. In this case we can construct a cutting-plane as

$$g_0^T (z - x) \le 0,$$

which we refer to as an objective cut for the problem (2). Here, we are cutting out the halfspace $\{z \mid g_0^T (z - x) > 0\}$ because we know that all such points have an objective value larger than that of $x$, hence cannot be optimal. We can also find a deep objective cut, by keeping track of the best objective value $f_{\text{best}}$, among feasible points, found so far. In this case we can use the cutting-plane

$$g_0^T (z - x) + f_0(x) - f_{\text{best}} \le 0,$$

since all points that violate it have objective value larger than $f_{\text{best}}$. (If $x$ is the best feasible point found so far, then $f_{\text{best}} = f_0(x)$, and this reduces to the neutral cut above.)
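The case analysis above (feasibility cut if some constraint is violated, otherwise a possibly deep objective cut) might be coded as the following oracle. It is a sketch under assumptions: the name `constrained_oracle`, the return convention `(kind, a, b)` for the cut $a^T z \le b$, and the small test problem (minimize $x_1 + x_2$ over the unit disk) are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Func = Callable[[Vector], float]
Subgrad = Callable[[Vector], Vector]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def constrained_oracle(
    x: Vector,
    f0: Func,
    g0: Subgrad,
    constraints: Sequence[Tuple[Func, Subgrad]],
    f_best: Optional[float] = None,
) -> Tuple[str, Vector, float]:
    """Return (kind, a, b) for the cut a^T z <= b at the query point x."""
    for fj, gj in constraints:
        val = fj(x)
        if val > 0:  # j-th constraint violated: feasibility cut (3)
            g = gj(x)
            return "feasibility", g, dot(g, x) - val
    g = g0(x)
    if all(gi == 0 for gi in g):
        return "optimal", g, 0.0
    fx = f0(x)
    # best feasible objective value seen so far, including x itself
    best = fx if f_best is None else min(f_best, fx)
    # deep objective cut; equals the neutral cut when best == f0(x)
    return "objective", g, dot(g, x) - (fx - best)


# Hypothetical problem: minimize x1 + x2 subject to x1^2 + x2^2 - 1 <= 0.
f0 = lambda x: x[0] + x[1]
g0 = lambda x: [1.0, 1.0]
f1 = lambda x: x[0] ** 2 + x[1] ** 2 - 1.0
g1 = lambda x: [2.0 * x[0], 2.0 * x[1]]

cut_infeasible = constrained_oracle([2.0, 0.0], f0, g0, [(f1, g1)])
cut_feasible = constrained_oracle([0.5, 0.5], f0, g0, [(f1, g1)], f_best=0.0)
```

In the second call `f_best=0.0` stands for the objective value of an earlier feasible query point (for instance the origin), which makes the objective cut deep.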

3 Localization algorithms

3.1 Basic cutting-plane and localization algorithm

We start with a set of initial linear inequalities

$$Cz \preceq d,$$

where $C \in \mathbf{R}^{q \times n}$, that are known to be satisfied by any point in the target set $X$. One common choice for this initial set of inequalities is the $\ell_\infty$-norm ball of radius $R$, i.e.,

$$-R \le z_i \le R, \quad i = 1, \ldots, n,$$

where $R$ is chosen large enough to contain $X$. At this point we know nothing more than

$$X \subseteq P_0 = \{z \mid Cz \preceq d\}.$$

Now suppose we have queried the oracle at points $x^{(1)}, \ldots, x^{(k)}$, none of which were announced by the oracle to be in the target set $X$. Then we have $k$ cutting-planes

$$a_i^T z \le b_i, \quad i = 1, \ldots, k,$$

Figure 5: Points $x^{(1)}, \ldots, x^{(k)}$, shown as dots, and the associated cutting-planes, shown as lines. From these cutting-planes we conclude that the target set $X$ (shown dark) lies inside the localization polyhedron $P_k$, shown lightly shaded. We can limit our search for a point in $X$ to $P_k$.

that separate $x^{(1)}, \ldots, x^{(k)}$ from $X$, respectively. Since every point in the target set must satisfy these inequalities, we know that

$$X \subseteq P_k = \{z \mid Cz \preceq d, \ a_i^T z \le b_i, \ i = 1, \ldots, k\}.$$

In other words, we have localized $X$ to within the polyhedron $P_k$. In our search for a point in $X$, we need only consider points in the localization polyhedron $P_k$. This is illustrated in figure 5. If $P_k$ is empty, then we have a proof that the target set $X$ is empty. If it is not, we choose a new point $x^{(k+1)}$ at which to query the cutting-plane oracle. (There is no reason to choose $x^{(k+1)}$ outside $P_k$, since we know that all target points lie in $P_k$.) If the cutting-plane oracle announces that $x^{(k+1)} \in X$, we are done. If not, the cutting-plane oracle returns a new cutting-plane, and we can update the localization polyhedron by adding the new inequality. This iteration gives the basic cutting-plane or localization algorithm:

Basic conceptual cutting-plane/localization algorithm

given an initial polyhedron $P_0 = \{z \mid Cz \preceq d\}$ known to contain $X$.
$k := 0$.
repeat
    Choose a point $x^{(k+1)}$ in $P_k$.
    Query the cutting-plane oracle at $x^{(k+1)}$.
    If the oracle determines that $x^{(k+1)} \in X$, quit.

balls, where $a_n$ is a constant that depends only on $n$. To specify one of these balls requires an index with $\lceil \log_2 N \rceil$ bits. This has form $c_n + \log \operatorname{vol}(C)$, where $c_n$ depends on $n$ and $\epsilon$. Using this measure of uncertainty, the log of the ratio of the volume of $P_k$ to the volume of $P_{k+1}$ is exactly the decrease in uncertainty. Volume arguments can be used to show convergence of (some) cutting-plane methods. In one standard method, we assume that the target set $X$ contains a ball $B_r$ of radius $r > 0$, and that $P_0$ is contained in some ball $B_R$ of radius $R$. We show that at each step of the cutting-plane method the volume of $P_k$ is reduced at least by some factor $\gamma < 1$. If the algorithm has not terminated in $k$ steps, then we have

$$\operatorname{vol}(B_r) \le \operatorname{vol}(P_k) \le \gamma^k \operatorname{vol}(P_0) \le \gamma^k \operatorname{vol}(B_R)$$

since $B_r \subseteq P_k$ and since $B_R \supseteq P_0$. Because the volume of a ball in $\mathbf{R}^n$ scales as the $n$th power of its radius, this gives $r^n \le \gamma^k R^n$. It follows that

$$k \le \frac{n \log(R/r)}{\log(1/\gamma)}.$$
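To get a feel for the size of this bound, it can be evaluated numerically. The snippet below is a small sketch; the function name `max_iterations` and the sample values ($n = 10$, $R/r = 100$, $\gamma = 0.5$) are assumptions chosen only for illustration.

```python
import math


def max_iterations(n: int, R: float, r: float, gamma: float) -> int:
    """Largest k allowed by k <= n log(R/r) / log(1/gamma)."""
    if not (0.0 < gamma < 1.0):
        raise ValueError("gamma must lie strictly between 0 and 1")
    if not (0.0 < r <= R):
        raise ValueError("need 0 < r <= R")
    return math.floor(n * math.log(R / r) / math.log(1.0 / gamma))


# Hypothetical values: dimension 10, uncertainty ratio R/r = 100,
# guaranteed volume reduction factor 1/2 per step.
k_max = max_iterations(n=10, R=100.0, r=1.0, gamma=0.5)
print(k_max)
```

The bound grows linearly in the dimension $n$ and only logarithmically in the ratio $R/r$.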

3.3 Choosing the query point

The cutting-plane algorithm described above is only conceptual, since the critical step, i.e., how we choose the next query point $x^{(k+1)}$ inside the current localization polyhedron $P_k$, is not fully specified. Roughly speaking, our goal is to choose query points that result in small localization polyhedra. We need to choose $x^{(k+1)}$ so that $P_{k+1}$ is as small as possible, or equivalently, the new cut removes as much as possible from the current polyhedron $P_k$. The reduction in size (say, volume) of $P_{k+1}$ compared to $P_k$ gives a measure of how informative the cutting-plane for $x^{(k+1)}$ is.

When we query the oracle at the point $x^{(k+1)}$, we do not know which cutting-plane will be returned; we only know that $x^{(k+1)}$ will be in the excluded halfspace. The informativeness of the cut, i.e., how much smaller $P_{k+1}$ is than $P_k$, depends on the direction $a_{k+1}$ of the cut, which we do not know before querying the oracle. This is illustrated in figure 6, which shows a localization polyhedron $P_k$ and a query point $x^{(k+1)}$, and two cutting-planes that could be returned by the oracle. One of them gives a large reduction in the size of the localization polyhedron, but the other gives only a small reduction in size.

Since we want our algorithm to work well no matter which cutting-plane is returned by the oracle, we should choose $x^{(k+1)}$ so that, no matter which cutting-plane is returned by the oracle, we obtain a good reduction in the size of our localization polyhedron. This suggests that we should choose $x^{(k+1)}$ to be deep inside the polyhedron $P_k$, i.e., it should be some kind of center of $P_k$. This is illustrated in figure 7, which shows the same localization polyhedron $P_k$ as in figure 6 with a more central query point $x^{(k+1)}$. For this choice of query point, we cut away a good portion of $P_k$ no matter which cutting-plane is returned by the oracle.

If we measure the informativeness of the $k$th cut using the volume reduction ratio $\operatorname{vol}(P_{k+1}) / \operatorname{vol}(P_k)$, we seek a point $x^{(k+1)}$ such that, no matter what cutting-plane is returned by the oracle, we obtain a certain guaranteed volume reduction. For a cutting-plane with normal vector $a$, the least informative is the neutral one, since a deep cut with the

Figure 6: A localization polyhedron $P_k$ and query point $x^{(k+1)}$, shown as a dot. Two possible scenarios are shown. Left. Here the cutting-plane returned by the oracle cuts a large amount from $P_k$; the new polyhedron $P_{k+1}$, shown shaded, is small. Right. Here the cutting-plane cuts only a very small part of $P_k$; the new polyhedron $P_{k+1}$ is not much smaller than $P_k$.

Figure 7: A localization polyhedron $P_k$ and a more central query point $x^{(k+1)}$ than in the example of figure 6. The same two scenarios, with different cutting-plane directions, are shown. In both cases we obtain a good reduction in the size of the localization polyhedron; even the worst possible cutting-plane at $x^{(k+1)}$ would result in $P_{k+1}$ substantially smaller than $P_k$.

4.1 Bisection method on R

We first describe a very important cutting-plane method: the bisection method. We consider the special case $n = 1$, i.e., a one-dimensional search problem. We will describe the traditional setting in which the target set $X$ is the singleton $\{x^*\}$, and the cutting-plane oracle always returns a neutral cut. The cutting-plane oracle, when queried with $x \in \mathbf{R}$, tells us either that $x^* \le x$ or that $x^* \ge x$. In other words, the oracle tells us whether the point $x^*$ we seek is to the left or right of the current point $x$. The localization polyhedron $P_k$ is an interval, which we denote $[l_k, u_k]$. In this case, there is an obvious choice for the next query point: we take $x^{(k+1)} = (l_k + u_k)/2$, the midpoint of the interval. The bisection algorithm is:

Bisection algorithm for one-dimensional search.

given an initial interval $[l, u]$ known to contain $x^*$; a required tolerance $r > 0$
repeat
    $x := (l + u)/2$.
    Query the oracle at $x$.
    If the oracle determines that $x^* \le x$, $u := x$.
    If the oracle determines that $x^* \ge x$, $l := x$.
until $u - l \le 2r$

In each iteration the localization interval is replaced by either its left or right half, i.e., it is bisected. The volume reduction factor is the best it can be: it is always exactly 1/2. Let $2R = u_0 - l_0$ be the length of the initial interval (i.e., $2R$ gives its diameter). The length of the localization interval after $k$ iterations is then $2^{-k} \cdot 2R$, so the bisection algorithm terminates after exactly

$$k = \lceil \log_2(R/r) \rceil \tag{4}$$

iterations. Since $x^*$ is contained in the final interval, we are guaranteed that its midpoint (which would be the next iterate) is no more than a distance $r$ from $x^*$. We can interpret $R/r$ as the ratio of the initial to final uncertainty. The equation (4) shows that the bisection method requires exactly one iteration per bit of reduction in uncertainty.

It is straightforward to modify the bisection algorithm to handle the possibility of deep cuts, and to check whether the updated interval is empty (which implies that $X = \emptyset$). In this case, the number $\lceil \log_2(R/r) \rceil$ is an upper bound on the number of iterations required.

The bisection method can be used as a simple method for minimizing a differentiable convex function on $\mathbf{R}$, i.e., carrying out a line search. The cutting-plane oracle only needs to determine the sign of $f'(x)$, which determines whether the minimizing set is to the left (if $f'(x) \ge 0$) or right (if $f'(x) \le 0$) of the point $x$.
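The line-search use of bisection can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function name `bisection_minimize` and the test function $f(x) = e^x - 2x$ on $[-4, 4]$ are assumptions. A `while` loop is used in place of repeat/until, so no query is made if the initial interval is already short enough.

```python
import math
from typing import Callable, Tuple


def bisection_minimize(
    fprime: Callable[[float], float], l: float, u: float, r: float
) -> Tuple[float, int]:
    """Bisection on [l, u] for a differentiable convex f, using only sign(f').

    Returns the midpoint of the final interval and the number of oracle queries.
    """
    iterations = 0
    while u - l > 2.0 * r:
        x = (l + u) / 2.0
        if fprime(x) >= 0.0:
            u = x  # a minimizer lies to the left of (or at) x
        else:
            l = x  # a minimizer lies to the right of x
        iterations += 1
    return (l + u) / 2.0, iterations


# Hypothetical example: f(x) = exp(x) - 2x, so f'(x) = exp(x) - 2 and x* = log 2.
x_hat, n_iter = bisection_minimize(lambda x: math.exp(x) - 2.0, -4.0, 4.0, 1e-3)
```

With $R = 4$ and $r = 10^{-3}$, equation (4) predicts $\lceil \log_2 4000 \rceil = 12$ iterations, which is what the loop performs.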

4.2 Center of gravity method

The center of gravity method, or CG algorithm, was one of the first localization methods proposed, by Newman [New65] and Levin [Lev65]. In this method we take the query point to be $x^{(k+1)} = \operatorname{cg}(P_k)$, where the center of gravity of a set $C \subseteq \mathbf{R}^n$ is defined as

$$\operatorname{cg}(C) = \frac{\int_C z \, dz}{\int_C dz},$$

assuming $C$ is bounded and has nonempty interior. The center of gravity is invariant under affine transformations, so the CG method is also affine-invariant. The center of gravity turns out to be a very good point in terms of the worst-case volume reduction factor: we always have

$$\frac{\operatorname{vol}(P_{k+1})}{\operatorname{vol}(P_k)} \le 1 - 1/e \approx 0.63.$$

In other words, the volume of the localization polyhedron is reduced by at least 37% at each step. Note that this guaranteed volume reduction is completely independent of all problem parameters, including the dimension $n$. This guarantee comes from the following result: suppose $C \subseteq \mathbf{R}^n$ is convex, bounded, and has nonempty interior. Then for any nonzero $a \in \mathbf{R}^n$, we have

$$\operatorname{vol}\left( C \cap \{z \mid a^T (z - \operatorname{cg}(C)) \le 0\} \right) \le (1 - 1/e) \operatorname{vol}(C).$$

In other words, a plane passing through the center of gravity of a convex set divides its volume almost equally: the volume division inequity can be at most in the ratio $(1 - 1/e) : 1/e$, i.e., about $1.72 : 1$. In the CG algorithm we have

$$\operatorname{vol}(P_k) \le (1 - 1/e)^k \operatorname{vol}(P_0) \approx 0.63^k \operatorname{vol}(P_0).$$
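The volume inequality can be checked numerically in the plane. The sketch below is only an illustration under assumptions: the triangle, the 360 sampled cut directions, and the helper names `area`, `centroid`, and `clip` are all hypothetical choices. It computes the centroid of a polygon, clips the polygon with halfplanes through the centroid, and records the largest retained area fraction.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def area(poly: List[Point]) -> float:
    """Shoelace formula; returns 0 for degenerate inputs."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def centroid(poly: List[Point]) -> Point:
    """Center of gravity of a simple polygon (uniform density)."""
    a = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        cross = x1 * y2 - x2 * y1
        a += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (3.0 * a), cy / (3.0 * a)


def clip(poly: List[Point], a: Point, b: float) -> List[Point]:
    """Intersect a convex polygon with the halfplane a^T z <= b."""
    out: List[Point] = []
    for p, q in zip(poly, poly[1:] + poly[:1]):
        fp = a[0] * p[0] + a[1] * p[1] - b
        fq = a[0] * q[0] + a[1] * q[1] - b
        if fp <= 0.0:
            out.append(p)
        if fp * fq < 0.0:  # edge crosses the cutting line
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


# Hypothetical convex set: a right triangle.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
c = centroid(triangle)
worst = 0.0
for i in range(360):
    theta = 2.0 * math.pi * i / 360.0
    a = (math.cos(theta), math.sin(theta))
    kept = clip(triangle, a, a[0] * c[0] + a[1] * c[1])
    worst = max(worst, area(kept) / area(triangle))
```

For this triangle the largest retained fraction over the sampled directions is about 5/9, attained by cuts parallel to a side, which is below the bound $1 - 1/e \approx 0.63$.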

Now suppose the initial polyhedron lies inside a Euclidean ball of radius $R$ (i.e., it has diameter $\le 2R$), and the target set contains a Euclidean ball of radius $r$. Then we have

$$\operatorname{vol}(P_0) \le \alpha_n R^n,$$

where $\alpha_n$ is the volume of the unit Euclidean ball in $\mathbf{R}^n$. Since $X \subseteq P_k$ for each $k$ (assuming the algorithm has not yet terminated) we have

$$\alpha_n r^n \le \operatorname{vol}(P_k).$$

Putting these together we see that

$$\alpha_n r^n \le (1 - 1/e)^k \alpha_n R^n,$$

4.4 Chebyshev center cutting-plane method

In the Chebyshev center cutting-plane method, due to Elzinga and Moore [EM75], the query point $x^{(k+1)}$ is taken to be the Chebyshev center of the current polyhedron $P_k$, i.e., the center of the largest Euclidean ball that lies inside it. This point can be computed by solving a linear program [BV04, §8.5.1]. Unlike the other methods described here, the Chebyshev center cutting-plane method is not affinely invariant. The Chebyshev center cutting-plane method can be strongly affected by problem scaling or affine transformations of coordinates.
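For reference, the linear program in question can be written out as follows. This is the standard Chebyshev center formulation; the generic description $P_k = \{z \mid a_i^T z \le b_i, \ i = 1, \ldots, m\}$, with all inequalities collected under one index, is an assumption made here for compactness. The variables are the center $x$ and the radius $\rho$ of the inscribed ball.

```latex
\begin{array}{ll}
\text{maximize}   & \rho \\
\text{subject to} & a_i^T x + \rho \, \|a_i\|_2 \le b_i, \quad i = 1, \ldots, m, \\
                  & \rho \ge 0.
\end{array}
```

Each constraint says that the ball $\{x + u \mid \|u\|_2 \le \rho\}$ lies in the halfspace $a_i^T z \le b_i$, since the supremum of $a_i^T u$ over the ball of radius $\rho$ is $\rho \|a_i\|_2$. The appearance of the Euclidean norms $\|a_i\|_2$ is also what makes the resulting center depend on the scaling of coordinates.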

4.5 Analytic center cutting-plane method

The analytic center cutting-plane method (ACCPM) uses as query point the analytic center of $P_k$, i.e., the solution of the problem

$$\text{minimize} \quad - \sum_{i=1}^{m_0} \log(d_i - c_i^T x) - \sum_{i=1}^{m_k} \log(b_i - a_i^T x),$$

with variable $x$, where

$$P_k = \{z \mid c_i^T z \le d_i, \ i = 1, \ldots, m_0, \ a_i^T z \le b_i, \ i = 1, \ldots, m_k\}.$$

(We have an implicit constraint that $x \in \operatorname{int} P_k$.) ACCPM was developed by Goffin and Vial [GV93] and analyzed by Nesterov [Nes95] and Atkinson and Vaidya [AV95]. ACCPM seems to give a good trade-off in terms of simplicity and practical performance. It will be described in much more detail in a separate set of notes.
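One way to compute an analytic center is a damped Newton method applied to the log-barrier objective above. The following sketch is restricted to $\mathbf{R}^2$ so that the Newton system can be solved in closed form; the function name `analytic_center_2d`, the backtracking parameters, the starting point, and the box example are assumptions, and all inequalities are passed in one list of rows. It is not the method of the separate ACCPM notes.

```python
import math
from typing import List, Sequence, Tuple


def analytic_center_2d(
    A: Sequence[Tuple[float, float]],
    b: Sequence[float],
    x0: Sequence[float],
    tol: float = 1e-10,
    max_iter: int = 100,
) -> List[float]:
    """Damped Newton for minimize -sum_i log(b_i - a_i^T x) with x in R^2.

    x0 must be strictly feasible and the polyhedron must be bounded.
    """
    def phi(x: Sequence[float]) -> float:
        total = 0.0
        for (a1, a2), bi in zip(A, b):
            s = bi - a1 * x[0] - a2 * x[1]
            if s <= 0.0:
                return math.inf  # outside the interior of the polyhedron
            total -= math.log(s)
        return total

    x = list(x0)
    for _ in range(max_iter):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for (a1, a2), bi in zip(A, b):
            s = bi - a1 * x[0] - a2 * x[1]
            g1 += a1 / s                # gradient: sum a_i / s_i
            g2 += a2 / s
            h11 += a1 * a1 / s ** 2     # Hessian: sum a_i a_i^T / s_i^2
            h12 += a1 * a2 / s ** 2
            h22 += a2 * a2 / s ** 2
        det = h11 * h22 - h12 * h12
        dx1 = (-g1 * h22 + g2 * h12) / det   # Newton step: H dx = -g
        dx2 = (g1 * h12 - g2 * h11) / det
        lam_sq = -(g1 * dx1 + g2 * dx2)      # squared Newton decrement
        if lam_sq / 2.0 <= tol:
            break
        t = 1.0
        # backtracking line search; also keeps the iterate strictly feasible
        while t > 1e-12 and phi([x[0] + t * dx1, x[1] + t * dx2]) > phi(x) - 0.25 * t * lam_sq:
            t *= 0.5
        x = [x[0] + t * dx1, x[1] + t * dx2]
    return x


# Hypothetical polyhedron: the box 0 <= x1 <= 2, 0 <= x2 <= 4.
A = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
b = [2.0, 0.0, 4.0, 0.0]
center = analytic_center_2d(A, b, x0=[0.5, 0.5])
```

For this box the analytic center is the geometric center $(1, 2)$. Unlike the Chebyshev center, the analytic center depends on the particular inequality description: repeating an inequality in the list pushes the center away from that face.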

5 Extensions

In this section we describe several extensions and variations on cutting-plane methods.

5.1 Multiple cuts

One simple extension is to allow the oracle to return a set of linear inequalities for each query, instead of just one. When queried at $x^{(k)}$, the oracle returns a set of linear inequalities which are satisfied by every $z \in X$, and which (together) separate $x^{(k)}$ and $X$. Thus, the oracle can return $A_k \in \mathbf{R}^{p_k \times n}$ and $b_k \in \mathbf{R}^{p_k}$, where $A_k z \preceq b_k$ holds for every $z \in X$, and $A_k x^{(k)} \not\prec b_k$. This means that at least one of the $p_k$ linear inequalities must be a valid cutting-plane by itself. The inequalities which are not valid cuts by themselves are called shallow cuts. It is straightforward to accommodate multiple cutting-planes in a cutting-plane method: at each iteration, we simply append the entire set of new inequalities returned by the oracle to our collection of valid linear inequalities for $X$. To give a simple example showing how multiple cuts can be obtained, consider the convex feasibility problem

$$\begin{array}{ll} \text{find} & x \\ \text{subject to} & f_i(x) \le 0, \quad i = 1, \ldots, m. \end{array}$$

In §2 we showed how to construct a cutting-plane at $x$ using any (one) violated constraint. We can obtain a set of multiple cuts at $x$ by using information from any set of inequalities, provided at least one is violated. From the basic inequality

$$f_j(z) \ge f_j(x) + g_j^T (z - x),$$

where $g_j \in \partial f_j(x)$, we find that every $z \in X$ satisfies

$$f_j(x) + g_j^T (z - x) \le 0.$$

If $x$ violates the $j$th inequality, this is a deep cut. If $x$ satisfies the $j$th inequality, this is a shallow cut, and can be used in a group of multiple cuts, as long as one neutral or deep cut is present. Common choices for the set of inequalities to use to form cuts are

  • the most violated inequality, i.e., $\operatorname{argmax}_j f_j(x)$,
  • any violated inequality (e.g., the first constraint found to be violated),
  • all violated inequalities,
  • all inequalities.
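The four selection rules listed above can be sketched in one small function. This is an illustration under assumptions: the name `feasibility_cuts`, the rule strings, and the three affine test constraints are hypothetical. Each cut is returned as a pair `(a, b)` meaning $a^T z \le b$, obtained from $f_j(x) + g_j^T (z - x) \le 0$.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Func = Callable[[Vector], float]
Subgrad = Callable[[Vector], Vector]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def feasibility_cuts(
    x: Vector,
    constraints: Sequence[Tuple[Func, Subgrad]],
    rule: str = "all_violated",
) -> List[Tuple[Vector, float]]:
    """Return cuts (a, b), a^T z <= b, at an infeasible query point x."""
    vals = [fj(x) for fj, _ in constraints]
    violated = [j for j, v in enumerate(vals) if v > 0]
    if not violated:
        return []  # x is feasible, so x is in the target set
    if rule == "most_violated":
        chosen = [max(violated, key=lambda j: vals[j])]
    elif rule == "first_violated":
        chosen = [violated[0]]
    elif rule == "all_violated":
        chosen = violated
    elif rule == "all":
        chosen = list(range(len(constraints)))  # includes shallow cuts
    else:
        raise ValueError("unknown rule: " + rule)
    cuts = []
    for j in chosen:
        g = constraints[j][1](x)
        cuts.append((g, dot(g, x) - vals[j]))
    return cuts


# Hypothetical constraints: x1 <= 1, x2 <= 1, -x1 - x2 <= 5.
cons = [
    (lambda x: x[0] - 1.0, lambda x: [1.0, 0.0]),
    (lambda x: x[1] - 1.0, lambda x: [0.0, 1.0]),
    (lambda x: -x[0] - x[1] - 5.0, lambda x: [-1.0, -1.0]),
]
x = [3.0, 2.0]
most = feasibility_cuts(x, cons, "most_violated")
violated_only = feasibility_cuts(x, cons, "all_violated")
everything = feasibility_cuts(x, cons, "all")
```

With the rule `"all"`, the third cut is shallow: the query point satisfies it, but it is still a valid inequality for the target set and may be appended alongside the two deep cuts.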

5.2 Dropping or pruning constraints

The computation required to find the new query point $x^{(k+1)}$ grows with the number of linear inequalities that describe $P_k$. This number, in turn, increases by one at each iteration (for a single cut) or more (for multiple cuts). For this reason most practical cutting-plane implementations include a mechanism for dropping or pruning the set of linear inequalities as the algorithm progresses. In the conservative approach, constraints are dropped only when they are known to be redundant. In this case dropping constraints does not change $P_k$, and the convergence analysis for the cutting-plane algorithm without pruning still holds. The progress, judged by volume reduction, is unchanged when we drop constraints that are redundant. To check if a linear inequality $a_i^T z \le b_i$ is redundant, i.e., implied by the other linear inequalities among $a_j^T z \le b_j$, $j = 1, \ldots, m$, we can solve the linear program

$$\begin{array}{ll} \text{maximize} & a_i^T z \\ \text{subject to} & a_j^T z \le b_j, \quad j = 1, \ldots, m, \ j \neq i. \end{array}$$

The linear inequality is redundant if and only if the optimal value is less than or equal to $b_i$. Solving a linear program to check redundancy of each inequality is usually too costly, and therefore not done. In some cases there are other methods that can identify (some) redundant constraints, with less computational effort.

$$E = \{Fu + g \mid \|u\|_2 \le 1\} \tag{5}$$

We start with the problem

$$\begin{array}{ll} \text{minimize} & f_0(x) \\ \text{subject to} & f_i(x) \le 0, \quad i = 1, \ldots, m, \end{array} \tag{6}$$

where $f_0, \ldots, f_m$ are convex. In the basic cutting-plane method outlined above, we take the variable to be $x$, and the target set $X$ to be the set of optimal points. Cutting-planes are found using the methods described in §2. Suppose instead we form the equivalent epigraph form problem

$$\begin{array}{ll} \text{minimize} & t \\ \text{subject to} & f_0(x) \le t \\ & f_i(x) \le 0, \quad i = 1, \ldots, m, \end{array} \tag{7}$$

with variables $x \in \mathbf{R}^n$ and $t \in \mathbf{R}$. We take the target set to be the set of optimal points for the epigraph problem (7), i.e.,

$$X = \{(x, f_0(x)) \mid x \text{ optimal for (6)}\}.$$

Let us show how to find a cutting-plane, in $\mathbf{R}^{n+1}$, for this version of the problem, at the query point $(x, t)$. First suppose $x$ is not feasible for the original problem, e.g., the $j$th constraint is violated. Every feasible point satisfies

$$0 \ge f_j(z) \ge f_j(x) + g_j^T (z - x),$$

where $g_j \in \partial f_j(x)$, so we can use the cut

$$f_j(x) + g_j^T (z - x) \le 0$$

(which doesn't involve the second variable). Now suppose that the query point $x$ is feasible. Evaluate a subgradient $g \in \partial f_0(x)$. If $g = 0$, then $x$ is optimal; otherwise, for any $(z, s) \in \mathbf{R}^{n+1}$ feasible for the problem (7), we have

$$s \ge f_0(z) \ge f_0(x) + g^T (z - x).$$

Since $x$ is feasible, $f_0(x) \ge p^\star$, which is the optimal value of the second variable. Thus, we can construct two cutting-planes in $(z, s)$:

$$f_0(x) + g^T (z - x) \le s, \qquad s \le f_0(x).$$
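The cuts for the epigraph form can be generated in the same style as for the original problem, with one extra coordinate for $s$. The sketch below is an assumption-laden illustration: the name `epigraph_cuts`, the triple convention `(a_z, a_s, b)` meaning $a_z^T z + a_s s \le b$, and the test problem (minimize $x_1 + x_2$ over the unit disk) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Func = Callable[[Vector], float]
Subgrad = Callable[[Vector], Vector]
Cut = Tuple[Vector, float, float]  # (a_z, a_s, b): a_z^T z + a_s * s <= b


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def epigraph_cuts(
    x: Vector, f0: Func, g0: Subgrad, constraints: Sequence[Tuple[Func, Subgrad]]
) -> List[Cut]:
    """Cuts in (z, s) at a query point whose first block is x."""
    for fj, gj in constraints:
        val = fj(x)
        if val > 0:
            g = gj(x)
            # feasibility cut; the coefficient of s is zero
            return [(g, 0.0, dot(g, x) - val)]
    g = g0(x)
    if all(gi == 0 for gi in g):
        return []  # zero subgradient at a feasible point: x is optimal
    fx = f0(x)
    return [
        (g, -1.0, dot(g, x) - fx),    # f0(x) + g^T (z - x) <= s
        ([0.0] * len(x), 1.0, fx),    # s <= f0(x)
    ]


# Hypothetical problem: minimize x1 + x2 subject to x1^2 + x2^2 - 1 <= 0.
f0 = lambda x: x[0] + x[1]
g0 = lambda x: [1.0, 1.0]
f1 = lambda x: x[0] ** 2 + x[1] ** 2 - 1.0
g1 = lambda x: [2.0 * x[0], 2.0 * x[1]]

cuts_feasible = epigraph_cuts([0.5, 0.5], f0, g0, [(f1, g1)])
cuts_infeasible = epigraph_cuts([2.0, 0.0], f0, g0, [(f1, g1)])
```

Note that the query value $t$ does not enter the computation: both objective cuts depend only on $x$, $f_0(x)$, and the subgradient.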

7 Lower bounds and stopping criteria

In this section we describe a general method for constructing a lower bound on the optimal value of a convex problem, assuming we have evaluated a subgradient of its objective and constraint functions at some points. The method involves solving a linear program using data collected from the subgradient evaluations. This method can be used in cutting-plane methods to give a non-heuristic stopping criterion. In the analytic center cutting-plane method, a lower bound based on this one can be computed very cheaply at each iteration.

Consider a convex function $f$. Suppose we have evaluated $f$ and a subgradient of $f$ at points $x^{(1)}, \ldots, x^{(q)}$. We have, for all $z$,

$$f(z) \ge f(x^{(i)}) + g^{(i)T} (z - x^{(i)}), \quad i = 1, \ldots, q,$$

and so

$$f(z) \ge \hat f(z) = \max_{i=1,\ldots,q} \left( f(x^{(i)}) + g^{(i)T} (z - x^{(i)}) \right).$$

The function $\hat f$ is a convex piecewise-linear global underestimator of $f$. Now suppose that we use a cutting-plane method to solve the problem

$$\begin{array}{ll} \text{minimize} & f_0(x) \\ \text{subject to} & f_i(x) \le 0, \quad i = 1, \ldots, m \\ & Cx \preceq d, \end{array} \tag{8}$$

where $f_i$ are convex. After $k$ steps, we have evaluated the objective or constraint functions, along with a subgradient, $k$ times (or more, if multiple cuts are used). As a result, we can form piecewise-linear approximations $\hat f_0, \ldots, \hat f_m$ of the objective and constraint functions. Now we form the problem

$$\begin{array}{ll} \text{minimize} & \hat f_0(x) \\ \text{subject to} & \hat f_i(x) \le 0, \quad i = 1, \ldots, m \\ & Cx \preceq d. \end{array} \tag{9}$$

Since the objective and constraint functions are piecewise-linear, this problem can be transformed to a linear program. Its optimal value is a lower bound on $p^\star$, the optimal value of the problem (8), since $\hat f_i(x) \le f_i(x)$ for all $x$ and $i = 0, \ldots, m$.

Computing this lower bound requires solving a linear program, and so is relatively expensive. In ACCPM, however, we can easily construct a lower bound on the problem (9), as a by-product of the analytic centering computation, which in turn gives a lower bound on the original problem (8).
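In one dimension the lower bound can be computed without a linear programming solver, which makes for a compact illustration. The sketch below assumes an unconstrained objective apart from an interval constraint $lo \le x \le hi$ (playing the role of $Cx \preceq d$); the name `pwl_lower_bound`, the function $f(x) = x^2$, and the sample points are hypothetical. It uses the fact that a convex piecewise-linear function on an interval attains its minimum at an endpoint or at an intersection of two of its affine pieces.

```python
from typing import Callable, List, Sequence, Tuple


def pwl_lower_bound(
    points: Sequence[float],
    f: Callable[[float], float],
    fprime: Callable[[float], float],
    lo: float,
    hi: float,
) -> float:
    """Minimize f_hat(z) = max_i f(x_i) + f'(x_i) (z - x_i) over [lo, hi]."""
    # each evaluation contributes one affine piece: slope * z + intercept
    lines: List[Tuple[float, float]] = [
        (fprime(xi), f(xi) - fprime(xi) * xi) for xi in points
    ]

    def f_hat(z: float) -> float:
        return max(m * z + c for m, c in lines)

    candidates = [lo, hi]
    for i, (m1, c1) in enumerate(lines):
        for m2, c2 in lines[i + 1:]:
            if m1 != m2:
                z = (c2 - c1) / (m1 - m2)  # intersection of the two pieces
                if lo <= z <= hi:
                    candidates.append(z)
    return min(f_hat(z) for z in candidates)


# Hypothetical data: f(x) = x^2 evaluated at three query points, on [-2, 2].
queries = [-1.0, 0.5, 2.0]
lower = pwl_lower_bound(queries, lambda x: x * x, lambda x: 2.0 * x, -2.0, 2.0)
f_best = min(x * x for x in queries)
gap = f_best - lower
```

Here the bound is $-0.5$ while the true optimal value is $0$, and the best value found so far is $0.25$. A cutting-plane method could stop once the gap `f_best - lower` falls below a chosen tolerance.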

Acknowledgments

Lin Xiao and Joëlle Skaf helped develop the material here.

References

[AV95] D. Atkinson and P. Vaidya. A cutting plane algorithm for convex programming that uses analytic centers. Mathematical Programming, 69:1–43, 1995.