
J Glob Optim (2008) 42:121–138
DOI 10.1007/s10898-007-9270-x

An inexact proximal point method for solving generalized fractional programs

Jean-Jacques Strodiot · Jean-Pierre Crouzeix · Jacques A. Ferland · Van Hien Nguyen

Received: 24 October 2007 / Accepted: 10 December 2007 / Published online: 15 January 2008
© Springer Science+Business Media, LLC. 2008

Abstract In this paper, we present several new implementable methods for solving a generalized fractional program with convex data. They are Dinkelbach-type methods where a prox-regularization term is added to avoid the numerical difficulties arising when the solution of the problem is not unique. In these methods, at each iteration a regularized parametric problem is solved inexactly to obtain an approximation of the optimal value of the problem. Since the parametric problem is nonsmooth and convex, we propose to solve it by using a classical bundle method where the parameter is updated after each 'serious step'. We mainly study two kinds of such steps, and we prove the convergence and the rate of convergence of each of the corresponding methods. Finally, we present some numerical experience to illustrate the behavior of the proposed algorithms, and we discuss the practical efficiency of each one.

Keywords Fractional programming · Dinkelbach algorithms · Proximal point methods · Bundle methods

AMS Classification 90C32 · 90C26

J.-J. Strodiot (corresponding author) · V. H. Nguyen
Department of Mathematics, University of Namur (FUNDP), Namur, Belgium

J.-P. Crouzeix
LIMOS, CNRS-UMR 6158, Université Blaise Pascal, Clermont-Ferrand, France

J. A. Ferland
DIRO, Université de Montréal, Montreal, QC, Canada


1 Introduction

Consider the generalized fractional programming problem

$$(P)\qquad \lambda^* = \inf_{x\in X}\Big\{\lambda(x) = \max_{1\le i\le p}\frac{f_i(x)}{g_i(x)}\Big\}$$

where $X \subseteq \mathbb{R}^n$ is nonempty, $f_i, g_i : X \to \mathbb{R}$ are continuous for all $1 \le i \le p$, and $g_i(x) > 0$ for all $x \in X$ and $1 \le i \le p$. We do not assume that $\lambda^*$ is finite nor that (P) has optimal solutions. Crouzeix et al. [4,5] proposed two Dinkelbach-type algorithms based on the idea of solving a sequence of auxiliary parametric problems having a simpler structure. These algorithms generate two sequences: the first converges to $\lambda^*$ and the second converges to a solution of (P), provided (P) has at least one solution. More precisely, they first consider the parametric problem

$$(P_\lambda)\qquad F(\lambda) = \inf_{x\in X} F(x,\lambda),$$

where $\lambda$ is a real parameter and

$$F(x,\lambda) = \max_{1\le i\le p}\{f_i(x) - \lambda g_i(x)\}. \tag{1}$$

In particular, they prove that if (P) has a solution, then $F(\lambda^*) = 0$, and that if $F(\lambda^*) = 0$, then (P) and $(P_{\lambda^*})$ have the same (possibly empty) set of solutions. The corresponding algorithm is as follows: given $x^k \in X$, first find $\lambda_k$ such that $F(x^k, \lambda_k) = 0$, and then find $x^{k+1}$, a solution of problem $(P_{\lambda_k})$. It is easy to see that

$$F(x^k,\lambda_k) = 0 \iff \lambda_k = \max_{1\le i\le p}\frac{f_i(x^k)}{g_i(x^k)}.$$
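The iteration just described can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions: the set X is replaced by a finite grid so that the parametric subproblem can be solved by enumeration, and the two ratios, the grid and the names `lam_of` and `dinkelbach` are hypothetical choices that do not come from [4].

```python
def lam_of(fs, gs, x):
    """lambda(x) = max_i f_i(x) / g_i(x)."""
    return max(f(x) / g(x) for f, g in zip(fs, gs))


def dinkelbach(fs, gs, candidates, x0, tol=1e-12, max_iter=100):
    """Dinkelbach-type iteration; the subproblem is solved by enumeration."""
    x, lam = x0, lam_of(fs, gs, x0)
    for _ in range(max_iter):
        # Parametric objective F(y, lam) = max_i { f_i(y) - lam * g_i(y) }.
        def F(y, lam=lam):
            return max(f(y) - lam * g(y) for f, g in zip(fs, gs))

        x_next = min(candidates, key=F)  # solve (P_lam) over the finite set
        if F(x_next) >= -tol:            # optimal value is 0: lam cannot decrease
            break
        x, lam = x_next, lam_of(fs, gs, x_next)
    return x, lam


# Hypothetical toy data: two ratios on a grid over X = [0, 2].
fs = [lambda x: x * x + 1.0, lambda x: 1.0 - x]
gs = [lambda x: x + 1.0, lambda x: 1.0]
grid = [i / 1000.0 for i in range(2001)]
x_best, lam_best = dinkelbach(fs, gs, grid, x0=1.0)
print(x_best, lam_best)
```

On a finite candidate set the loop stops as soon as the parametric optimal value is zero, which is exactly the optimality test $F(\lambda^*) = 0$ quoted above.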

In [4], it is proven that if (P) has a solution, if each subproblem $(P_{\lambda_k})$ has a solution, and if $g_i(x) \le \gamma$ for all $x \in X$ and $1 \le i \le p$, then the sequence $\{\lambda_k\}$ converges linearly to $\lambda^*$. Moreover, if $X$ is compact, every limit point of $\{x^k\}$ is a solution of (P). To obtain the superlinear convergence of the sequence $\{\lambda_k\}$, the algorithm has been modified in [5] by introducing a normalization vector parameter. The function $F(x,\lambda)$ defined in (1) has been replaced by the function

$$F(x,w,\lambda) = \max_{1\le i\le p}\Big\{\frac{f_i(x) - \lambda g_i(x)}{w_i}\Big\}$$

where $w \in \mathbb{R}^p$ and $w_i > 0$ for all $i$. The algorithm becomes: given $x^k \in X$ and $w_i^k > 0$, $i = 1,\dots,p$, find $\lambda_k$ such that $F(x^k, w^k, \lambda_k) = 0$. Then compute $x^{k+1}$ as a solution of problem $(P_{w^k,\lambda_k})$, where

$$(P_{w^k,\lambda_k})\qquad F_{w^k}(\lambda_k) = \inf_{x\in X} F(x, w^k, \lambda_k).$$

In [5] the authors use the specific normalization $w_i^k = g_i(x^k)$, $i = 1,\dots,p$, at iteration $k$ to prove that the sequence $\{\lambda_k\}$ converges superlinearly to $\lambda^*$ when $X$ is compact and the sequence $\{x^k\}$ converges to a solution of (P). If the solution of (P) is not unique (see an example in [8]), the solution of problem $(P_{w^k,\lambda_k})$ may also fail to be unique, which causes difficulties in the numerical solution of this problem. The case when $X$ is not compact is also a source of numerical difficulties. On the other hand, the performance of these methods heavily depends on the effective solution

the functions $f_i$ are convex, the functions $g_i$ are convex and $\lambda_0 = \lambda(x^0)$ is negative. In all these cases the function $F(\cdot, w^k, \lambda)$ is convex over $X$ for all $\lambda \in [\lambda^*, \lambda_0]$. This assumption is justified because in our algorithms, the sequence $\{\lambda_k\}$ will be nonincreasing and bounded below by $\lambda^*$. Given $(x^k, w^k, \lambda_k)$, the prox-regularization method consists in replacing the problem $\min_{x\in X} F(x, w^k, \lambda_k)$ by the problem

$$(P_{w^k,\lambda_k,\alpha_k})\qquad \min_{x\in X}\Big\{F(x, w^k, \lambda_k) + \frac{1}{2\alpha_k}\|x - x^k\|^2\Big\}$$

where $\alpha_k > 0$. In order to obtain an implementable algorithm, we only compute an approximate solution of this problem. In practice this is done by approximating, in problem $(P_{w^k,\lambda_k,\alpha_k})$, the nonsmooth convex function $F(\cdot, w^k, \lambda_k)$ by a convex function $\varphi(\cdot, w^k, \lambda_k)$ in such a way that the problem

$$(AP_{w^k,\lambda_k,\alpha_k})\qquad \min_{x\in X}\Big\{\varphi(x, w^k, \lambda_k) + \frac{1}{2\alpha_k}\|x - x^k\|^2\Big\}$$

is easier to solve exactly. The form of this function and how to construct it will be the subject of the next section. Here we only define the properties that the approximation $\varphi(\cdot, w^k, \lambda_k)$ of $F(\cdot, w^k, \lambda_k)$ must satisfy so that the sequence $\{\lambda_k\}$ converges to $\lambda^*$, the optimal value of (P), and the sequence $\{x^k\}$ converges to some solution of (P) if such a solution exists. The following approximation is classical in bundle methods.

Definition 2.1 Let $c \in (0,1)$ and let $w^k > 0$, $\lambda_k \ge \lambda^*$ and $x^k \in X$. A convex function $\varphi(\cdot, w^k, \lambda_k)$ is a c-approximation of $F(\cdot, w^k, \lambda_k)$ at $x^k$ if $\varphi(x, w^k, \lambda_k) \le F(x, w^k, \lambda_k)$ for all $x \in X$, and if

$$\varphi(x^{k+1}, w^k, \lambda_k) \ge \frac{1}{c}\,F(x^{k+1}, w^k, \lambda_k), \tag{3}$$

where $x^{k+1}$ is the solution of problem $(AP_{w^k,\lambda_k,\alpha_k})$.

Observe that if $\varphi(\cdot, w^k, \lambda_k)$ is a c-approximation of $F(\cdot, w^k, \lambda_k)$ at $x^k$, then at $x^{k+1}$, we can write

$$\frac{1}{c}\,F(x^{k+1}, w^k, \lambda_k) \le \varphi(x^{k+1}, w^k, \lambda_k) \le F(x^{k+1}, w^k, \lambda_k). \tag{4}$$

In particular, since $c \in (0,1)$, we have

$$\varphi(x^{k+1}, w^k, \lambda_k) \le F(x^{k+1}, w^k, \lambda_k) \le 0. \tag{5}$$
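The step from (4) to (5) can be spelled out as follows. The shorthand $F_+$ and $\varphi_+$ for the values of $F(\cdot, w^k, \lambda_k)$ and $\varphi(\cdot, w^k, \lambda_k)$ at $x^{k+1}$ is introduced only for this short derivation.

$$\frac{1}{c}F_+ \le \varphi_+ \le F_+ \;\Longrightarrow\; \Big(\frac{1}{c} - 1\Big)F_+ \le 0 \;\Longrightarrow\; F_+ \le 0 \quad\text{because } \frac{1}{c} - 1 > 0 \text{ for } c \in (0,1),$$

and the first inequality in (5) is just the right-hand inequality of (4).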

We can now summarize our general algorithm as follows:

Algorithm 2.1

  Step 0. Choose $x^0 \in X$, $w^0 > 0$, $\alpha_0 > 0$, $c \in (0,1)$ and set $\lambda_0 = \lambda(x^0)$.
  Step 1. At step $k$, we have $x^k$, $w^k$, $\alpha_k$ and $\lambda_k$. Then, construct a c-approximation of $F(\cdot, w^k, \lambda_k)$ and find $x^{k+1} \in X$, the unique solution of problem $(AP_{w^k,\lambda_k,\alpha_k})$. Set $\lambda_{k+1} = \lambda(x^{k+1})$, choose $w^{k+1} > 0$, $\alpha_{k+1} > 0$, set $k \leftarrow k + 1$, and go back to Step 1.
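The general scheme can be illustrated by a minimal Python sketch, under several assumptions that are not part of the algorithm statement: the problem is one-dimensional with X an interval, the model is taken as $\varphi = F$ (which trivially satisfies the c-approximation conditions because $F(x^{k+1}, w^k, \lambda_k) \le 0$), the normalization is $w_i^k = g_i(x^k)$, $\alpha_k$ is constant, and the regularized subproblem is solved by ternary search. The test functions and the names `argmin_convex` and `prox_dinkelbach` are hypothetical.

```python
import math


def argmin_convex(h, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function h on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if h(m1) <= h(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def prox_dinkelbach(fs, gs, lo, hi, x0, alpha=50.0, n_iter=30):
    """Prox-regularized iteration with phi = F and w_i^k = g_i(x^k) on [lo, hi]."""
    lam_of = lambda x: max(f(x) / g(x) for f, g in zip(fs, gs))
    x, lam = x0, lam_of(x0)
    history = [lam]
    for _ in range(n_iter):
        w = [g(x) for g in gs]  # normalization vector w^k

        def regularized(y, x=x, lam=lam, w=w):
            F = max((f(y) - lam * g(y)) / wi for f, g, wi in zip(fs, gs, w))
            return F + (y - x) ** 2 / (2.0 * alpha)  # prox term

        x = argmin_convex(regularized, lo, hi)  # x^{k+1}
        lam = lam_of(x)                         # lambda_{k+1}
        history.append(lam)
    return x, lam, history


# Hypothetical data: f_1 - lam*g_1 is convex and f_2 - lam*g_2 is affine.
fs = [lambda x: x * x + 1.0, lambda x: 1.0 - x]
gs = [lambda x: x + 1.0, lambda x: 1.0]
x_end, lam_end, lams = prox_dinkelbach(fs, gs, 0.0, 2.0, x0=1.0)
lam_ref = 2.0 * math.sqrt(2.0) - 2.0  # minimum of (x^2 + 1)/(x + 1), at x = sqrt(2) - 1
print(x_end, lam_end, lam_ref)
```

For this toy problem the second ratio is inactive at the minimizer, so the optimal value coincides with the minimum of the first ratio, which gives a closed-form reference value to compare against.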

First observe that $w^k$ does not intervene in the computation of $\lambda_k$ and that for any $w > 0$,

$$\lambda_{k+1} = \lambda(x^{k+1}) \iff F(x^{k+1}, w, \lambda_{k+1}) = 0.$$

Consequently at each iteration, we have that $F(x^k, w^k, \lambda_k) = 0$, and that a c-approximation of $F(\cdot, w^k, \lambda_k)$ at $x^k$ satisfies the property:

$$F(x^k, w^k, \lambda_k) - F(x^{k+1}, w^k, \lambda_k) \ge c\,\big[F(x^k, w^k, \lambda_k) - \varphi(x^{k+1}, w^k, \lambda_k)\big]. \tag{6}$$
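To see that (6) is just condition (3) rewritten, substitute $F(x^k, w^k, \lambda_k) = 0$; the shorthand $F_+$ and $\varphi_+$ for the values at $x^{k+1}$ is used only in this short check.

$$0 - F_+ \ge c\,[\,0 - \varphi_+\,] \iff F_+ \le c\,\varphi_+ \iff \varphi_+ \ge \frac{1}{c}F_+ .$$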

Observe that condition (6) can be interpreted as follows: the real decrease of $F$ when passing from $x^k$ to $x^{k+1}$ (the left-hand side) is greater than a fraction of the decrease predicted by the model $\varphi$ (the right-hand side). This kind of step, called a serious step, is used in the bundle methods for minimizing a nonsmooth convex function as well as in the trust region methods in nonlinear programming.

In order to study the convergence of the sequence $\{\lambda_k\}$, we introduce the following notations. For $x \in X$, $w > 0$, and $\lambda$, we define the sets

$$I(x) = \Big\{ i \;\Big|\; \frac{f_i(x)}{g_i(x)} = \lambda(x) \Big\}, \qquad J(x,\lambda) = \{\, j \mid f_j(x) - \lambda g_j(x) = F(x,\lambda) \,\},$$

$$J(x,w,\lambda) = \Big\{ j \;\Big|\; \frac{f_j(x) - \lambda g_j(x)}{w_j} = F(x,w,\lambda) \Big\}.$$

Proposition 2.1 Assume $c \in (0,1)$. Then the following results hold:

  1. the sequence $\{\lambda_k\}$ is nonincreasing and converges to some $\hat\lambda \ge \lambda^*$;
  2. if $\lambda^* > -\infty$ and if $g_i(x^k) \le \gamma$ and $w_i^k \ge \underline{w} > 0$ for all $k$ and $1 \le i \le p$, then $F(x^{k+1}, w^k, \lambda_k) \to 0$.

Proof

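  1. By definition of $F$, we have for all $1 \le i \le p$ that

$$F(x^{k+1}, w^k, \lambda_k) \ge \frac{1}{w_i^k}\big[f_i(x^{k+1}) - \lambda_k g_i(x^{k+1})\big]. \tag{8}$$

Since $f_{i^*}(x^{k+1}) = \lambda_{k+1}\, g_{i^*}(x^{k+1})$ for $i^* \in I(x^{k+1})$, we obtain, using (8) and (5), that

$$\frac{g_{i^*}(x^{k+1})}{w_{i^*}^k}\,[\lambda_{k+1} - \lambda_k] \le F(x^{k+1}, w^k, \lambda_k) \le 0. \tag{9}$$

Since $g_{i^*}(x^{k+1}) > 0$ and $w_{i^*}^k > 0$, it follows that $\lambda_{k+1} \le \lambda_k$. So $\lambda_k \to \hat\lambda \ge \lambda^*$ because $\lambda_k \ge \lambda^*$ for all $k$.

  2. If $\lambda^* > -\infty$, then $\hat\lambda > -\infty$ and $\lambda_{k+1} - \lambda_k \to 0$. Since, by assumption, there exist $\gamma > 0$ and $\underline{w} > 0$ such that for all $1 \le i \le p$ and all $k$, $g_i(x^k) \le \gamma$ and $w_i^k \ge \underline{w}$, and since $\lambda_{k+1} - \lambda_k \le 0$, it follows from (9) that

$$\frac{\gamma}{\underline{w}}\,(\lambda_{k+1} - \lambda_k) \le F(x^{k+1}, w^k, \lambda_k) \le 0. \tag{10}$$

Consequently, $F(x^{k+1}, w^k, \lambda_k) \to 0$ when $k \to \infty$.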

In order to prove that $\hat\lambda = \lambda^*$, i.e., that $\lambda_k \to \lambda^*$, we need the following lemma.

Lemma 2.1 Let $\{(x^k, w^k, \lambda_k)\}$ be the sequence generated by Algorithm 2.1. Then the following properties hold:

(i) for all $k$, one has $\varepsilon_k \equiv -\varphi(x^{k+1}, w^k, \lambda_k) - \alpha_k^{-1}\|x^{k+1} - x^k\|^2 \ge 0$;

Theorem 2.1 Let $c \in (0,1)$. Assume $0 < \nu \le g_i(x^k) \le \gamma$ and $0 < \underline{w} \le w_i^k \le \overline{w}$ for all $k$ and $1 \le i \le p$. Assume also that $\sum_{k\ge 0} \alpha_k = +\infty$ and that either $\alpha_k \le \alpha$ for all $k$ or $\alpha_k \le \alpha_{k+1}$ for all $k$. Then the sequence $\{\lambda_k\}$ generated by Algorithm 2.1 converges to $\lambda^*$, the optimal value of problem (P).

Proof Since, by Proposition 2.1, the sequence $\{\lambda_k\}$ converges to $\hat\lambda \ge \lambda^*$, it remains to prove that $\hat\lambda = \lambda^*$. If $\hat\lambda = -\infty$, then $\hat\lambda = \lambda^*$. So we can suppose that $\hat\lambda > -\infty$. Now let $x \in X$. Then, for $j \in J(x, w^k, \lambda_k)$, we have $F(x, w^k, \lambda_k) = (f_j(x) - \lambda_k g_j(x))/w_j^k$ and, since $\lambda(x) \ge f_j(x)/g_j(x)$, we obtain

$$F(x, w^k, \lambda_k) \le (\lambda(x) - \lambda_k)\, g_j(x)/w_j^k.$$

Then, by assumption,

$$F(x, w^k, \lambda_k) \le (\lambda(x) - \lambda_k)\,\nu/\overline{w} \quad \text{if } \lambda(x) - \lambda_k \le 0. \tag{12}$$

Now we prove that for all $x \in X$, we have

$$\limsup_{k\to\infty} F(x, w^k, \lambda_k) \ge 0. \tag{13}$$

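Suppose, to get a contradiction, that (13) is not true. Then there exist $\varepsilon > 0$, $\tilde x \in X$ and $k_\varepsilon$ such that

$$F(\tilde x, w^k, \lambda_k) < -\varepsilon \quad \text{for all } k \ge k_\varepsilon.$$

Then, for all $k \ge k_\varepsilon$, it follows from the second part of Lemma 2.1 with $x = \tilde x$ that

$$\|x^{k+1} - \tilde x\|^2 \le \|x^k - \tilde x\|^2 + 2\alpha_k\big[\varepsilon_k + (2\alpha_k)^{-1}\|x^{k+1} - x^k\|^2\big] - 2\alpha_k\varepsilon. \tag{14}$$

First assume that $\alpha_k \le \alpha$. Since $2\alpha_k\varepsilon > 0$, we deduce from the previous inequality that

$$\|x^{k+1} - \tilde x\|^2 \le \|x^k - \tilde x\|^2 + 2\alpha\big[\varepsilon_k + (2\alpha_k)^{-1}\|x^{k+1} - x^k\|^2\big].$$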

But, by Lemma 2.1, the series $\sum_{k\ge 1}\big[\varepsilon_k + (2\alpha_k)^{-1}\|x^{k+1} - x^k\|^2\big]$ is convergent and thus the sequence $\{\|x^k - \tilde x\|^2\}$ converges to some $u \ge 0$. Summing the inequality (14) from $k = k_\varepsilon$ to $k = q$, and using $\alpha_k \le \alpha$, we have

$$\|x^{q+1} - \tilde x\|^2 - \|x^{k_\varepsilon} - \tilde x\|^2 \le 2\alpha\sum_{k=k_\varepsilon}^{q}\big[\varepsilon_k + (2\alpha_k)^{-1}\|x^{k+1} - x^k\|^2\big] - 2\varepsilon\sum_{k=k_\varepsilon}^{q}\alpha_k.$$

Taking the limit as $q \to \infty$ and using the assumption $\sum_{k\ge 0}\alpha_k = +\infty$, we obtain that $u - \|x^{k_\varepsilon} - \tilde x\|^2 \le -\infty$, which is impossible. So (13) holds.

Assume now that $\alpha_k \le \alpha_{k+1}$ for all $k$. Then (14) implies that

$$(2\alpha_{k+1})^{-1}\|x^{k+1} - \tilde x\|^2 \le (2\alpha_k)^{-1}\|x^k - \tilde x\|^2 + \big[\varepsilon_k + (2\alpha_k)^{-1}\|x^{k+1} - x^k\|^2\big] - \varepsilon. \tag{15}$$

Since $\varepsilon > 0$ and the series $\sum_{k\ge 1}\big[\varepsilon_k + (2\alpha_k)^{-1}\|x^{k+1} - x^k\|^2\big]$ is convergent, it follows that the sequence $\{(2\alpha_k)^{-1}\|x^k - \tilde x\|^2\}$ converges to some $u \ge 0$. Summing the inequality (15) from $k = k_\varepsilon$ to $k = q$, we have

$$(2\alpha_{q+1})^{-1}\|x^{q+1} - \tilde x\|^2 - (2\alpha_{k_\varepsilon})^{-1}\|x^{k_\varepsilon} - \tilde x\|^2 \le \sum_{k=k_\varepsilon}^{q}\big[\varepsilon_k + (2\alpha_k)^{-1}\|x^{k+1} - x^k\|^2\big] - \varepsilon\,(q - k_\varepsilon).$$

Taking the limit as $q \to \infty$, we obtain that $u - (2\alpha_{k_\varepsilon})^{-1}\|x^{k_\varepsilon} - \tilde x\|^2 \le -\infty$, which is impossible. So (13) holds.

Let $x \in X$. First, if $\lambda(x) \ge \lambda_k$ for an infinite set of indices $k$, then $\lambda(x) \ge \hat\lambda$. Otherwise, $\lambda(x) < \lambda_k$ is true for all $k$ greater than some $k_0$. But then from (12), we have $F(x, w^k, \lambda_k) \le (\lambda(x) - \lambda_k)\,\nu/\overline{w}$ for all $k \ge k_0$. Taking the superior limit of both members and using (13), we deduce that $\lambda(x) \ge \hat\lambda$. So in both cases, we obtain that $\lambda(x) \ge \hat\lambda$. Since $x$ is arbitrary, we have that $\lambda^* \ge \hat\lambda$ and thus that $\lambda^* = \hat\lambda$.

Observe that problem (P) is not assumed to have a solution in order to get the convergence of the sequence $\{\lambda_k\}$. Moreover, the assumption $\sum_{k\ge 0}\alpha_k = +\infty$ is usual in the convergence theorems concerning the proximal point algorithms (see, for example, [15]). Here we impose, in addition, that the sequence $\{\alpha_k\}$ is either bounded above or nondecreasing. In particular, we can choose for $\{\alpha_k\}$ a constant sequence or a nondecreasing sequence converging to $+\infty$.

In the next theorem, we prove the convergence of the sequence $\{x^k\}$, but this time under the assumption that (P) has a solution. However, to prove this result, we need the following lemma (see e.g., [3]).

Lemma 2.2 Let $\bar z$ be a limit point of a sequence $\{z^k\}$ satisfying

$$\|z^{k+1} - \bar z\|^2 \le \|z^k - \bar z\|^2 + \delta_k$$

where $\{\delta_k\}$ is a sequence of nonnegative numbers such that $\sum_{k\ge 0}\delta_k < +\infty$. Then the whole sequence $\{z^k\}$ converges to $\bar z$.

Theorem 2.2 Assume that the assumptions of Theorem 2.1 are satisfied. Then

(i) any limit point of the sequence $\{x^k\}$ is a solution of (P);
(ii) if $\alpha_k \le \alpha$ for all $k$ and the solution set of problem (P) is nonempty, then the sequence $\{x^k\}$ converges to some solution of (P).

Proof

(i) Let $x^*$ be a limit point of the sequence $\{x^k\}$. Then $x^{n_k} \to x^*$ for some subsequence and, since $\lambda(x)$ is a continuous function, $\lambda(x^{n_k}) \to \lambda(x^*)$. But $\lambda(x^{n_k}) = \lambda_{n_k} \to \lambda^*$ (by Theorem 2.1). So $\lambda(x^*) = \lambda^*$ and $x^*$ is a solution of problem (P).

(ii) First we prove that the sequence $\{x^k\}$ is bounded. For that purpose, let $\bar x$ be a solution of problem (P). Then $F(\bar x, w^k, \lambda_k) \le 0$. Indeed, since $\lambda_k \ge \lambda^* = \lambda(\bar x) = \max_i f_i(\bar x)/g_i(\bar x)$, we have that $\max_i \{f_i(\bar x) - \lambda_k g_i(\bar x)\} \le 0$ and thus that $F(\bar x, w^k, \lambda_k) \le 0$. Now using the second part of Lemma 2.1 with $x = \bar x$, we obtain

$$\|x^{k+1} - \bar x\|^2 \le \|x^k - \bar x\|^2 + \|x^{k+1} - x^k\|^2 + 2\alpha_k\big[F(\bar x, w^k, \lambda_k) + \varepsilon_k\big] \le \|x^k - \bar x\|^2 + 2\alpha\big[\varepsilon_k + (2\alpha_k)^{-1}\|x^{k+1} - x^k\|^2\big].$$

Since the series $\sum_{k\ge 1}\big[\varepsilon_k + (2\alpha_k)^{-1}\|x^{k+1} - x^k\|^2\big]$ is convergent, it follows that the sequence $\{\|x^k - \bar x\|\}$ is convergent and thus that the sequence $\{x^k\}$ is bounded. Let $x^*$ be a limit point of the sequence $\{x^k\}$. By (i), $x^*$ is a solution of (P). Using again the second part of Lemma 2.1, but this time with $x = x^*$, we obtain

$$\|x^{k+1} - x^*\|^2 \le \|x^k - x^*\|^2 + 2\alpha\big[\varepsilon_k + (2\alpha_k)^{-1}\|x^{k+1} - x^k\|^2\big].$$

Since $x^*$ is a limit point of $\{x^k\}$ and since $\sum_{k\ge 1}\big[\varepsilon_k + (2\alpha_k)^{-1}\|x^{k+1} - x^k\|^2\big]$ is convergent, it follows from Lemma 2.2 that the whole sequence $\{x^k\}$ converges to $x^*$.

Writing $\lambda_{k+1} - \lambda_k = \lambda_{k+1} - \lambda^* + \lambda^* - \lambda_k$, we deduce from the previous inequality, after division by $\gamma/\underline{w}$, that

$$\Big[1 + c\Big(\frac{\underline{w}}{2\alpha_k\kappa} - \frac{\tau\,\underline{w}}{\gamma\,\overline{w}}\Big)\Big](\lambda_k - \lambda^*) \ge (\lambda_{k+1} - \lambda^*).$$

Since $\tau\,\underline{w} \le \gamma\,\overline{w}$, the coefficient of $(\lambda_k - \lambda^*)$ in the left-hand side is positive. It is strictly less than 1 when $\liminf_{k\to\infty}\alpha_k > (\gamma\,\overline{w})/(2\kappa\tau)$. Hence the linear convergence of the sequence $\{\lambda_k\}$ when $\alpha_k$ is sufficiently large.

Theoretically, since $c \in (0,1)$ and since $\frac{\underline{w}}{2\alpha_k\kappa} - \frac{\tau\,\underline{w}}{\gamma\,\overline{w}} < 0$, the best linear rate of convergence is attained when $c$ is near 1. But the closer $c$ is to 1, the more accurate the c-approximation is (it is exact when $c = 1$) and the more difficult it is to compute. Now to obtain the superlinear convergence of the sequence $\{\lambda_k\}$, we have to impose that the regularization parameter $\alpha_k$ tends to $+\infty$. We also have to assume a stronger condition on the c-approximating function $\varphi$. This is the subject of the next definition.

Definition 2.2 Let $c \in (0,1)$ and let $w^k > 0$, $\lambda_k \ge \lambda^*$ and $x^k \in X$. A convex function $\varphi(\cdot, w^k, \lambda_k)$ is a strong c-approximation of $F(\cdot, w^k, \lambda_k)$ at $x^k$ if $\varphi(x, w^k, \lambda_k) \le F(x, w^k, \lambda_k)$ for all $x \in X$ and if

$$F(x^{k+1}, w^k, \lambda_k) - \varphi(x^{k+1}, w^k, \lambda_k) \le \frac{1 - c}{\alpha_k}\,\|x^{k+1} - x^k\|^2, \tag{21}$$

where $x^{k+1}$ is the solution of problem $(AP_{w^k,\lambda_k,\alpha_k})$.

This definition is justified by the next proposition.

Proposition 2.2 Let $c \in (0,1)$. A strong c-approximation of $F(\cdot, w^k, \lambda_k)$ at $x^k$ is also a c-approximation of $F(\cdot, w^k, \lambda_k)$ at $x^k$.

Proof By definition of $x^{k+1}$, we have that $\alpha_k^{-1}(x^k - x^{k+1}) \in \partial[\varphi(\cdot, w^k, \lambda_k) + \psi_X](x^{k+1})$. So

$$\varphi(x^k, w^k, \lambda_k) - \varphi(x^{k+1}, w^k, \lambda_k) \ge \alpha_k^{-1}\|x^{k+1} - x^k\|^2. \tag{22}$$

Since, by assumption, $\varphi(x^k, w^k, \lambda_k) \le F(x^k, w^k, \lambda_k) = 0$, we obtain from the previous inequality and from (21) that

$$-\varphi(x^{k+1}, w^k, \lambda_k) \ge \frac{1}{1 - c}\big[F(x^{k+1}, w^k, \lambda_k) - \varphi(x^{k+1}, w^k, \lambda_k)\big],$$

i.e.,

$$\frac{c}{1 - c}\,\varphi(x^{k+1}, w^k, \lambda_k) \ge \frac{1}{1 - c}\,F(x^{k+1}, w^k, \lambda_k).$$

But this inequality is equivalent to (3).

Theorem 2.4 Assume that the solution set $X^*$ of problem (P) is nonempty, that the function $F(\cdot, \lambda^*)$ satisfies assumption (H), and that $\tau = \inf_{x\in X^*}\min_i g_i(x) > 0$. Assume also that at each iteration, $\varphi(\cdot, w^k, \lambda_k)$ is a strong c-approximation of $F(\cdot, w^k, \lambda_k)$ with $c > 1/2$ and that the sequence $\{x^k\}$ converges to some solution of (P). Then the sequence $\{\lambda_k\}$ converges superlinearly to $\lambda^*$ if $\alpha_k$ tends to $+\infty$ when $k \to \infty$, provided that at each iteration, $w_i^k$ is chosen equal to $\beta g_i(x^k)$ with $\beta > 0$.

Proof By definition of $F$ and $\lambda_{k+1}$, we have for $i \in I(x^{k+1}, \lambda_{k+1})$ that

$$F(x^{k+1}, w^k, \lambda_k) \ge \frac{f_i(x^{k+1}) - \lambda_k g_i(x^{k+1})}{w_i^k} = (\lambda_{k+1} - \lambda_k)\,\frac{g_i(x^{k+1})}{w_i^k}.$$

Since $\lambda_k \ge \lambda^*$ for all $k$, we can deduce from the previous inequality that

$$F(x^{k+1}, w^k, \lambda_k) \ge (\lambda_{k+1} - \lambda^*)\min_i \frac{g_i(x^{k+1})}{w_i^k} - (\lambda_k - \lambda^*)\max_i \frac{g_i(x^{k+1})}{w_i^k}. \tag{23}$$

Let $\tilde x^k \in X^*$ be such that $\|x^k - \tilde x^k\|^2 = d(x^k, X^*)$. By definition of $x^{k+1}$, we have

$$\varphi(\tilde x^k, w^k, \lambda_k) + (2\alpha_k)^{-1}\|\tilde x^k - x^k\|^2 \ge \varphi(x^{k+1}, w^k, \lambda_k) + (2\alpha_k)^{-1}\|x^{k+1} - x^k\|^2.$$

Since, by assumption, $\varphi(\tilde x^k, w^k, \lambda_k) \le F(\tilde x^k, w^k, \lambda_k)$, we obtain from the previous inequality and from (21) that

$$F(\tilde x^k, w^k, \lambda_k) + (2\alpha_k)^{-1}\|\tilde x^k - x^k\|^2 \ge F(x^{k+1}, w^k, \lambda_k) + \frac{2c - 1}{2\alpha_k}\,\|x^{k+1} - x^k\|^2,$$

and thus, since $c > 1/2$, that

$$F(\tilde x^k, w^k, \lambda_k) + (2\alpha_k)^{-1}\|\tilde x^k - x^k\|^2 \ge F(x^{k+1}, w^k, \lambda_k). \tag{24}$$

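Combining (24) and (23) yields

$$F(\tilde x^k, w^k, \lambda_k) + (2\alpha_k)^{-1}\|\tilde x^k - x^k\|^2 \ge (\lambda_{k+1} - \lambda^*)\min_i \frac{g_i(x^{k+1})}{w_i^k} - (\lambda_k - \lambda^*)\max_i \frac{g_i(x^{k+1})}{w_i^k}.$$

Then, using this inequality and the inequalities (19) and (20), we obtain

$$\Big[\max_i \frac{g_i(x^{k+1})}{w_i^k} - \min_i \frac{g_i(\tilde x^k)}{w_i^k} + \frac{\gamma}{2\alpha_k\kappa}\Big](\lambda_k - \lambda^*) \ge (\lambda_{k+1} - \lambda^*)\min_i \frac{g_i(x^{k+1})}{w_i^k}.$$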

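Thanks to (19), the sequences $\{x^k\}$ and $\{\tilde x^k\}$ converge to the same limit. Combining this with the choice $w_i^k = \beta g_i(x^k)$ for all $k$ and $1 \le i \le p$, we have

$$\max_i \frac{g_i(x^{k+1})}{w_i^k} \to \frac{1}{\beta}, \qquad \min_i \frac{g_i(\tilde x^k)}{w_i^k} \to \frac{1}{\beta} \qquad\text{and}\qquad \min_i \frac{g_i(x^{k+1})}{w_i^k} \to \frac{1}{\beta}$$

as $k \to \infty$. Hence $(\lambda_{k+1} - \lambda^*)/(\lambda_k - \lambda^*) \to 0$ as $k \to \infty$ because $\alpha_k \to +\infty$ as $k \to \infty$.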

This theorem must be compared with Theorem 2.2 of [5], where the superlinear convergence of the sequence $\{\lambda_k\}$ is also obtained. In that theorem, there are no regularization terms, i.e., $\alpha_k = +\infty$, and all the components $w_i^k$ of $w$ are equal to $g_i(x^k)$.

3 Building c-approximations

In order to obtain an implementable algorithm, we now have to indicate how to construct a c-approximation of $F(\cdot, w^k, \lambda_k)$ at $x^k$ such that the subproblem $(AP_{w^k,\lambda_k,\alpha_k})$ is easier to solve than problem $(P_{w^k,\lambda_k,\alpha_k})$. For the sake of simplicity, we denote the function $F(\cdot, w^k, \lambda_k)$ by $F^k$ and a (strong) c-approximation of $F^k$ at $x^k$ by $\varphi^k$. When $X$ is described by linear equalities and/or inequalities and when $\varphi^k$ is a piecewise linear convex function, it is very easy to see

Since $s(x^k) \in \partial F^k(x^k)$, condition (C1) is satisfied for $j = 1$. For the next models $\varphi_j^k$, $j = 2, \dots$, there exist several possibilities. A first example is to take, for $j = 1, 2, \dots$,

$$\varphi_{j+1}^k(y) = \max\{\, l_j^k(y),\; F^k(y_j^k) + \langle s(y_j^k),\, y - y_j^k\rangle \,\} \quad \forall y \in \mathbb{R}^n. \tag{27}$$

Conditions (C2) and (C3) are obviously satisfied, and condition (C1) is also satisfied because each linear piece of these functions lies below $F^k$. Another example is to take, for $j = 1, 2, \dots$,

$$\varphi_{j+1}^k(y) = \max_{0\le q\le j}\{\, F^k(y_q^k) + \langle s(y_q^k),\, y - y_q^k\rangle \,\} \quad \forall y \in \mathbb{R}^n, \tag{28}$$

where $y_0^k = x^k$. Since $s(y_q^k) \in \partial F^k(y_q^k)$ for $q = 0, \dots, j$ and since $\varphi_{j+1}^k \ge \varphi_j^k \ge l_j^k$, it is easy to see that conditions (C1)–(C3) are satisfied. Comparing (27) and (28), we can say that $l_j^k$ plays the same role as the $j$ linear functions $F^k(y_q^k) + \langle s(y_q^k),\, y - y_q^k\rangle$, $q = 0, \dots, j - 1$. This is the reason why the function $l_j^k$ is called the aggregate affine function (see, e.g., [3]). Now the algorithm to construct a strong c-approximation of $F^k$ at $x^k$ as well as the next iterate $x^{k+1}$ can be expressed as follows:

Algorithm 3.1 Let $x^k \in X$ and $c \in (1/2, 1)$. Set $j = 1$.

Step 1. Choose $\varphi_j^k$, a convex piecewise linear function that satisfies (C1)–(C3), and solve problem $(P_j^k)$ to get $y_j^k$.

Step 2. If $F^k(y_j^k) - \varphi_j^k(y_j^k) \le (1 - c)\,\alpha_k^{-1}\|y_j^k - x^k\|^2$, then set $x^{k+1} = y_j^k$, $j_k = j$ and STOP; the function $\varphi_{j_k}^k$ is a strong c-approximation of $F^k$ at $x^k$ and $x^{k+1}$ is the next iterate.

Step 3. Increase $j$ by 1 and go to Step 1.
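The three steps can be illustrated by a one-dimensional Python sketch. It is only a sketch under stated assumptions: the model is the full cutting-plane model (28), X is an interval, and the regularized subproblem is solved by ternary search instead of a quadratic programming solver. The test function and the names `argmin_convex`, `strong_c_approximation` and `subgrad` are hypothetical.

```python
def argmin_convex(h, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function h on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if h(m1) <= h(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def strong_c_approximation(F, subgrad, x, alpha, c, lo, hi, max_models=200):
    """Inner loop in 1-D: enrich a cutting-plane model until the Step 2 test holds."""
    cuts = [(x, F(x), subgrad(x))]  # triples (y_q, F(y_q), s(y_q)), with y_0 = x^k

    def model(y):
        # Piecewise linear model: maximum of the stored linearizations.
        return max(Fq + sq * (y - yq) for yq, Fq, sq in cuts)

    for j in range(1, max_models + 1):
        # Step 1: minimize model(y) + (y - x)^2 / (2 alpha) over [lo, hi].
        y = argmin_convex(lambda t: model(t) + (t - x) ** 2 / (2.0 * alpha), lo, hi)
        # Step 2: test for a strong c-approximation.
        if F(y) - model(y) <= (1.0 - c) / alpha * (y - x) ** 2:
            return y, model, j
        # Step 3: add the linearization at y and build the next model.
        cuts.append((y, F(y), subgrad(y)))
    raise RuntimeError("no strong c-approximation found within max_models")


# Hypothetical nonsmooth convex test function and one of its subgradients.
F = lambda y: abs(y - 1.0) + 0.5 * y * y
subgrad = lambda y: (1.0 if y > 1.0 else -1.0 if y < 1.0 else 0.0) + y
x_k, alpha_k, c = 3.0, 1.0, 0.9
y_next, phi, n_models = strong_c_approximation(F, subgrad, x_k, alpha_k, c, -5.0, 5.0)
print(y_next, n_models)
```

Keeping every linearization, as done here, is the simplest choice; the aggregate variant (27) would keep the model size bounded at the price of a slightly more involved update.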

A c-approximation can also be obtained by replacing in Step 2 the inequality by $\varphi_j^k(y_j^k) \ge c^{-1}F^k(y_j^k)$. As a strong c-approximation is also a c-approximation, it is immediate that if a strong c-approximation is obtained after finitely many iterations, then the same holds for the c-approximation. So we only consider strong c-approximations in the next theorem. Furthermore, the fact that $\varphi_j^k$ satisfies (C1)–(C3) means that $\varphi_j^k$ satisfies (C1) and, if $j \ge 2$, $\varphi_j^k$ satisfies (C2) and (C3) with $j + 1$ replaced by $j$.

Our aim is now to prove that if $x^k$ is not a minimum of $F^k$ and if the models $\varphi_j^k$, $j = 1, \dots$, satisfy (C1)–(C3), then there exists $j_k \in \mathbb{N}_0$ such that $\varphi_{j_k}^k$ is a strong c-approximation of $F^k$ at $x^k$, i.e., that the procedure stops in Step 2 after finitely many iterations.

Theorem 3.1 Suppose that the models $\varphi_j^k$, $j = 1, 2, \dots$, satisfy conditions (C1)–(C3), and let, for each $j$, $y_j^k$ be the unique solution of problem $(P_j^k)$. Let also $\bar x^k$ be the unique solution of problem $(P_{w^k,\lambda_k,\alpha_k})$. Then

(1) $F^k(y_j^k) - \varphi_j^k(y_j^k) \to 0$ and $y_j^k \to \bar x^k$ when $j \to +\infty$.

(2) If $x^k \ne \bar x^k$, then Algorithm 3.1 stops after finitely many iterations $j_k$ with $\varphi_{j_k}^k$ a strong c-approximation of $F^k$ at $x^k$ and with $x^{k+1} = y_{j_k}^k$.

(3) If $x^k = \bar x^k$, then $\lambda_k = \lambda^*$ and $x^k$ is a solution to problem (P).

Proof The proof of the first part is classical and can be found in ([3], Proposition 4.3). The second part is straightforward because the left-hand side of the inequality in Step 2 tends to zero while the right-hand side converges to the positive number $(1 - c)\,\alpha_k^{-1}\|\bar x^k - x^k\|^2$. Finally, if $x^k = \bar x^k$, then $F(x^k, \lambda_k) = 0$ and the conclusion follows from Theorem 2. in [5].

Inserting Algorithm 3.1 in Step 1 of Algorithm 2.1, we obtain the following algorithm.

Bundle Algorithm Choose $x^0 \in X$, $w^0 > 0$, $\alpha_0 > 0$, $c \in (1/2, 1)$ and set $\lambda_0 = \lambda(x^0)$, $y_0^0 = x^0$ and $k = 0$, $j = 1$.

Step 1. Choose a piecewise linear convex function $\varphi_j^k$ satisfying (C1)–(C3) and solve

$$(P_j^k)\qquad \min_{y\in X}\Big\{\varphi_j^k(y) + \frac{1}{2\alpha_k}\|y - x^k\|^2\Big\}$$

to obtain the unique optimal solution $y_j^k$.

Step 2. If $F^k(y_j^k) - \varphi_j^k(y_j^k) \le (1 - c)\,\alpha_k^{-1}\|y_j^k - x^k\|^2$, then set $x^{k+1} = y_j^k$, $y_0^{k+1} = x^{k+1}$, $\lambda_{k+1} = \lambda(x^{k+1})$, choose $w^{k+1} > 0$, $\alpha_{k+1} > 0$, increase $k$ by 1 and set $j = 0$.

Step 3. Increase $j$ by 1 and go to Step 1.

Another bundle algorithm is obtained by replacing in Step 2 the first inequality, which corresponds to a strong c-approximation, by the inequality $\varphi_j^k(y_j^k) \ge c^{-1}F^k(y_j^k)$, which corresponds to a c-approximation. To distinguish the two algorithms in the next section, we denote by B1 and B2 the bundle algorithms using the c-approximations and the strong c-approximations, respectively.

4 Numerical results

The computational experiments reported here were performed with MATLAB. The purpose is to compare the numerical behavior of the two new bundle methods B1 and B2 introduced in Sect. 3 with the prox-regularization method (denoted M) introduced in Sect. 1, where each parametric subproblem $(P_{w^k,\lambda_k,\alpha_k})$ is solved using a nonsmooth exact minimization procedure before updating the value of $\lambda_k$. Numerical results for method M are reported in [8] by Gugat. For this comparison, we consider a first set of test problems proposed in [7] (see also [1], p. 21).

Problem 4.1

$$\min_{x\in X}\;\max\Big\{\frac{4x_1^3 + 11x_2}{16x_1 + 4x_2},\; \frac{4x_1^2 - x_1}{3x_1 + x_2}\Big\}$$

where

$$X = \{\, x \in \mathbb{R}^2 \mid x_1 + x_2 \ge 1,\; 2x_1 + x_2 \le 4,\; x_1, x_2 \ge 0 \,\}$$

and the initial point is $x^0 = (1, 1)^T$.

Problem 4.2

$$\min_{x\in X}\;\max\Big\{\frac{3x_1 - 2x_2}{4x_1 + x_2},\; \frac{x_1}{3x_1 + x_2}\Big\}$$

where

$$X = \{\, x \in \mathbb{R}^2 \mid x_1 + x_2 \ge 1,\; 2x_1 + x_2 \le 4,\; x_1, x_2 \ge 0 \,\}$$

and the initial point is $x^0 = (1, 1)^T$.
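As a quick sanity check, the starting value $\lambda_0 = \lambda(x^0)$ of both test problems can be evaluated directly. The sketch below assumes that the ratios read $(4x_1^3 + 11x_2)/(16x_1 + 4x_2)$ and $(4x_1^2 - x_1)/(3x_1 + x_2)$ for the first problem, and $(3x_1 - 2x_2)/(4x_1 + x_2)$ and $x_1/(3x_1 + x_2)$ for the second; the names `problem_a`, `problem_b`, `lam` and `feasible` are hypothetical.

```python
def lam(ratios, x):
    """lambda(x) = max over the (numerator, denominator) pairs of num(x) / den(x)."""
    return max(num(x) / den(x) for num, den in ratios)


def feasible(x):
    """Membership test for X = {x1 + x2 >= 1, 2 x1 + x2 <= 4, x1, x2 >= 0}."""
    return x[0] + x[1] >= 1 and 2 * x[0] + x[1] <= 4 and x[0] >= 0 and x[1] >= 0


# First test problem (ratios as assumed in the lead-in).
problem_a = [
    (lambda x: 4 * x[0] ** 3 + 11 * x[1], lambda x: 16 * x[0] + 4 * x[1]),
    (lambda x: 4 * x[0] ** 2 - x[0], lambda x: 3 * x[0] + x[1]),
]
# Second test problem.
problem_b = [
    (lambda x: 3 * x[0] - 2 * x[1], lambda x: 4 * x[0] + x[1]),
    (lambda x: x[0], lambda x: 3 * x[0] + x[1]),
]

x0 = (1.0, 1.0)
lam0_a = lam(problem_a, x0)  # both ratios equal 15/20 = 3/4 at x0
lam0_b = lam(problem_b, x0)  # ratios are 1/5 and 1/4 at x0
print(feasible(x0), lam0_a, lam0_b)
```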

Table 2 Comparison of the three methods on randomly generated problems

| Problem | λ* | iter B1 | iter B2 | iter B3 | QP B1 | QP B2 | QP B3 | cpu B1 | cpu B2 | cpu B3 |
|---|---|---|---|---|---|---|---|---|---|---|
| n = 15, p = 20 | −0.534110 | 6 | 6 | 8 | 45 | 86 | 48 | 1.45 | 3.03 | 1.88 |
| n = 20, p = 20 | −0.325440 | 7 | 7 | 9 | 54 | 112 | 49 | 2.31 | 4.99 | 2.30 |
| n = 50, p = 50 | −0.092863 | 7 | 7 | 13 | 114 | 267 | 153 | 17.02 | 38.17 | 22.74 |
| n = 50, p = 100 | −0.097989 | 8 | 6 | 22 | 124 | 229 | 223 | 16.7 | 30.2 | 32.7 |
| n = 100, p = 100 | −0.083710 | 7 | 6 | 20 | 124 | 234 | 195 | 69.6 | 133.6 | 115.2 |
| n = 100, p = 150 | −0.096492 | 7 | 7 | 25 | 146 | 301 | 301 | 86.1 | 179.8 | 190.4 |

$g_i(x) = c_i^T x + d_i$ in the denominators. The parameters of these functions are generated as follows:

  1. The Hessian matrix $G_i$ is given by $G_i = L_i D_i L_i^T$ where $L_i$ is a unit lower triangular matrix with components randomly generated in $[-2.5, 2.5]$ and $D_i$ is a positive diagonal matrix with components randomly generated in $[0.1, 1.6]$. In order to generate a positive semidefinite Hessian, the first element of $D_i$ is set to zero.
  2. The components of the vectors $a_i$ and $c_i$ are randomly generated in $[-15, 45]$ and $[0, 10]$, respectively.
  3. The real numbers $b_i$ and $d_i$ are also randomly generated in $[-30, 0]$ and $[1, 5]$, respectively.
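The generation recipe above can be sketched in plain Python. The sketch assumes uniform sampling on each interval (the text only says "randomly generated") and uses lists of lists instead of a matrix library; the function names and the seed are hypothetical.

```python
import random


def random_psd_hessian(n, rng):
    """G = L D L^T with L unit lower triangular and D diagonal, D[0] = 0."""
    L = [[0.0] * n for _ in range(n)]
    for r in range(n):
        L[r][r] = 1.0
        for col in range(r):
            L[r][col] = rng.uniform(-2.5, 2.5)
    d = [rng.uniform(0.1, 1.6) for _ in range(n)]
    d[0] = 0.0  # makes G positive semidefinite but singular
    return [[sum(L[r][k] * d[k] * L[s][k] for k in range(n)) for s in range(n)]
            for r in range(n)]


def random_ratio_data(n, rng):
    """Data (G, a, b, c, d) for one ratio, following items 1 to 3 of the list."""
    return {
        "G": random_psd_hessian(n, rng),
        "a": [rng.uniform(-15.0, 45.0) for _ in range(n)],
        "c": [rng.uniform(0.0, 10.0) for _ in range(n)],
        "b": rng.uniform(-30.0, 0.0),
        "d": rng.uniform(1.0, 5.0),
    }


rng = random.Random(0)  # arbitrary seed, for reproducibility only
data = random_ratio_data(5, rng)
G = data["G"]


def quad_form(G, x):
    """x^T G x, used here to check positive semidefiniteness on samples."""
    n = len(x)
    return sum(x[r] * G[r][s] * x[s] for r in range(n) for s in range(n))


samples = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(100)]
print(min(quad_form(G, x) for x in samples))
```

Since $c_i \ge 0$ and $d_i \ge 1$, the denominators $c_i^T x + d_i$ are positive on any feasible set contained in the nonnegative orthant, which is what the ratio structure requires.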

Moreover, the following feasible set is used for all the test problems:

$$X = \Big\{\, x \in \mathbb{R}^n \;\Big|\; \sum_{j=1}^{n} x_j \le 1,\; 0 \le x_j \le 1,\; j = 1, \dots, n \,\Big\}$$

and the initial feasible point is $x^0 = (1/n, \dots, 1/n)$. Finally, the parameters $c = 0.9$, $\alpha_k = 50$, and $w_i^k = g_i(x^k)$ for all $k$ and $1 \le i \le p$ are used. The results are summarized in Table 2. For each problem the three methods give the same optimal value for $\lambda^*$. For these larger problems, we observe the same behavior as previously for B1 and B2.

When the method B3 is compared with the method B1, we note that, as expected, more iterations are required. Furthermore, even if the number of quadratic problems solved at each iteration of B3 is in general smaller than for B1, the total number of quadratic problems solved in B3 is larger (except for problem $n = 20$, $p = 20$), so that the cpu time is also larger than for B1.

Now comparing methods B1 and B2, the results indicate that the first one is faster than the second one. Although the number of iterations for B2 is less than or equal to the number of iterations of B1, the cpu time used by B2 is larger since more quadratic problems have to be solved to reach an inexact solution for a strong c-approximation of $F^k$. In conclusion, the numerical results indicate that the method B1 seems to be the fastest amongst the methods studied in this paper for solving problem (P).

5 Conclusions and perspectives

We have shown that generalized fractional programs with convex data can be solved more efficiently by using an inexact proximal point method rather than an exact one. The strategy is to add a regularization term to the parametric function $F(\cdot, w^k, \lambda_k)$ and to introduce implementable criteria to decide when to stop the minimization of this regularized function in order to update the parameter $\lambda_k$. Since the subproblems are nonsmooth convex problems, we propose to use a classical bundle method where the parameter is updated after each "serious step". Two sequences are obtained, $\{\lambda_k\}$ and $\{x^k\}$, converging to the optimal value and to some solution of (P), respectively. This procedure is particularly interesting when several solutions exist for problem (P). Finally, some numerical tests on randomly generated fractional problems indicate that the method using the c-approximations of $F(\cdot, w^k, \lambda_k)$ seems to be the most efficient.

We conjecture that the efficiency of the method could be improved by using the information at the end of an iteration to obtain a good starting c-approximation at the next iteration. This should be the subject of a future investigation.

In this paper we have assumed that the functions $f_i - \lambda g_i$ are convex for all $\lambda \in [\lambda^*, \lambda_0]$. But for several practical problems, this assumption is not true and the max-function $F(\cdot, w, \lambda)$ is no longer convex. So there is a need to consider the nonconvex case. To take this situation into account, several approaches have been proposed in the literature. One of them is to approximate the nonsmooth function by using an entropic regularization method (see, for example, [1,19]). Another way to deal with this difficulty is to adapt the proximal point method developed in this paper to the nonconvex case.

In that direction, recent research on proximal point methods (see, for example, [11]) has shown that for solving nonconvex optimization problems with this method, it is crucial, in order to get convergence of the iterates to a stationary point, that the proximal subproblems remain convex. In our situation, the subproblem $(P_{w,\lambda,\alpha})$ may remain convex even if the function $F(\cdot, w, \lambda)$ is nonconvex. For example, when $F(\cdot, w, \lambda)$ is a lower-$C^2$ function ([16], Def. 10.29, p. 447), it is possible to add to this function a quadratic term of the form $(1/2\alpha)\|\cdot\|^2$ such that the resulting function becomes convex ([16], Theorem 10.33, p. 450). In our setting, if all the functions $f_i$ and $g_i$, $i = 1, \dots, p$, are differentiable and if for each $i$ the gradient $\nabla f_i - \lambda\nabla g_i$ is Lipschitz continuous with a constant $L_i$, then the function $F(\cdot, w, \lambda) + (1/2\alpha)\|\cdot\|^2$ is strongly convex for $\alpha < [\max_i\{L_i/w_i\}]^{-1}$ ([11], Proposition 1). In other words, if $\alpha$ is sufficiently small, the problem $(P_{w,\lambda,\alpha})$ is strongly convex and consequently has a unique solution.

Another crucial issue is the design of an efficient method for computing the solution of problem $(P_{w,\lambda,\alpha})$ when $F(\cdot, w, \lambda)$ is nonconvex. Several bundle methods have been proposed in the literature for solving this problem when the function $F(\cdot, w, \lambda)$ is locally Lipschitz (see, for example, [6,12,20]). In these methods, the function $F(\cdot, w, \lambda)$ is approximated by a piecewise linear convex function (to obtain again a convex quadratic subproblem) built step by step by using the Clarke generalized gradient [2] instead of the usual subdifferential. However, due to the nonconvexity of $F(\cdot, w, \lambda)$, these approximations are only appropriate in a neighborhood of the current point $x^k$, with the consequence that either a linesearch or a trust region strategy must be applied for finding the next point $x^{k+1}$. More recently, by means of variational analysis [16], Hare et al. [9] presented a new methodology for solving the subproblems based on the computation of proximal points of piecewise linear models of the nonconvex function. Convergence of the method is proven for the class of nonconvex functions that are prox-bounded and lower-$C^2$. From all these comments concerning nonconvex optimization problems, it follows that it is reasonable to think that the proximal point