Stochastic Optimization Methods: Overview and Challenges, Study notes of Physics

An introduction to stochastic optimization methods, which are used when deterministic methods fail due to large numbers of control variables, highly nonlinear problem functions, and multiple local optima. The notes cover common issues in measuring performance and archiving solutions, and introduce the concept of dissimilarity archiving. The algorithms discussed in this course all employ some form of random search.

Typology: Study notes

2010/2011

Uploaded on 09/07/2011

hawking
5R1 GTP


STOCHASTIC OPTIMIZATION METHODS

0. Introduction

There are a great many optimization methods. Most of these are deterministic methods with some formal mathematical basis, broadly falling into two categories:

Optimality Criteria

The mathematical conditions for an optimal solution are established and then either:

  • a candidate solution is tested to see if it meets the conditions, or
  • the equations derived from the optimality criteria are solved analytically to determine the optimal solution.

Search Methods

  • An initial trial solution is selected, either using common sense or at random, and the objective function is evaluated.
  • A move is made to a new point and the objective function is evaluated again. If it is smaller than the value for the first trial solution, it is retained and another move is made.
  • The process is repeated until the minimum is found.
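
The three steps above can be sketched in Python as follows. This is a minimal illustration, not the method of any particular algorithm in these notes: the function name `random_descent`, the uniform random move of size `step`, the fixed evaluation budget `max_evals` and the test function `sphere` are all assumptions made for the example.

```python
import random

def random_descent(f, x0, step=0.1, max_evals=1000, seed=0):
    """Minimise f by accepting a move only when it lowers the objective."""
    rng = random.Random(seed)          # seeded so the run is repeatable
    x_best = list(x0)                  # initial trial solution
    f_best = f(x_best)                 # first objective evaluation
    for _ in range(max_evals - 1):
        # Move to a new point (assumed rule: uniform perturbation of each variable).
        x_new = [xi + rng.uniform(-step, step) for xi in x_best]
        f_new = f(x_new)
        # Retain the new point only if the objective is smaller.
        if f_new < f_best:
            x_best, f_best = x_new, f_new
    return x_best, f_best

def sphere(x):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(xi * xi for xi in x)

x_min, f_min = random_descent(sphere, [2.0, -1.5])
print(x_min, f_min)
```

Because only improving moves are accepted, a search of this kind stops making progress at the first local optimum it reaches, which is one reason the difficulties listed below arise.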

Many search methods use first and second derivative (gradient and Hessian) information to direct the moves made.

Unfortunately, there are many optimization problems which cannot be satisfactorily solved using any deterministic optimization algorithms — inevitably many problems of practical interest fall into this category. The characteristics which make optimization problems difficult for deterministic algorithms are:

  • Large numbers of control variables and constraints;
  • Highly nonlinear problem functions (objective and constraints);
  • Multiple local optima.

When these systematic search methods fail, one must resort to less conventional search techniques. There are a wide variety of such techniques. Many of them employ some form of random or stochastic search. Collectively these ‘unconventional’ optimization techniques are known as Heuristic Methods, where in this context:

A heuristic optimization method is a technique which seeks good (i.e. near-optimal) solutions at reasonable computational cost without being able to guarantee optimality.

The algorithms discussed in this course all employ some form of random search.

1. Common Issues

1.1 Performance Measures

It is not necessarily easy to measure the performance of a stochastic optimization method because, unless exactly the same sequence of random numbers is used, the algorithm will not perform the same search on the same problem, even if given the same starting point. For this reason, before making any claims as to the performance of an algorithm on a given problem, several runs (at least 25, preferably 50 or more) should be made using different random number sequences (usually by specifying different seeds to the random number generator) and, if the starting point can affect the run, different initial solutions.
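
A hedged sketch of such a multi-run comparison is given below. The optimizer (`random_search`, pure uniform sampling inside assumed `bounds`), the objective `sphere` and the choice of summary statistics are illustrative assumptions; the only point taken from the text is that each of the 25 runs uses a different seed.

```python
import random
import statistics

def random_search(f, bounds, max_evals, seed):
    """Pure random search: best objective value found in max_evals samples."""
    rng = random.Random(seed)          # a different seed gives a different search
    best = float("inf")
    for _ in range(max_evals):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        best = min(best, f(x))
    return best

def benchmark(f, bounds, n_runs=25, max_evals=500):
    """Repeat the search with n_runs different seeds and summarise the results."""
    results = [random_search(f, bounds, max_evals, seed) for seed in range(n_runs)]
    return {
        "best": min(results),
        "worst": max(results),
        "mean": statistics.mean(results),
        "stdev": statistics.stdev(results),
    }

def sphere(x):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(xi * xi for xi in x)

summary = benchmark(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(summary)
```

Reporting the spread (worst value and standard deviation) alongside the best value shows how far a single run can mislead.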

There are different ways in which the algorithm performance can be measured:

  • If the optimal solution is known, then the length of (c.p.u.) time or the number of objective function evaluations required in order to locate the optimum can be used as a measure. As stochastic methods are not guaranteed to locate the global optimum, this is not a particularly helpful measure. Also, of course, in many (most) real-world problems, the optimal solution is not known a priori.
  • The value of the objective function for the best solution found after a specified length of (c.p.u.) time or a specified number of objective function evaluations is a more generally useful performance measure. Because it is in general true that the longer a stochastic search

Figure 1.1: Hypothetical Performance Curves. (Objective function plotted against evaluations; curves labelled A and B, with N marked on the plot.)

1.2.1 Best L Solutions

One obvious strategy is to store the best L solutions located (their control variable, objective function and constraint values), where L may be of the order of 25. This is easily implemented. The one potential disadvantage is that the best 25 solutions may well be very similar and therefore this ‘archive’ may not give very much information about the rest of the search space explored.
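
A minimal Python sketch of such an archive follows, assuming minimization. The class name `BestLArchive` and its `offer` method are hypothetical, and constraint values are omitted for brevity; they could be stored alongside each entry in the same way.

```python
import bisect

class BestLArchive:
    """Keep the L best (lowest objective) solutions seen so far."""

    def __init__(self, capacity=25):
        self.capacity = capacity
        self.entries = []              # (f_value, x) tuples, sorted by f_value

    def offer(self, x, f_value):
        """Try to archive x; return True if it was stored."""
        full = len(self.entries) == self.capacity
        if full and f_value >= self.entries[-1][0]:
            return False               # no better than the worst archived solution
        bisect.insort(self.entries, (f_value, tuple(x)))
        del self.entries[self.capacity:]   # drop the worst entry if over capacity
        return True

archive = BestLArchive(capacity=3)
for value in [5.0, 3.0, 4.0, 1.0, 2.0]:
    archive.offer([value], value)
print(archive.entries)
```

Nothing in this structure looks at the control variables, so all the stored solutions can come from the same small region of the search space.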

1.2.2 Best L Dissimilar Solutions

This disadvantage may be overcome by storing the best L solutions with a minimum level of dissimilarity. An obvious requirement is therefore a measure of dissimilarity between solutions. This is most readily defined in terms of the control variables. For instance, for continuous control variables, a simple measure of dissimilarity between two solutions $x_A$ and $x_B$ is:

$$D_{AB} = \sqrt{(x_A - x_B)^T (x_A - x_B)}, \qquad (1.1)$$

i.e. the Euclidean distance between the solutions in control variable space. If the individual control variables vary over significantly different ranges (for instance, $0\,\mathrm{m} \le x_1 \le 10\,\mathrm{m}$ and $0\,\mathrm{m} \le x_2 \le 0.001\,\mathrm{m}$), then it may be appropriate to rescale them so that within the optimization routine they can vary over the same range, e.g. $(0, 1)$ or $(-1, 1)$.

In addition, one needs to define two dissimilarity thresholds $D_{\min}$ and $D_{sim}$, which are used as follows:

  • Let the number of solutions in the archive be $l$, and let these be labelled $x_K$, $K = 1, \dots, l$. Let $x_J$ be a new solution, a candidate for archiving. Let $x_E$ be the archived solution $x_J$ most closely resembles, i.e. $D_{EJ} \le D_{KJ} \;\forall K = 1, \dots, l$. Let $x_G$ be the worst archived solution, i.e. $f(x_G) \ge f(x_K) \;\forall K = 1, \dots, l$.
  • If fewer than L solutions have been archived (i.e. the archive is not yet full), archive $x_J$ if it is sufficiently dissimilar to all the solutions archived: If $l < L$, archive $x_J$ if $D_{KJ} > D_{\min} \;\forall K = 1, \dots, l$. (1.2)
  • If the archive is full, archive $x_J$ if it is sufficiently dissimilar to all the solutions archived and better than the worst of these: If $l = L$, archive $x_J$ if $D_{KJ} > D_{\min} \;\forall K = 1, \dots, L$ and $f(x_J) < f(x_G)$. (1.3) (In this case $x_J$ replaces $x_G$ in the archive.)
  • If $x_J$ is not sufficiently dissimilar to the archived solutions, archive it if it is the best solution found so far: If $D_{KJ} < D_{\min}$ for some K, archive $x_J$ if $f(x_J) < f(x_K) \;\forall K = 1, \dots, l$, (1.4) or, if it is not the best solution found so far, archive it if it is sufficiently similar to and better than $x_E$: If $D_{KJ} < D_{\min}$ for some K, archive $x_J$ if $f(x_J) < f(x_E)$ and $D_{EJ} < D_{sim}$. (1.5) (In both cases $x_J$ replaces $x_E$ in the archive.)

Figures 1.2 and 1.3 show examples of this logic in action for the simple case of an optimization problem with just one control variable.

Using this logic a helpful picture of the search space can be built up. Obviously appropriate values of the dissimilarity thresholds $D_{\min}$ and $D_{sim}$ will be problem dependent. It is clear that $D_{sim} < D_{\min}$ and indeed $D_{sim}$ should probably be at least an order of magnitude smaller than $D_{\min}$.
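
The archiving rules (1.2) to (1.5) can be sketched in Python as below. This is one possible reading of the rules, not a reference implementation: the class name `DissimilarityArchive` and its `offer` method are hypothetical, the dissimilarity is the Euclidean distance of (1.1) without any rescaling, and a candidate whose nearest distance equals the minimum threshold exactly is treated as not sufficiently dissimilar, a case the rules leave open.

```python
import math

def distance(xa, xb):
    """Euclidean distance between two solutions, as in (1.1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xa, xb)))

class DissimilarityArchive:
    """Store up to `capacity` good solutions that are mutually dissimilar."""

    def __init__(self, capacity, d_min, d_sim):
        self.capacity = capacity       # L in the text
        self.d_min = d_min             # minimum dissimilarity threshold
        self.d_sim = d_sim             # similarity threshold, smaller than d_min
        self.entries = []              # list of (x, f_value) pairs

    def offer(self, x, f_value):
        """Apply rules (1.2) to (1.5) to candidate x; return True if archived."""
        x = tuple(x)
        if not self.entries:           # empty archive: (1.2) holds trivially
            self.entries.append((x, f_value))
            return True
        dists = [distance(x, xk) for xk, _ in self.entries]
        e = min(range(len(dists)), key=dists.__getitem__)   # index of x_E
        if dists[e] > self.d_min:      # dissimilar to every archived solution
            if len(self.entries) < self.capacity:            # rule (1.2)
                self.entries.append((x, f_value))
                return True
            # Index of the worst archived solution x_G.
            g = max(range(len(self.entries)), key=lambda k: self.entries[k][1])
            if f_value < self.entries[g][1]:                 # rule (1.3)
                self.entries[g] = (x, f_value)
                return True
            return False
        # Candidate lies within d_min of at least one archived solution.
        is_best = all(f_value < fk for _, fk in self.entries)          # rule (1.4)
        close_and_better = (dists[e] < self.d_sim
                            and f_value < self.entries[e][1])          # rule (1.5)
        if is_best or close_and_better:
            self.entries[e] = (x, f_value)   # x_J replaces x_E in both cases
            return True
        return False

archive = DissimilarityArchive(capacity=2, d_min=1.0, d_sim=0.1)
for x, f in [([0.0], 5.0), ([0.5], 6.0), ([3.0], 4.0), ([6.0], 4.5)]:
    print(x, f, archive.offer(x, f))
```

In the usage lines the second candidate is rejected because it is both close to and worse than an archived solution, while the fourth displaces the worst archived solution once the archive is full.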

Figure 1.2: Dissimilarity Archiving, $D_{\min}$ Threshold Met. (Two panels, "Archive Not Full" and "Archive Full", each plotting $f(x)$ against $x$ and annotated with the width $2D_{\min}$ and "Candidate solution archived"; the "Archive Full" panel also marks "Archive solution removed". Legend: solution in archive, candidate for archiving.)