C12.1B: CONTINUOUS OPTIMISATION
LECTURE 1: INTRODUCTION
RAPHAEL HAUSER
MATHEMATICAL INSTITUTE, UNIVERSITY OF OXFORD

  1. The Central Subject of this Course. The engineer who designs an aircraft with minimal drag given the required lift force, the manager who maximises profit within constraints imposed by the available resources, the bicycle courier who seeks the shortest path between two points in a city, and your cup of tea that cools down to maximise the entropy in the universe all solve optimisation problems! The world is full of them. Mathematically, we can formulate an important class of such problems as follows:

$$
\begin{aligned}
(P)\qquad \min_{x\in\mathbb{R}^n}\quad & f(x)\\
\text{s.t.}\quad & g_i(x) \ge 0, \quad (i = 1,\dots,p),\\
& h_j(x) = 0, \quad (j = 1,\dots,q),
\end{aligned}
$$

where $f$, $g_i$ and $h_j$ are sufficiently smooth functions: typically we require them to be twice continuously differentiable. The function $f$ represents an objective (such as energy, cost, etc.) that has to be minimised under side constraints defined by the functions $g_i$ and $h_j$. We therefore call $f$ the objective function, the functions $g_i$ the inequality constraint functions, and the functions $h_j$ the equality constraint functions of (P). Note that by replacing $f$ by $-f$ we can of course treat maximisation problems in the same framework.
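To make the roles of $f$, $g_i$ and $h_j$ concrete, here is a minimal Python sketch (our illustration, not part of the notes) that stores an instance of (P) as callables and turns a maximisation problem into (P) by negating the objective. The class `Problem`, the helper `maximisation_as_minimisation` and the tolerance `tol` are hypothetical; a tolerance is needed because equality constraints are rarely satisfied exactly in floating-point arithmetic.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Vector = Sequence[float]


@dataclass
class Problem:
    """Problem (P): minimise f(x) subject to g_i(x) >= 0 and h_j(x) = 0."""
    f: Callable[[Vector], float]
    ineq: List[Callable[[Vector], float]] = field(default_factory=list)  # the g_i
    eq: List[Callable[[Vector], float]] = field(default_factory=list)    # the h_j

    def is_feasible(self, x: Vector, tol: float = 1e-9) -> bool:
        # g_i(x) >= 0 and h_j(x) = 0, both checked up to the tolerance tol.
        return (all(g(x) >= -tol for g in self.ineq)
                and all(abs(h(x)) <= tol for h in self.eq))


def maximisation_as_minimisation(objective, ineq=(), eq=()) -> Problem:
    """Maximising objective is the same as minimising -objective."""
    return Problem(f=lambda x: -objective(x), ineq=list(ineq), eq=list(eq))


# Hypothetical instance: maximise x1 + x2 on the unit disc, subject to x1 = x2.
prob = maximisation_as_minimisation(
    lambda x: x[0] + x[1],
    ineq=[lambda x: 1.0 - x[0] ** 2 - x[1] ** 2],
    eq=[lambda x: x[0] - x[1]],
)
print(prob.is_feasible([0.5, 0.5]))  # True
print(prob.is_feasible([1.0, 1.0]))  # False: outside the disc
print(prob.f([0.5, 0.5]))            # -1.0
```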

Example 1.1 (Linear Programming). The transshipment problem occurs when the cheapest way of shipping prescribed amounts of a commodity across a transportation network has to be determined. This can be a network of oil pipelines, a computer network, a network of shipping lanes, a road network, etc.

A network of gas pipelines is given in Figure 1.1. An arrow from node $i$ to node $j$ represents a pipe with transport capacity $c_{ij}$ in the given direction. Transporting one unit of gas along the edge $(ij)$ costs $d_{ij}$.

Fig. 1.1. Gas pipeline network (nodes 1 to 6).

The amount of gas produced at node $i$ is $p_i$, and the amount of gas consumed is $q_i$. We assume that the total amount of gas consumed equals the total amount of gas produced (if this assumption were not true, we could construct an equivalent transshipment problem that has this property). How should the quantities $x_{ij}$ of gas shipped along the edges $(ij)$ be chosen so as to satisfy all the demands and to minimise costs? We set $c_{ij} = 0$ (and let $d_{ij}$ be arbitrary numbers) for all edges $(ij)$ that do not exist. Doing so, we can assume that the network is a complete graph. The problem we have to solve is the following:

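$$
\begin{aligned}
\min_{x}\quad & \sum_{i,j=1}^{6} d_{ij}\,x_{ij}\\
\text{s.t.}\quad & \sum_{k=1}^{6} x_{ki} + p_i = \sum_{j=1}^{6} x_{ij} + q_i, \quad (i = 1,\dots,6),\\
& 0 \le x_{ij} \le c_{ij}, \quad (i,j = 1,\dots,6).
\end{aligned}
$$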

This is an example of a linear programming problem, as the objective function and all the constraint functions are linear. Note that it is not a priori clear that this problem has feasible solutions. One is therefore interested in algorithms that not only find optimal LP solutions when these exist but also detect when a problem instance is infeasible!
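A shipment plan can be checked against this LP directly. The sketch below assumes a hypothetical three-node network (nodes indexed from 0, not the six-node network of Figure 1.1) with made-up data; the function `transshipment_check` only verifies the constraints and evaluates the cost of a given plan. It does not search for an optimal plan, which would require an LP solver.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def transshipment_check(x: Matrix, d: Matrix, c: Matrix,
                        p: List[float], q: List[float],
                        tol: float = 1e-9) -> Tuple[bool, float]:
    """Return (feasible, cost) for a candidate shipment plan x[i][j]."""
    nodes = range(len(p))
    # Objective: sum over i, j of d_ij * x_ij.
    cost = sum(d[i][j] * x[i][j] for i in nodes for j in nodes)
    # Bounds 0 <= x_ij <= c_ij (c_ij = 0 encodes a missing pipe).
    bounds_ok = all(-tol <= x[i][j] <= c[i][j] + tol for i in nodes for j in nodes)
    # Conservation at node i: inflow + production = outflow + consumption.
    balance_ok = all(
        abs(sum(x[k][i] for k in nodes) + p[i]
            - sum(x[i][j] for j in nodes) - q[i]) <= tol
        for i in nodes
    )
    return bounds_ok and balance_ok, cost


# Hypothetical data: node 0 produces 2 units, node 2 consumes 2 units.
# Pipes 0->1 and 1->2 have capacity 2 and cost 1; pipe 0->2 has capacity 1 and cost 3.
c = [[0, 2, 1], [0, 0, 2], [0, 0, 0]]
d = [[0, 1, 3], [0, 0, 1], [0, 0, 0]]
p = [2, 0, 0]
q = [0, 0, 2]

via_node_1 = [[0, 2, 0], [0, 0, 2], [0, 0, 0]]
split = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
over_capacity = [[0, 0, 2], [0, 0, 0], [0, 0, 0]]
for name, plan in [("via node 1", via_node_1), ("split", split),
                   ("over capacity", over_capacity)]:
    print(name, transshipment_check(plan, d, c, p, q))
# via node 1 (True, 4)
# split (True, 5)
# over capacity (False, 6)
```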

Example 1.2 (Quadratic Programming). In the portfolio optimisation problem, an investor considers a fixed time interval and wishes to decide which fraction of the capital he/she wants to invest in each of n different given assets when the expected return of asset i is μi and the covariance between assets i and j is σij. The vector μ = [μi] and the matrix σ = [σij ] are assumed to be known and the investor aims at a total return of at least b. Subject to this constraint, he/she aims to minimise the risk as quantified by the variance of the overall portfolio.

This problem can be modelled as

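$$
\begin{aligned}
\min_{x\in\mathbb{R}^n}\quad & \sum_{i=1}^{n}\sum_{j=1}^{n} \sigma_{ij}\,x_i x_j\\
\text{s.t.}\quad & \sum_{i=1}^{n} \mu_i x_i \ge b,\\
& \sum_{i=1}^{n} x_i = 1,\\
& x_i \ge 0 \quad (i = 1,\dots,n).
\end{aligned}
$$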

The constraint $\sum_{i=1}^{n} x_i = 1$ expresses the requirement that 100% of the initial capital has to be invested.
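For $n = 2$ the set of portfolios satisfying $x_1 + x_2 = 1$ and $x_i \ge 0$ is a segment, so a crude grid search already illustrates the model. The sketch below is a brute-force stand-in, not a quadratic programming method; the data `mu`, `sigma` and `b` and the function names are hypothetical.

```python
def portfolio_variance(x, sigma):
    """Objective of Example 1.2: sum_i sum_j sigma_ij x_i x_j."""
    n = len(x)
    return sum(sigma[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def two_asset_grid_search(mu, sigma, b, steps=1000, tol=1e-12):
    """Brute-force the n = 2 case on a grid of portfolios x = (w, 1 - w), 0 <= w <= 1.

    Every grid point satisfies sum_i x_i = 1 and x_i >= 0; points whose expected
    return is below b (up to tol) are skipped. Returns (x, variance) or None.
    """
    best = None
    for k in range(steps + 1):
        w = k / steps
        x = (w, 1.0 - w)
        if mu[0] * x[0] + mu[1] * x[1] < b - tol:
            continue  # return constraint sum_i mu_i x_i >= b is violated
        var = portfolio_variance(x, sigma)
        if best is None or var < best[1]:
            best = (x, var)
    return best


# Hypothetical data: two uncorrelated assets and a target return b = 0.07.
mu = [0.05, 0.10]
sigma = [[0.04, 0.0], [0.0, 0.09]]
x_opt, var_opt = two_asset_grid_search(mu, sigma, b=0.07)
print(x_opt, var_opt)  # about (0.6, 0.4) with variance 0.0288
```

With these numbers the unconstrained minimum-variance split ($w \approx 0.69$) earns only about 0.065, so the return constraint is active and the grid optimum sits on its boundary at $w = 0.6$.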

Example 1.3 (Semidefinite Programming). In optimal control, variables $y_1, \dots, y_m$ have to be chosen so as to design a system that is driven by the linear ODE

$$\dot{u} = M(y)\,u,$$

Among the properties of algorithms that are most often analysed in theorems we can single out three important groups:

Correctness: Does the algorithm compute the claimed input-output relation for all input values? This is like proving that a mathematical equation or inequality holds true, where one side of the relation can be seen as the function of interest and the other as the computational rule or algorithm.

Complexity: How many computer operations will running the algorithm require as a function of the input data? Since this cannot usually be determined exactly for all input values, one is often interested either in the worst case that can occur or in an average case under some probability distribution over the input values. Moreover, when an algorithm can be run on input data of various dimensions, one quantifies the worst-case complexity as a function of the problem dimension or another measure of input size. Finally, when a numerical algorithm proceeds by iteratively improving approximations to the true (theoretical) solution of a problem, the complexity is usually analysed in terms of the number of computer operations per iteration and the convergence speed or convergence rate of the algorithm.

Reliability: Is there a guarantee of how accurately the final result of the algorithm approximates the true solution of the problem? Does this guarantee hold for a large set of input data, or are there domains in the input space for which the algorithm struggles to compute an accurate solution? How do rounding errors affect the computation?
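The convergence rate mentioned under Complexity is often estimated from the errors $e_k = |x_k - x^*|$: if $e_{k+1}/e_k^p$ settles near a constant, the iteration converges with order $p$ (linearly for $p = 1$ with a constant below 1, quadratically for $p = 2$). The sketch below uses these standard textbook ratios; the helper `error_ratios` and both test sequences are our own illustrations.

```python
import math


def error_ratios(iterates, x_star, p=1.0):
    """Return |x_{k+1} - x*| / |x_k - x*|**p for consecutive iterates.

    Ratios that settle near a constant C suggest convergence of order p
    (linear for p = 1 with C < 1, quadratic for p = 2).
    """
    errs = [abs(x - x_star) for x in iterates]
    return [e1 / e0 ** p for e0, e1 in zip(errs, errs[1:]) if e0 > 0]


# Linear convergence: x_k = 2^(-k) -> 0, each step halves the error.
linear = [2.0 ** -k for k in range(10)]
print(error_ratios(linear, 0.0))  # every ratio is 0.5

# Quadratic convergence: Newton's iteration x_{k+1} = (x_k + 2/x_k)/2 for sqrt(2).
newton, x = [], 1.0
for _ in range(4):  # stop early, before rounding errors dominate the tiny errors
    newton.append(x)
    x = 0.5 * (x + 2.0 / x)
print(error_ratios(newton, math.sqrt(2.0), p=2.0))  # tends towards 1/(2*sqrt(2)), about 0.354
```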

  2. Prerequisite Knowledge. Only linear algebra and multivariate calculus are required to understand this course. A course in numerical linear algebra or in numerical analysis helps in understanding some of the deeper issues but is not essential. Everything else will be developed from first principles. We will spend the remainder of this first lecture discussing some important preliminary concepts. Other notions will be introduced if and when we need them.

2.1. Local versus Global Optimality. Let us first explain what we mean by an “optimal solution” to the problem (P). A point $x \in \mathbb{R}^n$ is feasible for the optimisation problem (P) if $g_i(x) \ge 0$ for all $i$ and $h_j(x) = 0$ for all $j$, that is, if $x$ satisfies all the constraints of the problem. The set $F$ of feasible points is called the domain of feasibility of (P). A feasible point $x^*$ is a local minimiser if there exists a ball $B_\epsilon(x^*)$ around $x^*$ such that

$$f(x^*) \le f(x) \quad \forall x \in B_\epsilon(x^*) \cap F,$$

that is, $x^*$ is a minimiser amongst all the feasible points in a neighbourhood of $x^*$, but there might be feasible points further away from $x^*$ where the objective function takes an even smaller value. A feasible point $x^*$ is a global minimiser if

$$f(x^*) \le f(x) \quad \forall x \in F,$$

that is, $x^*$ minimises the objective function amongst all feasible points of the problem, although there may be several such points.

Example 2.1. The problem

$$(P)\qquad \min_{x\in\mathbb{R}} f(x) = x^3 + 9x^2 \quad \text{s.t.}\quad -10 \le x \le 2$$

has a local minimiser at $x = 0$ and a global minimiser at $x^* = -10$, see Figure 2.1.

Fig. 2.1. Objective function of Example 2.1 (plot of f over −10 ≤ x ≤ 2).
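Example 2.1 can be checked by hand or in a few lines of code: $f'(x) = 3x(x+6)$ vanishes at $x = 0$ and $x = -6$, and the sign of $f''(x) = 6x + 18$ classifies these points. The sketch below (function names are our own) compares the stationary points with the endpoints of $[-10, 2]$; this suffices here because a continuous function on a closed interval attains its minimum, and an interior minimiser of a differentiable function is stationary.

```python
def f(x):
    return x ** 3 + 9 * x ** 2


def df(x):
    # f'(x) = 3x^2 + 18x = 3x(x + 6), so the stationary points are x = 0 and x = -6.
    return 3 * x ** 2 + 18 * x


def d2f(x):
    # f''(x) = 6x + 18
    return 6 * x + 18


lo, hi = -10.0, 2.0
stationary = [s for s in (-6.0, 0.0) if lo < s < hi and df(s) == 0]
for s in stationary:
    curvature = d2f(s)
    kind = "local min" if curvature > 0 else "local max" if curvature < 0 else "inconclusive"
    print(f"x = {s}: f(x) = {f(s)}, {kind}")

# On a closed interval, a global minimiser is a stationary point or an endpoint.
candidates = stationary + [lo, hi]
x_glob = min(candidates, key=f)
print(f"global minimiser: x = {x_glob}, f(x) = {f(x_glob)}")
# x = -6.0: f(x) = 108.0, local max
# x = 0.0: f(x) = 0.0, local min
# global minimiser: x = -10.0, f(x) = -100.0
```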

In the general framework, efficient algorithms can only be devised for the problem of finding a local minimiser. The problem of finding a global minimiser is extremely important in practice, but its solution is typically based on heuristics that rely on local minimisation as a subproblem. We therefore restrict the material of this course to local minimisation.

A slightly confusing terminology is the following: an iterative algorithm for solving (P) converges globally if the output sequence $(x_k)_{k\in\mathbb{N}}$ converges to a local minimiser $x^*$ for all starting points $x_0 \in F$. On the other hand, an iterative algorithm is called locally convergent if the output sequence $(x_k)_{k\in\mathbb{N}}$ converges to a local minimiser $x^*$ for all feasible starting points $x_0$ close enough to $x^*$, that is, for all $x_0 \in B_r(x^*) \cap F$ for some $r > 0$.

Example 2.2. Let us go back to the problem of Example 2.1 and consider the following algorithm:

S0 Choose $x_0$. Set $\alpha = 1$, $k = 0$.
S1 $x = x_k - \alpha f'(x_k)$.
S2 If $x$ is feasible then go to S3, else set $\alpha \leftarrow \alpha/2$ and go to S1.
S3 Set $x_{k+1} = x$, $k \leftarrow k + 1$, $\alpha = 1$, and go to S1.

This algorithm converges to the local minimiser $x^* = 0$ for all starting points $x_0 \in (-6, 2]$, and to the global minimiser $x^* = -10$ for $x_0 \in [-10, -6)$. For $x_0 = -6$ it remains stuck. If we exclude $x_0 = -6$ as a starting point, then this algorithm is globally convergent, even though it only converges to local minimisers! The point here is that the algorithm converges no matter what the starting point is. On the other hand, if we omit the judicious choice of $\alpha$, we obtain the following algorithm:

S0 Choose $x_0$. Set $k = 0$.
S1 Set $x_{k+1} = x_k - f'(x_k)$, $k \leftarrow k + 1$, and go to S1.

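$\{x : \|x - \bar{x}\| < \rho\}$, ellipsoids $\{x : x^{\mathsf T}Bx \le r\}$ (with $B$ a positive definite matrix) and affine subspaces $\{x : a^{\mathsf T}x = b\}$ are all examples of convex sets. If $C, D \subseteq \mathbb{R}^n$ are convex sets, $\lambda \in \mathbb{R}$ and $\varphi : \mathbb{R}^n \to \mathbb{R}^m$ is a linear map, then $C + D := \{x + y : x \in C,\ y \in D\}$, $\lambda C := \{\lambda x : x \in C\}$, $\varphi(C) := \{\varphi(x) : x \in C\}$ and $C \cap D$ are convex sets.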

2.4. Convex Functions. Functions $f : \mathbb{R}^n \to (-\infty, +\infty]$ into the real line extended by $+\infty$ are called proper. A proper function is convex if its epigraph

$$\operatorname{epi}(f) := \{(x, z) \in \mathbb{R}^{n+1} : f(x) \le z\}$$

is a convex set in $\mathbb{R}^{n+1}$. A proper function $f$ (assumed to be defined on all of $\mathbb{R}^n$) is convex if and only if

$$f\big(\lambda x + (1 - \lambda)y\big) \le \lambda f(x) + (1 - \lambda) f(y) \qquad (2.1)$$

for all $x, y \in \mathbb{R}^n$, $\lambda \in [0, 1]$. If this becomes a strict inequality $<$ for all $x \ne y$ and $\lambda \in (0, 1)$, we say that $f$ is strictly convex. If $f$ is convex then its effective domain $\operatorname{dom}(f) := \{x : f(x) < +\infty\}$ is a convex set in $\mathbb{R}^n$. On the other hand, we call any function $f : C \to \mathbb{R}$ which is defined on a convex set $C$ and satisfies (2.1) for all $x, y \in C$ and $\lambda \in [0, 1]$ convex, and any such function can be extended to a convex proper function by setting $f(x) := +\infty$ for all $x \notin C$. If $f$ and $g$ are convex proper functions then so are $f + g$ and $\lambda f$ for any $\lambda \ge 0$. If $\mathcal{F}$ is a set of convex proper functions then the pointwise supremum

$$\sup \mathcal{F} : x \mapsto \sup\{f(x) : f \in \mathcal{F}\}$$

is a convex proper function. In particular, the pointwise maximum of finitely many convex proper functions is convex. If $f$ is a convex proper function then all its level sets $\{x : f(x) \le z\}$ (where $z \in (-\infty, +\infty]$ is fixed) are convex. Any convex proper function $f$ is continuous on the topological interior $\operatorname{int}(\operatorname{dom}(f))$ of its effective domain. A proper function $g : \mathbb{R}^n \to [-\infty, +\infty)$ or a function $g : C \to [-\infty, +\infty)$ defined on a convex set is called concave if $-g$ is convex.
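Inequality (2.1) suggests a cheap numerical sanity check: sample $x$, $y$ and $\lambda$ and look for a violation. The sketch below (hypothetical function names; the sampling box, tolerance and seed are arbitrary choices) can refute convexity but never prove it. It uses a pointwise maximum of affine functions, which is convex by the rule above, and $-\|x\|^2$, which is concave but not affine.

```python
import random


def find_convexity_violation(f, dim, trials=10_000, box=5.0, seed=0, tol=1e-9):
    """Randomly search for x, y, lam violating inequality (2.1).

    Returns (x, y, lam) if f(lam*x + (1-lam)*y) > lam*f(x) + (1-lam)*f(y) + tol,
    else None. None is evidence of convexity on the sampled box, never a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-box, box) for _ in range(dim)]
        y = [rng.uniform(-box, box) for _ in range(dim)]
        lam = rng.random()
        z = [lam * a + (1.0 - lam) * b for a, b in zip(x, y)]
        if f(z) > lam * f(x) + (1.0 - lam) * f(y) + tol:
            return x, y, lam
    return None


def max_of_affine(v):
    # Pointwise maximum of three affine functions, hence convex.
    return max(v[0] + 2.0 * v[1], -v[0], 3.0 - v[1])


def neg_sq_norm(v):
    # -||v||^2 is concave and not affine, hence not convex.
    return -sum(t * t for t in v)


print(find_convexity_violation(max_of_affine, 2))            # None
print(find_convexity_violation(neg_sq_norm, 2) is not None)  # True
```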

Theorem 2.4 (First order differential properties of convex functions). Let $f : D \to \mathbb{R}$ be a function defined on a convex open domain $D \subset \mathbb{R}^n$.
(i) If $f$ is convex then $x^*$ is a local minimiser if and only if it is a global minimiser.
(ii) If $f$ is $C^1$ on $D$, then $f$ is convex if and only if for all $x, y \in D$,

$$f(y) \ge f(x) + \nabla f(x) \cdot (y - x), \qquad (2.2)$$

that is, the graph of the first order approximation of $f$ at $x$ lies below the graph of $f$.
(iii) If $f$ is convex and $\nabla f(x^*) = 0$ then $x^*$ is a global minimiser of $f$. If $D = \mathbb{R}^n$ then this condition is both sufficient and necessary.
(iv) $f$ is both convex and concave if and only if $f$ is an affine function.

Proof. Suppose $x^* \in D$ is a local but not a global minimiser. Then there exists a $y \in D$ such that $f(y) < f(x^*)$, and then

$$f\big(\lambda y + (1 - \lambda)x^*\big) \le \lambda f(y) + (1 - \lambda) f(x^*) < f(x^*)$$

for all $\lambda \in (0, 1]$, and $x^*$ cannot be a local minimiser because $\lambda$ can be chosen arbitrarily close to 0. On the other hand, every global minimiser is a local minimiser. This proves (i).

Suppose now that $f$ satisfies (2.2). Given $\lambda \in [0, 1]$ and $x, y \in D$, let $z = (1 - \lambda)x + \lambda y$. Then (2.2) implies

$$f(x) \ge f(z) + \nabla f(z) \cdot (x - z) \quad\text{and}\quad f(y) \ge f(z) + \nabla f(z) \cdot (y - z).$$

Multiplying the first inequality by $(1 - \lambda)$ and the second by $\lambda$, and adding the two inequalities, we get $f(z) \le (1 - \lambda) f(x) + \lambda f(y)$, because $(1 - \lambda)(x - z) + \lambda(y - z) = 0$. Hence, $f$ is convex. Suppose on the other hand that $f$ is convex. Then, for $\lambda \in (0, 1]$,

$$f\big(x + \lambda(y - x)\big) \le f(x) + \lambda\big(f(y) - f(x)\big),$$

and hence

$$\frac{f\big(x + \lambda(y - x)\big) - f(x)}{\lambda} \le f(y) - f(x).$$

Taking limits as $\lambda \to 0$ we get (2.2). This proves (ii). (iii) is a trivial consequence of (i) and (ii). If $f$ is affine, then it is clearly both convex and concave. On the other hand, if $f$ is both convex and concave, and if $f$ is differentiable at least at one point $x^*$, then it follows from (2.2) that $f(y) \ge f(x^*) + \nabla f(x^*) \cdot (y - x^*)$ and $-f(y) \ge -f(x^*) - \nabla f(x^*) \cdot (y - x^*)$ for all $y$. Hence, $f(y) \equiv f(x^*) + \nabla f(x^*) \cdot (y - x^*)$. The general case can be proved in a similar way using the notion of subdifferential. One can also prove that there are always points where $f$ is differentiable, but this is technically more difficult.
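Characterisation (2.2) can be probed in the same spirit, comparing $f(y)$ with the first order model built at $x$. In the sketch below, log-sum-exp serves as a smooth convex example and $x_1^2 - x_2^2$ as a non-convex one; all names and sampling parameters are our own, and a nonnegative result is only evidence, not a proof, of convexity.

```python
import math
import random


def log_sum_exp(v):
    """log(sum_i exp(v_i)), a smooth convex function (shifted by max(v) for stability)."""
    m = max(v)
    return m + math.log(sum(math.exp(t - m) for t in v))


def grad_log_sum_exp(v):
    """Its gradient is the softmax vector exp(v_i) / sum_j exp(v_j)."""
    m = max(v)
    w = [math.exp(t - m) for t in v]
    s = sum(w)
    return [t / s for t in w]


def saddle(v):
    return v[0] ** 2 - v[1] ** 2


def grad_saddle(v):
    return [2.0 * v[0], -2.0 * v[1]]


def smallest_first_order_gap(f, grad, dim, trials=5_000, box=3.0, seed=1):
    """Minimum over random pairs (x, y) of f(y) - f(x) - grad f(x).(y - x).

    By (2.2) this is >= 0 for a convex C^1 function; a negative value refutes convexity.
    """
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        x = [rng.uniform(-box, box) for _ in range(dim)]
        y = [rng.uniform(-box, box) for _ in range(dim)]
        g = grad(x)
        linear_model = f(x) + sum(gi * (yi - xi) for gi, yi, xi in zip(g, y, x))
        worst = min(worst, f(y) - linear_model)
    return worst


print(smallest_first_order_gap(log_sum_exp, grad_log_sum_exp, 3) >= -1e-12)  # True
print(smallest_first_order_gap(saddle, grad_saddle, 2) < 0)                  # True
```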

Theorem 2.5 (Second order differential properties of convex functions). Let $f : D \to \mathbb{R}$ be a function defined on a convex open domain $D \subset \mathbb{R}^n$.
(i) If $f$ is convex, $x \in D$ and the Hessian $H(x) = f''(x)$ exists, then $H(x) \succeq 0$ (positive semidefinite, that is, $z^{\mathsf T}H(x)z \ge 0$ for all $z \in \mathbb{R}^n$).
(ii) If $H(x)$ exists for all $x \in D$ and $H(x) \succeq 0$, then $f$ is convex.
(iii) If $H(x)$ exists for all $x \in D$ and $H(x) \succ 0$ (positive definite, that is, $z^{\mathsf T}H(x)z > 0$ for all $z \in \mathbb{R}^n \setminus \{0\}$), then $f$ is strictly convex.

Proof. See homework assignments.
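Theorem 2.5 suggests testing the Hessian. For two variables, a symmetric matrix is positive semidefinite if and only if both diagonal entries and the determinant are nonnegative, and positive definite if and only if its top-left entry and its determinant are positive (Sylvester's criterion). The sketch below approximates the Hessian by central finite differences at a single point; the theorem needs the condition at every $x \in D$, so one point can only refute convexity or hint at it. The function names and the step `h` are hypothetical choices.

```python
def hessian_2d(f, x, y, h=1e-4):
    """Central finite-difference approximation of the Hessian of f at (x, y)."""
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / h ** 2
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / h ** 2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h ** 2)
    return [[fxx, fxy], [fxy, fyy]]


def is_psd_2x2(H, tol=1e-6):
    """Symmetric 2x2: PSD iff both diagonal entries and the determinant are >= 0."""
    a, b, c = H[0][0], H[0][1], H[1][1]
    return a >= -tol and c >= -tol and a * c - b * b >= -tol


def is_pd_2x2(H, tol=1e-6):
    """Sylvester's criterion for 2x2: PD iff H[0][0] > 0 and det(H) > 0."""
    a, b, c = H[0][0], H[0][1], H[1][1]
    return a > tol and a * c - b * b > tol


def f1(x, y):  # Hessian [[2, 1], [1, 2]] everywhere: positive definite
    return x ** 2 + x * y + y ** 2


def f2(x, y):  # Hessian [[2, 0], [0, -2]]: indefinite, so f2 is not convex
    return x ** 2 - y ** 2


def f3(x, y):  # Hessian [[0, 0], [0, 2]] at the origin: only semidefinite
    return x ** 4 + y ** 2


for name, g, point in [("f1", f1, (0.5, -1.0)), ("f2", f2, (0.5, -1.0)),
                       ("f3", f3, (0.0, 0.0))]:
    H = hessian_2d(g, *point)
    print(name, "PSD:", is_psd_2x2(H), "PD:", is_pd_2x2(H))
```

The function $f_3(x, y) = x^4 + y^2$ is strictly convex even though its Hessian at the origin is only semidefinite, so condition (iii) is sufficient but not necessary.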