Introduction Optimization, Lecture Notes - Mathematics -, Study notes of Mathematical Methods

University of Oxford Mathematical Methods

Central subject, Quadratic programming, convergence rates, convex functions

Typology: Study notes

2010/2011

Uploaded on 09/09/2011

luber-1 🇬🇧


C12.1B: CONTINUOUS OPTIMISATION

LECTURE 1: INTRODUCTION

RAPHAEL HAUSER

MATHEMATICAL INSTITUTE, UNIVERSITY OF OXFORD

1. The Central Subject of this Course. The engineer who designs an aircraft with minimal drag given the required lift force, the manager who maximises profit within constraints imposed by the available resources, the bicycle courier who seeks the shortest path between two points in a city, and your cup of tea that cools down to maximise the entropy in the universe all solve optimisation problems! The world is full of them.

Mathematically, we can formulate an important class of such problems as follows:

$$
\begin{aligned}
\text{(P)}\qquad \min_{x\in\mathbb{R}^n}\ & f(x)\\
\text{s.t.}\ & g_i(x) \ge 0, \quad (i = 1,\dots,p),\\
& h_j(x) = 0, \quad (j = 1,\dots,q),
\end{aligned}
$$

where $f$, $g_i$ and $h_j$ are sufficiently smooth functions: typically we require them to be twice continuously differentiable.

The function $f$ represents an objective (such as energy or cost) that has to be minimised under side constraints defined by the functions $g_i$ and $h_j$. We therefore call $f$ the objective function, the functions $g_i$ the inequality constraint functions, and the functions $h_j$ the equality constraint functions of (P). Note that by replacing $f$ by $-f$ we can of course treat maximisation problems in the same framework, since maximising $f$ and minimising $-f$ over the same feasible set have the same solutions.
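To make the roles of $f$, $g_i$ and $h_j$ concrete, here is a minimal Python sketch of (P) as a data structure. The class `Problem`, the helper `maximisation`, the tolerance `tol` and the small test instance are hypothetical choices made for illustration, not part of the notes. Since any floating-point check has to allow some rounding error, feasibility is tested only up to `tol`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Vector = Sequence[float]
ScalarFn = Callable[[Vector], float]


@dataclass
class Problem:
    """Problem (P): minimise f(x) subject to g_i(x) >= 0 and h_j(x) = 0."""

    f: ScalarFn
    ineq: List[ScalarFn] = field(default_factory=list)  # inequality constraints g_i
    eq: List[ScalarFn] = field(default_factory=list)    # equality constraints h_j

    def is_feasible(self, x: Vector, tol: float = 1e-9) -> bool:
        """True if x satisfies every constraint up to the tolerance tol."""
        return all(g(x) >= -tol for g in self.ineq) and all(
            abs(h(x)) <= tol for h in self.eq
        )


def maximisation(f: ScalarFn, ineq=(), eq=()) -> Problem:
    """Maximising f is encoded as minimising -f over the same feasible set."""
    return Problem(lambda x: -f(x), list(ineq), list(eq))


# Hypothetical instance: minimise x1 + x2 on the unit disc intersected with x1 = x2.
disc = Problem(
    f=lambda x: x[0] + x[1],
    ineq=[lambda x: 1.0 - x[0] ** 2 - x[1] ** 2],
    eq=[lambda x: x[0] - x[1]],
)
print(disc.is_feasible([0.5, 0.5]), disc.is_feasible([1.0, 1.0]))
```

Storing the constraints as lists of callables mirrors the index sets $i = 1,\dots,p$ and $j = 1,\dots,q$ in (P); an unconstrained problem is simply one with both lists empty.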

Example 1.1 (Linear Programming). The transshipment problem occurs when the cheapest way of shipping prescribed amounts of a commodity across a transportation network has to be determined. This can be a network of oil pipelines, a computer network, a network of shipping lanes, a road network, etc.

Fig. 1.1. Gas pipeline network (nodes 1 to 6).

A network of gas pipelines is given in Figure 1.1. An arrow from node $i$ to node $j$ represents a pipe with transport capacity $c_{ij}$ in the given direction. Transporting one
unit of gas along the edge $(ij)$ costs $d_{ij}$. The amount of gas produced at node $i$ is $p_i$, and the amount of gas consumed is $q_i$. We assume that the total amount consumed equals the total amount produced (if this assumption did not hold, we could construct an equivalent transshipment problem that has this property). How are the quantities $x_{ij}$ of gas shipped along the edges $(ij)$ to be chosen so as to satisfy all the demands and to minimise costs? We set $c_{ij} = 0$ (and let $d_{ij}$ be arbitrary numbers) for all edges $(ij)$ that do not exist. Doing so, we can assume that the network is a complete graph. The problem we have to solve is the following:

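$$
\begin{aligned}
\min_{x}\ & \sum_{i,j=1}^{6} d_{ij}\,x_{ij}\\
\text{s.t.}\ & \sum_{k=1}^{6} x_{ki} + p_i = \sum_{j=1}^{6} x_{ij} + q_i, \quad (i = 1,\dots,6),\\
& 0 \le x_{ij} \le c_{ij}, \quad (i,j = 1,\dots,6).
\end{aligned}
$$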

This is an example of a linear programming problem, as the objective function and all the constraint functions are linear. Note that it is not a priori clear that this problem has feasible solutions. One is therefore interested in algorithms that not only find optimal LP solutions when these exist but also detect when a problem instance is infeasible!
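Checking a candidate flow against this LP is mechanical, and writing it out shows how the conservation and capacity constraints read node by node. The Python sketch below uses a hypothetical function `transshipment_cost` and a made-up three-node network (0-based node indices). It returns the cost $\sum d_{ij}x_{ij}$ of a feasible flow and `None` otherwise. It only evaluates a given $x$; finding the optimal flow, or detecting that none exists, still requires an LP algorithm.

```python
from typing import List, Optional

Matrix = List[List[float]]


def transshipment_cost(d: Matrix, c: Matrix, p: List[float], q: List[float],
                       x: Matrix, tol: float = 1e-9) -> Optional[float]:
    """Cost sum_ij d_ij x_ij of flow x, or None if x violates a constraint.

    Nodes are 0..n-1; c[i][j] = 0 encodes a missing edge (complete-graph convention).
    """
    n = len(p)
    for i in range(n):
        inflow = sum(x[k][i] for k in range(n))
        outflow = sum(x[i][j] for j in range(n))
        # Conservation at node i: inflow + production = outflow + consumption.
        if abs(inflow + p[i] - outflow - q[i]) > tol:
            return None
        # Capacity bounds 0 <= x_ij <= c_ij on every edge leaving node i.
        for j in range(n):
            if not (-tol <= x[i][j] <= c[i][j] + tol):
                return None
    return sum(d[i][j] * x[i][j] for i in range(n) for j in range(n))


# Hypothetical data: node 0 produces 5 units, node 2 consumes 5, node 1 is a relay.
c = [[0, 4, 3], [0, 0, 4], [0, 0, 0]]  # capacities
d = [[0, 1, 3], [0, 0, 1], [0, 0, 0]]  # unit costs
p, q = [5, 0, 0], [0, 0, 5]
x = [[0, 4, 1], [0, 0, 4], [0, 0, 0]]  # 4 units via node 1, 1 unit direct
print(transshipment_cost(d, c, p, q, x))
```

In this toy network the cheap route through node 1 has capacity 4, so the fifth unit has to use the more expensive direct pipe.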

Example 1.2 (Quadratic Programming). In the portfolio optimisation problem, an investor considers a fixed time interval and wishes to decide which fraction of the capital he/she wants to invest in each of n different given assets when the expected return of asset i is μi and the covariance between assets i and j is σij. The vector μ = [μi] and the matrix σ = [σij ] are assumed to be known and the investor aims at a total return of at least b. Subject to this constraint, he/she aims to minimise the risk as quantified by the variance of the overall portfolio.

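This problem can be modelled as

$$
\begin{aligned}
\min_{x\in\mathbb{R}^n}\ & \sum_{i=1}^{n}\sum_{j=1}^{n} \sigma_{ij}\,x_i x_j\\
\text{s.t.}\ & \sum_{i=1}^{n} \mu_i x_i \ge b,\\
& \sum_{i=1}^{n} x_i = 1,\\
& x_i \ge 0, \quad (i = 1,\dots,n).
\end{aligned}
$$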

The constraint $\sum_{i=1}^{n} x_i = 1$ expresses the requirement that 100% of the initial capital has to be invested.
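With only two assets, the budget constraint leaves a single free variable, $x_1 = t$ and $x_2 = 1 - t$, so this quadratic program can be solved by hand. The variance $\sigma_{11}t^2 + 2\sigma_{12}t(1-t) + \sigma_{22}(1-t)^2$ is a convex parabola in $t$, and the return and sign constraints restrict $t$ to an interval. The minimiser is therefore the vertex $t^* = (\sigma_{22}-\sigma_{12})/(\sigma_{11}-2\sigma_{12}+\sigma_{22})$ clipped to that interval. The Python sketch below implements this two-asset special case; the function name and the numbers are hypothetical, and this is not a general QP solver.

```python
from typing import List, Optional, Tuple


def two_asset_min_variance(mu: Tuple[float, float], sigma: List[List[float]],
                           b: float) -> Optional[Tuple[float, float]]:
    """Minimum-variance weights (x1, x2) with x1 + x2 = 1, x >= 0, mu.x >= b.

    Returns None if the constraints cannot be satisfied.
    """
    s11, s12, s22 = sigma[0][0], sigma[0][1], sigma[1][1]
    lo, hi = 0.0, 1.0  # x1 = t in [0, 1] encodes x1, x2 >= 0 and x1 + x2 = 1
    dmu = mu[0] - mu[1]
    # Return constraint t*mu1 + (1 - t)*mu2 >= b, i.e. t*dmu >= b - mu2.
    if dmu > 0:
        lo = max(lo, (b - mu[1]) / dmu)
    elif dmu < 0:
        hi = min(hi, (b - mu[1]) / dmu)
    elif mu[1] < b:
        return None
    if lo > hi:
        return None
    denom = s11 - 2 * s12 + s22  # variance of (asset 1 - asset 2), so >= 0
    if denom > 0:
        t = (s22 - s12) / denom  # vertex of the variance parabola
    else:
        t = lo if s12 >= s22 else hi  # variance is linear (or constant) in t
    t = min(max(t, lo), hi)  # clip to the feasible interval
    return t, 1.0 - t


mu = (0.10, 0.05)
sigma = [[0.04, 0.0], [0.0, 0.01]]
print(two_asset_min_variance(mu, sigma, b=0.0))   # return constraint inactive
print(two_asset_min_variance(mu, sigma, b=0.08))  # return constraint binds
print(two_asset_min_variance(mu, sigma, b=0.20))  # target return unreachable
```

Clipping is enough here because a convex function of one variable, restricted to an interval, is minimised at the point of the interval closest to its unconstrained minimiser.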

Example 1.3 (Semidefinite Programming). In optimal control, variables $y_1,\dots,y_m$ have to be chosen so as to design a system that is driven by the linear ODE

$$\dot u = M(y)\,u,$$

Among the properties of algorithms that are most often analysed in theorems we can single out three important groups:

Correctness: Does the algorithm compute the claimed input-output relation for all input values? This is like proving that a mathematical equation or inequality holds true, where one side of the relation can be seen as the function of interest and the other as the computational rule or algorithm.

Complexity: How many computer operations will running the algorithm require as a function of the input data? Since this cannot usually be determined exactly for all input values, one is often interested either in the worst case that can occur or in an average case under some probability distribution over the input values. Moreover, when an algorithm can be run on input data of various dimensions, one quantifies the worst-case complexity as a function of the problem dimension or another measure of input size. Finally, when a numerical algorithm proceeds by iteratively improving approximations to the true (theoretical) solution of a problem, the complexity is usually analysed in terms of the number of computer operations per iteration and the convergence speed or convergence rate of the algorithm.

Reliability: Is there a guarantee of how accurately the final result of the algorithm approximates the true solution of the problem? Does this guarantee hold for a large set of input data, or are there domains in the input space for which the algorithm struggles to compute an accurate solution? How do rounding errors affect the computation?
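The interplay between cost per iteration and convergence rate can be seen on a toy problem. The Python sketch below is an illustration chosen here, not an algorithm from the notes. It computes $\sqrt{2}$ as the root of $x^2 - 2$ in two ways: bisection, whose bracket halves at every step (linear convergence with rate $1/2$), and Newton's method, whose error is roughly squared at every step near the root (quadratic convergence). Both iterations cost only a few arithmetic operations, so the number of iterations needed for a given accuracy dominates the comparison.

```python
import math


def bisection_iterations(a: float, b: float, tol: float):
    """Bisection for x**2 - 2 on [a, b]: the bracket halves each step (linear rate 1/2)."""
    k = 0
    while b - a > tol:
        m = 0.5 * (a + b)
        if m * m - 2 > 0:
            b = m  # root lies in [a, m]
        else:
            a = m  # root lies in [m, b]
        k += 1
    return 0.5 * (a + b), k


def newton_iterations(x: float, tol: float, max_iter: int = 50):
    """Newton for x**2 - 2: near the root the error is roughly squared each step."""
    for k in range(1, max_iter + 1):
        x_new = x - (x * x - 2) / (2 * x)
        if abs(x_new - x) <= tol:
            return x_new, k
        x = x_new
    return x, max_iter


root_b, k_b = bisection_iterations(1.0, 2.0, 1e-12)
root_n, k_n = newton_iterations(1.5, 1e-12)
print(f"bisection: {k_b} iterations, error {abs(root_b - math.sqrt(2)):.1e}")
print(f"Newton:    {k_n} iterations, error {abs(root_n - math.sqrt(2)):.1e}")
```

Bisection needs about $\log_2(1/\text{tol})$ steps whatever the function, while Newton's method needs only a handful once it is close to the root. It pays for this with a derivative evaluation and with the requirement of a good starting point, which is the kind of local-versus-global trade-off discussed in the next section.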

Prerequisite Knowledge. Only linear algebra and multivariate calculus are required to understand this course. A course in numerical linear algebra or in numerical analysis helps in understanding some of the deeper issues but is not absolutely essential. Everything else will be developed from first principles. We will spend the remainder of this first lecture discussing some important preliminary concepts. Other notions will be introduced if and when we need them.

2.1. Local versus Global Optimality. Let us first explain what we mean by "optimal solution" to the problem (P). A point $x\in\mathbb{R}^n$ is feasible for the optimisation problem (P) if $g_i(x)\ge 0$ for all $i$ and $h_j(x) = 0$ for all $j$, that is, if $x$ satisfies all the constraints of the problem. The set $F$ of feasible points is called the domain of feasibility of (P). A feasible point $x^*$ is a local minimiser if there exists a ball $B_\epsilon(x^*)$ around $x^*$ such that

$$f(x^*) \le f(x) \quad \forall x \in B_\epsilon(x^*)\cap F,$$

that is, $x^*$ is a minimiser amongst all the feasible points in a neighbourhood of $x^*$, but there might be feasible points further away from $x^*$ where the objective function takes an even smaller value. A feasible point $x^*$ is a global minimiser if

$$f(x^*) \le f(x) \quad \forall x \in F,$$

that is, $x^*$ minimises the objective function amongst all feasible points of the problem, although there might exist several of these points.

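Example 2.1. The problem

$$\text{(P)}\qquad \min_{x\in\mathbb{R}} f(x) = x^3 + 9x^2 \quad \text{s.t.}\quad -10 \le x \le 2$$

has a local minimiser at $x = 0$, and a global minimiser at $x^* = -10$, see Figure 2.1. Indeed, $f'(x) = 3x(x+6)$ vanishes only at $x = -6$ (a local maximiser) and $x = 0$, and $f(-10) = -100 < 0 = f(0)$.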

Fig. 2.1. Objective function of Example 2.1.
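The claims of Example 2.1 can be checked numerically by sampling $f$ on a fine grid of $[-10, 2]$. The Python sketch below uses a hypothetical helper `grid_minimisers` and an arbitrarily chosen grid spacing of $0.01$. It reports the grid points whose values are no larger than those of their neighbours, which approximate the local minimisers, and the grid point with the smallest value, which approximates the global minimiser. A grid search like this is only practical in very low dimension and is shown purely to illustrate the definitions.

```python
from typing import Callable, List, Tuple


def f(x: float) -> float:
    """Objective of Example 2.1."""
    return x ** 3 + 9 * x ** 2


def grid_minimisers(func: Callable[[float], float], a: float, b: float,
                    n: int) -> Tuple[List[float], float]:
    """Grid-local minimisers of func on n+1 equally spaced points of [a, b], and the best point."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    fs = [func(x) for x in xs]
    local = []
    for k in range(n + 1):
        # Points outside [a, b] are infeasible, so endpoints only compare inwards.
        left = fs[k - 1] if k > 0 else float("inf")
        right = fs[k + 1] if k < n else float("inf")
        if fs[k] <= left and fs[k] <= right:
            local.append(xs[k])
    best = min(xs, key=func)
    return local, best


local, best = grid_minimisers(f, -10.0, 2.0, 1200)
print("local minimisers:", local, "global minimiser:", best, "f =", f(best))
```

Treating the endpoints as having only one neighbour is what makes the boundary point $-10$ appear as a local minimiser: it is a minimiser relative to the feasible set, even though $f'(-10) \neq 0$.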

In the general framework, efficient algorithms can only be devised for the problem of finding a local minimiser. The problem of finding a global minimiser is extremely important in practice, but its solution is typically based on heuristics that rely on local minimisation as a subproblem. We therefore restrict the material of this course to local minimisation.

A slightly confusing terminology is the following: an iterative algorithm for solving (P) converges globally if the output sequence $(x_k)_{k\in\mathbb{N}}$ converges to a local minimiser $x^*$ for all starting points $x_0\in F$. On the other hand, an iterative algorithm is called locally convergent if the output sequence $(x_k)_{k\in\mathbb{N}}$ converges to a local minimiser $x^*$ for all feasible starting points $x_0$ close enough to $x^*$, that is, for all $x_0\in B_r(x^*)\cap F$ for some $r > 0$.

Example 2.2. Let us go back to the problem of Example 2.1 and consider the following algorithm:

S0 Choose $x_0$. Set $\alpha = 1$, $k = 0$.
S1 $x = x_k - \alpha f'(x_k)$.
S2 If $x$ is feasible then goto S3, else $\alpha \leftarrow \alpha/2$ and goto S1.
S3 Set $x_{k+1} = x$, $k \leftarrow k+1$, $\alpha = 1$, and goto S1.

This algorithm converges to the local minimiser $x^* = 0$ for all starting points $x_0\in(-6, 2]$, and to the global minimiser $x^* = -10$ for $x_0\in[-10, -6)$. For $x_0 = -6$ it remains stuck, because $f'(-6) = 0$. If we exclude $x_0 = -6$ as a starting point, then this algorithm is globally convergent, even though it only converges to local minimisers! The point is that the algorithm converges no matter what the starting point is. On the other hand, if we omit the judicious choice of $\alpha$, we obtain the following algorithm:

S0 Choose $x_0$. Set $k = 0$.
S1 Set $x_{k+1} = x_k - f'(x_k)$, $k \leftarrow k+1$, and goto S1.

$\{x : \|x - \bar x\| < \rho\}$, ellipsoids $\{x : x^{\mathsf T}Bx \le r\}$ (with $B$ a positive definite matrix) and affine subspaces $\{x : a^{\mathsf T}x = b\}$ are all examples of convex sets. If $C, D\subseteq\mathbb{R}^n$ are convex sets, $\lambda\in\mathbb{R}$ and $\varphi:\mathbb{R}^n\to\mathbb{R}^m$ is a linear map, then $C + D := \{x + y : x\in C,\ y\in D\}$, $\lambda C := \{\lambda x : x\in C\}$, $\varphi(C) := \{\varphi(x) : x\in C\}$ and $C\cap D$ are convex sets.

2.4. Convex Functions. Functions $f:\mathbb{R}^n\to(-\infty,+\infty]$ into the real line extended by $+\infty$ are called proper. A proper function is convex if its epigraph

$$\operatorname{epi}(f) := \big\{(x,z)\in\mathbb{R}^{n+1} : f(x)\le z\big\}$$

is a convex set in $\mathbb{R}^{n+1}$. A proper function $f$ (assumed to be defined on all of $\mathbb{R}^n$) is convex if and only if

$$f\big(\lambda x + (1-\lambda)y\big) \le \lambda f(x) + (1-\lambda) f(y) \qquad (2.1)$$

for all $x, y\in\mathbb{R}^n$, $\lambda\in[0,1]$. If this becomes a strict inequality $<$ for all $x\neq y$ and all $\lambda\in(0,1)$, we say that $f$ is strictly convex. If $f$ is convex then its effective domain $\operatorname{dom}(f) := \{x : f(x) < +\infty\}$ is a convex set in $\mathbb{R}^n$. On the other hand, we call any function $f : C\to\mathbb{R}$ which is defined on a convex set $C$ and satisfies (2.1) for all $x, y\in C$ and $\lambda\in[0,1]$ convex, and any such function can be extended to a convex proper function by setting $f(x) := +\infty$ for all $x\notin C$. If $f$ and $g$ are convex proper functions then so are $f + g$ and $\lambda f$ for any $\lambda\ge 0$. If $\mathcal{F}$ is a set of convex proper functions then the pointwise supremum

$$\sup\mathcal{F} : x \mapsto \sup\{f(x) : f\in\mathcal{F}\}$$

is a convex proper function. In particular, the pointwise maximum of finitely many convex proper functions is convex. If $f$ is a convex proper function then all its level sets $\{x : f(x)\le z\}$ (where $z\in(-\infty,+\infty]$ is fixed) are convex. Any convex proper function $f$ is continuous on the topological interior $\operatorname{int}\big(\operatorname{dom}(f)\big)$ of its effective domain. A proper function $g:\mathbb{R}^n\to[-\infty,+\infty)$, or a function $g : C\to[-\infty,+\infty)$ defined on a convex set, is called concave if $-g$ is convex.
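Inequality (2.1) can be probed numerically. The Python sketch below samples random triples $(x, y, \lambda)$ with a fixed seed and one-dimensional inputs, and looks for a violation. The helpers `pointwise_max` and `find_convexity_violation`, the sampling range and the tolerance are all hypothetical choices. Applied to the maximum of three affine functions it should find no violation, in line with the statement about pointwise maxima, while for the concave function $-x^2$ it finds one straight away. Sampling like this can only refute convexity, never prove it.

```python
import random
from typing import Callable, List, Optional, Tuple

Fn = Callable[[float], float]


def pointwise_max(functions: List[Fn]) -> Fn:
    """x -> max_g g(x); convex whenever every g is convex."""
    return lambda x: max(g(x) for g in functions)


def find_convexity_violation(f: Fn, samples: int = 1000, lo: float = -5.0,
                             hi: float = 5.0, seed: int = 0,
                             tol: float = 1e-9) -> Optional[Tuple[float, float, float]]:
    """Return some (x, y, lam) violating (2.1) by more than tol, or None if none was found."""
    rng = random.Random(seed)
    for _ in range(samples):
        x, y, lam = rng.uniform(lo, hi), rng.uniform(lo, hi), rng.random()
        lhs = f(lam * x + (1 - lam) * y)
        rhs = lam * f(x) + (1 - lam) * f(y)
        if lhs > rhs + tol:
            return x, y, lam
    return None


affine = [lambda x: 2 * x + 1, lambda x: -x, lambda x: 0.5 * x - 3]
print(find_convexity_violation(pointwise_max(affine)))  # maximum of affine functions
print(find_convexity_violation(lambda x: -x * x))       # a concave function
```

The tolerance `tol` absorbs floating-point rounding, which can make the two sides of (2.1) differ by a few units in the last place even when the inequality holds exactly.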

Theorem 2.4 (First order differential properties of convex functions). Let $f : D\to\mathbb{R}$ be a function defined on a convex open domain $D\subset\mathbb{R}^n$.
(i) If $f$ is convex then $x^*$ is a local minimiser if and only if it is a global minimiser.
(ii) If $f$ is $C^1$ on $D$, then $f$ is convex if and only if for all $x, y\in D$,

$$f(y) \ge f(x) + \nabla f(x)\cdot(y - x), \qquad (2.2)$$

that is, the graph of the first order approximation of $f$ at $x$ lies below the graph of $f$.
(iii) If $f$ is convex and $\nabla f(x^*) = 0$ then $x^*$ is a global minimiser of $f$. If $D = \mathbb{R}^n$ then this condition is both sufficient and necessary.
(iv) $f$ is both convex and concave if and only if $f$ is an affine function.

Proof. Suppose $x^*\in D$ is a local but not a global minimiser. Then there exists a $y\in D$ such that $f(y) < f(x^*)$, and then

$$f\big(\lambda y + (1-\lambda)x^*\big) \le \lambda f(y) + (1-\lambda) f(x^*) < f(x^*) \quad\text{for all } \lambda\in(0,1],$$

so $x^*$ cannot be a local minimiser, because $\lambda$ can be chosen arbitrarily close to $0$. On the other hand, every global minimiser is a local minimiser. This proves (i).

Suppose now that $f$ satisfies (2.2). Given $\lambda\in[0,1]$ and $x, y\in D$, let $z = (1-\lambda)x + \lambda y$. Then (2.2) implies

$$f(x) \ge f(z) + \nabla f(z)\cdot(x - z) \quad\text{and}\quad f(y) \ge f(z) + \nabla f(z)\cdot(y - z).$$

Multiplying the first inequality by $(1-\lambda)$ and the second by $\lambda$, and adding the two inequalities, we get $f(z) \le (1-\lambda)f(x) + \lambda f(y)$, because $(1-\lambda)(x - z) + \lambda(y - z) = 0$. Hence, $f$ is convex. Suppose on the other hand that $f$ is convex. Then for all $\lambda\in(0,1]$,

$$f\big(x + \lambda(y - x)\big) \le f(x) + \lambda\big(f(y) - f(x)\big), \quad\text{and hence}\quad \frac{f\big(x + \lambda(y - x)\big) - f(x)}{\lambda} \le f(y) - f(x).$$

Taking limits as $\lambda\to 0^+$ we get (2.2). This proves (ii). (iii) is a trivial consequence of (i) and (ii).

If $f$ is affine, then it is clearly both convex and concave. On the other hand, if $f$ is both convex and concave, and if $f$ is differentiable at least at one point $x^*$, then it follows from (2.2) that $f(y) \ge f(x^*) + \nabla f(x^*)\cdot(y - x^*)$ and $-f(y) \ge -f(x^*) - \nabla f(x^*)\cdot(y - x^*)$ for all $y$. Hence, $f(y) \equiv f(x^*) + \nabla f(x^*)\cdot(y - x^*)$. The general case can be proved in a similar way using the notion of subdifferential. One can also prove that there are always points where $f$ is differentiable, but this is technically more difficult.
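Inequality (2.2) is easy to test numerically for a concrete $C^1$ convex function. The Python sketch below uses log-sum-exp, $f(x) = \log\sum_i e^{x_i}$, a standard convex function whose gradient is the softmax vector. The choice of function, the helper names (`lse`, `grad_lse`, `first_order_gap`) and the sampling scheme are illustrative assumptions, not part of the notes. The gap $f(y) - f(x) - \nabla f(x)\cdot(y - x)$ should be non-negative up to rounding on every sample, whereas for the concave function $f(x) = -x_1^2$ it is negative.

```python
import math
import random
from typing import Callable, List

Vec = List[float]


def lse(x: Vec) -> float:
    """Log-sum-exp, shifted by max(x) to avoid overflow in exp."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))


def grad_lse(x: Vec) -> Vec:
    """Gradient of log-sum-exp: the softmax vector exp(x_i) / sum_j exp(x_j)."""
    m = max(x)
    w = [math.exp(xi - m) for xi in x]
    s = sum(w)
    return [wi / s for wi in w]


def first_order_gap(f: Callable[[Vec], float], grad: Callable[[Vec], Vec],
                    x: Vec, y: Vec) -> float:
    """f(y) - [f(x) + grad f(x).(y - x)]; (2.2) says this is >= 0 for convex C^1 f."""
    g = grad(x)
    return f(y) - f(x) - sum(gi * (yi - xi) for gi, xi, yi in zip(g, x, y))


rng = random.Random(1)
gaps = []
for _ in range(1000):
    x = [rng.uniform(-3, 3) for _ in range(3)]
    y = [rng.uniform(-3, 3) for _ in range(3)]
    gaps.append(first_order_gap(lse, grad_lse, x, y))
print("smallest gap over 1000 samples:", min(gaps))
```

Shifting by $\max_i x_i$ leaves both the function value and the softmax gradient unchanged mathematically, but keeps `math.exp` from overflowing for large inputs.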

Theorem 2.5 (Second order differential properties of convex functions). Let $f : D\to\mathbb{R}$ be a function defined on a convex open domain $D\subset\mathbb{R}^n$.
(i) If $f$ is convex, $x\in D$ and the Hessian $H(x) = f''(x)$ exists, then $H(x)\succeq 0$ (positive semidefinite, that is, $z^{\mathsf T}H(x)z \ge 0$ for all $z\in\mathbb{R}^n$).
(ii) If $H(x)$ exists and $H(x)\succeq 0$ for all $x\in D$, then $f$ is convex.
(iii) If $H(x)$ exists and $H(x)\succ 0$ (positive definite, that is, $z^{\mathsf T}H(x)z > 0$ for all $z\in\mathbb{R}^n\setminus\{0\}$) for all $x\in D$, then $f$ is strictly convex.

Proof. See homework assignments.
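For a symmetric $2\times 2$ Hessian $H = \begin{pmatrix} a & b\\ b & c\end{pmatrix}$, positive semidefiniteness is equivalent to $a\ge 0$, $c\ge 0$ and $ac - b^2\ge 0$, and positive definiteness to $a > 0$ and $ac - b^2 > 0$ (Sylvester's criterion). The Python sketch below applies these tests with hypothetical helper names. It also applies the one-dimensional case to Example 2.1, where $f''(x) = 6x + 18$ is negative for $x < -3$. By Theorem 2.5 (i), $f$ is therefore not convex on the open interval $(-10, 2)$. This is consistent with Theorem 2.4 (i), since $x = 0$ is a local minimiser there that is not global.

```python
from typing import List

Matrix2 = List[List[float]]


def is_psd_2x2(h: Matrix2) -> bool:
    """Symmetric [[a, b], [b, c]] is PSD iff a >= 0, c >= 0 and ac - b^2 >= 0."""
    (a, b), (_, c) = h
    return a >= 0 and c >= 0 and a * c - b * b >= 0


def is_pd_2x2(h: Matrix2) -> bool:
    """Symmetric [[a, b], [b, c]] is PD iff a > 0 and ac - b^2 > 0 (Sylvester)."""
    (a, b), (_, c) = h
    return a > 0 and a * c - b * b > 0


# Hessians of x1^2 + x1*x2 + x2^2, (x1 + x2)^2 and x1^2 - x2^2, respectively.
print(is_pd_2x2([[2, 1], [1, 2]]), is_psd_2x2([[2, 2], [2, 2]]), is_psd_2x2([[2, 0], [0, -2]]))


def f_second(x: float) -> float:
    """Second derivative of f(x) = x**3 + 9*x**2 from Example 2.1."""
    return 6 * x + 18


print([x for x in (-10, -6, -3, 0, 2) if f_second(x) < 0])
```

The middle example, $(x_1 + x_2)^2$, is convex but not strictly convex, and correspondingly its Hessian is positive semidefinite with determinant $0$ but not positive definite.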
