




Central subject, Quadratic programming, convergence rates, convex functions





RAPHAEL HAUSER MATHEMATICAL INSTITUTE, UNIVERSITY OF OXFORD
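$$
\begin{aligned}
\text{(P)} \quad & \min_{x \in \mathbb{R}^n} f(x) \\
\text{s.t.} \quad & g_i(x) \ge 0 \quad (i = 1, \dots, p), \\
& h_j(x) = 0 \quad (j = 1, \dots, q),
\end{aligned}
$$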
where $f$, $g_i$ and $h_j$ are sufficiently smooth functions: typically we require them to be twice continuously differentiable. The function $f$ represents an objective (such as energy or cost) that has to be minimised under side constraints defined by the functions $g_i$ and $h_j$. We therefore call $f$ the objective function, the functions $g_i$ the inequality constraint functions, and the functions $h_j$ the equality constraint functions of (P). Note that by replacing $f$ by $-f$ we can treat maximisation problems in the same framework.
Example 1.1 (Linear Programming). The transshipment problem occurs when the cheapest way of shipping prescribed amounts of a commodity across a transportation network has to be determined. This can be a network of oil pipelines, a computer network, a network of shipping lanes, a road network etc.
A network of gas pipelines is given in Figure 1.1.

[Fig. 1.1. Gas pipeline network with nodes 1 to 6]

An arrow from node $i$ to node $j$ represents a pipe with transport capacity $c_{ij}$ in the given direction. Transporting one unit of gas along the edge $(ij)$ costs $d_{ij}$. The amount of gas produced at node $i$ is $p_i$, and the amount of gas consumed is $q_i$. We assume that the total amount consumed equals the total amount of gas produced (if this assumption were not true, we could construct an equivalent transshipment problem that has this property). How are the quantities $x_{ij}$ of gas shipped along the edges $(ij)$ to be chosen so as to satisfy all the demands and to minimise costs? We set $c_{ij} = 0$ (and $d_{ij}$ arbitrary numbers) for all edges $(ij)$ that do not exist. Doing so, we can assume that the network is a complete graph. The problem we have to solve is the following:
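$$
\begin{aligned}
\min_{x} \quad & \sum_{i,j=1}^{6} d_{ij} x_{ij} \\
\text{s.t.} \quad & \sum_{k=1}^{6} x_{ki} + p_i = \sum_{j=1}^{6} x_{ij} + q_i \quad (i = 1, \dots, 6), \\
& 0 \le x_{ij} \le c_{ij} \quad (i, j = 1, \dots, 6).
\end{aligned}
$$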
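The constraints of the transshipment problem can be checked mechanically for a candidate flow. The following is a minimal sketch on a hypothetical 3-node network (the data `c`, `d`, `p`, `q` and the function names are illustrative assumptions, not taken from Figure 1.1). It only verifies feasibility and evaluates the cost; it does not solve the LP.

```python
def flow_cost(d, x):
    """Total shipping cost: sum over all edges of d[i][j] * x[i][j]."""
    n = len(x)
    return sum(d[i][j] * x[i][j] for i in range(n) for j in range(n))


def is_feasible(x, c, p, q, tol=1e-9):
    """Check flow balance at every node and the bounds 0 <= x_ij <= c_ij."""
    n = len(x)
    for i in range(n):
        inflow = sum(x[k][i] for k in range(n))
        outflow = sum(x[i][j] for j in range(n))
        # inflow + production must equal outflow + consumption at node i
        if abs(inflow + p[i] - outflow - q[i]) > tol:
            return False
        for j in range(n):
            if x[i][j] < -tol or x[i][j] > c[i][j] + tol:
                return False
    return True


# Hypothetical data: node 0 produces 2 units, node 2 consumes 2 units.
c = [[0, 2, 1], [0, 0, 2], [0, 0, 0]]  # capacities; 0 marks a missing edge
d = [[0, 1, 5], [0, 0, 1], [0, 0, 0]]  # unit shipping costs
p = [2, 0, 0]                          # production per node
q = [0, 0, 2]                          # consumption per node

via_node_1 = [[0, 2, 0], [0, 0, 2], [0, 0, 0]]  # ship everything 0 -> 1 -> 2
split = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]       # send one unit directly 0 -> 2
overloaded = [[0, 0, 2], [0, 0, 0], [0, 0, 0]]  # exceeds capacity c[0][2] = 1

for name, x in [("via_node_1", via_node_1), ("split", split), ("overloaded", overloaded)]:
    print(name, is_feasible(x, c, p, q), flow_cost(d, x))
```

In this toy instance both `via_node_1` and `split` satisfy all constraints, and the first is cheaper; `overloaded` balances every node but violates a capacity bound, which illustrates why feasibility has to be checked separately from cost.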
This is an example of a linear programming problem, as the objective function and all the constraint functions are linear. Note that it is not a priori clear that this problem has feasible solutions. One is therefore interested in algorithms that not only find optimal LP solutions when these exist but also detect when a problem instance is infeasible!
Example 1.2 (Quadratic Programming). In the portfolio optimisation problem, an investor considers a fixed time interval and wishes to decide which fraction of the capital he/she wants to invest in each of n different given assets when the expected return of asset i is μi and the covariance between assets i and j is σij. The vector μ = [μi] and the matrix σ = [σij ] are assumed to be known and the investor aims at a total return of at least b. Subject to this constraint, he/she aims to minimise the risk as quantified by the variance of the overall portfolio.
This problem can be modelled as
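$$
\begin{aligned}
\min_{x \in \mathbb{R}^n} \quad & \sum_{i=1}^{n} \sum_{j=1}^{n} \sigma_{ij} x_i x_j \\
\text{s.t.} \quad & \sum_{i=1}^{n} \mu_i x_i \ge b, \\
& \sum_{i=1}^{n} x_i = 1, \\
& x_i \ge 0 \quad (i = 1, \dots, n).
\end{aligned}
$$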
The constraint $\sum_{i=1}^{n} x_i = 1$ expresses the requirement that 100% of the initial capital has to be invested.
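To make the objective and constraints of the portfolio problem concrete, here is a small sketch that evaluates them for given allocations. The two-asset data (`mu`, `sigma`, `b`) and the function names are hypothetical and chosen only for illustration; the code evaluates candidate portfolios and does not solve the quadratic programme.

```python
def portfolio_variance(sigma, x):
    """Objective: sum_i sum_j sigma[i][j] * x[i] * x[j]."""
    n = len(x)
    return sum(sigma[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def expected_return(mu, x):
    """Left-hand side of the return constraint: sum_i mu[i] * x[i]."""
    return sum(m * xi for m, xi in zip(mu, x))


def is_feasible_portfolio(mu, x, b, tol=1e-9):
    """Check return >= b, fractions summing to 1, and no short positions."""
    return (
        expected_return(mu, x) >= b - tol
        and abs(sum(x) - 1.0) <= tol
        and all(xi >= -tol for xi in x)
    )


# Hypothetical two-asset data: a risky asset and a safer one, uncorrelated.
mu = [0.10, 0.04]
sigma = [[0.04, 0.0], [0.0, 0.01]]
b = 0.05  # required total return

for x in ([0.5, 0.5], [0.2, 0.8], [0.0, 1.0]):
    print(x, is_feasible_portfolio(mu, x, b), portfolio_variance(sigma, x))
```

With these assumed numbers, the allocation `[0.2, 0.8]` is feasible and has a lower variance than `[0.5, 0.5]`, while putting everything into the safer asset misses the return target and is therefore infeasible.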
Example 1.3 (Semidefinite Programming). In optimal control, variables $y_1, \dots, y_m$ have to be chosen so as to design a system that is driven by the linear ODE
$$\dot{u} = M(y)\,u,$$
Among the properties of algorithms that are most often analysed in theorems we can single out three important groups:

Correctness: Does the algorithm compute the claimed input-output relation for all input values? This is like proving that a mathematical equation or inequality holds true, where one side of the relation can be seen as the function of interest and the other as the computational rule or algorithm.

Complexity: How many computer operations will running the algorithm require as a function of the input data? Since this cannot usually be determined exactly for all input values, one is often interested either in the worst case that can occur or in an average case under some probability distribution over the input values. Moreover, when an algorithm can be run on input data of various dimensions, one quantifies the worst-case complexity as a function of the problem dimension or another measure of input size. Finally, when a numerical algorithm proceeds by iteratively improving approximations to the true (theoretical) solution of a problem, the complexity is usually analysed in terms of the number of computer operations per iteration and the convergence speed or convergence rate of the algorithm.

Reliability: Is there a guarantee of how accurately the final result of the algorithm approximates the true solution of the problem? Does this guarantee hold for a large set of input data, or are there domains in the input space for which the algorithm struggles to compute an accurate solution? How do rounding errors affect the computation?
2.1. Local versus Global Optimality. Let us first explain what we mean by “optimal solution” to the problem (P). A point $x \in \mathbb{R}^n$ is feasible for the optimisation problem (P) if $g_i(x) \ge 0$ for all $i$ and $h_j(x) = 0$ for all $j$, that is, if $x$ satisfies all the constraints of the problem. The set $\mathcal{F}$ of feasible points is called the domain of feasibility of (P). A feasible point $x^*$ is a local minimiser if there exists a ball $B_\epsilon(x^*)$ around $x^*$ such that
$$f(x^*) \le f(x) \quad \forall x \in B_\epsilon(x^*) \cap \mathcal{F},$$
that is, $x^*$ is a minimiser amongst all the feasible points in a neighbourhood of $x^*$, but there might be feasible points further away from $x^*$ where the objective function takes an even smaller value. A feasible point $x^*$ is a global minimiser if
$$f(x^*) \le f(x) \quad \forall x \in \mathcal{F},$$
that is, $x^*$ minimises the objective function amongst all feasible points of the problem, although there might exist several such points.
Example 2.1. The problem
$$
\begin{aligned}
\text{(P)} \quad & \min_{x \in \mathbb{R}} f(x) = x^3 + 9x^2 \\
\text{s.t.} \quad & -10 \le x \le 2
\end{aligned}
$$
has a local minimiser at $x = 0$ and a global minimiser at $x^* = -10$, see Figure 2.1.
[Fig. 2.1. Objective function of Example 2.1]
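The distinction between the local and the global minimiser of Example 2.1 can be verified numerically. The sketch below evaluates $f(x) = x^3 + 9x^2$ on a grid over the feasible interval; the grid spacing of 0.01 and the neighbourhood radius of 1 are arbitrary choices made for this illustration.

```python
def f(x):
    """Objective function of Example 2.1."""
    return x**3 + 9 * x**2


def df(x):
    """Derivative f'(x) = 3x^2 + 18x = 3x(x + 6)."""
    return 3 * x**2 + 18 * x


# Grid on the feasible interval [-10, 2] with spacing 0.01.
grid = [-10 + 0.01 * k for k in range(1201)]
values = [f(x) for x in grid]

# Global minimiser on the grid: the feasible point with the smallest value.
x_global = grid[min(range(len(grid)), key=lambda k: values[k])]

# x = 0 is a local minimiser: no feasible grid point within distance 1 is better.
zero_is_local_min = all(f(0.0) <= f(x) for x in grid if abs(x) <= 1.0)

print("global minimiser on grid:", x_global, "with value", f(x_global))
print("x = 0 is a local minimiser on the grid:", zero_is_local_min)
print("stationary points: f'(0) =", df(0.0), ", f'(-6) =", df(-6.0))
```

The global minimiser sits on the boundary $x = -10$, where the derivative does not vanish, whereas the interior stationary points are $x = 0$ (a local minimiser) and $x = -6$ (a local maximiser).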
In the general framework, efficient algorithms can only be devised for the problem of finding a local minimiser. The problem of finding a global minimiser is extremely important in practice, but its solution is typically based on heuristics that rely on local minimisation as a subproblem. We therefore restrict the material of this course to local minimisation. A slightly confusing terminology is the following: an iterative algorithm for solving (P) converges globally if the output sequence $(x_k)_{k \in \mathbb{N}}$ converges to a local minimiser $x^*$ for all starting points $x_0 \in \mathcal{F}$. On the other hand, an iterative algorithm is called locally convergent if the output sequence $(x_k)_{k \in \mathbb{N}}$ converges to a local minimiser $x^*$ for all feasible starting points $x_0$ close enough to $x^*$, that is, for all $x_0 \in B_r(x^*) \cap \mathcal{F}$ for some $r > 0$.
Example 2.2. Let us go back to the problem of Example 2.1 and consider the following algorithm:
S0 Choose $x_0$. Set $\alpha = 1$, $k = 0$.
S1 $x = x_k - \alpha f'(x_k)$.
S2 If $x$ is feasible then goto S3, else $\alpha \leftarrow \alpha/2$ and goto S1.
S3 Set $x_{k+1} = x$, $k \leftarrow k + 1$, $\alpha = 1$, and goto S1.

This algorithm converges to the local minimiser $x^* = 0$ for all starting points $x_0 \in (-6, 2]$, and to the global minimiser $x^* = -10$ for $x_0 \in [-10, -6)$. For $x_0 = -6$ it remains stuck. If we exclude $x_0 = -6$ as a starting point, then this algorithm is globally convergent, even though it only converges to local minimisers! The point is that the algorithm converges no matter what the starting point is. On the other hand, if we omit the judicious choice of $\alpha$, we obtain the following algorithm:
S0 Choose $x_0$. Set $k = 0$.
S1 Set $x_{k+1} = x_k - f'(x_k)$, $k \leftarrow k + 1$, and goto S1.
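$\{x : \|x - \bar{x}\| < \rho\}$, ellipsoids $\{x : x^{\mathsf T} B x \le r\}$ (with $B$ a positive definite matrix) and affine subspaces $\{x : a^{\mathsf T} x = b\}$ are all examples of convex sets. If $C, D \subseteq \mathbb{R}^n$ are convex sets, $\lambda \in \mathbb{R}$ and $\varphi : \mathbb{R}^n \to \mathbb{R}^m$ is a linear map, then $C + D := \{x + y : x \in C,\ y \in D\}$, $\lambda C := \{\lambda x : x \in C\}$, $\varphi(C) := \{\varphi(x) : x \in C\}$ and $C \cap D$ are convex sets.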
2.4. Convex Functions. Functions $f : \mathbb{R}^n \to (-\infty, +\infty]$ into the real line extended by $+\infty$ are called proper. A proper function is convex if its epigraph
$$\operatorname{epi}(f) := \{(x, z) \in \mathbb{R}^{n+1} : f(x) \le z\}$$
is a convex set in $\mathbb{R}^{n+1}$. A proper function $f$ (assumed to be defined on all of $\mathbb{R}^n$) is convex if and only if
$$f\big(\lambda x + (1 - \lambda) y\big) \le \lambda f(x) + (1 - \lambda) f(y) \tag{2.1}$$
for all $x, y \in \mathbb{R}^n$, $\lambda \in [0, 1]$. If this becomes a strict inequality $<$ for all $x \ne y$ and $\lambda \in (0, 1)$, we say that $f$ is strictly convex. If $f$ is convex then its effective domain $\operatorname{dom}(f) := \{x : f(x) < +\infty\}$ is a convex set in $\mathbb{R}^n$. On the other hand, we call any function $f : C \to \mathbb{R}$ which is defined on a convex set $C$ and satisfies (2.1) for all $x, y \in C$ and $\lambda \in [0, 1]$ convex, and any such function can be extended to a convex proper function by setting $f(x) := +\infty$ for all $x \notin C$. If $f$ and $g$ are convex proper functions then so are $f + g$ and $\lambda f$ for any $\lambda \ge 0$. If $\mathcal{F}$ is a set of convex proper functions then the pointwise supremum
$$\sup \mathcal{F} : x \mapsto \sup\{f(x) : f \in \mathcal{F}\}$$
is a convex proper function. In particular, the pointwise maximum of finitely many convex proper functions is convex. If $f$ is a convex proper function then all its level sets $\{x : f(x) \le z\}$ (where $z \in (-\infty, +\infty]$ is fixed) are convex. Any convex proper function $f$ is continuous on the topological interior $\operatorname{int}\big(\operatorname{dom}(f)\big)$ of its effective domain. A proper function $g : \mathbb{R}^n \to [-\infty, +\infty)$ or a function $g : C \to [-\infty, +\infty)$ defined on a convex set is called concave if $-g$ is convex.
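Inequality (2.1) lends itself to a simple numerical sanity check: sample points and weights and look for a violation. The sketch below does this in one dimension. The function names, the sampling scheme, the tolerance and the fixed seed are assumptions made for this illustration; a search that finds no violation is evidence for convexity on the sampled interval, not a proof.

```python
import random


def convex_gap(f, x, y, lam):
    """Right-hand side minus left-hand side of (2.1); negative means violation."""
    return lam * f(x) + (1 - lam) * f(y) - f(lam * x + (1 - lam) * y)


def find_violation(f, lo, hi, trials=10000, seed=0, tol=1e-9):
    """Randomly search [lo, hi] for a triple (x, y, lam) that violates (2.1)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        lam = rng.random()
        if convex_gap(f, x, y, lam) < -tol:
            return (x, y, lam)
    return None


def square(x):
    return x * x


def cubic(x):
    """Objective of Example 2.1, which is not convex on [-10, 2]."""
    return x**3 + 9 * x**2


print("x^2 on [-10, 2]:", find_violation(square, -10, 2))
print("x^3 + 9x^2 on [-10, 2]:", find_violation(cubic, -10, 2))
print("x^3 + 9x^2 on [-3, 2]:", find_violation(cubic, -3, 2))
print("explicit gap at x=-10, y=-4, lam=0.5:", convex_gap(cubic, -10.0, -4.0, 0.5))
```

The explicit triple $x = -10$, $y = -4$, $\lambda = 1/2$ shows the violation by hand: $f(-7) = 98$ exceeds the average of $f(-10) = -100$ and $f(-4) = 80$. Restricted to $[-3, 2]$, where the second derivative $6x + 18$ is nonnegative, the search finds no violation.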
Theorem 2.4 (First order differential properties of convex functions). Let $f : D \to \mathbb{R}$ be a function defined on a convex open domain $D \subset \mathbb{R}^n$.
(i) If $f$ is convex then $x^*$ is a local minimiser if and only if it is a global minimiser.
(ii) If $f$ is $C^1$ on $D$, then $f$ is convex if and only if for all $x, y \in D$,
$$f(y) \ge f(x) + \nabla f(x) \cdot (y - x), \tag{2.2}$$
that is, the graph of the first order approximation of $f$ at $x$ lies below the graph of $f$.
(iii) If $f$ is convex and $\nabla f(x^*) = 0$ then $x^*$ is a global minimiser of $f$. If $D = \mathbb{R}^n$ then this condition is both sufficient and necessary.
(iv) $f$ is both convex and concave if and only if $f$ is an affine function.

Proof. Suppose $x^* \in D$ is a local but not a global minimiser. Then there exists a $y \in D$ such that $f(y) < f(x^*)$, and then
$$f\big(\lambda y + (1 - \lambda) x^*\big) \le \lambda f(y) + (1 - \lambda) f(x^*) < f(x^*)$$
for all $\lambda \in (0, 1]$, so $x^*$ cannot be a local minimiser because $\lambda$ can be chosen arbitrarily close to 0. On the other hand, every global minimiser is a local minimiser. This proves (i).

Suppose now that $f$ satisfies (2.2). Given $\lambda \in [0, 1]$ and $x, y \in D$, let $z = (1 - \lambda) x + \lambda y$. Then (2.2) implies
$$f(x) \ge f(z) + \nabla f(z) \cdot (x - z) \quad \text{and} \quad f(y) \ge f(z) + \nabla f(z) \cdot (y - z).$$
Multiplying the first inequality by $(1 - \lambda)$ and the second by $\lambda$, and adding the two inequalities, the gradient terms cancel because $(1 - \lambda)(x - z) + \lambda (y - z) = 0$, and we get $f(z) \le (1 - \lambda) f(x) + \lambda f(y)$. Hence, $f$ is convex. Suppose on the other hand that $f$ is convex. Then $f\big(x + \lambda (y - x)\big) \le f(x) + \lambda \big(f(y) - f(x)\big)$ for $\lambda \in (0, 1]$, and hence
$$\frac{f\big(x + \lambda (y - x)\big) - f(x)}{\lambda} \le f(y) - f(x).$$
Taking limits as $\lambda \to 0$ we get (2.2). This proves (ii).

(iii) is a trivial consequence of (i) and (ii).

If $f$ is affine, then it is clearly both convex and concave. On the other hand, if $f$ is both convex and concave, and if $f$ is differentiable at least at one point $x^*$, then it follows from (2.2) that $f(y) \ge f(x^*) + \nabla f(x^*) \cdot (y - x^*)$ and $-f(y) \ge -f(x^*) - \nabla f(x^*) \cdot (y - x^*)$ for all $y$. Hence, $f(y) \equiv f(x^*) + \nabla f(x^*) \cdot (y - x^*)$. The general case can be proved in a similar way using the notion of subdifferential. One can also prove that there are always points where $f$ is differentiable, but this is technically more difficult.
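As a brief worked illustration of the first order condition (2.2), consider the one-dimensional function $f(x) = x^2$ (an example chosen here, not part of the theorem) and compare it with the objective of Example 2.1:

```latex
% Convex case: f(x) = x^2, f'(x) = 2x.
\begin{align*}
f(y) - f(x) - f'(x)\,(y - x)
  &= y^2 - x^2 - 2x\,(y - x) \\
  &= y^2 - 2xy + x^2 \\
  &= (y - x)^2 \;\ge\; 0 \qquad \text{for all } x, y \in \mathbb{R},
\end{align*}
% so (2.2) holds, f is convex, and f'(0) = 0 makes x^* = 0 a global minimiser.

% Non-convex case: f(x) = x^3 + 9x^2 from Example 2.1, f'(0) = 0.
\begin{align*}
f(-10) = -100 \;<\; 0 = f(0) + f'(0)\,(-10 - 0),
\end{align*}
% so (2.2) fails at x = 0, y = -10, and f is not convex on [-10, 2].
```

The second computation is consistent with part (i): the function of Example 2.1 has a local minimiser at $x = 0$ that is not global, which could not happen if it were convex.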
Theorem 2.5 (Second order differential properties of convex functions). Let $f : D \to \mathbb{R}$ be a function defined on a convex open domain $D \subset \mathbb{R}^n$.
(i) If $f$ is convex, $x \in D$ and the Hessian $H(x) = f''(x)$ exists, then $H(x) \succeq 0$ (positive semidefinite, that is, $z^{\mathsf T} H(x) z \ge 0$ for all $z \in \mathbb{R}^n$).
(ii) If $H(x)$ exists for all $x \in D$ and $H(x) \succeq 0$ for all $x \in D$, then $f$ is convex.
(iii) If $H(x)$ exists for all $x \in D$ and $H(x) \succ 0$ for all $x \in D$ (positive definite, that is, $z^{\mathsf T} H(x) z > 0$ for all $z \in \mathbb{R}^n \setminus \{0\}$), then $f$ is strictly convex.
Proof. See homework assignments.
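For quadratic functions the Hessian is constant, so the conditions of Theorem 2.5 reduce to a definiteness test on a single matrix. The following sketch classifies symmetric 2×2 matrices through their eigenvalues, computed in closed form. The function names, the tolerance and the three example quadratics are assumptions made for this illustration.

```python
import math


def eigenvalues_sym_2x2(h):
    """Eigenvalues (smallest first) of the symmetric matrix [[a, b], [b, c]]."""
    a, b, c = h[0][0], h[0][1], h[1][1]
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius


def classify(h, tol=1e-12):
    """Classify a symmetric 2x2 matrix by the sign of its smallest eigenvalue."""
    smallest, _ = eigenvalues_sym_2x2(h)
    if smallest > tol:
        return "positive definite"
    if smallest >= -tol:
        return "positive semidefinite"
    return "not positive semidefinite"


# Constant Hessians of three hypothetical quadratics in (x1, x2):
hessians = {
    "x1^2 + x1*x2 + x2^2": [[2, 1], [1, 2]],  # strictly convex by (iii)
    "(x1 + x2)^2": [[2, 2], [2, 2]],          # convex by (ii), flat along x1 = -x2
    "x1^2 - x2^2": [[2, 0], [0, -2]],         # saddle, not convex by (i)
}

for name, h in hessians.items():
    print(name, eigenvalues_sym_2x2(h), classify(h))
```

For non-quadratic functions the Hessian depends on $x$, so the test has to hold at every point of $D$; for instance, the objective of Example 2.1 has $f''(x) = 6x + 18$, which is negative for $x < -3$.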