MOSEK Modeling Cookbook, Lecture notes of Linear Programming

Linear programming introduction

Typology: Lecture notes

2020/2021

Uploaded on 04/29/2021

kuku-tote
MOSEK Modeling Cookbook
Release 2.3
MOSEK ApS
13 July 2018

Chapter 1

Preface

This cookbook is about model building using convex optimization. It is intended as a modeling guide for the MOSEK optimization package. However, the style is intentionally quite generic without specific MOSEK commands or API descriptions. There are several excellent books available on this topic, for example the recent books by Ben-Tal and Nemirovski [BenTalN01] and Boyd and Vandenberghe [BV04], which have both been a great source of inspiration for this manual. The purpose of this manual is to collect the material which we consider most relevant to our customers and to present it in a practical self-contained manner; however, we highly recommend the books as a supplement to this manual. Some textbooks on building models using optimization (or mathematical programming) introduce different concepts through practical examples. In this manual we have chosen a different route, where we instead show the different sets and functions that can be modeled using convex optimization, which can subsequently be combined into realistic examples and applications. In other words, we present simple convex building blocks, which can then be combined into more elaborate convex models. With the advent of more expressive and sophisticated tools like conic optimization, we feel that this approach is better suited. The first three chapters discuss self-dual conic optimization, namely linear optimization (Sec. 2), conic quadratic optimization (Sec. 3) and semidefinite optimization (Sec. 4), which should be read in succession. Sec. 5 discusses quadratic optimization, and has Sec. 3 as a prerequisite. Sec. 6 diverges from the path of convex optimization and discusses mixed integer conic optimization. The remaining chapters delve deeper into more specialized topics. Sec. 10 contains details on notation used in the manual.

Chapter 2

Linear optimization

2.1 Introduction

In this chapter we discuss different aspects of linear optimization. We first introduce the basic concepts of linear optimization and discuss the underlying geometric interpretations. We then give examples of the most frequently used reformulations or modeling tricks used in linear optimization, and we finally discuss duality and infeasibility theory in some detail.

2.1.1 Basic notions

The most basic class of optimization is linear optimization. In linear optimization we minimize a linear function given a set of linear constraints. For example, we may wish to minimize a linear function

$$x_1 + 2x_2 - x_3$$

under the constraints that

$$x_1 + x_2 + x_3 = 1, \quad x_1, x_2, x_3 \ge 0.$$

The function we minimize is often called the objective function; in this case we have a linear objective function. The constraints are also linear and consist of both linear equality constraints and linear inequality constraints. We typically use a more compact notation

$$\begin{array}{ll} \text{minimize} & x_1 + 2x_2 - x_3 \\ \text{subject to} & x_1 + x_2 + x_3 = 1, \\ & x_1, x_2, x_3 \ge 0, \end{array} \tag{2.1}$$

and we call (2.1) a linear optimization problem. The domain where all constraints are satisfied is called the feasible set; the feasible set for (2.1) is shown in Fig. 2.1. For this simple problem we see by inspection that the optimal value of the problem is $-1$, obtained by the optimal solution

$$(x_1^\star, x_2^\star, x_3^\star) = (0, 0, 1).$$
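As a quick sanity check of the optimum claimed above, the following minimal Python sketch evaluates the objective of (2.1) at the vertices of the feasible set. It relies on two facts assumed here rather than shown: the feasible set of (2.1) is the standard simplex, whose vertices are the unit vectors, and a linear function over a bounded polyhedron attains its minimum at a vertex. Listing vertices by hand only works for toy problems like this one.

```python
# Objective coefficients of (2.1): minimize x1 + 2*x2 - x3
c = (1, 2, -1)

# Feasible set {x >= 0 : x1 + x2 + x3 = 1} is the standard simplex;
# its vertices are the three unit vectors (listed by hand here).
vertices = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]


def objective(x):
    """Evaluate c^T x."""
    return sum(ci * xi for ci, xi in zip(c, x))


# A linear objective over a bounded polyhedron is minimized at some vertex.
best = min(vertices, key=objective)
print(best, objective(best))  # (0, 0, 1) -1
```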

Fig. 2.2: The dashed line illustrates a hyperplane $\{x \mid a^T x = \gamma\}$.

with $A \in \mathbb{R}^{m\times n}$ represents an intersection of $m$ hyperplanes. Next consider a point $x$ above the hyperplane in Fig. 2.2, and let $x_0$ be a point on the hyperplane, so that $a^T x_0 = \gamma$. Since $x - x_0$ forms an acute angle with $a$, we have that $a^T (x - x_0) \ge 0$, or $a^T x \ge \gamma$. The set $\{x \mid a^T x \ge \gamma\}$ is called a halfspace, see Fig. 2.3. Similarly, the set $\{x \mid a^T x \le \gamma\}$ forms another halfspace; in Fig. 2.3 it corresponds to the area below the dashed line.

Fig. 2.3: The grey area is the halfspace $\{x \mid a^T x \ge \gamma\}$.

A set of linear inequalities

$$Ax \le b$$

corresponds to an intersection of halfspaces and forms a polyhedron, see Fig. 2.4.

Fig. 2.4: A polyhedron formed as an intersection of halfspaces (with normals $a_1, \dots, a_5$).
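To make the halfspace picture concrete, here is a small sketch that tests whether a point lies in $\{x \mid Ax \le b\}$ by checking each halfspace $a_i^T x \le b_i$ separately. The unit-square data is a hypothetical example, not taken from the figures.

```python
def in_halfspace(a, gamma, x):
    """True if x lies in the halfspace {x | a^T x <= gamma}."""
    return sum(ai * xi for ai, xi in zip(a, x)) <= gamma


def in_polyhedron(A, b, x):
    """True if A x <= b, i.e. x lies in every halfspace a_i^T x <= b_i."""
    return all(in_halfspace(a_i, b_i, x) for a_i, b_i in zip(A, b))


# Hypothetical data: the unit square 0 <= x1 <= 1, 0 <= x2 <= 1 written as A x <= b.
A = [(1, 0), (-1, 0), (0, 1), (0, -1)]
b = [1, 0, 1, 0]

print(in_polyhedron(A, b, (0.5, 0.5)))  # True
print(in_polyhedron(A, b, (1.5, 0.5)))  # False: violates x1 <= 1
```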

The polyhedral description of the feasible set gives us a very intuitive interpretation of linear optimization, which is illustrated in Fig. 2.5. The dashed lines are normal to the objective vector $c = (-1, 1)$, and to minimize $c^T x$ we move as far as possible in the direction opposite to $c$, to the point where one of the normals intersects the polyhedron; an optimal solution is therefore always either a vertex of the polyhedron, or an entire facet of the polyhedron is optimal.

Fig. 2.5: Geometric interpretation of linear optimization. The optimal solution $x^\star$ is at the point where the normals to $c$ (the dashed lines) intersect the polyhedron.
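The geometric argument can be mimicked numerically in two dimensions: intersect every pair of boundary lines, keep the intersection points that satisfy all inequalities (the vertices), and pick the vertex minimizing $c^T x$. The sketch below does this for a hypothetical bounded pentagon with five halfspaces and the direction $c = (-1, 1)$ from Fig. 2.5. The data is made up for illustration, and the enumeration is only sensible for tiny problems; practical LP solvers do not work this way.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical pentagon {x | A x <= b}: x1 >= 0, x2 >= 0, x1 <= 2, x2 <= 2, x1 + x2 <= 3.
A = [(-1, 0), (0, -1), (1, 0), (0, 1), (1, 1)]
b = [0, 0, 2, 2, 3]
c = (-1, 1)  # objective direction used in Fig. 2.5


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def vertices(A, b):
    """Intersect every pair of boundary lines and keep the feasible points."""
    found = set()
    for i, j in combinations(range(len(A)), 2):
        (p, q), (r, s) = A[i], A[j]
        det = p * s - q * r
        if det == 0:  # parallel lines do not meet in a single point
            continue
        # Cramer's rule for the 2x2 system [p q; r s] x = [b_i; b_j]
        x = (Fraction(b[i] * s - q * b[j], det), Fraction(p * b[j] - b[i] * r, det))
        if all(dot(a, x) <= bk for a, bk in zip(A, b)):
            found.add(x)
    return found


V = vertices(A, b)
x_star = min(V, key=lambda x: dot(c, x))  # best vertex for minimizing c^T x
print(sorted(V))
print(x_star, dot(c, x_star))
```

Exact `Fraction` arithmetic is used so that the feasibility test on the boundary lines is not disturbed by rounding.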

The polyhedron shown in the figure is bounded, but this is not always the case for polyhedra coming from linear inequalities in optimization problems. In such cases the optimization problem may be unbounded, which we will discuss in detail in Sec. 2.4.

2.2 Linear modeling

In this section we discuss both useful reformulation techniques to model different functions using linear programming, as well as common practices that are best avoided. By modeling we mean equivalent reformulations that lead to the same optimal solution; there is no approximation or modeling error involved.

2.2.1 Convex piecewise-linear functions

Perhaps the most basic and frequently used reformulation for linear optimization involves modeling a convex piecewise-linear function by introducing a number of linear inequalities. Consider the convex piecewise-linear (or rather piecewise-affine) function illustrated in Fig. 2.6, where the function can be described as $\max\{a_1 x + b_1,\ a_2 x + b_2,\ a_3 x + b_3\}$.

Fig. 2.6: A convex piecewise-linear function (solid lines) of a single variable $x$. The function is defined as the maximum of 3 affine functions, $a_1 x + b_1$, $a_2 x + b_2$ and $a_3 x + b_3$.
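The piecewise-linear function in Fig. 2.6 can be handled with linear inequalities only: $t \ge \max_i (a_i x + b_i)$ holds exactly when $t \ge a_i x + b_i$ for every $i$. The sketch below checks this epigraph description for hypothetical coefficients and, for a function of one variable, solves "minimize $t$ subject to those inequalities" by testing the pairwise intersection points of the pieces. That last step assumes the pieces include both a negative and a positive slope, so that the function is bounded below and a minimizer sits at a breakpoint.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical pieces (a_i, b_i) of f(x) = max_i (a_i x + b_i); slopes -1, 1 and 3.
pieces = [(-1, 0), (1, -2), (3, -8)]


def f(x):
    return max(a * x + b for a, b in pieces)


def in_epigraph(x, t):
    """t >= f(x) holds iff t >= a_i x + b_i for every piece (linear inequalities)."""
    return all(t >= a * x + b for a, b in pieces)


# Candidate minimizers: points where two pieces intersect, a1 x + b1 = a2 x + b2.
candidates = [Fraction(b2 - b1, a1 - a2)
              for (a1, b1), (a2, b2) in combinations(pieces, 2) if a1 != a2]
x_star = min(candidates, key=f)
print(x_star, f(x_star))  # 1 -1
```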

with additional (auxiliary) variable $z \in \mathbb{R}^n$, and we claim that (2.2) and (2.3) are equivalent. They are equivalent if all $z_i = |x_i|$ in (2.3). Suppose that $t$ is optimal (as small as possible) for (2.3), but that some $z_i > |x_i|$. Then we could reduce $t$ further by reducing $z_i$, contradicting the assumption that $t$ is optimal, so the two formulations are equivalent. Therefore, we can model (2.2) using linear (in)equalities

$$-z_i \le x_i \le z_i,\ i = 1, \dots, n, \qquad \sum_{i=1}^{n} z_i = t,$$

with auxiliary variables 𝑧. Similarly, we can describe the epigraph of the norm of an affine function of 𝑥,

$$\|Ax - b\|_1 \le t$$

as

$$-z_i \le a_i^T x - b_i \le z_i,\ i = 1, \dots, m, \qquad \sum_{i=1}^{m} z_i = t,$$

where $a_i$ is the $i$-th row of $A$ (taken as a column vector). The $\ell_1$ norm is overwhelmingly popular as a convex approximation of the cardinality (i.e., the number of nonzero elements) of a vector $x$. For example, suppose we are given an underdetermined linear system

𝐴𝑥 = 𝑏

where $A \in \mathbb{R}^{m\times n}$ and $m \ll n$. The basis pursuit problem

$$\begin{array}{ll} \text{minimize} & \|x\|_1 \\ \text{subject to} & Ax = b, \end{array} \tag{2.4}$$

uses the $\ell_1$ norm of $x$ as a heuristic for finding a sparse solution (one with many zero elements) to $Ax = b$, i.e., it aims to represent $b$ using few columns of $A$. Using the reformulation above we can pose the problem as a linear optimization problem,

$$\begin{array}{ll} \text{minimize} & e^T z \\ \text{subject to} & -z \le x \le z, \\ & Ax = b, \end{array} \tag{2.5}$$

where $e = (1, \dots, 1)^T$.
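For very small instances the linear program (2.5) can be solved by brute force, which also shows why it tends to produce sparse solutions. Substituting $x = u - v$ with $u, v \ge 0$ gives an equivalent standard-form problem with $m$ equality constraints, so, assuming $A$ has full row rank, some optimal solution is a basic solution with at most $m$ nonzero entries. The sketch below enumerates all supports of size $m = 2$ for a hypothetical $2 \times 4$ system, solves each $2 \times 2$ subsystem with Cramer's rule, and keeps the solution with the smallest $\ell_1$ norm. The cost grows combinatorially, so this is only a didactic check, not a substitute for an LP solver.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical underdetermined system A x = b with m = 2 rows and n = 4 columns.
A = [[1, 1, 1, 0],
     [0, 1, 2, 1]]
b = [1, 2]
m, n = len(A), len(A[0])


def solve_2x2(p, q):
    """Solve A[:, (p, q)] (x_p, x_q) = b by Cramer's rule; None if singular."""
    det = A[0][p] * A[1][q] - A[0][q] * A[1][p]
    if det == 0:
        return None
    xp = Fraction(b[0] * A[1][q] - A[0][q] * b[1], det)
    xq = Fraction(A[0][p] * b[1] - b[0] * A[1][p], det)
    return xp, xq


best_x, best_val = None, None
for p, q in combinations(range(n), m):  # every support of size m = 2
    sol = solve_2x2(p, q)
    if sol is None:
        continue
    x = [Fraction(0)] * n
    x[p], x[q] = sol
    val = sum(abs(xi) for xi in x)  # ||x||_1, i.e. e^T z with z = |x|
    if best_val is None or val < best_val:
        best_x, best_val = x, val

print(best_x, best_val)
```

For this data the minimum-norm basic solution is $x = (0, 0, 1, 0)$ with $\|x\|_1 = 1$, i.e. $b$ is represented by a single column of $A$.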

2.2.4 The ℓ∞ norm

The $\ell_\infty$ norm of a vector $x \in \mathbb{R}^n$ is defined as

$$\|x\|_\infty := \max_{i=1,\dots,n} |x_i|,$$

which is another example of a simple piecewise-linear function. To model

$$\|x\|_\infty \le t \tag{2.6}$$

we use that $t \ge \max_{i=1,\dots,n} |x_i|$ if and only if $t$ is greater than or equal to each term, i.e., we can model (2.6) as

$$-t \le x_i \le t, \quad i = 1, \dots, n.$$

Again, we can also consider an affine function of 𝑥, i.e.,

‖𝐴𝑥 − 𝑏‖∞ ≤ 𝑡,

which can be described as

$$-t \le a_i^T x - b_i \le t, \quad i = 1, \dots, m.$$
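A small numerical check of this description: for a fixed $x$, the smallest $t$ satisfying $-t \le a_i^T x - b_i \le t$ for all rows is exactly $\|Ax - b\|_\infty$. The matrix, right-hand side and point below are hypothetical.

```python
# Hypothetical data: rows a_i of A, right-hand side b and a fixed point x.
A = [[1, 2], [3, -1], [0, 1]]
b = [1, 0, 4]
x = [1, 1]

# Residuals a_i^T x - b_i for every row.
residual = [sum(aij * xj for aij, xj in zip(a_i, x)) - b_i for a_i, b_i in zip(A, b)]


def satisfies(t):
    """Check the linear inequalities -t <= a_i^T x - b_i <= t for all i."""
    return all(-t <= r <= t for r in residual)


t_min = max(abs(r) for r in residual)  # ||A x - b||_inf
print(residual, t_min, satisfies(t_min), satisfies(t_min - 1))  # [2, 2, -3] 3 True False
```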

It is interesting to note that the $\ell_1$ and $\ell_\infty$ norms are dual norms. For any norm $\|\cdot\|$ on $\mathbb{R}^n$, the dual norm $\|\cdot\|_*$ is defined as

$$\|x\|_* = \max\{x^T v \mid \|v\| \le 1\}.$$

Let us verify that the dual of the $\ell_\infty$ norm is the $\ell_1$ norm. Consider

$$\|x\|_{*,\infty} = \max\{x^T v \mid \|v\|_\infty \le 1\}.$$

Obviously the maximum is attained for

$$v_i = \begin{cases} 1, & x_i \ge 0, \\ -1, & x_i < 0, \end{cases}$$

i.e., $\|x\|_{*,\infty} = \|x\|_1 = \sum_i |x_i|$. Similarly, consider the dual of the $\ell_1$ norm,

$$\|x\|_{*,1} = \max\{x^T v \mid \|v\|_1 \le 1\}.$$

To maximize $x^T v$ subject to $|v_1| + \cdots + |v_n| \le 1$ we identify the element of $x$ with the largest magnitude, say $|x_k|$. The optimizer $v$ is then given by $v_k = \pm 1$ (with the same sign as $x_k$) and $v_i = 0$, $i \ne k$, i.e., $\|x\|_{*,1} = \|x\|_\infty$. This illustrates a more general property of dual norms, namely that $\|x\|_{**} = \|x\|$.
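Both dual-norm computations can be checked numerically using the fact that a linear function attains its maximum over a polytope at a vertex: the vertices of the $\ell_\infty$ unit ball are the sign vectors $\{-1, 1\}^n$, and those of the $\ell_1$ unit ball are $\pm e_k$. The sketch below enumerates these vertices; the function names are made up for illustration, and the $\ell_\infty$ case enumerates $2^n$ vectors, so it is meant for small $n$ only.

```python
from itertools import product


def dual_of_linf(x):
    """max x^T v over the l_inf unit ball, attained at a vertex v in {-1, 1}^n."""
    return max(sum(xi * vi for xi, vi in zip(x, v))
               for v in product((-1, 1), repeat=len(x)))


def dual_of_l1(x):
    """max x^T v over the l_1 unit ball, attained at a vertex v = +e_k or -e_k."""
    return max(s * xk for xk in x for s in (-1, 1))


x = [3, -5, 1]
print(dual_of_linf(x), sum(abs(xi) for xi in x))  # 9 9  (the l_1 norm)
print(dual_of_l1(x), max(abs(xi) for xi in x))    # 5 5  (the l_inf norm)
```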

2.2.5 Avoid ill-posed problems

A problem is ill-posed if small perturbations of the problem data result in arbitrarily large perturbations of the solution, or change the feasibility of the problem. Such problem formulations should always be avoided, as even the smallest numerical perturbations (for example rounding errors, or solving the problem on a different computer) can result in different or wrong solutions. Additionally, from an algorithmic point of view, even computing a wrong solution is very difficult for ill-posed problems. A rigorous definition of the degree of ill-posedness is possible by defining a condition number for a linear optimization problem, but unfortunately this is not a very practical metric, as evaluating such a condition number requires solving several optimization problems. Therefore, even though being able to quantify the difficulty of an optimization problem from a condition number is very attractive, we only make the modest recommendation to avoid problems

• that are nearly infeasible,

with a dual problem

$$\begin{array}{ll} \text{maximize} & b^T y - \gamma e^T z \\ \text{subject to} & A^T y + s - z = c, \\ & s, z \ge 0. \end{array}$$

Suppose we do not know a priori an upper bound on $\|x\|_\infty$, so we choose $\gamma = 10^{12}$, reasoning that this will not change the optimal solution. Note that the large variable bound becomes a penalty term in the dual problem; in finite precision such a large bound will effectively destroy the accuracy of the solution.
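A back-of-the-envelope illustration of the finite-precision effect (not an analysis of any particular solver): in IEEE double precision, numbers near $10^{12}$ are spaced about $1.2 \cdot 10^{-4}$ apart, so a contribution of $10^{-5}$ added to a term of size $\gamma = 10^{12}$ disappears entirely, while the same contribution added to a term of size 1 survives. The sketch assumes Python 3.9 or later for `math.ulp`.

```python
import math

gamma = 1e12   # the "safe" bound from the text
tiny = 1e-5    # a small but meaningful contribution to an objective

# Spacing of double-precision numbers near 1e12 (math.ulp requires Python 3.9+).
print(math.ulp(gamma))          # 0.0001220703125

# Adding 1e-5 to 1e12 is rounded away completely ...
print((gamma + tiny) - gamma)   # 0.0
# ... whereas next to a term of size 1 it survives, up to tiny rounding.
print((1.0 + tiny) - 1.0)
```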

2.3 Duality in linear optimization

Duality theory is a rich and powerful area of convex optimization, and central to understanding sensitivity analysis and infeasibility issues in linear (and convex) optimization. Furthermore, it provides a simple and systematic way of obtaining non-trivial lower bounds on the optimal value for many difficult non-convex problems. In this section we only discuss duality theory at a descriptive level suited for practitioners; we refer to Sec. 9 for a more advanced treatment.

2.3.1 The dual problem

Initially, consider the standard linear optimization problem

$$\begin{array}{ll} \text{minimize} & c^T x \\ \text{subject to} & Ax = b, \\ & x \ge 0. \end{array} \tag{2.7}$$

Associated with (2.7) is a so-called Lagrangian function $L : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ that augments the objective with a weighted combination of all the constraints,

$$L(x, y, s) = c^T x + y^T (b - Ax) - s^T x.$$

The variables $y \in \mathbb{R}^m$ and $s \in \mathbb{R}^n_+$ are called Lagrange multipliers or dual variables. It is easy to verify that

$$L(x, y, s) \le c^T x$$

for any feasible $x$. Indeed, we have $b - Ax = 0$ and $x^T s \ge 0$ since $x, s \ge 0$, i.e., $L(x, y, s) \le c^T x$. Note the importance of the nonnegativity of $s$; more generally, of all Lagrange multipliers associated with inequality constraints. Without the nonnegativity constraint the Lagrangian function is not a lower bound. The dual function is defined as the minimum of $L(x, y, s)$ over $x$. Thus the dual function of (2.7) is

$$g(y, s) = \min_x L(x, y, s) = \min_x\, x^T (c - A^T y - s) + b^T y.$$

We see that the Lagrangian function is linear in $x$, so it is unbounded below unless $c - A^T y - s = 0$, i.e.,

$$g(y, s) = \begin{cases} b^T y, & c - A^T y - s = 0, \\ -\infty, & \text{otherwise}. \end{cases}$$

Finally, we obtain the dual problem by maximizing $g(y, s)$. The dual problem of (2.7) is

$$\begin{array}{ll} \text{maximize} & b^T y \\ \text{subject to} & c - A^T y = s, \\ & s \ge 0. \end{array}$$
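As a minimal sketch of these relations, the code below applies them to problem (2.1) written in the form (2.7), with $A = [1\ 1\ 1]$, $b = 1$ and $c = (1, 2, -1)$. Dual feasibility $s = c - A^T y \ge 0$ means $y \le -1$, so $y^\star = -1$ is dual optimal with $b^T y^\star = -1$, matching the primal optimal value found by inspection. The code also checks the identity $c^T x - b^T y = s^T x$ for primal-feasible $x$, which together with $s, x \ge 0$ gives weak duality $c^T x \ge b^T y$.

```python
from fractions import Fraction

# Problem (2.1) in the standard form (2.7): A = [1 1 1], b = [1], c = (1, 2, -1).
A = [[1, 1, 1]]
b = [1]
c = [1, 2, -1]


def dual_slack(y):
    """s = c - A^T y; y is dual feasible iff every entry of s is >= 0."""
    return [c[j] - sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(c))]


def gap(x, y):
    """Duality gap c^T x - b^T y, which equals s^T x whenever A x = b."""
    return sum(cj * xj for cj, xj in zip(c, x)) - sum(bi * yi for bi, yi in zip(b, y))


y_star = [-1]                  # largest y with 1 - y, 2 - y, -1 - y >= 0
s_star = dual_slack(y_star)    # [2, 3, 0], so y_star is dual feasible
x_star = [0, 0, 1]
print(s_star, gap(x_star, y_star))  # [2, 3, 0] 0

# Weak duality for another primal-feasible point: gap = s^T x >= 0.
x = [Fraction(1, 3)] * 3
print(gap(x, y_star), sum(sj * xj for sj, xj in zip(s_star, x)))
```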

Example 2.1 (Dual of basis pursuit). As another example, let us derive the dual of the basis pursuit formulation (2.5). The Lagrangian function is

$$L(x, z, y, u, v) = e^T z + u^T (x - z) - v^T (x + z) + y^T (b - Ax)$$

with Lagrange multipliers $y \in \mathbb{R}^m$ and $u, v \in \mathbb{R}^n_+$. The dual function

$$g(y, u, v) = \min_{x, z} L(x, z, y, u, v) = \min_{x, z}\, z^T (e - u - v) + x^T (u - v - A^T y) + y^T b$$

is linear in $z$ and $x$, so it is unbounded below unless $e = u + v$ and $A^T y = u - v$, i.e., the dual problem is

$$\begin{array}{ll} \text{maximize} & b^T y \\ \text{subject to} & e = u + v, \\ & A^T y = u - v, \\ & u, v \ge 0. \end{array}$$

Example 2.2 (Dual of basis pursuit revisited). We can also derive the dual of the basis pursuit formulation (2.4) directly. The Lagrangian is

$$L(x, y) = \|x\|_1 + y^T (Ax - b) = \|x\|_1 + x^T A^T y - b^T y$$

with a dual function

$$g(y) = -b^T y + \min_x \left(\|x\|_1 + x^T A^T y\right).$$

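The term $\min_x (\|x\|_1 + x^T A^T y)$ can be simplified by writing $x = t z$ with $t \ge 0$ and $\|z\|_1 = 1$:

$$\begin{aligned} \min_x \left(\|x\|_1 + x^T A^T y\right) &= \min_{t \ge 0} \min_{\|z\|_1 = 1} \left(t \|z\|_1 + t z^T A^T y\right) \\ &= \min_{t \ge 0} t \Big(1 - \max_{\|z\|_1 = 1} z^T A^T y\Big) \\ &= \min_{t \ge 0} t \left(1 - \|A^T y\|_\infty\right), \end{aligned}$$

where the second line uses that the set $\{z \mid \|z\|_1 = 1\}$ is symmetric ($z$ and $-z$ both belong to it), and the last line uses the definition of the dual norm. Finally, $\min_{t \ge 0} t (1 - \|A^T y\|_\infty)$ is $0$ if $\|A^T y\|_\infty \le 1$ and unbounded below otherwise. In other words, we get a dual function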

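$$g(y) = \begin{cases} -b^T y, & \|A^T y\|_\infty \le 1, \\ -\infty, & \text{otherwise}, \end{cases}$$

and a dual problem of maximizing $-b^T y$ subject to $\|A^T y\|_\infty \le 1$. Since the constraint is unchanged when $y$ is replaced by $-y$, this is equivalent to

$$\begin{array}{ll} \text{maximize} & b^T y \\ \text{subject to} & \|A^T y\|_\infty \le 1, \end{array}$$

which agrees with Example 2.1, since eliminating $u, v \ge 0$ from $e = u + v$ and $A^T y = u - v$ leaves exactly $-e \le A^T y \le e$.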
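The two dual formulations can be checked on a tiny hypothetical system. Weak duality here follows from $b^T y = x^T A^T y \le \|x\|_1 \|A^T y\|_\infty \le \|x\|_1$ for any primal-feasible $x$ and dual-feasible $y$, so a feasible pair with equal objectives certifies that both are optimal. The sketch below verifies such a pair and also recovers the multipliers $u = (e + A^T y)/2$ and $v = (e - A^T y)/2$ of Example 2.1; the data and the chosen $y$ are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical basis pursuit data with a primal-feasible x and a candidate dual y.
A = [[1, 1, 1, 0],
     [0, 1, 2, 1]]
b = [1, 2]
x = [0, 0, 1, 0]
y = [Fraction(0), Fraction(1, 2)]

assert [sum(A[i][j] * x[j] for j in range(4)) for i in range(2)] == b  # A x = b

Aty = [sum(A[i][j] * y[i] for i in range(2)) for j in range(4)]  # A^T y
dual_feasible = max(abs(w) for w in Aty) <= 1   # ||A^T y||_inf <= 1
primal_obj = sum(abs(xj) for xj in x)           # ||x||_1
dual_obj = sum(bi * yi for bi, yi in zip(b, y)) # b^T y

# Multipliers of Example 2.1: u - v = A^T y, u + v = e, both nonnegative.
u = [(1 + w) / 2 for w in Aty]
v = [(1 - w) / 2 for w in Aty]
print(dual_feasible, primal_obj, dual_obj, min(u + v) >= 0)  # True 1 1 True
```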

2.4 Infeasibility in linear optimization

2.4.1 Basic concepts

In Sec. 2.3.2 we summarized the main duality properties, namely weak and strong duality properties. In this section we discuss situations where strong duality does not hold. Those situations are captured by the following two results known as (variations of) Farkas’ lemma; for proofs see Sec. 9.

Lemma 2.1 (Farkas’ lemma). Given $A$ and $b$, exactly one of the two statements is true:

  1. There exists an $x \ge 0$ such that $Ax = b$.
  2. There exists a $y$ such that $A^T y \le 0$ and $b^T y > 0$.

The Farkas lemma tells us that either the primal problem (2.7) is feasible ($\mathcal{F}_p \neq \emptyset$) or there exists a $y$ such that $A^T y \le 0$ and $b^T y > 0$. In other words, any $y$ satisfying

$$A^T y \le 0, \quad b^T y > 0$$

is a certificate of primal infeasibility. We can also think of a Farkas certificate as an unbounded direction for the dual problem; to that end assume that

$$\nexists\, x \ge 0 : Ax = b,$$

so we have a $y$ satisfying $A^T y \le 0$ and $b^T y > 0$. If we further assume the existence of a point $y_0$ satisfying

$$c - A^T y_0 \ge 0,$$

then the dual remains feasible in the direction of $y$,

$$c - A^T (t y + y_0) \ge 0, \quad \forall t \ge 0,$$

with an unbounded objective $b^T (t y + y_0) \to \infty$ for $t \to \infty$, i.e., $d^\star = \infty$. Similarly, the dual variant of Farkas’ lemma states that either the dual problem is feasible ($\mathcal{F}_d \neq \emptyset$) or there exists an $x \ge 0$ such that $Ax = 0$ and $c^T x < 0$. More precisely:

Lemma 2.2 (Farkas’ lemma, dual variant). Given $A$ and $c$, exactly one of the two statements is true:

  1. There exists an $x \ge 0$ such that $Ax = 0$ and $c^T x < 0$.
  2. There exists a $y$ such that $c - A^T y \ge 0$.

In other words, any $x \ge 0$ satisfying $Ax = 0$ and $c^T x < 0$ is a certificate of dual infeasibility. If the primal problem is feasible, then the certificate is a feasible unbounded direction for the primal objective, i.e., $p^\star = -\infty$. Below we summarize the different cases that can occur in linear optimization:

• If either the primal or the dual problem is feasible, we have strong duality, i.e., $p^\star = d^\star$.

• If the primal problem is infeasible ($p^\star = \infty$), then from Farkas’ lemma the dual problem is unbounded ($d^\star = \infty$) or infeasible ($d^\star = -\infty$).

• If the primal problem is unbounded ($p^\star = -\infty$), then from weak duality the dual problem is infeasible ($d^\star = -\infty$).

• If the dual problem is infeasible ($d^\star = -\infty$), then from the dual variant of Farkas’ lemma the primal problem is unbounded ($p^\star = -\infty$) or infeasible ($p^\star = \infty$).

Example 2.3 (Primal and dual infeasibility). As an example exhibiting both primal and dual infeasibility consider the problem

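$$\begin{array}{ll} \text{minimize} & -x_1 - x_2 \\ \text{subject to} & x_1 = -1, \\ & x_1, x_2 \ge 0 \end{array}$$

with a dual problem

$$\begin{array}{ll} \text{maximize} & -y \\ \text{subject to} & \begin{bmatrix} 1 \\ 0 \end{bmatrix} y \le \begin{bmatrix} -1 \\ -1 \end{bmatrix}. \end{array}$$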

Both the primal and dual problems are trivially infeasible; 𝑦 = − 1 serves as a certificate of primal infeasibility, and 𝑥 = (0, 1) is a certificate of dual infeasibility.
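The certificates in Example 2.3 can be verified mechanically. The sketch below encodes the example in the form (2.7), i.e. $A = [1\ 0]$, $b = -1$, $c = (-1, -1)$, and tests the conditions of Lemma 2.1 and Lemma 2.2 directly; the helper names are hypothetical.

```python
# Example 2.3 in the form (2.7): A = [1 0], b = [-1], c = (-1, -1).
A = [[1, 0]]
b = [-1]
c = [-1, -1]


def At(y):
    """A^T y as a list."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(c))]


def Ax(x):
    """A x as a list."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


def is_primal_infeasibility_cert(y):
    """Lemma 2.1, statement 2: A^T y <= 0 and b^T y > 0."""
    return all(w <= 0 for w in At(y)) and sum(bi * yi for bi, yi in zip(b, y)) > 0


def is_dual_infeasibility_cert(x):
    """Lemma 2.2, statement 1: x >= 0, A x = 0 and c^T x < 0."""
    return (all(xj >= 0 for xj in x) and all(w == 0 for w in Ax(x))
            and sum(cj * xj for cj, xj in zip(c, x)) < 0)


print(is_primal_infeasibility_cert([-1]))   # True
print(is_dual_infeasibility_cert([0, 1]))   # True
```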

2.4.2 Locating infeasibility

In some cases we are interested in locating the cause of infeasibility in a model, for example if we expect the infeasibility to be caused by an error in the problem formulation. This can be difficult in practice, but a Farkas certificate lets us reduce the dimension of the infeasible problem, which in some cases pinpoints the cause of infeasibility. To that end, suppose we are given a certificate of primal infeasibility,

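$$A^T y \le 0, \quad b^T y > 0,$$

and define the index set

$$\mathcal{I} = \{i \mid y_i \neq 0\}.$$

Consider the reduced set of constraints

$$A_{\mathcal{I},:}\, x = b_{\mathcal{I}}, \quad x \ge 0.$$

It is then easy to verify that

$$A_{\mathcal{I},:}^T\, y_{\mathcal{I}} \le 0, \quad b_{\mathcal{I}}^T\, y_{\mathcal{I}} > 0$$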

is an infeasibility certificate for the reduced problem with fewer constraints. If the reduced system is sufficiently small, it may be possible to locate the cause of infeasibility by manual inspection.
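A minimal sketch of this reduction, on a hypothetical system in which two equality constraints contradict each other and a third is unrelated. The certificate $y$ is picked by hand here; in practice it would come from a solver's infeasibility report. Keeping only the rows with $y_i \neq 0$ isolates the two conflicting constraints.

```python
# Hypothetical infeasible system A x = b, x >= 0:
#   x1 + x2 = 1,   x1 + x2 = 2,   x3 = 5
A = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
b = [1, 2, 5]
y = [-1, 1, 0]  # a Farkas certificate, chosen by hand for this example


def is_certificate(A, b, y):
    """Check A^T y <= 0 and b^T y > 0."""
    n = len(A[0])
    Aty = [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(n)]
    return all(w <= 0 for w in Aty) and sum(bi * yi for bi, yi in zip(b, y)) > 0


# Keep only the constraints with nonzero multipliers: I = {i | y_i != 0}.
I = [i for i, yi in enumerate(y) if yi != 0]
A_I = [A[i] for i in I]
b_I = [b[i] for i in I]
y_I = [y[i] for i in I]

print(is_certificate(A, b, y), I, is_certificate(A_I, b_I, y_I))  # True [0, 1] True
```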

Chapter 3

Conic quadratic optimization

3.1 Introduction

This chapter extends the notion of linear optimization with quadratic cones; conic quadratic optimization is a straightforward generalization of linear optimization, in the sense that we optimize a linear function under linear (in)equalities with variables belonging to one or more (rotated) quadratic cones. In general we also allow some of the variables to be linear variables, as long as some of the variables belong to a quadratic cone. We discuss the basic concept of quadratic cones, and demonstrate the surprisingly large flexibility of conic quadratic modeling with several examples of (non-trivial) convex functions or sets that can be represented using quadratic cones. These convex sets can then be combined arbitrarily to form different conic quadratic optimization problems. We finally extend the duality theory and infeasibility analysis from linear to conic optimization, and discuss infeasibility of conic quadratic optimization problems.

3.1.1 Quadratic cones

We define an 𝑛-dimensional quadratic cone as

$$\mathcal{Q}^n = \left\{ x \in \mathbb{R}^n \;\middle|\; x_1 \ge \sqrt{x_2^2 + x_3^2 + \cdots + x_n^2} \right\}. \tag{3.1}$$

The geometric interpretation of a quadratic (or second-order) cone is shown in Fig. 3.1 for a cone with three variables, and illustrates how the boundary of the cone resembles an ice-cream cone. The 1-dimensional quadratic cone simply implies standard nonnegativity $x_1 \ge 0$. A set $S$ is called a cone if for any $x \in S$ we have $\alpha x \in S$, $\forall \alpha \ge 0$. From the definition (3.1) it is clear that if $x \in \mathcal{Q}^n$ then $\alpha x \in \mathcal{Q}^n$, $\forall \alpha \ge 0$, which justifies the name quadratic cone.
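A membership test for $\mathcal{Q}^n$ follows directly from (3.1). The sketch below also spot-checks the cone property $\alpha x \in \mathcal{Q}^n$ for a few $\alpha \ge 0$ and the 1-dimensional case; the function name and the optional tolerance parameter are illustrative choices, and in floating point a small positive tolerance is usually advisable for points near the boundary.

```python
import math


def in_quadratic_cone(x, tol=0.0):
    """x in Q^n  <=>  x_1 >= sqrt(x_2^2 + ... + x_n^2), up to an optional tolerance."""
    return x[0] >= math.sqrt(sum(xi * xi for xi in x[1:])) - tol


x = [5.0, 3.0, 4.0]  # on the boundary: 5 = sqrt(9 + 16)
print(in_quadratic_cone(x))  # True

# Cone property: alpha * x stays in Q^n for every alpha >= 0 (a few samples).
print(all(in_quadratic_cone([a * xi for xi in x]) for a in (0.0, 0.5, 2.0, 10.0)))  # True

print(in_quadratic_cone([1.0, 3.0, 4.0]))  # False: 1 < sqrt(9 + 16)
print(in_quadratic_cone([2.0]))            # True: the 1-dimensional cone is x_1 >= 0
```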

3.1.2 Rotated quadratic cones

An $n$-dimensional rotated quadratic cone is defined as

$$\mathcal{Q}_r^n = \left\{ x \in \mathbb{R}^n \;\middle|\; 2 x_1 x_2 \ge x_3^2 + \cdots + x_n^2,\ x_1, x_2 \ge 0 \right\}.$$

Fig. 3.1: A quadratic or second-order cone satisfying $x_1 \ge \sqrt{x_2^2 + x_3^2}$.

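As the name indicates, there is a simple relationship between quadratic and rotated quadratic cones. Define an orthogonal transformation

$$T_n := \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ 0 & 0 & I_{n-2} \end{bmatrix}.$$

Then it is easy to verify that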

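$$x \in \mathcal{Q}^n \iff (T_n x) \in \mathcal{Q}_r^n,$$

and since $T$ is orthogonal we call $\mathcal{Q}_r$ a rotated cone; the transformation corresponds to a rotation of $\pi/4$ of the $(x_1, x_2)$ axis. For example, if $x \in \mathcal{Q}^3$ and

$$\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}}(x_1 + x_2) \\ \frac{1}{\sqrt{2}}(x_1 - x_2) \\ x_3 \end{bmatrix},$$

then

$$2 z_1 z_2 \ge z_3^2,\ z_1, z_2 \ge 0 \implies (x_1^2 - x_2^2) \ge x_3^2,\ x_1 \ge 0,$$

and similarly (by interchanging the roles of $x$ and $z$) we see that

$$x_1^2 \ge x_2^2 + x_3^2,\ x_1 \ge 0 \implies 2 z_1 z_2 \ge z_3^2,\ z_1, z_2 \ge 0.$$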

Thus, one could argue that we only need quadratic cones, but there are many examples of functions where using an explicit rotated quadratic conic formulation is more natural; in Sec. 3.2 we discuss many examples involving both quadratic cones and rotated quadratic cones.
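The equivalence $x \in \mathcal{Q}^3 \iff T_3 x \in \mathcal{Q}^3_r$ can be spot-checked numerically. The sketch below compares both memberships on every integer point of the grid $\{-5, \dots, 5\}^3$; the $\mathcal{Q}^3$ test is done exactly in integer arithmetic, while the rotated test uses floating point with a small tolerance (an assumption) to absorb the rounding in $1/\sqrt{2}$. This is a sanity check on sample points, not a proof.

```python
import math
from itertools import product

R = 1 / math.sqrt(2)


def T3(x):
    """z = T_3 x: mixes (x1, x2) by a pi/4 rotation and keeps x3."""
    x1, x2, x3 = x
    return (R * (x1 + x2), R * (x1 - x2), x3)


def in_Q3(x):
    """x1 >= sqrt(x2^2 + x3^2), written without the square root (exact for integers)."""
    return x[0] >= 0 and x[0] ** 2 >= x[1] ** 2 + x[2] ** 2


def in_Qr3(z, tol=1e-9):
    """2 z1 z2 >= z3^2 with z1, z2 >= 0, allowing a small rounding tolerance."""
    return z[0] >= -tol and z[1] >= -tol and 2 * z[0] * z[1] >= z[2] ** 2 - tol


grid = range(-5, 6)
mismatches = [x for x in product(grid, repeat=3) if in_Q3(x) != in_Qr3(T3(x))]
print(len(mismatches))  # 0
```

Because $T_3$ is symmetric and orthogonal, it is its own inverse, which is why the roles of $x$ and $z$ can be interchanged in the argument above.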