Optimization 1 - 2010 Lecture Notes, Lecture notes of Mathematics

Definitions and theorems related to convex optimization. It covers topics such as convex sets, convex functions, cones, extreme points, convex combinations, hyperplanes, affine sets, polyhedrons, and dimensions. The document also includes proofs for some of the theorems presented. The content is relevant for students studying optimization and related fields.

Typology: Lecture notes

Pre 2010

Available from 05/06/2023

Daniel Guetta, 2010


OPTIMIZATION I

Introduction – Lots of Geometry

 Some definitions

o Definition (Convex set) : The set $\mathcal{X} \subseteq \mathbb{R}^n$ is convex if for any $x_1, x_2 \in \mathcal{X}$ and $\lambda \in [0,1]$, $x(\lambda) = \lambda x_1 + (1 - \lambda) x_2 \in \mathcal{X}$. Note that the intersection of a finite number of convex sets is convex.

o Definition (Convex function) : A function $f(x)$ defined on a convex set $\mathcal{X} \subseteq \mathbb{R}^n$ is convex if for any $x_1, x_2 \in \mathcal{X}$, the linear interpolation between those two points lies above the curve:

$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2), \qquad \lambda \in [0,1]$$

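o Definition (Cone) : A set $\mathcal{C} \subseteq \mathbb{R}^n$ is a cone if for all $x \in \mathcal{C}$ and any $\lambda \ge 0$, $\lambda x \in \mathcal{C}$:

[Figure: a cone, with two points labelled $x_1$ and $x_2$.]

The set $\{x \in \mathbb{R}^n : x = A\alpha,\ \alpha \ge 0,\ A \in \mathbb{R}^{n \times m},\ \alpha \in \mathbb{R}^m\}$ is the cone generated by the columns of $A$.

o Definition (Extreme point) : An extreme point of the convex set $\mathcal{X}$ is a point $x \in \mathcal{X}$ that cannot be written as a convex combination of other points in $\mathcal{X}$.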

o Definition (Convex combination) : A convex combination of points $x_1, \dots, x_k$ is a point $x = \sum_{i=1}^{k} \lambda_i x_i$ such that $\lambda_i \ge 0$ and $\sum_{i=1}^{k} \lambda_i = 1$. The set of convex combinations of a set of points is the smallest convex set containing all the points; it is called the convex hull of these points.

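o Definition (Hyperplane) : The set $\mathcal{H} = \{x \in \mathbb{R}^n : a \cdot x = b\}$, with $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$, is called a hyperplane with normal $a$. The set $\mathcal{H}^- = \{x \in \mathbb{R}^n : a \cdot x \le b\}$ is a closed halfspace, and $\mathcal{H}$ is its bounding hyperplane.

o Definition (Affine set) : A set $\mathcal{X}_a \subseteq \mathbb{R}^n$ is an affine set if for all $x_1, x_2 \in \mathcal{X}_a$ and $\lambda \in (-\infty, \infty)$, $x(\lambda) = \lambda x_1 + (1 - \lambda) x_2 \in \mathcal{X}_a$. A hyperplane is an example of an affine set. Roughly speaking, an affine set is a subspace that need not contain the origin.

o Definition (Polyhedron) : A polyhedron is a set which is the intersection of a finite number of closed halfspaces. It is necessarily convex. If the polyhedron is non-empty and bounded (ie: there exists a large ball it lies inside of), it is called a polytope.

o Definition (Dimension) : The dimension of an affine set $\mathcal{X}_a$ is the maximum number of linearly independent vectors of the form $x - y$ with $x, y \in \mathcal{X}_a$ (that is, the dimension of the subspace obtained by translating $\mathcal{X}_a$ so that it passes through the origin).

o Definition (Supporting hyperplane) : A supporting hyperplane of a closed, convex set $\mathcal{X}$ is a hyperplane $\mathcal{H}$ such that $\mathcal{H} \cap \mathcal{X} \ne \emptyset$ and $\mathcal{X} \subseteq \mathcal{H}^-$, where $\mathcal{H}^-$ is one of the closed halfspaces bounded by $\mathcal{H}$: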

[Figure: a convex set with two points labelled "Extreme point" and one point labelled "Not an extreme point".]

 Polyhedra in standard form

o The definition of a polyhedron above (in terms of the intersection of a number of half-spaces) can be written as $\mathcal{P} = \{x \in \mathbb{R}^n : Ax \le b\}$ with $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, where the rows of $A$ contain the normals of the various hyperplanes defining the polyhedron.

o It is often convenient, however, to write the polyhedron in an equivalent standard form $\mathcal{P}' = \{x \in \mathbb{R}^{n'} : A'x = b,\ x \ge 0\}$ with $A' \in \mathbb{R}^{m \times n'}$ and $b \in \mathbb{R}^m$, where the rows of $A'$ are linearly independent. This involves a number of steps:

 Re-write each inequality constraint $A_{\text{row } i}\, x \le b_i$ as the equality $A_{\text{row } i}\, x + s_i = b_i$, where $s_i \ge 0$. $s_i$ becomes a new (slack) variable.

 Eliminate any linearly dependent rows in $A'$ (this does not alter the problem – see page 57 of B&T for proof). Note that this implies that, in standard form, $m \le n'$ – in other words, the number of constraints is less than or equal to the number of variables.

 Replace any unconstrained variable $x_i$ with two new variables $x_i^+$ and $x_i^-$, both constrained to be non-negative, and substitute $x_i = x_i^+ - x_i^-$. [The validity of this step is not entirely obvious, but for the simplex method, it works].
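The conversion steps above can be sketched in a few lines of Python. This is only an illustration: the function names `to_standard_form` and `recover`, the column ordering of the new matrix and the small example are all assumptions made here, not part of the notes. Because every row receives its own slack column, the rows of the resulting matrix are automatically linearly independent, so the row-elimination step is not needed in this sketch.

```python
def to_standard_form(A, b, free=()):
    """Turn {x : A x <= b, x_k >= 0 for k not in free} into {x' : A' x' = b, x' >= 0}.

    Column order of A' (a choice made for this sketch):
      original variables (a free x_k keeps its column, now read as x_k^+),
      then one x_k^- column per free variable, then one slack s_i per row.
    """
    m = len(A)
    free = sorted(free)
    A_std = []
    for i, row in enumerate(A):
        minus_cols = [-row[k] for k in free]                  # coefficients of x_k^-
        slack_cols = [1 if r == i else 0 for r in range(m)]   # coefficient of s_i
        A_std.append(list(row) + minus_cols + slack_cols)
    return A_std, list(b)


def recover(x_std, n, free=()):
    """Map a standard-form point back to the original n variables."""
    x = list(x_std[:n])
    for pos, k in enumerate(sorted(free)):
        x[k] -= x_std[n + pos]        # x_k = x_k^+ - x_k^-
    return x


# Hypothetical example: x_1 + x_2 <= 4, x_1 - x_2 <= 2, x_1 >= 0, x_2 free.
A_std, b_std = to_standard_form([[1, 1], [1, -1]], [4, 2], free={1})
x_std = [0, 0, 1, 5, 1]               # x_1 = 0, x_2^+ = 0, x_2^- = 1, slacks (5, 1)
residual = [sum(a * v for a, v in zip(row, x_std)) - bi
            for row, bi in zip(A_std, b_std)]
x_orig = recover(x_std, 2, free={1})
```

Here `x_std` is a non-negative point satisfying the equality constraints, and `recover` maps it back to the original point with a negative second coordinate.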

 Algebraic Characterization of Vertices & Extreme Points

o In the previous section, we provided definitions of vertices and extreme points. It would seem logical that the solution of a linear program should lie at one of these points. In this section, we see that these concepts are equivalent, and we develop an algebraic characterization of such points.

o We present two characterizations – the first is in terms of polyhedra in non-standard form, which is more useful to gain an intuitive grasp of the concept, and the second in terms of polyhedra in standard form, which we will use hereafter.

o Theorem : Let $\mathcal{P} = \{x : Ax \ge b,\ A'x = b'\}$ be a non-empty polyhedron, and let $x \in \mathcal{P}$. The following three statements are equivalent:

  1. $x$ is a vertex
  2. $x$ is an extreme point
  3. All equality constraints are active at $x$, some of the inequality constraints are active, and out of all the constraints that are active at $x$, $n$ of them are linearly independent [ Note : we say that $n$ constraint vectors $a_i$ are linearly independent if the system of equations $a_i \cdot x = b_i$ has a unique solution (see p48 of B&T)].

Proof : See p50, B&T.

o Theorem : Let $\mathcal{P} = \{x \in \mathbb{R}^n : Ax = b,\ x \ge 0\}$ be a non-empty polyhedron in standard form, and let $x \in \mathcal{P}$. Then the following three statements are equivalent:

  1. $x$ is an extreme point of $\mathcal{P}$.
  2. The columns of $A$ corresponding to the strictly positive components of $x$ are linearly independent. More precisely, there exists a subset $\mathcal{B}$ of columns of $A$ that are linearly independent such that $x_i = 0$ for all $i \notin \mathcal{B}$.
  3. $x$ is a vertex of $\mathcal{P}$.

At first sight, statement (2) in this theorem seems somewhat different to statement (3) in the previous theorem, because here, we talk in terms of columns of $A$ (variables), whereas in the previous theorem, we talked in terms of rows of $A$ (constraints). In fact, the two are equivalent; we discuss two ways of seeing this; the first in terms of the rows of $A$, and the second in terms of the columns:

 The fact that all variables $\notin \mathcal{B}$ are equal to 0 already creates $n - |\mathcal{B}|$ linearly independent active constraints. We need the remaining active constraints to include at least $|\mathcal{B}|$ further linearly independent constraints; in other words, we need $|\mathcal{B}|$ of the rows of the matrix $B$ (the columns of $A$ indexed by $\mathcal{B}$) to be linearly independent. Another way of saying this is that we need all the columns of the matrix $B$ (there are $|\mathcal{B}|$ of them) to be linearly independent.

 Another way of stating the constraint $Ax = b$ is that we need to synthesize $b$ from a non-negative linear combination of the columns of $A$; in the example of the diet problem, $b$ is our requirement in nutrient-

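Now, let $w = (w_{\mathcal{B}}, 0)$ and set $x' = x + \epsilon w$ and $x'' = x - \epsilon w$. We have that $x', x'' \in \mathcal{P}$. However, $x = \tfrac{1}{2}x' + \tfrac{1}{2}x''$. This means that $x$ is not an extreme point. Thus, by the contrapositive, $1 \Rightarrow 2$.

 $2 \Rightarrow 3$ Choose a point $x \in \mathcal{P}$ that satisfies 2. We would like to show that there exists a supporting hyperplane $\mathcal{H}$ to the polyhedron $\mathcal{P}$ such that $\mathcal{H} \cap \mathcal{P} = \{x\}$.

We postulate that the hyperplane $\mathcal{H} = \{y \in \mathbb{R}^n : c \cdot y = 0\}$ satisfies this requirement, where

$$c = \begin{pmatrix} c_{\mathcal{B}} \\ c_{\mathcal{N}} \end{pmatrix} = \begin{pmatrix} 0 \\ e \end{pmatrix}, \qquad e = \text{vector of 1's}$$

Now, let's do the proof step-by-step:

 $x \in \mathcal{H}$ : We have

$$c \cdot x = \begin{pmatrix} 0 \\ e \end{pmatrix} \cdot \begin{pmatrix} x_{\mathcal{B}} \\ x_{\mathcal{N}} \end{pmatrix} = e \cdot x_{\mathcal{N}} = 0$$

 $\mathcal{P} \subseteq \{y : c \cdot y \ge 0\}$ : Consider any $y \in \mathcal{P} \setminus \{x\}$. We have

$$c \cdot y = \begin{pmatrix} 0 \\ e \end{pmatrix} \cdot \begin{pmatrix} y_{\mathcal{B}} \\ y_{\mathcal{N}} \end{pmatrix} = e \cdot y_{\mathcal{N}}$$

But since $y \in \mathcal{P}$, we must have $y_{\mathcal{N}} \ge 0$. Thus, $c \cdot y \ge 0$.

 $\mathcal{H} \cap \mathcal{P} = \{x\}$ : Imagine $y \in \mathcal{H} \cap \mathcal{P}$. This means that $c \cdot y = 0$, but since all the components of $y$ are non-negative (since it is in the polyhedron), the only way this can happen is if $y_{\mathcal{N}} = 0$. Furthermore, since $y$ is in the polyhedron, $Ay = b$. Thus

$$A(x - y) = 0 \;\Rightarrow\; A_{\mathcal{B}}(x_{\mathcal{B}} - y_{\mathcal{B}}) + A_{\mathcal{N}}(x_{\mathcal{N}} - y_{\mathcal{N}}) = 0$$

We have already established that $y_{\mathcal{N}} = 0$, and by definition, $x_{\mathcal{N}} = 0$, so

$$A_{\mathcal{B}}(x_{\mathcal{B}} - y_{\mathcal{B}}) = 0$$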

By assumption (2), however, this can only happen if $x_{\mathcal{B}} - y_{\mathcal{B}} = 0 \Rightarrow x_{\mathcal{B}} = y_{\mathcal{B}} \Rightarrow x = y$.

 $3 \Rightarrow 1$ Choose a point $x \in \mathcal{P}$ that satisfies (3), and assume that (1) is not true; in other words, for some $\lambda \in (0,1)$ and $x', x'' \in \mathcal{P}$ different from $x$, we can write $x = \lambda x' + (1 - \lambda) x''$. Our assumption that $x$ satisfies (3) implies that there exists a vector $c$ such that $c \cdot x < c \cdot x'$ and $c \cdot x < c \cdot x''$. Now:

$$c \cdot x = c \cdot \left[\lambda x' + (1 - \lambda) x''\right] = \lambda\, c \cdot x' + (1 - \lambda)\, c \cdot x'' > \lambda\, c \cdot x + (1 - \lambda)\, c \cdot x = c \cdot x$$

This is a contradiction. $x$ cannot be on the line between $x'$ and $x''$ and also "below" both of them.

o Note that the theorem above says nothing of how many variables the set $\mathcal{B}$ must contain. The case $|\mathcal{B}| = \operatorname{rank} A = m$, however, is a natural choice, because the constraint $Ax = b$ already includes $m$ constraints, and

 Choosing $|\mathcal{B}| > m$ is impossible, since $A$ contains only $m$ rows.

 Choosing $|\mathcal{B}| < m$ would imply choosing more than $n - m$ non-negativity constraints, which, in total, would result in more than $n$ constraints. The resulting system would be over-defined, and might not have a solution.

We therefore define…

o Definition (Basis) : A linearly independent set of $m$ columns $\{A_{\text{col } B_1}, \dots, A_{\text{col } B_m}\}$ of $A$ is a basis for the column space of $A$. [Note: if $A$ contains no linearly dependent rows, then $\operatorname{rank} A = m$, and our definition boils down to the fact that a basis is a maximal linearly independent set of columns].

$B = [A_{\text{col } B_1}, \dots, A_{\text{col } B_m}]$ is called the basis matrix and the associated vector of variables $x_B$ that solves $B x_B = b$ is called the vector of basic variables. Other variables (and columns of $A$) are called non-basic.

The vertex $x = (0, \tfrac{2}{3}, \tfrac{2}{3})^\top$ is at the intersection of 3 planes; since there are three variables, it is non-degenerate. This corresponds to the basis matrix

$$B = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$$

Clearly, each entry of $x$ corresponding to a column in the basis is non-zero. The vertex $x = (2, 0, 0)^\top$, however, is at the intersection of 4 planes. It is degenerate, and corresponds to two different bases

$$B = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}$$

In each case, one of the variables corresponding to a basic column is 0.
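The example can be checked numerically. The constraint data are not shown in this part of the notes, so the matrix `A` and vector `b` below are assumptions, chosen because their columns reproduce the three basis matrices quoted above. The helper names are likewise illustrative. The sketch lists, for each vertex, every pair of independent columns outside of which the vertex is zero.

```python
from fractions import Fraction
from itertools import combinations

# Assumed data, chosen to reproduce the basis matrices quoted in the text.
A = [[1, 2, 1],
     [1, 1, 2]]
b = [2, 2]


def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def is_feasible(x):
    """Check A x = b and x >= 0 exactly."""
    rows_ok = all(sum(a * v for a, v in zip(row, x)) == bi
                  for row, bi in zip(A, b))
    return rows_ok and all(v >= 0 for v in x)


def bases_at(x):
    """For a feasible x, list every pair of independent columns with x zero elsewhere."""
    found = []
    for cols in combinations(range(3), 2):
        B = [[A[r][c] for c in cols] for r in range(2)]
        independent = det2(B) != 0
        zero_off_basis = all(x[k] == 0 for k in range(3) if k not in cols)
        if independent and zero_off_basis:
            found.append(cols)
    return found


v1 = (Fraction(0), Fraction(2, 3), Fraction(2, 3))   # non-degenerate vertex
v2 = (Fraction(2), Fraction(0), Fraction(0))         # degenerate vertex
for v in (v1, v2):
    print(v, is_feasible(v), bases_at(v))
```

Under these assumptions the first vertex has a single basis (columns 2 and 3 in the notes' 1-based numbering) while the second has two, each containing a basic variable equal to zero.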

Representation & Optimality

 In this section, we prove what might be called the “fundamental Theorem of Linear Programming” – that the optimal solution of a linear program occurs at a vertex.

 The Representation Theorem

o Before proving our fundamental theorem, we prove that polyhedra can be represented in a very useful form.

o Definition (Recession direction) : A recession direction of the polyhedron $\mathcal{P}$ is a non-zero vector $d \in \mathbb{R}^n$ such that, for any $x \in \mathcal{P}$, $\{y : y = x + \theta d,\ \theta \in \mathbb{R}_+\} \subseteq \mathcal{P}$.

For a polyhedron in standard form, $d$ is a recession direction if and only if $Ad = 0$ (so that we remain feasible as we move along that direction), $d \ge 0$ (so that we never become negative as we move along that direction) and $d \ne 0$.

[Figure: a recession direction $d$ drawn from a point $x$ of the polyhedron.]

o Theorem (Representation) : Any point $x \in \mathcal{P} \equiv \{x : Ax = b,\ x \ge 0\}$ can be written as

$$x = \sum_{i \in V} \lambda_i v_i + \alpha d, \qquad \sum_{i \in V} \lambda_i = 1,\quad \lambda_i \ge 0,\quad \alpha \ge 0$$

where $\{v_i : i \in V\}$ is the set of vertices of the polyhedron and $d$ is a recession direction.

o Proof : We prove this by induction on the number of strictly positive components in $x$. Suppose the theorem holds true if $x$ has $p - 1$ strictly positive components, and consider an $x$ with $p$ strictly positive components. If $x$ is a vertex, the theorem is trivially true. If $x$ is not a vertex, the columns of $A$ corresponding to the positive components of $x$ are linearly dependent. This implies that there exists a $w \ne 0$ such that $w_i = 0$ if $x_i = 0$ and $Aw = 0$. Now, consider points of the form $x(\theta) = x + \theta w$. Clearly, $Ax(\theta) = b$ for all $\theta$. As we move along $w$, we will either hit a non-negativity constraint, or go on forever (if $w$ is a recession direction). We consider these two cases:

 $w$ has both +ve and -ve components : In that case, we'll hit a non-negativity constraint. Let

 $\theta'$ = smallest positive $\theta$ such that $x(\theta')$ has at most $p - 1$ strictly positive components.

 $\theta''$ = largest negative $\theta$ such that $x(\theta'')$ has at most $p - 1$ strictly positive components.

turns out that an appropriate starting point is simply the point in the polyhedron with the smallest number of strictly positive components. This must be a vertex, because if it were not, we could carry out the steps outlined above and find a point with fewer strictly positive components – this is a contradiction. [Note that it is not always the case that a polyhedron must have vertices – for example, the polyhedron $\{x \in \mathbb{R}^2 : x_1 \ge 0,\ x_1 \le 1\}$ has no vertices. However, the non-negativity constraints of the standard-form polyhedron ensure there is at least one.]

 The Fundamental Theorem

o Theorem (Fundamental Theorem of Linear Programming) : If $\mathcal{P} \ne \emptyset$, then the minimum $\min_{x \in \mathcal{P}} c \cdot x$ is either attained at a vertex of $\mathcal{P}$ or unbounded.

Proof : We consider two cases:

 Case 1 – $\mathcal{P}$ has a recession direction $d$ such that $c \cdot d < 0$ : in that case, the problem is unbounded, because for any $x \in \mathcal{P}$, $c \cdot x(\theta) = c \cdot (x + \theta d) = c \cdot x + \theta\, c \cdot d \to -\infty$ as $\theta \to \infty$.

 Case 2 – $\mathcal{P}$ has no such recession direction : in that case, consider any point $x \in \mathcal{P}$. By our Representation Theorem, we can write $x = \sum_{i \in V} \lambda_i v_i + \alpha d$, where $\sum_{i \in V} \lambda_i = 1$, $\lambda_i \ge 0$, $\alpha \ge 0$. We then have

$$c \cdot x = \sum_{i \in V} \lambda_i\, c \cdot v_i + \alpha\, c \cdot d \;\ge\; \sum_{i \in V} \lambda_i\, c \cdot v_i \;\ge\; \sum_{i \in V} \lambda_i \min_{k \in V} c \cdot v_k = \min_{k \in V} c \cdot v_k$$

Thus, the minimum is indeed attained at a vertex.
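The theorem suggests a brute-force method for small bounded problems: enumerate every basic feasible solution and keep the cheapest. The sketch below does this with exact rational arithmetic. The helper names `solve`, `vertices` and `best_vertex`, and the example data, are assumptions made for illustration; the approach only makes sense when the minimum is bounded (here the feasible set itself is bounded), and it is exponential in the problem size.

```python
from fractions import Fraction
from itertools import combinations


def solve(M, rhs):
    """Solve the square system M z = rhs exactly; return None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def vertices(A, b):
    """Enumerate the basic feasible solutions of {x : A x = b, x >= 0}."""
    m, n = len(A), len(A[0])
    found = set()
    for cols in combinations(range(n), m):
        B = [[A[r][c] for c in cols] for r in range(m)]
        x_B = solve(B, b)
        if x_B is None or any(v < 0 for v in x_B):
            continue                      # not a basis, or infeasible
        x = [Fraction(0)] * n
        for c, v in zip(cols, x_B):
            x[c] = v
        found.add(tuple(x))               # degenerate vertices collapse to one entry
    return found


def best_vertex(A, b, c):
    """Return a vertex minimising c . x (assumes the minimum is bounded)."""
    return min(vertices(A, b), key=lambda x: sum(ci * xi for ci, xi in zip(c, x)))


# Assumed example data (a bounded standard-form polyhedron in three variables).
A = [[1, 2, 1],
     [1, 1, 2]]
b = [2, 2]
print(vertices(A, b))
print(best_vertex(A, b, [1, 1, 1]))
```

For this data there are two vertices, and the cost vector of all ones is minimised at the one with two positive components.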

Simplex

 We have thus far established that the optimum of a linear program occurs at one of the vertices of the feasible region. We now consider the simplex algorithm , an efficient method of jumping from vertex to vertex while constantly improving the objective function.

 Representation in terms of emanating directions

o Consider a polyhedron $Ax = b$, where $A = [B, N] \in \mathbb{R}^{m \times n}$, and a non-degenerate basic solution $\hat{x} = (\hat{x}_B^\top, \hat{x}_N^\top)^\top$, where $\hat{x}_B = B^{-1}b > 0$ and $\hat{x}_N = 0$.

o Claim : Consider the matrix

$$M = \begin{pmatrix} B & N \\ 0 & I \end{pmatrix}, \qquad M\hat{x} = \begin{pmatrix} B & N \\ 0 & I \end{pmatrix} \begin{pmatrix} \hat{x}_B \\ \hat{x}_N \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix}$$

The last $n - m$ columns of $M^{-1}$ (ie: from column $m + 1$ onwards) are the directions of the edges of $\mathcal{P}$ emanating from the basic feasible solution $\hat{x}$.

Proof : Let $h_j$ be the $j$th column of $M^{-1}$. Since there is no degeneracy, $\hat{x}_B$ has $m$ non-zero components, and for $j > m$ the single 1 in $e_j$ clearly sits in the second half of the rows of the matrix above, so we can write, for $j > m$:

$$h_j = M^{-1} e_j = \begin{pmatrix} B^{-1} & -B^{-1}N \\ 0 & I \end{pmatrix} e_j = \begin{pmatrix} -B^{-1}N \\ I \end{pmatrix} e_{j-m} = \begin{pmatrix} -B^{-1}A_{\text{col } j} \\ e_{j-m} \end{pmatrix}$$

where $e_{j-m}$ is the unit vector in $\mathbb{R}^{n-m}$ whose single 1 sits in the $j$th row of $h_j$.

Now, consider moving in the direction $h_j$ by an amount $\theta$; $x(\theta) = \hat{x} + \theta h_j$. This point is still on the polyhedron, because

$$Ax(\theta) = B x_B(\theta) + N x_N(\theta) = B\left(\hat{x}_B - \theta B^{-1}A_{\text{col } j}\right) + \theta A_{\text{col } j} = b - \theta A_{\text{col } j} + \theta A_{\text{col } j} = b$$

Thus, $h_j$ is indeed an edge of $\mathcal{P}$, and it clearly results from increasing only one of the $x_N$. For a geometric interpretation, consider that the rows of $M$ contain the vectors normal to every active constraint at the BFS, and that $MM^{-1} = I \Rightarrow M_{\text{row } i} \cdot M^{-1}_{\text{col } j} = 0\ \ \forall i \ne j$. This means that our emanating edges (columns of $M^{-1}$) are perpendicular to every normal vector save one (along which we're trying to move):

  1. $h_j$ are the "extreme" recession directions of $C$.
  2. $h_j$ are the "edges" (1-dimensional faces) of $C$. [ Hint : let $c = sM^{-1}$ and $s = e - e_j$ ].
  3. If $\hat{x}$ is a non-degenerate BFS, then $h_j$ is an edge of $\mathcal{P}$.
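The claim about the columns of $M^{-1}$ can be checked without forming $M$ at all, since each $h_j$ only needs one solve with the basis matrix. The following is a minimal sketch under assumptions: the helper names, the 0-based indexing and the example data are illustrative, and variables are kept in their original order rather than reordered as $[B, N]$.

```python
from fractions import Fraction


def solve(M, rhs):
    """Solve the square system M z = rhs exactly; return None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def edge_directions(A, basis):
    """Return {j: h_j} for every non-basic column j (0-based indices)."""
    m, n = len(A), len(A[0])
    B = [[A[r][k] for k in basis] for r in range(m)]
    directions = {}
    for j in range(n):
        if j in basis:
            continue
        w = solve(B, [A[r][j] for r in range(m)])   # w = B^{-1} A_col_j
        h = [Fraction(0)] * n
        for k, w_k in zip(basis, w):
            h[k] = -w_k                             # basic part: -B^{-1} A_col_j
        h[j] = Fraction(1)                          # only x_j increases among x_N
        directions[j] = h
    return directions


# Assumed example: the basis (columns 1 and 2) gives the BFS (0, 2/3, 2/3).
A = [[1, 2, 1],
     [1, 1, 2]]
x_hat = [Fraction(0), Fraction(2, 3), Fraction(2, 3)]
h0 = edge_directions(A, basis=(1, 2))[0]
A_h0 = [sum(a * v for a, v in zip(row, h0)) for row in A]
moved = [x + 2 * h for x, h in zip(x_hat, h0)]      # step of theta = 2 along h_0
```

Since `A_h0` is the zero vector, every point along the direction keeps satisfying the equality constraints; in this example a step of 2 lands on another vertex of the same polyhedron.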

 Background to the Simplex Algorithm

o Consider the linear program $\min z = c^\top x$. The directional derivative of $z$ with respect to $x$ in the direction $h_j$ is $c \cdot h_j$. If it is greater than 0, the direction is "uphill", and vice-versa.

o We call $\bar{c}_j = c \cdot h_j = -c_B^\top B^{-1} A_{\text{col } j} + c_j$ the reduced cost of direction $j$. Practically, they can be calculated by

 Solving $p^\top = c_B^\top B^{-1}$

 Setting, for $j > m$, $\bar{c}_j = c_j - p \cdot A_{\text{col } j}$

Geometrically:

 $p$ is the particular linear combination of equality constraints that gives $c_B$.

 Each component of $A_{\text{col } j}$ is the amount by which a unit change in $x_j$ will "affect" each constraint.

 Thus, $p \cdot A_{\text{col } j}$ is the resulting change in the objective when we move one unit along direction $j$.

 Similarly, $c_j$ is the direct change resulting from a unit change in $x_j$.

o Theorem : If $\bar{c}_j = c \cdot h_j \ge 0$ for all $j > m$, then the current BFS is optimal.

Proof : Consider that any $y \in \mathcal{P}$ can be written

$$y = \hat{x} + \sum_{j=m+1}^{n} y_j h_j \;\Rightarrow\; c \cdot y = c \cdot \hat{x} + \sum_{j=m+1}^{n} y_j\, c \cdot h_j = c \cdot \hat{x} + \sum_{j=m+1}^{n} y_j\, \bar{c}_j \;\ge\; c \cdot \hat{x}$$

where the inequality uses $y_j \ge 0$. Thus, the objective at any point is at least as great as at $\hat{x}$. This theorem has an interesting geometrical explanation, which we discuss below, when we motivate duality.
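The two-step recipe for the reduced costs can be sketched as follows. The function name `reduced_costs`, the 0-based column indices and the example data are assumptions for illustration; the multipliers are obtained by solving the transposed system rather than by inverting the basis matrix.

```python
from fractions import Fraction


def solve(M, rhs):
    """Solve the square system M z = rhs exactly; return None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def reduced_costs(A, c, basis):
    """Return {j: c_j - p . A_col_j} for non-basic j, where B^T p = c_B."""
    m, n = len(A), len(A[0])
    B_transposed = [[A[r][k] for r in range(m)] for k in basis]
    p = solve(B_transposed, [c[k] for k in basis])
    return {j: c[j] - sum(p[r] * A[r][j] for r in range(m))
            for j in range(n) if j not in basis}


# Assumed example data.
A = [[1, 2, 1],
     [1, 1, 2]]
c = [1, 1, 1]
at_optimum = reduced_costs(A, c, basis=(1, 2))   # BFS (0, 2/3, 2/3)
elsewhere = reduced_costs(A, c, basis=(0, 1))    # BFS (2, 0, 0)
```

With these numbers the first basis has only non-negative reduced costs, so by the theorem its BFS is optimal, while the second basis has a negative reduced cost and therefore a downhill edge.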

 The Simplex Algorithm

  1. Start with a BFS $x$.
  2. Compute the simplex multipliers $p$ by solving $B^\top p = c_B$, and compute the reduced costs $\bar{c}_j = c_j - p \cdot A_{\text{col } j}$ for all $j \notin$ basis.
  3. Check for optimality: if $\bar{c}_j \ge 0\ \ \forall j \notin$ basis, then the current solution is optimal.
  4. Choose a nonbasic variable to enter the basis; ie: choose a "downhill edge" from the set of downhill edges $V$ along which to move (typically, the one with the smallest reduced cost): $q \in V = \{j \notin B : \bar{c}_j < 0\}$
  5. Compute $w_q$ by solving $B w_q = A_{\text{col } q}$. Note that $w_q$ is the negative of the basic part of $h_q$. If $w_q \le 0$, stop: $z \to -\infty$ along $h_q$.
  6. Otherwise, compute $\theta = \min_{1 \le i \le m} \{x_{B_i} / w_{q,i} : w_{q,i} > 0\}$ (to find the basic variable that should leave the basis).
  7. Update the solution and the basis matrix $B$. Set:

$$x_q = \theta, \qquad x_{B_i} \leftarrow x_{B_i} - \theta\, w_{q,i}$$
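The steps above can be put together into a short, unoptimised sketch. Several choices here are assumptions rather than anything stated in the notes: the basic solution is recomputed from scratch at each iteration instead of being updated, the entering variable is the downhill edge with the smallest index (Bland's rule, swapped in for the "smallest reduced cost" choice because it avoids cycling), ties in the ratio test are broken by smallest variable index, and the caller must supply a feasible starting basis.

```python
from fractions import Fraction


def solve(M, rhs):
    """Solve the square system M z = rhs exactly; return None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def simplex(A, b, c, basis):
    """Minimise c . x subject to A x = b, x >= 0, from a feasible starting basis."""
    m, n = len(A), len(A[0])
    basis = list(basis)
    while True:
        B = [[A[r][k] for k in basis] for r in range(m)]
        x_B = solve(B, b)                                   # current basic variables
        p = solve([list(col) for col in zip(*B)],           # B^T p = c_B
                  [c[k] for k in basis])
        reduced = {j: c[j] - sum(p[r] * A[r][j] for r in range(m))
                   for j in range(n) if j not in basis}
        downhill = [j for j in sorted(reduced) if reduced[j] < 0]
        if not downhill:                                    # optimality check
            x = [Fraction(0)] * n
            for k, v in zip(basis, x_B):
                x[k] = v
            return x
        q = downhill[0]                                     # entering variable (Bland)
        w = solve(B, [A[r][q] for r in range(m)])           # B w = A_col_q
        if all(w_i <= 0 for w_i in w):
            raise ValueError("objective is unbounded below")
        # Ratio test: (theta candidate, variable index, position in the basis).
        _, _, leave = min((x_B[i] / w[i], basis[i], i)
                          for i in range(m) if w[i] > 0)
        basis[leave] = q                                    # update the basis


# Assumed example, started from the basis of the degenerate vertex (2, 0, 0).
x_opt = simplex([[1, 2, 1], [1, 1, 2]], [2, 2], [1, 1, 1], basis=[0, 1])
```

On this example a single change of basis reaches a vertex whose reduced costs are all non-negative, at which point the loop stops.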

 The Full Tableau Simplex

o The simplex algorithm outlined above is relatively inefficient, because it involves the inversion of the matrix $B$ at each step. We therefore use a different form of the algorithm which constantly maintains and updates the matrix $B^{-1}[A \mid b]$. Typically, this information is stored in a tableau containing an extra row:

$$\begin{array}{c|c} B^{-1}A & B^{-1}b \\ \hline c^\top - c_B^\top B^{-1}A & -c_B^\top B^{-1}b \end{array}$$

Or, in more detail:

$$\begin{array}{ccc|c} B^{-1}A_{\text{col } 1} & \cdots & B^{-1}A_{\text{col } n} & \begin{matrix} x_{B,1} \\ \vdots \\ x_{B,m} \end{matrix} \\ \hline \bar{c}_1 & \cdots & \bar{c}_n & -c_B^\top x_B \end{array}$$

o Find pivot column : Choose the pivot column with the smallest reduced cost; or, in the case of ties, the one with the smallest $j$. This variable will enter the basis :

$$j = \text{Entering basis} = \text{Pivot column} = \operatorname{argmin}_j \{\bar{c}_j : \bar{c}_j < 0\}$$

o Find pivot row : Now, consider that the pivot column contains $a_{\text{col } j} = B^{-1}A_{\text{col } j}$. Very conveniently this is none other than the negative of the (basic part of the) emanating direction corresponding to $j$ from our BFS, $-h_j$.

 If every item in the pivot column $a_{\text{col } j}$ is non-positive, then every component of $h_j$ is non-negative – we can move along this direction without ever becoming infeasible. The problem is unbounded.

 Assuming the problem is bounded, find $\theta$, the maximum amount we can move in direction $h_j$ before the problem becomes infeasible, and $i$, the variable that leaves the basis when this happens:

$$\theta = \min_{i \in B} \left\{ -\frac{x_i}{h_{ij}} : h_{ij} < 0 \right\}, \qquad i = \operatorname{argmin}_{i \in B} \left\{ -\frac{x_i}{h_{ij}} : h_{ij} < 0 \right\}$$

In terms of our $j$th column, this looks like

$$\theta = \min_i \left\{ \frac{x_i}{a_{ij}} : a_{ij} > 0 \right\}$$

$$i = \text{Leaving basis} = \text{Pivot row} = \operatorname{argmin}_i \left\{ \frac{x_i}{a_{ij}} : a_{ij} > 0 \right\}$$

 If the minimum above is attained at two values of $i$, the new basis is degenerate. See the discussion below for anti-cycling rules.

o Pivot : We now pivot on the element $a_{ij}$. Because of the structure of our tableau, the only thing that needs to change is the vector $c_B$ and the matrix $B$, which needs to change from $B$ to $\bar{B}$, where

$$B = \left[\, \cdots \;\; A_{\text{col } i} \;\; \cdots \,\right] \quad \longrightarrow \quad \bar{B} = \left[\, \cdots \;\; A_{\text{col } j} \;\; \cdots \,\right]$$

with all other columns unchanged. To work out how to update the tableau, consider that

$$B^{-1}\bar{B} = \left[\, I_{\text{col}} \;\cdots\; B^{-1}A_{\text{col } j} \;\cdots\; I_{\text{col}} \,\right]$$

that is, the identity matrix with one column replaced by $B^{-1}A_{\text{col } j}$. Now, imagine we found a matrix $Q$ such that

$$Q B^{-1}\bar{B} = Q \left[\, I_{\text{col}} \;\cdots\; B^{-1}A_{\text{col } j} \;\cdots\; I_{\text{col}} \,\right] = I$$

Then we would have $QB^{-1} = \bar{B}^{-1}$. So once we find the mysterious matrix $Q$, all we need to do is apply it to our tableau to update it.

Instead of thinking of $Q$ as a matrix, it is helpful to think about it in terms of row operations – all we need is the series of row operations that will turn $B^{-1}\bar{B}$ into $I$ and apply them to our tableau. These operations are:

 Divide the pivot row by the pivot element, to get a 1 in there.

 For each other row, subtract appropriate multiples of the pivot row to make every other element in the pivot column zero.

In terms of our tableau

$$a_{\alpha\beta} \leftarrow \begin{cases} a_{\alpha\beta} / a_{ij} & \text{for the pivot row (ie: } \alpha = i\text{)} \\[4pt] a_{\alpha\beta} - \dfrac{a_{\alpha j}}{a_{ij}}\, a_{i\beta} & \text{for every other row (ie: } \alpha \ne i\text{)} \end{cases}$$
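The pivot rule translates directly into code. In the sketch below the tableau is a list of rows whose last row holds the reduced costs and whose last column holds the right-hand side; the function names `choose_pivot` and `pivot` and the example tableau are assumptions made for illustration. The same row operation is applied to the cost row as to every other non-pivot row.

```python
from fractions import Fraction


def choose_pivot(T):
    """Return (pivot row i, pivot column j), or None if the tableau is optimal."""
    m = len(T) - 1                        # number of constraint rows
    costs = T[-1][:-1]                    # reduced costs
    if min(costs) >= 0:
        return None
    j = costs.index(min(costs))           # index() gives the smallest j on ties
    rows = [i for i in range(m) if T[i][j] > 0]
    if not rows:
        raise ValueError("objective is unbounded below")
    # Ratio test; min() keeps the first (smallest) row index on ties.
    i = min(rows, key=lambda r: Fraction(T[r][-1]) / Fraction(T[r][j]))
    return i, j


def pivot(T, i, j):
    """Return the tableau obtained by pivoting on element (i, j)."""
    T = [[Fraction(v) for v in row] for row in T]
    pivot_row = [v / T[i][j] for v in T[i]]           # divide by the pivot element
    new = []
    for alpha, row in enumerate(T):
        if alpha == i:
            new.append(pivot_row)
        else:
            # Subtract the multiple of the pivot row that zeroes column j.
            new.append([v - row[j] * p for v, p in zip(row, pivot_row)])
    return new


# Assumed tableau for min x1 + x2 + x3 with basis (x1, x2) at the BFS (2, 0, 0).
T0 = [[1, 0, 3, 2],
      [0, 1, -1, 0],
      [0, 0, -1, -2]]
i, j = choose_pivot(T0)
T1 = pivot(T0, i, j)
```

After the pivot, the bottom-right entry of `T1` is minus the new objective value and `choose_pivot(T1)` reports that no downhill column remains.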

It turns out that the rule above also applies to the last row of the tableau. To see why, consider that originally, the last row consists of

$$[\, c^\top \mid 0 \,] - c_B^\top B^{-1} [\, A \mid b \,]$$

Adding a multiple of the pivot row to this row involves adding a linear combination of the rows of $[\, A \mid b \,]$, and so the result will be of the form

$$[\, c^\top \mid 0 \,] - p^\top [\, A \mid b \,]$$

for some vector $p$.

But consider that after these row operations  The last element of the pivot column contains a 0, by design.  The last element of every other column that stays in the basis will also contain a 0, because