










































Columbia Business School - First Year of the Doctoral Program in Decisions, Risk and Operations • Condensed notes roughly following two courses I took: "Foundations of Optimization" (taught by Prof Ciamac Moallemi) and "Convex Optimization" (taught by Prof Garud Iyengar). These notes are also heavily based on Boyd and Vandenberghe's book "Convex Optimization" (available online) and Luenberger's "Optimization by Vector Space Methods". The chapter numbers in these notes refer to Boyd and Vandenberghe's text. Rough list of topics covered: convexity of sets and functions, formulation of convex programs (from linear programs to semi-definite programs), duality, applications, Hilbert and Banach spaces, minimum-norm problems in Banach spaces, the Hahn-Banach Theorem.
Typology: Lecture notes











































o An optimization problem is
$$\text{minimize } f(x) \quad \text{subject to } x \in \mathcal{X}$$
  $f$ is the (real-valued) objective; $\mathcal{X}$ is the constraint set/feasible set/search space.
o $x^*$ is an optimal solution (global minimizer) if and only if $f(x^*) \le f(x) \ \forall x \in \mathcal{X}$.
o Maximizing $f(x)$ is equivalent to minimizing $-f(x)$.
o We consider problems in the following form
$$\begin{aligned} \text{minimize } \ & f(x) \\ \text{subject to } \ & h_i(x) = 0 \quad \forall\, 1 \le i \le m \\ & g_i(x) \le 0 \quad \forall\, 1 \le i \le r \\ & x \in \mathbb{R}^n \end{aligned}$$
o We consider the following subsets of the problem
  In linear programming, all functions are linear.
  In convex programming, the $f$ and $g_i$ are convex, and the $h_i$ are linear.
o If $\mathcal{X}$ is the feasible set of a problem, a point $x \in \mathcal{X}$ is a local minimum if there exists a neighborhood $N_r(x)$ such that $f(x) \le f(y) \ \forall y \in \mathcal{X} \cap N_r(x)$. It is an unconstrained local minimum if $f(x) \le f(y) \ \forall y \in N_r(x)$. (Strict equivalents exist.)
o An open ball around a point $x \in \mathbb{R}^n$ with radius $r > 0$ is the set $N_r(x) = \{ y \in \mathbb{R}^n : \|y - x\| < r \}$.
o A point $x \in \mathcal{S} \subset \mathbb{R}^n$ is an interior point if there exists an open ball such that $N_r(x) \subset \mathcal{S}$. A set $\mathcal{S} \subset \mathbb{R}^n$ is open if $\mathcal{S} = \operatorname{int} \mathcal{S}$.
o A point $x \in \mathbb{R}^n$ is a closure point of $\mathcal{S} \subset \mathbb{R}^n$ if, for every open ball $N_r(x)$, there exists $y \in \mathcal{S}$ with $y \in N_r(x)$. A set $\mathcal{S} \subset \mathbb{R}^n$ is closed if $\mathcal{S} = \operatorname{cl} \mathcal{S}$.
o The set of reals is both closed and open.
o Theorems:
  The union of open sets is open.
  The intersection of a finite number of open sets is open.
  The intersection of closed sets is closed.
  The union of a finite number of closed sets is closed.
o A sequence of vectors $\{x_k\} \subset \mathbb{R}^n$ converges to a limit $x \in \mathbb{R}^n$ if $\lim_{k \to \infty} \|x - x_k\| = 0$, and we say that $x_k \to x$.
o A set $\mathcal{S} \subset \mathbb{R}^n$ is (sequentially) compact if, given a sequence $\{x_k\} \subset \mathcal{S}$, there is a subsequence $\{x_{k_i}\}$ converging to an element $x \in \mathcal{S}$.
  Theorem (Heine-Borel): A set $\mathcal{S} \subset \mathbb{R}^n$ is compact if and only if it is closed and bounded.
  Theorem: A closed subset of a compact set is compact.
  Theorem: Suppose $\{\mathcal{S}_n\}$ is a sequence of non-empty, compact sets that are nested (ie: $\mathcal{S}_{n+1} \subset \mathcal{S}_n$). Then their intersection is non-empty.
o A real-valued function $f$ defined on a domain $\mathcal{S} \subset \mathbb{R}^n$ is continuous at the point $x \in \mathcal{S}$ if, for every sequence $\{x_k\} \subset \mathcal{S}$ with $x_k \to x$, $\lim_{k \to \infty} f(x_k) = f(x)$. $f$ is continuous if it is continuous at all points in $\mathcal{S}$.
o A function $f$ is coercive over a set $\mathcal{S} \subset \mathbb{R}^n$ if, for every sequence $\{x_k\} \subset \mathcal{S}$ with $\|x_k\| \to \infty$, we have $\lim_{k \to \infty} f(x_k) = \infty$.
o The inverse image of the set $\mathcal{A} \subset \mathbb{R}$ is defined by $f^{-1}(\mathcal{A}) = \{ x \in \mathcal{S} : f(x) \in \mathcal{A} \}$.
  Theorem: If $f$ is continuous, $\mathcal{S}$ is open [closed] and $\mathcal{A}$ is open [closed], then $f^{-1}(\mathcal{A})$ is also open [closed]. This is the standard way to prove that a set is open/closed.
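o The gradient of $f$ at a point $x$ is the vector of partial derivatives
$$\nabla f(x) = \left[ \frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_n} \right]^T \in \mathbb{R}^n, \qquad \frac{\partial f(x)}{\partial x_i} = \lim_{h \to 0} \frac{f(x + h e_i) - f(x)}{h}$$
where $e_i$ is the $i$-th unit vector.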
$f$ is differentiable over an open set $\mathcal{S}$ if it is differentiable at every point in the set. If, in addition, the components of the gradient are continuous over $\mathcal{S}$, then $f$ is continuously differentiable over $\mathcal{S}$.
o If, for a point $x \in \operatorname{int} \mathcal{S}$, each component of the gradient is differentiable, we say $f$ is twice differentiable at $x$, and we define the Hessian matrix $\nabla^2 f(x) \in \mathbb{R}^{n \times n}$ by
$$\nabla^2 f(x) = \left[ \frac{\partial^2 f(x)}{\partial x_i \, \partial x_j} \right]_{ij}$$
If $f$ is twice continuously differentiable in a neighborhood of $x$, then the Hessian is symmetric.
o Suppose $f$ is twice continuously differentiable over a neighborhood $N_r(x)$. Then for all $d \in N_r(0)$
$$f(x + d) = f(x) + \nabla f(x)^T d + \tfrac{1}{2} d^T \nabla^2 f(x) \, d + o(\|d\|^2)$$
(Formally, this means that for every $C > 0$, there exists a neighborhood around $0$ such that the estimate of $f(x + d)$ differs from the real value by no more than $C \|d\|^2$.)
o Consider a vector-valued function $F : \mathcal{S} \to \mathbb{R}^m$, $\mathcal{S} \subset \mathbb{R}^n$, and a point $x \in \operatorname{int} \mathcal{S}$. We define the gradient to be the matrix $\nabla F(x) \in \mathbb{R}^{n \times m}$ with
$$\nabla F(x) = \left[ \nabla F_1(x), \ldots, \nabla F_m(x) \right], \qquad [\nabla F(x)]_{ij} = \frac{\partial F_j(x)}{\partial x_i}$$
o The chain rule states that for interior points, if $h(x) = g(f(x))$, then $\nabla h(x) = \nabla f(x) \, \nabla g(f(x))$.
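The gradient definition and the second-order expansion above can be checked numerically. The sketch below is only an illustration under stated assumptions: the test function `f`, the point `x0` and the step sizes are hypothetical choices, and the partial derivatives are estimated with central differences rather than the one-sided limit used in the definition.

```python
import math

def f(x):
    # Hypothetical test function f(x1, x2) = x1^2 * x2 + exp(x2)
    return x[0] ** 2 * x[1] + math.exp(x[1])

def grad_f(x):
    # Analytic gradient [df/dx1, df/dx2]
    return [2.0 * x[0] * x[1], x[0] ** 2 + math.exp(x[1])]

def hess_f(x):
    # Analytic (symmetric) Hessian
    return [[2.0 * x[1], 2.0 * x[0]],
            [2.0 * x[0], math.exp(x[1])]]

def numerical_gradient(func, x, h=1e-6):
    # Central-difference estimate of each partial derivative along e_i
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2.0 * h))
    return grad

def taylor2(x, d):
    # f(x) + grad f(x)^T d + 0.5 * d^T Hess f(x) d
    g, H = grad_f(x), hess_f(x)
    linear = sum(g[i] * d[i] for i in range(2))
    quadratic = sum(d[i] * H[i][j] * d[j] for i in range(2) for j in range(2))
    return f(x) + linear + 0.5 * quadratic

x0 = [1.0, 0.5]
num_grad = numerical_gradient(f, x0)

# Error of the second-order expansion for a shrinking step d
taylor_errors = []
for scale in (1e-1, 1e-2, 1e-3):
    d = [scale, -scale]
    x_new = [x0[0] + d[0], x0[1] + d[1]]
    taylor_errors.append(abs(f(x_new) - taylor2(x0, d)))

print(num_grad, grad_f(x0))
print(taylor_errors)
```

The errors should shrink faster than the square of the step length, which is what the $o(\|d\|^2)$ term expresses.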
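o Consider a matrix $A \in \mathbb{R}^{m \times n}$. Then
$$\ker A = \{ x \in \mathbb{R}^n : A x = 0 \}, \qquad \operatorname{im} A = \{ y \in \mathbb{R}^m : y = A x, \ x \in \mathbb{R}^n \}$$
o Given a set $\mathcal{S} \subset \mathbb{R}^n$, $\mathcal{S}^\perp = \{ x \in \mathbb{R}^n : x \cdot y = 0 \ \forall y \in \mathcal{S} \}$.
o Lemma: $\operatorname{im} A = [\ker(A^T)]^\perp$. In other words, given $z \in \mathbb{R}^m$,
$$z = A x \text{ for some } x \in \mathbb{R}^n \iff z \cdot y = 0 \ \ \forall y \text{ with } A^T y = 0$$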
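As a small sanity check of the lemma, the sketch below uses a hypothetical $3 \times 2$ matrix `A` whose left null space $\ker(A^T)$ is spanned by a single vector `y`. Every image vector $Ax$ should be orthogonal to `y`, while a vector that is not orthogonal to `y` cannot lie in $\operatorname{im} A$.

```python
def matvec(M, x):
    # Matrix-vector product for a matrix stored as a list of rows
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical example matrix (3 x 2)
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
A_t = [list(col) for col in zip(*A)]   # transpose of A

# y spans ker(A^T): A^T y = 0
y = [1.0, 1.0, -1.0]
At_y = matvec(A_t, y)

# Vectors in im A are orthogonal to y ...
images = [matvec(A, x) for x in ([1.0, 0.0], [0.0, 1.0], [2.0, -3.0])]
image_dots = [dot(z, y) for z in images]

# ... while z_bad is not orthogonal to y, so z_bad = A x has no solution
z_bad = [1.0, 0.0, 0.0]
bad_dot = dot(z_bad, y)

print(At_y, image_dots, bad_dot)
```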
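o Affine sets
  Definition: A set $\mathcal{S} \subset \mathbb{R}^n$ is affine if, for all points $x_1, x_2 \in \mathcal{S}$ and a scalar $\lambda \in \mathbb{R}$, $\lambda x_1 + (1 - \lambda) x_2 \in \mathcal{S}$.
  Example: The empty set, a line and any subspace are affine. Similarly, $\{ x \in \mathbb{R}^n : A x = b \}$ is affine.
  Definition: Given a set of points $\mathcal{S} \subset \mathbb{R}^n$, the affine hull $\operatorname{aff} \mathcal{S}$ is the set of points $\lambda_1 x_1 + \cdots + \lambda_k x_k$, where $k \ge 1$, $\{x_i\} \subset \mathcal{S}$ and $\lambda_1 + \cdots + \lambda_k = 1$. The affine hull is affine and is the smallest affine set containing $\mathcal{S}$.
o Convex sets
  Definition: The set $\mathcal{C}$ is convex if, for all points $x_1, x_2 \in \mathcal{C}$ and a scalar $\lambda \in (0, 1)$, $\lambda x_1 + (1 - \lambda) x_2 \in \mathcal{C}$. Clearly, affine sets are also convex.
  Definition: Given a set of points $\mathcal{S} \subset \mathbb{R}^n$, the convex hull $\operatorname{conv} \mathcal{S}$ is the set of points $\lambda_1 x_1 + \cdots + \lambda_k x_k$, where $k \ge 1$, $\{x_i\} \subset \mathcal{S}$, $\lambda_i \ge 0$ and $\lambda_1 + \cdots + \lambda_k = 1$.
  Theorem (scalar multiplication): If $\mathcal{C} \subset \mathbb{R}^n$ is convex and $\alpha \in \mathbb{R}$, then $\alpha \mathcal{C} = \{ \alpha x : x \in \mathcal{C} \}$ is convex.
  Theorem (vector sum): If $\mathcal{C}, \mathcal{D} \subset \mathbb{R}^n$ are convex sets, then the set $\mathcal{C} + \mathcal{D} = \{ x + y : x \in \mathcal{C}, y \in \mathcal{D} \}$ is also convex.
  Theorem (affine transformations): If $\mathcal{C} \subset \mathbb{R}^n$ is a convex set, $A \in \mathbb{R}^{m \times n}$ is a matrix and $b \in \mathbb{R}^m$ is a vector, then the set $\{ A x + b : x \in \mathcal{C} \}$ is a convex subset of $\mathbb{R}^m$.
  Theorem: If $\mathscr{C}$ is an arbitrary collection of convex sets, then the intersection $\bigcap_{\mathcal{C} \in \mathscr{C}} \mathcal{C}$ is also convex.
o Miscellaneous definitions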
  The L2-norm (Euclidean norm): $\|x\|_2 = \sqrt{\sum_{i=1}^n x_i^2} = \sqrt{x \cdot x}$
  The weighted norm: $\|x\|_\Gamma = \sqrt{x^T \Gamma x}$ (when $\Gamma \succ 0$ and symmetric)
  The $p$-norm: $\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$ for $p \ge 1$
  The $\infty$-norm: $\|x\|_\infty = \max \{ |x_1|, \ldots, |x_n| \}$
  Given a norm, the (closed) ball with centre $x_0$ and radius $r > 0$ is $\{ x \in \mathbb{R}^n : \|x - x_0\| \le r \}$, and it is convex.
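The norms above are easy to compute directly. The sketch below is an illustration with hypothetical inputs (the vector `x`, the matrix `G` and the ball parameters are arbitrary choices); it also spot-checks convexity of the closed Euclidean ball on one convex combination of two boundary points.

```python
def p_norm(x, p):
    # (sum_i |x_i|^p)^(1/p), a norm for p >= 1
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def inf_norm(x):
    # max_i |x_i|
    return max(abs(xi) for xi in x)

def weighted_norm(x, G):
    # sqrt(x^T G x); G is assumed symmetric positive definite
    n = len(x)
    return sum(x[i] * G[i][j] * x[j] for i in range(n) for j in range(n)) ** 0.5

x = [3.0, -4.0]
G = [[2.0, 0.0], [0.0, 1.0]]   # hypothetical positive definite matrix
norms = {p: p_norm(x, p) for p in (1, 2, 50)}

# Convexity of the closed 2-norm ball: a convex combination of two points
# of the ball stays in the ball.
center, radius = [1.0, 1.0], 2.0
u, v = [3.0, 1.0], [1.0, -1.0]   # both at distance 2 from the center
lam = 0.3
w = [lam * ui + (1.0 - lam) * vi for ui, vi in zip(u, v)]
dist_w = p_norm([wi - ci for wi, ci in zip(w, center)], 2)

print(norms, inf_norm(x), weighted_norm(x, G), dist_w)
```

For large $p$ the $p$-norm approaches the $\infty$-norm, which the `p = 50` entry illustrates.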
o Theorem (projection): Let $\mathcal{C} \subset \mathbb{R}^n$ be a closed and non-empty convex set, and consider the Euclidean norm. Fix the vector $x \in \mathbb{R}^n$. Consider the problem
$$\min_{z \in \mathbb{R}^n} \|z - x\| \quad \text{s.t. } z \in \mathcal{C}$$
For every $x \in \mathbb{R}^n$, the problem has a unique global minimum $x^*$ called the projection of $x$ onto $\mathcal{C}$. A vector $x' \in \mathcal{C}$ is equal to $x^*$ if and only if
$$(x - x') \cdot (z - x') \le 0 \quad \forall z \in \mathcal{C}$$
Geometrically, the angle between $x'x$ and $x'z$ must be at least 90° for all points $z$ in the set:
Proof: Existence follows from the fact that $\|z - x\|$ is coercive and $\mathcal{C}$ is closed. Uniqueness follows because minimizing $\|z - x\|$ is equivalent to minimizing $\|z - x\|^2 = z \cdot z - 2 z \cdot x + x \cdot x$, which is strictly convex.
Now, writing $f(z) = \|z - x\|^2$, consider that $\nabla f(x^*) = 2(x^* - x)$. By necessary and sufficient conditions for convex optimization problems (derived later), the condition in the theorem must hold.
[Figure: a point $x$, its projection $x'$ onto the set, and a point $z$ in the set.]
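The characterization $(x - x') \cdot (z - x') \le 0$ can be illustrated on a set where the projection is known in closed form. The sketch below assumes a hypothetical choice of set, the box $[0,1]^2$, whose Euclidean projection is coordinate-wise clipping, and checks the inequality on a grid of sample points $z$. It is a numerical spot check, not a proof.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dist2(u, v):
    # Squared Euclidean distance
    return dot(sub(u, v), sub(u, v))

def project_box(x, lo=0.0, hi=1.0):
    # Euclidean projection onto the box [lo, hi]^n: clip each coordinate
    return [min(max(xi, lo), hi) for xi in x]

x = [1.5, -0.3]          # hypothetical point outside the box
x_proj = project_box(x)

# Sample the set on a grid and evaluate (x - x') . (z - x') for each z
grid = [[i / 10.0, j / 10.0] for i in range(11) for j in range(11)]
largest_inner = max(dot(sub(x, x_proj), sub(z, x_proj)) for z in grid)

# The projection should also be at least as close to x as any sampled z
is_closest = all(dist2(x, x_proj) <= dist2(x, z) + 1e-12 for z in grid)

print(x_proj, largest_inner, is_closest)
```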
Application: Suppose we want to approximate $f(x)$ over a set of points $\{x_1, \ldots, x_m\}$ using $g(x) = \sum_{k=1}^{K} r_k \phi_k(x)$, where the $\phi_k$ are basis functions and $r$ is a vector of weights. One way to do this is to solve the problem
$$\min \ \sum_{i=1}^m \left[ f(x_i) - g(x_i) \right]^2 \quad \text{s.t. } g(\cdot) \text{ is a linear combination of } \{\phi_k(\cdot)\}$$
Consider the matrix $\Phi$ with $\Phi_{i,k} = \phi_k(x_i)$ and the vector $y$ with $y_i = f(x_i)$. This problem is equivalent to
$$\min \ \|y - z\| \quad \text{s.t. } z \in \{ \Phi r : r \in \mathbb{R}^K \}$$
This is a projection problem onto a subspace, and so a unique optimal $z$ exists. (The weights $r$ themselves are unique when $\Phi$ has full column rank.)
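The application above can be sketched in a few lines. Everything specific here is an assumption made for illustration: the target function $x^2$, the sample points, and the two basis functions $\phi_1(x) = 1$, $\phi_2(x) = x$. The weights are found from the normal equations $\Phi^T \Phi \, r = \Phi^T y$, which is one standard way to compute the projection when $\Phi$ has full column rank.

```python
def basis(x):
    # Hypothetical basis functions phi_1(x) = 1, phi_2(x) = x
    return [1.0, x]

def target(x):
    # Hypothetical function to approximate
    return x ** 2

points = [0.0, 1.0, 2.0, 3.0]
Phi = [basis(x) for x in points]      # Phi[i][k] = phi_k(x_i)
y = [target(x) for x in points]       # y[i] = f(x_i)

# Normal equations (Phi^T Phi) r = Phi^T y, solved by Cramer's rule (2x2 case)
a11 = sum(row[0] * row[0] for row in Phi)
a12 = sum(row[0] * row[1] for row in Phi)
a22 = sum(row[1] * row[1] for row in Phi)
b1 = sum(row[0] * yi for row, yi in zip(Phi, y))
b2 = sum(row[1] * yi for row, yi in zip(Phi, y))
det = a11 * a22 - a12 * a12
r = [(b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]

# z is the projection of y onto the column space of Phi
z = [row[0] * r[0] + row[1] * r[1] for row in Phi]
residual = [yi - zi for yi, zi in zip(y, z)]

# Optimality: the residual is orthogonal to every column of Phi
orthogonality = [sum(row[k] * res for row, res in zip(Phi, residual))
                 for k in range(2)]

print(r, z, orthogonality)
```

Because the feasible set is a subspace, the projection inequality holds with equality in both directions, which is why the residual comes out orthogonal to the columns of $\Phi$.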
Theorem – Sufficient Conditions (Weierstrass): Consider the problem $\min f(x)$ s.t. $x \in \mathcal{X} \subset \mathbb{R}^n$. Then if
o $\mathcal{X}$ is non-empty
o $f$ is lower semicontinuous over $\mathcal{X}$
and one of the following conditions holds:
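$$0 \le \lim_{\alpha \downarrow 0} \frac{f(x^* + \alpha d) - f(x^*)}{\alpha} = \lim_{\alpha \downarrow 0} \frac{\alpha \nabla f(x^*) \cdot d + o(\alpha)}{\alpha} = \nabla f(x^*) \cdot d$$
Thus, $\nabla f(x^*) \cdot d \ge 0$. Since $d$ is arbitrary, we must have $\nabla f(x^*) = 0$.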
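$$f(x^* + \alpha d) - f(x^*) = \alpha \nabla f(x^*)^T d + \tfrac{1}{2} \alpha^2 d^T \nabla^2 f(x^*) \, d + o(\alpha^2) = \tfrac{1}{2} \alpha^2 d^T \nabla^2 f(x^*) \, d + o(\alpha^2)$$
(using $\nabla f(x^*) = 0$), and so, since $x^*$ is a local minimum,
$$0 \le \tfrac{1}{2} \alpha^2 d^T \nabla^2 f(x^*) \, d + o(\alpha^2) \quad \Longrightarrow \quad 0 \le \tfrac{1}{2} d^T \nabla^2 f(x^*) \, d + \frac{o(\alpha^2)}{\alpha^2}$$
Taking limits as $\alpha \to 0$:
$$d^T \nabla^2 f(x^*) \, d \ge 0$$
Since $d$ is arbitrary, this leads to our result.
o Theorem – sufficient conditions: Consider a point $x^* \in \operatorname{int} \mathcal{X}$. If $f$ is twice continuously differentiable in a neighborhood of $x^*$, and
$$\nabla f(x^*) = 0, \qquad \nabla^2 f(x^*) \succ 0$$
then $x^*$ is a strict unconstrained local minimum. The geometric interpretation is as above; the only difference is that we now require a positive definite instead of a positive semidefinite matrix.
Proof: Let $\lambda > 0$ be the smallest eigenvalue of $\nabla^2 f(x^*)$, and let $d \in N_r(0) \setminus \{0\}$. Then
$$f(x^* + d) - f(x^*) = \nabla f(x^*)^T d + \tfrac{1}{2} d^T \nabla^2 f(x^*) \, d + o(\|d\|^2) \ge \frac{\lambda}{2} \|d\|^2 + o(\|d\|^2) = \left( \frac{\lambda}{2} + \frac{o(\|d\|^2)}{\|d\|^2} \right) \|d\|^2$$
Now, for any $\gamma \in (0, \lambda)$, there exists $\varepsilon \in (0, r]$ such that
$$\frac{\lambda}{2} + \frac{o(\|d\|^2)}{\|d\|^2} \ge \frac{\gamma}{2} \quad \forall d \text{ with } \|d\| < \varepsilon$$
And this means that
$$f(x^* + d) \ge f(x^*) + \frac{\gamma}{2} \|d\|^2 > f(x^*)$$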
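The sufficient conditions and the growth bound from the proof can be illustrated on a concrete function. The function below is a hypothetical example chosen so that $x^* = 0$ satisfies $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*) \succ 0$ while not being a global minimum; the radius `eps` was picked by hand for this example, and the check samples directions rather than proving the bound.

```python
import math

def f(x):
    # Hypothetical example: f(x1, x2) = x1^2 + x1*x2 + x2^2 + x1^3
    return x[0] ** 2 + x[0] * x[1] + x[1] ** 2 + x[0] ** 3

def grad_f(x):
    return [2.0 * x[0] + x[1] + 3.0 * x[0] ** 2, x[0] + 2.0 * x[1]]

def hess_f(x):
    return [[2.0 + 6.0 * x[0], 1.0], [1.0, 2.0]]

def min_eig_sym2(H):
    # Smallest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]
    a, b, c = H[0][0], H[0][1], H[1][1]
    return (a + c) / 2.0 - math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)

x_star = [0.0, 0.0]
lam = min_eig_sym2(hess_f(x_star))   # smallest eigenvalue of the Hessian
gamma = 0.5 * lam                    # any gamma in (0, lam) works
eps = 0.2                            # radius chosen by hand for this example

# Check f(x* + d) >= f(x*) + (gamma / 2) * ||d||^2 on sampled directions
growth_holds = True
for k in range(360):
    theta = math.radians(k)
    for radius in (eps, eps / 10.0, eps / 100.0):
        d = [radius * math.cos(theta), radius * math.sin(theta)]
        x = [x_star[0] + d[0], x_star[1] + d[1]]
        if f(x) < f(x_star) + 0.5 * gamma * radius ** 2:
            growth_holds = False

print(grad_f(x_star), lam, growth_holds)
# The minimum is only local: f takes smaller values far from x*
print(f([-2.0, 1.0]) < f(x_star))
```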
o Verify there is a global minimum (using the existence theorem).
o Find the set of possible unconstrained local minima using $\nabla f(x) = 0$.
o Compare these points with all points on the boundary $\mathcal{X} \setminus \operatorname{int} \mathcal{X}$.
o Example: Consider $\min_{x \in \mathbb{R}^n} \tfrac{1}{2} x^T \Gamma x - b^T x$ with $\Gamma \succ 0$. By an earlier theorem, global minima must exist. Furthermore, $\mathcal{X} \setminus \operatorname{int} \mathcal{X}$ is empty, and so the global minimum must be an unconstrained local minimum. The first order necessary conditions immediately allow us to characterize that point as $\Gamma x^* - b = 0$.
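The quadratic example can be worked through numerically. The matrix `G` and vector `b` below are hypothetical values; positive definiteness is checked with Sylvester's criterion (positive leading principal minors), and the stationarity system $\Gamma x^* = b$ is solved with Cramer's rule since the example is $2 \times 2$.

```python
def solve_2x2(M, rhs):
    # Cramer's rule for a 2x2 linear system M x = rhs
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
            (M[0][0] * rhs[1] - M[1][0] * rhs[0]) / det]

def objective(G, b, x):
    # 0.5 * x^T G x - b^T x
    quad = sum(x[i] * G[i][j] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad - sum(bi * xi for bi, xi in zip(b, x))

G = [[4.0, 1.0], [1.0, 3.0]]   # hypothetical symmetric matrix
b = [1.0, 2.0]                 # hypothetical vector

# Sylvester's criterion for a 2x2 symmetric matrix
is_positive_definite = G[0][0] > 0 and G[0][0] * G[1][1] - G[0][1] * G[1][0] > 0

# First-order condition: G x* - b = 0
x_star = solve_2x2(G, b)
gradient = [sum(G[i][j] * x_star[j] for j in range(2)) - b[i] for i in range(2)]

# Compare the objective at x* with a few perturbed points
offsets = [[0.1, 0.0], [0.0, -0.1], [-0.5, 0.5], [2.0, 3.0]]
f_star = objective(G, b, x_star)
no_better_point = all(
    objective(G, b, [x_star[0] + d[0], x_star[1] + d[1]]) > f_star
    for d in offsets
)

print(x_star, gradient, no_better_point)
```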
o Consider the problem $\min f(x, a)$ s.t. $x \in \mathbb{R}^n$. We let $x^*(a)$ be a local optimum, and $f^*(a) = f(x^*(a), a)$. The first-order conditions are
$$\nabla_x f(x^*(a), a) = 0$$
Taking the derivative with respect to $a$, we obtain
$$\nabla x^*(a) \, \nabla^2_{xx} f(x^*(a), a) + \nabla^2_{xa} f(x^*(a), a) = 0$$
From this expression, we can obtain expressions for the sensitivity of the optimum, and of the optimal value:
$$\nabla x^*(a) = -\nabla^2_{xa} f(x^*(a), a) \left\{ \nabla^2_{xx} f(x^*(a), a) \right\}^{-1}$$
$$\nabla f^*(a) = \nabla_a \left\{ f(x^*(a), a) \right\} = \nabla x^*(a) \, \nabla_x f(x^*(a), a) + \nabla_a f(x^*(a), a) = \nabla_a f(x^*(a), a)$$
o The implicit function theorem tells us when such a differentiable $x^*(a)$ exists.
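Both sensitivity formulas can be compared against finite differences on a scalar example. The objective $f(x, a) = e^x - a x$ (with $a > 0$) is a hypothetical choice made because $x^*(a) = \ln a$ is known in closed form; the optimum is nevertheless computed by Newton's method on the first-order condition so that the check does not rely on the closed form.

```python
import math

def f(x, a):
    # Hypothetical parametric objective f(x, a) = exp(x) - a*x, with a > 0
    return math.exp(x) - a * x

def solve_foc(a, x0=0.0, iterations=50):
    # Newton's method on the first-order condition f_x(x, a) = exp(x) - a = 0
    x = x0
    for _ in range(iterations):
        x -= (math.exp(x) - a) / math.exp(x)
    return x

def optimal_value(a):
    # f*(a) = f(x*(a), a)
    return f(solve_foc(a), a)

a, h = 2.0, 1e-5
x_star = solve_foc(a)

# Sensitivity of the optimum: dx*/da = -f_xa / f_xx, with f_xa = -1, f_xx = exp(x)
dx_formula = -(-1.0) / math.exp(x_star)
dx_fd = (solve_foc(a + h) - solve_foc(a - h)) / (2.0 * h)

# Sensitivity of the optimal value: df*/da = partial f / partial a at x*, i.e. -x*
dv_formula = -x_star
dv_fd = (optimal_value(a + h) - optimal_value(a - h)) / (2.0 * h)

print(x_star, dx_formula, dx_fd, dv_formula, dv_fd)
```

The second comparison is the envelope result: the indirect effect through $x^*(a)$ vanishes because $\nabla_x f = 0$ at the optimum.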
Consider the problem $\min f(x)$ s.t. $x \in \mathcal{X} \subset \mathbb{R}^n$. We are interested in characterizing local minima that are not in $\operatorname{int} \mathcal{X}$. We will assume, though, that $f$ is continuously differentiable in a neighborhood of the point considered.
$$f(x_k) = f(x^*) + \|x_k - x^*\| \, \nabla f(\tilde{x}_k) \cdot d_k$$
Now, if $d$ is also a descent direction, $d \in \mathcal{D}(x^*)$, then $\nabla f(x^*) \cdot d < 0$. The strict inequality implies that this is also true in a neighborhood of $x^*$, and so for $k$ large enough, we get $f(x_k) < f(x^*)$. This contradicts the local minimality of $x^*$.
Unfortunately, the tangent cone $\mathcal{T}(x^*)$ is hard to characterize algebraically, unless we focus on the particular example where $\mathcal{X}$ is defined by equality constraints.
Consider the problem $\min f(x)$ s.t. $h(x) = 0$, $x \in \mathbb{R}^n$, where $h : \mathbb{R}^n \to \mathbb{R}^m$. We assume that $f$ and the $h_i$ are continuously differentiable in a neighborhood of the local minimum. In this particular case, we will show we can characterize the tangent cone $\mathcal{T}(x^*)$ in a simple way. The intuition behind our result is that for any feasible $x$, $d \in \mathbb{R}^n$ and $\alpha > 0$
$$h(x + \alpha d) \approx h(x) + \alpha \nabla h(x)^T d = \alpha \nabla h(x)^T d$$
So intuitively, one might expect any direction for which $\nabla h(x)^T d = 0$ to maintain feasibility. We now formalize this statement.
Definition: The cone of first-order feasible variations at $x^* \in \mathbb{R}^n$ is the set
$$\mathcal{V}(x^*) = \{ d \in \mathbb{R}^n : \nabla h(x^*)^T d = 0 \} = \ker \left[ \nabla h(x^*)^T \right]$$
Note that $d \in \mathcal{V}(x^*) \iff -d \in \mathcal{V}(x^*)$. As such, $\mathcal{V}(x^*)$ is actually a subspace of $\mathbb{R}^n$.
Definition: A point $x^* \in \mathbb{R}^n$ is a regular point if it is feasible and the constraint gradients $\nabla h_i(x^*)$ are linearly independent. In other words, $\nabla h(x^*)$ has full column rank. If $m > n$, no regular points exist, and if $m = 1$, this reduces to $\nabla h_1(x^*) \ne 0$.
Lemma (regularity): Let $x^*$ be a regular point. Then $\mathcal{T}(x^*) = \mathcal{V}(x^*)$.
Proof: This proof is hard. The intuition behind it is
o Consider the curve we would trace if we were sitting at a point $x^*$ and we started walking forward or backwards while staying on the constraint (ie: while keeping the constraint satisfied). We'll start by showing that for any direction $d \in \mathcal{V}(x^*)$, there is such a path that starts by walking forward or backward along the direction $d$.
o Once we've established this, the result is relatively easy, because the path constitutes a "walk" fully contained in our set which eventually ends up being in the direction $d$. It's therefore in $\mathcal{T}(x^*)$.
And now the painful details! First, let's find the curve in question:
o Begin by choosing $d \in \mathcal{V}(x^*)$. Given a scalar $t$, consider the curve $x(t) = x^* + t d$. This satisfies our requirement that we be moving either side of $x^*$, and that we start by going in direction $d$. However, there's no guarantee we stay on the constraints.
o Instead, consider the path $x(t) = x^* + t d + \nabla h(x^*) u(t)$ for some unknown vector $u(t) \in \mathbb{R}^m$. This seems sensible: we are correcting our path to reflect how $h$ might change. For $x(t)$ to be "valid", we require it to satisfy the $m$ equations
$$\boxed{\, h\big(x^* + t d + \nabla h(x^*) u(t)\big) = 0 \,}$$
For $t = 0$, $u(0) = 0$ is clearly a solution.
Now, take the gradient of the boxed equation with respect to $u$ and evaluate it at $(t, u) = (0, 0)$. We get
$$\nabla h(x^*)^T \nabla h(x^*)$$
Since the columns of $\nabla h(x^*)$ are linearly independent, this matrix is invertible. The two results above allow us to use the implicit function theorem to deduce that a solution $u(t)$ to the boxed equation exists for all $t \in (-\tau, \tau)$, for some $\tau > 0$.
Thus, we have managed to find a curve $x(t)$ that keeps us on the constraints and that is defined over $t \in (-\tau, \tau)$ with $x(0) = x^*$ (this implies that the curve represents moving forward and backward from $x^*$).
o All we now need to prove is that the initial direction in which we move is $d$. To do that, differentiate the boxed equation above with respect to $t$ and evaluate at $t = 0$. We get
Or in other words, we require $\nabla f(x^*)$ to be in $\mathcal{V}(x^*)^\perp$:
$$\nabla f(x^*) \in \mathcal{V}(x^*)^\perp = \left[ \ker \nabla h(x^*)^T \right]^\perp = \operatorname{im} \nabla h(x^*)$$
Equivalently, there exists $\lambda \in \mathbb{R}^m$ such that $\nabla f(x^*) = \nabla h(x^*) \lambda$.
Proof: Since $x^*$ is a local minimum, $\mathcal{T}(x^*) \cap \mathcal{D}(x^*) = \emptyset$, and since $x^*$ is regular, $\mathcal{V}(x^*) \cap \mathcal{D}(x^*) = \emptyset$. Now, assume $d \in \mathcal{V}(x^*)$. By what we have just said, $d \notin \mathcal{D}(x^*)$, and so $\nabla f(x^*) \cdot d \ge 0$. However, since we also have $-d \in \mathcal{V}(x^*)$, we must have $\nabla f(x^*) \cdot d = 0$. For the last part of the theorem, note that $\operatorname{im} A = (\ker A^T)^\perp$, as proved in the introductory section of these notes.
The last part of the previous theorem is important, because it provides a "simple" way to characterize the tangent cone, and a "recipe" to find optimal points. This can be formalized further using the following theorem.
o Theorem – necessary conditions: If $x^*$ is a local minimum that is a regular point, then there exists a unique vector $\lambda^* \in \mathbb{R}^m$, called a Lagrange multiplier, such that
$$\nabla f(x^*) + \nabla h(x^*) \lambda^* = \nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla h_i(x^*) = 0$$
In addition, if $f$ and $h$ are twice continuously differentiable,
$$d^T \left( \nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla^2 h_i(x^*) \right) d \ge 0 \quad \forall d \in \mathcal{V}(x^*)$$
There is an interesting geometrical interpretation of the first-order condition. It effectively states that $\nabla f(x^*)$ [the direction in which we might increase our objective] must be a linear combination of the $\nabla h_i(x^*)$ [the perpendiculars to the constraints $h_i(x^*) = 0$]. Since we cannot move along any of those perpendiculars without leaving the constraints, we clearly cannot move along $\nabla f(x^*)$. Here is an example, in which $\nabla f(x)$ is constant:
Proof: The existence of $\lambda^*$ is simply a restatement of the previous theorem. The uniqueness of $\lambda^*$ follows from the fact that the columns of $\nabla h(x^*)$ are linearly independent.
For the second-order condition, consider a $d \in \mathcal{V}(x^*)$, and use the first part of the regularity lemma to define a path $x(t)$ either side of $x^*$, which always stays on the constraints and such that $\dot{x}(0) = d$. Now, define $g(t) = f(x(t))$ and take a double derivative
$$\ddot{g}(t) = \dot{x}(t)^T \nabla^2 f(x(t)) \, \dot{x}(t) + \ddot{x}(t)^T \nabla f(x(t))$$
Since all points $x(t)$ satisfy the constraints of the problem, and $x^*$ is a local minimum, $t = 0$ must be an unconstrained local minimum of $g(t)$. Thus
$$\ddot{g}(0) = d^T \nabla^2 f(x^*) \, d + \ddot{x}(0)^T \nabla f(x^*) \ge 0$$
Finally, consider $\ell(t) = \lambda^{*T} h(x(t)) = 0$ and differentiate it twice, to get
$$\ddot{\ell}(0) = d^T \left( \sum_{i=1}^m \lambda_i^* \nabla^2 h_i(x^*) \right) d + \ddot{x}(0)^T \nabla h(x^*) \lambda^* = 0$$
Finally, add the last two equations, and apply the first order condition.
o We define the Lagrangian as
$$\mathcal{L}(x, \lambda) = f(x) + \lambda \cdot h(x)$$
The first and second order conditions then reduce to
$$\nabla_x \mathcal{L}(x^*, \lambda^*) = 0, \qquad d^T \nabla^2_{xx} \mathcal{L}(x^*, \lambda^*) \, d \ge 0 \quad \forall d \in \mathcal{V}(x^*)$$
And the feasibility condition is given by $\nabla_\lambda \mathcal{L}(x^*, \lambda^*) = 0$.
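The Lagrangian conditions can be checked on a small problem with a constant objective gradient, in the spirit of the example mentioned above. The problem below (minimize $x_1 + x_2$ on the circle $x_1^2 + x_2^2 = 2$) is a hypothetical choice, not taken from the notes. It has two points satisfying the first-order condition, and the second-order condition on the tangent subspace tells them apart.

```python
def grad_f(x):
    # f(x) = x1 + x2 has a constant gradient
    return [1.0, 1.0]

def h(x):
    # Single equality constraint h(x) = x1^2 + x2^2 - 2 = 0
    return x[0] ** 2 + x[1] ** 2 - 2.0

def grad_h(x):
    return [2.0 * x[0], 2.0 * x[1]]

def multiplier(x):
    # With one constraint, grad f + lam * grad h = 0 gives
    # lam = -(grad h . grad f) / (grad h . grad h) when the two are collinear
    gf, gh = grad_f(x), grad_h(x)
    return -sum(a * b for a, b in zip(gh, gf)) / sum(a * a for a in gh)

def lagrangian_gradient(x, lam):
    # grad_x L(x, lam) = grad f(x) + lam * grad h(x)
    gf, gh = grad_f(x), grad_h(x)
    return [gf[i] + lam * gh[i] for i in range(2)]

def tangent_curvature(x, lam):
    # d^T (Hess f + lam * Hess h) d for d spanning ker(grad h(x)^T);
    # here Hess f = 0 and Hess h = 2 I
    gh = grad_h(x)
    d = [-gh[1], gh[0]]   # orthogonal to grad h(x)
    return sum(di * (lam * 2.0) * di for di in d)

candidates = [[-1.0, -1.0], [1.0, 1.0]]
report = []
for x in candidates:
    lam = multiplier(x)
    report.append((h(x), lagrangian_gradient(x, lam), lam, tangent_curvature(x, lam)))

for entry in report:
    print(entry)
```

The first candidate has positive curvature along the tangent direction and is the constrained minimum; the second has negative curvature, so it fails the second-order necessary condition (it is the constrained maximum).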
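[Figure: the constraint set $\{ x : h(x) = 0 \}$, the constant gradient $\nabla f(x)$, and the constraint gradient $\nabla h(x^*)$ at the optimum $x^*$. Darker shading implies larger value of $f$.]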
multipliers under weaker assumptions called constraint qualifications. If the constraints are linear, for example, Lagrange multipliers are guaranteed to exist. The weakest form of constraint qualification is quasiregularity, which requires that $\mathcal{T}(x^*) = \mathcal{V}(x^*)$.
o Theorem – Sufficient Conditions: Assume that $f$ and $h$ are both twice continuously differentiable, and that $x^* \in \mathbb{R}^n$ and $\lambda^* \in \mathbb{R}^m$ satisfy
$$\nabla_x \mathcal{L}(x^*, \lambda^*) = 0, \qquad \nabla_\lambda \mathcal{L}(x^*, \lambda^*) = 0, \qquad d^T \nabla^2_{xx} \mathcal{L}(x^*, \lambda^*) \, d > 0 \quad \forall d \in \mathcal{V}(x^*) \setminus \{0\}$$
Then $x^*$ is a strict local minimum.
Proof: The second condition above implies that $x^*$ is feasible. Suppose it is not a strict local minimum; then there exists a sequence $\{x_k\} \subset \mathbb{R}^n$ such that $x_k \ne x^*$ and $x_k \to x^*$ which lies entirely in the feasible region of the problem [ie: $h(x_k) = 0$] and $f(x_k) \le f(x^*)$. We define, for some $d$ (passing to a subsequence if necessary, since the $d_k$ lie on the compact unit sphere),
$$d_k = \frac{x_k - x^*}{\|x_k - x^*\|} \to d, \qquad \delta_k = \|x_k - x^*\| \to 0$$
Now, by the mean value theorem (applied to each component of $h$), there exists $\tilde{x}_k \in [x^*, x_k]$ with
$$h(x_k) - h(x^*) = \nabla h(\tilde{x}_k)^T (x_k - x^*) = \nabla h(\tilde{x}_k)^T (\delta_k d_k)$$
But since $x^*$ and $x_k$ are feasible, $h(x_k) = h(x^*) = 0$, so $\nabla h(\tilde{x}_k)^T d_k = 0$. Taking the limit as $k \to \infty$, we get $\nabla h(x^*)^T d = 0$, and so $d \in \mathcal{V}(x^*)$.
Now, we know that
$$h(x_k) = 0, \qquad f(x_k) - f(x^*) \le 0$$
Using a second order Taylor expansion (with remainder) with some set of $\hat{x}_i \in [x_k, x^*]$, we can re-write these as
$$h_i(x_k) = h_i(x^*) + \delta_k \nabla h_i(x^*) \cdot d_k + \tfrac{1}{2} \delta_k^2 \, d_k^T \nabla^2 h_i(\hat{x}_i) \, d_k = 0$$
$$f(x_k) - f(x^*) = \delta_k \nabla f(x^*) \cdot d_k + \tfrac{1}{2} \delta_k^2 \, d_k^T \nabla^2 f(\hat{x}_0) \, d_k \le 0$$
We can modify the first set of equations slightly by remembering that $h(x^*) = 0$, and multiplying both sides of the equation by $\lambda_i^*$. This gives
$$\lambda_i^* h_i(x_k) = \delta_k \lambda_i^* \nabla h_i(x^*) \cdot d_k + \tfrac{1}{2} \delta_k^2 \, d_k^T \lambda_i^* \nabla^2 h_i(\hat{x}_i) \, d_k = 0$$
Adding these $m + 1$ equations, we get
$$\begin{aligned} f(x_k) - f(x^*) + \sum_{i=1}^m \lambda_i^* h_i(x_k) &= \delta_k \left( \nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla h_i(x^*) \right) \cdot d_k + \tfrac{1}{2} \delta_k^2 \, d_k^T \left( \nabla^2 f(\hat{x}_0) + \sum_{i=1}^m \lambda_i^* \nabla^2 h_i(\hat{x}_i) \right) d_k \\ &= \delta_k \nabla_x \mathcal{L}(x^*, \lambda^*) \cdot d_k + \tfrac{1}{2} \delta_k^2 \, d_k^T \left( \nabla^2 f(\hat{x}_0) + \sum_{i=1}^m \lambda_i^* \nabla^2 h_i(\hat{x}_i) \right) d_k \le 0 \end{aligned}$$
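Noting that, by the first order conditions, $\nabla_x \mathcal{L}(x^*, \lambda^*) = 0$, and then dividing by $\tfrac{1}{2} \delta_k^2$ and taking the limit as $k \to \infty$, this becomes
$$d^T \left( \nabla^2 f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla^2 h_i(x^*) \right) d \le 0$$
But since $d \in \mathcal{V}(x^*) \setminus \{0\}$, this violates our assumed second order condition.
o We now consider an application of these conditions. Consider the program
$$\min_{x \in \mathbb{R}^n} \ \sigma^2 = x^T \Gamma x \quad \text{s.t. } \mathbf{1}^T x = 1, \quad \mu^T x = \bar{\mu}$$
which might represent minimizing the variance in a portfolio while keeping total sales equal to 1 unit, and keeping the expected return equal to a certain value $\bar{\mu}$. The first-order conditions give
$$2 \Gamma x^* + \lambda_1^* \mathbf{1} + \lambda_2^* \mu = 0, \qquad \mathbf{1}^T x^* = 1, \qquad \mu^T x^* = \bar{\mu}$$
From the first equation, we obtain
$$\begin{aligned} x^* &= -\tfrac{1}{2} \Gamma^{-1} \left( \lambda_1^* \mathbf{1} + \lambda_2^* \mu \right) \\ \mathbf{1}^T x^* &= -\tfrac{1}{2} \left( \lambda_1^* \mathbf{1}^T \Gamma^{-1} \mathbf{1} + \lambda_2^* \mathbf{1}^T \Gamma^{-1} \mu \right) = 1 \\ \mu^T x^* &= -\tfrac{1}{2} \left( \lambda_1^* \mu^T \Gamma^{-1} \mathbf{1} + \lambda_2^* \mu^T \Gamma^{-1} \mu \right) = \bar{\mu} \end{aligned}$$
The last two equations are a system of equations for $(\lambda_1^*, \lambda_2^*)$:
$$\begin{pmatrix} \mathbf{1}^T \Gamma^{-1} \mathbf{1} & \mathbf{1}^T \Gamma^{-1} \mu \\ \mu^T \Gamma^{-1} \mathbf{1} & \mu^T \Gamma^{-1} \mu \end{pmatrix} \begin{pmatrix} \lambda_1^* \\ \lambda_2^* \end{pmatrix} = -2 \begin{pmatrix} 1 \\ \bar{\mu} \end{pmatrix}$$
This system is nonsingular provided that $\Gamma \succ 0$ and $\mathbf{1}$ and $\mu$ are linearly independent. We then get
$$\begin{pmatrix} \lambda_1^* \\ \lambda_2^* \end{pmatrix} = \begin{pmatrix} \eta_1 + \zeta_1 \bar{\mu} \\ \eta_2 + \zeta_2 \bar{\mu} \end{pmatrix}$$
where the constants depend on $\Gamma$ and $\mu$. Now, using the first equation in the FOCs, we obtain, for some vectors $v$ and $w$,
$$x^* = \bar{\mu} v + w$$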
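The derivation above can be traced numerically. The sketch below makes several simplifying assumptions that are not in the notes: three assets, a diagonal covariance matrix (so that $\Gamma^{-1}$ is trivial), and made-up variances and expected returns. It solves the $2 \times 2$ multiplier system by Cramer's rule, recovers $x^*$ from the first-order condition, and checks that $x^*$ is affine in the target return.

```python
def solve_portfolio(variances, mu, target):
    # Hypothetical setup: diagonal covariance Gamma = diag(variances)
    inv = [1.0 / v for v in variances]                 # diagonal of Gamma^{-1}
    a = sum(inv)                                        # 1^T Gamma^{-1} 1
    b = sum(i * m for i, m in zip(inv, mu))             # 1^T Gamma^{-1} mu
    c = sum(i * m * m for i, m in zip(inv, mu))         # mu^T Gamma^{-1} mu
    det = a * c - b * b
    # Cramer's rule for [a b; b c] [lam1, lam2]^T = -2 [1, target]^T
    lam1 = -2.0 * (c - b * target) / det
    lam2 = -2.0 * (a * target - b) / det
    # First-order condition: x = -0.5 * Gamma^{-1} (lam1 * 1 + lam2 * mu)
    x = [-0.5 * i * (lam1 + lam2 * m) for i, m in zip(inv, mu)]
    return x, (lam1, lam2)

variances = [0.04, 0.09, 0.16]   # hypothetical asset variances
mu = [0.05, 0.08, 0.12]          # hypothetical expected returns
target = 0.09                    # hypothetical target return

x_star, lams = solve_portfolio(variances, mu, target)
total = sum(x_star)
expected_return = sum(m * xi for m, xi in zip(mu, x_star))

# x* is affine in the target: x*(t) = t * v + w
w = solve_portfolio(variances, mu, 0.0)[0]
x_at_one = solve_portfolio(variances, mu, 1.0)[0]
v = [x1 - x0 for x1, x0 in zip(x_at_one, w)]
affine_prediction = [target * vi + wi for vi, wi in zip(v, w)]

print(x_star, total, expected_return)
print(affine_prediction)
```

Here `v` and `w` play the role of the vectors $v$ and $w$ in the last equation; they depend only on the covariance and the expected returns, not on the target.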