Introduction to Semidefinite Programming - Lecture Notes | IOE 511, Study notes of Systems Engineering

Material Type: Notes; Professor: Epelman; Class: Cont Optimum Methods; Subject: Industrial And Operations Engineering; University: University of Michigan - Ann Arbor; Term: Fall 2007;

Uploaded on 09/02/2009

IOE 511/Math 562, Section 1, Fall 2007
16 Introduction to Semidefinite Programming (SDP)

16.1 Introduction

Semidefinite programming (SDP) is probably the most exciting development in mathematical programming in the last ten years. SDP has applications in such diverse fields as traditional convex constrained optimization, control theory, and combinatorial optimization. Because SDP is solvable via interior-point methods (and usually requires about the same amount of computational resources as linear optimization), most of these applications can be solved fairly efficiently in practice as well as in theory.

16.2 A Slightly Different View of Linear Programming

Consider the linear programming problem in standard form:

$$
\begin{array}{lll}
LP: & \text{minimize} & c \cdot x \\
 & \text{s.t.} & a_i \cdot x = b_i, \quad i = 1, \ldots, m, \\
 & & x \in \mathbb{R}^n_+.
\end{array}
$$

Here $x$ is a vector of $n$ variables, and we write "$c \cdot x$" for the inner product "$\sum_{j=1}^n c_j x_j$", etc.

Also, $\mathbb{R}^n_+ := \{x \in \mathbb{R}^n \mid x \ge 0\}$, and we call $\mathbb{R}^n_+$ the nonnegative orthant. In fact, $\mathbb{R}^n_+$ is a closed convex cone, where $K$ is called a closed convex cone if $K$ satisfies the following two conditions:

  • If $x, w \in K$, then $\alpha x + \beta w \in K$ for all nonnegative scalars $\alpha$ and $\beta$.
  • $K$ is a closed set.

In words, LP is the following problem:

"Minimize the linear function $c \cdot x$, subject to the condition that $x$ must solve $m$ given equations $a_i \cdot x = b_i$, $i = 1, \ldots, m$, and that $x$ must lie in the closed convex cone $K = \mathbb{R}^n_+$."

We will write the standard linear programming dual problem as:

$$
\begin{array}{lll}
LD: & \text{maximize} & \sum_{i=1}^m y_i b_i \\
 & \text{s.t.} & \sum_{i=1}^m y_i a_i + s = c, \\
 & & s \in \mathbb{R}^n_+.
\end{array}
$$

Given a feasible solution $x$ of LP and a feasible solution $(y, s)$ of LD, the duality gap is simply

$$
c \cdot x - \sum_{i=1}^m y_i b_i = \Big(c - \sum_{i=1}^m y_i a_i\Big) \cdot x = s \cdot x \ge 0,
$$

because $x \ge 0$ and $s \ge 0$. We know from LP duality theory that so long as the primal problem LP is feasible and has bounded optimal objective value, then the primal and the dual both attain their optima with no duality gap. That is, there exist $x^*$ and $(y^*, s^*)$ feasible for the primal and dual, respectively, for which

$$
c \cdot x^* - \sum_{i=1}^m y_i^* b_i = s^* \cdot x^* = 0.
$$
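The identity "duality gap equals $s \cdot x$" can be checked by hand on a tiny instance. The sketch below is only an illustration: the data ($c = (1, 2)$, one constraint $x_1 + x_2 = 1$) and the helper names `dot`, `dual_slack` and `duality_gap` are assumptions made up for this example, not part of the notes.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def dual_slack(c, a_rows, y):
    """Return s = c - sum_i y_i a_i, the dual slack vector."""
    return [cj - sum(yi * ai[j] for yi, ai in zip(y, a_rows))
            for j, cj in enumerate(c)]


def duality_gap(c, b, x, y):
    """Return c.x - sum_i y_i b_i."""
    return dot(c, x) - dot(y, b)


# Hypothetical LP data: minimize x1 + 2 x2  s.t.  x1 + x2 = 1, x >= 0.
c = [1.0, 2.0]
a_rows = [[1.0, 1.0]]
b = [1.0]

# A feasible (not optimal) primal-dual pair.
x = [0.5, 0.5]
y = [0.5]
s = dual_slack(c, a_rows, y)          # nonnegative, so (y, s) is dual feasible
gap = duality_gap(c, b, x, y)
s_dot_x = dot(s, x)                   # equals the gap

# An optimal pair: the gap closes and s.x = 0.
x_opt = [1.0, 0.0]
y_opt = [1.0]
s_opt = dual_slack(c, a_rows, y_opt)
gap_opt = duality_gap(c, b, x_opt, y_opt)
```

For the first pair the gap and $s \cdot x$ agree and are positive; for the second pair both vanish, which is the complementarity statement $s^* \cdot x^* = 0$.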

16.3 Facts about Matrices and the Semidefinite Cone

16.3.1 Facts about the Semidefinite Cone

If $X$ is an $n \times n$ matrix, then $X$ is a symmetric positive semidefinite (SPSD) matrix if $X = X^T$ and $v^T X v \ge 0$ for any $v \in \mathbb{R}^n$.

If $X$ is an $n \times n$ matrix, then $X$ is a symmetric positive definite (SPD) matrix if $X = X^T$ and $v^T X v > 0$ for any $v \in \mathbb{R}^n$, $v \ne 0$.

Let $S^n$ denote the set of symmetric $n \times n$ matrices, and let $S^n_+$ denote the set of symmetric positive semidefinite (SPSD) $n \times n$ matrices. Similarly let $S^n_{++}$ denote the set of symmetric positive definite (SPD) $n \times n$ matrices.

Let $X$ and $Y$ be any symmetric matrices. We write "$X \succeq 0$" to denote that $X$ is SPSD, and we write "$X \succeq Y$" to denote that $X - Y \succeq 0$. We write "$X \succ 0$" to denote that $X$ is SPD, etc.

$S^n_+ = \{X \in S^n \mid X \succeq 0\}$ is a closed convex cone in $\mathbb{R}^{n^2}$ of dimension $n(n+1)/2$.

To see why this remark is true, suppose that $X, W \in S^n_+$. Pick any scalars $\alpha, \beta \ge 0$. For any $v \in \mathbb{R}^n$, we have:

$$
v^T(\alpha X + \beta W)v = \alpha v^T X v + \beta v^T W v \ge 0,
$$

whereby $\alpha X + \beta W \in S^n_+$. This shows that $S^n_+$ is a cone. It is also straightforward to show that $S^n_+$ is a closed set.
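The cone property can be illustrated numerically on small matrices. The sketch below tests positive semidefiniteness with a standard characterization that is not stated in the notes: a symmetric matrix is SPSD exactly when all of its principal minors are nonnegative. The function names and the example matrices are hypothetical, and the brute-force determinant is only sensible for very small $n$.

```python
from itertools import combinations


def det(M):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def is_psd(M, tol=1e-12):
    """Symmetric M is PSD iff every principal minor is nonnegative."""
    n = len(M)
    if any(abs(M[i][j] - M[j][i]) > tol for i in range(n) for j in range(n)):
        return False
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            sub = [[M[i][j] for j in idx] for i in idx]
            if det(sub) < -tol:
                return False
    return True


def combine(alpha, X, beta, W):
    """Return alpha*X + beta*W entrywise."""
    n = len(X)
    return [[alpha * X[i][j] + beta * W[i][j] for j in range(n)] for i in range(n)]


X = [[2, -1], [-1, 2]]    # SPSD (in fact SPD)
W = [[1, 1], [1, 1]]      # SPSD, rank one
Z = combine(3, X, 2, W)   # nonnegative combination, stays in the cone
N = [[1, 2], [2, 1]]      # symmetric but indefinite (determinant is -3)
```

Here `is_psd(Z)` holds as the argument above predicts, while `N` shows that symmetry alone is not enough.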

16.3.2 Facts about Eigenvalues and Eigenvectors

If $M$ is a square $n \times n$ matrix, then $\lambda$ is an eigenvalue of $M$ with corresponding eigenvector $x$ if $Mx = \lambda x$ and $x \ne 0$.

Note that $\lambda$ is an eigenvalue of $M$ if and only if $\lambda$ is a root of the polynomial:

$$
p(\lambda) := \det(M - \lambda I),
$$

that is, $p(\lambda) = \det(M - \lambda I) = 0$.

This polynomial will have $n$ roots counting multiplicities, that is, there exist $\lambda_1, \lambda_2, \ldots, \lambda_n$ for which:

$$
p(\lambda) := \det(M - \lambda I) = \prod_{i=1}^n (\lambda_i - \lambda).
$$

If $M$ is symmetric, then all eigenvalues $\lambda$ of $M$ must be real numbers, and these eigenvalues can be ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ if we so choose.

The corresponding eigenvectors $q^1, \ldots, q^n$ of $M$ can be chosen so that they are orthogonal, namely $(q^i)^T(q^j) = 0$ for $i \ne j$, and can be scaled so that $(q^i)^T(q^i) = 1$. This means that the matrix:

$$
Q := \left[\, q^1 \;\; q^2 \;\; \cdots \;\; q^n \,\right]
$$

satisfies $Q^T Q = I$.

16.4 Semidefinite Programming

Let $X \in S^n$. We can think of $X$ as a matrix, or equivalently, as an array of $n^2$ components of the form $(x_{11}, \ldots, x_{nn})$. We can also just think of $X$ as an object (a vector) in the space $S^n$. All three different equivalent ways of looking at $X$ will be useful.

What will a linear function of $X$ look like? If $C(X)$ is a linear function of $X$, then $C(X)$ can be written as $C \bullet X$, where

$$
C \bullet X := \sum_{i=1}^n \sum_{j=1}^n C_{ij} X_{ij}.
$$

If $X$ is a symmetric matrix, there is no loss of generality in assuming that the matrix $C$ is also symmetric. With this notation, we are now ready to define a semidefinite program. A semidefinite program (SDP) is an optimization problem of the form:

$$
\begin{array}{lll}
SDP: & \text{minimize} & C \bullet X \\
 & \text{s.t.} & A_i \bullet X = b_i, \quad i = 1, \ldots, m, \\
 & & X \succeq 0.
\end{array}
$$

Notice that in an SDP the variable is the matrix $X$, but it might be helpful to think of $X$ as an array of $n^2$ numbers or simply as a vector in $S^n$. The objective function is the linear function $C \bullet X$ and there are $m$ linear equations that $X$ must satisfy, namely $A_i \bullet X = b_i$, $i = 1, \ldots, m$. The variable $X$ also must lie in the (closed convex) cone of positive semidefinite symmetric matrices $S^n_+$. Note that the data for SDP consists of the symmetric matrix $C$ (which is the data for the objective function) and the $m$ symmetric matrices $A_1, \ldots, A_m$, and the $m$-vector $b$, which form the $m$ linear equations.

Let us see an example of an SDP for n = 3 and m = 2. Define the following matrices:

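$$
A_1 = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 3 & 7 \\ 1 & 7 & 5 \end{pmatrix}, \quad
A_2 = \begin{pmatrix} 0 & 2 & 8 \\ 2 & 6 & 0 \\ 8 & 0 & 4 \end{pmatrix}, \quad
b = \begin{pmatrix} 11 \\ 19 \end{pmatrix}, \quad \text{and} \quad
C = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 9 & 0 \\ 3 & 0 & 7 \end{pmatrix}.
$$

Then the variable $X$ will be the $3 \times 3$ symmetric matrix:

$$
X = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{pmatrix},
$$

and so, for example,

$$
\begin{array}{rcl}
C \bullet X & = & x_{11} + 2x_{12} + 3x_{13} + 2x_{21} + 9x_{22} + 0x_{23} + 3x_{31} + 0x_{32} + 7x_{33} \\
 & = & x_{11} + 4x_{12} + 6x_{13} + 9x_{22} + 0x_{23} + 7x_{33},
\end{array}
$$

since, in particular, $X$ is symmetric. Therefore the SDP can be written as:

$$
\begin{array}{lll}
SDP: & \text{minimize} & x_{11} + 4x_{12} + 6x_{13} + 9x_{22} + 0x_{23} + 7x_{33} \\
 & \text{s.t.} & x_{11} + 0x_{12} + 2x_{13} + 3x_{22} + 14x_{23} + 5x_{33} = 11, \\
 & & 0x_{11} + 4x_{12} + 16x_{13} + 6x_{22} + 0x_{23} + 4x_{33} = 19, \\
 & & X = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{pmatrix} \succeq 0.
\end{array}
$$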
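To make the inner product $C \bullet X$ concrete, the sketch below evaluates it entry by entry and compares it with the expanded expressions of the example. The matrices `C`, `A1`, `A2` are the symmetric matrices implied by those expanded expressions; the test matrix `X` is a hypothetical symmetric matrix chosen only to exercise the formula and is not claimed to be feasible for the SDP.

```python
def frob(A, B):
    """A . B := sum_ij A_ij * B_ij (the matrix inner product used in the notes)."""
    n = len(A)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n))


# Data implied by the expanded objective and constraints of the n = 3, m = 2 example.
C = [[1, 2, 3], [2, 9, 0], [3, 0, 7]]
A1 = [[1, 0, 1], [0, 3, 7], [1, 7, 5]]
A2 = [[0, 2, 8], [2, 6, 0], [8, 0, 4]]

# Hypothetical symmetric test matrix (illustration only, not a feasible point).
X = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]
x11, x12, x13 = X[0]
x22, x23 = X[1][1], X[1][2]
x33 = X[2][2]

objective = frob(C, X)
objective_expanded = x11 + 4 * x12 + 6 * x13 + 9 * x22 + 0 * x23 + 7 * x33

lhs1 = frob(A1, X)
lhs1_expanded = x11 + 0 * x12 + 2 * x13 + 3 * x22 + 14 * x23 + 5 * x33

lhs2 = frob(A2, X)
lhs2_expanded = 0 * x11 + 4 * x12 + 16 * x13 + 6 * x22 + 0 * x23 + 4 * x33
```

The doubled off-diagonal coefficients (4, 6, 14, 16) arise because each symmetric pair $x_{ij} = x_{ji}$ is counted twice in the double sum.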

Notice that SDP looks remarkably similar to a linear program. However, the standard LP constraint that $x$ must lie in the nonnegative orthant is replaced by the constraint that the variable $X$ must lie in the cone of positive semidefinite matrices. Just as "$x \ge 0$" states that each of the $n$ components of $x$ must be nonnegative, it may be helpful to think of "$X \succeq 0$" as stating that each of the $n$ eigenvalues of $X$ must be nonnegative. It is easy to see that a linear program LP is a special instance of an SDP. To see one way of doing this, suppose that $(c, a_1, \ldots, a_m, b_1, \ldots, b_m)$ comprise the data for LP. Then define:

$$
A_i = \begin{pmatrix} a_{i1} & 0 & \cdots & 0 \\ 0 & a_{i2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{in} \end{pmatrix}, \quad i = 1, \ldots, m, \quad \text{and} \quad
C = \begin{pmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n \end{pmatrix}.
$$

Then LP can be written as:

$$
\begin{array}{lll}
SDP: & \text{minimize} & C \bullet X \\
 & \text{s.t.} & A_i \bullet X = b_i, \quad i = 1, \ldots, m, \\
 & & X_{ij} = 0, \quad i = 1, \ldots, n, \; j = i+1, \ldots, n, \\
 & & X \succeq 0,
\end{array}
$$

with the association that

$$
X = \begin{pmatrix} x_1 & 0 & \cdots & 0 \\ 0 & x_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_n \end{pmatrix}.
$$

Of course, in practice one would never want to convert an instance of LP into an instance of SDP. The above construction merely shows that SDP includes linear programming as a special case.
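The embedding of LP into SDP amounts to placing vectors on the diagonal. A minimal sketch, with made-up LP data and hypothetical helper names (`diag`, `frob`, `dot`), checks that $C \bullet X = c \cdot x$ and $A_1 \bullet X = a_1 \cdot x$ for diagonal matrices:

```python
def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def diag(v):
    """Square matrix with v on the diagonal and zeros elsewhere."""
    n = len(v)
    return [[v[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def frob(A, B):
    """A . B := sum_ij A_ij * B_ij."""
    n = len(A)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n))


# Hypothetical LP data and a feasible point x >= 0 with a1.x = b1 = 1.
c = [1.0, 2.0]
a1 = [1.0, 1.0]
x = [0.5, 0.5]

C, A1, X = diag(c), diag(a1), diag(x)

objective_lp = dot(c, x)
objective_sdp = frob(C, X)
constraint_lp = dot(a1, x)
constraint_sdp = frob(A1, X)

# A diagonal matrix is PSD exactly when its diagonal entries are nonnegative,
# so X >= 0 (as a matrix) matches x >= 0 (as a vector).
x_is_nonnegative = min(x) >= 0
```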

16.5 Semidefinite Programming Duality

The dual problem of SDP is defined (or derived from first principles) to be:

$$
\begin{array}{lll}
SDD: & \text{maximize} & \sum_{i=1}^m y_i b_i \\
 & \text{s.t.} & \sum_{i=1}^m y_i A_i + S = C, \\
 & & S \succeq 0.
\end{array}
$$

One convenient way of thinking about this problem is as follows. Given multipliers $y_1, \ldots, y_m$, the objective is to maximize the linear function $\sum_{i=1}^m y_i b_i$. The constraints of SDD state that the matrix $S$ defined as

$$
S = C - \sum_{i=1}^m y_i A_i
$$

must be positive semidefinite. That is,

$$
C - \sum_{i=1}^m y_i A_i \succeq 0.
$$

where the last inequality follows from the fact that all $D_{jj} \ge 0$ and the fact that the diagonal of the symmetric positive semidefinite matrix $P^T Q E Q^T P$ must be nonnegative.

To prove the second part of the proposition, suppose that $\mathrm{trace}(SX) = 0$. Then from the above equalities, we have

$$
\sum_{j=1}^n D_{jj} \left(P^T Q E Q^T P\right)_{jj} = 0.
$$

However, this implies that for each $j = 1, \ldots, n$, either $D_{jj} = 0$ or $(P^T Q E Q^T P)_{jj} = 0$. Furthermore, the latter case implies that the $j$th row of $P^T Q E Q^T P$ is all zeros. Therefore $D P^T Q E Q^T P = 0$, and so $SX = P D P^T Q E Q^T = 0$.
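Both conclusions, $\mathrm{trace}(SX) \ge 0$ for positive semidefinite $S$ and $X$, and $SX = 0$ whenever $\mathrm{trace}(SX) = 0$, can be observed on a small primal-dual pair. The instance below is an assumption made up for illustration ($n = 2$, $m = 1$, $A_1 = I$, $b_1 = 1$), as are the helper names; it also checks that the gap $C \bullet X - \sum_i y_i b_i$ equals $S \bullet X = \mathrm{trace}(SX)$.

```python
def frob(A, B):
    """A . B := sum_ij A_ij * B_ij, which equals trace(A B) for symmetric A, B."""
    n = len(A)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n))


def matmul(A, B):
    """Plain matrix product for small square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dual_slack(C, A_list, y):
    """Return S = C - sum_i y_i A_i."""
    n = len(C)
    return [[C[i][j] - sum(yi * Ai[i][j] for yi, Ai in zip(y, A_list))
             for j in range(n)] for i in range(n)]


# Hypothetical data: minimize C . X  s.t.  trace(X) = 1, X PSD.
C = [[2.0, 1.0], [1.0, 2.0]]
A_list = [[[1.0, 0.0], [0.0, 1.0]]]
b = [1.0]

# Feasible but not optimal pair: X = I/2 (PSD), y = 0.5 gives S with eigenvalues 0.5, 2.5.
X = [[0.5, 0.0], [0.0, 0.5]]
y = [0.5]
S = dual_slack(C, A_list, y)
gap = frob(C, X) - y[0] * b[0]
trace_SX = frob(S, X)

# Optimal pair: X_opt = v v^T with v = (1, -1)/sqrt(2), y_opt = lambda_min(C) = 1.
X_opt = [[0.5, -0.5], [-0.5, 0.5]]
y_opt = [1.0]
S_opt = dual_slack(C, A_list, y_opt)
gap_opt = frob(C, X_opt) - y_opt[0] * b[0]
SX_opt = matmul(S_opt, X_opt)
```

In the optimal pair the trace is zero and the full product `SX_opt` is the zero matrix, as the second part of the proposition states.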

Unlike the case of linear programming, we cannot assert that either SDP or SDD will attain their respective optima, and/or that there will be no duality gap, unless certain regularity conditions hold. One such regularity condition which ensures that strong duality will prevail is a version of the “Slater condition,” summarized in the following theorem which we will not prove:

Theorem 92 Let $z_P^*$ and $z_D^*$ denote the optimal objective function values of SDP and SDD, respectively. Suppose that there exists a feasible solution $\hat{X}$ of SDP such that $\hat{X} \succ 0$, and that there exists a feasible solution $(\hat{y}, \hat{S})$ of SDD such that $\hat{S} \succ 0$. Then both SDP and SDD attain their optimal values, and $z_P^* = z_D^*$.

16.6 Key Properties of Linear Programming that do not extend to SDP

The following summarizes some of the more important properties of linear programming that do not extend to SDP:

  • There may be a finite or infinite duality gap. The primal and/or dual may or may not attain their optima. However, as noted above in Theorem 92, both programs will attain their common optimal value if both programs have feasible solutions that are SPD.
  • There is no finite algorithm for solving SDP. There is a simplex algorithm, but it is not a finite algorithm. There is no direct analog of a “basic feasible solution” for SDP.

16.7 SDP in Combinatorial Optimization

SDP has wide applicability in combinatorial optimization. A number of NP-hard combinatorial optimization problems have convex relaxations that are semidefinite programs. In many instances, the SDP relaxation is very tight in practice, and in certain instances in particular, the optimal solution to the SDP relaxation can be converted to a feasible solution for the original problem with provably good objective value. An example of the use of SDP in combinatorial optimization is given below.

16.7.1 An SDP Relaxation of the MAX CUT Problem

Let $G$ be an undirected graph with nodes $N = \{1, \ldots, n\}$, and edge set $E$. Let $w_{ij} = w_{ji}$ be the weight on edge $(i, j)$, for $(i, j) \in E$. We assume that $w_{ij} \ge 0$ for all $(i, j) \in E$. The MAX CUT problem is to determine a subset $S$ of the nodes $N$ for which the sum of the weights of the edges that cross from $S$ to its complement $\bar{S}$ is maximized (where $\bar{S} := N \setminus S$).

We can formulate MAX CUT as an integer program as follows. Let $x_j = 1$ for $j \in S$ and $x_j = -1$ for $j \in \bar{S}$. Then our formulation is:

$$
\begin{array}{lll}
MAXCUT: & \text{maximize}_{x} & \frac{1}{4} \sum_{i=1}^n \sum_{j=1}^n w_{ij}(1 - x_i x_j) \\
 & \text{s.t.} & x_j \in \{-1, 1\}, \quad j = 1, \ldots, n.
\end{array}
$$

Now let

$$
Y = x x^T,
$$

whereby $Y_{ij} = x_i x_j$, $i = 1, \ldots, n$, $j = 1, \ldots, n$.

Also let $W$ be the matrix whose $(i, j)$th element is $w_{ij}$ for $i = 1, \ldots, n$ and $j = 1, \ldots, n$. Then MAX CUT can be equivalently formulated as:

$$
\begin{array}{lll}
MAXCUT: & \text{maximize}_{Y, x} & \frac{1}{4} \sum_{i=1}^n \sum_{j=1}^n w_{ij} - \frac{1}{4} W \bullet Y \\
 & \text{s.t.} & x_j \in \{-1, 1\}, \quad j = 1, \ldots, n, \\
 & & Y = x x^T.
\end{array}
$$

Notice in this problem that the first set of constraints are equivalent to $Y_{jj} = 1$, $j = 1, \ldots, n$. We therefore obtain:

$$
\begin{array}{lll}
MAXCUT: & \text{maximize}_{Y, x} & \frac{1}{4} \sum_{i=1}^n \sum_{j=1}^n w_{ij} - \frac{1}{4} W \bullet Y \\
 & \text{s.t.} & Y_{jj} = 1, \quad j = 1, \ldots, n, \\
 & & Y = x x^T.
\end{array}
$$
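For very small graphs the integer formulation can be checked by enumeration. The sketch below is an illustration only: the graph (a triangle with unit weights) and the function names are assumptions. It evaluates the objective both as $\frac{1}{4}\sum_{i}\sum_{j} w_{ij}(1 - x_i x_j)$ and as $\frac{1}{4}\sum_{i}\sum_{j} w_{ij} - \frac{1}{4} W \bullet Y$ with $Y = xx^T$, and confirms that the two agree.

```python
from itertools import product


def cut_value(W, x):
    """(1/4) * sum_ij w_ij (1 - x_i x_j) for x in {-1, 1}^n."""
    n = len(W)
    return 0.25 * sum(W[i][j] * (1 - x[i] * x[j])
                      for i in range(n) for j in range(n))


def cut_value_via_Y(W, x):
    """Same objective written as (1/4) sum_ij w_ij - (1/4) W . Y with Y = x x^T."""
    n = len(W)
    Y = [[x[i] * x[j] for j in range(n)] for i in range(n)]
    total_w = sum(sum(row) for row in W)
    w_dot_y = sum(W[i][j] * Y[i][j] for i in range(n) for j in range(n))
    return 0.25 * total_w - 0.25 * w_dot_y


def max_cut_brute_force(W):
    """Enumerate all 2^n sign vectors; only viable for tiny n."""
    n = len(W)
    best = max(product((-1, 1), repeat=n), key=lambda x: cut_value(W, x))
    return cut_value(W, best), best


# Hypothetical instance: triangle with unit edge weights (w_ij = 0 on the diagonal).
W = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
best_value, best_x = max_cut_brute_force(W)
formulations_agree = all(cut_value(W, x) == cut_value_via_Y(W, x)
                         for x in product((-1, 1), repeat=3))
```

The factor $\frac{1}{4}$ compensates for two effects: each edge appears twice in the double sum, and $1 - x_i x_j$ equals 2 (not 1) on a crossing edge.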

Last of all, notice that the matrix $Y = x x^T$ is a symmetric rank-1 positive semidefinite matrix. If we relax this condition by removing the rank-1 restriction, we obtain the following relaxation of MAX CUT, which is a semidefinite program:

$$
\begin{array}{lll}
RELAX: & \text{maximize}_{Y} & \frac{1}{4} \sum_{i=1}^n \sum_{j=1}^n w_{ij} - \frac{1}{4} W \bullet Y \\
 & \text{s.t.} & Y_{jj} = 1, \quad j = 1, \ldots, n, \\
 & & Y \succeq 0.
\end{array}
$$

It is therefore easy to see that RELAX provides an upper bound on MAXCUT, i.e.,

$$
MAXCUT \le RELAX.
$$

As it turns out, one can also prove without too much effort that:

$$
0.87856 \; RELAX \le MAXCUT \le RELAX.
$$

This is an impressive result, in that it states that the value of the semidefinite relaxation is guaranteed to be no more than about 13.8% higher than the value of the NP-hard problem MAX CUT (since $1/0.87856 \approx 1.138$); equivalently, the value of MAX CUT is at most about 12.2% below the value of RELAX.
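A feasible point of RELAX that is not of the form $xx^T$ gives a feel for how the relaxation can exceed the integer optimum. The sketch below is an assumption-laden illustration: it takes the unit-weight triangle (whose maximum cut is 2, as can be checked by enumerating all eight sign vectors) and builds $Y$ as the Gram matrix of three unit vectors in the plane at 120 degrees from each other. A Gram matrix is positive semidefinite by construction, since $v^T Y v = \|\sum_i v_i u_i\|^2 \ge 0$, and unit vectors give $Y_{jj} = 1$.

```python
import math

# Hypothetical instance: triangle with unit edge weights; its maximum cut is 2.
W = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
max_cut_value = 2.0

# Three unit vectors in the plane, 120 degrees apart.
angles = [0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0]
U = [(math.cos(t), math.sin(t)) for t in angles]

# Gram matrix Y_ij = u_i . u_j: PSD by construction, with unit diagonal.
n = len(U)
Y = [[U[i][0] * U[j][0] + U[i][1] * U[j][1] for j in range(n)] for i in range(n)]

# RELAX objective (1/4) sum_ij w_ij - (1/4) W . Y at this feasible Y.
total_w = sum(sum(row) for row in W)
w_dot_y = sum(W[i][j] * Y[i][j] for i in range(n) for j in range(n))
relax_feasible_value = 0.25 * total_w - 0.25 * w_dot_y   # approximately 2.25
```

Since RELAX is a maximization, this feasible value shows that RELAX is at least about 2.25 on the triangle, strictly above MAXCUT = 2, so the relaxation is not tight here. The bound $0.87856 \, RELAX \le MAXCUT$ in turn caps RELAX at $2/0.87856 \approx 2.2765$ for this instance.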

16.8.2 SDP for Second-Order Cone Optimization

A second-order cone optimization problem (SOCP) is an optimization problem of the form:

$$
\begin{array}{lll}
SOCP: & \min_x & c^T x \\
 & \text{s.t.} & Ax = b, \\
 & & \|Q_i x + d_i\| \le g_i^T x + h_i, \quad i = 1, \ldots, k.
\end{array}
$$

In this problem, the norm $\|v\|$ is the standard Euclidean norm:

$$
\|v\| := \sqrt{v^T v}.
$$

The norm constraints in SOCP are called "second-order cone" constraints. Note that these are convex constraints.

Here we show that any second-order cone constraint can be written as an SDP constraint. Indeed we have:

Property:

$$
\|Qx + d\| \le g^T x + h \iff
\begin{pmatrix} (g^T x + h) I & Qx + d \\ (Qx + d)^T & g^T x + h \end{pmatrix} \succeq 0.
$$

Note in the above that the matrix involved here is a linear function of the variable $x$, and so is in the general form of an SDP constraint. This property is a direct consequence of the fact (stated earlier) that, for $P \succ 0$,

$$
M = \begin{pmatrix} P & v \\ v^T & d \end{pmatrix} \succeq 0 \iff d - v^T P^{-1} v \ge 0.
$$
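With $P = tI$ and $d = t$ for a scalar $t > 0$, the Schur-complement test reads $t - \|v\|^2 / t \ge 0$, which is the same as $\|v\| \le t$. The sketch below compares the two tests for a fixed vector and a few values of $t$; the vector, the values of $t$, and the function names are assumptions chosen for illustration, with $v$ standing in for $Qx + d$ and $t$ for $g^T x + h$.

```python
import math


def in_second_order_cone(v, t):
    """Direct test of the second-order cone constraint ||v|| <= t."""
    return math.sqrt(sum(vi * vi for vi in v)) <= t


def schur_complement(v, t):
    """d - v^T P^{-1} v with P = t*I and d = t; requires t > 0."""
    return t - sum(vi * vi for vi in v) / t


v = [3.0, 4.0]   # ||v|| = 5

# For each t > 0, record (norm test, Schur-complement test).
results = {t: (in_second_order_cone(v, t), schur_complement(v, t) >= 0.0)
           for t in (4.0, 5.0, 6.0)}
```

The two tests agree in all three cases: the constraint is violated at $t = 4$, tight at $t = 5$, and strictly satisfied at $t = 6$. The case $t = 0$ is excluded here because $P$ would not be invertible.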

Therefore we can write the second-order cone optimization problem as:

$$
\begin{array}{lll}
SDPSOCP: & \min_x & c^T x \\
 & \text{s.t.} & Ax = b, \\
 & & \begin{pmatrix} (g_i^T x + h_i) I & Q_i x + d_i \\ (Q_i x + d_i)^T & g_i^T x + h_i \end{pmatrix} \succeq 0, \quad i = 1, \ldots, k.
\end{array}
$$

16.8.3 SDP for Eigenvalue Optimization

There are many types of eigenvalue optimization problems that can be formulated as SDPs. In a typical eigenvalue optimization problem, we are given symmetric matrices $B$ and $A_i$, $i = 1, \ldots, k$, and we choose weights $w_1, \ldots, w_k$ to create a new matrix $S$:

$$
S := B - \sum_{i=1}^k w_i A_i.
$$

In some applications there might be restrictions on the weights $w$, such as $w \ge 0$ or more generally linear inequalities of the form $Gw \le d$. The typical goal is then to choose $w$ in such a way that the eigenvalues of $S$ are "well-aligned," for example:

  • $\lambda_{\min}(S)$ is maximized
  • $\lambda_{\max}(S)$ is minimized
  • $\lambda_{\max}(S) - \lambda_{\min}(S)$ is minimized
  • $\sum_{j=1}^n \lambda_j(S)$ is minimized or maximized

Let us see how to work with these problems using SDP. First, we have:

Property: $M \succeq tI$ if and only if $\lambda_{\min}(M) \ge t$.

To see why this is true, let us consider the eigenvalue decomposition of $M = Q D Q^T$, and consider the matrix $R$ defined as:

$$
R = M - tI = Q D Q^T - tI = Q(D - tI)Q^T.
$$

Then $M \succeq tI \iff R \succeq 0 \iff D - tI \succeq 0 \iff \lambda_{\min}(M) \ge t$.

Property: $M \preceq tI$ if and only if $\lambda_{\max}(M) \le t$.

To see why this is true, let us consider the eigenvalue decomposition of $M = Q D Q^T$, and consider the matrix $R$ defined as:

$$
R = M - tI = Q D Q^T - tI = Q(D - tI)Q^T.
$$

Then $M \preceq tI \iff R \preceq 0 \iff D - tI \preceq 0 \iff \lambda_{\max}(M) \le t$.
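Both properties can be checked directly in the $2 \times 2$ case, where eigenvalues have a closed form and a symmetric matrix is positive semidefinite exactly when its two diagonal entries and its determinant are nonnegative. The matrix, the values of $t$, and the function names below are hypothetical choices for illustration.

```python
import math


def eig_sym_2x2(M):
    """(lambda_min, lambda_max) of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return mean - radius, mean + radius


def is_psd_2x2(M, tol=1e-12):
    """Symmetric 2x2 is PSD iff both diagonal entries and the determinant are >= 0."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    return a >= -tol and d >= -tol and a * d - b * b >= -tol


def minus_tI(M, t):
    """Return M - t*I."""
    return [[M[0][0] - t, M[0][1]], [M[1][0], M[1][1] - t]]


def tI_minus(M, t):
    """Return t*I - M."""
    return [[t - M[0][0], -M[0][1]], [-M[1][0], t - M[1][1]]]


M = [[2.0, 1.0], [1.0, 2.0]]
lam_min, lam_max = eig_sym_2x2(M)

# M >= tI (in the semidefinite order) versus lambda_min(M) >= t.
lower_checks = {t: (is_psd_2x2(minus_tI(M, t)), lam_min >= t)
                for t in (0.5, 1.0, 1.5)}

# M <= tI (in the semidefinite order) versus lambda_max(M) <= t.
upper_checks = {t: (is_psd_2x2(tI_minus(M, t)), lam_max <= t)
                for t in (2.5, 3.0, 3.5)}
```

In every case the semidefinite test and the eigenvalue test return the same answer, including the boundary values $t = \lambda_{\min}$ and $t = \lambda_{\max}$.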

Now suppose that we wish to find weights w to minimize the difference between the largest and the smallest eigenvalues of S. This problem can be written down as:

$$
\begin{array}{lll}
EOP: & \text{minimize}_{w, S} & \lambda_{\max}(S) - \lambda_{\min}(S) \\
 & \text{s.t.} & S = B - \sum_{i=1}^k w_i A_i, \\
 & & Gw \le d.
\end{array}
$$

Then EOP can be written as:

$$
\begin{array}{lll}
EOP: & \text{minimize}_{w, S, \mu, \lambda} & \mu - \lambda \\
 & \text{s.t.} & S = B - \sum_{i=1}^k w_i A_i, \\
 & & Gw \le d, \\
 & & \lambda I \preceq S \preceq \mu I.
\end{array}
$$

This last problem is a semidefinite program.

Using constructs such as those shown above, very many other types of eigenvalue optimization problems can be formulated as SDPs. For example, suppose that we would like to work with $\sum_{j=1}^n \lambda_j(S)$. Then one can use elementary properties of the determinant function to prove:

Property: If $M$ is symmetric, then $\sum_{j=1}^n \lambda_j(M) = \sum_{j=1}^n M_{jj}$.

Figure 5: Illustration of the ellipsoid construction at the analytic center (figure labels: $P$, $E_{IN}$, $E_{OUT}$).

the analytic center is the solution (ˆy, Sˆ) of the following optimization problem:

$$
\begin{array}{lll}
(ACP:) & \text{maximize}_{y, S} & \prod_{i=1}^n \lambda_i(S) \\
 & \text{s.t.} & \sum_{i=1}^m y_i A_i + S = C, \\
 & & S \succeq 0.
\end{array}
$$

This is easily seen to be the same as:

$$
\begin{array}{lll}
(ACP:) & \text{minimize}_{y, S} & -\ln \det(S) \\
 & \text{s.t.} & \sum_{i=1}^m y_i A_i + S = C, \\
 & & S \succ 0.
\end{array}
$$

Just as in linear inequality systems, the analytic center possesses a very nice "centrality" property in the feasible region $P$ of the semidefinite inequality system. Suppose that $(\hat{y}, \hat{S})$ is the analytic center. Then there are easy-to-construct ellipsoids $E_{IN}$ and $E_{OUT}$, both centered at $\hat{y}$ and where $E_{OUT}$ is a scaled version of $E_{IN}$ with scale factor $n$, with the property that:

$$
E_{IN} \subset P \subset E_{OUT},
$$

as illustrated in Figure 5.

16.8.6 SDP for the Minimum Volume Circumscription Problem

A given matrix $R \succ 0$ and a given point $z$ can be used to define an ellipsoid in $\mathbb{R}^n$:

$$
E_{R,z} := \{y \mid (y - z)^T R (y - z) \le 1\}.
$$

One can prove that the volume of $E_{R,z}$ is proportional to $\sqrt{\det(R^{-1})}$.

Suppose we are given a convex set $X \subset \mathbb{R}^n$ described as the convex hull of $k$ points $c_1, \ldots, c_k$. We would like to find an ellipsoid circumscribing these $k$ points that has minimum volume, see Figure 6.

Figure 6: Illustration of the circumscribed ellipsoid problem.

Our problem can be written in the following form:

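$$
\begin{array}{lll}
MCP: & \text{minimize}_{R, z} & \mathrm{vol}(E_{R,z}) \\
 & \text{s.t.} & c_i \in E_{R,z}, \quad i = 1, \ldots, k,
\end{array}
$$

which is equivalent to:

$$
\begin{array}{lll}
MCP: & \text{minimize}_{R, z} & -\ln(\det(R)) \\
 & \text{s.t.} & (c_i - z)^T R (c_i - z) \le 1, \quad i = 1, \ldots, k, \\
 & & R \succ 0.
\end{array}
$$

Now factor $R = M^2$ where $M \succ 0$ (that is, $M$ is a square root of $R$), and now MCP becomes:

$$
\begin{array}{lll}
MCP: & \text{minimize}_{M, z} & -\ln(\det(M^2)) \\
 & \text{s.t.} & (c_i - z)^T M^T M (c_i - z) \le 1, \quad i = 1, \ldots, k, \\
 & & M \succ 0.
\end{array}
$$

Next notice the equivalence:

$$
\begin{pmatrix} I & M c_i - M z \\ (M c_i - M z)^T & 1 \end{pmatrix} \succeq 0 \iff (c_i - z)^T M^T M (c_i - z) \le 1.
$$

In this way we can write MCP as:

$$
\begin{array}{lll}
MCP: & \text{minimize}_{M, z} & -2\ln(\det(M)) \\
 & \text{s.t.} & \begin{pmatrix} I & M c_i - M z \\ (M c_i - M z)^T & 1 \end{pmatrix} \succeq 0, \quad i = 1, \ldots, k, \\
 & & M \succ 0.
\end{array}
$$

Last of all, make the substitution $y = Mz$ to obtain:

$$
\begin{array}{lll}
MCP: & \text{minimize}_{M, y} & -2\ln(\det(M)) \\
 & \text{s.t.} & \begin{pmatrix} I & M c_i - y \\ (M c_i - y)^T & 1 \end{pmatrix} \succeq 0, \quad i = 1, \ldots, k, \\
 & & M \succ 0.
\end{array}
$$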

Let $f_\mu(X)$ denote the objective function of $BSDP(\mu)$. Then it is not too difficult to derive:

$$
\nabla f_\mu(X) = C - \mu X^{-1},
$$

and so the Karush-Kuhn-Tucker conditions for $BSDP(\mu)$ are:

$$
\left\{
\begin{array}{l}
A_i \bullet X = b_i, \quad i = 1, \ldots, m, \\
X \succ 0, \\
C - \mu X^{-1} = \sum_{i=1}^m y_i A_i.
\end{array}
\right.
$$

We can define

$$
S = \mu X^{-1},
$$

which implies

$$
XS = \mu I,
$$

and we can rewrite the Karush-Kuhn-Tucker conditions as:

$$
\left\{
\begin{array}{l}
A_i \bullet X = b_i, \quad i = 1, \ldots, m, \\
X \succ 0, \\
\sum_{i=1}^m y_i A_i + S = C, \\
XS = \mu I.
\end{array}
\right.
$$

It follows that if $(X, y, S)$ is a solution of this system, then $X$ is feasible for SDP, $(y, S)$ is feasible for SDD, and the resulting duality gap is

$$
S \bullet X = \sum_{i=1}^n \sum_{j=1}^n S_{ij} X_{ij} = \sum_{j=1}^n (SX)_{jj} = \sum_{j=1}^n (\mu I)_{jj} = n\mu.
$$

This suggests that we try solving BSDP (μ) for a variety of values of μ as μ → 0.

Interior-point methods for SDP are very similar to those for linear optimization, in that they use Newton’s method to solve the KKT system as μ → 0.
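The relation $S \bullet X = n\mu$ can be seen at a glance when $X$ is diagonal, since then $S = \mu X^{-1}$ is diagonal as well. The sketch below uses a hypothetical diagonal $X \succ 0$ and a hypothetical decreasing sequence of $\mu$ values; it only illustrates the identity and does not solve $BSDP(\mu)$.

```python
def central_path_gap(x_diag, mu):
    """For diagonal X > 0, form S = mu * X^{-1} and return (diag of S, S . X)."""
    s_diag = [mu / xj for xj in x_diag]
    gap = sum(sj * xj for sj, xj in zip(s_diag, x_diag))   # trace(SX)
    return s_diag, gap


# Hypothetical positive definite diagonal X with n = 3.
x_diag = [1.0, 2.0, 4.0]
n = len(x_diag)

# The gap equals n * mu and shrinks to zero as mu decreases.
gaps = {mu: central_path_gap(x_diag, mu)[1] for mu in (0.5, 0.25, 0.125)}
s_diag_half, gap_half = central_path_gap(x_diag, 0.5)
```

Each entry of `gaps` equals $n\mu$, which is why driving $\mu \to 0$ drives the duality gap to zero.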

16.11 Website for SDP

A good website for semidefinite programming is:

http://www-user.tu-chemnitz.de/~helmberg/semidef.html.