









Material Type: Notes; Professor: Epelman; Class: Cont Optimum Methods; Subject: Industrial And Operations Engineering; University: University of Michigan - Ann Arbor; Term: Fall 2007;
Typology: Study notes










Semidefinite programming (SDP) is probably the most exciting development in mathematical programming in the last ten years. SDP has applications in such diverse fields as traditional convex constrained optimization, control theory, and combinatorial optimization. Because SDP is solvable via interior-point methods (and usually requires about the same amount of computational resources as linear optimization), most of these applications can be solved fairly efficiently in practice as well as in theory.
Consider the linear programming problem in standard form:
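$$\text{LP}: \quad \begin{array}{ll} \text{minimize} & c \cdot x \\ \text{s.t.} & a_i \cdot x = b_i, \quad i = 1, \ldots, m, \\ & x \in \mathbb{R}^n_+. \end{array}$$
Here $x$ is a vector of $n$ variables, and we write “$c \cdot x$” for the inner product “$\sum_{j=1}^n c_j x_j$”, etc.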
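Also, $\mathbb{R}^n_+ := \{x \in \mathbb{R}^n \mid x \ge 0\}$, and we call $\mathbb{R}^n_+$ the nonnegative orthant. In fact, $\mathbb{R}^n_+$ is a closed convex cone, where $K$ is called a closed convex cone if $K$ satisfies the following two conditions:
(i) If $x, w \in K$, then $\alpha x + \beta w \in K$ for all nonnegative scalars $\alpha$ and $\beta$.
(ii) $K$ is a closed set.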
In words, LP is the following problem:
“Minimize the linear function $c \cdot x$, subject to the condition that $x$ must solve $m$ given equations $a_i \cdot x = b_i$, $i = 1, \ldots, m$, and that $x$ must lie in the closed convex cone $K = \mathbb{R}^n_+$.”
We will write the standard linear programming dual problem as:
$$\text{LD}: \quad \begin{array}{ll} \text{maximize} & \sum_{i=1}^m y_i b_i \\ \text{s.t.} & \sum_{i=1}^m y_i a_i + s = c, \\ & s \in \mathbb{R}^n_+. \end{array}$$
Given a feasible solution $x$ of LP and a feasible solution $(y, s)$ of LD, the duality gap is simply
$$c \cdot x - \sum_{i=1}^m y_i b_i = \Big(c - \sum_{i=1}^m y_i a_i\Big) \cdot x = s \cdot x \ge 0,$$
because $x \ge 0$ and $s \ge 0$. We know from LP duality theory that so long as the primal problem LP is feasible and has bounded optimal objective value, then the primal and the dual both attain their optima with no duality gap. That is, there exist $x^*$ and $(y^*, s^*)$ feasible for the primal and dual, respectively, for which
$$c \cdot x^* - \sum_{i=1}^m y_i^* b_i = s^* \cdot x^* = 0.$$
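The duality-gap identity can be checked numerically. The following is a minimal sketch on a small made-up instance; the data `A`, `b`, `c` and the points `x`, `y` are illustrative assumptions, not taken from the notes, and the points are feasible but not optimal.

```python
import numpy as np

# Illustrative LP data (assumed for this sketch); the rows of A are the a_i.
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 0.0]])
b = np.array([4.0, 1.0])
c = np.array([2.0, 3.0, 1.0])

# A primal feasible point: A @ x equals b and x >= 0.
x = np.array([2.0, 1.0, 1.0])

# A dual feasible point: the slack s = c - sum_i y_i a_i must be >= 0.
y = np.array([1.0, 0.5])
s = c - A.T @ y

# Duality gap c.x - sum_i y_i b_i, which should coincide with s.x.
gap = c @ x - b @ y
```

Here `gap` and `s @ x` agree, and both are nonnegative because `x` and `s` are componentwise nonnegative.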
16.3.1 Facts about the Semidefinite Cone
If $X$ is an $n \times n$ matrix, then $X$ is a symmetric positive semidefinite (SPSD) matrix if $X = X^T$ and $v^T X v \ge 0$ for any $v \in \mathbb{R}^n$.
If $X$ is an $n \times n$ matrix, then $X$ is a symmetric positive definite (SPD) matrix if $X = X^T$ and $v^T X v > 0$ for any $v \in \mathbb{R}^n$, $v \ne 0$.
Let $S^n$ denote the set of symmetric $n \times n$ matrices, and let $S^n_+$ denote the set of symmetric positive semidefinite (SPSD) $n \times n$ matrices. Similarly let $S^n_{++}$ denote the set of symmetric positive definite (SPD) $n \times n$ matrices.
Let $X$ and $Y$ be any symmetric matrices. We write “$X \succeq 0$” to denote that $X$ is SPSD, and we write “$X \succeq Y$” to denote that $X - Y \succeq 0$. We write “$X \succ 0$” to denote that $X$ is SPD, etc.
$S^n_+ = \{X \in S^n \mid X \succeq 0\}$ is a closed convex cone in $\mathbb{R}^{n^2}$ of dimension $n(n+1)/2$.
To see why this remark is true, suppose that $X, W \in S^n_+$. Pick any scalars $\alpha, \beta \ge 0$. For any $v \in \mathbb{R}^n$, we have:
$$v^T(\alpha X + \beta W)v = \alpha v^T X v + \beta v^T W v \ge 0,$$
whereby $\alpha X + \beta W \in S^n_+$. This shows that $S^n_+$ is a cone. It is also straightforward to show that $S^n_+$ is a closed set.
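The cone property can be illustrated numerically. The sketch below is an illustration only: the helper `is_psd`, the random matrices, and the scalars are assumptions chosen for the example. It tests positive semidefiniteness through the smallest eigenvalue and checks one nonnegative combination.

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """Return True if the symmetric matrix M has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(M).min() >= -tol)

rng = np.random.default_rng(0)
G, H = rng.standard_normal((2, 4, 4))
X, W = G @ G.T, H @ H.T      # Gram matrices G G^T are always SPSD
alpha, beta = 0.7, 2.5       # any nonnegative scalars

combo = alpha * X + beta * W # should again lie in the SPSD cone
```

A numerical check of this kind is not a proof, but it is a quick sanity test when building SDP data by hand.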
16.3.2 Facts about Eigenvalues and Eigenvectors
If $M$ is a square $n \times n$ matrix, then $\lambda$ is an eigenvalue of $M$ with corresponding eigenvector $x$ if $Mx = \lambda x$ and $x \ne 0$.
Note that $\lambda$ is an eigenvalue of $M$ if and only if $\lambda$ is a root of the polynomial:
$$p(\lambda) := \det(M - \lambda I),$$
that is, $p(\lambda) = \det(M - \lambda I) = 0$.
This polynomial will have $n$ roots counting multiplicities, that is, there exist $\lambda_1, \lambda_2, \ldots, \lambda_n$ for which:
$$p(\lambda) := \det(M - \lambda I) = \prod_{i=1}^n (\lambda_i - \lambda).$$
If $M$ is symmetric, then all eigenvalues $\lambda$ of $M$ must be real numbers, and these eigenvalues can be ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ if we so choose.
The corresponding eigenvectors $q^1, \ldots, q^n$ of $M$ can be chosen so that they are orthogonal, namely $(q^i)^T q^j = 0$ for $i \ne j$, and can be scaled so that $(q^i)^T q^i = 1$. This means that the matrix:
$$Q := \begin{bmatrix} q^1 & q^2 & \cdots & q^n \end{bmatrix}$$
satisfies $Q^T Q = I$.
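The orthonormality of the eigenvectors of a symmetric matrix can be checked numerically. A minimal sketch using NumPy's `numpy.linalg.eigh`; the matrix `M` is an arbitrary illustrative choice, not from the notes.

```python
import numpy as np

M = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh is intended for symmetric matrices: it returns real eigenvalues in
# ascending order and orthonormal eigenvectors as the columns of Q.
lam, Q = np.linalg.eigh(M)

# Reverse the order so that lam[0] >= lam[1] >= ... as in the text.
lam, Q = lam[::-1], Q[:, ::-1]

# Rebuild M from its eigenvalues and eigenvectors: M = Q diag(lam) Q^T.
M_rebuilt = Q @ np.diag(lam) @ Q.T
```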
Let $X \in S^n$. We can think of $X$ as a matrix, or equivalently, as an array of $n^2$ components of the form $(x_{11}, \ldots, x_{nn})$. We can also just think of $X$ as an object (a vector) in the space $S^n$. All three equivalent ways of looking at $X$ will be useful.
What will a linear function of $X$ look like? If $C(X)$ is a linear function of $X$, then $C(X)$ can be written as $C \bullet X$, where
$$C \bullet X := \sum_{i=1}^n \sum_{j=1}^n C_{ij} X_{ij}.$$
If $X$ is a symmetric matrix, there is no loss of generality in assuming that the matrix $C$ is also symmetric. With this notation, we are now ready to define a semidefinite program. A semidefinite program (SDP) is an optimization problem of the form:
$$\text{SDP}: \quad \begin{array}{ll} \text{minimize} & C \bullet X \\ \text{s.t.} & A_i \bullet X = b_i, \quad i = 1, \ldots, m, \\ & X \succeq 0. \end{array}$$
Notice that in an SDP the variable is the matrix $X$, but it might be helpful to think of $X$ as an array of $n^2$ numbers or simply as a vector in $S^n$. The objective function is the linear function $C \bullet X$ and there are $m$ linear equations that $X$ must satisfy, namely $A_i \bullet X = b_i$, $i = 1, \ldots, m$. The variable $X$ also must lie in the (closed convex) cone of positive semidefinite symmetric matrices $S^n_+$. Note that the data for SDP consists of the symmetric matrix $C$ (which is the data for the objective function) and the $m$ symmetric matrices $A_1, \ldots, A_m$, and the $m$-vector $b$, which form the $m$ linear equations.
Let us see an example of an SDP for n = 3 and m = 2. Define the following matrices:
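$$A_1 = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 3 & 7 \\ 1 & 7 & 5 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 2 & 8 \\ 2 & 6 & 0 \\ 8 & 0 & 4 \end{pmatrix}, \quad b = \begin{pmatrix} 11 \\ 19 \end{pmatrix}, \quad \text{and} \quad C = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 9 & 0 \\ 3 & 0 & 7 \end{pmatrix}.$$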
Then the variable X will be the 3 × 3 symmetric matrix:
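$$X = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{pmatrix},$$
and so, for example,
$$\begin{aligned} C \bullet X &= x_{11} + 2x_{12} + 3x_{13} + 2x_{21} + 9x_{22} + 0x_{23} + 3x_{31} + 0x_{32} + 7x_{33} \\ &= x_{11} + 4x_{12} + 6x_{13} + 9x_{22} + 0x_{23} + 7x_{33}, \end{aligned}$$
since, in particular, $X$ is symmetric. Therefore the SDP can be written as:
$$\text{SDP}: \quad \begin{array}{ll} \text{minimize} & x_{11} + 4x_{12} + 6x_{13} + 9x_{22} + 0x_{23} + 7x_{33} \\ \text{s.t.} & x_{11} + 0x_{12} + 2x_{13} + 3x_{22} + 14x_{23} + 5x_{33} = 11, \\ & 0x_{11} + 4x_{12} + 16x_{13} + 6x_{22} + 0x_{23} + 4x_{33} = 19, \\ & X = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{pmatrix} \succeq 0. \end{array}$$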
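The expansion of the inner products in this example can be verified mechanically. In the sketch below, the symmetric matrices `C`, `A1`, `A2` are read off from the coefficients of the expanded objective and constraints (each off-diagonal coefficient is split evenly between the two symmetric positions); the helper `inner` and the test matrix `X` are assumptions made for illustration.

```python
import numpy as np

def inner(A, X):
    """Matrix inner product A . X = sum_ij A_ij X_ij (equals trace(A^T X))."""
    return float(np.sum(A * X))

C = np.array([[1, 2, 3], [2, 9, 0], [3, 0, 7]], dtype=float)
A1 = np.array([[1, 0, 1], [0, 3, 7], [1, 7, 5]], dtype=float)
A2 = np.array([[0, 2, 8], [2, 6, 0], [8, 0, 4]], dtype=float)

# An arbitrary symmetric test matrix (not claimed to be feasible for the SDP).
X = np.array([[1.0, 0.5, 0.2],
              [0.5, 2.0, 0.1],
              [0.2, 0.1, 3.0]])
x11, x12, x13 = X[0, 0], X[0, 1], X[0, 2]
x22, x23, x33 = X[1, 1], X[1, 2], X[2, 2]

# The expanded scalar expressions from the text.
expanded_obj = x11 + 4*x12 + 6*x13 + 9*x22 + 0*x23 + 7*x33
expanded_a1 = x11 + 0*x12 + 2*x13 + 3*x22 + 14*x23 + 5*x33
expanded_a2 = 0*x11 + 4*x12 + 16*x13 + 6*x22 + 0*x23 + 4*x33
```

For any symmetric `X`, `inner(C, X)`, `inner(A1, X)`, and `inner(A2, X)` match the three expanded expressions.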
Notice that SDP looks remarkably similar to a linear program. However, the standard LP constraint that $x$ must lie in the nonnegative orthant is replaced by the constraint that the variable $X$ must lie in the cone of positive semidefinite matrices. Just as “$x \ge 0$” states that each of the $n$ components of $x$ must be nonnegative, it may be helpful to think of “$X \succeq 0$” as stating that each of the $n$ eigenvalues of $X$ must be nonnegative. It is easy to see that a linear program LP is a special instance of an SDP. To see one way of doing this, suppose that $(c, a_1, \ldots, a_m, b_1, \ldots, b_m)$ comprise the data for LP. Then define:
$$A_i = \begin{pmatrix} a_{i1} & 0 & \cdots & 0 \\ 0 & a_{i2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{in} \end{pmatrix}, \quad i = 1, \ldots, m, \quad \text{and} \quad C = \begin{pmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n \end{pmatrix}.$$
Then LP can be written as:
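$$\text{SDP}: \quad \begin{array}{ll} \text{minimize} & C \bullet X \\ \text{s.t.} & A_i \bullet X = b_i, \quad i = 1, \ldots, m, \\ & X_{ij} = 0, \quad i = 1, \ldots, n, \; j = i + 1, \ldots, n, \\ & X \succeq 0, \end{array}$$
with the association that
$$X = \begin{pmatrix} x_1 & 0 & \cdots & 0 \\ 0 & x_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_n \end{pmatrix}.$$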
Of course, in practice one would never want to convert an instance of LP into an instance of SDP. The above construction merely shows that SDP includes linear programming as a special case.
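The diagonal embedding can be sketched in a few lines. The LP data below are illustrative assumptions, not from the notes; the point is only that the matrix inner products reproduce the LP objective and constraint values, and that positive semidefiniteness of a diagonal `X` is the same as `x >= 0`.

```python
import numpy as np

# Illustrative LP data (assumed); the rows of A are the vectors a_i.
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 0.0]])
c = np.array([2.0, 3.0, 1.0])
x = np.array([2.0, 1.0, 1.0])    # any nonnegative vector

C_mat = np.diag(c)                   # C   = diag(c)
A_mats = [np.diag(a) for a in A]     # A_i = diag(a_i)
X = np.diag(x)                       # X   = diag(x)

sdp_obj = np.sum(C_mat * X)                             # C . X
sdp_lhs = np.array([np.sum(Ai * X) for Ai in A_mats])   # A_i . X

# The eigenvalues of a diagonal matrix are its diagonal entries.
min_eig = np.linalg.eigvalsh(X).min()
```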
The dual problem of SDP is defined (or derived from first principles) to be:
$$\text{SDD}: \quad \begin{array}{ll} \text{maximize} & \sum_{i=1}^m y_i b_i \\ \text{s.t.} & \sum_{i=1}^m y_i A_i + S = C, \\ & S \succeq 0. \end{array}$$
One convenient way of thinking about this problem is as follows. Given multipliers $y_1, \ldots, y_m$, the objective is to maximize the linear function $\sum_{i=1}^m y_i b_i$. The constraints of SDD state that the matrix $S$ defined as
$$S = C - \sum_{i=1}^m y_i A_i$$
must be positive semidefinite. That is,
$$C - \sum_{i=1}^m y_i A_i \succeq 0.$$
where the last inequality follows from the fact that all $D_{jj} \ge 0$ and the fact that the diagonal of the symmetric positive semidefinite matrix $P^T Q E Q^T P$ must be nonnegative.
To prove the second part of the proposition, suppose that $\mathrm{trace}(SX) = 0$. Then from the above equalities, we have
$$\sum_{j=1}^n D_{jj} (P^T Q E Q^T P)_{jj} = 0.$$
However, this implies that for each $j = 1, \ldots, n$, either $D_{jj} = 0$ or $(P^T Q E Q^T P)_{jj} = 0$. Furthermore, the latter case implies that the $j$th row of $P^T Q E Q^T P$ is all zeros. Therefore $D P^T Q E Q^T P = 0$, and so $SX = P D P^T Q E Q^T = 0$.
Unlike the case of linear programming, we cannot assert that either SDP or SDD will attain their respective optima, and/or that there will be no duality gap, unless certain regularity conditions hold. One such regularity condition which ensures that strong duality will prevail is a version of the “Slater condition,” summarized in the following theorem which we will not prove:
Theorem 92 Let $z^*_P$ and $z^*_D$ denote the optimal objective function values of SDP and SDD, respectively. Suppose that there exists a feasible solution $\hat{X}$ of SDP such that $\hat{X} \succ 0$, and that there exists a feasible solution $(\hat{y}, \hat{S})$ of SDD such that $\hat{S} \succ 0$. Then both SDP and SDD attain their optimal values, and $z^*_P = z^*_D$.
The following summarizes some of the more important properties of linear programming that do not extend to SDP :
SDP has wide applicability in combinatorial optimization. A number of NP-hard combinatorial optimization problems have convex relaxations that are semidefinite programs. In many instances, the SDP relaxation is very tight in practice, and in certain instances in particular, the optimal solution to the SDP relaxation can be converted to a feasible solution for the original problem with provably good objective value. An example of the use of SDP in combinatorial optimization is given below.
16.7.1 An SDP Relaxation of the MAX CUT Problem
Let $G$ be an undirected graph with nodes $N = \{1, \ldots, n\}$, and edge set $E$. Let $w_{ij} = w_{ji}$ be the weight on edge $(i, j)$, for $(i, j) \in E$. We assume that $w_{ij} \ge 0$ for all $(i, j) \in E$. The MAX CUT problem is to determine a subset $S$ of the nodes $N$ for which the sum of the weights of the edges that cross from $S$ to its complement $\bar{S}$ is maximized (where $\bar{S} := N \setminus S$).
We can formulate MAX CUT as an integer program as follows. Let $x_j = 1$ for $j \in S$ and $x_j = -1$ for $j \in \bar{S}$. Then our formulation is:
$$\text{MAXCUT}: \quad \begin{array}{ll} \text{maximize}_x & \frac{1}{4} \sum_{i=1}^n \sum_{j=1}^n w_{ij} (1 - x_i x_j) \\ \text{s.t.} & x_j \in \{-1, 1\}, \quad j = 1, \ldots, n. \end{array}$$
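Now let
$$Y = xx^T,$$
whereby $Y_{ij} = x_i x_j$, $i = 1, \ldots, n$, $j = 1, \ldots, n$.
Also let $W$ be the matrix whose $(i, j)$th element is $w_{ij}$ for $i = 1, \ldots, n$ and $j = 1, \ldots, n$. Then MAX CUT can be equivalently formulated as:
$$\text{MAXCUT}: \quad \begin{array}{ll} \text{maximize}_{Y, x} & \frac{1}{4} \sum_{i=1}^n \sum_{j=1}^n w_{ij} - \frac{1}{4} W \bullet Y \\ \text{s.t.} & x_j \in \{-1, 1\}, \quad j = 1, \ldots, n, \\ & Y = xx^T. \end{array}$$
Notice in this problem that the first set of constraints are equivalent to $Y_{jj} = 1$, $j = 1, \ldots, n$. We therefore obtain:
$$\text{MAXCUT}: \quad \begin{array}{ll} \text{maximize}_{Y, x} & \frac{1}{4} \sum_{i=1}^n \sum_{j=1}^n w_{ij} - \frac{1}{4} W \bullet Y \\ \text{s.t.} & Y_{jj} = 1, \quad j = 1, \ldots, n, \\ & Y = xx^T. \end{array}$$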
Last of all, notice that the matrix $Y = xx^T$ is a symmetric rank-1 positive semidefinite matrix. If we relax this condition by removing the rank-1 restriction, we obtain the following relaxation of MAX CUT, which is a semidefinite program:
$$\text{RELAX}: \quad \begin{array}{ll} \text{maximize}_Y & \frac{1}{4} \sum_{i=1}^n \sum_{j=1}^n w_{ij} - \frac{1}{4} W \bullet Y \\ \text{s.t.} & Y_{jj} = 1, \quad j = 1, \ldots, n, \\ & Y \succeq 0. \end{array}$$
Every feasible solution of MAXCUT is also feasible for RELAX with the same objective value, so it is easy to see that RELAX provides an upper bound on MAXCUT, i.e.,
$$MAXCUT \le RELAX.$$
As it turns out, one can also prove without too much effort that:
$$0.87856 \, RELAX \le MAXCUT \le RELAX.$$
This is an impressive result, in that it states that the value of the NP-hard problem MAX CUT is guaranteed to be no more than 12.2% below the value of the semidefinite relaxation.
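The MAX CUT formulation can be sanity-checked by brute force on a tiny graph. The sketch below does not solve the semidefinite relaxation (that would need an SDP solver); it only enumerates all sign vectors, confirms that the quarter-sum objective equals the cut weight, and confirms that each $Y = xx^T$ satisfies the constraints of RELAX. The 4-node weight matrix `W` and the helper names are assumptions made for illustration.

```python
import itertools
import numpy as np

# Illustrative weighted graph on 4 nodes (assumed data). W is symmetric with
# zero diagonal, and W[i, j] = 0 means there is no edge (i, j).
W = np.array([[0, 3, 1, 0],
              [3, 0, 2, 4],
              [1, 2, 0, 5],
              [0, 4, 5, 0]], dtype=float)
n = W.shape[0]

def cut_weight(x):
    """Total weight of the edges whose endpoints have opposite signs in x."""
    return sum(W[i, j] for i in range(n) for j in range(i + 1, n)
               if x[i] != x[j])

def objective(Y):
    """(1/4) sum_ij w_ij - (1/4) W . Y, the objective of MAXCUT and RELAX."""
    return 0.25 * W.sum() - 0.25 * np.sum(W * Y)

best_val, best_x = -np.inf, None
all_feasible_for_relax = True
for signs in itertools.product([-1.0, 1.0], repeat=n):
    x = np.array(signs)
    Y = np.outer(x, x)                       # Y = x x^T
    # The matrix objective must agree with the combinatorial cut weight.
    assert np.isclose(objective(Y), cut_weight(x))
    # Y has unit diagonal and is positive semidefinite, so it is RELAX-feasible.
    all_feasible_for_relax &= bool(np.allclose(np.diag(Y), 1.0)
                                   and np.linalg.eigvalsh(Y).min() >= -1e-9)
    if objective(Y) > best_val:
        best_val, best_x = objective(Y), x
```

Because every cut yields a feasible `Y` for RELAX with the same objective value, the optimal value of RELAX can never fall below `best_val`.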
16.8.2 SDP for Second-Order Cone Optimization
A second-order cone optimization problem (SOCP) is an optimization problem of the form:
$$\text{SOCP}: \quad \begin{array}{ll} \min_x & c^T x \\ \text{s.t.} & Ax = b, \\ & \|Q_i x + d_i\| \le g_i^T x + h_i, \quad i = 1, \ldots, k. \end{array}$$
In this problem, the norm $\|v\|$ is the standard Euclidean norm:
$$\|v\| := \sqrt{v^T v}.$$
The norm constraints in SOCP are called “second-order cone” constraints. Note that these are convex constraints.
Here we show that any second-order cone constraint can be written as an SDP constraint. Indeed we have:
Property:
$$\|Qx + d\| \le g^T x + h \iff \begin{pmatrix} (g^T x + h) I & Qx + d \\ (Qx + d)^T & g^T x + h \end{pmatrix} \succeq 0.$$
Note in the above that the matrix involved here is a linear function of the variable $x$, and so is in the general form of an SDP constraint. This property is a direct consequence of the fact (stated earlier) that, for $P \succ 0$,
$$M = \begin{pmatrix} P & v \\ v^T & d \end{pmatrix} \succeq 0 \iff d - v^T P^{-1} v \ge 0.$$
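Therefore we can write the second-order cone optimization problem as:
$$\text{SDPSOCP}: \quad \begin{array}{ll} \min_x & c^T x \\ \text{s.t.} & Ax = b, \\ & \begin{pmatrix} (g_i^T x + h_i) I & Q_i x + d_i \\ (Q_i x + d_i)^T & g_i^T x + h_i \end{pmatrix} \succeq 0, \quad i = 1, \ldots, k. \end{array}$$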
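The equivalence between a second-order cone constraint and its matrix form can be tested numerically. In this sketch the function names and the randomly generated data are assumptions for illustration; the check compares the direct norm test with an eigenvalue test on the block matrix.

```python
import numpy as np

def soc_holds(Q, d, g, h, x):
    """Direct test of ||Qx + d|| <= g^T x + h."""
    return bool(np.linalg.norm(Q @ x + d) <= g @ x + h)

def lmi_matrix(Q, d, g, h, x):
    """Block matrix [[t I, u], [u^T, t]] with u = Qx + d and t = g^T x + h."""
    u = Q @ x + d
    t = g @ x + h
    k = u.size
    M = np.empty((k + 1, k + 1))
    M[:k, :k] = t * np.eye(k)
    M[:k, k] = u
    M[k, :k] = u
    M[k, k] = t
    return M

def lmi_holds(Q, d, g, h, x, tol=1e-9):
    """Test positive semidefiniteness of the block matrix via its eigenvalues."""
    return bool(np.linalg.eigvalsh(lmi_matrix(Q, d, g, h, x)).min() >= -tol)

# Randomly generated constraint data and test points (illustrative only).
rng = np.random.default_rng(1)
Q, d = rng.standard_normal((3, 2)), rng.standard_normal(3)
g, h = rng.standard_normal(2), 2.0
points = rng.standard_normal((200, 2))
agree = all(soc_holds(Q, d, g, h, x) == lmi_holds(Q, d, g, h, x)
            for x in points)
```

The block matrix has eigenvalues $t$ and $t \pm \|u\|$, so its smallest eigenvalue is $t - \|u\|$, which is why the two tests coincide away from the boundary.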
16.8.3 SDP for Eigenvalue Optimization
There are many types of eigenvalue optimization problems that can be formulated as SDPs. In a typical eigenvalue optimization problem, we are given symmetric matrices $B$ and $A_i$, $i = 1, \ldots, k$, and we choose weights $w_1, \ldots, w_k$ to create a new matrix $S$:
$$S := B - \sum_{i=1}^k w_i A_i.$$
In some applications there might be restrictions on the weights $w$, such as $w \ge 0$ or more generally linear inequalities of the form $Gw \le d$. The typical goal is then to choose $w$ in such a way that the eigenvalues of $S$ are “well-aligned,” for example:
$\sum_{j=1}^n \lambda_j(S)$ is minimized or maximized
Let us see how to work with these problems using SDP. First, we have:
Property: $M \succeq tI$ if and only if $\lambda_{\min}(M) \ge t$.
To see why this is true, let us consider the eigenvalue decomposition of $M = QDQ^T$, and consider the matrix $R$ defined as:
$$R = M - tI = QDQ^T - tI = Q(D - tI)Q^T.$$
Then $M \succeq tI \iff R \succeq 0 \iff D - tI \succeq 0 \iff \lambda_{\min}(M) \ge t$.
Property: $M \preceq tI$ if and only if $\lambda_{\max}(M) \le t$.
To see why this is true, let us consider the eigenvalue decomposition of $M = QDQ^T$, and consider the matrix $R$ defined as:
$$R = M - tI = QDQ^T - tI = Q(D - tI)Q^T.$$
Then $M \preceq tI \iff R \preceq 0 \iff D - tI \preceq 0 \iff \lambda_{\max}(M) \le t$.
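Both properties reduce a semidefinite inequality to a bound on a single eigenvalue, which is easy to check numerically. In the sketch below the matrix `M` and the helper names `is_psd`, `at_least`, `at_most` are illustrative assumptions.

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """Return True if the symmetric matrix M has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(M).min() >= -tol)

def at_least(M, t):
    """Test M >= t I in the semidefinite order, i.e. M - t I is SPSD."""
    return is_psd(M - t * np.eye(len(M)))

def at_most(M, t):
    """Test M <= t I in the semidefinite order, i.e. t I - M is SPSD."""
    return is_psd(t * np.eye(len(M)) - M)

M = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# eigvalsh returns eigenvalues in ascending order; take the first and last.
lam_min, lam_max = np.linalg.eigvalsh(M)[[0, -1]]
```

For this `M` the eigenvalues are $(7 \pm \sqrt{5})/2$, so `at_least(M, t)` holds exactly for `t` up to roughly 2.38 and `at_most(M, t)` exactly for `t` from roughly 4.62.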
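Now suppose that we wish to find weights $w$ to minimize the difference between the largest and the smallest eigenvalues of $S$. This problem can be written down as:
$$\text{EOP}: \quad \begin{array}{ll} \text{minimize}_{w, S} & \lambda_{\max}(S) - \lambda_{\min}(S) \\ \text{s.t.} & S = B - \sum_{i=1}^k w_i A_i, \\ & Gw \le d. \end{array}$$
Then EOP can be written as:
$$\text{EOP}: \quad \begin{array}{ll} \text{minimize}_{w, S, \mu, \lambda} & \mu - \lambda \\ \text{s.t.} & S = B - \sum_{i=1}^k w_i A_i, \\ & Gw \le d, \\ & \lambda I \preceq S \preceq \mu I. \end{array}$$
This last problem is a semidefinite program.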
Using constructs such as those shown above, very many other types of eigenvalue optimization problems can be formulated as SDPs. For example, suppose that we would like to work with $\sum_{j=1}^n \lambda_j(S)$. Then one can use elementary properties of the determinant function to prove:
Property: If $M$ is symmetric, then
$$\sum_{j=1}^n \lambda_j(M) = \sum_{j=1}^n M_{jj}.$$
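One way to see this property is sketched here as a coefficient comparison in the characteristic polynomial; this is a standard argument and not necessarily the one the notes have in mind. Expanding the factored form gives

$$\prod_{i=1}^n (\lambda_i - \lambda) = (-\lambda)^n + \Big(\sum_{i=1}^n \lambda_i\Big)(-\lambda)^{n-1} + \cdots .$$

On the other hand, in the permutation expansion of $\det(M - \lambda I)$, every permutation other than the identity moves at least two indices and therefore contributes a polynomial of degree at most $n - 2$ in $\lambda$. The terms of degree $n$ and $n - 1$ thus come only from the diagonal product:

$$\prod_{j=1}^n (M_{jj} - \lambda) = (-\lambda)^n + \Big(\sum_{j=1}^n M_{jj}\Big)(-\lambda)^{n-1} + \cdots .$$

Matching the coefficients of $(-\lambda)^{n-1}$ gives $\sum_{j=1}^n \lambda_j(M) = \sum_{j=1}^n M_{jj}$. Since $\sum_j S_{jj}$ is linear in $S$, the sum of the eigenvalues of $S$ is a linear function of $S$.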
Figure 5: Illustration of the ellipsoid construction at the analytic center.
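the analytic center is the solution $(\hat{y}, \hat{S})$ of the following optimization problem:
$$\text{(ACP:)} \quad \begin{array}{ll} \text{maximize}_{y, S} & \prod_{i=1}^n \lambda_i(S) \\ \text{s.t.} & \sum_{i=1}^m y_i A_i + S = C, \\ & S \succeq 0. \end{array}$$
This is easily seen to be the same as:
$$\text{(ACP:)} \quad \begin{array}{ll} \text{minimize}_{y, S} & -\ln \det(S) \\ \text{s.t.} & \sum_{i=1}^m y_i A_i + S = C, \\ & S \succ 0. \end{array}$$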
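The two forms of the objective are linked by $\det(S) = \prod_i \lambda_i(S)$, so $-\ln\det(S) = -\sum_i \ln \lambda_i(S)$ and maximizing the product is the same as minimizing the negative log-determinant. A small numerical check, using an arbitrary positive definite matrix `S` chosen for illustration:

```python
import numpy as np

S = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 0.5],
              [0.0, 0.5, 1.0]])   # symmetric positive definite (assumed data)

lam = np.linalg.eigvalsh(S)

# slogdet returns (sign, log|det|) and avoids overflow for large matrices.
sign, logdet = np.linalg.slogdet(S)
barrier = -logdet                 # the objective -ln det(S)
```

Using `numpy.linalg.slogdet` rather than `log(det(S))` is a design choice made here for numerical robustness; for small matrices either works.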
Just as in linear inequality systems, the analytic center possesses a very nice “centrality” property in the feasible region $P$ of the semidefinite inequality system. Suppose that $(\hat{y}, \hat{S})$ is the analytic center. Then there are easy-to-construct ellipsoids $E_{IN}$ and $E_{OUT}$, both centered at $\hat{y}$ and where $E_{OUT}$ is a scaled version of $E_{IN}$ with scale factor $n$, with the property that:
$$E_{IN} \subset P \subset E_{OUT},$$
as illustrated in Figure 5.
16.8.6 SDP for the Minimum Volume Circumscription Problem
A given matrix $R \succ 0$ and a given point $z$ can be used to define an ellipsoid in $\mathbb{R}^n$:
$$E_{R,z} := \{y \mid (y - z)^T R (y - z) \le 1\}.$$
One can prove that the volume of $E_{R,z}$ is proportional to $\sqrt{\det(R^{-1})}$.
Suppose we are given a convex set $X \subset \mathbb{R}^n$ described as the convex hull of $k$ points $c_1, \ldots, c_k$. We would like to find an ellipsoid circumscribing these $k$ points that has minimum volume, see Figure 6.
Figure 6: Illustration of the circumscribed ellipsoid problem.
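Our problem can be written in the following form:
$$\text{MCP}: \quad \begin{array}{ll} \text{minimize}_{R, z} & \mathrm{vol}(E_{R,z}) \\ \text{s.t.} & c_i \in E_{R,z}, \quad i = 1, \ldots, k, \end{array}$$
which is equivalent to:
$$\text{MCP}: \quad \begin{array}{ll} \text{minimize}_{R, z} & -\ln(\det(R)) \\ \text{s.t.} & (c_i - z)^T R (c_i - z) \le 1, \quad i = 1, \ldots, k, \\ & R \succ 0. \end{array}$$
Now factor $R = M^2$ where $M \succ 0$ (that is, $M$ is a square root of $R$), and now MCP becomes:
$$\text{MCP}: \quad \begin{array}{ll} \text{minimize}_{M, z} & -\ln(\det(M^2)) \\ \text{s.t.} & (c_i - z)^T M^T M (c_i - z) \le 1, \quad i = 1, \ldots, k, \\ & M \succ 0. \end{array}$$
Next notice the equivalence:
$$\begin{pmatrix} I & M c_i - M z \\ (M c_i - M z)^T & 1 \end{pmatrix} \succeq 0 \iff (c_i - z)^T M^T M (c_i - z) \le 1.$$
In this way we can write MCP as:
$$\text{MCP}: \quad \begin{array}{ll} \text{minimize}_{M, z} & -2 \ln(\det(M)) \\ \text{s.t.} & \begin{pmatrix} I & M c_i - M z \\ (M c_i - M z)^T & 1 \end{pmatrix} \succeq 0, \quad i = 1, \ldots, k, \\ & M \succ 0. \end{array}$$
Last of all, make the substitution $y = Mz$ to obtain:
$$\text{MCP}: \quad \begin{array}{ll} \text{minimize}_{M, y} & -2 \ln(\det(M)) \\ \text{s.t.} & \begin{pmatrix} I & M c_i - y \\ (M c_i - y)^T & 1 \end{pmatrix} \succeq 0, \quad i = 1, \ldots, k, \\ & M \succ 0. \end{array}$$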
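The chain of reformulations can be checked on a small example. The sketch below is an illustration only: the matrix `M`, the center `z`, the test points, and the helper names are assumptions. It compares direct ellipsoid membership with the block-matrix condition after the substitutions $R = M^2$ and $y = Mz$.

```python
import numpy as np

def in_ellipsoid(R, z, c):
    """Direct membership test (c - z)^T R (c - z) <= 1."""
    v = c - z
    return bool(v @ R @ v <= 1.0)

def lmi_min_eig(M, y, c):
    """Smallest eigenvalue of the block matrix [[I, Mc - y], [(Mc - y)^T, 1]]."""
    u = M @ c - y
    n = u.size
    B = np.eye(n + 1)
    B[:n, n] = u
    B[n, :n] = u
    return float(np.linalg.eigvalsh(B).min())

M = np.array([[0.5, 0.1],
              [0.1, 0.4]])          # symmetric positive definite (assumed data)
z = np.array([1.0, -1.0])           # ellipsoid center (assumed data)
R, y = M @ M, M @ z                 # R = M^2 and the substitution y = Mz

c_inside = np.array([1.5, -0.5])    # a point close to the center
c_outside = np.array([5.0, 3.0])    # a point far from the center

# The two forms of the objective agree: -ln det(R) = -2 ln det(M).
obj_R = -np.log(np.linalg.det(R))
obj_M = -2.0 * np.log(np.linalg.det(M))
```

The block matrix is positive semidefinite exactly when its smallest eigenvalue, $1 - \|Mc - y\|$, is nonnegative, which matches the membership test.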
Let $f_\mu(X)$ denote the objective function of $BSDP(\mu)$. Then it is not too difficult to derive:
$$\nabla f_\mu(X) = C - \mu X^{-1},$$
and so the Karush-Kuhn-Tucker conditions for $BSDP(\mu)$ are:
$$\begin{array}{l} A_i \bullet X = b_i, \quad i = 1, \ldots, m, \\ X \succ 0, \\ C - \mu X^{-1} = \sum_{i=1}^m y_i A_i. \end{array}$$
We can define
$$S = \mu X^{-1},$$
which implies
$$XS = \mu I,$$
and we can rewrite the Karush-Kuhn-Tucker conditions as:
$$\begin{array}{l} A_i \bullet X = b_i, \quad i = 1, \ldots, m, \quad X \succ 0, \\ \sum_{i=1}^m y_i A_i + S = C, \\ XS = \mu I. \end{array}$$
It follows that if $(X, y, S)$ is a solution of this system, then $X$ is feasible for SDP, $(y, S)$ is feasible for SDD, and the resulting duality gap is
$$S \bullet X = \sum_{i=1}^n \sum_{j=1}^n S_{ij} X_{ij} = \sum_{j=1}^n (SX)_{jj} = \sum_{j=1}^n (\mu I)_{jj} = n\mu.$$
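The value $n\mu$ of the duality gap depends only on the relation $XS = \mu I$, which can be confirmed numerically. In this sketch `X` is a randomly generated positive definite matrix and `mu` an arbitrary positive value, both assumed for illustration; no SDP is being solved.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu = 4, 0.25
G = rng.standard_normal((n, n))
X = G @ G.T + np.eye(n)          # a positive definite X (illustrative)
S = mu * np.linalg.inv(X)        # S = mu X^{-1}, so that X S = mu I

# S . X = sum_ij S_ij X_ij, which equals trace(S X) for symmetric S.
gap = np.sum(S * X)
```

Driving `mu` toward zero therefore drives the gap `n * mu` toward zero, which is the motivation for following the solutions of the system as $\mu \to 0$.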
This suggests that we try solving BSDP (μ) for a variety of values of μ as μ → 0.
Interior-point methods for SDP are very similar to those for linear optimization, in that they use Newton’s method to solve the KKT system as μ → 0.
A good website for semidefinite programming is:
http://www-user.tu-chemnitz.de/ helmberg/semidef.html.