Matrix Norm Approximation, Factorization Caching, Kelley’s Cutting Plane Algorithm, Exercises of Convex Optimization

Prof. Chhaayank Buhpathi assigned this homework for the Convex Optimization course at Aliah University. It includes: Matrix, Norm, Approximation, Linear, Combination, Random, Instance, Fejer, Monotone, LP, Warning

Typology: Exercises

2011/2012

Uploaded on 07/15/2012

saeeda
EE364b Prof. S. Boyd
EE364b Homework 3


  1. Matrix norm approximation. We consider the problem of approximating a given matrix $B \in \mathbf{R}^{p \times q}$ as a linear combination of some other given matrices $A_i \in \mathbf{R}^{p \times q}$, $i = 1, \ldots, n$, as measured by the matrix norm (maximum singular value):

$$\text{minimize} \quad \|x_1 A_1 + \cdots + x_n A_n - B\|.$$

(a) Explain how to find a subgradient of the objective function at $x$.

(b) Generate a random instance of the problem with $n = 5$, $p = 3$, $q = 6$. Use CVX to find the optimal value $f^\star$ of the problem. Use a subgradient method to solve the problem, starting from $x = 0$. Plot $f - f^\star$ versus iteration. Experiment with several step size sequences.
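One possible approach to this problem, offered as a sketch rather than the course solution: the maximum singular value of $M$ equals the supremum of $u^T M v$ over unit vectors, so if $u_1$, $v_1$ are leading left and right singular vectors of $M(x) = x_1 A_1 + \cdots + x_n A_n - B$, the vector $g$ with $g_i = u_1^T A_i v_1$ is a subgradient at $x$. The code below uses Python with NumPy instead of Matlab and CVX. The function names and the step size rule $0.1/\sqrt{k}$ are illustrative assumptions, not part of the assignment, and $f^\star$ is not computed here.

```python
import numpy as np

def objective_and_subgradient(x, A, B):
    """A has shape (n, p, q). Returns f(x) = ||sum_i x_i A_i - B||_2 and one subgradient."""
    M = np.tensordot(x, A, axes=1) - B      # sum_i x_i A_i - B, shape (p, q)
    U, s, Vt = np.linalg.svd(M)
    u1, v1 = U[:, 0], Vt[0]                 # leading singular vectors
    g = np.array([u1 @ Ai @ v1 for Ai in A])
    return s[0], g

def subgradient_method(A, B, num_iters=2000, step=lambda k: 0.1 / np.sqrt(k)):
    """Runs x <- x - alpha_k g from x = 0; returns the best point and best-so-far values."""
    x = np.zeros(A.shape[0])
    x_best, f_best = x.copy(), np.inf
    history = []
    for k in range(1, num_iters + 1):
        f, g = objective_and_subgradient(x, A, B)
        if f < f_best:
            x_best, f_best = x.copy(), f
        history.append(f_best)
        x = x - step(k) * g
    return x_best, np.array(history)

# Random instance with the sizes from part (b); the seed is arbitrary.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3, 6))
B = rng.standard_normal((3, 6))
x_best, history = subgradient_method(A, B)
print("best objective value found:", history[-1])
```

Passing a different `step` callable, for example `lambda k: 1.0 / k`, is one way to compare step size sequences.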

  2. Step sizes that guarantee moving closer to the optimal set. Consider the subgradient method iteration $x^+ = x - \alpha g$, where $g \in \partial f(x)$. Show that if $\alpha < 2(f(x) - f^\star)/\|g\|_2^2$ (which is twice Polyak's optimal step size value) we have

$$\|x^+ - x^\star\|_2 < \|x - x^\star\|_2,$$

for any optimal point $x^\star$. This implies that $\mathbf{dist}(x^+, X^\star) < \mathbf{dist}(x, X^\star)$. (Methods in which successive iterates move closer to the optimal set are called Fejér monotone. Thus, the subgradient method, with Polyak's optimal step size, is Fejér monotone.)
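A possible outline of the argument, given as a sketch and assuming $\alpha > 0$ and that $x$ is not optimal (so $g \neq 0$): expand the squared distance, then apply the subgradient inequality $f^\star = f(x^\star) \ge f(x) + g^T(x^\star - x)$, which gives $g^T(x - x^\star) \ge f(x) - f^\star$.

```latex
\begin{aligned}
\|x^+ - x^\star\|_2^2 &= \|x - \alpha g - x^\star\|_2^2 \\
&= \|x - x^\star\|_2^2 - 2\alpha\, g^T (x - x^\star) + \alpha^2 \|g\|_2^2 \\
&\le \|x - x^\star\|_2^2 - 2\alpha \bigl(f(x) - f^\star\bigr) + \alpha^2 \|g\|_2^2 .
\end{aligned}
```

The last two terms sum to $\alpha \bigl(\alpha \|g\|_2^2 - 2(f(x) - f^\star)\bigr)$, which is negative exactly when $0 < \alpha < 2(f(x) - f^\star)/\|g\|_2^2$. This yields the strict inequality for every optimal $x^\star$.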

  3. Alternating projections for LP feasibility. We consider the problem of finding a point $x \in \mathbf{R}^n$ that satisfies $Ax = b$, $x \succeq 0$, where $A \in \mathbf{R}^{m \times n}$, with $m < n$.

(a) Work out alternating projections for this problem. (In other words, explain how to compute (Euclidean) projections onto $\{x \mid Ax = b\}$ and $\mathbf{R}^n_+$.)

(b) Implement your method, and try it on one or more problem instances with $m = 500$, $n = 2000$. With $x^{(k)}$ denoting the iterate after projection onto $\mathbf{R}^n_+$, plot $\|Ax^{(k)} - b\|_2$, the residual of the equality constraint. (This should converge to zero; you can terminate when this norm is smaller than $10^{-5}$.)

Here is a simple way to generate data which is feasible. First generate a random $A$, and a random $z$ with positive entries. Then set $b = Az$. (Of course, you cannot use $z$ in your alternating projections method; you must start from some obvious point, such as 0.)

Warning. When $A$ is a fat matrix, the Matlab command A\b does not do what you might expect, i.e., compute a least-norm solution of $Ax = b$.


(c) Factorization caching is a general technique for speeding up some repeated calculations, such as projection onto an affine set. Assuming $A$ is dense, the cost of computing the projection of a point onto the affine set $\{x \mid Ax = b\}$ is $O(m^2 n)$ flops. (See Appendix C in Convex Optimization.) By saving some of the matrices involved in this computation, such as a Cholesky factorization (or even more directly, the inverse) of $AA^T$, subsequent projections can be carried out at a cost of $O(mn)$ flops, i.e., $m$ times faster. (There are several other ways to get this speedup, by saving other matrices.) Effectively, this makes each subgradient step (after the first one) a factor $m$ times cheaper. Explain how to do this, and implement a caching scheme in your code. Verify that you obtain a speedup. (You may have to try your code on a larger problem instance.)

(d) Over-projection. A general method that can speed up alternating projections is to over-project, which means replacing the simple projection $x^+ = P(x)$ with $x^+ = x + \gamma(P(x) - x)$, where $\gamma \in [1, 2)$. (When $\gamma = 1$, this reduces to standard projection.) It is not hard to show that alternating projections, with over-projection, converges to a point in the intersection of the sets.

Implement over-projection and experiment with the over-projection factor $\gamma$, observing the effect on the number of iterations required for convergence.
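The pieces of this problem can be sketched together as follows, under stated assumptions. For full row rank $A$, the projection onto $\{x \mid Ax = b\}$ is $P(x) = x - A^T (AA^T)^{-1}(Ax - b)$, and the projection onto $\mathbf{R}^n_+$ is the componentwise maximum with zero. The sketch caches a QR factorization $A^T = QR$ (one of the "other matrices" the text alludes to, chosen here instead of Cholesky so that only NumPy is needed), which gives $P(x) = x - QQ^T x + QR^{-T} b$ at $O(mn)$ cost per call. The function names, the small instance size, and applying $\gamma$ to both projections are illustrative choices, not prescribed by the assignment.

```python
import numpy as np

def make_affine_projector(A, b):
    """One-time O(m^2 n) setup; each later projection onto {x | Ax = b} costs O(mn)."""
    Q, R = np.linalg.qr(A.T)              # A^T = Q R, Q is n x m with orthonormal columns
    x_ln = Q @ np.linalg.solve(R.T, b)    # least-norm solution A^T (A A^T)^{-1} b
    def project(x):
        # Equals x - A^T (A A^T)^{-1} (A x - b) when A has full row rank.
        return x - Q @ (Q.T @ x) + x_ln
    return project

def alternating_projections(A, b, gamma=1.0, tol=1e-5, max_iter=5000):
    """Alternates (over-)projections onto {x | Ax = b} and the nonnegative orthant."""
    project_affine = make_affine_projector(A, b)
    x = np.zeros(A.shape[1])              # start from the obvious point 0
    residuals = []
    for _ in range(max_iter):
        x = x + gamma * (project_affine(x) - x)     # affine set
        x = x + gamma * (np.maximum(x, 0.0) - x)    # nonnegative orthant
        residuals.append(np.linalg.norm(A @ x - b))
        if residuals[-1] < tol:
            break
    return x, residuals

# Small feasible instance generated as the problem suggests: b = A z with z > 0.
rng = np.random.default_rng(0)
m, n = 20, 80
A = rng.standard_normal((m, n))
z = rng.random(n) + 0.1
b = A @ z
for gamma in (1.0, 1.5):
    x, residuals = alternating_projections(A, b, gamma=gamma)
    print(f"gamma={gamma}: {len(residuals)} iterations, residual {residuals[-1]:.2e}")
```

To check the speedup asked for in part (c), one could time `project` against a version that re-solves with $AA^T$ on every call, using a larger instance such as the $m = 500$, $n = 2000$ case.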

  4. Kelley's cutting-plane algorithm. We consider the problem of minimizing a convex function $f : \mathbf{R}^n \to \mathbf{R}$ over some convex set $C$, assuming we can evaluate $f(x)$ and find a subgradient $g \in \partial f(x)$ for any $x$. Suppose we have evaluated the function and a subgradient at $x^{(1)}, \ldots, x^{(k)}$. We can form the piecewise-linear approximation

$$\hat f^{(k)}(x) = \max_{i=1,\ldots,k} \left( f(x^{(i)}) + g^{(i)T} (x - x^{(i)}) \right),$$

which satisfies $\hat f^{(k)}(x) \le f(x)$ for all $x$. It follows that

$$L^{(k)} = \inf_{x \in C} \hat f^{(k)}(x) \le p^\star,$$

where $p^\star = \inf_{x \in C} f(x)$. Since $\hat f^{(k+1)}(x) \ge \hat f^{(k)}(x)$ for all $x$, we have $L^{(k+1)} \ge L^{(k)}$. In Kelley's cutting-plane algorithm, we set $x^{(k+1)}$ to be any point that minimizes $\hat f^{(k)}$ over $x \in C$. The algorithm can be terminated when $U^{(k)} - L^{(k)} \le \epsilon$, where $U^{(k)} = \min_{i=1,\ldots,k} f(x^{(i)})$.

Use Kelley's cutting-plane algorithm to minimize the piecewise-linear function $f(x) = \max_{i=1,\ldots,m} (a_i^T x + b_i)$ that we have used for other numerical examples, with $C$ the unit cube, i.e., $C = \{x \mid \|x\|_\infty \le 1\}$. The data that defines the particular function can be found in the Matlab directory of the subgradient notes on the course web site. You can start with $x^{(1)} = 0$ and run the algorithm for 40 iterations. Plot $f(x^{(k)})$, $U^{(k)}$, $L^{(k)}$ and the constant $p^\star$ (on the same plot) versus $k$.
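The iteration described above can be sketched as follows. With $C$ the unit cube, minimizing $\hat f^{(k)}$ over $C$ is a linear program in $(x, t)$: minimize $t$ subject to $f(x^{(i)}) + g^{(i)T}(x - x^{(i)}) \le t$ for $i = 1, \ldots, k$ and $-1 \le x_j \le 1$. The sketch uses Python with `scipy.optimize.linprog` instead of Matlab. The course data file is not reproduced here, so random $a_i$, $b_i$ with hypothetical sizes stand in for it, and the function names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def pwl_value_and_subgradient(A, b, x):
    """f(x) = max_i (a_i^T x + b_i); an active row a_j is a subgradient."""
    vals = A @ x + b
    j = int(np.argmax(vals))
    return vals[j], A[j]

def kelley(A, b, num_iters=40):
    n = A.shape[1]
    cost = np.r_[np.zeros(n), 1.0]                   # minimize t over (x, t)
    bounds = [(-1.0, 1.0)] * n + [(None, None)]      # unit cube for x, t free
    cut_rows, cut_rhs = [], []
    f_hist, U_hist, L_hist = [], [], []
    x = np.zeros(n)                                  # x^(1) = 0
    for _ in range(num_iters):
        fx, g = pwl_value_and_subgradient(A, b, x)
        # Cut f(x_k) + g^T (y - x_k) <= t, written as g^T y - t <= g^T x_k - f(x_k).
        cut_rows.append(np.r_[g, -1.0])
        cut_rhs.append(g @ x - fx)
        res = linprog(cost, A_ub=np.array(cut_rows), b_ub=np.array(cut_rhs), bounds=bounds)
        f_hist.append(fx)
        U_hist.append(min(f_hist))                   # U^(k): best value seen so far
        L_hist.append(res.fun)                       # L^(k): minimum of the model over C
        x = res.x[:n]                                # x^(k+1) minimizes the model
    return np.array(f_hist), np.array(U_hist), np.array(L_hist)

def pwl_optimal_value(A, b):
    """p* from the full LP: minimize t subject to A x + b <= t, x in the unit cube."""
    m, n = A.shape
    res = linprog(np.r_[np.zeros(n), 1.0],
                  A_ub=np.hstack([A, -np.ones((m, 1))]), b_ub=-b,
                  bounds=[(-1.0, 1.0)] * n + [(None, None)])
    return res.fun

# Stand-in random data (assumption: not the course data set).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
f_hist, U_hist, L_hist = kelley(A, b)
print("p* =", pwl_optimal_value(A, b), " final U =", U_hist[-1], " final L =", L_hist[-1])
```

Plotting `f_hist`, `U_hist`, `L_hist`, and the constant `p*` against the iteration index gives the figure the problem asks for.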
