Minimum Volume Ellipsoid, Stochastic Subgradient, Exercises of Convex Optimization

Prof. Chhaayank Buhpathi assigned this take-home exercise for the Convex Optimization course at Aliah University. Keywords: Vector, Machine, Stochastic, Subgradient, Linear, SVM, Error, Probability

Typology: Exercises

2011/2012

Uploaded on 07/15/2012

saeeda

EE364b Prof. S. Boyd
EE364b Homework 4
1. Support vector machine training via stochastic subgradient. We suppose that feature-label
pairs, $(x, y) \in \mathbf{R}^n \times \{-1, 1\}$, are generated from some distribution. We seek a
linear classifier or predictor, of the form $\hat y = \mathrm{sign}(w^T x)$, where $w \in \mathbf{R}^n$ is the weight
vector. (We can add an entry to $x$ that is always 1 to get an affine classifier.) Our
classifier is correct when $y w^T x > 0$; since this expression is homogeneous in $w$, we can
write this as $y w^T x \geq 1$. Thus, our goal is to choose $w$ so that $1 - y w^T x \leq 0$ with high
probability.
A support vector machine (SVM) chooses $w^{\rm svm}$ as the minimizer of

$$ f(w) = \mathbf{E}\,(1 - y w^T x)_+ + (\rho/2)\|w\|_2^2, $$

where $\rho > 0$ is a parameter. The first term is the average loss, and the second term is a
quadratic regularizer. Finding $w^{\rm svm}$ involves solving a stochastic optimization problem.
Explain how to (approximately) solve this stochastic optimization problem using the
stochastic subgradient method, with one sample per subgradient step. In this context,
the samples from the distribution are called data or examples, and the collection of
these is called the training data. Since this method only processes one data sample in
each step, it is called a streaming algorithm (since it does not have to store more than
one data sample in each step).
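
One way to set this up, offered as a sketch rather than as the intended solution: at step $k$, draw a single sample $(x^{(k)}, y^{(k)})$ independently of the current iterate $w^{(k)}$, and use a subgradient of the sample objective $(1 - y w^T x)_+ + (\rho/2)\|w\|_2^2$ as a noisy subgradient of $f$. The symbols $\tilde g^{(k)}$ and $t_k$ are names introduced here; the step sizes $t_k$ are an assumption (a nonnegative sequence that is square-summable but not summable, such as $t_k = 1/k$, is a common choice).

```latex
\tilde g^{(k)} =
\begin{cases}
-y^{(k)} x^{(k)} + \rho w^{(k)}, & 1 - y^{(k)} (w^{(k)})^T x^{(k)} > 0, \\
\rho w^{(k)}, & \text{otherwise},
\end{cases}
\qquad
w^{(k+1)} = w^{(k)} - t_k \, \tilde g^{(k)} .
```

Because the sample is independent of $w^{(k)}$, the conditional expectation of $\tilde g^{(k)}$ given $w^{(k)}$ is a subgradient of $f$ at $w^{(k)}$, which is what the stochastic subgradient method requires. Only the current sample enters each update.
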
Implement the stochastic subgradient method for a problem with $n = 20$, and $(x, y)$
samples generated using
    randn('state',0)
    w_true = randn(n,1); % 'true' weight vector
    % to get each data sample use snippet below
    x = randn(n,1);
    y = sign(w_true'*x+0.1*randn(1));
Experiment with the choice of $\rho$, the step size rule, and the number of iterations to
run (but don’t be afraid to run the algorithm for 10000 steps).
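
A minimal sketch of such a streaming loop is given below. The values of `rho` and `K`, the starting point, and the step size rule `1/(rho*k)` are illustrative assumptions, not choices prescribed by the problem; they are exactly the things the exercise asks you to vary.

```matlab
% Streaming stochastic subgradient sketch for the SVM objective.
% rho, K, the zero starting point, and the step size 1/(rho*k) are assumptions.
randn('state',0)
n = 20;
w_true = randn(n,1);      % 'true' weight vector, as in the problem statement
rho = 0.1;                % regularization parameter (assumed value)
K = 10000;                % number of iterations
w = zeros(n,1);           % starting point
for k = 1:K
    % draw one sample; nothing is stored after this step
    x = randn(n,1);
    y = sign(w_true'*x + 0.1*randn(1));
    % subgradient of (1 - y*w'*x)_+ + (rho/2)*||w||_2^2 at the current w
    if 1 - y*(w'*x) > 0
        g = -y*x + rho*w;
    else
        g = rho*w;
    end
    w = w - (1/(rho*k))*g;    % step with size 1/(rho*k)
end
```

Recording `w` every few iterations (rather than at every step) keeps the later Monte Carlo evaluation affordable.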
To view the convergence, you can plot two quantities at each step: the optimality gap
$f(w) - f^\star$ and the classifier error probability $\mathbf{Prob}(y w^T x \leq 0)$. To (approximately)
compute these quantities, use a Monte Carlo method, using, say, 10000 samples. (You’ll
want to compute these 10000 samples, and evaluate the Monte Carlo estimates of the
two quantities above, without using Matlab for loops. Also note that evaluation of
these two quantities will be far more costly than each step of the stochastic subgradient
method.) You can use CVX to estimate $f^\star$.
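
The loop-free Monte Carlo evaluation can be sketched as follows. The names `X`, `Y`, `margins`, `f_hat`, and `err_hat` are introduced here for illustration, and `rho` and the candidate `w` are assumed values; in practice `w` would be an iterate of the stochastic subgradient method.

```matlab
% Vectorized Monte Carlo estimates of f(w) and Prob(y*w'*x <= 0).
% N, rho, and the candidate point w are illustrative assumptions.
randn('state',0)
n = 20;
w_true = randn(n,1);
rho = 0.1;
N = 10000;
X = randn(n,N);                           % each column is one sample x
Y = sign(w_true'*X + 0.1*randn(1,N));     % 1-by-N row of labels
w = w_true;                               % any candidate weight vector
margins = Y.*(w'*X);                      % y*w'*x for all samples at once
f_hat = mean(max(1 - margins, 0)) + (rho/2)*norm(w)^2;   % estimate of f(w)
err_hat = mean(margins <= 0);             % estimate of the error probability
```

To estimate $f^\star$ with CVX, one option is to minimize the same sample average over a CVX variable `w`, for example with the objective `sum(pos(1 - Y'.*(X'*w)))/N + (rho/2)*sum_square(w)` between `cvx_begin` and `cvx_end`. This is a sample-average approximation, so the value it returns is an estimate of $f^\star$, not $f^\star$ itself.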

2. Minimum volume ellipsoid covering a half-ellipsoid. In this problem we derive the update formulas used in the ellipsoid method, i.e., we will determine the minimum volume ellipsoid that contains the intersection of the ellipsoid

$$ \mathcal{E} = \{ x \in \mathbf{R}^n \mid (x - x_c)^T P^{-1} (x - x_c) \leq 1 \} $$

and the halfspace $\mathcal{H} = \{ x \mid g^T (x - x_c) \leq 0 \}$. We’ll assume that $n > 1$, since for $n = 1$ the problem is easy.

(a) We first consider a special case: $\mathcal{E}$ is the unit ball centered at the origin ($P = I$, $x_c = 0$), and $g = -e_1$ ($e_1$ is the first unit vector), so $\mathcal{E} \cap \mathcal{H} = \{ x \mid x^T x \leq 1,\ x_1 \geq 0 \}$. Let $\tilde{\mathcal{E}} = \{ x \mid (x - \tilde x_c)^T \tilde P^{-1} (x - \tilde x_c) \leq 1 \}$ denote the minimum volume ellipsoid containing $\mathcal{E} \cap \mathcal{H}$. Since $\mathcal{E} \cap \mathcal{H}$ is symmetric about the line through the first unit vector $e_1$, it is clear (and not too hard to show) that $\tilde{\mathcal{E}}$ will have the same symmetry. This means that the matrix $\tilde P$ is diagonal, of the form $\tilde P = \mathrm{diag}(\alpha, \beta, \beta, \ldots, \beta)$, and that $\tilde x_c = \gamma e_1$ (where $\alpha, \beta > 0$ and $\gamma \geq 0$). So now we have only three variables to determine: $\alpha$, $\beta$, and $\gamma$. Express the volume of $\tilde{\mathcal{E}}$ in terms of these variables, and also the constraint that $\tilde{\mathcal{E}} \supseteq \mathcal{E} \cap \mathcal{H}$. Then solve the optimization problem directly, to show that

$$ \alpha = \frac{n^2}{(n+1)^2}, \qquad \beta = \frac{n^2}{n^2 - 1}, \qquad \gamma = \frac{1}{n+1} $$

(which agrees with the formulas we gave, for this special case). Hint. To express $\mathcal{E} \cap \mathcal{H} \subseteq \tilde{\mathcal{E}}$ in terms of the variables, it is necessary and sufficient for the conditions on $\alpha$, $\beta$, and $\gamma$ to hold on the boundary of $\mathcal{E} \cap \mathcal{H}$, i.e., at the points $x_1 = 0$, $x_2^2 + \cdots + x_n^2 \leq 1$, or the points $x_1 \geq 0$, $x_1^2 + x_2^2 + \cdots + x_n^2 = 1$.
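
The stated values can be sanity-checked numerically before attempting the derivation. The sketch below is not a proof: it samples the curved part of the boundary and evaluates the ellipsoid inequality there. The dimension `n = 5`, the sample count `M`, and all variable names are illustrative assumptions.

```matlab
% Numerical sanity check of the stated alpha, beta, gamma (not a proof).
% n, M, and the variable names are illustrative assumptions.
randn('state',0)
n = 5;
alpha = n^2/(n+1)^2;
beta  = n^2/(n^2-1);
gamma = 1/(n+1);
M = 10000;
% random points on the unit sphere, reflected into the half x_1 >= 0
Z = randn(n,M);
Z = Z ./ repmat(sqrt(sum(Z.^2,1)), n, 1);
Z(1,:) = abs(Z(1,:));
% left-hand side of the ellipsoid inequality for each column of Z
q = (Z(1,:) - gamma).^2/alpha + sum(Z(2:end,:).^2,1)/beta;
max_curved = max(q);             % containment requires this to be at most 1
% the point e_1 and any rim point (x_1 = 0, norm 1) should give exactly 1
q_e1  = (1 - gamma)^2/alpha;
q_rim = gamma^2/alpha + 1/beta;
% volume relative to the unit ball: sqrt(det(P~)) = sqrt(alpha*beta^(n-1))
vol_ratio = sqrt(alpha*beta^(n-1));
```

On the flat part of the boundary ($x_1 = 0$) the left-hand side is largest at the rim, so `q_rim` covers that case. A `vol_ratio` below 1 indicates that the covering ellipsoid is smaller than the original ball.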

(b) Now consider the general case, stated at the beginning of this problem. Show how to reduce the general case to the special case solved in part (a). Hint. Find an affine transformation that maps the original ellipsoid to the unit ball, and $g$ to $-e_1$. Explain why minimizing the volume in these transformed coordinates also minimizes the volume in the original coordinates.
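
One possible change of coordinates is sketched below, assuming $P$ is symmetric positive definite and $g \neq 0$. The variable $z$ and the orthogonal matrix $U$ are names introduced here; $U$ can be any orthogonal matrix (a Householder reflection, for instance) that sends the unit vector $P^{1/2} g / \sqrt{g^T P g}$ to $-e_1$.

```latex
z = U P^{-1/2} (x - x_c), \qquad
U \, \frac{P^{1/2} g}{\sqrt{g^T P g}} = -e_1, \qquad U^T U = I,
\\[1ex]
\begin{aligned}
(x - x_c)^T P^{-1} (x - x_c) &= z^T z, \\
g^T (x - x_c) &= (U P^{1/2} g)^T z = -\sqrt{g^T P g}\; z_1, \\
\mathbf{vol}\bigl( \{ U P^{-1/2} (x - x_c) \mid x \in S \} \bigr) &= (\det P)^{-1/2} \, \mathbf{vol}(S)
\quad \text{for every measurable } S .
\end{aligned}
```

In the $z$ coordinates the ellipsoid becomes the unit ball and the halfspace becomes $z_1 \geq 0$, which is the setting of part (a). Since the map sends ellipsoids to ellipsoids and scales every volume by the same factor $(\det P)^{-1/2}$, the ordering of candidate ellipsoids by volume is unchanged.
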
