

Prof. Chhaayank Buhpathi assigned this take-home task for the Convex Optimization course at Aliah University. It includes: Vector, Machine, Stochastic, Subgradient, Linear, SVM, Error, Probability
Typology: Exercises


EE364b Prof. S. Boyd
\[ f(w) = \mathbf{E}\,(1 - y w^T x)_+ + (\rho/2)\|w\|_2^2, \]
where $(u)_+ = \max\{u, 0\}$ and $\rho > 0$ is a parameter. The first term is the average loss, and the second term is a quadratic regularizer. Finding $w^{\mathrm{svm}}$ involves solving a stochastic optimization problem. Explain how to (approximately) solve this stochastic optimization problem using the stochastic subgradient method, with one sample per subgradient step. In this context, the samples from the distribution are called data or examples, and the collection of these is called the training data. Since this method processes only one data sample in each step, it is called a streaming algorithm (it never has to store more than one data sample at a time). Implement the stochastic subgradient method for a problem with $n = 20$, and $(x, y)$ samples generated using
    randn('state',0)
    w_true = randn(n,1); % 'true' weight vector
    % to get each data sample use snippet below
    x = randn(n,1);
    y = sign(w_true'*x+0.1*randn(1));
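One way the method could be organized is sketched below. For the hinge-loss objective, a noisy subgradient from a single sample $(x, y)$ is $\rho w - yx$ when $y w^T x < 1$ and $\rho w$ otherwise. The value of `rho`, the starting point `w = 0`, and the diminishing step size `1/(rho*k)` are assumptions made for illustration only; the problem asks you to experiment with all of them.

```matlab
% Sketch (not a reference solution): stochastic subgradient method for
%   f(w) = E (1 - y*w'*x)_+ + (rho/2)*||w||_2^2
n = 20;
rho = 0.1;            % assumed value; experiment with it
max_iter = 10000;     % number of subgradient steps

randn('state',0)
w_true = randn(n,1);  % 'true' weight vector

w = zeros(n,1);       % assumed starting point
for k = 1:max_iter
    % draw one fresh data sample (streaming: nothing is stored)
    x = randn(n,1);
    y = sign(w_true'*x + 0.1*randn(1));

    % noisy subgradient of f at w from this single sample:
    % the hinge term contributes -y*x when the margin y*w'*x is below 1
    g = rho*w;
    if y*(w'*x) < 1
        g = g - y*x;
    end

    alpha = 1/(rho*k);   % assumed diminishing step size rule
    w = w - alpha*g;
end
```

The loop keeps only the current `w` and the current sample, which is what makes the method a streaming algorithm. Other step size rules, such as a constant step or one proportional to `1/sqrt(k)`, drop in by changing the single line that sets `alpha`.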
Experiment with the choice of $\rho$, the step size rule, and the number of iterations to run (but don't be afraid to run the algorithm for 10000 steps). To view the convergence, you can plot two quantities at each step: the optimality gap $f(w) - f^\star$ and the classifier error probability $\mathbf{Prob}(y w^T x \le 0)$. To (approximately) compute these quantities, use a Monte Carlo method, using, say, 10000 samples. (You'll want to compute these 10000 samples, and evaluate the Monte Carlo estimates of the two quantities above, without using Matlab for loops. Also note that evaluation of these two quantities will be far more costly than each step of the stochastic subgradient method.) You can use CVX to estimate $f^\star$.
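The Monte Carlo evaluation described above can be written without loops by storing one sample per column, as in the sketch below. The iterate `w` here is only a placeholder, and `rho = 0.1` is an assumed value. Estimating $f^\star$ by solving the sample-average problem on the same 10000 samples is one possible reading of "use CVX", not the only one; that last part requires CVX to be installed.

```matlab
n = 20; rho = 0.1;                 % rho is an assumed value
randn('state',0)
w_true = randn(n,1);
w = w_true/norm(w_true);           % placeholder iterate to evaluate

% fixed evaluation set, generated once and reused for every iterate
N = 10000;
X = randn(n,N);                          % one sample per column
Y = sign(w_true'*X + 0.1*randn(1,N));    % 1 x N row of labels

% vectorized Monte Carlo estimates (no for loops)
margins = Y.*(w'*X);                     % y_i * w'*x_i for all i
f_hat = mean(max(0, 1 - margins)) + (rho/2)*(w'*w);
p_err = mean(margins <= 0);

% estimate of f_star: minimize the sample-average objective with CVX
cvx_begin quiet
    variable w_opt(n)
    minimize( sum(pos(1 - Y'.*(X'*w_opt)))/N + (rho/2)*sum_square(w_opt) )
cvx_end
f_star = cvx_optval;
gap = f_hat - f_star;
```

Because `f_hat` and `f_star` are computed from the same samples, `gap` is nonnegative up to solver tolerance, which keeps a semilog plot of the gap well defined.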
\[ \mathcal{E} = \{x \in \mathbf{R}^n \mid (x - x_c)^T P^{-1} (x - x_c) \le 1\} \]
and the halfspace $\mathcal{H} = \{x \mid g^T (x - x_c) \le 0\}$. We'll assume that $n > 1$, since for $n = 1$ the problem is easy.
(a) We first consider a special case: $\mathcal{E}$ is the unit ball centered at the origin ($P = I$, $x_c = 0$), and $g = -e_1$ ($e_1$ is the first unit vector), so $\mathcal{E} \cap \mathcal{H} = \{x \mid x^T x \le 1,\ x_1 \ge 0\}$. Let $\tilde{\mathcal{E}} = \{x \mid (x - \tilde{x}_c)^T \tilde{P}^{-1} (x - \tilde{x}_c) \le 1\}$ denote the minimum volume ellipsoid containing $\mathcal{E} \cap \mathcal{H}$. Since $\mathcal{E} \cap \mathcal{H}$ is symmetric about the line through the first unit vector $e_1$, it is clear (and not too hard to show) that $\tilde{\mathcal{E}}$ will have the same symmetry. This means that the matrix $\tilde{P}$ is diagonal, of the form $\tilde{P} = \mathbf{diag}(\alpha, \beta, \beta, \ldots, \beta)$, and that $\tilde{x}_c = \gamma e_1$ (where $\alpha, \beta > 0$ and $\gamma \ge 0$). So now we have only three variables to determine: $\alpha$, $\beta$, and $\gamma$. Express the volume of $\tilde{\mathcal{E}}$ in terms of these variables, and also the constraint that $\tilde{\mathcal{E}} \supseteq \mathcal{E} \cap \mathcal{H}$. Then solve the optimization problem directly, to show that
\[ \alpha = \frac{n^2}{(n+1)^2}, \qquad \beta = \frac{n^2}{n^2 - 1}, \qquad \gamma = \frac{1}{n+1} \]
(which agrees with the formulas we gave, for this special case). Hint. To express $\mathcal{E} \cap \mathcal{H} \subseteq \tilde{\mathcal{E}}$ in terms of the variables, it is necessary and sufficient for the conditions on $\alpha$, $\beta$, and $\gamma$ to hold on the boundary of $\mathcal{E} \cap \mathcal{H}$, i.e., at the points $x_1 = 0$, $x_2^2 + \cdots + x_n^2 \le 1$, or the points $x_1 \ge 0$, $x_1^2 + x_2^2 + \cdots + x_n^2 = 1$.
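As a sanity check on the stated values (not a derivation of them), they can be substituted into the boundary conditions from the hint. The parametrization below is introduced here only for illustration: write $t = x_1$ and $s = x_2^2 + \cdots + x_n^2$, so that membership in the candidate ellipsoid reads $(t - \gamma)^2/\alpha + s/\beta \le 1$. The check uses $\alpha = n^2/(n+1)^2$, $\beta = n^2/(n^2-1)$, and $\gamma = 1/(n+1)$.

```latex
\begin{align*}
\mathbf{vol}(\tilde{\mathcal{E}}) &\propto \sqrt{\det \tilde{P}}
  = \sqrt{\alpha\,\beta^{\,n-1}}, \\
\text{at } t = 1,\ s = 0: \quad \frac{(1-\gamma)^2}{\alpha}
  &= \frac{n^2}{(n+1)^2} \cdot \frac{(n+1)^2}{n^2} = 1, \\
\text{at } t = 0,\ s = 1: \quad \frac{\gamma^2}{\alpha} + \frac{1}{\beta}
  &= \frac{1}{n^2} + \frac{n^2 - 1}{n^2} = 1.
\end{align*}
```

Both conditions hold with equality, so the ellipsoid with these parameters touches the half-ball at the pole $e_1$ and along the rim $x_1 = 0$, $x^T x = 1$. Showing that no smaller ellipsoid works is the part the exercise asks for.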
(b) Now consider the general case, stated at the beginning of this problem. Show how to reduce the general case to the special case solved in part (a). Hint. Find an affine transformation that maps the original ellipsoid to the unit ball, and $g$ to $-e_1$. Explain why minimizing the volume in these transformed coordinates also minimizes the volume in the original coordinates.
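One candidate transformation of the kind the hint describes is sketched below, assuming $P \succ 0$ and $g \ne 0$. The symbols $T$, $Q$, $z$, and $S$ are introduced here for illustration and do not come from the problem statement; $Q$ is any orthogonal matrix that rotates the normalized vector $P^{1/2} g$ onto $-e_1$.

```latex
\begin{align*}
z = T(x) &= Q\,P^{-1/2}(x - x_c), \qquad Q^T Q = I, \qquad
  Q\,\frac{P^{1/2} g}{\|P^{1/2} g\|_2} = -e_1, \\
x \in \mathcal{E} &\iff (x - x_c)^T P^{-1} (x - x_c) = z^T z \le 1, \\
x \in \mathcal{H} &\iff g^T P^{1/2} Q^T z \le 0
  \iff -e_1^T z \le 0 \iff z_1 \ge 0, \\
\mathbf{vol}\bigl(T^{-1}(S)\bigr) &= \det\bigl(P^{1/2}\bigr)\,\mathbf{vol}(S)
  \quad \text{for any measurable set } S.
\end{align*}
```

Since $T$ is affine and invertible, it maps ellipsoids to ellipsoids and preserves set inclusion. The last line shows that it scales every volume by the same constant factor, which is the observation that links the minimizer in the $z$ coordinates to the minimizer in the original coordinates.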