SVM Optimization-Machine Learning-Lecture Handout, Exercises of Machine Learning

These lecture notes were distributed for the Machine Learning course at the Pakistan Institute of Engineering and Applied Sciences, Islamabad (PIEAS). Main points: Algorithms, SVM, Optimization, KKT Conditions, Shrinking, Chunking, Iterative, Solvers, Decomposition, Criterion

Typology: Exercises

2011/2012

Uploaded on 07/19/2012

zaraa


Algorithms for solving the SVM Optimization

1 Introduction

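In previous lectures we derived the following form for the SVM optimization problem (L1 soft margin):

$$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_i \alpha_i y_i = 0$$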

Basic points:

  • Using KKT conditions, we can check if a proposed solution is optimal.
  • The objective function is a concave quadratic with a global maximum, so if we search uphill while maintaining the constraints we will find the maximum. We do not necessarily need to perform gradient ascent; any step that is guaranteed to increase the objective is useful. The methods differ in how quickly they converge.
  • General solvers for quadratic programming (QP) problems apply to the SVM problem, but the SVM problem can be simpler than the general one. Some of the solution methods discussed in the lecture do use such "off the shelf" solvers. However, these are only run on small subproblems.
  • The main advantage of developing algorithms specific to SVM is that SVM solutions are sparse, that is, they do not have many support vectors (examples with $\alpha_i \neq 0$). The fast solvers exploit this.
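The first basic point, checking a proposed solution, can be made concrete with a minimal pure-Python sketch. It assumes the kernel matrix is given as a nested list `K` and that a bias term `b` is available; the bias does not appear in the dual above, and the KKT conditions are written in their usual margin form, so both are assumptions of this illustration. The function names are hypothetical.

```python
def dual_objective(alpha, y, K):
    """L(alpha) = sum_i alpha_i - 1/2 * sum_ij alpha_i alpha_j y_i y_j K[i][j]."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def is_feasible(alpha, y, C, tol=1e-9):
    """Check the box constraint 0 <= alpha_i <= C and sum_i alpha_i y_i = 0."""
    in_box = all(-tol <= a <= C + tol for a in alpha)
    balanced = abs(sum(a * yi for a, yi in zip(alpha, y))) <= tol
    return in_box and balanced


def kkt_violators(alpha, y, K, b, C, tol=1e-6):
    """Return the indices whose margin y_i * f(x_i) contradicts alpha_i.

    Assumed margin form of the KKT conditions:
      alpha_i = 0      ->  y_i f(x_i) >= 1
      0 < alpha_i < C  ->  y_i f(x_i) == 1
      alpha_i = C      ->  y_i f(x_i) <= 1
    """
    n = len(alpha)
    violators = []
    for i in range(n):
        f_i = sum(alpha[j] * y[j] * K[i][j] for j in range(n)) + b
        margin = y[i] * f_i
        if alpha[i] <= tol:
            ok = margin >= 1 - tol
        elif alpha[i] >= C - tol:
            ok = margin <= 1 + tol
        else:
            ok = abs(margin - 1) <= tol
        if not ok:
            violators.append(i)
    return violators


# Toy problem: two 1-D points with a linear kernel K[i][j] = x_i * x_j.
x = [-1.0, 1.0]
y = [-1, 1]
K = [[xi * xj for xj in x] for xi in x]
alpha = [0.5, 0.5]
print(dual_objective(alpha, y, K))
print(is_feasible(alpha, y, C=1.0))
print(kkt_violators(alpha, y, K, b=0.0, C=1.0))
```

On this toy problem both points are support vectors lying exactly on the margin, so the list of violators is empty; starting from `alpha = [0.0, 0.0]` instead, both indices are reported as violators.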

2 Techniques to speed up solver

  • Shrinking: Guess the examples $i$ such that $\alpha_i = 0$ and the examples such that $\alpha_i = C$. Fix these $\alpha$'s and solve the subproblem on the remaining examples. Check the result using the KKT conditions: if it is optimal, we are done; if not, guess again.
  • Chunking: Pick a subset $I$ to optimize over. Fix $\alpha_i$ for $i \notin I$ and solve the problem for the remaining indices.

$$L = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K_{i,j}$$

$$= \sum_{i \in I} \alpha_i + \text{Const} - \frac{1}{2} \sum_{i \in I} \sum_{j \in I} \alpha_i \alpha_j y_i y_j K_{i,j} + \text{Const} - \sum_{i \in I} \alpha_i y_i \sum_{j \notin I} y_j \alpha_j K_{i,j}$$

Let $W_i = y_i \sum_{j \notin I} y_j \alpha_j K_{i,j}$. Then $L = \text{Const} + L_I$, where

$$L_I = \sum_{i \in I} \alpha_i (1 - W_i) - \frac{1}{2} \sum_{i,j \in I} \alpha_i \alpha_j y_i y_j K_{i,j}$$

The reduced optimization problem is to maximize $L_I$ subject to $0 \le \alpha_i \le C$ for $i \in I$ and $\sum_{i \in I} \alpha_i y_i = -\sum_{i \notin I} \alpha_i y_i$. Therefore, like shrinking, chunking gives a subproblem of the same type. We can use this to develop an iterative algorithm as follows:

  • Iterative Chunking:
      Initialize some $I$ and $\alpha$.
      Repeat: solve $L_I$ and check whether the solution is optimal; if not, add to $I$ any example violating the KKT conditions.
    Issue 1: We want to make sure $L_I$ goes up with every iteration.
    Issue 2: We hope to add to $I$ only elements that are support vectors in the final solution. Notice that even if this happens, the last problem we solve includes all the support vectors, so it may still be large.
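The chunking identity $L = \text{Const} + L_I$ can be checked numerically. The sketch below assumes a symmetric kernel matrix stored as a nested list and uses hypothetical function names; it computes $W_i$ and $L_I$ as defined above, with the factor $\frac{1}{2}$ on the quadratic term, and confirms that $L - L_I$ does not change when only the variables in $I$ change.

```python
def dual_objective(alpha, y, K):
    """Full dual objective L(alpha)."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def reduced_objective(alpha, y, K, I):
    """L_I = sum_{i in I} alpha_i (1 - W_i) - 1/2 sum_{i,j in I} alpha_i alpha_j y_i y_j K[i][j]."""
    inside = list(I)
    outside = [j for j in range(len(alpha)) if j not in inside]
    # W_i collects the interaction of a free variable with all the fixed ones.
    W = {i: y[i] * sum(y[j] * alpha[j] * K[i][j] for j in outside)
         for i in inside}
    linear = sum(alpha[i] * (1.0 - W[i]) for i in inside)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in inside for j in inside)
    return linear - 0.5 * quad


# Four 1-D points with a linear kernel; only indices 1 and 2 are free.
x = [-2.0, -1.0, 1.0, 2.0]
y = [-1, -1, 1, 1]
K = [[xi * xj for xj in x] for xi in x]
I = [1, 2]

alpha_a = [0.1, 0.2, 0.2, 0.1]
alpha_b = [0.1, 0.7, 0.4, 0.1]  # differs from alpha_a only on I

const_a = dual_objective(alpha_a, y, K) - reduced_objective(alpha_a, y, K, I)
const_b = dual_objective(alpha_b, y, K) - reduced_objective(alpha_b, y, K, I)
print(abs(const_a - const_b) < 1e-9)
```

Because the difference is a constant, any step that increases $L_I$ increases $L$ by the same amount, which is what the iterative scheme relies on.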

3 More Efficient Solvers

Decomposition methods avoid this by keeping $|I|$ small. One such method is to fix the size of $I$ over all iterations. The main question is how to pick $I$ such that $L_I$ always increases, because if we can do that then we will reach convergence.

Fact: It is sufficient to include one KKT violator in $I$ to guarantee that $L$ goes up.

Why? Because $I$ contains a KKT violator, the current setting of the variables in $I$ is not optimal for the subproblem, so solving for $L_I$ makes $L_I$ go up. Therefore $L$ goes up as well (the difference between them is constant, since the other parameters are not changed).

So the main question is which $I$ to choose at each step. A small $I$ will probably give a small improvement in $L$ and therefore require many iterations, but each QP is easy to solve, so each iteration is fast. A large $I$ may need fewer iterations, but each iteration will be slower. SVMlight picks $|I| = q$ small but $> 2$. SMO takes the tradeoff to the extreme, using $|I| = 2$. The advantage of SMO is that the QP can be solved analytically: there is no need for a QP engine for the subproblem, and the run time per iteration is very fast.

libSVM implements SMO; it differs from the original formulation of SMO in its stopping criterion and in its choice of $I$. It also avoids the backtracking search for an $I$ that makes progress, which the original formulation could require. In the following we provide some of the details of libSVM. We first provide the following three ingredients:

  • new criterion to check optimality
  • from this, a method to choose I of size 2
  • analytic solution of size 2 problem
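To give a feel for why a subproblem of size 2 needs no QP engine, here is a minimal sketch of the standard clipped two-variable update. It is an illustration under stated assumptions, not necessarily the exact form used in the lecture or in libSVM: the function names are hypothetical, the kernel matrix is a nested list, and a pair with non-positive curvature is simply left unchanged. Moving $\alpha_j$ by $t$ and $\alpha_i$ by $-y_i y_j t$ keeps $\sum_i \alpha_i y_i$ fixed, and along that line the dual is a one-dimensional quadratic in $t$ whose maximizer is then clipped to the box.

```python
def grad_at(alpha, y, K, k):
    """Partial derivative of the dual with respect to alpha_k."""
    return 1.0 - y[k] * sum(alpha[m] * y[m] * K[k][m] for m in range(len(alpha)))


def smo_pair_step(alpha, y, K, C, i, j):
    """Maximize the dual over (alpha_i, alpha_j) with all other alphas fixed."""
    s = y[i] * y[j]
    # Curvature along the feasible direction; assumed positive here.
    eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
    if eta <= 0.0:
        return list(alpha)  # assumption: skip degenerate pairs

    # Range of alpha_j that keeps both variables inside [0, C].
    if s < 0:
        lo = max(0.0, alpha[j] - alpha[i])
        hi = min(C, C + alpha[j] - alpha[i])
    else:
        lo = max(0.0, alpha[i] + alpha[j] - C)
        hi = min(C, alpha[i] + alpha[j])

    # Unconstrained maximizer along the line, then clip to [lo, hi].
    g_i = grad_at(alpha, y, K, i)
    g_j = grad_at(alpha, y, K, j)
    new_j = alpha[j] + (g_j - s * g_i) / eta
    new_j = min(hi, max(lo, new_j))

    updated = list(alpha)
    updated[j] = new_j
    updated[i] = alpha[i] - s * (new_j - alpha[j])  # preserves sum_i alpha_i y_i
    return updated


# Toy problem: two 1-D points at -1 and +1 with a linear kernel.
y = [-1, 1]
K = [[1.0, -1.0], [-1.0, 1.0]]
print(smo_pair_step([0.0, 0.0], y, K, C=1.0, i=0, j=1))
```

On this toy problem a single step from zero reaches $\alpha = (0.5, 0.5)$; with a smaller $C$ the same step would stop at the boundary of the box instead.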

Stopping Criterion and Choice of I: Define

$$g_i = \frac{\partial L}{\partial \alpha_i} = 1 - y_i \sum_j \alpha_j y_j K_{i,j} \quad \text{and} \quad g_i^* = g_i \big|_{\alpha = \alpha^*}$$

where $\alpha^*$ is the optimum $\alpha$ vector.
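As a sanity check on the formula $g_i = 1 - y_i \sum_j \alpha_j y_j K_{i,j}$, the following sketch compares it with a central finite difference of the dual objective. The function names and the step size `h` are choices made for this illustration, and the kernel matrix is assumed symmetric.

```python
def dual_objective(alpha, y, K):
    """Dual objective L(alpha)."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def gradient(alpha, y, K):
    """Closed form: g_i = 1 - y_i * sum_j alpha_j y_j K[i][j]."""
    n = len(alpha)
    return [1.0 - y[i] * sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            for i in range(n)]


def numeric_gradient(alpha, y, K, h=1e-5):
    """Central finite difference of the dual objective, one coordinate at a time."""
    grads = []
    for k in range(len(alpha)):
        up = list(alpha)
        down = list(alpha)
        up[k] += h
        down[k] -= h
        grads.append((dual_objective(up, y, K) - dual_objective(down, y, K)) / (2 * h))
    return grads


x = [-2.0, -1.0, 1.0, 2.0]
y = [-1, -1, 1, 1]
K = [[xi * xj for xj in x] for xi in x]
alpha = [0.1, 0.2, 0.2, 0.1]

exact = gradient(alpha, y, K)
approx = numeric_gradient(alpha, y, K)
print(all(abs(e - a) < 1e-6 for e, a in zip(exact, approx)))
```

At $\alpha = 0$ every component of the gradient equals 1, so each variable can initially increase the objective.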

We must always satisfy $\sum_i \alpha_i y_i = 0$, so if we increase one $\alpha_i y_i$ in SMO, we must decrease the other. We know $\alpha_i y_i \in [0, C]$ if $y_i = 1$ and $\alpha_i y_i \in [-C, 0]$ if $y_i = -1$. We will use the notation