

These lecture notes were distributed for the Machine Learning course at the Pakistan Institute of Engineering and Applied Sciences, Islamabad (PIEAS). Main points: Algorithms, SVM, Optimization, KKT, Conditions, Shrinking, Chunking, Iterative, Solvers, Decomposition, Criterion


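In previous lectures we derived the following form for the SVM optimization problem (L1 soft margin):

$$\max_{\alpha}\ \sum_{i} \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C,\quad \sum_{i} \alpha_i y_i = 0$$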
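To make the objective and its two constraints concrete, here is a minimal sketch in plain Python that evaluates the dual objective and checks feasibility. The toy data, the linear kernel, and the function names (`gram_matrix`, `dual_objective`, `is_feasible`) are assumptions made for illustration; they are not part of the lecture.

```python
def linear_kernel(x, z):
    """Plain dot product; any positive semidefinite kernel could be used instead."""
    return sum(a * b for a, b in zip(x, z))


def gram_matrix(X, kernel=linear_kernel):
    """K[i][j] = K(x_i, x_j) for every pair of training points."""
    return [[kernel(xi, xj) for xj in X] for xi in X]


def dual_objective(alpha, y, K):
    """sum_i alpha_i - 1/2 * sum_{i,j} alpha_i alpha_j y_i y_j K[i][j]."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def is_feasible(alpha, y, C, tol=1e-9):
    """Check the box constraint 0 <= alpha_i <= C and sum_i alpha_i y_i = 0."""
    in_box = all(-tol <= a <= C + tol for a in alpha)
    balanced = abs(sum(a * yi for a, yi in zip(alpha, y))) <= tol
    return in_box and balanced


# Hypothetical toy problem: four 2-D points with labels +1 / -1.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
y = [-1, 1, 1, -1]
K = gram_matrix(X)
alpha = [0.5, 0.5, 0.25, 0.25]

value = dual_objective(alpha, y, K)
print("feasible:", is_feasible(alpha, y, C=1.0))
print("dual objective:", value)
```

The double loop costs O(n²) per evaluation, which is fine for a sketch but is one reason practical solvers avoid re-evaluating the full objective.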
Basic points:
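$$L = \sum_{i} \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K_{i,j} = \sum_{i \in I} \alpha_i + \text{Const} - \frac{1}{2}\sum_{i \in I}\sum_{j \in I} \alpha_i \alpha_j y_i y_j K_{i,j} + \text{Const} - \sum_{i \in I} \alpha_i y_i \sum_{j \notin I} y_j \alpha_j K_{i,j}$$

Let $W_i = y_i \sum_{j \notin I} y_j \alpha_j K_{i,j}$. Then $L = \text{Const} + L_I$, where

$$L_I = \sum_{i \in I} \alpha_i (1 - W_i) - \frac{1}{2}\sum_{i,j \in I} \alpha_i \alpha_j y_i y_j K_{i,j}$$

The reduced optimization problem is to maximize $L_I$ subject to $0 \le \alpha_i \le C$ for $i \in I$ and $\sum_{i \in I} \alpha_i y_i = -\sum_{i \notin I} \alpha_i y_i$. Therefore, like shrinking, chunking gives a subproblem of the same type. We can use this to develop an iterative algorithm as follows: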
Decomposition methods avoid this by keeping |I| small. One such method is to fix the size of I over all iterations. The main question is how to pick I s.t. LI always increases, because if we can do that then we will reach convergence.
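The reduction to the working-set objective can be checked numerically. The sketch below computes $W_i$ and $L_I$ for a working set $I$ and confirms that $L - L_I$ does not change when only the variables in $I$ are modified, so any increase in $L_I$ is an equal increase in $L$. The Gram matrix, labels, and multiplier values are made-up illustrative numbers, and the function names are assumptions.

```python
def dual_objective(alpha, y, K):
    """Full objective L = sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j K[i][j]."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def reduced_objective(alpha, y, K, I):
    """Working-set objective L_I; variables outside I enter only through W_i."""
    inside = set(I)
    outside = [j for j in range(len(alpha)) if j not in inside]
    # W_i = y_i * sum_{j not in I} y_j alpha_j K[i][j]
    W = {i: y[i] * sum(y[j] * alpha[j] * K[i][j] for j in outside) for i in I}
    linear = sum(alpha[i] * (1.0 - W[i]) for i in I)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j] for i in I for j in I)
    return linear - 0.5 * quad


# Hypothetical symmetric Gram matrix and labels (illustrative values only).
K = [[2.0, 1.0, 0.0, 1.0],
     [1.0, 2.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [1.0, 0.0, 1.0, 2.0]]
y = [1, -1, 1, -1]
I = [0, 2]

# Two settings that differ only on the working set I = {0, 2}.
alpha_a = [0.2, 0.4, 0.3, 0.1]
alpha_b = [0.7, 0.4, 0.0, 0.1]

const_a = dual_objective(alpha_a, y, K) - reduced_objective(alpha_a, y, K, I)
const_b = dual_objective(alpha_b, y, K) - reduced_objective(alpha_b, y, K, I)
print("L - L_I for setting a:", const_a)
print("L - L_I for setting b:", const_b)
```

The identity relies on the Gram matrix being symmetric, which is what lets the two cross terms between $I$ and its complement merge into the single $W_i$ term.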
Fact: It is sufficient to include one KKT violator in I to guarantee that L goes up.
Why? Because I contains a KKT violator, the current setting of the variables in I is not optimal for the subproblem, so solving for $L_I$ makes $L_I$ go up. Therefore L goes up as well (the difference between them is constant, since the other parameters are not changed).

So the main question is which I to choose at each step. Notice that a small I will probably give a small improvement in L, and therefore require many iterations, but each QP is easy to solve and so each iteration is fast. On the other hand, a large I may need fewer iterations, but each iteration will be slower. SVMlight picks |I| = q small but > 2. SMO takes the tradeoff to the extreme, using |I| = 2. The advantage of SMO is that the QP can be solved analytically: there is no need for a QP engine for the subproblem, and we get a very fast run time per iteration.

libSVM implements SMO; it differs from the original formulation of SMO in its stopping criterion and in its choice of I. It also avoids the backtracking search for an I that makes progress, which the original formulation could require. In the following we provide some of the details of libSVM. We first provide the following three ingredients:
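As an aside on the analytic solve for |I| = 2 mentioned above, the following is a minimal sketch of one two-variable update. It uses the standard textbook closed form: a one-dimensional Newton step along the line that preserves ∑ αiyi, followed by clipping to the box [0, C]. It is not libSVM's actual implementation, the pair (i, j) is simply given rather than selected, and the Gram matrix, labels, and the name `smo_step` are assumptions for illustration.

```python
def dual_objective(alpha, y, K):
    """L = sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j K[i][j]."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def smo_step(alpha, y, K, C, i, j):
    """Maximize L over (alpha_i, alpha_j) only, keeping sum_k alpha_k y_k fixed."""
    n = len(alpha)

    def grad(k):
        # Partial derivative of L with respect to alpha_k.
        return 1.0 - y[k] * sum(alpha[m] * y[m] * K[k][m] for m in range(n))

    s = y[i] * y[j]
    # Negative curvature of L along the feasible line alpha_i + s * alpha_j = const.
    eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
    if eta <= 0.0:
        # Degenerate direction: this sketch leaves the pair unchanged (assumption).
        return list(alpha)

    # Interval for alpha_j that keeps both variables inside [0, C].
    if s < 0:
        lo, hi = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        lo, hi = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])

    # Unconstrained maximizer along the line, then clip to [lo, hi].
    new_j = alpha[j] + (grad(j) - s * grad(i)) / eta
    new_j = min(hi, max(lo, new_j))

    updated = list(alpha)
    updated[j] = new_j
    updated[i] = alpha[i] + s * (alpha[j] - new_j)  # preserves the equality constraint
    return updated


# Hypothetical Gram matrix (linear kernel on four toy points) and labels.
K = [[0.0, 0.0, 0.0, 0.0],
     [0.0, 2.0, 2.0, 2.0],
     [0.0, 2.0, 4.0, 0.0],
     [0.0, 2.0, 0.0, 4.0]]
y = [-1, 1, 1, -1]
alpha0 = [0.0, 0.0, 0.0, 0.0]

alpha1 = smo_step(alpha0, y, K, C=1.0, i=0, j=1)
print("before:", alpha0, "L =", dual_objective(alpha0, y, K))
print("after: ", alpha1, "L =", dual_objective(alpha1, y, K))
```

Because the restricted objective is a concave quadratic in one variable, clipping the unconstrained maximizer to the interval gives the exact constrained maximizer, which is why no QP engine is needed for this subproblem.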
Stopping Criterion and Choice of I: Define $g_i = \frac{\partial L}{\partial \alpha_i} = 1 - y_i \sum_{j} \alpha_j y_j K_{i,j}$ and $g_i^* = g_i|_{\alpha = \alpha^*}$, where $\alpha^*$ is the optimum $\alpha$ vector.
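The gradient formula can be sanity-checked against a central finite difference of L. The sketch below does this in plain Python; the Gram matrix, labels, and multiplier values are illustrative assumptions, and the formula presumes a symmetric Gram matrix.

```python
def dual_objective(alpha, y, K):
    """L = sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j K[i][j]."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def gradient(alpha, y, K):
    """g_i = 1 - y_i * sum_j alpha_j y_j K[i][j], for every i."""
    n = len(alpha)
    return [1.0 - y[i] * sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            for i in range(n)]


def numeric_gradient(alpha, y, K, h=1e-5):
    """Central finite difference of L, used only as a cross-check."""
    grads = []
    for i in range(len(alpha)):
        up = list(alpha)
        down = list(alpha)
        up[i] += h
        down[i] -= h
        grads.append((dual_objective(up, y, K) - dual_objective(down, y, K)) / (2.0 * h))
    return grads


# Hypothetical symmetric Gram matrix, labels, and multipliers.
K = [[2.0, 1.0, 0.0, 1.0],
     [1.0, 2.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [1.0, 0.0, 1.0, 2.0]]
y = [1, -1, 1, -1]
alpha = [0.2, 0.4, 0.3, 0.1]

g = gradient(alpha, y, K)
g_numeric = numeric_gradient(alpha, y, K)
print("analytic gradient:", g)
print("numeric gradient: ", g_numeric)
```

Since L is quadratic, the central difference agrees with the analytic gradient up to rounding error, so a mismatch would indicate a sign or index mistake in the formula.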
We must always satisfy $\sum_i \alpha_i y_i = 0$, so if we increase one $\alpha_i y_i$ in SMO, we must decrease the other. We know $\alpha_i y_i \in [0, C]$ if $y_i = 1$ and $\alpha_i y_i \in [-C, 0]$ if $y_i = -1$. We will use the notation