Support Vector Machines (SVMs) for Classification: Wide Margin and Misclassification

An overview of support vector machines (SVMs) for classification, focusing on maximizing the margin while minimizing misclassifications. Covers the optimization problem, Lagrangian relaxation, and the kernel trick. SVMs are a powerful scheme for classifying data with wide margins and few misclassifications, and many standard kernels are available (linear, polynomial, RBF, string).


Classification

(SVMs / Kernel method)


LP versus Quadratic programming

$$\text{LP:}\qquad \min\; c^T x \quad \text{s.t.}\quad Ax \geq b,\;\; x \geq 0$$

$$\text{QP:}\qquad \min\; x^T Q x + c^T x \quad \text{s.t.}\quad Ax \geq b,\;\; x \geq 0$$

  • LP: linear constraints, linear objective function. LP can be solved in polynomial time.
  • QP: the objective function also contains a quadratic form. For positive semidefinite Q, the QP can be solved in polynomial time.
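As a concrete illustration (my own sketch, not from the slides), a QP of the form above can be handed directly to an off-the-shelf solver; here via the cvxpy modeling library, with made-up problem data Q, c, A, b:

```python
import cvxpy as cp
import numpy as np

# Made-up problem data; Q = M^T M is positive semidefinite by construction.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
Q = M.T @ M
c = rng.standard_normal(3)
A = rng.standard_normal((2, 3))
b = -np.ones(2)  # chosen so that x = 0 is feasible

# min x^T Q x + c^T x  s.t.  Ax >= b, x >= 0
x = cp.Variable(3)
problem = cp.Problem(cp.Minimize(cp.quad_form(x, Q) + c @ x),
                     [A @ x >= b, x >= 0])
problem.solve()
print(problem.value, x.value)
```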


Separating by a wider margin

  • Solutions with a wider margin are better.
  • The two margin hyperplanes $\beta^T x - \beta_0 = \pm 1$ lie a distance $2/\|\beta\|$ apart, so we:

$$\text{Maximize}\;\; \frac{2}{\|\beta\|}\,, \quad\text{or}\quad \text{Minimize}\;\; \frac{\|\beta\|^2}{2}$$


Separating via misclassification

  • In general, the data is not linearly separable.
  • What if we also want to minimize the number of misclassified points?
  • Recall that each sample $x_i$ in our training set has a label $y_i \in \{-1, 1\}$.
  • For each point $i$, $y_i(\beta^T x_i - \beta_0)$ should be positive.
  • Define $\xi_i \geq \max\{0,\; 1 - y_i(\beta^T x_i - \beta_0)\}$.
  • If point $i$ is correctly classified ($y_i(\beta^T x_i - \beta_0) \geq 1$), then $\xi_i = 0$.
  • If point $i$ is misclassified, or close to the boundary, then $\xi_i > 0$.
  • We must minimize $\sum_i \xi_i$ (see the sketch below).
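A minimal sketch (my own, not from the slides) of computing these slack values $\xi_i$ with NumPy, using made-up data X, y and parameters beta, beta0:

```python
import numpy as np

def slacks(X, y, beta, beta0):
    """Slack xi_i = max(0, 1 - y_i * (beta^T x_i - beta0)) for each sample."""
    margins = y * (X @ beta - beta0)
    return np.maximum(0.0, 1.0 - margins)

# Made-up example: four samples in two dimensions.
X = np.array([[2.0, 1.0], [1.0, -1.0], [-1.5, 0.5], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
beta, beta0 = np.array([1.0, 0.5]), 0.0

xi = slacks(X, y, beta, beta0)
print(xi)        # 0 for confidently correct points, > 0 otherwise
print(xi.sum())  # the quantity we want to keep small
```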


Reformulating the optimization

$$\min_{\beta,\,\beta_0,\,\xi}\;\; \frac{\|\beta\|^2}{2} + C \sum_i \xi_i$$

$$\text{s.t.}\quad \xi_i \geq 0\,, \qquad \xi_i \geq 1 - y_i\left(\beta^T x_i - \beta_0\right)$$


Lagrangian relaxation

  • Goal:

$$\min\;\; \frac{\|\beta\|^2}{2} + C \sum_i \xi_i$$

  • S.t.:

$$\xi_i \geq 0\,, \qquad \xi_i \geq 1 - y_i\left(\beta^T x_i - \beta_0\right)$$

  • We minimize the Lagrangian, with multipliers $\alpha_i \geq 0$ and $\mu_i \geq 0$:

$$L = \frac{\|\beta\|^2}{2} + C \sum_i \xi_i + \sum_i \alpha_i \left(1 - \xi_i - y_i\left(\beta^T x_i - \beta_0\right)\right) - \sum_i \mu_i \xi_i$$


Substituting

  • Setting the partial derivatives of $L$ to zero:

$$\frac{\partial L}{\partial \beta} = 0 \;\Rightarrow\; \beta = \sum_i \alpha_i y_i x_i \qquad (1)$$

$$\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; C - \alpha_i - \mu_i = 0 \qquad (2)$$

$$\frac{\partial L}{\partial \beta_0} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0 \qquad (3)$$

  • Substituting (1):

$$L = -\frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, x_i^T x_j \;+\; \sum_i \alpha_i \;+\; \sum_i \left(C - \alpha_i - \mu_i\right) \xi_i \;+\; \beta_0 \sum_i \alpha_i y_i$$


  • Substituting (2, 3), the remaining terms vanish, leaving

$$L = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, x_i^T x_j$$

  • Maximizing $L$ over $\alpha$ gives the minimization problem

$$\min_{\alpha}\;\; \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, x_i^T x_j - \sum_i \alpha_i$$

$$\text{s.t.}\quad \sum_i y_i \alpha_i = 0\,, \qquad 0 \leq \alpha_i \leq C$$
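A minimal sketch (my own illustration) of solving this dual QP with cvxpy on the made-up X, y from earlier. Note that the data enters only through the dot products $x_i^T x_j$, which is what makes the kernel method below possible:

```python
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 1.0], [1.0, -1.0], [-1.5, 0.5], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)
C = 1.0

K = X @ X.T             # Gram matrix of dot products x_i^T x_j
Q = np.outer(y, y) * K  # Q_ij = y_i y_j x_i^T x_j; PSD since K is

alpha = cp.Variable(n)
# psd_wrap asserts Q is PSD despite floating-point noise in its eigenvalues.
objective = cp.Minimize(0.5 * cp.quad_form(alpha, cp.psd_wrap(Q)) - cp.sum(alpha))
constraints = [y @ alpha == 0, alpha >= 0, alpha <= C]
cp.Problem(objective, constraints).solve()

beta = X.T @ (alpha.value * y)  # recover beta via equation (1)
print(alpha.value, beta)
```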


The kernel method

  • The SVM formulation can be solved using QP on dot products.
  • Because these are wide-margin classifiers, they provide a more robust solution.
  • However, the true power of SVMs comes from using "the kernel method", which allows us to move to higher-dimensional (and non-linear) spaces.


Kernels

  • Let $X$ be a set of objects.
    • Ex: $X$ = the set of samples in micro-arrays.
    • Each object $x \in X$ is a vector of gene expression values.
  • $k: X \times X \to \mathbb{R}$ is a positive semidefinite kernel if:
    • $k$ is symmetric: $k(x, x') = k(x', x)$
    • $k$ is positive semidefinite: $c^T K c \geq 0$ for all $c \in \mathbb{R}^p$, where $K$ is the Gram matrix of $k$ on any $p$ objects.
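Both conditions can be checked numerically on a finite sample (my own sketch, not from the slides):

```python
import numpy as np

def is_psd_kernel(kernel, xs, tol=1e-9):
    """Check symmetry and positive semidefiniteness of the Gram matrix
    of `kernel` over the finite sample `xs`."""
    K = np.array([[kernel(a, b) for b in xs] for a in xs])
    symmetric = np.allclose(K, K.T)
    # A symmetric matrix is PSD iff all its eigenvalues are >= 0.
    psd = bool(np.all(np.linalg.eigvalsh(K) >= -tol))
    return symmetric and psd

xs = [np.array([1.0, 2.0]), np.array([0.5, -1.0]), np.array([3.0, 0.0])]
print(is_psd_kernel(lambda a, b: a @ b, xs))  # True: the linear kernel is PSD
```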



The linear kernel is positive semidefinite

  • Recall $X$ as a matrix in which each column is a sample: $X = [x_1\; x_2\; \ldots]$
  • By definition, the linear kernel is $k_L = X^T X$
  • For any $c$:

$$c^T k_L c = c^T X^T X c = \|Xc\|^2 \geq 0$$


Generalizing kernels

  • Any object can be represented by a feature vector in real space:

$$\Phi: X \to \mathbb{R}^p$$

$$k(x, x') = \Phi(x)^T\, \Phi(x')$$
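For instance (my own sketch, using a standard textbook example not spelled out in the slides), the feature map $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ on $\mathbb{R}^2$ induces the degree-2 polynomial kernel $k(x, x') = (x^T x')^2$:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on R^2."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k_poly2(x, xp):
    """The same kernel, computed without ever forming the feature vectors."""
    return (x @ xp) ** 2

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(x) @ phi(xp))  # 1.0
print(k_poly2(x, xp))    # 1.0 -- identical
```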


The kernel trick

  • If an algorithm for vectorial data is expressed exclusively in terms of dot products, it can be converted into an algorithm over an arbitrary kernel.
    • Simply replace each dot product with the kernel.


Kernel trick example

  • Consider a kernel $k$ defined via a mapping $\Phi$:
    • $k(x, x') = \Phi(x)^T\, \Phi(x')$
  • It could be that $\Phi$ is very difficult to compute explicitly, but $k$ is easy to compute.
  • Suppose we define a distance function between two objects as

$$d(x, x') = \left\| \Phi(x) - \Phi(x') \right\|$$

  • How do we compute this distance?

$$d(x, x')^2 = \left\| \Phi(x) - \Phi(x') \right\|^2 = \Phi(x)^T \Phi(x) + \Phi(x')^T \Phi(x') - 2\, \Phi(x)^T \Phi(x') = k(x, x) + k(x', x') - 2\, k(x, x')$$
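A minimal sketch (my own illustration) of computing this feature-space distance purely through kernel evaluations, using the RBF kernel $k(x, x') = \exp(-\gamma \|x - x'\|^2)$, whose implicit feature map is infinite-dimensional and so could never be formed explicitly:

```python
import numpy as np

def k_rbf(x, xp, gamma=0.5):
    """RBF kernel; its implicit feature map is infinite-dimensional."""
    diff = x - xp
    return np.exp(-gamma * (diff @ diff))

def feature_space_distance(x, xp, k):
    """d(x, x') = sqrt(k(x,x) + k(x',x') - 2 k(x,x')), with no Phi in sight."""
    return np.sqrt(k(x, x) + k(xp, xp) - 2.0 * k(x, xp))

x, xp = np.array([1.0, 2.0]), np.array([2.0, 0.0])
print(feature_space_distance(x, xp, k_rbf))  # distance in the implicit feature space
```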