Support Vector Machines (SVM) in Machine Learning: Optimization and Classification (Study notes, Computer Science)

An overview of support vector machines (SVMs) in machine learning, focusing on their classification and regression methods. SVMs aim to find optimal linear or non-linear hyperplanes in feature space for separating data, with the goal of maximizing the margin between classes. The notes also cover the concept of support vectors and the use of Mercer kernels for non-linear data.



Support Vector Machines

Greg Grudic

(Notes borrowed from Bernhard Schölkopf)

Today’s Lecture Goals

  • Support Vector Machine Classification
    • Same assumptions on data!
    • Different Optimization problem.
  • Support Vector Machine Regression
    • Same assumptions on data!
    • Different Optimization problem.

A good text on SVMs: Bernhard Schölkopf and Alex Smola, Learning with Kernels. MIT Press, Cambridge, MA, 2002.


Support Vector Machine Classification

  • Classification is posed as the problem of finding optimal (canonical) linear hyperplanes.

  • Optimal Linear Separating Hyperplanes:
    • In Feature Space
    • In Kernel Space

Linear Separating Hyper-Planes

How many lines can separate these points? Which line should we use?

(Figure: two classes of points with several candidate separating lines.)


Linear Separating Hyper-Planes

(Figure: data in the $(x_1, x_2)$ plane. The separating hyperplane is $\mathbf{w}^T \mathbf{x} + b = 0$; points with $\mathbf{w}^T \mathbf{x} + b > 0$ are labeled $y = +1$, and points with $\mathbf{w}^T \mathbf{x} + b < 0$ are labeled $y = -1$.)

Linear Separating Hyper-Planes

  • Given data: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$
  • Finding a separating hyperplane can be posed as a constraint satisfaction problem:

$$\forall i \in \{1, \ldots, N\}, \text{ find } \mathbf{w}, b \text{ such that } \begin{cases} \mathbf{w}^T \mathbf{x}_i + b \ge +1 & \text{if } y_i = +1 \\ \mathbf{w}^T \mathbf{x}_i + b \le -1 & \text{if } y_i = -1 \end{cases}$$

  • Or, equivalently:

$$y_i \left( \mathbf{w}^T \mathbf{x}_i + b \right) - 1 \ge 0, \quad \forall i$$
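
As a quick check of this constraint-satisfaction view, here is a minimal NumPy sketch (toy points and a hypothetical candidate (w, b), not values from the lecture) that tests whether y_i(w^T x_i + b) - 1 >= 0 holds for every example:

```python
import numpy as np

# Hypothetical 2-D toy data: two points per class (illustrative only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([+1, +1, -1, -1])

# A candidate canonical hyperplane (assumed values for illustration).
w = np.array([0.5, 0.5])
b = 0.0

# Canonical separating-hyperplane constraints: y_i (w^T x_i + b) - 1 >= 0 for all i.
slack = y * (X @ w + b) - 1
print(slack)                 # per-example constraint value
print(np.all(slack >= 0))    # True if (w, b) separates the data canonically
```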


Margin For Canonical Hyperplane

Calculating the margin: for the canonical hyperplane, the closest points $\mathbf{x}_i$ on either side satisfy $|\mathbf{w}^T \mathbf{x}_i + b| = 1$, so the distance between the two margin hyperplanes is

$$\text{margin} = \frac{2}{\|\mathbf{w}\|}$$

How do SVMs choose the boundary? Find the one with the maximum MARGIN!


SVM: Constraint Optimization Problem

  • Given data: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$
  • Minimize $\frac{1}{2}\|\mathbf{w}\|^2$ subject to:

$$y_i \left( \mathbf{w}^T \mathbf{x}_i + b \right) - 1 \ge 0, \quad \forall i = 1, \ldots, N$$
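
A minimal sketch of solving this problem in practice, assuming scikit-learn is available: a hard margin can be approximated with SVC and a very large C, after which w, b, and the margin 2/||w|| can be read off. The toy data is made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable toy data (not from the lecture).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-2.5, -3.0]])
y = np.array([+1, +1, +1, -1, -1, -1])

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]          # weight vector w
b = clf.intercept_[0]     # bias b
margin = 2.0 / np.linalg.norm(w)
print("w =", w, "b =", b, "margin =", margin)
```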

The Lagrange Function Formulation

Introducing one Lagrange multiplier $\alpha_i \ge 0$ per constraint gives the Lagrangian:

$$L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{N} \alpha_i \left[ y_i \left( \mathbf{w}^T \mathbf{x}_i + b \right) - 1 \right]$$


Derivation of the Dual Problem

  • At the saddle point (extremum):

$$\frac{\partial}{\partial \mathbf{w}} L(\mathbf{w}, \boldsymbol{\alpha}, b) = 0, \qquad \frac{\partial}{\partial b} L(\mathbf{w}, \boldsymbol{\alpha}, b) = 0$$

  • This gives the conditions:

$$\sum_{i=1}^{N} \alpha_i y_i = 0, \qquad \mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i$$

  • Substitute these into $L(\mathbf{w}, \boldsymbol{\alpha}, b)$ to get the dual problem.

Dual Problem

  • Maximize:

$$W(\boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \langle \mathbf{x}_i, \mathbf{x}_j \rangle$$

  • Subject to:

$$\alpha_i \ge 0, \quad i = 1, \ldots, N, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0$$
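
A small sketch of solving this dual numerically with SciPy's SLSQP solver (an assumption; the lecture does not prescribe a solver), on toy separable data:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical separable toy data (same shape of problem as in the slides).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
N = len(y)

# Gram matrix of dot products <x_i, x_j> and the quadratic term y_i y_j <x_i, x_j>.
K = X @ X.T
Q = (y[:, None] * y[None, :]) * K

# Negate W(alpha) because scipy minimizes.
def neg_dual(alpha):
    return -(alpha.sum() - 0.5 * alpha @ Q @ alpha)

constraints = [{"type": "eq", "fun": lambda a: a @ y}]   # sum_i alpha_i y_i = 0
bounds = [(0.0, None)] * N                               # alpha_i >= 0

res = minimize(neg_dual, x0=np.zeros(N), method="SLSQP",
               bounds=bounds, constraints=constraints)
alpha = res.x

# Recover w from the saddle-point condition w = sum_i alpha_i y_i x_i.
w = (alpha * y) @ X
print("alpha =", alpha.round(4), "w =", w)
```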


Support Vector Expansion (1)

$$\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i$$

$$y_i \left( \langle \mathbf{w}, \mathbf{x}_i \rangle + b \right) > 1 \;\Rightarrow\; \alpha_i = 0 \;\rightarrow\; \mathbf{x}_i \text{ irrelevant}$$

OR

$$y_i \left( \langle \mathbf{w}, \mathbf{x}_i \rangle + b \right) = 1 \text{ (on margin)} \;\rightarrow\; \mathbf{x}_i \text{ is a Support Vector}$$

Support Vector Expansion (2)

$$\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i$$

$$f(\mathbf{x}) = \operatorname{sgn}\left( \langle \mathbf{w}, \mathbf{x} \rangle + b \right)$$

Substitute the expansion of $\mathbf{w}$ to get, equivalently:

$$f(\mathbf{x}) = \operatorname{sgn}\left( \sum_{i=1}^{N} \alpha_i y_i \langle \mathbf{x}_i, \mathbf{x} \rangle + b \right)$$
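
A minimal sketch of this expansion in code, assuming a scikit-learn SVC fit on toy data: its stored dual coefficients (the products alpha_i y_i for the support vectors) and intercept reproduce the decision function above.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data for illustration.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([+1, +1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

def decision(x):
    # f(x) = sgn( sum_i alpha_i y_i <x_i, x> + b ), summing over support vectors only.
    alpha_y = clf.dual_coef_[0]          # alpha_i * y_i for each support vector
    sv = clf.support_vectors_            # the support vectors x_i
    b = clf.intercept_[0]
    return np.sign(alpha_y @ (sv @ x) + b)

print(decision(np.array([2.5, 2.5])))    # expected +1
print(decision(np.array([-2.5, -2.5])))  # expected -1
```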


What are the Support Vectors?

(Figure: the maximized margin; the support vectors are the training points lying on the margin.)

Why do we want a model with only a few SVs?

  • Leaving out an example that does not become an SV gives the same solution!
  • Theorem (Vapnik and Chervonenkis, 1974): Let #SV(N) be the number of SVs obtained by training on N examples randomly drawn from P(X, Y), and E be an expectation. Then

$$E\left[ \text{Prob(test error)} \right] \le \frac{E\left[ \#\text{SV}(N) \right]}{N}$$
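
A brief sketch of what this bound means in practice, assuming scikit-learn and synthetic data: the fraction of support vectors after training gives a rough upper estimate of the expected test error.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic two-class data (illustrative, not the lecture's data).
N = 200
X = np.vstack([rng.normal(+1.5, 1.0, size=(N // 2, 2)),
               rng.normal(-1.5, 1.0, size=(N // 2, 2))])
y = np.array([+1] * (N // 2) + [-1] * (N // 2))

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# #SV(N) / N: a rough estimate related to the Vapnik-Chervonenkis bound.
n_sv = clf.support_vectors_.shape[0]
print(f"{n_sv} support vectors out of {N} examples -> bound estimate {n_sv / N:.2f}")
```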


What Happens When Data is Not Separable: Soft Margin SVM

Add a slack variable $\xi_i$ for each training point $\mathbf{x}_i$:

$$\xi_i = \begin{cases} 0 & \text{if } \mathbf{x}_i \text{ is correctly classified} \\ \text{distance of } \mathbf{x}_i \text{ to its margin} & \text{otherwise} \end{cases}$$

Soft Margin SVM: Constraint Optimization Problem

  • Given data: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$
  • Minimize

$$\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \xi_i$$

  subject to:

$$y_i \left( \mathbf{w}^T \mathbf{x}_i + b \right) \ge 1 - \xi_i, \quad \forall i = 1, \ldots, N$$
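
A short sketch of the soft-margin trade-off, assuming scikit-learn's SVC (whose C parameter plays the role of C above) on synthetic overlapping classes: small C tolerates more slack and keeps more support vectors; large C penalizes violations harder.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Overlapping (non-separable) synthetic classes.
X = np.vstack([rng.normal(+1.0, 1.5, size=(100, 2)),
               rng.normal(-1.0, 1.5, size=(100, 2))])
y = np.array([+1] * 100 + [-1] * 100)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_sv = clf.support_vectors_.shape[0]
    acc = clf.score(X, y)
    print(f"C={C:>6}: {n_sv} support vectors, training accuracy {acc:.2f}")
```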


Dual Problem (Non-separable data)

  • Maximize:

$$W(\boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \langle \mathbf{x}_i, \mathbf{x}_j \rangle$$

  • Subject to:

$$0 \le \alpha_i \le C, \quad i = 1, \ldots, N, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0$$

Same Decision Boundary

$$f(\mathbf{x}) = \operatorname{sgn}\left( \sum_{i=1}^{N} \alpha_i y_i \langle \mathbf{x}_i, \mathbf{x} \rangle + b \right)$$


Nonlinear SVMs

  • KEY IDEA: Note that both the decision boundary and the dual optimization formulation rely on dot products $\langle \mathbf{x}_i, \mathbf{x} \rangle$ in input space only!

$$f(\mathbf{x}) = \operatorname{sgn}\left( \sum_{i=1}^{N} \alpha_i y_i \langle \mathbf{x}_i, \mathbf{x} \rangle + b \right)$$

$$W(\boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \langle \mathbf{x}_i, \mathbf{x}_j \rangle$$

Mapping into Nonlinear Space


Nonlinear Data?

SVM Model Structure


Kernel Trick

Replace every dot product $\langle \mathbf{x}_i, \mathbf{x}_j \rangle$ with a kernel evaluation

$$K(\mathbf{x}_i, \mathbf{x}_j) = \langle \Phi(\mathbf{x}_i), \Phi(\mathbf{x}_j) \rangle$$

Can use the same algorithms in nonlinear kernel space!
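
A small sketch of why the trick works, using a homogeneous degree-2 polynomial kernel as the example (an assumed choice; any Mercer kernel would do): the kernel value in input space equals the dot product of explicitly mapped features Φ(x), so Φ never has to be formed.

```python
import numpy as np

def phi(x):
    # Explicit feature map for the homogeneous degree-2 polynomial kernel in 2-D:
    # Phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2).
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2.0) * x1 * x2])

def poly2_kernel(x, z):
    # K(x, z) = <x, z>^2, evaluated without ever forming Phi.
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(np.dot(phi(x), phi(z)))   # dot product in feature space
print(poly2_kernel(x, z))       # same value, computed in input space
```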

Nonlinear SVMs

Boundary:

$$f(\mathbf{x}) = \operatorname{sgn}\left( \sum_{i=1}^{N} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \right)$$

Maximize:

$$W(\boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$$
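
A minimal usage sketch of a nonlinear SVM, assuming scikit-learn's SVC with a Gaussian (RBF) kernel; the ring-shaped data is synthetic and chosen only because it is not linearly separable in input space.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Synthetic "ring" data: inner cluster vs outer ring (not linearly separable).
theta = rng.uniform(0, 2 * np.pi, 200)
inner = rng.normal(0.0, 0.3, size=(200, 2))
outer = np.column_stack([2.0 * np.cos(theta), 2.0 * np.sin(theta)]) + rng.normal(0, 0.2, (200, 2))
X = np.vstack([inner, outer])
y = np.array([+1] * 200 + [-1] * 200)

# Gaussian (RBF) kernel SVM: K(x, z) = exp(-gamma * ||x - z||^2).
clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
print("number of support vectors:", clf.support_vectors_.shape[0])
```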


Need Mercer Kernels

$$K(\mathbf{x}_i, \mathbf{x}_j) = \langle \Phi(\mathbf{x}_i), \Phi(\mathbf{x}_j) \rangle = \langle \Phi(\mathbf{x}_j), \Phi(\mathbf{x}_i) \rangle = K(\mathbf{x}_j, \mathbf{x}_i)$$

Gram (Kernel) Matrix

Training data: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$

$$K = \begin{pmatrix} K(\mathbf{x}_1, \mathbf{x}_1) & \cdots & K(\mathbf{x}_1, \mathbf{x}_N) \\ \vdots & \ddots & \vdots \\ K(\mathbf{x}_N, \mathbf{x}_1) & \cdots & K(\mathbf{x}_N, \mathbf{x}_N) \end{pmatrix}$$

Properties:
  • Positive (semi-)definite matrix
  • Symmetric
  • Positive on the diagonal
  • N by N
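
A quick sketch of building a Gram matrix and checking these properties numerically, assuming a Gaussian kernel and random toy inputs:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(6, 2))      # toy inputs x_1, ..., x_N
sigma = 1.0

# Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).
diff = X[:, None, :] - X[None, :, :]
K = np.exp(-np.sum(diff ** 2, axis=2) / (2 * sigma ** 2))

print("symmetric:", np.allclose(K, K.T))
print("positive diagonal:", np.all(np.diag(K) > 0))
print("positive semi-definite:", np.all(np.linalg.eigvalsh(K) >= -1e-10))
print("shape:", K.shape)          # N by N
```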


Commonly Used Mercer Kernels

  • Polynomial: $K(\mathbf{x}_i, \mathbf{x}_j) = \left( \langle \mathbf{x}_i, \mathbf{x}_j \rangle + c \right)^d$
  • Sigmoid: $K(\mathbf{x}_i, \mathbf{x}_j) = \tanh\left( \kappa \langle \mathbf{x}_i, \mathbf{x}_j \rangle + \theta \right)$
  • Gaussian: $K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left( -\dfrac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2} \right)$
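
A compact sketch implementing these three kernels as plain functions; the parameter names c, d, kappa, theta, and sigma follow the formulas above, and the default values are arbitrary:

```python
import numpy as np

def polynomial_kernel(x, z, c=1.0, d=3):
    # K(x, z) = (<x, z> + c)^d
    return (np.dot(x, z) + c) ** d

def sigmoid_kernel(x, z, kappa=1.0, theta=0.0):
    # K(x, z) = tanh(kappa * <x, z> + theta)
    return np.tanh(kappa * np.dot(x, z) + theta)

def gaussian_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(polynomial_kernel(x, z), sigmoid_kernel(x, z), gaussian_kernel(x, z))
```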


MNIST: An SVM Success Story

  • Handwritten character benchmark
  • 60,000 training and 10,000 testing examples
  • Dimension d = 28 x 28 = 784

Results on test data: the SVM used a polynomial kernel of degree 9.


Learning Regression Models

  • Collect Training data
  • Build Model: stock value = M ( feature space)
  • Make a prediction

(Figure: training data plotted as stock value versus feature space.)

Results on test data: Mean Squared Error (MSE)

$$\text{Test data: } (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_K, y_K), \qquad \text{MSE} = \frac{1}{K} \sum_{i=1}^{K} \left( y_i - f(\mathbf{x}_i) \right)^2$$
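
A closing sketch of SVM regression evaluated this way, assuming scikit-learn's SVR (the regression counterpart mentioned in the lecture goals) on synthetic 1-D data:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(4)

# Synthetic 1-D regression data: noisy sine curve.
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=300)

# Simple train/test split.
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

# Support Vector Regression with a Gaussian (RBF) kernel.
model = SVR(kernel="rbf", C=1.0, epsilon=0.05)
model.fit(X_train, y_train)

# MSE = (1/K) * sum_i (y_i - f(x_i))^2 on the held-out test data.
pred = model.predict(X_test)
mse = np.mean((y_test - pred) ** 2)
print("test MSE:", mse)
```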