
Support Vector Machines: Maximizing the Geometric Margin - Prof. X. Fern, Study notes of Computer Science

These notes cover the concept of support vector machines (SVMs) and the importance of maximizing the geometric margin for effective classification: the intuition behind the margin, the difference between functional and geometric margins, and the optimization problem for finding the maximum-margin classifier.


Lecture 10

Support Vector Machines

Oct 20, 2008

Linear Separators

• Which of the linear separators is optimal?

[Figure: a set of + and − points that can be separated by several different lines]

Intuition of Margin

• Consider points A, B, and C.
  – We are quite confident in our prediction for A because it is far from the decision boundary w · x + b = 0.
  – In contrast, we are not so confident in our prediction for C because a slight change in the decision boundary may flip the decision.

[Figure: decision boundary w · x + b = 0, with w · x + b > 0 on one side and w · x + b < 0 on the other; A lies far from the boundary, while B and C lie progressively closer to it]

Given a training set, we would like to make all of our predictions correct and confident! This can be captured by the concept of margin.

Functional Margin

• One possible way to define margin:

    y_i (w · x_i + b)

  – We define this as the functional margin of the linear classifier w.r.t. the training example (x_i, y_i).
• The larger the value, the better – really?
  – What if we rescale (w, b) by a factor α? Consider the linear classifier specified by (αw, αb):
    • The decision boundary remains the same.
    • Yet, the functional margin gets multiplied by α.
  – So we can change the functional margin of a linear classifier without changing anything meaningful.
• We need something more meaningful.
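To see the rescaling problem concretely, here is a minimal numeric sketch (not part of the original slides; the classifier and example below are made up): scaling (w, b) by α = 10 leaves the decision boundary unchanged but multiplies the functional margin by 10.

```python
import numpy as np

# Toy classifier (w, b) and a single labeled example (x_i, y_i);
# all numbers here are made up for illustration.
w = np.array([2.0, -1.0])
b = 0.5
x_i = np.array([1.0, -1.0])
y_i = +1

def functional_margin(w, b, x, y):
    # Functional margin: y * (w . x + b)
    return y * (np.dot(w, x) + b)

alpha = 10.0
print(functional_margin(w, b, x_i, y_i))                  # 3.5
print(functional_margin(alpha * w, alpha * b, x_i, y_i))  # 35.0: same boundary, 10x the margin
```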

Some Basic Facts about Lines

• The vector w is normal to the line w · x + b = 0.
• For a point x off the line, the value of w · x + b is proportional to how far x is from the line: the distance is |w · x + b| / ||w||.
• In particular, the line lies at distance |b| / ||w|| from the origin.

[Figure: the line w · x + b = 0 in the (x_1, x_2) plane, with the normal direction w and these distances marked]
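A quick numeric check of the distance formula, with made-up numbers (this sketch is not from the slides): stepping from x back along the unit normal w / ||w|| by the signed distance should land exactly on the line.

```python
import numpy as np

# Made-up line and point; ||w|| = 5 keeps the arithmetic readable.
w = np.array([3.0, 4.0])
b = -5.0
x = np.array([2.0, 3.0])

signed_dist = (np.dot(w, x) + b) / np.linalg.norm(w)
# Stepping from x back along the unit normal by the signed distance
# should land exactly on the line w . x + b = 0.
x_proj = x - signed_dist * w / np.linalg.norm(w)
print(signed_dist)            # 2.6
print(np.dot(w, x_proj) + b)  # ~0.0, i.e. x_proj lies on the line
```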

Geometric Margin

• The geometric margin of (w, b) w.r.t. x^(i) is the distance from x^(i) to the decision surface.
• This distance can be computed as

    γ_i = y^(i) (w · x^(i) + b) / ||w||

• Given a training set S = {(x^(i), y^(i)): i = 1, …, N}, the geometric margin of the classifier w.r.t. S is

    γ = min_{i=1,…,N} γ_i

[Figure: + and − points around the decision boundary, with the geometric margin γ drawn as the distance from the closest points (A, B, C) to the boundary]

• Note that the points closest to the boundary are called the support vectors.
  – In fact, these are the only points that really matter; the other examples are ignorable.
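A short sketch of the two definitions above, using a made-up toy training set (not from the slides): compute γ_i for every example and take the minimum to get the margin of the classifier w.r.t. S.

```python
import numpy as np

def geometric_margins(w, b, X, y):
    # gamma_i = y_i * (w . x_i + b) / ||w|| for every training example
    return y * (X @ w + b) / np.linalg.norm(w)

# Made-up, linearly separable toy set; labels are in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([+1, +1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0

gammas = geometric_margins(w, b, X, y)
print(gammas)        # per-example geometric margins
print(gammas.min())  # gamma: the margin of (w, b) w.r.t. the whole set S
```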

Maximum Margin Classifier

• This can be represented as a constrained optimization problem:

    max_{w,b} γ
    subject to: y^(i) (w · x^(i) + b) / ||w|| ≥ γ, i = 1, …, N

• This optimization problem is in a nasty form, so we need to do some rewriting.
  – Let γ' = γ · ||w||; we can rewrite this as

    max_{w,b} γ' / ||w||
    subject to: y^(i) (w · x^(i) + b) ≥ γ', i = 1, …, N
Maximum Margin Classifier

• Note that we can arbitrarily rescale w and b to make the functional margin γ' large or small.
• So we can rescale them such that γ' = 1, giving

    max_{w,b} 1 / ||w||   (or equivalently min_{w,b} ||w||²)
    subject to: y^(i) (w · x^(i) + b) ≥ 1, i = 1, …, N

Maximizing the geometric margin is equivalent to minimizing the magnitude of w, subject to maintaining a functional margin of at least 1.
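As an illustration of this final formulation (a sketch, not part of the slides), the toy problem below is handed to SciPy's general-purpose SLSQP solver; in practice a dedicated QP solver would be used, and the data here is made up.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up, linearly separable toy data; labels in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])

# Pack the variables as theta = [w_1, w_2, b].
def objective(theta):
    w = theta[:-1]
    return 0.5 * np.dot(w, w)  # minimizing ||w||^2 (halved for convenience)

# One constraint per example: y_i * (w . x_i + b) - 1 >= 0
constraints = [{'type': 'ineq',
                'fun': lambda theta, xi=xi, yi=yi: yi * (xi @ theta[:-1] + theta[-1]) - 1.0}
               for xi, yi in zip(X, y)]

res = minimize(objective, x0=np.zeros(3), method='SLSQP', constraints=constraints)
w, b = res.x[:-1], res.x[-1]
print(w, b)                     # the maximum-margin (w, b)
print(1.0 / np.linalg.norm(w))  # the achieved geometric margin
```

At the optimum, the constraint holds with equality, y^(i) (w · x^(i) + b) = 1, exactly for the support vectors.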

The Solution

• We cannot give you a closed-form solution that you can directly plug the numbers into and compute for an arbitrary data set.
• But the solution can always be written in the following form:

    w = Σ_{i=1}^{N} α_i y_i x_i,   s.t.  Σ_{i=1}^{N} α_i y_i = 0

  – This is the form of w; b can be calculated accordingly using some additional steps.
  – The weight vector is a linear combination of all the training examples.
  – Importantly, many of the α_i's are zeros.
• The points that have non-zero α_i's are the support vectors.
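As a concrete illustration (a sketch, not from the slides), library SVMs expose exactly these α's: scikit-learn's SVC stores α_i y_i for the support vectors in dual_coef_, so w can be reconstructed as the sum above and compared with the solver's own coef_.

```python
import numpy as np
from sklearn.svm import SVC

# Same made-up toy data; a large C approximates the hard-margin SVM.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([+1, +1, -1, -1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only;
# every other training example has alpha_i = 0 and is simply absent.
w_from_alphas = clf.dual_coef_ @ clf.support_vectors_
print(clf.support_)   # indices of the support vectors
print(w_from_alphas)  # w reconstructed as sum_i alpha_i y_i x_i
print(clf.coef_)      # matches the w found by the solver
```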

A Geometrical Interpretation

[Figure: Class 1 and Class 2 points with the separating hyperplane; each training point is annotated with its α_i, which is non-zero only for the support vectors on the margin]