The concept of support vector machines (SVMs) and the importance of maximizing the geometric margin for effective classification: the intuition behind the margin, the difference between functional and geometric margins, and the optimization problem for finding the maximum margin classifier.
Typology: Study notes
Intuition of Margin
Consider points A, B, and C.
- We are quite confident in our prediction for A because it is far from the decision boundary w · x + b = 0.
- In contrast, we are not so confident in our prediction for C because a slight change in the decision boundary may flip the decision.
(Figure: positive (+) and negative (−) examples separated by the boundary w · x + b = 0, with w · x + b > 0 on one side and w · x + b < 0 on the other; A lies far from the boundary while C lies very close to it.)
Given a training set, we would like to make all of our predictions correct and confident! This can be captured by the concept of margin.
One possible way to define margin:
- The functional margin of the linear classifier (w, b) w.r.t. training example (x^(i), y^(i)) is y^(i) (w · x^(i) + b).
- The larger the value, the better – really?
- What if we rescale (w, b) by a factor α and consider the linear classifier specified by (αw, αb)?
- The decision boundary remains the same, but the functional margin is multiplied by α.
- So we can change the functional margin of a linear classifier without changing anything meaningful.
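To make the rescaling point concrete, here is a minimal sketch (the example point, label, weights, and the factor α are all made up for illustration) showing that scaling (w, b) by α multiplies the functional margin by α while leaving every prediction, and hence the decision boundary, unchanged.

```python
import numpy as np

# Hypothetical training example and linear classifier (not from the notes).
x_i = np.array([2.0, 1.0])
y_i = 1
w = np.array([1.0, -1.0])
b = 0.5
alpha = 10.0

def functional_margin(w, b, x, y):
    # Functional margin of (w, b) w.r.t. the example (x, y).
    return y * (np.dot(w, x) + b)

print(functional_margin(w, b, x_i, y_i))                  # 1.5
print(functional_margin(alpha * w, alpha * b, x_i, y_i))  # 15.0, i.e. alpha times larger

# The sign of w.x + b is unchanged for any point, so the decision
# boundary itself has not moved.
print(np.sign(np.dot(w, x_i) + b) == np.sign(np.dot(alpha * w, x_i) + alpha * b))
```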
Geometric Margin
- The geometric margin of (w, b) w.r.t. x^(i) is the distance from x^(i) to the decision surface.
- This distance can be computed as γ^(i) = y^(i) (w · x^(i) + b) / ||w||.
(Figure: the same positive and negative examples, with the distances from A, B, and C to the decision surface marked.)
- Given a training set {(x^(i), y^(i)): i = 1, …, N}, the geometric margin of the classifier w.r.t. the training set is γ = min_{1 ≤ i ≤ N} γ^(i).
- Note that the points closest to the boundary are called the support vectors; the other examples are ignorable.
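As a quick numerical illustration, the sketch below computes the per-example geometric margins and takes their minimum for a hypothetical classifier (w, b) on a small made-up data set; none of these numbers come from the notes.

```python
import numpy as np

# Hypothetical linearly separable toy data and a candidate classifier.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = np.array([1.0, 1.0])
b = 0.0

# Per-example geometric margin: y^(i) (w . x^(i) + b) / ||w||
per_example = y * (X @ w + b) / np.linalg.norm(w)

# The classifier's geometric margin w.r.t. the training set is the minimum.
gamma = per_example.min()
print("per-example margins:", per_example)
print("geometric margin:   ", gamma)
```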
Maximum Margin Classifier
This can be represented as a constrained optimization problem:

    max_{w,b} γ    subject to   y^(i) (w · x^(i) + b) / ||w|| ≥ γ,   i = 1, …, N

This optimization problem is in a nasty form, so we need to do some rewriting.
- Let γ' = γ ||w||; we can rewrite this as

    max_{w,b} γ' / ||w||    subject to   y^(i) (w · x^(i) + b) ≥ γ',   i = 1, …, N
Maximum Margin Classifier
- Note that we can arbitrarily rescale w and b to make the functional margin γ' large or small.
- So we can rescale them such that γ' = 1, giving the equivalent problem:

    max_{w,b} 1 / ||w||    (or equivalently min_{w,b} (1/2) ||w||²)    subject to   y^(i) (w · x^(i) + b) ≥ 1,   i = 1, …, N
Maximizing the geometric margin is equivalent to minimizing the magnitude of w
subject to maintaining a functional margin of at least 1
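As a sanity check on this final form, the following sketch hands min (1/2)||w||² with the constraints y^(i) (w · x^(i) + b) ≥ 1 to a general-purpose solver (SciPy's SLSQP). The toy data is made up and assumed linearly separable; this is just one way to solve the problem numerically, not the specialized algorithm an SVM package would use.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
d = X.shape[1]

def objective(theta):
    w = theta[:d]
    return 0.5 * np.dot(w, w)          # minimize (1/2) ||w||^2

def margin_constraints(theta):
    w, b = theta[:d], theta[d]
    return y * (X @ w + b) - 1.0       # each entry must be >= 0

# Start from a guess that already separates the data.
res = minimize(objective, x0=np.array([1.0, 1.0, 0.0]), method="SLSQP",
               constraints=[{"type": "ineq", "fun": margin_constraints}])

w, b = res.x[:d], res.x[d]
print("w =", w, " b =", b)
print("functional margins:", y * (X @ w + b))   # all >= 1 at the optimum
print("geometric margin:  ", 1.0 / np.linalg.norm(w))
```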
- We cannot give you a closed-form solution that you can directly plug the numbers into and compute for an arbitrary data set.
- But the solution can always be written in the following form:

    w = Σ_{i=1}^{N} α_i y^(i) x^(i)

  This is the form of w; b can be calculated accordingly using some additional steps.
- The weight vector is a linear combination of all the training examples.
- Importantly, many of the α_i's are zeros.
- The points that have non-zero α_i's are the support vectors.
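A brief sketch of this solution form using scikit-learn, assuming it is available: SVC with a linear kernel exposes α_i y^(i) for the support vectors, so w can be reconstructed as the stated linear combination. Note that SVC actually solves the soft-margin variant; with a large C on separable toy data it behaves like the hard-margin classifier described here.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_ holds alpha_i * y^(i) for the support vectors only;
# every other training point has alpha_i = 0 and is ignorable.
print("support vectors:\n", clf.support_vectors_)
print("alpha_i * y_i:", clf.dual_coef_)

# Reconstruct w as a linear combination of the support vectors.
w = clf.dual_coef_ @ clf.support_vectors_
print("w from the dual form:", w)
print("w stored by the solver:", clf.coef_)   # should match
print("b:", clf.intercept_)
```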
(Figure: examples from Class 1 and Class 2 with their α values labeled; only the points on the margin, the support vectors, have non-zero α.)