These study notes cover the concepts of support vector machines (SVMs) with hard and soft margins, the geometric margin, and the solutions to the SVM optimization problem. They also introduce the idea of mapping inputs to higher-dimensional feature spaces to solve linearly inseparable cases and present various kernel functions. The notes conclude with an overview of ensemble learning and its application to SVMs.
No soft margin:
$$\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0, \qquad \alpha_i \ge 0$$

With soft margin:
$$\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C$$
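To make this concrete, here is a minimal sketch (not part of the original notes) that fits a linear SVM on toy data with scikit-learn and checks the properties above: the dual coefficients recover $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$, they sum to zero when weighted by the labels, and each $\alpha_i$ is bounded by C. The toy data, the C values, and the use of scikit-learn are my own illustrative choices.

```python
# A minimal sketch (not from the notes) checking the dual-solution properties above.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.array([1] * 20 + [-1] * 20)

for C in (1e6, 1.0):            # very large C approximates "no soft margin"; moderate C is a soft margin
    clf = SVC(kernel="linear", C=C).fit(X, y)
    alpha_y = clf.dual_coef_[0]               # entries are alpha_i * y_i for the support vectors
    alpha = np.abs(alpha_y)                   # so their absolute values are the alpha_i
    w = alpha_y @ clf.support_vectors_        # w = sum_i alpha_i y_i x_i
    print(f"C={C:g}  w={np.round(w, 3)}  sum_i alpha_i y_i={alpha_y.sum():.2e}  "
          f"max alpha_i={alpha.max():.3f} (<= C)")
```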
For classifying a new input $\mathbf{z}$:
Compute $f(\mathbf{z}) = \mathbf{w} \cdot \mathbf{z} + b = \sum_{j=1}^{s} \alpha_{t_j} y_{t_j} \left(\mathbf{x}_{t_j} \cdot \mathbf{z}\right) + b$, where $\mathbf{x}_{t_1}, \ldots, \mathbf{x}_{t_s}$ are the support vectors.
Classify $\mathbf{z}$ as + if $f(\mathbf{z})$ is positive, and − otherwise.
Note: $\mathbf{w}$ need not be formed explicitly; we can classify $\mathbf{z}$ by taking inner products with the support vectors.
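Below is a minimal sketch (not from the notes) of this classification rule: the decision value is computed purely from inner products between $\mathbf{z}$ and the support vectors, so $\mathbf{w}$ is never formed explicitly. The support vectors, labels, dual weights $\alpha$, and bias $b$ are made-up illustrative values (chosen so that $\sum_i \alpha_i y_i = 0$, as required above).

```python
# A minimal sketch (not from the notes): classify a new point z using only inner
# products with the support vectors; alpha, y_sv, X_sv and b are assumed values.
import numpy as np

X_sv  = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0]])   # support vectors x_{t_j}
y_sv  = np.array([+1, +1, -1])                              # their labels y_{t_j}
alpha = np.array([0.3, 0.2, 0.5])                           # their dual weights alpha_{t_j}
b     = -0.1                                                # bias term

def classify(z):
    # f(z) = sum_j alpha_j y_j (x_j . z) + b  --  w is never formed explicitly
    f = np.sum(alpha * y_sv * (X_sv @ z)) + b
    return "+" if f > 0 else "-"

print(classify(np.array([1.5, 1.0])))    # -> +
print(classify(np.array([-2.0, -0.5])))  # -> -
```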
Mapping the input to a higher-dimensional space can solve the linearly inseparable cases.
[Figure: 1-D inputs x on a line, mapped to the 2-D points (x, x²), where they become linearly separable.]
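As a quick illustration of this idea (my own sketch, not from the notes), the snippet below labels 1-D points by whether they lie near the origin, maps each x to (x, x²), and checks that a simple threshold on the x² coordinate separates the two classes, even though no single threshold on x can.

```python
# A minimal sketch (not from the notes): 1-D points that are not linearly
# separable on the line become separable after the mapping x -> (x, x^2).
# The data and the threshold 2.25 are illustrative choices.
import numpy as np

x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.where(np.abs(x) < 1.5, +1, -1)      # inner points are +, outer points are -

phi = np.column_stack([x, x ** 2])         # map each x to (x, x^2)

# In the mapped space, the horizontal line x^2 = 2.25 separates the classes.
pred = np.where(phi[:, 1] < 2.25, +1, -1)
print((pred == y).all())                   # -> True
```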
Non-linear SVMs: Feature Spaces
The original input space can always be mapped to some higher-dimensional feature space such that the data is linearly separable.
Example: Quadratic Feature Space
Let $\mathbf{x} = (x_1, x_2, \ldots, x_m)$ and map it to the quadratic feature space
$$\Phi(\mathbf{x}) = \bigl(1,\; \sqrt{2}\,x_1, \ldots, \sqrt{2}\,x_m,\; x_1^2, \ldots, x_m^2,\; \sqrt{2}\,x_1 x_2, \ldots, \sqrt{2}\,x_{m-1} x_m\bigr)$$
You may be wondering about the $\sqrt{2}$'s. At least they won't hurt anything!
You will find out why they are there soon!
Dot product in quadratic feature space
$$\Phi(\mathbf{a}) \cdot \Phi(\mathbf{b}) = 1 + 2\sum_{i=1}^{m} a_i b_i + \sum_{i=1}^{m} a_i^2 b_i^2 + 2\sum_{i=1}^{m} \sum_{j=i+1}^{m} a_i a_j b_i b_j$$

Now let's just look at another interesting function of $\mathbf{a} \cdot \mathbf{b}$:

$$(\mathbf{a} \cdot \mathbf{b} + 1)^2 = (\mathbf{a} \cdot \mathbf{b})^2 + 2\,(\mathbf{a} \cdot \mathbf{b}) + 1 = \Bigl(\sum_{i=1}^{m} a_i b_i\Bigr)^2 + 2\sum_{i=1}^{m} a_i b_i + 1 = \sum_{i=1}^{m} a_i^2 b_i^2 + 2\sum_{i=1}^{m} \sum_{j=i+1}^{m} a_i a_j b_i b_j + 2\sum_{i=1}^{m} a_i b_i + 1$$
They are the same! And the latter takes only O(m) to compute!
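The identity can be checked numerically. The sketch below (not from the notes) builds the explicit quadratic feature map with the $\sqrt{2}$ factors used above and compares $\Phi(\mathbf{a}) \cdot \Phi(\mathbf{b})$ against $(\mathbf{a} \cdot \mathbf{b} + 1)^2$ on random vectors; the dimension m = 5 is an arbitrary choice.

```python
# A minimal sketch (not from the notes) checking the identity above numerically.
import numpy as np
from itertools import combinations

def phi(x):
    # explicit quadratic feature map: (1, sqrt(2)x_i, x_i^2, sqrt(2)x_i x_j for i<j)
    cross = [np.sqrt(2) * x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return np.concatenate([[1.0], np.sqrt(2) * x, x ** 2, cross])

rng = np.random.RandomState(0)
a, b = rng.randn(5), rng.randn(5)

lhs = phi(a) @ phi(b)          # O(m^2) terms: explicit feature-space dot product
rhs = (a @ b + 1) ** 2         # O(m): the kernel trick
print(np.isclose(lhs, rhs))    # -> True
```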
Kernel Functions
If we map the inputs to a higher-dimensional feature space via some transformation $\mathbf{x} \to \varphi(\mathbf{x})$, the dot product that we need to compute for classifying a point $\mathbf{x}$ becomes:
$\langle \varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}) \rangle$ for all support vectors $\mathbf{x}_i$
A kernel function is a function that is equivalent to an inner product in some feature space:
$k(\mathbf{a}, \mathbf{b}) = \langle \varphi(\mathbf{a}) \cdot \varphi(\mathbf{b}) \rangle$
$k(\mathbf{a}, \mathbf{b}) = (\mathbf{a} \cdot \mathbf{b} + 1)^2$
This is equivalent to mapping to the quadratic space!
More kernel functions
A standard example is the Gaussian (RBF) kernel, $k(\mathbf{a}, \mathbf{b}) = \exp\!\bigl(-\|\mathbf{a} - \mathbf{b}\|^2 / 2\sigma^2\bigr)$. In this case, the corresponding mapping $\varphi(\mathbf{x})$ is infinite-dimensional! Lucky that we don't have to compute the mapping explicitly!
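For concreteness, here is a minimal sketch (not from the notes) of such a Gaussian kernel; the bandwidth σ and the example points are assumed values.

```python
# A minimal sketch (not from the notes) of the Gaussian (RBF) kernel.
# Its value equals an inner product in an infinite-dimensional feature space
# that is never constructed explicitly; sigma is an assumed bandwidth.
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

print(gaussian_kernel(np.array([1.0, 2.0]), np.array([1.5, 1.0])))  # value in (0, 1]
```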
For classifying a new input $\mathbf{z}$, compute
$$f(\mathbf{z}) = \sum_{j=1}^{s} \alpha_{t_j} y_{t_j} \,\langle \varphi(\mathbf{x}_{t_j}) \cdot \varphi(\mathbf{z}) \rangle + b = \sum_{j=1}^{s} \alpha_{t_j} y_{t_j}\, k(\mathbf{x}_{t_j}, \mathbf{z}) + b,$$
where $\mathbf{x}_{t_1}, \ldots, \mathbf{x}_{t_s}$ are the support vectors.
Note: We will not get into the details, but the learning of $\mathbf{w}$ can be achieved by using kernel functions as well!
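As an illustration (not from the notes), the sketch below fits an SVM with the degree-2 polynomial kernel $k(\mathbf{a}, \mathbf{b}) = (\mathbf{a} \cdot \mathbf{b} + 1)^2$ using scikit-learn, and then reproduces its decision value for a new point directly from $\sum_j \alpha_j y_j\, k(\mathbf{x}_j, \mathbf{z}) + b$ using the stored support vectors, dual coefficients, and intercept. The toy data and kernel settings are assumptions.

```python
# A minimal sketch (not from the notes): reproduce the kernelized decision value
# f(z) = sum_j alpha_j y_j k(x_j, z) + b from a fitted scikit-learn SVC.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(60, 2)
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)    # circular, non-linear boundary

clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0).fit(X, y)

z = np.array([[0.2, -0.3]])
K = (clf.support_vectors_ @ z.T + 1.0) ** 2               # k(x_j, z) = (x_j . z + 1)^2
f = clf.dual_coef_ @ K + clf.intercept_                   # sum_j alpha_j y_j k(x_j, z) + b

print(f.item(), clf.decision_function(z)[0])              # the two values agree
```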
Ensemble Learning
Traditional: learn a single classifier $h_1$; given $\mathbf{x}$, predict $y^* = h_1(\mathbf{x})$.
Ensemble method: learn multiple classifiers $h_1, h_2, \ldots, h_S$ by using different training sets and/or learning algorithms; given $\mathbf{x}$, combine their predictions into $\hat{y} = h^*(\mathbf{x})$.
Key idea: a combination of many classifiers (an ensemble) is more accurate than a single classifier.
Bagging: Bootstrap Aggregation (Breiman, 1996)
Use bootstrapping (sampling with replacement from the original training data) to generate multiple independent training sets.
Learn one classifier from each of these training sets, using the same base learner.
[Figure: decision boundary produced by the CART decision tree algorithm.]
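Finally, a minimal bagging sketch (not from the notes): bootstrap resamples of the training set, the same base learner (a CART-style decision tree, as in the figure above) fit on each resample, and a majority vote over their predictions. The toy data, the number of rounds S, and the use of scikit-learn's DecisionTreeClassifier are illustrative choices.

```python
# A minimal sketch (not from the notes) of bagging: bootstrap-resample the
# training set, fit the same base learner on each resample, and combine the
# predictions by majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)      # toy non-linear target

S = 25                                          # number of bootstrap rounds (odd, to avoid ties)
models = []
for _ in range(S):
    idx = rng.randint(0, len(X), size=len(X))   # bootstrap sample: draw with replacement
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

def bagged_predict(Z):
    votes = np.stack([m.predict(Z) for m in models])   # shape (S, n_points)
    return np.sign(votes.sum(axis=0))                  # majority vote for labels in {-1, +1}

Z = rng.randn(5, 2)
print(bagged_predict(Z))
```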