Deep Support Vector Machines

Marco A. Wiering

Institute of Artificial Intelligence and Cognitive Engineering

University of Groningen, the Netherlands

Presentation at ROKS’13, Leuven, 09 July 2013


Contents

- Support Vector Machines
- Deep Support Vector Machines
- Experimental Results on Regression Problems
- Experimental Results on Classification Problems
- Conclusion

Limitations of Support Vector Machines

- Support Vector Machines (SVMs) often outperform other machine learning methods
- However, the standard SVM has a single adjustable layer of weights
- Instead of using such "shallow models", deep architectures can be better alternatives
- SVMs use a priori chosen kernel functions to compute similarities between input vectors
- A problem is that the choice of kernel function is important, but kernel functions are not very flexible
- Therefore we propose the deep SVM (DSVM)
- The DSVM contains multiple layers of SVMs

Support Vector Regression

- The objective function of the SVM is based on structural risk minimization theory developed by Vapnik in the 1960s
- Goal: find g(x) most suitable to the data, e.g. for regression, the ε-insensitive (Hinge) loss function:

  $$|y_i - g(x_i)|_\varepsilon$$

- But also generalize well!
- g(x) as flat as possible ⇒ ||w|| as small as possible
- Yields a convex optimization problem
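As a small illustration of the ε-insensitive loss named above, the following sketch assumes its usual definition, max(0, |y − g(x)| − ε): errors smaller than ε cost nothing. The function name `eps_insensitive_loss` and the sample values are hypothetical and do not come from the slides.

```python
import numpy as np

def eps_insensitive_loss(y, g, eps=0.1):
    """Element-wise |y - g|_eps = max(0, |y - g| - eps) (assumed standard definition)."""
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    return np.maximum(0.0, np.abs(y - g) - eps)

# Hypothetical targets and predictions, for illustration only.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.0])
losses = eps_insensitive_loss(y_true, y_pred, eps=0.1)
print(losses)
```

The first prediction lies inside the ε-tube and incurs no loss, while the other two are penalized linearly in the amount by which they leave the tube.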

SVM Regression Objective Function

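- The resulting dual objective problem is:

  $$\max_{\alpha^{(*)}} W(\alpha^{(*)}) = -\varepsilon \sum_{i=1}^{\ell} (\alpha_i^* + \alpha_i) + \sum_{i=1}^{\ell} (\alpha_i^* - \alpha_i)\, y_i - \frac{1}{2} \sum_{i,j=1}^{\ell} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j)(x_i \cdot x_j)$$

  subject to constraints

  $$0 \le \alpha_i, \alpha_i^* \le C, \qquad \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*) = 0$$

- Then:

  $$o = g(x) = \sum_{i=1}^{\ell} (\alpha_i^* - \alpha_i)(x_i \cdot x) + b$$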
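The dual objective and the output function can be evaluated directly for a given set of coefficients. The sketch below assumes the linear (dot-product) kernel used on this slide; the helper names `dual_objective` and `predict`, the toy data, and the hand-picked coefficients are hypothetical and are not the result of any optimization.

```python
import numpy as np

def dual_objective(alpha, alpha_star, X, y, eps):
    """W(alpha, alpha*) for support vector regression with a dot-product kernel."""
    beta = alpha_star - alpha          # beta_i = alpha_i^* - alpha_i
    K = X @ X.T                        # K[i, j] = x_i . x_j
    return -eps * np.sum(alpha_star + alpha) + beta @ y - 0.5 * beta @ K @ beta

def predict(x, alpha, alpha_star, X, b=0.0):
    """g(x) = sum_i (alpha_i^* - alpha_i) (x_i . x) + b."""
    return (alpha_star - alpha) @ (X @ x) + b

# Toy 1-D data lying on the line y = x (illustrative only).
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])

# Hand-picked coefficients satisfying sum_i (alpha_i - alpha_i^*) = 0.
alpha = np.array([0.5, 0.0, 0.0])
alpha_star = np.array([0.0, 0.0, 0.5])

W_value = dual_objective(alpha, alpha_star, X, y, eps=0.1)
g_value = predict(np.array([1.0]), alpha, alpha_star, X)
print(W_value, g_value)
```

With these coefficients the implied weight is the sum of (α_i* − α_i) x_i, which equals 1, so the sketch reproduces g(x) = x on the toy data.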

Optimization Algorithms

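- For SVMs, specialized toolkits have been developed, such as SVMLight and LibSVM.
- There are multiple optimization algorithms that aim to maximize the dual objective:
  - Sequential Minimal Optimization (SMO) is often used
  - Quadratic Programming can be used as well
- A simple solution is to use gradient ascent:

  $$\alpha_i \leftarrow \alpha_i + \lambda \frac{\partial W(\cdot)}{\partial \alpha_i} = \alpha_i + \lambda \Big( -\varepsilon - y_i + \sum_{j=1}^{\ell} (\alpha_j^* - \alpha_j)\, K(f(x_i|\theta), f(x_j|\theta)) \Big)$$

  and the gradient ascent learning rule for $\alpha_i^*$ is:

  $$\alpha_i^* \leftarrow \alpha_i^* + \lambda \frac{\partial W(\cdot)}{\partial \alpha_i^*} = \alpha_i^* + \lambda \Big( -\varepsilon + y_i - \sum_{j=1}^{\ell} (\alpha_j^* - \alpha_j)\, K(f(x_i|\theta), f(x_j|\theta)) \Big)$$

  Here the kernel K is applied to feature vectors f(x|θ); with f(x|θ) = x and the dot product as kernel, these rules are the gradients of the dual objective for the standard SVM.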
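The two update rules can be turned into a short training loop. This is a minimal sketch under several assumptions that the slides do not state: all coefficients are updated simultaneously (batch updates), they are clipped to [0, C] after every step, the equality constraint and the bias b are ignored, f(x) = x, and the kernel is an RBF of the form exp(−||a − b||² / σ). The function names, hyperparameters, and toy data are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """K(a, b) = exp(-||a - b||^2 / sigma); this kernel form is an assumption."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / sigma)

def dual_objective(alpha, alpha_star, K, y, eps):
    beta = alpha_star - alpha
    return -eps * np.sum(alpha_star + alpha) + beta @ y - 0.5 * beta @ K @ beta

def gradient_ascent_svr(K, y, C=10.0, eps=0.01, lam=0.01, epochs=2000):
    """Batch gradient ascent on the dual, clipping coefficients to [0, C]."""
    n = len(y)
    alpha = np.zeros(n)
    alpha_star = np.zeros(n)
    for _ in range(epochs):
        out = K @ (alpha_star - alpha)                 # sum_j (a_j^* - a_j) K_ij
        new_alpha = alpha + lam * (-eps - y + out)     # rule for alpha_i
        new_alpha_star = alpha_star + lam * (-eps + y - out)  # rule for alpha_i^*
        alpha = np.clip(new_alpha, 0.0, C)
        alpha_star = np.clip(new_alpha_star, 0.0, C)
    return alpha, alpha_star

# Toy regression problem (illustrative only): y = sin(x) on [0, 3].
X = np.linspace(0.0, 3.0, 20).reshape(-1, 1)
y = np.sin(X).ravel()
K = rbf_kernel(X, X)

alpha, alpha_star = gradient_ascent_svr(K, y)
W_final = dual_objective(alpha, alpha_star, K, y, eps=0.01)
pred = K @ (alpha_star - alpha)
mae = np.mean(np.abs(pred - y))
print(W_final, mae)
```

The learning rate is kept small relative to the largest eigenvalue of the kernel matrix so that each projected step does not decrease W; a per-example update schedule, as the indexed rules may suggest, would be an equally valid reading.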

Main Ideas of DSVMs

- Choosing the right (parameterized) kernel may be difficult
- Instead, we will use a set of SVMs to map the input vector x to a feature vector f(x)
- More SVMs can be used to create larger feature representations
- All support vector coefficients (α-values) are trained using gradient ascent or descent on an adapted dual objective function
- Just as multi-layer perceptrons consist of simple perceptrons, the DSVM consists of SVMs

Architecture

[Figure: the inputs [x]_1, [x]_2, ..., [x]_{D−1}, [x]_D feed the feature-layer SVMs S, whose outputs form f(x), which feeds the main SVM M.]

- Input layer of size D
- Total of d SVMs S_a, each one extracting one feature
- Central feature layer of size d
- Main support vector machine M
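The layered structure can be sketched as a forward pass: d feature-layer SVMs map an input of size D to a feature vector of size d, and the main SVM maps that feature vector to the output. Everything concrete here is an assumption for illustration: the class name `DeepSVMSketch`, the RBF kernel form in both layers, the omission of bias terms in the feature layer, and the random coefficients (a trained model would learn them).

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """K(a, b) = exp(-||a - b||^2 / sigma); this kernel form is an assumption."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / sigma)

class DeepSVMSketch:
    """Forward pass only: input (size D) -> d feature SVMs S_a -> main SVM M."""

    def __init__(self, X_train, d, sigma_feat=1.0, sigma_main=1.0, seed=0):
        rng = np.random.default_rng(seed)
        n = X_train.shape[0]
        self.X_train = X_train
        self.sigma_feat = sigma_feat
        self.sigma_main = sigma_main
        # Column a holds (alpha_i^* - alpha_i) of feature SVM S_a.
        # Random placeholders; training would set these values.
        self.beta_feat = rng.normal(scale=0.1, size=(n, d))
        # Coefficients (alpha_i^* - alpha_i) and bias of the main SVM M.
        self.beta_main = rng.normal(scale=0.1, size=n)
        self.b = 0.0

    def features(self, X):
        """f(x): one output per feature-layer SVM, shape (m, d)."""
        return rbf_kernel(X, self.X_train, self.sigma_feat) @ self.beta_feat

    def predict(self, X):
        """g(f(x)): main SVM applied to the extracted features, shape (m,)."""
        F_train = self.features(self.X_train)
        F = self.features(X)
        return rbf_kernel(F, F_train, self.sigma_main) @ self.beta_main + self.b

# Hypothetical sizes: D = 4 inputs, d = 3 extracted features.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(10, 4))
model = DeepSVMSketch(X_train, d=3)
feats = model.features(X_train[:5])
outputs = model.predict(X_train[:5])
print(feats.shape, outputs.shape)
```

The point of the sketch is the data flow: the main SVM never sees x directly, only the feature vectors f(x) and f(x_i) of the training examples.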

Adapted Objective

- Output function: g(x) ⇒ g(f(x))
- Objective function: $W(\alpha^{(*)}) \Rightarrow W(f(x), \alpha^{(*)})$
- New optimization problem:

  $$\min_{f(x)} \max_{\alpha^{(*)}} W(f(x), \alpha^{(*)})$$

- This is a min-max optimization problem
- Adapt f(x) through gradient descent
- Adapt $\alpha^{(*)}$ through gradient ascent

Training Procedure (1)

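- Adapt $\alpha^{(*)}$ towards a (local) maximum of $W(f(x), \alpha^{(*)})$:

  $$\alpha_i^{(*)} \leftarrow \alpha_i^{(*)} + \lambda \frac{\partial W}{\partial \alpha_i^{(*)}}$$

- Remember:

  $$\max_{\alpha^{(*)}} W(\alpha^{(*)}) = -\varepsilon \sum_{i=1}^{\ell} (\alpha_i^* + \alpha_i) + \sum_{i=1}^{\ell} (\alpha_i^* - \alpha_i)\, y_i - \frac{1}{2} \sum_{i,j=1}^{\ell} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j)\, K(f(x_i), f(x_j))$$

- The resulting gradient ascent SVM training rule for $\alpha_i^*$:

  $$\alpha_i^* \leftarrow \alpha_i^* - \lambda \Big( \varepsilon - y_i + \sum_{j} (\alpha_j^* - \alpha_j)\, K(f(x_i), f(x_j)) \Big)$$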

Training Procedure (3)

- For the RBF kernel of the main SVM we have:

  $$\frac{\partial K(f(x_i), f(x_j))}{\partial f(x_i)_a} = -\frac{f(x_i)_a - f(x_j)_a}{\sigma_m}\, K(f(x_i), f(x_j))$$

- This leads to:

  $$\frac{\partial W}{\partial f(x_i)_a} = (\alpha_i^* - \alpha_i) \sum_{j=1}^{\ell} (\alpha_j^* - \alpha_j)\, \frac{f(x_i)_a - f(x_j)_a}{\sigma_m}\, K(f(x_i), f(x_j))$$

- We create a new dataset for each feature-extracting SVM and then train it with the gradient ascent SVM algorithm
- We repeat the alternating training of the main SVM and feature layer SVMs a number of times
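The gradient of W with respect to the extracted features can be checked numerically. The sketch below assumes the RBF kernel has the form exp(−||u − v||² / (2σ_m)), chosen because its derivative carries the 1/σ_m factor that appears in the formula for ∂W/∂f(x_i)_a; the slides do not state the kernel's normalization. Only the kernel term of W depends on the features, so only that term is evaluated. All function names and the random test values are hypothetical.

```python
import numpy as np

def rbf_kernel_matrix(F, sigma_m):
    """K[i, j] = exp(-||f_i - f_j||^2 / (2 sigma_m)); the form is an assumption."""
    sq = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma_m))

def kernel_term(F, beta, sigma_m):
    """The part of W that depends on the features: -1/2 sum_ij beta_i beta_j K_ij."""
    K = rbf_kernel_matrix(F, sigma_m)
    return -0.5 * beta @ K @ beta

def grad_wrt_features(F, beta, sigma_m):
    """dW/df(x_i)_a = beta_i sum_j beta_j (f_ia - f_ja) / sigma_m * K_ij."""
    K = rbf_kernel_matrix(F, sigma_m)
    diff = F[:, None, :] - F[None, :, :]       # diff[i, j, a] = f_ia - f_ja
    weights = np.outer(beta, beta) * K         # beta_i beta_j K_ij
    return (weights[:, :, None] * diff).sum(axis=1) / sigma_m

def numerical_grad(F, beta, sigma_m, h=1e-6):
    """Central finite differences, used only to verify the analytic gradient."""
    G = np.zeros_like(F)
    for i in range(F.shape[0]):
        for a in range(F.shape[1]):
            Fp = F.copy()
            Fm = F.copy()
            Fp[i, a] += h
            Fm[i, a] -= h
            G[i, a] = (kernel_term(Fp, beta, sigma_m)
                       - kernel_term(Fm, beta, sigma_m)) / (2.0 * h)
    return G

# Random features and coefficients beta_i = alpha_i^* - alpha_i (illustrative).
rng = np.random.default_rng(0)
F = rng.normal(size=(6, 2))
beta = rng.normal(size=6)
sigma_m = 1.5

analytic = grad_wrt_features(F, beta, sigma_m)
numeric = numerical_grad(F, beta, sigma_m)
print(np.max(np.abs(analytic - numeric)))

# One small gradient descent step on the features lowers W.
F_new = F - 1e-3 * analytic
print(kernel_term(F, beta, sigma_m), kernel_term(F_new, beta, sigma_m))
```

In the alternating scheme described above, shifted feature values such as `F_new` could plausibly serve as regression targets for the feature-layer SVMs; that use is an interpretation, not something the slides spell out.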

Related Work

The DSVM is related to the following methods:

- Kernel learning. Often relies on a fixed set of basis kernels, where
  - Parameters are learned for a kernel (e.g. RBF kernel), or:
  - Different kernels are linearly or non-linearly combined
- There are recent developments in multi-layer kernel learning, e.g. Dinuzzo (2010)
- Suykens (1999) used logistic functions to learn features. The learning algorithm was quite different
- Vincent and Y. Bengio (2000) proposed a neural support vector network, but it used a random subset of support vectors and a heuristic to adapt the neural networks

Results on Regression Problems

| Dataset | #inst. | #feat. | N | SVM results | DSVM results | Graczyk results |
| Baseball | 337 | 6 | 4000 | 0.02413 ± 0.00011 | 0.02294 ± 0.00010 | 0. |
| Boston Housing | 461 | 4 | 1000 | 0.006838 ± 0.000095 | 0.006381 ± 0.000091 | 0. |
| Concrete Strength | 72 | 5 | 4000 | 0.00706 ± 0.000070 | 0.00621 ± 0.000054 | 0. |
| Diabetes | 43 | 2 | 4000 | 0.02719 ± 0.000263 | 0.02327 ± 0.000219 | 0. |
| Machine-CPU | 188 | 6 | 1000 | 0.00805 ± 0.000181 | 0.00638 ± 0.000123 | 0. |
| Mortgage | 1049 | 6 | 1000 | 0.000080 ± 0.000001 | 0.000080 ± 0.000001 | 0. |
| Stock | 950 | 5 | 1000 | 0.00086 ± 0.000006 | 0.00076 ± 0.000005 | 0. |
| Breast Cancer | 152 | 6 | 4000 | 0.06947 ± 0.000297 | 0.06910 ± 0.000295 | 0. |
| Auto-MPG | 392 | 7 | 1000 | 6.852 ± 0.091 | 6.715 ± 0.092 | N/A |
| Housing | 506 | 13 | 1000 | 8.71 ± 0.14 | 9.30 ± 0.15 | N/A |
