

Adapting Multiple Kernel Parameters for Support Vector Machines using Genetic Algorithms

Sergio A. Rojas

Division of Parasitology, National Institute for Medical Research, London NW7 1AA, UK

and Department of Computer Science, University College London

srojas@nimr.mrc.ac.uk

Delmiro Fernandez-Reyes

Division of Parasitology, National Institute for Medical Research, London NW7 1AA, UK

and Department of Computer Science, University College London

dfernan@nimr.mrc.ac.uk

Abstract- Kernel parameterization is a key design step in the application of support vector machines (SVM) to supervised learning problems. A grid search with a cross-validation criterion is often conducted to choose the kernel parameters, but it is computationally infeasible for a large number of them. Here we describe a genetic algorithm (GA) as a method for tuning kernels with multiple parameters for classification tasks, with application to the weighted radial basis function (RBF) kernel. In this type of kernel the number of parameters equals the dimension of the input patterns, which is usually high for biological datasets. We show preliminary experimental results where adapted weighted RBF kernels for SVM achieve classification performance over 98% on human serum proteomic profile data. Further improvements to this method may lead to the discovery of relevant biomarkers in biomedical applications.

1 Introduction

The Support Vector Machine (SVM) [1] is a well-known supervised machine learning technique that has been applied successfully to a wide variety of problems, including classification [2], regression [3] and clustering [4], in diverse domains such as web text mining [5], gene expression [6] and proteome analysis of infectious diseases [work in progress]. The SVM was originally proposed as a learning algorithm to find an optimal discrimination function between two linearly separable classes by maximizing the margin between the closest samples and a separating hyperplane in the input space [2]. Further extensions have been made to handle non-separable cases with a soft margin parameter [7] and non-linear cases through the use of a kernel function [2, 7]. The kernel function computes a measure of similarity between input patterns in a transformed vector space.

The function chosen to carry out the kernel mapping may depend on parameters such as the dimension in a polynomial kernel or the width in a radial basis function (RBF) kernel. These parameters must be tuned to the specific dataset in order to get the best performance from the SVM. Usually a grid search through a range of values for the parameters is used, varying one parameter with a fixed step size while keeping the others constant [8]. However, for kernels with a large number of parameters, such as weighted kernels, this is computationally infeasible. Gradient descent might be used for this purpose as described in [11], although this method may converge to local minima.

We propose using a genetic algorithm (GA) [9] to search the parameterization space of SVM kernels with multiple parameters, with application to classification problems. In the next section we briefly describe the SVM and weighted kernels. Section 3 explains the GA approach for tuning weighted kernels, and experimental results on artificial and real datasets are shown in Section 4. The paper concludes with some directions for future work.

2 SVM and Weighted Kernels

We consider the problem of binary classification in a dataset of examples. Let $D = \{(x_1, y_1), \ldots, (x_l, y_l)\}$ be the set of training examples, where $x \in \mathbb{R}^n$ is an n-dimensional input vector, $y \in \{+1, -1\}$ is its corresponding class label and $l$ is the number of examples. A kernel function $K : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ computes the inner product between two examples, $K(x, z) = \langle \Phi(x), \Phi(z) \rangle$, where $\Phi$ is a mapping from the input space to a transformed feature space. In this feature space an SVM learns a decision function or hyperplane of the form $f(x) = \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b$, where the coefficients $\alpha_i$ are found by solving a constrained quadratic optimization problem that aims to maximize the margin, or distance of opposite examples to the hyperplane, and to minimize a regularization factor that allows for misclassifications (for a comprehensive description of SVM the reader is referred to [10]). Support vectors are those examples $x_i$ with corresponding $\alpha_i > 0$. It is not necessary to know the underlying feature mapping if the function $K(x, z)$ satisfies the conditions of Mercer's Theorem [10] (guaranteed with a positive semi-definite Gram matrix $K = (K(x_i, x_j))_{i,j=1}^{l}$).

0-7803-9363-5/05/$20.00 ©2005 IEEE.

Valid common kernel functions are the polynomial kernel

$$K(x, z) = (a \langle x, z \rangle + 1)^d \qquad (1)$$

and the radial basis function kernel

$$K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right) \qquad (2)$$

These two kernels have few parameters. In the polynomial kernel, $d$ determines the dimension of the kernel (a linear kernel has $d = 1$) and $a$ is a scaling factor. In equation (2), $\sigma$ is a factor that shapes the width of the radial basis function. By including an independent scaling factor for each input variable, it is possible to define a more general form of these two kernels [11], the weighted polynomial kernel:

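$$K(x, z) = \left(\sum_{j=1}^{n} \sigma_j x_j z_j + 1\right)^d \qquad (3)$$

and the weighted RBF kernel:

$$K(x, z) = \exp\left(-\sum_{j=1}^{n} \sigma_j (x_j - z_j)^2\right) \qquad (4)$$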
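As a minimal sketch of how a weighted RBF Gram matrix could be computed, the snippet below assumes the kernel takes the multiplicative form $\exp(-\sum_j \sigma_j (x_j - z_j)^2)$, with one scale factor per input variable. The function names `weighted_rbf` and `gram_matrix` and the toy data are illustrative assumptions, not part of the paper.

```python
import math


def weighted_rbf(x, z, sigma):
    """Weighted RBF kernel (assumed form): exp(-sum_j sigma_j * (x_j - z_j)**2)."""
    if not (len(x) == len(z) == len(sigma)):
        raise ValueError("x, z and sigma must have the same length")
    return math.exp(-sum(s * (a - b) ** 2 for s, a, b in zip(sigma, x, z)))


def gram_matrix(X, sigma):
    """Gram matrix with K[i][j] = weighted_rbf(X[i], X[j], sigma)."""
    return [[weighted_rbf(xi, xj, sigma) for xj in X] for xi in X]


# Toy data: three 2-dimensional examples (hypothetical values).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]

# A zero scale factor on the second variable makes the kernel ignore it,
# so examples 0 and 2 look identical to the kernel.
K = gram_matrix(X, sigma=[1.0, 0.0])
```

Under this assumed form, a larger scale factor makes the kernel more sensitive to differences in that variable, and a zero scale factor removes the variable altogether.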

The number of parameters or scale factors for these kernels equals the dimensionality of the input vectors. Note that for more than 3 or 4 dimensions it becomes intractable to adjust them by a grid search. Hence we propose a GA-based method to overcome this problem.
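To make the intractability argument concrete, the following sketch counts the SVM trainings needed by an exhaustive grid that tries every combination of candidate values. The choice of 10 candidate values per parameter and 5 cross-validation folds is an assumption for illustration only, and `grid_evaluations` is a hypothetical helper name.

```python
def grid_evaluations(n_params, values_per_param, cv_folds=1):
    """Number of SVM trainings for a full grid over n_params parameters."""
    return cv_folds * values_per_param ** n_params


# The cost grows exponentially with the number of scale factors.
for n in (1, 2, 4, 8, 16):
    print(n, grid_evaluations(n, values_per_param=10, cv_folds=5))
```

With these assumed settings, four parameters already require 50,000 trainings, and each additional parameter multiplies the cost by ten.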

3 GA for Adapting Weighted Kernels

Below we describe a kernel tuning approach for SVM using a GA. We parameterized weighted RBF kernels, but the approach can be applied to other kinds of weighted kernels. Genetic algorithms have not been applied to choosing multiple parameters in weighted kernels, although a related approach has recently been reported that tunes generalized Gaussian kernels by means of evolutionary strategies [12]. In that study the kernel matrix is modified using a covariance matrix adaptation method with constraints that guarantee its applicability to an SVM (i.e. the resulting matrix must be symmetric and positive definite). The recombination of good individuals is made by averaging the population (obtaining its center of mass), which prevents useful crossover such as that between parents with opposite scale magnitudes.

3.1 Encoding kernel parameters

A standard GA [9] was used in this approach. We define a chromosome as an n-dimensional vector of real values, $s_i = (\sigma_1, \sigma_2, \ldots, \sigma_n)$. Each gene $\sigma_j$ represents the scale factor for the j-th input variable. The chromosome is then used in (3) or (4) when computing the kernel matrix $K$.

3.2 Genetic operators

The initial chromosome population is randomly generated with values between 0 and 1. We used single-point crossover to recombine subsets of scale factors. The number of parent individuals is defined by a crossover rate, $0 < p_c < 1$.
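The encoding and recombination steps can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the helper names `random_population` and `single_point_crossover`, the population size and the seed are all assumptions, and how the crossover rate selects the parents is left out.

```python
import random


def random_population(pop_size, n_genes, rng):
    """Chromosomes are lists of n_genes scale factors drawn uniformly from [0, 1)."""
    return [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]


def single_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two parents at one random cut point."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have the same length, at least 2")
    cut = rng.randrange(1, len(parent_a))  # cut strictly inside the chromosome
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2


rng = random.Random(42)  # fixed seed, only for reproducibility of the sketch
population = random_population(pop_size=6, n_genes=4, rng=rng)
child_1, child_2 = single_point_crossover(population[0], population[1], rng)
```

Each child inherits a contiguous block of scale factors from one parent and the remainder from the other, so useful subsets of weights can be recombined without altering their values.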

Variations of scale factors are introduced by a logarithmic mutation function, which is applied to a percentage $p_m = 1 - p_c$ of individuals each generation. For these chromosomes a subset of genes $J$ is chosen randomly across the genome according to a mutation factor $0 < p_{lm} < 1$. Next, a random normally distributed number $R \sim N(0, 1)$ is generated and the values of the genes in $J$ are scaled up or down (depending on the sign of $R$) within two folds of the current value, by rule (5).

$$\sigma_j(t + 1) = 10^{(2R)}\, \sigma_j(t), \qquad j \in J \subset \{1, 2, \ldots, n\} \qquad (5)$$
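A minimal sketch of the logarithmic mutation is given below, assuming rule (5) reads $\sigma_j(t+1) = 10^{2R}\,\sigma_j(t)$ for the genes in $J$. How $J$ is drawn is not spelled out in the text, so the sketch assumes each gene joins $J$ independently with probability $p_{lm}$. The function name `log_mutation` and the example values are hypothetical.

```python
import random


def log_mutation(chromosome, p_lm, rng):
    """Scale a random subset J of genes by the common factor 10**(2*R), R ~ N(0, 1)."""
    # Assumption: each gene is included in J independently with probability p_lm.
    J = [j for j in range(len(chromosome)) if rng.random() < p_lm]
    R = rng.gauss(0.0, 1.0)        # one draw shared by every gene in J
    factor = 10.0 ** (2.0 * R)     # R > 0 scales up, R < 0 scales down
    mutated = list(chromosome)
    for j in J:
        mutated[j] = factor * chromosome[j]
    return mutated, J, R


rng = random.Random(0)  # fixed seed, only for reproducibility of the sketch
parent = [0.5, 0.1, 0.9, 0.3]
mutant, J, R = log_mutation(parent, p_lm=1.0, rng=rng)
```

With `p_lm=1.0` every gene falls in $J$ under this assumption, so the whole chromosome is shifted by the same factor, while genes outside $J$ keep their values.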

Note that because $R$ is not necessarily an integer, the power operation may change not only the scale but also the value itself. The mutation function was designed to resemble a random logarithmic grid search over the scaling factors. The intuition behind it is to allow the GA to search in different scale regions for individual genes.

3.3 Fitness evaluation

The fitness of a chromosome is determined by its generalization capability when plugged into the weighted RBF kernel of an SVM classifier. We used the area under the curve (AUC) of the classifier's Receiver Operating Characteristic (ROC) curve [13] as a measure of generalization performance. A given chromosome $s_i$ comprises the scale values $\sigma_j$ of equation (4), so a Gram kernel matrix $K_i$ can be computed using $s_i$ and all the examples in a dataset. An SVM classifier is trained with this matrix using a 5-fold cross-validation procedure. The fitness value is estimated by averaging the AUC over the 5 folds. We defined the fitness function as (6). Since the standard deviation is subtracted from the AUC mean, the fittest chromosomes are those with a high AUC average and a low dispersion. Thus the fitness value indicates the generalization capability of an SVM trained with a kernel with weights $s_i$.

$$f_i = \mathrm{AUC\_crossval\_avg}(s_i) - \mathrm{AUC\_crossval\_std}(s_i) \qquad (6)$$
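The fitness in (6) can be sketched from per-fold AUC values as below. The SVM training itself is omitted: the sketch assumes the classifier scores for each held-out fold are already available. The rank-based AUC, the use of the population standard deviation, and the names `auc` and `fitness` are assumptions for illustration.

```python
import statistics


def auc(labels, scores):
    """Rank-based AUC: probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == +1]
    neg = [s for y, s in zip(labels, scores) if y == -1]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fitness(fold_aucs):
    """Mean AUC over the folds minus its standard deviation, as in (6)."""
    # Assumption: population standard deviation; the text does not say which one.
    return statistics.mean(fold_aucs) - statistics.pstdev(fold_aucs)


# Hypothetical scores for one held-out fold, and hypothetical AUCs for 5 folds.
fold_auc = auc([+1, +1, -1, -1], [0.9, 0.4, 0.5, 0.1])
f = fitness([0.95, 0.90, 0.92, 0.97, 0.91])
```

Subtracting the dispersion penalizes chromosomes whose performance varies strongly from fold to fold, even when their average AUC is high.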

4 Experiments

4.1 Datasets and software

We performed experiments on a variety of datasets involving real and artificial data. We used the Iris and

Table 2. Classification performance of experiments. Rightmost columns show AUC estimate values averaged with standard deviation over a number of experiments. (N: number of experiments, G: number of generations, P: population size, p_c: crossover rate, p_m: mutation rate, p_lm: logarithmic mutation factor)

Dataset                 N    G    P     p_c   p_m   p_lm   Cross-validation   Held-out test
HAT (best in Fig. 1a)   20   25   200   0.8   0.2   1.0    99.81±1.78         98.26±1.
HAT (best in Fig. 1b)   20   25   200   0.8   0.2   1.0    99.81±1.78         98.26±1.
Iris                    30   30   30    0.8   0.2   0.3    96.04±1.67         89.23±6.
Iris-noise              30   30   30    0.8   0.2   0.3    91.92±2.77         87.25±6.
Heart                   20   30   30    0.8   0.2   0.3    86.14±1.75         81.32±6.
Heart-noise             20   30   30    0.8   0.2   0.3    85.36±1.51         77.47±7.
Random21                10   30   30    0.8   0.2   0.3    87.80±0.99         86.68±3.
Repeat21                10   30   30    0.8   0.2   0.3    89.65±0.99         90.01±3.
Redund21                10   30   30    0.8   0.2   0.3    88.49±1.04         86.77±1.

Figure 2. Classification performance over evolutionary time. Plots of average fitness for the best individual and the mean population are shown for some of the experiments in Table 2. Values are averaged over the number of repetitions, N. (a) Iris dataset, (b) Heart, (c) Repeat21, (d) HAT. [Plots not reproduced; horizontal axis: generations; curves: best individual and population mean.]

In order to study the role of the $p_{lm}$ parameter in the quality of solutions found by the logarithmic mutation we carried out further experiments on the HAT dataset. We are particularly interested in this experiment because proteomics is currently a hot topic for experimentation in bioinformatics. Besides, this dataset has a higher dimensionality than those previously described. We varied $p_{lm}$ stepwise within a range of 0. to 1.0 (Figure 1a). Note that the best classification results, over 95%, were obtained when setting $p_c = 0.8$ with $p_{lm} = 1.0$. The effect of this value of the mutation factor is that the complete genome, that is, the whole set of scaling factors, is translated in the same direction to a bigger or lower scale, allowing the GA to explore different orders of magnitude during the computation of the weighting parameters. Useful combinations of subsets of weights in dissimilar scales are then propelled by the crossover rate. Hence we studied the effect of changing the crossover rate using the best mutation factor rate of 1.0 (Figure 1b). There were no major changes in the classification performance when $p_c$ varies from 0.5 to

Lastly we assessed the practicality of the GA for tuning the kernel parameters by tracing the SVM generalization performance during the evolutionary process. Figure 2 shows plots of AUC vs. generations for experiments with the Iris (Fig. 2a), Heart (2b), Repeat21 (2c) and HAT (2d) datasets, averaged over the number of experiments reported in Table 2 (up to the maximum number of generations before the algorithm stalled over the set of experiments). The AUC tends to increase as the number of generations grows in all cases. In the Heart and Repeat21 datasets the trend has a small slope, as these are noisy datasets. On the other hand, for the Iris and HAT datasets there is a sudden increase in both population mean and best chromosome fitness during the initial generations, after which fitness keeps growing gradually, showing that the set of parameters searched by the GA improves meaningfully over time. A similar behaviour was observed in the remaining datasets.

5 Conclusions

We have described a GA approach for adjusting multiple parameters in SVM kernels. Although we considered weighted RBF kernels, the method can be extended to other weighted kernels. The experiments showed encouraging results in generalization performance for tuning kernels including a few (4, 6) or a large (20, number of parameters. In the latter case parameterization is prohibitive with the standard grid-search technique due to computational costs. In the particular case of the HAT proteomic dataset, the performance achieved is similar to that reported by our collaborators in a previous study using other machine learning methods not related to weighted kernels or SVM [16]. This study showed the applicability of GA for adapting SVM kernels to a particular dataset. However, there are interesting questions arising from this approach. For example, we obtained a high variability in the results of the held-out tests. When examining the weights given by the best chromosomes evolved for a specific dataset, we found that they are very heterogeneous in scale of magnitude due to the logarithmic mutation that was used. This prompted us to design a different mutation strategy, where the weights are all maintained in a homogeneous scale by controlling a single global width parameterized beforehand using a grid search. Preliminary results of this combined strategy are being reported in an ongoing paper.

Other ideas emerging from this work might provide useful insight for outlining new algorithms for tasks like feature subset selection and feature extraction. If the weights encoded in the chromosome represent scale factors of the input variables, they can indicate the degree of relevance of those variables while learning the concept implicit in the dataset. Once the GA has evolved, those variables with the highest scale factors can be regarded as the most important for solving the given task. We are currently working in this direction by having the GA method described above apply a cut-off threshold to the vector $s_i$, forcing the less relevant features to zero and thus giving sparse weights for the selected features (alternatively, they can be ranked by magnitude). Since the kernel weights must be plugged into the SVM during training, this can be considered a wrapper method for feature selection [19], in contrast to other GA approaches where the chromosome encodes the inclusion or the identification of the variables to be included in a filter method [15, 20]. We intend to use this approach for biomarker discovery (results will be published elsewhere).
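The cut-off and ranking ideas above could look roughly like the sketch below. The paper gives no threshold value or tie-breaking rule, so the threshold, the `>=` comparison, the example weights and the names `sparsify` and `rank_features` are all assumptions.

```python
def sparsify(weights, threshold):
    """Force scale factors below the cut-off to zero, keeping the rest unchanged."""
    return [w if w >= threshold else 0.0 for w in weights]


def rank_features(weights):
    """Indices of the input variables, ordered from largest to smallest scale factor."""
    return sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)


# Hypothetical evolved chromosome with weights spread over several orders of magnitude.
evolved = [0.02, 3.1, 0.4, 12.0]
selected = sparsify(evolved, threshold=0.4)   # [0.0, 3.1, 0.4, 12.0]
ranking = rank_features(evolved)              # [3, 1, 2, 0]
```

Because the evolved weights can differ by orders of magnitude, a single absolute cut-off may be hard to choose, which is one reason ranking by magnitude is a useful alternative.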

Acknowledgments

We would like to thank our team of collaborators Prof. Sanjeev Krishna, Dr. Dan Agranoff and Dr. Marios Papadopoulos at the Department of Cellular and Molecular Medicine, St George's Hospital Medical School, London, UK for allowing us to use the HAT dataset in this preliminary work. Datasets and comprehensive analytical studies will be published elsewhere. We also thank Dr. Mark Herbster, Prof. Anthony Finkelstein (Dept. of Computer Science, UCL, London, UK) and Dr. Anthony A. Holder (Division of Parasitology, National Institute for Medical Research, London, UK) for valuable discussions and providing support for this work. Finally, we are grateful to the reviewers for their useful comments.