Machine Learning: Feature Maps and Kernels, Slides of Artificial Intelligence

This document shows how to apply machine learning algorithms to data that is not linearly separable by using feature maps and their corresponding kernels. It covers the concept of a feature map, its relationship to dot products, and the definition of a kernel. Examples of different types of kernels are provided, including polynomial and radial basis function kernels. The document also discusses how kernels introduce non-linearity and why computing the kernel directly can be more efficient than computing the feature map explicitly.

Typology: Slides

2012/2013

Uploaded on 04/23/2013

sarangarajan
CSE 151 Machine Learning
Feature Maps

Data which is not linearly separable may be linearly separable in another feature space.

[Figure: the data plotted in the original (X1, X2) space and after mapping to (Z1, Z2, Z3) space.]

In the figure, the data is linearly separable in z-space:

$$(Z_1, Z_2, Z_3) = (X_1^2,\ 2X_1X_2,\ X_2^2)$$
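A minimal Python sketch of the idea in the figure, assuming hypothetical data: 12 points on a circle of radius 1 (label +1) and 12 points on a circle of radius 2 (label -1). No line in (X1, X2) space separates the two rings, because the inner ring lies inside the convex hull of the outer one. Under the map above, Z1 + Z3 = X1^2 + X2^2, so a single plane in z-space splits them. The names `feature_map`, `ring_points`, and the threshold 2.5 are illustrative choices, not taken from the slides.

```python
import math

def feature_map(x1, x2):
    """Map (x1, x2) to z-space: (x1^2, 2*x1*x2, x2^2), as written above."""
    return (x1 * x1, 2 * x1 * x2, x2 * x2)

def ring_points(radius, n=12):
    """n evenly spaced points on a circle centred at the origin (hypothetical data)."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

inner = ring_points(1.0)  # label +1
outer = ring_points(2.0)  # label -1

# Linear classifier in z-space: sign(w . z + b).
# Since z1 + z3 = x1^2 + x2^2, the plane z1 + z3 = 2.5 lies between the rings.
w, b = (-1.0, 0.0, -1.0), 2.5

def classify(x):
    z = feature_map(*x)
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score > 0 else -1

print([classify(p) for p in inner[:3]], [classify(p) for p in outer[:3]])
# prints: [1, 1, 1] [-1, -1, -1]
```

The separator is linear in z but corresponds to the circle x1^2 + x2^2 = 2.5 in the original space.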

Feature Maps

Let $\phi(x)$ be a feature map. For example: $\phi(x) = (x_1^2, x_1, x_2)$.

We want to run the perceptron algorithm in the feature space, i.e., on $\phi(x)$ rather than on $x$.
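A minimal sketch of running the perceptron on the explicit feature map $\phi(x) = (x_1^2, x_1, x_2)$ from the example above. The data set is hypothetical: each point is labelled +1 if it lies above the parabola $x_2 = x_1^2$ and -1 otherwise, so the boundary is linear in $\phi$-space with weight vector $(-1, 0, 1)$. The perceptron here has no bias term, which suffices because that separator passes through the origin of $\phi$-space.

```python
def phi(x):
    """Explicit feature map phi(x) = (x1^2, x1, x2)."""
    x1, x2 = x
    return (x1 * x1, x1, x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def perceptron(data, max_epochs=100):
    """Standard perceptron (no bias) run on phi(x) instead of x."""
    w = [0.0] * 3
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            f = phi(x)
            if y * dot(w, f) <= 0:          # misclassified or on the boundary
                w = [wi + y * fi for wi, fi in zip(w, f)]
                mistakes += 1
        if mistakes == 0:                   # a full pass with no errors: done
            break
    return w

# Hypothetical data: +1 above the parabola x2 = x1^2, -1 below it.
DATA = [((0, 1), 1), ((1, 2), 1), ((-1, 2), 1), ((2, 5), 1),
        ((0, -1), -1), ((1, 0), -1), ((-2, 3), -1), ((2, 2), -1)]

w = perceptron(DATA)
print(w)
```

Nothing about the perceptron itself changes; only its input is replaced by $\phi(x)$.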

Many linear classification algorithms, including the perceptron and SVMs, can be written in terms of dot products $\langle x, z \rangle$.

We can simply change these dot products to $\langle \phi(x), \phi(z) \rangle$.

For a feature map $\phi$ we define the corresponding kernel $K$:

$$K(x, y) = \langle \phi(x), \phi(y) \rangle$$

Thus we can write the perceptron algorithm in terms of K.
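One way to write the perceptron purely in terms of $K$ (often called the kernel perceptron) is to keep a mistake count $\alpha_i$ for each training point instead of an explicit weight vector. Then $w = \sum_i \alpha_i y_i \phi(x_i)$ and $\langle w, \phi(x) \rangle = \sum_i \alpha_i y_i K(x_i, x)$. The sketch below assumes the quadratic kernel $K(x, z) = \langle x, z \rangle^2$ from the examples and a hypothetical data set labelled by the sign of $x_1^2 - x_2^2$ (an XOR-like pattern that is not linearly separable in the original space); $\phi$ is never computed.

```python
def quad_kernel(x, z):
    """K(x, z) = <x, z>^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def score(alpha, data, K, x):
    """<w, phi(x)> expressed through K: sum_i alpha_i * y_i * K(x_i, x)."""
    return sum(a * yi * K(xi, x) for a, (xi, yi) in zip(alpha, data) if a)

def kernel_perceptron(data, K, max_epochs=100):
    alpha = [0] * len(data)            # number of mistakes made on each point
    for _ in range(max_epochs):
        mistakes = 0
        for j, (xj, yj) in enumerate(data):
            if yj * score(alpha, data, K, xj) <= 0:
                alpha[j] += 1          # equivalent to w += y_j * phi(x_j)
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def predict(alpha, data, K, x):
    return 1 if score(alpha, data, K, x) > 0 else -1

# Hypothetical data: +1 if x1^2 > x2^2, else -1.
DATA = [((2, 0), 1), ((2, 1), 1), ((-2, 1), 1), ((-2, -0.5), 1),
        ((0, 2), -1), ((1, 2), -1), ((1, -2), -1), ((0.5, -2), -1)]

alpha = kernel_perceptron(DATA, quad_kernel)
print([predict(alpha, DATA, quad_kernel, x) for x, _ in DATA])
# prints: [1, 1, 1, 1, -1, -1, -1, -1]
```

Swapping in a different kernel only requires passing a different `K`; the training loop is unchanged.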

Computing $K(x, z)$ directly is often faster than computing $\phi(x)$ and $\phi(z)$ explicitly and then taking their dot product.

For a feature map $\phi$ we define the corresponding kernel $K(x, y) = \langle \phi(x), \phi(y) \rangle$. Examples:

$$K(x, z) = \langle x, z \rangle^2$$

Suppose $d = 2$, and write $\phi(x) = [\phi_1(x), \phi_2(x), \phi_3(x), \dots]$, so that

$$\langle \phi(x), \phi(z) \rangle = \phi_1(x)\phi_1(z) + \phi_2(x)\phi_2(z) + \dots$$

Expanding the kernel:

$$\langle x, z \rangle^2 = (x_1 z_1 + x_2 z_2)^2 = \underbrace{x_1^2 z_1^2}_{\phi_1(x)\phi_1(z)} + \underbrace{x_1 x_2 z_1 z_2}_{\phi_2(x)\phi_2(z)} + \underbrace{x_1 x_2 z_1 z_2}_{\phi_3(x)\phi_3(z)} + \underbrace{x_2^2 z_2^2}_{\phi_4(x)\phi_4(z)}$$

Feature map for $K$:

$$\phi(x) = [x_1^2,\ x_1 x_2,\ x_1 x_2,\ x_2^2]$$
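A quick numerical check of the identity above, assuming nothing beyond the slide's definitions: for $d = 2$, the dot product of the 4-dimensional feature vectors equals $\langle x, z \rangle^2$. Integer inputs keep the comparison exact.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Feature map for K(x, z) = <x, z>^2 when d = 2."""
    x1, x2 = x
    return [x1 * x1, x1 * x2, x1 * x2, x2 * x2]

def K(x, z):
    """The kernel computed directly, without phi."""
    return dot(x, z) ** 2

for x, z in [((1, 2), (3, -1)), ((2, 3), (4, 5)), ((-1, 0), (7, 2))]:
    print(K(x, z), dot(phi(x), phi(z)))
# prints:
# 1 1
# 529 529
# 49 49
```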

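For more general $d$, with $K(x, z) = \langle x, z \rangle^2$:

$$K(x, z) = \Big(\sum_{i=1}^d x_i z_i\Big)\Big(\sum_{j=1}^d x_j z_j\Big) = \sum_{i=1}^d \sum_{j=1}^d x_i x_j z_i z_j = \sum_{i,j=1}^d (x_i x_j)(z_i z_j)$$

so the feature map has one coordinate $x_i x_j$ for each pair $(i, j)$. For $d = 3$:

$$\phi(x) = [x_1^2,\ x_1 x_2,\ x_1 x_3,\ x_2 x_1,\ x_2^2,\ x_2 x_3,\ x_3 x_1,\ x_3 x_2,\ x_3^2]^T$$

Time to compute $K$ directly: $O(d)$.
Time to compute $K$ through the feature map: $O(d^2)$.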
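The same check for general $d$, as a sketch: `phi_general` (a hypothetical helper name) builds all $d^2$ products $x_i x_j$ in the order listed above, so the explicit route needs $O(d^2)$ work, while `K` needs one $O(d)$ dot product and a squaring.

```python
from itertools import product

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi_general(x):
    """All d^2 products x_i * x_j, ordered (1,1), (1,2), ..., (d,d)."""
    return [xi * xj for xi, xj in product(x, repeat=2)]

def K(x, z):
    """O(d): one dot product, then square it."""
    return dot(x, z) ** 2

x, z = (1, 2, 3), (4, 5, 6)
print(len(phi_general(x)))                            # 9 coordinates for d = 3
print(K(x, z), dot(phi_general(x), phi_general(z)))   # 1024 1024
```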

For $d = 2$, consider $K(x, z) = (\langle x, z \rangle + c)^2$:

$$(\langle x, z \rangle + c)^2 = \langle x, z \rangle^2 + 2c\langle x, z \rangle + c^2 = x_1^2 z_1^2 + x_1 x_2 z_1 z_2 + x_1 x_2 z_1 z_2 + x_2^2 z_2^2 + 2c x_1 z_1 + 2c x_2 z_2 + c^2$$

The first four terms are like the previous example; the last three are $\phi_5(x)\phi_5(z)$, $\phi_6(x)\phi_6(z)$ and $\phi_7(x)\phi_7(z)$. Feature map:

$$\phi(x) = [x_1^2,\ x_1 x_2,\ x_1 x_2,\ x_2^2,\ \sqrt{2c}\,x_1,\ \sqrt{2c}\,x_2,\ c]$$
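A numerical check of the 7-dimensional map for $(\langle x, z \rangle + c)^2$ with $d = 2$. Because of the $\sqrt{2c}$ entries the comparison is in floating point, so the sketch compares with `math.isclose`; the value $c = 1$ is an arbitrary choice.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x, c):
    """Feature map for K(x, z) = (<x, z> + c)^2 when d = 2."""
    x1, x2 = x
    r = math.sqrt(2 * c)
    return [x1 * x1, x1 * x2, x1 * x2, x2 * x2, r * x1, r * x2, c]

def K(x, z, c):
    return (dot(x, z) + c) ** 2

x, z, c = (1, 2), (3, -1), 1.0
print(K(x, z, c), dot(phi(x, c), phi(z, c)))   # both approximately 4
```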

More general $d$: for $K(x, z) = (\langle x, z \rangle + c)^2$,

$$\phi(x) = [\,x_i x_j\ (1 \le i, j \le d),\ \ \sqrt{2c}\,x_i\ (1 \le i \le d),\ \ c\,]$$
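For general $d$ this construction gives $d^2 + d + 1$ coordinates. The sketch below (helper names and data are hypothetical) builds that map and checks, on three made-up points in $d = 3$, that the matrix of pairwise kernel values is the same whether computed directly or through $\phi$.

```python
import math
from itertools import product

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x, c):
    """[x_i x_j for all i, j] + [sqrt(2c) x_i for all i] + [c]."""
    r = math.sqrt(2 * c)
    return [xi * xj for xi, xj in product(x, repeat=2)] + [r * xi for xi in x] + [c]

def K(x, z, c):
    return (dot(x, z) + c) ** 2

points = [(1, 0, 2), (-1, 3, 1), (2, 2, -2)]   # made-up data, d = 3
c = 0.5
direct = [[K(p, q, c) for q in points] for p in points]
via_phi = [[dot(phi(p, c), phi(q, c)) for q in points] for p in points]
print(len(phi(points[0], c)))   # 13 = 3^2 + 3 + 1
```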

For a feature map $\phi$ we define the corresponding kernel $K(x, y) = \langle \phi(x), \phi(y) \rangle$. Examples:

  • $K(x, z) = \langle x, z \rangle^2$
  • $K(x, z) = (\langle x, z \rangle + c)^2$
  • $K(x, z) = (\langle x, z \rangle + c)^d$
  • $K(x, z) = \exp\left(-\|x - z\|^2 / c^2\right)$ (radial basis function kernel). Corresponds to an infinite dimensional feature map!
  • String kernels (e.g., the number of words two strings have in common), graph kernels, etc.
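A minimal sketch of two of the kernels listed above, computed directly without any feature map. The RBF kernel follows the form $\exp(-\|x - z\|^2 / c^2)$. For the string kernel, the slide only says "common words"; the sketch assumes this means the number of distinct words two strings share, which equals the dot product of their binary bag-of-words vectors and is therefore a valid kernel. `kernel_matrix` is a hypothetical helper that collects all pairwise values $K(x_i, x_j)$.

```python
import math

def rbf_kernel(x, z, c=1.0):
    """K(x, z) = exp(-||x - z||^2 / c^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / c ** 2)

def common_words_kernel(s, t):
    """Assumed string kernel: number of distinct words that s and t share."""
    return len(set(s.split()) & set(t.split()))

def kernel_matrix(items, K):
    """Matrix of all pairwise kernel values K(items[i], items[j])."""
    return [[K(a, b) for b in items] for a in items]

G = kernel_matrix([(0, 0), (1, 0), (0, 2)], rbf_kernel)
print(G[0])   # first row: 1.0, e^-1, e^-4
S = kernel_matrix(["the cat sat", "the dog sat", "a bird flew"], common_words_kernel)
print(S)      # [[3, 2, 0], [2, 3, 0], [0, 0, 3]]
```

Either matrix can be handed to a kernelized algorithm such as the kernel perceptron in place of explicit feature vectors.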