Optimal Feature Generation-Recognizing Patterns and Classifying Them-Lecture Slides, Slides of Pattern Classification and Recognition

This lecture is related to Pattern Classification and Recognition. It was delivered by Sahayu Agendra at Banasthali Vidyapith. It includes: Optimal, Feature, Generation, Fisher, Linear, Discrimination, Scatter, Matrices, Transformation, Criterion, Diagonalizes, Simultaneously

Typology: Slides

2011/2012

Uploaded on 07/17/2012

bandhula
1
Optimal Feature Generation

In general, feature generation is a problem-dependent task. However, there are a few general directions common in a number of applications. We focus on three such alternatives.

Optimized features based on Scatter matrices (Fisher’s linear discrimination).

  • The goal: Given an original set of m measurements $x \in \mathbb{R}^m$, compute $y \in \mathbb{R}^\ell$, by the linear transformation $y = A^T x$, so that the $J_3$ scattering matrix criterion involving $S_w$, $S_b$ is maximized. $A^T$ is an $\ell \times m$ matrix.

2

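  • The basic steps in the proof:
    • $J_3 = \mathrm{trace}\{S_w^{-1} S_m\}$
    • $S_{yw} = A^T S_{xw} A$, $S_{yb} = A^T S_{xb} A$
    • $J_3(A) = \mathrm{trace}\{(A^T S_{xw} A)^{-1} (A^T S_{xb} A)\}$
    • Since the mixture scatter matrix is $S_m = S_w + S_b$, the trace with $S_m$ and the trace with $S_b$ differ only by a constant (the dimension of the space), so maximizing either one is equivalent.
  • Compute A so that $J_3(A)$ is maximum.
  • The solution:
    • Let B be the matrix that simultaneously diagonalizes the matrices $S_{yw}$, $S_{yb}$, i.e.: $B^T S_{yw} B = I$, $B^T S_{yb} B = D$, where B is an $\ell \times \ell$ matrix and D an $\ell \times \ell$ diagonal matrix.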
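The simultaneous diagonalization used in the solution can be sketched numerically. The following is a minimal illustration, assuming the within-class matrix is positive definite; the function name `simultaneous_diagonalization` and the two 3x3 matrices are hypothetical and not taken from the slides. It uses a Cholesky factor to whiten the first matrix and an eigendecomposition to diagonalize the second.

```python
import numpy as np

def simultaneous_diagonalization(S_w, S_b):
    """Return (B, d) with B.T @ S_w @ B = I and B.T @ S_b @ B = diag(d)."""
    # Cholesky factor of the within-class matrix (assumed positive definite).
    L = np.linalg.cholesky(S_w)            # S_w = L @ L.T
    L_inv = np.linalg.inv(L)
    # Whitened between-class matrix; it is symmetric, so eigh applies.
    C = L_inv @ S_b @ L_inv.T
    d, U = np.linalg.eigh(C)               # C = U @ diag(d) @ U.T
    B = L_inv.T @ U
    return B, d

# Hypothetical symmetric positive definite matrices, for illustration only.
S_w = np.array([[4.0, 1.0, 0.0],
                [1.0, 3.0, 0.5],
                [0.0, 0.5, 2.0]])
S_b = np.array([[2.0, 1.0, 0.0],
                [1.0, 2.0, 1.0],
                [0.0, 1.0, 1.0]])

B, d = simultaneous_diagonalization(S_w, S_b)
# trace{S_w^{-1} S_b} equals the sum of the diagonal entries of D.
J3 = np.trace(np.linalg.inv(S_w) @ S_b)
```

In this sketch the diagonal entries of D are the eigenvalues of $S_w^{-1} S_b$, which is why their sum reproduces the trace criterion.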

4

If $\ell < M-1$ (M being the number of classes), choose the $\ell$ eigenvectors corresponding to the $\ell$ largest eigenvalues. In this case, $J_{3,y} < J_{3,x}$, that is, there is loss of information.

  • Geometric interpretation. The vector y is the projection of x onto the subspace spanned by the eigenvectors of $S_{xw}^{-1} S_{xb}$.
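The eigenvector selection and the resulting loss of information can be checked on synthetic data. The sketch below is only an illustration under stated assumptions: the data (three Gaussian classes in four dimensions), the helper names `scatter_matrices`, `fisher_transform` and `j3`, and the use of class frequencies as priors are all choices made here, not part of the slides.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-class (S_w) and between-class (S_b) scatter matrices of the rows of X."""
    mean_all = X.mean(axis=0)
    m = X.shape[1]
    S_w, S_b = np.zeros((m, m)), np.zeros((m, m))
    for c in np.unique(labels):
        Xc = X[labels == c]
        prior = len(Xc) / len(X)                 # class prior estimated from counts
        S_w += prior * np.cov(Xc, rowvar=False, bias=True)
        diff = (Xc.mean(axis=0) - mean_all)[:, None]
        S_b += prior * (diff @ diff.T)
    return S_w, S_b

def fisher_transform(S_w, S_b, ell):
    """Columns of A: the ell eigenvectors of S_w^{-1} S_b with the largest eigenvalues."""
    L_inv = np.linalg.inv(np.linalg.cholesky(S_w))
    d, U = np.linalg.eigh(L_inv @ S_b @ L_inv.T)  # symmetric, equivalent problem
    order = np.argsort(d)[::-1][:ell]
    return (L_inv.T @ U)[:, order]

def j3(S_w, S_b):
    return np.trace(np.linalg.inv(S_w) @ S_b)

# Synthetic data: M = 3 classes, m = 4 measurements (illustrative values).
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0, 0.0, 0.0],
                  [3.0, 1.0, 0.0, 0.0],
                  [0.0, 2.0, 3.0, 0.0]])
X = np.vstack([rng.normal(loc=mu, scale=1.0, size=(200, 4)) for mu in means])
labels = np.repeat(np.arange(3), 200)

S_w, S_b = scatter_matrices(X, labels)
J3_x = j3(S_w, S_b)
A2 = fisher_transform(S_w, S_b, 2)               # ell = M - 1
A1 = fisher_transform(S_w, S_b, 1)               # ell < M - 1
J3_y2 = j3(A2.T @ S_w @ A2, A2.T @ S_b @ A2)
J3_y1 = j3(A1.T @ S_w @ A1, A1.T @ S_b @ A1)
```

With $\ell = M - 1 = 2$ the criterion is preserved, because the between-class matrix has rank at most M - 1; with $\ell = 1$ the criterion drops, which is the loss of information mentioned above.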

5

Principal Components Analysis (The Karhunen–Loève transform):

The goal: Given an original set of m measurements $x \in \mathbb{R}^m$, compute $y \in \mathbb{R}^\ell$, $y = A^T x$, for an orthogonal A, so that the elements of y are optimally mutually uncorrelated. That is, $E[y(i)\,y(j)] = 0,\ i \neq j$.

Sketch of the proof: $R_y = E[y y^T] = E[A^T x x^T A] = A^T R_x A$
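The relation $R_y = A^T R_x A$ suggests choosing the columns of A as the eigenvectors of $R_x$, which makes $R_y$ diagonal. A minimal numerical sketch follows, assuming zero-mean data and a sample estimate of $R_x$; the mixing matrix `M_mix` and the data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated data: 1000 samples of a 3-dimensional vector x (one per row).
M_mix = np.array([[2.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.5, 0.3, 0.2]])
X = rng.standard_normal((1000, 3)) @ M_mix.T

R_x = X.T @ X / len(X)                 # sample estimate of E[x x^T]
eigvals, A = np.linalg.eigh(R_x)       # columns of A: orthonormal eigenvectors of R_x
order = np.argsort(eigvals)[::-1]      # principal (largest eigenvalue) first
eigvals, A = eigvals[order], A[:, order]

Y = X @ A                              # each row is y^T = x^T A, i.e. y = A^T x
R_y = Y.T @ Y / len(Y)                 # equals A^T R_x A
```

Here `R_y` comes out diagonal, with the eigenvalues of `R_x` on its diagonal, so the transformed features are mutually uncorrelated.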


7

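Define $\hat{x} = \sum_{i=0}^{\ell-1} y(i)\, a_i$, where $a_i$ are the eigenvectors of $R_x$ and $\lambda_i$ the corresponding eigenvalues.

The Karhunen–Loève transform minimizes the square error: $E\left[\|x - \hat{x}\|^2\right] = E\left[\left\|\sum_{i=\ell}^{m-1} y(i)\, a_i\right\|^2\right]$

The error is: $E\left[\|x - \hat{x}\|^2\right] = \sum_{i=\ell}^{m-1} \lambda_i$

It can also be shown that this is the minimum mean square error compared to any other representation of x by an ℓ-dimensional vector.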
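The error formula can be verified empirically: with a sample estimate of $R_x$, the average squared reconstruction error equals the sum of the discarded eigenvalues. The sketch below assumes zero-mean synthetic data; the function names `kl_truncation_error` and `subspace_error` and the random comparison basis are illustrative choices.

```python
import numpy as np

def kl_truncation_error(X, ell):
    """Empirical mean square error of keeping ell KL components, plus the eigenvalues."""
    R_x = X.T @ X / len(X)                 # sample autocorrelation matrix
    lam, A = np.linalg.eigh(R_x)
    order = np.argsort(lam)[::-1]          # eigenvalues in descending order
    lam, A = lam[order], A[:, order]
    A_ell = A[:, :ell]                     # the ell principal eigenvectors
    X_hat = (X @ A_ell) @ A_ell.T          # x_hat = sum_{i<ell} y(i) a_i, row by row
    mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    return mse, lam

def subspace_error(X, Q):
    """Mean square error of projecting the rows of X onto the orthonormal columns of Q."""
    X_hat = (X @ Q) @ Q.T
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])
ell = 2
mse, lam = kl_truncation_error(X, ell)

# A random orthonormal basis of the same dimension, for comparison.
Q, _ = np.linalg.qr(rng.standard_normal((4, ell)))
mse_random = subspace_error(X, Q)
```

In this example `mse` matches `lam[ell:].sum()`, and the randomly chosen subspace gives an error that is at least as large.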

8

In other words, $\hat{x}$ is the projection of x into the subspace spanned by the principal eigenvectors. However, for Pattern Recognition this is not always the best solution.

10

Subspace Classification. Following the idea of projecting in a subspace, the subspace classification classifies an unknown x to the class whose subspace is closer to x.

The following steps are in order:

  • For each class, estimate the autocorrelation matrix $R_i$, and compute the m largest eigenvalues. Form $A_i$, by using the respective eigenvectors as columns.
  • Classify x to the class $\omega_i$ for which the norm of the subspace projection is maximum: $\|A_i^T x\| > \|A_j^T x\| \quad \forall i \neq j$. According to Pythagoras' theorem, this corresponds to the subspace to which x is closer.
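The two steps of subspace classification can be sketched as follows. This is an illustration only: the two synthetic classes, the helper names `class_subspace`, `classify` and `make_class`, and the choice of keeping a single eigenvector per class (parameter `k`) are assumptions made for the example.

```python
import numpy as np

def class_subspace(X_class, k):
    """Columns: the k eigenvectors of the class autocorrelation matrix with largest eigenvalues."""
    R_i = X_class.T @ X_class / len(X_class)
    lam, V = np.linalg.eigh(R_i)
    top = np.argsort(lam)[::-1][:k]
    return V[:, top]

def classify(x, subspaces):
    """Index of the class whose subspace projection of x has the largest norm."""
    norms = [np.linalg.norm(A_i.T @ x) for A_i in subspaces]
    return int(np.argmax(norms))

rng = np.random.default_rng(0)

def make_class(direction, n=300):
    # Samples spread mostly along `direction`, plus small isotropic noise.
    t = rng.normal(0.0, 2.0, size=(n, 1))
    return t * direction + 0.1 * rng.standard_normal((n, 3))

X0 = make_class(np.array([1.0, 0.0, 0.0]))
X1 = make_class(np.array([0.0, 1.0, 0.0]))
subspaces = [class_subspace(X0, 1), class_subspace(X1, 1)]

x_new = np.array([5.0, 0.1, 0.0])
label = classify(x_new, subspaces)

# Pythagoras: ||x||^2 = ||projection||^2 + ||residual||^2, so the largest
# projection norm corresponds to the smallest distance to the subspace.
A0 = subspaces[0]
proj = A0 @ (A0.T @ x_new)
residual = x_new - proj
```

Because the columns of each $A_i$ are orthonormal, the norm of $A_i^T x$ equals the norm of the projection of x onto that subspace, which is what makes the Pythagoras argument apply.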

11

Independent Component Analysis (ICA)

In contrast to PCA, where the goal was to produce uncorrelated features, the goal in ICA is to produce statistically independent features. This is a much stronger requirement, involving statistics of order higher than two. In this way, one may overcome the problems of PCA, as exposed before.

The goal: Given x, compute $y \in \mathbb{R}^\ell$, $y = W x$, so that the components of y are statistically independent. In order for the problem to have a solution, the following assumptions must be valid:

  • Assume that x is indeed generated by a linear combination of independent components: $x = \Phi y$
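To see why independence is a stronger requirement than uncorrelatedness, consider a small constructed example (not from the slides): two zero-mean variables whose second-order cross-moment vanishes, while a higher-order cross-moment does not. A symmetric grid of values stands in for samples of a uniform variable.

```python
import numpy as np

# Symmetric grid standing in for samples of a variable uniform on [-1, 1].
u = np.linspace(-1.0, 1.0, 2001)

y1 = u
y2 = u ** 2 - np.mean(u ** 2)          # a deterministic function of y1, made zero-mean

# Second-order statistic: E[y1 y2] (both variables are zero-mean).
second_order = np.mean(y1 * y2)

# Higher-order statistic: E[y1^2 y2] - E[y1^2] E[y2]; zero if y1, y2 were independent.
higher_order = np.mean(y1 ** 2 * y2) - np.mean(y1 ** 2) * np.mean(y2)
```

The pair is uncorrelated (`second_order` is numerically zero) yet clearly dependent (`higher_order` is positive), so a PCA-style decorrelation would not detect the dependence.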

13

Comon’s method: Given x, and under the previously stated assumptions, the following steps are adopted:

  • Step 1: Perform PCA on x: $\hat{y} = A^T x$
  • Step 2: Compute a unitary matrix, $\hat{A}$, so that the fourth order cross-cumulants of the transformed vector $y = \hat{A}^T \hat{y}$ are zero. This is equivalent to searching for an $\hat{A}$ that makes the squares of the auto-cumulants maximum, $\max_{\hat{A}\hat{A}^T = I} \Psi(\hat{A}) = \sum_i \left(k_4(y(i))\right)^2$, where $k_4(\cdot)$ is the 4th order auto-cumulant.
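The first two steps can be illustrated in two dimensions, where a unitary (here real orthogonal) matrix reduces to a rotation by a single angle. The sketch below is not the optimization procedure of the original method: it replaces it with a brute-force grid search over the angle, and it additionally scales the PCA output to unit variance (whitening), which is an assumption added here. The sources, the mixing matrix `Phi`, and all function names are hypothetical.

```python
import numpy as np

def k4(y):
    """Fourth-order auto-cumulant of a zero-mean sample: E[y^4] - 3 E[y^2]^2."""
    return np.mean(y ** 4) - 3.0 * np.mean(y ** 2) ** 2

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(0)
N = 5000
# Two independent, unit-variance, non-Gaussian (uniform) sources, one sample per row.
S = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(N, 2))
Phi = np.array([[1.0, 0.6],
                [0.4, 1.0]])                 # hypothetical mixing matrix
X = S @ Phi.T                                # rows: x = Phi s
X = X - X.mean(axis=0)

# Step 1: PCA, followed by scaling to unit variance (added assumption).
lam, E = np.linalg.eigh(X.T @ X / N)
A = E / np.sqrt(lam)                         # column j divided by sqrt(lam[j])
Y_hat = X @ A                                # rows: y_hat = A^T x

# Step 2: search the rotation that maximizes the sum of squared auto-cumulants.
def psi(theta):
    Y = Y_hat @ rotation(theta)              # rows: y = A_hat^T y_hat
    return sum(k4(Y[:, i]) ** 2 for i in range(2))

thetas = np.linspace(0.0, np.pi / 2, 181)    # the objective repeats every pi/2
A_hat = rotation(max(thetas, key=psi))

# Combined transform, y = W x with W = (A A_hat)^T.
W = (A @ A_hat).T
P = W @ Phi                                  # maps the sources to the recovered y
```

If the separation works, `P` is close to a permutation matrix with possible sign flips, reflecting the usual ordering and sign ambiguity of the recovered components.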

14

  • Step 3: $W = (A \hat{A})^T$

A hierarchy of components: which ℓ to use? In PCA one chooses the principal ones. In ICA one can choose the ones with the least resemblance to the Gaussian pdf.
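One possible way to rank components by how little they resemble a Gaussian is the magnitude of the normalized fourth-order cumulant (excess kurtosis), which is zero for a Gaussian variable. Using this particular measure is an assumption made here for illustration; the slides do not prescribe one. The three synthetic components and the function name `excess_kurtosis` are hypothetical.

```python
import numpy as np

def excess_kurtosis(y):
    """Normalized fourth-order cumulant: E[z^4] - 3 for the standardized sample z."""
    z = (y - y.mean()) / y.std()
    return np.mean(z ** 4) - 3.0

rng = np.random.default_rng(0)
N = 20000
# Hypothetical candidate components with different distributions.
components = {
    "gaussian": rng.standard_normal(N),
    "uniform": rng.uniform(-1.0, 1.0, N),
    "laplace": rng.laplace(0.0, 1.0, N),
}

# Larger |excess kurtosis| means less resemblance to a Gaussian pdf.
scores = {name: abs(excess_kurtosis(y)) for name, y in components.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under this measure the first ℓ entries of `ranking` would be the components to keep, with the Gaussian-like component ranked last.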