Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Clustering Algorithms Via Function Optimization, Slides of Pattern Classification and Recognition

Banasthali Vidyapith Pattern Classification and Recognition

This lecture is related to Pattern Classification and Recognition. It was delivered by Sahayu Agendra at Banasthali Vidyapith. It includes: Clustering, Algorithms, Function, Optimization, Parametric, Specific, Model, Com[act, Ring-shaped, Decomposition

Typology: Slides

2011/2012

Uploaded on 07/17/2012

bandhula 🇮🇳

4.7

(10)

91 documents

1 / 61

This page cannot be seen from the preview

Don't miss anything!

1

CLUSTERING ALGORITHMS VIA

FUNCTION OPTIMIZATION

In this context the clusters are assumed to be described by a parametric

specific model whose parameters are unknown (all parameters are

included in a vector denoted by θ).

Examples:

Compact clusters. Each cluster Ciis represented by a point miin the

l-dimensional space. Thus θ=[m1T, m2T, …, mmT]T

.

Ring-shaped clusters. Each cluster Ciis modeled by a hypersphere

C(ci,ri), where ciand riare its center and its radius, respectively.

Thus

θ=[c1T, r1, c2T, r2, …, cmT, rm]T

.

A cost J(θ)is defined as a function of the data vectors in Xand θ.

Optimization of J(θ)with respect to θresults in θthat characterizes

optimally the clusters underlying X.

The number of clusters mis a priori known in most of the cases.

docsity.com

Discover Slides of Pattern Classification and Recognition Banasthali Vidyapith

Partial preview of the text

Download Clustering Algorithms Via Function Optimization and more Slides Pattern Classification and Recognition in PDF only on Docsity!

1

CLUSTERING ALGORITHMS VIA

FUNCTION OPTIMIZATION

In this context the clusters are assumed to be described by a parametricspecific model whose parameters are unknown (all parameters areincluded in a vector denoted by

θ

Examples:

Compact clusters. Each cluster

C

is represented by a pointi

m

in thei

l-dimensional space. Thus

θ

=[

m^1

T, m

T 2 , …, m

Tm

T]

Ring-shaped clusters. Each cluster

C

is modeled by a hyperspherei

C(

c^ ,ri^

), wherei

c

andi^

r^ i

are its center and its radius, respectively.

Thus

θ =

[c

T, r 1

, c 1

T 2 , r

, …, c 2

Tm , r

]m

T^.

A cost

J

( θ

)^ is defined as a function of the data vectors in

X

and

θ

Optimization of

J

( θ

)^ with respect to

θ

results in

θ

that characterizes

optimally the clusters underlying

X

The number of clusters

m

is a priori known in most of the cases.

2

Cost optimization clustering algorithms considered in the sequel

Mixture decomposition schemes. 

Fuzzy clustering algorithms. 

Possibilistic clustering algorithms.

4

M-step

t+

arg

max

Q Θ

(t))

More specifically, the M-step results in:For

θ

’s:j

(*) Provided that all pairs of

( θ

, θ k

)^ j are functionally independent.

For

P

’s:j

(**) Taking into account the constraints

P

k 0 ,^

k=

1 ,…,m

and

P

+P 1

+…P 2

=m

Thus, the EM algorithm for this case may be stated as follows:

(*)

1

0 ) ; | ( ln

))( ; | (

^



 



N i

m j

j j i

j

i j^

C x p t x C P



(**)

1

))( ; | (

1







N i

i j

j^

t x C P N P

docsity.com

5



Generalized Mixture Decomposition Algorithmic Scheme (GMDAS)

Choose initial estimates,

θ

= θ

and

P

=P

t= 

Repeat

Compute

, i

,…,N, j

,…,m

Set

θ

(tj

equal to the solution of the equation

with respect to

θ

, forj

j=

1 ,…,m

Set

,^ j=

1 ,…,m

•^

t=

t+

Until convergence, with respect to

, is achieved.

 

^

m k^

k k k i

j j j i i j

t P t C x p t P t C x p t x C P

1

)( ))( ; | (

))( ; | (

^



 



N i

m j

j j i

j

i j^

C x p t x C P 1

1

0 ) ; | ( ln

))( ; | (





 

N i

i j

j^

t x C P N t

P^

1

))( ; | (

1 ) 1 (

7

Compact and Hyperellipsoidal ClustersIn this case :

•^

each cluster

C

is modeled by a normal distributionj

N

( μ

, Σ j

).j

•^

θ j^

consists of the parameters of

μ

and the (independent)j

parameters of

.j

It is

,^

j=

,…,m

For this case:• Eq. (1) in GMDAS is replaced by• Eq. (2) in GMDAS is replaced by the equations

)

( )

1 ( 2 ) (^2) (

| | ln ) ; | ( ln^

1

(^2) / 1 2 /

j

j T j

l j

j j^

x

C xp





^

   

 ^











 



     

^

m k^

k k k T k k j j j T j j j

t P t x t t x t t P t x t t x t t x

CP

1

(^2) / 1

1

(^2) / 1

)( )))(

)((

))(

1 ( 2 exp( |) ( |

)( )))(

)((

))(

1 (^2 exp( |) ( | ))( ;| (



 

 ^

N k

k j

N k

k

k j

j

t x CP

x t x CP

t

1 1

))( ; | (

) 1 ( 









  ^

N k

k j

N k

T j k j k k j j

t x CP

t x t x t x

CP

t

1

))( ; | (

))(

))((

))(( ; | (

) 1 (



8

Remark:

The above scheme is computationally very demanding since it

requires the inversion of the

m

covariance matrices at each

iteration step. Two ways to deal with this problem are:

The use of a single covariance matrix for all clusters.• The use of different diagonal covariance matrices. 

Example 1: (a) Consider three two-dimensional normal distributionswith mean values:

μ^1

=[

,^ 1]

T,^

μ^2

=[3.

,^ 3.5]

T,^

μ^3

=[

,^ 1]

T

and covariance matricesrespectively.A group of

vectors stem from each distribution. These form the

data set

X

3

2

1

10

(b) The same as (a) but now

μ

=[1 1

,^ 1]

T,^

μ^2

=[

,^ 2]

T,^

μ^3

=[

,^ 1]

T^ (The clusters

are closer). Confusion matrix:

Cluster 1

Cluster 2

Cluster 3

st 1 distribution

85

4

11

nd 2 distribution

35

56

9

rd 3 distribution

26

0

74

The algorithm reveals the underlying structure less accurately.

The data set

Results of GMDAS

11



Fuzzy clustering algorithms^ 

Each vector belongs simultaneously to more than one clusters. 

A fuzzy

m

-clustering of

X

, is defined by a set of functions

u^ : Xj^

A

[

,^ 1]

,^

j=

,…,m

If^

A={

, a hard

m

-clustering of

X

is produced.

u

(xj

)^ i

denotes the degree of membership of

x

in clusteri^

C

. It isj

u^1

(x

)+i

u

(x 2

)+i

+^

u^ m

(x

)=1i

The number of clusters

m

is assumed to be known

a priori.

13

Fuzzy clustering algorithms (cont)

Most fuzzy clustering schemes result from the minimization of :

Subject to the constraints:where

u^ ij

[

,1]

,^

i=

,…,N,

j=

,…,m

and

^



N i

m j

j i q ij

q^

x

d

u

U

J^

1



m j

ij^

N

i

u 1

, 1 m

j

N

u

N i

ij^

1

^

14

Remarks:

•^

The degree of membership of

x

ini^

C

cluster is related to the grade ofj

membership of

x

in resti^

m

clusters.

•^

If^

q=

, no fuzzy clustering is better than the best hard clustering in

terms of

J

( θ q

,U

•^

If^

q>

, there are fuzzy clusterings with lower values of

J

( θ q

,U

)^ than

the best hard clustering.

16

Generalized Fuzzy Algorithmic Scheme (GFAS)

Choose

θ

(0)j

as initial estimate for

θ

,^ j

j=

,…,m

•^

t=

Repeat

^

For

i=

to

N

o For

j=

to

m

(A)

o End {For-

j}

^

End {For-

i}

^

t=t

^

For

j=

to

m

o

Parameter updating: Solve

(B)

with respect to

θ

and setj

θ

(tj )^ equal to this solution.

^

End {For-

j}

Until a termination criterion is met



 



^

m j

q

k i

j i

ij^

t

x d

t

x d

t u^

1

1 1 ))( , (

))( , (

1 )(

 

1

N i

j

j i

q ij

x d

t u

17

Remarks:

•^

A candidate termination condition is

|| θ

(t)-

θ (

t-1)||<

ε ,

where

.||^

is any vector norm and

ε

a user-defined constant.

•^

GFAS may also be initialized from

U

instead of

θ

(0)j

,^ j

,…,m

and

start iterations with computing

θ

first.j

•^

If a point

x

coincides with one or more representatives, then it isi^

shared arbitrarily among the clusters whose representatives coincidewith

x

, subject to the constraint that the summation of the degree ofi

membership over all clusters sums to

19



Fuzzy clustering – Point representatives (cont.)

The

Minkowski distance

where

p

is a positive integer and

x

,ik

θ

jk^

are the

k

-th coordinates of

x

i

and

θ

.j

For even and finite

p

, the differentiability of

d

(x

, θ i^

)^ j is guaranteed. In

this case the updating equation (B) of GFAS givesa system of

l^

nonlinear equations with

l^

unknowns.

GFAS algorithms with the Minkowski distance are also known aspFCM algorithms.

^





^









 



N i

p

l k

p jk ik

p ir jr

q ij^

l

r

x

t u 1

(^11)

1

,..., 1

, 0

|

)

( ) 1 (

 

p

l k

p jk ik

j i^

x

x d

1

|

) , (^

 

^  



^





docsity.com

20



Fuzzy Clustering – Point representatives (cont.)

Example 2(a):

Consider the setup of example 1(a).• Consider GFAS with distances

(i)

d

(x

, θ i^

) = (j

x^ - i

θ

T)j

A(

x^ - i

θ

), withj

A

being the identity matrix

(ii)

d

(x

, θ i^

) = (j

x^ - i

θ

T)j

A(

x^ - i

θ

), withj

(iii) The Minkowski distance with

p

Example 2(b):

Consider the setup of example 1(b).• Consider GFAS with the distances considered in example 2(a).

Clustering Algorithms Via Function Optimization, Slides of Pattern Classification and Recognition

Related documents

Partial preview of the text

Download Clustering Algorithms Via Function Optimization and more Slides Pattern Classification and Recognition in PDF only on Docsity!

CLUSTERING ALGORITHMS VIA

FUNCTION OPTIMIZATION

C

=[

T]

C

C(

T^.

J

X

J

X

Q Θ

P

P

=P

•^

^

•^

C

N

•^

=[

,^ 1]

T,^

=[3.

,^ 3.5]

T,^

=[

,^ 1]

X

=[1 1

,^ 1]

T,^

=[

,^ 2]

T,^

=[

,^ 1]

X

A

[

,^ 1]

,^

A={

X

C

+^

a priori.

Fuzzy clustering algorithms (cont)

[

,1]

,^

,…,N,

^

x

d

u

U

J^

N

i

u 1

, 1 m

j

N

u

^

•^

C

•^

J

,U

•^

J

,U