Clustering Algorithms Via Function Optimization, Slides of Pattern Classification and Recognition

This lecture is related to Pattern Classification and Recognition. It was delivered by Sahayu Agendra at Banasthali Vidyapith. It includes: Clustering, Algorithms, Function, Optimization, Parametric, Specific, Model, Com[act, Ring-shaped, Decomposition

Typology: Slides

2011/2012

Uploaded on 07/17/2012

bandhula
bandhula 🇮🇳

4.7

(10)

91 documents

1 / 61

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
CLUSTERING ALGORITHMS VIA
FUNCTION OPTIMIZATION
In this context the clusters are assumed to be described by a parametric
specific model whose parameters are unknown (all parameters are
included in a vector denoted by θ).
Examples:
Compact clusters. Each cluster Ciis represented by a point miin the
l-dimensional space. Thus θ=[m1T, m2T, …, mmT]T
.
Ring-shaped clusters. Each cluster Ciis modeled by a hypersphere
C(ci,ri), where ciand riare its center and its radius, respectively.
Thus
θ=[c1T, r1, c2T, r2, …, cmT, rm]T
.
A cost J(θ)is defined as a function of the data vectors in Xand θ.
Optimization of J(θ)with respect to θresults in θthat characterizes
optimally the clusters underlying X.
The number of clusters mis a priori known in most of the cases.
docsity.com
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25
pf26
pf27
pf28
pf29
pf2a
pf2b
pf2c
pf2d
pf2e
pf2f
pf30
pf31
pf32
pf33
pf34
pf35
pf36
pf37
pf38
pf39
pf3a
pf3b
pf3c
pf3d

Partial preview of the text

Download Clustering Algorithms Via Function Optimization and more Slides Pattern Classification and Recognition in PDF only on Docsity!

1

CLUSTERING ALGORITHMS VIA

FUNCTION OPTIMIZATION

In this context the clusters are assumed to be described by a parametricspecific model whose parameters are unknown (all parameters areincluded in a vector denoted by

θ

Examples:

Compact clusters. Each cluster

C

is represented by a pointi

m

in thei

l-dimensional space. Thus

θ

=[

m^1

T, m

T 2 , …, m

Tm

T]

Ring-shaped clusters. Each cluster

C

is modeled by a hyperspherei

C(

c^ ,ri^

), wherei

c

andi^

r^ i

are its center and its radius, respectively.

Thus

θ =

[c

T, r 1

, c 1

T 2 , r

, …, c 2

Tm , r

]m

T^.

A cost

J

( θ

)^ is defined as a function of the data vectors in

X

and

θ

Optimization of

J

( θ

)^ with respect to

θ

results in

θ

that characterizes

optimally the clusters underlying

X

The number of clusters

m

is a priori known in most of the cases.

2

Cost optimization clustering algorithms considered in the sequel

Mixture decomposition schemes. 

Fuzzy clustering algorithms. 

Possibilistic clustering algorithms.

4

  • M-step

t+

arg

max

Q Θ

(t))

More specifically, the M-step results in:For

θ

’s:j

(*) Provided that all pairs of

( θ

, θ k

)^ j are functionally independent.

For

P

’s:j

(**) Taking into account the constraints

P

k 0 ,^

k=

1 ,…,m

and

P

+P 1

+…P 2

=m

Thus, the EM algorithm for this case may be stated as follows:

(*)

1

1

0 ) ; | ( ln

))( ; | (

^

 

N i

m j

j j i

j

i j^

C x p t x C P

(**)

1

))( ; | (

1



N i

i j

j^

t x C P N P

docsity.com

5

Generalized Mixture Decomposition Algorithmic Scheme (GMDAS)

Choose initial estimates,

θ

= θ

and

P

=P

t= 

Repeat

  • Compute

, i

,…,N, j

,…,m

  • Set

θ

(tj

equal to the solution of the equation

with respect to

θ

, forj

j=

1 ,…,m

  • Set

,^ j=

1 ,…,m

•^

t=

t+

Until convergence, with respect to

, is achieved.

 

^

m k^

k k k i

j j j i i j

t P t C x p t P t C x p t x C P

1

)( ))( ; | (

)( ))( ; | (

))( ; | (

^

 

N i

m j

j j i

j

i j^

C x p t x C P 1

1

0 ) ; | ( ln

))( ; | (



 

N i

i j

j^

t x C P N t

P^

1

))( ; | (

1 ) 1 (

7

Compact and Hyperellipsoidal ClustersIn this case :

•^

each cluster

C

is modeled by a normal distributionj

N

( μ

, Σ j

).j

•^

θ j^

consists of the parameters of

μ

and the (independent)j

parameters of

.j

It is

,^

j=

,…,m

For this case:• Eq. (1) in GMDAS is replaced by• Eq. (2) in GMDAS is replaced by the equations

)

( )

1 ( 2 ) (^2) (

| | ln ) ; | ( ln^

1

(^2) / 1 2 /

j

j T j

l j

j j^

x

x

C xp

^

   

 ^

 

     

^

m k^

k k k T k k j j j T j j j

t P t x t t x t t P t x t t x t t x

CP

1

1

(^2) / 1

1

(^2) / 1

)( )))(

)((

))(

1 ( 2 exp( |) ( |

)( )))(

)((

))(

1 (^2 exp( |) ( | ))( ;| (

 

 ^

N k

k j

N k

k

k j

j

t x CP

x t x CP

t

1 1

))( ; | (

))( ; | (

) 1 ( 

  ^

N k

k j

N k

T j k j k k j j

t x CP

t x t x t x

CP

t

1

1

))( ; | (

))(

))((

))(( ; | (

) 1 (

8

Remark:

  • The above scheme is computationally very demanding since it

requires the inversion of the

m

covariance matrices at each

iteration step. Two ways to deal with this problem are:

  • The use of a single covariance matrix for all clusters.• The use of different diagonal covariance matrices. 

Example 1: (a) Consider three two-dimensional normal distributionswith mean values:

μ^1

=[

,^ 1]

T,^

μ^2

=[3.

,^ 3.5]

T,^

μ^3

=[

,^ 1]

T

and covariance matricesrespectively.A group of

vectors stem from each distribution. These form the

data set

X

3

2

1

10

(b) The same as (a) but now

μ

=[1 1

,^ 1]

T,^

μ^2

=[

,^ 2]

T,^

μ^3

=[

,^ 1]

T^ (The clusters

are closer). Confusion matrix:

Cluster 1

Cluster 2

Cluster 3

st 1 distribution

85

4

11

nd 2 distribution

35

56

9

rd 3 distribution

26

0

74

The algorithm reveals the underlying structure less accurately.

The data set

Results of GMDAS

11

Fuzzy clustering algorithms^ 

Each vector belongs simultaneously to more than one clusters. 

A fuzzy

m

-clustering of

X

, is defined by a set of functions

u^ : Xj^

A

[

,^ 1]

,^

j=

,…,m

If^

A={

, a hard

m

-clustering of

X

is produced.

u

(xj

)^ i

denotes the degree of membership of

x

in clusteri^

C

. It isj

u^1

(x

)+i

u

(x 2

)+i

+^

u^ m

(x

)=1i

The number of clusters

m

is assumed to be known

a priori.

13

Fuzzy clustering algorithms (cont)

Most fuzzy clustering schemes result from the minimization of :

Subject to the constraints:where

u^ ij

[

,1]

,^

i=

,…,N,

j=

,…,m

and

^

N i

m j

j i q ij

q^

x

d

u

U

J^

1

1

m j

ij^

N

i

u 1

, 1 m

j

N

u

N i

ij^

1

^

14

Remarks:

•^

The degree of membership of

x

ini^

C

cluster is related to the grade ofj

membership of

x

in resti^

m

clusters.

•^

If^

q=

, no fuzzy clustering is better than the best hard clustering in

terms of

J

( θ q

,U

•^

If^

q>

, there are fuzzy clusterings with lower values of

J

( θ q

,U

)^ than

the best hard clustering.

16

Generalized Fuzzy Algorithmic Scheme (GFAS)

  • Choose

θ

(0)j

as initial estimate for

θ

,^ j

j=

,…,m

•^

t=

  • Repeat

^

For

i=

to

N

o For

j=

to

m

(A)

o End {For-

j}

^

End {For-

i}

^

t=t

^

For

j=

to

m

o

Parameter updating: Solve

(B)

with respect to

θ

and setj

θ

(tj )^ equal to this solution.

^

End {For-

j}

  • Until a termination criterion is met

 



^

m j

q

k i

j i

ij^

t

x d

t

x d

t u^

1

1 1 ))( , (

))( , (

1 )(

 

1

N i

j

j i

q ij

x d

t u

17

Remarks:

•^

A candidate termination condition is

|| θ

(t)-

θ (

t-1)||<

ε ,

where

.||^

is any vector norm and

ε

a user-defined constant.

•^

GFAS may also be initialized from

U

instead of

θ

(0)j

,^ j

,…,m

and

start iterations with computing

θ

first.j

•^

If a point

x

coincides with one or more representatives, then it isi^

shared arbitrarily among the clusters whose representatives coincidewith

x

, subject to the constraint that the summation of the degree ofi

membership over all clusters sums to

19

Fuzzy clustering – Point representatives (cont.)

  • The

Minkowski distance

where

p

is a positive integer and

x

,ik

θ

jk^

are the

k

-th coordinates of

x

i

and

θ

.j

For even and finite

p

, the differentiability of

d

(x

, θ i^

)^ j is guaranteed. In

this case the updating equation (B) of GFAS givesa system of

l^

nonlinear equations with

l^

unknowns.

GFAS algorithms with the Minkowski distance are also known aspFCM algorithms.

^

^

 

N i

p

l k

p jk ik

p ir jr

q ij^

l

r

x

x

t u 1

(^11)

1

1

,..., 1

, 0

|

|

)

( ) 1 (

 

p

l k

p jk ik

j i^

x

x d

1

1

|

|

) , (^

 

^  

^



docsity.com

20

Fuzzy Clustering – Point representatives (cont.)

Example 2(a):

  • Consider the setup of example 1(a).• Consider GFAS with distances

(i)

d

(x

, θ i^

) = (j

x^ - i

θ

T)j

A(

x^ - i

θ

), withj

A

being the identity matrix

(ii)

d

(x

, θ i^

) = (j

x^ - i

θ

T)j

A(

x^ - i

θ

), withj

(iii) The Minkowski distance with

p

Example 2(b):

  • Consider the setup of example 1(b).• Consider GFAS with the distances considered in example 2(a).

^

A