





















































Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
This lecture is related to Pattern Classification and Recognition. It was delivered by Sahayu Agendra at Banasthali Vidyapith. It includes: Clustering, Algorithms, Function, Optimization, Parametric, Specific, Model, Com[act, Ring-shaped, Decomposition
Typology: Slides
1 / 61
This page cannot be seen from the preview
Don't miss anything!






















































1
In this context the clusters are assumed to be described by a parametricspecific model whose parameters are unknown (all parameters areincluded in a vector denoted by
θ
Examples:
Compact clusters. Each cluster
is represented by a pointi
m
in thei
l-dimensional space. Thus
θ
m^1
T, m
T 2 , …, m
Tm
Ring-shaped clusters. Each cluster
is modeled by a hyperspherei
c^ ,ri^
), wherei
c
andi^
r^ i
are its center and its radius, respectively.
Thus
θ =
[c
T, r 1
, c 1
T 2 , r
, …, c 2
Tm , r
]m
A cost
( θ
)^ is defined as a function of the data vectors in
and
θ
Optimization of
( θ
)^ with respect to
θ
results in
θ
that characterizes
optimally the clusters underlying
The number of clusters
m
is a priori known in most of the cases.
2
Cost optimization clustering algorithms considered in the sequel
Mixture decomposition schemes.
Fuzzy clustering algorithms.
Possibilistic clustering algorithms.
4
t+
arg
max
(t))
More specifically, the M-step results in:For
θ
’s:j
(*) Provided that all pairs of
( θ
, θ k
)^ j are functionally independent.
For
’s:j
(**) Taking into account the constraints
P
k 0 ,^
k=
1 ,…,m
and
P
+P 1
+…P 2
=m
Thus, the EM algorithm for this case may be stated as follows:
(*)
1
1
0 ) ; | ( ln
))( ; | (
^
N i
m j
j j i
j
i j^
C x p t x C P
(**)
1
))( ; | (
1
N i
i j
j^
t x C P N P
docsity.com
5
Generalized Mixture Decomposition Algorithmic Scheme (GMDAS)
Choose initial estimates,
θ
= θ
and
t=
Repeat
, i
,…,N, j
,…,m
θ
(tj
equal to the solution of the equation
with respect to
θ
, forj
j=
1 ,…,m
,^ j=
1 ,…,m
t=
t+
Until convergence, with respect to
, is achieved.
^
m k^
k k k i
j j j i i j
t P t C x p t P t C x p t x C P
1
)( ))( ; | (
)( ))( ; | (
))( ; | (
N i
m j
j j i
j
i j^
C x p t x C P 1
1
0 ) ; | ( ln
))( ; | (
N i
i j
j^
t x C P N t
P^
1
))( ; | (
1 ) 1 (
7
Compact and Hyperellipsoidal ClustersIn this case :
each cluster
is modeled by a normal distributionj
( μ
, Σ j
).j
θ j^
consists of the parameters of
μ
and the (independent)j
parameters of
.j
It is
,^
j=
,…,m
For this case:• Eq. (1) in GMDAS is replaced by• Eq. (2) in GMDAS is replaced by the equations
)
( )
1 ( 2 ) (^2) (
| | ln ) ; | ( ln^
1
(^2) / 1 2 /
j
j T j
l j
j j^
x
x
C xp
^
^
^
m k^
k k k T k k j j j T j j j
t P t x t t x t t P t x t t x t t x
CP
1
1
(^2) / 1
1
(^2) / 1
)( )))(
)((
))(
1 ( 2 exp( |) ( |
)( )))(
)((
))(
1 (^2 exp( |) ( | ))( ;| (
^
N k
k j
N k
k
k j
j
t x CP
x t x CP
t
1 1
))( ; | (
))( ; | (
) 1 (
^
N k
k j
N k
T j k j k k j j
t x CP
t x t x t x
CP
t
1
1
))( ; | (
))(
))((
))(( ; | (
) 1 (
8
Remark:
requires the inversion of the
m
covariance matrices at each
iteration step. Two ways to deal with this problem are:
Example 1: (a) Consider three two-dimensional normal distributionswith mean values:
μ^1
μ^2
μ^3
T
and covariance matricesrespectively.A group of
vectors stem from each distribution. These form the
data set
3
2
1
10
(b) The same as (a) but now
μ
μ^2
μ^3
T^ (The clusters
are closer). Confusion matrix:
Cluster 1
Cluster 2
Cluster 3
st 1 distribution
85
4
11
nd 2 distribution
35
56
9
rd 3 distribution
26
0
74
The algorithm reveals the underlying structure less accurately.
The data set
Results of GMDAS
11
Fuzzy clustering algorithms^
Each vector belongs simultaneously to more than one clusters.
A fuzzy
m
-clustering of
, is defined by a set of functions
u^ : Xj^
j=
,…,m
If^
, a hard
m
-clustering of
is produced.
u
(xj
)^ i
denotes the degree of membership of
x
in clusteri^
. It isj
u^1
(x
)+i
u
(x 2
)+i
u^ m
(x
)=1i
The number of clusters
m
is assumed to be known
13
Most fuzzy clustering schemes result from the minimization of :
Subject to the constraints:where
u^ ij
i=
j=
,…,m
and
N i
m j
j i q ij
q^
1
1
m j
ij^
N i
ij^
1
14
Remarks:
The degree of membership of
x
ini^
cluster is related to the grade ofj
membership of
x
in resti^
m
clusters.
If^
q=
, no fuzzy clustering is better than the best hard clustering in
terms of
( θ q
If^
q>
, there are fuzzy clusterings with lower values of
( θ q
)^ than
the best hard clustering.
16
Generalized Fuzzy Algorithmic Scheme (GFAS)
θ
(0)j
as initial estimate for
θ
j=
,…,m
t=
For
i=
to
o For
j=
to
m
o End {For-
j}
End {For-
i}
t=t
For
j=
to
m
o
with respect to
θ
and setj
θ
(tj )^ equal to this solution.
End {For-
j}
^
m j
q
k i
j i
ij^
t
x d
t
x d
t u^
1
1 1 ))( , (
))( , (
1 )(
1
N i
j
j i
q ij
x d
t u
17
Remarks:
A candidate termination condition is
|| θ
(t)-
θ (
t-1)||<
ε ,
where
is any vector norm and
ε
a user-defined constant.
GFAS may also be initialized from
instead of
θ
(0)j
,^ j
,…,m
and
start iterations with computing
θ
first.j
If a point
x
coincides with one or more representatives, then it isi^
shared arbitrarily among the clusters whose representatives coincidewith
x
, subject to the constraint that the summation of the degree ofi
membership over all clusters sums to
19
Fuzzy clustering – Point representatives (cont.)
where
p
is a positive integer and
x
,ik
θ
jk^
are the
k
-th coordinates of
x
i
and
θ
.j
For even and finite
p
, the differentiability of
d
(x
, θ i^
)^ j is guaranteed. In
this case the updating equation (B) of GFAS givesa system of
l^
nonlinear equations with
l^
unknowns.
^
^
N i
p
l k
p jk ik
p ir jr
q ij^
l
r
x
x
t u 1
(^11)
1
1
,..., 1
, 0
|
|
)
( ) 1 (
p
l k
p jk ik
j i^
x
x d
1
1
|
|
) , (^
^
^
docsity.com
20
Fuzzy Clustering – Point representatives (cont.)
Example 2(a):
(i)
d
(x
, θ i^
) = (j
x^ - i
θ
T)j
x^ - i
θ
), withj
being the identity matrix
(ii)
d
(x
, θ i^
) = (j
x^ - i
θ
T)j
x^ - i
θ
), withj
(iii) The Minkowski distance with
p
Example 2(b):