

Topics covered in these lecture notes: Cluster Analysis, Hierarchical Cluster Analyses, Stimulus, Given Proximity, Pairwise Combinations, Dissimilarity Data, Euclidean Distance, Minkowski Metric, Correlation.
Typology: Study notes


Ch. 14: Cluster Analysis (CA)
I. Situation
A. Given proximity or distance data obtained from all possible pairwise combinations of stimuli, or regular subject-by-variable data from a usual data collection, CA forms clusters based on the distances between stimuli (variables).
B. Many similar techniques have been developed in different areas (biology, sociology, psychology). SAS lists 11 methods.
C. CA methods can be classified as hierarchical or non-hierarchical cluster analyses.
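II. Similarity (Proximity) and Dissimilarity (Distance) data
A. Euclidean distance
$$d(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})'(\mathbf{x} - \mathbf{y})} = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}$$
The covariance-weighted (statistical) form is
$$d(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})' \mathbf{S}^{-1} (\mathbf{x} - \mathbf{y})},$$
which equals the Euclidean distance when $\mathbf{S} = \mathbf{I}$,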
where $\mathbf{x} = (x_1, x_2, \ldots, x_p)'$, $\mathbf{y} = (y_1, y_2, \ldots, y_p)'$, p = the number of stimuli, and $\mathbf{S}$ = the sample covariance matrix.
B. Minkowski metric (general formula)
$$d(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{p} |x_i - y_i|^{r} \right)^{1/r}$$
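where p = the number of stimuli and r = the order of the power. If r = 2, it is the Euclidean distance. If r = 1 (for example, with p = 2), it is the city-block (Manhattan) distance.
C. Correlation (closeness of the shapes)
The squared Euclidean distance $\sum_{i=1}^{p} (x_i - y_i)^2$ can be expressed as
$$d^2(\mathbf{x}, \mathbf{y}) = (v_x - v_y)^2 + p(\bar{x} - \bar{y})^2 + 2 v_x v_y (1 - r_{xy}),$$
where
$$v_x = \sqrt{\sum_{i=1}^{p} (x_i - \bar{x})^2}, \quad \bar{x} = \frac{1}{p} \sum_{i=1}^{p} x_i, \quad v_y = \sqrt{\sum_{i=1}^{p} (y_i - \bar{y})^2}, \quad \bar{y} = \frac{1}{p} \sum_{i=1}^{p} y_i,$$
and $r_{xy}$ = the Pearson product-moment correlation coefficient.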
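The distance formulas in this section can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names (`minkowski`, `statistical_distance`, `decomposition`) and the two example profiles are hypothetical illustrations, not part of the original notes. It computes the Minkowski distance for r = 1 and r = 2 and compares the direct squared Euclidean distance with the three-term decomposition from part C.

```python
import numpy as np

def minkowski(x, y, r):
    """Minkowski distance of order r between two profiles x and y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** r) ** (1.0 / r))

def statistical_distance(x, y, S):
    """Covariance-weighted distance sqrt((x - y)' S^{-1} (x - y))."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ np.linalg.solve(S, d)))

def decomposition(x, y):
    """(vx - vy)^2 + p (xbar - ybar)^2 + 2 vx vy (1 - r_xy)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    p = x.size
    vx = np.sqrt(np.sum((x - x.mean()) ** 2))
    vy = np.sqrt(np.sum((y - y.mean()) ** 2))
    r_xy = np.corrcoef(x, y)[0, 1]  # Pearson correlation of the two profiles
    return float((vx - vy) ** 2 + p * (x.mean() - y.mean()) ** 2
                 + 2.0 * vx * vy * (1.0 - r_xy))

# Hypothetical example profiles (p = 4)
x = np.array([1.0, 4.0, 2.0, 7.0])
y = np.array([3.0, 3.0, 5.0, 6.0])

euclid = minkowski(x, y, 2)       # r = 2: Euclidean distance
city_block = minkowski(x, y, 1)   # r = 1: city-block distance
d2_direct = float(np.sum((x - y) ** 2))
d2_parts = decomposition(x, y)
stat_identity = statistical_distance(x, y, np.eye(4))  # S = I

print(euclid, city_block, d2_direct, d2_parts, stat_identity)
```

The decomposition separates a difference in spread, a difference in level, and a difference in shape (through the correlation), which is why two profiles with a high correlation can still be far apart in Euclidean terms.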
III. Different methods of making clusters in Hierarchical Cluster Analysis
A. Single Linkage: makes clusters based on the minimum distance between one stimulus in one cluster and one stimulus in the other cluster.
B. Complete Linkage: makes clusters based on the maximum distance between one stimulus in one cluster and one stimulus in the other cluster.
C. Average Linkage: makes clusters based on the average distance between pairs of stimuli, one from each cluster.
D. Centroid Linkage: makes clusters based on the Euclidean distance between the cluster means (centroids).
E. Median Method: to avoid the impact of cluster size, uses the midpoint of the two clusters.
F. Ward's Method: makes clusters by merging the pair of clusters that gives the minimum increase in the error sum of squares (SSE).
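The six linkage rules above can be tried side by side. This is a minimal sketch, assuming SciPy's `scipy.cluster.hierarchy` module is used in place of SAS; the toy data (two well-separated groups of five points) are hypothetical and chosen so that every method should recover the same two clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: two well-separated groups of 5 points in 2 dimensions
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(5, 2))
group_b = rng.normal(loc=5.0, scale=0.5, size=(5, 2))
X = np.vstack([group_a, group_b])

methods = ["single", "complete", "average", "centroid", "median", "ward"]
labels = {}
for m in methods:
    # Z records, for each merge, the two clusters joined and their distance
    Z = linkage(X, method=m, metric="euclidean")
    # Cut the tree so that at most two flat clusters remain
    labels[m] = fcluster(Z, t=2, criterion="maxclust")
    print(f"{m:>8}: {labels[m]}")
```

On data this clean the methods agree. They tend to differ on elongated or unequal-sized clusters, where single linkage chains points together and complete linkage favors compact groups.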
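IV. Ward's minimum-variance method
A. Model
$$D_{AB} = \frac{\lVert \bar{\mathbf{y}}_A - \bar{\mathbf{y}}_B \rVert^2}{\dfrac{1}{n_A} + \dfrac{1}{n_B}},$$
where
$D_{AB}$: squared distance between cluster A and cluster B,
$\bar{\mathbf{y}}_A$, $\bar{\mathbf{y}}_B$: mean vectors for cluster A and cluster B, and
$n_A$, $n_B$: sample sizes for cluster A and cluster B.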
$I_{AB} = SSE_{AB} - (SSE_A + SSE_B)$, where
$$SSE_A = \sum_{i=1}^{n_A} (\mathbf{y}_i - \bar{\mathbf{y}}_A)'(\mathbf{y}_i - \bar{\mathbf{y}}_A)$$
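The two quantities in the model can be compared numerically. The sketch below assumes NumPy and two small hypothetical clusters; the helper names `sse` and `ward_distance` are illustrative only. It computes the squared distance between clusters A and B from their means and sizes, and the increase in SSE caused by merging them, to show that the two values coincide for this example.

```python
import numpy as np

def sse(Y):
    """Error sum of squares of the rows of Y around their mean vector."""
    centered = Y - Y.mean(axis=0)
    return float(np.sum(centered ** 2))

def ward_distance(A, B):
    """||mean_A - mean_B||^2 / (1/n_A + 1/n_B)."""
    diff = A.mean(axis=0) - B.mean(axis=0)
    return float(diff @ diff / (1.0 / len(A) + 1.0 / len(B)))

# Hypothetical clusters: n_A = 3 and n_B = 2 observations in 2 dimensions
A = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
B = np.array([[6.0, 5.0], [7.0, 7.0]])

D_AB = ward_distance(A, B)
# Increase in SSE when A and B are merged into one cluster
I_AB = sse(np.vstack([A, B])) - (sse(A) + sse(B))

print(D_AB, I_AB)
```

At each step of the hierarchy, the pair of clusters with the smallest such value would be the one merged under this criterion.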