Cluster Analysis - Basic Statistics for Behavioral Sciences - Lecture Notes, Study notes of Statistics for Psychologists

English and Foreign Languages University Statistics for Psychologists

Cluster Analysis, Hierarchical Cluster Analyses, Stimulus, Given Proximity, Pairwise Combinations, Dissimilarity Data, Euclidean Distance, Minkowski Metric, Correlation are some points from this helpful lecture notes.

Typology: Study notes

2011/2012

Uploaded on 11/21/2012

ashakiran 🇮🇳


Ch. 14: Cluster Analysis (CA)

I. Situation

A. Given proximity or distance data obtained from all possible pairwise combinations of stimuli, or regular subject-by-variable data from a usual data collection, CA makes clusters based on the distance between stimuli (variables).

B. Many similar techniques have been developed in different areas (biology, sociology, psychology). SAS lists 11 methods.

C. CA methods can be classified into hierarchical or non-hierarchical cluster analyses.

1. Hierarchical Cluster Analysis

a) Starts with n clusters (each stimulus is a cluster).

b) The two closest clusters are merged to form a new cluster that replaces the two old clusters.

c) Merging the two closest clusters is repeated until only one cluster is left.

d) Different methods have different ways to compute the distance between two clusters.

2. Non-hierarchical Cluster Analysis: the n clusters are separated into g clusters without using a hierarchical method.
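
Steps a) to d) of the hierarchical procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: single linkage is chosen as the between-cluster distance (step d leaves that choice open), and the function names `euclidean`, `single_linkage`, and `agglomerate`, as well as the four sample points, are hypothetical and not part of the notes.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def single_linkage(points, a, b):
    """Assumed rule for step d): minimum distance between members of a and b."""
    return min(euclidean(points[i], points[j]) for i in a for j in b)


def agglomerate(points, cluster_distance=single_linkage):
    """Merge the two closest clusters until one is left; return the merge history."""
    clusters = [(i,) for i in range(len(points))]  # step a): n singleton clusters
    history = []
    while len(clusters) > 1:                       # step c): repeat until one cluster
        # step b): find the closest pair of clusters
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(points, *pair))
        d = cluster_distance(points, a, b)
        # the merged cluster replaces the two old clusters
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        history.append((a, b, d))
    return history


# Hypothetical data: two tight pairs that are far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
history = agglomerate(points)
for a, b, d in history:
    print(a, b, round(d, 3))
```

With n stimuli the loop always performs n - 1 merges, so `history` is the information a dendrogram would display. Passing a different `cluster_distance` function changes the method without touching the loop.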

II. Similarity (Proximity) and Dissimilarity (Distance) data

A. Euclidean distance

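$d(x, y) = \sqrt{(x - y)'(x - y)} = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}$,

and, when distances are standardized by the covariance matrix, $d(x, y) = \sqrt{(x - y)' S^{-1} (x - y)}$ (equal to the form above only when $S = I$),

where

$x = (x_1, x_2, \ldots, x_p)'$,

$y = (y_1, y_2, \ldots, y_p)'$,

p = the number of stimuli, and

S = sample covariance matrix.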
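
As a small numeric illustration, the sketch below computes the Euclidean distance and the covariance-weighted form for p = 2. The function names, the 2 x 2 restriction (chosen so the inverse can be written out by hand), and the example values of x, y, and S are assumptions made for this example only.

```python
import math


def euclidean(x, y):
    """sqrt of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def statistical_distance_2d(x, y, S):
    """sqrt((x - y)' S^{-1} (x - y)) for p = 2, with S given as a 2 x 2 nested tuple."""
    (a, b), (c, d) = S
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # closed-form 2 x 2 inverse
    u = (x[0] - y[0], x[1] - y[1])
    q = sum(u[i] * inv[i][j] * u[j] for i in range(2) for j in range(2))
    return math.sqrt(q)


# Hypothetical vectors and covariance matrices.
x, y = (1.0, 2.0), (4.0, 6.0)
d_euclid = euclidean(x, y)
d_identity = statistical_distance_2d(x, y, ((1.0, 0.0), (0.0, 1.0)))
d_scaled = statistical_distance_2d(x, y, ((9.0, 0.0), (0.0, 16.0)))
print(d_euclid, d_identity, d_scaled)
```

With S equal to the identity the two distances agree; with unequal variances each squared difference is divided by its variance, so the covariance-weighted distance shrinks along high-variance coordinates.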

B. Minkowski metric (general formula)

$d(x, y) = \left( \sum_{i=1}^{p} |x_i - y_i|^r \right)^{1/r}$,

where p = the number of stimuli, and r = the order of power.

If r = 2, it is the Euclidean distance. If r = 1, it is the city-block (Manhattan) distance.
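
The general formula translates directly into code. The sketch below is a minimal version; the function name `minkowski` and the sample vectors are hypothetical, and restricting r to values of at least 1 is an assumption made so that the result is a proper metric.

```python
def minkowski(x, y, r):
    """Minkowski distance of order r between two equal-length tuples."""
    if r < 1:
        raise ValueError("r must be at least 1")
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)


# Hypothetical vectors with p = 3; coordinate differences are 3, 4, 0.
x, y = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
city_block = minkowski(x, y, 1)  # 3 + 4 + 0 = 7
euclid = minkowski(x, y, 2)      # sqrt(9 + 16 + 0) = 5
print(city_block, euclid)
```

The example uses p = 3 to show that the city-block case depends only on r = 1, not on the number of coordinates.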

C. Correlation (closeness of the shapes)

The squared Euclidean distance $d^2(x, y) = \sum_{i=1}^{p} (x_i - y_i)^2$ can be expressed as

$d^2(x, y) = (v_x - v_y)^2 + p(\bar{x} - \bar{y})^2 + 2 v_x v_y (1 - r_{xy})$,


where

$v_x = \sqrt{\sum_{i=1}^{p} (x_i - \bar{x})^2}$, $\bar{x} = \frac{1}{p} \sum_{i=1}^{p} x_i$,

$v_y = \sqrt{\sum_{i=1}^{p} (y_i - \bar{y})^2}$, $\bar{y} = \frac{1}{p} \sum_{i=1}^{p} y_i$, and

$r_{xy}$ = Pearson product-moment correlation coefficient.
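
The decomposition can be checked numerically. The sketch below assumes that v_x and v_y are the square roots of the sums of squared deviations (the reading under which the identity holds) and that neither is zero. The function name `decomposition`, the labels "scatter", "level", and "shape", and the sample profiles are hypothetical.

```python
import math


def decomposition(x, y):
    """Return the three terms of the squared Euclidean distance between x and y."""
    p = len(x)
    xbar, ybar = sum(x) / p, sum(y) / p
    vx = math.sqrt(sum((a - xbar) ** 2 for a in x))
    vy = math.sqrt(sum((b - ybar) ** 2 for b in y))
    # Pearson correlation; requires vx and vy to be nonzero
    r = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (vx * vy)
    scatter = (vx - vy) ** 2       # difference in dispersion
    level = p * (xbar - ybar) ** 2  # difference in means
    shape = 2 * vx * vy * (1 - r)   # depends on the correlation only through 1 - r
    return scatter, level, shape


# Hypothetical profiles with p = 4.
x = (1.0, 2.0, 3.0, 4.0)
y = (2.0, 1.0, 4.0, 5.0)
direct = sum((a - b) ** 2 for a, b in zip(x, y))
total = sum(decomposition(x, y))
print(direct, total)
```

The last term vanishes when r_xy = 1, which is why the correlation is read as a measure of how close the shapes of two profiles are once differences in level and dispersion are set aside.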

III. Different methods of making clusters in Hierarchical Cluster Analysis

A. Single Linkage: makes clusters based on the minimum distance between one stimulus in one cluster and one stimulus in the other cluster.

B. Complete Linkage: makes clusters based on the maximum distance between one stimulus in one cluster and one stimulus in the other cluster.

C. Average Linkage: makes clusters based on the average distance between pairs of stimuli, one from each cluster.

D. Centroid Linkage: makes clusters based on the Euclidean distance between their means (centroids).

E. Median Method: to avoid the impact of cluster size, we can use the midpoint of the two clusters.

F. Ward's Method: makes clusters by joining the two clusters whose merger gives the minimum increase in SSE.
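
The first four rules differ only in how they summarize the distances between two clusters, which the following sketch makes concrete. It assumes Euclidean distance between stimuli; the function names and the two small clusters are hypothetical, and the median and Ward's methods are left out to keep the example short.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def single(A, B):
    """Minimum distance over all cross-cluster pairs."""
    return min(euclidean(a, b) for a in A for b in B)


def complete(A, B):
    """Maximum distance over all cross-cluster pairs."""
    return max(euclidean(a, b) for a in A for b in B)


def average(A, B):
    """Mean distance over all cross-cluster pairs."""
    return sum(euclidean(a, b) for a in A for b in B) / (len(A) * len(B))


def centroid(A, B):
    """Distance between the cluster mean vectors."""
    def mean(C):
        return tuple(sum(col) / len(C) for col in zip(*C))
    return euclidean(mean(A), mean(B))


# Hypothetical clusters of two stimuli each.
A = [(0.0, 0.0), (0.0, 2.0)]
B = [(3.0, 0.0), (3.0, 2.0)]
print(single(A, B), complete(A, B), average(A, B), centroid(A, B))
```

For any pair of clusters the single-linkage value is never larger than the average, and the average is never larger than the complete-linkage value, so the three methods can disagree about which pair is closest.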

IV. Ward's minimum-variance method

A. Model

$D_{AB} = \dfrac{\lVert \bar{Y}_A - \bar{Y}_B \rVert^2}{\frac{1}{n_A} + \frac{1}{n_B}}$, where

$D_{AB}$: squared distance between cluster A and cluster B,

$\bar{Y}_A$, $\bar{Y}_B$: mean vectors for cluster A and cluster B, and

$n_A$, $n_B$: sample sizes for cluster A and cluster B.

$\sqrt{\sum (\bar{Y}_A - \bar{Y}_B)^2} = \lVert \bar{Y}_A - \bar{Y}_B \rVert$ = Euclidean length.

For any distance or dissimilarity data, $d(x, y) = \sum (x - y)^2 / 2$. Ward's Method joins the two clusters A and B that minimize $I_{AB}$ (the increase in SSE), which is the same as minimizing the between-cluster distance $D_{AB}$.

$I_{AB} = SSE_{AB} - (SSE_A + SSE_B)$, where

$SSE_A = \sum_{i=1}^{n_A} (y_i - \bar{Y}_A)'(y_i - \bar{Y}_A)$.
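
The claim that minimizing the increase in SSE is the same as minimizing the between-cluster distance can be checked numerically. The sketch below assumes the distance is the squared Euclidean length of the difference between the mean vectors divided by (1/n_A + 1/n_B), and that SSE is the sum of squared deviations from the cluster mean. The function names and sample clusters are hypothetical.

```python
def mean_vector(C):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(C) for col in zip(*C))


def sse(C):
    """Sum of squared deviations of the members of C from their mean vector."""
    m = mean_vector(C)
    return sum(sum((v - mj) ** 2 for v, mj in zip(row, m)) for row in C)


def ward_distance(A, B):
    """Assumed form: ||mean(A) - mean(B)||^2 / (1/n_A + 1/n_B)."""
    mA, mB = mean_vector(A), mean_vector(B)
    squared_length = sum((a - b) ** 2 for a, b in zip(mA, mB))
    return squared_length / (1.0 / len(A) + 1.0 / len(B))


def sse_increase(A, B):
    """Increase in SSE caused by merging A and B."""
    return sse(A + B) - (sse(A) + sse(B))


# Hypothetical clusters of unequal size.
A = [(0.0, 0.0), (0.0, 2.0)]
B = [(4.0, 1.0)]
print(ward_distance(A, B), sse_increase(A, B))

# Two single stimuli: the distance reduces to the sum of squared differences over 2.
print(ward_distance([(1.0, 2.0)], [(4.0, 6.0)]))  # (9 + 16) / 2 = 12.5
```

Both printed values on the first line agree up to rounding, so ranking candidate merges by either quantity gives the same choice.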
