






This document describes the methods the PROXIMITIES procedure offers for standardizing cases or variables: Z, RANGE, RESCALE, MAX, MEAN, and SD. It also covers the ABSOLUTE, REVERSE, and RESCALE transformations, and introduces proximity measures for continuous, frequency-count, and binary data, such as EUCLID, CORRELATION, and RR.
Typology: Study notes







Either cases or variables can be standardized. The following methods of standardization are available:
Z. PROXIMITIES subtracts the mean from each value for the variable or case being standardized and then divides by the standard deviation of the values. If a standard deviation is 0, PROXIMITIES sets all values for the case or variable to 0.
RANGE. PROXIMITIES divides each value for the variable or case being standardized by the range of the values. If the range is 0, PROXIMITIES leaves all values unchanged.
RESCALE. From each value for the variable or case being standardized, PROXIMITIES subtracts the minimum value and then divides by the range. If a range is 0, PROXIMITIES sets all values for the case or variable to 0.50.
MAX. PROXIMITIES divides each value for the variable or case being standardized by the maximum of the values. If the maximum of a set of values is 0, PROXIMITIES uses an alternate process to produce a comparable standardization: it divides by the absolute magnitude of the smallest value and adds 1.
MEAN. PROXIMITIES divides each value for the variable or case being standardized by the mean of the values. If a mean is 0, PROXIMITIES adds 1 to all values for the case or variable to produce a mean of 1.
SD. PROXIMITIES divides each value for the variable or case being standardized by the standard deviation of the values. PROXIMITIES does not change the values if their standard deviation is 0.
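The six standardization rules, including their zero-denominator fallbacks, can be sketched in Python as follows. This is an illustrative reading of the descriptions above, not the software's own code. The function name `standardize`, the use of the sample standard deviation (n - 1 denominator), and the handling of an all-zero input under MAX are assumptions that the text does not settle.

```python
import statistics


def standardize(values, method):
    """Standardize one case or variable (a list of numbers).

    `method` is one of "Z", "RANGE", "RESCALE", "MAX", "MEAN", "SD".
    """
    lo, hi = min(values), max(values)
    rng = hi - lo
    mean = statistics.fmean(values)
    # Assumption: sample standard deviation; the text does not say which form is used.
    sd = statistics.stdev(values) if len(values) > 1 else 0.0

    if method == "Z":
        # Zero standard deviation: all values become 0.
        return [0.0] * len(values) if sd == 0 else [(v - mean) / sd for v in values]
    if method == "RANGE":
        # Zero range: values are left unchanged.
        return list(values) if rng == 0 else [v / rng for v in values]
    if method == "RESCALE":
        # Zero range: all values become 0.50.
        return [0.5] * len(values) if rng == 0 else [(v - lo) / rng for v in values]
    if method == "MAX":
        if hi != 0:
            return [v / hi for v in values]
        # Alternate process: divide by |smallest value| and add 1.
        # Assumption: if every value is 0, leave the values unchanged.
        return list(values) if lo == 0 else [v / abs(lo) + 1 for v in values]
    if method == "MEAN":
        # Zero mean: add 1 to every value, which yields a mean of 1.
        return [v + 1 for v in values] if mean == 0 else [v / mean for v in values]
    if method == "SD":
        # Zero standard deviation: values are left unchanged.
        return list(values) if sd == 0 else [v / sd for v in values]
    raise ValueError(f"unknown standardization method: {method}")
```

For example, `standardize([2, 4, 6], "RESCALE")` gives `[0.0, 0.5, 1.0]`, and `standardize([-2, -1, 0], "MAX")` takes the alternate path and also gives `[0.0, 0.5, 1.0]`.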
Three transformations are available for the values PROXIMITIES computes or reads:
ABSOLUTE. Take the absolute values of the proximities.
REVERSE. Transform similarity values into dissimilarities, or vice versa, by changing the signs of the coefficients.
RESCALE. Standardize the proximities by first subtracting the smallest value and then dividing by the range.
If you specify more than one transformation, PROXIMITIES does them in the order listed above: first ABSOLUTE, then REVERSE, then RESCALE.
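The fixed order of the three transformations can be illustrated with a short Python sketch. The function name `transform` and its keyword flags are hypothetical. Leaving the values unchanged when the range is 0 is also an assumption, since the text does not describe that case for this transformation.

```python
def transform(proximities, absolute=False, reverse=False, rescale=False):
    """Apply the requested transformations in the fixed order
    ABSOLUTE, then REVERSE, then RESCALE."""
    values = list(proximities)
    if absolute:
        values = [abs(v) for v in values]
    if reverse:
        # Changing the sign turns similarities into dissimilarities and vice versa.
        values = [-v for v in values]
    if rescale:
        lo, hi = min(values), max(values)
        rng = hi - lo
        # Assumption: a zero range leaves the values unchanged.
        if rng != 0:
            values = [(v - lo) / rng for v in values]
    return values
```

With all three flags set, `transform([-1, 3, -5], absolute=True, reverse=True, rescale=True)` first yields `[1, 3, 5]`, then `[-1, -3, -5]`, and finally `[1.0, 0.5, 0.0]`.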
The distance between two items is the maximum absolute difference between the values for the items.
$$\mathrm{CHEBYCHEV}(x, y) = \max_i |x_i - y_i|$$
The distance between two items is the sum of the absolute differences between the values for the items.
$$\mathrm{BLOCK}(x, y) = \sum_i |x_i - y_i|$$
MINKOWSKI(p)
The distance between two items is the pth root of the sum of the absolute differences to the pth power between the values for the items.
$$\mathrm{MINKOWSKI}(x, y) = \left( \sum_i |x_i - y_i|^p \right)^{1/p}$$
POWER(p, r)
The distance between two items is the rth root of the sum of the absolute differences to the pth power between the values for the items.
$$\mathrm{POWER}(x, y) = \left( \sum_i |x_i - y_i|^p \right)^{1/r}$$
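The Chebychev, block, Minkowski, and power distances above differ only in how the absolute differences are combined. A minimal Python sketch follows. The function names are illustrative, and the inputs are assumed to be equal-length sequences of numbers with no missing values.

```python
def chebychev(x, y):
    """Maximum absolute difference between paired values."""
    return max(abs(a - b) for a, b in zip(x, y))


def block(x, y):
    """Sum of absolute differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def power(x, y, p, r):
    """rth root of the sum of absolute differences raised to the pth power."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / r)


def minkowski(x, y, p):
    """Minkowski distance: the power distance with r equal to p."""
    return power(x, y, p, p)
```

For `x = [0, 0]` and `y = [3, 4]`, `chebychev` returns 4, `block` returns 7, and `minkowski(x, y, 2)` returns 5.0, the ordinary Euclidean distance.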
The magnitude of this dissimilarity measure depends on the total frequencies of the two cases or variables whose proximity is computed. Expected values are from the model of independence of cases (or variables), x and y.
$$\mathrm{CHISQ}(x, y) = \sqrt{ \sum_i \frac{(x_i - E(x_i))^2}{E(x_i)} + \sum_i \frac{(y_i - E(y_i))^2}{E(y_i)} }$$
This is the CHISQ measure normalized by the square root of the combined frequency. Therefore, its value does not depend on the total frequencies of the two cases or variables whose proximity is computed.
$$\mathrm{PH2}(x, y) = \frac{\mathrm{CHISQ}(x, y)}{\sqrt{N}}$$
Here N is the combined frequency of x and y.
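The two frequency-count measures can be sketched in Python as follows. The expected values are computed as in a two-row contingency table under independence (row total times column total, divided by the grand total), which is one reading of "the model of independence" and should be treated as an assumption. The function names, the choice to skip positions where both frequencies are 0, and the requirement that each item has a positive total are also assumptions.

```python
import math


def chisq(x, y):
    """Chi-square dissimilarity between two vectors of frequency counts."""
    n_x, n_y = sum(x), sum(y)  # assumed positive for both items
    n = n_x + n_y
    total = 0.0
    for xi, yi in zip(x, y):
        col = xi + yi
        if col == 0:
            # Assumption: positions with no observations contribute nothing.
            continue
        e_x = n_x * col / n  # expected frequency for x under independence
        e_y = n_y * col / n  # expected frequency for y under independence
        total += (xi - e_x) ** 2 / e_x + (yi - e_y) ** 2 / e_y
    return math.sqrt(total)


def chisq_normalized(x, y):
    """Chi-square dissimilarity divided by the square root of the combined frequency."""
    return chisq(x, y) / math.sqrt(sum(x) + sum(y))
```

Multiplying every count by 10 changes `chisq` but leaves `chisq_normalized` unchanged, which matches the statement that the normalized measure does not depend on the total frequencies.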
PROXIMITIES constructs a 2 × 2 contingency table for each pair of items in turn. It uses this table to compute a proximity measure for the pair.

                      Item 2
                  Present   Absent
Item 1  Present      a         b
        Absent       c         d
PROXIMITIES computes all binary measures from the values of a , b , c, and d. These values are tallies across variables (when the items are cases) or tallies across cases (when the items are variables).
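The tallies a, b, c, and d can be computed from two binary vectors as in the sketch below. The function name `tally` is hypothetical, and coding "present" as 1 with every other value treated as "absent" is an assumption made for the example.

```python
def tally(item1, item2, present=1):
    """Return (a, b, c, d) for two equal-length binary vectors.

    a: present in both items        b: present in item 1 only
    c: present in item 2 only       d: absent from both items
    """
    a = b = c = d = 0
    for u, v in zip(item1, item2):
        in1, in2 = (u == present), (v == present)
        if in1 and in2:
            a += 1
        elif in1:
            b += 1
        elif in2:
            c += 1
        else:
            d += 1
    return a, b, c, d
```

For example, `tally([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])` returns `(2, 1, 1, 1)`.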
Rogers and Tanimoto Similarity Measure
$$\mathrm{RT}(x, y) = \frac{a + d}{a + d + 2(b + c)}$$
Sokal and Sneath Similarity Measure 2
$$\mathrm{SS2}(x, y) = \frac{a}{a + 2(b + c)}$$
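Both measures above give nonmatches double weight in the denominator. SS2 additionally excludes joint absences (d). A small Python sketch, assuming the tallies a, b, c, and d are already available and that the denominators are nonzero (the zero case is not handled here):

```python
def rt(a, b, c, d):
    """Rogers and Tanimoto: matches over matches plus doubled nonmatches."""
    return (a + d) / (a + d + 2 * (b + c))


def ss2(a, b, c, d):
    """Sokal and Sneath 2: joint presences over joint presences plus doubled nonmatches."""
    return a / (a + 2 * (b + c))
```

With `a, b, c, d = 2, 1, 1, 1`, `rt` evaluates to 3/7 and `ss2` to 1/3.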
Kulczynski Similarity Measure 1
This measure has a minimum value of 0 and no upper limit. It is undefined when there are no nonmatches (b = 0 and c = 0). Therefore, PROXIMITIES assigns an artificial upper limit of 9999.999 to K1 when it is undefined or exceeds this value.
$$\mathrm{K1}(x, y) = \frac{a}{b + c}$$
Sokal and Sneath Similarity Measure 3
This measure has a minimum value of 0, has no upper limit, and is undefined when there are no nonmatches (b = 0 and c = 0). As with K1, PROXIMITIES assigns an artificial upper limit of 9999.999 to SS3 when it is undefined or exceeds this value.
$$\mathrm{SS3}(x, y) = \frac{a + d}{b + c}$$
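The artificial upper limit for K1 and SS3 can be sketched as a shared helper. The helper name `_capped_ratio` and the constant name `UPPER_LIMIT` are illustrative. The value 9999.999 and the rule for applying it come from the text.

```python
UPPER_LIMIT = 9999.999


def _capped_ratio(numerator, denominator):
    """Return numerator / denominator, or the limit when the ratio is undefined or larger."""
    if denominator == 0:
        return UPPER_LIMIT
    return min(numerator / denominator, UPPER_LIMIT)


def k1(a, b, c, d):
    """Kulczynski 1: joint presences over nonmatches."""
    return _capped_ratio(a, b + c)


def ss3(a, b, c, d):
    """Sokal and Sneath 3: matches over nonmatches."""
    return _capped_ratio(a + d, b + c)
```

For instance, `k1(5, 0, 0, 2)` has no nonmatches and returns 9999.999, while `k1(3, 1, 2, 4)` returns 1.0.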
The following three binary measures yield values that you can interpret in terms of conditional probability. All three are similarity measures.
Kulczynski Similarity Measure 2
This yields the average conditional probability that a characteristic is present in one item given that the characteristic is present in the other item. The measure is an average over both items acting as predictors. It has a range of 0 to 1.
$$\mathrm{K2}(x, y) = \frac{1}{2} \left( \frac{a}{a + b} + \frac{a}{a + c} \right)$$
Sokal and Sneath Similarity Measure 4
This yields the conditional probability that a characteristic of one item is in the same state (present or absent) as the characteristic of the other item. The measure is an average over both items acting as predictors. It has a range of 0 to 1.
$$\mathrm{SS4}(x, y) = \frac{1}{4} \left( \frac{a}{a + b} + \frac{a}{a + c} + \frac{d}{b + d} + \frac{d}{c + d} \right)$$
Hamann Similarity Measure
This measure gives the probability that a characteristic has the same state in both items (present in both or absent from both) minus the probability that a characteristic has different states in the two items (present in one and absent from the other). HAMANN has a range of –1 to +1 and is monotonically related to SM, SS1, and RT.
$$\mathrm{HAMANN}(x, y) = \frac{(a + d) - (b + c)}{a + b + c + d}$$
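The conditional-probability reading of these three measures is easier to see in code, where each term is a proportion. The sketch below assumes the tallies a, b, c, and d are given and that every denominator is nonzero. It does not attempt to handle empty margins.

```python
def k2(a, b, c, d):
    """Kulczynski 2: mean of P(present in 2 | present in 1) and P(present in 1 | present in 2)."""
    return (a / (a + b) + a / (a + c)) / 2


def ss4(a, b, c, d):
    """Sokal and Sneath 4: mean of the four conditional probabilities of a matching state."""
    return (a / (a + b) + a / (a + c) + d / (b + d) + d / (c + d)) / 4


def hamann(a, b, c, d):
    """Hamann: proportion of matches minus proportion of nonmatches."""
    return ((a + d) - (b + c)) / (a + b + c + d)
```

With `a, b, c, d = 2, 1, 1, 1`, the values are 2/3 for `k2`, 7/12 for `ss4`, and 0.2 for `hamann`.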
The following four binary measures assess the association between items as the predictability of one given the other. All four measures yield similarities.
Yule’s Q (Similarity)
This is the 2 × 2 version of Goodman and Kruskal’s ordinal measure gamma. Like Yule’s Y, Q is a function of the cross-product ratio for a 2 × 2 table and has a range of –1 to +1.
$$\mathrm{Q}(x, y) = \frac{ad - bc}{ad + bc}$$
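A short Python sketch of Yule's Q, assuming the tallies a, b, c, and d are given and that ad + bc is nonzero (the undefined case is not handled here). The function name `yule_q` is illustrative.

```python
def yule_q(a, b, c, d):
    """Yule's Q: difference of the cross-products over their sum."""
    return (a * d - b * c) / (a * d + b * c)
```

The extremes of the range occur when one cross-product vanishes: `yule_q(3, 0, 2, 4)` returns 1.0 and `yule_q(0, 2, 3, 0)` returns -1.0.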
The remaining binary measures available in PROXIMITIES are either binary equivalents of association measures for continuous variables or measures of special properties of the relation between items.
Ochiai Similarity Measure
This is the binary form of the cosine. It has a range of 0 to 1 and is a similarity measure.
$$\mathrm{OCHIAI}(x, y) = \sqrt{ \left( \frac{a}{a + b} \right) \left( \frac{a}{a + c} \right) }$$
Sokal and Sneath Similarity Measure 5
This is a similarity measure. Its range is 0 to 1.
$$\mathrm{SS5}(x, y) = \frac{ad}{\sqrt{(a + b)(a + c)(b + d)(c + d)}}$$
Fourfold Point Correlation (Similarity)
This is the binary form of the Pearson product-moment correlation coefficient. Phi is a similarity measure, and its range is –1 to +1.
$$\mathrm{PHI}(x, y) = \frac{ad - bc}{\sqrt{(a + b)(a + c)(b + d)(c + d)}}$$
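Ochiai, SS5, and phi all normalize by products of the table's marginal totals. SS5 and phi share the same denominator. A minimal Python sketch, assuming the tallies a, b, c, and d are given and that no marginal total is 0:

```python
import math


def ochiai(a, b, c, d):
    """Ochiai: geometric mean of a / (a + b) and a / (a + c)."""
    return math.sqrt((a / (a + b)) * (a / (a + c)))


def _margin_product_root(a, b, c, d):
    """Square root of the product of the four marginal totals."""
    return math.sqrt((a + b) * (a + c) * (b + d) * (c + d))


def ss5(a, b, c, d):
    """Sokal and Sneath 5: ad over the root of the margin product."""
    return (a * d) / _margin_product_root(a, b, c, d)


def phi(a, b, c, d):
    """Fourfold point correlation: ad - bc over the root of the margin product."""
    return (a * d - b * c) / _margin_product_root(a, b, c, d)
```

With `a, b, c, d = 2, 1, 1, 1`, the margin product is 36, so `ss5` evaluates to 1/3 and `phi` to 1/6.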
Binary Euclidean Distance
This is a distance measure. Its minimum value is 0, and it has no upper limit.
$$\mathrm{BEUCLID}(x, y) = \sqrt{b + c}$$
Binary Squared Euclidean Distance
This is also a distance measure. Its minimum value is 0, and it has no upper limit.
$$\mathrm{BSEUCLID}(x, y) = b + c$$
Size Difference
This is a dissimilarity measure with a minimum value of 0 and no upper limit.
$$\mathrm{SIZE}(x, y) = \frac{(b - c)^2}{(a + b + c + d)^2}$$
Pattern Difference
This is also a dissimilarity measure. Its range is 0 to 1.
$$\mathrm{PATTERN}(x, y) = \frac{bc}{(a + b + c + d)^2}$$
Binary Shape Difference
This dissimilarity measure has no upper or lower limit.
$$\mathrm{BSHAPE}(x, y) = \frac{(a + b + c + d)(b + c) - (b - c)^2}{(a + b + c + d)^2}$$
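The five binary dissimilarity measures above depend only on the nonmatch tallies b and c and on the total count. A Python sketch follows, assuming the tallies a, b, c, and d are given and that their sum is positive. The function names are illustrative.

```python
import math


def beuclid(a, b, c, d):
    """Binary Euclidean distance: square root of the number of nonmatches."""
    return math.sqrt(b + c)


def bseuclid(a, b, c, d):
    """Binary squared Euclidean distance: the number of nonmatches."""
    return b + c


def size(a, b, c, d):
    """Size difference: squared imbalance of nonmatches over the squared total."""
    n = a + b + c + d
    return (b - c) ** 2 / n ** 2


def pattern(a, b, c, d):
    """Pattern difference: product of the nonmatch counts over the squared total."""
    n = a + b + c + d
    return (b * c) / n ** 2


def bshape(a, b, c, d):
    """Binary shape difference."""
    n = a + b + c + d
    return (n * (b + c) - (b - c) ** 2) / n ** 2
```

With `a, b, c, d = 1, 3, 1, 0`, the total is 5, so `size` evaluates to 4/25, `pattern` to 3/25, and `bshape` to 16/25.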