Standardizing Data and Computing Proximities: Methods and Measures, Study notes of Mathematical Statistics

Alliance University Mathematical Statistics

Methods for standardizing cases or variables in the PROXIMITIES procedure. The notes cover the standardization methods Z, RANGE, RESCALE, MAX, MEAN, and SD; the transformations ABSOLUTE, REVERSE, and RESCALE; and proximity measures for continuous, frequency count, and binary data, such as EUCLID, CORRELATION, and RR.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar 🇮🇳


PROXIMITIES

Standardizing Cases or Variables

Either cases or variables can be standardized. The following methods of standardization are available:

Z

PROXIMITIES subtracts the mean from each value for the variable or case being standardized and then divides by the standard deviation of the values. If a standard deviation is 0, PROXIMITIES sets all values for the case or variable to 0.

RANGE

PROXIMITIES divides each value for the variable or case being standardized by the range of the values. If the range is 0, PROXIMITIES leaves all values unchanged.

RESCALE

From each value for the variable or case being standardized, PROXIMITIES subtracts the minimum value and then divides by the range. If a range is 0, PROXIMITIES sets all values for the case or variable to 0.50.

MAX

PROXIMITIES divides each value for the variable or case being standardized by the maximum of the values. If the maximum of a set of values is 0, PROXIMITIES uses an alternate process to produce a comparable standardization: it divides by the absolute magnitude of the smallest value and adds 1.

MEAN

PROXIMITIES divides each value for the variable or case being standardized by the mean of the values. If a mean is 0, PROXIMITIES adds one to all values for the case or variable to produce a mean of 1.

SD

PROXIMITIES divides each value for the variable or case being standardized by the standard deviation of the values. PROXIMITIES does not change the values if their standard deviation is 0.
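The six rules above, including their zero-denominator fallbacks, can be sketched in Python roughly as follows. This is an illustration, not the PROXIMITIES implementation: the function name `standardize` is hypothetical, NumPy is assumed, the sample standard deviation (n − 1 divisor) is an assumption the text does not confirm, and the handling of an all-zero input under MAX is a guess because the text does not cover it.

```python
import numpy as np

def standardize(values, method):
    """Standardize the values of one case or variable (hypothetical helper)."""
    v = np.asarray(values, dtype=float)
    vmin, vmax = v.min(), v.max()
    rng = vmax - vmin
    mean = v.mean()
    sd = v.std(ddof=1)  # assumption: sample SD; needs at least two values

    if method == "Z":
        # SD of 0: all values are set to 0.
        return np.zeros_like(v) if sd == 0 else (v - mean) / sd
    if method == "RANGE":
        # Range of 0: values are left unchanged.
        return v.copy() if rng == 0 else v / rng
    if method == "RESCALE":
        # Range of 0: all values are set to 0.50.
        return np.full_like(v, 0.5) if rng == 0 else (v - vmin) / rng
    if method == "MAX":
        if vmax != 0:
            return v / vmax
        if vmin == 0:
            return v.copy()  # assumption: all-zero input is left unchanged
        # Maximum of 0: divide by |smallest value| and add 1.
        return v / abs(vmin) + 1
    if method == "MEAN":
        # Mean of 0: add 1 to every value, which gives a mean of 1.
        return v + 1 if mean == 0 else v / mean
    if method == "SD":
        # SD of 0: values are left unchanged.
        return v.copy() if sd == 0 else v / sd
    raise ValueError(f"unknown method: {method}")

print(standardize([2, 4, 6], "RESCALE"))
print(standardize([-4, -2, 0], "MAX"))
```

Passing one row of the data matrix standardizes a case; passing one column standardizes a variable.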

Transformations

Three transformations are available for the values PROXIMITIES computes or reads:

ABSOLUTE

Take the absolute values of the proximities.

REVERSE

Transform similarity values into dissimilarities, or vice versa, by changing the signs of the coefficients.

RESCALE

RESCALE standardizes the proximities by first subtracting the smallest value and then dividing by the range.

If you specify more than one transformation, PROXIMITIES applies them in the order listed above: first ABSOLUTE, then REVERSE, then RESCALE.
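A minimal Python sketch of the fixed ordering, assuming NumPy and a hypothetical function `transform` with one flag per transformation. Leaving the values unchanged when the range is 0 is an assumption; the text does not say what happens in that case.

```python
import numpy as np

def transform(proximities, absolute=False, reverse=False, rescale=False):
    """Apply the requested transformations in the fixed order (hypothetical helper)."""
    p = np.asarray(proximities, dtype=float)
    if absolute:
        p = np.abs(p)          # ABSOLUTE: absolute values
    if reverse:
        p = -p                 # REVERSE: change the signs
    if rescale:
        rng = p.max() - p.min()
        if rng != 0:           # assumption: a zero range leaves values as they are
            p = (p - p.min()) / rng   # RESCALE: subtract smallest, divide by range
    return p

# |-2, 1, 4| -> (2, 1, 4) -> (-2, -1, -4) -> rescaled to the interval [0, 1]
print(transform([-2, 1, 4], absolute=True, reverse=True, rescale=True))
```

The order of the keyword arguments in the call does not matter; the function always runs ABSOLUTE, then REVERSE, then RESCALE.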

Measures for Continuous Data

CHEBYCHEV

The distance between two items is the maximum absolute difference between the values for the items.

$$\mathrm{CHEBYCHEV}(x, y) = \max_i \lvert x_i - y_i \rvert$$

BLOCK

The distance between two items is the sum of the absolute differences between the values for the items.

$$\mathrm{BLOCK}(x, y) = \sum_i \lvert x_i - y_i \rvert$$

MINKOWSKI(p)

The distance between two items is the pth root of the sum of the absolute differences to the pth power between the values for the items.

$$\mathrm{MINKOWSKI}(x, y) = \left( \sum_i \lvert x_i - y_i \rvert^{p} \right)^{1/p}$$

POWER(p, r)

The distance between two items is the rth root of the sum of the absolute differences to the pth power between the values for the items.

$$\mathrm{POWER}(x, y) = \left( \sum_i \lvert x_i - y_i \rvert^{p} \right)^{1/r}$$
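The four distances above are closely related: MINKOWSKI is POWER with r = p, and BLOCK is POWER with p = r = 1. A short Python sketch under those definitions, assuming NumPy; the function names are hypothetical, not PROXIMITIES keywords.

```python
import numpy as np

def power_distance(x, y, p, r):
    """r-th root of the sum of absolute differences raised to the p-th power."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / r))

def minkowski(x, y, p):
    # MINKOWSKI(p) is POWER(p, p).
    return power_distance(x, y, p, p)

def block(x, y):
    # BLOCK is POWER(1, 1): the sum of absolute differences.
    return power_distance(x, y, 1, 1)

def chebychev(x, y):
    # Maximum absolute difference between the values.
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(diff.max())

x, y = [1, 2, 3], [4, 0, 3]          # absolute differences: 3, 2, 0
print(chebychev(x, y), block(x, y), minkowski(x, y, 2), power_distance(x, y, 2, 1))
```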

Measures for Frequency Count Data

CHISQ

The magnitude of this dissimilarity measure depends on the total frequencies of the two cases or variables whose proximity is computed. Expected values are from the model of independence of cases (or variables), x and y.

$$\mathrm{CHISQ}(x, y) = \sqrt{ \sum_i \frac{\left( x_i - E(x_i) \right)^2}{E(x_i)} + \sum_i \frac{\left( y_i - E(y_i) \right)^2}{E(y_i)} }$$

PH2

This is the CHISQ measure normalized by the square root of the combined frequency. Therefore, its value does not depend on the total frequencies of the two cases or variables whose proximity is computed.

$$\mathrm{PH2}(x, y) = \frac{\mathrm{CHISQ}(x, y)}{\sqrt{N}}$$

where N is the combined frequency of x and y.
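A Python sketch of the two frequency-count measures, assuming NumPy. The expected values use the usual estimate under independence (row total times column total divided by N), which is an assumption about how the "model of independence" is computed; the function names are hypothetical, and every column total is assumed to be positive.

```python
import numpy as np

def chisq(x, y):
    """Chi-square dissimilarity between two vectors of frequency counts."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.sum() + y.sum()            # combined frequency N
    col = x + y                      # column totals of the 2 x k table
    ex = x.sum() * col / n           # expected counts for x under independence
    ey = y.sum() * col / n           # expected counts for y under independence
    stat = np.sum((x - ex) ** 2 / ex) + np.sum((y - ey) ** 2 / ey)
    return float(np.sqrt(stat))

def normalized_chisq(x, y):
    """CHISQ divided by the square root of the combined frequency."""
    n = float(np.sum(x) + np.sum(y))
    return chisq(x, y) / np.sqrt(n)

x, y = [10, 20], [20, 10]
print(chisq(x, y), normalized_chisq(x, y))
# Multiplying every count by 10 changes CHISQ but not the normalized measure.
print(chisq([100, 200], [200, 100]), normalized_chisq([100, 200], [200, 100]))
```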

Measures for Binary Data

PROXIMITIES constructs a 2 × 2 contingency table for each pair of items in turn. It uses this table to compute a proximity measure for the pair.

                        Item 2
                   Present   Absent
Item 1   Present      a         b
         Absent       c         d

PROXIMITIES computes all binary measures from the values of a, b, c, and d. These values are tallies across variables (when the items are cases) or tallies across cases (when the items are variables).
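The tallying step can be sketched in Python as follows. The function name `binary_table` and the `present` parameter are hypothetical; treating every value other than `present` as "absent" is an assumption made for this illustration.

```python
def binary_table(x, y, present=1):
    """Tally the 2 x 2 table (a, b, c, d) for two equally long binary items."""
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        x_on, y_on = xi == present, yi == present
        if x_on and y_on:
            a += 1      # present in both items
        elif x_on:
            b += 1      # present in item 1, absent in item 2
        elif y_on:
            c += 1      # absent in item 1, present in item 2
        else:
            d += 1      # absent in both items
    return a, b, c, d

item1 = [1, 1, 0, 0, 1]
item2 = [1, 0, 1, 0, 1]
print(binary_table(item1, item2))
```

When the items are cases, `item1` and `item2` are two rows of the data matrix; when the items are variables, they are two columns.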

Rogers and Tanimoto Similarity Measure

$$\mathrm{RT}(x, y) = \frac{a + d}{a + d + 2(b + c)}$$

Sokal and Sneath Similarity Measure 2

$$\mathrm{SS2}(x, y) = \frac{a}{a + 2(b + c)}$$

Kulczynski Similarity Measure 1

This measure has a minimum value of 0 and no upper limit. It is undefined when there are no nonmatches (b = 0 and c = 0). Therefore, PROXIMITIES assigns an artificial upper limit of 9999.999 to K1 when it is undefined or exceeds this value.

$$\mathrm{K1}(x, y) = \frac{a}{b + c}$$

Sokal and Sneath Similarity Measure 3

This measure has a minimum value of 0, has no upper limit, and is undefined when there are no nonmatches (b = 0 and c = 0). As with K1, PROXIMITIES assigns an artificial upper limit of 9999.999 to SS3 when it is undefined or exceeds this value.

$$\mathrm{SS3}(x, y) = \frac{a + d}{b + c}$$
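A small Python sketch of these four measures, including the 9999.999 cap applied to K1 and SS3. The function names and the helper `_capped_ratio` are hypothetical; the sketch also assumes the table is not empty, since the text does not describe that case for RT and SS2.

```python
LIMIT = 9999.999  # artificial upper limit used for K1 and SS3

def rt(a, b, c, d):
    """Rogers and Tanimoto: nonmatches carry double weight."""
    return (a + d) / (a + d + 2 * (b + c))

def ss2(a, b, c, d):
    """Sokal and Sneath 2: joint absences excluded, nonmatches doubled."""
    return a / (a + 2 * (b + c))

def _capped_ratio(num, den):
    # Undefined (no nonmatches) or too large: return the artificial limit.
    if den == 0:
        return LIMIT
    return min(num / den, LIMIT)

def k1(a, b, c, d):
    """Kulczynski 1: joint presences over nonmatches."""
    return _capped_ratio(a, b + c)

def ss3(a, b, c, d):
    """Sokal and Sneath 3: matches over nonmatches."""
    return _capped_ratio(a + d, b + c)

print(rt(2, 1, 1, 1), ss2(2, 1, 1, 1), k1(2, 1, 1, 1), ss3(2, 1, 1, 1))
print(k1(3, 0, 0, 2), ss3(3, 0, 0, 2))   # no nonmatches: both hit the limit
```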

Conditional Probabilities

The following three binary measures yield values that you can interpret in terms of conditional probability. All three are similarity measures.

Kulczynski Similarity Measure 2

This yields the average conditional probability that a characteristic is present in one item given that the characteristic is present in the other item. The measure is an average over both items acting as predictors. It has a range of 0 to 1.

$$\mathrm{K2}(x, y) = \frac{1}{2} \left( \frac{a}{a + b} + \frac{a}{a + c} \right)$$

Sokal and Sneath Similarity Measure 4

This yields the conditional probability that a characteristic of one item is in the same state (present or absent) as the characteristic of the other item. The measure is an average over both items acting as predictors. It has a range of 0 to 1.

$$\mathrm{SS4}(x, y) = \frac{1}{4} \left( \frac{a}{a + b} + \frac{a}{a + c} + \frac{d}{b + d} + \frac{d}{c + d} \right)$$

Hamann Similarity Measure

This measure gives the probability that a characteristic has the same state in both items (present in both or absent from both) minus the probability that a characteristic has different states in the two items (present in one and absent from the other). HAMANN has a range of –1 to +1 and is monotonically related to SM, SS1, and RT.

$$\mathrm{HAMANN}(x, y) = \frac{(a + d) - (b + c)}{a + b + c + d}$$
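The three conditional-probability measures can be written out directly from the table counts. This Python sketch uses hypothetical function names and assumes that every denominator is nonzero (each item has at least one presence and one absence); the text does not say how PROXIMITIES handles the other cases.

```python
def k2(a, b, c, d):
    """Kulczynski 2: average of P(present in 2 | present in 1) and the reverse."""
    return (a / (a + b) + a / (a + c)) / 2

def ss4(a, b, c, d):
    """Sokal and Sneath 4: average of the four same-state conditional probabilities."""
    return (a / (a + b) + a / (a + c) + d / (b + d) + d / (c + d)) / 4

def hamann(a, b, c, d):
    """Hamann: P(same state) minus P(different states)."""
    return ((a + d) - (b + c)) / (a + b + c + d)

a, b, c, d = 2, 1, 1, 1
print(k2(a, b, c, d), ss4(a, b, c, d), hamann(a, b, c, d))
```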

Predictability Measures

The following four binary measures assess the association between items as the predictability of one given the other. All four measures yield similarities.

Yule’s Q (Similarity)

This is the 2 × 2 version of Goodman and Kruskal’s ordinal measure gamma. Like Yule’s Y, Q is a function of the cross-product ratio for a 2 × 2 table and has a range of –1 to +1.

$$\mathrm{Q}(x, y) = \frac{ad - bc}{ad + bc}$$
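To make the link to the cross-product ratio concrete, the sketch below computes Q both from the counts and from the ratio ad / bc, for which Q = (ratio − 1) / (ratio + 1). The function names are hypothetical, and the sketch assumes ad + bc is nonzero (and bc is nonzero for the ratio form).

```python
def yule_q(a, b, c, d):
    """Yule's Q computed directly from the 2 x 2 table counts."""
    return (a * d - b * c) / (a * d + b * c)

def yule_q_from_ratio(a, b, c, d):
    """The same value expressed through the cross-product ratio ad / bc."""
    ratio = (a * d) / (b * c)
    return (ratio - 1) / (ratio + 1)

a, b, c, d = 2, 1, 1, 1          # cross-product ratio = 2
print(yule_q(a, b, c, d), yule_q_from_ratio(a, b, c, d))
```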

Other Binary Measures

The remaining binary measures available in PROXIMITIES are either binary equivalents of association measures for continuous variables or measures of special properties of the relation between items.

Ochiai Similarity Measure

This is the binary form of the cosine. It has a range of 0 to 1 and is a similarity measure.

$$\mathrm{OCHIAI}(x, y) = \sqrt{ \left( \frac{a}{a + b} \right) \left( \frac{a}{a + c} \right) }$$

Sokal and Sneath Similarity Measure 5

This is a similarity measure. Its range is 0 to 1.

$$\mathrm{SS5}(x, y) = \frac{ad}{\sqrt{(a + b)(a + c)(b + d)(c + d)}}$$

Fourfold Point Correlation (Similarity)

This is the binary form of the Pearson product-moment correlation coefficient. Phi is a similarity measure, and its range is –1 to +1.

$$\mathrm{PHI}(x, y) = \frac{ad - bc}{\sqrt{(a + b)(a + c)(b + d)(c + d)}}$$
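OCHIAI, SS5, and PHI share the same marginal totals, which the following Python sketch makes explicit. The function names are hypothetical, and all four marginal totals (a + b, a + c, b + d, c + d) are assumed to be nonzero.

```python
import math

def _margin_root(a, b, c, d):
    # Square root of the product of the four marginal totals.
    return math.sqrt((a + b) * (a + c) * (b + d) * (c + d))

def ochiai(a, b, c, d):
    """Binary cosine: geometric mean of a/(a+b) and a/(a+c)."""
    return math.sqrt((a / (a + b)) * (a / (a + c)))

def ss5(a, b, c, d):
    """Sokal and Sneath 5."""
    return (a * d) / _margin_root(a, b, c, d)

def phi(a, b, c, d):
    """Fourfold point correlation (Pearson correlation of two binary items)."""
    return (a * d - b * c) / _margin_root(a, b, c, d)

a, b, c, d = 2, 1, 1, 1
print(ochiai(a, b, c, d), ss5(a, b, c, d), phi(a, b, c, d))
```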

Binary Euclidean Distance

This is a distance measure. Its minimum value is 0, and it has no upper limit.

$$\mathrm{BEUCLID}(x, y) = \sqrt{b + c}$$

Binary Squared Euclidean Distance

This is also a distance measure. Its minimum value is 0, and it has no upper limit.

$$\mathrm{BSEUCLID}(x, y) = b + c$$

Size Difference

This is a dissimilarity measure with a minimum value of 0 and no upper limit.

$$\mathrm{SIZE}(x, y) = \frac{(b - c)^2}{(a + b + c + d)^2}$$

Pattern Difference

This is also a dissimilarity measure. Its range is 0 to 1.

$$\mathrm{PATTERN}(x, y) = \frac{bc}{(a + b + c + d)^2}$$

Binary Shape Difference

This dissimilarity measure has no upper or lower limit.

$$\mathrm{BSHAPE}(x, y) = \frac{(a + b + c + d)(b + c) - (b - c)^2}{(a + b + c + d)^2}$$
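The five binary distance and difference measures depend only on the nonmatch counts b and c and on the table total. A Python sketch with hypothetical function names, assuming a non-empty table:

```python
import math

def beuclid(a, b, c, d):
    """Binary Euclidean distance."""
    return math.sqrt(b + c)

def bseuclid(a, b, c, d):
    """Binary squared Euclidean distance: the number of nonmatches."""
    return b + c

def size(a, b, c, d):
    """Size difference."""
    n = a + b + c + d
    return (b - c) ** 2 / n ** 2

def pattern(a, b, c, d):
    """Pattern difference."""
    n = a + b + c + d
    return (b * c) / n ** 2

def bshape(a, b, c, d):
    """Binary shape difference."""
    n = a + b + c + d
    return (n * (b + c) - (b - c) ** 2) / n ** 2

t = (2, 3, 1, 4)                 # a, b, c, d with a total of 10
print(beuclid(*t), bseuclid(*t), size(*t), pattern(*t), bshape(*t))
```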
