HOMALS Algorithm: Modernizing Guttman's Scale Theory for Incomplete Data, Study notes of Mathematical Statistics

Alliance University Mathematical Statistics

The HOMALS algorithm is a modernized version of Guttman's (1941) scale theory that also handles incomplete data. These notes describe its iterative process, including the treatment of missing values, the optimization of the objective function, and the orthonormalization step. The algorithm seeks object scores and category quantifications that minimize the loss function while satisfying normalization restrictions.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar 🇮🇳


HOMALS

The iterative HOMALS algorithm is a modernized version of Guttman (1941). The treatment of missing values, described below, is based on setting weights in the loss function equal to zero, and was first described in De Leeuw and Van Rijckevorsel (1980). Other possibilities do exist and can be accomplished by recoding the data (Gifi, 1981; Meulman, 1982).

Notation

The following notation is used throughout this chapter unless otherwise stated:

$n$  Number of cases (objects)

$m$  Number of variables

$p$  Number of dimensions

For variable $j$, $j = 1, \ldots, m$:

$h_j$  $n$-vector with categorical observations

$k_j$  Number of valid categories (distinct values) of variable $j$

$G_j$  Indicator matrix for variable $j$, of order $n \times k_j$, with elements

$$g_{(j)ir} = \begin{cases} 1 & \text{when the } i\text{th object is in the } r\text{th category of variable } j \\ 0 & \text{when the } i\text{th object is not in the } r\text{th category of variable } j \end{cases}$$

$M_j$  Binary, diagonal $n \times n$ matrix, with diagonal elements defined as

$$m_{(j)ii} = \begin{cases} 1 & \text{when the } i\text{th observation is within the range } [1, k_j] \\ 0 & \text{when the } i\text{th observation is outside the range } [1, k_j] \end{cases}$$

$D_j$  Diagonal matrix containing the univariate marginals, i.e., the column sums of $G_j$.
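As a minimal sketch of the notation above, the following Python snippet builds $G_j$ together with the diagonals of $M_j$ and $D_j$ for a single variable. The function name `indicator_parts` and the coding convention (categories stored as integers 1 to $k_j$, any other value treated as missing) are assumptions made for illustration only.

```python
def indicator_parts(h, k):
    """Return (G, m_diag, d_diag) for one variable.

    h: list of category codes, one per object (assumed to be integers).
    k: number of valid categories; codes outside 1..k count as missing.
    """
    G = []
    m_diag = []
    for code in h:
        in_range = 1 <= code <= k
        row = [0] * k
        if in_range:
            row[code - 1] = 1  # object falls in category `code`
        G.append(row)
        m_diag.append(1 if in_range else 0)
    # Univariate marginals: column sums of G.
    d_diag = [sum(row[r] for row in G) for r in range(k)]
    return G, m_diag, d_diag


# Five objects, three valid categories; the code 0 is out of range (missing).
G, m_diag, d_diag = indicator_parts([1, 3, 2, 0, 3], k=3)
```

The row of $G_j$ for the fourth object is all zeros and its entry in $M_j$ is 0, so that observation contributes nothing to the marginals.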


The quantification matrices are

$X$  Object scores, of order $n \times p$

$Y_j$  Category quantifications, of order $k_j \times p$

$Y$  Concatenated category quantification matrices, of order $\sum_j k_j \times p$.

Note: The matrices $G_j$, $M_j$, and $D_j$ are exclusively notational devices; they are stored in reduced form, and the program fully profits from their sparseness by replacing matrix multiplications with selective accumulation.
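To illustrate what selective accumulation can look like, the sketch below computes the product $G_j' M_j X$ directly from the category codes, without ever forming $G_j$ or $M_j$. The choice of this particular product, the function name `accumulate_gt_x`, and the integer coding of categories are assumptions for the example; the storage scheme of the actual program is not described in these notes.

```python
def accumulate_gt_x(h, k, X):
    """Compute G' M X (a k x p list of lists) from category codes alone.

    h: category codes per object (assumed integers, 1..k valid).
    X: n x p object scores as a list of rows.
    """
    p = len(X[0])
    out = [[0.0] * p for _ in range(k)]
    for code, x_row in zip(h, X):
        if 1 <= code <= k:  # out-of-range codes carry zero weight
            target = out[code - 1]
            for s in range(p):
                target[s] += x_row[s]
    return out


h = [1, 3, 2, 0, 3]
X = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [5.0, 5.0], [1.0, 1.0]]
sums = accumulate_gt_x(h, 3, X)
```

Each object touches exactly one row of the result, so the work is proportional to $n \times p$ rather than $n \times k_j \times p$.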

Objective Function Optimization

The HOMALS objective is to find object scores $X$ and a set of $Y_j$ (for $j = 1, \ldots, m$) so that the function

$$\sigma(X; Y) = \frac{1}{m} \sum_j \operatorname{tr}\left[ (X - G_j Y_j)' M_j (X - G_j Y_j) \right]$$

is minimal, under the normalization restriction $X' M_* X = mn I$, where the matrix $M_* = \sum_j M_j$ and $I$ is the $p \times p$ identity matrix. The inclusion of $M_j$ in $\sigma(X; Y)$ ensures that there is no influence of data values outside the range $[1, k_j]$, which may be really missing or merely regarded as such; $M_*$ contains the number of "active" data values for each object. The object scores are also centered; that is, they satisfy $u' M_* X = 0$, with $u$ denoting an $n$-vector of ones.
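The loss function can be evaluated without building any matrices, because the trace reduces to a sum of squared distances between each object score and the quantification of the category the object falls in. The following is a small sketch under assumed conventions (integer category codes, plain lists of rows, and the hypothetical name `homals_loss`), not the program's implementation.

```python
def homals_loss(X, Y_list, H, K):
    """sigma(X; Y) = (1/m) * sum_j tr[(X - G_j Y_j)' M_j (X - G_j Y_j)].

    X: n x p object scores.
    Y_list: one k_j x p quantification matrix per variable.
    H: one list of category codes per variable (1..k_j valid).
    K: number of valid categories per variable.
    """
    m = len(H)
    total = 0.0
    for h, k, Y in zip(H, K, Y_list):
        for code, x_row in zip(h, X):
            if not 1 <= code <= k:
                continue  # missing value: zero weight in M_j
            y_row = Y[code - 1]  # row i of G_j Y_j
            total += sum((x - y) ** 2 for x, y in zip(x_row, y_row))
    return total / m


X = [[1.0], [-1.0], [0.0]]
H = [[1, 2, 0], [1, 1, 2]]          # third object is missing on variable 1
K = [2, 2]
Y_list = [[[1.0], [-1.0]], [[0.0], [0.0]]]
loss = homals_loss(X, Y_list, H, K)
```

Here the first variable reproduces the scores of its two active objects exactly, while the second contributes squared distances 1, 1, and 0, so the loss is $(0 + 2) / 2 = 1$.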

Optimization is achieved through the following iteration scheme:

1. Initialization
2. Update object scores
3. Orthonormalization
4. Update category quantifications
5. Convergence test: repeat steps 2-4 or continue
6. Rotation

5. Convergence test

The difference between consecutive loss function values $\sigma(X; Y) - \sigma(X^+; Y^+)$ is compared with the user-specified convergence criterion $\varepsilon$, a small positive number. Steps 2 to 4 are repeated as long as the loss difference exceeds $\varepsilon$.
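The stopping rule can be sketched as a generic driver loop. Everything here apart from the rule itself is an assumption for illustration: `update` stands in for steps 2 to 4, `loss` for the loss function, and `max_iter` is a safety cap introduced for this example.

```python
def iterate(state, update, loss, eps=1e-5, max_iter=100):
    """Repeat `update` while the loss decrease exceeds `eps`.

    Returns (final_state, final_loss, iterations_used).
    """
    previous = loss(state)
    for iteration in range(1, max_iter + 1):
        state = update(state)          # stand-in for steps 2 to 4
        current = loss(state)
        if previous - current <= eps:  # difference no longer exceeds eps
            return state, current, iteration
        previous = current
    return state, previous, max_iter


# Toy problem: halve a number each step, with its square as the loss.
final, final_loss, used = iterate(1.0, lambda x: x / 2, lambda x: x * x, eps=0.01)
```

In the toy run the loss decreases by 0.75, 0.1875, 0.046875, 0.01171875, and then 0.0029296875, so the loop stops after the fifth update.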

6. Rotation

As indicated in step 3, during iteration the orientation of $X$ and $Y$ with respect to the coordinate system is not necessarily correct; this also reflects that $\sigma(X; Y)$ is invariant under simultaneous rotations of $X$ and $Y$. From theory it is known that solutions in different dimensionality should be nested; that is, the $p$-dimensional solution should be equal to the first $p$ columns of the $(p + 1)$-dimensional solution. Nestedness is achieved by computing the eigenvectors of the matrix $\frac{1}{m} \sum_j Y_j' D_j Y_j$. The corresponding eigenvalues are printed after the convergence message of the program. The calculation involves tridiagonalization with Householder transformations followed by the implicit QL algorithm (Wilkinson, 1965).
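The rotation invariance can be verified in one line. Let $R$ be any $p \times p$ matrix with $R R' = I$ (the symbol $R$ is introduced here only for this check). Using the cyclic property of the trace,

$$\sigma(XR; YR) = \frac{1}{m} \sum_j \operatorname{tr}\left[ R' (X - G_j Y_j)' M_j (X - G_j Y_j) R \right] = \frac{1}{m} \sum_j \operatorname{tr}\left[ (X - G_j Y_j)' M_j (X - G_j Y_j) R R' \right] = \sigma(X; Y).$$

The normalization restriction is preserved as well, since $(XR)' M_* (XR) = R' (mn I) R = mn I$. The loss therefore cannot single out one orientation, which is why a separate rotation step is needed.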

Diagnostics

Maximum Rank (may be issued as a warning when exceeded)

The maximum rank $p_{\max}$ indicates the maximum number of dimensions that can be computed for any data set. In general we have:

$$p_{\max} = \min\left\{ n - 1,\ \Big( \sum_j k_j \Big) - \max(m_1, 1) \right\},$$

where $m_1$ is the number of variables with no missing values. Although the number of nontrivial dimensions may be less than $p_{\max}$ when $m = 2$, HOMALS does allow dimensionalities all the way up to $p_{\max}$.
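As a quick worked example of the maximum rank, the bound is the smaller of $n - 1$ and the total number of categories minus $\max(m_1, 1)$. The function name `max_rank` and the sample figures are made up for illustration.

```python
def max_rank(n, k, m1):
    """Maximum number of dimensions.

    n: number of objects.
    k: list with the number of valid categories of each variable.
    m1: number of variables without missing values.
    """
    return min(n - 1, sum(k) - max(m1, 1))


# Ten objects, three complete variables with 3, 4 and 2 categories.
p_max_complete = max_rank(10, [3, 4, 2], m1=3)
# Four objects, same variables, every variable has some missing values.
p_max_small = max_rank(4, [3, 4, 2], m1=0)
```

In the first case the category term is binding ($9 - 3 = 6$ against $n - 1 = 9$); in the second the number of objects is ($n - 1 = 3$ against $9 - 1 = 8$).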

Marginal Frequencies

The frequencies table gives the univariate marginals and the number of missing values (that is, values that are regarded as out of range for the current analysis) for each variable. These are computed as the column sums of $D_j$ and the total sum of $M_j$.

Discrimination Measure

These are the dimensionwise variances of the quantified variables. For variable $j$ and dimension $s$, we have

$$\eta_{js}^2 = y_{(j)s}' D_j \, y_{(j)s} / n,$$

where $y_{(j)s}$ is the $s$th column of $Y_j$, corresponding to the $s$th quantified variable $G_j y_{(j)s}$.
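Because $D_j$ is diagonal, each discrimination measure is just a frequency-weighted sum of squared quantifications divided by $n$. A minimal sketch, with the hypothetical name `discrimination_measures` and invented sample values:

```python
def discrimination_measures(Y, d_diag, n):
    """Return [eta^2_{j1}, ..., eta^2_{jp}] for one variable.

    Y: k_j x p category quantifications (list of rows).
    d_diag: the k_j marginal frequencies (diagonal of D_j).
    n: number of objects.
    """
    p = len(Y[0])
    return [
        sum(d * row[s] ** 2 for d, row in zip(d_diag, Y)) / n
        for s in range(p)
    ]


# Two categories with two objects each, two dimensions.
eta2 = discrimination_measures([[1.0, 0.5], [-1.0, 0.5]], d_diag=[2, 2], n=4)
```

The first dimension separates the two categories and gets the larger measure; on the second dimension both categories share the same quantification, so the measure is small.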

Eigenvalues

The computation of the eigenvalues that are reported after convergence is discussed in step 6. With the HISTORY option, the sum of the eigenvalues is reported during iteration under the heading "total fit." Because the sum of the eigenvalues is equal to the trace of the original matrix, the sum can be computed as $\frac{1}{m} \sum_j \sum_s \eta_{js}^2$. The value of $\sigma(X; Y)$ is equal to $p - \frac{1}{m} \sum_j \sum_s \eta_{js}^2$.
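The two identities above make total fit and loss cheap to report once the discrimination measures are known. The sketch below assumes they are held in a nested list `eta2[j][s]`; the function name and the numbers are illustrative only.

```python
def total_fit_and_loss(eta2):
    """Return (total fit, loss) from discrimination measures.

    eta2[j][s] is the measure of variable j on dimension s.
    """
    m = len(eta2)      # number of variables
    p = len(eta2[0])   # number of dimensions
    total_fit = sum(sum(row) for row in eta2) / m
    return total_fit, p - total_fit


fit, loss_value = total_fit_and_loss([[1.0, 0.25], [0.5, 0.25]])
```

With two variables and two dimensions the measures sum to 2.0, giving a total fit of 1.0 and a loss of $2 - 1 = 1$.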

References

Björk, A., and Golub, G. H. 1973. Numerical methods for computing angles between linear subspaces. Mathematics of Computation , 27: 579–594.

De Leeuw, J., and Van Rijckevorsel, J. 1980. HOMALS and PRINCALS—Some generalizations of principal components analysis. In: Data Analysis and Informatics , E. Diday et al, eds. Amsterdam: North-Holland.

Gifi, A. 1981. Nonlinear multivariate analysis. Leiden: Department of Data Theory.
