



The HOMALS algorithm is a modernized version of Guttman's (1941) scale theory for analyzing incomplete data. Its iterative process includes the treatment of missing values, optimization of the objective function, and orthonormalization. The algorithm seeks object scores and category quantifications that minimize the loss function while satisfying the normalization restrictions.
The iterative HOMALS algorithm is a modernized version of Guttman (1941). The treatment of missing values, described below, is based on setting weights in the loss function equal to zero, and was first described in De Leeuw and Van Rijckevorsel (1980). Other possibilities do exist and can be accomplished by recoding the data (Gifi, 1981; Meulman, 1982).
The following notation is used throughout this chapter unless otherwise stated:
$n$  Number of cases (objects)
$m$  Number of variables
$p$  Number of dimensions

For variable $j$, $j = 1, \ldots, m$:

$h_j$  $n$-vector with categorical observations
$k_j$  Number of valid categories (distinct values) of variable $j$
$G_j$  Indicator matrix for variable $j$, of order $n \times k_j$, with elements

$$g_{(j)ir} = \begin{cases} 1 & \text{when the } i\text{th object is in the } r\text{th category of variable } j \\ 0 & \text{when the } i\text{th object is not in the } r\text{th category of variable } j \end{cases}$$

$M_j$  Binary, diagonal $n \times n$ matrix, with diagonal elements defined as

$$m_{(j)ii} = \begin{cases} 1 & \text{when the } i\text{th observation is within the range } [1, k_j] \\ 0 & \text{when the } i\text{th observation is outside the range } [1, k_j] \end{cases}$$

$D_j$  Diagonal matrix containing the univariate marginals, i.e., the column sums of $G_j$

The quantification matrices are

$X$  Object scores, of order $n \times p$
$Y_j$  Category quantifications, of order $k_j \times p$
$Y$  Concatenated category quantification matrices, of order $\left(\sum_j k_j\right) \times p$
Note: The matrices $G_j$, $M_j$, and $D_j$ are exclusively notational devices; they are stored in reduced form, and the program fully profits from their sparseness by replacing matrix multiplications with selective accumulation.
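To make the notation concrete, the following minimal Python sketch builds dense versions of $G_j$, $M_j$, and $D_j$ for one variable and then computes the product $G_j' X$ by accumulation, without forming $G_j$. It assumes integer category codes in which any value outside 1 to $k_j$ counts as missing; the function names `indicator_matrices` and `accumulate_Gt_X` are hypothetical and are not taken from the program.

```python
import numpy as np

def indicator_matrices(h, k):
    """Dense G_j, M_j, D_j for one variable (illustration only).

    h : length-n integer array of category codes; values outside 1..k
        are treated as missing (assumed coding).
    k : number of valid categories k_j.
    """
    h = np.asarray(h)
    n = h.shape[0]
    active = (h >= 1) & (h <= k)           # diagonal of M_j
    G = np.zeros((n, k))
    rows = np.nonzero(active)[0]
    G[rows, h[rows] - 1] = 1.0             # 1 when object i is in category r
    M = np.diag(active.astype(float))
    D = np.diag(G.sum(axis=0))             # univariate marginals
    return G, M, D

def accumulate_Gt_X(h, k, X):
    """Compute G_j' X without forming G_j: add each active row of X
    to the output row that corresponds to its category."""
    h = np.asarray(h)
    active = (h >= 1) & (h <= k)
    out = np.zeros((k, X.shape[1]))
    np.add.at(out, h[active] - 1, X[active])
    return out

h = np.array([1, 3, 2, 0, 3, 1])           # 0 is out of range: object 4 is missing
X = np.arange(12, dtype=float).reshape(6, 2)
G, M, D = indicator_matrices(h, k=3)
dense = G.T @ X                            # explicit matrix product
sparse = accumulate_Gt_X(h, 3, X)          # same result by accumulation
```

Both routes give the same $k_j \times p$ matrix; the accumulation only touches the active rows, which is one plausible reading of "selective accumulation".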
The HOMALS objective is to find object scores $X$ and a set of $Y_j$ (for $j = 1, \ldots, m$) so that the function

$$\sigma(X; Y) = \frac{1}{m} \sum_j \operatorname{tr}\, (X - G_j Y_j)' M_j (X - G_j Y_j)$$

is minimal, under the normalization restriction $X' M_* X = mnI$, where the matrix $M_* = \sum_j M_j$ and $I$ is the $p \times p$ identity matrix. The inclusion of $M_j$ in $\sigma(X; Y)$ ensures that there is no influence of data values outside the range $[1, k_j]$, which may be really missing or merely regarded as such; $M_*$ contains the number of "active" data values for each object. The object scores are also centered; that is, they satisfy $u' M_* X = 0$, with $u$ denoting an $n$-vector of ones.
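The loss function and the two restrictions can be evaluated directly from the category codes. The sketch below is a minimal illustration under the same assumed coding (values outside 1 to $k_j$ are missing); `homals_loss`, `active_counts`, and `satisfies_restrictions` are hypothetical helper names, and the small data set is made up for the example.

```python
import numpy as np

def homals_loss(X, Y_list, h_list, k_list):
    """sigma(X; Y) = (1/m) sum_j tr (X - G_j Y_j)' M_j (X - G_j Y_j)."""
    m = len(h_list)
    total = 0.0
    for h, k, Y in zip(h_list, k_list, Y_list):
        h = np.asarray(h)
        active = (h >= 1) & (h <= k)             # diagonal of M_j
        resid = X[active] - Y[h[active] - 1]     # active rows of X - G_j Y_j
        total += np.sum(resid ** 2)              # trace of resid' resid
    return total / m

def active_counts(h_list, k_list):
    """Diagonal of M_* = sum_j M_j: number of active values per object."""
    return sum(((np.asarray(h) >= 1) & (np.asarray(h) <= k)).astype(float)
               for h, k in zip(h_list, k_list))

def satisfies_restrictions(X, w, m, tol=1e-8):
    """Check X' M_* X = m n I and u' M_* X = 0, with w the diagonal of M_*."""
    n, p = X.shape
    normalized = np.allclose(X.T @ (w[:, None] * X), m * n * np.eye(p), atol=tol)
    centered = np.allclose(w @ X, 0.0, atol=tol)
    return bool(normalized and centered)

# Two variables, four objects, one dimension; the 0 in the first variable is missing.
h_list = [np.array([1, 2, 2, 0]), np.array([1, 1, 2, 3])]
k_list = [2, 3]
X = np.array([[1.0], [-1.0], [0.5], [-0.5]])
Y_list = [np.array([[1.0], [-0.25]]), np.array([[0.0], [0.5], [-0.5]])]
loss = homals_loss(X, Y_list, h_list, k_list)
```

Because $M_j$ zeroes out the rows of missing observations, the fourth object contributes nothing to the first variable's term of the loss.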
Optimization is achieved through the following iteration scheme:
5. Convergence test
The difference between consecutive loss function values $\sigma(X; Y) - \sigma(X^{+}; Y^{+})$ is compared with the user-specified convergence criterion $\varepsilon$, a small positive number. Steps 2 to 4 are repeated as long as the loss difference exceeds $\varepsilon$.
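The stopping rule can be sketched as a generic loop. This is only an illustration of the convergence test: the callables `step` (standing in for one pass through steps 2 to 4) and `loss`, the `state` object, and the safety cap `max_iter` are all hypothetical and are not part of the documented program.

```python
def iterate_until_converged(state, step, loss, eps=1e-5, max_iter=100):
    """Repeat `step` while the drop in `loss` between consecutive
    iterations exceeds eps (or until max_iter, an assumed safety cap)."""
    previous = loss(state)
    for iteration in range(1, max_iter + 1):
        state = step(state)                  # one pass through the update steps
        current = loss(state)
        if previous - current <= eps:        # loss difference no longer exceeds eps
            return state, current, iteration
        previous = current
    return state, previous, max_iter

# Toy stand-in: halving x shrinks the "loss" x**2 by a factor of 4 each pass.
final_state, final_loss, iterations = iterate_until_converged(
    1.0, step=lambda x: x / 2, loss=lambda x: x * x, eps=1e-5
)
```

In the toy run the loop stops at the first pass whose improvement is at most `eps`, which mirrors the rule that iteration continues only while the loss difference exceeds $\varepsilon$.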
6. Rotation
As indicated in step 3, during iteration the orientation of $X$ and $Y$ with respect to the coordinate system is not necessarily correct; this also reflects that $\sigma(X; Y)$ is invariant under simultaneous rotations of $X$ and $Y$. From theory it is known that solutions in different dimensionality should be nested; that is, the $p$-dimensional solution should be equal to the first $p$ columns of the $(p + 1)$-dimensional solution. Nestedness is achieved by computing the eigenvectors of the matrix $\frac{1}{m} \sum_j Y_j' D_j Y_j$. The corresponding eigenvalues are printed after the convergence message of the program. The calculation involves tridiagonalization with Householder transformations followed by the implicit QL algorithm (Wilkinson, 1965).
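A minimal sketch of the rotation follows. It assumes the rotation is applied by post-multiplying $X$ and every $Y_j$ by the eigenvector matrix, ordered by decreasing eigenvalue, and it passes each $D_j$ as a vector of marginals. It uses `numpy.linalg.eigh` as a stand-in symmetric eigensolver rather than the Householder and QL routine named in the text; the function name `rotate_to_principal_axes` and the example data are hypothetical.

```python
import numpy as np

def rotate_to_principal_axes(X, Y_list, d_list):
    """Rotate X and all Y_j to the eigenvectors of (1/m) sum_j Y_j' D_j Y_j.

    d_list holds the diagonal of each D_j (the univariate marginals).
    """
    m = len(Y_list)
    A = sum(Y.T @ (d[:, None] * Y) for Y, d in zip(Y_list, d_list)) / m
    eigvals, eigvecs = np.linalg.eigh(A)      # symmetric solver, ascending order
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue first
    eigvals, V = eigvals[order], eigvecs[:, order]
    return X @ V, [Y @ V for Y in Y_list], eigvals

# Made-up quantifications for two variables with 2 and 3 categories, p = 2.
Y_list = [np.array([[1.0, 0.5], [-1.0, 0.2]]),
          np.array([[0.3, -0.7], [0.0, 0.4], [-0.6, 0.1]])]
d_list = [np.array([3.0, 2.0]), np.array([2.0, 1.0, 2.0])]
X = np.array([[1.0, 0.0], [0.5, -0.5], [-0.5, 0.5], [-1.0, 0.2], [0.0, -0.2]])
X_rot, Y_rot, eigenvalues = rotate_to_principal_axes(X, Y_list, d_list)
```

After this rotation the matrix $\frac{1}{m} \sum_j Y_j' D_j Y_j$ computed from the rotated quantifications is diagonal, with the eigenvalues on its diagonal.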
Maximum Rank (may be issued as a warning when exceeded)
The maximum rank $p_{\max}$ indicates the maximum number of dimensions that can be computed for any data set. In general we have:

$$p_{\max} = \min\left\{ n - 1,\ \Bigl(\sum_j k_j\Bigr) - \max(m_1, 1) \right\},$$

where $m_1$ is the number of variables with no missing values. Although the number of nontrivial dimensions may be less than $p_{\max}$ when $m = 2$, HOMALS does allow dimensionalities all the way up to $p_{\max}$.
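As a quick arithmetic check, the maximum rank can be computed with a one-line function. The name `max_rank` and the example values are hypothetical.

```python
def max_rank(n, k_list, m1):
    """p_max = min(n - 1, sum_j k_j - max(m1, 1)).

    n      : number of cases
    k_list : number of valid categories k_j for each variable
    m1     : number of variables with no missing values
    """
    return min(n - 1, sum(k_list) - max(m1, 1))

# Ten cases, three variables with 3, 4, and 2 categories, none with missing values.
p_max = max_rank(10, [3, 4, 2], m1=3)
```

Here the category total is 9 and $\max(m_1, 1) = 3$, so the second term, 6, is the binding limit rather than $n - 1 = 9$.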
Marginal Frequencies
The frequencies table gives the univariate marginals and the number of missing values (that is, values that are regarded as out of range for the current analysis) for each variable. These are computed from the column sums of $D_j$ and the total sum of $M_j$: the column sums give the marginals, and $n$ minus the total sum of $M_j$ gives the number of missing values.
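The frequencies for one variable can be tallied straight from the category codes, without building $D_j$ or $M_j$. This minimal sketch assumes integer codes with out-of-range values treated as missing; `frequency_table` is a hypothetical name.

```python
import numpy as np

def frequency_table(h, k):
    """Return (marginals, n_missing) for one variable.

    marginals : counts per category, i.e. the diagonal of D_j
    n_missing : n minus the number of active values (the sum of M_j)
    """
    h = np.asarray(h)
    active = (h >= 1) & (h <= k)
    marginals = np.bincount(h[active] - 1, minlength=k)
    n_missing = int(h.size - active.sum())
    return marginals, n_missing

marginals, n_missing = frequency_table([1, 3, 2, 0, 3, 1], k=3)
```

The marginals and the missing count always add up to the number of cases $n$.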
Discrimination Measure
These are the dimensionwise variances of the quantified variables. For variable j and dimension s , we have
Eigenvalues
The computation of the eigenvalues that are reported after convergence is discussed in step 6. With the HISTORY option, the sum of the eigenvalues is reported during iteration under the heading "total fit." Because the sum of the eigenvalues is equal to the trace of the original matrix, the sum can be computed as $\frac{1}{m} \sum_j \sum_s \eta_{js}^2$. The value of $\sigma(X; Y)$ is equal to $p - \frac{1}{m} \sum_j \sum_s \eta_{js}^2$.
References
Björck, Å., and Golub, G. H. 1973. Numerical methods for computing angles between linear subspaces. Mathematics of Computation, 27: 579–594.
De Leeuw, J., and Van Rijckevorsel, J. 1980. HOMALS and PRINCALS: Some generalizations of principal components analysis. In: Data Analysis and Informatics, E. Diday et al., eds. Amsterdam: North-Holland.
Gifi, A. 1981. Nonlinear multivariate analysis. Leiden: Department of Data Theory.