HOMALS Algorithm: Modernizing Guttman's Scale Theory for Incomplete Data, Study notes of Mathematical Statistics

The HOMALS algorithm is a modernized version of Guttman's (1941) scale theory for analyzing incomplete data. These notes describe the iterative process of the HOMALS algorithm, including the treatment of missing values, optimization of the objective function, and orthonormalization of the object scores. The algorithm aims to find object scores and category quantifications that minimize the loss function while satisfying normalization restrictions.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar


The iterative HOMALS algorithm is a modernized version of Guttman (1941). The treatment of missing values, described below, is based on setting weights in the loss function equal to zero, and was first described in De Leeuw and Van Rijckevorsel (1980). Other possibilities do exist and can be accomplished by recoding the data (Gifi, 1981; Meulman, 1982).

Notation

The following notation is used throughout this chapter unless otherwise stated:

$n$ Number of cases (objects)

$m$ Number of variables

$p$ Number of dimensions

For variable $j$, $j = 1, \ldots, m$:

$h_j$ $n$-vector with categorical observations

$k_j$ Number of valid categories (distinct values) of variable $j$

$G_j$ Indicator matrix for variable $j$, of order $n \times k_j$, with elements

$$
g_{(j)ir} =
\begin{cases}
1 & \text{when the $i$th object is in the $r$th category of variable $j$} \\
0 & \text{when the $i$th object is not in the $r$th category of variable $j$}
\end{cases}
$$

$M_j$ Binary, diagonal $n \times n$ matrix, with diagonal elements defined as

$$
m_{(j)ii} =
\begin{cases}
1 & \text{when the $i$th observation is within the range $[1, k_j]$} \\
0 & \text{when the $i$th observation is outside the range $[1, k_j]$}
\end{cases}
$$

$D_j$ Diagonal matrix containing the univariate marginals, i.e., the column sums of $G_j$.
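To make the notation concrete, the following is a minimal sketch in Python with NumPy of how $G_j$, $M_j$, and $D_j$ could be built from a vector of category codes $h_j$. The function name `indicator_matrices` and the example codes are illustrative assumptions, not part of the original program, and the matrices are formed densely purely for readability.

```python
import numpy as np

def indicator_matrices(h, k):
    """Build dense G_j, M_j and D_j for one variable (illustrative helper).

    h : integer category codes for n objects; values outside 1..k are missing
    k : number of valid categories k_j
    """
    h = np.asarray(h)
    n = h.shape[0]
    active = (h >= 1) & (h <= k)          # diagonal of M_j
    G = np.zeros((n, k))
    rows = np.flatnonzero(active)
    G[rows, h[rows] - 1] = 1.0            # g_(j)ir = 1 if object i is in category r
    M = np.diag(active.astype(float))     # binary diagonal n x n matrix
    D = np.diag(G.sum(axis=0))            # univariate marginals: column sums of G_j
    return G, M, D

# Hypothetical data: codes 0 and 9 fall outside [1, 3] and are treated as missing.
h_example = [1, 3, 2, 0, 3, 9]
G, M, D = indicator_matrices(h_example, k=3)
```

Rows of `G` that belong to out-of-range observations are entirely zero, so the row sums of `G` reproduce the diagonal of `M`.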

The quantification matrices are

$X$ Object scores, of order $n \times p$

$Y_j$ Category quantifications, of order $k_j \times p$

$Y$ Concatenated category quantification matrices, of order $\sum_j k_j \times p$.

Note: The matrices $G_j$, $M_j$, and $D_j$ are exclusively notational devices; they are stored in reduced form, and the program fully profits from their sparseness by replacing matrix multiplications with selective accumulation.
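As an illustration of what "selective accumulation" can look like, the sketch below computes the products $G_j' X$ and $G_j Y_j$ directly from the category codes, without ever forming $G_j$. It is written in Python with NumPy; the function names are hypothetical and the actual storage scheme of the program is not documented here, so this is only one plausible reading.

```python
import numpy as np

def category_sums(h, k, X):
    """Compute G_j' X without forming G_j (codes outside 1..k are missing)."""
    h = np.asarray(h)
    active = (h >= 1) & (h <= k)
    out = np.zeros((k, X.shape[1]))
    # Add each active row of X to the row of its category (unbuffered add).
    np.add.at(out, h[active] - 1, X[active])
    return out

def expand_quantifications(h, k, Y):
    """Compute G_j Y_j by row lookup; rows of missing objects stay zero."""
    h = np.asarray(h)
    active = (h >= 1) & (h <= k)
    out = np.zeros((h.shape[0], Y.shape[1]))
    out[active] = Y[h[active] - 1]
    return out
```

Both helpers touch each object once, so the cost grows with $n p$ rather than with $n k_j p$ as a dense matrix product would.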

Objective Function Optimization

The HOMALS objective is to find object scores $X$ and a set of $Y_j$ (for $j = 1, \ldots, m$) so that the function

$$
\sigma(X; Y) = \frac{1}{m} \sum_j \operatorname{tr}\left[ (X - G_j Y_j)' M_j (X - G_j Y_j) \right]
$$

is minimal, under the normalization restriction $X' M_* X = mn I$, where the matrix $M_* = \sum_j M_j$, and $I$ is the $p \times p$ identity matrix. The inclusion of $M_j$ in $\sigma(X; Y)$ ensures that there is no influence of data values outside the range $[1, k_j]$, which may be really missing or merely regarded as such; $M_*$ contains the number of “active” data values for each object. The object scores are also centered; that is, they satisfy $u' M_* X = 0$, with $u$ denoting an $n$-vector with ones.
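The loss can be evaluated directly from the category codes. The following Python/NumPy sketch assumes the data are held in an $n \times m$ integer array `H` whose column $j$ is $h_j$, and that `K` lists the $k_j$; both names, and the helper `active_counts`, are illustrative choices rather than anything defined by the program.

```python
import numpy as np

def homals_loss(X, Y_list, H, K):
    """sigma(X; Y) = (1/m) sum_j tr (X - G_j Y_j)' M_j (X - G_j Y_j).

    H : (n, m) integer array of category codes; K : list of k_j.
    Codes outside 1..k_j receive weight zero, as M_j prescribes.
    """
    n, m = H.shape
    total = 0.0
    for j in range(m):
        h = H[:, j]
        active = (h >= 1) & (h <= K[j])
        # Rows of X - G_j Y_j for which m_(j)ii = 1.
        resid = X[active] - Y_list[j][h[active] - 1]
        total += np.sum(resid ** 2)
    return total / m

def active_counts(H, K):
    """Diagonal of M_* = sum_j M_j: number of active values per object."""
    K = np.asarray(K)
    return ((H >= 1) & (H <= K)).sum(axis=1)
```

Given the diagonal `w = active_counts(H, K)`, the normalization restriction can be checked as `X.T @ (w[:, None] * X)` being close to `m * n * np.eye(p)`, and the centering restriction as `w @ X` being close to zero.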

Optimization is achieved through the following iteration scheme:

  1. Initialization
  2. Update object scores
  3. Orthonormalization
  4. Update category quantifications
  5. Convergence test: repeat steps 2-4 or continue
  6. Rotation
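Steps 1 to 5 can be sketched as an alternating least squares loop. The Python/NumPy code below is a hedged illustration, not the program's implementation: the random start, the centroid update $Y_j = D_j^{-1} G_j' X$, the update of the object scores as the average of the active quantifications, and the QR-based orthonormalization in the metric $M_*$ are all assumptions chosen so that the restrictions $X' M_* X = mnI$ and $u' M_* X = 0$ hold. It also assumes every object has at least one active value. Step 6 (rotation) is left out.

```python
import numpy as np

def homals_sketch(H, K, p, eps=1e-8, max_iter=500, seed=0):
    """Alternating least squares sketch for the HOMALS loss (assumed steps)."""
    H = np.asarray(H)
    K = np.asarray(K)
    n, m = H.shape
    active = (H >= 1) & (H <= K)              # active[i, j] is the diagonal of M_j
    w = active.sum(axis=1).astype(float)      # diagonal of M_*; assumed > 0
    codes = np.where(active, H - 1, 0)        # zero-based codes, masked when used

    def normalize(Z):
        # Center with respect to M_*, then enforce X' M_* X = m n I via QR.
        Z = Z - (w @ Z) / w.sum()
        Q, _ = np.linalg.qr(np.sqrt(w)[:, None] * Z)
        return np.sqrt(m * n) * Q / np.sqrt(w)[:, None]

    def update_Y(X):
        # Y_j = D_j^{-1} G_j' X: centroid of active object scores per category.
        Ys = []
        for j in range(m):
            a = active[:, j]
            sums = np.zeros((K[j], p))
            counts = np.zeros(K[j])
            np.add.at(sums, codes[a, j], X[a])
            np.add.at(counts, codes[a, j], 1.0)
            Ys.append(sums / np.maximum(counts, 1.0)[:, None])
        return Ys

    def loss(X, Ys):
        total = 0.0
        for j in range(m):
            a = active[:, j]
            total += np.sum((X[a] - Ys[j][codes[a, j]]) ** 2)
        return total / m

    rng = np.random.default_rng(seed)
    X = normalize(rng.normal(size=(n, p)))    # 1. initialization (random start)
    Ys = update_Y(X)
    history = [loss(X, Ys)]
    for _ in range(max_iter):
        Z = np.zeros((n, p))
        for j in range(m):                    # 2. update object scores
            a = active[:, j]
            Z[a] += Ys[j][codes[a, j]]
        X = normalize(Z / w[:, None])         # 3. orthonormalization
        Ys = update_Y(X)                      # 4. update category quantifications
        history.append(loss(X, Ys))
        if history[-2] - history[-1] <= eps:  # 5. convergence test
            break
    return X, Ys, w, history
```

A call such as `X, Ys, w, history = homals_sketch(H, K, p=2)` returns the object scores, the list of category quantifications, the diagonal of $M_*$, and the sequence of loss values, which can be inspected to see how quickly the loss difference falls below `eps`.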

5. Convergence test

The difference between consecutive loss function values $\sigma(\tilde{X}; Y) - \sigma(X^{+}; Y^{+})$ is compared with the user-specified convergence criterion $\varepsilon$, a small positive number. Steps 2 to 4 are repeated as long as the loss difference exceeds $\varepsilon$.

6. Rotation

As indicated in step 3, during iteration the orientation of $X$ and $Y$ with respect to the coordinate system is not necessarily correct; this also reflects that $\sigma(X; Y)$ is invariant under simultaneous rotations of $X$ and $Y$. From theory it is known that solutions in different dimensionality should be nested; that is, the $p$-dimensional solution should be equal to the first $p$ columns of the $(p + 1)$-dimensional solution. Nestedness is achieved by computing the eigenvectors of the matrix $\frac{1}{m} \sum_j Y_j' D_j Y_j$. The corresponding eigenvalues are printed after the convergence message of the program. The calculation involves tridiagonalization with Householder transformations followed by the implicit QL algorithm (Wilkinson, 1965).
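The rotation step can be sketched as follows in Python with NumPy. This is an illustration under assumptions: `np.linalg.eigh` stands in for the Householder and QL routines named above, the function name is hypothetical, and each $D_j$ is passed as a 1-D array holding its diagonal.

```python
import numpy as np

def rotate_to_principal_axes(X, Y_list, D_list):
    """Rotate X and every Y_j by the eigenvectors of (1/m) sum_j Y_j' D_j Y_j.

    D_list holds the diagonals of D_j (marginal frequencies) as 1-D arrays.
    """
    m = len(Y_list)
    S = sum(Y.T @ (d[:, None] * Y) for Y, d in zip(Y_list, D_list)) / m
    eigvals, V = np.linalg.eigh(S)        # symmetric eigenproblem, ascending order
    order = np.argsort(eigvals)[::-1]     # put the largest eigenvalue first
    eigvals, V = eigvals[order], V[:, order]
    # The same orthogonal matrix V is applied to X and to all Y_j.
    return X @ V, [Y @ V for Y in Y_list], eigvals
```

Because `V` is orthogonal and is applied to `X` and all `Y_j` simultaneously, the loss value is unchanged; after rotation the matrix $\frac{1}{m} \sum_j Y_j' D_j Y_j$ is diagonal with the eigenvalues in decreasing order.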

Diagnostics

Maximum Rank (may be issued as a warning when exceeded)

The maximum rank $p_{\max}$ indicates the maximum number of dimensions that can be computed for any data set. In general we have:

$$
p_{\max} = \min\left\{ n - 1,\; \Big( \sum_j k_j \Big) - \max(m_1, 1) \right\},
$$

where $m_1$ is the number of variables with no missing values. Although the number of nontrivial dimensions may be less than $p_{\max}$ when $m = 2$, HOMALS does allow dimensionalities all the way up to $p_{\max}$.
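As a worked instance of the formula, take a hypothetical data set with $n = 100$ objects and three variables with $k_j = 3, 4, 2$, none of which has missing values, so $m_1 = 3$. Then $\sum_j k_j - \max(m_1, 1) = 9 - 3 = 6$ and $p_{\max} = \min(99, 6) = 6$. A small Python helper (the name `max_rank` is illustrative) expresses the same computation:

```python
def max_rank(n, k, m1):
    """p_max = min(n - 1, sum_j k_j - max(m1, 1)).

    n  : number of objects
    k  : list with the number of valid categories k_j per variable
    m1 : number of variables without missing values
    """
    return min(n - 1, sum(k) - max(m1, 1))
```

When every variable has missing values ($m_1 = 0$), the term $\max(m_1, 1)$ still subtracts one dimension.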

Marginal Frequencies

The frequencies table gives the univariate marginals and the number of missing values (that is, values that are regarded as out of range for the current analysis) for each variable. The marginals are the column sums of $D_j$; the number of missing values follows from the total sum of $M_j$, which counts the active observations, as $n$ minus that sum.

Discrimination Measure

These are the dimensionwise variances of the quantified variables. For variable $j$ and dimension $s$, we have

$$
\eta^2_{js} = \frac{y_{(j)s}' D_j \, y_{(j)s}}{n},
$$

where $y_{(j)s}$ is the $s$th column of $Y_j$, corresponding to the $s$th quantified variable $G_j y_{(j)s}$.
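The discrimination measures follow directly from the category quantifications and the marginal frequencies. The Python/NumPy sketch below assumes each $D_j$ is supplied as a 1-D array of its diagonal; the function name is an illustrative choice.

```python
import numpy as np

def discrimination_measures(Y_list, D_list, n):
    """Return an (m, p) array with eta^2_js = y_(j)s' D_j y_(j)s / n."""
    return np.array([(d[:, None] * Y ** 2).sum(axis=0) / n
                     for Y, d in zip(Y_list, D_list)])
```

Row $j$ of the result holds the measures of variable $j$ across the $p$ dimensions, so column sums and row sums give per-dimension and per-variable totals.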

Eigenvalues

The computation of the eigenvalues that are reported after convergence is discussed in step 6. With the HISTORY option, the sum of the eigenvalues is reported during iteration under the heading “total fit.” Because the sum of the eigenvalues is equal to the trace of the original matrix, the sum can be computed as $\frac{1}{m} \sum_j \sum_s \eta^2_{js}$. The value of $\sigma(X; Y)$ is equal to $p - \frac{1}{m} \sum_j \sum_s \eta^2_{js}$.

References

Björck, Å., and Golub, G. H. 1973. Numerical methods for computing angles between linear subspaces. Mathematics of Computation, 27: 579–594.

De Leeuw, J., and Van Rijckevorsel, J. 1980. HOMALS and PRINCALS—Some generalizations of principal components analysis. In: Data Analysis and Informatics , E. Diday et al, eds. Amsterdam: North-Holland.

Gifi, A. 1981. Nonlinear multivariate analysis. Leiden: Department of Data Theory.