

















PROXSCAL performs multidimensional scaling of proximity data to find a least-squares representation of the objects in a low-dimensional space. Individual differences models can be specified for multiple sources. A majorization algorithm guarantees monotone convergence for optionally transformed, metric and nonmetric data under a variety of models and constraints. Detailed mathematical derivations concerning the algorithm can be found in Commandeur and Heiser (1993).
The following notation is used throughout this chapter, unless stated otherwise. For the dimensions of the vectors and matrices, the following symbols are used:
$n$    Number of objects
$m$    Number of sources
$p$    Number of dimensions
$s$    Number of independent variables
$h$    maximum($s$, $p$)
$l$    Length of transformation vector
$r$    Degree of spline
$t$    Number of interior knots for spline
The input and input-related variables are:
$\Delta_k$    n × n matrix with raw proximities for source k
$W_k$    n × n matrix with weights for source k
Output and output-related variables are:
$\hat{D}_k$    n × n matrix with transformed proximities for source k
$A_k$    p × p matrix with space weights for source k
$X_k$    n × p matrix with individual space coordinates for source k
Special matrices and functions are:
$J$    $I - \mathbf{1}\mathbf{1}^T / \mathbf{1}^T\mathbf{1}$, centering matrix of appropriate size
$D(X_k)$    n × n matrix with distances, with elements $\{d_{ijk}\}$ (also written $d_{ij}(X_k)$), where
$$d_{ijk} = \sqrt{(x_{ik} - x_{jk})^T (x_{ik} - x_{jk})}$$
and $x_{ik}$ is the ith row of $X_k$
$V_k$    n × n matrix with elements $\{v_{ijk}\}$, where
$$v_{ijk} = \begin{cases} \sum_{l \ne i}^{n} w_{ilk} & \text{for } i = j \\ -w_{ijk} & \text{for } i \ne j \end{cases}$$
$B(X_k)$    n × n matrix with elements $\{b_{ijk}\}$, where
$$b_{ijk} = \begin{cases} -w_{ijk}\hat{d}_{ijk} / d_{ij}(X_k) & \text{if } i \ne j \text{ and } d_{ij}(X_k) > 0 \\ 0 & \text{if } i \ne j \text{ and } d_{ij}(X_k) = 0 \\ -\sum_{l \ne i}^{n} b_{ilk} & \text{if } i = j \end{cases}$$
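The special matrices above can be sketched in plain Python for a single source. This is a minimal illustration, assuming dense list-of-lists matrices and a symmetric weight matrix with zeros on the diagonal; the function names `distances`, `v_matrix` and `b_matrix` are hypothetical and not part of PROXSCAL.

```python
import math

# Hypothetical helpers for one source; not part of PROXSCAL itself.

def distances(X):
    """D(X): Euclidean distances between the rows of configuration X."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]

def v_matrix(W):
    """V: -w_ij off the diagonal, sum of the other weights of row i on the diagonal."""
    n = len(W)
    V = [[-W[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        V[i][i] = sum(W[i][l] for l in range(n) if l != i)
    return V

def b_matrix(W, Dhat, X):
    """B(X): -w_ij * dhat_ij / d_ij(X) off the diagonal (0 when d_ij(X) = 0),
    and minus the sum of the off-diagonal elements of row i on the diagonal."""
    n = len(X)
    D = distances(X)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and D[i][j] > 0:
                B[i][j] = -W[i][j] * Dhat[i][j] / D[i][j]
        B[i][i] = -sum(B[i][l] for l in range(n) if l != i)
    return B
```

When the transformed proximities equal the current distances, every ratio $\hat{d}_{ij}/d_{ij}(X)$ is one, so $B(X)$ coincides with $V$; both matrices have rows that sum to zero.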
Introduction
The following loss function is minimized by PROXSCAL,
$$\sigma^2(X) = \frac{1}{m} \sum_{k=1}^{m} \sum_{i<j}^{n} w_{ijk} \left( \hat{d}_{ijk} - d_{ij}(X_k) \right)^2, \qquad (1.1)$$
which is the weighted mean squared error between the transformed proximities and the distances of n objects within m sources. The transformation function for the
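Loss (1.1) can be evaluated directly from its definition. The sketch below assumes one weight matrix, one matrix of transformed proximities and one configuration $X_k$ per source, all as lists of lists; `raw_loss` is a hypothetical name.

```python
import math

def raw_loss(W, Dhat, Xs):
    """Hypothetical helper for (1.1): weighted squared error over pairs i < j,
    summed over the m sources and divided by m."""
    m = len(Xs)
    total = 0.0
    for Wk, Dk, Xk in zip(W, Dhat, Xs):
        n = len(Xk)
        for i in range(n):
            for j in range(i + 1, n):
                d = math.dist(Xk[i], Xk[j])  # d_ij(X_k)
                total += Wk[i][j] * (Dk[i][j] - d) ** 2
    return total / m
```

Only pairs with i < j enter the sum, so each symmetric pair is counted once per source.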
The proximities are normalized such that the weighted squared proximities equal the sum of the weights, again, taking into account the conditionality.
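A minimal sketch of this normalization, assuming that taking "the conditionality" into account means one common scale factor for all sources in the unconditional case and one factor per source in the matrix-conditional case; the function name and the `conditional` flag are hypothetical.

```python
import math

def normalize_proximities(W, Delta, conditional=False):
    """Hypothetical helper: rescale raw proximities so that the weighted sum of
    squared proximities over pairs i < j equals the sum of the weights, either
    over all sources at once (unconditional) or per source (matrix-conditional)."""
    def sums(Wk, Dk):
        n = len(Dk)
        pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
        sw = sum(Wk[i][j] for i, j in pairs)
        swd2 = sum(Wk[i][j] * Dk[i][j] ** 2 for i, j in pairs)
        return sw, swd2
    totals = [sums(Wk, Dk) for Wk, Dk in zip(W, Delta)]
    if conditional:
        factors = [math.sqrt(sw / swd2) for sw, swd2 in totals]
    else:
        f = math.sqrt(sum(t[0] for t in totals) / sum(t[1] for t in totals))
        factors = [f] * len(Delta)
    return [[[f * x for x in row] for row in Dk] for f, Dk in zip(factors, Delta)]
```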
Step 1: Initial Configuration
PROXSCAL allows for several initial configurations. Before the initial configuration is determined, missing values are handled and the raw proximities are initialized. After one of the starts described below, the common space Z is centered on the origin and optimally dilated in accordance with the normalized proximities.
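The final centering and dilation can be sketched as follows for a single source. Here "optimally dilated" is read as multiplying the centered configuration by the least-squares factor $\alpha = \sum w_{ij}\hat{d}_{ij} d_{ij}(Z) / \sum w_{ij} d_{ij}^2(Z)$, which minimizes $\sum w_{ij}(\hat{d}_{ij} - \alpha d_{ij}(Z))^2$; this reading, the single-source setting and the function name are assumptions.

```python
import math

def center_and_dilate(Z, W, Dhat):
    """Hypothetical helper: center Z on the origin, then rescale it by the
    least-squares factor alpha (an assumed reading of 'optimally dilated')."""
    n, p = len(Z), len(Z[0])
    means = [sum(row[a] for row in Z) / n for a in range(p)]
    Zc = [[row[a] - means[a] for a in range(p)] for row in Z]  # centered copy
    num = den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(Zc[i], Zc[j])
            num += W[i][j] * Dhat[i][j] * d
            den += W[i][j] * d * d
    alpha = num / den
    return [[alpha * z for z in row] for row in Zc]
```

Centering does not change the distances, so $\alpha$ could equally be computed before or after it.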
The simplex start consists of a rank-p approximation of the matrix $V^{-}B(J)$. Set H, an n × p columnwise orthogonal matrix satisfying $H^T H = I_p$, equal to $I_p$ in p of its rows and zero elsewhere, where the p nonzero rows are selected in such a way that the first $Z = B(J)H$ contains the p columns of $B(J)$ with the largest diagonal elements. The following steps are computed in turn, until convergence is reached:
For a restricted common space Z, the second step is adjusted in order to fulfill the restrictions. This procedure was introduced in Heiser (1985).
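The selection of the initial H can be sketched for a single source. Any two rows of the centering matrix J lie at distance $\sqrt{2}$ from each other, so the off-diagonal elements of $B(J)$ are $-w_{ij}\hat{d}_{ij}/\sqrt{2}$; the sketch places the columns of $I_p$ in the p rows with the largest diagonal elements of $B(J)$ and forms the first $Z = B(J)H$. The iterative steps that follow are not reproduced, and `simplex_initial_h` is a hypothetical name.

```python
import math

def simplex_initial_h(W, Dhat, p):
    """Hypothetical helper: choose H so that B(J) H picks the p columns of B(J)
    with the largest diagonal elements; returns the chosen rows, H and Z."""
    n = len(W)
    d = math.sqrt(2.0)  # every pair of rows of J is sqrt(2) apart
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                B[i][j] = -W[i][j] * Dhat[i][j] / d
        B[i][i] = -sum(B[i][l] for l in range(n) if l != i)
    chosen = sorted(range(n), key=lambda i: B[i][i], reverse=True)[:p]
    H = [[0.0] * p for _ in range(n)]
    for col, row in enumerate(chosen):
        H[row][col] = 1.0  # the nonzero rows of H form I_p
    Z = [[sum(B[i][l] * H[l][c] for l in range(n)) for c in range(p)] for i in range(n)]
    return chosen, H, Z
```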
The proximities are aggregated over sources, squared, double centered and multiplied by −0.5, after which an eigenvalue decomposition is used to determine the coordinate values, thus
$$-0.5\, J D^{*} J = Q \Lambda Q^T,$$
where the elements of $D^{*}$ are the squared aggregated proximities
$$d^{*}_{ij} = \left( \sum_{k=1}^{m} w_{ijk} \hat{d}_{ijk} \Big/ \sum_{k=1}^{m} w_{ijk} \right)^2,$$
followed by $Z = Q \Lambda^{1/2}$, where only the first p positive ordered eigenvalues ($\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$) and the corresponding eigenvectors are used. This technique, classical scaling, is due to Torgerson (1952, 1958) and Gower (1966) and is also known as Torgerson scaling or Torgerson-Gower scaling.
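The Torgerson start can be sketched in plain Python. The input is assumed to hold the proximities already aggregated over sources; the sketch squares them, double centers the result, multiplies by −0.5 and keeps the p largest eigenvalues. A cyclic Jacobi eigenvalue routine stands in for whatever eigensolver PROXSCAL uses, and both function names are hypothetical.

```python
import math

def jacobi_eigen(A, max_sweeps=100, tol=1e-20):
    """Hypothetical helper: eigenvalues and eigenvectors (columns of Q) of a
    symmetric matrix by cyclic Jacobi rotations."""
    n = len(A)
    A = [list(map(float, row)) for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: Q <- Q P
                    qkp, qkq = Q[k][p], Q[k][q]
                    Q[k][p], Q[k][q] = c * qkp - s * qkq, s * qkp + c * qkq
    return [A[i][i] for i in range(n)], Q

def classical_scaling(Dagg, p):
    """Hypothetical helper: Torgerson-Gower scaling of aggregated proximities."""
    n = len(Dagg)
    D2 = [[x * x for x in row] for row in Dagg]  # square the proximities
    rmean = [sum(row) / n for row in D2]
    gmean = sum(rmean) / n
    # -0.5 * J D2 J written out elementwise (double centering)
    Bm = [[-0.5 * (D2[i][j] - rmean[i] - rmean[j] + gmean) for j in range(n)]
          for i in range(n)]
    vals, Q = jacobi_eigen(Bm)
    order = sorted(range(n), key=lambda a: vals[a], reverse=True)[:p]
    # Z = Q Lambda^(1/2) for the p largest eigenvalues; negative ones clipped to 0
    return [[Q[i][a] * math.sqrt(max(vals[a], 0.0)) for a in order] for i in range(n)]
```

Proximities that form an exact Euclidean configuration, such as the sides 3, 4 and 5 of a right triangle, are reproduced in two dimensions up to rounding.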
The coordinate values are randomly generated from a uniform distribution using the default random number generator from the SPSS system.
The coordinate values provided by the user are used.
Step 2: Configuration Update
The common space Z is related to the individual spaces $X_k$ (k = 1, ..., m) through $X_k = Z A_k$, where $A_k$ is the p × p matrix of space weights for source k.
Assume that the weight matrix $A_k$ is of full rank. Considering only Z, (1.1) can be written as
$$\sigma^2(z) = c + z^T H z - 2 z^T t, \qquad (3.1)$$
where
$$z = \operatorname{vec}(Z), \qquad H = \frac{1}{m} \sum_{k=1}^{m} A_k A_k^T \otimes V_k, \qquad t = \frac{1}{m} \sum_{k=1}^{m} \operatorname{vec}\left( B(X_k) X_k A_k^T \right), \qquad (3.2)$$
for which a solution is found as
$$z = H^{-} t. \qquad (3.3)$$
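For the identity model with a single common space ($A_k = I$ for all k) and unit weights for all pairs, $H$ reduces to $I \otimes \frac{1}{m}\sum_k V_k$ with every $V_k = nI - \mathbf{1}\mathbf{1}^T$. Taking $H^{-}$ as the Moore-Penrose inverse, (3.3) becomes the Guttman transform $Z = \frac{1}{nm}\sum_k B_k(Z)Z$, where $B_k(Z)$ uses the transformed proximities of source k. The sketch implements only this special case; the function name is hypothetical.

```python
import math

def guttman_update(Z, Dhats):
    """Hypothetical helper: Z_new = (1/(n*m)) * sum_k B_k(Z) Z for the identity
    model with unit weights. Row i of B(Z) Z equals sum_{j != i} b_ij (z_j - z_i)
    because b_ii = -sum_{j != i} b_ij."""
    n, p, m = len(Z), len(Z[0]), len(Dhats)
    Znew = [[0.0] * p for _ in range(n)]
    for Dk in Dhats:
        for i in range(n):
            for j in range(n):
                d = math.dist(Z[i], Z[j])
                if i == j or d == 0.0:
                    continue  # b_ij = 0 when d_ij(Z) = 0
                b = -Dk[i][j] / d  # off-diagonal b_ij with w_ij = 1
                for a in range(p):
                    Znew[i][a] += b * (Z[j][a] - Z[i][a])
    return [[v / (n * m) for v in row] for row in Znew]
```

When the transformed proximities already equal the current distances, the update returns the centered configuration, a fixed point of the iteration; otherwise the majorization argument guarantees that the loss does not increase.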
Several special cases exist for which (3.3) can be simplified. First, the weights
with the different models, reflected in restrictions for the space weights. Equation
Suppose that $P_k L_k Q_k^T$ is the singular value decomposition of $A_k$, in which the diagonal matrix of singular values $L_k$ is in nonincreasing order. Then, for the reduced rank model, the best rank-r (r < p) approximation of $A_k$ is given by $R_k T_k^T$, where $R_k$ contains the first r columns of $P_k L_k$, and $T_k$ contains the first r columns of $Q_k$.
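For the special case r = 1, the best rank-one approximation $R_k T_k^T$ can be sketched without a full singular value decomposition: power iteration on $A_k^T A_k$ gives the first right singular vector (the first column of $Q_k$), and $A_k$ times that vector gives the first column of $P_k L_k$. This substitutes power iteration for the SVD named in the text, assumes a gap between the first two singular values and a start vector that is not orthogonal to the dominant one, and uses a hypothetical function name.

```python
import math

def rank_one_approximation(A, iters=500):
    """Hypothetical helper: return (R, T) with R the first column of P L and
    T the first column of Q, so that R T^T is the best rank-1 approximation of A."""
    p = len(A)
    v = [1.0] * p  # start vector; assumed not orthogonal to the dominant singular vector
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(p)) for i in range(p)]
        AtAv = [sum(A[i][j] * Av[i] for i in range(p)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in AtAv))
        v = [x / norm for x in AtAv]
    R = [sum(A[i][j] * v[j] for j in range(p)) for i in range(p)]  # A v = sigma_1 u_1
    return R, v
```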
For the weighted Euclidean model, (3.4) reduces to a diagonal matrix
$$A_k = \operatorname{diag}\left(Z^T V_k Z\right)^{-1} \operatorname{diag}\left(Z^T B(X_k) X_k\right).$$
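The diagonal update can be computed one dimension at a time: the ath space weight equals $z_a^T B(X_k) x_{ka} / z_a^T V_k z_a$, where $z_a$ and $x_{ka}$ (notation introduced here) are the ath columns of Z and $X_k$. The sketch assumes $V_k$ and $B(X_k)$ have already been formed as dense matrices; the function name is hypothetical.

```python
def weighted_euclidean_weights(Z, Vk, Bk, Xk):
    """Hypothetical helper: diagonal of A_k = diag(Z^T V_k Z)^-1 diag(Z^T B(X_k) X_k)."""
    n, p = len(Z), len(Z[0])

    def quad(M, u, v):  # u^T M v
        return sum(u[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

    weights = []
    for a in range(p):
        za = [Z[i][a] for i in range(n)]
        xa = [Xk[i][a] for i in range(n)]
        weights.append(quad(Bk, za, xa) / quad(Vk, za, za))
    return weights
```

With $B(X_k) = V_k$ and $X_k = Z\,\operatorname{diag}(2, 3)$, for example, the sketch returns the weights 2 and 3.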
reduced rank model, which can be done in r dimensions, as explained in Heiser and Stoop (1986).
Fixed coordinates
step further, which results in an update for object i on dimension a as
$$z_{ia} = \left(e_i^T V_a e_i\right)^{-1} e_i^T \left( \frac{1}{m} \sum_{k=1}^{m} B(X_k) X_k A_k^T e_a - \sum_{j \ne a}^{p} \frac{1}{m} \sum_{k=1}^{m} \left(e_j^T A_k A_k^T e_a\right) V_k z_j \right) - \left(e_i^T V_a e_i\right)^{-1} e_i^T V_a \tilde{z}_a,$$
where the ath column of Z is divided into $z_a = \tilde{z}_a + z_{ia} e_i$ (so that $\tilde{z}_a$ has a zero in row i), with $e_i$ the ith column of the identity matrix, and
$$V_a = \frac{1}{m} \sum_{k=1}^{m} \left(e_a^T A_k A_k^T e_a\right) V_k.$$
This update minimizes (3.1) only locally, with respect to a single coordinate; repeatedly cycling through all free coordinates until convergence is reached minimizes (3.1) over all free coordinates jointly. After all free coordinates have been updated, Z is centered on the origin. On output, the configuration is adapted so as to coincide with the initial fixed coordinates.
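The cyclic update can be illustrated on the vectorized form (3.1): each entry of z plays the role of one coordinate $z_{ia}$, fixed entries are skipped, and each free entry is set to the minimizer of (3.1) with all other entries held at their current values. The sketch takes H and t as given dense arrays (in PROXSCAL they are built from the space weights, $V_k$ and $B(X_k)$); the function name and the fixed number of sweeps are assumptions.

```python
def cyclic_coordinate_update(H, t, z, fixed, sweeps=200):
    """Hypothetical helper: minimize z^T H z - 2 z^T t over the free entries of z,
    one entry at a time. Setting the derivative for entry i to zero gives
    z_i = (t_i - sum_{j != i} H_ij z_j) / H_ii."""
    z = list(z)
    n = len(z)
    for _ in range(sweeps):
        for i in range(n):
            if i in fixed:
                continue  # fixed coordinates keep their given values
            rest = sum(H[i][j] * z[j] for j in range(n) if j != i)
            z[i] = (t[i] - rest) / H[i][i]
    return z
```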
Independent variables
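$$Z = QB = \sum_{j=1}^{h} q_j b_j^T.$$
$$U_j = \sum_{k \ne j}^{h} q_k b_k^T$$
$$T_j = C - \frac{1}{m} \sum_{k=1}^{m} V_k U_j A_k A_k^T, \quad \text{where } C = \frac{1}{m} \sum_{k=1}^{m} B(X_k) X_k A_k^T$$
$$b_j = \left( \frac{1}{m} \sum_{k=1}^{m} \left(q_j^T V_k q_j\right) A_k A_k^T \right)^{-1} T_j^T q_j$$
$$q_j = V_j^{-} T_j b_j, \quad \text{where } V_j = \frac{1}{m} \sum_{k=1}^{m} \left(b_j^T A_k A_k^T b_j\right) V_k$$
Finally, set
$$Z = QB = \sum_{j=1}^{h} q_j b_j^T.$$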
Restrictions by independent variables were introduced for the MDS model in Bentler and Weeks (1978), Bloxom (1978), de Leeuw and Heiser (1980) and Meulman and Heiser (1984). If there are more dimensions (p) than independent variables (s), p − s dummy variables are created and treated as completely free in the analysis. The transformations for the independent variables from Step 3 are identical to the transformations of the proximities, except that the nonnegativity constraint does
normalized on n , and the reverse normalization is applied to the regression weights
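One pass of the regression-weight update can be sketched for the simplest setting: a single source (m = 1) and the identity model ($A_1 = I$), so that $T_j = C - V U_j$ and $b_j = T_j^T q_j / (q_j^T V q_j)$. The transformation of the independent variables is omitted, V and C are taken as given dense matrices, and the function name is hypothetical.

```python
def update_regression_weights(Q, Bw, V, C):
    """Hypothetical helper: one pass over j of b_j = T_j^T q_j / (q_j^T V q_j)
    with T_j = C - V U_j (m = 1, A = I); returns the new weights and Z = Q B."""
    n, h, p = len(Q), len(Q[0]), len(C[0])
    Bw = [list(row) for row in Bw]  # h x p regression weights; row j is b_j^T
    for j in range(h):
        qj = [Q[i][j] for i in range(n)]
        # U_j = sum over k != j of q_k b_k^T (n x p)
        U = [[sum(Q[i][k] * Bw[k][a] for k in range(h) if k != j) for a in range(p)]
             for i in range(n)]
        Vq = [sum(V[i][l] * qj[l] for l in range(n)) for i in range(n)]
        denom = sum(qj[i] * Vq[i] for i in range(n))
        # T_j^T q_j = C^T q_j - U_j^T (V q_j), using the symmetry of V
        Bw[j] = [sum(qj[i] * C[i][a] - Vq[i] * U[i][a] for i in range(n)) / denom
                 for a in range(p)]
    Z = [[sum(Q[i][k] * Bw[k][a] for k in range(h)) for a in range(p)] for i in range(n)]
    return Bw, Z
```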
Spline
1988). In this case, the spline transformation gives a smooth nondecreasing piecewise polynomial transformation. It is computed as a weighted regression of
and computed using nonnegative alternating least squares (Groenen, van Os and Meulman, 2000).
After transformation, the transformed proximities are normalized such that the sum of squares of the weighted transformed proximities equals mn(n − 1)/2 in the unconditional case and n(n − 1)/2 in the matrix-conditional case.
Step 4: Termination
After evaluation of the loss function, the old and new function values are used to decide whether iterations should continue. If the new function value is smaller than or equal to the minimum Stress value MINSTRESS, provided by the user, iterations are terminated. Also, if the difference in consecutive Stress values is smaller than or equal to the convergence criterion DIFFSTRESS, provided by the user, iterations are terminated. Finally, iterations are terminated if the current number of iterations exceeds the maximum number of iterations MAXITER, also provided by the user. In all other cases, iterations continue.
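The three stopping rules translate directly into a predicate. The keyword names mirror MINSTRESS, DIFFSTRESS and MAXITER, but the function itself is a hypothetical sketch; it assumes the Stress values are nonincreasing, as the majorization algorithm guarantees, so the difference is taken as old minus new.

```python
def should_stop(old_stress, new_stress, iteration, *, minstress, diffstress, maxiter):
    """Hypothetical helper: True when any of the three termination rules applies."""
    if new_stress <= minstress:
        return True  # fit is already good enough
    if old_stress - new_stress <= diffstress:
        return True  # improvement too small
    if iteration > maxiter:
        return True  # iteration limit exceeded
    return False
```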
Remaining Issues
For the identity model without further restrictions, the common space can be
relaxed update.
For a restart in p − 1 dimensions, the p − 1 most important dimensions need to be identified. For the identity model, the first p − 1 principal axes are used. For the
weighted Euclidean model, the p − 1 most important space weights are used, and for the generalized Euclidean and reduced rank models, the p − 1 largest singular values of the space weights determine the remaining dimensions.
The following statistics are used for the computation of the Stress measures:
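$$\eta^2(\hat{D}) = \sum_{k=1}^{m} \sum_{i<j}^{n} w_{ijk} \hat{d}_{ijk}^2$$
$$\eta^4(\hat{D}) = \sum_{k=1}^{m} \sum_{i<j}^{n} w_{ijk} \hat{d}_{ijk}^4$$
$$\eta^2(X) = \sum_{k=1}^{m} \sum_{i<j}^{n} w_{ijk} d_{ij}^2(X_k)$$
$$\eta^4(X) = \sum_{k=1}^{m} \sum_{i<j}^{n} w_{ijk} d_{ij}^4(X_k)$$
$$\rho(X) = \sum_{k=1}^{m} \sum_{i<j}^{n} w_{ijk} \hat{d}_{ijk} d_{ij}(X_k)$$
$$\rho^2(X) = \sum_{k=1}^{m} \sum_{i<j}^{n} w_{ijk} \hat{d}_{ijk}^2 d_{ij}^2(X_k)$$
$$\kappa^2(X) = \sum_{k=1}^{m} \sum_{i<j}^{n} w_{ijk} \left( d_{ij}(X_k) - \bar{d}(X) \right)^2$$
where $\bar{d}(X)$ is the average distance.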
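These statistics are plain weighted sums over sources and pairs, and can be sketched in a single pass. The average distance is taken here as the unweighted mean of all pair distances over all sources, which is an assumption; the function name and dictionary keys are hypothetical.

```python
import math

def proxscal_statistics(W, Dhat, Xs):
    """Hypothetical helper: the sums eta^2, eta^4, rho, rho^2 and kappa^2 over
    sources k and pairs i < j, for weights W, transformed proximities Dhat and
    configurations Xs (one per source)."""
    s = {"eta2_hat": 0.0, "eta4_hat": 0.0, "eta2_x": 0.0,
         "eta4_x": 0.0, "rho": 0.0, "rho2": 0.0}
    pairs = []  # (weight, distance) for every pair i < j of every source
    for Wk, Dk, Xk in zip(W, Dhat, Xs):
        n = len(Xk)
        for i in range(n):
            for j in range(i + 1, n):
                w, dh, d = Wk[i][j], Dk[i][j], math.dist(Xk[i], Xk[j])
                s["eta2_hat"] += w * dh ** 2
                s["eta4_hat"] += w * dh ** 4
                s["eta2_x"] += w * d ** 2
                s["eta4_x"] += w * d ** 4
                s["rho"] += w * dh * d
                s["rho2"] += w * dh ** 2 * d ** 2
                pairs.append((w, d))
    # Assumption: the average distance is the unweighted mean over all pairs and sources.
    dbar = sum(d for _, d in pairs) / len(pairs)
    s["kappa2"] = sum(w * (d - dbar) ** 2 for w, d in pairs)
    return s
```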
The loss function minimized by PROXSCAL, normalized raw Stress, is given by:
$$\sigma^2(\alpha) = \frac{\eta^2(\hat{D}) - 2\alpha\rho(X) + \alpha^2\eta^2(X)}{\eta^2(\hat{D})}, \quad \text{with } \alpha = \frac{\rho(X)}{\eta^2(X)}.$$
Note that at a local minimum of X , α is equal to one. The other Fit and Stress measures provided by PROXSCAL are given by:
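Stress-I:
$$\sigma_1(\alpha) = \left( \frac{\eta^2(\hat{D}) - 2\alpha\rho(X) + \alpha^2\eta^2(X)}{\alpha^2\eta^2(X)} \right)^{1/2}, \quad \text{with } \alpha = \frac{\eta^2(\hat{D})}{\rho(X)}.$$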
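Given the statistics $\eta^2(\hat{D})$, $\eta^2(X)$ and $\rho(X)$, normalized raw Stress and Stress-I follow from a few lines of arithmetic. Substituting the stated choices of α shows that both squared quantities reduce to $1 - \rho^2(X)/(\eta^2(\hat{D})\eta^2(X))$, so Stress-I is the square root of normalized raw Stress; the sketch makes that easy to check numerically. The function names are hypothetical.

```python
import math

def normalized_raw_stress(eta2_hat, eta2_x, rho):
    """Hypothetical helper: sigma^2(alpha) with alpha = rho(X) / eta^2(X)."""
    alpha = rho / eta2_x
    return (eta2_hat - 2 * alpha * rho + alpha ** 2 * eta2_x) / eta2_hat

def stress_1(eta2_hat, eta2_x, rho):
    """Hypothetical helper: Stress-I with alpha = eta^2(D-hat) / rho(X)."""
    alpha = eta2_hat / rho
    ratio = (eta2_hat - 2 * alpha * rho + alpha ** 2 * eta2_x) / (alpha ** 2 * eta2_x)
    return math.sqrt(max(ratio, 0.0))  # guard against tiny negative rounding errors
```

For example, with $\eta^2(\hat{D}) = 50$, $\eta^2(X) = 40$ and $\rho(X) = 42$, normalized raw Stress is $1 - 42^2/(50 \cdot 40) = 0.118$.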
References
Barlow, R. E., Bartholomew, D. J., Bremner, J. M. and Brunk, H. D. (1972). Statistical inference under order restrictions. New York: Wiley.
Bentler, P. M. and Weeks, D. G. (1978). Restricted multidimensional scaling models. Journal of Mathematical Psychology, 17, 138-151.
Bloxom, B. (1978). Constrained multidimensional scaling in n spaces. Psychometrika, 43, 397-408.
Carroll, J. D. and Chang, J. J. (1972, March). IDIOSCAL (Individual Differences In Orientation SCALing): A generalization of INDSCAL allowing idiosyncratic reference systems as well as analytic approximation to INDSCAL. Paper presented at the spring meeting of the Psychometric Society, Princeton, NJ.
Commandeur, J. J. F. and Heiser, W. J. (1993). Mathematical derivations in the proximity scaling (PROXSCAL) of symmetric data matrices (Tech. Rep. No. RR-93-03). Leiden, The Netherlands: Department of Data Theory, Leiden University.
De Leeuw, J. (1977). Applications of convex analysis to multidimensional scaling. In J. R. Barra, F. Brodeau, G. Romier, and B. van Cutsem (Eds.), Recent developments in statistics (pp. 133-145). Amsterdam, The Netherlands: North-Holland.
De Leeuw, J. and Heiser, W. J. (1980). Multidimensional scaling with restrictions on the configuration. In P. R. Krishnaiah (Ed.), Multivariate analysis (Vol. V, pp. 501-522). Amsterdam, The Netherlands: North-Holland.
Gower, J. C. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53, 325-338.
Groenen, P. J. F., Heiser, W. J. and Meulman, J. J. (1999). Global optimization in least squares multidimensional scaling by distance smoothing. Journal of Classification, 16, 225-254.
Groenen, P. J. F., van Os, B. and Meulman, J. J. (2000). Optimal scaling by alternating length-constrained nonnegative least squares, with application to distance-based analysis. Psychometrika, 65, 511-524.
Heiser, W. J. (1985). A general MDS initialization procedure using the SMACOF algorithm-model with constraints (Tech. Rep. No. RR-85-23). Leiden, The Netherlands: Department of Data Theory, Leiden University.
Heiser, W. J. (1987). Joint ordination of species and sites: The unfolding technique. In P. Legendre and L. Legendre (Eds.), Developments in numerical ecology (pp. 189-221). Berlin, Heidelberg: Springer-Verlag.
Heiser, W. J. and De Leeuw, J. (1986). SMACOF-I (Tech. Rep. No. UG-86-02). Leiden, The Netherlands: Department of Data Theory, Leiden University.
Heiser, W. J. and Stoop, I. (1986). Explicit SMACOF algorithms for individual differences scaling (Tech. Rep. No. RR-86-14). Leiden, The Netherlands: Department of Data Theory, Leiden University.
Kruskal, J. B. (1964). Nonmetric multidimensional scaling: A numerical method. Psychometrika, 29, 28-42.
Meulman, J. J. and Heiser, W. J. (1984). Constrained multidimensional scaling: More directions than dimensions. In T. Havranek et al. (Eds.), COMPSTAT 1984 (pp. 137-142). Wien: Physica Verlag.
Ramsay, J. O. (1988). Monotone regression splines in action. Statistical Science, 3(4), 425-461.
Stoop, I., Heiser, W. J. and De Leeuw, J. (1981). How to use SMACOF-IA. Leiden, The Netherlands: Department of Data Theory, Leiden University.
Stoop, I. and De Leeuw, J. (1982). How to use SMACOF-IB. Leiden, The Netherlands: Department of Data Theory, Leiden University.
Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17, 401-419.
Torgerson, W. S. (1958). Theory and methods of scaling. New York: Wiley.
Wolkowicz, H. and Styan, G. P. H. (1980). Bounds for eigenvalues using traces. Linear Algebra and its Applications, 29, 471-506.