Proxscal - Mathematics and Statistics - Study Notes, Study notes of Mathematical Statistics

This document covers the following main points: PROXSCAL, Introduction, Preliminaries, Initial Configuration, Configuration Update, Transformation Update, Termination, Remaining Issues.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar


PROXSCAL

PROXSCAL performs multidimensional scaling of proximity data to find a least-squares representation of the objects in a low-dimensional space. Individual differences models can be specified for multiple sources. A majorization algorithm guarantees monotone convergence for optionally transformed, metric and nonmetric data under a variety of models and constraints. Detailed mathematical derivations concerning the algorithm can be found in Commandeur and Heiser (1993).

Notation

The following notation is used throughout this chapter, unless stated otherwise. The dimensions of the vectors and matrices are denoted by:

$n$  Number of objects
$m$  Number of sources
$p$  Number of dimensions
$s$  Number of independent variables
$h$  $\max(s, p)$
$l$  Length of transformation vector
$r$  Degree of spline
$t$  Number of interior knots for spline

The input and input-related variables are:

$\Delta_k$  $n \times n$ matrix with raw proximities for source $k$
$W_k$  $n \times n$ matrix with weights for source $k$
$E$  $n \times s$ matrix with raw independent variables
$F$  $n \times p$ matrix with fixed coordinates

Output and output-related variables are:

$\hat{D}_k$  $n \times n$ matrix with transformed proximities for source $k$
$Z$  $n \times p$ matrix with common space coordinates
$A_k$  $p \times p$ matrix with space weights for source $k$
$X_k$  $n \times p$ matrix with individual space coordinates for source $k$
$Q$  $n \times h$ matrix with transformed independent variables
$B$  $h \times p$ matrix with regression weights for independent variables
$S$  $l \times (r + t)$ matrix of coefficients for the spline basis

Special matrices and functions are:

$J$  $I - \mathbf{1}\mathbf{1}^{T} / \mathbf{1}^{T}\mathbf{1}$, centering matrix of appropriate size

$D(X_k)$  $n \times n$ matrix with distances, with elements $\{d_{ijk}\}$, where

$$d_{ijk} = \sqrt{(x_{ik} - x_{jk})^{T}(x_{ik} - x_{jk})}$$

$V_k$  $n \times n$ matrix with elements $\{v_{ijk}\}$, where

$$v_{ijk} = \begin{cases} -w_{ijk} & \text{for } i \neq j \\ \sum_{l \neq i}^{n} w_{ilk} & \text{for } i = j \end{cases}$$

$B(X_k)$  $n \times n$ matrix with elements $\{b_{ijk}\}$, where

$$b_{ijk} = \begin{cases} -w_{ijk}\,\delta_{ijk} / d_{ij}(X_k) & \text{if } d_{ij}(X_k) > 0 \text{ and } i \neq j \\ 0 & \text{if } d_{ij}(X_k) = 0 \text{ and } i \neq j \\ -\sum_{l \neq i}^{n} b_{ilk} & \text{if } i = j \end{cases}$$
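The special matrices defined above can be sketched for a single source as follows. This is a minimal illustration, not the SPSS implementation: the function names are hypothetical, matrices are plain lists of rows, and the off-diagonal elements of the B matrix are taken with a negative sign, as is usual in majorization (SMACOF) algorithms.

```python
import math

def distances(X):
    """Euclidean distance matrix D(X) for an n x p configuration (list of rows)."""
    n = len(X)
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
             for j in range(n)] for i in range(n)]

def v_matrix(W):
    """V: off-diagonal elements -w_ij, diagonal elements sum of w_il over l != i."""
    n = len(W)
    V = [[-W[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        V[i][i] = sum(W[i][l] for l in range(n) if l != i)
    return V

def b_matrix(W, delta, D):
    """B(X): off-diagonal -w_ij * delta_ij / d_ij (0 when d_ij = 0); rows sum to zero."""
    n = len(W)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and D[i][j] > 0:
                B[i][j] = -W[i][j] * delta[i][j] / D[i][j]
        # The diagonal is still 0.0 here, so the row sum covers only l != i.
        B[i][i] = -sum(B[i])
    return B

# Two objects at distance 5 with a proximity of 10 and unit weight.
X = [[0.0, 0.0], [3.0, 4.0]]
W = [[0.0, 1.0], [1.0, 0.0]]
delta = [[0.0, 10.0], [10.0, 0.0]]
D = distances(X)
print(v_matrix(W), b_matrix(W, delta, D))
```

Both V and B have rows that sum to zero, which is why a centered configuration stays centered under the updates described later.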

Introduction

The following loss function is minimized by PROXSCAL,

$$\sigma^{2} \equiv \frac{1}{m}\sum_{k=1}^{m}\sum_{i<j}^{n} w_{ijk}\left[\hat{d}_{ijk} - d_{ij}(X_k)\right]^{2}, \qquad (1.1)$$

which is the weighted mean squared error between the transformed proximities and the distances of $n$ objects within $m$ sources. The transformation function for the

Normalization

The proximities are normalized such that the sum of the weighted squared proximities equals the sum of the weights, again taking into account the conditionality.
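As a rough illustration of this normalization, the sketch below rescales the proximities of a single source so that the weighted sum of squares equals the sum of the weights. It assumes the unconditional, single-source case and sums over pairs with i < j; the function name is hypothetical.

```python
import math

def normalize_proximities(delta, W):
    """Rescale delta so that sum_{i<j} w_ij * delta_ij^2 equals sum_{i<j} w_ij."""
    n = len(delta)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    weight_sum = sum(W[i][j] for i, j in pairs)
    weighted_ss = sum(W[i][j] * delta[i][j] ** 2 for i, j in pairs)
    scale = math.sqrt(weight_sum / weighted_ss)
    return [[scale * value for value in row] for row in delta]

delta = [[0.0, 2.0, 4.0], [2.0, 0.0, 6.0], [4.0, 6.0, 0.0]]
W = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
normalized = normalize_proximities(delta, W)
```

Because only a common scale factor is applied, the ratios between proximities are preserved.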

Step 1: Initial Configuration

PROXSCAL allows for several initial configurations. Before determining the initial configuration, missing values are handled, and the raw proximities are initialized. Finally, after one of the starts described below, the common space $Z$ is centered on the origin and optimally dilated in accordance with the normalized proximities.

Simplex Start

The simplex start consists of a rank-$p$ approximation of the matrix $V^{-}B(J)$. Set $H$, an $n \times p$ columnwise orthogonal matrix satisfying $H^{T}H = I_p$, equal to the matrix with the first $p$ columns of the identity matrix. The nonzero rows are selected in such a way that the first $Z = B(J)H$ contains the $p$ columns of $B(J)$ with the largest diagonal elements. The following steps are computed in turn, until convergence is reached:

  1. For a fixed $Z$, $H = PQ^{T}$, where $PQ^{T}$ is taken from the singular value decomposition $B(J)Z = PLQ^{T}$;
  2. For a fixed $H$, $Z = 2^{-1/2}\,V^{-}B(J)H$, where $V^{-}$ is the pseudo-inverse of $V$.

For a restricted common space $Z$, the second step is adjusted in order to fulfill the restrictions. This procedure was introduced in Heiser (1985).

Torgerson Start

The proximities are aggregated over sources, squared, double centered and multiplied by $-0.5$, after which an eigenvalue decomposition is used to determine the coordinate values, thus $-0.5\,JD^{*}J = Q\Lambda Q^{T}$,

where elements of $D^{*}$ are defined as

$$d^{*}_{ij} = \left(\frac{\sum_{k=1}^{m} w_{ijk}\,\hat{d}_{ijk}}{\sum_{k=1}^{m} w_{ijk}}\right)^{2},$$

followed by $Z = Q\Lambda^{1/2}$, where only the first $p$ positive ordered eigenvalues ($\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n$) and eigenvectors are used. This technique, classical scaling, is due to Torgerson (1952, 1958) and Gower (1966) and is also known under the names Torgerson scaling or Torgerson-Gower scaling.
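The double-centering step of classical scaling can be checked on a small case. The sketch below covers only that step and assumes the input is already a symmetric matrix of squared, aggregated proximities; the eigenvalue decomposition is left out, and the function name is hypothetical. For exact Euclidean distances, the result equals the matrix of inner products of the centered coordinates.

```python
def double_center(squared):
    """Return -0.5 * J S J for a symmetric matrix S of squared proximities."""
    n = len(squared)
    row_mean = [sum(row) / n for row in squared]
    grand_mean = sum(row_mean) / n
    # Element (i, j) of J S J is s_ij - rowmean_i - rowmean_j + grandmean.
    return [[-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

# Three points on a line at positions 0, 1 and 3 (squared distances below).
squared = [[0.0, 1.0, 9.0], [1.0, 0.0, 4.0], [9.0, 4.0, 0.0]]
gram = double_center(squared)
```

Here the centered positions are -4/3, -1/3 and 5/3, and `gram[i][j]` reproduces their pairwise products, so a rank-one eigenvalue decomposition would recover the line exactly.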

(Multiple) Random Start

The coordinate values are randomly generated from a uniform distribution using the default random number generator from the SPSS system.

User-Provided Start

The coordinate values provided by the user are used.

Step 2: Configuration Update

Update for the Common Space

The common space $Z$ is related to the individual spaces $X_k$ ($k = 1, \ldots, m$) through the model $X_k = ZA_k$, where $A_k$ are matrices containing space weights.

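Assume that weight matrix $A_k$ is of full rank. Only considering $Z$ defines (1.1) as

$$\sigma^{2}(z) = c + z^{T}Hz - 2z^{T}t, \qquad (3.1)$$

where

$$z \equiv \operatorname{vec}(Z), \qquad H \equiv \frac{1}{m}\sum_{k=1}^{m}\left(A_kA_k^{T} \otimes V_k\right), \qquad t \equiv \frac{1}{m}\operatorname{vec}\left(\sum_{k=1}^{m} B(X_k)X_kA_k^{T}\right),$$

for which a solution is found as

$$z = H^{-}t. \qquad (3.3)$$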
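To make the update concrete, consider the simplest special case, which is an assumption of this sketch rather than something spelled out in the text: one source, the identity model ($A_k = I$) and unit weights. Then $V = nJ$, its pseudo-inverse is $J/n$, and the solution (3.3) reduces to $Z^{+} = \frac{1}{n}B(Z)Z$, the Guttman transform known from the SMACOF literature. The function names below are hypothetical.

```python
import math

def dist(X):
    """Euclidean distance matrix of an n x p configuration."""
    n = len(X)
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
             for j in range(n)] for i in range(n)]

def raw_stress(dhat, X):
    """Unweighted raw stress: sum over i < j of (dhat_ij - d_ij(X))^2."""
    D = dist(X)
    n = len(X)
    return sum((dhat[i][j] - D[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def guttman_update(dhat, Z):
    """One majorization step Z+ = B(Z) Z / n (unit weights, identity model)."""
    n, p = len(Z), len(Z[0])
    D = dist(Z)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and D[i][j] > 0:
                B[i][j] = -dhat[i][j] / D[i][j]
        B[i][i] = -sum(B[i])  # diagonal is still zero at this point
    return [[sum(B[i][l] * Z[l][a] for l in range(n)) / n for a in range(p)]
            for i in range(n)]

# Target: an equilateral triangle with side 1; start from a distorted triangle.
dhat = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
Z0 = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
Z1 = guttman_update(dhat, Z0)
print(raw_stress(dhat, Z0), raw_stress(dhat, Z1))
```

The second printed value is never larger than the first, which is the monotone convergence property that majorization provides.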

Several special cases exist for which (3.3) can be simplified. First, the weights matrices $W_k$ may all be equal, or even all equal to one. In these cases $H$ will simplify, as will the pseudo-inverse of $H$. Another simplification is concerned with the different models, reflected in restrictions for the space weights. Equation

Suppose $P_kL_kQ_k^{T}$ is the singular value decomposition of $A_k$, for which the diagonal matrix with singular values $L_k$ is in nonincreasing order. Then, for the reduced rank model, the best rank-$r$ ($r < p$) approximation of $A_k$ is given by $R_kT_k^{T}$, where $R_k$ contains the first $r$ columns of $P_kL_k$, and $T_k$ contains the first $r$ columns of $Q_k$.

For the weighted Euclidean model, (3.4) reduces to a diagonal matrix

$$A_k = \operatorname{diag}\left(Z^{T}V_kZ\right)^{-1}\operatorname{diag}\left(Z^{T}B(X_k)X_k\right).$$

The space weights for the identity model need no update, since A k = I for all k.

Simplifications can be obtained if all weights $W$ are equal to one and for the reduced rank model, which can be done in $r$ dimensions, as explained in Heiser and Stoop (1986).

Restrictions

Fixed coordinates

If some of the coordinates of $Z$ are fixed by the user, then only the free coordinates of $Z$ need to be updated. The dimensionwise approach is taken one step further, which results in an update for object $i$ on dimension $a$ as

$$z_{ia} = \frac{1}{e_i^{T}V_ae_i}\,e_i^{T}\left[\frac{1}{m}\sum_{k=1}^{m}B(X_k)X_kA_k^{T}e_a - \frac{1}{m}\sum_{j \neq a}^{p}\sum_{k=1}^{m}e_j^{T}A_kA_k^{T}e_a\,V_kz_j\right] - \frac{1}{e_i^{T}V_ae_i}\,e_i^{T}V_a\tilde{z}_{ia},$$

where the $a$th column of $Z$ is divided into $z_a = \tilde{z}_{ia} + z_{ia}e_i$, with $e_i$ the $i$th column of the identity matrix, and

$$V_a = \frac{1}{m}\sum_{k=1}^{m} e_a^{T}A_kA_k^{T}e_a\,V_k.$$

This update procedure will only locally minimize (3.1); repeatedly cycling through all free coordinates until convergence is reached provides global optimization. After all free coordinates have been updated, $Z$ is centered on the origin. On output, the configuration is adapted so as to coincide with the initial fixed coordinates.

Independent variables

Independent variables $Q$ are used to express the coordinates of the common space $Z$ as a weighted sum of these independent variables as

$$Z = QB = \sum_{j=1}^{h} q_jb_j^{T}.$$

An update for $Z$ is found by performing the following calculations for $j = 1, \ldots, h$:

  1. $U_j = \sum_{k \neq j}^{h} q_kb_k^{T}$
  2. $T_j = C - \frac{1}{m}\sum_{k=1}^{m} V_kU_jA_kA_k^{T}$, where $C = \frac{1}{m}\sum_{k=1}^{m} B(X_k)X_kA_k^{T}$
  3. update $b_j$ as $b_j = \left[\frac{1}{m}\sum_{k=1}^{m} q_j^{T}V_kq_j\,A_kA_k^{T}\right]^{-1}T_j^{T}q_j$
  4. optionally, compute optimally transformed variables by regressing $\tilde{q}_j = \frac{1}{\kappa_1}T_jb_j + \left(I - \frac{1}{\kappa_1}V_j\right)q_j$, where $V_j = \frac{1}{m}\sum_{k=1}^{m} b_j^{T}A_kA_k^{T}b_j\,V_k$ and $\kappa_1$ is greater than or equal to the largest eigenvalue of $V_j$, on the original variable $q_j$. Missing elements in the original variable are replaced with the corresponding values from $\tilde{q}_j$.

Finally, set $Z = QB = \sum_{j=1}^{h} q_jb_j^{T}$.
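The decomposition of $Z = QB$ into a sum of outer products, and the partial sum $U_j$ that leaves out variable $j$, can be illustrated with a few lines of code. This is only a sketch of the bookkeeping, not of the full update; the function names are hypothetical, and $q_j$ is taken as column $j$ of $Q$ and $b_j$ as row $j$ of $B$.

```python
def outer(q, b):
    """Outer product q b^T of a length-n vector q and a length-p vector b."""
    return [[qi * bj for bj in b] for qi in q]

def add(A, B):
    """Elementwise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def partial_sum(Q, B, skip=None):
    """Sum of q_k b_k^T over all k except `skip` (skip=None gives Z = QB)."""
    n, h, p = len(Q), len(B), len(B[0])
    total = [[0.0] * p for _ in range(n)]
    for k in range(h):
        if k == skip:
            continue
        column_k = [Q[i][k] for i in range(n)]
        total = add(total, outer(column_k, B[k]))
    return total

Q = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # n = 3 objects, h = 2 variables
B = [[1.0, 0.0], [0.0, 2.0]]              # h = 2 variables, p = 2 dimensions
Z = partial_sum(Q, B)                     # equals the matrix product QB
U0 = partial_sum(Q, B, skip=0)            # contribution of all variables but the first
```

Since $Z = U_j + q_jb_j^{T}$, updating one $b_j$ at a time only changes a rank-one part of the common space.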

Independent variables restrictions were introduced for the MDS model in Bentler and Weeks (1978), Bloxom (1978), de Leeuw and Heiser (1980) and Meulman and Heiser (1984). If there are more dimensions ($p$) than independent variables ($s$), $p - s$ dummy variables are created and treated completely free in the analysis. The transformations for the independent variables from Step 4 are identical to the transformations of the proximities, except that the nonnegativity constraint does not apply. After transformation, the variables $q$ are centered on the origin, normalized on $n$, and the reverse normalization is applied to the regression weights $b$.

Spline

$\operatorname{vec}(\hat{D}) = Sb$. PROXSCAL uses monotone spline transformations (Ramsay, 1988). In this case, the spline transformation gives a smooth nondecreasing piecewise polynomial transformation. It is computed as a weighted regression of $D$ on the spline basis $S$. Regression weights $b$ are restricted to be nonnegative and computed using nonnegative alternating least squares (Groenen, van Os and Meulman, 2000).

Normalization

After transformation, the transformed proximities are normalized such that the sum of squares of the weighted transformed proximities is equal to $mn(n-1)/2$ in the unconditional case and to $n(n-1)/2$ in the matrix-conditional case.

Step 4: Termination

After evaluation of the loss function, the old and new function values are used to decide whether iterations should continue. If the new function value is smaller than or equal to the minimum Stress value MINSTRESS, provided by the user, iterations are terminated. Also, if the difference in consecutive Stress values is smaller than or equal to the convergence criterion DIFFSTRESS, provided by the user, iterations are terminated. Finally, iterations are terminated if the current number of iterations exceeds the maximum number of iterations MAXITER, also provided by the user. In all other cases, iterations continue.
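The three stopping rules can be written down directly. The sketch below is one possible reading of the paragraph above; the function name and the lower-case parameter names are hypothetical stand-ins for the user-supplied MINSTRESS, DIFFSTRESS and MAXITER values.

```python
def should_terminate(old_stress, new_stress, iteration,
                     minstress, diffstress, maxiter):
    """Return True when any of the three termination criteria is met."""
    if new_stress <= minstress:
        return True   # Stress is already small enough
    if old_stress - new_stress <= diffstress:
        return True   # improvement between consecutive iterations is negligible
    if iteration > maxiter:
        return True   # iteration budget exhausted
    return False

# Example: a large improvement early in the run keeps the iterations going.
print(should_terminate(0.30, 0.20, 5, 1e-4, 1e-4, 100))
```

Checking the absolute Stress level first means that a near-perfect fit stops the run even when the last step still produced a noticeable improvement.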

Remaining Issues

Acceleration

For the identity model without further restrictions, the common space can be updated with acceleration as $Z^{\text{new}} = 2Z^{\text{update}} - Z^{\text{old}}$, also referred to as the relaxed update.
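The relaxed update simply doubles the step from the old configuration to the updated one. A minimal sketch, with a hypothetical function name and configurations stored as lists of rows:

```python
def relaxed_update(Z_update, Z_old):
    """Accelerated configuration: Z_new = 2 * Z_update - Z_old, elementwise."""
    return [[2.0 * u - o for u, o in zip(row_u, row_o)]
            for row_u, row_o in zip(Z_update, Z_old)]

Z_old = [[0.0, 0.0], [1.0, 1.0]]
Z_update = [[0.5, 0.0], [1.0, 2.0]]
Z_new = relaxed_update(Z_update, Z_old)
```

Writing the formula as $Z^{\text{old}} + 2(Z^{\text{update}} - Z^{\text{old}})$ shows that the ordinary update direction is kept and only the step length changes.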

Lowering dimensionality

For a restart in $p - 1$ dimensions, the $p - 1$ most important dimensions need to be identified. For the identity model, the first $p - 1$ principal axes are used. For the weighted Euclidean model, the $p - 1$ most important space weights are used, and for the generalized Euclidean and reduced rank models, the $p - 1$ largest singular values of the space weights determine the remaining dimensions.

Stress measures

The following statistics are used for the computation of the Stress measures:

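$$\begin{aligned}
\eta^{2}(\hat{D}) &= \sum_{k=1}^{m}\sum_{i<j}^{n} w_{ijk}\,\hat{d}_{ijk}^{2}, \\
\eta^{4}(\hat{D}) &= \sum_{k=1}^{m}\sum_{i<j}^{n} w_{ijk}\,\hat{d}_{ijk}^{4}, \\
\eta^{2}(X) &= \sum_{k=1}^{m}\sum_{i<j}^{n} w_{ijk}\,d_{ij}^{2}(X_k), \\
\eta^{4}(X) &= \sum_{k=1}^{m}\sum_{i<j}^{n} w_{ijk}\,d_{ij}^{4}(X_k), \\
\rho(X) &= \sum_{k=1}^{m}\sum_{i<j}^{n} w_{ijk}\,\hat{d}_{ijk}\,d_{ij}(X_k), \\
\rho^{2}(X) &= \sum_{k=1}^{m}\sum_{i<j}^{n} w_{ijk}\,\hat{d}_{ijk}^{2}\,d_{ij}^{2}(X_k), \\
\kappa^{2}(X) &= \sum_{k=1}^{m}\sum_{i<j}^{n} w_{ijk}\left(d_{ij}(X_k) - \bar{d}(X)\right)^{2},
\end{aligned}$$

where $\bar{d}(X)$ is the average distance.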

The loss function minimized by PROXSCAL, normalized raw Stress, is given by:

$$\sigma^{2} = \frac{\eta^{2}(\hat{D}) + \eta^{2}(\alpha X) - 2\rho(\alpha X)}{\eta^{2}(\hat{D})}, \quad \text{with} \quad \alpha = \frac{\rho(X)}{\eta^{2}(X)}.$$

Note that at a local minimum of X , α is equal to one. The other Fit and Stress measures provided by PROXSCAL are given by:

Stress-I:

$$\sqrt{\frac{\eta^{2}(\hat{D}) + \eta^{2}(\alpha X) - 2\rho(\alpha X)}{\eta^{2}(\alpha X)}}, \quad \text{with} \quad \alpha = \frac{\eta^{2}(\hat{D})}{\rho(X)}.$$
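A small numerical sketch may help relate the two measures. It assumes a single source with the pairs $i < j$ flattened into lists, takes Stress-I as a square root in the sense of Kruskal (1964), and uses the facts that $\eta^{2}(\alpha X) = \alpha^{2}\eta^{2}(X)$ and $\rho(\alpha X) = \alpha\rho(X)$, since distances scale linearly with the configuration. The function name is hypothetical.

```python
import math

def stress_measures(dhat, dist, w):
    """Return (normalized raw Stress, Stress-I) for flattened i < j pairs."""
    eta2_dhat = sum(wi * a * a for wi, a in zip(w, dhat))
    eta2_x = sum(wi * b * b for wi, b in zip(w, dist))
    rho = sum(wi * a * b for wi, a, b in zip(w, dhat, dist))

    # Normalized raw Stress with its own optimal scaling factor.
    alpha = rho / eta2_x
    raw = (eta2_dhat + alpha ** 2 * eta2_x - 2.0 * alpha * rho) / eta2_dhat

    # Stress-I with its own optimal scaling factor.
    alpha_1 = eta2_dhat / rho
    stress_1 = math.sqrt(
        (eta2_dhat + alpha_1 ** 2 * eta2_x - 2.0 * alpha_1 * rho)
        / (alpha_1 ** 2 * eta2_x)
    )
    return raw, stress_1

raw, stress_1 = stress_measures([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], [1.0, 1.0, 1.0])
print(raw, stress_1)
```

With these scaling factors both expressions reduce to $1 - \rho^{2}(X) / \left(\eta^{2}(\hat{D})\,\eta^{2}(X)\right)$, with $\rho^{2}(X)$ read here as the square of $\rho(X)$, so Stress-I is the square root of normalized raw Stress in this setting.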

References

Barlow, R. E., Bartholomew, D. J., Bremner, J. M. and Brunk, H. D. (1972). Statistical inference under order restrictions. New York: Wiley.

Bentler, P. M. and Weeks, D. G. (1978). Restricted multidimensional scaling models. Journal of Mathematical Psychology, 17, 138-151.

Bloxom, B. (1978). Constrained multidimensional scaling in n spaces. Psychometrika, 43, 397-408.

Carroll, J. D. and Chang, J. J. (1972, March). IDIOSCAL (Individual Differences In Orientation SCALing): A generalization of INDSCAL allowing idiosyncratic reference systems as well as analytic approximation to INDSCAL. Paper presented at the spring meeting of the Psychometric Society. Princeton, NJ.

Commandeur, J. J. F. and Heiser, W. J. (1993). Mathematical derivations in the proximity scaling (PROXSCAL) of symmetric data matrices (Tech. Rep. No. RR-93-03). Leiden, The Netherlands: Department of Data Theory, Leiden University.

De Leeuw, J. (1977). Applications of convex analysis to multidimensional scaling. In J. R. Barra, F. Brodeau, G. Romier, and B. van Cutsem (Eds.), Recent developments in statistics (pp. 133-145). Amsterdam, The Netherlands: North-Holland.

De Leeuw, J. and Heiser, W. J. (1980). Multidimensional scaling with restrictions on the configuration. In P. R. Krishnaiah (Ed.), Multivariate analysis (Vol. V, pp. 501-522). Amsterdam, The Netherlands: North-Holland.

Gower, J. C. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika, 53, 325-338.

Groenen, P. J. F., Heiser, W. J. and Meulman, J. J. (1999). Global optimization in least squares multidimensional scaling by distance smoothing. Journal of Classification, 16, 225-254.

Groenen, P. J. F., van Os, B. and Meulman, J. J. (2000). Optimal scaling by alternating length-constrained nonnegative least squares, with application to distance-based analysis. Psychometrika, 65, 511-524.

Heiser, W. J. (1985). A general MDS initialization procedure using the SMACOF algorithm-model with constraints (Tech. Rep. No. RR-85-23). Leiden, The Netherlands: Department of Data Theory, Leiden University.

Heiser, W. J. (1987). Joint ordination of species and sites: The unfolding technique. In P. Legendre and L. Legendre (Eds.), Developments in numerical ecology (pp. 189-221). Berlin, Heidelberg: Springer-Verlag.

Heiser, W. J. and De Leeuw, J. (1986). SMACOF-I (Tech. Rep. No. UG-86-02). Leiden, The Netherlands: Department of Data Theory, Leiden University.

Heiser, W. J. and Stoop, I. (1986). Explicit SMACOF algorithms for individual differences scaling (Tech. Rep. No. RR-86-14). Leiden, The Netherlands: Department of Data Theory, Leiden University.

Kruskal, J. B. (1964). Nonmetric multidimensional scaling: A numerical method. Psychometrika, 29, 28-42.

Meulman, J. J. and Heiser, W. J. (1984). Constrained multidimensional scaling: More directions than dimensions. In T. Havranek et al. (Eds.), COMPSTAT 1984 (pp. 137-142). Wien: Physica Verlag.

Ramsay, J. O. (1988). Monotone regression splines in action. Statistical Science, 3(4), 425-461.

Stoop, I., Heiser, W. J. and De Leeuw, J. (1981). How to use SMACOF-IA. Leiden, The Netherlands: Department of Data Theory, Leiden University.

Stoop, I. and De Leeuw, J. (1982). How to use SMACOF-IB. Leiden, The Netherlands: Department of Data Theory, Leiden University.

Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17, 401-419.

Torgerson, W. S. (1958). Theory and methods of scaling. New York: Wiley.

Wolkowicz, H. and Styan, G. P. H. (1980). Bounds for eigenvalues using traces. Linear Algebra and its Applications, 29, 471-506.