SVD and Correspondence Analysis: Scaling, Variance Estimation, and Singular Values, Study notes of Mathematical Statistics

An in-depth explanation of singular value decomposition (svd) and correspondence analysis. Topics covered include data scaling through centering and rescaling, variance estimation using the delta method, and the ability of correspondence analysis to fit supplementary points and make biplots. The document also discusses various standardization options and the determination of singular values, maximum rank, and inertia.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar

CORRESPONDENCE

The CORRESPONDENCE algorithm consists of three major parts:

  1. A singular value decomposition (SVD)
  2. Centering and rescaling of the data and various rescalings of the results
  3. Variance estimation by the delta method.

Other names for SVD are “Eckart-Young decomposition”, after Eckart and Young (1936), who introduced the technique in psychometrics, and “basic structure” (Horst, 1963). The rescalings and centering, including their rationale, are well explained in Benzécri (1969), Nishisato (1980), Gifi (1981), and Greenacre (1984). Readers interested in the general framework of matrix approximation and reduction of dimensionality with positive definite row and column metrics are referred to Rao (1980).

The delta method can be used to derive asymptotic distributions and is particularly useful for approximating the variance of complex statistics. There are many versions of the delta method, differing in the assumptions made and in the strength of the approximation (Rao, 1973, ch. 6; Bishop et al., 1975, ch. 14; Wolter, 1985, ch. 6).

Other characteristic features of CORRESPONDENCE are the ability to fit supplementary points into the space defined by the active points, the ability to constrain rows and/or columns to have equal scores, and the ability to make biplots using either chi-squared distances, as in standard correspondence analysis, or Euclidean distances.

Notation

The following notation is used throughout this chapter unless otherwise stated:

$t_1$  Total number of rows (row objects)

$s_1$  Number of supplementary rows

$k_1$  Number of rows in analysis ($t_1 - s_1$)

$t_2$  Total number of columns (column objects)

$s_2$  Number of supplementary columns

$k_2$  Number of columns in analysis ($t_2 - s_2$)

$p$  Number of dimensions

Data-Related Quantities

$f_{ij}$  Nonnegative data value for row i and column j: collected in table F

$f_{i+}$  Marginal total of row i, $i = 1, \ldots, k_1$

$f_{+j}$  Marginal total of column j, $j = 1, \ldots, k_2$

$N$  Grand total of F

Scores and Statistics

$r_{is}$  Score of row object i on dimension s

$c_{js}$  Score of column object j on dimension s

$I$  Total inertia

Basic Calculations

One way to phrase the CORRESPONDENCE objective (cf. Heiser, 1981) is to say that we wish to find row scores $\{r_{is}\}$ and column scores $\{c_{js}\}$ so that the function

$$\sigma(\{r_{is}\};\{c_{js}\}) = \sum_i \sum_j f_{ij} \sum_s (r_{is} - c_{js})^2$$

is minimal, under the standardization restriction either that

$$\sum_i f_{i+}\, r_{is}\, r_{it} = \delta^{st}$$

or

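(b) standardization option cmean (remove column means)

$$\tilde f_{ij} = f_{ij}, \qquad \tilde f_{i+} = \frac{N}{k_1}, \qquad \tilde f_{+j} = f_{+j}$$

(c) standardization option rcmean (remove both row and column means)

$$\tilde f_{ij} = f_{ij}, \qquad \tilde f_{i+} = f_{i+}, \qquad \tilde f_{+j} = f_{+j}$$

(d) standardization option rsum (equalize row totals, then remove row means)

$$\tilde f_{ij} = \frac{\tilde f_{i+}}{f_{i+}}\, f_{ij}, \qquad \tilde f_{i+} = \frac{N}{k_1}, \qquad \tilde f_{+j} = \frac{N}{k_2}$$

(e) standardization option csum (equalize column totals, then remove column means)

$$\tilde f_{ij} = \frac{\tilde f_{+j}}{f_{+j}}\, f_{ij}, \qquad \tilde f_{i+} = \frac{N}{k_1}, \qquad \tilde f_{+j} = \frac{N}{k_2}$$

2. Then, if not computed yet in step 1, $\tilde f_{i+}$ and/or $\tilde f_{+j}$ are computed:

$$\tilde f_{i+} = \frac{N}{k_1}, \qquad \tilde f_{+j} = \frac{N}{k_2},$$

and the elements $z_{ij}$ of Z are formed from $\tilde f_{ij}$, $\tilde f_{i+}$, and $\tilde f_{+j}$.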
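The "equalize row totals" part of option rsum can be illustrated with a short sketch. It assumes the reading $\tilde f_{ij} = f_{ij}\,\tilde f_{i+}/f_{i+}$ with $\tilde f_{i+} = N/k_1$, so that every row is rescaled to the same total while the grand total stays N. The function name `equalize_row_totals` and the example table are hypothetical, and the subsequent removal of row means is not shown.

```python
import numpy as np

def equalize_row_totals(F):
    """Rescale each row of F so that every row total equals N / k1.

    Assumed reading of option rsum: f~_ij = f_ij * (N / k1) / f_i+.
    """
    F = np.asarray(F, dtype=float)
    k1 = F.shape[0]                            # number of active rows
    N = F.sum()                                # grand total
    row_totals = F.sum(axis=1, keepdims=True)  # f_i+
    return F * (N / k1) / row_totals

# Hypothetical 4 x 3 table, used only for illustration (N = 88, k1 = 4).
F = np.array([[10.0, 5.0, 3.0],
              [4.0, 12.0, 6.0],
              [2.0, 7.0, 15.0],
              [8.0, 8.0, 8.0]])
F_rsum = equalize_row_totals(F)
```

Option csum would be the same operation applied to columns, for example by calling the function on the transposed table and transposing the result back.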

2. Singular value decomposition

When rows and/or columns are specified as supplementary, these rows and/or columns of Z are first set to zero.

Let the singular value decomposition of Z be denoted by

$$Z = K \Lambda L'$$

with $K'K = I$, $L'L = I$, and $\Lambda$ diagonal. This decomposition is calculated by a routine based on Golub and Reinsch (1971). It involves Householder reduction to bidiagonal form and diagonalization by a QR procedure with shifts. The routine requires an array with more rows than columns, so when $k_1 < k_2$ the original table is transposed and the parameter transfer is permuted accordingly.
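The transposition trick described above can be sketched as follows. This is only an illustration: `numpy.linalg.svd` stands in for the Golub and Reinsch routine (it is LAPACK-based and accepts any shape, so the transposition is not actually required here), and the function name `svd_tall` and the example matrix are hypothetical.

```python
import numpy as np

def svd_tall(Z):
    """Return K, lam, L with Z = K @ diag(lam) @ L.T.

    The decomposition is always run on an array with at least as many
    rows as columns; the roles of K and L are swapped back afterwards.
    """
    Z = np.asarray(Z, dtype=float)
    transpose = Z.shape[0] < Z.shape[1]   # k1 < k2: work on the transposed table
    A = Z.T if transpose else Z
    U, lam, Vt = np.linalg.svd(A, full_matrices=False)
    if transpose:
        # A = Z.T = U diag(lam) Vt  implies  Z = Vt.T diag(lam) U.T
        return Vt.T, lam, U
    return U, lam, Vt.T

# Hypothetical 3 x 5 matrix (fewer rows than columns).
Z_demo = np.array([[0.2, -0.1, 0.0, 0.3, -0.4],
                   [-0.3, 0.5, 0.1, -0.2, 0.0],
                   [0.1, -0.4, -0.1, -0.1, 0.4]])
K, lam, L = svd_tall(Z_demo)
```

With the thin decomposition used here, K and L have orthonormal columns, which matches the conditions $K'K = I$ and $L'L = I$.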

3. Adjustment to the row and column metric

The arrays of both the left-hand singular vectors and the right-hand singular vectors are adjusted row-wise to form scores that are standardized in the row and in the column marginal proportions, respectively:

$$\tilde r_{is} = k_{is} \Big/ \sqrt{\tilde f_{i+}/N},$$

$$\tilde c_{js} = l_{js} \Big/ \sqrt{\tilde f_{+j}/N}.$$

This way, both sets of scores satisfy the standardization restrictions simultaneously.
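A minimal numerical sketch of this adjustment follows. It assumes the usual chi-squared construction of Z from standardized residuals, $z_{ij} = (p_{ij} - m_i m_j)/\sqrt{m_i m_j}$ with row and column masses $m_i = f_{i+}/N$ and $m_j = f_{+j}/N$; the function name `standard_scores` and the example table are hypothetical.

```python
import numpy as np

def standard_scores(F):
    """Scores standardized in the row and column marginal proportions."""
    F = np.asarray(F, dtype=float)
    N = F.sum()
    row_mass = F.sum(axis=1) / N                  # f_i+ / N
    col_mass = F.sum(axis=0) / N                  # f_+j / N
    expected = np.outer(row_mass, col_mass)
    Z = (F / N - expected) / np.sqrt(expected)    # assumed chi-squared form of Z
    K, lam, Lt = np.linalg.svd(Z, full_matrices=False)
    r_tilde = K / np.sqrt(row_mass)[:, None]      # k_is / sqrt(f_i+ / N)
    c_tilde = Lt.T / np.sqrt(col_mass)[:, None]   # l_js / sqrt(f_+j / N)
    return r_tilde, c_tilde, lam, row_mass, col_mass

# Hypothetical 4 x 3 table, used only for illustration.
F = np.array([[10.0, 5.0, 3.0],
              [4.0, 12.0, 6.0],
              [2.0, 7.0, 15.0],
              [8.0, 8.0, 8.0]])
r_tilde, c_tilde, lam, row_mass, col_mass = standard_scores(F)

# Mass-weighted sums of squares, one value per dimension.
row_check = row_mass @ r_tilde**2
col_check = col_mass @ c_tilde**2
```

Because the singular vectors have unit length, the mass-weighted sum of squared scores equals one on every dimension for rows and columns alike.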

4. Determination of variances and covariances

For the application of the delta method to the results of generalized eigenvalue methods under multinomial sampling, the reader is referred to Gifi (1990, ch. 12) and Israëls (1987, Appendix B). It is shown there that N times the variance-covariance matrix of a function $\phi$ of the observed cell proportions $p = \{p_{ij} = f_{ij}/N\}$ asymptotically reaches the form

$$N \times \operatorname{cov}(\phi(p)) = \sum_i \sum_j \pi_{ij} \left(\frac{\partial\phi}{\partial p_{ij}}\right) \left(\frac{\partial\phi}{\partial p_{ij}}\right)' - \left(\sum_i \sum_j \pi_{ij} \frac{\partial\phi}{\partial p_{ij}}\right) \left(\sum_i \sum_j \pi_{ij} \frac{\partial\phi}{\partial p_{ij}}\right)'$$

Here the quantities $\pi_{ij}$ are the cell probabilities of the multinomial distribution, and $\partial\phi/\partial p_{ij}$ are the partial derivatives of $\phi$ (which is either a generalized eigenvalue or a generalized eigenvector) with respect to the observed cell

row scores:

$$\sum_i \frac{\tilde f_{i+}}{N}\, r_{is}^2 = \lambda_s^{2\alpha}$$

column scores:

$$\sum_j \frac{\tilde f_{+j}}{N}\, c_{js}^2 = \lambda_s^{2\beta}$$

The estimated variances and covariances are adjusted according to the type of normalization chosen.
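The multinomial delta-method covariance can be sketched numerically as below. The sketch assumes the Jacobian of $\phi$ with respect to the flattened cell proportions is already available; the function name `delta_method_cov` and the toy inputs are hypothetical. In practice the cell probabilities $\pi_{ij}$ are unknown and would be replaced by the observed proportions.

```python
import numpy as np

def delta_method_cov(jac, pi, n):
    """Asymptotic covariance of phi(p) under multinomial sampling.

    jac : (m, q) array, row a holds the partial derivatives of phi_a
          with respect to the q cell proportions
    pi  : (q,) array of cell probabilities (summing to one)
    n   : sample size N
    """
    jac = np.atleast_2d(np.asarray(jac, dtype=float))
    pi = np.asarray(pi, dtype=float).ravel()
    second_moment = (jac * pi) @ jac.T       # sum over cells of pi * grad grad'
    mean_grad = jac @ pi                     # sum over cells of pi * grad
    return (second_moment - np.outer(mean_grad, mean_grad)) / n

# Toy check: phi(p) = p, so the Jacobian is the identity matrix and the
# result is the multinomial covariance of the proportions themselves.
pi_demo = np.array([0.2, 0.3, 0.5])
cov_demo = delta_method_cov(np.eye(3), pi_demo, n=100)
```

For the identity function the sketch reduces to $(\operatorname{diag}(\pi) - \pi\pi')/N$, which is a convenient sanity check before plugging in derivatives of singular values or singular vectors.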

Diagnostics

After printing the data, CORRESPONDENCE optionally also prints a table of row profiles and column profiles, which are $\{f_{ij}/f_{i+}\}$ and $\{f_{ij}/f_{+j}\}$, respectively.
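As a small illustration, the two profile tables can be computed as follows; the function name `profiles` and the example table are hypothetical.

```python
import numpy as np

def profiles(F):
    """Return the row profiles f_ij / f_i+ and column profiles f_ij / f_+j."""
    F = np.asarray(F, dtype=float)
    row_profiles = F / F.sum(axis=1, keepdims=True)   # each row sums to one
    col_profiles = F / F.sum(axis=0, keepdims=True)   # each column sums to one
    return row_profiles, col_profiles

# Hypothetical 4 x 3 table, used only for illustration.
F = np.array([[10.0, 5.0, 3.0],
              [4.0, 12.0, 6.0],
              [2.0, 7.0, 15.0],
              [8.0, 8.0, 8.0]])
row_profiles, col_profiles = profiles(F)
```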

Singular Values, Maximum Rank and Inertia

All singular values $\lambda_s$ defined in step 2 are printed up to a maximum of $\min\{(k_1 - 1), (k_2 - 1)\}$. Small singular values and corresponding dimensions are suppressed when they do not exceed the quantity $(k_1 k_2)^{1/2}\, 10^{-7}$; in this case a warning message is issued. Dimensionwise inertia and total inertia are given by the relationships

$$I = \sum_s \lambda_s^2 = \sum_s \sum_i \frac{\tilde f_{i+}}{N}\, r_{is}^2$$

where the right-hand part of this equality is true only if the normalization is row principal (but for the other normalizations similar relationships are easily derived from step 5). The quantities “proportion explained” are equal to inertia divided by total inertia: $\lambda_s^2 / I$.
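The bookkeeping for singular values and inertia can be sketched as follows. The cutoff is taken to be $(k_1 k_2)^{1/2}\,10^{-7}$, which is an assumed reading of the suppression rule, and the total inertia is computed from the retained dimensions only. The function name `inertia_summary` and the example values are hypothetical.

```python
import numpy as np

def inertia_summary(lam, k1, k2):
    """Retained singular values, their inertias, total inertia, proportions."""
    lam = np.asarray(lam, dtype=float)
    max_rank = min(k1 - 1, k2 - 1)          # maximum number of dimensions
    lam = lam[:max_rank]
    threshold = np.sqrt(k1 * k2) * 1e-7     # assumed reading of the cutoff
    kept = lam[lam > threshold]             # suppress negligible dimensions
    inertia = kept**2                       # dimensionwise inertia
    total = inertia.sum()                   # total inertia I
    return kept, inertia, total, inertia / total

# Hypothetical singular values for a 4 x 4 table; the last one is negligible.
kept, inertia, total, proportion = inertia_summary([0.4, 0.3, 1e-12], k1=4, k2=4)
```

In this example two dimensions are retained, with inertias 0.16 and 0.09.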

Supplementary Points

Supplementary row and column points are given by

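$$r_{is}^{\text{sup}} = \lambda_s^{2\alpha - 2} \sum_j \frac{\tilde f_{ij}}{\tilde f_{i+}}\, c_{js}$$

$$c_{js}^{\text{sup}} = \lambda_s^{2\beta - 2} \sum_i \frac{\tilde f_{ij}}{\tilde f_{+j}}\, r_{is}$$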
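A sketch of fitting supplementary rows is given below. It assumes the chi-squared form of Z, normalized scores $r_{is} = \tilde r_{is}\lambda_s^{\alpha}$ and $c_{js} = \tilde c_{js}\lambda_s^{\beta}$ with $\beta = 1 - \alpha$, and the supplementary formula read as $r_{is}^{\text{sup}} = \lambda_s^{2\alpha-2}\sum_j (f_{ij}/f_{i+})\,c_{js}$. The function names and the example table are hypothetical. As a consistency check, the active rows are passed through the supplementary formula, which under these assumptions should reproduce their own scores.

```python
import numpy as np

def ca_scores(F, alpha):
    """Row scores R, column scores C and singular values (assumed chi-squared CA)."""
    F = np.asarray(F, dtype=float)
    P = F / F.sum()
    row_mass, col_mass = P.sum(axis=1), P.sum(axis=0)
    E = np.outer(row_mass, col_mass)
    K, lam, Lt = np.linalg.svd((P - E) / np.sqrt(E), full_matrices=False)
    p = min(F.shape) - 1                     # drop the trivial zero dimension
    K, lam, L = K[:, :p], lam[:p], Lt.T[:, :p]
    R = K / np.sqrt(row_mass)[:, None] * lam**alpha        # assumed normalization
    C = L / np.sqrt(col_mass)[:, None] * lam**(1 - alpha)  # beta = 1 - alpha
    return R, C, lam

def supplementary_rows(F_sup, C, lam, alpha):
    """Scores of supplementary rows from their profiles and the column scores."""
    F_sup = np.atleast_2d(np.asarray(F_sup, dtype=float))
    row_profiles = F_sup / F_sup.sum(axis=1, keepdims=True)
    return (row_profiles @ C) * lam**(2 * alpha - 2)

# Hypothetical 4 x 3 table, used only for illustration.
F = np.array([[10.0, 5.0, 3.0],
              [4.0, 12.0, 6.0],
              [2.0, 7.0, 15.0],
              [8.0, 8.0, 8.0]])
alpha = 0.5
R, C, lam = ca_scores(F, alpha)
R_check = supplementary_rows(F, C, lam, alpha)   # active rows treated as supplementary
```

Supplementary columns would follow the same pattern with column profiles, the row scores, and the exponent $2\beta - 2$.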

Mass, Scores, Inertia and Contributions

The mass, scores, inertia and contributions for the row and column points (including supplementary points) are given in the Overview Row Points Table and the Overview Column Points Table. These tables are printed in p dimensions. The tables are given first for rows, then for columns. The masses are the marginal proportions ($\tilde f_{i+}/N$ and $\tilde f_{+j}/N$, respectively). The inertia of the rows/columns is given by:

$$I_i = \sum_{j=1}^{k_2} z_{ij}^2, \qquad I_j = \sum_{i=1}^{k_1} z_{ij}^2$$

For supplementary points, the contribution to the inertia of dimensions is zero. The contribution of the active points to the inertia of each dimension is given by

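$$\tau_{is} = \frac{\tilde f_{i+}}{N}\, \frac{r_{is}^2}{\lambda_s^{2\alpha}}, \qquad \tau_{js} = \frac{\tilde f_{+j}}{N}\, \frac{c_{js}^2}{\lambda_s^{2\beta}}$$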
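The row-point quantities above can be checked numerically with a short sketch. It assumes the chi-squared form of Z, row scores normalized as $r_{is} = \tilde r_{is}\lambda_s^{\alpha}$, point inertia $I_i = \sum_j z_{ij}^2$, and contributions $\tau_{is} = (f_{i+}/N)\, r_{is}^2 / \lambda_s^{2\alpha}$. The function name `row_point_statistics` and the example table are hypothetical.

```python
import numpy as np

def row_point_statistics(F, alpha):
    """Mass, scores, inertia and contributions of the active row points."""
    F = np.asarray(F, dtype=float)
    P = F / F.sum()
    mass = P.sum(axis=1)                             # f_i+ / N
    E = np.outer(mass, P.sum(axis=0))
    Z = (P - E) / np.sqrt(E)                         # assumed chi-squared form of Z
    K, lam, _ = np.linalg.svd(Z, full_matrices=False)
    p = min(F.shape) - 1                             # drop the trivial zero dimension
    K, lam = K[:, :p], lam[:p]
    R = K / np.sqrt(mass)[:, None] * lam**alpha      # assumed normalization
    point_inertia = (Z**2).sum(axis=1)               # I_i = sum_j z_ij^2
    tau = mass[:, None] * R**2 / lam**(2 * alpha)    # point-to-dimension contribution
    return mass, R, point_inertia, tau, lam

# Hypothetical 4 x 3 table, used only for illustration.
F = np.array([[10.0, 5.0, 3.0],
              [4.0, 12.0, 6.0],
              [2.0, 7.0, 15.0],
              [8.0, 8.0, 8.0]])
mass, R, point_inertia, tau, lam = row_point_statistics(F, alpha=1.0)
```

Under these assumptions the contributions of the active points sum to one on each dimension, and the point inertias add up to the total inertia $\sum_s \lambda_s^2$.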

The contribution of dimensions to the inertia of each point is given by

References

Benzécri, J. P. 1969. Statistical analysis as a tool to make patterns emerge from data. In: Methodologies of Pattern Recognition, S. Watanabe, ed. New York: Academic Press.

Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. 1975. Discrete multivariate analysis: Theory and practice. Cambridge, Mass.: MIT Press.

Eckart, C., and Young, G. 1936. The approximation of one matrix by another one of lower rank. Psychometrika, 1: 211–218.

Gifi, A. 1981. Nonlinear multivariate analysis. Leiden: Department of Data Theory.

Golub, G. H., and Reinsch, C. 1971. Linear algebra, Chapter I.10. In: Handbook for Automatic Computation, Volume II, J. H. Wilkinson and C. Reinsch, eds. New York: Springer-Verlag.

Greenacre, M. J. 1984. Theory and applications of correspondence analysis. London: Academic Press.

Heiser, W. J. 1981. Unfolding analysis of proximal data. Doctoral dissertation. Department of Data Theory, University of Leiden.

Horst, P. 1963. Matrix algebra for social scientists. New York: Holt, Rinehart, and Winston.

Israëls, A. 1987. Eigenvalue techniques for qualitative data. Leiden: DSWO Press.

Nishisato, S. 1980. Analysis of categorical data: dual scaling and its applications. Toronto: University of Toronto Press.

Rao, C. R. 1973. Linear statistical inference and its applications, 2nd ed. New York: John Wiley & Sons, Inc.

Rao, C. R. 1980. Matrix approximations and reduction of dimensionality in multivariate statistical analysis. In: Multivariate Analysis, Vol. 5, P. R. Krishnaiah, ed. Amsterdam: North-Holland.

Wolter, K. M. 1985. Introduction to variance estimation. Berlin: Springer-Verlag.