SVD and Correspondence Analysis: Scaling, Variance Estimation, and Singular Values, Study notes of Mathematical Statistics

Alliance University Mathematical Statistics

An in-depth explanation of singular value decomposition (svd) and correspondence analysis. Topics covered include data scaling through centering and rescaling, variance estimation using the delta method, and the ability of correspondence analysis to fit supplementary points and make biplots. The document also discusses various standardization options and the determination of singular values, maximum rank, and inertia.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar 🇮🇳


CORRESPONDENCE

The CORRESPONDENCE algorithm consists of three major parts:

1. A singular value decomposition (SVD)
2. Centering and rescaling of the data and various rescalings of the results
3. Variance estimation by the delta method

Other names for SVD are “Eckart-Young decomposition”, after Eckart and Young (1936), who introduced the technique in psychometrics, and “basic structure” (Horst, 1963). The rescalings and centering, including their rationale, are well explained in Benzécri (1969), Nishisato (1980), Gifi (1981), and Greenacre (1984). Readers interested in the general framework of matrix approximation and reduction of dimensionality with positive definite row and column metrics are referred to Rao (1980).

The delta method can be used to derive asymptotic distributions and is particularly useful for approximating the variance of complex statistics. There are many versions of the delta method, differing in the assumptions made and in the strength of the approximation (Rao, 1973, ch. 6; Bishop et al., 1975, ch. 14; Wolter, 1985, ch. 6).

Other characteristic features of CORRESPONDENCE are the ability to fit supplementary points into the space defined by the active points, the ability to constrain rows and/or columns to have equal scores, and the ability to make biplots using either chi-squared distances, as in standard correspondence analysis, or Euclidean distances.

Notation

The following notation is used throughout this chapter unless otherwise stated:

$t_1$: Total number of rows (row objects)

$s_1$: Number of supplementary rows

$k_1$: Number of rows in analysis ($t_1 - s_1$)

$t_2$: Total number of columns (column objects)

$s_2$: Number of supplementary columns

$k_2$: Number of columns in analysis ($t_2 - s_2$)

$p$: Number of dimensions

Data-Related Quantities

$f_{ij}$: Nonnegative data value for row $i$ and column $j$; collected in table $F$

$f_{i+}$: Marginal total of row $i$, $i = 1, \ldots, k_1$

$f_{+j}$: Marginal total of column $j$, $j = 1, \ldots, k_2$

$N$: Grand total of $F$
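As a small illustration of the data-related quantities above, the following sketch computes the marginal totals and the grand total for a made-up table. The function name `margins` and the example values are hypothetical and are not part of the CORRESPONDENCE procedure itself.

```python
def margins(F):
    """Return (row totals f_i+, column totals f_+j, grand total N) of table F."""
    row_totals = [sum(row) for row in F]
    col_totals = [sum(col) for col in zip(*F)]
    grand_total = sum(row_totals)
    return row_totals, col_totals, grand_total


# Hypothetical 2 x 3 table of nonnegative counts.
F = [[10, 20, 30],
     [5, 15, 20]]
row_totals, col_totals, N = margins(F)
```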

Scores and Statistics

$r_{is}$: Score of row object $i$ on dimension $s$

$c_{js}$: Score of column object $j$ on dimension $s$

$I$: Total inertia

Basic Calculations

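One way to phrase the CORRESPONDENCE objective (cf. Heiser, 1981) is to say that we wish to find row scores $\{r_{is}\}$ and column scores $\{c_{js}\}$ so that the function

$$\sigma(\{r_{is}\};\{c_{js}\}) = \sum_i \sum_j f_{ij} \sum_s (r_{is} - c_{js})^2$$

is minimal, under the standardization restriction either that

$$\sum_i f_{i+}\, r_{is}\, r_{it} = \delta^{st}$$

or

$$\sum_j f_{+j}\, c_{js}\, c_{jt} = \delta^{st},$$

where $\delta^{st}$ is Kronecker's delta and $t$ is an alternative index for dimensions.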
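To make the objective concrete, here is a minimal sketch that evaluates the loss $\sum_i \sum_j f_{ij} \sum_s (r_{is} - c_{js})^2$ for given scores. It only evaluates the function; it does not minimize it or enforce the standardization restriction. The name `sigma` and the toy data are assumptions made for illustration.

```python
def sigma(F, R, C):
    """Weighted squared distance between row scores R[i] and column scores C[j].

    F is the data table, R[i][s] the score of row i on dimension s,
    C[j][s] the score of column j on dimension s.
    """
    total = 0.0
    for i, row in enumerate(F):
        for j, f_ij in enumerate(row):
            total += f_ij * sum((r - c) ** 2 for r, c in zip(R[i], C[j]))
    return total


# Hypothetical diagonal table with one dimension of scores.
F = [[2, 0],
     [0, 3]]
R = [[1.0], [-1.0]]
C_matching = [[1.0], [-1.0]]   # each column sits on "its" row
C_swapped = [[-1.0], [1.0]]    # each column sits on the other row

loss_matching = sigma(F, R, C_matching)
loss_swapped = sigma(F, R, C_swapped)
```

Rows and columns that co-occur often are pulled together by the loss, so the matching configuration gives a smaller value than the swapped one.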

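(b) standardization option cmean (remove column means)

$$\tilde f_{ij} = f_{ij}, \qquad \tilde f_{i+} = \frac{N}{k_1}, \qquad \tilde f_{+j} = f_{+j}$$

(c) standardization option rcmean (remove both row and column means)

$$\tilde f_{ij} = f_{ij}, \qquad \tilde f_{i+} = f_{i+}, \qquad \tilde f_{+j} = f_{+j}$$

(d) standardization option rsum (equalize row totals, then remove row means)

$$\tilde f_{ij} = \frac{\tilde f_{i+}}{f_{i+}}\, f_{ij}, \qquad \tilde f_{i+} = \frac{N}{k_1}, \qquad \tilde f_{+j} = \frac{N}{k_2}$$

(e) standardization option csum (equalize column totals, then remove column means)

$$\tilde f_{ij} = \frac{\tilde f_{+j}}{f_{+j}}\, f_{ij}, \qquad \tilde f_{i+} = \frac{N}{k_1}, \qquad \tilde f_{+j} = \frac{N}{k_2}$$

2. Then, if not computed yet in step 1, $\tilde f_{i+}$ and/or $\tilde f_{+j}$ are computed:

$$\tilde f_{i+} = \frac{N}{k_1}, \qquad \tilde f_{+j} = \frac{N}{k_2},$$

and the general element $z_{ij}$ of $Z$ is formed from $\tilde f_{ij}$, $\tilde f_{i+}$, and $\tilde f_{+j}$.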

2. Singular value decomposition

When rows and/or columns are specified as supplementary, these rows and/or columns of Z are first set to zero, yielding Z.

Let the singular value decomposition of Z be denoted by

$$Z = K \Lambda L'$$

with $K'K = I$, $L'L = I$, and $\Lambda$ diagonal. This decomposition is calculated by a routine based on Golub and Reinsch (1971). It involves Householder reduction to bidiagonal form and diagonalization by a QR procedure with shifts. The routine requires an array with more rows than columns, so when $k_1 < k_2$ the original table is transposed and the parameter transfer is permuted accordingly.
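The transposition trick in the last sentence works because transposing a matrix only swaps the roles of its left and right singular vectors. A short check, using only the decomposition stated above:

```latex
Z' = (K \Lambda L')' = L \Lambda' K' = L \Lambda K'
\qquad \text{(since } \Lambda \text{ is diagonal)}
```

So a routine applied to $Z'$ returns $L$ in the place where it would return $K$ for $Z$, with the same singular values; swapping the two output arrays recovers the decomposition of the original table.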

3. Adjustment to the row and column metric

The arrays of both the left-hand singular vectors and the right-hand singular vectors are adjusted row-wise to form scores that are standardized in the row and in the column marginal proportions, respectively:

$$\tilde r_{is} = \frac{k_{is}}{\sqrt{\tilde f_{i+} / N}},$$

$$\tilde c_{js} = \frac{l_{js}}{\sqrt{\tilde f_{+j} / N}}.$$

This way, both sets of scores satisfy the standardization restrictions simultaneously.
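A brief verification of this claim for the row scores, assuming the columns of $K$ are orthonormal and reading the adjustment as a division of $k_{is}$ by the square root of the marginal proportion $\tilde f_{i+}/N$:

```latex
\sum_i \frac{\tilde f_{i+}}{N}\, \tilde r_{is}\, \tilde r_{it}
  = \sum_i \frac{\tilde f_{i+}}{N}\,
    \frac{k_{is}}{\sqrt{\tilde f_{i+}/N}}\,
    \frac{k_{it}}{\sqrt{\tilde f_{i+}/N}}
  = \sum_i k_{is}\, k_{it}
  = \delta^{st}
```

The same argument with $L$ and $\tilde f_{+j}/N$ gives the corresponding result for the column scores.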

4. Determination of variances and covariances

For the application of the delta method to the results of generalized eigenvalue methods under multinomial sampling, the reader is referred to Gifi (1990, ch. 12) and Israëls (1987, Appendix B). It is shown there that $N$ times the variance-covariance matrix of a function $\phi$ of the observed cell proportions $p = \{p_{ij} = \tilde f_{ij} / N\}$ asymptotically reaches the form

$$N \times \operatorname{cov}(\phi(p)) = \sum_i \sum_j \pi_{ij} \left(\frac{\partial \phi}{\partial p_{ij}}\right) \left(\frac{\partial \phi}{\partial p_{ij}}\right)' - \left(\sum_i \sum_j \pi_{ij} \frac{\partial \phi}{\partial p_{ij}}\right) \left(\sum_i \sum_j \pi_{ij} \frac{\partial \phi}{\partial p_{ij}}\right)'$$

Here the quantities $\pi_{ij}$ are the cell probabilities of the multinomial distribution, and $\partial \phi / \partial p_{ij}$ are the partial derivatives of $\phi$ (which is either a generalized eigenvalue or a generalized eigenvector) with respect to the observed cell proportions.

rowscores:

$$\sum_i \tilde f_{i+}\, r_{is}^2 / N = \lambda_s^{2\alpha}$$

columnscores:

$$\sum_j \tilde f_{+j}\, c_{js}^2 / N = \lambda_s^{2\beta}$$

The estimated variances and covariances are adjusted according to the type of normalization chosen.
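For a scalar statistic, the multinomial delta-method expression of step 4 reduces to a weighted variance of the partial derivatives: $N \operatorname{var}(\phi) \approx \sum \pi g^2 - (\sum \pi g)^2$, with $g$ the derivative of $\phi$ with respect to each cell proportion. The sketch below illustrates this on two simple statistics rather than on eigenvalues or eigenvectors; the function name `delta_method_variance` and the example probabilities are assumptions for illustration only.

```python
def delta_method_variance(pi, grad):
    """Asymptotic N * var(phi) for a scalar function phi of multinomial proportions.

    pi   : cell probabilities (flattened, summing to 1)
    grad : partial derivatives of phi with respect to each cell proportion
    """
    mean_g = sum(p * g for p, g in zip(pi, grad))
    mean_g2 = sum(p * g * g for p, g in zip(pi, grad))
    return mean_g2 - mean_g ** 2


# Hypothetical cell probabilities of a 2 x 2 table, flattened.
pi = [0.1, 0.2, 0.3, 0.4]

# phi(p) = p_0: the gradient is an indicator of the first cell.
var_single_cell = delta_method_variance(pi, [1.0, 0.0, 0.0, 0.0])

# phi(p) = log(p_0 / p_1): gradient is (1/p_0, -1/p_1, 0, 0).
var_log_ratio = delta_method_variance(pi, [1 / pi[0], -1 / pi[1], 0.0, 0.0])
```

The first case reproduces the familiar binomial result $\pi(1-\pi)$, and the second gives $1/\pi_0 + 1/\pi_1$, which is a quick sanity check on the formula.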

Diagnostics

After printing the data, CORRESPONDENCE optionally also prints a table of row profiles and column profiles, which are $\{f_{ij} / f_{i+}\}$ and $\{f_{ij} / f_{+j}\}$, respectively.
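A minimal sketch of the two profile tables, assuming every row and column total is nonzero. The helper names and the example table are hypothetical.

```python
def row_profiles(F):
    """Divide each cell by its row total: f_ij / f_i+."""
    return [[f / sum(row) for f in row] for row in F]


def column_profiles(F):
    """Divide each cell by its column total: f_ij / f_+j."""
    col_totals = [sum(col) for col in zip(*F)]
    return [[f / col_totals[j] for j, f in enumerate(row)] for row in F]


# Hypothetical 2 x 2 table.
F = [[10, 30],
     [20, 40]]
rp = row_profiles(F)     # each row of rp sums to 1
cp = column_profiles(F)  # each column of cp sums to 1
```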

Singular Values, Maximum Rank and Inertia

All singular values $\lambda_s$ defined in step 2 are printed up to a maximum of $\min\{(k_1 - 1), (k_2 - 1)\}$. Small singular values and corresponding dimensions are suppressed when they do not exceed a tolerance of order $10^{-7}$ scaled by the table dimensions $k_1$ and $k_2$; in this case a warning message is issued. Dimensionwise inertia and total inertia are given by the relationships

$$I = \sum_s \lambda_s^2 = \sum_s \sum_i \frac{\tilde f_{i+}}{N}\, r_{is}^2$$

where the right-hand part of this equality is true only if the normalization is row principal (but for the other normalizations similar relationships are easily derived from step 5). The quantities “proportion explained” are equal to inertia divided by total inertia: $\lambda_s^2 / I$.
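The relationships between singular values, dimensionwise inertia, and proportion explained can be sketched as follows. The function name `inertia_summary` and the singular values are made up for illustration, and the sketch assumes the singular values are already sorted in decreasing order.

```python
def inertia_summary(singular_values, k1, k2):
    """Return (inertia per dimension, total inertia, proportion explained).

    Only the first min(k1 - 1, k2 - 1) singular values are used.
    """
    max_dims = min(k1 - 1, k2 - 1)
    kept = singular_values[:max_dims]
    inertias = [lam ** 2 for lam in kept]
    total = sum(inertias)
    proportions = [x / total for x in inertias]
    return inertias, total, proportions


# Hypothetical singular values for a table with 3 active rows and 4 active columns.
inertias, total_inertia, proportions = inertia_summary([0.4, 0.3], k1=3, k2=4)
```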

Supplementary Points

Supplementary row and column points are given by

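$$r_{is}^{\mathrm{sup}} = \sum_j \frac{\tilde f_{ij}}{\tilde f_{i+}}\, c_{js}\, \lambda_s^{2\alpha - 2}$$

$$c_{js}^{\mathrm{sup}} = \sum_i \frac{\tilde f_{ij}}{\tilde f_{+j}}\, r_{is}\, \lambda_s^{2\beta - 2}$$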

Mass, Scores, Inertia and Contributions

The mass, scores, inertia and contributions for the row and column points (including supplementary points) are given in the Overview Row Points Table and the Overview Column Points Table. These tables are printed in $p$ dimensions. The tables are given first for rows, then for columns. The masses are the marginal proportions ($\tilde f_{i+} / N$ and $\tilde f_{+j} / N$, respectively). The inertia of the rows/columns is given by:

$$I_i = \sum_{j=1}^{k_2} z_{ij}^2, \qquad I_j = \sum_{i=1}^{k_1} z_{ij}^2$$
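The point inertias are just row and column sums of squares of the auxiliary matrix Z. A minimal sketch with a made-up Z (the values do not come from a real table) is given below. Because the sum of all squared elements of a matrix equals the sum of its squared singular values, both sets of point inertias add up to the total inertia.

```python
def point_inertias(Z):
    """Return (row inertias, column inertias) as sums of squared elements of Z."""
    row_inertia = [sum(z * z for z in row) for row in Z]
    col_inertia = [sum(z * z for z in col) for col in zip(*Z)]
    return row_inertia, col_inertia


# Hypothetical auxiliary matrix Z.
Z = [[0.1, -0.2],
     [-0.1, 0.2]]
row_inertia, col_inertia = point_inertias(Z)
total_from_rows = sum(row_inertia)
total_from_cols = sum(col_inertia)
```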

For supplementary points, the contribution to the inertia of dimensions is zero. The contribution of the active points to the inertia of each dimension is given by

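$$\tau_{is} = \frac{\tilde f_{i+}}{N}\, \frac{r_{is}^2}{\lambda_s^{2\alpha}}, \qquad \tau_{js} = \frac{\tilde f_{+j}}{N}\, \frac{c_{js}^2}{\lambda_s^{2\beta}}$$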

The contribution of dimensions to the inertia of each point is given by

References

Benzécri, J. P. 1969. Statistical analysis as a tool to make patterns emerge from data. In: Methodologies of Pattern Recognition, S. Watanabe, ed. New York: Academic Press.

Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. 1975. Discrete multivariate analysis: Theory and practice. Cambridge, Mass.: MIT Press.

Eckart, C., and Young, G. 1936. The approximation of one matrix by another one of lower rank. Psychometrika, 1: 211–218.

Gifi, A. 1981. Nonlinear multivariate analysis. Leiden: Department of Data Theory.

Golub, G. H., and Reinsch, C. 1971. Linear algebra, Chapter I.10. In: Handbook for Automatic Computation, Volume II, J. H. Wilkinson and C. Reinsch, eds. New York: Springer-Verlag.

Greenacre, M. J. 1984. Theory and applications of correspondence analysis. London: Academic Press.

Heiser, W. J. 1981. Unfolding analysis of proximal data. Doctoral dissertation. Department of Data Theory, University of Leiden.

Horst, P. 1963. Matrix algebra for social scientists. New York: Holt, Rinehart, and Winston.

Israëls, A. 1987. Eigenvalue techniques for qualitative data. Leiden: DSWO Press.

Nishisato, S. 1980. Analysis of categorical data: dual scaling and its applications. Toronto: University of Toronto Press.

Rao, C. R. 1973. Linear statistical inference and its applications, 2nd ed. New York: John Wiley & Sons, Inc.

Rao, C. R. 1980. Matrix approximations and reduction of dimensionality in multivariate statistical analysis. In: Multivariate Analysis, Vol. 5, P. R. Krishnaiah, ed. Amsterdam: North-Holland.

Wolter, K. M. 1985. Introduction to variance estimation. Berlin: Springer-Verlag.
