Understanding Factor Analysis: Modeling the Covariance Structure of Measured Traits, Study notes of Statistics

Factor analysis is a statistical method used to describe the covariance relationships between multiple measured traits by identifying underlying unobservable factors. These notes cover the concept of factor analysis, the orthogonal factor model, and the restrictions imposed on the model. They also discuss three methods for estimating the factor loading matrix: the principal component method, the principal factor method, and the maximum likelihood method.

Typology: Study notes

Pre 2010

Uploaded on 08/19/2009

koofers-user-gy1


Factor Analysis

Recall, the objective of factor analysis is to describe the covariance relationships between a large number of measured traits using a few linear combinations of underlying unobservable traits. Notice that this type of analysis is model-based, unlike PCA.

Example: Suppose that we measure the white blood cell count, the circulating antibody volume, the thyroid activity, and the motile sperm count for men who were exposed to radiation from a nuclear plant many years ago. We could explain much of the variation in these variables by knowing the exact amount of radiation exposure, which is no longer observable.

Setup

Let $y$ denote the set of $p$ measurements, $E(y) = \mu$ denote their mean, and let $\mathrm{var}(y) = \Sigma$. Then, the idea of factor analysis is to construct a linear model

$$
y - \mu = Lf + \epsilon =
\begin{pmatrix}
\ell_{11} f_1 + \ell_{12} f_2 + \cdots + \ell_{1m} f_m \\
\vdots \\
\ell_{p1} f_1 + \ell_{p2} f_2 + \cdots + \ell_{pm} f_m
\end{pmatrix}
+
\begin{pmatrix}
\epsilon_1 \\ \vdots \\ \epsilon_p
\end{pmatrix}.
$$

Here, $y - \mu$ are the $p$ centered measurements, $L$ is the $p \times m$ matrix of factor loadings, $f$ are the unobserved common factors for the population, and $\epsilon$ are the random errors (or variation which is not accounted for by the common factors).

NOTE: We want m (the number of factors) to be much smaller than p (the number of measured attributes).

Restrictions

We will impose the following restrictions on our model:

  • $E(\epsilon) = 0$.
  • $\mathrm{var}(\epsilon) = \Psi_{p \times p} = \mathrm{Diag}(\psi_1, \cdots, \psi_p)$.
  • $\epsilon$ and $f$ are independent.
  • Typically, $E(f) = 0$ and $\mathrm{var}(f) = I_{m \times m}$. This is called the orthogonal factor model, but it need not always be the case.

This model imposes a covariance structure on $y$:

$$
\Sigma = \mathrm{var}(y) = \mathrm{var}(Lf + \epsilon) = L\,\mathrm{var}(f)\,L' + \mathrm{var}(\epsilon) = LL' + \Psi.
$$

Note that since $\Psi$ is diagonal, the off-diagonal elements of $LL'$ are $\sigma_{ij}$, the covariances in $\Sigma$.

This means that $\mathrm{cov}(y_i, y_j) = \sum_{k=1}^{m} \ell_{ik}\ell_{jk}$ for $i \neq j$, so the covariances of $y$ are completely determined by the loadings on the $m$ factors ($m \ll p$).

Furthermore, $\mathrm{var}(y_i) = \sum_{k=1}^{m} \ell_{ik}^2 + \psi_i$, where $\psi_i$ is called the specific variance and the summation term is called the $i$th communality: the variance of the $i$th measurement which is shared with the other variables via the $m$ common factors.
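As a rough numerical check of this covariance structure, the sketch below builds $\Sigma = LL' + \Psi$ from a made-up loading matrix, splits each variance into communality plus specific variance, and compares $\Sigma$ with the sample covariance of simulated data. The loadings, specific variances, sample size, and all variable names are illustrative assumptions, not values from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loadings for p = 4 traits and m = 2 factors (illustrative only).
L = np.array([[0.9, 0.0],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.1, 0.9]])
# Assumed specific variances, chosen so that every var(y_i) equals 1.
psi = np.array([0.19, 0.27, 0.47, 0.18])

# Covariance implied by the orthogonal factor model.
Sigma = L @ L.T + np.diag(psi)

# i-th communality: sum over k of l_ik^2.
communality = np.sum(L**2, axis=1)

# Simulate centered data y - mu = L f + eps with f ~ N(0, I), eps ~ N(0, Psi).
n = 200_000
f = rng.standard_normal((n, 2))
eps = rng.standard_normal((n, 4)) * np.sqrt(psi)
y = f @ L.T + eps

S = np.cov(y, rowvar=False)
max_abs_err = np.max(np.abs(S - Sigma))
print("communalities:", communality)
print("largest |S - Sigma| entry:", max_abs_err)
```

With a sample this large, the sample covariance should agree with $LL' + \Psi$ to roughly two decimal places.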

Principal Component Method

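Recall that we wrote the spectral decomposition as

$$
\begin{aligned}
\Sigma &= \lambda_1 a_1 a_1' + \cdots + \lambda_p a_p a_p' = A'\Lambda A \\
&= \sum_{i=1}^{m} \lambda_i a_i a_i' + \sum_{i=m+1}^{p} \lambda_i a_i a_i' \\
&= \sum_{i=1}^{m} \ell_i \ell_i' + \sum_{i=m+1}^{p} \lambda_i a_i a_i'.
\end{aligned}
$$

Here, $\ell_i = a_i \sqrt{\lambda_i}$. Note that the second term is not diagonal in general.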

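We will assume that $\psi_i \approx \sigma_{ii} - \sum_{j=1}^{m} \ell_{ij}^2 = \sigma_{ii} - \sum_{j=1}^{m} \lambda_j a_{ij}^2$, where $a_{ij}$ is the $i$th element of $a_j$. Then, $\Sigma \approx LL' + \Psi$.

So we will estimate $L$ and $\Psi$ by calculating the estimated eigenvalues and eigenvectors from $S$ or $R$.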

Observations

  • The estimated factor loadings don’t change as the number of factors increases.
  • The diagonal elements of $\hat{L}\hat{L}' + \hat{\Psi}$ are equal to the diagonal elements of $S$ or $R$, but the covariances may not be exactly reproduced.
  • We will select $m$ to make the off-diagonal elements close to the values in $S$ (or to make the off-diagonal elements of $S - (\hat{L}\hat{L}' + \hat{\Psi})$ small).
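The principal component method and the observations above can be sketched in a few lines of NumPy. The function name `pc_factor` and the small correlation matrix `R` are hypothetical, chosen only to illustrate the calculation.

```python
import numpy as np

def pc_factor(S, m):
    """Principal component estimates of L and Psi from S (or R).

    Minimal sketch: the loadings are the m leading eigenvectors scaled by
    sqrt(eigenvalue), and psi_i = s_ii - sum_j l_ij^2.
    """
    eigvals, eigvecs = np.linalg.eigh(S)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:m]      # indices of the m largest
    L_hat = eigvecs[:, order] * np.sqrt(eigvals[order])
    psi_hat = np.diag(S) - np.sum(L_hat**2, axis=1)
    return L_hat, psi_hat

# Made-up correlation matrix for p = 4 traits (illustrative only).
R = np.array([[1.0, 0.6, 0.5, 0.1],
              [0.6, 1.0, 0.4, 0.2],
              [0.5, 0.4, 1.0, 0.3],
              [0.1, 0.2, 0.3, 1.0]])

L1, psi1 = pc_factor(R, 1)
L2, psi2 = pc_factor(R, 2)

# Residual matrix R - (L L' + Psi): zero diagonal, nonzero off-diagonals.
resid = R - (L2 @ L2.T + np.diag(psi2))
print(np.round(resid, 3))
```

Comparing `L1` with the first column of `L2` shows the first observation: adding a factor leaves the earlier loadings unchanged.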

Principal Factor Method

Consider modelling the correlation matrix, $R = LL' + \Psi$. Then,

$$
LL' = R - \Psi =
\begin{pmatrix}
h_1^2 & r_{12} & \cdots & r_{1p} \\
r_{21} & h_2^2 & \cdots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \cdots & h_p^2
\end{pmatrix}
$$

where $h_i^2 = 1 - \psi_i$ (the communality).

Suppose that initial estimates are available for the communalities, $(h_1^*)^2, (h_2^*)^2, \cdots, (h_p^*)^2$. For example, we might regress each trait on all the others, and then use the $R^2$ (the squared multiple correlation coefficient) as $h^2$. This can be done using the PRIORS=SMC option in SAS's PROC FACTOR.

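Then, the estimate of $R - \Psi$ at step $r$ is

$$
(R - \Psi)_r =
\begin{pmatrix}
(h_1^*)^2 & r_{12} & \cdots & r_{1p} \\
r_{21} & (h_2^*)^2 & \cdots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \cdots & (h_p^*)^2
\end{pmatrix}
\approx L_r^* (L_r^*)'
$$

where

$$
L_r^* = \left( \sqrt{\hat{\lambda}_1^*}\, \hat{a}_1^*, \; \cdots, \; \sqrt{\hat{\lambda}_m^*}\, \hat{a}_m^* \right),
\qquad
\psi_{i,r} = 1 - \sum_{j=1}^{m} \hat{\lambda}_j^* (\hat{a}_{ij}^*)^2.
$$

Here $\hat{\lambda}_j^*$ and $\hat{a}_j^*$ are the $m$ largest eigenvalues and the corresponding eigenvectors of $(R - \Psi)_r$.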
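A minimal sketch of this procedure follows, assuming the step is iterated: after each eigendecomposition the communalities are replaced by the row sums of squared loadings and the step is repeated. The notes describe only a single step $r$, so the loop, the fixed iteration count, and the function name `principal_factor` are assumptions. The starting values are the squared multiple correlations, computed from the standard identity $h_i^2 = 1 - 1/r^{ii}$, where $r^{ii}$ is the $i$th diagonal element of $R^{-1}$.

```python
import numpy as np

def principal_factor(R, m, n_iter=50):
    """Iterated principal factor estimates of L and Psi from R.

    Sketch only: a fixed number of iterations stands in for a proper
    convergence check.
    """
    # Initial communalities: squared multiple correlations (SMC priors).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)          # (R - Psi) at this step
        eigvals, eigvecs = np.linalg.eigh(R_reduced)
        order = np.argsort(eigvals)[::-1][:m]
        lam = np.clip(eigvals[order], 0.0, None)  # guard against negatives
        L = eigvecs[:, order] * np.sqrt(lam)
        h2 = np.sum(L**2, axis=1)                # updated communalities
    psi = 1.0 - h2
    return L, psi

# Correlation matrix generated exactly by a one-factor model (illustrative).
L0 = np.array([0.9, 0.8, 0.7, 0.6])
R = np.outer(L0, L0) + np.diag(1.0 - L0**2)

L_pf, psi_pf = principal_factor(R, m=1)
print(np.round(np.abs(L_pf[:, 0]), 3), np.round(psi_pf, 3))
```

Because `R` here is built from an exact one-factor model, the iteration should recover the generating loadings up to sign.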

Maximum Likelihood Estimation

A likelihood function is needed to perform any sort of maximum likelihood estimation. Toward this end, we will make the additional assumption that $y_j \sim N(\mu, \Sigma)$, for $j = 1, \cdots, n$.

Similarly, we will assume that $f \sim N(0, I)$ and $\epsilon_j \sim N(0, \Psi)$.

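We will make the further restriction that $L'\Psi^{-1}L = \Delta$, where $\Delta$ is a diagonal matrix. We need this restriction since the factor loading matrix is not unique: for any orthogonal matrix $T$, the loadings $L$ and $LT$ give the same $\Sigma = LL' + \Psi$ and therefore the same likelihood.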
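The non-uniqueness and the role of the constraint can be illustrated numerically. The sketch below evaluates the normal log-likelihood at $(L, \Psi)$ and at $(LT, \Psi)$ for an orthogonal $T$, then picks the particular rotation that makes $L'\Psi^{-1}L$ diagonal. The function names `fa_loglik` and `to_identified`, and all numerical values, are hypothetical; this is not a maximizer of the likelihood, only an evaluation of it.

```python
import numpy as np

def fa_loglik(Y, mu, L, psi):
    """Log-likelihood of i.i.d. rows y_j ~ N(mu, L L' + Psi)."""
    n, p = Y.shape
    Sigma = L @ L.T + np.diag(psi)
    resid = Y - mu
    _, logdet = np.linalg.slogdet(Sigma)
    # Sum over j of (y_j - mu)' Sigma^{-1} (y_j - mu).
    quad = np.sum(resid * np.linalg.solve(Sigma, resid.T).T)
    return -0.5 * (n * p * np.log(2.0 * np.pi) + n * logdet + quad)

def to_identified(L, psi):
    """Rotate L so that L' Psi^{-1} L is diagonal."""
    M = L.T @ (L / psi[:, None])       # L' Psi^{-1} L (symmetric, m x m)
    _, Q = np.linalg.eigh(M)           # M = Q Delta Q'
    return L @ Q                       # (LQ)' Psi^{-1} (LQ) = Delta

rng = np.random.default_rng(1)
L = np.array([[0.9, 0.0], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]])
psi = np.array([0.19, 0.27, 0.47, 0.18])
Y = rng.standard_normal((500, 2)) @ L.T + rng.standard_normal((500, 4)) * np.sqrt(psi)

theta = 0.7                            # arbitrary rotation angle
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

ll_original = fa_loglik(Y, 0.0, L, psi)
ll_rotated = fa_loglik(Y, 0.0, L @ T, psi)

L_id = to_identified(L, psi)
Delta = L_id.T @ (L_id / psi[:, None])
print(ll_original, ll_rotated)
print(np.round(Delta, 6))
```

The two log-likelihood values coincide, while `Delta` comes out diagonal, which is what singles out one member of the rotation family.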

Some points to note:

  • Finding the maximum of the likelihood for these data is quite “expensive” computationally.
  • We will use other methods for exploratory data analysis.
  • Likelihood ratio tests could be used for testing hypotheses in this framework.

Factor Rotation:

Recall that if $T_{m \times m}$ is an orthogonal matrix, then $\hat{L}\hat{L}' + \hat{\Psi} = \hat{L}^*(\hat{L}^*)' + \hat{\Psi}$, where $\hat{L}^* = \hat{L}T$. This means that the estimated specific variances and communalities cannot be altered by an orthogonal transformation. We previously said that there are an infinite number of choices for $T$, so some selection criterion is necessary.

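For example, we will find the orthogonal transformation that maximizes the objective function

$$
\sum_{j=1}^{m} \frac{1}{p} \left[ \sum_{i=1}^{p} \left( \frac{\ell_{ij}^{*2}}{h_i^2} \right)^{2} - \frac{\gamma}{p} \left( \sum_{i=1}^{p} \frac{\ell_{ij}^{*2}}{h_i^2} \right)^{2} \right].
$$

The ratio $\ell_{ij}^{*2} / h_i^2$ is the squared scaled loading, that is, the squared rotated loading divided by the communality of the $i$th variable. This scaling gives variables with small communalities more influence.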

Different choices of γ correspond to different orthogonal rotations found in the literature. For example:

  • Varimax, γ = 1.
  • Quartimax, γ = 0.
  • Equimax, γ = m/2.
  • Parsimax, $\gamma = \dfrac{p(m-1)}{p+m-2}$.
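The orthomax family above can be explored with a short NumPy sketch. It uses a common SVD-based iteration for maximizing this kind of criterion, which is an assumption on my part: the notes do not say which algorithm is used. The function names `orthomax` and `orthomax_criterion`, the tolerance, and the test loadings are hypothetical. Rows are scaled by $h_i$ before rotating and rescaled afterwards, so every variable is assumed to have a nonzero communality.

```python
import numpy as np

def orthomax_criterion(L, gamma=1.0):
    """Orthomax objective evaluated on squared scaled loadings l_ij^2 / h_i^2."""
    p = L.shape[0]
    B2 = L**2 / np.sum(L**2, axis=1, keepdims=True)
    return np.sum(np.sum(B2**2, axis=0) - (gamma / p) * np.sum(B2, axis=0)**2) / p

def orthomax(L, gamma=1.0, max_iter=500, tol=1e-10):
    """Orthogonal rotation of L; gamma=1 is varimax, gamma=0 is quartimax."""
    p, m = L.shape
    h = np.sqrt(np.sum(L**2, axis=1))      # square roots of the communalities
    B = L / h[:, None]                     # scaled loadings
    T = np.eye(m)
    obj_old = 0.0
    for _ in range(max_iter):
        Br = B @ T
        # Gradient of the criterion (up to a constant) with respect to Br.
        grad = Br**3 - (gamma / p) * Br * np.sum(Br**2, axis=0)
        U, s, Vt = np.linalg.svd(B.T @ grad)
        T = U @ Vt                         # nearest orthogonal matrix
        obj = s.sum()
        if obj - obj_old < tol * obj:
            break
        obj_old = obj
    return (B @ T) * h[:, None], T

# Near-simple-structure loadings, deliberately rotated away by 0.4 radians.
L_simple = np.array([[0.8, 0.1], [0.7, 0.0], [0.9, 0.1],
                     [0.1, 0.8], [0.0, 0.7], [0.1, 0.9]])
c, s_ = np.cos(0.4), np.sin(0.4)
L_start = L_simple @ np.array([[c, -s_], [s_, c]])

L_rot, T = orthomax(L_start, gamma=1.0)
print(np.round(L_rot, 3))
print(orthomax_criterion(L_start), orthomax_criterion(L_rot))
```

Changing `gamma` to `0`, `m / 2`, or `p * (m - 1) / (p + m - 2)` gives the quartimax, equimax, and parsimax members of the same family.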

The Regression Method

We will spend a little time discussing the regression method for estimating factor scores, as that is what is usually used.

Consider the joint distribution of $(y_j - \mu)$ and $f_j$, assuming multivariate normality, as in the maximum likelihood approach. Then,

$$
\begin{pmatrix} y_j - \mu \\ f_j \end{pmatrix}
\sim N_{p+m}\left( 0, \; \Sigma^* =
\begin{bmatrix}
LL' + \Psi & L \\
L' & I_{m \times m}
\end{bmatrix}
\right)
$$

when the $m$ factor model is correct.

So, the conditional mean of $f_j$ given $y_j - \mu$ is

$$
E(f_j \mid y_j - \mu) = L'(LL' + \Psi)^{-1}(y_j - \mu).
$$

Notice that $L'(LL' + \Psi)^{-1}$ is an $m \times p$ matrix of regression coefficients.

Then, we will use the estimated conditional mean vector to estimate the factor scores,

$$
\hat{f}_j = \hat{L}'(\hat{L}\hat{L}' + \hat{\Psi})^{-1}(y_j - \bar{y}).
$$

Alternatively, we could reduce the effect of possible incorrect determination of the number of factors $m$, by using $S$ as a substitute for $\hat{L}\hat{L}' + \hat{\Psi}$. Then,

$$
\hat{f}_j = \hat{L}'S^{-1}(y_j - \bar{y}), \quad j = 1, 2, \cdots, n.
$$
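Both versions of the regression estimator can be computed for all $n$ observations at once. In the sketch below the function name `regression_scores`, the `use_S` flag, and the simulated data are hypothetical, and the true loadings are passed in where estimates $\hat{L}$ and $\hat{\Psi}$ would normally go, purely to keep the example short.

```python
import numpy as np

def regression_scores(Y, L_hat, psi_hat, use_S=False):
    """Regression-method factor scores; row j of the result is f_hat_j'."""
    Yc = Y - Y.mean(axis=0)                        # y_j - ybar
    if use_S:
        C = np.cov(Y, rowvar=False)                # sample covariance S
    else:
        C = L_hat @ L_hat.T + np.diag(psi_hat)     # model-implied covariance
    # f_hat_j = L' C^{-1} (y_j - ybar); solve avoids forming C^{-1} explicitly.
    return np.linalg.solve(C, Yc.T).T @ L_hat

rng = np.random.default_rng(2)
L = np.array([[0.9, 0.0], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]])
psi = np.array([0.19, 0.27, 0.47, 0.18])
n = 5000
F_true = rng.standard_normal((n, 2))
Y = 10.0 + F_true @ L.T + rng.standard_normal((n, 4)) * np.sqrt(psi)

F_model = regression_scores(Y, L, psi)
F_sample = regression_scores(Y, L, psi, use_S=True)

corr_model = [np.corrcoef(F_model[:, k], F_true[:, k])[0, 1] for k in range(2)]
print(np.round(corr_model, 3))
```

In this simulation the estimated scores track the simulated factors closely, and the two variants differ only through the gap between $S$ and $\hat{L}\hat{L}' + \hat{\Psi}$.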

Assessing the Model

Suppose that we have performed a factor analysis and are interested in determining if the model appears to be correct. We could consider:

  • Plots.
  • Checks for outliers (recall that the $f_j$ are i.i.d. from a $N(0, I_{m \times m})$ population when the model is true).
  • Checks for multivariate normality.
  • Univariate tests for normality applied to the factor scores.