






Factor analysis is a statistical method used to describe the covariance relationships between multiple measured traits by identifying underlying unobservable factors. These notes cover the concept of factor analysis, the orthogonal factor model, and the restrictions imposed on the model. They also discuss three methods for estimating the factor loading matrix: the principal component method, the principal factor method, and the maximum likelihood method.
Typology: Study notes







Recall, the objective of factor analysis is to describe the covariance relationships between a large number of measured traits using a few linear combinations of underlying unobservable traits. Notice that this type of analysis is model based, unlike PCA.
Example: Suppose that we measure the white blood cell count, the circulating antibody volume, the thyroid activity, and the motile sperm count for men who were exposed to radiation from a nuclear plant many years ago. We could explain much of the variation in these variables by knowing the exact amount of radiation exposure, which is no longer observable.
Setup
Let y denote the set of p measurements, let E(y) = μ denote their mean, and let var(y) = Σ. Then, the idea of factor analysis is to construct a linear model

$$
y - \mu = Lf + \varepsilon =
\begin{pmatrix}
\ell_{11} f_1 + \ell_{12} f_2 + \cdots + \ell_{1m} f_m \\
\vdots \\
\ell_{p1} f_1 + \ell_{p2} f_2 + \cdots + \ell_{pm} f_m
\end{pmatrix}
+
\begin{pmatrix}
\varepsilon_1 \\ \vdots \\ \varepsilon_p
\end{pmatrix}
$$

Here, y − μ are the p centered measurements, L is the p × m matrix of factor loadings, f are the unobserved common factors for the population, and ε are the random errors (or variation which is not accounted for by the common factors).
NOTE: We want m (the number of factors) to be much smaller than p (the number of measured attributes).
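We will impose the following restrictions on our model (the usual restrictions of the orthogonal factor model):

$$
E(f) = 0, \quad \operatorname{var}(f) = I_{m \times m}, \quad E(\varepsilon) = 0, \quad \operatorname{var}(\varepsilon) = \Psi = \operatorname{diag}(\psi_1, \ldots, \psi_p), \quad \operatorname{cov}(f, \varepsilon) = 0
$$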
This model imposes a covariance structure on y:

$$
\Sigma = LL' + \Psi
$$
Note that since Ψ is diagonal, the off-diagonal elements of LL′ are σij, the covariances in Σ.
This means that $\operatorname{cov}(y_i, y_j) = \sum_{k=1}^{m} \ell_{ik}\ell_{jk}$ and that the covariance of y is completely determined by the m factors (m << p).
Furthermore, $\operatorname{var}(y_i) = \sum_{k=1}^{m} \ell_{ik}^2 + \psi_i$, where $\psi_i$ is called the specific variance and the summation term is called the ith communality: the variance of the ith measurement which is shared with the other variables via the m common factors.
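As a small numerical illustration of this covariance structure, the sketch below builds the covariance matrix implied by a loading matrix and a set of specific variances. The values of `L` and `psi` are hypothetical, chosen only so that each variance equals 1; they do not come from these notes.

```python
import numpy as np

# Hypothetical loadings for p = 4 traits and m = 2 factors (illustrative values only).
L = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.1, 0.6]])
psi = np.array([0.18, 0.27, 0.47, 0.63])   # hypothetical specific variances

Sigma = L @ L.T + np.diag(psi)             # covariance implied by the factor model
communality = (L ** 2).sum(axis=1)         # sum over k of l_ik^2, one value per trait

# An off-diagonal covariance depends on the loadings only, not on psi.
cov_12 = L[0] @ L[1]
```

Here `Sigma[0, 1]` equals `cov_12`, and each diagonal element of `Sigma` splits into a communality plus a specific variance.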
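Recall that we wrote the spectral decomposition as

$$
\begin{aligned}
\Sigma &= \lambda_1 a_1 a_1' + \cdots + \lambda_p a_p a_p' = A'\Lambda A \\
&= \sum_{i=1}^{m} \lambda_i a_i a_i' + \sum_{i=m+1}^{p} \lambda_i a_i a_i' \\
&= \sum_{i=1}^{m} \ell_i \ell_i' + \sum_{i=m+1}^{p} \lambda_i a_i a_i'.
\end{aligned}
$$

Here, $\ell_i = a_i \sqrt{\lambda_i}$. Note that the second term is not diagonal in general.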
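We will assume that $\psi_i \approx \sigma_{ii} - \sum_{j=1}^{m} \ell_{ij}^2 = \sigma_{ii} - \sum_{j=1}^{m} \lambda_j a_{ij}^2$, where $a_{ij}$ is the ith element of the eigenvector $a_j$. Then, Σ ≈ LL′ + Ψ.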
So we will estimate L and Ψ from the estimated eigenvalues and eigenvectors, calculated from S or R.
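The principal component estimates can be sketched in a few lines of NumPy. This is a minimal illustration, not the notes' own implementation; the function name `pc_factor` and the 3 × 3 correlation matrix are hypothetical.

```python
import numpy as np

def pc_factor(S, m):
    """Principal component estimates of the loadings (p x m) and specific variances."""
    eigval, eigvec = np.linalg.eigh(S)              # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1][:m]            # indices of the m largest eigenvalues
    L = eigvec[:, order] * np.sqrt(eigval[order])   # column j is sqrt(lambda_j) * a_j
    psi = np.diag(S) - (L ** 2).sum(axis=1)         # psi_i = s_ii - sum_j l_ij^2
    return L, psi

# Hypothetical correlation matrix, used only to exercise the function.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
L_hat, psi_hat = pc_factor(R, m=1)
residual = R - (L_hat @ L_hat.T + np.diag(psi_hat))
```

By construction the diagonal of `residual` is zero; the off-diagonal entries show how much covariance the m retained components fail to reproduce.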
Observations
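Consider modelling the correlation matrix, R = LL′ + Ψ. Then,

$$
R - \Psi =
\begin{pmatrix}
h_1^2 & r_{12} & \cdots & r_{1p} \\
r_{21} & h_2^2 & \cdots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \cdots & h_p^2
\end{pmatrix}
$$

where $h_i^2 = 1 - \psi_i$ (the communality).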
Suppose that initial estimates are available for the communalities, $(h_1^*)^2, (h_2^*)^2, \ldots, (h_p^*)^2$. For example, we might regress each trait on all the others, and then use the R² (the squared multiple correlation coefficient) as h². This can be done using the PRIORS=SMC option in SAS's PROC FACTOR.
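The squared multiple correlations do not require p separate regressions. A standard identity gives them from the diagonal of the inverse correlation matrix: the R² for variable i equals 1 − 1/r^{ii}, where r^{ii} is the ith diagonal element of R⁻¹. The sketch below uses that identity; the function name `smc` is hypothetical and R is assumed to be nonsingular.

```python
import numpy as np

def smc(R):
    """Squared multiple correlation of each variable with all the others."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # assumes R is nonsingular

# Two variables with correlation 0.5: each R^2 is 0.5^2 = 0.25.
R_small = np.array([[1.0, 0.5],
                    [0.5, 1.0]])
initial_h2 = smc(R_small)
```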
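Then, the estimate of R − Ψ at step r is

$$
(R - \Psi)_r =
\begin{pmatrix}
(h_1^*)^2 & r_{12} & \cdots & r_{1p} \\
r_{21} & (h_2^*)^2 & \cdots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \cdots & (h_p^*)^2
\end{pmatrix}
\approx L_r^* (L_r^*)'
$$

where

$$
L_r^* = \left( \sqrt{\hat\lambda_1^*}\, \hat a_1^*, \; \cdots, \; \sqrt{\hat\lambda_m^*}\, \hat a_m^* \right),
\quad \text{and} \quad
\psi_{i,r} = 1 - \sum_{j=1}^{m} \hat\lambda_j^* (\hat a_{ij}^*)^2 .
$$

Here $\hat\lambda_j^*$ and $\hat a_j^*$ are the m largest eigenvalues of $(R - \Psi)_r$ and their eigenvectors, and $\hat a_{ij}^*$ is the ith element of $\hat a_j^*$.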
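The iteration can be sketched as follows. This is a minimal illustration under stated assumptions: the starting communalities are squared multiple correlations, a fixed number of iterations replaces a convergence check, and negative eigenvalues of the reduced matrix are clipped to zero. The function name `principal_factor` and the test matrix are hypothetical.

```python
import numpy as np

def principal_factor(R, m, n_iter=100):
    """Iterated principal factor estimates of the loadings and specific variances."""
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))    # initial communalities (SMC)
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)             # (R - Psi) at the current step
        eigval, eigvec = np.linalg.eigh(reduced)
        order = np.argsort(eigval)[::-1][:m]      # m largest eigenvalues
        lam = np.clip(eigval[order], 0.0, None)   # guard against negative eigenvalues
        L = eigvec[:, order] * np.sqrt(lam)       # column j is sqrt(lambda_j) * a_j
        h2 = (L ** 2).sum(axis=1)                 # updated communalities
    return L, 1.0 - h2

# Hypothetical one-factor correlation matrix built from known loadings.
L_true = np.array([[0.9], [0.8], [0.7]])
R = L_true @ L_true.T
np.fill_diagonal(R, 1.0)
L_hat, psi_hat = principal_factor(R, m=1)
```

Because eigenvectors are only determined up to sign, `L_hat` may equal `L_true` or its negative.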
A likelihood function is needed to perform any sort of maximum likelihood estimation. Toward this end, we will make the additional assumption that yj ∼ N(μ, Σ), for j = 1, · · · , n.
Similarly, we will assume that f ∼ N(0, I) and εj ∼ N(0, Ψ).
We will make the further restriction that L′Ψ⁻¹L = Δ, where Δ is a diagonal matrix. We need this restriction since the factor loading matrix is not unique.
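Under these normality assumptions the log-likelihood depends on L and Ψ only through Σ = LL′ + Ψ. The sketch below evaluates it for given L and Ψ, with μ replaced by the sample mean. It only evaluates the function: maximizing it needs a numerical optimizer, and the uniqueness restriction is not enforced here. The function name `factor_loglik` and the tiny data set are hypothetical.

```python
import numpy as np

def factor_loglik(Y, L, psi):
    """Gaussian log-likelihood of the rows of Y under Sigma = L L' + diag(psi)."""
    n, p = Y.shape
    Sigma = L @ L.T + np.diag(psi)
    centered = Y - Y.mean(axis=0)                  # mu is estimated by the sample mean
    S_n = centered.T @ centered / n                # covariance with divisor n
    _, logdet = np.linalg.slogdet(Sigma)           # log-determinant, numerically stable
    trace_term = np.trace(np.linalg.solve(Sigma, S_n))
    return -0.5 * n * (p * np.log(2 * np.pi) + logdet + trace_term)

# Tiny hypothetical data set; with L = 0 and psi = 1, Sigma is the identity.
Y = np.array([[1.0, 0.0], [-1.0, 0.0]])
value = factor_loglik(Y, np.zeros((2, 1)), np.ones(2))
```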
Some points to note:
Recall, T (m × m) is an orthogonal matrix, which has the property that $\hat L \hat L' + \hat\Psi = \hat L^* (\hat L^*)' + \hat\Psi$, where $\hat L^* = \hat L T$. This means that the estimated specific variances and communalities cannot be altered by the orthogonal transformation. We previously said that there are an infinite number of choices for T. Some selection criterion is necessary.
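This invariance is easy to check numerically. The sketch below rotates a hypothetical loading matrix by an arbitrary angle and compares the fitted covariance and the communalities before and after; the loadings and the angle are illustrative values only.

```python
import numpy as np

# Hypothetical loadings for p = 4 traits and m = 2 factors.
L = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.1, 0.6]])

theta = np.pi / 6                                  # arbitrary rotation angle
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])    # orthogonal: T @ T.T is the identity

L_star = L @ T
same_fit = np.allclose(L @ L.T, L_star @ L_star.T)
same_communalities = np.allclose((L ** 2).sum(axis=1), (L_star ** 2).sum(axis=1))
```

The individual loadings change under the rotation, which is why a criterion for choosing T is needed.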
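For example, we will find the orthogonal transformation that maximizes the objective function

$$
\sum_{j=1}^{m} \frac{1}{p} \left[ \sum_{i=1}^{p} \left( \frac{\ell_{ij}^{*2}}{h_i^2} \right)^{2} - \frac{\gamma}{p} \left( \sum_{i=1}^{p} \frac{\ell_{ij}^{*2}}{h_i^2} \right)^{2} \right],
$$

where $h_i^2$ is the ith communality. The fractions $\ell_{ij}^{*2}/h_i^2$ are the squared scaled loadings. This scaling gives variables with small communalities more influence.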
Different choices of γ correspond to different orthogonal rotations found in the literature. For example:
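In the usual orthomax parametrization, γ = 1 gives the varimax criterion and γ = 0 gives quartimax (a standard fact, not taken from these notes). The sketch below assumes that parametrization, applied to loadings scaled by the square root of the communality. For m = 2 it searches a grid of rotation angles rather than using an iterative algorithm. The function names, the grid size, and the loadings are hypothetical.

```python
import numpy as np

def orthomax_criterion(L_star, gamma=1.0):
    """Orthomax criterion on communality-scaled loadings (gamma=1 varimax, gamma=0 quartimax)."""
    p = L_star.shape[0]
    h2 = (L_star ** 2).sum(axis=1)            # communalities, assumed nonzero
    x = L_star ** 2 / h2[:, None]             # squared scaled loadings
    per_factor = (x ** 2).sum(axis=0) - (gamma / p) * x.sum(axis=0) ** 2
    return per_factor.sum() / p

def rotation(theta):
    """2 x 2 rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def best_rotation_2d(L, gamma=1.0, n_grid=901):
    """Grid search over angles in [0, pi/2] for a p x 2 loading matrix."""
    angles = np.linspace(0.0, np.pi / 2, n_grid)
    values = [orthomax_criterion(L @ rotation(t), gamma) for t in angles]
    k = int(np.argmax(values))
    return L @ rotation(angles[k]), angles[k], values[k]

# Hypothetical simple-structure loadings, deliberately rotated by 20 degrees.
L_simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]])
L_mixed = L_simple @ rotation(np.deg2rad(20))
L_rot, theta_hat, value = best_rotation_2d(L_mixed)
```

Angles in [0, π/2] suffice because a further 90° rotation only swaps the two columns and flips a sign, which leaves the criterion unchanged. For m > 2 an iterative algorithm would be needed instead of a grid.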
We will spend a little time discussing the regression method, as that is what is usually used.
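Consider the joint distribution of (yj − μ) and fj, assuming multivariate normality, as in the maximum likelihood approach. Then,

$$
\begin{pmatrix} y_j - \mu \\ f_j \end{pmatrix}
\sim N_{p+m} \left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} LL' + \Psi & L \\ L' & I_{m \times m} \end{pmatrix}
\right)
$$

when the m factor model is correct.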
So, the conditional mean of fj given yj − μ is

$$
E(f_j \mid y_j - \mu) = L'(LL' + \Psi)^{-1}(y_j - \mu).
$$

Notice that L′(LL′ + Ψ)⁻¹ is an m × p matrix of regression coefficients.
Then, we will use the estimated conditional mean vector to estimate the factor scores,
$$
\hat f_j = \hat L'(\hat L \hat L' + \hat\Psi)^{-1}(y_j - \bar y).
$$
Alternatively, we could reduce the effect of a possibly incorrect determination of the number of factors m by using S as a substitute for $\hat L \hat L' + \hat\Psi$. Then,

$$
\hat f_j = \hat L' S^{-1}(y_j - \bar y), \quad j = 1, 2, \ldots, n.
$$
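Both versions of the regression method can be sketched with one function. This is a minimal illustration, assuming the rows of `Y` are the observations and that estimates of the loadings (and optionally the specific variances) are already available. The function name `regression_scores` and the example values are hypothetical.

```python
import numpy as np

def regression_scores(Y, L, psi=None):
    """Regression-method factor scores; uses S when psi is None, else L L' + diag(psi)."""
    centered = Y - Y.mean(axis=0)                  # y_j - ybar, one row per observation
    if psi is None:
        cov = np.cov(Y, rowvar=False)              # sample covariance matrix S
    else:
        cov = L @ L.T + np.diag(psi)               # model-implied covariance
    # Row j of the result is (L' cov^{-1} (y_j - ybar))', using the symmetry of cov.
    return np.linalg.solve(cov, centered.T).T @ L

# Hypothetical one-factor example with known L and psi.
L = np.array([[1.0], [0.0]])
psi = np.array([1.0, 1.0])
Y = np.array([[2.0, 0.0], [-2.0, 0.0]])
scores = regression_scores(Y, L, psi)
```

Calling `regression_scores(Y, L)` without `psi` gives the version based on S, which requires S to be nonsingular.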
Suppose that we have performed a factor analysis and are interested in determining if the model appears to be correct. We could consider