Understanding Factor Analysis: Modeling the Covariance Structure of Measured Traits, Study notes of Statistics

University of Missouri (MU) - Columbia Statistics

Factor analysis is a statistical method that describes the covariance relationships among many measured traits in terms of a few underlying, unobservable factors. These notes cover the concept of factor analysis, the orthogonal factor model, and the restrictions imposed on that model. They also discuss three methods for estimating the factor loading matrix: the principal component method, the principal factor method, and the maximum likelihood method.

Typology: Study notes

Pre 2010

Uploaded on 08/19/2009

koofers-user-gy1 🇺🇸


Factor Analysis

Recall that the objective of factor analysis is to describe the covariance relationships among a large number of measured traits using a few linear combinations of underlying, unobservable traits. Notice that, unlike PCA, this type of analysis is model based.

Example: Suppose that we measure the white blood cell count, the circulating antibody volume, the thyroid activity, and the motile sperm count for men who were exposed to radiation from a nuclear plant many years ago. Much of the variation in these variables could be explained by the exact amount of radiation exposure, which can no longer be observed.

Setup

Let y denote the vector of p measurements, let E(y) = μ denote their mean, and let var(y) = Σ. The idea of factor analysis is then to construct a linear model

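$$
y - \mu = Lf + \varepsilon =
\begin{pmatrix}
\ell_{11} f_1 + \ell_{12} f_2 + \cdots + \ell_{1m} f_m \\
\vdots \\
\ell_{p1} f_1 + \ell_{p2} f_2 + \cdots + \ell_{pm} f_m
\end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_p \end{pmatrix}.
$$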

Here, y − μ are the p centered measurements, L is the p × m matrix of factor loadings, f are the unobserved common factors for the population, and ε are the random errors (the variation not accounted for by the common factors).

NOTE: We want m (the number of factors) to be much smaller than p (the number of measured attributes).

Restrictions

We will impose the following restrictions on our model:

$E(\varepsilon) = 0$.
$\mathrm{var}(\varepsilon) = \Psi_{p \times p} = \mathrm{diag}(\psi_1, \ldots, \psi_p)$.
$\varepsilon$ and $f$ are independent.
Typically, $E(f) = 0$ and $\mathrm{var}(f) = I_{m \times m}$. This is called the orthogonal factor model, but these conditions on $f$ need not always hold.

This model imposes a covariance structure on y:

$$
\Sigma = \mathrm{var}(Lf + \varepsilon) = L\,\mathrm{var}(f)\,L' + \Psi = LL' + \Psi.
$$

Since Ψ is diagonal, the off-diagonal elements of LL′ are the covariances $\sigma_{ij}$ in Σ.

This means that $\mathrm{cov}(y_i, y_j) = \sum_{k=1}^{m} \ell_{ik}\ell_{jk}$, so the covariance of y is completely determined by the m factors ($m \ll p$).

Furthermore, $\mathrm{var}(y_i) = \sum_{k=1}^{m} \ell_{ik}^2 + \psi_i$, where $\psi_i$ is called the specific variance and the summation term, $h_i^2 = \sum_{k=1}^{m} \ell_{ik}^2$, is called the ith communality: the part of the variance of the ith measurement that is shared with the other variables via the m common factors.
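As a numerical check of this covariance structure, the sketch below simulates data from a one-factor orthogonal model and compares the sample covariance with LL′ + Ψ. It assumes NumPy; the loadings and specific variances are hypothetical values chosen so that every var(y_i) equals 1, not values taken from these notes.

```python
import numpy as np

# Hypothetical one-factor model (p = 4, m = 1); values chosen for illustration.
L = np.array([[0.9], [0.8], [0.7], [0.6]])   # p x m loading matrix
psi = np.array([0.19, 0.36, 0.51, 0.64])     # specific variances psi_i
mu = np.zeros(4)                             # mean vector

# Implied covariance structure: Sigma = L L' + Psi.
Sigma = L @ L.T + np.diag(psi)

# Communalities h_i^2 = sum_k l_ik^2, so var(y_i) = h_i^2 + psi_i.
communality = (L ** 2).sum(axis=1)

# Simulate y = mu + L f + eps with f ~ N(0, I_m) and eps ~ N(0, Psi).
rng = np.random.default_rng(0)
n = 200_000
f = rng.standard_normal((n, 1))
eps = rng.standard_normal((n, 4)) * np.sqrt(psi)
y = mu + f @ L.T + eps

# For large n the sample covariance should be close to Sigma.
S = np.cov(y, rowvar=False)
print(np.round(Sigma, 2))
print(np.round(S, 2))
```

For instance, cov(y_1, y_2) = ℓ_11 ℓ_21 = 0.9 × 0.8 = 0.72, and each communality plus specific variance equals 1, so in this example Σ is also a correlation matrix.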

Principal Component Method

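Recall that we wrote the spectral decomposition as

$$
\begin{aligned}
\Sigma &= \lambda_1 a_1 a_1' + \cdots + \lambda_p a_p a_p' = A'\Lambda A \\
&= \sum_{i=1}^{m} \lambda_i a_i a_i' + \sum_{i=m+1}^{p} \lambda_i a_i a_i' \\
&= \sum_{i=1}^{m} \ell_i \ell_i' + \sum_{i=m+1}^{p} \lambda_i a_i a_i',
\end{aligned}
$$

with the eigenvalues ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$. Here, $\ell_i = \sqrt{\lambda_i}\, a_i$. Note that the second term is not diagonal in general.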

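We will assume that $\psi_i \approx \sigma_{ii} - \sum_{j=1}^{m} \ell_{ij}^2 = \sigma_{ii} - \sum_{j=1}^{m} \lambda_j a_{ji}^2$, where $a_{ji}$ denotes the ith element of $a_j$. Then $\Sigma \approx LL' + \Psi$ with $L = [\ell_1, \ldots, \ell_m]$.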

So we will estimate L and Ψ by computing the eigenvalues and eigenvectors of S or R (the sample covariance or correlation matrix) and substituting them for $\lambda_i$ and $a_i$.

Observations

The estimated factor loadings do not change as the number of factors increases: the first columns of $\hat{L}$ are the same whether m or more factors are retained.
The diagonal elements of $\hat{L}\hat{L}' + \hat{\Psi}$ are equal to the diagonal elements of S or R, but the covariances may not be exactly reproduced.
We will select m to make the off-diagonal elements close to the values in S (that is, to make the off-diagonal elements of $S - (\hat{L}\hat{L}' + \hat{\Psi})$ small).
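A minimal NumPy sketch of the principal component method, assuming the input is a covariance or correlation matrix such as S or R. The function name pc_factor and the example matrix (the one-factor correlation matrix implied by hypothetical loadings 0.9, 0.8, 0.7, 0.6) are illustrative choices, not part of these notes.

```python
import numpy as np

def pc_factor(S, m):
    """Principal component estimates of L (p x m) and psi (length p) from S or R."""
    eigval, eigvec = np.linalg.eigh(S)        # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]          # reorder so lambda_1 >= ... >= lambda_p
    eigval, eigvec = eigval[order], eigvec[:, order]
    L_hat = eigvec[:, :m] * np.sqrt(eigval[:m])        # column j is sqrt(lambda_j) a_j
    psi_hat = np.diag(S) - (L_hat ** 2).sum(axis=1)    # psi_i = s_ii - sum_j l_ij^2
    return L_hat, psi_hat

# Correlation matrix implied by hypothetical loadings (0.9, 0.8, 0.7, 0.6).
ell = np.array([0.9, 0.8, 0.7, 0.6])
R = np.outer(ell, ell)
np.fill_diagonal(R, 1.0)

L1, psi1 = pc_factor(R, 1)
L2, psi2 = pc_factor(R, 2)

# Diagonal residuals are zero; off-diagonal residuals guide the choice of m.
resid1 = R - (L1 @ L1.T + np.diag(psi1))
resid2 = R - (L2 @ L2.T + np.diag(psi2))
print(np.round(resid1, 3))
print(np.round(resid2, 3))
```

Eigenvector signs are arbitrary, so a column of $\hat{L}$ may come out negated. The first column of L2 equals L1, matching the first observation in the list above.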

Principal Factor Method

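Consider modelling the correlation matrix, $R = LL' + \Psi$. Then,

$$
LL' = R - \Psi =
\begin{pmatrix}
h_1^2 & r_{12} & \cdots & r_{1p} \\
r_{21} & h_2^2 & \cdots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \cdots & h_p^2
\end{pmatrix},
$$

where $h_i^2 = 1 - \psi_i$ (the communality).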

Suppose that initial estimates of the communalities, $(h_1^*)^2, (h_2^*)^2, \ldots, (h_p^*)^2$, are available. For example, we might regress each trait on all the others and use the resulting $R^2$ (the squared multiple correlation) as $h^2$. This can be done using the PRIORS=SMC option in SAS's PROC FACTOR.

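Then, the estimate of R − Ψ at step r is

$$
(R - \Psi)_r =
\begin{pmatrix}
(h_1^*)^2 & r_{12} & \cdots & r_{1p} \\
r_{21} & (h_2^*)^2 & \cdots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \cdots & (h_p^*)^2
\end{pmatrix}
\approx L_r^* (L_r^*)',
$$

where

$$
L_r^* = \left[ \sqrt{\hat\lambda_1^*}\, \hat a_1^*, \ldots, \sqrt{\hat\lambda_m^*}\, \hat a_m^* \right]
\quad\text{and}\quad
\hat\psi_{i,r} = 1 - \sum_{j=1}^{m} \hat\lambda_j^* (\hat a_{ji}^*)^2,
$$

with $\hat\lambda_j^*$ and $\hat a_j^*$ the eigenvalues and eigenvectors of $(R - \Psi)_r$. The updated communalities $1 - \hat\psi_{i,r}$ are then placed on the diagonal for step r + 1, and the process is repeated until the estimates stabilize.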
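The step-r update, repeated until the communalities settle, can be sketched in NumPy as follows. It assumes the input is a correlation matrix R and starts from squared multiple correlations computed with the identity $1 - 1/(R^{-1})_{ii}$, which gives the same value as the regression $R^2$. The function name, iteration limit, and tolerance are hypothetical choices; this is not SAS's PROC FACTOR implementation.

```python
import numpy as np

def principal_factor(R, m, max_iter=500, tol=1e-10):
    """Iterated principal factor estimates of L and psi from a correlation matrix R."""
    # Initial communalities: squared multiple correlations, 1 - 1 / (R^{-1})_ii.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)             # (R - Psi)_r
        eigval, eigvec = np.linalg.eigh(reduced)
        top = np.argsort(eigval)[::-1][:m]        # indices of the m largest eigenvalues
        lam, a = eigval[top], eigvec[:, top]
        # Clipping guards against negative eigenvalues of the reduced matrix.
        L = a * np.sqrt(np.clip(lam, 0.0, None))
        new_h2 = (L ** 2).sum(axis=1)             # 1 - psi_{i,r}
        done = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if done:
            break
    return L, 1.0 - h2

# Correlation matrix of an exact one-factor model (hypothetical loadings).
ell = np.array([0.9, 0.8, 0.7, 0.6])
R = np.outer(ell, ell)
np.fill_diagonal(R, 1.0)

L_pf, psi_pf = principal_factor(R, 1)
print(np.round(np.abs(L_pf[:, 0]), 3))   # compare with ell
print(np.round(psi_pf, 3))               # compare with 1 - ell**2
```

Because this R comes from an exact one-factor model, the iteration should recover the generating loadings (up to sign) and specific variances, whereas the squared multiple correlations it starts from understate the communalities.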

Maximum Likelihood Estimation

A likelihood function is needed to perform any sort of maximum likelihood estimation. Toward this end, we will make the additional assumption that $y_j \sim N(\mu, \Sigma)$ for $j = 1, \ldots, n$.

Similarly, we will assume that $f_j \sim N(0, I)$ and $\varepsilon_j \sim N(0, \Psi)$.

We will make the further restriction that $L'\Psi^{-1}L = \Delta$, where Δ is a diagonal matrix. We need this restriction because the factor loading matrix is not unique: for any orthogonal matrix T, LT gives the same LL′ as L.

Some points to note:

Finding the maximum of the likelihood for this data is quite “expensive” computationally.
We will use other methods for exploratory data analysis.
Likelihood ratio tests could be used for testing hypotheses in this framework.
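These notes do not say how the likelihood is maximized. One standard option, swapped in here and not described in these notes, is the EM algorithm for factor analysis. The NumPy sketch below runs EM on a sample covariance matrix S (divisor n) and records the log-likelihood (up to an additive constant) so you can see that it never decreases. It then imposes $L'\Psi^{-1}L = \Delta$ by rotating with the eigenvectors of $L'\Psi^{-1}L$. The function names, starting values, iteration count, and simulated two-factor data are all hypothetical.

```python
import numpy as np

def fa_loglik(S, L, psi, n):
    """Normal log-likelihood, up to a constant, with Sigma = L L' + Psi."""
    Sigma = L @ L.T + np.diag(psi)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * n * (logdet + np.trace(np.linalg.solve(Sigma, S)))

def fa_em(S, m, n, n_iter=300):
    """EM estimates of L and psi; S is the covariance matrix with divisor n."""
    # Starting values: principal-component-style loadings, half of each variance.
    eigval, eigvec = np.linalg.eigh(S)
    L = eigvec[:, -m:] * np.sqrt(eigval[-m:])
    psi = 0.5 * np.diag(S)
    history = [fa_loglik(S, L, psi, n)]
    for _ in range(n_iter):
        Sigma = L @ L.T + np.diag(psi)
        beta = np.linalg.solve(Sigma, L).T              # m x p: L' Sigma^{-1}
        # E-step: averaged conditional moments of the factors given the data.
        Eff = np.eye(m) - beta @ L + beta @ S @ beta.T  # average of E(f f' | y)
        Eyf = S @ beta.T                                # average of (y - ybar) E(f | y)'
        # M-step: update the loadings, then the specific variances.
        L = Eyf @ np.linalg.inv(Eff)
        psi = np.diag(S - L @ Eyf.T)
        history.append(fa_loglik(S, L, psi, n))
    # Impose L' Psi^{-1} L = Delta (diagonal); rotating by Q leaves L L' unchanged.
    M = L.T @ (L / psi[:, None])
    _, Q = np.linalg.eigh(M)
    return L @ Q, psi, np.array(history)

# Hypothetical two-factor data for illustration.
rng = np.random.default_rng(1)
L_true = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
                   [0.1, 0.8], [0.0, 0.7], [0.2, 0.6]])
psi_true = 1.0 - (L_true ** 2).sum(axis=1)
n = 2000
Y = (rng.standard_normal((n, 2)) @ L_true.T
     + rng.standard_normal((n, 6)) * np.sqrt(psi_true))
S = np.cov(Y, rowvar=False, bias=True)

L_ml, psi_ml, hist = fa_em(S, 2, n)
print(np.round(L_ml, 2))
print(np.round(psi_ml, 2))
```

Each EM iteration is cheap but convergence can be slow, which fits the remark above that maximizing this likelihood is computationally expensive. The final rotation only enforces the uniqueness restriction and does not change the likelihood.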

Factor Rotation:

Recall that if T is an m × m orthogonal matrix, then $\hat L \hat L' + \hat\Psi = \hat L^* (\hat L^*)' + \hat\Psi$, where $\hat L^* = \hat L T$. This means that the estimated specific variances and communalities cannot be altered by an orthogonal transformation. We previously said that there are infinitely many choices for T, so some selection criterion is necessary.

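For example, we will find the orthogonal transformation that maximizes the objective function

$$
\sum_{j=1}^{m} \left[ \frac{1}{p} \sum_{i=1}^{p} \left( \frac{\ell_{ij}^{*}}{h_i} \right)^{4} - \frac{\gamma}{p^{2}} \left( \sum_{i=1}^{p} \left( \frac{\ell_{ij}^{*}}{h_i} \right)^{2} \right)^{2} \right].
$$

The ratios $\ell_{ij}^* / h_i$ are called the scaled loadings. This scaling gives variables with small communalities more influence.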

Different choices of γ correspond to different orthogonal rotations found in the literature. For example:

Varimax, γ = 1.
Quartimax, γ = 0.
Equimax, γ = m/2.
Parsimax, $\gamma = \dfrac{p(m-1)}{p+m-2}$.
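A sketch of the orthomax family using Kaiser's pairwise (planar) rotation scheme: each sweep rotates every pair of columns of the scaled loadings by the angle that maximizes the objective above within that plane, so the objective never decreases. This algorithm choice, the function names, and the example loadings are assumptions for illustration; SAS and other packages may use different (for example, SVD-based) iterations.

```python
import numpy as np

def orthomax_criterion(L, gamma=1.0):
    """Objective above, computed on squared scaled loadings (l_ij / h_i)^2."""
    p = L.shape[0]
    x = L ** 2 / (L ** 2).sum(axis=1, keepdims=True)
    return np.sum((x ** 2).sum(axis=0) / p - gamma * x.sum(axis=0) ** 2 / p ** 2)

def orthomax(L, gamma=1.0, max_sweeps=100, tol=1e-12):
    """Return rotated loadings L @ T and the orthogonal matrix T."""
    p, m = L.shape
    h = np.sqrt((L ** 2).sum(axis=1))          # h_i, square root of the communality
    B = L / h[:, None]                         # scaled loadings
    T = np.eye(m)
    for _ in range(max_sweeps):
        largest = 0.0
        for j in range(m - 1):
            for k in range(j + 1, m):
                u = B[:, j] ** 2 - B[:, k] ** 2
                v = 2.0 * B[:, j] * B[:, k]
                su, sv = u.sum(), v.sum()
                num = 2.0 * np.sum(u * v) - 2.0 * gamma * su * sv / p
                den = np.sum(u ** 2 - v ** 2) - gamma * (su ** 2 - sv ** 2) / p
                phi = np.arctan2(num, den) / 4.0    # best angle in the (j, k) plane
                c, s = np.cos(phi), np.sin(phi)
                G = np.array([[c, -s], [s, c]])
                B[:, [j, k]] = B[:, [j, k]] @ G
                T[:, [j, k]] = T[:, [j, k]] @ G
                largest = max(largest, abs(phi))
        if largest < tol:
            break
    return B * h[:, None], T

# Hypothetical simple-structure loadings, deliberately rotated by 30 degrees.
L_simple = np.array([[0.9, 0], [0.8, 0], [0.7, 0], [0, 0.9], [0, 0.8], [0, 0.7]])
theta = np.pi / 6
R30 = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
L0 = L_simple @ R30

L_rot, T = orthomax(L0, gamma=1.0)     # gamma = 1 is varimax
print(orthomax_criterion(L0), orthomax_criterion(L_rot))
```

In this constructed example the varimax objective rises from 0.125 to its maximum of 0.5 and the simple structure is recovered (up to column signs), while the communalities and LL′ stay unchanged, as the invariance argument above requires.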

The Regression Method

We will spend a little time discussing the regression method for estimating the factor scores $f_j$, as it is the method most commonly used.

Consider the joint distribution of $(y_j - \mu)$ and $f_j$, assuming multivariate normality as in the maximum likelihood approach. Then,

$$
\begin{pmatrix} y_j - \mu \\ f_j \end{pmatrix}
\sim N_{p+m}\left( 0,\ \Sigma^* = \begin{bmatrix} LL' + \Psi & L \\ L' & I_{m \times m} \end{bmatrix} \right)
$$

when the m-factor model is correct.

So, the conditional mean of $f_j$ given $y_j - \mu$ is

$$
E(f_j \mid y_j - \mu) = L'(LL' + \Psi)^{-1}(y_j - \mu).
$$

Notice that $L'(LL' + \Psi)^{-1}$ is an m × p matrix of regression coefficients.

Then, we will use the estimated conditional mean vector to estimate the factor scores:

$$
\hat f_j = \hat L'(\hat L \hat L' + \hat\Psi)^{-1}(y_j - \bar y).
$$

Alternatively, we could reduce the effect of a possibly incorrect choice of the number of factors m by using S in place of $\hat L \hat L' + \hat\Psi$. Then,

$$
\hat f_j = \hat L' S^{-1}(y_j - \bar y), \quad j = 1, 2, \ldots, n.
$$
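Both score estimators can be sketched in NumPy as below. The function name is hypothetical, and the example plugs the true loadings of a simulated one-factor model in place of fitted $\hat L$ and $\hat\Psi$, purely so that the scores can be compared with the factors that generated the data.

```python
import numpy as np

def regression_scores(Y, L_hat, psi_hat, use_S=False):
    """Row j is f_hat_j' = [L' (L L' + Psi)^{-1} (y_j - ybar)]', or with S instead."""
    Yc = Y - Y.mean(axis=0)                      # rows are (y_j - ybar)'
    if use_S:
        Sigma = np.cov(Y, rowvar=False)          # sample covariance S
    else:
        Sigma = L_hat @ L_hat.T + np.diag(psi_hat)
    coef = np.linalg.solve(Sigma, L_hat).T       # m x p matrix L' Sigma^{-1}
    return Yc @ coef.T                           # n x m matrix of factor scores

# Hypothetical one-factor data; the true parameters stand in for estimates.
rng = np.random.default_rng(2)
L = np.array([[0.9], [0.8], [0.7], [0.6]])
psi = 1.0 - (L ** 2).sum(axis=1)
n = 5000
f = rng.standard_normal((n, 1))
Y = f @ L.T + rng.standard_normal((n, 4)) * np.sqrt(psi)

F_model = regression_scores(Y, L, psi)
F_S = regression_scores(Y, L, psi, use_S=True)
print(np.corrcoef(F_model[:, 0], f[:, 0])[0, 1])
print(np.corrcoef(F_S[:, 0], f[:, 0])[0, 1])
```

As conditional means, regression scores are shrunk toward zero: their sample variance is below 1 even though var(f) = 1.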

Assessing the Model

Suppose that we have performed a factor analysis and want to determine whether the model appears to be correct. We could consider the following:

Plots.
Check for outliers (recall that the $f_j$ are i.i.d. from an $N(0, I_{m \times m})$ population when the model is true).
Check for multivariate normality.
Use univariate tests for normality to check the factor scores.
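A crude NumPy screen along these lines is sketched below. It standardizes each column of a score matrix, reports sample skewness and excess kurtosis (both near 0 for normal data), and flags rows with any |z| above a cutoff. The function name and the cutoff of 3 are hypothetical choices; this is a quick diagnostic, not a formal normality test such as Shapiro–Wilk.

```python
import numpy as np

def screen_scores(F, z_cut=3.0):
    """Skewness, excess kurtosis, and outlying rows for an n x m score matrix."""
    Z = (F - F.mean(axis=0)) / F.std(axis=0)       # standardize each factor
    skew = (Z ** 3).mean(axis=0)
    excess_kurt = (Z ** 4).mean(axis=0) - 3.0
    outliers = np.flatnonzero(np.any(np.abs(Z) > z_cut, axis=1))
    return skew, excess_kurt, outliers

# Hypothetical scores: standard normal, with one planted outlier in row 0.
rng = np.random.default_rng(3)
F = rng.standard_normal((500, 2))
F[0] = [8.0, -8.0]
skew, kurt, rows = screen_scores(F)
print(np.round(skew, 2), np.round(kurt, 2), rows)
```

A single extreme score is enough to push both moments away from 0, so flagged rows are worth inspecting before reading much into the skewness and kurtosis values.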
