




































Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Material Type: Exam; Class: APPLD MULTIVAR STAT; Subject: Statistics ; University: University of Nebraska - Lincoln; Term: Unknown 2005;
Typology: Exams
1 / 44
This page cannot be seen from the preview
Don't miss anything!





































6. Factor Analysis (FA)
6.1 Objectives of FA
The same as with PCA:
Discover the “true dimension” of the data
Try to interpret the “new” variables
With FA, there is more emphasis with the interpretation
of the new variables. These new variables are thought
of as “underlying factors” which control the originally
measured variables.
PCA is more concerned with explaining the variance
structure of the data; FA is more concerned with
explaining the correlation and covariance structure of
data.
6.2 Caveats
If the original set of variables are already uncorrelated,
FA will not help.
Some statisticians do not believe that FA is a VALID
statistical technique. The reason is because the non-
uniqueness of FA solutions. A researcher could
continue to change the FA solutions (rotate factors)
until a solution is found that the researcher likes! See
some of the quotes presented on p. 148 of Johnson
6.3 Some History of Factor Analysis
Spearman (1904) originated FA. See Johnson (1998, p.
148-9) for a discussion. Below is a summary.
Test scores for a student can be modeled by the
following equations:
Classics =
1
f +
1
French =
2
f +
2
Music =
6
f +
6
Where
Classics, French, … , Music are original (actually
measurable) variables
f = random component common to all original
variables
j
= random component specific for the j
th
original variable
j
= loading on f specific to each original variable
(not eigenvalues!)
Summary:
Spearman believed that a child’s performance on a
particular test was the sum of an overall intelligence
component and a subject specific component
Note that f and j
are not directly measurable
x 1
1
11
f
1
12
f
2
1m
f
m
1
x 2
2
21
f
1
22
f
2
2m
f
m
2
x p
p
p
f
1
p
f
2
pm
f
m
p
where
1.x
j
is the j
th
random variable
2.f
k
’s i.i.d. (0,1) for k=1,…,m and are called common
factors. These are unobservable random
variables (the “equivalent” in regression are
observable). These factors are uncorrelated with
each other.
j
’s ~ independent (0,
j
) for j=1,…,p and are called
specific factors. These are unobservable random
variables.
j
is the specific variance of
j
5.f k
and j
are independent for all k=1,…,m and j=1,
…,p
jk
measures the contribution of the k
th
factor to the
the j
th
original variable. These are called factor
loadings. They will help us interpret the common
factors! These ’s are NOT eigenvalues (Johnson
uses this notation).
Summary:
We assume that the x 1
,…,x
p
come about through the
FA model structure (p. 6. 4 ).
Each x j
variable is made up of factors common to all –
f
k
’s for k=1,…,m different factors.
Each x j
variable has a factor that is specific to all -
j
for j=1,…,p.
Examine the factor loadings to determine the
importance of a common factor to a particular
independent variable.
Notes:
statements above!
distributed”.
k
and
j
zero mean assumption can be made
without loss of generality (WLOG).
k
variance 1 assumption can be made WLOG.
about is the f
k
and
j
are independent.
j
can be assumed to be 0. We subtract
the means from the x
j
’s to accomplish this. Johnson
(1998) decides to continue to call the FA model:
x
j
j
f
1
j
f
2
jm
f
m
j
for j=1,…,p
where the x j
’s are “mean adjusted.” It may have
been better for him to use something like j j j
x x
the left hand side of the equation. Thus, we could
write a model as
z
j
j
f
1
j
f
2
jm
f
m
j
for j=1,…,p
instead. In matrix form, this becomes
p 1 p m m 1
p 1
z f
where z = [z
1
, z
2
,…, z
p
] and Cov( z ) = P. Note that
the
jk
’s will not be the same between using x
j
or z
j
Be careful with how Johnson uses x
j
and z
j
in the
text!
6.5 Factor Analysis Equations
Notes:
1.Suppose A is a matrix of constants and y be a vector of
random variables. Then Cov( Ay ) = A Cov( y ) A . This can
be shown using standard properties of covariances and
variances (i.e., Var(ay
1
+by
2
) = a
2
Var(y
1
) + b
2
Var(y
2
2abCov(y
1
,y
2
) for constants a and b,…. – see p. 179 of
Mood, Graybill, Boes (1974)).
2.Suppose a and b are independent random vectors. Then
Cov( a + b ) = Cov( a ) + Cov( b ).
From the matrix form of the FA model,
= Cov( x )
= Cov( f + )
= Cov( f ) + Cov() since f and are independent
= Cov( f ) + Cov()
= I + since f ~( 0 , I ) and ~( 0 , )
Instead of trying to find if , f , and exist such that
x = f + , we try to find a and such that
= + . These are the factor analysis equations!
Notes:
1.If = + , then the common factors (f k
’s) explain
the covariances between the original variables exactly
since is a diagonal matrix. This can be seen from
the matrices below.
11 12 1p 11 12 1m 11 12 1m 1
12 22 2p 21 22 2m 21 22 2m 2
1p 2p pp p1 p2 pm p1 p2 pm p
0 0
0 0
0 0
m m m
2
1k 1k 2k 1k pk
k 1 k 1 k 1
11 12 1p 1
m m m
2
12 22 2p 1k 2k 2k 2k pk 2
k 1 k 1 k 1
1p 2p pp p m m m
2
1k pk pk 2k pk
k 1 k 1 k 1
0 0
0 0
0 0
Usually, P is used in place of for the FA. If standardized
data is used, note that P would be the same as ! This
implies that is a matrix of correlations between the z j
’s
(standardized data) and the f k
’s. Then
1.Corr(z
j
, f
k
jk
m
2
jk j
k 1
=1 since the diagonal elements of a correlation
matrix are 1
3.The communality of the j
th
standardized variable is
m
2
jk
k 1
Make sure you can show these statements above by
writing out P = + !
Nonuniqueness of the common factors
Let T be an orthogonal matrix. This means the individual
column vectors within the matrix are orthogonal to each
other. Note that T T = TT = I.
Example: Let x ~ 2
We have shown previously that the eigenvectors
with length 1 from are orthogonal to each other. If
these eigenvectors are put as the columns of a
matrix, the result is an orthogonal matrix:
Note that T T = I.
If m>1, the factor loading matrix () is not unique!
Notice that
= TT + since TT = I
= ( T )( T) + since ( AB )= B A for two
matrices A and B
) + where
Therefore, if is a loading matrix, T also is a loading
matrix! There are an infinite number of orthogonal
matrices.
Notice from the FA model that the same result can be
seen:
x = f +
= TT f +
= ( T )( T f ) +
f
where
= T and f
= T f.
Thus, the loading matrix is not unique.
Therefore, we can form p(p+1)/2 equations with
mp+p = p(m+1) unknowns. Thus, more than one
solution may exist. For example, suppose p=4 and
m = 2. Then we have 10 equations with 12
unknowns.
Below are the matrices written out:
m m m
2
1k 1 1k 2k 1k pk
k 1 k 1 k 1
12 1p
m m m
2
12 2p 1k 2k 2k 2 2k pk
k 1 k 1 k 1
1p 2p m m m
2
1k pk pk 2k pk p
k 1 k 1 k 1
1
1
1
Example: p=3, m=1 is an example with 6 equations and
6 unknowns.
2
12 13 11 1 11 21 11 31
2
12 23 11 21 21 2 21 31
2
13 23 11 31 21 31 31 3
1
1
1
Then 1 =
2
11 1
2
21 2
2
31 3
12
11
21
13
11
31
, and
23
21
31
If the correlations are known, then values for
11
12
13
1
2
, and
3
can be found - not necessarily for
all possible P.
See Johnson (1998) example on p. 154.
6.7 Choosing the Appropriate Number of Factors
Need to decide on an initial guess for m (number of
common factors) before solving the factor equations.
Performing a PCA and finding the number of principal
components is often done to determine the initial m.
When making final decisions for a number of common
factors, here are a few guidelines:
Do not include trivial common factors; i.e., factors that
are accounting for only one original variable.
See Section 6.8 - AIC, SBC, and LRTs
6.8 Computer Solutions of the Factor Analysis
Equations
There are a number of ways for finding solutions to the FA
equations. Below is one of them.
Maximum Likelihood
See the ML supplement for an introduction to maximum
likelihood estimation and likelihood ratio tests.
where
is the log of the likelihood function
evaluated at the parameter estimates.
The best number of factors to use correspond to the
smallest AIC. A few different choices for number of
factors would be needed to use this criteria.
Schwarz’s Bayesian criterion (SBC)
Similar to AIC, but with a different penalty function.
Likelihood ratio test (LRT) for the number of factors
See the ML supplement for an introduction to
likelihood ratio tests.
SAS produces two LRTs:
o
:There are no common factors ( P = I )
a
:There is at least one common factor ( P I )
Section 5.8 discusses this test
-2log() =
2p 5
N 1 log(| R |)
can be
approximated by a
2
p(p 1) / 2
. We can reject H o
if
-2log() is larger than the 1- quantile from a
2
p(p 1) / 2
distribution.
o
:m common factors are sufficient
(i.e., = + where is pm)
a
:More common factors are needed
Johnson and Wichern (1998, p.537-8) discusses
this test in detail. The test statistic is
–2log() =
o
Nlog
where
and
o
(where
and
are appropriate estimates of and ).
Using a Bartlett correction, the adjusted statistic
o
(N 1 (2p 4m 5) / 6)Nlog
can be approximated by a 2
2
[(p m) p m] / 2
. We can
reject H
o
if -2log() is larger than the 1- quantile
from a
2
2
[(p m) p m] / 2
distribution.
that SAS gives us more output that what we actually
need to use so some of it will not be discussed.
data set1;
infile 'c:/chris/unl/stat873/chapter 5/goblet.txt';
input goblet x1 x2 x3 x4 x5 x6;
w1 = x1/x3;
w2 = x2/x3;
w4 = x4/x3;
w5 = x5/x3;
w6 = x6/x3;
run ;
title2 'FA using ML - 2 factors';
proc factor data=set1 method=ml nfactor= 2 heywood
rotate=varimax score out=scores preplot plot
residual;
var w1 w2 w4 w5 w6;
run ;
The FACTOR Procedure
Initial Factor Method: Maximum Likelihood
Prior Communality Estimates: SMC
w1 w2 w4 w5 w
0.79179653 0.79285454 0.79995615 0.73303498 0.
Preliminary Eigenvalues: Total = 15.7398852 Average = 3.
Eigenvalue Difference Proportion Cumulative
1 12.7344277 9.3386641 0.8091 0.
2 3.3957637 2.9360802 0.2157 1.
3 0.4596834 0.6744991 0.0292 1.
4 -0.2148157 0.4203582 -0.0136 1.
5 -0.6351739 -0.0404 1.
2 factors will be retained by the NFACTOR criterion.
Iteration Criterion Ridge Change Communalities
1 0.2868406 0.0000 0.2000 0.81121 0.83253 1.
0.68477 0.
2 0.2323764 0.0000 0.1101 0.82712 0.90451 1.
0.69009 0.
3 0.2313655 0.0000 0.0169 0.84401 0.88941 1.
0.69483 0.
4 0.2312003 0.0000 0.0073 0.83793 0.89672 1.
0.69151 0.
5 0.2311715 0.0000 0.0029 0.84060 0.89381 1.
0.69282 0.
6 0.2311662 0.0000 0.0013 0.83948 0.89508 1.
0.69226 0.
7 0.2311653 0.0000 0.0005 0.83995 0.89455 1.
0.69250 0.
Convergence criterion satisfied.
Initial Factor Method: Maximum Likelihood
Significance Tests Based on 25 Observations
Pr >
Test DF Chi-Square ChiSq
H0: No common factors 10 79.6675 <.
HA: At least one common factor
H0: 2 Factors are sufficient 1 4.6618 0.
HA: More factors are needed
Chi-Square without Bartlett's Correction 5.
Akaike's Information Criterion 3.
Schwarz's Bayesian Criterion 2.
Tucker and Lewis's Reliability Coefficient 0.
Squared Canonical Correlations
Factor1 Factor
1.0000000 0.
Eigenvalues of the Weighted Reduced Correlation
Matrix: Total = 10.7559159 Average = 2.
Eigenvalue Difference Proportion Cumulative
1 Infty Infty
2 10.7559164 10.3941898 1.0000 1.
3 0.3617266 0.2456236 0.0336 1.
4 0.1161031 0.5939333 0.0108 1.
5 -0.4778303 -0.0444 1.
Factor Pattern
Factor1 Factor
w1 0.52895 0.
w2 0.46502 0.
w4 1.00000 -0.
w5 0.71823 0.
w6 0.60474 -0.
Variance Explained by Each Factor
Factor Weighted Unweighted