Factor Analysis - Applied Multivariable Statistics - Lecture Notes | STAT 873, Exams of Statistics

Material Type: Exam; Class: APPLD MULTIVAR STAT; Subject: Statistics ; University: University of Nebraska - Lincoln; Term: Unknown 2005;

Typology: Exams

Pre 2010

Uploaded on 08/31/2009

koofers-user-wo0
koofers-user-wo0 🇺🇸

10 documents

1 / 44

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
6. Factor Analysis (FA)
6.1 Objectives of FA
The same as with PCA:
Discover the “true dimension” of the data
Try to interpret the “new” variables
With FA, there is more emphasis with the interpretation
of the new variables. These new variables are thought
of as “underlying factors” which control the originally
measured variables.
PCA is more concerned with explaining the variance
structure of the data; FA is more concerned with
explaining the correlation and covariance structure of
data.
6.2 Caveats
If the original set of variables are already uncorrelated,
FA will not help.
Some statisticians do not believe that FA is a VALID
statistical technique. The reason is because the non-
uniqueness of FA solutions. A researcher could
continue to change the FA solutions (rotate factors)
until a solution is found that the researcher likes! See
some of the quotes presented on p. 148 of Johnson
(1998).
2005 Christopher R. Bilder
6.1
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25
pf26
pf27
pf28
pf29
pf2a
pf2b
pf2c

Partial preview of the text

Download Factor Analysis - Applied Multivariable Statistics - Lecture Notes | STAT 873 and more Exams Statistics in PDF only on Docsity!

6. Factor Analysis (FA)

6.1 Objectives of FA

The same as with PCA:

 Discover the “true dimension” of the data

 Try to interpret the “new” variables

With FA, there is more emphasis with the interpretation

of the new variables. These new variables are thought

of as “underlying factors” which control the originally

measured variables.

PCA is more concerned with explaining the variance

structure of the data; FA is more concerned with

explaining the correlation and covariance structure of

data.

6.2 Caveats

 If the original set of variables are already uncorrelated,

FA will not help.

 Some statisticians do not believe that FA is a VALID

statistical technique. The reason is because the non-

uniqueness of FA solutions. A researcher could

continue to change the FA solutions (rotate factors)

until a solution is found that the researcher likes! See

some of the quotes presented on p. 148 of Johnson

6.3 Some History of Factor Analysis

Spearman (1904) originated FA. See Johnson (1998, p.

148-9) for a discussion. Below is a summary.

Test scores for a student can be modeled by the

following equations:

Classics = 

1

f + 

1

French = 

2

f + 

2

Music = 

6

f + 

6

Where

Classics, French, … , Music are original (actually

measurable) variables

f = random component common to all original

variables

j

= random component specific for the j

th

original variable

j

= loading on f specific to each original variable

(not eigenvalues!)

Summary:

 Spearman believed that a child’s performance on a

particular test was the sum of an overall intelligence

component and a subject specific component

 Note that f and  j

are not directly measurable

x 1

1

11

f

1

12

f

2

1m

f

m

1

x 2

2

21

f

1

22

f

2

2m

f

m

2

x p

p

p

f

1

p

f

2

pm

f

m

p

where

1.x

j

is the j

th

random variable

2.f

k

’s i.i.d. (0,1) for k=1,…,m and are called common

factors. These are unobservable random

variables (the “equivalent” in regression are

observable). These factors are uncorrelated with

each other.

j

’s ~ independent (0,

j

) for j=1,…,p and are called

specific factors. These are unobservable random

variables.

j

is the specific variance of 

j

5.f k

and  j

are independent for all k=1,…,m and j=1,

…,p

jk

measures the contribution of the k

th

factor to the

the j

th

original variable. These are called factor

loadings. They will help us interpret the common

factors! These ’s are NOT eigenvalues (Johnson

uses this notation).

Summary:

 We assume that the x 1

,…,x

p

come about through the

FA model structure (p. 6. 4 ).

 Each x j

variable is made up of factors common to all –

f

k

’s for k=1,…,m different factors.

 Each x j

variable has a factor that is specific to all - 

j

for j=1,…,p.

 Examine the factor loadings to determine the

importance of a common factor to a particular

independent variable.

Notes:

  1. MAKE SURE that you understand ALL of these

statements above!

  1. i.i.d. stands for “independently and identically

distributed”.

  1. The f

k

and 

j

zero mean assumption can be made

without loss of generality (WLOG).

  1. The f

k

variance 1 assumption can be made WLOG.

  1. The main assumption that we need to be concerned

about is the f

k

and 

j

are independent.

6) BE CAREFUL WITH THE NOTATION NEXT:

WLOG, 

j

can be assumed to be 0. We subtract

the means from the x

j

’s to accomplish this. Johnson

(1998) decides to continue to call the FA model:

x

j

j

f

1

j

f

2

jm

f

m

j

for j=1,…,p

where the x j

’s are “mean adjusted.” It may have

been better for him to use something like j j j

x x  

the left hand side of the equation. Thus, we could

write a model as

z

j

j

f

1

j

f

2

jm

f

m

j

for j=1,…,p

instead. In matrix form, this becomes

p 1 p m m 1

p 1

  

z   f  

where z = [z

1

, z

2

,…, z

p

] and Cov( z ) = P. Note that

the 

jk

’s will not be the same between using x

j

or z

j

Be careful with how Johnson uses x

j

and z

j

in the

text!

6.5 Factor Analysis Equations

Notes:

1.Suppose A is a matrix of constants and y be a vector of

random variables. Then Cov( Ay ) = A Cov( y ) A . This can

be shown using standard properties of covariances and

variances (i.e., Var(ay

1

+by

2

) = a

2

Var(y

1

) + b

2

Var(y

2

2abCov(y

1

,y

2

) for constants a and b,…. – see p. 179 of

Mood, Graybill, Boes (1974)).

2.Suppose a and b are independent random vectors. Then

Cov( a + b ) = Cov( a ) + Cov( b ).

From the matrix form of the FA model,

 = Cov( x )

= Cov( f + )

= Cov( f ) + Cov() since f and  are independent

= Cov( f ) + Cov()

=  I  +  since f ~( 0 , I ) and ~( 0 , )

Instead of trying to find if , f , and  exist such that

x = f + , we try to find a  and  such that

 =  + . These are the factor analysis equations!

Notes:

1.If  =  + , then the common factors (f k

’s) explain

the covariances between the original variables exactly

since  is a diagonal matrix. This can be seen from

the matrices below.

11 12 1p 11 12 1m 11 12 1m 1

12 22 2p 21 22 2m 21 22 2m 2

1p 2p pp p1 p2 pm p1 p2 pm p

0 0

0 0

0 0

                 

       

         

       

 

       

       

         

       

   

   

               

   

m m m

2

1k 1k 2k 1k pk

k 1 k 1 k 1

11 12 1p 1

m m m

2

12 22 2p 1k 2k 2k 2k pk 2

k 1 k 1 k 1

1p 2p pp p m m m

2

1k pk pk 2k pk

k 1 k 1 k 1

0 0

0 0

0 0

  

  

  

 

       

 

       

 

   

             

   

 

 

   

 

   

     

   

         

 

 

  

       

   

 

Usually, P is used in place of  for the FA. If standardized

data is used, note that P would be the same as ! This

implies that  is a matrix of correlations between the z j

’s

(standardized data) and the f k

’s. Then

1.Corr(z

j

, f

k

jk

m

2

jk j

k  1

 =1 since the diagonal elements of a correlation

matrix are 1

3.The communality of the j

th

standardized variable is

m

2

jk

k  1

Make sure you can show these statements above by

writing out P =  + !

Nonuniqueness of the common factors

Let T be an orthogonal matrix. This means the individual

column vectors within the matrix are orthogonal to each

other. Note that TT = TT  = I.

Example: Let x ~ 2

N ,

We have shown previously that the eigenvectors

with length 1 from  are orthogonal to each other. If

these eigenvectors are put as the columns of a

matrix, the result is an orthogonal matrix:

T =

Note that TT = I.

If m>1, the factor loading matrix () is not unique!

Notice that

P =  + 

=  TT  +  since TT = I

= ( T )( T)  +  since ( AB )= BA  for two

matrices A and B

) +  where 

= T

Therefore, if  is a loading matrix,  T also is a loading

matrix! There are an infinite number of orthogonal

matrices.

Notice from the FA model that the same result can be

seen:

x =  f + 

=  TTf + 

= ( T )( Tf ) + 

f

where 

= T and f

= Tf.

Thus, the loading matrix is not unique.

Therefore, we can form p(p+1)/2 equations with

mp+p = p(m+1) unknowns. Thus, more than one

solution may exist. For example, suppose p=4 and

m = 2. Then we have 10 equations with 12

unknowns.

Below are the matrices written out:

m m m

2

1k 1 1k 2k 1k pk

k 1 k 1 k 1

12 1p

m m m

2

12 2p 1k 2k 2k 2 2k pk

k 1 k 1 k 1

1p 2p m m m

2

1k pk pk 2k pk p

k 1 k 1 k 1

1

1

1

  

  

  

 

         

 

   

 

 

             

 

 

 

 

 

   

 

           

 

 

   

   

Example: p=3, m=1 is an example with 6 equations and

6 unknowns.

2

12 13 11 1 11 21 11 31

2

12 23 11 21 21 2 21 31

2

13 23 11 31 21 31 31 3

1

1

1

            

 

 

         

 

 

            

 

 

Then 1 =

2

11 1

2

21 2

2

31 3

12

11

21

13

11

31

, and 

23

21

31

If the correlations are known, then values for 

11

12

13

1

2

, and 

3

can be found - not necessarily for

all possible P.

See Johnson (1998) example on p. 154.

6.7 Choosing the Appropriate Number of Factors

Need to decide on an initial guess for m (number of

common factors) before solving the factor equations.

Performing a PCA and finding the number of principal

components is often done to determine the initial m.

When making final decisions for a number of common

factors, here are a few guidelines:

 Do not include trivial common factors; i.e., factors that

are accounting for only one original variable.

 See Section 6.8 - AIC, SBC, and LRTs

6.8 Computer Solutions of the Factor Analysis

Equations

There are a number of ways for finding solutions to the FA

equations. Below is one of them.

Maximum Likelihood

See the ML supplement for an introduction to maximum

likelihood estimation and likelihood ratio tests.

  • penalty for number of parameters

where

is the log of the likelihood function

evaluated at the parameter estimates.

The best number of factors to use correspond to the

smallest AIC. A few different choices for number of

factors would be needed to use this criteria.

Schwarz’s Bayesian criterion (SBC)

Similar to AIC, but with a different penalty function.

Likelihood ratio test (LRT) for the number of factors

See the ML supplement for an introduction to

likelihood ratio tests.

SAS produces two LRTs:

1. H

o

:There are no common factors ( P = I )

H

a

:There is at least one common factor ( PI )

Section 5.8 discusses this test

-2log() =

2p 5

N 1 log(| R |)

can be

approximated by a

2

p(p 1) / 2

. We can reject H o

if

-2log() is larger than the 1- quantile from a

2

p(p 1) / 2

distribution.

2. H

o

:m common factors are sufficient

(i.e.,  =  +  where  is pm)

H

a

:More common factors are needed

Johnson and Wichern (1998, p.537-8) discusses

this test in detail. The test statistic is

–2log() =

o

Nlog

S

where

S

=[(N-1)/N]

and

o

(where

 and

are appropriate estimates of  and ).

Using a Bartlett correction, the adjusted statistic

o

(N 1 (2p 4m 5) / 6)Nlog

S

can be approximated by a 2

2

[(p  m)  p m] / 2

. We can

reject H

o

if -2log() is larger than the 1- quantile

from a

2

2

[(p  m)  p m] / 2

distribution.

that SAS gives us more output that what we actually

need to use so some of it will not be discussed.

data set1;

infile 'c:/chris/unl/stat873/chapter 5/goblet.txt';

input goblet x1 x2 x3 x4 x5 x6;

w1 = x1/x3;

w2 = x2/x3;

w4 = x4/x3;

w5 = x5/x3;

w6 = x6/x3;

run ;

title2 'FA using ML - 2 factors';

proc factor data=set1 method=ml nfactor= 2 heywood

rotate=varimax score out=scores preplot plot

residual;

var w1 w2 w4 w5 w6;

run ;

The FACTOR Procedure

Initial Factor Method: Maximum Likelihood

Prior Communality Estimates: SMC

w1 w2 w4 w5 w

0.79179653 0.79285454 0.79995615 0.73303498 0.

Preliminary Eigenvalues: Total = 15.7398852 Average = 3.

Eigenvalue Difference Proportion Cumulative

1 12.7344277 9.3386641 0.8091 0.

2 3.3957637 2.9360802 0.2157 1.

3 0.4596834 0.6744991 0.0292 1.

4 -0.2148157 0.4203582 -0.0136 1.

5 -0.6351739 -0.0404 1.

2 factors will be retained by the NFACTOR criterion.

Iteration Criterion Ridge Change Communalities

1 0.2868406 0.0000 0.2000 0.81121 0.83253 1.

0.68477 0.

2 0.2323764 0.0000 0.1101 0.82712 0.90451 1.

0.69009 0.

3 0.2313655 0.0000 0.0169 0.84401 0.88941 1.

0.69483 0.

4 0.2312003 0.0000 0.0073 0.83793 0.89672 1.

0.69151 0.

5 0.2311715 0.0000 0.0029 0.84060 0.89381 1.

0.69282 0.

6 0.2311662 0.0000 0.0013 0.83948 0.89508 1.

0.69226 0.

7 0.2311653 0.0000 0.0005 0.83995 0.89455 1.

0.69250 0.

Convergence criterion satisfied.

Initial Factor Method: Maximum Likelihood

Significance Tests Based on 25 Observations

Pr >

Test DF Chi-Square ChiSq

H0: No common factors 10 79.6675 <.

HA: At least one common factor

H0: 2 Factors are sufficient 1 4.6618 0.

HA: More factors are needed

Chi-Square without Bartlett's Correction 5.

Akaike's Information Criterion 3.

Schwarz's Bayesian Criterion 2.

Tucker and Lewis's Reliability Coefficient 0.

Squared Canonical Correlations

Factor1 Factor

1.0000000 0.

Eigenvalues of the Weighted Reduced Correlation

Matrix: Total = 10.7559159 Average = 2.

Eigenvalue Difference Proportion Cumulative

1 Infty Infty

2 10.7559164 10.3941898 1.0000 1.

3 0.3617266 0.2456236 0.0336 1.

4 0.1161031 0.5939333 0.0108 1.

5 -0.4778303 -0.0444 1.

Factor Pattern

Factor1 Factor

w1 0.52895 0.

w2 0.46502 0.

w4 1.00000 -0.

w5 0.71823 0.

w6 0.60474 -0.

Variance Explained by Each Factor

Factor Weighted Unweighted