










A step-by-step guide to conducting Exploratory Factor Analysis (EFA) using SPSS software. EFA is a statistical technique used to identify underlying factors that explain the correlations among a set of variables. The guide covers checking for missing data, dealing with multicollinearity and singularity, and assessing normality and linearity. It also includes instructions for running a principal factor analysis.











The purpose of this lesson on Exploratory Factor Analysis is to understand and apply statistical techniques to a single set of variables when the researcher is interested in discovering which variables in the set form coherent subsets that are relatively independent of one another. Variables that are correlated with one another but largely independent of other subsets of variables are combined into factors. Factors are thought to reflect underlying processes that have created the correlations among variables.
The dataset (FACTOR.sav) that we will be using is part of a larger data set from Tabachnick and Fidell (2007). The study involved 369 middle-class, English-speaking women between the ages of 21 and 60 who completed the Bem Sex Role Inventory (BSRI). Respondents attribute traits to themselves by assigning numbers between 1 (never or almost never true of me) and 7 (always or almost always true of me) to each of the items. Forty-four items from the BSRI were selected for this research example.
A general rule of thumb is to have at least 300 cases for factor analysis. “Solutions that have several high loading marker variables (> .80) do not require such large sample sizes (about 150 cases should be sufficient) as solutions with lower loadings” (Tabachnick & Fidell, 2007, p. 613). Our data set has an adequate sample size of 369 cases. Bryant and Yarnold (1995) state that “one’s sample should be at least five times the number of variables. The subjects-to-variables ratio should be 5 or greater. Furthermore, every analysis should be based on a minimum of 100 observations regardless of the subjects-to-variables ratio” (p. 100).
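The rules of thumb above can be checked with a few lines of arithmetic. The sketch below is only an illustration in Python (the handout itself uses SPSS menus); the function name and dictionary keys are made up for this example.

```python
def sample_size_checks(n_cases, n_variables):
    """Apply the sample-size rules of thumb quoted above."""
    ratio = n_cases / n_variables
    return {
        "ratio": ratio,                        # subjects-to-variables ratio
        "ratio_at_least_5": ratio >= 5,        # Bryant and Yarnold (1995)
        "at_least_100_cases": n_cases >= 100,  # Bryant and Yarnold (1995)
        "at_least_300_cases": n_cases >= 300,  # general rule of thumb
    }


# 369 respondents and 44 BSRI items, as in this data set.
checks = sample_size_checks(n_cases=369, n_variables=44)
print(checks)
```

With 369 cases and 44 items the ratio is a little over 8 cases per variable, so all three checks pass.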
MISSING DATA
To check for missing data:
Click Analyze, then Descriptive Statistics
Click Frequencies
Click over all 44 items to the Variable(s): box (except Subno)
De-select [ ] Display frequency tables (this will produce a warning message; simply click OK)
Click OK
Exploratory Factor Analysis
The first table of the output identifies missing values for each item. Scrolling across the output, you will notice that there are no missing values for this set of data. If there were missing data, you would choose one option: estimate the missing values, delete the cases or variables involved, or analyze a missing-data pairwise correlation matrix. If the missing values follow a nonrandom pattern or the sample size is small, consider estimation, but be aware that it can lead to overfitting the data, resulting in correlations that are too high. Please refer to Tabachnick and Fidell (2007) for more information about deleting and dealing with missing data.
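Outside SPSS, the same missing-value count can be sketched in plain Python. This is a minimal illustration, assuming the data are held as a list of row dictionaries with None marking a missing response; the rows and the item names other than "helpful" are hypothetical.

```python
def count_missing(rows, columns):
    """Count missing (None) values per column in a list of row dicts."""
    return {
        col: sum(1 for row in rows if row.get(col) is None)
        for col in columns
    }


# Hypothetical responses on the 1-7 scale; None marks a missing answer.
rows = [
    {"helpful": 5, "item2": 6, "item3": 7},
    {"helpful": None, "item2": 4, "item3": 6},
    {"helpful": 6, "item2": None, "item3": None},
]
missing = count_missing(rows, ["helpful", "item2", "item3"])
print(missing)
```

A column whose count is zero has no missing values, which is what the SPSS Frequencies output shows for every item in this data set.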
For the sake of this training, we will start with an assessment of multivariate outliers. However, we would usually begin by screening for univariate outliers and checking assumptions. Many statistical methods are sensitive to outliers, so it is important to identify outliers and make decisions about what to do with them. Recall that a multivariate outlier is a case with an unusual combination of scores on two or more variables.
REASON FOR OUTLIERS (TABACHNICK & FIDELL, 2007)
To check for multivariate outliers:
Click Analyze, then Regression
Click Linear
Dependent: subno
Independent(s): all remaining 44 items
Click Save
Under Distances, check [√] Mahalanobis
Click Continue
Click OK
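To make clear what the saved distance measures, here is a minimal NumPy sketch of the squared Mahalanobis distance of each case from the centroid of the items. It is an illustration under assumptions: the data are randomly generated, one unusual case is planted on purpose, and the result is assumed (not verified here) to correspond to the value SPSS saves. The dependent variable in the SPSS run plays no part in the distance, which is why subno can be used.

```python
import numpy as np


def mahalanobis_squared(X):
    """Squared Mahalanobis distance of each row of X from the column means."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)      # sample covariance (n - 1 denominator)
    inv_cov = np.linalg.inv(cov)
    # Row-wise quadratic form: d_i^2 = c_i' S^-1 c_i
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)


# Hypothetical data: 200 cases on 3 variables, with one planted unusual case.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[0] = [6.0, -6.0, 6.0]

d2 = mahalanobis_squared(X)
print("Case with the largest distance:", int(np.argmax(d2)))
```

Cases with very large distances relative to the rest are the candidates to inspect as multivariate outliers.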
Multicollinearity occurs when the IVs are highly correlated. Singularity occurs when you have redundant variables.
To test for multicollinearity and singularity, use the following SPSS commands:
Click Analyze, then Regression
Click Linear
Click Reset
Dependent: subno
Independent(s): all 44 items (be sure not to include MAH_)
Click Statistics
Check [√] Collinearity diagnostics
Click Continue
Click OK
This will produce an output page. If the determinant of R and the eigenvalues associated with some factors approach 0, multicollinearity or singularity may be present. “To investigate further, look at the SMCs for each variable where it serves as DV with all other variables as IVs” (Tabachnick & Fidell, 2007, p. 614).
In the output, under Collinearity Statistics, look at the Tolerance values for each item on the test. We want the Tolerance values to be high, close to 1.0.
Next, we want to explore the SMC (squared multiple correlation) of each variable, where it serves as DV with the rest as IVs in multiple correlation (Tabachnick & Fidell, 2007). Many programs, including SPSS, convert the SMC value for each variable to tolerance (1 – SMC) and report tolerance instead of SMC. Thus, we have to calculate the SMCs ourselves. Turn to the next page of this handout and, next to the tolerance values, calculate the SMCs for the first ten items (1 – Tolerance). We want the SMCs to be low, close to .00.
If any of the SMCs is one (1), then singularity is present. If any of the SMCs is very large (i.e., near one), then multicollinearity is present (Tabachnick & Fidell, 2007).
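The relationship between tolerance and SMC can be sketched as follows. This is an illustration only, assuming the correlation matrix of the variables is available as a NumPy array; it uses the standard identity that the SMC of variable i equals 1 minus the reciprocal of the i-th diagonal element of the inverse correlation matrix. The two-variable matrix is hypothetical.

```python
import numpy as np


def smc_and_tolerance(R):
    """SMC and tolerance for every variable in correlation matrix R."""
    R = np.asarray(R, dtype=float)
    inv_diag = np.diag(np.linalg.inv(R))
    smc = 1.0 - 1.0 / inv_diag     # squared multiple correlation
    tolerance = 1.0 - smc          # what SPSS reports as Tolerance
    return smc, tolerance


# Hypothetical example: two variables correlated .60.
R = np.array([[1.0, 0.6],
              [0.6, 1.0]])
smc, tolerance = smc_and_tolerance(R)
print("SMC:", smc)
print("Tolerance:", tolerance)

# Going the other way, as the handout asks: SMC = 1 - Tolerance.
smc_from_tolerance = [1.0 - t for t in tolerance]
```

With only two variables the SMC is simply the squared correlation (.36 here), so the tolerance is .64. A singular matrix cannot be inverted, which is the numerical counterpart of an SMC of one.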
The tolerance and SMC values were fine for this group of data. However, if the tolerance values are too low, we would want to scroll down to the next table and examine the Condition Index for each item. According to Tabachnick and Fidell (2007), we do not want the Condition Index values to be greater than 30. Examine the Condition Index for all 44 items. As you can see, the last 25 items have Condition Indexes that are greater than this threshold.
Because of these high Condition Indexes, you would next need to examine the Variance Proportions for those high Condition Index items, which are located next to the Condition Index. According to Tabachnick and Fidell (2007), we do not want two Variance Proportions to be greater than .50 for each item.
To explain further, look at the Variance Proportions of Dimension 45. Scroll across the page and see if there are two items with Variance Proportions that are greater than .50 for Dimension 45.
Next, you have to make some decisions about multicollinearity. Because we did not find evidence of any Variance Proportions that are greater than .50, we may decide that we do not have evidence of multicollinearity. However, one can also combine evidence (explore the SMC, Tolerance values, Condition Index, and Variance Proportions) and decide if there is combined evidence of multicollinearity.
Generally, if the Condition Index and Variance Proportion values are high, then there is evidence of multicollinearity.
For this set of data, we have no evidence that multicollinearity or singularity exists.
Save the output as “MULTICOLLINEARITY”
If Principal Factor Analysis is used descriptively, then assumptions about distributions are not essential. However, normality of variables enhances the solution (Tabachnick & Fidell, 2007).
When the number of factors is determined using statistical inference, multivariate normality is assumed. “Normality among single variables is assessed by skewness and kurtosis” (Tabachnick & Fidell, 2007, p. 613), and as such, the distributions of the 44 variables need to be examined for skewness and kurtosis.
To obtain the skewness and kurtosis of the 44 variables:
Click Analyze, then Descriptive Statistics
Click Frequencies
Click Reset
Click over all 44 items to the Variable(s): box (be sure not to include Subno and MAH_)
Click Statistics
Under Dispersion, check [√] all
Under Central Tendency, check [√] all
Under Distribution, check [√] all
Click Continue
Click Charts
Select Histograms and check [√] With normal curve
Click Continue
De-select [ ] Display frequency tables
Click OK
An output will be produced. Scroll to the top of the output to Frequencies. You will see the skewness values and their standard error values for all 44 items.
Skewness: A distribution that is not symmetric but has more cases (more of a “tail”) toward one end of the distribution than the other is said to be skewed (Norusis, 1994).
Item: “helpful”
Skewness Standard Score = Skewness Value / Std. Error
Direction of the Skewness:
Significant Departure? (yes, no):
Scroll to the top of the output to Frequencies. You will see the kurtosis values and their standard error values for all 44 items.
Kurtosis: The relative concentration of scores in the center, the upper and lower ends (tails) and the shoulders (between the center and the tails) of a distribution (Norusis, 1994).
Item: “helpful”
Kurtosis Standard Score = Kurtosis Value / Std. Error
Direction of the Kurtosis:
Significant Departure? (yes, no):
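The standard-score calculation in the two worksheets above can be sketched in Python. Several things here are assumptions rather than part of the handout: the standard-error formulas are the usual small-sample ones, the critical value of 3.29 is just a default argument that you should replace with whatever criterion you adopt, and the skewness value of -1.0 is hypothetical.

```python
import math


def skewness_std_error(n):
    """Standard error of skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def kurtosis_std_error(n):
    """Standard error of kurtosis for a sample of size n."""
    ses = skewness_std_error(n)
    return math.sqrt(4.0 * (n ** 2 - 1) * ses ** 2 / ((n - 3) * (n + 5)))


def standard_score(value, std_error, critical_z=3.29):
    """Return (z, direction, significant) for a skewness or kurtosis value.

    critical_z is an assumed cutoff, not one given in the handout.
    """
    z = value / std_error
    direction = "positive" if z > 0 else "negative" if z < 0 else "none"
    return z, direction, abs(z) > critical_z


n = 369                                # cases in this data set
se_skew = skewness_std_error(n)
se_kurt = kurtosis_std_error(n)

# Hypothetical skewness value for one item.
z, direction, significant = standard_score(-1.0, se_skew)
print(round(se_skew, 3), round(se_kurt, 3), round(z, 2), direction, significant)
```

The same standard_score function serves for kurtosis: pass the kurtosis value and its standard error instead.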
Click Analyze, then Data Reduction
Click Factor
Highlight all 44 items and click them over to the Variable(s): box (be sure not to include Subno and MAH_)
Click Descriptives
Under Statistics, check:
[√] Univariate descriptives
[√] Initial solution (default)
Under Correlation Matrix, check:
[√] Coefficients
[√] Determinant
[√] KMO and Bartlett’s test of sphericity
Click Continue
Click Extraction
Change Method to Principal axis factoring
Under Display, check:
[√] Unrotated factor solution (default)
[√] Scree plot
Click Continue
Click OK
An output will then be produced…
Look at the Correlation Matrix produced on the output page. “A matrix that is factorable should include several sizable correlations. The expected size depends, to some extent, on N (larger sample sizes tend to produce smaller correlations), but if no correlation exceeds .30, use of FA is questionable because there is probably nothing to factor analyze” (Tabachnick & Fidell, 2007, p. 614). We want to see several correlations between items that are greater than .30.
“High bivariate correlations, however, are not ironclad proof that the correlation matrix contains factors. It is possible that the correlations are between only two variables and do not reflect underlying processes that are simultaneously affecting several variables. For this reason, it is helpful to examine matrices of partial correlations where pairwise correlations are adjusted for effects of all other variables” (Tabachnick & Fidell, 2007, p. 614).
To examine partial correlations, look on the output page and scroll down to KMO and Bartlett’s Test.
The Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO) is an index for comparing the magnitudes of the observed correlation coefficients to the magnitudes of the partial correlation coefficients.
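To show how that comparison works, here is a minimal NumPy sketch of the overall KMO index and of Bartlett's test statistic, using their standard textbook formulas. It is an illustration, not SPSS's own code: the 4 x 4 correlation matrix is hypothetical, and the chi-square p-value is left out (compare the statistic against a chi-square table with the returned degrees of freedom).

```python
import numpy as np


def kmo_overall(R):
    """Overall KMO: squared correlations vs. squared partial correlations."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                  # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)   # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)


def bartlett_sphericity(R, n_cases):
    """Bartlett's chi-square statistic and degrees of freedom."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    _, log_det = np.linalg.slogdet(R)       # log of the determinant of R
    chi_square = -(n_cases - 1 - (2 * p + 5) / 6.0) * log_det
    df = p * (p - 1) // 2
    return chi_square, df


# Hypothetical matrix: four variables, every pair correlated .50.
R = np.full((4, 4), 0.5)
np.fill_diagonal(R, 1.0)

kmo = kmo_overall(R)
chi_square, df = bartlett_sphericity(R, n_cases=369)
print(round(kmo, 3), round(chi_square, 2), df)
```

When the partial correlations are small relative to the observed correlations, the index approaches 1, which is the pattern expected if the correlations reflect shared underlying factors.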
Usually the scree plot slopes downward: the eigenvalue is highest for the first factor and moderate but decreasing for the next few factors before reaching small values for the last several factors.
Examine the Scree Plot on your output page…
You look for the point where the line drawn through the points changes slope. Unfortunately, the scree test is not exact; it involves judgment of where the discontinuity in eigenvalues occurs and researchers are not perfectly reliable judges (Tabachnick & Fidell, 2007).
In the example, a single straight line can comfortably fit the first four eigenvalues. After that, another line, with a noticeably different slope, best fits the remaining eight points. Therefore, there appear to be about four (4) factors in the data.
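The values plotted in a scree plot are eigenvalues of the correlation matrix sorted from largest to smallest. The sketch below shows how they could be obtained with NumPy. It assumes the plot is based on the unreduced correlation matrix and uses a small hypothetical matrix, not the BSRI data.

```python
import numpy as np


def scree_values(R):
    """Eigenvalues of correlation matrix R, largest first."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(R, dtype=float))
    return eigenvalues[::-1]   # eigvalsh returns them in ascending order


# Hypothetical matrix: four variables, every pair correlated .50.
R = np.full((4, 4), 0.5)
np.fill_diagonal(R, 1.0)

values = scree_values(R)
for factor_number, eigenvalue in enumerate(values, start=1):
    print(factor_number, round(float(eigenvalue), 3))
```

The eigenvalues of a correlation matrix sum to the number of variables, so a sharp drop after the first few values means those factors account for most of the shared variance.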
Once you have determined the number of factors by these criteria, it is important to look at the rotated loading matrix to determine the number of variables that load on each factor.
Click Analyze, then Data Reduction
Click Factor
Click Reset
Highlight all 44 items and click them over to the Variable(s): box (be sure not to include Subno and MAH_)
Click Extraction
Change Method to Principal axis factoring
Under Extract, select Number of factors: and type in the number 4 (four)
Click Continue
Click Rotation
Under Method, select Varimax
Click Continue
Click OK
An output should be produced…
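For readers curious about what principal axis factoring does, here is a minimal NumPy sketch of the iterative procedure: start from SMCs as communality estimates, place them on the diagonal of the correlation matrix, extract the leading eigenvectors, and repeat until the communalities stop changing. It is a sketch under assumptions, not SPSS's implementation: the stopping rule and iteration limit are arbitrary choices, rotation (Varimax) is not included, and the example matrix is built from hypothetical one-factor loadings.

```python
import numpy as np


def principal_axis_factoring(R, n_factors, max_iter=200, tol=1e-8):
    """Unrotated principal axis loadings and communalities for matrix R."""
    R = np.asarray(R, dtype=float)
    # Initial communality estimates: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)       # communalities on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)
        order = np.argsort(eigvals)[::-1][:n_factors]
        # Clip tiny negative eigenvalues before taking square roots.
        top_vals = np.clip(eigvals[order], 0.0, None)
        loadings = eigvecs[:, order] * np.sqrt(top_vals)
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2


# Hypothetical one-factor structure for six variables.
true_loadings = np.array([0.8, 0.7, 0.7, 0.6, 0.6, 0.5])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)

loadings, communalities = principal_axis_factoring(R, n_factors=1)
print(np.round(np.abs(loadings[:, 0]), 3))
print(np.round(communalities, 3))
```

The sign of a column of loadings is arbitrary, which is why the example prints absolute values. With real data and four factors, the unrotated loadings would then be rotated before interpretation.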
Parenthetically, we chose Varimax but it is acceptable and common to experiment with various extraction and rotation procedures before deciding upon the preferred solution (Tabachnick & Fidell, 2007).
Look at the Communalities table on your output. Under the Extraction heading, we want values to be greater than .20. Looking at the output, you can see that there are several variables below .20.
Having many communalities below .20 indicates that the items are not loading properly on the factors. However, Tabachnick and Fidell (2007) explain that factorial purity was not a consideration in the development of the BSRI, which means that when the BSRI was developed there was no concern with items loading on certain factors.
Next, examine the table labeled Total Variance Explained on your output.
Under Rotation Sums of Squared Loadings , you can see that the four factors have eigenvalues greater than two (2).
Finally, examine the Rotated Factor Matrix table on your output.
Factors are interpreted through their factor loadings. Tabachnick and Fidell (2007) decided to use a loading of .45 (20% variance overlap between variable and factor). Factors appear as columns and items appear as rows. Tabachnick and Fidell also recommend a minimum factor loading of .32.
The greater the loading, the more the variable is a pure measure of the factor. Comrey and Lee (1992) suggest that loadings in excess of
Choice of the cutoff for size of loading to be interpreted is a matter of researcher preference (Tabachnick & Fidell, 2007).
Look at the table below and you will see the output for the Rotated Factor Matrix. In each of the four factor columns, circle the values that exceed .45.
There should be twelve (12) items circled for Factor 1, six (6) under Factor 2, five (5) under Factor 3, and three (3) under Factor 4.
Examine the items circled and label the factors accordingly.
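The circling step can be sketched in Python. The loadings and item names below are hypothetical (they are not the BSRI output), and comparing the absolute value of each loading to the cutoff is an assumption made here so that large negative loadings are also picked up. The squared loading is reported alongside because it gives the variance overlap between item and factor: .45 squared is about .20 (20%) and .32 squared is about .10 (10%).

```python
def salient_items(loadings, cutoff=0.45):
    """Group items by the factor(s) on which |loading| exceeds the cutoff."""
    n_factors = len(next(iter(loadings.values())))
    by_factor = {f + 1: [] for f in range(n_factors)}
    for item, row in loadings.items():
        for f, value in enumerate(row):
            if abs(value) > cutoff:
                overlap = round(value ** 2, 4)   # shared variance
                by_factor[f + 1].append((item, value, overlap))
    return by_factor


# Hypothetical rotated loadings on two factors.
rotated = {
    "item_a": (0.62, 0.10),
    "item_b": (0.48, -0.05),
    "item_c": (0.20, 0.71),
    "item_d": (0.30, 0.33),
}
circled = salient_items(rotated)
for factor, items in circled.items():
    print("Factor", factor, items)
```

An item such as item_d, which exceeds the cutoff on no factor, is simply left uncircled.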
One of the most important reasons for naming a factor is to communicate to others. The name should capsulize the substantive nature of the factor and enable others to grasp its meaning (Rummel, 1970).
The choice of factor names should be related to the basic purpose of the factor analysis. If the goal is to describe or simplify the complex interrelationships in the data, a descriptive factor label can be applied. The descriptive approach to factor naming involves selecting a label that best reflects the substance of the variables loaded highly and near zero on a factor. The factors are classificatory and names to define each category are sought (Rummel, 1970).
There are a number of considerations involved in descriptively naming factors:
Click Analyze, then Scale
Click Reliability Analysis
Click over the 44 items to the Items: box (be sure not to include Subno and MAH_)
For the Model: box, be sure that Alpha is selected
Click OK
Interpret Cronbach’s Alpha by providing the information asked for below:
Cronbach’s Alpha for all 44 items
N of items
Next, examine the internal consistency of the items which have high factor loadings on each of the four factors (i.e., > .45). These are the item loadings you circled for each of the four factors in the Rotated Factor Matrix.
Click Analyze, then Scale
Click Reliability Analysis
Click Reset
Click over the items for that factor to the Items: box
For the Model: box, be sure that Alpha is selected
Click OK
Cronbach’s Alpha For Factor 1
N of items
Do the same procedure for the next three factors and interpret Cronbach’s Alpha by providing the information asked for below:
Cronbach’s Alpha For Factor 2
N of items
Cronbach’s Alpha For Factor 3
N of items
Cronbach’s Alpha For Factor 4
N of items
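For reference, Cronbach's Alpha can be computed by hand from the item variances and the variance of the total score. The sketch below uses only the Python standard library and the usual formula, alpha = k / (k - 1) x (1 - sum of item variances / variance of the total). The response data are hypothetical and the function name is made up for this example.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item scores.

    Each element of rows is one respondent's scores on the k items.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores of five respondents on three items of one factor.
scores = [
    (5, 6, 5),
    (3, 3, 4),
    (6, 7, 6),
    (2, 3, 2),
    (4, 4, 5),
]
alpha = cronbach_alpha(scores)
n_items = len(scores[0])
print(round(alpha, 3), n_items)
```

To fill in the worksheets above, you would pass only the columns for the items circled under one factor, then repeat for each of the other factors.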