Material Type: Exam; Professor: Westfall; Class: Applied Multivariate Analysis; Subject: Information Systems and Quantitative Sciences; University: Texas Tech University; Term: Fall 2006
MVA Final Fall 06. Open notes, no book. Points (out of 200) in parentheses.
1.(15) Suppose you have a data set with two dependent variables, Y1 and Y2, and three independent variables, X1, X2, and X3. Explain, step-by-step, how to obtain a partial correlation matrix of Y1 and Y2, controlling for X1, X2, and X3, using regression residuals.

Solution: Fit a regression equation relating Y1 to X1, X2, and X3, getting a column of residuals, e1 = Y1 – (b01 + b11X1 + b21X2 + b31X3). Fit another regression equation relating Y2 to X1, X2, and X3, getting a column of residuals, e2 = Y2 – (b02 + b12X1 + b22X2 + b32X3). Now calculate the correlation coefficient between the columns e1 and e2. This is the partial correlation between Y1 and Y2, controlling for X1, X2, and X3.
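The residual-based recipe above can be sketched in Python with numpy; the data, coefficients, and sample size below are hypothetical, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical data: three predictors, two responses (Y2 depends on Y1).
X = rng.normal(size=(n, 3))
Y1 = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)
Y2 = X @ np.array([0.8, -0.2, 0.4]) + 0.6 * Y1 + rng.normal(size=n)

# Design matrix with an intercept column.
Xd = np.column_stack([np.ones(n), X])

def residuals(y, Xd):
    """OLS residuals e = y - Xd b, with b the least-squares coefficients."""
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return y - Xd @ b

e1 = residuals(Y1, Xd)
e2 = residuals(Y2, Xd)

# Partial correlation of Y1 and Y2, controlling for X1, X2, X3.
partial_r = np.corrcoef(e1, e2)[0, 1]
print(round(partial_r, 3))
```

Because Y2 was built to depend on Y1 even after the X's are accounted for, the partial correlation comes out clearly positive here.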
Variable key:
FacKnowledge    = Faculty knowledge
FacTeaching     = Faculty teaching
FacResInClass   = Faculty use of research in class
FacOutsideClass = Faculty availability outside of class
FacIntSuccess   = International success of faculty
GrAdvisAvail    = Graduate advisor availability
GrAdvAdmin      = Graduate advisor administrative skill
GrAdvPersonal   = Graduate advisor personal help
GrAdvCareerAdv  = Graduate advisor help to advance career
GrAdvPlcmt      = Graduate advisor help in placement
Here is the SAS code:
proc princomp data=isqs6348.pgs;
  var FacKnowledge FacTeaching FacResInClass FacOutsideClass FacIntSuccess
      GrAdvisAvail GrAdvAdmin GrAdvPersonal GrAdvCareerAdv GrAdvPlcmt;
run;
Here is some selected output:
Eigenvalues of the Correlation Matrix

      Eigenvalue   Difference   Proportion   Cumulative
  1   5.85957806   4.16341342     0.5860       0.5860
  2   1.69616465   1.18571670     0.1696       0.7556
  3   0.51044795   0.05013162     0.0510       0.8066
  4   0.46031633   0.05371107     0.0460       0.8527
  5   0.40660526   0.12012661     0.0407       0.8933
  6   0.28647865   0.01403389     0.0286       0.9220
  7   0.27244476   0.04491817     0.0272       0.9492
  8   0.22752659   0.06433406     0.0228       0.9720
  9   0.16319253   0.04594731     0.0163       0.9883
 10   0.11724522                  0.0117       1.0000
Eigenvectors (the Prin5 column was truncated in the source output)

                    Prin1     Prin2     Prin3     Prin4
FacKnowledge     0.273091  0.395137  0.432351  0.411725
FacTeaching      0.296674  0.387501  0.249367  0.158671
FacResInClass    0.282866  0.345842  0.244605  -.421583
FacOutsideClass  0.299664  0.280972  -.675718  -.000788
FacIntSuccess    0.313763  0.267767  -.417742  -.130137
GrAdvisAvail     0.323535  -.295559  -.090535  0.439159
GrAdvAdmin       0.336762  -.316461  0.000871  0.297661
GrAdvPersonal    0.343815  -.305471  -.013451  0.049178
GrAdvCareerAdv   0.349947  -.293464  0.119199  -.250745
GrAdvPlcmt       0.332252  -.237746  0.193309  -.513697
Eigenvectors (the Prin10 column was truncated in the source output)

                    Prin6     Prin7     Prin8     Prin9
FacKnowledge     0.118130  0.552144  -.011526  0.043147
FacTeaching      -.480731  -.654390  0.063739  -.088827
FacResInClass    0.296804  0.043306  -.106793  -.012448
FacOutsideClass  -.451318  0.360697  0.072077  0.049820
FacIntSuccess    0.588544  -.294085  -.060778  0.023290
GrAdvisAvail     0.250562  -.053508  0.531952  -.407311
GrAdvAdmin       0.061545  -.146160  -.151681  0.770610
GrAdvPersonal    -.123355  0.066198  -.661418  -.301311
GrAdvCareerAdv   -.071223  0.040779  -.107735  -.293706
GrAdvPlcmt       -.172973  0.134817  0.469521  0.224221
2.A.(15) A portion of the data matrix looks like this:

Obs  FacKnowledge  FacTeaching  FacResInClass  FacOutsideClass  …
 1        3             3             3               4         …
 2        4             3             4               4         …
 3        4             4             3               3         …
 4        3             3             4               4         …
 5        4             4             3               4         …
 6        5             4             3               2         …
 7        2             3             1               3         …
 …        …             …             …               …         …
Additional columns containing principal component scores can be calculated and included along with the rest of the data in the data set. Specifically how are the data values in the additional column corresponding to the first principal component calculated? Be detailed and specific.
Solution to 2.A: Standardize each column of data by subtracting the sample mean and dividing by the sample standard deviation. For example, if the sample mean of FacKnowledge is 3.67 and the standard deviation is 0.87, then the standardized values are

Obs  Standardized FacKnowledge
 1       (3 − 3.67)/0.87
 2       (4 − 3.67)/0.87
 3       (4 − 3.67)/0.87
 4       (3 − 3.67)/0.87
 5       (4 − 3.67)/0.87
 6       (5 − 3.67)/0.87
 7       (2 − 3.67)/0.87
 …             …
Perform similar standardizations for the other columns, noting that the means and standard deviations will be different for every column.
The PC1 column is then obtained as 0.273091(standardized FacKnowledge) + 0.296674(standardized FacTeaching) + 0.282866(standardized FacResInClass) + … + 0.332252(standardized GrAdvPlcmt); that is, each observation's PC1 score is the weighted sum of its standardized values, with the entries of the first eigenvector as weights.
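As a numeric sketch of 2.A in Python/numpy, using only the four columns shown in the excerpt (the real PC1 score would use all ten standardized variables and all ten entries of the first eigenvector):

```python
import numpy as np

# The seven rows of the data-matrix excerpt, four variables only.
X = np.array([
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [4, 4, 3, 3],
    [3, 3, 4, 4],
    [4, 4, 3, 4],
    [5, 4, 3, 2],
    [2, 3, 1, 3],
], dtype=float)

# Standardize each column: subtract its mean, divide by its std deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# First-eigenvector weights for these four variables (from the output above).
w = np.array([0.273091, 0.296674, 0.282866, 0.299664])

# PC1 score for each observation = weighted sum of standardized values.
pc1 = Z @ w
```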
2.B.(5) What is the variance of the data values in the additional column corresponding to the first principal component?
Solution: The eigenvalues are the variances. So the variance is 5.86.
2.C.(5) An additional data column corresponding to the second principal component can also be included. What is the correlation between the data values in the two columns (PC1 and PC2)?
Solution: PC's are uncorrelated. So the correlation is 0.
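Both facts (2.B and 2.C) can be checked numerically; here is a minimal sketch with made-up data, not the exam's data set:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical correlated data: 500 observations on 3 variables.
data = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3)).T

# Standardize, then eigendecompose the correlation matrix.
Z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order

scores = Z @ eigvecs                   # principal component scores

# The sample variance of each score column equals its eigenvalue,
# and distinct score columns are uncorrelated.
score_vars = np.var(scores, axis=0, ddof=1)
score_corr = np.corrcoef(scores, rowvar=False)
```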
2.D.(15) What do PC1 and PC2 measure? Use the ‘sorting’ idea: which students have the highest PC1? Which have the lowest? What then, does PC1 measure about the students? Which students have the highest PC2? Which have the lowest? What then, does PC2 measure about the students?
Solution. Students who give high ratings on all variables have high PC1; students who give low ratings on all variables have low PC1. So PC1 seems to measure the student’s “overall satisfaction with teaching and advising.”
Students who give high ratings to faculty and low ratings to advisors have high PC2; students who give low ratings to faculty and high ratings to advisors have low PC2. So PC2 seems to measure the student’s “comparison of teaching with advising.”
proc factor data=isqs6348.pgs nfactors=2 method=ML rotate=varimax res;
  var FacKnowledge FacTeaching FacResInClass FacOutsideClass FacIntSuccess
      GrAdvisAvail GrAdvAdmin GrAdvPersonal GrAdvCareerAdv GrAdvPlcmt;
run;
Selected results are as follows:
Root Mean Square Off-Diagonal Residuals: Overall = 0.02939727

Rotated Factor Pattern (the Factor2 column was truncated in the source output)

                 Factor1
FacKnowledge     0.17634
FacTeaching      0.21229
FacResInClass    0.23442
FacOutsideClass  0.30818
FacIntSuccess    0.34346
GrAdvisAvail     0.78659
GrAdvAdmin       0.84583
GrAdvPersonal    0.87515
GrAdvCareerAdv   0.87382
GrAdvPlcmt       0.77278
3.A.(10) What does “Root Mean Square Off-Diagonal Residuals: Overall = 0.02939727” tell you? (I am not looking for a “magic threshold” here, I am just asking, what does the number mean and why does it mean that?)
Solution: This number is the square root of the average squared elements of the residual matrix. The residual matrix is the observed correlation matrix minus the FA model fitted correlation matrix. So the number .029 is the “typical” difference between the elements of the observed correlation matrix and the model fitted correlation matrix. Since the correlations are on a -1 to 1 scale, we know that .029 is small. For example, if a true correlation is .43, the model might fit a value .40, which does not differ by too much on the -1 to 1 correlation scale.
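As a sketch of what that SAS statistic is computing (the two matrices here are made up for illustration; they are not the exam's matrices):

```python
import numpy as np

# Hypothetical observed vs. model-fitted correlation matrices.
R_obs = np.array([[1.00, 0.43, 0.30],
                  [0.43, 1.00, 0.25],
                  [0.30, 0.25, 1.00]])
R_fit = np.array([[1.00, 0.40, 0.33],
                  [0.40, 1.00, 0.22],
                  [0.33, 0.22, 1.00]])

resid = R_obs - R_fit                      # residual matrix
off_diag = resid[~np.eye(3, dtype=bool)]   # off-diagonal elements only
rms = np.sqrt(np.mean(off_diag ** 2))      # root mean square residual
print(round(rms, 2))                       # -> 0.03
```

Each off-diagonal residual here is ±.03, so the RMS is .03: the "typical" gap between observed and fitted correlations.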
3.B.(15) Find the percentage of variance in FacKnowledge that is explained by the two common factors. Explain the logic step by step, in particular noting assumptions that you have made about the factor analysis model.
Solution: Since the correlation matrix is analyzed, we know we are talking about standardized data. So Var(FacKnowledge) = 1. But the model assumes that

FacKnowledge = l1 F1 + l2 F2 + ε1.

The assumptions of the model are that F1, F2, and ε1 are mutually uncorrelated, with Var(F1) = Var(F2) = 1, so

Var(FacKnowledge) = Var(l1 F1 + l2 F2 + ε1) = l1² Var(F1) + l2² Var(F2) + Var(ε1) = l1² + l2² + Var(ε1).

Thus

1 = Var(FacKnowledge) = l1² + l2² + Var(ε1),

implying l1² + l2² is the proportion of variance in FacKnowledge that is explained by F1 and F2. The estimated proportion is thus 0.17634² + 0.76921² = 0.623, or 62.3%.
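A quick arithmetic check of that communality (0.76921 is the FacKnowledge loading on Factor2 used in the computation above):

```python
# Rotated loadings of FacKnowledge on the two common factors.
l1, l2 = 0.17634, 0.76921

communality = l1**2 + l2**2   # variance explained by the common factors
uniqueness = 1 - communality  # estimated Var(eps1)
print(round(communality, 3))  # -> 0.623
```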
3.C.(20) What do Factor1 and Factor2 measure? Use the “sorting” idea again.
Solution: Students with high Factor1 give high ratings to advisors; students with low Factor1 give low ratings to advisors. Thus Factor1 measures “satisfaction with advisors.”
Students with high Factor2 give high ratings to faculty; students with low Factor2 give low ratings to faculty. Thus Factor2 measures “satisfaction with faculty.”
3.D.(25) Suppose you want to estimate the correlation between Factor1 and Factor2. Explain specifically, step by step, how you would do this using the confirmatory factor analysis model.
Solution: Fit a model where the faculty measures are related to Factor1 only, and the advisor ratings are related to Factor2 only. Allow the Factors to correlate. The estimated correlation is the value that best reproduces the observed covariance matrix. This can be done using PROC CALIS as follows:
proc calis data=isqs6348.pgs;
  var FacKnowledge FacTeaching FacResInClass FacOutsideClass FacIntSuccess
      GrAdvisAvail GrAdvAdmin GrAdvPersonal GrAdvCareerAdv GrAdvPlcmt;
  lineqs
    FacKnowledge    = b1  F1 + e1,
    FacTeaching     = b2  F1 + e2,
    FacResInClass   = b3  F1 + e3,
    FacOutsideClass = b4  F1 + e4,
    FacIntSuccess   = b5  F1 + e5,
    GrAdvisAvail    = b6  F2 + e6,
    GrAdvAdmin      = b7  F2 + e7,
    GrAdvPersonal   = b8  F2 + e8,
    GrAdvCareerAdv  = b9  F2 + e9,
    GrAdvPlcmt      = b10 F2 + e10;
  std e1-e10 = the1-the10, F1-F2 = 1 1;
  cov F1 F2 = phi1;
run;
(FYI, the estimated correlation is .595.)
Y1 = .6 F + ε1
Y2 = .3 F + ε2
Y3 = .7 F + ε3

where, as usual, Var(Y1) = Var(Y2) = Var(Y3) = Var(F) = 1, and F, ε1, ε2, and ε3 are mutually uncorrelated.

4.A.(10) Find the covariance matrix of Y = (Y1, Y2, Y3)′.

Solution: This is just the factor analysis model, so Cov(Y) = LL′ + Ψ, where L = (.6, .3, .7)′ and Ψ = diag(Var(ε1), Var(ε2), Var(ε3)). Note that Var(ε1) = 1 − .6² = .64, and similarly Var(ε2) = .91 and Var(ε3) = .51, so

            [ 1.00  0.18  0.42 ]
Cov(Y)  =   [ 0.18  1.00  0.21 ]
            [ 0.42  0.21  1.00 ]
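The matrix algebra in 4.A can be sketched numerically (nothing here beyond the model stated above):

```python
import numpy as np

l = np.array([0.6, 0.3, 0.7])   # loadings on the single factor F
L = l.reshape(-1, 1)            # 3x1 loading matrix
Psi = np.diag(1 - l**2)         # uniquenesses: Var(eps_i) = 1 - l_i^2

Sigma = L @ L.T + Psi           # Cov(Y) = LL' + Psi
print(Sigma)
```

The off-diagonal entries are just products of loadings (.6×.3 = .18, .6×.7 = .42, .3×.7 = .21), and the diagonal entries are 1 by construction.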
4.B. (5) Find Var(Y1+Y2+Y3).

Solution: Var(Y1+Y2+Y3) is the sum of all nine elements of Cov(Y):

Var(Y1+Y2+Y3) = 3 + 2(.18 + .42 + .21) = 4.62.

4.C.(5) Find Cronbach's coefficient α.

Solution: With k = 3 items, each with variance 1,

α = {k/(k−1)}{1 − ΣVar(Yi)/Var(Y1+Y2+Y3)} = (3/2)(1 − 3/4.62) = .526.
4.D. (10) Find the true reliability of the summate Y1+Y2+Y3 as a measure of F.

Solution: We need the covariance matrix of (Y1+Y2+Y3, F). So note that

Cov(Y1+Y2+Y3, F) = Cov(Y1, F) + Cov(Y2, F) + Cov(Y3, F) = .6 + .3 + .7 = 1.6,

since Cov(Yi, F) = li Var(F) = li. With Var(Y1+Y2+Y3) = 4.62 and Var(F) = 1, the correlation is 1.6/sqrt(4.62) = .744, and the reliability is .744² = .554.
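The arithmetic for 4.B through 4.D can be verified in one short sketch; note Var(Y1+Y2+Y3) = 3 + 2(.18+.42+.21) = 4.62.

```python
import numpy as np

l = np.array([0.6, 0.3, 0.7])                # loadings
Sigma = np.outer(l, l) + np.diag(1 - l**2)   # Cov(Y) from 4.A

var_sum = Sigma.sum()                        # Var(Y1+Y2+Y3): sum of all elements
k = 3
alpha = (k / (k - 1)) * (1 - k / var_sum)    # Cronbach's alpha (item variances are 1)
cov_sum_F = l.sum()                          # Cov(Y1+Y2+Y3, F), since Var(F) = 1
reliability = cov_sum_F**2 / var_sum         # squared correlation of the summate with F
print(round(var_sum, 2), round(alpha, 3), round(reliability, 3))  # -> 4.62 0.526 0.554
```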
5.A. (10) Comment on both convergent and discriminant validity.
Solution: Partition the correlation matrix into 2x2 blocks. Since the correlations are high within both diagonal blocks, there is good convergent validity, meaning that the X variables seem to measure a similar quantity, and the Y variables seem to measure a similar quantity. Since the (2x2) off-diagonal block has low correlations, the X variables are not similar to the Y variables. Hence the X variables and the Y variables seem to be measuring different things, and thus there is good discriminant validity.
5.B. (5) Suppose the data are standardized to Z-scores Y1s, Y2s, X1s, X2s, the "s" in the subscript denoting that the variables have been standardized and are now different from Y1, Y2, X1, X2. Give the covariance matrix of the vector (Y1s, Y2s, X1s, X2s).
Solution: The covariance matrix of the standardized data is equal to the correlation matrix of the unstandardized data. So it is already shown above.
5.C. (15) Find the correlation between the two summates (Y1s+Y2s) and (X1s+X2s).
Solution: Cov(Y1s+Y2s, X1s+X2s) is the sum of the four cross-correlations, which is .6. Also, Var(Y1s+Y2s) = 3.8 and Var(X1s+X2s) = 3.6, each being 2 plus twice the correlation within the pair. So the correlation is .6/{sqrt(3.8) × sqrt(3.6)} = .162.
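Plugging the solution's numbers into the correlation formula as a one-line check:

```python
import math

cov = 0.6     # Cov(Y1s+Y2s, X1s+X2s): sum of the four cross-correlations
var_y = 3.8   # Var(Y1s+Y2s)
var_x = 3.6   # Var(X1s+X2s)

r = cov / (math.sqrt(var_y) * math.sqrt(var_x))
print(round(r, 3))  # -> 0.162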
Solution: The test for a difference between the mean vectors "failed to reject," so at this point you don't have any evidence of a difference between the two populations. For all you know, the populations might even be identical. If that is the case, it is dubious that you could classify observations as having come from one group or the other; i.e., it is dubious that discriminant analysis will provide a good classification model.