Study Guide for Exam 2 | Applied Multivariate Analysis | STAT 636, Exams of Descriptive statistics

Material Type: Exam; Class: APPLIED MULTIVARIATE ANLY; Subject: STATISTICS; University: Texas A&M University; Term: Unknown 1989;

Uploaded on 02/10/2009

STAT 636
EXAM #2
1. To illustrate the use of FACTOR ANALYSIS, SAS gives data for a Socio-Economic Study conducted in the Los Angeles area. The data were taken from N = 12 census areas on p = 5 variables: total population (POPULATION), average years of education (SCHOOL), number of people employed (EMPLOYMENT), a measure of professional services available (SERVICES) and the average value of houses (VALUE). In an attempt to summarize the data and identify underlying factors, a FACTOR ANALYSIS was performed. The principal factoring method (method = principal) with prior communalities equal to one yielded the following results. (Note: Other factoring methods gave similar results.)
Correlation matrix
Population School Employment Services Value
Population 1.0000 0.0098 0.9724 0.4389 0.0224
School 0.0098 1.0000 0.1543 0.6914 0.8631
Employment 0.9724 0.1543 1.0000 0.5147 0.1219
Services 0.4389 0.6914 0.5147 1.0000 0.7777
Value 0.0224 0.8631 0.1219 0.7777 1.0000
Eigenvalues
Eigenvalue Difference Proportion Cumulative
1 2.87331359 1.07665350 0.5747 0.5747
2 1.79666009 1.58182321 0.3593 0.9340
3 0.21483689 0.11490283 0.0430 0.9770
4 0.09993405 0.08467868 0.0200 0.9969
5 0.01525537 0.0031 1.0000
Factor Pattern
Factor1 Factor2
Population 0.58096 0.80642
School 0.76704 -0.54476
Employment 0.67243 0.72605
Services 0.93239 -0.10431
Value 0.79116 -0.55818
The sociologist reached the following conclusions:
(i) People tend to live where the jobs are plentiful.
(ii) People with higher education live in more expensive houses.
(iii) Over 93% of the variability in the data can be explained by two factors.
(iv) A single factor consisting of an average (possibly weighted average) of the five
variables might account for much of the difference between the 12 areas.
Question #1.a: Comment on why you think these conclusions are appropriate, using the above data to support your answers.
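Conclusions (iii) and (iv) rest on simple arithmetic with the eigenvalues and loadings. The sketch below, assuming Python with NumPy, recomputes the proportion of variance explained by each factor (λ_i / p, because the analysis used the correlation matrix, whose trace is p = 5) and each variable's communality, the sum of its squared loadings on the two retained factors. The variable names are illustrative only.

```python
import numpy as np

# Eigenvalues of the 5 x 5 correlation matrix, copied from the SAS output above.
eigenvalues = np.array([2.87331359, 1.79666009, 0.21483689, 0.09993405, 0.01525537])
p = len(eigenvalues)  # five standardized variables, so trace(R) = p

# Each standardized variable has variance 1, so the total variance is p.
proportion = eigenvalues / p
cumulative = np.cumsum(proportion)

# Two-factor pattern (loadings) from the output above.
variables = ["Population", "School", "Employment", "Services", "Value"]
loadings = np.array([
    [0.58096, 0.80642],
    [0.76704, -0.54476],
    [0.67243, 0.72605],
    [0.93239, -0.10431],
    [0.79116, -0.55818],
])

# Communality: share of each variable's unit variance explained by the two factors.
communality = (loadings ** 2).sum(axis=1)

for k, (prop, cum) in enumerate(zip(proportion, cumulative), start=1):
    print(f"Factor {k}: proportion {prop:.4f}, cumulative {cum:.4f}")
for name, h2 in zip(variables, communality):
    print(f"{name:<11s} communality {h2:.3f}")
```

The cumulative proportion after two factors reproduces the 0.9340 in the table, and the communalities sum (up to rounding) to λ1 + λ2.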

In an attempt to clarify the interpretation, a VARIMAX rotation (of about 35 degrees) was performed, yielding the following factor pattern.
Rotated factor pattern
Factor1 Factor2
Population 0.01602 0.
School 0.94076 -0.
Employment 0.13702 0.
Services 0.82481 0.
HouseValue 0.96823 -0.
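VARIMAX chooses its angle by maximizing the variance of the squared loadings; the sketch below does not implement that criterion. Assuming NumPy and the sign convention of the hypothetical `rotate` helper, it simply applies a fixed rigid rotation of 35 degrees (the approximate angle quoted above) to the unrotated pattern, to show that the rotated Factor1 column is essentially a rotated version of the original loadings and that an orthogonal rotation leaves the communalities unchanged.

```python
import numpy as np

# Unrotated two-factor pattern from Question 1.a.
loadings = np.array([
    [0.58096, 0.80642],   # Population
    [0.76704, -0.54476],  # School
    [0.67243, 0.72605],   # Employment
    [0.93239, -0.10431],  # Services
    [0.79116, -0.55818],  # Value
])


def rotate(L, degrees):
    """Rigidly rotate the two factor axes of a p x 2 loading matrix (hypothetical helper)."""
    t = np.radians(degrees)
    T = np.array([[np.cos(t), np.sin(t)],
                  [-np.sin(t), np.cos(t)]])
    return L @ T


rotated = rotate(loadings, 35.0)

# Factor1 column of the rotated pattern as printed in the output.
sas_factor1 = np.array([0.01602, 0.94076, 0.13702, 0.82481, 0.96823])

print(np.round(rotated, 3))
print("largest Factor1 discrepancy:", np.abs(rotated[:, 0] - sas_factor1).max())
# An orthogonal rotation does not change any variable's communality.
print(np.allclose((rotated ** 2).sum(axis=1), (loadings ** 2).sum(axis=1)))
```

With this convention the rotated Factor1 column agrees with the printed one to within about 0.003 (the fitted angle appears to be slightly under 35 degrees), and the signs of the rotated Factor2 column agree with the signs that survive in the table.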

Question #1.b: Does this pattern clarify the underlying variables? Support your answer with a brief description of your observations.

The same data were analyzed using PROC PRINCOMP, yielding a set of eigenvalues and eigenvectors.

Question #1.c: Recalling how the above factor analysis was obtained, what are the eigenvalues and eigenvectors for the principal components analysis?


NOTE: SINCE WE USED PRINCIPAL FACTORING WITH INITIAL COMMUNALITIES = 1 AND DID NOT ITERATE, THE EIGENVALUES IN THE PC ANALYSIS ARE THE SAME AS THOSE ABOVE, AND WE HAVE THE RELATION

$$\text{FACTOR}_i = \sqrt{\lambda_i}\,\text{EIGENVECTOR}_i$$

THIS IS THE ESSENTIAL RELATION BETWEEN THE TWO CONCEPTS. THE ELEMENTS OF $\text{FACTOR}_i$ ARE THE CORRELATIONS OF THE DATA X WITH THE $i$TH FACTOR. RECALL THAT $\text{FACTOR}_i = L_i$ AS DEFINED IN THE PC DISCUSSION.
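The relation can be checked numerically. A sketch, assuming Python with NumPy: take the eigendecomposition of the correlation matrix (PROC PRINCOMP works from the correlation matrix by default unless the COV option is given), then divide each loading column by √λ_i and compare with the eigenvectors. Because the printed correlations are rounded to four decimals, agreement is only to about three decimals, and eigenvector signs are arbitrary, so the code aligns them before comparing.

```python
import numpy as np

# Correlation matrix from Question 1 (rounded to four decimals in the output).
R = np.array([
    [1.0000, 0.0098, 0.9724, 0.4389, 0.0224],
    [0.0098, 1.0000, 0.1543, 0.6914, 0.8631],
    [0.9724, 0.1543, 1.0000, 0.5147, 0.1219],
    [0.4389, 0.6914, 0.5147, 1.0000, 0.7777],
    [0.0224, 0.8631, 0.1219, 0.7777, 1.0000],
])
# Unrotated two-factor pattern from Question 1.a.
loadings = np.array([
    [0.58096, 0.80642],
    [0.76704, -0.54476],
    [0.67243, 0.72605],
    [0.93239, -0.10431],
    [0.79116, -0.55818],
])

# eigh returns eigenvalues in ascending order; sort them in descending order.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# FACTOR_i = sqrt(lambda_i) * EIGENVECTOR_i, so EIGENVECTOR_i = FACTOR_i / sqrt(lambda_i).
from_factors = loadings / np.sqrt(eigvals[:2])

# Eigenvectors are defined only up to sign; align each one with its factor column.
for i in range(2):
    if np.dot(eigvecs[:, i], from_factors[:, i]) < 0:
        eigvecs[:, i] = -eigvecs[:, i]

print("eigenvalues:", np.round(eigvals, 4))
print("eigenvectors 1-2 recovered from loadings:\n", np.round(from_factors, 4))
print("max difference:", np.abs(eigvecs[:, :2] - from_factors).max())
```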


2. Remote sensing data were taken on seven fields planted with corn and six planted with soybeans. Four variables, x1 through x4, were recorded for each field. We would like to know whether these four variables can discriminate between the two crops and whether a rule can be developed for classifying a future observation on these four variables as either corn or soybeans. The SAS output from PROC DISCRIM yields

Classification Table
Corn Soybeans Total
Corn 6 1 7
Soybeans 1 5 6
Total 7 6 13
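A quick summary of a classification table is the apparent (resubstitution) error rate, the fraction of counts off the diagonal. A sketch in Python with NumPy, assuming rows are the true crops and columns the crops assigned by the rule (PROC DISCRIM's "from" by "classified into" layout). This rate is optimistic because the same 13 fields were used to build the rule; PROC DISCRIM's CROSSVALIDATE option gives a leave-one-out estimate instead.

```python
import numpy as np

# Rows: true crop; columns: crop assigned by the discriminant rule (totals omitted).
labels = ["Corn", "Soybeans"]
table = np.array([[6, 1],
                  [1, 5]])

correct = np.trace(table)
n = table.sum()
apparent_error = 1 - correct / n

# Per-class error: misclassified fields divided by the number of fields of that crop.
per_class_error = 1 - np.diag(table) / table.sum(axis=1)

print(f"apparent error rate: {apparent_error:.3f}")
for name, err in zip(labels, per_class_error):
    print(f"{name}: {err:.3f} misclassified")
```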

(iv) What do you conclude from the CANDISC output?

In a further study, three crops (Clover, Cotton and Sugar beets) were examined on these same four variables. PROC DISCRIM yields the following result.

Linear Discriminant Function for Crop

Variable Clover Cotton Sugar beets
Constant -6.0 -7.0 -5.
x1 0.10 0.01 0.
x2 0.12 0.11 0.
x3 0.10 0.11 0.
x4 0.12 0.15 0.

Question #2.b (i) Determine the rule for classifying a new field. (ii) Suppose a new field yields (x1, x2, x3, x4) = (20, 30, 10, 20). How would you classify it?


JUST AS ABOVE, WE CAN DEFINE THREE LINES, $L_{\text{CLOVER/COTTON}}$, $L_{\text{CLOVER/BEETS}}$ AND $L_{\text{COTTON/BEETS}}$, BY TAKING DIFFERENCES OF COLUMNS. IF OUR NEW FIELD YIELDS $L_{\text{CLOVER/COTTON}} > 0$, WE CHOOSE CLOVER OVER COTTON; IF IT YIELDS $L_{\text{CLOVER/BEETS}} > 0$, WE CHOOSE CLOVER OVER BEETS, ETC. THIS HELPS US VISUALIZE THE RULE.

ALTERNATIVELY, WE CAN EVALUATE EACH COLUMN FOR THE NEW FIELD, THUS COMPUTING $L_{\text{CLOVER}}$, $L_{\text{COTTON}}$ AND $L_{\text{BEETS}}$. NOTING, FOR EXAMPLE, THAT $L_{\text{CLOVER/COTTON}} = L_{\text{CLOVER}} - L_{\text{COTTON}}$, ETC., WE SEE THAT THE ABOVE RULE SAYS TO ASSIGN THE NEW FIELD TO THE CROP WITH THE MAXIMUM OF $L_{\text{CLOVER}}$, $L_{\text{COTTON}}$ AND $L_{\text{BEETS}}$. IN THIS CASE WE HAVE $L_{\text{CLOVER}} = 3$, $L_{\text{COTTON}} = 0.6$ AND $L_{\text{BEETS}} = 1.3$, HENCE WE CONCLUDE CLOVER.
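The maximum-score rule takes only a few lines. A sketch in plain Python with hypothetical helper names `score` and `classify`; the Sugar beets coefficients are truncated in this copy of the output, so rather than guess them the code plugs in the score 1.3 quoted in the answer for this particular field.

```python
# Linear discriminant functions from the PROC DISCRIM output: (constant, [x1, x2, x3, x4]).
coefficients = {
    "Clover": (-6.0, [0.10, 0.12, 0.10, 0.12]),
    "Cotton": (-7.0, [0.01, 0.11, 0.11, 0.15]),
}


def score(constant, weights, x):
    """Evaluate one linear discriminant function at the observation x (hypothetical helper)."""
    return constant + sum(w * xi for w, xi in zip(weights, x))


def classify(scores):
    """Assign the observation to the crop with the largest discriminant score."""
    return max(scores, key=scores.get)


new_field = (20, 30, 10, 20)
scores = {crop: score(c, w, new_field) for crop, (c, w) in coefficients.items()}
# Sugar beets: coefficients truncated in the source; 1.3 is the score given in the answer.
scores["Sugar beets"] = 1.3

# Pairwise form of the same rule: choose Clover over Cotton when the difference is positive.
clover_vs_cotton = scores["Clover"] - scores["Cotton"]

print({crop: round(s, 2) for crop, s in scores.items()})
print("L_CLOVER/COTTON =", round(clover_vs_cotton, 2))
print("classified as:", classify(scores))
```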


3. A remote sensing study was performed on 24 fields, recording the variables x1 through x4. We wish to see whether these data define groups of similar fields, so the data are submitted to PROC CLUSTER. As a part of your analysis, you would like to have a graphic display of how well you did.

Question #3a. Describe briefly how you would obtain this in SAS. It is not necessary to give detailed computer commands.

Suppose the following cluster description was obtained:

OBS ID CLUS
1 2 1
2 5 1
3 1 1
4 15 1
5 33 1
6 4 1
7 17 1
8 3 1
9 6 1
10 14 1
11 16 1
12 27 1

Question #3b. After the analysis was performed, it was determined that the three crops observed were CORN with ID = 1 to 7, COTTON with ID = 14 to 19 and CLOVER with ID = 26 to 36. Develop a frequency table to assess the effectiveness of the clustering procedure.
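The frequency table is a cross-tabulation of true crop against assigned cluster. A sketch in plain Python using only the 12 observations listed above (the rest of the listing is not shown in this copy); `crop_of` is a hypothetical helper that encodes the ID ranges from the question.

```python
from collections import Counter

# (ID, cluster) pairs for the 12 observations listed above.
assignments = [(2, 1), (5, 1), (1, 1), (15, 1), (33, 1), (4, 1),
               (17, 1), (3, 1), (6, 1), (14, 1), (16, 1), (27, 1)]


def crop_of(field_id):
    """Map a field ID to its crop using the ID ranges in Question #3b (hypothetical helper)."""
    if 1 <= field_id <= 7:
        return "Corn"
    if 14 <= field_id <= 19:
        return "Cotton"
    if 26 <= field_id <= 36:
        return "Clover"
    return "Unknown"


# Count (crop, cluster) combinations: the cells of the frequency table.
counts = Counter((crop_of(i), c) for i, c in assignments)
crops = ["Corn", "Cotton", "Clover"]
clusters = sorted({c for _, c in assignments})

print("Crop    " + "".join(f"CLUS {c:<3d}" for c in clusters))
for crop in crops:
    print(f"{crop:<8s}" + "".join(f"{counts[(crop, c)]:<8d}" for c in clusters))
```

Ideally each crop would fall almost entirely into a single cluster. In the rows shown, cluster 1 already holds corn, cotton and clover fields, which would count against the procedure if the full listing bore it out.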

4. You are given an N by p data matrix, X, consisting of N p-variate observations. Sometimes we are interested in the rows of X and sometimes in the columns.

Question #4. Discuss (briefly) four multivariate methods that are used for one or the other of these purposes, indicating the objective of the method and a concise description of the basic computations involved.


PC: GIVEN THE P-VARIATE DATA X, WE WISH TO DEFINE A SMALL NUMBER OF LINEAR FUNCTIONS OF X THAT EXPLAIN THE VARIABILITY. IN PARTICULAR, WE SEEK Y1 THAT HAS MAXIMUM VARIABILITY, Y2 THAT IS UNCORRELATED WITH Y1 AND HAS THE NEXT LARGEST VARIABILITY, ETC. THE COEFFICIENTS OF THESE FUNCTIONS ARE GIVEN BY THE EIGENVECTORS OF THE COVARIANCE (OR CORRELATION) MATRIX, AND THEIR VARIANCES BY THE EIGENVALUES. THIS CORRESPONDS TO AN ORTHOGONAL ROTATION OF THE COORDINATE SYSTEM. SMALL EIGENVALUES SUGGEST DEGENERACIES IN THE DATA.
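The claim that the components are uncorrelated, with variances equal to the eigenvalues, can be checked directly: if E holds the eigenvectors of R, the covariance of Y = ZE for standardized data Z is E'RE, which should be diagonal. A sketch, assuming Python with NumPy and reusing the Question 1 correlation matrix; it also prints the eigenvector for the smallest eigenvalue, whose weights indicate which variables are nearly linearly dependent.

```python
import numpy as np

names = ["Population", "School", "Employment", "Services", "Value"]
# Correlation matrix from Question 1.
R = np.array([
    [1.0000, 0.0098, 0.9724, 0.4389, 0.0224],
    [0.0098, 1.0000, 0.1543, 0.6914, 0.8631],
    [0.9724, 0.1543, 1.0000, 0.5147, 0.1219],
    [0.4389, 0.6914, 0.5147, 1.0000, 0.7777],
    [0.0224, 0.8631, 0.1219, 0.7777, 1.0000],
])

# Eigenvectors sorted by decreasing eigenvalue (eigh returns ascending order).
eigvals, E = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, E = eigvals[order], E[:, order]

# Covariance of the components Y = Z E: diagonal, with the eigenvalues as variances.
cov_Y = E.T @ R @ E
print(np.round(cov_Y, 6))

# A tiny eigenvalue means this combination of variables is nearly constant,
# i.e. an approximate linear dependency among the columns of X.
print("smallest eigenvalue:", round(eigvals[-1], 4))
for name, w in zip(names, E[:, -1]):
    print(f"{name:<11s} {w:+.3f}")
```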

FA: HERE WE ARE INTERESTED IN EXPLAINING THE DATA X IN TERMS OF A SMALL NUMBER OF COMMON FACTORS. THE FACTORS ARE SIMILAR TO THE PC EIGENVECTORS SCALED BY $\sqrt{\lambda_i}$ (SEE QUESTION 1C). THE FACTORS DEFINE LOADINGS THAT ARE THE CORRELATIONS OF THE X DATA WITH THE FACTORS. AFTER OBTAINING THE FACTORS WE MAY CONSIDER ROTATING THEM TO