Study Guide for Exam 2 | Applied Multivariate Analysis | STAT 636, Exams of Descriptive statistics

Texas A&M University (A&M)Descriptive statistics

Material Type: Exam; Class: APPLIED MULTIVARIATE ANLY; Subject: STATISTICS; University: Texas A&M University; Term: Unknown 1989;

Typology: Exams

Pre 2010

Uploaded on 02/10/2009

koofers-user-gah 🇺🇸

10 documents


STAT 636

EXAM #2

1. To illustrate the use of FACTOR ANALYSIS, SAS gives data for a Socio-Economic Study conducted in the Los Angeles area. The data were taken from N = 12 census areas on p = 5 variables. The variables are total population (POPULATION), average years of education (SCHOOL), number of people employed (EMPLOYMENT), a measure of professional services available (SERVICES) and the average value of houses (VALUE). In an attempt to summarize the data and identify underlying factors, a FACTOR ANALYSIS was performed. The principal factoring method (method = principal) with prior communalities equal to one yielded the following results. (Note: Other factoring methods gave similar results.)

Correlation matrix

             Population   School   Employment   Services    Value
Population       1.0000   0.0098       0.9724     0.4389   0.0224
School           0.0098   1.0000       0.1543     0.6914   0.8631
Employment       0.9724   0.1543       1.0000     0.5147   0.1219
Services         0.4389   0.6914       0.5147     1.0000   0.7777
Value            0.0224   0.8631       0.1219     0.7777   1.0000

Eigenvalues

     Eigenvalue   Difference   Proportion   Cumulative
1    2.87331359   1.07665350       0.5747       0.5747
2    1.79666009   1.58182321       0.3593       0.9340
3    0.21483689   0.11490283       0.0430       0.9770
4    0.09993405   0.08467868       0.0200       0.9969
5    0.01525537                    0.0031       1.0000

Factor Pattern

              Factor1    Factor2
Population    0.58096    0.80642
School        0.76704   -0.54476
Employment    0.67243    0.72605
Services      0.93239   -0.10431
Value         0.79116   -0.55818

The sociologist reached the following conclusions:

(i) People tend to live where the jobs are plentiful.

(ii) People with higher education live in more expensive houses.

(iii) Over 93% of the variability in the data can be explained by two factors.

(iv) A single factor consisting of an average (possibly weighted average) of the five

variables might account for much of the difference between the 12 areas.

Question #1.a: Comment on why you think these conclusions are appropriate using the

above data to support your answers.
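
As a quick arithmetic check on conclusion (iii), the Proportion and Cumulative columns can be recomputed from the eigenvalues alone. The sketch below uses Python with NumPy, which is an assumption of this illustration (the exam itself works in SAS), and the variable names are hypothetical.

```python
import numpy as np

# Eigenvalues of the 5 x 5 correlation matrix, copied from the exam's table.
eigenvalues = np.array([2.87331359, 1.79666009, 0.21483689, 0.09993405, 0.01525537])

# The eigenvalues of a correlation matrix sum to its trace, p = 5,
# so each proportion is simply eigenvalue / 5.
proportion = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(proportion)

for i, (lam, prop, cum) in enumerate(zip(eigenvalues, proportion, cumulative), start=1):
    print(f"{i}  {lam:.8f}  {prop:.4f}  {cum:.4f}")
```

The second cumulative value is the share of total variance carried by the first two factors, which is the figure conclusion (iii) refers to.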


In an attempt to clarify the interpretation, a VARIMAX rotation was performed (~35 degrees), yielding the following factor pattern.

Rotated factor pattern

              Factor1    Factor2
Population    0.01602    0.
School        0.94076   -0.
Employment    0.13702    0.
Services      0.82481    0.
HouseValue    0.96823   -0.

Question #1.b: Does this pattern clarify the underlying variables? Support your answer with a brief description of your observations.
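
To see what a varimax rotation does to the unrotated pattern from part 1, here is a minimal sketch in Python with NumPy. It is an illustration, not SAS's procedure: it uses the common SVD-based iteration for raw varimax with no row normalization, whereas SAS may normalize rows before rotating, so the printed values need not agree with the SAS output digit for digit, and column order or signs may differ. The function names `varimax` and `varimax_criterion` are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation (no row normalization) via the SVD iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix derived from the gradient of the varimax criterion.
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # orthogonal matrix closest to the target direction
        previous, objective = objective, s.sum()
        if previous != 0 and objective / previous < 1 + tol:
            break
    return loadings @ rotation, rotation

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return np.var(loadings**2, axis=0).sum()

# Unrotated factor pattern from the exam.
# Rows: Population, School, Employment, Services, Value.
pattern = np.array([
    [0.58096,  0.80642],
    [0.76704, -0.54476],
    [0.67243,  0.72605],
    [0.93239, -0.10431],
    [0.79116, -0.55818],
])

rotated, rotation = varimax(pattern)
print(np.round(rotated, 3))
print("criterion before:", round(varimax_criterion(pattern), 4))
print("criterion after: ", round(varimax_criterion(rotated), 4))
```

Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged. Only the way that variance is split between the two factors changes, which is what makes the rotated pattern easier to read.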

The same data was analyzed using PROC PRINCOMP yielding a set of eigenvalues and eigenvectors.

Question #1.c: Recalling how the above factor analysis was obtained, what are the eigenvalues and eigenvectors for the principal components analysis?

NOTE: SINCE WE USED PRINCIPAL FACTORING WITH INITIAL COMMUNALITIES = 1 AND DID NOT ITERATE, THE EIGENVALUES IN THE PC ANALYSIS WOULD BE THE SAME, AND WE HAVE THE RELATION

$$\text{FACTOR}_i = \sqrt{\lambda_i}\,\text{EIGENVECTOR}_i$$

WHERE $\lambda_i$ IS THE $i$TH EIGENVALUE. THIS IS THE ESSENTIAL RELATION THAT HELPS TO RELATE THE TWO CONCEPTS. THE ELEMENTS OF $\text{FACTOR}_i$ REPRESENT THE CORRELATIONS OF THE DATA X WITH THE $i$TH FACTOR. RECALL THAT $\text{FACTOR}_i = L_i$ AS DEFINED IN THE PC DISCUSSION.
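
The relation between factor loadings and principal-component eigenvectors can be checked numerically from the correlation matrix printed in question 1. The sketch below is a Python/NumPy illustration, not part of the exam. The sign convention (largest-magnitude entry of each eigenvector made positive) is an assumption chosen here, since eigenvectors are only defined up to sign, and the matrix entries are rounded to four decimals, so agreement with the SAS output is approximate.

```python
import numpy as np

# Correlation matrix from the exam.
# Order: Population, School, Employment, Services, Value.
R = np.array([
    [1.0000, 0.0098, 0.9724, 0.4389, 0.0224],
    [0.0098, 1.0000, 0.1543, 0.6914, 0.8631],
    [0.9724, 0.1543, 1.0000, 0.5147, 0.1219],
    [0.4389, 0.6914, 0.5147, 1.0000, 0.7777],
    [0.0224, 0.8631, 0.1219, 0.7777, 1.0000],
])

# eigh returns eigenvalues in ascending order; sort them descending.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Assumed sign convention: make the largest-magnitude entry of each column positive.
for j in range(eigenvectors.shape[1]):
    pivot = np.argmax(np.abs(eigenvectors[:, j]))
    if eigenvectors[pivot, j] < 0:
        eigenvectors[:, j] *= -1

# FACTOR_i = sqrt(lambda_i) * EIGENVECTOR_i, applied column by column.
loadings = eigenvectors * np.sqrt(eigenvalues)

print(np.round(eigenvalues, 4))
print(np.round(loadings[:, :2], 4))
```

The first two columns of `loadings` can be compared with the Factor1 and Factor2 columns of the unrotated factor pattern, and the eigenvalues with the eigenvalue table.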

Remote sensing data were taken on seven fields planted with corn and six planted with soybeans. Four variables, x1, ..., x4, were recorded for each field. We would like to know whether these four variables can discriminate between the two crops, and whether a rule can be developed for classifying a future observation on these four variables as either corn or soybeans. The SAS output from PROC DISCRIM yields

Classification Table

            Corn   Soybeans   Total
Corn           6          1       7
Soybeans       1          5       6
Total          7          6      13
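
One number usually read off a table like this is the apparent (resubstitution) error rate. The short Python sketch below is an illustration under the assumption that the diagonal cells are the correctly classified fields. That holds whichever margin is the actual crop and whichever is the assigned crop, so the table's orientation does not need to be known.

```python
import numpy as np

# 2 x 2 body of the classification table, in the order Corn, Soybeans.
confusion = np.array([
    [6, 1],
    [1, 5],
])

correct = np.trace(confusion)  # fields on the diagonal
total = confusion.sum()        # all 13 fields
error_rate = 1 - correct / total

print(f"{total - correct} of {total} fields misclassified ({error_rate:.1%})")
```

This rate is computed on the same fields used to build the rule, so it tends to be optimistic about how the rule would perform on new fields.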

(iv) What do you conclude from the CANDISC output?

In a further study, three crops, Clover, Cotton and Sugar beets were examined on these same four variables. PROC DISCRIM yields the following result.

Linear Discriminant Function for Crop

Variable    Clover   Cotton   Sugar beets
Constant     -6.0     -7.0     -5.
x1            0.10     0.01     0.
x2            0.12     0.11     0.
x3            0.10     0.11     0.
x4            0.12     0.15     0.

Question #2.b: (i) Determine the rule for classifying a new field. (ii) Suppose a new field yields (x1, ..., x4) = (20, 30, 10, 20). How would you classify it?

JUST AS ABOVE, WE CAN DEFINE THREE LINES, $L_{\text{CLOVER/COTTON}}$, $L_{\text{CLOVER/BEETS}}$ AND $L_{\text{COTTON/BEETS}}$, BY TAKING DIFFERENCES OF COLUMNS. IF OUR NEW FIELD YIELDS $L_{\text{CLOVER/COTTON}} > 0$, WE CHOOSE CLOVER OVER COTTON, AND IF IT YIELDS $L_{\text{CLOVER/BEETS}} > 0$, WE CHOOSE CLOVER OVER BEETS, ETC. THIS HELPS US VISUALIZE THE RULE.

ALTERNATIVELY, WE CAN EVALUATE EACH COLUMN FOR THE NEW FIELD, THUS COMPUTING $L_{\text{CLOVER}}$, $L_{\text{COTTON}}$ AND $L_{\text{BEETS}}$. NOTING, FOR EXAMPLE, THAT $L_{\text{CLOVER/COTTON}} = L_{\text{CLOVER}} - L_{\text{COTTON}}$, ETC., WE SEE THAT THE ABOVE RULE SAYS TO ASSIGN THE NEW FIELD TO THE CROP THAT YIELDS THE MAXIMUM OF $L_{\text{CLOVER}}$, $L_{\text{COTTON}}$ AND $L_{\text{BEETS}}$. IN THIS CASE WE HAVE $L_{\text{CLOVER}} = 3$, $L_{\text{COTTON}} = 0.6$ AND $L_{\text{BEETS}} = 1.3$; HENCE WE CONCLUDE CLOVER.
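
The maximum-score rule can be sketched in a few lines of Python. Only the Clover and Cotton columns of the discriminant table survive in full, so those two scores are computed from their coefficients. The Sugar beets coefficients are truncated in this copy, so that score is taken directly from the value quoted in the worked answer rather than recomputed. The dictionary layout and variable names are hypothetical.

```python
import numpy as np

# (constant, coefficients for x1..x4) for the columns that are fully legible.
functions = {
    "Clover": (-6.0, np.array([0.10, 0.12, 0.10, 0.12])),
    "Cotton": (-7.0, np.array([0.01, 0.11, 0.11, 0.15])),
}

x_new = np.array([20.0, 30.0, 10.0, 20.0])  # the new field from part (ii)

# Linear discriminant score: constant + coefficients . x
scores = {crop: float(const + coef @ x_new) for crop, (const, coef) in functions.items()}

# Sugar beets: coefficients are truncated in the source, so the score
# below is the value stated in the worked answer, not a computed one.
scores["Sugar beets"] = 1.3

assigned = max(scores, key=scores.get)
for crop, score in scores.items():
    print(f"{crop:12s} {score:.2f}")
print("Assign the new field to:", assigned)
```

Comparing scores pairwise, as in the first paragraph of the answer, and taking the overall maximum, as in the second, always select the same crop, since a crop wins every pairwise comparison exactly when its score is the largest.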

A remote sensing study was performed on 24 fields, recording the variables x1, ..., x4. We wish to see if these data define groups of fields that are similar, and the data are submitted to PROC CLUSTER. As a part of your analysis, you would like to have a graphic display of how well you did.

Question #3a. Describe briefly how you would obtain this in SAS. It is not necessary to give detailed computer commands.

Suppose the following cluster description was obtained:

OBS   ID   CLUS
  1    2      1
  2    5      1
  3    1      1
  4   15      1
  5   33      1
  6    4      1
  7   17      1
  8    3      1
  9    6      1
 10   14      1
 11   16      1
 12   27      1

OBS ID CLUS OBS ID CLUS

Question #3b: After the analysis was performed it was determined that the three crops observed were CORN with ID = 1 to 7, COTTON with ID = 14 to 19 and CLOVER with ID = 26 to 36. Develop a frequency table to assess the effectiveness of the clustering procedure.
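
A frequency table of this kind is a cross-tabulation of true crop against assigned cluster. The Python sketch below builds it for the twelve listing rows that survive in this copy, all of which fall in cluster 1. The remaining rows are missing, so this is only a partial table. Reading the ID specifications as the ranges 1 to 7, 14 to 19 and 26 to 36 is an assumption, and `crop_for` is a hypothetical helper.

```python
from collections import Counter

# (OBS, ID, CLUS) rows that are legible in the cluster listing.
rows = [
    (1, 2, 1), (2, 5, 1), (3, 1, 1), (4, 15, 1),
    (5, 33, 1), (6, 4, 1), (7, 17, 1), (8, 3, 1),
    (9, 6, 1), (10, 14, 1), (11, 16, 1), (12, 27, 1),
]

def crop_for(field_id):
    """Map a field ID to its crop, assuming the ID ranges described above."""
    if 1 <= field_id <= 7:
        return "CORN"
    if 14 <= field_id <= 19:
        return "COTTON"
    if 26 <= field_id <= 36:
        return "CLOVER"
    return "UNKNOWN"

# Count fields for each (crop, cluster) combination.
table = Counter((crop_for(field_id), cluster) for _, field_id, cluster in rows)

clusters = sorted({cluster for _, _, cluster in rows})
print("CROP    " + "  ".join(f"CLUS{c}" for c in clusters))
for crop in ("CORN", "COTTON", "CLOVER"):
    print(f"{crop:8s}" + "  ".join(f"{table[(crop, c)]:5d}" for c in clusters))
```

In a complete table, an effective clustering would put most of each crop's fields in a single cluster and most of each cluster's fields in a single crop.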

You are given an N by p data matrix, X, consisting of N p-variate observations. Sometimes we are interested in the rows of X and sometimes in the columns.

Question #4. Discuss (briefly) four multivariate methods that are used for one or the other of these purposes, indicating the objective of the method and a concise description of the basic computations involved.

PC: GIVEN THE P-VARIATE DATA X, WE WISH TO DEFINE A SMALL NUMBER OF LINEAR FUNCTIONS OF X THAT EXPLAIN THE VARIABILITY. IN PARTICULAR WE SEEK Y1 THAT HAS MAXIMUM VARIABILITY, Y2 THAT IS UNCORRELATED WITH Y1 AND HAS THE NEXT LARGEST VARIABILITY, ETC. THESE ARE DETERMINED BY THE EIGENVECTORS, AND THEIR VARIANCES BY THE EIGENVALUES, OF THE COVARIANCE (OR CORRELATION) MATRIX OF X. THIS CORRESPONDS TO AN ORTHOGONAL ROTATION OF THE COORDINATE SYSTEM. SMALL EIGENVALUES SUGGEST DEGENERACIES IN THE DATA.
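
The basic computations behind principal components can be illustrated on made-up data. The Python/NumPy sketch below generates a synthetic data matrix (the data, the seed and the `mixing` matrix are all invented for illustration), forms the components from the eigenvectors of the sample covariance matrix, and confirms that the components are uncorrelated with variances equal to the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N = 200 observations on p = 3 correlated variables.
mixing = np.array([
    [2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.5, 0.2, 0.1],
])
X = rng.standard_normal((200, 3)) @ mixing

# Eigen-decomposition of the sample covariance matrix, sorted descending.
S = np.cov(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(S)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Principal components: centered data projected onto the eigenvectors,
# i.e. an orthogonal rotation of the coordinate system.
Y = (X - X.mean(axis=0)) @ eigenvectors

# The covariance of Y is diagonal, with the eigenvalues on the diagonal.
cov_Y = np.cov(Y, rowvar=False)
print(np.round(eigenvalues, 4))
print(np.round(cov_Y, 4))
```

Using the correlation matrix in place of `S` corresponds to standardizing the variables first, which is the form used in question 1.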

FA: HERE WE ARE INTERESTED IN EXPLAINING THE DATA X IN TERMS OF A SMALL NUMBER OF COMMON FACTORS. THE FACTORS ARE SIMILAR TO THE PC EIGENVECTORS (SEE QUESTION 1C). THE FACTORS DEFINE LOADINGS THAT ARE THE CORRELATIONS OF THE X DATA WITH THE FACTORS. AFTER OBTAINING THE FACTORS WE MAY CONSIDER ROTATING THEM TO