Examination 2 - Applied Multivariable Statistics | STAT 530, Exams of Statistics

Material Type: Exam; Professor: Habing; Class: APPLIED MULTIVARI STATS; Subject: Statistics; University: University of South Carolina - Columbia; Term: Unknown 1989;

Uploaded on 09/02/2009

STAT 530 Exam 2 - Due by 5:30pm Tuesday, December 6th
• The exam should be turned in to me or to the secretary in room 216; it should not be left in my mailbox.
• You may use your notes, any text or reference book, the course web page, and any combination of R, SAS, and SPSS.
• You may not discuss the problems with anyone (especially your fellow students or other instructors) except me. I am happy to give computer advice.
• You must turn in the code/commands used to generate the output.
• The exam is worth 80 points. Question 1 is worth 10 points and questions 2-6 are worth 14 points each. Notice that students taking the course for graduate credit must answer one of the two additional questions.
The raw data set used for all of the following questions can be found at:
http://www.stat.sc.edu/~habing/courses/data/finbears.txt
It is an expanded version of a data set we looked at earlier this semester and is based on a data set originally
described in "Reader's Digest" (April, 1979) and "Sports Afield" (September, 1981). It consists of several
measurements for bears that were captured, measured, and released. (In the full data set, several of the bears
were actually caught multiple times over a period of years.)
The variables in the data set are:
Name – Name of the Bear
Sex – M=Male, F=Female
Age – Estimated Age in Months
Maturity – mature if more than 18 months, otherwise young
Weight – Weight in Pounds
Length – Body Length in Inches
ChestG – Girth of Chest in Inches
HeadL – Length of Head in Inches
HeadW – Width of Head in Inches
NeckG – Girth of Neck in Inches
sChestG – Standardized ChestG divided by Length
sHeadL – Standardized HeadL divided by Length
sHeadW – Standardized HeadW divided by Length
sNeckG – Standardized NeckG divided by Length
Grp – First letter indicates sex, second is Y for young and O for mature
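
The descriptions of the four s-prefixed variables can be read more than one way. The sketch below shows one possible reading, assuming each measurement is first divided by Length and the resulting ratio is then converted to a z-score. The numbers are hypothetical and are not taken from finbears.txt, and the file itself may have been built differently.

```r
# Hypothetical measurements in inches; NOT values from finbears.txt.
ChestG <- c(45, 38, 26, 41)
Length <- c(67, 60, 43, 63)

# Assumed reading: take the ratio to body length, then standardize it.
ratio   <- ChestG / Length            # chest girth relative to body length
sChestG <- as.numeric(scale(ratio))   # center to mean 0, rescale to sd 1

round(sChestG, 3)
```

Under this reading the standardized variables are unit-free, so no single variable dominates a distance calculation because of its scale.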
The observations are currently ordered by name.
If it is read into R as fbears, it may be useful to use attach(fbears) so that you can refer to each column
merely by its name.
Also recall that:
cbind will produce a matrix of the selected columns, as in cbind(Length,NeckG,sNeckG)
as.matrix can do this as well, as in as.matrix(fbears[,c(6,10,14)])
as.character will convert a factor to a string, as in as.character(Name)
== will let you indicate a subset of a matrix, as in fbears[Grp=="FO",]
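
The four reminders above can be tried on a small stand-in before touching the real file. This is a minimal sketch: the toy data frame and its values are invented for illustration, it has only four of the columns, and with() is used in place of attach() so the search path is left untouched.

```r
# Toy stand-in for the bear data; names and values are invented.
fbears <- data.frame(
  Name   = factor(c("Adam", "Bea", "Cub", "Dot")),
  Length = c(60, 55, 40, 58),
  NeckG  = c(24, 20, 15, 22),
  Grp    = c("MO", "FO", "MY", "FO")
)

# with() gives the same by-name access as attach() for a single expression.
m1 <- with(fbears, cbind(Length, NeckG))          # matrix from named columns
m2 <- as.matrix(fbears[, c("Length", "NeckG")])   # same columns via indexing
bear_names <- as.character(fbears$Name)           # factor -> character vector
mature_females <- fbears[fbears$Grp == "FO", ]    # rows where Grp equals "FO"
```

Selecting columns by name rather than by position, as in m2, avoids having to count columns in the full data set.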

  1. Each part of this question can be answered simply by providing the appropriate name.

a) What multivariate statistical method would both indicate how many underlying latent variables are needed to model the quantitative variables in this data set and describe the relationships between the latent and observed variables?

b) Which method from the first half of the course gives the same result as classical multidimensional scaling using Euclidean distance? (Be specific.)

c) What multivariate method is used to test if a specific path diagram describes the underlying relationship between the variables in this data set?

d) If bears took multiple choice tests, what group of multivariate statistical methods would be useful for analyzing their test results?

e) What type of distance is very closely related to multivariate normality, the chi-square plot for checking multivariate normality, and Hotelling's T-square?

  2. It is desired to see if the four standardized variables sChestG, sHeadL, sHeadW, and sNeckG, along with the overall Length, are able to separate the bears based on sex and maturity group.

Perform a linear discriminant analysis to find the combinations of these five variables that best separate the bears into the four groups. Provide an appropriate measure of how well the linear discriminant functions do at separating the four groups overall, and indicate which of the groups seem to be identified well and which seem to be identified poorly.

One piece of the output produced in such an analysis is the set of posterior probabilities. Give the estimated posterior probabilities for the first bear, Adam, and briefly say what these probabilities indicate (imagine you are explaining to someone who knows very little about discriminant analysis and doesn't want much detail). What assumptions need to be true in order for these probabilities to be accurate? (You do not need to check the assumptions.)
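
For readers working in R, the mechanics of such an analysis might look like the sketch below. It assumes the lda() function from the MASS package and uses simulated data with four made-up groups and two made-up predictors, not the bear data; the group labels, variable names, and seed are all hypothetical.

```r
library(MASS)  # provides lda(); MASS ships with standard R installations

set.seed(530)
n <- 30
# Simulated stand-in: four groups, two predictors (NOT the bear data).
sim <- data.frame(
  grp = factor(rep(c("G1", "G2", "G3", "G4"), each = n)),
  x1  = rnorm(4 * n, mean = rep(c(0, 2, 4, 6), each = n)),
  x2  = rnorm(4 * n, mean = rep(c(0, 1, 0, 1), each = n))
)

fit  <- lda(grp ~ x1 + x2, data = sim)
pred <- predict(fit)   # resubstitution predictions on the training data

round(pred$posterior[1, ], 3)                     # posteriors, first observation
table(actual = sim$grp, predicted = pred$class)   # confusion matrix by group

# Leave-one-out cross-validation gives a less optimistic error estimate.
cv <- lda(grp ~ x1 + x2, data = sim, CV = TRUE)
mean(cv$class != sim$grp)                         # cross-validated error rate
```

The confusion matrix shows which groups are classified well or poorly, while the cross-validated error rate is one candidate for an overall measure of separation.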

  3. In many instances it is desirable to find groups of individuals that are similar, whether it is to have control over variation in a designed experiment or for casting purposes in movies.

Find a group of four bears such that none of them differ very much from any of the others in the group in terms of any of the four rescaled and standardized variables sChestG, sHeadL, sHeadW, and sNeckG.

Make a table showing how those four bears compare on each of those four variables.

Briefly justify your choice of method to select the group. Be sure to include a justification for any options that you had to specify in carrying out the analysis and give any output you used to make your decision.
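
As an illustration of how a tight group of four can be located in R, the sketch below enumerates every set of four individuals and keeps the set whose largest within-group distance is smallest. This is only one of several defensible approaches, and Euclidean distance is an assumption here. The data are simulated (12 hypothetical individuals, 4 hypothetical variables), not the bear measurements.

```r
set.seed(1)
# Simulated stand-in: 12 individuals on 4 already-standardized variables.
X <- matrix(rnorm(12 * 4), nrow = 12,
            dimnames = list(paste0("id", 1:12), paste0("v", 1:4)))

D <- as.matrix(dist(X))   # Euclidean distances between all pairs

# Diameter of a candidate group = largest pairwise distance within it.
groups   <- combn(nrow(X), 4)   # each column is one candidate set of four
diameter <- apply(groups, 2, function(idx) max(D[idx, idx]))

best <- groups[, which.min(diameter)]
X[best, ]                 # table comparing the chosen four on each variable
```

Exhaustive search is feasible only for modest sample sizes. Complete-linkage hierarchical clustering, for example hclust(dist(X), method = "complete"), is a related alternative that also merges groups according to their largest pairwise distance.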