Exam 2 Problems - Applied Multivariate Statistical Analysis | STAT 530, Exams of Statistics

Material Type: Exam; Professor: Habing; Class: APPLIED MULTIVARI STATS; Subject: Statistics; University: University of South Carolina - Columbia; Term: Unknown 1989;

Uploaded on 09/02/2009

STAT 530 Exam 2 - Due by 11:00am, Tuesday, December 9th
• The exam should be turned in to me or to the secretary in room 216; it should not be left in my mailbox.
• For this exam you may use your notes, any text or reference book, the course web page, and any of SAS, SPSS, or R.
• You may not discuss the problems with anyone except me (in particular, not with your fellow students or other instructors).
• You must turn in the code (or list of menu options) used to generate the output.
• Each of the five questions is weighted equally.
• Graduate students have an additional question (see page 3).
This exam uses two data sets:
The first data set is an excerpt of data gathered by Dunn (1928) concerning white leghorn fowl (i.e., chickens)
and can be found at:
http://www.stat.sc.edu/~habing/courses/data/fowl08.txt
The eight columns in the data set are:
1 – ID – The ID number for the chicken
2 – SLength – The length of the skull
3 – SWidth – The width of the skull
4 – Femur – The length of the femur (on the leg)
5 – Tibia – The length of the tibia (on the leg)
6 – Humerus – The length of the humerus (on the wing)
7 – Ulna – The length of the ulna (on the wing)
8 – Age – The age of the chicken
All of the measurements (the lengths and widths in columns 2-7) are in mm and the age is in days. A set of
standardized measurements could be made by using:
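standardize <- function(x) {
  # center by the mean and scale by the sample standard deviation
  (x - mean(x)) / sd(x)
}
# standardize the six measurement columns (2-7) of the fowl data
sfowl <- apply(fowl[, 2:7], 2, standardize)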
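As a quick check of what this standardization does, here is a minimal sketch on a small made-up data frame. The object `toy` and its values are hypothetical and are not the fowl data. It compares the result with base R's `scale()`, which by default also centers each column by its mean and divides by its sample standard deviation.

```r
# Hypothetical toy data standing in for measurement columns (NOT the fowl data)
toy <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 40, 50))

standardize <- function(x) {
  (x - mean(x)) / sd(x)
}

# Apply the function column by column (MARGIN = 2)
stoy <- apply(toy, 2, standardize)

# Each standardized column should have mean 0 and standard deviation 1
round(colMeans(stoy), 10)
apply(stoy, 2, sd)

# scale() performs the same centering and scaling; attributes are ignored
# because scale() attaches the centers and scales it used
all.equal(stoy, scale(toy), check.attributes = FALSE)
```

Note that `apply()` returns a matrix rather than a data frame, so ID or grouping columns need to be carried along separately.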
The second data set is http://www.stat.sc.edu/~habing/courses/data/orange.txt. The data set
concerns several samples of orange juice from several different countries (BEL, LSP, TME, and VME). Each of
them has had several chemical elements measured: boron (B), barium (BA), calcium (CA), potassium (K),
magnesium (MG), manganese (MN), phosphorus (P), rubidium (RB), and zinc (ZN). The first variable is
simply an ID number.
A set of standardized measurements could be made by using:
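standardize <- function(x) {
  # center by the mean and scale by the sample standard deviation
  (x - mean(x)) / sd(x)
}
# standardize the nine element columns (3-11) of the orange data
sorange <- apply(orange[, 3:11], 2, standardize)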

  1. Consider the two subsets of variables for the fowl data set: “Skull measurements” = SLength, SWidth and “Limb measurements” = Femur, Tibia, Humerus, Ulna. Perform the appropriate multivariate analysis to determine what the strongest linear relationships are between these two sets of variables. Briefly describe what those relationships are in terms of the variables, say how you measured whether the relationships were statistically significant, give a measure of the strength of the relationships, and say whether you think the relationships are very strong or not.
  2. It is desired to see if the four elements MN, P, RB, and ZN are able to separate the oranges by the country of origin. Perform a linear discriminant analysis to find the combinations of these four variables that best separate the oranges into the appropriate groups, reporting the equations of the first three linear discriminant functions. Provide an appropriate single-number measure of how well the linear discriminant functions do at separating the four groups overall. Also indicate which country’s oranges are least likely to be classified correctly. One piece of the output produced in such an analysis is the set of posterior probabilities. Give the estimated posterior probabilities for the first orange and briefly say what these probabilities indicate (imagine you are explaining to someone who knows very little about discriminant analysis and doesn’t want much detail). What assumptions need to be true in order for these probabilities to be accurate? (You do not need to check the assumptions.)
  3. It is desired to see if the four countries of origin of the oranges differ on average in terms of the four elements MN, P, RB, and ZN. State the appropriate null and alternative hypotheses for these vectors of measurements and conduct the hypothesis test. Briefly say why you chose the test statistic you did. Give the output and state your conclusion. Why should you worry about violations of the equal covariance assumption for this data set? (You do not need to check the assumptions.)
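
The three questions above call for relating two sets of variables, separating groups with linear discriminant functions, and comparing group mean vectors. The following is only a rough sketch of the kind of R calls that could be used for such analyses, and it is not a solution to the exam. It runs on the built-in `iris` data purely as a stand-in, and the choice of `cancor()`, `MASS::lda()`, and `manova()` with Wilks' lambda is an assumption rather than something the exam prescribes. The object names (`setA`, `setB`, `cc`, `fit`, `pred`, `mfit`) are hypothetical.

```r
library(MASS)  # provides lda()

# Stand-in data: iris is used only because it ships with R (an assumption)
setA <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width")])
setB <- as.matrix(iris[, c("Petal.Length", "Petal.Width")])

# Canonical correlation analysis between two sets of variables
cc <- cancor(setA, setB)
cc$cor     # canonical correlations (strength of each relationship)
cc$xcoef   # coefficients defining the canonical variates for the first set
cc$ycoef   # coefficients defining the canonical variates for the second set

# Linear discriminant analysis with a grouping factor
fit <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
           data = iris)
fit$scaling                      # coefficients of the linear discriminant functions
pred <- predict(fit)
head(pred$posterior, 1)          # posterior probabilities for the first observation
table(iris$Species, pred$class)  # resubstitution classification table

# One-way MANOVA comparing mean vectors across groups
mfit <- manova(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
               data = iris)
summary(mfit, test = "Wilks")
```

With g groups and p variables, `lda()` returns at most min(g - 1, p) discriminant functions, so the three-group stand-in here yields two. The classification table above reuses the training data, so it tends to give an optimistic picture of how well the groups are separated.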