Exam 2 Problems - Applied Multivariate Statistical Analysis (STAT 530)
STAT 530 Exam 2 - Due by 11:00am, Tuesday, December 9th
The exam should be turned in to me or to the secretary in room 216; it should not be left in my mailbox. For this exam you may use your notes, any text or reference book, the course web page, and any of SAS, SPSS, or R. You may not discuss the problems with anyone (especially your fellow students or other instructors) except me. You must turn in the code (or list of menu options) used to generate the output. Each of the five questions is weighted equally. Graduate students have an additional question (see the end of the exam).

This exam uses two data sets.

The first data set is an excerpt of data gathered by Dunn (1928) concerning white leghorn fowl (i.e., chickens) and can be found at: http://www.stat.sc.edu/~habing/courses/data/fowl08.txt

The eight columns in the data set are:
1 – ID – The ID number for the chicken
2 – SLength – The length of the skull
3 – SWidth – The width of the skull
4 – Femur – The length of the femur (on the leg)
5 – Tibia – The length of the tibia (on the leg)
6 – Humerus – The length of the humerus (on the wing)
7 – Ulna – The length of the ulna (on the wing)
8 – Age – The age of the chicken

All of the measurements (the lengths and widths in columns 2-7) are in mm and the age is in days. A set of standardized measurements could be made by using:

    standardize<-function(x){ (x-mean(x))/sd(x)}
    sfowl<-apply(fowl[,2:7],2,standardize)

The second data set is http://www.stat.sc.edu/~habing/courses/data/orange.txt. The data set concerns several samples of orange juice from several different countries (BEL, LSP, TME, and VME). Each of them has had several chemical elements measured: boron (B), barium (BA), calcium (CA), potassium (K), magnesium (MG), manganese (MN), phosphorus (P), rubidium (RB), and zinc (ZN). The first variable is simply an ID number. A set of standardized measurements could be made by using:

    standardize<-function(x){ (x-mean(x))/sd(x)}
    sorange<-apply(orange[,3:11],2,standardize)
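As a quick check on the standardization step, the sketch below applies the `standardize` helper to a small made-up data frame and compares the result with base R's `scale()`. The numeric values are invented for illustration and are not taken from fowl08.txt; the commented-out `read.table` call assumes the file has a header row, which the exam does not state.

```r
# Toy stand-in for the fowl data; these values are made up for illustration
# and are NOT the measurements in fowl08.txt.
fowl <- data.frame(
  ID      = 1:5,
  SLength = c(38.1, 39.0, 37.5, 40.2, 38.8),
  SWidth  = c(29.5, 30.1, 29.0, 30.8, 29.9),
  Femur   = c(74.0, 76.5, 72.8, 78.1, 75.2),
  Tibia   = c(113.0, 116.2, 111.5, 119.0, 114.8),
  Humerus = c(70.1, 72.3, 69.0, 74.0, 71.2),
  Ulna    = c(66.0, 68.4, 65.1, 70.2, 67.3),
  Age     = c(300, 310, 295, 320, 305)
)

# With the real file, something like the following might replace the block
# above (header = TRUE is an assumption about the file layout):
# fowl <- read.table("http://www.stat.sc.edu/~habing/courses/data/fowl08.txt",
#                    header = TRUE)

# The helper from the exam: center by the mean, scale by the sample SD.
standardize <- function(x) { (x - mean(x)) / sd(x) }
sfowl <- apply(fowl[, 2:7], 2, standardize)

# Each standardized column should have mean 0 and standard deviation 1.
col_means <- colMeans(sfowl)
col_sds   <- apply(sfowl, 2, sd)

# scale() centers and scales by the sample SD by default, so it should agree
# with the hand-written helper (attributes such as "scaled:center" are ignored).
sfowl_alt <- scale(fowl[, 2:7])
same_as_scale <- isTRUE(all.equal(sfowl, sfowl_alt, check.attributes = FALSE))
```

Either form gives unit-free columns, which matters for any distance-based method because the raw measurements are on different scales.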
- Consider the two subsets of variables for the fowl data set: “Skull measurements” = SLength, SWidth and “Limb measurements” = Femur, Tibia, Humerus, Ulna. Perform the appropriate multivariate analysis to determine what the strongest linear relationships are between these two sets of variables. Briefly describe what those relationships are in terms of the variables, say how you determined whether the relationships were statistically significant, give a measure of the strength of the relationships, and say whether you think the relationships are very strong.
- It is desired to see if the four elements MN, P, RB, and ZN are able to separate the oranges by the country of origin. Perform a linear discriminant analysis to find the combinations of these four variables that best separate the oranges into the appropriate groups, reporting the equations of the first three linear discriminant functions. Provide an appropriate single-number measure of how well the linear discriminant functions do at separating the four groups overall. Also indicate which country’s oranges are least likely to be classified correctly. One piece of the output produced in such an analysis is the set of posterior probabilities. Give the estimated posterior probabilities for the first orange and briefly say what these probabilities indicate (imagine you are explaining to someone who knows very little about discriminant analysis and doesn’t want much detail). What assumptions need to be true in order for these probabilities to be accurate? (You do not need to check the assumptions).
- It is desired to see if the four countries of origin of the oranges differ on average in terms of the four elements MN, P, RB, and ZN. State the appropriate null and alternative hypotheses for these vectors of measurements and conduct the hypothesis test. Briefly say why you chose the test statistic you did. Give the output and state your conclusion. Why should you worry about violations of the equal covariance assumption for this data set? (You do not need to check the assumptions).
- In many instances it is desirable to find groups of individuals that are similar, whether it is to have control over variation in a designed experiment or for casting purposes in movies. Find a group of four oranges such that none of them differs very much on any of the nine elements from any of the other three. (It does not matter which country they come from, or even if it is the same country). Make a table showing how those four oranges compare on each of those nine elements. Briefly justify your choice of method to select the group. Be sure to include a justification for any options that you had to specify in carrying out the analysis and give any output you used to make your decision.
- One way of graphically displaying the relationships between the oranges would be to construct a “map” of them based on the amount of the nine elements present in them. Determine the “best” number of dimensions for capturing and displaying the results using isometric multidimensional scaling with Karl Pearson distance. Several numbers of dimensions could be reasonable; justify your choice. For the two-dimensional scaling, produce a plot of the oranges labeled by country of origin. Do the four countries seem well separated on your map?

Graduate Students, also answer one of the following:

A) There is a “seemingly unrelated” method that can be used to answer questions similar to both MANOVA and discriminant analysis. Name the method for the two-group case and give the general equation that would describe the relationship between the two groups and the variables (being sure to identify any symbols you use in the equation).

B) In problem 5, the plot of the first two dimensions may be somewhat different for the 2- and 3-dimensional solutions. Briefly explain why this cannot happen for classical multidimensional scaling.
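For the multidimensional scaling problem, the mechanics might look like the sketch below. It rests on several assumptions that are not confirmed by the exam: that "isometric multidimensional scaling" refers to `isoMDS()` from the MASS package, that Karl Pearson distance means Euclidean distance after dividing each variable by its standard deviation, and that simulated data (random values with made-up group labels) can stand in for orange.txt. It illustrates the calls only and is not a solution.

```r
library(MASS)  # provides isoMDS(); MASS ships with standard R distributions

set.seed(530)

# Simulated stand-in for the orange data: 20 samples by 9 element columns.
# The values and the group labels are made up; they are NOT the real data.
elements <- c("B", "BA", "CA", "K", "MG", "MN", "P", "RB", "ZN")
x <- matrix(rnorm(20 * 9), nrow = 20, dimnames = list(NULL, elements))
country <- rep(c("BEL", "LSP", "TME", "VME"), each = 5)

# Karl Pearson distance, taken here to mean Euclidean distance computed
# after each variable has been divided by its standard deviation
# (centering in scale() does not change the distances).
d <- dist(scale(x))

# Fit isoMDS for several numbers of dimensions and record the stress,
# which isoMDS reports in percent.
dims <- 1:4
stress <- sapply(dims, function(k) isoMDS(d, k = k, trace = FALSE)$stress)

# Two-dimensional map with each point labeled by its group.
fit2 <- isoMDS(d, k = 2, trace = FALSE)
plot(fit2$points, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(fit2$points, labels = country)
```

Plotting `stress` against `dims` gives a scree-style display for judging the number of dimensions. With the real data, `x` would be replaced by the nine element columns of the orange data set.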