Exam 1 Problems - Applied Multivariate Statistical Analysis | STAT 530, Exams of Statistics

Material Type: Exam; Professor: Habing; Class: APPLIED MULTIVARI STATS; Subject: Statistics; University: University of South Carolina - Columbia; Term: May 1991;

Uploaded on 09/02/2009

STAT 530 Exam 1 - Due by 4:00 pm on Thursday, October 23rd
The exam should be turned in to me or to the secretary in room 216; it should not be left in my mailbox.
For this exam you may use your notes, any text or reference book, the course web page, and any of SAS, SPSS, or R.
You may not discuss the problems with anyone except me (in particular, not with your fellow students or other instructors).
You must turn in the code (or list of menu options) used to generate the output.
Each of the three questions is weighted equally.
Graduate students have an additional question (see page 3).
This exam uses two data sets:
The first data set is an excerpt of data gathered by Dunn (1928) concerning white leghorn fowl (i.e., chickens)
and can be found at:
http://www.stat.sc.edu/~habing/courses/data/fowl08.txt
The eight columns in the data set are:
1 – ID – The ID number for the chicken
2 – SLength – The length of the skull
3 – SWidth – The width of the skull
4 – Femur – The length of the femur (on the leg)
5 – Tibia – The length of the tibia (on the leg)
6 – Humerus – The length of the humerus (on the wing)
7 – Ulna – The length of the ulna (on the wing)
8 – Age – The age of the chicken
All of the measurements (the lengths and widths in columns 2-7) are in mm and the age is in days.
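
A minimal sketch of reading a file laid out like the one described above, written in Python for illustration only (the exam itself permits SAS, SPSS, or R). It assumes the file is whitespace-delimited with one chicken per row and possibly a header line; the actual layout of fowl08.txt is not stated here. The names `read_table` and `FOWL_COLUMNS` and the two sample rows are hypothetical.

```python
import io

# Column names as listed in the exam; the file layout itself is an assumption.
FOWL_COLUMNS = ["ID", "SLength", "SWidth", "Femur", "Tibia", "Humerus", "Ulna", "Age"]


def read_table(handle, columns):
    """Parse whitespace-separated numeric rows into a list of dicts."""
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # skip a header (or any other non-numeric) line
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(values)}")
        rows.append(dict(zip(columns, values)))
    return rows


# Made-up rows for illustration only; these are NOT values from fowl08.txt.
sample = io.StringIO(
    "ID SLength SWidth Femur Tibia Humerus Ulna Age\n"
    "1 38.0 29.0 74.0 113.0 71.0 68.0 300\n"
    "2 39.5 30.0 76.5 117.0 73.0 70.5 320\n"
)
fowl = read_table(sample, FOWL_COLUMNS)
print(len(fowl), "rows read; Femur of first row:", fowl[0]["Femur"])
```

To use it on a downloaded copy, the `io.StringIO` object would be replaced by an opened file handle.
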
The second data set contains a portion of the results of a survey by Roberts and Lattin (1991) concerning
Australian consumers' perceptions of their favorite cereal brands. It can be found at:
http://www.stat.sc.edu/~habing/courses/data/cereal08.txt
In particular, it contains the ratings of 12 different brands of breakfast cereal by 116 respondents. Each
respondent rated at least one of the cereals on 13 attributes.
The thirteen attributes were:
1. Natural
2. Fibre
3. Sweet
4. Salt
5. Fun
6. Soggy
7. Health
8. Plain
9. Crisp
10. Sugar
11. Treat
12. Boring
13. Nutritious
And the brands of cereal were:
1. All Bran
2. Cerola Muesli
3. Just Right
4. Kellogg's Corn Flakes
5. Komplete
6. NutriGrain
7. Purina Muesli
8. Rice Bubbles
9. Special K
10. Sustain
11. Vitabrit
12. Weetbix
The ratings given were from one to five, with a five meaning the consumer felt the cereal possessed that
attribute, and a one meaning that it did not possess that attribute.
Looking at the data set, the first line of the data says that reviewer 108 felt that All Bran was high on "Fibre"
(scored it a 5) and low on "Boring" (scored it a 1).
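
Because every rating is an integer from one to five, any pair of attributes can take at most 25 distinct values, so a plain scatter plot will stack many respondents on the same point. The sketch below (Python, for illustration only; the ratings are made up and `tally_pairs` is a hypothetical helper) counts how many observations share each combination.

```python
from collections import Counter


def tally_pairs(xs, ys):
    """Count how many observations fall on each (x, y) rating combination."""
    return Counter(zip(xs, ys))


# Made-up ratings for illustration only; NOT taken from cereal08.txt.
natural = [5, 5, 4, 1, 5, 4, 2, 5]
nutritious = [5, 5, 4, 2, 5, 3, 2, 4]

counts = tally_pairs(natural, nutritious)
for (x, y), n in sorted(counts.items()):
    print(f"Natural={x} Nutritious={y}: {n} observation(s)")
```

The counts could then set the marker size at each point or be printed as labels next to it.
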

1a) Find a pair of variables from the fowl measurements (columns 2-7) that seems to be (at least approximately) from a multivariate normal population. Justify your answer.
b) Identify one variable from the fowl measurements that seems to have light tails and one that seems to have heavy tails. Justify your answer.
c) Identify any potential outlier chickens (by ID number) in the set of fowl measurements. Justify your answer.
d) Construct a scatter plot of SLength against Femur and briefly describe the relationship seen there. As a follow-up, construct a plot that also takes SWidth into account. Briefly describe what additional insight using this third variable gives.
e) Construct a scatter plot of Natural against Nutritious that indicates how many observations are actually involved. Make sure the figure has appropriate labels and captions, and briefly explain how it should be read.
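
One common way to look at bivariate normality and outliers together is a chi-square plot: the ordered squared Mahalanobis distances are compared with chi-square quantiles (2 degrees of freedom for a pair of variables), and points far above the line are candidate outliers. This is a sketch of that idea in Python, not necessarily the approach the course expects. The data are made up, with one point planted off the trend, and the function names are hypothetical.

```python
import math


def squared_mahalanobis_2d(xs, ys):
    """Squared Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Quadratic form with the inverse of the 2x2 covariance matrix written out.
    return [
        (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my) + sxx * (y - my) ** 2) / det
        for x, y in zip(xs, ys)
    ]


def chi2_2df_quantiles(n):
    """Chi-square (2 df) quantiles at probabilities (j - 0.5) / n, j = 1..n."""
    # For 2 df the quantile function has the closed form -2 * ln(1 - p).
    return [-2.0 * math.log(1.0 - (j - 0.5) / n) for j in range(1, n + 1)]


# Made-up measurements for illustration only; NOT values from fowl08.txt.
ids = list(range(1, 11))
xs = [38, 39, 40, 41, 42, 43, 44, 45, 41.5, 40]    # stand-in for one variable
ys = [74, 76, 78, 80, 82, 84, 86, 88, 70, 78.5]    # stand-in for another

d2 = squared_mahalanobis_2d(xs, ys)
q = chi2_2df_quantiles(len(d2))
pairs = list(zip(q, sorted(d2)))          # points to draw on the chi-square plot
flagged = ids[d2.index(max(d2))]          # ID with the largest distance

for quantile, dist in pairs:
    print(f"{quantile:6.3f}  {dist:6.3f}")
print("Largest distance belongs to ID", flagged)
```

If the pair is roughly bivariate normal, the plotted points should fall near a straight line through the origin with slope one.
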

2) It is desired to see if the set of six fowl measurements can be reduced to a smaller set of variables without losing much information.
a) Choose to use either the correlation or covariance matrix and justify your choice.
b) Indicate the minimum number of principal components required to explain 95% of the variation in the six measurement variables.
c) The number of components you chose in (b) explains at least 95% of the total variation in the six variables. Check if any of the six have significantly less of their variation explained than the other variables.
d) Describe the characteristics that a chicken with a large positive first principal component is likely to have.
e) Describe (using either numeric summaries or graphical summaries) the relationship between the first principal component and the age of the chicken.
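
The arithmetic behind parts (b) and (c) can be sketched as follows, in Python with NumPy for illustration only. The sketch assumes the correlation matrix is used, which is exactly the choice part (a) asks you to justify. The data are simulated stand-ins, not the Dunn (1928) measurements, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for six measurements: a shared "size" factor plus noise.
n = 200
size = rng.normal(size=n)
X = np.column_stack(
    [size * w + rng.normal(scale=0.5, size=n) for w in (1.0, 0.8, 1.2, 1.1, 0.9, 1.0)]
)

R = np.corrcoef(X, rowvar=False)           # 6 x 6 correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]          # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Cumulative proportion of total variance explained by the first 1, 2, ... components.
cum_prop = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(cum_prop, 0.95) + 1)   # smallest k with cum_prop >= 0.95

# For a correlation-matrix PCA, eigvecs[i, j]**2 * eigvals[j] is the share of
# standardized variable i's variance carried by component j.
per_variable = (eigvecs[:, :k] ** 2 * eigvals[:k]).sum(axis=1)

print("cumulative proportion:", np.round(cum_prop, 3))
print("components needed for 95%:", k)
print("share of each variable explained:", np.round(per_variable, 3))
```

The per-variable shares average to the overall cumulative proportion, so a variable well below that average is one whose variation is poorly captured by the retained components.
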

3) In the cereal data, the thirteen variables measured about each cereal provide a lot of information. However, it is likely too much information to be easily useful, and it seems reasonable that some underlying traits influence the responses of each individual to the thirteen questions. One way to investigate this would be to perform an exploratory factor analysis on the data set.
a) Identify the appropriate number of factors needed in order to get a model that is parsimonious, fits, and has enough large factor loadings when using the varimax rotation. Justify your answer.
b) Identify the practically significant loadings for each factor of the model you chose in (a). Briefly give a feeling for what each factor seems to be representing.
c) Identify which variables have the most variance explained by the common factors in the model and which have the least.
d) What percentage of the variation in "Natural" is explained by Factor 1?
e) Draw a path diagram showing the factors you found in the exploratory factor analysis and which of the variables they influence.
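
For parts (c) and (d), the quantities involved follow directly from the loadings when the factors are orthogonal (as with varimax) and the variables are standardized. The share of a variable's variance explained by one factor is the squared loading, and the communality is the sum of squared loadings across factors. A minimal Python sketch, using hypothetical loadings invented for illustration (not estimated from cereal08.txt) and hypothetical helper names:

```python
# Hypothetical varimax-rotated loadings (one row per variable, one entry per factor).
# These numbers are invented for illustration; they are NOT results from the cereal data.
loadings = {
    "Natural": [0.80, -0.10, 0.05],
    "Sweet": [-0.15, 0.75, 0.20],
    "Boring": [0.05, -0.30, 0.60],
}


def communality(row):
    """Share of a standardized variable's variance explained by all common factors."""
    return sum(l * l for l in row)


def share_from_factor(row, j):
    """Share of a standardized variable's variance explained by factor j (0-based)."""
    return row[j] ** 2


for name, row in loadings.items():
    h2 = communality(row)
    print(
        f"{name}: communality={h2:.4f}, uniqueness={1 - h2:.4f}, "
        f"Factor 1 share={share_from_factor(row, 0):.4f}"
    )
```

With real output, the same calculation is applied to the rotated loading matrix reported by the software.
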