Applied Multivariate Analysis: M.Phil. Thesis Questions, Exams of Statistics

This document contains the questions for the M.Phil. in Statistical Science exam in Applied Multivariate Analysis. The exam covers topics such as the multivariate normal distribution, principal component analysis, and dissimilarity measures. Students are required to answer three out of four questions, each worth 20 marks. The document also includes the exam instructions and stationery requirements.

Typology: Exams

2012/2013

Uploaded on 02/26/2013

dharmaketu
M. PHIL. IN STATISTICAL SCIENCE
Monday 12 June 2006 1.30 to 3.30
APPLIED MULTIVARIATE ANALYSIS
Attempt THREE questions. There are FOUR questions in total.
Marks for each question are indicated on the paper in square brackets.
Each question is worth a total of 20 marks.
You may use the following results without proof.
Given $X \sim N_p(\mu, \Sigma)$ and a $(p \times q)$ matrix $A$ with $q < p$, $A^T X \sim N_q(A^T \mu, A^T \Sigma A)$;
and
$\int_R g(x)\,dx$ is minimised with respect to $R$ by $R^* = \{x : g(x) < 0\}$, for any function $g$.
STATIONERY REQUIREMENTS: Cover sheet, Treasury Tag, Script paper
SPECIAL REQUIREMENTS: None
You may not start to read the questions
printed on the subsequent pages until
instructed to do so by the Invigilator.

1 (a) Given iid observations $Z_1, \ldots, Z_p \sim N(0, 1)$, state how these can be used to obtain a single observation $X \sim N(\mu, \Sigma)$ for a given $(p \times 1)$ vector $\mu$ and for a positive definite non-singular $(p \times p)$ matrix $\Sigma$. Hence, or otherwise, derive the density function for the $p$-dimensional multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. [8]
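One way to check an answer to part (a) numerically is sketched below. It assumes the construction $X = \mu + LZ$ with $\Sigma = LL^T$ taken from a Cholesky factorisation, which is one common choice among several valid square roots of $\Sigma$. The function name `sample_mvn` and the example values of `mu` and `sigma` are illustrative assumptions, not part of the paper.

```python
import numpy as np

def sample_mvn(mu, sigma, rng):
    """Draw one X ~ N_p(mu, sigma) from p iid N(0, 1) variates."""
    mu = np.asarray(mu, dtype=float)
    L = np.linalg.cholesky(sigma)          # sigma = L @ L.T, L lower triangular
    z = rng.standard_normal(mu.shape[0])   # Z_1, ..., Z_p iid N(0, 1)
    return mu + L @ z                      # mean mu, covariance L @ L.T = sigma

# Hypothetical example values, chosen only for illustration.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

draws = np.array([sample_mvn(mu, sigma, rng) for _ in range(20000)])
sample_mean = draws.mean(axis=0)
sample_cov = np.cov(draws, rowvar=False)
print(sample_mean)
print(sample_cov)
```

With many draws, the sample mean and sample covariance should be close to `mu` and `sigma`.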

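(b) Suppose $X \sim N_p(\mu, \Sigma)$ is partitioned as $X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$, where $X_1$ is $(q \times 1)$ with $q < p$, and the corresponding partitions of $\mu$ and $\Sigma$ are $\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$ and $\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$, where $\Sigma_{11}$ and $\Sigma_{22}$ are positive semi-definite symmetric matrices. Show that the marginal distribution of $X_1$ is $N_q(\mu_1, \Sigma_{11})$. [3]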

(c) Suppose we have data $X_1, \ldots, X_n \sim N_p(\mu_x, \Sigma)$ with $\Sigma$ unknown. State the form of the best test statistic for testing $H_0 : \mu_x = \mu_0$ vs $H_1 : \mu_x \neq \mu_0$. What is the null distribution of this test statistic? [3]

Suppose $Y = AX + b$, where $A$ is a $(p \times p)$ non-singular matrix and $b$ a $(p \times 1)$ vector. Show that your test statistic above for testing $H_0 : \mu_x = \mu_0$ is identical to that used in testing $H_0 : \mu_y = A\mu_0 + b$. What does this tell you about the properties of your test statistic with regard to non-singular linear transformations of the variables? [6]
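The invariance asked about in part (c) can be checked numerically. The sketch below assumes the intended statistic is Hotelling's $T^2 = n(\bar{x} - \mu_0)^T S^{-1} (\bar{x} - \mu_0)$, with $S$ the unbiased sample covariance. The simulated data, the matrix `a`, and the vector `b` are hypothetical values chosen for illustration.

```python
import numpy as np

def hotelling_t2(data, mu0):
    """Hotelling's T^2 for H0: mean = mu0, from an (n x p) data matrix."""
    n = data.shape[0]
    diff = data.mean(axis=0) - mu0
    s = np.cov(data, rowvar=False)               # unbiased sample covariance
    return n * diff @ np.linalg.solve(s, diff)   # n * diff' S^{-1} diff

rng = np.random.default_rng(1)
n, p = 30, 3
x = rng.standard_normal((n, p))                  # hypothetical sample
mu0 = np.zeros(p)

# Hypothetical non-singular transformation (det(a) = 5).
a = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])
b = np.array([0.5, -1.0, 2.0])

y = x @ a.T + b                                  # each row: y_i = A x_i + b
t2_x = hotelling_t2(x, mu0)
t2_y = hotelling_t2(y, a @ mu0 + b)
print(t2_x, t2_y)
```

The two values agree up to floating-point error, because $\bar{y} - (A\mu_0 + b) = A(\bar{x} - \mu_0)$ and $S_y = A S A^T$, so the factors of $A$ cancel in the quadratic form.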


3 (a) What is the purpose of principal components analysis? [1]

Let X be a p-variate random variable with covariance matrix Σ. Derive the first principal component of X. [5]

Define the remaining principal components of X. [1]

(b) One hundred 13-year-old children were assessed in Physics, Biology, Mathematics and Physical Education. Each child was given a mark out of 300 for each of the four disciplines.

Let $X$ be the vector of marks obtained by an individual child, where $X_1$ corresponds to the mark in Physics, $X_2$ the mark in Biology and $X_3$, $X_4$ the marks in Mathematics and Physical Education respectively. The sample covariance is given by

(i) Verify that one of the principal components has coefficients proportional to $(1, 1, 1, 0)$. [3]

(ii) Given that the remaining principal components are proportional to $(1, 0, -1, 10)$, $(1, -2, 1, 0)$ and $(10, 0, -10, -2)$, calculate the proportion of total variation attributable to each of the four components. [4]

(iii) Interpret the components where possible and say what conclusions you would draw from this analysis. Explain your answer. [4]

(iv) Suppose now that the Physics teacher decides to assess his students by giving them marks out of 1000 rather than marks out of 300, like his colleagues. Explain how this might affect the sample covariance matrix and hence, how you might consider modifying your method of analysing the data. [2]


4 (a) Let $X$ be an $(n \times p)$ data matrix in which each row corresponds to a $p$-variate measurement on one of $n$ individuals. Assuming that the $p$ variates are continuous variables, describe three possible measures of dissimilarity of pairs of individuals. Comment on their relative advantages and disadvantages. [3]

(b) What four properties must be satisfied for a dissimilarity function to be a metric dissimilarity coefficient? [2]

The values of four binary variables are measured for each of four individuals as follows:

            Variable
Individual  1  2  3  4
1           1  1  1  0
2           0  0  1  1
3           1  1  1  1
4           0  1  0  1

Construct a dissimilarity matrix for the four individuals using (i) the simple matching coefficient and (ii) Jaccard’s coefficient. [4]

If $S_{rt}$ denotes the simple matching coefficient, show that $d_{rt} = 1 - S_{rt}$ is a metric dissimilarity coefficient. [4]
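A hand calculation for part (b) can be cross-checked with a short script. The sketch below assumes the usual definitions: the simple matching coefficient is the share of variables on which two individuals agree, Jaccard's coefficient ignores joint absences (0, 0), and in both cases the dissimilarity is taken as one minus the similarity. The function name `binary_dissimilarities` is illustrative.

```python
import numpy as np

# Rows are individuals 1-4, columns are binary variables 1-4 (from the question).
data = np.array([[1, 1, 1, 0],
                 [0, 0, 1, 1],
                 [1, 1, 1, 1],
                 [0, 1, 0, 1]])

def binary_dissimilarities(x):
    """Return (simple matching, Jaccard) dissimilarity matrices, d = 1 - S."""
    n, p = x.shape
    smc = np.zeros((n, n))
    jac = np.zeros((n, n))
    for r in range(n):
        for t in range(n):
            both = np.sum((x[r] == 1) & (x[t] == 1))   # joint presences
            mismatch = np.sum(x[r] != x[t])            # disagreements
            smc[r, t] = mismatch / p                   # 1 - (matches / p)
            denom = both + mismatch                    # joint absences excluded
            jac[r, t] = mismatch / denom if denom > 0 else 0.0
    return smc, jac

smc, jac = binary_dissimilarities(data)
print(smc)
print(jac)

# Numerical check of the triangle inequality for d = 1 - S (simple matching).
n = data.shape[0]
triangle_ok = all(smc[r, u] <= smc[r, t] + smc[t, u] + 1e-12
                  for r in range(n) for t in range(n) for u in range(n))
print(triangle_ok)
```

The triangle-inequality loop only confirms the property on this data set; it does not replace the general proof the question asks for.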

(c) Five subjects were each given three psychological tests. The scores for each subject on each test were recorded and the Euclidean distances between each pair of subjects were calculated as follows:

Subject   A     B     C     D     E
A         0     -     -     -     -
B         4.2   0     -     -     -
C         5.9   7.6   0     -     -
D         1.2   7.0   10.3  0     -
E         6.1   2.6   5.4   7.8   0

Using single-link clustering, cluster the five subjects. Sketch the dendrogram and interpret the results. [4]

How would your dendrogram change if you used a complete-link clustering algorithm? [3]
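The merge order and merge heights for part (c) can be checked with a small agglomerative routine. The sketch below is a naive implementation written for this five-subject table, assuming the standard definitions: single link uses the minimum pairwise distance between two clusters and complete link uses the maximum. The function name `agglomerate` is illustrative.

```python
from itertools import combinations

labels = ["A", "B", "C", "D", "E"]

# Pairwise distances from the table in the question.
dist = {("A", "B"): 4.2, ("A", "C"): 5.9, ("A", "D"): 1.2, ("A", "E"): 6.1,
        ("B", "C"): 7.6, ("B", "D"): 7.0, ("B", "E"): 2.6,
        ("C", "D"): 10.3, ("C", "E"): 5.4, ("D", "E"): 7.8}

def d(i, j):
    """Symmetric lookup of the distance between subjects i and j."""
    return dist[(i, j)] if (i, j) in dist else dist[(j, i)]

def agglomerate(labels, linkage):
    """Merge the closest clusters repeatedly; linkage is min or max."""
    clusters = [frozenset([name]) for name in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            height = linkage(d(i, j) for i in c1 for j in c2)
            if best is None or height < best[0]:
                best = (height, c1, c2)
        height, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1), sorted(c2), height))
    return merges

single = agglomerate(labels, min)       # single-link (nearest neighbour)
complete = agglomerate(labels, max)     # complete-link (furthest neighbour)

for name, merges in (("single", single), ("complete", complete)):
    print(name)
    for left, right, height in merges:
        print("  merge", left, "+", right, "at height", height)
```

Each printed line corresponds to one join in the dendrogram, with the height giving the level at which the two branches meet.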

END OF PAPER
