M. PHIL. IN STATISTICAL SCIENCE

Friday 6 June, 2003 1:30 to 4:30

PAPER 42

Experimental Design and Multivariate Analysis

Attempt FOUR questions. There are six questions in total. The questions carry equal weight.

You may not start to read the questions printed on the subsequent pages until instructed to do so by the Invigilator.

1 Applied Multivariate Analysis

Suppose that $x_1, \dots, x_n$ is a random sample from the $p$-variate normal distribution $N(\mu, V)$.

(i) Show that if $\ell(\mu, V)$ is the corresponding log-likelihood function, then

$$-2\ell(\mu, V) = n \log|V| + n\,\mathrm{tr}(V^{-1}S) + n(\bar{x} - \mu)^T V^{-1} (\bar{x} - \mu),$$

where you should define $\bar{x}$, $S$.

(ii) State without proof the formulae for the maximum likelihood estimators of $\mu$, $V$, and (also without proof) the joint distribution of these two quantities.

(iii) Show that the generalised likelihood ratio test of $H_0$: $V$ is a diagonal matrix may be written as

reject $H_0$ if $\log|R| <$ constant,

where $R$ is the sample correlation matrix derived from $x_1, \dots, x_n$. What is the form of this test in the case $p = 2$?

2 Applied Multivariate Analysis

Each of a class of 13 students is given a set of 10 “Yes/No” questions to answer, and the corresponding data matrix is read into S-Plus, with the response “Yes” corresponding to a 1, and “No” to a 0.

(i) Explain carefully how the distance matrix is computed. (You may assume that the function dist2full(d) converts $(d_{ij},\ 1 \le i < j \le 13)$ to the “full” distance matrix $(d_{ij},\ 1 \le i, j \le 13)$.)

(ii) Explain briefly the results of the hierarchical clustering algorithm, and sketch the graph that you would expect it to give.
> a _ read.table("students", header=T)
> a _ as.matrix(a); a
         eggs meat coffee beer UKres Cantab Fem sports driver Left.h
Philip      1    1      1    0     1      1   0      0      1      1
Chad        1    1      1    0     0      0   0      1      1      0
Graham      1    1      1    1     1      1   0      1      1      0
Tim         1    1      1    1     1      1   0      1      0      0
Mark        1    1      0    1     1      1   0      0      0      1
Juliet      0    1      1    0     1      0   1      0      0      0
Garfield    0    1      1    1     0      0   0      1      0      0
Nicolas     1    1      1    1     0      0   0      1      1      0
Frederic    1    1      0    1     0      0   0      1      1      0
John        1    1      1    1     0      0   0      0      1      0
Sauli       1    1      0    0     1      0   0      1      1      0
Fred        1    1      1    0     0      0   0      1      0      0
Gbenga      1    1      1    0     0      0   0      1      0      0

4 Design of Experiments

Let $n$ be the incidence matrix for a general block design with $t$ treatments to be compared in $b$ blocks, each containing $k\ (\le t)$ experimental units, so that $n_{ij} = 1$ if the $i$th treatment occurs in the $j$th block, and $n_{ij} = 0$ otherwise. Interpret $(nn^T)_{ii}$ and $(nn^T)_{il}$, $i \ne l$, in words.

Consider a design where

$$nn^T = (r - \lambda) I_t + \lambda 1_t 1_t^T,$$

where $I_t$ is the $t \times t$ identity matrix and $1_t$ is a $t \times 1$ vector of ones. Determine what kind of design this is (i) if $r = \lambda$ and (ii) if $\lambda < r$. In case (ii), show that $k < t$ and $b \ge t$.

The main effects of five different types of hardwood A, B, C, D, E on paper strength are being investigated. Four observations are obtained on each type, giving the following analysis of variance

                     df    ss
Hardwood types        *  42.4
Residual              *  46.7
Total (corrected)     *  89.1

Complete the missing degrees of freedom, and test whether there is a significant difference between the hardwood types.

In fact, further information reveals that the observations were obtained over a period of five days as shown below.

Day                1     2     3     4     5
Hardwood types:    ABDE  BCDE  ACDE  ABCD  ABCE

What type of design is this? The sum of squares for days is 28.4 and the sum of squares for hardwood types (adjusted for days) is 18.3. Test whether there is a significant difference between the hardwood types.
[Hint: If $P(Y > F_{m,n}(\alpha)) = \alpha$ where $Y \sim F_{m,n}$, then

$F_{5,14}(0.10) = 2.31$   $F_{5,10}(0.10) = 2.52$   $F_{4,15}(0.10) = 2.36$   $F_{4,11}(0.10) = 2.54$
$F_{5,14}(0.05) = 2.96$   $F_{5,10}(0.05) = 3.33$   $F_{4,15}(0.05) = 3.06$   $F_{4,11}(0.05) = 3.36$ ]

5 Design of Experiments

Explain how to construct a $1/2^k$ replicate of a $2^m$ experiment (you may quote results from lectures without proof). If $k = 1$, show that each contrast is aliased with one other contrast.

An experimenter wishes to investigate all main effects and all two-factor interactions except AF of six factors A, B, C, D, E, F (each at two levels) affecting the operation of an industrial process. In order to carry out the experiment, normal production must be stopped, and it is decided that normal production can only be interrupted for four days. Treating days as blocks, explain how to construct a suitable design if eight treatment combinations can be tested in a single day, and only 4 days are available for the experiment. You may assume that third and higher order interactions are negligible. Give the partition of the degrees of freedom in the resulting analysis of variance table.

6 Design of Experiments

A scientist wishes to maximize the yield, $y$, during crystal growth as a function of three coded variables $x_1, x_2, x_3$. As part of a search procedure for the maximum, trials are run at the 14 points given below, with associated yields $y_1, \dots, y_{14}$ as shown.

     x1      x2      x3   Yield
     -1      -1      -1   y1
     -1      -1       1   y2
     -1       1      -1   y3
     -1       1       1   y4
      1      -1      -1   y5
      1      -1       1   y6
      1       1      -1   y7
      1       1       1   y8
      0       0       0   y9
      0       0       0   y10
      0       0       0   y11
      0       0       0   y12
      0       0       0   y13
      0       0       0   y14

Consider the model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon,$$

where errors for different runs are independent $N(0, \sigma^2)$ random variables. Find the least squares estimate $\hat{\beta}$ of $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)^T$ in terms of $y_1, \dots, y_{14}$, and write down the distribution of $\hat{\beta}$.

After examining the fit of this model, six more trials are run as shown below.
     x1      x2      x3   Yield
 -1.682       0       0   y15
  1.682       0       0   y16
      0  -1.682       0   y17
      0   1.682       0   y18
      0       0  -1.682   y19
      0       0   1.682   y20

What design is formed by all 20 points? A full second order model is fitted to these 20 points. Write down this model. The residual sum of squares is found to be 1860.98, and the sum of squares from the six centre points is

$$\sum_{i=9}^{14} \left( y_i - \frac{1}{6} \sum_{k=9}^{14} y_k \right)^2 = 859.33.$$

Describe how to test for lack of fit of this model.

[Hint: If $P(Y > F_{m,n}(\alpha)) = \alpha$ where $Y \sim F_{m,n}$, then

$F_{9,10}(0.10) = 2.35$   $F_{5,5}(0.10) = 3.45$   $F_{4,6}(0.10) = 3.18$
$F_{9,10}(0.05) = 3.02$   $F_{5,5}(0.05) = 5.05$   $F_{4,6}(0.05) = 4.53$ ]
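
The arithmetic behind the lack-of-fit test in Question 6 can be set out numerically. What follows is a minimal sketch, not a model answer. It assumes that the full second order model in three variables has 10 parameters (intercept, three linear, three squared and three cross-product terms), so that the residual has 20 − 10 = 10 degrees of freedom, of which 5 are pure error from the six centre points. The function name `lack_of_fit_test` and its arguments are hypothetical, chosen only for illustration.

```python
def lack_of_fit_test(rss, ss_pure_error, n_runs, n_params, n_replicates):
    """Return (F statistic, lack-of-fit df, pure-error df).

    Splits the residual sum of squares into a pure-error part (from the
    replicated centre points) and a lack-of-fit part, then forms the ratio
    of the two mean squares.
    """
    df_residual = n_runs - n_params            # residual degrees of freedom
    df_pure_error = n_replicates - 1           # replicates at one design point
    df_lack_of_fit = df_residual - df_pure_error
    ss_lack_of_fit = rss - ss_pure_error
    f_stat = (ss_lack_of_fit / df_lack_of_fit) / (ss_pure_error / df_pure_error)
    return f_stat, df_lack_of_fit, df_pure_error


# Figures quoted in the question; n_params=10 is the assumption stated above.
f_stat, df_lof, df_pe = lack_of_fit_test(
    rss=1860.98, ss_pure_error=859.33, n_runs=20, n_params=10, n_replicates=6
)

# Upper percentage points of F(5, 5) taken from the hint in the question.
F_5_5_10 = 3.45
F_5_5_05 = 5.05

reject_at_10 = f_stat > F_5_5_10
reject_at_05 = f_stat > F_5_5_05

print(f"F = {f_stat:.3f} on ({df_lof}, {df_pe}) degrees of freedom")
print(f"Reject at 10%: {reject_at_10}; reject at 5%: {reject_at_05}")
```

Under these assumptions the statistic is referred to $F_{5,5}(\alpha)$ from the hint; a value below the tabulated point gives no evidence of lack of fit at that level.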