



This is the Exam of Statistical Science, covering Applied Statistics.




9 am Monday 11 June until 1 pm Thursday 14 June 2001
Attempt three of the four questions.
This is an ‘Open-Book’ examination, involving use of the Statistical Laboratory’s network of workstations. Candidates will receive this paper at 9.00 a.m. on Monday 11 June, and must hand in their scripts to the Chairman of Examiners by 1.00 p.m. on Thursday 14 June.
The data-sets will be emailed to candidates on Monday 11 June.
(The Statistical Laboratory Computer Officer and an Examiner will normally be available for consultation if required between 9.00 a.m. and 4.30 p.m. on these four days.)
Each candidate should submit his/her script with a signed statement that the work has been carried out without any collaboration with others.
The scripts may be handwritten. Candidates are requested to submit at most 25 pages in total. They are advised that the total work set should take between 4 and 6 hours.
The questions carry equal weight.
1 The Independent, October 2 and 3, 2000, gives the following list of 78 Sydney medal winners, where G, S, B = gold, silver, bronze respectively, MMP = ‘medals in proportion to population’ and MBGDP = medals per $billion GDP.
             G   S   B  total   MMP  MBGDP
USA         39  25  33     97  0.36  0.
Russia      32  28  28     88  0.60  0.
China       28  16  15     59  0.05  0.
Australia   16  25  17     58  3.09  0.
Germany     14  17  26     57  0.69  0.
France      13  14  11     38  0.65  0.
Italy       13   8  13     34  0.59  0.
Nether’ds   12   9   4     25  1.59  0.
Cuba        11  11   7     29  2.61  1.
Britain     11  10   7     28  0.47  0.
Romania     11   6   9     26  1.16  0.
.....................
Chile        0   0   1      1  0.07  0.
Iceland      0   0   1      1  3.65  0.
India        0   0   1      1  0.01  0.
Kyrgyzstan   0   0   1      1  0.21  0.
Kuwait       0   0   1      1  0.54  0.
Macedonia    0   0   1      1  0.50  0.
Qatar        0   0   1      1  1.35  0.
SriLanka     0   0   1      1  0.05  0.
(i) Compute the variables pop and BGDP by
pop = total/MMP and BGDP = total/(MBGDP + ·005)
(the ·005 is introduced for obvious reasons)
Summarise the data, including pop and BGDP, by appropriate graphs and tables.
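As a rough illustration of the two formulae in part (i), the R sketch below computes pop for the first four rows of the table. The data frame name medals is an assumption. Because the MBGDP column is truncated in this copy, the BGDP line uses hypothetical values (mbgdp_demo, total_demo) purely to show the formula. One apparent reason for the ·005 offset is that an MBGDP value printed as 0.00 would otherwise give a division by zero.

```r
# First four rows of the medals table, typed in by hand (MBGDP omitted: truncated in this copy)
medals <- data.frame(
  country = c("USA", "Russia", "China", "Australia"),
  G       = c(39, 32, 28, 16),
  S       = c(25, 28, 16, 25),
  B       = c(33, 28, 15, 17),
  total   = c(97, 88, 59, 58),
  MMP     = c(0.36, 0.60, 0.05, 3.09)
)

# pop = total / MMP (presumably in millions, if MMP is medals per million people)
medals$pop <- medals$total / medals$MMP

# HYPOTHETICAL values, only to show why the 0.005 offset matters
mbgdp_demo <- c(0.00, 0.01, 0.25)
total_demo <- c(1, 1, 29)
bgdp_demo  <- total_demo / (mbgdp_demo + 0.005)  # stays finite even when MBGDP is 0.00

print(medals[, c("country", "total", "MMP", "pop")])
print(bgdp_demo)
```

With the full data set, summary(), pairs() and hist() on the logged variables would be natural starting points for the requested graphs and tables.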
(ii) Now investigate a linear regression of score on log(pop) and log(BGDP), where score is defined by score = 3*G + 2*S + 1*B
Is this the best scale for your regression?
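The score definition and the two candidate regression scales can be sketched in R as follows. The helper names medal_score and fit_score_models are assumptions introduced for illustration, and the fitting function expects a data frame with columns G, S, B, pop and BGDP built from the full emailed data set; it is not run here on the truncated table.

```r
# score = 3*G + 2*S + 1*B, as defined in the question
medal_score <- function(G, S, B) 3 * G + 2 * S + 1 * B

medal_score(39, 25, 33)  # USA row of the table
medal_score(0, 0, 1)     # a bronze-only country

# Hypothetical helper: fit the regression on the raw and on the log scale
# so that residual plots for the two scales can be compared.
fit_score_models <- function(medals) {
  medals$score <- medal_score(medals$G, medals$S, medals$B)
  list(
    raw = lm(score ~ log(pop) + log(BGDP), data = medals),
    log = lm(log(score) ~ log(pop) + log(BGDP), data = medals)
  )
}
```

Calling plot() on each fitted model gives the usual residual diagnostics, which is one way to judge which scale is more appropriate.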
(iii) Give a brief discussion of principal components, as applied to the data matrix
(log(score), log(pop), log(BGDP)).
3 The data set ‘biopsy’, of which the first few lines appear in the Table below, is an edited version of the Venables and Ripley library (MASS) data-set: any row containing an NA has been deleted. As you can see from
? biopsy
this data frame, which results from biopsies of breast cancer tumours of 699 patients, contains the following measurements
ID Sample code number
V1 Clump thickness
V2 Uniformity of cell size
V3 Uniformity of cell shape
V4 Marginal adhesion
V5 Single epithelial cell size
V6 Bare nuclei
V7 Bland chromatin
V8 Normal nucleoli
V9 Mitoses
class Class of tumor, “benign” or “malignant”.
We are interested in predicting class from V1, ..., V9.

Table
         ID  V1  V2  V3  V4  V5  V6  V7  V8  V9  class
 1  1000025   5   1   1   1   2   1   3   1   1  benign
 2  1002945   5   4   4   5   7  10   3   2   1  benign
 3  1015425   3   1   1   1   2   2   3   1   1  benign
 4  1016277   6   8   8   1   3   4   3   7   1  benign
 5  1017023   4   1   1   3   2   1   3   1   1  benign
 6  1017122   8  10  10   8   7  10   9   7   1  malignant
 7  1018099   1   1   1   1   2  10   3   1   1  benign
 8  1018561   2   1   2   1   2   1   3   1   1  benign
 9  1033078   2   1   1   1   2   1   1   1   5  benign
10  1033078   4   2   1   1   2   1   2   1   1  benign
(i) Summarise, with appropriate graphs, V1, ..., V9 for each of the 2 values of class.
(ii) How does P(malignant)/P(benign) depend on the 9 variables? (You may like to use stepAIC() from library(MASS).)
How well does your best model predict malignant tumours?
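One possible starting point for part (ii) is a logistic regression, in which log(P(malignant)/P(benign)) is linear in the predictors, followed by stepwise selection. The sketch below uses the copy of biopsy shipped with MASS and na.omit() as a stand-in for the edited data set supplied to candidates. The 0.5 classification threshold and the object names (bx, full, best, confusion) are assumptions.

```r
library(MASS)  # provides the biopsy data frame and stepAIC()

bx <- na.omit(biopsy)  # drop any row containing an NA, as the question describes
bx$ID <- NULL          # the sample code number is not a predictor

# Logistic regression: models the log odds of "malignant" (the second factor level)
full <- glm(class ~ ., family = binomial, data = bx)

# Stepwise selection by AIC, starting from the full model
best <- stepAIC(full, trace = FALSE)
print(formula(best))

# In-sample confusion table, using an assumed 0.5 probability threshold
pred <- ifelse(fitted(best) > 0.5, "malignant", "benign")
confusion <- table(observed = bx$class, predicted = pred)
print(confusion)
```

The table is computed on the same data used for fitting, so it will tend to overstate predictive accuracy; a held-out subset would give a fairer assessment.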
(iii) Now try tree(class ~ ...) and compare the results with those of (ii).
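A classification tree for part (iii) might be fitted along the following lines. This sketch assumes the add-on package tree is installed (it is not part of base R) and again uses the MASS copy of biopsy with NA rows removed as a stand-in for the supplied data. The object names are assumptions.

```r
library(MASS)  # biopsy data
library(tree)  # assumed to be installed; provides tree()

bx <- na.omit(biopsy)
bx$ID <- NULL

# Classification tree using all nine measurements
tr <- tree(class ~ ., data = bx)
print(summary(tr))  # reports the variables actually used and the misclassification rate

# In-sample predicted classes, for comparison with the logistic model of part (ii)
tree_pred <- predict(tr, type = "class")
tree_confusion <- table(observed = bx$class, predicted = tree_pred)
print(tree_confusion)
```

The variables that appear in the fitted tree also give one informal view of which measurements matter most, which bears on part (iv).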
(iv) If the physician can only afford to measure 7 of the 9 variables, which ones would you recommend that he use?
4 The data below is from a hypothetical study to assess whether a computerised scheduling intervention would be effective in reducing the waiting time for a certain knee operation after the decision to operate had been made. The study was designed as a randomised-controlled trial of the computerised intervention versus common practice in two centres: Centre A and Centre B. Centre A recruited 200 patients, while Centre B recruited 300 patients. The trial was conducted over a three-year period, with an eighteen-month recruitment period and an additional eighteen months follow-up at the end of this recruitment phase. Additional information on the patient’s age and sex was recorded. Patients under 18 years of age were excluded. It is also known from an audit that took place before the study began that patients aged between 35 and 45, in general, wait longer than patients who are aged 45 years or older or aged 35 years or younger. It was thought at the beginning of the study that the impact of treatment might possibly differ for males and females.
Note that patients may drop out of the study. For example, they may decide to go “private”. However, information on why they dropped out was not recorded. Note also that information on the severity of the knee injury was not recorded.
Using survival data analysis methods, find an appropriate way of modelling this data. Interpret the results and test any assumptions (graphically and/or otherwise) made. Give possible (plausible) reasons for your findings.
ID  Time  Op  Age  Sex  Intervention  Centre
 1   192   0   39    1             0       0
 2   101   1   46    1             0       0
 3   197   1   28    0             0       0
 4    66   1   49    0             0       0
 5   345   1   35    0             1       0
 6   113   1   45    1             0       0
 7    62   1   48    1             1       0
 8     6   0   30    1             1       0
 9    39   1   58    1             1       0
10   174   0   43    0             1       0
11   137   1   39    1             0       0
12    73   1   35    1             0       0
.....................
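A minimal survival-analysis sketch in R, using only the twelve rows shown above, is given below. It assumes that Op = 1 means the operation was observed and Op = 0 means the waiting time was censored, and that Time is the waiting time; neither coding is stated explicitly in the question. The names knee and MidAge are assumptions, and the small model fitted here is for illustration only.

```r
library(survival)

# The twelve rows shown in the table (all from the centre coded 0)
knee <- data.frame(
  Time         = c(192, 101, 197, 66, 345, 113, 62, 6, 39, 174, 137, 73),
  Op           = c(0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1),
  Age          = c(39, 46, 28, 49, 35, 45, 48, 30, 58, 43, 39, 35),
  Sex          = c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1),
  Intervention = c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0),
  Centre       = rep(0, 12)
)

# Indicator for the age band the audit flagged as waiting longer (between 35 and 45)
knee$MidAge <- as.integer(knee$Age > 35 & knee$Age < 45)

# Kaplan-Meier curves of time to operation, by intervention arm
km <- survfit(Surv(Time, Op) ~ Intervention, data = knee)
print(km)

# Cox proportional hazards model (intervention only, given just twelve rows)
cox <- coxph(Surv(Time, Op) ~ Intervention, data = knee)
print(summary(cox))

# Check of the proportional hazards assumption
print(cox.zph(cox))

# With the full data, a richer model could be considered, for example:
# coxph(Surv(Time, Op) ~ Intervention * Sex + MidAge + strata(Centre), data = knee)
```

Because the event here is the operation itself, a hazard ratio above 1 for Intervention would correspond to shorter waiting times. Dropout for unrecorded reasons means the usual assumption of non-informative censoring deserves some discussion.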