Applied Statistics - Mathematical Tripos - Final Exam, Exams of Mathematics

Anna University Mathematics

This is a final exam paper from the Mathematical Tripos. Key points are: Applied Statistics, Normal Random Variables, Mean Zero and Variance, Random Vector, Least Squares Estimator, Algebraic Form, Dimensional Vector, Unknown Parameter Values, Covariate Values.

Typology: Exams

2012/2013

Uploaded on 02/26/2013

devaku 🇮🇳

MATHEMATICAL TRIPOS Part III

Tuesday, 1 June, 2010 9:00 am to 12:00 pm

PAPER 37

APPLIED STATISTICS

Attempt no more than FOUR questions.

There are FIVE questions in total.

The questions carry equal weight.

STATIONERY REQUIREMENTS    SPECIAL REQUIREMENTS
Cover sheet                None
Treasury Tag
Script paper

You may not start to read the questions printed on the subsequent pages until instructed to do so by the Invigilator.

Suppose that $Y = (Y_1, \ldots, Y_n)^T$ satisfies $Y = X\beta + \varepsilon$, where $X$ is a known $n \times p$ matrix with rank $p$ ($< n$), $\beta = (\beta_1, \ldots, \beta_p)^T$ is unknown, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T$ where $\varepsilon_1, \ldots, \varepsilon_n$ are independent normal random variables with mean zero and variance $\sigma^2$, and where $v^T$ denotes the transpose of $v$. Derive the least squares estimator $\hat\beta$ of $\beta$. Explain what is meant by the vector $\hat{Y}$ of fitted values and by the vector $\hat\varepsilon$ of residuals. Find the distribution of $\hat\varepsilon$. Show that $\hat{Y}$ is in the space spanned by the columns of $X$. Show that $X^T \hat\varepsilon = 0$ and interpret this result.

[You may assume without proof that, for an $m \times 1$ random vector $W$ and a $k \times m$ (constant) matrix $A$, $\mathrm{cov}(AW) = A\,\mathrm{cov}(W)A^T$.]
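One way to organise the derivation asked for above is sketched below. The projection matrix $P$ is notation introduced for this sketch and does not appear in the paper; the invertibility of $X^TX$ follows from $X$ having rank $p$.

```latex
\begin{aligned}
S(\beta) &= \|Y - X\beta\|^2, \qquad
\frac{\partial S}{\partial \beta} = -2X^T(Y - X\beta) = 0
\;\Longrightarrow\; \hat\beta = (X^TX)^{-1}X^TY, \\
\hat{Y} &= X\hat\beta = PY, \qquad P = X(X^TX)^{-1}X^T, \\
\hat\varepsilon &= Y - \hat{Y} = (I - P)Y = (I - P)\varepsilon
\sim N_n\bigl(0,\ \sigma^2 (I - P)\bigr), \\
X^T\hat\varepsilon &= \bigl(X^T - X^TX(X^TX)^{-1}X^T\bigr)Y = 0.
\end{aligned}
```

The third line uses $(I - P)X = 0$ together with the stated covariance rule and the fact that $I - P$ is symmetric and idempotent. Since $\hat{Y} = X\hat\beta$ is a linear combination of the columns of $X$, the last line says the residuals are orthogonal to every column of $X$.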

Gas chromatography is a technique used to detect small amounts of a substance using a gas chromatograph. The edited R output below refers to a study in which five gas chromatograph readings were taken for each of four specimens containing different (known) amounts of the substance. The aim of the study is to calibrate the chromatograph by relating the actual amount of the substance to the chromatograph reading. In the R output reading contains the chromatograph readings and amount contains the amount of the substance in nanograms. The plots are also included below the output.

Write down the algebraic form of the model fitted in gas1.lm, together with any assumptions, and discuss whether or not this model seems to be satisfactory. Explain briefly what is shown in the boxcox plot and explain what you conclude from it. Write down the model fitted in gas2.lm. What features of the plot for this model might lead you to fit model gas3.lm? Using the gas3.lm model, explain how to obtain an estimate of the expected chromatograph reading when the amount of substance is 3.0 nanograms.

gasdata
   amount reading
1    0.25      6.
2    0.25      7.
3    0.25      6.
4    0.25      6.
5    0.25      7.
6    1.00     29.
7    1.00     30.
8    1.00     30.
9    1.00     29.
10   1.00     29.
11   5.00    211.
12   5.00    204.
13   5.00    212.
14   5.00    213.
15   5.00    205.
16  20.00    929.
17  20.00    905.
18  20.00    922.
19  20.00    928.
20  20.00    919.
gas1.lm <- lm(reading~amount)
plot(gas1.lm$fitted.values,gas1.lm$residuals)
library(MASS)

Part III, Paper 37

The table below shows car insurance premiums for various categories of policyholders with 0, 3, 6 or 9 points on their driving licenses. For each category of policyholder the top row gives the premiums for third party fire and theft only policies and the bottom row gives the premiums for comprehensive policies.

Number of points          0    3    6    9
21 year old male        306  384  384  409
                        500  555  555  605
21 year old female      266  304  279  287
                        435  430  464  478
30 year old female      177  177  177  213
                        320  325  325  268
40 year old male        154  162  162  189
                        230  230  230  295

In the (edited) R output below, Gender, Age, Policy and Points are factors, and corner point constraints are used.

(a) Comment on any obvious deficiencies of the data.

(b) Write down the algebraic form of the model fitted in insurance1.lm, defining your notation carefully and writing down the assumptions and constraints explicitly. You are given that the residual sum of squares for this model is 19512.

What hypothesis is being tested by the test statistic whose value is f, and why does the test statistic take this form? What is the result of this hypothesis test? Write down your conclusion in words.

(d) Write down the algebraic form of the model fitted in insurance3.lm, again explicitly writing down the assumptions and constraints. Test whether this model is an improvement over insurance2.lm, and summarise in words how premiums depend on age, gender, policy type and the number of points. What is the estimated comprehensive policy premium for a 40 year old female policyholder with 6 points on her license?

x
 [1] 306 384 384 409 500 555 555 605 266 304 279 287 435 430 464 478 177 177 177
[20] 213 320 325 325 368 154 162 162 189 230 230 230 295
Gender
 [1] M M M M M M M M F F F F F F F F F F F F F F F F M M M M M M M M
Levels: F M
Age
 [1] 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 30 30 30 30 30 30 30 30 40
[26] 40 40 40 40 40 40 40
Levels: 21 30 40
Points
 [1] 0 3 6 9 0 3 6 9 0 3 6 9 0 3 6 9 0 3 6 9 0 3 6 9 0 3 6 9 0 3 6 9
Levels: 0 3 6 9
Policy
 [1] 3rd  3rd  3rd  3rd  comp comp comp comp 3rd  3rd  3rd  3rd  comp comp comp
[16] comp 3rd  3rd  3rd  3rd  comp comp comp comp 3rd  3rd  3rd  3rd  comp comp
[31] comp comp
Levels: 3rd comp
insurance1.lm <- lm(x~Age+Gender+Policy+Points)

Points2 <- factor(rep(c(1,1,1,2),times=8))
Points2
 [1] 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 2
Levels: 1 2
insurance2.lm <- lm(x~Age+Gender+Policy+Points2)
f <- ((22323-19512)/2)/(19512/24)
f
[1] 1.
qf(0.95,2,24)
[1] 3.
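The statistic f in the output above can be reproduced from the two residual sums of squares quoted in the paper (22323 and 19512). The following is a minimal sketch in Python rather than the R used in the paper. It relies on the fact that an F distribution with 2 numerator degrees of freedom has a closed-form tail probability, so no statistics library is assumed; the function names are illustrative only.

```python
def f_statistic(rss_small, rss_big, df_diff, df_resid_big):
    """F statistic for comparing two nested linear models."""
    return ((rss_small - rss_big) / df_diff) / (rss_big / df_resid_big)


def f_tail_prob_num_df2(f, df_resid):
    """P(F > f) for an F(2, df_resid) distribution (closed form, numerator df 2 only)."""
    return (1.0 + 2.0 * f / df_resid) ** (-df_resid / 2.0)


def f_quantile_num_df2(p, df_resid):
    """p-quantile of F(2, df_resid), obtained by inverting the tail probability."""
    return (df_resid / 2.0) * ((1.0 - p) ** (-2.0 / df_resid) - 1.0)


# Same arithmetic as the R line defining f in the paper.
f = f_statistic(22323, 19512, 2, 24)
crit = f_quantile_num_df2(0.95, 24)   # plays the role of qf(0.95, 2, 24)
p_value = f_tail_prob_num_df2(f, 24)
print(round(f, 3), round(crit, 3), round(p_value, 3))
```

Comparing f with the 95% critical value is the same decision rule as the qf call in the output; the closed-form functions above do not generalise to other numerator degrees of freedom.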

insurance3.lm <- lm(x~Age*Policy + Gender + Points2)
anova(insurance3.lm)
            Df Sum Sq Mean Sq F value    Pr(>F)
Age          2 275639  137820 329.850  < 2.2e-
Policy       1 167476  167476 400.827  < 2.2e-
Gender       1  35627   35627  85.267   2.276e-
Points2      1  10438   10438  24.981   4.177e-
Age:Policy   2  12295    6147  14.713   6.754e-
Residuals   24  10028     418
summary(insurance3.lm)
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        269.760      9.094  29.665   < 2e-
Age30              -94.187     13.520  -6.966  3.33e-
Age40             -207.812     13.520 -15.370  6.38e-
Policycomp         175.375     10.220  17.159  5.61e-
GenderM             94.375     10.220   9.234  2.28e-
Points22            41.708      8.345   4.998  4.18e-
Age30:Policycomp   -26.875     17.702  -1.518       0.
Age40:Policycomp   -95.875     17.702  -5.416  1.46e-

Residual standard error: 20.44 on 24 degrees of freedom
Multiple R-Squared: 0.
F-statistic: 171.5 on 7 and 24 DF, p-value: < 2.2e-
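With corner point constraints, a fitted value is the intercept plus the coefficient of every non-baseline level that applies. The sketch below, in Python rather than the paper's R, applies this to the coefficients printed by summary(insurance3.lm). The dictionary and function names are hypothetical, and the baseline levels (Age 21, Gender F, Policy 3rd, Points2 level 1) are inferred from the coefficient names in the output.

```python
# Coefficients copied from summary(insurance3.lm) in the paper.
COEF = {
    "(Intercept)": 269.760,
    "Age30": -94.187,
    "Age40": -207.812,
    "Policycomp": 175.375,
    "GenderM": 94.375,
    "Points22": 41.708,
    "Age30:Policycomp": -26.875,
    "Age40:Policycomp": -95.875,
}


def points2_level(points):
    """Map licence points to the two-level factor: 0, 3, 6 -> '1'; 9 -> '2'."""
    return "2" if points == 9 else "1"


def fitted_premium(age, gender, policy, points2):
    """Sum the intercept and the coefficients of all non-baseline levels."""
    terms = ["(Intercept)"]
    if age != "21":
        terms.append("Age" + age)
    if policy == "comp":
        terms.append("Policycomp")
    if gender == "M":
        terms.append("GenderM")
    if points2 == "2":
        terms.append("Points22")
    if age != "21" and policy == "comp":
        terms.append("Age" + age + ":Policycomp")  # interaction term
    return sum(COEF[t] for t in terms)


# 40 year old female, comprehensive policy, 6 points on her licence.
premium = fitted_premium("40", "F", "comp", points2_level(6))
print(round(premium, 3))
```

Note that this combination of age and gender does not occur in the data, so the estimate rests entirely on the additive Gender term in the model.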


summary(blow3.glm)
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -9.5621     0.7499  -12.75    <2e-
lT            2.2164     0.2079   10.66    <2e-
S             4.5086     0.5159    8.74    <2e-


The (edited) R output below refers to a study into the effectiveness of some particular traffic control measures in reducing accident rates. In each of eight locations, there are data on the number of accidents over a number of years before and after the installation of the traffic control measures. In the R output below, loc contains the location identifiers (numbers between 1 and 8), befaft contains indicators of whether the observation was taken before or after installation (1 denotes before, 2 denotes afterwards), years contains the length of the observation period (in years), and nacc contains the number of accidents that occurred during that observation period. Corner point constraints are used.

(a) Explain what is calculated in line (*).

(b) Write down the algebraic form of the model fitted in traffic1.glm, defining your notation carefully and stating any assumptions. Using the output to summary(traffic1.glm), show how to obtain an estimate of the ratio r of the accident rate after installation to the accident rate before installation. Explain how to obtain an approximate 95% confidence interval for r.

(c) Write down the algebraic form of the model in traffic2.glm. Why do you think this model is fitted? Comment on the fit of the model.

(d) Write a short paragraph giving relevant formal statistical analysis and your conclusions about the effect of the traffic measures on accident rates.

loc
 [1] 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8
befaft
 [1] 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2
years
 [1] 9 2 9 2 8 3 8 2 9 2 8 2 9 2 8 3
nacc
 [1] 13  0  6  2 30  4 20  0 10  0 15  6  7  1 13  2
Befaft <- factor(befaft)
Loc <- factor(loc)
r1 <- sum(nacc[befaft==1])/sum(years[befaft==1])
r2 <- sum(nacc[befaft==2])/sum(years[befaft==2])
r2/r1 # line (*)
[1] 0.
traffic1.glm <- glm(nacc~offset(log(years))+Befaft,poisson)
summary(traffic1.glm)
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.51669    0.09366   5.517   3.45e-
Befaft2     -0.69901    0.27466  -2.545       0.
Null deviance: 58.589 on 15 degrees of freedom
Residual deviance: 50.863 on 14 degrees of freedom
exp(-0.69901)
[1] 0.
traffic2.glm <- glm(nacc~offset(log(years))+Loc+Befaft,poisson)
anova(traffic2.glm,test="Chisq")
     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    15        58.
Loc   7   32.564         8     26.025   3.191e-
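The pooled rates in line (*) and an approximate interval for the rate ratio can be checked by hand. The sketch below is in Python rather than the paper's R, and uses the data vectors printed above together with the Befaft2 estimate and standard error from summary(traffic1.glm). The Wald-type interval and the value 1.96 for the normal quantile are assumptions of this sketch, as are the function and variable names.

```python
import math

# Data vectors as printed in the R output.
befaft = [1, 2] * 8
years = [9, 2, 9, 2, 8, 3, 8, 2, 9, 2, 8, 2, 9, 2, 8, 3]
nacc = [13, 0, 6, 2, 30, 4, 20, 0, 10, 0, 15, 6, 7, 1, 13, 2]


def pooled_rate(period):
    """Total accidents divided by total years for one period (1 = before, 2 = after)."""
    accidents = sum(n for n, b in zip(nacc, befaft) if b == period)
    exposure = sum(y for y, b in zip(years, befaft) if b == period)
    return accidents / exposure


r1 = pooled_rate(1)
r2 = pooled_rate(2)
ratio = r2 / r1            # the quantity computed in line (*)

# Estimate and standard error for Befaft2 from summary(traffic1.glm).
log_ratio_hat = -0.69901
se = 0.27466
z = 1.96                   # assumed approximate 97.5% standard normal quantile

# Interval on the log scale, then exponentiated to the ratio scale.
ci = (math.exp(log_ratio_hat - z * se), math.exp(log_ratio_hat + z * se))
print(round(ratio, 4), round(math.log(ratio), 5), [round(c, 3) for c in ci])
```

The log of the pooled ratio agrees with the Befaft2 coefficient, which illustrates why line (*) and exp(-0.69901) estimate the same quantity in this model.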


A researcher has collected hospital data for swine influenza-related admissions during the middle period of the 2009 UK epidemic. Specifically, she has recorded the dates of admission, swine influenza-related death and discharge, and the time still in hospital since admission if a patient has yet to be discharged or to die from swine influenza-related causes at the time of data collection. She approaches you with the data and is particularly interested in the case fatality ratio $\theta$ associated with hospitalisation (i.e. the proportion of swine influenza-related hospital cases who eventually die from the disease) and the conditional distribution corresponding to the time of death given that a case will eventually die ($I = 1$) from swine influenza-related causes (with distribution function $F(t \mid I = 1)$ and density $f(t \mid I = 1)$). The conditional distribution corresponding to the time to recovery (i.e. discharge) given that a case will eventually recover ($I = 2$) from the illness (with distribution function $F(t \mid I = 2)$ and density $f(t \mid I = 2)$) may also be of interest. You recognise that this is a survival analysis problem and offer to help her analyse the data.

By appropriately defining all notation used:

(a) Identify which type(s) of patients correspond to right-censored observations.

(b) Write down the likelihood contributions for a case (i.e. a swine influenza-related admitted patient) who

(i) dies in hospital at time $t$ after admission;
(ii) recovers and is discharged at time $t$ after admission;
(iii) remains in hospital at time $t$ after admission.

(c) Derive an E-M algorithm, giving full details for the E-step, that can be used to estimate the parameters of interest to the researcher given that the conditional densities, $f(t \mid I = 1)$ and $f(t \mid I = 2)$, associated with time to swine influenza-related death and time to recovery given eventual death from swine influenza-related causes and eventual recovery respectively, are log-normal densities with parameters $(\mu_1, \sigma_1)$ and $(\mu_2, \sigma_2)$.

[Hint: if $X$ has a log-normal distribution with parameter $(\mu, \sigma)$, then $Y = \log(X)$ has a normal distribution with mean $\mu$ and variance $\sigma^2$. Also, if $Y$ has a $N(\mu, \sigma^2)$ distribution, then, writing $z = (y - \mu)/\sigma$, we have

$$E(Y \mid Y > y) = \mu + \sigma\psi(z),$$

$$E\left[\left(\frac{Y - a}{b}\right)^2 \,\middle|\, Y > y\right] = \frac{1}{b^2}\left\{\sigma^2[1 - \omega(z)] + [(\mu - a) + \sigma\psi(z)]^2\right\}$$

for constants $a$ and $b$ ($\neq 0$), and

$$\mathrm{var}(Y \mid Y > y) = \sigma^2[1 - \omega(z)],$$

where

$$\psi(z) = \frac{\phi(z)}{1 - \Phi(z)} \quad\text{and}\quad \omega(z) = \psi(z)[\psi(z) - z],$$

and where $\phi(\cdot)$ and $\Phi(\cdot)$ are the density and distribution function respectively for a standard normal distribution.]
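The quantities in the hint are straightforward to evaluate numerically, which can help when checking an E-step by hand. The following is a minimal Python sketch using only the standard library. All function names are hypothetical, and the last function is one possible E-step weight for a case still in hospital, under the assumptions (not stated in the paper) that each case has exactly one eventual outcome with P(I = 1) = θ and that censoring is non-informative.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def upper_tail(z):
    """1 - Phi(z), computed via erfc for better accuracy at large z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def psi(z):
    return phi(z) / upper_tail(z)


def omega(z):
    return psi(z) * (psi(z) - z)


def truncated_mean(mu, sigma, y):
    """E(Y | Y > y) for Y ~ N(mu, sigma^2)."""
    return mu + sigma * psi((y - mu) / sigma)


def truncated_var(mu, sigma, y):
    """var(Y | Y > y) for Y ~ N(mu, sigma^2)."""
    return sigma ** 2 * (1.0 - omega((y - mu) / sigma))


def truncated_scaled_square(mu, sigma, y, a, b):
    """E[((Y - a) / b)^2 | Y > y], i.e. variance plus squared shifted mean, over b^2."""
    shifted_mean = truncated_mean(mu, sigma, y) - a
    return (truncated_var(mu, sigma, y) + shifted_mean ** 2) / b ** 2


def death_weight(t, theta, mu1, sigma1, mu2, sigma2):
    """Assumed E-step weight: P(I = 1 | still in hospital at time t)."""
    s1 = upper_tail((math.log(t) - mu1) / sigma1)   # 1 - F(t | I = 1)
    s2 = upper_tail((math.log(t) - mu2) / sigma2)   # 1 - F(t | I = 2)
    return theta * s1 / (theta * s1 + (1.0 - theta) * s2)


print(truncated_mean(0.0, 1.0, 0.0), truncated_var(0.0, 1.0, 0.0))
```

For a standard normal truncated at zero, the mean and variance reduce to the half-normal values, which gives a quick sanity check on the formulas.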


END OF PAPER
