




This is the Exam of Statistics which includes Network Comprises, Illegally Downloaded, Independent, Probability, Computers, Particular Network, Likelihood Function, Maximum Likelihood Estimator, Relative Likelihood Interval etc. Key important points are: Independently, Identically Distributed, Exponential Random Variables, Parameter, Probability, Density Function, Likelihood Function, Maximum Likelihood, Log Likelihood Function, Estimates





PART II (Second year)
MATHEMATICS & STATISTICS 2 hours
Math 235: Statistics
You should answer ALL Section A questions and THREE Section B questions. In Section A there are questions worth a total of 50 marks, but the maximum mark that you can gain there is 40. There are statistical tables at the end of this exam paper.
SECTION A
A1. Let X_1, X_2, ..., X_n be independent and identically distributed exponential random variables with parameter θ > 0, each having probability density function f(x|θ) = θ exp(−θx), x > 0.
(a) Write down the likelihood function of θ, L(θ), and the log-likelihood function l(θ). [4]
(b) Find the maximum likelihood estimator of θ. Verify that it is indeed a maximum. [4]
(c) Determine the maximum likelihood estimator of the mean 1/θ. What property of maximum likelihood estimates have you used to obtain your answer? [4]
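As a numerical check on A1, the sketch below evaluates the log-likelihood l(θ) = n log θ − θ Σ x_i and the closed-form maximiser θ̂ = n / Σ x_i. The sample `xs` and the function names are hypothetical; the exam supplies no data for this question.

```python
import math

def log_likelihood(theta, xs):
    """Exponential log-likelihood: l(theta) = n*log(theta) - theta*sum(xs)."""
    return len(xs) * math.log(theta) - theta * sum(xs)

def mle_rate(xs):
    """Solve dl/dtheta = n/theta - sum(xs) = 0 for theta."""
    return len(xs) / sum(xs)

# Hypothetical sample, not taken from the exam paper.
xs = [0.5, 1.2, 0.3, 2.0, 1.0]

theta_hat = mle_rate(xs)
# By invariance of maximum likelihood estimates, the MLE of 1/theta is 1/theta_hat.
mean_hat = 1.0 / theta_hat

# The second derivative -n/theta^2 is negative, so theta_hat is a maximum;
# nearby values of theta should therefore give a smaller log-likelihood.
for theta in (0.9 * theta_hat, theta_hat, 1.1 * theta_hat):
    print(f"theta = {theta:.3f}, l(theta) = {log_likelihood(theta, xs):.4f}")
```

The estimate of the mean coincides with the sample mean, which is what the invariance argument in part (c) predicts.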
A2. A family of mice has n children. Among mice it is known that the probability that a baby is a boy is 3/4 and is 1/4 for a girl, independently for each child. The family is observed to have one boy but the number of girls is unknown.
(a) Write down the probability of this observation conditional on n. [4]
(b) What are the possible values of n? [2]
(c) Write down the likelihood function of n, L(n). Sketch the likelihood function of n up to 5. [4]
(d) Show by considering likelihood ratios, or otherwise, that the likelihood is a decreasing function of n. Hence state the maximum likelihood estimate. [3]
please turn over
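For A2, one reading of the set-up is that the number of boys among n children is Binomial(n, 3/4), so observing exactly one boy gives L(n) = n (3/4)(1/4)^(n−1) for n ≥ 1. The sketch below tabulates this likelihood exactly for n = 1, ..., 5 and checks the ratio L(n+1)/L(n) = (n+1)/(4n); the function name `likelihood` is illustrative only.

```python
from fractions import Fraction

P_BOY = Fraction(3, 4)  # probability that a baby is a boy, from the question

def likelihood(n):
    """P(exactly one boy among n children) under the binomial reading."""
    return n * P_BOY * (1 - P_BOY) ** (n - 1)

# Likelihood values for n = 1, ..., 5 (the range to be sketched in part (c)).
values = {n: likelihood(n) for n in range(1, 6)}

# Successive likelihood ratios L(n+1)/L(n); each should equal (n+1)/(4n) < 1.
ratios = [likelihood(n + 1) / likelihood(n) for n in range(1, 5)]

# The value of n with the largest likelihood on this range.
n_hat = max(values, key=values.get)

for n, value in values.items():
    print(n, value)
```

Because every ratio is below one, the likelihood decreases in n and the largest value occurs at the smallest admissible n.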
SECTION A continued
A3. A high street store advertises in the first week of each month. They believe that monthly sales Y_i are higher when advertising expenditure e_i is increased, and also when there is a Bank Holiday in the month. The variable b_i takes the value 1 when there is a Bank Holiday in month i and zero otherwise. Data are available for the last six months:

Month i                              1       2       3       4       5       6
Sales (£100)                      38.1   325.4    63.6    83.6   201.2     29.
Advertising expenditure (£100)     6.7   244.7    30.0    44.7   164.0     12.
Bank holiday?                       No     Yes      No     Yes      No      No

It is suggested that the normal linear model Y_i = θ_1 + θ_2 e_i + θ_3 b_i + Z_i for i = 1, ..., 6 might be appropriate.
(a) What assumptions are made about the errors Z_i in the normal linear model? [3]
(b) Give the definition of a factor and state how it is represented in a linear model. [2]
(c) Write down the design matrix X for this model. [4]
(d) If θ = (22.2, 1.2, 16.6), calculate the predicted values for the sales. [3]
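The design matrix and predicted values in A3 can be sketched as follows, assuming each row of X is (1, e_i, b_i) with the bank-holiday factor coded as a 0/1 indicator. Only months 1 to 5 are used because the month-6 figures are cut off in the table; the variable names are illustrative.

```python
# Months 1-5 only: the month-6 figures are truncated in the source table.
advertising = [6.7, 244.7, 30.0, 44.7, 164.0]  # e_i, in units of 100 pounds
bank_holiday = [0, 1, 0, 1, 0]                 # b_i: 1 = Yes, 0 = No

theta = (22.2, 1.2, 16.6)  # (theta_1, theta_2, theta_3) from part (d)

# Each row of the design matrix is (1, e_i, b_i).
X = [(1.0, e, float(b)) for e, b in zip(advertising, bank_holiday)]

# Predicted sales are the rows of X multiplied by theta.
predicted = [sum(x_j * t_j for x_j, t_j in zip(row, theta)) for row in X]

for month, value in enumerate(predicted, start=1):
    print(f"month {month}: predicted sales = {value:.2f} (units of 100 pounds)")
```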
A4. Observations y_1, y_2 and y_3 are realised values from the normal linear model Y_i = θ x_i + Z_i, i = 1, 2, 3, where x_1 = 1, x_2 = 3, x_3 = 6 and var(Z_i) = σ^2. Let θ̂ be the least squares estimate of the parameter θ and S(θ) be the error sum of squares function.
(a) Write down the property satisfied by S(θ̂). [2]
(b) Write down the design matrix X for this model, and hence find the least squares estimate θ̂. [5]
(c) Show that E(θ̂) = θ. What is this property of θ̂ called? [3]
(d) Show that var(θ̂) = σ^2/46. [3]
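A short exact-arithmetic sketch for A4, assuming independent errors: with a single-column design matrix, θ̂ = Σ x_i y_i / Σ x_i^2 is a weighted sum of the observations, so its mean and variance follow from the weights. The names `weights`, `bias_factor` and `var_factor` are illustrative.

```python
from fractions import Fraction

x = [1, 3, 6]                      # x_1, x_2, x_3 from the question
sum_sq = sum(xi * xi for xi in x)  # X'X for the single-column design matrix

# theta_hat = (X'X)^{-1} X'y = sum(w_i * y_i) with weights w_i = x_i / sum_sq.
weights = [Fraction(xi, sum_sq) for xi in x]

# E(theta_hat) = theta * sum(w_i * x_i), so unbiasedness needs this sum to be 1.
bias_factor = sum(w * xi for w, xi in zip(weights, x))

# var(theta_hat) = sigma^2 * sum(w_i^2) for independent errors with variance sigma^2.
var_factor = sum(w * w for w in weights)

def theta_hat(y):
    """Least squares estimate for observations y = (y_1, y_2, y_3)."""
    return sum(w * yi for w, yi in zip(weights, y))

print(sum_sq, bias_factor, var_factor)
```

The variance factor comes out as 1/46 because Σ x_i^2 = 1 + 9 + 36 = 46.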
SECTION B continued
B3. A simple linear regression model relating measured responses y_i, i = 1, 2, ..., n, to a two-level factor with indicator variables x_{i,1} and x_{i,2} may be written as
model 1: y_i = θ_1 x_{i,1} + θ_2 x_{i,2} + z_i
or
model 2: y_i = φ_1 + φ_2 x_{i,2} + z_i.
The following table shows values of the measured responses and indicator variables.

y_i        3.2   4.5   6.8   1.2   5.3   2.
x_{i,1}      1     1     0     0     1    0
x_{i,2}      0     0     1     1     0    1

(a) Explain why a constant term has not been included in model 1. [2]
(b) Write out the design matrices X and A for the two linear models. [3]
(c) (i) Show that the two models are equivalent and may be related through a transformation matrix T such that X = AT and φ = Tθ. Write down T. [4]
    (ii) Using your matrix T, obtain expressions for each of φ_1 and φ_2 in terms of θ_1 and θ_2. [2]
(d) Using your answer to part (c) or otherwise, interpret the coefficients φ_1 and φ_2. [2]
(e) If a second two-level factor is to be included in the model, explain why it is preferable to include this additional factor in model 2 rather than in model 1. [2]

The following table shows the results of fitting model 2 to the data.

Parameter                                  φ_1     φ_2
Estimated coefficient                      3.47    0.
Standard error of estimated coefficient    1.28    1.

The standard errors have been estimated.
(f) (i) What are the residual degrees of freedom for this model? [2]
    (ii) Test whether or not the coefficient φ_2 is significantly different from zero. Interpret your result. [3]
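The relationship between the two parameterisations in B3 can be checked numerically. The sketch below builds X and A from the indicator columns in the table and verifies one candidate T, chosen here on the assumption that x_{i,1} = 1 − x_{i,2} for every observation; the helper `matmul` and the function `phi_from_theta` are illustrative names.

```python
x1 = [1, 1, 0, 0, 1, 0]  # x_{i,1} from the table
x2 = [0, 0, 1, 1, 0, 1]  # x_{i,2} from the table

X = [[a, b] for a, b in zip(x1, x2)]  # model 1: columns x_{i,1}, x_{i,2}
A = [[1, b] for b in x2]              # model 2: constant column, x_{i,2}

# Candidate transformation: x_{i,1} = 1 - x_{i,2}, and x_{i,2} is unchanged.
T = [[1, 0],
     [-1, 1]]

def matmul(P, Q):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def phi_from_theta(theta1, theta2):
    """phi = T theta, so phi_1 = theta_1 and phi_2 = theta_2 - theta_1."""
    phi = matmul(T, [[theta1], [theta2]])
    return phi[0][0], phi[1][0]

models_agree = matmul(A, T) == X  # True when X = AT holds row by row
residual_df = len(x1) - 2         # n observations minus 2 fitted coefficients

print(models_agree, phi_from_theta(3, 5), residual_df)
```

Under this T, φ_1 is the mean response at the level with x_{i,1} = 1 and φ_2 is the difference between the two level means.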
B4. The normal linear model for n observations and p explanatory variables can be expressed in matrix form as Y = Xθ + Z, where Z ∼ N(0, σ^2 I) and I is the n × n identity matrix. The least squares estimator of θ is θ̂ = (X′X)^{-1} X′Y.
(a) Show that the variance matrix for θ̂ is σ^2 (X′X)^{-1}. [4]
(b) Write down the formula for the estimated residual variance σ̂^2. [2]

A zoologist is interested in measuring whether the body length y of bank voles is related to tail length t or foot length f. He fits two different models, y_i = θ_0 + θ_1 t_i + z_i and y_i = θ_0 + θ_1 t_i + θ_2 f_i + z_i. He obtains the following data:

Body length (mm)    7.1   8.4   9.6   12.3   10.3   11.6   11.
Tail length (mm)    2.5   3.1   2.8    4.2    3.5    3.7    3.
Foot length (mm)    1.1   0.3   1.3    1.6    1.0    1.9    2.

The residual sums of squares are S_1 = 2.77 for the first model and S_2 = 1.90 for the second model.
(c) Consider the first model only. For this model, the parameter estimate for θ_1 is 2.89, with a standard error of 0.496. The estimated residual variance is σ̂^2 = 0.553.
    (i) Calculate the variance matrix for θ̂ in this model, and hence obtain the standard error for θ̂_1. [3]
    (ii) Calculate a 95% confidence interval for θ_1 and hence test the null hypothesis that θ_1 is zero. [3]
(d) Is either of these models nested within the other? Explain your answer. [2]
(e) (i) What is the estimated residual variance for the second model? [2]
    (ii) Using the F statistic, test the null hypothesis that the first model is acceptable. You should give the value of the test statistic and the result of the test. [4]
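The arithmetic behind B4 (c)(ii) and (e) can be sketched as below. Two things are assumptions rather than values read from the paper: n = 7 is inferred from counting seven entries in each data row, and the 5% critical value 7.71 for F with 1 and 4 degrees of freedom is the standard tabulated value. The t critical value 2.571 is the entry for r = 5, p = 0.05 in the t table.

```python
n = 7             # assumed: seven observations per row in the data table
S1, p1 = 2.77, 2  # model 1: residual sum of squares, number of coefficients
S2, p2 = 1.90, 3  # model 2: residual sum of squares, number of coefficients

# Estimated residual variances: residual sum of squares over residual df.
sigma2_hat_1 = S1 / (n - p1)
sigma2_hat_2 = S2 / (n - p2)

# 95% confidence interval for theta_1 in model 1 (n - p1 = 5 degrees of freedom).
theta1_hat, se = 2.89, 0.496
t_crit = 2.571  # t table entry for r = 5, p = 0.05
ci = (theta1_hat - t_crit * se, theta1_hat + t_crit * se)

# F statistic for the nested comparison of model 1 within model 2.
f_stat = ((S1 - S2) / (p2 - p1)) / sigma2_hat_2
f_crit = 7.71   # assumed standard 5% point of F with 1 and 4 degrees of freedom
reject_model_1 = f_stat > f_crit

print(f"sigma2_hat_2 = {sigma2_hat_2:.3f}")
print(f"95% CI for theta_1: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"F = {f_stat:.3f}, reject model 1: {reject_model_1}")
```

Under these assumptions the interval excludes zero while the F statistic falls below the critical value, so tail length appears useful and adding foot length is not supported at the 5% level.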
end of exam
Values of t for which P(|T| > t) = p, where T has a t-distribution with r degrees of freedom.

         p
  r      0.20    0.10     0.05     0.01    0.
  1     3.078   6.314   12.706   63.657   636.
  2     1.886   2.920    4.303    9.925    31.
  3     1.638   2.353    3.182    5.841    12.
  4     1.533   2.132    2.776    4.604     8.
  5     1.476   2.015    2.571    4.032     6.
  6     1.440   1.943    2.447    3.707     5.
  7     1.415   1.895    2.365    3.499     5.
  8     1.397   1.860    2.306    3.355     5.
  9     1.383   1.833    2.262    3.250     4.
 10     1.372   1.812    2.228    3.169     4.
 11     1.363   1.796    2.201    3.106     4.
 12     1.356   1.782    2.179    3.055     4.
 13     1.350   1.771    2.160    3.012     4.
 14     1.345   1.761    2.145    2.977     4.
 15     1.341   1.753    2.131    2.947     4.
 16     1.337   1.746    2.120    2.921     4.

The F distribution table
Values of f for which P(F > f) = 0.05 (upper values) and P(F > f) = 0.01 (lower values), where F has an F-distribution with r and s degrees of freedom.