



This is a Statistical Science exam covering Recursive Method, Time Series, Observations, Stationary Autoregressive Process, Obtaining Forecasts, Noise Process, Weakly Stationary, Considering, Recursive Forecasts, etc. Key points include: Positive Responses, Observe, Independent, Conditionally, Distribution, Posterior Distribution, Jeffreys Prior, Bernoulli Trials, Positive Integers, Successes.




Thursday, 4 June 2009 1:30 pm to 3:30 pm
Attempt no more than THREE questions. There are FOUR questions in total. The questions carry equal weight.
Stationery requirements: Cover sheet, Treasury Tag, Script paper. Special requirements: None.
You may not start to read the questions printed on the subsequent pages until instructed to do so by the Invigilator.
1 Suppose we observe x positive responses out of m Bernoulli trials, each assumed conditionally independent given an unknown, common success chance θ. Our prior distribution for θ is Beta(a,b), with density

\[ p(\theta \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, \theta^{a-1} (1-\theta)^{b-1}, \qquad \theta \in (0, 1). \]
(a) Derive the posterior distribution for θ given x.
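For part (a), the Beta prior is conjugate to the binomial likelihood, so the posterior is again a Beta distribution with updated parameters. The following is a minimal Python sketch of that update, not a substitute for the derivation; the function name `beta_posterior` and the numerical values are illustrative assumptions that do not appear in the question.

```python
def beta_posterior(a, b, x, m):
    """Return the parameters (a', b') of the Beta posterior for theta
    after observing x successes in m Bernoulli trials under a Beta(a, b) prior."""
    if not 0 <= x <= m:
        raise ValueError("x must lie between 0 and m")
    # Conjugate update: successes are added to a, failures to b.
    return a + x, b + (m - x)


# Hypothetical values chosen only for illustration.
a_post, b_post = beta_posterior(a=2, b=3, x=7, m=10)
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)
```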
(b) Show that the Jeffreys prior for θ is Beta(0.5,0.5). State the invariance property of Jeffreys prior distributions.
(c) We plan to observe a further n Bernoulli trials. If a and b above are positive integers, show that the predictive distribution for the future number of successes Y can be written as p(y|n, x, m, a, b) = A × B, where

\[ A = \frac{m+a+b-1}{m+n+a+b-1} \quad \text{and} \quad B = \frac{\dbinom{y+x+a-1}{y} \dbinom{m+n-y-x+b-1}{n-y}}{\dbinom{m+n+a+b-2}{n}}. \]
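The factorisation in part (c) can be checked numerically against the usual beta-binomial form of the predictive distribution, C(n, y) B(y+x+a, n−y+m−x+b) / B(x+a, m−x+b), where B(·,·) is the Beta function. The sketch below is a check under stated assumptions rather than a proof: it uses exact rational arithmetic, and the function names and the test values of n, x, m, a, b are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(p, q):
    """Beta function B(p, q) for positive integers p and q, as an exact fraction."""
    return Fraction(factorial(p - 1) * factorial(q - 1), factorial(p + q - 1))


def predictive_direct(y, n, x, m, a, b):
    """Beta-binomial predictive probability of y successes in n further trials."""
    return comb(n, y) * beta_fn(y + x + a, n - y + m - x + b) / beta_fn(x + a, m - x + b)


def predictive_ab(y, n, x, m, a, b):
    """The same probability written as A * B, with A and B as in part (c)."""
    A = Fraction(m + a + b - 1, m + n + a + b - 1)
    B = Fraction(
        comb(y + x + a - 1, y) * comb(m + n - y - x + b - 1, n - y),
        comb(m + n + a + b - 2, n),
    )
    return A * B


# Hypothetical values chosen only for illustration.
n, x, m, a, b = 5, 3, 8, 2, 4
probs = [predictive_ab(y, n, x, m, a, b) for y in range(n + 1)]
agree = all(p == predictive_direct(y, n, x, m, a, b) for y, p in enumerate(probs))
print(agree, sum(probs))
```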
(d) Another form of invariance property is as follows. Suppose we observe x out of m successes, and calculate the probability of observing y successes out of a further m trials. Compare this with the situation in which we had observed y out of m successes, and calculated the probability of x successes out of a further m trials. Then we should require that these two probabilities are the same. Show that this is true if a = b = 1.
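The exchange property in part (d) can be explored numerically before it is proved. The sketch below evaluates the A × B form of the predictive distribution with n = m and compares p(y | x) with p(x | y) over all pairs (x, y). The helper names and the choice m = 6 are assumptions made for illustration, and the second call, with a = 2 and b = 1, is included only for contrast.

```python
from fractions import Fraction
from math import comb


def predictive(y, n, x, m, a, b):
    """Predictive probability of y successes in n further trials, in A * B form."""
    A = Fraction(m + a + b - 1, m + n + a + b - 1)
    B = Fraction(
        comb(y + x + a - 1, y) * comb(m + n - y - x + b - 1, n - y),
        comb(m + n + a + b - 2, n),
    )
    return A * B


def is_exchange_symmetric(m, a, b):
    """True if p(y | x) equals p(x | y) for every x, y in 0..m when n = m."""
    return all(
        predictive(y, m, x, m, a, b) == predictive(x, m, y, m, a, b)
        for x in range(m + 1)
        for y in range(m + 1)
    )


uniform_ok = is_exchange_symmetric(m=6, a=1, b=1)
other_ok = is_exchange_symmetric(m=6, a=2, b=1)
print(uniform_ok, other_ok)
```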
(e) What other attractive predictive property does a Beta(1,1) prior have, as identified by Bayes?
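One property commonly associated with Bayes's argument is that, before any data are seen, a uniform prior makes every possible number of successes in n trials equally likely. Assuming that this is the property intended in part (e), the sketch below evaluates the A × B form with m = x = 0; the function name `prior_predictive` and the value n = 10 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def prior_predictive(y, n, a=1, b=1):
    """Predictive probability of y successes in n trials before any data
    are observed (m = x = 0), written in the A * B form."""
    A = Fraction(a + b - 1, n + a + b - 1)
    B = Fraction(
        comb(y + a - 1, y) * comb(n - y + b - 1, n - y),
        comb(n + a + b - 2, n),
    )
    return A * B


n = 10  # hypothetical number of future trials
uniform_probs = [prior_predictive(y, n) for y in range(n + 1)]
print(uniform_probs)
```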
(f) Can you interpret the form of B in p(y|n, x, m, a, b) when a = b = 1?
Applied Bayesian Statistics
3 Some classic mutagenicity assay data on Salmonella feature three plates that have been processed at each of six doses of quinoline (recorded as μg per plate). The numbers of revertant colonies of TA98 Salmonella on each plate are shown below.
Dose level i     1     2     3     4     5     6
Dose x_i         0    10    33   100   333  1000
Plate 1         15    16    16    27    33    20
Plate 2         21    18    26    41    38    27
Plate 3         29    21    33    60    41    42
A certain dose-response curve is suggested by theory, so that for an observation Y_ij on the jth plate at the ith dose, we assume a Poisson model allowing for ‘over-dispersion’:

\[ Y_{ij} \sim \text{Poisson}(\mu_{ij}) \quad \text{independently, given the } \mu_{ij}\text{'s}, \]
\[ \log \mu_{ij} = \alpha + \beta \log(x_i + 10) + \gamma x_i + \lambda_{ij}, \]
\[ \lambda_{ij} \sim \text{Normal}(0, \tau^2) \quad \text{independently.} \]
(a) Just from examining the data by eye, why do you think an allowance for over-dispersion may be needed?
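A quick numerical companion to the by-eye inspection in part (a) is to compare the sample mean and sample variance of the three replicate plates at each dose, since a Poisson count has variance equal to its mean. The sketch below uses the counts from the table; the function name `dispersion_by_dose` is an assumption, and with only three plates per dose the variances are very rough.

```python
from statistics import mean, variance

doses = [0, 10, 33, 100, 333, 1000]
# Rows are plates 1 to 3, columns are dose levels 1 to 6.
plates = [
    [15, 16, 16, 27, 33, 20],
    [21, 18, 26, 41, 38, 27],
    [29, 21, 33, 60, 41, 42],
]


def dispersion_by_dose(plates):
    """Return (mean, sample variance) of the replicate counts at each dose."""
    columns = list(zip(*plates))
    return [(mean(col), variance(col)) for col in columns]


summary = dispersion_by_dose(plates)
for dose, (mu, var) in zip(doses, summary):
    # A variance-to-mean ratio well above 1 hints at over-dispersion.
    print(f"dose {dose:>4}: mean = {mu:6.2f}, variance = {var:7.2f}, ratio = {var / mu:5.2f}")
```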
(b) In what way does this model allow for over-dispersion?
The model is fitted using the following WinBUGS code:
    for(i in 1:doses) {
      for(j in 1:plates) {
        y[i,j] ~ dpois(mu[i,j])
        log(mu[i,j]) <- alpha + beta*log(x[i]+10) + gamma*x[i] + lambda[i,j]
        lambda[i,j] ~ dnorm(0.0, invtau2)
      }
    }
    alpha ~ dunif(-100,100)
    beta ~ dunif(-100,100)
    gamma ~ dunif(-100,100)
    tau ~ dunif(0,100)
    invtau2 <- 1/(tau*tau)
(c) How could the convergence be improved?
(d) Explain briefly the prior distributions given to the parameters, and in particular why the standard Jeffreys prior is not given to the variance parameter τ^2.
(e) How would you adapt the code if you wanted to fit a model with no over-dispersion?
(f) The following table shows the DIC output based on 10000 iterations when fitting models with and without over-dispersion.
Dbar = post.mean of -2logL;

                                 Dbar     pD     DIC
Model without over-dispersion   139.2    2.9    142.
Model with over-dispersion      110.6   13.6    124.
Interpret these results, in particular the pD column.
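In WinBUGS output, DIC is defined as Dbar + pD, so the DIC column can be recomputed from the other two. The sketch below does that arithmetic for the two models in the table; the dictionary layout and names are assumptions, and because the recomputation uses rounded inputs its last digit need not match the original output.

```python
# Dbar and pD as printed in the table for part (f).
models = {
    "without over-dispersion": {"Dbar": 139.2, "pD": 2.9},
    "with over-dispersion": {"Dbar": 110.6, "pD": 13.6},
}


def dic(dbar, pd):
    """DIC as mean deviance plus effective number of parameters."""
    return dbar + pd


dic_values = {name: round(dic(v["Dbar"], v["pD"]), 1) for name, v in models.items()}
difference = round(
    dic_values["without over-dispersion"] - dic_values["with over-dispersion"], 1
)

for name, value in dic_values.items():
    print(f"DIC {name}: {value}")
print(f"difference in DIC: {difference}")
```

The model without over-dispersion has three regression parameters (alpha, beta, gamma), while the over-dispersed model adds 18 random effects lambda[i,j] and tau, which is useful context when reading the pD column.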
(g) How would you calculate standardised residuals around the fitted values for each plate? What would you be looking for and what would this procedure be checking?
(h) Suppose you wanted to check the underlying dose-response assumption by seeing if the predictions it would make matched the observed data. What replications might you make and how would you compare them with the observed data?
4 (a) Suppose we have two alternative models M_1 and M_2 with parameter vectors ψ_1 and ψ_2 respectively, and we are provided with prior distributions p(ψ_i | M_i), sampling distributions p_Y(y | ψ_i, M_i) for i = 1, 2, and a prior probability p(M_1) = 1 − p(M_2). For an observation y, how would we find the posterior odds on model M_1?

In a (simplified version of a) micro-array experiment, we will make observations Y_1, ..., Y_N which summarise the expression of N genes, where N is very large. Each Y_i is a standardised Normal variable with mean θ_i and variance 1, where θ_i is the true expression of gene i. If gene i is ‘negative’, then θ_i = 0. If gene i is ‘positive’, then θ_i is assumed to be a Normal variable with mean 0 and variance V, where V is assumed known. The proportion of ‘positive’ genes is denoted q, for the moment assumed known. We now observe a vector y = (y_1, ..., y_N).
(b) For a positive gene, state the posterior mean of θ_i given y_i.
(c) Write down the predictive distribution for Y_i | V for a positive gene i. Hence derive an expression for the posterior odds in favour of a gene being positive, as a function of y_i, V and q.
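Parts (b) and (c) rest on standard Normal-Normal results: for a positive gene the posterior mean shrinks the observation by the factor V/(1+V), and the marginal distribution of the observation is Normal(0, 1+V), against Normal(0, 1) for a negative gene. The sketch below computes the shrunken mean and the posterior odds as prior odds times the ratio of these two marginal densities. The function names and example values are assumptions introduced for illustration.

```python
import math


def normal_pdf(y, var):
    """Density at y of a Normal distribution with mean 0 and variance var."""
    return math.exp(-0.5 * y * y / var) / math.sqrt(2.0 * math.pi * var)


def posterior_mean_positive(y, V):
    """Posterior mean of theta_i given y_i for a positive gene."""
    return V / (1.0 + V) * y


def posterior_odds_positive(y, V, q):
    """Posterior odds that a gene is positive: prior odds times Bayes factor."""
    prior_odds = q / (1.0 - q)
    bayes_factor = normal_pdf(y, 1.0 + V) / normal_pdf(y, 1.0)
    return prior_odds * bayes_factor


# Hypothetical values chosen only for illustration.
y_i, V, q = 3.0, 9.0, 0.1
odds = posterior_odds_positive(y_i, V, q)
print(posterior_mean_positive(y_i, V), odds, odds / (1.0 + odds))
```

The last printed value converts the odds to a posterior probability of being positive.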
(d) Suppose now that q is unknown. Write down an expression for p(y | q, V). Explain briefly how you might go about finding a maximum likelihood estimate for q.
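One possible route for part (d) is to treat each observation as a draw from a two-component mixture, q Normal(0, 1+V) + (1 − q) Normal(0, 1), and maximise the resulting log-likelihood over q numerically. The sketch below uses a simple grid search on synthetic data; the simulated observations, the seed, the grid size and all function names are assumptions for illustration only, and an EM algorithm or a one-dimensional optimiser would serve equally well.

```python
import math
import random


def normal_pdf(y, var):
    """Density at y of a Normal distribution with mean 0 and variance var."""
    return math.exp(-0.5 * y * y / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(q, f_pos, f_neg):
    """Mixture log-likelihood of q given per-gene component densities."""
    return sum(math.log(q * fp + (1.0 - q) * fn) for fp, fn in zip(f_pos, f_neg))


def mle_q(ys, V, grid_size=199):
    """Grid-search maximum likelihood estimate of q on (0, 1)."""
    f_pos = [normal_pdf(y, 1.0 + V) for y in ys]  # marginal if positive
    f_neg = [normal_pdf(y, 1.0) for y in ys]      # marginal if negative
    grid = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    return max(grid, key=lambda q: log_likelihood(q, f_pos, f_neg))


def simulate(N, q, V, seed=0):
    """Synthetic data from the model in the question (illustration only)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(N):
        theta = rng.gauss(0.0, math.sqrt(V)) if rng.random() < q else 0.0
        ys.append(theta + rng.gauss(0.0, 1.0))
    return ys


ys = simulate(N=2000, q=0.1, V=9.0)
q_hat = mle_q(ys, V=9.0)
print(q_hat)
```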
(e) Suppose you have external information that q is around 10%, and is unlikely to be above 15%. In words, how might you transform this information into a formal prior distribution?
(f) By introducing an indicator function or otherwise, provide rough WinBUGS code that will provide full posterior distributions for q and the θ_i’s.
(g) Suppose you are not told the actual values comprising y, but only that 15% of the genes had an observed expression greater than 2. How might you estimate q?
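One way to approach part (g) is a moment-matching argument: under the model, the probability that an observation exceeds 2 is q P(Z > 2/√(1+V)) + (1 − q) P(Z > 2) for a standard Normal Z, and this can be equated to the observed proportion and solved for q. The sketch below implements that idea under the assumption that V is known; the function names and the value of V are hypothetical, and the estimate is not constrained to lie in [0, 1].

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard Normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_probability(q, V, c=2.0):
    """Model probability that an observed expression exceeds c."""
    return q * upper_tail(c / math.sqrt(1.0 + V)) + (1.0 - q) * upper_tail(c)


def estimate_q_from_tail(p_obs, V, c=2.0):
    """Solve tail_probability(q, V, c) = p_obs for q (requires V > 0)."""
    t_pos = upper_tail(c / math.sqrt(1.0 + V))  # tail mass for positive genes
    t_neg = upper_tail(c)                       # tail mass for negative genes
    return (p_obs - t_neg) / (t_pos - t_neg)


# 15% of genes exceed 2, as in the question; V = 9 is a hypothetical value.
q_tail = estimate_q_from_tail(p_obs=0.15, V=9.0)
print(q_tail)
```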