



A series of exam questions on Bayesian analysis in statistical science. The questions cover Ioannidis' paper on the prevalence of false research findings, Bayesian modelling for monitoring testosterone/epitestosterone ratios, and independent logistic growth curve modelling for orange tree circumferences. They involve calculating probabilities, deriving distributions, and interpreting results.




Wednesday 4 June 2008 1.30 to 3.
Attempt no more than THREE questions. There are FOUR questions in total. The questions carry equal weight.
When asked to provide code, you need only give the approximate form for the statements: precise syntax is not expected.
Cover sheet Treasury tag Script paper
None
1 Much of biomedical science is concerned with finding true relationships between ‘exposures’ (say genes) and ‘outcomes’ (say diseases). Ioannidis (2005) published a much-cited paper Why most published research findings are false, in which he supposed that relationships being studied were either ‘true’ (denoted T) or ‘false’ (denoted F): a false relationship corresponds to a null hypothesis H0, while a true relationship corresponds to an alternative hypothesis H1. He assumed, in any particular field, a ratio R between the proportion of true relationships and the proportion of false relationships. Furthermore, he assumed that each study was reported as being either ‘positive’ (denoted P) or ‘negative’ (denoted N), and that all studies were designed with Type I error α (the probability that a false relationship is detected in the study - a ‘false positive result’) and Type II error β (the probability that a relationship that truly does exist is not detected in the study - a ‘false negative result’).
(a) What is the probability that a relationship is false, given that a study reported it as positive?
(b) Show that R has to be at least α/(1 − β), in order for a reported positive result to be more likely true than false.
Suppose a study is designed to have a test statistic Y with a N(0, 1) distribution under the null hypothesis that there is no relationship, and the alternative hypothesis is that Y has a N(2.49, 1) distribution [NB Φ(1.65) ≈ 0.95, Φ(−0.84) ≈ 0.2, where Φ is the standard normal cumulative distribution function].
(c) If a positive result is declared if Y > 1.65, show that α = 0.05, β = 0.2.
(d) If we observe y, show that the Bayes factor for the alternative hypothesis against the null hypothesis is e^{2.49(y − 1.245)}.
(e) In a succeeding paper, Goodman and Greenland criticised Ioannidis’ analysis, saying that it is not sensible to summarise results by just whether they are ‘significant’ or not. In the light of the preceding analysis, do you think this is a reasonable criticism?
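The quantities in parts (a) to (d) can be checked numerically. The following is a minimal sketch, not a model answer: it assumes that R is the prior odds P(T)/P(F), so that Bayes' theorem gives P(F | P) = α / (α + (1 − β)R), and the function names are illustrative only.

```python
from math import exp
from statistics import NormalDist

null = NormalDist(0.0, 1.0)    # distribution of Y under H0
alt = NormalDist(2.49, 1.0)    # distribution of Y under H1
cutoff = 1.65                  # a positive result is declared if Y > cutoff

alpha = 1.0 - null.cdf(cutoff)  # Type I error, P(Y > 1.65 | H0)
beta = alt.cdf(cutoff)          # Type II error, P(Y <= 1.65 | H1)


def prob_false_given_positive(R, alpha, beta):
    """P(F | P) by Bayes' theorem, assuming R = P(T) / P(F)."""
    return alpha / (alpha + (1.0 - beta) * R)


def bayes_factor(y):
    """Likelihood ratio of H1 against H0 at the observed value y."""
    return alt.pdf(y) / null.pdf(y)


if __name__ == "__main__":
    print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
    # At the threshold R = alpha / (1 - beta) a positive result is
    # equally likely to be true or false.
    print(prob_false_given_positive(alpha / (1.0 - beta), alpha, beta))
    # The density ratio agrees with the closed form exp(2.49 (y - 1.245)).
    for y in (1.0, 1.65, 3.0):
        print(y, bayes_factor(y), exp(2.49 * (y - 1.245)))
```

The Bayes factor grows steadily with y, whereas the positive/negative summary treats every y above 1.65 alike, which is the point at issue in part (e).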
Suppose that n independent studies, each designed with Type I error α and Type II error β, are performed to test the same relationship.
(f) What is the chance that, if the relationship does not exist, at least one will show a positive result?
(g) If all we know is that there is at least one positive result among the n studies, what are the odds that this is a true finding?
(h) What happens to these odds as n increases? Why does Ioannidis claim that ‘The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true’?
(i) Goodman and Greenland also say it is unrealistic to assume that all we know is that there is at least one positive result among n studies. If we read a published account that claims a positive relationship based on the full results of many studies carried out in a field, should that increase or decrease our confidence in the positive finding, relative to what we would feel after reading a single study?
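The behaviour asked about in parts (f) to (h) can be explored with a short calculation. This sketch assumes the n studies are independent and that R is the prior odds of a true relationship; the value R = 0.1 is a hypothetical choice for illustration, and the function names are not from the paper.

```python
def prob_any_positive_if_false(n, alpha):
    """P(at least one positive among n studies | relationship is false)."""
    return 1.0 - (1.0 - alpha) ** n


def prob_any_positive_if_true(n, beta):
    """P(at least one positive among n studies | relationship is true)."""
    return 1.0 - beta ** n


def odds_true_given_any_positive(n, R, alpha, beta):
    """Posterior odds of a true relationship, knowing only that at least
    one of n independent studies was positive (prior odds R)."""
    return R * prob_any_positive_if_true(n, beta) / prob_any_positive_if_false(n, alpha)


if __name__ == "__main__":
    R, alpha, beta = 0.1, 0.05, 0.2  # R is a hypothetical value
    for n in (1, 2, 5, 10, 50):
        print(n, round(odds_true_given_any_positive(n, R, alpha, beta), 3))
```

With these inputs the odds fall from R(1 − β)/α at n = 1 towards the prior odds R as n grows, so knowing only that "at least one study was positive" carries less and less information.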
Paper 44
3 The testosterone/epitestosterone (T/E) ratio is used as a biomarker for detecting athletes’ abuse of testosterone, and a Bayesian model has been suggested for monitoring repeated measures of the logarithm of an athlete’s T/E ratio.
An athlete i is assumed, if he/she is not taking additional testosterone, to have log(T/E) measurements drawn independently from a normal distribution with mean θi and standard deviation σi. We assume that the θi’s are independently drawn from a N(μ, τ²) distribution.
Suppose for the moment that we know values for μ and τ. We select an athlete i for whom we have no data, and assume we know σi.
(a) What is the predictive distribution for Yi1, the first observation to be taken on athlete i?
(b) We want to set an upper limit ui1, so that the predictive probability that an observation lies above that limit, if the athlete is not abusing testosterone, is 1/1000. Write down an explicit expression for ui1 in terms of μ, τ, σi and Φ, where Φ is the standard normal distribution function.
Suppose we now take a measurement yi1 on athlete i.
(c) What is the posterior distribution for θi?
(d) Denoting the posterior mean and variance of θi by mi1 and vi1 respectively, what is the predictive distribution for Yi2, a future second measurement?
(e) What monitoring limit ui2 would you set for Yi2, such that there was a 1/1000 predicted probability that an ‘innocent’ athlete would exceed that limit?
(f) Explain in words how the monitoring limits will change as more data becomes available on athlete i.
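The sequence of limits in parts (a) to (f) follows from standard normal-normal conjugate updating. The sketch below is one way to compute them under that assumption; the numerical values of μ, τ, σi and yi1 are hypothetical and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Upper 1/1000 point of the standard normal, i.e. the z with Phi(z) = 0.999
Z999 = NormalDist().inv_cdf(0.999)


def first_limit(mu, tau, sigma):
    """Limit for Y_i1, whose predictive distribution is N(mu, tau^2 + sigma^2)."""
    return mu + Z999 * sqrt(tau ** 2 + sigma ** 2)


def posterior_after_one(y1, mu, tau, sigma):
    """Posterior mean and variance of theta_i after observing y1."""
    precision = 1.0 / tau ** 2 + 1.0 / sigma ** 2
    v1 = 1.0 / precision
    m1 = v1 * (mu / tau ** 2 + y1 / sigma ** 2)
    return m1, v1


def second_limit(m1, v1, sigma):
    """Limit for Y_i2, whose predictive distribution is N(m1, v1 + sigma^2)."""
    return m1 + Z999 * sqrt(v1 + sigma ** 2)


if __name__ == "__main__":
    mu, tau, sigma, y1 = 0.0, 1.0, 0.5, 1.0  # hypothetical values
    m1, v1 = posterior_after_one(y1, mu, tau, sigma)
    print("u_i1 =", first_limit(mu, tau, sigma))
    print("m_i1, v_i1 =", m1, v1)
    print("u_i2 =", second_limit(m1, v1, sigma))
```

Because vi1 is smaller than τ², the second predictive distribution is narrower than the first and is centred on the athlete's own data rather than the population mean.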
In fact the parameters μ, τ and σ’s are unknown, but there is a historical database available of repeated measurements on athletes who were not abusing testosterone, denoted yj1, ..., yjnj, for j in 1 to J. The following code has been suggested for analysing the database.
for (j in 1:J) {
  for (k in 1:n[j]) {
    Y[j,k] ~ dnorm(theta[j], invsigma2[j])
  }
  theta[j] ~ dnorm(mu, invtau2)
  log(invsigma2[j]) <- -log.sigma2[j]
  log.sigma2[j] ~ dnorm(phi, invpsi2)               # (1)
}
mu ~ dunif(-100, 100)
invtau2 <- 1 / (tau*tau) ; tau ~ dunif(0, 100)      # (2)
phi ~ dunif(-100, 100)
invpsi2 <- 1 / (psi*psi) ; psi ~ dunif(0, 100)      # (3)
(g) Explain why this code may be reasonable, with special reference to the numbered lines of code.
(h) Suppose you now have three measurements available on a new athlete, and you want to check whether the third observation is ‘extreme’ relative to the first two (i.e. outside a 99.9% prediction interval). How would you adapt the code to carry out this analysis?
The assumption that the θ’s are normally distributed is questionable, and the following modification to the prior distribution has been suggested:
theta[i] ~ dnorm(mu, invtau2[i])
invtau2[i] <- lambda[i] / (4*tau*tau)
lambda[i] ~ dchisqr(4)
(i) What prior distribution will this imply for each (θi − μ)/τ?
(j) Why might this be an appropriate model?
(k) How might you assess if it was a more appropriate model for the historical data?
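One way to explore part (i) empirically is to simulate from the modified prior. The sketch below assumes the construction in the suggested code, where θi has precision λi/(4τ²) with λi drawn from a chi-squared distribution on 4 degrees of freedom, and compares the tail of (θi − μ)/τ with the tabulated 97.5% point of a Student t distribution on 4 degrees of freedom (2.776). The seed and sample size are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def draw_standardised_theta(rng, df=4):
    """One draw of (theta_i - mu) / tau under the modified prior."""
    # Chi-squared(df) variate built as a sum of df squared standard normals
    lam = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    # theta_i ~ N(mu, df * tau^2 / lam), so the standardised value is z * sqrt(df / lam)
    return rng.gauss(0.0, 1.0) * sqrt(df / lam)


rng = random.Random(1)  # fixed seed so the simulation is reproducible
draws = [draw_standardised_theta(rng) for _ in range(100_000)]

T4_975 = 2.776  # tabulated 97.5% point of Student t with 4 degrees of freedom
tail = sum(abs(x) > T4_975 for x in draws) / len(draws)
normal_tail = 2.0 * (1.0 - NormalDist().cdf(T4_975))

if __name__ == "__main__":
    print("simulated P(|x| > 2.776):", tail)
    print("same probability under N(0, 1):", normal_tail)
```

The simulated tail probability sits close to 0.05, as expected for a t distribution on 4 degrees of freedom, and is far larger than the corresponding standard normal probability, which illustrates the heavier tails relevant to part (j).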
(a) Why is x transformed?
(b) Interpret the parameters phi[i,1] and 1/(1 + phi[i,2]). Why do you think this is called a logistic curve?
(c) Do you think the prior distributions are reasonable?
The investigator decides to replace the starred lines of code in Model A by the following (Model B).
.....
  theta[i,1:3] ~ dmnorm(mu[], Omega[,])
}
for (p in 1:3) {
  mu[p] ~ dunif(-100,100)
}
Omega[1:3, 1:3] ~ dwish(R[,], 3)
R[1,1] <- 1; R[1,2] <- 0; R[1,3] <- 0;
R[2,1] <- 0; R[2,2] <- 1; R[2,3] <- 0;
R[3,1] <- 0; R[3,2] <- 0; R[3,3] <- 1
Some of the output is shown below:
             theta[,1]        theta[,2]        theta[,3]
             mean    sd       mean    sd       mean    sd
theta[1,]    5.063   0.090    0.753   0.357    1.443   0.
theta[2,]    5.392   0.059    0.679   0.252    1.601   0.
theta[3,]    5.057   0.103    0.557   0.354    1.412   0.
theta[4,]    5.427   0.053    0.673   0.234    1.715   0.
theta[5,]    5.302   0.098    0.291   0.289    1.498   0.
(d) Comment on the appropriateness of the new code.
(e) Why might it be reasonable to assume that all trees have the same values for theta[i,2] and theta[i,3]?
(f) How might you change the code to reflect this assumption (keeping theta[i,1] as a random effect)? Call this Model C.
After 50,000 iterations the following output for DICs were obtained for all three fitted models:
Dbar = post.mean of -2logL
           Dbar     pD      DIC
Model A    248.1    12.0    260.
Model B    244.2    13.9    258.
Model C    247.3    8.2     255.
(g) Comment on how the values of Dbar (the mean deviance), pD (the effective number of parameters) and DIC depend on the models.
(h) What might be considered unsatisfactory about Dbar and pD for Model A? Why might this have happened?
(i) In the final model, say very briefly how you would go about testing the normality assumption for the residual error. Would you also be testing the adequacy of the normal assumption for the random effects?
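The DIC values in the output above appear with their final digit cut off. Assuming the usual definition DIC = Dbar + pD, they can be recomputed from the two complete columns. The following sketch does so; the dictionary layout and variable names are illustrative only.

```python
# (Dbar, pD) for each fitted model, copied from the output table
fits = {
    "Model A": (248.1, 12.0),
    "Model B": (244.2, 13.9),
    "Model C": (247.3, 8.2),
}

# Assumes DIC = Dbar + pD (mean deviance plus effective number of parameters)
dic = {name: round(dbar + pd, 1) for name, (dbar, pd) in fits.items()}

# Smaller DIC indicates the preferred trade-off between fit and complexity
preferred = min(dic, key=dic.get)

if __name__ == "__main__":
    for name, value in dic.items():
        print(name, value)
    print("smallest DIC:", preferred)
```

The recomputed values agree with the leading digits shown in the table, and the differences between models are only a few units, which is worth bearing in mind when answering part (g).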