Posterior distribution, Study notes of Mathematics

Bayesian posterior distribution prior

Typology: Study notes

2020/2021

Uploaded on 02/09/2021

aaisha-g
aaisha-g 🇨🇦

1 document

1 / 17

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
MAS3301 Bayesian Statistics
Problems 5 and Solutions
Semester 2
2008-9
Problems 5
1. (Some of this question is also in Problems 4). I recorded the attendance of students at
tutorials for a module. Suppose that we can, in some sense, regard the students as a sample
from some population of students so that, for example, we can learn about the likely behaviour
of next year’s students by observing this year’s. At the time I recorded the data we had had
tutorials in Week 2 and Week 4. Let the probability that a student attends in both weeks be
θ11,the probability that a student attends in week 2 but not Week 4 be θ10 and so on. The
data are as follows.
Attendance Probability Observed frequency
Week 2 and Week 4 θ11 n11 = 25
Week 2 but not Week 4 θ10 n10 = 7
Week 4 but not Week 2 θ01 n01 = 6
Neither week θ00 n00 = 13
Suppose that the prior distribution for (θ11, θ10 , θ01, θ00) is a Dirichlet distribution with den-
sity proportional to
θ3
11θ10θ01 θ2
00.
(a) Find the prior means and prior variances of θ11, θ10, θ01 , θ00.
(b) Find the posterior distribution.
(c) Find the posterior means and posterior variances of θ11, θ10 , θ01, θ00 .
(d) Using the R function hpdbeta which may be obtained from the Web page (or other-
wise), find a 95% posterior hpd interval, based on the exact posterior distribution, for
θ00.
(e) Find an approximate 95% hpd interval for θ00 using a normal approximation based on
the posterior mode and the partial second derivatives of the log posterior density.
Compare this with the exact hpd interval.
Hint: To find the posterior mode you will need to introduce a Lagrange multiplier.
(f) The population mean number of attendances out of two is µ= 2θ11 +θ10 +θ01.Find
the posterior mean of µand an approximation to the posterior standard deviation of µ.
2. Samples are taken from twenty wagonloads of an industrial mineral and analysed. The
amounts in ppm (parts per million) of an impurity are found to be as follows.
44.3 50.2 51.7 49.4 50.6 55.0 53.5 48.6 48.8 53.3
59.4 51.4 52.0 51.9 51.6 48.3 49.3 54.1 52.4 53.1
We regard these as independent samples from a normal distribution with mean µand variance
σ2=τ1.
Find a 95% posterior hpd interval for µunder each of the following two conditions.
1
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff

Partial preview of the text

Download Posterior distribution and more Study notes Mathematics in PDF only on Docsity!

MAS3301 Bayesian Statistics

Problems 5 and Solutions

Semester 2

Problems 5

  1. (Some of this question is also in Problems 4). I recorded the attendance of students at tutorials for a module. Suppose that we can, in some sense, regard the students as a sample from some population of students so that, for example, we can learn about the likely behaviour of next year’s students by observing this year’s. At the time I recorded the data we had had tutorials in Week 2 and Week 4. Let the probability that a student attends in both weeks be θ 11 , the probability that a student attends in week 2 but not Week 4 be θ 10 and so on. The data are as follows.

Attendance Probability Observed frequency Week 2 and Week 4 θ 11 n 11 = 25 Week 2 but not Week 4 θ 10 n 10 = 7 Week 4 but not Week 2 θ 01 n 01 = 6 Neither week θ 00 n 00 = 13

Suppose that the prior distribution for (θ 11 , θ 10 , θ 01 , θ 00 ) is a Dirichlet distribution with den- sity proportional to θ^311 θ 10 θ 01 θ 002.

(a) Find the prior means and prior variances of θ 11 , θ 10 , θ 01 , θ 00. (b) Find the posterior distribution. (c) Find the posterior means and posterior variances of θ 11 , θ 10 , θ 01 , θ 00. (d) Using the R function hpdbeta which may be obtained from the Web page (or other- wise), find a 95% posterior hpd interval, based on the exact posterior distribution, for θ 00. (e) Find an approximate 95% hpd interval for θ 00 using a normal approximation based on the posterior mode and the partial second derivatives of the log posterior density. Compare this with the exact hpd interval. Hint: To find the posterior mode you will need to introduce a Lagrange multiplier. (f) The population mean number of attendances out of two is μ = 2θ 11 + θ 10 + θ 01. Find the posterior mean of μ and an approximation to the posterior standard deviation of μ.

  1. Samples are taken from twenty wagonloads of an industrial mineral and analysed. The amounts in ppm (parts per million) of an impurity are found to be as follows.

44.3 50.2 51.7 49.4 50.6 55.0 53.5 48.6 48.8 53. 59.4 51.4 52.0 51.9 51.6 48.3 49.3 54.1 52.4 53.

We regard these as independent samples from a normal distribution with mean μ and variance σ^2 = τ −^1. Find a 95% posterior hpd interval for μ under each of the following two conditions.

(a) The value of τ is known to be 0.1 and our prior distribution for μ is normal with mean 60.0 and standard deviation 20.0. (b) The value of τ is unknown. Our prior distribution for τ is a gamma distribution with mean 0.1 and standard deviation 0.05. Our conditional prior distribution for μ given τ is normal with mean 60.0 and precision 0. 025 τ (that is, standard deviation

40 τ −^1 /^2 ).

  1. We observe a sample of 30 observations from a normal distribution with mean μ and precision τ. The data, y 1 ,... , y 30 , are such that

∑^30

i=

yi = 672 and

∑^30

i=

y i^2 = 16193.

(a) Suppose that the value of τ is known to be 0.04 and that our prior distribution for μ is normal with mean 20 and variance 100. Find the posterior distribution of μ and evaluate a posterior 95% hpd interval for μ. (b) Suppose that we have a gamma(1, 10) prior distribution for τ and our conditional prior distribution for μ given τ is normal with mean 20 and variance (0. 1 τ )−^1. Find the marginal posterior distribution for τ, the marginal posterior distribution for μ and the marginal posterior 95% hpd interval for μ.

  1. The following data come from the experiment reported by MacGregor et al. (1979). They give the supine systolic blood pressures (mm Hg) for fifteen patients with moderate essential hypertension. The measurements were taken immediately before and two hours after taking a drug.

Patient 1 2 3 4 5 6 7 8 Before 210 169 187 160 167 176 185 206 After 201 165 166 157 147 145 168 180 Patient 9 10 11 12 13 14 15 Before 173 146 174 201 198 148 154 After 147 136 151 168 179 129 131

We are interested in the effect of the drug on blood pressure. We assume that, given pa- rameters μ, τ, the changes in blood pressure, from before to after, in the n patients are independent and normally distributed with unknown mean μ and unknown precision τ. The fifteen differences are as follows.

Our prior distribution for τ is a gamma(0. 35 , 1 .01) distribution. Our conditional prior dis- tribution for μ given τ is a normal N (0, [0. 003 τ ]−^1 ) distribution.

(a) Find the marginal posterior distribution of τ. (b) Find the marginal posterior distribution of μ. (c) Find the marginal posterior 95% hpd interval for μ. (d) Comment on what you can conclude about the effect of the drug.

  1. The lifetimes of certain components are supposed to follow a Weibull distribution with known shape parameter α = 2. The probability density function of the lifetime distribution is

f (t) = αρ^2 t exp[−(ρt)^2 ]

for 0 < t < ∞. We will observe a sample of n such lifetimes where n is large.

  1. A random sample of n = 1000 people was chosen from a large population. Each person was asked whether they approved of a proposed new law. The number answering “Yes” was x = 372. (For the purpose of this exercise all other responses and non-responses are teated as simply “Not Yes”). Assume that x is an observation from the binomial(n, p) distribution where p is the unknown proportion of people in the population who would answer “Yes.” Our prior distribution for p is a uniform distribution on (0, 1). Let p = Φ(θ) so θ = Φ−^1 (p) where Φ(y) is the standard normal distribution function and Φ−^1 (z) is its inverse.

(a) Find the maximum likelihood estimate of p and hence find the maximum likelihood estimate of θ. (b) Disregarding the prior distribution, find a large-sample approximation to the posterior distribution of θ. (c) Using your approximate posterior distribution for θ, find an approximate 95% hpd interval for θ. (d) Use the exact posterior distribution for p to find the actual posterior probability that θ is inside your approximate hpd interval.

Notes: • The standard normal distribution function Φ(x) =

∫ (^) x −∞ φ(u)^ du^ where^ φ(u) = (2π)−^1 /^2 exp{−u^2 / 2 }.

  • Let l be the log-likelihood. Then

dl dθ

dl dp

dp dθ and

d^2 l dθ^2

d dθ

dl dp

dp dθ

d dθ

dl dp

dp dθ

dl dp

d^2 p dθ^2

=

d^2 l dp^2

dp dθ

dl dp

d^2 p dθ^2

  • Derivatives of p : dp dθ

= φ(θ)

d^2 p dθ^2

= −θφ(θ)

  • You can evaluate Φ−^1 (u) using R with qnorm(u,0,1) and φ(u) is given by dnorm(u,0,1)
  1. The amounts of rice, by weight, in 20 nominally 500g packets are determined. The weights, in g, are as follows.

496 506 495 491 488 492 482 495 493 496 487 490 493 495 492 498 491 493 495 489

Assume that, given the values of parameters μ, τ, the weights are independent and each has a normal N (μ, τ ) distribution. The values of μ and τ are unknown. Our prior distribution is as follows. We have a gamma(2, 9) prior distribution for τ and a N (500, (0. 005 τ )−^1 ) conditional prior distribution for μ given τ.

(a) Find the posterior probability that μ < 495. (b) Find the posterior predictive probability that a new packet of rice will contain less than 500g of rice

  1. A machine which is used in a manufacturing process jams from time to time. It is thought that the frequency of jams might change over time as the machine becomes older. Once every three months the number of jams in a day is counted. The results are as follows.

Observation i 1 2 3 4 5 6 7 8 Age of machine ti (months) 3 6 9 12 15 18 21 24 Number of jams yi 10 13 24 17 20 22 20 23

Our model is as follows. Given the values of two parameters α, β, the number of jams yi on a dat when the machine has age ti months has a Poisson distribution

yi ∼ Poisson(λi)

where loge(λi) = α + βti. Assume that the effect of our prior distribution on the posterior distribution is negligible and that large-sample approximations may be used.

(a) Let the values of α and β which maximise the likelihood be ˆα and β.ˆ Assuming that the likelihood is differentiable at its maximum, show that these satisfy the following two equations

∑^8

i=

(λˆi − yi) = 0

∑^8

i=

ti(λˆi − yi) = 0

where loge(ˆλi) = ˆα + βtˆi and show that these equations are satisfied (to a good approximation) by

αˆ = 2. 552 and βˆ = 0. 02638.

(You may use R to help with the calculations, but show your commands). You may assume from now on that these values maximise the likelihood. (b) Find an approximate symmetric 95% posterior interval for α + 24β. (c) Find an approximate symmetric 95% posterior interval for exp(α + 24β), the mean jam-rate per day at age 24 months.

(You may use R to help with the calculations, but show your commands).

Homework 5

Solutions to Questions 9 and 10 of Problems 5 are to be submitted in the Homework Letterbox no later than 4.00pm on Tuesday May 5th.

Reference

MacGregor, G.A., Markandu, N.D., Roulston, J.E. and Jones, J.C., 1979. Essential hypertension: effect of an oral inhibitor of angiotensin-converting enzyme. British Medical Journal, 2 , 1106-1109.

Posterior means:

θ 11 :

θ 10 :

θ 01 :

θ 00 :

Posterior variances:

θ 11 :

63 × 62

622 × 63

θ 10 :

63 × 62

622 × 63

θ 01 :

63 × 62

622 × 63

θ 00 :

63 × 62

622 × 63

(d) Posterior distribution for θ 00 is beta(16, 62 − 16). That is beta(16,46).

Using the R command hpdbeta(0.95,16,46) gives 0. 15325 < θ 00 < 0. 36724. (e) The log posterior density is (apart from a constant)

∑^4

j=

(a 1 ,j − 1) log θj.

Add λ(

∑j j=1 θj^ −^ 1) to this and differentiate wrt^ θj^ then set the derivative equal to zero. This gives

a 1 ,j − 1 θˆj

  • λ = 0

which leads to θˆj = − (a^1 ,j^ −^ 1) λ

However

j=1 θj^ = 1 so

∑^4

j=

(a 1 ,j − 1) λ

so

λ = −

∑^4

j=

(a 1 ,j − 1)

and θˆj = ∑a^1 ,j^ −^1 a 1 ,k − 4

Hence the posterior mode for θ 00 is

θˆ 00 =^15 58

The second derivatives of the log likelihood are ∂^2 l ∂θ^2 j

a 1 ,j − 1 θ^2 j

and

∂^2 l ∂θj ∂θk

Since the mixed partial second derivatives are zero, the information matrix is diagonal and the posterior variance of θj is approximately

θˆ^2 j a 1 ,j − 1

(a 1 ,j − 1)^2 (a 1 ,j − 1)(

a 1 ,k − 4)^2

(ai,j − 1 (

a 1 ,k − 4)^2

The posterior variance of θ 00 is approximately 15 582

The approximate 95% hpd interval is 0. 2586 ± 1. 96

    1. that is
  1. 12772 < θ 00 < 0. 38948. This is a little wider than the exact interval.

(f) Approximation based on posterior mode and curvature: Posterior modes: θ 11 :

θ 10 :

θ 01 :

So, approx. posterior mean of μ is

2 ×

Approx. posterior variances:

θ 11 :

θ 10 :

θ 01 :

Since the (approx.) covariances are all zero, the approx. posterior variance of μ is

4 ×

so approx. standard deviation is √ 0 .0377527 = 0. 1943.

N.B. There is an alternative exact calculation, as follows, which is also acceptable. Posterior mean:

2 ×

Posterior covariances:

− a 1 ,j a 1 ,k A^21 (A 1 + 1)

var(μ) = 4var(θ 11 ) + var(θ 10 ) + var(θ 01 ) +4covar(θ 11 , θ 10 ) + 4covar(θ 11 , θ 01 ) + 2covar(θ 10 , θ 01 )

= 4

63 × 62

63 × 622

63 × 62

63 × 622

63 × 62

63 × 622

29 × 9

63 × 622

29 × 8

63 × 622

9 × 8

63 × 622

63 × 62

63 × 622

So the standard deviation is 0. 1040. The difference is quite big!

That is M 1 ± 2. 048 × 0. 68591 That is

  1. 051 < μ < 52. 860

  2. (a) We have

P 0 = 0. 01 Pd = nτ = 30 × 0 .04 = 1. 2 P 1 = 0 .01 + 1.2 = 1. 21 M 0 = 20 y¯ = 22. 4

M 1 =

P 0 M 0 + Pd ¯y P 1

0. 01 × 20 + 1. 2 × 22. 4

Posterior:

μ ∼ N (22. 380 , 1. 21 −^1 ). That is μ ∼ N (22. 380 , 0 .8264).

95% hpd interval: 22. 380 ± 1. 96

    1. That is
  1. 60 < μ < 24. 16

(b) We have

d 0 = 2 d 1 = d 0 + n = 32 v 0 = 10 c 0 = 0. 1 c 1 = c 0 + n = 30. 1 m 0 = 20 ¯y = 22. 4

m 1 =

c 0 m 0 + ny¯ c 0 + n

0. 1 × 20 + 30 × 22. 4

s^2 n =

n

∑^ n

i=

y i^2 −

n

( (^) n ∑

i=

yi

(¯y − m 0 )^2 = (22. 4 − 20)^2 = 5. 76 r^2 = (¯y − m 0 )^2 + s^2 n = 43. 7667

vd =

c 0 r^2 + ns^2 n c 0 + n

v 1 =

d 0 v 0 + nvd d 0 + n

Marginal posterior distribution for τ : d 1 v 1 τ = 11670. 77 τ ∼ χ^232. Marginal posterior distribution for μ:

μ − m 1 √ v 1 /c 1

μ − 22. 392 √

  1. 27419 / 30. 1

∼ t 32

95% hpd interval: m 1 ± 2. 037

v 1 c 1

That is

  1. 392 ± 2. 037

That is

  1. 16 < μ < 24. 63.

  2. Data: n = 15

y = − 284

y^2 = 6518

¯y = − 18. 9333 s^2 n =

Calculate posterior:

d 0 = 0. 7 v 0 = 2. 02 / 0 .7 = 2. 8857 c 0 = 0. 003 m 0 = 0 d 1 = d 0 + 15 = 15. 7 (¯y − m 0 )^2 = ¯y^2 = 358. 4711 r^2 = (¯y − m 0 )^2 + s^2 n = 434. 5333

vd =

c 0 r^2 + ns^2 n c 0 + n

v 1 =

d 0 v 0 + nvd d 0 + n

c 1 = c 0 + 15

m 1 =

c 0 m 0 + ny¯ c 0 + n

(a) Marginal posterior distribution for τ is gamma(d 1 / 2 , d 1 v 1 /2). That is

gamma(7. 85 , 572 .014).

(Alternatively d 1 v 1 τ ∼ χ^2 d 1. That is 1144. 025 τ ∼ χ^215. 7 ). (b) Marginal posterior for μ:

μ − m 1 √ v 1 /c 1

μ − 18. 9295 √

  1. 8569

∼ t 15. 7

(c) We can use R for the critical points of t 15. 7 : ± 2. 1232

qt(0.975,15.7)

95% interval: − 18. 9295 ± 2. 1232

    1. That is

− 23. 61 < μ < − 14. 25.

(d) Comment, e.g., since zero is well outside the 95% interval it seems clear that the drug reduces the blood pressure.

This quadratic equation has two solutions but one is negative and ρ must be positive so

ρˆ =

t^2 i )(2n + 1) 4

t^2 i

The second derivative is ∂^2 g ∂ρ^2

2 n + 1 ρ^2

∑^ n

i=

t^2 i

so the posterior variance is approximately

1 2[(n + 1/2)/ρˆ^2 +

t^2 i ]

= 7. 900519 × 10 −^8.

Our 95% hpd interval is therefore 0. 0097410 ± 1. 96

  1. 900519 × 10 −^8. That is

  2. 009190 < ρ < 0. 010292.

So the prior makes no noticeable difference in this case.

  1. (a) λ ∼ gamma(5, 1) so 2λ ∼ gamma(5, 1 /2), i.e. gamma(10/ 2 , 1 /2), i.e. χ^210.

From tables, 95% interval, 3. 247 < 2 λ < 20. 48. That is

  1. 6235 < λ < 10. 24

(b) Prior density prop. to λ^5 −^1 e−λ. Likelihood

L =

∏^45

i=

e−λλxi xi!

e−^45 λλ

xi ∏ xi!

∝ e−^45 λλ^182.

Posterior density prop. to λ^187 −^1 e−^46 λ. This is a

gamma(187, 46)

distribution. (c) Posterior mean: 147 46

Posterior variance:

187 462

Posterior sd: √ 187 462

95% interval 4. 0652 ± 1. 96 × 0. 29728. That is

  1. 4826 < λ < 4. 6479

(d) Joint prob. of λ, X = m:

46187 Γ(187)

λ^187 −^1 e−^46 λ^

λme−λ m!

Γ(187 + m) 47 187+m

m!

47 187+m Γ(187 + m)

λ187+m−^1 e−^47 λ

Integrate out λ:

Pr(X = m) =

47 187+m

Γ(187 + m) Γ(187)m!

=

(186 + m)! 186!m!

)m

186 + m m

)m

(e) Joint probability (density) of θ, X = m:

  1. 05 e−^0.^05 θ^

θme−θ m!

m!

Γ(1 + m)

  1. 05 m+

  2. 05 m+ Γ(1 + m)

θm+1−^1 e−^1.^05 θ

Integrate out θ :

Pr(X = m) =

  1. 05 m+

Γ(1 + m) m!

)m

Log posterior probs: “Ordinary”:

log(P 1 ) = log[Γ(187 + 10)] − log[Γ(187)] − log[Γ(11)] + 187 log(46/47) + 10 log(1/47) = log[Γ(197)] − log[Γ(187)] − log(Γ(11)] + 187 log(46) − 197 log(47) = − 5. 079796

lgamma(197) - lgamma(187) - lgamma(11) + 187log(46) - 197log(47) [1] -5.

“Type 2”:

log(P 2 ) = log(0. 05 / 1 .05) + 10 log(1/ 1 .05) = log(0.05) − 11 log(1.05) = − 3. 532424

Hence the predictive probabilities are as follows. “Ordinary”: P 1 = exp(− 5 .079796) = 0. 006221178 “Type 2”: P 2 = exp(− 3 .532424) = 0. 02923396 Hence the posterior probability that this is an ordinary customer is

9 × 0. 006221178 9 × 0 .006221178 + 1 × 0. 02923396

  1. (a) Likelihood:

L =

∏^8

i=

e−λi^ λy ii yi!

Log likelihood:

l = −

λi +

yi log λi −

log(yi!)

= −

λi +

yi(α + βti) −

log(yi!)

Derivatives:

∂λi ∂α

∂α

eα+βti^ = λi ∂λi ∂β

∂β

eα+βti^ = λiti

∂l ∂α

∑ (^) ∂λi ∂α

yi = −

λi +

yi = −

(λi − yi) ∂l ∂β

∑ (^) ∂λi ∂β

yiti = −

λiti +

yiti = −

ti(λi − yi)

At the maximum

∂l ∂α

∂l ∂β

Hence ˆα and βˆ satisfy the given equations. Calculations in R:

y<-c(10,13,24,17,20,22,20,23) t<-seq(3,24,3) lambda<-exp(2.552+0.02638t) sum(lambda-y) [1] 0. sum(t(lambda-y)) [1] -0.

These values seem close to zero but let us try a small change to the parameter values:

lambda<-exp(2.55+0.0264t) sum(lambda-y) [1] -0. sum(t(lambda-y)) [1] -3.

The results are now much further than zero suggesting that the given values are very close to the solutions.

(5 marks)

(b) Second derivatives:

∂^2 l ∂α^2

∑ (^) ∂λi ∂α

λi

∂^2 l ∂β^2

ti

∂λi ∂α

t^2 i λi

∂^2 l ∂α∂β

∑ (^) ∂λi ∂β

tiλi

Variance matrix:

V = −

∂^2 l ∂α^2

∂^2 l ∂α∂β ∂^2 l ∂α∂β

∂^2 l ∂β^2

Numerically using R:

lambda<-exp(2.552+0.02638t) d2<-matrix(nrow=2,ncol=2) d2[1,1]<- -sum(lambda) d2[1,2]<- - sum(tlambda) d2[2,1]<- - sum(tlambda) d2[2,2]<- - sum((t^2)lambda) V<- - solve(d2) V [,1] [,2] [1,] 0.038194535 -0. [2,] -0.002136181 0. The mean of α + 24β is 2.552 + 24 × 0 .02638 = 3. 18512. The variance is 0.038194535 + 24^2 ∗ 0 .0001449430 + 2 × 1 × 24 × (− 0 .0021361807) =

Alternative matrix-based calculation in R:

dim(m)<-c(1,2) v<-m%%V%%t(m) v [,1] [1,] 0.

The approximate 95% interval is

That is

  1. 9139 < α + 24β < 3. 4563

(5 marks)

(c) The interval for λ 24 is e^2.^9139 < eα+24β^ < e^3.^4563.

That is

  1. 429 < λ 24 < 31. 700.

(2 marks)