Solutions to extra problems from a university course on moment generating functions and densities. Topics covered include the lognormal and normal distributions, and the negative binomial probability mass function. The solutions involve calculating moments and identifying the required scaled exponential family form for certain densities.
\[
m(t) = \int e^{ty} f(y \mid \xi, \sigma)\, dy
= \int \exp\left\{ \frac{y(\xi + t\sigma^2) - b(\xi)}{\sigma^2} + c(y, \sigma^2) \right\} dy
\]
\[
= \exp\left\{ \frac{b(\xi + t\sigma^2) - b(\xi)}{\sigma^2} \right\}
\int \exp\left\{ \frac{y(\xi + t\sigma^2) - b(\xi + t\sigma^2)}{\sigma^2} + c(y, \sigma^2) \right\} dy
= \exp\left\{ \frac{b(\xi + t\sigma^2) - b(\xi)}{\sigma^2} \right\},
\]
where the result follows because the integral in the second-to-last line equals 1: it is the integral of a scaled exponential family density with natural parameter \(\xi + t\sigma^2\). Thus, by the chain rule,
\[
m_t(t) = \frac{d\,m(t)}{dt} = b_\xi(\xi + t\sigma^2)\, m(t), \qquad
m_{tt}(t) = \frac{d^2 m(t)}{dt^2} = b_\xi(\xi + t\sigma^2)\, m_t(t) + \sigma^2 b_{\xi\xi}(\xi + t\sigma^2)\, m(t).
\]
It follows that
\[
E(Y) = m_t(0) = b_\xi(\xi), \qquad
E(Y^2) = m_{tt}(0) = b_\xi^2(\xi) + \sigma^2 b_{\xi\xi}(\xi),
\]
so that \(\mathrm{var}(Y) = E(Y^2) - \{E(Y)\}^2 = \sigma^2 b_{\xi\xi}(\xi)\).
(b) From (a), we also have
\[
m_{ttt}(t) = \sigma^4 b_{\xi\xi\xi}(\xi + t\sigma^2)\, m(t) + 2\sigma^2 b_{\xi\xi}(\xi + t\sigma^2)\, m_t(t) + b_\xi(\xi + t\sigma^2)\, m_{tt}(t)
\]
and
\[
m_{tttt}(t) = \sigma^6 b_{\xi\xi\xi\xi}(\xi + t\sigma^2)\, m(t) + 3\sigma^4 b_{\xi\xi\xi}(\xi + t\sigma^2)\, m_t(t) + 3\sigma^2 b_{\xi\xi}(\xi + t\sigma^2)\, m_{tt}(t) + b_\xi(\xi + t\sigma^2)\, m_{ttt}(t).
\]
These may be used to calculate \(E\{(Y-\mu)^3\}\) and \(E\{(Y-\mu)^4\}\). We have
\[
E\{(Y-\mu)^3\} = E[\{Y - b_\xi(\xi)\}^3] = E(Y^3) - 3 b_\xi(\xi) E(Y^2) + 2 b_\xi^3(\xi).
\]
Upon substitution of the expressions for the moments of \(Y\), we obtain
\[
E\{(Y-\mu)^3\} = \sigma^4 b_{\xi\xi\xi}(\xi)\, m(0) + 2\sigma^2 b_{\xi\xi}(\xi)\, m_t(0) + b_\xi(\xi)\, m_{tt}(0) - 3 b_\xi^3(\xi) - 3\sigma^2 b_\xi(\xi) b_{\xi\xi}(\xi) + 2 b_\xi^3(\xi),
\]
which simplifies to \(\sigma^4 b_{\xi\xi\xi}(\xi)\). By an entirely similar calculation, we have
\[
E\{(Y-\mu)^4\} = E(Y^4) - 4 E(Y^3) b_\xi(\xi) + 6 b_\xi^2(\xi) E(Y^2) - 4 b_\xi^3(\xi) E(Y) + b_\xi^4(\xi).
\]
Upon substitution, this becomes
\[
E\{(Y-\mu)^4\} = \sigma^6 b_{\xi\xi\xi\xi}(\xi) + 3\sigma^4 b_{\xi\xi\xi}(\xi)\, m_t(0) + 3\sigma^2 b_{\xi\xi}(\xi)\, m_{tt}(0) - 3 b_\xi(\xi)\, m_{ttt}(0) + 6 b_\xi^2(\xi)\, m_{tt}(0) - 3 b_\xi^4(\xi)
= \sigma^6 b_{\xi\xi\xi\xi}(\xi) + 3\sigma^4 \{b_{\xi\xi}(\xi)\}^2.
\]
Thus, \(m_3 = \sigma^4 b_{\xi\xi\xi}(\xi)\) and \(m_4 = \sigma^6 b_{\xi\xi\xi\xi}(\xi) + 3\sigma^4 \{b_{\xi\xi}(\xi)\}^2\), and, using \(m_2 = \sigma^2 b_{\xi\xi}(\xi)\), we obtain
\[
\zeta = m_3/m_2^{3/2} = \sigma\, b_{\xi\xi\xi}(\xi)/\{b_{\xi\xi}(\xi)\}^{3/2},
\]
\[
\kappa = m_4/m_2^2 - 3 = \sigma^2 b_{\xi\xi\xi\xi}(\xi)/\{b_{\xi\xi}(\xi)\}^2 + 3 - 3 = \sigma^2 b_{\xi\xi\xi\xi}(\xi)/\{b_{\xi\xi}(\xi)\}^2.
\]
Later in the course, we will find it interesting that ζ is of order σ and κ is of order σ^2.
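As a concrete sanity check of these formulas, consider the Poisson case, where \(b(\xi) = e^\xi\) and \(\sigma^2 = 1\): every derivative of \(b\) equals \(\lambda = e^\xi\), so the results above predict \(E(Y) = \mathrm{var}(Y) = \lambda\), \(\zeta = \lambda^{-1/2}\), and \(\kappa = \lambda^{-1}\). A minimal Python sketch comparing these predictions against exact pmf-based moments (the value \(\lambda = 4\) is an arbitrary illustrative choice):

```python
import math

# Sanity check of the moment formulas in the Poisson case: b(xi) = exp(xi)
# and sigma^2 = 1, so every derivative of b equals lam = exp(xi), and the
# derivation predicts E(Y) = var(Y) = lam, zeta = lam**-0.5, kappa = 1/lam.
lam = 4.0

# Exact Poisson probabilities P(Y = k) for k = 0, ..., 159, built
# recursively to avoid huge factorials; the truncated tail is negligible.
probs = []
p = math.exp(-lam)
for k in range(160):
    probs.append(p)
    p *= lam / (k + 1)

mean = sum(k * pk for k, pk in enumerate(probs))
m2 = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
m3 = sum((k - mean) ** 3 * pk for k, pk in enumerate(probs))
m4 = sum((k - mean) ** 4 * pk for k, pk in enumerate(probs))

zeta = m3 / m2 ** 1.5       # skewness; predicted lam**-0.5 = 0.5
kappa = m4 / m2 ** 2 - 3.0  # excess kurtosis; predicted 1/lam = 0.25
```

The computed skewness and excess kurtosis agree with the familiar Poisson values \(\lambda^{-1/2}\) and \(\lambda^{-1}\), consistent with \(\zeta\) being of order \(\sigma\) and \(\kappa\) of order \(\sigma^2\).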
(a) For the lognormal, the density of \(Y\) is
\[
f(y) = \frac{1}{y\sqrt{2\pi \log(1+\sigma^2)}} \exp\left[ -\frac{\{\log y - \log\mu + \log(\sigma^2+1)/2\}^2}{2\log(\sigma^2+1)} \right]
= \exp\left\{ -\frac{(\log y - \log\mu + \gamma^2/2)^2}{2\gamma^2} - \log y - \log(2\pi)^{1/2} - \log\gamma \right\},
\]
letting \(\gamma^2 = \log(\sigma^2+1)\). Because the exponent is quadratic in \(\log y\), it is clear that we cannot isolate a term linear in \(y\); hence, this density cannot be put in the required scaled exponential family form. (b) For the normal, the density of \(Y\) is
\[
f(y) = \frac{1}{\sqrt{2\pi}\,\sigma\mu} \exp\left\{ -\frac{(y-\mu)^2}{2\sigma^2\mu^2} \right\}
= \exp\left\{ -\frac{y^2 - 2y\mu + \mu^2}{2\sigma^2\mu^2} - \log\sigma - \log(2\pi)^{1/2} - \log\mu \right\}
\]
\[
= \exp\left\{ \frac{y/\mu - 1/2}{\sigma^2} - \frac{y^2}{2\sigma^2\mu^2} - \log\mu - \log\sigma - \log(2\pi)^{1/2} \right\}.
\]
Clearly, we cannot write the first term in the argument of the exponential in the form \(y\xi - b(\xi)\); moreover, the second term links a quadratic function of \(y\) with \(\mu\). Hence, this density cannot be put in the required scaled exponential family form. The normal with constant variance is a member of the scaled exponential family class, but the normal with variance depending on the mean is not, in general. Here, the dependence of the variance on \(\mu\) introduces a term in which \(\mu\) is linked to a quadratic function of \(y\), violating the defining property of the scaled exponential family that the mean (through \(\xi\)) and \(y\) are linked only linearly. This linearity is what makes maximum likelihood for this family take the form of a GLS-type estimating equation.
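To make the contrast concrete: with constant variance, the \(N(\mu, \sigma^2)\) log-density does decompose as \(\{y\xi - b(\xi)\}/\sigma^2 + c(y, \sigma^2)\) with \(\xi = \mu\) and \(b(\xi) = \xi^2/2\). A small Python sketch verifying that the remainder after stripping \((y\mu - \mu^2/2)/\sigma^2\) is free of \(\mu\) (the numerical values are arbitrary illustrative choices):

```python
import math

# Claim: for constant variance, the N(mu, sigma2) log-density equals
# {y*xi - b(xi)}/sigma2 + c(y, sigma2) with xi = mu, b(xi) = xi**2/2,
# so stripping (y*mu - mu**2/2)/sigma2 must leave a remainder free of mu.

def log_density(y, mu, sigma2):
    # N(mu, sigma2) log-density
    return -(y - mu) ** 2 / (2 * sigma2) - 0.5 * math.log(2 * math.pi * sigma2)

def remainder(y, mu, sigma2):
    # should equal c(y, sigma2) = -y**2/(2*sigma2) - log(2*pi*sigma2)/2,
    # which does not involve mu
    return log_density(y, mu, sigma2) - (y * mu - mu ** 2 / 2) / sigma2

# evaluate the remainder at fixed y, sigma2 across several values of mu
vals = [remainder(1.3, mu, 2.0) for mu in (-1.0, 0.0, 2.5, 10.0)]
spread = max(vals) - min(vals)   # ~ 0: remainder does not depend on mu
```

Repeating the same exercise with the \(N(\mu, \sigma^2\mu^2)\) log-density leaves a remainder \(-y^2/(2\sigma^2\mu^2)\) that still depends on \(\mu\), which is exactly the obstruction identified above.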
(a) We have
\[
f(y) = \int_0^\infty \frac{m^y e^{-m}}{y!} \left(\frac{\theta}{\mu}\right)^{\theta} \frac{m^{\theta-1} e^{-m\theta/\mu}}{\Gamma(\theta)}\, dm
= \frac{\Gamma(\theta+y)}{\Gamma(\theta)\, y!} \left(\frac{\mu}{\theta+\mu}\right)^{\theta+y} \left(\frac{\theta}{\mu}\right)^{\theta}
\int_0^\infty \frac{m^{\theta+y-1} e^{-m(\theta+\mu)/\mu}}{\Gamma(\theta+y)\{\mu/(\theta+\mu)\}^{\theta+y}}\, dm
\]
\[
= \frac{\Gamma(\theta+y)}{\Gamma(\theta)\, y!}\, \frac{(\theta/\mu)^{\theta}}{(1+\theta/\mu)^{\theta+y}}, \qquad y = 0, 1, \ldots.
\]
This is the negative binomial probability mass function. (b) It is straightforward to show that this density is a member of the scaled exponential family for \(\theta\) known. The algebra is not shown here, but we may identify \(\xi = \log\{\mu/(\mu+\theta)\}\), \(b(\xi) = -\theta \log\{1 - \mu/(\mu+\theta)\} = -\theta \log(1 - e^{\xi})\), and \(c(y, \sigma^2) = \log\{\Gamma(\theta+y)/\Gamma(\theta)\} - \log y!\). Of course, if \(\theta\) is not known, it is straightforward to demonstrate that we may not "isolate" \(y\) multiplied by a function that does not depend on \(\theta\); in fact, \(y\) and \(\theta\) appear together in the complicated function \(\log\{\Gamma(\theta+y)/\Gamma(\theta)\}\). Thus, we have immediately that
\[
E(Y) = b_\xi(\xi) = \frac{\theta e^{\xi}}{1 - e^{\xi}} = \mu
\]
and
\[
\mathrm{var}(Y) = b_{\xi\xi}(\xi) = \frac{\theta e^{2\xi}}{(1-e^{\xi})^2} + \frac{\theta e^{\xi}}{1-e^{\xi}} = \mu^2/\theta + \mu.
\]
(c) In (b), we showed that \(E(Y) = \mu\) and \(\mathrm{var}(Y) = \mu + \mu^2/\theta\), where \(y = 0, 1, 2, \ldots\). The data for which this is an appropriate model are thus count data; however, the mean-variance relationship is not that of a Poisson distribution. Rather, the variance is that of a Poisson plus an additional positive term. This suggests that the model may be appropriate when we have count data whose variance increases more sharply with the mean than a Poisson distribution would dictate. In fact, the negative binomial distribution is often used as a model for overdispersed count data.
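The mean and variance derived in (b) can be checked numerically against the pmf derived in (a). A Python sketch, using log-gamma for numerical stability (the parameter values \(\theta = 3\), \(\mu = 5\) are arbitrary illustrative choices):

```python
import math

# Numerical check of the negative binomial mean and variance, using the
# pmf in the form Gamma(theta+y)/{Gamma(theta) y!} (theta/mu)^theta
# / (1 + theta/mu)^(theta+y); lgamma avoids overflow for large y.
theta, mu = 3.0, 5.0

def log_pmf(y):
    return (math.lgamma(theta + y) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / mu)
            - (theta + y) * math.log(1 + theta / mu))

ys = range(400)                      # tail beyond 400 is negligible here
p = [math.exp(log_pmf(y)) for y in ys]
total = sum(p)                                          # ~ 1
mean = sum(y * py for y, py in zip(ys, p))              # ~ mu = 5
var = sum((y - mean) ** 2 * py for y, py in zip(ys, p)) # ~ mu + mu^2/theta
```

The computed variance matches \(\mu + \mu^2/\theta = 5 + 25/3\), confirming the overdispersion relative to the Poisson mean-variance relationship.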
\[
P(Y = y) = \int_0^1 P(Y = y \mid P = p)\, f(p)\, dp,
\]
where f (p) is the beta density. Thus,
\[
P(Y = y) = \int_0^1 \binom{m}{y} p^{y} (1-p)^{m-y}\, \frac{p^{a-1}(1-p)^{b-1}}{B(a,b)}\, dp
= \binom{m}{y} \{B(a,b)\}^{-1}\, \frac{\Gamma(a+y)\Gamma(m-y+b)}{\Gamma(a+y+m-y+b)}
\int_0^1 \frac{p^{a+y-1}(1-p)^{m-y+b-1}}{B(a+y,\, m-y+b)}\, dp
\]
\[
= \frac{\Gamma(m+1)\Gamma(a+y)\Gamma(m-y+b)\Gamma(a+b)}{\Gamma(y+1)\Gamma(m-y+1)\Gamma(m+a+b)\Gamma(a)\Gamma(b)}, \qquad y = 0, 1, \ldots, m.
\]
(b) Rather than work with this mess directly, it is easier to use a conditioning argument. If \(P\) has a beta distribution, then it is straightforward to show that \(E(P) = a/(a+b) = \pi\), say, and
\[
\mathrm{var}(P) = \frac{ab}{(a+b+1)(a+b)^2},
\]
so that \(\mathrm{var}(P) = (a+b+1)^{-1}\pi(1-\pi) = \tau\pi(1-\pi)\), say, where \(\tau = (a+b+1)^{-1}\). Now \(E(Y) = E\{E(Y \mid P)\} = E(mP) = m\pi\), and
\[
\mathrm{var}(Y) = E\{\mathrm{var}(Y \mid P)\} + \mathrm{var}\{E(Y \mid P)\} = E\{mP(1-P)\} + \mathrm{var}(mP)
= mE(P) - mE(P^2) + m^2\,\mathrm{var}(P)
\]
\[
= m(m-1)\,\mathrm{var}(P) + mE(P)\{1 - E(P)\}
= m(m-1)\tau\pi(1-\pi) + m\pi(1-\pi) = m\pi(1-\pi)\{1 + \tau(m-1)\}.
\]
Thus, \(E(Y) = m\pi\) and
\[
\mathrm{var}(Y) = m\pi \left(\frac{m - m\pi}{m}\right)\{1 + \tau(m-1)\}.
\]
(c) Note that, when τ = 0, this mean-variance specification is just that of the binomial distribution with parameters m and π. Thus, as τ > 0 here (because a, b > 0), we have a mean-variance model for data taking the integer values 0, 1 ,... , m like a binomial, but
with mean-variance different from the binomial. In particular, this mean-variance model has variance that is like that of a binomial times a positive, multiplicative factor 1 + τ (m − 1) that serves to inflate the variance. Thus, this model would be useful in a situation where we would ordinarily expect Y to follow a binomial distribution but with variance exhibiting overdispersion due to additional sources of variation. This is as discussed in Section 4.5 of the notes. This is precisely the situation where the beta-binomial distribution was first proposed as a model for data.
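The conditioning argument can be checked directly against the closed-form beta-binomial pmf from (a). A Python sketch using log-gamma (the values \(m = 10\), \(a = 2\), \(b = 3\) are arbitrary illustrative choices, giving \(\pi = 0.4\) and \(\tau = 1/6\)):

```python
import math

# Check of the beta-binomial mean and variance formulas, using the
# closed-form pmf derived in (a).  Here pi = a/(a+b) = 0.4 and
# tau = 1/(a+b+1) = 1/6, so the predictions are
#   E(Y) = m*pi = 4,  var(Y) = m*pi*(1-pi)*{1 + tau*(m-1)} = 6.
m, a, b = 10, 2.0, 3.0

def log_pmf(y):
    return (math.lgamma(m + 1) + math.lgamma(a + y) + math.lgamma(m - y + b)
            + math.lgamma(a + b)
            - math.lgamma(y + 1) - math.lgamma(m - y + 1)
            - math.lgamma(m + a + b) - math.lgamma(a) - math.lgamma(b))

p = [math.exp(log_pmf(y)) for y in range(m + 1)]
total = sum(p)                                          # ~ 1
mean = sum(y * py for y, py in enumerate(p))            # ~ 4
var = sum((y - mean) ** 2 * py for y, py in enumerate(p))  # ~ 6
```

With these values, the binomial variance would be \(m\pi(1-\pi) = 2.4\); the inflation factor \(1 + \tau(m-1) = 2.5\) brings the beta-binomial variance to 6, illustrating the overdispersion discussed above.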
\[
\ell_{QL}(\mu; y) = \sigma^{-2} \int_y^{\mu} \frac{y-u}{u(1+u\theta)}\, du
= \sigma^{-2} \left\{ y \log\left(\frac{\mu}{1+\mu\theta}\right) - \frac{1}{\theta}\log(1+\mu\theta) - y \log\left(\frac{y}{1+y\theta}\right) + \frac{1}{\theta}\log(1+y\theta) \right\}
\]
\[
= \sigma^{-2} \left\{ y \log\left(\frac{\mu}{\mu+1/\theta}\right) - \frac{1}{\theta}\log\left(\frac{\mu+1/\theta}{1/\theta}\right) - y \log\left(\frac{y}{y+1/\theta}\right) + \frac{1}{\theta}\log\left(\frac{y+1/\theta}{1/\theta}\right) \right\}.
\]
(b) When \(\sigma = 1\), the "important part" of the QL (involving \(\mu\)) is
\[
y \log\left(\frac{\mu}{\mu+1/\theta}\right) - \frac{1}{\theta}\log\left(\frac{\mu+1/\theta}{1/\theta}\right).
\]
Now, from the previous problem, the negative binomial density is, with \(\theta\) here equivalent to \(1/\theta\) there,
\[
\frac{\Gamma(1/\theta+y)}{\Gamma(1/\theta)\, y!}\, \frac{\{1/(\theta\mu)\}^{1/\theta}}{\{1+1/(\theta\mu)\}^{1/\theta+y}}
= \frac{\Gamma(1/\theta+y)}{\Gamma(1/\theta)\, y!} \left(\frac{1/\theta}{\mu+1/\theta}\right)^{1/\theta} \left(\frac{\mu}{\mu+1/\theta}\right)^{y}.
\]
Taking logs, the part of this expression involving \(\mu\) is
\[
y \log\left(\frac{\mu}{\mu+1/\theta}\right) - \frac{1}{\theta}\log\left(\frac{\mu+1/\theta}{1/\theta}\right),
\]
which is identical to the "important part" of the QL in (a). Furthermore, it is clear from this expression that
\[
\xi = \log\left(\frac{\mu}{\mu+1/\theta}\right)
\]
and \(b(\xi) = -(1/\theta)\log(1 - e^{\xi})\), so that this is indeed in the form of a scaled exponential family. (c) Letting \(\mu(\beta)\) emphasize dependence on \(\beta\), differentiating the QL with respect to \(\beta\) using the chain rule gives \(\partial \ell_{QL}(\mu; y)/\partial\beta = \{\partial \ell_{QL}(\mu; y)/\partial\mu\}\{\partial \mu(\beta)/\partial\beta\}\); we have
\[
\frac{\partial}{\partial\mu}\, \ell_{QL}(\mu; y) = \frac{y}{\mu} - \frac{y}{\mu+1/\theta} - \frac{1}{\theta(\mu+1/\theta)}
= \frac{y}{\theta\mu(\mu+1/\theta)} - \frac{\mu}{\theta\mu(\mu+1/\theta)} = \frac{y-\mu}{\mu+\mu^2\theta}.
\]
Thus, putting this together, we see that this is indeed in the form gradient \(\times\) (1/variance) \(\times\) (response \(-\) mean), as required.