
ST 762, HOMEWORK 2 EXTRA PROBLEM SOLUTIONS, FALL 2007

1. (a) Let the moment generating function be $m(t)$. By definition, where integration is over the support of $Y$ and $f(y \mid \xi, \sigma)$ is the density of $Y$,
$$
\begin{aligned}
m(t) &= \int e^{ty} f(y \mid \xi, \sigma)\, dy \\
&= \int e^{ty} \exp\left\{ \frac{y\xi - b(\xi)}{\sigma^2} + c(y, \sigma) \right\} dy \\
&= \exp\left\{ \frac{b(\xi + t\sigma^2) - b(\xi)}{\sigma^2} \right\} \int \exp\left\{ \frac{y(\xi + t\sigma^2) - b(\xi + t\sigma^2)}{\sigma^2} + c(y, \sigma) \right\} dy \\
&= \exp\left\{ \frac{b(\xi + t\sigma^2) - b(\xi)}{\sigma^2} \right\},
\end{aligned}
$$
where the result follows because the integral in the second-to-last line is equal to 1; the integrand is a scaled exponential family density with natural parameter $\xi + t\sigma^2$. Thus, we have by the chain rule
$$
m_t(t) = \frac{dm(t)}{dt} = b_\xi(\xi + t\sigma^2)\, m(t), \qquad
m_{tt}(t) = \frac{d^2 m(t)}{dt^2} = b_\xi(\xi + t\sigma^2)\, m_t(t) + \sigma^2 b_{\xi\xi}(\xi + t\sigma^2)\, m(t).
$$
It follows that
$$
E(Y) = m_t(0) = b_\xi(\xi), \qquad
E(Y^2) = m_{tt}(0) = b_\xi^2(\xi) + \sigma^2 b_{\xi\xi}(\xi),
$$
so that $\mathrm{var}(Y) = \sigma^2 b_{\xi\xi}(\xi) + b_\xi^2(\xi) - \{E(Y)\}^2 = \sigma^2 b_{\xi\xi}(\xi)$.
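As a quick check of these formulas (an aside, not part of the original solution, assuming sympy is available), the following sketch differentiates the MGF symbolically for an arbitrary smooth $b$:

```python
# Symbolic check that E(Y) = b_xi(xi) and var(Y) = sigma^2 * b_xixi(xi),
# differentiating m(t) = exp[{b(xi + t*sigma^2) - b(xi)}/sigma^2] at t = 0.
import sympy as sp

t, xi, sigma = sp.symbols("t xi sigma", positive=True)
b = sp.Function("b")  # arbitrary smooth cumulant-type function

m = sp.exp((b(xi + t * sigma**2) - b(xi)) / sigma**2)

EY = sp.diff(m, t).subs(t, 0).doit()      # first moment m_t(0)
EY2 = sp.diff(m, t, 2).subs(t, 0).doit()  # second moment m_tt(0)

print(sp.simplify(EY))           # Derivative(b(xi), xi), i.e. b_xi(xi)
print(sp.simplify(EY2 - EY**2))  # sigma**2 * Derivative(b(xi), (xi, 2))
```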

(b) From (a), we also have
$$
m_{ttt}(t) = \sigma^4 b_{\xi\xi\xi}(\xi + t\sigma^2)\, m(t) + 2\sigma^2 b_{\xi\xi}(\xi + t\sigma^2)\, m_t(t) + b_\xi(\xi + t\sigma^2)\, m_{tt}(t)
$$
and
$$
m_{tttt}(t) = \sigma^6 b_{\xi\xi\xi\xi}(\xi + t\sigma^2)\, m(t) + 3\sigma^4 b_{\xi\xi\xi}(\xi + t\sigma^2)\, m_t(t) + 3\sigma^2 b_{\xi\xi}(\xi + t\sigma^2)\, m_{tt}(t) + b_\xi(\xi + t\sigma^2)\, m_{ttt}(t).
$$
These may be used to calculate $E\{(Y - \mu)^3\}$ and $E\{(Y - \mu)^4\}$. We have $E\{(Y - \mu)^3\} = E[\{Y - b_\xi(\xi)\}^3] = E(Y^3) - 3 b_\xi(\xi)\, E(Y^2) + 2 b_\xi^3(\xi)$. Upon substitution of the expressions for the moments of $Y$, we obtain
$$
E\{(Y - \mu)^3\} = \sigma^4 b_{\xi\xi\xi}(\xi)\, m(0) + 2\sigma^2 b_{\xi\xi}(\xi)\, m_t(0) + b_\xi(\xi)\, m_{tt}(0) - 3 b_\xi^3(\xi) - 3\sigma^2 b_\xi(\xi) b_{\xi\xi}(\xi) + 2 b_\xi^3(\xi).
$$
Substituting $m(0) = 1$, $m_t(0) = b_\xi(\xi)$, and $m_{tt}(0) = b_\xi^2(\xi) + \sigma^2 b_{\xi\xi}(\xi)$, this simplifies to $\sigma^4 b_{\xi\xi\xi}(\xi)$. By an entirely similar calculation, we have $E\{(Y - \mu)^4\} = E(Y^4) - 4 E(Y^3)\, b_\xi(\xi) + 6 b_\xi^2(\xi)\, E(Y^2) - 4 b_\xi^3(\xi)\, E(Y) + b_\xi^4(\xi)$. Upon substitution, this becomes
$$
E\{(Y - \mu)^4\} = m_{tttt}(0) - 4 b_\xi(\xi)\, m_{ttt}(0) + 6 b_\xi^2(\xi)\, m_{tt}(0) - 3 b_\xi^4(\xi) = \sigma^6 b_{\xi\xi\xi\xi}(\xi) + 3\sigma^4 \{b_{\xi\xi}(\xi)\}^2,
$$
using $m_{ttt}(0) = \sigma^4 b_{\xi\xi\xi}(\xi) + 3\sigma^2 b_\xi(\xi) b_{\xi\xi}(\xi) + b_\xi^3(\xi)$ and the corresponding expression for $m_{tttt}(0)$; all terms involving $b_{\xi\xi\xi}(\xi)$ and odd powers of $b_\xi(\xi)$ cancel upon collecting.

Thus, $m_3 = \sigma^4 b_{\xi\xi\xi}(\xi)$ and $m_4 = \sigma^6 b_{\xi\xi\xi\xi}(\xi) + 3\sigma^4 \{b_{\xi\xi}(\xi)\}^2$, and we obtain, using $m_2 = \sigma^2 b_{\xi\xi}(\xi)$,
$$
\zeta = \frac{m_3}{m_2^{3/2}} = \frac{\sigma\, b_{\xi\xi\xi}(\xi)}{\{b_{\xi\xi}(\xi)\}^{3/2}}, \qquad
\kappa = \frac{m_4}{m_2^2} - 3 = \frac{\sigma^2 b_{\xi\xi\xi\xi}(\xi)}{\{b_{\xi\xi}(\xi)\}^2} + 3 - 3 = \frac{\sigma^2 b_{\xi\xi\xi\xi}(\xi)}{\{b_{\xi\xi}(\xi)\}^2}.
$$
Later in the course, we will find it interesting that $\zeta$ is of order $\sigma$ and $\kappa$ is of order $\sigma^2$.
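The same symbolic approach verifies the third and fourth central moments; this sketch (again an aside, assuming sympy) differentiates the central MGF $e^{-t\mu} m(t)$:

```python
# Symbolic check that m_3 = sigma^4 * b_xixixi(xi) and
# m_4 = sigma^6 * b_xixixixi(xi) + 3 sigma^4 * {b_xixi(xi)}^2.
import sympy as sp

t, xi, sigma = sp.symbols("t xi sigma", positive=True)
b = sp.Function("b")
m = sp.exp((b(xi + t * sigma**2) - b(xi)) / sigma**2)

mu = sp.diff(m, t).subs(t, 0).doit()  # E(Y) = b_xi(xi)
mc = sp.exp(-t * mu) * m              # central MGF E[exp{t(Y - mu)}]

m3 = sp.simplify(sp.diff(mc, t, 3).subs(t, 0).doit())
m4 = sp.simplify(sp.diff(mc, t, 4).subs(t, 0).doit())
print(m3)  # sigma**4 * Derivative(b(xi), (xi, 3))
print(m4)  # sigma**6 * Derivative(b(xi), (xi, 4))
           #   + 3*sigma**4 * Derivative(b(xi), (xi, 2))**2
```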

2. (a) The lognormal with $E(Y) = \mu$, $\mathrm{var}(Y) = \sigma^2\mu^2$ has density
$$
\begin{aligned}
f(y) &= \frac{1}{y\sqrt{2\pi\log(1+\sigma^2)}}\, \exp\left[ -\frac{\{\log y - \log\mu + \tfrac{1}{2}\log(\sigma^2+1)\}^2}{2\log(\sigma^2+1)} \right] \\
&= \exp\left\{ -\frac{(\log y - \log\mu + \gamma^2/2)^2}{2\gamma^2} - \log y - \log(2\pi)^{1/2} - \log\gamma \right\},
\end{aligned}
$$
letting $\gamma^2 = \log(\sigma^2+1)$. Because $y$ enters the exponent through $(\log y)^2$ and through the cross product $\log y \cdot \log\mu$, it is clear that we cannot isolate a term linear in $y$; hence, this density cannot be put in the required scaled exponential family form.

(b) For the normal, the density of $Y$ is

$$
\begin{aligned}
f(y) &= \frac{1}{\sqrt{2\pi}\,\sigma\mu}\, \exp\left\{ -\frac{(y-\mu)^2}{2\sigma^2\mu^2} \right\} \\
&= \exp\left\{ -\frac{y^2 - 2y\mu + \mu^2}{2\sigma^2\mu^2} - \log\sigma - \log(2\pi)^{1/2} - \log\mu \right\} \\
&= \exp\left\{ \frac{y/\mu - 1/2}{\sigma^2} - \frac{y^2}{2\sigma^2\mu^2} - \log\mu - \log\sigma - \log(2\pi)^{1/2} \right\}.
\end{aligned}
$$

Clearly, we cannot write the first term in the argument of the exponential in the form $y\xi - b(\xi)$; moreover, the second term links a quadratic function of $y$ with $\mu$. Hence, this density cannot be put in the required scaled exponential family form. The normal with constant variance is a member of the scaled exponential family class, but the normal with variance depending on the mean is not, in general. Here, the dependence of the variance on $\mu$ introduces a term in which $\mu$ is linked to a quadratic function of $y$, which violates the defining property of the scaled exponential family that the mean (through $\xi$) and $y$ are linked only linearly. This property is what makes maximum likelihood for this family coincide with GLS.
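As a numerical aside (not part of the original solution, assuming numpy and scipy, with arbitrary values $\mu = 2$, $\sigma = 0.5$), one can confirm that the lognormal parametrization in (a), with $\gamma^2 = \log(1+\sigma^2)$, really does have $E(Y) = \mu$ and $\mathrm{var}(Y) = \sigma^2\mu^2$:

```python
# Numerical check of the lognormal parametrization used in 2(a).
import numpy as np
from scipy.integrate import quad

mu, sigma = 2.0, 0.5
g2 = np.log(1.0 + sigma**2)  # gamma^2

def f(y):
    # lognormal density with log-mean log(mu) - gamma^2/2, log-variance gamma^2
    return np.exp(-(np.log(y) - np.log(mu) + g2 / 2) ** 2 / (2 * g2)) / (
        y * np.sqrt(2 * np.pi * g2)
    )

EY = quad(lambda y: y * f(y), 0, np.inf)[0]
EY2 = quad(lambda y: y**2 * f(y), 0, np.inf)[0]
print(EY, mu)                         # both ~2.0
print(EY2 - EY**2, sigma**2 * mu**2)  # both ~1.0
```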

3. (a) The density of $Y$ will be
$$
\begin{aligned}
f(y) &= \int_0^\infty \frac{m^y e^{-m}}{y!} \left(\frac{m\theta}{\mu}\right)^{\theta} \frac{e^{-m\theta/\mu}}{\Gamma(\theta)\, m}\, dm \\
&= \frac{\Gamma(\theta+y)}{\Gamma(\theta)\, y!} \left(\frac{\mu}{\theta+\mu}\right)^{\theta+y} \left(\frac{\theta}{\mu}\right)^{\theta} \int_0^\infty \frac{m^{\theta+y-1}\, e^{-m(\theta+\mu)/\mu}}{\Gamma(\theta+y)\{\mu/(\theta+\mu)\}^{\theta+y}}\, dm \\
&= \frac{\Gamma(\theta+y)}{\Gamma(\theta)\, y!}\, \frac{(\theta/\mu)^{\theta}}{(1+\theta/\mu)^{\theta+y}},
\end{aligned}
$$

$y = 0, 1, \ldots$. This is the negative binomial probability mass function.

(b) It is straightforward to show that this density is a member of the scaled exponential family for $\theta$ known. The algebra is not shown here, but we may identify $\xi = \log\{\mu/(\mu+\theta)\}$, $b(\xi) = -\theta\log\{1 - \mu/(\mu+\theta)\} = -\theta\log(1 - e^{\xi})$, and $c(y, \sigma) = \log\{\Gamma(\theta+y)/\Gamma(\theta)\} - \log y!$. Of course, if $\theta$ is not known, it is straightforward to demonstrate that we may not "isolate" $y$ multiplied by a function that does not depend on $\theta$; in fact, $y$ and $\theta$ appear together in the complicated function $\log\{\Gamma(\theta+y)/\Gamma(\theta)\}$. Thus, we have immediately that

$$
E(Y) = b_\xi(\xi) = \frac{\theta e^{\xi}}{1 - e^{\xi}} = \mu
\qquad \text{and} \qquad
\mathrm{var}(Y) = b_{\xi\xi}(\xi) = \frac{\theta e^{2\xi}}{(1 - e^{\xi})^2} + \frac{\theta e^{\xi}}{1 - e^{\xi}} = \mu^2/\theta + \mu.
$$

(c) In (b), we showed that $E(Y) = \mu$ and $\mathrm{var}(Y) = \mu + \mu^2/\theta$, where $y = 0, 1, 2, \ldots$. The data for which this is an appropriate model are thus count data; however, the mean-variance relationship is not that of a Poisson distribution. Rather, the variance is that of a Poisson plus an additional positive term. This suggests that the model may be appropriate when we have count data whose variance increases more sharply with the mean than a Poisson distribution would dictate. In fact, the negative binomial distribution is often used as a model for overdispersed count data, as the sketch below illustrates numerically.
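This sketch (an aside, assuming scipy, with arbitrary values $\mu = 3$, $\theta = 2$) checks that the Poisson-gamma mixture integral reproduces the negative binomial pmf, whose mean and variance match $\mu$ and $\mu + \mu^2/\theta$:

```python
# Numerical check of the Poisson-gamma mixture in 3(a) and its moments.
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln
from scipy.stats import gamma, poisson

mu, theta = 3.0, 2.0

def mixture_pmf(y):
    # integral over m of Poisson(y; m) times a gamma density with
    # shape theta and mean mu (i.e., scale mu/theta)
    integrand = lambda m: poisson.pmf(y, m) * gamma.pdf(m, a=theta, scale=mu / theta)
    return quad(integrand, 0, np.inf)[0]

def negbin_pmf(y):
    # Gamma(theta+y)/{Gamma(theta) y!} (theta/mu)^theta / (1+theta/mu)^(theta+y)
    logp = (gammaln(theta + y) - gammaln(theta) - gammaln(y + 1)
            + theta * np.log(theta / mu) - (theta + y) * np.log1p(theta / mu))
    return np.exp(logp)

for y in range(5):
    print(y, mixture_pmf(y), negbin_pmf(y))  # the two columns agree

ys = np.arange(400)
p = negbin_pmf(ys)
mean = np.sum(ys * p)
var = np.sum((ys - mean) ** 2 * p)
print(mean, mu)                 # ~3.0
print(var, mu + mu**2 / theta)  # ~7.5, exceeding the Poisson variance of 3
```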

4. (a) We have
$$
P(Y = y) = \int_0^1 P(Y = y \mid P = p)\, f(p)\, dp,
$$
where $f(p)$ is the beta density. Thus,
$$
\begin{aligned}
P(Y = y) &= \int_0^1 \binom{m}{y} p^{y}(1-p)^{m-y}\, \frac{p^{a-1}(1-p)^{b-1}}{B(a,b)}\, dp \\
&= \binom{m}{y} \{B(a,b)\}^{-1}\, \frac{\Gamma(a+y)\,\Gamma(m-y+b)}{\Gamma(a+y+m-y+b)} \int_0^1 \frac{p^{a+y-1}(1-p)^{m-y+b-1}}{B(a+y,\, m-y+b)}\, dp \\
&= \frac{\Gamma(m+1)\,\Gamma(a+y)\,\Gamma(m-y+b)\,\Gamma(a+b)}{\Gamma(y+1)\,\Gamma(m-y+1)\,\Gamma(m+a+b)\,\Gamma(a)\,\Gamma(b)},
\end{aligned}
$$

$y = 0, 1, \ldots, m$.

(b) Rather than work with this mess directly, it is easier to use a conditioning argument. If $P$ has a beta distribution, then it is straightforward to show that $E(P) = a/(a+b) = \pi$, say, and
$$
\mathrm{var}(P) = \frac{ab}{(a+b+1)(a+b)^2},
$$
so that $\mathrm{var}(P) = (a+b+1)^{-1}\pi(1-\pi) = \tau\pi(1-\pi)$, say, where $\tau = (a+b+1)^{-1}$. Now $E(Y) = E\{E(Y \mid P)\} = E(mP) = m\pi$, and
$$
\begin{aligned}
\mathrm{var}(Y) &= E\{\mathrm{var}(Y \mid P)\} + \mathrm{var}\{E(Y \mid P)\} = E\{mP(1-P)\} + \mathrm{var}(mP) \\
&= E(mP) - E(mP^2) + \mathrm{var}(mP) = m(m-1)\,\mathrm{var}(P) + m E(P)\{1 - E(P)\} \\
&= m(m-1)\tau\pi(1-\pi) + m\pi(1-\pi) = m\pi(1-\pi)\{1 + \tau(m-1)\}.
\end{aligned}
$$
Thus, $E(Y) = m\pi$ and, writing the variance in terms of the mean $m\pi$,
$$
\mathrm{var}(Y) = m\pi \left( \frac{m - m\pi}{m} \right) \{1 + \tau(m-1)\}.
$$

(c) Note that, when $\tau = 0$, this mean-variance specification is just that of the binomial distribution with parameters $m$ and $\pi$. Thus, as $\tau > 0$ here (because $a, b > 0$), we have a mean-variance model for data taking the integer values $0, 1, \ldots, m$, as for a binomial, but with mean-variance relationship different from the binomial. In particular, the variance is that of a binomial inflated by the positive multiplicative factor $1 + \tau(m-1)$. Thus, this model would be useful in a situation where we would ordinarily expect $Y$ to follow a binomial distribution but where the variance exhibits overdispersion due to additional sources of variation, as discussed in Section 4.5 of the notes. This is precisely the situation for which the beta-binomial distribution was first proposed as a model; the sketch below checks the moment formulas by simulation.
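Here is the simulation sketch referred to above (an aside, assuming numpy, with arbitrary values $a = 2$, $b = 3$, $m = 10$):

```python
# Monte Carlo check of E(Y) = m*pi and
# var(Y) = m*pi*(1 - pi)*{1 + tau*(m - 1)} for the beta-binomial.
import numpy as np

rng = np.random.default_rng(0)
a, b, m = 2.0, 3.0, 10
pi = a / (a + b)           # E(P)
tau = 1.0 / (a + b + 1.0)  # overdispersion parameter

p = rng.beta(a, b, size=1_000_000)  # P ~ Beta(a, b)
y = rng.binomial(m, p)              # Y | P ~ Binomial(m, P)

print(y.mean(), m * pi)                                  # both ~4.0
print(y.var(), m * pi * (1 - pi) * (1 + tau * (m - 1)))  # both ~6.0
```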

5. (a) The QL is
$$
\begin{aligned}
\ell_{QL}(\mu; y) &= \sigma^{-2} \int_y^{\mu} \frac{y - u}{u(1 + u\theta)}\, du \\
&= \sigma^{-2} \left\{ y \log\left(\frac{\mu}{1 + \mu\theta}\right) - \frac{1}{\theta}\log(1 + \mu\theta) - y \log\left(\frac{y}{1 + y\theta}\right) + \frac{1}{\theta}\log(1 + \theta y) \right\} \\
&= \sigma^{-2} \left\{ y \log\left(\frac{\mu}{\mu + 1/\theta}\right) - \frac{1}{\theta}\log\left(\frac{\mu + 1/\theta}{1/\theta}\right) - y \log\left(\frac{y}{y + 1/\theta}\right) + \frac{1}{\theta}\log\left(\frac{y + 1/\theta}{1/\theta}\right) \right\}.
\end{aligned}
$$
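As a numerical aside (assuming scipy, with arbitrary values of $y$, $\mu$, $\theta$, and $\sigma$), the closed form can be checked against direct integration:

```python
# Check the closed form of the QL in 5(a) by direct quadrature.
import numpy as np
from scipy.integrate import quad

y, mu, theta, sigma = 4.0, 6.0, 0.7, 1.3

direct = quad(lambda u: (y - u) / (u * (1 + u * theta)), y, mu)[0] / sigma**2

# note: (mu + 1/theta)/(1/theta) = (mu + 1/theta)*theta
closed = (y * np.log(mu / (mu + 1 / theta))
          - (1 / theta) * np.log((mu + 1 / theta) * theta)
          - y * np.log(y / (y + 1 / theta))
          + (1 / theta) * np.log((y + 1 / theta) * theta)) / sigma**2

print(direct, closed)  # the two values agree
```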

(b) When $\sigma = 1$, the "important part" of the QL (involving $\mu$) is
$$
y \log\left(\frac{\mu}{\mu + 1/\theta}\right) - \frac{1}{\theta}\log\left(\frac{\mu + 1/\theta}{1/\theta}\right).
$$

Now, from the previous problem, the negative binomial density is, with $\theta$ here equivalent to $1/\theta$ there,
$$
\frac{\Gamma(1/\theta + y)}{\Gamma(1/\theta)\, y!}\, \frac{\{1/(\theta\mu)\}^{1/\theta}}{\{1 + 1/(\theta\mu)\}^{1/\theta + y}}
= \frac{\Gamma(1/\theta + y)}{\Gamma(1/\theta)\, y!} \left(\frac{1/\theta}{\mu + 1/\theta}\right)^{1/\theta} \left(\frac{\mu}{\mu + 1/\theta}\right)^{y}.
$$
Taking logs, the part of this expression involving $\mu$ is
$$
y \log\left(\frac{\mu}{\mu + 1/\theta}\right) - \frac{1}{\theta}\log\left(\frac{\mu + 1/\theta}{1/\theta}\right),
$$
which is identical to the "important part" of the QL in (a). Furthermore, it is clear from this expression that
$$
\xi = \log\left(\frac{\mu}{\mu + 1/\theta}\right)
$$

and $b(\xi) = -(1/\theta)\log(1 - e^{\xi})$, so that this is indeed in the form of a scaled exponential family.

(c) Letting $\mu(\beta)$ emphasize dependence on $\beta$, differentiating the QL with respect to $\beta$ using the chain rule gives
$$
\frac{\partial}{\partial\beta}\, \ell_{QL}(\mu; y) = \left\{ \frac{\partial}{\partial\mu}\, \ell_{QL}(\mu; y) \right\} \frac{\partial}{\partial\beta}\, \mu(\beta);
$$
we have
$$
\frac{\partial}{\partial\mu}\, \ell_{QL}(\mu; y)
= \sigma^{-2} \left\{ \frac{y}{\mu} - \frac{y}{\mu + 1/\theta} - \frac{1}{\theta(\mu + 1/\theta)} \right\}
= \sigma^{-2} \left\{ \frac{y}{\theta\mu(\mu + 1/\theta)} - \frac{\mu}{\theta\mu(\mu + 1/\theta)} \right\}
= \frac{y - \mu}{\sigma^2(\mu + \mu^2\theta)}.
$$
Thus, putting this together, we see that this is indeed in the form gradient $\times$ (1/variance) $\times$ (response $-$ mean), as required, since $\sigma^2(\mu + \mu^2\theta)$ is the assumed variance of $Y$.
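Finally, a small symbolic aside (assuming sympy) confirms the derivative computed in (c):

```python
# Symbolic check that d/dmu of the mu-dependent part of the sigma = 1 QL
# equals (y - mu)/(mu + theta*mu^2), the residual over the variance function.
import sympy as sp

y, mu, theta = sp.symbols("y mu theta", positive=True)

# the "important part" of the QL from 5(b); y-only terms drop out of d/dmu
ql = y * sp.log(mu / (mu + 1 / theta)) - (1 / theta) * sp.log((mu + 1 / theta) * theta)
score = sp.simplify(sp.diff(ql, mu))

print(score)                                                 # (y - mu)/(mu*(mu*theta + 1))
print(sp.simplify(score - (y - mu) / (mu + theta * mu**2)))  # 0
```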