



An analysis of the maximum likelihood estimation equations for gamma and normal distributions. The similarities and differences between the equations, including their linearity in the data and the presence of quadratic terms. The document also discusses the relationship between mean and variance for gamma distributions and its implications for the estimating equation.
Typology: Assignments




\[
\log L = \sum_{j=1}^{n} \{ Y_j \log f(x_j, \beta) - f(x_j, \beta) - \log Y_j! \}.
\]
Taking derivatives with respect to β and setting equal to zero gives the estimating equation
\[
\frac{\partial}{\partial\beta} \log L
= \sum_{j=1}^{n} \{ Y_j f_\beta(x_j, \beta)/f(x_j, \beta) - f_\beta(x_j, \beta) \}
= \sum_{j=1}^{n} f^{-1}(x_j, \beta) \{ Y_j - f(x_j, \beta) \} f_\beta(x_j, \beta) = 0.
\]
Note that, under the Poisson distribution, $\operatorname{var}(Y_j \mid x_j) = f(x_j, \beta)$.

(b) The loglikelihood is
\[
\log L = \sum_{j=1}^{n} [ Y_j \log f(x_j, \beta) + (k_j - Y_j) \log\{ 1 - f(x_j, \beta) \} ].
\]
Thus, taking derivatives with respect to β gives
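\[
\frac{\partial}{\partial\beta} \log L
= \sum_{j=1}^{n} [ Y_j f_\beta(x_j, \beta)/f(x_j, \beta) - (k_j - Y_j) f_\beta(x_j, \beta)/\{ 1 - f(x_j, \beta) \} ]
\]
\[
= \sum_{j=1}^{n} [ k_j f(x_j, \beta) \{ 1 - f(x_j, \beta) \} ]^{-1} \{ Y_j - k_j f(x_j, \beta) \} k_j f_\beta(x_j, \beta) = 0.
\]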
Note that, under the binomial distribution, $\operatorname{var}(Y_j \mid x_j) = k_j f(x_j, \beta)\{ 1 - f(x_j, \beta) \}$.

(c) Both of these estimating equations are linear in the data $Y_j$. In addition, they both have a specific form, that of the GLS-type equation in (3.2) of the notes. That is, they have the form of a deviation (response − mean) times a gradient and a “weight” equal to the inverse of the variance of the response. This is no accident, as we will see: Both of these distributions are members of a special class with this property.
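The common “deviation times gradient times inverse variance” form can be checked numerically. The following is a minimal sketch, assuming a log-linear mean for the Poisson case and a logistic $f$ for the binomial case; these mean functions, the data values, and all function names are illustrative assumptions and are not taken from the problem. It compares the GLS-type sum with a finite-difference gradient of each loglikelihood.

```python
import numpy as np

def gls_score(y, mean, var, mean_grad):
    """Sum over j of (y_j - mean_j) / var_j times the gradient of mean_j."""
    return ((y - mean) / var) @ mean_grad

def numerical_grad(fun, beta, h=1e-6):
    """Central-difference gradient, used only as an independent check."""
    grad = np.zeros_like(beta)
    for i in range(beta.size):
        step = np.zeros_like(beta)
        step[i] = h
        grad[i] = (fun(beta + step) - fun(beta - step)) / (2 * h)
    return grad

# Made-up covariates and parameter value (assumptions for illustration).
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
beta = np.array([0.2, 0.8])
design = np.column_stack([np.ones_like(x), x])

# Poisson case with hypothetical mean f = exp(b0 + b1 * x).
y_pois = np.array([1.0, 2.0, 2.0, 5.0, 7.0])

def pois_mean(b):
    return np.exp(design @ b)

def pois_loglik(b):
    f = pois_mean(b)
    return np.sum(y_pois * np.log(f) - f)  # log(Y_j!) does not depend on beta

f = pois_mean(beta)
pois_score = gls_score(y_pois, f, f, f[:, None] * design)

# Binomial case with hypothetical logistic f and k_j trials.
k = np.array([10.0, 10.0, 12.0, 12.0, 15.0])
y_bin = np.array([3.0, 5.0, 8.0, 10.0, 14.0])

def bin_prob(b):
    return 1.0 / (1.0 + np.exp(-(design @ b)))

def bin_loglik(b):
    p = bin_prob(b)
    return np.sum(y_bin * np.log(p) + (k - y_bin) * np.log(1.0 - p))

p = bin_prob(beta)
# Mean k*p, variance k*p*(1-p), gradient of the mean k*p*(1-p)*[1, x].
bin_score = gls_score(y_bin, k * p, k * p * (1 - p), (k * p * (1 - p))[:, None] * design)

print(pois_score, numerical_grad(pois_loglik, beta))
print(bin_score, numerical_grad(bin_loglik, beta))
```

Each printed pair should agree up to finite-difference error, which is the sense in which both likelihood equations are of the GLS type.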
\[
L = \prod_{j=1}^{n} \frac{ Y_j^{1/\sigma^2 - 1} \exp[ -Y_j / \{ \sigma^2 f(x_j, \beta) \} ] }{ \Gamma(1/\sigma^2) \{ \sigma^2 f(x_j, \beta) \}^{1/\sigma^2} },
\]
so that
\[
\log L = \sum_{j=1}^{n} [ (1/\sigma^2 - 1) \log Y_j - Y_j / \{ \sigma^2 f(x_j, \beta) \} - (1/\sigma^2) \log\{ \sigma^2 f(x_j, \beta) \} - \log \Gamma(1/\sigma^2) ].
\]
Taking derivatives with respect to β yields the estimating equation
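\[
\frac{\partial}{\partial\beta} \log L
= \sum_{j=1}^{n} [ Y_j f_\beta(x_j, \beta) / \{ \sigma f(x_j, \beta) \}^2 - (1/\sigma^2) f_\beta(x_j, \beta) / f(x_j, \beta) ]
\]
\[
= (1/\sigma^2) \sum_{j=1}^{n} f^{-2}(x_j, \beta) \{ Y_j - f(x_j, \beta) \} f_\beta(x_j, \beta) = 0.
\]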
(b) Now we have
\[
\log L = -n \log (2\pi)^{1/2} - n \log \sigma - \sum_{j=1}^{n} \log f(x_j, \beta) - \frac{1}{2\sigma^2} \sum_{j=1}^{n} \frac{ [ Y_j - f(x_j, \beta) ]^2 }{ f^2(x_j, \beta) }.
\]
Thus,
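\[
\frac{\partial}{\partial\beta} \log L
= -\sum_{j=1}^{n} \lambda_\beta(x_j, \beta)
+ (1/\sigma^2) \sum_{j=1}^{n} \frac{ [ Y_j - f(x_j, \beta) ]^2 }{ f^2(x_j, \beta) } \lambda_\beta(x_j, \beta)
+ (1/\sigma^2) \sum_{j=1}^{n} f^{-2}(x_j, \beta) \{ Y_j - f(x_j, \beta) \} f_\beta(x_j, \beta)
\]
\[
= (1/\sigma^2) \sum_{j=1}^{n} f^{-2}(x_j, \beta) \{ Y_j - f(x_j, \beta) \} f_\beta(x_j, \beta)
+ \sum_{j=1}^{n} \left( \frac{ [ Y_j - f(x_j, \beta) ]^2 }{ \sigma^2 f^2(x_j, \beta) } - 1 \right) \lambda_\beta(x_j, \beta) = 0.
\]
Here $\lambda_\beta(x_j, \beta) = f_\beta(x_j, \beta)/f(x_j, \beta)$, the derivative of $\log f(x_j, \beta)$ with respect to $\beta$.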
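To make the contrast concrete, the gamma equation in (a) consists of the linear term alone, while the normal equation in (b) adds a term that is quadratic in $Y_j - f(x_j, \beta)$. The sketch below assumes a hypothetical mean $f = \exp(\beta_0 + \beta_1 x)$ and made-up data, neither of which comes from the problem, and checks both scores against finite-difference gradients of the $\beta$-dependent parts of the two loglikelihoods.

```python
import numpy as np

# Made-up data and a hypothetical mean model (assumptions for illustration).
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([1.9, 2.4, 4.1, 5.2, 9.0])
sigma2 = 0.04
design = np.column_stack([np.ones_like(x), x])

def mean(beta):
    return np.exp(design @ beta)

def mean_grad(beta):
    return mean(beta)[:, None] * design  # rows are f_beta(x_j, beta)

def linear_part(beta):
    """(1/sigma^2) * sum f^{-2} (Y - f) f_beta, shared by both equations."""
    f = mean(beta)
    return ((y - f) / f**2) @ mean_grad(beta) / sigma2

def quadratic_part(beta):
    """sum ((Y - f)^2 / (sigma^2 f^2) - 1) * f_beta / f, normal case only."""
    f = mean(beta)
    lam_beta = mean_grad(beta) / f[:, None]
    return ((y - f) ** 2 / (sigma2 * f**2) - 1.0) @ lam_beta

def gamma_loglik(beta):
    f = mean(beta)
    return np.sum(-y / (sigma2 * f) - np.log(f) / sigma2)  # beta-dependent terms

def normal_loglik(beta):
    f = mean(beta)
    return np.sum(-np.log(f) - (y - f) ** 2 / (2 * sigma2 * f**2))

def numerical_grad(fun, beta, h=1e-6):
    grad = np.zeros_like(beta)
    for i in range(beta.size):
        step = np.zeros_like(beta)
        step[i] = h
        grad[i] = (fun(beta + step) - fun(beta - step)) / (2 * h)
    return grad

beta = np.array([0.3, 0.7])
gamma_score = linear_part(beta)
normal_score = linear_part(beta) + quadratic_part(beta)

print(gamma_score, numerical_grad(gamma_loglik, beta))
print(normal_score, numerical_grad(normal_loglik, beta))
```

The two scores differ exactly by `quadratic_part(beta)`, so only the gamma equation is linear in the data.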
(c) It is straightforward to derive the form of the lognormal density given only the information in the problem, which we do here. If $Z = \log Y$, then the Jacobian of the transformation is $1/Y$, and the density of $Y$ is thus $n(\log Y; m, \gamma^2)\, Y^{-1}$, where $n(\cdot\,; m, \gamma^2)$ is the normal density with mean $m$ and variance $\gamma^2$. Thus, the desired density is
\[
(2\pi)^{-1/2} (\gamma Y)^{-1} \exp\left\{ -\frac{ (\log Y - m)^2 }{ 2\gamma^2 } \right\}.
\]
We would like this in terms of the mean and variance of $Y$. If $E(Y) = f$, then using the moment generating function of a normal, we have
\[
E(Y) = E(e^{Z}) = e^{m + \gamma^2/2} = f.
\]
We also have $E(Y^2) = E(e^{2Z}) = e^{2m + 2\gamma^2}$, so that
\[
\operatorname{var}(Y) = (e^{\gamma^2} - 1) \{ E(Y) \}^2.
\]
Thus, $\sigma^2 = e^{\gamma^2} - 1$, and we may deduce that
\[
\gamma^2 = \log(\sigma^2 + 1), \qquad m = \log f - \tfrac{1}{2} \log(\sigma^2 + 1).
\]
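As a quick arithmetic check of this reparameterization, the sketch below maps a mean $f$ and squared coefficient of variation $\sigma^2$ to $(m, \gamma^2)$ using $m = \log f - \gamma^2/2$, then recovers the mean and variance from the normal moment generating function. The function names and the numerical values are illustrative assumptions.

```python
import math

def lognormal_params(f, sigma2):
    """Map mean f and squared CV sigma2 to the parameters (m, gamma2) of Z = log Y."""
    gamma2 = math.log(sigma2 + 1.0)
    m = math.log(f) - 0.5 * gamma2
    return m, gamma2

def lognormal_mean_var(m, gamma2):
    """Mean and variance of Y = exp(Z) for Z ~ N(m, gamma2)."""
    mean = math.exp(m + gamma2 / 2.0)
    second_moment = math.exp(2.0 * m + 2.0 * gamma2)
    return mean, second_moment - mean**2

f_value, sigma2_value = 3.0, 0.25  # made-up values
m, gamma2 = lognormal_params(f_value, sigma2_value)
mean, var = lognormal_mean_var(m, gamma2)
print(mean, var, sigma2_value * f_value**2)
```

The recovered variance equals $\sigma^2 f^2$, so the lognormal model shares the constant coefficient of variation structure of the gamma model.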
Applying this to our problem and ignoring constants, we have
\[
\log L = -\sum_{j=1}^{n} \log Y_j - (n/2) \log\{ \log(\sigma^2 + 1) \}
- \sum_{j=1}^{n} \frac{ [ \log Y_j - \log f(x_j, \beta) + \tfrac{1}{2} \log(\sigma^2 + 1) ]^2 }{ 2 \log(\sigma^2 + 1) }.
\]
Taking derivatives with respect to β and simplifying yields the estimating equation
\[
\sum_{j=1}^{n} [ \log Y_j - \log f(x_j, \beta) + \tfrac{1}{2} \log(\sigma^2 + 1) ] \frac{ f_\beta(x_j, \beta) }{ f(x_j, \beta) } = 0.
\]
(d) The equation in (a) is of exactly the same “GLS” form as those in Problem 1(a) and (b)
Only when $\lambda = 0$ (the log transformation) will this be the case; otherwise, this is not possible. The implication is that, technically, for positive response, it is impossible for the TBS model to hold! However, if $P\{ h(Y, \lambda) > -1/\lambda \}$ is close to 1, for all practical purposes we may ignore this technical detail in applications. This explains why this model has been successfully and widely used even with positive response on the original scale.
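The bound $-1/\lambda$ can be seen numerically. The sketch below assumes that $h(Y, \lambda)$ is the Box-Cox power family $(Y^\lambda - 1)/\lambda$, with $\log Y$ at $\lambda = 0$; that definition is an assumption here, since this part of the solution does not restate it. For $\lambda > 0$ and positive $Y$ the transformed values never fall below $-1/\lambda$, while the log transformation is unbounded below.

```python
import numpy as np

def box_cox(y, lam):
    """Assumed form of h(Y, lambda): (Y**lam - 1) / lam, or log(Y) when lam == 0."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam

# Positive responses spanning many orders of magnitude (made-up values).
y = np.array([1e-6, 0.01, 1.0, 100.0, 1e6])

h_half = box_cox(y, 0.5)  # bounded below by -1/0.5 = -2
h_log = box_cox(y, 0.0)   # no lower bound as y approaches 0

print(h_half.min(), -1 / 0.5)
print(h_log.min())
```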
\[
\frac{1}{ (\beta_0 + \beta_1/x)^{-1} } = \beta_0 + \beta_1/x.
\]
Thus, we may write this model alternatively as
\[
\frac{ Y^{-1} - 1 }{ -1 } = \frac{ f^{-1}(x, \beta) - 1 }{ -1 } + \text{error term},
\]
with $f(x, \beta) = (\beta_0 + \beta_1/x)^{-1}$. It follows that this is of the form in 3(a) with $\lambda = -1$, $\theta = 0$, so that $E(Y_j \mid x_j) = f(x_j, \beta)$ and $\operatorname{var}(Y_j \mid x_j) = \sigma^2 f^4(x_j, \beta)$. Thus, this model makes an approximate assumption that the variance increases drastically with the mean.

For (ii), using the above, we may immediately identify $\lambda = -1$ and $\theta = -1$, so that $E(Y_j \mid x_j) = f(x_j, \beta)$ and $\operatorname{var}(Y_j \mid x_j) = \sigma^2 x_j^{-2} f^4(x_j, \beta)$. Thus, this model perhaps makes a less severe approximate assumption on variance, as it “tempers” the power of the mean with the inverse of the square of $x_j$.

(b) By some algebra, we can write this model as
\[
Y = (\beta_0 + \beta_1/x)^{-1} (1 + \beta_1 e).
\]
Thus, we see that this model makes the assumption that $E(Y_j \mid x_j) = (\beta_0 + \beta_1/x_j)^{-1}$ and $\operatorname{var}(Y_j \mid x_j) = \sigma^2 \beta_1^2 \{ E(Y_j \mid x_j) \}^2$, so that the model does assume constant coefficient of variation, where the coefficient of variation is $\sigma\beta_1$. Note further that
\[
\log Y = \log (\beta_0 + \beta_1/x)^{-1} + \log(1 + \beta_1 e) \approx \log (\beta_0 + \beta_1/x)^{-1} + \beta_1 e
\]
by a Taylor series in the second term about $e = 0$. Thus, this model is of the form in part (a) with $\lambda = \theta = 0$ and “errors” with standard deviation $\sigma\beta_1$ (variance $\sigma^2\beta_1^2$).
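Both claims in (b) are easy to check numerically. The sketch below uses made-up parameter values and a deterministic set of “errors” rescaled to have mean 0 and standard deviation $\sigma$; these choices are assumptions for illustration and are not part of the problem. It computes the coefficient of variation of $Y$ at several $x$ values and the size of the Taylor remainder $\log(1 + \beta_1 e) - \beta_1 e$.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
beta0, beta1, sigma = 0.5, 2.0, 0.05
x = np.array([1.0, 2.0, 4.0, 8.0])
f = 1.0 / (beta0 + beta1 / x)  # mean function (beta0 + beta1/x)^{-1}

# Deterministic stand-in for the errors: exactly mean 0 and standard deviation sigma.
z = np.linspace(-1.0, 1.0, 201)
e = sigma * (z - z.mean()) / z.std()

# Y = f * (1 + beta1 * e), one row per x value.
y = f[:, None] * (1.0 + beta1 * e[None, :])
cv = y.std(axis=1) / y.mean(axis=1)

# Remainder of the first-order Taylor approximation log(1 + u) ~ u, with u = beta1 * e.
taylor_error = np.max(np.abs(np.log1p(beta1 * e) - beta1 * e))

print(cv, sigma * beta1)
print(taylor_error)
```

The coefficient of variation equals $\sigma\beta_1$ at every $x$, and the Taylor remainder is of order $(\beta_1 e)^2$, which is small when $\sigma\beta_1$ is small.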
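\[
g^{-2}(\beta, \theta, x_j) \approx g^{-2}(\beta^*, \theta, x_j) + G^T(\beta^*, \theta, x_j)(\beta - \beta^*),
\]
where $G(\beta, \theta, x_j) = \partial/\partial\beta\, g^{-2}(\beta, \theta, x_j)$, say (a $(q \times 1)$ vector). Substituting these approximations into the GLS equation (3.12) on page 59 gives
\[
\sum_{j=1}^{n} \{ g^{-2}(\beta^*, \theta, x_j) + G^T(\beta^*, \theta, x_j)(\beta - \beta^*) \} \{ Y_j - f(x_j, \beta^*) - f_\beta^T(x_j, \beta^*)(\beta - \beta^*) \} \times \{ f_\beta(x_j, \beta^*) + f_{\beta\beta}(x_j, \beta^*)(\beta - \beta^*) \}
\]
\[
= \sum_{j=1}^{n} g^{-2}(\beta^*, \theta, x_j) \{ Y_j - f(x_j, \beta^*) \} f_\beta(x_j, \beta^*)
\]
\[
- \sum_{j=1}^{n} g^{-2}(\beta^*, \theta, x_j) f_\beta(x_j, \beta^*) f_\beta^T(x_j, \beta^*)(\beta - \beta^*)
\]
\[
+ \sum_{j=1}^{n} g^{-2}(\beta^*, \theta, x_j) \{ Y_j - f(x_j, \beta^*) \} f_{\beta\beta}(x_j, \beta^*)(\beta - \beta^*)
\]
\[
- \sum_{j=1}^{n} g^{-2}(\beta^*, \theta, x_j) f_\beta^T(x_j, \beta^*)(\beta - \beta^*) f_{\beta\beta}(x_j, \beta^*)(\beta - \beta^*)
\]
\[
+ \sum_{j=1}^{n} G^T(\beta^*, \theta, x_j)(\beta - \beta^*) \times \text{terms involving } \{ Y_j - f(x_j, \beta^*) \},\ (\beta - \beta^*).
\]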
The first term involves the data and the second is linear in $(\beta - \beta^*)$. The third depends on the product of $\{ Y_j - f(x_j, \beta^*) \}$ and $(\beta - \beta^*)$, which is expected to be “smaller” for $\beta^*$ close to $\beta$ than the first two terms. The fourth term is quadratic in $(\beta - \beta^*)$, so should also be “smaller” than the first two. The remaining terms involve at least products of $\{ Y_j - f(x_j, \beta^*) \}$ and $(\beta - \beta^*)$, so are also “smaller.” Thus, as in the argument in Section 3.2, we disregard these terms. Note that the presence of $\beta$ in the “weights” really doesn’t affect the form of the linear approximation at $\beta^*$. We are left with
\[
\sum_{j=1}^{n} g^{-2}(\beta^*, \theta, x_j) \{ Y_j - f(x_j, \beta^*) \} f_\beta(x_j, \beta^*) \approx \sum_{j=1}^{n} g^{-2}(\beta^*, \theta, x_j) f_\beta(x_j, \beta^*) f_\beta^T(x_j, \beta^*)(\beta - \beta^*),
\]
which may be rewritten in obvious matrix notation as
\[
\{ X^T(\beta^*) W(\beta^*) X(\beta^*) \}(\beta - \beta^*) \approx X^T(\beta^*) W(\beta^*) \{ Y - f(\beta^*) \},
\]
yielding the required updating scheme.
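The updating scheme can be sketched in a few lines by solving the linear system above for $\beta - \beta^*$ and iterating. The example assumes a hypothetical mean $f = \exp(\beta_0 + \beta_1 x)$ and variance function $g = f^\theta$ with $\theta$ held fixed; neither is specified in this part of the solution, and the function names are illustrative. Noise-free data are used so that the iteration has a known fixed point.

```python
import numpy as np

def f(x, beta):
    """Hypothetical mean function, an assumption for illustration."""
    return np.exp(beta[0] + beta[1] * x)

def f_beta(x, beta):
    """Rows are the gradients f_beta(x_j, beta); this is X(beta)."""
    m = f(x, beta)
    return np.column_stack([m, x * m])

def g(x, beta, theta):
    """Hypothetical variance function g = f**theta."""
    return f(x, beta) ** theta

def gls_update(beta_star, y, x, theta):
    """One step: beta* + (X'WX)^{-1} X'W {Y - f(beta*)}, with W = diag(g^{-2})."""
    X = f_beta(x, beta_star)
    w = g(x, beta_star, theta) ** -2.0
    XtW = X.T * w  # scales column j of X' by the j-th weight
    step = np.linalg.solve(XtW @ X, XtW @ (y - f(x, beta_star)))
    return beta_star + step

x = np.linspace(0.0, 2.0, 9)
beta_true = np.array([0.3, 0.7])
y = f(x, beta_true)  # noise-free responses, so beta_true solves the GLS equation

beta = np.array([0.0, 0.5])  # starting value
for _ in range(25):
    beta = gls_update(beta, y, x, theta=1.0)

print(beta)
```

Note that the weights are recomputed at each current value $\beta^*$, matching the remark that the presence of $\beta$ in the weights does not change the form of the linear approximation.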
\[
\hat\sigma \left\{ 1 + n^{-1} + \frac{ (x_0 - \bar{x})^2 }{ \sum_{j=1}^{n} (x_j - \bar{x})^2 } \right\}^{1/2} = SD,
\]
where $\bar{x}$ is the mean of the $n$ $x$ values involved in the fit and $\hat\sigma^2$ is the usual estimator for constant variance in simple linear regression based on the $n$ observations. I obtain the 90% prediction interval as $(\hat\beta^*_3 + \hat\beta^*_4 x_0) \pm t_{0.95}\, SD$,