Comparing Estimating Equations for Gamma and Normal Distributions, Assignments of Statistics

An analysis of the maximum likelihood estimating equations for gamma and normal distributions, covering the similarities and differences between the equations, including their linearity in the data and the presence of quadratic terms. The document also discusses the relationship between mean and variance for gamma distributions and its implications for the estimating equation.

Typology: Assignments

Pre 2010

Uploaded on 03/18/2009

koofers-user-f9h

ST 762, HOMEWORK 1 EXTRA PROBLEM SOLUTIONS, FALL 2007

  1. (a) The loglikelihood is

\[
\log L = \sum_{j=1}^{n} \{ Y_j \log f(x_j,\beta) - f(x_j,\beta) - \log Y_j! \}.
\]

Taking derivatives with respect to $\beta$ and setting equal to zero gives the estimating equation

\[
\frac{\partial}{\partial\beta} \log L
= \sum_{j=1}^{n} \{ Y_j f_\beta(x_j,\beta)/f(x_j,\beta) - f_\beta(x_j,\beta) \}
= \sum_{j=1}^{n} f^{-1}(x_j,\beta) \{ Y_j - f(x_j,\beta) \} f_\beta(x_j,\beta) = 0.
\]

Note that, under the Poisson distribution, $\mathrm{var}(Y_j \mid x_j) = f(x_j,\beta)$.

(b) The loglikelihood is

\[
\log L = \sum_{j=1}^{n} [ Y_j \log f(x_j,\beta) + (k_j - Y_j) \log\{ 1 - f(x_j,\beta) \} ].
\]

Thus, taking derivatives with respect to $\beta$ gives

\[
\frac{\partial}{\partial\beta} \log L
= \sum_{j=1}^{n} [ Y_j f_\beta(x_j,\beta)/f(x_j,\beta) - (k_j - Y_j) f_\beta(x_j,\beta)/\{ 1 - f(x_j,\beta) \} ]
= \sum_{j=1}^{n} [ k_j f(x_j,\beta) \{ 1 - f(x_j,\beta) \} ]^{-1} \{ Y_j - k_j f(x_j,\beta) \} k_j f_\beta(x_j,\beta) = 0.
\]

Note that, under the binomial distribution, $\mathrm{var}(Y_j \mid x_j) = k_j f(x_j,\beta) \{ 1 - f(x_j,\beta) \}$.

(c) Both of these estimating equations are linear in the data $Y_j$. In addition, they both have a specific form, that of the GLS-type equation in (3.2) of the notes. That is, they have the form of a deviation (response − mean) times a gradient and a “weight” equal to the inverse of the variance of the response. This is no accident, as we will see: Both of these distributions are members of a special class with this property.
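The shared "deviation times gradient times inverse variance" form can be checked numerically. The following is a minimal sketch, assuming a scalar $\beta$, the hypothetical mean functions $f(x,\beta) = e^{\beta x}$ (Poisson) and $f(x,\beta) = 1/(1 + e^{-\beta x})$ (binomial), and made-up data; none of these choices come from the problem. It compares the GLS-form sum with a finite-difference derivative of each loglikelihood.

```python
import math

def gls_score(beta, data, mean, grad, var):
    """Sum over j of (Y_j - mean_j) * (d mean_j / d beta) / var_j."""
    return sum((y - mean(c, beta)) * grad(c, beta) / var(c, beta)
               for c, y in data)

def num_deriv(fn, beta, h=1e-6):
    """Central finite-difference derivative of fn at beta."""
    return (fn(beta + h) - fn(beta - h)) / (2.0 * h)

# Poisson: hypothetical mean f(x, beta) = exp(beta * x); variance equals the mean.
def pois_f(x, beta):
    return math.exp(beta * x)

def pois_f_beta(x, beta):
    return x * math.exp(beta * x)

def pois_loglik(beta, data):
    return sum(y * math.log(pois_f(x, beta)) - pois_f(x, beta) - math.lgamma(y + 1)
               for x, y in data)

# Binomial: covariates c = (x, k); hypothetical f(x, beta) = 1 / (1 + exp(-beta * x)).
def prob(x, beta):
    return 1.0 / (1.0 + math.exp(-beta * x))

def binom_mean(c, beta):
    x, k = c
    return k * prob(x, beta)

def binom_grad(c, beta):
    x, k = c
    p = prob(x, beta)
    return k * x * p * (1.0 - p)      # k * f_beta

def binom_var(c, beta):
    x, k = c
    p = prob(x, beta)
    return k * p * (1.0 - p)

def binom_loglik(beta, data):
    return sum(y * math.log(prob(x, beta)) + (k - y) * math.log(1.0 - prob(x, beta))
               for (x, k), y in data)

pois_data = [(0.5, 2), (1.0, 3), (1.5, 7), (2.0, 9)]           # made-up (x, Y)
binom_data = [((-1.0, 10), 3), ((0.5, 12), 5), ((1.0, 8), 6)]  # made-up ((x, k), Y)
beta = 0.7

pois_score = gls_score(beta, pois_data, pois_f, pois_f_beta, pois_f)
pois_fd = num_deriv(lambda b: pois_loglik(b, pois_data), beta)
binom_score = gls_score(beta, binom_data, binom_mean, binom_grad, binom_var)
binom_fd = num_deriv(lambda b: binom_loglik(b, binom_data), beta)
print(pois_score, pois_fd, binom_score, binom_fd)
```

The same `gls_score` function serves both models; only the mean, gradient and variance functions change, which is the point made in (c).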

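  2. (a) The likelihood is

\[
L = \prod_{j=1}^{n} \frac{ Y_j^{1/\sigma^2 - 1} \exp[ -Y_j / \{ \sigma^2 f(x_j,\beta) \} ] }{ \Gamma(1/\sigma^2) \{ \sigma^2 f(x_j,\beta) \}^{1/\sigma^2} },
\]

so that

\[
\log L = \sum_{j=1}^{n} [ (1/\sigma^2 - 1) \log Y_j - Y_j / \{ \sigma^2 f(x_j,\beta) \} - (1/\sigma^2) \log\{ \sigma^2 f(x_j,\beta) \} - \log \Gamma(1/\sigma^2) ].
\]

Taking derivatives with respect to $\beta$ yields the estimating equation

\[
\frac{\partial}{\partial\beta} \log L
= \sum_{j=1}^{n} [ Y_j f_\beta(x_j,\beta) / \{ \sigma f(x_j,\beta) \}^2 - (1/\sigma^2) f_\beta(x_j,\beta) / f(x_j,\beta) ]
= (1/\sigma^2) \sum_{j=1}^{n} f^{-2}(x_j,\beta) \{ Y_j - f(x_j,\beta) \} f_\beta(x_j,\beta) = 0.
\]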
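As a sanity check on the algebra, the closed-form gamma score can be compared with a numerical derivative of the gamma loglikelihood. This is a sketch only: the scalar $\beta$, the mean function $f(x,\beta) = e^{\beta x}$, the value of $\sigma^2$ and the data are all assumptions made for illustration.

```python
import math

def f(x, beta):            # hypothetical mean model, not from the problem
    return math.exp(beta * x)

def f_beta(x, beta):       # derivative of f with respect to beta
    return x * math.exp(beta * x)

def gamma_loglik(beta, sigma2, data):
    a = 1.0 / sigma2       # gamma shape; the scale is sigma2 * f
    return sum((a - 1.0) * math.log(y) - y / (sigma2 * f(x, beta))
               - a * math.log(sigma2 * f(x, beta)) - math.lgamma(a)
               for x, y in data)

def gamma_score(beta, sigma2, data):
    # (1 / sigma^2) * sum of f^{-2} (Y - f) f_beta
    return sum((y - f(x, beta)) * f_beta(x, beta) / f(x, beta) ** 2
               for x, y in data) / sigma2

data = [(0.5, 1.2), (1.0, 2.5), (1.5, 2.1), (2.0, 4.8)]   # made-up (x, Y)
beta, sigma2 = 0.7, 0.25
h = 1e-6
fd = (gamma_loglik(beta + h, sigma2, data)
      - gamma_loglik(beta - h, sigma2, data)) / (2.0 * h)
score = gamma_score(beta, sigma2, data)
print(score, fd)
```

Since $\mathrm{var}(Y_j \mid x_j) = \sigma^2 f^2(x_j,\beta)$ under this gamma model, the weight $f^{-2}/\sigma^2$ in `gamma_score` is again the inverse of the variance.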

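(b) Now we have

\[
\log L = -n \log (2\pi)^{1/2} - n \log \sigma - \sum_{j=1}^{n} \log f(x_j,\beta) - \frac{1}{2\sigma^2} \sum_{j=1}^{n} \frac{ [ Y_j - f(x_j,\beta) ]^2 }{ f^2(x_j,\beta) }.
\]

Thus, writing $\lambda_\beta(x_j,\beta) = f_\beta(x_j,\beta)/f(x_j,\beta)$ for the derivative of $\log f(x_j,\beta)$ with respect to $\beta$,

\[
\frac{\partial}{\partial\beta} \log L
= -\sum_{j=1}^{n} \lambda_\beta(x_j,\beta)
+ (1/\sigma^2) \sum_{j=1}^{n} \frac{ [ Y_j - f(x_j,\beta) ]^2 }{ f^2(x_j,\beta) } \lambda_\beta(x_j,\beta)
+ (1/\sigma^2) \sum_{j=1}^{n} f^{-2}(x_j,\beta) \{ Y_j - f(x_j,\beta) \} f_\beta(x_j,\beta)
\]

\[
= (1/\sigma^2) \sum_{j=1}^{n} f^{-2}(x_j,\beta) \{ Y_j - f(x_j,\beta) \} f_\beta(x_j,\beta)
+ \sum_{j=1}^{n} \left( \frac{ [ Y_j - f(x_j,\beta) ]^2 }{ \sigma^2 f^2(x_j,\beta) } - 1 \right) \lambda_\beta(x_j,\beta) = 0.
\]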
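The differentiation behind this result can be written out for a single observation. The sketch below suppresses the arguments $(x_j,\beta)$ and takes $\lambda_\beta = f_\beta/f$, which is what differentiating $\log f$ gives; the layout is illustrative and is not taken from the notes.

```latex
\begin{align*}
\frac{\partial}{\partial\beta}\{ -\log f \}
  &= -\frac{f_\beta}{f} = -\lambda_\beta, \\
\frac{\partial}{\partial\beta}\left\{ -\frac{(Y_j - f)^2}{2\sigma^2 f^2} \right\}
  &= \frac{(Y_j - f)\, f_\beta}{\sigma^2 f^2} + \frac{(Y_j - f)^2 f_\beta}{\sigma^2 f^3} \\
  &= \frac{1}{\sigma^2} f^{-2} (Y_j - f) f_\beta
     + \frac{(Y_j - f)^2}{\sigma^2 f^2}\, \lambda_\beta .
\end{align*}
```

Summing over $j$ gives the linear GLS-type term plus a term that is quadratic in $Y_j$, which has no counterpart in the gamma equation of part (a).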

(c) It is straightforward to derive the form of the lognormal density given only the information in the problem, which we do here. If $Z = \log Y$, then the Jacobian of the transformation is $1/Y$, and the density of $Y$ is thus $n(\log Y; m, \gamma^2) Y^{-1}$, where $n(\cdot\,; m, \gamma^2)$ is the normal density with mean $m$ and variance $\gamma^2$. Thus, the desired density is

\[
(2\pi)^{-1/2} (\gamma Y)^{-1} \exp\left\{ -\frac{ (\log Y - m)^2 }{ 2\gamma^2 } \right\}.
\]

We would like this in terms of the mean and variance of $Y$. If $E(Y) = f$, then using the moment generating function of a normal, we have

\[
E(Y) = E(e^{Z}) = e^{m + \gamma^2/2} = f.
\]

We also have $E(Y^2) = E(e^{2Z}) = e^{2m + 2\gamma^2}$, so that

\[
\mathrm{var}(Y) = (e^{\gamma^2} - 1) \{ E(Y) \}^2.
\]

Thus, $\sigma^2 = e^{\gamma^2} - 1$, and we may deduce that

\[
\gamma^2 = \log(\sigma^2 + 1), \qquad m = \log f - \tfrac{1}{2} \log(\sigma^2 + 1).
\]
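The reparameterization can be verified directly from the normal moment generating function. The sketch below assumes illustrative values $f = 3$ and $\sigma^2 = 0.4$ (not from the problem), sets $\gamma^2 = \log(\sigma^2 + 1)$ and $m = \log f - \gamma^2/2$, and confirms that the implied lognormal mean and variance are $f$ and $\sigma^2 f^2$.

```python
import math

def lognormal_params(f, sigma2):
    """Map (mean f, squared CV sigma2) to the normal parameters of log Y."""
    gamma2 = math.log(sigma2 + 1.0)
    m = math.log(f) - 0.5 * gamma2
    return m, gamma2

def lognormal_moments(m, gamma2):
    """Mean and variance of Y = exp(Z), Z ~ N(m, gamma2), via the normal MGF."""
    mean = math.exp(m + 0.5 * gamma2)            # E(e^Z)
    second = math.exp(2.0 * m + 2.0 * gamma2)    # E(e^{2Z})
    return mean, second - mean ** 2

f, sigma2 = 3.0, 0.4          # illustrative values only
m, gamma2 = lognormal_params(f, sigma2)
mean, var = lognormal_moments(m, gamma2)
print(mean, var, sigma2 * f ** 2)
```

The variance is proportional to the squared mean, so $\sigma$ plays the role of a constant coefficient of variation, as in the gamma model of part (a).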

Applying this to our problem and ignoring constants, we have

\[
\log L = -\sum_{j=1}^{n} \log Y_j - (n/2) \log\{ \log(\sigma^2 + 1) \}
- \sum_{j=1}^{n} \frac{ [ \log Y_j - \log f(x_j,\beta) + \tfrac{1}{2} \log(\sigma^2 + 1) ]^2 }{ 2 \log(\sigma^2 + 1) }.
\]

Taking derivatives with respect to $\beta$ and simplifying yields the estimating equation

\[
\sum_{j=1}^{n} [ \log Y_j - \log f(x_j,\beta) + \tfrac{1}{2} \log(\sigma^2 + 1) ] \, \frac{ f_\beta(x_j,\beta) }{ f(x_j,\beta) } = 0.
\]

(d) The equation in (a) is of exactly the same “GLS” form as those in Problem 1(a) and (b): linear in the data with “weighting” by the inverse of the variance under the distributional

Only when $\lambda = 0$ (the log transformation) will this be the case; otherwise, this is not possible. The implication is that, technically, for positive response, it is impossible for the TBS model to hold! However, if $P\{ h(Y,\lambda) > -1/\lambda \}$ is close to 1, for all practical purposes we may ignore this technical detail in applications. This explains why this model has been successfully and widely used even with positive response on the original scale.

  4. (a) For (i), note immediately that

\[
\frac{1}{ (\beta_0 + \beta_1/x)^{-1} } = \beta_0 + \beta_1/x.
\]

Thus, we may write this model alternatively as

\[
\frac{ Y^{-1} - 1 }{ -1 } = \frac{ f^{-1}(x,\beta) - 1 }{ -1 } + e,
\]

with $f(x,\beta) = (\beta_0 + \beta_1/x)^{-1}$. It follows that this is of the form in 3(a) with $\lambda = -1$, $\theta = 0$, so that $E(Y_j \mid x_j) = f(x_j,\beta)$ and $\mathrm{var}(Y_j \mid x_j) = \sigma^2 f^4(x_j,\beta)$. Thus, this model makes an approximate assumption that the variance increases drastically with the mean.

For (ii), using the above, we may immediately identify $\lambda = -1$ and $\theta = -1$, so that $E(Y_j \mid x_j) = f(x_j,\beta)$ and $\mathrm{var}(Y_j \mid x_j) = \sigma^2 x_j^{-2} f^4(x_j,\beta)$. Thus, this model perhaps makes a less severe approximate assumption on variance, as it “tempers” the power of the mean with the inverse of the square of $x_j$.

(b) By some algebra, we can write this model as

\[
Y = (\beta_0 + \beta_1/x)^{-1} (1 + \beta_1 e).
\]

Thus, we see that this model makes the assumption that $E(Y_j \mid x_j) = (\beta_0 + \beta_1/x_j)^{-1}$ and $\mathrm{var}(Y_j \mid x_j) = \sigma^2 \beta_1^2 \{ E(Y_j \mid x_j) \}^2$, so that the model does assume constant coefficient of variation, where the coefficient of variation is $\sigma\beta_1$. Note further that

\[
\log Y = \log (\beta_0 + \beta_1/x)^{-1} + \log(1 + \beta_1 e) \approx \log (\beta_0 + \beta_1/x)^{-1} + \beta_1 e
\]

by a Taylor series in the second term about $e = 0$. Thus, this model is of the form in part (a) with $\lambda = \theta = 0$ and “errors” $\beta_1 e$ with standard deviation $\sigma\beta_1$ (variance $\sigma^2\beta_1^2$).

  5. We have

\[
f(x_j,\beta) \approx f(x_j,\beta^*) + f_\beta^T(x_j,\beta^*)(\beta - \beta^*),
\]

\[
f_\beta(x_j,\beta) \approx f_\beta(x_j,\beta^*) + f_{\beta\beta}(x_j,\beta^*)(\beta - \beta^*),
\]

\[
g^{-2}(\beta,\theta,x_j) \approx g^{-2}(\beta^*,\theta,x_j) + G^T(\beta^*,\theta,x_j)(\beta - \beta^*),
\]

where $G(\beta,\theta,x_j) = \partial/\partial\beta \, g^{-2}(\beta,\theta,x_j)$, say (a $(q \times 1)$ vector). Substituting these approximations into the GLS equation (3.12) on page 59 gives

\[
\sum_{j=1}^{n} \{ g^{-2}(\beta^*,\theta,x_j) + G^T(\beta^*,\theta,x_j)(\beta - \beta^*) \} \{ Y_j - f(x_j,\beta^*) - f_\beta^T(x_j,\beta^*)(\beta - \beta^*) \} \times \{ f_\beta(x_j,\beta^*) + f_{\beta\beta}(x_j,\beta^*)(\beta - \beta^*) \}
\]

\[
= \sum_{j=1}^{n} g^{-2}(\beta^*,\theta,x_j) \{ Y_j - f(x_j,\beta^*) \} f_\beta(x_j,\beta^*)
\]

\[
- \sum_{j=1}^{n} g^{-2}(\beta^*,\theta,x_j) f_\beta(x_j,\beta^*) f_\beta^T(x_j,\beta^*)(\beta - \beta^*)
\]

\[
+ \sum_{j=1}^{n} g^{-2}(\beta^*,\theta,x_j) \{ Y_j - f(x_j,\beta^*) \} f_{\beta\beta}(x_j,\beta^*)(\beta - \beta^*)
\]

\[
- \sum_{j=1}^{n} g^{-2}(\beta^*,\theta,x_j) f_\beta^T(x_j,\beta^*)(\beta - \beta^*) f_{\beta\beta}(x_j,\beta^*)(\beta - \beta^*)
\]

\[
+ \sum_{j=1}^{n} G^T(\beta^*,\theta,x_j)(\beta - \beta^*) \times \text{terms involving } \{ Y_j - f(x_j,\beta^*) \},\ (\beta - \beta^*).
\]

The first two terms are the leading ones: the first involves the data through $\{ Y_j - f(x_j,\beta^*) \}$ and the second is linear in $(\beta - \beta^*)$. The third depends on the product of $\{ Y_j - f(x_j,\beta^*) \}$ and $(\beta - \beta^*)$, which is expected to be “smaller” for $\beta^*$ close to $\beta$ than the first two terms. The fourth term is quadratic in $(\beta - \beta^*)$, so should also be “smaller” than the first two. The remaining terms involve at least products of $\{ Y_j - f(x_j,\beta^*) \}$ and $(\beta - \beta^*)$, so are also “smaller.” Thus, as in the argument in Section 3.2, we disregard these terms. Note that the presence of $\beta$ in the “weights” really doesn’t affect the form of the linear approximation at $\beta^*$. We are left with

\[
\sum_{j=1}^{n} g^{-2}(\beta^*,\theta,x_j) \{ Y_j - f(x_j,\beta^*) \} f_\beta(x_j,\beta^*) \approx \sum_{j=1}^{n} g^{-2}(\beta^*,\theta,x_j) f_\beta(x_j,\beta^*) f_\beta^T(x_j,\beta^*)(\beta - \beta^*),
\]

which may be rewritten in obvious matrix notation as

\[
\{ X^T(\beta^*) W(\beta^*) X(\beta^*) \} (\beta - \beta^*) \approx X^T(\beta^*) W(\beta^*) \{ Y - f(\beta^*) \},
\]

yielding the required updating scheme.
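The updating scheme amounts to solving the linear system above for $\beta - \beta^*$ and iterating. The following is a minimal sketch, assuming a two-parameter mean $f(x,\beta) = \exp(\beta_0 + \beta_1 x)$, a power-of-the-mean variance function $g = f^\theta$ with $\theta$ held fixed, a made-up data set and a hand-picked starting value. These are illustrative choices, not the models from the notes.

```python
import math

def mean(x, b):
    # Hypothetical mean model f(x, beta) = exp(b0 + b1 * x)
    return math.exp(b[0] + b[1] * x)

def grad(x, b):
    # f_beta(x, beta): one row of X(beta)
    f = mean(x, b)
    return (f, x * f)

def weight(x, b, theta):
    # g^{-2} under the assumed variance function g = f^theta
    return mean(x, b) ** (-2.0 * theta)

def gls_score(b, theta, xs, ys):
    # X^T W (Y - f), i.e. the left-hand side of the GLS estimating equation
    s0 = s1 = 0.0
    for x, y in zip(xs, ys):
        w = weight(x, b, theta)
        g0, g1 = grad(x, b)
        r = y - mean(x, b)
        s0 += w * r * g0
        s1 += w * r * g1
    return s0, s1

def update(b, theta, xs, ys):
    # Solve {X^T W X} delta = X^T W (Y - f) at the current value b (2 x 2 case)
    a00 = a01 = a11 = 0.0
    for x in xs:
        w = weight(x, b, theta)
        g0, g1 = grad(x, b)
        a00 += w * g0 * g0
        a01 += w * g0 * g1
        a11 += w * g1 * g1
    c0, c1 = gls_score(b, theta, xs, ys)
    det = a00 * a11 - a01 * a01
    d0 = (a11 * c0 - a01 * c1) / det
    d1 = (a00 * c1 - a01 * c0) / det
    return (b[0] + d0, b[1] + d1)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]     # made-up design points
ys = [1.7, 2.1, 3.2, 3.9, 5.6]     # made-up responses
theta = 0.5                        # assumed known here
beta = (0.4, 0.3)                  # hand-picked starting value
for _ in range(50):
    beta = update(beta, theta, xs, ys)
score = gls_score(beta, theta, xs, ys)
print(beta, score)
```

In this sketch the weights are re-evaluated at each new $\beta^*$, so at convergence the iterate solves the GLS equation with the weights evaluated at the solution. A fixed iteration count is used for simplicity; a convergence tolerance on the step size would be the more usual stopping rule.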

  6. (a) If you plot the data you will notice that there are two distinct phases of decay. It turns out that my algorithm uses the final 6 observations to fit the “second phase.” The remaining 5 observations are used to fit the “first phase.” The method I used is based on the hint.

I start with the last 3 observations (the farthest out in time) and fit a straight line to them by simple linear regression, using $\log Y$ as the response. This is based on the fact that, in this region, $Y \approx \beta_3 e^{-\beta_4 x}$, or $\log Y \approx \log \beta_3 - \beta_4 x = \beta_3^* + \beta_4^* x$, say. Thus, I obtain estimates for $\beta_3^*$ and $\beta_4^*$.

I then construct a 90% prediction interval for $\log Y$ at $x_0$ corresponding to the time of the next observation (backward in time). For this, I use the standard prediction interval formula for simple linear regression based on the fit with the $n = 3$ final values. That is, the estimated standard deviation of the prediction error at $x_0$ is

\[
\hat\sigma \left\{ 1 + n^{-1} + \frac{ (x_0 - \bar{x})^2 }{ \sum_{j=1}^{n} (x_j - \bar{x})^2 } \right\}^{1/2} = \mathrm{SD},
\]

where $\bar{x}$ is the mean of the $n$ $x$ values involved in the fit and $\hat\sigma^2$ is the usual estimator for constant variance in simple linear regression based on the $n$ observations. I obtain the 90% prediction interval as

\[
( \hat\beta_3^* + \hat\beta_4^* x_0 ) \pm t_{0.95}\, \mathrm{SD},
\]