Comparison of Wald Statistic for θ and log(θ) in Exponential Regression Model - Prof. Jian, Assignments of Statistics

A solution to a problem comparing the wald statistic for θ and log(θ) in an exponential regression model. It includes the likelihood function, the profile likelihood of β1 and θ, and the calculation of the wald statistic for log(θ) and θ. The document also discusses the appropriateness of the wald statistic based on the quadratic approximation of the log-likelihood function.

Typology: Assignments

Pre 2010

Uploaded on 02/10/2009

koofers-user-w5r
koofers-user-w5r 🇺🇸

10 documents

1 / 10

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Stat613: Intermediate Theory of Statistics
Homework 9 (due 11/07/07)
Instructor: Jianhua Huang Students: Anna Dao
TA: Seokho Lee Yogesh Dwivedi
Exercise 11.21: Find the profile likelihood of θ=θ12for the rat data example using the exponential
model; report the approximate 95% CI for θ. Compare the Wald statistic based on b
θand log(b
θ); which is
more appropriate?
Solution
Problem setting: Two groups of rats were exposed to carcinogen, and number of days (yi+100) to censoring
or to death due to cancer was recorded. Two groups were distinguished by a pretreatment regimen (x1=
. . . =xn1= 0, xn1+1 =. . . =xn= 1), where n=n1+n2. In this setting, n1= 19, n2= 21, n = 40.
Subject Time (Y) Uncensored (δ) Treatment (X)
1 43 1 0
.
.
..
.
..
.
..
.
.
n1144 0 0
n1+ 1 42 1 1
.
.
..
.
..
.
..
.
.
n244 0 1
We assume
yiExponential(θi)
and apply the concept of generalized linear model, in particular the log-link
log(θi) = xT
iβ(1)
Based on the observed data (y1, δ1, x1), . . . , (yn, δn, xn), the likelihood function of the regression parameter
βcan be written as
L(β) =
n
Y
i=1{pθi(yi)}δi{Pθi(Yi> yi)}1δi(2)
where θis a function of β.
Problem solving: From Eq.(1)
log(θi) = β0, i = 1, . . . , n1
log(θi) = β0+β1, i =n1+ 1, . . . , n
That is,
log(θ1) := log(θi) or θ1=eβ0for i= 1, . . . , n1
1
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Comparison of Wald Statistic for θ and log(θ) in Exponential Regression Model - Prof. Jian and more Assignments Statistics in PDF only on Docsity!

Stat613: Intermediate Theory of Statistics

Homework 9 (due 11/07/07)

Instructor: Jianhua Huang Students: Anna Dao TA: Seokho Lee Yogesh Dwivedi

Exercise 11.21: Find the profile likelihood of θ = θ 1 /θ 2 for the rat data example using the exponential model; report the approximate 95% CI for θ. Compare the Wald statistic based on θ̂ and log(θ̂); which is more appropriate?

Solution

Problem setting: Two groups of rats were exposed to carcinogen, and number of days (yi + 100) to censoring or to death due to cancer was recorded. Two groups were distinguished by a pretreatment regimen (x 1 =

... = xn 1 = 0, xn 1 +1 =... = xn = 1), where n = n 1 + n 2. In this setting, n 1 = 19, n 2 = 21, n = 40.

Subject Time (Y ) Uncensored (δ) Treatment (X) 1 43 1 0 .. .

n 1 144 0 0 n 1 + 1 42 1 1 .. .

n 244 0 1

We assume

yi ∼ Exponential(θi)

and apply the concept of generalized linear model, in particular the log-link

log(θi) = xTi β (1)

Based on the observed data (y 1 , δ 1 , x 1 ),... , (yn, δn, xn), the likelihood function of the regression parameter β can be written as

L(β) =

∏^ n

i=

{pθi (yi)}δi^ {Pθi (Yi > yi)}^1 −δi^ (2)

where θ is a function of β.

Problem solving: From Eq.(1)

log(θi) = β 0 , i = 1,... , n 1 log(θi) = β 0 + β 1 , i = n 1 + 1,... , n

That is, log(θ 1 ) := log(θi) or θ 1 = eβ^0 for i = 1,... , n 1

and log(θ 2 ) := log(θi) or θ 2 = eβ^0 +β^1 for i = n 1 + 1,... , n

We recognize the following relation:

log(θ) = −β 1 and θ = e−β^1

Therefore, to find the profile likelihood of θ is to find the profile likelihood of β 1. For that reason, we will present the right hand side of the Eq.(2) as a function of β. After this, β̂ 0 will be presented as a function of β 1 in the likelihood. That is, the likelihood function become the profile likelihood of β 1. Then via transformation θ = eβ^1 , we will obtain the profile likelihood of θ.

From the model assumption, we have:

pθi (yi) =

θi

eyi/θi

Pθi (Ti > yi) = e−yi/θi

and the log likelihood l(β) from Eq.(2) becomes

l(β) =

∑^ n

i=

−δi log(θi) −

yiδi θi

(1 − δi)yi θi

∑^ n

i=

−δi(xTi β) −

yi ex Ti β

∑^ n^1

i=

−δiβ 0 − yie−β^0 +

∑^ n

i=n 1 +

−δi(β 0 + β 1 ) − yie−β^0 −β^1

Let d 1 =

∑n 1 i=1 δi,^ d^2 =^

∑n i=n 1 +1 δi^ and^ v^1 =^

∑n 1 i=1 yi^ ,^ v^2 =^

∑n 2 i=n 1 +1 yi. We obtain

l(β) = −d 1 β 0 − v 1 e−β^0 − d 2 (β 0 + β 1 ) − v 2 e−β^0 −β^1 (3)

Then

∂l ∂β 0

= −d 1 + v 1 e−β^0 − d 2 + v 2 e−β^0 −β^1

From setting ∂l/∂β 0 = 0, we have

e− β̂ 0 = d 1 + d 2 v 1 + v 2 e−β^1

or equivalently

β̂ 0 = log

( (^) v 1 +^ v 2 e−β^1 d 1 + d 2

Plug this result in Eq.(3), we gain the profile log likelihood of β 1

l(β 1 ) = d 1 log

( (^) d 1 +^ d 2 v 1 + v 2 e−β^1

− v 1

d 1 + d 2 v 1 + v 2 e−β^1

  • d 2

log

( (^) d 1 +^ d 2 v 1 + v 2 e−β^1

− β 1

− v 2

d 1 + d 2 v 1 + v 2 e−β^1

e−β^1

= (d 1 + d 2 ) ln

( (^) d 1 +^ d 2 v 1 eβ^1 + v 2

  • d 1 β 1 − (d 1 + d 2 )

Note that e−β^1 = θ and β 1 = − log θ, the profile log likelihood of θ is

l(θ) = (d 1 + d 2 ) ln

( (^) d 1 +^ d 2 v 1 θ + v 2

  • d 1 log θ − (d 1 + d 2 )

0.5 1.0 1.5 2.

0

(a) Log−likelihood of θθ

θθ

l(θθ

)

0.5 1.0 1.5 2.

0

(b) Log−likelihood of θθ

θθ

l(θθ

)

Figure 1: ( a) Poor quadratic approximations based on θ̂ (dotted) of the log-likelihood function of θ (solid). (b) Good quadratic approximations based on log ̂θ (dotted) of the log-likelihood function of θ (solid).

Exponential model: Note that from the score equations in Exercise 11.25 we can obtain

e− β̂ 0 =

d 1 v 1

or β̂ 0 = log

( (^) v 1 d 1

e− β̂ 1 =

d 2 v 1 d 1 v 2

or β̂ 1 = log

( (^) d 1 v 2 d 2 v 1

That is, β̂ 0 = log(2195/17) = 4.861 and β̂ 1 = log(2923 × 17 /(2195 × 19)) = 0. 175

General model: Under the General Extreme Distribution, the estimate of the regression parameters were obtained using the ”optim” function in R. The function that was maximized was the profile likelihood of σ, as given in Exercise 11.27 solution.

l(σ) = −(d 1 + d 2 ) log(σ) +

∑^ n

i=

δi

qi − xi β̂ σ

∑^ n

i=

exp

qi − xi β̂ σ

which was maximized at ̂σ = 0.322. Using this value and the following relation calculated:

β̂ 1 = σ log

d 1 v 2 ∗ d 2 v 1 ∗

, where v∗ 1 =

∑^ n^1

i=

y 1 /σ i , v

∗ 2 =

n (^1) ∑+n 2

i=n 1 +

y 1 /σ i

β̂ 0 = σ log

d 1 v 1 ∗ + d 2 v∗ 1 d 1 (d 1 + d 2 )

the estimate of β 1 and β 0 was found to be 0.213 and 4.874 respectively.

Alternative Approach (To be explained in class):The estimates of β 0 and β 1 could also be found using the following R function, which was obtained from Author’s website :

require(survival5)

xdat<- scan(’rat.dat’,skip=1, what=list(group=0,surv=0,status=0)) attach(xdat) t1 <- 100 surv<- xdat$surv- t reg1<- survreg(Surv(surv,status)~factor(group), dist=’exponential’) print(summary(reg1)) reg2<- survreg(Surv(surv,status)~factor(group), dist=’weibull’) print(summary(reg2))

Exercise 11.25: Derive the theoretically the Fisher information for the regression parameter (β 0 , β 1 ) under the exponential and the general extreme-value distributions in Section 11.6.

Solution

Exponential Model: From the full likelihood in Eq.(3), we compute the following derivatives

∂l(β) ∂β 0 = −(d 1 + d 2 ) + e−β^0 v 1 + e−β^0 −β^1 v 2

∂l(β) ∂β 1 = −d 2 + e−β^0 −β^1 v 2

The second derivatives with respect to β 0 and β 1 are

∂^2 l(β) ∂^2 β 0

= −e−β^0 v 1 − e−β^0 −β^1 v 2

∂^2 l(β) ∂^2 β 1

= −e−β^0 −β^1 v 2

∂^2 l(β) ∂β 0 ∂β 1 = −e−β^0 −β^1 v 2

Therefore, the Fisher information is

I(β) = e−β^0 −β^1

v 2 + eβ^1 v 1 v 2 v 2 v 2

General Model: To get the Fisher Information of (β 0 ,β 1 ) under the General Extreme Value Distribution, note that if Y is exponential is with mean θ, then

log(Y ) = β 0 + W

where β 0 = log(θ) and W has the standard extreme value distribution with density

p(w) = ew^ exp(−ew)

and survival function P (W > w) = exp(−ew)

A more flexible regression model between a covariate vector xi and outcome Yi can be developed by assuming

log(Yi) = log(θi) + σWi

where W (^) i′ s are iid with standard extreme value distribution and σ is a scale parameter. Let Q = log(Y ), Then the pdf of Y given x is given by

fQ(qi|xi) =

σ

exp

[

qi − μ(xi) σ

− exp

qi − μ(xi) σ

)]

, - ∞ < qi < ∞, μ(xi) = log(θi)

Exercise 11.26: Compute the Fisher information for the observed data, and verify the standard errors given in the Example 11.6. Check the quadratic approximation of the profile likelihood of the slope parameter β̂ 1.

Solution

Exponential model: From Exercise 11.26, we know that

I(β) = e−β^0 −β^1

v 2 + eβ^1 v 1 v 2 v 2 v 2

Employ the information of β̂ 0 and β̂ 1 , the observed Fisher information becomes

I(β̂) = e− β̂ 0 −̂β 1

v 2 + ê^ β^1 v 1 v 2 v 2 v 2

d 2 v 2 v 2 +^

d 2 v 2

d 1 v 2 d 2 v 1 v^1

d 2 v 2 v 2 d 2 v 2 v 2

d 2 v 2 v 2

d 1 + d 2 d 2 d 2 d 2

The Fisher information from the observed data demonstrates that only the uncensored events are informative, which is intuitive.

The variance matrix of the estimates is just the inverse of the observed Fisher information matrix.

V ar(β̂) =

d 1 −^

1 d 1 − (^) d^11 d^11 + (^) d^12

Then se(β̂ 0 ) = 1/

17 = 0.243 and se(β̂ 1 ) = 1/

General model: To compute the Fisher Information (under the General Extreme Value Distribution) for the observed data, we first get

I(β,̂ ̂σ) =

Define:

I 11 =

I 12 =

I 22 =

I 21 =

which gives

I∗(β 0 , β 1 ) =

−1.0 −0.5 0.0 0.5 1.0 1.

0

Profile log−likelihood of ββ_

ββ_

l(ββ

_1)

Figure 2: ( a) Good quadratic approximations based on β̂ 1 (dotted) of the log-likelihood function of β 1 (solid).

Now, using the ”solve” function in R, we take the inverse of I∗(β 0 , β 1 ), and we get the covariance matrix as

I∗(β 0 , β 1 ) =

Now, under the General Extreme Value Distribution the se(β̂ 0 ) =

√^0 .006123487 = 0.078, and^ se(β̂^1 ) = 0 .011379024 = 0.107, which is same as the result in Example 11.6, except for discrepancy at the 3rd decimal digit.

Now, to check the quadratic approximation of the the profile likelihood of the slope parameter β 1 , we try to see how well is the following approximation:

l(β 1 ) − l(β̂ 1 ) ≈ −

I(β̂ 1 )(β 1 − β̂ 1 )^2

The approximation above is investigated graphically, Fig.(2), by plotting l(β 1 ) − l(β̂ 1 ) and superimposing −(1/2)I(β̂ 1 )(β 1 − β̂ 1 )^2 to see how well it follows.

Exercise 11.27: Find the profile likelihood of the scale parameter σ in the Example 11.6, and verify the likelihood ratio statistic W = 44.8. Report also the Wald test H 0 : σ = 1, and discuss whether it is appropriate.

Solution

We will use the l(β, σ) from Ex 11.25 to find theprofile log likelihood of σ. Notice,

l(β, σ) = log L(β, σ) = −(d 1 + d 2 ) log(σ) +

∑^ n

i=

δi

qi − xiβ σ

∑^ n

i=

exp

qi − xiβ σ

0.25 0.30 0.35 0.40 0.45 0.50 0.

0

Profile log−likelihood of σσ

σσ

l(σσ

)

Figure 3: ( a) Poor quadratic approximations based on β̂ (dotted) of the log-likelihood function of β (solid).

Now, the observed information, I(̂σ) is computed by finding I(σ) at the ̂σ. Using ’R’ I(̂σ) = 1404.174, therefore se(̂σ) = 1/

1404 .174 = 0.026. Therefore the Wald Statistic is -26.15. Hence using Wald test for H 0 : σ = 1 vs Ha : σ 6 = 1, is rejected with a p-value  0 .001. The Wald test is not appropriate owing to the poor quadratic approximation of the log profile likelihood as evident Fig.(3).