






Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
A solution to a problem comparing the wald statistic for θ and log(θ) in an exponential regression model. It includes the likelihood function, the profile likelihood of β1 and θ, and the calculation of the wald statistic for log(θ) and θ. The document also discusses the appropriateness of the wald statistic based on the quadratic approximation of the log-likelihood function.
Typology: Assignments
1 / 10
This page cannot be seen from the preview
Don't miss anything!







Stat613: Intermediate Theory of Statistics
Instructor: Jianhua Huang Students: Anna Dao TA: Seokho Lee Yogesh Dwivedi
Exercise 11.21: Find the profile likelihood of θ = θ 1 /θ 2 for the rat data example using the exponential model; report the approximate 95% CI for θ. Compare the Wald statistic based on θ̂ and log(θ̂); which is more appropriate?
Solution
Problem setting: Two groups of rats were exposed to carcinogen, and number of days (yi + 100) to censoring or to death due to cancer was recorded. Two groups were distinguished by a pretreatment regimen (x 1 =
... = xn 1 = 0, xn 1 +1 =... = xn = 1), where n = n 1 + n 2. In this setting, n 1 = 19, n 2 = 21, n = 40.
Subject Time (Y ) Uncensored (δ) Treatment (X) 1 43 1 0 .. .
n 1 144 0 0 n 1 + 1 42 1 1 .. .
n 244 0 1
We assume
yi ∼ Exponential(θi)
and apply the concept of generalized linear model, in particular the log-link
log(θi) = xTi β (1)
Based on the observed data (y 1 , δ 1 , x 1 ),... , (yn, δn, xn), the likelihood function of the regression parameter β can be written as
L(β) =
∏^ n
i=
{pθi (yi)}δi^ {Pθi (Yi > yi)}^1 −δi^ (2)
where θ is a function of β.
Problem solving: From Eq.(1)
log(θi) = β 0 , i = 1,... , n 1 log(θi) = β 0 + β 1 , i = n 1 + 1,... , n
That is, log(θ 1 ) := log(θi) or θ 1 = eβ^0 for i = 1,... , n 1
and log(θ 2 ) := log(θi) or θ 2 = eβ^0 +β^1 for i = n 1 + 1,... , n
We recognize the following relation:
log(θ) = −β 1 and θ = e−β^1
Therefore, to find the profile likelihood of θ is to find the profile likelihood of β 1. For that reason, we will present the right hand side of the Eq.(2) as a function of β. After this, β̂ 0 will be presented as a function of β 1 in the likelihood. That is, the likelihood function become the profile likelihood of β 1. Then via transformation θ = eβ^1 , we will obtain the profile likelihood of θ.
From the model assumption, we have:
pθi (yi) =
θi
eyi/θi
Pθi (Ti > yi) = e−yi/θi
and the log likelihood l(β) from Eq.(2) becomes
l(β) =
∑^ n
i=
−δi log(θi) −
yiδi θi
(1 − δi)yi θi
∑^ n
i=
−δi(xTi β) −
yi ex Ti β
∑^ n^1
i=
−δiβ 0 − yie−β^0 +
∑^ n
i=n 1 +
−δi(β 0 + β 1 ) − yie−β^0 −β^1
Let d 1 =
∑n 1 i=1 δi,^ d^2 =^
∑n i=n 1 +1 δi^ and^ v^1 =^
∑n 1 i=1 yi^ ,^ v^2 =^
∑n 2 i=n 1 +1 yi. We obtain
l(β) = −d 1 β 0 − v 1 e−β^0 − d 2 (β 0 + β 1 ) − v 2 e−β^0 −β^1 (3)
Then
∂l ∂β 0
= −d 1 + v 1 e−β^0 − d 2 + v 2 e−β^0 −β^1
From setting ∂l/∂β 0 = 0, we have
e− β̂ 0 = d 1 + d 2 v 1 + v 2 e−β^1
or equivalently
β̂ 0 = log
( (^) v 1 +^ v 2 e−β^1 d 1 + d 2
Plug this result in Eq.(3), we gain the profile log likelihood of β 1
l(β 1 ) = d 1 log
( (^) d 1 +^ d 2 v 1 + v 2 e−β^1
− v 1
d 1 + d 2 v 1 + v 2 e−β^1
log
( (^) d 1 +^ d 2 v 1 + v 2 e−β^1
− β 1
− v 2
d 1 + d 2 v 1 + v 2 e−β^1
e−β^1
= (d 1 + d 2 ) ln
( (^) d 1 +^ d 2 v 1 eβ^1 + v 2
Note that e−β^1 = θ and β 1 = − log θ, the profile log likelihood of θ is
l(θ) = (d 1 + d 2 ) ln
( (^) d 1 +^ d 2 v 1 θ + v 2
0.5 1.0 1.5 2.
−
−
−
−
−
−
0
(a) Log−likelihood of θθ
θθ
l(θθ
)
0.5 1.0 1.5 2.
−
−
−
−
−
−
0
(b) Log−likelihood of θθ
θθ
l(θθ
)
Figure 1: ( a) Poor quadratic approximations based on θ̂ (dotted) of the log-likelihood function of θ (solid). (b) Good quadratic approximations based on log ̂θ (dotted) of the log-likelihood function of θ (solid).
Exponential model: Note that from the score equations in Exercise 11.25 we can obtain
e− β̂ 0 =
d 1 v 1
or β̂ 0 = log
( (^) v 1 d 1
e− β̂ 1 =
d 2 v 1 d 1 v 2
or β̂ 1 = log
( (^) d 1 v 2 d 2 v 1
That is, β̂ 0 = log(2195/17) = 4.861 and β̂ 1 = log(2923 × 17 /(2195 × 19)) = 0. 175
General model: Under the General Extreme Distribution, the estimate of the regression parameters were obtained using the ”optim” function in R. The function that was maximized was the profile likelihood of σ, as given in Exercise 11.27 solution.
l(σ) = −(d 1 + d 2 ) log(σ) +
∑^ n
i=
δi
qi − xi β̂ σ
∑^ n
i=
exp
qi − xi β̂ σ
which was maximized at ̂σ = 0.322. Using this value and the following relation calculated:
β̂ 1 = σ log
d 1 v 2 ∗ d 2 v 1 ∗
, where v∗ 1 =
∑^ n^1
i=
y 1 /σ i , v
∗ 2 =
n (^1) ∑+n 2
i=n 1 +
y 1 /σ i
β̂ 0 = σ log
d 1 v 1 ∗ + d 2 v∗ 1 d 1 (d 1 + d 2 )
the estimate of β 1 and β 0 was found to be 0.213 and 4.874 respectively.
Alternative Approach (To be explained in class):The estimates of β 0 and β 1 could also be found using the following R function, which was obtained from Author’s website :
require(survival5)
xdat<- scan(’rat.dat’,skip=1, what=list(group=0,surv=0,status=0)) attach(xdat) t1 <- 100 surv<- xdat$surv- t reg1<- survreg(Surv(surv,status)~factor(group), dist=’exponential’) print(summary(reg1)) reg2<- survreg(Surv(surv,status)~factor(group), dist=’weibull’) print(summary(reg2))
Exercise 11.25: Derive the theoretically the Fisher information for the regression parameter (β 0 , β 1 ) under the exponential and the general extreme-value distributions in Section 11.6.
Solution
Exponential Model: From the full likelihood in Eq.(3), we compute the following derivatives
∂l(β) ∂β 0 = −(d 1 + d 2 ) + e−β^0 v 1 + e−β^0 −β^1 v 2
∂l(β) ∂β 1 = −d 2 + e−β^0 −β^1 v 2
The second derivatives with respect to β 0 and β 1 are
∂^2 l(β) ∂^2 β 0
= −e−β^0 v 1 − e−β^0 −β^1 v 2
∂^2 l(β) ∂^2 β 1
= −e−β^0 −β^1 v 2
∂^2 l(β) ∂β 0 ∂β 1 = −e−β^0 −β^1 v 2
Therefore, the Fisher information is
I(β) = e−β^0 −β^1
v 2 + eβ^1 v 1 v 2 v 2 v 2
General Model: To get the Fisher Information of (β 0 ,β 1 ) under the General Extreme Value Distribution, note that if Y is exponential is with mean θ, then
log(Y ) = β 0 + W
where β 0 = log(θ) and W has the standard extreme value distribution with density
p(w) = ew^ exp(−ew)
and survival function P (W > w) = exp(−ew)
A more flexible regression model between a covariate vector xi and outcome Yi can be developed by assuming
log(Yi) = log(θi) + σWi
where W (^) i′ s are iid with standard extreme value distribution and σ is a scale parameter. Let Q = log(Y ), Then the pdf of Y given x is given by
fQ(qi|xi) =
σ
exp
qi − μ(xi) σ
− exp
qi − μ(xi) σ
, - ∞ < qi < ∞, μ(xi) = log(θi)
Exercise 11.26: Compute the Fisher information for the observed data, and verify the standard errors given in the Example 11.6. Check the quadratic approximation of the profile likelihood of the slope parameter β̂ 1.
Solution
Exponential model: From Exercise 11.26, we know that
I(β) = e−β^0 −β^1
v 2 + eβ^1 v 1 v 2 v 2 v 2
Employ the information of β̂ 0 and β̂ 1 , the observed Fisher information becomes
I(β̂) = e− β̂ 0 −̂β 1
v 2 + ê^ β^1 v 1 v 2 v 2 v 2
d 2 v 2 v 2 +^
d 2 v 2
d 1 v 2 d 2 v 1 v^1
d 2 v 2 v 2 d 2 v 2 v 2
d 2 v 2 v 2
d 1 + d 2 d 2 d 2 d 2
The Fisher information from the observed data demonstrates that only the uncensored events are informative, which is intuitive.
The variance matrix of the estimates is just the inverse of the observed Fisher information matrix.
V ar(β̂) =
d 1 −^
1 d 1 − (^) d^11 d^11 + (^) d^12
Then se(β̂ 0 ) = 1/
17 = 0.243 and se(β̂ 1 ) = 1/
General model: To compute the Fisher Information (under the General Extreme Value Distribution) for the observed data, we first get
I(β,̂ ̂σ) =
Define:
which gives
I∗(β 0 , β 1 ) =
−1.0 −0.5 0.0 0.5 1.0 1.
−
−
−
0
Profile log−likelihood of ββ_
ββ_
l(ββ
_1)
Figure 2: ( a) Good quadratic approximations based on β̂ 1 (dotted) of the log-likelihood function of β 1 (solid).
Now, using the ”solve” function in R, we take the inverse of I∗(β 0 , β 1 ), and we get the covariance matrix as
I∗(β 0 , β 1 ) =
Now, under the General Extreme Value Distribution the se(β̂ 0 ) =
√^0 .006123487 = 0.078, and^ se(β̂^1 ) = 0 .011379024 = 0.107, which is same as the result in Example 11.6, except for discrepancy at the 3rd decimal digit.
Now, to check the quadratic approximation of the the profile likelihood of the slope parameter β 1 , we try to see how well is the following approximation:
l(β 1 ) − l(β̂ 1 ) ≈ −
I(β̂ 1 )(β 1 − β̂ 1 )^2
The approximation above is investigated graphically, Fig.(2), by plotting l(β 1 ) − l(β̂ 1 ) and superimposing −(1/2)I(β̂ 1 )(β 1 − β̂ 1 )^2 to see how well it follows.
Exercise 11.27: Find the profile likelihood of the scale parameter σ in the Example 11.6, and verify the likelihood ratio statistic W = 44.8. Report also the Wald test H 0 : σ = 1, and discuss whether it is appropriate.
Solution
We will use the l(β, σ) from Ex 11.25 to find theprofile log likelihood of σ. Notice,
l(β, σ) = log L(β, σ) = −(d 1 + d 2 ) log(σ) +
∑^ n
i=
δi
qi − xiβ σ
∑^ n
i=
exp
qi − xiβ σ
0.25 0.30 0.35 0.40 0.45 0.50 0.
−
−
−
−
−
−
0
Profile log−likelihood of σσ
σσ
l(σσ
)
Figure 3: ( a) Poor quadratic approximations based on β̂ (dotted) of the log-likelihood function of β (solid).
Now, the observed information, I(̂σ) is computed by finding I(σ) at the ̂σ. Using ’R’ I(̂σ) = 1404.174, therefore se(̂σ) = 1/
1404 .174 = 0.026. Therefore the Wald Statistic is -26.15. Hence using Wald test for H 0 : σ = 1 vs Ha : σ 6 = 1, is rejected with a p-value 0 .001. The Wald test is not appropriate owing to the poor quadratic approximation of the log profile likelihood as evident Fig.(3).