











These notes describe the Cox proportional hazards model, focusing on the survivor and hazard functions. They cover the relationships between these functions, their estimation, and diagnostic statistics. The model assumes that the hazard function for an individual is the product of a baseline hazard function, shared by all individuals in the same stratum, and a factor that depends on the individual's covariates. The notes also discuss the estimation of the baseline survivor function and the use of diagnostic statistics.












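Cox (1972) first suggested models in which factors related to lifetime have a multiplicative effect on the hazard function. These models are called proportional hazards (PH) models. Under the proportional hazards assumption, the hazard function $h$ of $t$ given $x$ is of the form

$$h(t \mid x) = h_0(t)\, e^{x'\beta} \tag{1}$$

where $x$ is a known vector of regressor variables associated with the individual, $\beta$ is a vector of unknown parameters, and $h_0(t)$ is the baseline hazard function for an individual with $x = 0$. Hence, for any two covariate sets $x_1$ and $x_2$, the log hazard functions $\ln h(t \mid x_1)$ and $\ln h(t \mid x_2)$ differ by the constant $(x_1 - x_2)'\beta$ and are therefore parallel across time.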
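The proportionality in equation (1) can be illustrated with a minimal Python sketch. The coefficient values and covariate sets below are hypothetical and chosen only for illustration; the function name `hazard_ratio` is likewise not part of the source.

```python
import math

def hazard_ratio(x1, x2, beta):
    """Ratio h(t|x1) / h(t|x2) under the proportional hazards form (1)."""
    # The baseline hazard h0(t) cancels, so the ratio does not depend on t.
    linear_diff = sum(b * (a1 - a2) for a1, a2, b in zip(x1, x2, beta))
    return math.exp(linear_diff)

# Hypothetical coefficients and covariate sets (illustration only).
beta = [0.5, -0.2]
ratio = hazard_ratio([1.0, 3.0], [0.0, 3.0], beta)
same = hazard_ratio([1.0, 2.0], [1.0, 2.0], beta)
```

Here `ratio` equals `exp(0.5)` because the two covariate sets differ only in the first component, and `same` equals 1 because identical covariates give identical hazards.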
When a factor does not affect the hazard function multiplicatively, stratification may be useful in model building. Suppose that individuals can be assigned to one of $m$ different strata, defined by the levels of one or more factors. The hazard function for an individual in the $j$th stratum is defined as

$$h_j(t \mid x) = h_{0j}(t)\, e^{x'\beta}$$

There are two unknown components in the model: the regression parameter $\beta$ and the baseline hazard function $h_{0j}(t)$.
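We begin by considering a nonnegative random variable $T$ representing the lifetime of an individual. Let $f(t \mid x)$ denote the probability density function of $T$ given a regressor $x$, and let $S(t \mid x)$ be the survivor function (the probability of an individual surviving until time $t$). Hence

$$S(t \mid x) = \int_t^{\infty} f(u \mid x)\, du$$

The hazard function is then defined by

$$h(t \mid x) = \frac{f(t \mid x)}{S(t \mid x)} \tag{4}$$

Because $f(t \mid x) = -\,dS(t \mid x)/dt$, equation (4) gives another useful expression for $S(t \mid x)$ in terms of $h(t \mid x)$:

$$S(t \mid x) = \exp\left(-\int_0^t h(u \mid x)\, du\right)$$

Thus,

$$\ln S(t \mid x) = -\int_0^t h(u \mid x)\, du$$

For some purposes, it is also useful to define the cumulative hazard function

$$H(t \mid x) = \int_0^t h(u \mid x)\, du = -\ln S(t \mid x)$$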
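The relationship between the hazard, the cumulative hazard, and the survivor function can be checked numerically. The sketch below assumes an example hazard $h(t) = 2t$ (not taken from the source), for which $H(t) = t^2$ and $S(t) = \exp(-t^2)$, and integrates the hazard with the trapezoidal rule.

```python
import math

def cumulative_hazard(hazard, t, steps=10000):
    """Approximate H(t), the integral of hazard(u) from 0 to t (trapezoidal rule)."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * width)
    return total * width

def survivor(hazard, t):
    """S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(hazard, t))

def example_hazard(u):
    # Assumed example hazard h(u) = 2u, so that H(t) = t^2.
    return 2.0 * u

s_numeric = survivor(example_hazard, 1.5)
s_exact = math.exp(-(1.5 ** 2))
```

The two values agree up to rounding error, since the trapezoidal rule is exact for a linear hazard.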
Assume that the hazard function has the form of equation (1). The survivor function can be written as
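$$S(t \mid x) = \left[S_0(t)\right]^{\exp(x'\beta)} \tag{8}$$

where $S_0(t) = \exp\left(-\int_0^t h_0(u)\, du\right)$ is the baseline survivor function, that is, the survivor function for an individual with $x = 0$.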
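Equation (8) can be sketched in a few lines of Python. The baseline survival value and coefficient below are hypothetical. The sketch also checks a consequence of (8): taking $\ln(-\ln(\cdot))$ of both sides shows that the covariates shift the log-minus-log survival curve by $x'\beta$.

```python
import math

def survivor_given_x(baseline_survival, x, beta):
    """S(t|x) = S0(t) ** exp(x'beta), as in equation (8)."""
    risk_score = math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
    return baseline_survival ** risk_score

def log_minus_log(survival):
    """ln(-ln S); under (8) this is shifted by x'beta relative to the baseline."""
    return math.log(-math.log(survival))

beta = [0.7]   # hypothetical coefficient
s0 = 0.8       # hypothetical baseline survival at some fixed time t
s_x = survivor_given_x(s0, [1.0], beta)
shift = log_minus_log(s_x) - log_minus_log(s0)
```

With a positive coefficient the survival probability `s_x` is lower than the baseline, and `shift` recovers the linear predictor 0.7.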
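Let $t_{j1} < \cdots < t_{jk_j}$ be the distinct observed uncensored failure times in the $j$th stratum. The partial likelihood function is defined by

$$L(\beta) = \prod_{j=1}^{m} \prod_{i=1}^{k_j} \frac{e^{S_{ji}'\beta}}{\left(\sum_{l \in R_{ji}} w_l\, e^{x_l'\beta}\right)^{d_{ji}}} \tag{12a}$$

where $d_{ji}$ is the sum of case weights of individuals whose lifetime is equal to $t_{ji}$, $S_{ji}$ is the weighted sum of the regression vector $x$ for those $d_{ji}$ individuals, $w_l$ is the case weight of individual $l$, and $R_{ji}$ is the set of individuals alive and uncensored just prior to $t_{ji}$ in the $j$th stratum. Thus the log-likelihood arising from equation (12a) is

$$l = \ln L(\beta) = \sum_{j=1}^{m} \sum_{i=1}^{k_j} S_{ji}'\beta \;-\; \sum_{j=1}^{m} \sum_{i=1}^{k_j} d_{ji} \ln\left(\sum_{l \in R_{ji}} w_l\, e^{x_l'\beta}\right) \tag{12b}$$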
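As a rough illustration of the log-likelihood in (12b), the following Python sketch evaluates it for a small data set. The record layout `(time, event, x, weight, stratum)`, the helper `dot`, and the example records are assumptions made for this sketch; the risk set is taken to be every case in the stratum whose time is at least the failure time.

```python
import math
from collections import defaultdict

def dot(x, beta):
    return sum(xi * bi for xi, bi in zip(x, beta))

def partial_log_likelihood(records, beta):
    """Evaluate (12b). Each record is (time, event, x, weight, stratum);
    event is 1 for an observed failure and 0 for a censored time."""
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[rec[4]].append(rec)

    total = 0.0
    for cases in by_stratum.values():
        failure_times = sorted({t for t, event, _, _, _ in cases if event == 1})
        for t_ji in failure_times:
            # Risk set R_ji: cases in the stratum still under observation at t_ji.
            risk_sum = sum(w * math.exp(dot(x, beta))
                           for t, _, x, w, _ in cases if t >= t_ji)
            deaths = [(x, w) for t, event, x, w, _ in cases
                      if event == 1 and t == t_ji]
            d_ji = sum(w for _, w in deaths)
            s_beta = sum(w * dot(x, beta) for x, w in deaths)  # S_ji' beta
            total += s_beta - d_ji * math.log(risk_sum)
    return total

# Hypothetical records: one stratum, unit weights, one covariate.
example_records = [
    (1.0, 1, [1.0], 1.0, "a"),
    (2.0, 0, [0.0], 1.0, "a"),
    (3.0, 1, [0.0], 1.0, "a"),
]
loglik_at_zero = partial_log_likelihood(example_records, [0.0])
loglik_at_half = partial_log_likelihood(example_records, [0.5])
```

At $\beta = 0$ the value reduces to $-\ln 3$ for these records, because the first failure has three cases at risk and the second has only one.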
and the first derivatives of l are
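$$\frac{\partial l}{\partial \beta_r} = \sum_{j=1}^{m} \sum_{i=1}^{k_j} \left( S_{ji}^{(r)} - d_{ji}\, \frac{\sum_{l \in R_{ji}} w_l\, x_{lr}\, e^{x_l'\beta}}{\sum_{l \in R_{ji}} w_l\, e^{x_l'\beta}} \right), \qquad r = 1, \ldots, p \tag{13}$$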
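In equation (13), $S_{ji}^{(r)}$ is the $r$th component of $S_{ji} = (S_{ji}^{(1)}, \ldots, S_{ji}^{(p)})'$. The maximum partial likelihood estimate (MPLE) of $\beta$ is obtained by setting $\partial l / \partial \beta_r$ equal to zero for $r = 1, \ldots, p$, where $p$ is the number of independent variables in the model. The equations $\partial l / \partial \beta_r = 0$ ($r = 1, \ldots, p$) can usually be solved by using the Newton-Raphson method.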
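A minimal sketch of the Newton-Raphson iteration follows, restricted to a single covariate and a single stratum so that the information is a scalar. The case layout `(time, event, x, weight)`, the stopping rule (relative change in the estimate), and the example data are assumptions for illustration, not the procedure's actual implementation.

```python
import math

def score_and_information(cases, beta):
    """First derivative of the partial log-likelihood and the information,
    for one covariate. Each case is (time, event, x, weight)."""
    score, info = 0.0, 0.0
    for t_i in sorted({t for t, event, _, _ in cases if event == 1}):
        risk = [(x, w * math.exp(x * beta)) for t, _, x, w in cases if t >= t_i]
        s0 = sum(r for _, r in risk)
        s1 = sum(x * r for x, r in risk)
        s2 = sum(x * x * r for x, r in risk)
        deaths = [(x, w) for t, event, x, w in cases if event == 1 and t == t_i]
        d_i = sum(w for _, w in deaths)
        score += sum(w * x for x, w in deaths) - d_i * s1 / s0
        info += d_i * (s2 / s0 - (s1 / s0) ** 2)
    return score, info

def newton_raphson(cases, beta=0.0, tol=1e-8, max_iter=25):
    """Iterate beta <- beta + score / information until the step is small."""
    for _ in range(max_iter):
        score, info = score_and_information(cases, beta)
        step = score / info
        beta += step
        if abs(step) <= tol * max(abs(beta), 1e-8):
            break
    return beta

# Hypothetical data: four failures, unit weights, a binary covariate.
example_cases = [(1.0, 1, 1.0, 1.0), (2.0, 1, 0.0, 1.0),
                 (3.0, 1, 1.0, 1.0), (4.0, 1, 0.0, 1.0)]
beta_hat = newton_raphson(example_cases)
final_score, final_info = score_and_information(example_cases, beta_hat)
```

For these data the score equation reduces to $u^2 - u - 4 = 0$ with $u = e^{\beta}$, so the estimate can be checked against $\ln\big((1 + \sqrt{17})/2\big)$. The reciprocal of `final_info` is the corresponding variance estimate in this one-parameter case.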
The partial likelihood function in equation (12a) is invariant under translation of the covariates. All the covariates are centered by their corresponding overall mean. The overall mean of a covariate is defined as the sum of the product of weight and covariate for all the censored and uncensored cases in each stratum. For notational simplicity, $x_l$ used in the Estimation Section denotes centered covariates.
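Three convergence criteria for the Newton-Raphson method are available. The first is the absolute value of the largest difference in parameter estimates between iterations, $\delta$, divided by the value of the parameter estimate for the previous iteration; that is,

$$\left| \frac{\delta}{\text{parameter estimate for previous iteration}} \right|$$

The second is the absolute difference of the log-likelihood function between iterations divided by the log-likelihood function for the previous iteration. The third is the maximum number of iterations.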
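The asymptotic covariance matrix for the MPLE $\hat\beta = (\hat\beta_1, \ldots, \hat\beta_p)'$ is estimated by $I^{-1}$, where $I$ is the information matrix containing minus the second partial derivatives of $\ln L$. The $(r, s)$-th element of $I$ is defined by

$$I_{rs} = -\frac{\partial^2 \ln L}{\partial \beta_r\, \partial \beta_s} = \sum_{j=1}^{m} \sum_{i=1}^{k_j} d_{ji} \left[ \frac{\sum_{l \in R_{ji}} w_l\, x_{lr}\, x_{ls}\, e^{x_l'\beta}}{\sum_{l \in R_{ji}} w_l\, e^{x_l'\beta}} - \frac{\left(\sum_{l \in R_{ji}} w_l\, x_{lr}\, e^{x_l'\beta}\right)\left(\sum_{l \in R_{ji}} w_l\, x_{ls}\, e^{x_l'\beta}\right)}{\left(\sum_{l \in R_{ji}} w_l\, e^{x_l'\beta}\right)^2} \right]$$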
We can also write I in a matrix form as
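$$I = \sum_{j=1}^{m} \sum_{i=1}^{k_j} d_{ji}\, x(t_{ji})'\, V(t_{ji})\, x(t_{ji})$$

where $x(t_{ji})$ is the $n_{ji} \times p$ matrix of covariate values for the individuals in $R_{ji}$, $n_{ji}$ is the number of individuals in $R_{ji}$, and $V(t_{ji})$ is the $n_{ji} \times n_{ji}$ matrix with diagonal elements $w_l p_l (1 - w_l p_l)$ and off-diagonal elements $-w_l p_l\, w_k p_k$, with $p_l = e^{x_l'\beta} \big/ \sum_{h \in R_{ji}} w_h\, e^{x_h'\beta}$.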
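To estimate the baseline survivor function within a stratum, let $t_1 < \cdots < t_k$ denote the observed lifetimes and write $p_i = S_0(t_i+)$ for $i = 1, \ldots, k$, with $p_0 = 1$. The observed likelihood function is

$$L_1 = \prod_{i=1}^{k} \left\{ \prod_{l \in D_i} \left( p_{i-1}^{\,w_l \exp(x_l'\beta)} - p_i^{\,w_l \exp(x_l'\beta)} \right) \prod_{l \in C_i} p_{i-1}^{\,w_l \exp(x_l'\beta)} \right\} \prod_{l \in C_{k+1}} p_k^{\,w_l \exp(x_l'\beta)}$$

where $D_i$ is the set of individuals dying at $t_i$ and $C_i$ is the set of individuals with censored times in $[t_{i-1}, t_i)$. (Note that if the last observation is uncensored, $C_{k+1}$ is empty and $p_k = 0$.)

Letting $\alpha_i = p_i / p_{i-1}$ ($i = 1, \ldots, k$), $L_1$ can be written as

$$L_1 = \prod_{i=1}^{k} \left\{ \prod_{l \in D_i} \left( 1 - \alpha_i^{\,w_l \exp(x_l'\beta)} \right) \prod_{l \in R_i - D_i} \alpha_i^{\,w_l \exp(x_l'\beta)} \right\}$$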
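Differentiating $\ln L_1$ with respect to $\alpha_1, \ldots, \alpha_k$ and setting the equations equal to zero, we get

$$\sum_{l \in D_i} \frac{w_l \exp(x_l'\beta)}{1 - \alpha_i^{\,w_l \exp(x_l'\beta)}} = \sum_{l \in R_i} w_l \exp(x_l'\beta), \qquad i = 1, \ldots, k \tag{15}$$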
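We then plug the MPLE $\hat\beta$ of $\beta$ into equation (15) and solve these $k$ equations separately. There are two things worth noting:

If $D_i$ contains a single individual, $\hat\alpha_i$ can be solved explicitly:

$$\hat\alpha_i = \left( 1 - \frac{w_i \exp(x_i'\hat\beta)}{\sum_{l \in R_i} w_l \exp(x_l'\hat\beta)} \right)^{1 / \left(w_i \exp(x_i'\hat\beta)\right)}$$

If $D_i$ contains more than one individual, equation (15) must be solved by iteration. A good initial value for $\hat\alpha_i$ is

$$\hat\alpha_i = \exp\left( \frac{-d_i}{\sum_{l \in R_i} w_l \exp(x_l'\hat\beta)} \right) \tag{17}$$

where $d_i = \sum_{l \in D_i} w_l$ is the sum of case weights for the set $D_i$. Once the $\hat\alpha_i$, $i = 1, \ldots, k$, are found, $S_0(t)$ is estimated by:

$$\hat S_0(t) = \prod_{i:\, t_i < t} \hat\alpha_i$$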
Because equation (15) requires iterative calculation when ties exist, Breslow (1974) suggests using equation (17) as an estimate for $\alpha_i$; however, we will use this as an initial estimate.
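The estimation of the baseline survivor function can be sketched as follows. The source does not say which iterative scheme is used when ties exist; this sketch swaps in plain bisection, which needs no starting value, and shows the Breslow value from equation (17) only for comparison. Each case is assumed to carry a precomputed score $w_l \exp(x_l'\hat\beta)$; all names and data are hypothetical.

```python
import math

def breslow_alpha(d_i, risk_sum):
    """Breslow's approximation, equation (17)."""
    return math.exp(-d_i / risk_sum)

def alpha_hat(death_scores, risk_sum, tol=1e-12):
    """Solve equation (15) for one failure time.
    death_scores: w_l * exp(x_l'beta_hat) for the cases in D_i.
    risk_sum: sum of w_l * exp(x_l'beta_hat) over the risk set R_i."""
    if len(death_scores) == 1:
        c = death_scores[0]
        return (1.0 - c / risk_sum) ** (1.0 / c)  # explicit solution
    # With ties, the left side of (15) increases with alpha: bisect on (0, 1).
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # 1 - mid**c computed as -expm1(c * ln(mid)) for accuracy near 1.
        lhs = sum(c / -math.expm1(c * math.log(mid)) for c in death_scores)
        if lhs < risk_sum:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def baseline_survivor(cases, t):
    """Product of alpha_hat over failure times t_i < t.
    Each case is (time, event, score), score = w * exp(x'beta_hat)."""
    estimate = 1.0
    for t_i in sorted({time for time, event, _ in cases if event == 1}):
        if t_i >= t:
            break
        risk_sum = sum(c for time, _, c in cases if time >= t_i)
        deaths = [c for time, event, c in cases if event == 1 and time == t_i]
        estimate *= alpha_hat(deaths, risk_sum)
    return estimate

# Hypothetical cases with all scores equal to 1 (beta_hat = 0, unit weights).
example_cases = [(1.0, 1, 1.0), (1.0, 1, 1.0), (2.0, 0, 1.0),
                 (3.0, 1, 1.0), (4.0, 0, 1.0)]
s0_at_2 = baseline_survivor(example_cases, 2.0)
s0_at_3_5 = baseline_survivor(example_cases, 3.5)
```

When every score equals 1, equation (15) gives $\hat\alpha_i = 1 - d_i / n_i$, so the estimate coincides with the Kaplan-Meier product: 0.6 after the tied failures at time 1 and 0.3 after the failure at time 3.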
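The asymptotic variance of $-\ln \hat S_0(t)$ is given in Kalbfleisch and Prentice (1980). At a specified time $t$, it is consistently estimated by

$$\mathrm{var}\left(-\ln \hat S_0(t)\right) = \sum_{t_i < t} d_i \left( \sum_{l \in R_i} w_l \exp(x_l'\hat\beta) \right)^{-2} + a' I^{-1} a$$

where $I$ is the information matrix and $a$ is a $p \times 1$ vector with the $j$th element defined by

$$a_j = \sum_{t_i < t} d_i\, \frac{\sum_{l \in R_i} w_l\, x_{lj} \exp(x_l'\hat\beta)}{\left( \sum_{l \in R_i} w_l \exp(x_l'\hat\beta) \right)^2}$$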
The Wald statistic is calculated for the variables in the model to select variables for removal. The Wald statistic for variable $x_j$ is defined by

$$\hat\beta_j'\, B_{11,j}^{-1}\, \hat\beta_j$$

where $\hat\beta_j$ is the parameter estimate associated with $x_j$ and $B_{11,j}$ is the submatrix of $A_{11}^{-1}$ associated with $x_j$, with $A_{11}$ the block of the information matrix for the variables in the model.
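For a variable with a single parameter, the quadratic form reduces to the squared estimate divided by its variance. The sketch below covers only that scalar case; the numeric values are hypothetical.

```python
def wald_statistic(beta_hat, variance):
    """Wald statistic for a single-parameter variable:
    beta_hat squared divided by the estimated variance of beta_hat."""
    return beta_hat * beta_hat / variance

# Hypothetical estimate 0.94 with standard error 0.5 (variance 0.25).
wald = wald_statistic(0.94, 0.25)
```

This is the same as the square of the estimate divided by its standard error, here $(0.94 / 0.5)^2$.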
The LR statistic is defined as twice the log of the ratio of the likelihood functions of two models evaluated at their own MPLEs. Assume that $r$ variables are in the current model and let us call the current model the full model. Based on the MPLEs of the parameters for the full model, $l(\text{full})$ is defined in equation (12b). For each of the $r$ variables deleted from the full model, MPLEs are found and the reduced log-likelihood function, $l(\text{reduced})$, is calculated. The LR statistic is then defined as

$$-2\,\big(l(\text{reduced}) - l(\text{full})\big)$$
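A small sketch of the LR computation follows. The log-likelihood values are hypothetical, and the significance helper handles only the one-degree-of-freedom case, where the chi-square upper tail equals `erfc(sqrt(value / 2))`.

```python
import math

def lr_statistic(loglik_reduced, loglik_full):
    """LR statistic: -2 * (l(reduced) - l(full))."""
    return -2.0 * (loglik_reduced - loglik_full)

def chi2_upper_tail_1df(value):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(value / 2.0))

# Hypothetical log-likelihoods for a reduced and a full model.
lr = lr_statistic(-12.0, -10.0)
significance = chi2_upper_tail_1df(lr)
```

With these values the statistic is 4.0 and the significance is about 0.0455.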
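The conditional statistic is also computed for every variable in the model. The formula for the conditional statistic is the same as for the LR statistic, except that the parameter estimates for each reduced model are conditional estimates, not MPLEs.

The conditional estimates are defined as follows. Let $\hat\beta = (\hat\beta_1, \ldots, \hat\beta_r)'$ be the MPLEs for the $r$ variables (blocks) and $C$ be the asymptotic covariance for $\hat\beta$. The estimate for the parameters left in the model given $\hat\beta_i$ is

$$\tilde\beta_{(i)} = \hat\beta_{(i)} - C_{12}^{(i)} \left( C_{22}^{(i)} \right)^{-1} \hat\beta_i$$

where $\hat\beta_i$ is the MPLE for the parameter(s) associated with $x_i$, $\hat\beta_{(i)}$ is $\hat\beta$ without $\hat\beta_i$, $C_{12}^{(i)}$ is the covariance between the parameter estimates left in the model $\hat\beta_{(i)}$ and $\hat\beta_i$, and $C_{22}^{(i)}$ is the covariance of $\hat\beta_i$. Then the conditional statistic for variable $x_i$ is defined by

$$-2\,\big(l(\tilde\beta_{(i)}) - l(\text{full})\big)$$

where $l(\tilde\beta_{(i)})$ is the log-likelihood function evaluated at $\tilde\beta_{(i)}$.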
Note that all four of these statistics have an asymptotic chi-square distribution with degrees of freedom equal to the number of parameters the corresponding model has.
The initial model for the first method is a model that does not include covariates. The log-likelihood function $l$ is equal to

$$l(0) = -\sum_{j=1}^{m} \sum_{i=1}^{k_j} d_{ji} \ln\left(n^{*}_{ji}\right)$$

where $n^{*}_{ji}$ is the sum of weights of individuals in set $R_{ji}$.
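The initial log-likelihood needs only the failure weights and the risk-set weight totals, as the following sketch shows. The case layout `(time, event, weight, stratum)` and the example data are assumptions for illustration.

```python
import math

def initial_log_likelihood(cases):
    """l(0): minus the sum over failure times of d_ji * ln(n*_ji).
    Each case is (time, event, weight, stratum)."""
    strata = {}
    for time, event, weight, stratum in cases:
        strata.setdefault(stratum, []).append((time, event, weight))
    total = 0.0
    for members in strata.values():
        for t_ji in sorted({t for t, e, _ in members if e == 1}):
            d_ji = sum(w for t, e, w in members if e == 1 and t == t_ji)
            n_star = sum(w for t, _, w in members if t >= t_ji)
            total -= d_ji * math.log(n_star)
    return total

# Hypothetical data in two strata; one case in stratum "b" has weight 2.
example_cases = [
    (1.0, 1, 1.0, "a"), (2.0, 0, 1.0, "a"), (3.0, 1, 1.0, "a"),
    (1.0, 1, 2.0, "b"), (2.0, 1, 1.0, "b"),
]
l0 = initial_log_likelihood(example_cases)
```

Stratum "a" contributes $-\ln 3$ and stratum "b" contributes $-2 \ln 3$, so `l0` equals $-3 \ln 3$.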
When a stepwise method is requested, at each step, -2 log-likelihood function and three chi-square statistics (model chi-square, improvement chi-square, and overall chi-square) and their corresponding degrees of freedom and significance are printed.
For each of the single variables in the equation, the MPLE, the SE for the MPLE, the Wald statistic, and its corresponding df, significance, and partial R are given. For a single variable, R is defined by

$$R = \left[ \frac{\text{Wald} - 2}{-2\ \text{log-likelihood for the initial model}} \right]^{1/2} \times \text{sign of MPLE}$$

if Wald > 2. Otherwise R is set to zero. For a multiple category variable, only the Wald statistic, df, significance, and partial R are printed, where R is defined by

$$R = \left[ \frac{\text{Wald} - 2\,\text{df}}{-2\ \text{log-likelihood for the initial model}} \right]^{1/2}$$

if Wald > 2 df. Otherwise R is set to zero.
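The two definitions of partial R can be combined in one small function. In this sketch a variable with df equal to 1 is treated as a single variable and receives the sign of its MPLE; that shortcut, the argument names, and the numeric values are assumptions for illustration.

```python
import math

def partial_r(wald, df, neg2_loglik_initial, mple_sign=1):
    """Partial R; returns 0 when Wald does not exceed 2 * df.
    neg2_loglik_initial is -2 times the initial log-likelihood."""
    if wald <= 2 * df:
        return 0.0
    r = math.sqrt((wald - 2 * df) / neg2_loglik_initial)
    # The sign of the MPLE is attached only for single-parameter variables.
    return math.copysign(r, mple_sign) if df == 1 else r

# Hypothetical values: Wald = 6 with a negative MPLE, -2 log-likelihood = 100.
r_single = partial_r(6.0, 1, 100.0, mple_sign=-1)
r_multi = partial_r(8.0, 2, 100.0)
r_small = partial_r(1.5, 1, 100.0)
```

Here `r_single` is -0.2, `r_multi` is 0.2, and `r_small` is 0 because the Wald statistic does not exceed 2.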
For each of the variables not in the equation, the Score statistic is calculated and its corresponding degrees of freedom, significance, and partial R are printed. The partial R for variables not in the equation is defined similarly to the R for the variables in the equation by changing the Wald statistic to the Score statistic. There is one overall statistic called the residual chi-square. This statistic tests if all regression coefficients for the variables not in the equation are zero. It is defined by
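$$D'\, B_{22}\, D$$

where $D$ is the vector of first derivatives of the partial log-likelihood with respect to all the parameters not in the equation, evaluated at the MPLE $\hat\beta$, and $B_{22}$ is equal to $\left( A_{22} - A_{21} A_{11}^{-1} A_{12} \right)^{-1}$.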
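The estimation of the baseline survivor function $S_0$ has been discussed in equations (15) through (18). It is easy to see from the relationship between the survivor function and the cumulative hazard function that the baseline cumulative hazard function is estimated by

$$\hat H_0(t) = -\ln \hat S_0(t)$$

and, for a given $x$,

$$\hat H(t \mid x) = \exp(x'\hat\beta)\, \hat H_0(t), \qquad \hat S(t \mid x) = \left[ \hat S_0(t) \right]^{\exp(x'\hat\beta)}$$

The asymptotic variances are

$$\mathrm{var}\big(\hat H(t \mid x)\big) = \exp(2 x'\hat\beta)\, \mathrm{var}\big(\hat H_0(t)\big)$$

and

$$\mathrm{var}\big(\hat S(t \mid x)\big) = \exp(2 x'\hat\beta)\, \big[\hat S(t \mid x)\big]^2\, \mathrm{var}\big(\hat H_0(t)\big)$$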
∆βl = − −v rl l
m
where
w w w
v d p t t t t
t p t p t
m u v v
n
l i l i l i i i i
k
i i n i
l l l l
ji
j
ji
=
−
diag , ,
1
1
1
1
x x w p
p
model evaluated at t (^) i , and n (^) ji is the number of individuals in R (^) ji.
Partial residuals can only be computed for the covariates which are not time dependent. At time $t_i$ in stratum $j$, $x_g$ is the $p \times 1$ observed covariate vector for any $g$th individual in set $D_i$, where $D_i$ is the set of individuals dying at $t_i$. The partial residual $\gamma_g$ is defined by

$$\gamma_g = (\gamma_{g1}, \ldots, \gamma_{gp})' = x_g - \frac{\sum_{l \in R_i} w_l\, x_l \exp(x_l'\hat\beta)}{\sum_{l \in R_i} w_l \exp(x_l'\hat\beta)}$$

Rewriting the above formula in a univariate form, we get

$$\gamma_{gh} = x_{gh} - \frac{\sum_{l \in R_i} w_l\, x_{lh} \exp(x_l'\hat\beta)}{\sum_{l \in R_i} w_l \exp(x_l'\hat\beta)}, \qquad h = 1, \ldots, p, \quad g \in D_i$$

where $x_{gh}$ is the $h$th component of $x_g$. For every variable, the residuals can be plotted against times to test the proportional hazards assumption.
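The univariate form can be sketched for one covariate as the observed value minus a weighted mean over the risk set. The tuple layout `(x_lh, eta_l, w_l)`, where `eta_l` stands for the linear predictor $x_l'\hat\beta$, and the example values are assumptions for illustration.

```python
import math

def partial_residual(x_gh, risk_set):
    """Observed covariate value minus its weighted mean over the risk set.
    risk_set: list of (x_lh, eta_l, w_l) with eta_l = x_l' beta_hat."""
    numerator = sum(w * x * math.exp(eta) for x, eta, w in risk_set)
    denominator = sum(w * math.exp(eta) for _, eta, w in risk_set)
    return x_gh - numerator / denominator

# Hypothetical risk set with all linear predictors equal to zero.
example_risk_set = [(1.0, 0.0, 1.0), (0.0, 0.0, 1.0), (3.0, 0.0, 2.0)]
residual = partial_residual(1.0, example_risk_set)
```

With zero linear predictors the weighted mean is $(1 + 0 + 6) / 4 = 1.75$, so the residual for an individual with covariate value 1 is -0.75.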
The residuals $\hat e_i$ are computed by

$$\hat e_i = \hat H_0(t_i)\, \exp(x_i'\hat\beta)$$

which is the same as the estimate of the cumulative hazard function.
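Since the residual is the estimated cumulative hazard at the case's own time and covariates, it can be sketched from a baseline survival value. The sketch assumes the baseline cumulative hazard is obtained as minus the log of the baseline survival; the function name and numbers are hypothetical.

```python
import math

def cumulative_hazard_residual(baseline_survival_at_t, x, beta_hat):
    """Estimated cumulative hazard at the case's time and covariates."""
    baseline_cum_hazard = -math.log(baseline_survival_at_t)
    linear_predictor = sum(a * b for a, b in zip(x, beta_hat))
    return baseline_cum_hazard * math.exp(linear_predictor)

# Hypothetical baseline survival 0.5 at the case's time, one covariate.
e_baseline = cumulative_hazard_residual(0.5, [0.0], [0.3])
e_case = cumulative_hazard_residual(0.5, [1.0], [0.3])
```

For a case with all covariates at zero the residual is $\ln 2$, and in general it equals minus the log of the estimated survival probability for that case.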
For a specified pattern, the covariate values $x_c$ are determined and $x_c'\hat\beta$ is computed. There are three plots available in COXREG.

$$\hat S(t_i \mid x_c) = \left[ \hat S_0(t_i) \right]^{\exp(x_c'\hat\beta)}$$
Cox, D. R. 1972. Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34: 187–220.
Kalbfleisch, J. D., and Prentice, R. L. 1980. The statistical analysis of failure time data. New York: John Wiley & Sons, Inc.
Lawless, J. F. 1982. Statistical models and methods for lifetime data. New York: John Wiley & Sons, Inc.
Storer, B. E., and Crowley, J. 1985. A diagnostic for Cox regression and general conditional likelihoods. Journal of the American Statistical Association, 80: 139–147.