Cox Proportional Hazards Model: Survivor and Hazard Functions, Study notes of Mathematical Statistics

The cox proportional hazards model, focusing on the survivor and hazard functions. It covers the relationships between these functions, their estimation, and diagnostic statistics. The model assumes the hazard function for an individual is proportional to an expiring risk, with the baseline hazard function being constant for individuals with the same covariates. The document also discusses the estimation of the baseline survivor function and the use of diagnostic statistics.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar
sangawar 🇮🇳

4.5

(4)

118 documents

1 / 19

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
COXREG
Cox (1972) first suggested the models in which factors related to lifetime have a
multiplicative effect on the hazard function. These models are called proportional
hazards (PH) models.
Under the proportional hazards assumption, the hazard function h of t given X is
of the form
ht h t exx
27 16
=
0β (1)
where x is a known vector of regressor variables associated with the individual, β
is a vector of unknown parameters, and ht
016 is the baseline hazard function for an
individual with x=0. Hence, for any two covariates sets x1 and x2, the log
hazard functions ht x1
27
and ht x2
27
should be parallel across time.
When a factor does not affect the hazard function multiplicatively, stratification
may be useful in model building. Suppose that individuals can be assigned to one of
m different strata, defined by the levels of one or more factors. The hazard function
for an individual in the jth stratum is defined as
ht h te
jj
|xx
16 16
=
0β (2)
There are two unknown components in the model: the regression parameter β and the
baseline hazard function ht
j016. The estimation for the parameters is described below.
Estimation
We begin by considering a nonnegative random variable T representing the
lifetimes of individuals in some population. Let ft|x
16
denote the probability
density function (pdf) of T given a regressor x and let St|x
16
be the survivor
function (the probability of an individual surviving until time t). Hence
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13

Partial preview of the text

Download Cox Proportional Hazards Model: Survivor and Hazard Functions and more Study notes Mathematical Statistics in PDF only on Docsity!

1

Cox (1972) first suggested the models in which factors related to lifetime have a multiplicative effect on the hazard function. These models are called proportional hazards (PH) models. Under the proportional hazards assumption, the hazard function h of t given X is of the form

h t 2 x 7 = h 0 1 6t e x ′β^ (1)

where x is a known vector of regressor variables associated with the individual, β

is a vector of unknown parameters, and h 0 1 6t is the baseline hazard function for an

individual with x = 0. Hence, for any two covariates sets x 1 and x 2 , the log

hazard functions h t 2 x 17 and h t 2 x 27 should be parallel across time.

When a factor does not affect the hazard function multiplicatively, stratification may be useful in model building. Suppose that individuals can be assigned to one of m different strata, defined by the levels of one or more factors. The hazard function for an individual in the jth stratum is defined as

h j 1 t | x 6 = h 0 j1 6t e x ′β^ (2)

There are two unknown components in the model: the regression parameter β^ and the

baseline hazard function h 0 j1 6t. The estimation for the parameters is described below.

Estimation

We begin by considering a nonnegative random variable T representing the

lifetimes of individuals in some population. Let f t 1 | x 6 denote the probability

density function (pdf) of T given a regressor x and let S t 1 | x 6 be the survivor

function (the probability of an individual surviving until time t). Hence

S t f u du t

1 |^ x^ 6 =^1 | x 6

I

The hazard h t 1 | x 6 is then defined by

h t

f t S t

x

x x

Another useful expression for S t 1 | x 6 in terms of h t 1 | x 6 derived from equations (3)

and (4) is

S t h u du

t

1 |^ x^ 6 =^ exp^ − 1 | x 6

I 

0

Thus,

ln S t | h u | du

t

1 x 6 = −I 1 x 6

0

For some purposes, it is also useful to define the cumulative hazard function

H t h u du S t

t

1 | x 6 = I 1 | x 6 = −ln 1 | x 6

0

Assume that the hazard function has the form of equation (1). The survivor function can be written as

S t | S t

exp x

x

′ 0

β (8)

where S 0 1 6t is the baseline survivor function defined by

S 0 1 6t = exp 2 −H 0 1 6t 7 (9)

where d (^) ji is the sum of case weights of individuals whose lifetime is equal to t (^) ji

and S (^) ji is the weighted sum of the regression vector x for those d (^) ji individuals,

wl is the case weight of individual l, and R (^) ji is the set of individuals alive and

uncensored just prior to t (^) ji in the jth stratum. Thus the log-likelihood arising from

equation (12a) is

l L (^) ji d w e i

k

j

m ji l i l R

k

j

j m l

ji

j = = ′ −

= =

= = ∈

ln β ∑ ∑ β ∑∑ ln ∑

1 6 S x β

1 1 1 1

(12b)

and the first derivatives of l are

D

l S d

w x e

w e r r^ p

l ji l

ji

j

r

ji

r ji

l lr l R

l l R

i

k

j

m β

∂ ∂ β

′ ∈ ′

= =

x

x

β

β 1 1

, 1 , K, (13)

In equation (13) , S (^) ji

1 6r

is the rth component of S (^) ji ji ji p

=  S S 

1 6 , K, 1 6. The

maximum partial likelihood estimate (MPLE) of β is obtained by setting

∂ β

l r equal to zero for r = 1, K ,p, where^ p^ is the number of independent variables in the

model. The equations

∂ β

l r p r

= 0 1 =1, K, 6 can usually be solved by using the

Newton-Raphson method.

Note that from equation (12a) the partial likelihood function L 1 6β is invariant

under translation. All the covariates are centered by their corresponding overall

mean. The overall mean of a covariate is defined as the sum of the product of

weight and covariate for all the censored and uncensored cases in each stratum. For

notational simplicity, x l used in the Estimation Section denotes centered

covariates.

Three convergence criteria for the Newton-Raphson method are available:

  • Absolute value of the largest difference in parameter estimates between

iterations1 6 δ divided by the value of the parameter estimate for the previous

iteration; that is,

BCON

parameter estimate for previous iteration

δ

  • Absolute difference of the log-likelihood function between iterations divided by the log-likelihood function for previous iteration.
  • Maximum number of iterations.

The asymptotic covariance matrix for the MPLE β =$^ $^ , , $

4 β 1 K^ βp 9 is estimated by

I −^1 where I is the information matrix containing minus the second partial

derivatives of ln L. The (r, s)-th element of I is defined by

I

x

x

x x

x

rs r s

ji

l ls lr l R

l l R

l lr l R

l ls l R

l l R

i

k

j

m

E L

d

w x x e

w e

w x e w x e

w e

l ji l

ji

l ji

l ji

l ji

j

′ ∈ ′

′ ∈

′ ∈

′ ∈

= =

∂ β ∂ β

2

2 1 1

ln

β

β

β β

β

We can also write I in a matrix form as

I (^) rs d (^) ji x t (^) ji V t (^) ji x tji i

k

j

m j = ′ = =

∑ ∑^4 3 89 3 8 3 8^4

1 1

Writing S$ 0 1 t i + 6 =pi 1 i = 1, K,k 6 , the observed likelihood function is of the form

L p (^) i p (^) i p p

w

l D

i

w

l C

k

w

i l C

k l l l

i

l l

i

l l

k

1 1 1 (^1 )

K

'K

K

*K^

′ ′

= ∈

exp 1 x β 6 exp 1 x β 6 exp 1 x β 6 exp 1 x β 6

where Di is the set of individuals dying at t (^) i and Ci is the set of individuals with

censored times in t i − 1 , ti 6. (Note that if the last observation is uncensored, C k + 1

is empty and p (^) k = 0 .)

If we let α i = pi pi− 1 1 i = 1, K,k 6 , L 1 can be written as

L (^) i

w

i l D

k i

w

l R D

l l

i

l l

i i

1 1

= ∈

∈ −

exp 1 x β 6 exp 1 x β 6

Differentiating ln L 1 with respect to α 1 , K ,αk and setting the equations equal to zero, we get

w l l w i k

l D i

l l l R l i i

exp exp , , exp

′ ∈ ∈

x x x

β β β

α

K (15)

We then plug the MPLE β$ of β into equation (15) and solve these k equations separately. There are two things worth noting:

  • If any Di = 1 , α$ (^) i can be solved explicitly.

exp $

exp $

exp $

α (^) i

i i

l l l R

w

w i

l

− ′

x

x

x

β

β

β

  • If Di > 1 , equation (7) must be solve iteratively for α$ (^) i. A good initial value

for α$ (^) i is

$ (^) exp exp $

α (^) i i l l l R

d w i

∑ 4 x^^ β^9

where d (^) i wl l Di

∑ is the weight sum for set^ Di^. (See Lawless, 1982, p. 361.)

Once the α$ i , i = 1, K ,k are found, S 0 1 6t is estimated by

:

S t (^) i i t (^) it

0 1 6^

<

∏α^ (18)

Since the above estimate of S 0 1 6t requires some iterative calculations when ties

exist, Breslow (1974) suggests using equation (17) as an estimate for α (^) i ; however, we will use this as an initial estimate.

The asymptotic variance for − ln S$ 0 1 6t can be found in Chapter 4 of Kalbfleisch

and Prentice (1980). At a specified time t, it is consistently estimated by

var − ln $^ = exp ′$

<

S t ∑Di ∑w l l a

t (^) i t l Ri

0

2

1 6^1

4 9 4 x^ β^9 a I (19)

where a is a p × 1 vector with the jth element defined by

D

w x

w

i

l lj l l R

l l l R

t t

i

i

i

exp $

exp $

<

x

x

β

β

2

Wald Statistic

The Wald statistic is calculated for the variables in the model to select variables for removal. The Wald statistic for variable x (^) j is defined by

β′ (^) j B 11 ,j βj

where β$ (^) j is the parameter estimate associated with x (^) j and B 11, j is the submatrix of A 11 −^1 associated with x (^) j.

LR (Likelihood Ratio) Statistic

The LR statistic is defined as twice the log of the ratio of the likelihood functions of two models evaluated at their own MPLES. Assume that r variables are in the current model and let us call the current model the full model. Based on the MPLES of parameters for the full model, l(full) is defined in equation (12b). For each of r variables deleted from the full model, MPLES are found and the reduced log-likelihood function, l(reduced), is calculated. Then LR statistic is defined as

–2(l(reduced) – l(full))

Conditional Statistic

The conditional statistic is also computed for every variable in the model. The formula for conditional statistic is the same as LR statistic except that the parameter estimates for each reduced model are conditional estimates, not MPLES.

The conditional estimates are defined as follows. Let β$^ = β$^ , , β$

4 1 K^ r 9 be the

MPLES for the r variables (blocks) and C be the asymptotic covariance for the parameters left in the model given β$i is

β (^) i β (^) i β i i

1 6 1 6 i

= − 1 6^  1 6

C (^) 12 C 22

1

where β$i is the MPLE for the parameter(s) associated with x i and β$ 1 6i is β$

without β$i , C 12

1 6^ i

is the covariance between the parameter estimates left in the

model β$ 1 6i and β$i , and C 22

1 6^ i

is the covariance of β$i. Then the conditional statistic for variable x i is defined by

− 2  l i −l full

4 β 1 69 1^6

where l (^) i

4 β 1 6 9 is the log-likelihood function evaluated at^

β 1 6i.

Note that all these four statistics have a chi-square distribution with degrees of freedom equal to the number of parameters the corresponding model has.

Statistics

Initial Model Information

The initial model for the first method is for a model that does not include covariates. The log-likelihood function l is equal to

l d (^) ji nji i

k

j

m j 0 1 1

1 6 = −^4

= =

∑ ∑ ln

where n ∗ji^ is the sum of weights of individuals in set R (^) ji.

Model Information

When a stepwise method is requested, at each step, -2 log-likelihood function and three chi-square statistics (model chi-square, improvement chi-square, and overall chi-square) and their corresponding degrees of freedom and significance are printed.

Information for Variables in the Equation

For each of the single variables in the equation, MPLE, SE for MPLE, Wald statistic, and its corresponding df, significance, and partial R are given. For a single variable, R is defined by

R =

# ×

Wald 2 log - likelihood for the intial model

sign of MPLE

1 2

if Wald > 2. Otherwise R is set to zero. For a multiple category variable, only the Wald statistic, df, significance, and partial R are printed, where R is defined by

R =

Wald df 2 log - likelihood for the intial model

1 2

if Wald > 2 df. Otherwise R is set to zero.

Information for the Variables Not in the Equation

For each of the variables not in the equation, the Score statistic is calculated and its corresponding degrees of freedom, significance, and partial R are printed. The partial R for variables not in the equation is defined similarly to the R for the variables in the equation by changing the Wald statistic to the Score statistic. There is one overall statistic called the residual chi-square. This statistic tests if all regression coefficients for the variables not in the equation are zero. It is defined by

u ′4 9β$^ B 22 u 4 9β$

where u 4 9β$ is the vector of first derivatives of the partial log-likelihood function

with respect to all the parameters not in the equation evaluated at MPLE β$ and B 22 is equal to A (^) 22 A (^) 21 A (^) 111 A^12

1 − −^

4 9 and^ A^ is defined in equation^ (21).

Survival Table

For each stratum, the estimates of the baseline cumulative survival 1 6S 0 and

hazard 1 H 06 function and their standard errors are computed. The estimate S$ 0 for

S 0 has been discussed in equations (15) through (18). It is easy to see from

equation (9) that H 0 1 6t is estimated by

H^ $^0 1 6t = −ln S$ 0 1 6t

and the asymptotic variance of H$ 0 1 6t is defined in equation (19). Finally, the

cumulative hazard function H t 1 | x 6 and survival function S t 1 | x 6 are estimated by

H t^ $^1 | x 6 = exp 4 x ′β$^9 H$ 0 1 6t

and, for a given x ,

$ (^) | $ exp^ $

$ exp $ S t S t S t

a x

x x

′ (^) ∗ ′ − 0 0

β β

The asymptotic variances are

var 4 H t$^1 | x 69 = exp 4 2 x ′β$^9 var 4 H$ 0 1 6t 9

and

var 4 S t$^1 | x 69 = exp 4 2 x ′$^ 9 1 4 S t $^ | x 69 var 4 H $ 1 6t 9

2 β 0

∆βl = − −v rl l

m

I

where

w w w

v d p t t t t

t p t p t

m u v v

n

l i l i l i i i i

k

i i n i

l l l l

ji

j

ji

=

diag , ,

1

1

1

1

K
K

x x w p

p

I

and x ′1 6t i is an n ji × pmatrix which represents the p covariate variables in the

model evaluated at t (^) i , and n (^) ji is the number of individuals in R (^) ji.

Partial Residuals

Partial residuals can only be computed for the covariates which are not time dependent. At time t (^) i in stratum j, x (^) g is the p × 1 observed covariate vector for any gth individual in set Di , where Di is the set of individuals dying at t (^) i. The partial residual γ (^) g is defined by

γ (^) g

g

gp

= g p ti

γ

γ

1

K x 1 6 x

Rewriting the above formula in a univariate form, we get

γ (^) gh gh

l lh l l R

l l l R

x i

w x

w

i h p g D

i

exp $

exp $^

x

x

β

β

1 K

where x (^) gh is the hth component for x (^) g. For every variable, the residuals can be plotted against times to test the proportional hazards assumption.

Residuals

The residuals ei are computed by

ei = H t$^1 i | x i 6 = exp 4 x i′ β$^9 4 H$ 0 1 6ti 9

which is the same as the estimate of the cumulative hazard function.

Plots

For a specified pattern, the covariate values x c are determined and x ′c β$ is computed. There are three plots available in COXREG.

Survival Plot

For stratum j, 4 t i , S$ 0 1 t i | x c 69 , i = 1, K ,k j are plotted where

$ (^) | $ exp^

$ S t (^) i x c (^) S ti c

x

′ 0

β

Cox, D. R. 1972. Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34: 187–220.

Kalbfleisch, J. D., and Prentice, R. L. 1980. The statistical analysis of failure time data. New York: John Wiley & Sons, Inc.

Lawless, J. F. 1982. Statistical models and methods for lifetime data. New York: John Wiley & Sons, Inc.

Storer, B. E., and Crowley, J. 1985. A diagnostic for Cox regression and general conditional likelihoods. Journal of the American Statistical Association, 80: 139–147.