Mixed Effects Model: Computing Covariance Matrices and Derivatives, Study notes of Mathematical Statistics

An overview of the computational steps involved in estimating the parameters of a mixed effects model, with a focus on computing the covariance matrices and their derivatives. The use of newton's algorithm and scoring methods, as well as the computation of the likelihood functions for maximum likelihood estimation (mle) and restricted maximum likelihood estimation (reml). It also discusses the use of confidence intervals for the estimated covariance parameters.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar
sangawar 🇮🇳

4.5

(4)

118 documents

1 / 24

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
MIXED
Overview
This document summarizes the computational algorithms discussed in Wolfinger,
Tobias and Sall (1994).
Notation
Overall covariance parameter vector
G
T A vector of covariance parameters associated with random effects.
k A vector of covariance parameters associated with the k-th random
effect.
R A vector of covariance parameters associated with the residual term.
K
Number of random effects.
r
S Number of repeated subjects.
k
S Number of subjects in k-th random effect.
)(T
V
The n×n covariance matrix of y. This matrix is sometimes denoted by
)(T
y
V. The subscript is removed here for a cleaner notation.
)(
s
V
First derivative of )(
V
with respect to the s-th parameter in T.
)(
st
V
Second derivative of )(T
V
with respect to s and t-th parameter in T.
)( R
R The n×n covariance matrix of H.
)( Rs TR
First derivative of )( R
R with respect to the s-th parameter in R
T
)( Rst TR
Second derivative of )( R
TRwith respect to s and t-th parameter in
R
T.
)( G
G The covariance matrix of random effects.
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18

Partial preview of the text

Download Mixed Effects Model: Computing Covariance Matrices and Derivatives and more Study notes Mathematical Statistics in PDF only on Docsity!

1

Overview

This document summarizes the computational algorithms discussed in Wolfinger,

Tobias and Sall (1994).

Notation

Overall covariance parameter vector

„G

A vector of covariance parameters associated with random effects.

k

A vector of covariance parameters associated with the k-th random

effect.

R

A vector of covariance parameters associated with the residual term.

K Number of random effects.

S (^) r Number of repeated subjects.

S (^) k Number of subjects in k-th random effect.

V („) The n×n covariance matrix of^ y. This matrix is sometimes denoted by

Vy („). The subscript is removed here for a cleaner notation.

V ^ s( ) First derivative of^ V (^ )with respect to the s-th parameter in^ „^.

V ^ st( ) Second derivative of^ V^ („)with respect to s and t-th parameter in^ „^.

R ( (^) R) The n×n covariance matrix of^ r.

R ^ s ( „R ) First derivative of^ R (^^ R)with respect to the s-th parameter in^ „R

R ^ st ( „R ) Second derivative of^ R (^^ „R)with respect to s and t-th parameter in

„R.

G ( (^) G ) The covariance matrix of random effects.

G ^ s ( G ) First derivative of^ G^ (^ G )with respect to s-th parameter in^ „^ G.

G ^ st ( (^) G ) Second derivative of^ G (^^ G )with respect to s and t-th parameter in

„G.

V k (^) („ (^) k) The covariance matrix of the k-th random effect for^ one^ random subject.

V ^ k,s ( „k ) First derivative of^ V k^ ( „^ k)with respect to s-th parameter in^ „k^.

V k,st („ (^) k)

 (^) Second derivative of ( ) V k (^) „k with respect to s and t-th parameter in

„k.

y (^) n by 1 vector of dependent variable.

X n by p design matrix of fixed effects.

Z n by q design matrix of random effects.

r (^) n by 1 vector of residual.

i p by 1 vector of fixed effects parameters.

t (^) q by 1 vector of random effects parameters.

r (^) n by 1 vector of residual error.

Wc n by n diagonal matrix of case weights.

Wrw n by n diagonal matrix of regression weights.

Model

In this document, we assume a mixed effect model of the form

y = X + Z + (1)

In this model, we assume that r is distributed as N[ 0, R ( R )] and t is

independently distributed as N[ 0, G ( G )]. Therefore y is distributed as

N[ X V ( )], where ( ) ( ) ( R )

T V = ZG G Z + R. The unknown parameters

include the regression parameters in i and covariance parameters in „.

Estimation of these model parameters relies on the use of a Newton-Ralphson or

scoring algorithm. When we use either algorithm for finding MLE or REML

Number of columns in the design matrix Z (^) kj (j-th random subject of k-th random

effect) is equal to number of levels of the k-th random effect variable.

Under this partition, the G( „G ) will be a block diagonal matrix which can be

expressed as

G ( (^) G ) k 1 [ I (^) S Vk ( k )] k

K .

It should also be noted that each random effect has its own parameter vector „k ,

k=1,..., K , and there are no functional constraints between elements in these

parameter vectors. Thus „ G = („ 1 ,...,„ K ).

When there are correlated random effects, Zkj will be a combined design matrix

of the correlated random effects. Therefore in subsequent sections, each random

effect can either be one single random effect or a set of correlated random effects.

Repeated Subjects:

When the REPEATED subcommand is used, R („R ) will be a block diagonal

matrix where i-th block is R (^) i („R ), i=1,..., S R. That is,

( ) i 1 i( )

R R (^) R R

S =⊕ =

The dimension of R (^) i („R ) will be equal to number of cases in one repeated

subject but all R (^) i („ (^) R)share the same parameter vector „ (^) R.

Likelihood Functions

Recall that the –2 times log likelihood of MLE is

i„ „ „ „ „

nlog 2

2 ( , ) log|V( )| r( ) V( ) r( )

T 1 MLE

− 4 (2)

and the –2 times log likelihood of REML is

log| | (n p)log 2

(^2) REML( ) log| ( )| ( ) () ( )

X’V X

V r V r

1

T 1 4 (3)

where n is the number of observations and p is the rank of fixed effect design

matrix. From (2) and (3), we can see that the key component of the likelihood

functions are

( ) log| ’ ( ) |.

( ) log| ( )|

1 3

1 2

1

XV X

r V r

V

θ θ

θ θ θ θ

θ θ





(4)

Therefore, in each estimation iteration, we need to compute 41 ( „), 4 2 („)and

4 3 („)as well as their 1

st and 2

nd derivatives with respective to „.

Newton & Scoring Algorithms

Covariance parameters in „ can be found by maximizing (2) or (3), however, there

are no closed form solutions in general. Therefore Newton and scoring algorithms

are used to find the solution numerically. The algorithm is outlined as below,

  1. Compute starting value and initial log-likelihood (REML or ML).
  2. Compute gradient vector g and Hessian matrix H of the log-likelihood

function using last iteration's estimate „i (^) − 1. (See later section for computation

of g and H)

  1. Compute the new step d H g

1 = −.

  1. Let = 1.
  2. Compute estimates of i-th iteration (^) i = (^) i − 1 + d.

Relative Hessian convergence:

g H gi

1 i

T i 1



| (^4) i |

( if (^4) i 20 , the denominator will be replaced by 1)

Starting value of Newton’s Algorithm

If no prior information is available, we can choose the initial values of G and R

to be the identity. However, it is highly desirably to estimate the scale of the

parameter. By ignoring the random effects, and assuming the residual errors are

i.i.d. with variance

2 † , we can fit a GLM model and estimate

2 † by the residual

sum of squares

2 †ˆ. Then we choose the starting value of Newton’s algorithm to be

2

K

G (^) k and 1

2

K
R.

Confidence Intervals of Covariance Parameters

The estimate „ˆ^ (ML or REML) is asymptotically normally distributed. Its variance

covariance matrix can be approximated by

1 2 H

− − , where H is the hessian

matrix of the log-likelihood function evaluated at „ˆ^. A simple Wald’s type

confidence interval for any covariance parameter can be obtained by using the

asymptotic normality of the parameter estimates, however it is not very appropriate

for variance parameters and correlation parameters that have a range of [ 0 ,∞) and

[− 1 , 1 ]respectively. Therefore these parameters are transformed to parameters that

have range ( −∞, ∞). Using uniform delta method, see for example (van der Vaart,

1998), these transformed estimates still have asymptotic normal distributions.

Suppose we are estimation a variance parameter

2 † by

2 †ˆ (^) n that is distributed as

N[ ,Var(ˆ )]

2 n

2 † † asymptotically. The transformation we used is log( )

2 † which

can correct the skewness of

2 †ˆ (^) n, moreover log( ˆ )

2 †n has the range ( −∞,∞)

which matches that of normal distribution. Using delta method, one can show that

the asymptotic distribution of log( ˆ )

2 † (^) n is N[log( ), Var(ˆ )]

2 n

2 4 † † †

. Thus, a

( 1 − h) 100 %confidence interval of log( )

2 † is given by

[log( ˆ ) z Var(ˆ ) , log(ˆ ) z Var(ˆ )]

2 n

2 1 / 2 n

2 n

2 n

2 1 / 2 n

2 †n (^) D † † † D † †

− −

− − − +

where z 1 −D / 2 is the upper ( 1 − h/ 2 ) percentage point of standard normal

distribution. By inverting this confidence interval, a ( 1 − h) 100 % confidence

interval for

2 † is given by

[exp (log( ˆ ) z Var(ˆ )) , exp(log( ˆ ) z Var(ˆ ))]

2 n

2 1 / 2 n

2 n

2 n

2 1 / 2 n

2 †n (^) D † † † D † †

− −

− − − +

When we need a confidence interval for a correlation parameter , a possible

transformation will be its generalized logit

arctan h( ) = 0. 5 log[( 1 + )/( 1 − )]. The resulting confidence interval for

will be

[tanh( arctan h ( )ˆ z ( 1 ˆ ) Var(ˆ)) ,tanh(arctan h( )ˆ z ( 1 ˆ ) Var(ˆ))]

2 1 1 / 2

2 1

(^1) D/ 2

D

− −

− − − − − −

Fixed and Random Effect Parameters

Estimation and prediction

After we obtain an estimate of „ , best linear unbiased estimator (BLUE) of i and

best linear unbiased predictor (BLUP) of t can be found by solving the mixed

model equations, Henderson (1984).

− − −

− −

Z R y

XR y

ZR X Z R X G
X R X X R Z

T 1

T 1

T 1 T 1 1

T 1 T 1

t

i (5)

for some hypothesis matrix L. Estimators or predictors of Lb can easily be

constructed by substituting iˆ^ and tˆ into (8) and its variance covariance matrix can

be approximated by

T LCL. If L 1 is zero and L 0 is estimable, Lb ˆ^ is called

the best linear unbiased estimator of L 0. If L 1 is nonzero and L 0 is

estimable, Lb ˆ^ is called the best linear unbiased predictor of Lb.

To test the hypothesis H 0 : Lb = a for a given vector a , we can use the statistic

( ) ( )

q

ˆ a ˆ ˆ

T − − =

Lb a (LCL) Lb F

T 1

(9)

where q is the rank of the matrix L. The statistic in (9) has an approximate F

distribution. The numerator degrees of freedom is q and the denominator degree

of freedom can be obtained by Satterthwaite (1946) approximation. The method

outlined below is similar to Giesbrecht and Burns (1985), McLean and Sanders

(1988), and Fai and Cornelius (1996).

Satterthwaite’s Approximation

To find the denominator degrees of freedom of (9), we first perform the spectral

decomposition LC L 2 B D B

ˆ T T

where B is an orthogonal matrix of

eigenvectors and (^) D is a diagonal matrix of eigenvalues. If we let (^4) m be the m-th

row of B L , dm be the m-th eigenvalues and

m

T gm g

1

2 m m (ˆ)

2 d

− ∑

where

T T

ˆ

T m m m

C

g

and

1 ( ˆ)

− T „ is the covariance matrix of the estimated

covariance parameters. If we let

q

m 1

m m

m I( 2) 2

E

then the denominator degree of freedom is given by

E q

2 E

Note that the degrees of freedom can only be computed when E>q.

Type I &III Statistics

Type I or III test statistics are special cases of custom hypothesis tests.

Estimated Marginal Means (EMMEANS)

Estimated marginal means are special cases of custom hypothesis test. The

construction of the L matrix for EMMEANS can be found in "Estimated Marginal

Means" section of GLM’s algorithm document. If Bonferroni or Sidak adjustment is

requested for multiple comparisons, they will be computed according to the

algorithm detailed in Appendix 10:Post Hoc Tests.

Saved Values

If predicted values is requested, it will be computed by

y X Z ˆ

using the estimates given in (6).

If fixed predicted values is requested, it will be computed by

y X

1st and 2nd Derivatives of^4 k („)

In each Newton or scoring iteration we need to compute the 1

st and 2

nd derivatives

of the components of the log-likelihood (^4) k („), k=1,2,3. Here we let

g (^) k (^2 4) k and ( )

2

„ „„

H (^) k (^2 4) k , k 2 1 , 2 , 3 , then the 1

st derivatives

with respect to the s-th parameter in „ is given by

[ ] (
[ ] ,
[ ] ( ),

1 1 3

1 1 2

1 1

g tr XV VV X

g rV VV r

g trV V

s

T s

s s

s s

 

 



(13)

and the 2

nd derivatives with respect to s and t-th parameter are given by

[ ] ( ) ( ),

1 1 1 H 1 (^) st trV VsV Vt trV Vst 2   ^    

r V VV r

r V VV XX VV r

H r V VV VV r

st

T

t

T s

T

s t

T st

1 1

1 1 1

1 1 1 2

~~ 2

[ ] 2

 

  

  

(14)

[ ] 2 (

1 1

1 1 1 1

1 1 1 3

trXV VV X

trXV VV XXV VV X

H trXV VV VV X

st

T

t

T s

T

s t

T st

 

   

  

where X 2 XC

for a matrix C satisfying CC XV X P

T 2 2

  ( ’ )

1 and

r I X(X’V X) X’V y y X

= [ − ] = −

− − − .

Derivatives w.r.t. Parameters in G

Derivatives with respect to parameters in G can be constructed by from the entries

of

  

  

  

r V X r V Z r V r

Z V X Z V Z Z V r

XV X XV Z XV r

W r X W r Z W r r

W ZX W Z Z W Z r

W X X W XZ W Xr

W Xr Z

T T T

T T T

T T T

1 1 1

1 1 1

1 1 1

1 1 1

1 1 1

1 1 1

1

(15)

The matrix W 1 ( X ; r ; Z )can be computed from W 1 ( X ; y ; Z )given in (27), by

using the following relationship,

r 2 y  Xb 0

where b 0 is the current estimate of i.

Using the above formula, we can obtain the following expressions,

1 1 0

0

1 1 1

W ( y , y ) W ( y , X ) b

r V r y V y y V Xb

T T T

  

(16)

1 1 0

0

1 1 1

W ( X , y ) W ( X , X ) b

X V r X V y X V Xb

T T T

  

(17)

s

(1)s 0 0

W
W

and

s t

0

2 (2)st 0

W
W

where „ s and „ t are the s

th and t

th parameters of R. Therefore,

A R B
W AB A R R R R RR RR R RR RR B
A R B
W AB A R RR B
W AB A R B

st

s

T

st s t t s

st T

T

s

s T

T

[ ( )]
( , ) [ ]
[ ( )]

1

( 2 ) 1 1 1 1 1 1 1 1 0

1

( 1 ) 1 1 0

1 0

 wT T

w

       

 wT

w

 



The matrices A and B can be X , Z , Z

or r , where

 1  1  1 Z 2 ZM 2 Z(G  Z R Z)

~ T

, and

0

1 1 1 r 2 [ I  X ( XV X ) XV ] y 2 y  Xb

  

Remark: The matrix

1 1 1 ( )

   G  Z R Z

T involved in Z

can be obtained by

pre/post multiply

  (  )

1 I LZ R ZL

T T by L and

T L ).

Using these notations, the 1

st derivatives of (^4) k („)with respect to a parameter in

R are as follows,

[ ] tr( )

[ ] (, ) 2 ( ,

[ ] ( ) tr( ( , ) )

3 , 3

0

( 1 ) 0 0

( 1 ) 0 0

( 1 ) 2 , 0

( 1 ) 0

1 1 ,

s Rs R

s

s s Rs

s Rs s

g H

W r ZW ZZW Zr

g W rr W r ZW Zr

g trR R W ZZM

(21)

To compute 2

nd derivatives w.r.t. „ s and „ t (of R ), we need to consider the

following simplification factors.

W ZZM W Z ZMW Z Z M
H R RR R R R

()st ()s ()t

s t st

st R

1 0

1 0

2 0

1 1 1 1

0

( 1 ) 0

( 1 ) 0 0

( 1 ) 0 0

( 1 ) 2 0

W XZ W Zr W XZ W Z r

H W Xr W XZW ZZ WZr

s s

s s s R

, ) ( , )]
[ ( , ) (
) ( , ) (, )]
2 [ (,

( 1 ) 0 0

( 1 ) 0

( 1 ) 0

( 1 ) 0 0

( 2 ) 0 0

0

( 2 ) 0 0

( 2 ) 2 0

W ZZ W Zr W Z r

W r ZW ZZ W r Z M

W r Z W Zr

W r Z W Z Z W Zr

H W r r

t t

s s

st

st

st st R

0

( 1 ) 0 0

0

( 1 ) 0

( 1 ) 0 0

( 1 ) 3 0

W XZ W ZZ W Z X
W XZ W ZX
H W XX W XZ W Z X

s

s

s s s R

[ ( , ) ] [ ( , ) ]
2 [ ( , ) ( , ) ( , )]

0

0 0

( 1 ) 0 0

( 1 ) 3 0

W Z X
MW ZZ IG W ZZM I
H W XZ W XZMW Z Z

t

st s s GR

Based on these simplification terms, the second derivatives are given by

[ 1 ] , ( 1 )

st H (^) GR st 2 trHGR

t R

s T G

st [ H (^) 2 ] GR , st 2 HGR 2  2 ( H 2 ) H 2 23)

[ H 3 (^) ] , tr ( H 3 P H 3 PH 3 P )

t R

s G

st GR st^2 GR 

Gradient & Hessian of REML

The restricted log likelihood is given by

log| ( ) | (n p)log 2

2 ( | ) log| ( )| ( ) ( ) ( )

1

1





XV X

y V r V r

T

T (^4) REML

where p is equal to the rank of X.

Therefore the s

th element of the gradient vector is given by

[ g ] s 2 [ g 1 ] s [ g 2 ] s [ g 3 ] s

and the (s,t)-th element of the Hessian matrix is given by

[ H ] st 2 [ H 1 ] st [ H 2 ] st [ H 3 ] st

If scoring algorithm is used, the Hessian can be simplified to

[H] (^) st = − [H 1 ]st + [H 3 ] st.

Gradient & Hessian of MLE

The log likelihood is given by

nlog 2

2 ( | ) log| ( )| ( ) ( ) ( )

1

 y V r V r

T (^4) MLE

Therefore the s

th element of the gradient vector is given by

[ g ] s 2 [ g 1 ] s [ g 2 ] s

and the (s,t)

th element of the Hessian matrix is given by

[ H ] st 2 [ H 1 ] st [ H 2 ] st.

If scoring algorithm is used the Hessian can be simplified to

[H]st = − [H 1 ] st.

It should be noted that the Hessian matrices for the scoring algorithm in both ML

and REML are not ‘exact’. In order to speed up calculation, some second derivative

terms are dropped. Therefore, they are only used in intermediate step of

optimization but not for standard error calculations.