Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Mixed Effects Model: Computing Covariance Matrices and Derivatives, Study notes of Mathematical Statistics

Alliance University Mathematical Statistics

An overview of the computational steps involved in estimating the parameters of a mixed effects model, with a focus on computing the covariance matrices and their derivatives. The use of newton's algorithm and scoring methods, as well as the computation of the likelihood functions for maximum likelihood estimation (mle) and restricted maximum likelihood estimation (reml). It also discusses the use of confidence intervals for the estimated covariance parameters.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar 🇮🇳

4.5

(4)

118 documents

1 / 24

This page cannot be seen from the preview

Don't miss anything!

1

MIXED

Overview

This document summarizes the computational algorithms discussed in Wolfinger,

Tobias and Sall (1994).

Notation

Overall covariance parameter vector

G

T A vector of covariance parameters associated with random effects.

k A vector of covariance parameters associated with the k-th random

effect.

R A vector of covariance parameters associated with the residual term.

K

Number of random effects.

r

S Number of repeated subjects.

k

S Number of subjects in k-th random effect.

)(T

V

The n×n covariance matrix of y. This matrix is sometimes denoted by

)(T

y

V. The subscript is removed here for a cleaner notation.

)(

s

V

 First derivative of )(

V

with respect to the s-th parameter in T.

)(

st

V

 Second derivative of )(T

V

with respect to s and t-th parameter in T.

)( R

R The n×n covariance matrix of H.

)( Rs TR

 First derivative of )( R

R with respect to the s-th parameter in R

T

)( Rst TR

 Second derivative of )( R

TRwith respect to s and t-th parameter in

R

T.

)( G

G The covariance matrix of random effects.

Discover Study notes of Mathematical Statistics Alliance University

Partial preview of the text

Download Mixed Effects Model: Computing Covariance Matrices and Derivatives and more Study notes Mathematical Statistics in PDF only on Docsity!

1

Overview

This document summarizes the computational algorithms discussed in Wolfinger,

Tobias and Sall (1994).

Notation

Overall covariance parameter vector

G

A vector of covariance parameters associated with random effects.

k

A vector of covariance parameters associated with the k-th random

effect.

R

A vector of covariance parameters associated with the residual term.

K Number of random effects.

S (^) r Number of repeated subjects.

S (^) k Number of subjects in k-th random effect.

V () The n×n covariance matrix of^ y. This matrix is sometimes denoted by

Vy (). The subscript is removed here for a cleaner notation.

V ^ s( ) First derivative of^ V (^ )with respect to the s-th parameter in^ ^.

V ^ st( ) Second derivative of^ V^ ()with respect to s and t-th parameter in^ ^.

R ( (^) R) The n×n covariance matrix of^ r.

R ^ s ( R ) First derivative of^ R (^^ R)with respect to the s-th parameter in^ R

R ^ st ( R ) Second derivative of^ R (^^ R)with respect to s and t-th parameter in

R.

G ( (^) G ) The covariance matrix of random effects.

G ^ s ( G ) First derivative of^ G^ (^ G )with respect to s-th parameter in^ ^ G.

G ^ st ( (^) G ) Second derivative of^ G (^^ G )with respect to s and t-th parameter in

G.

V k (^) ( (^) k) The covariance matrix of the k-th random effect for^ one^ random subject.

V ^ k,s ( k ) First derivative of^ V k^ ( ^ k)with respect to s-th parameter in^ k^.

V k,st ( (^) k)

(^) Second derivative of ( ) V k (^) k with respect to s and t-th parameter in

k.

y (^) n by 1 vector of dependent variable.

X n by p design matrix of fixed effects.

Z n by q design matrix of random effects.

r (^) n by 1 vector of residual.

i p by 1 vector of fixed effects parameters.

t (^) q by 1 vector of random effects parameters.

r (^) n by 1 vector of residual error.

Wc n by n diagonal matrix of case weights.

Wrw n by n diagonal matrix of regression weights.

Model

In this document, we assume a mixed effect model of the form

y = X + Z + (1)

In this model, we assume that r is distributed as N[ 0, R ( R )] and t is

independently distributed as N[ 0, G ( G )]. Therefore y is distributed as

N[ X V ( )], where ( ) ( ) ( R )

T V = ZG G Z + R. The unknown parameters

include the regression parameters in i and covariance parameters in .

Estimation of these model parameters relies on the use of a Newton-Ralphson or

scoring algorithm. When we use either algorithm for finding MLE or REML

Number of columns in the design matrix Z (^) kj (j-th random subject of k-th random

effect) is equal to number of levels of the k-th random effect variable.

Under this partition, the G( G ) will be a block diagonal matrix which can be

expressed as

G ( (^) G ) k 1 [ I (^) S Vk ( k )] k

K .

It should also be noted that each random effect has its own parameter vector k ,

k=1,..., K , and there are no functional constraints between elements in these

parameter vectors. Thus G = ( 1 ,..., K ).

When there are correlated random effects, Zkj will be a combined design matrix

of the correlated random effects. Therefore in subsequent sections, each random

effect can either be one single random effect or a set of correlated random effects.

Repeated Subjects:

When the REPEATED subcommand is used, R (R ) will be a block diagonal

matrix where i-th block is R (^) i (R ), i=1,..., S R. That is,

( ) i 1 i( )

R R (^) R R

S =⊕ =

The dimension of R (^) i (R ) will be equal to number of cases in one repeated

subject but all R (^) i ( (^) R)share the same parameter vector (^) R.

Likelihood Functions

Recall that the –2 times log likelihood of MLE is

i

nlog 2

2 ( , ) log|V( )| r( ) V( ) r( )

T 1 MLE

− 4 (2)

and the –2 times log likelihood of REML is

log| | (n p)log 2

(^2) REML( ) log| ( )| ( ) () ( )

−

X’V X

V r V r

1

T 1 4 (3)

where n is the number of observations and p is the rank of fixed effect design

matrix. From (2) and (3), we can see that the key component of the likelihood

functions are

( ) log| ’ ( ) |.

( ) log| ( )|

1 3

1 2

1

XV X

r V r

V

θ θ

θ θ θ θ

θ θ

(4)

Therefore, in each estimation iteration, we need to compute 41 ( ), 4 2 ()and

4 3 ()as well as their 1

st and 2

nd derivatives with respective to .

Newton & Scoring Algorithms

Covariance parameters in can be found by maximizing (2) or (3), however, there

are no closed form solutions in general. Therefore Newton and scoring algorithms

are used to find the solution numerically. The algorithm is outlined as below,

Compute starting value and initial log-likelihood (REML or ML).
Compute gradient vector g and Hessian matrix H of the log-likelihood

function using last iteration's estimate i (^) − 1. (See later section for computation

of g and H)

Compute the new step d H g

− 1 = −.

Let = 1.
Compute estimates of i-th iteration (^) i = (^) i − 1 + d.

Relative Hessian convergence:

g H gi

1 i

T i 1

| (^4) i |

( if (^4) i 20 , the denominator will be replaced by 1)

Starting value of Newton’s Algorithm

If no prior information is available, we can choose the initial values of G and R

to be the identity. However, it is highly desirably to estimate the scale of the

parameter. By ignoring the random effects, and assuming the residual errors are

i.i.d. with variance

2 , we can fit a GLM model and estimate

2 by the residual

sum of squares

2 ˆ. Then we choose the starting value of Newton’s algorithm to be

2

K

G (^) k and 1

2

K

R.

Confidence Intervals of Covariance Parameters

The estimate ˆ^ (ML or REML) is asymptotically normally distributed. Its variance

covariance matrix can be approximated by

1 2 H

− − , where H is the hessian

matrix of the log-likelihood function evaluated at ˆ^. A simple Wald’s type

confidence interval for any covariance parameter can be obtained by using the

asymptotic normality of the parameter estimates, however it is not very appropriate

for variance parameters and correlation parameters that have a range of [ 0 ,∞) and

[− 1 , 1 ]respectively. Therefore these parameters are transformed to parameters that

have range ( −∞, ∞). Using uniform delta method, see for example (van der Vaart,

1998), these transformed estimates still have asymptotic normal distributions.

Suppose we are estimation a variance parameter

2 by

2 ˆ (^) n that is distributed as

N[ ,Var(ˆ )]

2 n

2 asymptotically. The transformation we used is log( )

2 which

can correct the skewness of

2 ˆ (^) n, moreover log( ˆ )

2 n has the range ( −∞,∞)

which matches that of normal distribution. Using delta method, one can show that

the asymptotic distribution of log( ˆ )

2 (^) n is N[log( ), Var(ˆ )]

2 n

2 4

−

. Thus, a

( 1 − h) 100 %confidence interval of log( )

2 is given by

[log( ˆ ) z Var(ˆ ) , log(ˆ ) z Var(ˆ )]

2 n

2 1 / 2 n

2 n

2 1 / 2 n

2 n (^) D D

− −

− − − +

where z 1 −D / 2 is the upper ( 1 − h/ 2 ) percentage point of standard normal

distribution. By inverting this confidence interval, a ( 1 − h) 100 % confidence

interval for

2 is given by

[exp (log( ˆ ) z Var(ˆ )) , exp(log( ˆ ) z Var(ˆ ))]

2 n

2 1 / 2 n

2 n

2 1 / 2 n

2 n (^) D D

− −

− − − +

When we need a confidence interval for a correlation parameter , a possible

transformation will be its generalized logit

arctan h( ) = 0. 5 log[( 1 + )/( 1 − )]. The resulting confidence interval for

will be

[tanh( arctan h ( )ˆ z ( 1 ˆ ) Var(ˆ)) ,tanh(arctan h( )ˆ z ( 1 ˆ ) Var(ˆ))]

2 1 1 / 2

2 1

(^1) D/ 2

D

− −

− − − − − −

Fixed and Random Effect Parameters

Estimation and prediction

After we obtain an estimate of , best linear unbiased estimator (BLUE) of i and

best linear unbiased predictor (BLUP) of t can be found by solving the mixed

model equations, Henderson (1984).

−

− − −

− −

Z R y

XR y

ZR X Z R X G

X R X X R Z

T 1

T 1 T 1 1

T 1 T 1

t

i (5)

for some hypothesis matrix L. Estimators or predictors of Lb can easily be

constructed by substituting iˆ^ and tˆ into (8) and its variance covariance matrix can

be approximated by

T LCL. If L 1 is zero and L 0 is estimable, Lb ˆ^ is called

the best linear unbiased estimator of L 0. If L 1 is nonzero and L 0 is

estimable, Lb ˆ^ is called the best linear unbiased predictor of Lb.

To test the hypothesis H 0 : Lb = a for a given vector a , we can use the statistic

( ) ( )

q

ˆ a ˆ ˆ

T − − =

− Lb a (LCL) Lb F

T 1

(9)

where q is the rank of the matrix L. The statistic in (9) has an approximate F

distribution. The numerator degrees of freedom is q and the denominator degree

of freedom can be obtained by Satterthwaite (1946) approximation. The method

outlined below is similar to Giesbrecht and Burns (1985), McLean and Sanders

(1988), and Fai and Cornelius (1996).

Satterthwaite’s Approximation

To find the denominator degrees of freedom of (9), we first perform the spectral

decomposition LC L 2 B D B

ˆ T T

where B is an orthogonal matrix of

eigenvectors and (^) D is a diagonal matrix of eigenvalues. If we let (^4) m be the m-th

row of B L , dm be the m-th eigenvalues and

m

T gm g

1

2 m m (ˆ)

2 d

− ∑

where

T T

ˆ

T m m m

C

g

and

1 ( ˆ)

− T is the covariance matrix of the estimated

covariance parameters. If we let

q

m 1

m m

m I( 2) 2

E

then the denominator degree of freedom is given by

E q

2 E

Note that the degrees of freedom can only be computed when E>q.

Type I &III Statistics

Type I or III test statistics are special cases of custom hypothesis tests.

Estimated Marginal Means (EMMEANS)

Estimated marginal means are special cases of custom hypothesis test. The

construction of the L matrix for EMMEANS can be found in "Estimated Marginal

Means" section of GLM’s algorithm document. If Bonferroni or Sidak adjustment is

requested for multiple comparisons, they will be computed according to the

algorithm detailed in Appendix 10:Post Hoc Tests.

Saved Values

If predicted values is requested, it will be computed by

y X Z ˆ

using the estimates given in (6).

If fixed predicted values is requested, it will be computed by

y X

1st and 2nd Derivatives of^4 k ()

In each Newton or scoring iteration we need to compute the 1

st and 2

nd derivatives

of the components of the log-likelihood (^4) k (), k=1,2,3. Here we let

g (^) k (^2 4) k and ( )

2

H (^) k (^2 4) k , k 2 1 , 2 , 3 , then the 1

st derivatives

with respect to the s-th parameter in is given by

[ ] (

[ ] ,

[ ] ( ),

1 1 3

1 1 2

1 1

g tr XV VV X

g rV VV r

g trV V

s

T s

s s

(13)

and the 2

nd derivatives with respect to s and t-th parameter are given by

[ ] ( ) ( ),

1 1 1 H 1 (^) st trV VsV Vt trV Vst 2 ^

r V VV r

r V VV XX VV r

H r V VV VV r

st

T

t

T s

T

s t

T st

1 1

1 1 1

1 1 1 2

~~ 2

[ ] 2

(14)

[ ] 2 (

1 1

1 1 1 1

1 1 1 3

trXV VV X

trXV VV XXV VV X

H trXV VV VV X

st

T

t

T s

T

s t

T st

where X 2 XC

for a matrix C satisfying CC XV X P

T 2 2

( ’ )

1 and

r I X(X’V X) X’V y y X

= [ − ] = −

− − − .

Derivatives w.r.t. Parameters in G

Derivatives with respect to parameters in G can be constructed by from the entries

of

r V X r V Z r V r

Z V X Z V Z Z V r

XV X XV Z XV r

W r X W r Z W r r

W ZX W Z Z W Z r

W X X W XZ W Xr

W Xr Z

T T T

1 1 1

1

(15)

The matrix W 1 ( X ; r ; Z )can be computed from W 1 ( X ; y ; Z )given in (27), by

using the following relationship,

r 2 y Xb 0

where b 0 is the current estimate of i.

Using the above formula, we can obtain the following expressions,

1 1 0

0

1 1 1

W ( y , y ) W ( y , X ) b

r V r y V y y V Xb

T T T

(16)

1 1 0

0

1 1 1

W ( X , y ) W ( X , X ) b

X V r X V y X V Xb

T T T

(17)

s

(1)s 0 0

W

and

s t

0

2 (2)st 0

W

where s and t are the s

th and t

th parameters of R. Therefore,

A R B

W AB A R R R R RR RR R RR RR B

A R B

W AB A R RR B

W AB A R B

st

s

T

st s t t s

st T

T

s

s T

T

[ ( )]

( , ) [ ]

[ ( )]

1

( 2 ) 1 1 1 1 1 1 1 1 0

1

( 1 ) 1 1 0

1 0

wT T

w

wT

w

The matrices A and B can be X , Z , Z

or r , where

1 1 1 Z 2 ZM 2 Z(G Z R Z)

~ T

, and

0

1 1 1 r 2 [ I X ( X ’ V X ) X ’ V ] y 2 y Xb

Remark: The matrix

1 1 1 ( )

G Z R Z

T involved in Z

can be obtained by

pre/post multiply

( )

1 I LZ R ZL

T T by L and

T L ).

Using these notations, the 1

st derivatives of (^4) k ()with respect to a parameter in

R are as follows,

[ ] tr( )

[ ] (, ) 2 ( ,

[ ] ( ) tr( ( , ) )

3 , 3

0

( 1 ) 0 0

( 1 ) 2 , 0

( 1 ) 0

1 1 ,

s Rs R

s

s s Rs

s Rs s

g H

W r ZW ZZW Zr

g W rr W r ZW Zr

g trR R W ZZM

(21)

To compute 2

nd derivatives w.r.t. s and t (of R ), we need to consider the

following simplification factors.

W ZZM W Z ZMW Z Z M

H R RR R R R

()st ()s ()t

s t st

st R

1 0

2 0

1 1 1 1

0

( 1 ) 0

( 1 ) 0 0

( 1 ) 2 0

W XZ W Zr W XZ W Z r

H W Xr W XZW ZZ WZr

s s

s s s R

, ) ( , )]

[ ( , ) (

) ( , ) (, )]

2 [ (,

( 1 ) 0 0

( 1 ) 0

( 1 ) 0 0

( 2 ) 0 0

0

( 2 ) 0 0

( 2 ) 2 0

W ZZ W Zr W Z r

W r ZW ZZ W r Z M

W r Z W Zr

W r Z W Z Z W Zr

H W r r

t t

s s

st

st st R

0

( 1 ) 0 0

0

( 1 ) 0

( 1 ) 0 0

( 1 ) 3 0

W XZ W ZZ W Z X

W XZ W ZX

H W XX W XZ W Z X

s

s s s R

[ ( , ) ] [ ( , ) ]

2 [ ( , ) ( , ) ( , )]

0

0 0

( 1 ) 0 0

( 1 ) 3 0

W Z X

MW ZZ IG W ZZM I

H W XZ W XZMW Z Z

t

st s s GR

Based on these simplification terms, the second derivatives are given by

[ 1 ] , ( 1 )

st H (^) GR st 2 trHGR

t R

s T G

st [ H (^) 2 ] GR , st 2 HGR 2 2 ( H 2 ) H 2 23)

[ H 3 (^) ] , tr ( H 3 P H 3 PH 3 P )

t R

s G

st GR st^2 GR

Gradient & Hessian of REML

The restricted log likelihood is given by

log| ( ) | (n p)log 2

2 ( | ) log| ( )| ( ) ( ) ( )

1

XV X

y V r V r

T

T (^4) REML

where p is equal to the rank of X.

Therefore the s

th element of the gradient vector is given by

[ g ] s 2 [ g 1 ] s [ g 2 ] s [ g 3 ] s

and the (s,t)-th element of the Hessian matrix is given by

[ H ] st 2 [ H 1 ] st [ H 2 ] st [ H 3 ] st

If scoring algorithm is used, the Hessian can be simplified to

[H] (^) st = − [H 1 ]st + [H 3 ] st.

Gradient & Hessian of MLE

The log likelihood is given by

nlog 2

2 ( | ) log| ( )| ( ) ( ) ( )

1

y V r V r

T (^4) MLE

Therefore the s

th element of the gradient vector is given by

[ g ] s 2 [ g 1 ] s [ g 2 ] s

and the (s,t)

th element of the Hessian matrix is given by

[ H ] st 2 [ H 1 ] st [ H 2 ] st.

If scoring algorithm is used the Hessian can be simplified to

[H]st = − [H 1 ] st.

It should be noted that the Hessian matrices for the scoring algorithm in both ML

and REML are not ‘exact’. In order to speed up calculation, some second derivative

terms are dropped. Therefore, they are only used in intermediate step of

optimization but not for standard error calculations.

Mixed Effects Model: Computing Covariance Matrices and Derivatives, Study notes of Mathematical Statistics

Related documents

Partial preview of the text

Download Mixed Effects Model: Computing Covariance Matrices and Derivatives and more Study notes Mathematical Statistics in PDF only on Docsity!

Overview

Notation

G

R.

G.

Model

parameter vectors. Thus  G = ( 1 ,..., K ).

Likelihood Functions

X’V X

XV X

V

Newton & Scoring Algorithms

Starting value of Newton’s Algorithm

K

K

R.

Confidence Intervals of Covariance Parameters

[exp (log( ˆ ) z Var(ˆ )) , exp(log( ˆ ) z Var(ˆ ))]

[tanh( arctan h ( )ˆ z ( 1 ˆ ) Var(ˆ)) ,tanh(arctan h( )ˆ z ( 1 ˆ ) Var(ˆ))]

Fixed and Random Effect Parameters

ZR X Z R X G

X R X X R Z

ˆ T T

C

E

2 E

Type I &III Statistics

Saved Values

1st and 2nd Derivatives of^4 k ()

[ ] (

[ ] ,

[ ] ( ),

[ ] ( ) ( ),

[ ] 2

[ ] 2 (

= [ − ] = −

Derivatives w.r.t. Parameters in G

W

W

W

W

A R B

W AB A R R R R RR RR R RR RR B

A R B

W AB A R RR B

W AB A R B

[ ( )]

( , ) [ ]

[ ( )]

~ T

[ ] (, ) 2 ( ,

W ZZM W Z ZMW Z Z M

H R RR R R R

, ) ( , )]

[ ( , ) (

) ( , ) (, )]

2 [ (,

W XZ W ZZ W Z X

W XZ W ZX

H W XX W XZ W Z X

[ ( , ) ] [ ( , ) ]

2 [ ( , ) ( , ) ( , )]

W Z X

MW ZZ IG W ZZM I

H W XZ W XZMW Z Z

[ 1 ] , ( 1 )

Gradient & Hessian of REML

XV X

Gradient & Hessian of MLE

G

R.

G.

parameter vectors. Thus G = ( 1 ,..., K ).

1st and 2nd Derivatives of^4 k ()