
















Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
An overview of the computational steps involved in estimating the parameters of a mixed effects model, with a focus on computing the covariance matrices and their derivatives. The use of newton's algorithm and scoring methods, as well as the computation of the likelihood functions for maximum likelihood estimation (mle) and restricted maximum likelihood estimation (reml). It also discusses the use of confidence intervals for the estimated covariance parameters.
Typology: Study notes
1 / 24
This page cannot be seen from the preview
Don't miss anything!

















1
This document summarizes the computational algorithms discussed in Wolfinger,
Tobias and Sall (1994).
Overall covariance parameter vector
A vector of covariance parameters associated with random effects.
k
A vector of covariance parameters associated with the k-th random
effect.
R
A vector of covariance parameters associated with the residual term.
K Number of random effects.
S (^) r Number of repeated subjects.
S (^) k Number of subjects in k-th random effect.
V () The n×n covariance matrix of^ y. This matrix is sometimes denoted by
Vy (). The subscript is removed here for a cleaner notation.
V ^ s( ) First derivative of^ V (^ )with respect to the s-th parameter in^ ^.
V ^ st( ) Second derivative of^ V^ ()with respect to s and t-th parameter in^ ^.
R ( (^) R) The n×n covariance matrix of^ r.
R ^ s ( R ) First derivative of^ R (^^ R)with respect to the s-th parameter in^ R
R ^ st ( R ) Second derivative of^ R (^^ R)with respect to s and t-th parameter in
G ( (^) G ) The covariance matrix of random effects.
G ^ s ( G ) First derivative of^ G^ (^ G )with respect to s-th parameter in^ ^ G.
G ^ st ( (^) G ) Second derivative of^ G (^^ G )with respect to s and t-th parameter in
V k (^) ( (^) k) The covariance matrix of the k-th random effect for^ one^ random subject.
V ^ k,s ( k ) First derivative of^ V k^ ( ^ k)with respect to s-th parameter in^ k^.
V k,st ( (^) k)
(^) Second derivative of ( ) V k (^) k with respect to s and t-th parameter in
k.
y (^) n by 1 vector of dependent variable.
X n by p design matrix of fixed effects.
Z n by q design matrix of random effects.
r (^) n by 1 vector of residual.
i p by 1 vector of fixed effects parameters.
t (^) q by 1 vector of random effects parameters.
r (^) n by 1 vector of residual error.
Wc n by n diagonal matrix of case weights.
Wrw n by n diagonal matrix of regression weights.
In this document, we assume a mixed effect model of the form
y = X + Z + (1)
In this model, we assume that r is distributed as N[ 0, R ( R )] and t is
independently distributed as N[ 0, G ( G )]. Therefore y is distributed as
N[ X V ( )], where ( ) ( ) ( R )
T V = ZG G Z + R. The unknown parameters
include the regression parameters in i and covariance parameters in .
Estimation of these model parameters relies on the use of a Newton-Ralphson or
scoring algorithm. When we use either algorithm for finding MLE or REML
Number of columns in the design matrix Z (^) kj (j-th random subject of k-th random
effect) is equal to number of levels of the k-th random effect variable.
Under this partition, the G( G ) will be a block diagonal matrix which can be
expressed as
G ( (^) G ) k 1 [ I (^) S Vk ( k )] k
K .
It should also be noted that each random effect has its own parameter vector k ,
k=1,..., K , and there are no functional constraints between elements in these
When there are correlated random effects, Zkj will be a combined design matrix
of the correlated random effects. Therefore in subsequent sections, each random
effect can either be one single random effect or a set of correlated random effects.
Repeated Subjects:
When the REPEATED subcommand is used, R (R ) will be a block diagonal
matrix where i-th block is R (^) i (R ), i=1,..., S R. That is,
( ) i 1 i( )
R R (^) R R
S =⊕ =
The dimension of R (^) i (R ) will be equal to number of cases in one repeated
subject but all R (^) i ( (^) R)share the same parameter vector (^) R.
Recall that the –2 times log likelihood of MLE is
i
nlog 2
2 ( , ) log|V( )| r( ) V( ) r( )
T 1 MLE
− 4 (2)
and the –2 times log likelihood of REML is
log| | (n p)log 2
(^2) REML( ) log| ( )| ( ) () ( )
−
−
V r V r
1
T 1 4 (3)
where n is the number of observations and p is the rank of fixed effect design
matrix. From (2) and (3), we can see that the key component of the likelihood
functions are
( ) log| ’ ( ) |.
( ) log| ( )|
1 3
1 2
1
r V r
θ θ
θ θ θ θ
θ θ
(4)
Therefore, in each estimation iteration, we need to compute 41 ( ), 4 2 ()and
4 3 ()as well as their 1
st and 2
nd derivatives with respective to .
Covariance parameters in can be found by maximizing (2) or (3), however, there
are no closed form solutions in general. Therefore Newton and scoring algorithms
are used to find the solution numerically. The algorithm is outlined as below,
function using last iteration's estimate i (^) − 1. (See later section for computation
of g and H)
− 1 = −.
Relative Hessian convergence:
g H gi
1 i
T i 1
| (^4) i |
( if (^4) i 20 , the denominator will be replaced by 1)
If no prior information is available, we can choose the initial values of G and R
to be the identity. However, it is highly desirably to estimate the scale of the
parameter. By ignoring the random effects, and assuming the residual errors are
i.i.d. with variance
2 , we can fit a GLM model and estimate
2 by the residual
sum of squares
2 ˆ. Then we choose the starting value of Newton’s algorithm to be
2
G (^) k and 1
2
The estimate ˆ^ (ML or REML) is asymptotically normally distributed. Its variance
covariance matrix can be approximated by
1 2 H
− − , where H is the hessian
matrix of the log-likelihood function evaluated at ˆ^. A simple Wald’s type
confidence interval for any covariance parameter can be obtained by using the
asymptotic normality of the parameter estimates, however it is not very appropriate
for variance parameters and correlation parameters that have a range of [ 0 ,∞) and
[− 1 , 1 ]respectively. Therefore these parameters are transformed to parameters that
have range ( −∞, ∞). Using uniform delta method, see for example (van der Vaart,
1998), these transformed estimates still have asymptotic normal distributions.
Suppose we are estimation a variance parameter
2 by
2 ˆ (^) n that is distributed as
N[ ,Var(ˆ )]
2 n
2 asymptotically. The transformation we used is log( )
2 which
can correct the skewness of
2 ˆ (^) n, moreover log( ˆ )
2 n has the range ( −∞,∞)
which matches that of normal distribution. Using delta method, one can show that
the asymptotic distribution of log( ˆ )
2 (^) n is N[log( ), Var(ˆ )]
2 n
2 4
−
. Thus, a
( 1 − h) 100 %confidence interval of log( )
2 is given by
[log( ˆ ) z Var(ˆ ) , log(ˆ ) z Var(ˆ )]
2 n
2 1 / 2 n
2 n
2 n
2 1 / 2 n
2 n (^) D D
− −
− − − +
where z 1 −D / 2 is the upper ( 1 − h/ 2 ) percentage point of standard normal
distribution. By inverting this confidence interval, a ( 1 − h) 100 % confidence
interval for
2 is given by
2 n
2 1 / 2 n
2 n
2 n
2 1 / 2 n
2 n (^) D D
− −
− − − +
When we need a confidence interval for a correlation parameter , a possible
transformation will be its generalized logit
arctan h( ) = 0. 5 log[( 1 + )/( 1 − )]. The resulting confidence interval for
will be
2 1 1 / 2
2 1
(^1) D/ 2
D
− −
− − − − − −
Estimation and prediction
After we obtain an estimate of , best linear unbiased estimator (BLUE) of i and
best linear unbiased predictor (BLUP) of t can be found by solving the mixed
model equations, Henderson (1984).
−
−
− − −
− −
Z R y
XR y
T 1
T 1
T 1 T 1 1
T 1 T 1
t
i (5)
for some hypothesis matrix L. Estimators or predictors of Lb can easily be
constructed by substituting iˆ^ and tˆ into (8) and its variance covariance matrix can
be approximated by
T LCL. If L 1 is zero and L 0 is estimable, Lb ˆ^ is called
the best linear unbiased estimator of L 0. If L 1 is nonzero and L 0 is
estimable, Lb ˆ^ is called the best linear unbiased predictor of Lb.
To test the hypothesis H 0 : Lb = a for a given vector a , we can use the statistic
( ) ( )
q
ˆ a ˆ ˆ
T − − =
− Lb a (LCL) Lb F
T 1
(9)
where q is the rank of the matrix L. The statistic in (9) has an approximate F
distribution. The numerator degrees of freedom is q and the denominator degree
of freedom can be obtained by Satterthwaite (1946) approximation. The method
outlined below is similar to Giesbrecht and Burns (1985), McLean and Sanders
(1988), and Fai and Cornelius (1996).
Satterthwaite’s Approximation
To find the denominator degrees of freedom of (9), we first perform the spectral
decomposition LC L 2 B D B
where B is an orthogonal matrix of
eigenvectors and (^) D is a diagonal matrix of eigenvalues. If we let (^4) m be the m-th
row of B L , dm be the m-th eigenvalues and
m
T gm g
1
2 m m (ˆ)
2 d
− ∑
where
T T
ˆ
T m m m
g
and
1 ( ˆ)
− T is the covariance matrix of the estimated
covariance parameters. If we let
q
m 1
m m
m I( 2) 2
then the denominator degree of freedom is given by
E q
Note that the degrees of freedom can only be computed when E>q.
Type I or III test statistics are special cases of custom hypothesis tests.
Estimated Marginal Means (EMMEANS)
Estimated marginal means are special cases of custom hypothesis test. The
construction of the L matrix for EMMEANS can be found in "Estimated Marginal
Means" section of GLM’s algorithm document. If Bonferroni or Sidak adjustment is
requested for multiple comparisons, they will be computed according to the
algorithm detailed in Appendix 10:Post Hoc Tests.
If predicted values is requested, it will be computed by
y X Z ˆ
using the estimates given in (6).
If fixed predicted values is requested, it will be computed by
y X
In each Newton or scoring iteration we need to compute the 1
st and 2
nd derivatives
of the components of the log-likelihood (^4) k (), k=1,2,3. Here we let
g (^) k (^2 4) k and ( )
2
H (^) k (^2 4) k , k 2 1 , 2 , 3 , then the 1
st derivatives
with respect to the s-th parameter in is given by
1 1 3
1 1 2
1 1
g tr XV VV X
g rV VV r
g trV V
s
T s
s s
s s
(13)
and the 2
nd derivatives with respect to s and t-th parameter are given by
1 1 1 H 1 (^) st trV VsV Vt trV Vst 2 ^
r V VV r
r V VV XX VV r
H r V VV VV r
st
T
t
T s
T
s t
T st
1 1
1 1 1
1 1 1 2
~~ 2
(14)
1 1
1 1 1 1
1 1 1 3
trXV VV X
trXV VV XXV VV X
H trXV VV VV X
st
T
t
T s
T
s t
T st
where X 2 XC
for a matrix C satisfying CC XV X P
T 2 2
( ’ )
1 and
r I X(X’V X) X’V y y X
− − − .
Derivatives with respect to parameters in G can be constructed by from the entries
of
r V X r V Z r V r
Z V X Z V Z Z V r
XV X XV Z XV r
W r X W r Z W r r
W ZX W Z Z W Z r
W X X W XZ W Xr
W Xr Z
T T T
T T T
T T T
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1
(15)
The matrix W 1 ( X ; r ; Z )can be computed from W 1 ( X ; y ; Z )given in (27), by
using the following relationship,
r 2 y Xb 0
where b 0 is the current estimate of i.
Using the above formula, we can obtain the following expressions,
1 1 0
0
1 1 1
W ( y , y ) W ( y , X ) b
r V r y V y y V Xb
T T T
(16)
1 1 0
0
1 1 1
W ( X , y ) W ( X , X ) b
X V r X V y X V Xb
T T T
(17)
s
(1)s 0 0
and
s t
0
2 (2)st 0
where s and t are the s
th and t
th parameters of R. Therefore,
st
s
T
st s t t s
st T
T
s
s T
T
1
( 2 ) 1 1 1 1 1 1 1 1 0
1
( 1 ) 1 1 0
1 0
wT T
w
wT
w
The matrices A and B can be X , Z , Z
or r , where
1 1 1 Z 2 ZM 2 Z(G Z R Z)
, and
0
1 1 1 r 2 [ I X ( X ’ V X ) X ’ V ] y 2 y Xb
Remark: The matrix
1 1 1 ( )
G Z R Z
T involved in Z
can be obtained by
pre/post multiply
( )
1 I LZ R ZL
T T by L and
T L ).
Using these notations, the 1
st derivatives of (^4) k ()with respect to a parameter in
R are as follows,
[ ] tr( )
[ ] ( ) tr( ( , ) )
3 , 3
0
( 1 ) 0 0
( 1 ) 0 0
( 1 ) 2 , 0
( 1 ) 0
1 1 ,
s Rs R
s
s s Rs
s Rs s
g H
W r ZW ZZW Zr
g W rr W r ZW Zr
g trR R W ZZM
(21)
To compute 2
nd derivatives w.r.t. s and t (of R ), we need to consider the
following simplification factors.
()st ()s ()t
s t st
st R
1 0
1 0
2 0
1 1 1 1
0
( 1 ) 0
( 1 ) 0 0
( 1 ) 0 0
( 1 ) 2 0
W XZ W Zr W XZ W Z r
H W Xr W XZW ZZ WZr
s s
s s s R
( 1 ) 0 0
( 1 ) 0
( 1 ) 0
( 1 ) 0 0
( 2 ) 0 0
0
( 2 ) 0 0
( 2 ) 2 0
W ZZ W Zr W Z r
W r ZW ZZ W r Z M
W r Z W Zr
W r Z W Z Z W Zr
H W r r
t t
s s
st
st
st st R
0
( 1 ) 0 0
0
( 1 ) 0
( 1 ) 0 0
( 1 ) 3 0
s
s
s s s R
0
0 0
( 1 ) 0 0
( 1 ) 3 0
t
st s s GR
Based on these simplification terms, the second derivatives are given by
st H (^) GR st 2 trHGR
t R
s T G
st [ H (^) 2 ] GR , st 2 HGR 2 2 ( H 2 ) H 2 23)
[ H 3 (^) ] , tr ( H 3 P H 3 PH 3 P )
t R
s G
st GR st^2 GR
The restricted log likelihood is given by
log| ( ) | (n p)log 2
2 ( | ) log| ( )| ( ) ( ) ( )
1
1
y V r V r
T
T (^4) REML
where p is equal to the rank of X.
Therefore the s
th element of the gradient vector is given by
[ g ] s 2 [ g 1 ] s [ g 2 ] s [ g 3 ] s
and the (s,t)-th element of the Hessian matrix is given by
[ H ] st 2 [ H 1 ] st [ H 2 ] st [ H 3 ] st
If scoring algorithm is used, the Hessian can be simplified to
[H] (^) st = − [H 1 ]st + [H 3 ] st.
The log likelihood is given by
nlog 2
2 ( | ) log| ( )| ( ) ( ) ( )
1
y V r V r
T (^4) MLE
Therefore the s
th element of the gradient vector is given by
[ g ] s 2 [ g 1 ] s [ g 2 ] s
and the (s,t)
th element of the Hessian matrix is given by
[ H ] st 2 [ H 1 ] st [ H 2 ] st.
If scoring algorithm is used the Hessian can be simplified to
[H]st = − [H 1 ] st.
It should be noted that the Hessian matrices for the scoring algorithm in both ML
and REML are not ‘exact’. In order to speed up calculation, some second derivative
terms are dropped. Therefore, they are only used in intermediate step of
optimization but not for standard error calculations.