














Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
This document has following main points Variance Components, Weights, Model, Minimum Norm Quadratic Unbiased Estimate, Maximum Likelihood Estimate, The Fisher Information Matrix, Iteration Procedure
Typology: Study notes
1 / 22
This page cannot be seen from the preview
Don't miss anything!















1
The Variance Components procedure provides estimates for variances of random effects under a general linear model framework. Four types of estimation methods are available in this procedure.
The following notation is used throughout this chapter. Unless otherwise stated, all vectors are column vectors and all quantities are known.
k (^) Number of random effects, k ≥ 0 m 0 Number of parameters in the fixed effects, m 0 ≥ 0 mi Number of parameters in the i th random effect, mi ≥ 0 , i = 1, K, k
σ (^) i^2 Unknown variance of the i th random effect, σ (^) i^2 ≥ 0 , i = 1, K, k σ (^) e^2 Unknown variance of the residual term, same as σ (^) k^2 +^1 , σ (^) e^2 > 0 γ (^) i^2 Unknown variance ratio of the i th random effect, γ (^) i^2^ = σ (^) i^2^ σ e^2 , γ (^) i^2 ≥ 0 , i = 1, K, k , and γ (^) k^2 +^1 = 1 y (^) The length n vector of observations
X i The n × mi design matrix, i = 0 1, , K, k β 0 The length m 0 vector of parameters of the fixed effects β i The length mi vector of parameters of the i th random effect, i = 1, K, k Unless otherwise stated, a p × p identity matrix is denoted as I (^) p , a p × q zero matrix is denoted as (^0) p × q , and a zero vector of length p is denoted as (^0) p.
For the sake of clarity and simplicity, the algorithms described in this chapter assume unit frequency weight and unit regression weight for all cases. Weights can be applied as described in the following two sections.
Frequency Weight
The WEIGHT command specifies frequency weights.
Regression Weight
The REGWGT subcommand specifies regression weights. Suppose the l th 0 l = 1, K, n 5 case has a regression weight wi > 0 (cases with nonpositive regression weights are excluded from all calculations in the procedure). Let W = diag 1 w (^) 1 , K, w (^) n 6 be the n × n diagonal weight matrix. Then the VARCOMP procedure will perform all calculations as if y is physically transformed to W y
(^12)
and X i to W X
(^12) i ,^ i^ =^ 0 1, ,^ K,^ k ; and then the pertinent algorithm is applied to the transformed data.
The mixed model is represented, following Rao (1973), as
(^0 0) ∑ 1
β (^) i β i i
k
and
q i^ k i (^) i k = i ′^ = = +
% & '
X Ry Ry
1 6 0 5
where SSQ( A ) is the sum of squares of all elements of a matrix A.
MINQUE(0)
The prior values are α (^) i = 0 , i = 1, K, k , and α (^) k + 1 = 1. Under this set of prior values, V^ =^ I n and^ R^ =^ I^ n −^ X^ 0 1 X X ′ 0^ 0 6 − X ′ 0. Since this^ R^ is an idempotent matrix, some of the elements of S and q can be simplified to
s i k s j k s n q
i k i i k j j j k k k
, , ,
trace , , ; trace , , ; rank
1 1 1 1 0 1
y Ry
1 6 3 8 1 6
Using the algorithm by Goodnight (1978), the elements of S and q are obtained without explicitly computing R. The steps are described as follows: Step 1. Form the symmetric matrix:
′ ′ ′ ′ ′ ′ ′ ′
′ ′ ′ ′ ′ ′ ′ ′
!
"
$
X X X X X X X y X X X X X X X y
X X X X X X X y y X y X y X y y
0 0 0 1 0 0 1 0 1 1 1 1
0 1 0 1
k k
k k k k k k
Step 2. Sweep the above matrix by pivoting on each diagonal of X X ′ 0 0. This produces the following matrix:
G GX X GX X GX y X X G X RX X RX X Ry
X X G X RX X RX X Ry y X G y RX y RX y Ry
!
"
$
0 1 0 0 1 0 1 1 1 1
0 1 0 1
k k
k k k k k k
where G = 1 X X ′ 0 06 −. In the process of computing the above matrix, the rank of X 0 is obtained as the number of nonzero pivots found.
MINQUE(1)
The prior values are α (^) i = 1 , i = 1 , K, k + 1. Under this set of prior values,
∑ i^ i i
k
1
1
. Using Giesbrecht (1983), the matrix S and the vector q are obtained
through an iterative procedure. The steps are described as follows: Step 1. Construct the augmented matrix A = X (^) 0 X (^) 1 L X (^) k y. Then compute the 0 m + 1 5 0× m + 15 matrix T 0 (^) k + 1 5 = A A ′.
Step 2. Define H (^) l X X i i i l
k 0 5 =^ ′ =
∑
1 , and T 0 5 (^) l = A H ′ (^) 0 5− l^1 A^ , l = 1, K, k. Update T 0 (^) l + 15 to
T 0 5 (^) l using the W Transform given in Goodnight and Hemmerle (1979). The updating formula is
T 0 5 (^) l = T 0 (^) l + 5 − A H ′ (^) 0 − l^ + 5 X (^) l 4 I (^) m (^) l + X H l ′ (^) 0 − l + 5 X (^) l 9 X H l ′ (^) 0 l 5 A
−
− 1 1
1 1
1 1
1
Likelihood Function
The likelihood function is
L ≡ L = −^ n^ e − − ′^ − e
− (^) − 0 5 0 5θ 2 2 2 1 β 6 1 β 6
1 2 (^1) 2 0 0 π σ V exp y X V^1 y X 0 0 σ^2.
The log-likelihood function is
l L n^ n^ e e
= log = − 2 log 2 − 2 log − 12 log − 1 − ′^ − −. 2
2 2 0 0 π σ^1 0 σ
0 5 4 9 V 1 y X β 6 V 1 y X β 6
The Gradient Vector
∂ ∂ (^) σ ∂ ∂ γ σ ∂ ∂ σ σ σ
l
l (^) i k
l n
e
i e
i i i i
e e e
β 0 2 0
1
2 2
1 1 1
2 4
1 2
−
− − −
−
X V r
r V X X V r X V X
r V r
tr , , , ,
4 9 K
where r = y − X 0 β 0. The gradient vector is
dl d
l
l
l e
θ
β
γ
!
"
$
∂ σ
0 2
2
The Hessian Matrix
∂ ∂ (^) σ
2 0 0 2
0 1 0
l 1 β β (^) e
∂ ∂ γ ∂ σ ∂ ∂ γ ∂ γ σ
2 2 0 2
(^1 1 )
2 2 2
1 1 2
1 1 1
l (^) i k
l (^) i k j k
i e
i i
i j
i j j i e
i i j j
β
− −
− − − − −
r V X X V X
X V X X V X r V X X V X X V r
tr , , ; , , ,
4 9 K^ K
∂ σ ∂ σ ∂ ∂ σ ∂ γ σ ∂ ∂ σ ∂ σ σ σ
2 2 0 4
(^1 )
2 2 2 4
1 1
2 2 2 4 6
1
l
l (^) j k
l n
e e
e j e
j j
e e e e
β
−
− −
−
r V X
r V X X V r
r V r
The Hessian matrix is
d l d d
l l l
l l l
l l l
e
e
e e e e
2
2 0 0
2 0 2
2
2 0 2 (^2 )
2 2 2
2 2 2 2 (^2 )
2 2 2
2 2 2
θ θ
β β (^) β γ β
γ β γ γ γ
β γ
!
"
$
∂ ∂ σ ∂ ∂ ∂
∂ ∂ σ ∂ ∂ σ ∂
∂ σ ∂
∂ σ ∂ σ
E l
E l^ j k
E l^ n
e
m
e j e
j j
e e e
∂ σ ∂
∂ ∂ σ ∂ γ σ ∂ ∂ σ ∂ σ σ
2 (^2 ) 2 2 2 2
1
2 2 2 4
0
1 2
β
^
^
= −^ ′^ =
^
^
−
tr 4 X V X 9 , K, ,
The Fisher Information matrix is
E d l d d
l
E l^ E l
E l^ E l
m m m m
m m m (^) e
m e e e
2
2 0 0 2 2 2
2 2 2 2 2 2
2 2 2
0 0 0
0 0
0
θ θ
β β
γ γ γ
γ
^
^
^
^
^
′
^
^
^
!
"
$
× −
− ×
∂ ∂ σ ∂ ∂ σ ∂
∂ σ ∂ σ
1 6
1 6
where
E l
E l^ E l
E l^ E l
k
k k k
∂ γ ∂ γ
∂ γ ∂ γ
∂ γ ∂ γ
∂ γ ∂ γ
2 2 2
2 12 12
2 12 2
2 2 12
2 2 2
γ γ
^
^
^
^
^
^
^
^
!
"
$
L
and
E l
E l
E l
e
e
e k
∂ σ ∂
∂ σ ∂ γ
∂ σ ∂ γ
2 2 2
2 2 12
2 2 2
γ
^
^
^
^
!
"
$
Iteration Procedure
Initial Values
Updating
At the s th iteration 0 s = 0 1, ,K 5 , the parameter vector is updated as
θ^ $^0 s + 1 5 = θ$^ 0 5 s + ρ∆θ$0 5 s
where ∆ θ$ (^) 0 5 s is the value of increment ∆θ evaluated at θ = θ$^ 0 5 s , and ρ > 0 is a step size such that l (^) 4 θ $^0 s + 1 5 9 ≥ l 4 θ $0 5 s 9. The increment vector depends on the choice of step type—Newton-Raphson versus Fisher scoring. The step size is determined by the step-halving technique with ρ = 1 initially and a maximum of 10 halvings.
Covariance Matrix
Let θ$ be the vector of maximum likelihood estimates. Their covariance matrix is given by
cov $ $
θ 4 9 θ θ (^) θ θ
^
− E d l d d
2 1
Let
ψ
β
=
!
"
$
0 1
2
2 2
σ
σ σ
k e
be the original parameters. Their maximum likelihood estimates are given by
ψ
β
=
!
"
$
0 2 12
2 2 2
σ γ
σ γ σ
e
e k e
and their covariance matrix is estimated by
cov (^) 2 7ψ$^ = J cov (^) 4 9θ$ J ′
where
!
"
$
× ×
m m k m k m e k
0 0 0 0
2 0 0 1
σ γ
which is the 1 m (^) 0 + k + (^1) 6 1 × m (^) 0 + k + 16 Jacobian matrix of transforming θ to ψ.
The restricted maximum likelihood method finds a linear transformation on y such that the resulting vector does not involve the fixed effect parameter vector β 0 regardless of their values. It has been shown that these linear combinations are the residuals obtained after a linear regression on the fixed effects. Suppose r is the
be an n × 0 n − r 5 matrix whose columns are these linearly independent combinations. Then the properties of K are (Searle et al., 1992, Chapter 6):
′ = ′ =
(^0 0) n r 5 m 0
where T is a 0 n − r 5 × n matrix with linearly independent rows and
M = I (^) n − X (^) 0 1 X X ′ 0 (^) 0 6 − X ′ 0
It can be shown that REML estimation is invariant to K (Searle et al., 1992, Chapter 6); thus, we can choose K such that K K ′ = I n − r to simplify calculations. It follows that the distribution of K y ′ is N (^) n − r 4 (^0) , σ (^) e^2 K VK ′ 9.
Parameters
The parameter vector is θ = γ
!
" $
2 σ (^2) e^ where^ γ^
2
12
2
!
"
$
γ
γ
k
The Hessian Matrix
∂ γ ∂ γ σ ∂ ∂ σ ∂ γ σ ∂ ∂ σ ∂ σ σ σ
2 2 2 2 2 2 2 4 2 2 2 6 4
l (^) i k j k
l (^) j k
l n r
i j e
i i j j i j j i
e j e
j j
e e e e
y RX X RX X Ry X RX X RX
y RX X Ry
y Ry
tr , , ; , ,
3 8 K^ K
The Hessian matrix is
d l d d
l l
l l
e
e e e
2
2 2 2
2 2 2 2 2 2
2 2 2
θ θ
γ γ γ
γ
!
"
$
∂ ∂ ∂
∂ ∂ ∂ σ ∂ ∂ σ ∂
∂ ∂ σ ∂ σ
where
∂ γ ∂ γ
∂ γ ∂ γ
∂ γ ∂ γ
∂ γ ∂ γ
2 2 2
2 12 12
2 12 2
2 2 12
2 2 2
l
l l
l l
k
k k k
γ γ
!
"
$
and^ ∂ ∂ σ ∂
∂ σ ∂ γ
∂ σ ∂ γ
2 2 2
2 2 12
2 2 2
l
l
l
e
e
e k
γ
!
"
$
M
The Fisher Information Matrix
Since K X ′ (^) 0 = (^0) 0 n − r 5 × m 0 and trace 0 RV 5 = n − r , the expected second derivatives are
E l^ i k j k i j
i j j i
∂ γ ∂ γ
2 2 2
= −^ tr^3 X RX X RX^ ′^ ′^8 =^ ,^ K,^ ,^ = ,^ K,
E l^ j k
E l^ n^ r
e j e
j j
e e e
∂ ∂ σ ∂ γ σ ∂ ∂ σ ∂ σ σ
2 2 2 2
2 2 2 4
= −^ ′^ =
^
^
tr 3 X RX 8 , K,
The Fisher Information matrix is
E d l d d
E l^ E l
E l^ E l
e
e e e
2
2 2 2
2 2 2 2 2 2
2 2 2
θ θ
γ γ γ
γ
^
^
^
^
^
^
^
^
!
"
$
∂ ∂ σ ∂ ∂ σ ∂
∂ σ ∂ σ
where
E l
E l^ E l
E l^ E l
k
k k k
∂ γ ∂ γ
∂ γ ∂ γ
∂ γ ∂ γ
∂ γ ∂ γ
2 2 2
2 12 12
2 12 2
2 2 12
2 2 2
γ γ
^
^
^
^
^
^
^
^
!
"
$
L
and
E l
E l
E l
e
e
e k
∂ σ ∂
∂ σ ∂ γ
∂ σ ∂ γ
2 2 2
2 2 12
2 2 2
γ
^
^
^
^
!
"
$
Otherwise, the Fisher scoring step is used instead. The increment vector for each type of step is:
^
− d l d d
dl d
2 1 .
^
^
− E d l d d
dl d
2 1 .
Convergence Criteria
Given the convergence criterion ε > 0 , the iteration is considered converged when the following criteria are satisfied:
Negative Variance Estimates
Negative variance estimates can occur at the end of an iteration. An ad hoc method is to set those estimates to zero before the next iteration.
Covariance Matrix
Let θ$ be the vector of maximum likelihood estimates. Their covariance matrix is given by
cov $ $
θ 4 9 θ θ (^) θ θ
^
− E d l d d
2 1
Let
ψ =
!
"
$
σ
σ σ
12
2 2
k e
be the original parameters. Their maximum likelihood estimates are given by
ψ =
!
"
$
σ γ
σ γ σ
e
e k e
2 12
2 2 2
and their covariance matrix is estimated by
cov (^) 2 7ψ$^ = J cov (^) 4 9θ$ J ′
where
!
" $
σ (^) e^2 (^) k 0 1
γ
which is the 0 k + 1 5 0× k + 15 Jacobian matrix of transforming θ to ψ.
The ANOVA variance component estimates are obtained by equating the expected mean squares of the random effects to their observed mean squares. The VARCOMP procedure offers two types of sum of squares: Type I and Type III (see Appendix 11 for details).