Variance Components - Mathematics and Statistics - Study Notes, Study notes of Mathematical Statistics

This document has following main points Variance Components, Weights, Model, Minimum Norm Quadratic Unbiased Estimate, Maximum Likelihood Estimate, The Fisher Information Matrix, Iteration Procedure

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar
sangawar 🇮🇳

4.5

(4)

118 documents

1 / 22

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
Variance Components (VARCOMP)
The Variance Components procedure provides estimates for variances of random
effects under a general linear model framework. Four types of estimation methods
are available in this procedure.
Notation
The following notation is used throughout this chapter. Unless otherwise stated, all
vectors are column vectors and all quantities are known.
n Number of observations, n1
k Number of random effects, k0
m0 Number of parameters in the fixed effects, m00
mi Number of parameters in the ith random effect, mi
0, ik=1, ,K
m Total number of parameters, mm m m
k
=+++
01
L
σ
i
2 Unknown variance of the ith random effect,
σ
i
20, ik=1, ,K
σ
e
2 Unknown variance of the residual term, same as
σ
k+1
2
,
σ
e
20>
γ
i
2 Unknown variance ratio of the ith random effect,
γσσ
i
ie
2
22
=,
γ
i
20,
ik=1, ,K, and
γ
k+=
1
21
y The length n vector of observations
e The length n vector of residuals
Xi The nm
i
× design matrix, ik=01,, ,K
β0 The length m0 vector of parameters of the fixed effects
βi The length mi vector of parameters of the ith random effect, ik=1, ,K
Unless otherwise stated, a
p
p
× identity matrix is denoted as Ip, a
p
q× zero
matrix is denoted as 0pq×, and a zero vector of length p is denoted as 0p.
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16

Partial preview of the text

Download Variance Components - Mathematics and Statistics - Study Notes and more Study notes Mathematical Statistics in PDF only on Docsity!

1

The Variance Components procedure provides estimates for variances of random effects under a general linear model framework. Four types of estimation methods are available in this procedure.

Notation

The following notation is used throughout this chapter. Unless otherwise stated, all vectors are column vectors and all quantities are known.

n Number of observations, n ≥ 1

k (^) Number of random effects, k ≥ 0 m 0 Number of parameters in the fixed effects, m 0 ≥ 0 mi Number of parameters in the i th random effect, mi ≥ 0 , i = 1, K, k

m Total number of parameters, m = m 0 + m 1 + L+ mk

σ (^) i^2 Unknown variance of the i th random effect, σ (^) i^2 ≥ 0 , i = 1, K, k σ (^) e^2 Unknown variance of the residual term, same as σ (^) k^2 +^1 , σ (^) e^2 > 0 γ (^) i^2 Unknown variance ratio of the i th random effect, γ (^) i^2^ = σ (^) i^2^ σ e^2 , γ (^) i^2 ≥ 0 , i = 1, K, k , and γ (^) k^2 +^1 = 1 y (^) The length n vector of observations

e The length n vector of residuals

X i The n × mi design matrix, i = 0 1, , K, k β 0 The length m 0 vector of parameters of the fixed effects β i The length mi vector of parameters of the i th random effect, i = 1, K, k Unless otherwise stated, a p × p identity matrix is denoted as I (^) p , a p × q zero matrix is denoted as (^0) p × q , and a zero vector of length p is denoted as (^0) p.

Weights

For the sake of clarity and simplicity, the algorithms described in this chapter assume unit frequency weight and unit regression weight for all cases. Weights can be applied as described in the following two sections.

Frequency Weight

The WEIGHT command specifies frequency weights.

  • Cases with nonpositive frequency are excluded from all calculations in the procedure.
  • Non-integral frequency weight is rounded to the nearest integer.
  • The total sample size is equal to the sum of positive rounded frequency weights.

Regression Weight

The REGWGT subcommand specifies regression weights. Suppose the l th 0 l = 1, K, n 5 case has a regression weight wi > 0 (cases with nonpositive regression weights are excluded from all calculations in the procedure). Let W = diag 1 w (^) 1 , K, w (^) n 6 be the n × n diagonal weight matrix. Then the VARCOMP procedure will perform all calculations as if y is physically transformed to W y

(^12)

and X i to W X

(^12) i ,^ i^ =^ 0 1, ,^ K,^ k ; and then the pertinent algorithm is applied to the transformed data.

Model

The mixed model is represented, following Rao (1973), as

y = X + X + e

(^0 0) ∑ 1

β (^) i β i i

k

and

q i^ k i (^) i k = i ′^ = = +

% & '

SSQ , ,

SSQ

X Ry Ry

1 6 0 5

K

where SSQ( A ) is the sum of squares of all elements of a matrix A.

MINQUE(0)

The prior values are α (^) i = 0 , i = 1, K, k , and α (^) k + 1 = 1. Under this set of prior values, V^ =^ I n and^ R^ =^ I^ n −^ X^ 0 1 X X ′ 0^ 0 6 − X ′ 0. Since this^ R^ is an idempotent matrix, some of the elements of S and q can be simplified to

s i k s j k s n q

i k i i k j j j k k k

, , ,

trace , , ; trace , , ; rank

1 1 1 1 0 1

X RX

X RX

X

y Ry

1 6 3 8 1 6

K

K

Using the algorithm by Goodnight (1978), the elements of S and q are obtained without explicitly computing R. The steps are described as follows: Step 1. Form the symmetric matrix:

′ ′ ′ ′ ′ ′ ′ ′

′ ′ ′ ′ ′ ′ ′ ′



!

"

$

# # # #

X X X X X X X y X X X X X X X y

X X X X X X X y y X y X y X y y

0 0 0 1 0 0 1 0 1 1 1 1

0 1 0 1

L

L

M M M M

L

L

k k

k k k k k k

Step 2. Sweep the above matrix by pivoting on each diagonal of X X ′ 0 0. This produces the following matrix:

G GX X GX X GX y X X G X RX X RX X Ry

X X G X RX X RX X Ry y X G y RX y RX y Ry



!

"

$

# # # #

0 1 0 0 1 0 1 1 1 1

0 1 0 1

L

L

M M M M

L

L

k k

k k k k k k

where G = 1 X X ′ 0 06 −. In the process of computing the above matrix, the rank of X 0 is obtained as the number of nonzero pivots found.

Step 3. Form S and q. The MINQUE(0) of σ are σ =$ S q −.

MINQUE(1)

The prior values are α (^) i = 1 , i = 1 , K, k + 1. Under this set of prior values,

V = X X

i^ i i

k

1

1

. Using Giesbrecht (1983), the matrix S and the vector q are obtained

through an iterative procedure. The steps are described as follows: Step 1. Construct the augmented matrix A = X (^) 0 X (^) 1 L X (^) k y. Then compute the 0 m + 1 5 0× m + 15 matrix T 0 (^) k + 1 5 = A A ′.

Step 2. Define H (^) l X X i i i l

k 0 5 =^ ′ =

1 , and T 0 5 (^) l = A H ′ (^) 0 5− l^1 A^ , l = 1, K, k. Update T 0 (^) l + 15 to

T 0 5 (^) l using the W Transform given in Goodnight and Hemmerle (1979). The updating formula is

T 0 5 (^) l = T 0 (^) l + 5 − A H ′ (^) 0 − l^ + 5 X (^) l 4 I (^) m (^) l + X H l ′ (^) 0 − l + 5 X (^) l 9 X H l ′ (^) 0 l 5 A

− 1 1

1 1

1 1

1

Likelihood Function

The likelihood function is

LL = −^ n^ e  − − ′^ − e 

− (^) − 0 5 0 5θ 2 2 2 1 β 6 1 β 6

1 2 (^1) 2 0 0 π σ V exp y X V^1 y X 0 0 σ^2.

The log-likelihood function is

l L n^ n^ e e

= log = − 2 log 2 − 2 log − 12 log − 1 − ′^ − −. 2

2 2 0 0 π σ^1 0 σ

0 5 4 9 V 1 y X β 6 V 1 y X β 6

The Gradient Vector

∂ ∂ (^) σ ∂ ∂ γ σ ∂ ∂ σ σ σ

l

l (^) i k

l n

e

i e

i i i i

e e e

β 0 2 0

1

2 2

1 1 1

2 4

1 2

= ′^ ′ − ′ =

= ′^ −

− − −

X V r

r V X X V r X V X

r V r

tr , , , ,

4 9 K

where r = yX 0 β 0. The gradient vector is

dl d

l

l

l e

θ

β

γ



!

"

$

# # # # #

∂ σ

0 2

2

The Hessian Matrix

∂ ∂ (^) σ

2 0 0 2

0 1 0

l 1 β β (^) e

= − X V ′ − X

∂ ∂ γ ∂ σ ∂ ∂ γ ∂ γ σ

2 2 0 2

(^1 1 )

2 2 2

1 1 2

1 1 1

l (^) i k

l (^) i k j k

i e

i i

i j

i j j i e

i i j j

β

− −

− − − − −

r V X X V X

X V X X V X r V X X V X X V r

tr , , ; , , ,

K

4 9 K^ K

∂ σ ∂ σ ∂ ∂ σ ∂ γ σ ∂ ∂ σ ∂ σ σ σ

2 2 0 4

(^1 )

2 2 2 4

1 1

2 2 2 4 6

1

l

l (^) j k

l n

e e

e j e

j j

e e e e

β

− −

r V X

r V X X V r

r V r

, K,

The Hessian matrix is

d l d d

l l l

l l l

l l l

e

e

e e e e

2

2 0 0

2 0 2

2

2 0 2 (^2 )

2 2 2

2 2 2 2 (^2 )

2 2 2

2 2 2

θ θ

β β (^) β γ β

γ β γ γ γ

β γ



!

"

$

# # # # # #

∂ ∂ σ ∂ ∂ ∂

∂ ∂ σ ∂ ∂ σ ∂

∂ σ ∂

∂ σ ∂ σ

E l

E l^ j k

E l^ n

e

m

e j e

j j

e e e

∂ σ ∂

∂ ∂ σ ∂ γ σ ∂ ∂ σ ∂ σ σ

2 (^2 ) 2 2 2 2

1

2 2 2 4

0

1 2

β

 ^

 ^

 



 

 = −^ ′^ =

 ^

 ^

tr 4 X V X 9 , K, ,

The Fisher Information matrix is

E d l d d

l

E l^ E l

E l^ E l

m m m m

m m m (^) e

m e e e

2

2 0 0 2 2 2

2 2 2 2 2 2

2 2 2

0 0 0

0 0

0

θ θ

β β

γ γ γ

γ

 ^

 ^

 ^

 ^

 ^

  ′

 ^

 ^

 ^

 



!

"

$

# # # # # #

× −

− ×

∂ ∂ σ ∂ ∂ σ ∂

∂ σ ∂ σ

1 6

1 6

where

E l

E l^ E l

E l^ E l

k

k k k

∂ γ ∂ γ

∂ γ ∂ γ

∂ γ ∂ γ

∂ γ ∂ γ

2 2 2

2 12 12

2 12 2

2 2 12

2 2 2

γ γ

 ^

 ^

 ^

 ^

 ^

 

 ^

 ^

 ^

 



!

"

$

# # # # # #

L

M O M

L

and

E l

E l

E l

e

e

e k

∂ σ ∂

∂ σ ∂ γ

∂ σ ∂ γ

2 2 2

2 2 12

2 2 2

γ

 ^

 ^

 ^

 

 ^

 



!

"

$

# # # # # #

M

Iteration Procedure

Initial Values

  • Fixed Effect Parameters : β$ 0 = 1 X X 0 ′ 0 (^) 6 − X y 0 ′.
  • Random Effect Variance Components : For the i th 0 i = 1, K, k 5 random effect, compute β$ i = 1 X X (^) i ′ (^) i (^) 6 − X y i ′. Then assign the variance of the mi elements of β$ i using divisor 1 mi − 16 to the estimate σ$ (^) i^2 if mi ≥ 2 ; otherwise σ$ (^) i^2 = 0.
  • Residual Variance : σ$ (^) e^2^ = r rn where X = X (^) 0 X (^) 1 L X k and r = y − 0 X X ′ 5 − X y ′. If σ$ (^) e^2 = 0 but k ≥ 1 then reset σ$ (^) e^2 = 10 −^8 so that the iteration can continue. The variance ratios are then computed as γ$ (^) i^2^ = σ$^ i^2^ σ$ e^2 , i = 1, K, k. Following the same method in which the residual variance is initialized, σ$ (^) e^2 > 0 for k ≥ 1.

Updating

At the s th iteration 0 s = 0 1, ,K 5 , the parameter vector is updated as

θ^ $^0 s + 1 5 = θ$^ 0 5 s + ρ∆θ$0 5 s

where ∆ θ$ (^) 0 5 s is the value of increment ∆θ evaluated at θ = θ$^ 0 5 s , and ρ > 0 is a step size such that l (^) 4 θ $^0 s + 1 5 9 ≥ l 4 θ $0 5 s 9. The increment vector depends on the choice of step type—Newton-Raphson versus Fisher scoring. The step size is determined by the step-halving technique with ρ = 1 initially and a maximum of 10 halvings.

Covariance Matrix

Let θ$ be the vector of maximum likelihood estimates. Their covariance matrix is given by

cov $ $

θ 4 9 θ θ (^) θ θ

 ^

 













E d l d d

2 1

Let

ψ

β

=



!

"

$

# # # #

0 1

2

2 2

σ

σ σ

M

k e

be the original parameters. Their maximum likelihood estimates are given by

ψ

β

=



!

"

$

# # # #

0 2 12

2 2 2

σ γ

σ γ σ

e

e k e

M

and their covariance matrix is estimated by

cov (^) 2 7ψ$^ = J cov (^) 4 9θ$ J

where

J

I 0 0

= 0 I



!

"

$

× ×

m m k m k m e k

0 0 0 0

2 0 0 1

σ γ

which is the 1 m (^) 0 + k + (^1) 6 1 × m (^) 0 + k + 16 Jacobian matrix of transforming θ to ψ.

Restricted Maximum Likelihood Estimate (REML)

The restricted maximum likelihood method finds a linear transformation on y such that the resulting vector does not involve the fixed effect parameter vector β 0 regardless of their values. It has been shown that these linear combinations are the residuals obtained after a linear regression on the fixed effects. Suppose r is the

rank of X 0 ; then there are at most n − r linearly independent combinations. Let K

be an n × 0 nr 5 matrix whose columns are these linearly independent combinations. Then the properties of K are (Searle et al., 1992, Chapter 6):

′ = ′ =

K X 0 − ×

K TM

(^0 0) n r 5 m 0

where T is a 0 nr 5 × n matrix with linearly independent rows and

M = I (^) nX (^) 0 1 X X ′ 0 (^) 0 6 − X ′ 0

It can be shown that REML estimation is invariant to K (Searle et al., 1992, Chapter 6); thus, we can choose K such that K K ′ = I nr to simplify calculations. It follows that the distribution of K y ′ is N (^) nr 4 (^0) , σ (^) e^2 K VK ′ 9.

Parameters

The parameter vector is θ = γ

 !

" $

2 σ (^2) e^ where^ γ^

2

12

2



!

"

$

γ

γ

M

k

The Hessian Matrix

∂ γ ∂ γ σ ∂ ∂ σ ∂ γ σ ∂ ∂ σ ∂ σ σ σ

2 2 2 2 2 2 2 4 2 2 2 6 4

l (^) i k j k

l (^) j k

l n r

i j e

i i j j i j j i

e j e

j j

e e e e

y RX X RX X Ry X RX X RX

y RX X Ry

y Ry

tr , , ; , ,

3 8 K^ K

K

The Hessian matrix is

d l d d

l l

l l

e

e e e

2

2 2 2

2 2 2 2 2 2

2 2 2

θ θ

γ γ γ

γ



!

"

$

∂ ∂ ∂

∂ ∂ ∂ σ ∂ ∂ σ ∂

∂ ∂ σ ∂ σ

where

∂ γ ∂ γ

∂ γ ∂ γ

∂ γ ∂ γ

∂ γ ∂ γ

2 2 2

2 12 12

2 12 2

2 2 12

2 2 2

l

l l

l l

k

k k k

γ γ



!

"

$

# # # # # #

L

M O M

L

and^ ∂ ∂ σ ∂

∂ σ ∂ γ

∂ σ ∂ γ

2 2 2

2 2 12

2 2 2

l

l

l

e

e

e k

γ



!

"

$

# # # # # #

M

The Fisher Information Matrix

Since K X ′ (^) 0 = (^0) 0 nr 5 × m 0 and trace 0 RV 5 = nr , the expected second derivatives are

E l^ i k j k i j

i j j i

∂ γ ∂ γ

2 2 2

 



 

 = −^ tr^3 X RX X RX^ ′^ ′^8 =^ ,^ K,^ ,^ = ,^ K,

E l^ j k

E l^ n^ r

e j e

j j

e e e

∂ ∂ σ ∂ γ σ ∂ ∂ σ ∂ σ σ

2 2 2 2

2 2 2 4

 



 

 = −^ ′^ =

 ^

 ^

tr 3 X RX 8 , K,

The Fisher Information matrix is

E d l d d

E l^ E l

E l^ E l

e

e e e

2

2 2 2

2 2 2 2 2 2

2 2 2

θ θ

γ γ γ

γ

 ^

 ^

 ^

 ^

 ^

   ^

 ^

 ^

 



!

"

$

∂ ∂ σ ∂ ∂ σ ∂

∂ σ ∂ σ

where

E l

E l^ E l

E l^ E l

k

k k k

∂ γ ∂ γ

∂ γ ∂ γ

∂ γ ∂ γ

∂ γ ∂ γ

2 2 2

2 12 12

2 12 2

2 2 12

2 2 2

γ γ

 ^

 ^

 ^

 ^

 ^

 

 ^

 ^

 ^

 



!

"

$

# # # # # #

L

M O M

L

and

E l

E l

E l

e

e

e k

∂ σ ∂

∂ σ ∂ γ

∂ σ ∂ γ

2 2 2

2 2 12

2 2 2

γ

 ^

 ^

 ^

 

 ^

 



!

"

$

# # # # # #

M

Otherwise, the Fisher scoring step is used instead. The increment vector for each type of step is:

  • Newton-Raphson Step :∆θ θ θ θ

 ^

 

d l d d

dl d

2 1 .

  • Fisher Scoring Step : ∆θ θ θ θ

 ^

 

 ^

 

E d l d d

dl d

2 1 .

Convergence Criteria

Given the convergence criterion ε > 0 , the iteration is considered converged when the following criteria are satisfied:

  1. l (^) 4 θ $^0 s + 1 5 9 4− l θ $^ 0 5 s (^) 9 < ε × max  1 , l 4 θ $0 5 s 9 , and
  2. ρ 4 θ $^0 s + 1 5 − θ$^ 0 5 s (^) 9 < ε×max 41 , θ$0 5 s 9 where a is the sum of absolute values of elements of the vector a.

Negative Variance Estimates

Negative variance estimates can occur at the end of an iteration. An ad hoc method is to set those estimates to zero before the next iteration.

Covariance Matrix

Let θ$ be the vector of maximum likelihood estimates. Their covariance matrix is given by

cov $ $

θ 4 9 θ θ (^) θ θ

 ^

 













E d l d d

2 1

Let

ψ =



!

"

$

σ

σ σ

12

2 2

M

k e

be the original parameters. Their maximum likelihood estimates are given by

ψ =



!

"

$

σ γ

σ γ σ

e

e k e

2 12

2 2 2

M

and their covariance matrix is estimated by

cov (^) 2 7ψ$^ = J cov (^) 4 9θ$ J

where

J = I

 !

" $

σ (^) e^2 (^) k 0 1

γ

which is the 0 k + 1 5 0× k + 15 Jacobian matrix of transforming θ to ψ.

ANOVA Estimate

The ANOVA variance component estimates are obtained by equating the expected mean squares of the random effects to their observed mean squares. The VARCOMP procedure offers two types of sum of squares: Type I and Type III (see Appendix 11 for details).