ANOVA and MGLM - Matrix Computations and Statistical Inference, Study notes of Mathematical Statistics

An in-depth explanation of the computations and statistical inference involved in the analysis of variance (ANOVA) and the multivariate general linear model (MGLM). Topics covered include the formation of augmented row vectors, computation of cross-product and sums-of-squares matrices, t-statistic and partial eta-squared calculations, noncentrality parameter and observed power, correlation, standard error, studentized residuals, and Wilks' lambda. The document also discusses the Hotelling-Lawley trace, Pillai's trace, and the likelihood ratio test statistic.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar
GLM
Univariate and Multivariate
GLM (general linear model) is a general procedure for analysis of variance and
covariance, as well as regression. It can be used for both univariate and multivariate
designs. Repeated measures analysis is also available. Algorithms that apply only to
repeated measures are in the chapter GLM Repeated Measures.
For information on post hoc tests, see Appendix 10. For sums of squares, see
Appendix 11. For distribution functions, see Appendix 12. For Box’s M test, see
Appendix 14.
Notation
The following notation is used throughout this chapter. Unless otherwise stated, all
vectors are column vectors and all quantities are known.
n  Number of cases.
N  Effective sample size.
p  Number of parameters (including the constant, if it exists) in the model.
r  Number of dependent variables in the model.
Y  $n \times r$ matrix of dependent variables. The rows are the cases and the columns are the dependent variables. The $i$th row is $y_i'$, $i = 1, \ldots, n$.
X  $n \times p$ design matrix. The rows are the cases and the columns are the parameters. The $i$th row is $x_i'$, $i = 1, \ldots, n$.
$r_X$  Number of nonredundant columns in the design matrix. Also the rank of the design matrix.
$w_i$  Regression weight of the $i$th case.
$f_i$  Frequency weight of the $i$th case.
B  $p \times r$ unknown parameter matrix. The columns are the dependent variables. The $j$th column is $b_j$, $j = 1, \ldots, r$.
$\Sigma$  $r \times r$ unknown common multiplier of the covariance matrix of any row of Y. The $(i, j)$th element is $\sigma_{ij}$, $i = 1, \ldots, r$, $j = 1, \ldots, r$.

Model

The model is $Y = XB$, and $y_i'$ is independently distributed as an $r$-dimensional normal distribution with mean $x_i'B$ and covariance matrix $w_i^{-1}\Sigma$. The $i$th case is ignored if $w_i \le 0$.

Frequency Weight and Total Sample Size

The frequency weight $f_i$ is the number of replications represented by an SPSS case; therefore, the weight must be a non-negative integer. It is computed by rounding the value in the SPSS weight variable to the nearest integer. The total sample size is

$$N = \sum_{i=1}^{n} f_i \, I(w_i > 0),$$

where $I(w_i > 0) = 1$ if $w_i > 0$ and is equal to 0 otherwise.
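
As a small illustration of the total sample size, the sketch below rounds raw weights and sums them over cases with a positive regression weight. The function names and the convention that halves round upward are assumptions made for this example; the text only says "nearest integer".

```python
import math

def frequency_weight(raw_weight):
    """Round a raw weight to the nearest integer.

    Assumption: halves round upward (2.5 -> 3); the notes do not say
    which tie-breaking rule is used.
    """
    return math.floor(raw_weight + 0.5)

def total_sample_size(raw_freq_weights, regression_weights):
    """N = sum of f_i over the cases whose regression weight w_i is positive."""
    return sum(
        frequency_weight(f_raw)
        for f_raw, w_i in zip(raw_freq_weights, regression_weights)
        if w_i > 0
    )

raw_f = [1.0, 2.4, 3.6, 1.0]   # hypothetical values of the weight variable
w = [1.0, 0.5, 2.0, 0.0]       # the last case has w_i = 0 and is ignored
N = total_sample_size(raw_f, w)  # 1 + 2 + 4 = 7
```

The last case contributes nothing because its regression weight is not positive, which mirrors the indicator in the formula.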

The Cross-Product and Sums-of-Squares Matrices

To prepare for the SWEEP operation, an augmented row vector of length $(p + r)$ is formed:

$$z_i' = (x_i', \; y_i')$$

Then the $(p + r) \times (p + r)$ matrix is computed:

$$Z'WZ = \sum_{i=1}^{n} f_i w_i z_i z_i'.$$

This matrix is partitioned as

$$Z'WZ = \begin{pmatrix} X'WX & X'WY \\ Y'WX & Y'WY \end{pmatrix}$$

The upper left $p \times p$ submatrix is $X'WX$ and the lower right $r \times r$ submatrix is $Y'WY$.
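
The accumulation and partition can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the function names and the tiny data set are hypothetical, and matrices are nested lists.

```python
def cross_product_matrix(X, Y, f, w):
    """Accumulate Z'WZ = sum_i f_i * w_i * z_i z_i', with z_i' = (x_i', y_i')."""
    size = len(X[0]) + len(Y[0])
    zwz = [[0.0] * size for _ in range(size)]
    for x_i, y_i, f_i, w_i in zip(X, Y, f, w):
        if w_i <= 0:
            continue  # cases with a non-positive regression weight are ignored
        z = list(x_i) + list(y_i)  # augmented row vector of length p + r
        for a in range(size):
            for b in range(size):
                zwz[a][b] += f_i * w_i * z[a] * z[b]
    return zwz

def partition(zwz, p):
    """Split Z'WZ into the blocks X'WX, X'WY, Y'WX and Y'WY."""
    xwx = [row[:p] for row in zwz[:p]]
    xwy = [row[p:] for row in zwz[:p]]
    ywx = [row[:p] for row in zwz[p:]]
    ywy = [row[p:] for row in zwz[p:]]
    return xwx, xwy, ywx, ywy

# Hypothetical data: intercept plus one predictor (p = 2), one response (r = 1).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
Y = [[1.0], [2.0], [4.0]]
f = [1, 2, 1]
w = [1.0, 1.0, 1.0]
ZWZ = cross_product_matrix(X, Y, f, w)
XWX, XWY, YWX, YWY = partition(ZWZ, p=2)
```

Accumulating one case at a time means the full matrices X and Y never have to be held as weighted copies; only the $(p + r) \times (p + r)$ result is stored.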

The t Statistic

For testing $H_0: b_{ij} = 0$ versus $H_1: b_{ij} \ne 0$, the t statistic is

$$t = \begin{cases} \hat{b}_{ij} / \mathrm{se}(\hat{b}_{ij}) & \text{if the standard error is positive} \\ \text{SYSMIS} & \text{otherwise} \end{cases}$$

The significance value for this statistic is $2\,(1 - \mathrm{CDF.T}(|t|, N - r_X))$, where CDF.T is the SPSS function for the cumulative t distribution.

Partial Eta Squared Statistic

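$$\eta^2 = \begin{cases} \dfrac{\hat{b}_{ij}^2}{\hat{b}_{ij}^2 + (N - r_X)\,\mathrm{var}(\hat{b}_{ij})} & \text{if } r_X < N \text{ and the denominator is positive} \\ 1 & \text{if } r_X = N \text{ but } \hat{b}_{ij} \ne 0 \\ \text{SYSMIS} & \text{otherwise} \end{cases}$$

The value should be within $0 \le \eta^2 \le 1$.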
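
The t statistic and partial eta squared can be sketched as below. The three-branch reading of the eta-squared definition (ratio when $r_X < N$ and the denominator is positive, 1 when $r_X = N$ with a nonzero estimate, system missing otherwise) is a reconstruction of the garbled formula and should be treated as an assumption, as should the use of `None` for SYSMIS.

```python
SYSMIS = None  # stand-in for the SPSS system-missing value (an assumption)

def t_statistic(b_hat, se):
    """t = b_hat / se(b_hat) when the standard error is positive."""
    return b_hat / se if se > 0 else SYSMIS

def partial_eta_squared(b_hat, var_b, N, r_X):
    """Partial eta squared for a single parameter estimate."""
    if r_X < N:
        denom = b_hat ** 2 + (N - r_X) * var_b
        return b_hat ** 2 / denom if denom > 0 else SYSMIS
    if r_X == N and b_hat != 0:
        return 1.0
    return SYSMIS

# Hypothetical estimate 2.0 with standard error 0.5, N = 12, r_X = 2.
t = t_statistic(2.0, 0.5)
eta2 = partial_eta_squared(2.0, 0.5 ** 2, N=12, r_X=2)
```

When the variance is the squared standard error, the ratio equals $t^2 / (t^2 + N - r_X)$, which is a quick way to check a computed value by hand.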

Noncentrality Parameter

$c = |t|$

Observed Power

$$p = \begin{cases} 1 - \mathrm{NCDF.T}(t_c, N - r_X, c) + \mathrm{NCDF.T}(-t_c, N - r_X, c) & \text{if } r_X < N \\ \text{SYSMIS} & \text{if } r_X = N \text{ or any arguments to NCDF.T or IDF.T are SYSMIS} \end{cases}$$

where $t_c = \mathrm{IDF.T}(1 - \alpha/2, N - r_X)$ and $\alpha$ is the user-specified chance of Type I error $(0 < \alpha < 1)$. NCDF.T and IDF.T are the SPSS functions for the cumulative noncentral t distribution and for the inverse cumulative t distribution, respectively. The default value is $\alpha = 0.05$. The observed power should be within $0 \le p \le 1$.

Confidence Interval

For the $p\%$ level, the individual univariate confidence interval for the parameter is

$$\hat{b}_{ij} \pm t_\alpha \, \mathrm{se}(\hat{b}_{ij})$$

where $t_\alpha = \mathrm{IDF.T}(0.5\,(1 + p/100), N - r_X)$, for each element $\hat{b}_{ij}$ of $\hat{B}$. Here $p$ denotes the confidence level; its default value is 95 $(0 < p < 100)$.

Correlation

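$$\mathrm{corr}(\hat{b}_{ij}, \hat{b}_{rs}) = \begin{cases} \dfrac{\hat{\sigma}_{js}\, g_{ir}}{\mathrm{se}(\hat{b}_{ij}) \times \mathrm{se}(\hat{b}_{rs})} & \text{if the standard errors are positive} \\ \text{SYSMIS} & \text{otherwise} \end{cases}$$

for $i, r = 1, \ldots, p$; $j, s = 1, \ldots, r$.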

Estimated Marginal Means

Estimated marginal means (EMMEANS) are computed as the generic $l'\hat{B}m$ expression with appropriate $l$ and $m$ vectors. $l$ is a column vector of length $p$ and $m$ is a column vector of length $r$. Since the $l$ vector is chosen to be always estimable, the quantity $l'\hat{B}m$ is in fact the estimated modified marginal means (Searle, Speed, and Milliken, 1980). When covariates (or products of covariates) are present in the effects, the overall means of the covariates (or products of covariates) are used in the L matrix. Suppose X and Y are covariates and they appear as XY in an effect; then the mean of XY is used instead of the product of the mean of X and the mean of Y.
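
To make the generic expression concrete, the sketch below evaluates the bilinear form $l'\hat{B}m$ for a small made-up example. The function name and the numbers are hypothetical; how $l$ and $m$ are actually chosen is described in the text, not here.

```python
def bilinear_form(l, B, m):
    """Return l' B m for a length-p list l, a p x r nested list B, and a length-r list m."""
    return sum(
        l[a] * B[a][j] * m[j]
        for a in range(len(l))
        for j in range(len(m))
    )

# Hypothetical estimates: p = 2 parameters (rows), r = 2 dependent variables (columns).
B_hat = [[10.0, 12.0],
         [2.0, 3.0]]
l = [1.0, 0.5]   # e.g. intercept plus a covariate evaluated at its mean of 0.5
m = [0.5, 0.5]   # average of the two dependent variables
emmean = bilinear_form(l, B_hat, m)  # 5 + 6 + 0.5 + 0.75 = 12.25
```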

L Matrix

For each level combination of the between-subjects factors in TABLES, identify the nonmissing cases with positive case weights and positive regression weights which are associated with the current level combination. Suppose the cases are classified by three between-subjects factors: A, B and C. Now A and B are specified in TABLES and the current level combination is A=1 and B=2. A case in the cell A=1, B=2, and C=3 is associated with the current level combination,

Significance

The t statistic is

$$t = \begin{cases} l'\hat{B}m \,/\, \mathrm{se}(l'\hat{B}m) & \text{if } \mathrm{se}(l'\hat{B}m) > 0 \\ \text{SYSMIS} & \text{otherwise} \end{cases}$$

If the t statistic is not system missing, then the significance is computed based on a t distribution with $N - r_X$ degrees of freedom.

Pairwise Comparison

Between-Subjects Factor

Suppose the $l$ vectors are indexed by the level of the between-subjects factor as $l_{i_1, \ldots, i_b}$, $i_s = 1, \ldots, n_s$ and $s = 1, \ldots, b$, where $n_s$ is the number of levels of between-subjects factor $s$ and $b$ is the number of between-subjects factors specified inside TABLES. The difference in estimated marginal means of level $i_s$ and level $i_s'$ of between-subjects factor $s$ at fixed levels of other between-subjects factors is

$$\left( l_{i_1, \ldots, i_{s-1}, i_s, i_{s+1}, \ldots, i_b} - l_{i_1, \ldots, i_{s-1}, i_s', i_{s+1}, \ldots, i_b} \right)' \hat{B} m \quad \text{for } i_s, i_s' = 1, \ldots, n_s; \; i_s \ne i_s'.$$

The standard error of the difference is computed by substituting for $l$ in (1): $l_{i_1, \ldots, i_{s-1}, i_s, i_{s+1}, \ldots, i_b} - l_{i_1, \ldots, i_{s-1}, i_s', i_{s+1}, \ldots, i_b}$.

Within-Subjects Factor

Suppose the $m$ vectors are indexed by level of the within-subjects factor as $m_{j_1, \ldots, j_w}$, $j_s = 1, \ldots, n_s$ and $s = 1, \ldots, w$, where $n_s$ is the number of levels of within-subjects factor $s$ and $w$ is the number of within-subjects factors specified inside TABLES. The difference in estimated marginal means of level $j_s$ and level $j_s'$ of within-subjects factor $s$ at fixed levels of other within-subjects factors is

$$l' \hat{B} \left( m_{j_1, \ldots, j_{s-1}, j_s, j_{s+1}, \ldots, j_w} - m_{j_1, \ldots, j_{s-1}, j_s', j_{s+1}, \ldots, j_w} \right) \quad \text{for } j_s, j_s' = 1, \ldots, n_s; \; j_s \ne j_s'.$$

The standard error of the difference is computed by substituting for $m$ in (1): $m_{j_1, \ldots, j_{s-1}, j_s, j_{s+1}, \ldots, j_w} - m_{j_1, \ldots, j_{s-1}, j_s', j_{s+1}, \ldots, j_w}$.

Confidence Interval

The $(1 - \alpha) \times 100\%$ confidence interval is

$$l'\hat{B}m \pm t_{1 - \alpha/2;\, N - r_X} \times \mathrm{se}(l'\hat{B}m)$$

and $t_{1 - \alpha/2;\, N - r_X}$ is the $(1 - \alpha/2) \times 100\%$ percentile of a t distribution with $N - r_X$ degrees of freedom. No confidence interval is computed if $N - r_X \le 0$.

Saved Values

Temporary variables can be added to the working data file. These include predicted values, residuals, and diagnostics.

Predicted Values

The $n \times r$ matrix of predicted values is $\hat{Y} = X\hat{B}$. The $i$th row of $\hat{Y}$ is $\hat{y}_i' = x_i'\hat{B}$, $i = 1, \ldots, n$. Let the elements of $\hat{Y}$ be $\hat{y}_{ij}$ and the elements of $XGX'$ be $\pi_{ij}$.

The standard error of $\hat{y}_{ij}$ is

$$\mathrm{se}(\hat{y}_{ij}) = \sqrt{\hat{\sigma}_{jj}\, \pi_{ii}} \quad \text{for } i = 1, \ldots, n; \; j = 1, \ldots, r$$

The weighted predicted value of the $i$th case is $\sqrt{w_i}\, \hat{y}_i'$.

Residuals

The $n \times r$ matrix of residuals is $\hat{E} = Y - \hat{Y}$.

The $i$th row of $\hat{E}$ is $\hat{e}_i' = y_i' - \hat{y}_i'$, $i = 1, \ldots, n$.

Let the elements of $\hat{E}$ be $\hat{e}_{ij}$; then

$$\hat{e}_{ij} = y_{ij} - \hat{y}_{ij}, \quad \text{for } i = 1, \ldots, n; \; j = 1, \ldots, r$$

The weighted residual is $\sqrt{w_i}\, \hat{e}_i'$.

Diagnostics

Cook’s Distance

Cook’s Distance D measures the change to the solution that results from omitting each observation. The formula is

$$D_{ij} = \left( \frac{\hat{e}_{ij}}{\sqrt{\hat{\sigma}_{jj}\,(1/w_i - \pi_{ii})}} \right)^{2} \left( \frac{\pi_{ii}}{1/w_i - \pi_{ii}} \right) \frac{1}{r_X}$$

for $i = 1, \ldots, n$; $j = 1, \ldots, r$. This formula is equivalent to

$$D_{ij} = \left( \frac{\hat{e}_{ij}}{\mathrm{se}(\hat{e}_{ij})} \right)^{2} \left( \frac{\mathrm{se}(\hat{y}_{ij})}{\mathrm{se}(\hat{e}_{ij})} \right)^{2} \Big/ \, r_X$$

provided $w_i > 0$ and $\mathrm{se}(\hat{e}_{ij}) > 0$.

When $w_i \le 0$ or $\mathrm{se}(\hat{e}_{ij}) = 0$, $D_{ij}$ is system missing.
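
The equivalence of the two forms of Cook's distance can be checked numerically. The sketch below assumes $\mathrm{se}(\hat{e}_{ij}) = \sqrt{\hat{\sigma}_{jj}(1/w_i - \pi_{ii})}$ and $\mathrm{se}(\hat{y}_{ij}) = \sqrt{\hat{\sigma}_{jj}\pi_{ii}}$; the residual standard error is not defined in this excerpt, so that expression is an assumption, as are the function names and the sample values.

```python
import math

SYSMIS = None  # stand-in for the SPSS system-missing value (an assumption)

def cooks_distance(e_ij, sigma_jj, pi_ii, w_i, r_X):
    """First form: written with sigma_jj, pi_ii and the regression weight w_i."""
    if w_i <= 0:
        return SYSMIS
    gap = 1.0 / w_i - pi_ii
    if sigma_jj * gap <= 0:
        return SYSMIS  # the residual standard error would not be positive
    return (e_ij ** 2 / (sigma_jj * gap)) * (pi_ii / gap) / r_X

def cooks_distance_from_se(e_ij, se_e, se_y, r_X):
    """Second form: written with the standard errors of residual and prediction."""
    if se_e <= 0:
        return SYSMIS
    return (e_ij / se_e) ** 2 * (se_y / se_e) ** 2 / r_X

# Hypothetical values for one case and one dependent variable.
e_ij, sigma_jj, pi_ii, w_i, r_X = 1.5, 4.0, 0.2, 1.0, 3
se_e = math.sqrt(sigma_jj * (1.0 / w_i - pi_ii))  # assumed form of se(e_ij)
se_y = math.sqrt(sigma_jj * pi_ii)
d_first = cooks_distance(e_ij, sigma_jj, pi_ii, w_i, r_X)
d_second = cooks_distance_from_se(e_ij, se_e, se_y, r_X)
```

Both calls give about 0.0586 for these inputs, which is what the stated equivalence predicts.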

Leverage (Uncentered)

The leverage for the $i$th case ($i = 1, \ldots, n$) for all dependent variables is

$$\mathrm{LEVER}_i = \begin{cases} w_i \pi_{ii} & \text{if } w_i > 0 \\ \text{SYSMIS} & \text{otherwise} \end{cases}$$
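
A minimal sketch of the leverage computation follows, taking $\pi_{ii} = x_i' G x_i$ as the $i$th diagonal element of $XGX'$. The matrix $G$ is not defined in this excerpt; the example assumes it is an inverse of $X'WX$, and the function name and data are hypothetical.

```python
SYSMIS = None  # stand-in for the SPSS system-missing value (an assumption)

def leverage(X, w, G):
    """Return LEVER_i = w_i * pi_ii for each case, with pi_ii = x_i' G x_i."""
    result = []
    for x_i, w_i in zip(X, w):
        if w_i <= 0:
            result.append(SYSMIS)
            continue
        pi_ii = sum(
            x_i[a] * G[a][b] * x_i[b]
            for a in range(len(x_i))
            for b in range(len(x_i))
        )
        result.append(w_i * pi_ii)
    return result

# Intercept-only model with four cases; the last case has w_i = 0.
X = [[1.0], [1.0], [1.0], [1.0]]
w = [1.0, 1.0, 2.0, 0.0]
G = [[0.25]]  # assumed: inverse of X'WX = 1 + 1 + 2 + 0 = 4
lever = leverage(X, w, G)
```

In this example the non-missing leverages sum to 1, the rank of the design matrix, which is a convenient sanity check for uncentered leverage.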

Hypothesis Testing

Let $L$ be an $l \times p$ known matrix, $M$ be an $r \times m$ known matrix and $K$ be an $l \times m$ known matrix. The test hypotheses $H_0: LBM = K$ versus $H_1: LBM \ne K$ are testable if and only if $LB$ is estimable. The following results apply to testable hypotheses only. Nontestable hypotheses are excluded.

The hypothesis SSCP matrix is $S_H = (L\hat{B}M - K)'(LGL')^{-1}(L\hat{B}M - K)$ and the error SSCP matrix is $S_E = M'SM$.

Four test statistics, based on the eigenvalues of $S_E^{-1}S_H$, are available: Wilks’ lambda, Hotelling-Lawley trace, Pillai’s trace, and Roy’s largest root. Let the eigenvalues of $S_E^{-1}S_H$ be $\lambda_1 \ge \cdots \ge \lambda_{r_E} \ge 0$ and $\lambda_{r_E + 1}, \ldots, \lambda_m = 0$, and let $r_E = \mathrm{rank}(S_E)$; $s = \min(l, r_E)$; $n_e = n - r_X$; $m^* = \tfrac{1}{2}(|r_E - l| - 1)$; $n^* = \tfrac{1}{2}(n_e - r_E - 1)$.

Wilks’ Lambda

$$\Lambda = \frac{\det(S_E)}{\det(S_H + S_E)} = \prod_{k=1}^{m} \frac{1}{1 + \lambda_k}$$

When $H_0$ is true, the F statistic

$$F = \frac{1 - \Lambda^{1/\tau}}{\Lambda^{1/\tau}} \cdot \frac{\varsigma\tau - 2\upsilon}{l\, r_E}$$

follows asymptotically an F distribution, where

$$\varsigma = n_e - \tfrac{1}{2}(r_E - l + 1)$$

$$\upsilon = \tfrac{1}{4}(l\, r_E - 2)$$

$$\tau = \begin{cases} \sqrt{\dfrac{l^2 r_E^2 - 4}{l^2 + r_E^2 - 5}} & \text{if } l^2 + r_E^2 - 5 > 0 \\ 1 & \text{otherwise} \end{cases}$$

The degrees of freedom are $(l\, r_E, \; \varsigma\tau - 2\upsilon)$. The F statistic is exact if $s = 1, 2$. See Rao (1951) and Section 8c.5 of Rao (1973) for details.

The eta-squared statistic is $\eta^2 = 1 - \Lambda^{1/s}$.

The noncentrality parameter is $\lambda = (\varsigma\tau - 2\upsilon)\, \eta^2 / (1 - \eta^2)$.

The power is $1 - \mathrm{NCDF.F}(F_\alpha, \; l\, r_E, \; \varsigma\tau - 2\upsilon, \; \lambda)$, where $F_\alpha$ is the upper $100\alpha$ percentage point of the central F distribution, and $\alpha$ is user-specified on the ALPHA keyword on the CRITERIA subcommand.
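
The Wilks' lambda quantities can be traced with a short sketch. It uses the usual form of Rao's F approximation, with $\varsigma = n_e - \tfrac12(r_E - l + 1)$, $\upsilon = \tfrac14(l r_E - 2)$ and $\tau = \sqrt{(l^2 r_E^2 - 4)/(l^2 + r_E^2 - 5)}$ when the denominator is positive (1 otherwise). Because the formulas in this excerpt are badly garbled, treat these expressions, the function names and the example eigenvalues as assumptions.

```python
import math

def wilks_lambda(eigenvalues):
    """Lambda = product over the eigenvalues of 1 / (1 + lambda_k)."""
    lam = 1.0
    for ev in eigenvalues:
        lam *= 1.0 / (1.0 + ev)
    return lam

def rao_f(Lambda, l, r_E, n_e):
    """Return (F, df1, df2) for Rao's F approximation to Wilks' lambda."""
    varsigma = n_e - 0.5 * (r_E - l + 1)
    upsilon = 0.25 * (l * r_E - 2)
    d = l ** 2 + r_E ** 2 - 5
    tau = math.sqrt((l ** 2 * r_E ** 2 - 4) / d) if d > 0 else 1.0
    df1 = l * r_E
    df2 = varsigma * tau - 2 * upsilon
    root = Lambda ** (1.0 / tau)
    return (1 - root) / root * df2 / df1, df1, df2

# Hypothetical eigenvalues of S_E^{-1} S_H with l = 2, r_E = 2, n_e = 20.
l, r_E, n_e = 2, 2, 20
Lambda = wilks_lambda([1.0, 0.25])            # 0.5 * 0.8 = 0.4
F, df1, df2 = rao_f(Lambda, l, r_E, n_e)      # df1 = 4, df2 = 38
s = min(l, r_E)
eta2 = 1 - Lambda ** (1.0 / s)
noncentrality = df2 * eta2 / (1 - eta2)
```

For $s = 2$ the exponent $1/\tau$ equals $1/s$, so the noncentrality parameter coincides with $l\, r_E \times F$ in this example; the power would then follow from a noncentral F distribution function, which is outside the standard library and is left out here.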

follows asymptotically an F distribution with degrees of freedom $(s(2m^* + s + 1), \; s(2n^* + s + 1))$. The F statistic is exact if $s = 1$.

The eta-squared statistic is $\eta^2 = V / s$.

The noncentrality parameter is $\lambda = s(2n^* + s + 1)\, \eta^2 / (1 - \eta^2)$.

The power is $1 - \mathrm{NCDF.F}(F_\alpha, \; s(2m^* + s + 1), \; s(2n^* + s + 1), \; \lambda)$, where $F_\alpha$ is the upper $100\alpha$ percentage point of the central F distribution and $\alpha$ is user-specified on the ALPHA keyword on the CRITERIA subcommand.

Roy’s Largest Root

$$\Theta = \lambda_1$$

which is the largest eigenvalue of $S_E^{-1}S_H$. When $H_0$ is true, the F statistic is

$$F = \Theta\, (n_e - \omega + r_H) / \omega$$

where $\omega = \max(l, r_E)$, is an upper bound of F that yields a lower bound on the significance level. The degrees of freedom are $(\omega, \; n_e - \omega + r_H)$. The F statistic is exact if $s = 1$.

The eta-squared statistic is $\eta^2 = \Theta / (1 + \Theta)$.

The noncentrality parameter is $\lambda = (n_e - \omega + r_H)\, \eta^2 / (1 - \eta^2)$.

The power is $1 - \mathrm{NCDF.F}(F_\alpha, \; \omega, \; n_e - \omega + l, \; \lambda)$, where $F_\alpha$ is the upper $100\alpha$ percentage point of the central F distribution and $\alpha$ is user-specified on the ALPHA keyword on the CRITERIA subcommand.

Individual Univariate Test

$$F = \frac{S_{H;i} / l}{S_{E;i} / (n - r_X)}, \quad i = 1, \ldots, m$$

where $S_{H;i}$ and $S_{E;i}$ are the $i$th diagonal elements of the matrices $S_H$ and $S_E$ respectively. Under the null hypothesis, the F statistic has an F distribution with degrees of freedom $(l, \; n - r_X)$.

The partial eta-squared statistic is $\eta^2 = S_{H;i} / (S_{H;i} + S_{E;i})$.

The noncentrality parameter is $\lambda = (n - r_X)\, S_{H;i} / S_{E;i}$.

The power is $1 - \mathrm{NCDF.F}(F_\alpha, \; l, \; n - r_X, \; \lambda)$, where $F_\alpha$ is the upper $100\alpha$ percentage point of the central F distribution and $\alpha$ is user-specified on the ALPHA keyword on the CRITERIA subcommand.
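
The individual univariate test reduces to a few arithmetic steps on the diagonal elements of the two SSCP matrices. The sketch below assumes the F ratio $(S_{H;i}/l) / (S_{E;i}/(n - r_X))$; the function name and input values are hypothetical.

```python
def univariate_test(S_H_i, S_E_i, l, n, r_X):
    """Return (F, partial eta squared, noncentrality) for one dependent variable."""
    df1, df2 = l, n - r_X
    F = (S_H_i / df1) / (S_E_i / df2)
    eta2 = S_H_i / (S_H_i + S_E_i)
    noncentrality = df2 * S_H_i / S_E_i
    return F, eta2, noncentrality

# Hypothetical diagonal elements of S_H and S_E for one dependent variable.
F_i, eta2_i, nc_i = univariate_test(S_H_i=30.0, S_E_i=60.0, l=2, n=23, r_X=3)
```

Here F is 5, partial eta squared is 1/3, and the noncentrality parameter is 10, which equals $l \times F$ as the two definitions imply.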

Bartlett’s Test of Sphericity

Bartlett’s test of sphericity is printed when the Residual SSCP matrix is requested.

Hypotheses

In Bartlett’s test of sphericity the null hypothesis is $H_0: \Sigma = \sigma^2 I_r$ versus the alternative hypothesis $H_1: \Sigma \ne \sigma^2 I_r$, where $\sigma^2 > 0$ is unspecified and $I_r$ is an $r \times r$ identity matrix.

Likelihood Ratio Test Statistic

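$$\lambda = \begin{cases} \dfrac{|A|^{n/2}}{\left( \mathrm{trace}(A) / r \right)^{nr/2}} & \text{if } \mathrm{trace}(A) > 0 \\ \text{SYSMIS} & \text{if } \mathrm{trace}(A) \le 0 \end{cases}$$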

Significance

$$1 - \mathrm{CDF.CHISQ}(c, f) - \omega_2 \left[ \mathrm{CDF.CHISQ}(c, f + 4) - \mathrm{CDF.CHISQ}(c, f) \right]$$

where CDF.CHISQ is the SPSS function for the cumulative chi-square distribution. The significance is reset to zero whenever the computed value is less than zero due to floating point imprecision.

Custom Hypothesis Tests

The TEST subcommand offers custom hypothesis tests. The hypothesis term is any effect specified (either explicitly or implicitly) in the DESIGN subcommand. The error term can be a linear combination of effects that are specified in the DESIGN subcommand or a sum of squares with specified degrees of freedom. The TEST subcommand is available only for univariate analysis; therefore, an F statistic is computed. When the error term is a linear combination of effects and no value for degrees of freedom is specified, the error degrees of freedom is approximated by the Satterthwaite (1946) method.

Notation

The following notation is used in this section:

S  Number of effects in the linear combination
$q_s$  Coefficient of the $s$th effect in the linear combination, $s = 1, \ldots, S$
$l_s$  Degrees of freedom of the $s$th effect in the linear combination, $s = 1, \ldots, S$
$MS_s$  Mean square of the $s$th effect in the linear combination, $s = 1, \ldots, S$
Q  Linear combination of effects
$l_Q$  Degrees of freedom of the linear combination
$MS_Q$  Mean square of the linear combination

Error Term

Mean Squares

If the error term is a linear combination of effects, the error mean square is

$$MS_Q = \sum_{s=1}^{S} q_s \times MS_s$$

If the user supplied the mean squares, $MS_Q$ is equal to the number specified after the keyword VS. If $MS_Q < 0$, the custom error term is invalid, and $MS_Q$ is equal to the system-missing value and an error message is issued.

Degrees of Freedom

If $MS_Q \ge 0$ and the user did not supply the error degrees of freedom, then the error degrees of freedom is approximated using the Satterthwaite (1946) method. Define

$$d_s = \begin{cases} (q_s \times MS_s)^2 / l_s & \text{if } l_s > 0 \\ 0 & \text{otherwise} \end{cases}$$

Then $D = \sum_{s=1}^{S} d_s$. The approximate error degrees of freedom is

$$l_Q = \begin{cases} (MS_Q)^2 / D & \text{if } D > 0 \\ \text{SYSMIS} & \text{otherwise} \end{cases}$$

If $MS_Q \ge 0$ and the user supplied the error degrees of freedom, $l_Q$ is equal to the number following the keyword DF. If $l_Q < 0$, the custom degrees of freedom is invalid. In this case, $l_Q$ is equal to the system-missing value and an error message is issued.
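
The Satterthwaite approximation for a linear combination of mean squares can be sketched as follows. The function names, the use of `None` for the system-missing value, and the example coefficients are assumptions made for illustration.

```python
SYSMIS = None  # stand-in for the SPSS system-missing value (an assumption)

def error_mean_square(q, ms):
    """MS_Q = sum over s of q_s * MS_s."""
    return sum(q_s * ms_s for q_s, ms_s in zip(q, ms))

def satterthwaite_df(q, ms, df):
    """Approximate error degrees of freedom l_Q = MS_Q^2 / D."""
    ms_q = error_mean_square(q, ms)
    if ms_q < 0:
        return SYSMIS  # a negative error mean square makes the error term invalid
    # d_s = (q_s * MS_s)^2 / l_s when l_s > 0, and 0 otherwise
    D = sum(
        (q_s * ms_s) ** 2 / l_s
        for q_s, ms_s, l_s in zip(q, ms, df)
        if l_s > 0
    )
    return ms_q ** 2 / D if D > 0 else SYSMIS

# Hypothetical error term: MS_1 + MS_2 - MS_3.
q = [1.0, 1.0, -1.0]
ms = [10.0, 6.0, 4.0]
df = [2, 3, 6]
MS_Q = error_mean_square(q, ms)   # 12.0
l_Q = satterthwaite_df(q, ms, df)
```

With these inputs $D = 50 + 12 + 16/6$, so $l_Q = 144 / D$ is roughly 2.23; the approximate degrees of freedom is generally not an integer.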

$\sigma_e^2$  Unknown variance of the residual term, $\sigma_e^2 > 0$.
$X_i$  The $n \times p_i$ design matrix, $i = 0, 1, \ldots, k$.
$\beta_0$  The length $p_0$ vector of parameters of the fixed effects.
$\beta_i$  The length $p_i$ vector of parameters of the $i$th random effect, $i = 1, \ldots, k$.
L  The $s \times p$ full row rank matrix. The rows are estimable functions. $s \ge 1$.

Relationships between these symbols and those defined at the beginning of the chapter are:

  • $p = p_0 + p_1 + \cdots + p_k$
  • $X = [\, X_0 \mid X_1 \mid \cdots \mid X_k \,]$
  • $B = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}$

Model

The mixed model is represented, following Rao (1973), as

$$Y = X_0 \beta_0 + \sum_{i=1}^{k} X_i \beta_i + e$$

The random vectors $\beta_1, \ldots, \beta_k$ and $e$ are assumed to be jointly independent. Moreover, the random vector $\beta_i$ is distributed as $N_{p_i}(0, \sigma_i^2 I_{p_i})$ for $i = 1, \ldots, k$ and the residual vector $e$ is distributed as $N_n(0, \sigma_e^2 W^{-1})$. Thus,

$$E(Y) = X_0 \beta_0$$

$$\mathrm{cov}(Y) = \sum_{i=1}^{k} \sigma_i^2 X_i X_i' + \sigma_e^2 W^{-1}$$

Expected Mean Squares

For the estimable function $L$, the expected hypothesis sum of squares is

$$E(SS_L) = E\left( Y' W^{1/2} A_L W^{1/2} Y \right) = \beta_0' X_0' W^{1/2} A_L W^{1/2} X_0 \beta_0 + \sum_{i=1}^{k} \sigma_i^2 \, \mathrm{trace}\left( X_i' W^{1/2} A_L W^{1/2} X_i \right) + \sigma_e^2 \, \mathrm{trace}(A_L)$$

where

$$A_L = W^{1/2} X G L' (L G L')^{-1} L G X' W^{1/2}$$

Since $L = LGX'WX$, $\mathrm{trace}(A_L) = s$ and $X' W^{1/2} A_L W^{1/2} X = L'(LGL')^{-1}L$. The matrix $X' W^{1/2} A_L W^{1/2} X$ can therefore be computed in the following way:

  1. Compute an $s \times s$ upper triangular matrix $U$ such that $U'U = LGL'$ by the Cholesky decomposition.
  2. Invert the matrix $U$ to give $U^{-1}$.
  3. Compute $C = L'U^{-1}$.

Now we have $X' W^{1/2} A_L W^{1/2} X = CC'$. If the rows of $C$ are partitioned into the same-size submatrices as those contained in $X$, that is,

$$C = \begin{bmatrix} C_0 \\ C_1 \\ \vdots \\ C_k \end{bmatrix}$$

where $C_i$ is a $p_i \times s$ submatrix, then $X_i' W^{1/2} A_L W^{1/2} X_i = C_i C_i'$, $i = 0, 1, \ldots, k$.
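
The three Cholesky steps can be sketched in plain Python as shown below. This is an illustration under stated assumptions: $G$ is taken to be the identity and $L$ is a small made-up full-row-rank matrix, so the example only checks the algebraic identity $CC' = L'(LGL')^{-1}L$; all helper names are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def cholesky_upper(A):
    """Upper-triangular U with U'U = A, for symmetric positive definite A."""
    n = len(A)
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = A[i][j] - sum(U[k][i] * U[k][j] for k in range(i))
            U[i][j] = math.sqrt(s) if i == j else s / U[i][i]
    return U

def invert_upper(U):
    """Invert an upper-triangular matrix column by column (back substitution)."""
    n = len(U)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / U[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(U[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = -s / U[i][i]
    return inv

# Assumed inputs: G is the 3 x 3 identity, L is 2 x 3 with full row rank.
G = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
L = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]

LGLt = matmul(matmul(L, G), transpose(L))   # [[2, 1], [1, 2]]
U = cholesky_upper(LGLt)                    # step 1
U_inv = invert_upper(U)                     # step 2
C = matmul(transpose(L), U_inv)             # step 3, C is p x s
CCt = matmul(C, transpose(C))

# Row partition with p_0 = 1 and p_1 = 2 (hypothetical block sizes).
C_0, C_1 = C[:1], C[1:]
block_11 = matmul(C_1, transpose(C_1))
```

Working with the triangular factor avoids forming $(LGL')^{-1}$ explicitly, and the row blocks of $C$ give each $C_i C_i'$ directly, which is the quantity whose trace enters the expected mean square.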