Multinomial Loglinear and Logit Models in GENLOG: Estimation and Analysis, Study notes of Mathematical Statistics

An overview of multinomial loglinear and logit models in GENLOG, focusing on normalizing constants, cell structure values, likelihood equations, the Hessian matrix, and estimation methods. It covers maximum-likelihood estimates, the Newton-Raphson method, initial values, and stopping criteria.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar
GENLOG

Multinomial Loglinear and Logit Models

This chapter describes the algorithms used to calculate maximum-likelihood estimates for the multinomial loglinear model and the multinomial logit model. This algorithm is applicable only to aggregated data.

Notation

The following notation is used throughout this chapter unless otherwise stated:

A  Generic categorical independent (explanatory) variable. Its categories are indexed by an array of integers.
B  Generic categorical dependent (response) variable. Its categories are indexed by an array of integers.
r  Number of categories of B.
c  Number of categories of A.
p  Number of nonredundant (nonaliased) parameters.
i  Generic index for the category of B.
j  Generic index for the categories of A.
k  Generic index for the parameter.
$n_{ij}$  Observed count in the ith response of B and the jth setting of A.
$N_j$  Marginal total count at the jth setting of A. It is equal to $\sum_{i=1}^{r} n_{ij}$.
$N$  Total observed count. It is equal to $\sum_{j=1}^{c}\sum_{i=1}^{r} n_{ij}$.
$m_{ij}$  Expected count.

$\pi_{ij}$  Probability of having an observation in the ith response of B and the jth setting of A. $0 \le \pi_{ij} \le 1$ and $\sum_{i=1}^{r} \pi_{ij} = 1$ for $j = 1, \ldots, c$.
$z_{ij}$  Cell structure value.
$\alpha_j$  jth normalizing constant.
$\beta_k$  kth nonredundant parameter.
$\beta$  A vector of $(\beta_1, \ldots, \beta_p)'$.
$x_{ijk}$  An element in the ith row and the kth column of the design matrix for the jth setting.

The same notation is used for both loglinear and logit models so that the methods are presented in a unified way. Conceptually, one can consider a loglinear model as a special case of a logit model where the explanatory variable has only one level (that is, c = 1).

Components of the Model

There are two components in a loglinear model: the random component and the systematic component.

Random Component

The random component describes the joint distribution of the counts.

  • The counts $(n_{1j}, \ldots, n_{rj})$ at the jth setting of A have the multinomial $(N_j, \pi_{1j}, \ldots, \pi_{rj})$ distribution.

  • The counts $n_{ij}$ and $n_{i'j'}$ are independent if $j \ne j'$.

Normalizing Constants

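$$\alpha_j = \log\left( \frac{N_j}{\sum_{i=1}^{r} z_{ij}\, e^{\nu_{ij}}} \right), \qquad j = 1, \ldots, c \tag{2}$$

where $\nu_{ij} = \sum_{k=1}^{p} x_{ijk}\beta_k$.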
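As a rough illustration of (2), the sketch below computes the normalizing constants for a small hypothetical table and checks that fitted counts of the form $m_{ij} = z_{ij} e^{\alpha_j + \nu_{ij}}$ add up to each $N_j$. The table values, the function name, and the choice to skip cells with $z_{ij} \le 0$ in the denominator are assumptions made for the example.

```python
import math

def normalizing_constants(N_j, z, nu):
    """alpha_j = log(N_j / sum_i z_ij * exp(nu_ij)); tables are indexed [i][j]."""
    r, c = len(z), len(z[0])
    alphas = []
    for j in range(c):
        # Assumption: cells with z_ij <= 0 (structural zeros) are left out of the sum.
        denom = sum(z[i][j] * math.exp(nu[i][j]) for i in range(r) if z[i][j] > 0)
        alphas.append(math.log(N_j[j] / denom))
    return alphas

# Hypothetical 3 x 2 table: unit cell structure values, illustrative linear predictors.
z = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
nu = [[0.0, 0.2], [0.5, -0.1], [-0.3, 0.0]]
N_j = [50.0, 80.0]
alpha = normalizing_constants(N_j, z, nu)

# Fitted counts m_ij = z_ij * exp(alpha_j + nu_ij) reproduce each marginal total.
m = [[z[i][j] * math.exp(alpha[j] + nu[i][j]) for j in range(2)] for i in range(3)]
column_totals = [sum(m[i][j] for i in range(3)) for j in range(2)]
```

The constants are not free parameters: once $\beta$ fixes the $\nu_{ij}$, each $\alpha_j$ is determined by the requirement that the fitted counts at setting j sum to $N_j$.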

Cell Structure Values

The cell structure values play two roles in SPSS loglinear procedures, depending on their signs. If $z_{ij} > 0$, it is a usual weight for the corresponding cell and $\log(z_{ij})$ is sometimes called the offset. If $z_{ij} \le 0$, a structural zero is imposed on the cell $(B = i, A = j)$. Contingency tables containing at least one structural zero are called incomplete tables. If $n_{ij} = 0$ but $z_{ij} > 0$, the cell $(B = i, A = j)$ contains a sampling zero. Although SPSS still considers a structural zero part of the contingency table, it is not used in fitting the model. Cellwise statistics are not computed for structural zeros.
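The distinction between structural and sampling zeros can be written as a small classifier. This is only a sketch of the rules stated above; the function names and return labels are hypothetical.

```python
def classify_cell(n_ij, z_ij):
    """Label a cell from its observed count and cell structure value."""
    if z_ij <= 0:
        return "structural zero"   # excluded from model fitting
    if n_ij == 0:
        return "sampling zero"     # a genuine cell that happened to be empty
    return "regular"

def is_incomplete(z):
    """A table is incomplete if at least one cell is a structural zero."""
    return any(z_ij <= 0 for row in z for z_ij in row)

labels = [classify_cell(0, 1.0), classify_cell(5, 0.0), classify_cell(7, 2.0)]
```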

Maximum-Likelihood Estimation

The multinomial log-likelihood is

$$L(\beta_1, \ldots, \beta_p) = \text{constant} + \sum_{j=1}^{c}\sum_{i=1}^{r} n_{ij} \log(m_{ij}) \tag{3}$$

Likelihood Equations

It can be shown that

$$\frac{\partial L}{\partial \beta_k} = \sum_{j=1}^{c}\sum_{i=1}^{r} (n_{ij} - m_{ij})\, x_{ijk} \qquad \text{for } k = 1, \ldots, p$$

Let $g(\beta) = (g_1(\beta), \ldots, g_p(\beta))'$ be the $p \times 1$ gradient vector with

$$g_k(\beta) = \frac{\partial L}{\partial \beta_k}$$

The maximum-likelihood estimates $\hat\beta = (\hat\beta_1, \ldots, \hat\beta_p)'$ are regarded as a solution to the vector of likelihood equations:

$$g(\beta) = 0 \tag{4}$$
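The gradient formula can be checked with a short derivation. The sketch below assumes the fitted counts have the form $m_{ij} = N_j z_{ij} e^{\nu_{ij}} / \sum_{t} z_{tj} e^{\nu_{tj}}$ with $\nu_{ij} = \sum_k x_{ijk}\beta_k$ (the form used in the iteration steps of this chapter), and writes $\theta_{jk}$ for the m-weighted mean of $x_{ijk}$ within setting j.

```latex
\begin{aligned}
\frac{\partial m_{ij}}{\partial \beta_k}
  &= m_{ij}\,(x_{ijk} - \theta_{jk}),
  \qquad \theta_{jk} = \frac{1}{N_j}\sum_{t=1}^{r} m_{tj}\,x_{tjk},\\
\frac{\partial L}{\partial \beta_k}
  &= \sum_{j=1}^{c}\sum_{i=1}^{r} \frac{n_{ij}}{m_{ij}}\,
     \frac{\partial m_{ij}}{\partial \beta_k}
   = \sum_{j=1}^{c}\sum_{i=1}^{r} n_{ij}\,(x_{ijk} - \theta_{jk})\\
  &= \sum_{j=1}^{c}\sum_{i=1}^{r} n_{ij}\,x_{ijk} - \sum_{j=1}^{c} N_j\,\theta_{jk}
   = \sum_{j=1}^{c}\sum_{i=1}^{r} (n_{ij} - m_{ij})\,x_{ijk}.
\end{aligned}
```

The last step uses $N_j\theta_{jk} = \sum_i m_{ij} x_{ijk}$, so the normalizing constants drop out and only the differences between observed and fitted counts remain.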

Hessian Matrix

The likelihood equations are nonlinear functions of $\beta$. Solving them for $\hat\beta$ requires an iterative method. The Newton-Raphson method is used. It can be shown that

$$\frac{\partial^2 L}{\partial \beta_k\, \partial \beta_l} = -\sum_{j=1}^{c}\sum_{i=1}^{r} m_{ij}\,(x_{ijk} - \theta_{jk})(x_{ijl} - \theta_{jl})$$

where

$$\theta_{jk} = \frac{1}{N_j}\sum_{i=1}^{r} m_{ij}\, x_{ijk}, \qquad j = 1, \ldots, c \text{ and } k = 1, \ldots, p \tag{5}$$

Let $H(\beta)$ be the $p \times p$ information matrix, where $-H(\beta)$ is the Hessian matrix of (3). The elements of $H(\beta)$ are

$$h_{kl}(\beta) = -\frac{\partial^2 L}{\partial \beta_k\, \partial \beta_l}, \qquad k = 1, \ldots, p \text{ and } l = 1, \ldots, p \tag{6}$$

Note: $H(\beta)$ is a symmetric positive-definite matrix. The asymptotic covariance matrix of $\hat\beta$ is estimated by $H^{-1}(\hat\beta)$.
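A minimal sketch of (5) and (6) in plain Python follows. The table layout (`m[i][j]`, `x[i][j][k]`), the function names, and the example numbers are assumptions for illustration.

```python
def theta(m, x, N_j):
    """theta[j][k] = (1 / N_j) * sum_i m_ij * x_ijk, as in (5)."""
    r, c, p = len(m), len(m[0]), len(x[0][0])
    return [[sum(m[i][j] * x[i][j][k] for i in range(r)) / N_j[j]
             for k in range(p)] for j in range(c)]

def information_matrix(m, x, N_j):
    """h_kl = sum_j sum_i m_ij (x_ijk - theta_jk)(x_ijl - theta_jl), as in (6)."""
    r, c, p = len(m), len(m[0]), len(x[0][0])
    th = theta(m, x, N_j)
    return [[sum(m[i][j] * (x[i][j][k] - th[j][k]) * (x[i][j][l] - th[j][l])
                 for i in range(r) for j in range(c))
             for l in range(p)] for k in range(p)]

# Hypothetical 2 x 2 table with a single parameter that only affects cell (i=0, j=1).
m = [[30.0, 20.0], [20.0, 60.0]]
x = [[[0.0], [1.0]], [[0.0], [0.0]]]
N_j = [50.0, 80.0]
H = information_matrix(m, x, N_j)
```

In this example only setting j = 1 contributes, and the single entry equals $N_1 \pi (1 - \pi)$ with $\pi = 20/80$, the familiar binomial information.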

Initial Values

SPSS uses the $\beta^{(0)}$ that corresponds to a saturated model as the initial value for $\beta$. Then the initial estimates for the expected cell counts are

$$m_{ij}^{(0)} = \begin{cases} n_{ij} + \Delta & \text{if } z_{ij} > 0 \\ 0 & \text{if } z_{ij} \le 0 \end{cases} \tag{9}$$

where $\Delta \ge 0$ is a constant. Note: For saturated models, SPSS adds $\Delta$ to $n_{ij}$ if $z_{ij} > 0$. This is done to avoid numerical problems in case some observed counts are 0. We advise users to set $\Delta$ to 0 whenever all observed counts (other than structural zeros) are positive.

The initial values for other quantities are

$$\theta_{jk}^{(0)} = \frac{1}{N_j}\sum_{i=1}^{r} m_{ij}^{(0)}\, x_{ijk} \tag{10}$$

and

$$\eta_{ij}^{(0)} = \begin{cases} \log\left( m_{ij}^{(0)} / z_{ij} \right) + \left( n_{ij} - m_{ij}^{(0)} \right) / m_{ij}^{(0)} & \text{if } z_{ij} > 0 \text{ and } m_{ij}^{(0)} > 0 \\ 0 & \text{otherwise} \end{cases} \tag{11}$$
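The initial values can be sketched as follows. The function name, the table layout, and the example data are hypothetical, and the value 0 assigned to $\eta_{ij}^{(0)}$ outside the first case is an assumption of this sketch.

```python
import math

def initial_values(n, z, x, delta):
    """Return (m0, theta0, eta0) for tables indexed [i][j] and design x[i][j][k]."""
    r, c, p = len(n), len(n[0]), len(x[0][0])
    N_j = [sum(n[i][j] for i in range(r)) for j in range(c)]
    # Expected counts: n_ij + delta for ordinary cells, 0 for structural zeros.
    m0 = [[n[i][j] + delta if z[i][j] > 0 else 0.0 for j in range(c)]
          for i in range(r)]
    theta0 = [[sum(m0[i][j] * x[i][j][k] for i in range(r)) / N_j[j]
               for k in range(p)] for j in range(c)]
    eta0 = [[0.0] * c for _ in range(r)]   # assumed value for the remaining cells
    for i in range(r):
        for j in range(c):
            if z[i][j] > 0 and m0[i][j] > 0:
                eta0[i][j] = (math.log(m0[i][j] / z[i][j])
                              + (n[i][j] - m0[i][j]) / m0[i][j])
    return m0, theta0, eta0

# Hypothetical 3 x 2 table: cell (0, 1) is a sampling zero, cell (2, 1) a structural zero.
n = [[10, 0], [5, 4], [8, 0]]
z = [[1.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
x = [[[1.0], [1.0]], [[0.0], [0.0]], [[0.0], [0.0]]]
m0, theta0, eta0 = initial_values(n, z, x, delta=0.5)
```

With a positive $\Delta$ the sampling zero receives a small positive starting count, while the structural zero stays at 0.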

Stopping Criteria

SPSS checks the following conditions for convergence:

1. $\max_{i,j} \left| m_{ij}^{(s+1)} - m_{ij}^{(s)} \right| / m_{ij}^{(s)} < \varepsilon$, provided that $m_{ij}^{(s)} > 0$

2. $\max_{i,j} \left| m_{ij}^{(s+1)} - m_{ij}^{(s)} \right| < \varepsilon$

3. $\left( \sum_{k=1}^{p} g_k^2(\hat\beta) / p \right)^{1/2} < \varepsilon$

The iteration is said to be converged if either conditions 1 and 3 or conditions 2 and 3 are satisfied. If $p = 0$, then condition 3 will be automatically satisfied. The iteration is said to be not converged if neither pair of conditions is satisfied within the maximum number of iterations.
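The convergence rule can be expressed as a single predicate. This is a sketch under two assumptions: condition 3 is read as a root-mean-square bound on the gradient, and the default tolerance `eps` is a hypothetical value.

```python
import math

def converged(m_new, m_old, grad, eps=1e-8):
    """True if (condition 1 and 3) or (condition 2 and 3) holds."""
    pairs = [(a, b) for row_a, row_b in zip(m_new, m_old)
             for a, b in zip(row_a, row_b)]
    # Condition 1: relative change, only over cells with a positive previous value.
    cond1 = max((abs(a - b) / b for a, b in pairs if b > 0), default=0.0) < eps
    # Condition 2: absolute change over all cells.
    cond2 = max(abs(a - b) for a, b in pairs) < eps
    # Condition 3: root-mean-square gradient; automatically true when p = 0.
    p = len(grad)
    cond3 = p == 0 or math.sqrt(sum(g * g for g in grad) / p) < eps
    return (cond1 and cond3) or (cond2 and cond3)

stable = converged([[10.0, 5.0]], [[10.0, 5.0]], [0.0])
still_moving = converged([[10.0, 5.0]], [[9.0, 5.0]], [0.0])
```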

Algorithm

The iteration process uses the following steps:

  1. Calculate $m_{ij}^{(0)}$ using (9), $\theta_{jk}^{(0)}$ using (10), and $\eta_{ij}^{(0)}$ using (11).
  2. Set $s = 0$.
  3. Calculate $H(\beta^{(s)})$ using (6) evaluated at $m_{ij} = m_{ij}^{(s)}$; calculate $q(\beta^{(s)})$ using (7) evaluated at $\eta_{ij} = \eta_{ij}^{(s)}$.
  4. Solve for $\beta^{(s+1)}$ using (8).
  5. Calculate $\nu_{ij}^{(s+1)} = \sum_{k=1}^{p} x_{ijk}\, \beta_k^{(s+1)}$ and

$$m_{ij}^{(s+1)} = \begin{cases} N_j\, z_{ij}\, e^{\nu_{ij}^{(s+1)}} \Big/ \sum_{t=1}^{r} z_{tj}\, e^{\nu_{tj}^{(s+1)}} & \text{if } z_{ij} > 0 \\ 0 & \text{if } z_{ij} \le 0 \end{cases}$$

  6. Check whether the stopping criteria are satisfied. If yes, stop iteration and declare convergence. Otherwise continue.
  7. Increase s by 1 and check whether the maximum iteration has been reached. If yes, stop iteration and declare the process not converged. Otherwise repeat steps 3-7.
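The iteration can be sketched end to end in plain Python. This is not the SPSS implementation: it uses the textbook Newton step (solve $H\,\delta = g$, then $\beta \leftarrow \beta + \delta$) rather than the working-variable form referred to as (7) and (8), starts from $\beta = 0$ instead of the saturated start, and stops on the absolute change in fitted counts only. All function names and the example table are hypothetical.

```python
import math

def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    M = [A[i][:] + [b[i]] for i in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, size):
            factor = M[row][col] / M[col][col]
            for k in range(col, size + 1):
                M[row][k] -= factor * M[col][k]
    y = [0.0] * size
    for i in reversed(range(size)):
        tail = sum(M[i][k] * y[k] for k in range(i + 1, size))
        y[i] = (M[i][size] - tail) / M[i][i]
    return y

def expected_counts(beta, x, z, N_j):
    """m_ij = N_j z_ij exp(nu_ij) / sum_t z_tj exp(nu_tj); 0 where z_ij <= 0."""
    r, c = len(z), len(z[0])
    m = [[0.0] * c for _ in range(r)]
    for j in range(c):
        w = [z[i][j] * math.exp(sum(a * b for a, b in zip(x[i][j], beta)))
             if z[i][j] > 0 else 0.0 for i in range(r)]
        total = sum(w)
        for i in range(r):
            m[i][j] = N_j[j] * w[i] / total
    return m

def fit(n, x, z, eps=1e-10, max_iter=50):
    """Return (beta, m, converged_flag) for tables indexed [i][j]."""
    r, c, p = len(n), len(n[0]), len(x[0][0])
    cells = [(i, j) for i in range(r) for j in range(c) if z[i][j] > 0]
    N_j = [sum(n[i][j] for i in range(r) if z[i][j] > 0) for j in range(c)]
    beta = [0.0] * p                       # assumed start, not the saturated start
    m = expected_counts(beta, x, z, N_j)
    for _ in range(max_iter):
        theta = [[sum(m[i][j] * x[i][j][k] for i in range(r)) / N_j[j]
                  for k in range(p)] for j in range(c)]
        g = [sum((n[i][j] - m[i][j]) * x[i][j][k] for i, j in cells)
             for k in range(p)]
        H = [[sum(m[i][j] * (x[i][j][k] - theta[j][k]) * (x[i][j][l] - theta[j][l])
                  for i, j in cells) for l in range(p)] for k in range(p)]
        step = solve(H, g)                 # Newton step: H * step = g
        beta = [b + s for b, s in zip(beta, step)]
        m_new = expected_counts(beta, x, z, N_j)
        change = max(abs(m_new[i][j] - m[i][j]) for i, j in cells)
        m = m_new
        if change < eps:
            return beta, m, True
    return beta, m, False

# Hypothetical 2 x 2 logit table: intercept plus an effect of A on the first response.
n = [[30, 20], [20, 60]]
x = [[[1.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
z = [[1.0, 1.0], [1.0, 1.0]]
beta_hat, m_hat, ok = fit(n, x, z)
```

Because this example model is saturated for the two logits, the fitted counts reproduce the observed table and the estimates are the log odds $\log(30/20)$ and the log odds ratio $\log(2/9)$.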

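The Pearson chi-square statistic is

$$X^2 = \sum_{j=1}^{c}\sum_{i=1}^{r} X_{ij}^2$$

where

$$X_{ij}^2 = \begin{cases} (n_{ij} - \hat m_{ij})^2 / \hat m_{ij} & \text{if } z_{ij} > 0 \text{ and } \hat m_{ij} > 0 \\ \text{SYSMIS} & \text{if } z_{ij} > 0 \text{ and } n_{ij} \ne \hat m_{ij} = 0 \\ 0 & \text{if } z_{ij} \le 0 \text{ or } n_{ij} = \hat m_{ij} \end{cases}$$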

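If any $X_{ij}^2$ is system missing, then $X^2$ is also system missing. The likelihood-ratio chi-square statistic is

$$G^2 = \sum_{j=1}^{c}\sum_{i=1}^{r} G_{ij}^2$$

where

$$G_{ij}^2 = \begin{cases} 2\, n_{ij} \log\left( n_{ij} / \hat m_{ij} \right) & \text{if } z_{ij} > 0 \text{ and } n_{ij} > 0,\ \hat m_{ij} > 0 \\ \text{SYSMIS} & \text{if } z_{ij} > 0 \text{ and } n_{ij} > 0,\ \hat m_{ij} = 0 \\ 0 & \text{if } z_{ij} > 0 \text{ and } n_{ij} = 0, \text{ or } z_{ij} \le 0 \end{cases}$$

If any $G_{ij}^2$ is system missing, then $G^2$ is also system missing.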
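The two statistics can be sketched as cellwise functions, with `None` standing in for SYSMIS. The case boundaries follow a reading of the definitions in which structural zeros and empty cells contribute 0 and a positive count with a zero fitted value is missing; the function names and the example table are hypothetical.

```python
import math

def pearson_cell(n, m, z):
    """Cell contribution to the Pearson chi-square; None stands in for SYSMIS."""
    if z <= 0 or n == m:
        return 0.0
    if m > 0:
        return (n - m) ** 2 / m
    return None

def lr_cell(n, m, z):
    """Cell contribution to the likelihood-ratio chi-square."""
    if z <= 0 or n == 0:
        return 0.0
    if m > 0:
        return 2.0 * n * math.log(n / m)
    return None

def total(contributions):
    """Sum the cells; the total is missing if any cell is missing."""
    if any(v is None for v in contributions):
        return None
    return sum(contributions)

# Hypothetical 2 x 2 table (cells listed as (0,0), (1,0), (0,1), (1,1)) with
# fitted counts from a model in which the response does not depend on A.
n = [30, 20, 20, 60]
m = [50 * 50 / 130, 50 * 80 / 130, 80 * 50 / 130, 80 * 80 / 130]
z = [1.0, 1.0, 1.0, 1.0]
X2 = total([pearson_cell(a, b, w) for a, b, w in zip(n, m, z)])
G2 = total([lr_cell(a, b, w) for a, b, w in zip(n, m, z)])
```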

Degrees of Freedom

The degrees of freedom for each statistic is defined as $a = c(r - 1) - p - E$, where $E$ is the number of cells with $z_{ij} \le 0$ or $\hat m_{ij} = 0$.
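A direct transcription of the degrees-of-freedom formula, as a sketch with a hypothetical function name and tables indexed `[i][j]`:

```python
def degrees_of_freedom(p, z, m_hat):
    """a = c(r - 1) - p - E, with E counting cells where z_ij <= 0 or m_ij = 0."""
    r, c = len(z), len(z[0])
    E = sum(1 for i in range(r) for j in range(c)
            if z[i][j] <= 0 or m_hat[i][j] == 0)
    return c * (r - 1) - p - E

# Hypothetical 2 x 2 table, one parameter, no structural zeros.
a_full = degrees_of_freedom(1, [[1.0, 1.0], [1.0, 1.0]], [[19.0, 31.0], [31.0, 49.0]])
# The same table with one structural zero loses one further degree of freedom.
a_incomplete = degrees_of_freedom(1, [[1.0, 1.0], [1.0, 0.0]], [[19.0, 31.0], [31.0, 0.0]])
```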

Significance Level

The significance level (or the p value) for the Pearson chi-square statistic is $\text{Prob}(\chi_a^2 > X^2)$ and that for the likelihood-ratio chi-square statistic is $\text{Prob}(\chi_a^2 > G^2)$. In both cases, $\chi_a^2$ is the central chi-square distribution with $a$ degrees of freedom.

Analysis of Dispersion (Logit Models Only)

SPSS provides the analysis of dispersion based on two types of dispersion: entropy and concentration. The following definitions are used:

S(A)  Dispersion due to the model
S(B|A)  Dispersion due to residuals
S(B)  Total dispersion
R = S(A)/S(B)  Measure of association

where $S(A) + S(B|A) = S(B)$. Also define

$$\hat\pi_i = \frac{\sum_{j=1}^{c} \hat m_{ij}}{\sum_{j=1}^{c} N_j}$$

$$\hat\pi_{i|j} = \frac{\hat m_{ij}}{N_j}$$

The bounds are $0 \le \hat\pi_i \le 1$ and $0 \le \hat\pi_{i|j} \le 1$.

Entropy

$$S(B) = N \sum_{i=1}^{r} S_i(B)$$

where

$$S_i(B) = \begin{cases} -\hat\pi_i \log(\hat\pi_i) & \text{if } \hat\pi_i > 0 \\ 0 & \text{if } \hat\pi_i = 0 \end{cases}$$
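The total entropy can be sketched as below. The sketch assumes the usual sign convention in which each term is $-\hat\pi_i \log \hat\pi_i$, so that the dispersion is nonnegative; the function name and the fitted table are hypothetical.

```python
import math

def total_entropy(m_hat):
    """S(B) = N * sum_i S_i(B), with S_i(B) = -pi_i * log(pi_i) for pi_i > 0."""
    N = sum(sum(row) for row in m_hat)
    pi = [sum(row) / N for row in m_hat]   # pi_i = sum_j m_ij / N
    return N * sum(-p * math.log(p) for p in pi if p > 0)

# Hypothetical fitted table (indexed [i][j]) with equal response proportions.
m_hat = [[25.0, 40.0], [25.0, 40.0]]
S_B = total_entropy(m_hat)
```

With two equally likely response categories the result is $N \log 2$, the largest value possible for r = 2.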

Residuals

Goodness-of-fit statistics provide only broad summaries of how models fit data. The pattern of lack of fit is revealed in cell-by-cell comparisons of observed and fitted cell counts.

Simple Residuals

The simple residual of the (i,j)th cell is

$$r_{ij} = \begin{cases} n_{ij} - \hat m_{ij} & \text{if } z_{ij} > 0 \\ \text{SYSMIS} & \text{if } z_{ij} \le 0 \end{cases}$$

Standardized Residuals

The standardized residual for the (i,j)th cell is

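$$r_{ij}^{S} = \begin{cases} (n_{ij} - \hat m_{ij}) \Big/ \sqrt{\hat m_{ij}\left( 1 - \hat m_{ij} / N_j \right)} & \text{if } z_{ij} > 0 \text{ and } 0 < \hat m_{ij} < N_j \\ 0 & \text{if } z_{ij} > 0 \text{ and } n_{ij} = \hat m_{ij} \\ \text{SYSMIS} & \text{otherwise} \end{cases}$$

The standardized residuals are also known as Pearson residuals even though $\sum_{j=1}^{c}\sum_{i=1}^{r} \left( r_{ij}^{S} \right)^2 \ne X^2$. Although the standardized residuals are asymptotically normal, their asymptotic variances are less than 1.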
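A sketch of the standardized residual for a single cell follows, assuming the denominator is the multinomial standard deviation $\sqrt{\hat m_{ij}(1 - \hat m_{ij}/N_j)}$ and using `None` for SYSMIS. The function name and the example numbers are hypothetical.

```python
import math

def standardized_residual(n, m, N_j, z):
    """Standardized residual for one cell; None stands in for SYSMIS."""
    if z > 0 and 0 < m < N_j:
        return (n - m) / math.sqrt(m * (1.0 - m / N_j))
    if z > 0 and n == m:
        return 0.0
    return None

# Hypothetical cell: observed 30, fitted 25, out of N_j = 50 at that setting.
res = standardized_residual(30, 25.0, 50.0, 1.0)
pearson_part = (30 - 25.0) ** 2 / 25.0   # the same cell's contribution to X^2
```

Here the squared residual is 2 while the cell's Pearson contribution is 1, which illustrates why the squared standardized residuals do not add up to $X^2$.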

Adjusted Residuals

The adjusted residual is the simple residual divided by its estimated standard error. Its definition and applications first appeared in Haberman (1973) and reappeared on page 454 of Haberman (1979). This statistic for the (i,j)th cell is

$$r_{ij}^{A} = \begin{cases} (n_{ij} - \hat m_{ij}) / s_{ij} & \text{if } z_{ij} > 0 \text{ and } \hat m_{ij} > 0 \\ 0 & \text{if } z_{ij} > 0 \text{ and } n_{ij} = \hat m_{ij} \\ \text{SYSMIS} & \text{otherwise} \end{cases}$$

where

$$s_{ij} = \sqrt{ \hat m_{ij} \left[ 1 - \frac{\hat m_{ij}}{N_j} - \hat m_{ij} \sum_{k=1}^{p}\sum_{l=1}^{p} (x_{ijk} - \hat\theta_{jk})(x_{ijl} - \hat\theta_{jl})\, h^{kl} \right] }$$

$h^{kl}$ is the (k,l)th element of $H^{-1}(\hat\beta)$. The adjusted residuals are asymptotically standard normal.

Deviance Residuals

Pierce and Schafer (1986) and McCullagh and Nelder (1989) define the signed square root of the individual contribution to the $G^2$ statistic as the deviance residual. This statistic for the (i,j)th cell is

$$r_{ij}^{D} = \text{sign}(n_{ij} - \hat m_{ij}) \sqrt{d_{ij}}$$

where

$$d_{ij} = \begin{cases} 2\left[ n_{ij} \log\left( n_{ij} / \hat m_{ij} \right) - (n_{ij} - \hat m_{ij}) \right] & \text{if } z_{ij} > 0 \text{ and } \hat m_{ij} > 0,\ n_{ij} > 0 \\ 2\, \hat m_{ij} & \text{if } z_{ij} > 0 \text{ and } \hat m_{ij} > 0,\ n_{ij} = 0 \\ 0 & \text{if } z_{ij} > 0 \text{ and } \hat m_{ij} = 0,\ n_{ij} = 0 \\ \text{SYSMIS} & \text{otherwise} \end{cases}$$

For multinomial sampling, the individual contribution to the $G^2$ statistic is only $2\, n_{ij} \log(n_{ij} / \hat m_{ij})$, but this is negative when $n_{ij} < \hat m_{ij}$. Thus, an extra term $-2(n_{ij} - \hat m_{ij})$ is added to it so that $d_{ij} \ge 0$ for all i and j. However, we still have

$$\sum_{j=1}^{c}\sum_{i=1}^{r} \left( r_{ij}^{D} \right)^2 = G^2$$
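The deviance residual and the identity with $G^2$ can be checked numerically. The sketch below uses `None` for SYSMIS; the function names and the example table are hypothetical, and the fitted counts are chosen so that they sum to $N_j$ within each setting, which is what makes the extra terms cancel.

```python
import math

def deviance_contribution(n, m, z):
    """d_ij for one cell; None stands in for SYSMIS."""
    if z > 0 and m > 0 and n > 0:
        return 2.0 * (n * math.log(n / m) - (n - m))
    if z > 0 and m > 0 and n == 0:
        return 2.0 * m
    if z > 0 and m == 0 and n == 0:
        return 0.0
    return None

def deviance_residual(n, m, z):
    """Signed square root of d_ij, taking the sign of n - m."""
    d = deviance_contribution(n, m, z)
    if d is None:
        return None
    return math.copysign(math.sqrt(max(d, 0.0)), n - m)

# Hypothetical 2 x 2 table, cells listed as (0,0), (1,0), (0,1), (1,1).
n = [30, 20, 20, 60]
m = [50 * 50 / 130, 50 * 80 / 130, 80 * 50 / 130, 80 * 80 / 130]
residuals = [deviance_residual(a, b, 1.0) for a, b in zip(n, m)]
sum_sq = sum(r * r for r in residuals)
G2 = sum(2.0 * a * math.log(a / b) for a, b in zip(n, m))
```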

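where

$$V = \sum_{j=1}^{c}\sum_{i=1}^{r} d_{ij}^2\, \hat m_{ij} - \sum_{j=1}^{c} \frac{1}{N_j} \left( \sum_{i=1}^{r} d_{ij}\, \hat m_{ij} \right)^2 - \sum_{k=1}^{p}\sum_{l=1}^{p} f_k\, f_l\, h^{kl}$$

$$f_k = \sum_{j=1}^{c}\sum_{i=1}^{r} d_{ij}\, \hat m_{ij}\, (x_{ijk} - \hat\theta_{jk})$$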

Generalized Log-Odds Ratio

Consider a linear combination of the natural logarithm of cell counts

$$\sum_{j=1}^{c}\sum_{i=1}^{r} d_{ij} \log(m_{ij}) \tag{12}$$

where $d_{ij}$ are real numbers with the restriction

$$\sum_{i=1}^{r} d_{ij} = 0, \qquad j = 1, \ldots, c$$

The quantity in (12) is estimated by

$$\sum_{j=1}^{c}\sum_{i=1}^{r} d_{ij} \log(\hat m_{ij}) = \sum_{j=1}^{c}\sum_{i=1}^{r} d_{ij} \log(z_{ij}) + \sum_{j=1}^{c}\sum_{i=1}^{r}\sum_{k=1}^{p} d_{ij}\, x_{ijk}\, \hat\beta_k \tag{13}$$

The variance of (13) is

$$\text{var}\left( \sum_{j=1}^{c}\sum_{i=1}^{r} d_{ij} \log(\hat m_{ij}) \right) = \sum_{k=1}^{p}\sum_{l=1}^{p} w_k\, w_l\, h^{kl} \tag{14}$$

where

$$w_k = \sum_{j=1}^{c}\sum_{i=1}^{r} d_{ij}\, x_{ijk}, \qquad k = 1, \ldots, p$$
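A sketch of (13) and (14) for a hypothetical 2 x 2 logit table follows. The function name, the design, the estimates, and the inverse information matrix are illustrative values for a saturated fit of the counts 30, 20, 20, 60; they are not taken from the text.

```python
import math

def glor(d, z, x, beta_hat, H_inv):
    """Return (estimate, variance) of sum d_ij log(m_ij) following (13) and (14)."""
    r, c, p = len(d), len(d[0]), len(beta_hat)
    for j in range(c):
        if abs(sum(d[i][j] for i in range(r))) > 1e-12:
            raise ValueError("coefficients must sum to 0 over i for each j")
    estimate = sum(
        d[i][j] * (math.log(z[i][j]) + sum(x[i][j][k] * beta_hat[k] for k in range(p)))
        for i in range(r) for j in range(c) if d[i][j] != 0)
    w = [sum(d[i][j] * x[i][j][k] for i in range(r) for j in range(c))
         for k in range(p)]
    variance = sum(w[k] * w[l] * H_inv[k][l] for k in range(p) for l in range(p))
    return estimate, variance

# Saturated logit fit of the hypothetical table n = [[30, 20], [20, 60]].
x = [[[1.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
z = [[1.0, 1.0], [1.0, 1.0]]
beta_hat = [math.log(1.5), math.log(2 / 9)]
H_inv = [[15 / 180, -15 / 180], [-15 / 180, 27 / 180]]   # inverse of [[27, 15], [15, 15]]
d = [[1.0, -1.0], [-1.0, 1.0]]                           # ordinary log odds ratio
estimate, variance = glor(d, z, x, beta_hat, H_inv)
```

For these coefficients the estimate is the familiar log odds ratio $\log(30 \cdot 60 / (20 \cdot 20))$ and the variance equals the sum of reciprocal cell counts, which gives a convenient check on the formulas.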

Wald Statistic

The null hypothesis is

$$H_0: \sum_{j=1}^{c}\sum_{i=1}^{r} d_{ij} \log(m_{ij}) = 0$$

The Wald statistic is

$$W = \frac{ \left( \sum_{j=1}^{c}\sum_{i=1}^{r} d_{ij} \log(\hat m_{ij}) \right)^2 }{ \sum_{k=1}^{p}\sum_{l=1}^{p} w_k\, w_l\, h^{kl} }$$

Under $H_0$, $W$ is asymptotically distributed as chi-square with 1 degree of freedom. The significance level is $\text{Prob}(\chi_1^2 \ge W)$. Note: $W$ will be system missing if (14) is 0.
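The Wald statistic and its significance level can be sketched with the standard library alone, because a chi-square variable with 1 degree of freedom is a squared standard normal, so $\text{Prob}(\chi_1^2 \ge W) = \text{erfc}(\sqrt{W/2})$. The function name and inputs are hypothetical; `None` stands in for SYSMIS.

```python
import math

def wald_test(estimate, variance):
    """Return (W, p_value); both are None when the variance (14) is 0."""
    if variance == 0:
        return None, None
    W = estimate ** 2 / variance
    p_value = math.erfc(math.sqrt(W / 2.0))   # upper tail of chi-square with 1 df
    return W, p_value

# Hypothetical generalized log-odds ratio of 1.96 with unit variance.
W, p_value = wald_test(1.96, 1.0)
```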

The cell count is

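$$n_{ij} = \begin{cases} \sum_{1 \le s \le v_{ij}} n_{ijs}^{+} & \text{if } v_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}$$

where

$$n_{ijs}^{+} = \begin{cases} n_{ijs} & \text{if } n_{ijs} > 0 \text{ and } z_{ijs} > 0 \\ 0 & \text{if } n_{ijs} \le 0 \text{ and } z_{ijs} > 0 \end{cases}$$

and $\sum_{1 \le s \le v_{ij}}$ means summation over the range of s with the terms $z_{ijs} > 0$. The cell weight value is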

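$$z_{ij} = \begin{cases} \sum_{1 \le s \le v_{ij}} n_{ijs}^{+}\, z_{ijs} \Big/ n_{ij} & \text{if } n_{ij} > 0 \text{ and } v_{ij} > 0 \\ \sum_{1 \le s \le v_{ij}} z_{ijs} \Big/ v_{ij} & \text{if } n_{ij} = 0 \text{ and } v_{ij} > 0 \\ \vdots \end{cases}$$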

If no variable is specified as the cell weight variable, then all cases have unit cell weights by default.

The cell covariate value is

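$$x_{ij} = \begin{cases} \sum_{1 \le s \le v_{ij}} n_{ijs}^{+}\, x_{ijs} \Big/ n_{ij} & \text{if } n_{ij} > 0 \text{ and } v_{ij} > 0 \\ \sum_{1 \le s \le v_{ij}} x_{ijs} \Big/ v_{ij} & \text{if } n_{ij} = 0 \text{ and } v_{ij} > 0 \\ \vdots \end{cases}$$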

The cell GRESID coefficient is

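$$c_{ij} = \begin{cases} \sum_{1 \le s \le v_{ij}} n_{ijs}^{+}\, c_{ijs} \Big/ n_{ij} & \text{if } n_{ij} > 0 \text{ and } v_{ij} > 0 \\ \sum_{1 \le s \le v_{ij}} c_{ijs} \Big/ v_{ij} & \text{if } n_{ij} = 0 \text{ and } v_{ij} > 0 \\ \vdots \end{cases}$$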

There are no defaults for the GRESID coefficients.

The cell GLOR coefficient is

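$$e_{ij} = \begin{cases} \sum_{1 \le s \le v_{ij}} n_{ijs}^{+}\, e_{ijs} \Big/ n_{ij} & \text{if } n_{ij} > 0 \text{ and } v_{ij} > 0 \\ \sum_{1 \le s \le v_{ij}} e_{ijs} \Big/ v_{ij} & \text{if } n_{ij} = 0 \text{ and } v_{ij} > 0 \\ \vdots \end{cases}$$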

There are no defaults for the GLOR coefficients.