













An overview of multinomial loglinear and logit models in GENLOG, focusing on normalizing constants, cell structure values, likelihood equations, the Hessian matrix, and estimation methods. It covers topics such as maximum-likelihood estimates, the Newton-Raphson method, initial values, and stopping criteria.
This chapter describes the algorithms used to calculate maximum-likelihood estimates for the multinomial loglinear model and the multinomial logit model. This algorithm is applicable only to aggregated data.
The following notation is used throughout this chapter unless otherwise stated:

$A$: Generic categorical independent (explanatory) variable. Its categories are indexed by an array of integers.
$B$: Generic categorical dependent (response) variable. Its categories are indexed by an array of integers.
$r$: Number of categories of $B$.
$c$: Number of categories of $A$.
$p$: Number of nonredundant (nonaliased) parameters.
$i$: Generic index for the categories of $B$.
$j$: Generic index for the categories of $A$.
$k$: Generic index for the parameters.
$n_{ij}$: Observed count in the $i$th response of $B$ and the $j$th setting of $A$.
$N_j$: Marginal total count at the $j$th setting of $A$. It is equal to $\sum_{i=1}^{r} n_{ij}$.
$N$: Total observed count. It is equal to $\sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij}$.
$m_{ij}$: Expected count.
$\pi_{ij}$: Probability of having an observation in the $i$th response of $B$ and the $j$th setting of $A$, with $\sum_{i=1}^{r} \pi_{ij} = 1$ for $j = 1, \ldots, c$ and $m_{ij} = N_j \pi_{ij}$.
$z_{ij}$: Cell structure value.
$\alpha_j$: $j$th normalizing constant.
$\beta_k$: $k$th nonredundant parameter.
$x_{ijk}$: An element in the $i$th row and the $k$th column of the design matrix for the $j$th setting.

The same notation is used for both loglinear and logit models so that the methods are presented in a unified way. Conceptually, one can consider a loglinear model as a special case of a logit model where the explanatory variable has only one level (that is, $c = 1$).
There are two components in a loglinear model: the random component and the systematic component.
The random component describes the joint distribution of the counts.
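The $j$th normalizing constant is
$$\alpha_j = \log\left(\frac{N_j}{\sum_{i=1}^{r} z_{ij} e^{\nu_{ij}}}\right), \quad j = 1, \ldots, c$$
where $\nu_{ij} = \sum_{k=1}^{p} x_{ijk}\beta_k$ is the linear predictor of cell $(i,j)$, so that $\log(m_{ij}/z_{ij}) = \alpha_j + \nu_{ij}$ for cells with $z_{ij} > 0$.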
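The role of the normalizing constant can be illustrated with a short sketch: $\alpha_j$ rescales the weighted exponentials so that the expected counts at each setting $j$ add up to $N_j$. The sketch assumes the linear predictor $\nu_{ij} = \sum_k x_{ijk}\beta_k$; the function name `expected_counts`, the array layout, and the example table are hypothetical and are not part of SPSS.

```python
import numpy as np

def expected_counts(beta, X, z, N_col):
    """Return (alpha, m) for a multinomial logit model.

    beta  : (p,)      nonredundant parameters
    X     : (r, c, p) design values x_ijk
    z     : (r, c)    cell structure values; z <= 0 marks a structural zero
    N_col : (c,)      marginal totals N_j
    """
    nu = X @ beta                                   # nu_ij = sum_k x_ijk * beta_k
    weight = np.where(z > 0, z * np.exp(nu), 0.0)   # structural zeros get no weight
    alpha = np.log(N_col / weight.sum(axis=0))      # one normalizing constant per j
    m = weight * np.exp(alpha)                      # m_ij = z_ij * exp(alpha_j + nu_ij)
    return alpha, m

# Hypothetical 3 x 2 table: B has 3 categories, A has 2 settings.
X = np.zeros((3, 2, 2))
X[0, :, 0] = 1.0    # indicator of the first response category
X[1, :, 1] = 1.0    # indicator of the second response category
z = np.array([[1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])  # one structural zero
N_col = np.array([50.0, 40.0])

alpha, m = expected_counts(np.array([0.2, -0.5]), X, z, N_col)
print(m.sum(axis=0))   # column sums reproduce N_col
```

Because the constants are functions of $\beta$ and the fixed totals $N_j$, they do not count among the $p$ nonredundant parameters.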
The cell structure values play two roles in SPSS loglinear procedures, depending on their signs. If $z_{ij} > 0$, it is a usual weight for the corresponding cell, and $\log z_{ij}$ is sometimes called the offset. If $z_{ij} \le 0$, a structural zero is imposed on the cell $(i,j)$. If $n_{ij} = 0$ but $z_{ij} > 0$, the cell $(i,j)$ contains a sampling zero. Although SPSS still considers a structural zero part of the contingency table, it is not used in fitting the model. Cellwise statistics are not computed for structural zeros.
The multinomial log-likelihood is
$$L(\beta_1, \ldots, \beta_p) = \text{constant} + \sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij} \log m_{ij} \tag{3}$$
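It can be shown that
$$\frac{\partial L}{\partial \beta_k} = \sum_{i=1}^{r}\sum_{j=1}^{c} (n_{ij} - m_{ij})\, x_{ijk} \quad \text{for } k = 1, \ldots, p$$
Let $g_k = \partial L / \partial \beta_k$ denote the $k$th element of the gradient vector $g$. The maximum-likelihood estimates $\hat{\beta} = (\hat{\beta}_1, \ldots, \hat{\beta}_p)^t$ are regarded as a solution to the vector of likelihood equations:
$$g(\hat{\beta}) = 0$$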
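As a numerical illustration, the sketch below evaluates the kernel of the log-likelihood (3) and the gradient $g_k = \sum_i\sum_j (n_{ij} - m_{ij})x_{ijk}$, and the two can be compared by finite differences. It assumes $m_{ij} = N_j z_{ij}e^{\nu_{ij}} / \sum_t z_{tj}e^{\nu_{tj}}$ with $\nu_{ij} = \sum_k x_{ijk}\beta_k$; all function names and the example data are hypothetical.

```python
import numpy as np

def fitted(beta, X, z, N_col):
    """Expected counts m_ij; structural zeros (z <= 0) are set to 0."""
    w = np.where(z > 0, z * np.exp(X @ beta), 0.0)
    return w * (N_col / w.sum(axis=0))

def loglik_kernel(beta, X, z, n, N_col):
    """Sum of n_ij * log(m_ij); the additive constant in (3) is omitted."""
    m = fitted(beta, X, z, N_col)
    use = (z > 0) & (n > 0)              # cells with n_ij = 0 contribute nothing
    return float(np.sum(n[use] * np.log(m[use])))

def gradient(beta, X, z, n, N_col):
    """g_k = sum_ij (n_ij - m_ij) * x_ijk over non-structural cells."""
    m = fitted(beta, X, z, N_col)
    resid = np.where(z > 0, n - m, 0.0)
    return np.einsum("ij,ijk->k", resid, X)

# Hypothetical 3 x 2 table with main effects of the response only.
X = np.zeros((3, 2, 2))
X[0, :, 0] = 1.0
X[1, :, 1] = 1.0
z = np.ones((3, 2))
n = np.array([[10.0, 20.0], [15.0, 5.0], [25.0, 25.0]])
N_col = n.sum(axis=0)
beta0 = np.array([0.1, -0.3])

g = gradient(beta0, X, z, n, N_col)
print(g)
```

At the maximum-likelihood estimates every element of this vector is zero, which is the condition that the iterative method has to reach.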
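The likelihood equations are nonlinear functions of $\beta$. Solving them for $\hat{\beta}$ requires an iterative method. The Newton-Raphson method is used. It can be shown that
$$\frac{\partial^2 L}{\partial \beta_k\, \partial \beta_l} = -\sum_{i=1}^{r}\sum_{j=1}^{c} m_{ij}\,(x_{ijk} - \theta_{jk})(x_{ijl} - \theta_{jl})$$
where
$$\theta_{jk} = \frac{1}{N_j}\sum_{i=1}^{r} m_{ij}\, x_{ijk}, \quad j = 1, \ldots, c \text{ and } k = 1, \ldots, p \tag{5}$$
The elements of the $p \times p$ information matrix $H$ are
$$h_{kl} = -\frac{\partial^2 L}{\partial \beta_k\, \partial \beta_l}, \quad k = 1, \ldots, p \text{ and } l = 1, \ldots, p \tag{6}$$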
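A Newton-Raphson iteration built from the gradient and the information matrix can be sketched as follows. This is an illustration under stated assumptions, not the SPSS implementation: it starts from $\beta = 0$ rather than from the saturated-model start, and it stops when the largest parameter change falls below a tolerance instead of applying the convergence conditions of the procedure. All names and the example table are hypothetical.

```python
import numpy as np

def fitted(beta, X, z, N_col):
    """m_ij = N_j z_ij exp(nu_ij) / sum_t z_tj exp(nu_tj); 0 at structural zeros."""
    w = np.where(z > 0, z * np.exp(X @ beta), 0.0)
    return w * (N_col / w.sum(axis=0))

def score_and_information(beta, X, z, n, N_col):
    """Gradient g and information matrix H (minus the Hessian of L)."""
    m = fitted(beta, X, z, N_col)
    g = np.einsum("ij,ijk->k", np.where(z > 0, n - m, 0.0), X)
    theta = np.einsum("ij,ijk->jk", m, X) / N_col[:, None]   # theta_jk
    Xc = X - theta[None, :, :]                                # x_ijk - theta_jk
    H = np.einsum("ij,ijk,ijl->kl", m, Xc, Xc)
    return g, H

def newton_raphson(X, z, n, max_iter=50, tol=1e-10):
    N_col = np.where(z > 0, n, 0.0).sum(axis=0)
    beta = np.zeros(X.shape[2])          # assumption: zero start, chosen for simplicity
    for _ in range(max_iter):
        g, H = score_and_information(beta, X, z, n, N_col)
        delta = np.linalg.solve(H, g)    # Newton step: H * delta = g
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:  # simplified stopping rule
            break
    return beta, fitted(beta, X, z, N_col)

# Hypothetical 3 x 2 table, main effects of the response only (independence).
X = np.zeros((3, 2, 2))
X[0, :, 0] = 1.0
X[1, :, 1] = 1.0
z = np.ones((3, 2))
n = np.array([[10.0, 20.0], [15.0, 5.0], [25.0, 25.0]])

beta_hat, m_hat = newton_raphson(X, z, n)
print(beta_hat)
print(m_hat)
```

For this design the fitted counts can be checked by hand, since the model forces the same response distribution at both settings: $\hat{m}_{ij} = N_j\, n_{i+}/N$, where $n_{i+}$ is the row total.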
SPSS uses the $\beta^{(0)}$ that corresponds to a saturated model as the initial value for $\beta$. Then the initial estimates for the expected cell counts are
$$m_{ij}^{(0)} = \begin{cases} n_{ij} + \Delta & \text{if } z_{ij} > 0 \\ 0 & \text{if } z_{ij} \le 0 \end{cases} \tag{9}$$
where $\Delta \ge 0$ is a constant. Note: For saturated models, SPSS adds $\Delta$ to $n_{ij}$ if $z_{ij} > 0$. This is done to avoid numerical problems in case some observed counts are 0. We advise users to set $\Delta$ to 0 whenever all observed counts (other than structural zeros) are positive.
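The initial expected counts in (9) translate directly into a one-line array operation. The sketch below is illustrative only; the function name and the example values, including the value chosen for $\Delta$, are hypothetical.

```python
import numpy as np

def initial_expected_counts(n, z, delta):
    """m0_ij = n_ij + delta where z_ij > 0, and 0 at structural zeros."""
    if delta < 0:
        raise ValueError("delta must be nonnegative")
    return np.where(z > 0, n + delta, 0.0)

n = np.array([[3.0, 0.0], [0.0, 4.0]])
z = np.array([[1.0, 1.0], [0.0, 1.0]])   # cell (2, 1) is a structural zero
m0 = initial_expected_counts(n, z, delta=0.5)
print(m0)
```

The sampling zero in the first row receives a positive starting value, while the structural zero stays at 0 and takes no part in the fit.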
The initial values for other quantities are
$$\theta_{jk}^{(0)} = \frac{1}{N_j}\sum_{i=1}^{r} m_{ij}^{(0)}\, x_{ijk}$$
and
ηij^0 mij^ mij^ zij^ nij^ mij^ zij^ mij
log / if and otherwise
SPSS checks the following conditions for convergence:
p (^2) 1
The iteration is said to be converged if either conditions 1 and 3 or conditions 2 and 3 are satisfied. If $p = 0$, then condition 3 will be automatically satisfied. The iteration is said to be not converged if neither pair of conditions is satisfied within the maximum number of iterations.
The iteration process uses the following steps:
(7) evaluated at n (^) ij = nij0 5s.
1 1 1
(^0 5) β 0 5 and
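$$m_{ij}^{(s+1)} = \begin{cases} N_j\, z_{ij}\, e^{\nu_{ij}} \Big/ \sum_{t=1}^{r} z_{tj}\, e^{\nu_{tj}} & \text{if } z_{ij} > 0 \\ 0 & \text{if } z_{ij} \le 0 \end{cases}$$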
where
n m m z n m z n m z n m
ij
ij ij ij ij ij ij ij ij ij ij ij ij
2
SYSMIS if and if or
If any $X_{ij}^2$ is system missing, then $X^2$ is also system missing. The likelihood-ratio chi-square statistic is
$$G^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} G_{ij}^2$$
where
n n m (^) z n m z n m z n m z n m
ij
ij ij ij ij ij ij ij ij ij ij ij ij ij ij ij
2
log / $^ , $ , $ , , $^ ; $
SYSMIS if and if and or
If any $G_{ij}^2$ is system missing, then $G^2$ is also system missing.
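The two goodness-of-fit statistics can be sketched as sums of per-cell contributions. The sketch assumes the usual contributions $(n_{ij} - \hat{m}_{ij})^2/\hat{m}_{ij}$ for the Pearson statistic and $2 n_{ij}\log(n_{ij}/\hat{m}_{ij})$ for the likelihood-ratio statistic. It does not reproduce the system-missing rules of the procedure: structural zeros and cells with $\hat{m}_{ij} = 0$ are simply skipped, and cells with $n_{ij} = 0$ contribute 0 to $G^2$. The function name and data are hypothetical.

```python
import numpy as np

def chi_square_stats(n, m_hat, z):
    """Return (X2, G2) summed over cells with z > 0 and m_hat > 0."""
    ok = (z > 0) & (m_hat > 0)
    x2_cells = np.zeros(m_hat.shape)
    x2_cells[ok] = (n[ok] - m_hat[ok]) ** 2 / m_hat[ok]

    pos = ok & (n > 0)                   # n log(n / m) is taken as 0 when n = 0
    g2_cells = np.zeros(m_hat.shape)
    g2_cells[pos] = 2.0 * n[pos] * np.log(n[pos] / m_hat[pos])
    return float(x2_cells.sum()), float(g2_cells.sum())

# Hypothetical observed counts and fitted counts under independence.
n = np.array([[10.0, 20.0], [15.0, 5.0], [25.0, 25.0]])
m_hat = np.array([[15.0, 15.0], [10.0, 10.0], [25.0, 25.0]])
z = np.ones((3, 2))

x2, g2 = chi_square_stats(n, m_hat, z)
print(x2, g2)
```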
is the number of cells with $z_{ij} \le 0$ or $\hat{m}_{ij} = 0$.
The significance level (or the p value) for the Pearson chi-square statistic is
degrees of freedom.
SPSS provides the analysis of dispersion based on two types of dispersion: entropy and concentration. The following definitions are used:

$S(A)$: Dispersion due to the model
$S(B|A)$: Dispersion due to residuals
$S(B)$: Total dispersion
$R = S(A)/S(B)$: Measure of association
$$\hat{\pi}_i = \frac{\sum_{j=1}^{c} \hat{m}_{ij}}{\sum_{j=1}^{c} N_j}$$
$$\hat{\pi}_{i|j} = \frac{\hat{m}_{ij}}{N_j}$$
The bounds are $0 \le \hat{\pi}_i \le 1$ and $0 \le \hat{\pi}_{i|j} \le 1$.
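$$S(B) = -N \sum_{i=1}^{r} S_i(B)$$
where
$$S_i(B) = \begin{cases} \hat{\pi}_i \log \hat{\pi}_i & \text{if } \hat{\pi}_i > 0 \\ 0 & \text{if } \hat{\pi}_i = 0 \end{cases}$$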
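The estimated probabilities and the total entropy can be computed in a few lines. The sketch assumes the conventional sign for entropy, so that the total dispersion $-N\sum_i \hat{\pi}_i \log \hat{\pi}_i$ is nonnegative, and it treats $0 \log 0$ as 0. Function names and data are hypothetical.

```python
import numpy as np

def estimated_probabilities(m_hat, N_col):
    """Marginal pi_i = sum_j m_ij / sum_j N_j and conditional pi_{i|j} = m_ij / N_j."""
    pi_marg = m_hat.sum(axis=1) / N_col.sum()
    pi_cond = m_hat / N_col
    return pi_marg, pi_cond

def total_entropy(pi_marg, N):
    """Total dispersion S(B) = -N * sum_i pi_i log(pi_i), skipping pi_i = 0."""
    pos = pi_marg > 0
    return float(-N * np.sum(pi_marg[pos] * np.log(pi_marg[pos])))

m_hat = np.array([[15.0, 15.0], [10.0, 10.0], [25.0, 25.0]])
N_col = np.array([50.0, 50.0])

pi_marg, pi_cond = estimated_probabilities(m_hat, N_col)
s_b = total_entropy(pi_marg, N_col.sum())
print(pi_marg, s_b)
```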
Goodness-of-fit statistics provide only broad summaries of how models fit data. The pattern of lack of fit is revealed in cell-by-cell comparisons of observed and fitted cell counts.
The simple residual of the $(i,j)$th cell is
$$r_{ij} = \begin{cases} n_{ij} - \hat{m}_{ij} & \text{if } z_{ij} > 0 \\ \text{SYSMIS} & \text{if } z_{ij} \le 0 \end{cases}$$
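The standardized residual for the $(i,j)$th cell is
$$r_{ij}^S = \begin{cases} \dfrac{n_{ij} - \hat{m}_{ij}}{\sqrt{\hat{m}_{ij}\,(1 - \hat{m}_{ij}/N_j)}} & \text{if } z_{ij} > 0 \text{ and } 0 < \hat{m}_{ij} < N_j \\ 0 & \text{if } z_{ij} > 0 \text{ and } n_{ij} = \hat{m}_{ij} \\ \text{SYSMIS} & \text{otherwise} \end{cases}$$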
The standardized residuals are also known as Pearson residuals even though $\sum_{i=1}^{r}\sum_{j=1}^{c} (r_{ij}^S)^2 \ne X^2$. Although the standardized residuals are asymptotically normal, their asymptotic variances are less than 1.
The adjusted residual is the simple residual divided by its estimated standard error. Its definition and applications first appeared in Haberman (1973) and reappeared on page 454 of Haberman (1979). This statistic for the $(i,j)$th cell is
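$$r_{ij}^A = \begin{cases} (n_{ij} - \hat{m}_{ij}) / s_{ij} & \text{if } z_{ij} > 0 \text{ and } \hat{m}_{ij} > 0 \\ 0 & \text{if } z_{ij} > 0 \text{ and } n_{ij} = \hat{m}_{ij} \\ \text{SYSMIS} & \text{otherwise} \end{cases}$$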
where
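$$s_{ij} = \sqrt{\hat{m}_{ij}\left[1 - \frac{\hat{m}_{ij}}{N_j} - \hat{m}_{ij}\sum_{k=1}^{p}\sum_{l=1}^{p} (x_{ijk} - \hat{\theta}_{jk})(x_{ijl} - \hat{\theta}_{jl})\, h^{kl}\right]}$$
and $h^{kl}$ denotes the $(k,l)$th element of $H^{-1}$. The adjusted residuals are asymptotically standard normal.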
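The three residuals can be compared side by side with a small sketch. It assumes that the standardized residual divides the simple residual by $\sqrt{\hat{m}_{ij}(1 - \hat{m}_{ij}/N_j)}$ and that the adjusted residual divides it by $s_{ij}$, with $h^{kl}$ taken from the inverse of the information matrix evaluated at the fitted counts. `NaN` stands in for SYSMIS, and the special cases of the procedure are reduced to a positive-variance check. All names and data are hypothetical.

```python
import numpy as np

def residuals(n, m_hat, z, X, N_col):
    """Return (simple, standardized, adjusted) residual arrays of shape (r, c)."""
    live = z > 0
    theta = np.einsum("ij,ijk->jk", m_hat, X) / N_col[:, None]  # theta_jk
    Xc = X - theta[None, :, :]                                   # x_ijk - theta_jk
    H = np.einsum("ij,ijk,ijl->kl", m_hat, Xc, Xc)               # information matrix
    H_inv = np.linalg.inv(H)
    quad = np.einsum("ijk,kl,ijl->ij", Xc, H_inv, Xc)

    diff = n - m_hat
    var_std = m_hat * (1.0 - m_hat / N_col)
    var_adj = m_hat * (1.0 - m_hat / N_col - m_hat * quad)       # s_ij squared

    simple = np.where(live, diff, np.nan)
    with np.errstate(divide="ignore", invalid="ignore"):
        standardized = np.where(live & (var_std > 0), diff / np.sqrt(var_std), np.nan)
        adjusted = np.where(live & (var_adj > 0), diff / np.sqrt(var_adj), np.nan)
    return simple, standardized, adjusted

# Hypothetical 3 x 2 table fitted with main effects of the response only.
X = np.zeros((3, 2, 2))
X[0, :, 0] = 1.0
X[1, :, 1] = 1.0
z = np.ones((3, 2))
n = np.array([[10.0, 20.0], [15.0, 5.0], [25.0, 25.0]])
m_hat = np.array([[15.0, 15.0], [10.0, 10.0], [25.0, 25.0]])
N_col = n.sum(axis=0)

simple, standardized, adjusted = residuals(n, m_hat, z, X, N_col)
print(adjusted)
```

For this independence fit the adjusted residuals coincide with the classical two-way formula $(n_{ij} - \hat{m}_{ij}) / \sqrt{\hat{m}_{ij}(1 - n_{i+}/N)(1 - N_j/N)}$, which gives a convenient check on the variance expression.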
Pierce and Schafer (1986) and McCullagh and Nelder (1989) define the signed square root of the individual contribution to the G^2 statistic as the deviance residual. This statistic for the (i,j)th cell is
where
d
n n m n m z m n m z m z m
ij
ij ij ij ij ij ij ij ij ij ij ij ij ij
log / $^ $^ , $^ , $ (^) , $ (^) , $
if and n if and n SYSMIS otherwise
ij ij
For multinomial sampling, the individual contribution to the G^2 statistic is only
rij D G i
r j
c
(^2 ) 1 1
where
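$$V = \sum_{i=1}^{r}\sum_{j=1}^{c} d_{ij}^2\, \hat{m}_{ij} - \sum_{j=1}^{c}\frac{1}{N_j}\left(\sum_{i=1}^{r} d_{ij}\, \hat{m}_{ij}\right)^2 - \sum_{k=1}^{p}\sum_{l=1}^{p} f_k f_l\, h^{kl}$$
with
$$f_k = \sum_{i=1}^{r}\sum_{j=1}^{c} d_{ij}\, \hat{m}_{ij}\,(x_{ijk} - \hat{\theta}_{jk})$$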
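Consider a linear combination of the natural logarithm of cell counts
$$\sum_{i=1}^{r}\sum_{j=1}^{c} d_{ij} \log m_{ij} \tag{12}$$
where $d_{ij}$ are real numbers with the restriction
$$\sum_{i=1}^{r} d_{ij} = 0, \quad j = 1, \ldots, c$$
The quantity in (12) is estimated by
$$\sum_{i=1}^{r}\sum_{j=1}^{c} d_{ij} \log \hat{m}_{ij} = \sum_{i=1}^{r}\sum_{j=1}^{c} d_{ij} \log z_{ij} + \sum_{i=1}^{r}\sum_{j=1}^{c} d_{ij} \sum_{k=1}^{p} x_{ijk}\, \hat{\beta}_k \tag{13}$$
The variance of (13) is
$$\operatorname{var}\left(\sum_{i=1}^{r}\sum_{j=1}^{c} d_{ij} \log \hat{m}_{ij}\right) = \sum_{k=1}^{p}\sum_{l=1}^{p} w_k w_l\, h^{kl} \tag{14}$$
where
$$w_k = \sum_{i=1}^{r}\sum_{j=1}^{c} d_{ij}\, x_{ijk}, \quad k = 1, \ldots, p$$
and $h^{kl}$ denotes the $(k,l)$th element of $H^{-1}$.
The null hypothesis is
$$H_0: \sum_{i=1}^{r}\sum_{j=1}^{c} d_{ij} \log m_{ij} = 0$$
The Wald statistic is
$$W = \frac{\left(\sum_{i=1}^{r}\sum_{j=1}^{c} d_{ij} \log \hat{m}_{ij}\right)^2}{\sum_{k=1}^{p}\sum_{l=1}^{p} w_k w_l\, h^{kl}}$$
Under $H_0$, $W$ asymptotically distributes as a chi-square distribution with 1 degree of freedom. $W$ is system missing if (14) is 0.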
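A worked sketch of the Wald test for a generalized log-odds ratio follows. It assumes that $h^{kl}$ comes from the inverse of the information matrix evaluated at the fitted counts and that the coefficients $d_{ij}$ sum to zero within each setting $j$. The chi-square tail probability with 1 degree of freedom is obtained from the identity $P(\chi^2_1 > w) = \operatorname{erfc}(\sqrt{w/2})$. The function name, the contrast, and the data are hypothetical.

```python
import math
import numpy as np

def wald_log_odds(d, m_hat, X, N_col):
    """Return (estimate, variance, W, p_value) for sum_ij d_ij log(m_ij)."""
    theta = np.einsum("ij,ijk->jk", m_hat, X) / N_col[:, None]
    Xc = X - theta[None, :, :]
    H = np.einsum("ij,ijk,ijl->kl", m_hat, Xc, Xc)     # information matrix
    H_inv = np.linalg.inv(H)

    use = d != 0                                        # only cells in the contrast
    estimate = float(np.sum(d[use] * np.log(m_hat[use])))
    w = np.einsum("ij,ijk->k", d, X)                    # w_k = sum_ij d_ij x_ijk
    variance = float(w @ H_inv @ w)
    W = estimate ** 2 / variance if variance > 0 else float("nan")
    p_value = math.erfc(math.sqrt(W / 2.0))             # upper tail of chi-square(1)
    return estimate, variance, W, p_value

# Hypothetical 3 x 2 table fitted with main effects of the response only.
X = np.zeros((3, 2, 2))
X[0, :, 0] = 1.0
X[1, :, 1] = 1.0
m_hat = np.array([[15.0, 15.0], [10.0, 10.0], [25.0, 25.0]])
N_col = np.array([50.0, 50.0])

# Log odds of response 1 versus response 3 at the first setting.
d = np.array([[1.0, 0.0], [0.0, 0.0], [-1.0, 0.0]])
estimate, variance, W, p_value = wald_log_odds(d, m_hat, X, N_col)
print(estimate, variance, W, p_value)
```

In this example the variance reduces to the sum of the reciprocals of the two pooled row totals, $1/30 + 1/50$, which agrees with the familiar large-sample variance of a log odds.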
The cell count is
n
n v v v
ij s v ijs ij ij ij
= ij
≤ ≤
if if or
where
n
n n z ijs n z
ijs ijs ijs ijs ijs
if and if and
ij
means summation over the range of s with the terms z (^) ijs > 0. The cell weight value is
z
n z n n v
z v n v v v
ij
s v ijsijs^ ij^ ij^ ij
s v ijs^ ij ij^ ij ij ij
ij = ij
≤ ≤
≤ ≤
1
1
if and
if and if if
If no variable is specified as the cell weight variable, then all cases have unit cell weights by default.
The cell covariate value is
x
n x n n v
x v n v v v
ij
s v ijsijs^ ij^ ij^ ij
s v ijs^ ij ij^ ij ij ij
ij
ij
≤ ≤
≤ ≤
1
1
if and
if and if or
The cell GRESID coefficient is
c
n c n n v
c v n v v v
ij
s v ijsijs^ ij^ ij^ ij
s v ijs^ ij ij^ ij ij ij
ij
ij
≤ ≤
≤ ≤
1
1
if and
if and if or
There are no defaults for the GRESID coefficients.
The cell GLOR coefficient is
e
n e n n v
e v n v v v
ij
s v ijsijs^ ij^ ij^ ij
s v ijs^ ij ij^ ij ij ij
ij
ij
≤ ≤
≤ ≤
1
1
if and
if and if or
There are no defaults for the GLOR coefficients.