








This document covers the following main points: PLUM, Data Aggregation, Data Assumptions, Model, Log-likelihood Function, Derivatives of the Log-likelihood Function, Expectation of the Second Derivative
Typology: Study notes









The purpose of the PLUM procedure is to model the dependence of an ordinal categorical response variable on a set of discrete and/or continuous independent variables. Since the choice and the number of response categories can be quite arbitrary, it is essential to model the dependence such that the choice of the response categories does not affect the conclusion of the inference. That is, the final conclusion should be the same if any two or more adjacent categories of the old scale are combined. Such considerations lead to modeling the dependence of the response on the independent variables by means of the cumulative response probability.
$Y$: The ordinal response variable, which takes integer values from 1 to $J$, $J \ge 2$.
$J$: The number of categories of the ordinal response.
$m$: The number of subpopulations.
$X^A$: $m \times p^A$ matrix with vector-element $x_i^A$, the observed values at the $i$th subpopulation, determined by the independent variables specified in the command.
$X$: $m \times p$ matrix with vector-element $x_i$, the observed values of the location model's independent variables at the $i$th subpopulation.
$Z$: $m \times q$ matrix with vector-element $z_i$, the observed values of the scale model's independent variables at the $i$th subpopulation.
$f_{ijs}$: The frequency weight for the $s$-th observation which belongs to the cell corresponding to $Y = j$ at subpopulation $i$.
$n_{ij}$: The sum of frequency weights of the observations that belong to the cell corresponding to $Y = j$ at subpopulation $i$.
$r_{ij}$: The cumulative total up to and including $Y = j$ at subpopulation $i$.
$n_i$: The marginal frequency of subpopulation $i$.
$n$: The sum of all frequency weights.
$\gamma_{ij}$: The cumulative response probability up to and including $Y = j$ at subpopulation $i$.
$\pi_{ij}$: The cell response probability corresponding to $Y = j$ at subpopulation $i$.
$\theta$: $(J-1) \times 1$ vector of threshold parameters in the location part of the model.
$\beta$: $p \times 1$ vector of location parameters in the location part of the model.
$\tau$: $q \times 1$ vector of scale parameters in the scale part of the model.
$B = (\theta^T, \beta^T, \tau^T)^T$: The $\{(J-1)+p+q\} \times 1$ vector of unknown parameters in the general model.
$\hat{B} = (\hat{\theta}^T, \hat{\beta}^T, \hat{\tau}^T)^T$: The $\{(J-1)+p+q\} \times 1$ vector of maximum likelihood estimates of the parameters in the general model.
$\tilde{B} = (\tilde{\theta}^T, \tilde{\beta}^T)^T$: The $\{(J-1)+p\} \times 1$ vector of maximum likelihood estimates of the parameters in the location-only model.
$\hat{\gamma}_{ij}$: The cumulative response probability estimate based on the maximum likelihood estimate $\hat{B}$ in the general model.
$\tilde{\gamma}_{ij}$: The cumulative response probability estimate based on the maximum likelihood estimate $\tilde{B}$ in the location-only model.
$\hat{\pi}_{ij}$: The cell response probability estimate based on the maximum likelihood estimate $\hat{B}$ in the general model.
$\tilde{\pi}_{ij}$: The cell response probability estimate based on the maximum likelihood estimate $\tilde{B}$ in the location-only model.
$\hat{e}$: Number of non-redundant parameters in the general model. If all parameters are non-redundant, $\hat{e} = (J-1)+p+q$.
$\tilde{e}$: Number of non-redundant parameters in the location-only model. If all parameters are non-redundant, $\tilde{e} = (J-1)+p$.
Data Aggregation
Observations with negative or missing frequency weights are discarded. Observations are aggregated by the definition of subpopulations. Subpopulations are defined by the cross-classifications of the set of independent variables specified in the command. Let $n_i$ be the marginal count of subpopulation $i$,
$$n_i = \sum_{j=1}^{J} n_{ij}.$$
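The aggregation step can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout `(x, y, f)` and the function name `aggregate` are hypothetical and not part of the procedure's documentation.

```python
from collections import defaultdict

def aggregate(records, J):
    """Aggregate (x, y, f) records into counts n_ij per subpopulation.

    records: iterable of (x, y, f), where x is a tuple of independent-variable
             values, y is an integer category in 1..J, f is a frequency weight.
    Returns {x: [n_i1, ..., n_iJ]}; cells with no observations stay at 0.
    """
    counts = defaultdict(lambda: [0.0] * J)
    for x, y, f in records:
        # Observations with negative or missing weights are discarded.
        if f is None or f < 0:
            continue
        counts[tuple(x)][y - 1] += f
    return dict(counts)

# Made-up records for illustration only.
records = [
    ((0,), 1, 2.0),
    ((0,), 3, 1.0),
    ((1,), 2, 4.0),
    ((1,), 2, None),   # missing weight: dropped
    ((1,), 1, -1.0),   # negative weight: dropped
]
n = aggregate(records, J=3)
marginals = {x: sum(row) for x, row in n.items()}   # n_i = sum_j n_ij
```

Each distinct tuple of independent-variable values plays the role of one subpopulation, and the row sums give the marginal counts $n_i$.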
If there is no observation for the cell of $Y = j$ at subpopulation $i$, it is assumed that $n_{ij} = 0$, provided that $n_i \ne 0$. A non-negative scalar $\delta \in [0, 1)$ may be added to any zero cell (i.e., cell
Data Assumptions
Let $(n_{i1}, \ldots, n_{iJ})^T$ be the $J \times 1$ vector of counts for the categories of $Y$ at subpopulation $i$. It is assumed that each $(n_{i1}, \ldots, n_{iJ})^T$ is independently multinomial distributed with probability vector $(\pi_{i1}, \ldots, \pi_{iJ})^T$ of dimension $J \times 1$ and fixed total $n_i$.
Model
Let $\gamma_{ij} = \mathrm{Prob}(Y \le j \mid x_i)$ be the cumulative response probability for $Y$, i.e.,
$$\gamma_{ij} = \sum_{l=1}^{j} \pi_{il}$$
for $j = 1, \ldots, J-1$. Notice that $\gamma_{iJ} = 1$, hence only the first $J-1$ $\gamma$'s are needed in the model.
Log-likelihood Function
The log-likelihood function is
$$l = \sum_{i=1}^{m} \sum_{j=1}^{J-1} \left[ r_{ij}\,\phi_{ij} - r_{i,j+1}\, g(\phi_{ij}) \right]$$
in which $r_{ij}$ is the cumulative total
$$r_{ij} = \sum_{k=1}^{j} n_{ik},$$
the argument $\phi_{ij}$ is given by
$$\phi_{ij} = \log\left( \frac{\gamma_{ij}}{\gamma_{i,j+1} - \gamma_{ij}} \right),$$
and the function $g(\phi_{ij})$ is
$$g(\phi_{ij}) = \log\left(1 + \exp(\phi_{ij})\right) = \log\left( \frac{\gamma_{i,j+1}}{\gamma_{i,j+1} - \gamma_{ij}} \right).$$
Notice that a constant term $c = \sum_{i=1}^{m} \log\{ n_i! / (n_{i1}! \cdots n_{iJ}!) \}$, which is independent of the unknown parameters, has been excluded here. Thus, $l$ is in fact the kernel of the true log-likelihood function. Further details of the log-likelihood function can be found at the end of this chapter.
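As a sanity check on the cumulative form of the kernel, the following sketch evaluates it for a single subpopulation and compares it with the familiar multinomial kernel $\sum_j n_{ij} \log \pi_{ij}$. The function names and the numbers are illustrative assumptions, not taken from the procedure.

```python
import math
from itertools import accumulate

def kernel_cumulative(n_i, pi_i):
    """Kernel contribution of one subpopulation in the cumulative form."""
    J = len(n_i)
    r = list(accumulate(n_i))        # r_ij, cumulative totals
    gam = list(accumulate(pi_i))     # gamma_ij, cumulative probabilities
    gam[-1] = 1.0                    # gamma_iJ = 1 by definition
    total = 0.0
    for j in range(J - 1):           # j = 1, ..., J-1 (0-based here)
        phi = math.log(gam[j] / (gam[j + 1] - gam[j]))
        g = math.log(1.0 + math.exp(phi))
        total += r[j] * phi - r[j + 1] * g
    return total

def kernel_multinomial(n_i, pi_i):
    """The same kernel written as sum_j n_ij * log(pi_ij)."""
    return sum(n * math.log(p) for n, p in zip(n_i, pi_i))

# Made-up counts and cell probabilities for one subpopulation.
n_i = [3, 5, 2]
pi_i = [0.2, 0.5, 0.3]
a = kernel_cumulative(n_i, pi_i)
b = kernel_multinomial(n_i, pi_i)
```

The two values agree up to rounding, because the sum over $j$ telescopes in the $\log \gamma_{ij}$ terms and leaves only $\sum_j n_{ij} \log \pi_{ij}$.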
Details of derivatives can be found at the end of this chapter.
First Derivative
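The first derivative of $l$ with respect to $B_k$, $k = 1, \ldots, (J-1)+p+q$, is
$$\frac{\partial l}{\partial B_k} = \sum_{i=1}^{m} \sum_{j=1}^{J-1} \frac{\partial l_i}{\partial \phi_{ij}}\, U_{ij}\, Q_{ijk}$$
in which $l_i$ is the contribution of subpopulation $i$ to $l$,
$$\frac{\partial l_i}{\partial \phi_{ij}} = r_{ij} - r_{i,j+1} \frac{\gamma_{ij}}{\gamma_{i,j+1}},$$
$$U_{ij} = \frac{\gamma_{i,j+1}}{\gamma_{ij}\,(\gamma_{i,j+1} - \gamma_{ij})}$$
and
$$Q_{ijk} = \frac{\partial \gamma_{ij}}{\partial \eta_{ij}}\, P_{ijk} - \frac{\gamma_{ij}}{\gamma_{i,j+1}}\, \frac{\partial \gamma_{i,j+1}}{\partial \eta_{i,j+1}}\, P_{i,j+1,k}$$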
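in which
$$P_{ijk} = \frac{\partial \eta_{ij}}{\partial B_k} =
\begin{cases}
\dfrac{\delta_{jk}}{\exp(z_i^T \tau)} & \text{if } 1 \le k \le J-1 \\[2ex]
-\dfrac{x_{i[k-(J-1)]}}{\exp(z_i^T \tau)} & \text{if } J \le k \le J-1+p \\[2ex]
-z_{i[k-\{(J-1)+p\}]}\, \eta_{ij} & \text{if } J+p \le k \le J-1+p+q
\end{cases}$$
$\delta_{jk} = 1$ if $j = k$, 0 otherwise, and $P_{iJk} = 0$. For $i = 1, \ldots, m$, $j = 1, \ldots, J-1$,
$$\frac{\partial \gamma_{ij}}{\partial \eta_{ij}} =
\begin{cases}
\gamma_{ij}(1 - \gamma_{ij}) & \text{Logit link} \\
-(1 - \gamma_{ij}) \log(1 - \gamma_{ij}) & \text{Complementary log-log link} \\
-\gamma_{ij} \log(\gamma_{ij}) & \text{Negative log-log link} \\
\varphi(\Phi^{-1}(\gamma_{ij})) & \text{Probit link} \\
\cos^2(\pi(\gamma_{ij} - 0.5)) / \pi & \text{Cauchit link}
\end{cases}$$
and $\partial \gamma_{iJ} / \partial \eta_{iJ} = 0$.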
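The five link derivatives can be checked numerically. The sketch below writes each inverse link $\gamma(\eta)$ in its usual closed form (an assumption here, since the link definitions themselves are not shown in this part of the notes) and compares the listed derivatives against central differences. The dictionary keys are hypothetical labels.

```python
import math
from statistics import NormalDist

_N = NormalDist()   # standard normal distribution

# Inverse links: gamma as a function of eta (standard forms, assumed here).
INV_LINK = {
    "logit":   lambda e: 1.0 / (1.0 + math.exp(-e)),
    "cloglog": lambda e: 1.0 - math.exp(-math.exp(e)),
    "nloglog": lambda e: math.exp(-math.exp(-e)),
    "probit":  lambda e: _N.cdf(e),
    "cauchit": lambda e: 0.5 + math.atan(e) / math.pi,
}

# d(gamma)/d(eta), written in terms of gamma as in the list of links.
DGAMMA = {
    "logit":   lambda g: g * (1.0 - g),
    "cloglog": lambda g: -(1.0 - g) * math.log(1.0 - g),
    "nloglog": lambda g: -g * math.log(g),
    "probit":  lambda g: _N.pdf(_N.inv_cdf(g)),
    "cauchit": lambda g: math.cos(math.pi * (g - 0.5)) ** 2 / math.pi,
}

def max_abs_error(eta=0.3, h=1e-6):
    """Largest gap between the closed forms and central differences."""
    worst = 0.0
    for name, inv in INV_LINK.items():
        numeric = (inv(eta + h) - inv(eta - h)) / (2.0 * h)
        analytic = DGAMMA[name](inv(eta))
        worst = max(worst, abs(numeric - analytic))
    return worst

err = max_abs_error()
```

Expressing each derivative in terms of $\gamma_{ij}$ rather than $\eta_{ij}$ means the same cumulative probabilities can be reused for $U_{ij}$ and $Q_{ijk}$.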
Second Derivative
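The second derivative is
$$\frac{\partial^2 l}{\partial B_s \partial B_k} = \sum_{i=1}^{m} \sum_{j=1}^{J-1} \left\{ \frac{\partial}{\partial B_s}\!\left( \frac{\partial l_i}{\partial \phi_{ij}} \right) U_{ij} Q_{ijk} + \frac{\partial l_i}{\partial \phi_{ij}}\, \frac{\partial U_{ij}}{\partial B_s}\, Q_{ijk} + \frac{\partial l_i}{\partial \phi_{ij}}\, U_{ij}\, \frac{\partial Q_{ijk}}{\partial B_s} \right\}$$
for $s, k = 1, \ldots, (J-1)+p+q$. The first term of the equation is
$$\frac{\partial}{\partial B_s}\!\left( \frac{\partial l_i}{\partial \phi_{ij}} \right) U_{ij} Q_{ijk} = -\frac{r_{i,j+1}}{\gamma_{i,j+1}}\, U_{ij}\, Q_{ijs}\, Q_{ijk}.$$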
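The second and third terms each carry the factor $\partial l_i / \partial \phi_{ij}$, whose expectation is zero because $E(r_{ij}) = n_i \gamma_{ij}$, so they drop out when expectations are taken.
Expectation of the Second Derivative
Since $E(r_{i,j+1}) = n_i \gamma_{i,j+1}$, the expectation of the second derivative is
$$E\!\left( \frac{\partial^2 l}{\partial B_s \partial B_k} \right) = \sum_{i=1}^{m} \sum_{j=1}^{J-1} E\!\left[ \frac{\partial}{\partial B_s}\!\left( \frac{\partial l_i}{\partial \phi_{ij}} \right) \right] U_{ij} Q_{ijk} = -\sum_{i=1}^{m} \sum_{j=1}^{J-1} \frac{E(r_{i,j+1})}{\gamma_{i,j+1}}\, U_{ij}\, Q_{ijs}\, Q_{ijk} = -\sum_{i=1}^{m} n_i \sum_{j=1}^{J-1} U_{ij}\, Q_{ijs}\, Q_{ijk}.$$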
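The first derivative and the expected second derivative can be assembled directly from $U_{ij}$, $Q_{ijk}$ and $P_{ijk}$. The sketch below does this for a deliberately small case and is not the procedure's implementation: it assumes a location-only model with the logit link, a single covariate, and the linear predictor $\eta_{ij} = \theta_j - \beta x_i$. All function names and the data are made up.

```python
import math

def logistic(e):
    return 1.0 / (1.0 + math.exp(-e))

def pieces(B, x, J):
    """Return gamma_ij, d(gamma_ij)/d(eta_ij) and P_ijk for one subpopulation.

    Assumes B = [theta_1, ..., theta_{J-1}, beta] and eta_ij = theta_j - beta * x.
    """
    K = len(B)
    gam, dgam, P = [], [], []
    for j in range(J - 1):
        g = logistic(B[j] - B[J - 1] * x)
        gam.append(g)
        dgam.append(g * (1.0 - g))      # logit link derivative
        row = [0.0] * K
        row[j] = 1.0                    # d(eta_ij)/d(theta_j)
        row[J - 1] = -x                 # d(eta_ij)/d(beta)
        P.append(row)
    # Category J: gamma_iJ = 1, with zero derivative and P_iJk = 0.
    gam.append(1.0)
    dgam.append(0.0)
    P.append([0.0] * K)
    return gam, dgam, P

def kernel(B, data, J):
    """Log-likelihood kernel, sum_i sum_j n_ij * log(pi_ij)."""
    total = 0.0
    for x, n_i in data:
        gam, _, _ = pieces(B, x, J)
        prev = 0.0
        for j in range(J):
            total += n_i[j] * math.log(gam[j] - prev)
            prev = gam[j]
    return total

def score_and_information(B, data, J):
    """First derivative, and minus the expected second derivative."""
    K = len(B)
    grad = [0.0] * K
    info = [[0.0] * K for _ in range(K)]
    for x, n_i in data:
        gam, dgam, P = pieces(B, x, J)
        r = [sum(n_i[: j + 1]) for j in range(J)]       # cumulative totals r_ij
        for j in range(J - 1):
            U = gam[j + 1] / (gam[j] * (gam[j + 1] - gam[j]))
            Q = [dgam[j] * P[j][k]
                 - gam[j] / gam[j + 1] * dgam[j + 1] * P[j + 1][k]
                 for k in range(K)]
            dl = r[j] - r[j + 1] * gam[j] / gam[j + 1]  # d(l_i)/d(phi_ij)
            for k in range(K):
                grad[k] += dl * U * Q[k]
                for s in range(K):
                    info[s][k] += r[J - 1] * U * Q[s] * Q[k]   # r_iJ = n_i
    return grad, info

# Made-up data: (covariate value, [n_i1, n_i2, n_i3]) per subpopulation.
data = [(0.0, [5, 3, 2]), (1.0, [2, 4, 4]), (2.0, [1, 2, 7])]
B = [-0.5, 0.8, 0.4]
grad, info = score_and_information(B, data, J=3)
```

Comparing `grad` with finite differences of `kernel` is a cheap way to catch index mistakes in $Q_{ijk}$ before the derivatives are used in an iterative fit.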
Maximum Likelihood Estimate
To obtain the maximum likelihood estimate of $B$, a Fisher scoring iterative estimation method or Newton-Raphson iterative estimation method can be used. Let $B^{(t)}$ be the parameter vector at iteration $t$ and $\partial l / \partial B^{(t)}$ be a vector of the first derivatives of $l$ evaluated at $B = B^{(t)}$. Moreover, let $A^{(t)}$ be a $\{(J-1)+p+q\} \times \{(J-1)+p+q\}$ matrix such that
$$A^{(t)} = \left[ a_{sk}^{(t)} \right], \qquad a_{sk}^{(t)} =
\begin{cases}
\left. \dfrac{\partial^2 l}{\partial B_s \partial B_k} \right|_{B = B^{(t)}} & \text{Newton-Raphson approach} \\[2ex]
\left. E\!\left( \dfrac{\partial^2 l}{\partial B_s \partial B_k} \right) \right|_{B = B^{(t)}} & \text{Fisher scoring approach}
\end{cases}$$
For a location-only model, the corresponding formulas use the first $(J-1)+p$ elements of $\partial l / \partial B^{(t)}$ and the upper $\{(J-1)+p\} \times \{(J-1)+p\}$ submatrix of $A^{(t)}$. The parameter vector at iteration $t+1$ is updated by
$$B^{(t+1)} = B^{(t)} - \xi \left[ A^{(t)} \right]^{-1} \frac{\partial l}{\partial B^{(t)}}$$
where $\xi > 0$ is a stepping scalar such that $l(B^{(t+1)}) - l(B^{(t)}) \ge 0$.
Stepping
Use the step-halving method if $l(B^{(t+1)}) - l(B^{(t)}) < 0$. Let $V$ be the maximum number of steps in step-halving; the set of values of $\xi$ is $\{1/2^v : v = 0, \ldots, V-1\}$.
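Step-halving can be sketched as a small helper that tries $\xi = 1, 1/2, 1/4, \ldots$ until the log-likelihood no longer decreases. The function name, the toy objective, and the behavior when all $V$ candidates fail (here: keep the current vector) are assumptions made for illustration; the text does not specify the last point.

```python
def step_halving(loglik, B, direction, V=5):
    """Return (B_new, xi) for the first xi in {1/2**v} that does not lower loglik.

    direction is the full Newton-Raphson or Fisher scoring step.
    Returns (B, 0.0) if none of the V candidate step sizes qualifies
    (an assumption; the fallback is not described in the text).
    """
    base = loglik(B)
    for v in range(V):
        xi = 1.0 / 2 ** v
        candidate = [b + xi * d for b, d in zip(B, direction)]
        if loglik(candidate) - base >= 0:
            return candidate, xi
    return list(B), 0.0

# Toy concave objective with its maximum at (1, -2); the full step overshoots.
toy = lambda B: -((B[0] - 1.0) ** 2 + (B[1] + 2.0) ** 2)
B_new, xi = step_halving(toy, [0.0, 0.0], direction=[3.0, -6.0])
```

In this toy case the full step lowers the objective, so the helper falls back to $\xi = 1/2$.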
Starting Values of the Parameters
Location-Only Model
If a location-only model is specified, set $B^{(0)} = \left( [\theta^{(0)}]^T, 0^T \right)^T$ where
$$\theta_j^{(0)} = \mathrm{link}\!\left( \frac{\sum_{i=1}^{m} \sum_{k=1}^{j} n_{ik}}{\sum_{i=1}^{m} n_i} \right)$$
for $j = 1, \ldots, J-1$.
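A minimal sketch of the starting thresholds, assuming the logit link and a made-up table of counts. The function names are hypothetical; any other link function could be passed in instead.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def starting_thresholds(counts, link=logit):
    """theta_j^(0) = link(pooled cumulative proportion) for j = 1..J-1.

    counts: list of rows [n_i1, ..., n_iJ], one row per subpopulation.
    """
    J = len(counts[0])
    n = sum(sum(row) for row in counts)                # sum of all weights
    thetas, running = [], 0.0
    for j in range(J - 1):
        running += sum(row[j] for row in counts)       # sum_i r_ij
        thetas.append(link(running / n))
    return thetas

# Made-up counts: three subpopulations, three response categories.
counts = [[5, 3, 2], [2, 4, 4], [1, 2, 7]]
theta0 = starting_thresholds(counts)
```

Because the pooled cumulative proportions increase with $j$ and the link is monotone, the starting thresholds come out in increasing order.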
General Model
If a general model is specified, first ignore the scale part (i.e., assume that $\tau = 0$ and treat the model as if it were a location-only model) and use $B^{(0)} = \left( [\theta^{(0)}]^T, 0^T \right)^T$ as the starting value to obtain the maximum likelihood estimate $\tilde{B} = (\tilde{\theta}^T, \tilde{\beta}^T)^T$. After $\tilde{B}$ is obtained, find the maximum likelihood estimate $\hat{B} = (\hat{\theta}^T, \hat{\beta}^T, \hat{\tau}^T)^T$ of the general model by starting at $(\tilde{\theta}^T, \tilde{\beta}^T, 0^T)^T$.
The above practice is essentially the same as taking $B^{(0)} = \left( [\theta^{(0)}]^T, 0^T, 0^T \right)^T$. The advantage is that the maximum likelihood estimate $\tilde{B}$ can be obtained in the process of finding $\hat{B}$.
Ordinal Adjustments for the Threshold Parameters
If the monotonicity property $\theta_1 \le \cdots \le \theta_{J-1}$ is not preserved at the end of any iteration, an ad hoc adjustment is made before the next iteration starts. If $\theta_j^{(t)} > \theta_{j+1}^{(t)}$ for some $j$, then both $\theta_j^{(t)}$ and $\theta_{j+1}^{(t)}$ are set to $\left( \theta_j^{(t)} + \theta_{j+1}^{(t)} \right) / 2$ before the next iteration. This value is then compared with $\theta_{j+2}^{(t)}$ and so on.
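The adjustment reads as a single left-to-right pass over the thresholds, which the following sketch implements under that reading. The function name is hypothetical, and the sketch does not re-check thresholds to the left of an averaged pair, since the text only describes comparison with the next threshold.

```python
def adjust_thresholds(theta):
    """Average adjacent thresholds that violate theta_1 <= ... <= theta_{J-1}.

    Single left-to-right pass: each averaged value is then compared
    with the next threshold, as described in the text.
    """
    out = list(theta)
    for j in range(len(out) - 1):
        if out[j] > out[j + 1]:
            avg = (out[j] + out[j + 1]) / 2.0
            out[j] = out[j + 1] = avg
    return out

# Made-up thresholds with one violation (0.75 > 0.25).
adjusted = adjust_thresholds([-1.0, 0.75, 0.25, 1.5])
```

Here the offending pair is replaced by its mean, 0.5, and the pass continues by comparing that value with 1.5.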
Convergence Criteria
Given two convergence criteria $\epsilon_k > 0$ and $\epsilon_p > 0$, the iteration is considered to have converged if one of the following criteria is satisfied:
$$\max_i \left| B_i^{(t+1)} - B_i^{(t)} \right| < \epsilon_p.$$
Location-Only Model versus Intercept-Only Model
The following statistic is available when a location-only model is specified. The Model Chi-square statistic is given by
$$-2\, l(B^{(0)}) + 2\, l(\tilde{B})$$
Under the null hypothesis $H_0: \beta = 0$, the Model Chi-square is asymptotically chi-squared distributed with $\tilde{e} - (J-1)$ degrees of freedom.
General Model versus Location-Only Model
The following statistic is available when a general model is specified. The Model Chi-square statistic is given by
$$-2\, l(\tilde{B}) + 2\, l(\hat{B})$$
Under the null hypothesis $H_0: \tau = 0$, the Model Chi-square is asymptotically chi-squared distributed with $\hat{e} - \tilde{e}$ degrees of freedom.
Likelihood Ratio Test for Equal Slopes Assumption
For a location-only model, a likelihood ratio test of parallel lines in the location is performed. If the regression lines are not parallel, the location can be specified as
$$\eta_{ij} = \theta_j - \beta_j^T x_i$$
for $j = 1, \ldots, J-1$. That is, the location parameters $\beta_j$ (or slopes) vary with the levels of the response. The parameter vector for the above "non-parallel" location-only model is $B = (\theta^T, \beta_1^T, \ldots, \beta_{J-1}^T)^T$, which is of dimension $\{(J-1)+(J-1)p\} \times 1$. The first derivative $\partial l / \partial B$ of the log-likelihood is the same as in the "parallel" model, except that $P_{ijk} = \partial \eta_{ij} / \partial B_k$ is replaced by the following:
$$P_{ijk} =
\begin{cases}
\delta_{jk} & \text{if } 1 \le k \le J-1 \\
-x_{i[k-(J-1)-sp]} & \text{if } (J-1)+sp < k \le (J-1)+sp+p \text{ with } s = j-1 \\
0 & \text{otherwise}
\end{cases}$$
where $s = 0, \ldots, J-2$ indexes the block of $B$ that holds $\beta_{s+1}$.
Similarly, the expected value of the second derivative is the same as in the parallel model, except that $P_{ijk}$ is replaced by the above equation. To test the null hypothesis of parallelism $H_0: \beta_1 = \cdots = \beta_{J-1}$, find the maximum likelihood estimate $\tilde{B}$ of the parallel location-only model and the maximum likelihood estimate $\tilde{B}_{\mathrm{np}}$ of the non-parallel model. The Model Chi-square statistic is given by
$$-2\, l(\tilde{B}) + 2\, l(\tilde{B}_{\mathrm{np}})$$
Under the null hypothesis, the Model Chi-square statistic is asymptotically chi-squared distributed with $(J-2)p$ degrees of freedom, the difference between the $(J-1)p$ slope parameters of the non-parallel model and the $p$ slope parameters of the parallel model.
Pseudo R Squares
Cox and Snell’s R Square
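The Cox and Snell's $R^2$ for a general model is
$$R_{CS}^2 = 1 - \left( \frac{L(B^{(0)})}{L(\hat{B})} \right)^{2/n}$$
where $L(\cdot)$ denotes the likelihood function. Replace $\hat{B}$ by $\tilde{B}$ for a location-only model.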
Nagelkerke’s R Square
The Nagelkerke's $R^2$ is
$$R_N^2 = \frac{R_{CS}^2}{1 - L(B^{(0)})^{2/n}}$$
McFadden’s R Square
The McFadden's $R^2$ for a general model is
$$R_M^2 = 1 - \frac{l(\hat{B})}{l(B^{(0)})}$$
Replace $\hat{B}$ by $\tilde{B}$ for a location-only model.
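The three pseudo R-square measures can be computed from two log-likelihood values and the total weight $n$. The sketch below assumes the likelihood and log-likelihood are related by $L = \exp(l)$, so that the ratio in Cox and Snell's measure becomes $\exp\{2(l_0 - l_1)/n\}$. The function name and the numbers are made up.

```python
import math

def pseudo_r2(l0, l1, n):
    """Cox-Snell, Nagelkerke and McFadden R^2 from two log-likelihoods.

    l0: log-likelihood at B^(0), l1: log-likelihood at the fitted model,
    n: sum of all frequency weights. Assumes L = exp(l).
    """
    cox_snell = 1.0 - math.exp(2.0 * (l0 - l1) / n)     # 1 - (L0/L1)^(2/n)
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * l0 / n))
    mcfadden = 1.0 - l1 / l0
    return cox_snell, nagelkerke, mcfadden

# Made-up log-likelihood values for illustration.
cs, nk, mf = pseudo_r2(l0=-120.0, l1=-100.0, n=100)
```

Working on the log scale avoids underflow, since the likelihood values themselves are usually far too small to represent directly.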
Predicted Cell Counts & Cumulative Totals
Predicted Cell Counts
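The estimated cell response probability based on the maximum likelihood estimate for the general model is
$$\hat{\pi}_{ij} =
\begin{cases}
\hat{\gamma}_{i1} & j = 1 \\
\hat{\gamma}_{ij} - \hat{\gamma}_{i,j-1} & 2 \le j \le J-1 \\
1 - \hat{\gamma}_{i,J-1} & j = J
\end{cases}$$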
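Recovering cell probabilities from cumulative ones is a matter of differencing. The sketch below does this and also multiplies by the marginal frequency $n_i$ to get a predicted cell count; that last step is an assumption about how the counts are formed, since the count formula is not shown in this part of the notes. The function names are hypothetical.

```python
def cell_probs(gamma):
    """pi_1..pi_J from cumulative estimates gamma_1..gamma_{J-1}."""
    bounds = [0.0] + list(gamma) + [1.0]     # gamma_0 = 0 and gamma_J = 1
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

def predicted_counts(gamma, n_i):
    """Predicted cell counts, assumed here to be n_i * pi_ij."""
    return [n_i * p for p in cell_probs(gamma)]

# Made-up cumulative estimates for one subpopulation with J = 3.
pi_hat = cell_probs([0.25, 0.75])
counts_hat = predicted_counts([0.25, 0.75], n_i=20)
```

Padding the cumulative probabilities with 0 and 1 handles the first and last categories without special cases.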
Covariance and Correlation Matrices
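The estimate of the covariance matrix of $\hat{B}$ is
$$\widehat{\mathrm{Cov}}(\hat{B}) =
\begin{cases}
-\left[ \left. \dfrac{\partial^2 l}{\partial B_s \partial B_k} \right|_{B = \hat{B}} \right]^{-1} & \text{Newton-Raphson method} \\[2ex]
-\left[ \left. E\!\left( \dfrac{\partial^2 l}{\partial B_s \partial B_k} \right) \right|_{B = \hat{B}} \right]^{-1} & \text{Fisher scoring method}
\end{cases}$$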
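A small sketch of how a covariance and correlation matrix could be obtained from a matrix of second derivatives. It assumes the covariance is the inverse of the negated (observed or expected) second-derivative matrix and that correlations follow the usual rescaling by standard errors; the correlation formula is not shown in this part of the notes. The helper names and the 2 x 2 matrix are made up.

```python
import math

def invert(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    k = len(M)
    A = [list(map(float, row)) + [float(i == j) for j in range(k)]
         for i, row in enumerate(M)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))   # pivot row
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[k:] for row in A]

def cov_and_corr(hessian):
    """Covariance = inverse of minus the Hessian; correlation by rescaling."""
    cov = invert([[-h for h in row] for row in hessian])
    se = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    corr = [[cov[i][j] / (se[i] * se[j]) for j in range(len(cov))]
            for i in range(len(cov))]
    return cov, se, corr

# Made-up second-derivative matrix at the estimate (negative definite).
H = [[-4.0, 1.0], [1.0, -2.0]]
cov, se, corr = cov_and_corr(H)
```

The square roots of the diagonal of `cov` are the standard errors of the parameter estimates.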