





















An in-depth analysis of the statistics used in the analysis of contingency tables, including marginal and cell statistics, chi-square statistics, Goodman and Kruskal's tau, and Cohen's kappa. It covers formulas, standard errors, and degrees of freedom for each statistic.






















The notation and statistics refer to bivariate subtables defined by a row variable X and a column variable Y, unless specified otherwise. By default, CROSSTABS deletes cases with missing values on a table-by-table basis.
The following notation is used throughout this chapter unless otherwise stated:
$X_i$: distinct values of the row variable arranged in ascending order, $X_1 < X_2 < \cdots < X_R$.
Expected Count
$$E_{ij} = \frac{r_i c_j}{W}$$
Row Percent
$$\text{row percent} = 100 \times \left( f_{ij} / r_i \right)$$
Column Percent
$$\text{column percent} = 100 \times \left( f_{ij} / c_j \right)$$
Total Percent
$$\text{total percent} = 100 \times \left( f_{ij} / W \right)$$
Residual
$$R_{ij} = f_{ij} - E_{ij}$$
Standardized Residual
$$SR_{ij} = \frac{R_{ij}}{\sqrt{E_{ij}}}$$
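The cell statistics above can be sketched in a few lines of Python. This is a minimal illustration, not the CROSSTABS implementation: the function name `cell_statistics` and the list-of-rows table layout are assumptions, and the standardized residual is taken to be the residual divided by the square root of the expected count.

```python
import math

def cell_statistics(table):
    """Per-cell statistics for a table given as a list of rows of cell counts f_ij."""
    r = [sum(row) for row in table]            # row subtotals r_i
    c = [sum(col) for col in zip(*table)]      # column subtotals c_j
    W = sum(r)                                 # grand total
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, f in enumerate(row):
            expected = r[i] * c[j] / W
            out_row.append({
                "expected": expected,
                "row_pct": 100 * f / r[i],
                "col_pct": 100 * f / c[j],
                "total_pct": 100 * f / W,
                "residual": f - expected,
                # assumed definition: residual scaled by sqrt(expected count)
                "std_residual": (f - expected) / math.sqrt(expected),
            })
        result.append(out_row)
    return result

stats = cell_statistics([[10, 20], [30, 40]])
print(stats[0][0]["expected"], stats[0][0]["residual"])
```

The sketch assumes every row and column subtotal is positive; an empty row or column would cause a division by zero.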
Yates Continuity Corrected for 2 x 2 Tables
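$$\chi_c^2 = \begin{cases} \dfrac{W\left(\left|f_{11}f_{22} - f_{12}f_{21}\right| - 0.5W\right)^2}{r_1 r_2 c_1 c_2} & \text{if } \left|f_{11}f_{22} - f_{12}f_{21}\right| > 0.5W \\ 0 & \text{otherwise} \end{cases}$$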
The degrees of freedom are 1.
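As a rough illustration of the continuity-corrected statistic, the following sketch evaluates it for a 2 × 2 table. The function name `yates_chi_square` and its four-argument signature are assumptions made for the example.

```python
def yates_chi_square(f11, f12, f21, f22):
    """Yates continuity-corrected chi-square for a 2 x 2 table of counts."""
    r1, r2 = f11 + f12, f21 + f22      # row subtotals
    c1, c2 = f11 + f21, f12 + f22      # column subtotals
    W = r1 + r2                        # grand total
    d = abs(f11 * f22 - f12 * f21)
    if d <= 0.5 * W:
        return 0.0                     # the "otherwise" branch
    return W * (d - 0.5 * W) ** 2 / (r1 * r2 * c1 * c2)

print(yates_chi_square(10, 20, 30, 40))
```

The sketch assumes all four subtotals are positive.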
Mantel-Haenszel Test of Linear Association
$$\chi_{MH}^2 = (W - 1)\, r^2$$
where r is the Pearson correlation coefficient to be defined later. The degrees of freedom are 1.
Phi Coefficient
For a table not 2 × 2,
$$\phi = \sqrt{\frac{\chi_p^2}{W}}$$
For a 2 × 2 table only, ϕ is equal to the Pearson correlation coefficient so that the sign of ϕ matches that of the correlation coefficients.
Coefficient of Contingency
$$CC = \left( \frac{\chi_p^2}{\chi_p^2 + W} \right)^{1/2}$$
Cramér’s V
$$V = \left( \frac{\chi_p^2}{W (q - 1)} \right)^{1/2}$$
where $q = \min\{R, C\}$.
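The three chi-square-based measures can be computed together, as in the sketch below. It assumes that $\chi_p^2$ is the usual Pearson chi-square, the sum of squared residuals divided by expected counts, which is not spelled out in this part of the notes. The function names are illustrative only, and the phi returned here is the unsigned square-root form, so it does not carry the sign that the 2 × 2 case takes from the correlation coefficient.

```python
import math

def pearson_chi_square(table):
    """Pearson chi-square, assumed to be sum over cells of (f_ij - E_ij)^2 / E_ij."""
    r = [sum(row) for row in table]
    c = [sum(col) for col in zip(*table)]
    W = sum(r)
    return sum((f - r[i] * c[j] / W) ** 2 / (r[i] * c[j] / W)
               for i, row in enumerate(table) for j, f in enumerate(row))

def chi_square_measures(table):
    """Unsigned phi, coefficient of contingency, and Cramer's V."""
    W = sum(sum(row) for row in table)
    q = min(len(table), len(table[0]))
    chi2 = pearson_chi_square(table)
    return {
        "phi": math.sqrt(chi2 / W),
        "contingency": math.sqrt(chi2 / (chi2 + W)),
        "cramers_v": math.sqrt(chi2 / (W * (q - 1))),
    }

print(chi_square_measures([[10, 20], [30, 40]]))
```

For a 2 × 2 table $q = 2$, so Cramér's V and the unsigned phi coincide.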
Lambda
Let $f_{im}$ and $f_{mj}$ be the largest cell count in row $i$ and column $j$, respectively. Also, let $r_m$ be the largest row subtotal and $c_m$ the largest column subtotal. Define $\lambda_{Y|X}$ as the proportion of relative error in predicting an individual’s $Y$ category that can be eliminated by knowledge of the $X$ category. $\lambda_{Y|X}$ is computed as
$$\lambda_{Y|X} = \frac{\sum_{i=1}^{R} f_{im} - c_m}{W - c_m}$$
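A minimal sketch of the asymmetric lambda with $Y$ (columns) as the predicted variable follows. The function name `lambda_y_given_x` is an assumption for the example.

```python
def lambda_y_given_x(table):
    """Asymmetric lambda: error reduction in predicting the column category from the row."""
    c = [sum(col) for col in zip(*table)]        # column subtotals c_j
    W = sum(c)                                   # grand total
    c_m = max(c)                                 # largest column subtotal
    sum_f_im = sum(max(row) for row in table)    # largest cell count in each row, summed
    return (sum_f_im - c_m) / (W - c_m)

print(lambda_y_given_x([[30, 10], [5, 55]]))
```

The sketch does not guard against the degenerate case $W = c_m$, where all cases fall in one column and the ratio is undefined.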
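For the symmetric version of lambda,
$$\lambda = \frac{\sum_{i=1}^{R} f_{im} + \sum_{j=1}^{C} f_{mj} - c_m - r_m}{2W - r_m - c_m},$$
the standard errors are
$$ASE_0 = \frac{\sqrt{\sum_{i=1}^{R}\sum_{j=1}^{C} f_{ij}\left(\delta_{ij}^{r} + \delta_{ij}^{c} - \delta_{i}^{r} - \delta_{j}^{c}\right)^2 - \left(\sum_{i=1}^{R} f_{im} + \sum_{j=1}^{C} f_{mj} - c_m - r_m\right)^2 / W}}{2W - r_m - c_m}$$
$$ASE_1 = \frac{\sqrt{\sum_{i=1}^{R}\sum_{j=1}^{C} f_{ij}\left(\delta_{ij}^{r} + \delta_{ij}^{c} - \delta_{i}^{r} - \delta_{j}^{c} + \lambda\left(\delta_{i}^{r} + \delta_{j}^{c}\right)\right)^2 - 4W\lambda^2}}{2W - r_m - c_m}$$
where
$$\delta_{ij}^{r} = \begin{cases} 1 & \text{if } i \text{ is row index for } f_{mj} \\ 0 & \text{otherwise} \end{cases} \qquad \delta_{i}^{r} = \begin{cases} 1 & \text{if } i \text{ is index for } r_m \\ 0 & \text{otherwise} \end{cases}$$
and where
$$\delta_{ij}^{c} = \begin{cases} 1 & \text{if } j \text{ is column index for } f_{im} \\ 0 & \text{otherwise} \end{cases} \qquad \delta_{j}^{c} = \begin{cases} 1 & \text{if } j \text{ is index for } c_m \\ 0 & \text{otherwise} \end{cases}$$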
Goodman and Kruskal’s Tau (Goodman & Kruskal, 1954)
Similarly defined is Goodman and Kruskal’s tau ($\tau$):
$$\tau_{Y|X} = \frac{W \sum_{i,j} f_{ij}^2 / r_i - \sum_{j=1}^{C} c_j^2}{W^2 - \sum_{j=1}^{C} c_j^2},$$
with standard error
ASE f v r
f c c W r
f r ij f i
ij j j j
C
i
ij i
ij j
C
i j
(^1 ) 1
2
2 1
2 = 4 − %^1 −^1 '
& &
(
0
) )
%
'
& &
(
0
) )
7 8 u 9 u
@ Au = = Bu δ ∑ I^ δT ∑^ δ ∑ ,
in which
$$\delta = W^2 - \sum_{j=1}^{C} c_j^2 \quad \text{and} \quad v = W \sum_{i,j} \frac{f_{ij}^2}{r_i} - \sum_{j=1}^{C} c_j^2 .$$
$\tau_{X|Y}$ and its standard error can be obtained by interchanging the roles of $X$ and $Y$.
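The tau coefficient itself (not its standard error) can be sketched as the ratio of the quantities $v$ and $\delta$ defined above. The function name `goodman_kruskal_tau` and the `transpose` helper are assumptions for the example; interchanging the roles of $X$ and $Y$ is done by transposing the table.

```python
def goodman_kruskal_tau(table):
    """Goodman and Kruskal's tau with the column variable Y as dependent."""
    r = [sum(row) for row in table]              # row subtotals r_i
    c = [sum(col) for col in zip(*table)]        # column subtotals c_j
    W = sum(r)
    sum_c2 = sum(cj ** 2 for cj in c)
    v = W * sum(f * f / r[i] for i, row in enumerate(table) for f in row) - sum_c2
    delta = W * W - sum_c2
    return v / delta

def transpose(table):
    """Swap rows and columns, so that X becomes the dependent variable."""
    return [list(col) for col in zip(*table)]

tau_yx = goodman_kruskal_tau([[30, 10], [5, 55]])
tau_xy = goodman_kruskal_tau(transpose([[30, 10], [5, 55]]))
print(tau_yx, tau_xy)
```

The sketch assumes every row subtotal is positive.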
The significance level is based on the chi-square distribution, since
$$(W - 1)(C - 1)\, \tau_{Y|X} \sim \chi^2_{(R-1)(C-1)}$$
$$(W - 1)(R - 1)\, \tau_{X|Y} \sim \chi^2_{(R-1)(C-1)}$$
where
$$P = \sum_{i,j} f_{ij} \left[ \ln\left( \frac{c_j r_i}{W f_{ij}} \right) \right]^2 .$$
The formulas for $U_{X|Y}$ can be obtained by interchanging the roles of $X$ and $Y$. A symmetric version of the two asymmetric uncertainty coefficients is defined as follows:
$$U = \frac{2\left[ U(X) + U(Y) - U(XY) \right]}{U(X) + U(Y)}$$
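with asymptotic standard errors
$$ASE_1 = \frac{2}{W\left[U(X) + U(Y)\right]^2} \sqrt{\sum_{i,j} f_{ij} \left[ U(XY) \ln\left(\frac{r_i c_j}{W^2}\right) - \left[U(X) + U(Y)\right] \ln\left(\frac{f_{ij}}{W}\right) \right]^2 },$$
or
$$ASE_0 = \frac{2\sqrt{P - W\left[U(X) + U(Y) - U(XY)\right]^2}}{W\left[U(X) + U(Y)\right]}$$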
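The symmetric uncertainty coefficient can be illustrated as follows. The definitions of $U(X)$, $U(Y)$, and $U(XY)$ do not appear in this part of the notes, so the sketch assumes they are the entropies of the row, column, and cell proportions (natural logarithm, empty cells skipped). The function names are illustrative.

```python
import math

def entropy(counts, W):
    """Assumed definition: -sum (n/W) ln(n/W) over the non-zero counts."""
    return -sum((n / W) * math.log(n / W) for n in counts if n > 0)

def symmetric_uncertainty(table):
    """Symmetric uncertainty coefficient 2[U(X) + U(Y) - U(XY)] / [U(X) + U(Y)]."""
    r = [sum(row) for row in table]
    c = [sum(col) for col in zip(*table)]
    W = sum(r)
    u_x = entropy(r, W)                                      # U(X)
    u_y = entropy(c, W)                                      # U(Y)
    u_xy = entropy([f for row in table for f in row], W)     # U(XY)
    return 2 * (u_x + u_y - u_xy) / (u_x + u_y)

print(symmetric_uncertainty([[10, 20], [30, 40]]))
```

Under these assumed definitions the coefficient is 1 when each row determines the column exactly and 0 when the cell counts are exactly proportional to the products of the subtotals.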
Cohen’s kappa ($\kappa$), defined only for square tables ($R = C$), is computed as
$$\kappa = \frac{W \sum_{i=1}^{R} f_{ii} - \sum_{i=1}^{R} r_i c_i}{W^2 - \sum_{i=1}^{R} r_i c_i}$$
with variance
$$var_1 = W \left[ \frac{\sum_i f_{ii} \left( W - \sum_i f_{ii} \right)}{\left( W^2 - \sum_i r_i c_i \right)^2} + \frac{2 \left( W - \sum_i f_{ii} \right) \left( 2 \sum_i f_{ii} \sum_i r_i c_i - W \sum_i f_{ii} (r_i + c_i) \right)}{\left( W^2 - \sum_i r_i c_i \right)^3} + \frac{\left( W - \sum_i f_{ii} \right)^2 \left( W \sum_{i,j} f_{ij} (r_j + c_i)^2 - 4 \left( \sum_i r_i c_i \right)^2 \right)}{\left( W^2 - \sum_i r_i c_i \right)^4} \right]$$
$$var_0 = \frac{1}{W \left( W^2 - \sum_i r_i c_i \right)^2} \left[ W^2 \sum_i r_i c_i + \left( \sum_i r_i c_i \right)^2 - W \sum_i r_i c_i (r_i + c_i) \right]$$
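Kappa and its two variances can be computed directly from the cell counts. The sketch below follows the count-based forms of the standard large-sample formulas; the function name `kappa_with_variances` and the returned tuple layout are assumptions for the example.

```python
def kappa_with_variances(table):
    """Return (kappa, var_1, var_0) for a square table of counts."""
    R = len(table)
    r = [sum(row) for row in table]
    c = [sum(col) for col in zip(*table)]
    W = sum(r)
    diag = sum(table[i][i] for i in range(R))             # sum of f_ii
    rc = sum(r[i] * c[i] for i in range(R))               # sum of r_i c_i
    denom = W * W - rc
    kappa = (W * diag - rc) / denom
    t1 = diag * (W - diag) / denom ** 2
    t2 = (2 * (W - diag)
          * (2 * diag * rc - W * sum(table[i][i] * (r[i] + c[i]) for i in range(R)))
          / denom ** 3)
    t3 = ((W - diag) ** 2
          * (W * sum(table[i][j] * (r[j] + c[i]) ** 2
                     for i in range(R) for j in range(R)) - 4 * rc ** 2)
          / denom ** 4)
    var1 = W * (t1 + t2 + t3)
    var0 = ((W * W * rc + rc ** 2
             - W * sum(r[i] * c[i] * (r[i] + c[i]) for i in range(R)))
            / (W * denom ** 2))
    return kappa, var1, var0

print(kappa_with_variances([[20, 5], [10, 15]]))
```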
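The standard error of Kendall’s tau-b ($\tau_b$) is
$$ASE_1 = \frac{1}{D_r D_c} \sqrt{ \sum_{i,j} f_{ij} \left( 2 \sqrt{D_r D_c} \left( C_{ij} - D_{ij} \right) + \tau_b v_{ij} \right)^2 - W^3 \tau_b^2 \left( D_r + D_c \right)^2 },$$
where
$$v_{ij} = r_i D_c + c_j D_r$$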
Under the independence assumption, the standard error is
$$ASE_0 = 2 \sqrt{ \frac{ \sum_{i,j} f_{ij} \left( C_{ij} - D_{ij} \right)^2 - \frac{1}{W} (P - Q)^2 }{ D_r D_c } },$$
Kendall’s Tau- c
$$\tau_c = \frac{q (P - Q)}{W^2 (q - 1)}$$
with standard error
$$ASE_1 = \frac{2q}{(q - 1) W^2} \sqrt{ \sum_{i,j} f_{ij} \left( C_{ij} - D_{ij} \right)^2 - \frac{1}{W} (P - Q)^2 },$$
or, under the independence assumption,
$$ASE_0 = \frac{2q}{(q - 1) W^2} \sqrt{ \sum_{i,j} f_{ij} \left( C_{ij} - D_{ij} \right)^2 - \frac{1}{W} (P - Q)^2 },$$
where
$$q = \min\{R, C\}$$
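The sketch below computes tau-c and its standard error. The definitions of $C_{ij}$, $D_{ij}$, $P$, and $Q$ are not in this part of the notes, so the code assumes the usual ones: $C_{ij}$ sums the cells that are concordant with cell $(i, j)$ (both indices smaller or both larger), $D_{ij}$ sums the discordant cells, $P = \sum f_{ij} C_{ij}$ and $Q = \sum f_{ij} D_{ij}$, so each pair is counted twice. The names `pair_counts` and `kendall_tau_c` are illustrative.

```python
import math

def pair_counts(table):
    """Return a list of (i, j, f_ij, C_ij, D_ij) under the assumed definitions."""
    R, C = len(table), len(table[0])
    cells = []
    for i in range(R):
        for j in range(C):
            conc = sum(table[h][k] for h in range(R) for k in range(C)
                       if (h < i and k < j) or (h > i and k > j))
            disc = sum(table[h][k] for h in range(R) for k in range(C)
                       if (h < i and k > j) or (h > i and k < j))
            cells.append((i, j, table[i][j], conc, disc))
    return cells

def kendall_tau_c(table):
    """Return (tau_c, ASE) from the formulas above."""
    W = sum(sum(row) for row in table)
    q = min(len(table), len(table[0]))
    cells = pair_counts(table)
    P = sum(f * conc for _, _, f, conc, _ in cells)
    Q = sum(f * disc for _, _, f, _, disc in cells)
    tau_c = q * (P - Q) / (W * W * (q - 1))
    spread = sum(f * (conc - disc) ** 2 for _, _, f, conc, disc in cells) - (P - Q) ** 2 / W
    ase = 2 * q / ((q - 1) * W * W) * math.sqrt(spread)
    return tau_c, ase

print(kendall_tau_c([[10, 20], [30, 40]]))
```

The quadruple loop is quadratic in the number of cells, which is acceptable for small tables; cumulative sums would be a natural replacement for large ones.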
Gamma ($\gamma$) is estimated by
$$\gamma = \frac{P - Q}{P + Q}$$
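with standard error
$$ASE_1 = \frac{4}{(P + Q)^2} \sqrt{ \sum_{i,j} f_{ij} \left( Q C_{ij} - P D_{ij} \right)^2 },$$
or, under the hypothesis of independence,
$$ASE_0 = \frac{2}{P + Q} \sqrt{ \sum_{i,j} f_{ij} \left( C_{ij} - D_{ij} \right)^2 - \frac{1}{W} (P - Q)^2 },$$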
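A sketch of gamma with both standard errors follows. As an assumption, $C_{ij}$ and $D_{ij}$ are the sums of cells concordant and discordant with cell $(i, j)$, and $P$ and $Q$ are the $f_{ij}$-weighted sums of those quantities; the helper and function names are illustrative.

```python
import math

def pair_counts(table):
    """Return a list of (i, j, f_ij, C_ij, D_ij) under the assumed definitions."""
    R, C = len(table), len(table[0])
    cells = []
    for i in range(R):
        for j in range(C):
            conc = sum(table[h][k] for h in range(R) for k in range(C)
                       if (h < i and k < j) or (h > i and k > j))
            disc = sum(table[h][k] for h in range(R) for k in range(C)
                       if (h < i and k > j) or (h > i and k < j))
            cells.append((i, j, table[i][j], conc, disc))
    return cells

def gamma_with_ase(table):
    """Return (gamma, ASE_1, ASE_0)."""
    W = sum(sum(row) for row in table)
    cells = pair_counts(table)
    P = sum(f * conc for _, _, f, conc, _ in cells)
    Q = sum(f * disc for _, _, f, _, disc in cells)
    gamma = (P - Q) / (P + Q)
    ase1 = 4 / (P + Q) ** 2 * math.sqrt(
        sum(f * (Q * conc - P * disc) ** 2 for _, _, f, conc, disc in cells))
    ase0 = 2 / (P + Q) * math.sqrt(
        sum(f * (conc - disc) ** 2 for _, _, f, conc, disc in cells) - (P - Q) ** 2 / W)
    return gamma, ase1, ase0

print(gamma_with_ase([[10, 20], [30, 40]]))
```

For a 2 × 2 table gamma reduces to Yule's Q, which gives a quick sanity check on the pair counts. The ratio is undefined when $P + Q = 0$.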
Somers’ $d$ with row variable $X$ as the independent variable is calculated as
$$d_{Y|X} = \frac{P - Q}{D_r}$$
with standard error
$$ASE_1 = \frac{2}{D_r^2} \sqrt{ \sum_{i,j} f_{ij} \left\{ D_r \left( C_{ij} - D_{ij} \right) - (P - Q)(W - r_i) \right\}^2 },$$
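Somers' d and its standard error can be sketched in the same way. The code assumes $D_r = W^2 - \sum_i r_i^2$ and the usual concordant and discordant sums $C_{ij}$ and $D_{ij}$, none of which are defined in this part of the notes; the names are illustrative.

```python
import math

def pair_counts(table):
    """Return a list of (i, j, f_ij, C_ij, D_ij) under the assumed definitions."""
    R, C = len(table), len(table[0])
    cells = []
    for i in range(R):
        for j in range(C):
            conc = sum(table[h][k] for h in range(R) for k in range(C)
                       if (h < i and k < j) or (h > i and k > j))
            disc = sum(table[h][k] for h in range(R) for k in range(C)
                       if (h < i and k > j) or (h > i and k < j))
            cells.append((i, j, table[i][j], conc, disc))
    return cells

def somers_d_y_given_x(table):
    """Return (d_{Y|X}, ASE_1) with the row variable X as independent."""
    r = [sum(row) for row in table]
    W = sum(r)
    D_r = W * W - sum(ri ** 2 for ri in r)       # assumed definition of D_r
    cells = pair_counts(table)
    P = sum(f * conc for _, _, f, conc, _ in cells)
    Q = sum(f * disc for _, _, f, _, disc in cells)
    d = (P - Q) / D_r
    ase1 = 2 / D_r ** 2 * math.sqrt(
        sum(f * (D_r * (conc - disc) - (P - Q) * (W - r[i])) ** 2
            for i, _, f, conc, disc in cells))
    return d, ase1

print(somers_d_y_given_x([[10, 20], [30, 40]]))
```

For a 2 × 2 table, $d_{Y|X}$ equals the difference between the two row proportions in the first column, and the standard error matches the usual unpooled standard error of that difference.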
or, under the hypothesis of independence,
where
$$\mathrm{cov}(X, Y) = \sum_{i,j} X_i Y_j f_{ij} - \left( \sum_{i=1}^{R} X_i r_i \right) \left( \sum_{j=1}^{C} Y_j c_j \right) / W ,$$
$$S(X) = \sum_{i=1}^{R} X_i^2 r_i - \left( \sum_{i=1}^{R} X_i r_i \right)^2 / W$$
and
$$S(Y) = \sum_{j=1}^{C} Y_j^2 c_j - \left( \sum_{j=1}^{C} Y_j c_j \right)^2 / W$$
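The variance of $r$ is
$$var_1 = \frac{1}{T^4} \sum_{i,j} f_{ij} \left\{ T \left( X_i - \bar{X} \right) \left( Y_j - \bar{Y} \right) - \frac{S}{2T} \left[ \left( X_i - \bar{X} \right)^2 S(Y) + \left( Y_j - \bar{Y} \right)^2 S(X) \right] \right\}^2 ,$$
with $S = \mathrm{cov}(X, Y)$ and $T = \sqrt{S(X)\, S(Y)}$.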
If the null hypothesis is true,
$$var_0 = \frac{ \sum_{i,j} f_{ij} X_i^2 Y_j^2 - \left( \sum_{i,j} f_{ij} X_i Y_j \right)^2 / W }{ \left( \sum_i r_i X_i^2 \right) \left( \sum_j c_j Y_j^2 \right) } ,$$
where
$$\bar{X} = \sum_{i=1}^{R} X_i r_i / W$$
and
$$\bar{Y} = \sum_{j=1}^{C} Y_j c_j / W$$
Under the hypothesis that $\rho = 0$,
$$t = r \sqrt{\frac{W - 2}{1 - r^2}}$$
is distributed as a $t$ with $W - 2$ degrees of freedom.
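The pieces above combine into the following sketch. It assumes that $r$ is the ratio $\mathrm{cov}(X, Y) / \sqrt{S(X) S(Y)}$, which is the standard definition but is not written out in this part of the notes. The function name and the explicit score lists `x_scores` and `y_scores` are illustrative.

```python
import math

def pearson_r_and_t(table, x_scores, y_scores):
    """Return (r, t) for a table with numeric row scores X_i and column scores Y_j."""
    r_tot = [sum(row) for row in table]
    c_tot = [sum(col) for col in zip(*table)]
    W = sum(r_tot)
    sum_x = sum(x * n for x, n in zip(x_scores, r_tot))
    sum_y = sum(y * n for y, n in zip(y_scores, c_tot))
    cov = sum(x_scores[i] * y_scores[j] * f
              for i, row in enumerate(table) for j, f in enumerate(row)) - sum_x * sum_y / W
    s_x = sum(x * x * n for x, n in zip(x_scores, r_tot)) - sum_x ** 2 / W
    s_y = sum(y * y * n for y, n in zip(y_scores, c_tot)) - sum_y ** 2 / W
    r = cov / math.sqrt(s_x * s_y)               # assumed definition of r
    t = r * math.sqrt((W - 2) / (1 - r * r))     # t statistic with W - 2 degrees of freedom
    return r, t

print(pearson_r_and_t([[10, 20], [30, 40]], [1, 2], [1, 2]))
```

The t statistic is undefined when $|r| = 1$, which the sketch does not guard against.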
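Spearman’s rank correlation coefficient $r_s$ is computed by using rank scores $R_i$ for $X_i$ and $C_j$ for $Y_j$. These rank scores are defined as follows:
$$R_i = \sum_{k < i} r_k + \frac{r_i + 1}{2} \quad \text{for } i = 1, 2, \ldots, R$$
$$C_j = \sum_{h < j} c_h + \frac{c_j + 1}{2} \quad \text{for } j = 1, 2, \ldots, C$$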
The formulas for $r_s$ and its asymptotic variance can be obtained from the Pearson formulas by substituting $R_i$ and $C_j$ for $X_i$ and $Y_j$, respectively.
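The rank scores are the mid-ranks of the cases in each category, which the short sketch below makes explicit. The function name `rank_scores` is an assumption; it works for either row or column subtotals.

```python
def rank_scores(subtotals):
    """Mid-rank score for each category: cases below it plus (own count + 1) / 2."""
    scores = []
    below = 0                      # running total of cases in lower categories
    for n in subtotals:
        scores.append(below + (n + 1) / 2)
        below += n
    return scores

print(rank_scores([30, 70]))   # row subtotals 30 and 70 give [15.5, 65.5]
```

With subtotals 30 and 70, the first category holds ranks 1 to 30 (average 15.5) and the second holds ranks 31 to 100 (average 65.5). Feeding these scores into a Pearson computation in place of $X_i$ and $Y_j$ gives $r_s$.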
Asymmetric $\eta$ with the column variable $Y$ as dependent is
$$\eta_Y = \sqrt{1 - \frac{S_{YW}}{S(Y)}}$$
where
$$v = \left( \frac{f_{12}}{f_{11} \left( f_{11} + f_{12} \right)} + \frac{f_{22}}{f_{21} \left( f_{21} + f_{22} \right)} \right)^{1/2}$$
The relative risk for column 2 and the confidence interval are computed similarly.
This statistic is used to test if a square table is symmetric.
For a 2 × 2 table, the statistic reduces to the classical McNemar (1947) statistic, for which an exact p-value can be computed. The two-tailed probability level is
$$2 \sum_{i=0}^{\min(n_{12},\, n_{21})} \binom{n_{12} + n_{21}}{i} \left( \frac{1}{2} \right)^{n_{12} + n_{21}}$$
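The exact two-tailed level is twice the lower binomial tail with success probability 1/2, which can be sketched as follows. The function name `mcnemar_exact_p` is illustrative, and capping the result at 1 is a choice made here, not something stated in the notes; without the cap, the doubled tail exceeds 1 when the two off-diagonal counts are equal.

```python
from math import comb

def mcnemar_exact_p(n12, n21):
    """Two-tailed exact McNemar probability from the off-diagonal counts."""
    n = n12 + n21
    tail = sum(comb(n, i) for i in range(min(n12, n21) + 1)) / 2 ** n
    return min(1.0, 2 * tail)      # cap at 1 (an assumption of this sketch)

print(mcnemar_exact_p(1, 5))
```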
Cochran’s and Mantel-Haenszel statistics test the independence of two dichotomous variables, controlling for one or more other categorical variables. These “other” categorical variables define a number of strata, across which these statistics are computed. The Breslow-Day statistic is used to test homogeneity of the common odds ratio, which is a weaker condition than the conditional independence (i.e., homogeneity with a common odds ratio of 1) tested by Cochran’s and Mantel-Haenszel statistics. Tarone’s statistic is the Breslow-Day statistic adjusted for the use of a consistent but inefficient estimator, such as the Mantel-Haenszel estimator, of the common odds ratio.
The addition of strata requires the following modifications to the notation:
$K$: the number of strata.
$f_{ijk}$: sum of cell weights for cases in the $i$th row of the $j$th column of the $k$th stratum.
$c_{jk} = \sum_{i=1}^{R} f_{ijk}$: the $j$th column subtotal of the $k$th stratum.
$r_{ik} = \sum_{j=1}^{C} f_{ijk}$: the $i$th row subtotal of the $k$th stratum.
$n_k = \sum_{j=1}^{C} c_{jk} = \sum_{i=1}^{R} r_{ik}$: the grand total of the $k$th stratum.
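The stratified subtotals can be illustrated with a short sketch. It assumes the data are held as a list of $K$ tables, one per stratum, each a list of rows; the function name `stratum_subtotals` is illustrative.

```python
def stratum_subtotals(strata):
    """For each stratum k, return row subtotals r_ik, column subtotals c_jk, and total n_k."""
    result = []
    for table in strata:
        r_k = [sum(row) for row in table]            # r_ik for i = 1..R
        c_k = [sum(col) for col in zip(*table)]      # c_jk for j = 1..C
        result.append({"r": r_k, "c": c_k, "n": sum(r_k)})
    return result

strata = [[[10, 20], [30, 40]], [[1, 2], [3, 4]]]
print(stratum_subtotals(strata))
```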