Statistical Analysis of Contingency Tables: Crosstabs and Associated Statistics, Study notes of Mathematical Statistics

Alliance University Mathematical Statistics

An in-depth analysis of various statistics used in the statistical analysis of contingency tables, including marginal and cell statistics, chi-square statistics, goodman and kruskal's tau, and cohen's kappa. It covers formulas, standard errors, and degrees of freedom for each statistic.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar 🇮🇳


CROSSTABS

The notation and statistics refer to bivariate subtables defined by a row variable X and a column variable Y, unless specified otherwise. By default, CROSSTABS deletes cases with missing values on a table-by-table basis.

Notation

The following notation is used throughout this chapter unless otherwise stated:

$X_i$ Distinct values of row variable arranged in ascending order: $X_1 < X_2 < \cdots < X_R$

$Y_j$ Distinct values of column variable arranged in ascending order: $Y_1 < Y_2 < \cdots < Y_C$

$f_{ij}$ Sum of cell weights for cases in cell $(i, j)$

$c_j = \sum_{i=1}^{R} f_{ij}$, the jth column subtotal

$r_i = \sum_{j=1}^{C} f_{ij}$, the ith row subtotal

$W = \sum_{j=1}^{C} c_j = \sum_{i=1}^{R} r_i$, the grand total

Marginal and Cell Statistics

Count

count = $f_{ij}$


Expected Count

$$E_{ij} = \frac{r_i c_j}{W}$$

Row Percent

row percent = $100 \times (f_{ij} / r_i)$

Column Percent

column percent = $100 \times (f_{ij} / c_j)$

Total Percent

total percent = $100 \times (f_{ij} / W)$

Residual

$$R_{ij} = f_{ij} - E_{ij}$$

Standardized Residual

$$SR_{ij} = \frac{R_{ij}}{\sqrt{E_{ij}}}$$
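As a rough illustration of the marginal and cell statistics above, the following Python sketch computes them for a small table of cell weights. The function name `cell_statistics` and the example weights are hypothetical and not part of CROSSTABS; the sketch assumes every row and column subtotal is positive.

```python
from math import sqrt

def cell_statistics(table):
    """Per-cell statistics for a table of cell weights f[i][j]."""
    row_totals = [sum(row) for row in table]         # r_i
    col_totals = [sum(col) for col in zip(*table)]   # c_j
    grand_total = sum(row_totals)                    # W
    stats = []
    for i, row in enumerate(table):
        stats_row = []
        for j, f_ij in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total  # E_ij
            residual = f_ij - expected                              # R_ij
            stats_row.append({
                "count": f_ij,
                "expected": expected,
                "row_pct": 100 * f_ij / row_totals[i],
                "col_pct": 100 * f_ij / col_totals[j],
                "total_pct": 100 * f_ij / grand_total,
                "residual": residual,
                "std_residual": residual / sqrt(expected),          # SR_ij
            })
        stats.append(stats_row)
    return stats

example = [[10, 20], [30, 40]]  # hypothetical 2 x 2 table of weights
cell = cell_statistics(example)[0][0]
```

Here `cell` holds the statistics for the first row and first column of the example table.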

Yates Continuity Corrected for 2 x 2 Tables

$$\chi_c^2 = \begin{cases} \dfrac{W\left(|f_{11}f_{22} - f_{12}f_{21}| - 0.5W\right)^2}{r_1 r_2 c_1 c_2} & \text{if } |f_{11}f_{22} - f_{12}f_{21}| > 0.5W \\ 0 & \text{otherwise} \end{cases}$$

The degrees of freedom are 1.
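A minimal Python sketch of the continuity-corrected statistic for a 2 × 2 table follows. The function name and the example weights are hypothetical; the sketch assumes all four marginal subtotals are positive.

```python
def yates_chi_square(f11, f12, f21, f22):
    """Yates continuity-corrected chi-square for a 2 x 2 table."""
    r1, r2 = f11 + f12, f21 + f22   # row subtotals
    c1, c2 = f11 + f21, f12 + f22   # column subtotals
    w = r1 + r2                     # grand total W
    diff = abs(f11 * f22 - f12 * f21)
    if diff <= 0.5 * w:
        return 0.0                  # the "otherwise" branch
    return w * (diff - 0.5 * w) ** 2 / (r1 * r2 * c1 * c2)

chi2_c = yates_chi_square(10, 20, 30, 40)
```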

Mantel-Haenszel Test of Linear Association

$$\chi_{MH}^2 = (W - 1)\, r^2$$

where r is the Pearson correlation coefficient to be defined later. The degrees of freedom are 1.

Other Measures of Association

Phi Coefficient

For a table not 2 × 2

$$\phi = \sqrt{\frac{\chi_p^2}{W}}$$

where $\chi_p^2$ denotes the Pearson chi-square statistic.

For a 2 × 2 table only, ϕ is equal to the Pearson correlation coefficient so that the sign of ϕ matches that of the correlation coefficients.

Coefficient of Contingency

$$CC = \left(\frac{\chi_p^2}{\chi_p^2 + W}\right)^{1/2}$$

Cramér’s V

$$V = \left(\frac{\chi_p^2}{W(q - 1)}\right)^{1/2}$$

where $q = \min\{R, C\}$.
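The three chi-square-based measures above can be sketched in Python as follows. The Pearson chi-square is computed here with the standard definition, the sum of $(f_{ij} - E_{ij})^2 / E_{ij}$ over all cells, which is an assumption because that formula does not appear in this excerpt. The function names and the 2 × 3 example table are hypothetical.

```python
from math import sqrt

def pearson_chi_square(table):
    """Standard Pearson chi-square (assumed definition, see lead-in)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    w = sum(rows)
    total = 0.0
    for i, row in enumerate(table):
        for j, f_ij in enumerate(row):
            expected = rows[i] * cols[j] / w
            total += (f_ij - expected) ** 2 / expected
    return total

def nominal_measures(table):
    """Unsigned phi, contingency coefficient, and Cramer's V."""
    chi2 = pearson_chi_square(table)
    w = sum(sum(r) for r in table)
    q = min(len(table), len(table[0]))   # q = min{R, C}
    return {
        "chi2": chi2,
        "phi": sqrt(chi2 / w),
        "cc": sqrt(chi2 / (chi2 + w)),
        "v": sqrt(chi2 / (w * (q - 1))),
    }

measures = nominal_measures([[10, 20, 30], [20, 20, 20]])
```

With two rows, q − 1 equals 1, so V coincides with phi for this example.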

Measures of Proportional Reduction in Predictive Error

Lambda

Let $f_{im}$ and $f_{mj}$ be the largest cell count in row i and column j, respectively. Also, let $r_m$ be the largest row subtotal and $c_m$ the largest column subtotal. Define $\lambda_{Y|X}$ as the proportion of relative error in predicting an individual’s Y category that can be eliminated by knowledge of the X category. $\lambda_{Y|X}$ is computed as

$$\lambda_{Y|X} = \frac{\sum_{i=1}^{R} f_{im} - c_m}{W - c_m}$$
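The asymmetric lambda can be sketched in a few lines of Python. The function name and example table are hypothetical; the sketch assumes the largest column subtotal is smaller than the grand total, so the denominator is nonzero.

```python
def lambda_y_given_x(table):
    """Lambda for predicting the column variable Y from the row variable X."""
    col_totals = [sum(col) for col in zip(*table)]
    w = sum(col_totals)                           # grand total W
    c_m = max(col_totals)                         # largest column subtotal
    sum_row_max = sum(max(row) for row in table)  # sum over rows of f_im
    return (sum_row_max - c_m) / (w - c_m)

lam = lambda_y_given_x([[30, 10], [10, 50]])
```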

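For the symmetric version of lambda,

$$\lambda = \frac{\sum_{i=1}^{R} f_{im} + \sum_{j=1}^{C} f_{mj} - c_m - r_m}{2W - r_m - c_m},$$

the standard errors are

$$ASE_0 = \frac{\sqrt{\sum_{i,j} f_{ij}\left(\delta_{ij}^r + \delta_{ij}^c - \delta_i^r - \delta_j^c\right)^2 - \left(\sum_{i=1}^{R} f_{im} + \sum_{j=1}^{C} f_{mj} - c_m - r_m\right)^2 / W}}{2W - r_m - c_m}$$

$$ASE_1 = \frac{\sqrt{\sum_{i,j} f_{ij}\left(\delta_{ij}^r + \delta_{ij}^c - \delta_i^r - \delta_j^c + \lambda\left(\delta_i^r + \delta_j^c\right)\right)^2 - 4W\lambda^2}}{2W - r_m - c_m}$$

where

$$\delta_{ij}^r = \begin{cases} 1 & \text{if } i \text{ is row index for } f_{mj} \\ 0 & \text{otherwise} \end{cases} \qquad \delta_i^r = \begin{cases} 1 & \text{if } i \text{ is index for } r_m \\ 0 & \text{otherwise} \end{cases}$$

and where

$$\delta_{ij}^c = \begin{cases} 1 & \text{if } j \text{ is column index for } f_{im} \\ 0 & \text{otherwise} \end{cases} \qquad \delta_j^c = \begin{cases} 1 & \text{if } j \text{ is index for } c_m \\ 0 & \text{otherwise} \end{cases}$$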

Goodman and Kruskal’s Tau (Goodman & Kruskal, 1954)

Similarly defined is Goodman and Kruskal’s tau ($\tau$):

$$\tau_{Y|X} = \frac{W \sum_{i,j} f_{ij}^2 / r_i - \sum_{j=1}^{C} c_j^2}{W^2 - \sum_{j=1}^{C} c_j^2},$$

with standard error

ASE f v r

f c c W r

f r ij f i

ij j j j

ij i

ij j

i j

(^1 ) 1

2 1

2 = 4 − %^1 −^1 '

& &

(

) )

& &

(

) )

7 8 u 9 u

@ Au = = Bu δ ∑ I^ δT ∑^ δ ∑ ,

in which

$$\delta = W^2 - \sum_{j=1}^{C} c_j^2 \quad \text{and} \quad v = W \sum_{i,j} f_{ij}^2 / r_i - \sum_{j=1}^{C} c_j^2 .$$

$\tau_{X|Y}$ and its standard error can be obtained by interchanging the roles of X and Y.
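A short Python sketch of the point estimate $\tau_{Y|X}$, written as the ratio $v / \delta$ of the two quantities defined above. The function name and example table are hypothetical, and the sketch assumes every row subtotal is positive.

```python
def goodman_kruskal_tau_y_given_x(table):
    """Goodman and Kruskal's tau with the column variable Y as dependent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    w = sum(row_totals)
    sum_c2 = sum(c * c for c in col_totals)
    # v = W * sum_ij f_ij^2 / r_i - sum_j c_j^2
    v = w * sum(f * f / row_totals[i]
                for i, row in enumerate(table) for f in row) - sum_c2
    delta = w * w - sum_c2          # delta = W^2 - sum_j c_j^2
    return v / delta

tau = goodman_kruskal_tau_y_given_x([[30, 10], [10, 50]])
```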

The significance level is based on the chi-square distribution, since

$$(W - 1)(C - 1)\,\tau_{Y|X} \sim \chi^2_{(R-1)(C-1)}$$

$$(W - 1)(R - 1)\,\tau_{X|Y} \sim \chi^2_{(R-1)(C-1)}$$

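Uncertainty Coefficient

The uncertainty coefficient $U_{Y|X} = [U(X) + U(Y) - U(XY)] / U(Y)$ is built from the entropies

$$U(X) = -\sum_{i} \frac{r_i}{W} \ln\frac{r_i}{W}, \qquad U(Y) = -\sum_{j} \frac{c_j}{W} \ln\frac{c_j}{W}, \qquad U(XY) = -\sum_{i,j} \frac{f_{ij}}{W} \ln\frac{f_{ij}}{W} \quad (f_{ij} > 0),$$

and its standard error under independence uses

$$P = \sum_{i,j} f_{ij} \left( \ln\frac{c_j r_i}{W f_{ij}} \right)^2 .$$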

The formulas for $U_{X|Y}$ can be obtained by interchanging the roles of X and Y. A symmetric version of the two asymmetric uncertainty coefficients is defined as follows:

$$U = 2\left[\frac{U(X) + U(Y) - U(XY)}{U(X) + U(Y)}\right]$$

with asymptotic standard errors

$$ASE_1 = \frac{2}{W\left(U(X) + U(Y)\right)^2} \sqrt{\sum_{i,j} f_{ij}\left\{ U(XY) \ln\left(\frac{r_i c_j}{W^2}\right) - \left(U(X) + U(Y)\right) \ln\left(\frac{f_{ij}}{W}\right) \right\}^2},$$

$$ASE_0 = \frac{2}{W\left(U(X) + U(Y)\right)} \sqrt{P - W\left(U(X) + U(Y) - U(XY)\right)^2}$$
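The symmetric uncertainty coefficient can be sketched in Python as below. The sketch assumes that U(X), U(Y), and U(XY) are the usual entropies of the row, column, and cell proportions, with zero-weight cells skipped; the function names and example tables are hypothetical.

```python
from math import log

def entropy(weights, total):
    """Entropy of the proportions weights/total, skipping zero weights."""
    return -sum(w / total * log(w / total) for w in weights if w > 0)

def symmetric_uncertainty(table):
    """Symmetric uncertainty coefficient U for a table of cell weights."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    w = sum(rows)
    u_x = entropy(rows, w)                                  # U(X)
    u_y = entropy(cols, w)                                  # U(Y)
    u_xy = entropy([f for row in table for f in row], w)    # U(XY)
    return 2 * (u_x + u_y - u_xy) / (u_x + u_y)

u_independent = symmetric_uncertainty([[10, 20], [20, 40]])
u_perfect = symmetric_uncertainty([[50, 0], [0, 50]])
```

The first example table has proportional rows, so U is zero up to rounding; the second is perfectly associated, so U is one.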

Cohen’s Kappa

Cohen’s kappa ($\kappa$), defined only for square tables ($R = C$), is computed as

$$\kappa = \frac{W \sum_{i=1}^{R} f_{ii} - \sum_{i=1}^{R} r_i c_i}{W^2 - \sum_{i=1}^{R} r_i c_i}$$

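with variance

$$\text{var}_1 = W \left\{ \frac{\sum_i f_{ii}\left(W - \sum_i f_{ii}\right)}{\left(W^2 - \sum_i r_i c_i\right)^2} + \frac{2\left(W - \sum_i f_{ii}\right)\left(2 \sum_i f_{ii} \sum_i r_i c_i - W \sum_i f_{ii}(r_i + c_i)\right)}{\left(W^2 - \sum_i r_i c_i\right)^3} + \frac{\left(W - \sum_i f_{ii}\right)^2 \left(W \sum_{i,j} f_{ij}(r_j + c_i)^2 - 4\left(\sum_i r_i c_i\right)^2\right)}{\left(W^2 - \sum_i r_i c_i\right)^4} \right\}$$

Under the null hypothesis of independence, the variance is

$$\text{var}_0 = \frac{1}{W\left(W^2 - \sum_i r_i c_i\right)^2} \left\{ W^2 \sum_i r_i c_i + \left(\sum_i r_i c_i\right)^2 - W \sum_i r_i c_i (r_i + c_i) \right\}$$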
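The kappa point estimate defined above can be sketched in Python as follows. The function name and the example agreement table are hypothetical; the variance terms are not implemented here.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table of cell weights."""
    n = len(table)
    if any(len(row) != n for row in table):
        raise ValueError("kappa is defined only for square tables (R = C)")
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    w = sum(rows)
    diagonal = sum(table[i][i] for i in range(n))        # sum_i f_ii
    chance = sum(rows[i] * cols[i] for i in range(n))    # sum_i r_i c_i
    return (w * diagonal - chance) / (w * w - chance)

kappa = cohens_kappa([[20, 5], [10, 15]])
```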

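Kendall’s Tau-b

Let $C_{ij}$ and $D_{ij}$ be the sums of the weights of the cells concordant and discordant with cell $(i, j)$, and let $P = \sum_{i,j} f_{ij} C_{ij}$, $Q = \sum_{i,j} f_{ij} D_{ij}$, $D_r = W^2 - \sum_i r_i^2$, and $D_c = W^2 - \sum_j c_j^2$. For $\tau_b = (P - Q) / \sqrt{D_r D_c}$, the standard error is

$$ASE_1 = \frac{1}{D_r D_c} \sqrt{\sum_{i,j} f_{ij}\left(2\sqrt{D_r D_c}\,(C_{ij} - D_{ij}) + \tau_b v_{ij}\right)^2 - W^3 \tau_b^2 (D_r + D_c)^2},$$

where

$$v_{ij} = r_i D_c + c_j D_r$$

Under the independence assumption, the standard error is

$$ASE_0 = 2\sqrt{\frac{\sum_{i,j} f_{ij}(C_{ij} - D_{ij})^2 - \frac{1}{W}(P - Q)^2}{D_r D_c}},$$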

Kendall’s Tau-c

$$\tau_c = \frac{q(P - Q)}{W^2 (q - 1)}$$

with standard error

$$ASE_1 = \frac{2q}{(q - 1) W^2} \sqrt{\sum_{i,j} f_{ij}(C_{ij} - D_{ij})^2 - \frac{1}{W}(P - Q)^2},$$

or, under the independence assumption,

$$ASE_0 = \frac{2q}{(q - 1) W^2} \sqrt{\sum_{i,j} f_{ij}(C_{ij} - D_{ij})^2 - \frac{1}{W}(P - Q)^2},$$

where

$$q = \min\{R, C\}$$

Gamma

Gamma ($\gamma$) is estimated by

$$\gamma = \frac{P - Q}{P + Q}$$

with standard error

$$ASE_1 = \frac{4}{(P + Q)^2} \sqrt{\sum_{i,j} f_{ij}\left(Q C_{ij} - P D_{ij}\right)^2},$$

or, under the hypothesis of independence,

$$ASE_0 = \frac{2}{P + Q} \sqrt{\sum_{i,j} f_{ij}(C_{ij} - D_{ij})^2 - \frac{1}{W}(P - Q)^2},$$
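The following Python sketch computes P, Q, and the gamma estimate. It assumes the usual definitions, which this excerpt does not spell out: $C_{ij}$ sums the weights of cells lying above-left or below-right of cell (i, j), $D_{ij}$ sums those above-right or below-left, and P and Q are the $f_{ij}$-weighted sums of $C_{ij}$ and $D_{ij}$ (so each pair is counted twice). The function names and example table are hypothetical.

```python
def concordance_counts(table):
    """Return (P, Q) under the assumed definitions of C_ij and D_ij."""
    n_rows, n_cols = len(table), len(table[0])
    p = q = 0
    for i in range(n_rows):
        for j in range(n_cols):
            c_ij = d_ij = 0
            for h in range(n_rows):
                for k in range(n_cols):
                    if (h < i and k < j) or (h > i and k > j):
                        c_ij += table[h][k]      # concordant with (i, j)
                    elif (h < i and k > j) or (h > i and k < j):
                        d_ij += table[h][k]      # discordant with (i, j)
            p += table[i][j] * c_ij
            q += table[i][j] * d_ij
    return p, q

def gamma(table):
    """Gamma estimate (P - Q) / (P + Q)."""
    p, q = concordance_counts(table)
    return (p - q) / (p + q)

pq = concordance_counts([[30, 10], [10, 50]])
g = gamma([[30, 10], [10, 50]])
```

The quadruple loop is quadratic in the number of cells, which is fine for small tables; cumulative sums would be the natural choice for large ones.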

Somers’ d

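Somers’ d with row variable X as the independent variable is calculated as

$$d_{Y|X} = \frac{P - Q}{D_r}$$

with standard error

$$ASE_1 = \frac{2}{D_r^2} \sqrt{\sum_{i,j} f_{ij}\left\{ D_r (C_{ij} - D_{ij}) - (P - Q)(W - r_i) \right\}^2},$$

or, under the hypothesis of independence,

$$ASE_0 = \frac{2}{D_r} \sqrt{\sum_{i,j} f_{ij}(C_{ij} - D_{ij})^2 - \frac{1}{W}(P - Q)^2}.$$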

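Pearson’s r

The Pearson product-moment correlation coefficient is

$$r = \frac{\operatorname{cov}(X, Y)}{\sqrt{S(X)\, S(Y)}}$$

where

$$\operatorname{cov}(X, Y) = \sum_{i,j} X_i Y_j f_{ij} - \left(\sum_{i=1}^{R} X_i r_i\right)\left(\sum_{j=1}^{C} Y_j c_j\right) \Big/ W,$$

$$S(X) = \sum_{i=1}^{R} X_i^2 r_i - \left(\sum_{i=1}^{R} X_i r_i\right)^2 \Big/ W,$$

and

$$S(Y) = \sum_{j=1}^{C} Y_j^2 c_j - \left(\sum_{j=1}^{C} Y_j c_j\right)^2 \Big/ W$$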

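The variance of r is

$$\text{var}_1 = \frac{1}{T^4} \sum_{i,j} f_{ij} \left\{ T (X_i - \bar{X})(Y_j - \bar{Y}) - \frac{\operatorname{cov}(X, Y)}{2T} \left[ (X_i - \bar{X})^2 S(Y) + (Y_j - \bar{Y})^2 S(X) \right] \right\}^2,$$

with $T = \sqrt{S(X)\, S(Y)}$.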

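If the null hypothesis is true,

$$\text{var}_0 = \frac{\sum_{i,j} f_{ij} X_i^2 Y_j^2 - \left(\sum_{i,j} f_{ij} X_i Y_j\right)^2 \Big/ W}{\left(\sum_{i} r_i X_i^2\right)\left(\sum_{j} c_j Y_j^2\right)},$$

where

$$\bar{X} = \sum_{i=1}^{R} X_i r_i \Big/ W$$

and

$$\bar{Y} = \sum_{j=1}^{C} Y_j c_j \Big/ W$$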

Under the hypothesis that ρ = 0 ,

$$t = r \sqrt{\frac{W - 2}{1 - r^2}}$$

is distributed as a t with W − 2 degrees of freedom.
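A small Python sketch of the correlation and its t statistic, using the standard product-moment form $r = \operatorname{cov}(X, Y) / \sqrt{S(X) S(Y)}$ with weighted sums over the table. The function names, the scores, and the example table are hypothetical; the sketch assumes $|r| < 1$ and that both variables take more than one distinct value.

```python
from math import sqrt

def pearson_r(table, x_scores, y_scores):
    """Pearson correlation from cell weights and row/column scores."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    w = sum(rows)
    sum_x = sum(x * r for x, r in zip(x_scores, rows))
    sum_y = sum(y * c for y, c in zip(y_scores, cols))
    cov = sum(x_scores[i] * y_scores[j] * f
              for i, row in enumerate(table)
              for j, f in enumerate(row)) - sum_x * sum_y / w
    s_x = sum(x * x * r for x, r in zip(x_scores, rows)) - sum_x ** 2 / w
    s_y = sum(y * y * c for y, c in zip(y_scores, cols)) - sum_y ** 2 / w
    return cov / sqrt(s_x * s_y)

def t_statistic(r, w):
    """t statistic with W - 2 degrees of freedom under rho = 0."""
    return r * sqrt((w - 2) / (1 - r * r))

r = pearson_r([[30, 10], [10, 50]], [1, 2], [1, 2])
t = t_statistic(r, 100)
```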

Spearman Correlation

The Spearman’s rank correlation coefficient $r_s$ is computed by using rank scores $R_i$ for $X_i$ and $C_j$ for $Y_j$. These rank scores are defined as follows:

$$R_i = \sum_{k<i} r_k + \frac{r_i + 1}{2} \quad \text{for } i = 1, 2, \ldots, R$$

$$C_j = \sum_{h<j} c_h + \frac{c_j + 1}{2} \quad \text{for } j = 1, 2, \ldots, C$$

The formulas for $r_s$ and its asymptotic variance can be obtained from the Pearson formulas by substituting $R_i$ and $C_j$ for $X_i$ and $Y_j$, respectively.
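The rank scores can be sketched in Python as the weight of all lower categories plus half of the current category's weight, plus one half, which is the midrank of that category. The function name and the example subtotals are hypothetical.

```python
def rank_scores(subtotals):
    """Midrank score for each category, given its marginal subtotals."""
    scores = []
    below = 0                        # total weight of lower categories
    for total in subtotals:
        scores.append(below + (total + 1) / 2)
        below += total
    return scores

row_ranks = rank_scores([40, 60])    # R_i from row subtotals r_i
```

Feeding these scores, in place of $X_i$ and $Y_j$, into a Pearson correlation routine would then give $r_s$.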

Eta

Asymmetric η with the column variable Y as dependent is

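$$\eta_Y = \sqrt{1 - \frac{S_{YW}}{S(Y)}}$$

where

$$S_{YW} = \sum_{i=1}^{R} \sum_{j=1}^{C} Y_j^2 f_{ij} - \sum_{i=1}^{R} \frac{\left(\sum_{j=1}^{C} Y_j f_{ij}\right)^2}{r_i}.$$

In the confidence interval for the relative risk for column 1 of a 2 × 2 table,

$$v = \left( \frac{f_{12}}{f_{11}(f_{11} + f_{12})} + \frac{f_{22}}{f_{21}(f_{21} + f_{22})} \right)^{1/2}$$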

The relative risk for column 2 and the confidence interval are computed similarly.

McNemar-Bowker’s Test

This statistic is used to test if a square table is symmetric.

Notations

n Dimension of the table (both row and column)

$p_{ij}$ Unknown population cell probability of row i and column j

$n_{ij}$ Observed cell count of row i and column j

Algorithm

Given an n × n square table, the McNemar-Bowker statistic is used to test the hypothesis $H_0: p_{ij} = p_{ji}$ for all $i < j$.

A Special Case: 2x2 Tables

For a 2x2 table, the statistic reduces to the classical McNemar (1947) statistic, for which an exact p-value can be computed. The two-tailed probability level is

$$2 \sum_{i=0}^{\min(n_{12},\, n_{21})} \binom{n_{12} + n_{21}}{i} \left(\frac{1}{2}\right)^{n_{12} + n_{21}}$$
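The exact two-tailed probability level can be sketched directly from the binomial sum. The function name and the example counts are hypothetical. The sketch evaluates the expression literally, so the result can exceed 1 when the two off-diagonal counts are equal; whether and how to cap it is left open here.

```python
from math import comb

def mcnemar_exact_two_tailed(n12, n21):
    """Exact two-tailed probability level for the 2 x 2 McNemar test."""
    n = n12 + n21                    # number of discordant observations
    tail = sum(comb(n, i) for i in range(min(n12, n21) + 1))
    return 2 * tail * 0.5 ** n

p_value = mcnemar_exact_two_tailed(2, 8)
```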

Conditional Independence and Homogeneity

Cochran’s and Mantel-Haenszel statistics test the independence of two dichotomous variables, controlling for one or more other categorical variables. These “other” categorical variables define a number of strata, across which these statistics are computed. The Breslow-Day statistic is used to test homogeneity of the common odds ratio, which is a weaker condition than the conditional independence (i.e., homogeneity with the common odds ratio of 1) tested by Cochran’s and Mantel-Haenszel statistics. Tarone’s statistic is the Breslow-Day statistic adjusted for a consistent but inefficient estimator, such as the Mantel-Haenszel estimator of the common odds ratio.

Notation and Definitions

The addition of strata requires the following modifications to the notation:

$K$ The number of strata.

$f_{ijk}$ Sum of cell weights for cases in the ith row of the jth column of the kth strata.

$c_{jk} = \sum_{i=1}^{R} f_{ijk}$, the jth column of the kth strata subtotal.

$r_{ik} = \sum_{j=1}^{C} f_{ijk}$, the ith row of the kth strata subtotal.

$n_k = \sum_{j=1}^{C} c_{jk} = \sum_{i=1}^{R} r_{ik}$, the grand total of the kth strata.
