Custom Tables - Mathematics and Statistics - Study Notes, Study notes of Mathematical Statistics

Main points of this file are Custom Tables, Conditions and assumptions, Statistics, Small sample validity of the test, Test statistics for multiple response sets, Use of case weights, Hypothesis

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar

Custom Tables

This document describes the algorithms used in the Custom Tables procedure.

A note on weights and multiple response sets

Case weights are always based on Counts, not Responses, even when one of the variables is a multiple response variable.

Pearson's Chi-Square

Notation

The following notation is used for the computation of Pearson’s chi-square:

R Number of rows in the sub-table.

C Number of columns in the sub-table.

$f_{ij}$ Sum of case weights in cell (i,j).

$r_i$ Marginal case weights total in i-th row.

$c_j$ Marginal case weights total in j-th column.

W Marginal case weights total in the sub-table.

$E_{ij}$ Expected cell counts.

$\chi^2_p$ Pearson's chi-square statistic.

$p_{ij}$ Population proportion for cell (i,j).

$p_{i.}$ Marginal population proportion for i-th row.

$p_{.j}$ Marginal population proportion for j-th column.

df Degrees of freedom.

p p-value of the chi-square test.

$\alpha$ Significance level supplied by the user.

Conditions and assumptions

  • Tests will not be performed on Comperimeter tables.
  • Chi-square tests are performed on each innermost sub-table of each layer.
  • If a scale variable is in the layer, that layer will not be used in analysis.
  • The row variable and column variable must be two different categorical variables or multiple response sets.
  • The contingency table must have at least two non-empty rows and two non-empty columns.
  • Non-empty rows and columns do not include subtotals and totals.
  • Empty rows and columns are assumed to be structural zeros. Therefore, R and C are the numbers of non-empty rows and columns in the table.
  • If weighting is on, cell statistics must include weighted cell counts or weighted simple row/column percents; the analysis will be performed using these weighted cell statistics. If weighting is off, cell statistics must include cell counts or simple row/column percents; the analysis will be unweighted.
  • Tests are constructed by using all visible categories. Hiding of categories and showing of user-missing categories are respected.

Statistics

Hypothesis:

$H_0: p_{ij} = p_{i.}\,p_{.j}$ for $i = 1,\dots,R$ and $j = 1,\dots,C$, vs. not $H_0$.

Multiple response set in rows and categorical variable in columns

$$\chi^2_p = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(f_{ij} - E_{ij})^2}{E_{ij}\left(1 - \frac{r_i}{W}\right)}$$

Under the null hypothesis, the statistic has an approximate chi-square distribution with $df = R(C-1)$ degrees of freedom.

Multiple response sets in rows and columns

$$\chi^2_p = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(f_{ij} - E_{ij})^2}{E_{ij}\left(1 - \frac{r_i}{W}\right)\left(1 - \frac{c_j}{W}\right)}$$

Under the null hypothesis, the statistic has an approximate chi-square distribution with $df = RC$ degrees of freedom.

P-value

$$p = 1 - F(\chi^2_p; df),$$

where $F(x; df)$ is the cumulative distribution function of the chi-square distribution with $df$ degrees of freedom.

The chi-square test is significant if $p < \alpha$.
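The two statistics and the p-value above can be sketched in a few lines of Python. This is an illustration only, not the procedure's actual code: the function names are hypothetical, the expected count is assumed to be $E_{ij} = r_i c_j / W$ (its definition is not visible in this preview), and the marginal totals are passed in explicitly because, with multiple response sets, they are counts of cases rather than sums of the cells.

```python
import math

def chi_square_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= half_x / (a + n)
        total += term
        n += 1
    scale = math.exp(a * math.log(half_x) - half_x - math.lgamma(a))
    return min(1.0, total * scale)

def mr_chi_square(f, r, c, W, cols_multi):
    """Rows are a multiple response set; columns are categorical unless cols_multi."""
    R, C = len(r), len(c)
    stat = 0.0
    for i in range(R):
        for j in range(C):
            E = r[i] * c[j] / W              # assumed form of the expected count
            denom = E * (1 - r[i] / W)       # row correction for the response set
            if cols_multi:
                denom *= (1 - c[j] / W)      # extra correction when columns are a set too
            stat += (f[i][j] - E) ** 2 / denom
    df = R * C if cols_multi else R * (C - 1)
    return stat, df, 1 - chi_square_cdf(stat, df)

# Made-up data: 100 cases, two response items in rows, two column categories.
stat, df, p = mr_chi_square([[25, 25], [10, 20]], r=[50, 30], c=[40, 60],
                            W=100, cols_multi=False)
significant = p < 0.05
```

Here `significant` applies the rule $p < \alpha$ with an illustrative $\alpha = 0.05$.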

Use of case weights:

The case weights (or frequency weights) are supposed to be integers representing the number of replications of each case. In chi-square tests, we will only check if the aggregated cell counts $f_{ij}$ are integers. If not, they will be rounded to the nearest integer before computations.

Small sample validity of the test

Pearson's chi-square is a large-sample test; it may not be valid when the sample size is small. A rule of thumb is to check that more than 80% of the cells have expected cell counts larger than 5 and that all expected cell counts are larger than 1.
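The rule of thumb is easy to check once the expected counts are available. The sketch below assumes they are given as a list of rows; the function name is hypothetical.

```python
def small_sample_check(expected):
    """True if more than 80% of expected counts exceed 5 and all exceed 1."""
    cells = [e for row in expected for e in row]
    share_above_5 = sum(e > 5 for e in cells) / len(cells)
    return share_above_5 > 0.80 and min(cells) > 1

ok = small_sample_check([[20.0, 30.0], [12.0, 18.0]])
too_sparse = small_sample_check([[20.0, 30.0], [4.0, 0.5]])
```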

Test statistics for multiple response sets

The formulas above use a variation of the Pearson chi-square test statistic developed for the combination of a categorical variable and a multiple response set, as initially suggested by Agresti and Liu (1999). Formulas and properties of this test can be found in a comparative study by Bilder et al. (2000). An extension of this approach to the case where both variables are multiple response sets is given in the paper by Thomas and Decady (2004), which contains a study of the test properties as well as additional references.

References

Agresti, A. and Liu, I.-M. (1999), “Modeling responses to a categorical variable allowing arbitrarily many category choices”, Biometrics, 55, 936-943.

Bilder, C.R., Loughin, T.M. and Nettleton, D. (2000), “Multiple marginal independence testing for pick any/c variables”, Communications in Statistics: Simulation, 29, 1285-1316.

Thomas, D.R. and Decady, Y.J. (2004), “Testing for association using multiple response survey data: approximate procedures based on Rao-Scott Approach”, International Journal of Testing, 4, 43-59.

Column Proportions Test

Notation

The following notation is used for the computation of Column Proportions Tests:

R Number of rows in the sub-table.

C Number of columns in the sub-table.

A (^) i i-th category of the row variable.

off, cell statistics requested must include cell counts or simple column percents; an unweighted analysis will be performed.

  • A proportion will be discarded if the proportion is equal to zero or one, or the sum of case weights in a category is less than 2 (i.e. $c_j < 2$). If fewer than two proportions are left after discarding proportions, the test will not be performed.

Statistics

Table layout:

         B_1     B_2     ...   B_C

A_1      p_11    p_12    ...   p_1C

A_2      p_21    p_22    ...   p_2C

...      ...     ...     ...   ...

A_R      p_R1    p_R2    ...   p_RC

Hypothesis:

Without loss of generality, we will only look at the i-th row of the table. Let $C^*$ be the number of categories in the i-th row where the proportion is greater than zero and less than one, and where the sum of case weights in the corresponding column is at least 2. In the i-th row, $C^*(C^*-1)/2$ comparisons will be made among the retained proportions $p_{i1}, p_{i2}, \dots, p_{iC}$. The (j,k)th hypothesis will be

$H_{0jk}: p_{ij} = p_{ik}$ vs. $H_{1jk}: p_{ij} \neq p_{ik}$.
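As a sketch of how the comparisons in one row could be enumerated: drop the columns whose proportion is 0 or 1 or whose case weight total is below 2, then pair up the rest. The function name and the sample data are hypothetical.

```python
from itertools import combinations

def column_pairs_for_row(proportions, column_weights):
    """Return the (j, k) column index pairs compared in one row."""
    kept = [j for j, (p, c) in enumerate(zip(proportions, column_weights))
            if 0 < p < 1 and c >= 2]
    if len(kept) < 2:          # fewer than two proportions left: no test
        return []
    return list(combinations(kept, 2))

pairs = column_pairs_for_row([0.2, 0.0, 0.5, 1.0, 0.4], [10, 10, 1, 10, 10])
```

With these inputs only the first and last columns survive the filter, so a single comparison remains.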

Aggregated Statistics:

Column proportions tests are based on the aggregated proportions ($\hat{p}_{ij}$) and cell counts for each column ($c_j$). Column proportions are computed using the unrounded cell counts,

$$\hat{p}_{ij} = \frac{f_{ij}}{c_j},$$

which are equal to the proportions actually displayed in CTABLES.

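Statistics for the (j,k)th comparison in the i-th row:

Pooled proportion:

$$\hat{p}_{ijk} = \frac{\tilde{c}_j \hat{p}_{ij} + \tilde{c}_k \hat{p}_{ik}}{\tilde{c}_j + \tilde{c}_k},$$

where $\tilde{c}_j$ is the rounded sum of case weights in the j-th column.

z statistic with a categorical variable in the columns:

$$z = \frac{\hat{p}_{ij} - \hat{p}_{ik}}{\sqrt{\hat{p}_{ijk}\,(1 - \hat{p}_{ijk})\left(\frac{1}{\tilde{c}_j} + \frac{1}{\tilde{c}_k}\right)}}$$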

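When a multiple response set defines the columns, there may exist cases that belong to both the j-th and k-th columns. Let $\tilde{c}_{jk}$ be the rounded sum of weights for such cases.

z statistic with a multiple response set in the columns:

$$z = \frac{\hat{p}_{ij} - \hat{p}_{ik}}{\sqrt{\hat{p}_{ijk}\,(1 - \hat{p}_{ijk})\left(\frac{1}{\tilde{c}_j} + \frac{1}{\tilde{c}_k} - \frac{2\tilde{c}_{jk}}{\tilde{c}_j \tilde{c}_k}\right)}}$$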

p-value: $p = 2\,[1 - \Phi(|z|)]$,

where $\Phi(z)$ is the CDF of the standard normal distribution.
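A minimal sketch of the z test for one pair of columns follows. It assumes the usual pooled two-proportion form, with an overlap term subtracted when the columns come from a multiple response set; the function names are hypothetical, and `c_jk=0` reproduces the categorical case.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def column_proportions_z(p_ij, p_ik, c_j, c_k, c_jk=0):
    """z statistic and two-sided p-value for comparing columns j and k in row i."""
    pooled = (c_j * p_ij + c_k * p_ik) / (c_j + c_k)
    # The overlap term accounts for cases counted in both columns.
    spread = 1 / c_j + 1 / c_k - 2 * c_jk / (c_j * c_k)
    z = (p_ij - p_ik) / math.sqrt(pooled * (1 - pooled) * spread)
    return z, 2 * (1 - normal_cdf(abs(z)))

z_cat, p_cat = column_proportions_z(0.5, 0.3, 100, 100)
z_mr, p_mr = column_proportions_z(0.5, 0.3, 100, 100, c_jk=20)
```

Under this form, cases shared between the two columns shrink the variance term, so the same difference in proportions gives a larger |z| in the multiple response case.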

Alternatively, the statistic can be constructed as a chi-square statistic, $\chi^2 = z^2$.

k Number of categories with case weights greater than or equal to 2.

$\mu_i$ Population mean of the i-th category, i=1,...,k.

$x_{ij}$ j-th observation in i-th group.

$w_{ij}$ Case weight of the j-th observation in i-th group.

$w_i$ Sum of case weights in category i, i=1,...,k.

$\tilde{w}_i$ Rounded sum of case weights in category i, i=1,...,k.

$\bar{x}_i$ Mean of category i, i=1,...,k.

$s_i$ Standard deviation of category i, i=1,...,k.

$s_{ij}$ Pooled standard deviation from i-th and j-th group.

$s_w$ Pooled standard deviation of all categories.

W Total case weights. Sum of rounded $w_i$'s.

$p_B$ p-value adjusted by using Bonferroni method.

$\alpha$ Significance level supplied by the user.

Conditions and Assumptions

  • Tests will not be performed for Comperimeter tables.
  • Tests are performed on each innermost sub-table of each layer.
  • The row variable must be a scale variable, possibly nested under or over some categorical variables. The column variable must be categorical or a multiple response set.
  • If weighting is on, cell statistics must include weighted means; a weighted analysis will be performed using the weighted statistics. If weighting is off, cell statistics must include means; an unweighted analysis will be performed.
  • Tests are constructed by using all visible, non-empty categories excluding totals and sub-totals. Hiding of categories and showing of user-missing categories are respected.
  • Total case weights in each category must be at least two. Categories not satisfying this assumption are not used. If the number of categories satisfying this condition is less than two, no comparisons will be made.
  • Variances of all categories are assumed to be equal.
  • User and system missing values of scale variables are excluded.

Statistics

All Pairwise Comparisons

Hypotheses:

$H_{0ij}: \mu_i = \mu_j$ vs. $H_{1ij}: \mu_i \neq \mu_j$, for all $i > j$.

Total number of hypotheses: $k(k-1)/2$, where $k = \sum_i I(w_i \geq 2)$ counts the categories whose case weights total at least 2.

Aggregated statistics:

The statistics in pairwise comparisons are computed from aggregated category means ($\bar{x}_i$), sample variances ($s_i^2$) and sample sizes ($w_i$), $i = 1,\dots,k$. Various quantities used in the comparisons are shown below.

Total case weight (sample size):

$$W = \sum_{i=1}^{k} \mathrm{round}(w_i)\, I(w_i \geq 2)$$

Mean of i-th category:

$$\bar{x}_i = \frac{\sum_{j=1}^{n_i} w_{ij}\, x_{ij}}{w_i}$$
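The aggregation step can be sketched as follows, assuming each category is given as a list of (value, case weight) pairs. The function names are hypothetical, and rounding ties upward is an assumption: the text only says "nearest integer".

```python
import math

def round_half_up(w):
    """Round a non-negative weight to the nearest integer (ties go up; an assumption)."""
    return math.floor(w + 0.5)

def category_summary(groups):
    """Weighted means and rounded weights for categories with total weight >= 2."""
    means, rounded_weights = [], []
    for observations in groups:
        w_i = sum(w for _, w in observations)
        if w_i >= 2:                                   # categories below 2 are not used
            means.append(sum(w * x for x, w in observations) / w_i)
            rounded_weights.append(round_half_up(w_i))
    k = len(rounded_weights)                           # number of usable categories
    W = sum(rounded_weights)                           # total of the rounded weights
    return means, rounded_weights, k, W

groups = [
    [(1.0, 1), (3.0, 1)],        # total weight 2
    [(2.0, 1.5), (4.0, 1.5)],    # total weight 3
    [(5.0, 1)],                  # total weight 1, dropped
]
means, weights, k, W = category_summary(groups)
```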

T-statistic for comparing levels of a multiple response set

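$$t_{ij} = \frac{\bar{x}_i - \bar{x}_j}{s_{ij}\sqrt{\frac{1}{\tilde{w}_i} + \frac{1}{\tilde{w}_j} - \frac{2\tilde{w}_{ij}}{\tilde{w}_i \tilde{w}_j}}},$$

P-value: $p = 2\,[1 - F(|t_{ij}|;\ \tilde{w}_i + \tilde{w}_j - \tilde{w}_{ij} - 2)]$,

A comparison is significant if $p < \alpha$ (or $p_B < \alpha$, if the Bonferroni adjustment is used).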
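A sketch of the t statistic and its degrees of freedom for two levels of a multiple response set follows. The pooled standard deviation `s_ij` is taken as an input because its formula is not visible in this preview. The overlap weight `w_ij` is assumed to be the rounded weight of cases belonging to both levels, and the function name is hypothetical.

```python
import math

def overlap_t(mean_i, mean_j, s_ij, w_i, w_j, w_ij):
    """t statistic and degrees of freedom when levels i and j share cases."""
    spread = 1 / w_i + 1 / w_j - 2 * w_ij / (w_i * w_j)
    t = (mean_i - mean_j) / (s_ij * math.sqrt(spread))
    df = w_i + w_j - w_ij - 2
    return t, df

t_stat, df = overlap_t(10.0, 8.0, s_ij=2.0, w_i=20, w_j=20, w_ij=5)
```

The p-value then comes from the t distribution with `df` degrees of freedom, for example through a statistics library.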

Statistics for (i,j)th comparisons with variance pooled from all categories

Assume $w_i \geq 2$ and $w_j \geq 2$.

Within-groups variance pooled from all the categories:

$$s_w^2 = \frac{\sum_{i=1}^{k} I(w_i \geq 2)\,(\tilde{w}_i - 1)\, s_i^2}{W - k}$$

T-statistic for levels of a categorical variable:

$$t_{ij} = \frac{\bar{x}_i - \bar{x}_j}{s_w\sqrt{\frac{1}{\tilde{w}_i} + \frac{1}{\tilde{w}_j}}}$$

P-value: $p = 2\,[1 - F(|t_{ij}|;\ W - k)]$.

A comparison is significant if $p < \alpha$ (or $p_B < \alpha$, if the Bonferroni adjustment is used).

This test is available for categories defined by categorical variable only.
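The pooled-variance comparison can be sketched with the standard library alone. The inputs are assumed to be the means, standard deviations, and already rounded weights (each at least 2) of the k usable categories; the function names are hypothetical. The t CDF is approximated by Simpson integration purely to keep the example self-contained; a statistics library would normally be used instead.

```python
import math

def t_cdf(t, df, steps=2000):
    """Student t CDF by Simpson integration of the density from 0 to |t|."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    b = abs(t)
    h = b / steps                       # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(b)
    for n in range(1, steps):
        area += (4 if n % 2 else 2) * pdf(n * h)
    upper = min(1.0, 0.5 + area * h / 3)
    return upper if t >= 0 else 1.0 - upper

def pooled_t_tests(means, sds, weights):
    """Pooled variance plus (t, p) for every pair i > j of categories."""
    k, W = len(means), sum(weights)
    s2_w = sum((w - 1) * s * s for w, s in zip(weights, sds)) / (W - k)
    results = {}
    for i in range(k):
        for j in range(i):
            se = math.sqrt(s2_w * (1 / weights[i] + 1 / weights[j]))
            t = (means[i] - means[j]) / se
            results[(i, j)] = (t, 2 * (1 - t_cdf(abs(t), W - k)))
    return s2_w, results

s2_w, results = pooled_t_tests([10.0, 12.0, 15.0], [2.0, 2.0, 2.0], [10, 10, 10])
t_10, p_10 = results[(1, 0)]
```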

Bonferroni adjustment

If the Bonferroni adjustment for multiple comparisons is requested, the p-value p will be adjusted by

$$p_B = \min\left(\frac{p\,k(k-1)}{2},\ 1\right).$$
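As a small illustration, the adjustment multiplies each p-value by the number of pairwise comparisons, $k(k-1)/2$, and caps the result at 1. The function name is hypothetical.

```python
def bonferroni(p, k):
    """Bonferroni-adjusted p-value for k(k-1)/2 pairwise comparisons."""
    return min(p * k * (k - 1) / 2, 1.0)

adjusted = bonferroni(0.01, 4)   # 4 categories give 6 comparisons
capped = bonferroni(0.30, 4)     # the product exceeds 1, so it is capped
```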

Possible computational problems:

From the formulas, we can see that a comparison can be made as long as either $s_{ij}^2$ or $s_w^2$ is nonzero. If the variances for both compared categories are zero, the first test cannot be conducted. If the variances for all categories with cell count greater than or equal to two are zero, $s_w^2$ becomes zero and the second test cannot be conducted either.

Use of case weights:

The case weights (or frequency weights) are supposed to be integers representing the number of replications of each case. If the sums of case weights in each group ($w_i$, $i = 1,\dots,k$) are not integers, they will be rounded to the nearest integers before calculations. Consequently, the total weight W will become the sum of the rounded $w_i$'s.