Contingency Tables - Lecture Notes | STAT 102, Study notes of Statistics

Material Type: Notes; Class: INTRO BUSINESS STAT; Subject: Statistics; University: University of Pennsylvania; Term: Unknown 1998;

Typology: Study notes

Pre 2010

Uploaded on 03/28/2010

koofers-user-83o
koofers-user-83o 🇺🇸

10 documents

1 / 23

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
Lecture 23 Contingency Tables
STAT 102
Stat 102 has concentrated on analysis of explanatory and
predictive models with one dependent variable (Y) and one or
more predictive variables or factors (often denoted by X).
Data Structures examined include situations described as:
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17

Partial preview of the text

Download Contingency Tables - Lecture Notes | STAT 102 and more Study notes Statistics in PDF only on Docsity!

Lecture 23 Contingency Tables

STAT 102

 Stat 102 has concentrated on analysis of explanatory and predictive models with one dependent variable ( Y ) and one or more predictive variables or factors (often denoted by X ).  Data Structures examined include situations described as:

 Ordinary linear regression: Y continuous ; One continuous x -variable.  Multiple linear regression: Y continuous ; Several continuous x -variables  Polynomial regression: Like multiple regression, but with some polynomial terms  ANOVA: Y continuous ; One or more qualitative X-factors  Regression and ANOVA combined:  Logistic Regression Y qualitative ; One or more continuous X-factors (We also included a qualitative X-factor (sex) in one example)

We haven’t yet talked about ….

Example Wine and Music: Do shoppers buy more wine when appropriate music is played? Data: A supermarket in N. Ireland played no background music, French music, or Italian music at random times of the day. Here is the number of purchases for different combination of type of music and type of wine: 2-way table of results

Type of Wine By Type of Music Count (^) French Italian None Total French 39 30 30 99 Italian 1 19 11^31 Other 35 35 43 113 Total 75 84 84 243

Taken from Moore and McCabe, Introduction to the Practice of Statistics, 5th^ edition. Original source, Ryan, Northrup-Clewes, Knox and Thurnham (1998). The effect of in-store music on consumer choice of wine. Proc. Nutrition Soc. 57. p1069.

Test of Independence Null Hypothesis: In words -- H 0 : Row effects are independent of Column effects Symbolically – The (population) probabilities for the row effects (Wine) are

denoted by^1 2 3

def for ,

P BuyingFrenchWine Similar

The probabilities for the column effects (Music) are 1 2 3

, are similarly defined

 P FrenchMusic

H 0 is that the probability of a count in the i , j cell is the product, for example Pr (^)  Frenc h& None (^)   Pr count in cell 1 , (^3)     1 3  P 13. Ha is that H 0 is not true.

Computations (cont) We can put this together to get estimates for what the cell counts should be if H 0 is true. The expected (under H 0 ) cell counts would be Expected ijEijN (^)  P ˆ ij. Pearson’s Chi-Square statistic for testing H 0 is

2 ij ij ij (^) ij

N E

E

χ^2  .

Under H 0 it has a Chi-Square distribution with

df   I  1  J  1 

In our example, df  2  2  4. Your book has tables of the Chi-Square distribution, or use JMP. We REJECT when χ^2  C , with C as found in the table.

Computations (summary)

 Get the table. It has I rows and J columns and entries Nij.

 Calculate Eij , the Expected (under H 0 ) cell counts. Looking back

we see they have the formula j ij E Ni^ j N Ni N N

N

N

N

  

    

 

 ^.

 Calculate

2 2 Observed Expected Expected

ij ij ij ij ij (^) ij ij

N E

E

χ^2    .

 Use df   I  1  J  1 . Get the critical value, C , from the Chi-

Square table with this df. Reject whenever χ^2  C.

Analysis in JMP (cont) JMP also gives the value of (^) χ^2 , and the resulting P-value: Test ChiSquare Prob>ChiSq Likelihood Ratio 21.875 0. Pearson 18.279 0. If you calculated by hand you would get this from the table on p 9 as 18.279  2.3337  7.6724  .0004  .5209 ....

( You can ignore the Likelihood Ratio statistic. It belongs to a different statistical method. It comes from a different set of formulas than Pearson’s Chi-Square. The P-values from the two tests are nearly always very similar. )

The result of this calculation is that we Reject H 0. (We do so at any reasonable level of significance since P-value = .0011). This should be followed by a summary of the nature of the effects we have detected.

Nature of the Effects We have Rejected H 0. Therefore we believe choice of wine and nature of music are related to each other. What is the relation? There are formal ways to investigate this [see the Appendix to this lecture] , but for Stat 102 we recommend looking at the numbers and use common-sense. You can look at the cell counts (as on p6), and/or at the cell Chi- Square values (as on p9) and/or at row or column % (as below). Used carefully, any of these looks should yield similar conclusions. Here is the table of counts and column %:

“When French music is playing customers seem very much less likely to purchase Italian wine than they are to purchase Italian wine when Italian music is playing.”

Also (from the last two columns ) “There doesn’t seem to be much difference in distribution of wine choice between times when Italian music is playing and when no music is playing. (There’s a slightly greater % of people choosing Italian wine when Italian music is playing, but it’s not a large difference.)”

Second Example Income and Job Satisfaction About 50 years ago, a classic sociological study looked at the relationship between a worker’s salary and their job satisfaction. One two-way table from this study looked at the results from a survey of worker’s income and a questionnaire judging their job satisfaction. Annual job INCOME was divided into 4 categories <$6,000, $6,000 - $15,000, $15,000 - $25,000, >$25,000. Their SATISFACTION was graded into four categories 1 = very dissatisfied, 2 = somewhat dissatisfied, 3 = somewhat satisfied, 4 = very satisfied. [By accident, both examples in this lecture have square contingency tables. But that’s not necessary; rectangular tables are also common].

Data table and results of the analysis follow:

An Aside The Mosaic Plot in JMP provides a very nice visualization for this data: Mosaic Plot

Satisfaction

(1) 0-6 (2) 6- (3) 15-25(4) 25+ Income

2

3

4

You can clearly see how the % of very dissatisfied workers goes down as earnings go up, and the % of very satisfied workers increases with income.

Pearson Chi-Square Test of Independence

Test of H 0 : Satisfaction is independent of Income

df = 33 = 9.

Results from JMP:

Test ChiSquare Prob>ChiSq Likelihood Ratio 12.037 0. Pearson 11.989 0. The P-value is 0.2140. We cannot reject the null hypothesis.

[We have to regretfully conclude that numbers like those in the table might well have occurred by chance. BUT note that not only do we have numbers like these, but they occur in a very obvious pattern. Our test isn’t looking for any special pattern. Might there be a better test to use on this data? See the Appendix .]

GLM (log-link) Wine and Music Data

Whole Model Test Model - LogLikelihood L-R ChiSquare DF Prob>ChiSq Difference 38.99 77.98 8 <. Full 21. Reduced 60. Effect Tests Source DF L-R ChiSquare Prob>ChiSq Wine 2 68.33 <. Music 2 11.31 0. Music*Wine 4 21.88 0.

These tables show that various things we can see in the table are statistically significant. In particular, the interaction is significant.

It is this interaction term that describes the dependence of Wine and Music. The significance of this term is another confirmation of the result of our earlier Pearson Chi-Square test.

Parameter Estimates This analysis yields a parameter estimates table that one can use to estimate the values of relevant quantities.

For example one can estimate a quantity like E  Log (^)  N (^) 12 N  2 ,

and hence also E N  (^) 12 N  2 . This corresponds to the proportion of

Italian Music listeners who purchase French wine.

Parameter Estimates Term Estimate Std Error L-R ChiSquare Prob>ChiSq Intercept 2.96 0.13 51.04 <. Wine[French] 0.52 0.14 20.25 <. Wine[Italian] - 1.18 0.24 67.05 <. Music[French] - 0.56 0.24 10.64 0. Music[Italian] 0.34 0.14 7.05 0. Music[French]Wine[French] 0.73 0.25 15.21 <. Music[French]Wine[Italian] - 1.22 0.46 16.53 <. Music[Italian]Wine[French] - 0.42 0.17 7.82 0. Music[Italian]Wine[Italian] 0.83 0.26 15.21 <.