Chi-Square Test and Contingency Analysis: Understanding Independence and Goodness-of-Fit, Slides of Statistics

An in-depth explanation of the chi-square test and contingency analysis, statistical methods used to determine whether there is a significant relationship between two categorical variables or whether observed counts fit an expected distribution. It covers the chi-square distribution, critical values, contingency tables, expected frequencies, and the logic of the tests. It also includes examples of applying these tests to real-life scenarios, such as testing whether hand preference is independent of gender and checking whether technical support calls are uniformly distributed across the days of the week.

Typology: Slides

2012/2013

Uploaded on 01/31/2013

pakhi
Contingency Analysis
& Goodness-of-Fit Tests

The Chi-square Distribution

  • The chi-square distribution is a family of distributions, depending on degrees of freedom:

  • d.f. = n - 1

[Figure: chi-square density curves for d.f. = 1, d.f. = 5, and d.f. = 15; horizontal axis χ^2 from 0 to 28]

Finding the Critical Value

  • The critical value, χ^2_α, is found from the chi-square table

[Figure: chi-square density with upper-tail area α to the right of χ^2_α; do not reject H_0 to the left of χ^2_α, reject H_0 to the right]

Contingency Tables


  • Situations involving multiple population proportions

  • Used to classify sample observations according to two or more characteristics

  • Also called a crosstabulation table

Contingency Table Example

Sample results organized in a contingency table (sample size = n = 300):

Gender      Hand Preference
            Left      Right     Total
Female      12        108       120
Male        24        156       180
Total       36        264       300

Of the 120 females, 12 were left-handed. Of the 180 males, 24 were left-handed.

Logic of the Test

  • If H_0 is true, then the proportion of left-handed females should be the same as the proportion of left-handed males
  • The two proportions above should be the same as the proportion of left-handed people overall

H_0: Hand preference is independent of gender
H_A: Hand preference is not independent of gender
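The comparison that H_0 implies can be checked directly from the sample counts. The following is a minimal sketch in Python; the variable names are illustrative and not part of the slides.

```python
# Counts taken from the hand-preference contingency table.
left_handed = {"Female": 12, "Male": 24}
group_size = {"Female": 120, "Male": 180}

# Share of left-handed people within each gender.
proportion_left = {g: left_handed[g] / group_size[g] for g in group_size}

# Share of left-handed people in the whole sample (36 of 300).
overall_left = sum(left_handed.values()) / sum(group_size.values())

for gender, p in proportion_left.items():
    print(f"{gender}: {p:.4f}")
print(f"Overall: {overall_left:.4f}")
```

The sample proportions differ somewhat between the two groups; the chi-square test asks whether a difference of this size is larger than sampling variation alone would plausibly produce.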

Expected Cell Frequencies

  • Expected cell frequencies:

$$e_{ij} = \frac{(i^{\text{th}}\ \text{row total})(j^{\text{th}}\ \text{column total})}{\text{total sample size}}$$

Example:

$$e_{11} = \frac{(120)(36)}{300} = 14.4$$
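The same formula can be applied to every cell at once. This is a small illustrative sketch in plain Python (the names `observed`, `row_totals`, `col_totals`, and `expected` are assumptions chosen for readability), using the hand-preference counts from the example.

```python
# Observed counts: rows are Female, Male; columns are Left, Right.
observed = [
    [12, 108],
    [24, 156],
]

row_totals = [sum(row) for row in observed]          # [120, 180]
col_totals = [sum(col) for col in zip(*observed)]    # [36, 264]
n = sum(row_totals)                                  # 300

# e_ij = (row total i) * (column total j) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
```

Each row and column of `expected` sums to the same total as the corresponding row and column of `observed`, which is a quick sanity check on the calculation.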

Observed v. Expected Frequencies

Observed frequencies vs. expected frequencies:

Gender      Hand Preference
            Left                  Right                 Total
Female      Observed = 12         Observed = 108        120
            Expected = 14.4       Expected = 105.6
Male        Observed = 24         Observed = 156        180
            Expected = 21.6       Expected = 158.4
Total       36                    264                   300


$$\chi^2 = \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4}$$

$$\chi^2 = 0.4000 + 0.0545 + 0.2667 + 0.0364 = 0.7576$$
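The arithmetic of the test statistic is easy to verify in code. The sketch below assumes a hypothetical helper named `chi_square_statistic` (not from the slides) and sums (o - e)^2 / e over the four cells of the hand-preference table.

```python
def chi_square_statistic(observed, expected):
    """Sum (o - e)^2 / e over paired observed and expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Cells in the order: Female-Left, Female-Right, Male-Left, Male-Right.
observed = [12, 108, 24, 156]
expected = [14.4, 105.6, 21.6, 158.4]

chi_sq = chi_square_statistic(observed, expected)
print(f"chi-square = {chi_sq:.4f}")
```

Because every cell deviates from its expected count by the same amount (2.4), the contributions differ only through the size of the expected count in the denominator.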

Contingency Analysis

χ^2 = 0.7576 with d.f. = (r - 1)(c - 1) = (1)(1) = 1

α = 0.05, χ^2_0.05 = 3.841

Decision Rule: If χ^2 > 3.841, reject H_0; otherwise, do not reject H_0

[Figure: chi-square density with d.f. = 1; do not reject H_0 to the left of χ^2_0.05 = 3.841, reject H_0 in the upper tail of area α = 0.05]

Here, χ^2 = 0.7576 < 3.841, so we do not reject H_0. The sample gives no significant evidence that hand preference depends on gender.
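The decision rule can also be reproduced without a printed table. The following sketch uses only the Python standard library and relies on a property specific to d.f. = 1: a chi-square variable with one degree of freedom is the square of a standard normal variable. It is an illustration for this 2 x 2 case only, not a general-purpose routine.

```python
import math
from statistics import NormalDist

# Hand-preference example: cells in the order F-Left, F-Right, M-Left, M-Right.
observed = [12, 108, 24, 156]
expected = [14.4, 105.6, 21.6, 158.4]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

alpha = 0.05

# With d.f. = 1, chi-square is Z^2, so the upper-tail critical value is the
# square of the two-sided standard normal critical value.
critical_value = NormalDist().inv_cdf(1 - alpha / 2) ** 2

# P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)); valid for d.f. = 1 only.
p_value = math.erfc(math.sqrt(chi_sq / 2))

reject_h0 = chi_sq > critical_value
print(f"chi-square = {chi_sq:.4f}, critical value = {critical_value:.3f}")
print(f"p-value = {p_value:.3f}, reject H_0: {reject_h0}")
```

Comparing the statistic with the critical value and comparing the p-value with α are equivalent ways of stating the same decision.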

Chi-Square Goodness-of-Fit Test

  • Are technical support calls equal across all days of the week? (i.e., do calls follow a uniform distribution?)
  • Sample data for 10 days per day of week:

Day          Sum of calls for this day
Monday       290
Tuesday      250
Wednesday    238
Thursday     257
Friday       265
Saturday     230
Sunday       192
Σ =          1722

Logic of Goodness-of-Fit Test

  • If calls are uniformly distributed, the 1722 calls would be expected to be equally divided across the 7 days:

    1722 / 7 = 246 expected calls per day if uniform

  • Chi-Square Goodness-of-Fit Test: test to see if the sample results are consistent with the expected results

Chi-Square Test Statistic

  • The test statistic is

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} \qquad (\text{where d.f.} = k - 1)$$

where:
k = number of categories
o_i = observed cell frequency for category i
e_i = expected cell frequency for category i

H_0: The distribution of calls is uniform over days of the week
H_A: The distribution of calls is not uniform
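Applied to the technical support data, the statistic can be computed as in the sketch below. The dictionary name `calls` and the other variable names are illustrative; the counts are the ones given in the sample data.

```python
# Total calls observed for each day of the week (10 days sampled per weekday).
calls = {
    "Monday": 290,
    "Tuesday": 250,
    "Wednesday": 238,
    "Thursday": 257,
    "Friday": 265,
    "Saturday": 230,
    "Sunday": 192,
}

k = len(calls)                      # number of categories
total = sum(calls.values())         # 1722
expected = total / k                # 246 calls per day under H_0 (uniform)

# chi-square = sum over categories of (o_i - e_i)^2 / e_i
chi_sq = sum((o - expected) ** 2 / expected for o in calls.values())
df = k - 1

print(f"expected per day = {expected:.0f}")
print(f"chi-square = {chi_sq:.4f} with d.f. = {df}")
```

Under a uniform null hypothesis every category shares the same expected count, so the statistic reduces to the sum of squared deviations divided by that single value.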

The Rejection Region

  • Reject H_0 if

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} > \chi^2_{\alpha}$$

(with k - 1 degrees of freedom)

H_0: The distribution of calls is uniform over days of the week
H_A: The distribution of calls is not uniform

[Figure: chi-square density with upper-tail area α to the right of χ^2_α; do not reject H_0 to the left of χ^2_α, reject H_0 to the right]
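As an illustration of the rejection rule, the sketch below evaluates it for the technical support data. Two assumptions are not stated in this part of the slides: a significance level of α = 0.05, and the corresponding table value χ^2_0.05 = 12.592 for 6 degrees of freedom. The helper `chi2_upper_tail_even_df` is hypothetical; it uses the closed-form upper-tail probability that holds only when the degrees of freedom are even.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability P(chi-square > x) for an even number of d.f.

    Uses the identity P(X > x) = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even d.f.")
    half_x = x / 2
    terms = (half_x ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half_x) * sum(terms)


observed = [290, 250, 238, 257, 265, 230, 192]   # Monday through Sunday
expected = sum(observed) / len(observed)         # 246 per day under H_0
chi_sq = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1                           # k - 1 = 6

alpha = 0.05               # assumed significance level
critical_value = 12.592    # chi-square table value for alpha = 0.05, d.f. = 6

p_value = chi2_upper_tail_even_df(chi_sq, df)
reject_h0 = chi_sq > critical_value

print(f"chi-square = {chi_sq:.4f}, critical value = {critical_value}")
print(f"p-value = {p_value:.6f}, reject H_0: {reject_h0}")
```

Under these assumptions the statistic falls in the rejection region, which would indicate that the calls are not uniformly distributed over the days of the week.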