Chi-Squared Test for Goodness of Fit: A Comprehensive Guide with Examples, Lecture notes of Statistics

[Week 6] Test for Goodness of Fit (Chi-squared Test)

Typology: Lecture notes

2018/2019

Uploaded on 04/20/2019

kefart

Test for Goodness of Fit
(Chi-squared Test)
April 8, 2019

Example 1: Mendel’s Early Genetics Model

Mendel did much work in early genetics in the 19th Century, but it wasn’t appreciated until later. He conducted experiments on the distributions of traits in pea plants. In one experiment, he classified 556 peas according to shape (Round or Angular) and colour (Yellow or Green). He predicted that the 4 different ‘offspring’ (RY, RG, AY, AG) would occur in the ratio 9:3:3:1. He observed counts of 315, 108, 101 and 32.

Does Mendel’s theory fit the data?
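As a first step towards answering this, the expected counts implied by the 9:3:3:1 ratio can be worked out from the total of 556 peas. The following is a minimal sketch; the variable names (`o`, `p0`, `e`) are chosen here for illustration.

```r
# Observed counts for RY, RG, AY, AG, as given in the example
o <- c(RY = 315, RG = 108, AY = 101, AG = 32)
n <- sum(o)                 # 556 peas in total

# Hypothesised category probabilities from the 9:3:3:1 ratio
p0 <- c(9, 3, 3, 1) / 16

# Expected counts E_i = n * P(category i)
e <- n * p0
print(e)                    # 312.75 104.25 104.25 34.75
```

The observed counts sit close to these expected values, which is what the formal test later quantifies.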

Example 3: Testing Data fits a Binomial Model

The data give the following frequency table.

Category   0    1    2    3    Total
O_i        19   34   27   20   100

Test whether the data could be modelled by Bin(n, p) with n = 3, for some p.

Putting this example into context, imagine a sports journalist claims that Michael Jordan’s free throws follow a Binomial distribution with probability 80%.

Steps for the Chi-Squared Test

Context: Consider a set of categorical data with $g$ categories, into which fall observed counts $O_i$, for $i = 1, 2, \ldots, g$. A probability model is proposed for the categories, and we want to test whether it is adequate.

Preparation: Construct the following table:

Class             1     2     3     ...   g     Totals
Observed counts   O_1   O_2   O_3   ...   O_g   $\sum_{i=1}^{g} O_i = n$
Expected counts   E_1   E_2   E_3   ...   E_g   $\sum_{i=1}^{g} E_i = n$

Notes:
- The $O_i$ are given, and the $E_i$ need to be worked out from the hypothesised model $H_0$, so $E_i = n P(\text{category } i)$.
- Sometimes we need to estimate $k$ parameter(s) of the model first, before we can work out the $E_i$.

H: $H_0$: Model fits. vs $H_1$: Model doesn’t fit.

A: Cochran’s Rule: Check that $E_i \geq 1$ and no more than 20% of the $E_i$ are less than 5. If some of the $E_i$ are too small, then we combine categories together.

T:
- Definition Formula: $\tau = \sum_{i=1}^{g} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{g-k-1}$ (under $H_0$)
- Calculation Formula: $\tau = \sum_{i=1}^{g} \frac{O_i^2}{E_i} - n \sim \chi^2_{g-k-1}$ (under $H_0$)
- Large values of $\tau$ will argue against $H_0$ for $H_1$. (This indicates a difference between $O_i$ and $E_i$.)
- The observed value is $\tau_0$.

P: P-value $= P(\chi^2_{g-k-1} \geq \tau_0)$.

C: Weigh up the P-value.
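The steps above can be sketched as a small R function. This is a minimal sketch rather than code from the notes: the function name `gof_test` and its arguments are hypothetical, and it is illustrated on the Mendel counts from Example 1, where no parameters are estimated (so `k = 0`).

```r
# Hypothetical helper (not from the notes): chi-squared goodness-of-fit test.
#   o  = observed counts
#   p0 = hypothesised category probabilities under H0
#   k  = number of model parameters estimated from the data
gof_test <- function(o, p0, k = 0) {
  n <- sum(o)
  e <- n * p0                                  # expected counts E_i = n * P(category i)
  # Cochran's Rule: all E_i >= 1 and at most 20% of the E_i below 5
  cochran_ok <- all(e >= 1) && mean(e < 5) <= 0.2
  tau0 <- sum((o - e)^2 / e)                   # definition formula
  df <- length(o) - k - 1                      # g - k - 1
  p_value <- pchisq(tau0, df, lower.tail = FALSE)
  list(expected = e, cochran_ok = cochran_ok,
       statistic = tau0, df = df, p_value = p_value)
}

# Mendel's counts with the 9:3:3:1 model
res <- gof_test(c(315, 108, 101, 32), c(9, 3, 3, 1) / 16)
print(res)
```

The sketch only reports whether Cochran's Rule holds; combining categories when it fails is left to the analyst, since the sensible grouping depends on the problem.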

H: $H_0$: Model 9:3:3:1 fits. vs $H_1$: Model doesn’t fit.

A: Cochran’s Rule: All $E_i \geq 1$ and no more than 20% of the $E_i$ are less than 5.

T:
- Calculation Formula: $\tau = \sum_{i=1}^{4} \frac{O_i^2}{E_i} - 556 \sim \chi^2_3$ (under $H_0$)
- Large values of $\tau$ will argue against $H_0$ for $H_1$, as this indicates a difference between $O_i$ and $E_i$.
- The observed value is $\tau_0 = \frac{315^2}{312.75} + \frac{108^2}{104.25} + \frac{101^2}{104.25} + \frac{32^2}{34.75} - 556 \approx 0.47$.

    o = c(315, 108, 101, 32)
    e = c(312.75, 104.25, 104.25, 34.75)
    sum((o - e)^2 / e)       # definition formula: 0.470024
    sum(o^2 / e) - 556       # calculation formula: 0.470024

P: P-value $= P(\chi^2_3 \geq 0.47) > 0.25$ using tables.

    1 - pchisq(0.47, 3)      # approx. 0.925

C: As the P-value is so large, the data is consistent with Mendel’s model.
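The hand calculation for Mendel's data can be cross-checked with R's built-in `chisq.test`, which performs a goodness-of-fit test when given a vector of counts and a vector of hypothesised probabilities `p`. This is a sketch for verification only; the object name `fit` is chosen here.

```r
# Observed counts and the 9:3:3:1 model probabilities
o <- c(315, 108, 101, 32)
fit <- chisq.test(o, p = c(9, 3, 3, 1) / 16)

fit$expected    # expected counts E_i
fit$statistic   # test statistic, should agree with tau_0 (about 0.47)
fit$parameter   # degrees of freedom (3 here, since no parameters are estimated)
fit$p.value     # upper-tail P-value
```

Note that `chisq.test` always uses g - 1 degrees of freedom in this mode, so it matches the hand calculation only when no parameters are estimated (k = 0).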

H: $H_0$: U[0,1] Model fits. vs $H_1$: Model doesn’t fit.

A: Cochran’s Rule: All $E_i \geq 1$ and no more than 20% of the $E_i$ are less than 5.

T:
- Calculation Formula: $\tau = \sum_{i=1}^{10} \frac{O_i^2}{E_i} - 1000 \sim \chi^2_9$ (under $H_0$)
- Large values of $\tau$ will argue against $H_0$ for $H_1$, as this indicates a difference between $O_i$ and $E_i$.
- The observed value is $\tau_0 = \frac{136^2}{100} + \frac{105^2}{100} + \ldots + \frac{117^2}{100} - 1000 = 29.02$.

    o = c(136, 105, 107, 89, 97, 84, 76, 84, 105, 117)
    e = c(100, 100, 100, 100, 100, 100, 100, 100, 100, 100)
    sum((o - e)^2 / e)       # definition formula: 29.02
    sum(o^2 / e) - 1000      # calculation formula: 29.02

P: P-value $= P(\chi^2_9 \geq 29.02) < 0.01$ using tables.

    1 - pchisq(29.02, 9)     # approx. 0.0006

C: As the P-value is so small, the data is not consistent with a U[0,1] random number generator.
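How the ten observed counts were obtained is not shown in this part of the notes. The following is a minimal sketch of one way to get such counts from raw numbers, assuming ten equal-width classes on [0, 1]. It uses freshly simulated values from `runif` as stand-in data, not the lecture's numbers, so the resulting statistic will differ from 29.02.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
u <- runif(1000)                     # stand-in data, NOT the numbers from the notes

# Assumption: ten equal-width classes [0, 0.1], (0.1, 0.2], ..., (0.9, 1]
breaks <- seq(0, 1, length.out = 11)
o <- as.vector(table(cut(u, breaks = breaks, include.lowest = TRUE)))

# Under U[0,1] each class has probability 0.1, so E_i = n / 10
e <- rep(length(u) / 10, 10)

tau0 <- sum(o^2 / e) - length(u)     # calculation formula
p_value <- pchisq(tau0, df = 9, lower.tail = FALSE)
print(o)
print(c(tau0 = tau0, p_value = p_value))
```

This is the sense in which a model for continuous data is tested through discrete intervals: the test sees only the class counts, not the individual values.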

(2) Construct the following table:

Category   0       1       2       3       Total
O_i        19      34      27      20      100
E_i        13.03   38.02   36.97   11.98   100

as $E_i = \binom{3}{i} (0.493)^i (1 - 0.493)^{3-i} \times 100$ for $i = 0, 1, 2, 3$.

    x = 0:3
    dbinom(x, 3, 0.493) * 100     # 13.03238 38.01755 36.96775 11.98232

So the parameters are: g = 4, k = 1.
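The derivation of the value 0.493 is not shown in this part of the notes. One plausible reading, stated here as an assumption, is that p was estimated from the data as the sample mean divided by 3 (since the mean of Bin(3, p) is 3p), which would also explain why k = 1. A minimal sketch:

```r
x <- 0:3                          # categories
o <- c(19, 34, 27, 20)            # observed counts from the table
n_obs <- sum(o)                   # 100 observations

xbar <- sum(x * o) / n_obs        # sample mean: 1.48
p_hat <- xbar / 3                 # assumed estimator of p: about 0.493

# Expected counts under Bin(3, p_hat)
e <- dbinom(x, 3, p_hat) * n_obs
print(p_hat)
print(round(e, 2))
```

Because this sketch keeps the unrounded estimate 1.48/3, its expected counts differ slightly in the second decimal place from those computed with 0.493.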

H: $H_0$: Bin(3, p) Model fits. vs $H_1$: Model doesn’t fit.

A: Cochran’s Rule: All $E_i \geq 1$ and no more than 20% of the $E_i$ are less than 5.

T:
- Calculation Formula: $\tau = \sum_{i=0}^{3} \frac{O_i^2}{E_i} - 100 \sim \chi^2_{4-1-1} = \chi^2_2$ (under $H_0$)
- Large values of $\tau$ will argue against $H_0$ for $H_1$, as this indicates a difference between $O_i$ and $E_i$.
- The observed value is $\tau_0 = \frac{19^2}{13.03} + \ldots + \frac{20^2}{11.98} - 100 \approx 11.2$.

    o = c(19, 34, 27, 20)
    e = c(13, 38, 37, 12)
    sum((o - e)^2 / e)       # definition formula: approx. 11.2
    sum(o^2 / e) - 100       # calculation formula: approx. 11.2
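The P step for this example would follow the same pattern as in the earlier examples. A minimal sketch, assuming the observed value 11.2 and the 2 degrees of freedom stated above; the second line uses the fact that the upper tail of a chi-squared distribution with 2 degrees of freedom is exp(-x/2).

```r
tau0 <- 11.2                                   # observed value quoted above
p_value <- pchisq(tau0, df = 2, lower.tail = FALSE)
print(p_value)

# Closed form for df = 2: P(chi^2_2 >= x) = exp(-x / 2)
print(exp(-tau0 / 2))
```

Both lines give a P-value below 0.01, which would argue against the Bin(3, p) model for these data.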

Chi-square test for independence

- The $\chi^2$-test is a very flexible test because it can be applied in a range of situations.
- Fitting discrete probabilities to categories of counts (Mendel genetics)
- Fitting models for continuous data using discrete intervals (Random number generator)
- We will introduce a final example, which looks at how to test for independence between two discrete variables, i.e. we test the null hypothesis $H_0$: variables are independent vs. $H_A$: variables are not independent.
- Let’s begin with an example: imagine we have two variables, smoking status and lung cancer status, for which we can tabulate counts.

Table: Contingency table of smoking status and lung cancer.

                  No smoking   Smoking   Total
No lung cancer    200          1400      1600
Lung cancer       100          1300      1400
Total             300          2700      3000
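The calculation for this table is not given in this part of the notes. The following is a minimal sketch, assuming the usual expected count under independence, E_ij = (row total × column total) / n, and (r − 1)(c − 1) degrees of freedom; both are standard results rather than formulas stated above, and the variable names are chosen here.

```r
# Observed counts from the contingency table (margins excluded)
tab <- matrix(c(200, 1400,
                100, 1300),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("No lung cancer", "Lung cancer"),
                              c("No smoking", "Smoking")))
n <- sum(tab)                                     # 3000

# Expected counts under independence: row total * column total / n
e <- outer(rowSums(tab), colSums(tab)) / n

tau0 <- sum((tab - e)^2 / e)                      # same definition formula as before
df <- (nrow(tab) - 1) * (ncol(tab) - 1)           # 1 for a 2 x 2 table
p_value <- pchisq(tau0, df, lower.tail = FALSE)

print(e)
print(c(tau0 = tau0, df = df, p_value = p_value))

# Cross-check with the built-in test, without continuity correction
print(chisq.test(tab, correct = FALSE)$statistic)
```

The statistic has the same form as in the goodness-of-fit examples; only the way the expected counts are obtained changes.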