














[Week 6] Goodness of Fit Test (Chi-squared Test)















April 8, 2019
Mendel did much work in early genetics in the 19th century, but it was not appreciated until later. He conducted experiments on the distributions of traits in pea plants. In one experiment, he classified 556 peas according to shape (Round or Angular) and colour (Yellow or Green). He predicted that the four offspring types (RY, RG, AY, AG) would occur in the ratio 9:3:3:1. He observed counts of 315, 108, 101 and 32.
Does Mendel’s theory fit the data?
The data give the following frequency table and plot.

Category   0    1    2    3    Total
O_i        19   34   27   20   100
Test whether the data could be modelled by Bin(n, p) for some p.
Sports example: to put this example into context, imagine a sports journalist claims that Michael Jordan's free throws follow a binomial distribution with success probability 80%.
Context: Consider a set of categorical data with $g$ categories, into which fall observed counts $O_i$, for $i = 1, 2, \dots, g$. A probability model is proposed for the categories, and we want to test whether it is adequate.
Preparation: Construct the following table:

Class             1     2     3     ...   g     Total
Observed counts   O_1   O_2   O_3   ...   O_g   $\sum_{i=1}^{g} O_i = n$
Expected counts   E_1   E_2   E_3   ...   E_g   $\sum_{i=1}^{g} E_i = n$
Notes:
- The $O_i$ are given, and the $E_i$ need to be worked out from the hypothesised model $H_0$, so $E_i = n P(\text{category } i)$.
- Sometimes we need to estimate $k$ parameter(s) of the model first, before we can work out the $E_i$.
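A minimal R sketch of the preparation step, using Mendel's data from above. The helper names `expected_counts` and `cochran_ok` are hypothetical (not from the notes); the second one simply encodes Cochran's rule as it is stated in the A step below.

```r
# Hypothetical helper (not from the notes): E_i = n * P(category i)
expected_counts <- function(o, p) {
  stopifnot(length(o) == length(p), abs(sum(p) - 1) < 1e-8)
  sum(o) * p
}

# Cochran's rule as stated in the notes:
# every E_i >= 1 and at most 20% of the E_i below 5
cochran_ok <- function(e) {
  all(e >= 1) && mean(e < 5) <= 0.2
}

# Mendel's peas: RY, RG, AY, AG under the 9:3:3:1 model
o <- c(315, 108, 101, 32)
e <- expected_counts(o, c(9, 3, 3, 1) / 16)
print(e)              # 312.75 104.25 104.25 34.75
print(cochran_ok(e))  # TRUE
```

If `cochran_ok()` returned `FALSE`, the notes' remedy is to combine categories and recompute the $E_i$.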
H: $H_0$: the model fits vs $H_1$: the model does not fit.
A: Cochran's rule: check that all $E_i \ge 1$ and no more than 20% of the $E_i$ are less than 5. If some of the $E_i$ are too small, then we combine categories together.
T:
- Definition formula: $\tau = \sum_{i=1}^{g} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{g-k-1}$ approximately (under $H_0$).
- Calculation formula: $\tau = \sum_{i=1}^{g} \frac{O_i^2}{E_i} - n \sim \chi^2_{g-k-1}$ approximately (under $H_0$).
- Large values of $\tau$ argue against $H_0$ in favour of $H_1$, since they indicate a discrepancy between the $O_i$ and the $E_i$.
- The observed value is $\tau_0$.
P: P-value $= P(\chi^2_{g-k-1} \ge \tau_0)$.
C: Weigh up the P-value.
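The calculation formula follows from the definition formula by expanding the square and using $\sum_{i} O_i = \sum_{i} E_i = n$:

```latex
\begin{aligned}
\tau &= \sum_{i=1}^{g} \frac{(O_i - E_i)^2}{E_i}
      = \sum_{i=1}^{g} \frac{O_i^2}{E_i} - 2\sum_{i=1}^{g} O_i + \sum_{i=1}^{g} E_i \\
     &= \sum_{i=1}^{g} \frac{O_i^2}{E_i} - 2n + n
      = \sum_{i=1}^{g} \frac{O_i^2}{E_i} - n .
\end{aligned}
```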
H: $H_0$: the 9:3:3:1 model fits vs $H_1$: the model does not fit.
A: Cochran's rule: all $E_i \ge 1$ and no more than 20% of the $E_i$ are less than 5.
T:
- Calculation formula: $\tau = \sum_{i=1}^{4} \frac{O_i^2}{E_i} - 556 \sim \chi^2_3$ (under $H_0$), where $E_i = 556 \times (9, 3, 3, 1)/16 = (312.75, 104.25, 104.25, 34.75)$.
- Large values of $\tau$ argue against $H_0$ in favour of $H_1$, as they indicate a difference between the $O_i$ and the $E_i$.
- The observed value is $\tau_0 = \frac{315^2}{312.75} + \frac{108^2}{104.25} + \frac{101^2}{104.25} + \frac{32^2}{34.75} - 556 \approx 0.47$.
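```
> o <- c(315, 108, 101, 32)               # observed counts
> e <- c(312.75, 104.25, 104.25, 34.75)   # expected counts, 556 * (9, 3, 3, 1) / 16
> sum((o - e)^2 / e)                      # definition formula
[1] 0.
> sum(o^2 / e) - 556                      # calculation formula
[1] 0.
```
P: P-value $= P(\chi^2_3 \ge 0.47) > 0.25$ using tables.
```
> 1 - pchisq(0.47, 3)
[1] 0.
```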
C: As the P-value is large (greater than 0.25), the data are consistent with Mendel's model.
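As a cross-check, base R's `chisq.test()` with the hypothesised probabilities passed as `p` should reproduce the statistic above. It uses $g - 1 = 3$ degrees of freedom, which is correct here only because no parameters were estimated ($k = 0$). The `qchisq()` line mimics the table lookup: if the 75th percentile of $\chi^2_3$ exceeds $\tau_0$, the P-value exceeds 0.25.

```r
o  <- c(315, 108, 101, 32)
p0 <- c(9, 3, 3, 1) / 16        # hypothesised 9:3:3:1 probabilities

res <- chisq.test(o, p = p0)    # goodness-of-fit form of chisq.test()
res$statistic                   # X-squared, about 0.47
res$parameter                   # df = 3 = g - 1, valid here since k = 0
res$p.value

# Table lookup equivalent: is tau_0 below the 75th percentile of chi^2_3?
qchisq(0.75, df = 3) > 0.47     # TRUE, so the P-value exceeds 0.25
```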
H: $H_0$: the U[0,1] model fits vs $H_1$: the model does not fit.
A: Cochran's rule: all $E_i \ge 1$ and no more than 20% of the $E_i$ are less than 5.
T:
- Calculation formula: $\tau = \sum_{i=1}^{10} \frac{O_i^2}{E_i} - 1000 \sim \chi^2_9$ (under $H_0$).
- Large values of $\tau$ argue against $H_0$ in favour of $H_1$, as they indicate a difference between the $O_i$ and the $E_i$.
- The observed value is $\tau_0 = \frac{136^2}{100} + \frac{105^2}{100} + \dots + \frac{117^2}{100} - 1000 = 29.02$.
```
> o <- c(136, 105, 107, 89, 97, 84, 76, 84, 105, 117)   # observed counts in 10 intervals
> e <- rep(100, 10)                                      # expected counts, 1000 * 0.1 each
> sum((o - e)^2 / e)
[1] 29.
> sum(o^2 / e) - 1000
[1] 29.
```
P: P-value $= P(\chi^2_9 \ge 29.02) < 0.01$ using tables.
```
> 1 - pchisq(29.02, 9)
[1] 0.
```
C: As the P-value is very small (less than 0.01), the data are not consistent with a U[0,1] random number generator.
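A sketch of how continuous output can be turned into category counts, assuming ten equal-width intervals on [0, 1]. The simulated `runif()` draws are a hypothetical stand-in for the generator's output, not the data from the notes; the second half repeats the test on the notes' own counts. `pchisq(..., lower.tail = FALSE)` gives the same upper-tail probability as `1 - pchisq(...)`, with better accuracy when the P-value is tiny.

```r
# Hypothetical data: 1000 simulated U[0,1] draws stand in for the generator
set.seed(1)                                   # arbitrary seed, for reproducibility
u <- runif(1000)

# Ten equal-width intervals; each has probability 0.1 under U[0,1]
breaks <- seq(0, 1, by = 0.1)
o_sim  <- as.vector(table(cut(u, breaks, include.lowest = TRUE)))
e_sim  <- rep(length(u) * 0.1, 10)
tau_sim <- sum((o_sim - e_sim)^2 / e_sim)
p_sim   <- pchisq(tau_sim, df = 9, lower.tail = FALSE)   # g = 10, k = 0

# The same test on the observed counts from the notes
o <- c(136, 105, 107, 89, 97, 84, 76, 84, 105, 117)
e <- rep(100, 10)
tau0    <- sum((o - e)^2 / e)                             # 29.02
p_notes <- pchisq(tau0, df = 9, lower.tail = FALSE)
```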
(2) Construct the following table:

Category   0       1       2       3       Total
O_i        19      34      27      20      100
E_i        13.03   38.02   36.97   11.98   100
as $E_i = \binom{3}{i} (0.493)^i (1 - 0.493)^{3-i} \times 100$ for $i = 0, 1, 2, 3$.
```
> x <- 0:3                    # possible numbers of successes
> dbinom(x, 3, 0.493) * 100   # expected counts E_i
[1] 13.03238 38.01755 36.96775 11.
```
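A sketch of where 0.493 can come from, assuming $p$ was estimated by maximum likelihood, which for $\text{Bin}(3, p)$ is the sample mean divided by 3: here $(0 \times 19 + 1 \times 34 + 2 \times 27 + 3 \times 20)/100 = 1.48$, and $1.48/3 \approx 0.493$. The sketch rounds $\hat p$ to three decimals before computing the $E_i$ so that they match the table.

```r
x <- 0:3                            # number of successes per observation
o <- c(19, 34, 27, 20)              # observed counts
n <- sum(o)                         # 100 observations

# Assumed estimator: maximum likelihood for Bin(3, p), i.e. sample mean / 3
p_hat <- sum(x * o) / (3 * n)       # 148 / 300
round(p_hat, 3)                     # 0.493

# Expected counts, using p rounded to 3 decimals as in the table
e <- dbinom(x, size = 3, prob = round(p_hat, 3)) * n
round(e, 2)                         # 13.03 38.02 36.97 11.98
```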
So the parameters are: $g = 4$ categories and $k = 1$ estimated parameter ($p$).
H: $H_0$: the Bin(3, p) model fits vs $H_1$: the model does not fit.
A: Cochran's rule: all $E_i \ge 1$ and no more than 20% of the $E_i$ are less than 5.
T:
- Calculation formula: $\tau = \sum_{i=1}^{4} \frac{O_i^2}{E_i} - 100 \sim \chi^2_{4-1-1} = \chi^2_2$ (under $H_0$).
- Large values of $\tau$ argue against $H_0$ in favour of $H_1$, as they indicate a difference between the $O_i$ and the $E_i$.
- The observed value is $\tau_0 = \frac{19^2}{13.03} + \dots + \frac{20^2}{11.98} - 100 \approx 11.2$.
```
> o <- c(19, 34, 27, 20)   # observed counts
> e <- c(13, 38, 37, 12)   # expected counts, rounded to whole numbers
> sum((o - e)^2 / e)
[1] 11.
> sum(o^2 / e) - 100
[1] 11.
```
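Because $p$ was estimated from the data, the P-value has to use $g - k - 1 = 2$ degrees of freedom. A sketch, recomputing the $E_i$ with `dbinom()` instead of the rounded values: `chisq.test()` cannot know that a parameter was estimated, so with `p` supplied it reports the same statistic but uses $g - 1 = 3$ degrees of freedom, and its P-value is not the one this test requires.

```r
o <- c(19, 34, 27, 20)
e <- dbinom(0:3, size = 3, prob = 0.493) * 100   # E_i from the fitted Bin(3, p)

tau0 <- sum((o - e)^2 / e)                        # about 11.2
g <- 4                                            # categories
k <- 1                                            # parameters estimated (p)
p_value <- pchisq(tau0, df = g - k - 1, lower.tail = FALSE)

# chisq.test() cannot tell that p was estimated from these data:
# same statistic, but df = g - 1 = 3, so its P-value is not the one needed here
naive <- chisq.test(o, p = e / sum(e))
naive$parameter                                   # df = 3
```

For 2 degrees of freedom the upper tail has the closed form $P(\chi^2_2 \ge t) = e^{-t/2}$, which gives an easy hand check of `p_value`.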
The $\chi^2$-test is very flexible because it can be applied in a range of situations:
- Fitting a discrete probability model to categories of counts (Mendel's genetics).
- Fitting a model for continuous data using discrete intervals (random number generator).
- We will introduce a final example, which looks at how to test whether two discrete variables are independent, i.e. we test $H_0$: the variables are independent vs $H_A$: the variables are not independent.
- Let's begin with an example: imagine we cross-tabulate counts of smoking status against lung cancer status.
Table: Contingency table of smoking status and lung cancer.
                  No smoking   Smoking   Total
No lung cancer    200          1400      1600
Lung cancer       100          1300      1400
Total             300          2700      3000
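A sketch of how this table might be tested, assuming the standard Pearson test of independence: expected counts $E_{ij} = (\text{row total} \times \text{column total})/n$ and $(r - 1)(c - 1)$ degrees of freedom. `correct = FALSE` switches off the Yates continuity correction that `chisq.test()` applies to 2 × 2 tables by default, so that its statistic matches the hand calculation.

```r
# Rows: lung cancer status; columns: smoking status (counts from the table)
tab <- matrix(c(200, 100, 1400, 1300), nrow = 2,
              dimnames = list(c("No lung cancer", "Lung cancer"),
                              c("No smoking", "Smoking")))
n <- sum(tab)                                   # 3000

# Assumed standard form: E_ij = (row total) * (column total) / n
e <- outer(rowSums(tab), colSums(tab)) / n
tau0 <- sum((tab - e)^2 / e)
dof  <- (nrow(tab) - 1) * (ncol(tab) - 1)       # (r - 1)(c - 1) = 1
p_value <- pchisq(tau0, df = dof, lower.tail = FALSE)

# Cross-check; correct = FALSE turns off Yates' continuity correction
check <- chisq.test(tab, correct = FALSE)
```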