



A set of slides from a UCLA Statistics 110B course taught by Ivo Dinov. The slides cover categorical data, binomial and multinomial experiments, binomial and multinomial distributions, and goodness-of-fit tests using Pearson's chi-square statistic. They also include examples and instructions on how to calculate expected frequencies, test statistics, and critical values.




Stat 110B, UCLA, Ivo Dinov Slide 1
Asst. Prof. in Statistics and Neurology
Teaching Assistants: Brian Ng, UCLA Statistics
University of California, Los Angeles, Spring 2003 http://www.stat.ucla.edu/~dinov/courses_students.html
Categorical Data
Categorical data are counts of the number of outcomes falling into each of several categories.
• Binomial experiment: consists of two categories
• Multinomial experiment: consists of more than two categories
Binomial Experiment
Binomial Distribution Pdf, E[X], Var[X]
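For reference, the standard results for a binomial random variable can be written as below. This is a sketch of the textbook forms, assuming X counts the successes in n independent trials with success probability p; it is not a transcription of the slide.

```latex
P(X = x) = \binom{n}{x}\, p^{x} (1 - p)^{n - x}, \qquad x = 0, 1, \ldots, n

E[X] = np, \qquad \mathrm{Var}[X] = np(1 - p)
```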
Multinomial Experiment
Multinomial Cont’d
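$$P(N_1 = n_1, \ldots, N_k = n_k) = \frac{n!}{n_1! \cdots n_k!}\, p_1^{n_1} \cdots p_k^{n_k}, \qquad n_1 + \cdots + n_k = n, \quad p_1 + \cdots + p_k = 1$$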
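The probability of a particular set of multinomial cell counts can be sketched in a few lines of Python. The function name `multinomial_pmf` and the example counts are hypothetical; the code uses the standard formula n!/(n_1! ... n_k!) times the product of p_i^(n_i).

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing exactly `counts` in n = sum(counts) trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    n = sum(counts)
    # Multinomial coefficient n! / (n_1! * ... * n_k!), kept as an exact integer.
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)
    return coefficient * prod(p ** c for p, c in zip(probs, counts))


# Hypothetical illustration: 4 trials spread over 3 categories.
print(multinomial_pmf([2, 1, 1], [0.5, 0.25, 0.25]))  # prints 0.1875
```

With only two categories the same function reduces to the binomial probability.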
Testing Goodness of Fit with Specified Cell Probabilities
We wish to test whether the cell probabilities are specified by $p_{1o}, \ldots, p_{ko}$, where $p_{1o} + \cdots + p_{ko} = 1$. We use a test statistic that compares the observed cell count $N_i$ with the expected cell count under $H_o$, $E_i = n p_{io}$.
$H_o$: $p_1 = p_{1o}$ and … and $p_k = p_{ko}$
$H_a$: Some $p_i \neq p_{io}$
Test Statistic
$$X^2 = \sum_{i=1}^{k} \frac{(N_i - E_i)^2}{E_i}$$
This is Pearson's goodness-of-fit statistic.
Rejection Region: $X^2 > \chi^2_{\alpha}$, where $\chi^2_{\alpha}$ is the upper-$\alpha$ critical value of the chi-squared distribution with $k-1$ degrees of freedom.
General Rule: We want $n p_{io} \geq 5$ for all cells.
Example
A study is run to see whether the public favors the construction of a new dam. It is thought that 40% favor dam construction, 30% are neutral, 20% oppose the dam, and the rest have not thought about it. A random sample of 150 individuals is interviewed, resulting in 42 in favor, 61 neutral, 33 opposed, and the rest have not thought about it. Do the data indicate that the stated proportions are incorrect? Use α=0.01.
Example Cont’d
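$H_o$: $p_1 = 0.4$, $p_2 = 0.3$, $p_3 = 0.2$, $p_4 = 0.1$
$H_a$: At least one probability is not as specified
Test Statistic: $X^2$
Rejection Region: $X^2 > \chi^2_{0.01,\,3} = 11.34$

          Favor   Neutral   Oppose   Unaware   Total
n_i          42        61       33        14     150
p_io        0.4       0.3      0.2       0.1       1
E_i          60        45       30        15     150

$$X^2 = \frac{(42-60)^2}{60} + \frac{(61-45)^2}{45} + \frac{(33-30)^2}{30} + \frac{(14-15)^2}{15} = 11.46$$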
Since $X^2 = 11.46 > \chi^2_{0.01,\,3} = 11.34$, we reject $H_o$ and conclude that at least one of the true proportions differs from its hypothesized value.
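The arithmetic of the dam example can be checked with a short Python sketch. The function name `pearson_statistic` is hypothetical; the counts, hypothesized proportions, and the critical value 11.34 are taken from the example, and the critical value is hard-coded rather than computed.

```python
def pearson_statistic(observed, hypothesized_probs):
    """Return Pearson's X^2 and the expected counts E_i = n * p_io."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected


observed = [42, 61, 33, 14]   # favor, neutral, oppose, unaware
probs = [0.4, 0.3, 0.2, 0.1]  # proportions under H_o
critical_value = 11.34        # chi-square, alpha = 0.01, 3 d.f. (from the example)

x2, expected = pearson_statistic(observed, probs)
reject = x2 > critical_value
print(f"X^2 = {x2:.2f}, reject H_o: {reject}")
```

Every expected count here is at least 5, so the general rule for using the chi-square approximation is satisfied.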
Example
Data (n = 24): 1.15 1.29 1.14 1.4 1.34 1.32 1.29 1.34 1.36 1.26 1.26 1.36 1.22 1.36 1.41. 1.28 1.45 1.29 1.28 1.38 1.55 1.46 1.32
Corr(N(0,1), Data) = 0.

Ascending order statistics:
N(0,1)                   Data
-1.18561   -1.13997      1.14   1.
-1.10793   -1.01801      1.22   1.
-0.76809   -0.67098      1.26   1.
-0.49035   -0.43352      1.28   1.
-0.37026   -0.21241      1.29   1.
-0.16153   -0.10578      1.32   1.
-0.09336   -0.05053      1.32   1.
0.036083   -0.03419      1.34   1.
0.127026   0.141456      1.36   1.
0.579552   0.93013       1.38   1.
0.993225   1.02682       1.45   1.
1.078497   1.468405      1.46   1.

R ~ Ryan-Joiner (α, n)
RJ(0.01,24) = 0.
RJ(0.10,24) = 0.
Since R_o = 0. ⇒ R_o > Critical Value ⇒ Strong Correlation ⇒ Can't Reject H_o
[Figure: correlation R (vertical axis) plotted against α (horizontal axis, 0.0 to 1.0)]
For higher confidence (smaller Type I error α), we need smaller correlations, R(N,D)
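A correlation of this kind can be sketched in Python as follows. Two assumptions are made that the slide does not confirm: the normal scores are theoretical N(0,1) quantiles at the plotting positions (i - 3/8)/(n + 1/4), rather than a simulated N(0,1) sample, and the data values are hypothetical. All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def normal_scores(n):
    """Theoretical N(0,1) quantiles at positions (i - 3/8) / (n + 1/4) (assumed choice)."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def pearson_r(x, y):
    """Ordinary Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def normal_plot_correlation(data):
    """Correlation between the sorted data and the matching normal scores."""
    return pearson_r(normal_scores(len(data)), sorted(data))


# Hypothetical measurements, not the values from the slide.
sample = [1.21, 1.34, 1.18, 1.29, 1.42, 1.25, 1.31, 1.37, 1.27, 1.33]
r = normal_plot_correlation(sample)
print(f"R = {r:.4f}")
```

The resulting R would then be compared with a tabulated critical value RJ(α, n); values close to 1 are consistent with normality, so H_o is rejected only when R falls below the critical value.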
Testing Homogeneity of Populations
• We wish to compare I multinomial populations, each with J categories.
• Take a sample of size $n_i$ from the ith population.
• Let $N_{ij}$ be the number of observations from the ith population that fall into the jth category. Hence $\sum_j N_{ij} = n_i$.
• Place the data in an I x J table.
Table
                    Category
Pop.       1       2      …      J      Total
1         n11     n12     …     n1J     n1.
2         n21     n22     …     n2J     n2.
…          …       …      …      …       …
I         nI1     nI2     …     nIJ     nI.
Total     n.1     n.2     …     n.J     n
Corresponding to each cell, there is a cell probability $p_{ij}$ = the probability that an outcome from the ith population falls into the jth category, where $\sum_j p_{ij} = 1$.
                    Category
Pop.       1       2      …      J
1         p11     p12     …     p1J
2         p21     p22     …     p2J
…          …       …      …      …
I         pI1     pI2     …     pIJ
Test
$H_o$: $p_{1j} = p_{2j} = \cdots = p_{Ij}$, $j = 1, \ldots, J$
$H_a$: Some $p_{ij} \neq p_{i'j}$

Under $H_o$, the common cell probability $p_j$ is estimated by
$$\hat{p}_j = \frac{n_{\cdot j}}{n}$$
Test Cont’d
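The estimated expected cell frequency is
$$\hat{E}_{ij} = n_{i\cdot} \cdot \frac{n_{\cdot j}}{n} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$
The test statistic is
$$X^2 = \sum_{\text{rows}} \sum_{\text{columns}} \frac{(n_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}}$$
Rejection Region: $X^2 > \chi^2_{\alpha}$ with d.f. = $(I-1)(J-1)$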
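The test for an I x J table of counts can be sketched in Python as below. The function name `chi_square_table_test` and the 2 x 2 counts are hypothetical; each expected count is computed as (row total) x (column total) / n, and the degrees of freedom as (I-1)(J-1).

```python
def chi_square_table_test(table):
    """Return (X^2, degrees of freedom) for a list-of-rows table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Estimated expected frequency under H_o.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: 2 populations (rows), 2 categories (columns).
counts = [[10, 20],
          [20, 10]]
x2, df = chi_square_table_test(counts)
print(f"X^2 = {x2:.4f} with {df} d.f.")
```

The statistic is then compared with the chi-square critical value for the chosen α and the returned degrees of freedom.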
Testing for Association
Row Categories – $A_1, \ldots, A_I$
Column Categories – $B_1, \ldots, B_J$
n = total number of observations
$n_{ij}$ = the number of individuals classified as $A_i$ and $B_j$; hence $\sum_i \sum_j n_{ij} = n$
$H_o$: $P(A_i \cap B_j) = P(A_i) P(B_j)$ for all i, j
$H_a$: Some $P(A_i \cap B_j) \neq P(A_i) P(B_j)$
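Expected Frequency:
$$\hat{E}_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$
where $n_{i\cdot}$ and $n_{\cdot j}$ are the row and column totals.
Test Statistic:
$$X^2 = \sum_{\text{rows}} \sum_{\text{columns}} \frac{(n_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}}$$
Rejection Region: $X^2 > \chi^2_{\alpha}$ with d.f. = $(I-1)(J-1)$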
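The expected frequency under independence follows from estimating each marginal probability by its sample proportion. A short derivation, writing the row and column totals as n_{i.} and n_{.j} (notation assumed here):

```latex
\hat{P}(A_i) = \frac{n_{i\cdot}}{n}, \qquad \hat{P}(B_j) = \frac{n_{\cdot j}}{n}

\hat{E}_{ij} = n \,\hat{P}(A_i)\,\hat{P}(B_j)
             = n \cdot \frac{n_{i\cdot}}{n} \cdot \frac{n_{\cdot j}}{n}
             = \frac{n_{i\cdot}\, n_{\cdot j}}{n}
```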
The Chi-square distribution
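[Figure: chi-square density curves for df = 2, 4, 7 and 10, plotted over the range 0 to 20. The upper-tail area prob lies to the right of the critical value $\chi^2_{df}(prob)$.]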
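Upper-tail chi-square probabilities for integer degrees of freedom can be sketched with the Python standard library alone. The function name `chi2_upper_tail` is hypothetical. The code relies on the recurrence Q(a+1, z) = Q(a, z) + z^a e^(-z) / Γ(a+1) for the regularized upper incomplete gamma function, started from Q(1, z) = e^(-z) for even df or Q(1/2, z) = erfc(sqrt(z)) for odd df, with z = x/2.

```python
from math import erfc, exp, gamma, sqrt


def chi2_upper_tail(x, df):
    """P(chi-square with integer df exceeds x); intended for moderate df."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    z = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = exp(-z)        # Q(1, z)
    else:
        a = 0.5
        q = erfc(sqrt(z))  # Q(1/2, z)
    # Step a up by 1 until a == df / 2, adding one term of the recurrence each time.
    while 2 * a < df:
        q += z ** a * exp(-z) / gamma(a + 1)
        a += 1
    return q


# The tail area beyond 11.345 with 3 d.f. is close to 0.01.
print(round(chi2_upper_tail(11.345, 3), 4))
```

A rejection region of the form X^2 > critical value at level α is equivalent to an upper-tail probability at the observed statistic that is smaller than α.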
Lotto after 399 numbers have been drawn – Do some numbers appear more frequently in LOTTO?
[Figure 11.1.4 Frequency of LOTTO winning numbers. Horizontal axis: number on ball (1 to 40); vertical axis: frequency (0 to 20); reference line at the expected frequency, 9.975.]
TABLE 11.1.3 Frequency of Winning Numbers in LOTTO
Each entry is the ball number followed by its frequency in parentheses.
 1 (7)    2 (10)   3 (8)    4 (9)    5 (13)   6 (8)    7 (12)   8 (16)   9 (11)  10 (6)
11 (13)  12 (10)  13 (9)   14 (11)  15 (11)  16 (6)   17 (11)  18 (13)  19 (6)   20 (13)
21 (7)   22 (9)   23 (8)   24 (12)  25 (6)   26 (4)   27 (10)  28 (8)   29 (14)  30 (12)
31 (11)  32 (12)  33 (9)   34 (11)  35 (6)   36 (8)   37 (14)  38 (10)  39 (15)  40 (10)
Number range: [1:40]
Number of balls selected at each draw: 7
Number of samples: 57
Total number of balls selected: 57*7 = 399
Expected frequency of each number: 399/40 = 9.975
Observed χ^2 statistic: x_0 = 30.
df = 40-1 = 39
P-value = 0.
Conclusion: No evidence for departure from the null hypothesis.
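The LOTTO statistic can be recomputed from the frequencies in Table 11.1.3 with a few lines of Python. This is a sketch that assumes every ball is equally likely under the null hypothesis; the variable names are illustrative.

```python
# Frequencies of balls 1..40, copied from Table 11.1.3.
frequencies = [
    7, 10, 8, 9, 13, 8, 12, 16, 11, 6,
    13, 10, 9, 11, 11, 6, 11, 13, 6, 13,
    7, 9, 8, 12, 6, 4, 10, 8, 14, 12,
    11, 12, 9, 11, 6, 8, 14, 10, 15, 10,
]

total = sum(frequencies)             # 57 draws * 7 balls = 399
expected = total / len(frequencies)  # 399 / 40 = 9.975 for every ball
x0 = sum((f - expected) ** 2 / expected for f in frequencies)
df = len(frequencies) - 1

print(f"total = {total}, expected = {expected}, x0 = {x0:.2f}, df = {df}")
```

The statistic comes out at about 30.97, below the 39 degrees of freedom that a chi-square variable with that df has as its mean, which agrees with the conclusion of no evidence against the null hypothesis.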