Applied Statistics for Engineering and Sciences: Goodness of Fit and Chi-Square Tests, Study notes of Statistics

A set of slides from the UCLA Statistics 110B course taught by Ivo Dinov. The slides cover categorical data, binomial and multinomial experiments and distributions, and goodness-of-fit tests using Pearson's chi-square statistic. They also include examples and instructions for calculating expected frequencies, test statistics, and critical values.

Typology: Study notes

Pre 2010

Uploaded on 08/26/2009



Slide 1 Stat 110B, UCLA, Ivo Dinov

UCLA STAT 110B

Applied Statistics for Engineering and the Sciences

  • Instructor: Ivo Dinov, Asst. Prof. in Statistics and Neurology
  • Teaching Assistants: Brian Ng, UCLA Statistics

University of California, Los Angeles, Spring 2003
http://www.stat.ucla.edu/~dinov/courses_students.html

Slide 2 Stat 110B, UCLA, Ivo Dinov

Categorical Data

Categorical data record the number of outcomes falling into various categories.

  • Binomial Experiment – consists of two categories
  • Multinomial Experiment – consists of more than two categories

Slide 3 Stat 110B, UCLA, Ivo Dinov

Binomial Experiment

  • n independent trials
  • Two possible outcomes: success (S) and failure (F)
  • p = Probability of success on each trial
  • X = Number of successes in n trials

Slide 4 Stat 110B, UCLA, Ivo Dinov

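Binomial Distribution

Pdf, E[X], Var[X]:

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, \dots, n$$

$$E[X] = np, \qquad Var[X] = np(1-p)$$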
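The pmf, mean, and variance named on this slide can be checked numerically. The following is a minimal sketch, assuming the standard binomial formulas; the values n = 10 and p = 0.3 are illustrative and do not come from the slides.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3  # illustrative values, not from the slides
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed directly from the pmf.
mean = sum(x * pmf[x] for x in range(n + 1))
var = sum((x - mean) ** 2 * pmf[x] for x in range(n + 1))

print(f"sum of pmf = {sum(pmf):.6f}")
print(f"E[X]   = {mean:.6f}   n*p       = {n * p:.6f}")
print(f"Var[X] = {var:.6f}   n*p*(1-p) = {n * p * (1 - p):.6f}")
```

The two columns printed on each line should agree up to floating-point rounding.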

Slide 5 Stat 110B, UCLA, Ivo Dinov

Multinomial Experiment

  • Each of n independent trials results in one of k possible categories labeled 1, …, k
  • $p_i$ = the probability of a trial resulting in the ith category, where $p_1 + \dots + p_k = 1$
  • $N_i$ = number of trials resulting in the ith category, where $N_1 + \dots + N_k = n$

Slide 6 Stat 110B, UCLA, Ivo Dinov

Multinomial Cont’d

  • The random variables $N_1, \dots, N_k$ have a multinomial distribution

$$p(n_1, \dots, n_k) = \frac{n!}{n_1! \cdots n_k!} \, p_1^{n_1} \cdots p_k^{n_k}$$
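The multinomial probability function can be evaluated directly from its definition. This is a minimal sketch; the function name `multinomial_pmf`, the cell probabilities, and the number of trials are hypothetical choices for illustration.

```python
import math

def multinomial_pmf(counts, probs):
    """P(N_1 = n_1, ..., N_k = n_k) for n = sum(counts) independent trials."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)  # each partial quotient is an exact integer
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p**c
    return prob

probs = [0.5, 0.3, 0.2]  # hypothetical cell probabilities (sum to 1)
n = 4                    # hypothetical number of trials

# Enumerate every (n1, n2, n3) with n1 + n2 + n3 = n; the pmf should sum to 1.
total = sum(
    multinomial_pmf((a, b, n - a - b), probs)
    for a in range(n + 1)
    for b in range(n - a + 1)
)

print(f"P(N = (2, 1, 1)) = {multinomial_pmf((2, 1, 1), probs):.4f}")
print(f"sum over all outcomes = {total:.6f}")
```

With k = 2 the same function reduces to the binomial pmf, which is a quick sanity check on the formula.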

Slide 7 Stat 110B, UCLA, Ivo Dinov

Multinomial Cont’d

  • Expected Value: $E[N_i] = np_i = E_i$
  • Variance: $Var[N_i] = np_iq_i$, where $q_i = 1 - p_i$
  • Covariance: $Cov[N_i, N_j] = -np_ip_j$ for $i \neq j$
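These three moments can be checked by simulating the multinomial experiment many times. This is a rough sketch under assumed values: the probabilities, n, the number of repetitions, and the helper names are all hypothetical, and the sample moments only approximate the theoretical ones.

```python
import random

def simulate_counts(n, probs, rng):
    """One multinomial experiment: n independent trials, returns the cell counts."""
    counts = [0] * len(probs)
    for cell in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[cell] += 1
    return counts

def average(values):
    values = list(values)
    return sum(values) / len(values)

probs = [0.5, 0.3, 0.2]   # hypothetical cell probabilities
n, reps = 20, 20000       # hypothetical trial count and number of repetitions
rng = random.Random(0)    # fixed seed so the run is repeatable
samples = [simulate_counts(n, probs, rng) for _ in range(reps)]

m1 = average(s[0] for s in samples)
m2 = average(s[1] for s in samples)
var1 = average((s[0] - m1) ** 2 for s in samples)
cov12 = average((s[0] - m1) * (s[1] - m2) for s in samples)

print(f"mean N1 = {m1:.3f}   theory n*p1     = {n * probs[0]:.3f}")
print(f"var  N1 = {var1:.3f}   theory n*p1*q1  = {n * probs[0] * (1 - probs[0]):.3f}")
print(f"cov     = {cov12:.3f}   theory -n*p1*p2 = {-n * probs[0] * probs[1]:.3f}")
```

The negative covariance reflects the constraint that the counts sum to n: a trial that lands in one cell cannot land in another.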

Slide 8 Stat 110B, UCLA, Ivo Dinov

Testing Goodness of Fit with Specified Cell Probabilities

We wish to test whether the cell probabilities are specified by $p_{1o}, \dots, p_{ko}$, where $p_{1o} + \dots + p_{ko} = 1$. We will use a test statistic to compare the observed cell count $N_i$ to the expected cell count under $H_o$, $E_i = np_{io}$.

$H_o$: $p_1 = p_{1o}$ and … and $p_k = p_{ko}$
$H_a$: Some $p_i \neq p_{io}$

Slide 9 Stat 110B, UCLA, Ivo Dinov

Test Statistic

$$X^2 = \sum_{i=1}^{k} \frac{(N_i - E_i)^2}{E_i}$$

This is Pearson’s goodness-of-fit statistic.

Rejection Region: $X^2 > \chi^2_{\alpha}$, where $\chi^2_{\alpha}$ is the upper-$\alpha$ critical value of the chi-squared distribution with $k-1$ degrees of freedom.

General Rule: We want $np_{io} \geq 5$ for all cells.
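The statistic, its degrees of freedom, and the rule of thumb on expected counts can be collected in one small helper. This is a sketch only: the name `pearson_gof` and the die-roll counts are hypothetical and do not come from the slides.

```python
def pearson_gof(observed, null_probs, min_expected=5.0):
    """Pearson's X^2, its degrees of freedom, expected counts, and the n*p_io >= 5 check."""
    n = sum(observed)
    expected = [n * p for p in null_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    rule_ok = all(e >= min_expected for e in expected)
    return statistic, df, expected, rule_ok

# Hypothetical data: 60 rolls of a die, H_o: all six faces equally likely.
observed = [8, 12, 9, 11, 6, 14]
null_probs = [1 / 6] * 6

statistic, df, expected, rule_ok = pearson_gof(observed, null_probs)
print(f"X^2 = {statistic:.3f} on {df} degrees of freedom")
print(f"all expected counts >= 5: {rule_ok}")
```

The statistic would then be compared with the tabulated upper-α critical value for k − 1 degrees of freedom.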

Slide 10 Stat 110B, UCLA, Ivo Dinov

Example

A study is run to see whether the public favors the construction of a new dam. It is thought that 40% favor dam construction, 30% are neutral, 20% oppose the dam, and the rest have not thought about it. A random sample of 150 individuals is interviewed, resulting in 42 in favor, 61 neutral, 33 opposed, and the rest have not thought about it. Do the data indicate that the stated proportions are incorrect? Use α = 0.01.

Example Cont’d

$H_o$: $p_1 = 0.4$, $p_2 = 0.3$, $p_3 = 0.2$, $p_4 = 0.1$
$H_a$: At least one probability is not as specified
Test Statistic: $X^2$
Rejection Region: $X^2 > \chi^2_{0.01,\,3} = 11.34$

         Favor   Neutral   Oppose   Unaware   Total
ni         42       61       33        14      150
pio       0.4      0.3      0.2       0.1        1
Ei         60       45       30        15      150

$$X^2 = \frac{(42-60)^2}{60} + \frac{(61-45)^2}{45} + \frac{(33-30)^2}{30} + \frac{(14-15)^2}{15} = 11.46$$

Since $X^2 = 11.46 > \chi^2_{0.01,\,3} = 11.34$, we reject $H_o$ and conclude that at least one of the true proportions differs from that hypothesized.
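The arithmetic of the dam example can be reproduced in a few lines. The counts, hypothesized proportions, and the critical value 11.34 are taken from the slide. The p-value line is an addition: it uses a closed form for the chi-squared upper tail that holds only for 3 degrees of freedom.

```python
import math

observed = [42, 61, 33, 14]        # Favor, Neutral, Oppose, Unaware (from the slide)
null_probs = [0.4, 0.3, 0.2, 0.1]  # hypothesized proportions (from the slide)
critical_value = 11.34             # chi-square(0.01, 3 d.f.) as quoted on the slide

n = sum(observed)
expected = [n * p for p in null_probs]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Upper-tail probability for 3 degrees of freedom only (not a general formula):
# P(chi2_3 > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
p_value = math.erfc(math.sqrt(x2 / 2)) + math.sqrt(2 * x2 / math.pi) * math.exp(-x2 / 2)

reject = x2 > critical_value
print(f"expected counts: {[round(e, 1) for e in expected]}")
print(f"X^2 = {x2:.2f}, reject H_o at alpha = 0.01: {reject}")
print(f"approximate p-value = {p_value:.4f}")
```

The statistic only just exceeds the critical value, so the p-value falls slightly below 0.01.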

Slide 19 Stat 110B, UCLA, Ivo Dinov

Example: Ryan-Joiner test, $H_o$: Data is Normal

Data: 1.15 1.29 1.14 1.4 1.34 1.32 1.29 1.34 1.36 1.26 1.26 1.36 1.22 1.36 1.4 1.… 1.28 1.45 1.29 1.28 1.38 1.55 1.46 1.32

Corr(N(0,1), Data) = 0.…

Ascending Order Stats:

N(0,1)                 | Data
-1.18561   -1.13997    | 1.14   1.…
-1.10793   -1.01801    | 1.22   1.…
-0.76809   -0.67098    | 1.26   1.…
-0.49035   -0.43352    | 1.28   1.…
-0.37026   -0.21241    | 1.29   1.…
-0.16153   -0.10578    | 1.32   1.…
-0.09336   -0.05053    | 1.32   1.…
0.036083   -0.03419    | 1.34   1.…
0.127026   0.141456    | 1.36   1.…
0.579552   0.93013     | 1.38   1.…
0.993225   1.02682     | 1.45   1.…
1.078497   1.468405    | 1.46   1.…

R ~ Ryan-Joiner(α, n)
RJ(0.01, 24) = 0.…
RJ(0.10, 24) = 0.…

Since $R_o$ = 0.… ⇒ $R_o$ > Critical Value ⇒ Strong Correlation ⇒ Can’t Reject $H_o$

For higher confidence, smaller Type I error α, we need smaller Correlations, R(N,D)

[Figure: correlation R plotted against α, with α running from 0.0 to 1.0]

Slide 20 Stat 110B, UCLA, Ivo Dinov

Testing Homogeneity of Populations

  • We wish to compare I multinomial populations, each with J categories.
  • Take a sample of $n_i$ observations from the ith population.
  • Let $N_{ij}$ be the number of observations from the ith population in the jth category. Hence, $\sum_j N_{ij} = n_i$.
  • Place the data in an I x J table.

Slide 21 Stat 110B, UCLA, Ivo Dinov

Table

                  Category
Pop.      1      2     …     J     Total
1        n11    n12    …    n1J    n1.
2        n21    n22    …    n2J    n2.
…         …      …     …     …      …
I        nI1    nI2    …    nIJ    nI.
Total    n.1    n.2    …    n.J    n

Slide 22 Stat 110B, UCLA, Ivo Dinov

Corresponding to each cell, there is a cell probability $p_{ij}$ = the probability that an outcome for the ith population falls into the jth category, where $\sum_j p_{ij} = 1$.

                  Category
Pop.      1      2     …     J
1        p11    p12    …    p1J
2        p21    p22    …    p2J
…         …      …     …     …
I        pI1    pI2    …    pIJ

Test

$H_o$: $p_{1j} = p_{2j} = \dots = p_{Ij}$, $j = 1, \dots, J$
$H_a$: Some $p_{ij} \neq p_{i'j}$

Under $H_o$, the common cell probability $p_j$ is estimated by

$$\hat{p}_j = \frac{n_{\cdot j}}{n}$$

Test Cont’d

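The estimated expected cell frequency is

$$\hat{E}_{ij} = n_{i\cdot}\,\hat{p}_j = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$

The test statistic is

$$X^2 = \sum_{\text{rows}} \sum_{\text{columns}} \frac{(n_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}}$$

Rejection Region: $X^2 > \chi^2_{\alpha}$ with d.f. = (I-1)(J-1)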
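The expected frequencies and the statistic for an I x J table can be computed from the row and column totals alone. This is a minimal sketch; the function name `homogeneity_test` and the 2 x 3 table of counts are hypothetical and do not come from the slides.

```python
def homogeneity_test(table):
    """Pearson chi-square statistic, d.f., and expected counts for an I x J table."""
    row_totals = [sum(row) for row in table]        # n_i.
    col_totals = [sum(col) for col in zip(*table)]  # n_.j
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

# Hypothetical counts: I = 2 populations (rows), J = 3 categories (columns).
table = [
    [20, 30, 50],
    [30, 30, 40],
]

statistic, df, expected = homogeneity_test(table)
print(f"expected counts: {expected}")
print(f"X^2 = {statistic:.4f} with d.f. = {df}")
```

The result would be compared with the upper-α chi-squared critical value for (I-1)(J-1) degrees of freedom.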

Slide 25 Stat 110B, UCLA, Ivo Dinov

Testing for Association

  • Individuals are categorized by two categorical variables. We wish to determine whether these variables are associated.

Row Categories – $A_1, \dots, A_I$
Column Categories – $B_1, \dots, B_J$

Slide 26 Stat 110B, UCLA, Ivo Dinov

n = Total number of observations
$n_{ij}$ = the number of individuals classified as $A_i$ and $B_j$
Hence, $\sum\sum n_{ij} = n$

$H_o$: $P(A_i \cap B_j) = P(A_i)P(B_j)$ for all i, j
$H_a$: Some $P(A_i \cap B_j) \neq P(A_i)P(B_j)$

Slide 27 Stat 110B, UCLA, Ivo Dinov

Expected Frequency:

$$\hat{E}_{ij} = \frac{n_{i\cdot} \times n_{\cdot j}}{n}$$
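The expected frequency follows from the null hypothesis of independence once the marginal probabilities are replaced by sample proportions. The short derivation below assumes the usual plug-in estimates $\hat{P}(A_i) = n_{i\cdot}/n$ and $\hat{P}(B_j) = n_{\cdot j}/n$, which the slide does not show explicitly.

```latex
\begin{align*}
H_o:\quad P(A_i \cap B_j) &= P(A_i)\,P(B_j) \\
\hat{P}(A_i) &= \frac{n_{i\cdot}}{n}, \qquad \hat{P}(B_j) = \frac{n_{\cdot j}}{n} \\
\hat{E}_{ij} &= n\,\hat{P}(A_i)\,\hat{P}(B_j)
             = n \cdot \frac{n_{i\cdot}}{n} \cdot \frac{n_{\cdot j}}{n}
             = \frac{n_{i\cdot}\, n_{\cdot j}}{n}
\end{align*}
```

This is the same expression as the estimated expected cell frequency in the test of homogeneity, so the two tests share one computation even though their sampling schemes differ.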

Test Statistic:

$$X^2 = \sum_{\text{rows}} \sum_{\text{columns}} \frac{(n_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}}$$

Rejection Region: $X^2 > \chi^2_{\alpha}$ with d.f. = (I-1)(J-1)

Slide 28 Stat 110B, UCLA, Ivo Dinov

The Chi-square distribution

[Figure: chi-square density curves for df = 2, 4, 7, and 10 over the range 0 to 20, with the upper-tail area prob to the right of $\chi^2_{df}(prob)$]

Lotto after 399 numbers have been drawn – Do some numbers appear more frequently in LOTTO?

[Figure: frequency (0 to 20) against number on ball (1 to 40), with the expected frequency 9.975 marked]

Figure 11.1.4 Frequency of LOTTO winning numbers

TABLE 11.1.3 Frequency of Winning Numbers in LOTTO

Entries are ball number followed by (frequency).

 1. (7)    2. (10)   3. (8)    4. (9)    5. (13)   6. (8)    7. (12)   8. (16)   9. (11)  10. (6)
11. (13)  12. (10)  13. (9)   14. (11)  15. (11)  16. (6)   17. (11)  18. (13)  19. (6)   20. (13)
21. (7)   22. (9)   23. (8)   24. (12)  25. (6)   26. (4)   27. (10)  28. (8)   29. (14)  30. (12)
31. (11)  32. (12)  33. (9)   34. (11)  35. (6)   36. (8)   37. (14)  38. (10)  39. (15)  40. (10)

Lotto after 399 numbers have been drawn – Do some numbers appear more frequently in LOTTO?

  • Number-range: [1:40]
  • Number of balls selected at each draw: 7
  • Number of samples: 57
  • Total number of balls selected: 57*7 = 399
  • Expected value of each number: 399/40 = 9.975
  • Observed $\chi^2$ statistic: $x_0$ = 30.…
  • df = 40 - 1 = 39
  • P-value = 0.…
  • Conclusion: No evidence for departure from the null hypothesis.
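Several digits on this slide are cut off in this copy, but the statistic can be recomputed from the frequencies in Table 11.1.3. The sketch below does so. The helper `chi2_sf` is a hypothetical name for an upper-tail routine built on an exact recurrence for integer degrees of freedom, and the printed values are recomputed here rather than read from the slide.

```python
import math

# Frequencies for balls 1..40, copied from Table 11.1.3.
observed = [
    7, 10, 8, 9, 13, 8, 12, 16, 11, 6,
    13, 10, 9, 11, 11, 6, 11, 13, 6, 13,
    7, 9, 8, 12, 6, 4, 10, 8, 14, 12,
    11, 12, 9, 11, 6, 8, 14, 10, 15, 10,
]

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Base cases: df = 1 (a = 1/2) or df = 2 (a = 1); a tracks the current df/2.
    if df % 2 == 1:
        a, tail = 0.5, math.erfc(math.sqrt(half))
    else:
        a, tail = 1.0, math.exp(-half)
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a * exp(-z) / Gamma(a + 1), with z = x/2.
    while 2 * a < df:
        tail += half**a * math.exp(-half) / math.gamma(a + 1)
        a += 1.0
    return tail

total = sum(observed)              # 57 draws * 7 balls
expected = total / len(observed)   # same expected count for every ball
x0 = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1
p_value = chi2_sf(x0, df)

print(f"total = {total}, expected per ball = {expected}")
print(f"x0 = {x0:.2f}, df = {df}, p-value = {p_value:.3f}")
```

A statistic below its degrees of freedom, which is the mean of the chi-squared distribution, cannot lie in the upper tail, which agrees with the slide's conclusion.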