ANOVA: Testing Means Equality and Identifying Group Differences, Study notes of Statistics

The concept of multiple comparisons in anova, which is used when we reject the null hypothesis that all treatment means are equal and want to identify which ones are significantly different. The overall error rate, planned and unplanned comparisons, and various methods for carrying out multiple comparisons, such as the bonferroni and newman-keuls procedures. It also includes examples and calculations for pairwise comparisons and interpreting the results.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-3dg
koofers-user-3dg 🇺🇸

7 documents

1 / 20

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Multiple comparisons
When we carry out an ANOVA on k treatments, we test
H0:µ1=··· =µkversus Ha:H0is false
Assume we reject the null hypothesis, i.e. we have some evidence
that not all treatment means are equal. Then we could for example
be interested in which ones are the same, and which ones differ. For
this, we might have to carry out some more hypothesis tests.
This procedure is referred to as multiple comparisons.
1
Key issue
We will be conducting, say, Tdifferent tests, and we become
concerned about the overall error rate (sometimes called the
“family-wise” error rate).
Overall error rate = Pr(reject at least one H0|all H0are true)
= 1 [1Pr(reject first |first H0is true)]Tif independent
T×Pr(reject first |first H0is true)generally
2
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14

Partial preview of the text

Download ANOVA: Testing Means Equality and Identifying Group Differences and more Study notes Statistics in PDF only on Docsity!

Multiple comparisons

When we carry out an ANOVA on k treatments, we test

H 0 : μ 1 = · · · = μk versus Ha : H 0 is false

Assume we reject the null hypothesis, i.e. we have some evidence that not all treatment means are equal. Then we could for example be interested in which ones are the same, and which ones differ. For this, we might have to carry out some more hypothesis tests.

This procedure is referred to as multiple comparisons.

1

Key issue

We will be conducting, say, T different tests, and we become concerned about the overall error rate (sometimes called the “family-wise” error rate).

Overall error rate = Pr(reject at least one H 0 | all H 0 are true)

  

= 1 − [ 1 − Pr(reject first | first H 0 is true)]T^ if independent

≤ T × Pr(reject first | first H 0 is true) generally

Types of multiple comparisons

There are two different types of multiple comparisons procedures:

Sometimes we already know in advance what questions we want to answer. Those comparisons are called planned (or a priori) comparisons.

Sometimes we do not know in advance what questions we want to answer, and the judgement about which group means will be studied the same depends on the ANOVA outcome. Those comparisons are called unplanned (or a posteriori) comparisons.

3

The distinction

Planned comparisons: adjust for just those tests that are planned.

Unplanned comparisons: adjust for all possible comparisons.

Specific questions

The investigator wants to answer three specific questions:

  • Does the addition of sugars have an effect on the lengths of the pea sections?
  • Are there differences between the pure sugar treatments and the mixed sugar treatment?
  • Are there differences among the pure sugar treatments?

Planned comparisons! 7

The plant physiologist’s data

60 65 70 75 response

treatment

l lll (^) l l (^) l ll l

l (^) l l ll l l l l l

ll^ ll^ lll^ l ll

l lll lll ll l

l l l l l

ll ll l

C

G

F

G+F

S

ANOVA table

Source SS Df MS F-value p-value

Between treatment 1077.3 4 269.3 49.4 < 0.

Within treatment 245.5 45 5.

9

The first comparison

Compare the control to the others.

Test

μC –

μG + μF + μG+F + μS 4

Look at

y^ ¯C –

y¯G + ¯yF + ¯yG+F + ¯yS 4

ANOVA table

Source SS Df MS F-value p-value

Between treatment 1077.3 4 269.3 49.4 < 0.

Within treatment 245.5 45 5.

Control versus sugars

Source SS Df MS F-value p-value

Between treatment 832.3 1 832.3 152.4 < 0.

Within treatment 245.5 45 5. 13

Pure sugars versus mixed sugar

Source SS Df MS F-value p-value

Between treatment 48.1 1 48.1 8.82 0.

Within treatment 245.5 45 5.

Among pure sugars

Source SS Df MS F-value p-value

Between treatment 196.9 2 98.4 18.0 < 0.

Within treatment 245.5 45 5.

Orthogonal comparisons

Two comparisons c 1 = (c1,1,... , c1,k) and c 2 = (c2,1,... , c2,k) are orthogonal if and only if

∑^ k

t = 1

ntct,1ct,2 = 0

where nt is the number of observations in treatment group t.

15

Summary

source SS df MS F-value

Treatment 1077.3 4 269.3 49.

Control versus sugars 832.3 1 832.3 152.

Pure sugars versus mixed sugar 48.1 1 48.1 8.

Among pure sugars 196.9 2 98.4 18.

Within treatment 245.5 45 5.

Total 1322.8 49

“Unplanned” comparisons

Suppose we are comparing k treatment groups.

Suppose ANOVA indicates that you reject H 0 : μ 1 = · · · = μk

What next?

Which of the μ’s are different from which others?

Consider testing H 0 : μi = μj for all pairs i,j.

There are

(k 2

= k^ (k 2 − 1 )such pairs.

k = 5 −→

(k 2

19

Bonferroni correction

Suppose we have 10 treatment groups, and so 45 pairs.

If we perform 45 t-tests at the significance level α = 0.05, we’d expect to reject 5% × 45 ≈ 2 of them, even if all of the means were the same.

Let α = Pr(reject at least one pairwise test | all μ’s the same)

≤ (no. tests) × Pr(reject test #1 | μ’s the same)

The Bonferroni correction:

Use α′^ = α/(no. tests) as the significance level for each test.

For example, with 10 groups and so 45 pairwise tests, we’d use α′^ = 0.05 / 45 ≈ 0.0011 for each test.

56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 Coagulation Time

Diet A

Diet B

Diet C

Diet D

Combined

21

Pairwise comparisons

Comparison p-value α′′^ = α k = 0.05 6 = 0.

A vs B 0.

A vs C < 0.

A vs D 1.

B vs C 0.

B vs D < 0.

C vs D < 0.

Results

σ ˆ = 2.

n = 10 for each group

SE = 2.34 ×

2 /10 = 1.05 for each comparison.

df = 45, α′^ = 0.005 −→ t = 2.

Groups with

| Y¯i· − Y¯j·| > 2.69 × 1. = 2.

are deemed different.

Bonferroni-corrected CIs:

( Y¯i· − Y¯j·) ± 2.

25

Tukey’s HSD

HSD = “Honest significant difference”

Reject H 0 : μi = μj if

| Y¯i· − Y¯j·| > Qα(k, df) ×

MW/n

We’re assuming equal sample sizes (n) for the treatment groups.

k = no. treatment groups; df = n · k – k

Qα(k, df) = 1 – α quantile of the “Studentized range distribution.”

We won’t go into where Qα(k, df) comes from. Suffice it to say: it’s an adjustment not unlike the Bonferroni correction, and it can be calculated using qtukey() in R. Alternatively, the function TukeyHSD() will do the whole thing.

Results

Taking α = 0.05, k = 5, df = 45,

Qα(k, df) = 4.02. qtukey(0.95, 5, 45)

Groups with

| Y¯i· − Y¯j·| > 4.02 ×

are deemed different.

27

A comparison

Uncorrected:

Each interval, individually, had (in advance) a 95% chance of covering the true mean differ- ence.

Corrected:

(In advance) there was a greater than 95% chance that all of the intervals would cover there re- spective parameters.

Example

Sorted sample means:

G+F F G S C

Table of differences:

F G S C G+F 0.2 1.3 6.1 12. F 1.1 5.9 11. G 4.8 10. S 6. 31

Example (continued)

From the ANOVA table:

MW = 5.46 n = 10 for each group

MW/10 = 0.739 df = 45

The qi (using df=45 and α = 0.05):

q 2 q 3 q 4 q 5 2.85 3.43 3.77 4.

Ri = qi ×

MW/10:

R 2 R 3 R 4 R 5

Example (continued)

Table of differences:

F G S C G+F 0.2 1.3 6.1 12. F 1.1 5.9 11. G 4.8 10. S 6.

Ri = qi ×

MW/10:

R 2 R 3 R 4 R 5

33

Results

Sorted sample means:

G+F F G S C

58.0 58.2 59.3 64.1 70.

Interpretation:

G+F ≈ F ≈ G < S < C

Example (continued)

Table of differences:

C A B E D 3.3 10.4 11.1 19. C 7.1 7.8 15. A 0.7 8. B 8.

Ri = qi ×

MW/4:

R 2 R 3 R 4 R 5

37

Results

Sorted sample means:

D C A B E

29.6 32.9 40.0 40.7 48.

Interpretation:

{D, C, A, B} < E and D < {A, B}

Varying sample sizes

For the Tukey and Newman-Keuls methods, we assumed that the numbers of responses in each treatment group were the same.

What to do if they vary?

  • If they don’t vary too much, use 1/ni + 1 /nj in place of 2/n.
  • If they are quite different, it’s probably best to just stick with the Bonferroni correction.

39

Final words on multiple comparisons

10 20 30 40 50 60 70

J

I

H

G

F

E

D

C

B

A

resp

l l l l l

l ll l^ l

l l l l l

l l (^) l l l

l l^ l l l

l l l l l

l ll^ l l

l l l l l

l (^) l l ll

l ll l l

10 20 30 40 50 60 70

F−J

A−E

resp

l l^ l l ll l^ l ll ll l lll l ll lll l ll

l (^) l l l (^) l lll l l l l lll l lll (^) l lllll