Nonparametric Testing for Location Shifts: Kruskal-Wallis and Wilcoxon Rank Sum - Prof. R., Study notes of Data Analysis & Statistical Methods

An overview of nonparametric methods for testing location parameters, specifically the Kruskal-Wallis and Wilcoxon rank sum tests. Covers the purpose and assumptions of these tests, their test statistics, hypotheses, and decision rules. The Kruskal-Wallis test is designed to deal with non-normality and is most sensitive to population distributions that are equal up to a location shift. The Wilcoxon rank sum test, also known as the Mann-Whitney test, assumes samples are taken from two independent, continuous populations and tests for median differences. Includes examples and computational details.

Typology: Study notes

Pre 2010

Uploaded on 12/09/2010

wk2151
Nonparametric Methods for Location Parameters

Kruskal-Wallis: FCSM 8.5
Wilcoxon Rank Sum: FCSM 6.3
Wilcoxon Signed-Rank: FCSM 6.5

BTRY 6010 & ILRST 6100
Nonparametric methods

Kruskal-Wallis (KW) Test

Similar in spirit to ANOVA; key assumptions include:
  • Independent observations (within and between groups)
  • Data are continuous & from independent populations

The test is intended and designed to:
  • deal with non-normality by converting data to ranks
  • be most sensitive to population distributions that are equal up to a location shift (same except location, e.g., medians)

KW null and alternative hypotheses:

$H_0$: All population distributions are identical. (Hence there is equality of means, variances, everything!)

$H_a$: The distributions are not all identical, and at least one tends to give larger observations than at least one other.


The KW test asks (without assuming normality!): Can the differences among the medians shown in these boxplots be reasonably explained as random variation from identical distributions?


Test details:

  • There are $K$ groups; combine all data together.
  • Ignoring the group labels, rank the $n_1 + \cdots + n_K$ data values in ascending order (i.e., smallest observation has rank = 1, 2nd smallest has rank = 2, etc.).
  • Associate with each observation its rank in this combined sample. Then return the data to the original groups, so each rank we just computed using the combined data now has a group label.
  • Key idea: compare the average rank values for the $K$ groups; small / large differences support $H_0$ / $H_a$.
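The ranking steps above can be sketched in a few lines of Python. This is only an illustration: the data are hypothetical, the function name `pooled_ranks` is made up for this sketch, and giving tied values the average of the ranks they occupy (midranks) is an assumption, since the basic procedure described here presumes no ties.

```python
def pooled_ranks(groups):
    """Rank every observation in the combined sample, then return the ranks group by group.

    Tied values receive the average of the positions they occupy (midranks);
    that tie rule is an assumption of this sketch.
    """
    pooled = sorted(value for group in groups for value in group)
    first_pos, count = {}, {}
    for pos, value in enumerate(pooled, start=1):
        first_pos.setdefault(value, pos)           # first (1-based) position of this value
        count[value] = count.get(value, 0) + 1     # how many times the value occurs
    rank = {v: first_pos[v] + (count[v] - 1) / 2 for v in first_pos}
    # Hand each rank back to the group its observation came from.
    return [[rank[value] for value in group] for group in groups]


# Hypothetical data for K = 3 groups (not taken from the notes).
groups = [[2.1, 3.4, 1.9], [4.0, 5.2, 3.8], [2.7, 6.1, 4.4]]
ranks = pooled_ranks(groups)
avg_ranks = [sum(r) / len(r) for r in ranks]
print(ranks)
print(avg_ranks)
```

The average ranks printed at the end are the quantities the test compares across groups.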


The actual test

Sort $y_{ij}$ for $j = 1,\dots,n_i$, $i = 1,\dots,K$ into increasing order.

For $j = 1,\dots,n_i$, $i = 1,\dots,K$: define $R_{ij}$ = rank assigned to $y_{ij}$.

Compute the average rank for group $i$:

$$T_i = \frac{1}{n_i} \sum_{j=1}^{n_i} R_{ij}$$

Basic test statistic (assumes no tied data values):

$$H = \frac{12}{n_{tot}(n_{tot}+1)} \sum_{i=1}^{K} n_i \left( T_i - \frac{n_{tot}+1}{2} \right)^2$$

$H$ compares average group rank to the value that we expect to observe under $H_0$.

Computationally convenient form:

$$H = \frac{12}{n_{tot}(n_{tot}+1)} \sum_{i=1}^{K} n_i T_i^2 - 3(n_{tot}+1)$$


Hypotheses:

$H_0$: The $K$ groups (i.e., populations) are identical.

$H_a$: The distributions are not all identical, and at least one tends to give larger observations than at least one other. (A simple version: at least one population has a different median than the others.)

As always: we need to know something about the sampling distribution of $H$ under $H_0$.

  • Tabulation of the distribution is not sensible for $H$ (see Slide 23), and exact calculations are quite difficult.
  • JMP and other software packages deal with these difficulties by using approximations.
  • Ties create problems with ranks: adjust $H$? How?
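One widely used answer to the question about ties, which may or may not be the adjustment these notes go on to present, is to assign midranks and then divide $H$ by the factor $1 - \sum_g (t_g^3 - t_g) / (n_{tot}^3 - n_{tot})$, where $t_g$ is the number of observations sharing the $g$-th tied value. The adjusted statistic is then referred to a chi-square distribution with $K - 1$ degrees of freedom. A minimal sketch, with hypothetical values and a made-up function name:

```python
def tie_correction_factor(values):
    """Return 1 - sum(t^3 - t) / (n^3 - n), where t runs over the sizes of the tied groups."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    n = len(values)
    return 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)


# Hypothetical pooled sample with ties (7 appears twice, 12 appears three times).
pooled = [7, 7, 9, 10, 12, 12, 12, 15]
factor = tie_correction_factor(pooled)

h_uncorrected = 5.0                     # hypothetical H computed from midranks
h_adjusted = h_uncorrected / factor     # the adjustment can only increase H
print(factor, h_adjusted)
```

With no ties every $t_g$ equals 1, the factor is exactly 1, and the basic statistic is unchanged.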


Example: Sand in Concrete

A manufacturer of concrete bridge supports is interested in determining the effect of varying the sand content on the strength of the supports. Five supports are made for each of five different amounts of sand in the concrete mix and each is tested for compression resistance.

Percent Sand
15%   20%   25%   30%   35%
  7    17    14    20     7
  7    12    18    24    10
 10    11    18    22    11
 15    18    19    19    15
  9    19    19    23    11
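To make the procedure concrete, here is a rough sketch that applies the KW recipe to the sand data in the table above. Two assumptions are worth flagging: tied values are given midranks with no further tie adjustment to $H$, and the p-value uses the chi-square approximation with $K - 1 = 4$ degrees of freedom (for even degrees of freedom the upper tail has a simple closed form, which is what the code uses). Software such as JMP may report slightly different numbers.

```python
import math

# Compression resistance by percent sand (columns of the table above).
sand = {
    "15%": [7, 7, 10, 15, 9],
    "20%": [17, 12, 11, 18, 19],
    "25%": [14, 18, 18, 19, 19],
    "30%": [20, 24, 22, 19, 23],
    "35%": [7, 10, 11, 15, 11],
}

pooled = sorted(v for obs in sand.values() for v in obs)
n_tot = len(pooled)

# Midrank of each distinct value: average of the 1-based positions it occupies.
positions = {}
for pos, v in enumerate(pooled, start=1):
    positions.setdefault(v, []).append(pos)
rank = {v: sum(p) / len(p) for v, p in positions.items()}

# Average rank per sand level.
avg_rank = {level: sum(rank[v] for v in obs) / len(obs) for level, obs in sand.items()}

# Basic H statistic (no tie adjustment applied in this sketch).
h = 12 / (n_tot * (n_tot + 1)) * sum(
    len(obs) * (avg_rank[level] - (n_tot + 1) / 2) ** 2 for level, obs in sand.items()
)

# Chi-square upper tail for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
df = len(sand) - 1
p_approx = math.exp(-h / 2) * sum((h / 2) ** k / math.factorial(k) for k in range(df // 2))

print(avg_rank)
print(round(h, 4), p_approx)
```

Under these assumptions $H$ comes out near 18.5, far beyond what a chi-square distribution with 4 degrees of freedom would typically produce, so the rank-based test also points to differences among the sand levels.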


Standard ANOVA: $K = 5$, $n_{tot} = 25$

[ANOVA table: SSB, SSE, SST]

There is strong evidence against $H_0$: the p-value is very small (much less than $\alpha = 0.05$). Hence we conclude that the mean degree of resistance is not the same for the five levels of sand.
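For readers who want to reproduce the sums of squares behind this conclusion, the following sketch computes SSB, SSE, SST, and the F statistic directly from the sand data listed in the example. It stops at the F statistic; obtaining the p-value requires the F distribution with $(K - 1, n_{tot} - K) = (4, 20)$ degrees of freedom, which is left to a table or to statistical software.

```python
# Compression resistance by percent sand (same data as in the example table).
sand = {
    "15%": [7, 7, 10, 15, 9],
    "20%": [17, 12, 11, 18, 19],
    "25%": [14, 18, 18, 19, 19],
    "30%": [20, 24, 22, 19, 23],
    "35%": [7, 10, 11, 15, 11],
}

all_obs = [v for obs in sand.values() for v in obs]
n_tot, k = len(all_obs), len(sand)
grand_mean = sum(all_obs) / n_tot
group_mean = {level: sum(obs) / len(obs) for level, obs in sand.items()}

# Between-group, within-group, and total sums of squares.
ssb = sum(len(obs) * (group_mean[level] - grand_mean) ** 2 for level, obs in sand.items())
sse = sum((v - group_mean[level]) ** 2 for level, obs in sand.items() for v in obs)
sst = sum((v - grand_mean) ** 2 for v in all_obs)

# F statistic: mean square between over mean square error.
f_stat = (ssb / (k - 1)) / (sse / (n_tot - k))
print(ssb, sse, sst, f_stat)
```

The F statistic is close to 15, well above the tabled 5% critical value for (4, 20) degrees of freedom (about 2.87), which is consistent with the very small p-value reported above.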


Example: Ministers & Mental Illness

Three independent samples of ministers selected from three specific religious denominations (Catholic, Methodist, and Pentecostal).

Response variable Y = mental illness awareness score

Are there differences between the distributions of these scores across the three denominations?

[Figure: Distributions of Residual Score, with Normal Quantile Plot]

Shapiro-Wilk: p = 0.


Levene’s Test for Minister data:

P-value of Levene’s test > $\alpha$ = 0.05.

Suggests that we cannot rule out the equal variances assumption.


Only an approximate p-value can be obtained when using the chi-square table (Table 8 in O&L):

H = 3.2532 < 4.

p-value > 0.

Note: The standard ANOVA that assumes normality leads to a p-value of 0.1683. The ANOVA F test and KW test nearly always lead to different quantitative results but frequently come to the same conclusion, especially when variances are comparable and normality is not severely violated.
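A table lookup only brackets the p-value, but with three groups the chi-square approximation has $K - 1 = 2$ degrees of freedom, and in that special case the upper-tail probability is simply $e^{-x/2}$. The sketch below uses that fact to turn the reported $H = 3.2532$ into an approximate p-value and to compute table-style critical values. The significance levels 0.10 and 0.05 are chosen here for illustration and are not taken from the notes.

```python
import math

h = 3.2532        # KW statistic reported in the notes
df = 3 - 1        # three denominations, so K - 1 = 2

# For df = 2 only, the chi-square upper-tail probability is exp(-x / 2).
p_approx = math.exp(-h / 2)

# Inverting the same formula gives critical values: x = -2 * ln(alpha).
crit_10 = -2 * math.log(0.10)
crit_05 = -2 * math.log(0.05)

print(round(p_approx, 4))
print(round(crit_10, 3), round(crit_05, 3))
```

The approximate p-value is about 0.197, the same order as the ANOVA p-value of 0.1683 quoted in the note, and neither test rejects at conventional levels.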


Wilcoxon Rank Sum (WRS) Test

  • Equivalent to Mann-Whitney test, so also called MWW
  • Assumes samples are taken from two independent, continuous populations (possibly non-normal).

$H_0$: The two population distributions are identical. (So means, variances, & every other feature are equal.)

$H_a$: Distributions are identical except location shift. (Example: median or $Q_1$ or $Q_3$ differs.)

($H_a$, another form: Not equal. Too broad to be useful!)

Intuition for location shift $H_a$: WRS tests for median differences.

WRS is appropriate for comparing skewed or heavy-tailed distributions. (ANOVA is not!)


Standard form of the WRS test

Compute the test statistic, say $T$, labeling as “population 1” the population that has the smaller sample size.

Hypotheses:

$H_0$: The two populations are identical.

Three possibilities for $H_a$ (one-sided & two-sided):
  • $H_a$: Population 1 is shifted to the right of population 2.
  • $H_a$: Population 1 is shifted to the left of population 2.
  • $H_a$: Populations 1 & 2 have different location parameters.

In practice: we rarely use the one-sided version of this test. As suggested on Slide 12, the two-sided version of the test is equivalent to the KW test (for the case of $K$ = 2 groups).
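The equivalence with the KW test for two groups can be checked numerically. The sketch below computes the rank sum $T$ for the smaller sample, standardizes it with the usual large-sample normal approximation (mean $n_1(n_{tot}+1)/2$, variance $n_1 n_2 (n_{tot}+1)/12$, no continuity correction), and compares the result with the KW statistic. The data are hypothetical and tie-free, the function names are invented for this sketch, and the samples are far too small for the approximation to be trusted in practice; the point is only that $H = z^2$, so the two-sided p-values coincide.

```python
import math

def rank_sum_test(sample_1, sample_2):
    """Rank sum T for sample_1 plus a two-sided normal-approximation p-value (assumes no ties)."""
    pooled = sorted(sample_1 + sample_2)
    rank = {v: i for i, v in enumerate(pooled, start=1)}
    n1, n2 = len(sample_1), len(sample_2)
    n_tot = n1 + n2
    t = sum(rank[v] for v in sample_1)
    mean_t = n1 * (n_tot + 1) / 2
    var_t = n1 * n2 * (n_tot + 1) / 12
    z = (t - mean_t) / math.sqrt(var_t)
    return t, z, math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area

def kw_two_groups(sample_1, sample_2):
    """KW statistic for K = 2 with its chi-square (1 df) p-value (assumes no ties)."""
    pooled = sorted(sample_1 + sample_2)
    rank = {v: i for i, v in enumerate(pooled, start=1)}
    n_tot = len(pooled)
    h = 12 / (n_tot * (n_tot + 1)) * sum(
        len(s) * (sum(rank[v] for v in s) / len(s) - (n_tot + 1) / 2) ** 2
        for s in (sample_1, sample_2)
    )
    return h, math.erfc(math.sqrt(h / 2))           # chi-square upper tail for 1 df

# Hypothetical data; population 1 is the smaller sample.
pop_1 = [1.1, 2.3, 3.0, 4.8]
pop_2 = [2.9, 5.5, 6.1, 7.4, 8.0]
t, z, p_wrs = rank_sum_test(pop_1, pop_2)
h, p_kw = kw_two_groups(pop_1, pop_2)
print(t, z, p_wrs)
print(h, p_kw)
```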


Example: Environmental Contamination

[Tables: the raw data; the data sorted on logPPM, with a column of ranks added]

$n_{background}$ = 7, $n_{site}$ = 7

Notice: Ranks are the same whether we rank items on PPM or logPPM.
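The invariance of ranks under the log transform holds because the logarithm is strictly increasing, so it preserves the ordering of positive values. A small sketch with hypothetical PPM values (the actual contamination data are not reproduced here) illustrates this:

```python
import math

def ranks(values):
    """Return the 1-based ascending rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

# Hypothetical concentrations in PPM (not the data from the notes).
ppm = [0.8, 12.0, 3.5, 150.0, 1.2, 40.0, 7.3]
log_ppm = [math.log(v) for v in ppm]

print(ranks(ppm))
print(ranks(log_ppm))
```

Because rank-based tests depend on the data only through the ranks, the WRS result is the same on either scale, unlike a t test or ANOVA.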
