Nonparametric Testing for Location Shifts: Kruskal-Wallis and Wilcoxon Rank Sum - Prof. R., Study notes of Data Analysis & Statistical Methods

Cornell University Data Analysis & Statistical Methods

Prof. R. Strawderman

An overview of nonparametric methods for testing location parameters, specifically the Kruskal-Wallis and Wilcoxon rank sum tests. The notes cover the purpose and assumptions of these tests, their test statistics, hypotheses, and decision rules. The Kruskal-Wallis test is designed to deal with non-normality and is most sensitive to population distributions that are equal up to a location shift. The Wilcoxon rank sum test, also known as the Mann-Whitney test, assumes samples are taken from two independent, continuous populations and tests for median differences. Examples and computational details are included.

Typology: Study notes

Pre 2010

Uploaded on 12/09/2010

wk2151 🇺🇸


Nonparametric Methods for Location Parameters

Kruskal-Wallis: FCSM 8.5
Wilcoxon Rank Sum: FCSM 6.3
Wilcoxon Signed-Rank: FCSM 6.5

BTRY 6010 & ILRST 6100
Nonparametric methods


Kruskal-Wallis (KW) Test

Similar in spirit to ANOVA; key assumptions include:

- Independent observations (within and between groups)
- Data are continuous and from independent populations

The test is intended and designed to:

- deal with non-normality by converting data to ranks
- be most sensitive to population distributions that are equal up to a location shift (same except location, e.g., medians)

KW null and alternative hypotheses:

- $H_0$: All population distributions are identical. (Hence there is equality of means, variances, everything!)
- $H_a$: Not all distributions are identical, and at least one tends to give larger observations than at least one other.


The KW test asks (without assuming normality!): Can the differences among the medians shown in these boxplots be reasonably explained as random variation from identical distributions?


Test details:

- There are $K$ groups; combine all data together.
- Ignoring the group labels, rank the $n_{\text{tot}}$ data values in ascending order (i.e., smallest observation has rank = 1, 2nd smallest has rank = 2, etc.).
- Associate with each observation its rank in this combined sample. Then return the data to the original groups, so each rank we just computed using the combined data now has a group label.
- Key idea: compare the average rank values for the $K$ groups; small / large differences support $H_0$ / $H_a$.
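The ranking steps can be sketched in a few lines of Python. This is a minimal illustration that assumes no tied values; the function name `pooled_ranks` and the data are hypothetical and do not come from the notes.

```python
def pooled_ranks(groups):
    """Rank all observations together (1 = smallest), then split ranks by group.

    Assumes no tied values. Ranks within each group are returned in
    increasing order, not in the original order of the observations.
    """
    # Tag each observation with its group index, then sort on the value.
    tagged = [(value, g) for g, values in enumerate(groups) for value in values]
    tagged.sort(key=lambda pair: pair[0])

    ranks = [[] for _ in groups]
    for rank, (_, g) in enumerate(tagged, start=1):
        ranks[g].append(rank)
    return ranks


# Hypothetical data for K = 3 groups (not from the notes).
groups = [[2.1, 3.4, 1.9], [4.8, 5.2, 3.9], [6.0, 5.5, 7.1]]
ranks = pooled_ranks(groups)
mean_ranks = [sum(r) / len(r) for r in ranks]
print(ranks)
print(mean_ranks)
```

In this made-up data set the groups do not overlap at all, so the average ranks are as far apart as they can be, which is the pattern that would support the alternative hypothesis.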


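The actual test

- Sort $y_{ij}$ for $j = 1,\dots,n_i$, $i = 1,\dots,K$ into increasing order.
- For $j = 1,\dots,n_i$, $i = 1,\dots,K$: define $R_{ij}$ = rank assigned to $y_{ij}$.
- Compute the average rank for group $i$:

$$\bar{R}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} R_{ij}$$

- Basic test statistic (assumes no tied data values):

$$H = \frac{12}{n_{\text{tot}}(n_{\text{tot}}+1)} \sum_{i=1}^{K} n_i \left( \bar{R}_i - \frac{n_{\text{tot}}+1}{2} \right)^2$$

$H$ compares average group rank to value that we expect to observe under $H_0$.

Computationally convenient form, with $T_i = \sum_{j=1}^{n_i} R_{ij}$ the rank sum for group $i$:

$$H = \frac{12}{n_{\text{tot}}(n_{\text{tot}}+1)} \sum_{i=1}^{K} \frac{T_i^2}{n_i} - 3(n_{\text{tot}}+1)$$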




- Hypotheses:
  - $H_0$: The $K$ groups (i.e., populations) are identical.
  - $H_a$: All distributions are not identical, and at least one tends to give larger observations than at least one other. (A simple version: at least one population has a different median than the others.)
- As always: we need to know something about the sampling distribution of $H$ under $H_0$.
- Tabulation of distribution is not sensible for $H$ (see Slide 23), and exact calculations are quite difficult. JMP and other software packages deal with these difficulties by using approximations.
- Ties create problems with ranks: adjust $H$? how?
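One widely used answer to the ties question, shown here as a sketch and not necessarily the adjustment these notes go on to use, is to give tied values the average of the ranks they occupy (midranks) and then divide $H$ by $1 - \sum_j (t_j^3 - t_j) / (n_{\text{tot}}^3 - n_{\text{tot}})$, where $t_j$ is the number of observations in the $j$-th set of ties. The function names and data below are hypothetical.

```python
from collections import Counter


def midranks(values):
    """Map each distinct value to the average of the ranks it occupies."""
    counts = Counter(values)
    rank_of = {}
    next_rank = 1
    for v in sorted(counts):
        t = counts[v]
        # Average of the ranks next_rank, ..., next_rank + t - 1.
        rank_of[v] = next_rank + (t - 1) / 2.0
        next_rank += t
    return rank_of, counts


def kruskal_wallis_h_ties(groups):
    """Tie-corrected KW statistic (undefined if every value is identical)."""
    pooled = [v for values in groups for v in values]
    n_tot = len(pooled)
    rank_of, counts = midranks(pooled)

    # Rank-sum form of H, computed on midranks.
    h = 12.0 / (n_tot * (n_tot + 1)) * sum(
        sum(rank_of[v] for v in values) ** 2 / len(values) for values in groups
    ) - 3 * (n_tot + 1)

    # Correction factor; equals 1 when there are no ties.
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n_tot ** 3 - n_tot)
    return h / correction


# Hypothetical data with two pairs of tied values (not from the notes).
h_adj = kruskal_wallis_h_ties([[1, 2, 2], [3, 4, 4]])
print(h_adj)
```

With no ties the correction factor is 1 and the function reduces to the basic statistic, so the adjustment only ever increases $H$.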


Example: Sand in Concrete

A manufacturer of concrete bridge supports is interested in determining the effect of varying the sand content on the strength of the supports. Five supports are made for each of five different amounts of sand in the concrete mix and each is tested for compression resistance.

[Data table: compression resistance by Percent Sand (15%, 20%, 25%, 30%, 35%)]


Standard ANOVA: $n_i = 5$, $n_{\text{tot}} = 25$

[ANOVA table: SSB, SSE, SST]

There is strong evidence against $H_0$: the p-value is very small (much less than $\alpha = 0.05$). Hence we conclude that the mean degree of resistance is not the same for the five levels of sand.


Example: Ministers & Mental Illness

- Three independent samples of ministers selected from three specific religious denominations (Catholic, Methodist, and Pentecostal).
- Response variable Y = mental illness awareness score
- Are there differences between the distributions of these scores across the three denominations?

[Figure: distribution and normal quantile plot of the residual scores; Shapiro-Wilk: p = 0.]


Levene's Test for Minister data:

P-value of Levene's test > $\alpha$ = 0.05.

Suggests that we cannot rule out equal variances assumption.


Only an approximate p-value can be obtained when using chi-square table (Table 8 in O&L):

H = 3.2532 < 4.

p-value > 0.

Note: The standard ANOVA that assumes normality leads to a p-value of 0.1683. The ANOVA F test and KW test nearly always lead to different quantitative results but frequently come to the same conclusion, especially when variances are comparable and normality is not severely violated.
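With three denominations, the chi-square approximation to $H$ has $K - 1 = 2$ degrees of freedom, and for 2 degrees of freedom the upper-tail probability has the closed form $e^{-h/2}$. The sketch below uses that shortcut to turn the reported $H = 3.2532$ into an approximate p-value without a table. The 0.10 critical value is computed only for comparison and is an illustrative choice, not a value taken from the notes.

```python
import math

h = 3.2532   # KW statistic reported in the notes
k = 3        # three denominations, so df = k - 1 = 2

# Chi-square upper-tail probability; this closed form holds for df = 2 only.
p_value = math.exp(-h / 2.0)

# Critical value for df = 2 at the 0.10 level (illustrative level, an assumption).
crit_10 = -2.0 * math.log(0.10)

print(round(p_value, 4), round(crit_10, 3))
```

The approximate p-value comes out near 0.197, in the same range as the ANOVA p-value of 0.1683, so both tests fail to reject at the 0.05 level.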


Wilcoxon Rank Sum (WRS) Test

- Equivalent to Mann-Whitney test, so also called MWW
- Assumes samples are taken from two independent, continuous populations (possibly non-normal).
- $H_0$: The two population distributions are identical. (So means, variances, & every other feature are equal.)
- $H_a$: Distributions are identical except location shift. (Example: median or Q differs.)
- ($H_a$, another form: Not equal. Too broad to be useful!)
- Intuition for location shift $H_a$: WRS tests for median differences.

WRS is appropriate for comparing skewed or heavy-tailed distributions. (ANOVA is not!)


Standard form of the WRS test

- Compute the test statistic, say $T$, labeling as "population 1" the population that has the smaller sample size.
- Hypotheses:
  - $H_0$: The two populations are identical.
  - Three possibilities for $H_a$ (one-sided & two-sided):
    - $H_a$: Population 1 is shifted to the right of population 2.
    - $H_a$: Population 1 is shifted to the left of population 2.
    - $H_a$: Populations 1 & 2 have different location parameters.
- In practice: we rarely use one-sided version of this test. As suggested on Slide 12, two-sided version of test is equivalent to KW test (for the case of $K$ = 2 groups).
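A minimal sketch of the rank sum statistic follows, taking $T$ to be the sum of the pooled-sample ranks belonging to population 1. The two-sided p-value uses the large-sample normal approximation with mean $n_1(n_1+n_2+1)/2$ and variance $n_1 n_2 (n_1+n_2+1)/12$, which is an assumption made here for illustration; the notes may rely on tables or software instead, and the approximation is rough for samples as small as the hypothetical ones below. No ties are assumed.

```python
import math


def wilcoxon_rank_sum(sample1, sample2):
    """Return (T, z, two-sided p) with T = rank sum of sample1; assumes no ties."""
    n1, n2 = len(sample1), len(sample2)
    tagged = sorted([(v, 1) for v in sample1] + [(v, 2) for v in sample2])

    # T adds up the pooled ranks that belong to sample 1.
    t_stat = sum(rank for rank, (_, label) in enumerate(tagged, start=1) if label == 1)

    mean_t = n1 * (n1 + n2 + 1) / 2.0
    var_t = n1 * n2 * (n1 + n2 + 1) / 12.0
    z = (t_stat - mean_t) / math.sqrt(var_t)

    # Two-sided normal tail probability, P(|Z| >= |z|).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return t_stat, z, p_two_sided


# Hypothetical data; sample1 is the smaller sample, as in the standard form.
sample1 = [1.1, 2.3, 0.7]
sample2 = [3.4, 2.9, 4.1, 5.0]
t_stat, z, p = wilcoxon_rank_sum(sample1, sample2)
print(t_stat, z, p)
```

Without ties, $z^2$ equals the KW statistic $H$ computed on the same two groups, which is one way to see why the two-sided WRS test and the KW test with $K = 2$ agree.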


Example: Environmental Contamination

[Tables: the raw data; the data, now sorted on logPPM, with column of ranks added]

$n_{\text{background}} = 7$, $n_{\text{site}} = 7$

Notice: Ranks are the same whether we rank items on PPM or logPPM.
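The ranks agree because the logarithm is strictly increasing, so it preserves the ordering of the observations. A short check with made-up concentrations (the values and the helper name `ranks` are hypothetical, and no ties are assumed):

```python
import math


def ranks(values):
    """Return the rank (1 = smallest) of each value, in the original order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


ppm = [0.5, 12.0, 3.2, 150.0, 1.1, 47.0]   # hypothetical PPM readings
log_ppm = [math.log(x) for x in ppm]

same = ranks(ppm) == ranks(log_ppm)
print(ranks(ppm), same)
```

Any rank-based statistic, including the WRS statistic $T$ and the KW statistic $H$, is therefore unchanged by such a transformation.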

