Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Permutation tests, Exercises of Genetics

University of Pennsylvania (UPenn)Genetics

A permutation test gives a simple way to compute the sampling distribution for any test statistic, under the strong null hypothesis that a set of genetic ...

Typology: Exercises

2022/2023

Uploaded on 05/11/2023

bridge 🇺🇸

4.9

(13)

287 documents

1 / 27

This page cannot be seen from the preview

Don't miss anything!

Permutation tests

Ken Rice

Thomas Lumley

UW Biostatistics

Seattle, June 2008

Discover Exercises of Genetics University of Pennsylvania (UPenn)

Partial preview of the text

Download Permutation tests and more Exercises Genetics in PDF only on Docsity!

Permutation tests

Ken Rice Thomas Lumley

UW Biostatistics

Seattle, June 2008

Overview

Permutation tests
A mean
Smallest p-value across multiple models
Cautionary notes

Permutations

To estimate the sampling distribution of the test statistic we need many samples generated under the strong null hypothesis.

If the null hypothesis is true, changing the exposure would have no effect on the outcome. By randomly shuffling the exposures we can make up as many data sets as we like.

If the null hypothesis is true the shuffled data sets should look like the real data, otherwise they should look different from the real data.

The ranking of the real test statistic among the shuffled test statistics gives a p-value

Example: null is true

gender outcome

Data

gender outcome

Shuffling outcomes

gender outcome

Shuffling outcomes (ordered)

Means

Our first example is a difference in mean outcome in a dominant model for a single SNP

make up some ‘true’ data

carrier<-rep(c(0,1), c(100,200)) null.y<-rnorm(300) alt.y<-rnorm(300, mean=carrier/2)

In this case we know from theory the distribution of a difference in means and we could just do a t-test.

We can compare the t-test results to the results of a permutation test on the mean difference.

Means: t-test

t.test(null.y~carrier, var.equal=TRUE) Two Sample t-test

data: null.y by carrier t = 0.5348, df = 298, p-value = 0. alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.1740603 0. sample estimates: mean in group 0 mean in group 1 0.067211652 0.

t.test(alt.y~carrier, var.equal=TRUE) Two Sample t-test

data: alt.y by carrier t = -5.0415, df = 298, p-value = 8.033e- alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.8381897 -0. sample estimates: mean in group 0 mean in group 1 -0.1405250 0.

Means: permutation test

hist(many.truenull) abline(v=null.diff, lwd=2, col="purple") mean(abs(many.truenull) > abs(null.diff))

hist(many.falsenull) abline(v=alt.diff, lwd=2, col="purple") mean(abs(many.falsenull) > abs(alt.diff))

Example: null is true

616 shuffled differences exceeed true difference: p = 617/ 1001

How many permutations?

With 1000 permutations the smallest possible p-value is 0.001, and the uncertainty near p = 0.05 is about ±1%

If we have multiple testing we may need much more precision.

Using 100,000 permutations reduces the uncertainty near p = 0 .05 to ± 0 .1% and allows p-values as small as 0.00001.

A useful strategy is to start with 1000 permutations and continue to larger numbers only if p is small enough to be interesting, eg p < 0 .1.

Parallel computing of permutations is easy: just run R on multiple computers.

Debugging

R error messages are sometimes opaque, because the error occurs in a low-level function.

traceback() reports the entire call stack, which is useful for seeing where the error really happened.

Suppose our outcome variable were actually a data frame with one column rather than a vector:

one.test(carrier,ywrong) Error in ‘[.data.frame‘(y, xstar == 1) : undefined columns selected

We didn’t know we were calling ‘[.data.frame‘, so we don’t understand the message

Debugging

The post-mortem debugger lets you look inside the code where the error occurred.

options(error=recover) one.test(carrier,ywrong) Error in ‘[.data.frame‘(y, xstar == 1) : undefined columns selected

Enter a frame number, or 0 to exit

1: one.test(carrier, ywrong) 2: mean(y[xstar == 1]) 3: y[xstar == 1] 4: ‘[.data.frame‘(y, xstar == 1)

Selection: 1 Called from: eval(expr, envir, enclos)

Debugging

Browse[1]> str(xstar) num [1:300] 1 1 1 1 1 1 1 0 1 1 ... Browse[1]> str(y) ’data.frame’: 300 obs. of 1 variable: $ null.y: num 1.265 0.590 -0.722 0.676 -0. ...

Turn the post-mortem debugger off with

options(error=NULL)

Minimum p-value

dat <- data.frame(y=rep(0:1,each=100), SNP1=rbinom(200,2,.1), SNP2=rbinom(200,2,.2),SNP3=rbinom(200,2,.2), SNP4=rbinom(200,2,.4),SNP5=rbinom(200,2,.1), SNP6=rbinom(200,2,.2),SNP7=rbinom(200,2,.2), SNP8=rbinom(200,2,.4))

head(dat) y SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7 SNP 1 0 0 0 0 0 0 1 0 0 2 0 0 1 0 1 0 1 0 2 3 0 0 1 0 1 1 0 0 0 4 0 0 0 1 1 0 0 0 0 5 0 0 1 0 1 1 0 0 0 6 0 0 0 0 1 0 1 0 1

Minimum p-value

oneZ<-function(outcome, snp){ model <- glm(outcome~snp, family=binomial()) coef(summary(model))["snp","z value"] } maxZ<-function(outcome, snps){ allZs <- sapply(snps, function(snp) oneZ(outcome, snp)) max(abs(allZs)) }

true.maxZ<-maxZ(dat$y, dat[,-1])

manypermZ<-replicate(10000, maxZ(sample(dat$y), dat[,-1]))

Permutation tests, Exercises of Genetics

Related documents

Partial preview of the text

Download Permutation tests and more Exercises Genetics in PDF only on Docsity!

Permutation tests

Overview

Permutations

Example: null is true

Means

make up some ‘true’ data

Means: t-test

Means: permutation test

Example: null is true

How many permutations?

Debugging

Debugging

Debugging

Minimum p-value

Minimum p-value