Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Stat 421 Homework 5: Analyzing Flux Data and Hypothesis Testing using CLT - Prof. Friedric, Assignments of Statistics

University of Washington (UW) - Seattle Statistics

Prof. Friedrich-Wilhelm Scholz

Instructions for homework 5 in stat 421, a fall 2008 course. Students are required to analyze a flux dataset, understand a function named pval.delta, and test a hypothesis about randomly generated data using the central limit theorem (clt) or simulation. The goal is to determine the probability that the mean of a dataset follows a normal distribution and to find the confidence interval for a parameter expressing the shift in means due to a difference in fluxes.

Typology: Assignments

Pre 2010

Uploaded on 03/19/2009

koofers-user-7la 🇺🇸

9 documents

1 / 4

This page cannot be seen from the preview

Don't miss anything!

Stat 421, Fall 2008

Fritz Scholz

Homework 5

Due: October 31, 2008

Problem 1: Using the Flux data set, assume that for some reason we

received less data, say 5 for flux X and 5 for flux Y. In particular,

work with the flux data set, where the rows/boards 1,5,6,9,10,13,14,15

have been removed. Explain line by line what the following function does,

by embedding comment lines # ......

Note that there is a function fun0 defined within pval.delta. That works as long as

you define the function before you call it. See what happens if you define the function

after you first call it.

pval.delta = function (delta=0)

{

fun0=function(ind,y){mean(y[ind])-mean(y[-ind])}

# The following assumes that the original data set

# flux is in your work space.

flux0=flux[-c(1,5,6,9,10,13,14,15),]

SIR=flux0$SIR

FLUX=flux0$FLUX

sir.delta=SIR

sir.delta[FLUX=="Y"]=sir.delta[FLUX=="Y"]-delta

ybar.xbar.observed=mean(SIR[FLUX=="Y"])-delta-

mean(SIR[FLUX=="X"])

ybar.xbar.ref.dist=combn(1:10,5,FUN=fun0,y=sir.delta)

pval=mean(abs(ybar.xbar.ref.dist)>=.999999*abs(ybar.xbar.obs

erved))

pval

}

Think of delta as a possible effect due to flux difference.

By subtracting out that delta from all SIR values that were

treated with Y we remove such a difference effect. The test

of the null hypothesis (no difference in flux) should then lead

to acceptance (high p-values) as long as we remove the

right amount delta from the SIR values obtained under FLUX Y.

In effect we test the hypothesis that the effect of FLUX=Y is a

delta shift in the SIR values relative to the SIR values under FLUX=X.

After removing such a delta shift from SIR under FLUX Y we treat

Discover Assignments of Statistics University of Washington (UW) - Seattle

Partial preview of the text

Download Stat 421 Homework 5: Analyzing Flux Data and Hypothesis Testing using CLT - Prof. Friedric and more Assignments Statistics in PDF only on Docsity!

Stat 421, Fall 2008 Fritz Scholz Homework 5 Due: October 31, 2008 Problem 1: Using the Flux data set, assume that for some reason we received less data, say 5 for flux X and 5 for flux Y. In particular, work with the flux data set, where the rows/boards 1,5,6,9,10,13,14, have been removed. Explain line by line what the following function does, by embedding comment lines # ...... Note that there is a function fun0 defined within pval.delta. That works as long as you define the function before you call it. See what happens if you define the function after you first call it. pval.delta = function (delta=0) { fun0=function(ind,y){mean(y[ind])-mean(y[-ind])}

The following assumes that the original data set

flux is in your work space.

flux0=flux[-c(1,5,6,9,10,13,14,15),] SIR=flux0$SIR FLUX=flux0$FLUX sir.delta=SIR sir.delta[FLUX=="Y"]=sir.delta[FLUX=="Y"]-delta ybar.xbar.observed=mean(SIR[FLUX=="Y"])-delta- mean(SIR[FLUX=="X"]) ybar.xbar.ref.dist=combn(1:10,5,FUN=fun0,y=sir.delta) pval=mean(abs(ybar.xbar.ref.dist)>=.999999*abs(ybar.xbar.obs erved)) pval } Think of delta as a possible effect due to flux difference. By subtracting out that delta from all SIR values that were treated with Y we remove such a difference effect. The test of the null hypothesis (no difference in flux) should then lead to acceptance (high p-values) as long as we remove the right amount delta from the SIR values obtained under FLUX Y. In effect we test the hypothesis that the effect of FLUX=Y is a delta shift in the SIR values relative to the SIR values under FLUX=X. After removing such a delta shift from SIR under FLUX Y we treat

the situation as though there is no FLUX effect, which represented our null hypothesis when we first introduced this example. Exercise this function for various values of delta and record the resulting output pval. After getting a feel for this function try to determine for which set of delta values your p-values stay at or above .05. Write a loop function , say delta.plot, that exercises the above function for an appropriate grid-vector of delta values, gets the resulting vector of p-values, and plots the latter against delta, drawing a horizontal line at the level .05 (using abline(h=.05)). Give the code and the plot. The set of delta values with p-value >= .05 is an interval. Try to determine the interval end points to within one decimal accuracy, e.g, [-2.1,-.5]. This may take some chiseling in the choices of the delta vector in your loop function. What you will have determined is a 95% confidence interval for delta, a parameter that expresses the shift in means due to a difference in fluxes. This is a confidence interval for delta based on the randomization test. Does this interval contain zero? Would you have accepted the hypothesis of no flux difference without making any delta adjustment? [ Further commentary: Such confidence intervals (sets) exploit the basic duality between confidence sets and tests of hypotheses. Our hypothesis H(delta.0): delta=delta.0 expresses the fact that after subtracting delta.0 from all SIR.Y values there is no more difference between FLUX X and FLUX Y. This can be examined by testing H.0: no difference in Fluxes in view of our modified SIR values, i.e., SIR.X and SIR.Y-delta.0. The fact that such delta shift in the SIR responses represent the only possible way in which FLUX X and FLUX Y may differ is a strong assumption. The basic duality between confidence sets and tests of hypotheses consists in defining as 1-alpha level confidence set the set C(SIR.X,SIR.Y) of all delta.0 for which we cannot reject the hypothesis H(delta.0) at level alpha based on SIR.X and SIR.Y, or for which we cannot reject the hypothesis H.0 based on SIR.X and SIR.Y-delta.0. Then delta.0 is in C(SIR.X,SIR.Y) if and only if H(delta.0): delta=delta.0 is accepted by the corresponding level alpha test. We then have P(delta.0 in C(SIR.X,SIR.Y) |H(delta.0)) = P(H(delta.0) is accepted|H(delta.0)) =1-alpha.

To apply the CLT, find the mean and variance of each Bi and from that the mean and variance of Bidi (treating the di as given constants) and from that the mean and variance of Dbar, using the rules of expectation and variance. Based on this find the normal approximation to the full reference distribution of the Dbar values and give the approximate two-sided p-value of Dbar.observed. Examine the variance ratio criterion that plays an important role in the CLT. What is its numerical size here, expressed in terms of the variances of Bidi/n? For the simulation approach generate random vectors (B 1 ,…, Bn) and evaluate Dbar each time, using the given vector (d 1 , …, dn ). Hint: to generate such a vector (B 1 ,…, Bn), note that rbinom(n,1,.5) generates a vector of length n filled with 0’s and 1’s, independently chosen with probability ½ each. You can transform 0’s and 1’s into -1 and +1, respectively, using rbinom(n,1,.5)*a-b for appropriate a and b, try various choices of a and b to get the hang of it. Do this generation of Dbar Nsim=10000 times in a loop as part of a function. This function should also plot a histogram of the Nsim generated Dbar values and show superimposed the normal density determined in the previous approach. This simulated histogram is also a good approximation to the full reference distribution. Use this simulated set of Nsim Dbar values to estimate the two-sided p-value for Dbar.observed. Give the details of your normal approximation, the value of the variance ratio criterion, your simulation code, the histogram plot with normal density, and the two p-value assessments. Try to make the code general enough so that it can be applied to a data set D.vec of any length n>1.

Stat 421 Homework 5: Analyzing Flux Data and Hypothesis Testing using CLT - Prof. Friedric, Assignments of Statistics

Related documents

Partial preview of the text

Download Stat 421 Homework 5: Analyzing Flux Data and Hypothesis Testing using CLT - Prof. Friedric and more Assignments Statistics in PDF only on Docsity!

The following assumes that the original data set

flux is in your work space.