Statistical Analysis Homework: Comparing Two Groups and Hypothesis Testing - Prof. Peter H, Assignments of Statistics

A homework assignment from the university statistics course Stat 502. The first problem analyzes data from two groups of prairie voles, one receiving food supplements and the other foraging for all of its food; its tasks include drawing histograms and boxplots, computing means and medians, and evaluating the difference between the groups with a two-sample t-test and a randomization test. The assignment also covers the null distribution of p-values (derived for the one-sample t-test and simulated for the two-sample t-test), the derivation of the distribution of the two-sample t-statistic under the null hypothesis, including the distribution of the sum of two independent chi-square random variables, and a sample-size calculation for a confidence interval.

Typology: Assignments

Pre 2010

Uploaded on 03/10/2009

koofers-user-9dj
Stat 502
Homework 2
Assigned 10/9/08
Due 10/16/08
1. (Voles): Researchers studying 24 wild prairie voles are interested in the effects of food
supplements on reproductive success. The researchers randomly assigned 12 voles to receive
supplements (group B), with the remaining 12 voles needing to forage to obtain all of their
food (group A). At the end of the mating season the number of pups for each of the 24
females was recorded.
(a) Make a histogram and boxplots for each of the two groups. Comment on the differences.
(b) Compute the means and medians of each group. Comment on the differences.
(c) Plot the density of the appropriate t-distribution if one were to use the ordinary
two-sample t-test to evaluate differences between the groups. Obtain the corresponding
p-value. Write down the assumptions which validate the use of this p-value, and comment
on whether or not they are met for these data.
(d) Make a histogram of the randomization distribution of the t-statistic, and compute the
corresponding p-value. Write down the assumptions which validate the use of this
p-value, and comment on whether or not they are met for these data.
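A minimal sketch of how the randomization distribution in part (d) might be approximated, using only the Python standard library. The pup counts below are hypothetical placeholders, not the assignment's data, and the function names `pooled_t` and `randomization_pvalue` are illustrative. The sketch assumes the pooled-variance form of the t-statistic and a two-sided p-value, and it samples random reassignments rather than enumerating all of them.

```python
import random
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t-statistic with pooled variance (equal-variance form)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def randomization_pvalue(a, b, n_perm=5000, seed=1):
    """Monte Carlo approximation to the two-sided randomization p-value."""
    rng = random.Random(seed)
    t_obs = pooled_t(a, b)
    pooled = list(a) + list(b)
    t_null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # one random reassignment of voles to groups
        t_null.append(pooled_t(pooled[:len(a)], pooled[len(a):]))
    p = sum(abs(t) >= abs(t_obs) for t in t_null) / n_perm
    return t_obs, t_null, p


# HYPOTHETICAL pup counts, made up for illustration; NOT the assignment's data.
group_a = [1, 3, 2, 0, 4, 2, 3, 1, 2, 5, 3, 2]
group_b = [4, 5, 3, 6, 2, 7, 5, 4, 6, 3, 5, 8]

t_obs, t_null, p = randomization_pvalue(group_a, group_b)
print(f"t = {t_obs:.3f}, randomization p-value = {p:.4f}")
```

The list `t_null` holds the simulated t-statistics, so a histogram of it gives the randomization distribution asked for in part (d).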
2. (Null distribution of p-values): Recall that a p-value is a function of the data, and so before
the experiment is run it is a random variable.
(a) Consider the one-sample t-test for evaluating evidence against $H_0: \mu = \mu_0$. Derive the
distribution of the p-value under the null hypothesis. Using the result, show that the
type I error of a level-$\alpha$ test is $\alpha$.
(b) Now consider the hypothesis $H_0: \mu_A = \mu_B$. Via simulation, compute the null
distribution of the p-value based on the two-sample t-test under each of the following six
experimental scenarios.
i. $Y_{1,A}, \dots, Y_{10,A} \sim$ i.i.d. normal(1,1), $Y_{1,B}, \dots, Y_{10,B} \sim$ i.i.d. normal(1,1)
ii. $Y_{1,A}, \dots, Y_{10,A} \sim$ i.i.d. normal(1,1), $Y_{1,B}, \dots, Y_{10,B} \sim$ i.i.d. normal(1,3)
iii. $Y_{1,A}, \dots, Y_{10,A} \sim$ i.i.d. Poisson(1), $Y_{1,B}, \dots, Y_{10,B} \sim$ i.i.d. Poisson(1)
iv. $Y_{1,A}, \dots, Y_{10,A} \sim$ i.i.d. normal(1,1), $Y_{1,B}, \dots, Y_{10,B} \sim$ i.i.d. normal(3,1)
v. $Y_{1,A}, \dots, Y_{10,A} \sim$ i.i.d. normal(1,1), $Y_{1,B}, \dots, Y_{10,B} \sim$ i.i.d. normal(3,3)
vi. $Y_{1,A}, \dots, Y_{10,A} \sim$ i.i.d. Poisson(1), $Y_{1,B}, \dots, Y_{10,B} \sim$ i.i.d. Poisson(3)
More specifically, for each of the above scenarios,
  • simulate 1000 (or more) runs of the 10-sample experiment;
  • compute the p-value for each of the 1000 runs;
  • plot the empirical distribution of the 1000 p-values (using, for example, a histogram)
    and compare to the distribution from part (a).
Write a sentence or two about what you have learned from each of these simulations (hint: you should have learned something about robustness of the t-test, power of the t-test and about whether your calculation in part (a) was correct).
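The simulation steps in problem 2(b) could be sketched roughly as below, using only the Python standard library. Several choices here are assumptions rather than part of the assignment: the second argument of normal(·,·) is read as a variance (matching the normal(μ, σ²) notation in the t-distribution problem), the test is the pooled-variance two-sample t-test with 18 degrees of freedom and a two-sided p-value, the t tail probability is obtained by Simpson's rule instead of a statistics library, and a run with zero pooled variance is assigned a p-value of 1. All function names are illustrative.

```python
import math
import random
from statistics import mean, variance


def t_two_sided_pvalue(t, df, steps=400):
    """P(|T| >= |t|) for T ~ t_df, via Simpson's rule on the t density."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    dens = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    s = dens(0) + dens(upper)
    s += sum((4 if i % 2 else 2) * dens(i * h) for i in range(1, steps))
    central = s * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * central)


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def normal(mu, var):
    # ASSUMPTION: the second parameter is a variance, so gauss gets sqrt(var).
    return lambda rng: rng.gauss(mu, math.sqrt(var))


SCENARIOS = {
    "i": (normal(1, 1), normal(1, 1)),
    "ii": (normal(1, 1), normal(1, 3)),
    "iii": (lambda r: poisson(r, 1), lambda r: poisson(r, 1)),
    "iv": (normal(1, 1), normal(3, 1)),
    "v": (normal(1, 1), normal(3, 3)),
    "vi": (lambda r: poisson(r, 1), lambda r: poisson(r, 3)),
}


def simulate_pvalues(draw_a, draw_b, n=10, runs=1000, seed=1):
    """p-values from `runs` simulated experiments with n units per group."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(runs):
        a = [draw_a(rng) for _ in range(n)]
        b = [draw_b(rng) for _ in range(n)]
        sp2 = (variance(a) + variance(b)) / 2  # pooled variance, equal n
        if sp2 == 0:  # all values identical: t is undefined (assumed p = 1)
            pvals.append(1.0)
            continue
        t = (mean(b) - mean(a)) / math.sqrt(sp2 * 2 / n)
        pvals.append(t_two_sided_pvalue(t, 2 * n - 2))
    return pvals


results = {name: simulate_pvalues(da, db) for name, (da, db) in SCENARIOS.items()}
for name, pvals in results.items():
    frac = sum(p <= 0.05 for p in pvals) / len(pvals)
    print(f"scenario {name}: fraction of p-values <= 0.05 is {frac:.3f}")
```

Each entry of `results` is a list of 1000 p-values that can be passed to any plotting tool to draw the histogram the problem asks for.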

3. (t-distribution)
(a) Recall that we defined the $\chi^2_m$ distribution as the distribution of a sum of $m$ squared standard normal random variables. Thus if $X \sim \chi^2_m$ then we can think of $X$ as being represented by $X = Z_1^2 + \cdots + Z_m^2$ where $Z_1, \dots, Z_m \sim$ i.i.d. normal(0,1). Use this to derive the distribution of $X_1 + X_2$, where $X_1 \sim \chi^2_{m_1}$, $X_2 \sim \chi^2_{m_2}$, and $X_1$ and $X_2$ are independent.
(b) Let $Y_1, \dots, Y_n \sim$ i.i.d. normal($\mu, \sigma^2$), and let $Z_i = (Y_i - \mu)/\sigma$.
i. What is the distribution of $Z_1, \dots, Z_n$?
ii. Write out $(Z_i - \bar{Z})$ in terms of the $Y$'s, $\mu$ and $\sigma^2$.
iii. Use the fact that $\sum_{i=1}^{n} (Z_i - \bar{Z})^2 \sim \chi^2_{n-1}$ to derive the distribution of $\sum_{i=1}^{n} (Y_i - \bar{Y})^2/\sigma^2$.
(c) Let $Y_{1,A}, \dots, Y_{n_A,A} \sim$ i.i.d. normal($\mu_A, \sigma^2$) and $Y_{1,B}, \dots, Y_{n_B,B} \sim$ i.i.d. normal($\mu_B, \sigma^2$). Use the results in (a) and (b) to obtain the distribution of $(n_A - 1)s_A^2/\sigma^2 + (n_B - 1)s_B^2/\sigma^2$. Indicate how you are using the results from (a) and (b). Note that $(n_A - 1)s_A^2/\sigma^2 + (n_B - 1)s_B^2/\sigma^2$ is equal to $(n_A + n_B - 2)s_p^2/\sigma^2$.
(d) Use the above results to derive the distribution of the two-sample t-statistic under $H_0: \mu_A = \mu_B$.
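For reference, the quantities named in parts (c) and (d) can be written out as follows. The assignment text does not restate these definitions, so this block assumes the usual ones: the sample variances, the pooled variance, and the pooled-variance two-sample t-statistic. The sign convention (group B minus group A) is also an assumption.

```latex
\begin{align*}
s_A^2 &= \frac{1}{n_A - 1} \sum_{i=1}^{n_A} (Y_{i,A} - \bar{Y}_A)^2, \qquad
s_B^2 = \frac{1}{n_B - 1} \sum_{i=1}^{n_B} (Y_{i,B} - \bar{Y}_B)^2, \\
s_p^2 &= \frac{(n_A - 1)\, s_A^2 + (n_B - 1)\, s_B^2}{n_A + n_B - 2}, \\
t &= \frac{\bar{Y}_B - \bar{Y}_A}{s_p \sqrt{1/n_A + 1/n_B}}.
\end{align*}
```

Multiplying the definition of $s_p^2$ by $(n_A + n_B - 2)/\sigma^2$ gives the identity noted in part (c).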

4. (Sample size) A researcher needs to estimate the mean circumference of a population of small trees in the Cascade mountains. The researcher would like to construct a confidence interval for the mean, wants this interval to contain the true mean with probability at least 0.95, and would like to estimate the mean with a precision of about 1 cm, i.e. the length of the confidence interval should be no more than 1 cm. Studies from other regions suggest the standard deviation in circumference is about 7 cm. Write out a formula that gives the width of a 95% confidence interval as a function of the sample size $n$ and the data. Make some assumptions that allow you to make a graph of sample size versus width of the confidence interval, where the sample size ranges from $n = 2$ to a value such that the width is less than or equal to 1 cm. How many trees do you recommend the researcher sample? What assumptions are you making for this recommendation?
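One way to explore the sample-size question numerically is sketched below. It rests on simplifying assumptions that are not part of the assignment: the sample standard deviation is replaced by the assumed value of 7 cm, and the t quantile is replaced by the standard normal quantile, which understates the width noticeably when n is small. The function names `ci_width` and `smallest_n` are illustrative.

```python
import math
from statistics import NormalDist


def ci_width(n, sd=7.0, level=0.95):
    """Approximate CI width 2 * z * sd / sqrt(n), using a normal quantile."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * sd / math.sqrt(n)


def smallest_n(max_width=1.0, sd=7.0, level=0.95):
    """Smallest n >= 2 whose approximate width is at most max_width."""
    n = 2
    while ci_width(n, sd, level) > max_width:
        n += 1
    return n


n_needed = smallest_n()
widths = [(n, ci_width(n)) for n in range(2, n_needed + 1)]  # points for the graph
print(n_needed, round(ci_width(n_needed), 4))
```

The list `widths` holds the (n, width) pairs for the requested graph. Using the t quantile with n - 1 degrees of freedom instead of the normal quantile would give a slightly larger recommended sample size.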