Stat 420 Homework 5 Solution (Methods of Applied Statistics)

1. Use the teengamb data set in 13.1. Let the dependent variable be Y = gamble and let sex be the grouping variable. Denote the population means by $\mu_1$ and $\mu_2$ and the population variances by $\sigma_1^2$ and $\sigma_2^2$. Assume $\sigma_1^2 = \sigma_2^2$.

(a) Use R to draw boxplots of Y for each of the two sex groups.

> library(faraway)   # provides the teengamb data
> boxplot(gamble ~ sex, data = teengamb, main = "Boxplot of gamble")

[Figure: "Boxplot of gamble", one box per sex group (0 and 1); vertical axis from 0 to 150]

(b) Do the boxplots indicate that males and females have identical means and variances? Compute the sample means and variances.

The medians (50th percentiles) of the two groups differ, so the means also appear to differ, although a difference in medians does not necessarily imply a difference in means. Moreover, the boxes differ greatly in height (interquartile range), so the variances also appear to be very different. The sample statistics confirm both impressions; the male sample variance is more than 50 times the female one.

> attach(teengamb)
> gambleF <- gamble[sex == 1]   # sex: 1 = female
> gambleM <- gamble[sex == 0]   # sex: 0 = male
> mean(gambleF)
[1] 3.865789
> mean(gambleM)
[1] 29.775
> var(gambleF)
[1] 26.53001
> var(gambleM)
[1] 1393.095

(c) Compute the t statistic. Under $H_0: \mu_1 = \mu_2$, give the distribution that the observed t statistic comes from.

Since the variances are unknown but assumed equal, we use the usual pooled two-sample t-test (not Welch's t-test). In the data, the first 19 observations are the females and the remaining 28 are the males.

> t.test(gamble[1:19], gamble[20:47], var.equal = TRUE)

        Two Sample t-test

data:  gamble[1:19] and gamble[20:47]
t = -2.9961, df = 45, p-value = 0.004437
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -43.32649  -8.49193
sample estimates:
mean of x mean of y 
 3.865789 29.775000 

The t statistic is -2.9961 (i.e. $|t| = 2.9961$), and under $H_0$ it comes from the $t(45)$ distribution, with $19 + 28 - 2 = 45$ degrees of freedom. However, according to the boxplots, the true variances seem to be quite different. Therefore, Welch's t-test would be more appropriate.
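Both test statistics can be recomputed by hand from the group summaries in part (b). The following is a minimal R sketch, assuming the rounded sample means and variances printed in part (b) and the 19 female / 28 male split; the variable names are illustrative only. The pooled test weights each group's variance by its degrees of freedom, while Welch's test keeps the two variance estimates separate and uses the Welch-Satterthwaite approximation for the degrees of freedom.

```r
# Group summaries from part (b): 19 females, 28 males (rounded values)
n_f <- 19; mean_f <- 3.865789; var_f <- 26.53001
n_m <- 28; mean_m <- 29.775;   var_m <- 1393.095

diff_means <- mean_f - mean_m

# Pooled two-sample t-test (assumes equal population variances)
df_pooled <- n_f + n_m - 2
sp2 <- ((n_f - 1) * var_f + (n_m - 1) * var_m) / df_pooled  # pooled variance
t_pooled <- diff_means / sqrt(sp2 * (1 / n_f + 1 / n_m))
p_pooled <- 2 * pt(-abs(t_pooled), df_pooled)               # two-sided p-value

# Welch t-test (no equal-variance assumption)
se2_f <- var_f / n_f
se2_m <- var_m / n_m
t_welch <- diff_means / sqrt(se2_f + se2_m)
# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (se2_f + se2_m)^2 /
  (se2_f^2 / (n_f - 1) + se2_m^2 / (n_m - 1))
p_welch <- 2 * pt(-abs(t_welch), df_welch)

cat(sprintf("pooled: t = %.4f, df = %.0f, p = %.6f\n", t_pooled, df_pooled, p_pooled))
cat(sprintf("Welch:  t = %.4f, df = %.3f, p = %.6f\n", t_welch, df_welch, p_welch))
```

Up to rounding of the inputs, these values should agree with the two t.test outputs (t = -2.9961 on 45 df for the pooled test, t = -3.6227 on 28.503 df for Welch's test).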
> t.test(gamble[1:19], gamble[20:47])

        Welch Two Sample t-test

data:  gamble[1:19] and gamble[20:47]
t = -3.6227, df = 28.503, p-value = 0.001123
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -40.54758 -11.27085
sample estimates:
mean of x mean of y 
 3.865789 29.775000 

According to Welch's t-test, the t statistic is -3.6227 (i.e. $|t| = 3.6227$), and under $H_0$ it approximately follows a t distribution with 28.503 degrees of freedom (roughly $t(28)$).

Since […], we get […]. These can be simplified: […] for the terms n = 1, …, 19 and […] for the remaining terms n = 20, …, 47; likewise […] for n = 1, …, 19 and […] for n = 20, …, 47. Therefore […], which are the sample means of gamble for each group, so the estimates follow from the sample means already found above (3.865789 and 29.775). Similarly, […] is the overall sample mean of gamble, which turns out to be

> mean(gamble)
[1] 19.30106

Using these estimates, we can compute the RSS of each model and then the F statistic, whose distribution under $H_0$ is $F(1, 45)$.

3. Suppose that the true parameters $\mu_1$ and $\mu_2$, with $\mu_1 - \mu_2 = 0.1$, are both unknown to the scientist. However, the common variance $\sigma^2 = 1$ is known. A scientist plans to randomly select N subjects from group 1 and another N subjects from group 2. To test for a difference at level $\alpha = 0.05$, the scientist will use the z statistic $Z = (\bar{Y}_1 - \bar{Y}_2)/\sqrt{2\sigma^2/N}$. The scientist will conclude that the two population means are different if $|Z| > c$.

(a) Find the appropriate threshold $c$.

Since $Z \sim N(0, 1)$ under $H_0: \mu_1 = \mu_2$, the threshold is $c = z_{0.025} = 1.96$, i.e. the test rejects when $Z < -1.96$ or $Z > 1.96$.

(b) Find the smallest sample size N needed so that the probability that the scientist concludes a difference, when in fact the difference exists, is at least 80%.

Since the true difference is $\mu_1 - \mu_2 = 0.1$, we need to compute the power at 0.1 and find N such that it is at least 80%:

power at 0.1 $= P(\text{reject} \mid \mu_1 - \mu_2 = 0.1) = P(|Z| > 1.96 \mid \mu_1 - \mu_2 = 0.1)$.

Under this alternative, $\bar{Y}_1 - \bar{Y}_2 \sim N(0.1, 2/N) \Rightarrow Z \sim N(0.1\sqrt{N/2}, 1)$. Therefore

power $= P(Z > 1.96) + P(Z < -1.96) = \Phi(0.1\sqrt{N/2} - 1.96) + \Phi(-0.1\sqrt{N/2} - 1.96) \ge 0.8$.

Many values of N satisfy this inequality. The left-tail term is negligible, so if we assign the 80% to the right-tail probability, we need $\Phi(1.96 - 0.1\sqrt{N/2}) \le 0.2$, i.e. $0.1\sqrt{N/2} \ge 1.96 + 0.8416$, which gives $N \ge 1569.8 \Rightarrow N = 1570$.

(c) Suppose that, due to budget constraints, the scientist can afford to sample only N = 18 subjects.
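What is the power of this test?

With N = 18, $\bar{Y}_1 - \bar{Y}_2 \sim N(0.1, 2/18) \Rightarrow Z \sim N(0.1\sqrt{9}, 1) = N(0.3, 1)$, so

power at 0.1 $= P(\text{reject} \mid \mu_1 - \mu_2 = 0.1) = P(Z > 1.96) + P(Z < -1.96) = \Phi(0.3 - 1.96) + \Phi(-0.3 - 1.96) = \Phi(-1.66) + \Phi(-2.26) = 0.06036785$.

That is, with N = 18 the test detects the true difference only about 6% of the time, barely above the 5% level.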
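The power calculations in (a) to (c) can be checked numerically. The following is a minimal R sketch, assuming level α = 0.05, a known common variance σ² = 1 and a true difference μ1 − μ2 = 0.1 (the values that reproduce N = 1570 and the power 0.06036785); the helper name power_z is hypothetical. Unlike the hand calculation in (b), it keeps both tails of the power function and then searches for the smallest N.

```r
# Assumed setting (values consistent with the solution's numbers)
alpha  <- 0.05  # test level
sigma2 <- 1     # known common variance of each group
delta  <- 0.1   # true difference mu1 - mu2

# (a) two-sided threshold c = z_{alpha/2}; qnorm(0.975) = 1.959964,
# rounded to 1.96 as in the hand calculation
crit <- round(qnorm(1 - alpha / 2), 2)

# Exact two-tailed power with N subjects per group.
# Under the alternative, Z ~ N(delta / sqrt(2 * sigma2 / N), 1).
power_z <- function(N) {
  shift <- delta / sqrt(2 * sigma2 / N)
  pnorm(shift - crit) + pnorm(-shift - crit)  # P(Z > c) + P(Z < -c)
}

# (b) start from the right-tail-only approximation, then adjust N
# using the exact power, which increases with N
N_min <- ceiling(2 * sigma2 * ((crit + qnorm(0.8)) / delta)^2)
while (power_z(N_min - 1) >= 0.8) N_min <- N_min - 1
while (power_z(N_min) < 0.8) N_min <- N_min + 1

# (c) power when only N = 18 subjects per group are affordable
power_18 <- power_z(18)

cat("threshold c:  ", crit, "\n")
cat("smallest N:   ", N_min, "\n")
cat("power, N = 18:", format(power_18, digits = 7), "\n")
```

Because the exact power increases with N, the two loops stop at the smallest N whose two-tailed power reaches 0.8. For these settings the left tail adds almost nothing, which is why the right-tail shortcut used in (b) gives the same N.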