Methods of Applied Statistics - Homework 5 Solutions | STAT 420, Assignments of Data Analysis & Statistical Methods

Material Type: Assignment; Class: Methods of Applied Statistics; Subject: Statistics; University: University of Illinois - Urbana-Champaign; Term: Unknown 1989;

Uploaded on 03/16/2009

koofers-user-x8c

Stat 420 Homework 5 Solution

1. Use the teengamb data set in 13.1. Let the dependent variable be Y = gamble and let sex be the grouping variable. Denote the population means by $\mu_1$ and $\mu_2$ and the population variances by $\sigma_1^2$ and $\sigma_2^2$. Assume $\sigma_1^2 = \sigma_2^2$.

(a) Use R to draw boxplots of Y for each of the two sex groups.

    boxplot(gamble~sex,data=teengamb,main="Boxplot of gamble")

[Figure: "Boxplot of gamble", one box per sex group (0 and 1); vertical axis from 0 to 150.]

(b) Do the boxplots indicate that males and females have identical means and variances? Compute the sample means and variances.

The medians (50th percentiles) of the two groups differ, so the means appear to differ as well, although the sample median is not the same thing as the sample mean. The boxes also have very different lengths, so the variances appear to be very different too.

    > mean(gambleF)
    [1] 3.865789
    > mean(gambleM)
    [1] 29.775
    > var(gambleF)
    [1] 26.53001
    > var(gambleM)
    [1] 1393.095

(c) Compute the t statistic. Under $H_0: \mu_1 = \mu_2$, give the distribution where the observed t statistic comes from.

Since the variances are unknown but assumed equal, we use the usual (pooled) two-sample t test, not Welch's t test.

    > t.test(gamble[1:19],gamble[20:47],var.equal=TRUE)

            Two Sample t-test

    data:  gamble[1:19] and gamble[20:47]
    t = -2.9961, df = 45, p-value = 0.004437
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -43.32649  -8.49193
    sample estimates:
    mean of x mean of y
     3.865789 29.775000

The t statistic is -2.9961 (i.e. $|t| = 2.9961$), and under $H_0$ the observed t statistic comes from the t(45) distribution.

However, according to the boxplots, the true variances seem to be quite different. Therefore, Welch's t test would be more appropriate.
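The pooled statistic can also be checked by hand from the summary statistics in part (b). The sketch below does so, and forms the Welch statistic with Welch-Satterthwaite degrees of freedom for comparison. The group sizes 19 and 28 are taken from the index ranges `gamble[1:19]` and `gamble[20:47]`; the variable names are illustrative and not part of the original solution.

```r
# Summary statistics reported in part (b); group sizes follow from the
# index ranges gamble[1:19] and gamble[20:47] used in part (c).
n1 <- 19; m1 <- 3.865789; v1 <- 26.53001
n2 <- 28; m2 <- 29.775;   v2 <- 1393.095

# Pooled (equal-variance) two-sample t statistic
sp2       <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
t_pooled  <- (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled <- n1 + n2 - 2
p_pooled  <- 2 * pt(-abs(t_pooled), df_pooled)                # two-sided p-value

# Welch statistic with Welch-Satterthwaite degrees of freedom
se2      <- v1 / n1 + v2 / n2                                 # unpooled variance of the mean difference
t_welch  <- (m1 - m2) / sqrt(se2)
df_welch <- se2^2 / ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

print(c(t_pooled = t_pooled, df_pooled = df_pooled, p_pooled = p_pooled))
print(c(t_welch = t_welch, df_welch = df_welch))
```

Up to rounding of the inputs, the pooled values should agree with the `t.test(..., var.equal=TRUE)` output, and the Welch values with what `t.test` reports by default.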
    > t.test(gamble[1:19],gamble[20:47])

            Welch Two Sample t-test

    data:  gamble[1:19] and gamble[20:47]
    t = -3.6227, df = 28.503, p-value = 0.001123
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -40.54758 -11.27085
    sample estimates:
    mean of x mean of y
     3.865789 29.775000

According to Welch's t test, the t statistic is -3.6227 (i.e. $|t| = 3.6227$), and the observed t statistic comes from a t distribution with about 28.5 degrees of freedom, i.e. roughly t(28).

[...]

Since [...], we get [...]. But we can simplify these: [...] for the terms n = 1, ..., 19 and [...] for the remaining terms n = 20, ..., 47, while [...] for the terms n = 1, ..., 19 and [...] for the remaining terms n = 20, ..., 47. Therefore, the two estimates reduce to

$\bar{Y}_1 = \frac{1}{19}\sum_{n=1}^{19} Y_n, \qquad \bar{Y}_2 = \frac{1}{28}\sum_{n=20}^{47} Y_n,$

which are the sample means of gamble for each group. From the sample means found in Problem 1, these are 3.865789 and 29.775. Similarly, the remaining estimate is the overall sample mean of gamble, which turns out to be

    > mean(gamble)
    [1] 19.30106

Therefore, using these estimates, we can compute the RSS values for each model and compute the F statistic, whose distribution under the null hypothesis is $F_{1,45}$.

3. Suppose that the true parameters are $\mu_1 = [\ldots]$ and $\mu_2 = [\ldots]$, which are both not known to the scientist. However, the variance $\sigma^2 = 1$ is known. A scientist plans to randomly select N subjects from group 1 and another N subjects from group 2. To test for differences at level $\alpha = 0.05$, the scientist will use the z statistic

$z = \dfrac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{2\sigma^2/N}}.$

The scientist will conclude that the two population means are different if $|z| > c$.

(a) Find the appropriate threshold $c$.

Since $\alpha = 0.05$, the threshold is $\pm z_{\alpha/2} = \pm 1.96$.

(b) Find the smallest sample size N that is needed so that the probability that the scientist concludes there is a difference, when a difference in fact exists, is at least 80%.

Since the true difference is $\mu_1 - \mu_2 = 0.1$, we need to compute the power at 0.1 and find N such that the power is at least 80%.

power at 0.1 $= P(\text{reject } H_0 \mid \mu_1 - \mu_2 = 0.1) = P(|z| > 1.96 \mid \mu_1 - \mu_2 = 0.1)$

$\bar{Y}_1 - \bar{Y}_2 \sim N(0.1,\ 2/N) \;\Rightarrow\; z \sim N(0.1\sqrt{N/2},\ 1)$

Therefore, with $Z \sim N(0, 1)$,

$P(|z| > 1.96) = P(z > 1.96) + P(z < -1.96) = P(Z > 1.96 - 0.1\sqrt{N/2}) + P(Z < -1.96 - 0.1\sqrt{N/2}) \ge 0.8.$

There could be many choices of N, but if we assign 80% to the right-tail probability, then $1.96 - 0.1\sqrt{N/2} \le -0.8416 \;\Rightarrow\; N \ge 1569.8$, so N = 1570.

(c) Suppose that, due to budget constraints, the scientist can afford to sample only N = 18 subjects per group. What is the power of this test?

$\bar{Y}_1 - \bar{Y}_2 \sim N(0.1,\ 2/18) \;\Rightarrow\; z \sim N(0.3,\ 1)$

power at 0.1 $= P(\text{reject } H_0 \mid \mu_1 - \mu_2 = 0.1) = P(|z| > 1.96) = P(z > 1.96) + P(z < -1.96) = P(Z > 1.66) + P(Z < -2.26) = 0.06036785$
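The power calculations in parts (b) and (c) can be sketched in R as follows. The inputs are assumptions wherever the source is not legible: known variance 1, two-sided level 0.05 (critical value rounded to 1.96), and a true mean difference of 0.1. These choices are consistent with the reported N = 1570 and power 0.06036785. The function name `power_z` is illustrative only.

```r
# Assumed inputs (not all legible in the source): known variance sigma^2 = 1,
# two-sided level alpha = 0.05, true mean difference delta = 0.1.
sigma <- 1
alpha <- 0.05
delta <- 0.1
crit  <- round(qnorm(1 - alpha / 2), 2)   # 1.96, the rounded critical value

# Power of the two-sided z test with N subjects per group.
# Under the alternative, z ~ N(delta / (sigma * sqrt(2 / N)), 1).
power_z <- function(N) {
  shift <- delta / (sigma * sqrt(2 / N))
  pnorm(-crit - shift) + (1 - pnorm(crit - shift))   # left tail + right tail
}

# Part (b): smallest N for which the right-tail rejection probability is >= 80%
N_min <- ceiling(2 * (sigma * (crit + qnorm(0.80)) / delta)^2)

# Part (c): power with N = 18 subjects per group
power_18 <- power_z(18)

print(N_min)
print(power_18)
print(power_z(N_min))
```

The left-tail term is negligible at the sample size in part (b), which is why dropping it there changes nothing, but at N = 18 it contributes noticeably to the total.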