Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

Solutions of Homework 6 on Methods of Applied Statistics | STAT 420, Assignments of Data Analysis & Statistical Methods

Material Type: Assignment; Class: Methods of Applied Statistics; Subject: Statistics; University: University of Illinois - Urbana-Champaign; Term: Unknown 2000;

Typology: Assignments

Pre 2010

Uploaded on 03/16/2009

koofers-user-2mr-1
koofers-user-2mr-1 🇺🇸

10 documents

1 / 13

Toggle sidebar

Related documents


Partial preview of the text

Download Solutions of Homework 6 on Methods of Applied Statistics | STAT 420 and more Assignments Data Analysis & Statistical Methods in PDF only on Docsity! Stat 420 Homework 6 Solution - 1 - 1. One way ANOVA (Exercise 14.4) Use the infmort data set where Y is income and the grouping variable is region(Asia, Europe, Americas and Africa) (a) Draw a boxplot of income for each of the four regions. library(faraway) ## 14.4 infmort data ## Y = income ## Group = region data(infmort) boxplot(income~region, infmort) Africa Europe Asia Americas 0 10 00 20 00 30 00 40 00 50 00 (b) Compute the sample mean and sample variance for each of the regions. ## Group 1 == Asia Yas = infmort$income[infmort$region=="Asia"]; Nas = length(Yas); ## Group 2 == Europe Yeu = infmort$income[infmort$region=="Europe"]; Neu = length(Yeu); Stat 420 Homework 6 Solution - 2 - ## Group 3 == Americas Yam = infmort$income[infmort$region=="Americas"]; Nam = length(Yam); ## Group 4 == Africa Yaf = infmort$income[infmort$region=="Africa"]; Naf = length(Yaf); > c(Nas,Neu,Nam,Naf) [1] 30 18 23 34 > mean(Yas) ; var(Yas) [1] 638.8667 [1] 937319.4 > mean(Yeu) ; var(Yeu) [1] 3040.222 [1] 2071527 > mean(Yam) ; var(Yam) [1] 939.8696 [1] 1851366 > mean(Yaf) ; var(Yaf) [1] 273.2353 [1] 264522.3 (c) A virtual inspection of the boxplots may lead us to conclude that the equal variance assumption may not be satisfied and that some transformation may be needed. Nevertheless, for the moment, let us fit the model       where  is iid    and the indicator variables are •  is 1 if the    country is in Asia and 0 otherwise •  is 1 if the    country is in Europe and 0 otherwise •  is 1 if the    country is in Americas and 0 otherwise •  is 1 if the    country is in Africa and 0 otherwise i. Using the matrix formulation , give the design matrix X and elements of the parameter vector . Note the columns of X are the group indicators and the elements of the vector  consists of population means of the 4 regions. Stat 420 Homework 6 Solution - 5 - ## Problem: unequal variance ## Log transform Ynewas = log(Yas); Yneweu = log(Yeu); Ynewam = log(Yam); Ynewaf = log(Yaf); Ynew = c(Ynewas, Yneweu, Ynewam, Ynewaf); out1 = lm(Ynew ~ Gas + Geu + Gam + Gaf -1); G = rep(1, length(Ynew)); out0 = lm(Ynew ~ G -1); > anova(out0,out1) Analysis of Variance Table Model 1: Ynew ~ G - 1 Model 2: Ynew ~ Gas + Geu + Gam + Gaf - 1 Res.Df RSS Df Sum of Sq F Pr(>F) 1 104 188.421 2 101 91.210 3 97.211 35.882 7.17e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Since the P-value is smaller than 0.05, we reject    . iii. Compute the joint 95% confidence intervals for all pairwise differences in the means using the Bonferroni method. Make sure that you show all the steps and in particular how you derived the critical value. State your conclusion. ## Bonferonni varas = var(Ynewas); vareu = var(Yneweu); varam = var(Ynewam); varaf = var(Ynewaf); varest = (Nas-1)*varas + (Neu-1)*vareu + (Nam-1)*varam + (Naf-1)*varaf; df = Nas-1 + Neu -1 + Nam -1 + Naf-1; varest = varest/df; alpha=0.05 Ncomp = 6; cval = qt((alpha/(2*Ncomp)), df); meanas = mean(Ynewas); meaneu=mean(Yneweu); meanam=mean(Ynewam); meanaf=mean(Ynewaf); Stat 420 Homework 6 Solution - 6 - > ## Asia - Europe > meanas - meaneu - cval*sqrt(varest*(1/Nas + 1/Neu)) [1] -1.438337 > meanas - meaneu + cval*sqrt(varest*(1/Nas + 1/Neu)) [1] -2.963309 Since the confidence interval doesn't include 0 and consists of negative numbers, we can conclude the mean of income in Asia is smaller than the mean of income in Europe. > ## Asia - Americas > meanas - meanam - cval*sqrt(varest*(1/Nas + 1/Nam)) [1] 0.03019815 > meanas - meanam + cval*sqrt(varest*(1/Nas + 1/Nam)) [1] -1.387394 Since the confidence interval includes 0, the mean of income in Asia is not significantly different from the mean of income in Americas. > ## Asia - Africa > meanas - meanaf - cval*sqrt(varest*(1/Nas + 1/Naf)) [1] 1.220632 > meanas - meanaf + cval*sqrt(varest*(1/Nas + 1/Naf)) [1] -0.06060009 Since the confidence interval includes 0, the mean of income in Asia is not significantly different from the mean of income in Africa. > ## Europe - Americas > meaneu - meanam - cval*sqrt(varest*(1/Neu + 1/Nam)) [1] 2.327047 > meaneu - meanam + cval*sqrt(varest*(1/Neu + 1/Nam)) [1] 0.7174033 Since the confidence interval is strictly positive, the mean of income in Europe is significantly larger than the mean of income in Americas. > ## Europe - Africa > meaneu - meanaf - cval*sqrt(varest*(1/Neu + 1/Naf)) [1] 3.526315 > meaneu - meanaf + cval*sqrt(varest*(1/Neu + 1/Naf)) [1] 2.035363 Since the confidence interval is strictly positive, the mean of income in Europe is significantly larger than the mean of income in Africa. Stat 420 Homework 6 Solution - 7 - > ## America - Africa > meanam - meanaf - cval*sqrt(varest*(1/Nam + 1/Naf)) [1] 1.949079 > meanam - meanaf + cval*sqrt(varest*(1/Nam + 1/Naf)) [1] 0.5681484 Since the confidence interval is strictly positive, the mean of income in America is significantly larger than the mean of income in Africa. 2. Two way ANOVA Consider the warpbreaks data set in 15.1. Let the response variable be breaks and the two factors be wool and tension. (a) Draw boxplots with breaks as the response and wool as a factor. Draw boxplots with breaks as response and tension as the factor. Comment on the spread of the response variable across levels in each factor. data(warpbreaks) Y = warpbreaks$breaks N = length(Y); par(mfrow=c(1,2)) boxplot(breaks~wool, warpbreaks, main="breaks and wool") boxplot(breaks~tension, warpbreaks, main="breaks and tension") A B 10 20 30 40 50 60 70 b re a k s a n d w o o l L M H 10 20 30 40 50 60 70 b re a k s a n d te n s io n Stat 420 Homework 6 Solution - 10 - The above plot suggests an interaction between wool type and tension. # Another code for the interaction plot with(warpbreaks,interaction.plot(tension,wool,log(breaks))) 3. 0 3. 2 3. 4 3. 6 tension m ea n of l og (b re ak s) L M H wool A B (d) State the cell means model and then use R to compute the parameters. The cell means model :     where  is iid    and the indicator variables are •  is 1 if the    subject came from wool A and tension L and, 0 otherwise •  is 1 if the    subject came from wool A and tension M and, 0 otherwise •  is 1 if the    subject came from wool A and tension H and, 0 otherwise •  is 1 if the    subject came from wool B and tension L and, 0 otherwise •  is 1 if the    subject came from wool B and tension M and, 0 otherwise •  is 1 if the    subject came from wool B and tension H and, 0 otherwise Y<-c(YAL,YAM,YAH,YBL,YBM,YBH) Stat 420 Homework 6 Solution - 11 - GAL <- c(rep(1,9),rep(0,54-9)) GAM <- c(rep(0,9),rep(1,9),rep(0,54-18)) GAH <- c(rep(0,18),rep(1,9),rep(0,54-27)) GBL <- c(rep(0,27),rep(1,9),rep(0,54-36)) GBM <- c(rep(0,36),rep(1,9),rep(0,9)) GBH <- c(rep(0,45),rep(1,9)) X<-cbind(GAL,GAM,GAH,GBL,GBM,GBH) ; X # Using the matrix muhat<-solve(t(X)%*%X) %*% t(X)%*% Y ;muhat [,1] GAL 3.717945 GAM 3.116750 GAH 3.117623 GBL 3.282378 GBM 3.309327 GBH 2.904152 # Using the regression cell<-lm(Y~GAL+GAM+GAH+GBL+GBM+GBH -1) Coefficients: Estimate Std. Error t value Pr(>|t|) GAL 3.7179 0.1247 29.82 <2e-16 *** GAM 3.1167 0.1247 25.00 <2e-16 *** GAH 3.1176 0.1247 25.01 <2e-16 *** GBL 3.2824 0.1247 26.33 <2e-16 *** GBM 3.3093 0.1247 26.55 <2e-16 *** GBH 2.9042 0.1247 23.30 <2e-16 *** (e) State the deviations model and then use R to compute the parameters. The deviations model : Let's choose the group BH as the baseline ;          N = length(Y) Y = log(warpbreaks$breaks) ## wool factor WoolA = rep(0, N); for(i in 1:N){ Stat 420 Homework 6 Solution - 12 - WoolA[i] = (warpbreaks$wool[i]=="A");} ## tension factor TenL = rep(0, N); for(i in 1:N){ TenL[i] = (warpbreaks$tension[i]=="L");} TenM = rep(0, N); for(i in 1:N){ TenM[i] = (warpbreaks$tension[i]=="M");} ## interaction terms ALint = WoolA*TenL; AMint = WoolA*TenM; dev<-lm(Y ~ WoolA + TenL + TenM + ALint + AMint) summary(dev) Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 2.9042 0.1247 23.296 <2e-16 *** WoolA 0.2135 0.1763 1.211 0.2319 TenL 0.3782 0.1763 2.145 0.0370 * TenM 0.4052 0.1763 2.298 0.0260 * ALint 0.2221 0.2493 0.891 0.3775 AMint -0.4060 0.2493 -1.629 0.1100 (f) Find the best model (using the deviations model as the "full" model) using the backward elimination method. Interpret the best model. ## Full model out = lm(Y ~ WoolA + TenL + TenM + ALint + AMint) summary(out) Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 2.9042 0.1247 23.296 <2e-16 *** WoolA 0.2135 0.1763 1.211 0.2319 TenL 0.3782 0.1763 2.145 0.0370 * TenM 0.4052 0.1763 2.298 0.0260 * ALint 0.2221 0.2493 0.891 0.3775 AMint -0.4060 0.2493 -1.629 0.1100