Download Solutions of Homework 6 on Methods of Applied Statistics | STAT 420 and more Assignments Data Analysis & Statistical Methods in PDF only on Docsity! Stat 420 Homework 6 Solution - 1 - 1. One way ANOVA (Exercise 14.4) Use the infmort data set where Y is income and the grouping variable is region(Asia, Europe, Americas and Africa) (a) Draw a boxplot of income for each of the four regions. library(faraway) ## 14.4 infmort data ## Y = income ## Group = region data(infmort) boxplot(income~region, infmort) Africa Europe Asia Americas 0 10 00 20 00 30 00 40 00 50 00 (b) Compute the sample mean and sample variance for each of the regions. ## Group 1 == Asia Yas = infmort$income[infmort$region=="Asia"]; Nas = length(Yas); ## Group 2 == Europe Yeu = infmort$income[infmort$region=="Europe"]; Neu = length(Yeu); Stat 420 Homework 6 Solution - 2 - ## Group 3 == Americas Yam = infmort$income[infmort$region=="Americas"]; Nam = length(Yam); ## Group 4 == Africa Yaf = infmort$income[infmort$region=="Africa"]; Naf = length(Yaf); > c(Nas,Neu,Nam,Naf) [1] 30 18 23 34 > mean(Yas) ; var(Yas) [1] 638.8667 [1] 937319.4 > mean(Yeu) ; var(Yeu) [1] 3040.222 [1] 2071527 > mean(Yam) ; var(Yam) [1] 939.8696 [1] 1851366 > mean(Yaf) ; var(Yaf) [1] 273.2353 [1] 264522.3 (c) A virtual inspection of the boxplots may lead us to conclude that the equal variance assumption may not be satisfied and that some transformation may be needed. Nevertheless, for the moment, let us fit the model where is iid and the indicator variables are • is 1 if the country is in Asia and 0 otherwise • is 1 if the country is in Europe and 0 otherwise • is 1 if the country is in Americas and 0 otherwise • is 1 if the country is in Africa and 0 otherwise i. Using the matrix formulation , give the design matrix X and elements of the parameter vector . Note the columns of X are the group indicators and the elements of the vector consists of population means of the 4 regions. Stat 420 Homework 6 Solution - 5 - ## Problem: unequal variance ## Log transform Ynewas = log(Yas); Yneweu = log(Yeu); Ynewam = log(Yam); Ynewaf = log(Yaf); Ynew = c(Ynewas, Yneweu, Ynewam, Ynewaf); out1 = lm(Ynew ~ Gas + Geu + Gam + Gaf -1); G = rep(1, length(Ynew)); out0 = lm(Ynew ~ G -1); > anova(out0,out1) Analysis of Variance Table Model 1: Ynew ~ G - 1 Model 2: Ynew ~ Gas + Geu + Gam + Gaf - 1 Res.Df RSS Df Sum of Sq F Pr(>F) 1 104 188.421 2 101 91.210 3 97.211 35.882 7.17e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Since the P-value is smaller than 0.05, we reject . iii. Compute the joint 95% confidence intervals for all pairwise differences in the means using the Bonferroni method. Make sure that you show all the steps and in particular how you derived the critical value. State your conclusion. ## Bonferonni varas = var(Ynewas); vareu = var(Yneweu); varam = var(Ynewam); varaf = var(Ynewaf); varest = (Nas-1)*varas + (Neu-1)*vareu + (Nam-1)*varam + (Naf-1)*varaf; df = Nas-1 + Neu -1 + Nam -1 + Naf-1; varest = varest/df; alpha=0.05 Ncomp = 6; cval = qt((alpha/(2*Ncomp)), df); meanas = mean(Ynewas); meaneu=mean(Yneweu); meanam=mean(Ynewam); meanaf=mean(Ynewaf); Stat 420 Homework 6 Solution - 6 - > ## Asia - Europe > meanas - meaneu - cval*sqrt(varest*(1/Nas + 1/Neu)) [1] -1.438337 > meanas - meaneu + cval*sqrt(varest*(1/Nas + 1/Neu)) [1] -2.963309 Since the confidence interval doesn't include 0 and consists of negative numbers, we can conclude the mean of income in Asia is smaller than the mean of income in Europe. > ## Asia - Americas > meanas - meanam - cval*sqrt(varest*(1/Nas + 1/Nam)) [1] 0.03019815 > meanas - meanam + cval*sqrt(varest*(1/Nas + 1/Nam)) [1] -1.387394 Since the confidence interval includes 0, the mean of income in Asia is not significantly different from the mean of income in Americas. > ## Asia - Africa > meanas - meanaf - cval*sqrt(varest*(1/Nas + 1/Naf)) [1] 1.220632 > meanas - meanaf + cval*sqrt(varest*(1/Nas + 1/Naf)) [1] -0.06060009 Since the confidence interval includes 0, the mean of income in Asia is not significantly different from the mean of income in Africa. > ## Europe - Americas > meaneu - meanam - cval*sqrt(varest*(1/Neu + 1/Nam)) [1] 2.327047 > meaneu - meanam + cval*sqrt(varest*(1/Neu + 1/Nam)) [1] 0.7174033 Since the confidence interval is strictly positive, the mean of income in Europe is significantly larger than the mean of income in Americas. > ## Europe - Africa > meaneu - meanaf - cval*sqrt(varest*(1/Neu + 1/Naf)) [1] 3.526315 > meaneu - meanaf + cval*sqrt(varest*(1/Neu + 1/Naf)) [1] 2.035363 Since the confidence interval is strictly positive, the mean of income in Europe is significantly larger than the mean of income in Africa. Stat 420 Homework 6 Solution - 7 - > ## America - Africa > meanam - meanaf - cval*sqrt(varest*(1/Nam + 1/Naf)) [1] 1.949079 > meanam - meanaf + cval*sqrt(varest*(1/Nam + 1/Naf)) [1] 0.5681484 Since the confidence interval is strictly positive, the mean of income in America is significantly larger than the mean of income in Africa. 2. Two way ANOVA Consider the warpbreaks data set in 15.1. Let the response variable be breaks and the two factors be wool and tension. (a) Draw boxplots with breaks as the response and wool as a factor. Draw boxplots with breaks as response and tension as the factor. Comment on the spread of the response variable across levels in each factor. data(warpbreaks) Y = warpbreaks$breaks N = length(Y); par(mfrow=c(1,2)) boxplot(breaks~wool, warpbreaks, main="breaks and wool") boxplot(breaks~tension, warpbreaks, main="breaks and tension") A B 10 20 30 40 50 60 70 b re a k s a n d w o o l L M H 10 20 30 40 50 60 70 b re a k s a n d te n s io n Stat 420 Homework 6 Solution - 10 - The above plot suggests an interaction between wool type and tension. # Another code for the interaction plot with(warpbreaks,interaction.plot(tension,wool,log(breaks))) 3. 0 3. 2 3. 4 3. 6 tension m ea n of l og (b re ak s) L M H wool A B (d) State the cell means model and then use R to compute the parameters. The cell means model : where is iid and the indicator variables are • is 1 if the subject came from wool A and tension L and, 0 otherwise • is 1 if the subject came from wool A and tension M and, 0 otherwise • is 1 if the subject came from wool A and tension H and, 0 otherwise • is 1 if the subject came from wool B and tension L and, 0 otherwise • is 1 if the subject came from wool B and tension M and, 0 otherwise • is 1 if the subject came from wool B and tension H and, 0 otherwise Y<-c(YAL,YAM,YAH,YBL,YBM,YBH) Stat 420 Homework 6 Solution - 11 - GAL <- c(rep(1,9),rep(0,54-9)) GAM <- c(rep(0,9),rep(1,9),rep(0,54-18)) GAH <- c(rep(0,18),rep(1,9),rep(0,54-27)) GBL <- c(rep(0,27),rep(1,9),rep(0,54-36)) GBM <- c(rep(0,36),rep(1,9),rep(0,9)) GBH <- c(rep(0,45),rep(1,9)) X<-cbind(GAL,GAM,GAH,GBL,GBM,GBH) ; X # Using the matrix muhat<-solve(t(X)%*%X) %*% t(X)%*% Y ;muhat [,1] GAL 3.717945 GAM 3.116750 GAH 3.117623 GBL 3.282378 GBM 3.309327 GBH 2.904152 # Using the regression cell<-lm(Y~GAL+GAM+GAH+GBL+GBM+GBH -1) Coefficients: Estimate Std. Error t value Pr(>|t|) GAL 3.7179 0.1247 29.82 <2e-16 *** GAM 3.1167 0.1247 25.00 <2e-16 *** GAH 3.1176 0.1247 25.01 <2e-16 *** GBL 3.2824 0.1247 26.33 <2e-16 *** GBM 3.3093 0.1247 26.55 <2e-16 *** GBH 2.9042 0.1247 23.30 <2e-16 *** (e) State the deviations model and then use R to compute the parameters. The deviations model : Let's choose the group BH as the baseline ; N = length(Y) Y = log(warpbreaks$breaks) ## wool factor WoolA = rep(0, N); for(i in 1:N){ Stat 420 Homework 6 Solution - 12 - WoolA[i] = (warpbreaks$wool[i]=="A");} ## tension factor TenL = rep(0, N); for(i in 1:N){ TenL[i] = (warpbreaks$tension[i]=="L");} TenM = rep(0, N); for(i in 1:N){ TenM[i] = (warpbreaks$tension[i]=="M");} ## interaction terms ALint = WoolA*TenL; AMint = WoolA*TenM; dev<-lm(Y ~ WoolA + TenL + TenM + ALint + AMint) summary(dev) Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 2.9042 0.1247 23.296 <2e-16 *** WoolA 0.2135 0.1763 1.211 0.2319 TenL 0.3782 0.1763 2.145 0.0370 * TenM 0.4052 0.1763 2.298 0.0260 * ALint 0.2221 0.2493 0.891 0.3775 AMint -0.4060 0.2493 -1.629 0.1100 (f) Find the best model (using the deviations model as the "full" model) using the backward elimination method. Interpret the best model. ## Full model out = lm(Y ~ WoolA + TenL + TenM + ALint + AMint) summary(out) Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 2.9042 0.1247 23.296 <2e-16 *** WoolA 0.2135 0.1763 1.211 0.2319 TenL 0.3782 0.1763 2.145 0.0370 * TenM 0.4052 0.1763 2.298 0.0260 * ALint 0.2221 0.2493 0.891 0.3775 AMint -0.4060 0.2493 -1.629 0.1100