Final Exam with Solution - Statistical Methods for Bioscience I | STAT 571, Exams of Data Analysis & Statistical Methods

Material Type: Exam; Professor: Ane; Class: Statistical Methods for Bioscience I; Subject: STATISTICS; University: University of Wisconsin - Madison; Term: Fall 2007;

Uploaded on 09/02/2009

Stat 571 Final Exam - Solution December 19, 2007
1.(a) 37 × 24/66 = 13.45
(b) X² = 1.5365 + 0.055 + ··· + 2.943 = 8.8705. Using df = 2
we get a p-value slightly above 0.01. We reject the null
hypothesis: there is evidence that mole rats from different
colonies choose their tubers differently. Note: from the
small contributions of colony B cells, mole rats from colony
B seem to have average tuber preferences. Colonies A and
C have the largest contributions, so they seem to differ more
markedly from the average tuber preference profile.
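
The arithmetic in question 1 can be checked with a short sketch. It uses only the numbers quoted above; the variable names are illustrative, and which of 37 and 24 is the row total and which the column total is an assumption, since the table itself is not reproduced here. For df = 2 the chi-square upper tail has the closed form exp(−x/2), so no statistics library is needed.

```python
import math

# Totals quoted in 1(a); the row/column labels are an assumption.
row_total, col_total, grand_total = 37, 24, 66
expected = row_total * col_total / grand_total  # expected count under independence

chi_sq = 8.8705  # test statistic reported in 1(b)
# With df = 2 the chi-square survival function is exactly exp(-x / 2).
p_value = math.exp(-chi_sq / 2)

print(f"expected count = {expected:.2f}, p-value = {p_value:.4f}")
```

The resulting p-value sits just above 0.01, in line with the statement in 1(b).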
2.(a) b̂1 = 26,198.4/1006.6 = 26.02662.
b̂0 = 471.4 − 26.0266 × 16.1 = 52.37.
For each increase of 1 kg in the female’s weight, there is an average
increase of 26,027 eggs.
(b) r = 26,198.4/√(1006.6 × 811,738) = 0.916, so r² = 0.84.
It means that 84% of the variability in the number of eggs
is explained by the female’s weight.
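
A minimal sketch of the calculations in 2(a) and 2(b), starting from the summary statistics quoted in the solution. The names s_xy, s_xx and s_yy are labels chosen here; reading 811,738 as the total sum of squares of the egg counts is inferred from how it is used in the formulas.

```python
import math

# Summary statistics quoted in the solution
# (x = female weight in kg, y = thousands of eggs).
s_xy = 26198.4    # sum of cross-products
s_xx = 1006.6     # sum of squares of x
s_yy = 811738.0   # sum of squares of y (assumed interpretation)
x_bar, y_bar = 16.1, 471.4

b1 = s_xy / s_xx                    # slope
b0 = y_bar - b1 * x_bar             # intercept, using the unrounded slope
r = s_xy / math.sqrt(s_xx * s_yy)   # correlation coefficient

print(f"b1 = {b1:.5f}, b0 = {b0:.2f}, r = {r:.3f}, r^2 = {r * r:.2f}")
```

Using the unrounded slope is what reproduces the intercept of 52.37.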
(c) SSError = (1 − r²) × 811,738 = 129,878, so s²e = MSError =
SSError/(15 − 2) = 9990.93, and se = 99.954 (which
seems correct from the graph provided). Then the standard
error of b̂1 is √(9990.93/1006.6) = √9.925 = 3.15.
T-test for the slope: t = 26.02/3.15 = 8.26 on df = 13. We
get a p-value < 2 × .001. Strong evidence that the slope is
> 0.
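
The steps of 2(c) can be sketched as follows, again from the quoted summary values. Variable names are illustrative. Because r² is entered here as the rounded 0.84, the mean squared error comes out a fraction below the quoted 9990.93; the standard error and t statistic agree to the precision reported.

```python
import math

n = 15            # number of females
r_sq = 0.84       # rounded r^2 from 2(b)
s_yy = 811738.0   # total sum of squares of y
s_xx = 1006.6     # sum of squares of x
b1 = 26.02662     # slope from 2(a)

ss_error = (1 - r_sq) * s_yy       # residual sum of squares
mse = ss_error / (n - 2)           # residual variance estimate
s_e = math.sqrt(mse)               # residual standard deviation
se_b1 = math.sqrt(mse / s_xx)      # standard error of the slope
t_stat = b1 / se_b1                # t statistic on n - 2 = 13 df

print(f"SSError = {ss_error:.0f}, s_e = {s_e:.2f}, "
      f"SE(b1) = {se_b1:.2f}, t = {t_stat:.2f}")
```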
(d) Prediction of one new value: 52.37 + 26.0266 × 19 ≈ 546.87
thousand eggs if the female weighs 19 kg, and 52.37 +
26.0266 × 24 = 677.0 thousand eggs at 24 kg.
Larger. This is because x̄ = 16.1 kg, so 19 kg is closer to the
mean female’s weight than 24 kg. The SE of the prediction
is larger when (x − x̄)² is larger, and here (24 − x̄)² is larger
than (19 − x̄)².
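
To make the comparison in 2(d) concrete, here is a sketch that assumes the usual formula for the standard error of a single new prediction in simple linear regression, s_e × √(1 + 1/n + (x − x̄)²/Sxx). The solution itself only states the qualitative conclusion, so the numerical standard errors below are an illustration, not values from the exam. The function names are hypothetical.

```python
import math

b0, b1 = 52.37, 26.02662   # fitted intercept and slope
s_e = 99.954               # residual standard deviation from 2(c)
n, x_bar, s_xx = 15, 16.1, 1006.6

def predict(x):
    """Predicted thousands of eggs for a female weighing x kg."""
    return b0 + b1 * x

def prediction_se(x):
    """SE for predicting one new observation at x (standard formula, assumed)."""
    return s_e * math.sqrt(1 + 1 / n + (x - x_bar) ** 2 / s_xx)

for weight in (19, 24):
    print(f"{weight} kg: prediction = {predict(weight):.1f}, "
          f"SE = {prediction_se(weight):.1f}")
```

The only term that differs between the two weights is (x − x̄)², which is why the prediction at 24 kg carries the larger standard error.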
3. See Ley et al. Nature 444:1022 (2006).
(a) Paired t-test, 2 sided.
(b) t = 3.09, df = 12 − 1 = 11, so the p-value is 2 × .005 < p <
2 × .01, i.e. .01 < p < .02. We reject the null hypothesis. In
obese subjects (such as those sampled for the study), the
mean bacteroidetes abundance is higher at week 54 than at
week 12 of the diet.
4.(a) We can get SSError as (8 − 1) × 785² + ··· + (8 − 1) × 62² =
4,427,213. To get SSCultivar, we can first get the grand
mean (8 × 1830 + ··· + 8 × 403)/48 = 635.17; then SSCultivar is
8 × (1830 − 635.17)² + ··· + 8 × (403 − 635.17)² = 13,962,803.
We can also get SSCultivar by subtracting SSError from
SSTotal.
Source     df          SS          MS      F   p-value
Cultivar    5  13,962,803   2,792,561   26.5    < .001
Error      42   4,427,213   105,409.8
Total      47  18,390,020
No, this analysis does not provide evidence that the 5 oca
cultivars do not all have the same mean. The significant p-
value could be due to the rhubarb having a different mean
than all the oca cultivars, but the 5 ocas’ means could all
be the same.
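
The ANOVA table in 4(a) can be rebuilt from the group means with a short sketch. The five oca means are taken from the display in 4(d); assigning the mean 1830 to rhubarb is inferred from the text rather than stated explicitly. SSError is entered as quoted because the individual sample SDs are not all listed in the solution.

```python
# Group means; the rhubarb label for 1830 is an inference from the text.
means = {"rhubarb": 1830, "O1": 306, "O4": 336, "O3": 397, "O5": 403, "O2": 539}
n_per_group = 8
ss_error = 4427213.0   # quoted in the solution

k = len(means)
grand_mean = sum(means.values()) / k   # equal group sizes, so a plain average
ss_cultivar = n_per_group * sum((m - grand_mean) ** 2 for m in means.values())

df_cultivar = k - 1
df_error = k * n_per_group - k
ms_cultivar = ss_cultivar / df_cultivar
ms_error = ss_error / df_error
f_stat = ms_cultivar / ms_error

print(f"grand mean = {grand_mean:.2f}, SSCultivar = {ss_cultivar:.0f}, "
      f"F = {f_stat:.1f} on ({df_cultivar}, {df_error}) df")
```

The sum of squares differs from the quoted 13,962,803 by a few units because the means used here are rounded to integers.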
(b) From the residual vs. fitted values plot, we see that the
rhubarb residuals are much larger than the oca residuals. This
is also suggested by the large SD in the rhubarb sample
(785), about 10 times bigger than those in the oca
samples (41-72). The large rhubarb residuals appear as large
outliers (positive and negative) in the normal scores plot.
That’s what makes the plot unusual.
Independence assumption: not met. Duplicate values are
’nested’ within tubers or stalks. There are really only 4
independent tubers and 4 independent stalks, so the 8
observations within each sample are not all independent.
Equal variances: not met, as said earlier.
Normality: maybe. Even though the normal scores plot
is not linear, its shape might be explained solely by the
rhubarb’s large SD.
(c) dfCultivar is 5 − 1 = 4 and dfError is 35, so the p-value
is < .001. Strong evidence that the 5 oca cultivars do not
all have the same mean oxalate content.
(d) For α = 0.01 we get Q = 5.05 at df = 30 and 4.93 at
df = 40. Our dfError = 35. We prefer being conservative and
use Q = 5.0, close to the Q value at df = 30. (The true value
is Q = 4.98 at dfError = 35.) The critical distance is
dQ = 5.0 × √(3251 × 1/8) = 100.8. Only cultivar O2 is
significantly different from all other oca cultivars.
O1   O4   O3   O5   O2
306  336  397  403  539
------------------
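
The comparison in 4(d) can be sketched as below. Q = 5.0 and the cultivar means are as quoted; 3251 is taken to be the mean squared error of the oca-only ANOVA, which is how it appears in the critical-distance formula. Variable names are illustrative.

```python
import math
from itertools import combinations

q_crit = 5.0       # studentized range value chosen in the solution
mse = 3251.0       # assumed: MSError of the oca-only ANOVA
n_per_group = 8

critical_distance = q_crit * math.sqrt(mse / n_per_group)

means = {"O1": 306, "O4": 336, "O3": 397, "O5": 403, "O2": 539}

# A pair differs significantly when its means are further apart
# than the critical distance.
significant = [
    (a, b)
    for a, b in combinations(means, 2)
    if abs(means[a] - means[b]) > critical_distance
]

print(f"critical distance = {critical_distance:.1f}")
print("significantly different pairs:", significant)
```

Every significant pair involves O2, and no pair among O1, O4, O3 and O5 exceeds the critical distance, which matches the conclusion stated in 4(d).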
5. (a) False. The equality is always valid, even when X
and Y are dependent.
(b) True. The median depends on middle values only, not
on the largest or on the lowest values. The mean depends
on all values and can be pulled by outliers.
(c) False. The t-distribution is symmetric, the F-distribution
is not.
(d) False. The Binomial can be skewed, but never bimodal.
(e) True. The t-test depends on means, which are affected
by outliers. The Mann-Whitney test depends on ranks only,
not affected by extreme values.