Final Exam with Solution - Statistical Methods for Bioscience I | STAT 571

Stat 571 Final Exam - Solution December 19, 2007

1.(a) Expected count = 37 ∗ 24/66 = 13.45.
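Part (a) uses the usual expected-count formula for a contingency table: row total × column total / grand total. A minimal sketch, assuming 37 and 24 are the relevant row and column totals and 66 is the grand total; the function name is illustrative only.

```python
def expected_count(row_total: float, col_total: float, grand_total: float) -> float:
    """Expected cell count under the null hypothesis of independence."""
    return row_total * col_total / grand_total

e = expected_count(37, 24, 66)
print(round(e, 2))  # 13.45
```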

(b) X² = 1.5365 + 0.055 + ··· + 2.943 = 8.8705. Using df = 2, we get a p-value slightly above 0.01, so we reject the null hypothesis at the 5% level: there is evidence that mole rats from different colonies choose their tubers differently. Note: the cells of colony B contribute very little to X², so mole rats from colony B seem to have average tuber preferences. Colonies A and C have the largest contributions, so they seem to differ more markedly from the average tuber preference profile.
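With df = 2 the chi-square upper tail has a closed form, P(X² > x) = exp(−x/2), so the table lookup in (b) can be checked without a table. A stdlib-only sketch (valid for 2 degrees of freedom only; helper names are hypothetical):

```python
import math

def chi2_sf_df2(x: float) -> float:
    """P(X^2 > x) for a chi-square distribution with 2 df; exactly exp(-x/2)."""
    return math.exp(-x / 2)

def chi2_crit_df2(alpha: float) -> float:
    """Upper-alpha critical value for 2 df, from inverting exp(-x/2) = alpha."""
    return -2 * math.log(alpha)

x2 = 8.8705
p = chi2_sf_df2(x2)
print(f"p = {p:.4f}")                                   # p = 0.0119
print(chi2_crit_df2(0.05) < x2 < chi2_crit_df2(0.01))   # True: reject at 5%, not at 1%
```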

2.(a) b̂1 = 26,198.4/1006.6 = 26.02662.
b̂0 = 471.4 − 26.02662 ∗ 16.1 = 52.37.
For each increase of 1 kg in the female’s weight, there is an average increase of 26.027 thousand, i.e. about 26,027, eggs.
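The slope and intercept come from summary statistics: b̂1 = Sxy/Sxx and b̂0 = ȳ − b̂1 x̄. A minimal sketch, assuming (from the formulas above) that 26,198.4 is Sxy, 1006.6 is Sxx, 16.1 kg is x̄ and 471.4 (thousand eggs) is ȳ:

```python
def least_squares(sxy: float, sxx: float, xbar: float, ybar: float) -> tuple[float, float]:
    """Slope and intercept of the least-squares line from summary statistics."""
    b1 = sxy / sxx          # slope = Sxy / Sxx
    b0 = ybar - b1 * xbar   # the fitted line passes through (xbar, ybar)
    return b1, b0

b1, b0 = least_squares(sxy=26_198.4, sxx=1006.6, xbar=16.1, ybar=471.4)
print(round(b1, 5), round(b0, 2))  # 26.02662 52.37
```

Using the unrounded slope in b̂0 matters: plugging in 26.02 instead gives 52.48.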

(b) r = 26,198.4/√(1006.6 ∗ 811,738) = 0.9165, so r² = 0.84.
This means that 84% of the variability in the number of eggs is explained by its linear relationship with the female’s weight.
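The correlation follows from the same summary statistics, r = Sxy/√(Sxx Syy). A short check, assuming 811,738 is Syy (the total sum of squares of the egg counts):

```python
import math

sxy, sxx, syy = 26_198.4, 1006.6, 811_738  # assumed: Sxy, Sxx, Syy
r = sxy / math.sqrt(sxx * syy)
print(round(r, 4), round(r ** 2, 2))  # 0.9165 0.84
```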

(c) SSError = (1 − r²) ∗ 811,738 ≈ 129,882 (using the unrounded r²), so se² = MSError = SSError/(15 − 2) = 9990.93 and se = 99.954 (which seems consistent with the graph provided). Then the standard error of b̂1 is √(9990.93/1006.6) = √9.925 = 3.15.
T-test for the slope: t = 26.02/3.15 = 8.26 on df = 13. We get a p-value < 2 ∗ .001 = .002: strong evidence that the slope is > 0.
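The chain in (c), SSError, then MSError, then SE(b̂1), then t, can be reproduced in a few lines. A sketch under the same assumptions as above (Sxy, Sxx, Syy as labeled, n = 15 females); it computes SSError as Syy − Sxy²/Sxx, which equals (1 − r²) Syy without rounding r²:

```python
import math

sxy, sxx, syy, n = 26_198.4, 1006.6, 811_738, 15

b1 = sxy / sxx
sse = syy - sxy ** 2 / sxx        # SSError = (1 - r^2) * Syy
mse = sse / (n - 2)               # se^2, on n - 2 = 13 df
se_b1 = math.sqrt(mse / sxx)      # standard error of the slope
t = b1 / se_b1                    # t statistic for H0: slope = 0
print(round(sse), round(mse, 2), round(se_b1, 2), round(t, 2))  # 129882 9990.93 3.15 8.26
```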

(d) Prediction of one new value: 52.37 + 26.02662 ∗ 19 = 546.88 thousand eggs if the female weighs 19 kg, and 52.37 + 26.02662 ∗ 24 = 677.0 thousand eggs at 24 kg.
Larger. This is because x̄ = 16.1 kg, so 19 kg is closer to the mean female weight than 24 kg is. The SE of the prediction is larger when (x − x̄)² is larger, and here (24 − x̄)² is larger than (19 − x̄)².
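The comparison can be made concrete with the textbook SE for predicting one new observation, se √(1 + 1/n + (x − x̄)²/Sxx). A sketch, assuming n = 15, x̄ = 16.1 kg, Sxx = 1006.6 and se = 99.954 as above; the function name is hypothetical:

```python
import math

# Assumed inputs: fitted line and residual SE from (a)-(c).
b0, b1 = 52.37, 26.02662
s_e, n, xbar, sxx = 99.954, 15, 16.1, 1006.6

def predict(x: float) -> tuple[float, float]:
    """Predicted value and its SE for one new female weighing x kg."""
    y_hat = b0 + b1 * x
    se_pred = s_e * math.sqrt(1 + 1 / n + (x - xbar) ** 2 / sxx)
    return y_hat, se_pred

for x in (19, 24):
    y_hat, se = predict(x)
    print(x, round(y_hat, 2), round(se, 1))
```

Only the (x − x̄)² term changes between the two weights, which is why the SE at 24 kg comes out larger.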

3. See Ley et al. Nature 444:1022 (2006).

(a) Paired t-test, two-sided.

(b) t = 3.09 on df = 12 − 1 = 11, so 2 ∗ .005 < p < 2 ∗ .01, i.e. .01 < p < .02. We reject the null hypothesis at the 5% level. In obese subjects (such as those sampled for the study), the mean Bacteroidetes abundance is higher at week 54 than at week 12 of the diet.
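The table bracket .01 < p < .02 can be double-checked numerically. This stdlib-only sketch integrates the t density with Simpson’s rule, a swapped-in numerical method; in practice a library routine such as `scipy.stats.t.sf` would do the same job. Helper names are hypothetical.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value P(|T| > |t|), via Simpson's rule on [0, |t|] (steps must be even)."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = total * h / 3      # P(0 < T < |t|)
    return 1 - 2 * area       # by symmetry, P(|T| > |t|) = 1 - 2 P(0 < T < |t|)

p = t_two_sided_p(3.09, df=11)
print(f"p = {p:.4f}")
```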

4.(a) We can get SSError as (8 − 1) ∗ 785² + ··· + (8 − 1) ∗ 62² = 4,427,213. To get SSCultivar, we can first get the grand mean (8 ∗ 1830 + ··· + 8 ∗ 403)/48 = 635.17; then SSCultivar is 8 ∗ (1830 − 635.17)² + ··· + 8 ∗ (403 − 635.17)² = 13,962,807. We can also get SSCultivar by subtracting SSError from SSTotal: 18,390,020 − 4,427,213 = 13,962,807.

Source     df   SS           MS          F      p-value
Cultivar    5   13,962,807   2,792,561   26.5   < .001
Error      42    4,427,213     105,409.8
Total      47   18,390,020

No. This analysis does not provide evidence that the 5 oca cultivars do not all have the same mean. The significant p-value could be due to rhubarb having a different mean from all the oca cultivars, while the 5 oca means could all be the same.
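The between-group part of the table can be rebuilt from the six group means. A sketch, assuming the first term of the sums (mean 1830, SD 785) is rhubarb, as the SD in (b) suggests, and taking the oca means from the display in (d); SSError is taken as given because not all oca SDs are listed here:

```python
# Group means (n = 8 each); the rhubarb/oca labels are inferred, see lead-in.
means = {"rhubarb": 1830, "O1": 306, "O2": 539, "O3": 397, "O4": 336, "O5": 403}
n = 8
ss_error = 4_427_213  # (8 - 1) * sum of the group variances, from part (a)

k = len(means)
N = n * k
grand_mean = sum(n * m for m in means.values()) / N
ss_cultivar = sum(n * (m - grand_mean) ** 2 for m in means.values())

df_cultivar, df_error = k - 1, N - k
ms_cultivar = ss_cultivar / df_cultivar
ms_error = ss_error / df_error
f_stat = ms_cultivar / ms_error
print(round(grand_mean, 2), round(ss_cultivar), round(ss_cultivar + ss_error), round(f_stat, 1))
```

The SS column then adds up to the total, which is a quick consistency check on the table.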

(b) From the residual vs. fitted values plot, we see that the rhubarb residuals are much larger than the oca residuals. This is also suggested by the large SD in the rhubarb sample (785), roughly 11 to 19 times the SDs of the oca samples (41-72). The large rhubarb residuals appear as large outliers (positive and negative) in the normal scores plot; that is what makes the plot look unusual.

Independence assumption: not met. Duplicate measurements are ‘nested’ within tubers or stalks: each sample really contains only 4 independent tubers (or, for rhubarb, 4 independent stalks), so the 8 observations within each sample are not all independent.

Equal variances: not met, as noted above.

Normality: possibly met. Even though the normal scores plot is not linear, its shape might be explained entirely by the rhubarb’s large SD.

(c) dfCultivar is 5 − 1 = 4 and dfError is 40 − 5 = 35, so the p-value is < .001. Strong evidence that the 5 oca cultivars do not all have the same mean oxalate content.

(d) For α = 0.01 we get Q = 5.05 at df = 30 and Q = 4.93 at df = 40. Our dfError = 35. We prefer to be conservative and use Q = 5.0, close to the Q value at df = 30 (the true value is Q = 4.98 at dfError = 35). The critical distance is dQ = 5.0 ∗ √(3251 ∗ 1/8) = 100.8. Only cultivar O2 is significantly different from all the other oca cultivars; the underline joins means that are not significantly different.

O1    O4    O3    O5    O2
306   336   397   403   539
---------------------
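The Tukey comparison in (d) amounts to checking every pair of means against dQ = Q √(MSE/n). A sketch that takes MSE = 3251, n = 8 and Q = 5.0 from the text as given:

```python
import math
from itertools import combinations

# Oca means in increasing order, from the display above.
oca_means = {"O1": 306, "O4": 336, "O3": 397, "O5": 403, "O2": 539}
mse, n, q = 3251, 8, 5.0

d = q * math.sqrt(mse / n)   # Tukey critical distance for equal group sizes
significant = [
    (a, b) for a, b in combinations(oca_means, 2)
    if abs(oca_means[a] - oca_means[b]) > d
]
print(round(d, 1))   # 100.8
print(significant)   # every flagged pair involves O2
```

The largest gap among O1, O4, O3 and O5 is 403 − 306 = 97 < 100.8, which is why they share one underline.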

5. (a) False. The equality is always valid, even when X and Y are dependent.

(b) True. The median depends only on the middle values, not on the largest or smallest values. The mean depends on all values and can be pulled by outliers.

(c) False. The t-distribution is symmetric; the F-distribution is not.

(d) False. The Binomial can be skewed, but never bimodal.

(e) True. The t-test depends on means, which are affected by outliers. The Mann-Whitney test depends only on ranks, so it is not affected by extreme values.
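Answers (b) and (e) can be illustrated on a small hypothetical sample (the numbers are made up): making the largest value extreme moves the mean a lot, but leaves the median and the ranks unchanged.

```python
from statistics import mean, median

def ranks(values):
    """Ranks 1..n of the values (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

data = [4.1, 4.8, 5.0, 5.3, 6.2]   # hypothetical sample
outlier = data[:-1] + [62.0]        # same data, largest value made extreme

print(mean(data), mean(outlier))       # the mean moves a lot
print(median(data), median(outlier))   # the median stays at 5.0
print(ranks(data) == ranks(outlier))   # True: the ranks are unchanged
```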
