Practice Final Exam - Statistical Methods for Bioscience I | STAT 572, Assignments of Data Analysis & Statistical Methods

Material Type: Assignment; Professor: Ane; Class: Statistical Methods for Bioscience II; Subject: STATISTICS; University: University of Wisconsin - Madison; Term: Unknown 1989;


Uploaded on 09/02/2009

Stat 572 Final Examination Spring 2008

  1. false. r = b1 ∗ sx/sy = (−3/4) ∗ 2/6 = −0.25
  2. true.
  3. false. The variance equals the mean.
  4. false. It depends on the sign of the coefficient for x1.
  5. true.
  6. false. The standard deviation 2.5 is about variability in the data around the mean value, not about our uncertainty about the mean value.
  7. true.
  8. true.
  9. (a) From model 1, the prediction is the same at all doses: 21.50 + 0.95 ∗ 235 = 244.75 g. From model 2, the predicted weight is 1.12 + 1.00 ∗ 235 = 236.12 g in the control group, 1.12 + 8.90 + 1.00 ∗ 235 = 245.02 g in the low dose group, and 1.12 + 19.1 + 1.00 ∗ 235 = 255.22 g in the high dose group. With model 3, the predicted weight is 4.50 + 0.99 ∗ 235 = 237.15 g in the control group, 4.50 + 0.99 ∗ 235 − 17.36 + 0.12 ∗ 235 = 247.99 g in the low dose group, and 4.50 + 0.99 ∗ 235 + 25.07 − 0.03 ∗ 235 = 255.17 g in the high dose group.

     (b) Model 1 assumes the same slope in all groups: 0.95 (no units: g/g). In model 2, the slope is still assumed to be the same in all groups: 1.00. This model says that the final weight is expected to be equal to the initial weight, plus a value that depends on which treatment was received. In model 3, slopes are no longer assumed to be equal. The slope is estimated to be 0.99 − 0.03 = 0.96 in the high dose group. With this model, we estimate that with the high dose, small rats gain more weight on average than large rats. For instance, a 180 g rat gains on average (4.50 + 25.07 + 0.96 ∗ 180) − 180 = 22.37 g, while a 240 g rat would gain on average (4.50 + 25.07 + 0.96 ∗ 240) − 240 = 19.97 g.
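The arithmetic in 9(a) and 9(b) can be checked with a short script. This is only a sketch: the coefficients are the ones quoted in the answers above, while the dictionary layout and the names `predict` and `gain` are illustrative assumptions, not part of the original output.

```python
# Coefficients quoted in the answer to question 9.
# The data layout and function names are hypothetical, chosen for illustration.
MODEL1 = {"intercept": 21.50, "slope": 0.95}
MODEL2 = {
    "intercept": 1.12,
    "slope": 1.00,
    "dose": {"control": 0.0, "low": 8.90, "high": 19.1},
}
MODEL3 = {
    "intercept": 4.50,
    "slope": 0.99,
    "dose": {"control": 0.0, "low": -17.36, "high": 25.07},
    "dose_slope": {"control": 0.0, "low": 0.12, "high": -0.03},
}


def predict(model, initial_weight, dose="control"):
    """Predicted final weight (g) for a given initial weight (g) and dose group."""
    # Dose shifts the intercept (models 2 and 3) and the slope (model 3 only).
    intercept = model["intercept"] + model.get("dose", {}).get(dose, 0.0)
    slope = model["slope"] + model.get("dose_slope", {}).get(dose, 0.0)
    return intercept + slope * initial_weight


def gain(model, initial_weight, dose="control"):
    """Predicted weight gain (g): predicted final weight minus initial weight."""
    return predict(model, initial_weight, dose) - initial_weight


for name, model in [("model 1", MODEL1), ("model 2", MODEL2), ("model 3", MODEL3)]:
    for dose in ("control", "low", "high"):
        print(name, dose, round(predict(model, 235, dose), 2))

print("gain at 180 g, high dose:", round(gain(MODEL3, 180, "high"), 2))
print("gain at 240 g, high dose:", round(gain(MODEL3, 240, "high"), 2))
```

Storing the dose adjustments as optional entries lets one function cover all three models, which mirrors how each model nests inside the next.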

     (c) Confidence interval: use $\hat{b}_1 \pm t \cdot \mathrm{SE}(\hat{b}_1)$, where t is the quantile from the t distribution with df = 30 − 4. This is easier in models 1 and 2, where we read the standard error of the common slope directly from the output: 0.09 in model 1, 0.05 in model 2. In model 3, we do not have the slope's standard error in the high group directly. We have the SE of the control group slope (0.08) and the SE of the difference between the control group slope and the high group slope (0.11). With more output, we could get the correlation between these two estimates, then we could use a formula to get the SE of the high group slope (we did not see this formula in class in 2009). An alternative would be to set the 'high dose' group as the baseline group, and re-run the analysis so that the output provides the SE of the slope in the high group directly.

     (d) Model 2 is most appropriate: the F test comparing model 2 and model 3 says that the slopes in the 3 groups are not significantly different (p = 0.4676). Also, the F test comparing model 2 and model 1 says that the intercepts in the 3 different groups are different, with very strong evidence ($p = 4.5 \times 10^{-9}$).

     (e) We would need to consider the treatment as a numerical variable (dosage d = 0, 0.10 and 0.25) rather than a categorical variable (control, low, high). The model described can be written as:

$$\text{final weight} = (\beta_0 + \beta_1 d) \cdot \text{initial weight} + e$$

Therefore, to fit this model, I would use linear regression with the following predictors: dosage (numerical), initial weight, and their interaction dosage:initial weight.
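For part (c), the formula alluded to is presumably the standard variance of a sum of two correlated estimates. As a sketch, write $\hat{b}_{\text{ctrl}}$ for the control-group slope, $\hat{b}_{\text{diff}}$ for the estimated difference between the high-group and control-group slopes, and $\hat{\rho}$ for the estimated correlation between them (these symbol names are assumptions, not labels from the output):

```latex
\mathrm{SE}\bigl(\hat{b}_{\text{ctrl}} + \hat{b}_{\text{diff}}\bigr)
  = \sqrt{\mathrm{SE}(\hat{b}_{\text{ctrl}})^2
        + \mathrm{SE}(\hat{b}_{\text{diff}})^2
        + 2\,\hat{\rho}\;\mathrm{SE}(\hat{b}_{\text{ctrl}})\,\mathrm{SE}(\hat{b}_{\text{diff}})}
```

Here $\mathrm{SE}(\hat{b}_{\text{ctrl}}) = 0.08$ and $\mathrm{SE}(\hat{b}_{\text{diff}}) = 0.11$ come from the output, but $\hat{\rho}$ is not reported, so the expression cannot be evaluated from the information given.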

  10. (a) Using model 1, the estimated probability that the tumor is malignant is $p = (1 + \exp(-(5.1820 - 0.9461 \cdot 3)))^{-1} = 0.9124$ if the score is 3, and $p = (1 + \exp(-(5.1820 - 0.9461 \cdot 6)))^{-1} = 0.3788$ if the score is 6. Using model 2: $p = (1 + \exp(-(3.6739 - 0.2238 \cdot 3 - 0.0772 \cdot 3^2)))^{-1} = 0.9095$ if the score is 3, and $p = (1 + \exp(-(3.6739 - 0.2238 \cdot 6 - 0.0772 \cdot 6^2)))^{-1} = 0.3898$ if the score is 6. The estimates from model 2 seem to be closer to the proportions observed in the experiment.

      (b) The drop in deviance is 13.8 − 7.1 = 6.7, which we can compare to a chi-square distribution with df = 1 (only 1 extra parameter in model 2). Since 6.7 is considerably larger than 1, the mean of a chi-square distribution with 1 df, the p-value for this analysis of deviance should be quite low, and model 2 is preferable (checked afterwards in R: the p-value is 0.0096). We may also consider the estimate for the quadratic coefficient (−0.0772) and its standard error (0.0317). The estimate is a bit larger (in absolute value) than twice its SE. But since this Wald test is borderline, and since the analysis of deviance is more reliable, I would rather present the analysis of deviance.
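The numbers in question 10 can be reproduced without a statistics package. The sketch below uses only the coefficients and deviances quoted above; the function names are illustrative assumptions. It relies on the fact that a chi-square variable with 1 df is a squared standard normal, so its upper tail is `erfc(sqrt(x / 2))`.

```python
import math


def logistic_prob(eta):
    """Inverse logit: convert a linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def model1_prob(score):
    """Model 1: linear in the score (coefficients quoted in the answer)."""
    return logistic_prob(5.1820 - 0.9461 * score)


def model2_prob(score):
    """Model 2: adds a quadratic term in the score."""
    return logistic_prob(3.6739 - 0.2238 * score - 0.0772 * score ** 2)


def chi2_upper_tail_df1(x):
    """P(X > x) for X ~ chi-square with 1 df, i.e. P(Z^2 > x) for standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))


deviance_drop = 13.8 - 7.1                    # model 1 deviance minus model 2 deviance
p_value = chi2_upper_tail_df1(deviance_drop)  # analysis of deviance, 1 extra parameter
wald_z = -0.0772 / 0.0317                     # Wald statistic for the quadratic term

for score in (3, 6):
    print(score, round(model1_prob(score), 4), round(model2_prob(score), 4))
print("analysis of deviance p-value:", round(p_value, 4))
print("Wald z for quadratic term:", round(wald_z, 2))
```

The closed-form tail only works for 1 df; with more extra parameters a general chi-square routine would be needed.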

  11. (a) The model that was used can be written as $y_i = v_{j[i]} + s_{k[i]} + b_{h[i]} + e_i$, where $v_j$ is the mean yield of variety j, $s_k$ is the effect of site k, and $b_h$ is the effect of block h (blocks are nested within sites). Furthermore, site and block effects were random, i.e. it was assumed that $s_k \sim \text{i.i.d. } N(0, \sigma_s^2)$ and $b_h \sim \text{i.i.d. } N(0, \sigma_b^2)$. It was estimated that $\hat{\sigma}_s = 9.314$ bushels/acre and $\hat{\sigma}_b = 5.634$ (and $\hat{\sigma}_e = 15.152$).

      (b) The effect of site 10 is estimated to be 5.20 and the estimated effect of block 2 in that site is 3.44 bushels/acre, for a combined effect of 8.64. Therefore, the average yields in block 2 of site 10 are estimated to be 125.73 + 8.64 = 134.37 bushels per acre for variety A, 128.09 + 8.64 = 136.73 for variety B, and 120.42 + 8.64 = 129.06 for variety C.

      (c) To find a confidence interval for the difference in mean yields between varieties A and C, I would repeat this many times:
  • simulate new data under the current model
  • analyze this new data with the same model
  • record the estimated difference in mean yields between varieties A and C: $\hat{v}_A^{i} - \hat{v}_C^{i}$, where i is an index for the simulation.

      After running this for i = 1 to i = 1000 for instance, I would look at the 1000 values of $\hat{v}_A - \hat{v}_C$, and discard the lowest 25 values and the highest 25 values to get a 95% confidence interval for the true value of $v_A - v_C$.

      (d) i. 134.37 ± 2 ∗ 15.152 = (104.07, 164.67)

          ii. 134.37 − 129.06 ± 2 ∗

          iii. ± 2 ∗

          iv. ± 2 ∗
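The simulation procedure described in 11(c) can be sketched as follows. Several things here are assumptions rather than facts from the exam: the layout (10 sites, 3 blocks per site, each variety grown once in every block) is hypothetical, and the "analyze with the same model" step is replaced by a plain difference of variety means, which should coincide with the mixed-model estimate only in such a balanced layout. With real data, each simulated data set would be refit with the actual mixed model.

```python
import random

# Estimates quoted in question 11 (bushels/acre).
VARIETY_MEANS = {"A": 125.73, "B": 128.09, "C": 120.42}
SD_SITE, SD_BLOCK, SD_RESID = 9.314, 5.634, 15.152

# Hypothetical balanced layout; the real design is not given in the source.
N_SITES, BLOCKS_PER_SITE = 10, 3


def simulate_difference(rng):
    """Simulate one data set under the fitted model and return vA_hat - vC_hat."""
    totals = {variety: 0.0 for variety in VARIETY_MEANS}
    n_blocks = 0
    for _ in range(N_SITES):
        site_effect = rng.gauss(0.0, SD_SITE)
        for _ in range(BLOCKS_PER_SITE):
            block_effect = rng.gauss(0.0, SD_BLOCK)
            n_blocks += 1
            for variety, mean in VARIETY_MEANS.items():
                noise = rng.gauss(0.0, SD_RESID)
                totals[variety] += mean + site_effect + block_effect + noise
    # Stand-in for refitting the model: difference of raw variety means.
    return (totals["A"] - totals["C"]) / n_blocks


def bootstrap_ci(n_sim=1000, seed=572):
    """Percentile interval: drop the lowest and highest 2.5% of simulated differences."""
    rng = random.Random(seed)
    diffs = sorted(simulate_difference(rng) for _ in range(n_sim))
    cut = n_sim * 25 // 1000  # 25 values from each end when n_sim = 1000
    return diffs[cut], diffs[-cut - 1]


lower, upper = bootstrap_ci()
print("95% interval for vA - vC:", round(lower, 2), round(upper, 2))
```

Fixing the seed makes the interval reproducible; the endpoints depend on the assumed layout and should not be read as the answer to the exam question.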