Practice Final Exam - Statistical Methods for Bioscience I | STAT 572 | Assignments Data Analysis & Statistical Methods

Stat 572 Final Examination Spring 2008

1. false. $r = b_1 \cdot s_x / s_y = (-3/4) \cdot 2/6 = -0.25$

2. true.

3. false. The variance equals the mean.

4. false. It depends on the sign of the coefficient for $x_1$.

5. true.

6. false. The standard deviation 2.5 is about variability in the data around the mean value, not about our uncertainty about the mean value.

7. true.

8. true.

9. (a) From model 1, the prediction is the same at all doses: 21.50 + 0.95 ∗ 235 = 244.75 g. From model 2, the predicted weight is 1.12 + 1.00 ∗ 235 = 236.12 g in the control group, 1.12 + 8.90 + 1.00 ∗ 235 = 245.02 g in the low dose group, and 1.12 + 19.1 + 1.00 ∗ 235 = 255.22 g in the high dose group. With model 3, the predicted weight is 4.50 + 0.99 ∗ 235 = 237.15 g in the control group, 4.50 + 0.99 ∗ 235 − 17.36 + 0.12 ∗ 235 = 247.99 g in the low dose group, and 4.50 + 0.99 ∗ 235 + 25.07 − 0.03 ∗ 235 = 255.17 g in the high dose group.
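The predictions in 9(a) can be checked with a short script. This is only a sketch: the coefficients are copied from the answer above, while the function and dictionary names are made up for illustration and do not come from the exam output.

```python
# Coefficients copied from the answer to 9(a); all names are illustrative.
INITIAL_WEIGHT = 235.0  # initial weight in grams

# Model 2: common slope, group-specific intercept shifts.
MODEL2_SHIFT = {"control": 0.0, "low": 8.90, "high": 19.1}

# Model 3: group-specific intercept shifts and slope shifts.
MODEL3_INTERCEPT_SHIFT = {"control": 0.0, "low": -17.36, "high": 25.07}
MODEL3_SLOPE_SHIFT = {"control": 0.0, "low": 0.12, "high": -0.03}


def predict_model1(weight):
    """Model 1: one line for all dose groups."""
    return 21.50 + 0.95 * weight


def predict_model2(weight, group):
    """Model 2: parallel lines, one intercept per dose group."""
    return 1.12 + MODEL2_SHIFT[group] + 1.00 * weight


def predict_model3(weight, group):
    """Model 3: separate intercept and slope per dose group."""
    intercept = 4.50 + MODEL3_INTERCEPT_SHIFT[group]
    slope = 0.99 + MODEL3_SLOPE_SHIFT[group]
    return intercept + slope * weight


if __name__ == "__main__":
    print("model 1:", round(predict_model1(INITIAL_WEIGHT), 2))
    for group in ("control", "low", "high"):
        print(group,
              round(predict_model2(INITIAL_WEIGHT, group), 2),
              round(predict_model3(INITIAL_WEIGHT, group), 2))
```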

(b) Model 1 assumes the same slope in all groups: 0.95 (no units: g/g). In model 2, the slope is still assumed to be the same in all groups: 1.00. This model says that the final weight is expected to be equal to the initial weight, plus a value that depends on which treatment was received. In model 3, slopes are no longer assumed to be equal. The slope is estimated to be 0.99 − 0.03 = 0.96 in the high dose group. With this model, we estimate that with the high dose, small rats gain more weight on average than large rats. For instance, a 180 g rat gains on average (4.50 + 25.07 + 0.96 ∗ 180) − 180 = 22.37 g, while a 240 g rat would gain on average (4.50 + 25.07 + 0.96 ∗ 240) − 240 = 19.97 g.
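The weight-gain comparison in 9(b) can be reproduced as follows. This is a minimal sketch using the model 3 estimates quoted above; the function name is hypothetical.

```python
def high_dose_gain(initial_weight):
    """Expected gain (g) in the high dose group under model 3.

    Gain is the predicted final weight minus the initial weight.
    """
    intercept = 4.50 + 25.07  # baseline intercept + high dose shift
    slope = 0.99 - 0.03       # baseline slope + high dose slope shift
    return intercept + slope * initial_weight - initial_weight


if __name__ == "__main__":
    for weight in (180, 240):
        print(weight, "g rat gains about", round(high_dose_gain(weight), 2), "g")
```

Because the high dose slope (0.96) is below 1, each extra gram of initial weight lowers the expected gain by 0.04 g, which is why the lighter rat gains more.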

(c) Confidence interval: use $\hat{b}_1 \pm t \cdot SE_{\hat{b}_1}$, where $t$ is the quantile from the t distribution with df = 30 − 4. This is easier in models 1 and 2, where we read the standard error of the common slope directly from the output: 0.09 in model 1, 0.05 in model 2. In model 3, we do not have the slope's standard error in the high group directly. We have the SE of the control group slope (0.08) and the SE of the difference between the control group slope and the high group slope (0.11). With more output, we could get the correlation between these two estimates, then we could use a formula to get the SE of the high group slope (we did not see this formula in class in 2009). An alternative would be to set the 'high dose' group as the baseline group, and re-run the analysis so that the output provides the SE of the slope in the high group directly.
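The formula alluded to in 9(c) is the standard rule for the variance of a sum of two estimates: Var(a + b) = Var(a) + Var(b) + 2 Corr(a, b) SE(a) SE(b). The sketch below applies it to the two standard errors quoted above. The correlation is not given in the output, so the values tried here are purely hypothetical placeholders.

```python
import math


def se_of_sum(se_a, se_b, corr):
    """Standard error of (a + b) from the two SEs and their correlation."""
    variance = se_a ** 2 + se_b ** 2 + 2.0 * corr * se_a * se_b
    return math.sqrt(variance)


if __name__ == "__main__":
    se_control_slope = 0.08  # SE of the control group slope (from the output)
    se_slope_shift = 0.11    # SE of the high-vs-control slope difference
    # Hypothetical correlations: the real value would come from the
    # estimated covariance matrix of the coefficients.
    for corr in (-0.5, 0.0, 0.5):
        se_high = se_of_sum(se_control_slope, se_slope_shift, corr)
        print("corr =", corr, "-> SE of high dose slope =", round(se_high, 4))
```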

(d) Model 2 is most appropriate: the F test comparing model 2 and model 3 says that the slopes in the 3 groups are not significantly different (p = 0.4676). Also, the F test comparing model 2 and model 1 says that the intercepts in the 3 different groups are different, with very strong evidence ($p = 4.5 \times 10^{-9}$).

(e) We would need to consider the treatment as a numerical variable (dosage d = 0, 0.10 and 0.25) rather than a categorical variable (control, low, high). The model described can be written as:

$\text{final weight} = (\beta_0 + \beta_1 d) \times \text{initial weight} + e$

Therefore, to fit this model, I would use linear regression with the following predictors: dosage (numerical), initial weight, and their interaction dosage:initial weight.

10. (a) Using model 1: estimated probability that the tumor is malignant: $p = (1 + \exp(-(5.1820 - 0.9461 \cdot 3)))^{-1} = 0.9124$ if the score is 3, and $p = (1 + \exp(-(5.1820 - 0.9461 \cdot 6)))^{-1} = 0.3788$ if the score is 6. Using model 2:



$p = (1 + \exp(-(3.6739 - 0.2238 \cdot 3 - 0.0772 \cdot 3^2)))^{-1} = 0.9095$ if the score is 3, and $p = (1 + \exp(-(3.6739 - 0.2238 \cdot 6 - 0.0772 \cdot 6^2)))^{-1} = 0.3898$ if the score is 6. The estimates from model 2 seem to be closer to the proportions observed in the experiment.

(b) The drop in deviance is 13.8 − 7.1 = 6.7, which we can compare to a chi-square distribution with df = 1 (only 1 extra parameter in model 2). Since 6.7 is much larger than 1, the mean of that distribution, I think the p-value for this analysis of deviance is quite low and model 2 is preferable (checked afterwards in R: the p-value is 0.0096). We may also consider the estimate for the quadratic coefficient (−0.0772) and its standard error (0.0317). The estimate is a bit larger (in absolute value) than twice its SE. But since this Wald test is borderline, and since the analysis of deviance is more reliable, I would rather present the analysis of deviance.
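The numbers in question 10 can be double-checked with the standard library alone. This is a sketch under stated assumptions: the coefficients and deviances are those quoted above, the function names are invented for illustration, and the chi-square tail with 1 degree of freedom is computed through the identity P(Z^2 > x) = erfc(sqrt(x/2)) rather than through a statistics package.

```python
import math


def inv_logit(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def p_model1(score):
    """Model 1: linear in the score."""
    return inv_logit(5.1820 - 0.9461 * score)


def p_model2(score):
    """Model 2: adds a quadratic term in the score."""
    return inv_logit(3.6739 - 0.2238 * score - 0.0772 * score ** 2)


def chi2_upper_tail_df1(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    for score in (3, 6):
        print(score, round(p_model1(score), 4), round(p_model2(score), 4))
    deviance_drop = 13.8 - 7.1
    print("analysis of deviance p-value:", round(chi2_upper_tail_df1(deviance_drop), 4))
    print("Wald z for the quadratic term:", round(-0.0772 / 0.0317, 2))
```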

11. (a) The model that was used can be written as: $y_i = v_{j[i]} + s_{k[i]} + b_{h[i]} + e_i$, where $v_j$ is the mean yield of variety $j$, $s_k$ is the effect of site $k$, and $b_h$ is the effect of block $h$ (blocks are nested within sites). Furthermore, site and block effects were random, i.e. it was assumed that $s_k \sim$ i.i.d. $N(0, \sigma_s^2)$ and $b_h \sim$ i.i.d. $N(0, \sigma_b^2)$. It was estimated that $\hat{\sigma}_s = 9.314$ bushels/acre and $\hat{\sigma}_b = 5.634$ (and $\hat{\sigma}_e = 15.152$).

(b) The effect of site 10 is estimated to be 5.20 bushels/acre and the estimated effect of block 2 in that site is 3.44 bushels/acre, for a combined effect of 5.20 + 3.44 = 8.64. Therefore, the average yields in block 2 of site 10 are estimated to be 125.73 + 8.64 = 134.37 bushels per acre for variety A, 128.09 + 8.64 = 136.73 for variety B, and 120.42 + 8.64 = 129.06 for variety C.

(c) To find a confidence interval for the difference in mean yields between varieties A and C, I would repeat this many times:

- simulate new data under the current model
- analyze this new data with the same model
- record the estimated difference in mean yields between varieties A and C: $\hat{v}_{A,i} - \hat{v}_{C,i}$, where $i$ is an index for the simulation.

After running this for i = 1 to i = 1000 for instance, I would look at the 1000 values of $\hat{v}_A - \hat{v}_C$, and discard the lowest 25 values and the highest 25 values to get a 95% confidence interval for the true value of $v_A - v_C$.

(d) i. 134.37 ± 2 ∗ 15.152 = (104.07, 164.67)

ii. 134.37 − 129.06 ± 2 ∗

iii. ± 2 ∗

iv. ± 2 ∗
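The last step of the simulation procedure in 11(c), discarding the 25 lowest and 25 highest of 1000 simulated differences, can be sketched as below. The simulation itself is only a stand-in: refitting the mixed model to each simulated data set needs a mixed-model library, so `simulate_difference` is a hypothetical placeholder that draws from a normal distribution centered at 125.73 − 120.42 with an assumed standard error of 3.0 (a made-up value, not taken from the exam output).

```python
import random


def percentile_interval(estimates, level=0.95):
    """Percentile interval: drop an equal share of values from each tail."""
    ordered = sorted(estimates)
    n_drop = int(round(len(ordered) * (1.0 - level) / 2.0))
    return ordered[n_drop], ordered[len(ordered) - n_drop - 1]


def simulate_difference(rng):
    """Placeholder for 'simulate new data, refit, record vA - vC'.

    ASSUMPTION: a normal draw with a made-up standard error of 3.0 replaces
    the real simulate-and-refit step, which is not reproduced here.
    """
    return rng.gauss(125.73 - 120.42, 3.0)


if __name__ == "__main__":
    rng = random.Random(572)  # fixed seed so the sketch is repeatable
    differences = [simulate_difference(rng) for _ in range(1000)]
    lower, upper = percentile_interval(differences)
    print("95% interval for vA - vC:", round(lower, 2), "to", round(upper, 2))
```

With 1000 simulated values and a 95% level, `percentile_interval` keeps the 26th smallest and the 26th largest values as the interval endpoints, matching the "discard 25 on each side" rule described in (c).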
