Solved Midterm Exam 2 - Statistical Methods for Bioscience I | STAT 571, Exams of Data Analysis & Statistical Methods

Material Type: Exam; Professor: Ane; Class: Statistical Methods for Bioscience I; Subject: STATISTICS; University: University of Wisconsin - Madison; Term: Fall 2007;


Uploaded on 09/02/2009

Stat 571 Second Midterm Exam November 20, 2007
1(a) $df = 10 + (13 - 1) - 2 = 20$, since one zero is dropped in the second sample. With a 2-sided test, $.02 < p < .05$, and we conclude that the two groups have different variances (reject “$\sigma_1 = \sigma_2$”).
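The degrees-of-freedom count in 1(a) can be reproduced with a minimal sketch. It assumes (the solution does not name the test) that the variance comparison is a median-based Levene-type test: a pooled two-sample t-test on absolute deviations from each sample's median, with one zero deviation dropped from a sample of odd size. The function names are hypothetical, and no exam data are included.

```python
from math import sqrt
from statistics import mean, median, variance


def abs_dev_from_median(sample):
    """Absolute deviations |y - median|; for odd n, drop one zero (the median itself)."""
    m = median(sample)
    devs = [abs(y - m) for y in sample]
    if len(sample) % 2 == 1:
        devs.remove(0)  # for odd n the median is a data value, so exactly one zero is removed
    return devs


def levene_median_t(sample1, sample2):
    """Hypothetical helper: pooled two-sample t statistic on absolute deviations, and its df."""
    d1 = abs_dev_from_median(sample1)
    d2 = abs_dev_from_median(sample2)
    n1, n2 = len(d1), len(d2)
    df = n1 + n2 - 2
    # Pooled variance of the deviations
    sp2 = ((n1 - 1) * variance(d1) + (n2 - 1) * variance(d2)) / df
    t = (mean(d1) - mean(d2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df
```

With samples of 10 and 13 plants, the second sample keeps 12 deviations, so `df` is $10 + 12 - 2 = 20$; the p-value is then read from a t table with that many degrees of freedom.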
(b) Using the t-test that does not require equal variances, we get $s^2_{\bar y_1 - \bar y_2} = 8.06^2/10 + 2.05^2/13 = 6.82$ and $t = (90.30 - 86.65)/\sqrt{6.82} = 1.398$. Using df = 9 and a 2-sided test, we get $.10 < p < .20$. We fail to reject “$\mu_1 = \mu_2$”: there is no significant difference between the average sizes of the two Lyrebird species.
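The arithmetic in 1(b) can be checked from the summary statistics alone. The solution does not say how df = 9 was obtained; this sketch assumes the Welch-Satterthwaite approximation, which gives about 9.9, and rounding down to 9 is one conservative choice that matches (so does $\min(n_1 - 1, n_2 - 1) = 9$). The function name `welch_summary` is hypothetical.

```python
from math import floor, sqrt


def welch_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic from summary statistics, with Satterthwaite df."""
    v1 = sd1 ** 2 / n1          # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2          # estimated variance of the second sample mean
    se2 = v1 + v2               # s^2 of (ybar1 - ybar2)
    t = (mean1 - mean2) / sqrt(se2)
    df = se2 ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se2, t, df


se2, t, df = welch_summary(90.30, 8.06, 10, 86.65, 2.05, 13)
print(round(se2, 2), round(t, 3), round(df, 1), floor(df))  # 6.82 1.398 9.9 9
```

Note that the difference in means is divided by the square root of 6.82 (about 2.61), not by 6.82 itself.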
2(a) The Mann-Whitney test p-value is 0.028 because the ranks do not change. The t-test p-value is smaller than 0.006 because only $\bar y_1 - \bar y_2$ changes (increases), while the sample sizes (4) and the variability ($s_1$ and $s_2$) stay the same. So $t$ increases and the p-value decreases.
(b) Data set 3 (top). Since the ranks are the same in data set 3 and in data set 1, the p-value from the Mann-Whitney test is 0.028, which is < .05. With data set 4 the rank sums would be 23 and 13, and 13 is too big to get significance at the .05 level (the table says we need 10 or less). Note: data set 3 returns $p > .05$ with a t-test because the variability in sample “b” is large. Data set 4 also returns $p > .05$ with a t-test because the difference in sample means is small.
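The table cutoff used in 2(a) and 2(b) can be recovered by enumerating the exact null distribution of the rank sum. This sketch assumes two samples of 4 with no ties and a two-sided p-value defined as the probability of a rank sum at least as far from its mean (18) as the observed one; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def rank_sum_two_sided_p(w, n1=4, n2=4):
    """Exact two-sided p-value for rank sum w of sample 1 (no ties)."""
    N = n1 + n2
    center = Fraction(n1 * (N + 1), 2)          # E[W] under H0
    # Under H0 every set of n1 ranks out of 1..N is equally likely
    sums = [sum(c) for c in combinations(range(1, N + 1), n1)]
    extreme = sum(1 for s in sums if abs(s - center) >= abs(w - center))
    return Fraction(extreme, len(sums))


for w in (10, 11, 13):
    p = rank_sum_two_sided_p(w)
    print(w, p, float(p))
```

There are 70 equally likely rank sets. A rank sum of 10 (complete separation) gives $2/70 \approx 0.029$, the 0.028 quoted above up to rounding and the smallest attainable p-value; 11 already gives $4/70 \approx 0.057 > .05$, which is why 10 or less is needed; and 13 gives $14/70 = 0.2$.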
(c) For data set 3, the Mann-Whitney test would be most appropriate, because we are not sure the data are normally distributed (presence of an ‘outlier’ in sample “b”) and each sample is small (4 observations). For data set 4, a t-test would be most appropriate, because a normal distribution seems reasonable for each sample and the t-test is more powerful than the Mann-Whitney test when both can be used.
3(a) Smaller with a one-sided test in the correct direction. The formula for $n$ is $n = \sigma^2 (z_{\alpha/2} + z_\beta)^2/(\mu_0 - \mu_a)^2$ for a 2-sided test, while for a one-sided test we replace $z_{\alpha/2} = z_{.015} = 2.17$ by $z_\alpha = z_{.03} = 1.88$. Since $z_\alpha$ is smaller, $n$ is also smaller for a one-sided test.
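The comparison in 3(a) can be made concrete with the sample-size formula. This is a sketch only: it borrows $\sigma^2 = 104$, $\mu_0 - \mu_a = -5$ and $\alpha = .03$ from part (b), and the target $\beta = 0.20$ (80% power) is a hypothetical value chosen for illustration; `sample_size` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def sample_size(sigma2, delta, alpha, beta, two_sided=True):
    """n = sigma^2 (z_crit + z_beta)^2 / delta^2, rounded up to a whole subject."""
    z = NormalDist().inv_cdf              # z(q) is the value with P(Z < z) = q
    z_crit = z(1 - alpha / 2) if two_sided else z(1 - alpha)   # 2.17 vs 1.88 for alpha = .03
    z_beta = z(1 - beta)
    return ceil(sigma2 * (z_crit + z_beta) ** 2 / delta ** 2)


n_two = sample_size(104, 5, 0.03, 0.20, two_sided=True)
n_one = sample_size(104, 5, 0.03, 0.20, two_sided=False)
print(n_two, n_one)  # 38 31
```

Only the critical quantile changes between the two calls, so the one-sided design always needs fewer observations for the same $\alpha$ and $\beta$.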
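(b) Using $n = \sigma^2 (z_{\alpha/2} + z_\beta)^2/(\mu_0 - \mu_a)^2$ we get $25 = 104\,(2.17 + z_\beta)^2/(0 - 5)^2$, i.e. $z_\beta = \sqrt{6.01} - 2.17 = 0.281$, and with Table A $\beta = .3897$, so the power is 61%. One can get the same result by determining the rejection region: outside $0 \pm 2.17\sqrt{104/25} = \pm 4.426$. With standard error $\sqrt{104/25} = 2.04$ and true mean $\mu_a = 5$, the power is then $P\{Z < -9.426/2.04\} + P\{Z > -0.574/2.04\} = 0 + P\{Z > -0.281\} = 1 - 0.3897 = 0.61$.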
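The rejection-region route in 3(b) translates directly into a few lines. This sketch keeps the table value 2.17 for $z_{.015}$ so the numbers match the solution, and uses Python's `statistics.NormalDist` in place of Table A.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()                     # standard normal

sigma2, n, mu0, mua = 104, 25, 0.0, 5.0
z_crit = 2.17                        # z_{.015}, two-sided alpha = .03
se = sqrt(sigma2 / n)                # standard error of ybar, about 2.04
lo, hi = mu0 - z_crit * se, mu0 + z_crit * se   # reject H0 when ybar is outside [lo, hi]

# Power = P(ybar < lo) + P(ybar > hi) when ybar ~ N(mua, se^2)
power = Z.cdf((lo - mua) / se) + (1 - Z.cdf((hi - mua) / se))
print(round(hi, 3), round(power, 2))  # 4.426 0.61
```

The lower tail contributes essentially nothing (about $2 \times 10^{-6}$), which is the "0 +" term in the solution.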
4(a) $\hat p_A = 63/220 = .2864$, $\hat p_C = 59/80 = .7375$ and $\hat p = (63 + 59)/300 = .4067$. We can approximate the binomial with a normal distribution because $220\hat p = 89.5 > 5$ and $220\hat q = 130.5 > 5$, and similarly $80\hat p = 32.5 > 5$ and $80\hat q = 47.5 > 5$. Then $z = (.7375 - .2864)/\sqrt{.4067 \cdot .5933 \cdot (1/220 + 1/80)} = 7.034$. Using the normal distribution we get $p < 2 \times 2.9 \times 10^{-7}$, which is way smaller than .0001 and way smaller than $\alpha = .10$. We strongly reject $p_A = p_C$: there is strong evidence that the locus is linked to some genetic region affecting flowering time, the C allele being linked to early flowering. Assumptions include random sampling of plants and independence among plants.
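The test in 4(a) is a pooled two-proportion z-test; a minimal sketch follows. The helper name is hypothetical, and instead of bounding the p-value from a table, it computes the two-sided normal p-value directly with `math.erfc`, using $2P(Z > |z|) = \operatorname{erfc}(|z|/\sqrt{2})$.

```python
from math import erfc, sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic (sample 2 minus sample 1) and two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)              # pooled estimate under H0: p1 = p2
    q = 1 - p
    z = (p2 - p1) / sqrt(p * q * (1 / n1 + 1 / n2))
    p_value = erfc(abs(z) / sqrt(2))        # equals 2 * P(Z > |z|)
    return z, p_value


z, p_value = two_proportion_z(63, 220, 59, 80)   # allele A, then allele C
print(round(z, 2), p_value < 2 * 2.9e-7)
```

With unrounded proportions $z$ comes out at about 7.03, matching the 7.034 above up to rounding, and the exact p-value is on the order of $10^{-12}$, well below the table bound.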
(b) Observations are clearly paired: each leaf observation is paired with the root observation made on the same plant. Therefore, one would use a two-sided paired-sample t-test, provided that the distribution of the difference (leaf expression − root expression) is not too far from a normal distribution. Note: there is no need to assume that $\sigma_1 = \sigma_2$ and no need to assume normality of gene expression itself; only the normality of the expression difference is needed.
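The paired analysis in 4(b) reduces to a one-sample t statistic on the per-plant differences, which is why only their normality matters. A minimal sketch, with a hypothetical function name and no exam data:

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(leaf, root):
    """Paired-sample t statistic on the differences leaf - root, with df = n - 1."""
    if len(leaf) != len(root):
        raise ValueError("each plant needs one leaf and one root measurement")
    d = [a - b for a, b in zip(leaf, root)]   # one difference per plant
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))
    return t, n - 1
```

The two-sided test rejects when $|t|$ exceeds $t_{\alpha/2,\,n-1}$; the separate leaf and root variances never enter the calculation.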
5(a) false: we mostly use a t-distribution. The normal distribution is only used when the variance is known.
true
true, because z-quantiles are smaller than t-quantiles.
false: the confidence interval is centered at the sample mean $\bar y$ (which is random), while the non-rejection region is centered at the null value $\mu_0$ (not random: known before collecting the data).
true
true: they both have the same length, which is twice $z_{\alpha/2}\,\sigma/\sqrt{n}$.
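The last two statements can be read side by side. Assuming known $\sigma$ and a two-sided z-procedure at level $\alpha$, the interval and the non-rejection region have the same half-width but different centers, and each contains the other's center under the same condition:

```latex
\begin{aligned}
\text{CI:}\quad & \bar y \pm z_{\alpha/2}\,\sigma/\sqrt{n} \\
\text{non-rejection region:}\quad & \mu_0 \pm z_{\alpha/2}\,\sigma/\sqrt{n} \\
\text{common length:}\quad & 2\,z_{\alpha/2}\,\sigma/\sqrt{n} \\
\mu_0 \in \text{CI} \iff\quad & |\bar y - \mu_0| \le z_{\alpha/2}\,\sigma/\sqrt{n}
  \iff \bar y \in \text{non-rejection region}
\end{aligned}
```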
Summary of grades: [figure: histogram of exam grades, frequency (0 to 40) vs. grade (20 to 100); the values 76, 83 and 89 are marked on the plot]
