Sample Midterm Examination 1 - Statistical Methods | STAT 500, Exams of Data Analysis & Statistical Methods

Material Type: Exam; Professor: Dixon; Class: STATISTICAL METHODS; Subject: STATISTICS; University: Iowa State University; Term: Fall 2007;


Uploaded on 09/02/2009

Stat 500 Midterm 1, 2 October 2007
Please put your name on the back of your answer book.
Do NOT put it on the front. Thanks.
The exam is closed book, closed notes. Use only the formula sheet and tables I provide today.
You may use a calculator.
Write your answers in your blue book. Ask if you need a second (or third) blue book.
You have 2 hours (120 minutes) to complete the exam.
Stop working when the end of the exam is announced.
Points are indicated for each question. There are 120 total points.
Important reminders:
  • budget your time. Some parts of each question should be easy; others may be hard. Make sure you do all parts you can.
  • notice that some parts do not require any computations.
  • show your work neatly so you can receive partial credit.
Good luck!
1. 10 pts. This problem is based on a study of nurses' salaries in for-profit and non-profit hospitals.
The investigators are interested in the evenness of salaries; that is, whether all nurses receive
similar salaries or whether salaries are very different. The Gini coefficient was used as the
measure of salary evenness. The details of the computation are irrelevant, but it may help to
know that a Gini coefficient of 0 indicates all salaries are identical (maximum possible evenness)
and a coefficient of 1 indicates the minimum possible evenness.
The investigators randomly sampled 20 for-profit hospitals and 20 non-profit hospitals in the
United States and computed the Gini coefficient for each group. Here is what they found:
Group        n_i   Gini coefficient   s.e.
for-profit   20    0.672              0.120
non-profit   20    0.457              0.105
difference         0.215              0.159
The investigators computed a randomization distribution with 99 values. Here are the 10 smallest
and 10 largest values in the randomization distribution:
-0.456 -0.356 -0.273 -0.257 -0.231 -0.210 -0.198 -0.157 -0.152 -0.143 · · · 0.171 0.172 0.181 0.218
0.239 0.248 0.263 0.357 0.440 0.601
They also computed a bootstrap distribution with 100 values. Here are the 10 smallest and 10
largest values in that distribution.
-0.173 -0.171 -0.168 -0.150 -0.143 -0.141 -0.135 -0.130 -0.127 -0.113 · · · 0.554 0.555 0.568 0.599
0.608 0.642 0.717 0.745 0.786 0.800
a) 5 pts. Calculate an appropriate design-based 90% confidence interval for the difference in
Gini coefficients between for-profit and non-profit hospitals in the United States. If you need
additional information, indicate what is needed.
b) 5 pts. Is it reasonable to extrapolate from the 40 hospitals in the study to all for-profit and
all non-profit hospitals in the United States? Briefly explain why or why not.
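
One way to read part (a), sketched here under assumptions: because the hospitals were randomly sampled, the bootstrap distribution is taken as the design-based reference, and a percentile interval is read off its sorted tails. The helper name `percentile_interval` and the order-statistic convention (k-th smallest and k-th largest, with k = B(1 - level)/2) are illustrative choices, not something the exam specifies; other conventions interpolate between neighbouring values.

```python
# Tails of the 100-value bootstrap distribution quoted in the problem.
smallest = [-0.173, -0.171, -0.168, -0.150, -0.143,
            -0.141, -0.135, -0.130, -0.127, -0.113]   # ranks 1..10
largest = [0.554, 0.555, 0.568, 0.599, 0.608,
           0.642, 0.717, 0.745, 0.786, 0.800]         # ranks 91..100


def percentile_interval(low_tail, high_tail, n_boot, level):
    """Percentile interval from the sorted tails of a bootstrap distribution.

    Assumed convention: the lower limit is the k-th smallest value and the
    upper limit is the k-th largest, with k = n_boot * (1 - level) / 2.
    """
    k = round(n_boot * (1 - level) / 2)
    if k < 1 or k > len(low_tail) or k > len(high_tail):
        raise ValueError("tails do not reach the required order statistics")
    lower = low_tail[k - 1]                  # k-th smallest
    upper = high_tail[len(high_tail) - k]    # k-th largest
    return lower, upper


ci_90 = percentile_interval(smallest, largest, n_boot=100, level=0.90)
print(ci_90)
```

Only the listed tails are needed because a 90% interval from 100 values uses order statistics within the 10 smallest and 10 largest.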

2. 32 pts (4 pts each part). A randomized experiment was set up to compare 3 treatments (labelled A, B, and C). Each treatment was randomly applied to 8 e.u.'s, so the total sample size, N, is 24. The sample means, variances, and pooled error variance are:

Ȳ_A.   Ȳ_B.   Ȳ_C.   s^2_A  s^2_B  s^2_C  s^2_p
3.10   2.00   2.00   1.2    1.2    0.6    1.0

The investigators are especially interested in the difference between treatments A and B. They test Ho: μA − μB = 0 using a linear contrast, the pooled error variance, and a T distribution. The p-value is 0.039.

Each item in the following list identifies one change in the experimental design or data. Each item on the list may affect the p-value for the test of Ho: μA − μB = 0. Tell me whether the p-value will INCREASE (become less significant), DECREASE (become more significant), NOT CHANGE, or you CAN'T TELL. You DO NOT need to calculate or report the new p-value(s). No explanation needed.

(a) Decrease the number of replicates (e.g. from 8 to 4 e.u.s per treatment)
(b) Increase the sample variance (s^2_p)
(c) Decrease the sample average for treatment A from 3.10 to 2.
(d) Increase the number of treatments (e.g. from 3 to 5). The number of replicates per e.u. is not changed.

The following are changes to the test. Tell me whether the p-value for the new test will INCREASE (become less significant), DECREASE (become more significant), NOT CHANGE, or you CAN’T TELL, compared to the original test. Again, you DO NOT need to report the new p-value(s).

(e) Test Ho: μA − μB = 1.
(f) Test Ho: μA − (μB + μC)/2 = 0
(g) Test Ho: μA − μB = 0 using a Tukey multiple comparisons adjustment.
(h) Test Ho: μA − μB = 0 using an unequal variance (Welch) t-test.
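
The original test in this problem can be reproduced from the summary statistics, which helps when reasoning about how each change moves the p-value. The sketch below assumes the usual pooled-variance contrast test: with equal replication the pooled variance is the average of the three group variances, and the error d.f. is 3(8 - 1) = 21. The helpers `t_pdf` and `two_sided_p` are illustrative names; the t tail area is approximated by Simpson's rule rather than taken from a table or a statistics library.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value, integrating the density on [0, |t|] by Simpson's rule."""
    b = abs(t_stat)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3            # P(0 < T < |t|)
    return 1 - 2 * central


n = 8                                   # e.u.'s per treatment
means = {"A": 3.10, "B": 2.00, "C": 2.00}
variances = {"A": 1.2, "B": 1.2, "C": 0.6}

s2_pooled = sum(variances.values()) / len(variances)   # equal n: simple average
df_error = len(means) * (n - 1)
se_diff = math.sqrt(s2_pooled * (1 / n + 1 / n))
t_stat = (means["A"] - means["B"]) / se_diff
p_value = two_sided_p(t_stat, df_error)

print(round(t_stat, 3), df_error, round(p_value, 3))
```

Changing `n`, a variance, or a mean in this sketch and rerunning it gives a quick check on the direction of each answer, though the exam asks only for the direction, not the value.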

3. 48 pts. The following problem is based on ISU research to understand biological mechanisms for the beneficial effects of Echinacea, a herbal medicine. A graduate student is looking at anti-inflammatory activity, using an assay of PGE-2 activity. Lower values are a good thing; they correspond to less inflammation after an injury. There are many different species of Echinacea and each species grows in many different locations. Different species and different locations are suspected of having different anti-inflammatory activity. In this particular experiment, the student is comparing seven treatments, including two different species each from three locations, and a control treatment. The control is not expected to have any effect. 35 "cell cultures" were prepared; treatments were randomly assigned to these cell cultures, with 5 cell cultures per treatment. There are 35 lines of data in the data file. SAS code and output follow.

Treatment #   species           location   average   s.d.
1             control                      91.8      1.
2             E. angustifolia   A          79.2      2,
3             E. angustifolia   B          74.4      3.
4             E. angustifolia   C          79.4      3.
5             E. purpurea       D          29.8      9.
6             E. purpurea       E          23.8      4.
7             E. purpurea       F          28.4      4.

(a) 10 pts. Complete the ANOVA table

Source       d.f.   SS       MS
Treatments
Error                        17.
Total               25271.

(b) 5 pts. Test Ho: all treatments have the same mean. Report your test statistic and an approximate p-value. Provide an appropriate one sentence conclusion.
(c) 8 pts. What is the observational unit in this study? What is the experimental unit?

The investigators chose the treatments because they are interested in three questions:
a) What is the difference between the control and the average of the 6 Echinacea treatments?
b) What is the difference between the two Echinacea species, averaged over locations?
c) Are there differences between locations within species?

(d) 6 pts. Estimate the difference between the two Echinacea species averaged over locations. Estimate the s.e. of this difference. What d.f. is associated with this s.e.?
(e) 4 pts. Both questions a) and b) above can be answered by linear contrasts. Are those contrasts orthogonal? Explain why or why not.
(f) 5 pts. The investigator's question c), "Are there differences between locations within species?", is a comparison between locations A, B, and C, and between locations D, E, and F. Use the available information to provide the most appropriate answer. Report your test statistic (or statistics) and p-value(s). If there is insufficient information, indicate what additional information you need.
(g) 5 pts. Is the assumption of independence appropriate here? Explain why or why not.
(h) 5 pts. A residual plot for these data is shown at the end of the SAS output. Is it appropriate to log transform the data? Explain why or why not.
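
For part (e), orthogonality of two contrasts can be checked directly from their coefficient vectors. The sketch below uses the coefficients that appear in the SAS `contrast` statements for questions a) and b), and assumes the standard definition: with n_i replicates per treatment, two contrasts are orthogonal when the sum of c_i d_i / n_i is zero. The function names are illustrative.

```python
# Coefficients taken from the SAS contrast statements for questions a) and b).
control_vs_rest = [6, -1, -1, -1, -1, -1, -1]
species_diff = [0, 1, 1, 1, -1, -1, -1]
reps = [5] * 7          # 5 cell cultures per treatment


def is_contrast(coefs):
    """A linear combination is a contrast when its coefficients sum to zero."""
    return sum(coefs) == 0


def are_orthogonal(c, d, n, tol=1e-12):
    """Orthogonality condition: sum of c_i * d_i / n_i equals zero."""
    return abs(sum(ci * di / ni for ci, di, ni in zip(c, d, n))) < tol


print(is_contrast(control_vs_rest), is_contrast(species_diff))
print(are_orthogonal(control_vs_rest, species_diff, reps))
```

The same check applied to the pairwise location contrasts (for example '2 - 3' and '2 - 4') shows that they are not mutually orthogonal, which is relevant to how part (f) is approached.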

data pge;
  infile ...;
  input trt pge2;
proc glm;
  class trt;
  model pge2 = trt;
  lsmeans trt / stderr pdiff adjust=tukey;
  estimate '1 vs ave. of rest' trt 6 -1 -1 -1 -1 -1 -1 / divisor=6;
  contrast '1 vs ave. of rest' trt 6 -1 -1 -1 -1 -1 -1;
  contrast 'ave. of 2,3,4 - ave. of 5,6,7' trt 0 1 1 1 -1 -1 -1;
  contrast '2 - 3' trt 0 1 -1 0 0 0 0;
  contrast '2 - 4' trt 0 1 0 -1 0 0 0;
  contrast '3 - 4' trt 0 0 1 -1 0 0 0;
  contrast '5 - 6' trt 0 0 0 0 1 -1 0;
  contrast '5 - 7' trt 0 0 0 0 1 0 -1;
  contrast '6 - 7' trt 0 0 0 0 0 1 -1;
  title 'Echinaceae PGE2 assay';
run;

Echinaceae PGE2 assay

Class Level Information

Class   Levels   Values
trt     7        1 2 3 4 5 6 7

Number of Observations Read   35
Number of Observations Used   35

Part of the output deleted

Least Squares Means
Adjustment for Multiple Comparisons: Tukey

trt   pge2 LSMEAN   Standard Error   Pr > |t|
1     91.8000000    1.8852813        <.0001
2     79.2000000    1.8852813        <.0001
3     74.4000000    1.8852813        <.0001
4     74.0000000    1.8852813        <.0001
5     29.8000000    1.8852813        <.0001
6     23.8000000    1.8852813        <.0001
7     28.4000000    1.8852813        <.0001
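
The standard error printed for each least squares mean carries enough information to recover the error mean square, which several parts of the problem depend on. The sketch below assumes the balanced one-way ANOVA relation s.e.(mean) = sqrt(MSE / n), with n = 5 cell cultures per treatment and 7 treatments as stated in the problem. Variable names are illustrative.

```python
import math

n_per_trt = 5            # cell cultures per treatment
n_trt = 7                # treatments, including the control
se_lsmean = 1.8852813    # standard error printed for every LSMEAN

# Balanced one-way ANOVA: se(mean) = sqrt(MSE / n), so MSE = n * se^2.
mse = n_per_trt * se_lsmean ** 2

# Degrees of freedom for the ANOVA table.
df_trt = n_trt - 1
df_total = n_trt * n_per_trt - 1
df_error = df_total - df_trt
ss_error = mse * df_error

# s.e. of the difference between two species means, each averaged over
# 3 locations, so each average is based on 3 * 5 = 15 cell cultures.
n_per_species = 3 * n_per_trt
se_species_diff = math.sqrt(mse * (1 / n_per_species + 1 / n_per_species))

print(round(mse, 3), df_trt, df_error, df_total)
print(round(ss_error, 1), round(se_species_diff, 3))
```

The species-difference standard error uses the error d.f. computed here, since it is built from the pooled error mean square.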