









Material Type: Exam; Class: Introduction to Biostatistics; Subject: STATISTICS; University: University of Wisconsin - Madison; Term: Spring 2008;










6-1. Suppose that a random sample of n = 51 children is selected from the population of newborn infants in Mexico. The probability that a child in this population weighs at most 2500 grams is
2500 grams, using… (a) the exact binomial distribution, (b) the normal approximation to the binomial distribution (with continuity correction). [[Technically, an assumption under which this method may be used is violated here. Why?]]
and that in this random sample of n = 51 children, we find six whose weights are under 2500 grams. Calculate the associated p-value (with continuity correction) and 95% confidence
significance level.
6-2. A new “smart pill” is tested on n = 36 individuals randomly sampled from a certain population
is performed, to see if there is any statistically significant difference from the mean IQ score of the original population. Using this information, answer the following.
(a) Calculate the p-value of the sample.
(b) Fill in the following table, concluding with the decision either to reject or not reject the null
Significance Level    Confidence Level    Confidence Interval    Decision about H0
. . .
(c) Extend these observations to more general circumstances. Namely, as the significance level decreases, what happens to the ability to reject a null hypothesis? Explain why this is so, in terms of the p-value and generated confidence intervals.
6-3. Consider the distribution of serum cholesterol levels for all 20- to 74-year-old males living in the United States. The mean of this population is 211 mg/dL, and the standard deviation is 46. mg/dL. In a study of a subpopulation of such males who smoke and are hypertensive, it is assumed (not unreasonably) that the distribution of serum cholesterol levels is normally
population. (a) Formulate the null hypothesis and complementary alternative hypothesis, for testing whether
smokers is equal to the known mean serum cholesterol level of 211 mg/dL of the general population of 20- to 74-year-old males. (b) In the study, a random sample of size n = 12 hypertensive smokers was selected, and found to have a sample mean cholesterol level of $\bar{x}$ = 217 mg/dL. Construct a 95% confidence interval for the true mean cholesterol level of this subpopulation.
(d) Based on your answers in parts (b) and (c), is the null hypothesis rejected in favor of the
exactly has been demonstrated, based on the empirical evidence? (e) Determine the 95% acceptance region and complementary rejection region for the null hypothesis. Is this consistent with your findings in part (d)? Why?
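The calculations in Exercise 6-3 can be sketched in Python as follows. This is only an illustration under stated assumptions: it treats σ = 46 mg/dL as known and as applying to the subpopulation too (which the truncated problem text appears to intend), and it uses a two-sided Z-test. The variable names are illustrative, not part of the exercise.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 211.0    # general-population mean (mg/dL), from the problem statement
sigma = 46.0   # ASSUMPTION: known sigma, shared by the subpopulation
n = 12         # sample size
xbar = 217.0   # sample mean (mg/dL)

std_normal = NormalDist()              # standard normal N(0, 1)
z_crit = std_normal.inv_cdf(0.975)     # about 1.96 for a 95% interval
se = sigma / sqrt(n)                   # standard error of the mean

# 95% confidence interval, centered at the sample mean
ci = (xbar - z_crit * se, xbar + z_crit * se)

# Two-sided p-value for H0: mu = mu0
z = (xbar - mu0) / se
p_value = 2 * (1 - std_normal.cdf(abs(z)))

# 95% acceptance region for the sample mean, centered at the null value
acceptance = (mu0 - z_crit * se, mu0 + z_crit * se)

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
print(f"acceptance region: ({acceptance[0]:.2f}, {acceptance[1]:.2f})")
```

The confidence interval and the acceptance region have the same half-width and differ only in where they are centered, which is why the interval contains 211 exactly when the acceptance region contains 217.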
6-4. Consider a random sample of ten children selected from a population of infants receiving antacids that contain aluminum, in order to treat peptic or digestive disorders. The distribution of plasma
are not known. The mean aluminum level for the sample of n = 10 infants is found to be $\bar{x}$ = 37.
(a) Formulate the null hypothesis and complementary alternative hypothesis, for a two-sided test of whether the mean plasma aluminum level of the population of infants receiving antacids is equal to the mean plasma aluminum level of the population of infants not receiving antacids. (b) Construct a 95% confidence interval for the true mean plasma aluminum level of the population of infants receiving antacids.
(d) Based on your answers in parts (b) and (c), is the null hypothesis rejected in favor of the
exactly has been demonstrated, based on the empirical evidence? (e) With the knowledge that significantly elevated plasma aluminum levels are toxic to human beings, reformulate the null hypothesis and complementary alternative hypothesis, for the appropriate one-sided test of the mean plasma aluminum levels. With the same sample data as above, how does the new p-value compare with that found in part (c), and what is the resulting conclusion and interpretation?
6-7. Recall that, for any random variable X, the population mean and population variance are, respectively,
$\mu_X = E[X]$ and $\sigma_X^2 = E[(X - \mu_X)^2]$.
Likewise, for a random variable Y,
$\mu_Y = E[Y]$ and $\sigma_Y^2 = E[(Y - \mu_Y)^2]$.
In addition, suppose each population value of X corresponds to one and only one population value of Y. Then the population covariance between X and Y is defined by
$\text{Cov}(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)]$,
which measures the extent to which they vary with respect to one another.
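These population “expected values” can be estimated by the sample mean, sample variance, and sample covariance, based on data values $(x_1, x_2, \ldots, x_n)$ and $(y_1, y_2, \ldots, y_n)$, respectively:
$\text{Mean}(X) \approx \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$,   $\text{Var}(X) \approx s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
$\text{Mean}(Y) \approx \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$,   $\text{Var}(Y) \approx s_y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$
$\text{Cov}(X, Y) \approx s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$.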
It can be shown (using previous properties of mathematical expectation and some elementary algebra) that for any two random variables X and Y,
(b) Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
and that these properties hold for the sample mean , sample variance , and sample covariance as well.
For the following ordered data sets X and Y , form the new data sets X + Y and X − Y. Calculate all of their sample means, variances, and associated covariance, and verify that formulas 1 and 2 hold. (In R, use mean, var, and cov.)
Recall that two events A and B are said to be independent if P(A ∩ B) = P(A) P(B). In the same spirit, two random variables X and Y are independent if their joint probability distribution P(X ≤ x ∩ Y ≤ y) = P(X ≤ x) P(Y ≤ y) for all x, y. It can be shown mathematically that if X and Y are independent, then their covariance Cov(X, Y) = 0. Repeat the above calculations for the following ordered data sets X and Y, paying special attention to
if X and Y are independent? (This property is crucial in § 6.2.1.)
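The verification requested in Exercise 6-7 can be sketched in Python with sample statistics that use the same n − 1 denominator as R's var and cov. The data values below are hypothetical, made up only for illustration, and `sample_cov` is an illustrative helper name. The check for X − Y uses the standard companion identity Var(X − Y) = Var(X) + Var(Y) − 2 Cov(X, Y), and the check on means uses linearity of the mean.

```python
from math import isclose
from statistics import mean, variance

def sample_cov(xs, ys):
    """Sample covariance with the n - 1 denominator (as in R's cov)."""
    xbar, ybar = mean(xs), mean(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (len(xs) - 1)

# HYPOTHETICAL paired data, not the data sets from the exercise.
X = [1.0, 3.0, 4.0, 7.0, 10.0]
Y = [2.0, 1.0, 6.0, 5.0, 11.0]

S = [x + y for x, y in zip(X, Y)]   # the data set X + Y
D = [x - y for x, y in zip(X, Y)]   # the data set X - Y

cov_xy = sample_cov(X, Y)

# Means add and subtract term by term.
means_ok = (isclose(mean(S), mean(X) + mean(Y))
            and isclose(mean(D), mean(X) - mean(Y)))

# Variances pick up a +2 Cov or -2 Cov cross term.
var_sum_ok = isclose(variance(S), variance(X) + variance(Y) + 2 * cov_xy)
var_diff_ok = isclose(variance(D), variance(X) + variance(Y) - 2 * cov_xy)

print(means_ok, var_sum_ok, var_diff_ok)
```

When the sample covariance is close to zero, the cross term drops out and the variances of X + Y and X − Y are both close to Var(X) + Var(Y).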
6-8. The arrival time of my usual morning bus ( B ) is normally distributed, with a mean ETA at 8:00 AM, and a standard deviation of 4 minutes. My arrival time ( A ) at the bus stop is also normally distributed, with a mean ETA at 7:50 AM, and a standard deviation of 3 minutes.
(a) With what probability can I expect to catch the bus? (Hint: What is the distribution of the random variable X = A − B, and what must be true about X in the event that I catch the bus?)
(b) How much earlier should I arrive, if I expect to catch the bus with 99% probability?
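A minimal Python sketch of the approach suggested by the hint in Exercise 6-8 follows. It assumes that A and B are independent, which the problem does not state explicitly, so that the variance of X = A − B is the sum of the two variances. Times are measured in minutes relative to 8:00 AM, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Arrival times in minutes relative to 8:00 AM.
mean_A, sd_A = -10.0, 3.0   # my arrival: mean 7:50 AM, sd 3 min
mean_B, sd_B = 0.0, 4.0     # bus arrival: mean 8:00 AM, sd 4 min

# ASSUMPTION: A and B are independent, so Var(A - B) = Var(A) + Var(B).
sd_X = sqrt(sd_A ** 2 + sd_B ** 2)
X = NormalDist(mean_A - mean_B, sd_X)   # distribution of X = A - B

# (a) I catch the bus when I arrive first, i.e. when X < 0.
p_catch = X.cdf(0.0)

# (b) Find the mean arrival time m with P(A - B < 0) = 0.99, keeping the
# same standard deviations: (0 - (m - mean_B)) / sd_X = z_0.99.
z99 = NormalDist().inv_cdf(0.99)
required_mean_A = mean_B - z99 * sd_X
extra_minutes = mean_A - required_mean_A   # how much earlier to arrive

print(f"P(catch bus) = {p_catch:.4f}")
print(f"arrive about {extra_minutes:.2f} minutes earlier on average")
```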
6-9. In this problem, assume that population cholesterol level is normally distributed.
(a) Consider a small clinical trial, designed to measure the efficacy of a new cholesterol-lowering drug against a placebo. A group of six high-cholesterol patients is randomized to either a treatment arm or a control arm, resulting in two numerically balanced samples of
Placebo    Drug
  220      180
  240      200
  290      220
(b) Now imagine that the same drug is tested using another pilot study, with a different design. Serum cholesterol levels of n = 3 patients are measured at the beginning of the study, then re-measured after a six-month treatment period on the drug, in order to test the null hypothesis
Baseline    End of Study
  220           180
  240           200
  290           220
(c) Compare and contrast these two study designs and their results.
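The contrast between the two designs in Exercise 6-9 can be sketched in Python as below. Several things are assumptions rather than part of the problem text: a two-sided test of no mean difference at the 5% level, a pooled-variance (equal population variances) two-sample t-test for design (a), and critical values copied from a standard t table. Only the test statistics are computed, since the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

first = [220.0, 240.0, 290.0]    # placebo arm in (a), baseline in (b)
second = [180.0, 200.0, 220.0]   # drug arm in (a), end of study in (b)

# (a) Two independent samples. ASSUMPTION: equal population variances.
n1, n2 = len(first), len(second)
pooled_var = ((n1 - 1) * variance(first)
              + (n2 - 1) * variance(second)) / (n1 + n2 - 2)
t_two_sample = (mean(first) - mean(second)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
df_two_sample = n1 + n2 - 2

# (b) Paired samples: a one-sample t-test on within-patient differences.
diffs = [b - e for b, e in zip(first, second)]
t_paired = mean(diffs) / sqrt(variance(diffs) / len(diffs))
df_paired = len(diffs) - 1

# Two-sided 5% critical values from a standard t table.
T_CRIT = {4: 2.776, 2: 4.303}
reject_two_sample = abs(t_two_sample) > T_CRIT[df_two_sample]
reject_paired = abs(t_paired) > T_CRIT[df_paired]

print(f"two-sample: t = {t_two_sample:.3f}, df = {df_two_sample}, reject = {reject_two_sample}")
print(f"paired:     t = {t_paired:.3f}, df = {df_paired}, reject = {reject_paired}")
```

The same six numbers give a much larger statistic under the paired design because differencing removes the between-patient variability that inflates the standard error of the two-sample comparison.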
generally NOT equivalent to an informal side-by-side comparison of the individual
(a) Suppose that two population random variables X 1 and X 2 are normally distributed, each
samples are selected, each of size n = 100, and it is found that the corresponding means are $\bar{x}_1 = 215$ and $\bar{x}_2 = 200$, respectively. Show that even though the two individual 95%
rejected. (See middle figure below.)
simplicity), resulting in corresponding means $\bar{x}_1$ and $\bar{x}_2$, respectively. Let $CI_{\mu_1}$ and $CI_{\mu_2}$
be the respective 100
$d = \frac{\bar{x}_1 - \bar{x}_2}{z_{\alpha/2}\,\sigma/\sqrt{n}}$
the denominator is simply the margin of error for the confidence intervals.) Also let $CI_{\mu_1 - \mu_2}$
[Figures: three panels, each marking $\bar{x}_1$, $\bar{x}_2$, and $\bar{x}_1 - \bar{x}_2$ relative to 0.]
6-13. Z -tests and Chi-squared Tests
(a) Test of Independence. Imagine that a marketing research study surveys a random sample of n = 2000 consumers about their responses regarding two brands ( A and B ) of a certain product, with the following observed results.
                                Do You Like Brand B?
                                Yes       No      Totals
Do You Like Brand A?   Yes      335      915       1250
                       No       165      585        750
                       Totals   500     1500       2000
“The probability of liking A, given that B is liked, is equal to the probability of liking A, given that B is not liked.”
⇔ “There is no association between liking A and liking B .”
⇔ “Liking A and liking B are independent of each other.” (Why? See Exercise on page 3-14 of section 3.2.)
“The probability of liking B, given that A is liked, is equal to the probability of liking B, given that A is not liked.”
⇔ “There is no association between liking B and liking A.”
⇔ “Liking B and liking A are independent of each other.”
previous Z-score? Conclusion?
Compute the Chi-squared score. How does it compare with the preceding Z-scores? Conclusion?
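The comparison asked for in Exercise 6-13 can be sketched in Python using the observed counts from the table. This is a minimal illustration that assumes pooled two-sample Z-tests of proportions and a Chi-squared statistic without continuity correction; `two_prop_z` is an illustrative helper name.

```python
from math import sqrt

# Rows: likes Brand A (yes, no). Columns: likes Brand B (yes, no).
table = [[335, 915],
         [165, 585]]

row_tot = [sum(row) for row in table]         # totals for A: yes, no
col_tot = [sum(col) for col in zip(*table)]   # totals for B: yes, no
n = sum(row_tot)

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-sample Z statistic for H0: p1 = p2 (no continuity correction)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

# P(like A | like B) versus P(like A | do not like B)
z_a_given_b = two_prop_z(table[0][0], col_tot[0], table[0][1], col_tot[1])

# P(like B | like A) versus P(like B | do not like A)
z_b_given_a = two_prop_z(table[0][0], row_tot[0], table[1][0], row_tot[1])

# Chi-squared statistic with expected counts E_ij = R_i * C_j / n
chi_sq = sum(
    (table[i][j] - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
    for i in range(2) for j in range(2)
)

print(f"z (A given B) = {z_a_given_b:.3f}")
print(f"z (B given A) = {z_b_given_a:.3f}")
print(f"chi-squared   = {chi_sq:.3f}")
```

Under these assumptions the two Z-scores coincide and the Chi-squared statistic equals their square, so the three tests lead to the same conclusion.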
6-14. Consider the following 2 × 2 contingency table taken from a retrospective case-control study that investigates the proportion of diabetes sufferers among acute myocardial infarction (heart attack) victims in the Navajo population residing in the United States.
                        MI
                   Yes     No     Total
Diabetes   Yes      46     25       71
           No       98    119      217
           Total   144    144      288
significance level, what exactly has been demonstrated about the proportion of diabetics among the two categories of heart disease in this population?
(b) In the study design above, the 144 victims of myocardial infarction (cases) and the 144 individuals free of heart disease (controls) were actually age- and gender-matched. The members of each case-control pair were then asked whether they had ever been diagnosed with diabetes. Of the 46 individuals who had experienced MI and who were diabetic, it turned out that 9 were paired with diabetics and 37 with non-diabetics. Of the 98 individuals who had experienced MI but who were not diabetic, it turned out that 16 were paired with diabetics and 82 with non-diabetics. Therefore, each cell in the resulting 2 × 2 contingency table below corresponds to the combination of responses for age- and gender-matched case-control pairs, rather than individuals.
                                  MI
                        Diabetes   No Diabetes   Totals
No MI    Diabetes           9          16           25
         No Diabetes       37          82          119
         Totals            46          98          144
Conduct a McNemar Test for the null hypothesis H 0 : “The number of ‘diabetic, MI case’ - ‘non-diabetic, non-MI control’ pairs, is equal to the number of ‘non-diabetic, MI case’ - ‘diabetic, non-MI control’ pairs, who have been matched on age and gender,” or more succinctly, H 0 : “There is no association between diabetes and myocardial infarction in the Navajo population, adjusting for age and gender.” Determine whether or not we can reject the
significance level, what exactly has been demonstrated about the association between diabetes and myocardial infarction in this population?
(c) Why does the McNemar Test only consider discordant case-control pairs? Hint : What, if anything, would a concordant pair (i.e., either both individuals in a ‘MI case - No MI control’ pair are diabetic, or both are non-diabetic) reveal about a diabetes-MI association, and why?
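The McNemar statistic for the matched pairs in Exercise 6-14(b) can be sketched in Python as follows. The discordant counts come from the problem text. Whether to apply the continuity correction is a convention the exercise does not settle here, so both versions are shown, and the p-value helper relies on the fact that a Chi-squared variable with 1 degree of freedom is the square of a standard normal.

```python
from math import erfc, sqrt

# Discordant pairs, from the problem statement:
case_only = 37      # MI case diabetic, matched control non-diabetic
control_only = 16   # MI case non-diabetic, matched control diabetic

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return erfc(sqrt(x / 2))

discordant = case_only + control_only

# With continuity correction
chi_sq_corrected = (abs(case_only - control_only) - 1) ** 2 / discordant
p_corrected = chi2_sf_1df(chi_sq_corrected)

# Without continuity correction
chi_sq_plain = (case_only - control_only) ** 2 / discordant
p_plain = chi2_sf_1df(chi_sq_plain)

print(f"corrected:   chi-squared = {chi_sq_corrected:.3f}, p = {p_corrected:.4f}")
print(f"uncorrected: chi-squared = {chi_sq_plain:.3f}, p = {p_plain:.4f}")
```

Only the 53 discordant pairs enter the statistic. The 9 + 82 concordant pairs do not appear anywhere in the calculation.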
6-15. The following data are taken from a study that attempts to determine whether the use of electronic fetal monitoring (“exposure”) during labor affects the frequency of caesarian section deliveries (“disease”). Of the 5824 infants included in the study, 2850 were electronically monitored during labor and 2974 were not. Results are displayed in the 2 × 2 contingency table below.
                           Caesarian Delivery
                         Yes      No     Totals
EFM Exposure   Yes       358     2492     2850
               No        229     2745     2974
               Totals    587     5237     5824
(a) Calculate a point estimate $\widehat{OR}$ for the population odds ratio OR, and interpret.
(b) Compute a 95% confidence interval for the population odds ratio OR.
(c) Based on your answer in part (b), show that the null hypothesis H 0 : OR = 1 can be rejected in
conclusion: What exactly has been demonstrated about the association between electronic fetal monitoring and caesarian section delivery? Be precise.
(d) Does this imply that electronic monitoring somehow causes a caesarian delivery? Can the association possibly be explained any other way? If so, how?
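Parts (a) through (c) of Exercise 6-15 can be sketched in Python as below. The interval uses the common large-sample method on the log scale, with standard error sqrt(1/a + 1/b + 1/c + 1/d). That choice is an assumption: the course text may use a variant, such as adding 0.5 to each cell, which would change the numbers slightly.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Cell counts from the table: rows are EFM exposure, columns are caesarian delivery.
a, b = 358, 2492   # exposed:   caesarian yes, caesarian no
c, d = 229, 2745   # unexposed: caesarian yes, caesarian no

# (a) Point estimate of the odds ratio
or_hat = (a * d) / (b * c)

# (b) 95% confidence interval, built on the log-odds scale.
# ASSUMPTION: large-sample standard error with no 0.5 cell adjustment.
se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z_crit = NormalDist().inv_cdf(0.975)
ci = (exp(log(or_hat) - z_crit * se_log_or),
      exp(log(or_hat) + z_crit * se_log_or))

# (c) H0: OR = 1 is rejected at the 5% level when the interval excludes 1.
reject_h0 = not (ci[0] <= 1.0 <= ci[1])

print(f"OR estimate = {or_hat:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), reject H0: {reject_h0}")
```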
(d) To compute a 95% confidence interval for the summary odds ratio OR summary, we must first verify that the sample sizes in the two studies are large enough to ensure that the method used is valid.
Step 1: Verify that the expected number of observations of the (i, j)th cell in the first table, plus the expected number of observations of the corresponding (i, j)th cell in the second table, is greater than or equal to 5, for i = 1, 2 and j = 1, 2. Recall that the expected number of the (i, j)th cell is given by $E_{ij} = R_i C_j / n$.
Step 2: By its definition, the quantity L computed in part (b) is a weighted mean of log-odds ratios, and already represents a point estimate of ln( OR summary). The estimated standard error of L is given by
$\widehat{\text{s.e.}}(L) = \frac{1}{\sqrt{w_1 + w_2}}$
Step 3: From these two values in Step 2, construct a 95% confidence interval for ln( OR summary), and exponentiate it to derive a 95% confidence interval for OR summary itself.
(e) Also compute the value of the Chi-squared test statistic for OR summary given at the end of § 6.2.3.
Association of the null hypothesis H0: OR summary = 1, versus the alternative HA: OR summary ≠ 1,
demonstrated about the association between the number of term pregnancies and the odds of developing epithelial ovarian cancer? Be precise.
6-17. In a random sample of n = 1200 consumers who are surveyed about their ice cream flavor preferences, 416 indicate that they prefer vanilla, 419 prefer chocolate, and 365 prefer strawberry.
(a) Conduct a Chi-squared “Goodness-of-Fit” Test of the null hypothesis of equal proportions
Vanilla Chocolate Strawberry
416 419 365
(b) Suppose that the sample of n = 1200 consumers is equally divided between males and females, yielding the results shown below. Conduct a Chi-squared Test of the null
Vanilla Chocolate Strawberry Totals
Males 200 190 210 600
Females 216 229 155 600
Totals 416 419 365 1200
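Both parts of Exercise 6-17 can be sketched in Python with one small helper. Two things are assumed here: the null hypothesis in (a) is that each flavor has proportion 1/3, and the null hypothesis in (b) is that flavor preference is independent of sex. Both tests have 2 degrees of freedom, for which the upper-tail probability of the Chi-squared distribution has the closed form exp(−x/2), so no statistics library is needed. The function names are illustrative.

```python
from math import exp

def chi_squared(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df2(x):
    """Upper-tail probability of a chi-squared variable with 2 df."""
    return exp(-x / 2)

# (a) Goodness of fit. ASSUMPTION: H0 is equal proportions of 1/3 each.
flavors = [416, 419, 365]            # vanilla, chocolate, strawberry
n = sum(flavors)
gof_stat = chi_squared(flavors, [n / 3] * 3)
gof_p = p_value_df2(gof_stat)        # df = 3 - 1 = 2

# (b) Independence of sex and flavor. ASSUMPTION: this is the intended H0.
table = [[200, 190, 210],            # males
         [216, 229, 155]]            # females
row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]
observed = [table[i][j] for i in range(2) for j in range(3)]
expected = [row_tot[i] * col_tot[j] / n for i in range(2) for j in range(3)]
indep_stat = chi_squared(observed, expected)
indep_p = p_value_df2(indep_stat)    # df = (2 - 1)(3 - 1) = 2

print(f"(a) chi-squared = {gof_stat:.3f}, p = {gof_p:.4f}")
print(f"(b) chi-squared = {indep_stat:.3f}, p = {indep_p:.4f}")
```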
6-18. In the late 1980s, the pharmaceutical company Upjohn received approval from the Food and Drug Administration to market Rogaine™, a 2% minoxidil solution, for the treatment of androgenetic alopecia (male pattern hair loss). Upjohn’s advertising campaign for Rogaine included the results of a double-blind randomized clinical trial, conducted with 1431 patients in 27 centers across the United States. The results of this study at the end of four months are summarized in the 2 × 5 contingency table below, where the two row categories represent the treatment arm and control arm respectively, and each column represents a response category, the degree of hair growth reported. [Source: Ronald L. Iman, A Data-Based Approach to Statistics, Duxbury Press]
                                   Degree of Hair Growth
           No Growth   New Vellus   Minimal Growth   Moderate Growth   Dense Growth   Total
Rogaine       301         172            178               58               5          714
Placebo       423         150            114               29               1          717
Total         724         322            292               87               6         1431
and .) Infer whether or not we can reject the null
level, what exactly has been demonstrated about the efficacy of Rogaine versus placebo?
(b) Form a 2 × 2 contingency table by combining the last four columns into a single column
Rogaine versus placebo?
(c) Calculate the p-value using a two-sample Z-test of the null hypothesis in part (b), and show that the square of the corresponding z-score is equal to the Chi-squared test statistic found in