










Material Type: Exam; Professor: Davidian; Class: Applied Longitudinal Data Analysis; Subject: Statistics; University: North Carolina State University; Term: Spring 2007;











Please sign the following pledge certifying that the work on this test is your own:
“I have neither given nor received unauthorized aid on this test.”
Signature:
Printed Name:
There are FIVE questions, all with multiple parts. For each part of each question, please write your answers in the space provided. If you need more space, continue on the back of the page and indicate clearly where on the back you have continued your answer. Scratch paper will be provided.
You are allowed ONE (1) SHEET of HANDWRITTEN NOTES (FRONT ONLY). Calculators are NOT allowed (you will not need one). NOTHING should be on your desk but this test paper, your one page of notes, and scratch paper given to you.
Points for each part of each problem are given in the left margin. TOTAL POINTS = 100.
In all problems, all symbols and notation are defined exactly as they are in the class notes.
If you are asked to provide a NUMERICAL ESTIMATE or EXPRESSION, and the ONLY WAY you can see to do this involves some arithmetic (e.g., adding or multiplying several numbers), YOU DO NOT HAVE TO DO THE ARITHMETIC. JUST GIVE THE EXPRESSION. HOWEVER, if a numerical estimate IS AVAILABLE DIRECTLY from the output WITHOUT having to do arithmetic, I will only give partial credit if you do not recognize this.
If you believe it is IMPOSSIBLE to provide a numerical estimate or expression from what is provided, STATE THIS AND EXPLAIN.
At baseline (week 0), the weights of the subjects were recorded immediately before they started their assigned programs. Subjects then began following their assigned programs and were asked to return to the clinic at 1, 2, 4, 8, and 12 weeks thereafter to be weighed. A challenge with weight loss studies is that subjects may fail to show up for these visits or, worse, drop out of the study if they feel they have not lost sufficient weight. Thus, the investigators paid all subjects $20.00 at each clinic visit to encourage them to continue to appear at the scheduled times. This tactic helped to keep the percentage of missed visits relatively low, although some women did miss some visits after the initial baseline one. The data set contains 742 weight measurements across the 150 subjects. The raw data are plotted below.
[Figure: raw data, Weight (lbs, axis 170 to 230) versus Week (0 to 12), in three panels: North Shore Plan, Trim Fast Plan, Control.]
Also recorded for each subject was an indicator of the subject’s age (0 = 25 years old or less, 1 = older than 25).
Here is the model again for convenience:
$\delta_{1i}$ = 1 if subject $i$ was assigned to “North Shore” (program 1), = 0 otherwise
$\delta_{2i}$ = 1 if subject $i$ was assigned to “Trim-Quick” (program 2), = 0 otherwise
$\delta_{3i}$ = 1 if subject $i$ was assigned to “Control” (program 3), = 0 otherwise

$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij} \tag{1}$$
$$\beta_{0i} = \beta_{00} + b_{0i}, \qquad \beta_{1i} = \beta_{11}\delta_{1i} + \beta_{12}\delta_{2i} + \beta_{13}\delta_{3i} + b_{1i} \tag{2}$$
[5 points] (c) The primary study hypothesis was that the “typical” (mean) rate of weight change possibly differs in the population of overweight women depending on which program they follow. In terms of symbols in model (1), (2), give a set of null and alternative hypotheses $H_0$ and $H_1$ that address this issue. Give numerical values of a test statistic and associated p-value appropriate for testing $H_0$ vs. $H_1$ and, based on them, state your conclusion regarding the strength of the evidence against $H_0$.

The question is whether or not $\beta_{11} = \beta_{12} = \beta_{13}$; that is, $H_0$: $\beta_{11} = \beta_{12} = \beta_{13}$ versus $H_1$: at least one of these slopes differs from the others. This is addressed by Contrast ’B’. The test statistic is 13.06 or 6.53; both have associated p-values < 0.002, which is very small. At any reasonable level of significance we would conclude that there is sufficient evidence to suggest that the rate of change of weight is different in at least one of the groups from the others.
[5 points] (d) Give a numerical estimate of the difference of population mean weight at the end of the study (12 weeks) between the “North Shore” (program 1) and control (program 3) programs. (You needn’t provide a standard error.)

The population mean at 12 weeks for program 1 is $\beta_{00} + \beta_{11}(12)$, and that for program 3 is $\beta_{00} + \beta_{13}(12)$. Thus, the difference is $(\beta_{11} - \beta_{13})(12)$. There does not appear to be an estimate statement directly giving this value, but we can get a numerical expression from the Solution for Fixed Effects as

$$(-0.7384 + 0.09361)(12).$$
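The arithmetic in this expression can be checked with a minimal Python sketch. The variable names are illustrative only, and the control slope of -0.09361 is inferred from the "+ 0.09361" in the expression itself rather than read from output shown here.

```python
# Estimated weekly slopes in lbs/week. The names are hypothetical labels
# for the estimates of beta_11 and beta_13 implied by the expression above.
beta11_hat = -0.7384    # North Shore (program 1)
beta13_hat = -0.09361   # Control (program 3), inferred from "+ 0.09361"
weeks = 12

# The common intercept beta_00 cancels, leaving (beta_11 - beta_13) * 12.
diff_at_12_weeks = (beta11_hat - beta13_hat) * weeks
print(round(diff_at_12_weeks, 2))
```

Under these assumptions the estimated difference is roughly 7.7 lbs lower mean weight on North Shore than on control at week 12.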
[5 points] (e) An additional question that the investigators wanted to address was the long-standing issue of whether or not there is a difference in the rate at which women lose weight on average between the “North Shore” (program 1) and “Trim-Quick” (program 2) diet plans. Is there evidence to suggest such a difference? (You need not state formal hypotheses.) From the output, provide numerical evidence (values of a test statistic and p-value) justifying your answer.

The question focuses on the difference $\beta_{11} - \beta_{12}$. This is addressed directly by Contrast ’E’. The test statistic is 1.55 with a p-value of 0.21. There is not enough evidence from this study to suggest a difference in rate of weight loss between the two diet plans.
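As a rough check on the reported p-value, the sketch below assumes the statistic 1.55 is referred to a chi-square distribution with 1 degree of freedom (a single contrast). The function name is hypothetical, and only the Python standard library is used.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square(1) variable at `stat`."""
    # If Z ~ N(0, 1) then Z**2 ~ chi-square(1), so
    # P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

p_value = chi2_1df_pvalue(1.55)
print(round(p_value, 2))
```

The result agrees with the quoted 0.21 to two decimal places, under the stated distributional assumption.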
[5 points] (f) Women following the control (program 3) were supposed to follow their usual eating habits, under which they were presumably not losing weight, throughout the study. One issue in weight loss research is that women assigned to a control may become inspired through participation in the study to change their habits, begin following a diet plan on their own, and show a change in weight as a result. Is there evidence that this is the case here? (You need not state formal hypotheses.) From the output, provide numerical evidence (values of a test statistic and p-value) justifying your answer.

If women continued with their usual eating habits, we would expect them to exhibit no change in weight on average over the study; that is, we’d expect $\beta_{13} = 0$. We can read the Z test statistic and p-value directly off the Solution for Fixed Effects table as -0.74 and 0.46, respectively. There does not seem to be any evidence of this in the current study. Of course, this doesn’t mean that it didn’t happen; rather, it only means we do not have sufficient evidence to say that it did.
$\delta_{1i}$ = 1 if subject $i$ was assigned to “North Shore” (program 1), = 0 otherwise
$\delta_{2i}$ = 1 if subject $i$ was assigned to “Trim-Quick” (program 2), = 0 otherwise
$a_i$ = 0 if subject $i$ is 25 years old or less at baseline (≤ 25), = 1 if subject $i$ is older than 25 at baseline (> 25)

$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij} \tag{3}$$
$$\beta_{0i} = \beta_{00} + \beta_{01}a_i + b_{0i}, \qquad \beta_{1i} = \beta_{11}\delta_{1i} + \beta_{11a}\delta_{1i}a_i + \beta_{12}\delta_{2i} + \beta_{12a}\delta_{2i}a_i + b_{1i} \tag{4}$$
On page 9 you will find SAS code and selected portions of its output. Use this to answer the following.
[5 points] (a) The first question was whether or not these data support the contention that the mean weight in the population of women who are not currently following a diet plan is associated with age group (≤ 25 or > 25). From the output, cite the numerical values of a test statistic and associated p-value appropriate for addressing this issue (you need not state formal hypotheses).

This question has to do with the state of affairs when women are not following a diet plan, that is, prior to starting their assigned plans. The mean weight of women of age $a_i$ at time 0 is $\beta_{00} + \beta_{01}a_i$, and we are interested in whether or not $\beta_{01} = 0$. A Wald statistic and p-value can be read off the Solution for Fixed Effects table and are equal to -10.18 and < 0.0001, respectively. There is strong evidence that mean weight is different depending on age group among women not following a diet plan. (The negative sign on the estimate and statistic suggests that older women weigh less on average.)

[5 points] (b) Give a numerical estimate of the difference in “typical” (mean) rate of weight change for women who are > 25 between the “North Shore” and “Trim-Quick” diets. Also give a standard error.

The typical mean rate of weight change for “North Shore” (program 1) women > 25 is $\beta_{11} + \beta_{11a}$, and that for “Trim-Quick” (program 2) is $\beta_{12} + \beta_{12a}$. Thus, we are interested in the difference $\beta_{11} - \beta_{12} + \beta_{11a} - \beta_{12a}$. This is addressed directly by Estimate ’C’. The estimate and standard error are -0.2767 and 0.2038, respectively.
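The Estimate ’C’ value can also be reproduced by hand from the fixed-effect estimates. The sketch below assumes the coefficients are those printed in the Model III Solution for Fixed Effects later in this document; the variable names are illustrative.

```python
# Fixed-effect estimates, assumed to be the Model III values
# from the Solution for Fixed Effects table.
beta11 = -0.7853    # week*program 1
beta12 = -0.9467    # week*program 2
beta11a = 0.06661   # age*week*program 1
beta12a = 0.5047    # age*week*program 2

# Mean slope for women > 25: North Shore minus Trim-Quick.
diff_older = (beta11 + beta11a) - (beta12 + beta12a)
print(round(diff_older, 4))
```

The hand calculation matches the reported -0.2767 up to rounding. The standard error cannot be recovered this way, because it depends on the covariances among the four estimates.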
[5 points] (c) Another question was whether or not, among women who follow the “North Shore” diet (program 1), the “typical” (mean) number of pounds lost per week is different between those who are ≤ 25 years old and those who are > 25. From the output, cite the numerical values of a test statistic and associated p-value appropriate for addressing this issue (you need not state formal hypotheses).

The question is whether or not $\beta_{11a} = 0$. From the Solution for Fixed Effects, the Wald test statistic is 0.24, with a p-value of 0.81. There is no evidence to suggest that the “typical” rate of weight loss (pounds per week) for women following “North Shore” is different between the two age groups.
[5 points] (a) Explain in plain English without using any symbols what Model IV assumes about variation in woman-specific “inherent” rates of weight change.

Model III assumes that “inherent” rates of weight change for women who follow the same diet and are in the same age group vary nonnegligibly about the “typical” mean rate of weight change for their diet-age group; the random effect associated with week captures this variation. That is, only part of the variation in woman-specific “inherent” rates of weight change is due to a systematic relationship with diet plan and age group; the rest is due to “inherent” “biological variation.” On the other hand, in Model IV, these rates of change have no associated random effect to allow for “inherent” “biological variation.” Thus, this model assumes that all variation in woman-specific “inherent” rates of weight change is due to a systematic relationship with diet plan and age group. The implication is that women sharing the same diet plan and age group will all have the same “inherent” rate of weight change.
[5 points] (b) Based on a model that assumes “inherent” rates of weight change among women who are < 25 years old and assigned to “North Shore” are not identical, give a numerical expression characterizing the “inherent” rate of weight change for Subject 2 (this subject is > 25 years old and was assigned to “North Shore”).

We want to use Model III here. The quantity we wish to “estimate” (predict) is $\beta_{11} + \beta_{11a} + b_{1i}$ for $i = 2$. From the output, we get $-0.7853 + 0.06661 + 0.3491$.
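Although the exam does not require the arithmetic, the predicted rate for Subject 2 can be evaluated with a short Python sketch. The variable names are illustrative; the three numbers are the ones quoted in the answer above.

```python
# Pieces of the subject-specific slope for Subject 2 under Model III.
beta11_hat = -0.7853    # mean slope, North Shore, age <= 25
beta11a_hat = 0.06661   # adjustment for age > 25 on North Shore
b1_subject2 = 0.3491    # predicted random slope (the "week" row for id 2)

# Predicted "inherent" rate of weight change for Subject 2 (lbs/week).
rate_subject2 = beta11_hat + beta11a_hat + b1_subject2
print(round(rate_subject2, 4))
```

The sum is negative but closer to zero than the group mean slope, so this subject is predicted to lose weight more slowly than the typical North Shore woman in her age group.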
[5 points] (c) Which of Models III and IV do you prefer? Cite numerical evidence supporting your choice.

This is the problem of comparing two models with different numbers of random effects, so that the null hypothesis that the models are the same is on the boundary of the parameter space. We can compare the two models informally on the basis of AIC and BIC; from the output, Model III yields AIC = 2881.8 and BIC = 2907.9, while Model IV yields AIC = 3129.8 and BIC = 3150.7. Both measures are considerably smaller for Model III, suggesting that it may be preferred. If we wanted to do a likelihood ratio test, we would calculate the test statistic as $3113.8 - 2861.8 = 252$ and compare it to the critical value for a mixture of $\chi^2_1$ and $\chi^2_2$ distributions. The test statistic is so huge that it surely exceeds this critical value. Thus, the evidence available seems to support Model III.
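The likelihood ratio comparison can be sketched numerically. The code below assumes the usual equal-weight (50:50) mixture of $\chi^2_1$ and $\chi^2_2$ for adding one random effect; the answer above does not state the weights, so that is an assumption here, and the function name is hypothetical.

```python
import math

def mixture_pvalue(stat):
    """Upper-tail probability under an assumed 50:50 mixture of
    chi-square(1) and chi-square(2) distributions."""
    p_chi2_1df = math.erfc(math.sqrt(stat / 2.0))  # upper tail, 1 df
    p_chi2_2df = math.exp(-stat / 2.0)             # upper tail, 2 df
    return 0.5 * p_chi2_1df + 0.5 * p_chi2_2df

# Difference in -2 log likelihood: Model IV minus Model III.
lrt_stat = 3113.8 - 2861.8
p_value = mixture_pvalue(lrt_stat)
print(lrt_stat, p_value)
```

Under this assumption the 5% critical value of the mixture is about 5.14, and a statistic of 252 exceeds it by a very wide margin, which is consistent with preferring Model III.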
title "Model III";
proc mixed data=weight2 method=ml;
  class id program;
  model weight = age week*program age*week*program / solution;
  random int week / type=un subject=id solution;
run;

Model III
The Mixed Procedure

Covariance Parameter Estimates
Cov Parm    Subject    Estimate
UN(1,1)     id         11.
UN(2,1)     id         0.
UN(2,2)     id         0.
Residual               8.

Fit Statistics
-2 Log Likelihood            2861.
AIC (smaller is better)      2881.
BIC (smaller is better)      2907.

Solution for Fixed Effects
Effect              program   Estimate   Standard Error   DF    t Value   Pr > |t|
Intercept                     209.61     0.8479           98    247.21    <.
age                           -9.6927    0.9525           292   -10.18    <.
week*program        1         -0.7853    0.2371           292   -3.31     0.
week*program        2         -0.9467    0.3324           292   -2.85     0.
age*week*program    1         0.06661    0.2821           292   0.24      0.
age*week*program    2         0.5047     0.3587           292   1.41      0.

Solution for Random Effects
Effect      id   Estimate   Std Err Pred   DF    t Value   Pr > |t|
Intercept   1    0.6094     1.6262         292   0.37      0.
week        1    0.4939     0.3262         292   1.51      0.
Intercept   2    0.4987     1.6476         292   0.30      0.
week        2    0.3491     0.2860         292   1.22      0.
Intercept   3    -4.7972    1.5643         292   -3.07     0.
week        3    -0.5238    0.3890         292   -1.35     0.

title "Model IV";
proc mixed data=weight2 method=ml;
  class id program;
  model weight = age week*program age*week*program / solution;
  random int / type=un subject=id solution;
run;

Model IV
The Mixed Procedure

Covariance Parameter Estimates
Cov Parm    Subject    Estimate
UN(1,1)     id         24.
Residual               22.

Fit Statistics
-2 Log Likelihood            3113.
AIC (smaller is better)      3129.
BIC (smaller is better)      3150.

Solution for Fixed Effects
Effect              program   Estimate   Standard Error   DF    t Value   Pr > |t|
Intercept                     209.60     1.2830           98    163.36    <.
age                           -9.6228    1.4408           388   -6.68     <.
week*program        1         -0.8572    0.1343           388   -6.38     <.
week*program        2         -0.8710    0.1777           388   -4.90     <.
age*week*program    1         0.1201     0.1612           388   0.74      0.
age*week*program    2         0.3971     0.1927           388   2.06      0.

Solution for Random Effects
Effect      id   Estimate   Std Err Pred   DF    t Value   Pr > |t|
Intercept   1    2.7643     2.0953         388   1.32      0.
Intercept   2    1.9481     2.0384         388   0.96      0.
Intercept   3    -5.9747    2.0270         388   -2.95     0.
For convenience, here is the model again:
$Y_j$ = number of acne lesions on the face for teenager $j$
$g_j$ = 0 if $j$ is a girl (female), = 1 if $j$ is a boy (male)
$t_j$ = 0 if $j$ is using a topical medication, = 1 if $j$ is using an oral medication
$s_j$ = 0 if $j$ has mild to moderate acne, = 1 if $j$ has severe acne

$$E(Y_j) = \exp(\beta_0 + \beta_1 g_j + \beta_2 t_j + \beta_3 s_j) \tag{5}$$
[5 points] (c) From the output, provide a numerical estimate of and confidence interval for the factor by which the mean number of acne lesions for teenagers with mild to moderate acne should be multiplied to obtain the mean number of lesions for teenagers with severe acne. Based on these values, is there evidence that teenagers with severe acne have more lesions on average than teenagers with mild to moderate acne?

This factor is $\exp(\beta_3)$. This quantity is estimated in Estimate ’D’ to be 2.05, with a confidence interval of 1.50 to 2.81. The confidence interval does not contain 1.0, and its lower bound is greater than 1. This suggests that the mean number of lesions for teenagers with severe acne is about twice that for those with mild to moderate acne, so that teenagers with severe acne have more lesions on average.
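The multiplicative factor and its interval are the exponentials of the log-scale estimate and confidence limits for severity. A minimal Python sketch of that relationship, using the severity values from the GENMOD parameter table (variable names are illustrative):

```python
import math

# Log-scale estimate and Wald 95% limits for the severity coefficient.
log_est, log_lower, log_upper = 0.7201, 0.4059, 1.0343

# Exponentiating gives the multiplicative effect on the mean lesion count.
factor = math.exp(log_est)
ci = (math.exp(log_lower), math.exp(log_upper))
print(round(factor, 2), tuple(round(x, 2) for x in ci))
```

Because the interval is obtained by transforming the endpoints, it is not symmetric about the point estimate on the multiplicative scale.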
data acne;
  infile "acne.dat";
  input id gender medicate severity lesions;
run;

proc genmod data=acne;
  model lesions = gender medicate severity / dist=poisson link=log;
  contrast 'A' gender 1, medicate 1, severity 1 / wald;
  contrast 'B' int 1, gender 1, medicate 1, severity 1 / wald;
  contrast 'C' gender 1 medicate 1 severity 1 / wald;
  contrast 'D' gender 1 medicate -1, gender 1 severity -1 / wald;
  estimate 'A' gender 1 medicate 1 severity 0 / exp;
  estimate 'B' int 1 gender 1 medicate 1 severity 0 / exp;
  estimate 'C' int 1 severity 1 / exp;
  estimate 'D' int 0 severity 1 / exp;
run;

Obs id gender medicate severity lesions
(^12 12 00 01 00 ) (^34 34 10 00 00 )
5 5 0 0 0 3
Algorithm converged.

The GENMOD Procedure
Analysis Of Parameter Estimates
Parameter   DF   Estimate   Standard Error   Wald 95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept   1    1.0486     0.1315           0.7908     1.3063            63.57        <.
gender      1    0.5805     0.1389           0.3082     0.8527            17.46        <.
medicate    1    0.1546     0.1331           -0.1062    0.4154            1.35         0.2452
severity    1    0.7201     0.1603           0.4059     1.0343            20.18        <.
Scale       0    1.0000     0.0000           1.0000     1.
NOTE: The scale parameter was held fixed.

Contrast Estimate Results
Label     Estimate   Standard Error   Alpha   Confidence Limits    Chi-Square   Pr > ChiSq
A         0.7351     0.1901           0.05    0.3625    1.1076     14.96        0.
Exp(A)    2.0857     0.3965           0.05    1.4370    3.0272
B         1.7837     0.1169           0.05    1.5546    2.0127     232.91       <.
Exp(B)    5.9516     0.6956           0.05    4.7332    7.4837
C         1.7687     0.1567           0.05    1.4615    2.0759     127.33       <.
Exp(C)    5.8630     0.9190           0.05    4.3122    7.9714
D         0.7201     0.1603           0.05    0.4059    1.0343     20.18        <.
Exp(D)    2.0546     0.3294           0.05    1.5007    2.8130

Contrast Results
Contrast   DF   Chi-Square   Pr > ChiSq   Type
A          3    31.75        <.0001       Wald
B          4    682.92       <.0001       Wald
C          1    29.01        <.0001       Wald
D          2    7.98         0.0185       Wald
For convenience, here is the model again:
$\delta_{1i}$ = 1 if subject $i$ was in Group 1 (low dose), = 0 otherwise
$\delta_{2i}$ = 1 if subject $i$ was in Group 2 (high dose), = 0 otherwise
$\delta_{3i}$ = 1 if subject $i$ was in Group 3 (placebo), = 0 otherwise

$$E(Y_{ij}) = \frac{\exp(\beta_0 + \beta_1 t_{ij}\delta_{1i} + \beta_2 t_{ij}\delta_{2i} + \beta_3 t_{ij}\delta_{3i})}{1 + \exp(\beta_0 + \beta_1 t_{ij}\delta_{1i} + \beta_2 t_{ij}\delta_{2i} + \beta_3 t_{ij}\delta_{3i})} \tag{6}$$
On page 18, you will find some SAS code and selected portions of its output. Use this to answer the following.
[5 points] (b) The first question the investigators wished to address was whether or not the pattern of change of the log odds of having at least one panic attack once treatment is initiated is possibly different for at least one of three groups. From the output, cite the numerical values of a test statistic and associated p-value appropriate for addressing this issue (you need not state formal hypotheses).

The question is whether or not $\beta_1 = \beta_2 = \beta_3$. This is addressed by Contrast ’A’. The test statistic is 14.59 with a p-value of 0.0007. There is evidence that the pattern is different for at least one of the groups.

[5 points] (c) From the output, provide a numerical estimate of the amount by which the log odds of having at least one panic attack changes per week if high-dose antidepressant therapy is administered. Also provide the numerical value (to 4 decimal places) of an associated standard error that is valid if the assumption the investigators have made on correlations among the binary indicators of panic attacks is correct.

The log odds for the high-dose group (group 2) changes by $\beta_2$ in 1 week. Thus, the estimate is -0.2172. Assuming that the working compound symmetric correlation is correct, from the Model-Based Standard Error Estimates table, the standard error is 0.0460.
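For part (b), the reported p-value can be checked roughly. The hypothesis $\beta_1 = \beta_2 = \beta_3$ imposes two constraints, so this sketch assumes the statistic 14.59 is referred to a chi-square distribution with 2 degrees of freedom, whose upper tail has a closed form. The function name is hypothetical.

```python
import math

def chi2_2df_pvalue(stat):
    """Upper-tail probability of a chi-square(2) variable at `stat`."""
    # Chi-square(2) is an exponential distribution with mean 2,
    # so P(X > x) = exp(-x / 2).
    return math.exp(-stat / 2.0)

p_value = chi2_2df_pvalue(14.59)
print(round(p_value, 4))
```

Under that assumption the result agrees with the quoted 0.0007.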
[5 points] (d) From the output, provide a numerical estimate for the odds ratio comparing the odds of having at least one panic attack at week 4 under low-dose therapy to the odds of having a panic attack at week 4 under high-dose therapy.

The odds of at least one panic attack at week 4 for the low-dose group is $e^{\beta_0 + \beta_1(4)}$ and for the high-dose group is $e^{\beta_0 + \beta_2(4)}$. The odds ratio is the quotient

$$\frac{e^{\beta_0 + \beta_1(4)}}{e^{\beta_0 + \beta_2(4)}} = \frac{e^{\beta_1(4)}}{e^{\beta_2(4)}} = e^{(\beta_1 - \beta_2)(4)}.$$

An estimate of this quantity is given directly by Estimate ’C’ as 1.5304. (Some of you got it indirectly from Estimate ’A’ and Estimate ’B’ as 0.6420/0.4195.)
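The indirect route can be verified numerically. The sketch below takes Estimate ’A’ and Estimate ’B’ to be the exponentiated four-week changes $e^{4\beta_1}$ and $e^{4\beta_2}$, which is an assumption consistent with the quotient quoted above, and uses the high-dose slope estimate -0.2172 from part (c) as a cross-check. Variable names are illustrative.

```python
import math

# Exponentiated estimates quoted in the answer (assumed interpretation).
exp_4beta1 = 0.6420   # Estimate 'A': low dose, exp(4 * beta_1)
exp_4beta2 = 0.4195   # Estimate 'B': high dose, exp(4 * beta_2)

# Odds ratio at week 4, low dose versus high dose.
odds_ratio = exp_4beta1 / exp_4beta2

# Cross-check Estimate 'B' against the reported high-dose slope.
beta2_hat = -0.2172
exp_4beta2_check = math.exp(4 * beta2_hat)
print(round(odds_ratio, 4), round(exp_4beta2_check, 4))
```

The quotient agrees with Estimate ’C’ up to rounding of the two inputs, and the intercept $\beta_0$ plays no role because it cancels.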
[5 points] (e) Let $g_i = 0$ if subject $i$ is female and $g_i = 1$ if subject $i$ is male. The investigators wished to modify model (6) to allow the following: (i) The probability of having a panic attack before starting therapy of any kind may depend on gender. (ii) The pattern of change after week 0 for each group is of the same form as that in model (6) but may be different for males and females within each group. Write down a modified model for $E(Y_{ij})$ that incorporates these features. Provide class and model statements that will fit your model in proc genmod, and give a contrast statement that addresses the issue of whether or not the pattern of change after week 0 differs between males and females for at least one of the three groups.

The modified model is

$$E(Y_{ij}) = \frac{\exp(\eta_{ij})}{1 + \exp(\eta_{ij})},$$

where

$$\eta_{ij} = \beta_0 + \beta_{0g}g_i + \beta_1 t_{ij}\delta_{1i} + \beta_{1g}t_{ij}\delta_{1i}g_i + \beta_2 t_{ij}\delta_{2i} + \beta_{2g}t_{ij}\delta_{2i}g_i + \beta_3 t_{ij}\delta_{3i} + \beta_{3g}t_{ij}\delta_{3i}g_i.$$

The issue in question is then whether or not at least one of $\beta_{kg}$, $k = 1, 2, 3$, is different from 0.
proc genmod descending;
  class id group;
  model attack = gender week*group week*group*gender / dist=binomial link=logit;
  repeated ...
  contrast 'gender' week*group*gender 1 0 0,
                    week*group*gender 0 1 0,
                    week*group*gender 0 0 1 / wald;