ST512, Fall Semester 2008 (Prof. Jason Osborne)
Quiz 1 - solution

1. For a sample of n = 682 NFL games, two measurements are made: the published point spread x and the margin of victory y for the favored team. A simple linear regression of y on x was fit with SAS. Code and selected output are given below. For all significance tests, use α = .05.

    proc reg data=spreads;
       model outcome=spread;
    run;

    The SAS System
    The REG Procedure

    Root MSE          13.26051    R-Square   0.0755
    Dependent Mean     6.09673    Adj R-Sq   0.0742
    Coeff Var        217.50217

                     Parameter   Standard
    Variable    DF    Estimate      Error   t Value   Pr > |t|
    Intercept    1     0.02755    0.96668      0.03     0.9773
    spread       1     1.14291    0.15446      7.40     <.0001

(a) Obtain a 95% confidence interval for the slope term (β1) from the regression model. At level α = .05, is it plausible that β1 = 0? How about β1 = 1?

β̂1 ± 1.96 ŜE(β̂1), or 1.14 ± 0.30, or (.84, 1.44). Since β1 = 0 is not in this interval, it is not a plausible value of the slope in the population of all games. In contrast, β1 = 1 is in the interval, so it is plausible that for every point that the published spread increases, the mean outcome also increases by a point.

(b) Is µ(x = 0) = 0 a plausible value for the mean outcome when x = 0? Conduct an appropriate test.

µ(x = 0) = β0, and inference for β0 can be based directly on the output. A t-statistic for H0 : β0 = 0 is t = .03 with a p-value of .9773, indicating that a zero intercept is entirely plausible. It would be interesting to see whether confidence bands for µ(x) contain the hypothetical function µ(x) = x.

(c) Let the average point spread and outcome be denoted by x̄ and ȳ, respectively. (Note that the latter is given in the output as ȳ = 6.10.) Consider games in which one team is favored by x = x̄ points.

i. Report an estimate of the mean margin of victory for the favored team in these (x = x̄) games, along with a standard error.
$$\hat{\mu}(x = \bar{x}) = \hat{\beta}_0 + \hat{\beta}_1 \bar{x} = \bar{y} = 6.1$$

$$\widehat{SE}(\hat{\beta}_0 + \hat{\beta}_1 \bar{x}) = \sqrt{MS(E)\left(\frac{1}{n} + 0^2\right)} = \frac{13.26}{\sqrt{682}} = .51$$

(The second term under the square root is zero because the estimate is made at x = x̄.)

ii. Report an estimate of the standard deviation of the margin of victory for the favored teams in these (x = x̄) games.

$$\sqrt{MS(E)} = 13.26$$

iii. Obtain a 95% prediction interval for the amount by which a team favored by this amount (x = x̄) will win in an upcoming game.

$$\hat{\beta}_0 + \hat{\beta}_1 \bar{x} \pm 1.96\sqrt{MS(E)\left(1 + \frac{1}{n}\right)} \quad\text{or}\quad 6.10 \pm 1.96\,(13.26)\sqrt{1 + \frac{1}{682}} \quad\text{or}\quad 6.1 \pm 26$$

that is, roughly (−19.9, 32.1).

(d) By writing (up, down, same), indicate how the answers to questions i) and ii) in part (c) would change if games where x = x̄ + 1 were considered:

i. mean: up; standard error: up
ii. same

(e) PROBLEM NOT ASKED: report the regression sum of squares for this analysis.

3. A dentistry experiment randomizes subjects to two groups (n = 20 each). One group receives a control toothpaste, the other an experimental fluoride toothpaste. Average daily plaque scores, y, are measured as the response, along with a covariate of brushing frequency called compliance, which is centered about its observed mean (1.984), leading to a mean-zero covariate, z = compliance − 1.984. SAS code and output pertinent to this problem are given after part (e). In testing for an effect of the fluoride treatment, consider the model in which mean plaque score depends linearly on compliance (and hence on z), and in which this dependence is constant across treatments.

(a) Report a statistic, p-value, and associated degrees of freedom for a test for a fluoride treatment effect on plaque score after controlling for compliance.

Directly from output, t = −2.35, p = .0240, on 37 error df (as an F-test, F = t² with df = 1, 37).

(b) Report the mean plaque score for each treatment, adjusted to the overall mean compliance of 1.984 (or z = 0).

    Group      Adjusted mean
    control    β̂0 = 7.78
    fluoride   β̂0 + β̂F = 7.78 − .74 = 7.04

(c) Estimate the difference between mean plaque score for the two treatments at a given compliance. Report a standard error.
Directly from output, β̂F = −.74 (SE = .315).

(d) Give the proportion of observed variation in plaque score explained by a simple linear regression on z, where the treatment (toothpaste type) is ignored.

$$r^2 = \frac{R(\beta_z \mid \beta_0)}{SS(Tot)} = \frac{28.7}{70.6} = .41$$

(e) Tough problem: given that the unadjusted mean plaque score was higher for the control group than for the fluoride group, recover the unadjusted means for each group using the output.

The ANCOVA model may be written

$$E(Y_i \mid F_i, z_i) = \beta_0 + \beta_F F_i + \beta_z z_i$$

where i indexes all 40 observations. The reduced one-factor model may be written

$$E(Y_{ij}) = \mu + \tau_i$$

where i indexes the fluoride treatment and j = 1, . . . , 20 indexes the subjects receiving a given treatment. The treatment sum of squares is the sum of squared differences of the unadjusted means from the grand mean,

$$SS(Trt) = \sum_{i=1}^{2} \sum_{j=1}^{20} (\bar{y}_{i+} - \bar{y}_{++})^2$$

and can therefore be used to recover $\bar{y}_{1+}$ and $\bar{y}_{2+}$. Using the output (the Type II sum of squares for z is $R(\beta_z \mid \beta_0, trt)$),

$$SS(Trt) + R(\beta_z \mid \beta_0, trt) = SS(\text{full model})$$

$$SS(Trt) = SS(\text{full model}) - R(\beta_z \mid \beta_0, trt) = 34.1 - 26.6 = 7.5$$

$$SS(Trt) = \sum_i \sum_j (\bar{y}_{i+} - \bar{y}_{++})^2 = 20 \sum_i (\bar{y}_{i+} - \bar{y}_{++})^2 = 20\left[(\bar{y}_{1+} - \bar{y}_{++})^2 + (\bar{y}_{2+} - \bar{y}_{++})^2\right] = 40(\bar{y}_{1+} - \bar{y}_{++})^2$$

The last step holds because the group sizes are equal, so the $\bar{y}_{i+}$ are equidistant from $\bar{y}_{++}$. Then

$$\frac{SS(Trt)}{40} = (\bar{y}_{1+} - \bar{y}_{++})^2, \qquad |\bar{y}_{1+} - \bar{y}_{++}| = \sqrt{\frac{SS(Trt)}{40}} = 0.433$$

so that the two treatment means are $\bar{y}_{++} \pm 0.433 = 7.409 \pm 0.433$. Since the control mean is the higher of the two, ȳc = 7.84 and ȳF = 6.976.

    proc reg data=teeth2;
       model plaque=z fluoride /ss1 ss2;   /* fluoride is an indicator for fluoride group */
    run;

    The SAS System
    The REG Procedure
    Model: MODEL1
    Dependent Variable: plaque

    Analysis of Variance
                                  Sum of        Mean
    Source             DF        Squares      Square   F Value   Pr > F
    Model               2       34.13180    17.06590     17.30   <.0001
    Error              37       36.50552     0.98664
    Corrected Total    39       70.63731

    Root MSE           0.99330    R-Square   0.4832
    Dependent Mean     7.40873    Adj R-Sq   0.4553
    Coeff Var         13.40709

                     Parameter   Standard
    Variable    DF    Estimate      Error   t Value   Pr > |t|    Type I SS   Type II SS
    Intercept    1     7.77946    0.22243     34.97     <.0001   2195.57355   1206.88547
    z            1    -0.90661    0.17450     -5.20     <.0001     28.66618     26.63263
    fluoride     1    -0.74145    0.31502     -2.35     0.0240      5.46562      5.46562
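
The hand calculations in problems 1 and 3 can be re-checked from the printed output alone. The following is a minimal Python sketch, not part of the original quiz (which uses SAS). It repeats the arithmetic with the normal quantile 1.96 used in the solution, and all function and variable names are illustrative assumptions.

```python
import math

Z = 1.96  # normal quantile used in the solution for 95% intervals


def slope_ci(estimate, std_error, z=Z):
    """Interval estimate +/- z * std_error, as in problem 1(a)."""
    return estimate - z * std_error, estimate + z * std_error


def mean_se_at_xbar(root_mse, n):
    """SE of the fitted mean at x = xbar, as in problem 1(c)i: sqrt(MSE / n)."""
    return root_mse / math.sqrt(n)


def prediction_halfwidth_at_xbar(root_mse, n, z=Z):
    """Half-width of the prediction interval at x = xbar, as in problem 1(c)iii."""
    return z * root_mse * math.sqrt(1 + 1 / n)


def unadjusted_means(grand_mean, ss_model, ss_z_type2, n_per_group):
    """Problem 3(e): recover the two group means for equal group sizes.

    SS(Trt) = SS(model) - (Type II SS for z), and with two equal groups
    SS(Trt) = 2 * n_per_group * (group mean - grand mean)^2.
    Returns (higher mean, lower mean); which group is higher must be
    supplied by the problem statement.
    """
    ss_trt = ss_model - ss_z_type2
    delta = math.sqrt(ss_trt / (2 * n_per_group))
    return grand_mean + delta, grand_mean - delta


if __name__ == "__main__":
    # Problem 1: values copied from the first PROC REG listing.
    lo, hi = slope_ci(1.14291, 0.15446)
    print(f"slope CI: ({lo:.2f}, {hi:.2f})")
    print(f"SE of mean at xbar: {mean_se_at_xbar(13.26051, 682):.3f}")
    print(f"prediction half-width: {prediction_halfwidth_at_xbar(13.26051, 682):.2f}")

    # Problem 3(e): values copied from the second PROC REG listing.
    control, fluoride = unadjusted_means(7.40873, 34.13180, 26.63263, 20)
    print(f"unadjusted means: control {control:.3f}, fluoride {fluoride:.3f}")
```

Carrying full precision gives a slope upper limit nearer 1.45 than the 1.44 obtained from the rounded 1.14 ± 0.30. The conclusions about β1 = 0 and β1 = 1 are unaffected.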