













Dr. M. Dettling Summer 2011
Approved: Any written material, calculator (without communication facility).
Tables: Attached.
Note: All tests have to be done at the 5% level. If the question concerns the significance of a factor (or similar) and nothing else is indicated, you don't need to give the null and alternative hypotheses, the test statistic or the critical values.
Exercise 1 is a multiple-choice exercise. In each sub-exercise, exactly one answer is correct. A correct answer adds 1 plus-point and a wrong answer 1/2 minus-point. You get a minimum of 0 points for the whole multiple-choice exercise. Tick the correct answers to the multiple-choice exercises on the separately added answer sheet.
Do not spend too long on a part where you experience a lot of difficulties.
A multiple regression model of the following form is fitted to a data set.
$$Y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_3 x_{i,3} + \beta_4 x_{i,4} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \ \text{i.i.d.}$$
The model is fitted using the software R and the following summary output is obtained.
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ???     0.1960   8.438   3.57e-
x1            5.3036     2.5316     ???       0.
x2            4.0336     2.4796   1.627       0.
x3           -9.3153     2.4657  -3.778       0.
x4            0.5884     2.2852   0.257       0.

Residual standard error: 1.892 on 95 degrees of freedom
Multiple R-squared: 0.1948, Adjusted R-squared: ???
F-statistic: 5.745 on 4 and 95 DF, p-value: 0.
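The entries marked ??? in the output can be recovered from the other numbers in the table. The following Python sketch illustrates one way to do this; it uses only the standard relations t value = estimate / standard error and adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The variable names are chosen for illustration and are not part of the exam.

```python
# Values read directly from the summary output above.
se_intercept, t_intercept = 0.1960, 8.438
est_x1, se_x1 = 5.3036, 2.5316
r_squared = 0.1948
df_resid = 95   # residual degrees of freedom
n_coef = 5      # intercept plus four slopes

# t value = estimate / std. error, hence estimate = t value * std. error.
est_intercept = t_intercept * se_intercept
t_x1 = est_x1 / se_x1

# Residual df = n - (number of coefficients), hence n = 95 + 5.
n_obs = df_resid + n_coef

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
# where n - p - 1 equals the residual degrees of freedom.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

print(f"Intercept estimate: {est_intercept:.4f}")
print(f"t value for x1:     {t_x1:.3f}")
print(f"Observations:       {n_obs}")
print(f"Adjusted R-squared: {adj_r_squared:.4f}")
```

Because the printed output is rounded, the recovered values are accurate only to about three significant digits (roughly 1.65 for the intercept, 2.09 for the t value and 0.16 for the adjusted R²).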
Consider the following scatterplot:
[Scatterplot of y against x; the two plotting symbols mark the groups z = 0 and z = 1.]
The different symbols in the plot correspond to the values of two different groups. The response variable y and the covariable x are continuous, the indicator variable z ∈ { 0 , 1 } encodes the respective group membership.
a) The covariables x and z are interacting. Explain!
b) Are x and z correlated? Explain!
c) What model would you fit to these data? Write down a model equation.

A linear model has been fit to the above data. The R output is given as follows:
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.31215    0.08485  15.464   < 2e-
x            1.09296    0.14606   7.483  9.37e-
z           -1.25344    0.21241  -5.901  8.84e-
x:z         -0.35241    0.20656  -1.706       0.

Residual standard error: 0.2766 on 78 degrees of freedom
Multiple R-squared: 0.7755, Adjusted R-squared: 0.
F-statistic: 89.82 on 3 and 78 DF, p-value: < 2.2e-
d) What are the estimated regression lines for the two groups?
e) Is it statistically necessary to fit two regression lines with different slopes? Motivate your answer.
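For question d), the two group-specific lines follow from plugging z = 0 and z = 1 into the fitted model with interaction. A minimal Python sketch, assuming the model has the form y = β0 + β1·x + β2·z + β3·x·z as suggested by the coefficient names in the output (the helper `group_line` is hypothetical):

```python
# Coefficient estimates copied from the interaction model output.
coef = {"(Intercept)": 1.31215, "x": 1.09296, "z": -1.25344, "x:z": -0.35241}


def group_line(z):
    """Return (intercept, slope) of the fitted regression line for group z."""
    intercept = coef["(Intercept)"] + coef["z"] * z
    slope = coef["x"] + coef["x:z"] * z
    return intercept, slope


for z in (0, 1):
    a, b = group_line(z)
    print(f"z = {z}: y = {a:.5f} + {b:.5f} * x")
```

For z = 0 the line is given by the intercept and the x coefficient alone; for z = 1 the z coefficient shifts the intercept and the x:z coefficient shifts the slope. Whether the slope difference is needed (question e) depends on the test for the x:z coefficient, which this sketch does not carry out.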
We repeat the regression analysis but without interaction of x and z.
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.40026    0.06814  20.550   < 2e-
x            0.91675    0.10452   8.771  2.72e-
z           -1.57061    0.10400 -15.103   < 2e-

Residual standard error: 0.28 on 79 degrees of freedom
Multiple R-squared: 0.7671, Adjusted R-squared: 0.
F-statistic: 130.1 on 2 and 79 DF, p-value: < 2.2e-
f) Which quantities in the R output can be used to compare the two models?
The Swiss military carried out a study in order to analyze which soldiers are fit enough to join the special force team AAD10. The binary response variable y reflects the state of fitness of a soldier: y = 1 means that the soldier is fit enough for the special force team AAD10, whereas y = 0 indicates that the soldier is not fit enough. The following predictor variables were used for the analysis:
a) Write down the logistic regression model for this case.
b) Look at the following R output. Formally, which predictors have a significant influence on the response?

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -15.5543     7.2946  -2.132       0.
X1           -0.5859     0.3569     ???      ???
X2            0.5643     0.3317     ???      ???
X3            1.9639     0.8800     ???      ???
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 27.526 on ??? degrees of freedom
Residual deviance: 14.177 on 16 degrees of freedom
AIC: ???
Number of Fisher Scoring iterations: 6

c) How many observations were used in this logistic regression?
d) By which factor do the odds for y = 1 change if x_2 is increased by 1 and the other predictors remain the same?
e) Estimate the probability for y = 1 with x_1 = 3, x_2 = 25 and x_3 = 2. What would be your prediction for y in this case?
f) We have x_1 = 5 and x_2 = 25. Which value do we have to choose for x_3 in order to get a probability of 50% for y = 1?
g) Now we calculate the logistic regression without the predictor variable x_1.

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.60061    4.43762  -1.713       0.
X2           0.08727    0.14484   0.603      ???
X3           1.53255    0.68010   2.253      ???
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 27.526 on ??? degrees of freedom
Residual deviance: 18.158 on 17 degrees of freedom
AIC: ???
Which of the two models from above would you prefer with respect to AIC? Motivate your answer.
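The numerical parts of this exercise can be checked with a few lines of Python. The sketch below works only from the two printed outputs and rests on two assumptions that the exam does not state explicitly: the response is ungrouped binary data, so that AIC = residual deviance + 2 · (number of coefficients), and a predicted probability above 0.5 is classified as y = 1. All variable names are illustrative.

```python
import math

# Full model: coefficient estimates and standard errors from the first output.
b0, b1, b2, b3 = -15.5543, -0.5859, 0.5643, 1.9639
se = {"X1": 0.3569, "X2": 0.3317, "X3": 0.8800}

# Missing z values: estimate / std. error.
z_values = {"X1": b1 / se["X1"], "X2": b2 / se["X2"], "X3": b3 / se["X3"]}

# c) Residual df = n - number of coefficients (4 in the full model).
n_obs = 16 + 4
null_df = n_obs - 1

# d) Increasing x_2 by one unit multiplies the odds by exp(beta_2).
odds_factor_x2 = math.exp(b2)

# e) Linear predictor and fitted probability at x_1 = 3, x_2 = 25, x_3 = 2.
eta = b0 + b1 * 3 + b2 * 25 + b3 * 2
prob = 1 / (1 + math.exp(-eta))
prediction = 1 if prob > 0.5 else 0  # assumed 0.5 cut-off

# f) A probability of 50% means eta = 0; solve for x_3 at x_1 = 5, x_2 = 25.
x3_for_half = -(b0 + b1 * 5 + b2 * 25) / b3

# g) AIC under the ungrouped-binary assumption: deviance + 2 * #coefficients.
aic_full = 14.177 + 2 * 4
aic_reduced = 18.158 + 2 * 3

print({k: round(v, 3) for k, v in z_values.items()})
print(n_obs, null_df)
print(round(odds_factor_x2, 3), round(prob, 3), prediction)
print(round(x3_for_half, 3))
print(round(aic_full, 3), round(aic_reduced, 3))
```

Under these assumptions the model with the smaller AIC is preferred. The significance statements in b) additionally require comparing each |z value| with the normal quantile 1.96.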
age80+       2.82207    0.11372  24.816   < 2e-16 ***
smokeyes     0.41044    0.04096  10.021   < 2e-16 ***
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1789.071 on 17 degrees of freedom
Residual deviance: 12.661 on 8 degrees of freedom
AIC: 153.
Number of Fisher Scoring iterations: 4
d) Compute the fitted value of the first observation.
e) The effect of smoke is significant. According to the fitted model, how much more likely is it that a randomly chosen smoking person dies from lung cancer in comparison to a randomly chosen non-smoking person, given that both belong to the same age group?
f) If there are 436 persons in the age group 75–79 with smoke status "yes", how many of them do you expect to die from lung cancer according to the model?
g) Consider the interval [191.01, 240.16]. Is it plausible that this interval is a 95% prediction interval for the number in f)? Explain. (Hint: A Poisson distribution with parameter λ > 100 is well approximated by a normal distribution.)
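Questions e) and g) can be approached with the numbers visible here; d) and f) need the intercept and the remaining age coefficients, which are not reproduced in this excerpt. The Python sketch below is one possible line of reasoning, not an official solution. For g) it assumes that the expected count λ lies somewhere inside the given interval and uses the normal approximation from the hint, under which a 95% prediction interval has a width of at least 2 · 1.96 · √λ (estimation uncertainty in λ would only widen it).

```python
import math

# e) With a log link, the smoker / non-smoker rate ratio within an age group
#    is exp(coefficient of smokeyes).
beta_smoke = 0.41044
rate_ratio = math.exp(beta_smoke)

# g) Width of the interval given in the question.
lower, upper = 191.01, 240.16
given_width = upper - lower

# Normal approximation: Poisson(lambda) is roughly N(lambda, lambda), so a 95%
# prediction interval is about lambda +/- 1.96 * sqrt(lambda). The narrowest
# case for any lambda inside the interval occurs at lambda = lower bound.
z_975 = 1.96
min_prediction_width = 2 * z_975 * math.sqrt(lower)

too_narrow = given_width < min_prediction_width

print(f"Rate ratio smokers vs. non-smokers: {rate_ratio:.3f}")
print(f"Given width: {given_width:.2f}")
print(f"Minimum 95% prediction width: {min_prediction_width:.2f}")
print(f"Interval too narrow for a prediction interval: {too_narrow}")
```

If the given interval is narrower than this minimum width, it cannot be a prediction interval for the count; it is then more likely to be a confidence interval for the expected value.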
- Example: P[Z ≤ 1.96] = 0.