Advanced Statistical Methodology (STAT 526) Midterm Exam Solutions, Exams of Applied Statistics

Solutions to the midterm exam for the advanced statistical methodology course (stat 526). It covers topics such as regression analysis, goodness-of-fit tests, and odds ratios. The solutions demonstrate the application of statistical concepts and techniques to real-world problems. Valuable for students studying statistics and data analysis.

Typology: Exams

2018/2019

Uploaded on 02/13/2025

jasmine-liu
ADVANCED STATISTICAL METHODOLOGY (STAT 526)
Spring 2019
MIDTERM EXAM (BRNG 2290)
8:00-10:00PM, Wednesday, February 27, 2019
The exam has 32 points in total. Students who score 30 points or higher will
receive 30 points. Please write down your name and student ID number below.
NAME:
ID:
> summary(Midterm)
Min. 1st Qu. Median Mean 3rd Qu. Max.
17.00 23.75 29.25 26.88 30.00 30.00
> sort(Midterm)
[1] 17.0 20.0 23.0 23.0 24.0 26.5 28.0 29.0 29.5 30.0
[11] 30.0 30.0 30.0 30.0 30.0 30.0

  1. (10 points). The data set reports exam results for primary school students. It contains pass/fail counts by students' weekly studying hours (hours) and three studying methods (method, coded 1, 2, and 3). The studying hours are partitioned into intervals, and the center value of each interval is used in fitting the models. The R output is given below.

> summary(mod.main)
glm(formula=cbind(pass,fail)~hours+factor(method),family=binomial,data=exam)
Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)     -3.13325    0.27940 -11.214 < 2e-
hours            0.22174    0.01522  14.566 < 2e-
factor(method)2  0.81913    0.22643   3.618 0.
factor(method)3  1.21552    0.23671   5.135 2.82e-
Null deviance: 397.538 on 17 degrees of freedom
Residual deviance: 11.465 on 14 degrees of freedom

> summary(mod.int)
glm(formula=cbind(pass,fail)~hours*factor(method),family=binomial,data=exam)
Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
(Intercept)           -3.06060    0.40367  -7.582 3.4e-
hours                  0.21693    0.02458   8.827 < 2e-
factor(method)2        0.34967    0.56810   0.616 0.
factor(method)3        1.40095    0.52562   2.665 0.
hours:factor(method)2  0.03725    0.03896   0.956 0.
hours:factor(method)3 -0.01765    0.03531  -0.500 0.
Null deviance: 397.5383 on 17 degrees of freedom
Residual deviance: 9.4339 on 12 degrees of freedom

> round(summary(mod.main)$cov.unscaled,6)
                (Intercept)     hours factor(method)2 factor(method)
(Intercept)        0.078063 -0.003499       -0.034658 -0.
hours             -0.003499  0.000232        0.000624 0.
factor(method)2   -0.034658  0.000624        0.051269 0.
factor(method)3   -0.038451  0.000875        0.027593 0.

> round(qchisq(0.95,1:20),2)
[1] 3.84 5.99 7.81 9.49 11.07 12.59 14.07 15.51 16.92 18.
[11] 19.68 21.03 22.36 23.68 25.00 26.30 27.59 28.87 30.14 31.

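(e) (2 points). Provide the 95% confidence interval for hours under the first method if one wants a 90% passing probability.
Solution: A passing probability of 0.9 corresponds to a logit of $\log(0.9/0.1) = \log 9$. For method 1, we have

$$\hat{x} = \frac{\log 9 - \hat\beta_0}{\hat\beta_1} = \frac{\log 9 + 3.13325}{0.22174} = 24.0393.$$

By the delta method, with

$$\left(\frac{\partial \hat{x}}{\partial \hat\beta_0}, \frac{\partial \hat{x}}{\partial \hat\beta_1}\right) = \left(-\frac{1}{0.22174}, -\frac{\log 9 + 3.13325}{0.22174^2}\right) = (-4.5098, -108.41),$$

we have

$$\hat\sigma^2_{\hat{x}} = (-4.5098, -108.41) \begin{pmatrix} 0.078063 & -0.003499 \\ -0.003499 & 0.000232 \end{pmatrix} \begin{pmatrix} -4.5098 \\ -108.41 \end{pmatrix} \approx 0.893.$$

Thus, the 95% confidence interval for hours is

$$24.0393 \pm 1.96\sqrt{0.893} = [22.19, 25.89].$$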

  2. (8 points). The following table reports the relationship between education (educ) and religious beliefs (belief).

Education                Religious Beliefs
Degree             Fundamentalist  Moderate  Liberal  Total
High School                   178       138      101    417
College                       570       648      442   1660
Bachelor Degree               145       252      252    649
Total                         893      1038      795   2726

> summary(mod.main)
Call:
glm(formula=yy~factor(educ)+factor(belief),family=poisson,data=Religion)
Deviance Residuals:
      1       2       3       4       5       6       7       8       9
 3.3824  1.1150 -4.9219 -1.6875  0.6302  0.3091 -1.9260 -1.9429  4.

> summary(mod.ll)
Call:
glm(formula=yy~factor(educ)+factor(belief)+educ:belief, family=poisson,data=Religion)
Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)      4.89267    0.05490  89.123  < 2e-16 ***
factor(educ)2    0.81846    0.08970   9.124  < 2e-16 ***
factor(educ)3   -0.73975    0.16889  -4.380 1.19e-05 ***
factor(belief)2 -0.46604    0.09237  -5.045 4.53e-07 ***
factor(belief)3 -1.38392    0.17656  -7.838 4.56e-15 ***
educ:belief      0.30336    0.04049   7.493 6.76e-14 ***
Null deviance: 1013.4427 on 8 degrees of freedom
Residual deviance: 8.7621 on 3 degrees of freedom

> summary(mod.row)
Call:
glm(formula=yy~factor(educ)+factor(belief)+factor(educ):belief, family=poisson,data=Religion)
Coefficients: (1 not defined because of singularities)
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)           5.70452    0.13550  42.100  < 2e-16 ***
factor(educ)2         1.04760    0.14159   7.399 1.37e-13 ***
factor(educ)3        -0.70291    0.17244  -4.076 4.58e-05 ***
factor(belief)2       0.49840    0.06718   7.419 1.18e-13 ***
factor(belief)3       0.54221    0.10205   5.313 1.08e-07 ***
factor(educ)1:belief -0.57554    0.08198  -7.020 2.21e-12 ***
factor(educ)2:belief -0.39688    0.06008  -6.605 3.96e-11 ***
factor(educ)3:belief       NA         NA      NA       NA
Null deviance: 1013.4427 on 8 degrees of freedom
Residual deviance: 4.2737 on 2 degrees of freedom

(a) (2 points). Provide a goodness-of-fit test to assess whether the main effects model fits the data.
Solution: Summing the squared deviance residuals in the output, we obtain

$$G^2 = 3.3824^2 + 1.1150^2 + (-4.9219)^2 + (-1.6875)^2 + 0.6302^2 + 0.3091^2 + (-1.9260)^2 + (-1.9429)^2 + 4.3372^2 = 66.54.$$

Since $G^2 > \chi^2_{0.05,4} = 9.49$, we conclude that the model does not fit the data.

(b) (2 points). State the linear-by-linear association model. Provide two tests to assess the significance of the linear-by-linear association term.
Solution: The linear-by-linear association model is

$$\log \lambda_{ij} = \mu + \alpha_i + \beta_j + \gamma u_i v_j,$$

where $\lambda_{ij} = E(y_{ij})$, $\alpha_i$ with $\alpha_1 = 0$ represents the main effects of educ, $\beta_j$ with $\beta_1 = 0$ represents the main effects of belief, $\gamma$ is the coefficient of the linear-by-linear association term, and $u_i$ and $v_j$ are the score values of educ and belief. In this output, we have $u_i = i$ and $v_j = j$. We can use the Wald test and the likelihood ratio test. The Wald statistic is 7.493 and its p-value is $6.76 \times 10^{-14}$. Thus, we conclude that the linear-by-linear association term is significant. The likelihood ratio statistic is $66.54 - 8.76 = 57.78 > \chi^2_{0.05,1} = 3.84$. It also leads to the conclusion that the linear-by-linear association term is significant.

(c) (2 points). State the null hypothesis in the test between the linear-by-linear association model and the row effects model. Provide a test statistic to assess whether the row effects model can be reduced to the linear-by-linear association model.
Solution: The row effects model is

$$\log \lambda_{ij} = \mu + \alpha_i + \beta_j + \gamma_i v_j,$$

where $\lambda_{ij} = E(y_{ij})$, $\alpha_i$ with $\alpha_1 = 0$ represents row main effects, $\beta_j$ with $\beta_1 = 0$ represents column main effects, $\gamma_i$ with $\gamma_3 = 0$ represents row effects in the interaction, and $v_j = j$ are scores of
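The arithmetic in parts (a) and (b) can be reproduced with a few lines of code. This is a minimal sketch: the ninth deviance residual (4.3372) is taken from the solution to part (a) because it is truncated in the printed output, the critical values come from the `qchisq` output in question 1, and the variable names are illustrative.

```python
# Deviance residuals of the main effects model (cells 1 to 9).
residuals = [3.3824, 1.1150, -4.9219, -1.6875, 0.6302,
             0.3091, -1.9260, -1.9429, 4.3372]

# 95% chi-square critical values, keyed by degrees of freedom.
CHI2_95 = {1: 3.84, 4: 9.49}


def deviance_from_residuals(res):
    """The deviance G^2 equals the sum of squared deviance residuals."""
    return sum(r * r for r in res)


# (a) Goodness of fit of the main effects model on (3-1)(3-1) = 4 df.
g2_main = deviance_from_residuals(residuals)
main_effects_rejected = g2_main > CHI2_95[4]

# (b) Likelihood ratio test of the linear-by-linear term on 1 df:
# main effects deviance minus the residual deviance of mod.ll.
g2_ll = 8.7621
lr_stat = g2_main - g2_ll
association_significant = lr_stat > CHI2_95[1]

print(f"G^2 = {g2_main:.2f}, LR = {lr_stat:.2f}")
```

Both comparisons reject the smaller model: the main effects model does not fit, and the linear-by-linear term is significant.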

(d) (2 points). Provide the 95% confidence interval for the probability of being killed when conc = 0.6 in the model with overdispersion.
Solution: The predicted value of the linear predictor is $\hat\eta = -1.5655 + 3.2791 \times 0.6 = 0.40196$. The variance is

$$\hat\phi \begin{pmatrix} 1 & 0.6 \end{pmatrix} \begin{pmatrix} 0.015926 & -0.030085 \\ \cdots & \cdots \end{pmatrix} \begin{pmatrix} 1 \\ 0.6 \end{pmatrix} = 0.02553.$$

The 95% confidence interval for $\eta$ is $0.40196 \pm 1.96\sqrt{0.02553} = [0.08879, 0.71513]$. The 95% confidence interval for the probability is $[e^{0.08879}/(1+e^{0.08879}), e^{0.71513}/(1+e^{0.71513})] = [0.5222, 0.67153]$.
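The back-transformation in part (d) can be sketched as follows: the interval is built on the logit scale and then mapped through the inverse logit, which is monotone and so preserves the endpoints. The variance 0.02553 is taken as given from the solution, and the function names are illustrative.

```python
import math


def expit(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def probability_interval(eta_hat, var_eta, z=1.96):
    """Wald interval for eta and its image on the probability scale."""
    half_width = z * math.sqrt(var_eta)
    eta_lo, eta_hi = eta_hat - half_width, eta_hat + half_width
    return (eta_lo, eta_hi), (expit(eta_lo), expit(eta_hi))


# Linear predictor at conc = 0.6, using the coefficients from the solution.
eta_hat = -1.5655 + 3.2791 * 0.6
eta_ci, prob_ci = probability_interval(eta_hat, 0.02553)
print(f"eta CI = [{eta_ci[0]:.5f}, {eta_ci[1]:.5f}]")
print(f"prob CI = [{prob_ci[0]:.4f}, {prob_ci[1]:.4f}]")
```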

  4. (6 points). The data report feelings about life (low, medium, and high) with respect to income level (xx), coded from 1 (low) to 5 (high). The R output is given below.

> g <- multinom(yy~factor(xx),weight=freq)
> g$dev
[1] 441.
> g$edf
[1] 10
> g1 <- multinom(yy~xx,weight=freq)
> g1$dev
[1] 444.
> g1$edf
[1] 4
> summary(g1)$coefficient
       (Intercept) xx
Median  -0.1973812 0.
High    -0.3598186 0.
> g2 <- polr(yy~xx,weight=freq)
> g2$dev
[1] 445.
> g2$edf
[1] 3
> summary(g2)$coefficient
Re-fitting to get Hessian
                 Value Std. Error t value
xx           0.2171542  0.1032490 2.
Low|Median  -0.5685649  0.3895745 -1.
Median|High  0.9913447  0.3932452 2.

(a) (2 points). Write down the model assumptions of the second and the third models in the output.
Solution: Let $\pi_1(x)$, $\pi_2(x)$, and $\pi_3(x)$ denote the probabilities of the low, medium, and high feeling levels at income level $x$. The assumption of the second model is

$$\log \frac{\pi_j(x)}{\pi_1(x)} = \beta_{0j} + \beta_{1j} x, \quad j = 2, 3.$$

The assumption of the third model is

$$\log \frac{\sum_{k=1}^{j} \pi_k(x)}{1 - \sum_{k=1}^{j} \pi_k(x)} = \beta_{0j} - \beta_1 x, \quad j = 1, 2.$$

(b) (2 points). Provide a goodness-of-fit test of whether the multinomial and the proportional odds models fit the data when income levels are treated as score values.
Solution: Both models are compared with the first model, which treats income level as a factor and has 10 parameters. The residual deviance of the second model is $G^2 = 444.8235 - 441.7743 = 3.0492 < \chi^2_{0.05,6} = 12.59$, on $10 - 4 = 6$ degrees of freedom. Therefore, the second model fits the data. The residual deviance of the third model is $G^2 = 445.2111 - 441.7743 = 3.4368 < \chi^2_{0.05,7} = 14.07$, on $10 - 3 = 7$ degrees of freedom. Therefore, the proportional odds model also fits the data.

(c) (2 points). Predict the probabilities in the multinomial model and the proportional odds model, respectively, if the income level is 5.
Solution: For the second model,

$$\hat\eta_2 = -0.19738 + 5(0.20261) = 0.81567, \quad \hat\eta_3 = -0.35982 + 5(0.32062) = 1.24328.$$

Then, $\hat\pi_2 = e^{0.81567}\hat\pi_1$, $\hat\pi_3 = e^{1.24328}\hat\pi_1$, and $\hat\pi_1 + \hat\pi_2 + \hat\pi_3 = 1$, implying that $\hat\pi_1 = 0.1486$, $\hat\pi_2 = 0.3360$, and $\hat\pi_3 = 0.5153$. For the third model,

$$\hat\pi_1 = \frac{e^{-0.56856 - 5(0.21715)}}{1 + e^{-0.56856 - 5(0.21715)}} = 0.1605,$$

$$\hat\pi_3 = \frac{1}{1 + e^{0.99134 - 5(0.21715)}} = 0.5236,$$

$$\hat\pi_2 = 1 - \hat\pi_1 - \hat\pi_3 = 0.3159.$$
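The predictions in part (c) follow directly from the two model forms in part (a), and can be sketched as below. The slope values 0.20261 and 0.32062 are taken from the solution text because they are truncated in the printed output, and the function names are illustrative. The proportional odds function assumes the parameterization logit P(Y ≤ j) = cutpoint_j − slope · x used in part (a).

```python
import math


def baseline_logit_probs(etas):
    """Probabilities [pi_1, ..., pi_J] from eta_j = log(pi_j / pi_1), j = 2..J."""
    weights = [1.0] + [math.exp(eta) for eta in etas]
    total = sum(weights)
    return [w / total for w in weights]


def proportional_odds_probs(cutpoints, slope, x):
    """Category probabilities when logit P(Y <= j) = cutpoint_j - slope * x."""
    cumulative = [1.0 / (1.0 + math.exp(-(c - slope * x))) for c in cutpoints]
    cumulative.append(1.0)  # P(Y <= J) = 1 for the last category
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


x = 5  # highest income level

# Second model (multinomial, baseline category "low").
multinomial = baseline_logit_probs([-0.19738 + 0.20261 * x,
                                    -0.35982 + 0.32062 * x])

# Third model (proportional odds), categories ordered low, medium, high.
ordinal = proportional_odds_probs([-0.56856, 0.99134], 0.21715, x)

print([round(p, 4) for p in multinomial])
print([round(p, 4) for p in ordinal])
```

Each list is ordered low, medium, high and sums to 1.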