




This document presents a detailed analysis of regression modeling techniques applied to a crime data set from the Georgia Institute of Technology. The analysis uses stepwise regression, elastic net, and lasso methods for variable selection and model building. It walks step by step through scaling the data, performing cross-validation, and evaluating model quality with adjusted R-squared and cross-validated R-squared. The results show that variable selection plays a crucial role in developing a parsimonious and explanatory regression model: the stepwise regression approach selects a set of 8 variables that achieves the best model performance. The document also discusses the tradeoff between model complexity and explanatory power, highlighting the importance of balancing simplicity and predictive accuracy in regression modeling.
Typology: Exams





Step: AIC=503.
.outcome ~ M + Ed + Po1 + M.F + U1 + U2 + Ineq + Prob
Df Sum of Sq RSS AIC
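The step trace above shows the final formula reached by an AIC-guided stepwise search on scaled predictors. A minimal sketch of that kind of search follows, using base R's `step()` rather than whatever wrapper the original analysis used (the `.outcome` response name suggests a wrapper such as caret, but that is an assumption). The built-in `mtcars` data and the names `dat`, `full_model`, and `step_fit` are stand-ins and do not come from the source.

```r
# Stand-in data: mtcars ships with R; the source's crime data is not reproduced here.
dat <- mtcars
predictors <- setdiff(names(dat), "mpg")

# Center and scale the predictors only, leaving the response in its original units.
dat[predictors] <- lapply(dat[predictors], function(v) as.numeric(scale(v)))

full_model <- lm(mpg ~ ., data = dat)   # all candidate predictors
null_model <- lm(mpg ~ 1, data = dat)   # intercept only

# Bidirectional stepwise search: at each step, add or drop the single term
# that lowers AIC the most, and stop when no move improves it.
step_fit <- step(
  full_model,
  scope = list(lower = formula(null_model), upper = formula(full_model)),
  direction = "both",
  trace = 0
)

selected <- attr(terms(step_fit), "term.labels")
print(formula(step_fit))
print(summary(step_fit)$adj.r.squared)
```

Because the predictors are centered, the fitted intercept equals the mean of the response, which is one way to check that the scaling step was applied before the fit.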
Call:
lm(formula = Crime ~ M.F + U1 + Prob + U2 + M + Ed + Ineq + Po1, data = scaledData)

Residuals:
    Min      1Q  Median      3Q     Max
-444.70 -111.07    3.03  122.15    483.

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   905.09      28.52  31.731  < 2e-16 ***
M.F            65.83      40.08   1.642  0.
U1           -109.73      60.20  -1.823  0.      .
Prob          -86.31      33.89  -2.547  0.01505 *
U2            158.22      61.22   2.585  0.01371 *
M             117.28      42.10   2.786  0.00828 **
Ed            201.50      59.02   3.414  0.00153 **
Ineq          244.70      55.69   4.394 8.63e-05 ***
Po1           305.07      46.14   6.613 8.26e-08 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 195.5 on 38 degrees of freedom
Multiple R-squared: 0.7888, Adjusted R-squared: 0.
F-statistic: 17.74 on 8 and 38 DF, p-value: 1.159e-
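The summary above reports in-sample fit only. The document's description also mentions a cross-validated R-squared. One common way to estimate it is k-fold cross-validation, sketched below in base R. The helper `cv_r2`, the choice of 5 folds, the seed, and the `mtcars` stand-in data are assumptions for illustration, not the original analysis.

```r
# k-fold cross-validated R-squared for a linear model (illustrative helper).
cv_r2 <- function(formula, data, k = 5, seed = 1) {
  set.seed(seed)
  n <- nrow(data)
  # Randomly assign each row to one of k folds of near-equal size.
  fold <- sample(rep(seq_len(k), length.out = n))
  y <- model.response(model.frame(formula, data))
  pred <- numeric(n)
  for (i in seq_len(k)) {
    held_out <- fold == i
    # Fit on the other k - 1 folds, then predict the held-out rows.
    fit <- lm(formula, data = data[!held_out, ])
    pred[held_out] <- predict(fit, newdata = data[held_out, ])
  }
  # Same form as R-squared, but built from out-of-fold predictions.
  1 - sum((y - pred)^2) / sum((y - mean(y))^2)
}

in_sample <- summary(lm(mpg ~ wt + hp, data = mtcars))$r.squared
out_of_fold <- cv_r2(mpg ~ wt + hp, data = mtcars)
print(c(in_sample = in_sample, cross_validated = out_of_fold))
```

The cross-validated figure cannot exceed the in-sample one for a least-squares fit, so the gap between the two gives a rough sense of how much the model is fitting noise.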
s
(Intercept)  889.
So            45.
M             77.
Ed            98.
Po1          311.
Po.
LF             2.
M.F           46.
Pop.
NW             2.
U.
U2            29.
Wealth.
Ineq         164.
Prob         -77.
Time.

Call:
lm(formula = Crime ~ So + M + Ed + Po1 + LF + M.F + NW + U2 + Ineq + Prob, data = scaledData)
s
(Intercept)  894.
So            31.
M            106.
Ed           179.
Po1          291.
Po.
LF.
M.F           53.
Pop          -22.
NW            18.
U1           -78.
U2           124.
Wealth        63.
Ineq         256.
Prob         -91.
Time          -1.

Call:
lm(formula = Crime ~ So + M + Ed + Po1 + M.F + Pop + NW + U1 + U2 + Wealth + Ineq + Prob + Time, data = scaledData)
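The two coefficient listings above have the shape of penalized-regression output, where a variable shrunk to zero prints as `.`, each followed by an ordinary `lm` refit on the surviving variables. A hedged sketch of that select-then-refit pattern with the glmnet package follows. The source does not show its own calls, so the `alpha` values, fold count, seed, and the helper name `select_and_refit` are assumptions, and `mtcars` stands in for the crime data.

```r
library(glmnet)  # assumes the glmnet package is installed

# Stand-in data: scaled predictor matrix and unscaled response.
x <- scale(as.matrix(mtcars[, -1]))
y <- mtcars$mpg

select_and_refit <- function(alpha, x, y, nfolds = 5, seed = 1) {
  set.seed(seed)
  # alpha = 1 is the lasso penalty; 0 < alpha < 1 mixes in a ridge
  # penalty (elastic net). Lambda is chosen by cross-validation.
  cv_fit <- cv.glmnet(x, y, alpha = alpha, nfolds = nfolds)
  beta <- as.matrix(coef(cv_fit, s = "lambda.min"))
  # Keep predictors whose coefficient was not shrunk to exactly zero.
  kept <- setdiff(rownames(beta)[beta[, 1] != 0], "(Intercept)")
  # Refit by ordinary least squares on the kept predictors only.
  refit_data <- data.frame(y = y, x[, kept, drop = FALSE])
  list(kept = kept, refit = lm(y ~ ., data = refit_data))
}

lasso <- select_and_refit(alpha = 1, x, y)
enet <- select_and_refit(alpha = 0.5, x, y)

print(lasso$kept)
print(summary(enet$refit)$adj.r.squared)
```

Refitting with `lm` after selection removes the shrinkage from the coefficient estimates, which is why the refit summary reports ordinary standard errors and p-values for the selected variables.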
Residuals:
    Min      1Q  Median      3Q     Max
-441.49 -112.23   17.28  116.21    476.

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  898.403     53.039  16.938  < 2e-16 ***
So            19.630    128.514   0.153 0.
M            114.387     50.973   2.244 0.031654 *
Ed           194.473     64.239   3.027 0.004760 **
Po1          290.682     67.449   4.310 0.000139 ***
M.F           47.290     49.692   0.952 0.
Pop          -31.180     47.769  -0.653 0.
NW            21.612     60.190   0.359 0.
U1           -91.277     67.188  -1.359 0.
U2           142.231     68.245   2.084 0.044969 *
Wealth        85.100     97.455   0.873 0.
Ineq         286.057     86.442   3.309 0.002269 **
Prob         -97.248     48.918  -1.988 0.       .
Time          -8.224     46.715  -0.176 0.

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 205.6 on 33 degrees of freedom
Multiple R-squared: 0.7973, Adjusted R-squared: 0.
F-statistic: 9.986 on 13 and 33 DF, p-value: 5.194e-
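Both model summaries report Multiple R-squared together with their degrees of freedom, which is enough to work out adjusted R-squared from its standard definition, 1 - (1 - R^2)(n - 1)/(n - p - 1). The sketch below applies that definition to the 8-variable and 13-variable fits. It takes n = 47 from the residual degrees of freedom plus the number of estimated coefficients; the helper name `adj_r2` is illustrative.

```r
# Adjusted R-squared from R-squared, sample size n, and predictor count p.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# 38 residual df + 8 predictors + 1 intercept (equivalently 33 + 13 + 1).
n <- 38 + 8 + 1

stepwise_adj <- adj_r2(0.7888, n, p = 8)   # 8-variable stepwise model
larger_adj <- adj_r2(0.7973, n, p = 13)    # 13-variable model

print(round(c(stepwise = stepwise_adj, larger = larger_adj), 4))
```

The 13-variable model has the higher raw R-squared, but the penalty for its five extra terms leaves it with the lower adjusted value, which is the complexity tradeoff the description refers to.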