









Resampling Methods: exercise R code as a solution manual for ISLR (An Introduction to Statistical Learning, by James, Witten, Hastie, and Tibshirani)










---
title: "Chapter 5: Resampling Methods"
author: "Solutions to Exercises"
date: "January 18, 2016"
output:
  html_document:
    keep_md: no
---
>EXERCISE 1:

$$
\begin{aligned}
Var(\alpha X + (1-\alpha)Y) &= Var(\alpha X) + Var((1-\alpha)Y) + 2\,Cov(\alpha X, (1-\alpha)Y) \\
&= \alpha^2 \sigma_X^2 + (1-\alpha)^2 \sigma_Y^2 + 2 \alpha (1-\alpha) \sigma_{XY} \\
&= \alpha^2 \sigma_X^2 + (1+\alpha^2-2\alpha) \sigma_Y^2 + (2\alpha - 2\alpha^2) \sigma_{XY} \\
&= \alpha^2 \sigma_X^2 + \sigma_Y^2 + \alpha^2\sigma_Y^2 - 2\alpha\sigma_Y^2 + 2\alpha \sigma_{XY} - 2\alpha^2 \sigma_{XY}
\end{aligned}
$$

Setting the derivative with respect to $\alpha$ equal to zero:

$$
\frac{\partial}{\partial \alpha}: \quad 2\alpha\sigma_X^2 + 0 + 2\alpha\sigma_Y^2 - 2\sigma_Y^2 + 2\sigma_{XY} - 4\alpha\sigma_{XY} = 0
$$

$$
(2\sigma_X^2 + 2\sigma_Y^2 - 4\sigma_{XY}) \alpha = 2\sigma_Y^2 - 2\sigma_{XY}
$$

$$
\alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}
$$

The second derivative is $2(\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}) = 2\,Var(X - Y) \ge 0$, so this stationary point is a minimum. It is the unique minimum whenever $Var(X - Y) > 0$.
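As a quick numerical sanity check of the formula (a sketch, not part of the original solution), the snippet below plugs in the population values the text uses in its bootstrap example: $\sigma_X^2 = 1$, $\sigma_Y^2 = 1.25$, $\sigma_{XY} = 0.5$, which give $\alpha = 0.6$. It then compares the closed-form $\alpha$ with a numerical minimizer found by base R's `optimize()`. The helper name `portfolio_var` is our own.

```r
# Population (co)variances taken from the text's bootstrap example (true alpha = 0.6)
sigma2_x <- 1
sigma2_y <- 1.25
sigma_xy <- 0.5

# Var(alpha*X + (1 - alpha)*Y) as a function of alpha (hypothetical helper name)
portfolio_var <- function(alpha) {
  alpha^2 * sigma2_x + (1 - alpha)^2 * sigma2_y + 2 * alpha * (1 - alpha) * sigma_xy
}

# Closed-form minimizer derived above
alpha_closed <- (sigma2_y - sigma_xy) / (sigma2_x + sigma2_y - 2 * sigma_xy)

# Numerical minimizer over [0, 1] for comparison
alpha_numeric <- optimize(portfolio_var, interval = c(0, 1))$minimum

print(c(closed_form = alpha_closed, numeric = alpha_numeric))
```

The two values agree up to `optimize()`'s default tolerance. Because the variance is a quadratic in $\alpha$, a one-dimensional search is all that is needed.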
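>EXERCISE 2:

__Part a)__ The first bootstrap observation is drawn uniformly from all $n$ observations. The probability that it is *not* the jth observation is therefore the probability of drawing one of the other $n-1$ observations: $\frac{n-1}{n}$

__Part b)__ Because the bootstrap samples with replacement, the second draw is independent of the first. The probability is therefore the same as in Part a: $\frac{n-1}{n}$

__Part c)__ The probability of not selecting the jth observation is the same for each of the $n$ independent draws. After $n$ draws, the probability of never selecting the jth observation (that is, of the jth observation not being in the bootstrap sample) is: $(\frac{n-1}{n})^n = (1-\frac{1}{n})^n$

The probability that the jth observation *is* in the bootstrap sample is the complement, $1 - (1-\frac{1}{n})^n$.

__Part d)__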
```{r}
1-(1-1/5)^5  # 0.67232
```

__Part e)__
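```{r}
1-(1-1/100)^100  # approximately 0.634
```

***

>EXERCISE 3:

__Part a)__ From page 181 in the text, the k-fold CV approach "involves randomly dividing the set of observations into k groups, or folds, of approximately equal size. The first fold is treated as a validation set, and the method is fit on the remaining k-1 folds. The mean squared error, MSE, is then computed on the observations in the held-out fold. This procedure is repeated k times." Each repetition uses a different fold as the validation set. The k-fold CV estimate of the test error is the average of the k resulting MSEs.

__Part b)__

* Compared to the validation set approach, k-fold CV has less variance *and* less bias. Its estimate does not hinge on a single random split, and each model is trained on $(k-1)/k$ of the data instead of about half, so it overestimates the test error less. The cost is fitting the model k times instead of once.
* Compared to LOOCV, k-fold CV (with k < n) is computationally cheaper (k fits instead of n) and has less variance, but somewhat more bias, because each training set contains fewer observations.

***

>EXERCISE 4:

We can use the bootstrap. First, draw many bootstrap samples by sampling $n$ observations with replacement from the dataset. Next, refit the statistical learning method on each sample and predict $Y$ at the particular value of $X$. The standard deviation of these bootstrap predictions estimates the standard deviation of our prediction.

***

## APPLIED

***

>EXERCISE 5:

__Part a)__

```{r, warning=FALSE, message=FALSE}
require(ISLR)
data(Default)
set.seed(1)
fit1 <- glm(default ~ income + balance, data=Default, family=binomial)
summary(fit1)
```

__Part b)__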
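```{r}
set.seed(1)
# Split the observations: a random half for training, the rest for validation
train <- sample(nrow(Default), nrow(Default)*0.5)
# Fit the logistic regression on the training observations only
fit2 <- glm(default ~ income + balance, data=Default, family=binomial, subset=train)
# Posterior probability of default for the validation set; classify as "Yes" if > 0.5
prob2 <- predict(fit2, Default[-train,], type="response")
pred2 <- ifelse(prob2 > 0.5, "Yes", "No")
table(pred2, Default[-train,]$default)
# Validation set error: fraction of misclassified validation observations
mean(Default[-train,]$default != pred2) # test error
```

__Part c)__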
```{r}
set.seed(2) # Repeat 1
train <- sample(nrow(Default), nrow(Default)*0.5)
```

The test error with the `student` dummy variable included is similar to the test error without it. Including `student` does not noticeably reduce the test error.

***

>EXERCISE 6:

__Part a)__

```{r, warning=FALSE, message=FALSE}
require(ISLR)
data(Default)
set.seed(1)
fit1 <- glm(default ~ income + balance, data=Default, family=binomial)
summary(fit1)
```

The estimated standard errors are 0.000004985 for `income` and 0.0002274 for `balance`.

__Part b)__
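```{r}
set.seed(1)
# Statistic for the bootstrap: refit the logistic regression on the
# observations indexed by `trainid` and return the coefficient estimates
boot.fn <- function(df, trainid) {
  return(coef(glm(default ~ income + balance, data=df, family=binomial, subset=trainid)))
}
boot.fn(Default, 1:nrow(Default)) # with all observations, should match the coefficients in summary(fit1)
```

__Part c)__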
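```{r}
require(boot)
# 100 bootstrap replicates; the "std. error" column is the bootstrap SE of each coefficient
boot(Default, boot.fn, R=100)
```

__Part d)__ The standard error estimates from the `glm()` summary and from the bootstrap with R=100 are close.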
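To make explicit what `boot()` computes in Part c, here is a minimal base-R sketch of the same bootstrap standard-error calculation, written with `replicate()` instead of `boot()`. It assumes simulated data in place of `Default`, so it runs without ISLR. The predictors `x1` and `x2`, their coefficients, and the names `sim_df`, `coef_fn`, and `B` are all hypothetical.

```r
set.seed(1)

# Hypothetical simulated data standing in for Default (not the real data set)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-1 + 0.5 * x1 + 1.0 * x2))
sim_df <- data.frame(y = y, x1 = x1, x2 = x2)

# Coefficients of a logistic regression fit to the rows listed in `index`
coef_fn <- function(data, index) {
  coef(glm(y ~ x1 + x2, data = data[index, ], family = binomial))
}

# Draw B bootstrap samples (rows sampled with replacement) and refit on each
B <- 200
boot_coefs <- t(replicate(B, coef_fn(sim_df, sample(nrow(sim_df), replace = TRUE))))

# Bootstrap SE = standard deviation of each coefficient across the B refits
boot_se <- apply(boot_coefs, 2, sd)

# Formula-based SEs from the glm summary, for comparison
glm_se <- summary(glm(y ~ x1 + x2, data = sim_df, family = binomial))$coefficients[, "Std. Error"]

print(rbind(glm = glm_se, bootstrap = boot_se))
```

Because the simulated data follow a logistic model by construction, the two rows should be similar. This mirrors the comparison in Part d.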
>EXERCISE 7:

__Part a)__

```{r}
require(ISLR)
data(Weekly)
set.seed(1)
fit1 <- glm(Direction ~ Lag1 + Lag2, data=Weekly, family=binomial)
str(loocv.err)
```

__Part e)__
```{r}
mean(loocv.err)
```

The LOOCV estimate of the test error rate is 44.4%.
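For reference, here is a minimal sketch of a LOOCV loop that produces an error vector like `loocv.err`. For each observation i, it fits the logistic regression on the other n - 1 observations and classifies observation i as "Up" if its predicted probability exceeds 0.5. It then records 1 if that classification is wrong and 0 otherwise. The sketch assumes simulated stand-in data: `sim`, with hypothetical `Direction`, `Lag1`, and `Lag2` columns, replaces `Weekly`. It is not the author's code.

```r
set.seed(1)

# Hypothetical simulated stand-in for Weekly (not the real data)
n <- 200
sim <- data.frame(Lag1 = rnorm(n), Lag2 = rnorm(n))
p_true <- plogis(0.2 + 0.3 * sim$Lag2)
sim$Direction <- factor(ifelse(runif(n) < p_true, "Up", "Down"), levels = c("Down", "Up"))

# loocv_err[i] is 1 if observation i is misclassified when it is held out, else 0
loocv_err <- numeric(n)
for (i in seq_len(n)) {
  # Fit on every observation except the ith
  fit_i <- glm(Direction ~ Lag1 + Lag2, data = sim[-i, ], family = binomial)
  # glm() models the probability of the second factor level, "Up"
  p_up <- predict(fit_i, newdata = sim[i, , drop = FALSE], type = "response")
  pred_i <- if (p_up > 0.5) "Up" else "Down"
  loocv_err[i] <- as.numeric(pred_i != as.character(sim$Direction[i]))
}

# LOOCV estimate of the test error rate
mean(loocv_err)
```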
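>EXERCISE 8:

__Part a)__

```{r}
set.seed(1)
y <- rnorm(100) # not needed for the model: y is overwritten below; this draw only advances the random number generator
x <- rnorm(100)
y <- x - 2*x^2 + rnorm(100)
```

* Model: $Y = X - 2X^2 + \epsilon$
* $n = 100$ observations
* $p = 2$ features ($X$ and $X^2$)

__Part b)__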
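```{r}
plot(x, y)
```

The relationship between $X$ and $Y$ is quadratic. It forms a downward-opening parabola, since the coefficient of $X^2$ is negative.

__Part c)__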
```{r}
require(boot) # provides cv.glm()
set.seed(1)
df <- data.frame(y, x, x2=x^2, x3=x^3, x4=x^4)
# glm() with the default gaussian family fits least squares; cv.glm() defaults to LOOCV.
# delta[1] is the raw LOOCV estimate, delta[2] a bias-adjusted version.
fit1 <- glm(y ~ x, data=df)
cv.err1 <- cv.glm(df, fit1)
cv.err1$delta
fit2 <- glm(y ~ x + x2, data=df)
cv.err2 <- cv.glm(df, fit2)
cv.err2$delta
fit3 <- glm(y ~ x + x2 + x3, data=df)
cv.err3 <- cv.glm(df, fit3)
cv.err3$delta
fit4 <- glm(y ~ x + x2 + x3 + x4, data=df)
cv.err4 <- cv.glm(df, fit4)
cv.err4$delta
fit0 <- lm(y ~ poly(x,4))
summary(fit0)
```

The summary shows that only the linear and quadratic terms ($X$ and $X^2$) are statistically significant predictors. This agrees with the LOOCV results, which indicate that the model using only $X$ and $X^2$ performs best.
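For least-squares fits, LOOCV does not actually require n refits. By formula (5.2) in the text, $CV_{(n)} = \frac{1}{n}\sum_{i=1}^n \left(\frac{y_i - \hat{y}_i}{1 - h_i}\right)^2$, where $h_i$ is the leverage of observation i. The sketch below uses base R only, and the helper names `loocv_shortcut` and `loocv_brute` are our own. It regenerates the data with the same seed as Part a and computes LOOCV for polynomial degrees 1 to 4 in two ways: with the shortcut and by brute force. It then checks that the two agree. These values should correspond to the first component of `delta` from `cv.glm()`.

```r
set.seed(1)
y <- rnorm(100)
x <- rnorm(100)
y <- x - 2 * x^2 + rnorm(100)
dat <- data.frame(x = x, y = y)

# LOOCV via the least-squares shortcut: mean(((y_i - yhat_i) / (1 - h_i))^2)
loocv_shortcut <- function(fit) {
  mean((residuals(fit) / (1 - hatvalues(fit)))^2)
}

# LOOCV by brute force: refit n times, each time predicting the held-out observation
loocv_brute <- function(formula, data) {
  sq_err <- vapply(seq_len(nrow(data)), function(i) {
    fit_i <- lm(formula, data = data[-i, ])
    as.numeric((data$y[i] - predict(fit_i, newdata = data[i, , drop = FALSE]))^2)
  }, numeric(1))
  mean(sq_err)
}

degrees <- 1:4
shortcut <- sapply(degrees, function(d) loocv_shortcut(lm(y ~ poly(x, d, raw = TRUE), data = dat)))
brute <- sapply(degrees, function(d) loocv_brute(y ~ poly(x, d, raw = TRUE), dat))
print(rbind(degree = degrees, shortcut = shortcut, brute_force = brute))
```

The shortcut needs a single fit per model, which is why LOOCV is cheap for linear regression even though it is expensive for most other methods.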
>EXERCISE 9:

__Part a)__

```{r}
require(MASS)
require(boot)
data(Boston)
(medv.mu <- mean(Boston$medv))
```

__Part b)__
```{r}
# Standard error of the sample mean: sample standard deviation / sqrt(n)
(medv.sd <- sd(Boston$medv)/sqrt(nrow(Boston)))
```

__Part c)__
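```{r}
set.seed(1)
# Statistic for boot(): the mean of the observations indexed by `id`
mean.fn <- function(var, id) {
  return(mean(var[id]))
}
(boot.res <- boot(Boston$medv, mean.fn, R=100))
```

The bootstrap estimate of the standard error with R=100 is 0.38, reasonably close to the estimate from Part b.

__Part d)__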
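```{r}
# Approximate 95% confidence interval: estimate +/- 2 bootstrap standard errors
boot.res$t0 - 2*sd(boot.res$t) # lower bound
boot.res$t0 + 2*sd(boot.res$t) # upper bound
# Compare with the t-based interval
t.test(Boston$medv)
```

__Part e)__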
```{r}
(medv.median <- median(Boston$medv))
```

__Part f)__