Introduction to Statistical Learning ISLR Chapter 5 Solutions Code, Exercises of Statistics

Resampling Methods - Exercise R code as solution manual for ISLR (Introduction to Statistical Learning), James, Witten, Hastie, Tibshirani

Typology: Exercises

2020/2021

Uploaded on 05/26/2021

ekachakra
---
title: "Chapter 5: Resampling Methods"
author: "Solutions to Exercises"
date: "January 18, 2016"
output:
  html_document:
    keep_md: no
---
***
## CONCEPTUAL
***
<a id="ex01"></a>
>EXERCISE 1:
$$
\begin{aligned}
Var(\alpha X + (1-\alpha)Y)
&= Var(\alpha X) + Var((1-\alpha)Y) + 2 Cov(\alpha X, (1-\alpha)Y) \\
&= \alpha^2 \sigma_X^2 + (1-\alpha)^2 \sigma_Y^2 + 2 \alpha (1-\alpha) \sigma_{XY} \\
&= \alpha^2 \sigma_X^2 + (1+\alpha^2-2\alpha) \sigma_Y^2 + (2\alpha - 2\alpha^2) \sigma_{XY} \\
&= \alpha^2 \sigma_X^2 + \sigma_Y^2 + \alpha^2\sigma_Y^2 - 2\alpha\sigma_Y^2 + 2\alpha \sigma_{XY} - 2\alpha^2 \sigma_{XY}
\end{aligned}
$$
$$ \frac{\partial }{\partial \alpha}: 2\alpha\sigma_X^2 + 0 + 2\alpha\sigma_Y^2 - 2\sigma_Y^2 + 2\sigma_{XY} - 4\alpha\sigma_{XY} = 0 $$
$$ (2\sigma_X^2 + 2\sigma_Y^2 - 4\sigma_{XY}) \alpha = 2\sigma_Y^2 - 2\sigma_{XY} $$
$$ \alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}} $$
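
As a quick numerical sanity check on the closed form above, the sketch below plugs in arbitrary illustrative values for $\sigma_X^2$, $\sigma_Y^2$ and $\sigma_{XY}$ (these numbers are assumptions, not part of the exercise) and compares the formula with the minimizer found by a one-dimensional optimizer.

```r
# Illustrative values only (assumed): variances and covariance of X and Y
sigma2.x <- 1.0
sigma2.y <- 1.25
sigma.xy <- 0.5

# Variance of alpha*X + (1-alpha)*Y, taken from the expansion above
port.var <- function(alpha) {
  alpha^2 * sigma2.x + (1 - alpha)^2 * sigma2.y + 2 * alpha * (1 - alpha) * sigma.xy
}

# Closed-form minimizer from the derivation
alpha.closed <- (sigma2.y - sigma.xy) / (sigma2.x + sigma2.y - 2 * sigma.xy)

# Numerical minimizer over a wide interval
alpha.num <- optimize(port.var, interval = c(-5, 5))$minimum

c(closed = alpha.closed, numeric = alpha.num)
```

The two values should agree up to the optimizer's tolerance. The block uses a plain `r` fence so it is not evaluated when the document is knit.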

***
>EXERCISE 2:

__Part a)__

The first bootstrap observation is not the jth observation whenever any of the other $n-1$ observations is drawn, so the probability is: $\frac{n-1}{n}$

__Part b)__

Because the bootstrap samples with replacement, the probability is the same as in Part a: $\frac{n-1}{n}$

__Part c)__

The probability of not selecting the jth observation is the same for each draw, and the draws are independent. After $n$ draws, the probability of never selecting the jth observation is: $(\frac{n-1}{n})^n = (1-\frac{1}{n})^n$

__Part d)__
```{r}
1-(1-1/5)^5
```
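
The closed form from Part c can also be checked by simulation. The following is a minimal sketch for $n = 5$; the tracked index `j` and the number of simulations are arbitrary choices made for illustration.

```r
set.seed(1)
n <- 5
j <- 1           # which observation to track; any index in 1:n behaves the same
n.sims <- 10000  # number of simulated bootstrap samples (assumed)

# For each simulated bootstrap sample, record whether observation j was drawn
in.sample <- replicate(n.sims, j %in% sample(n, n, replace = TRUE))

simulated <- mean(in.sample)
exact <- 1 - (1 - 1/n)^n
c(simulated = simulated, exact = exact)
```

The simulated proportion should sit close to the exact probability, with the gap shrinking as `n.sims` grows.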

__Part e)__
```{r}
1-(1-1/100)^100
```
***
>EXERCISE 3:

__Part a)__

From page 181 in the text, the k-fold CV approach "involves randomly dividing the set of observations into k groups, or folds, of approximately equal size. The first fold is treated as a validation set, and the method is fit on the remaining k-1 folds. The mean squared error, MSE, is then computed on the observations in the held-out fold. This procedure is repeated k times."

__Part b)__

* Compared to the validation set approach, k-fold CV (with the usual k = 5 or 10) has less variance and less bias: the estimate does not hinge on a single random split, and each model is fit on a larger share of the data
* Compared to the LOOCV approach, k-fold CV has less variance but somewhat more bias, and it needs only k model fits instead of n

***
>EXERCISE 4:

We can use the bootstrap: repeatedly sample with replacement from our dataset, refit the model on each sample, and predict Y for the given value of X. The standard deviation of these predictions across the bootstrap samples then estimates the standard deviation of our prediction.

***
## APPLIED
***
>EXERCISE 5:

__Part a)__
```{r, warning=FALSE, message=FALSE}
require(ISLR)
data(Default)
set.seed(1)
fit1 <- glm(default ~ income + balance, data=Default, family=binomial)
summary(fit1)
```

__Part b)__
```{r}
set.seed(1)
# Split the data in half: `train` holds the row indices of the training set
train <- sample(nrow(Default), nrow(Default)*0.5)
fit2 <- glm(default ~ income + balance, data=Default, family=binomial, subset=train)
# Predicted default probabilities on the held-out half, classified at 0.5
prob2 <- predict(fit2, Default[-train,], type="response")
pred2 <- ifelse(prob2 > 0.5, "Yes", "No")
table(pred2, Default[-train,]$default)
mean(Default[-train,]$default != pred2)  # test error
```

__Part c)__
```{r}
set.seed(2)  # Repeat 1
train <- sample(nrow(Default), nrow(Default)*0.5)
##
```
Test error with the `student` feature included is similar to the test error without `student` (no significant reduction)

***
>EXERCISE 6:

__Part a)__
```{r, warning=FALSE, message=FALSE}
require(ISLR)
data(Default)
set.seed(1)
fit1 <- glm(default ~ income + balance, data=Default, family=binomial)
summary(fit1)
```

Estimated standard error is 0.000004985 for `income` and 0.0002274 for `balance`

__Part b)__
```{r}
set.seed(1)
# Statistic for boot(): refit the model on the rows in `trainid` and return the coefficients
boot.fn <- function(df, trainid) {
  return(coef(glm(default ~ income + balance, data=df, family=binomial, subset=trainid)))
}
boot.fn(Default, 1:nrow(Default))  # check match with summary
```

__Part c)__
```{r}
require(boot)
boot(Default, boot.fn, R=100)
```

__Part d)__

Standard error estimates from the glm summary function and from the bootstrap with R=100 are close, with the bootstrap values roughly 7% to 17% smaller:

* `income`: 4.985e-06 with glm summary, 4.128e-06 using bootstrap
* `balance`: 2.274e-04 with glm summary, 2.106e-04 using bootstrap
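
To make the comparison in Part d concrete without relying on the `boot` package or the `Default` data, here is a hand-rolled sketch in base R. The data frame `sim`, its coefficients, and the helper `coef.fn` are all hypothetical stand-ins invented for illustration; only the resampling idea mirrors the exercise.

```r
set.seed(1)
# Simulated stand-in for Default (assumed data, not the real dataset)
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(-1 + 0.8 * sim$x1 - 0.5 * sim$x2))

# Model-based standard errors from the glm summary
fit <- glm(y ~ x1 + x2, data = sim, family = binomial)
se.glm <- summary(fit)$coefficients[, "Std. Error"]

# Same shape as boot.fn: refit on the resampled rows, return the coefficients
coef.fn <- function(df, id) {
  coef(glm(y ~ x1 + x2, data = df[id, ], family = binomial))
}

# Hand-rolled bootstrap: R resamples of the row indices, with replacement
R <- 200
boot.coefs <- replicate(R, coef.fn(sim, sample(n, n, replace = TRUE)))
se.boot <- apply(boot.coefs, 1, sd)  # one row of boot.coefs per coefficient

round(cbind(se.glm, se.boot), 4)
```

Each row of the result pairs the model-based standard error with the bootstrap one for the same coefficient, which is the comparison the bullets above summarize for `income` and `balance`.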

***
>EXERCISE 7:

__Part a)__
```{r}
require(ISLR)
data(Weekly)
set.seed(1)
fit1 <- glm(Direction ~ Lag1 + Lag2, data=Weekly, family=binomial)
str(loocv.err)
```

__Part e)__
```{r}
mean(loocv.err)
```
Estimated test error with LOOCV is 44.4%
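
The code that fills the error vector is not shown in this section. One way such a leave-one-out loop could look is sketched below on a simulated stand-in for `Weekly`; the data frame `wk` and the vector `err.sketch` are hypothetical names, and the result will not reproduce the 44.4% figure.

```r
set.seed(1)
# Simulated stand-in for Weekly (assumed data; column names mirror the exercise)
n <- 200
wk <- data.frame(Lag1 = rnorm(n), Lag2 = rnorm(n))
wk$Direction <- factor(ifelse(runif(n) < plogis(0.2 + 0.1 * wk$Lag2), "Up", "Down"))

# err.sketch[i] is 1 if observation i is misclassified by the model fit without it
err.sketch <- rep(NA, n)
for (i in 1:n) {
  fit.i <- glm(Direction ~ Lag1 + Lag2, data = wk[-i, ], family = binomial)
  # glm models the probability of the second factor level, here "Up"
  prob.i <- predict(fit.i, newdata = wk[i, ], type = "response")
  pred.i <- ifelse(prob.i > 0.5, "Up", "Down")
  err.sketch[i] <- as.integer(pred.i != wk$Direction[i])
}

mean(err.sketch)  # LOOCV estimate of the test error rate
```

Averaging the 0/1 indicators gives the LOOCV misclassification rate, which is the quantity reported in Part e.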

***
>EXERCISE 8:

__Part a)__
```{r}
set.seed(1)
y <- rnorm(100)  # why is this needed?
x <- rnorm(100)
y <- x - 2*x^2 + rnorm(100)
```
$Y = X - 2X^2 + \epsilon$

$n = 100$ observations

$p = 2$ features

__Part b)__
```{r}
plot(x, y)
```
Relationship between X and Y is quadratic

__Part c)__
```{r}
set.seed(1)
df <- data.frame(y, x, x2=x^2, x3=x^3, x4=x^4)
fit1 <- glm(y ~ x, data=df)
cv.err1 <- cv.glm(df, fit1)
cv.err1$delta
fit2 <- glm(y ~ x + x2, data=df)
cv.err2 <- cv.glm(df, fit2)
cv.err2$delta
fit3 <- glm(y ~ x + x2 + x3, data=df)
cv.err3 <- cv.glm(df, fit3)
cv.err3$delta
fit4 <- glm(y ~ x + x2 + x3 + x4, data=df)
cv.err4 <- cv.glm(df, fit4)
cv.err4$delta
fit0 <- lm(y ~ poly(x,4))
summary(fit0)
```

Summary shows that only $X$ and $X^2$ are statistically significant predictors. This agrees with the LOOCV results that indicate using only $X$ and $X^2$ produces the best model.
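
For least squares fits, the LOOCV error can also be obtained from a single fit using the leverage-based shortcut discussed in the chapter, which avoids refitting the model n times. The sketch below regenerates data of the same form so that it runs on its own (the values will not match the chunks above exactly), and `loocv.lm` is a hypothetical helper name.

```r
set.seed(1)
x <- rnorm(100)
y <- x - 2 * x^2 + rnorm(100)

# LOOCV MSE from a single fit: mean of (residual / (1 - leverage))^2
loocv.lm <- function(fit) {
  mean((residuals(fit) / (1 - hatvalues(fit)))^2)
}

# Polynomial fits of degree 1 to 4, matching the four models in Part c
cv.by.degree <- sapply(1:4, function(d) loocv.lm(lm(y ~ poly(x, d))))
names(cv.by.degree) <- paste0("degree", 1:4)
cv.by.degree
```

The linear fit should show a much larger error than the quadratic one, while degrees 3 and 4 add little, which is consistent with the conclusion above.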

***
>EXERCISE 9:

__Part a)__
```{r}
require(MASS)
require(boot)
data(Boston)
(medv.mu <- mean(Boston$medv))
```

__Part b)__
```{r}
(medv.sd <- sd(Boston$medv)/sqrt(nrow(Boston)))
```

__Part c)__
```{r}
set.seed(1)
# Statistic for boot(): mean of the resampled values
mean.fn <- function(var, id) {
  return(mean(var[id]))
}
(boot.res <- boot(Boston$medv, mean.fn, R=100))
```
The bootstrap estimate of the standard error with R=100 is 0.38, reasonably close to the estimate from Part b.

__Part d)__
```{r}
boot.res$t0 - 2*sd(boot.res$t)  # lower bound
boot.res$t0 + 2*sd(boot.res$t)  # upper bound
t.test(Boston$medv)
```
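
The same comparison can be sketched in base R without `boot` or `MASS`. The vector `v` below is a simulated stand-in for `Boston$medv` (an assumption made so the block runs on its own), so the numbers illustrate the method only.

```r
set.seed(1)
# Simulated stand-in for Boston$medv (assumed data, same length as Boston)
v <- rnorm(506, mean = 22, sd = 9)

# Bootstrap the mean by hand: resample with replacement, recompute the mean
boot.means <- replicate(1000, mean(sample(v, replace = TRUE)))
se.boot <- sd(boot.means)

ci.boot <- mean(v) + c(-2, 2) * se.boot  # mean +/- 2 SE, as in Part d
ci.t <- as.numeric(t.test(v)$conf.int)   # t-based 95% interval

rbind(bootstrap = ci.boot, t.test = ci.t)
```

With a sample this large the two intervals should nearly coincide, since the bootstrap standard error approximates `sd(v)/sqrt(length(v))` and the t quantile is close to 2.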

__Part e)__
```{r}
(medv.median <- median(Boston$medv))
```

__Part f)__