Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

ISLR Chapter 6 Linear Model Selection and Regularization Lab Manual, Exercises of Statistics

Purdue University Statistics

Introduction to Statistical Learning (James/Witten/Hastie/Tibshirani)

Typology: Exercises

2020/2021

Uploaded on 05/26/2021

ekanga 🇺🇸

4.9

(16)

263 documents

1 / 5

This page cannot be seen from the preview

Don't miss anything!

Chapter 6 - Linear Model Selection and Regularization

Lab Solution

1 Problem 9

library(ISLR)

set.seed(1)

(a). Split the data set into a training set and a test set.

train.size =dim(College)[1]/2

train =sample(1:dim(College)[1], train.size)

test =-train

College.train =College[train, ]

College.test =College[test, ]

(b). Fit a linear model using least squares on the training set, and report the test error obtained

lm.fit =lm(Apps~., data=College.train)

lm.pred =predict(lm.fit, College.test)

mean((College.test[, "Apps"]-lm.pred)^2)

## [1] 1108531

The test MSE is 1108531.2630464

(c). Fit a ridge regression model on the training set, with λchosen by cross validation. Report the test error

obtained.

library(glmnet)

## Loading required package: Matrix

## Loading required package: foreach

## Loaded glmnet 2.0-16

train.mat =model.matrix(Apps~., data=College.train)

test.mat =model.matrix(Apps~., data=College.test)

grid =10 ^seq(4,-2,length=100)

mod.ridge =cv.glmnet(train.mat, College.train[, "Apps"], alpha=0,lambda=grid, thresh=1e-12)

lambda.best =mod.ridge$lambda.min

The best λfor ridge regression is 0.0114976

ridge.pred =predict(mod.ridge, newx=test.mat, s=lambda.best)

mean((College.test[, "Apps"]-ridge.pred)^2)

## [1] 1108512

The test MSE for ridge regression is 1108511.6962057, which is slightly smaller than linear regression

1

Discover Exercises of Statistics Purdue University

Partial preview of the text

Download ISLR Chapter 6 Linear Model Selection and Regularization Lab Manual and more Exercises Statistics in PDF only on Docsity!

Chapter 6 - Linear Model Selection and Regularization

Lab Solution

1 Problem 9

library (ISLR) set.seed (1)

(a). Split the data set into a training set and a test set.

train.size = dim (College)[1] / 2 train = sample (1: dim (College)[1], train.size) test = -train College.train = College[train, ] College.test = College[test, ]

(b). Fit a linear model using least squares on the training set, and report the test error obtained

lm.fit = lm (Apps~., data=College.train) lm.pred = predict (lm.fit, College.test) mean ((College.test[, "Apps"] - lm.pred)^2)

## [1] 1108531

The test MSE is 1108531.

(c). Fit a ridge regression model on the training set, with λ chosen by cross validation. Report the test error obtained.

library (glmnet)

_## Loading required package: Matrix

Loading required package: foreach

Loaded glmnet 2.0-_

train.mat = model.matrix (Apps~., data=College.train) test.mat = model.matrix (Apps~., data=College.test) grid = 10 ^ seq (4, -2, length=100) mod.ridge = cv.glmnet (train.mat, College.train[, "Apps"], alpha=0, lambda=grid, thresh=1e-12) lambda.best = mod.ridge$lambda.min

The best λ for ridge regression is 0.

ridge.pred = predict (mod.ridge, newx=test.mat, s=lambda.best) mean ((College.test[, "Apps"] - ridge.pred)^2)

## [1] 1108512

The test MSE for ridge regression is 1108511.6962057, which is slightly smaller than linear regression

(d). Fit a lasso model on the training set, with λ chosen by crossvalidation. Report the test error obtained, along with the number of non-zero coefficient estimates.

mod.lasso = cv.glmnet (train.mat, College.train[, "Apps"], alpha=1, lambda=grid, thresh=1e-12) lambda.best = mod.lasso$lambda.min

The best λ for the lasso is 28.

lasso.pred = predict (mod.lasso, newx=test.mat, s=lambda.best) mean ((College.test[, "Apps"] - lasso.pred)^2)

## [1] 1028718

The test MSE of 1028717.7713552 is lower than ridge regression and linear regression.

mod.lasso = glmnet ( model.matrix (Apps~., data=College), College[, "Apps"], alpha=1) predict (mod.lasso, s=lambda.best, type="coefficients")

19 x 1 sparse Matrix of class "dgCMatrix"

1 (Intercept) -649.

(Intercept).

PrivateYes -391.

Accept 1.

Enroll -0.

Top10perc 29.

Top25perc -0.

F.Undergrad.

P.Undergrad 0.

Outstate -0.

Room.Board 0.

Books.

Personal.

PhD -4.

Terminal -3.

S.F.Ratio 0.

perc.alumni -1.

Expend 0.

Grad.Rate 4.

The coefficients for several variables are pushed to zero.

(e). Fit a PCR model on the training set, with M chosen by crossvalidation. Report the test error obtained, along with the value of M selected by cross-validation.

library (pls)

_## Attaching package: ’pls’

The following object is masked from ’package:stats’:

loadings_

pcr.fit = pcr (Apps~., data=College.train, scale=T, validation="CV") validationplot (pcr.fit, val.type="MSEP")

Apps

number of components

MSEP

Here, 6 components seems best.

pls.pred = predict (pls.fit, College.test, ncomp=6) mean ((College.test[, "Apps"] - data.frame (pls.pred))^2)

Warning in mean.default((College.test[, "Apps"] - data.frame(pls.pred))^2): argument is not numeric

or logical: returning NA

[1] NA

The test MSE for PLS is NA

(g). Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

test.avg = mean (College.test[, "Apps"]) lm.test.r2=1- mean ((College.test[,"Apps"]-lm.pred)^2)/ mean ((College.test[,"Apps"]-test.avg)^2) ridge.test.r2=1- mean ((College.test[,"Apps"]-ridge.pred)^2)/ mean ((College.test[,"Apps"]-test.avg)^2)

lasso.test.r2=1- mean ((College.test[,"Apps"]-lasso.pred)^2)/ mean ((College.test[,"Apps"]-test.avg)^2) pcr.test.r2=1- mean ((College.test[,"Apps"]- data.frame (pcr.pred))^2)/ mean ((College.test[,"Apps"]-test.avg)^2)

Warning in mean.default((College.test[, "Apps"] - data.frame(pcr.pred))^2): argument is not numeric

or logical: returning NA

pls.test.r2=1- mean ((College.test[,"Apps"]- data.frame (pls.pred))^2)/ mean ((College.test[,"Apps"]-test.avg)^2)

Warning in mean.default((College.test[, "Apps"] - data.frame(pls.pred))^2): argument is not numeric

or logical: returning NA

rbind ( c ("OLS", "Ridge", "Lasso", "PCR", "PLS"), c (lm.test.r2, ridge.test.r2, lasso.test.r2, pcr.test.r2, pls.test.r2))

## [,1] [,2] [,3] [,4] [,5]

[1,] "OLS" "Ridge" "Lasso" "PCR" "PLS"

[2,] "0.901068223618923" "0.901069969877579" "0.908191243758643" NA NA

Each methodology has a reasonably large R^2 and they are all fairly close together. There may not be much of a difference between them.

ISLR Chapter 6 Linear Model Selection and Regularization Lab Manual, Exercises of Statistics

Related documents

Partial preview of the text

Download ISLR Chapter 6 Linear Model Selection and Regularization Lab Manual and more Exercises Statistics in PDF only on Docsity!

Chapter 6 - Linear Model Selection and Regularization

Lab Solution

1 Problem 9

## [1] 1108531

Loading required package: foreach

Loaded glmnet 2.0-_

## [1] 1108512

## [1] 1028718

19 x 1 sparse Matrix of class "dgCMatrix"

1

(Intercept) -649.

(Intercept).

PrivateYes -391.

Accept 1.

Enroll -0.

Top10perc 29.

Top25perc -0.

F.Undergrad.

P.Undergrad 0.

Outstate -0.

Room.Board 0.

Books.

Personal.

PhD -4.

Terminal -3.

S.F.Ratio 0.

perc.alumni -1.

Expend 0.

Grad.Rate 4.

The following object is masked from ’package:stats’:

loadings_

number of components

MSEP

Warning in mean.default((College.test[, "Apps"] - data.frame(pls.pred))^2): argument is not numeric

[1] NA

Warning in mean.default((College.test[, "Apps"] - data.frame(pcr.pred))^2): argument is not numeric

Warning in mean.default((College.test[, "Apps"] - data.frame(pls.pred))^2): argument is not numeric

## [,1] [,2] [,3] [,4] [,5]

[1,] "OLS" "Ridge" "Lasso" "PCR" "PLS"

[2,] "0.901068223618923" "0.901069969877579" "0.908191243758643" NA NA