
ISLR Chapter 8 Tree-Based Methods Lab Manual, Exercises of Statistics

Colorado State University (CSU), Statistics

Introduction to Statistical Learning (James/Witten/Hastie/Tibshirani)

Typology: Exercises

2020/2021

Uploaded on 05/26/2021 by ekansh


Chapter 8 - Tree-Based Methods

Lab Solution

1 Problem 9

(a). Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

library(ISLR)
set.seed(1)
# Sample 800 row indices for the training set; all remaining rows form the test set
train = sample(nrow(OJ), 800)
OJ.train = OJ[train, ]
OJ.test = OJ[-train, ]
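The split samples 800 row indices and uses negative indexing to take the complement. A minimal base-R sketch of the same idea, wrapped in a hypothetical helper `split_train_test()` and run on a made-up toy data frame with the same number of rows as `OJ` (1070), checks that the two sets are disjoint and together cover every row:

```r
# Hypothetical helper (not from the lab): random training set of n_train rows,
# test set = every remaining row, using the same negative-indexing idea as above
split_train_test = function(df, n_train, seed = 1) {
  set.seed(seed)  # note: resets the global random number generator
  idx = sample(nrow(df), n_train)
  list(train = df[idx, , drop = FALSE], test = df[-idx, , drop = FALSE], idx = idx)
}

# Toy data frame standing in for OJ (1070 rows, like the ISLR OJ data)
toy = data.frame(id = seq_len(1070), x = rnorm(1070))
parts = split_train_test(toy, 800)
nrow(parts$train)  # 800
nrow(parts$test)   # 270
```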

(b). Fit a tree to the training data, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics about the tree, and describe the results obtained. What is the training error rate? How many terminal nodes does the tree have?

library(tree)
# Classification tree: Purchase (CH or MM) against all other variables
oj.tree = tree(Purchase ~ ., data = OJ.train)
summary(oj.tree)

##
## Classification tree:
## tree(formula = Purchase ~ ., data = OJ.train)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "SpecialCH" "ListPriceDiff"
## Number of terminal nodes: 8
## Residual mean deviance: 0.7305 = 578.6 / 792
## Misclassification error rate: 0.165 = 132 / 800

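The training error rate is 0.165 (132 of the 800 training observations are misclassified), and the tree has 8 terminal nodes. Only four predictors (LoyalCH, PriceDiff, SpecialCH and ListPriceDiff) are used in the splits.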
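Both summary figures can be checked by hand. The sketch below uses the deviance definition given in the ISLR lab, where n_{mk} is the number of training observations of class k in terminal node m, \hat{p}_{mk} is the corresponding class proportion, and |T_0| = 8 is the number of terminal nodes:

```latex
D = -2 \sum_{m} \sum_{k} n_{mk} \log \hat{p}_{mk}, \qquad
\frac{D}{n - |T_0|} = \frac{578.6}{800 - 8} \approx 0.731, \qquad
\text{training error} = \frac{132}{800} = 0.165
```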

(c). Type in the name of the tree object in order to get a detailed text output. Pick one of the terminal nodes, and interpret the information displayed.

oj.tree

## node), split, n, deviance, yval, (yprob)
##       * denotes terminal node
##
##  1) root 800 1064.00 CH ( 0.61750 0.38250 )
##    2) LoyalCH < 0.508643 350 409.30 MM ( 0.27143 0.72857 )
##      4) LoyalCH < 0.264232 166 122.10 MM ( 0.12048 0.87952 )
##        8) LoyalCH < 0.0356415 57 10.07 MM ( 0.01754 0.98246 ) *
##        9) LoyalCH > 0.0356415 109 100.90 MM ( 0.17431 0.82569 ) *
##      5) LoyalCH > 0.264232 184 248.80 MM ( 0.40761 0.59239 )
##       10) PriceDiff < 0.195 83 91.66 MM ( 0.24096 0.75904 )
##         20) SpecialCH < 0.5 70 60.89 MM ( 0.15714 0.84286 ) *
##         21) SpecialCH > 0.5 13 16.05 CH ( 0.69231 0.30769 ) *
##       11) PriceDiff > 0.195 101 139.20 CH ( 0.54455 0.45545 ) *
##    3) LoyalCH > 0.508643 450 318.10 CH ( 0.88667 0.11333 )
##      6) LoyalCH < 0.764572 172 188.90 CH ( 0.76163 0.23837 )
##       12) ListPriceDiff < 0.235 70 95.61 CH ( 0.57143 0.42857 ) *
##       13) ListPriceDiff > 0.235 102 69.76 CH ( 0.89216 0.10784 ) *
##      7) LoyalCH > 0.764572 278 86.14 CH ( 0.96403 0.03597 ) *

The * symbols denote terminal nodes. Take the node labeled "7)": it contains the 278 training observations with LoyalCH > 0.764572, its deviance is 86.14, and about 96.4% of these observations have Purchase = CH (3.6% have MM), so the node predicts "CH".
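The n, deviance and yprob columns are linked. The sketch below assumes the ISLR node deviance -2 Σ n_k log(n_k / n), recovers each terminal node's class counts as n × yprob (rounded to whole observations), and reproduces the totals reported by summary() in (b). The names `node_deviance` and `leaves` are illustrative, not part of the tree package:

```r
# Deviance of one classification-tree node: -2 * sum_k n_k * log(n_k / n),
# treating 0 * log(0) as 0
node_deviance = function(counts) {
  p = counts / sum(counts)
  -2 * sum(ifelse(counts > 0, counts * log(p), 0))
}

# Class counts in the eight terminal nodes, recovered as n * yprob from the
# printed tree (e.g. node 7: 278 * 0.96403 = 268 CH and 10 MM)
leaves = rbind("8"  = c(1, 56),   "9"  = c(19, 90),
               "20" = c(11, 59),  "21" = c(9, 4),
               "11" = c(55, 46),  "12" = c(40, 30),
               "13" = c(91, 11),  "7"  = c(268, 10))
colnames(leaves) = c("CH", "MM")

leaf_dev = apply(leaves, 1, node_deviance)  # matches the printed deviances
total_dev = sum(leaf_dev)                   # about 578.6
total_dev / (800 - nrow(leaves))            # about 0.7305, the residual mean deviance
sum(apply(leaves, 1, min))                  # 132: each leaf predicts its majority class
```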

(d). Create a plot of the tree, and interpret the results.

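plot(oj.tree)
# pretty = 0 prints category names rather than letters for factor splits
text(oj.tree, pretty = 0)

LoyalCH is the most important variable in the tree: it is used for the root split and for both splits at the second level. Customers with LoyalCH below 0.508643 are mostly predicted to buy MM, while every terminal node with LoyalCH above 0.508643 predicts CH; PriceDiff, SpecialCH and ListPriceDiff only refine these groups further down. Some splits (node 4 on LoyalCH, node 6 on ListPriceDiff) produce two terminal nodes with the same prediction, so they increase node purity without changing the predicted class.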

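# Cross-validate the cost-complexity pruning sequence (cv.tree uses K = 10 folds by default);
# FUN = prune.tree scores each tree size by deviance
cv.oj = cv.tree(oj.tree, FUN = prune.tree)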
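cv.tree() runs K-fold cross-validation (K = 10 by default) and also accepts a `rand` argument that assigns each training case to a fold. As a hedged sketch, a hypothetical `make_folds()` helper can build a balanced fold assignment in base R:

```r
# Hypothetical helper: assign n cases to K folds whose sizes differ by at most one
make_folds = function(n, K = 10) {
  sample(rep(seq_len(K), length.out = n))
}

set.seed(1)
folds = make_folds(800, K = 10)
table(folds)  # ten folds of 80 training observations each
```

Passing such a vector, for example `cv.tree(oj.tree, rand = folds, FUN = prune.tree)`, would make repeated cross-validation runs, or a deviance run and a `prune.misclass` run, use exactly the same folds.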

(g). Produce a plot with tree size on the x-axis and cross-validated classification error rate on the y-axis.

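plot(cv.oj$size, cv.oj$dev, type = "b", xlab = "Tree Size", ylab = "Deviance")

Because cv.tree() was run with FUN = prune.tree, the y-axis shows the cross-validated deviance rather than the classification error rate. Running cv.tree(oj.tree, FUN = prune.misclass) instead would make $dev the number of cross-validated misclassifications, and dividing it by 800 would give the error rate the question asks for.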

[Figure: cross-validated deviance against tree size]

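(h). Which tree size corresponds to the lowest cross-validated classification error rate?

In this run, the lowest cross-validated deviance occurs at a tree size of 5: the tree pruned to cv.oj$size[which.min(cv.oj$dev)] in (i) has 5 terminal nodes, as its summary in (j) shows.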

(i). Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation. If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with five terminal nodes.

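# Prune to the tree size with the lowest cross-validated deviance
oj.pruned = prune.tree(oj.tree, best = cv.oj$size[which.min(cv.oj$dev)])

Cross-validation selects a pruned tree here (5 terminal nodes rather than 8), so the five-node fallback is not needed, although it happens to give the same size.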
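which.min() returns the first minimum, and cv.tree() lists sizes from largest to smallest, so ties are broken in favour of the larger tree. A sketch of an alternative rule, using a hypothetical `choose_size()` helper and made-up CV values, prefers the smallest tree among ties and applies the question's five-node fallback when cross-validation keeps the full tree:

```r
# Hypothetical helper: pick a pruning size from cv.tree-style size/dev vectors.
# Ties go to the smallest tree; if CV picks the full tree, use `fallback` nodes.
choose_size = function(size, dev, fallback = 5) {
  best = min(size[dev == min(dev)])
  if (best == max(size)) fallback else best
}

# Made-up CV results for illustration (sizes listed largest first, as in cv.tree)
size = c(8, 7, 6, 5, 4, 3, 2, 1)
dev  = c(700, 705, 690, 690, 720, 760, 800, 1070)

choose_size(size, dev)              # 5: the tie between sizes 6 and 5 goes to 5
size[which.min(dev)]                # 6: which.min() keeps the first (larger) tie
choose_size(size, c(650, dev[-1]))  # 5: the full tree wins, so the fallback is used
```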

(j). Compare the training error rates between the pruned and unpruned trees. Which is higher?

summary(oj.pruned)
##
## Classification tree:
## snip.tree(tree = oj.tree, nodes = 4:5)
## Variables actually used in tree construction:
## [1] "LoyalCH" "ListPriceDiff"
## Number of terminal nodes: 5
## Residual mean deviance: 0.7829 = 622.4 / 795
## Misclassification error rate: 0.1825 = 146 / 800

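The pruned tree has the higher training error rate: 0.1825 (146/800), against 0.165 (132/800) for the unpruned tree. This is expected, because collapsing terminal nodes into their parent can keep or increase, but never reduce, the number of misclassified training observations.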
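Both training error counts can be reconstructed from the printed node tables. The sketch below recovers class counts as n × yprob (variable names are illustrative) and collapses leaves 8 and 9 into node 4 and leaves 20, 21 and 11 into node 5, as the snip.tree(nodes = 4:5) call indicates:

```r
# Class counts per terminal node of the unpruned tree, recovered as n * yprob
unpruned = rbind("8" = c(1, 56), "9" = c(19, 90), "20" = c(11, 59), "21" = c(9, 4),
                 "11" = c(55, 46), "12" = c(40, 30), "13" = c(91, 11), "7" = c(268, 10))
colnames(unpruned) = c("CH", "MM")

# Snipping at nodes 4 and 5 merges their leaves; nodes 12, 13 and 7 are unchanged
pruned = rbind("4" = colSums(unpruned[c("8", "9"), ]),
               "5" = colSums(unpruned[c("20", "21", "11"), ]),
               unpruned[c("12", "13", "7"), ])

# Each leaf predicts its majority class, so its training errors are the minority count
train_errors = function(leaves) sum(apply(leaves, 1, min))

train_errors(unpruned)  # 132, i.e. 132 / 800 = 0.165
train_errors(pruned)    # 146, i.e. 146 / 800 = 0.1825
```

Collapsing node 4 costs nothing (both of its leaves already predicted MM), while collapsing node 5 adds 14 errors, because leaves 21 and 11 predicted CH inside a node whose majority class is MM.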

(k). Compare the test error rates between the pruned and unpruned trees. Which is higher?

# Test error rate of the unpruned tree
pred.unpruned = predict(oj.tree, OJ.test, type = "class")
misclass.unpruned = sum(OJ.test$Purchase != pred.unpruned)
misclass.unpruned / length(pred.unpruned)

## [1] 0.

# Test error rate of the pruned tree
pred.pruned = predict(oj.pruned, OJ.test, type = "class")
misclass.pruned = sum(OJ.test$Purchase != pred.pruned)
misclass.pruned / length(pred.pruned)

## [1] 0.

The pruned tree has the higher test error rate.
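The lab code computes the test error as the share of mismatched labels. With made-up labels (not the OJ test set), the sketch below shows that this equals one minus the share of the confusion matrix lying on its diagonal; mean(actual != predicted) gives the same value:

```r
# Made-up labels for illustration only (not the OJ test set)
actual    = factor(c("CH", "CH", "MM", "MM", "CH", "MM", "CH", "CH"), levels = c("CH", "MM"))
predicted = factor(c("CH", "MM", "MM", "CH", "CH", "MM", "CH", "CH"), levels = c("CH", "MM"))

# Confusion matrix: rows are predicted labels, columns are true labels
conf = table(predicted, actual)
conf

# Test error rate computed two equivalent ways
err_from_counts = sum(actual != predicted) / length(predicted)  # as in the lab code
err_from_table  = 1 - sum(diag(conf)) / sum(conf)               # off-diagonal share
```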
