Cost-Complexity Pruning Process for CART and QUEST Trees, Study notes of Mathematical Statistics

Alliance University Mathematical Statistics

Describes the cost-complexity pruning process for CART and QUEST trees, which finds the optimally pruned subtree for a given real number α. The process involves calculating the cost-complexity risk of a tree and finding the smallest optimally pruned subtree. The document also outlines how to generate a sequence of pruned subtrees and select the right-sized subtree from it.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar 🇮🇳



Cost-Complexity Pruning Process

Assuming a CART or QUEST tree has been grown successfully using a learning sample, this document describes the automatic cost-complexity pruning process for both CART and QUEST trees. The material is based on Classification and Regression Trees by Breiman et al. (1984). Calculations of the risk estimates used throughout this document are given in “Assignment and Risk Estimation” (TREE-assignment-risk.pdf).

Cost-Complexity Risk of a Tree T

Given a tree $T$ and a real number $\alpha$, the cost-complexity risk of $T$ with respect to $\alpha$ is

$$R_\alpha(T) = R(T) + \alpha\,|\tilde{T}|,$$

where $|\tilde{T}|$ is the number of terminal nodes of $T$ and $R(T)$ is the resubstitution risk estimate of $T$.
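As a small illustration of the formula above, the following sketch evaluates $R_\alpha(T)$ for two trees at a few values of $\alpha$. The function name and the two (risk, terminal node count) pairs are made-up values for illustration, not taken from the source.

```python
def cost_complexity_risk(resub_risk, n_terminal, alpha):
    """R_alpha(T) = R(T) + alpha * (number of terminal nodes of T)."""
    return resub_risk + alpha * n_terminal

# Hypothetical trees, given as (R(T), number of terminal nodes).
big_tree = (0.10, 8)
small_tree = (0.16, 3)

for alpha in (0.0, 0.01, 0.02):
    r_big = cost_complexity_risk(*big_tree, alpha)
    r_small = cost_complexity_risk(*small_tree, alpha)
    preferred = "big" if r_big < r_small else "small"
    print(f"alpha={alpha}: big={r_big:.3f} small={r_small:.3f} -> {preferred}")
```

With these numbers the larger tree has the lower cost-complexity risk for small $\alpha$, and the smaller tree takes over once the penalty per terminal node is large enough.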

Smallest Optimally Pruned Subtree

Pruned subtree: For any tree $T$, $T'$ is a pruned subtree of $T$ if $T'$ is a tree with the same root node as $T$ and all nodes of $T'$ are also nodes of $T$. Denote $T' \preceq T$ if $T'$ is a pruned subtree of $T$.

Optimally pruned subtree: Given $\alpha$, a pruned subtree $T'$ of $T$ is called an optimally pruned subtree of $T$ with respect to $\alpha$ if

$$R_\alpha(T') = \min_{T'' \preceq T} R_\alpha(T'').$$

The optimally pruned subtree may not be unique.

Smallest optimally pruned subtree: If $T' \preceq T''$ for any optimally pruned subtree $T'' \preceq T_0$ such that $R_\alpha(T') = R_\alpha(T'')$, then $T'$ is the smallest optimally pruned subtree of $T_0$ with respect to $\alpha$, and is denoted by $T_0(\alpha)$.
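The definitions above can be checked by brute force on a tiny tree. The sketch below enumerates every pruned subtree, computes its cost-complexity risk, and breaks ties in favour of the subtree with the fewest terminal nodes, which is the smallest optimally pruned subtree because it is contained in every other optimal one. The tree shape, the node risks, and the representation of a subtree as the set of nodes that remain split are all assumptions made for this illustration; exhaustive enumeration is only practical for very small trees.

```python
from fractions import Fraction

# Hypothetical 7-node binary tree: node -> (left, right), or None if terminal.
children = {1: (2, 3), 2: (4, 5), 3: (6, 7), 4: None, 5: None, 6: None, 7: None}
# Hypothetical resubstitution risks R(t), kept exact with Fraction.
risk = {1: Fraction(1, 2), 2: Fraction(3, 20), 3: Fraction(1, 4),
        4: Fraction(1, 20), 5: Fraction(1, 20), 6: Fraction(3, 25), 7: Fraction(3, 25)}

def pruned_subtrees(t, children):
    """Yield each pruned subtree of the branch at t as the set of nodes left split."""
    yield frozenset()                          # t itself becomes terminal
    if children[t] is not None:
        left, right = children[t]
        for a in pruned_subtrees(left, children):
            for b in pruned_subtrees(right, children):
                yield frozenset({t}) | a | b

def terminal_nodes(root, children, split):
    """Terminal nodes of the subtree whose internal nodes are `split`."""
    stack, leaves = [root], []
    while stack:
        t = stack.pop()
        if t in split:
            stack.extend(children[t])
        else:
            leaves.append(t)
    return leaves

def smallest_optimal(root, children, risk, alpha):
    """Return (R_alpha, split set) of the smallest optimally pruned subtree."""
    best = None
    for split in pruned_subtrees(root, children):
        leaves = terminal_nodes(root, children, split)
        r_alpha = sum(risk[t] for t in leaves) + alpha * len(leaves)
        key = (r_alpha, len(leaves))           # tie-break: fewest terminal nodes
        if best is None or key < best[0]:
            best = (key, split)
    return best[0][0], best[1]

r_alpha, split = smallest_optimal(1, children, risk, Fraction(1, 20))
print(r_alpha, sorted(split))
```

At $\alpha = 1/20$ two pruned subtrees of this example tie on cost-complexity risk, which shows why the optimally pruned subtree need not be unique and why the "smallest" qualifier is needed.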

Cost-Complexity Pruning Process

Suppose that a tree $T_0$ was grown. The cost-complexity pruning process consists of two steps:

1. Based on the learning sample, find a sequence of pruned subtrees $\{T_k\}_{k=0}^{K}$ of $T_0$ such that $T_0 \succ T_1 \succ T_2 \succ \dots \succ T_K$, where $T_K$ has only the root node of $T_0$.
2. Find an “honest” risk estimate $\hat{R}(T_k)$ of each subtree. Select a right-sized tree from the sequence of pruned subtrees.

Generate a sequence of smallest optimally pruned subtrees

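To generate the sequence of pruned subtrees in step 1, the cost-complexity pruning technique developed by Breiman et al. (1984) is used. Only the learning sample is used in generating the sequence. Given any real value $\alpha_{\min}$ ($\alpha_{\min} = 0$ in any SPSS implementation) and an initial tree $T_0$, there exists a sequence of real values $-\infty < \alpha_1 = \alpha_{\min} < \alpha_2 < \dots < \alpha_K < +\infty$ and a sequence of pruned subtrees $T_0 \succ T_1 \succ \dots \succ T_K$, such that the smallest optimally pruned subtree of $T_0$ for a given $\alpha$ is

$$T_0(\alpha) = \begin{cases} T_0, & \alpha < \alpha_1 \\ T_k, & \alpha_k \le \alpha < \alpha_{k+1},\ 0 < k < K \\ T_K, & \alpha \ge \alpha_K \end{cases}$$

where

$$\alpha_{k+1} = \min_{t \in T_k} g_k(t), \qquad T_{k+1} = \{\, t \in T_k : g_k(s) > \alpha_{k+1} \text{ for all ancestors } s \text{ of } t \,\},$$

$$g_k(t) = \begin{cases} \dfrac{R(t) - R(T_{k,t})}{|\tilde{T}_{k,t}| - 1}, & t \in T_k - \tilde{T}_k \\ +\infty, & t \in \tilde{T}_k \end{cases}$$

Here $\tilde{T}_k$ is the set of terminal nodes of $T_k$, $T_{k,t}$ is the branch of $T_k$ stemming from node $t$, and $R(t)$ is the resubstitution risk estimate of node $t$ based on the learning sample.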
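The recursion for $\alpha_{k+1}$ and $T_{k+1}$ can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SPSS implementation: the tree, the node risks, and the function names are hypothetical, the recursion is applied directly starting from $T_0$ without any special handling of $\alpha_{\min}$, and each subtree is represented by the set of nodes that are still split.

```python
from fractions import Fraction
import math

# Hypothetical 7-node binary tree: node -> (left, right), or None if terminal.
children = {1: (2, 3), 2: (4, 5), 3: (6, 7), 4: None, 5: None, 6: None, 7: None}
# Hypothetical resubstitution risks R(t), kept exact with Fraction.
risk = {1: Fraction(1, 2), 2: Fraction(3, 20), 3: Fraction(1, 4),
        4: Fraction(1, 20), 5: Fraction(1, 20), 6: Fraction(3, 25), 7: Fraction(3, 25)}

def branch_stats(t, children, risk, split):
    """Return (R(T_{k,t}), number of terminal nodes) of the branch rooted at t."""
    if t not in split:                         # t is terminal in the current tree
        return risk[t], 1
    left, right = children[t]
    r_l, n_l = branch_stats(left, children, risk, split)
    r_r, n_r = branch_stats(right, children, risk, split)
    return r_l + r_r, n_l + n_r

def g(t, children, risk, split):
    """g_k(t): risk increase per terminal node removed when the branch at t is cut."""
    if t not in split:
        return math.inf
    r_branch, n_leaves = branch_stats(t, children, risk, split)
    return (risk[t] - r_branch) / (n_leaves - 1)

def pruning_sequence(root, children, risk):
    """Return [(alpha_{k+1}, terminal node count of T_{k+1}), ...] down to the root."""
    split = {t for t, kids in children.items() if kids is not None}
    steps = []
    while split:
        gk = {t: g(t, children, risk, split) for t in split}
        alpha = min(gk.values())
        # Keep a split node only if it and all of its ancestors have g_k > alpha.
        kept, stack = set(), [root]
        while stack:
            t = stack.pop()
            if t in split and gk[t] > alpha:
                kept.add(t)
                stack.extend(children[t])
        split = kept
        steps.append((alpha, len(split) + 1))  # binary tree: leaves = splits + 1
    return steps

steps = pruning_sequence(1, children, risk)
for alpha, n_leaves in steps:
    print(alpha, n_leaves)
```

Exact fractions are used so that ties in $g_k(t)$ are detected reliably; with floating-point risks a small tolerance in the comparison would be a reasonable alternative.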

Explicit algorithm

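The algorithm can be used to generate a sequence of subtrees of $T_0$ for a given initial value $\alpha = \alpha_{\min}$ and an initial tree $T_0 = \{1, \dots, \#T_0\}$, where $\#T_0$ is the number of nodes in $T_0$. For node $t$, let

$$\mathrm{lt}(t) = \begin{cases} 0, & t \text{ is terminal} \\ \text{left child of } t, & \text{otherwise} \end{cases}$$

$$\mathrm{rt}(t) = \begin{cases} 0, & t \text{ is terminal} \\ \text{right child of } t, & \text{otherwise} \end{cases}$$

$$\mathrm{pa}(t) = \begin{cases} 0, & t \text{ is the root node} \\ \text{parent of } t, & \text{otherwise} \end{cases}$$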
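One way to hold the three node maps defined above is as plain arrays indexed by node number, with index 0 left unused so that 0 can serve as the "none" marker. The array layout, the example tree, and the helper names below are assumptions for illustration; the source does not prescribe a data structure.

```python
# Nodes are numbered 1..7; entry 0 is a placeholder so that 0 can mean "none".
lt = [0, 2, 4, 6, 0, 0, 0, 0]   # lt[t]: left child of t, or 0 if t is terminal
rt = [0, 3, 5, 7, 0, 0, 0, 0]   # rt[t]: right child of t, or 0 if t is terminal

def build_parents(lt, rt):
    """Derive pa[t] (parent of t, or 0 for the root) from the child arrays."""
    pa = [0] * len(lt)
    for t in range(1, len(lt)):
        for child in (lt[t], rt[t]):
            if child != 0:
                pa[child] = t
    return pa

def is_terminal(t, lt):
    return lt[t] == 0

def ancestors(t, pa):
    """List the ancestors of t, nearest first, by following pa up to the root."""
    out = []
    while pa[t] != 0:
        t = pa[t]
        out.append(t)
    return out

pa = build_parents(lt, rt)
print(pa)
print(ancestors(7, pa))
```

Following `pa` upward is what makes a test such as "for all ancestors $s$ of $t$" cheap to evaluate without recursion.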

Selecting the Right Sized Subtree

To select the right-sized pruned subtree from the sequence of pruned subtrees $\{T_k\}_{k=0}^{K}$ of $T_0$, an “honest” method is used to estimate the risk $\hat{R}(T_k)$ and its standard error $se(\hat{R}(T_k))$ for each subtree $T_k$. Two methods can be used: resubstitution estimation, which is used if there is no test sample, and test sample estimation, which is used if a test sample is available. Select the subtree $T_{k^*}$ as the right-sized subtree of $T_0$ based on one of the following rules.

Simple rule

The right-sized tree is selected as the $k^* \in \{0, 1, 2, \dots, K\}$ such that

$$\hat{R}(T_{k^*}) = \min_k \hat{R}(T_k).$$

The b-SE rule

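For any nonnegative real value $b$ (default $b = 1$), the right-sized tree is selected as the largest $k^{**} \in \{0, 1, 2, \dots, K\}$ such that

$$\hat{R}(T_{k^{**}}) \le \hat{R}(T_{k^*}) + b \cdot se(\hat{R}(T_{k^*})),$$

where $k^*$ is the index that minimizes $\hat{R}(T_k)$, as in the simple rule.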
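Both selection rules reduce to a few lines once the risk estimates and standard errors of the subtrees are available. The sketch below assumes they are given as two lists indexed by $k$; the numbers are invented for illustration, and resolving ties in the simple rule toward the smallest $k$ is an assumption, since the source does not say how ties are handled.

```python
def simple_rule(risks):
    """Index k* of the subtree with minimum estimated risk (first one on ties)."""
    return min(range(len(risks)), key=risks.__getitem__)

def b_se_rule(risks, std_errors, b=1.0):
    """Largest k** with risk within b standard errors of the minimum risk."""
    k_star = simple_rule(risks)
    threshold = risks[k_star] + b * std_errors[k_star]
    return max(k for k, r in enumerate(risks) if r <= threshold)

# Hypothetical estimates for T_0, ..., T_5 (T_5 is the root-only tree).
risks = [0.30, 0.27, 0.25, 0.26, 0.31, 0.45]
std_errors = [0.03, 0.03, 0.02, 0.02, 0.03, 0.04]

print(simple_rule(risks))
print(b_se_rule(risks, std_errors, b=1.0))
print(b_se_rule(risks, std_errors, b=0.0))
```

Because larger $k$ means a more heavily pruned tree, the b-SE rule trades a small, statistically insignificant increase in estimated risk for a simpler tree; with $b = 0$ it picks the most heavily pruned tree among those attaining the minimum risk.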

References

Breiman, L., Friedman, J.H., Olshen, R., and Stone, C.J., 1984. Classification and Regression Trees. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, California.
