

This document describes the cost-complexity pruning process for CART and QUEST trees, which finds the optimally pruned subtree for a given real number α. The process involves calculating the cost-complexity risk of a tree and finding the smallest optimally pruned subtree. The document also provides an algorithm to generate a sequence of pruned subtrees and rules for selecting the right-sized subtree.


Assuming a CART or QUEST tree has been grown successfully using a learning sample, this document describes the automatic cost-complexity pruning process for both CART and QUEST trees. The material in this document is based on Classification and Regression Trees by Breiman et al. (1984). Calculations of the risk estimates used throughout this document are given in “Assignment and Risk Estimation” (TREE-assignment-risk.pdf).
Given a tree $T$ and a real number $\alpha$, the cost-complexity risk of $T$ with respect to $\alpha$ is
$$R_\alpha(T) = R(T) + \alpha\,|\tilde{T}|,$$
where $|\tilde{T}|$ is the number of terminal nodes and $R(T)$ is the resubstitution risk estimate of $T$.
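As a small illustration of the definition, the sketch below evaluates the cost-complexity risk for two trees. The function name `cost_complexity_risk` and all numbers are hypothetical and chosen only to show how the penalty term can reverse the ranking of a large and a small tree.

```python
def cost_complexity_risk(resub_risk, n_terminal, alpha):
    """R_alpha(T) = R(T) + alpha * (number of terminal nodes of T)."""
    return resub_risk + alpha * n_terminal


# Hypothetical trees: a large tree with low resubstitution risk
# and a small tree with higher resubstitution risk.
alpha = 0.02
large = cost_complexity_risk(resub_risk=0.10, n_terminal=8, alpha=alpha)
small = cost_complexity_risk(resub_risk=0.18, n_terminal=3, alpha=alpha)

# With this alpha the penalty outweighs the lower risk of the large tree.
print(large, small, small < large)
```

With `alpha = 0` the large tree would have the lower value, since only the resubstitution risk would count.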
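Write $T' \preceq T$ when $T'$ is a pruned subtree of $T$. A pruned subtree $T'$ of $T$ is called an optimally pruned subtree of $T$ with respect to $\alpha$ if
$$R_\alpha(T') = \min_{T'' \preceq T} R_\alpha(T'').$$
The optimally pruned subtree may not be unique. For a tree $T_0$, the optimally pruned subtree that is itself a pruned subtree of every other optimally pruned subtree is called the smallest optimally pruned subtree of $T_0$ with respect to $\alpha$, and is denoted by $T_0(\alpha)$.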
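Suppose that a tree $T_0$ was grown. The cost-complexity pruning process consists of two steps:
1. Generate a sequence of pruned subtrees of $T_0$.
2. Select the right-sized subtree from that sequence.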
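To generate a sequence of pruned subtrees in step 1, the cost-complexity pruning technique developed by Breiman et al. (1984) is used. In generating the sequence of subtrees, only the learning sample is used. Given any real value $\alpha$ and an initial tree $T_0$, there exists a sequence of real values
$$\alpha_1 < \alpha_2 < \dots < \alpha_K$$
and a sequence of pruned subtrees
$$T_0 \succ T_1 \succ \dots \succ T_K$$
such that the smallest optimally pruned subtree of $T_0$ for a given $\alpha$ is $T_0(\alpha) = T_k$ whenever $\alpha_k \le \alpha < \alpha_{k+1}$,
where
$$g_k(t) = \begin{cases} \dfrac{R(t) - R(T_{kt})}{|\tilde{T}_{kt}| - 1}, & t \in T_k \setminus \tilde{T}_k, \\ +\infty, & \text{otherwise,} \end{cases}$$
$$\alpha_{k+1} = \min_{t \in T_k} g_k(t), \qquad T_{k+1} = \{\, t \in T_k : g_k(s) > \alpha_{k+1} \text{ for all ancestors } s \text{ of } t \,\},$$
$T_{kt}$ is the branch of $T_k$ stemming from node $t$, $|\tilde{T}_{kt}|$ is the number of terminal nodes of that branch, and $R(t)$ is the resubstitution risk estimate of node $t$ based on the learning sample.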
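The recursion for $g_k(t)$, $\alpha_{k+1}$ and $T_{k+1}$ can be sketched in a few lines of Python. This is a minimal illustration rather than the document's explicit algorithm: the tree encoding (a dict from each internal node to its children, with terminal nodes absent from the dict), the function names, the numerical tolerance and the example risks are all assumptions made for the example.

```python
def leaves_and_risk(children, risk, t):
    """Return (number of terminal nodes, summed risk) of the branch rooted at t."""
    if t not in children:          # t is a terminal node of the current tree
        return 1, risk[t]
    n_leaves, total = 0, 0.0
    for child in children[t]:
        n, r = leaves_and_risk(children, risk, child)
        n_leaves += n
        total += r
    return n_leaves, total


def prune_at(children, t):
    """Turn t into a terminal node by dropping t and its internal descendants."""
    stack = [t]
    while stack:
        kids = children.pop(stack.pop(), None)
        if kids:
            stack.extend(kids)


def pruning_sequence(children, risk, root, tol=1e-12):
    """Return [(alpha_1, T_1), (alpha_2, T_2), ...] down to the root-only tree."""
    current = dict(children)       # work on a copy of the internal-node map
    sequence = []
    while root in current:         # stop once the root itself is terminal
        g = {}
        for t in current:
            n_leaves, branch_risk = leaves_and_risk(current, risk, t)
            g[t] = (risk[t] - branch_risk) / (n_leaves - 1)
        alpha = min(g.values())
        for t, g_t in g.items():   # prune every weakest link (ties included)
            if g_t <= alpha + tol and t in current:
                prune_at(current, t)
        sequence.append((alpha, dict(current)))
    return sequence


# Hypothetical tree: node 1 splits into 2 and 3, node 2 splits into 4 and 5.
children = {1: (2, 3), 2: (4, 5)}
risk = {1: 0.50, 2: 0.25, 3: 0.10, 4: 0.05, 5: 0.10}
for alpha, tree in pruning_sequence(children, risk, root=1):
    print(round(alpha, 4), tree)
```

In this example node 2 is the first weakest link, so the first pruned tree keeps only the split at the root, and the second step collapses the tree to the root node.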
Explicit algorithm
The algorithm can be used to generate a sequence of subtrees of $T_0$ for a given initial value $\alpha = \alpha_{\min}$ and an initial tree $T_0 = \{1, \dots, \#T_0\}$, where $\#T_0$ is the number of nodes in $T_0$. For node $t$, let
To select the right-sized pruned subtree from the sequence of pruned subtrees $\{T_k\}_{k=0}^{K}$ of $T_0$, an “honest” method is used to estimate the risk $\hat{R}(T_k)$ and its standard error $se(\hat{R}(T_k))$ for each subtree $T_k$. Two methods can be used: the resubstitution estimation method and the test sample estimation method. Resubstitution estimation is used if there is no test sample. Test sample estimation is used if there is a test sample. Select the subtree $T_{k^*}$ as the right-sized subtree of $T_0$ based on one of the following rules.
Simple rule
The right-sized tree is selected as the $k^* \in \{0, 1, 2, \dots, K\}$ such that
$$\hat{R}(T_{k^*}) = \min_{k} \hat{R}(T_k).$$
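The simple rule amounts to taking the index of the smallest estimated risk. The sketch below assumes the risk estimates are already available as a list indexed by $k$; the function name and the numbers are hypothetical, and breaking ties toward the smallest $k$ is an assumption, since the text does not state a tie-breaking convention.

```python
def simple_rule(risks):
    """Return k* minimizing the estimated risk; ties go to the smallest k (assumption)."""
    return min(range(len(risks)), key=lambda k: risks[k])


# Hypothetical risk estimates for subtrees T_0, ..., T_4.
risks = [0.30, 0.26, 0.25, 0.265, 0.40]
k_star = simple_rule(risks)
print(k_star)
```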
The b-SE rule
For any nonnegative real value $b$ (default $b = 1$), the right-sized tree is selected as the largest $k^{**} \in \{0, 1, 2, \dots, K\}$ such that
$$\hat{R}(T_{k^{**}}) \le \hat{R}(T_{k^*}) + b \cdot se(\hat{R}(T_{k^*})),$$
where $k^*$ is the index selected by the simple rule.
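A minimal sketch of the b-SE rule follows, assuming the risk estimates and their standard errors are given as two lists indexed by $k$. The function name and the numbers are hypothetical. Because larger $k$ corresponds to a more heavily pruned subtree, the rule picks the smallest tree whose estimated risk is within $b$ standard errors of the minimum.

```python
def b_se_rule(risks, std_errors, b=1.0):
    """Return the largest k whose risk is within b standard errors of the minimum."""
    k_star = min(range(len(risks)), key=lambda k: risks[k])   # simple-rule index
    threshold = risks[k_star] + b * std_errors[k_star]
    return max(k for k in range(len(risks)) if risks[k] <= threshold)


# Hypothetical estimates for subtrees T_0, ..., T_4.
risks = [0.30, 0.26, 0.25, 0.265, 0.40]
std_errors = [0.03, 0.03, 0.02, 0.02, 0.03]

k_one_se = b_se_rule(risks, std_errors)          # default b = 1
k_zero_se = b_se_rule(risks, std_errors, b=0.0)  # reduces to the minimum-risk tree here
print(k_one_se, k_zero_se)
```

Here the minimum risk is at $k = 2$, and with $b = 1$ the more heavily pruned subtree at $k = 3$ is selected because its risk falls under the threshold.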
References
Breiman, L., Friedman, J.H., Olshen, R., and Stone, C.J., 1984. Classification and Regression Trees. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, California.