




Main discussion in this file is about CHAID, Exhaustive CHAID Algorithms, Merging, Splitting, Stopping, The p-Value Calculations, Continuous dependent variable, Expected Cell Frequencies





This document describes the tree-growing process of the CHAID and Exhaustive CHAID algorithms. The CHAID algorithm was originally proposed by Kass (1980), and Exhaustive CHAID by Biggs et al. (1991). Both algorithms allow multiple splits of a node, and both consist of three steps: merging, splitting and stopping. A tree is grown by repeatedly applying these three steps to each node, starting from the root node.
Y    The dependent variable, or target variable. It can be ordinal categorical, nominal categorical or continuous. If Y is categorical with J classes, its class takes values in C = {1, …, J}.
X_m, m = 1, …, M    The set of all predictor variables. A predictor can be ordinal categorical, nominal categorical or continuous.
{(x_n, y_n)}, n = 1, …, N    The whole learning sample.
w_n    The case weight associated with case n.
f_n    The frequency weight associated with case n. A non-integral positive value is rounded to its nearest integer.
The following algorithm only accepts nominal or ordinal categorical predictors. When predictors are continuous, they are transformed into ordinal predictors before using the following algorithm.
For each predictor variable X, merge non-significant categories. Each final category of X will result in one child node if X is used to split the node. The merging step also calculates the adjusted p-value that is to be used in the splitting step.
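… If the p-value of the most similar pair of categories is larger than the user-specified merging level, the pair is merged into a single compound category. Then a new set of categories of X is formed. If it is not, go to step 7.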
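The merging loop can be sketched in Python under several simplifying assumptions: a nominal dependent variable, no case or frequency weights, a Pearson chi-squared test for every allowable pair, and none of the optional steps. The names `chi2_sf`, `pearson_p` and `merge_categories`, and the default `alpha_merge=0.05`, are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations

def chi2_sf(x, d):
    """Upper tail Pr(chi2_d > x), from the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, t = d / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= t / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-t + a * math.log(t) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def pearson_p(rows):
    """p-value of the Pearson test on a count table (assumes no empty row)."""
    cols = [c for c in zip(*rows) if sum(c) > 0]  # drop classes with no cases
    if len(cols) < 2:
        return 1.0
    rows = list(zip(*cols))
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in cols]
    n = sum(row_tot)
    x2 = 0.0
    for i, r in enumerate(rows):
        for j, n_ij in enumerate(r):
            m_ij = row_tot[i] * col_tot[j] / n
            x2 += (n_ij - m_ij) ** 2 / m_ij
    return chi2_sf(x2, (len(rows) - 1) * (len(cols) - 1))

def merge_categories(counts, alpha_merge=0.05, ordinal=False):
    """counts maps each category of X to its list of class counts of Y."""
    groups = [([cat], list(row)) for cat, row in counts.items()]
    while len(groups) > 2:
        if ordinal:   # only adjacent categories may be merged
            pairs = [(k, k + 1) for k in range(len(groups) - 1)]
        else:         # any two categories may be merged
            pairs = list(combinations(range(len(groups)), 2))
        p_values = {pq: pearson_p([groups[pq[0]][1], groups[pq[1]][1]]) for pq in pairs}
        a, b = max(pairs, key=p_values.get)   # most similar pair
        if p_values[(a, b)] <= alpha_merge:
            break                             # remaining pairs all differ significantly
        merged = (groups[a][0] + groups[b][0],
                  [u + v for u, v in zip(groups[a][1], groups[b][1])])
        groups = groups[:a] + [merged] + [g for k, g in enumerate(groups) if k > a and k != b]
    return [cats for cats, _ in groups]
```

For example, `merge_categories({"a": [30, 10], "b": [29, 11], "c": [10, 30]})` merges the two nearly identical categories and keeps the third apart.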
The “best” split for each predictor is found in the merging step. The splitting step selects which predictor is used to split the node, by comparing the adjusted p-values associated with the predictors. The adjusted p-value of each predictor is obtained in the merging step.
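As an illustration, the comparison of adjusted p-values might look like the following Python sketch. The function name `select_split_predictor` and the threshold `alpha_split` are hypothetical; the rule of declining to split when even the smallest adjusted p-value exceeds the threshold is the usual CHAID convention and is assumed here rather than taken from the text above.

```python
def select_split_predictor(adjusted_p, alpha_split=0.05):
    """Return the predictor with the smallest adjusted p-value, or None.

    adjusted_p maps predictor names to the adjusted p-values produced by
    the merging step. alpha_split is an assumed user-specified level.
    """
    if not adjusted_p:
        return None
    best = min(adjusted_p, key=adjusted_p.get)
    # Assumption: the node is left unsplit when no predictor is significant.
    return best if adjusted_p[best] <= alpha_split else None
```

With `{"age": 0.001, "income": 0.03}` the sketch picks `"age"`; with `{"age": 0.2}` it returns `None`, leaving the node as a terminal node.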
The stopping step checks if the tree growing process should be stopped according to the following stopping rules.
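If the dependent variable Y is continuous, perform an ANOVA F test that tests whether the means of Y for different categories of X are the same. This ANOVA F test calculates the F-statistic and hence derives the p-value as

\[
F = \frac{\sum_{i=1}^{I} \sum_{n \in D} w_n f_n I(x_n = i)\,(\bar{y}_i - \bar{y})^2 \,/\, (I - 1)}
         {\sum_{i=1}^{I} \sum_{n \in D} w_n f_n I(x_n = i)\,(y_n - \bar{y}_i)^2 \,/\, (N_f - I)},
\]
\[
p = \Pr\bigl(F(I - 1, N_f - I) > F\bigr),
\]
where
\[
\bar{y}_i = \frac{\sum_{n \in D} w_n f_n y_n I(x_n = i)}{\sum_{n \in D} w_n f_n I(x_n = i)}, \qquad
\bar{y} = \frac{\sum_{n \in D} w_n f_n y_n}{\sum_{n \in D} w_n f_n}, \qquad
N_f = \sum_{n \in D} f_n,
\]
and F(I − 1, N_f − I) is a random variable following an F-distribution with I − 1 and N_f − I degrees of freedom. Here D is the set of cases used in the test, I(·) inside the sums is the indicator function, and I elsewhere is the number of categories of X.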
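The weighted F-statistic can be computed directly from its definition. The following Python sketch is one way to do so; the function name `anova_f` and its argument layout are hypothetical. It returns the statistic with its two degrees of freedom and leaves the tail probability to an F-distribution routine of the reader's choice.

```python
def anova_f(x, y, w=None, f=None):
    """Weighted one-way ANOVA F-statistic for categories x and responses y.

    w are case weights and f are frequency weights (both default to 1).
    Returns (F, I - 1, N_f - I). Assumes at least two categories and
    a non-zero within-category sum of squares.
    """
    n = len(y)
    w = [1.0] * n if w is None else w
    f = [1] * n if f is None else f
    wf = [w[k] * f[k] for k in range(n)]
    grand_mean = sum(wf[k] * y[k] for k in range(n)) / sum(wf)

    between = within = 0.0
    categories = sorted(set(x))
    for c in categories:
        idx = [k for k in range(n) if x[k] == c]
        weight_c = sum(wf[k] for k in idx)
        mean_c = sum(wf[k] * y[k] for k in idx) / weight_c
        between += weight_c * (mean_c - grand_mean) ** 2
        within += sum(wf[k] * (y[k] - mean_c) ** 2 for k in idx)

    num_cat = len(categories)   # I in the formula
    n_f = sum(f)                # N_f in the formula
    f_stat = (between / (num_cat - 1)) / (within / (n_f - num_cat))
    return f_stat, num_cat - 1, n_f - num_cat
```

For `x = [1, 1, 2, 2]` and `y = [1, 2, 3, 4]` the between and within sums of squares are 4 and 1, so F = (4/1)/(1/2) = 8 with (1, 2) degrees of freedom.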
If the dependent variable Y is nominal categorical, the null hypothesis of independence of X and Y is tested. To do the test, a contingency (or count) table is formed using classes of Y as columns and categories of the predictor X as rows. The expected cell frequencies under the null hypothesis are estimated. The observed and expected cell frequencies are then used to calculate the Pearson chi-squared statistic or the likelihood ratio statistic, and the p-value is computed from either one of these two statistics.
The Pearson chi-squared statistic and the likelihood ratio statistic are, respectively,
\[
X^2 = \sum_{j=1}^{J} \sum_{i=1}^{I} \frac{(n_{ij} - \hat{m}_{ij})^2}{\hat{m}_{ij}}, \qquad
G^2 = 2 \sum_{j=1}^{J} \sum_{i=1}^{I} n_{ij} \ln\!\left(\frac{n_{ij}}{\hat{m}_{ij}}\right),
\]
where $n_{ij} = \sum_{n \in D} f_n I(x_n = i \wedge y_n = j)$ is the observed cell frequency and $\hat{m}_{ij}$ is the estimated expected cell frequency for cell $(x_n = i, y_n = j)$, obtained as described in the following. The corresponding p-value is given by $p = \Pr(\chi^2_d > X^2)$ for the Pearson chi-squared test or $p = \Pr(\chi^2_d > G^2)$ for the likelihood ratio test, where $\chi^2_d$ follows a chi-squared distribution with $d = (J - 1)(I - 1)$ degrees of freedom.
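Given observed and expected cell frequencies, both statistics follow from a single pass over the table. The Python sketch below shows the arithmetic; the function name `chi_squared_statistics` is hypothetical, and treating empty observed cells as contributing zero to the likelihood ratio statistic is an assumption, not something stated in the text.

```python
import math

def chi_squared_statistics(observed, expected):
    """Return (X2, G2, d) for tables given as lists of rows.

    Rows are categories of X, columns are classes of Y.
    Assumes every expected frequency is strictly positive.
    """
    x2 = 0.0
    g2 = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for n_ij, m_ij in zip(obs_row, exp_row):
            x2 += (n_ij - m_ij) ** 2 / m_ij
            if n_ij > 0:  # assumption: 0 * ln(0) is taken as 0
                g2 += 2.0 * n_ij * math.log(n_ij / m_ij)
    d = (len(observed) - 1) * (len(observed[0]) - 1)
    return x2, g2, d
```

For the observed table `[[10, 20], [20, 10]]` with every expected frequency equal to 15, each cell contributes 25/15 to the Pearson statistic, giving 20/3 on one degree of freedom.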
Estimation of Expected Cell Frequencies without Case Weights
\[
\hat{m}_{ij} = \frac{n_{i.}\, n_{.j}}{n_{..}},
\]
where
\[
n_{i.} = \sum_{j=1}^{J} n_{ij}, \qquad
n_{.j} = \sum_{i=1}^{I} n_{ij}, \qquad
n_{..} = \sum_{j=1}^{J} \sum_{i=1}^{I} n_{ij}.
\]
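Without case weights, the expected frequency of a cell is the product of its row and column totals divided by the grand total. A short Python sketch, with the hypothetical function name `expected_independence`:

```python
def expected_independence(table):
    """Expected cell frequencies under independence, without case weights.

    table is a list of rows of observed counts (rows: categories of X,
    columns: classes of Y). Assumes a non-empty table with positive total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]
```

For the table `[[1, 2], [3, 4]]` the row totals are 3 and 7 and the column totals are 4 and 6, so the first expected cell is 3 × 4 / 10 = 1.2.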
Estimation of Expected Cell Frequencies with Case Weights
If case weights are specified, the expected cell frequency under the null hypothesis of independence is of the form
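\[
m_{ij} = \bar{w}_{ij}^{-1} \alpha_i \beta_j,
\]
where $\alpha_i$ and $\beta_j$ are parameters to be estimated, and
\[
\bar{w}_{ij} = \frac{w_{ij}}{n_{ij}}, \qquad
w_{ij} = \sum_{n \in D} w_n f_n I(x_n = i \wedge y_n = j).
\]
The parameter estimates $\hat{\alpha}_i$, $\hat{\beta}_j$, and hence $\hat{m}_{ij}$, result from the following iterative procedure.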
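1. Set $k = 0$, $\alpha_i^{(0)} = \beta_j^{(0)} = 1$, and $m_{ij}^{(0)} = \bar{w}_{ij}^{-1}$.
2. $\alpha_i^{(k+1)} = \dfrac{n_{i.}}{\sum_j \bar{w}_{ij}^{-1} \beta_j^{(k)}} = \alpha_i^{(k)} \dfrac{n_{i.}}{\sum_j m_{ij}^{(k)}}$.
3. $\beta_j^{(k+1)} = \dfrac{n_{.j}}{\sum_i \bar{w}_{ij}^{-1} \alpha_i^{(k+1)}}$.
4. $m_{ij}^{(k+1)} = \bar{w}_{ij}^{-1} \alpha_i^{(k+1)} \beta_j^{(k+1)}$.
5. If $\max_{i,j} \left| m_{ij}^{(k+1)} - m_{ij}^{(k)} \right| < \varepsilon$, stop and output $\alpha_i^{(k+1)}$, $\beta_j^{(k+1)}$ and $m_{ij}^{(k+1)}$ as the final estimates $\hat{\alpha}_i$, $\hat{\beta}_j$, $\hat{m}_{ij}$. Otherwise, $k = k + 1$, go to 2.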
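The iteration for the independence model with case weights alternates between rescaling the row parameters and the column parameters until the fitted frequencies stop changing. The following Python sketch illustrates that scheme. The function name `expected_with_case_weights`, the tolerance `eps`, the cap `max_iter`, and the handling of empty cells (their inverse mean weight is set to 0) are assumptions made for the example.

```python
def expected_with_case_weights(n, w, eps=1e-8, max_iter=1000):
    """Fit m_ij = (n_ij / w_ij) * alpha_i * beta_j by alternating scaling.

    n[i][j] are observed (frequency-weighted) counts, w[i][j] the sums of
    case weight times frequency weight in each cell. Assumes every row and
    column contains at least one non-empty cell.
    """
    I, J = len(n), len(n[0])
    # inv[i][j] is the inverse of the mean case weight in cell (i, j).
    inv = [[n[i][j] / w[i][j] if w[i][j] > 0 else 0.0 for j in range(J)]
           for i in range(I)]
    row = [sum(r) for r in n]
    col = [sum(c) for c in zip(*n)]
    alpha, beta = [1.0] * I, [1.0] * J
    m = [r[:] for r in inv]                       # starting values
    for _ in range(max_iter):
        alpha = [row[i] / sum(inv[i][j] * beta[j] for j in range(J))
                 for i in range(I)]
        beta = [col[j] / sum(inv[i][j] * alpha[i] for i in range(I))
                for j in range(J)]
        new_m = [[inv[i][j] * alpha[i] * beta[j] for j in range(J)]
                 for i in range(I)]
        change = max(abs(new_m[i][j] - m[i][j])
                     for i in range(I) for j in range(J))
        m = new_m
        if change < eps:
            break
    return m, alpha, beta
```

When all case weights equal 1, so that `w` equals `n`, the result reduces to the unweighted estimate: for `[[1, 2], [3, 4]]` the fitted table is `[[1.2, 1.8], [2.8, 4.2]]`.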
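If the dependent variable Y is ordinal categorical, the expected cell frequencies are also estimated under the row effects model of Goodman (1979), in which each category i of X carries an additional parameter $\gamma_i$:
\[
m_{ij} = \bar{w}_{ij}^{-1} \alpha_i \beta_j \gamma_i^{(s_j - \bar{s})},
\]
where $s_j$ is the score of class j of Y and $\bar{s}$ is the mean score. The parameter estimates, and hence the expected cell frequencies, result from the following iterative procedure.
1. Set $k = 0$, $\alpha_i^{(0)} = \beta_j^{(0)} = \gamma_i^{(0)} = 1$, and $m_{ij}^{(0)} = \bar{w}_{ij}^{-1}$.
2. $\alpha_i^{(k+1)} = \dfrac{n_{i.}}{\sum_j \bar{w}_{ij}^{-1} \beta_j^{(k)} \bigl(\gamma_i^{(k)}\bigr)^{(s_j - \bar{s})}} = \alpha_i^{(k)} \dfrac{n_{i.}}{\sum_j m_{ij}^{(k)}}$.
3. $\beta_j^{(k+1)} = \dfrac{n_{.j}}{\sum_i \bar{w}_{ij}^{-1} \alpha_i^{(k+1)} \bigl(\gamma_i^{(k)}\bigr)^{(s_j - \bar{s})}}$.
4. $m_{ij}^{*} = \bar{w}_{ij}^{-1} \alpha_i^{(k+1)} \beta_j^{(k+1)} \bigl(\gamma_i^{(k)}\bigr)^{(s_j - \bar{s})}$, and $G_i = 1 + \dfrac{\sum_j (s_j - \bar{s})(n_{ij} - m_{ij}^{*})}{\sum_j (s_j - \bar{s})^2 m_{ij}^{*}}$.
5. $\gamma_i^{(k+1)} = \gamma_i^{(k)} G_i$ if $G_i > 0$, and $\gamma_i^{(k+1)} = \gamma_i^{(k)}$ otherwise.
6. $m_{ij}^{(k+1)} = \bar{w}_{ij}^{-1} \alpha_i^{(k+1)} \beta_j^{(k+1)} \bigl(\gamma_i^{(k+1)}\bigr)^{(s_j - \bar{s})}$.
7. If $\max_{i,j} \left| m_{ij}^{(k+1)} - m_{ij}^{(k)} \right| < \varepsilon$, stop and output $\alpha_i^{(k+1)}$, $\beta_j^{(k+1)}$, $\gamma_i^{(k+1)}$ and $m_{ij}^{(k+1)}$ as the final estimates $\hat{\alpha}_i$, $\hat{\beta}_j$, $\hat{\gamma}_i$ and $\hat{\hat{m}}_{ij}$. Otherwise, $k = k + 1$, go to 2.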
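The second iteration, which adds a row parameter γ_i raised to the power (s_j − s̄), can be sketched in Python as follows. This is an illustration only: the function name `fit_row_effects`, the tolerance, the iteration cap, and in particular the choice of s̄ as the mean of the class scores weighted by the column sums of `w` are all assumptions for the example.

```python
def fit_row_effects(n, w, scores, eps=1e-8, max_iter=500):
    """Fit m_ij = (n_ij / w_ij) * alpha_i * beta_j * gamma_i ** (s_j - s_bar).

    n and w are tables of observed counts and summed weights, scores holds
    one score s_j per class of Y. Assumes no empty row or column.
    """
    I, J = len(n), len(n[0])
    inv = [[n[i][j] / w[i][j] if w[i][j] > 0 else 0.0 for j in range(J)]
           for i in range(I)]
    row = [sum(r) for r in n]
    col = [sum(c) for c in zip(*n)]
    w_col = [sum(c) for c in zip(*w)]
    # Assumption: s_bar is the weighted mean score, weighted by column sums of w.
    s_bar = sum(wc * s for wc, s in zip(w_col, scores)) / sum(w_col)
    dev = [s - s_bar for s in scores]
    alpha, beta, gamma = [1.0] * I, [1.0] * J, [1.0] * I

    def fitted():
        return [[inv[i][j] * alpha[i] * beta[j] * gamma[i] ** dev[j]
                 for j in range(J)] for i in range(I)]

    m = [r[:] for r in inv]
    for _ in range(max_iter):
        alpha = [row[i] / sum(inv[i][j] * beta[j] * gamma[i] ** dev[j]
                              for j in range(J)) for i in range(I)]
        beta = [col[j] / sum(inv[i][j] * alpha[i] * gamma[i] ** dev[j]
                             for i in range(I)) for j in range(J)]
        m_star = fitted()                         # uses the old gamma
        for i in range(I):
            num = sum(dev[j] * (n[i][j] - m_star[i][j]) for j in range(J))
            den = sum(dev[j] ** 2 * m_star[i][j] for j in range(J))
            g = 1.0 + num / den
            if g > 0:                             # otherwise keep the old gamma_i
                gamma[i] *= g
        new_m = fitted()                          # uses the updated gamma
        change = max(abs(new_m[i][j] - m[i][j])
                     for i in range(I) for j in range(J))
        m = new_m
        if change < eps:
            break
    return m, alpha, beta, gamma
```

When the observed table is exactly independent, such as `[[10, 20], [20, 40]]` with unit case weights, every γ_i stays at 1 and the fitted table reproduces the observed one.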
The Bonferroni Adjustments
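The adjusted p-value is calculated as the p-value times a Bonferroni multiplier. The Bonferroni multiplier adjusts for multiple tests.
Suppose that a predictor variable originally has I categories, and it is reduced to r categories after the merging step. The Bonferroni multiplier B is the number of possible ways that I categories can be merged into r categories, given by the following equation.
\[
B = \begin{cases}
\dbinom{I-1}{r-1} & \text{ordinal predictor,} \\[2ex]
\displaystyle\sum_{v=0}^{r-1} (-1)^v \frac{(r-v)^I}{v!\,(r-v)!} & \text{nominal predictor,} \\[2ex]
\dbinom{I-2}{r-2} + r \dbinom{I-2}{r-1} & \text{ordinal predictor with a missing category.}
\end{cases}
\]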
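The multiplier can be evaluated with exact integer arithmetic. The Python sketch below covers three predictor types: the nominal case is the alternating sum whose fragments appear above, while the ordinal and ordinal-with-missing cases are the standard counts from Kass (1980) and are included here as an assumption about what the full equation contains. The function name `bonferroni_multiplier` and the `kind` labels are hypothetical.

```python
from math import comb, factorial

def bonferroni_multiplier(I, r, kind="nominal"):
    """Number of ways to merge I original categories into r categories."""
    if kind == "ordinal":
        # Choose r - 1 cut points among the I - 1 gaps between categories.
        return comb(I - 1, r - 1)
    if kind == "nominal":
        # sum_v (-1)^v (r-v)^I / (v! (r-v)!), using 1/(v!(r-v)!) = C(r,v)/r!.
        total = sum((-1) ** v * comb(r, v) * (r - v) ** I for v in range(r))
        return total // factorial(r)
    if kind == "ordinal_missing":
        # The missing category stands alone, or joins one of r ordered groups.
        return comb(I - 2, r - 2) + r * comb(I - 2, r - 1)
    raise ValueError("unknown predictor kind: " + kind)
```

For example, a nominal predictor with I = 4 categories reduced to r = 2 gives (16 − 2)/2 = 7 possible groupings, whereas an ordinal predictor gives only 3.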
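Exhaustive CHAID merges two categories iteratively until only two categories are left. The Bonferroni multiplier B is the sum of the number of possible ways of merging two categories at each iteration.
\[
B = \begin{cases}
\dfrac{I(I-1)}{2} & \text{ordinal predictor,} \\[2ex]
\dfrac{I(I^2-1)}{6} & \text{nominal predictor.}
\end{cases}
\]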
Missing Values
If the dependent variable of a case is missing, it will not be used in the analysis. If all predictor variables of a case are missing, this case is ignored. If the case weight is missing, zero, or negative, the case is ignored. If the frequency weight is missing, zero, or negative, the case is ignored.
Otherwise, missing values will be treated as a predictor category. For ordinal predictors, the algorithm first generates the “best” set of categories using all non-missing information from the data. Next the algorithm identifies the category that is most similar to the missing category. Finally, the algorithm decides whether to merge the missing category with its most similar category or to keep the missing category as a separate category. Two p-values are calculated, one for the set of categories formed by merging the missing category with its most similar category, and the other for the set of categories formed by adding the missing category as a separate category. Take the action that gives the smallest p-value.
For nominal predictors, the missing category is treated the same as other categories in the analysis.
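The case-exclusion rules at the start of this section can be expressed as a small filter. In the Python sketch below, the function name `prepare_case`, the use of `None` to mark a missing value, rounding halves upward, and dropping a case whose frequency weight rounds to zero are all assumptions made for illustration.

```python
import math

def prepare_case(y, predictors, w=1.0, f=1):
    """Return (case_weight, frequency_weight) for a usable case, else None.

    y is the dependent value, predictors maps predictor names to values,
    w is the case weight and f the frequency weight. None marks missing.
    """
    if y is None:
        return None                      # missing dependent variable
    if all(v is None for v in predictors.values()):
        return None                      # every predictor is missing
    if w is None or w <= 0:
        return None                      # missing, zero or negative case weight
    if f is None or f <= 0:
        return None                      # missing, zero or negative frequency weight
    f_rounded = math.floor(f + 0.5)      # assumption: halves round upward
    if f_rounded == 0:
        return None                      # assumption: rounded-away cases are dropped
    return w, f_rounded
```

A case with one missing predictor and a frequency weight of 2.6 is kept with frequency weight 3, and its missing predictor value is then handled as a category in its own right.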
References
Biggs, D., de Ville, B., and Suen, E. (1991). A Method of Choosing Multiway Partitions for Classification and Decision Trees. Journal of Applied Statistics, 18, 1, 49-62.
Goodman, L. A. (1979). Simple Models for the Analysis of Association in Cross-Classifications Having Ordered Categories. Journal of the American Statistical Association, 74, 537-552.
Kass, G. V. (1980). An Exploratory Technique for Investigating Large Quantities of Categorical Data. Applied Statistics, 29, 2, 119-127.