CHAID Algorithms - Mathematics and Statistics - Study Notes, Study notes of Mathematical Statistics

Alliance University Mathematical Statistics

Main discussion in this file is about CHAID, Exhaustive CHAID Algorithms, Merging, Splitting, Stopping, The p-Value Calculations, Continuous dependent variable, Expected Cell Frequencies

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar 🇮🇳



CHAID and Exhaustive CHAID Algorithms

This document describes the tree-growing process of the CHAID and Exhaustive CHAID algorithms. The CHAID algorithm was originally proposed by Kass (1980), and Exhaustive CHAID by Biggs et al. (1991). Both algorithms allow multiple splits of a node.

Both CHAID and Exhaustive CHAID consist of three steps: merging, splitting and stopping. A tree is grown by repeatedly applying these three steps to each node, starting from the root node.

Notations

$Y$: The dependent variable, or target variable. It can be ordinal categorical, nominal categorical or continuous. If $Y$ is categorical with $J$ classes, its class takes values in $C = \{1, \ldots, J\}$.

$X_m$, $m = 1, \ldots, M$: The set of all predictor variables. A predictor can be ordinal categorical, nominal categorical or continuous.

$\{x_n, y_n\}_{n=1}^{N}$: The whole learning sample.

$w_n$: The case weight associated with case $n$.

$f_n$: The frequency weight associated with case $n$. A non-integral positive value is rounded to its nearest integer.

The CHAID Algorithm

The following algorithm only accepts nominal or ordinal categorical predictors. When predictors are continuous, they are transformed into ordinal predictors before using the following algorithm.

Merging

For each predictor variable X, merge non-significant categories. Each final category of X will result in one child node if X is used to split the node. The merging step also calculates the adjusted p-value that is used in the splitting step.

1. If X has only 1 category, stop and set the adjusted p-value to 1.
2. If X has 2 categories, go to step 8.
3. Otherwise, find the allowable pair of categories of X that is least significantly different (i.e., most similar). For an ordinal predictor, an allowable pair is two adjacent categories; for a nominal predictor, it is any two categories. The most similar pair is the pair whose test statistic gives the largest p-value with respect to the dependent variable Y. How to calculate the p-value in various situations is described in later sections.
4. For the pair having the largest p-value, check whether its p-value is larger than a user-specified alpha-level $\alpha_{\text{merge}}$ (alpha_merge). If it is, merge this pair into a single compound category, which forms a new set of categories of X. If it is not, go to step 7.
5. (Optional) If the newly formed compound category consists of three or more original categories, find the best binary split within the compound category, that is, the one whose p-value is the smallest. Perform this binary split if its p-value is not larger than an alpha-level $\alpha_{\text{split-merge}}$ (alpha_split-merge).
6. Go to step 2.
7. (Optional) Any category having too few observations (as compared with a user-specified minimum segment size) is merged with the most similar other category, as measured by the largest of the p-values.
8. The adjusted p-value is computed for the merged categories by applying the Bonferroni adjustments discussed later.
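The core of the merging loop (steps 2, 3, 4 and 6) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the dependent variable is taken to be nominal and pairs are compared with a Pearson chi-squared test, the optional steps 5 and 7 and the Bonferroni adjustment of step 8 are omitted, and the names `chi2_sf`, `pearson_p` and `merge_categories` are hypothetical helpers introduced here.

```python
import math

def chi2_sf(x, df):
    """Pr(chi2_df > x), via the series for the lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))

def pearson_p(rows):
    """Pearson chi-squared p-value for a table given as a list of count rows."""
    col = [sum(c) for c in zip(*rows)]
    keep = [j for j, c in enumerate(col) if c > 0]   # ignore empty classes
    row = [sum(r) for r in rows]
    total = sum(row)
    df = (len(rows) - 1) * (len(keep) - 1)
    if df <= 0 or min(row) == 0:
        return 1.0                                   # nothing to distinguish
    x2 = 0.0
    for i in range(len(rows)):
        for j in keep:
            expected = row[i] * col[j] / total
            x2 += (rows[i][j] - expected) ** 2 / expected
    return chi2_sf(x2, df)

def merge_categories(categories, ordinal, alpha_merge=0.05):
    """categories: list of (label, class_counts) in category order.

    Returns a list of (labels, class_counts) for the merged categories.
    """
    groups = [([label], list(counts)) for label, counts in categories]
    while len(groups) > 2:                           # step 2: stop at 2 categories
        k = len(groups)
        if ordinal:                                  # step 3: allowable pairs
            pairs = [(a, a + 1) for a in range(k - 1)]
        else:
            pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
        scored = [(pearson_p([groups[a][1], groups[b][1]]), a, b) for a, b in pairs]
        p, a, b = max(scored)                        # most similar pair
        if p <= alpha_merge:                         # step 4: no pair left to merge
            break
        merged = (groups[a][0] + groups[b][0],
                  [u + v for u, v in zip(groups[a][1], groups[b][1])])
        groups = [g for idx, g in enumerate(groups) if idx not in (a, b)]
        groups.insert(a, merged)                     # step 6: loop back to step 2
    return groups
```

For example, with four categories whose class counts are `[30, 10]`, `[28, 12]`, `[10, 30]` and `[12, 28]`, the first two and the last two are merged, leaving two compound categories.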

Splitting

The “best” split for each predictor is found in the merging step. The splitting step selects which predictor is used to split the node. Selection is accomplished by comparing the adjusted p-value associated with each predictor, as obtained in the merging step.

1. Select the predictor that has the smallest adjusted p-value (i.e., the most significant).
2. If this adjusted p-value is less than or equal to a user-specified alpha-level $\alpha_{\text{split}}$ (alpha_split), split the node using this predictor. Otherwise, do not split; the node is considered a terminal node.
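The selection rule is small enough to sketch directly. The function name `choose_split` and the dictionary layout are assumptions made for illustration; the adjusted p-values are taken as already computed by the merging step.

```python
def choose_split(adjusted_p, alpha_split=0.05):
    """Pick the splitting predictor, or None if the node is terminal.

    adjusted_p maps each predictor name to its adjusted p-value.
    """
    if not adjusted_p:
        return None
    best = min(adjusted_p, key=adjusted_p.get)   # smallest adjusted p-value
    if adjusted_p[best] <= alpha_split:
        return best                              # split on this predictor
    return None                                  # terminal node
```

With adjusted p-values of 0.20, 0.003 and 0.04 for three predictors, the second one would be chosen; if every value exceeds `alpha_split`, the node is left unsplit.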

Stopping

The stopping step checks whether the tree-growing process should be stopped according to the following stopping rules.

1. If a node becomes pure, that is, all cases in the node have identical values of the dependent variable, the node will not be split.
2. If all cases in a node have identical values for each predictor, the node will not be split.
3. If the current tree depth reaches the user-specified maximum tree depth limit, the tree-growing process will stop.
4. If the size of a node is less than the user-specified minimum node size, the node will not be split.
5. If the split of a node results in a child node whose size is less than the user-specified minimum child node size, child nodes that have too few cases (as compared with this minimum) will merge with the most similar child node, as measured by the largest of the p-values. However, if the resulting number of child nodes is 1, the node will not be split.
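Rules 1 to 4 can be checked before a split is attempted, as in the sketch below. It assumes the node size is the plain number of cases (frequency weights are ignored here), and `stop_reason` and its parameters are hypothetical names. Rule 5 is not covered because it applies to the child nodes after a candidate split has been formed.

```python
def stop_reason(y, rows, depth, max_depth, min_node_size):
    """Return why a node must not be split, or None if it may be split.

    y    : dependent-variable values of the cases in the node
    rows : one tuple of predictor values per case
    """
    if len(set(y)) <= 1:
        return "pure node"                       # rule 1
    if all(len(set(col)) <= 1 for col in zip(*rows)):
        return "identical predictors"            # rule 2
    if depth >= max_depth:
        return "maximum depth reached"           # rule 3
    if len(y) < min_node_size:
        return "node too small"                  # rule 4
    return None
```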

Continuous dependent variable

If the dependent variable Y is continuous, perform an ANOVA F test of whether the means of Y are the same across the categories of X. The test calculates the F-statistic and derives the p-value from it:

$$F = \frac{\sum_{i=1}^{I} \sum_{n \in D} w_n f_n I(x_n = i)\,(\bar{y}_i - \bar{y})^2 \,/\, (I - 1)}{\sum_{i=1}^{I} \sum_{n \in D} w_n f_n I(x_n = i)\,(y_n - \bar{y}_i)^2 \,/\, (N_f - I)},$$

$$p = \Pr\big(F(I - 1, N_f - I) > F\big),$$

where

$$\bar{y}_i = \frac{\sum_{n \in D} w_n f_n y_n I(x_n = i)}{\sum_{n \in D} w_n f_n I(x_n = i)}, \qquad \bar{y} = \frac{\sum_{n \in D} w_n f_n y_n}{\sum_{n \in D} w_n f_n}, \qquad N_f = \sum_{n \in D} f_n,$$

and $F(I - 1, N_f - I)$ is a random variable following an F-distribution with degrees of freedom $I - 1$ and $N_f - I$.
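As a rough illustration of the weighted F-statistic, the sketch below computes F and its two degrees of freedom from plain Python lists. The function name `anova_f` and the default of unit weights are assumptions; it also assumes at least two categories, more weighted cases than categories, and non-zero within-category variation. The tail probability itself is not computed here; a library routine such as `scipy.stats.f.sf(F, dfn, dfd)` could supply it if SciPy is available.

```python
def anova_f(x, y, w=None, f=None):
    """Weighted one-way ANOVA F-statistic for categories x and responses y.

    w: case weights, f: frequency weights (both default to 1 per case).
    Returns (F, I - 1, N_f - I).
    """
    n = len(y)
    w = w or [1.0] * n
    f = f or [1] * n
    wf = [wi * fi for wi, fi in zip(w, f)]
    cats = sorted(set(x))
    I = len(cats)
    Nf = sum(f)
    grand = sum(wf[k] * y[k] for k in range(n)) / sum(wf)
    means = {}
    for c in cats:
        idx = [k for k in range(n) if x[k] == c]
        means[c] = sum(wf[k] * y[k] for k in idx) / sum(wf[k] for k in idx)
    between = sum(wf[k] * (means[x[k]] - grand) ** 2 for k in range(n)) / (I - 1)
    within = sum(wf[k] * (y[k] - means[x[k]]) ** 2 for k in range(n)) / (Nf - I)
    return between / within, I - 1, Nf - I
```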

Nominal dependent variable

If the dependent variable Y is nominal categorical, the null hypothesis of independence of X and Y is tested. To do the test, a contingency (or count) table is formed using classes of Y as columns and categories of the predictor X as rows. The expected cell frequencies under the null hypothesis are estimated. The observed and expected cell frequencies are used to calculate the Pearson chi-squared statistic or the likelihood ratio statistic. The p-value is computed based on either one of these two statistics.

The Pearson chi-squared statistic and the likelihood ratio statistic are, respectively,

$$X^2 = \sum_{j=1}^{J} \sum_{i=1}^{I} \frac{(n_{ij} - \hat{m}_{ij})^2}{\hat{m}_{ij}},$$

$$G^2 = 2 \sum_{j=1}^{J} \sum_{i=1}^{I} n_{ij} \ln\!\left(n_{ij} / \hat{m}_{ij}\right),$$

where $n_{ij} = \sum_{n \in D} f_n I(x_n = i \wedge y_n = j)$ is the observed cell frequency and $\hat{m}_{ij}$ is the estimated expected cell frequency for cell $(x_n = i, y_n = j)$ under the independence model, estimated as described in the following sections. The corresponding p-value is given by $p = \Pr(\chi^2_d > X^2)$ for the Pearson chi-squared test or $p = \Pr(\chi^2_d > G^2)$ for the likelihood ratio test, where $\chi^2_d$ follows a chi-squared distribution with degrees of freedom $d = (J - 1)(I - 1)$.

Estimation of Expected Cell Frequencies without case Weights

$$\hat{m}_{ij} = \frac{n_{i.}\, n_{.j}}{n_{..}},$$

where

$$n_{i.} = \sum_{j=1}^{J} n_{ij}, \qquad n_{.j} = \sum_{i=1}^{I} n_{ij}, \qquad n_{..} = \sum_{j=1}^{J} \sum_{i=1}^{I} n_{ij}.$$
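The sketch below works through the unweighted case: it estimates the expected frequencies from the row and column totals and then evaluates both statistics and their p-values. The names `chi2_sf` and `independence_test` are hypothetical, the chi-squared tail is computed with a hand-written series rather than a library call, empty rows or columns are assumed not to occur, and empty cells are assumed to contribute zero to the likelihood ratio statistic.

```python
import math

def chi2_sf(x, df):
    """Pr(chi2_df > x), via the series for the lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))

def independence_test(table):
    """table[i][j] is the observed count for X category i and Y class j.

    Returns (expected, X2, G2, p_pearson, p_likelihood_ratio).
    """
    I, J = len(table), len(table[0])
    row = [sum(r) for r in table]                              # n_i.
    col = [sum(table[i][j] for i in range(I)) for j in range(J)]  # n_.j
    total = sum(row)                                           # n_..
    m = [[row[i] * col[j] / total for j in range(J)] for i in range(I)]
    x2 = sum((table[i][j] - m[i][j]) ** 2 / m[i][j]
             for i in range(I) for j in range(J))
    g2 = 2.0 * sum(table[i][j] * math.log(table[i][j] / m[i][j])
                   for i in range(I) for j in range(J) if table[i][j] > 0)
    d = (I - 1) * (J - 1)
    return m, x2, g2, chi2_sf(x2, d), chi2_sf(g2, d)
```

For the 2 x 2 table `[[10, 20], [20, 10]]` every expected frequency is 15 and the Pearson statistic is 100/15, roughly 6.67, on one degree of freedom.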

Estimation of Expected Cell Frequencies with Case Weights

If case weights are specified, the expected cell frequency under the null hypothesis of independence is of the form

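$$m_{ij} = \bar{w}_{ij}^{-1} \alpha_i \beta_j,$$

where $\alpha_i$ and $\beta_j$ are parameters to be estimated, and

$$\bar{w}_{ij} = \frac{w_{ij}}{n_{ij}}, \qquad w_{ij} = \sum_{n \in D} w_n f_n I(x_n = i \wedge y_n = j).$$

The parameter estimates $\hat{\alpha}_i$, $\hat{\beta}_j$, and hence $\hat{m}_{ij}$, result from the following iterative procedure.

1. $k = 0$, $\alpha_i^{(0)} = \beta_j^{(0)} = 1$, $m_{ij}^{(0)} = \bar{w}_{ij}^{-1}$.
2. $\alpha_i^{(k+1)} = \dfrac{n_{i.}}{\sum_j \bar{w}_{ij}^{-1} \beta_j^{(k)}} = \alpha_i^{(k)} \dfrac{n_{i.}}{\sum_j m_{ij}^{(k)}}$.
3. $\beta_j^{(k+1)} = \dfrac{n_{.j}}{\sum_i \bar{w}_{ij}^{-1} \alpha_i^{(k+1)}}$.
4. $m_{ij}^{(k+1)} = \bar{w}_{ij}^{-1} \alpha_i^{(k+1)} \beta_j^{(k+1)}$.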
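The iteration for the case-weighted estimates can be sketched as below. Several choices here are assumptions rather than part of the text above: the loop stops when the largest change in any $m_{ij}$ falls below a tolerance `eps` (or after `max_iter` passes), an empty cell is given an inverse mean weight of zero, every row and column is assumed to contain at least one non-empty cell, and the function name is hypothetical.

```python
def expected_with_case_weights(n, w, eps=1e-8, max_iter=1000):
    """Iteratively fit m_ij = alpha_i * beta_j / wbar_ij.

    n[i][j]: observed frequency n_ij; w[i][j]: weighted sum w_ij.
    Returns (m, alpha, beta).
    """
    I, J = len(n), len(n[0])
    # inverse mean cell weight, 1 / wbar_ij = n_ij / w_ij
    inv = [[n[i][j] / w[i][j] if w[i][j] > 0 else 0.0 for j in range(J)]
           for i in range(I)]
    row = [sum(n[i]) for i in range(I)]
    col = [sum(n[i][j] for i in range(I)) for j in range(J)]
    alpha, beta = [1.0] * I, [1.0] * J
    m = [r[:] for r in inv]                       # step 1
    for _ in range(max_iter):
        alpha = [row[i] / sum(inv[i][j] * beta[j] for j in range(J))
                 for i in range(I)]               # step 2
        beta = [col[j] / sum(inv[i][j] * alpha[i] for i in range(I))
                for j in range(J)]                # step 3
        new_m = [[inv[i][j] * alpha[i] * beta[j] for j in range(J)]
                 for i in range(I)]               # step 4
        delta = max(abs(new_m[i][j] - m[i][j])
                    for i in range(I) for j in range(J))
        m = new_m
        if delta < eps:                           # assumed stopping rule
            break
    return m, alpha, beta
```

When every case weight is 1, `w` equals `n` and the procedure reproduces the unweighted estimate $n_{i.} n_{.j} / n_{..}$.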

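2. $\alpha_i^{(k+1)} = \dfrac{n_{i.}}{\sum_j \bar{w}_{ij}^{-1} \beta_j^{(k)} \left(\gamma_i^{(k)}\right)^{(s_j - \bar{s})}} = \alpha_i^{(k)} \dfrac{n_{i.}}{\sum_j m_{ij}^{(k)}}$.
3. $\beta_j^{(k+1)} = \dfrac{n_{.j}}{\sum_i \bar{w}_{ij}^{-1} \alpha_i^{(k+1)} \left(\gamma_i^{(k)}\right)^{(s_j - \bar{s})}}$.
4. $m_{ij}^{*} = \bar{w}_{ij}^{-1} \alpha_i^{(k+1)} \beta_j^{(k+1)} \left(\gamma_i^{(k)}\right)^{(s_j - \bar{s})}$, $\quad G_i = 1 + \dfrac{\sum_j (s_j - \bar{s})(n_{ij} - m_{ij}^{*})}{\sum_j (s_j - \bar{s})^2 m_{ij}^{*}}$.
5. $\gamma_i^{(k+1)} = \begin{cases} \gamma_i^{(k)} G_i & \text{if } G_i > 0 \\ \gamma_i^{(k)} & \text{otherwise} \end{cases}$.
6. $m_{ij}^{(k+1)} = \bar{w}_{ij}^{-1} \alpha_i^{(k+1)} \beta_j^{(k+1)} \left(\gamma_i^{(k+1)}\right)^{(s_j - \bar{s})}$.
7. If $\max_{i,j} \left| m_{ij}^{(k+1)} - m_{ij}^{(k)} \right| < \varepsilon$, stop and output $\alpha_i^{(k+1)}$, $\beta_j^{(k+1)}$, $\gamma_i^{(k+1)}$ and $m_{ij}^{(k+1)}$ as the final estimates $\hat{\alpha}_i$, $\hat{\beta}_j$, $\hat{\gamma}_i$, $\hat{m}_{ij}$. Otherwise, $k = k + 1$, go to step 2.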

The Bonferroni Adjustments

The adjusted p-value is calculated as the p-value times a Bonferroni multiplier. The Bonferroni multiplier adjusts for multiple tests.

CHAID

Suppose that a predictor variable originally has I categories, and it is reduced to r categories after the merging step. The Bonferroni multiplier B is the number of possible ways that I categories can be merged into r categories. For r = I, B = 1. For 2 ≤ r < I, use the following equation.

$$B = \begin{cases} \dbinom{I-1}{r-1} & \text{Ordinal predictor} \\[2ex] \displaystyle\sum_{v=0}^{r-1} (-1)^v \frac{(r-v)^I}{v!\,(r-v)!} & \text{Nominal predictor} \\[2ex] \dbinom{I-2}{r-2} + r \dbinom{I-2}{r-1} & \text{Ordinal with a missing category} \end{cases}$$
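The three CHAID multipliers can be evaluated exactly with integer arithmetic, as in this sketch. The function name and the string labels for the predictor type are assumptions for illustration; the nominal case uses `fractions.Fraction` so that the alternating sum stays exact.

```python
from fractions import Fraction
from math import comb, factorial

def bonferroni_chaid(I, r, kind):
    """Bonferroni multiplier for I original categories merged into r.

    kind is "ordinal", "nominal" or "ordinal_missing"; requires 2 <= r <= I.
    """
    if r == I:
        return 1
    if kind == "ordinal":
        return comb(I - 1, r - 1)
    if kind == "nominal":
        total = sum(Fraction((-1) ** v * (r - v) ** I,
                             factorial(v) * factorial(r - v))
                    for v in range(r))
        return int(total)
    if kind == "ordinal_missing":
        return comb(I - 2, r - 2) + r * comb(I - 2, r - 1)
    raise ValueError("unknown predictor type: " + kind)
```

For I = 4 and r = 2 this gives 3 for an ordinal predictor, 7 for a nominal predictor, and 5 for an ordinal predictor with a missing category. The adjusted p-value would then be the unadjusted p-value multiplied by this number.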

Exhaustive CHAID

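Exhaustive CHAID merges two categories iteratively until only two categories are left. The Bonferroni multiplier B is the sum of the number of possible ways of merging two categories at each iteration.

$$B = \begin{cases} \dfrac{I(I-1)}{2} & \text{Ordinal predictor} \\[2ex] \dfrac{I(I^2-1)}{6} & \text{Nominal predictor} \\[2ex] \dfrac{I(I-1)}{2} & \text{Ordinal with a missing category} \end{cases}$$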
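One way to read "the sum of the number of possible ways of merging two categories at each iteration" is to count the allowable pairs at every stage, from I categories down to 2. The sketch below does this for ordinal and nominal predictors without missing categories; this counting convention is an assumption, as is the function name. Under it, the sums equal $I(I-1)/2$ and $I(I^2-1)/6$ respectively.

```python
from math import comb

def bonferroni_exhaustive(I, kind):
    """Sum the number of allowable pairs over stages with c = I, ..., 2 categories."""
    if kind == "ordinal":
        # c ordered categories have c - 1 adjacent pairs
        return sum(c - 1 for c in range(2, I + 1))
    if kind == "nominal":
        # c unordered categories have c-choose-2 pairs
        return sum(comb(c, 2) for c in range(2, I + 1))
    raise ValueError("unknown predictor type: " + kind)
```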

Missing Values

If the dependent variable of a case is missing, it will not be used in the analysis. If all predictor variables of a case are missing, this case is ignored. If the case weight is missing, zero, or negative, the case is ignored. If the frequency weight is missing, zero, or negative, the case is ignored.

Otherwise, missing values are treated as a predictor category. For ordinal predictors, the algorithm first generates the “best” set of categories using all non-missing information from the data. Next, the algorithm identifies the category that is most similar to the missing category. Finally, the algorithm decides whether to merge the missing category with its most similar category or to keep the missing category as a separate category. Two p-values are calculated: one for the set of categories formed by merging the missing category with its most similar category, and the other for the set of categories formed by adding the missing category as a separate category. The action that gives the smaller p-value is taken.

For nominal predictors, the missing category is treated the same as other categories in the analysis.

References

Biggs, D., de Ville, B., and Suen, E. (1991). A Method of Choosing Multiway Partitions for Classification and Decision Trees. Journal of Applied Statistics, 18, 1, 49-62.

Goodman, L. A. (1979). Simple Models for the Analysis of Association in Cross-Classifications Having Ordered Categories. Journal of the American Statistical Association, 74, 537-552.

Kass, G. V. (1980). An Exploratory Technique for Investigating Large Quantities of Categorical Data. Applied Statistics, 29, 2, 119-127.
