
Assignment and Risk Estimation - Mathematics and Statistics - Study Notes, Exercises of Mathematical Statistics

Alliance University Mathematical Statistics

The main topics of this file are assignment, risk estimation, assignment of a node, assignment of a case, the loss function, risk estimation of a tree T, and the resubstitution estimate of the risk.

Typology: Exercises

2011/2012

Uploaded on 10/31/2012

sangawar 🇮🇳


Assignment and Risk Estimation

This document discusses how a class or a value is assigned to a node and to a case. It also covers three methods of risk estimation: the resubstitution method, the test sample method and the cross validation method. The information is applicable to the tree growing algorithms CART, CHAID, exhaustive CHAID and QUEST. Materials in this document are based on Classification and Regression Trees by Breiman et al. (1984). It is assumed that a CART, CHAID, exhaustive CHAID or QUEST tree has been grown successfully using a learning sample.

Notations

$Y$ The dependent variable, or target variable. It can be either categorical (nominal or ordinal) or continuous.

If $Y$ is categorical with $J$ classes, its class takes values in $C = \{1, \ldots, J\}$.

$\mathfrak{L} = \{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$ The learning sample, where $\mathbf{x}_n$ and $y_n$ are the predictor vector and dependent variable for case $n$.

$\mathfrak{L}(t)$ The cases of the learning sample that fall in node $t$.

$f_n$ The frequency weight associated with case $n$. A non-integral positive value is rounded to its nearest integer.

$w_n$ The case weight associated with case $n$.

$\pi(j),\ j = 1, \ldots, J$ Prior probability of $Y = j$.

$C(i \mid j)$ The cost of misclassifying a class $j$ case as a class $i$ case, with $C(j \mid j) = 0$.

Assignment

Once the tree is grown, an assignment (also called action or decision) is given to each node based on the learning sample. To predict the dependent variable value for an incoming case, we first find the terminal node in which it falls. We then use the assignment of that terminal node for prediction.

Assignment of a Node

For any node $t$, let $d_t$ be the assignment given to node $t$,

$$d_t = \begin{cases} j^*(t) & \text{if } Y \text{ is categorical} \\ \bar{y}(t) & \text{if } Y \text{ is continuous} \end{cases},$$


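where

$$j^*(t) = \arg\min_{i} \sum_{j} C(i \mid j)\, p(j \mid t), \qquad \bar{y}(t) = \frac{\sum_{n \in \mathfrak{L}(t)} w_n f_n y_n}{N_w(t)},$$

with

$$p(j \mid t) = \frac{p(j, t)}{\sum_{j} p(j, t)}, \qquad p(j, t) = \frac{\pi(j)\, N_{w,j}(t)}{N_{w,j}},$$

$$N_w = \sum_{n \in \mathfrak{L}} w_n f_n, \qquad N_{w,j} = \sum_{n \in \mathfrak{L}} w_n f_n I(y_n = j),$$

$$N_w(t) = \sum_{n \in \mathfrak{L}(t)} w_n f_n, \qquad N_{w,j}(t) = \sum_{n \in \mathfrak{L}(t)} w_n f_n I(y_n = j),$$

and $I(\cdot)$ is the indicator function.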
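The following is a minimal Python sketch of the weighted quantities above. It computes $p(j \mid t)$ and $\bar{y}(t)$ for a single node from lists of targets, case weights $w_n$ and frequency weights $f_n$. The function names and the list-based data layout are assumptions for illustration. Every class is assumed to occur in the learning sample, so that $N_{w,j} > 0$.

```python
def weighted_class_counts(ys, w, f, classes):
    """Return {j: sum of w_n * f_n over cases with y_n == j}."""
    counts = {j: 0.0 for j in classes}
    for y, wn, fn in zip(ys, w, f):
        counts[y] += wn * fn
    return counts


def node_class_probs(node, sample, priors):
    """p(j | t) for node t.

    node, sample: (ys, w, f) triples for the cases in L(t) and in the whole L.
    priors: {j: pi(j)}; assumes N_{w,j} > 0 for every class j.
    """
    classes = list(priors)
    n_wj = weighted_class_counts(*sample, classes)    # N_{w,j}
    n_wj_t = weighted_class_counts(*node, classes)    # N_{w,j}(t)
    p_jt = {j: priors[j] * n_wj_t[j] / n_wj[j] for j in classes}  # p(j, t)
    total = sum(p_jt.values())
    return {j: p_jt[j] / total for j in classes}


def node_mean(ys, w, f):
    """ybar(t) = sum(w_n f_n y_n) / N_w(t) over the cases in L(t)."""
    n_w_t = sum(wn * fn for wn, fn in zip(w, f))
    return sum(wn * fn * y for y, wn, fn in zip(ys, w, f)) / n_w_t


# Usage with made-up data: 3 class-1 and 5 class-2 cases, unit weights.
sample = ([1, 1, 1, 2, 2, 2, 2, 2], [1.0] * 8, [1] * 8)
node = ([1, 1, 2], [1.0] * 3, [1] * 3)
probs = node_class_probs(node, sample, {1: 0.5, 2: 0.5})
```

In this example the node holds two of the three class-1 cases and one of the five class-2 cases. With equal priors this gives $p(1 \mid t) = 10/13$, because each class count is first divided by its learning-sample total $N_{w,j}$.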

If there is more than one class $j$ that achieves the minimum, choose $j^*(t)$ to be the smallest such $j$ for which $N_{f,j}(t) = \sum_{n \in \mathfrak{L}(t)} f_n I(y_n = j)$ is greater than 0. If $N_{f,j}(t)$ is zero for all of them, choose the absolute smallest such $j$.
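The minimization and the tie-breaking rule can be sketched as follows. The dictionary-based inputs and the small floating-point tolerance used to detect ties are assumptions of this sketch. The document does not say how ties are detected numerically.

```python
def assign_class(p_given_t, cost, n_fj_t, tol=1e-12):
    """j*(t) = argmin_i sum_j C(i|j) p(j|t), with the tie-breaking rule above.

    p_given_t: {j: p(j|t)}
    cost:      {(i, j): C(i|j)}, with C(j|j) = 0
    n_fj_t:    {j: N_{f,j}(t)}, frequency-weighted class counts in node t
    tol:       assumed tolerance for treating two expected costs as equal
    """
    classes = sorted(p_given_t)
    expected = {i: sum(cost[(i, j)] * p_given_t[j] for j in classes) for i in classes}
    best = min(expected.values())
    # Every class whose expected cost attains the minimum, in increasing order.
    tied = [i for i in classes if expected[i] - best <= tol]
    # Smallest tied class with N_{f,j}(t) > 0, else the absolute smallest tied class.
    present = [i for i in tied if n_fj_t.get(i, 0) > 0]
    return present[0] if present else tied[0]


# Usage with unit costs: classes 1 and 2 tie, but class 1 has no cases in the node.
classes = (1, 2, 3)
unit_cost = {(i, j): 0.0 if i == j else 1.0 for i in classes for j in classes}
probs = {1: 0.4, 2: 0.4, 3: 0.2}
choice = assign_class(probs, unit_cost, {1: 0, 2: 5, 3: 1})
```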

For CHAID and exhaustive CHAID, use $\pi(j) = N_{w,j} / N_w$ in the equation for $p(j, t)$.
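A short derivation shows the effect of substituting these priors into the definition of $p(j, t)$: the class probability in a node reduces to the weighted class proportion of that node.

$$p(j, t) = \frac{N_{w,j}}{N_w} \cdot \frac{N_{w,j}(t)}{N_{w,j}} = \frac{N_{w,j}(t)}{N_w}, \qquad p(j \mid t) = \frac{N_{w,j}(t) / N_w}{\sum_{j'} N_{w,j'}(t) / N_w} = \frac{N_{w,j}(t)}{N_w(t)}.$$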

Assignment of a case

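For a case with predictor vector $\mathbf{x}$, the assignment or prediction $d_T(\mathbf{x})$ for this case by the tree $T$ is

$$d_T(\mathbf{x}) = \begin{cases} j^*(t(\mathbf{x})) & \text{if } Y \text{ is categorical} \\ \bar{y}(t(\mathbf{x})) & \text{if } Y \text{ is continuous} \end{cases},$$

where $t(\mathbf{x})$ is the terminal node the case falls in.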
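To make the routing step concrete, here is a minimal sketch that walks a case down to $t(\mathbf{x})$ and returns that node's assignment. The `Node` class and its fields are a hypothetical representation, not part of the source. Binary splits are assumed for brevity; CHAID trees may split a node into more than two children.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Union


@dataclass
class Node:
    """Hypothetical tree node: internal nodes route a case, terminal nodes hold d_t."""
    assignment: Optional[Union[int, float]] = None   # j*(t) or ybar(t) for a terminal node
    goes_left: Optional[Callable[[Sequence[float]], bool]] = None  # split rule; None if terminal
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def terminal_node(root: Node, x: Sequence[float]) -> Node:
    """Follow the split rules from the root down to the terminal node t(x)."""
    node = root
    while node.goes_left is not None:
        node = node.left if node.goes_left(x) else node.right
    return node


def predict(root: Node, x: Sequence[float]) -> Union[int, float]:
    """d_T(x): the assignment of the terminal node the case falls in."""
    return terminal_node(root, x).assignment


# Usage: a one-split tree on the first predictor with made-up class assignments.
stump = Node(goes_left=lambda x: x[0] <= 2.5, left=Node(assignment=1), right=Node(assignment=2))
```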

Risk estimation

Note that case weights are not used in risk estimation, although they are used in the tree-growing process and in class assignment.

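Risk is measured with the loss function

$$L(y, d_T(\mathbf{x})) = \begin{cases} C(d_T(\mathbf{x}) \mid y) & Y \text{ categorical} \\ (y - d_T(\mathbf{x}))^2 & Y \text{ continuous} \end{cases}.$$

Let $D$ be a data set of cases $(\mathbf{x}_n, y_n)$ with frequency weights $f_n$ on which the risk of $T$ is evaluated. Let $D(t)$ be the cases of $D$ that fall in node $t$, and let

$$N_{f,j} = \sum_{n \in D} f_n I(y_n = j), \qquad N_f = \sum_{n \in D} f_n.$$

For each class $j$, define the mean loss and its variance

$$\bar{L}_j = \frac{1}{N_{f,j}} \sum_{n \in D} f_n L(y_n, d_T(\mathbf{x}_n)) I(y_n = j), \qquad s_j^2 = \frac{1}{N_{f,j}} \sum_{n \in D} f_n L^2(y_n, d_T(\mathbf{x}_n)) I(y_n = j) - \bar{L}_j^2,$$

and over all cases

$$\bar{L} = \frac{1}{N_f} \sum_{n \in D} f_n L(y_n, d_T(\mathbf{x}_n)), \qquad s^2 = \frac{1}{N_f} \sum_{n \in D} f_n L^2(y_n, d_T(\mathbf{x}_n)) - \bar{L}^2.$$

With priors $\pi(j)$, the risk estimate is $\sum_j \pi(j) \bar{L}_j$, with variance $\sum_j \pi(j)^2 s_j^2 / N_{f,j}$. The pooled form uses $\bar{L}$, with variance $s^2 / N_f$.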
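The class-wise and pooled statistics can be computed in one small helper, sketched below. It is assumed that the per-case losses $L(y_n, d_T(\mathbf{x}_n))$ have already been evaluated. The helper name and the boolean mask standing in for $I(y_n = j)$ are illustrative choices.

```python
def loss_stats(losses, f, mask=None):
    """Frequency-weighted mean loss and variance over the selected cases.

    losses: L(y_n, d_T(x_n)) for each case of D; f: frequency weights f_n.
    mask:   booleans playing the role of I(y_n = j), or None to use all cases.
    Returns (Lbar, s^2) with s^2 computed as the mean of L^2 minus Lbar^2.
    """
    if mask is None:
        mask = [True] * len(losses)
    n = sum(fn for fn, keep in zip(f, mask) if keep)   # N_{f,j} or N_f
    mean = sum(fn * ln for ln, fn, keep in zip(losses, f, mask) if keep) / n
    second = sum(fn * ln * ln for ln, fn, keep in zip(losses, f, mask) if keep) / n
    return mean, second - mean * mean


# Usage with made-up data and unit misclassification costs (L = 1 for an error).
y = [1, 1, 1, 2, 2]
f = [1, 1, 2, 1, 1]
losses = [0.0, 1.0, 0.0, 0.0, 1.0]
lbar_1, s2_1 = loss_stats(losses, f, [yn == 1 for yn in y])   # class j = 1
lbar, s2 = loss_stats(losses, f)                              # pooled over all cases
```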

Putting everything together we get

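$$R(T \mid D) = \begin{cases}
\dfrac{1}{N_f} \sum_{j} \sum_{t \in \tilde{T}} C(j^*(t) \mid j)\, N_{f,j}(t) & Y \text{ categorical}, M \\
\sum_{j} \pi(j) \sum_{t \in \tilde{T}} C(j^*(t) \mid j)\, \dfrac{N_{f,j}(t)}{N_{f,j}} & Y \text{ categorical, otherwise} \\
\dfrac{1}{N_f} \sum_{t \in \tilde{T}} \sum_{n \in D(t)} f_n (y_n - \bar{y}(t))^2 & Y \text{ continuous}
\end{cases}$$

and

$$\operatorname{Var}(R(T \mid D)) = \begin{cases}
\dfrac{1}{N_f} \left[ \dfrac{1}{N_f} \sum_{j} \sum_{t \in \tilde{T}} C^2(j^*(t) \mid j)\, N_{f,j}(t) - R(T \mid D)^2 \right] & Y \text{ categorical}, M \\
\sum_{j} \dfrac{\pi(j)^2}{N_{f,j}} \left[ \sum_{t \in \tilde{T}} C^2(j^*(t) \mid j)\, \dfrac{N_{f,j}(t)}{N_{f,j}} - \left( \sum_{t \in \tilde{T}} C(j^*(t) \mid j)\, \dfrac{N_{f,j}(t)}{N_{f,j}} \right)^2 \right] & Y \text{ categorical, otherwise} \\
\dfrac{1}{N_f} \left[ \dfrac{1}{N_f} \sum_{t \in \tilde{T}} \sum_{n \in D(t)} f_n (y_n - \bar{y}(t))^4 - R(T \mid D)^2 \right] & Y \text{ continuous}
\end{cases},$$

where $\tilde{T}$ is the set of terminal nodes of $T$ and

$$N_{f,j}(t) = \sum_{n \in D(t)} f_n I(y_n = j).$$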

The estimated standard error of $R(T \mid D)$ is given by $\operatorname{se}(R(T \mid D)) = \sqrt{\operatorname{Var}(R(T \mid D))}$.
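The tree-level formulas for $R(T \mid D)$, its variance and its standard error can be sketched from per-terminal-node counts. The sketch assumes that the counts $N_{f,j}(t)$ (or the rows $(y_n, f_n)$ of each $D(t)$) and the learning-sample assignments are already available. It also assumes every class has $N_{f,j} > 0$ in $D$. Passing `priors=None` selects the form written without $\pi(j)$ (case $M$). The dictionary layout and function names are assumptions.

```python
import math


def risk_categorical(counts, cost, assign, priors=None):
    """R(T|D) and Var(R(T|D)) for categorical Y from terminal-node class counts.

    counts: {t: {j: N_{f,j}(t)}} over the terminal nodes of T, computed on D
    cost:   {(i, j): C(i|j)}
    assign: {t: j*(t)}, the assignments made on the learning sample
    priors: {j: pi(j)}, or None for the form without priors (case M)
    """
    classes = sorted({j for c in counts.values() for j in c})
    n_fj = {j: sum(c.get(j, 0) for c in counts.values()) for j in classes}

    def moment(j, power):
        # Sum over terminal nodes of C(j*(t)|j)^power * N_{f,j}(t).
        return sum(cost[(assign[t], j)] ** power * c.get(j, 0) for t, c in counts.items())

    if priors is None:
        n_f = sum(n_fj.values())
        r = sum(moment(j, 1) for j in classes) / n_f
        return r, (sum(moment(j, 2) for j in classes) / n_f - r * r) / n_f
    r = sum(priors[j] * moment(j, 1) / n_fj[j] for j in classes)
    var = sum(priors[j] ** 2 * (moment(j, 2) / n_fj[j] - (moment(j, 1) / n_fj[j]) ** 2) / n_fj[j]
              for j in classes)
    return r, var


def risk_continuous(cases, means):
    """R(T|D) and its variance for continuous Y.

    cases: {t: [(y_n, f_n), ...]} for the cases of D in terminal node t
    means: {t: ybar(t)} taken from the learning sample
    """
    sq = [(fn, (y - means[t]) ** 2) for t, rows in cases.items() for y, fn in rows]
    n_f = sum(fn for fn, _ in sq)
    r = sum(fn * loss for fn, loss in sq) / n_f
    return r, (sum(fn * loss * loss for fn, loss in sq) / n_f - r * r) / n_f


def standard_error(var):
    """se(R(T|D)) = sqrt(Var(R(T|D)))."""
    return math.sqrt(var)


# Usage with made-up counts for two terminal nodes and unit costs.
counts = {"a": {1: 4, 2: 1}, "b": {1: 2, 2: 3}}
unit_cost = {(i, j): 0.0 if i == j else 1.0 for i in (1, 2) for j in (1, 2)}
assign = {"a": 1, "b": 2}
r_m, var_m = risk_categorical(counts, unit_cost, assign)
r_pi, var_pi = risk_categorical(counts, unit_cost, assign, {1: 0.5, 2: 0.5})
r_c, var_c = risk_continuous({"a": [(1.0, 1), (3.0, 1)], "b": [(10.0, 2)]}, {"a": 2.0, "b": 10.0})
```

With these counts, 3 of the 10 weighted cases are misclassified, so the form without priors gives $R = 0.3$. The equal-prior form weights the class-wise error rates $1/3$ and $1/4$ and gives $7/24$.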

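Risk estimation of a tree is often written as $R(T \mid D) = \sum_{t \in \tilde{T}} R(t \mid D)$, with $R(t \mid D)$ being the contribution from node $t$ to the tree risk, such that

$$R(t \mid D) = \begin{cases}
\sum_{j} C(j^*(t) \mid j)\, \dfrac{N_{f,j}(t)}{N_f} & Y \text{ categorical}, M \\
\sum_{j} \pi(j)\, C(j^*(t) \mid j)\, \dfrac{N_{f,j}(t)}{N_{f,j}} & Y \text{ categorical, otherwise} \\
\dfrac{1}{N_f} \sum_{n \in D(t)} f_n (y_n - \bar{y}(t))^2 & Y \text{ continuous}
\end{cases}.$$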
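The node contributions for categorical $Y$ can be sketched the same way. Summing them recovers the tree risk. The input layout mirrors the assumed dictionaries used for the tree-level formula, and `priors=None` again selects the form without $\pi(j)$ (case $M$).

```python
def node_contributions(counts, cost, assign, priors=None):
    """R(t|D) for each terminal node t when Y is categorical.

    counts: {t: {j: N_{f,j}(t)}} on D; cost: {(i, j): C(i|j)}; assign: {t: j*(t)}
    priors: {j: pi(j)}, or None for the form without priors (case M)
    """
    classes = sorted({j for c in counts.values() for j in c})
    n_fj = {j: sum(c.get(j, 0) for c in counts.values()) for j in classes}
    n_f = sum(n_fj.values())
    contrib = {}
    for t, c in counts.items():
        if priors is None:
            contrib[t] = sum(cost[(assign[t], j)] * c.get(j, 0) for j in classes) / n_f
        else:
            contrib[t] = sum(priors[j] * cost[(assign[t], j)] * c.get(j, 0) / n_fj[j]
                             for j in classes)
    return contrib


# Usage with made-up counts for two terminal nodes and unit costs.
counts = {"a": {1: 4, 2: 1}, "b": {1: 2, 2: 3}}
unit_cost = {(i, j): 0.0 if i == j else 1.0 for i in (1, 2) for j in (1, 2)}
assign = {"a": 1, "b": 2}
per_node = node_contributions(counts, unit_cost, assign, {1: 0.5, 2: 0.5})
tree_risk = sum(per_node.values())   # equals R(T|D)
```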

Resubstitution Estimate of the Risk of a tree T

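The resubstitution risk estimation method calculates the risk of $T$ on the same set of data (the learning sample $\mathfrak{L}$) that is used to grow the tree, i.e.

$$R(T) = R(T \mid \mathfrak{L}) = \sum_{t \in \tilde{T}} R(t), \qquad R(t) = R(t \mid \mathfrak{L}), \qquad \operatorname{Var}(R(T)) = \operatorname{Var}(R(T \mid \mathfrak{L})).$$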
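A small continuous-$Y$ sketch of resubstitution follows. The node means $\bar{y}(t)$ are fitted on $\mathfrak{L}$ with $w_n f_n$, and the risk is then evaluated on the same cases using only $f_n$. The function `node_of`, a fixed one-split partition, is a hypothetical stand-in for a grown tree $T$. It is not a tree-growing algorithm.

```python
def fit_means(cases, node_of):
    """ybar(t) from the learning sample: sum(w f y) / sum(w f) per node."""
    num, den = {}, {}
    for x, y, w, f in cases:
        t = node_of(x)
        num[t] = num.get(t, 0.0) + w * f * y
        den[t] = den.get(t, 0.0) + w * f
    return {t: num[t] / den[t] for t in num}


def risk_on(cases, node_of, means):
    """R(T|D) for continuous Y; case weights w_n are ignored, only f_n counts."""
    n_f = sum(f for _, _, _, f in cases)
    return sum(f * (y - means[node_of(x)]) ** 2 for x, y, _, f in cases) / n_f


def node_of(x):
    # Hypothetical stand-in for a grown tree T: one split at x <= 0.5.
    return "left" if x <= 0.5 else "right"


# Learning sample rows (x_n, y_n, w_n, f_n), made-up values.
learning = [(0.1, 1.0, 3.0, 1), (0.2, 3.0, 1.0, 1), (0.9, 5.0, 1.0, 2)]
means = fit_means(learning, node_of)
r_resub = risk_on(learning, node_of, means)   # R(T) = R(T | L)
```

The case weight of 3 on the first row pulls $\bar{y}(\text{left})$ down to 1.5. The risk itself weights that row only by its frequency weight of 1.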

Test Sample Estimate of the Risk

The idea of test sample risk estimation is that the whole data set is divided into two mutually exclusive subsets, $\mathfrak{L}$ and $\mathfrak{L}'$. $\mathfrak{L}$ is used as a learning sample to grow a tree $T$, and $\mathfrak{L}'$ is used as a test sample to check the accuracy of the tree. The test sample estimate is

$$R^{ts}(T) = R(T \mid \mathfrak{L}'), \qquad \operatorname{Var}(R^{ts}(T)) = \operatorname{Var}(R(T \mid \mathfrak{L}')).$$
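The sketch below shows the two pieces of the test sample method for continuous $Y$: a random partition of case indices into learning and test sets, and $R(T \mid \mathfrak{L}')$ with its variance. The random split, the `test_fraction` parameter and the fixed partition `node_of` (a hypothetical stand-in for the tree grown on $\mathfrak{L}$) are assumptions. The node means are made-up values for illustration.

```python
import random


def split_learning_test(n_cases, test_fraction, seed=0):
    """Randomly partition case indices into mutually exclusive learning and test sets."""
    idx = list(range(n_cases))
    random.Random(seed).shuffle(idx)
    n_test = round(n_cases * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def risk_on_test_sample(test_cases, node_of, means):
    """R^ts(T) = R(T|L') and its variance for continuous Y.

    test_cases: (x_n, y_n, f_n) rows of L'; means: {t: ybar(t)} fitted on L only.
    """
    losses = [(f, (y - means[node_of(x)]) ** 2) for x, y, f in test_cases]
    n_f = sum(f for f, _ in losses)
    r = sum(f * loss for f, loss in losses) / n_f
    return r, (sum(f * loss ** 2 for f, loss in losses) / n_f - r * r) / n_f


def node_of(x):
    # Hypothetical stand-in for the tree T grown on L: one split at x <= 0.5.
    return "left" if x <= 0.5 else "right"


learn_idx, test_idx = split_learning_test(10, 0.3)
# Suppose growing T on L gave these node means (made-up values).
r_ts, var_ts = risk_on_test_sample([(0.3, 2.5, 1), (0.7, 6.0, 1)], node_of,
                                   {"left": 2.0, "right": 5.0})
```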

Cross Validation Estimate of the Risk of a tree T

Cross validation estimation is provided only when a tree is grown using the automatic tree growing process. Let $T$ be a tree that has been grown using all data from the whole data set $\mathfrak{L}_0$. Let $V \ge 2$ be a positive integer.

Divide $\mathfrak{L}_0$ into $V$ mutually exclusive subsets $\mathfrak{L}'_v$, $v = 1, \ldots, V$. Let $\mathfrak{L}_v = \mathfrak{L}_0 - \mathfrak{L}'_v$, $v = 1, \ldots, V$.

For each $v$, consider $\mathfrak{L}_v$ as a learning sample and grow a tree $T_v$ on $\mathfrak{L}_v$, using the same set of user-specified stopping rules that was applied to grow $T$.

After $T_v$ is grown and the assignment $j^*_v(t)$ or $\bar{y}_v(t)$ for each node $t$ of $T_v$ is done, consider $\mathfrak{L}'_v$ as a test sample and calculate its test sample risk estimate $R^{ts}(T_v)$.
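The fold loop described above can be sketched as follows. The sketch stops at the per-fold estimates $R^{ts}(T_v)$. Cases are assigned to folds round-robin for reproducibility, which is an assumption, since the document does not say how $\mathfrak{L}_0$ is divided. `grow_stump` and `mse_risk` are hypothetical stand-ins for tree growing and for $R(T_v \mid \mathfrak{L}'_v)$ with unit frequency weights.

```python
def cv_fold_risks(cases, v_folds, grow, risk):
    """Test sample risk R^ts(T_v) for each of V folds.

    cases: rows of the whole data set L0
    grow:  callable fitting a tree on a learning sample (same stopping rules as T)
    risk:  callable (tree, test_rows) -> test sample risk estimate
    """
    if v_folds < 2:
        raise ValueError("V must be at least 2")
    risks = []
    for v in range(v_folds):
        test_v = [row for n, row in enumerate(cases) if n % v_folds == v]    # L'_v
        learn_v = [row for n, row in enumerate(cases) if n % v_folds != v]   # L_v = L0 - L'_v
        risks.append(risk(grow(learn_v), test_v))
    return risks


def grow_stump(rows):
    # Hypothetical stand-in for tree growing: a fixed split at x <= 0.5 with node means.
    sums, counts = {}, {}
    for x, y in rows:
        t = x <= 0.5
        sums[t] = sums.get(t, 0.0) + y
        counts[t] = counts.get(t, 0) + 1
    return {t: sums[t] / counts[t] for t in sums}


def mse_risk(means, rows):
    # Unit frequency weights: R(T_v | L'_v) is the mean squared error on the held-out fold.
    return sum((y - means[x <= 0.5]) ** 2 for x, y in rows) / len(rows)


data = [(n / 10, 1.0 + n % 3) for n in range(10)]   # made-up (x, y) rows
fold_risks = cv_fold_risks(data, 3, grow_stump, mse_risk)
```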
