Decision Tree Classifiers, Lecture notes of Machine Learning

Indian Institute of Technology Machine Learning

Decision Tree Classifiers lecture notes. CS 419

Typology: Lecture notes

2017/2018

Uploaded on 09/07/2018

mohd-safwan 🇮🇳

Decision tree learning

Sunita Sarawagi

IIT Bombay

http://www.it.iitb.ac.in/~sunita

Decision tree classifiers

Widely used learning method.
Easy to interpret: can be re-represented as if-then-else rules.
Approximates a function by piecewise-constant regions.
Does not require any prior knowledge of the data distribution; works well on noisy data.

Has been applied to:
- classifying medical patients by disease,
- classifying equipment malfunctions by cause,
- classifying loan applicants by likelihood of payment,
- many other applications.

A decision tree is a tree where internal nodes are simple decision rules on one or more attributes and leaf nodes are predicted class labels.

Decision trees

[Figure: example decision tree with internal nodes “Salary < 1 M”, “Prof = teaching”, and “Age < 30”; leaves are labeled Good or Bad.]

Training Dataset

age     income   student   credit_rating   buys_computer
<=30    high     no        fair            no
<=30    high     no        excellent       no
31…40   high     no        fair            yes
>40     medium   no        fair            yes
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31…40   low      yes       excellent       yes
<=30    medium   no        fair            no
<=30    low      yes       fair            yes
>40     medium   yes       fair            yes
<=30    medium   yes       excellent       yes
31…40   medium   no        excellent       yes
31…40   high     yes       fair            yes
>40     medium   no        excellent       no

This follows an example from Quinlan’s ID3.

Weather Data: Play or not Play?

Outlook    Temperature   Humidity   Windy   Play?
sunny      hot           high       false   No
sunny      hot           high       true    No
overcast   hot           high       false   Yes
rain       mild          high       false   Yes
rain       cool          normal     false   Yes
rain       cool          normal     true    No
overcast   cool          normal     true    Yes
sunny      mild          high       false   No
sunny      cool          normal     false   Yes
rain       mild          normal     false   Yes
sunny      mild          normal     true    Yes
overcast   mild          high       true    Yes
overcast   hot           normal     false   Yes
rain       mild          high       true    No

Note: Outlook is the forecast; no relation to the Microsoft email program.

Example Tree for “Play?”

Outlook
    sunny: Humidity
        high: No
        normal: Yes
    overcast: Yes
    rain: Windy
        true: No
        false: Yes
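Since a decision tree can be re-represented as if-then-else rules, the example tree can be sketched as a small function. This is only an illustration: the function name `predict_play` and its argument types are hypothetical, and the branch layout (sunny tests Humidity, rain tests Windy, overcast is always Yes) is the reading of the figure that agrees with all 14 rows of the weather data.

```python
def predict_play(outlook: str, humidity: str, windy: bool) -> str:
    """Classify one day with the example tree, written as if-then-else rules."""
    if outlook == "sunny":
        # The sunny branch tests Humidity.
        return "No" if humidity == "high" else "Yes"
    if outlook == "overcast":
        # The overcast branch is a leaf.
        return "Yes"
    if outlook == "rain":
        # The rain branch tests Windy.
        return "No" if windy else "Yes"
    raise ValueError(f"unknown outlook value: {outlook!r}")


# First and third rows of the weather table.
print(predict_play("sunny", "high", False))
print(predict_play("overcast", "high", False))
```

Temperature does not appear because this particular tree never tests it.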

Tree learning algorithms

ID3 (Quinlan 1986)
C4.5, the successor to ID3 (Quinlan 1993)
CART
SLIQ (Mehta et al.)
SPRINT (Shafer et al.)

Basic algorithm for tree building

Greedy top-down construction.

Gen_Tree(node, data)
    If node should be made a leaf: stop.
    Find the best attribute and the best split on that attribute (selection criteria).
    Partition data on the split condition.
    For each child j of node: Gen_Tree(node_j, data_j).
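The greedy top-down procedure can be sketched in Python as below. This is a minimal sketch under several assumptions that the slide does not fix: attributes are categorical, each split is multiway (one child per attribute value), the selection criterion is entropy-based information gain, and a node becomes a leaf when it is pure or no attributes remain. The names `gen_tree` and `entropy` and the toy rows are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gen_tree(rows, attributes, target):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = [r[target] for r in rows]
    # Make the node a leaf if it is pure or there is nothing left to split on.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):
        # Impurity before the split minus the weighted impurity after it.
        after = 0.0
        for value in set(r[attr] for r in rows):
            subset = [r[target] for r in rows if r[attr] == value]
            after += len(subset) / len(rows) * entropy(subset)
        return entropy(labels) - after

    # Greedy step: pick the attribute with the highest gain.
    best = max(attributes, key=gain)
    remaining = [a for a in attributes if a != best]
    children = {}
    # Partition the data on the chosen attribute and recurse on each part.
    for value in sorted(set(r[best] for r in rows)):
        subset = [r for r in rows if r[best] == value]
        children[value] = gen_tree(subset, remaining, target)
    return {best: children}


# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
toy_rows = [
    {"outlook": "sunny", "windy": "false", "play": "No"},
    {"outlook": "sunny", "windy": "true", "play": "No"},
    {"outlook": "overcast", "windy": "false", "play": "Yes"},
    {"outlook": "overcast", "windy": "true", "play": "Yes"},
]
toy_tree = gen_tree(toy_rows, ["outlook", "windy"], "play")
print(toy_tree)
```

Real implementations usually add further stopping rules (minimum node size, maximum depth) and pruning, which this sketch leaves out.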

Measures of impurity

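Entropy: $\mathrm{Entropy}(S) = -\sum_{i=1}^{K} p_i \log p_i$
Gini: $\mathrm{Gini}(S) = 1 - \sum_{i=1}^{K} p_i^2$

Here $p_i$ is the fraction of examples in $S$ that belong to class $i$, and $K$ is the number of classes.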
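The two impurity measures can be sketched as short Python functions over a list of class proportions. The function names are illustrative, and the base-2 logarithm is an assumption (it gives entropy in bits, as used in the weather example); terms with zero probability are skipped, which treats 0*log(0) as zero.

```python
from math import log2


def entropy(probs):
    """Entropy in bits of a class distribution given as proportions."""
    # Skip p == 0 so that 0 * log(0) counts as zero.
    return -sum(p * log2(p) for p in probs if p > 0)


def gini(probs):
    """Gini index of a class distribution given as proportions."""
    return 1.0 - sum(p * p for p in probs)


# Both measures are largest for an even split and zero for a pure node.
print(entropy([0.5, 0.5]), gini([0.5, 0.5]))
print(entropy([1.0, 0.0]), gini([1.0, 0.0]))
```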

Information gain

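Information gain on partitioning $S$ into $r$ subsets $S_1, \dots, S_r$:
Impurity(S) minus the weighted sum of the impurity of each subset.

With entropy as the impurity measure:

$\mathrm{Gain}(S, S_1, \dots, S_r) = \mathrm{Entropy}(S) - \sum_{j=1}^{r} \frac{|S_j|}{|S|} \mathrm{Entropy}(S_j)$

The same form applies with Gini in place of entropy.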
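As a rough illustration of "impurity of S minus the weighted impurity of the subsets", the sketch below takes per-class counts for the parent and for each subset. The names `information_gain`, `entropy`, and `gini` are hypothetical, and the base-2 logarithm is an assumption.

```python
from math import log2


def entropy(counts):
    """Entropy in bits from per-class counts; zero counts are skipped."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def gini(counts):
    """Gini index from per-class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def information_gain(parent, subsets, impurity=entropy):
    """Impurity of the parent minus the size-weighted impurity of the subsets."""
    total = sum(parent)
    weighted = sum(sum(s) / total * impurity(s) for s in subsets)
    return impurity(parent) - weighted


# A split that separates the classes perfectly removes all impurity;
# a split that leaves each subset as mixed as the parent gains nothing.
print(information_gain([2, 2], [[2, 0], [0, 2]]))
print(information_gain([2, 2], [[1, 1], [1, 1]]))
print(information_gain([2, 2], [[2, 0], [0, 2]], impurity=gini))
```

Passing the impurity function as a parameter mirrors the slide's point that the gain has the same form for either measure.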

Information gain: example

$K = 2$, $|S| = 100$, $p_1 = 0.6$, $p_2 = 0.4$
$E(S) = -0.6 \log(0.6) - 0.4 \log(0.4) = 0.\ldots$

$|S_1| = 70$, $p_1 = 0.8$, $p_2 = 0.2$
$E(S_1) = -0.8 \log(0.8) - 0.2 \log(0.2) = 0.\ldots$

$|S_2| = 30$, $p_1 = 0.13$, $p_2 = 0.87$
$E(S_2) = -0.13 \log(0.13) - 0.87 \log(0.87) = 0.\ldots$

Information gain: $E(S) - (0.7\,E(S_1) + 0.3\,E(S_2)) = 0.\ldots$
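The arithmetic in this example can be checked with a few lines of Python. The example does not state the base of the logarithm, so the sketch below assumes base 2 (bits), matching the weather example; the `base` parameter and all names are illustrative, and a different base rescales every value by the same factor.

```python
from math import log


def entropy(probs, base=2):
    """Entropy of a class distribution; base 2 is an assumption, not from the slide."""
    return -sum(p * log(p, base) for p in probs if p > 0)


e_s = entropy([0.6, 0.4])     # whole set S, |S| = 100
e_s1 = entropy([0.8, 0.2])    # subset of size 70
e_s2 = entropy([0.13, 0.87])  # subset of size 30

# Weights are the subset sizes divided by |S|: 70/100 and 30/100.
gain = e_s - (0.7 * e_s1 + 0.3 * e_s2)

print(f"E(S)  = {e_s:.3f}")
print(f"E(S1) = {e_s1:.3f}")
print(f"E(S2) = {e_s2:.3f}")
print(f"gain  = {gain:.3f}")
```

Under the base-2 assumption the gain comes out at roughly 0.30 bits.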

Example: attribute “Outlook”

“Outlook” = “Sunny”:
$\mathrm{info}([2,3]) = \mathrm{entropy}(2/5, 3/5) = -\frac{2}{5}\log\frac{2}{5} - \frac{3}{5}\log\frac{3}{5} = 0.971$ bits

“Outlook” = “Overcast”:
$\mathrm{info}([4,0]) = \mathrm{entropy}(1, 0) = -1\log(1) - 0\log(0) = 0$ bits

“Outlook” = “Rainy”:
$\mathrm{info}([3,2]) = \mathrm{entropy}(3/5, 2/5) = -\frac{3}{5}\log\frac{3}{5} - \frac{2}{5}\log\frac{2}{5} = 0.971$ bits

Note: log(0) is not defined, but we evaluate 0*log(0) as zero.

Expected information for attribute:
$\mathrm{info}([2,3],[4,0],[3,2]) = \frac{5}{14} \times 0.971 + \frac{4}{14} \times 0 + \frac{5}{14} \times 0.971 = 0.693$ bits

witten&eibe
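The “Outlook” computation can be reproduced from the [yes, no] counts in each branch. This is a small sketch, assuming base-2 logarithms (the results are in bits); the function names `info` and `expected_info` are chosen to echo the notation above and are otherwise hypothetical.

```python
from math import log2


def info(counts):
    """Entropy in bits from class counts, e.g. info([2, 3])."""
    total = sum(counts)
    # Zero counts are skipped, which evaluates 0 * log(0) as zero.
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def expected_info(subsets):
    """Size-weighted average of info over the subsets of a split."""
    total = sum(sum(s) for s in subsets)
    return sum(sum(s) / total * info(s) for s in subsets)


# [yes, no] counts for Outlook = sunny, overcast, rainy.
outlook = [[2, 3], [4, 0], [3, 2]]

for counts in outlook:
    print(counts, f"{info(counts):.3f} bits")
print("expected:", f"{expected_info(outlook):.4f} bits")
```

The expected value printed here is computed from unrounded entropies, so its last digit can differ slightly from a hand calculation that uses the rounded 0.971.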

Computing the information gain

Information gain = (information before split) - (information after split)

$\mathrm{gain}(\text{Outlook}) = \mathrm{info}([9,5]) - \mathrm{info}([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247$ bits

Information gain for attributes from the weather data:

gain("Outlook")     = 0.247 bits
gain("Temperature") = 0.029 bits
gain("Humidity")    = 0.152 bits
gain("Windy")       = 0.048 bits
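The four gains can be recomputed directly from the 14-row weather table. The sketch below is illustrative only: the tuple layout, the `COLUMNS` and `ROWS` names, and the helper functions are assumptions, and logarithms are taken base 2 so that results are in bits.

```python
from collections import Counter, defaultdict
from math import log2

COLUMNS = ("outlook", "temperature", "humidity", "windy", "play")
ROWS = [
    ("sunny", "hot", "high", False, "No"),
    ("sunny", "hot", "high", True, "No"),
    ("overcast", "hot", "high", False, "Yes"),
    ("rain", "mild", "high", False, "Yes"),
    ("rain", "cool", "normal", False, "Yes"),
    ("rain", "cool", "normal", True, "No"),
    ("overcast", "cool", "normal", True, "Yes"),
    ("sunny", "mild", "high", False, "No"),
    ("sunny", "cool", "normal", False, "Yes"),
    ("rain", "mild", "normal", False, "Yes"),
    ("sunny", "mild", "normal", True, "Yes"),
    ("overcast", "mild", "high", True, "Yes"),
    ("overcast", "hot", "normal", False, "Yes"),
    ("rain", "mild", "high", True, "No"),
]


def info(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain(rows, column):
    """Information before the split minus information after splitting on column."""
    idx = COLUMNS.index(column)
    groups = defaultdict(list)
    for row in rows:
        groups[row[idx]].append(row[-1])  # class label is the last field
    after = sum(len(g) / len(rows) * info(g) for g in groups.values())
    return info([row[-1] for row in rows]) - after


gains = {column: gain(ROWS, column) for column in COLUMNS[:-1]}
best = max(gains, key=gains.get)

for column, value in gains.items():
    print(f'gain("{column}") = {value:.3f} bits')
print("best attribute:", best)
```

The attribute with the largest gain is the one the greedy algorithm would place at the root.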
