Decision Trees: Top-Down Construction and Attribute Selection, Slides of Computer Fundamentals

An overview of decision trees, a popular machine learning method for classification tasks. The slides cover top-down construction of decision trees, the importance of choosing the right splitting attribute, and the use of information gain and gain ratio as attribute-selection criteria. They also include an example decision tree built from weather data.

Typology: Slides

2012/2013

Uploaded on 01/29/2013

ashu

Classification:
Decision Trees
Docsity.com

Outline

• Top-Down Decision Tree Construction
• Choosing the Splitting Attribute
• Information Gain and Gain Ratio

Weather Data: Play or not Play?

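| Outlook  | Temperature | Humidity | Windy | Play? |
|----------|-------------|----------|-------|-------|
| sunny    | hot         | high     | false | No    |
| sunny    | hot         | high     | true  | No    |
| overcast | hot         | high     | false | Yes   |
| rain     | mild        | high     | false | Yes   |
| rain     | cool        | normal   | false | Yes   |
| rain     | cool        | normal   | true  | No    |
| overcast | cool        | normal   | true  | Yes   |
| sunny    | mild        | high     | false | No    |
| sunny    | cool        | normal   | false | Yes   |
| rain     | mild        | normal   | false | Yes   |
| sunny    | mild        | normal   | true  | Yes   |
| overcast | mild        | high     | true  | Yes   |
| overcast | hot         | normal   | false | Yes   |
| rain     | mild        | high     | true  | No    |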

Note: Outlook is the weather forecast; no relation to the Microsoft email program.

Example Tree for “Play?”

Outlook
  sunny → Humidity
    high → No
    normal → Yes
  overcast → Yes
  rain → Windy
    true → No
    false → Yes
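To make the example tree concrete, here is a minimal Python sketch that encodes it as nested conditionals and checks it against the 14 weather records. The function name `predict_play` and the tuple layout are illustrative choices, not part of the slides.

```python
# The 14 weather records from the slides:
# (outlook, temperature, humidity, windy, play)
WEATHER = [
    ("sunny", "hot", "high", False, "No"),
    ("sunny", "hot", "high", True, "No"),
    ("overcast", "hot", "high", False, "Yes"),
    ("rain", "mild", "high", False, "Yes"),
    ("rain", "cool", "normal", False, "Yes"),
    ("rain", "cool", "normal", True, "No"),
    ("overcast", "cool", "normal", True, "Yes"),
    ("sunny", "mild", "high", False, "No"),
    ("sunny", "cool", "normal", False, "Yes"),
    ("rain", "mild", "normal", False, "Yes"),
    ("sunny", "mild", "normal", True, "Yes"),
    ("overcast", "mild", "high", True, "Yes"),
    ("overcast", "hot", "normal", False, "Yes"),
    ("rain", "mild", "high", True, "No"),
]


def predict_play(outlook, humidity, windy):
    """Walk the example tree from the root (Outlook) down to a leaf."""
    if outlook == "overcast":
        return "Yes"
    if outlook == "sunny":
        # The sunny branch tests Humidity.
        return "No" if humidity == "high" else "Yes"
    # Remaining branch (rain) tests Windy.
    return "No" if windy else "Yes"


# Collect every record the tree misclassifies.
mismatches = [
    row for row in WEATHER
    if predict_play(row[0], row[2], row[3]) != row[4]
]
print(len(mismatches))
```

Note that Temperature is never tested: the tree classifies all 14 training records correctly using only Outlook, Humidity and Windy.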

Choosing the Splitting Attribute

• At each node, available attributes are evaluated on the basis of separating the classes of the training examples. A goodness function is used for this purpose.

• Typical goodness functions:
  - information gain (ID3/C4.5)
  - information gain ratio
  - Gini index

witten&eibe

Which attribute to select?

Computing information

• Information is measured in bits.

• Given a probability distribution, the information required to predict an event is the distribution’s entropy.

• Entropy gives the information required in bits (this can involve fractions of bits!).

• Formula for computing the entropy:

$$\mathrm{entropy}(p_1, p_2, \ldots, p_n) = -p_1 \log p_1 - p_2 \log p_2 - \cdots - p_n \log p_n$$
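The entropy formula can be sketched in a few lines of Python. This is a minimal illustration. It assumes base-2 logarithms (since the result is in bits) and treats terms with zero probability as contributing nothing; the function name `entropy` simply mirrors the slide notation.

```python
import math


def entropy(*probs):
    """Entropy in bits of a discrete distribution given as probabilities.

    Terms with p == 0 are skipped, i.e. 0 * log(0) is taken to be 0.
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)


fair_coin = entropy(0.5, 0.5)      # two equally likely outcomes
certain = entropy(1, 0)            # outcome known in advance
two_to_three = entropy(2 / 5, 3 / 5)

print(fair_coin, certain, round(two_to_three, 3))
```

A fair coin needs exactly one bit, a certain outcome needs none, and a 2:3 split needs a fractional amount just under one bit.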

Claude Shannon

Born: 30 April 1916
Died: 24 February 2001

“Father of information theory”

Claude Shannon, who has died aged 84, perhaps more than anyone laid the groundwork for today’s digital revolution. His exposition of information theory, stating that all information could be represented mathematically as a succession of noughts and ones, facilitated the digital manipulation of data without which today’s information society would be unthinkable. Shannon’s master’s thesis, obtained in 1940 at MIT, demonstrated that problem solving could be achieved by manipulating the symbols 0 and 1 in a process that could be carried out automatically with electrical circuitry. That dissertation has been hailed as one of the most significant master’s theses of the 20th century. Eight years later, Shannon published another landmark paper, A Mathematical Theory of Communication, generally taken as his most important scientific contribution.

Shannon applied the same radical approach to cryptography research, in which he later became a consultant to the US government. Many of Shannon’s pioneering insights were developed before they could be applied in practical form. He was truly a remarkable man, yet unknown to most of the world.

Example: attribute “Outlook”, 2

• “Outlook” = “Sunny”:

$$\mathrm{info}([2,3]) = \mathrm{entropy}(2/5, 3/5) = -\tfrac{2}{5}\log\tfrac{2}{5} - \tfrac{3}{5}\log\tfrac{3}{5} = 0.971 \text{ bits}$$

• “Outlook” = “Overcast”:

$$\mathrm{info}([4,0]) = \mathrm{entropy}(1, 0) = -1\log 1 - 0\log 0 = 0 \text{ bits}$$

Note: log(0) is not defined, but we evaluate 0 × log(0) as zero.

• “Outlook” = “Rainy”:

$$\mathrm{info}([3,2]) = \mathrm{entropy}(3/5, 2/5) = -\tfrac{3}{5}\log\tfrac{3}{5} - \tfrac{2}{5}\log\tfrac{2}{5} = 0.971 \text{ bits}$$

• Expected information for attribute:

$$\mathrm{info}([2,3],[4,0],[3,2]) = \tfrac{5}{14} \times 0.971 + \tfrac{4}{14} \times 0 + \tfrac{5}{14} \times 0.971 = 0.693 \text{ bits}$$
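The arithmetic for “Outlook” can be checked with a short Python sketch that works directly from the [yes, no] class counts. The helper names `info` and `expected_info` are illustrative, and base-2 logarithms are assumed. Computed at full precision the weighted sum comes out near 0.6935, which the slide reports as 0.693.

```python
import math


def info(counts):
    """Entropy in bits of a list of class counts, e.g. [2, 3]."""
    total = sum(counts)
    # Empty classes are skipped: 0 * log(0) is taken to be 0.
    return sum(-c / total * math.log2(c / total) for c in counts if c > 0)


def expected_info(subsets):
    """Average of the subsets' entropies, weighted by subset size."""
    n = sum(sum(s) for s in subsets)
    return sum(sum(s) / n * info(s) for s in subsets)


# [yes, no] counts for Outlook = sunny, overcast, rainy
outlook = [[2, 3], [4, 0], [3, 2]]

per_branch = [info(s) for s in outlook]
after_split = expected_info(outlook)
print([round(x, 3) for x in per_branch], after_split)
```

The pure “overcast” branch contributes nothing, so the expected information is carried entirely by the two mixed branches.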

Computing the information gain

• Information gain:

(information before split) - (information after split)

$$\mathrm{gain}(\text{Outlook}) = \mathrm{info}([9,5]) - \mathrm{info}([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 \text{ bits}$$

• Compute for attribute “Humidity”

Computing the information gain

• Information gain:

(information before split) - (information after split)

$$\mathrm{gain}(\text{Outlook}) = \mathrm{info}([9,5]) - \mathrm{info}([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 \text{ bits}$$

• Information gain for attributes from weather data:

gain("Outlook") = 0.247 bits
gain("Temperature") = 0.029 bits
gain("Humidity") = 0.152 bits
gain("Windy") = 0.048 bits
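The four gains can be reproduced from the raw weather table. The following Python sketch is one way to do it, assuming base-2 logarithms; the names `WEATHER`, `info` and `gain` are illustrative and not taken from the slides.

```python
import math
from collections import Counter, defaultdict

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]

# The 14 weather records; the last field is the class (Play?).
WEATHER = [
    ("sunny", "hot", "high", False, "No"),
    ("sunny", "hot", "high", True, "No"),
    ("overcast", "hot", "high", False, "Yes"),
    ("rain", "mild", "high", False, "Yes"),
    ("rain", "cool", "normal", False, "Yes"),
    ("rain", "cool", "normal", True, "No"),
    ("overcast", "cool", "normal", True, "Yes"),
    ("sunny", "mild", "high", False, "No"),
    ("sunny", "cool", "normal", False, "Yes"),
    ("rain", "mild", "normal", False, "Yes"),
    ("sunny", "mild", "normal", True, "Yes"),
    ("overcast", "mild", "high", True, "Yes"),
    ("overcast", "hot", "normal", False, "Yes"),
    ("rain", "mild", "high", True, "No"),
]


def info(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-n / total * math.log2(n / total)
               for n in Counter(labels).values())


def gain(rows, attr_index):
    """Information before the split minus expected information after it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row[-1])
    after = sum(len(g) / len(rows) * info(g) for g in groups.values())
    return info([row[-1] for row in rows]) - after


gains = {name: round(gain(WEATHER, i), 3) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
print(gains, best)
```

“Outlook” has the largest gain, which is why it sits at the root of the example tree.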

Continuing to split

Within the “Outlook” = “sunny” branch (5 examples: 2 Yes, 3 No), the remaining attributes give:

gain("Temperature") = 0.571 bits
gain("Humidity") = 0.971 bits
gain("Windy") = 0.020 bits
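These three values match the five training examples with Outlook = sunny, which can be verified with a short Python sketch. The subset is copied from the weather table; the helper names are illustrative and base-2 logarithms are assumed.

```python
import math
from collections import Counter, defaultdict

ATTRIBUTES = ["Temperature", "Humidity", "Windy"]

# The five records with Outlook = sunny:
# (temperature, humidity, windy, play)
SUNNY = [
    ("hot", "high", False, "No"),
    ("hot", "high", True, "No"),
    ("mild", "high", False, "No"),
    ("cool", "normal", False, "Yes"),
    ("mild", "normal", True, "Yes"),
]


def info(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-n / total * math.log2(n / total)
               for n in Counter(labels).values())


def gain(rows, attr_index):
    """Information before the split minus expected information after it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row[-1])
    after = sum(len(g) / len(rows) * info(g) for g in groups.values())
    return info([row[-1] for row in rows]) - after


sunny_gains = {name: round(gain(SUNNY, i), 3)
               for i, name in enumerate(ATTRIBUTES)}
print(sunny_gains)
```

Humidity splits the sunny subset into two pure groups, so its gain equals the full 0.971 bits of the subset and it becomes the test on that branch.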

Highly-branching attributes

• Problematic: attributes with a large number of values (extreme case: ID code)

• Subsets are more likely to be pure if there is a large number of values
  ⇒ Information gain is biased towards choosing attributes with a large number of values
  ⇒ This may result in overfitting (selection of an attribute that is non-optimal for prediction)

Weather Data with ID code

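| ID | Outlook  | Temperature | Humidity | Windy | Play? |
|----|----------|-------------|----------|-------|-------|
| A  | sunny    | hot         | high     | false | No    |
| B  | sunny    | hot         | high     | true  | No    |
| C  | overcast | hot         | high     | false | Yes   |
| D  | rain     | mild        | high     | false | Yes   |
| E  | rain     | cool        | normal   | false | Yes   |
| F  | rain     | cool        | normal   | true  | No    |
| G  | overcast | cool        | normal   | true  | Yes   |
| H  | sunny    | mild        | high     | false | No    |
| I  | sunny    | cool        | normal   | false | Yes   |
| J  | rain     | mild        | normal   | false | Yes   |
| K  | sunny    | mild        | normal   | true  | Yes   |
| L  | overcast | mild        | high     | true  | Yes   |
| M  | overcast | hot         | normal   | false | Yes   |
| N  | rain     | mild        | high     | true  | No    |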
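A short Python sketch shows why the ID code is a problem. Every ID value isolates a single record, so each subset is pure and the gain equals the full 0.940 bits of the data set. The sketch also computes a gain ratio, assumed here to be the gain divided by the entropy of the subset sizes (the usual C4.5 definition). That formula does not appear in this excerpt, so treat it and the helper names as assumptions.

```python
import math


def info(counts):
    """Entropy in bits of a list of counts; empty classes contribute 0."""
    total = sum(counts)
    return sum(-c / total * math.log2(c / total) for c in counts if c > 0)


def gain_and_ratio(subsets):
    """Return (information gain, gain ratio) for a split.

    `subsets` holds one [yes, no] count pair per attribute value.
    The gain ratio (assumed definition) divides the gain by the entropy
    of the subset sizes, penalising splits into many small subsets.
    """
    n = sum(sum(s) for s in subsets)
    before = info([sum(col) for col in zip(*subsets)])
    after = sum(sum(s) / n * info(s) for s in subsets)
    split_info = info([sum(s) for s in subsets])
    g = before - after
    return g, g / split_info


# ID code: one record per value, A..N, as [yes, no] counts.
plays = "No No Yes Yes Yes No Yes No Yes Yes Yes Yes Yes No".split()
id_code = [[1, 0] if p == "Yes" else [0, 1] for p in plays]
outlook = [[2, 3], [4, 0], [3, 2]]   # sunny, overcast, rainy

id_gain, id_ratio = gain_and_ratio(id_code)
outlook_gain, outlook_ratio = gain_and_ratio(outlook)
print(round(id_gain, 3), round(outlook_gain, 3))
print(round(id_ratio, 3), round(outlook_ratio, 3))
```

Under these assumptions the ID code's gain (about 0.940 bits) dwarfs that of “Outlook” (about 0.247 bits), even though an ID code cannot help classify unseen records. Dividing by the split information shrinks the ID code's advantage considerably, but on this small data set the ID code still scores higher than “Outlook”.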