






















An overview of decision trees, a popular machine learning algorithm for classification tasks. The slides cover the top-down construction of decision trees, the importance of choosing the right splitting attribute, and the use of information gain and gain ratio as criteria for attribute selection. They also include example decision trees and their application to the weather data.
Typology: Slides























Docsity.com
Outlook   Temperature  Humidity  Windy  Play?
sunny     hot          high      false  No
sunny     hot          high      true   No
overcast  hot          high      false  Yes
rain      mild         high      false  Yes
rain      cool         normal    false  Yes
rain      cool         normal    true   No
overcast  cool         normal    true   Yes
sunny     mild         high      false  No
sunny     cool         normal    false  Yes
rain      mild         normal    false  Yes
sunny     mild         normal    true   Yes
overcast  mild         high      true   Yes
overcast  hot          normal    false  Yes
rain      mild         high      true   No
Note: Outlook is the weather forecast; it has no relation to the Microsoft email program.
Outlook
  sunny    -> Humidity
                high   -> No
                normal -> Yes
  overcast -> Yes
  rain     -> Windy
                true   -> No
                false  -> Yes
information gain (ID3/C4.5)
information gain ratio
Gini index
witten&eibe Docsity.com
Given a probability distribution, the info required to predict an event is the distribution's entropy
Entropy gives the information required in bits (this can involve fractions of bits!)
$$\text{entropy}(p_1, p_2, \ldots, p_n) = -p_1 \log_2 p_1 - p_2 \log_2 p_2 - \cdots - p_n \log_2 p_n$$
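As a small illustration of the entropy formula, the sketch below evaluates it with base-2 logarithms so that the result is in bits. The function name `entropy` mirrors the notation on the slide; treating zero-probability terms as contributing nothing is an assumption made here so the function is defined for every distribution.

```python
from math import log2


def entropy(*probs):
    """Entropy in bits of a discrete distribution given as probabilities."""
    # Terms with p == 0 are skipped, i.e. 0 * log(0) is treated as 0.
    return sum(-p * log2(p) for p in probs if p > 0)


print(entropy(0.5, 0.5))               # 1.0 (a fair coin needs one full bit)
print(round(entropy(0.25, 0.75), 3))   # 0.811 (a fraction of a bit)
```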
Claude Shannon, who has died aged 84, perhaps more than anyone laid the groundwork for today’s digital revolution. His exposition of information theory, stating that all information could be represented mathematically as a succession of noughts and ones, facilitated the digital manipulation of data without which today’s information society would be unthinkable. Shannon’s master’s thesis, obtained in 1940 at MIT, demonstrated that problem solving could be achieved by manipulating the symbols 0 and 1 in a process that could be carried out automatically with electrical circuitry. That dissertation has been hailed as one of the most significant master’s theses of the 20th century. Eight years later, Shannon published another landmark paper, A Mathematical Theory of Communication, generally taken as his most important scientific contribution.
Born: 30 April 1916 Died: 24 February 2001
“Father of information theory”
Shannon applied the same radical approach to cryptography research, in which he later became a consultant to the US government. Many of Shannon’s pioneering insights were developed before they could be applied in practical form. He was truly a remarkable man, yet unknown to most of the world.
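Outlook = sunny:
$$\text{info}([2,3]) = \text{entropy}(2/5, 3/5) = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} = 0.971 \text{ bits}$$

Outlook = overcast:
$$\text{info}([4,0]) = \text{entropy}(1, 0) = -1\log_2 1 - 0\log_2 0 = 0 \text{ bits}$$

Outlook = rain:
$$\text{info}([3,2]) = \text{entropy}(3/5, 2/5) = -\tfrac{3}{5}\log_2\tfrac{3}{5} - \tfrac{2}{5}\log_2\tfrac{2}{5} = 0.971 \text{ bits}$$

Note: log(0) is not defined, but we evaluate 0 × log(0) as zero.

Expected information for the attribute (each subset weighted by its share of the 14 instances):
$$\text{info}([2,3],[4,0],[3,2]) = \tfrac{5}{14}\times 0.971 + \tfrac{4}{14}\times 0 + \tfrac{5}{14}\times 0.971 = 0.693 \text{ bits}$$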
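The weighted average above can be checked numerically. The following is a sketch only: `info` takes class counts as on the slide, while `weighted_info` is a hypothetical helper name for the expected information after a split.

```python
from math import log2


def info(counts):
    """Entropy in bits of a class distribution given as counts, e.g. [2, 3]."""
    total = sum(counts)
    # Zero counts are skipped, i.e. 0 * log(0) is evaluated as 0.
    return sum(-c / total * log2(c / total) for c in counts if c > 0)


def weighted_info(subsets):
    """Expected information after a split: each subset's info weighted by its size."""
    total = sum(sum(s) for s in subsets)
    return sum(sum(s) / total * info(s) for s in subsets)


print(f"{info([2, 3]):.3f}")                              # 0.971
print(f"{info([4, 0]):.3f}")                              # 0.000
print(f"{weighted_info([[2, 3], [4, 0], [3, 2]]):.4f}")   # 0.6935
```

At full precision the weighted value is about 0.6935 bits, which the slides report as 0.693.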
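Information gain is the information before the split minus the information after it:
$$\text{gain}(\text{Outlook}) = \text{info}([9,5]) - \text{info}([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 \text{ bits}$$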
gain("Outlook") = 0.247 bits
gain("Temperature") = 0.029 bits
gain("Humidity") = 0.152 bits
gain("Windy") = 0.048 bits
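The four gains can be reproduced directly from the weather data. The sketch below is one possible way to do it; the tuple layout of `ROWS` and the helper names `info` and `gain` are choices made for this example.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ["Outlook", "Temperature", "Humidity", "Windy"]
# Weather data from the slides; the last field is the class (Play?).
ROWS = [
    ("sunny", "hot", "high", "false", "No"),
    ("sunny", "hot", "high", "true", "No"),
    ("overcast", "hot", "high", "false", "Yes"),
    ("rain", "mild", "high", "false", "Yes"),
    ("rain", "cool", "normal", "false", "Yes"),
    ("rain", "cool", "normal", "true", "No"),
    ("overcast", "cool", "normal", "true", "Yes"),
    ("sunny", "mild", "high", "false", "No"),
    ("sunny", "cool", "normal", "false", "Yes"),
    ("rain", "mild", "normal", "false", "Yes"),
    ("sunny", "mild", "normal", "true", "Yes"),
    ("overcast", "mild", "high", "true", "Yes"),
    ("overcast", "hot", "normal", "false", "Yes"),
    ("rain", "mild", "high", "true", "No"),
]


def info(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-n / total * log2(n / total) for n in Counter(labels).values())


def gain(rows, col):
    """Information gain of splitting rows on the attribute in column col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])
    after = sum(len(g) / len(rows) * info(g) for g in groups.values())
    return info([row[-1] for row in rows]) - after


for i, name in enumerate(ATTRS):
    print(f"gain({name}) = {gain(ROWS, i):.3f} bits")
```

Outlook has the largest gain, so it becomes the root of the tree.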
Continuing the split within the Outlook = sunny branch (5 instances):
gain("Temperature") = 0.571 bits
gain("Humidity") = 0.971 bits
gain("Windy") = 0.020 bits
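Repeating the attribute selection inside each branch gives the top-down construction. The sketch below is a simplified ID3-style recursion, not the full ID3 or C4.5 algorithm (no pruning, no handling of missing values); `build`, `gain`, and the dict-based tree format are illustrative assumptions.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ["Outlook", "Temperature", "Humidity", "Windy"]
DATA = """sunny hot high false No|sunny hot high true No|overcast hot high false Yes
rain mild high false Yes|rain cool normal false Yes|rain cool normal true No
overcast cool normal true Yes|sunny mild high false No|sunny cool normal false Yes
rain mild normal false Yes|sunny mild normal true Yes|overcast mild high true Yes
overcast hot normal false Yes|rain mild high true No"""
ROWS = [tuple(r.split()) for r in DATA.replace("\n", "|").split("|")]


def info(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-n / total * log2(n / total) for n in Counter(labels).values())


def gain(rows, col):
    """Information gain of splitting rows on the attribute in column col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])
    after = sum(len(g) / len(rows) * info(g) for g in groups.values())
    return info([row[-1] for row in rows]) - after


def build(rows, cols):
    """Recursively split on the attribute with the highest information gain."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:        # pure subset: make a leaf
        return labels[0]
    if not cols:                     # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(cols, key=lambda c: gain(rows, c))
    rest = [c for c in cols if c != best]
    return {ATTRS[best]: {
        value: build([r for r in rows if r[best] == value], rest)
        for value in sorted({r[best] for r in rows})
    }}


tree = build(ROWS, list(range(len(ATTRS))))
print(tree)
```

On the weather data this yields Outlook at the root, Humidity under sunny, Windy under rain, and a pure Yes leaf under overcast.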
⇒ Information gain is biased towards choosing attributes with a large number of values
⇒ This may result in overfitting (selection of an attribute that is non-optimal for prediction)
ID  Outlook   Temperature  Humidity  Windy  Play?
A   sunny     hot          high      false  No
B   sunny     hot          high      true   No
C   overcast  hot          high      false  Yes
D   rain      mild         high      false  Yes
E   rain      cool         normal    false  Yes
F   rain      cool         normal    true   No
G   overcast  cool         normal    true   Yes
H   sunny     mild         high      false  No
I   sunny     cool         normal    false  Yes
J   rain      mild         normal    false  Yes
K   sunny     mild         normal    true   Yes
L   overcast  mild         high      true   Yes
M   overcast  hot          normal    false  Yes
N   rain      mild         high      true   No
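The ID column shows the bias concretely: every ID value isolates a single instance, so each subset is pure and the gain equals the full information of the data. The sketch below computes this, along with the gain ratio in its usual form (gain divided by the entropy of the subset sizes). The slide defining gain ratio is not part of this excerpt, so that definition and the helper names are assumptions.

```python
from math import log2


def info(counts):
    """Entropy in bits of a distribution given as counts."""
    total = sum(counts)
    return sum(-c / total * log2(c / total) for c in counts if c > 0)


def gain(parent, subsets):
    """Information gain given [yes, no] counts before and after a split."""
    total = sum(parent)
    return info(parent) - sum(sum(s) / total * info(s) for s in subsets)


def gain_ratio(parent, subsets):
    """Gain divided by the entropy of the subset sizes (assumed definition)."""
    return gain(parent, subsets) / info([sum(s) for s in subsets])


# Class labels for rows A..N of the table above (Y = Yes, N = No).
labels = "NNYYYNYNYYYYYN"
parent = [labels.count("Y"), labels.count("N")]          # [9, 5]
id_code = [[1, 0] if c == "Y" else [0, 1] for c in labels]
outlook = [[2, 3], [4, 0], [3, 2]]

print(f"gain(ID code) = {gain(parent, id_code):.3f} bits")
print(f"gain(Outlook) = {gain(parent, outlook):.3f} bits")
print(f"gain ratio(ID code) = {gain_ratio(parent, id_code):.3f}")
print(f"gain ratio(Outlook) = {gain_ratio(parent, outlook):.3f}")
```

The ID code attains the maximum possible gain (about 0.940 bits) even though it is useless for predicting new instances. Dividing by the split entropy penalizes its 14-way split heavily while reducing the value for Outlook far less.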