Version Spaces - Artificial Intelligence - Lecture Slides, Slides of Artificial Intelligence

West Bengal University of Animal and Fishery Sciences Artificial Intelligence

Some concepts of Artificial Intelligence are Agents and Problem Solving, Autonomy, Programs, Classical and Modern Planning, First-Order Logic, Resolution Theorem Proving, Search Strategies, Structure Learning. Main points of this lecture are: Version Spaces, Decision Trees, Machine Learning, Variables, Data Type Definition, Binary, Supervised Learning Problem, Describe the General Concept, Forecast, Sport.

Typology: Slides

2012/2013

Uploaded on 04/29/2013

shantii 🇮🇳


Lecture 35 of 41

Machine Learning:

Version Spaces and Decision Trees


Example:

Learning A Concept (EnjoySport) from Data

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
0        Sunny  Warm     Normal    Strong  Warm   Same      Yes
1        Sunny  Warm     High      Strong  Warm   Same      Yes
2        Rainy  Cold     High      Strong  Warm   Change    No
3        Sunny  Warm     High      Strong  Cool   Change    Yes

Specification for Training Examples
- Similar to a data type definition
- 6 variables (aka attributes, features): Sky, AirTemp, Humidity, Wind, Water, Forecast
- Nominal-valued (symbolic) attributes - enumerative data type

• Binary (Boolean-Valued or H-Valued) Concept

Supervised Learning Problem: Describe the General Concept

Typical Concept Learning Tasks

Given
- Instances X: possible days, each described by attributes Sky, AirTemp, Humidity, Wind, Water, Forecast
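- Target function $c \equiv \mathrm{EnjoySport}: X \to \{0, 1\}$, where $X \equiv \{\mathrm{Rainy}, \mathrm{Sunny}\} \times \{\mathrm{Warm}, \mathrm{Cold}\} \times \{\mathrm{Normal}, \mathrm{High}\} \times \{\mathrm{None}, \mathrm{Mild}, \mathrm{Strong}\} \times \{\mathrm{Cool}, \mathrm{Warm}\} \times \{\mathrm{Same}, \mathrm{Change}\}$
- Hypotheses H: conjunctions of literals (e.g., <?, Cold, High, ?, ?, ?>)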
- Training examples D : positive and negative examples of the target function
Determine
- Hypothesis h ∈ H such that h(x) = c(x) for all x ∈ D
- Such h are consistent with the training data
Training Examples
- Assumption: no missing X values
- Noise in values of c (contradictory labels)?

Training examples: $D = \{\langle x_1, c(x_1) \rangle, \ldots, \langle x_m, c(x_m) \rangle\}$
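To make the task concrete, here is a minimal Python sketch (not part of the slides) that encodes the four EnjoySport examples as tuples and a conjunctive hypothesis as a tuple of constraints, where "?" accepts any value and the string "Ø" accepts none. The names `matches`, `is_consistent` and `TRAINING_DATA` are illustrative assumptions.

```python
from typing import Sequence, Tuple

# An instance: (Sky, AirTemp, Humidity, Wind, Water, Forecast).
Instance = Tuple[str, ...]
# A hypothesis: one constraint per attribute, either a specific value,
# "?" (any value is acceptable) or "Ø" (no value is acceptable).
Hypothesis = Tuple[str, ...]

TRAINING_DATA = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

def matches(h: Hypothesis, x: Instance) -> bool:
    """h(x) = 1 iff every attribute constraint in h is satisfied by x."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def is_consistent(h: Hypothesis, data: Sequence[Tuple[Instance, bool]]) -> bool:
    """h is consistent with D iff h(x) = c(x) for every <x, c(x)> in D."""
    return all(matches(h, x) == label for x, label in data)

print(is_consistent(("Sunny", "Warm", "?", "Strong", "?", "?"), TRAINING_DATA))
```

Here `<Sunny, Warm, ?, Strong, ?, ?>` is consistent with all four examples, while the all-"?" hypothesis is not, because it also covers the negative example 2.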

Inductive Learning Hypothesis

Fundamental Assumption of Inductive Learning
Informal Statement
- Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples
- Definitions deferred: sufficiently large, approximate well, unobserved
Formal Statements, Justification, Analysis
- Statistical (Mitchell, Chapter 5; statistics textbook)
- Probabilistic (R&N, Chapters 14-15 and 19; Mitchell, Chapter 6)
- Computational (R&N, Section 18.6; Mitchell, Chapter 7)
More on This Topic: Machine Learning and Pattern Recognition (CIS732)
Next: How to Find This Hypothesis?

Find-S Algorithm

1. Initialize h to the most specific hypothesis in H

H : the hypothesis space (partially ordered set under relation Less-Specific-Than )

2. For each positive training instance x

For each attribute constraint a_i in h
  IF the constraint a_i in h is satisfied by x THEN do nothing
  ELSE replace a_i in h by the next more general constraint that is satisfied by x

3. Output hypothesis h
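The three steps can be sketched in Python as follows, assuming conjunctive hypotheses over nominal attributes (constraints are a value, "?" or "Ø"). For such a constraint the "next more general constraint satisfied by x" is the observed value when the constraint is Ø, and "?" when it is a different value. The function name `find_s` is a hypothetical choice.

```python
from typing import List, Sequence, Tuple

Instance = Tuple[str, ...]
Hypothesis = Tuple[str, ...]  # per attribute: a value, "?" (any value) or "Ø" (no value)

def find_s(data: Sequence[Tuple[Instance, bool]]) -> Hypothesis:
    """Find-S for conjunctive hypotheses over nominal attributes."""
    n_attrs = len(data[0][0])
    h: List[str] = ["Ø"] * n_attrs              # 1. most specific hypothesis <Ø, ..., Ø>
    for x, positive in data:
        if not positive:
            continue                            # 2. only positive instances are used
        for i, value in enumerate(x):
            if h[i] == "?" or h[i] == value:
                continue                        # constraint already satisfied: do nothing
            # next more general constraint satisfied by x: Ø -> value, value -> "?"
            h[i] = value if h[i] == "Ø" else "?"
    return tuple(h)                             # 3. output h

TRAINING_DATA = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
print(find_s(TRAINING_DATA))
```

On the EnjoySport data this returns `('Sunny', 'Warm', '?', 'Strong', '?', '?')`, the same final hypothesis as in the Find-S trace.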

Hypothesis Space Search by Find-S

Instances X:
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +
x3 = <Rainy, Cold, High, Strong, Warm, Change>, -
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +

Hypotheses H:
h1 = <Ø, Ø, Ø, Ø, Ø, Ø>
h2 = <Sunny, Warm, Normal, Strong, Warm, Same>
h3 = <Sunny, Warm, ?, Strong, Warm, Same>
h4 = <Sunny, Warm, ?, Strong, Warm, Same>
h5 = <Sunny, Warm, ?, Strong, ?, ?>

Shortcomings of Find-S
- Can’t tell whether it has learned the concept
- Can’t tell when the training data are inconsistent
- Picks a maximally specific h (why?)
- Depending on H, there might be several!

[Figure: Find-S search path through H (h0 to h4), driven by instances x1 to x4 in X]

Candidate Elimination Algorithm [1]

1. Initialization

G ← (singleton) set containing the most general hypothesis in H, denoted {<?, … , ?>}
S ← set of most specific hypotheses in H, denoted {<Ø, … , Ø>}

2. For each training example d

If d is a positive example (Update-S)
  Remove from G any hypotheses inconsistent with d
  For each hypothesis s in S that is not consistent with d
    Remove s from S
    Add to S all minimal generalizations h of s such that
      h is consistent with d
      Some member of G is more general than h
    (These are the greatest lower bounds, or meets, s ∧ d, in VS_H,D)
  Remove from S any hypothesis that is more general than another hypothesis in S (remove any dominated elements)

Candidate Elimination Algorithm [2]

(continued)

If d is a negative example (Update-G)
  Remove from S any hypotheses inconsistent with d
  For each hypothesis g in G that is not consistent with d
    Remove g from G
    Add to G all minimal specializations h of g such that
      h is consistent with d
      Some member of S is more specific than h
    (These are the least upper bounds, or joins, g ∨ d, in VS_H,D)
  Remove from G any hypothesis that is less general than another hypothesis in G (remove any dominating elements)
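Both update rules can be sketched in Python, assuming conjunctive hypotheses as tuples ("?" = any value, "Ø" = no value) and the attribute domains listed under Typical Concept Learning Tasks. In this H each s has a single minimal generalization covering x, and the minimal specializations of g fix one "?" to a value that differs from x. All helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Hyp = Tuple[str, ...]  # per attribute: a value, "?" (any value) or "Ø" (no value)

# Attribute domains as listed on the slides (Sky, AirTemp, Humidity, Wind, Water, Forecast).
DOMAINS = [
    ("Rainy", "Sunny"), ("Warm", "Cold"), ("Normal", "High"),
    ("None", "Mild", "Strong"), ("Cool", "Warm"), ("Same", "Change"),
]

def matches(h: Hyp, x: Tuple[str, ...]) -> bool:
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h1: Hyp, h2: Hyp) -> bool:
    """True if h1 covers every instance that h2 covers."""
    if "Ø" in h2:
        return True                      # h2 covers no instance at all
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def min_generalization(s: Hyp, x: Tuple[str, ...]) -> Hyp:
    """The minimal generalization of conjunctive s that covers x."""
    return tuple(v if c == "Ø" else (c if c in ("?", v) else "?") for c, v in zip(s, x))

def min_specializations(g: Hyp, x: Tuple[str, ...]) -> List[Hyp]:
    """Minimal specializations of g that exclude x: fix one '?' to a value other than x's."""
    return [g[:i] + (val,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for val in DOMAINS[i] if val != x[i]]

def candidate_elimination(data: Sequence[Tuple[Tuple[str, ...], bool]]):
    G: List[Hyp] = [("?",) * len(DOMAINS)]
    S: List[Hyp] = [("Ø",) * len(DOMAINS)]
    for x, positive in data:
        if positive:  # Update-S
            G = [g for g in G if matches(g, x)]
            new_S = []
            for s in S:
                if matches(s, x):
                    new_S.append(s)
                    continue
                h = min_generalization(s, x)
                if any(more_general_or_equal(g, h) for g in G):
                    new_S.append(h)
            new_S = list(dict.fromkeys(new_S))   # drop duplicates, keep order
            # drop any s that is more general than another member of S
            S = [s for s in new_S
                 if not any(t != s and more_general_or_equal(s, t) for t in new_S)]
        else:  # Update-G
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                new_G.extend(h for h in min_specializations(g, x)
                             if any(more_general_or_equal(h, s) for s in S))
            new_G = list(dict.fromkeys(new_G))
            # drop any g that is less general than another member of G
            G = [g for g in new_G
                 if not any(t != g and more_general_or_equal(t, g) for t in new_G)]
    return S, G

TRAINING_DATA = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(TRAINING_DATA)
print("S =", S)
print("G =", G)
```

On the four EnjoySport examples this ends with S = {<Sunny, Warm, ?, Strong, ?, ?>} and G = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}. After the negative example, G briefly also contains <?, ?, ?, ?, ?, Same>, which the last positive example removes.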

An Unbiased Learner

Example of A Biased H
- Conjunctive concepts with don’t cares
- What concepts can H not express? (Hint: what are its syntactic limitations?)
Idea
- Choose H’ that expresses every teachable concept
- i.e., H’ is the power set of X
- Recall: $|A \to B| = |B|^{|A|}$ ($A = X$; $B = \{\text{labels}\}$; $H’ = A \to B$)
- $\{\{\mathrm{Rainy}, \mathrm{Sunny}\} \times \{\mathrm{Warm}, \mathrm{Cold}\} \times \{\mathrm{Normal}, \mathrm{High}\} \times \{\mathrm{None}, \mathrm{Mild}, \mathrm{Strong}\} \times \{\mathrm{Cool}, \mathrm{Warm}\} \times \{\mathrm{Same}, \mathrm{Change}\}\} \to \{0, 1\}$
An Exhaustive Hypothesis Language
- Consider: H’ = disjunctions (∨), conjunctions (∧), negations (¬) over previous H
- $|H’| = 2^{(2 \cdot 2 \cdot 2 \cdot 3 \cdot 2 \cdot 2)} = 2^{96}$; $|H| = 1 + (3 \cdot 3 \cdot 3 \cdot 4 \cdot 3 \cdot 3) = 973$
What Are S, G For The Hypothesis Language H’?
- S ≡ disjunction of all positive examples
- G ≡ conjunction of all negated negative examples
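The two hypothesis-space sizes can be checked with a few lines of Python, assuming the per-attribute value counts from the domain listing (2 for Sky, AirTemp, Humidity, Water and Forecast; 3 for Wind). In the conjunctive H each attribute is a specific value or "?", and every hypothesis containing Ø denotes the same empty concept, which accounts for the "1 +".

```python
import math

# Values per attribute: Sky, AirTemp, Humidity, Wind, Water, Forecast
VALUES_PER_ATTRIBUTE = [2, 2, 2, 3, 2, 2]

num_instances = math.prod(VALUES_PER_ATTRIBUTE)          # |X|
unbiased_size = 2 ** num_instances                        # |H'| = 2^|X| (all subsets of X)
# Conjunctive H: k values + "?" per attribute, plus one empty (Ø) concept.
conjunctive_size = 1 + math.prod(k + 1 for k in VALUES_PER_ATTRIBUTE)

print(num_instances, conjunctive_size, unbiased_size)
```

This gives |X| = 96, |H| = 973 and |H’| = 2^96, matching the counts above (`math.prod` needs Python 3.8 or later).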

Decision Trees

Classifiers: Instances (Unlabeled Examples)
Internal Nodes: Tests for Attribute Values
- Typical: equality test (e.g., “Wind = ?”)
- Inequality, other tests possible
Branches: Attribute Values
- One-to-one correspondence (e.g., “Wind = Strong”, “Wind = Light”)
Leaves: Assigned Classifications (Class Labels)
Representational Power: Propositional Logic ( Why? )

Outlook?
  Sunny → Humidity?
    High → No
    Normal → Yes
  Overcast → Maybe
  Rain → Wind?
    Strong → No
    Light → Maybe

Decision Tree for Concept PlayTennis
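Classifying an instance with such a tree means following, at each internal node, the branch whose value matches the instance's value for the tested attribute until a leaf is reached. A minimal Python sketch, assuming a nested-tuple representation and the branch layout Sunny → Humidity?, Overcast → Maybe, Rain → Wind? for the PlayTennis tree (both are illustrative assumptions):

```python
from typing import Dict, Tuple, Union

# A tree is either a class label (leaf) or (attribute, {value: subtree}).
Tree = Union[str, Tuple[str, dict]]

PLAY_TENNIS: Tree = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Maybe",
    "Rain": ("Wind", {"Strong": "No", "Light": "Maybe"}),
})

def classify(tree: Tree, instance: Dict[str, str]) -> str:
    """Walk from the root, testing one attribute per internal node."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree

print(classify(PLAY_TENNIS, {"Outlook": "Sunny", "Humidity": "High", "Wind": "Light"}))
```

Only the attributes on the path are consulted: the Sunny instance above is labeled No from Outlook and Humidity alone, and its Wind value is never read.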

Decision Tree Learning:

Top-Down Induction (ID3)

A1: [29+, 35-] → True: [21+, 5-], False: [8+, 30-]
A2: [29+, 35-] → True: [18+, 33-], False: [11+, 2-]

Algorithm Build-DT (Examples, Attributes)

IF all examples have the same label THEN RETURN (leaf node with label)
ELSE IF set of attributes is empty THEN RETURN (leaf with majority label)
ELSE
  Choose best attribute A as root
  FOR each value v of A
    Create a branch out of the root for the condition A = v
    IF {x ∈ Examples : x.A = v} = Ø THEN attach (leaf with majority label of Examples) to the branch
    ELSE attach Build-DT ({x ∈ Examples : x.A = v}, Attributes ~ {A}) to the branch
  RETURN (root)
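A minimal recursive Python sketch of Build-DT follows. It leaves "choose best attribute" as a pluggable parameter `choose`; ID3 would use information gain there, while the `first_attribute` chooser below is a hypothetical stand-in used only to keep the example short. The example data are the EnjoySport rows restricted to three attributes, and the function assumes `examples` is non-empty.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple, Union

Example = Tuple[Dict[str, str], str]        # (attribute -> value, class label)
Tree = Union[str, Tuple[str, dict]]         # a leaf label or (attribute, {value: subtree})
Chooser = Callable[[Sequence[Example], List[str]], str]

def majority_label(examples: Sequence[Example]) -> str:
    return Counter(label for _, label in examples).most_common(1)[0][0]

def build_dt(examples: Sequence[Example], attributes: List[str],
             values: Dict[str, List[str]], choose: Chooser) -> Tree:
    """Recursive Build-DT; assumes `examples` is non-empty."""
    labels = {label for _, label in examples}
    if len(labels) == 1:
        return labels.pop()                        # all examples have the same label
    if not attributes:
        return majority_label(examples)            # no attributes left to test
    a = choose(examples, attributes)               # "best" attribute becomes the root
    remaining = [b for b in attributes if b != a]  # Attributes ~ {A}
    branches = {}
    for v in values[a]:
        subset = [(x, y) for x, y in examples if x[a] == v]
        if subset:
            branches[v] = build_dt(subset, remaining, values, choose)
        else:
            branches[v] = majority_label(examples)  # empty branch: majority leaf
    return (a, branches)

# Hypothetical chooser for illustration only; ID3 would rank by information gain.
def first_attribute(examples: Sequence[Example], attributes: List[str]) -> str:
    return attributes[0]

VALUES = {"Sky": ["Rainy", "Sunny"], "AirTemp": ["Warm", "Cold"], "Humidity": ["Normal", "High"]}
DATA = [
    ({"Sky": "Sunny", "AirTemp": "Warm", "Humidity": "Normal"}, "Yes"),
    ({"Sky": "Sunny", "AirTemp": "Warm", "Humidity": "High"}, "Yes"),
    ({"Sky": "Rainy", "AirTemp": "Cold", "Humidity": "High"}, "No"),
    ({"Sky": "Sunny", "AirTemp": "Warm", "Humidity": "High"}, "Yes"),
]
print(build_dt(DATA, ["Sky", "AirTemp", "Humidity"], VALUES, first_attribute))
```

Splitting on Sky first yields `('Sky', {'Rainy': 'No', 'Sunny': 'Yes'})`; starting from Humidity instead needs a second test on AirTemp under the High branch, which is why the choice of root attribute matters.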

But Which Attribute Is Best?

Choosing the “Best” Root Attribute

Objective
- Construct a decision tree that is as small as possible (Occam’s Razor)
- Subject to: consistency with labels on training data
Obstacles
- Finding the minimal consistent hypothesis (i.e., decision tree) is NP-hard (D’oh!)
- Recursive algorithm ( Build-DT )
  - A greedy heuristic search for a simple tree
  - Cannot guarantee optimality (D’oh!)
Main Decision: Next Attribute to Condition On
- Want: attributes that split examples into sets that are relatively pure in one label
- Result: closer to a leaf node
- Most popular heuristic
  - Developed by J. R. Quinlan
  - Based on information gain
  - Used in ID3 algorithm

Entropy:

Information Theoretic Definition

Components
- D: a set of examples {<x1, c(x1)>, <x2, c(x2)>, …, <xm, c(xm)>}
- p+ = Pr(c(x) = +), p- = Pr(c(x) = -)
Definition
- H is defined over a probability distribution p
- D contains examples whose frequency of + and - labels indicates p+ and p- for the observed data
- The entropy of D relative to c is: $H(D) \equiv -p_+ \log_b(p_+) - p_- \log_b(p_-)$
What Units is H Measured In?
- Depends on the base b of the log (bits for b = 2, nats for b = e, etc.)
- A single bit is required to encode each example in the worst case (p+ = 0.5)
- If there is less uncertainty (e.g., p+ = 0.8), we can use less than 1 bit per example on average
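The entropy definition translates directly into Python. This sketch assumes D is summarized by its counts of + and - labels and uses the standard convention 0 log 0 = 0 (not stated on the slide), so a pure set has entropy 0.

```python
import math

def entropy(pos: int, neg: int, base: float = 2.0) -> float:
    """H(D) = -p+ log_b(p+) - p- log_b(p-), with 0 log 0 taken as 0."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:                                  # skip empty classes (0 log 0 = 0)
            p = count / total
            h -= p * math.log(p) / math.log(base)
    return h

print(entropy(1, 1))                  # worst case, p+ = 0.5
print(entropy(4, 1))                  # p+ = 0.8
print(entropy(1, 1, base=math.e))     # same worst case measured in nats
```

For p+ = 0.5 this gives 1 bit, for p+ = 0.8 about 0.72 bits, and in base e the worst case is ln 2 ≈ 0.69 nats.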

Information Gain:

Information Theoretic Definition

Partitioning on Attribute Values
- Recall: a partition of D is a collection of disjoint subsets whose union is D
- Goal: measure the uncertainty removed by splitting on the value of attribute A
Definition
- The information gain of D relative to attribute A is the expected reduction in entropy due to splitting (“sorting”) on A:

$\mathrm{Gain}(D, A) \equiv H(D) - \sum_{v \in \mathrm{values}(A)} \frac{|D_v|}{|D|} H(D_v)$

where $D_v = \{x \in D : x.A = v\}$, the set of examples in D where attribute A has value v

Idea: partition on A; scale entropy to the size of each subset D_v
Which Attribute Is Best?

A1: [29+, 35-] → True: [21+, 5-], False: [8+, 30-]
A2: [29+, 35-] → True: [18+, 33-], False: [11+, 2-]
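Applying the gain formula to the two candidate splits of [29+, 35-] answers "Which Attribute Is Best?" numerically. A minimal Python sketch, assuming each subset is summarized by its (+, -) counts:

```python
import math
from typing import Sequence, Tuple

Counts = Tuple[int, int]  # (number of + examples, number of - examples)

def entropy(pos: int, neg: int) -> float:
    """Entropy in bits, with 0 log 0 taken as 0."""
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

def gain(parent: Counts, children: Sequence[Counts]) -> float:
    """Gain(D, A) = H(D) - sum over v of |D_v|/|D| * H(D_v)."""
    n = sum(parent)
    remainder = sum((p + q) / n * entropy(p, q) for p, q in children)
    return entropy(*parent) - remainder

gain_a1 = gain((29, 35), [(21, 5), (8, 30)])
gain_a2 = gain((29, 35), [(18, 33), (11, 2)])
print(f"Gain(D, A1) = {gain_a1:.3f}, Gain(D, A2) = {gain_a2:.3f}")
```

With H(D) ≈ 0.994, the split on A1 removes about 0.27 bits of uncertainty versus about 0.12 bits for A2, so the information-gain heuristic would place A1 at the root.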
