Introduction to Machine Learning - Concepts Learning - Slides | CSI 5325, Papers of Computer Science

Material Type: Paper; Professor: Hamerly; Class: Introduction to Machine Learning; Subject: Computer Science; University: Baylor University; Term: Fall 2008;


Intro. to machine learning (CSI 5325)

Lecture 2: concept learning

Greg Hamerly

Fall 2008

Some content from Tom Mitchell.

Outline

1 Course administration

2 Concept learning

3 Learning from examples

4 General-to-specific ordering over hypotheses

5 Version spaces and candidate elimination algorithm

Course administration

Course adjustments

Seven assignment dates have been posted.

Material/reading may still be adjusted.

New component to the course: paper reading/presentation.

Presentation worth 1 assignment (so ≤ 8 assignments total).

Who wants to do the first paper?

Concept learning

The definition of concept learning

Concept learning is learning a function which has a boolean-valued output.

f : X → { 0 , 1 }

Many machine learning approaches use this simplistic binary view of the world.

Aside: multiclass → binary class reductions.

Learning from examples

Training examples for EnjoySport

Sky    Temp  Humid   Wind    Water  Forecast  EnjoySport
Sunny  Warm  Normal  Strong  Warm   Same      Yes
Sunny  Warm  High    Strong  Warm   Same      Yes
Rainy  Cold  High    Strong  Warm   Change    No
Sunny  Warm  High    Strong  Cool   Change    Yes

What is the general concept?

Representing Hypotheses

Many possible representations...

Here, h is a conjunction of constraints on attributes.

Each constraint can be:
  a specific value (e.g., “Water = Warm”)
  don’t care (e.g., “Water = ?”)
  no value allowed (e.g., “Water = ∅”)

For example, over the attributes (Sky, AirTemp, Humid, Wind, Water, Forecast):

〈Sunny, ?, ?, Strong, ?, Same〉
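As a rough illustration of this representation, a hypothesis can be stored as a tuple of constraints and applied to an instance one attribute at a time. The names `ANY`, `EMPTY`, and `matches` are hypothetical choices for this sketch, not part of the lecture; `None` is assumed to stand in for ∅.

```python
ANY = "?"      # "don't care" constraint
EMPTY = None   # assumed stand-in for the "no value allowed" constraint (∅)

def matches(h, x):
    """Return True if instance x satisfies every constraint in hypothesis h."""
    # An EMPTY constraint never equals ANY or an attribute value, so it rejects x.
    return all(c == ANY or c == v for c, v in zip(h, x))

# Attribute order: Sky, AirTemp, Humid, Wind, Water, Forecast
h = ("Sunny", ANY, ANY, "Strong", ANY, "Same")
x_pos = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
x_neg = ("Rainy", "Cold", "High", "Strong", "Warm", "Change")

print(matches(h, x_pos))             # True
print(matches(h, x_neg))             # False
print(matches((EMPTY,) * 6, x_pos))  # False: the all-∅ hypothesis covers nothing
```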

Prototypical Concept Learning Task

Given:
  Instances X: possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast
  Target function c: EnjoySport : X → {0, 1}
  Hypotheses H: conjunctions of literals, e.g. 〈?, Cold, High, ?, ?, ?〉
  Training examples D: positive and negative examples of the target function: 〈x_1, c(x_1)〉, ..., 〈x_m, c(x_m)〉

Determine: a hypothesis h ∈ H where h(x) = c(x) for all x ∈ D.

The inductive learning hypothesis

The inductive learning hypothesis: Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

Noiseless learning

Today, assume there are no mistakes (aka noise) in the examples.

I.e., all attribute values and labels are correct for every example, and we expect a ‘good’ hypothesis to be correct on every training example.

We would like learning algorithms to be robust to noise (why?).

But with noise, we should expect even a ‘good’ hypothesis to make some ‘mistakes’ on the training examples (why?).

General-to-specific ordering over hypotheses

Instance, hypothesis, and more-general-than relation

[Figure: instances X (left) and hypotheses H (right), with H ordered from specific to general]

x_1 = <Sunny, Warm, High, Strong, Cool, Same>
x_2 = <Sunny, Warm, High, Light, Warm, Same>

h_1 = <Sunny, ?, ?, Strong, ?, ?>
h_2 = <Sunny, ?, ?, ?, ?, ?>
h_3 = <Sunny, ?, ?, ?, Cool, ?>

‘More-general’ relations

Let h_j and h_k be boolean-valued functions defined over X.

More-general-than-or-equal-to:

(h_j ≥_g h_k) ↔ (∀x ∈ X)[(h_k(x) = 1) → (h_j(x) = 1)]

Strictly more-general-than:

(h_j >_g h_k) ↔ (h_j ≥_g h_k) ∧ (h_k ≱_g h_j)
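For conjunctive hypotheses, the relation can be checked attribute by attribute instead of quantifying over all of X. The sketch below assumes hypotheses are tuples with "?" for don't-care and `None` for ∅, and that every attribute has at least two possible values; the function names are hypothetical.

```python
def more_general_or_equal(hj, hk):
    """Check hj >=_g hk for conjunctive hypotheses encoded as tuples."""
    if None in hk:
        # hk covers no instance, so the implication holds vacuously.
        return True
    # Otherwise each constraint of hj must be "?" or identical to hk's.
    return all(a == "?" or a == b for a, b in zip(hj, hk))

def strictly_more_general(hj, hk):
    """Check hj >_g hk."""
    return more_general_or_equal(hj, hk) and not more_general_or_equal(hk, hj)

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
h3 = ("Sunny", "?", "?", "?", "Cool", "?")

print(strictly_more_general(h2, h1))  # True
print(strictly_more_general(h2, h3))  # True
print(more_general_or_equal(h1, h3))  # False
print(more_general_or_equal(h3, h1))  # False
```

The last two results show that ≥_g is only a partial order: h_1 and h_3 are not comparable.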

Find-S Algorithm

1 Initialize h to the most specific hypothesis in H
2 For each positive training instance x
    For each attribute constraint a_i in h
      If the constraint a_i in h is satisfied by x
        Then do nothing
        Else replace a_i in h by the next more general constraint that is satisfied by x
3 Output hypothesis h
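The steps above can be sketched in Python for the conjunctive representation. This is a minimal sketch, not the course's reference code: the tuple encoding ("?" for don't-care, `None` for ∅) and the name `find_s` are assumptions, and the data is the EnjoySport training table.

```python
def find_s(examples, n_attributes):
    """Find-S for conjunctive hypotheses; examples are (instance, label) pairs."""
    h = [None] * n_attributes          # most specific hypothesis: all ∅
    for x, positive in examples:
        if not positive:
            continue                   # Find-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] is None:
                h[i] = value           # ∅ generalizes to the observed value
            elif h[i] != "?" and h[i] != value:
                h[i] = "?"             # conflicting values generalize to "?"
    return tuple(h)

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

print(find_s(examples, 6))
# ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```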

Hypothesis Space Search by Find-S

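[Figure: Find-S search through the hypothesis space, moving from specific to general, on the EnjoySport training examples]

x_1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +
x_2 = <Sunny, Warm, High, Strong, Warm, Same>, +
x_3 = <Rainy, Cold, High, Strong, Warm, Change>, -
x_4 = <Sunny, Warm, High, Strong, Cool, Change>, +

h_0 = <∅, ∅, ∅, ∅, ∅, ∅>
h_1 = <Sunny, Warm, Normal, Strong, Warm, Same>
h_2 = <Sunny, Warm, ?, Strong, Warm, Same>
h_3 = <Sunny, Warm, ?, Strong, Warm, Same>
h_4 = <Sunny, Warm, ?, Strong, ?, ?>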

Complaints about Find-S

Can’t tell whether it has learned the concept.

Can’t tell when the training data are inconsistent.

Picks a maximally specific h.

Depending on H, there might be several!

Why?

General-to-specific?

Discuss: could we move from the most general hypothesis towards more specific ones?

Version spaces and candidate elimination algorithm

Version Spaces

To enable a more complete search of H, we turn to version spaces.

Version spaces define all the hypotheses which fit the training data.

This allows a more complete picture of the approximation to the target concept than does looking for a single hypothesis.

Some definitions

A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example 〈x, c(x)〉 in D.

Consistent(h, D) ≡ (∀〈x, c(x)〉 ∈ D) h(x) = c(x)

The version space, VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with all training examples in D.

VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}

The List-Then-Eliminate Algorithm:

1 VersionSpace ← a list containing every hypothesis in H
2 For each training example 〈x, c(x)〉
    remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3 Output the list of hypotheses in VersionSpace
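For a hypothesis space as small as EnjoySport's, this enumeration is feasible. The sketch below assumes the attribute domains listed in `DOMAINS`: the value "Cloudy" and the completeness of each domain are assumptions, since the slides only show the values that occur in the examples. All function and variable names are hypothetical.

```python
from itertools import product

# Assumed attribute domains (Sky, AirTemp, Humid, Wind, Water, Forecast).
DOMAINS = [
    ("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
    ("Strong", "Light"), ("Warm", "Cool"), ("Same", "Change"),
]

def matches(h, x):
    """True if instance x satisfies every constraint of h (None never matches)."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def list_then_eliminate(domains, examples):
    # Step 1: every conjunction over values and "?", plus the all-∅ hypothesis.
    version_space = list(product(*[vals + ("?",) for vals in domains]))
    version_space.append((None,) * len(domains))
    # Step 2: drop hypotheses that misclassify any training example.
    for x, label in examples:
        version_space = [h for h in version_space if matches(h, x) == label]
    # Step 3: what remains is the version space.
    return version_space

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

vs = list_then_eliminate(DOMAINS, examples)
print(len(vs))  # 6
for h in vs:
    print(h)
```

Under these assumed domains the list starts with 973 semantically distinct hypotheses, which is why the approach does not scale to larger hypothesis spaces.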

Example Version Space

S: { <Sunny, Warm, ?, Strong, ?, ?> }

<Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>

G: { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }

Representing Version Spaces

The General boundary, G, of version space VS_{H,D} is the set of its maximally general members.

The Specific boundary, S, of version space VS_{H,D} is the set of its maximally specific members.

Every member of the version space lies between these boundaries:

VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥_g h ≥_g s)}

where x ≥_g y means x is more general than or equal to y.
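The boundary representation gives a membership test that never enumerates the version space. The sketch below assumes the tuple encoding of conjunctive hypotheses ("?" for don't-care, `None` for ∅) and uses the boundary sets that candidate elimination yields for the four EnjoySport training examples; the function names are hypothetical.

```python
def more_general_or_equal(hj, hk):
    """Check hj >=_g hk for conjunctive hypotheses encoded as tuples."""
    if None in hk:
        return True  # hk covers nothing, so anything is at least as general
    return all(a == "?" or a == b for a, b in zip(hj, hk))

def in_version_space(h, S, G):
    """True if some g in G and some s in S satisfy g >=_g h >=_g s."""
    below_G = any(more_general_or_equal(g, h) for g in G)
    above_S = any(more_general_or_equal(h, s) for s in S)
    return below_G and above_S

S = [("Sunny", "Warm", "?", "Strong", "?", "?")]
G = [("Sunny", "?", "?", "?", "?", "?"), ("?", "Warm", "?", "?", "?", "?")]

print(in_version_space(("Sunny", "?", "?", "Strong", "?", "?"), S, G))        # True
print(in_version_space(("?", "?", "?", "Strong", "?", "?"), S, G))            # False: above G
print(in_version_space(("Sunny", "Warm", "Normal", "Strong", "?", "?"), S, G))  # False: below S
```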

Candidate Elimination Algorithm

Initialize:
  G ← maximally general hypotheses in H
  S ← maximally specific hypotheses in H

For each training example d, do:
  If d is a positive example, adjust sets G and S
  If d is a negative example, adjust sets G and S

This algorithm still doesn’t work well with noisy data.

Candidate Elimination Algorithm – positive example

For a positive example d:
  Remove from G any hypothesis inconsistent with d
  For each hypothesis s in S that is not consistent with d:
    Remove s from S
    Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
    Remove from S any hypothesis that is more general than another hypothesis in S

Candidate Elimination Algorithm – negative example

For a negative example d:
  Remove from S any hypothesis inconsistent with d
  For each hypothesis g in G that is not consistent with d:
    Remove g from G
    Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
    Remove from G any hypothesis that is less general than another hypothesis in G
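Putting the positive and negative updates together, a compact sketch for conjunctive hypotheses might look as follows. It makes several assumptions: the tuple encoding ("?" for don't-care, `None` for ∅), the attribute domains in `DOMAINS` (the value "Cloudy" does not appear in the slides), and the simplification that specialization only replaces a "?" with a concrete value. All names are hypothetical.

```python
DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Light"), ("Warm", "Cool"), ("Same", "Change")]  # assumed

def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def mge(hj, hk):
    """hj >=_g hk for conjunctive hypotheses."""
    return None in hk or all(a == "?" or a == b for a, b in zip(hj, hk))

def min_generalization(s, x):
    """The unique minimal generalization of s that covers x."""
    return tuple(v if c is None else (c if c == v else "?") for c, v in zip(s, x))

def min_specializations(g, x, domains):
    """Minimal specializations of g that exclude x (replace one "?" by a value)."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    n = len(domains)
    G, S = {("?",) * n}, {(None,) * n}
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}
            new_S = set()
            for s in S:
                h = s if matches(s, x) else min_generalization(s, x)
                if any(mge(g, h) for g in G):
                    new_S.add(h)
            # Drop members that are more general than another member of S.
            S = {s for s in new_S if not any(s != t and mge(s, t) for t in new_S)}
        else:
            S = {s for s in S if not matches(s, x)}
            new_G = set()
            for g in G:
                cands = [g] if not matches(g, x) else min_specializations(g, x, domains)
                new_G.update(h for h in cands if any(mge(h, s) for s in S))
            # Drop members that are less general than another member of G.
            G = {g for g in new_G if not any(g != t and mge(t, g) for t in new_G)}
    return S, G

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(examples, DOMAINS)
print(S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)  # contains ('Sunny', '?', ...) and ('?', 'Warm', ...), in set order
```

On noisy data, one of the two sets can become empty, which is one way to see why the algorithm does not cope well with mislabeled examples.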