Material Type: Paper; Professor: Hamerly; Class: Introduction to Machine Learning; Subject: Computer Science; University: Baylor University; Term: Fall 2008;
Greg Hamerly
Fall 2008
Some content from Tom Mitchell.
Outline
1 Course administration
2 Concept learning
3 Learning from examples
4 General-to-specific ordering over hypotheses
5 Version spaces and candidate elimination algorithm
Course administration
Seven assignment dates have been posted.
Material/reading may still be adjusted.
New component to the course: paper reading/presentation
Presentation worth 1 assignment (so ≤ 8 assignments total)
Who wants to do the first paper?
Concept learning
Concept learning is learning a function which has a boolean-valued output.
f : X → { 0 , 1 }
Many machine learning approaches use this simplistic binary view of the world.
Aside: multiclass → binary class reductions
Learning from examples
Sky    Temp  Humid   Wind    Water  Forecast  EnjoySport
Sunny  Warm  Normal  Strong  Warm   Same      Yes
Sunny  Warm  High    Strong  Warm   Same      Yes
Rainy  Cold  High    Strong  Warm   Change    No
Sunny  Warm  High    Strong  Cool   Change    Yes
What is the general concept?
Learning from examples
Many possible representations...
Here, h is a conjunction of constraints on attributes.
Each constraint can be
a specific value (e.g., “Water = Warm”)
don’t care (e.g., “Water = ?”)
no value allowed (e.g., “Water = ∅”)
For example, over the attributes (Sky, AirTemp, Humid, Wind, Water, Forecast):
〈Sunny, ?, ?, Strong, ?, Same〉
Learning from examples
Given:
Instances X: possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast
Target function c: EnjoySport : X → {0, 1}
Hypotheses H: conjunctions of literals, e.g. 〈?, Cold, High, ?, ?, ?〉
Training examples D: positive and negative examples of the target function: 〈x_1, c(x_1)〉, ..., 〈x_m, c(x_m)〉
Determine: Hypothesis h ∈ H where h(x) = c(x) for all x ∈ D.
Learning from examples
The inductive learning hypothesis: Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
Learning from examples
Today, assume there are no mistakes (aka noise) in the examples.
I.e., all attribute values and labels are correct for every example, and we expect a ‘good’ hypothesis to be correct on every training example.
We would like learning algorithms to be robust to noise (why?).
But with noise, we should expect even a ‘good’ hypothesis to make some ‘mistakes’ on the training examples (why?).
General-to-specific ordering over hypotheses
[Figure: instances X and hypotheses H, with H ordered from specific to general.]
x1 = <Sunny, Warm, High, Strong, Cool, Same>
x2 = <Sunny, Warm, High, Light, Warm, Same>
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
h3 = <Sunny, ?, ?, ?, Cool, ?>
General-to-specific ordering over hypotheses
Let hj and hk be boolean-valued functions defined over X.
More-general-than-or-equal-to:
(hj ≥g hk ) ↔ (∀x ∈ X )[(hk (x) = 1) → (hj (x) = 1)]
Strictly more-general-than:
(hj >g hk ) ↔ (hj ≥g hk ) ∧ (hk ≱g hj )
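For conjunctive hypotheses, the ordering can be checked attribute by attribute instead of quantifying over every instance in X. The sketch below assumes the tuple encoding with "?" for don't care and None for ∅, and assumes every attribute has at least one possible value. The function names are illustrative.

```python
ANY, NONE = "?", None  # assumed encoding of "don't care" and ∅


def more_general_or_equal(hj, hk):
    """hj >=g hk: every instance accepted by hk is also accepted by hj."""
    if any(c is NONE for c in hk):
        return True  # hk accepts no instance, so the implication holds trivially
    return all(cj == ANY or cj == ck for cj, ck in zip(hj, hk))


def strictly_more_general(hj, hk):
    """hj >g hk: hj >=g hk holds, but hk >=g hj does not."""
    return more_general_or_equal(hj, hk) and not more_general_or_equal(hk, hj)


h1 = ("Sunny", ANY, ANY, "Strong", ANY, ANY)
h2 = ("Sunny", ANY, ANY, ANY, ANY, ANY)
h3 = ("Sunny", ANY, ANY, ANY, "Cool", ANY)

print(strictly_more_general(h2, h1))  # True
print(strictly_more_general(h2, h3))  # True
print(more_general_or_equal(h1, h3), more_general_or_equal(h3, h1))  # False False
```

The last line shows that ≥g is only a partial order: h1 and h3 are not comparable.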
General-to-specific ordering over hypotheses
1 Initialize h to the most specific hypothesis in H
2 For each positive training instance x
    For each attribute constraint a_i in h
      If the constraint a_i in h is satisfied by x
        Then do nothing
        Else replace a_i in h by the next more general constraint that is satisfied by x
3 Output hypothesis h
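The procedure above (called Find-S in Mitchell's textbook) can be sketched as follows for conjunctive hypotheses. The tuple encoding, the function name `find_s`, and the boolean labels are assumptions made for illustration. The data are the four EnjoySport examples from the table.

```python
ANY, NONE = "?", None  # assumed encoding of "don't care" and ∅


def find_s(examples, n_attributes):
    """Return the maximally specific conjunctive hypothesis covering all positives."""
    h = [NONE] * n_attributes           # step 1: most specific hypothesis
    for x, label in examples:           # step 2: negative examples are ignored
        if not label:
            continue
        for i, value in enumerate(x):
            if h[i] is NONE:
                h[i] = value            # ∅ generalizes to the observed value
            elif h[i] != ANY and h[i] != value:
                h[i] = ANY              # conflicting values generalize to "?"
    return tuple(h)                     # step 3


D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

print(find_s(D, 6))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```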
General-to-specific ordering over hypotheses
[Figure: trace of the hypothesis search over instances X and hypotheses H, with H ordered from specific to general. x1 through x4 are the training instances; positive instances are marked +.]
h0 = <∅, ∅, ∅, ∅, ∅, ∅>
h3 = <Sunny, Warm, ?, Strong, Warm, Same>
h4 = <Sunny, Warm, ?, Strong, ?, ?>
General-to-specific ordering over hypotheses
Limitations of the algorithm above:
Can’t tell whether it has learned the concept
Can’t tell when the training data are inconsistent
Picks a maximally specific h
Depending on H, there might be several!
Why?
General-to-specific ordering over hypotheses
Discuss: could we move from the most general hypothesis towards more specific ones?
Version spaces and candidate elimination algorithm
To enable a more complete search of H, we turn to version spaces.
Version spaces define all the hypotheses which fit the training data.
This allows a more complete picture of the approximation to the target concept than does looking for a single hypothesis.
Version spaces and candidate elimination algorithm
A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example 〈x, c(x)〉 in D.
Consistent(h, D) ≡ (∀〈x, c(x)〉 ∈ D) h(x) = c(x)
The version space, VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with all training examples in D.
VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}
Version spaces and candidate elimination algorithm
1 VersionSpace ← a list containing every hypothesis in H
2 For each training example 〈x, c(x)〉, remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3 Output the list of hypotheses in VersionSpace
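A brute-force sketch of this enumerate-and-filter procedure on the EnjoySport data. The attribute domains below are an assumption: they contain only the values that appear in the examples on these slides. All hypotheses containing ∅ classify every instance as negative, so a single all-∅ tuple stands in for them. Function names are illustrative.

```python
from itertools import product

ANY, NONE = "?", None  # assumed encoding of "don't care" and ∅

# Assumed domains: two values per attribute, as seen in the slide examples.
DOMAINS = [
    ("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
    ("Strong", "Light"), ("Warm", "Cool"), ("Same", "Change"),
]


def matches(h, x):
    """h(x): True when x satisfies every constraint (∅ never matches)."""
    return all(c == ANY or c == v for c, v in zip(h, x))


def all_hypotheses(domains):
    """Every conjunctive hypothesis, plus one representative containing ∅."""
    empty = (NONE,) * len(domains)
    return [empty] + list(product(*[values + (ANY,) for values in domains]))


def list_then_eliminate(hypotheses, examples):
    version_space = list(hypotheses)                       # step 1
    for x, label in examples:                              # step 2
        version_space = [h for h in version_space if matches(h, x) == label]
    return version_space                                   # step 3


D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

H = all_hypotheses(DOMAINS)
vs = list_then_eliminate(H, D)
print(len(H), len(vs))  # 730 6
```

Enumerating H is feasible only because this hypothesis space is tiny; the cost grows exponentially with the number of attributes.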
Version spaces and candidate elimination algorithm
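[Figure: the version space for the EnjoySport examples, from most specific (S) to most general (G).]
S: { <Sunny, Warm, ?, Strong, ?, ?> }
<Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>
G: { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }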
Version spaces and candidate elimination algorithm
The General boundary, G, of version space VS_{H,D} is the set of its maximally general members
The Specific boundary, S, of version space VS_{H,D} is the set of its maximally specific members
Every member of the version space lies between these boundaries
VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥g h ≥g s)}
where x ≥g y means x is more general or equal to y
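The boundary representation gives a membership test that needs no training data: h belongs to the version space when it lies between some s in S and some g in G. A small sketch, assuming the tuple encoding with "?" for don't care and None for ∅, and using the S and G boundaries of the EnjoySport version space. The function names are illustrative.

```python
ANY, NONE = "?", None  # assumed encoding of "don't care" and ∅


def more_general_or_equal(hj, hk):
    """hj >=g hk for conjunctive hypotheses, checked attribute by attribute."""
    if any(c is NONE for c in hk):
        return True  # hk accepts nothing
    return all(cj == ANY or cj == ck for cj, ck in zip(hj, hk))


def in_version_space(h, S, G):
    """True when g >=g h >=g s for some s in S and some g in G."""
    return (any(more_general_or_equal(h, s) for s in S)
            and any(more_general_or_equal(g, h) for g in G))


S = [("Sunny", "Warm", ANY, "Strong", ANY, ANY)]
G = [("Sunny", ANY, ANY, ANY, ANY, ANY), (ANY, "Warm", ANY, ANY, ANY, ANY)]

print(in_version_space(("Sunny", ANY, ANY, "Strong", ANY, ANY), S, G))  # True
print(in_version_space((ANY, ANY, ANY, "Strong", ANY, ANY), S, G))      # False
```

The second hypothesis is more general than s but is not bounded above by any member of G, so it falls outside the version space.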
Version spaces and candidate elimination algorithm
Initialize:
  G ← maximally general hypotheses in H
  S ← maximally specific hypotheses in H
For each training example d, do
  If d is a positive example, adjust sets G and S
  If d is a negative example, adjust sets G and S
This algorithm still doesn’t work well with noisy data.
Version spaces and candidate elimination algorithm
For a positive example d:
Remove from G any hypothesis inconsistent with d
For each hypothesis s in S that is not consistent with d
  Remove s from S
  Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
  Remove from S any hypothesis that is more general than another hypothesis in S
Version spaces and candidate elimination algorithm
For a negative example d:
Remove from S any hypothesis inconsistent with d
For each hypothesis g in G that is not consistent with d
  Remove g from G
  Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
  Remove from G any hypothesis that is less general than another hypothesis in G
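The positive and negative update rules can be combined into one sketch for conjunctive hypotheses. Assumptions, none of which come from the slides: the tuple encoding with "?" and None, two-valued attribute domains taken from the slide examples, noise-free data, and all function names. For this hypothesis language the minimal generalization of s is unique, and a minimal specialization replaces one "?" with a value that excludes the negative example.

```python
ANY, NONE = "?", None  # assumed encoding of "don't care" and ∅

def matches(h, x):
    return all(c == ANY or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    if any(c is NONE for c in hk):
        return True
    return all(cj == ANY or cj == ck for cj, ck in zip(hj, hk))

def min_generalization(s, x):
    """The unique minimal generalization of s that covers x."""
    return tuple(v if c is NONE else (c if c == v else ANY) for c, v in zip(s, x))

def min_specializations(g, x, domains):
    """Replace one '?' in g by a value that differs from x's value."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == ANY
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    G = [(ANY,) * len(domains)]
    S = [(NONE,) * len(domains)]
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            S = [s if matches(s, x) else min_generalization(s, x) for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
            S = list(dict.fromkeys(S))  # drop duplicates, keep order
            S = [s for s in S
                 if not any(s != t and more_general_or_equal(s, t) for t in S)]
        else:
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)  # already consistent with the negative example
                else:
                    new_G += [h for h in min_specializations(g, x, domains)
                              if any(more_general_or_equal(h, s) for s in S)]
            new_G = list(dict.fromkeys(new_G))
            G = [g for g in new_G
                 if not any(g != t and more_general_or_equal(t, g) for t in new_G)]
    return S, G

DOMAINS = [("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Light"), ("Warm", "Cool"), ("Same", "Change")]
D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

S, G = candidate_elimination(D, DOMAINS)
print(S)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```

If the data contain a mislabeled example, S or G can become empty, which is how this sketch exhibits the sensitivity to noise.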