Concept Learning
Mitchell, Chapter 2
CptS 570 Machine Learning
School of EECS, Washington State University

Outline
- Definition
- General-to-specific ordering over hypotheses
- Version spaces and the candidate elimination algorithm
- Inductive bias

Learning Task: EnjoySport
- Task T: accurately predict enjoyment
- Performance P: predictive accuracy
- Experience E: training examples, each with attribute values and a class value (yes or no)

Representing Hypotheses
- Many possible representations
- Let hypothesis h be a conjunction of constraints on attributes
- Hypothesis space H is the set of all possible hypotheses h
- Each constraint can be
  - a specific value (e.g., Water = Warm)
  - don't care (e.g., Water = ?)
  - no value is acceptable (e.g., Water = Ø)
- For example: <Sunny, ?, ?, Strong, ?, Same>
  I.e., if (Sky = Sunny) and (Wind = Strong) and (Forecast = Same), then EnjoySport = Yes

Concept Learning Task
Given:
- Instances X: possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast
- Target function c: EnjoySport : X → {0, 1}
- Hypotheses H: conjunctions of literals, e.g., <?, Cold, High, ?, ?, ?>
- Training examples D: positive and negative examples of the target function, <x1, c(x1)>, …, <xm, c(xm)>
Determine:
- A hypothesis h in H such that h(x) = c(x) for all x in D

Terminology
- Inductive learning hypothesis: any hypothesis approximating the target concept well over a sufficiently large set of training examples will also approximate the target concept well for unobserved examples

Concept Learning as Search
- Learning is viewed as a search through hypothesis space H for a hypothesis consistent with the training examples
- A general-to-specific ordering of hypotheses allows a more directed search of H

General-to-Specific Ordering of Hypotheses
[Figure: instances X and hypotheses H, ordered from specific to general]
x1 = <Sunny, Warm, High, Strong, Cool, Same>
x2 = <Sunny, Warm, High, Light, Warm, Same>
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
h3 = <Sunny, ?, ?, ?, Cool, ?>
h2 is more general than both h1 and h3: it covers more instances of X.
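To make the ordering concrete, here is a minimal Python sketch (not part of the original slides; the names satisfies and more_general_or_equal are illustrative) of how a conjunctive hypothesis covers an instance and how the more-general-than-or-equal-to relation between two hypotheses can be checked.

# A hypothesis is a tuple of constraints, one per attribute: a specific value
# (e.g., "Warm"), "?" (don't care), or None (the empty constraint Ø).

def satisfies(h, x):
    """True if instance x meets every constraint of hypothesis h."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """Syntactic, attribute-wise check of h1 >=_g h2 for conjunctive hypotheses."""
    def ok(c1, c2):
        return c1 == "?" or (c1 is not None and c1 == c2) or c2 is None
    return all(ok(c1, c2) for c1, c2 in zip(h1, h2))

# The hypotheses and instance from the figure: h2 is more general than h1 and h3.
h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
h3 = ("Sunny", "?", "?", "?", "Cool", "?")
x1 = ("Sunny", "Warm", "High", "Strong", "Cool", "Same")
print(satisfies(h1, x1), satisfies(h2, x1), satisfies(h3, x1))        # True True True
print(more_general_or_equal(h2, h1), more_general_or_equal(h2, h3))   # True True
print(more_general_or_equal(h1, h2))                                  # False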
Find-S Algorithm
- Initialize h to the most specific hypothesis in H
- For each positive training instance x:
  - For each attribute constraint ai in h:
    - If the constraint ai in h is satisfied by x, then do nothing
    - Else replace ai in h by the next more general constraint that is satisfied by x
- Output hypothesis h
(A runnable sketch of this loop follows the worked example below.)

Find-S Example
[Figure: Find-S trace on the EnjoySport training examples, moving from the most specific hypothesis toward more general ones]
h0 = <Ø, Ø, Ø, Ø, Ø, Ø>
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +   →  h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +     →  h2 = <Sunny, Warm, ?, Strong, Warm, Same>
x3 = <Rainy, Cold, High, Strong, Warm, Change>, -   →  h3 = <Sunny, Warm, ?, Strong, Warm, Same> (negative example ignored)
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +   →  h4 = <Sunny, Warm, ?, Strong, ?, ?>
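The trace above can be reproduced with a short Find-S implementation. This is a minimal sketch, not the slides' code; it assumes the same tuple-of-strings encoding as the earlier sketch, with "?" for don't-care and None for the empty constraint Ø.

def find_s(examples, n_attributes=6):
    """Find-S: return the maximally specific conjunctive hypothesis consistent with the positive examples."""
    h = [None] * n_attributes          # start with the most specific hypothesis <Ø, Ø, ..., Ø>
    for x, positive in examples:
        if not positive:               # Find-S simply ignores negative examples
            continue
        for i, value in enumerate(x):
            if h[i] is None:           # first positive example: adopt its attribute value
                h[i] = value
            elif h[i] != value:        # constraint violated: generalize to don't-care
                h[i] = "?"
    return tuple(h)

# The four EnjoySport training examples from the trace (True = Yes, False = No).
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
print(find_s(examples))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')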
Find-S Algorithm
- Will h ever cover a negative example?
- No, if c ∈ H and the training examples are consistent

Problems with Find-S
- Cannot tell if it has converged on the target concept
- Why prefer the most specific hypothesis?
- Handling inconsistent training examples due to errors or noise
- What if there is more than one maximally specific consistent hypothesis?

Version Space Example
- The version space is the subset of hypotheses in H consistent with all the training examples in D
- Version space resulting from the previous four EnjoySport examples

Finding the Version Space: List-Then-Eliminate
- VS = list of every hypothesis in H
- For each training example <x, c(x)> ∈ D: remove from VS any h where h(x) ≠ c(x)
- Return VS
- Impractical for all but the most trivial H's

Candidate Elimination Algorithm
- Initialize G to the set of maximally general hypotheses in H
- Initialize S to the set of maximally specific hypotheses in H
- For each training example d, do
  - If d is a positive example …
  - If d is a negative example …
(A runnable sketch of these updates appears after the worked example below.)

Example
S0: {<Ø, Ø, Ø, Ø, Ø, Ø>}
S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}
S2: {<Sunny, Warm, ?, Strong, Warm, Same>}
G0, G1, G2: {<?, ?, ?, ?, ?, ?>}

Training examples:
1. <Sunny, Warm, Normal, Strong, Warm, Same>, EnjoySport = Yes
2. <Sunny, Warm, High, Strong, Warm, Same>, EnjoySport = Yes
Example (cont.)
S2, S3: {<Sunny, Warm, ?, Strong, Warm, Same>}
G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
G2: {<?, ?, ?, ?, ?, ?>}

Training example:
3. <Rainy, Cold, High, Strong, Warm, Change>, EnjoySport = No
Example (cont.)
S3: {<Sunny, Warm, ?, Strong, Warm, Same>}
S4: {<Sunny, Warm, ?, Strong, ?, ?>}
G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}

Training example:
4. <Sunny, Warm, High, Strong, Cool, Change>, EnjoySport = Yes
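The S and G updates traced above can be reproduced with a compact candidate elimination sketch for this conjunctive hypothesis language. This is a minimal illustration rather than a general implementation: the helper names are illustrative, the attribute domains are the standard EnjoySport ones assumed here, and only the boundary sets S and G are maintained.

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    ok = lambda c1, c2: c1 == "?" or (c1 is not None and c1 == c2) or c2 is None
    return all(ok(c1, c2) for c1, c2 in zip(h1, h2))

def min_generalization(s, x):
    """The unique minimal generalization of conjunctive hypothesis s that covers positive instance x."""
    return tuple(v if c is None else (c if c == v else "?") for c, v in zip(s, x))

def min_specializations(g, x, domains):
    """Minimal specializations of g that exclude the negative instance x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    n = len(domains)
    S = [tuple([None] * n)]   # maximally specific boundary S0
    G = [tuple(["?"] * n)]    # maximally general boundary G0
    for x, positive in examples:
        if positive:
            G = [g for g in G if satisfies(g, x)]                  # drop general hypotheses inconsistent with x
            S = [min_generalization(s, x) for s in S]              # generalize S just enough to cover x
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
            S = [s for s in S if not any(t != s and more_general_or_equal(s, t) for t in S)]
        else:
            S = [s for s in S if not satisfies(s, x)]              # drop specific hypotheses that cover x
            new_G = []
            for g in G:
                if not satisfies(g, x):
                    new_G.append(g)
                    continue
                for h in min_specializations(g, x, domains):       # specialize just enough to exclude x
                    if any(more_general_or_equal(h, s) for s in S):
                        new_G.append(h)
            G = [g for g in new_G if not any(t != g and more_general_or_equal(t, g) for t in new_G)]
        print("S:", S)
        print("G:", G)
    return S, G

# Standard EnjoySport attribute domains and the four training examples (True = Yes, False = No).
domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Light"), ("Warm", "Cool"), ("Same", "Change")]
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
candidate_elimination(examples, domains)
# The final boundaries match the trace: S4 = {<Sunny, Warm, ?, Strong, ?, ?>},
# G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}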
Version Spaces and the Candidate Elimination Algorithm
- Which training example should be requested next?
- The learner may query an oracle for an example's classification
- Ideally, choose the example that eliminates half of the version space
- About log2 |VS| such examples are then needed to converge

Which Training Example Next?
- <Sunny, Cold, Normal, Strong, Cool, Change> ?
- <Sunny, Warm, High, Light, Cool, Change> ?

Using the Version Space to Classify New Examples (a small classification sketch appears at the end of these notes)
- <Sunny, Warm, Normal, Strong, Cool, Change> ?
- <Rainy, Cold, Normal, Light, Warm, Same> ?
- <Sunny, Warm, Normal, Light, Warm, Same> ?
- <Sunny, Cold, Normal, Strong, Warm, Same> ?

Unbiased Learner
- H = every teachable concept (the power set of X)
- E.g., for EnjoySport, |X| = 3·2·2·2·2·2 = 96 instances, so |H| = 2^96 ≈ 10^28 (only 973 semantically distinct hypotheses in the previous H, which is biased!)
- H' = arbitrary conjunctions, disjunctions, or negations of hypotheses from the previous H
- E.g., [Sky = Sunny or Cloudy]: <Sunny,?,?,?,?,?> or <Cloudy,?,?,?,?,?>

Unbiased Learner
- Problems using H'
- S = the disjunction of the positive examples
- G = the negated disjunction of the negative examples
- Thus, no generalization
- Each unseen instance is covered by exactly half of the version space

Unbiased Learner
- Bias-free learning is futile
- A fundamental property of inductive learning: learners that make no a priori assumptions about the target concept have no rational basis for classifying unseen instances

Inductive Bias
- Permits comparison of learners
- Rote learner: store examples; classify x iff it matches a previously observed example; no bias
- Candidate elimination: c ∈ H
- Find-S: c ∈ H, and all instances not covered by the learned hypothesis are negative

WEKA's ConjunctiveRule Classifier
- Learns a rule of the form: If A1 and A2 and … and An, then class = c
- The A's are inequality constraints on attributes
- The A's are chosen based on an information gain criterion, i.e., which constraint, when added, best improves classification
- Lastly, performs reduced-error pruning: remove A's from the rule as long as doing so reduces error on a pruning set
- If an instance x is not covered by the rule, then c(x) = the majority class of the training examples not covered by the rule
- Inductive bias?

Summary
- Concept learning as search
- General-to-specific ordering
- Version spaces
- Candidate elimination algorithm
- The S and G boundary sets characterize the learner's uncertainty
- The learner can generate useful queries
- Inductive leaps are possible only if the learner is biased
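As a small illustration of the "Using the Version Space to Classify New Examples" slide above, here is a sketch (not from the slides) that classifies an instance using only the boundary sets S and G produced by the candidate elimination sketch: unanimously positive if the instance satisfies every member of S, unanimously negative if it satisfies no member of G, and ambiguous otherwise.

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def classify(x, S, G):
    """Classify x using only the version space's S and G boundary sets."""
    if all(satisfies(s, x) for s in S):
        return "+"    # every hypothesis in the version space classifies x as positive
    if not any(satisfies(g, x) for g in G):
        return "-"    # every hypothesis in the version space classifies x as negative
    return "?"        # the version space is split; more training examples are needed

# S4 and G4 from the EnjoySport trace, and the four query instances from the slide.
S = [("Sunny", "Warm", "?", "Strong", "?", "?")]
G = [("Sunny", "?", "?", "?", "?", "?"), ("?", "Warm", "?", "?", "?", "?")]
for q in [("Sunny", "Warm", "Normal", "Strong", "Cool", "Change"),   # -> +
          ("Rainy", "Cold", "Normal", "Light",  "Warm", "Same"),     # -> -
          ("Sunny", "Warm", "Normal", "Light",  "Warm", "Same"),     # -> ?
          ("Sunny", "Cold", "Normal", "Strong", "Warm", "Same")]:    # -> ?
    print(q, classify(q, S, G))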