Classification Algorithms: Generating Rules and Linear Models

An overview of classification algorithms, focusing on rule generation and linear models: converting decision trees into rule sets, covering algorithms (including the PRISM algorithm), and linear regression as a linear model for prediction.


Classification Algorithms


Outline

  • Rules
  • Linear Models (Regression)
  • Instance-based (Nearest-neighbor)


Generating Rules

  • A decision tree can be converted into a rule set
  • Straightforward conversion:
    • each path from the root to a leaf becomes a rule – this makes an overly complex rule set
  • More effective conversions are not trivial
    • (e.g. C4.8 tests each node on the root–leaf path to see if it can be eliminated without loss in accuracy)

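As a quick illustration of the straightforward conversion, here is a minimal sketch (not from the slides; the nested-dict tree encoding and the example tree are assumptions chosen for illustration):

    # Sketch: one rule per root-to-leaf path of a decision tree.
    # Tree encoding (assumed): {attribute: {value: subtree_or_class}}.
    def tree_to_rules(tree, conditions=()):
        if not isinstance(tree, dict):                    # leaf node
            return [(list(conditions), tree)]
        (attribute, branches), = tree.items()
        rules = []
        for value, subtree in branches.items():
            rules += tree_to_rules(subtree, conditions + ((attribute, value),))
        return rules

    # Hypothetical tree over the contact-lens attributes
    tree = {"tear production rate": {
        "reduced": "none",
        "normal": {"astigmatism": {"no": "soft", "yes": "hard"}},
    }}
    for conds, label in tree_to_rules(tree):
        body = " and ".join(f"{a} = {v}" for a, v in conds)
        print(f"If {body} then recommendation = {label}")

Each printed rule corresponds to one path, which is exactly why the direct conversion yields an overly complex rule set on larger trees.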

Covering algorithms

  • Strategy for generating a rule set directly: for each class in turn, find a rule set that covers all instances in it (excluding instances not in the class)
  • This approach is called a covering approach because at each stage a rule is identified that covers some of the instances


Example: generating a rule

[Figure: instances of classes a and b scattered in the x–y plane]

If true then class = a


Example: generating a rule, II

[Figure: the same instances; a vertical split at x = 1.2 isolates most of the a’s]

If true then class = a
If x > 1.2 then class = a


Example: generating a rule, III

[Figure: the split at x = 1.2 is followed by a second split at y = 2.6, leaving only a’s covered]

If true then class = a
If x > 1.2 then class = a
If x > 1.2 and y > 2.6 then class = a


Example: generating a rule, IV

  • Possible rule set for class “b”:

If x ≤ 1.2 then class = b
If x > 1.2 and y ≤ 2.6 then class = b

  • More rules could be added for a “perfect” rule set
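To make the example concrete, here is a minimal sketch (not from the slides) of applying these per-class rules as a list, using the x, y attributes and thresholds from the toy plot:

    # Sketch: apply the toy rules; the first matching rule wins.
    rules = [
        (lambda p: p["x"] > 1.2 and p["y"] > 2.6, "a"),
        (lambda p: p["x"] <= 1.2, "b"),
        (lambda p: p["x"] > 1.2 and p["y"] <= 2.6, "b"),
    ]

    def classify(point, default="b"):
        for test, label in rules:
            if test(point):
                return label
        return default   # a default rule is needed if no rule fires
                         # (here the three rules happen to cover the whole space)

    print(classify({"x": 2.0, "y": 3.0}))   # -> a
    print(classify({"x": 0.5, "y": 1.0}))   # -> b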

Rules vs. trees

  • The corresponding decision tree produces exactly the same predictions
  • But: rule sets can be clearer when decision trees suffer from replicated subtrees
  • Also: in multi-class situations, a covering algorithm concentrates on one class at a time, whereas a decision tree learner takes all classes into account


A simple covering algorithm

  • Generates a rule by adding tests that maximize the rule’s accuracy
  • Similar to the situation in decision trees: the problem of selecting an attribute to split on
    • But: a decision tree inducer maximizes overall purity
  • Each new test reduces the rule’s coverage:

[Figure: the space of examples, the region covered by the rule so far, and the smaller region covered after adding a new term]


Selecting a test

  • Goal: maximize accuracy
    • t: total number of instances covered by the rule
    • p: positive examples of the class covered by the rule
    • t – p: number of errors made by the rule
⇒ Select the test that maximizes the ratio p/t
  • We are finished when p/t = 1 or the set of instances can’t be split any further

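A minimal sketch of this selection step (assuming instances are Python dicts whose label is stored under class_key; the helper name is ours):

    # Sketch: pick the attribute-value test with the highest p/t,
    # breaking ties in favour of the larger p (greater coverage).
    def best_test(instances, target_class, class_key="class"):
        counts = {}                               # (attr, value) -> [t, p]
        for inst in instances:
            for attr, value in inst.items():
                if attr == class_key:
                    continue
                t_p = counts.setdefault((attr, value), [0, 0])
                t_p[0] += 1                                      # covered: t
                t_p[1] += inst[class_key] == target_class        # positive: p
        return max(counts.items(), key=lambda kv: (kv[1][1] / kv[1][0], kv[1][1]))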

Example: contact lens data, 1

  • Rule we seek:

If ? then recommendation = hard

  • Possible tests:

    Age = Young
    Age = Pre-presbyopic
    Age = Presbyopic
    Spectacle prescription = Myope
    Spectacle prescription = Hypermetrope
    Astigmatism = no
    Astigmatism = yes
    Tear production rate = Reduced
    Tear production rate = Normal


Example: contact lens data, 2

  • Rule we seek:

If ? then recommendation = hard

  • Possible tests (p/t = positives / instances covered):

    Age = Young                              2/8
    Age = Pre-presbyopic                     1/8
    Age = Presbyopic                         1/8
    Spectacle prescription = Myope           3/12
    Spectacle prescription = Hypermetrope    1/12
    Astigmatism = no                         0/12
    Astigmatism = yes                        4/12
    Tear production rate = Reduced           0/12
    Tear production rate = Normal            4/12


Modified rule and resulting data

  • Rule with best test added:

If astigmatism = yes then recommendation = hard

  • Instances covered by modified rule:

    Age             Spectacle prescription   Astigmatism   Tear production rate   Recommended lenses
    Young           Myope                    Yes           Reduced                None
    Young           Myope                    Yes           Normal                 Hard
    Young           Hypermetrope             Yes           Reduced                None
    Young           Hypermetrope             Yes           Normal                 Hard
    Pre-presbyopic  Myope                    Yes           Reduced                None
    Pre-presbyopic  Myope                    Yes           Normal                 Hard
    Pre-presbyopic  Hypermetrope             Yes           Reduced                None
    Pre-presbyopic  Hypermetrope             Yes           Normal                 None
    Presbyopic      Myope                    Yes           Reduced                None
    Presbyopic      Myope                    Yes           Normal                 Hard
    Presbyopic      Hypermetrope             Yes           Reduced                None
    Presbyopic      Hypermetrope             Yes           Normal                 None


Further refinement, 1

  • Current state:

If astigmatism = yes and ? then recommendation = hard

  • Possible tests:

    Age = Young
    Age = Pre-presbyopic
    Age = Presbyopic
    Spectacle prescription = Myope
    Spectacle prescription = Hypermetrope
    Tear production rate = Reduced
    Tear production rate = Normal


Further refinement, 2

  • Current state:

If astigmatism = yes and ? then recommendation = hard

  • Possible tests (over the 12 instances covered so far):

    Age = Young                              2/4
    Age = Pre-presbyopic                     1/4
    Age = Presbyopic                         1/4
    Spectacle prescription = Myope           3/6
    Spectacle prescription = Hypermetrope    1/6
    Tear production rate = Reduced           0/6
    Tear production rate = Normal            4/6


Modified rule and resulting data

  • Rule with best test added:

If astigmatism = yes and tear production rate = normal then recommendation = hard

  • Instances covered by modified rule:

    Age             Spectacle prescription   Astigmatism   Tear production rate   Recommended lenses
    Young           Myope                    Yes           Normal                 Hard
    Young           Hypermetrope             Yes           Normal                 Hard
    Pre-presbyopic  Myope                    Yes           Normal                 Hard
    Pre-presbyopic  Hypermetrope             Yes           Normal                 None
    Presbyopic      Myope                    Yes           Normal                 Hard
    Presbyopic      Hypermetrope             Yes           Normal                 None


Further refinement, 3

  • Current state:

If astigmatism = yes and tear production rate = normal and ? then recommendation = hard

  • Possible tests:

    Age = Young
    Age = Pre-presbyopic
    Age = Presbyopic
    Spectacle prescription = Myope
    Spectacle prescription = Hypermetrope


Further refinement, 4

  • Current state:

If astigmatism = yes and tear production rate = normal and ? then recommendation = hard

  • Possible tests (over the 6 instances covered so far):

    Age = Young                              2/2
    Age = Pre-presbyopic                     1/2
    Age = Presbyopic                         1/2
    Spectacle prescription = Myope           3/3
    Spectacle prescription = Hypermetrope    1/3

  • Tie between the first and the fourth test (both have p/t = 1)
    • We choose the one with greater coverage: Spectacle prescription = Myope


The result

  • Final rule:

If astigmatism = yes and tear production rate = normal and spectacle prescription = myope then recommendation = hard

  • Second rule for recommending “hard lenses” (built from instances not covered by the first rule):

If age = young and astigmatism = yes and tear production rate = normal then recommendation = hard

  • These two rules cover all “hard lenses”
    • The process is then repeated with the other two classes


Pseudo-code for PRISM

For each class C
  Initialize E to the instance set
  While E contains instances in class C
    Create a rule R with an empty left-hand side that predicts class C
    Until R is perfect (or there are no more attributes to use) do
      For each attribute A not mentioned in R, and each value v,
        Consider adding the condition A = v to the left-hand side of R
      Select A and v to maximize the accuracy p/t
      (break ties by choosing the condition with the largest p)
      Add A = v to R
    Remove the instances covered by R from E

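Below is a runnable sketch of that pseudo-code in Python (representation assumptions ours: instances are dicts, the label is stored under class_key, and a rule is a dict of attribute = value conditions):

    # Sketch of PRISM: for each class, repeatedly grow a perfect rule
    # on the remaining instances, then remove what it covers.
    def prism(instances, class_key="class"):
        rule_sets = {}
        for c in sorted({i[class_key] for i in instances}):
            rules, E = [], list(instances)
            while any(i[class_key] == c for i in E):
                R, covered = {}, list(E)
                while any(i[class_key] != c for i in covered):   # until R is perfect
                    best, best_score = None, (-1.0, -1)
                    for inst in covered:
                        for a, v in inst.items():
                            if a == class_key or a in R:
                                continue
                            sub = [i for i in covered if i.get(a) == v]
                            p = sum(i[class_key] == c for i in sub)
                            score = (p / len(sub), p)   # p/t, ties by largest p
                            if score > best_score:
                                best, best_score = (a, v), score
                    if best is None:                    # no more attributes to use
                        break
                    a, v = best
                    R[a] = v
                    covered = [i for i in covered if i.get(a) == v]
                rules.append(R)
                # separate out the instances covered by R
                E = [i for i in E if not all(i.get(a) == v for a, v in R.items())]
            rule_sets[c] = rules
        return rule_sets

Running this on the contact-lens instances (encoded as dicts) should, under these assumptions, recover the two “hard” rules derived above.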

Rules vs. decision lists

  • PRISM with the outer loop removed generates a decision list for one class
    • Subsequent rules are designed for instances that are not covered by previous rules
    • But: order doesn’t matter because all rules predict the same class
  • Outer loop considers all classes separately
    • No order dependence implied
  • Problems: overlapping rules, default rule required


Separate and conquer

  • Methods like PRISM (for dealing with one class) are separate-and-conquer algorithms:
    • First, a rule is identified
    • Then, all instances covered by the rule are separated out
    • Finally, the remaining instances are “conquered”
  • Difference to divide-and-conquer methods:
    • Subset covered by a rule doesn’t need to be explored any further


Outline

  • Rules
  • Linear Models (Regression)
  • Instance-based (Nearest-neighbor)


Linear models

  • Work most naturally with numeric attributes
  • Standard technique for numeric prediction: linear regression
    • Outcome is a linear combination of attributes:

x = w_0 + w_1 a_1 + w_2 a_2 + \dots + w_k a_k

    • Weights are calculated from the training data
  • Predicted value for the first training instance a^{(1)} (with a_0 = 1):

w_0 a_0^{(1)} + w_1 a_1^{(1)} + w_2 a_2^{(1)} + \dots + w_k a_k^{(1)} = \sum_{j=0}^{k} w_j a_j^{(1)}

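As a brief sketch (ours, not from the slides) of how the weights w_j can be obtained, least squares on a design matrix whose first column is the constant a_0 = 1:

    # Sketch: fit linear-regression weights by least squares (NumPy).
    import numpy as np

    # Toy data: each row is an instance (a_0 = 1, a_1, a_2); values invented.
    A = np.array([[1.0, 2.0, 3.0],
                  [1.0, 4.0, 1.0],
                  [1.0, 0.5, 2.0],
                  [1.0, 3.0, 5.0]])
    x = np.array([13.0, 9.0, 8.5, 21.0])        # observed outcomes

    w, *_ = np.linalg.lstsq(A, x, rcond=None)   # minimizes ||A w - x||^2
    print(w)        # fitted weights w_0 ... w_k
    print(A @ w)    # predicted values, sum_j w_j a_j for each instance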