Rule Based Classifiers and Nearest Neighbor Classifiers | CIS 6930

Material Type: Notes; Professor: Ranka; Class: TRANSLATORS; Subject: COMPUTER SCIENCE AND INFORMATION SYSTEMS; University: University of Florida; Term: Fall 2003;

University of Florida CISE department Gator Engineering
Classification
Part 2
Dr. Sanjay Ranka
Professor
Computer and Information Science and Engineering
University of Florida, Gainesville
Data Mining Sanjay Ranka Fall 2003

Overview

• Rule based Classifiers

• Nearest-neighbor Classifiers

Rule Based Classifiers

• Classify instances by using a collection of “if … then …” rules

• Rules are presented in Disjunctive Normal Form, $R = (r_1 \vee r_2 \vee \cdots \vee r_k)$

• R is called the rule set

• The $r_i$’s are called classification rules

• Each classification rule is of the form

– $r_i: (\text{Condition}_i) \rightarrow y$

• Condition is a conjunction of attribute tests

• y is the class label


Rule Based Classifiers

• $r_i: (\text{Condition}_i) \rightarrow y$

– The LHS of the rule is called the rule antecedent or pre-condition

– The RHS is called the rule consequent

• If the attributes of an instance satisfy the pre-condition of a rule, then the instance is assigned to the class designated by the rule consequent

• Examples

– (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds

– (Taxable Income < 50K) ∧ (Refund=Yes) → Cheat=No


Rule Based Classifiers

• Rules may not be exhaustive: some instances may not satisfy the antecedent of any rule

• Strategy:

– A default rule $r_d: ( ) \rightarrow y_d$ can be added

– The default rule has an empty antecedent and is applicable when all other rules have failed

– $y_d$ is known as the default class and is often assigned to the majority class


Example of Rule Based Classifier

Tid  Refund  Marital Status  Taxable Income  Evade
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

(Refund and Marital Status are categorical, Taxable Income is continuous, Evade is the class.)

• $r_1$: (Refund=No) ∧ (Marital Status=Single) ∧ (Taxable Income>80K) → Yes

• $r_2$: (Refund=No) ∧ (Marital Status=Divorced) ∧ (Taxable Income>80K) → Yes

• default: ( ) → No
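The rule set above can be sketched in Python as an ordered list of rules. The first rule whose antecedent an instance satisfies determines its class, and the empty-antecedent default rule catches everything else. This is a minimal sketch that assumes ordered (first-match) rules and stores Taxable Income as a number in thousands. `Rule`, `covers`, and `classify` are hypothetical names, not part of any library. Because r1 and r2 are mutually exclusive, ordering only matters for the default rule here.

```python
from typing import Callable, NamedTuple

# Training records from the table above; Taxable Income is in thousands (125 means 125K).
COLUMNS = ("Tid", "Refund", "Marital Status", "Taxable Income", "Evade")
ROWS = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]
TABLE = [dict(zip(COLUMNS, row)) for row in ROWS]

AttributeTest = Callable[[dict], bool]


class Rule(NamedTuple):
    antecedent: tuple[AttributeTest, ...]  # conjunction of attribute tests
    consequent: str                        # class label y

    def covers(self, record: dict) -> bool:
        # all() over an empty antecedent is True, so the default rule covers every record.
        return all(test(record) for test in self.antecedent)


RULES = [
    Rule((lambda r: r["Refund"] == "No",
          lambda r: r["Marital Status"] == "Single",
          lambda r: r["Taxable Income"] > 80), "Yes"),     # r1
    Rule((lambda r: r["Refund"] == "No",
          lambda r: r["Marital Status"] == "Divorced",
          lambda r: r["Taxable Income"] > 80), "Yes"),     # r2
    Rule((), "No"),                                        # default rule
]


def classify(rules: list[Rule], record: dict) -> str:
    """Return the consequent of the first rule whose antecedent the record satisfies."""
    for rule in rules:
        if rule.covers(record):
            return rule.consequent
    raise ValueError("no rule fired; add a default rule")
```

For example, `classify(RULES, TABLE[7])` returns "Yes" because r1 fires for Tid 8. The three rules reproduce the Evade label for all ten training records.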


Advantages of Rule Based Classifiers

• As highly expressive as decision trees

• Easy to interpret

• Easy to generate

• Can classify new instances rapidly

• Performance comparable to decision trees


Basic Definitions

• Coverage of a rule:

– Fraction of instances that satisfy the antecedent of the rule

• Accuracy of a rule:

– Fraction of the instances covered by the rule (those satisfying its antecedent) that also satisfy its consequent

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

(Marital Status=Married) → No

Coverage = 40%, Accuracy = 100% (4 of the 10 records are Married, and all 4 have Cheat=No)
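Both measures can be computed directly from the training table. This minimal sketch assumes the antecedent contains only equality tests, passed as a dict of attribute/value pairs. `coverage_and_accuracy` is a hypothetical helper name.

```python
# Training records (Refund, Marital Status, Taxable Income in K, Cheat) from the table above.
ROWS = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"), ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"), ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"), ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]
DATA = [dict(zip(("Refund", "Marital Status", "Taxable Income", "Cheat"), r)) for r in ROWS]


def coverage_and_accuracy(antecedent: dict, label: str, data: list[dict], class_attr: str = "Cheat"):
    """Coverage = |covered| / |data|; accuracy = |covered with the rule's class| / |covered|."""
    covered = [rec for rec in data if all(rec[a] == v for a, v in antecedent.items())]
    coverage = len(covered) / len(data)
    correct = sum(rec[class_attr] == label for rec in covered)
    accuracy = correct / len(covered) if covered else 0.0  # undefined when nothing is covered
    return coverage, accuracy


print(coverage_and_accuracy({"Marital Status": "Married"}, "No", DATA))  # (0.4, 1.0)
```

The same call with `{"Refund": "No", "Marital Status": "Married"}` returns `(0.3, 1.0)`: the two-conjunct rule (Refund=No) ∧ (Marital Status=Married) → No is just as accurate but covers fewer records.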


Rules can be Simplified

[Figure: decision tree that tests Refund (Yes / No), then Marital Status ({Married} / {Single, Divorced}), then Taxable Income (< 80K / > 80K), with leaves labeled YES or NO]


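• Initial rule: (Refund=No) ∧ (Marital Status=Married) → No

• Simplified rule: (Marital Status=Married) → No

• Every Married record in the training data has Cheat=No regardless of Refund, so dropping the Refund test raises coverage from 30% to 40% without lowering accuracy (100%)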


Indirect Method: C4.5 rules

• Creating an initial set of rules

– Extract rules from an unpruned decision tree

– For each rule, r: A → y

• Consider an alternative rule r′: A′ → y, where A′ is obtained by removing one of the conjuncts in A

• Compare the pessimistic error rate for r against all r′s

• Prune if one of the r′s has a lower pessimistic error rate

• Repeat until we can no longer improve the generalization error
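The pruning loop can be sketched as follows. C4.5 estimates the pessimistic error as an upper confidence bound on the training error. This sketch assumes the normal-approximation form of that bound with z ≈ 0.674 (the one-sided 25% point of the standard normal) as a stand-in, and it handles only equality tests. `pessimistic_error`, `rule_error`, and `prune_rule` are hypothetical names.

```python
import math

# Training records (Refund, Marital Status, Cheat) from the table above.
DATA = [
    {"Refund": r, "Marital Status": m, "Cheat": c}
    for r, m, c in [
        ("Yes", "Single", "No"), ("No", "Married", "No"), ("No", "Single", "No"),
        ("Yes", "Married", "No"), ("No", "Divorced", "Yes"), ("No", "Married", "No"),
        ("Yes", "Divorced", "No"), ("No", "Single", "Yes"), ("No", "Married", "No"),
        ("No", "Single", "Yes"),
    ]
]


def pessimistic_error(errors: int, n: int, z: float = 0.674) -> float:
    """Upper confidence bound on the true error rate (normal approximation, assumed)."""
    if n == 0:
        return 1.0  # a rule covering nothing gets the worst possible estimate
    f = errors / n
    spread = z * math.sqrt(f * (1 - f) / n + z * z / (4 * n * n))
    return (f + z * z / (2 * n) + spread) / (1 + z * z / n)


def rule_error(conds, label, data):
    covered = [r for r in data if all(r[a] == v for a, v in conds)]
    errors = sum(r["Cheat"] != label for r in covered)
    return pessimistic_error(errors, len(covered))


def prune_rule(conds, label, data):
    """Greedily drop the conjunct whose removal most lowers the pessimistic error."""
    best_err = rule_error(conds, label, data)
    while conds:
        candidates = [conds[:i] + conds[i + 1:] for i in range(len(conds))]  # every r'
        err, cand = min((rule_error(c, label, data), c) for c in candidates)
        if err >= best_err:
            break  # no r' beats r: stop pruning
        conds, best_err = cand, err
    return conds
```

Starting from the initial rule (Refund=No) ∧ (Marital Status=Married) → No, `prune_rule` drops the Refund test and returns `[("Marital Status", "Married")]`, matching the simplified rule above. With zero errors, the bound shrinks as coverage grows (3 versus 4 records). Dropping the Marital Status test instead would admit three Cheat=Yes records.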


Indirect Method: C4.5 rules

• Ordering the rules

– Instead of ordering the rules, order subsets of rules

• Each subset is a collection of rules with the same consequent (class)

• The description length of each subset is computed, and the subsets are then ordered in increasing order of description length

  • Description length = $L(\text{exceptions}) + g \cdot L(\text{model})$
  • g is a parameter that takes into account the presence of redundant attributes in a rule set. Default value is 0.


Direct Method: Sequential Covering

• Sequential Covering Algorithm (E: training examples, A: set of attributes)

1. Let R = { } be the initial rule set

2. While the stopping criterion is not met

   1. r ← Learn-One-Rule(E, A)

   2. Remove instances from E that are covered by r

   3. Add r to the rule set: R = R ∨ r

• Example of a stopping criterion: stop when all instances belong to the same class or all attributes have the same value
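The algorithm above can be sketched for a single positive class. Learn-One-Rule is assumed here to be a greedy general-to-specific search: it adds the equality test with the highest accuracy until the rule is pure or stops improving. The stopping criterion is assumed to be "no positive examples left". The binarized attribute Income>80K and all function names are hypothetical.

```python
# Training records from the earlier table; Income>80K is a hypothetical binarized attribute.
RAW = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"), ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"), ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"), ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]
DATA = [
    {"Refund": r, "Marital Status": m, "Income>80K": "Yes" if inc > 80 else "No", "Cheat": c}
    for r, m, inc, c in RAW
]
ATTRIBUTES = ["Refund", "Marital Status", "Income>80K"]


def covers(rule, rec):
    return all(rec[a] == v for a, v in rule)


def stats(rule, examples, target):
    """(accuracy, positives covered) of the rule `rule -> target` on `examples`."""
    covered = [e for e in examples if covers(rule, e)]
    pos = sum(e["Cheat"] == target for e in covered)
    return (pos / len(covered) if covered else 0.0), pos


def learn_one_rule(examples, attributes, target):
    """Greedy general-to-specific growth starting from the empty rule."""
    rule, best = [], stats([], examples, target)
    while best[0] < 1.0:
        used = {a for a, _ in rule}
        scored = [(stats(rule + [(a, v)], examples, target), rule + [(a, v)])
                  for a in attributes if a not in used
                  for v in sorted({e[a] for e in examples})]
        scored = [(s, c) for s, c in scored if s[1] > 0]  # keep candidates covering a positive
        if not scored:
            break
        s, cand = max(scored, key=lambda sc: sc[0])  # ties: more positives, then first found
        if s[0] <= best[0]:
            break  # adding a conjunct no longer improves accuracy
        rule, best = cand, s
    return rule


def sequential_covering(examples, attributes, target):
    rules, remaining = [], list(examples)                 # 1. R = { }
    while any(e["Cheat"] == target for e in remaining):   # 2. stop when no positives remain
        rule = learn_one_rule(remaining, attributes, target)
        if not rule:
            break  # nothing better than the empty rule was found
        remaining = [e for e in remaining if not covers(rule, e)]  # remove covered instances
        rules.append(rule)                                # R = R v r
    return rules


for learned in sequential_covering(DATA, ATTRIBUTES, "Yes"):
    print(learned)
```

On this table the sketch learns two rules, (Marital Status=Single) ∧ (Refund=No) ∧ (Income>80K=Yes) → Yes and then (Marital Status=Divorced) ∧ (Refund=No) → Yes. These closely mirror r1 and r2 from the earlier example.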


Learn One Rule

• The objective of this function is to extract the best rule that covers the current set of training instances

– What is the strategy used for rule growing?

– What evaluation criterion is used for rule growing?

– What is the stopping criterion for rule growing?

– What is the pruning criterion for generalizing the rule?


Learn One Rule

• Rule Growing Strategy

– General-to-specific approach

• It is initially assumed that the best rule is the empty rule, r: { } → y, where y is the majority class of the instances

• Iteratively add new conjuncts to the LHS of the rule until the stopping criterion is met

– Specific-to-general approach

• A positive instance is chosen as the initial seed for a rule

• The function keeps refining this rule by generalizing the conjuncts until the stopping criterion is met


Learn One Rule

• Rule Evaluation and Stopping Criteria

– Evaluate rules using a rule evaluation metric

• Accuracy

• Coverage

• Entropy

• Laplace

• M-estimate

– A typical condition for terminating the rule growing process is to compare the evaluation metric of the previous candidate rule to that of the newly grown rule, and to stop when the new rule does not improve on it
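The metrics listed above are written here in terms of n (instances covered by a rule), n_c (covered instances that belong to the rule's class), k (number of classes), and p (prior probability of the rule's class). The slide names the metrics without defining them, so these formulas are the commonly used definitions, assumed rather than taken from the source. The rule counts in the usage lines are hypothetical.

```python
import math


def accuracy(n_c: int, n: int) -> float:
    return n_c / n


def laplace(n_c: int, n: int, k: int) -> float:
    # Pulls the estimate toward 1/k when the rule covers few instances.
    return (n_c + 1) / (n + k)


def m_estimate(n_c: int, n: int, k: int, p: float) -> float:
    # Pulls the estimate toward the class prior p; equals Laplace when p = 1/k.
    return (n_c + k * p) / (n + k)


def entropy(class_counts: list[int]) -> float:
    """Entropy (bits) of the class distribution among covered instances; lower is purer."""
    n = sum(class_counts)
    return -sum(c / n * math.log2(c / n) for c in class_counts if c > 0)


# Hypothetical two-class comparison: rule A covers 50 positives and 5 negatives,
# rule B covers 2 positives and 0 negatives.
print(accuracy(50, 55), accuracy(2, 2))        # B looks better by accuracy
print(laplace(50, 55, 2), laplace(2, 2, 2))    # A looks better by Laplace
```

Accuracy prefers rule B (1.0 versus about 0.91) because it ignores how few instances B covers. The Laplace estimate prefers A (about 0.89 versus 0.75).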


Learn One Rule

• Rule Pruning

– Each extracted rule can be pruned to improve its ability to generalize beyond the training instances

– Pruning can be done by removing one of the conjuncts of the rule and then testing the pruned rule against a validation set

• Instance Elimination

– Instance elimination prevents the same rule from being generated again

– Positive instances must be removed after each rule is extracted

– Some rule based classifiers keep negative instances, while others remove them before generating the next rule


Foil's Information Gain

• Compares the performance of a rule before and after adding a new conjunct

• Suppose the rule $r: A \rightarrow y$ covers $p_0$ positive and $n_0$ negative instances

• After adding a new conjunct B, the rule $r': A \wedge B \rightarrow y$ covers $p_1$ positive and $n_1$ negative instances

• Foil's information gain is defined as

$$\text{Foil's gain} = t \cdot \left[ \log_2 \frac{p_1}{p_1 + n_1} - \log_2 \frac{p_0}{p_0 + n_0} \right]$$

where t is the number of positive instances covered by both r and r′
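A direct transcription of the formula is sketched below. Adding a conjunct can only shrink a rule's coverage, so every positive instance covered by r′ is also covered by r, and t equals p_1. The function name and the zero-positive convention are assumptions.

```python
import math


def foil_gain(p0: int, n0: int, p1: int, n1: int) -> float:
    """Foil's information gain for extending r (p0, n0) to r' (p1, n1)."""
    if p1 == 0:
        return 0.0  # assumed convention: t = 0, so no gain (avoids log2(0))
    t = p1  # positives covered by both r and r'; r' covers a subset of r's instances
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


# Cheat table, target class Yes: the empty rule covers 3 positives and 7 negatives.
print(foil_gain(3, 7, 3, 4))  # add (Refund=No): 3 positives, 4 negatives
print(foil_gain(3, 7, 2, 2))  # add (Marital Status=Single): 2 positives, 2 negatives
```

Refund=No scores about 1.54 against about 1.47 for Marital Status=Single. Foil's gain therefore prefers it, even though its accuracy (3/7) is lower than that of Single (2/4), because the gain rewards conjuncts that keep more positive instances covered.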


Direct Method: RIPPER

• Growing a rule:

– Start from an empty rule

– Add conjuncts as long as they improve Foil's information gain

– Stop when the rule no longer covers negative examples

– Prune the rule immediately using incremental reduced error pruning

– Measure for pruning: $v = (p - n) / (p + n)$

  • p: number of positive examples covered by the rule in the validation set

  • n: number of negative examples covered by the rule in the validation set

– Pruning method: delete any final sequence of conditions that maximizes v
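The pruning step can be sketched as trying every prefix of the rule's conditions, which is equivalent to deleting each possible final sequence, and keeping the prefix with the highest v on the validation set. The validation records, the attribute names, keeping at least one condition, and breaking ties toward shorter rules are all hypothetical choices for illustration.

```python
def ripper_v(p: int, n: int) -> float:
    return (p - n) / (p + n)


def prune_final_sequence(rule, validation, target, class_attr="Cheat"):
    """Keep the prefix of `rule` (at least one condition) that maximizes v on `validation`."""
    best_rule, best_v = rule, float("-inf")
    for i in range(1, len(rule) + 1):
        prefix = rule[:i]  # equivalent to deleting the final sequence rule[i:]
        covered = [r for r in validation if all(r[a] == val for a, val in prefix)]
        if not covered:
            continue  # v is undefined when the prefix covers no validation instances
        p = sum(r[class_attr] == target for r in covered)
        v = ripper_v(p, len(covered) - p)
        if v > best_v:  # strict '>' keeps the shorter prefix on ties (assumed)
            best_rule, best_v = prefix, v
    return best_rule


# Hypothetical validation set: (Refund, Marital Status, Income>80K, Cheat).
VALIDATION = [
    dict(zip(("Refund", "Marital Status", "Income>80K", "Cheat"), row))
    for row in [
        ("No", "Single", "Yes", "Yes"),
        ("No", "Single", "No", "Yes"),
        ("No", "Married", "Yes", "No"),
        ("Yes", "Single", "Yes", "No"),
        ("No", "Divorced", "No", "No"),
        ("No", "Single", "Yes", "No"),
    ]
]
RULE = [("Refund", "No"), ("Marital Status", "Single"), ("Income>80K", "Yes")]
print(prune_final_sequence(RULE, VALIDATION, "Yes"))
```

Here v is −0.2, 1/3, and 0 for the one-, two-, and three-condition prefixes, so the final Income>80K test is deleted.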


Direct Method: RIPPER

• Building a Rule Set:

– Use the sequential covering algorithm

• Find the best rule that covers the current set of positive examples

• Eliminate both positive and negative examples covered by the rule

– Each time a rule is added to the rule set, compute the description length

• Stop adding new rules when the new description length is d bits longer than the smallest description length obtained so far; d is often chosen as 64 bits


Direct Method: RIPPER

• Optimize the rule set:

– For each rule r in the rule set R

• Consider 2 alternative rules:

  • Replacement rule (r*): grow a new rule from scratch
  • Revised rule (r′): add conjuncts to extend the rule r

• Compare the rule set containing r against the rule sets containing r* and r′

• Choose the rule set that minimizes the description length (MDL principle)

– Repeat rule generation and rule optimization for the remaining positive examples


C4.5rules versus RIPPER

C4.5rules (rows: actual class, columns: predicted class):

              Amphibians  Fishes  Reptiles  Birds  Mammals
Amphibians         0         0        0        0       2
Fishes             0         3        0        0       0
Reptiles           0         0        3        0       1
Birds              0         0        1        2       1
Mammals            0         2        1        0       4

RIPPER (rows: actual class, columns: predicted class):

              Amphibians  Fishes  Reptiles  Birds  Mammals
Amphibians         2         0        0        0       0
Fishes             0         2        0        0       1
Reptiles           1         0        3        0       0
Birds              1         0        0        3       0
Mammals            0         0        1        0       6
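Overall accuracy is the sum of the diagonal (correct predictions) divided by the total count. The short sketch below applies it to the two matrices above. The function names are hypothetical.

```python
CLASSES = ["Amphibians", "Fishes", "Reptiles", "Birds", "Mammals"]

# Rows: actual class, columns: predicted class (same order as CLASSES).
C45RULES = [[0, 0, 0, 0, 2], [0, 3, 0, 0, 0], [0, 0, 3, 0, 1], [0, 0, 1, 2, 1], [0, 2, 1, 0, 4]]
RIPPER = [[2, 0, 0, 0, 0], [0, 2, 0, 0, 1], [1, 0, 3, 0, 0], [1, 0, 0, 3, 0], [0, 0, 1, 0, 6]]


def overall_accuracy(matrix):
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    # Fraction of each actual class that was predicted correctly.
    return {cls: row[i] / sum(row) for i, (cls, row) in enumerate(zip(CLASSES, matrix))}


print(overall_accuracy(C45RULES), overall_accuracy(RIPPER))  # 0.6 0.8
```

Both matrices contain 20 test instances. C4.5rules classifies 12 of them correctly and RIPPER 16. `per_class_recall` shows, for instance, that C4.5rules misclassifies both amphibians while RIPPER gets both right.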


Eager Learners

• So far we have learnt that classification involves

– An inductive step for constructing classification models from data

– A deductive step for applying the derived model to previously unseen instances

• For decision tree induction and rule based classifiers, the models are constructed immediately after the training set is provided

• Such techniques are known as eager learners because they intend to learn the model as soon as possible, once the training data is available


Lazy Learners

• An opposite strategy would be to delay the process of generalizing the training data until it is needed to classify the unseen instances

• Techniques that employ such a strategy are known as lazy learners

• An example of a lazy learner is the Rote Classifier, which memorizes the entire training data and performs classification only if the attributes of a test instance match one of the training instances exactly


Nearest-neighbor Classifiers

• One way to make the “Rote Classifier” approach more flexible is to find all training instances that are relatively similar to the test instance. They are called nearest neighbors of the test instance

• The test instance can then be classified according to the class labels of its neighbors

• “If it walks like a duck, quacks like a duck, and looks like a duck, then it’s probably a duck”
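A minimal k-nearest-neighbor classifier can be sketched as follows. It keeps the training data as-is (lazy learning), finds the k training instances closest to the test instance under Euclidean distance, and takes a majority vote. The two-dimensional points, the labels, and the function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_classify(train, x, k):
    """train: list of (point, label) pairs; returns the majority label among the k nearest."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]  # ties go to the label counted first (the nearer one)


TRAIN = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B")]
print(knn_classify(TRAIN, (1.1, 1.0), k=3))  # A
print(knn_classify(TRAIN, (5.1, 5.0), k=3))  # B
```

No model is built in advance: all the work happens at classification time, when every training point's distance to the test instance is computed.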


1-nearest Neighbor

• If k = 1, we can illustrate the decision boundary of each class by using a Voronoi diagram


Distance Metric

• A distance metric is required to compute the distance between two instances

• A nearest neighbor classifier represents each instance as a data point embedded in a d-dimensional space, where d is the number of continuous attributes

• Euclidean Distance

$$d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$$

• Weighted Distance

– Weight factor, $w = 1/d^2$, where d is the distance between the test instance and the neighbor

– Weight the vote according to the distance
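The two voting schemes can be compared in a short sketch. With w = 1/d², a single very close neighbor can outvote several more distant ones. The points and function names are hypothetical. Returning the label of an exact match (d = 0) is an assumed convention to avoid dividing by zero.

```python
import math
from collections import defaultdict


def euclidean(p, q):
    # d(p, q) = sqrt(sum_i (p_i - q_i)^2)
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn(train, x, k, weighted=True):
    neighbors = sorted(train, key=lambda item: euclidean(item[0], x))[:k]
    scores = defaultdict(float)
    for point, label in neighbors:
        d = euclidean(point, x)
        if weighted and d == 0:
            return label  # exact match (assumed convention)
        scores[label] += 1 / d ** 2 if weighted else 1.0  # w = 1/d^2, or one vote each
    return max(scores, key=scores.get)


TRAIN = [((1, 0), "A"), ((2, 0), "B"), ((0, 2), "B"), ((10, 10), "A")]
print(knn(TRAIN, (0, 0), k=3, weighted=False))  # B: two votes against one
print(knn(TRAIN, (0, 0), k=3, weighted=True))   # A: 1/1 outweighs 1/4 + 1/4
```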


Choosing the value of k

• If k is too small, the classifier is sensitive to noise points

• If k is too large

– Computationally intensive

– Neighborhood may include points from other classes


Nearest Neighbor Classifiers

• Problems with Euclidean distance

– High dimensional data

• Curse of dimensionality

– Can produce counterintuitive results (e.g., text document classification)

• Solution: Normalization

[Figure: Euclidean distance between pairs of un-normalized vectors]
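The effect of normalization can be checked numerically. This sketch assumes that normalization means scaling each vector to unit length. It uses two hypothetical pairs of 12-dimensional binary vectors: one pair that shares ten nonzero entries and one pair that shares none.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def to_unit_length(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Hypothetical binary vectors (e.g., term occurrence in two documents).
similar = ([1] * 11 + [0], [0] + [1] * 11)      # share ten nonzero entries
dissimilar = ([1] + [0] * 11, [0] * 11 + [1])   # share no nonzero entry

for name, (u, v) in [("similar", similar), ("dissimilar", dissimilar)]:
    raw = euclidean(u, v)
    normalized = euclidean(to_unit_length(u), to_unit_length(v))
    print(f"{name}: raw {raw:.3f}, normalized {normalized:.3f}")
```

Before normalization, both pairs are sqrt(2) ≈ 1.414 apart, even though the first pair is nearly identical. After scaling to unit length, the first pair's distance drops to about 0.426 while the second stays at 1.414.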