More Methodology: Nearest-Neighbor Classifiers | CS 591, Exams of Programming Languages

Material Type: Exam; Class: ST: Prog Analy &Mechanization; Subject: Computer Science; University: University of New Mexico; Term: Unknown 1989;

Uploaded on 07/22/2009

More Methodology;
Nearest-Neighbor
Classifiers
Sec 4.7

Review: Properties of DTs

Axis-orthogonal, hyperrectangular, piecewise-constant models

Categorical labels

Non-metric

Holdout data

Usual to “hold out” a separate set of data for testing; not used to train classifier

A.k.a., test set, holdout set, evaluation set, etc.

E.g., split the data $X = [X_1, X_2, \ldots, X_N]$ at index $i$:

$X_{\text{train}} = [X_1, X_2, \ldots, X_i]$

$X_{\text{test}} = [X_{i+1}, X_{i+2}, \ldots, X_N]$

$\mathrm{acc}(X_{\text{train}})$ is training set accuracy

$\mathrm{acc}(X_{\text{test}})$ is test set (or generalization) accuracy
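The split-and-score idea above can be sketched in a few lines of Python. This is only an illustration: the toy data, the helper names `holdout_split` and `accuracy`, and the threshold classifier standing in for a real learner are all assumptions, not part of the slides.

```python
def holdout_split(X, y, i):
    """First i instances go to the training set, the rest to the test set."""
    return (X[:i], y[:i]), (X[i:], y[i:])

def accuracy(predict, X, y):
    """Fraction of instances in X whose predicted label equals the true label."""
    correct = sum(1 for xi, yi in zip(X, y) if predict(xi) == yi)
    return correct / len(X)

# Toy data (hypothetical): one numeric feature, two classes.
X = [[1.0], [2.0], [3.0], [8.0], [9.0], [10.0], [2.5], [8.5]]
y = ["A", "A", "A", "B", "B", "B", "A", "B"]

(X_train, y_train), (X_test, y_test) = holdout_split(X, y, 6)

# Stand-in classifier (an assumption): threshold halfway between the
# class means, computed from the training data only.
mean_a = sum(x[0] for x, c in zip(X_train, y_train) if c == "A") / y_train.count("A")
mean_b = sum(x[0] for x, c in zip(X_train, y_train) if c == "B") / y_train.count("B")
threshold = (mean_a + mean_b) / 2

def predict(x):
    return "A" if x[0] < threshold else "B"

train_acc = accuracy(predict, X_train, y_train)  # training set accuracy
test_acc = accuracy(predict, X_test, y_test)     # test (generalization) accuracy
print(train_acc, test_acc)
```

The point of the sketch is that the test instances never influence `threshold`; they are touched only when computing `test_acc`.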

Gotchas...

What if you’re unlucky when you split data into train/test?

E.g., all train data are class A and all test are class B?

No “red” things show up in training data

Best answer: stratification

Try to make sure class (+feature) ratios are same in train/test sets (and same as original data)

Why does this work?

Almost as good: randomization

Shuffle data randomly before split

Why does this work?
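A minimal sketch of a stratified, shuffled split follows. It stratifies on the class label only (not on features), and the function name `stratified_split`, the `test_fraction` parameter, and the toy labels are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle within each class, then send the same fraction of every
    class to the test set so class ratios match in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                      # randomization within the class
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])

    def pick(idxs):
        return [X[i] for i in idxs], [y[i] for i in idxs]

    return pick(train_idx), pick(test_idx)

# Toy data (hypothetical): 8 instances of class A, 4 of class B.
X = [[float(i)] for i in range(12)]
y = ["A"] * 8 + ["B"] * 4

(X_train, y_train), (X_test, y_test) = stratified_split(X, y)
print(y_train.count("A"), y_train.count("B"))
print(y_test.count("A"), y_test.count("B"))
```

With an unstratified shuffle, the same ratios hold only on average, which is why randomization is described as almost as good.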

CV in pix

[Figure: k-fold cross-validation. Original data [X; Y] → random shuffle [X′; Y′] → k-way partition [X1′; Y1′], [X2′; Y2′], ..., [Xk′; Yk′] → k train/test sets → k accuracies (53.7%, 85.1%, 73.2%, ...)]
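The pipeline in the picture (shuffle, partition k ways, build k train/test sets, collect k accuracies) might look like this in Python. The helper names, the toy data, and the threshold learner `train_threshold` are hypothetical stand-ins; any function that takes training data and returns a predictor could be passed in instead.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1, then deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[j::k] for j in range(k)]

def train_threshold(X_train, y_train):
    """Stand-in learner (an assumption): threshold between the class means."""
    a = [x[0] for x, c in zip(X_train, y_train) if c == "A"]
    b = [x[0] for x, c in zip(X_train, y_train) if c == "B"]
    threshold = (sum(a) / len(a) + sum(b) / len(b)) / 2
    return lambda x: "A" if x[0] < threshold else "B"

def cross_validate(train_fn, X, y, k, seed=0):
    """Each fold is the test set once; the other k-1 folds form the training set."""
    folds = k_fold_indices(len(X), k, seed)
    accuracies = []
    for j, test_idx in enumerate(folds):
        train_idx = [i for m, fold in enumerate(folds) if m != j for i in fold]
        predict = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies

# Toy data (hypothetical): two well-separated classes on one feature.
X = [[1.0], [1.5], [2.0], [2.5], [3.0], [3.5],
     [8.0], [8.5], [9.0], [9.5], [10.0], [10.5]]
y = ["A"] * 6 + ["B"] * 6

accuracies = cross_validate(train_threshold, X, y, k=3)
print(accuracies)
```

On real data the k accuracies typically differ from fold to fold, as in the figure; the toy data here is deliberately easy.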

But is it really learning?

Now we know how well our models are performing

But are they really learning?

Maybe any classifier would do as well

E.g., a default classifier (pick the most likely class) or a random classifier

How can we tell if the model is learning anything?

Go back to first definitions

What does it mean to learn something?
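One concrete way to ask whether a model is learning is to score the two baselines mentioned above on the same test set and compare. The sketch below is an assumption-laden illustration: the function names and the toy labels are invented, and the baselines ignore the features entirely.

```python
import random
from collections import Counter

def default_classifier(y_train):
    """Always predict the most frequent class in the training labels."""
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda x: majority

def random_classifier(y_train, seed=0):
    """Predict a class chosen uniformly at random from those seen in training."""
    rng = random.Random(seed)
    classes = sorted(set(y_train))
    return lambda x: rng.choice(classes)

def accuracy(predict, X, y):
    return sum(1 for xi, yi in zip(X, y) if predict(xi) == yi) / len(X)

# Toy labels (hypothetical); the feature values are irrelevant to both baselines.
y_train = ["A"] * 7 + ["B"] * 3
X_test = [[0.0]] * 10
y_test = ["A"] * 6 + ["B"] * 4

default_acc = accuracy(default_classifier(y_train), X_test, y_test)
random_acc = accuracy(random_classifier(y_train), X_test, y_test)
print(default_acc, random_acc)
```

A trained model whose test accuracy is no better than `default_acc` has not demonstrably learned anything from the features.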

Measuring variance

Cross validation helps you get better estimate of accuracy for small data

Randomization (shuffling the data) helps guard against poor splits/ordering of the data

Learning curves help assess learning rate/asymptotic accuracy

Still one big missing component: variance

Definition: Variance of a classifier is the fraction of error due to the specific data set it’s trained on

Measuring variance

Variance tells you how much you expect your classifier/performance to change when you train it on a new (but similar) data set

E.g., take 5 samplings of a data source; train/test 5 classifiers

Accuracies: 74.2, 90.3, 58.1, 80.6, 90.

Mean accuracy: 78.7%

Std dev of acc: 13.4%

Variance is usually a function of both classifier and data source

High variance classifiers are very susceptible to small changes in data
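The summary statistics in the five-sample example can be checked with Python's `statistics` module. One caveat: the fifth accuracy is cut off in the source, so the value 90.3 used below is an assumption, chosen because it reproduces the stated mean and standard deviation.

```python
import statistics

# Accuracies (%) from five train/test runs.
# The last value is ASSUMED to be 90.3; it is truncated in the source.
accuracies = [74.2, 90.3, 58.1, 80.6, 90.3]

mean_acc = statistics.mean(accuracies)
std_acc = statistics.stdev(accuracies)  # sample standard deviation (divides by n - 1)

print(f"mean = {mean_acc:.1f}%, std dev = {std_acc:.1f}%")
```

Under that assumption, the stated 13.4% matches the sample standard deviation (dividing by n − 1) rather than the population one.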

Putting it all together

[Figure: accuracy vs. % data size on the “hepatitis” data]

5 minutes of math...

Decision trees are non-metric

Don’t know anything about relations between instances, except sets induced by feature splits

Often, we have well-defined distances between points

Idea of distance encapsulated by a metric

5 minutes of math...

Examples:

Euclidean distance

$$d(X_a, X_b) = \sqrt{(x_{a1} - x_{b1})^2 + \cdots + (x_{ad} - x_{bd})^2} = \left((X_a - X_b)^T \cdot (X_a - X_b)\right)^{1/2} = \sqrt{\sum_{i=1}^{d} (x_{ai} - x_{bi})^2}$$

  • Note: omitting the square root no longer gives a true metric (the triangle inequality can fail), but it preserves the ordering of distances, so it usually won’t change our results
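As a small sketch of the Euclidean distance and of the note about the square root, the following compares rankings with and without it. The function names and the sample points are assumptions for illustration.

```python
import math

def euclidean(xa, xb):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xa, xb)))

def squared_euclidean(xa, xb):
    """Same sum without the square root; cheaper, same ordering of distances."""
    return sum((a - b) ** 2 for a, b in zip(xa, xb))

# Hypothetical query point and candidates.
query = [0.0, 0.0]
points = [[3.0, 4.0], [1.0, 1.0], [6.0, 8.0]]

by_dist = sorted(points, key=lambda p: euclidean(query, p))
by_sq = sorted(points, key=lambda p: squared_euclidean(query, p))

print(euclidean(query, [3.0, 4.0]))
print(by_dist == by_sq)
```

Because the square root is monotonically increasing, the nearest point is the same either way.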

5 minutes of math...

Examples:

Manhattan (taxicab) distance

Distance travelled along a grid between two points

No diagonals allowed

$$d(X_a, X_b) = |x_{a1} - x_{b1}| + \cdots + |x_{ad} - x_{bd}| = \sum_{i=1}^{d} |x_{ai} - x_{bi}|$$
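A minimal sketch of the Manhattan distance, with a hypothetical function name and sample points:

```python
def manhattan(xa, xb):
    """Sum of absolute coordinate differences (grid distance, no diagonals)."""
    return sum(abs(a - b) for a, b in zip(xa, xb))

# Moving from (0, 0) to (3, 4) along a grid takes 3 + 4 steps.
d = manhattan([0.0, 0.0], [3.0, 4.0])
print(d)
```

For the same pair of points the straight-line (Euclidean) distance is 5, so the grid route is longer, as expected when diagonals are not allowed.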

5 minutes of math...

Examples:

What if some attribute is categorical?

Typical answer is 0/1 distance :

For each attribute, add 1 if the instances differ in that attribute, else 0

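$$d_{0/1}(X_a, X_b) = \sum_{i=1}^{d} \delta(x_{ai} \neq x_{bi})$$

where $\delta(\cdot)$ is 1 when its argument is true and 0 otherwise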
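The 0/1 distance for categorical attributes can be sketched as a simple count of mismatches. The function name and the example attribute values are assumptions for illustration.

```python
def zero_one_distance(xa, xb):
    """Add 1 for every attribute on which the two instances differ, else 0."""
    return sum(1 for a, b in zip(xa, xb) if a != b)

# Hypothetical instances with three categorical attributes each.
xa = ["red", "round", "small"]
xb = ["red", "square", "large"]

d = zero_one_distance(xa, xb)
print(d)
```

Here the instances agree on the first attribute and differ on the other two.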

Distances in classification

Nearest neighbor : find the nearest instance to the query point in feature space, return the class of that instance

Simplest possible distance-based classifier

With more notation:

$$f(X) = \mathrm{Class}\left(\arg\min_{X' \in X_{\text{train}}} d(X, X')\right)$$
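A minimal sketch of the nearest-neighbor rule: scan the training set for the instance closest to the query and return its class. The function names, the default choice of Euclidean distance, the tie-breaking behavior, and the toy data are all assumptions; this is a brute-force illustration, not an efficient implementation.

```python
import math

def euclidean(xa, xb):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xa, xb)))

def nearest_neighbor_classify(x, X_train, y_train, dist=euclidean):
    """Return the class of the training instance nearest to x.
    Ties go to the earliest training instance (a property of min)."""
    nearest = min(range(len(X_train)), key=lambda i: dist(x, X_train[i]))
    return y_train[nearest]

# Toy training set (hypothetical): two clusters in two dimensions.
X_train = [[1.0, 1.0], [2.0, 1.5], [8.0, 9.0], [9.0, 8.5]]
y_train = ["A", "A", "B", "B"]

print(nearest_neighbor_classify([1.5, 1.0], X_train, y_train))
print(nearest_neighbor_classify([8.5, 9.5], X_train, y_train))
```

The `dist` parameter can be swapped for any other distance function over the same feature vectors, such as a Manhattan or 0/1 distance.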