Error Probability Estimation in Classification Systems, Slides of Pattern Classification and Recognition

Methods for estimating the error probability of a designed classification system using techniques such as error counting, resubstitution, holdout, and leave-one-out. The document also covers the statistical properties of these estimators and their reliability for small data sets.

Typology: Slides

2011/2012

Uploaded on 07/17/2012

bandhula
SYSTEM EVALUATION

The goal is to estimate the error probability of the designed classification system.

Error Counting Technique

  • Let there be $M$ classes.

  • Let $N_i$ be the number of data points in class $\omega_i$ available for testing, and $N = \sum_{i=1}^{M} N_i$ the total number of test points.

  • Let $P_i$ be the error probability for class $\omega_i$.

  • The classifier is assumed to have been designed using another, independent data set.

  • Assuming that the feature vectors in the test data set are independent, the probability of $k_i$ vectors from $\omega_i$ being in error is

$$\text{prob}\{k_i \text{ in } \omega_i \text{ wrongly classified}\} = \binom{N_i}{k_i} P_i^{k_i} (1 - P_i)^{N_i - k_i}$$
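As a quick numerical illustration of the binomial expression above, the following Python sketch evaluates the probability of $k_i$ errors among $N_i$ independent test vectors. The function name `prob_k_errors` and the values $N_i = 10$, $P_i = 0.2$ are assumptions chosen for illustration only; they do not come from the slides.

```python
from math import comb


def prob_k_errors(k, n, p):
    """Probability that exactly k of n independent test vectors are
    misclassified when each is in error with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical setting (not from the slides): 10 test vectors from one
# class, true class error probability 0.2.
n_i, p_i = 10, 0.2

# Probability of each possible error count k = 0, 1, ..., n_i.
pmf = [prob_k_errors(k, n_i, p_i) for k in range(n_i + 1)]

# The error count with the highest probability.
most_likely_k = max(range(n_i + 1), key=lambda k: pmf[k])

print("sum of probabilities:", sum(pmf))
print("most likely number of errors:", most_likely_k)
```

With these assumed numbers the most likely error count is $k_i = 2$, which matches $N_i P_i$.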

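Since the $P_i$'s are not known, estimate $P_i$ by maximizing the above binomial distribution with respect to $P_i$. It turns out that

$$\hat{P}_i = \frac{k_i}{N_i}$$

Thus, count the errors and divide by the total number of test points in the class.

Total probability of error:

$$\hat{P} = \sum_{i=1}^{M} P(\omega_i) \frac{k_i}{N_i}$$

where $P(\omega_i)$ is the a priori probability of class $\omega_i$.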
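The "it turns out" step can be checked directly: the logarithm of the binomial probability is, up to a constant, $k_i \ln P_i + (N_i - k_i) \ln(1 - P_i)$, and setting its derivative $k_i/P_i - (N_i - k_i)/(1 - P_i)$ to zero gives $P_i = k_i/N_i$. The Python sketch below computes the per-class and total estimates and confirms the maximizer with a coarse grid search. All function names, error counts, test-set sizes, and priors are hypothetical values made up for the example.

```python
from math import comb


def class_error_estimates(errors, counts):
    """Per-class estimates: errors counted divided by test points in the class."""
    return [k / n for k, n in zip(errors, counts)]


def total_error_estimate(errors, counts, priors):
    """Total error estimate: prior-weighted sum of the per-class estimates."""
    return sum(p * k / n for k, n, p in zip(errors, counts, priors))


# Hypothetical two-class test results (not from the slides).
errors = [3, 10]      # k_i: misclassified test points per class
counts = [50, 100]    # N_i: test points per class
priors = [0.4, 0.6]   # P(omega_i): assumed class priors

p_hat_i = class_error_estimates(errors, counts)
p_hat = total_error_estimate(errors, counts, priors)

# Grid search over P in (0, 1) for class 1: the binomial likelihood
# should peak at k_1 / N_1.
grid = [j / 1000 for j in range(1, 1000)]
best = max(grid, key=lambda p: comb(50, 3) * p**3 * (1 - p) ** 47)

print("per-class estimates:", p_hat_i)
print("total estimate:", p_hat)
print("grid maximizer for class 1:", best)
```

The grid maximizer agrees with $k_1/N_1 = 3/50$, and the total estimate is approximately 0.084 for these assumed inputs.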

Exploiting the finite size of the data set.

Resubstitution method: Use the same data for training and testing. It underestimates the error. The estimate improves for large $N$ and large $N/l$ ratios.

Holdout Method: Given $N$ samples, divide them into:

  • $N_1$ training points

  • $N_2$ test points

with $N = N_1 + N_2$.

  • Problem: Less data both for training and test
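The contrast between the two methods can be seen in a small experiment. The sketch below is only an illustration under stated assumptions: it uses a 1-nearest-neighbour rule on synthetic scalar data (two overlapping Gaussian classes), none of which appears in the slides. With this rule every training point is its own nearest neighbour, so the resubstitution error is zero, an extreme case of the underestimation mentioned above.

```python
import random


def nearest_neighbor_predict(train, x):
    """Return the label of the training point closest to x (1-NN rule)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def error_rate(train, test):
    """Fraction of test points misclassified by a 1-NN rule built on train."""
    wrong = sum(1 for x, y in test if nearest_neighbor_predict(train, x) != y)
    return wrong / len(test)


rng = random.Random(0)

# Hypothetical data: two overlapping 1-D Gaussian classes, 100 points each.
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
data += [(rng.gauss(1.0, 1.0), 1) for _ in range(100)]
rng.shuffle(data)

# Resubstitution: train and test on the same N points.
resub_error = error_rate(data, data)

# Holdout: N1 points for training, the remaining N2 = N - N1 for testing.
n1 = 100
train, test = data[:n1], data[n1:]
holdout_error = error_rate(train, test)

print("resubstitution error:", resub_error)
print("holdout error:", holdout_error)
```

The holdout figure is computed from only $N_2$ points by a classifier trained on only $N_1$ points, which is the drawback noted above.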

Leave-one-out Method

The steps:

  • Choose one sample out of the $N$. Train the classifier using the remaining $N-1$ samples. Test the classifier using the selected sample. Count an error if it is misclassified.

  • Repeat the above by excluding a different sample each time.

  • Compute the error probability by averaging the counted errors.
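The steps above can be sketched in Python as follows. This is a minimal illustration, not the slides' own implementation: the 1-nearest-neighbour rule, the function names, and the synthetic Gaussian data are all assumptions introduced for the example.

```python
import random


def nearest_neighbor_predict(train, x):
    """Return the label of the training point closest to x (1-NN rule)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def leave_one_out_error(data):
    """Leave-one-out error estimate for a 1-NN rule on (feature, label) pairs."""
    errors = 0
    for i, (x, y) in enumerate(data):
        # Train on the remaining N-1 samples, leaving sample i out.
        remaining = data[:i] + data[i + 1:]
        # Test on the excluded sample and count an error if misclassified.
        if nearest_neighbor_predict(remaining, x) != y:
            errors += 1
    # Average the counted errors over all N samples.
    return errors / len(data)


rng = random.Random(1)

# Hypothetical data: two overlapping 1-D Gaussian classes, 50 points each.
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(50)]
data += [(rng.gauss(1.0, 1.0), 1) for _ in range(50)]

loo_error = leave_one_out_error(data)
print("leave-one-out error:", loo_error)
```

Each of the $N$ samples is used once for testing and $N-1$ times for training, at the cost of designing the classifier $N$ times.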