Error Bounds for Classification: Hoeffding's Inequality and Bonferroni's Bound, Study notes of Probability and Statistics

Yale University Probability and Statistics

These notes cover Hoeffding's inequality and its use in gauging how close empirical risks are to their expected values in the context of classification error bounds. They also introduce Bonferroni's bound and its use in bounding the probability of significant deviations from expected values, and explain the distribution-free nature of the resulting bound and its implications for the minimum empirical risk classifier.

Typology: Study notes

2011/2012

Uploaded on 10/19/2012


Connexions module: m16265

Classification Error Bounds ∗

Robert Nowak

This work is produced by The Connexions Project and licensed under the Creative Commons Attribution License †

1 Recap: Classier design

Given a set of training data

{Xi, Yi}n

i=1

and a nite collection of candidate functions

F

, select ^

fn∈ F

that

(hopefully) is a good predictor for future cases. That is

^

fn=argmin

f∈F

^

Rn(f)

(1)

where ^

Rn(f)

is the empirical risk. For any particular

f∈ F

, the corresponding empirical risk is dened as

^

Rn(f) = 1

n

X

i=1

1{f(Xi)6=Yi}.

(2)
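As a minimal sketch of equations (1) and (2), the snippet below computes the empirical risk of each candidate and picks a minimizer. The threshold classifiers and the toy data are hypothetical and chosen only for illustration; the notes do not specify any particular class $\mathcal{F}$.

```python
from typing import Callable, Sequence

Classifier = Callable[[float], int]


def empirical_risk(f: Classifier, xs: Sequence[float], ys: Sequence[int]) -> float:
    """Fraction of training points that f misclassifies, as in equation (2)."""
    return sum(1 for x, y in zip(xs, ys) if f(x) != y) / len(xs)


def erm(candidates: Sequence[Classifier], xs: Sequence[float], ys: Sequence[int]) -> Classifier:
    """Return a candidate with the smallest empirical risk, as in equation (1)."""
    return min(candidates, key=lambda f: empirical_risk(f, xs, ys))


# Hypothetical finite class: threshold classifiers f_t(x) = 1 if x >= t else 0.
thresholds = [0.2, 0.4, 0.6, 0.8]
candidates = [lambda x, t=t: int(x >= t) for t in thresholds]

# Toy training set (illustrative values only).
xs = [0.1, 0.3, 0.5, 0.7, 0.9]
ys = [0, 0, 1, 1, 1]

risks = [empirical_risk(f, xs, ys) for f in candidates]
f_hat = erm(candidates, xs, ys)
print(risks)
```

Here the threshold 0.4 separates the toy data perfectly, so it is the empirical risk minimizer. When several candidates tie, `min` returns the first one, which is one arbitrary but valid choice of argmin.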

2 Hoeding's inequality

Hoeding's inequality (Cherno's bound in this case) allows us to gauge how close ^

Rn(f)

is to the true risk

of

f

,

R(f)

, in probability

P|

^

Rn(f)−R(f)| ≥ ε≤2e−2nε2.

(3)
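To get a feel for how loose or tight (3) is, the following sketch compares the bound with a simulated deviation frequency for a single fixed classifier. It assumes the error indicators $\mathbf{1}_{\{f(X_i) \neq Y_i\}}$ are i.i.d. Bernoulli with mean $R(f) = p$; the values of `n`, `eps`, `p` and the number of trials are illustrative assumptions, not taken from the notes.

```python
import math
import random


def hoeffding_bound(n: int, eps: float) -> float:
    """Right-hand side of (3): 2 * exp(-2 * n * eps^2)."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2)


def deviation_frequency(p: float, n: int, eps: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(|R_hat_n(f) - R(f)| >= eps) when R(f) = p.

    Each trial draws n Bernoulli(p) error indicators and averages them.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        errors = sum(rng.random() < p for _ in range(n))
        # Small tolerance so that exact ties are not lost to floating-point rounding.
        if abs(errors / n - p) >= eps - 1e-12:
            hits += 1
    return hits / trials


n, eps, p = 100, 0.1, 0.3  # illustrative values
bound = hoeffding_bound(n, eps)
freq = deviation_frequency(p, n, eps, trials=2000)
print(f"simulated frequency {freq:.4f}, Hoeffding bound {bound:.4f}")
```

The bound holds for every value of `p`, which is why it is typically well above the simulated frequency for any one particular `p`.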

Since our selection process involves deciding among all $f \in \mathcal{F}$, we would like to gauge how close the empirical risks are to their expected values. We can do this by studying the probability that one or more of the empirical risks deviates significantly from its expected value. This is captured by the probability

$$P\left(\max_{f \in \mathcal{F}} |\hat{R}_n(f) - R(f)| \geq \varepsilon\right). \tag{4}$$

Note that the event

$$\left\{\max_{f \in \mathcal{F}} |\hat{R}_n(f) - R(f)| \geq \varepsilon\right\} \tag{5}$$

is equivalent to the union of the events

$$\bigcup_{f \in \mathcal{F}} \left\{|\hat{R}_n(f) - R(f)| \geq \varepsilon\right\}. \tag{6}$$

∗ Version 1.2: Feb 11, 2009 10:37 am US/Central

† http://creativecommons.org/licenses/by/2.0/

http://cnx.org/content/m16265/1.2/

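Therefore, we can use Bonferroni's bound (also known as the union of events bound, or union bound) to obtain

$$\begin{aligned}
P\left(\max_{f \in \mathcal{F}} |\hat{R}_n(f) - R(f)| \geq \varepsilon\right)
&= P\left(\bigcup_{f \in \mathcal{F}} \left\{|\hat{R}_n(f) - R(f)| \geq \varepsilon\right\}\right) \\
&\leq \sum_{f \in \mathcal{F}} P\left(|\hat{R}_n(f) - R(f)| \geq \varepsilon\right) \\
&\leq \sum_{f \in \mathcal{F}} 2e^{-2n\varepsilon^2} \\
&= 2|\mathcal{F}|e^{-2n\varepsilon^2},
\end{aligned} \tag{7}$$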

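where $|\mathcal{F}|$ is the number of classifiers in $\mathcal{F}$. In the proof of Hoeffding's inequality we also obtained a one-sided inequality that implied

$$P\left(R(f) - \hat{R}_n(f) \geq \varepsilon\right) \leq e^{-2n\varepsilon^2} \tag{8}$$

and hence

$$P\left(\max_{f \in \mathcal{F}} \left(R(f) - \hat{R}_n(f)\right) \geq \varepsilon\right) \leq |\mathcal{F}|e^{-2n\varepsilon^2}. \tag{9}$$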

We can restate the inequality above as follows: for all $f \in \mathcal{F}$ and for all $\delta > 0$, with probability at least $1 - \delta$,

$$R(f) \leq \hat{R}_n(f) + \sqrt{\frac{\log|\mathcal{F}| + \log(1/\delta)}{2n}}. \tag{10}$$

This follows by setting $\delta = |\mathcal{F}|e^{-2n\varepsilon^2}$ and solving for $\varepsilon$: taking logarithms gives $2n\varepsilon^2 = \log|\mathcal{F}| + \log(1/\delta)$. Thus with high probability $(1 - \delta)$, the true risk of every $f \in \mathcal{F}$ is bounded by the empirical risk of $f$ plus a constant that depends on $\delta > 0$, the number of training samples $n$, and the size of $\mathcal{F}$. Most importantly, the bound does not depend on the unknown distribution $P_{XY}$. Therefore, we can call this a distribution-free bound.
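The relationship between $\varepsilon$ and $\delta$ can be checked numerically. The sketch below computes the constant added to the empirical risk and confirms that plugging it back into the right-hand side of (9) recovers $\delta$. The function names and the values of `size_F`, `n` and `delta` are illustrative assumptions.

```python
import math


def confidence_term(num_classifiers: int, n: int, delta: float) -> float:
    """sqrt((log|F| + log(1/delta)) / (2n)), the constant added to the empirical risk."""
    return math.sqrt((math.log(num_classifiers) + math.log(1.0 / delta)) / (2.0 * n))


def failure_probability(num_classifiers: int, n: int, eps: float) -> float:
    """|F| * exp(-2 * n * eps^2), the right-hand side of (9)."""
    return num_classifiers * math.exp(-2.0 * n * eps ** 2)


size_F, n, delta = 1000, 5000, 0.05  # illustrative values
eps = confidence_term(size_F, n, delta)
# Plugging eps back into (9) should recover delta, up to rounding.
recovered = failure_probability(size_F, n, eps)
print(f"eps = {eps:.4f}, recovered delta = {recovered:.4f}")
```

Because the term depends on $|\mathcal{F}|$ only through its logarithm, even a large finite class adds little to the bound, while quadrupling `n` halves it.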

3 Error Bounds

We can use the distribution-free bound above to obtain a bound on the expected performance of the minimum empirical risk classifier

$$\hat{f}_n = \arg\min_{f \in \mathcal{F}} \hat{R}_n(f). \tag{11}$$

We are interested in bounding

$$E\left[R(\hat{f}_n)\right] - \min_{f \in \mathcal{F}} R(f), \tag{12}$$

the expected risk of $\hat{f}_n$ minus the minimum risk over all $f \in \mathcal{F}$. Note that this difference is always non-negative since $\hat{f}_n$ is at best as good as

$$f^* = \arg\min_{f \in \mathcal{F}} R(f). \tag{13}$$

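Thus

$$E\left[R(\hat{f}_n) - \hat{R}_n(f^*)\right] \leq C(\mathcal{F}, n, \delta) + \delta, \tag{23}$$

where $C(\mathcal{F}, n, \delta) = \sqrt{\frac{\log|\mathcal{F}| + \log(1/\delta)}{2n}}$ is the deviation term from the distribution-free bound.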

So we have

$$E\left[R(\hat{f}_n)\right] - \min_{f \in \mathcal{F}} R(f) \leq \sqrt{\frac{\log|\mathcal{F}| + \log(1/\delta)}{2n}} + \delta, \quad \forall \delta > 0. \tag{24}$$
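The right-hand side of (24) trades off two terms: shrinking $\delta$ reduces the additive $\delta$ but inflates the square-root term. A small sketch, with an illustrative class size and sample size that are not from the notes, evaluates the bound on a grid of $\delta$ values:

```python
import math


def excess_risk_bound(num_classifiers: int, n: int, delta: float) -> float:
    """Right-hand side of (24): sqrt((log|F| + log(1/delta)) / (2n)) + delta."""
    root = math.sqrt((math.log(num_classifiers) + math.log(1.0 / delta)) / (2.0 * n))
    return root + delta


size_F, n = 1000, 5000  # illustrative values
deltas = [10.0 ** -k for k in range(1, 7)]
bounds = {d: excess_risk_bound(size_F, n, d) for d in deltas}
best_delta = min(bounds, key=bounds.get)
for d in deltas:
    print(f"delta = {d:.0e}  bound = {bounds[d]:.5f}")
```

On this grid the smallest bound occurs at a small $\delta$ of the same order as $1/n$, and the bound changes only slowly for smaller values, since $\delta$ enters the square root through $\log(1/\delta)$.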

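In particular, for $\delta = 1/n$, we have

$$E\left[R(\hat{f}_n)\right] - \min_{f \in \mathcal{F}} R(f) \leq \sqrt{\frac{\log|\mathcal{F}| + \log n}{2n}} + \frac{1}{n} \leq \sqrt{\frac{\log|\mathcal{F}| + \log n + 2}{n}},$$

since $\sqrt{x} + \sqrt{y} \leq \sqrt{2}\sqrt{x + y}$, $\forall x, y > 0$.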
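The last step uses the elementary inequality $\sqrt{x} + \sqrt{y} \leq \sqrt{2}\sqrt{x + y}$. One way to fill in the details is sketched below; the choice $y = 1/n^2$ is an assumption about how the inequality is applied, since the notes do not spell it out.

```latex
% Step 1: the elementary inequality, for x, y > 0.
\begin{align*}
(\sqrt{x} + \sqrt{y})^2 &= x + y + 2\sqrt{xy} \\
  &\leq x + y + (x + y) && \text{since } 2\sqrt{xy} \leq x + y \\
  &= 2(x + y).
\end{align*}
% Step 2: apply it with x = (\log|\mathcal{F}| + \log n)/(2n) and y = 1/n^2.
\begin{align*}
\sqrt{\frac{\log|\mathcal{F}| + \log n}{2n}} + \frac{1}{n}
  &\leq \sqrt{2}\,\sqrt{\frac{\log|\mathcal{F}| + \log n}{2n} + \frac{1}{n^2}} \\
  &= \sqrt{\frac{\log|\mathcal{F}| + \log n}{n} + \frac{2}{n^2}} \\
  &\leq \sqrt{\frac{\log|\mathcal{F}| + \log n + 2}{n}} && \text{since } n \geq 1.
\end{align*}
```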

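4 Application: Histogram Classifier

Let $\mathcal{F}$ be the collection of all classifiers with $M$ equal volume cells. Then $|\mathcal{F}| = 2^M$, and the histogram classification rule

$$\hat{f}_n = \arg\min_{f \in \mathcal{F}} \left\{\frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{f(X_i) \neq Y_i\}}\right\}$$

satisfies

$$E\left[R(\hat{f}_n)\right] - \min_{f \in \mathcal{F}} R(f) \leq \sqrt{\frac{M \log 2 + 2 + \log n}{n}},$$

which suggests the choice $M = \log_2 n$ (balancing $M \log 2$ with $\log n$), resulting in

$$E\left[R(\hat{f}_n)\right] - \min_{f \in \mathcal{F}} R(f) = O\left(\sqrt{\frac{\log n}{n}}\right).$$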
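A minimal sketch of the histogram rule follows, under assumptions the notes leave open: a one-dimensional feature in $[0, 1]$, binary labels in $\{0, 1\}$, and ties or empty cells resolved to label 0. Because a histogram classifier is constant on each cell, minimizing the empirical risk decouples across cells and reduces to a majority vote per cell. All function names and data values are hypothetical.

```python
import math
from typing import List, Sequence


def fit_histogram_classifier(xs: Sequence[float], ys: Sequence[int], num_cells: int) -> List[int]:
    """Minimize empirical risk over classifiers that are constant on equal cells of [0, 1].

    A majority vote in each cell is optimal; ties and empty cells default to label 0.
    """
    ones = [0] * num_cells
    totals = [0] * num_cells
    for x, y in zip(xs, ys):
        cell = min(int(x * num_cells), num_cells - 1)  # x = 1.0 goes to the last cell
        ones[cell] += y
        totals[cell] += 1
    return [int(2 * ones[c] > totals[c]) for c in range(num_cells)]


def predict(labels: Sequence[int], x: float) -> int:
    """Label of the cell that contains x."""
    cell = min(int(x * len(labels)), len(labels) - 1)
    return labels[cell]


def suggested_num_cells(n: int) -> int:
    """M = log2(n), rounded to an integer and kept at least 1."""
    return max(1, round(math.log2(n)))


# Toy training set (illustrative values only).
xs = [0.05, 0.10, 0.30, 0.35, 0.60, 0.65, 0.90, 0.95]
ys = [0, 0, 1, 1, 1, 0, 1, 1]
M = suggested_num_cells(len(xs))
labels = fit_histogram_classifier(xs, ys, M)
print(M, labels)
```

With eight samples the rule of thumb gives three cells, so each cell's label is decided by only two or three points. This is the trade-off the choice $M = \log_2 n$ balances: more cells enlarge $|\mathcal{F}| = 2^M$ and loosen the bound.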
