


























































































The key points are: Empirical Risk Minimization, Family of Classifiers, Uniform Convergence, Set of Class Labels, Hypothesis Space, Action Space, Loss Function, Accuracy of Two Estimates, Triangle Inequality, Hoeffding Bound



























































































We have been discussing the issue of consistency of empirical risk minimization (ERM).
PR NPTEL course – p.1/
We have seen that ERM is consistent if the convergence as per the law of large numbers is uniform over $\mathcal{H}$, the family of classifiers over which we are minimizing empirical risk.
In this class we continue our discussion on characterizing families $\mathcal{H}$ where such uniform convergence holds.
We first briefly recall the notation and the results we proved last class.
We are given
$\mathcal{X}$ (feature space)
$\mathcal{Y}$ (set of class labels)
$\mathcal{H}$ (family of classifiers)
Each $h \in \mathcal{H}$ is a function, $h : \mathcal{X} \to \mathcal{A}$, where $\mathcal{A}$ is called the action space.
Training data: $\{(X_i, y_i),\ i = 1, \ldots, n\}$, drawn iid according to some distribution $P_{xy}$ on $\mathcal{X} \times \mathcal{Y}$.
Loss function: $L : \mathcal{Y} \times \mathcal{A} \to \Re^+$.
The risk function, $R$, is given by
$$R(h) = E[L(y, h(X))] = \int L(y, h(X)) \, dP_{xy}$$
We assume that $L$ is bounded so that the expectation always exists.
Let
$$h^* = \arg\min_{h \in \mathcal{H}} R(h)$$
We define the goal of learning as finding $h^*$, the global minimizer of risk.
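As a small worked instance of the risk definition, write $L$ for the loss and $R(h)$ for the risk of a classifier $h$. Suppose (purely as an illustrative assumption, not a choice made in the slides) that the action space equals the set of class labels and that $L$ is the 0-1 loss. Then the risk is simply the probability of misclassification:

$$L(y, h(X)) = \begin{cases} 0 & \text{if } h(X) = y \\ 1 & \text{otherwise} \end{cases} \quad \Longrightarrow \quad R(h) = E[L(y, h(X))] = \mathrm{Prob}[h(X) \neq y]$$

This loss is bounded by 1, so the expectation exists, in line with the boundedness assumption.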
However, we cannot directly minimize $R$, since the distribution $P_{xy}$ is unknown.
The empirical risk function, $\hat{R}_n$, is defined by
$$\hat{R}_n(h) = \frac{1}{n} \sum_{i=1}^{n} L(y_i, h(X_i))$$
Let
$$\hat{h}^*_n = \arg\min_{h \in \mathcal{H}} \hat{R}_n(h)$$
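The empirical risk and its minimizer can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the 0-1 loss, a finite family of threshold classifiers on the real line, and a toy data distribution are all hypothetical choices made here, and none of the function names come from the course.

```python
import random

def zero_one_loss(y, a):
    """0-1 loss (assumed for illustration): 0 if action a equals label y, else 1."""
    return 0.0 if y == a else 1.0

def empirical_risk(h, sample, loss=zero_one_loss):
    """Average loss of classifier h over the sample [(x_1, y_1), ..., (x_n, y_n)]."""
    return sum(loss(y, h(x)) for x, y in sample) / len(sample)

def make_threshold(t):
    """Hypothetical classifier family: h_t(x) = 1 if x > t, else 0."""
    return lambda x: 1 if x > t else 0

def erm(thresholds, sample):
    """Return the threshold whose classifier has the smallest empirical risk."""
    return min(thresholds, key=lambda t: empirical_risk(make_threshold(t), sample))

def draw_sample(n, rng, true_t=0.5, noise=0.1):
    """Assumed toy distribution: x uniform on [0, 1], label 1{x > true_t},
    flipped independently with probability `noise`."""
    sample = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > true_t else 0
        if rng.random() < noise:
            y = 1 - y
        sample.append((x, y))
    return sample

rng = random.Random(0)
thresholds = [i / 20 for i in range(21)]   # finite family of 21 classifiers
sample = draw_sample(500, rng)
t_hat = erm(thresholds, sample)
print("ERM threshold:", t_hat)
print("its empirical risk:", empirical_risk(make_threshold(t_hat), sample))
```

Restricting the search to a finite grid of thresholds keeps the arg min a plain exhaustive search; for richer families the minimization itself becomes the hard part.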
We would like the algorithm to satisfy: $\forall \epsilon, \delta > 0$, $\exists N < \infty$, such that
$$\mathrm{Prob}\left[ \left| R(\hat{h}^*_n) - R(h^*) \right| > \epsilon \right] \le \delta, \quad \forall n \ge N$$
We would also like to (approximately) know the true risk of the learnt classifier and hence would like to have
$$\mathrm{Prob}\left[ \left| \hat{R}_n(\hat{h}^*_n) - R(h^*) \right| > \epsilon \right] \le \delta, \quad \forall n \ge N$$
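The first requirement says that the probability of the ERM classifier having true risk more than $\epsilon$ away from the minimum risk should become small for large $n$. The sketch below estimates that probability by simulation. Everything in it is an assumption made for illustration: the 0-1 loss, a family of 11 threshold classifiers, and a toy distribution (x uniform on [0, 1], labels flipped with probability 0.1) chosen because the true risk then has a simple closed form.

```python
import random

NOISE = 0.1    # assumed label-flip probability
TRUE_T = 0.5   # assumed location of the best threshold

def true_risk(t, noise=NOISE, true_t=TRUE_T):
    """Closed-form 0-1 risk of h_t(x) = 1{x > t} under the toy distribution:
    h_t disagrees with the clean label on an interval of length |t - true_t|."""
    return noise + (1 - 2 * noise) * abs(t - true_t)

def draw_sample(n, rng, noise=NOISE, true_t=TRUE_T):
    """n iid pairs (x, y): x uniform on [0, 1], y = 1{x > true_t} with random flips."""
    sample = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > true_t else 0
        if rng.random() < noise:
            y = 1 - y
        sample.append((x, y))
    return sample

def empirical_risk(t, sample):
    """Fraction of sample points misclassified by the threshold classifier h_t."""
    errors = sum(1 for x, y in sample if (1 if x > t else 0) != y)
    return errors / len(sample)

def erm(thresholds, sample):
    """Threshold with the smallest empirical risk on the sample."""
    return min(thresholds, key=lambda t: empirical_risk(t, sample))

def failure_rate(n, eps, thresholds, trials, rng):
    """Fraction of trials in which |R(h_hat) - R(h*)| exceeds eps."""
    best = min(true_risk(t) for t in thresholds)
    failures = 0
    for _ in range(trials):
        t_hat = erm(thresholds, draw_sample(n, rng))
        if abs(true_risk(t_hat) - best) > eps:
            failures += 1
    return failures / trials

rng = random.Random(0)
thresholds = [i / 10 for i in range(11)]
for n in (10, 100, 1000):
    rate = failure_rate(n, eps=0.05, thresholds=thresholds, trials=100, rng=rng)
    print(f"n = {n:5d}: estimated failure probability = {rate:.2f}")
```

The printed fractions are Monte Carlo estimates of the probability in the first requirement for one fixed $\epsilon$; a simulation like this can suggest that the condition holds for a particular family but cannot prove it.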