












Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
An overview of ensemble methods, specifically focusing on bagging and boosting techniques. Bagging, or bootstrap aggregating, is a randomized algorithm based on bootstrapping that reduces variance by creating multiple models from different subsets of the data. Boosting, on the other hand, combines weak classifiers to create a strong one, with the weights of the data points adjusted in each iteration to focus on misclassified instances. The document also discusses feature selection, which aims to reduce the number of features by selecting the most relevant ones for modeling. This process can help improve model performance and reduce computational costs.
Typology: Study notes
1 / 20
This page cannot be seen from the preview
Don't miss anything!













-^ Bagging:
a^ randomized
algorithm
based
on^ bootstrapping
-^ What is
bootstrapping What
is^ bootstrapping
-^ Concept
of^ Bias
vs^ Variance
-^ Variance
reduction
-^ What
learning
algorithms
will^ be
good^
for^ bagging?
-^ Boosting:^ –
Combine
weak classifiers
(i e^
slightly better than random)
Combine
weak
classifiers
(i.e.,^
slightly
better
than
random)
-^ Training
using
the^ same
data^
set^ but
different
weights
-^ How
to^ update
weights?
-^ Are
all^ classifiers
equally
weighted?
-^ How
to^ incorporate
weights
in^ learning
(DT,^ KNN,
Naïve
Bayes)
-^ One
explanation
for^ not
overfitting:
maximizing
the^ margin
-^ Which
is^ more
sensitive
to^ outliers:
Boosting
or^ bagging?
Feature SelectionFeature
Selection
What
is^
feature
selection?
Task:^ classify
whether
a^ document
is^ about
catsData: word counts in the document
Task:^ predict
chances
of^ lung
disease
Data:^
medical
history
survey
cat^
2
Vegetarian
No
Data:^
word^
counts
in^ the
document X^
X
and^
35 it^
20 kitten^
8
t^2
Plays videogames
Yes Family history
No
Reduced
X^
Reduced
X
electric
2 trouble
4 then^
5
cat^
2 kitten^
8 feline^
2
Athletic
No Smoker
Yes Sex^
Male
Familyhistory
No Smoker
Yes
several
9 feline^
2 while^
4
Lung capacity
5.8L Hair color
Red Car^
Audi
… lemon
2
… Weight
185 lbs
Why feature selection?Why
feature
selection?
-^ Motivation:
try^
to^ find
a^ simple
model
y^
p
-^ Occam’s
razor:
the^
simplest
explanation
that
accounts
for
the^ data
is^ the
best
Wh^
t j^
t^
l^ ifi
(l
i^
l^
ith^
) th t
-^ Wh
y^ not
just
use
classifiers
(learning
algorithms)
that
are^
not^
(or^ less)
sensitive
to^ irrelevant
features?
-^ Even such methods have trouble when faced with largeEven
such
methods
have
trouble
when
faced
with
large
number
of^ irrelevant
features
-^ They
are^
more
prone
to^ overfitting
-^ This
is^ because
with
large
number
of^ irrelevant
features,
it
is^ more
likely
to^ have
some
chance
structure
in^ the
data
that^
the^ learning
algorithm
try^ to
learn,
thus
overfit
FilteringFiltering
-^ Basic
-^ Intuition:
i if x =^ yf i^ for
all^ i,
then
f^ is^
good
no
h^
l^
ifi^
i
matter
what
our
classifier
is
-^ Many
popular
scores
including
one
we^
already
k^
f know
of:
-^ Information
gain
-^ Th
FilteringFiltering
-^ Advantages:
f^ t
-^ Very
fast
-^ Simple
to^ apply
-^ Di
take
into
account
which
learning
algorithm
will be usedwill^
be^ used
-^ Doesn’t
take
into
account
interactions
between
features^ •^ Two
features
may
each
look
bad,
but^
jointly
predict
class
well
-^ S
Wrapper ApproachWrapper
Approach
All features
MultipleFeaturesubsets
Evaluation
search
subsets
Why
include
the
learning
algorithm
in^
the
l^
? loop?
+^
+^
+^ ‐
x^2
+^
x^4
+^
+^
+^ ‐ ‐^
‐^
‐^ ‐
+^
+^
‐ ‐‐
x^1
‐^
‐^ ‐
x^3
Different
learning
algorithms
may^ work
well^ with
different
feature
subsets
Exhaustive search is expensiveExhaustive
search
is^
expensive Empty^ setp y
Kohavi
‐John,^
1997
Full set
N features 2
N^ possible feature subsets!
Full^ set
N^ features
possible
feature
subsets!
We^
need
something
faster!
No^ improvementStop!Stop!
Backward
elimination Initialize
s={all
features}
Do:Delete
feature
from
s
which
improves
Score(s)
most
While
score(s)
can^
be^ improved
Comparisons
-^ Which
of^ these
two
methods
do^
you
expect
to^ be
faster:^ –^ Forward
selection
-^ Which
of^ them
do^
you
expect
to^ perform
better:
-^ Backward
elimination
-^ Better
at^ finding
interacting
features
B t f
tl^ t^
i^ t
fit th
l^
d l^
t th
-^ But
frequently
too^
expensive
to^ fit
the^
large
models
at^ th
e
beginning
of^ search
-^ Both can be too greedyBoth
can
be^
too^
greedy
-^ Why? –^ How
to^ improve?
p
Feature selection summaryFeature
selection
summary
-^ Filter
approaches
-^ consider
one
feature
at^ a
time
-^ Wrapper
approaches
through
feature
subsets
and
include
learning
algorithm
in^ selection
process
-^ Wrapper is more powerful but much more expensive•^ Wrapper
is^ more
powerful
but^
much
more
expensive
-^ Filter
can^
be^ good
for^ initial
screening
-^ Wed Nov 5th -^ midterm (cover contents up to today)
Wed
Nov
5th
midterm
(cover
contents
up^
to^ today)
-^ Unsupervised
learning
and
pattern
discovery
starting
Friday
th 31
weeks
g^
y^
-^ Reinforcement
learning
3 weeks
-^ We will have a guest lecture on automatic speechWe
will
have
a^ guest
lecture
on^
automatic
speech
recognition
on^
th 17 of^ Nov