Adaboost Algorithm: Building Strong Classifiers from Weak Ones, Exercises of Artificial Intelligence

The Adaboost algorithm is a machine learning technique used to create strong classifiers from a series of weak classifiers. In this document, we explore the Adaboost method, which was developed by Freund and Schapire, and learn how to calculate error rates, determine weights, and reweight samples to improve classification accuracy. It is aimed at students studying machine learning, data mining, or artificial intelligence.

Typology: Exercises

2011/2012

Uploaded on 07/31/2012

6.034f Boosting Notes
Patrick Winston and Luis Ortiz
Draft of November 17, 2009
A strong classifier is one that has an error rate close to zero. A weak classifier is one that has
an error rate just below $\frac{1}{2}$, producing answers just a little better than a coin flip.
Freund and Schapire discovered that you can construct a strong classifier from weak
classifiers such that the strong classifier will correctly classify all samples in a sample set.
Better still, the strong classifier consists of a sequence of weighted classifiers, determined step
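by step. In the following, the multiplicative weight found in the first step is $\alpha^1$ and the weak
classifier is $h^1(x_i)$, and in step $s$, the weight is $\alpha^s$ and the classifier is $h^s(x_i)$:
$$H(x_i) = \operatorname{sign}\left(\alpha^1 h^1(x_i) + \alpha^2 h^2(x_i) + \cdots + \alpha^s h^s(x_i) + \cdots\right)$$
where
$$\operatorname{sign} = \begin{cases} +1 & \text{for positive arguments} \\ -1 & \text{for negative arguments} \end{cases}$$
$$h^s(x_i) = \begin{cases} +1 & \text{for samples the classifier thinks belong to the class} \\ -1 & \text{for samples the classifier thinks do not belong to the class} \end{cases}$$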
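The weighted vote above can be sketched in Python as follows. This is a minimal illustration, not code from the notes: the function names, the three threshold tests, and the multiplier values are all made up, and treating a zero sum as -1 is an assumption, since the notes define sign only for positive and negative arguments.

```python
def sign(value):
    """Return +1 for positive arguments and -1 for negative arguments."""
    # Assumption: a sum of exactly zero is mapped to -1.
    return 1 if value > 0 else -1


def strong_classify(alphas, weak_classifiers, x):
    """Evaluate H(x) as the sign of the alpha-weighted sum of weak outputs."""
    total = sum(alpha * h(x) for alpha, h in zip(alphas, weak_classifiers))
    return sign(total)


# Hypothetical weak classifiers: threshold tests on a single number.
stumps = [
    lambda x: 1 if x > 2 else -1,
    lambda x: 1 if x > 6 else -1,
    lambda x: 1 if x < 8 else -1,
]
example_alphas = [0.5, 0.3, 0.4]  # made-up multipliers, for illustration only

print(strong_classify(example_alphas, stumps, 7))
print(strong_classify(example_alphas, stumps, 1))
```

Each weak classifier casts a vote of +1 or -1, and the multipliers decide how much each vote counts.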
Freund and Schapire named their method Adaboost, an acronym for adaptive boosting.
In the first Adaboost step, you find the weak classifier, $h^1(x_i)$, that produces the lowest
error rate; then, you find the corresponding multiplier, $\alpha^1$. In step $s$, you first find the weak
classifier, $h^s(x_i)$, that produces the lowest error rate with the samples reweighted to emphasize
previously misclassified samples; then you find the corresponding multiplier, $\alpha^s$. You continue
taking steps until the classifier $H(x_i)$ correctly classifies all samples or you cannot find any
weak classifier for the next step.
Several questions emerge:
  • How do you compute the error rate†, $E$, for the candidates for $h^s(x_i)$?
  • How do you compute $\alpha^s$ once you have $h^s(x_i)$?
  • How do you reweight the samples to emphasize the misclassified samples for the next step?
To compute the error rate, you assign to each sample, $i$, at each step, $s$, an emphasis-determining
weight, $w_i^s$. The weights used in step 1 are all the same:
$$w_i^1 = \frac{1}{\text{number of samples}}$$
Each time you calculate the weights for the next step, you normalize so that the weights
still add to 1:
$$\sum_i w_i^s = 1$$
†We use $E$ for the error rate to avoid confusion with the base of the natural logarithms, $e$.
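The uniform starting weights and the normalization step described above might look like this in Python. It is only a sketch: the helper names `uniform_weights` and `normalize` and the sample numbers are hypothetical, chosen for illustration.

```python
def uniform_weights(number_of_samples):
    """Step-1 weights: every sample gets 1 / (number of samples)."""
    return [1.0 / number_of_samples] * number_of_samples


def normalize(weights):
    """Rescale the weights so that they add to 1."""
    total = sum(weights)
    return [w / total for w in weights]


start = uniform_weights(4)
rescaled = normalize([1.0, 1.0, 2.0])  # made-up unnormalized weights

print(start)
print(rescaled)
```

Dividing by the total is all the normalization requires; the relative emphasis among samples is unchanged.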

The error rate of a candidate classifier for a particular step is the sum of the weights for the samples that the candidate classifier misclassifies at that step:

$$E^s_{\text{candidate}} = \sum_{i\ \text{misclassified by the candidate at step}\ s} w_i^s$$

$E^s$ is the error rate for the best of the candidate classifiers. But what about computing $\alpha^s$ and reweighting the samples? With some moderately complex mathematics, Freund and Schapire determined that computing new weights from old weights using the following formula ensures that the overall error for $H(x_i)$, as you add classifiers, will stay under an exponential bound, and eventually go to zero. $N^s$ is a normalizing constant† for step $s$ that ensures that all the new weights, $w_i^{s+1}$, add up to 1.

$$w_i^{s+1} = \begin{cases} \dfrac{w_i^s}{N^s}\, e^{-\alpha^s} & \text{for correctly classified samples} \\[1ex] \dfrac{w_i^s}{N^s}\, e^{\alpha^s} & \text{for misclassified samples} \end{cases}$$

With more math, Freund and Schapire determined that the exponential bound on the overall error of $H(x_i)$ is minimized if $N^s$ is minimized. This led them to a formula for $\alpha^s$:

$$\alpha^s = \frac{1}{2} \ln \frac{1 - E^s}{E^s}$$

At this point, you have all you need to write an Adaboost program:

  • You use uniform weights to start.
  • For each step, you find the classifier that yields the lowest error rate for the current weights, $w_i^s$.
  • You use that best classifier, $h^s(x_i)$, to compute the error rate associated with the step, $E^s$.
  • You determine the alpha for the step, $\alpha^s$, from the error for the step, $E^s$.
  • With the alpha in hand, you compute the weights for the next step, $w_i^{s+1}$, from the weights for the current step, $w_i^s$, taking care to include a normalizing factor, $N^s$, so that the new weights add up to 1.
  • You stop successfully when $H(x_i)$ correctly classifies all the samples, $x_i$; you stop unsuccessfully if you reach a point where there is no weak classifier, one with an error rate $< \frac{1}{2}$.

You, however, are not a computer, so calculating those exponentials and logarithms is impractical on an examination. You need to massage the formulas a bit to make them work for you. First, you plug the formula for $\alpha^s$ into the reweighting formula, producing the following:

$$w_i^{s+1} = \begin{cases} \dfrac{w_i^s}{N^s} \sqrt{\dfrac{E^s}{1 - E^s}} & \text{for correctly classified samples} \\[1ex] \dfrac{w_i^s}{N^s} \sqrt{\dfrac{1 - E^s}{E^s}} & \text{for misclassified samples} \end{cases}$$

Now, because $N^s$ must be that number that makes the new weights add up to 1, you can write the following:

†We use $N^s$ rather than $Z^s$, used by Freund and Schapire, to avoid confusion with the number 2 when written by hand.
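As a rough check on the bulleted procedure above, here is a minimal Python sketch of the whole loop. The function names, the ten-point data set, and the threshold-test candidates are illustrative assumptions, not taken from the notes. The `max_steps` cap, the handling of a candidate with zero error (where the formula for the multiplier is undefined), and mapping a zero sum to -1 are likewise assumptions.

```python
import math


def weighted_error(h, samples, labels, weights):
    """Error rate E: total weight of the samples that h misclassifies."""
    return sum(w for x, y, w in zip(samples, labels, weights) if h(x) != y)


def strong_output(alphas, chosen, x):
    """H(x): sign of the alpha-weighted vote (a zero sum maps to -1)."""
    return 1 if sum(a * h(x) for a, h in zip(alphas, chosen)) > 0 else -1


def adaboost(samples, labels, candidates, max_steps=20):
    n = len(samples)
    weights = [1.0 / n] * n  # uniform weights to start
    alphas, chosen = [], []
    for _ in range(max_steps):  # safety cap, an assumption
        # Best candidate for the current weights, and its error rate.
        best = min(candidates,
                   key=lambda h: weighted_error(h, samples, labels, weights))
        error = weighted_error(best, samples, labels, weights)
        if error >= 0.5:
            break  # unsuccessful stop: no weak classifier is left
        if error == 0:
            # Assumption: a perfect candidate is used on its own, because
            # the alpha formula divides by the error rate.
            return [1.0], [best], weights
        alpha = 0.5 * math.log((1 - error) / error)
        alphas.append(alpha)
        chosen.append(best)
        # Reweight: shrink correctly classified samples, grow the others.
        raw = [w * math.exp(-alpha if best(x) == y else alpha)
               for x, y, w in zip(samples, labels, weights)]
        norm = sum(raw)  # plays the role of the normalizing constant
        weights = [w / norm for w in raw]
        if all(strong_output(alphas, chosen, x) == y
               for x, y in zip(samples, labels)):
            break  # successful stop: H classifies every sample
    return alphas, chosen, weights


# Hypothetical data: ten points on a line with labels no single test can fit.
samples = list(range(10))
labels = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

# Hypothetical candidates: "less than" and "greater than" threshold tests.
candidates = []
for t in [v + 0.5 for v in range(10)]:
    candidates.append(lambda x, t=t: 1 if x < t else -1)
    candidates.append(lambda x, t=t: 1 if x > t else -1)

alphas, chosen, weights = adaboost(samples, labels, candidates)
print(len(alphas), [round(a, 4) for a in alphas])
```

The normalizing constant is computed here simply as the sum of the unnormalized weights, which is what "makes the new weights add up to 1" amounts to in code.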