

The Adaboost algorithm is a machine learning technique for building a strong classifier from a series of weak classifiers. This document explores the Adaboost method, developed by Freund and Schapire, and shows how to calculate error rates, determine weights, and reweight samples to improve classification accuracy.


A strong classifier is one that has an error rate close to zero. A weak classifier is one that has an error rate just below $\frac{1}{2}$, producing answers just a little better than a coin flip. Freund and Schapire discovered that you can construct a strong classifier from weak classifiers such that the strong classifier will correctly classify all samples in a sample set. Better still, the strong classifier consists of a sequence of weighted classifiers, determined step by step. In the following, the multiplicative weight found in the first step is $\alpha^1$ and the weak classifier is $h^1(x_i)$, and in step $s$, the weight is $\alpha^s$ and the classifier is $h^s(x_i)$:

$$H(x_i) = \operatorname{sign}\left(\alpha^1 h^1(x_i) + \alpha^2 h^2(x_i) + \cdots + \alpha^s h^s(x_i) + \cdots\right)$$

where

$$\operatorname{sign}(z) = \begin{cases} +1 & \text{for positive arguments} \\ -1 & \text{for negative arguments} \end{cases}$$

$$h^s(x_i) = \begin{cases} +1 & \text{for samples the classifier thinks belong to the class} \\ -1 & \text{for samples the classifier thinks do not belong to the class} \end{cases}$$
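As a minimal sketch of the weighted vote defined above, the following Python snippet evaluates $H(x)$ for a single sample. The names `sign`, `strong_classify`, `stumps`, and `alphas`, the one-dimensional threshold classifiers, and the multiplier values are all hypothetical, chosen only for illustration. The text does not say what sign should return for an argument of exactly zero, so the sketch arbitrarily returns -1 in that case.

```python
from typing import Callable, Sequence

# A weak classifier maps a sample to +1 (in the class) or -1 (not in the class).
WeakClassifier = Callable[[float], int]


def sign(value: float) -> int:
    """Return +1 for positive arguments and -1 for negative arguments."""
    # Assumption: the text leaves sign(0) undefined; this sketch returns -1 there.
    return 1 if value > 0 else -1


def strong_classify(x: float,
                    alphas: Sequence[float],
                    classifiers: Sequence[WeakClassifier]) -> int:
    """Compute H(x) = sign(alpha^1 h^1(x) + alpha^2 h^2(x) + ...)."""
    total = sum(alpha * h(x) for alpha, h in zip(alphas, classifiers))
    return sign(total)


# Hypothetical weak classifiers: simple thresholds on a one-dimensional sample.
stumps = [
    lambda x: 1 if x > 2 else -1,
    lambda x: 1 if x < 6 else -1,
    lambda x: 1 if x > 4 else -1,
]
# Hypothetical multipliers, not computed from any data.
alphas = [0.5, 0.25, 0.75]
```

For the sample `7`, the second threshold votes -1 but is outvoted by the other two, whose multipliers add up to more than its own.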
Freund and Schapire named their method Adaboost, an acronym for adaptive boosting. In the first Adaboost step, you find the weak classifier, $h^1(x_i)$, that produces the lowest error rate; then, you find the corresponding multiplier, $\alpha^1$. In step $s$, you first find the weak classifier, $h^s(x_i)$, that produces the lowest error rate with the samples reweighted to emphasize previously misclassified samples; then you find the corresponding multiplier, $\alpha^s$. You continue taking steps until the classifier $H(x_i)$ correctly classifies all samples or you cannot find any weak classifier for the next step. Several questions emerge:
How do you compute the error rate,† $E$, for the candidates for $h^s(x_i)$?
To compute the error rate, you assign to each sample, $i$, at each step, $s$, an emphasis-determining weight, $w_i^s$. The weights used in step 1 are all the same:

$$w_i^1 = \frac{1}{\text{number of samples}}$$

Each time you calculate the weights for the next step, you normalize so that the weights still add to 1:

$$\sum_i w_i^s = 1$$
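A small sketch of these two rules in Python, assuming the weights are kept in a plain list; `initial_weights` and `normalize` are hypothetical helper names, not part of the original notes.

```python
import math


def initial_weights(number_of_samples: int) -> list[float]:
    """Step-1 weights: every sample gets 1 / number of samples."""
    if number_of_samples <= 0:
        raise ValueError("need at least one sample")
    return [1.0 / number_of_samples] * number_of_samples


def normalize(weights: list[float]) -> list[float]:
    """Rescale the weights so that they add up to 1."""
    total = sum(weights)
    return [w / total for w in weights]


step_one = initial_weights(4)
rescaled = normalize([1.0, 1.0, 2.0])
```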
†We use E for the error rate to avoid confusion with the base of the natural logarithms, e.
The error rate of a candidate classifier for a particular step is the sum of the weights for the samples that the candidate classifier misclassifies at that step:
$$E^s_{\text{candidate}} = \sum_i w_i^s \quad \text{for } i \text{ misclassified by the candidate at step } s$$

$E^s$ is the error rate for the best of the candidate classifiers. But what about computing $\alpha^s$ and reweighting the samples? With some moderately complex mathematics, Freund and Schapire determined that computing new weights from old weights using the following formula ensures that the overall error for $H(x_i)$, as you add classifiers, will stay under an exponential bound, and eventually go to zero. $N^s$ is a normalizing constant† for step $s$ that ensures that all the new weights, $w_i^{s+1}$, add up to 1.

$$w_i^{s+1} = \begin{cases} \dfrac{w_i^s}{N^s}\, e^{-\alpha^s} & \text{for correctly classified samples} \\[2ex] \dfrac{w_i^s}{N^s}\, e^{+\alpha^s} & \text{for misclassified samples} \end{cases}$$

With more math, Freund and Schapire determined that the exponential bound on the overall error of $H(x_i)$ is minimized if $N^s$ is minimized. This led them to a formula for $\alpha^s$:

$$\alpha^s = \frac{1}{2} \ln \frac{1 - E^s}{E^s}$$

At this point, you have all you need to write an Adaboost program:
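Find the weak classifier that produces the lowest error rate with the current weights, $w_i^s$.
Compute the weights for the next step, $w_i^{s+1}$, from the weights for the current step, $w_i^s$, taking care to include a normalizing factor, $N^s$, so that the new weights add up to 1.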
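The error rate, multiplier, and reweighting formulas above can be put together into a short program. The following Python is a sketch under stated assumptions, not the authors' implementation: the function names, the `max_steps` cap, the handling of a candidate with zero error (where the formula for $\alpha^s$ would be infinite), and the three-sample toy data set are all hypothetical.

```python
import math


def weighted_error(h, samples, labels, weights):
    """Sum the weights of the samples that classifier h misclassifies."""
    return sum(w for x, y, w in zip(samples, labels, weights) if h(x) != y)


def classify(x, alphas, classifiers):
    """H(x): sign of the alpha-weighted vote (sign(0) is treated as -1 here)."""
    total = sum(a * h(x) for a, h in zip(alphas, classifiers))
    return 1 if total > 0 else -1


def adaboost(samples, labels, candidates, max_steps=50):
    """Return (alphas, classifiers) chosen step by step."""
    n = len(samples)
    weights = [1.0 / n] * n  # step 1: all weights are the same
    alphas, chosen = [], []
    for _ in range(max_steps):  # max_steps is a safety cap added for this sketch
        # Pick the candidate with the lowest weighted error rate.
        best = min(candidates,
                   key=lambda h: weighted_error(h, samples, labels, weights))
        error = weighted_error(best, samples, labels, weights)
        if error >= 0.5:
            break  # no candidate does better than a coin flip
        if error == 0:
            # Assumption: a perfect candidate would make alpha infinite,
            # so this sketch simply uses that candidate on its own.
            return [1.0], [best]
        alpha = 0.5 * math.log((1 - error) / error)
        alphas.append(alpha)
        chosen.append(best)
        # Shrink weights of correct samples, grow weights of misclassified ones.
        scaled = [w * math.exp(-alpha if best(x) == y else alpha)
                  for x, y, w in zip(samples, labels, weights)]
        norm = sum(scaled)  # the normalizing constant N^s
        weights = [w / norm for w in scaled]
        if all(classify(x, alphas, chosen) == y
               for x, y in zip(samples, labels)):
            break  # H classifies every sample correctly
    return alphas, chosen


# Hypothetical toy data: no single candidate gets all three samples right.
samples = [1, 2, 3]
labels = [1, -1, 1]
candidates = [
    lambda x: 1 if x < 1.5 else -1,
    lambda x: 1 if x > 2.5 else -1,
    lambda x: 1,
]
alphas, chosen = adaboost(samples, labels, candidates)
```

On this toy set each candidate misclassifies exactly one sample, and the loop needs all three of them before the weighted vote classifies every sample correctly.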
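Substituting the formula for $\alpha^s$ into the reweighting formula, $e^{-\alpha^s} = \sqrt{E^s / (1 - E^s)}$ and $e^{+\alpha^s} = \sqrt{(1 - E^s) / E^s}$, so the new weights are:

$$w_i^{s+1} = \begin{cases} \dfrac{w_i^s}{N^s} \sqrt{\dfrac{E^s}{1 - E^s}} & \text{for correctly classified samples} \\[2ex] \dfrac{w_i^s}{N^s} \sqrt{\dfrac{1 - E^s}{E^s}} & \text{for misclassified samples} \end{cases}$$

Now, because $N^s$ must be that number that makes the new weights add up to 1, you can write the following: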
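Requiring the new weights to sum to 1, and using the fact that the misclassified weights sum to $E^s$ while the correctly classified weights sum to $1 - E^s$, gives $N^s = 2\sqrt{E^s(1 - E^s)}$. That closed form is worked out here from the formulas above rather than quoted, so the following Python is offered only as a numeric sanity check; `reweight` is a hypothetical helper name and the weights and outcomes are made-up values.

```python
import math


def reweight(weights, correct, error_rate):
    """Apply the square-root form of the update; return (new_weights, N)."""
    down = math.sqrt(error_rate / (1 - error_rate))  # factor for correct samples
    up = math.sqrt((1 - error_rate) / error_rate)    # factor for misclassified
    scaled = [w * (down if ok else up) for w, ok in zip(weights, correct)]
    n_s = sum(scaled)  # the number that makes the new weights add up to 1
    return [w / n_s for w in scaled], n_s


# Made-up current weights (they add up to 1) and classification outcomes.
weights = [0.1, 0.2, 0.3, 0.4]
correct = [True, True, False, True]
error_rate = sum(w for w, ok in zip(weights, correct) if not ok)

new_weights, n_s = reweight(weights, correct, error_rate)
closed_form = 2 * math.sqrt(error_rate * (1 - error_rate))
misclassified_total = sum(w for w, ok in zip(new_weights, correct) if not ok)
```

One consequence worth noticing in the check: after reweighting, the misclassified samples carry half of the total weight, so the classifier just chosen looks no better than a coin flip under the new weights.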
†We use $N^s$ rather than $Z^s$, used by Freund and Schapire, to avoid confusion with the number 2 when written by hand.