Machine Learning Assignment 5: Implementing AdaBoost Algorithm using Decision Stumps | Assignments Computer Science

CSCI 5622, Sec 001 Professor Mozer

Machine Learning Spring 2001

Assignment 5

Assigned: Thu Mar 22, 2001

Due: Thu April 5, 2001

In this assignment, you will implement an ensemble technique known as AdaBoost. The AdaBoost algorithm is described in the Schapire paper I’m handing out, and more information on AdaBoost can be found on his web page (http://www.research.att.com/~schapire/boost.html).

Although you can build ensembles composed of any type of machine learning model, I want you to focus on a particularly simple model, a decision stump, which is simply a decision tree with a single level of branching. A single decision stump is a weak learner (it does not perform particularly well), but an ensemble of decision stumps can perform as well as or better than a full-blown decision tree.

Code organization

One approach to this problem is to break it into three separate programs: one that creates decision stumps, one that produces weightings for AdaBoost, and one that produces predictions and accuracy estimates for the test set.

The decision stump code should (1) create a decision stump based on a training set and a file containing weightings; (2) generate an output file containing one line per training example, including its target classification, its classification by the decision stump, and the current weighting of the example; and (3) generate an output file containing one line per test example, including the target classification and the stump’s classification.
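As an illustration of item (2), here is a minimal sketch of writing and reading one line of the training-set output file. The whitespace-separated layout and the function names format_training_line and parse_training_line are assumptions made for this example; the assignment does not prescribe a file format, and this layout only works if class labels contain no spaces.

```python
def format_training_line(target, predicted, weight):
    """Build one output line: target label, stump label, example weight.

    Assumed layout (not specified by the assignment): three fields
    separated by single spaces. repr() keeps the full float precision.
    """
    return f"{target} {predicted} {weight!r}"


def parse_training_line(line):
    """Invert format_training_line: return (target, predicted, weight)."""
    target, predicted, weight = line.split()
    return target, predicted, float(weight)
```

The test-set file from item (3) could use the same layout with the weight field dropped.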

The boosting code should read in the output file related to the training set, compute the weightings for the next iteration, and output them to a file for use by the decision stump code.
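The reweighting step can be sketched as follows, assuming the standard two-class AdaBoost update: the stump's weighted error epsilon gives a vote weight alpha = 0.5 * ln((1 - epsilon) / epsilon), misclassified examples are scaled up by exp(alpha), correctly classified ones are scaled down by exp(-alpha), and the result is renormalized. The function name update_weights and the choice to raise an error in the degenerate cases are assumptions for this example; check the details against the Schapire paper.

```python
import math


def update_weights(targets, predictions, weights):
    """One AdaBoost reweighting step (two-class formulation, assumed here).

    Returns (alpha, new_weights), where alpha is the stump's vote weight
    and new_weights sums to 1.
    """
    total = sum(weights)
    weights = [w / total for w in weights]  # make sure weights sum to 1

    # Weighted error of the current stump on the training set.
    epsilon = sum(w for t, p, w in zip(targets, predictions, weights) if t != p)
    if epsilon <= 0.0 or epsilon >= 0.5:
        # How to handle a perfect or worse-than-chance stump is a design
        # decision; this sketch simply refuses to continue.
        raise ValueError(f"weighted error {epsilon} is outside (0, 0.5)")

    alpha = 0.5 * math.log((1.0 - epsilon) / epsilon)

    # Increase the weight of mistakes, decrease the weight of correct answers.
    scaled = [
        w * math.exp(alpha if t != p else -alpha)
        for t, p, w in zip(targets, predictions, weights)
    ]
    norm = sum(scaled)
    return alpha, [w / norm for w in scaled]
```

A useful sanity check on this update: after renormalizing, the examples the stump got wrong carry exactly half of the total weight.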

The prediction code should read in all of the test set files and combine the predictions from the individual stumps to produce final predictions, which can be compared to the target classification to produce an error rate.
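One way to combine the stumps is a weighted vote, sketched below. It assumes each stump comes with a vote weight alpha (as in standard AdaBoost) that the prediction code can get at somehow; the assignment does not say where that value is stored. The names combine_predictions and error_rate are made up for this example.

```python
from collections import defaultdict


def combine_predictions(stump_predictions, alphas):
    """Weighted vote over stumps.

    stump_predictions[k][i] is stump k's label for test example i;
    alphas[k] is stump k's vote weight (assumed to be available).
    """
    n_examples = len(stump_predictions[0])
    combined = []
    for i in range(n_examples):
        votes = defaultdict(float)
        for preds, alpha in zip(stump_predictions, alphas):
            votes[preds[i]] += alpha
        # The label with the largest total vote weight wins.
        combined.append(max(votes, key=votes.get))
    return combined


def error_rate(targets, predictions):
    """Fraction of examples whose prediction differs from the target."""
    wrong = sum(1 for t, p in zip(targets, predictions) if t != p)
    return wrong / len(targets)
```

For example, with three stumps predicting ["a", "b"], ["b", "b"], and ["a", "a"] on two test examples and vote weights 1.0, 0.4, and 0.5, the combined prediction is "a" for the first example (1.5 against 0.4) and "b" for the second (1.4 against 0.5).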

Decision stump

You will first have to write a program that creates a decision stump based on some training data, and then classifies a test data set. You should be able to modify your decision tree program to implement the decision stump, or you can write a new program re-using routines from your decision tree software. Here are the important differences between the decision tree and the decision stump you will need for this assignment:

• The decision stump has only one level of branching. Thus, it is a decision tree with max_depth = 1. You can either remove the recursion from your decision tree software, or leave the software unchanged and force max_depth = 1.

• The data set you used for the decision tree included attributes that all had the values “y” or “n”. In this assignment each attribute has a different set of values, and some attributes have more than two values. (Sorry, I tried to find another data set with binary-valued attributes, and there just weren’t any interesting ones.)

• You may need to handle the case where decisions are based on an attribute dimension which has values in the test set that weren’t contained in the training set. In this case, you should classify the example according to the majority value in the root node.

• In AdaBoost, classifiers are created based on a set of weighted training examples. Thus, your code for training the stump should read in a set of weights associated with the training examples. (If you put the weights in the same order as the training examples in the data file, you don’t need to figure out the cor-
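The differences listed above can be pulled together in a minimal sketch of a weighted stump. Several things here are assumptions rather than requirements: examples are tuples of attribute values, the split attribute is the one with the lowest weighted misclassification error (your decision tree code may use information gain instead), the "majority value in the root node" is taken to be the weighted majority, and the names train_stump and classify are made up for this example.

```python
from collections import defaultdict


def weighted_majority(labels, weights):
    """Label with the largest total weight."""
    totals = defaultdict(float)
    for label, w in zip(labels, weights):
        totals[label] += w
    return max(totals, key=totals.get)


def train_stump(examples, labels, weights):
    """Fit a one-level tree on weighted examples (tuples of attribute values)."""
    root_label = weighted_majority(labels, weights)
    best = None
    for attr in range(len(examples[0])):
        # Total weight of each label within each value of this attribute;
        # attributes may have any number of distinct values.
        by_value = defaultdict(lambda: defaultdict(float))
        for x, y, w in zip(examples, labels, weights):
            by_value[x[attr]][y] += w
        leaves = {v: max(t, key=t.get) for v, t in by_value.items()}
        # Weighted error if we split on this attribute (assumed criterion).
        error = sum(
            w for x, y, w in zip(examples, labels, weights)
            if leaves[x[attr]] != y
        )
        if best is None or error < best[0]:
            best = (error, attr, leaves)
    _, attr, leaves = best
    return {"attr": attr, "leaves": leaves, "default": root_label}


def classify(stump, example):
    """Follow the single branch; unseen values fall back to the root majority."""
    return stump["leaves"].get(example[stump["attr"]], stump["default"])
```

Passing the weights as a list in the same order as the examples keeps the pairing between weights and training examples trivial.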
