CS 410/510 Machine Learning
Winter 2006

Homework 5: Bayesian Learning
Due Tuesday, February 21.

For this homework you will implement a naive Bayes classifier and compare its performance at classifying spam versus non-spam with that of the decision trees you used in Homework 2. You will again use the UCI spam database from Homework 2. Here are the steps you need to perform:

I. Create binary attributes.

The UCI spam data uses 57 continuous-valued attributes. You need to transform these into binary-valued attributes by finding, for each attribute, a threshold c that maximizes information gain. Write a program to do this using the algorithm we discussed in class (and described in the textbook, Section 3.7.2). For each attribute a_i:

1. Sort the examples numerically with respect to a_i, lowest to highest.
2. Find adjacent examples that differ in target classification.
3. Choose a candidate threshold c_i as the midpoint of each such interval.
4. Compute the information gain for each candidate threshold c_i, and choose the one that gives the highest information gain. (Break ties randomly.)

Report these attributes and their corresponding c_i values in your writeup. (One way this step might be coded is sketched at the end of this handout.)

II. Train the naive Bayes classifier.

You now have a set of 57 binary attributes of the form a_i > c_i. Use these binary attributes to train a naive Bayes classifier on the training data UCI-spam.data that was given for Homework 2. For the probabilities, use the m-estimate of probability, described in class and in the textbook (Section 6.9.1.1). Use p = 1/2, since each attribute has two possible values, and use m = 2. (See the second sketch at the end of this handout.)

III. Test the naive Bayes classifier and compare its results with the decision tree.

Run your naive Bayes classifier on the examples in UCI-spam.test and report the accuracy on this test set. Compare it with the accuracy you obtained on the same test set in Homework 2 with your (pruned) decision tree that was trained on UCI-spam.data. Also, for each hypothesis (your naive Bayes classifier and your decision tree), report the recall and the precision. (See the third sketch at the end of this handout.)
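Implementation sketches (for illustration only; these are not the required submission).

Here is a minimal Python sketch of the threshold-selection algorithm in Step I. It assumes the examples for one attribute arrive as a NumPy array of values with parallel 0/1 labels; the function and variable names (entropy, best_threshold) are illustrative, not prescribed by the assignment.

import random
import numpy as np

def entropy(labels):
    # Entropy (in bits) of a 0/1 label array; 0 for an empty or pure set.
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)

def best_threshold(values, labels):
    # Step 1: sort the examples by attribute value.
    order = np.argsort(values)
    v, y = values[order], labels[order]
    n = len(y)
    base = entropy(y)

    # Steps 2-3: candidate thresholds are midpoints between adjacent
    # examples that differ in target classification.
    candidates = {(v[i] + v[i + 1]) / 2.0
                  for i in range(n - 1)
                  if y[i] != y[i + 1] and v[i] != v[i + 1]}
    if not candidates:
        return None, 0.0  # all examples share one label

    # Step 4: score each candidate by information gain.
    scored = []
    for c in candidates:
        left, right = y[v <= c], y[v > c]
        gain = (base - (len(left) / n) * entropy(left)
                     - (len(right) / n) * entropy(right))
        scored.append((gain, c))

    # Keep the highest-gain threshold, breaking ties randomly.
    best_gain = max(g for g, _ in scored)
    return random.choice([c for g, c in scored if g == best_gain]), best_gain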
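For Step II, one possible shape of the training and classification code is sketched below. The m-estimate follows the assignment's parameters (m = 2, p = 1/2), so each conditional is P(a_j = 1 | c) = (n_1 + m*p) / (n_c + m); the array layout and function names are my own assumptions. Classification sums log-probabilities rather than multiplying raw probabilities, which avoids underflow across 57 attributes.

import numpy as np

def train_naive_bayes(X, y, m=2.0, p=0.5):
    # X: n_examples x 57 array of 0/1 binarized attributes (a_i > c_i).
    # y: 0/1 labels (1 = spam, assumed coding).
    # Returns class priors and m-estimate conditionals
    # P(attribute j = 1 | class c) = (count of 1s + m*p) / (n_c + m).
    priors, cond = {}, {}
    for c in (0, 1):
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)
        cond[c] = (Xc.sum(axis=0) + m * p) / (len(Xc) + m)
    return priors, cond

def classify(x, priors, cond):
    # Pick the class maximizing log P(c) + sum_j log P(x_j | c).
    scores = {}
    for c in priors:
        likelihoods = np.where(x == 1, cond[c], 1.0 - cond[c])
        scores[c] = np.log(priors[c]) + np.log(likelihoods).sum()
    return max(scores, key=scores.get)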
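For Step III, a sketch of the evaluation, reusing classify from the previous sketch and again assuming spam is coded as label 1 (the positive class). Precision is TP / (TP + FP) and recall is TP / (TP + FN).

import numpy as np

def evaluate(X_test, y_test, priors, cond):
    # Accuracy, precision, and recall on the test set,
    # treating spam (label 1) as the positive class.
    preds = np.array([classify(x, priors, cond) for x in X_test])
    tp = int(np.sum((preds == 1) & (y_test == 1)))
    fp = int(np.sum((preds == 1) & (y_test == 0)))
    fn = int(np.sum((preds == 0) & (y_test == 1)))
    accuracy = float(np.mean(preds == y_test))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall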