CS 410/510 Machine Learning
Winter, 2006

Homework 5: Bayesian Learning
Due Tuesday, February 21.

For this homework you will implement a naive Bayes classifier and compare its performance on classifying spam and non-spam with that of the decision trees you used in Homework 2. You will be using the UCI spam database, as you did for Homework 2. Here are the steps you need to perform. (Illustrative sketches of each step appear at the end of this handout.)

I. Create binary attributes.

The UCI spam data uses 57 continuous-valued attributes. You need to transform these into binary-valued attributes by finding a threshold c for each attribute that maximizes information gain. Write a program to do this using the algorithm we discussed in class (and described in the textbook in Section 3.7.2). For each attribute a_i:

1. Sort the examples numerically with respect to a_i, lowest to highest.
2. Find adjacent examples that differ in target classification.
3. Choose candidate threshold c_i as the midpoint of the corresponding interval.
4. Compute the information gain for each such candidate threshold c_i. Choose the one that gives the highest information gain. (Break ties randomly.)

Report these attributes and corresponding c_i values in your writeup.

II. Train naive Bayes classifier.

Now you have a set of 57 binary attributes of the form a_i > c_i. Use these binary attributes to train a naive Bayes classifier, using the training data UCI-spam.data that was given for Homework 2. For probabilities, use the m-estimate of probability, described in class and in the textbook (Section 6.9.1.1). Use p = 1/2, since each attribute has two possible values. Use m = 2.

III. Test naive Bayes classifier and compare its results with decision tree.

Now run your naive Bayes classifier on the examples in UCI-spam.test. Report the accuracy on this test set. Compare it with the accuracy you obtained on this test set in Homework 2 with your (pruned) decision tree that was trained on UCI-spam.data. Also, for each hypothesis (your naive Bayes classifier and your decision tree), report the recall and the precision.
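The following is a minimal sketch of the thresholding step from Part I, assuming each attribute's values and the 0/1 class labels have already been loaded into NumPy arrays. The names `entropy` and `best_threshold` are illustrative, not part of the assignment.

```python
import numpy as np

def entropy(labels):
    """Entropy (in bits) of a 0/1 label array."""
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def best_threshold(values, labels, rng=None):
    """Return (threshold, gain) maximizing information gain for one attribute.

    Candidate thresholds are midpoints between adjacent examples (after
    sorting by value) whose class labels differ; ties are broken randomly.
    """
    rng = rng or np.random.default_rng()
    order = np.argsort(values)
    v, y = values[order], labels[order]
    base = entropy(y)

    # Candidate thresholds: midpoints of intervals where the class changes.
    candidates = [(v[i] + v[i + 1]) / 2.0
                  for i in range(len(v) - 1)
                  if y[i] != y[i + 1] and v[i] != v[i + 1]]
    if not candidates:
        return None, 0.0

    best_gain, best_cs = -1.0, []
    for c in candidates:
        left, right = y[v <= c], y[v > c]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        gain = base - remainder
        if gain > best_gain + 1e-12:
            best_gain, best_cs = gain, [c]
        elif abs(gain - best_gain) <= 1e-12:
            best_cs.append(c)          # tie: keep all, pick one at random below
    return rng.choice(best_cs), best_gain
```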
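For Part II, the m-estimate of probability is P(a_i = v | c) = (n_c + m·p) / (n + m), where n is the number of training examples in class c, n_c the number of those with a_i = v, and here p = 1/2 and m = 2. Below is a sketch of a naive Bayes classifier using this estimate, assuming X is a 0/1 array of binarized attributes (one row per example) and y is a 0/1 label array; the class name `NaiveBayes` is illustrative, and class priors are taken as relative frequencies, which the assignment does not specify.

```python
import numpy as np

class NaiveBayes:
    """Naive Bayes over binary attributes with the m-estimate (p = 1/2, m = 2)."""

    def fit(self, X, y, m=2, p=0.5):
        self.classes = np.unique(y)
        self.priors = {}
        self.cond = {}   # cond[c][j] = P(attribute j equals 1 | class c)
        for c in self.classes:
            Xc = X[y == c]
            self.priors[c] = len(Xc) / len(X)
            # m-estimate: (n_c + m*p) / (n + m), applied to each attribute.
            self.cond[c] = (Xc.sum(axis=0) + m * p) / (len(Xc) + m)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            scores = {}
            for c in self.classes:
                # Sum of log-probabilities avoids underflow when
                # multiplying 57 small probabilities together.
                probs = np.where(x == 1, self.cond[c], 1.0 - self.cond[c])
                scores[c] = np.log(self.priors[c]) + np.sum(np.log(probs))
            preds.append(max(scores, key=scores.get))
        return np.array(preds)
```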
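For Part III, accuracy, precision, and recall can be computed from the test-set predictions as sketched below, assuming spam is treated as the positive class (label 1); the function name `evaluate` is illustrative.

```python
import numpy as np

def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall with `positive` as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

The same function can be applied to the decision-tree predictions from Homework 2 so that the two hypotheses are compared on identical metrics.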