


Material Type: Assignment; Professor: Roth; Class: Machine Learning; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2008;



CS446: Pattern Recognition and Machine Learning Fall 2008
Handed Out: November 6, 2008 Due: November 20, 2008
Assume we know that P(h) = 0.2, P(t) = 0.4, and P(t|h) = 0.95. Show and explain your calculations for both part (a) and part (b).
(a) What is P(h|¬t)?
(b) Given that the pizza is tasty, what’s the probability that it is hot?
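Both parts are direct applications of Bayes' rule, P(A|B) = P(B|A) P(A) / P(B), together with the complement rule P(¬t|h) = 1 - P(t|h). The sketch below is one way to check a hand calculation. It reads h as "the pizza is hot" and t as "the pizza is tasty", which is an assumption based on the wording of part (b); the helper name `bayes` is hypothetical.

```python
from fractions import Fraction

# Quantities given in the problem statement, kept as exact fractions.
p_h = Fraction(2, 10)           # P(h)
p_t = Fraction(4, 10)           # P(t)
p_t_given_h = Fraction(95, 100)  # P(t|h)

def bayes(likelihood, prior, evidence):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# P(h|t) uses the given likelihood P(t|h) directly.
p_h_given_t = bayes(p_t_given_h, p_h, p_t)

# P(h|not t) needs the complements P(not t|h) and P(not t).
p_h_given_not_t = bayes(1 - p_t_given_h, p_h, 1 - p_t)

print("P(h|t)     =", p_h_given_t, "=", float(p_h_given_t))
print("P(h|not t) =", p_h_given_not_t, "=", float(p_h_given_not_t))
```

Exact fractions are used here only to avoid floating-point rounding in the check; the written answer should still show each step of the calculation.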
(a) Show that f_TH(3,7) has a linear decision surface over the 7-dimensional Boolean cube.
(b) Assume that you are given data sampled according to the uniform distribution over the Boolean cube {0,1}^7 and labeled according to f_TH(3,7). Use naïve Bayes to learn a hypothesis that predicts these labels. What is the hypothesis generated by the naïve Bayes algorithm? (You may assume that you have seen all the data required to get accurate estimates of the probabilities.)
(c) Show that the hypothesis produced in (b) does not represent this function.
(d) Are the naïve Bayes assumptions satisfied by f_TH(3,7)? Justify your answer.
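Because the cube {0,1}^7 has only 128 points, the naïve Bayes estimates in part (b) can be computed exactly by enumeration and used to check a hand derivation. The sketch below assumes that f_TH(3,7)(x) = 1 exactly when at least 3 of the 7 features are active; that definition is not stated in this excerpt, so adjust `f_th` if the course definition differs. All function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

N, K = 7, 3  # assumed reading of f_TH(3,7): at least K of N bits are 1

def f_th(x):
    """Assumed threshold function: 1 iff at least K features are active."""
    return int(sum(x) >= K)

cube = list(product((0, 1), repeat=N))
labels = {x: f_th(x) for x in cube}

def naive_bayes_estimates():
    """Exact P(y) and P(x_i = 1 | y) under the uniform distribution."""
    prior, cond = {}, {}
    for y in (0, 1):
        members = [x for x in cube if labels[x] == y]
        prior[y] = Fraction(len(members), len(cube))
        cond[y] = [Fraction(sum(x[i] for x in members), len(members))
                   for i in range(N)]
    return prior, cond

def nb_predict(x, prior, cond):
    """Pick the label maximizing P(y) * prod_i P(x_i | y)."""
    scores = {}
    for y in (0, 1):
        score = prior[y]
        for i, bit in enumerate(x):
            score *= cond[y][i] if bit else 1 - cond[y][i]
        scores[y] = score
    return int(scores[1] > scores[0])

prior, cond = naive_bayes_estimates()
mismatches = [x for x in cube if nb_predict(x, prior, cond) != labels[x]]

print("P(y=1) =", prior[1])
print("P(x_i=1|y=1) =", cond[1][0], " P(x_i=1|y=0) =", cond[0][0])
print("points where naive Bayes disagrees with f_th:", len(mismatches))
print("number of active bits in those points:", sorted({sum(x) for x in mismatches}))
```

By symmetry the conditional estimates are the same for every feature, so the learned hypothesis depends only on the number of active bits; comparing its implied threshold with K is the core of part (c).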
On the course website, you will find a collection of examples in a Boolean feature space generated from sentences containing either the word "your" or the word "you’re" in articles from The Wall Street Journal. The raw sentences from which these examples were extracted are also there. Each of the given feature vector files contains one example per line. Each example vector is a comma-separated list of feature ID numbers with a ‘:’ at the end of the example. The first feature ID represents the target label (0 or 1), and each of the others represents a unique feature that is active for that example. The features represent small conjunctions of the words and their parts of speech; there are 2738 features in total. There are 750 examples for training and 187 for testing.
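The file format described above can be read with a few lines of code. The following is a minimal sketch, assuming the format is exactly as stated (comma-separated integer IDs, label first, a trailing ‘:’); the function names, the file path argument, and the sample line in the usage note are hypothetical.

```python
def parse_example(line):
    """Parse one line such as '1,17,204,2737:' into (label, active_features)."""
    body = line.strip()
    if body.endswith(":"):
        body = body[:-1]  # drop the end-of-example marker
    ids = [int(token) for token in body.split(",") if token.strip()]
    label = ids[0]                  # first ID is the target label (0 or 1)
    features = frozenset(ids[1:])   # remaining IDs are the active features
    return label, features

def load_examples(path):
    """Read a feature vector file with one example per line."""
    with open(path) as handle:
        return [parse_example(line) for line in handle if line.strip()]
```

For instance, `parse_example("1,17,204,2737:")` returns `(1, frozenset({17, 204, 2737}))`. Storing only the active IDs keeps each example small, since few of the 2738 features are active in any one sentence.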
We want you to compare two classifiers as they learn this task, either with the full training data or starting with some small subset of the training data. The two classifiers to compare are Naïve Bayes and Perceptron. You are free to implement these classifiers however you want: using your own implementation, some library or toolkit you are familiar with, or LBJ. If you choose to use LBJ, some base files to help you set up and parse the example files will be provided on the course website. No matter what method you choose, make sure your writeup details the parameter settings, how and why they were chosen, and any other choices made in the exact implementation of the two learning algorithms.
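For those writing their own implementation, the two learners can be sketched compactly over sparse examples. This is one possible setup, not a prescribed one: it assumes each example is a `(label, features)` pair where `features` is the set of active feature IDs, and it picks a Bernoulli naïve Bayes model with add-one smoothing and a plain mistake-driven Perceptron. The parameter names and defaults (`epochs`, `learning_rate`, `smoothing`) are illustrative assumptions and are exactly the kind of choice the writeup should justify.

```python
import math
from collections import Counter

def train_perceptron(examples, epochs=10, learning_rate=1.0):
    """Mistake-driven Perceptron over sparse Boolean features."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for label, features in examples:
            target = 1 if label == 1 else -1
            activation = bias + sum(weights.get(f, 0.0) for f in features)
            if target * activation <= 0:  # mistake: move toward the target
                for f in features:
                    weights[f] = weights.get(f, 0.0) + learning_rate * target
                bias += learning_rate * target
    return weights, bias

def predict_perceptron(model, features):
    weights, bias = model
    return 1 if bias + sum(weights.get(f, 0.0) for f in features) > 0 else 0

def train_naive_bayes(examples, num_features, smoothing=1.0):
    """Bernoulli naive Bayes with additive smoothing (smoothing must be > 0)."""
    total = len(examples)
    model = {}
    for y in (0, 1):
        in_class = [features for label, features in examples if label == y]
        n_y = len(in_class)
        counts = Counter(f for features in in_class for f in features)
        denom = n_y + 2 * smoothing
        p_unseen = smoothing / denom  # estimate for features never active in class y
        probs = {f: (c + smoothing) / denom for f, c in counts.items()}
        # Log-score of the all-zeros vector: log prior plus log P(x_f = 0 | y) for all f.
        base = math.log((n_y + smoothing) / (total + 2 * smoothing))
        base += sum(math.log(1 - p) for p in probs.values())
        base += (num_features - len(probs)) * math.log(1 - p_unseen)
        model[y] = (base, probs, p_unseen)
    return model

def predict_naive_bayes(model, features):
    scores = {}
    for y, (base, probs, p_unseen) in model.items():
        score = base
        for f in features:
            p = probs.get(f, p_unseen)
            score += math.log(p) - math.log(1 - p)  # switch feature f from 0 to 1
        scores[y] = score
    return 1 if scores[1] > scores[0] else 0
```

To compare the learners on growing amounts of data, one option is to train both on the first k training examples for several values of k and record test accuracy each time; with `num_features=2738` the naïve Bayes model accounts for inactive features without ever materializing a dense vector.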
an experiment that would test this hypothesis and possibly confirm or deny it. You do not need to run this experiment, but are encouraged to.
    mkdir jdoe-hw
    mv *.java README jdoe-hw
    gtar zcvf jdoe-hw5.tar.gz jdoe-hw