Problem Set 5 for Pattern Recognition and Machine Learning - Assignment | CS 446, Assignments of Computer Science

Material Type: Assignment; Professor: Roth; Class: Machine Learning; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2008;

CS446: Pattern Recognition and Machine Learning Fall 2008
Problem Set 5
Handed Out: November 6, 2008 Due: November 20, 2008
  • Feel free to talk to other members of the class in doing the homework. I am more concerned that you learn how to solve the problem than that you demonstrate that you solved it entirely on your own. You should, however, write down your solution yourself. Please try to keep the solution brief and clear.
  • Feel free to send me email or come to ask questions regarding this handout or conceptual issues.
  • Please, no handwritten solutions.
  • The homework is due at 4:00 pm on the due date. Email your write-up to the TA. Please put “<userid> CS446 hw5 submission” as the subject line of the email when you submit your homework.
1. [Probability - 10 points]
Define the following Boolean random variables:
  • h = pizza is hot
  • t = pizza is tasty
Assume we know that P(h) = 0.2, P(t) = 0.4, and P(t|h) = 0.95. Show and explain your calculations for both part (a) and part (b).
(a) What is P(h|¬t)?
(b) Given that the pizza is tasty, what’s the probability that it is hot?
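Conditional-probability questions of this kind rest on a handful of standard identities. They are written below for generic events a and b, which are placeholder symbols rather than the handout's own variables, and they are offered as a reminder, not as a worked solution.

```latex
\begin{align*}
P(a, b) &= P(b \mid a)\,P(a) \\
P(a \mid b) &= \frac{P(b \mid a)\,P(a)}{P(b)} \\
P(\neg b \mid a) &= 1 - P(b \mid a), \qquad P(\neg b) = 1 - P(b) \\
P(a \mid \neg b) &= \frac{P(\neg b \mid a)\,P(a)}{P(\neg b)}
\end{align*}
```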
2. [Learning Threshold Functions - 30 points]
Consider the Boolean function f_TH(3,7). This is a threshold function defined on the 7-dimensional Boolean cube as follows: given an instance x, f_TH(3,7)(x) = 1 if and only if 3 or more of x’s components are 1.
(a) Show that f_TH(3,7) has a linear decision surface over the 7-dimensional Boolean cube.
(b) Assume that you are given data sampled according to the uniform distribution over the Boolean cube {0,1}^7 and labeled according to f_TH(3,7). Use naïve Bayes to learn a hypothesis that predicts these labels. What is the hypothesis generated by the naïve Bayes algorithm? (You may assume that you have seen all the data required to get accurate estimates of the probabilities.)
(c) Show that the hypothesis produced in (b) does not represent this function.
(d) Are the naïve Bayes assumptions satisfied by f_TH(3,7)? Justify your answer.
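Because the 7-dimensional Boolean cube has only 128 points, claims about the threshold function can be checked by brute force. The sketch below is one possible way to set that up in Python. The names `f_th`, `labeled`, and `disagreements` are illustrative and do not come from the handout, and the sketch stops short of estimating any naïve Bayes probabilities.

```python
from itertools import product

N, K = 7, 3  # dimension and threshold of the function described above


def f_th(x, k=K):
    """Threshold function: 1 iff at least k components of x are 1."""
    return 1 if sum(x) >= k else 0


# Every point of {0,1}^7 paired with its label. Under the uniform
# distribution each point has probability 1/128.
cube = list(product((0, 1), repeat=N))
labeled = [(x, f_th(x)) for x in cube]
num_pos = sum(y for _, y in labeled)


def disagreements(hypothesis):
    """Return the points where a candidate hypothesis differs from the target."""
    return [x for x, y in labeled if hypothesis(x) != y]


print(len(cube), num_pos)  # 128 99
```

A candidate hypothesis can be passed to `disagreements` as any function from a 7-tuple of bits to 0 or 1. An empty result means it represents the target on the whole cube.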

3. In this problem, you will test the Naïve Bayes learning algorithm and study its performance on a real-world task: context-sensitive spelling correction. Additionally, you will experiment with different training methods for cases with limited labeled data.

The Data

On the course website, you will find a collection of examples in a Boolean feature space generated from sentences containing either the word “your” or the word “you’re” in articles from The Wall Street Journal. The raw sentences from which these examples were extracted are also there. Each feature vector file contains one example per line. Each example vector is a comma-separated list of feature ID numbers with a ‘:’ at the end of the example. The first feature ID represents the target label (0 or 1), and each of the others represents a unique feature that is active for that example. The features represent small conjunctions of the words and their parts of speech; there are 2738 features in total. There are 750 examples for training and 187 for testing.
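The line format described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the sample line in the docstring is made up for illustration, and whether feature IDs in the real files can coincide with the label values 0 and 1 is not stated in the handout.

```python
from typing import List, Set, Tuple

Example = Tuple[int, Set[int]]  # (label, IDs of the active features)


def parse_example(line: str) -> Example:
    """Parse one line such as '1,17,204,1993:' (illustrative, not real data)."""
    body = line.strip().rstrip(":")  # drop the trailing ':' terminator
    ids = [int(tok) for tok in body.split(",") if tok.strip()]
    if not ids:
        raise ValueError("empty example line")
    label, features = ids[0], set(ids[1:])  # first ID is the target label
    if label not in (0, 1):
        raise ValueError(f"unexpected label {label}")
    return label, features


def load_examples(path: str) -> List[Example]:
    """Read a feature vector file with one example per line, skipping blanks."""
    with open(path) as fh:
        return [parse_example(ln) for ln in fh if ln.strip()]
```

Storing only the active feature IDs keeps each example small, since a typical example activates far fewer than the 2738 possible features.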

The Classifiers

We want you to compare two classifiers as they learn this task, either with the full training data or starting with some small subset of the training data. The two classifiers to compare are Naïve Bayes and Perceptron. You are free to implement these classifiers however you want: using your own implementation, some library or toolkit you are familiar with, or LBJ. If you choose to use LBJ, some base files to help you set up and parse the example files will be provided on the course website. No matter what method you choose, make sure your write-up details the parameter settings, how and why they were chosen, and any other choices in the exact implementation of the two learning algorithms.
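For readers writing their own implementation, the sketch below shows one possible shape for both learners over sparse Boolean examples. It is not the course's reference code. The function names, the Laplace smoothing constant `alpha`, the number of epochs, the learning rate, and the decision to ignore features never seen in training are all assumptions made here for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Example = Tuple[int, Set[int]]   # (label in {0, 1}, IDs of active features)
Model = Tuple[float, Dict[int, float]]  # (bias, per-feature weight)


def train_naive_bayes(data: List[Example], alpha: float = 1.0) -> Model:
    """Bernoulli naive Bayes with Laplace smoothing, returned in log-odds form."""
    count = {0: 0, 1: 0}
    active = {0: defaultdict(int), 1: defaultdict(int)}
    for y, feats in data:
        count[y] += 1
        for f in feats:
            active[y][f] += 1
    vocab = set(active[0]) | set(active[1])  # features seen in training only

    def p(y: int, f: int) -> float:  # smoothed estimate of P(feature on | y)
        return (active[y][f] + alpha) / (count[y] + 2 * alpha)

    bias = math.log((count[1] + alpha) / (count[0] + alpha))  # smoothed prior
    weights: Dict[int, float] = {}
    for f in vocab:
        off = math.log((1 - p(1, f)) / (1 - p(0, f)))  # evidence when f is off
        on = math.log(p(1, f) / p(0, f))               # evidence when f is on
        bias += off
        weights[f] = on - off
    return bias, weights


def train_perceptron(data: List[Example], epochs: int = 10, lr: float = 1.0) -> Model:
    """Mistake-driven perceptron; weights change only on misclassified examples."""
    w: Dict[int, float] = defaultdict(float)
    b = 0.0
    for _ in range(epochs):
        for y, feats in data:
            pred = 1 if b + sum(w[f] for f in feats) > 0 else 0
            if pred != y:
                step = lr if y == 1 else -lr
                b += step
                for f in feats:
                    w[f] += step
    return b, dict(w)


def predict(model: Model, feats: Set[int]) -> int:
    """Both models are linear, so prediction is a thresholded sparse dot product."""
    b, w = model
    return 1 if b + sum(w.get(f, 0.0) for f in feats) > 0 else 0
```

Expressing naive Bayes as a bias plus per-feature weights lets both learners share `predict`, which keeps the comparison between them on equal footing.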

an experiment that would test this hypothesis and possibly confirm or deny it. You do not need to run this experiment, but are encouraged to.

Submission

  • Submit electronically (by email) a report containing your answers to questions 1 and 2 and describing your design choices for problem 3. Include your graphs and all the measurements we asked for, and discuss the results.
  • Submit electronically (by email) all of your source code and a README file that describes everything one needs to know to compile and run it. Place all files including README in a directory called userID-hw5. Exclude executables and object files. Pack the files together so that when they unpack, the userID-hw directory is created with all your files in it. The name of the packed file should be userID-hw5.zip or userID-hw5.tar.gz. For example, if your user ID is jdoe and you wrote your solution in Java, this might be accomplished in unix as follows:

    mkdir jdoe-hw
    mv *.java README jdoe-hw
    gtar zcvf jdoe-hw5.tar.gz jdoe-hw