CS446 Machine Problem 2
- Due Wednesday, Dec 6th. Please hand your hardcopy to the TA at the beginning of class, as you enter the classroom.
- Late homework should be dropped in the box in 3332 SC. Slide under the door if it is not open.
- The source code of your program is also due on the same day. It should be sent by email in a tarball as an attachment to [email protected]. You will receive a confirmation email within 24 hours; if not, contact the TA immediately.
- Feel free to talk to your classmates about the homework. We are more concerned that you learn how to solve the problem than that you demonstrate that you solved it entirely on your own. You should, however, write down your solution yourself and master the material to solve similar problems unaided. Keep the solution brief and clear.
- Please, no handwritten solutions. Be sure your name appears on the top of each page and that your pages are stapled together.
- In addition, please present your algorithms in both pseudocode and English. That is, give a precise formulation of your algorithm as pseudocode and also explain in one or two concise paragraphs what your algorithm does. Be aware that pseudocode is much simpler and more abstract than real code. Take a look at the textbook pseudocode (e.g. Table 2.5) to get an idea about the appropriate level of abstraction.
- You may use the programming language of your choice.
- You may want to refer to Mitchell’s new book chapter and Ng and Jordan’s paper as mentioned in the slides.
- [Implementing the Naïve Bayes classifier—20 points]
The Naïve Bayes classifier approximates the joint probability $p(x, y)$ by estimating $p(x_i \mid y)$ and $p(y)$ under the Naïve Bayes conditional independence assumption, and uses Bayes' rule to calculate $p(y \mid x)$. It then picks the most likely $y$ as the classification label. Please implement the Naïve Bayes classifier. You should use smoothing to avoid the problem of zero probabilities.
In short, your learner should, given the training data:
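- Compute estimates $\hat{p}(x_i \mid y)$ and $\hat{p}(y)$ of the probabilities $p(x_i \mid y)$ and $p(y)$ as

$$\hat{p}(x_i = k \mid y = v) = \frac{\#\text{samples}\{x_i = k,\ y = v\} + l}{\#\text{samples}\{y = v\} + ml}$$

and

$$\hat{p}(y = v) = \frac{\#\text{samples}\{y = v\} + l}{\#\text{samples} + ml}$$
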
You can choose l = 1. Please refer to the next problem for the format of the input data file. You can assume that the features are binary, and that the class label is binary, in which case m = 2.
- Store the above probabilities in a table.
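The learner steps above can be sketched as follows. This is a minimal illustration, not a reference solution: the function name `train_naive_bayes`, the in-memory layout (each example as a dense list of 0/1 values), and the dictionary-based tables are all assumptions made for the example.

```python
from collections import Counter

def train_naive_bayes(examples, labels, n_features, l=1, m=2):
    """Estimate smoothed p(x_i = k | y = v) and p(y = v).

    Assumes binary features and binary labels (so m = 2).
    examples: list of lists of 0/1 feature values (hypothetical layout)
    labels:   list of 0/1 class labels
    """
    class_counts = Counter(labels)
    # joint_counts[(i, k, v)] = number of samples with x_i = k and y = v
    joint_counts = Counter()
    for x, y in zip(examples, labels):
        for i in range(n_features):
            joint_counts[(i, x[i], y)] += 1

    # Smoothed class prior: (#samples{y = v} + l) / (#samples + m*l)
    prior = {v: (class_counts[v] + l) / (len(labels) + m * l) for v in (0, 1)}
    # Smoothed conditional: (#samples{x_i = k, y = v} + l) / (#samples{y = v} + m*l)
    cond = {
        (i, k, v): (joint_counts[(i, k, v)] + l) / (class_counts[v] + m * l)
        for i in range(n_features)
        for k in (0, 1)
        for v in (0, 1)
    }
    return prior, cond

# Toy usage with three made-up training examples.
prior, cond = train_naive_bayes([[1, 0], [1, 1], [0, 0]], [1, 1, 0], n_features=2)
```

Because of the smoothing term, every entry of `cond` is strictly positive even for feature/label combinations that never occur in the training data.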
Your classifier should, for each data point in the testing data:
- Compute the log ratio

$$\log \frac{\left(\prod_{i=1}^{n} \hat{p}(x_i \mid y = 1)\right) \hat{p}(y = 1)}{\left(\prod_{i=1}^{n} \hat{p}(x_i \mid y = 0)\right) \hat{p}(y = 0)}$$

- Output 1 as its classification label if the above quantity is positive. Output 0 as its classification label otherwise.
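A sketch of the classification step, assuming the quantity being thresholded is the log of the ratio of the two class scores (the reading that makes "positive" the natural threshold). Summing logs instead of multiplying probabilities also avoids underflow when there are many features. The table layout (`prior[v]`, `cond[(i, k, v)]`) and the numeric values are made up for illustration.

```python
import math

def classify_naive_bayes(x, prior, cond):
    """Return 1 if the log ratio of the class-1 and class-0 scores is positive, else 0.

    x:     list of 0/1 feature values
    prior: {v: p_hat(y = v)}
    cond:  {(i, k, v): p_hat(x_i = k | y = v)}
    """
    log_ratio = math.log(prior[1]) - math.log(prior[0])
    for i, k in enumerate(x):
        log_ratio += math.log(cond[(i, k, 1)]) - math.log(cond[(i, k, 0)])
    return 1 if log_ratio > 0 else 0

# Hand-made tables for a single binary feature (illustrative values only).
toy_prior = {0: 0.5, 1: 0.5}
toy_cond = {(0, 1, 1): 0.8, (0, 0, 1): 0.2, (0, 1, 0): 0.3, (0, 0, 0): 0.7}
label_for_one = classify_naive_bayes([1], toy_prior, toy_cond)
label_for_zero = classify_naive_bayes([0], toy_prior, toy_cond)
```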
In your report please include
- The source code of your program.
- Pseudo code and English sentences to illustrate your algorithm and ideas.
In your tarball please include
- The source code of your program.
- A README_nb file that details how to compile and run your program.
- [Evaluating the data set using the Naïve Bayes classifier—20 points]
The training and testing data sets will be available on the class website. The data set will be of the following form:
- Each data point is in a separate line.
- Each line begins with a class label, followed by one or more spaces, followed by the list of features separated by one or more spaces.
- Each feature begins with the index of the feature, followed by a colon (:), followed by the value of the feature.
- Unlisted features have values zero (0).
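The line format described above could be parsed roughly as follows. This is a sketch under assumptions the handout does not state: feature indices are taken to be 1-based integers, feature values are taken to be integers, and the helper names `parse_line` and `to_dense` are hypothetical.

```python
def parse_line(line):
    """Parse 'label idx:val idx:val ...' into (label, {index: value}).

    split() with no argument handles runs of one or more spaces.
    """
    tokens = line.split()
    label = int(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index)] = int(value)
    return label, features

def to_dense(features, n_features):
    """Expand a sparse {index: value} map; unlisted features become 0.

    Assumes 1-based feature indices (an assumption, not stated in the handout).
    """
    return [features.get(i, 0) for i in range(1, n_features + 1)]

# Toy usage on a made-up line.
label, sparse = parse_line("1   3:1  7:1")
dense = to_dense(sparse, 8)
```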
and

$$p(y = 1 \mid x) = \frac{\exp\left(w_0 + \sum_{i=1}^{n} w_i x_i\right)}{1 + \exp\left(w_0 + \sum_{i=1}^{n} w_i x_i\right)}$$

Taking the log of the ratio of the two, the classification rule assigns label 1 if $w_0 + \sum_{i=1}^{n} w_i x_i > 0$, and assigns label 0 otherwise. The update rule for $w$ for the gradient ascent method is

$$w_i \leftarrow w_i + \eta \sum_{l} x_i^l \left( y^l - \hat{p}(y^l = 1 \mid x^l, w) \right)$$

where the superscript $l$ denotes the $l$th training example, the subscript $i$ denotes the index of the feature (and the weight), $\eta$ is a small constant, and $\hat{p}$ is the logistic regression prediction given by (1) and (2). Please refer to Mitchell’s new book chapter for derivation details. Your learner should, given the training data:
- Use the gradient ascent method to find the weight vector w.
- Store the above weight vector.
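A minimal sketch of batch gradient ascent for the learner. The handout does not specify a step size or stopping rule, so the fixed `eta` and `n_iters` defaults below are assumptions, as are the function names. A constant feature equal to 1 is prepended to each example so that `w[0]` acts as the bias weight.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: avoids exp() overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(examples, labels, eta=0.1, n_iters=1000):
    """Batch gradient ascent on the conditional log-likelihood.

    examples: list of lists of feature values; labels: list of 0/1.
    Returns w with w[0] as the bias and w[1:] as the feature weights.
    """
    data = [[1.0] + list(x) for x in examples]  # prepend constant feature
    w = [0.0] * len(data[0])
    for _ in range(n_iters):
        # Prediction error y - p_hat(y = 1 | x, w) for every training example.
        errors = [y - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
                  for x, y in zip(data, labels)]
        # w_i <- w_i + eta * sum over examples of x_i * error
        w = [wi + eta * sum(x[i] * e for x, e in zip(data, errors))
             for i, wi in enumerate(w)]
    return w

# Toy usage: one feature that perfectly predicts the label.
toy_w = train_logistic_regression([[0], [0], [1], [1]], [0, 0, 1, 1])
```

On linearly separable data like this toy set the weights keep growing with more iterations, so in practice a convergence check or a cap on iterations is needed.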
Note that if you add one additional feature $x_0 = 1$ to every example, $w_0$ can be handled uniformly with the other weights. Your classifier should, for each data point in the testing data:
- Compute $w_0 + \sum_{i=1}^{n} w_i x_i$.
- Output 1 as its classification label if the above quantity is positive. Output 0 as its classification label otherwise.
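The classifier side reduces to a thresholded linear score. A short sketch, assuming the weights are stored as a list with the bias first (a hypothetical layout) and using made-up weight values:

```python
def classify_logistic_regression(x, w):
    """Return 1 if w_0 + sum_i w_i * x_i is positive, else 0.

    w[0] is the bias w_0; w[1:] pair with the features x_1..x_n.
    """
    score = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if score > 0 else 0

# Illustrative weights only; real weights come from training.
example_w = [-1.0, 2.0, 0.5]
pred_a = classify_logistic_regression([1, 0], example_w)  # score = -1.0 + 2.0
pred_b = classify_logistic_regression([0, 1], example_w)  # score = -1.0 + 0.5
```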
In your report please include
- The source code of your program.
- Pseudo code and English sentences to illustrate your algorithm and ideas.
In your tarball please include
- The source code of your program.
- A README_lr file that details how to compile and run your program.
- [Evaluating the data set using the Logistic Regression classifier—20 points]
The training and testing data sets are the same as those for problem 2, available on the class website. The training set is divided into subsets of varying sizes.
- Train a classifier on each training subset, and test each classifier on the testing set.
- Plot a learning curve of testing accuracy against size of training set, similar to the one in the first homework.
- Take the whole data set. Let the training set size be $s$. For each $s \in \{5, 10, 15, \ldots, 100\}$, shuffle the data set and randomly split it into an $s$-example training set, with the remaining examples as the testing set. Train a classifier on the training set and test it on the testing set. Repeat at least 20 times for each training set size, and compute the average testing accuracy for each size.
- Plot another learning curve for the averaged testing accuracy against size of the training set.
- Compare the two curves.
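The repeated random-split procedure can be sketched independently of the classifier. In this illustration `train_fn` and `predict_fn` are hypothetical callbacks standing in for whichever learner is being evaluated, and the seeded random generator is an assumption added so that runs are reproducible. The data set must contain more examples than the largest training size so that the testing set is never empty.

```python
import random

def averaged_accuracy(data, train_fn, predict_fn, sizes, repeats=20, seed=0):
    """Average testing accuracy over random splits, for each training size.

    data:       list of (x, y) pairs
    train_fn:   callable taking a list of (x, y) pairs and returning a model
    predict_fn: callable taking (model, x) and returning a predicted label
    Returns {s: mean testing accuracy over `repeats` random splits}.
    """
    rng = random.Random(seed)
    results = {}
    for s in sizes:
        accuracies = []
        for _ in range(repeats):
            shuffled = data[:]          # copy so the caller's list is untouched
            rng.shuffle(shuffled)
            train, test = shuffled[:s], shuffled[s:]
            model = train_fn(train)
            correct = sum(predict_fn(model, x) == y for x, y in test)
            accuracies.append(correct / len(test))
        results[s] = sum(accuracies) / repeats
    return results

# Toy usage: a made-up data set where the label equals the only feature,
# and a stand-in "classifier" that ignores training and echoes that feature.
toy_data = [([i % 2], i % 2) for i in range(200)]
curve = averaged_accuracy(
    toy_data,
    train_fn=lambda train: None,
    predict_fn=lambda model, x: x[0],
    sizes=range(5, 101, 5),
)
```

The resulting dictionary maps each training size to one point on the averaged learning curve and can be passed to any plotting tool.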
In your report please include
- The testing accuracy on the testing data set for each classifier you trained on the different subsets of the given training data set.
- The averaged testing accuracy for each size of the training set obtained by your random split.
- The two learning curves.
In your tarball please include
- A SCRIPT_lr file that contains the log (record) of how you ran your program. It should be a typescript of everything printed on your terminal, similar to the one in problem 2.
- [Comparing the results of the two learners—20 points]
In your report please compare and discuss the findings from the two classification approaches, including, but not limited to, how each approach performs with different training set sizes and how the testing accuracy changes as the training set grows.