Pattern Recognition and Machine Learning - Assignment 2 | CS 446, Assignments of Computer Science

Material Type: Assignment; Professor: Roth; Class: Machine Learning; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2007;

Uploaded on 03/11/2009
CS446: Pattern Recognition and Machine Learning Fall 2007

Problem Set 2

Handed Out: September 11, 2007 Due: September 25, 2007

  • Feel free to talk to your classmates about the homework. I am more concerned that you learn how to solve the problem than that you demonstrate that you solved it entirely on your own. You should, however, write down your solution yourself. Please try to keep the solution brief and clear.
  • Please, no handwritten solutions. Be sure your name appears on the top of each page.
  • Please present your algorithms in both pseudocode and English. That is, give a precise formulation of your algorithm as pseudocode and also explain in one or two concise paragraphs what your algorithm does. Be aware that pseudocode is much simpler and more abstract than real code. Take a look at the textbook pseudocode (e.g. Table 2.5 on page 33) to get an idea about the appropriate level of abstraction.
  • The homework is due at 4:00 pm on the due date. Email your write-up and your code to the TA. Please do NOT hand in a hard copy of your write-up. Please put “<userid> CS446 hw2 submission” as the subject line of the email when you submit your homework to [email protected].
  1. [Representing Boolean Functions - 10 points] (Based on Mitchell, exercise 3.1) Give decision trees to represent the following Boolean functions:

a. ¬A ∨ B ∧ C [3 points]
b. (A ∧ ¬B) ∨ ¬(C ∧ D) [3 points]
c. (A ∨ B) ⊕ C ∨ A ⊕ (¬B ∧ C) [4 points]
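
One way to sanity-check a hand-drawn tree is to enumerate the truth table of the target function and compare it with the tree's output on every assignment. The sketch below does this for function (b), whose parenthesization is unambiguous. The helper names `f_b` and `truth_table` are illustrative only and are not part of the assignment.

```python
from itertools import product


def f_b(a, b, c, d):
    """Function (b): (A AND NOT B) OR NOT (C AND D)."""
    return (a and not b) or not (c and d)


def truth_table(fn, n_vars):
    """Return (assignment, value) pairs for all 2^n_vars Boolean inputs."""
    return [(bits, bool(fn(*bits)))
            for bits in product([False, True], repeat=n_vars)]


if __name__ == "__main__":
    # Print one row per assignment of A, B, C, D.
    for bits, value in truth_table(f_b, 4):
        row = " ".join("T" if x else "F" for x in bits)
        print(row, "->", "T" if value else "F")
```

A tree is correct exactly when following its branches yields the same value as the table on all rows.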

  2. [Space Complexity of Decision Trees - 15 points] Let x be a vector of n Boolean variables and let k be the number of relevant variables in the target function (k ≤ n).

a. Let D_k be the class of k-disjunctions (disjunctions of k of the n variables or their negations) over (x_1, x_2, ..., x_n). State the size of the smallest possible consistent decision tree for D_k in terms of n and k. Describe the shape of the resulting tree. [3 points]
b. Let C_k be the class of k-conjunctions (conjunctions of k of the n variables or their negations) over (x_1, x_2, ..., x_n). State the size of the smallest possible consistent decision tree for C_k in terms of n and k. Describe the shape of the resulting tree. [3 points]
c. Let P_k be the class of k-parity functions (parity functions on k of the n variables) over (x_1, x_2, ..., x_n). The (even) parity function evaluates to 1 if there is an even number of 1's in the feature vector and evaluates to 0 if there is an odd number of 1's in the feature vector. State the size of the smallest possible consistent decision tree for P_k in terms of n and k. [3 points]
d. What do these results imply about the application of decision tree learning for learning functions in D_k, C_k, and P_k? [6 points]
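
To make the definition in part (c) concrete, the sketch below evaluates an even k-parity function on a feature vector. The function name `even_parity` and the 0-based index list `relevant` are assumptions made for illustration; only the k relevant variables are counted.

```python
def even_parity(x, relevant):
    """Return 1 if an even number of the relevant variables are 1, else 0.

    x        -- sequence of n values in {0, 1}
    relevant -- 0-based indices of the k relevant variables
    """
    ones = sum(x[i] for i in relevant)
    return 1 if ones % 2 == 0 else 0


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0]
    print(even_parity(x, [0, 2]))     # two 1's among the relevant variables
    print(even_parity(x, [0, 2, 3]))  # three 1's among the relevant variables
```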

  3. [Implementing Decision Trees - 75 points] In this programming assignment, you will implement a simple ID3-like decision tree learning algorithm and test it on a data set. We will use a data set similar to the one from the Badges Game. You may use the programming language of your choice. The data is available from the course web site in a file called Badges2. It is given as a list of names, each preceded by a label ’+’ or ’−’. Altogether there are 146 positive examples and 148 negative examples.

Your Program

Your program should perform the items listed below. Please note that your actual implementation of the decision tree algorithm should be independent from the feature extraction mechanism, as we may use it in other assignments. In particular, we may be requiring you to reuse this generic decision tree code for rules extraction and boosting later this semester.
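
One possible way to keep the tree code independent of feature extraction is to have the tree operate only on generic feature-to-value mappings. The sketch below is a minimal illustration under that assumption. The `Node` class, its fields, and the `classify` function are hypothetical names, not a required interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional


@dataclass
class Node:
    """A tree node over examples given as feature-name -> value dicts."""
    label: Optional[str] = None          # set on leaves only
    feature: Optional[Hashable] = None   # set on internal nodes only
    children: Dict[Hashable, "Node"] = field(default_factory=dict)
    default: Optional[str] = None        # fallback label for unseen values


def classify(node, example):
    """Follow the branch matching the example until a leaf is reached."""
    while node.label is None:
        child = node.children.get(example.get(node.feature))
        if child is None:
            # No branch for this value: fall back to the node's default.
            return node.default
        node = child
    return node.label
```

Because the tree sees only dictionary keys and values, the same code could be reused with a different feature extractor.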

  • Pre-process the data. For the purposes of this assignment, each example presented to the learning algorithm should take the form label string1 string2, where the two strings represent the first and last names, and neither contains any character other than ’a’ through ’z’. Therefore, your program’s first task is to clean the data so that only the first and last names of each person are used. Ignore things like middle initials and names, remove characters that are not letters, and make uppercase letters lowercase.
  • Perform feature extraction. Next, you need to extract features from the cleaned data. You will generate 22 features. 20 of these features represent the characters in various positions in the two strings. Specifically, the feature X(i, j) stands for the ith character in the jth string (i = 1, ..., 10; j = 1, 2). Since not all names will contain as many as 10 letters, the X(i, j) feature will need an additional symbol to represent a non-existent character. In this way, these 20 features each have 27 possible values. Also, some strings will be longer than 10 letters, in which case you should just ignore all letters beyond the tenth. The remaining two features describe the length of the first and second strings (which can be greater than 10). You should decide how to handle these two features.
  • Grow a decision tree. We will ask you to test two different splitting heuristics to build a decision tree from the training data. The first splitting criterion to test is the standard ID3 information gain heuristic. The second splitting criterion is intended to spread the examples evenly amongst the branches of the tree, without taking into consideration the distribution of labels. As a heuristic for that, at any node in the tree we want to choose the split that gives the most uniform distribution of examples to its descendants. To test for this first find what the expected number of examples
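
The pre-processing, feature extraction, and information gain steps described above can be sketched as follows. This is a minimal illustration, not a reference solution. It assumes each line of Badges2 is a label followed by whitespace-separated name words, uses a hypothetical `PAD` symbol for a non-existent character, and keeps the two length features as raw integers, since the assignment leaves their handling open. The second (uniform) splitting heuristic is not covered.

```python
import math
import re
from collections import Counter

PAD = "_"  # assumed 27th value: marks a position beyond the end of a name


def clean_example(line):
    """Turn a raw line such as "+ Mary-Ann K. O'Neil" into (label, first, last)."""
    label, *words = line.split()
    label = label.replace("\u2212", "-")  # normalize a Unicode minus sign
    # Lowercase each word and drop everything that is not a letter a-z.
    names = [re.sub(r"[^a-z]", "", w.lower()) for w in words]
    names = [n for n in names if n]
    return label, names[0], names[-1]     # middle names and initials ignored


def extract_features(first, last, max_pos=10):
    """Map two cleaned strings to 20 character features and 2 length features."""
    feats = {}
    for j, name in enumerate((first, last), start=1):
        for i in range(1, max_pos + 1):
            # X(i, j): i-th character of the j-th string, or PAD if too short.
            feats[("X", i, j)] = name[i - 1] if i <= len(name) else PAD
        feats[("len", j)] = len(name)
    return feats


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(examples, feature):
    """Entropy reduction from splitting (features, label) pairs on a feature."""
    labels = [y for _, y in examples]
    groups = {}
    for feats, y in examples:
        groups.setdefault(feats[feature], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

An ID3-style learner would compute `information_gain` for every remaining feature at a node and split on the feature with the largest value.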

This says that:

  • 70 test examples were predicted to belong to class + and actually did belong to class + (true positives).
  • 5 examples were predicted to be in class + but were actually in class − (false positives).
  • 45 test examples were predicted to belong to class − and actually did belong to class − (true negatives).
  • 30 examples were predicted to be in class − but were actually in class + (false negatives).

Finally, report the error rate. The error rate is the sum of the errors (here, 30 + 5) divided by the total number of examples (here, 150), in this case approximately 23%.
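
The arithmetic above can be reproduced with a small helper. The function name `error_rate` and its parameter names are illustrative assumptions.

```python
def error_rate(tp, fp, tn, fn):
    """Fraction of misclassified examples from the four confusion counts."""
    total = tp + fp + tn + fn
    return (fp + fn) / total


if __name__ == "__main__":
    # Counts taken from the example above: 35 errors out of 150 examples.
    print(f"{error_rate(tp=70, fp=5, tn=45, fn=30):.1%}")  # prints 23.3%
```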

Before you run your program on the data, you may wish to test it on a small set of examples for which you can construct the tree yourself (e.g., the data from Mitchell, exercise 3.2) for debugging purposes. You may also consider testing out your decision tree on the original Badges Game data, which is found in the handouts section of the website.

Evaluation

Once your program can perform the operations detailed above, evaluate its performance using 5-fold cross validation, as described below.

  • As a minimum, experiment with tree depths of 0 (meaning unlimited) and 3 using both information gain and uniform splitting criteria. This yields a total of 4 different “algorithms” to compare, where an algorithm A is defined as

A ≡ (splittingHeuristic, depth, featureSet)

Remember, this is the minimum. Feel free to experiment with more tree depths and feature sets. You don’t need to try every combination of splitting criteria, depth and feature set, but you need to have the 4 algorithms mentioned above.

  • For each algorithm you experiment with, run 5-fold cross validation on the given data set. This will determine an estimate for the algorithm’s performance pA on unseen examples. Calculate the 99% confidence interval of this estimate. You will need a table of t_{n,α} values to do this computation; one is available on the course website.
  • Rank your algorithms in decreasing order of the performance estimate pA. For each pair of consecutive algorithms in the ranking, show that the difference between the two algorithms’ performances is or is not statistically significant.
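
The cross validation and confidence interval steps can be sketched as follows. This is one common approach, assuming the five per-fold accuracies are treated as a sample with 4 degrees of freedom; the course may intend a different computation. The names `k_fold_indices` and `confidence_interval` are hypothetical, and the constant 4.604 is the standard two-sided 99% critical value of Student's t for 4 degrees of freedom, which should be checked against the course's table.

```python
import math
import random
import statistics

T_99_DF4 = 4.604  # two-sided 99% critical value, Student's t, 4 d.o.f.


def k_fold_indices(n_examples, k=5, seed=0):
    """Shuffle example indices and deal them into k nearly equal folds."""
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def confidence_interval(fold_scores, t_value=T_99_DF4):
    """Return (mean, low, high) for a t-based interval over per-fold scores."""
    k = len(fold_scores)
    mean = statistics.mean(fold_scores)
    half_width = t_value * statistics.stdev(fold_scores) / math.sqrt(k)
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    folds = k_fold_indices(294)        # 146 positive + 148 negative examples
    print([len(f) for f in folds])
    # Hypothetical per-fold accuracies, for illustration only (not results).
    print(confidence_interval([0.70, 0.80, 0.90, 0.80, 0.80]))
```

Each fold serves once as the test set while the remaining four folds form the training set.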

Hand In:

  • A hard copy report. Create a report detailing your experiments. For each algorithm, in the order of the ranking you created, describe the feature set and indicate the tree depth. Give the value of pA with its 99% confidence interval. Indicate when the difference between two consecutive algorithms in the ranking is statistically significant. You may provide these numbers in a table or in a graph with error bars. In the end, your conclusion will be that a particular algorithm (or set of algorithms) performed the best. Briefly state the assumptions on which this conclusion is based. Turn in a hard copy of this report along with the solutions to the other two problems.
  • Your code and tree print-outs, electronically. Hand in all the code you wrote. Also, for each algorithm you experimented with, include the print-out for the tree created during cross validation that had the best performance. (That print-out should contain both the tree and the performance table as described earlier.) Create a README file that contains your name and email address, a description of which algorithms correspond to which tree files, and enough information for someone to compile your code and run it. Place all files, including the tree files and README, in a directory called userID-hw2. Exclude executables and object files. Pack the files together so that when they unpack, the userID-hw2 directory is created with all your files in it. The name of the packed file should be userID-hw2.zip or userID-hw2.tar.gz. For example, if your user ID is jdoe and you wrote your solution in Java, this might be accomplished in Unix as follows:

    mkdir jdoe-hw2
    mv *.java *.tree README jdoe-hw2
    gtar zcvf jdoe-hw2.tar.gz jdoe-hw2

Submit this file via email to the TA.

Grading

  • Pre-process the Data [10 points]
  • Implementation of tree growing algorithms [30 points]
  • Display Tree [10 points]
  • Evaluation [20 points]
  • Other report elements (explanation of implementation and experiments, conclusions, etc.) [5 points]