Homework 1 for Machine Learning | CS 446, Assignments of Computer Science

Material Type: Assignment; Class: Machine Learning; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Unknown 1989;

CS446 Homework 1

  • Due Wednesday, September 20th. Hand your hardcopy to the TA at the beginning of class, as you enter the classroom; the TA will not be available after class.
  • Late homework should be dropped in the box in 3332 SC. Slide under the door if it is not open.
  • The file for problem 3 is also due on the same day. It should be sent by email as an attachment to [email protected]. You will receive a confirmation email within 24 hours; if not, contact the TA immediately.
  • Feel free to talk to your classmates about the homework. We are more concerned that you learn how to solve the problem than that you demonstrate that you solved it entirely on your own. You should, however, write down your solution yourself and master the material to solve similar problems unaided. Keep the solution brief and clear.
  • Please, no handwritten solutions. Be sure your name appears on the top of each page and that your pages are stapled together.
  • Please present your algorithms in both pseudocode and English. That is, give a precise formulation of your algorithm as pseudocode and also explain in one or two concise paragraphs what your algorithm does. Be aware that pseudocode is much simpler and more abstract than real code. Take a look at the textbook pseudocode (e.g. Table 2.5 on page 33) to get an idea about the appropriate level of abstraction.
  1. [Representing Boolean Functions, 10 points] (Based on Mitchell, exercise 3.1) Give decision trees to represent the following Boolean functions:

(a) ¬A ∨ ¬B ∨ C [3 points]
(b) A ∧ (¬B ∨ C) [3 points]
(c) (A ⊕ B) ∨ (C ∧ D) [4 points]
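A hand-drawn tree can be checked against its target function by enumerating every assignment. The sketch below is only a checking aid, not part of the required solution; the names `FUNCTIONS`, `truth_table`, and `agrees` are illustrative, and a candidate tree is assumed to be written as a Python function of the four variables (variables a function does not use are simply ignored).

```python
from itertools import product

# The three target functions from problem 1, written as Python predicates.
# Each takes (A, B, C, D); functions (a) and (b) ignore D.
FUNCTIONS = {
    "a": lambda A, B, C, D: (not A) or (not B) or C,
    "b": lambda A, B, C, D: A and ((not B) or C),
    "c": lambda A, B, C, D: (A != B) or (C and D),  # A != B is XOR on Booleans
}

def truth_table(f, n_vars=4):
    """Map every assignment of n_vars Booleans to the value of f on it."""
    return {bits: bool(f(*bits)) for bits in product([False, True], repeat=n_vars)}

def agrees(tree_fn, target_fn, n_vars=4):
    """True if a candidate tree (as a function) matches the target everywhere."""
    return truth_table(tree_fn, n_vars) == truth_table(target_fn, n_vars)
```

For example, `agrees(my_tree, FUNCTIONS["c"])` returns `True` only if the hypothetical `my_tree` function classifies all 16 assignments the same way as (A ⊕ B) ∨ (C ∧ D).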

  2. [Space Complexity of Decision Trees, 15 points] Let x be a vector of n Boolean variables and let k be the number of relevant variables in the target function (k ≤ n).

(a) Let D_k be the class of monotone k-disjunctions (disjunction on k of the n variables) over (x_1, x_2, ..., x_n). State the size of the smallest possible consistent decision tree for D_k in terms of n and k. Describe the shape of the resulting tree. [3 points]
(b) Let C_k be the class of monotone k-conjunctions (conjunction on k of the n variables) over (x_1, x_2, ..., x_n). State the size of the smallest possible consistent decision tree for C_k in terms of n and k. Describe the shape of the resulting tree. [3 points]
(c) Let P_k be the class of k-parity functions (parity function on k of the n variables) over (x_1, x_2, ..., x_n). The (odd) parity function evaluates to 1 if there is an odd number of 1’s among the k relevant variables of the feature vector and to 0 if there is an even number. State the size of the smallest possible consistent decision tree for P_k in terms of n and k. [3 points]
(d) What do these results imply about the application of decision tree learning for learning functions in D_k, C_k, and P_k? [6 points]
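To make the three function classes concrete, here is a minimal sketch of one member of each. It assumes the feature vector is a sequence of 0/1 values and that `relevant` (a name introduced here for illustration) lists the indices of the k relevant variables; it says nothing about tree sizes.

```python
def k_disjunction(x, relevant):
    """Monotone disjunction: 1 if any relevant variable is 1."""
    return int(any(x[i] for i in relevant))

def k_conjunction(x, relevant):
    """Monotone conjunction: 1 only if every relevant variable is 1."""
    return int(all(x[i] for i in relevant))

def k_parity(x, relevant):
    """Odd parity: 1 if an odd number of the relevant variables are 1."""
    return sum(x[i] for i in relevant) % 2
```

With n = 5, k = 3, `x = (1, 0, 1, 1, 0)` and `relevant = (0, 2, 3)`, all three relevant variables are 1, so each of the three functions evaluates to 1; flipping `x[0]` to 0 leaves the disjunction at 1 but changes the conjunction and the parity to 0.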

  3. [Implementing Decision Trees, 75 points] In this programming assignment, you will implement a simple ID3-like decision tree learning algorithm and test it on a data set. We will use a data set similar to the one from the Badges Game. You may use the programming language of your choice. Your implementation of the decision tree algorithm should be independent of the feature extraction mechanism, since we may use it as part of other assignments. In particular, we may require you to reuse this generic decision tree code for rule extraction and boosting later this semester.

The data is available from the course web site: http://www.cs.uiuc.edu/class/fa06/cs446/

The data is given as a list of names, each preceded by a label ‘+’ or ‘−’. It is split into two sets, Train and Test, consisting of 80% (235 examples) and 20% (59 examples) of the data, respectively. Altogether there are 134 positive examples (107 in Train) and 160 negative examples (128 in Train).

Part of your assignment is to pre-process the data and extract features from it. Use 22 features. 20 of these features represent the characters in various positions in the two strings: the feature X(i, j) stands for the ith character in the jth string (i = 1, 2, ..., 10; j = 1, 2). Clean the data so that only the first and last names of each person are used. Ignore middle initials and middle names. Do not distinguish between lowercase and uppercase letters. You will need an additional symbol to represent whitespace, and you should pad each string to a length of 10. Some strings are longer than 10 characters, in which case you should ignore all characters beyond the tenth. Each of these 20 features therefore has 27 possible values (26 letters plus the whitespace symbol).

The remaining two features describe the lengths of the first and second strings (which can be greater than 10). You should decide how to handle these two features.
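The pre-processing steps described above can be sketched as follows. This is one possible reading, not the required implementation: it assumes each data line looks like `+ First M. Last` with whitespace-separated tokens, uses `_` as the padding symbol, maps any character outside a-z to that symbol so the 27-value limit holds, and leaves the two length features as raw integers. The names `parse_line` and `extract_features` are illustrative.

```python
PAD = "_"    # assumed symbol for whitespace/padding (the 27th value)
WIDTH = 10   # number of character positions kept per string

def parse_line(line):
    """Split a line such as '+ First M. Last' into (label, first, last).

    Only the first and last name tokens are kept, lowercased; anything in
    between (middle names or initials) is dropped.
    """
    label, *names = line.split()
    names = [n.lower() for n in names]
    return label, names[0], names[-1]

def extract_features(first, last):
    """Return the 22 features as a dict keyed by descriptive names."""
    feats = {}
    for j, name in enumerate((first, last), start=1):
        # Truncate to WIDTH characters, then pad on the right.
        padded = name[:WIDTH].ljust(WIDTH, PAD)
        for i, ch in enumerate(padded, start=1):
            # Assumption: non-letters (hyphens, apostrophes) become PAD.
            feats[f"X({i},{j})"] = ch if "a" <= ch <= "z" else PAD
    # Length features use the full, untruncated strings.
    feats["len(1)"] = len(first)
    feats["len(2)"] = len(last)
    return feats
```

Keeping the output as a plain feature dictionary keeps the tree learner independent of how features were produced, which is what the assignment asks for. Whether the length features should be bucketed before learning is left open here, as it is in the assignment.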

(You can automate your runs however you like; this is just FYI.)
http://www.dartmouth.edu/~rc/classes/ksh/print_pages.shtml

It is sufficient to present your decision tree in this fashion:

feature 0 == x
    feature 1 == y
        feature 2 == z
            class = +
        feature 2 != z
            class = -
    feature 1 != y
        class = +
feature 0 != x
    feature 1 == r
        class = +
    feature 1 != r
        class = -
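A display routine for this kind of output could look like the sketch below. It assumes a hypothetical tree representation that is not prescribed by the assignment: a leaf is a class label string, and an internal node is a tuple `(feature, value, if_equal, if_not_equal)` holding a binary test and its two subtrees.

```python
def format_tree(node, depth=0, indent="    "):
    """Return the lines of an indented '==' / '!=' rendering of the tree."""
    pad = indent * depth
    if isinstance(node, str):
        # Leaf: just the predicted class.
        return [f"{pad}class = {node}"]
    feature, value, if_equal, if_not_equal = node
    lines = [f"{pad}{feature} == {value}"]
    lines += format_tree(if_equal, depth + 1, indent)
    lines.append(f"{pad}{feature} != {value}")
    lines += format_tree(if_not_equal, depth + 1, indent)
    return lines

# A tree with the same shape as the sample output.
example_tree = (
    "feature 0", "x",
    ("feature 1", "y", ("feature 2", "z", "+", "-"), "+"),
    ("feature 1", "r", "+", "-"),
)
```

Calling `print("\n".join(format_tree(example_tree)))` produces the nested listing, one test or class per line, with each level indented four spaces further than its parent.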

(Of course, use more descriptive feature names here so your output is comprehensible.)

Your routine for testing the accuracy of a decision tree should print the results in the following form.

     Test Cases   True   False
  +      75        70      5
  −      75        45     30

This says that:

  • 70 test examples were predicted to belong to class + and actually did belong to class + (true positives).
  • 5 examples were predicted to be in class + but were actually in class − (false positives).
  • 45 test examples were predicted to belong to class − and actually did belong to class − (true negatives).
  • 30 examples were predicted to be in class − but were actually in class + (false negatives).

Finally, report the error rate. The error rate is the number of errors (here, 30 + 5 = 35) divided by the total number of examples (here, 150), which gives about 23%.
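The counts and the error rate above can be computed as in this sketch. It assumes predictions and true labels are parallel lists of the ASCII strings `"+"` and `"-"`, and that rows are grouped by predicted class, as in the sample table; the function names and the exact column widths are illustrative.

```python
def evaluate(predicted, actual):
    """Return ({predicted class: [true, false]}, error rate)."""
    counts = {"+": [0, 0], "-": [0, 0]}
    for p, a in zip(predicted, actual):
        # Index 0 counts correct predictions, index 1 counts errors.
        counts[p][0 if p == a else 1] += 1
    errors = counts["+"][1] + counts["-"][1]
    return counts, errors / len(actual)

def report(predicted, actual):
    """Format the results table followed by the error rate."""
    counts, error_rate = evaluate(predicted, actual)
    lines = ["  Test Cases  True  False"]
    for label in "+-":
        true, false = counts[label]
        lines.append(f"{label} {true + false:10d} {true:5d} {false:6d}")
    lines.append(f"Error rate: {error_rate:.1%}")
    return "\n".join(lines)
```

On the example from the text (70 true positives, 5 false positives, 45 true negatives, 30 false negatives), `evaluate` returns an error rate of 35/150, which `report` prints as 23.3%.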

What to turn in

  • Include your name and email in a file called README.
  • Create a file called SCRIPT containing a log of the runs used in the final discussion. This can be generated using the script(1) command on Unix. If you are coding on Windows, cut and paste a log of the relevant runs into a file called SCRIPT.
  • Submit the report as hardcopy. Limit the text of the report to a reasonable length. Please include a printout of your source code.
  • Create the tarball so that it unpacks into a new directory named after your NetID. For example, if your NetID is jdoe, copy all the source files, README, and SCRIPT into a directory called jdoe-hw1. Then archive:
      tar cvf jdoe-hw1.tar jdoe-hw1
    Then verify:
      tar tvf jdoe-hw1.tar
    Exclude executables and object files from the submission.
  • Submit the tarball electronically by email as an attachment to [email protected].

You must include the following in your report:

  • For the complete Train set, display at least three decision trees, two of them limiting the depth as described above (trying different depths). For each one, run the evaluation routine and present the error information.
  • For the learning curve experiments, display a table of pairs (number of training examples, error rate).
  • If you opted to use other sets of features, compare the results.

Grading

  • Pre-process the data [10 points]
  • Implementation to grow the tree using the information gain splitting heuristic [30 points]
  • Display Tree [10 points]
  • Evaluation [20 points]
  • Report - explain implementation, language used, instructions to run [5 points]
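
The information gain splitting heuristic named in the grading list is the usual ID3 criterion: the entropy of the labels minus the weighted entropy after splitting on a feature. The sketch below shows that computation only, under the assumption that examples are feature dictionaries and that a split creates one branch per feature value; the function names are illustrative and the tree-growing loop is left out.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, feature):
    """Entropy reduction from splitting on `feature`.

    examples: list of dicts mapping feature name -> value
    labels:   list of class labels, parallel to examples
    """
    n = len(labels)
    # Group the labels by the value each example takes on this feature.
    partitions = {}
    for example, label in zip(examples, labels):
        partitions.setdefault(example[feature], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder
```

At each node, an ID3-like learner would evaluate `information_gain` for every remaining feature and split on the one with the highest value.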