Homework 1 for Machine Learning | CS 446, Assignments of Computer Science

University of Illinois - Urbana-Champaign Computer Science

Material Type: Assignment; Class: Machine Learning; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Unknown 1989;

Typology: Assignments

Pre 2010

Uploaded on 03/16/2009



CS446 Homework 1

• Due Wednesday, September 20th. Please hand your hardcopy to the TA at the beginning of class, when you enter the classroom (the TA will not be available after class).
• Late homework should be dropped in the box in 3332 SC. Slide it under the door if the room is not open.
• The file for problem 3 is also due on the same day. It should be sent by email as an attachment to [email protected]. You will receive a confirmation email within 24 hours; if not, contact the TA immediately.
• Feel free to talk to your classmates about the homework. We are more concerned that you learn how to solve the problem than that you demonstrate that you solved it entirely on your own. You should, however, write down your solution yourself and master the material to solve similar problems unaided. Keep the solution brief and clear.
• Please, no handwritten solutions. Be sure your name appears on the top of each page and that your pages are stapled together.
• Please present your algorithms in both pseudocode and English. That is, give a precise formulation of your algorithm as pseudocode and also explain in one or two concise paragraphs what your algorithm does. Be aware that pseudocode is much simpler and more abstract than real code. Take a look at the textbook pseudocode (e.g. Table 2.5 on page 33) to get an idea about the appropriate level of abstraction.

1. [Representing Boolean Functions: 10 points] (Based on Mitchell, exercise 3.1) Give decision trees to represent the following Boolean functions:

(a) ¬A ∨ ¬B ∨ C [3 points]
(b) A ∧ (¬B ∨ C) [3 points]
(c) (A ⊕ B) ∨ (C ∧ D) [4 points]
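A hand-drawn tree for problem 1 can be checked mechanically against the target function's truth table. The following is only a sketch of such a checker: the names `FUNCTIONS`, `truth_table`, and `tree_agrees` are made up for illustration, and every function is evaluated over all four variables A, B, C, D even when it ignores some of them.

```python
from itertools import product

# The three target functions from problem 1, written as Python predicates.
# All take (A, B, C, D) so they can share one truth-table routine.
FUNCTIONS = {
    "a": lambda A, B, C, D: (not A) or (not B) or C,
    "b": lambda A, B, C, D: A and ((not B) or C),
    "c": lambda A, B, C, D: (A != B) or (C and D),  # A != B is XOR on booleans
}

def truth_table(name):
    """Return {(A, B, C, D): bool} for the named target function."""
    f = FUNCTIONS[name]
    return {bits: bool(f(*bits)) for bits in product([False, True], repeat=4)}

def tree_agrees(tree_fn, name):
    """Check a candidate decision tree, written as a Python function of
    (A, B, C, D), against the target on all 16 assignments."""
    return all(bool(tree_fn(*bits)) == value
               for bits, value in truth_table(name).items())
```

A candidate tree is passed in as nested `if` statements inside `tree_fn`; for instance, `tree_agrees(lambda A, B, C, D: True, "a")` is false because the constant tree is wrong whenever A and B are true and C is false.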

2. [Space Complexity of Decision Trees: 15 points] Let x be a vector of n Boolean variables and let k be the number of relevant variables in the target function (k ≤ n).

(a) Let D_k be the class of monotone k-disjunctions (disjunctions of k of the n variables) over (x_1, x_2, ..., x_n). State the size of the smallest possible consistent decision tree for D_k in terms of n and k. Describe the shape of the resulting tree. [3 points]
(b) Let C_k be the class of monotone k-conjunctions (conjunctions of k of the n variables) over (x_1, x_2, ..., x_n). State the size of the smallest possible consistent decision tree for C_k in terms of n and k. Describe the shape of the resulting tree. [3 points]
(c) Let P_k be the class of k-parity functions (parity functions on k of the n variables) over (x_1, x_2, ..., x_n). The (odd) parity function evaluates to 1 if there is an odd number of 1's in the feature vector and evaluates to 0 if there is an even number of 1's in the feature vector. State the size of the smallest possible consistent decision tree for P_k in terms of n and k. [3 points]
(d) What do these results imply about the application of decision tree learning for learning functions in D_k, C_k, and P_k? [6 points]
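To make the three function classes concrete, here is a minimal sketch that evaluates a monotone k-disjunction, a monotone k-conjunction, and a k-parity function on a Boolean vector. The function names and the `relevant` index list (the positions of the k relevant variables, 0-based) are illustrative assumptions, not part of the assignment.

```python
def k_disjunction(x, relevant):
    """1 if at least one relevant variable is 1, else 0."""
    return int(any(x[i] for i in relevant))

def k_conjunction(x, relevant):
    """1 if every relevant variable is 1, else 0."""
    return int(all(x[i] for i in relevant))

def k_parity(x, relevant):
    """1 if an odd number of the relevant variables are 1, else 0."""
    return sum(x[i] for i in relevant) % 2

# n = 5 variables, k = 3 relevant ones (indices 0, 2, 3).
x = [1, 0, 1, 1, 0]
relevant = [0, 2, 3]
values = (k_disjunction(x, relevant),
          k_conjunction(x, relevant),
          k_parity(x, relevant))
```

The irrelevant positions (here 1 and 4) never influence the output, which is what "k of the n variables" means in each definition.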

3. [Implementing Decision Trees: 75 points] In this programming assignment, you will implement a simple ID3-like decision tree learning algorithm and test it on a data set. We will use a data set similar to the one from the Badges Game. You may use the programming language of your choice. Please note that your implementation of the decision tree algorithm should be independent of the feature extraction mechanism, as we may use it as part of other assignments. In particular, we may require you to reuse this generic decision tree code for rule extraction and boosting later this semester.

The data is available from the course web site: http://www.cs.uiuc.edu/class/fa06/cs446/

The data is given as a list of names, each preceded by a label '+' or '−'. It is split into two sets, Train and Test, consisting of 80% (235 examples) and 20% (59 examples) of the data, respectively. Altogether there are 134 positive examples (107 in Train) and 160 negative examples (128 in Train).

Part of your assignment is to pre-process the data and extract features from it. Use 22 features. Twenty of these features represent the characters in various positions in the two strings. For example, the feature X(i, j) stands for the ith character in the jth string (i = 1, 2, ..., 10; j = 1, 2). You should clean the data so that only the first and last names of each person are used. Ignore middle initials and names. Do not distinguish between lowercase and uppercase letters. You will need an additional symbol to represent whitespace, and you should pad each string to a length of 10. Note that some strings are longer than 10 characters, in which case you should just ignore all characters beyond the tenth. In this way, each of these 20 features has 27 possible values (26 letters plus the whitespace symbol). The remaining two features describe the lengths of the first and second strings (which can be greater than 10). You should decide how to handle these two features.
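The pre-processing described above could be sketched as follows. This is one possible reading, not the reference solution: it assumes each data line looks like `+ Firstname [middle ...] Lastname`, uses `_` as the whitespace symbol, maps any character outside a-z to that symbol so that each position keeps 27 possible values, and stores the two length features as raw integers. The names `extract_features`, `PAD`, and `MAX_LEN` are hypothetical.

```python
PAD = "_"      # assumed symbol for whitespace / padding
MAX_LEN = 10   # characters kept per name

def extract_features(line):
    """Turn one labeled line into (label, {feature_name: value}).

    Assumes the first token is the label and the remaining tokens are
    name parts; only the first and last name parts are used.
    """
    label, *tokens = line.split()
    names = (tokens[0].lower(), tokens[-1].lower())

    features = {}
    for j, name in enumerate(names, start=1):
        # Truncate to 10 characters, then pad shorter names to 10.
        padded = name[:MAX_LEN].ljust(MAX_LEN, PAD)
        for i, ch in enumerate(padded, start=1):
            # Assumption: anything that is not a-z becomes the pad symbol.
            features[f"X({i},{j})"] = ch if "a" <= ch <= "z" else PAD
        # Length is measured before truncation, so it may exceed 10.
        features[f"len({j})"] = len(name)
    return label, features
```

For example, `extract_features("+ Naoki Abe")` yields the label `+` and 22 features, with `X(1,1) = "n"`, `X(6,1) = "_"`, and `len(2) = 3`. Keeping this routine separate from the tree learner matches the requirement that the learner be independent of feature extraction.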

(You can automate your runs however you like; this is just FYI.) http://www.dartmouth.edu/~rc/classes/ksh/print_pages.shtml

It is sufficient to present your decision tree in this fashion:

feature 0 == x
  feature 1 == y
    feature 2 == z
      class = +
    feature 2 != z
      class = -
  feature 1 != y
    class = +
feature 0 != x
  feature 1 == r
    class = +
  feature 1 != r
    class = -

(Of course, use more descriptive feature names here so your output is comprehensible.)
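A tree in this style can be printed with a short recursive routine. The sketch below assumes a binary tree in which each internal node tests `feature == value` and has one subtree for the "equal" case and one for the "not equal" case; the `Leaf` and `Split` classes, the `format_tree` function, and the two-space indentation per level are illustrative choices, not something the assignment prescribes.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    label: str                 # "+" or "-"

@dataclass
class Split:
    feature: str               # descriptive feature name
    value: str                 # value tested with ==
    if_equal: "Node"           # subtree when feature == value
    if_not_equal: "Node"       # subtree when feature != value

Node = Union[Leaf, Split]

def format_tree(node, depth=0):
    """Return the tree as a list of text lines, indented by depth."""
    indent = "  " * depth
    if isinstance(node, Leaf):
        return [f"{indent}class = {node.label}"]
    lines = [f"{indent}{node.feature} == {node.value}"]
    lines += format_tree(node.if_equal, depth + 1)
    lines.append(f"{indent}{node.feature} != {node.value}")
    lines += format_tree(node.if_not_equal, depth + 1)
    return lines

# The example tree from the text, rebuilt by hand.
example_tree = Split(
    "feature 0", "x",
    Split("feature 1", "y",
          Split("feature 2", "z", Leaf("+"), Leaf("-")),
          Leaf("+")),
    Split("feature 1", "r", Leaf("+"), Leaf("-")),
)

if __name__ == "__main__":
    print("\n".join(format_tree(example_tree)))
```

Returning a list of lines rather than printing directly keeps the routine easy to test and lets the same output go to the SCRIPT log or the report.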

Your routine for testing the accuracy of a decision tree should print the results in the following form.

Test Cases    True    False
    75         70        5
    75         45       30

This says that:

70 test examples were predicted to belong to class + and actually did belong to class + (true positives).
5 examples were predicted to be in class + but were actually in class − (false positives).
45 test examples were predicted to belong to class − and actually did belong to class − (true negatives).
30 examples were predicted to be in class − but were actually in class + (false negatives).

Finally, report the error rate. The error rate is the sum of the errors (here, 30 + 5) divided by the total number of examples (here, 150), in this case 23%.
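The counts and the error rate can be computed as in the following sketch. It assumes predictions and true labels are parallel lists of `"+"` and `"-"` strings; the function names and the exact column widths are illustrative, and the report prints the class + row first and the class − row second, as in the example above.

```python
def confusion_counts(predicted, actual, positive="+"):
    """Count true/false positives and negatives over parallel label lists."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, a in zip(predicted, actual):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["tn" if a != positive else "fn"] += 1
    return counts

def error_rate(counts):
    """Errors (false positives + false negatives) over all examples."""
    total = sum(counts.values())
    return (counts["fp"] + counts["fn"]) / total

def format_report(counts):
    """Render the three-column table: one row per predicted class."""
    rows = [
        ("Test Cases", "True", "False"),
        (counts["tp"] + counts["fp"], counts["tp"], counts["fp"]),
        (counts["tn"] + counts["fn"], counts["tn"], counts["fn"]),
    ]
    return "\n".join(f"{a!s:>10} {b!s:>6} {c!s:>6}" for a, b, c in rows)

# Reproduce the worked example: 70 TP, 5 FP, 45 TN, 30 FN.
predicted = ["+"] * 75 + ["-"] * 75
actual = ["+"] * 70 + ["-"] * 5 + ["-"] * 45 + ["+"] * 30
counts = confusion_counts(predicted, actual)
rate = error_rate(counts)   # 35 / 150, about 0.233
```

With these inputs the error rate is 35/150, which rounds to the 23% quoted in the text.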

What to turn in

• Include your name and email in a file called README.
• Create a file called SCRIPT containing a log of the runs used in the final discussion. This can be generated using the script(1) command on Unix. If you are coding on Windows, cut and paste a log of the relevant runs into a file called SCRIPT.
• Submit the report as hardcopy. Limit the text of the report to a reasonable length. Please include a printout of your source code.
• Create the tarball so that it will unpack into a new directory named after your NetID. For example, if my NetID is jdoe, I copy all the source files, README, and SCRIPT into a directory called jdoe-hw1.
  Then archive: tar cvf jdoe-hw1.tar jdoe-hw1
  Then verify: tar tvf jdoe-hw1.tar
  Exclude executables and object files from the submission.
• The tarball will be submitted electronically by email as an attachment to [email protected].

You must include the following in your report:

• For the complete Train set, display at least three decision trees, two limiting the depth as described above (trying different depths). For each one, run the evaluation routine and present the error information.
• For the learning curve experiments, display a table of pairs (number of training examples, error rate).
• If you opted to use other sets of features, compare the results.

Grading

• Pre-process the data [10 points]
• Implementation to grow the tree using the information gain splitting heuristic [30 points]
• Display Tree [10 points]
• Evaluation [20 points]
• Report: explain implementation, language used, instructions to run [5 points]
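The information gain heuristic named in the grading list can be sketched as below. This follows the standard ID3 definition (entropy of the labels minus the weighted entropy after a multiway split on one feature); it assumes examples are dictionaries mapping feature names to values, and the function names are illustrative. A learner that uses binary `==` / `!=` tests would apply the same formula to a two-way partition instead.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, labels, feature):
    """Entropy reduction from splitting on every value of `feature`."""
    groups = defaultdict(list)
    for example, label in zip(examples, labels):
        groups[example[feature]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder

# Tiny illustration with two made-up features.
examples = [
    {"X(1,1)": "a", "X(2,1)": "b"},
    {"X(1,1)": "a", "X(2,1)": "c"},
    {"X(1,1)": "d", "X(2,1)": "b"},
    {"X(1,1)": "d", "X(2,1)": "c"},
]
labels = ["+", "+", "-", "-"]
gain_first = information_gain(examples, labels, "X(1,1)")   # separates classes
gain_second = information_gain(examples, labels, "X(2,1)")  # uninformative
```

At each node, an ID3-like learner would compute this quantity for every remaining feature and split on the one with the largest gain.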
