Solved Problem Set - Machine Learning | CS 446, Assignments of Computer Science

Material Type: Assignment; Professor: Roth; Class: Machine Learning; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2008;

CS446: Pattern Recognition and Machine Learning Fall 2008
Problem Set 1
Handed Out: September 2, 2008 Due: September 11, 2008


  • Feel free to talk to other members of the class in doing the homework. I am more concerned that you learn how to solve the problem than that you demonstrate that you solved it entirely on your own. You should, however, write down your solution yourself. Please try to keep the solution brief and clear.
  • Feel free to send me email or come to ask questions.
  • Please, no handwritten solutions.
  • Please present your algorithms in both pseudocode and English. That is, give a precise formulation of your algorithm as pseudocode and also explain in one or two concise paragraphs what your algorithm does. Be aware that pseudocode is much simpler and more abstract than real code. Take a look at the textbook pseudocode (e.g. Table 2.5 on page 33) to get an idea about the appropriate level of abstraction.
  • The homework is due at 4:00 pm on the due date. Please email an electronic copy of your write-up and any code to the TA: [email protected].
  1. [Learning as a Search - 20 points] (Based on Mitchell, exercise 2.4) Let X be an instance space consisting of all of the points (x, y) in the plane with integer (not real) coordinates.

[Figure: grid of integer points; both axes run from -10 to 10.]

Let H be a set of hypotheses consisting of all the origin-centered “bagels”. Formally, hypotheses are of the form $a < \sqrt{x^2 + y^2} < b$, where $a < b$ and $a, b \in \mathbb{Z}$ (the set of non-negative integers).

a. Consider the version space with respect to the set of positive (+) and negative (×) training examples shown above:

+ : (5, 5), (−6, 4), (−3, −4), (2, −4)

× : (−1, 2), (−2, 0), (6, 7), (8, −8)

What is the S boundary set of the version space in this case? Write out the hypotheses in the form given above and draw them in on the diagram. [5 points]

b. What is the G boundary set of this version space? Write out the hypotheses and draw them in. [5 points]

c. Suppose the learner may now suggest a new (x, y) instance and ask the trainer for its classification. Suggest a query guaranteed to reduce the size of the version space, regardless of how the trainer classifies it. Suggest one that will not reduce the size of the version space, regardless of how the trainer classifies it. [5 points]

d. There are many other possible hypothesis spaces that can explain this data. Propose one alternate hypothesis space and explicitly define its parameters as we did with a, b in the bagel space above. Choose an instance from your hypothesis space that separates the given data. Write out this hypothesis and sketch it. What can you say about the number of parameters in your space compared to the bagel space and the given number of training points? [5 points]
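The version space in parts (a) and (b) is small enough to enumerate by brute force. The following is a minimal Matlab sketch, not part of the original handout, that lists every bagel (a, b) consistent with the training data. The names `inBagel`, `maxRadius`, and `consistent` are illustrative assumptions, and the search bound of 12 is chosen only because it exceeds the distance of every training point from the origin.

```matlab
% Training data from part (a); each row is one (x, y) point.
pos = [5 5; -6 4; -3 -4; 2 -4];
neg = [-1 2; -2 0; 6 7; 8 -8];

% Squared distances from the origin (integers, so no rounding issues).
r2pos = sum(pos.^2, 2);
r2neg = sum(neg.^2, 2);

% A bagel (a, b) labels a point positive iff a^2 < x^2 + y^2 < b^2,
% which is the same test as a < sqrt(x^2 + y^2) < b for non-negative a, b.
inBagel = @(r2, a, b) (r2 > a^2) & (r2 < b^2);

maxRadius = 12;            % assumed search bound (illustrative)
consistent = zeros(0, 2);  % one row [a b] per consistent hypothesis
for a = 0:maxRadius-1
    for b = a+1:maxRadius
        coversAllPos = all(inBagel(r2pos, a, b));
        coversNoNeg  = ~any(inBagel(r2neg, a, b));
        if coversAllPos && coversNoNeg
            consistent(end+1, :) = [a b]; %#ok<AGROW>
        end
    end
end
disp(consistent)
```

Among the printed rows, the narrowest ring is the candidate for the S boundary and the widest ring is the candidate for the G boundary. A query point helps only if the listed hypotheses disagree on it.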

  2. [Learning Conjunction - 40 points] (Based on Mitchell, exercise 2.9) Consider a learning problem where each instance is a Boolean vector over n variables $(x_1, \ldots, x_n)$ and is labeled either positive (1) or negative (-1). Thus a typical instance would be

$(1, 0, \ldots, 1) = (x_1 = 1, x_2 = 0, \ldots, x_n = 1)$

Now consider a hypothesis space H consisting of all conjunctions of constraints over these variables. For example, a typical hypothesis would be

$(x_1 = 1 \wedge x_5 = 0 \wedge x_7 = 1)$

According to this hypothesis, the instance (1, 0, 1, 0, 0, 1, 1, ..., 0) is labeled positive (1) and the instance (0, 0, 1, 0, 0, 1, 0, ..., 1) is labeled negative (-1).

a. Propose an algorithm that accepts a sequence of labeled training examples and outputs a hypothesis that is consistent with all the training examples, if one exists. If there are no consistent hypotheses, your algorithm should indicate as much and halt. [10 points]

b. Prove the correctness of your algorithm. That is, argue that the hypothesis it produces labels all the training examples correctly. [10 points]

c. Analyze the complexity of your algorithm as a function of the number of variables (n) and the number of training examples (m). Your algorithm should run in time polynomial in n and m. [10 points]

d. Assume that there exists a conjunction in H that is consistent with all of the training examples. Now, you are given a new, unlabeled example that is not part of your training set. What can you say about the ability of the hypothesis derived by your algorithm in (a) to produce the correct label for this example? Consider interesting subcases in order to say as much as you can about this. [10 points]
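One common approach to part (a) is literal elimination: start with every literal and let each positive example strike out the literals it violates. The sketch below is an illustration under stated assumptions, not the handout's reference solution. The toy data and the names `mustBe1`, `mustBe0`, `predictPos`, and `isConsistent` are hypothetical.

```matlab
% Hypothetical toy data (not from the assignment): n = 3 variables, m = 4 examples.
X = [1 0 1; 1 1 1; 0 0 1; 1 0 0];   % each row is one Boolean instance
y = [1; 1; -1; -1];                 % labels in {1, -1}

n = size(X, 2);
% mustBe1(i) is true while the literal x_i = 1 is still in the conjunction;
% mustBe0(i) plays the same role for x_i = 0. Start with every literal present.
mustBe1 = true(1, n);
mustBe0 = true(1, n);

% Each positive example removes the literals it violates.
P = X(y == 1, :);
for k = 1:size(P, 1)
    mustBe1 = mustBe1 & (P(k, :) == 1);
    mustBe0 = mustBe0 & (P(k, :) == 0);
end

% The hypothesis predicts positive iff every remaining literal is satisfied.
predictPos = @(x) all(x(mustBe1) == 1) && all(x(mustBe0) == 0);

% The remaining literals form the most specific conjunction that covers the
% positives, so if it also covers a negative, no conjunction is consistent.
N = X(y == -1, :);
isConsistent = true;
for k = 1:size(N, 1)
    if predictPos(N(k, :))
        isConsistent = false;
    end
end
```

On this toy data the surviving literals are x_1 = 1 and x_3 = 1, and neither negative example satisfies both. The procedure touches each of the n variables once per example, so it uses on the order of n·m operations.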

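Note that inequality (4) of the LP is equivalent to $2x_1 + x_3 \le 4$.

$$\vec{c} = \begin{pmatrix} 0.1 \\ 0.05 \\ 0.25 \end{pmatrix}, \quad \vec{b} = \begin{pmatrix} 7 \\ 9 \\ -4 \end{pmatrix}, \quad \vec{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad A = \begin{pmatrix} 3 & 1 & 2 \\ 1 & 3 & 2 \\ -2 & 0 & -1 \end{pmatrix}$$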

To solve this program using Matlab:

    c = [0.1; 0.05; 0.25];            % objective (cost) coefficients
    A = [3 1 2; 1 3 2; -2 0 -1];
    b = [7; 9; -4];
    lowerBound = zeros(3, 1);         % this constrains x >= 0
    % linprog expects A*x <= b, so negate both sides of A*x >= b
    [x, z] = linprog(c, -A, -b, [], [], lowerBound)

The results of this linear program show that to meet your nutritional requirements at a minimum cost, you should eat 1.5 eggs, 2.5 servings of pasta, and no servings of yogurt at a cost of $0.275.
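The reported solution can be checked by hand against the constraints A x ≥ b and the cost vector c. The short sketch below does that arithmetic; it assumes the ordering x = (eggs, pasta, yogurt), which is the ordering under which the stated cost of $0.275 comes out. The names `slack` and `cost` are illustrative.

```matlab
c = [0.1; 0.05; 0.25];
A = [3 1 2; 1 3 2; -2 0 -1];
b = [7; 9; -4];

x = [1.5; 2.5; 0];     % reported solution, assumed order: eggs, pasta, yogurt

slack = A * x - b;     % nonnegative entries mean A*x >= b holds
cost  = c' * x;        % objective value at x

fprintf('slack = [%g %g %g], cost = %g\n', slack, cost);
% prints: slack = [0 0 1], cost = 0.275
```

The first two constraints are tight (zero slack), and the third has a slack of 1.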

We can use this framework to define the problem of learning a linear discriminant function^3. Let $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_m$ ($\vec{x}_i \in \mathbb{R}^n$) represent n-dimensional samples, $\vec{y} \in \{-1, 1\}^m$ be an m-by-1 vector representing the labels of the m respective samples, $\vec{w} \in \mathbb{R}^n$ be an n-by-1 vector representing the weights of the linear discriminant function, and $\theta$ be the threshold value. We define $\vec{x}$ to be a positive example when $\vec{w}^T \vec{x} \ge \theta$ and a negative example when $\vec{w}^T \vec{x} < \theta$. With the introduction of an artificial variable $\delta$, we can say that we want to minimize the value of $\delta$ such that

$$y_i \cdot \vec{w}^T \vec{x}_i + \delta \ge y_i \cdot \theta.$$

Therefore, we can use the linear program formulation stated previously:

$$z = \delta \to \min$$

$$y_i \cdot \vec{w}^T \vec{x}_i + \delta \ge y_i \cdot \theta \quad \forall i = 1, \ldots, m$$

$$\delta \ge 0$$

Note also that $w_i$ (for all i) and $\theta$ are unconstrained. Given that $y_i \in \{-1, 1\}$, you should be able to convince yourself that the above formulation works for both positive and negative examples. It is similar to the fat intake constraint in the example problem.

^3 Note that this discussion closely parallels the linear programming representation found in Duda, Hart, and Stork, Pattern Classification.
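To pass this formulation to `linprog`, the unknowns can be stacked into a single decision vector and each constraint rewritten in the A t ≤ b form. The sketch below shows one possible encoding under stated assumptions: the stacking order t = [w; θ; δ], the toy dataset, and the names `Aineq`, `bineq`, `lb`, and `tHand` are all illustrative and do not come from the provided course files.

```matlab
% Hypothetical toy data: the monotone conjunction x1 AND x2 over {0,1}^2.
X = [0 0; 0 1; 1 0; 1 1];    % m-by-n, one sample per row
y = [-1; -1; -1; 1];         % labels in {-1, 1}
[m, n] = size(X);

% Decision vector t = [w; theta; delta], of length n + 2.
f = [zeros(n + 1, 1); 1];    % objective: minimize delta

% y_i*w'*x_i + delta >= y_i*theta is rewritten as
% -y_i*x_i'*w + y_i*theta - delta <= 0, the A*t <= b form linprog expects.
Aineq = [-diag(y) * X, y, -ones(m, 1)];
bineq = zeros(m, 1);

% w and theta are unconstrained; delta >= 0.
lb = [-Inf(n + 1, 1); 0];

% Sanity check with a hand-picked separator: w = [1; 1], theta = 1.5, delta = 0.
tHand = [1; 1; 1.5; 0];
feasible = all(Aineq * tHand <= bineq);

% With the Optimization Toolbox available, the LP could then be solved as:
% t = linprog(f, Aineq, bineq, [], [], lb);
```

The hand-picked separator satisfies every constraint with δ = 0. Note that the all-zero vector also satisfies the constraints as written, so a solver may return it; what that implies about δ and θ is worth considering when interpreting results.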

a. [7 points] We have given you a few pieces of Matlab code. The first file, LPdisc_given.m, contains some code to read files and plot data points and lines. You should use this given code as a starting point for your own solution. Generate a dataset by hand which represents a monotone conjunction over two variables. You should constrain the values of your variables to be 0 or 1. Take a look at hw1_data.dat for something similar. Using this dataset, generate a linear discriminant function using the LP formulation above and plot it.

b. [13 points] Now, consider the data set in hw1_data.dat. It is consistent with a conjunctive concept (as in Problem 2) with n = 10. Use your linear program to learn the target concept. State the linear discriminant function returned and succinctly explain your results. For example, what conjunctive concept is represented by the hypothesis returned, and how can this be known from the hyperplane your function returns? (Or can this be known?) What can be said of δ and θ? (The next problem may help you answer this.) In addition to your explanations, you should also give the actual weights and the values of δ and θ reported by your program.

c. [20 points] The second program we gave you is randgen.m, which generates sets of random points taken from a multivariate Gaussian distribution that you specify with the appropriate mean and covariance parameters. You may want to use Matlab’s help function to see how it is used. For the first part of this exercise, you should generate 2 sets of points D_0 and D_1, where the label of class D_0 is 1 and the label of class D_1 is -1. We would like to see plots of this dataset, so sample your points from bivariate Gaussians. Now, using the program you wrote for the above two sections, compute a linear discriminant function and plot it. You should experiment with different means and covariances, and perhaps the number of points. Here are two cases you should try:

  1. [pos, neg] = randgen(50, [5, 5], [1, 1; 0, .1], [2, 5], [1, 1; 0, .1])
  2. [pos, neg] = randgen(50, [1, 3], [1, 0; 0, 1], [4, 3], [1, 0; 0, 1])

In your report, give your results for the above two cases, and any other cases which you think are interesting, and explain why they are interesting. In particular, what effect do your test cases have on δ? What does minimizing δ in your objective function accomplish? Give a brief explanation in your report.

If you are having trouble using Matlab, please look at the file LPdisc_given.m, noting that Matlab supplies help for most commands using the help syntax. There are also some Matlab resources available from the CS450 website (http://www.cse.uiuc.edu/cs450/matlab/index.html).