



Material Type: Assignment; Professor: Roth; Class: Machine Learning; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2008;




CS446: Pattern Recognition and Machine Learning Fall 2008
Handed Out: September 2, 2008 Due: September 11, 2008
Let H be a set of hypotheses consisting of all the origin centered “bagels”. Formally, hypotheses are of the form (a <
x^2 + y^2 < b), where a < b, and a, b ∈ Z (the set of non-negative integers).
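To make the hypothesis form concrete, here is a minimal Matlab sketch of how a single bagel hypothesis labels a point. The handle name bagel and the sample values a = 1, b = 9 are illustrative assumptions, not part of the assignment.

```matlab
% Illustrative sketch: label a point with the bagel hypothesis (a < x^2 + y^2 < b).
% The name "bagel" and the values of a and b are assumptions for this example.
bagel = @(a, b, x, y) (a < x.^2 + y.^2) & (x.^2 + y.^2 < b);

a = 1; b = 9;                    % hypothetical hypothesis parameters, a < b
inside   = bagel(a, b, 2, 0);    % 2^2 + 0^2 = 4 lies strictly between 1 and 9
onEdge   = bagel(a, b, 3, 0);    % 3^2 + 0^2 = 9 is not strictly less than 9
atCenter = bagel(a, b, 0, 0);    % 0 is not strictly greater than 1
```

Because both inequalities are strict, points that lie exactly on either boundary circle are labeled negative.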
a. Consider the version space with respect to the set of positive (+) and negative (×) training examples shown above:
What is the S boundary set of the version space in this case? Write out the hypotheses in the form given above and draw them in on the diagram. [5 points]
b. What is the G boundary set of this version space? Write out the hypotheses and draw them in. [5 points]
c. Suppose the learner may now suggest a new (x, y) instance and ask the trainer for its classification. Suggest a query guaranteed to reduce the size of the version space, regardless of how the trainer classifies it. Suggest one that will not reduce the size of the version space, regardless of how the trainer classifies it. [5 points]
d. There are many other possible hypothesis spaces that can explain this data. Propose one alternate hypothesis space and explicitly define its parameters as we did with a, b in the bagel space above. Choose an instance from your hypothesis space that separates the given data. Write out this hypothesis and sketch it. What can you say about the number of parameters in your space compared to the bagel space and the given number of training points? [5 points]
(1, 0, ..., 1) = (x_1 = 1, x_2 = 0, ..., x_n = 1)
Now consider a hypothesis space H consisting of all conjunctions of constraints over these variables. For example, a typical hypothesis would be
(x_1 = 1 ∧ x_5 = 0 ∧ x_7 = 1)
Under this hypothesis, the instance (1, 0, 1, 0, 0, 1, 1, ..., 0) is labeled positive (1) and the instance (0, 0, 1, 0, 0, 1, 0, ..., 1) is labeled negative (-1).
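The labeling rule can be sketched in Matlab as follows. The hypothesis is stored in two hypothetical vectors, idx (the constrained variable indices) and val (the required values). The instances keep only the first seven coordinates given in the text, since the elided coordinates are not constrained by this hypothesis.

```matlab
% Illustrative sketch: evaluate the conjunction (x_1 = 1 AND x_5 = 0 AND x_7 = 1).
% idx, val, and label are names assumed for this example.
idx = [1 5 7];        % indices of the constrained variables
val = [1 0 1];        % value each constrained variable must take

% Returns 1 if every constraint holds, and -1 otherwise.
label = @(x) 2 * all(x(idx) == val) - 1;

% Only the first seven coordinates of the two instances in the text are used;
% the elided coordinates are unconstrained by this hypothesis.
xPos = [1 0 1 0 0 1 1];
xNeg = [0 0 1 0 0 1 0];

yPos = label(xPos);   % all three constraints hold
yNeg = label(xNeg);   % x_1 = 0 violates the first constraint
```

Variables that do not appear in idx never influence the label, which is why the truncated instances are sufficient here.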
a. Propose an algorithm that accepts a sequence of labeled training examples and outputs a hypothesis that is consistent with all the training examples, if one exists. If there are no consistent hypotheses, your algorithm should indicate this and halt. [10 points]
b. Prove the correctness of your algorithm. That is, argue that the hypothesis it produces labels all the training examples correctly. [10 points]
c. Analyze the complexity of your algorithm as a function of the number of variables (n) and the number of training examples (m). Your algorithm should run in time polynomial in n and m. [10 points]
d. Assume that there exists a conjunction in H that is consistent with all of the training examples. Now, you are given a new, unlabeled example that is not part of your training set. What can you say about the ability of the hypothesis derived by your algorithm in (a) to produce the correct label for this example? Consider interesting subcases in order to say as much as you can about this. [10 points]
Note that inequality (4) of the LP is equivalent to 2 x_1 + x_3 ≤ 4.
c = (0.1, 0.05, 0.25)^T, b = (7, 9, -4)^T, x = (x_1, x_2, x_3)^T
To solve this program using Matlab:
c = [0.1; 0.05; 0.25];
A = [3 1 2; 1 3 2; -2 0 -1];
b = [7; 9; -4];
lowerBound = zeros(3, 1); % this constrains x >= 0
% linprog expects A*x <= b, so A and b are negated to express A*x >= b
[x, z] = linprog(c, -A, -b, [], [], lowerBound)
The results of this linear program show that to meet your nutritional requirements at a minimum cost, you should eat 1.5 eggs, 2.5 servings of pasta, and no servings of yogurt at a cost of $0.275.
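The reported solution can be checked by substitution. The sketch below reuses c, A, and b from the program above and plugs in x = (1.5, 2.5, 0). The tolerance tol is an assumption added to absorb floating-point rounding.

```matlab
% Sanity check of the reported optimum (a sketch, not part of the original handout).
c = [0.1; 0.05; 0.25];
A = [3 1 2; 1 3 2; -2 0 -1];
b = [7; 9; -4];

x = [1.5; 2.5; 0];            % eggs, pasta, yogurt as reported in the text
tol = 1e-9;                   % assumed tolerance for floating-point comparison

feasible = all(A * x >= b - tol) && all(x >= 0);   % checks A*x >= b and x >= 0
cost = c' * x;                                     % 0.1*1.5 + 0.05*2.5 + 0.25*0
```

The first two constraints hold with equality at this point (7 and 9), while the third has slack, since 2(1.5) + 0 = 3 is below 4.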
We can use this framework to define the problem of learning a linear discriminant function^3. Let x_1, x_2, ..., x_m (x_i ∈ R^n) represent n-dimensional samples, y ∈ {-1, 1}^m be an m-by-1 vector representing the labels of the m respective samples, w ∈ R^n be an n-by-1 vector representing the weights of the linear discriminant function, and θ be the threshold value. We define x to be a positive example when w^T x ≥ θ and a negative example when w^T x < θ. With the introduction of an artificial variable δ, we can say that we want to minimize the value of δ such that
y_i · w^T x_i + δ ≥ y_i · θ.
Therefore, we can use the linear program formulation stated previously.
z = δ → min
subject to
y_i · w^T x_i + δ ≥ y_i · θ, for all i = 1, ..., m
δ ≥ 0
Note also that the weights w_i (for all i) and θ are unconstrained. Given that y_i ∈ {-1, 1}, you should be able to convince yourself that the above formulation works for both positive and negative examples. It is similar to the fat intake constraint in the example problem.
(^3) Note that this discussion closely parallels the linear programming representation found in Duda, Hart, and Stork, Pattern Classification.
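One way to cast the formulation above into the linprog call used for the diet example is sketched below. The stacking order v = [w; theta; delta], the variable names, and the four sample points are all assumptions made for illustration. They are not taken from the provided files.

```matlab
% Sketch of the discriminant LP in linprog form (requires the Optimization Toolbox).
% Decision vector v = [w; theta; delta]; this ordering is an assumption.
X = [2 1; 3 2; -1 0; -2 -1];  % hypothetical samples, one per row (m-by-n)
y = [1; 1; -1; -1];           % hypothetical labels in {-1, 1}
[m, n] = size(X);

f = [zeros(n + 1, 1); 1];     % objective: minimize delta only

% Each constraint y_i * w' * x_i - y_i * theta + delta >= 0 is negated
% so that it fits linprog's A*v <= b convention.
A = -[diag(y) * X, -y, ones(m, 1)];
b = zeros(m, 1);

lb = [-inf(n + 1, 1); 0];     % w and theta unconstrained, delta >= 0
[v, z] = linprog(f, A, b, [], [], lb);

w = v(1:n); theta = v(n + 1); delta = v(n + 2);
```

Note that w = 0, theta = 0, delta = 0 satisfies every constraint as written, so the returned weights should be inspected rather than assumed to separate the data.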
a. [7 points] We have given you a few pieces of Matlab code. The first file, LPdisc given.m, contains some code to read files and plot data points and lines. You should use this given code as a starting point for your own solution. Generate a dataset by hand which represents a monotone conjunction over two variables. You should constrain the values of your variables to be 0 or 1. Take a look at hw1 data.dat for something similar. Using this dataset, generate a linear discriminant function using the LP formulation above and plot it.
b. [13 points] Now, consider the data set in hw1 data.dat. It is consistent with a conjunctive concept (as in Problem 2) with n = 10. Use your linear program to learn the target concept. State the linear discriminant function returned and succinctly explain your results. For example, what conjunctive concept is represented by the hypothesis returned, and how can this be known from the hyperplane your function returns? (Or can this be known?) What can be said of δ and θ? (The next problem may help you answer this.) In addition to your explanations, you should also give the actual weights and the values of δ and θ reported by your program.
c. [20 points] The second program we gave you is randgen.m, which generates sets of random points taken from a multivariate Gaussian distribution that you specify with the appropriate mean and covariance parameters. You may want to use Matlab's help function to see how it is used. For the first part of this exercise, you should generate 2 sets of points D_0 and D_1, where the label of class D_0 is 1 and the label of class D_1 is -1. We would like to see plots of this dataset, so sample your points from bivariate Gaussians. Now, using the program you wrote for the above two sections, compute a linear discriminant function and plot it. You should experiment with different means and covariances, and perhaps the number of points. Here are two cases you should try:
In your report, give your results for the above two cases, and any other cases which you think are interesting, and explain why they are interesting. In particular, what effect do your test cases have on δ? What does minimizing δ in your objective function accomplish? Give a brief explanation in your report. If you are having trouble using Matlab, please look at the file LPdisc given.m noting that Matlab supplies help for most commands using the help