Pattern Recognition and Machine Learning CS446: Problem Set 4 Solution - Prof. Dan Roth | Assignments Computer Science

CS446: Pattern Recognition and Machine Learning Fall 2008

Problem Set 4

Solution Handed In: October 22, 2008

1. [Computing Margins - 35 points]

(a) With the given hyperplane there are 12 positive examples and 38 negative examples. The margin of this data is γ = 0.158113883008419.
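The margin reported above is the smallest distance from any example to the hyperplane w·x = θ. The sketch below shows that computation, assuming labels y ∈ {+1, −1}. The function name `margin` and the toy points are hypothetical, and the assignment's 50 examples are not reproduced here:

```python
import math

def margin(w, theta, X, y):
    """Smallest signed distance y_i * (w . x_i - theta) / ||w|| over the data.

    A positive result means the hyperplane separates the data, and the value
    is the margin gamma of the data with respect to (w, theta).
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        yi * (sum(wi * xi for wi, xi in zip(w, x)) - theta) / norm
        for x, yi in zip(X, y)
    )

# Hypothetical toy data in 3 dimensions (not the assignment's data set).
w, theta = [1, 1, 0], 0.5
X = [[1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]]
y = [+1, +1, -1, -1]
print(margin(w, theta, X, y))  # 0.5 / sqrt(2), about 0.354
```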

(b) i. One possible hyperplane for this disjunction is:

w=<0,0,1,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0>, θ = 0.5

In other words, w3, w6, w9 and w12 are 1, all other wi are 0, and θ is 0.5.

This way, if any of x3, x6, x9 or x12 is 1, the dot product is at least 1, which is greater than 0.5, so the example is classified as positive. If none of them is 1, the dot product is 0 and the example is classified as negative.

This hyperplane yields a margin of γ = 0.25 on the data.
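Here is a quick check of the hyperplane above. Only x3, x6, x9 and x12 carry nonzero weight, so it is enough to enumerate the 16 settings of those four bits. The sketch below confirms that the threshold test reproduces the disjunction. It also confirms that every binary example lies at distance at least 0.25 from the hyperplane, with equality for examples that have zero or one relevant bit set. The variable names are illustrative only:

```python
import itertools
import math

n = 20
relevant = [3, 6, 9, 12]  # 1-based indices of the disjunction's variables
w = [1 if i + 1 in relevant else 0 for i in range(n)]
theta = 0.5
norm = math.sqrt(sum(wi * wi for wi in w))  # sqrt(4) = 2

distances = []
for bits in itertools.product([0, 1], repeat=len(relevant)):
    # Irrelevant coordinates have weight 0, so leaving them at 0 loses nothing.
    x = [0] * n
    for idx, b in zip(relevant, bits):
        x[idx - 1] = b
    score = sum(wi * xi for wi, xi in zip(w, x))
    # The threshold test must agree with x3 OR x6 OR x9 OR x12.
    assert (score > theta) == any(bits)
    distances.append(abs(score - theta) / norm)

print(min(distances))  # 0.25
```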

ii. The minimum Euclidean distance between a positive and a negative example is 1.732.
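The minimum distance between opposite-class examples can be found by brute force over all positive/negative pairs. On binary vectors the Euclidean distance is the square root of the Hamming distance. So 1.732 ≈ √3 means the closest positive and negative examples differ in 3 bits.

Half of this distance also bounds the margin of any separating hyperplane. For two points on opposite sides, their distances to the hyperplane add up to at most their distance from each other.

The sketch below uses a hypothetical helper name and toy data:

```python
import itertools
import math

def min_opposite_distance(X, y):
    """Smallest Euclidean distance between a positive and a negative example."""
    pos = [x for x, yi in zip(X, y) if yi > 0]
    neg = [x for x, yi in zip(X, y) if yi < 0]
    return min(math.dist(p, q) for p, q in itertools.product(pos, neg))

# Hypothetical toy data (not the assignment's examples).
X = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 0, 0], [0, 1, 0, 1]]
y = [+1, +1, -1, -1]
d = min_opposite_distance(X, y)
# Every opposite-class pair differs in 3 bits, so d = sqrt(3), about 1.732;
# d / 2 upper-bounds the margin of any hyperplane separating X by y.
print(d, d / 2)
```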

(c) i. One possible hyperplane for this disjunction is:

w=<0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,0>, θ = 0.5

This hyperplane yields a margin of γ = 0.166666666666667 on the data.
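Both margins follow the same pattern. The worked derivation below assumes only the unit-weight construction used above: k relevant weights equal to 1 and θ = 0.5. An example with m of its k relevant bits set lies at this distance from the hyperplane, and the minimum over m gives the margins for the sparse (k = 4) and dense (k = 9) disjunctions:

```latex
\[
  \operatorname{dist}(x) = \frac{|w \cdot x - \theta|}{\lVert w \rVert}
  = \frac{|m - 0.5|}{\sqrt{k}},
  \qquad
  \min_{m \in \{0,\dots,k\}} \frac{|m - 0.5|}{\sqrt{k}} = \frac{1}{2\sqrt{k}}
\]
\[
  k = 4:\ \gamma = \tfrac{1}{4} = 0.25,
  \qquad
  k = 9:\ \gamma = \tfrac{1}{6} \approx 0.1667
\]
```

The minimum is attained by examples with m = 0 or m = 1.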

ii. The minimum Euclidean distance between a positive and a negative example is 1.414.

(d) In the full example space of all 2^20 binary vectors, the minimum distance between a positive and a negative example is just one bit for any non-constant function. Moving from a positive example to a negative one a bit at a time, there has to be some point where the label flips, and the two vectors on either side of that point differ in exactly one bit. Here, however, we only have 50 points sampled from this space, so positive and negative examples can be much further apart. With the sparse disjunction there are more unused dimensions in which the examples can vary. For two examples to be close but classified differently, they need to be close in the 16 unused dimensions but differ in at least one of the 4 used dimensions. For the denser disjunction, on the other hand, two examples need to be close in only the 11 unused dimensions, but differ in at least one of the 9 used dimensions. This means that randomly sampled examples are more likely to be close to each other when they have fewer dimensions in which to differ.
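The one-bit claim for the full example space can be checked by brute force on a smaller cube. The sketch below assumes n = 8 instead of 20 so that the pairwise search stays cheap. The two disjunctions (over 4 and over 6 of the 8 bits) are hypothetical stand-ins for the sparse and dense cases:

```python
import itertools

def min_hamming_full_cube(f, n):
    """Minimum Hamming distance between a positive and a negative vector
    over all 2**n binary vectors (None if f is constant)."""
    cube = list(itertools.product([0, 1], repeat=n))
    pos = [x for x in cube if f(x)]
    neg = [x for x in cube if not f(x)]
    if not pos or not neg:
        return None
    return min(sum(a != b for a, b in zip(p, q)) for p in pos for q in neg)

def sparse(x):
    # Hypothetical disjunction over 4 of the 8 bits.
    return any(x[i] for i in (0, 2, 4, 6))

def dense(x):
    # Hypothetical disjunction over 6 of the 8 bits.
    return any(x[i] for i in (0, 1, 2, 3, 4, 5))

n = 8
print(min_hamming_full_cube(sparse, n), min_hamming_full_cube(dense, n))  # 1 1
```

Over the full cube both functions give distance 1. The larger distances in parts (b) and (c) come only from sampling 50 points.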

The minimum distance is smaller for the dense disjunction, and half of this distance is an upper bound on the margin any separator can achieve. The margin found is also smaller. Together, these leave less "wiggle room" to learn a linear separator, so we will need more examples to find one that agrees with the training data. As we saw with Novikoff's bound, Perceptron makes at most (R/γ)^2 mistakes, where R is the maximum norm of an example. The bound on the number of mistakes is therefore inversely related to the square of the margin.
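To make the role of the margin concrete, here is a minimal Perceptron sketch. It has no bias term and updates w ← w + y·x on each mistake. It counts mistakes on a small hypothetical data set and compares the count with Novikoff's bound (R/γ)^2, where γ is the margin of the data with respect to a unit-norm separator u. The data and u = (1, 0) are illustrative assumptions, not taken from the assignment. A threshold θ can be handled by appending a constant feature, which changes R and γ.

```python
import math

def perceptron_mistakes(X, y, max_epochs=1000):
    """Run Perceptron (w starts at 0, no bias) until an epoch has no mistakes.
    Returns the final weight vector and the total number of mistakes."""
    w = [0.0] * len(X[0])
    mistakes = 0
    for _ in range(max_epochs):
        clean_pass = True
        for x, yi in zip(X, y):
            if yi * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + yi * xi for wi, xi in zip(w, x)]
                mistakes += 1
                clean_pass = False
        if clean_pass:
            break
    return w, mistakes

# Hypothetical separable data: the label is the sign of the first coordinate.
X = [(2, 1), (1, -1), (3, 0.5), (-1, 2), (-2, -1), (-0.5, 0.3)]
y = [+1, +1, +1, -1, -1, -1]
u = (1.0, 0.0)  # a unit-norm separator through the origin
gamma = min(yi * (u[0] * x[0] + u[1] * x[1]) for x, yi in zip(X, y))  # 0.5
R = max(math.hypot(*x) for x in X)
w, mistakes = perceptron_mistakes(X, y)
bound = (R / gamma) ** 2
print(mistakes, round(bound, 2))
```

Shrinking the gap between the classes (smaller γ) raises the bound quadratically. This is the sense in which the dense disjunction is harder to learn.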

2. [VC Dimension - 40 points]

(a) The positive region of our hypothesis is convex, since it is the intersection of two convex sets (the positive half-spaces of two linear classifiers). This means that any set of points where
