


Material Type: Quiz; Professor: Roth; Class: Machine Learning; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2010;



CS446: Pattern Recognition and Machine Learning Fall 2010
Date: November 4, 2010 Name (NetID): Instructions:
Consider a multiclass classification problem with k class labels {1, 2, ..., k}. Assume that we are given m examples, each labeled with one of the k class labels, and assume for simplicity that we have m/k examples of each class. Assume that you have a learning algorithm L that can be used to learn Boolean functions (e.g., think of L as the perceptron algorithm). We would like to explore several ways to develop learning algorithms for the multiclass classification problem.
for each example $(x, i)$ (that is, the label of $x$ is $i$) do
    for all pairs $(i, j)$ with $j \neq i$ do
        if $(w_i^T - w_j^T) \cdot x < 0$ (mistaken prediction) then
            $w_i \leftarrow w_i + x$ (promotion)
            $w_j \leftarrow w_j - x$ (demotion)
        end if
    end for
end for
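The loop above can be sketched in Python for a single example. This is a minimal illustration, not the course's reference code: the names `dot` and `all_pairs_update` are hypothetical, labels are assumed to be 0-based (the text uses 1..k), and the test is the strict `< 0` from the pseudocode, so all-zero initial weights would never trigger an update.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def all_pairs_update(weights, x, i):
    """Promote w_i and demote w_j for every label j != i that outscores i.

    weights: list of k weight vectors (lists of numbers), modified in place.
    x: feature vector; i: 0-based index of the correct label (an assumption,
    the text numbers labels from 1).
    Returns the number of (promotion, demotion) updates performed.
    """
    updates = 0
    for j in range(len(weights)):
        if j == i:
            continue
        # Scores are recomputed on every iteration, so an earlier promotion
        # of w_i can already fix a later pair, as in the pseudocode.
        if dot(weights[i], x) - dot(weights[j], x) < 0:
            weights[i] = [a + b for a, b in zip(weights[i], x)]  # promotion
            weights[j] = [a - b for a, b in zip(weights[j], x)]  # demotion
            updates += 1
    return updates
```

Because the check uses the current weights, the first promotion may be enough to satisfy the remaining constraints for that example.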
We obtain the minimal margin of a data set by minimizing over all examples in it. To make the general case even closer to the balanced case, we present the conservative update scheme: when training via constraint classification, we do not want to penalize all components of w. Rather, we only want to update the component that corresponds to the toughest competitor of the correct label i, that is, the label with the smallest margin.
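As a rough sketch of the two minimizations mentioned here: the definition of the margin of a single example is not visible in this excerpt, so the code below assumes it is the smallest score gap between the correct label and any other label, which matches the expression used in the conservative update later on. The function names and the 0-based labels are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def example_margin(weights, x, i):
    """Assumed margin of (x, i): min over j != i of w_i.x - w_j.x."""
    s_i = dot(weights[i], x)
    return min(s_i - dot(w_j, x) for j, w_j in enumerate(weights) if j != i)


def dataset_margin(weights, examples):
    """Minimal margin of a data set: the minimum over all its examples."""
    return min(example_margin(weights, x, i) for x, i in examples)
```

A negative value for an example means some competing label currently outscores the correct one.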
In this section, we simply rewrite the algorithm above in a form that can later be generalized to a broader setting, that of Structured Prediction. We now go back to the global weight vector view, in which the k weight vectors are concatenated into
$w = (w_1, w_2, \ldots, w_k) \in \mathbb{R}^{nk}.$
In this view, an example (x, i) is embedded in an nk-dimensional vector, with x placed in the i-th block and 0 in all the other dimensions. We note that
$f(x, y) = w^T \cdot (x, y) = w_y^T \cdot x$
In this notation the prediction we make is:
$y^* = \arg\max_{y \in [k]} f(x, y).$
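The embedding and the prediction rule can be illustrated with a short Python sketch. The names `embed`, `score`, and `predict` are hypothetical, labels are taken to be 0-based for indexing convenience, and ties in the argmax are broken in favor of the smallest label, a choice the text does not specify.

```python
def embed(x, y, k):
    """Place x in the y-th block of an (n*k)-dimensional vector, zeros elsewhere."""
    n = len(x)
    phi = [0.0] * (n * k)
    phi[y * n:(y + 1) * n] = x
    return phi


def score(w, x, y, k):
    """f(x, y): inner product of the global weight vector with the embedding of (x, y)."""
    return sum(a * b for a, b in zip(w, embed(x, y, k)))


def predict(w, x, k):
    """Return the label y in {0, ..., k-1} that maximizes f(x, y)."""
    return max(range(k), key=lambda y: score(w, x, y, k))
```

Since the embedding is zero outside block y, `score(w, x, y, k)` equals the inner product of x with the y-th block of w, which is the identity noted above.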
And how can we write the conservative update in this view?
for each example $(x, i)$ (that is, the label of $x$ is $i$) do
    let $j^* = \arg\min_{j \in [k] \setminus \{i\}} (w_i^T \cdot x - w_j^T \cdot x)$
    if $(w_i^T - w_{j^*}^T) \cdot x < 0$ (mistaken prediction) then
        $w \leftarrow w + (x, i) - (x, j^*)$
    end if
end for
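One step of this conservative update on the concatenated weight vector might look as follows in Python. This is a sketch under stated assumptions: `block_score` and `conservative_update` are hypothetical names, labels are 0-based, w is a flat list of length n*k, and adding (x, i) and subtracting (x, j*) is carried out directly on the two affected blocks instead of building the embedded vectors.

```python
def block_score(w, x, y):
    """Inner product of x with the y-th block of the flat weight vector w."""
    n = len(x)
    return sum(a * b for a, b in zip(w[y * n:(y + 1) * n], x))


def conservative_update(w, x, i, k):
    """Update only the toughest competitor j* of the correct label i.

    w is modified in place. Returns j* if an update was made, else None.
    """
    n = len(x)
    s_i = block_score(w, x, i)
    # j* is the label other than i with the smallest gap w_i.x - w_j.x.
    j_star = min((j for j in range(k) if j != i),
                 key=lambda j: s_i - block_score(w, x, j))
    if s_i - block_score(w, x, j_star) < 0:  # mistaken prediction
        for d in range(n):
            w[i * n + d] += x[d]        # effect of adding the embedding (x, i)
            w[j_star * n + d] -= x[d]   # effect of subtracting (x, j*)
        return j_star
    return None
```

Only two blocks of w change per mistake, in contrast to the earlier scheme, which may demote every violating label.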