Pattern Recognition and Machine Learning - Machine Learning | CS 446, Quizzes of Computer Science

University of Illinois - Urbana-Champaign Computer Science

Prof. Dan Roth

Material Type: Quiz; Professor: Roth; Class: Machine Learning; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2010;

Uploaded on 12/15/2010

CS446: Pattern Recognition and Machine Learning Fall 2010

Class Exercise 4

Date: November 4, 2010 Name (NetID):

Instructions:

• Please write your name and NetID at the top of this sheet before you return it to the instructor.

• The goal of the exercises is to help you recall previous lectures and homeworks and think about them. If you want, you may refer to your class notes to answer the questions.

• Answer: The solutions are highlighted.

Multi-class Classification

Consider a multi-class classification problem with k class labels {1, 2, ..., k}. Assume that we are given m examples, each labeled with one of the k class labels. Assume for simplicity that we have m/k examples of each type.

Assume that you have a learning algorithm L that can be used to learn Boolean functions. (E.g., think of L as the perceptron algorithm.) We would like to explore several ways to develop learning algorithms for the multi-class classification problem.

1. Suggest two schemes that use the algorithm L on the given data set to produce a multi-class classifier. In each case, determine:

• How will you train L? That is, what is the input data, what are the positive and negative examples, etc. Indicate how many "copies" of L you will use.

Answer: Scheme 1: We will have k classifiers (that is, k weight vectors). The i-th weight vector, $w_i$, assigns a confidence score to the i-th class. To train them, we create k binary problems as follows: for the i-th class, the positive examples are all examples with label i and the negative examples are the examples with any other label.

Scheme 2: We will have $\frac{1}{2}k(k-1)$ weight vectors. Each weight vector, $w_{i,j}$, assigns a preference between classes i and j. To train them, we create binary problems as follows: for training $w_{i,j}$, the positive examples are the examples labeled i and the negative examples are those labeled j.

• How will you use your final hypothesis given a new example?

Answer: Scheme 1: The label can be chosen as the one that achieves the maximum score. That is, for an input x, $y^* = \arg\max_i w_i^T x$.

Scheme 2: There are several ways to use the $\frac{1}{2}k(k-1)$ classifiers. One approach is to apply all of them to the example and have each classifier vote for a class. The label with the highest number of votes is the winner. Another approach is to conduct a tournament between the labels.
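The two schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the small perceptron stands in for the learner L, the function names and the toy data (with a constant last feature acting as a bias) are made up for the example, and Scheme 2 is decoded by simple voting.

```python
from itertools import combinations

def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_perceptron(examples, epochs=20):
    """Binary perceptron standing in for L; examples are (x, y) with y in {+1, -1}."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            if y * dot(w, x) <= 0:  # mistake (or zero score): update
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

def train_one_vs_all(data, labels):
    """Scheme 1: one weight vector per label; label i against all other labels."""
    return {i: train_perceptron([(x, 1 if y == i else -1) for x, y in data])
            for i in labels}

def predict_one_vs_all(ws, x):
    """Pick the label whose weight vector gives the highest score."""
    return max(ws, key=lambda i: dot(ws[i], x))

def train_all_vs_all(data, labels):
    """Scheme 2: one weight vector per pair; label i positive, label j negative."""
    return {(i, j): train_perceptron([(x, 1 if y == i else -1)
                                      for x, y in data if y in (i, j)])
            for i, j in combinations(labels, 2)}

def predict_all_vs_all(ws, x, labels):
    """Each pairwise classifier votes for i or j; the most-voted label wins."""
    votes = {i: 0 for i in labels}
    for (i, j), w in ws.items():
        votes[i if dot(w, x) > 0 else j] += 1
    return max(votes, key=votes.get)

# Hypothetical toy data: k = 4 labels, two examples per label.
labels = [1, 2, 3, 4]
data = [([3.0, 3.0, 1.0], 1), ([4.0, 2.0, 1.0], 1),
        ([-3.0, 3.0, 1.0], 2), ([-2.0, 4.0, 1.0], 2),
        ([-3.0, -3.0, 1.0], 3), ([-4.0, -2.0, 1.0], 3),
        ([3.0, -3.0, 1.0], 4), ([2.0, -4.0, 1.0], 4)]
ova = train_one_vs_all(data, labels)   # k = 4 weight vectors
ava = train_all_vs_all(data, labels)   # k(k-1)/2 = 6 weight vectors
```

Note the difference in training cost: each of the k classifiers in Scheme 1 sees all m examples, while each pairwise classifier in Scheme 2 sees only the 2m/k examples of its two labels.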


2. In the first scheme proposed above you used k classifiers. We call this scheme 1-vs-all.

• Can you invent a similar scheme that only makes use of $\log_2 k$ classifiers?

Answer: We need $\log_2 k$ bits to represent all the labels in binary representation. Each bit can be either 0 or 1, so we can train one classifier for each bit. At prediction time, we use the predictions of the $\log_2 k$ classifiers to form a binary string of length $\log_2 k$, which is the prediction.

• Think about one disadvantage of this scheme.

Answer: This scheme is extremely sensitive to noise. If even one of the classifiers is incorrect, the final prediction will be wrong.

• How can we deal with this problem?

Answer: Use the error-correcting code scheme (see below and the class slides).
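The $\log_2 k$ scheme and its sensitivity to a single wrong bit can be illustrated with a short Python sketch. It assumes labels are numbered from 0 to k-1 and that k is a power of two; the helper names `encode`, `decode`, and `bit_problem` are hypothetical.

```python
from math import ceil, log2

def encode(label, n_bits):
    """Binary code of a label (numbered from 0), most significant bit first."""
    return [(label >> b) & 1 for b in reversed(range(n_bits))]

def decode(bits):
    """Turn a predicted bit string back into a label."""
    label = 0
    for bit in bits:
        label = (label << 1) | bit
    return label

def bit_problem(data, b, n_bits):
    """Binary training set for bit b: positive iff bit b of the label's code is 1."""
    return [(x, 1 if encode(y, n_bits)[b] == 1 else -1) for x, y in data]

k = 8
n_bits = ceil(log2(k))      # 3 classifiers suffice for 8 labels
code = encode(5, n_bits)    # code of label 5
flipped = code[:]
flipped[0] ^= 1             # suppose the first classifier makes a mistake
wrong_label = decode(flipped)
```

Because every bit string of length $\log_2 k$ is the code of some label, a single flipped bit always decodes to a different, valid-looking label, and the error goes unnoticed.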
3. The error-correcting code scheme uses redundancy to address the problem. For simplicity, assume k = 8 class labels. Instead of using 3 classifiers, use 5.

• How many elements are there in the output space?

Answer: $2^5 = 32$.

• How will you use the 5 classifiers to distinguish the k = 8 labels?

Answer: Since we need to represent $8 = 2^3$ labels using 5 bits, we can use the remaining two bits to design an error-correcting code for the labels. For example, consider the following assignment:

Label   Code
0       0 0 0 0 0
1       0 0 1 0 1
2       0 1 1 1 0
3       0 1 0 1 1
4       1 1 0 0 0
5       1 0 0 0 1
6       1 0 1 1 0
7       1 1 1 1 1

Each code is at least two bits away from all others. This way, if one of the classifiers makes an incorrect prediction, the resulting string is not a valid code, so the error is detected rather than silently producing a wrong label. Guaranteeing that every one-bit error can also be corrected would require each pair of codes to differ in at least three bits, which needs more than 5 bits for 8 labels.

• What problems do you see with this scheme?

Answer: The main problem with this scheme is the meaning of the codes. For example, according to the above encoding, the classifier for the least significant bit must learn to separate labels 0, 2, 4, 6 from labels 1, 3, 5, 7. Why should these be separable?
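Decoding with the code table above can be sketched as a nearest-codeword lookup under Hamming distance. This is an illustrative sketch, not the course's implementation; `CODES` copies the table, while `hamming`, `min_distance`, and `nearest_labels` are hypothetical helper names.

```python
from itertools import combinations

# The 5-bit code table from the exercise, one bit string per label.
CODES = {0: "00000", 1: "00101", 2: "01110", 3: "01011",
         4: "11000", 5: "10001", 6: "10110", 7: "11111"}

def hamming(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(ca != cb for ca, cb in zip(a, b))

def min_distance(codes):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(hamming(a, b) for a, b in combinations(codes.values(), 2))

def nearest_labels(bits, codes):
    """All labels whose codeword is closest to the predicted bit string."""
    best = min(hamming(bits, c) for c in codes.values())
    return sorted(label for label, c in codes.items() if hamming(bits, c) == best)

d_min = min_distance(CODES)                 # 2 for this table
exact = nearest_labels("00101", CODES)      # all five classifiers correct: [1]
one_off = nearest_labels("00001", CODES)    # one bit flipped: [0, 1, 5]
```

For this particular table the minimum distance is 2, so a single flipped bit never lands on another valid codeword, but the nearest codeword is not always unique, as the last line shows. A tie-breaking rule (for example, using classifier confidences) would be an additional design choice.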

for each example (x, i) (that is, the label of x is i) do
    for all j ≠ i do
        if $(w_i^T - w_j^T) \cdot x < 0$ (mistaken prediction) then
            $w_i \leftarrow w_i + x$ (promotion)
            $w_j \leftarrow w_j - x$ (demotion)
        end if
    end for
end for
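A minimal Python sketch of this update loop follows. Two choices here are assumptions rather than part of the listing: labels are numbered 0 to k-1, and the mistake test uses `<= 0` instead of the strict `< 0`, because with zero-initialized weights every score difference starts at exactly 0 and the strict test would never trigger an update. The function names are hypothetical.

```python
def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(w, x))

def update_all_violations(w, x, i):
    """Inner loop for one example (x, i); w is a list of k weight vectors."""
    for j in range(len(w)):
        if j == i:
            continue
        # Assumption: <= rather than < so that zero-initialized weights update.
        if dot(w[i], x) - dot(w[j], x) <= 0:
            w[i] = [a + b for a, b in zip(w[i], x)]  # promotion
            w[j] = [a - b for a, b in zip(w[j], x)]  # demotion
    return w

def train(data, k, epochs=10):
    """Run the update over the data set for a fixed number of passes."""
    n = len(data[0][0])
    w = [[0.0] * n for _ in range(k)]
    for _ in range(epochs):
        for x, i in data:
            update_all_violations(w, x, i)
    return w

# One example with label 0 and k = 3, starting from zero weights.
w_after = update_all_violations([[0.0, 0.0] for _ in range(3)], [1.0, 0.0], 0)
```

In this run only label 1 is demoted: after the first promotion, label 0 already outscores label 2 on x, so that constraint is no longer violated.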

And we get the minimal margin of a data set by minimizing over all examples in it. To make the general case even closer to the balanced case, we present the conservative update scheme. When training via constraint classification, we do not want to penalize all components of w. Rather, we only want to update the component that corresponds to the toughest competitor of the correct label i, that is, the label with the smallest margin.

From Multiclass to Structure Prediction

In this section, we rewrite the algorithm above in a form that can later be generalized to a more general setting, structure prediction. We now go back to the global weight vector view, in which the k weight vectors are concatenated into a single vector

$w = (w_1, w_2, \ldots, w_k) \in \mathbb{R}^{nk}.$

In this view, an example (x, i) is embedded in an nk-dimensional vector, with x placed in the i-th block of it and 0 in all the other dimensions. Writing (x, y) for this embedded vector, we note that

$f(x, y) = w^T \cdot (x, y) = w_y^T \cdot x$

In this notation the prediction we make is:

$y^* = \arg\max_{y \in [k]} f(x, y).$
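The embedding and the resulting prediction rule can be checked with a few lines of Python. This sketch assumes labels are numbered 0 to k-1 and that w is stored as one flat list of length nk; `embed`, `f`, and `predict` are hypothetical names.

```python
def embed(x, y, k):
    """Place x in the y-th block of an n*k vector, with zeros everywhere else."""
    n = len(x)
    phi = [0.0] * (n * k)
    phi[y * n:(y + 1) * n] = x
    return phi

def f(w, x, y, k):
    """Score of label y: inner product of the global w with the embedded example."""
    return sum(a * b for a, b in zip(w, embed(x, y, k)))

def predict(w, x, k):
    """Label with the highest score f(x, y)."""
    return max(range(k), key=lambda y: f(w, x, y, k))

# k = 3 blocks of dimension n = 2: w_0 = (1, 2), w_1 = (3, 4), w_2 = (5, 6).
w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0]
scores = [f(w, x, y, 3) for y in range(3)]
```

Each score equals the inner product of x with the corresponding block of w, which is exactly the per-class score of the multi-vector view.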

How can we write the conservative update in this view?

for each example (x, i) (that is, the label of x is i) do
    let $j^* = \arg\min_{j \in [k] \setminus \{i\}} (w_i^T \cdot x - w_j^T \cdot x)$
    if $(w_i^T - w_{j^*}^T) \cdot x < 0$ (mistaken prediction) then
        $w \leftarrow w + (x, i) - (x, j^*)$
    end if
end for
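A single step of the conservative update on the concatenated weight vector might look like the following Python sketch. It assumes labels 0 to k-1, a flat list w of length nk, and `<= 0` in place of the strict `< 0` so that updates can start from zero weights; `conservative_step` is a hypothetical name.

```python
def conservative_step(w, x, i, k):
    """Update only the strongest competitor j* of the correct label i, in place."""
    n = len(x)

    def score(y):
        # Inner product of x with the y-th block of the global weight vector.
        return sum(w[y * n + t] * x[t] for t in range(n))

    # j* is the wrong label with the smallest margin against the correct label.
    j_star = min((j for j in range(k) if j != i),
                 key=lambda j: score(i) - score(j))
    # Assumption: <= rather than < so that zero-initialized weights update.
    if score(i) - score(j_star) <= 0:
        for t in range(n):
            w[i * n + t] += x[t]        # add the embedded vector (x, i)
            w[j_star * n + t] -= x[t]   # subtract the embedded vector (x, j*)
    return w

w = [0.0] * 6                                        # k = 3 labels, n = 2 features
first = list(conservative_step(w, [1.0, 2.0], 1, 3))
second = list(conservative_step(w, [1.0, 2.0], 1, 3))
```

On the first call all margins are tied at 0, so the first competitor (label 0) is chosen and updated; on the second call the example is already classified with a positive margin and w is left unchanged.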
